Regular expressions are very powerful. Here are some typical use cases:
- Email validation
- Password validation
- Searching for a pattern in a string
- Searching and capturing matches in a string
In my current project, I needed to parse a CSV file and a particular field had a float value. This value could appear as any of the following:
- 0.678
- 0.003782
- 2e-08
- 1.456e-06
So it could either be a literal float or it could be in scientific notation. I’m going to use Elixir as the implementation language. We first need to parse the string into a float. Use String.to_float/1 for this.
Fire up iex.
iex(3)> String.to_float("0.678")
0.678
iex(4)> String.to_float("123.678")
123.678
iex(5)> String.to_float("2e-08")
** (ArgumentError) argument error
:erlang.binary_to_float("2e-08")
iex(5)>
Alright, so String.to_float/1 does not accept 2e-08, because the mantissa has no decimal point. What about this?
iex(6)> String.to_float("2.0e-08")
2.0e-8
iex(7)>
That’s better! So String.to_float/1 expects the mantissa to contain a decimal point. Let’s use a regex to add one.
iex(8)> String.replace "2e-08", ~r/(\d+)e/, "\\1.0e"
"2.0e-08"
So we replace a run of digits followed by e with the same digits, followed by .0 and the e. Now let’s pipe the result onwards.
iex(9)> String.replace("2e-08", ~r/(\d+)e/, "\\1.0e") |> String.to_float
2.0e-8
iex(10)>
Cool.
iex(11)> String.replace("1.456e-06", ~r/(\d+)e/, "\\1.0e") |> String.to_float
** (ArgumentError) argument error
:erlang.binary_to_float("1.456.0e-06")
iex(11)>
Hmm… The regex matched 456e, the digits right before the e, and turned it into 456.0e. We don’t want to touch numbers whose mantissa already has a decimal point.
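To see exactly which part of the string the pattern grabs, Regex.run/2 is handy. This is just a small sketch using the same pattern as above:

```elixir
# Regex.run/2 returns the whole match followed by each capture group.
IO.inspect(Regex.run(~r/(\d+)e/, "2e-08"))      # ["2e", "2"]
IO.inspect(Regex.run(~r/(\d+)e/, "1.456e-06"))  # ["456e", "456"]
```

In the second case the capture is only the fractional digits, which is why the replacement ends up with two dots.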
iex(14)> String.replace("1.456e-06", ~r/^\d+[^\.](\d+)e/, "\\1.0e")
"1.456e-06"
iex(15)> String.replace("2e-08", ~r/^\d+[^\.](\d+)e/, "\\1.0e")
"2e-08"
iex(16)>
We use a negated character class, [^\.], so the pattern does not match when the leading digits are followed by a dot. However, this breaks our earlier string: 2e-08 now comes back unchanged. As a side note, make sure you always have regression tests to catch these kinds of bugs.
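One cheap way to catch this kind of regression is to run every candidate pattern against every sample value. The script below is only a sketch: the names samples, candidates and parseable are made up for illustration, and the sample values are the four from the top of the post.

```elixir
# The four sample values from the CSV field.
samples = ["0.678", "0.003782", "2e-08", "1.456e-06"]

# The two patterns tried so far.
candidates = [
  ~r/(\d+)e/,
  ~r/^\d+[^\.](\d+)e/
]

# Hypothetical helper: true when String.to_float/1 accepts the string.
parseable = fn string ->
  try do
    String.to_float(string)
    true
  rescue
    ArgumentError -> false
  end
end

# One row per {pattern, sample}: the rewritten string and whether it parses.
results =
  for regex <- candidates, sample <- samples do
    fixed = String.replace(sample, regex, "\\1.0e")
    {Regex.source(regex), sample, fixed, parseable.(fixed)}
  end

Enum.each(results, &IO.inspect/1)
```

Each pattern fails exactly one sample: the first one mangles 1.456e-06, and the second one leaves 2e-08 unparseable. The same table could be moved into an ExUnit test once a pattern passes all four rows.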
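I banged my head against this for a while, as regular expressions are a tough beast to tame. Then I thought about using lookarounds, and specifically a negative lookbehind assertion.
The idea is to look behind the digits and make sure there is no dot. One subtlety: in the pattern below the dot is unescaped, so it matches any character, and the assertion really means "nothing precedes the digits". That is what makes it work. With an escaped dot, (?<!\.), the engine could still start the match at 56e in 1.456e-06, since the 5 is preceded by a 4 rather than a dot.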
iex(16)> String.replace("2e-08", ~r/(?<!.)(\d+)e/, "\\1.0e")
"2.0e-08"
iex(17)> String.replace("2.56e-08", ~r/(?<!.)(\d+)e/, "\\1.0e")
"2.56e-08"
iex(19)> String.replace("0.0894", ~r/(?<!.)(\d+)e/, "\\1.0e")
"0.0894"
iex(20)> String.replace("1.456e-06", ~r/(?<!.)(\d+)e/, "\\1.0e")
"1.456e-06"
iex(21)>
Works great!
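To wrap up, here is a sketch that compares three flavours of the lookbehind and then puts the replacement and the parsing into one function. FloatField and parse/1 are hypothetical names, not part of any library, and the (?<![\d.]) variant is an alternative I am assuming you might prefer, not the pattern used in the session above.

```elixir
replace = fn field, regex -> String.replace(field, regex, "\\1.0e") end

# Unescaped dot, as in the session above: no character at all before the digits.
IO.inspect(replace.("1.456e-06", ~r/(?<!.)(\d+)e/))      # "1.456e-06"
# Escaped dot: the match can start at "56e", because "5" follows "4", not ".".
IO.inspect(replace.("1.456e-06", ~r/(?<!\.)(\d+)e/))     # "1.456.0e-06"
# Assumed alternative: refuse a preceding digit or dot.
IO.inspect(replace.("1.456e-06", ~r/(?<![\d.])(\d+)e/))  # "1.456e-06"
IO.inspect(replace.("2e-08", ~r/(?<![\d.])(\d+)e/))      # "2.0e-08"

defmodule FloatField do
  # Hypothetical module: normalizes an integer mantissa, then parses.
  def parse(field) do
    field
    |> String.replace(~r/(?<!.)(\d+)e/, "\\1.0e")
    |> String.to_float()
  end
end

# All four sample values from the top of the post now parse.
IO.inspect(Enum.map(["0.678", "0.003782", "2e-08", "1.456e-06"], &FloatField.parse/1))
```

Because (?<!.) only matches when nothing comes before the digits, a signed value such as -2e-08 is left untouched and would still make String.to_float/1 raise. If your data can contain those, the (?<![\d.]) variant rewrites it to -2.0e-08.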