Negative look-behind assertions

Regular expressions are very powerful. Here are some typical use cases:

  • Email validation
  • Password validation
  • Searching for a pattern in a string
  • Searching and capturing matches in a string

In my current project, I needed to parse a CSV file in which one field held a float value. This value could appear in any of the following forms:

  • 0.678
  • 0.003782
  • 2e-08
  • 1.456e-06

So the value is either a plain decimal float or a float in scientific notation. I’m going to use Elixir as the implementation language. We first need to parse the string into a float, and String.to_float/1 is the obvious candidate for this.

Fire up iex.

iex(3)> String.to_float("0.678")
0.678
iex(4)> String.to_float("123.678")
123.678
iex(5)> String.to_float("2e-08")       
** (ArgumentError) argument error
    :erlang.binary_to_float("2e-08")
iex(5)> 

Alright, so String.to_float/1 does not accept 2e-08 as scientific notation. What about this?

iex(6)> String.to_float("2.0e-08")
2.0e-8
iex(7)>

That’s better! So String.to_float/1 expects the number in front of the exponent to be written with a decimal point. Let’s use a regex to fix this.

iex(8)> String.replace "2e-08", ~r/(\d+)e/, "\\1.0e"
"2.0e-08"

So we replace a run of digits followed by e with the same digits followed by .0e. Now let’s pipe this onwards.

iex(9)> String.replace("2e-08", ~r/(\d+)e/, "\\1.0e") |> String.to_float
2.0e-8
iex(10)>

Cool.

iex(11)> String.replace("1.456e-06", ~r/(\d+)e/, "\\1.0e") |> String.to_float
** (ArgumentError) argument error
    :erlang.binary_to_float("1.456.0e-06")
iex(11)> 

Hmm… The regex matched 456e and turned it into 456.0e, which produces the invalid string 1.456.0e-06. We don’t want to touch numbers that already contain a decimal point.
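To see exactly which part of each value the naive pattern grabs, Regex.run/2 can be pointed at the sample values. This is only a small diagnostic sketch; the variable name naive is made up for illustration.

```elixir
# The first, naive pattern: any run of digits directly followed by "e".
naive = ~r/(\d+)e/

# Regex.run/2 returns [full_match, capture_1] or nil when nothing matches.
for value <- ["2e-08", "1.456e-06", "0.678"] do
  IO.inspect({value, Regex.run(naive, value)})
end

# {"2e-08", ["2e", "2"]}
# {"1.456e-06", ["456e", "456"]}
# {"0.678", nil}
```

The second line is the culprit: the match starts in the middle of the fraction, so .0 is inserted after digits that already follow a decimal point.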

iex(14)> String.replace("1.456e-06", ~r/^\d+[^\.](\d+)e/, "\\1.0e")
"1.456e-06"
iex(15)> String.replace("2e-08", ~r/^\d+[^\.](\d+)e/, "\\1.0e")    
"2e-08"
iex(16)>

We use a negated character class so that the string is not replaced if a dot follows the leading digits. However, this breaks our earlier string: 2e-08 is no longer converted. On a side note, make sure you always have regression tests to catch these kinds of bugs.
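A quick way to understand why this attempt fails is to run it against a few inputs. The sketch below is illustrative only: the variable name attempt and the input 1234e-08 are not from the original data, they are assumptions chosen to expose the problem.

```elixir
# The attempt with a negated character class.
# It needs: leading digits, one non-dot character, more digits, then "e".
attempt = ~r/^\d+[^\.](\d+)e/

# "2e-08" has only one character before "e", so nothing matches.
IO.inspect(Regex.run(attempt, "2e-08"))
# nil

# The dot right after "1" blocks the match, which is what we wanted.
IO.inspect(Regex.run(attempt, "1.456e-06"))
# nil

# Hypothetical longer mantissa: the pattern matches, but only the last
# digit is captured, so the replacement silently drops the others.
IO.inspect(String.replace("1234e-08", attempt, "\\1.0e"))
# "4.0e-08"
```

The last case is arguably worse than a crash, because the result still parses as a float but carries the wrong value.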

I banged my head against this for a while, as regular expressions are a tough beast to tame. Then I thought about using lookarounds, and specifically, for this case, a negative look-behind assertion.

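So I need to look behind the digits and make sure that they are not preceded by a dot. Two details matter here. An unescaped . inside the look-behind matches any character, not just a literal dot. And rejecting only a literal dot is not enough, because the regex could still start matching in the middle of the fraction (at 56e in 1.456e-06). So the look-behind rejects both a dot and a digit, using the character class [.\d].

iex(16)> String.replace("2e-08", ~r/(?<![.\d])(\d+)e/, "\\1.0e")
"2.0e-08"
iex(17)> String.replace("2.56e-08", ~r/(?<![.\d])(\d+)e/, "\\1.0e")
"2.56e-08"
iex(18)> String.replace("2e-08", ~r/(?<![.\d])(\d+)e/, "\\1.0e")
"2.0e-08"
iex(19)> String.replace("0.0894", ~r/(?<![.\d])(\d+)e/, "\\1.0e")
"0.0894"
iex(20)> String.replace("1.456e-06", ~r/(?<![.\d])(\d+)e/, "\\1.0e")
"1.456e-06"
iex(21)>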

Works great!
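To keep this from regressing, the look-behind can be wrapped in a small helper and checked against the four sample values from the CSV. This is a minimal sketch, not the code from my project: the module name FloatField and its functions are hypothetical, and the look-behind spells out the forbidden preceding characters as [.\d] (a dot or a digit), since a bare . in a regex matches any character.

```elixir
defmodule FloatField do
  # Hypothetical helper module, named for illustration only.

  # Inserts ".0" before the exponent when the mantissa has no decimal point.
  # (?<![.\d]) is a negative look-behind: the captured digits must not be
  # preceded by a dot or by another digit.
  def normalize(field) do
    String.replace(field, ~r/(?<![.\d])(\d+)e/, "\\1.0e")
  end

  # Normalizes the field and hands it to String.to_float/1.
  def parse(field) do
    field |> normalize() |> String.to_float()
  end
end

# Regression check over the sample values; a failed match raises MatchError.
cases = [
  {"0.678", 0.678},
  {"0.003782", 0.003782},
  {"2e-08", 2.0e-8},
  {"1.456e-06", 1.456e-6}
]

for {input, expected} <- cases do
  true = FloatField.parse(input) == expected
end

IO.puts("all sample values parsed")
```

In a real project these cases would probably live in an ExUnit test so that they run with the rest of the suite.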