Negative look-behind assertions

Regular expressions are very powerful. Here are some typical use cases:

  • Email validation
  • Password validation
  • Searching for a pattern in a string
  • Searching and capturing matches in a string

In my current project, I needed to parse a CSV file in which one field held a float value. That value could appear in any of the following forms:

  • 0.678
  • 0.003782
  • 2e-08
  • 1.456e-06

So the value is either a plain decimal float or a number in scientific notation. I’m using Elixir as the implementation language. The first step is to parse the string into a float, which is what String.to_float/1 does.

Fire up iex.

iex(3)> String.to_float("0.678")
0.678
iex(4)> String.to_float("123.678")
123.678
iex(5)> String.to_float("2e-08")
** (ArgumentError) argument error
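Instead of trying each sample value by hand in iex, a small helper can rescue the ArgumentError and report which of the four formats String.to_float/1 accepts as-is. This is a minimal sketch: the module name FloatProbe and the {:ok, float} / :error return shape are my own choices, not part of Elixir.

```elixir
defmodule FloatProbe do
  # Hypothetical helper: String.to_float/1 raises ArgumentError on input
  # it cannot parse, so rescue it and return a tagged tuple instead.
  def try_to_float(string) do
    {:ok, String.to_float(string)}
  rescue
    ArgumentError -> :error
  end
end

# The four formats seen in the CSV field.
for value <- ["0.678", "0.003782", "2e-08", "1.456e-06"] do
  {value, FloatProbe.try_to_float(value)}
end
```

Only "2e-08" comes back as :error; the other three, including 1.456e-06, parse directly.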

Alright, so String.to_float/1 does not accept 2e-08 as scientific notation. What about this?

iex(6)> String.to_float("2.0e-08")
2.0e-8

That’s better! String.to_float/1 expects the mantissa, the number before the e, to contain a decimal point. Let’s use a regex to add one.

iex(8)> String.replace "2e-08", ~r/(\d+)e/, "\\1.0e"
"2.0e-08"

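So we replace a run of digits followed by e with the same digits followed by .0e; "\\1" in the replacement string is a back-reference to the first capture group, (\d+). Now let’s pipe the result into String.to_float/1.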

iex(9)> String.replace("2e-08", ~r/(\d+)e/, "\\1.0e") |> String.to_float
2.0e-8

iex(11)> String.replace("1.456e-06", ~r/(\d+)e/, "\\1.0e") |> String.to_float
** (ArgumentError) argument error
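The two steps so far can be wrapped in one function, which makes the failure easy to reproduce. This is a sketch; NaiveParse is a hypothetical module name, and the regex and replacement are exactly the ones used in the session above.

```elixir
defmodule NaiveParse do
  # Appends ".0" to any run of digits directly followed by "e",
  # then hands the result to String.to_float/1.
  def parse(value) do
    value
    |> String.replace(~r/(\d+)e/, "\\1.0e")
    |> String.to_float()
  end
end

NaiveParse.parse("2e-08")
# "1.456e-06" is rewritten to "1.456.0e-06", so this call would raise:
# NaiveParse.parse("1.456e-06")
```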

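Hmm… The replacement did run, but on the wrong digits: the regex matched 456e, the digits after the decimal point, and turned 1.456e-06 into 1.456.0e-06, which is not a valid float. We only want to add .0 when the mantissa has no decimal point.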
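Regex.run/2 shows which digits the capture group grabs. Because the pattern is not anchored, the engine skips the leading 1 in 1.456e-06 (it is followed by a dot, not an e) and the leftmost match starts after the dot.

```elixir
pattern = ~r/(\d+)e/

# Whole match first, then the (\d+) capture group.
Regex.run(pattern, "2e-08")       # => ["2e", "2"]
Regex.run(pattern, "1.456e-06")   # => ["456e", "456"]

# Appending ".0" after "456" leaves two decimal points in the string.
String.replace("1.456e-06", pattern, "\\1.0e")   # => "1.456.0e-06"
```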

iex(14)> String.replace("1.456e-06", ~r/^\d+[^\.](\d+)e/, "\\1.0e")
"1.456e-06"
iex(15)> String.replace("2e-08", ~r/^\d+[^\.](\d+)e/, "\\1.0e")
"2e-08"

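Here I used a negated character class, [^\.], hoping to skip numbers that contain a dot. That leaves 1.456e-06 alone, but it also breaks the case we had already fixed: the pattern needs at least one digit, one non-dot character, and at least one more digit before the e, so a single-digit mantissa like 2e-08 no longer matches and is returned unchanged. On a side note, always keep regression tests around to catch this kind of bug.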
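A regression check for this does not need much. Below is a minimal sketch with a hypothetical RegressionCheck module: it pairs each sample value with the string we expect after the .0 rewrite and lists every case a candidate rewrite gets wrong. Run against the negated-character-class pattern, it flags 2e-08.

```elixir
defmodule RegressionCheck do
  # Hypothetical regression table: each sample value from the CSV field,
  # paired with the string we expect after the ".0" rewrite.
  @cases [
    {"0.678", "0.678"},
    {"0.003782", "0.003782"},
    {"2e-08", "2.0e-08"},
    {"1.456e-06", "1.456e-06"}
  ]

  # Runs `rewrite` on every input and returns {input, expected, actual}
  # for each case it gets wrong; an empty list means every case passes.
  def failures(rewrite) do
    @cases
    |> Enum.map(fn {input, expected} -> {input, expected, rewrite.(input)} end)
    |> Enum.reject(fn {_input, expected, actual} -> actual == expected end)
  end
end

# The negated-character-class attempt from above.
char_class = fn s -> String.replace(s, ~r/^\d+[^\.](\d+)e/, "\\1.0e") end

RegressionCheck.failures(char_class)
# => [{"2e-08", "2.0e-08", "2e-08"}]
```

Passing the earlier unanchored rewrite instead would flag 1.456e-06, so one table catches both mistakes.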

I banged my head against this for a while; regular expressions are a tough beast to tame. Then I thought of lookarounds, and specifically, for this case, a negative look-behind assertion.
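Here is what a negative look-behind does in isolation: (?<!X) succeeds at a position only when the text immediately before that position does not match X, and it consumes no characters itself. The example below is illustrative only and unrelated to the CSV data; it uses Regex.scan/2 to pick out numbers that are not preceded by a literal dot.

```elixir
# \b stops a match from starting in the middle of a number, and
# (?<!\.) rejects any start position whose preceding character is ".".
Regex.scan(~r/(?<!\.)\b\d+/, "1.5 and 7")
# => [["1"], ["7"]]
```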

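So I need to look behind the digits and make sure they are not preceded by a dot. Two details matter here. First, an unescaped . in a regex matches any character, so (?<!.) would mean "not preceded by anything at all", which only holds at the very start of the string; the dot has to be escaped as \. to mean a literal dot. Second, the look-behind must also reject a preceding digit: with (?<!\.) alone, the engine simply starts the match one digit later and finds 56e in 1.456e-06, where the character before is 4. Putting both into a character class gives (?<![\d.]).

iex(16)> String.replace("2e-08", ~r/(?<![\d.])(\d+)e/, "\\1.0e")
"2.0e-08"
iex(17)> String.replace("2.56e-08", ~r/(?<![\d.])(\d+)e/, "\\1.0e")
"2.56e-08"
iex(18)> String.replace("0.0894", ~r/(?<![\d.])(\d+)e/, "\\1.0e")
"0.0894"
iex(19)> String.replace("1.456e-06", ~r/(?<![\d.])(\d+)e/, "\\1.0e")
"1.456e-06"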

Works great!
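Putting it together, a parser for the CSV field might look like the sketch below. The module and function name (CsvFloat.parse/1) are hypothetical, and it assumes the field has already been extracted from the CSV row and trimmed of whitespace.

```elixir
defmodule CsvFloat do
  # Hypothetical parser for one already-extracted CSV field.
  # (?<![\d.]) rejects a start position preceded by a digit or a literal
  # dot, so only an integer mantissa such as the "2" in "2e-08" matches.
  def parse(field) do
    field
    |> String.replace(~r/(?<![\d.])(\d+)e/, "\\1.0e")
    |> String.to_float()
  end
end

Enum.map(["0.678", "0.003782", "2e-08", "1.456e-06"], &CsvFloat.parse/1)
```

Because the look-behind consumes nothing, the replacement only has to rebuild the captured digits. Note that String.to_float/1 still raises on a bare integer such as "3", so if the field can contain those, that case needs its own handling.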