Negative look-behind assertions
Regular expressions are very powerful. Here are some typical use cases:
- Email validation
- Password validation
- Searching for a pattern in a string
- Searching and capturing matches in a string
In my current project, I needed to parse a CSV file in which one field held a float value. This value could appear as any of the following:
- 0.678
- 0.003782
- 2e-08
- 1.456e-06
So the value can be either a plain decimal float or a number in scientific notation. I’m going to use Elixir as the implementation language. We first need to parse the string into a float, and String.to_float/1 is the obvious candidate. Fire up iex.
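A minimal sketch of that first attempt, written as a script rather than an iex transcript. The rescue clause is an addition so that the snippet runs to the end instead of stopping at the error:

```elixir
# A plain decimal string parses fine.
plain = String.to_float("0.678")

# String.to_float/1 raises ArgumentError when the text has no decimal point.
result =
  try do
    String.to_float("2e-08")
  rescue
    ArgumentError -> :error
  end

IO.inspect({plain, result})
```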
Alright, so String.to_float/1 does not accept 2e-08 as it stands. What about 2.0e-08?
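Presumably the call was along these lines, with a decimal part added before the e:

```elixir
# Same value, but the number before "e" now has a decimal part.
parsed = String.to_float("2.0e-08")
IO.inspect(parsed)
```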
That’s better! So Elixir expects the number before the e to include a decimal part. Let’s use a regex to fix this.
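One way to write that replacement, as a sketch: the pattern and the use of String.replace/3 are my assumptions about the shape of the fix, not necessarily the exact call used at the time.

```elixir
# (\d)e captures the digit right before "e"; "\\1.0e" writes the digit
# back, followed by ".0" and the "e".
fixed = String.replace("2e-08", ~r/(\d)e/, "\\1.0e")
IO.inspect(fixed)
```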
So we replace a digit followed by e with the same digit, followed by .0 and the e. Now let’s pipe this onwards.
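Piped together, and assuming the same hypothetical pattern as above, the two steps could look like this:

```elixir
value =
  "2e-08"
  # Give the number before "e" a decimal part...
  |> String.replace(~r/(\d)e/, "\\1.0e")
  # ...so that String.to_float/1 accepts it.
  |> String.to_float()

IO.inspect(value)
```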
Cool.
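Now try the same replacement on a value that already has a decimal part. This sketch again assumes the simple digit-followed-by-e pattern:

```elixir
# The regex still matches "6e", so a second ".0" is inserted.
broken = String.replace("1.456e-06", ~r/(\d)e/, "\\1.0e")
IO.inspect(broken)
```

The result has two decimal points, so String.to_float/1 would reject it.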
Hmm… For a value like 1.456e-06 the regex still matches the 6e, so the digit before e gets replaced and we end up with a second decimal point. We don’t want to touch numbers that already have a decimal part.
We use a negated character class so that the string is not replaced if it contains a dot. However, this breaks our earlier string, 2e-08. On a side note, make sure you always have regression tests to catch this kind of bug.
So I banged my head for a while, as regular expressions are a tough beast to tame. Then I thought about using lookarounds, and specifically, for this case, a negative look-behind assertion.
So I need to look behind and make sure that there is no dot.
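Here is a sketch of how that can look. The module name, the function name and the exact pattern are my own choices for illustration. Note that a look-behind for a dot alone only inspects the single character before the match, which in 1.456e-06 is a digit, so this version also rules out a preceding digit and matches the whole run of digits before e.

```elixir
defmodule FloatField do
  # Hypothetical helper for illustration.
  def parse(field) do
    field
    # (?<![.\d]) is a negative look-behind: the digits before "e" must not
    # be preceded by a dot or another digit, so a number that already has
    # a decimal part is left untouched.
    |> String.replace(~r/(?<![.\d])(\d+)e/, "\\1.0e")
    |> String.to_float()
  end
end

# The four sample values from the CSV field double as a regression check.
values = Enum.map(["0.678", "0.003782", "2e-08", "1.456e-06"], &FloatField.parse/1)
IO.inspect(values)
```

Unlike a character class, the look-behind consumes no characters, so 2e-08 still matches even though nothing comes before the 2.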
Works great!