
Python regex: Negative look-ahead with selection of different length strings

I’m searching for a Regex pattern in Python 3.8 that matches a sequence of strings if and only if it’s not followed by a selection of other strings. For example, I want to match the pattern “<fruit> and <fruit>” only if the second fruit isn’t followed by “ice” or “juice”:

pat = re.compile(r"(?P<before>apples|bananas)\s+and\s+(?P<after>oranges|lemons)(?!\s*juice|ice)")

However, this pattern has problems if the strings in the negative look-ahead aren’t all the same length:

>>> pat.search("apples and oranges juice")  # doesn't match -> ok

>>> pat.search("apples and oranges ice")  # matches -> not ok
<re.Match object; span=(0, 18), match='apples and oranges'>

I ultimately want to use sub to replace the matched sequence only if it’s not followed by the selection of strings mentioned above. Is it possible to change the behavior of the negative look-ahead to match as much as possible?


Answer

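You only need a “negative look-ahead assertion”, but the alternatives inside it have to be grouped. In a regex, | binds more loosely than everything else, so the look-ahead in your pattern reads as “optional whitespace then juice” or “ice”, and the optional whitespace never applies to “ice”. The differing string lengths are not the issue. The general form is: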

r"what you want(?!(what you do not want to follow))"
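To see why the grouping matters, here is a minimal sketch that compares an ungrouped and a grouped look-ahead on the same input. The names `ungrouped` and `grouped` are only illustrative, and the pattern is cut down to the second fruit to keep the comparison short.

```python
import re

# Ungrouped: the alternation splits the look-ahead into `\s*juice` OR `ice`,
# so the optional whitespace only applies to "juice".
ungrouped = re.compile(r"oranges(?!\s*juice|ice)")

# Grouped: `\s*` now applies to both alternatives.
grouped = re.compile(r"oranges(?!\s*(?:juice|ice))")

text = "oranges ice"
print(ungrouped.search(text) is not None)  # True: " ice" is not rejected
print(grouped.search(text) is not None)    # False: " ice" is rejected
```

The non-capturing group `(?:...)` is a choice made here to avoid adding extra group numbers; a plain capturing group rejects the same inputs.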

The following pattern does the job you want.

import re

pat = re.compile(r"(apples|bananas)\s+and\s+(oranges|lemons)(?!(\s*(juice|ice)))")

print(pat.search("apples and oranges"))       # Matches
print(pat.search("apples and oranges water")) # Matches

print(pat.search("apples and oranges ice"))   # Does not match
print(pat.search("bananas and lemons juice")) # Does not match
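
Since the goal is to use sub, the same idea can be sketched with the named groups from the question. The replacement used here (swapping the two fruits) and the helper name `swap` are assumptions made purely for illustration, because the question does not say what the replacement should be.

```python
import re

# Named groups as in the question; the look-ahead alternatives are grouped
# so that the optional whitespace applies to both "juice" and "ice".
pat = re.compile(
    r"(?P<before>apples|bananas)\s+and\s+(?P<after>oranges|lemons)"
    r"(?!\s*(?:juice|ice))"
)

def swap(text):
    # Hypothetical replacement: exchange the two fruits.
    return pat.sub(r"\g<after> and \g<before>", text)

print(swap("apples and oranges water"))  # oranges and apples water
print(swap("apples and oranges ice"))    # apples and oranges ice (unchanged)
```

Because a look-ahead consumes no characters, whatever follows the match (“ water” above) is left in place by sub.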
User contributions licensed under: CC BY-SA