I’m searching for a Regex pattern in Python 3.8 that matches a sequence of strings if and only if it’s not followed by a selection of other strings. For example, I want to match the pattern “<fruit> and <fruit>” only if the second fruit isn’t followed by “ice” or “juice”:
pat = re.compile(r"(?P<before>apples|bananas)\s+and\s+(?P<after>oranges|lemons)(?!\s*juice|ice)")
However, this pattern has problems if the strings in the negative look-ahead aren’t all the same length:
>>> pat.search("apples and oranges juice")  # doesn't match -> ok
>>> pat.search("apples and oranges ice")  # matches -> not ok
<re.Match object; span=(0, 18), match='apples and oranges'>
I ultimately want to use sub to replace the matched sequence only if it’s not followed by one of the strings mentioned above. Is it possible to change the behavior of the negative look-ahead to match as much as possible?
Answer
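You only need a negative look-ahead assertion, but the alternatives inside it have to be grouped. In (?!\s*juice|ice), the | separates \s*juice from ice, so the optional whitespace applies only to juice and “ ice” slips through; the differing lengths are not the cause. The general form is: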
r"what you want(?!(what you do not want to follow))"
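To see how the scope of | inside the look-ahead changes the result, here is a minimal sketch that compares an ungrouped and a grouped look-ahead on a shortened pattern. The names ungrouped and grouped are made up for this illustration, and the non-capturing group (?:...) is one possible way to group the alternatives, not the only one.

```python
import re

# Hypothetical names for illustration; only the look-aheads differ.
# Ungrouped: the alternatives are "\s*juice" and "ice" (no whitespace before "ice").
ungrouped = re.compile(r"oranges(?!\s*juice|ice)")
# Grouped: "\s*" applies to both "juice" and "ice".
grouped = re.compile(r"oranges(?!\s*(?:juice|ice))")

for text in ("oranges juice", "oranges ice", "orangesice", "oranges water"):
    print(
        f"{text!r}: "
        f"ungrouped={bool(ungrouped.search(text))}, "
        f"grouped={bool(grouped.search(text))}"
    )
```

Only "oranges ice" should differ between the two: the ungrouped version matches it, the grouped version rejects it.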
The following pattern does the job you want.
import re

pat = re.compile(r"(apples|bananas)\s+and\s+(oranges|lemons)(?!(\s*(juice|ice)))")

print(pat.search("apples and oranges"))  # Matches
print(pat.search("apples and oranges water"))  # Matches
print(pat.search("apples and oranges ice"))  # Does not match
print(pat.search("bananas and lemons juice"))  # Does not match
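Since the goal is to use sub, here is a rough sketch of how the grouped look-ahead might be combined with the named groups from the question. The swap function and the sample text are assumptions made up for this example; any replacement string or callable would work in its place.

```python
import re

# Named groups as in the question, with the look-ahead alternatives grouped.
pat = re.compile(
    r"(?P<before>apples|bananas)\s+and\s+(?P<after>oranges|lemons)"
    r"(?!\s*(?:juice|ice))"
)


def swap(match):
    # Hypothetical replacement: swap the two fruits.
    return f"{match.group('after')} and {match.group('before')}"


# Made-up sample text: only the first pair is not followed by "juice" or "ice".
text = "apples and oranges, bananas and lemons juice, apples and lemons ice"
result = pat.sub(swap, text)
print(result)
# -> oranges and apples, bananas and lemons juice, apples and lemons ice
```

Because a look-ahead consumes no characters, the text after the match (“ juice”, “ ice”, or anything else) is never part of what sub replaces.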