I need to match all strings that contain one word of a list, but only if that word is not immediately preceded by another specific word. I have this regex:
.*(?<!forbidden)\b(word1|word2|word3)\b.*
that still matches a sentence like `hello forbidden word1` because `forbidden` is matched by `.*`. But if I remove the `.*`, I no longer match strings like `hello word1`, which I want to match.
Note that I do want to match a string like `forbidden hello word1`.
Could you suggest how to fix this problem?
Answer
Have a look at word boundaries: `\bword` can never touch a word character to the left.
To disallow `(word1|word2|word3)` if it is preceded by `forbidden` and
- one `\W` (non-word character): `^.*?\b(?<!forbidden\W)(word1|word2|word3)\b.*`
- multiple `\W`: Lookbehinds need to be of fixed length in Python's `re` module. To get around this, an idea is to use `\W*` outside the lookbehind, preceded by `(?<!\W)` to set the position from which to look behind: `^.*?(?<!forbidden)(?<!\W)\W*\b(word1|word2|word3)\b.*` (in the Regex101 multiline demo I used `[^\w\n]` instead of `\W` so that the match does not skip over lines).

Certainly a variable-width lookbehind, such as `(?<!forbidden\W+)`, would be more comfortable. The PyPI regex module (`import regex as re`) supports lookbehind of variable length.
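As a rough check of the two fixed-width patterns, here is a minimal sketch using only the standard `re` module. The names `ONE_SEP` and `MANY_SEP` and the sample strings are made up for illustration; the last sample has two spaces after `forbidden`, which is the case where the two patterns differ.

```python
import re

# Rejects a listed word preceded by "forbidden" plus exactly one non-word character.
ONE_SEP = re.compile(r'^.*?\b(?<!forbidden\W)(word1|word2|word3)\b.*')

# Rejects a listed word preceded by "forbidden" plus any run of non-word characters:
# (?<!\W) pins the position to the end of the previous word, \W* then consumes the gap.
MANY_SEP = re.compile(r'^.*?(?<!forbidden)(?<!\W)\W*\b(word1|word2|word3)\b.*')

samples = [
    "hello word1",             # should match
    "forbidden hello word1",   # should match
    "hello forbidden word1",   # should not match
    "hello forbidden  word1",  # two spaces: only MANY_SEP rejects it
]

for text in samples:
    one = bool(ONE_SEP.match(text))
    many = bool(MANY_SEP.match(text))
    print(f"{text!r}: one-separator={one}, many-separators={many}")
```

With a single separator both patterns agree. The second pattern is only needed when several non-word characters may sit between `forbidden` and the listed word.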
Note: If you do not capture anything, a `(?:` non-capturing group can be used as well.
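To illustrate the difference, the following sketch compares the capturing and non-capturing forms of the single-separator pattern. The variable names and the test string are assumptions chosen for the example.

```python
import re

# Same pattern twice; only the group type around the word list differs.
capturing = re.compile(r'^.*?\b(?<!forbidden\W)(word1|word2|word3)\b.*')
non_capturing = re.compile(r'^.*?\b(?<!forbidden\W)(?:word1|word2|word3)\b.*')

m1 = capturing.match("hello word2")
m2 = non_capturing.match("hello word2")

print(m1.groups())  # ('word2',)
print(m2.groups())  # ()
```

Both forms accept and reject the same strings. The non-capturing one simply does not store the matched word.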