
Regex match word not immediately preceded by another word but possibly preceded by that word before

I need to match all strings that contain one word of a list, but only if that word is not immediately preceded by another specific word. I have this regex:

.*(?<!forbidden)\b(word1|word2|word3)\b.*

that still matches a sentence like hello forbidden word1 because forbidden is matched by .*. But if I remove the .*, I no longer match strings like hello word1, which I do want to match.

Note that I want to match a string like forbidden hello word1.

Could you suggest how to fix this problem?


Answer

Have a look at word boundaries: \bword can never touch a word character to the left, so a lookbehind for forbidden placed directly at that boundary can never fail.
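A quick check of that point, as a minimal sketch using Python's built-in re module (the sample sentences come from the question; the variable names are only illustrative). Because the word boundary already requires a non-word character or the start of the string on the left, the pattern from the question accepts all three sentences:

```python
import re

# Pattern from the question: the lookbehind sits directly at the word boundary.
original = re.compile(r".*(?<!forbidden)\b(word1|word2|word3)\b.*")

samples = ["hello word1", "forbidden hello word1", "hello forbidden word1"]

# The character left of the boundary is a space, never the final "n" of
# "forbidden", so the negative lookbehind is always satisfied.
results = {s: bool(original.match(s)) for s in samples}
for sentence, matched in results.items():
    print(f"{sentence!r}: {matched}")
```

All three sentences match, including hello forbidden word1, which is the unwanted case.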

To disallow (word1|word2|word3) if preceded by forbidden and

  • one \W (non-word character)

    ^.*?\b(?<!forbidden\W)(word1|word2|word3)\b.*
    

    See this demo at regex101

  • multiple \W

    Lookbehinds need to be of fixed length in Python's re module. To get around this, one idea is to put \W* outside the lookbehind, preceded by (?<!\W) to set the position from which to look behind.

    ^.*?(?<!forbidden)(?<!\W)\W*\b(word1|word2|word3)\b.*
    

    Regex101 demo (in the multiline demo I used [^\w\n] instead of \W so as not to skip over lines)

    Certainly, a variable-width lookbehind such as (?<!forbidden\W+) would be more convenient. The PyPI regex module (import regex as re) supports lookbehinds of variable length: see this demo
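The two fixed-width patterns above can be tried with Python's built-in re module. This is a minimal sketch; the names ONE_SEP and MULTI_SEP and the fourth sample sentence are assumptions added for illustration.

```python
import re

# One \W between "forbidden" and the word: a fixed-width lookbehind suffices.
ONE_SEP = re.compile(r"^.*?\b(?<!forbidden\W)(word1|word2|word3)\b.*")

# Any number of \W: (?<!\W) pins the position right after the previous word
# (or at the start of the string), (?<!forbidden) inspects that word, and
# \W* then skips the gap up to the word boundary.
MULTI_SEP = re.compile(r"^.*?(?<!forbidden)(?<!\W)\W*\b(word1|word2|word3)\b.*")

samples = [
    "hello word1",
    "forbidden hello word1",
    "hello forbidden word1",
    "hello forbidden  word1",  # two spaces between the words
]

one_sep = {s: bool(ONE_SEP.match(s)) for s in samples}
multi_sep = {s: bool(MULTI_SEP.match(s)) for s in samples}
for s in samples:
    print(f"{s!r}: one={one_sep[s]} multi={multi_sep[s]}")
```

With these inputs both patterns accept the first two sentences and reject the third; only the second pattern also rejects the sentence where two spaces separate forbidden from word1.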

Note: If you do not need to capture anything, a non-capturing group (?:...) can be used as well.
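As a small illustration of that note (a sketch with Python's re module; the variable names are hypothetical), the multiple-separator pattern written with a non-capturing group still matches but exposes no groups:

```python
import re

# Same idea as the multiple-\W variant, with (?:...) instead of (...).
pattern = re.compile(r"^.*?(?<!forbidden)(?<!\W)\W*\b(?:word1|word2|word3)\b.*")

match = pattern.match("forbidden hello word1")
matched = match is not None
groups = match.groups() if match else None
print(matched, groups)
```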

User contributions licensed under: CC BY-SA