Python Regex \pL matching issues

Question

I'm trying to match a list of keywords I have, taking care to include all Latin characters (e.g. accented ones). Here's an example gives: Which looks correct. However: gives: This is wrong, as I expect a match for "u blah". I've also tried to use Python's built-in re module, but I cannot get it to work with \pL or \p{Latin}.

Accepted Answer

The problem with your ((?!\pL)|^)blah((?!\pL)|$) regex is that the ((?!\pL)|^) group contains two alternatives, and the first one always fails. Why? Because (?!\pL) is a negative lookahead that fails the match if the next char is a letter, and the next char to match is the b in blah. Only ^ can ever succeed, so your regex is equivalent to ^blah((?!\pL)|$) and only matches at the start of the string.

Note that (?!\pL) already matches a position at the end of the string, so ((?!\pL)|$) = (?!\pL).

You should use

    (?<!\pL)blah(?!\pL)

See the regex demo (switched to PCRE for the demo purposes).

Note that the re-compatible version of the regex is

    (?<![^\W\d_])blah(?![^\W\d_])

See the regex demo.
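To make the difference concrete, here is a minimal sketch using only the built-in re module. It assumes Python 3 str patterns, where [^\W\d_] roughly approximates \pL (any Unicode letter, not only Latin ones). The ORIGINAL pattern is the question's regex with \pL swapped for [^\W\d_] so that re accepts it. Apart from "u blah", the sample strings and the helper name found are made up for illustration.

```python
import re

# re-compatible fix from the answer: "blah" neither preceded nor followed by a letter.
FIXED = re.compile(r"(?<![^\W\d_])blah(?![^\W\d_])")

# Shape of the question's regex, with \pL replaced by [^\W\d_] (an assumption
# made here so the pattern compiles under the built-in re module).
ORIGINAL = re.compile(r"((?![^\W\d_])|^)blah((?![^\W\d_])|$)")

# Illustrative inputs; only "u blah" comes from the question.
samples = ["blah u", "u blah", "ublah", "blahé", "éblah", "blah1"]


def found(pattern, text):
    """Return True if the pattern matches anywhere in text."""
    return pattern.search(text) is not None


# Map each sample to (matched by ORIGINAL, matched by FIXED).
results = {s: (found(ORIGINAL, s), found(FIXED, s)) for s in samples}

for text, (orig_hit, fixed_hit) in results.items():
    print(f"{text!r:10} original={orig_hit!s:5} fixed={fixed_hit}")
```

The lookahead-based original only succeeds when blah sits at the very start of the string, while the lookbehind version also finds it after a space, as in "u blah". A digit next to the keyword (as in "blah1") does not block the match, because the character class covers letters only.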

Answer