
regex matches string despite negative lookahead

I want to match the first 2 words in a string, except when the second one is “feat”, then I just want to match the first word.

My plan, (\w+(?: \w+))(?!feat), does not work: “feat” gets matched every time. I tried variations of the same pattern, but to no avail.

Here’s an example string: “Technotronic feat Ya Kid K”

Thank you for your help!

Edit:

this is the string where it flips: “Technotronic feat Ya Kid K”

this is the code that should cut the string:

pattern = re.compile("^\w+(?: (?!feat\b)\w+)?")

def cut(string):
    str = pattern.search(string).group(0)

    return str


Answer

You can use

\w+(?: (?!feat\b)\w+)?
\w+(?:\s+(?!feat\b)\w+)?

See the regex demo.

The point is that the restriction on what the second \w+ can match has to sit right before that \w+, because a lookahead checks the text immediately to the right of the current position. To still allow words that merely start with feat, you need a word boundary after feat in the lookahead.
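To make the difference concrete, here is a minimal sketch comparing the pattern from the question with the corrected one. The variable names `trailing` and `leading` are only illustrative.

```python
import re

# The question's attempt: the lookahead sits after both words, so it only
# checks what follows the second word, not the second word itself.
trailing = re.compile(r"(\w+(?: \w+))(?!feat)")

# The lookahead moved to just before the second \w+, with a word boundary.
leading = re.compile(r"\w+(?: (?!feat\b)\w+)?")

text = "Technotronic feat Ya Kid K"
print(trailing.search(text).group(0))  # => Technotronic feat
print(leading.search(text).group(0))   # => Technotronic
```

In the first pattern the lookahead is evaluated at the space after feat, where the next characters are " Ya", so it always succeeds.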

Regex details:

  • \w+ – one or more word chars
  • (?:\s+(?!feat\b)\w+)? – an optional non-capturing group:
    • \s+ – one or more whitespaces
    • (?!feat\b) – immediately to the right, there cannot be a whole word feat (so, the subsequent \w+ won’t match feat but will match feature)
    • \w+ – one or more word chars.
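The same breakdown can be written directly into the pattern with re.VERBOSE, which ignores whitespace and allows comments inside the regex. This is just an alternative layout of the second pattern above, not a different technique.

```python
import re

pattern = re.compile(r"""
    \w+              # first word: one or more word chars
    (?:              # optional non-capturing group for the second word
        \s+          # one or more whitespace chars
        (?!feat\b)   # the next word must not be the whole word feat
        \w+          # second word: one or more word chars
    )?
""", re.VERBOSE)

print(pattern.match("Technotronic feat Ya Kid K").group(0))     # => Technotronic
print(pattern.match("Technotronic feature Ya Kid K").group(0))  # => Technotronic feature
```

Because verbose mode drops literal spaces from the pattern, the whitespace between the words has to be spelled \s+ (or an escaped space) here.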

See the Python demo:

import re
pattern = re.compile(r"^\w+(?: (?!feat\b)\w+)?")

def cut(text):
    m = pattern.search(text)
    if m:
        return m.group(0)
    return text  # no match: return the input unchanged

print(cut("Technotronic feat Ya Kid K"))    # => Technotronic
print(cut("Technotronic feature Ya Kid K")) # => Technotronic feature
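
Note the r prefix on the pattern. The code in the question compiles its pattern from a regular string literal, and that is one likely reason it still matched feat: in a regular Python string, \b is a backspace character, not a regex word boundary. A small sketch of the difference (the names `plain` and `raw` are illustrative, and \w is written as \\w in the regular string only to avoid an invalid-escape warning):

```python
import re

# Regular string: "\b" becomes a backspace character (\x08), so the lookahead
# forbids "feat" followed by a backspace, which never occurs in the input.
plain = re.compile("^\\w+(?: (?!feat\b)\\w+)?")

# Raw string: \b reaches the regex engine as a word boundary.
raw = re.compile(r"^\w+(?: (?!feat\b)\w+)?")

text = "Technotronic feat Ya Kid K"
print(plain.search(text).group(0))  # => Technotronic feat
print(raw.search(text).group(0))    # => Technotronic
```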