regex to catch text until a signal word occurs

I’m trying to create a regex that captures text until a signal word occurs. As long as the signal word is not the first word, my solution works fine. I’m using Python with the regex module, and the code is

new_text = regex.sub(r"^(?>.*\s)*?(?=SIGNALWORD)", "", text)

With this, the input

blabla blabla blabla
blabla blabla
SIGNALWORD blublub blublub
blublub blublub blublub
SIGNALWORD blabla blabla 
blabla blabla

becomes

SIGNALWORD blublub blublub
blublub blublub blublub
SIGNALWORD blabla blabla 
blabla blabla

But if the signal word is the first word, it does not work properly: the input

SIGNALWORD blublub blublub
blublub blublub blublub
SIGNALWORD blabla blabla 
blabla blabla

becomes

SIGNALWORD blabla blabla 
blabla blabla

I want it to do nothing if the signal word is the first word. I’ve experimented with the regex.DOTALL and regex.MULTILINE flags, but without success.

Answer

You might use a negative lookahead, (?!SIGNALWORD), at the start of the pattern to assert that the string does not begin with SIGNALWORD:

import regex

text = ("blabla blabla blabla\n"
        "blabla blabla\n"
        "SIGNALWORD blublub blublub\n"
        "blublub blublub blublub\n"
        "SIGNALWORD blabla blabla \n"
        "blabla blabla")

# (?!SIGNALWORD) prevents any match when the text already starts with the signal word
new_text = regex.sub(r"^(?!SIGNALWORD)(?>.*\s)*?(?=SIGNALWORD)", "", text)
print(new_text)

See the outcome of the first Python demo and the second Python demo.
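To check both cases side by side, here is a minimal sketch that wraps the idea in a small helper. It makes two assumptions that are not in the code above: it uses the standard-library re module instead of regex, and it swaps the atomic group (?>.*\s) for a plain non-capturing group (?:.*\n), which should behave the same as long as the signal word sits at the start of a line. The helper name strip_before_signal is made up for illustration. My reading of the original failure is that the pattern first matches the empty string at position 0, after which sub retries at the same position for a non-empty match and removes everything up to the second SIGNALWORD; the negative lookahead rules out any match at position 0 in that case.

```python
import re

# Assumption: the signal word, when present, is at the start of a line.
# (?:.*\n) is a stand-in for the atomic group (?>.*\s) so that this
# sketch runs on the standard-library re module.
PATTERN = re.compile(r"^(?!SIGNALWORD)(?:.*\n)*?(?=SIGNALWORD)")


def strip_before_signal(text: str) -> str:
    """Drop the lines before the first line that starts with SIGNALWORD.

    Text that already starts with SIGNALWORD, or that has no line
    starting with it, is returned unchanged.
    """
    return PATTERN.sub("", text)


leading_text = (
    "blabla blabla blabla\n"
    "blabla blabla\n"
    "SIGNALWORD blublub blublub\n"
    "blublub blublub blublub\n"
    "SIGNALWORD blabla blabla \n"
    "blabla blabla"
)

signal_first = (
    "SIGNALWORD blublub blublub\n"
    "blublub blublub blublub\n"
    "SIGNALWORD blabla blabla \n"
    "blabla blabla"
)

# Leading lines are removed up to the first SIGNALWORD.
print(strip_before_signal(leading_text))

# Text that already starts with SIGNALWORD is left alone.
print(strip_before_signal(signal_first) == signal_first)
```

If the signal word can also occur in the middle of a line, this simplified group is not enough and the pattern would need to be adapted.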