I'm trying to write a regex that matches all text up to a signal word. As long as the signal word is not the first word, my solution works fine. I'm using Python with the regex module, and the code is
new_text = regex.sub(r"^(?>.*\s)*?(?=SIGNALWORD)", "", text)
With this pattern,
blabla blabla blabla blabla blabla SIGNALWORD blublub blublub blublub blublub blublub SIGNALWORD blabla blabla blabla blabla
becomes
SIGNALWORD blublub blublub blublub blublub blublub SIGNALWORD blabla blabla blabla blabla
But if the signal word is the first word, it does not work properly. In that case,
SIGNALWORD blublub blublub blublub blublub blublub SIGNALWORD blabla blabla blabla blabla
becomes
SIGNALWORD blabla blabla blabla blabla
I want it to do nothing if the signal word is the first word. I've experimented with the regex.DOTALL and regex.MULTILINE flags, but without success.
Answer
You can add a negative lookahead (?!SIGNALWORD) right after the ^ anchor to assert that the string does not start with SIGNALWORD:
import regex

text = ("blabla blabla blabla\n"
        "blabla blabla\n"
        "SIGNALWORD blublub blublub\n"
        "blublub blublub blublub\n"
        "SIGNALWORD blabla blabla \n"
        "blabla blabla")

# (?!SIGNALWORD) blocks any match when the text already starts with the signal word
new_text = regex.sub(r"^(?!SIGNALWORD)(?>.*\s)*?(?=SIGNALWORD)", "", text)
print(new_text)
See the outcome of the first Python demo and the second Python demo.
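If the third-party regex module is not available, the same idea can be sketched with the standard library's re module. This is only an illustration under a few assumptions: the input is line-separated, each loop iteration consumes a whole line ending in \n (which stands in for the atomic group, since backtracking inside `.*\n` cannot end anywhere but at a line break), and the helper name strip_before_signal is made up for this example.

```python
import re


def strip_before_signal(text: str, signal: str = "SIGNALWORD") -> str:
    """Drop the leading lines that come before the first line starting with `signal`."""
    sig = re.escape(signal)
    # ^(?!sig)    : refuse to match if the text already starts with the signal word
    # (?:.*\n)*?  : lazily consume whole lines, one at a time
    # (?=sig)     : stop as soon as the next line starts with the signal word
    pattern = rf"^(?!{sig})(?:.*\n)*?(?={sig})"
    # count=1: only the leading block is ever removed
    return re.sub(pattern, "", text, count=1)


middle = "blabla blabla\nblabla\nSIGNALWORD blublub\nblublub\nSIGNALWORD blabla\nblabla"
first = "SIGNALWORD blublub\nblublub\nSIGNALWORD blabla\nblabla"

print(strip_before_signal(middle) == first)  # True: leading lines removed
print(strip_before_signal(first) == first)   # True: text left untouched
```

Like the pattern above, this only stops at a signal word that begins a line. Text that contains no such line is returned unchanged.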