I’m trying to create a regex that matches text up to the point where a signal word occurs. As long as the signal word is not the first word, my solution works fine. I’m using Python with the regex module, and the code is:
    new_text = regex.sub(r"^(?>.*\s)*?(?=SIGNALWORD)", "", text)
With this pattern, the text

    blabla blabla blabla
    blabla blabla
    SIGNALWORD blublub blublub
    blublub blublub blublub
    SIGNALWORD blabla blabla
    blabla blabla

becomes

    SIGNALWORD blublub blublub
    blublub blublub blublub
    SIGNALWORD blabla blabla
    blabla blabla

But if the signal word is the first word, it does not work properly. The text

    SIGNALWORD blublub blublub
    blublub blublub blublub
    SIGNALWORD blabla blabla
    blabla blabla

becomes

    SIGNALWORD blabla blabla
    blabla blabla

I want it to do nothing if the signal word is the first word. I’ve experimented with the regex.DOTALL and regex.MULTILINE flags, but could not get it to work.
Answer
You might use a negative lookahead, (?!SIGNALWORD), to assert that the string does not start with SIGNALWORD:

    import regex

    text = ("blabla blabla blabla\n"
            "blabla blabla\n"
            "SIGNALWORD blublub blublub\n"
            "blublub blublub blublub\n"
            "SIGNALWORD blabla blabla\n"
            "blabla blabla")

    # (?!SIGNALWORD) rejects a match when the text already starts with the signal word;
    # otherwise whole lines are consumed lazily until SIGNALWORD is next.
    new_text = regex.sub(r"^(?!SIGNALWORD)(?>.*\s)*?(?=SIGNALWORD)", "", text)
    print(new_text)

See the outcome of the first Python demo and the second Python demo.
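
To check both cases from the question side by side, here is a minimal sketch rather than the answer's exact code. It assumes the standard-library re module instead of regex, so the atomic group (?>.*\s) is swapped for a plain non-capturing group (?:.*\n). Because of that swap, it only recognizes the signal word at the start of a line. The helper name strip_before_signal and the variable names are hypothetical, chosen only for illustration.

```python
import re

SIGNAL = "SIGNALWORD"
_sig = re.escape(SIGNAL)

# Assumption: stdlib `re` with a non-capturing group in place of the atomic
# group used with the `regex` module. "." never matches "\n", so every
# (?:.*\n) iteration consumes exactly one complete line.
PATTERN = re.compile(rf"\A(?!{_sig})(?:.*\n)*?(?={_sig})")


def strip_before_signal(text: str) -> str:
    """Hypothetical helper: drop all lines before the first line starting with SIGNAL.

    The text is returned unchanged if it already starts with SIGNAL,
    or if no line starts with SIGNAL.
    """
    return PATTERN.sub("", text)


# Second example from the question: the signal word comes first.
signal_first = ("SIGNALWORD blublub blublub\n"
                "blublub blublub blublub\n"
                "SIGNALWORD blabla blabla\n"
                "blabla blabla")

# First example from the question: two leading lines before the signal word.
leading_text = ("blabla blabla blabla\n"
                "blabla blabla\n") + signal_first

# The leading lines are removed ...
print(strip_before_signal(leading_text) == signal_first)
# ... and text that starts with the signal word is left alone.
print(strip_before_signal(signal_first) == signal_first)
```

The negative lookahead does the work here: when the text starts with the signal word, the pattern cannot match at all, so the substitution never gets a chance to run on to the second occurrence.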