I have a text like this:
text = "Text1.
Textt « text2 »
Some other text"
I want a regex that deletes the text inside the « » quotes and the text before it, back to the preceding dot.
So the output should look like this:
text = "Text1.
Some other text"
The code I am stuck on:
text = re.sub(r'\s*.*?»', '', text)
What the code actually does is delete more than expected. Here's an example:
text="Text1.
Textt « text2 »
Some other text
Textt « text3 »
other text"
The output I get is like this:
text="Text1.
other text"
Answer
You can use
re.sub(r'(\.)[^.]*«[^«»]*»', r'\1', text)
See the regex demo.
(\.) – Group 1 (\1 in the replacement refers to this captured value): a dot
[^.]* – zero or more chars other than a .
«[^«»]*» – a substring between « and » without other « and » inside.
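To see how the three parts line up on the question's first sample, here is a minimal sketch that inspects the match object directly. The names `pattern`, `sample` and `m` are illustrative and not from the question.

```python
import re

# Illustrative names; the pattern is the one described above.
pattern = re.compile(r'(\.)[^.]*«[^«»]*»')
sample = "Text1.\nTextt « text2 »\nSome other text"

m = pattern.search(sample)
print(repr(m.group(1)))  # the captured dot that the replacement puts back
print(repr(m.group(0)))  # everything the substitution replaces, dot included
```

Because the whole match starts at the dot, replacing it with `\1` keeps the dot and drops only what follows it up to the closing ».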
See a Python demo:
import re
text = "Text1.\n Textt « text2 »\n Some other text"
print(re.sub(r'(\.)[^.]*«[^«»]*»', r'\1', text))
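One thing to watch for, shown here as a hedged sketch on the question's second sample: `[^.]*` is greedy and also matches newlines, so when several « » pairs follow a dot with no other dot in between, the match extends to the last pair. If the intent is instead to drop only the lines that contain a « » pair, a line-based pattern is one possible alternative. The `line_based` pattern below is my own assumption and not part of the answer above.

```python
import re

text = "Text1.\nTextt « text2 »\nSome other text\nTextt « text3 »\nother text"

# Pattern from the answer: the only dot is after "Text1", so the greedy
# [^.]* runs up to the last « » pair.
dot_based = re.sub(r'(\.)[^.]*«[^«»]*»', r'\1', text)
print(repr(dot_based))

# Assumed alternative: remove a line break plus the rest of that line
# up to and including a « » pair.
line_based = re.sub(r'\n[^\n«»]*«[^«»]*»', '', text)
print(repr(line_based))
```

Which of the two fits depends on whether the real text has a dot between the « » pairs. With a dot after "Some other text", the answer's pattern would stop at each pair separately.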