I have a text like this:
text = "Text1. Textt « text2 » Some other text"
I want a regex that deletes the text inside the quotes and the text before it, back to the dot.
So the output should look like this:
text = "Text1. Some other text"
The code I am stuck on:
text = re.sub(r'\s*.*?»', '', text)
What the code actually does is delete more than expected. Here's an example:
text="Text1. Textt « text2 » Some other text Textt « text3 » other text"
The output I get is this:
text="Text1. other text"
Answer
You can use
text = re.sub(r'(\.)[^.]*«[^«»]*»', r'\1', text)
See the regex demo.
(\.) – Group 1 (\1 in the replacement refers to this captured value): a literal dot
[^.]* – zero or more chars other than a dot
«[^«»]*» – a substring between « and » without other « and » inside.
See a Python demo:
import re

text = "Text1.\n Textt « text2 »\n Some other text"
# Keep the dot (group 1), drop everything from it through the closing »
print(re.sub(r'(\.)[^.]*«[^«»]*»', r'\1', text))