
Ignore text from a dot up to a specific character with a Python regex

I have a text like this:

text = "Text1.
        Textt « text2 »
        Some other text"

I want a regex that deletes the text inside the quotes, together with the text before it back to the dot.

So the output should look like this:

text = "Text1.
    Some other text"

The code I'm stuck on:

text = re.sub(r'\s*.*?»', '', text)

What the code actually does is delete more than expected. Here's an example:

text="Text1.
        Textt « text2 »
        Some other text
        Textt « text3 »
        other text"

The output I get looks like this:

text="Text1.
    other text"


Answer

You can use

re.sub(r'(\.)[^.]*«[^«»]*»', r'\1', text)

See the regex demo.

  • (\.) – Group 1 (\1 in the replacement refers to this captured value): a literal dot
  • [^.]* – zero or more chars other than a .
  • «[^«»]*» – a substring between « and » without other « and » inside.
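The three pieces above can also be written out with re.VERBOSE, so each part of the pattern carries its own comment. This is only a sketch of the same regex; the name PATTERN is chosen here for illustration and is not part of the original answer.

```python
import re

# Same pattern as in the answer, spread out with re.VERBOSE for readability.
PATTERN = re.compile(
    r"""
    (\.)        # Group 1: a literal dot, restored by \1 in the replacement
    [^.]*       # zero or more characters other than a dot (newlines included)
    «[^«»]*»    # a quoted span with no nested quote marks inside
    """,
    re.VERBOSE,
)

text = "Text1.\n        Textt « text2 »\n        Some other text"

match = PATTERN.search(text)          # the span that will be replaced
result = PATTERN.sub(r"\1", text)     # keep only the dot from that span

print(repr(match.group(0)))
print(repr(result))
```

Printing match.group(0) shows exactly which part of the text is dropped, which helps when the pattern removes more than expected.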

See a Python demo:

import re
text = "Text1.\n        Textt « text2 »\n        Some other text"
print(re.sub(r'(\.)[^.]*«[^«»]*»', r'\1', text))
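One caveat is worth checking against the second sample from the question. [^.]* is greedy and also matches « and », so when no dot separates two quoted spans, the match runs up to the last one. The sketch below assumes the second sample is exactly as written in the question (the real text may contain more dots, in which case the answer's pattern behaves differently). The "bounded" variant is an assumption added here, not part of the original answer: it also excludes « and » from the skipped text, so the match stops at the first quoted span after the dot.

```python
import re

# Second sample from the question, taken as written (no dots after "Text1.").
text = (
    "Text1.\n"
    "        Textt « text2 »\n"
    "        Some other text\n"
    "        Textt « text3 »\n"
    "        other text"
)

# Pattern from the answer: greedy [^.]* can run across the first quoted span.
greedy = re.sub(r'(\.)[^.]*«[^«»]*»', r'\1', text)

# Hypothetical variant: the skipped text may not contain « or » either.
bounded = re.sub(r'(\.)[^.«»]*«[^«»]*»', r'\1', text)

print(repr(greedy))
print(repr(bounded))
```

Whether the second quoted span should be removed as well is not stated in the question; the variant only removes a quoted span that follows a dot with no other quoted span in between.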