How can I remove every substring that starts and ends with a certain character combination from a string like this:
' bla <span class=""latex""> ... This can be different1 ... </span> blub <span class=""latex""> ... This can be different2 ... </span> bleb'
I want this as the result:
'bla blub bleb'
I tried something like this:
string.replace('<span class=""latex"">' * '</span>', '')
but this does not work.
Is there a way to implement this?
Answer
This could work:
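>>> import re
>>> s = ' bla <span class=""latex""> ... This can be different1 ... </span> blub <span class=""latex""> ... This can be different2 ... </span> bleb'
>>> x = re.sub(r"""<span class=""latex"">.+?</span>""", "", s)
>>> x
' bla  blub  bleb'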
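The substitution above leaves the spaces that surrounded each span, so the result still has a leading space and doubled spaces between words. To get exactly the 'bla blub bleb' asked for in the question, the whitespace can be collapsed afterwards. A minimal sketch, assuming the input is the string from the question; the helper name strip_latex_spans is hypothetical and not part of the original answer:

```python
import re

# Same pattern as in the answer: the lazy `.+?` stops at the first </span>
# that follows, so each span is removed separately.
LATEX_SPAN = re.compile(r'<span class=""latex"">.+?</span>')


def strip_latex_spans(text):
    """Remove each latex span, then collapse leftover runs of whitespace."""
    without_spans = LATEX_SPAN.sub("", text)
    # split() with no arguments splits on any run of whitespace and ignores
    # leading/trailing whitespace; joining with " " normalises the spacing.
    return " ".join(without_spans.split())


s = (' bla <span class=""latex""> ... This can be different1 ... </span>'
     ' blub <span class=""latex""> ... This can be different2 ... </span> bleb')
print(strip_latex_spans(s))  # bla blub bleb
```

With a greedy `.+` instead of `.+?`, the match would run from the first opening tag to the last closing tag and 'blub' would be removed as well.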
EDIT: After clarification by the OP, I changed the answer to use a lazy quantifier instead of a capturing group. This works for the example, but it does not scale to more complex cases, for example nested <span> elements. For those, the proper solution is to parse the string and extract what is needed.
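The parsing approach mentioned in the edit could look roughly like the following, using html.parser from the standard library. This is only a sketch under stated assumptions: it assumes the doubled quotes in the question are an escaping artifact and the real markup is class="latex", it keeps only the text outside those spans (any other tags are dropped too), and the names LatexSpanStripper and remove_latex_spans are hypothetical.

```python
from html.parser import HTMLParser


class LatexSpanStripper(HTMLParser):
    """Collect text that lies outside <span class="latex"> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0    # > 0 while inside a latex span; counts nested spans
        self.chunks = []  # text seen outside latex spans

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth > 0:
            # A span nested inside a latex span: track it so its closing
            # tag is not mistaken for the end of the latex span.
            self.depth += 1
        elif "latex" in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.chunks.append(data)


def remove_latex_spans(html):
    """Return the text outside latex spans with whitespace collapsed."""
    parser = LatexSpanStripper()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.chunks).split())


nested = 'bla <span class="latex">a <span>b</span> c</span> blub'
print(remove_latex_spans(nested))  # bla blub
```

Unlike the regular expression, the depth counter handles a span nested inside a latex span, where the lazy pattern would stop at the inner closing tag and leave ' c</span>' behind.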