Hi, I am trying to use regex to replace everything surrounded by “()” with an empty string “”, except when the “()” is inside angle brackets, e.g. “<..()>” should be ignored and not replaced. Example input:
Hi<Hi(now)>_(.)
Example output:
Hi<Hi(now)>_
Following what I read in another answer, I tried the following:
import re
example = "Hi<Hi(now)>_(.)"
regex = re.compile(r"<[^>]*>|(\([^)]*\))")
re.sub(regex, '', example)
But instead it outputs
'Hi_'
Can anyone explain what might have gone wrong?
Answer
You have wrapped the wrong alternative with a capturing group and missed the backreference in the replacement part:
import re
example = "Hi<Hi(now)>_(.)"
regex = re.compile(r"(<[^<>]*>)|\([^()]*\)")
print(re.sub(regex, r'\1', example))  # Hi<Hi(now)>_
Note that < and > are not special regex characters and do not need escaping.
The (<[^<>]*>)|\([^()]*\) pattern captures into Group 1 any substring that starts with <, then has zero or more chars other than < and >, and then ends with >. Otherwise, it just matches (without capturing) any substring between ( and ) that has no other ( or ) in between.
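To see which alternative fires for each match, here is a minimal sketch that walks the matches with re.finditer. The helper name classify_matches is made up for illustration; it only reports whether Group 1 took part in each match.

```python
import re

PATTERN = re.compile(r"(<[^<>]*>)|\([^()]*\)")

def classify_matches(text):
    """Return (matched_text, in_group_1) for every match in text.

    Hypothetical helper for illustration only.
    """
    results = []
    for m in PATTERN.finditer(text):
        # Group 1 is None when the second alternative, \([^()]*\), matched.
        results.append((m.group(0), m.group(1) is not None))
    return results

for matched, in_group_1 in classify_matches("Hi<Hi(now)>_(.)"):
    print(repr(matched), "-> kept" if in_group_1 else "-> removed")
```

The <...> substring is consumed as a whole by the first alternative, so the (now) inside it is never seen by the second alternative.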
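The \1 replacement puts back the captured substring where it was. When the second alternative matches instead, Group 1 does not participate in the match, so \1 expands to an empty string (in Python 3.5 and later) and the parenthesized substring is removed.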
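If you prefer not to rely on how an unmatched group expands in the replacement string, the same idea can be sketched with a callable replacement. This is an alternative under the same pattern, not part of the original answer, and strip_parens is a name chosen here for illustration.

```python
import re

PATTERN = re.compile(r"(<[^<>]*>)|\([^()]*\)")

def strip_parens(text):
    """Remove (...) substrings, leaving anything inside <...> untouched.

    Hypothetical helper for illustration only.
    """
    # m.group(1) is the <...> substring when the first alternative matched,
    # and None when the (...) alternative matched, in which case we drop it.
    return PATTERN.sub(lambda m: m.group(1) or "", text)

print(strip_parens("Hi<Hi(now)>_(.)"))  # Hi<Hi(now)>_
```

Like the original pattern, this only handles non-nested parentheses and non-nested angle brackets.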