Hi, I am trying to use a regex to replace everything surrounded by “()” with an empty string “”, except where the “()” is inside angle brackets, e.g. “<..()>” should be ignored and not replaced. Example input:
Hi<Hi(now)>_(.)
Example output:
Hi<Hi(now)>_
Following what I read in another answer, I tried the following:
import re

example = "Hi<Hi(now)>_(.)"
regex = re.compile(r"<[^>]*>|(\([^)]*\))")
re.sub(regex, '', example)
But instead the output was:
'Hi_'
Can anyone explain what might have gone wrong?
Answer
You have wrapped the wrong alternative with a capturing group and missed the backreference in the replacement part:
import re

example = "Hi<Hi(now)>_(.)"
# Group 1 keeps <...> chunks; the uncaptured alternative matches (...) to drop
regex = re.compile(r"(<[^<>]*>)|\([^()]*\)")
print(re.sub(regex, r'\1', example))
See the Python demo.
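Note that < and > are not special regex characters and do not need escaping.
The (<[^<>]*>)|\([^()]*\) pattern captures into Group 1 any substring that starts with <, has zero or more characters other than < and >, and ends with >. Otherwise, it just matches (without capturing) any substring between ( and ) that contains no other ( or ).
The \1 replacement puts the captured substring back where it was. When the parenthesized alternative matches instead, Group 1 is unset and \1 expands to an empty string (in Python 3.5 and later), so that match is removed.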
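To see which alternative fires for each match, a small sketch like the following can help. It only inspects the matches on the example string from the question; the variable names are illustrative.

```python
import re

pattern = re.compile(r"(<[^<>]*>)|\([^()]*\)")

# Collect (whole match, Group 1) pairs for the example string.
# Group 1 is None whenever the parenthesis alternative matched.
matches = [(m.group(0), m.group(1)) for m in pattern.finditer("Hi<Hi(now)>_(.)")]

for whole, kept in matches:
    print(repr(whole), "->", "keep" if kept is not None else "remove")
```

The angle-bracket chunk is consumed first, so the "(now)" inside it is never seen by the second alternative.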
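If relying on how an unset group is expanded feels too implicit, the same idea can be written with a replacement function. This is only a sketch: the helper name strip_parens_outside_angles is made up for illustration, and it assumes the brackets are not nested.

```python
import re

# Group 1 captures <...> chunks to keep; the second alternative matches (...) to drop.
_PATTERN = re.compile(r"(<[^<>]*>)|\([^()]*\)")

def strip_parens_outside_angles(text):
    """Remove (...) groups unless they sit inside <...> (hypothetical helper)."""
    # group(1) is None when the parenthesis alternative matched, so return "" then.
    return _PATTERN.sub(lambda m: m.group(1) or "", text)

print(strip_parens_outside_angles("Hi<Hi(now)>_(.)"))
```

Because the callback makes the empty-string fallback explicit, it should behave the same on Python versions where an unmatched \1 in a replacement string raises an error.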