Substitute everything in “()” but ignore cases where the round brackets are inside angle brackets “<>” in regex

Hi, I am trying to use regex to replace everything surrounded by “()” with an empty string “”, but not when the “()” is inside angle brackets, e.g. “<..()>” should be ignored and not replaced. Example input:

Hi<Hi(now)>_(.)

Example output:

Hi<Hi(now)>_

Following what I read in another answer, I have tried the following method:

import re
example = "Hi<Hi(now)>_(.)"
regex = re.compile(r"<[^>]*>|(\([^)]*\))")
re.sub(regex, '', example)

But it instead outputted

'Hi_'

Can anyone explain what might have gone wrong?

Answer

You have wrapped the wrong alternative with a capturing group and missed the backreference in the replacement part:

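import re
example = "Hi<Hi(now)>_(.)"
# Group 1 captures <...> spans to keep; the second alternative matches (...) spans to drop
regex = re.compile(r"(<[^<>]*>)|\([^()]*\)")
# \1 restores the <...> span; it is empty when the (...) alternative matched
print(re.sub(regex, r'\1', example))  # Hi<Hi(now)>_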

See the Python demo.

Note that < and > are not special regex characters and do not need escaping.

The (<[^<>]*>)|\([^()]*\) pattern captures into Group 1 any substring that starts with <, then has zero or more chars other than < and >, and then ends with >. Otherwise, it just matches (without capturing) any substring between ( and ) that has no other ( or ) in between.

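The \1 replacement puts back the captured substring where it was. When the second alternative matches instead, Group 1 does not participate in the match, so \1 is replaced with an empty string (Python 3.5 and later) and the parenthesized substring is removed.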
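To make the "keep Group 1, drop everything else" logic explicit, here is a minimal sketch that uses a replacement function instead of a backreference and prints what each match captures. The helper name strip_parens is hypothetical and only for illustration; the pattern is the one from the answer, with the parentheses escaped.

```python
import re

# Group 1 captures <...> spans (to keep); the second alternative
# matches (...) spans without capturing them (to drop).
pattern = re.compile(r"(<[^<>]*>)|\([^()]*\)")

def strip_parens(text):
    """Hypothetical helper: remove (...) spans unless they sit inside <...>."""
    # m.group(1) is None when the (...) alternative matched,
    # so that match is replaced with an empty string.
    return pattern.sub(lambda m: m.group(1) or "", text)

example = "Hi<Hi(now)>_(.)"

# Show which alternative produced each match.
for m in pattern.finditer(example):
    print(repr(m.group(0)), "-> group 1:", repr(m.group(1)))

print(strip_parens(example))
```

The loop should report that '<Hi(now)>' fills Group 1 while '(.)' leaves it as None, which is why only the latter disappears from the result.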
