What is wrong with this regex?

I am using Python.
The pattern is:

re.compile(r'^(.+?)-?.*?(.+?)')

The texts look like this:

text1 = 'TVTP-S2(xxxx123123)'

text2 = 'TVTP(xxxx123123)'

I expect to get TVTP from both of them.
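
Running the pattern exactly as it appears above on both strings shows where it goes wrong. This sketch assumes the compiled pattern is applied with match(); search() behaves the same here because the pattern is anchored with ^. The first lazy group (.+?) is satisfied by a single character, because -? and .*? can both match the empty string, so nothing forces it to grow up to the hyphen or the parenthesis.

```python
import re

# The pattern exactly as written in the question.
pattern = re.compile(r'^(.+?)-?.*?(.+?)')

text1 = 'TVTP-S2(xxxx123123)'
text2 = 'TVTP(xxxx123123)'

for text in (text1, text2):
    # Lazy (.+?) stops after one character: -? and .*? can both match
    # the empty string, so the match is already complete at that point.
    print(pattern.match(text).groups())
```

Both lines print ('T', 'V') instead of the expected TVTP.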

Answer

Another option to match both formats is:

^([^-()]+)(?:-[^()]*)?\([^()]*\)

Explanation

  • ^ Start of string
  • ([^-()]+) Capture group 1, match 1 or more characters other than -, ( and )
  • (?:-[^()]*)? Since - is excluded from the first part, optionally match a - followed by any characters other than ( and )
  • \([^()]*\) Match from ( to ) without matching any parentheses in between
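
The breakdown above can be checked by writing the same pattern in re.VERBOSE mode, where whitespace in the pattern is ignored and each piece can carry a # comment. This is only a re-layout of the answer's regex, not a different approach.

```python
import re

# The answer's pattern, laid out with re.VERBOSE so each part is commented.
pattern = re.compile(r"""
    ^               # start of the string
    ([^-()]+)       # group 1: one or more chars that are not -, ( or )
    (?:-[^()]*)?    # optional: a hyphen, then anything except parentheses
    \([^()]*\)      # a parenthesised part with no nested parentheses
""", re.VERBOSE)

for text in ("TVTP-S2(xxxx123123)", "TVTP(xxxx123123)"):
    m = pattern.match(text)
    print(m.group(1))  # prints TVTP for both inputs
```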


Example

import re

regex = r"^([^-()]+)(?:-[^()]*)?\([^()]*\)"
s = ("TVTP-S2(xxxx123123)\n"
     "TVTP(xxxx123123)\n")

# re.MULTILINE makes ^ match at the start of every line, not only the string
print(re.findall(regex, s, re.MULTILINE))

Output

['TVTP', 'TVTP']
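
A quick check of what the same pattern rejects: because the trailing \([^()]*\) part is required, a string with no parenthesised part, or with nested parentheses, does not match at all. The extra sample strings below are hypothetical, made up only for illustration.

```python
import re

# Same pattern as in the answer; the extra sample inputs are hypothetical.
pattern = re.compile(r"^([^-()]+)(?:-[^()]*)?\([^()]*\)")

samples = ["TVTP-S2(xxxx123123)", "TVTP-S2", "TVTP(a(b))"]
for text in samples:
    m = pattern.match(text)
    # m is None when the "(...)" part is missing or contains nested parentheses
    print(text, "->", m.group(1) if m else None)
```

This prints TVTP for the first string and None for the other two.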


Source: Stack Overflow