How to ensure that at least one of sign A and sign B appears in a regex match?

import re

s_l = ["春天年初, ...", "1999年", "很多年以前"]
for front_part in s_l:
    # Optional digits, then 年, then an optional suffix character
    match = re.search(r'[\d]*[年]{1}[初末底前]{0,1}', front_part)
    idx_year = match.end() if match else 0
    print(idx_year)

I want to find the end index of a substring that contains “年”, with the extra condition that at least one of the following holds: digits (sign A) come directly before “年”, or one of [初末底前] (sign B) comes directly after it. For example, for s_l it should return 4, 5, 0.

One idea is to split the regex into two searches, like

re.search(r'[\d]+[年]{1}', front_part) or re.search(r'[年]{1}[初末底前]{1}', front_part)

but that is too cumbersome. Another option is to use (?=...), but I haven’t figured out how to use it. Any suggestions?

Answer

You can use a lookbehind assertion to match an occurrence of 年 that’s preceded by a digit, and an alternation to also match one that’s followed by one of [初末底前]:

pattern = re.compile(r'(?<=\d)年|年[初末底前]')
print([match.end() if match else 0 for match in map(pattern.search, s_l)])

This outputs:

[4, 5, 0]
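
If a string can carry both signs at once, the pattern above stops right after 年 and does not include the trailing character. The sketch below compares it with two variants that are not part of the original answer: one that also consumes an optional suffix after a digit-preceded 年, and one that uses the (?=...) lookahead mentioned in the question. The fourth test string, "1999年初", and the names greedy_pattern, lookahead_pattern and end_index are assumptions added for illustration.

```python
import re

# The first three strings come from the question; "1999年初" is an added
# test case (an assumption) that carries both sign A and sign B.
s_l = ["春天年初, ...", "1999年", "很多年以前", "1999年初"]

# Pattern from the answer: 年 preceded by a digit, or 年 followed by a suffix.
answer_pattern = re.compile(r'(?<=\d)年|年[初末底前]')

# Hypothetical variant: after a digit-preceded 年, also consume an optional suffix.
greedy_pattern = re.compile(r'(?<=\d)年[初末底前]?|年[初末底前]')

# Hypothetical variant: the suffix is required by a lookahead but not consumed,
# so the match always ends right after 年.
lookahead_pattern = re.compile(r'(?<=\d)年|年(?=[初末底前])')


def end_index(pattern, text):
    """Return the end offset of the first match, or 0 when nothing matches."""
    match = pattern.search(text)
    return match.end() if match else 0


print([end_index(answer_pattern, s) for s in s_l])     # [4, 5, 0, 5]
print([end_index(greedy_pattern, s) for s in s_l])     # [4, 5, 0, 6]
print([end_index(lookahead_pattern, s) for s in s_l])  # [3, 5, 0, 5]
```

Which variant fits depends on whether the index should point just past 年 or past the suffix character as well. For the three strings in the question, the answer's pattern and the greedy variant give the same result.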
User contributions licensed under: CC BY-SA