import re

s_l = ["春天年初, ...", "1999年", "很多年以前"]
for front_part in s_l:
    # Search once and reuse the match object instead of running the regex twice
    match = re.search(r'[\d]*[年]{1}[初末底前]{0,1}', front_part)
    idx_year = match.end() if match else 0
    print(idx_year)
I want to find the end index of the substring that contains 年, but only when there are digits (sign A) before 年 or one of [初末底前] (sign B) after 年. For example, for s_l it should return 4, 5, 0.
One idea is to split the regex into two searches, like
re.search(r'[\d]+[年]{1}', front_part) or re.search(r'[年]{1}[初末底前]{0,1}', front_part)
but that is too complex. Another is to use (?=...), but I don't understand how to use it. Any suggestions?
Answer
You can use a lookbehind assertion to match an occurrence of 年 that's preceded by a digit. Use an alternation pattern to match one that's followed by [初末底前]:
pattern = re.compile(r'(?<=\d)年|年[初末底前]')
print([match.end() if match else 0 for match in map(pattern.search, s_l)])
This outputs:
[4, 5, 0]
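Since the question also mentions (?=...), here is a small sketch comparing three patterns on the same input: the one above, a lookahead variant, and the pattern from the question. The helper name end_indices and the variable names are made up for illustration. The point is that a lookahead checks the suffix without consuming it, so match.end() stops right after 年 and the first string gives 3 instead of 4. The pattern from the question matches a bare 年 because both the digits and the suffix are optional, which is why it returns 3 for the last string.

```python
import re

s_l = ["春天年初, ...", "1999年", "很多年以前"]

# Pattern from the answer: 年 after a digit, or 年 plus one of 初末底前.
consuming = re.compile(r'(?<=\d)年|年[初末底前]')
# Lookahead variant (illustrative): the suffix is checked but not consumed.
lookahead = re.compile(r'(?<=\d)年|年(?=[初末底前])')
# Pattern from the question: digits and suffix are both optional.
original = re.compile(r'\d*年[初末底前]?')


def end_indices(pattern, strings):
    """Return match.end() for each string, or 0 when there is no match."""
    return [m.end() if m else 0 for m in map(pattern.search, strings)]


print(end_indices(consuming, s_l))  # [4, 5, 0]
print(end_indices(lookahead, s_l))  # [3, 5, 0]
print(end_indices(original, s_l))   # [4, 5, 3]
```

If you need the index after the suffix character, as in the expected 4, 5, 0, the consuming form is the one to use.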