
What would be the regex pattern for the following?

I have multiple strings in the following format, for example:

A='AB.224-QW-2018'

B='AB.876-5-LS-2018'

C='AB.26-LS-18'

D='AB-123-6-LS-2017'

E='IA-Mb-22L-AB.224-QW-2018-IA-Mb-22L'

F='ZX-ss-12L-AB-123-6-LS-2017-BC-22'

G='AB.224-2018'

H='AB.224/QW/2018'

I='AB/224/2018'

J='AB-10-HDB-231-NCLT-1-2017 AD-42-HH-2019'

K='AB-1-HDB-NCLT-1-2016 AD-42-HH-2020'

L='AB-1-HDB-NCLT-1-2016/(AD-42-HH-2020)'

I want a regex pattern that captures the first letters (AB), the number that follows them, and finally the year that comes after. Some strings contain an extra single digit after the number, such as 876-5 in B and 123-6 in D. I don't want that single digit that appears after the hyphen.

My code :

re.search(r"\D*\d*\D*(AB)\D*(\d+)\D*(20)?(\d{2})\D*\d*\D*", s)

Another attempt

re.search(r"\D*\d*\D*(AB)\D*(\d+)\D*\d?\D*(20)?(\d{2})\D*\d*\D*", s)

Neither attempt works for all of them. Is there a pattern that matches all the strings?

I have created groups in the regex pattern and combine them as d.group(1)+"/"+d.group(2)+"/"+d.group(4). If a pattern matches all of the strings, the expected output is as follows.

Expected Output

A='AB/224/18'

B='AB/876/18'

C='AB/26/18'

D='AB/123/17'

E='AB/224/18'

F='AB/123/17'

G='AB/224/18'

H='AB/224/18'

I='AB/224/18'

J='AB/10/17'

K='AB/1/16'

L='AB/1/16'




Answer

You could use 3 capture groups:

\b(AB)\D*(\d+)\S*?(?:20)?(\d\d)\b
  • \b A word boundary to prevent a partial word match
  • (AB) Capture AB in group 1
  • \D* Match optional non-digits
  • (\d+) Capture 1+ digits in group 2
  • \S*? Optionally match non-whitespace characters, as few as possible
  • (?:20)? Optionally match 20
  • (\d\d) Capture 2 digits in group 3
  • \b A word boundary
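
As a quick check of how the three groups line up, here is a minimal sketch that applies the pattern to string B from the question using re.search. The variable names (sample, parts, result) are only for illustration and are not part of the answer.

```python
import re

# Pattern from the answer, with the regex escapes written out.
pattern = re.compile(r"\b(AB)\D*(\d+)\S*?(?:20)?(\d\d)\b")

sample = "AB.876-5-LS-2018"  # string B from the question
m = pattern.search(sample)

# group(1) is the prefix, group(2) the first number, group(3) the 2-digit year.
# The lazy \S*? consumes "-5-LS-", so the lone 5 is skipped.
parts = m.groups()
result = "/".join(parts)
print(parts)   # ('AB', '876', '18')
print(result)  # AB/876/18
```

Because \S*? is lazy, it grows one character at a time until the optional 20, two digits and a word boundary can all match, which is why the 5 after the hyphen never ends up in a group.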


For example, you can use re.finditer, which returns an iterator of Match objects that each hold the group values.

You can loop over the matches with enumerate. Each iteration yields a tuple, where the first value is the count (that you don't need here) and the second value is the Match object.

import re

# Raw string so the regex escapes (\b, \D, \d, \S) reach the regex engine intact
pattern = r"\b(AB)\D*(\d+)\S*?(?:20)?(\d\d)\b"

# Test input: one example per line, plus one line that holds two examples
s = ("A='AB.224-QW-2018'\n"
     "B='AB.876-5-LS-2018'\n"
     "C='AB.26-LS-18'\n"
     "D='AB-123-6-LS-2017'\n"
     "E='IA-Mb-22L-AB.224-QW-2018-IA-Mb-22L' F='ZX-ss-12L-AB-123-6-LS-2017-BC-22'\n"
     "A='AB.224-QW-2018'\n"
     "B='AB.876-5-LS-2018'\n"
     "C='AB.26-LS-18'\n"
     "D='AB-123-6-LS-2017'\n"
     "E='IA-Mb-22L-AB.224-QW-2018-IA-Mb-22L'\n"
     "F='ZX-ss-12L-AB-123-6-LS-2017-BC-22'\n"
     "G='AB.224-2018'\n"
     "H='AB.224/QW/2018'\n"
     "I='AB/224/2018'")

matches = re.finditer(pattern, s)

# The count from enumerate is unused, hence the underscore
for _, m in enumerate(matches, start=1):
    print(m.group(1) + "/" + m.group(2) + "/" + m.group(3))

Output

AB/224/18
AB/876/18
AB/26/18
AB/123/17
AB/224/18
AB/123/17
AB/224/18
AB/876/18
AB/26/18
AB/123/17
AB/224/18
AB/123/17
AB/224/18
AB/224/18
AB/224/18
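
The demo input above stops at string I. Running the same pattern over J, K and L from the question suggests that K and L come out as expected, but J yields AB/10/31 instead of AB/10/17: the lazy \S*? stops at the first two digits that end on a word boundary, which is the 31 inside 231. One possible adjustment, which is an assumption on my part and not something the answer proposes, is to add a second \b in front of the year so that it has to start a token of its own. A sketch that compares both patterns on the twelve strings from the question (the names original, stricter, samples and extract are hypothetical helpers):

```python
import re

# Pattern from the answer.
original = re.compile(r"\b(AB)\D*(\d+)\S*?(?:20)?(\d\d)\b")
# Hypothetical variant: the extra \b before the year forces the year to
# start at a word boundary, so "31" inside "231" no longer qualifies.
stricter = re.compile(r"\b(AB)\D*(\d+)\S*?\b(?:20)?(\d\d)\b")

# The twelve input strings from the question.
samples = {
    "A": "AB.224-QW-2018",
    "B": "AB.876-5-LS-2018",
    "C": "AB.26-LS-18",
    "D": "AB-123-6-LS-2017",
    "E": "IA-Mb-22L-AB.224-QW-2018-IA-Mb-22L",
    "F": "ZX-ss-12L-AB-123-6-LS-2017-BC-22",
    "G": "AB.224-2018",
    "H": "AB.224/QW/2018",
    "I": "AB/224/2018",
    "J": "AB-10-HDB-231-NCLT-1-2017 AD-42-HH-2019",
    "K": "AB-1-HDB-NCLT-1-2016 AD-42-HH-2020",
    "L": "AB-1-HDB-NCLT-1-2016/(AD-42-HH-2020)",
}


def extract(regex, text):
    """Return 'AB/<number>/<yy>' for the first match, or None if there is none."""
    m = regex.search(text)
    return "/".join(m.groups()) if m else None


for key, text in samples.items():
    print(key, extract(original, text), extract(stricter, text))
```

On these twelve samples the two patterns should differ only for J. That says nothing about inputs outside the question, so treat the variant as a starting point to test against your own data.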