I have multiple strings in the following format, for example:
A='AB.224-QW-2018'
B='AB.876-5-LS-2018'
C='AB.26-LS-18'
D='AB-123-6-LS-2017'
E='IA-Mb-22L-AB.224-QW-2018-IA-Mb-22L'
F='ZX-ss-12L-AB-123-6-LS-2017-BC-22'
G='AB.224-2018'
H='AB.224/QW/2018'
I='AB/224/2018'
J='AB-10-HDB-231-NCLT-1-2017 AD-42-HH-2019'
K='AB-1-HDB-NCLT-1-2016 AD-42-HH-2020'
L='AB-1-HDB-NCLT-1-2016/(AD-42-HH-2020)'
I want a regex pattern that extracts the letters at the start (AB), the number that follows them, and finally the year. Some strings contain 876-5 and 123-6 (B and D respectively); I don't want the single digit that appears after the -.
My code (where s is one of the strings above):
re.search(r"\D*\d*\D*(AB)\D*(\d+)\D*(20)?(\d{2})\D*\d*\D*", s)
Another attempt:
re.search(r"\D*\d*\D*(AB)\D*(\d+)\D*\d?\D*(20)?(\d{2})\D*\d*\D*", s)
Neither attempt works for all of the strings. Is there a pattern that matches all of them?
I created groups in the regex pattern and combine them with d.group(1) + "/" + d.group(2) + "/" + d.group(4). So if a pattern matched all of the strings, the expected output would be:
Expected Output
A='AB/224/18'
B='AB/876/18'
C='AB/26/18'
D='AB/123/17'
E='AB/224/18'
F='AB/123/17'
G='AB/224/18'
H='AB/224/18'
I='AB/224/18'
J='AB/10/17'
K='AB/1/16'
L='AB/1/16'
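To see concretely why neither attempt generalises, the sketch below runs both patterns from the question (with the backslashes restored) through re.search on strings A and B and joins groups 1, 2 and 4 as described above. The helper name extract is purely illustrative.

```python
import re

# The two attempts from the question, with the missing backslashes restored.
ATTEMPT_1 = r"\D*\d*\D*(AB)\D*(\d+)\D*(20)?(\d{2})\D*\d*\D*"
ATTEMPT_2 = r"\D*\d*\D*(AB)\D*(\d+)\D*\d?\D*(20)?(\d{2})\D*\d*\D*"


def extract(pattern, text):
    """Illustrative helper: join groups 1, 2 and 4 of the first match, or return None."""
    d = re.search(pattern, text)
    if d is None:
        return None
    return d.group(1) + "/" + d.group(2) + "/" + d.group(4)


for name, text in [("A", "AB.224-QW-2018"), ("B", "AB.876-5-LS-2018")]:
    print(name, extract(ATTEMPT_1, text), extract(ATTEMPT_2, text))
```

The first attempt handles A but splits the 876 in B into 8 and 76, because (\d+) backtracks until (\d{2}) can match. The optional \d? in the second attempt fixes B, but in A it grabs the 2 of 2018, so (20)? no longer matches and 01 is captured as the year.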
Answer
You could use 3 capture groups:
\b(AB)\D*(\d+)\S*?(?:20)?(\d\d)\b
\b  A word boundary to prevent a partial word match
(AB)  Capture AB in group 1
\D*  Match optional non-digits
(\d+)  Capture 1+ digits in group 2
\S*?  Optionally match non-whitespace characters, as few as possible
(?:20)?  Optionally match 20
(\d\d)  Capture 2 digits in group 3
\b  A word boundary
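To see what each part of the pattern consumes, here is a minimal sketch that prints the whole match (group 0) alongside the three capture groups for strings E and B from the question:

```python
import re

pattern = r"\b(AB)\D*(\d+)\S*?(?:20)?(\d\d)\b"

# E has extra text on both sides: \b(AB) anchors the start of the match and
# the final \b stops it right after the two-digit year.
m_e = re.search(pattern, "IA-Mb-22L-AB.224-QW-2018-IA-Mb-22L")
print(m_e.group(0))  # AB.224-QW-2018
print(m_e.groups())  # ('AB', '224', '18')

# In B, \S*? lazily skips "-5-LS-" and (?:20)? consumes the century,
# so the stray 5 never ends up in a capture group.
m_b = re.search(pattern, "AB.876-5-LS-2018")
print(m_b.group(0))  # AB.876-5-LS-2018
print(m_b.groups())  # ('AB', '876', '18')
```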
For example, using re.finditer, which returns an iterator of Match objects that each hold the group values.
Using enumerate, you can loop over the matches. Each item in the iteration is a tuple whose first value is the count (which you don't need here) and whose second value is the Match object.
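Since the count from enumerate is discarded, iterating over re.finditer directly gives the same result. A minimal sketch comparing the two, which also uses Match.groups() with str.join instead of concatenating the groups by hand (a stylistic alternative, not part of the original answer):

```python
import re

pattern = r"\b(AB)\D*(\d+)\S*?(?:20)?(\d\d)\b"
s = "A='AB.224-QW-2018'\nB='AB.876-5-LS-2018'\nC='AB.26-LS-18'"

# enumerate() yields (count, Match) tuples; the count is ignored here.
with_enumerate = [
    m.group(1) + "/" + m.group(2) + "/" + m.group(3)
    for _, m in enumerate(re.finditer(pattern, s), start=1)
]

# Iterating finditer directly is equivalent when the count is unused.
direct = ["/".join(m.groups()) for m in re.finditer(pattern, s)]

print(direct)
```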
import re

pattern = r"\b(AB)\D*(\d+)\S*?(?:20)?(\d\d)\b"

s = ("A='AB.224-QW-2018'\n"
     "B='AB.876-5-LS-2018'\n"
     "C='AB.26-LS-18'\n"
     "D='AB-123-6-LS-2017'\n"
     "IA-Mb-22L-AB.224-QW-2018-IA-Mb-22L' F='ZX-ss-12L-AB-123-6-LS-2017-BC-22\n"
     "A='AB.224-QW-2018'\n"
     "B='AB.876-5-LS-2018'\n"
     "C='AB.26-LS-18'\n"
     "D='AB-123-6-LS-2017'\n"
     "E='IA-Mb-22L-AB.224-QW-2018-IA-Mb-22L'\n"
     "F='ZX-ss-12L-AB-123-6-LS-2017-BC-22'\n"
     "G='AB.224-2018'\n"
     "H='AB.224/QW/2018'\n"
     "I='AB/224/2018'")

matches = re.finditer(pattern, s)
for _, m in enumerate(matches, start=1):
    print(m.group(1) + "/" + m.group(2) + "/" + m.group(3))
Output
AB/224/18
AB/876/18
AB/26/18
AB/123/17
AB/224/18
AB/123/17
AB/224/18
AB/876/18
AB/26/18
AB/123/17
AB/224/18
AB/123/17
AB/224/18
AB/224/18
AB/224/18
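The sample above leaves out J, K and L. Running the pattern on all twelve strings from the question shows one case it misses: in J the lazy \S*? stops as soon as two digits are followed by a word boundary, which already happens at the 31 of 231, giving AB/10/31 instead of AB/10/17. One possible tweak, offered only as a sketch that has been checked against these samples and nothing else, is a negative lookbehind (?<!\d) so the year cannot start in the middle of a number. VARIANT and extract are hypothetical names, not part of the original answer.

```python
import re

ORIGINAL = r"\b(AB)\D*(\d+)\S*?(?:20)?(\d\d)\b"
# Hypothetical variant: (?<!\d) forbids the year from starting right after a digit.
VARIANT = r"\b(AB)\D*(\d+)\S*?(?<!\d)(?:20)?(\d\d)\b"

# Inputs and expected outputs copied from the question.
SAMPLES = {
    "A": "AB.224-QW-2018",
    "B": "AB.876-5-LS-2018",
    "C": "AB.26-LS-18",
    "D": "AB-123-6-LS-2017",
    "E": "IA-Mb-22L-AB.224-QW-2018-IA-Mb-22L",
    "F": "ZX-ss-12L-AB-123-6-LS-2017-BC-22",
    "G": "AB.224-2018",
    "H": "AB.224/QW/2018",
    "I": "AB/224/2018",
    "J": "AB-10-HDB-231-NCLT-1-2017 AD-42-HH-2019",
    "K": "AB-1-HDB-NCLT-1-2016 AD-42-HH-2020",
    "L": "AB-1-HDB-NCLT-1-2016/(AD-42-HH-2020)",
}
EXPECTED = {
    "A": "AB/224/18", "B": "AB/876/18", "C": "AB/26/18", "D": "AB/123/17",
    "E": "AB/224/18", "F": "AB/123/17", "G": "AB/224/18", "H": "AB/224/18",
    "I": "AB/224/18", "J": "AB/10/17", "K": "AB/1/16", "L": "AB/1/16",
}


def extract(pattern, text):
    """Join the three capture groups of the first match, or return None."""
    m = re.search(pattern, text)
    return "/".join(m.groups()) if m else None


for key, text in SAMPLES.items():
    print(key, extract(ORIGINAL, text), extract(VARIANT, text), EXPECTED[key])
```

The lookbehind only rejects year candidates that directly follow a digit, so strings where the original pattern already worked are unaffected; other inputs with unusual layouts may still need adjustments.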