How to get index of regex match of only the matched and included part?

txt = 'Port of Discharge/ Airport of destination\tXYZABC\t\t\t\t\t\t\t\t44B'

I am doing:

reg_ind = [(m.start(0),m.end(0)) for m in re.finditer(r' port.{0,6}discharge.{0,3}/.{0,3}airport.{0,7}destination.*(?=44B)', txt,re.IGNORECASE | re.VERBOSE)]

print(reg_ind)
[(0, 56)]

print(txt[reg_ind[0][0]: reg_ind[0][1]])
Port of Discharge/ Airport of destination       XYZABC 

I want the index to end at Airport of destination.

Desired output:

print(reg_ind)
[(0, 41)]

print(txt[reg_ind[0][0]: reg_ind[0][1]])
Port of Discharge/ Airport of destination

Answer

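Move .* into the lookahead so that it is no longer consumed. A lookahead is zero-width: it only checks that the text ahead matches, without adding it to the match, so m.end(0) stops right after destination: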

port.{0,6}discharge.{0,3}/.{0,3}airport.{0,7}destination(?=.*44B)
                                                        ^^^^^^^^^

Python demo:

import re

txt = 'Port of Discharge/ Airport of destination\tXYZABC\t\t\t\t\t\t\t\t44B'
pat = r' port.{0,6}discharge.{0,3}/.{0,3}airport.{0,7}destination(?=.*44B)'
reg_ind = [(m.start(0),m.end(0)) for m in re.finditer(pat, txt,re.IGNORECASE | re.VERBOSE)]
print(reg_ind) # => [(0, 41)]
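
A capturing group is another way to get the same indices. It is a different technique from the lookahead, sketched here under the assumption that the sample string contains real tab characters. The whole pattern, including .*44B, is still consumed, but m.span(1) reports only the part wrapped in parentheses. The names full_spans and group_spans are illustrative only.

```python
import re

# Assumption: the separators in the sample string are tab characters.
txt = 'Port of Discharge/ Airport of destination\tXYZABC\t\t\t\t\t\t\t\t44B'

# Group 1 wraps only the label whose indices are wanted.
# The trailing ".*44B" must still match, but it sits outside the group.
pat = re.compile(
    r'(port.{0,6}discharge.{0,3}/.{0,3}airport.{0,7}destination).*44B',
    re.IGNORECASE,
)

# span(0) covers the whole match; span(1) covers only the captured group.
full_spans = [m.span(0) for m in pat.finditer(txt)]
group_spans = [m.span(1) for m in pat.finditer(txt)]

print(full_spans)   # => [(0, 59)]
print(group_spans)  # => [(0, 41)]

start, end = group_spans[0]
print(txt[start:end])  # => Port of Discharge/ Airport of destination
```

The lookahead version keeps m.start(0) and m.end(0) usable as they are. The group version helps when the trailing text should also be consumed, for example so that a later match cannot start inside it.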