Python Regex – Reference first line on every match, until the start of a new group

Question

Sample text:

Intended result:

Regex attempts:

    (?:^This is (?P<H>HeaderB)\s) (Line (?P<L>\d)\s)*?

This matches only the Header 'H' and the first 'L' Line.

    (?:^This is (?P<H>HeaderB)\s)? (Line (?P<L>\d)\s)*?

This manages to match multiple 'L' Lines; however, only the first two lines belong to the same match, and the subsequent 'L' lines do not reference the Header capture group.

I tried other attempts to adjust the

Accepted Answer

A mix of regex substitution and string formatting. It is assumed that a Header is always followed by at least one Line.

    import re

    text = """This is HeaderA
    Line 1
    Line 2
    Line 3
    Line 4
    Line 5
    This is HeaderB
    Line 1
    Line 2"""

    ordered_matches = []  # global

    def custom_match(m, all_matches=ordered_matches):
        # Called once per match: the trailing token of each line,
        # i.e. the header letter or the line number.
        p = m.group(0)
        if p.isdigit():
            all_matches[-1] += [p]   # line number: attach to the latest header
        else:
            all_matches += [[p]]     # header letter: start a new group
        return ''  # doesn't matter

    # re.sub is used only to invoke custom_match on every match; its result is ignored.
    r = re.sub(r'([A-Z0-9]+)$', custom_match, text, flags=re.M)

    for m in ordered_matches:
        print(('Header{}{{}} '.format(m[0]) * (len(m)-1)).format(*m[1:]))

Output

    HeaderA1 HeaderA2 HeaderA3 HeaderA4 HeaderA5
    HeaderB1 HeaderB2
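The same grouping idea can also be sketched without re.sub, by scanning the text once and remembering the most recent header. This is only an illustrative alternative, not the accepted answer's method: the names LINE_RE and pair_lines_with_headers, the group names header and num, and the sample text are all assumptions made for the example. It relies on the fact that Python's re keeps only the last repetition of a quantified capture group, so pairing every Line with its Header is done in plain Python rather than inside one pattern.

```python
import re

# Assumed sample input, mirroring the text used in the answer above.
text = """This is HeaderA
Line 1
Line 2
Line 3
Line 4
Line 5
This is HeaderB
Line 1
Line 2"""

# One alternative per kind of line; exactly one named group is set per match.
LINE_RE = re.compile(
    r"^This is (?P<header>Header\w+)$|^Line (?P<num>\d+)$",
    re.MULTILINE,
)


def pair_lines_with_headers(source):
    """Return (header, number) tuples, pairing each Line with the latest Header."""
    pairs = []
    current_header = None
    for m in LINE_RE.finditer(source):
        if m.group("header") is not None:
            # A new header starts a new group.
            current_header = m.group("header")
        elif current_header is not None:
            # A numbered line belongs to the most recent header;
            # lines that appear before any header are skipped.
            pairs.append((current_header, m.group("num")))
    return pairs


pairs = pair_lines_with_headers(text)
print(" ".join(header + num for header, num in pairs))
```

Unlike the re.sub version, a Line that appears before any Header is skipped here instead of raising an IndexError, and the pairs come back as a list that can be formatted in whatever way is needed.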

Answer