
Obtain all combinations for regex substitution of overlapping pattern

My goal is to obtain every possible substitution for all overlapping matches of a given regex.

Normally, when I want to perform a substitution with a regex, I do the following:

import re

re.sub(pattern='III', repl='U', string='MIIII') 

and I would obtain the following output:

MUI

As stated in the documentation, re.sub() replaces only the leftmost non-overlapping occurrences of the pattern, but what I need is every possible substitution, which in this case would be:

MUI
MIU

I would also like this to work for more complex regex patterns, such as the following:

re.sub(pattern="M(.*)$", repl=r"M\1\1", string='MIU')
MIUIU

I didn't find a native solution for this in the Python standard library.


Answer

One way to implement this is to search for the pattern repeatedly (using re.search()) until no match is found. On each iteration, replace just a single occurrence (using re.sub() with the count argument), then slice the string so the next search starts one character past the start of the previous match.

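import re

source = "MMM123"
pattern = re.compile("M(.*)$")
replacement = r"M\1\1"

last_start = 0  # offset in source where the current slice begins
temp = source
# The walrus operator (:=) requires Python 3.8+
while match := pattern.search(temp):
    # Keep the skipped prefix and replace only the first match in the slice
    print(source[:last_start], pattern.sub(replacement, temp, count=1), sep="")
    # Restart one character past this match's start so overlaps are found
    last_start += match.start() + 1
    temp = source[last_start:]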
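
The same idea can be wrapped in a reusable function that returns the results instead of printing them. This is only a sketch under a few assumptions: the name all_substitutions is made up for illustration, it uses the pos argument of Pattern.search() instead of slicing (so ^ and lookbehind assertions may behave differently than in the loop above), and it builds each replacement with Match.expand() rather than re.sub().

```python
import re

def all_substitutions(pattern, repl, string):
    """Return one result per match start, replacing only that match.

    Hypothetical helper: name and signature are illustrative only.
    """
    compiled = re.compile(pattern)
    results = []
    pos = 0
    # The pos guard stops the loop for patterns that can match an empty string
    while pos <= len(string) and (match := compiled.search(string, pos)):
        # expand() fills backreferences such as \1 from this specific match
        replaced = match.expand(repl)
        results.append(string[:match.start()] + replaced + string[match.end():])
        # Resume one character past this match's start so overlaps are found
        pos = match.start() + 1
    return results

print(all_substitutions("III", "U", "MIIII"))         # ['MUI', 'MIU']
print(all_substitutions(r"M(.*)$", r"M\1\1", "MIU"))  # ['MIUIU']
```

Like the loop above, this yields at most one result per starting position, so alternative matches that begin at the same index (for example, a shorter match of a greedy pattern) are not enumerated.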