Overlapping regular expression substitution in Python, but contingent on values of capture groups

Question

I'm currently writing a program in Python that is supposed to transliterate all the characters in a language from one orthography into another. There are two things at hand here, one of which is already solved, and the second is the problem. In the first step, characters from the source orthography are converted into the target orthography, e.g. (ffr: the

Accepted Answer

Yes, only very small changes are needed.

The original transformation is:

    VR’W -> V’RW

In fact, only the first three characters need to be manipulated; W is just a necessary condition. So the problem to solve becomes:

    VR’(W) -> V’R

A lookahead assertion, `(?=...)`, can match VR’(W) without consuming W.

Previous pattern, which consumes all four characters (VR’W):

    (a|e|i|u)(l|m|n|w|y)(’)(a|e|i|u)

New pattern, which consumes only three characters but looks ahead for W (VR’(W)):

    (a|e|i|u)(l|m|n|w|y)(’)(?=(a|e|i|u))

Because W is only a condition and is not consumed by the match, it remains available to be matched again as the V of the next match.

```python
import re


def glottalized_resonant_mover(linestring):
    '''
    moves glottal character over according to glottalized resonant
    hierarchy:
    case description: VR’W for some vowels V, W; some glottalized
    resonant R’
    hierarchy: e > i > a > u
               3 > 2 > 1 > 0
    if h(V) > h(W), then string is V’RW
    '''
    hi_scores = {'e': 3,
                 'i': 2,
                 'a': 1,
                 'u': 0}

    def hierarchy_sub(matchobj):
        '''moves glottalized resonant if a vowel pulls it one way
        or the other
        '''
        if hi_scores[matchobj.group(1)] > hi_scores[matchobj.group(4)]:
            swap_string = ''.join(
                [
                    matchobj.group(1),
                    matchobj.group(3),
                    matchobj.group(2),
                    # matchobj.group(4) is not needed here: the lookahead
                    # does not consume it, so it stays in the string
                ]
            )
            return swap_string
        else:
            return matchobj.group(0)

    glot_res_re = re.compile('(a|e|i|u)(l|m|n|w|y)(’)(?=(a|e|i|u))')
    # glot_res_re = re.compile('(a|e|i|u)(l|m|n|w|y)(’)(a|e|i|u)')
    swapstring = glot_res_re.sub(hierarchy_sub, linestring)
    return swapstring


sample = ['’im’ush', 'ttham’uqwus', 'xwtsekwul’im’us']
answer = ['’i’mush', 'ttha’muqwus', 'xwtsekwul’i’mus']

for word, expected in zip(sample, answer):
    print(word, '->', glottalized_resonant_mover(word), "==", expected)
```

Output:

    ’im’ush -> ’i’mush == ’i’mush
    ttham’uqwus -> ttha’muqwus == ttha’muqwus
    xwtsekwul’im’us -> xwtsekwul’i’mus == xwtsekwul’i’mus
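To see why the lookahead matters for overlapping cases, here is a minimal sketch comparing what each pattern matches on the third sample word. The names `GLOTTAL`, `consuming`, `lookahead` and `word` are illustrative only and are not part of the answer's code. The character classes `[aeiu]` and `[lmnwy]` are assumed to be equivalent to the alternations used in the answer.

```python
import re

# Assumption: the glottal mark is U+2019 (right single quotation mark),
# as it appears in the sample words.
GLOTTAL = "\u2019"

# Four-character pattern: the trailing vowel W is consumed by the match.
consuming = re.compile(f"([aeiu])([lmnwy])({GLOTTAL})([aeiu])")

# Three-character pattern: W is only checked by the lookahead, and is
# still captured as group 4 even though it is not consumed.
lookahead = re.compile(f"([aeiu])([lmnwy])({GLOTTAL})(?=([aeiu]))")

word = f"xwtsekwul{GLOTTAL}im{GLOTTAL}us"

# Text consumed by each match of the four-character pattern.
consumed_matches = [m.group(0) for m in consuming.finditer(word)]

# Text consumed by each lookahead match, paired with the vowel it saw ahead.
lookahead_matches = [(m.group(0), m.group(4)) for m in lookahead.finditer(word)]

print(consumed_matches)
print(lookahead_matches)
```

With the consuming pattern, the `i` is swallowed as the W of the first match, so it can no longer serve as the V of a second match and only one match is found. With the lookahead, the scan resumes at that `i`, and both glottalized resonants are visited.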

Answer