import re

line = "treinta y un"      # example 1
line = "veinti un "        # example 2
line = "un"                # example 3
line = "un "               # example 4
line = "uno"               # example 5
line = "treinta yun"       # example 6
line = "treinta y unghhg"  # example 7

re_for_identificate_1 = "(?<!^)un"
re_for_identificate_2 = " un"

line = re.sub(re_for_identificate_2, " un ", line)
line = re.sub(re_for_identificate_1, "un ", line)

print(repr(line))
How can I obtain these outputs from those inputs?

"treinta y un "     # for example 1
"veinti un "        # for example 2
"un "               # for example 3
"un "               # for example 4
"uno"               # for example 5
"treinta yun"       # for example 6
"treinta y unghhg"  # for example 7
Note that for examples 2, 4, 5, 6 and 7 the regex should not make any changes: in examples 2 and 4 a space already follows the word; in "uno" and "treinta y unghhg" the substring "un" is not at the end of the string; and in "treinta yun" the substring "un" is not preceded by one or more spaces.
Answer
If you want to use regex, you can use \bun$, which checks that the last whole word in the string is un and that nothing follows it. If that is the case, a space is added to the end of the string:
import re

lines = ["treinta y un", "veinti un ", "un", "un ", "uno", "treinta yun", "treinta y unghhg"]

# \b requires a word boundary before "un"; $ requires "un" to end the string
result = [re.sub(r'\bun$', 'un ', line) for line in lines]
print(result)
Output:
[ 'treinta y un ', 'veinti un ', 'un ', 'un ', 'uno', 'treinta yun', 'treinta y unghhg' ]
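One caveat: \b also matches after punctuation, so a string such as "veinti-un" would get a trailing space too. If "un" should only count when it is preceded by whitespace or the start of the string, as the question describes, a negative lookbehind is one possible variation. The sketch below is only an illustration: the helper names and the "veinti-un" input are hypothetical and do not come from the question.

```python
import re

# The answer's pattern: word boundary before "un", then end of string.
WORD_BOUNDARY = re.compile(r"\bun$")

# Stricter variant (an assumption, not part of the original answer):
# "un" must not be preceded by a non-whitespace character, so it has to
# follow whitespace or sit at the very start of the string.
SPACE_OR_START = re.compile(r"(?<!\S)un$")


def add_trailing_space(line):
    """Append a space when the string ends with the whole word "un"."""
    return WORD_BOUNDARY.sub("un ", line)


def add_trailing_space_strict(line):
    """Same, but only when "un" follows whitespace or starts the string."""
    return SPACE_OR_START.sub("un ", line)


samples = ["treinta y un", "veinti un ", "un", "un ", "uno",
           "treinta yun", "treinta y unghhg", "veinti-un"]

for s in samples:
    print(repr(s), repr(add_trailing_space(s)), repr(add_trailing_space_strict(s)))
```

Both helpers agree on the seven inputs from the question; they differ only on the hyphenated input, which the stricter pattern leaves unchanged.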