Add a space after a word if it's at the beginning of a string or after one or more spaces, and at the same time it is at the end of the string or before \n

import re

line = "treinta y un"       #example 1
line = "veinti un "         #example 2
line = "un"                 #example 3
line = "un "                #example 4
line = "uno"                #example 5
line = "treinta yun"        #example 6
line = "treinta y unghhg"   #example 7

re_for_identificate_1 = "(?<!^)un"
re_for_identificate_2 = " un"

line = re.sub(re_for_identificate_2, " un ", line)
line = re.sub(re_for_identificate_1, "un ", line)

print(repr(line))

How can I obtain these outputs from those inputs?

"treinta y un "       #for example 1
"veinti un "          #for example 2
"un "                 #for example 3
"un "                 #for example 4
"uno"                 #for example 5
"treinta yun"         #for example 6
"treinta y unghhg"    #for example 7

Note that for examples 2, 4, 5, 6 and 7 the regex should not make any changes: in examples 2 and 4 there is already a space after the word; in "uno" and "treinta y unghhg" the substring "un" is followed by more letters instead of being at the end of the string; and in "treinta yun" the substring "un" is not preceded by one or more spaces.
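
To make these expectations checkable, the seven cases can be written as input/expected pairs and run against any candidate substitution. This is only a sketch: the names `CASES`, `attempt` and `check` are made up for illustration, and `attempt` simply reproduces the two substitutions from the code above, in the same order.

```python
import re

# (input, expected output) pairs taken from the examples above
CASES = [
    ("treinta y un", "treinta y un "),
    ("veinti un ", "veinti un "),
    ("un", "un "),
    ("un ", "un "),
    ("uno", "uno"),
    ("treinta yun", "treinta yun"),
    ("treinta y unghhg", "treinta y unghhg"),
]

def attempt(line):
    # The two substitutions from the code above, applied in the same order
    line = re.sub(" un", " un ", line)
    return re.sub("(?<!^)un", "un ", line)

def check(transform):
    # Hypothetical helper: collect the cases where the result differs from the expectation
    return [(src, transform(src), want) for src, want in CASES
            if transform(src) != want]

failures = check(attempt)
for src, got, want in failures:
    print(repr(src), "->", repr(got), "expected", repr(want))
```

Neither pattern looks at what follows "un", and the second one only excludes a match at the very start of the string, which is why most of the cases come back different from what is wanted.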

Answer

If you want to use regex, you can use \bun$, which checks that the last whole word in the string is un and that nothing follows it in the string. If that is the case, a space is added to the end of the string:

import re

lines = ["treinta y un", "veinti un ", "un", "un ",
         "uno", "treinta yun", "treinta y unghhg"]

# \b requires a word boundary before "un"; $ anchors the match at the end of the string
result = [re.sub(r'\bun$', 'un ', line) for line in lines]

Output:

[
 'treinta y un ',
 'veinti un ',
 'un ',
 'un ',
 'uno',
 'treinta yun',
 'treinta y unghhg'
]
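
The question also asks that un be at the start of the string or preceded by whitespace, and that it may sit before a newline and not only at the very end. Below is a minimal sketch of one way to encode those two conditions literally, assuming "before n" in the question means before a newline character. The lookbehind `(?<!\S)`, the `re.MULTILINE` flag and the helper name `add_space` are assumptions added for illustration and are not part of the answer above.

```python
import re

# (?<!\S) : "un" is at the start of the string or preceded by whitespace
# $       : with re.MULTILINE, the end of the string or just before a newline
PATTERN = re.compile(r'(?<!\S)un$', re.MULTILINE)

def add_space(text):
    # Illustrative helper: append a space after every matching "un"
    return PATTERN.sub('un ', text)

lines = ["treinta y un", "veinti un ", "un", "un ",
         "uno", "treinta yun", "treinta y unghhg"]

result = [add_space(line) for line in lines]
print(result)

# A multi-line string: both line-final "un" words get a trailing space
print(repr(add_space("treinta y un\nveinti un")))
```

One difference from the word-boundary version: `\b` also matches after punctuation, so a string such as "y-un" would be changed by `\bun$`, whereas `(?<!\S)` leaves it alone because the character before "un" is not whitespace.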

Answer