Regex: remove a prefix of 1-3 letters that comes before a dot

If I have an input like this:

input  = 'AB. Hello word.'
the output should be:
output = 'Hello word.'

Another example:

input = 'AB′. Hello word'
output = 'Hello word'

I want code that generalizes to any group of letters in any language. This is my code:

import re

text = 'A. Hello word.'
text = re.sub(r'A\. \w{1,2}\.*', '', text)
print(text)

output = 'llo word.'

That way I can replace ‘A’ with any other letter, but for some reason it isn’t working well.
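Assuming the pattern in this attempt is `r'A\. \w{1,2}\.*'`, printing what it actually matched shows where the missing "He" went. The variable names `pattern`, `match` and `result` are illustrative only.

```python
import re

text = 'A. Hello word.'
pattern = r'A\. \w{1,2}\.*'  # assumed form of the first attempt

# \w{1,2} is greedy: after "A. " it takes the first two letters of "Hello",
# and \.* then matches zero dots, so the whole match is "A. He".
match = re.search(pattern, text)
print(repr(match.group()))  # 'A. He'

# Deleting that match leaves only the remainder of the word.
result = re.sub(pattern, '', text)
print(result)  # llo word.
```

In other words, `\w{1,2}` describes letters *after* the dot, which are the ones that should be kept; the part to delete is only the prefix up to and including the dot (plus the following space).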

I also tried this:

text = 'Ab. Hello word.'
text = re.sub(r'A+\. \w{1,2}\.*', '', text)
print(text)

output = 'Ab. Hello word.'

but it doesn’t work either.
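A minimal sketch of why the second attempt leaves the text unchanged, assuming its pattern is `r'A+\. \w{1,2}\.*'`. `A+` means "one or more capital A characters", not "one or more letters", so it cannot match `Ab`. A character class such as `[^\W\d_]` (a word character that is neither a digit nor an underscore, i.e. a letter) is one way to say "any letter"; with `str` patterns Python treats it as Unicode-aware. The name `letters_prefix` and the sample `'ÄB. Hello word.'` are hypothetical, added for illustration.

```python
import re

text = 'Ab. Hello word.'

# A+ matches only the "A"; the following \. then has to match "b", fails,
# and re.sub has nothing to replace.
print(re.search(r'A+\. \w{1,2}\.*', text))  # None

# [^\W\d_]{1,3}: one to three letters (Unicode-aware for str patterns),
# then a literal dot and any whitespace after it.
letters_prefix = re.compile(r'^[^\W\d_]{1,3}\.\s*')

for sample in ['Ab. Hello word.', 'ÄB. Hello word.', 'AB′. Hello word']:
    print(repr(letters_prefix.sub('', sample)))
# 'Hello word.'
# 'Hello word.'
# 'AB′. Hello word'   (unchanged: ′ is not a letter)
```

Because `′` (U+2032 PRIME) is punctuation rather than a letter, a prefix like `AB′` needs a broader class, for example `[^.]`.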

Answer

Try this:

import re

regex = r"^[^.]{1,3}\.\s*"

test_str = ("AB. Hello word.\n"
    "AB′. Hello word.\n"
    "A. Hello word.\n"
    "Ab. Hello word.\n")

subst = ""

# count=0 replaces every match; re.MULTILINE makes ^ match at the start of each line
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)

print(result)

Output:

Hello word.
Hello word.
Hello word.
Hello word.
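The same pattern can be written with `re.VERBOSE` so each part is labelled, and applied to a list of separate strings instead of one multi-line string. This is a sketch; `PREFIX`, `samples` and `cleaned` are illustrative names, not part of the answer above.

```python
import re

# re.VERBOSE ignores whitespace and # comments in the pattern
# (except inside a character class).
PREFIX = re.compile(r"""
    ^          # start of the string (or of each line, with re.MULTILINE)
    [^.]{1,3}  # one to three characters that are not a dot
    \.         # a literal dot
    \s*        # any whitespace following the dot
""", re.VERBOSE)

samples = ['AB. Hello word.', 'AB′. Hello word', 'A. Hello word.', 'Ab. Hello word.']

# Each string is handled on its own, so re.MULTILINE is not needed;
# count=1 removes only the leading prefix.
cleaned = [PREFIX.sub('', s, count=1) for s in samples]
print(cleaned)
# ['Hello word.', 'Hello word', 'Hello word.', 'Hello word.']
```

Note that `[^.]` accepts any character except a dot, including digits and spaces, so `'12. Hello word.'` also becomes `'Hello word.'`. If only letters should count, a narrower class is needed.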

User contributions licensed under: CC BY-SA