Find and remove a slightly different substring from a string

Question

I want to find out whether a substring is contained in a string and, if so, remove it without touching the rest of the string. The problem is that the substring I search for is not exactly what appears in the string. In particular, the difference comes from Spanish accented vowels.
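To make the mismatch concrete (this illustration reuses the test string from the accepted answer): a plain `in` check fails because an accented letter is a different code point from its unaccented form. Unicode NFD decomposition splits the accented letter into a base letter plus a combining mark, and dropping the marks (category `Mn`) is what makes an accent-insensitive comparison possible.

```python
import unicodedata

text = "I'm júst a tésting stríng"

# A plain substring test is accent-sensitive: "é" is not "e".
print("testing" in text)  # False

# NFD splits the precomposed "é" (U+00E9) into "e" + a combining accent.
decomposed = unicodedata.normalize("NFD", "\u00e9")
print([unicodedata.name(ch) for ch in decomposed])
# ['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT']

# Dropping the combining marks (Unicode category "Mn") leaves the base letters.
stripped = "".join(
    ch for ch in unicodedata.normalize("NFD", text) if unicodedata.category(ch) != "Mn"
)
print(stripped)  # I'm just a testing string
print("testing" in stripped.lower())  # True
```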

Accepted Answer

This normalize() method might be a little overkill, and the code from @Harpe at https://stackoverflow.com/a/71591988/218663 may work just as well. Here I am going to break the original string into "words" and then join all the non-matching words back into a string:

```python
import unicodedata

def normalize(text):
    # NFD splits accented letters into base letter + combining mark;
    # encoding to ASCII with 'ignore' drops the marks; then lower-case.
    return unicodedata.normalize("NFD", text).encode('ascii', 'ignore').decode('utf-8').lower()

myString = "I'm júst a tésting stríng"
substring = "TESTING"

newString = " ".join(word for word in myString.split(" ") if normalize(word) != normalize(substring))
print(newString)
```

giving you:

```
I'm júst a stríng
```

If your "substring" could be multi-word, I might think about switching strategies to a regex:

```python
import re
import unicodedata

def normalize(text):
    return unicodedata.normalize("NFD", text).encode('ascii', 'ignore').decode('utf-8').lower()

myString = "I'm júst á tésting stríng"
substring = "A TESTING"

# Raw f-string so "\s" reaches the regex engine intact; re.escape guards
# against regex metacharacters in the search term. The span found in the
# normalized string is reused on myString, which assumes both have the same length.
match = re.search(rf"\s{re.escape(normalize(substring))}\s", normalize(myString))
if match:
    found_at = match.span()
    first_part = myString[:found_at[0]]
    second_part = myString[found_at[1]:]
    print(f"{first_part} {second_part}".strip())
```

I think that will give you:

```
I'm júst stríng
```
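A caveat with the regex version: it takes the match span from `normalize(myString)` and applies it to `myString`, so it only works while both strings have the same length. `encode('ascii', 'ignore')` silently drops characters that have no ASCII base letter (for example `ß`, `€` or a curly apostrophe `’`), and then the spans drift; with `"Straße tésting done"` the cut lands one character off. It also requires whitespace on both sides of the match, so a term at the very start or end of the string is never found. Below is a minimal sketch of one way around both issues, assuming you only need to ignore combining accent marks and letter case: fold the text one character at a time and record which original index each folded character came from. The helper names `fold_with_index` and `remove_accent_insensitive` are hypothetical, only the first match is removed, and "whole word" is approximated with `(?<!\w)` / `(?!\w)` lookarounds.

```python
import re
import unicodedata


def fold_with_index(text):
    """Lower-case `text` and strip combining accent marks.

    Returns (folded, positions), where positions[i] is the index in `text`
    of the character that produced folded[i].
    """
    folded = []
    positions = []
    for i, ch in enumerate(text):
        # NFD splits e.g. "ñ" into "n" + U+0303 COMBINING TILDE.
        for part in unicodedata.normalize("NFD", ch):
            if unicodedata.category(part) == "Mn":
                continue  # drop the accent mark, keep the base letter
            for low in part.lower():
                folded.append(low)
                positions.append(i)
    return "".join(folded), positions


def remove_accent_insensitive(text, target):
    """Remove the first whole-word occurrence of `target` from `text`,
    ignoring accents and case. Returns `text` unchanged if not found."""
    folded_text, positions = fold_with_index(text)
    folded_target, _ = fold_with_index(target)
    if not folded_target:
        return text
    # Not preceded/followed by a letter or digit; also matches at the ends.
    pattern = rf"(?<!\w){re.escape(folded_target)}(?!\w)"
    match = re.search(pattern, folded_text)
    if match is None:
        return text
    start = positions[match.start()]
    end = positions[match.end() - 1] + 1
    # If the input was already decomposed, swallow trailing accent marks too.
    while end < len(text) and unicodedata.category(text[end]) == "Mn":
        end += 1
    before = text[:start].rstrip()
    after = text[end:].lstrip()
    return " ".join(part for part in (before, after) if part)


print(remove_accent_insensitive("I'm júst á tésting stríng", "A TESTING"))
# I'm júst stríng
print(remove_accent_insensitive("Straße tésting done", "TESTING"))
# Straße done
print(remove_accent_insensitive("El niño juega", "NINO"))
# El juega
```

Because only the accent marks are removed (instead of everything non-ASCII), characters such as `ß` stay in the folded text, so every folded position still maps back to the right place in the original string.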


Answer