
Python regular expression: how to deal with multiple backslashes

I’m working with text data and having trouble removing multiple backslashes. I found that re.sub works quite well, so I wrote the code below to remove \r, \n, \t, \f and \v:

temp_string = re.sub(r"[\t\n\r\f\v]"," ",string)

However, the code above can’t deal with the string below.

string = '\\r \\nLove the filtered water and crushed ice in the door.'

So I wrote this instead:

temp_string = re.sub(r"[\\t\\n\\r\\f\\v]"," ",string)
temp_string

But the result is not what I expected, and I don’t know why: it erases every v, f, n and so on.

I found that using .replace("\\r"," ") works! However, this way I would have to chain calls like:

.replace("\\r"," ")

.replace("\r"," ")

.replace("\r"," ")

.replace("r"," ")

.replace("\\t"," ")

…

I’m pretty sure there is a better way.


Answer

You can’t define a sequence of characters inside a character class. A character class matches a single character. So, [\\t\\n\\r\\f\\v] is equal to [\\tnrfv] and matches either a backslash or one of the letters t, n, r, f or v.
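A minimal sketch of that behaviour, using the sample string from the question (the variable names `broken` and `equivalent` are only illustrative):

```python
import re

# The sample string from the question: literal backslash + letter pairs.
string = '\\r \\nLove the filtered water and crushed ice in the door.'

# Inside [...] every item is a single character, so this class means
# "a backslash, or one of the letters t, n, r, f, v".
broken = re.sub(r"[\\t\\n\\r\\f\\v]", " ", string)

# The shorter class contains exactly the same set of characters.
equivalent = re.sub(r"[\\tnrfv]", " ", string)

print(broken)                 # every t, n, r, f, v letter has been blanked out
print(broken == equivalent)   # both patterns behave identically
```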

To match a sequence of chars, you need to write them one after another. To match the two-char text \n (a backslash followed by n), use the \\n pattern (r'\\n'). If you need to match either the \n or the \v text, you can use \\n|\\v, (?:\\n|\\v) or, better, \\[nv].
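As a quick check, the three spellings can be compared on a small made-up string (the `text` value here is an assumption for illustration, not from the question):

```python
import re

# Contains the two-character texts \n and \v, not a real newline or vertical tab.
text = 'a\\nb\\vc'

# Three ways to say "a backslash followed by n or v".
patterns = [r'\\n|\\v', r'(?:\\n|\\v)', r'\\[nv]']
results = [re.findall(p, text) for p in patterns]

for pattern, found in zip(patterns, results):
    print(pattern, found)

# A real newline character is a different thing and is not matched.
no_match = re.findall(r'\\[nv]', 'a\nb')
```

The character-class form is usually the easiest to extend, since adding another letter does not require another alternative.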

So, if you want to match a backslash followed by a letter from the rtnfv set, or an actual "\t" (TAB), "\n" (line feed), "\r" (carriage return), "\f" (form feed) or "\v" (vertical tab) char, you can use

r'\\[rtnfv]|[\t\n\r\f\v]'
r'(?:\\[rtnfv]|[\t\n\r\f\v])'
r'(?:\\[rtnfv]|[\t\n\r\f\v])+'

The last one matches one or more consecutive occurrences of either alternative, in any mix.
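Here is a sketch of how this could be applied to the string from the question. The names `PATTERN` and `clean` are hypothetical helpers introduced only for this example:

```python
import re

# A backslash followed by r/t/n/f/v, or a real whitespace control character,
# repeated one or more times so adjacent matches collapse into one.
PATTERN = re.compile(r'(?:\\[rtnfv]|[\t\n\r\f\v])+')

def clean(text):
    """Replace each run of matches with a single space."""
    return PATTERN.sub(' ', text)

string = '\\r \\nLove the filtered water and crushed ice in the door.'
cleaned = clean(string)

print(repr(cleaned))
print(cleaned.strip())  # Love the filtered water and crushed ice in the door.
```

Note that the ordinary space between \r and \n in the sample is not part of the pattern, so the result still has leading spaces. If that matters, `cleaned.strip()` or `' '.join(cleaned.split())` can tidy it up afterwards.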
