I’m dealing with text data and having a problem removing multiple backslashes. I found out that re.sub works quite well, so I wrote the code below to remove \r, \n, \t, \f and \v:
temp_string = re.sub(r"[\t\n\r\f\v]"," ",string)
However, the code above can’t deal with the string below.
string = '\\r \\nLove the filtered water and crushed ice in the door.'
So I tried this:
temp_string = re.sub(r"[\\t\\n\\r\\f\\v]"," ",string)
temp_string
But the result is wrong: it erases every v, f, n and so on in the text.
I don’t know why this happens.
I found out that using .replace("\\r"," ") works!
However, this way I would have to chain calls like:
.replace("\\r"," ") .replace("\r"," ") .replace("\r"," ") .replace("r"," ") .replace("\\t"," ") …
I’m pretty sure there is a better way.
Answer
You can’t define a sequence of characters inside a character class. A character class always matches a single character. So, [\\t\\n\\r\\f\\v] is equal to [\\tnrfv] and matches either a backslash or one of the letters t, n, r, f or v.
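A quick way to see this, using the string from the question: the long character class and the collapsed one `[\\tnrfv]` produce exactly the same result, and both replace ordinary letters as well as the backslashes. The variable names `s`, `broken` and `same` are only illustrative.

```python
import re

# The sample string from the question: a literal backslash + r, a space,
# a literal backslash + n, then the text.
s = '\\r \\nLove the filtered water and crushed ice in the door.'

# Inside [...] every item is a single character, so the repeated
# backslashes collapse into the set {backslash, t, n, r, f, v}.
broken = re.sub(r"[\\t\\n\\r\\f\\v]", " ", s)
same = re.sub(r"[\\tnrfv]", " ", s)

print(broken == same)  # both patterns behave identically
print(broken)          # plain letters such as the v in "Love" are gone too
```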
To match a sequence of chars, you need to list them one after another. To match the two-character text \n (a backslash followed by n) you need the pattern \\n (r'\\n'). If you need to match either the \n or the \v text, use \\n|\\v, (?:\\n|\\v) or, better, \\[nv].
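As a small sketch, the three spellings for "a backslash followed by n or v" can be compared side by side. The sample text `sample` is made up for illustration; note that the ordinary letters n and v inside the words are left alone.

```python
import re

# Made-up sample: literal backslash + n and literal backslash + v
# embedded in text (these are two-character sequences, not control chars).
sample = 'one\\ntwo\\vthree'

patterns = [
    r'\\n|\\v',      # plain alternation
    r'(?:\\n|\\v)',  # alternation in a non-capturing group
    r'\\[nv]',       # backslash followed by one char from the class
]

results = [re.sub(p, ' ', sample) for p in patterns]
print(results)  # ['one two three', 'one two three', 'one two three']
```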
So, if you want to match a backslash followed by a letter from the rtnfv set, or an actual "\t" (TAB), "\n" (line feed), "\r" (carriage return), "\f" (form feed) or "\v" (vertical tab) character, you can use any of these:
r'\\[rtnfv]|[\t\n\r\f\v]'
r'(?:\\[rtnfv]|[\t\n\r\f\v])'
r'(?:\\[rtnfv]|[\t\n\r\f\v])+'
The last one matches one or more consecutive occurrences of the patterns that may be mixed with each other.
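Putting it together, here is a minimal sketch that applies the "one or more" pattern to the string from the question. The names `ESCAPE_RE` and `strip_escapes` are hypothetical helpers, not part of the re module.

```python
import re

# Either a literal backslash followed by r, t, n, f or v, or a real
# whitespace control character; "+" merges consecutive hits into one match.
ESCAPE_RE = re.compile(r'(?:\\[rtnfv]|[\t\n\r\f\v])+')

def strip_escapes(text):
    """Replace each run of escape-like sequences with a single space."""
    return ESCAPE_RE.sub(" ", text)

s = '\\r \\nLove the filtered water and crushed ice in the door.'
print(repr(strip_escapes(s)))
# '   Love the filtered water and crushed ice in the door.'
```

The space between the two sequences in the sample is not part of the pattern, so the two matches are replaced separately and three spaces remain at the start; calling .strip() on the result would remove them if they are unwanted.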