I’m working with text data and having trouble removing multiple backslashes. I found that re.sub works quite well, so I wrote the code below to remove a backslash followed by r, n, t, f, or v:
temp_string = re.sub(r"[\t\n\r\f\v]", " ", string)
However, the code above can’t deal with the string below.
string = '\\r \\nLove the filtered water and crushed ice in the door.'
So I tried this:
temp_string = re.sub(r"[\\t\\n\\r\\f\\v]", " ", string)
temp_string
But the result is not what I expected, and I don’t know why. It erases every v, f, n, and so on.
I found that using .replace("\\r", " ") works!
However, that way I would have to chain calls like this:
.replace("\\r", " ") .replace("\r", " ") .replace("\r", " ") .replace("r", " ") .replace("\\t", " ") …
I’m pretty sure there is a better way.
Answer
You can’t define a sequence of characters inside a character class. A character class matches a single character. So, [\\t\\n\\r\\f\\v] is equivalent to [\\tnrfv] and matches either a backslash or one of the letters t, n, r, f, or v.
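To see this on the string from the question, here is a minimal sketch. The variable names char_class, matched, and cleaned are only illustrative.

```python
import re

# Sample from the question: the text contains literal backslashes,
# i.e. the two-character sequences backslash + "r" and backslash + "n".
s = '\\r \\nLove the filtered water and crushed ice in the door.'

# In a raw string, [\\t\\n\\r\\f\\v] is a set of single characters:
# a backslash plus the letters t, n, r, f, v (repeats add nothing).
char_class = re.compile(r"[\\t\\n\\r\\f\\v]")

# Every distinct character the class matched in the sample.
matched = set(char_class.findall(s))

# Each matched character becomes a space, so ordinary letters are lost too.
cleaned = char_class.sub(" ", s)

print(sorted(matched))
print(cleaned)
```

The class treats the backslash and each letter as separate alternatives, which is why the letters inside ordinary words get replaced as well.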
To match a sequence of characters, you need to list them one after another. To match the two-character string \n (a backslash followed by n), use the pattern \\n (r'\\n'). To match either the \n or the \v text, use \\n|\\v, (?:\\n|\\v), or, better, \\[nv].
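As a small illustration, the sketch below runs two of these forms against a made-up sample string. The names text, pair, and alt are assumptions for the example, not part of the question.

```python
import re

# Hypothetical sample: literal backslash + letter pairs, not control characters.
# The actual characters are: a \ n b \ v c
text = 'a\\nb\\vc'

# One literal backslash followed by either "n" or "v".
pair = re.compile(r'\\[nv]')

# The same thing spelled out as an alternation.
alt = re.compile(r'\\n|\\v')

found = pair.findall(text)
same = alt.findall(text)
replaced = pair.sub(' ', text)

print(found, same, replaced)
```

Both patterns find the same two-character sequences. The \\[nv] form is shorter and keeps the backslash in one place.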
So, if you want to match a backslash followed by a letter from the rtnfv set, or one of the "\t" (TAB), "\n" (line feed), "\r" (carriage return), "\f" (form feed), or "\v" (vertical tab) characters, you can use any of these:
r'\\[rtnfv]|[\t\n\r\f\v]'
r'(?:\\[rtnfv]|[\t\n\r\f\v])'
r'(?:\\[rtnfv]|[\t\n\r\f\v])+'
The last one matches one or more consecutive occurrences of these patterns, which may be mixed with each other.
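Applied to the string from the question, the repeated-group pattern could be used roughly as follows. This is only a sketch: the helper name strip_escapes and the constant ESCAPES are made up for the example.

```python
import re

# Matches literal backslash+letter pairs OR the real control characters.
# The trailing + collapses an unbroken run of matches into one replacement.
ESCAPES = re.compile(r'(?:\\[rtnfv]|[\t\n\r\f\v])+')

def strip_escapes(text, repl=' '):
    """Replace each run of escape-like sequences with repl (hypothetical helper)."""
    return ESCAPES.sub(repl, text)

string = '\\r \\nLove the filtered water and crushed ice in the door.'
result = strip_escapes(string)

# A literal backslash + "t" directly followed by a real TAB is a single run.
mixed = strip_escapes('a\\t\tb')

print(repr(result))
print(repr(mixed))
```

The space between \r and \n in the sample is not part of the pattern, so the two sequences are replaced separately. Calling .strip() on the result, or adding a space to the second character class, would tidy the leading whitespace if that matters.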