I have a DataFrame in which I am trying to clean the data in all columns. There are some anomalies in the data, like this:
"[n], [ta], [cb]"
Basically, I want to replace anything in square brackets, brackets included, with a space.
I have this:
df['data1'] = df['data1'].str.replace(r"[(n|ta|cb)]", " ")
This works, except that the square brackets are still in the data, just empty. I'm not sure how to remove the brackets together with the letters inside them. I'm also not sure whether there is a quicker way to do this for all columns instead of one at a time.
Answer
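It works for me. The reason you were left with empty square brackets is that the brackets in your pattern are not escaped with a backslash. Unescaped, [...] defines a character class, so "[(n|ta|cb)]" matches any single one of the characters (, n, |, t, a, c, b, ) and replaces each letter on its own, never touching the brackets. Writing "\[" and "\]" makes the brackets literal, and "(n|ta|cb)" then matches one of the three tokens between them. Also pass regex=True: since pandas 2.0, str.replace treats the pattern as a plain string by default.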
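To see the difference directly, here is a small sketch using Python's built-in re module on the same sample string (pandas' regex replacement uses the same regular-expression syntax). It lists what each pattern actually matches:

```python
import re

s = "[n], [ta], [cb]"

# Unescaped: [...] is a character class, so this matches ONE character
# from the set ( n | t a c b ) at a time; the brackets themselves never match.
class_matches = re.findall(r"[(n|ta|cb)]", s)

# Escaped: \[ and \] are literal brackets, and (n|ta|cb) is a group of
# alternatives, so each whole token such as "[ta]" matches.
literal_matches = [m.group(0) for m in re.finditer(r"\[(n|ta|cb)\]", s)]

print(class_matches)    # ['n', 't', 'a', 'c', 'b']
print(literal_matches)  # ['[n]', '[ta]', '[cb]']
```

Only the escaped pattern matches the brackets, which is why only it removes them.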
import pandas as pd

df = pd.DataFrame({'data1': ['[n], [ta], [cb]']})
Without escape characters:
df['data1'].str.replace(r"[(n|ta|cb)]", " ", regex=True)
# row 0 -> "[ ], [  ], [  ]"   (each letter replaced on its own; brackets remain)
With escape characters:
df['data1'].str.replace(r"\[(n|ta|cb)\]", " ", regex=True)
# row 0 -> " ,  ,  "   (each bracketed token replaced by one space)
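The escaped pattern only removes the three tokens n, ta and cb. Since the question asks to drop anything in square brackets, a more general pattern is "\[[^\]]*\]": a literal "[", then any run of characters other than "]", then a literal "]". A minimal sketch with plain re.sub, assuming brackets are never nested; the extra tokens in the sample string ("[xyz]", "keep this", "[]") are made up for illustration. The same pattern string can be passed to str.replace(..., regex=True).

```python
import re

# \[      literal opening bracket
# [^\]]*  any run of characters other than a closing bracket (possibly empty)
# \]      literal closing bracket
# Assumes brackets are not nested: "[a[b]]" would leave a stray "]".
ANY_BRACKETED = r"\[[^\]]*\]"

s = "[n], [ta], [cb], [xyz], keep this, []"
cleaned = re.sub(ANY_BRACKETED, " ", s)
print(repr(cleaned))
```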
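To apply this to all columns, use a for loop (this assumes every column holds strings; the .str accessor raises an AttributeError on numeric columns):
for col in df.columns:
    df[col] = df[col].str.replace(r"\[(n|ta|cb)\]", " ", regex=True)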
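An alternative that avoids both the loop and the string-only restriction of .str is DataFrame.replace with regex=True, which substitutes the pattern inside every string cell of the frame at once. A minimal sketch; the columns and values below are hypothetical:

```python
import pandas as pd

# Hypothetical frame: two text columns plus a numeric one.
df = pd.DataFrame({
    "data1": ["[n], [ta], [cb]", "plain text"],
    "data2": ["a [ta] b", "[cb]"],
    "count": [1, 2],
})

# With regex=True, every match of the pattern inside a string cell is
# replaced; non-string cells (the ints in "count") are left unchanged,
# so no per-column loop or .str accessor is needed.
cleaned = df.replace(r"\[(n|ta|cb)\]", " ", regex=True)
print(cleaned)
```

DataFrame.replace returns a new DataFrame by default, so assign the result back (df = df.replace(...)) to keep the change.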