escaping square brackets from string in dataframe



I have a dataframe where I am trying to clean the data in all columns. There are some anomalies in the data like this: "[n], [ta], [cb]". Basically, anything in square brackets I want to ignore and replace with a space. I have this:

df['data1'] = df['data1'].str.replace(r"[(n|ta|cb)]", " ", regex=True)

This works, except that the square brackets are still in the data, just empty. I'm not sure how to remove the brackets as well as the letters inside them. Is there also a quicker way to do this on all columns at once, rather than one column at a time?

Answer

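It works for me once the brackets are escaped. The reason you still got empty square brackets is that the unescaped pattern [(n|ta|cb)] is a regex character class: it matches any single character from the set ( n | t a c b ), so each letter is replaced on its own and the brackets are never matched. Escaping the brackets with a backslash (\[ and \]) makes them match literally, and (n|ta|cb) then works as an alternation of whole tokens. Note also that since pandas 2.0, Series.str.replace treats the pattern as a plain string unless you pass regex=True.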
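The same difference can be reproduced with Python's built-in re module alone, outside pandas. This is a small sketch using the sample string from the question; it only illustrates how the two patterns are parsed:

```python
import re

text = "[n], [ta], [cb]"

# Unescaped: [(n|ta|cb)] is a character class matching ONE character from
# the set ( n | t a c b ), so each letter is replaced individually and the
# brackets themselves are never matched.
unescaped = re.sub(r"[(n|ta|cb)]", " ", text)

# Escaped: \[ and \] match literal brackets and (n|ta|cb) is an alternation
# group, so each whole token such as "[ta]" is replaced by one space.
escaped = re.sub(r"\[(n|ta|cb)\]", " ", text)

print(repr(unescaped))  # '[ ], [  ], [  ]'
print(repr(escaped))    # ' ,  ,  '
```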

import pandas as pd

df = pd.DataFrame({'data1':['[n], [ta], [cb]']})

Without escape characters:

df['data1'].str.replace(r"[(n|ta|cb)]", " ", regex=True)

# 0    [ ], [  ], [  ]
# Name: data1, dtype: object

With escape characters:

df['data1'].str.replace(r"\[(n|ta|cb)\]", " ", regex=True)

# 0     ,  ,  
# Name: data1, dtype: object
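The question says "basically anything in square brackets", so the tokens may not always be n, ta or cb. A possible generalization, sketched here as an assumption rather than part of the original answer, is a negated character class that matches any bracketed run. The second sample row is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"data1": ["[n], [ta], [cb]", "keep this [xyz] text"]})

# \[      a literal opening bracket
# [^\]]*  any run of characters that are not a closing bracket
# \]      a literal closing bracket
# This removes any bracketed token, not only [n], [ta] and [cb].
cleaned = df["data1"].str.replace(r"\[[^\]]*\]", " ", regex=True)

print(cleaned.tolist())  # [' ,  ,  ', 'keep this   text']
```

Using [^\]]* instead of .* keeps each match inside one pair of brackets; a greedy .* would swallow everything from the first [ to the last ].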

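To apply this to all columns, use a for loop. Restrict it to the object (string) columns, because the .str accessor raises an error on numeric columns:

for col in df.select_dtypes(include="object").columns:
    df[col] = df[col].str.replace(r"\[(n|ta|cb)\]", " ", regex=True)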
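If you would rather avoid the explicit loop, DataFrame.replace with regex=True performs the substitution inside every string cell of every column in a single call and leaves non-string values alone. This is a sketch on a made-up frame; the column names data2 and count are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "data1": ["[n], [ta], [cb]", "plain"],
    "data2": ["x [cb] y", "[n]"],
    "count": [1, 2],  # numeric column: not touched by the regex
})

# regex=True makes replace() run a regex substitution inside each string
# cell (substring replacement, not whole-cell matching).
cleaned = df.replace(r"\[(n|ta|cb)\]", " ", regex=True)
print(cleaned)
```

This is more concise than the loop, but it still applies the regex cell by cell, so it should not be expected to be dramatically faster on large frames.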


Source: Stack Overflow