Replace special characters in pandas dataframe from a string of special characters

I have created a pandas dataframe called df using this code:

import numpy as np
import pandas as pd

ds = {'col1': ['1', '3/', '4'], 'col2': ['A', '!B', '@C']}

df = pd.DataFrame(data=ds)

The dataframe looks like this:

print(df)

  col1 col2
0    1    A
1   3/   !B
2    4   @C

The columns contain some special characters (/, ! and @) that I need to replace with a blank space.

Now, I have a list of special characters:

listOfSpecialChars = '¬`!"£$£#/,.+*><@|"'

How can I replace any of the special characters listed in listOfSpecialChars with a blank space, any time I encounter them at any point in a dataframe, for any columns? At the moment I am dealing with a 100K-record dataframe with 560 columns, so I can’t write a piece of code for each variable.

Answer

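You can build a regex character class from listOfSpecialChars and use apply with str.replace on every column. The code below replaces each match with an empty string; pass ' ' as the replacement if you want a blank space instead: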

import re
chars = ''.join(map(re.escape, listOfSpecialChars))

df2 = df.apply(lambda c: c.str.replace(f'[{chars}]', '', regex=True))
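
With 560 columns, some are likely not to hold strings, and the .str accessor raises an error on numeric columns. Below is a minimal, self-contained sketch of the same apply approach that skips such columns. The column col3 and the helper clean_column are hypothetical, added only for illustration, and how columns with mixed or missing values are classified by is_string_dtype may vary with the pandas version.

```python
import re

import pandas as pd
from pandas.api.types import is_string_dtype

listOfSpecialChars = '¬`!"£$£#/,.+*><@|"'

# Escape each character so regex metacharacters (., +, *, |) are taken literally,
# then wrap them in a character class.
pattern = '[' + ''.join(map(re.escape, listOfSpecialChars)) + ']'

# The question's data plus a hypothetical numeric column (col3) for illustration.
df = pd.DataFrame({'col1': ['1', '3/', '4'],
                   'col2': ['A', '!B', '@C'],
                   'col3': [10, 20, 30]})


def clean_column(col):
    """Strip the special characters from string columns; return others unchanged."""
    if is_string_dtype(col):
        return col.str.replace(pattern, '', regex=True)
    return col


df2 = df.apply(clean_column)
print(df2)
```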

Alternatively, stack the columns into a single Series, replace once, and unstack:

df2 = df.stack().str.replace(f'[{chars}]', '', regex=True).unstack()

Output:

  col1 col2
0    1    A
1    3    B
2    4    C
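
Note that the output above has the characters removed, whereas the question asks for a blank space. As a sketch of that variant, the snippet below uses DataFrame.replace with regex=True, which is a different technique from the apply and stack/unstack code in this answer, and passes ' ' as the replacement value. The name pattern is introduced here for illustration.

```python
import re

import pandas as pd

listOfSpecialChars = '¬`!"£$£#/,.+*><@|"'
pattern = '[' + ''.join(map(re.escape, listOfSpecialChars)) + ']'

df = pd.DataFrame({'col1': ['1', '3/', '4'], 'col2': ['A', '!B', '@C']})

# Replace every listed character with a single blank space, in all columns at once.
df_spaces = df.replace(pattern, ' ', regex=True)
print(df_spaces)
```

If the padding is not wanted afterwards, calling str.strip() on the affected columns would remove the leading and trailing spaces.
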
User contributions licensed under: CC BY-SA