Remove strings and special characters from multiple columns

I want to remove strings and special characters from multiple columns (which contain nan values). I only want the numbers to be present. Below is one of my columns. Any help will be appreciated!

    OTHER
    0   $300.00
    1   $850.00
    2   $550.00
    3   nan
    4   $1,250.00
    5   $81.00

Expected outcome:

    OTHER
    0   300.00
    1   850.00
    2   550.00
    3   
    4   1250.00
    5   81.00

Answer

Start by filling the NaN values with an empty string and casting to str, then extract the number with a regex. Rows with no match come back as NaN, so fill those with an empty string again, and finally remove the commas.

>>> (df['OTHER'].fillna('')
    .astype(str)
    .str.extract(r'(\d+(?:,\d+)?(?:\.\d+)?)', expand=False)
    .fillna('')
    .str.replace(',', ''))

0     300.00
1     850.00
2     550.00
3           
4    1250.00
5      81.00
Name: OTHER, dtype: object
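
If you want to try this without your own data, here is a self-contained sketch. The sample frame is rebuilt from the values in the question; passing `regex=False` to `str.replace` is my own addition, so that the comma is treated as a literal on any pandas version.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (row 3 is a real NaN here).
df = pd.DataFrame(
    {"OTHER": ["$300.00", "$850.00", "$550.00", np.nan, "$1,250.00", "$81.00"]}
)

# Digits, optionally one ",ddd" group, optionally a decimal part.
pattern = r"(\d+(?:,\d+)?(?:\.\d+)?)"

cleaned = (
    df["OTHER"]
    .fillna("")                          # NaN -> ''
    .astype(str)
    .str.extract(pattern, expand=False)  # rows without a match become NaN
    .fillna("")                          # ...so blank them out again
    .str.replace(",", "", regex=False)   # drop the thousands separator
)

print(cleaned.tolist())
# ['300.00', '850.00', '550.00', '', '1250.00', '81.00']
```

Note that the pattern allows only one comma group, so a value like `$1,250,000.00` would be cut off at `1,250`. Changing `(?:,\d+)?` to `(?:,\d+)*` should cover that case.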

For the data above, the following also works: keep only the characters that are digits or a decimal point (.):

>>> (df['OTHER']
    .fillna('')
    .astype(str)
    .apply(lambda x: ''.join(i for i in x if i.isdigit() or i=='.')))

0     300.00
1     850.00
2     550.00
3           
4    1250.00
5      81.00
Name: OTHER, dtype: object
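
Both methods leave you with strings (dtype object), not numbers. If you need actual floats afterwards, one option is `pd.to_numeric` with `errors="coerce"`, which turns the empty strings back into NaN. This is a sketch that goes one step beyond what the question asked for; the sample Series is rebuilt from the question.

```python
import numpy as np
import pandas as pd

# Sample column rebuilt from the question.
s = pd.Series(
    ["$300.00", "$850.00", "$550.00", np.nan, "$1,250.00", "$81.00"],
    name="OTHER",
)

# Keep only digits and the decimal point, as in the second method.
digits_only = (
    s.fillna("")
    .astype(str)
    .apply(lambda x: "".join(ch for ch in x if ch.isdigit() or ch == "."))
)

# Convert to numbers; values that cannot be parsed (here: '') become NaN.
numeric = pd.to_numeric(digits_only, errors="coerce")
print(numeric)
```

The trade-off is that the blank cell in the expected output becomes NaN again, because a numeric column cannot hold an empty string.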

To do the same thing on multiple columns, try this:

for col in df:
    df[col] = df[col]....  # rest of the code from either of the two methods above
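
Spelled out, the loop could look something like the sketch below. The helper name `keep_number`, the second column `FEES` and the list `money_cols` are made up for illustration. I loop over an explicit list instead of `for col in df` so that columns which are not amounts stay untouched; use `for col in df` if every column should be cleaned.

```python
import numpy as np
import pandas as pd


def keep_number(series: pd.Series) -> pd.Series:
    """Return the first number found in each value as a string ('' if none)."""
    return (
        series.fillna("")
        .astype(str)
        .str.extract(r"(\d+(?:,\d+)?(?:\.\d+)?)", expand=False)
        .fillna("")
        .str.replace(",", "", regex=False)
    )


df = pd.DataFrame(
    {
        "OTHER": ["$300.00", np.nan, "$1,250.00"],
        "FEES": ["$12.50", "$1,000.00", np.nan],  # hypothetical second column
    }
)

money_cols = ["OTHER", "FEES"]  # hypothetical: the columns you want cleaned
for col in money_cols:
    df[col] = keep_number(df[col])

print(df)
```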

User contributions licensed under: CC BY-SA