I want to remove strings and special characters from multiple columns (which contain NaN values), so that only the numbers remain. Below is one of my columns. Any help will be appreciated!
   OTHER
0  $300.00
1  $850.00
2  $550.00
3  nan
4  $1,250.00
5  $81.00
Expected outcome:
   OTHER
0  300.00
1  850.00
2  550.00
3
4  1250.00
5  81.00
Answer
Start by filling the NaN values with an empty string, then extract the numbers with a regex, then fill the NaN values that the extraction leaves for rows without a match, and finally replace the commas with an empty string.
>>> (df['OTHER'].fillna('')
...     .astype(str)
...     .str.extract(r'(\d+(?:,\d+)?(?:\.\d+)?)', expand=False)
...     .fillna('')
...     .str.replace(',', ''))
0     300.00
1     850.00
2     550.00
3
4    1250.00
5      81.00
Name: OTHER, dtype: object
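To see what the extraction step captures, here is a small sketch that runs the pattern (written with its backslashes, as a raw string) through Python's standard `re` module on a few sample values. The helper name `extract_number` and the `MULTI_GROUP` variant are made up for illustration and are not part of the answer. The point it shows: `(?:,\d+)?` allows only one comma group, so a value such as `$1,250,000.00` would be cut short. Using `(?:,\d+)*` instead is one possible way around that, assuming your data contains such values.

```python
import re

# Pattern from the answer: digits, one optional ",ddd" group, an optional decimal part.
ORIGINAL = re.compile(r'(\d+(?:,\d+)?(?:\.\d+)?)')
# Hypothetical variant (an assumption, not from the answer): any number of comma groups.
MULTI_GROUP = re.compile(r'(\d+(?:,\d+)*(?:\.\d+)?)')

def extract_number(text, pattern=ORIGINAL):
    """Return the first number-like substring with commas removed, or '' if none."""
    match = pattern.search(text)
    return match.group(1).replace(',', '') if match else ''

samples = ['$300.00', 'nan', '$1,250.00', '$1,250,000.00']
for s in samples:
    # Print the result of the original pattern next to the multi-group variant.
    print(repr(s), '->', repr(extract_number(s)), '|', repr(extract_number(s, MULTI_GROUP)))
```

For the column in the question both patterns give the same result, since no value has more than one comma.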
For the data above, the following should also work. It keeps only the characters that are digits or a decimal point:
>>> (df['OTHER']
...     .fillna('')
...     .astype(str)
...     .apply(lambda x: ''.join(i for i in x if i.isdigit() or i == '.')))
0     300.00
1     850.00
2     550.00
3
4    1250.00
5      81.00
Name: OTHER, dtype: object
To do the same thing on multiple columns, try this:
for col in df:
    df[col] = df[col]....  # rest of the code from either of the two methods above
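As a rough end-to-end sketch of that loop, the example below applies the digit-or-decimal-point approach to every column of a small frame. The `OTHER` values come from the question. The `FEES` column, the helper name `keep_digits_and_dots`, and the choice to work on a copy called `cleaned` are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Sample data: 'OTHER' is taken from the question; 'FEES' is a made-up second column.
df = pd.DataFrame({
    'OTHER': ['$300.00', '$850.00', '$550.00', np.nan, '$1,250.00', '$81.00'],
    'FEES': ['USD 12.50', np.nan, '$1,000', '7', 'n/a', '$0.99'],
})

def keep_digits_and_dots(value):
    """Keep only digit characters and decimal points, dropping everything else."""
    return ''.join(ch for ch in value if ch.isdigit() or ch == '.')

# Work on a copy so the original frame stays available for comparison.
cleaned = df.copy()
for col in cleaned:
    cleaned[col] = cleaned[col].fillna('').astype(str).apply(keep_digits_and_dots)

print(cleaned)
```

The result still holds strings, with blanks where no digits were found. If you need a numeric dtype, `pd.to_numeric(cleaned[col], errors='coerce')` is one option, and the blank entries should then become NaN. Keep in mind that this approach keeps every digit and dot in a cell, so it only suits columns where each cell holds a single number.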