I want to remove text and special characters from multiple columns (which contain NaN values), keeping only the numbers. Below is one of my columns. Any help will be appreciated!
OTHER
0 $300.00
1 $850.00
2 $550.00
3 nan
4 $1,250.00
5 $81.00
Expected outcome:
OTHER
0 300.00
1 850.00
2 550.00
3
4 1250.00
5 81.00
Answer
Start by filling the NaN values with an empty string, then extract the numeric part with a regex, fill the NaN values that the extraction leaves for rows with no match, and finally remove the commas.
>>> (df['OTHER'].fillna('')
     .astype(str)
     .str.extract(r'(\d+(?:,\d+)?(?:\.\d+)?)', expand=False)
     .fillna('')
     .str.replace(',', ''))
0 300.00
1 850.00
2 550.00
3
4 1250.00
5 81.00
Name: OTHER, dtype: object
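For anyone who wants to run the regex approach end to end, here is a minimal sketch. It assumes the column holds strings like the ones in the question, and it rebuilds that sample data by hand. The pattern is written as a raw string with the backslashes that \d and \. require. The names pattern and cleaned are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (row 3 is a real NaN)
df = pd.DataFrame(
    {"OTHER": ["$300.00", "$850.00", "$550.00", np.nan, "$1,250.00", "$81.00"]}
)

# digits, an optional ",digits" group, and an optional ".digits" part
pattern = r"(\d+(?:,\d+)?(?:\.\d+)?)"

cleaned = (
    df["OTHER"]
    .fillna("")                          # NaN -> "" so every value is a string
    .astype(str)
    .str.extract(pattern, expand=False)  # rows with no match become NaN
    .fillna("")                          # blank out the non-matching rows
    .str.replace(",", "", regex=False)   # drop the thousands separator
)

print(cleaned.tolist())
```

Note that the pattern allows only one comma group, so a value such as 1,250,000.00 would be cut short. Changing (?:,\d+)? to (?:,\d+)* should cover that case.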
For the data above, a simpler approach should also work: keep only the characters that are digits or the decimal point (.):
>>> (df['OTHER']
     .fillna('')
     .astype(str)
     .apply(lambda x: ''.join(i for i in x if i.isdigit() or i == '.')))
0 300.00
1 850.00
2 550.00
3
4 1250.00
5 81.00
Name: OTHER, dtype: object
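As a side note, the same character filter can probably be written without apply, using a regex replacement that deletes everything except digits and dots. This is a different technique from the one in the answer, shown here only as a sketch on the question's sample data. The names by_chars and by_regex are made up for the comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"OTHER": ["$300.00", "$850.00", "$550.00", np.nan, "$1,250.00", "$81.00"]}
)

as_text = df["OTHER"].fillna("").astype(str)

# The answer's approach: keep characters that are digits or "."
by_chars = as_text.apply(
    lambda x: "".join(ch for ch in x if ch.isdigit() or ch == ".")
)

# Alternative: delete every character that is neither a digit nor "."
by_regex = as_text.str.replace(r"[^\d.]", "", regex=True)

print(by_chars.tolist())
print(by_regex.tolist())
```

On this data both give the same strings. They could differ on unusual Unicode digits, since str.isdigit and \d do not accept exactly the same set of characters. Both would also merge stray dots or digits from elsewhere in a cell, so the filter only suits columns that hold a single amount per cell.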
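To do the same on multiple columns, loop over them and apply either of the two methods above (the second one is shown here):
for col in df:
    df[col] = (df[col].fillna('')
               .astype(str)
               .apply(lambda x: ''.join(i for i in x if i.isdigit() or i == '.')))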
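Looping with for col in df touches every column, including ones that are meant to stay as text. A minimal sketch of limiting the loop to chosen columns follows. The helper keep_digits_and_dot, the list money_cols, and the NAME and RENT columns are all hypothetical and only there for illustration.

```python
import numpy as np
import pandas as pd


def keep_digits_and_dot(series):
    """Return the series as strings with everything except digits and '.' removed."""
    return (
        series.fillna("")
        .astype(str)
        .apply(lambda x: "".join(ch for ch in x if ch.isdigit() or ch == "."))
    )


df = pd.DataFrame(
    {
        "NAME": ["a", "b", "c"],                  # hypothetical text column, left untouched
        "OTHER": ["$300.00", np.nan, "$1,250.00"],
        "RENT": ["$1,000.50", "$75.00", np.nan],  # hypothetical second money column
    }
)

# Only clean the columns that are known to hold amounts
money_cols = ["OTHER", "RENT"]
for col in money_cols:
    df[col] = keep_digits_and_dot(df[col])

print(df)
```

The cleaned columns still contain strings. If real numbers are needed afterwards, pd.to_numeric(df[col], errors='coerce') should convert them and turn the empty strings back into NaN.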