I am loading data from a Google Sheet (26 columns) into a pandas DataFrame. Four columns, A, B, C and D, contain percentage values (e.g. 15.6%), and some of their rows contain N/A values. I am trying to convert these columns to numbers so that I can use them in other calculations, but I am having trouble doing so. With my code every row comes out as 1.0000, but I want, for example, 0.156 (from 15.6%). This is the code I am using:
df = df.replace(r'^\s*$', np.nan, regex=True)
df = df.replace(r'%', '', regex=True)
cols = ['A','B','C','D']
df[cols] = df[cols].str.astype('float').div(100)
I have been trying to find a solution for the past three hours, but nothing seems to work.
I suspect the main problem is the N/A values, which astype() cannot convert.
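A quick way to test that suspicion, using hypothetical values rather than the real sheet: `astype(float)` raises a ValueError on the string "N/A", whereas `pd.to_numeric(..., errors="coerce")` maps it to NaN. Separately, the `.str` accessor exists only on a Series, so `df[cols].str` on a DataFrame fails before `astype` is ever reached. A small sketch checking both points:

```python
import pandas as pd

# Hypothetical values standing in for one sheet column after "%" was stripped.
s = pd.Series(["15.6", "N/A", "7"])

# astype(float) raises on any string it cannot parse, such as "N/A".
astype_failed = False
try:
    s.astype(float)
except ValueError as exc:
    astype_failed = True
    print("astype failed:", exc)

# errors="coerce" turns unparseable entries into NaN instead of raising.
converted = pd.to_numeric(s, errors="coerce")
print(converted.tolist())

# The .str accessor is defined on Series only; a DataFrame does not have it.
has_str = hasattr(pd.DataFrame({"A": ["1"]}), "str")
print("DataFrame has .str:", has_str)
```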
Sample dataset:
Solution (thanks to @PraysonW.Daniel):
df[cols] = df[cols].apply(pd.to_numeric, errors="coerce").div(100)
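A minimal end-to-end sketch of the question's cleanup followed by this one-liner, on a small made-up frame (the values are hypothetical; the real sheet has 26 columns). Blank or whitespace-only cells are matched with the regex r"^\s*$", "%" is stripped, and pd.to_numeric with errors="coerce" turns whatever is left unparseable, such as "N/A", into NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: percentage strings, "N/A" and blank cells.
df = pd.DataFrame({
    "A": ["15.6%", "N/A", "20%"],
    "B": ["3%", "", "100%"],
    "C": ["0.5%", "12%", "N/A"],
    "D": ["N/A", " ", "1%"],
})
cols = ["A", "B", "C", "D"]

# Blank or whitespace-only cells -> NaN (note the backslash in \s).
df = df.replace(r"^\s*$", np.nan, regex=True)
# Drop the percent sign, leaving numeric-looking strings.
df = df.replace(r"%", "", regex=True)
# Parse each column (unparseable -> NaN), then scale percentages to fractions.
df[cols] = df[cols].apply(pd.to_numeric, errors="coerce").div(100)

print(df)
print(df.dtypes)
```

Here "15.6%" ends up as approximately 0.156, and the four columns are float64, so they can be used directly in further arithmetic.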
Answer
These are the steps I took:
import pandas as pd
…
dataf.loc[:, ["A","B"]] = (dataf
.loc[:, ["A","B"]]
.replace(r"%","", regex=True)
.apply(pd.to_numeric, errors="coerce")
.div(100)
)
First I select the columns of interest, then replace % with an empty string, coerce the values to numeric (anything that cannot be parsed becomes NaN) and finally divide by 100.
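The same three steps traced on a single column, using column A of the sample data ("75%", "10%", ""), so that each intermediate result is visible. This is only an illustration of the chain, applied to one Series instead of a column selection:

```python
import pandas as pd

# Column A from the answer's sample data.
col = pd.Series(["75%", "10%", ""])

step1 = col.replace(r"%", "", regex=True)       # "75", "10", ""
step2 = pd.to_numeric(step1, errors="coerce")   # 75.0, 10.0, NaN for the empty string
step3 = step2.div(100)                          # 0.75, 0.1, NaN

print(step1.tolist())
print(step2.tolist())
print(step3.tolist())
```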
Data and Results
# generate data
dataf = pd.DataFrame({"id":["lgloo",]*3,"A":["75%", "10%",""], "B":["20%","0", "0"]})
…
dataf
      id     A    B
0  lgloo  0.75  0.2
1  lgloo   0.1    0
2  lgloo   NaN    0
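Before using the result in other calculations it may be worth checking the column dtypes. In pandas 2.x, assigning through .loc[:, cols] tries to write into the existing columns first, so A and B can stay object dtype even though they now hold floats; assigning with plain column indexing (dataf[cols] = ...) replaces the columns and should yield float64. A sketch comparing the two on the same sample data (the helper names make_sample and to_fraction are hypothetical, and the printed .loc dtypes may differ between pandas versions):

```python
import pandas as pd

def make_sample():
    # Same sample data as above.
    return pd.DataFrame({"id": ["lgloo"] * 3, "A": ["75%", "10%", ""], "B": ["20%", "0", "0"]})

def to_fraction(frame):
    # The answer's chain: strip "%", coerce to numbers, divide by 100.
    return frame.replace(r"%", "", regex=True).apply(pd.to_numeric, errors="coerce").div(100)

cols = ["A", "B"]

via_loc = make_sample()
via_loc.loc[:, cols] = to_fraction(via_loc.loc[:, cols])
print(via_loc.dtypes)      # A and B may remain object, depending on the pandas version

via_setitem = make_sample()
via_setitem[cols] = to_fraction(via_setitem[cols])
print(via_setitem.dtypes)  # A and B are float64
```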