I am loading data from a Google Sheet (26 columns) into a pandas DataFrame. Four columns, A, B, C and D, hold percentage values as strings (e.g. 15.6%), and some rows contain N/A. I am trying to convert these columns to numbers so that I can use them in other calculations, but I am having trouble doing so. With my code the output for every row is 1.0000, whereas I want 0.156 (from 15.6%). This is the code I am using:
df = df.replace(r'^s*$', np.nan, regex=True)
df = df.replace(r'%', '', regex=True)
cols = ['A','B','C','D']
df[cols] = df[cols].str.astype('float').div(100)
I have been trying to find a solution for the past 3 hours, but nothing seems to work. I suspect the main problem is the N/A values, which astype() cannot convert.
Sample dataset:
Solution: Thanks to @PraysonW.Daniel
df[cols] = df[cols].apply(pd.to_numeric, errors="coerce").div(100)
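To illustrate why the N/A rows matter, here is a minimal sketch on a made-up Series (the values and the variable names are assumptions, not the asker's data). It suggests that astype("float") raises on a string like "N/A", while pd.to_numeric with errors="coerce" turns it into NaN instead:

```python
import pandas as pd

# Hypothetical sample values; not taken from the original sheet.
s = pd.Series(["15.6%", "N/A", "3%"])

# Remove the literal percent sign from every string.
stripped = s.str.replace("%", "", regex=False)

# astype("float") cannot parse the string "N/A" and raises ValueError.
try:
    stripped.astype("float")
    astype_failed = False
except ValueError:
    astype_failed = True

# errors="coerce" replaces unparseable values with NaN instead of raising.
converted = pd.to_numeric(stripped, errors="coerce").div(100)
print(astype_failed)
print(converted)
```

The first value comes out at roughly 0.156 and the N/A row becomes NaN, so later calculations can skip it.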
Answer
These are the steps I took
import pandas as pd
…
dataf.loc[:, ["A","B"]] = (dataf
    .loc[:, ["A","B"]]
    .replace(r"%","", regex=True)
    .apply(pd.to_numeric, errors="coerce")
    .div(100)
)
First I selected the columns we are interested in, then replaced % with an empty string, coerced the values to numeric, and finally divided by 100.
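The steps above can also be wrapped in a small helper. This is only a sketch: the function name percent_columns_to_fraction and the sample frame are hypothetical, and it assigns with df[cols] = ... (as in the asker's own solution line) on a copy rather than through .loc:

```python
import pandas as pd

def percent_columns_to_fraction(df, cols):
    """Return a copy of df with percent strings in cols converted to fractions."""
    out = df.copy()
    out[cols] = (
        out[cols]
        .replace(r"%", "", regex=True)           # strip the percent sign
        .apply(pd.to_numeric, errors="coerce")   # "", "N/A" and similar become NaN
        .div(100)                                # 75 -> 0.75
    )
    return out

# Hypothetical sample data with an empty string and an N/A value.
sample = pd.DataFrame({
    "id": ["x", "y", "z"],
    "A": ["75%", "10%", ""],
    "B": ["20%", "N/A", "0"],
})
result = percent_columns_to_fraction(sample, ["A", "B"])
print(result)
print(result.dtypes)
```

Working on a copy leaves the original strings untouched, which makes it easy to compare before and after while debugging.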
Data and Results
# generate data
dataf = pd.DataFrame({"id":["lgloo",]*3,"A":["75%", "10%",""], "B":["20%","0", "0"]})
…
dataf
      id     A    B
0  lgloo  0.75  0.2
1  lgloo   0.1    0
2  lgloo   NaN    0