I have a dataframe with 4 columns: an ID and three categories that results fell into.
    <80%  80-90  >90
id
1      2      4    4
2      3      6    1
3      7      0    3
I would like to convert it to percentages, i.e.:
    <80%  80-90  >90
id
1    20%    40%  40%
2    30%    60%  10%
3    70%     0%  30%
This seems like it should be within pandas' capabilities, but I just can't figure it out.
Thanks in advance!
Answer
You can do this with the basic pandas methods .div and .sum, using the axis argument to make sure the calculations happen the way you want:
cols = ['<80%', '80-90', '>90']
df[cols] = df[cols].div(df[cols].sum(axis=1), axis=0).multiply(100)
- Calculate the sum of each row (df[cols].sum(axis=1)). axis=1 makes the summation occur across the rows, rather than down the columns.
- Divide the dataframe by the resulting series (df[cols].div(df[cols].sum(axis=1), axis=0)). axis=0 aligns the series with the row index, so each row is divided by its own total.
- To finish, multiply the results by 100 so they are percentages between 0 and 100 instead of proportions between 0 and 1 (or you can skip this step and store them as proportions).
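Putting the steps together, here is a minimal end-to-end sketch using the numbers from the question. It assumes id is the dataframe's index rather than a regular column (selecting cols works either way). The names row_totals, pct and pct_str are just illustrative, and the final string-formatting step is an optional extra that is not part of the answer above.

```python
import pandas as pd

# Sample data rebuilt from the question; "id" is assumed to be the index.
df = pd.DataFrame(
    {"<80%": [2, 3, 7], "80-90": [4, 6, 0], ">90": [4, 1, 3]},
    index=pd.Index([1, 2, 3], name="id"),
)

cols = ["<80%", "80-90", ">90"]

# Step 1: one total per row (axis=1 sums across the columns of each row).
row_totals = df[cols].sum(axis=1)

# Steps 2 and 3: divide each row by its own total (axis=0 aligns the
# series with the row index), then scale to the 0-100 range.
pct = df[cols].div(row_totals, axis=0).multiply(100)

# Optional (illustrative): render as strings such as "20%".
# Rounding first avoids floating-point noise before the integer cast.
pct_str = pct.round(0).astype(int).astype(str) + "%"

print(pct.round(1))
print(pct_str)
```

One thing to watch: a row whose total is 0 produces NaN after the division, so the integer cast in the optional formatting step would fail for such rows. Keeping the numeric pct frame and formatting only for display is usually the safer choice.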