Ddate       Test Name  T/F
01-01-2019  Haem       T
02-01-2019  Haem       T
03-01-2019  Haem       T
04-01-2019  Haem       F
05-01-2019  Haem       F
01-01-2019  CBC        T
02-01-2019  CBC        T
03-01-2019  CBC        T
04-01-2019  CBC        F
05-01-2019  CBC        F
01-01-2019  KFT        T
02-01-2019  KFT        T
03-01-2019  KFT        T
04-01-2019  KFT        F
05-01-2019  KFT        F
I have the sample dataframe above and want to calculate the percentage of True values for each date. My approach below works, but I feel it could be done with groupby + agg. Is that possible?
My attempt:
import pandas as pd
import numpy as np

df['T/F'] = df['T/F'].map({'T': 1, 'F': 0})
df['T'] = np.where(df['T/F'] == 1, 1, 0)
df['F'] = 1 - df['T']
df = df.groupby('Ddate').agg(
    {
        'T': sum,
        'F': sum,
    }
)
df['Total'] = df['T'] + df['F']
df['per T'] = df['T'] / df['Total'] * 100
Answer
You can do groupby like this:
df['T/F'].eq('T').groupby([df['Date']]).mean()
Output:
Date
01-01-2019    1.0
02-01-2019    1.0
03-01-2019    1.0
04-01-2019    0.0
05-01-2019    0.0
Name: T/F, dtype: float64
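For reference, here is a self-contained sketch of the same idea that rebuilds the sample data first. It assumes the date column is named Date, as in the snippet above (the question's header spells it Ddate); the names is_true and frac_true are only illustrative.

```python
import pandas as pd

# Rebuild the sample data: three tests per date, T on the first three dates.
dates = ["01-01-2019", "02-01-2019", "03-01-2019", "04-01-2019", "05-01-2019"]
flags = ["T", "T", "T", "F", "F"]
df = pd.DataFrame(
    [(d, test, f) for test in ["Haem", "CBC", "KFT"] for d, f in zip(dates, flags)],
    columns=["Date", "Test Name", "T/F"],
)

# eq('T') yields a boolean Series; the mean of booleans is the fraction of True.
is_true = df["T/F"].eq("T")

# Group the boolean Series by the date column and average within each date.
frac_true = is_true.groupby(df["Date"]).mean()
print(frac_true)
```

This works because True counts as 1 and False as 0, so no intermediate T and F columns are needed.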
You can get both percentages for T and F with crosstab:
pd.crosstab(df.Date, df['T/F'], normalize='index')
Output:
T/F           F    T
Date
01-01-2019  0.0  1.0
02-01-2019  0.0  1.0
03-01-2019  0.0  1.0
04-01-2019  1.0  0.0
05-01-2019  1.0  0.0
Note 1: An extra comment on your code: the counts per date can be obtained with:
counts = pd.crosstab(df['Date'], df['T/F'])
Then the percentage of T can be:
counts['per T'] = counts['T']/counts.sum(axis=1)
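A small sketch of Note 1 on made-up mixed data (the rows below are hypothetical, chosen so the fractions are not all 0 or 1). One detail worth noting: once 'per T' is added, counts.sum(axis=1) would include it, so this version computes the row totals from the T and F columns only.

```python
import pandas as pd

# Hypothetical data, for illustration only: mixed T/F values on each date.
df = pd.DataFrame({
    "Date": ["01-01-2019", "01-01-2019", "04-01-2019", "04-01-2019", "04-01-2019"],
    "T/F": ["T", "F", "F", "F", "T"],
})

# Rows are dates, columns are the distinct T/F values, cells are counts.
counts = pd.crosstab(df["Date"], df["T/F"])

# Take totals from the count columns only, before adding derived columns.
totals = counts[["T", "F"]].sum(axis=1)
counts["per T"] = counts["T"] / totals
print(counts)
```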
Note 2: Don’t do groupby().agg({'col1': sum, 'col2': sum}) because:
- sum is Python native, and is slow
- agg is slow(er), and only useful when you want to perform different operations on different columns.
Do: groupby()[['col1','col2']].sum()
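To illustrate Note 2, the sketch below checks that selecting the columns and calling sum() directly gives the same table as the dict-based agg. The data is hypothetical, and the string alias "sum" is used in agg here instead of the builtin sum; that substitution is my choice for the example, not part of the original code.

```python
import pandas as pd

# Hypothetical data with the T and F indicator columns from the question.
df = pd.DataFrame({
    "Ddate": ["01-01-2019", "01-01-2019", "04-01-2019", "04-01-2019"],
    "T": [1, 1, 0, 1],
    "F": [0, 0, 1, 0],
})

# Dict-based agg: one aggregation specified per column.
via_agg = df.groupby("Ddate").agg({"T": "sum", "F": "sum"})

# Direct form: select the columns, then apply a single aggregation.
direct = df.groupby("Ddate")[["T", "F"]].sum()

print(via_agg.equals(direct))
```

The dict form is mainly worth it when the columns need different aggregations, for example a sum on one and a mean on another.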
Note 3: All of the solutions above give the percentage on a 0-1 scale. If you want a 0-100 scale, multiply the result by 100.
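A minimal sketch of the 0-100 version, matching the 'per T' column from the question. The data is hypothetical, and the variable name per_t is only for illustration.

```python
import pandas as pd

# Hypothetical data: all T on the first date, half T on the second.
df = pd.DataFrame({
    "Date": ["01-01-2019", "01-01-2019", "04-01-2019", "04-01-2019"],
    "T/F": ["T", "T", "F", "T"],
})

# Fraction of T per date, scaled to a percentage and given a readable name.
per_t = df["T/F"].eq("T").groupby(df["Date"]).mean().mul(100).rename("per T")
print(per_t)
```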