Ddate       Test Name  T/F
01-01-2019  Haem       T
02-01-2019  Haem       T
03-01-2019  Haem       T
04-01-2019  Haem       F
05-01-2019  Haem       F
01-01-2019  CBC        T
02-01-2019  CBC        T
03-01-2019  CBC        T
04-01-2019  CBC        F
05-01-2019  CBC        F
01-01-2019  KFT        T
02-01-2019  KFT        T
03-01-2019  KFT        T
04-01-2019  KFT        F
05-01-2019  KFT        F
I have the sample dataframe above. I want to calculate the percentage of True values for each date. I can do it as shown below, but I feel it can be done with groupby + agg. Is that possible?
My attempt:
import pandas as pd
import numpy as np
df['T/F'] = df['T/F'].map({'T':1, 'F':0})
df['T'] = np.where(df['T/F']==1, 1, 0)
df['F'] = 1 - df['T']
df = df.groupby('Ddate').agg(
    {
        'T': sum,
        'F': sum,
    }
)
df['Total'] = df['T'] + df['F']
df['per T'] = df['T'] / df['Total'] * 100
Answer
You can do the groupby like this (the snippets here call the date column Date; use Ddate if that is your column's name):
df['T/F'].eq('T').groupby([df['Date']]).mean()
Output:
Date
01-01-2019    1.0
02-01-2019    1.0
03-01-2019    1.0
04-01-2019    0.0
05-01-2019    0.0
Name: T/F, dtype: float64
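Since the question asks specifically about groupby + agg, here is a minimal, self-contained sketch of both spellings. It rebuilds the sample frame with the column names from the question's table (Ddate, Test Name, T/F); the names is_true and pct_true are illustrative and not from the original post.

```python
import pandas as pd

# Rebuild the sample frame; column names are taken from the question's table.
dates = ["01-01-2019", "02-01-2019", "03-01-2019", "04-01-2019", "05-01-2019"]
flags = ["T", "T", "T", "F", "F"]
df = pd.DataFrame(
    [(d, test, f) for test in ["Haem", "CBC", "KFT"] for d, f in zip(dates, flags)],
    columns=["Ddate", "Test Name", "T/F"],
)

# The mean of a boolean mask per date is the fraction of True values.
frac_true = df["T/F"].eq("T").groupby(df["Ddate"]).mean()

# The same result written as groupby + agg (named aggregation).
# `is_true` and `pct_true` are hypothetical names chosen for this sketch.
pct_true = (
    df.assign(is_true=df["T/F"].eq("T"))
      .groupby("Ddate")
      .agg(pct_true=("is_true", "mean"))
)

print(frac_true)
print(pct_true)
```

The first form returns a Series and the second a one-column DataFrame; both hold fractions on the 0-1 scale.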
You can get both percentages for T and F with crosstab:
pd.crosstab(df.Date, df['T/F'], normalize='index')
Output:
T/F           F    T
Date
01-01-2019  0.0  1.0
02-01-2019  0.0  1.0
03-01-2019  0.0  1.0
04-01-2019  1.0  0.0
05-01-2019  1.0  0.0
Note 1: An extra comment on your code: the counts per date can be obtained with:
counts = pd.crosstab(df['Date'], df['T/F'])
Then the percentage of T can be:
counts['per T'] = counts['T']/counts.sum(axis=1)
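As a runnable sketch of Note 1, the block below builds the counts table and derives the share of T from it. Two details are my own additions rather than part of the answer: the reindex call, which assumes you want both an F and a T column even when one label never occurs, and computing the row total before adding the derived column.

```python
import pandas as pd

# Sample frame; column names are taken from the question's table.
df = pd.DataFrame({
    "Ddate": ["01-01-2019", "02-01-2019", "03-01-2019", "04-01-2019", "05-01-2019"] * 3,
    "Test Name": ["Haem"] * 5 + ["CBC"] * 5 + ["KFT"] * 5,
    "T/F": ["T", "T", "T", "F", "F"] * 3,
})

# Raw counts of each label per date.
counts = pd.crosstab(df["Ddate"], df["T/F"])

# Assumption: keep both columns even if one label is absent from the data.
counts = counts.reindex(columns=["F", "T"], fill_value=0)

# Take the row total first so the derived column is never included in it.
total = counts.sum(axis=1)
counts["per T"] = counts["T"] / total

print(counts)
```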
Note 2: Don't do groupby().agg({'col1': sum, 'col2': sum}) because:
- sum is Python native, and is slow
- agg is slow(er), and only useful when you want to perform different operations on different columns.
Do: groupby()[['col1','col2']].sum()
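To illustrate Note 2 on the question's own T and F indicator columns, here is a small sketch comparing the direct column selection with agg. Passing the string "sum" to agg is my suggestion for cases where agg is still wanted; it is not something the answer spells out.

```python
import pandas as pd

# Sample frame; column names are taken from the question's table.
df = pd.DataFrame({
    "Ddate": ["01-01-2019", "02-01-2019", "03-01-2019", "04-01-2019", "05-01-2019"] * 3,
    "Test Name": ["Haem"] * 5 + ["CBC"] * 5 + ["KFT"] * 5,
    "T/F": ["T", "T", "T", "F", "F"] * 3,
})

# Indicator columns, as in the question's attempt.
df["T"] = df["T/F"].eq("T").astype(int)
df["F"] = 1 - df["T"]

# Preferred: select the columns, then call the groupby sum directly.
direct = df.groupby("Ddate")[["T", "F"]].sum()

# If agg is needed anyway, string names refer to pandas' own reductions.
via_agg = df.groupby("Ddate").agg({"T": "sum", "F": "sum"})

# Both routes should produce the same table.
print(direct.equals(via_agg))

# Percentage of T on the 0-100 scale, as in the question.
direct["per T"] = direct["T"] / (direct["T"] + direct["F"]) * 100
print(direct)
```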
Note 3: All of the solutions above give the percentage on a 0-1 scale. If you want a 0-100 scale, multiply the result by 100.
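For completeness, a short sketch of the rescaling, assuming the question's column names. Series.mul and DataFrame.mul are used here only so the scaling chains cleanly; a plain * 100 works equally well.

```python
import pandas as pd

# Sample frame; column names are taken from the question's table.
df = pd.DataFrame({
    "Ddate": ["01-01-2019", "02-01-2019", "03-01-2019", "04-01-2019", "05-01-2019"] * 3,
    "Test Name": ["Haem"] * 5 + ["CBC"] * 5 + ["KFT"] * 5,
    "T/F": ["T", "T", "T", "F", "F"] * 3,
})

# Percentage of T per date on the 0-100 scale.
pct_true = df["T/F"].eq("T").groupby(df["Ddate"]).mean().mul(100)

# Percentages of both F and T per date on the 0-100 scale.
pct_both = pd.crosstab(df["Ddate"], df["T/F"], normalize="index").mul(100)

print(pct_true)
print(pct_both)
```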