I have a data frame of predictive model output that is separated into tertiles (low, medium, and high risk). I want to calculate the percentage of people in each risk group who have the outcome of interest.
    import pandas as pd

    # Sample rows only; the real data has more rows than shown here
    data = {'risk_group': ["medium", "low", "high", "low", "high", "high"],
            'outcome': [1, 0, 1, 0, 1, 1]}

    df = pd.DataFrame(data, columns=['risk_group', 'outcome'])
The desired output is a data frame along these lines:
    low : 12% w/ outcome
    medium : 34% w/ outcome
    high : 78% w/ outcome
Answer
Use:
    df.groupby('risk_group').outcome.apply(lambda x: x.sum()/x.size * 100)
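Because `outcome` is coded 0/1, the sum divided by the group size is just the group mean, so the same result can probably be had without `apply`. Below is a minimal sketch of that variant. The sample values and the column name `pct_with_outcome` are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical data (not the asker's): 4 people per risk group,
# chosen so that each group has a different outcome rate.
df = pd.DataFrame({
    "risk_group": ["low"] * 4 + ["medium"] * 4 + ["high"] * 4,
    "outcome": [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0],
})

# The mean of a 0/1 column is the share of 1s, so mean * 100 is the percentage.
pct = (
    df.groupby("risk_group", sort=False)["outcome"]  # sort=False keeps first-appearance order
      .mean()
      .mul(100)
      .rename("pct_with_outcome")  # illustrative column name
      .reset_index()               # turn the group labels back into a column
)

print(pct)
```

`sort=False` is used here only because the hypothetical rows already appear in low, medium, high order. With the default `sort=True` the groups would come back alphabetically (high, low, medium).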