I have the following dataframe I want to reformat it in the following way: Group by name/account/monthly periods Average (mean) balance to two decimal places Average (mean) for transactions to no decimal places Count of days where balance < 0 Count of days where Balance > Max credit So the I apply the following function to make a Series of

Conditional counts in pandas group by

I have the following dataframe

I want to reformat it in the following way:

Group by name/account/monthly periods
Average (mean) balance to two decimal places
Average (mean) for transactions to no decimal places
Count of days where balance < 0
Count of days where Balance > Max credit

So the I apply the following function to make a Series of all the aggregations, and use the Series index as labels for the new columns:

 def f(x):
    d = {}
    d['Avg_Transactions'] = x['Transaction'].mean().round()
    d['Avg_Balance'] = x['Balance'].mean().round(2)
    d['Zero_Balance_days'] = (x['Balance'] < 0).count() 
    d['Over_Credit_days'] = (x['Balance'] > x['Max Credit']).count()
    return pd.Series(d, index=['Avg_Transactions', 'Avg_Balance', 'Zero_Balance_days', 'Over_Credit_days'])

month = df.groupby(['Account','Name', 'Month']).apply(f)

JavaScript
​x
 
 def f(x):
    d = {}
    d['Avg_Transactions'] = x['Transaction'].mean().round()
    d['Avg_Balance'] = x['Balance'].mean().round(2)
    d['Zero_Balance_days'] = (x['Balance'] < 0).count() 
    d['Over_Credit_days'] = (x['Balance'] > x['Max Credit']).count()
    return pd.Series(d, index=['Avg_Transactions', 'Avg_Balance', 'Zero_Balance_days', 'Over_Credit_days'])
​
month = df.groupby(['Account','Name', 'Month']).apply(f)
​

Which gives the following output:

But I am getting the conditional counts wrong and instead counting all days. What would be the proper way to write these?

Answer

You can try replace your 2 lines with .count() to .sum(), as follows:

d['Zero_Balance_days'] = (x['Balance'] < 0).sum() 
d['Over_Credit_days'] = (x['Balance'] > x['Max Credit']).sum()

JavaScript
 
d['Zero_Balance_days'] = (x['Balance'] < 0).sum() 
d['Over_Credit_days'] = (x['Balance'] > x['Max Credit']).sum()
​

.count() returns number of non-NA/null observations in the Series of boolean index while both True/False are not NA/null and will be counted as well.

.sum() returns the sum of entries of True since True is interpreted as 1 while False is interpreted as 0 in the summation.

Advertisement

Answer