Skip to content
Advertisement

Conditional counts in pandas group by

I have the following dataframe enter image description here

I want to reformat it in the following way:

  1. Group by name/account/monthly periods
  2. Average (mean) balance to two decimal places
  3. Average (mean) for transactions to no decimal places
  4. Count of days where balance < 0
  5. Count of days where Balance > Max credit

So the I apply the following function to make a Series of all the aggregations, and use the Series index as labels for the new columns:

 def f(x):
    d = {}
    d['Avg_Transactions'] = x['Transaction'].mean().round()
    d['Avg_Balance'] = x['Balance'].mean().round(2)
    d['Zero_Balance_days'] = (x['Balance'] < 0).count() 
    d['Over_Credit_days'] = (x['Balance'] > x['Max Credit']).count()
    return pd.Series(d, index=['Avg_Transactions', 'Avg_Balance', 'Zero_Balance_days', 'Over_Credit_days'])

month = df.groupby(['Account','Name', 'Month']).apply(f)

Which gives the following output: enter image description here

But I am getting the conditional counts wrong and instead counting all days. What would be the proper way to write these?

Advertisement

Answer

You can try replace your 2 lines with .count() to .sum(), as follows:

d['Zero_Balance_days'] = (x['Balance'] < 0).sum() 
d['Over_Credit_days'] = (x['Balance'] > x['Max Credit']).sum()

.count() returns number of non-NA/null observations in the Series of boolean index while both True/False are not NA/null and will be counted as well.

.sum() returns the sum of entries of True since True is interpreted as 1 while False is interpreted as 0 in the summation.

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement