I have the following dataframe
I want to reformat it in the following way:
- Group by name/account/monthly periods
- Average (mean) balance to two decimal places
- Average (mean) for transactions to no decimal places
- Count of days where balance < 0
- Count of days where Balance > Max credit
So the I apply the following function to make a Series of all the aggregations, and use the Series index as labels for the new columns:
def f(x): d = {} d['Avg_Transactions'] = x['Transaction'].mean().round() d['Avg_Balance'] = x['Balance'].mean().round(2) d['Zero_Balance_days'] = (x['Balance'] < 0).count() d['Over_Credit_days'] = (x['Balance'] > x['Max Credit']).count() return pd.Series(d, index=['Avg_Transactions', 'Avg_Balance', 'Zero_Balance_days', 'Over_Credit_days']) month = df.groupby(['Account','Name', 'Month']).apply(f)
Which gives the following output:
But I am getting the conditional counts wrong and instead counting all days. What would be the proper way to write these?
Advertisement
Answer
You can try replace your 2 lines with .count()
to .sum()
, as follows:
d['Zero_Balance_days'] = (x['Balance'] < 0).sum() d['Over_Credit_days'] = (x['Balance'] > x['Max Credit']).sum()
.count()
returns number of non-NA/null observations in the Series of boolean index while both True
/False
are not NA/null and will be counted as well.
.sum()
returns the sum of entries of True
since True
is interpreted as 1
while False
is interpreted as 0
in the summation.