I have the following dataframe
I want to reformat it in the following way:
- Group by name/account/monthly periods
- Average (mean) balance to two decimal places
- Average (mean) for transactions to no decimal places
- Count of days where balance < 0
- Count of days where Balance > Max credit
So the I apply the following function to make a Series of all the aggregations, and use the Series index as labels for the new columns:
JavaScript
x
10
10
1
def f(x):
2
d = {}
3
d['Avg_Transactions'] = x['Transaction'].mean().round()
4
d['Avg_Balance'] = x['Balance'].mean().round(2)
5
d['Zero_Balance_days'] = (x['Balance'] < 0).count()
6
d['Over_Credit_days'] = (x['Balance'] > x['Max Credit']).count()
7
return pd.Series(d, index=['Avg_Transactions', 'Avg_Balance', 'Zero_Balance_days', 'Over_Credit_days'])
8
9
month = df.groupby(['Account','Name', 'Month']).apply(f)
10
Which gives the following output:
But I am getting the conditional counts wrong and instead counting all days. What would be the proper way to write these?
Advertisement
Answer
You can try replace your 2 lines with .count()
to .sum()
, as follows:
JavaScript
1
3
1
d['Zero_Balance_days'] = (x['Balance'] < 0).sum()
2
d['Over_Credit_days'] = (x['Balance'] > x['Max Credit']).sum()
3
.count()
returns number of non-NA/null observations in the Series of boolean index while both True
/False
are not NA/null and will be counted as well.
.sum()
returns the sum of entries of True
since True
is interpreted as 1
while False
is interpreted as 0
in the summation.