I have a dataframe:
JavaScript
x
16
16
1
Out[78]:
2
contract month year buys adjusted_lots price
3
0 W Z 5 Sell -5 554.85
4
1 C Z 5 Sell -3 424.50
5
2 C Z 5 Sell -2 424.00
6
3 C Z 5 Sell -2 423.75
7
4 C Z 5 Sell -3 423.50
8
5 C Z 5 Sell -2 425.50
9
6 C Z 5 Sell -3 425.25
10
7 C Z 5 Sell -2 426.00
11
8 C Z 5 Sell -2 426.75
12
9 CC U 5 Buy 5 3328.00
13
10 SB V 5 Buy 5 11.65
14
11 SB V 5 Buy 5 11.64
15
12 SB V 5 Buy 2 11.60
16
I need a sum of adjusted_lots , price which is weighted average , of price and adjusted_lots , grouped by all the other columns , ie. grouped by (contract, month , year and buys)
Similar solution on R was achieved by following code, using dplyr, however unable to do the same in pandas.
JavaScript
1
14
14
1
> newdf = df %>%
2
select ( contract , month , year , buys , adjusted_lots , price ) %>%
3
group_by( contract , month , year , buys) %>%
4
summarise(qty = sum( adjusted_lots) , avgpx = weighted.mean(x = price , w = adjusted_lots) , comdty = "Comdty" )
5
6
> newdf
7
Source: local data frame [4 x 6]
8
9
contract month year comdty qty avgpx
10
1 C Z 5 Comdty -19 424.8289
11
2 CC U 5 Comdty 5 3328.0000
12
3 SB V 5 Comdty 12 11.6375
13
4 W Z 5 Comdty -5 554.8500
14
is the same possible by groupby or any other solution ?
Advertisement
Answer
EDIT: update aggregation so it works with recent version of pandas
To pass multiple functions to a groupby object, you need to pass a tuples with the aggregation functions and the column to which the function applies:
JavaScript
1
19
19
1
# Define a lambda function to compute the weighted mean:
2
wm = lambda x: np.average(x, weights=df.loc[x.index, "adjusted_lots"])
3
4
# Define a dictionary with the functions to apply for a given column:
5
# the following is deprecated since pandas 0.20:
6
# f = {'adjusted_lots': ['sum'], 'price': {'weighted_mean' : wm} }
7
# df.groupby(["contract", "month", "year", "buys"]).agg(f)
8
9
# Groupby and aggregate with namedAgg [1]:
10
df.groupby(["contract", "month", "year", "buys"]).agg(adjusted_lots=("adjusted_lots", "sum"),
11
price_weighted_mean=("price", wm))
12
13
adjusted_lots price_weighted_mean
14
contract month year buys
15
C Z 5 Sell -19 424.828947
16
CC U 5 Buy 5 3328.000000
17
SB V 5 Buy 12 11.637500
18
W Z 5 Sell -5 554.850000
19
You can see more here:
and in a similar question here: