I have the following dataframe (which is a pretty reduced sample from my original one).
JavaScript
x
69
69
1
year month id revenue
2
80496 2020 2 16 536000.00
3
84222 2020 1 2758 430344.66
4
84223 2020 1 2758 679945.17
5
84224 2020 1 2758 543982.82
6
84225 2020 3 2758 287728.00
7
84226 2020 3 2758 321874.00
8
84227 2020 3 2758 408128.00
9
84230 2020 3 2758 458304.00
10
84231 2020 3 2758 332528.00
11
85309 2020 2 1560 277233.84
12
85312 2020 2 1560 419910.83
13
85876 2020 3 2890 333281.00
14
90690 2020 1 1304 306297.32
15
90691 2020 1 1304 391399.67
16
90698 2020 1 1304 314511.00
17
90699 2020 2 1304 598394.98
18
90701 2020 2 1304 391659.34
19
90702 2020 2 1304 420196.17
20
90703 2020 2 1304 390432.01
21
90705 2020 2 1304 485868.16
22
90706 2020 2 1304 340244.66
23
91066 2020 2 3738 426222.33
24
92039 2020 2 1273 269514.00
25
92040 2020 2 1273 399330.00
26
92043 2020 3 1273 845502.01
27
92044 2020 3 1273 285688.00
28
92045 2020 2 1304 471677.67
29
92053 2020 2 1304 306994.32
30
92055 2020 2 1304 514140.01
31
92058 2020 2 1304 670594.83
32
92059 2020 2 1304 313311.17
33
92060 2020 2 1304 264825.32
34
92061 2020 2 1304 417365.50
35
92063 2020 3 2758 460628.00
36
92064 2020 3 2758 270060.00
37
92074 2020 2 4354 626157.00
38
92075 2020 2 4354 1225539.99
39
92078 2020 3 2911 381026.67
40
92079 2020 3 2911 325471.67
41
92080 2020 3 2911 383008.33
42
92081 2020 3 2911 267538.66
43
92082 2020 3 2911 383789.17
44
92083 2020 3 2911 352452.50
45
92084 2020 3 2911 279040.84
46
92085 2020 3 2911 367950.82
47
92087 2020 2 130 284714.26
48
92088 2020 2 130 600318.97
49
92089 2020 2 130 270437.93
50
92091 2020 2 130 272350.83
51
92092 2020 2 130 346533.36
52
92093 2020 2 130 294939.32
53
92100 2020 2 134 303719.16
54
92101 2020 3 134 367001.67
55
92112 2020 3 1561 276828.00
56
92113 2020 3 1561 279312.01
57
92114 2020 3 1625 294794.50
58
92119 2020 3 1625 592332.50
59
92126 2020 2 2890 620486.67
60
92128 2020 3 2890 680190.00
61
92130 2020 3 2890 418707.33
62
92131 2020 3 2890 328754.99
63
92132 2020 3 2890 339958.00
64
92137 2020 3 2890 554962.00
65
92138 2020 3 2890 365953.34
66
92139 2020 3 2890 486639.16
67
92140 2020 3 2890 1610025.83
68
92141 2020 3 2890 589236.49
69
I’m trying to get the top 2 ids for each year AND month. So, for example, the idea was to obtain the below df. My main problem here, is to get the Top n along with the dates, because the nlargest
method applies to a pd.Series or a dataframe (but in this case, you cannot pass a list for multiple conditions).
JavaScript
1
8
1
year month id revenue
2
0 2020 3 2890 6626149.47
3
1 2020 3 2758 4428253.33
4
691 2020 2 1304 6997646.32
5
692 2020 2 130 2229204.96
6
1785 2020 1 2758 1749250.49
7
1786 2020 1 1304 1581520.32
8
So far, my code is:
JavaScript
1
2
1
df.groupby(by=['year', 'month', 'id'])[['revenue']].agg({'revenue': 'sum'}).sort_values(by=['year', 'month', 'revenue'], ascending=False).reset_index()
2
Advertisement
Answer
One option is to sort the columns before grouping, and use the nth
function:
JavaScript
1
17
17
1
(df.sort_values(['year', 'month','revenue', 'id'],
2
ascending=[True, True, False, False])
3
.groupby(['year', 'month'],
4
as_index = False,
5
sort = False)
6
.nth([0,1])
7
)
8
9
year month id revenue
10
84223 2020 1 2758 679945.17
11
84224 2020 1 2758 543982.82
12
92075 2020 2 4354 1225539.99
13
92058 2020 2 1304 670594.83
14
92140 2020 3 2890 1610025.83
15
92043 2020 3 1273 845502.01
16
17