Context
I want, for each team, the rows of the data frame that contains the top three scoring players.
In my head, it is a combination of Dataframe.nlargest()
and Dataframe.groupby()
but I don’t think this is supported.
My ideal solution is:
- performed directly on
df
without having to create other dataframes - legible, and
- relatively performant (real df shape is 7M rows and 5 col)
Input
JavaScript
x
3
1
import pandas as pd
2
df = pd.read_json('{"team":{"0":"A","1":"A","2":"A","3":"A","4":"A","5":"B","6":"B","7":"B","8":"B","9":"B","10":"C","11":"C","12":"C","13":"C","14":"C"},"player":{"0":"Alice","1":"Becky","2":"Carmen","3":"Donna","4":"Elizabeth","5":"Fran","6":"Greta","7":"Heather","8":"Iris","9":"Jackie","10":"Kelly","11":"Lucy","12":"Molly","13":"Nina","14":"Ophelia"},"points":{"0":15,"1":11,"2":13,"3":8,"4":10,"5":28,"6":29,"7":18,"8":25,"9":9,"10":12,"11":23,"12":18,"13":10,"14":15}}')
3
JavaScript
1
18
18
1
| team | player | points |
2
|------|-----------|--------|
3
| A | Alice | 15 |
4
| A | Becky | 11 |
5
| A | Carmen | 13 |
6
| A | Donna | 8 |
7
| A | Elizabeth | 10 |
8
| B | Fran | 28 |
9
| B | Greta | 29 |
10
| B | Heather | 18 |
11
| B | Iris | 25 |
12
| B | Jackie | 9 |
13
| C | Kelly | 12 |
14
| C | Lucy | 23 |
15
| C | Molly | 18 |
16
| C | Nina | 10 |
17
| C | Ophelia | 15 |
18
Desired Output
JavaScript
1
3
1
df_output = pd.read_json('{"team":{"0":"A","1":"A","2":"A","3":"B","4":"B","5":"B","6":"C","7":"C","8":"C"},"player":{"0":"Alice","1":"Becky","2":"Carmen","3":"Fran","4":"Greta","5":"Iris","6":"Lucy","7":"Molly","8":"Ophelia"},"points":{"0":15,"1":11,"2":13,"3":28,"4":29,"5":25,"6":23,"7":18,"8":15}}')
2
df_output
3
JavaScript
1
12
12
1
| team | player | points |
2
|------|---------|--------|
3
| A | Alice | 15 |
4
| A | Becky | 11 |
5
| A | Carmen | 13 |
6
| B | Fran | 28 |
7
| B | Greta | 29 |
8
| B | Iris | 25 |
9
| C | Lucy | 23 |
10
| C | Molly | 18 |
11
| C | Ophelia | 15 |
12
Advertisement
Answer
You can use df.groupby.rank
method:
JavaScript
1
13
13
1
In [1401]: df[df.groupby('team')['points'].rank(ascending=False) <= 3]
2
Out[1401]:
3
team player points
4
0 A Alice 15
5
1 A Becky 11
6
2 A Carmen 13
7
5 B Fran 28
8
6 B Greta 29
9
8 B Iris 25
10
11 C Lucy 23
11
12 C Molly 18
12
14 C Ophelia 15
13