I’ll try to ask my question as clearly as possible.
I have the following DataFrame:
    import pandas as pd

    data = {'player': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
            'game': ['Soccer', 'Basketball', 'Ping pong', 'Soccer', 'Tennis',
                     'Tennis', 'Baseball', 'Volleyball', 'Dodgeball']}
    df = pd.DataFrame(data, columns=['player', 'game'])

      player        game
    0      A      Soccer
    1      A  Basketball
    2      A   Ping pong
    3      B      Soccer
    4      B      Tennis
    5      B      Tennis
    6      C    Baseball
    7      C  Volleyball
    8      C   Dodgeball
Now I want to keep the values unique to each player, each only once. Ideally in a list, but that’s not a big deal.
For example, players A and B both play soccer, so I don’t want soccer in the output. Tennis appears twice, but both times for player B, so it would be in the output.
I’d want the output to be:

      player        game
    0      A  Basketball
    1      A   Ping pong
    2      B      Soccer
    3      B      Tennis
    4      C    Baseball
    5      C  Volleyball
    6      C   Dodgeball
Or like this:
      player                               game
    0      A            [Basketball, Ping Pong]
    1      B                   [Soccer, Tennis]
    2      C  [Baseball, Volleyball, Dodgeball]
Thank you for your help!
Answer
It seems you need to remove duplicates in the ‘game’ column, keeping the last occurrence, with DataFrame.drop_duplicates, and then, if you need lists, aggregate the games per player with list:
    df = (df.drop_duplicates('game', keep='last')
            .groupby('player')['game']
            .agg(list)
            .reset_index())
    print(df)

      player                               game
    0      A            [Basketball, Ping pong]
    1      B                   [Soccer, Tennis]
    2      C  [Baseball, Volleyball, Dodgeball]
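If the flat (non-list) output from the question is wanted instead, the same drop_duplicates call can be used without the groupby step. The sketch below rebuilds the question's DataFrame so it runs on its own. The names flat, strict and players_per_game are only illustrative. The second variant is an assumption about a stricter reading of the question (drop a game entirely when more than one player plays it); it is not what the expected output in the question shows.

```python
import pandas as pd

data = {'player': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
        'game': ['Soccer', 'Basketball', 'Ping pong', 'Soccer', 'Tennis',
                 'Tennis', 'Baseball', 'Volleyball', 'Dodgeball']}
df = pd.DataFrame(data, columns=['player', 'game'])

# Flat variant: keep the last occurrence of each game, one row per game,
# and renumber the index from 0.
flat = df.drop_duplicates('game', keep='last').reset_index(drop=True)
print(flat)

# Stricter variant (assumed intent, not the question's expected output):
# count distinct players per game, keep only games with exactly one player,
# then collapse repeated (player, game) rows.
players_per_game = df.groupby('game')['player'].transform('nunique')
strict = (df[players_per_game == 1]
          .drop_duplicates(['player', 'game'])
          .reset_index(drop=True))
print(strict)
```

With the sample data, flat keeps Soccer for player B (the last occurrence), while strict drops Soccer altogether and keeps a single Tennis row for B.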