Combining unique elements of a DataFrame in a list




I’ll try to ask my question as clearly as possible.

I have the following DataFrame, which looks like this:

import pandas as pd
data = {'player': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
        'game': ['Soccer', 'Basketball', 'Ping pong', 'Soccer', 'Tennis', 'Tennis', 'Baseball', 'Volleyball', 'Dodgeball']}
df = pd.DataFrame(data, columns=['player','game'])

  player        game
0      A      Soccer
1      A  Basketball
2      A   Ping pong
3      B      Soccer
4      B      Tennis
5      B      Tennis
6      C    Baseball
7      C  Volleyball
8      C   Dodgeball

Now I want to keep the games that are unique to each player, listing each one only once. Ideally in a list, but that's not a big deal.

For example, players A and B both play soccer, so I don't want soccer in the output. Tennis appears twice, but both times for player B, so it should be in the output.
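The distinction in the example above (Soccer vs. Tennis) is between how many rows mention a game and how many distinct players play it. A small sketch that shows the two counts side by side; the variable names rows_per_game, players_per_game and shared_games are just illustrative:

```python
import pandas as pd

data = {'player': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
        'game': ['Soccer', 'Basketball', 'Ping pong', 'Soccer', 'Tennis',
                 'Tennis', 'Baseball', 'Volleyball', 'Dodgeball']}
df = pd.DataFrame(data, columns=['player', 'game'])

# Rows per game: both Soccer and Tennis appear twice
rows_per_game = df['game'].value_counts()

# Distinct players per game: Soccer has 2 players, Tennis only 1
players_per_game = df.groupby('game')['player'].nunique()

# Games played by more than one player
shared_games = players_per_game[players_per_game > 1].index.tolist()
print(shared_games)  # ['Soccer']
```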

I'd want the output to be:

  player        game
0      A  Basketball
1      A   Ping pong
2      B      Soccer
3      B      Tennis
4      C    Baseball
5      C  Volleyball
6      C   Dodgeball

Or like this:

  player                               game
0      A            [Basketball, Ping pong]
1      B                   [Soccer, Tennis]
2      C  [Baseball, Volleyball, Dodgeball]

Thank you for your help!

Answer

It seems you need to remove duplicates in the column game with DataFrame.drop_duplicates, keeping the last occurrence (keep='last'), and then, if you need lists, aggregate each player's games with list:

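# Keep only the last row for each game (Soccer stays with B, Tennis once for B),
# then collect each player's remaining games into a list
df = (df.drop_duplicates('game', keep='last')
        .groupby('player')['game']
        .agg(list)
        .reset_index())
print(df)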
  player                               game
0      A            [Basketball, Ping pong]
1      B                   [Soccer, Tennis]
2      C  [Baseball, Volleyball, Dodgeball]
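If the first (long) layout from the question is preferred instead of lists, a minimal sketch is to stop after the same drop_duplicates step and only renumber the index. The name long_form is illustrative:

```python
import pandas as pd

data = {'player': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
        'game': ['Soccer', 'Basketball', 'Ping pong', 'Soccer', 'Tennis',
                 'Tennis', 'Baseball', 'Volleyball', 'Dodgeball']}
df = pd.DataFrame(data, columns=['player', 'game'])

# Same rule as the list version: keep the last row for each game,
# but keep one row per game and renumber the index 0..n-1
long_form = (df.drop_duplicates('game', keep='last')
               .reset_index(drop=True))
print(long_form)
```

The surviving rows keep their original order, so this gives the seven rows A Basketball, A Ping pong, B Soccer, B Tennis, C Baseball, C Volleyball, C Dodgeball.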

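Note that keep='last' assigns Soccer to B, which matches the sample outputs in the question. If instead a game played by more than one player should be dropped entirely (the reading suggested by "I don't want soccer in the output"), one possible sketch, assuming that stricter rule, first filters out shared games and then removes repeated rows per player:

```python
import pandas as pd

data = {'player': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
        'game': ['Soccer', 'Basketball', 'Ping pong', 'Soccer', 'Tennis',
                 'Tennis', 'Baseball', 'Volleyball', 'Dodgeball']}
df = pd.DataFrame(data, columns=['player', 'game'])

# For every row, the number of distinct players who play that row's game
players_per_game = df.groupby('game')['player'].transform('nunique')

strict = (df[players_per_game == 1]   # drop games shared by 2+ players (Soccer)
          .drop_duplicates()          # B's two Tennis rows become one
          .groupby('player')['game']
          .agg(list)
          .reset_index())
print(strict)
```

Under this rule B ends up with only [Tennis], while A and C are the same as in the answer above.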

Source: stackoverflow