Skip to content
Advertisement

after a groupby create a new column with a list of unique values for another column of the groupes values

So i have a dataframe with two columns: artistID and genre:

  artistID  genre
0    52   rock
1    63   pop
2    73   salsa
3    94   reggaeton
4    6177  rock
5    64   salsa
6    862      metal
7    52   pop
8    63   hiphop
9    64   jazz
10   52   metal
11   63   electro
12   73   latino
13   94   trap
14   6177  pop
15   64   latino
15   456   hiphop

And what I want to do is to group by the column artistID (so the resulting datafdrame has as many rows as artistID there are in this dataframe), and the second column of the new dataframe I want it to be like a list or an array or whatever it is convenient of all the unique genres to which each artistID has been taged. So I want the resultind dataframe to look like this:

  artistID    genre
0    52    [rock, pop, metal]
1    63    [pop, electro, hiphop]
2    73    [salsa, latino]
3    94    [reggaeton, trap]
4    6177  [rock, pop]
5    64    [salsa, jazz, latino]
6    862   [metal]
7    456   [hiphop]

How can I do this ?

I also must say, this dataframe is just an wxample, my real dataframe has almost 200.000 rows and 20.000 different artistID

Advertisement

Answer

Use Groupby.agg:

In [2237]: df.groupby('artistID')['genre'].agg(set).reset_index()

OR:

In [2240]: df.groupby('artistID')['genre'].apply(lambda x: list(set(x)))

Out[2237]: 
   artistID                   genre
0        52      [rock, pop, metal]
1        63  [pop, hiphop, electro]
2        64   [salsa, jazz, latino]
3        73         [salsa, latino]
4        94       [reggaeton, trap]
5       456                [hiphop]
6       862                 [metal]
7      6177             [rock, pop]
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement