So I have a dataframe with two columns, artistID and genre:
```
    artistID      genre
0         52       rock
1         63        pop
2         73      salsa
3         94  reggaeton
4       6177       rock
5         64      salsa
6        862      metal
7         52        pop
8         63     hiphop
9         64       jazz
10        52      metal
11        63    electro
12        73     latino
13        94       trap
14      6177        pop
15        64     latino
16       456     hiphop
```
What I want to do is group by the artistID column, so the resulting dataframe has one row per distinct artistID. In the second column of the new dataframe I want a list, an array, or whatever is most convenient, holding all the unique genres that artistID has been tagged with. So I want the resulting dataframe to look like this:
```
   artistID                   genre
0        52      [rock, pop, metal]
1        63  [pop, electro, hiphop]
2        73         [salsa, latino]
3        94       [reggaeton, trap]
4      6177             [rock, pop]
5        64   [salsa, jazz, latino]
6       862                 [metal]
7       456                [hiphop]
```
How can I do this?
I should also say that this dataframe is just an example; my real dataframe has almost 200,000 rows and 20,000 different artistIDs.
Answer
Use GroupBy.agg:
```python
df.groupby('artistID')['genre'].agg(set).reset_index()
```
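For a quick end-to-end check of the `agg(set)` one-liner, here is a minimal, self-contained sketch that rebuilds the example dataframe from the question (the data is copied from the table above; the variable names are only illustrative). Python sets have no defined order, so the sketch converts each set into a sorted list to make the printed result stable.

```python
import pandas as pd

# Example data copied from the question's table
df = pd.DataFrame({
    "artistID": [52, 63, 73, 94, 6177, 64, 862, 52, 63, 64,
                 52, 63, 73, 94, 6177, 64, 456],
    "genre": ["rock", "pop", "salsa", "reggaeton", "rock", "salsa", "metal",
              "pop", "hiphop", "jazz", "metal", "electro", "latino", "trap",
              "pop", "latino", "hiphop"],
})

# One row per artistID; each genre cell holds the set of unique genres
result = df.groupby("artistID")["genre"].agg(set).reset_index()

# Sets iterate in arbitrary order, so turn each one into a sorted list
result["genre"] = result["genre"].apply(sorted)
print(result)
```

Note that `groupby` sorts the group keys by default, so the rows come out ordered by artistID rather than by first appearance.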
Or, if you want lists instead of sets:
```python
df.groupby('artistID')['genre'].apply(lambda x: list(set(x))).reset_index()
```

Output:

```
   artistID                   genre
0        52      [rock, pop, metal]
1        63  [pop, hiphop, electro]
2        64   [salsa, jazz, latino]
3        73         [salsa, latino]
4        94       [reggaeton, trap]
5       456                [hiphop]
6       862                 [metal]
7      6177             [rock, pop]
```
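Because `list(set(x))` goes through a set, the order of genres inside each list is not guaranteed. If you want each list in order of first appearance, one alternative (a sketch, not part of the answer above) is to drop duplicate (artistID, genre) pairs first and then aggregate with `list`. `drop_duplicates` keeps the first occurrence of each pair, and `groupby` preserves row order within each group. Deduplicating up front also means the per-group step only sees unique pairs, which may help on a frame of about 200,000 rows, although this has not been benchmarked here. `SeriesGroupBy.unique()` is a built-in option that also keeps order of appearance, but it returns NumPy arrays rather than lists.

```python
import pandas as pd

# Example data copied from the question's table
df = pd.DataFrame({
    "artistID": [52, 63, 73, 94, 6177, 64, 862, 52, 63, 64,
                 52, 63, 73, 94, 6177, 64, 456],
    "genre": ["rock", "pop", "salsa", "reggaeton", "rock", "salsa", "metal",
              "pop", "hiphop", "jazz", "metal", "electro", "latino", "trap",
              "pop", "latino", "hiphop"],
})

# Keep only the first occurrence of each (artistID, genre) pair
deduped = df.drop_duplicates(subset=["artistID", "genre"])

# Collect the remaining genres per artist; within-group row order is kept
result = deduped.groupby("artistID")["genre"].agg(list).reset_index()
print(result)

# Built-in alternative: a Series indexed by artistID whose cells are
# NumPy arrays of unique genres in order of appearance
unique_arrays = df.groupby("artistID")["genre"].unique()
print(unique_arrays)
```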