Skip to content
Advertisement

Groupby aggregate and transpose in pandas

df=

Genre Song          Singer               Playlist           Album
Rock  Evil Walks      AC/DC                 Music            For Those About To Rock We Salute You
Rock  Snowballed      AC/DC                 Music            For Those About To Rock We Salute You
Rock  C.O.D           AC/DC                 Music            For Those About To Rock We Salute You         
Rock  Perfect         Alanis Morissette     Music            Jagged Little Pill
Rock  Forgiven        Alanis Morissette     Music            Jagged Little Pill
Metal Sad But True    Apocalyptica          Music            Plays Metallica By Four Cellos
Metal All For You     Black Label Society   Music            Alcohol Fueled Brewtality Live! [Disc 1]
Blues Layla           Eric Clapton          Music            The Cream Of Clapton
Blues Crossroads      Eric Clapton          Music            The Cream Of Clapton
.......
......
....
Latin Etnia           Chico Science         Music            Afrociberdelia

Off all the genres in the genre field, I only need to consider ‘Rock’, ‘Latin’, ‘Metal’, ‘Blues’ and build a new dataframe based on the following requirements

a.how many songs the singer has from that genre (count of each genre must be in a separate column).

b.Count of how many albums the singer has in the data.

c.Count of how many tracks the singer has in the data.

d.Count of how many playlists that include any Song of the singer.

Desired Output:

Singer       Rock  Latin  Metal  Blues   CountofAlbums   CountofSongs  Count of Playlists
AC/DC         5      7    8      2         4                22             2
Metallica     8      0    22     0         6                30             6       
Iron Maiden   21     0    27     13        10               61             12

I was going to create one df for part a and one for parts b,c,d and merge them.

For parts b,c and d. I thought of looping over singer names and using nunique to get distinct count, but did not realize, the loop would also return column headers everytime.

mylist=list(set(df.Singer))
for i in mylist:
    temp=df[df['Singer']==i]
    df2=temp.nunique().to_frame().T
    

For part A, I was going to group songs by genre find a count and do a transpose

mylist=list(set(df.Singer))
for i in mylist:
   group=df4.groupby('Genre_Name').agg(count=('Song','count'))
   newdf=group.T

Any help will be greatly appreciated!

Advertisement

Answer

Can be done in one line but it’s a bit of a mouthful…

df = pd.DataFrame({
    'Genre':['Rock']*5+['Metal']*2+['Blues']*2+['Latin'],
    'Song':['Evil Walks','Snowballed','C.O.D','Perfect','Forgiven','Sad But True',
    'All For You','Layla','Crossroads','Etnia'],
    'Singer':['AC/DC']*3+['Alanis Morissette']*2+['Apocalyptica']+['Black Label Society']+['Eric Clapton']*2+['Chico Science'],
    'Playlist':['Music']*10,
    'Album':['For Those About To Rock We Salute You']*3+['Jagged Little Pill']*2+['Plays Metallica By Four Cellos']+['Alcohol Fueled Brewtality Live! [Disc 1]']+['The Cream Of Clapton']*2+['Afrociberdelia']
    })

agg_df=df.groupby('Singer').agg({'Song':'count'})
agg_df=agg_df.join(df[['Singer','Album']].drop_duplicates().groupby('Singer').count())
agg_df=agg_df.join(df[['Singer','Playlist']].drop_duplicates().groupby('Singer').count())
agg_df=agg_df.join(df.reset_index()[['Singer','Genre','index']].groupby(['Singer','Genre']).count().rename({'index':'count'},axis=1).unstack().fillna(0).astype(np.int16))
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement