df=
Genre Song Singer Playlist Album Rock Evil Walks AC/DC Music For Those About To Rock We Salute You Rock Snowballed AC/DC Music For Those About To Rock We Salute You Rock C.O.D AC/DC Music For Those About To Rock We Salute You Rock Perfect Alanis Morissette Music Jagged Little Pill Rock Forgiven Alanis Morissette Music Jagged Little Pill Metal Sad But True Apocalyptica Music Plays Metallica By Four Cellos Metal All For You Black Label Society Music Alcohol Fueled Brewtality Live! [Disc 1] Blues Layla Eric Clapton Music The Cream Of Clapton Blues Crossroads Eric Clapton Music The Cream Of Clapton ....... ...... .... Latin Etnia Chico Science Music Afrociberdelia
Off all the genres in the genre field, I only need to consider ‘Rock’, ‘Latin’, ‘Metal’, ‘Blues’ and build a new dataframe based on the following requirements
a.how many songs the singer has from that genre (count of each genre must be in a separate column).
b.Count of how many albums the singer has in the data.
c.Count of how many tracks the singer has in the data.
d.Count of how many playlists that include any Song of the singer.
Desired Output:
Singer Rock Latin Metal Blues CountofAlbums CountofSongs Count of Playlists AC/DC 5 7 8 2 4 22 2 Metallica 8 0 22 0 6 30 6 Iron Maiden 21 0 27 13 10 61 12
I was going to create one df for part a and one for parts b,c,d and merge them.
For parts b,c and d. I thought of looping over singer names and using nunique to get distinct count, but did not realize, the loop would also return column headers everytime.
mylist=list(set(df.Singer)) for i in mylist: temp=df[df['Singer']==i] df2=temp.nunique().to_frame().T
For part A, I was going to group songs by genre find a count and do a transpose
mylist=list(set(df.Singer)) for i in mylist: group=df4.groupby('Genre_Name').agg(count=('Song','count')) newdf=group.T
Any help will be greatly appreciated!
Advertisement
Answer
Can be done in one line but it’s a bit of a mouthful…
df = pd.DataFrame({ 'Genre':['Rock']*5+['Metal']*2+['Blues']*2+['Latin'], 'Song':['Evil Walks','Snowballed','C.O.D','Perfect','Forgiven','Sad But True', 'All For You','Layla','Crossroads','Etnia'], 'Singer':['AC/DC']*3+['Alanis Morissette']*2+['Apocalyptica']+['Black Label Society']+['Eric Clapton']*2+['Chico Science'], 'Playlist':['Music']*10, 'Album':['For Those About To Rock We Salute You']*3+['Jagged Little Pill']*2+['Plays Metallica By Four Cellos']+['Alcohol Fueled Brewtality Live! [Disc 1]']+['The Cream Of Clapton']*2+['Afrociberdelia'] }) agg_df=df.groupby('Singer').agg({'Song':'count'}) agg_df=agg_df.join(df[['Singer','Album']].drop_duplicates().groupby('Singer').count()) agg_df=agg_df.join(df[['Singer','Playlist']].drop_duplicates().groupby('Singer').count()) agg_df=agg_df.join(df.reset_index()[['Singer','Genre','index']].groupby(['Singer','Genre']).count().rename({'index':'count'},axis=1).unstack().fillna(0).astype(np.int16))