Skip to content
Advertisement

Transforming many columns into 3 column categories which contains lists?

I have a DataFrame with 31 columns, which contains 3 categories “Classic”, “Premium” and “Luxe” I want to swap the way the DataFrame works to have only 3 comumns “Classic”, “Premium” and “Luxe” and 31 categories which can be listed inside.

Since I’m new I can only post a link to the picture for more clarity : Here is what I have and what I want to do

Here’s what I tried so far:

Suppose we use this DataFrame

import pandas as pd
import numpy as np

d = {'Name': ["Alban","Benan","Conor","Dino","Egor","Fatima" ],'Virtual visits': ["Classique", "Classique","","" ,"Premium","Premium"], 'Cars' :["","" ,"Luxe","Luxe","Luxe", "Premium"],'Portraits' :["","" ,"Classique","Classique","Luxe", "Premium"],'Animals' :["Premium","Luxe" ,"","","Luxe", ""]}
df = pd.DataFrame(data=d)
df

    Name    Virtual visits  Cars       Portraits    Animals
0   Alban   Classique                  Premium
1   Benan   Classique                  Luxe
2   Conor   Luxe            Classique   
3   Dino    Luxe            Classique   
4   Egor    Premium         Luxe       Luxe         Luxe
5   Fatima  Premium         Premium    Premium  

Using melt looked promising so I did that and a groupBy:

df = df.melt(id_vars=["Name"])
df.groupby(by=["Name"]).sum()

Name    variable                            value   
Alban   Virtual visitsCarsPortraitsAnimals  ClassiquePremium
Benan   Virtual visitsCarsPortraitsAnimals  ClassiqueLuxe
Conor   Virtual visitsCarsPortraitsAnimals  LuxeClassique
Dino    Virtual visitsCarsPortraitsAnimals  LuxeClassique
Egor    Virtual visitsCarsPortraitsAnimals  PremiumLuxeLuxeLuxe
Fatima  Virtual visitsCarsPortraitsAnimals  PremiumPremiumPremium

So the column “Variable” is made up of all the column names, which is not bad but instead of a single string added to one another, I would like to have a list, and instead of columns Variable and Value, I would need to have 3 columns, “Premium”, “Classque”, “Luxe” and in each of them have the relevant list of categories appear.

Thanks for your help.

Advertisement

Answer

You were right with melt. After that you need a pivot table:

out = df.replace('',np.nan).melt("Name")
.pivot_table(index="Name",columns="value",values="variable",aggfunc=','.join)
.rename_axis(None,axis=1).reset_index()

print(out)

     Name       Classique                    Luxe  
0   Alban  Virtual visits                     NaN   
1   Benan  Virtual visits                 Animals   
2   Conor       Portraits                    Cars   
3    Dino       Portraits                    Cars   
4    Egor             NaN  Cars,Portraits,Animals   
5  Fatima             NaN                     NaN   

                         Premium  
0                        Animals  
1                            NaN  
2                            NaN  
3                            NaN  
4                 Virtual visits  
5  Virtual visits,Cars,Portraits  
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement