Skip to content
Advertisement

How can I perform the following transformation?

I have a dataframe as follows:

    id      fruits                      fastfoods
0   1110    banana|grapes|pine apple    noodles|manchurian
1   1123    apple|orange|pine apple     friedrice|manchurian
2   1245    apple|grapes                noodles|fried rice
3   1710    banana|orange|pine apple    noodles|manchurian
4   1109    banana|apple|pine apple     manchurian

And I want to convert this dataframe as follows:

    id      banana  grapes  pineApple   apple   orange  noodles manchurian  friedRice
0   1110    yes     yes     yes         no      no      yes     yes         no
1   1123    no      no      yes         no      no      no      yes         yes
2   1245    no      yes     no          yes     no      yes     no          yes
3   1710    yes     no      yes         no      yes     yes     yes         no
4   1109    yes     no      yes         yes     no      no      yes         no

I tried a few things but nothing worked. Any ideas?

Advertisement

Answer

Use Series.str.get_dummies with DataFrame.stack:

df1 = (df.set_index('id')
         .stack()
         .str.get_dummies()
         .max(level=0)
         .replace({0:'no', 1:'yes'})
         .reset_index())
print (df1)
     id apple banana fried rice friedrice grapes manchurian noodles orange  
0  1110    no    yes         no        no    yes        yes     yes     no   
1  1123   yes     no         no       yes     no        yes      no    yes   
2  1245   yes     no        yes        no    yes         no     yes     no   
3  1710    no    yes         no        no     no        yes     yes    yes   
4  1109   yes    yes         no        no     no        yes      no     no   

  pine apple  
0        yes  
1        yes  
2         no  
3        yes  
4        yes  

If order is important:

order = df.melt('id')['value'].str.split('|', expand=True).stack().unique()
print (order)
['banana' 'grapes' 'pine apple' 'apple' 'orange' 'noodles' 'manchurian'
 'friedrice' 'fried rice']

df1 = (df.set_index('id')
         .stack()
         .str.get_dummies()
         .max(level=0)
         .replace({0:'no', 1:'yes'})
         .reindex(order, axis=1)
         .reset_index()
         )
print (df1)
     id banana grapes pine apple apple orange noodles manchurian friedrice  
0  1110    yes    yes        yes    no     no     yes        yes        no   
1  1123     no     no        yes   yes    yes      no        yes       yes   
2  1245     no    yes         no   yes     no     yes         no        no   
3  1710    yes     no        yes    no    yes     yes        yes        no   
4  1109    yes     no        yes   yes     no      no        yes        no   

  fried rice  
0         no  
1         no  
2        yes  
3         no  
4         no  
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement