I have a dataframe as follows:
id fruits fastfoods 0 1110 banana|grapes|pine apple noodles|manchurian 1 1123 apple|orange|pine apple friedrice|manchurian 2 1245 apple|grapes noodles|fried rice 3 1710 banana|orange|pine apple noodles|manchurian 4 1109 banana|apple|pine apple manchurian
And I want to convert this dataframe as follows:
id banana grapes pineApple apple orange noodles manchurian friedRice 0 1110 yes yes yes no no yes yes no 1 1123 no no yes no no no yes yes 2 1245 no yes no yes no yes no yes 3 1710 yes no yes no yes yes yes no 4 1109 yes no yes yes no no yes no
I tried a few things but nothing worked. Any ideas?
Advertisement
Answer
Use Series.str.get_dummies with DataFrame.stack:
df1 = (df.set_index('id')
.stack()
.str.get_dummies()
.max(level=0)
.replace({0:'no', 1:'yes'})
.reset_index())
print (df1)
id apple banana fried rice friedrice grapes manchurian noodles orange
0 1110 no yes no no yes yes yes no
1 1123 yes no no yes no yes no yes
2 1245 yes no yes no yes no yes no
3 1710 no yes no no no yes yes yes
4 1109 yes yes no no no yes no no
pine apple
0 yes
1 yes
2 no
3 yes
4 yes
If order is important:
order = df.melt('id')['value'].str.split('|', expand=True).stack().unique()
print (order)
['banana' 'grapes' 'pine apple' 'apple' 'orange' 'noodles' 'manchurian'
'friedrice' 'fried rice']
df1 = (df.set_index('id')
.stack()
.str.get_dummies()
.max(level=0)
.replace({0:'no', 1:'yes'})
.reindex(order, axis=1)
.reset_index()
)
print (df1)
id banana grapes pine apple apple orange noodles manchurian friedrice
0 1110 yes yes yes no no yes yes no
1 1123 no no yes yes yes no yes yes
2 1245 no yes no yes no yes no no
3 1710 yes no yes no yes yes yes no
4 1109 yes no yes yes no no yes no
fried rice
0 no
1 no
2 yes
3 no
4 no