I have a dataframe as follows:
id fruits fastfoods 0 1110 banana|grapes|pine apple noodles|manchurian 1 1123 apple|orange|pine apple friedrice|manchurian 2 1245 apple|grapes noodles|fried rice 3 1710 banana|orange|pine apple noodles|manchurian 4 1109 banana|apple|pine apple manchurian
And I want to convert this dataframe as follows:
id banana grapes pineApple apple orange noodles manchurian friedRice 0 1110 yes yes yes no no yes yes no 1 1123 no no yes no no no yes yes 2 1245 no yes no yes no yes no yes 3 1710 yes no yes no yes yes yes no 4 1109 yes no yes yes no no yes no
I tried a few things but nothing worked. Any ideas?
Advertisement
Answer
Use Series.str.get_dummies
with DataFrame.stack
:
df1 = (df.set_index('id') .stack() .str.get_dummies() .max(level=0) .replace({0:'no', 1:'yes'}) .reset_index()) print (df1) id apple banana fried rice friedrice grapes manchurian noodles orange 0 1110 no yes no no yes yes yes no 1 1123 yes no no yes no yes no yes 2 1245 yes no yes no yes no yes no 3 1710 no yes no no no yes yes yes 4 1109 yes yes no no no yes no no pine apple 0 yes 1 yes 2 no 3 yes 4 yes
If order is important:
order = df.melt('id')['value'].str.split('|', expand=True).stack().unique() print (order) ['banana' 'grapes' 'pine apple' 'apple' 'orange' 'noodles' 'manchurian' 'friedrice' 'fried rice'] df1 = (df.set_index('id') .stack() .str.get_dummies() .max(level=0) .replace({0:'no', 1:'yes'}) .reindex(order, axis=1) .reset_index() ) print (df1) id banana grapes pine apple apple orange noodles manchurian friedrice 0 1110 yes yes yes no no yes yes no 1 1123 no no yes yes yes no yes yes 2 1245 no yes no yes no yes no no 3 1710 yes no yes no yes yes yes no 4 1109 yes no yes yes no no yes no fried rice 0 no 1 no 2 yes 3 no 4 no