Skip to content
Advertisement

Dataframe new columns to tell if the row contains column’s header text

2 columns dataframe as the first screenshot. I want to add new columns (by the contents in the Note column from the original dataframe) to tell if the Note column contains the new column’s header text.

Example as the second screenshot.

enter image description here

Some lines work for a few columns. When there are a lot of new columns, it’s not efficient.

What is a good way to do so?

import pandas as pd
from io import StringIO

csvfile = StringIO(
'''NametNote
MiketBright, Kind
LilytFriendly
KatetConsiderate, energetic
JohntReliable, friendly
AletBright''')

df = pd.read_csv(csvfile, sep = 't', engine='python')

col_list =  df['Note'].tolist()

n_list = []
for c in col_list:
    for _ in c.split(','):
        n_list.append(_)

df = df.assign(**dict.fromkeys(n_list, ''))
    
df["Bright"][df['Note'].str.contains("Bright")] = "Yes"

Advertisement

Answer

You can try .str.get_dummies then replace 1 with Yes

df = df.join(df['Note'].str.get_dummies(', ').replace({1: 'Yes', 0: ''}))
print(df)

   Name                    Note Bright Considerate Friendly Kind Reliable energetic friendly
0  Mike            Bright, Kind    Yes                       Yes
1  Lily                Friendly                         Yes
2  Kate  Considerate, energetic                Yes                              Yes
3  John      Reliable, friendly                                       Yes                Yes
4   Ale                  Bright    Yes
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement