2 columns dataframe as the first screenshot. I want to add new columns (by the contents in the Note column from the original dataframe) to tell if the Note column contains the new column’s header text.
Example as the second screenshot.
Some lines work for a few columns. When there are a lot of new columns, it’s not efficient.
What is a good way to do so?
import pandas as pd from io import StringIO csvfile = StringIO( '''NametNote MiketBright, Kind LilytFriendly KatetConsiderate, energetic JohntReliable, friendly AletBright''') df = pd.read_csv(csvfile, sep = 't', engine='python') col_list = df['Note'].tolist() n_list = [] for c in col_list: for _ in c.split(','): n_list.append(_) df = df.assign(**dict.fromkeys(n_list, '')) df["Bright"][df['Note'].str.contains("Bright")] = "Yes"
Advertisement
Answer
You can try .str.get_dummies
then replace 1
with Yes
df = df.join(df['Note'].str.get_dummies(', ').replace({1: 'Yes', 0: ''}))
print(df) Name Note Bright Considerate Friendly Kind Reliable energetic friendly 0 Mike Bright, Kind Yes Yes 1 Lily Friendly Yes 2 Kate Considerate, energetic Yes Yes 3 John Reliable, friendly Yes Yes 4 Ale Bright Yes