I’d like to update values in a column [‘D’] of my data frame based on the substrings contained in column [‘A’]
I have dataframe df, where column D is an exact copy of column A. Column A has list of different cereal names. But I want to check if A contains a certain word and if so, then column D updates to show that word.
For example, If A contains the word Bran then D should update to show Bran. And if column A contains none of the criteria we are looking for then it should return Other.
name | Type of Cereal |
---|---|
Bran Cereal | Bran |
Fiber Cereal | Fiber |
Nut Cereal | Nut |
So far I have tried this, but I am not able to change the rest of the columns without any criteria words to ‘Other’
df.loc[df['Type of Cereal'].str.contains('Fiber'), 'Type of Cereal'] = 'Fiber' df.loc[df['Type of Cereal'].str.contains('Bran'), 'Type of Cereal'] = 'Bran' df.loc[df['Type of Cereal'].str.contains('Nut'), 'Type of Cereal'] = 'Nut'
I have also tried creating a function, but I’m not sure how to loop through each value in column A
def conditions1(s): CerealTypeFiber = df['name'].str.contains('Fiber').sum() CerealTypeBran = df['name'].str.contains('Bran').sum() if CerealTypeFiber != None: return 'Fiber' elif CerealTypeBran != None: return 'Bran' else: return 'Other' df['Cereal Category'] = df.apply(conditions1, axis=1)
This just updates the whole column to Fiber.
Any thoughts/suggestions?
Advertisement
Answer
Other way using pandas.Series.str.extract
:
df["Type of Cereal"] = df["name"].str.extract("(Bran|Fiber|Nut)").fillna("Other") print(df)
Output (Note 4th row in case of no match):
name Type of Cereal 0 Bran Cereal Bran 1 Fiber Cereal Fiber 2 Nut Cereal Nut 3 Something Other