I have DataFrame in Python Pandas like below:
col1 | description ---------- |----------- John Simon |John Simon red Terry Juk |green Terry Juk John Bravo |John Bravo brown Ann Still |orange Ann Still bad_list = ["red", "green"]
And I need to select only these people from “col1” which have value from “col” in column “description” and something else (nevermind before or after), but it something else could not be from bad_list.
So I need to select only John Bravo and Ann Still because they have their value from “col1” in column “description” and does not have words from bad_list with their name in column “description”.
How to do that in Python Pandas?
Advertisement
Answer
Try:
bad_list = ["red", "green"] mask = df["description"].str.contains(r"|".join(bad_list)) print(df.loc[~mask, "col1"])
Prints:
2 John Bravo 3 Ann Still Name: col1, dtype: object
EDIT: To check if value from col1
is in description:
bad_list = ["red", "green"] mask1 = df["description"].str.contains(r"|".join(bad_list)) mask2 = df.apply(lambda x: x["col1"] in x["description"], axis=1) print(df.loc[~mask1 & mask2, "col1"])
EDIT2: To ignore case:
import re bad_list = ["RED", "green"] mask1 = df["description"].str.contains(r"|".join(bad_list), flags=re.I) mask2 = df.apply(lambda x: x["col1"] in x["description"], axis=1) print(df.loc[~mask1 & mask2, "col1"])