Fairly new to Python. This seems to be a really simple question, but I can't find any information about it. I have a list of strings, and for each string I want to check whether it is present in a dataframe (actually, in a particular column of the dataframe). Not whether a substring is present, but the whole exact string.
So my dataframe is something like the following:
A=pd.DataFrame(["ancestry","time","history"])
I should simply be able to use the “string in dataframe” method, as in
"time" in A
This returns False however. If I run
"time" == A.iloc[1]
it returns “True”, but annoyingly as part of a series, and this depends on knowing where in the dataframe the corresponding string is. Is there some way I can just use the string in df method, to easily find out whether the strings in my list are in the dataframe?
Answer
The way to deal with this is to compare the whole dataframe with "time". That will return a mask where each value of the DF is True if it was "time", False otherwise. Then, you can use .any() to check if there are any True values:
>>> A = pd.DataFrame(["ancestry","time","history"])
>>> A
          0
0  ancestry
1      time
2   history
>>> A == "time"  # or A.eq("time")
       0
0  False
1   True
2  False
>>> (A == "time").any()
0    True
dtype: bool
Notice in the above output that (A == "time").any() returns a Series with one entry per column, indicating whether that column contained "time". If you want to check the entire dataframe (across all columns), call .any() twice:
>>> (A == "time").any().any()
True
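Since the question asks about a whole list of strings and one particular column, here is a minimal sketch of how that could look. The list name `wanted` and the string "space" are made up for illustration. It also shows why the plain `in` test fails: on a DataFrame, `in` checks the column labels, not the cell values.

```python
import pandas as pd

A = pd.DataFrame(["ancestry", "time", "history"])

# Hypothetical list of strings to look up (not from the original question).
wanted = ["time", "space"]

# `in` on a DataFrame tests the column labels, so this is False,
# while the label 0 (the only column) is found.
label_check = "time" in A
column_label_check = 0 in A

# To test the values of one column, look in the underlying array.
value_check = "time" in A[0].values

# Check every string in the list against column 0 (exact matches only).
column_values = set(A[0])
found = {s: s in column_values for s in wanted}

# Vectorised alternative: one boolean per entry of `wanted`.
present = pd.Series(wanted).isin(A[0])

print(found)
print(present.tolist())
```

Building a set first keeps each lookup cheap when the list is long; `isin` does the same job if you would rather stay inside pandas.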