Pandas dataframe reports no matching string when the string is present

Fairly new to Python. This seems to be a really simple question but I can’t find any information about it. I have a list of strings, and for each string I want to check whether it is present in a dataframe (actually in a particular column of the dataframe). I don't want to check whether a substring is present, but whether the whole exact string is.

So my dataframe is something like the following:

A=pd.DataFrame(["ancestry","time","history"])

I should simply be able to use the “string in dataframe” method, as in

"time" in A

This returns False however. If I run

"time" == A.iloc[1]

it returns “True”, but annoyingly as part of a Series, and it only works if I already know where in the dataframe the string is. Is there some way I can just use the “string in dataframe” method to easily find out whether the strings in my list are in the dataframe?

Answer

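"time" in A returns False because the in operator on a DataFrame tests the column labels (here just 0), not the cell values. To test the values, compare the whole dataframe with "time". That returns a boolean mask in which each entry is True where the value equals "time" and False otherwise. Then you can use .any() to check whether the mask contains any True values: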

>>> A = pd.DataFrame(["ancestry","time","history"])
>>> A
          0
0  ancestry
1      time
2   history

>>> A == "time"  # or A.eq("time")
       0
0  False
1   True
2  False

>>> (A == "time").any()
0    True
dtype: bool

Notice in the above output that (A == "time").any() returns a Series with one entry per column, indicating whether that column contains "time". If you want to check the entire dataframe (across all columns), call .any() twice:

>>> (A == "time").any().any()
True
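
Since the question asks about a list of strings and one particular column, the same idea can be narrowed to that column. The following is only a minimal sketch: the list wanted is a made-up example (it is not from the question), and column 0 is assumed to be the column of interest.

```python
import pandas as pd

A = pd.DataFrame(["ancestry", "time", "history"])

# Hypothetical list of strings to look up (not from the original question).
wanted = ["time", "history", "geography"]

# Option 1: build a set from the column once, then test each string.
# Set membership is an exact, whole-string comparison, not a substring match.
column_values = set(A[0])
found = {s: s in column_values for s in wanted}
print(found)  # {'time': True, 'history': True, 'geography': False}

# Option 2: vectorised check with Series.isin, one boolean per wanted string.
wanted_s = pd.Series(wanted)
mask = wanted_s.isin(A[0])
print(mask.tolist())  # [True, True, False]
```

Both options look at the values of column 0 only. The set version is probably the closest to the plain "string in something" syntax the question was hoping for.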
User contributions licensed under: CC BY-SA