I am trying to use switches to turn conditionals on and off when filtering a pandas DataFrame. The switches are just boolean variables that will be True or False. The problem is that ~True does not evaluate to the same thing as False, as I expected it would. Why does this not work?
>>> dataframe = pd.DataFrame({'col1': [3, 4, 5, 6], 'col2': [6, 5, 4, 3]})
>>> dataframe
   col1  col2
0     3     6
1     4     5
2     5     4
3     6     3
>>> dataframe.loc[dataframe.col1 <= dataframe.col2]
   col1  col2
0     3     6
1     4     5
>>> dataframe.loc[(True) | (dataframe.col1 <= dataframe.col2)]
   col1  col2
0     3     6
1     4     5
2     5     4
3     6     3
>>> dataframe.loc[(False) | (dataframe.col1 <= dataframe.col2)]
   col1  col2
0     3     6
1     4     5
>>> dataframe.loc[(~True) | (dataframe.col1 <= dataframe.col2)]
   col1  col2
0     3     6
1     4     5
2     5     4
3     6     3
>>> dataframe.loc[(~(True)) | (dataframe.col1 <= dataframe.col2)]
   col1  col2
0     3     6
1     4     5
2     5     4
3     6     3
>>>
>>> dataframe = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [True, False, False, True]})
>>> dataframe
   col1   col2
0     1   True
1     2  False
2     3  False
3     4   True
>>> dataframe.loc[dataframe.col2]
   col1  col2
0     1  True
3     4  True
>>> dataframe.loc[not dataframe.col2]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 1537, in __nonzero__
    raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> dataframe.loc[dataframe.col2 == False]
   col1   col2
1     2  False
2     3  False
Answer
This comes down to how pandas (following NumPy) implements the ~ operator. True is not a pandas object; it is a plain Python boolean. On a plain Python boolean, ~ is not logical negation. It only inverts booleans on pandas and NumPy objects.
As you can see:
>>> ~True
-2
It gives -2, which is the regular __invert__ behavior for Python integers: bool is a subclass of int, and ~x equals -x - 1.
Therefore:
>>> bool(-2)
True
This gives True because -2 is nonzero, so ~True is truthy rather than False.
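To make the difference concrete, here is a minimal sketch in plain Python that contrasts bitwise inversion with logical negation. The helper names bitwise_not and logical_not are hypothetical and exist only for illustration; they are not part of pandas or the standard library.

```python
def bitwise_not(flag: bool) -> int:
    """Bitwise inversion of a flag treated as an integer: ~x == -x - 1."""
    # int(...) makes the integer interpretation explicit.
    return ~int(flag)


def logical_not(flag: bool) -> bool:
    """Logical negation, which is what a True/False switch needs."""
    return not flag


print(bitwise_not(True))        # -2
print(bool(bitwise_not(True)))  # True, because -2 is nonzero
print(bitwise_not(False))       # -1, which is also truthy
print(logical_not(True))        # False
```

Note that both ~True and ~False produce truthy integers, so neither can act as an "off" switch.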
Don’t mix up pandas and plain Python behavior. pandas implements its own __invert__, for example:
>>> ~pd.Series([True])
0    False
dtype: bool
As you can see, in pandas (and NumPy), ~ inverts the booleans. Therefore, if you write:
>>> dataframe.loc[~pd.Series([True]).any() | (dataframe.col1 <= dataframe.col2)]
   col1  col2
0     3     6
1     4     5
you can see that it behaves the same as False. This works because .any() returns a NumPy boolean here rather than a plain Python bool.
The best way here is to use not, which is the logical negation for plain Python booleans:
>>> dataframe.loc[(not True) | (dataframe.col1 <= dataframe.col2)]
   col1  col2
0     3     6
1     4     5
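If the switch is used in several places, one option is to wrap the pattern in a small helper. The sketch below assumes the first dataframe from the question; the function name select_rows and its bypass parameter are hypothetical, chosen only for this example.

```python
import pandas as pd


def select_rows(frame: pd.DataFrame, condition: pd.Series, bypass: bool) -> pd.DataFrame:
    """Return every row when bypass is True, else only rows where condition holds."""
    # bool(...) normalizes the switch to a plain True/False before it is
    # OR-ed element-wise with the boolean mask.
    return frame.loc[bool(bypass) | condition]


dataframe = pd.DataFrame({"col1": [3, 4, 5, 6], "col2": [6, 5, 4, 3]})
condition = dataframe.col1 <= dataframe.col2

switch = True
all_rows = select_rows(dataframe, condition, bypass=switch)      # switch on: keep everything
filtered = select_rows(dataframe, condition, bypass=not switch)  # negate with not, never with ~
```

Here all_rows contains the four original rows, while filtered contains only the rows at index 0 and 1, matching the (not True) output above.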