I want to filter my dataframe with an or condition to keep rows with a particular column’s values that are outside the range [-0.25, 0.25]. I tried:
df = df[(df['col'] < -0.25) or (df['col'] > 0.25)]
But I get the error:
Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
Answer
The or and and Python operators require truth values. For pandas objects, these are considered ambiguous, so you should use the “bitwise” | (or) or & (and) operations instead:
df = df[(df['col'] < -0.25) | (df['col'] > 0.25)]
These are overloaded for these kinds of data structures to yield the element-wise or or and.
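As a minimal sketch of the fix above, the following builds a small made-up DataFrame (the sample values are an assumption; only the column name 'col' comes from the question) and applies the element-wise | filter. It also shows that the same rows can be selected by negating an & condition with ~.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'col' comes from the question.
df = pd.DataFrame({"col": [-0.5, -0.25, 0.0, 0.25, 0.3]})

# Element-wise "or": keep rows strictly outside [-0.25, 0.25].
mask = (df["col"] < -0.25) | (df["col"] > 0.25)
outside = df[mask]

# Element-wise "and", negated with "~", expresses the same filter.
inside = (df["col"] >= -0.25) & (df["col"] <= 0.25)
outside_alt = df[~inside]

print(outside["col"].tolist())
```

Both variants keep the original index labels of the surviving rows.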
Just to add some more explanation to this statement:
The exception is thrown when you want to get the bool of a pandas.Series:
>>> import pandas as pd
>>> x = pd.Series([1])
>>> bool(x)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
You hit a place where the operator implicitly converted the operands to bool (you used or, but it also happens for and, if and while):
>>> x or x
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> x and x
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> if x:
...     print('fun')
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> while x:
...     print('fun')
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Besides these four constructs, there are several Python functions that hide some bool calls (like any, all, filter, …). These are normally not problematic with pandas.Series, but for completeness I wanted to mention them.
In your case, the exception isn’t really helpful, because it doesn’t mention the right alternatives. For and and or, if you want element-wise comparisons, you can use:
- For or, numpy.logical_or:
  >>> import numpy as np
  >>> np.logical_or(x, y)
  or simply the | operator:
  >>> x | y
- For and, numpy.logical_and:
  >>> np.logical_and(x, y)
  or simply the & operator:
  >>> x & y
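To make the two options concrete, here is a small sketch with two made-up Boolean Series (the values of x and y are assumptions chosen for illustration). It suggests that the NumPy functions and the operators give the same element-wise result.

```python
import numpy as np
import pandas as pd

# Hypothetical Boolean inputs, chosen to cover all four combinations.
x = pd.Series([True, False, True, False])
y = pd.Series([True, True, False, False])

# NumPy functions applied to Series return a Series again.
either = np.logical_or(x, y)
both = np.logical_and(x, y)

# The overloaded operators compute the same element-wise results.
print((x | y).tolist())
print((x & y).tolist())
```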
If you’re using the operators, then be sure to set your parentheses correctly because of operator precedence.
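The precedence pitfall can be sketched as follows, using a made-up integer Series. Because & binds more tightly than the comparison operators, leaving out the parentheses turns the expression into a chained comparison, which again asks for the truth value of a Series.

```python
import pandas as pd

# Hypothetical sample data.
s = pd.Series([1, 2, 3])

# With parentheses, each comparison is evaluated first, then combined element-wise.
ok = (s > 1) & (s < 3)

# Without parentheses, Python parses this as the chained comparison
# `s > (1 & s) < 3`, which needs bool() of a Series and therefore fails.
try:
    s > 1 & s < 3
    raised = False
except ValueError:
    raised = True

print(ok.tolist(), raised)
```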
There are several logical NumPy functions which should work on pandas.Series.
The alternatives mentioned in the exception are more suited if you encountered it when doing if or while. I’ll briefly explain each of these:
If you want to check if your Series is empty:
>>> x = pd.Series([])
>>> x.empty
True
>>> x = pd.Series([1])
>>> x.empty
False
Python normally interprets the length of containers (like list, tuple, …) as truth-value if it has no explicit Boolean interpretation. So if you want the Python-like check, you could do: if x.size or if not x.empty instead of if x.
If your Series contains one and only one Boolean value:
>>> x = pd.Series([100])
>>> (x > 50).bool()
True
>>> (x < 50).bool()
False
If you want to check the first and only item of your Series (like .bool(), but it works even for non-Boolean contents):
>>> x = pd.Series([100])
>>> x.item()
100
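A short sketch of how .item() behaves, using made-up values: it returns the single element of a one-element Series (including a Boolean one) and raises ValueError when there is more than one element. Newer pandas releases deprecate Series.bool(), so, as an assumption about your pandas version, .item() may be the safer of the two.

```python
import pandas as pd

# Hypothetical one-element Series.
single = pd.Series([100])
flag = (single > 50).item()  # the one and only element of the Boolean Series

# With more than one element, .item() refuses to pick a value.
many = pd.Series([100, 200])
try:
    many.item()
    item_failed = False
except ValueError:
    item_failed = True

print(flag, item_failed)
```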
If you want to check if all or any item is not-zero, not-empty or not-False:
>>> x = pd.Series([0, 1, 2])
>>> x.all()   # because one element is zero
False
>>> x.any()   # because one (or more) elements are non-zero
True
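Putting the alternatives together, the sketch below shows how an if statement could state explicitly which question it asks about a Series. The helper name describe and its return strings are hypothetical and only serve the illustration.

```python
import pandas as pd

def describe(series):
    """Hypothetical helper: explicit truth checks instead of `if series:`."""
    if series.empty:             # no elements at all
        return "empty"
    if (series > 0).all():       # every element satisfies the condition
        return "all positive"
    if (series > 0).any():       # at least one element satisfies it
        return "some positive"
    return "none positive"

print(describe(pd.Series([0, 1, 2])))
```

Each branch reduces the Series to a single Boolean first, so the if never has to guess a truth value.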