I am trying to very efficiently chain a variable amount of boolean pandas Series, to be used as a filter on a DataFrame through boolean indexing.
Normally when dealing with multiple boolean conditions, one chains them like this
condition_1 = (df.A > some_value) condition_2 = (df.B <= other_value) condition_3 = (df.C == another_value) full_indexer = condition_1 & condition_2 & condition_3
but this becomes a problem with a variable amount of conditions.
bool_indexers = [ condition_1, condition_2, ..., condition_N, ]
I have tried out some possible solutions, but I am convinced it can be done more efficiently.
Option 1
Loop over the indexers and apply consecutively.
full_indexer = bool_indexers[0] for indexer in bool_indexers[1:]: full_indexer &= indexer
Option 2
Put into a DataFrame and calculate the row product.
full_indexer = pd.DataFrame(bool_indexers).product(axis=0)
Option 3
Use numpy.product
(like in this answer) and create a new Series out of the result.
full_indexer = pd.Series(np.prod(np.vstack(bool_indexers), axis=0))
All three solutions are somewhat inefficient because they rely on looping or force you to create a new object (which can be slow if repeated many times).
Can it be done more efficiently or is this it?
Advertisement
Answer
Use np.logical_and
:
import pandas as pd import numpy as np df = pd.DataFrame({'A': [0, 1, 2], 'B': [0, 1, 2], 'C': [0, 1, 2]}) m1 = df.A > 0 m2 = df.B <= 1 m3 = df.C == 1 m = np.logical_and.reduce([m1, m2, m3]) # OR m = np.all([m1, m2, m3], axis=0) out = df[np.logical_and.reduce([m1, m2, m3])]
Output:
>>> pd.concat([m1, m2, m3], axis=1) A B C 0 False True False 1 True True True 2 True False False >>> m array([False, True, False]) >>> out A B C 1 1 1 1