PySpark – Combine a list of filtering conditions

For starters, let me define a sample dataframe and import the sql functions:

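A minimal sketch of such a setup, assuming a local SparkSession and three integer columns A, B and C (column C is referenced later in the question); the sample values are placeholders:

    import pyspark.sql.functions as func
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # placeholder sample data; any small dataframe with columns A, B and C would do
    df = spark.createDataFrame(
        [(1, 1, 2), (1, 2, 2), (2, 1, 2)],
        ["A", "B", "C"],
    )
    df.show()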

This returns the following dataframe:

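With the placeholder rows above, df.show() gives:

    +---+---+---+
    |  A|  B|  C|
    +---+---+---+
    |  1|  1|  2|
    |  1|  2|  2|
    |  2|  1|  2|
    +---+---+---+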

Now let's say I have a list of filtering conditions, for example one specifying that columns A and B should both be equal to 1:

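Following the func.col style used later in the question, such a list would presumably look like:

    l = [func.col("A") == 1, func.col("B") == 1]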

I can combine these two conditions as follows and then filter the dataframe, obtaining the following result:

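For two conditions, that combination could be written by hand with the & operator, for example:

    df.filter(l[0] & l[1]).show()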

Result:

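With the placeholder data, only the row where both A and B equal 1 remains:

    +---+---+---+
    |  A|  B|  C|
    +---+---+---+
    |  1|  1|  2|
    +---+---+---+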

MY QUESTION

If l is a list of unknown length n (that is, a list of n filtering conditions) rather than just two, what is the most Pythonic way, ideally a one-liner, to logically combine them with & (and) or | (or)?

all() and any() will not work, because they are designed for plain lists of True/False values, not for Column expressions.

As an example, let us say that l = [func.col("A") == 1, func.col("B") == 1, func.col("C") == 2].

Help would be much appreciated.


Answer

You could use reduce, or a loop. The execution plan in Spark will be the same for both, so I believe it's just a matter of preference.

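A sketch of the reduce approach, assuming the conditions are collected in a list l as in the question (& is the overloaded Column operator, so reduce folds the list into a single condition):

    from functools import reduce

    # swap `a & b` for `a | b` to OR the conditions together instead
    df.filter(reduce(lambda a, b: a & b, l)).show()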

Produces

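output along these lines, assuming the placeholder dataframe and the two-condition list from the question:

    +---+---+---+
    |  A|  B|  C|
    +---+---+---+
    |  1|  1|  2|
    +---+---+---+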

and

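a sketch of the loop version, which chains one filter call per condition (consecutive filters are collapsed into a single Filter node, which is why the plan comes out the same):

    filtered = df
    for cond in l:
        # each .filter() adds one condition on top of the previous ones
        filtered = filtered.filter(cond)
    filtered.show()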

Produces

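the same output as above (again assuming the placeholder data):

    +---+---+---+
    |  A|  B|  C|
    +---+---+---+
    |  1|  1|  2|
    +---+---+---+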
User contributions licensed under: CC BY-SA