
How to select rows from list in PySpark

Suppose we have two dataframes df1 and df2 where df1 has columns [a, b, c, p, q, r] and df2 has columns [d, e, f, a, b, c]. Suppose the common columns are stored in a list common_cols = ['a', 'b', 'c'].

How do you join the two dataframes on the columns in common_cols from within a SQL query? The code below attempts to do this.

Answer

Demo setup
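
The setup code itself is not shown here, so the following is only a minimal sketch of what it might look like. The column layouts come from the question; the row values, the helper name make_demo_views, and the temp view names df1 and df2 are assumptions made for illustration.

```python
# Column layouts taken from the question; the row values are made up.
DF1_COLS = ["a", "b", "c", "p", "q", "r"]
DF2_COLS = ["d", "e", "f", "a", "b", "c"]
DF1_ROWS = [(1, 2, 3, 10, 11, 12), (4, 5, 6, 13, 14, 15)]
DF2_ROWS = [(20, 21, 22, 1, 2, 3), (23, 24, 25, 7, 8, 9)]

# Columns present in both dataframes, kept in df1's order.
common_cols = [c for c in DF1_COLS if c in DF2_COLS]


def make_demo_views(spark):
    """Hypothetical helper: build df1/df2 and register them as temp views.

    `spark` is expected to be an existing pyspark.sql.SparkSession.
    """
    df1 = spark.createDataFrame(DF1_ROWS, DF1_COLS)
    df2 = spark.createDataFrame(DF2_ROWS, DF2_COLS)
    # Temp views make the dataframes addressable by name from spark.sql(...).
    df1.createOrReplaceTempView("df1")
    df2.createOrReplaceTempView("df2")
    return df1, df2
```

With PySpark installed, a session created via SparkSession.builder.getOrCreate() can be passed to make_demo_views to register both views.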

Solution, based on the USING clause (SQL join syntax)
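
One way to apply the USING clause is to build the column list from common_cols with a plain string join and pass the resulting query to spark.sql. The sketch below is not necessarily the answer's original code; the helper name build_using_join and the view names df1 and df2 are assumptions.

```python
common_cols = ["a", "b", "c"]


def build_using_join(left, right, cols, how="INNER"):
    """Hypothetical helper: return a SQL join whose USING list comes from `cols`."""
    if not cols:
        raise ValueError("USING needs at least one column")
    # Turn ['a', 'b', 'c'] into the text "a, b, c" for the USING clause.
    col_list = ", ".join(cols)
    return f"SELECT * FROM {left} {how} JOIN {right} USING ({col_list})"


query = build_using_join("df1", "df2", common_cols)
print(query)
# SELECT * FROM df1 INNER JOIN df2 USING (a, b, c)
```

Assuming df1 and df2 are registered as temp views, spark.sql(query) runs the join, and the USING columns a, b, c appear only once in the result. The DataFrame API accepts the same list directly: df1.join(df2, on=common_cols).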
