
Create a new column by replacing comma-separated column’s values with a lookup based on another dataframe

I have a PySpark dataframe (source_df) with a column whose values are comma-separated. I am trying to replace those values using a lookup against another dataframe (lookup_df).

source_df


lookup_df


output dataframe:


Column A is a primary key and is always unique. Column T is unique for a given value of A.


Answer

You can split and explode column B, then do a left join with lookup_df. After that, collect the D values for each row and concatenate them with commas.

User contributions licensed under: CC BY-SA