How to get conditional values into new column from several external lists or arrays

I have the following dataframe:

JavaScript

I need to add a column new_col_cond to it whose values depend on several external lists/arrays (I have also tried dictionaries), for example:

JavaScript

The new column depends on the value of ratio: it selects from one array or the other, using id as the index. I have tried:

JavaScript

but this raises errors. I assume the main cause is using a column as the index into the array, but I am not sure how else to pass the index. Given the conditional nature of the column I have not tried a join (the data has millions of rows and the lists have thousands of entries).

Due to the size of the dataset I am steering away from Pandas and UDFs. The resulting dataframe should look like this:

JavaScript

Any help in solving this issue is appreciated.

Answer

Create ArrayType column expressions from the numpy arrays and use them in your condition like this:

JavaScript
User contributions licensed under: CC BY-SA