
Append only the non-null values in a PySpark DataFrame

I have a PySpark DataFrame (df) with the sample table below (table1):

id  col1  col2  col3
1   abc   null  def
2   null  def   abc
3   def   abc   null

I am trying to create a new column (final) by combining all the columns while ignoring null values. I tried f.array(col1, col2, col3) in PySpark: the values are combined, but the nulls are not ignored. I also tried a UDF that appends only the non-null columns, but it does not work.


Please let me know if the question is not clear or if more information is required. Any help would be appreciated. :)


Answer

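Since Spark 2.4 you can use higher-order functions for this, so no UDF is needed. In PySpark, one way is to build the array with array(col1, col2, col3) and pass it to the SQL filter function with a lambda that keeps only the non-null elements.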

User contributions licensed under: CC BY-SA