pyspark: turn array of dict to new columns

I am struggling to transform my pyspark dataframe which looks like this:

JavaScript

to this:

JavaScript

I tried pivot and a bunch of other things, but I didn't get the result above.

Note that I don't know in advance how many dicts the column Tstring contains.

Do you know how I can do this?

Answer

Using the transform function, you can convert each element of the array into a map. You can then use the aggregate function to merge these into a single map, explode it, and pivot on the keys to get the desired output:

JavaScript

I'm using Spark 3.1+, where higher-order functions such as transform are available in the DataFrame API, but you can do the same with expr on Spark < 3.1:

JavaScript
User contributions licensed under: CC BY-SA