
pyspark: turn array of dict to new columns

I am struggling to transform my PySpark DataFrame, which looks like this:

df = spark.createDataFrame([('0018aad4', [300, 450], ['{"v1": "blue"}', '{"v2": "red"}']), ('0018aad5', [300], ['{"v1": "blue"}'])], ["id", "Tlist", "Tstring"])
df.show(2, False)

+--------+----------+-------------------------------+
|id      |Tlist     |Tstring                        |
+--------+----------+-------------------------------+
|0018aad4|[300, 450]|[{"v1": "blue"}, {"v2": "red"}]|
|0018aad5|[300]     |[{"v1": "blue"}]               |
+--------+----------+-------------------------------+

to this:

df_result = spark.createDataFrame([('0018aad4', [300, 450], 'blue', 'red'), ('0018aad5', [300], 'blue', None)], ["id", "Tlist", "v1", "v2"])
df_result.show(2, False)

+--------+----------+----+----+
|id      |Tlist     |v1  |v2  |
+--------+----------+----+----+
|0018aad4|[300, 450]|blue|red |
|0018aad5|[300]     |blue|null|
+--------+----------+----+----+

I tried pivot and a bunch of other things, but couldn’t get the result above.

Note that the number of dicts in the Tstring column is not fixed.

Do you know how I can do this?


Answer

Using the transform function, you can convert each element of the array into a map type. After that, use the aggregate function to merge the array of maps into a single map, explode it, then pivot the keys to get the desired output:

from pyspark.sql import functions as F

df1 = df.withColumn(
    "Tstring",
    # parse each JSON string into a map<string,string>
    F.transform("Tstring", lambda x: F.from_json(x, "map<string,string>"))
).withColumn(
    "Tstring",
    # fold the array of maps into a single map, starting from the first element
    F.aggregate(
        F.expr("slice(Tstring, 2, size(Tstring))"),
        F.col("Tstring")[0],
        lambda acc, x: F.map_concat(acc, x)
    )
).select(
    "id", "Tlist", F.explode("Tstring")  # one row per (key, value) pair
).groupby(
    "id", "Tlist"
).pivot("key").agg(F.first("value"))


df1.show(truncate=False)
#+--------+----------+----+----+
#|id      |Tlist     |v1  |v2  |
#+--------+----------+----+----+
#|0018aad4|[300, 450]|blue|red |
#|0018aad5|[300]     |blue|null|
#+--------+----------+----+----+
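
If you’d rather avoid the aggregate/map_concat fold, a variant of the same idea is to explode the raw JSON strings first, parse each one on its own, and pivot straight away. A rough sketch (the df_alt name is mine, and it assumes the same df defined in the question):

from pyspark.sql import functions as F

df_alt = (df
          .select("id", "Tlist", F.explode("Tstring").alias("Tjson"))      # one row per JSON string
          .select("id", "Tlist",
                  F.explode(F.from_json("Tjson", "map<string,string>")))   # -> key, value columns
          .groupby("id", "Tlist")
          .pivot("key")
          .agg(F.first("value"))
          )

Grouping on the array column Tlist works here because it is constant per id; the result should have the same shape as df1 above.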

The df1 snippet above relies on Spark 3.1+, where higher-order functions such as transform and aggregate are available in the DataFrame API. For Spark < 3.1, you can do the same thing using expr:

df1 = (df.withColumn("Tstring", F.expr("transform(Tstring, x-> from_json(x, 'map<string,string>'))"))
       .withColumn("Tstring", F.expr("aggregate(slice(Tstring, 2, size(Tstring)), Tstring[0], (acc, x) -> map_concat(acc, x))"))
       .select("id", "Tlist", F.explode("Tstring"))
       .groupby("id", "Tlist")
       .pivot("key")
       .agg(F.first("value"))
       )
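
And if the set of keys is known up front (here I’m assuming only v1 and v2 can occur, which the question doesn’t guarantee), you don’t need the pivot at all: merge the maps the same way and pull each key out directly. A minimal sketch under that assumption:

from pyspark.sql import functions as F

# Assumes the keys are limited to v1 and v2.
df2 = (df.withColumn("Tmap", F.expr("transform(Tstring, x -> from_json(x, 'map<string,string>'))"))
       .withColumn("Tmap", F.expr("aggregate(slice(Tmap, 2, size(Tmap)), Tmap[0], (acc, x) -> map_concat(acc, x))"))
       .select("id", "Tlist",
               F.col("Tmap").getItem("v1").alias("v1"),
               F.col("Tmap").getItem("v2").alias("v2"))
       )

The pivot version stays the safer choice when the keys really do vary from row to row.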