I have a dataframe which looks like this
ID  col
1   [item1 -> 0.2, Item2 -> 0.3, item3 -> 0.4]
2   [item2 -> 0.1, Item2 -> 0.7, item3 -> 0.2]
I want to sum all the decimal values in each row and store the result in a new column:
ID  col                                          total
1   [item1 -> 0.2, Item2 -> 0.3, item3 -> 0.4]   0.9
2   [item2 -> 0.1, Item2 -> 0.7, item3 -> 0.2]   1.0
My approach
df = df.withColumn('total', F.expr('aggregate(map_values(col),0,(acc,x) -> acc + x)'))
This does not work: the error says it can be applied only to int.
Answer
from pyspark.sql import functions as func

data_sdf. \
    withColumn('map_vals', func.map_values('col')). \
    withColumn('sum_of_vals', func.expr('aggregate(map_vals, cast(0 as double), (x, y) -> x + y)'))
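Since your values are floating-point, the initial value passed to aggregate should match the type of the values in the array. A bare 0 is an integer literal, so the accumulator is typed as int while the lambda returns a double, which is the mismatch the error reports. Casting the initial 0 to double instead of using a plain 0 should work fine.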
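If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the names map_sum_expr and add_total are hypothetical (not part of PySpark), and it assumes the map column is called col and the output column total, as in the question.

```python
# Hypothetical helpers for illustration; only F.expr and withColumn are PySpark APIs.

def map_sum_expr(map_col: str) -> str:
    """Build a Spark SQL expression that sums the values of a map column."""
    # The initial accumulator is cast to double so its type matches the
    # double result of adding the map's values.
    return f"aggregate(map_values({map_col}), cast(0 as double), (acc, x) -> acc + x)"


def add_total(df, map_col: str = "col", out_col: str = "total"):
    """Return df with an extra column holding the row-wise sum of the map values."""
    # Imported inside the function so map_sum_expr can be used without Spark installed.
    from pyspark.sql import functions as F

    return df.withColumn(out_col, F.expr(map_sum_expr(map_col)))
```

With the dataframe from the question, df = add_total(df) should add the total column in one step, without the intermediate map_vals column.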