I have a PySpark DataFrame with a MapType column that either holds a map<string, int> value or is None. I need to perform some calculations using collect_list. However, collect_list excludes None values, so I am looking for a workaround that transforms None into a placeholder value, similar to Include null values in collect_list in pyspark.
In my case I can't use df.na.fill with a subset, since fillna only supports the int, float, string, and bool data types; columns of other types are ignored. The solution proposed in pyspark fillna is not working on column of ArrayType also doesn't work for me, since I get a type mismatch error.
import pyspark.sql.functions as F

default_value = F.lit('none')
fill_rule = F.when(F.col('map_values').isNull(), default_value).otherwise(F.col('map_values'))
df.withColumn('map_new', fill_rule)
I understand I should change the format of default_value with .cast('map<string,int>'), but I don't know how to define it. I would like to end up with something like {'none': 1}.
I am new to PySpark, so maybe there is a more efficient way to achieve what I need.
Answer
We can create a MapType literal from the required literals using create_map.
df = spark.createDataFrame([(1, 2, {'abc': 12}), (3, 4, None)], ['c1', 'c2', 'c3'])
df.show()
+---+---+-----------+
| c1| c2|         c3|
+---+---+-----------+
|  1|  2|[abc -> 12]|
|  3|  4|       null|
+---+---+-----------+

from pyspark.sql import functions as f

# replace null maps with a {'None': 1} literal before collecting
defaultval = f.create_map(f.lit('None'), f.lit(1))
df = df.withColumn('c3', f.when(f.col('c3').isNull(), defaultval).otherwise(f.col('c3')))

df.select(f.collect_list('c3')).show(truncate=False)
+--------------------------+
|collect_list(c3)          |
+--------------------------+
|[[abc -> 12], [None -> 1]]|
+--------------------------+
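Equivalently, since we only need a fallback for null values, the when/otherwise step can be written with coalesce, which returns its first non-null argument. A minimal sketch, assuming the same df and defaultval as above:

# coalesce keeps c3 where it is non-null and falls back to the map literal otherwise
df = df.withColumn('c3', f.coalesce(f.col('c3'), defaultval))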