Spread a List of Lists into a Spark DataFrame with PySpark?

Question

I'm currently struggling with the following issue. Let's take the following list of lists: How can I create the following Spark DataFrame out of it, with one row per element of each sublist: The only way I'm getting this done is by processing this list into another list with for-loops, which basically then alrea…

Accepted Answer

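You can create a DataFrame and use explode and array_min to get the desired output:

```python
import pyspark.sql.functions as F
from pyspark.sql import SparkSession

# In a notebook or the pyspark shell, `spark` already exists; otherwise create it.
spark = SparkSession.builder.getOrCreate()

data = [[1, 2, 3], [4, 5], [6, 7]]

df = spark.createDataFrame(
    [[data]],  # a single row whose only column holds the whole list of lists
    ['col']
).select(
    F.explode('col').alias('value')  # one row per sublist
).withColumn(
    'min_value',
    F.array_min('value')  # minimum of each sublist (array_min needs Spark 2.4+)
).withColumn(
    'value',
    F.explode('value')  # one row per element of each sublist
)

df.show()
```

Output:

```
+-----+---------+
|value|min_value|
+-----+---------+
|    1|        1|
|    2|        1|
|    3|        1|
|    4|        4|
|    5|        4|
|    6|        6|
|    7|        6|
+-----+---------+
```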
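For comparison with the for-loop pre-processing mentioned in the question, here is a minimal plain-Python sketch that produces the same (value, min_value) rows as the Spark pipeline. It assumes the target columns are `value` and `min_value`, as in the output table; the function name `spread_with_min` is hypothetical and only used for illustration.

```python
# Hypothetical helper: the "pre-process with for-loops" approach in plain Python.
# Emits one (value, min_value) tuple per element of each sublist.
def spread_with_min(lists):
    rows = []
    for sublist in lists:
        if not sublist:
            # Empty sublists produce no rows, matching what the final
            # F.explode does with an empty array.
            continue
        smallest = min(sublist)
        for value in sublist:
            rows.append((value, smallest))
    return rows


lists = [[1, 2, 3], [4, 5], [6, 7]]
rows = spread_with_min(lists)
print(rows)  # [(1, 1), (2, 1), (3, 1), (4, 4), (5, 4), (6, 6), (7, 6)]
```

These rows could then be passed to `spark.createDataFrame(rows, ['value', 'min_value'])`. The trade-off is that this expansion runs in Python on the driver, whereas `explode` and `array_min` are Spark SQL expressions, which mainly matters when the nested arrays already live in a Spark column.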


Answer