I’m currently struggling with the following issue.
Let’s take this list of lists:
[[1, 2, 3], [4, 5], [6, 7]]
How can I create the following Spark DataFrame out of it, with one row per element of each sublist?
| min_value | value |
|-----------|-------|
| 1         | 1     |
| 1         | 2     |
| 1         | 3     |
| 4         | 4     |
| 4         | 5     |
| 6         | 6     |
| 6         | 7     |
The only way I’ve managed to do this is by preprocessing the list with for-loops into another list that already represents all rows of my DataFrame, which is probably not the best way to solve this.
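For reference, the loop-based preprocessing described above (build all rows in plain Python first, then hand them to Spark) can be sketched like this; the variable names are placeholders:

```python
# Flatten a list of lists into (min_of_sublist, element) rows with plain loops.
data = [[1, 2, 3], [4, 5], [6, 7]]

rows = []
for sublist in data:
    m = min(sublist)          # minimum of the current sublist
    for value in sublist:
        rows.append((m, value))

print(rows)
# [(1, 1), (1, 2), (1, 3), (4, 4), (4, 5), (6, 6), (6, 7)]
```

Passing `rows` to `spark.createDataFrame(rows, ['min_value', 'value'])` would then give the desired DataFrame, but it materializes every row on the driver first, which is what the explode-based answer avoids.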
THX & BR IntoNumbers
Answer
You can create a dataframe and use explode and array_min to get the desired output:
```python
import pyspark.sql.functions as F

l = [[1, 2, 3], [4, 5], [6, 7]]

df = spark.createDataFrame(
    [[l]], ['col']
).select(
    F.explode('col').alias('value')
).withColumn(
    'min_value', F.array_min('value')
).withColumn(
    'value', F.explode('value')
)

df.show()
```

```
+-----+---------+
|value|min_value|
+-----+---------+
|    1|        1|
|    2|        1|
|    3|        1|
|    4|        4|
|    5|        4|
|    6|        6|
|    7|        6|
+-----+---------+
```