I’m currently struggling with the following issue:
Let’s take the following list of lists:
[[1, 2, 3], [4, 5], [6, 7]]
How can I create the following Spark DataFrame from it, with one row per element of each sublist:
| min_value | value |
|-----------|-------|
|         1 |     1 |
|         1 |     2 |
|         1 |     3 |
|         4 |     4 |
|         4 |     5 |
|         6 |     6 |
|         6 |     7 |
The only way I’ve found so far is to build a second list with for-loops that already contains every row of the DataFrame, which is probably not the best way to solve this.
Thanks and best regards, IntoNumbers
Answer
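You can create a DataFrame and use explode and array_min (available in Spark 2.4 and later) to get the desired output. The first explode produces one row per sublist, array_min takes the minimum of each sublist, and the second explode produces one row per element: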
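from pyspark.sql import SparkSession
import pyspark.sql.functions as F

# Reuse the active session (already defined as `spark` in notebooks and the pyspark shell)
spark = SparkSession.builder.getOrCreate()

l = [[1, 2, 3], [4, 5], [6, 7]]

df = spark.createDataFrame(
    [[l]],  # a single row whose only column holds the whole list of lists
    ['col']
).select(
    F.explode('col').alias('value')  # one row per sublist
).withColumn(
    'min_value',
    F.array_min('value')  # minimum of each sublist
).withColumn(
    'value',
    F.explode('value')  # one row per element of each sublist
)

df.show()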
+-----+---------+
|value|min_value|
+-----+---------+
|    1|        1|
|    2|        1|
|    3|        1|
|    4|        4|
|    5|        4|
|    6|        6|
|    7|        6|
+-----+---------+
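To make clear what the two explode calls and array_min compute, here is a minimal plain-Python sketch of the same logic. It is not part of the original answer. The names `lists` and `rows` are chosen only for illustration, and skipping empty sublists is an assumption meant to mirror how explode produces no rows for an empty array.

```python
# Plain-Python sketch of the explode + array_min logic (no Spark required).
lists = [[1, 2, 3], [4, 5], [6, 7]]

# One (min_value, value) tuple per element of each sublist:
# - min(sub) plays the role of array_min
# - the inner loop over `sub` plays the role of the second explode
# - `if sub` skips empty sublists (assumption: they should produce no rows)
rows = [(min(sub), value) for sub in lists if sub for value in sub]

print(rows)
```

If a SparkSession named `spark` is available, `spark.createDataFrame(rows, ['min_value', 'value'])` should give the same rows with the columns in the order asked for in the question. This is fine for small lists built on the driver. For larger data, the explode-based version keeps the work inside Spark.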