I’m currently struggling with the following issue:
Let’s take the following list of lists:
[[1, 2, 3], [4, 5], [6, 7]]
How can I create the following Spark DataFrame from it, with one row per element of each sublist and the minimum of that sublist in min_value:
| min_value | value |
---------------------
|         1 |     1 |
|         1 |     2 |
|         1 |     3 |
|         4 |     4 |
|         4 |     5 |
|         6 |     6 |
|         6 |     7 |
The only way I’ve managed to do this is by using for-loops to turn this list into another list that already contains every row of my DataFrame, which is probably not the best way to solve this.
THX & BR IntoNumbers
Answer
You can create a dataframe and use explode and array_min to get the desired output:
import pyspark.sql.functions as F

# Assumes an active SparkSession is available as `spark`.
l = [[1, 2, 3], [4, 5], [6, 7]]

df = spark.createDataFrame(
    [[l]],   # a single row whose only column holds the whole list of lists
    ['col']
).select(
    # one row per sublist
    F.explode('col').alias('value')
).withColumn(
    'min_value',
    # minimum of each sublist, computed while 'value' is still an array
    F.array_min('value')
).withColumn(
    'value',
    # one row per element of each sublist
    F.explode('value')
)

df.show()

Output:

+-----+---------+
|value|min_value|
+-----+---------+
|    1|        1|
|    2|        1|
|    3|        1|
|    4|        4|
|    5|        4|
|    6|        6|
|    7|        6|
+-----+---------+
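As a sanity check, the rows that the explode / array_min pipeline should produce can be written out in plain Python, without Spark. This is only a reference sketch for verifying the result on small inputs, not a replacement for the DataFrame approach; the names `nested` and `expected_rows` are made up for this illustration.

```python
# Plain-Python reference for the (value, min_value) rows the Spark
# pipeline above is expected to produce. No Spark required.
nested = [[1, 2, 3], [4, 5], [6, 7]]


def expected_rows(lists):
    """Return (value, min_value) pairs, one per element of each sublist."""
    rows = []
    for sub in lists:
        if not sub:
            # explode emits no rows for an empty array, so skip it here too
            continue
        smallest = min(sub)  # plays the role of array_min
        rows.extend((v, smallest) for v in sub)  # plays the role of explode
    return rows


rows = expected_rows(nested)
print(rows)
```

On a small test input, the Spark result can be compared against this by collecting it, for example `sorted(tuple(r) for r in df.collect()) == sorted(rows)`. Sorting both sides avoids relying on the row order Spark happens to return.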