How do I transpose columns in PySpark? I want columns to become rows, and rows to become columns.
Here is the input:
+-----+----+-----+-----+
|idx  |vin |cur  |mean |
+-----+----+-----+-----+
|Type1|D   |5.0  |6.0  |
|Type2|C   |null |7.0  |
+-----+----+-----+-----+
Expected Outcome:
+-----+-----+-----+
|idx  |Type1|Type2|
+-----+-----+-----+
|vin  |D    |C    |
|cur  |5.0  |null |
|mean |6.0  |7.0  |
+-----+-----+-----+
Answer
You can use the stack function to unpivot the vin, cur and mean columns, then pivot on the idx column:
from pyspark.sql import functions as F

df1 = (df.selectExpr("idx", "stack(3, 'vin',vin, 'cur',cur, 'mean',mean)")
       .select("idx", "col0", "col1")
       .groupBy("col0")
       .pivot("idx").agg(F.first("col1"))
       .withColumnRenamed("col0", "idx"))

df1.show(truncate=False)

#+----+-----+-----+
#|idx |Type1|Type2|
#+----+-----+-----+
#|vin |D    |C    |
#|mean|6.0  |7.0  |
#|cur |5.0  |null |
#+----+-----+-----+
You can apply the transformations one by one to see how it works and what each part does.