How do I transpose columns in PySpark? I want columns to become rows, and rows to become columns.
Here is the input:
+-----+-----+-----+-----+
|idx  |vin  |cur  |mean |
+-----+-----+-----+-----+
|Type1|D    |5.0  |6.0  |
|Type2|C    |null |7.0  |
+-----+-----+-----+-----+
Expected Outcome:
+-----+------+-----+
|idx  |Type1 |Type2|
+-----+------+-----+
|vin  |D     |C    |
|cur  |5.0   |null |
|mean |6.0   |7.0  |
+-----+------+-----+
Answer
You can use the stack function to unpivot the vin, cur and mean columns, then pivot the idx column:
from pyspark.sql import functions as F
df1 = (df.selectExpr("idx", "stack(3, 'vin',vin, 'cur',cur, 'mean',mean)")
       .select("idx", "col0", "col1")
       .groupBy("col0")
       .pivot("idx").agg(F.first("col1"))
       .withColumnRenamed("col0", "idx"))
df1.show(truncate=False)
#+----+-----+-----+
#|idx |Type1|Type2|
#+----+-----+-----+
#|vin |D |C |
#|mean|6.0 |7.0 |
#|cur |5.0 |null |
#+----+-----+-----+
You can apply the transformations one by one to see how it works and what each part does.
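To see the underlying logic without a Spark session, here is a plain-Python sketch of the same two steps, using the rows from the question. The `stack` step turns each cell into an `(idx, column_name, value)` triple, and the pivot step regroups those triples by column name. The dictionary shapes here are just for illustration; they are not Spark API.

```python
# Input rows, mirroring the question's DataFrame.
rows = [
    {"idx": "Type1", "vin": "D", "cur": 5.0, "mean": 6.0},
    {"idx": "Type2", "vin": "C", "cur": None, "mean": 7.0},
]

# Step 1 (stack / unpivot): one (idx, column_name, value) triple per cell.
stacked = [
    (row["idx"], col, row[col])
    for row in rows
    for col in ("vin", "cur", "mean")
]

# Step 2 (pivot): group by former column name, spread idx values into columns.
transposed = {}
for idx, col, value in stacked:
    transposed.setdefault(col, {})[idx] = value

print(transposed)
# {'vin': {'Type1': 'D', 'Type2': 'C'},
#  'cur': {'Type1': 5.0, 'Type2': None},
#  'mean': {'Type1': 6.0, 'Type2': 7.0}}
```

Each key of `transposed` corresponds to one row of the expected output, with the old idx values (`Type1`, `Type2`) as the new columns, matching what `groupBy("col0").pivot("idx")` produces in Spark.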