How do I transpose columns in Pyspark? I want to make columns become rows, and rows become the columns.
Here is the input:
+-----+-----+-----+-----+
|idx  |vin  |cur  |mean |
+-----+-----+-----+-----+
|Type1|D    |5.0  |6.0  |
|Type2|C    |null |7.0  |
+-----+-----+-----+-----+
Expected Outcome:
+-----+-----+-----+
|idx  |Type1|Type2|
+-----+-----+-----+
|vin  |D    |C    |
|cur  |5.0  |null |
|mean |6.0  |7.0  |
+-----+-----+-----+
Answer
You can use the stack function to unpivot the vin, cur and mean columns, then pivot the idx column:
from pyspark.sql import functions as F

df1 = (df.selectExpr("idx", "stack(3, 'vin',vin, 'cur',cur, 'mean',mean)")
       .select("idx", "col0", "col1")
       .groupBy("col0")
       .pivot("idx").agg(F.first("col1"))
       .withColumnRenamed("col0", "idx"))

df1.show(truncate=False)

#+----+-----+-----+
#|idx |Type1|Type2|
#+----+-----+-----+
#|vin |D    |C    |
#|mean|6.0  |7.0  |
#|cur |5.0  |null |
#+----+-----+-----+
You can apply the transformations one at a time to see how each step works and what each part does.
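To see what the two steps do conceptually without needing a Spark cluster, here is a plain-Python sketch of the same unpivot-then-pivot logic on the example data (all names here are illustrative, mirroring the col0/col1 columns that stack produces):

```python
# The input rows from the question, as plain dicts.
rows = [
    {"idx": "Type1", "vin": "D", "cur": 5.0, "mean": 6.0},
    {"idx": "Type2", "vin": "C", "cur": None, "mean": 7.0},
]

# Step 1: stack(3, 'vin',vin, 'cur',cur, 'mean',mean) -> long format.
# Each input row becomes three rows: (idx, column name, column value).
long_rows = [
    {"idx": r["idx"], "col0": name, "col1": r[name]}
    for r in rows
    for name in ("vin", "cur", "mean")
]

# Step 2: groupBy("col0").pivot("idx") -> one output row per old column
# name, with one output column per distinct idx value.
transposed = {}
for r in long_rows:
    transposed.setdefault(r["col0"], {"idx": r["col0"]})[r["idx"]] = r["col1"]

for row in transposed.values():
    print(row)
# {'idx': 'vin', 'Type1': 'D', 'Type2': 'C'}
# {'idx': 'cur', 'Type1': 5.0, 'Type2': None}
# {'idx': 'mean', 'Type1': 6.0, 'Type2': 7.0}
```

This mirrors the expected outcome: the old column names become the idx values of the rows, and the old idx values become columns.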