
How to transpose a dataframe in pyspark?

How do I transpose a DataFrame in PySpark? I want columns to become rows, and rows to become columns.

Here is the input:

+-----+---+----+----+
|idx  |vin|cur |mean|
+-----+---+----+----+
|Type1|D  |5.0 |6.0 |
|Type2|C  |null|7.0 |
+-----+---+----+----+

Expected Outcome:

+----+-----+-----+
|idx |Type1|Type2|
+----+-----+-----+
|vin |D    |C    |
|cur |5.0  |null |
|mean|6.0  |7.0  |
+----+-----+-----+


Answer

You can use the stack function to unpivot the vin, cur and mean columns, then pivot the idx column:

from pyspark.sql import functions as F

df1 = (df.selectExpr("idx", "stack(3, 'vin', vin, 'cur', cur, 'mean', mean)")
       .select("idx", "col0", "col1")
       .groupBy("col0")
       .pivot("idx")
       .agg(F.first("col1"))
       .withColumnRenamed("col0", "idx"))

df1.show(truncate=False)

#+----+-----+-----+
#|idx |Type1|Type2|
#+----+-----+-----+
#|vin |D    |C    |
#|mean|6.0  |7.0  |
#|cur |5.0  |null |
#+----+-----+-----+

Apply the transformations one by one to see how each part works.

User contributions licensed under: CC BY-SA