Let’s say I have the following DataFrame:
user, ticker, date
u1, AAPL, 2021-07-07
u1, MSFT, 2021-07-07
u1, GOOG, 2021-07-07
u2, TSLA, 2021-07-07
u2, NFLX, 2021-07-07
u2, AMZN, 2021-07-07
and by design each user has 3 rows. I want to turn my DataFrame into:
user, ticker_1, date_1, ticker_2, date_2, ticker_3, date_3
u1, AAPL, 2021-07-07, MSFT, 2021-07-07, GOOG, 2021-07-07
u2, TSLA, 2021-07-07, NFLX, 2021-07-07, AMZN, 2021-07-07
I was trying groupBy(col('user')) and then pivot by ticker, but that returns as many columns as there are distinct tickers, whereas I want a fixed number of columns. Is there another Spark operator I could use for that?
Using PySpark and Azure Databricks.
Answer
If the order doesn’t matter, you can use row_number to number each user’s rows and then pivot on that rank:
import pyspark.sql.functions as F
from pyspark.sql import Window

# Number each user's rows 1, 2, 3 (here ordered alphabetically by ticker)
df = df.withColumn(
    'rank',
    F.row_number().over(Window.partitionBy('user').orderBy('ticker'))
)

# Pivot on the rank so each user collapses into a single row
df = df.groupBy('user').pivot('rank').agg(
    F.first('ticker').alias('ticker'),
    F.first('date').alias('date')
)
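After the pivot, Spark typically names the output columns like 1_ticker, 1_date, 2_ticker, and so on (pivot value plus aggregation alias). If you want the exact ticker_1, date_1 layout from the question, a short rename pass like the sketch below should do it; it assumes that default naming and the fixed 3 rows per user.

# Minimal rename sketch, assuming columns came out as '1_ticker', '1_date', ...
for i in range(1, 4):  # 3 rows per user by design
    df = (df.withColumnRenamed(f'{i}_ticker', f'ticker_{i}')
            .withColumnRenamed(f'{i}_date', f'date_{i}'))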