Join two rows iteratively to create a new table in Spark, with one row for each pair of rows

Question

I have a table that I want to walk through in ranges of two rows. How do I create a new table in Spark that goes through the rows in pairs and shows the first id with the second, together with col b and message? The final table should have one row for each pair of rows in the original.

Accepted Answer

In PySpark you can use a Window, for example:

    from pyspark.sql import Window
    from pyspark.sql import functions as F

    # frame = the current row plus the one that follows it
    window = Window.orderBy('id').rowsBetween(Window.currentRow, 1)

    (
        df
        .withColumn('ids', F.concat_ws(':', F.first('id').over(window), F.last('id').over(window)))
        .withColumn('messages', F.concat_ws(',', F.first('col b').over(window), F.last('message').over(window)))
        .withColumn('full_message', F.concat_ws(',', 'ids', 'messages'))
        # select only the first entries, regardless of the id
        .withColumn('seq_id', F.row_number().over(Window.orderBy('id')))
        .filter(F.col('seq_id') % 2 != 0)
        .select('id', 'full_message')
    )

Output:

    id   full_message
    1    1:2,abc,world
    3    3:4,abc 1,night
    100  100:101,abc1,Tuesday
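To make it easier to see what the window frame and the odd-row filter compute, here is a rough plain-Python sketch of the same pairing logic. It is not Spark code. The sample rows are an assumption reconstructed from the output shown in the answer, the "?" values are placeholders for cells the question does not show, and `pair_rows` is a hypothetical helper name.

```python
# Hypothetical sample rows as (id, col_b, message); "?" marks values not shown in the question
rows = [
    (1, "abc", "?"),
    (2, "?", "world"),
    (3, "abc 1", "?"),
    (4, "?", "night"),
    (100, "abc1", "?"),
    (101, "?", "Tuesday"),
]


def pair_rows(rows):
    """Mimic the window answer: frame = current row plus the next one, keep rows 1, 3, 5, ..."""
    ordered = sorted(rows, key=lambda r: r[0])  # plays the role of Window.orderBy('id')
    result = []
    for i in range(0, len(ordered), 2):  # 0-based 0, 2, 4 = row_number 1, 3, 5
        first = ordered[i]
        # A trailing unpaired row has a frame holding only itself, so last == first
        last = ordered[i + 1] if i + 1 < len(ordered) else first
        ids = f"{first[0]}:{last[0]}"            # first id : last id
        messages = f"{first[1]},{last[2]}"       # first col b , last message
        result.append((first[0], f"{ids},{messages}"))
    return result


paired = pair_rows(rows)
for row in paired:
    print(row)
```

One thing the sketch makes visible: with an odd number of rows, the last row has no partner, so its frame contains only itself and it would come out as something like `3:3,...`. If that is not what you want, you would need to handle or drop that final row separately.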

