Groupby column and create lists for other columns, preserving order

I have a PySpark dataframe which looks like this:

JavaScript

I want to group by (or partition by) the ID column and then build the lists for col1 and col2 in timestamp order.

JavaScript

My approach:

JavaScript

But this does not return the lists for col1 and col2.

Answer

I don’t think the order can be reliably preserved using groupBy aggregations, so window functions seem to be the way to go.

Setup:

JavaScript

Script:

JavaScript

Result:

JavaScript

You were also very close to what you needed. After some experimenting, this seems to work too:

JavaScript
User contributions licensed under: CC BY-SA