PySpark groupBy DataFrame without aggregation or count

Is it possible to iterate through a PySpark groupBy DataFrame without applying an aggregation or a count?

For example, this is straightforward in pandas:

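The original snippet was lost when the page was scraped; a minimal sketch of the usual pandas pattern, with illustrative data and column names:

```python
import pandas as pd

# illustrative data; the question's original frame is not preserved
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "c"],
    "value": [1, 2, 3, 4, 5],
})

# pandas lets you iterate over the groups directly,
# with no aggregation or count step
for name, group in df.groupby("group"):
    print(name)   # the group key
    print(group)  # the sub-DataFrame holding that group's rows
```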


Answer

At best you can use .first and .last to get the respective values from the groupBy, but not all of the rows in the way you can in pandas.

For example:

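The original example is likewise missing; a minimal sketch of that approach, with the session setup, data, and column names assumed:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("a", 1), ("a", 2), ("b", 3), ("b", 4)],
    ["group", "value"],
)

# Spark's groupBy only hands back aggregated values, so the closest
# you get to "a row from each group" is first/last per group
df.groupBy("group").agg(
    F.first("value").alias("first_value"),
    F.last("value").alias("last_value"),
).show()
```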

Since there is a basic difference between the way data is handled in pandas and in Spark, not all functionality can be used in the same way.

There are a few workarounds to get what you want, for example:

For the diamonds DataFrame:

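The original block is missing; a sketch of loading the data, assuming the classic diamonds CSV is available locally as diamonds.csv:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# assumes the classic diamonds CSV (carat, cut, color, clarity, ...)
# has been downloaded locally as diamonds.csv
diamonds = spark.read.csv("diamonds.csv", header=True, inferSchema=True)
diamonds.show(5)
```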

You can use:

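The answer's code block is also missing; below is a sketch of the groups helper the text goes on to describe, with the signature and body reconstructed from the surrounding prose (only the function name, groups, appears in the original):

```python
import pyspark.sql.functions as F

def groups(df, key):
    # collect the distinct values of the grouping column ...
    keys = [row[key] for row in df.select(key).distinct().collect()]
    # ... and return one filtered DataFrame per value
    return {k: df.filter(F.col(key) == k) for k in keys}

# diamonds is the DataFrame loaded above;
# each entry now behaves like one pandas group
for cut, group_df in groups(diamonds, "cut").items():
    print(cut)
    group_df.show(5)
```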

Each call to .show() then prints one table per distinct value of the grouping column; for cut in the diamonds data that means separate blocks for Fair, Good, Very Good, Premium, and Ideal.

In the groups function you can decide what kind of grouping you want for the data. It is a simple filter condition, but it will get you all the groups separately.
