
Is there a Scala Spark equivalent to pandas Grouper freq feature?

In pandas, if we have a time series and need to group it by a certain frequency (say, every two weeks), it’s possible to use the Grouper class, like this:

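A minimal sketch of the kind of snippet this refers to; the DataFrame, column name, and dates are placeholders:

import pandas as pd

# Daily time series over roughly two months (illustrative data)
df = pd.DataFrame(
    {"value": range(60)},
    index=pd.date_range("2021-01-01", periods=60, freq="D"),
)

# Group the series into two-week buckets and aggregate each bucket
result = df.groupby(pd.Grouper(freq="2W")).sum()
print(result)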

Is there any equivalent in Spark (more specifically, using Scala) for this feature?


Answer

You can use the SQL function window. First, create the timestamp column, if you don't have one yet, from a string-typed datetime:

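A minimal sketch, assuming a DataFrame df with a string column date holding datetimes in yyyy-MM-dd HH:mm:ss format (both names are placeholders):

import org.apache.spark.sql.functions.{col, to_timestamp}

// Parse the string datetime into a proper TimestampType column
val dfWithTs = df.withColumn("ts", to_timestamp(col("date"), "yyyy-MM-dd HH:mm:ss"))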

Then apply the window function to the timestamp column and aggregate the column you need, to obtain one result per slot:

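Continuing the same sketch, grouping into two-week slots and summing a hypothetical value column:

import org.apache.spark.sql.functions.{col, sum, window}

// One group per two-week tumbling window over the ts column;
// "value" stands in for whatever column you need to aggregate
val grouped = dfWithTs
  .groupBy(window(col("ts"), "2 weeks"))
  .agg(sum(col("value")).as("total"))

Note that window produces fixed tumbling windows aligned to the Unix epoch, not to week boundaries like pandas' "2W"; the function's optional startTime parameter can shift that alignment.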

The result includes the calculated windows. Take a look at the documentation for a better understanding of the input parameters: https://spark.apache.org/docs/latest/api/sql/index.html#window.

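A sketch of how you might inspect the result; the window column is a struct with start and end fields:

// Pull the window bounds out of the struct and show one row per slot
grouped
  .select(
    col("window.start").as("start"),
    col("window.end").as("end"),
    col("total")
  )
  .orderBy("start")
  .show(truncate = false)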

It gives one row per two-week window, shaped like this (actual timestamps and totals depend on your data):

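+-------------------+-------------------+-----+
|start              |end                |total|
+-------------------+-------------------+-----+
|...                |...                |...  |
+-------------------+-------------------+-----+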