In pandas, if we have a time series and need to group it by a certain frequency (say, every two weeks), it’s possible to use the Grouper
class, like this:
import pandas as pd
df.groupby(pd.Grouper(key='timestamp', freq='2W'))
Is there any equivalent in Spark (more specifically, using Scala) for this feature?
Answer
You can use the SQL function window. First, create the timestamp column, if you don't have one yet, from a string-typed datetime:
val data =
  Seq(("2022-01-01 00:00:00", 1),
      ("2022-01-01 00:15:00", 1),
      ("2022-01-08 23:30:00", 1),
      ("2022-01-22 23:30:00", 4))
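The snippet above only defines the raw data; the groupBy below assumes a DataFrame df whose string column has already been converted to a timestamp. A minimal sketch of that step, assuming a SparkSession named spark is in scope and using the column names date and a inferred from the later groupBy/sum calls:

import org.apache.spark.sql.functions.{col, sum, to_timestamp, window}

// assumes a SparkSession named `spark` is already in scope
import spark.implicits._

// column names "date" and "a" are inferred from the snippets below;
// to_timestamp parses the "yyyy-MM-dd HH:mm:ss" strings with its default format
val df = data.toDF("date", "a")
  .withColumn("date", to_timestamp(col("date")))

The window and sum functions imported here are the ones used in the next snippets.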
Then, apply the window function to the timestamp column and aggregate the column you need, to obtain one result per time slot:
val df0 =
  df.groupBy(window(col("date"), "1 week", "1 week", "0 minutes"))
    .agg(sum("a") as "sum_a")
The result includes the calculated windows. Take a look at the docs for a better understanding of the input parameters: https://spark.apache.org/docs/latest/api/sql/index.html#window.
val df1 = df0.select("window.start", "window.end", "sum_a")

df1.show()
It gives:
+-------------------+-------------------+-----+
|              start|                end|sum_a|
+-------------------+-------------------+-----+
|2022-01-20 01:00:00|2022-01-27 01:00:00|    4|
|2021-12-30 01:00:00|2022-01-06 01:00:00|    2|
|2022-01-06 01:00:00|2022-01-13 01:00:00|    1|
+-------------------+-------------------+-----+
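The question asked for the equivalent of pandas' freq='2W', i.e. two-week buckets; window also accepts longer durations such as "2 weeks". A sketch of that variant, using the same assumed column names (dfBiweekly is just an illustrative name; note that Spark anchors its windows to the Unix epoch plus the startTime offset, so the bucket boundaries will not necessarily match pandas' week anchoring):

// two-week tumbling windows, roughly matching pandas' freq='2W'
val dfBiweekly =
  df.groupBy(window(col("date"), "2 weeks", "2 weeks", "0 minutes"))
    .agg(sum("a") as "sum_a")
    .select("window.start", "window.end", "sum_a")

dfBiweekly.show()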