
Sum value between overlapping interval slices per group

I have a pyspark dataframe as below:

JavaScript

I want to sum the consumption over the overlapping interval slices for each idx:

JavaScript


Answer

You can use sequence to expand each interval into single days, explode the resulting list of days, and then sum the consumption for each timestamp and idx:

JavaScript

Output:

JavaScript

Remarks:

  • sequence includes the last value of the interval, so one day has to be subtracted from valid_to.
  • the missing end dates of the intervals are then restored using a full join with the original valid_to values, filling null values with 0.0.
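
The logic described above can be sketched in plain Python, without Spark, to make the three steps easier to follow: expand each interval into days (the role of sequence and explode), sum the consumption per idx and day, and restore the interval end dates with 0.0 (the role of the full join). This is only an illustration under assumptions: the sample rows, the valid_from column name and the sum_per_day helper are hypothetical and do not come from the original question or answer.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical sample rows: (idx, valid_from, valid_to, consumption).
# valid_to is treated as exclusive, which mirrors subtracting one day
# from valid_to before expanding the interval.
rows = [
    (1, date(2020, 1, 1), date(2020, 1, 4), 10.0),
    (1, date(2020, 1, 3), date(2020, 1, 6), 5.0),
    (2, date(2020, 1, 1), date(2020, 1, 3), 7.0),
]


def sum_per_day(rows):
    """Sum consumption per (idx, day) over all intervals covering that day."""
    totals = defaultdict(float)

    # Steps 1 and 2: expand each interval into single days and accumulate.
    for idx, valid_from, valid_to, consumption in rows:
        day = valid_from
        while day < valid_to:  # stops one day before valid_to
            totals[(idx, day)] += consumption
            day += timedelta(days=1)

    # Step 3: restore the end dates. An end date that no interval covers
    # gets 0.0; an end date that is already covered keeps its sum.
    for idx, _, valid_to, _ in rows:
        totals.setdefault((idx, valid_to), 0.0)

    return dict(sorted(totals.items()))


result = sum_per_day(rows)
for (idx, day), total in result.items():
    print(idx, day.isoformat(), total)
```

With these sample rows, 2020-01-03 for idx 1 is covered by both intervals and sums to 15.0, while 2020-01-06 is only an end date and therefore appears with 0.0.
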
User contributions licensed under: CC BY-SA