How to calculate cumulative sum over date range excluding weekends in PySpark 2.0?

This is an extension of an earlier question I asked here: How to calculate difference between dates excluding weekends in PySpark 2.2.0. My Spark dataframe looks like the one below and can be generated with the accompanying code:


I am trying to calculate cumulative sums over windows of 2, 3, 4, 5, and 30 days. Below is sample code for 2 days and the resulting table.


What I am trying to do is exclude weekends when calculating the date range. In my table, 2020-11-27 is a Friday and 2020-11-30 is a Monday, so the difference between them is 1 if we exclude Saturday and Sunday. I want the cumulative sum shown against 2020-11-30 in the ‘cum_sum_2d_temp’ column to include both the 2020-11-27 and 2020-11-30 values, which should be 3. I am looking to combine the solution to my earlier question with the date range.


Answer

Calculate the date_dif relative to the earliest date:

User contributions licensed under: CC BY-SA