Split a list of overlapping intervals into non-overlapping sub-intervals in a PySpark DataFrame

I have a PySpark DataFrame with the columns start_time and end_time, which define one interval per row.

There is also a rate column. I want to know whether a sub-interval (a stretch where two intervals overlap) carries different rate values; if it does, I want to keep the last record as the ground truth.

Inputs:


Answer

You can compare each row's end_time with the next row's start_time, and replace the end_time with that next start_time whenever it is smaller than the end_time.

User contributions licensed under: CC BY-SA