How to collapse overlapping intervals [start-end] and keep the smaller?

I have a Pandas dataframe of intervals defined by 2 numerical coordinates, ‘start’ and ‘end’.

I am trying to collapse all intervals that are overlapping, and keep the inner coordinates.

index start end  
0 10 40  
1 13 34  
2 50 100  
3 44 94  

Output: The same Pandas dataframe with collapsed intervals and inner coordinates. Two intervals overlap if they share a common point, including closed endpoints. Intervals that only have an open endpoint in common do not overlap.

e.g. The intervals with row index = [0,1] overlap. I want to collapse these 2 intervals into a new interval with new_start == max([10, 13]) and new_end == min([40, 34]), so the collapsed interval for row index [0,1] will have new_start = 13 and new_end = 34.

index start end  
0 13 34  
1 50 94
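
As a small illustration of the rule described in the question, the overlap test and the inner-coordinate collapse for a single pair of closed intervals could be sketched in plain Python. The helper names `overlaps` and `collapse_pair` are hypothetical and only serve this example.

```python
def overlaps(a, b):
    """True if closed intervals a=(start, end) and b=(start, end) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


def collapse_pair(a, b):
    """Inner coordinates of two overlapping intervals: (max of starts, min of ends)."""
    if not overlaps(a, b):
        raise ValueError("intervals do not overlap")
    return max(a[0], b[0]), min(a[1], b[1])


# Rows 0 and 1 of the example, then rows 2 and 3.
print(collapse_pair((10, 40), (13, 34)))   # (13, 34)
print(collapse_pair((50, 100), (44, 94)))  # (50, 94)
```

This is only meant to pin down the semantics on the sample rows; a pairwise loop like this would not be a reasonable way to process 2M rows.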

The dataframe has 2M rows, so I am also looking for an efficient way to do it.

Answer

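It can be done as below. A new group starts whenever a row's start is greater than the previous row's end; within each group, the max of start and the min of end give the inner coordinates. Each row is only compared with the row before it, so overlapping rows need to be adjacent in the dataframe, as they are in the example.

df.groupby(((df.shift()["end"] - df["start"])<0).cumsum()).agg({"start":"max", "end":"min"})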
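
The one-liner above compares each row only with the row immediately before it. If the rows are not guaranteed to be ordered that way, one possible variant is to sort by start first and compare each start with the running maximum of all earlier ends. The sketch below is not the original answer's code. It assumes closed intervals and columns named `start` and `end` as in the question, and the function name `collapse_inner` is hypothetical.

```python
import pandas as pd


def collapse_inner(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse runs of overlapping closed intervals to their inner coordinates."""
    s = df.sort_values(["start", "end"]).reset_index(drop=True)
    # Largest end among all earlier rows; NaN for the first row.
    prev_max_end = s["end"].cummax().shift()
    # A new group begins where start lies strictly past every earlier end.
    # A comparison with NaN is False, so the first row lands in group 0.
    group = (s["start"] > prev_max_end).cumsum().rename("group")
    out = s.groupby(group).agg(start=("start", "max"), end=("end", "min"))
    return out.reset_index(drop=True)


df = pd.DataFrame({"start": [10, 13, 50, 44], "end": [40, 34, 100, 94]})
result = collapse_inner(df)
print(result)  # two rows: (13, 34) and (50, 94)
```

Everything here is vectorized (one sort, one cumulative max, one groupby), so it should scale to a few million rows, though that has not been measured here. One caveat: when intervals only overlap as a chain (for example [1, 5], [4, 10], [9, 12]), they fall into one group and the result can have start greater than end, so such cases may need separate handling.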
User contributions licensed under: CC BY-SA