How many records include each time interval in Pandas?

I have two Pandas event dataframes. The first is for longer events, while the second is for shorter events. The start and end times of the events define them. Every long event is a “container” of at least one short event. Each short event is part of only one long event interval.

I try to figure out how many short events/intervals are in each long event.

Minimal example:

# long events
start | end | tag
0.5   | 0.7 | AAA
0.7   | 1.2 | BBB
1.2   | 2   | CCC

# short events

start | end | tag
0.5   | 0.55| a
0.55  | 0.7 | aa
0.7   | 1.2 | b
1.2   | 1.3 | c
1.3   | 1.4 | cc
1.4   | 1.5 | ccc
1.5   | 1.6 | cccc
1.6   | 2   | ccccc

The desired output is:

start | end | tag | count of sub events
0.5   | 0.7 | AAA | 2
0.7   | 1.2 | BBB | 1
1.2   | 2   | CCC | 5

I attempted to use “apply”, so I created a function that looked like this:

def count_records_in_time_range(df, start, end):
    return len(df[(df['start'] >= start) & (df['end'] <= end)])

But I’m not sure how to proceed; should I use this function on the short events tables or the long events tables? How can I calculate the desired count per record in one table (the long events dataframe) based on the records in the other table (the short events dataframe)?

Answer

We can use numpy broadcasting here

s, e = long[['start', 'end']].values[None, :].T
long['count'] = ((s <= short['start'].values) & (e >= short['end'].values)).sum(1)

   start  end  tag  count
0    0.5  0.7  AAA      2
1    0.7  1.2  BBB      1
2    1.2  2.0  CCC      5

Advertisement

Answer