Finding overlapped intervals in a set of intervals

Question

We have a login system that tracks how long people are connected. I would like to write a code to find people who were online at the same time. Look at this example, please: Think of these as intervals of Person 1 to 4. I want the output of the algorithm to be something like this: I tried to solve

Accepted Answer

Heavily inspired by Minimum number of platforms required for a railway station, but maintaining a set of persons present instead of a count of trains present:Also an implementation of the algorithm hinted at in Yves Daoust&#8217;s answer.The idea is to first build a chronological list of all events, where an event is either the arrival or departure of a person.Then we iterate on the list of events, maintaining a set of persons present, adding the persons who arrive and removing the persons who leave.Below, the chronological list of events is implemented as a python dict mapping times to python lists of events, so that events happening at the same time, if any, are naturally grouped together.from operator import itemgetterdata = {    'P1': [1,7],    'P2': [2,5],    'P3': [3,4],    'P4': [6,8]}def get_overlaps(data):    events = {}    for person, (start, end) in data.items():        events.setdefault(start, []).append((person, set.add))        events.setdefault(end, []).append((person, set.remove))    # events = {1: [('P1', add)], 2: [('P2', add)], 3: [('P3', add)], 4: [('P3', remove)], 5: [('P2', remove)], 6: [('P4', add)], 7: [('P1', remove)], 8: [('P4', remove)]}    present_persons = set()    start_time = 0    for new_time, grouped_events in sorted(events.items(), key=itemgetter(0)):        if len(present_persons) >= 2:            yield (frozenset(present_persons), (start_time, new_time))        start_time = new_time        for new_person, add_or_remove in grouped_events:            add_or_remove(present_persons, new_person)data = {    'P1': [1,7],    'P2': [2,5],    'P3': [3,4],    'P4': [6,8]}overlaps = list(get_overlaps(data))print(overlaps)# [(frozenset({'P2', 'P1'}),       (2, 3)),#  (frozenset({'P2', 'P1', 'P3'}), (3, 4)),#  (frozenset({'P2', 'P1'}),       (4, 5)),#  (frozenset({'P4', 'P1'}),       (6, 7))]data2 = { # three events at time 5: (P1,remove), (P2,remove), (P4,add)     'P1': [1,5],    'P2': [2,5],    'P3': [3,4],    'P4': [5,8]}overlaps = list(get_overlaps(data2))print(overlaps)# [(frozenset({'P2', 'P1'}), (2, 3)),#  (frozenset({'P2', 'P1', 'P3'}), (3, 4)),#  (frozenset({'P2', 'P1'}), (4, 5))]This algorithm has time complexity O(N), where N is the number of persons. For comparison, the algorithm in Dani Mesejo&#8217;s answer has time complexity O(TN), where N is the number of persons and T the maximum time that any person is present, measured in number of instants.

Advertisement

Answer