Skip to content
Advertisement

Finding overlapped intervals in a set of intervals

We have a login system that tracks how long people are connected. I would like to write a code to find people who were online at the same time. Look at this example, please:

P1: [1,7]
P2: [2,5]
P3: [3,4]
P4: [6,8]

Think of these as intervals of Person 1 to 4. I want the output of the algorithm to be something like this:

P1, P2 : [2, 3]
P1, P2, P3 : [3, 4]
P1, P2 : [4, 5]
P1, P4 : [6,7]

I tried to solve the problem with two for loops so that we get a list of people whose intervals overlap, but the problem is dealing with intervals for more than one person. for example, in the above example, [3,4] not have to come in [4, 5] in line three because it calculated as a three-person interval.

Advertisement

Answer

Heavily inspired by Minimum number of platforms required for a railway station, but maintaining a set of persons present instead of a count of trains present:

Also an implementation of the algorithm hinted at in Yves Daoust’s answer.

The idea is to first build a chronological list of all events, where an event is either the arrival or departure of a person.

Then we iterate on the list of events, maintaining a set of persons present, adding the persons who arrive and removing the persons who leave.

Below, the chronological list of events is implemented as a python dict mapping times to python lists of events, so that events happening at the same time, if any, are naturally grouped together.

from operator import itemgetter

data = {
    'P1': [1,7],
    'P2': [2,5],
    'P3': [3,4],
    'P4': [6,8]
}

def get_overlaps(data):
    events = {}
    for person, (start, end) in data.items():
        events.setdefault(start, []).append((person, set.add))
        events.setdefault(end, []).append((person, set.remove))
    # events = {1: [('P1', add)], 2: [('P2', add)], 3: [('P3', add)], 4: [('P3', remove)], 5: [('P2', remove)], 6: [('P4', add)], 7: [('P1', remove)], 8: [('P4', remove)]}
    present_persons = set()
    start_time = 0
    for new_time, grouped_events in sorted(events.items(), key=itemgetter(0)):
        if len(present_persons) >= 2:
            yield (frozenset(present_persons), (start_time, new_time))
        start_time = new_time
        for new_person, add_or_remove in grouped_events:
            add_or_remove(present_persons, new_person)

data = {
    'P1': [1,7],
    'P2': [2,5],
    'P3': [3,4],
    'P4': [6,8]
}
overlaps = list(get_overlaps(data))

print(overlaps)
# [(frozenset({'P2', 'P1'}),       (2, 3)),
#  (frozenset({'P2', 'P1', 'P3'}), (3, 4)),
#  (frozenset({'P2', 'P1'}),       (4, 5)),
#  (frozenset({'P4', 'P1'}),       (6, 7))]

data2 = { # three events at time 5: (P1,remove), (P2,remove), (P4,add) 
    'P1': [1,5],
    'P2': [2,5],
    'P3': [3,4],
    'P4': [5,8]
}

overlaps = list(get_overlaps(data2))

print(overlaps)
# [(frozenset({'P2', 'P1'}), (2, 3)),
#  (frozenset({'P2', 'P1', 'P3'}), (3, 4)),
#  (frozenset({'P2', 'P1'}), (4, 5))]

This algorithm has time complexity O(N), where N is the number of persons. For comparison, the algorithm in Dani Mesejo’s answer has time complexity O(TN), where N is the number of persons and T the maximum time that any person is present, measured in number of instants.

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement