We have a login system that tracks how long people are connected. I would like to write a code to find people who were online at the same time. Look at this example, please:
P1: [1,7] P2: [2,5] P3: [3,4] P4: [6,8]
Think of these as intervals of Person 1 to 4. I want the output of the algorithm to be something like this:
P1, P2 : [2, 3] P1, P2, P3 : [3, 4] P1, P2 : [4, 5] P1, P4 : [6,7]
I tried to solve the problem with two for loops so that we get a list of people whose intervals overlap, but the problem is dealing with intervals for more than one person. for example, in the above example, [3,4] not have to come in [4, 5] in line three because it calculated as a three-person interval.
Advertisement
Answer
Heavily inspired by Minimum number of platforms required for a railway station, but maintaining a set of persons present instead of a count of trains present:
Also an implementation of the algorithm hinted at in Yves Daoust’s answer.
The idea is to first build a chronological list of all events, where an event is either the arrival or departure of a person.
Then we iterate on the list of events, maintaining a set of persons present, adding the persons who arrive and removing the persons who leave.
Below, the chronological list of events is implemented as a python dict
mapping times to python list
s of events, so that events happening at the same time, if any, are naturally grouped together.
from operator import itemgetter data = { 'P1': [1,7], 'P2': [2,5], 'P3': [3,4], 'P4': [6,8] } def get_overlaps(data): events = {} for person, (start, end) in data.items(): events.setdefault(start, []).append((person, set.add)) events.setdefault(end, []).append((person, set.remove)) # events = {1: [('P1', add)], 2: [('P2', add)], 3: [('P3', add)], 4: [('P3', remove)], 5: [('P2', remove)], 6: [('P4', add)], 7: [('P1', remove)], 8: [('P4', remove)]} present_persons = set() start_time = 0 for new_time, grouped_events in sorted(events.items(), key=itemgetter(0)): if len(present_persons) >= 2: yield (frozenset(present_persons), (start_time, new_time)) start_time = new_time for new_person, add_or_remove in grouped_events: add_or_remove(present_persons, new_person) data = { 'P1': [1,7], 'P2': [2,5], 'P3': [3,4], 'P4': [6,8] } overlaps = list(get_overlaps(data)) print(overlaps) # [(frozenset({'P2', 'P1'}), (2, 3)), # (frozenset({'P2', 'P1', 'P3'}), (3, 4)), # (frozenset({'P2', 'P1'}), (4, 5)), # (frozenset({'P4', 'P1'}), (6, 7))] data2 = { # three events at time 5: (P1,remove), (P2,remove), (P4,add) 'P1': [1,5], 'P2': [2,5], 'P3': [3,4], 'P4': [5,8] } overlaps = list(get_overlaps(data2)) print(overlaps) # [(frozenset({'P2', 'P1'}), (2, 3)), # (frozenset({'P2', 'P1', 'P3'}), (3, 4)), # (frozenset({'P2', 'P1'}), (4, 5))]
This algorithm has time complexity O(N), where N is the number of persons. For comparison, the algorithm in Dani Mesejo’s answer has time complexity O(TN), where N is the number of persons and T the maximum time that any person is present, measured in number of instants.