We have a login system that tracks how long people are connected. I would like to write a code to find people who were online at the same time. Look at this example, please:
P1: [1,7]
P2: [2,5]
P3: [3,4]
P4: [6,8]
Think of these as intervals of Person 1 to 4. I want the output of the algorithm to be something like this:
P1, P2 : [2, 3]
P1, P2, P3 : [3, 4]
P1, P2 : [4, 5]
P1, P4 : [6,7]
I tried to solve the problem with two for loops so that we get a list of people whose intervals overlap, but the problem is dealing with intervals for more than one person. for example, in the above example, [3,4] not have to come in [4, 5] in line three because it calculated as a three-person interval.
Advertisement
Answer
Heavily inspired by Minimum number of platforms required for a railway station, but maintaining a set of persons present instead of a count of trains present:
Also an implementation of the algorithm hinted at in Yves Daoust’s answer.
The idea is to first build a chronological list of all events, where an event is either the arrival or departure of a person.
Then we iterate on the list of events, maintaining a set of persons present, adding the persons who arrive and removing the persons who leave.
Below, the chronological list of events is implemented as a python dict
mapping times to python list
s of events, so that events happening at the same time, if any, are naturally grouped together.
from operator import itemgetter
data = {
'P1': [1,7],
'P2': [2,5],
'P3': [3,4],
'P4': [6,8]
}
def get_overlaps(data):
events = {}
for person, (start, end) in data.items():
events.setdefault(start, []).append((person, set.add))
events.setdefault(end, []).append((person, set.remove))
# events = {1: [('P1', add)], 2: [('P2', add)], 3: [('P3', add)], 4: [('P3', remove)], 5: [('P2', remove)], 6: [('P4', add)], 7: [('P1', remove)], 8: [('P4', remove)]}
present_persons = set()
start_time = 0
for new_time, grouped_events in sorted(events.items(), key=itemgetter(0)):
if len(present_persons) >= 2:
yield (frozenset(present_persons), (start_time, new_time))
start_time = new_time
for new_person, add_or_remove in grouped_events:
add_or_remove(present_persons, new_person)
data = {
'P1': [1,7],
'P2': [2,5],
'P3': [3,4],
'P4': [6,8]
}
overlaps = list(get_overlaps(data))
print(overlaps)
# [(frozenset({'P2', 'P1'}), (2, 3)),
# (frozenset({'P2', 'P1', 'P3'}), (3, 4)),
# (frozenset({'P2', 'P1'}), (4, 5)),
# (frozenset({'P4', 'P1'}), (6, 7))]
data2 = { # three events at time 5: (P1,remove), (P2,remove), (P4,add)
'P1': [1,5],
'P2': [2,5],
'P3': [3,4],
'P4': [5,8]
}
overlaps = list(get_overlaps(data2))
print(overlaps)
# [(frozenset({'P2', 'P1'}), (2, 3)),
# (frozenset({'P2', 'P1', 'P3'}), (3, 4)),
# (frozenset({'P2', 'P1'}), (4, 5))]
This algorithm has time complexity O(N), where N is the number of persons. For comparison, the algorithm in Dani Mesejo’s answer has time complexity O(TN), where N is the number of persons and T the maximum time that any person is present, measured in number of instants.