
Grouping / clustering a list of numbers so that the min-max gap of each subset is always less than a cutoff in Python

Say I have a list of 50 random numbers. I want to group them so that, within each subset, the gap between the minimum and the maximum is less than a cutoff of 0.05. Below is my code.

import random

def cluster(data, cutoff):
    data.sort()
    res = []
    old_x = -10.
    for x in data:
        if abs(x - old_x) > cutoff:
            res.append([x])
        else:
            res[-1].append(x)
        old_x = x
    return res

cutoff = 0.05
data = [random.random() for _ in range(50)]
res = cluster(data, cutoff)

Check if all subsets have min-max gaps less than the cutoff:

print(all([(max(s) - min(s)) < cutoff for s in res]))

Output:

False

Obviously my code is not working. Any suggestions?


Answer

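Following @j_random_hacker’s answer, I simply changed my code so that old_x is updated only when a new subset is started. Each value is then compared with the first (smallest) element of the current subset rather than with its immediate predecessor: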

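def cluster(data, cutoff):
    data.sort()  # note: sorts the caller's list in place
    res = []
    # old_x holds the first (smallest) element of the current subset.
    # -inf guarantees the very first value always opens a subset.
    old_x = float("-inf")
    for x in data:
        # >= keeps every subset's max - min strictly below cutoff
        if abs(x - old_x) >= cutoff:
            res.append([x])
            old_x = x  # only move the anchor when a new subset starts
        else:
            res[-1].append(x)
    return res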

Now it works as expected:

>>> print(all([(max(s) - min(s)) < cutoff for s in res]))
True
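
To see why moving the old_x update matters, here is a small deterministic sketch. The function names cluster_chained and cluster_anchored and the four-value input are made up for illustration; they are not from the original post. The first function mimics the question's logic (compare with the previous value), the second the fixed logic (compare with the first value of the current subset).

```python
def cluster_chained(data, cutoff):
    """Hypothetical name: the question's logic, comparing with the previous value."""
    groups = []
    prev = float("-inf")  # forces the first value to open a group
    for x in sorted(data):
        if x - prev > cutoff:
            groups.append([x])
        else:
            groups[-1].append(x)
        prev = x  # updated on every step, so small gaps can chain together
    return groups


def cluster_anchored(data, cutoff):
    """Hypothetical name: the fixed logic, comparing with the group's first value."""
    groups = []
    anchor = float("-inf")
    for x in sorted(data):
        if x - anchor > cutoff:
            groups.append([x])
            anchor = x  # updated only when a new group starts
        else:
            groups[-1].append(x)
    return groups


cutoff = 0.05
data = [0.00, 0.04, 0.08, 0.12]  # neighbours are 0.04 apart, total spread is 0.12

chained = cluster_chained(data, cutoff)
anchored = cluster_anchored(data, cutoff)

print(chained)   # a single group whose spread exceeds the cutoff
print(anchored)  # two groups, each with a spread below the cutoff
print(all(max(g) - min(g) < cutoff for g in anchored))
```

With the chained version every neighbouring gap is below the cutoff, so all four values end up in one subset even though its overall spread is 0.12. Anchoring on the first element of each subset bounds the spread directly.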
User contributions licensed under: CC BY-SA