Clustering different sets of points with different linear relationships to each other in Python

I need to cluster groups of points with the same linear relationship, as per the code and figure below.

```python
import numpy as np
import matplotlib.pyplot as plt

# Three bands of points with the same slope (3) but different intercepts,
# each with uniform noise in [0, 20) added to y
x  = np.linspace(0, 30, 100)
y1 = 3*x - 50 + 20*np.random.random(size=len(x))
y2 = 3*x - 20 + 20*np.random.random(size=len(x))
y3 = 3*x + 10 + 20*np.random.random(size=len(x))

plt.plot(x, y1, 'o')
plt.plot(x, y2, 'o')
plt.plot(x, y3, 'o')
plt.xlim([-50, 125])
plt.ylim([-50, 125])
plt.show()
```

In practice, I wouldn't have the points already split into groups like this; I would only have the combined x and y:

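```python
# Combine the three groups into the unlabeled data I would actually have
x_final = np.concatenate((x, x, x))
y_final = np.concatenate((y1, y2, y3))

plt.plot(x_final, y_final, 'o')
plt.xlim([-50, 125])
plt.ylim([-50, 125])
plt.show()
```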

Note the following: the points follow linear relationships with a steep slope, the groups are only slightly separated from each other, and all groups share the same slope, differing only in their intercepts.
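Written out, the structure described above reduces the problem to one dimension. This is a sketch with notation introduced here for illustration (not from the original post): point i belongs to an unknown group k(i), every group shares one slope m, and ε_i is the noise on y, assumed not to depend on x.

```latex
\begin{aligned}
y_i &= m\,x_i + b_{k(i)} + \varepsilon_i, \\
r_i &= y_i - m\,x_i = b_{k(i)} + \varepsilon_i .
\end{aligned}
```

Each r_i depends only on its group's intercept and the noise, so clustering the r_i values (a one-dimensional problem) separates the groups. In the example, m = 3 and the intercepts are -50, -20 and 10, each shifted up by uniform noise in [0, 20), so the r_i fall into the disjoint intervals [-50, -30), [-20, 0) and [10, 30). When m is unknown it has to be estimated first, for example with a least-squares fit over all points; that pooled fit recovers m (up to noise) when every group covers the same range of x, as in the example, but may be biased otherwise.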

How would you suggest I cluster these points? I thought about applying PCA and clustering the principal components with k-means, but I don't know whether there is a more efficient approach. In my real data I have more than three clusters, and the distances between them vary, even though they all share the same slope.
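One caveat about the PCA idea, sketched under assumptions introduced here: a shared slope m, with x, the group intercept b and the noise ε mutually independent, having variances σ_x², σ_b² and σ_ε². Under these assumptions the first principal component is not exactly the line direction (1, m):

```latex
\Sigma = \operatorname{Cov}\begin{pmatrix} x \\ y \end{pmatrix}
= \begin{pmatrix} \sigma_x^2 & m\,\sigma_x^2 \\
                  m\,\sigma_x^2 & m^2\sigma_x^2 + \sigma_b^2 + \sigma_\varepsilon^2 \end{pmatrix},
\qquad
\Sigma \begin{pmatrix} 1 \\ m \end{pmatrix}
= (1 + m^2)\,\sigma_x^2 \begin{pmatrix} 1 \\ m \end{pmatrix}
+ \begin{pmatrix} 0 \\ m\,(\sigma_b^2 + \sigma_\varepsilon^2) \end{pmatrix}.
```

Unless m(σ_b² + σ_ε²) is negligible, (1, m) is not an eigenvector, so the first component tilts toward the y-axis (for m > 0) and the second component mixes position along the lines into its score. The spread of intercepts, which is exactly what separates the groups, causes the tilt. With the example's numbers (σ_x² ≈ 76.5 for x evenly spaced over [0, 30], m = 3, σ_b² = 600, σ_ε² ≈ 33.3), the first component has a slope of about 5.6 rather than 3, so k-means on the second component is likely to mix neighboring groups. Projecting onto the direction perpendicular to the known slope, i.e. using y - m x, avoids this.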

Answer

Take a look at the clustering algorithms scikit-learn offers.

Spectral clustering and Gaussian mixture models should work for your use case.
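The answer does not say which features to feed these models. A minimal sketch of the Gaussian mixture option (spectral clustering is not shown), assuming the shared slope can be removed first as the question suggests: estimate the slope with a pooled least-squares fit, subtract it, and fit a one-dimensional `sklearn.mixture.GaussianMixture` to the residuals. The seed, the number of components (3) and names such as `residual` and `true_group` are assumptions for illustration only.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Example data from the question, seeded so the run is repeatable.
rng = np.random.default_rng(0)
x = np.linspace(0, 30, 100)
x_final = np.concatenate((x, x, x))
y_final = np.concatenate(
    [3 * x + b + 20 * rng.random(len(x)) for b in (-50, -20, 10)]
)
true_group = np.repeat([0, 1, 2], len(x))  # only used to check the result

# Assumption: all groups share one slope. A pooled least-squares fit estimates
# it here because every group covers the same range of x.
slope, _ = np.polyfit(x_final, y_final, 1)

# Remove the shared slope so each value is (roughly) the point's own intercept.
residual = (y_final - slope * x_final).reshape(-1, 1)

# One Gaussian per group; each component gets its own variance, so groups
# with different spreads are allowed. n_components is assumed known (3 here).
gmm = GaussianMixture(n_components=3, n_init=5, random_state=0)
labels = gmm.fit_predict(residual)
```

If the number of groups is not known in advance, one option is to fit several values of `n_components` and compare `gmm.bic(residual)` across the fits, preferring the lowest value.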
