Using cophenetic distance to choose best linkage method?

Question

I have the dataset that generates the following code. The case is that I would like to make a dendrogram (bottom-up) in Python and I must select a linkage criterion. If you consult the documentation of the function you can see the existing methods. https://docs.scipy.org/doc/scipy/reference/generated/scipy.cl…

Accepted Answer

There is no direct way to know which linkage is best. However, by looking at spread of data we can best guess. For your case, single linkage will produce best result. Single linkage works best if cluster is in form of a chain. Complete linkage is more appropriate for data with globules/spherical clusters.If your data has categorical variables, then average/centroid/ward may not work properly. Single/Complete linkage is better for data with categorical variables.from sklearn.cluster import AgglomerativeClusteringfig, ax = plt.subplots(1,4,figsize=(20,5))link =['single','complete','average','ward']for i in range(4):    model = AgglomerativeClustering(n_clusters=2, linkage=link[i])    labels = model.fit_predict(X_moons)    ax[i].scatter(X_moons[:,0],X_moons[:,1], c=labels)    ax[i].set_title(link[i])fig.show()Further Reading: https://www.youtube.com/watch?v=VMyXc3SiEqs

Advertisement

Answer