I’m trying to agglomerate adjacent cells (and their neighbours) that have the same type (an integer from 1 to 10) into new clusters by assigning them a cluster id. As visualised here for some of the clusters: Currently, I use an adaptation of breadth-first search to go through all neighbours and their neighbours and then assign a cluster id to all
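For illustration, the breadth-first approach described in this excerpt can be sketched as below. This is only a minimal sketch under assumptions that the excerpt does not confirm: the cells are taken to sit on a rectangular 2D grid (a list of lists of integer types), and "adjacent" is taken to mean the four edge-sharing neighbours. The name `label_clusters` is hypothetical.

```python
from collections import deque


def label_clusters(grid):
    """Give every cell a cluster id; 4-connected cells of equal type share an id.

    Assumes `grid` is a rectangular list of lists of integer cell types.
    """
    rows = len(grid)
    cols = len(grid[0]) if grid else 0
    labels = [[-1] * cols for _ in range(rows)]  # -1 means "not visited yet"
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != -1:
                continue
            # Start a new cluster and flood it with breadth-first search.
            labels[r][c] = next_id
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = cr + dr, cc + dc
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and labels[nr][nc] == -1 and grid[nr][nc] == grid[cr][cc]:
                        labels[nr][nc] = next_id
                        queue.append((nr, nc))
            next_id += 1
    return labels
```

Each cell is enqueued at most once, so the run time is linear in the number of cells. For an irregular mesh, the fixed neighbour offsets would be replaced by a lookup in an adjacency list.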
Tag: cluster-analysis
How to get the centroids in DBSCAN sklearn?
I am using DBSCAN for clustering. However, I now want to pick a point from each cluster that represents it, and I realized that DBSCAN does not have centroids as k-means does. I did observe that DBSCAN has something called core points. I am wondering whether it is possible to use these core points or any other alternative to obtain
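One possible way to get a representative per cluster, sketched here rather than offered as the accepted answer: take the core samples that scikit-learn exposes through `core_sample_indices_`, average them per cluster, and keep the core sample nearest to that average. The function name `dbscan_representatives` and the default parameter values are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def dbscan_representatives(X, eps=0.5, min_samples=5):
    """Return DBSCAN labels and, per cluster, the index of one representative core sample."""
    X = np.asarray(X, dtype=float)
    db = DBSCAN(eps=eps, min_samples=min_samples).fit(X)
    labels = db.labels_
    core_idx = db.core_sample_indices_
    representatives = {}
    for label in sorted(set(labels) - {-1}):  # -1 marks noise points
        members = core_idx[labels[core_idx] == label]
        centre = X[members].mean(axis=0)
        distances = np.linalg.norm(X[members] - centre, axis=1)
        # Keep the actual data point closest to the mean of the core samples.
        representatives[int(label)] = int(members[np.argmin(distances)])
    return labels, representatives
```

Because DBSCAN clusters can be non-convex, the mean of a cluster may fall outside it, which is why the sketch returns a real core sample instead of the mean itself.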
Clustering images using unsupervised Machine Learning
I have a database of images that contains identity cards, bills and passports. I want to classify these images into different groups (i.e., identity cards, bills and passports). From what I have read, one of the ways to do this task is clustering (since it is going to be unsupervised). The idea for me is like this: the clustering will
Kmean clustering top terms in cluster
I am using Python's k-means clustering algorithm to cluster documents. I have created a term-document matrix. Then I applied k-means clustering using the following code. My next task is to see the top terms in every cluster. Searching on Google suggested that many people have used km.cluster_centers_.argsort()[:, ::-1] for finding the top terms in the clusters using the
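To show what the `argsort()[:, ::-1]` idiom does, here is a small sketch that works on any array of cluster centres. The helper name `top_terms_per_cluster` and its arguments are hypothetical: `centers` stands in for `km.cluster_centers_`, and `terms` for the vocabulary in column order (with a scikit-learn vectorizer, presumably the result of `get_feature_names_out()`).

```python
import numpy as np


def top_terms_per_cluster(centers, terms, n_terms=10):
    """List the highest-weighted terms of each cluster centre, strongest first."""
    centers = np.asarray(centers)
    # argsort sorts ascending; reversing each row puts the largest weights first.
    order = np.argsort(centers, axis=1)[:, ::-1]
    return [[terms[i] for i in row[:n_terms]] for row in order]


# Toy centres over a three-word vocabulary (made-up numbers for illustration).
toy_centers = [[0.1, 0.9, 0.5],
               [0.7, 0.2, 0.0]]
toy_terms = ["apple", "banana", "cherry"]
print(top_terms_per_cluster(toy_centers, toy_terms, n_terms=2))
```

Each row of the centre matrix holds the average term weight of the documents in that cluster, so the largest entries point to the terms that characterise it.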
sklearn Clustering: Fastest way to determine optimal number of clusters on large data sets
I use KMeans and the silhouette_score from sklearn in Python to calculate my clusters, but on >10,000 samples with >1,000 clusters, calculating the silhouette_score is very slow. Is there a faster method to determine the optimal number of clusters? Or should I change the clustering algorithm? If yes, which is the best (and fastest) algorithm for a data set with
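One option that may reduce the cost, sketched here without any claim that it is the best answer: `silhouette_score` accepts a `sample_size` argument so that the score is estimated on a random subset, and `MiniBatchKMeans` can stand in for `KMeans` on large data. The function name `pick_k`, the candidate list and the default sample size are assumptions for the example.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics import silhouette_score


def pick_k(X, candidate_ks, sample_size=2000, random_state=0):
    """Score each candidate k with a subsampled silhouette and return the best one."""
    X = np.asarray(X, dtype=float)
    scores = {}
    for k in candidate_ks:
        model = MiniBatchKMeans(n_clusters=k, n_init=3, random_state=random_state)
        labels = model.fit_predict(X)
        # The silhouette is computed on a random subset, not on all pairs of samples.
        scores[k] = silhouette_score(
            X, labels,
            sample_size=min(sample_size, len(X)),
            random_state=random_state,
        )
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

The subsampled score is an estimate, so with very many clusters the sample needs to be large enough that most clusters are represented by more than one point.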
Python: DBSCAN in 3 dimensional space
I have been searching around for an implementation of DBSCAN for 3-dimensional points without much luck. Does anyone know a library that handles this or have any experience with doing this? I am assuming that the DBSCAN algorithm can handle 3 dimensions, by having the epsilon value be a radius metric and the distance between points measured by Euclidean
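As a hedged illustration of the assumption in this excerpt: scikit-learn's `DBSCAN` takes an array of shape (n_samples, n_features), so 3D points are passed as three columns, `eps` acts as the neighbourhood radius, and the default metric is Euclidean. The wrapper name `cluster_3d` and the parameter values are made up for the example.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def cluster_3d(points, eps=1.0, min_samples=5):
    """Run DBSCAN on an (n, 3) array of x, y, z coordinates and return the labels."""
    points = np.asarray(points, dtype=float)
    if points.ndim != 2 or points.shape[1] != 3:
        raise ValueError("expected an array of shape (n_samples, 3)")
    # eps is the radius of the neighbourhood ball around each point.
    return DBSCAN(eps=eps, min_samples=min_samples, metric="euclidean").fit_predict(points)


# Two synthetic blobs in 3D space (generated data, for illustration only).
rng = np.random.default_rng(0)
demo_points = np.vstack([
    rng.normal((0, 0, 0), 0.2, (50, 3)),
    rng.normal((5, 5, 5), 0.2, (50, 3)),
])
demo_labels = cluster_3d(demo_points)
```

Points labelled -1 are treated as noise; a suitable `eps` depends on the scale and density of the data.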