Tag: cluster-analysis

Agglomerate adjacent cells and their neighbours of the same type into clusters with Python

I’m trying to agglomerate adjacent cells (and their neighbours) that have the same type (an integer from 1 to 10) into new clusters by assigning them a cluster id. Currently, I use a variant of breadth-first search to go through all neighbours and their neighbours and then assign a cluster-id to all
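A minimal sketch of the breadth-first approach described in this question, assuming the cells sit on a rectangular grid and that "adjacent" means the four orthogonal neighbours (the excerpt does not specify the layout). The function name `label_clusters` and the sample grid are hypothetical.

```python
from collections import deque


def label_clusters(grid):
    """Give every cell a cluster id; touching cells of the same type share an id."""
    if not grid:
        return []
    rows, cols = len(grid), len(grid[0])
    labels = [[-1] * cols for _ in range(rows)]  # -1 marks "not yet visited"
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != -1:
                continue
            # Start a new cluster and flood outwards with a FIFO queue (BFS).
            labels[r][c] = next_id
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = cr + dr, cc + dc
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and labels[nr][nc] == -1 and grid[nr][nc] == grid[cr][cc]:
                        labels[nr][nc] = next_id
                        queue.append((nr, nc))
            next_id += 1
    return labels


# Hypothetical 3x3 grid of cell types.
grid = [
    [1, 1, 2],
    [3, 1, 2],
    [3, 3, 1],
]
labels = label_clusters(grid)
```

Each cell is enqueued at most once, so the run time is linear in the number of cells. The lone type-1 cell in the bottom-right corner gets its own id because it does not touch the other type-1 cells.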

How to get the centroids in DBSCAN sklearn?

cluster-analysis dbscan python scikit-learn

I am using DBSCAN for clustering. However, now I want to pick a point from each cluster that represents it, but I realized that DBSCAN does not have centroids as in k-means. However, I observed that DBSCAN has something called core points. I am wondering whether it is possible to use these core points or any other alternative to obtain
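One possible way to get a representative point per cluster, sketched under assumptions: take the mean of each cluster's members and pick the core sample closest to it. This is only one heuristic, not something DBSCAN defines, and the mean can be a poor summary for non-convex clusters. The helper name `representative_points` and the toy data are hypothetical; `core_sample_indices_` and `labels_` are attributes of a fitted scikit-learn `DBSCAN` estimator.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def representative_points(X, eps, min_samples):
    """Map each cluster label to the index of the core sample nearest the cluster mean."""
    db = DBSCAN(eps=eps, min_samples=min_samples).fit(X)
    core_idx = db.core_sample_indices_
    reps = {}
    for label in sorted(set(db.labels_) - {-1}):  # -1 is DBSCAN's noise label
        center = X[db.labels_ == label].mean(axis=0)
        cores = core_idx[db.labels_[core_idx] == label]
        distances = np.linalg.norm(X[cores] - center, axis=1)
        reps[int(label)] = int(cores[np.argmin(distances)])
    return reps


# Hypothetical data: two 3x3 grids of points, far apart from each other.
blob_a = [(x, y) for x in (0.0, 0.1, 0.2) for y in (0.0, 0.1, 0.2)]
blob_b = [(x, y) for x in (5.0, 5.1, 5.2) for y in (5.0, 5.1, 5.2)]
X = np.array(blob_a + blob_b)

reps = representative_points(X, eps=0.15, min_samples=3)
```

Here `reps` maps a cluster label to a row index of `X`, so `X[reps[0]]` is an actual data point rather than a synthetic centroid.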

Clustering images using unsupervised Machine Learning

cluster-analysis computer-vision k-means python unsupervised-learning

I have a database of images that contains identity cards, bills and passports. I want to classify these images into different groups (i.e., identity cards, bills and passports). From what I have read, one of the ways to do this task is clustering (since it is going to be unsupervised). The idea for me is like this: the clustering will

K-means clustering: top terms in cluster

cluster-analysis k-means python scikit-learn

I am using the Python KMeans clustering algorithm to cluster documents. I have created a term-document matrix. Then I applied KMeans clustering using the following code. My next task is to see the top terms in every cluster. Searching on Google suggested that many people have used km.cluster_centers_.argsort()[:, ::-1] for finding the top terms in the clusters using the
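To illustrate what the `argsort()[:, ::-1]` expression in this question does: `argsort()` orders each centroid's term weights from smallest to largest, and `[:, ::-1]` reverses every row, so the first columns hold the indices of the highest-weight terms. The sketch below uses a hand-written array in place of `km.cluster_centers_` and a hypothetical vocabulary. It assumes the columns of the matrix correspond to terms; with a scikit-learn vectorizer the vocabulary would typically come from `get_feature_names_out()`.

```python
import numpy as np

# Hypothetical vocabulary, one entry per column of the centroid matrix.
terms = np.array(["apple", "banana", "cpu", "gpu"])

# Stand-in for km.cluster_centers_: one row per cluster, one column per term.
centers = np.array([
    [0.9, 0.7, 0.0, 0.1],
    [0.0, 0.1, 0.8, 0.6],
])

# Term indices per cluster, sorted by descending centroid weight.
order = centers.argsort()[:, ::-1]

# Keep the two highest-weight terms for each cluster.
top_terms = [terms[row[:2]].tolist() for row in order]
```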

sklearn Clustering: Fastest way to determine optimal number of clusters on large data sets

bigdata cluster-analysis data-mining python scikit-learn

I use KMeans and the silhouette_score from sklearn in Python to compute my clusters, but on >10,000 samples with >1000 clusters, calculating the silhouette_score is very slow. Is there a faster method to determine the optimal number of clusters? Or should I change the clustering algorithm? If yes, which is the best (and fastest) algorithm for a data set with
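One option that may reduce the cost is the `sample_size` parameter of `silhouette_score`, which computes the score on a random subset of the samples instead of all of them. The sketch below is a toy setup with hypothetical data and candidate values of `k`. The sampled score is only an estimate, so results can vary with `random_state`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical data: three well-separated 2-D blobs of 200 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.2, size=(200, 2)) for loc in (0.0, 3.0, 6.0)])

scores = {}
for k in (2, 3, 4):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Score only 150 randomly chosen samples rather than all 600.
    scores[k] = silhouette_score(X, labels, sample_size=150, random_state=0)

best_k = max(scores, key=scores.get)
```

The full silhouette computation needs pairwise distances between all samples, so scoring a subset shrinks that cost considerably at the price of a noisier estimate.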

Python: DBSCAN in 3 dimensional space

cluster-analysis dbscan python

I have been searching around for an implementation of DBSCAN for 3-dimensional points without much luck. Does anyone know a library that handles this or have any experience with doing this? I am assuming that the DBSCAN algorithm can handle 3 dimensions, by having the e value be a radius metric and the distance between points measured by euclidean
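As a sketch of the assumption in this question: scikit-learn's `DBSCAN` accepts an array of shape `(n_samples, n_features)`, so 3-dimensional points can be passed directly, with `eps` acting as the neighbourhood radius under the chosen metric. The coordinates and parameter values below are hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical 3-D points: two tight groups and one isolated point.
points = np.array([
    [0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1],
    [5.0, 5.0, 5.0], [5.1, 5.0, 5.0], [5.0, 5.1, 5.0], [5.0, 5.0, 5.1],
    [9.0, 0.0, 4.0],
])

# eps is the neighbourhood radius; distances are Euclidean in all three axes.
labels = DBSCAN(eps=0.2, min_samples=3, metric="euclidean").fit_predict(points)
```

The isolated point receives the label `-1`, which is how DBSCAN marks noise.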