Tag: cluster-analysis

How would I make clusters from a Levenshtein similarity matrix?

cluster-analysis levenshtein-distance nlp python similarity

I have a similarity matrix of words and would like to apply an algorithm that puts the words into clusters. Here’s the example I have so far. This is a very simple dummy example, but I would expect the output to be 2 clusters, one with ‘The Bachelor’, ‘The Bachelorett…
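
One possible way to approach this kind of problem is hierarchical clustering on pairwise edit distances. The sketch below is not the asker's code (which the excerpt omits): the word list is hypothetical, the `levenshtein` helper is a plain dynamic-programming implementation written for illustration, and average linkage with a fixed cluster count of 2 is an assumption. If you start from a similarity matrix instead, it would first need converting to distances (for a similarity normalized to [0, 1], `1 - similarity` is a common choice).

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def levenshtein(a, b):
    """Edit distance between strings a and b (illustrative helper)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


# Hypothetical words, chosen only to form two obvious groups.
words = ["kitten", "sitten", "mitten", "house", "mouse"]
n = len(words)

# Symmetric distance matrix with a zero diagonal.
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = levenshtein(words[i], words[j])

# linkage expects the condensed (upper-triangle) form of the matrix.
Z = linkage(squareform(dist), method="average")

# Cut the tree so that at most 2 clusters remain.
cluster_ids = fcluster(Z, t=2, criterion="maxclust")
```

`cluster_ids[i]` is the cluster label of `words[i]`. When the number of clusters is not known in advance, `criterion="distance"` with a distance threshold is an alternative to `maxclust`.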

Grouping all the rows with close timestamps in pandas dataframe

cluster-analysis dataframe pandas python

I have a DataFrame that contains frequencies recorded at specific times and places. I want to group all the rows that are just 2 seconds apart (for example, the 3 rows at index 5-7 have a time difference of just 2 seconds). Similarly, the rows at index 8-10 have the same difference, and I want to pl…
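
A common pandas idiom for this kind of grouping is to flag every row whose gap to the previous row exceeds the threshold and take a cumulative sum of those flags. The sketch below uses made-up data; the column names `time` and `freq` are assumptions, since the asker's DataFrame is not shown in the excerpt.

```python
import pandas as pd

# Hypothetical data: column names and values are assumptions.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2021-01-01 10:00:00", "2021-01-01 10:00:02", "2021-01-01 10:00:04",
        "2021-01-01 10:00:30", "2021-01-01 10:00:32", "2021-01-01 10:01:10",
    ]),
    "freq": [101.0, 102.5, 101.7, 98.2, 99.0, 105.3],
}).sort_values("time")

# True wherever the gap to the previous row is larger than 2 seconds.
# The first row has no predecessor (NaT), which compares as False.
new_group = df["time"].diff() > pd.Timedelta(seconds=2)

# Each True starts a new group, so the running total is a group id.
df["group"] = new_group.cumsum()
```

Rows chained by gaps of at most 2 seconds share a group id, so `df.groupby("group")` can then aggregate each run.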

Clustering different sets of points with different linear relationships to each other in Python

cluster-analysis grouping intercept linear-regression python

I need to cluster groups of points with the same linear relationship, as per the code and figure below. Obviously, I wouldn’t have the points that way; I would just have the following x and y. Note the following: the points respect linear relationships with high slope, they present a slight separation f…

How to dynamically change color of selected category using dropdown box?

cluster-analysis colors dropdownbox plotly-dash python

I am working on an app that takes in 2 inputs to update a scatterplot displaying the results of a cluster analysis. The first input filters the points on the graph through a time range slider. The second input is a dropdown box that is intended to highlight the color of a category of interest on the graph. Th…

Get number of Clusters (3D)

cluster-analysis machine-learning python

I have a question about clustering. When you use the k-nearest neighbour algorithm, you have to specify how many clusters you expect. My problem is that I have some runs where the number of clusters varies. I checked, and there are some methods to restrict how many clusters you…

How to count the same rows between multiple CSV files in Pandas?

cluster-analysis data-science netflow pandas python

I merged 3 different NetFlow CSV datasets (D1, D2, D3) into one big dataset (df) and applied KMeans clustering to it. To merge them I did not use pd.concat because of a memory error; I merged them in the Linux terminal instead. All these datasets contain the same column names, and they have 12 columns (all numerical…

Clustering on Python and Bokeh; select widget which allows user to change clustering algorithm

bokeh cluster-analysis numpy python scikit-learn

I am trying to build a feature in a Bokeh dashboard which allows the user to cluster data. I am using the “Clustering in Bokeh” example as a template. The example allows the user to cluster data. Within the code, you can specify which alg…

Getting the coordinates of elements in clusters without a loop in numpy

cluster-analysis ndimage numpy python scipy

I have a 2D array, where I label clusters using the ndimage.label() function like this: I can get the element counts, the centroids or the bounding box of the labeled clusters. But I would like to also get the coordinates of each element in clusters. Something like this (the data structure doesn’t have …
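
One way to get per-cluster coordinates without an explicit Python loop is to collect the coordinates of all labeled elements, sort them by label, and split the sorted array at the label boundaries. This is a sketch on a made-up array, not the asker's data, and `np.split` still iterates internally over the split points.

```python
import numpy as np
from scipy import ndimage

# Hypothetical input: two clusters under the default 4-connectivity.
arr = np.array([
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
])
labels, n = ndimage.label(arr)

# (row, col) of every labeled element, plus the label of each one.
mask = labels > 0
coords = np.argwhere(mask)
ids = labels[mask]

# Stable sort by label keeps row-major order inside each cluster.
order = np.argsort(ids, kind="stable")
sorted_coords = coords[order]

# Cluster sizes give the positions at which to split the sorted array.
sizes = np.bincount(ids)[1:]
per_cluster = np.split(sorted_coords, np.cumsum(sizes)[:-1])
```

`per_cluster[k]` is then an array of `(row, col)` pairs for the cluster labeled `k + 1`.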

KMeans clustering from all possible combinations of 2 columns not producing correct output

cluster-analysis k-means matplotlib pandas python

I have a 4 column dataframe which I extracted from the iris dataset. I use kmeans to plot 3 clusters from all possible combinations of 2 columns. However, there seems to be something wrong with the output, especially since the cluster centers are not placed at the center of the clusters. I have provided examp…

Grouping / clustering a list of numbers so that the min-max gap of each subset is always less than a cutoff in Python

algorithm cluster-analysis grouping python subset

Say I have a list of 50 random numbers. I want to group the numbers so that each subset has a min-max gap less than a cutoff of 0.05. I wrote some code and checked whether all subsets have min-max gaps less than the cutoff, but it is not working. Any suggestions? Answer: Following @j_random_hacker…
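
Since the excerpt cuts off before the answer, the following is only a sketch of one standard approach, not necessarily the one the answer describes: sort the numbers, then walk through them and start a new group whenever the next value would push the group's span to the cutoff or beyond. The function name `group_by_span` and the sample numbers are made up for illustration.

```python
def group_by_span(values, cutoff):
    """Greedily split values into groups whose max - min is below cutoff."""
    groups = []
    for v in sorted(values):
        # groups[-1][0] is the minimum of the current group, because the
        # values arrive in ascending order.
        if groups and v - groups[-1][0] < cutoff:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups


# Hypothetical sample data.
numbers = [0.30, 0.01, 0.07, 0.02, 0.31, 0.04, 0.10]
groups = group_by_span(numbers, cutoff=0.05)

# Check that every group respects the cutoff.
ok = all(max(g) - min(g) < 0.05 for g in groups)
```

Because each value is compared with the group's minimum rather than with its immediate neighbour, a long chain of closely spaced values cannot grow a group past the cutoff.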