I have a similarity matrix of words and would like to apply an algorithm that can put the words into clusters. Here’s the example I have so far: Obviously this is a very simple dummy example, but the output I would expect is 2 clusters: one with ‘The Bachelor’, ‘The Bachelorette’, ‘The Bachelor Special’, and the other with ‘SportsCenter’, ‘SportsCenter
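One simple way to get those two groups, sketched here under stated assumptions: treat every pair of words whose similarity is at or above a threshold as linked, and take the connected components. This is single-linkage clustering cut at that threshold. The fifth title, the similarity values, and the 0.5 threshold below are made up for illustration, and `cluster_by_similarity` is a hypothetical helper, not a library function.

```python
def cluster_by_similarity(words, sim, threshold=0.5):
    """Group words linked (directly or transitively) by similarity >= threshold."""
    parent = list(range(len(words)))  # union-find forest, one node per word

    def find(i):
        # Walk up to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(words)):
        for j in range(i + 1, len(words)):
            if sim[i][j] >= threshold:
                parent[find(i)] = find(j)  # merge the two components

    clusters = {}
    for i, word in enumerate(words):
        clusters.setdefault(find(i), []).append(word)
    return list(clusters.values())


# Hypothetical data: the last title and all similarity values are invented.
words = ["The Bachelor", "The Bachelorette", "The Bachelor Special",
         "SportsCenter", "SportsCenter Replay"]
sim = [
    [1.0, 0.9, 0.8, 0.1, 0.1],
    [0.9, 1.0, 0.7, 0.1, 0.0],
    [0.8, 0.7, 1.0, 0.2, 0.1],
    [0.1, 0.1, 0.2, 1.0, 0.9],
    [0.1, 0.0, 0.1, 0.9, 1.0],
]
print(cluster_by_similarity(words, sim))
# [['The Bachelor', 'The Bachelorette', 'The Bachelor Special'], ['SportsCenter', 'SportsCenter Replay']]
```

If SciPy is available, `scipy.cluster.hierarchy.linkage` on the condensed form of `1 - sim` (via `scipy.spatial.distance.squareform`) followed by `fcluster` follows the same idea and also offers average or complete linkage.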
Tag: cluster-analysis
Grouping all the rows with close timestamps in pandas dataframe
I have a df that looks like this; it contains frequencies recorded at specific times and places. I want to group all the rows that are just 2 seconds apart (for example, the 3 rows at index 5-7 have a time difference of just 2 seconds). Similarly, the rows at index 8-10 have the same difference, and I want to place
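A common pandas idiom for this, assuming a datetime column (called `time` here, a hypothetical name since the real frame is not shown): sort by time, flag every row whose gap to the previous row exceeds 2 seconds, and take the cumulative sum of those flags as a group id. Rows exactly 2 seconds apart stay in the same group; change `>` to `>=` if they should not.

```python
import pandas as pd

# Hypothetical sample data; column names are assumptions.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2021-01-01 10:00:00", "2021-01-01 10:00:30",
        "2021-01-01 10:01:00", "2021-01-01 10:01:02", "2021-01-01 10:01:04",
        "2021-01-01 10:02:00",
    ]),
    "freq": [100, 101, 102, 103, 104, 105],
})

df = df.sort_values("time")
# True where this row is more than 2 s after the previous one (first row: NaT -> False).
starts_new_group = df["time"].diff() > pd.Timedelta(seconds=2)
df["group"] = starts_new_group.cumsum()

# One summary row per group.
summary = df.groupby("group").agg(
    start=("time", "first"),
    end=("time", "last"),
    mean_freq=("freq", "mean"),
)
print(df)
print(summary)
```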
Clustering different sets of points with different linear relationships to each other in Python
I need to cluster groups of points that share the same linear relationship, as in the code and figure below. Of course, I wouldn’t have the points grouped like that; I would only have the following x and y. Note the following: the points follow linear relationships with a high slope, they are slightly separated from each other, and they all have the
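If all lines share one slope that is known or can be estimated (an assumption, since the excerpt is cut off), each point can be mapped to its intercept `b = y - slope * x`. Points on the same line then have nearly the same `b`, and splitting the sorted intercepts wherever consecutive values differ by more than a gap separates the lines. `group_by_intercept`, the slope of 10, and the gap of 2 are hypothetical. If the slopes differ between lines, a sequential robust fit (for example scikit-learn's `RANSACRegressor`, removing the inliers after each fit) is a more general alternative.

```python
import numpy as np

def group_by_intercept(x, y, slope, gap):
    """Label points by line, ASSUMING every line shares the given slope."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    intercept = y - slope * x               # b in y = slope * x + b
    order = np.argsort(intercept)
    # A new group starts wherever consecutive sorted intercepts jump by more than `gap`.
    breaks = np.diff(intercept[order]) > gap
    sorted_labels = np.concatenate([[0], np.cumsum(breaks)])
    labels = np.empty(len(x), dtype=int)
    labels[order] = sorted_labels           # undo the sort
    return labels

# Synthetic data: three steep parallel lines with a little noise.
rng = np.random.default_rng(0)
x = np.tile(np.linspace(0.0, 1.0, 5), 3)
true_b = np.repeat([0.0, 5.0, 10.0], 5)
y = 10.0 * x + true_b + rng.normal(scale=0.1, size=x.size)
labels = group_by_intercept(x, y, slope=10.0, gap=2.0)
print(labels)
```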
How to dynamically change color of selected category using dropdown box?
I am working on an app that takes in 2 inputs to update a scatterplot displaying the results of a cluster analysis. The first input filters the points on the graph through a time range slider. The second input is a dropdown box that is intended to highlight the color of a category of interest on the graph. The categories
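Whatever the framework (the excerpt does not say whether it is Dash, Bokeh, or something else), one approach is for the callback to recompute a per-point color column from both inputs whenever either one changes. A framework-agnostic sketch in pandas; `prepare_points` and the column names `time`, `x`, `y`, and `category` are hypothetical.

```python
import numpy as np
import pandas as pd

def prepare_points(df, start, end, selected, highlight="crimson", base="lightgray"):
    """Keep rows with start <= time <= end; color the selected category, grey out the rest."""
    view = df[(df["time"] >= start) & (df["time"] <= end)].copy()
    view["color"] = np.where(view["category"] == selected, highlight, base)
    return view

# Hypothetical data standing in for the cluster-analysis results.
df = pd.DataFrame({
    "time": [1, 2, 3, 4, 5],
    "x": [0.1, 0.4, 0.3, 0.9, 0.7],
    "y": [1.0, 0.2, 0.5, 0.8, 0.3],
    "category": ["A", "B", "A", "C", "B"],
})
view = prepare_points(df, start=2, end=5, selected="B")
print(view)
```

Inside the callback, the returned frame's `x`, `y`, and `color` columns would then be pushed to the plot, for example into a Bokeh `ColumnDataSource` or as a Plotly figure's marker colors.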
Get number of clusters (3D)
I have a question about clustering. When you’re using the k-means algorithm, you have to specify how many clusters you’re expecting. My problem is that in some of my runs the number of clusters varies. I checked, and there are some methods for determining how many clusters there are, but these algorithms work for two-dimensional problems.
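The silhouette score is not limited to two dimensions: it only needs distances between points, so it works the same way on 3-D data. A sketch, assuming scikit-learn is available, that tries several values of k with k-means and keeps the one with the highest silhouette score; the data are synthetic and `best_k` is a hypothetical helper. Density-based methods such as DBSCAN are another option, since they do not take a cluster count at all.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_k(X, k_range=range(2, 8)):
    """Return the k with the highest silhouette score, plus all scores."""
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get), scores

# Synthetic 3-D data: three well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0], [10, 10, 10], [0, 10, -10]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 3)) for c in centers])

k, scores = best_k(X)
print(k, scores)
```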
How to count the same rows between multiple CSV files in Pandas?
I merged 3 different CSV Netflow datasets (D1, D2, D3) into one big dataset (df) and applied KMeans clustering to it. I did not use pd.concat to merge them because of a memory error; I merged them in the Linux terminal instead. All these datasets have the same 12 columns, all numerical. Example expected result: cluster_0 has xxxx identical rows
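Reading “identical rows” as feature rows that occur in more than one of D1, D2 and D3 (an interpretation, since the excerpt is cut off), one approach that avoids another `pd.concat` is to tag each row of the merged frame with its source file from the known row counts. This assumes the terminal merge kept the files in the order D1, D2, D3 and dropped the repeated headers. `add_source`, `shared_rows_per_cluster`, and the two-column sample data are hypothetical; the real frame has 12 feature columns.

```python
import numpy as np
import pandas as pd

def add_source(df, sizes):
    """sizes: {"D1": n1, "D2": n2, ...} in the order the files were merged."""
    df = df.copy()
    df["source"] = np.repeat(list(sizes), list(sizes.values()))
    return df

def shared_rows_per_cluster(df, feature_cols):
    """Per cluster, count rows whose feature values also occur in another source file."""
    # Number of distinct source files in which each exact feature row occurs.
    n_sources = df.groupby(feature_cols)["source"].transform("nunique")
    counts = df[n_sources > 1].groupby("cluster").size()
    # Report clusters without shared rows as 0 instead of dropping them.
    return counts.reindex(sorted(df["cluster"].unique()), fill_value=0)

# Hypothetical merged data; "cluster" would come from kmeans.labels_.
df = pd.DataFrame({
    "a": [1, 1, 2, 3, 1, 2],
    "b": [5, 5, 6, 7, 5, 9],
    "cluster": [0, 0, 1, 1, 0, 1],
})
df = add_source(df, {"D1": 2, "D2": 2, "D3": 2})
print(shared_rows_per_cluster(df, ["a", "b"]))
```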
Clustering in Python and Bokeh: Select widget that allows the user to change the clustering algorithm
I am trying to build a feature in a Bokeh dashboard that allows the user to cluster data. I am using the following example as a template: Clustering in Bokeh example. Here is the code from this example: The example allows the user to cluster data. Within the code, you can specify which algorithm to use;
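One way to make the algorithm selectable is a dictionary that maps each option of the Select widget to a function building a scikit-learn estimator, so the widget callback only has to look up the chosen name and refit. The option names, `ALGORITHMS`, and `run_clustering` are hypothetical. The Bokeh wiring is left as comments because it needs a running Bokeh server; it uses `Select` and `on_change`, with `source` and `palette` as placeholders for objects from the dashboard.

```python
import numpy as np
from sklearn import cluster

# Hypothetical option names -> estimator factories (k = number of clusters).
ALGORITHMS = {
    "KMeans": lambda k: cluster.KMeans(n_clusters=k, n_init=10, random_state=0),
    "Agglomerative": lambda k: cluster.AgglomerativeClustering(n_clusters=k),
    "Birch": lambda k: cluster.Birch(n_clusters=k),
}

def run_clustering(name, X, k=3):
    """Fit the algorithm chosen in the widget and return one label per point."""
    return ALGORITHMS[name](k).fit_predict(X)

# Synthetic 2-D data with three well-separated groups.
rng = np.random.default_rng(0)
X = np.vstack([np.array(c) + rng.normal(scale=0.3, size=(40, 2))
               for c in ([0, 0], [5, 5], [0, 5])])

# Bokeh wiring (run with `bokeh serve`); `source` and `palette` are placeholders:
# from bokeh.models import Select
# select = Select(title="Algorithm", value="KMeans", options=list(ALGORITHMS))
# def update(attr, old, new):
#     source.data["color"] = [palette[i] for i in run_clustering(new, X)]
# select.on_change("value", update)
```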
Getting the coordinates of elements in clusters without a loop in numpy
I have a 2D array, where I label clusters using the ndimage.label() function like this: I can get the element counts, the centroids or the bounding box of the labeled clusters. But I would like to also get the coordinates of each element in clusters. Something like this (the data structure doesn’t have to be like this, any data structure
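A loop-free sketch in NumPy: take the coordinates of all labeled elements with `np.argwhere`, sort them by label, and split the sorted array at the cumulative label counts. `cluster_coordinates` is a hypothetical helper, and the small label array stands in for the output of `ndimage.label()`. Recent SciPy releases (1.10 and later) also provide `scipy.ndimage.value_indices`, which returns the indices for each label directly.

```python
import numpy as np

def cluster_coordinates(labeled):
    """Map each label (1..N) to an (n_i, ndim) array of its element coordinates."""
    mask = labeled > 0
    coords = np.argwhere(mask)            # row-major order
    labels = labeled[mask]                # same row-major order as coords
    order = np.argsort(labels, kind="stable")
    counts = np.bincount(labels)[1:]      # number of elements per label 1..N
    groups = np.split(coords[order], np.cumsum(counts)[:-1])
    return dict(zip(range(1, len(counts) + 1), groups))

# Stand-in for `labeled, n = ndimage.label(arr)`.
labeled = np.array([
    [1, 1, 0],
    [0, 0, 2],
    [0, 2, 2],
])
for label, pts in cluster_coordinates(labeled).items():
    print(label, pts.tolist())
# 1 [[0, 0], [0, 1]]
# 2 [[1, 2], [2, 1], [2, 2]]
```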
KMeans clustering from all possible combinations of 2 columns not producing correct output
I have a 4-column DataFrame which I extracted from the iris dataset. I use k-means to plot 3 clusters for every possible combination of 2 columns. However, something seems to be wrong with the output, especially since the cluster centers are not placed at the centers of the clusters. I have provided examples of the output. Only cluster_1
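Without the original code this is only a guess, but a frequent cause of misplaced centers is fitting KMeans on all four columns (or on a different pair) and then plotting two coordinates of `cluster_centers_` against a pair they were not fitted on. A sketch that refits on each pair, so the centers live in the same 2-D space as the plotted points; the plotting calls are left as comments, and `results` is a hypothetical container.

```python
from itertools import combinations

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data                      # shape (150, 4)
names = iris.feature_names

results = {}
for i, j in combinations(range(X.shape[1]), 2):
    pair = X[:, [i, j]]            # exactly the two columns that will be plotted
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(pair)
    # cluster_centers_ has shape (3, 2): the same space as `pair`.
    results[(names[i], names[j])] = (km.labels_, km.cluster_centers_)
    # plt.scatter(pair[:, 0], pair[:, 1], c=km.labels_)
    # plt.scatter(km.cluster_centers_[:, 0], km.cluster_centers_[:, 1], marker="x")
```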
Grouping / clustering a list of numbers so that the min-max gap of each subset is always less than a cutoff in Python
Say I have a list of 50 random numbers. I want to group the numbers so that each subset has a min-max gap less than a cutoff of 0.05. Below is my code. Check if all subsets have min-max gaps less than the cutoff: Output: Obviously my code is not working. Any suggestions?
Answer
Following @j_random_hacker’s answer, I simply
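For one-dimensional data a sort-then-sweep approach works: sort the numbers, keep adding to the current group while the new number is less than `cutoff` above the group's smallest value, and start a new group otherwise. Because the input is sorted, the first element of each group is its minimum and the newest element is its maximum, so the check is exactly the min-max condition, and this greedy sweep produces the fewest possible groups. `group_within_cutoff` is a hypothetical name, and this is not necessarily the approach from the answer the question cites.

```python
import random

def group_within_cutoff(numbers, cutoff):
    """Split numbers into groups whose max - min is strictly below cutoff."""
    groups = []
    for x in sorted(numbers):
        # groups[-1][0] is the current group's minimum because the input is sorted.
        if groups and x - groups[-1][0] < cutoff:
            groups[-1].append(x)
        else:
            groups.append([x])
    return groups

random.seed(0)
numbers = [random.random() for _ in range(50)]
groups = group_within_cutoff(numbers, 0.05)
print(len(groups), all(max(g) - min(g) < 0.05 for g in groups))
```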