
Tag: data-mining

Verify that a column name is a unique identifier

I have a dataset called df_authors with a column called author. I have to verify that df_authors.author is a unique identifier. I tried len(df_authors) == len(df_authors['author'].unique()), and this returns True. My question is whether I have done this right. I found this line of code online and am not 100% sure if it does what
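As a minimal sketch of the check described above: comparing the row count with the number of distinct values does test uniqueness, and pandas exposes the same test as `Series.is_unique`. The helper name `is_unique_identifier`, the sample frames, and the extra rule that an identifier must not contain missing values are assumptions made for illustration, not part of the question.

```python
import pandas as pd


def is_unique_identifier(df: pd.DataFrame, column: str) -> bool:
    """Return True if `column` has no missing values and no repeated values.

    Hypothetical helper; rejecting missing values is an assumption
    about what "identifier" should mean here.
    """
    values = df[column]
    return bool(values.notna().all() and values.is_unique)


# Made-up sample data, for illustration only.
df_authors = pd.DataFrame({"author": ["Ann", "Bob", "Cy"], "papers": [3, 1, 2]})
df_dupes = pd.DataFrame({"author": ["Ann", "Bob", "Ann"], "papers": [3, 1, 2]})

unique_ok = is_unique_identifier(df_authors, "author")
unique_bad = is_unique_identifier(df_dupes, "author")

# The length comparison from the question, applied to the same data.
same_as_len_check = len(df_authors) == len(df_authors["author"].unique())

# List the offending rows when the check fails.
repeated = df_dupes[df_dupes["author"].duplicated(keep=False)]
```

When the check fails, `duplicated(keep=False)` marks every row that shares its value with another row, which makes the duplicates easy to inspect.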

sklearn Clustering: Fastest way to determine the optimal number of clusters on large data sets

I use KMeans and silhouette_score from sklearn in Python to compute my clusters, but with more than 10,000 samples and more than 1,000 clusters, calculating the silhouette_score is very slow. Is there a faster method to determine the optimal number of clusters? Or should I change the clustering algorithm? If yes, which is the best (and fastest) algorithm for a data set with
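One possible way to cut the cost, sketched under assumptions: `silhouette_score` accepts a `sample_size` argument that scores a random subset of the points rather than all of them, and `MiniBatchKMeans` is a faster approximate variant of KMeans. The synthetic data, the candidate values of k, and the sample size below are illustrative choices, not taken from the question, and this is not claimed to be the best approach for every data set.

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in data (assumption); replace with the real feature matrix.
X, _ = make_blobs(n_samples=3000, centers=5, cluster_std=0.5, random_state=0)

scores = {}
for k in (3, 5, 8):  # illustrative candidate cluster counts
    # MiniBatchKMeans fits on small random batches, trading some accuracy for speed.
    labels = MiniBatchKMeans(n_clusters=k, n_init=3, random_state=0).fit_predict(X)
    # sample_size scores the silhouette on a random subset of 1000 points,
    # avoiding pairwise distances over the full data set.
    scores[k] = silhouette_score(X, labels, sample_size=1000, random_state=0)

# Pick the candidate with the highest sampled silhouette.
best_k = max(scores, key=scores.get)
```

The sampled score is an estimate, so it can vary with `random_state`; repeating it with a few seeds gives a sense of how stable the ranking of candidates is.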
