How to count the same rows between multiple CSV files in Pandas?

Question

I merged three NetFlow datasets stored as CSV files (D1, D2, D3) into one big dataset (df) and applied KMeans clustering to it. I did not use pd.concat for the merge because it raised a memory error, so I merged the files in the Linux terminal instead. All three datasets have the same column names; they have 12 columns (all numerical…

Accepted Answer

    cluster0_D1 = pd.merge(D1, cluster_0, how='inner')
    number_of_rows_D1 = len(cluster0_D1)

    cluster0_D2 = pd.merge(D2, cluster_0, how='inner')
    number_of_rows_D2 = len(cluster0_D2)

    cluster0_D3 = pd.merge(D3, cluster_0, how='inner')
    number_of_rows_D3 = len(cluster0_D3)

    print("How many samples belong to D1, D2, D3 for cluster_0?")
    print("D1: ", number_of_rows_D1)
    print("D2: ", number_of_rows_D2)
    print("D3: ", number_of_rows_D3)

I think this solved my problem.
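One caveat with the inner-merge approach: when no `on` argument is given, pd.merge joins on every column the two frames share, so a row that is repeated in both the source dataset and the cluster is multiplied in the result and the count is inflated. Below is a minimal sketch of one way around that, assuming it is acceptable to deduplicate the source before merging. The toy frames, the two-column layout, and the helper name `count_rows_from` are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Toy stand-ins for the real data (hypothetical values, 2 columns instead of 12).
D1 = pd.DataFrame({"a": [1, 2, 2], "b": [10, 20, 20]})
D2 = pd.DataFrame({"a": [3, 4], "b": [30, 40]})
D3 = pd.DataFrame({"a": [5], "b": [50]})

# Pretend these rows were assigned to cluster 0 by KMeans.
cluster_0 = pd.DataFrame({"a": [2, 2, 3, 5], "b": [20, 20, 30, 50]})


def count_rows_from(source, cluster):
    """Count the rows of `cluster` that also occur in `source`."""
    # Deduplicate the source so each cluster row matches at most once;
    # otherwise repeated rows are multiplied by the inner merge.
    unique_source = source.drop_duplicates()
    # With no `on` argument, merge joins on all columns the frames share.
    return len(pd.merge(cluster, unique_source, how="inner"))


counts = {
    name: count_rows_from(source, cluster_0)
    for name, source in [("D1", D1), ("D2", D2), ("D3", D3)]
}
print(counts)  # {'D1': 2, 'D2': 1, 'D3': 1}

# For comparison, the plain inner merge counts the repeated D1 row 2 x 2 times.
naive_D1 = len(pd.merge(D1, cluster_0, how="inner"))
print(naive_D1)  # 4
```

Note that a row that occurs in more than one source dataset is still counted once for each of them; if that matters, tagging every row with its source file before the files are merged is probably the more reliable route.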

Answer