How to count the same rows between multiple CSV files in Pandas?

I merged three different Netflow CSV datasets (D1, D2, D3) into one big dataset (df) and applied KMeans clustering to it. I did not use pd.concat to merge them because it raised a memory error, so I merged the files in the Linux terminal instead.

All of these datasets have the same 12 columns, and every column holds numerical values.

Example expected result:

cluster_0 has xxxx rows that are the same as rows in D1, xxxxx rows that are the same as rows in D2, and xxxxx rows that are the same as rows in D3.

Answer

I think this solved my problem.
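For reference, here is a minimal sketch of one way to get per-cluster counts per source file. It is not necessarily the approach used above. It assumes the files were appended in the order D1, D2, D3, so that the origin of each row can be rebuilt from the row count of each file (for example from `wc -l`, minus the header line). The function name `count_rows_per_cluster`, the sizes, and the labels are hypothetical; in practice the labels would come from the fitted model, e.g. `labels_` on a scikit-learn `KMeans` object.

```python
import numpy as np
import pandas as pd


def count_rows_per_cluster(labels, sizes):
    """Cross-tabulate cluster labels against the file each row came from.

    labels: one cluster label per row of the merged dataset.
    sizes:  dict mapping file name -> number of data rows, in merge order.
    """
    names = list(sizes)
    # Assumption: rows of the merged dataset keep the order D1, D2, D3,
    # so repeating each name by its row count recovers the source per row.
    source = np.repeat(names, [sizes[n] for n in names])
    if len(source) != len(labels):
        raise ValueError("row counts do not add up to the number of labels")
    return pd.crosstab(
        pd.Series(labels, name="cluster"),
        pd.Series(source, name="source"),
    )


# Hypothetical toy numbers: 3 + 2 + 4 rows and made-up cluster labels.
sizes = {"D1": 3, "D2": 2, "D3": 4}
labels = np.array([0, 0, 1, 1, 0, 2, 2, 0, 1])

counts = count_rows_per_cluster(labels, sizes)
print(counts)
```

Each row of the resulting table is a cluster and each column is a source file, so `counts.loc[0, "D1"]` is the number of cluster_0 rows that came from D1. Only the labels and the row counts are needed, so the three files do not have to be loaded into memory again.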

User contributions licensed under: CC BY-SA