How to group rows of a pandas data frame into named clusters?

Question

I am trying to group the rows of the following pandas data frame into clusters and give each cluster a name. For example, "Personal Info" is a cluster name; it consists of PERSON, LOCATION, PHONE_NUMBER, EMAIL_ADDRESS, PASSPORT, SSN, and DRIVER_LICENSE, and its value should be the sum of their Counts, which is 460.

For reference, I am providing the cluster structure, the input data, and the expected output.

Accepted Answer

You can create an inverse dictionary and map:

    d = {'personal_info': ['PERSON', 'LOCATION', 'PHONE_NUMBER', 'EMAIL_ADDRESS', 'PASSPORT', 'SSN', 'DRIVER_LICENSE'],
         'finance': ['CREDIT_CARD', 'BANK_NUMBER', 'ITIN', 'IBAN_CODE'],
         'info': ['NHS'],
         'network': ['IP_ADDRESS', 'DOMAIN_NAME'],
         'others': ['CRYPTO', 'DATE_TIME', 'NRP']
         }

    # invert the mapping: each PII label points to its cluster name
    d_inv = {x: k for k, v in d.items() for x in v}

    (df['Counts'].groupby(df['PII'].map(d_inv)).sum()
       .rename_axis('Cluster names')       # rename to match output
       .reset_index(name='Total count'))

Output:

       Cluster names  Total count
    0        finance           97
    1           info            0
    2        network          140
    3         others           86
    4  personal_info          460
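The inverse-dictionary approach can be tried end to end on a small frame. The sketch below is only an illustration: the question's real input data is not reproduced here, so the rows, the counts, the shortened cluster dictionary, and the UNKNOWN_TYPE label are all made up. It also shows one possible way to handle labels that belong to no cluster, which map() turns into NaN and groupby would otherwise drop silently; the "unmapped" bucket name is an assumption, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data; labels and counts are invented for illustration.
df = pd.DataFrame({
    "PII": ["PERSON", "LOCATION", "CREDIT_CARD",
            "IP_ADDRESS", "DOMAIN_NAME", "UNKNOWN_TYPE"],
    "Counts": [10, 5, 7, 3, 2, 4],
})

# Shortened cluster definition: cluster name -> member labels.
clusters = {
    "personal_info": ["PERSON", "LOCATION"],
    "finance": ["CREDIT_CARD"],
    "network": ["IP_ADDRESS", "DOMAIN_NAME"],
}

# Invert it so each label points to its cluster name.
label_to_cluster = {
    label: name for name, labels in clusters.items() for label in labels
}

# Labels missing from the mapping become NaN after map(); fill them with
# an explicit bucket so their counts are not dropped by groupby.
cluster_of_row = df["PII"].map(label_to_cluster).fillna("unmapped")

result = (
    df["Counts"]
    .groupby(cluster_of_row)
    .sum()
    .rename_axis("Cluster names")
    .reset_index(name="Total count")
)
print(result)
```

Without the fillna step, the UNKNOWN_TYPE row would simply be missing from the result, which can hide gaps in the cluster dictionary.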
