Creating Dataframes for different clusters

Question

I have a dataset. Using this dataset, I clustered the dataset based on the number of times "System" is repeated for a particular "Name". In the above example, Names A, B and D have one "AZ" "Subset" while C, E have two "AY" subsets and F has two AZ so…

Accepted Answer

You can adapt my previous answer:

Getting the clusters:

    clusters = (df.groupby(['Name', 'System'])
       ['System'].agg(Cluster=lambda x: (x.iloc[0], len(x)))
       .droplevel('System').reset_index()
       .groupby('Cluster')['Name'].agg(frozenset)
       .reset_index())

    #    Cluster       Name
    # 0  (AY, 2)     (C, E)
    # 1  (AZ, 1)  (A, B, D)
    # 2  (AZ, 2)        (F)

Splitting by group:

    groups = df['Name'].map(clusters.explode('Name').set_index('Name')['Cluster'])

    for _, d in df.groupby(groups):
        print(d)

    #    Name System
    # 5     C     AY
    # 6     C     AY
    # 8     E     AY
    # 9     E     AY
    # 10    E    NaN
    #
    #   Name System
    # 0    A     AZ
    # 1    A    NaN
    # 2    B     AZ
    # 3    B    NaN
    # 4    B    NaN
    # 7    D     AZ
    #
    #    Name System
    # 11    F     AZ
    # 12    F     AZ
    # 13    F    NaN
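If you want to keep each cluster as its own DataFrame rather than just printing it, one option is to collect the groups in a dictionary. The following is a minimal, self-contained sketch, not the answer's exact method: the sample data is reconstructed from the printed output above (the original dataset is not shown), it uses a merge instead of the `map`/`explode` step, and the names `counts`, `labeled`, `cluster_frames`, `ClusterSystem` and `Count` are illustrative only. It also assumes each Name has exactly one non-null System value.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the printed output (an assumption).
df = pd.DataFrame({
    "Name": list("AABBBCCDEEEFFF"),
    "System": ["AZ", np.nan, "AZ", np.nan, np.nan, "AY", "AY",
               "AZ", "AY", "AY", np.nan, "AZ", "AZ", np.nan],
})

# Count how often each non-null System occurs for each Name.
counts = (
    df.dropna(subset=["System"])
      .groupby(["Name", "System"])
      .size()
      .reset_index(name="Count")
      .rename(columns={"System": "ClusterSystem"})
)

# Attach the (ClusterSystem, Count) label to every row, NaN rows included.
labeled = df.merge(counts, on="Name", how="left")

# One DataFrame per cluster, keyed by the (System, count) pair.
cluster_frames = {
    key: frame[["Name", "System"]]
    for key, frame in labeled.groupby(["ClusterSystem", "Count"])
}

for key, frame in cluster_frames.items():
    print(key)
    print(frame)
```

Each cluster can then be looked up by its key, for example `cluster_frames[("AZ", 1)]`.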

Answer