Skip to content
Advertisement

Creating Dataframes for different clusters

I have a dataset

Name    System
A       AZ
A       NaN
B       AZ
B       NaN
B       NaN
C       AY
C       AY
D       AZ
E       AY
E       AY
E       NaN
F       AZ
F       AZ
F       NaN

Using this dataset, I clustered the dataset based on the number of times “System” is repeated for a particular “Name”.

In the above example, Names A, B and D have one “AZ” “Subset” while C, E have two “AY” subsets and F has two AZ so it is a different cluster. The clustered form is as follows;

Cluster     Names
AZ          A,B
AY,AY       C,E
AZ,AZ       F 

Code Used to generate Clusters:

df1 = (df.dropna(subset=['System'])
         .groupby('Name')['System']
         .agg(','.join)
         .reset_index()
         .groupby('System')['Name']
         .agg(','.join)
         .rename_axis('Cluster')
         .reset_index())

Based on this clustered output, I need different dataframes for different clusters: How can I do this?

Output Example-

Dataframe1-

Name    System
A       AZ
A       NaN
B       AZ
B       NaN
B       NaN

Dataframe2

Name    System
C       AY
C       AY
E       AY
E       AY
E       NaN

Dataframe 3

Name    System
F       AZ
F       AZ
F       NaN

Advertisement

Answer

You can adapt my previous answer:

getting the clusters

clusters = (df.groupby(['Name', 'System'])
   ['System'].agg(Cluster=lambda x: (x.iloc[0], len(x)))
   .droplevel('System').reset_index()
   .groupby('Cluster')['Name'].agg(frozenset)
   .reset_index()
)

#    Cluster       Name
# 0  (AY, 2)     (C, E)
# 1  (AZ, 1)  (A, B, D)
# 2  (AZ, 2)        (F)

splitting by group

groups = df['Name'].map(clusters.explode('Name').set_index('Name')['Cluster'])

for _,d in df.groupby(groups):
    print(d)

#    Name System
# 5     C     AY
# 6     C     AY
# 8     E     AY
# 9     E     AY
# 10    E    NaN
#
#   Name System
# 0    A     AZ
# 1    A    NaN
# 2    B     AZ
# 3    B    NaN
# 4    B    NaN
# 7    D     AZ
#
#    Name System
# 11    F     AZ
# 12    F     AZ
# 13    F    NaN
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement