Skip to content
Advertisement

Create new key based on relationship between two columns

I’m trying to add a key for all related instances between two columns, then create a GroupID

The logic will be:

  1. Check all instances of ID2 linked to ID1
  2. CHeck all instances of ID1 linked to ID2 found in (1)
  3. Repeat until all relationships found

enter image description here

Advertisement

Answer

Let us try with networkx

import networkx as nx
G=nx.from_pandas_edgelist(df, 'ID1', 'ID2')
l=list(nx.connected_components(G))
L=[dict.fromkeys(y,x) for x, y in enumerate(l)]
d={k: v for d in L for k, v in d.items()}
df['new'] = df['ID1'].map(d)
df
Out[302]: 
  ID1  ID2  new
0   A    1    0
1   A    2    0
2   B    1    0
3   B    3    0
4   C    4    1
5   C    5    1
6   D    2    0
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement