I’m trying to add a key for all related instances between two columns, then create a GroupID.
The logic will be:
- Check all instances of ID2 linked to ID1
- Check all instances of ID1 linked to the ID2 values found in the previous step
- Repeat until all relationships found
Answer
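Let us try with networkx: treat each ID1/ID2 pair as an edge of a graph, so that each connected component becomes one group.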
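import networkx as nx

# Treat each (ID1, ID2) row as an edge; linked IDs end up in the same component
G = nx.from_pandas_edgelist(df, 'ID1', 'ID2')
# One set of nodes per connected component
l = list(nx.connected_components(G))
# Give every node in component number x the label x
L = [dict.fromkeys(y, x) for x, y in enumerate(l)]
# Merge the per-component dicts into a single node -> group lookup
d = {k: v for d in L for k, v in d.items()}
df['new'] = df['ID1'].map(d)
df
Out[302]:
  ID1  ID2  new
0   A    1    0
1   A    2    0
2   B    1    0
3   B    3    0
4   C    4    1
5   C    5    1
6   D    2    0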
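For comparison, the "repeat until all relationships found" logic from the question can also be sketched with the standard library only. This is a rough illustration, not the networkx implementation: the function name `group_ids` is hypothetical, the input is assumed to be a plain list of (ID1, ID2) pairs rather than a DataFrame, and each value is tagged with its column so that an ID1 value is never confused with an equal ID2 value (an assumption the networkx version above does not make).

```python
from collections import defaultdict, deque


def group_ids(pairs):
    """Return {ID1 value: group number} for rows linked through shared IDs.

    Hypothetical helper: `pairs` is assumed to be an iterable of (ID1, ID2) tuples.
    """
    # Build an undirected adjacency map; tag each node with its column name
    # so values from the two columns cannot collide.
    neighbours = defaultdict(set)
    for id1, id2 in pairs:
        a, b = ("ID1", id1), ("ID2", id2)
        neighbours[a].add(b)
        neighbours[b].add(a)

    group_of = {}
    next_group = 0
    for start in neighbours:  # dicts keep insertion order, so groups follow row order
        if start in group_of:
            continue
        # Breadth-first search: keep following links until no new IDs turn up.
        group_of[start] = next_group
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for other in neighbours[node]:
                if other not in group_of:
                    group_of[other] = next_group
                    queue.append(other)
        next_group += 1

    # Only the ID1 side is needed to label the rows.
    return {value: g for (column, value), g in group_of.items() if column == "ID1"}


# Same rows as the example output above.
pairs = [("A", 1), ("A", 2), ("B", 1), ("B", 3), ("C", 4), ("C", 5), ("D", 2)]
groups = group_ids(pairs)
print(groups)  # A, B and D share group 0; C is group 1
```

With a DataFrame, the result could presumably be attached the same way as in the answer, for example `df['new'] = df['ID1'].map(group_ids(zip(df['ID1'], df['ID2'])))`.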