I have a list like the following
+----+-------------------+ | id | first_predecessor | +----+-------------------+ | 0 | 4 | | 1 | 5 | | 2 | 6 | | 3 | 17,18 | | 4 | 7 | | 5 | 8 | | 6 | 9 | | 7 | 10,11,12 | | 8 | 13,14,15 | | 9 | 16 | | 10 | Input | | 11 | Input | | 12 | Input | | 13 | Input | | 14 | Input | | 15 | Input | | 16 | Input | | 17 | 19 | | 18 | 20 | | 19 | 21 | | 20 | 22 | | 21 | Input | +----+-------------------+
One item can have multiple immediate incoming ids, like in case of id=3, which is imediately preceeded by id=17 and id=18. I need a python code to determine this result by following the chain of predecessors both ways: (it is best to read the column all_successors to understand the logic, all_predecessors is the same logic backwards)
+----+-------------------+------------------+----------------+ | id | first_predecessor | all_predecessors | all_successors | +----+-------------------+------------------+----------------+ | 0 | 4 | 4,7,10,11,12 | | | 1 | 5 | 5,8,13,14,15 | | | 2 | 6 | 6,9,16 | | | 3 | 17,18 | 19,21,20,22 | | | 4 | 7 | 7,10,11,12 | 0 | | 5 | 8 | 8,13,14,15 | 1 | | 6 | 9 | 9,16 | 2 | | 7 | 10,11,12 | 10,11,12 | 0,4 | | 8 | 13,14,15 | 13,14,15 | 1,5 | | 9 | 16 | 16 | 2,6 | | 10 | Input | | 0,4,7 | | 11 | Input | | 0,4,7 | | 12 | Input | | 0,4,7 | | 13 | Input | | 1,5,8 | | 14 | Input | | 1,5,8 | | 15 | Input | | 1,5,8 | | 16 | Input | | 2,6,9 | | 17 | 19 | 19,21 | 3 | | 18 | 20 | 20,22 | 3 | | 19 | 21 | 21 | 3,17 | | 20 | 22 | 22 | 3,18 | | 21 | Input | | 3,17,19 | | 22 | Input | | 3,18,20 | +----+-------------------+------------------+----------------+
I need some kind of recursive solution, or should I use some graph package?
Advertisement
Answer
You can use the following functions to find all predecessors and all successors.
ancestors(G, source): Returns all nodes having a path tosourceinG.descendants(G, source): Returns all nodes reachable fromsourceinG.
To run the following example, make sure you change INPUT in your id column to NaN.
df_ = df.copy()
df_['first_predecessor'] = df_['first_predecessor'].str.split(',')
df_ = df_.explode('first_predecessor')
df_['first_predecessor'] = df_['first_predecessor'].fillna(-1).astype(int)
G = nx.from_pandas_edgelist(df_, 'first_predecessor', 'id', create_using=nx.DiGraph())
G.remove_node(-1)
df['all_predecessors'] = df['id'].apply(lambda x: ','.join(map(str, sorted(nx.ancestors(G, x)))))
df['all_successors'] = df['id'].apply(lambda x: ','.join(map(str, sorted(nx.descendants(G, x)))))
print(df)
id first_predecessor all_predecessors all_successors
0 0 4 4,7,10,11,12
1 1 5 5,8,13,14,15
2 2 6 6,9,16
3 3 17,18 17,18,19,20,21,22
4 4 7 7,10,11,12 0
5 5 8 8,13,14,15 1
6 6 9 9,16 2
7 7 10,11,12 10,11,12 0,4
8 8 13,14,15 13,14,15 1,5
9 9 16 16 2,6
10 10 NaN 0,4,7
11 11 NaN 0,4,7
12 12 NaN 0,4,7
13 13 NaN 1,5,8
14 14 NaN 1,5,8
15 15 NaN 1,5,8
16 16 NaN 2,6,9
17 17 19 19,21 3
18 18 20 20,22 3
19 19 21 21 3,17
20 20 22 22 3,18
21 21 NaN 3,17,19