I have a list like the following
+----+-------------------+ | id | first_predecessor | +----+-------------------+ | 0 | 4 | | 1 | 5 | | 2 | 6 | | 3 | 17,18 | | 4 | 7 | | 5 | 8 | | 6 | 9 | | 7 | 10,11,12 | | 8 | 13,14,15 | | 9 | 16 | | 10 | Input | | 11 | Input | | 12 | Input | | 13 | Input | | 14 | Input | | 15 | Input | | 16 | Input | | 17 | 19 | | 18 | 20 | | 19 | 21 | | 20 | 22 | | 21 | Input | +----+-------------------+
One item can have multiple immediate incoming ids, like in case of id=3, which is imediately preceeded by id=17 and id=18. I need a python code to determine this result by following the chain of predecessors both ways: (it is best to read the column all_successors to understand the logic, all_predecessors is the same logic backwards)
+----+-------------------+------------------+----------------+ | id | first_predecessor | all_predecessors | all_successors | +----+-------------------+------------------+----------------+ | 0 | 4 | 4,7,10,11,12 | | | 1 | 5 | 5,8,13,14,15 | | | 2 | 6 | 6,9,16 | | | 3 | 17,18 | 19,21,20,22 | | | 4 | 7 | 7,10,11,12 | 0 | | 5 | 8 | 8,13,14,15 | 1 | | 6 | 9 | 9,16 | 2 | | 7 | 10,11,12 | 10,11,12 | 0,4 | | 8 | 13,14,15 | 13,14,15 | 1,5 | | 9 | 16 | 16 | 2,6 | | 10 | Input | | 0,4,7 | | 11 | Input | | 0,4,7 | | 12 | Input | | 0,4,7 | | 13 | Input | | 1,5,8 | | 14 | Input | | 1,5,8 | | 15 | Input | | 1,5,8 | | 16 | Input | | 2,6,9 | | 17 | 19 | 19,21 | 3 | | 18 | 20 | 20,22 | 3 | | 19 | 21 | 21 | 3,17 | | 20 | 22 | 22 | 3,18 | | 21 | Input | | 3,17,19 | | 22 | Input | | 3,18,20 | +----+-------------------+------------------+----------------+
I need some kind of recursive solution, or should I use some graph package?
Advertisement
Answer
You can use the following functions to find all predecessors and all successors.
ancestors(G, source)
: Returns all nodes having a path tosource
inG
.descendants(G, source)
: Returns all nodes reachable fromsource
inG
.
To run the following example, make sure you change INPUT
in your id
column to NaN
.
df_ = df.copy() df_['first_predecessor'] = df_['first_predecessor'].str.split(',') df_ = df_.explode('first_predecessor') df_['first_predecessor'] = df_['first_predecessor'].fillna(-1).astype(int) G = nx.from_pandas_edgelist(df_, 'first_predecessor', 'id', create_using=nx.DiGraph()) G.remove_node(-1) df['all_predecessors'] = df['id'].apply(lambda x: ','.join(map(str, sorted(nx.ancestors(G, x))))) df['all_successors'] = df['id'].apply(lambda x: ','.join(map(str, sorted(nx.descendants(G, x)))))
print(df) id first_predecessor all_predecessors all_successors 0 0 4 4,7,10,11,12 1 1 5 5,8,13,14,15 2 2 6 6,9,16 3 3 17,18 17,18,19,20,21,22 4 4 7 7,10,11,12 0 5 5 8 8,13,14,15 1 6 6 9 9,16 2 7 7 10,11,12 10,11,12 0,4 8 8 13,14,15 13,14,15 1,5 9 9 16 16 2,6 10 10 NaN 0,4,7 11 11 NaN 0,4,7 12 12 NaN 0,4,7 13 13 NaN 1,5,8 14 14 NaN 1,5,8 15 15 NaN 1,5,8 16 16 NaN 2,6,9 17 17 19 19,21 3 18 18 20 20,22 3 19 19 21 21 3,17 20 20 22 22 3,18 21 21 NaN 3,17,19