I have a serious issue with finding all possible paths in my csv file that looks like this :
Source | Target | Source_repo | Target_repo |
---|---|---|---|
SOURCE1 | Target2 | repo-1 | repo-2 |
SOURCE5 | Target3 | repo-5 | repo-3 |
SOURCE8 | Target5 | repo-8 | repo-5 |
There a large amount of lines in the datasets, more than 5000 lines. I want to generate all possible paths like this in and return a list (Target5 is equal to SOURCE5):
- SOURCE1 Target2
- SOURCE8 Target5 Target3
I want to implement this solution without using recursive functions, since causes problems (maximum recursion depth exceeded).
This is the current code example :
def attach_co_changing_components(base_component): co_changes = df_depends_on.loc[df_depends_on["Source_repo"] == base_component, "Target_repo"].values result = {base_component: list(co_changes)} return result def dfs(data, path, paths): datum = path[-1] if datum in data: for val in data[datum]: new_path = path + [val] paths = dfs(data, new_path, paths) else: paths += [path] return paths def enumerate_paths(graph, nodes=[]): nodes = graph.keys() all_paths = [] for node in nodes: node_paths = dfs(graph, [node], []) all_paths += node_paths return all_paths if __name__ == "__main__": df = pd.read_csv("clean_openstack_evolution.csv") co_changing_components = df[["Source"]].copy() co_changing_components = co_changing_components.drop_duplicates( ).reset_index(drop=True) co_changing_components = co_changing_components["Source"].map( attach_co_changing_components) co_changing_components = co_changing_components.rename("Path") co_changing_components = co_changing_components.reset_index(drop=True) newdict = {} for k, v in [(key, d[key]) for d in co_changing_components for key in d]: if k not in newdict: newdict[k] = v else: newdict[k].append(v) graph_keys = df_depends_on["Source_repo"].drop_duplicates().to_dict( ).values() graph_keys = {*graph_keys} graph_keys = set([ k for k in graph_keys if len(df_depends_on[df_depends_on["Target"] == k]) > 0 ]) result = enumerate_paths(new_dict)
Here is the output after executing the preceding code :
Here is the data link Google drive
I tried to solve the problem using recursive function, but the code failed with the problem of depth exceeded. I aim to solve it without recursive functions.
Advertisement
Answer
I’m not sure if you want all paths or paths specifically from node to another node. Either way this looks like a job for networkx
.
Setup (nx.from_pandas_edgelist
)
import networkx as nx import pandas as pd df = pd.read_csv("...") graph = nx.from_pandas_edgelist(df, create_using=nx.DiGraph)
All paths (nx.all_simple_paths
)
from itertools import chain, product, starmap from functools import partial roots = (node for node, d in graph.in_degree if d == 0) leaves = (node for node, d in graph.out_degree if d == 0) all_paths = partial(nx.all_simple_paths, graph) paths = list(chain.from_iterable(starmap(all_paths, product(roots, leaves))))
From one node to another
source_node = "some_node_in_graph" target_node = "some_other_node_in_graph" list(nx.all_simple_paths(graph, source=source_node, target=target_node))