
Convert 2 column dataframe into multi-level hierarchical dataframe

I have a pandas DataFrame:

From To
A B
A C
D E
F F
B G
B H
B I
G J
G K
L L
M M
N N

I want to convert it into a multi-column hierarchy. The expected hierarchy looks like this:

Level_1 Level_2 Level_3 Level_4
A B G J
A B G K
A B H
A B I
A C
D E
F F
L L
M M
N N

Is there a built-in way in pandas to achieve this? I know I can use recursion; is there a simpler way?
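For comparison, the recursive approach the question mentions could look like the following plain-Python sketch. The helper name `expand_paths` is hypothetical, and the sketch assumes the data has no cycles apart from isolated self-loops such as F -> F (a longer cycle would recurse forever).

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({'From': ['A', 'A', 'D', 'F', 'B', 'B', 'B', 'G', 'G', 'L', 'M', 'N'],
                   'To': ['B', 'C', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N']})


def expand_paths(df):
    """Hypothetical helper: return every root-to-leaf path as a list of nodes."""
    children = defaultdict(list)  # node -> children reached by non-self edges
    has_parent = set()            # nodes that are the target of a non-self edge
    self_loops = set()            # nodes with a From == To row
    for src, dst in zip(df['From'], df['To']):
        if src == dst:
            self_loops.add(src)
        else:
            children[src].append(dst)
            has_parent.add(dst)

    def walk(path):
        kids = children.get(path[-1], [])
        if not kids:
            yield path  # reached a leaf
        for kid in kids:
            yield from walk(path + [kid])

    rows = []
    for root in dict.fromkeys(df['From']):  # unique sources, in order of appearance
        if root in has_parent:
            continue  # not a root
        if root in self_loops and root not in children:
            rows.append([root, root])  # isolated self-loop, e.g. F -> F
        else:
            rows.extend(walk([root]))
    return rows


levels = (pd.DataFrame(sorted(expand_paths(df)))
            .rename(columns=lambda i: f'Level_{i + 1}')
            .fillna(''))
print(levels)
```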


Answer

You can get the expected result easily using networkx:

# Python env: pip install networkx
# Anaconda env: conda install networkx

import networkx as nx
import pandas as pd

df = pd.DataFrame({'From': ['A', 'A', 'D', 'F', 'B', 'B', 'B', 'G', 'G', 'L', 'M', 'N'],
                   'To': ['B', 'C', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N']})

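# Build a directed graph with one edge per row (From -> To)
G = nx.from_pandas_edgelist(df, source='From', target='To', create_using=nx.DiGraph)
# Roots have no incoming edges; leaves have no outgoing edges
roots = [v for v, d in G.in_degree() if d == 0]
leaves = [v for v, d in G.out_degree() if d == 0]

# Collect every simple path from each root to each leaf
all_paths = []
for root in roots:
    for leaf in leaves:
        paths = nx.all_simple_paths(G, root, leaf)
        all_paths.extend(paths)

# Self-loop nodes (F, L, M, N) have in- and out-degree 1, so they are neither
# roots nor leaves; add them explicitly as two-level rows
for node in nx.nodes_with_selfloops(G):
    all_paths.append([node, node])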

Output:

>>> pd.DataFrame(sorted(all_paths)).add_prefix('Level_').fillna('')
  Level_0 Level_1 Level_2 Level_3
0       A       B       G       J
1       A       B       G       K
2       A       B       H
3       A       B       I
4       A       C
5       D       E
6       F       F
7       L       L
8       M       M
9       N       N
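The question's expected header starts at `Level_1`, whereas `add_prefix` keeps pandas' 0-based column labels. A small variation, assuming the `all_paths` list built above (hard-coded here so the snippet runs on its own), renames the columns with an offset instead:

```python
import pandas as pd

# The paths produced by the networkx code above, hard-coded so this runs alone
all_paths = [['A', 'B', 'G', 'J'], ['A', 'B', 'G', 'K'], ['A', 'B', 'H'], ['A', 'B', 'I'],
             ['A', 'C'], ['D', 'E'], ['F', 'F'], ['L', 'L'], ['M', 'M'], ['N', 'N']]

out = (pd.DataFrame(sorted(all_paths))
         .rename(columns=lambda i: f'Level_{i + 1}')  # 0 -> Level_1, 1 -> Level_2, ...
         .fillna(''))                                  # blank out missing deeper levels
print(out)
```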

Documentation: networkx.algorithms.simple_paths.all_simple_paths
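Since the question asks about a pandas route, here is a pandas-only sketch that swaps networkx for repeated left merges, a technique not used in the answer above. It assumes, as in the sample data, that the graph has no cycles other than isolated self-loops such as F -> F; a longer cycle would make the loop run forever. The function name `to_levels` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'From': ['A', 'A', 'D', 'F', 'B', 'B', 'B', 'G', 'G', 'L', 'M', 'N'],
                   'To': ['B', 'C', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N']})


def to_levels(df):
    """Hypothetical helper: expand a From/To edge list into Level_1..Level_n columns."""
    # Self-loop rows (From == To) become standalone two-level rows
    loops = df[df['From'] == df['To']].set_axis([0, 1], axis=1)
    edges = df[df['From'] != df['To']]

    # Start from edges whose source never appears as a target (the roots)
    paths = edges[~edges['From'].isin(edges['To'])].set_axis([0, 1], axis=1)

    # Repeatedly attach the children of the last column; the left merge keeps
    # paths that already ended at a leaf (their new cell is NaN)
    while True:
        last = int(paths.columns[-1])
        nxt = paths.merge(edges, how='left', left_on=last, right_on='From')
        if nxt['To'].isna().all():
            break  # no path could be extended any further
        paths = nxt.drop(columns='From').rename(columns={'To': last + 1})

    # '' sorts before any letter, so shorter paths order like Python list sorting
    out = pd.concat([paths, loops], ignore_index=True).fillna('')
    out = out.sort_values(list(out.columns)).reset_index(drop=True)
    return out.rename(columns=lambda i: f'Level_{i + 1}')


print(to_levels(df))
```

Each merge adds one level, so the number of merges equals the depth of the deepest path rather than the number of rows.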

User contributions licensed under: CC BY-SA