I'm trying to go from a DataFrame where each row is a source entity and each column is a type of relation to one or more other entities, like this:
import numpy as np
import pandas as pd

i = [['a', np.nan, np.nan, ['d', 'e']],
     ['b', 'f', np.nan, np.nan],
     ['c', np.nan, 'g', 'h']]
inputs = pd.DataFrame(i, columns=['source', 'mom', 'dad', 'sibling'])
To one where each row includes a source’s unique target entity and relation type in separate columns:
o = [['a', 'd', 'sibling'],
     ['a', 'e', 'sibling'],
     ['b', 'f', 'mom'],
     ['c', 'g', 'dad'],
     ['c', 'h', 'sibling']]
outputs = pd.DataFrame(o)
I’ve looked at pandas functionality including stack() and explode() but can’t figure out how to implement a pandas-native solution. Any suggestions on how to do this efficiently?
Answer
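Per @sammywemmy, melt and explode should do the trick. melt turns the relation columns into rows, dropna removes the empty relations, and explode splits list values into one row each: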
inputs.melt("source", var_name="relationship").dropna().explode('value')
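For anyone who wants the result in the same shape as the desired output, here is a minimal end-to-end sketch of the same melt/dropna/explode idea. The `target` column name, the sort, and the final column reordering are my own additions for illustration and are not part of the original answer; the `rows` variable just rebuilds the question's input.

```python
import numpy as np
import pandas as pd

# Rebuild the input frame from the question.
rows = [['a', np.nan, np.nan, ['d', 'e']],
        ['b', 'f', np.nan, np.nan],
        ['c', np.nan, 'g', 'h']]
inputs = pd.DataFrame(rows, columns=['source', 'mom', 'dad', 'sibling'])

result = (
    inputs
    # Wide to long: one row per (source, relationship) pair.
    # 'target' is an assumed name for the melted value column.
    .melt(id_vars='source', var_name='relationship', value_name='target')
    # Drop pairs where no relation exists.
    .dropna(subset=['target'])
    # Split list-valued targets into one row per element.
    .explode('target')
    # Optional: order rows by source and put columns in the asked-for order.
    .sort_values(['source', 'target'])
    .reset_index(drop=True)
    .loc[:, ['source', 'target', 'relationship']]
)

print(result)
```

Without the sort, the rows come out grouped by relationship column (mom, dad, sibling) rather than by source, which may or may not matter for your use case.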