I'm trying to go from a DataFrame where each row is a source entity and each column holds the target(s) of one relation type, like this:
import numpy as np
import pandas as pd
i = [['a', np.nan, np.nan, ['d', 'e']],
     ['b', 'f', np.nan, np.nan],
     ['c', np.nan, 'g', 'h']]
inputs = pd.DataFrame(i, columns=['source', 'mom', 'dad', 'sibling'])
To one where each row pairs a source with a single target entity, with the relation type in a separate column:
o = [['a', 'd', 'sibling'],
     ['a', 'e', 'sibling'],
     ['b', 'f', 'mom'],
     ['c', 'g', 'dad'],
     ['c', 'h', 'sibling']]
outputs = pd.DataFrame(o)
I’ve looked at pandas functionality including stack() and explode() but can’t figure out how to implement a pandas-native solution. Any suggestions on how to do this efficiently?
Answer
Per @sammywemmy, melt and explode should do the trick:
inputs.melt("source", var_name="relationship").dropna().explode('value')
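For anyone who wants to check this against the expected output in the question, here is a self-contained sketch of the same melt / dropna / explode chain. The `value_name='target'` argument, the final sort, and the column reordering are my own additions to make the result line up with the desired layout; they are not part of the one-liner above.

```python
import numpy as np
import pandas as pd

# Same input as in the question: one row per source, one column per relation type.
inputs = pd.DataFrame(
    [['a', np.nan, np.nan, ['d', 'e']],
     ['b', 'f', np.nan, np.nan],
     ['c', np.nan, 'g', 'h']],
    columns=['source', 'mom', 'dad', 'sibling'],
)

result = (
    inputs
    # Wide to long: one row per (source, relation type) pair.
    .melt('source', var_name='relationship', value_name='target')
    # Drop pairs that have no target at all.
    .dropna(subset=['target'])
    # Split list-valued targets into one row each.
    .explode('target')
    # Assumed ordering, only so the rows match the question's expected output.
    .sort_values(['source', 'target'])
    .reset_index(drop=True)
    [['source', 'target', 'relationship']]
)

print(result)
```

Note that `dropna` has to come before `explode` only for tidiness here; the important part is that `melt` runs first so that the relation names end up in their own column.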