I'm trying to go from a DataFrame where each row is a source entity and each column is a type of relation to one or more other entities, like this:
import numpy as np
import pandas as pd

i = [['a', np.nan, np.nan, ['d', 'e']],
     ['b', 'f', np.nan, np.nan],
     ['c', np.nan, 'g', 'h']]
inputs = pd.DataFrame(i, columns=['source', 'mom', 'dad', 'sibling'])
to one where each row holds a source, a single target entity, and the relation type in separate columns:
o = [['a', 'd', 'sibling'],
     ['a', 'e', 'sibling'],
     ['b', 'f', 'mom'],
     ['c', 'g', 'dad'],
     ['c', 'h', 'sibling']]
outputs = pd.DataFrame(o)
I’ve looked at pandas functionality including stack() and explode() but can’t figure out how to implement a pandas-native solution. Any suggestions on how to do this efficiently?
Answer
Per @sammywemmy, melt() and explode() should do the trick:
inputs.melt("source", var_name="relationship").dropna().explode('value')
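For reference, here is a self-contained sketch of the same chain applied to the sample data from the question. melt() turns each relation column into rows, dropna() drops the missing relations, and explode() splits list cells into one row per target. The column name `target`, the variable name `tidy`, and the final sort, column reorder, and index reset are my own additions to match the layout asked for in the question; they are not part of the original one-liner.

```python
import numpy as np
import pandas as pd

# Sample data from the question: one row per source, one column per relation.
i = [['a', np.nan, np.nan, ['d', 'e']],
     ['b', 'f', np.nan, np.nan],
     ['c', np.nan, 'g', 'h']]
inputs = pd.DataFrame(i, columns=['source', 'mom', 'dad', 'sibling'])

tidy = (
    inputs
    # Wide to long: one row per (source, relationship) pair.
    .melt(id_vars='source', var_name='relationship', value_name='target')
    # Drop pairs where the relation is missing.
    .dropna(subset=['target'])
    # Split list-valued cells into one row per target.
    .explode('target')
    # Assumed cosmetic steps: group rows by source, reorder columns, renumber.
    .sort_values('source', kind='stable')
    [['source', 'target', 'relationship']]
    .reset_index(drop=True)
)

print(tidy)
```

If row order and column names do not matter, the one-liner above is enough; the extra steps only tidy up the presentation.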