New dataframe of all non-NaN pairs of elements between two columns in pandas

I'm trying to go from a DataFrame where each row is a source entity and each column is a type of relation to one or more other entities, like this:

import numpy as np
import pandas as pd

i = [['a', np.nan, np.nan, ['d', 'e']],
     ['b', 'f', np.nan, np.nan],
     ['c', np.nan, 'g', 'h']]
inputs = pd.DataFrame(i, columns=['source', 'mom', 'dad', 'sibling'])

To one where each row includes a source’s unique target entity and relation type in separate columns:

o = [['a', 'd', 'sibling'],
     ['a', 'e', 'sibling'],
     ['b', 'f', 'mom'],
     ['c', 'g', 'dad'],
     ['c', 'h', 'sibling']]
outputs = pd.DataFrame(o)

I’ve looked at pandas functionality including stack() and explode() but can’t figure out how to implement a pandas-native solution. Any suggestions on how to do this efficiently?

Answer

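Per @sammywemmy, melt and explode should do the trick: melt reshapes the relation columns into long format (one row per source and relation type), dropna removes the pairs with no value, and explode splits list values into one row each: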

inputs.melt("source", var_name="relationship").dropna().explode('value')
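For reference, here is a self-contained sketch of the same chain that also reshapes the result to look like the desired output in the question. The `value_name='target'` argument, the sort, the column reordering, and the index reset are my own additions for illustration, not part of the original one-liner:

```python
import numpy as np
import pandas as pd

# Same input frame as in the question.
inputs = pd.DataFrame(
    [['a', np.nan, np.nan, ['d', 'e']],
     ['b', 'f', np.nan, np.nan],
     ['c', np.nan, 'g', 'h']],
    columns=['source', 'mom', 'dad', 'sibling'],
)

result = (
    inputs
    # Long format: one row per (source, relationship) pair.
    # 'target' is an assumed column name; melt defaults to 'value'.
    .melt('source', var_name='relationship', value_name='target')
    # Drop pairs that have no target at all.
    .dropna(subset=['target'])
    # Split list-valued targets into one row per element.
    .explode('target')
    # Optional tidy-up to mirror the layout asked for in the question.
    .sort_values(['source', 'target'])
    .loc[:, ['source', 'target', 'relationship']]
    .reset_index(drop=True)
)

print(result)
```

With the sample data this should give five rows: (a, d, sibling), (a, e, sibling), (b, f, mom), (c, g, dad), (c, h, sibling). If row order and column names don't matter, the one-liner above is enough.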
User contributions licensed under: CC BY-SA