I have a pandas dataframe of 100 rows x 7 columns like this:
Values in the source column are connected to the values in the other columns. For example, a is connected to contact_1, contact_2, ..., contact_5.
In the same way, b is connected to contact_6, contact_7, ..., contact_10.
I want to stack these columns into just two columns (source and destination), so that I can build a graph from an edge list.
The expected output data format is:
I tried df.stack() but did not get the desired result. I got the following:
Any suggestions?
Answer
You’re looking for pd.wide_to_long. This should do:
pd.wide_to_long(df, stubnames='destination_', i=['source'], j='number')
The destination_ column of the result holds the values you're looking for; source and number become the index.
Example:
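import pandas as pd

# Wide format: one row per source, one destination_N column per contact
d = {'source': ['a', 'b'],
     'destination_1': ['contact_1', 'contact_6'],
     'destination_2': ['contact_2', 'contact_7']}
df = pd.DataFrame(d)

# Long format: one row per (source, number) pair
pd.wide_to_long(df, stubnames='destination_', i=['source'], j='number')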
Output:
               destination_
source number
a      1          contact_1
b      1          contact_6
a      2          contact_2
b      2          contact_7
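To get from that result to the two plain columns the question asks for, something like the following might work. It is only a sketch on the same toy data: it assumes the real columns share the destination_ prefix, and the names long_df and edges are just illustrative.

```python
import pandas as pd

# Same toy data as in the example above (assumed column naming: destination_N).
df = pd.DataFrame({
    'source': ['a', 'b'],
    'destination_1': ['contact_1', 'contact_6'],
    'destination_2': ['contact_2', 'contact_7'],
})

long_df = pd.wide_to_long(df, stubnames='destination_', i=['source'], j='number')

# Move 'source' and 'number' out of the index, drop the helper counter,
# and rename the stub column so only source/destination remain.
edges = (
    long_df.reset_index()
           .drop(columns='number')
           .rename(columns={'destination_': 'destination'})
           .sort_values(['source', 'destination'])
           .reset_index(drop=True)
)
print(edges)
```

Note that wide_to_long expects the values in source to identify each row uniquely; if the real data has repeated sources, df.melt(id_vars='source') may be the easier route. The resulting frame should be usable as an edge list, for example with networkx's from_pandas_edgelist if that is the graph library in use.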


