I have a pandas DataFrame with 100 rows and 7 columns, like this:
Values in column source are connected to the values in the other columns. For example, a is connected to contact_1, contact_2, ... contact_5. In the same way, b is connected to contact_6, contact_7, ... and contact_10.
I want to stack these columns into just two columns (source and destination) so that I can build a graph from them in edge-list format.
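For the target shape described here, DataFrame.melt is another way to get there (a different technique from the wide_to_long answer below). This is only a sketch under assumptions: the question does not show the real column names, so destination_1 to destination_3 are hypothetical, and the dropna step only matters if some rows have fewer contacts than others.

```python
import pandas as pd

# Hypothetical wide frame: the question does not show the real column names,
# so destination_1 .. destination_3 are assumed for illustration.
df = pd.DataFrame({
    "source": ["a", "b"],
    "destination_1": ["contact_1", "contact_6"],
    "destination_2": ["contact_2", "contact_7"],
    "destination_3": ["contact_3", "contact_8"],
})

# melt keeps "source" as the id column and emits one row per
# (source, other column) pair; the old column name lands in "variable".
edges = (
    df.melt(id_vars="source", value_name="destination")
    .drop(columns="variable")          # column names are not needed in an edge list
    .dropna(subset=["destination"])    # skip empty cells if rows have fewer contacts
    .sort_values(["source", "destination"], ignore_index=True)
)
print(edges)
```

The result has exactly the two columns source and destination, one row per edge.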
The expected output data format is:
I tried df.stack(), but it did not give the desired result. I got the following:
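A likely reason plain df.stack() misbehaves is that it stacks every column, including source itself, so the source labels end up as values next to the contacts. A minimal sketch, assuming the same hypothetical destination_N column names as the answer's example, that moves source into the index before stacking:

```python
import pandas as pd

# Hypothetical column names, as in the answer's example.
df = pd.DataFrame({
    "source": ["a", "b"],
    "destination_1": ["contact_1", "contact_6"],
    "destination_2": ["contact_2", "contact_7"],
})

# Move "source" into the index so stack() only stacks the destination columns.
# The result is a Series indexed by (source, original column name).
stacked = df.set_index("source").stack()

edges = (
    stacked.reset_index(level=1, drop=True)  # discard the column-name level
    .rename("destination")                   # name the values
    .reset_index()                           # "source" back to a regular column
)
print(edges)
```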
Any suggestions?
Answer
You’re looking for pd.wide_to_long. This should do it:
pd.wide_to_long(df, stubnames='destination_', i=['source'], j='number')
The column destination_ will then hold the values you’re looking for, indexed by source and number.
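If you want exactly two plain columns for an edge list, the wide_to_long result can be flattened afterwards. A sketch on a small toy frame with hypothetical destination_N column names; renaming destination_ to destination is just a naming choice:

```python
import pandas as pd

df = pd.DataFrame({
    "source": ["a", "b"],
    "destination_1": ["contact_1", "contact_6"],
    "destination_2": ["contact_2", "contact_7"],
})

long_df = pd.wide_to_long(df, stubnames="destination_", i=["source"], j="number")

# wide_to_long leaves "source" and "number" in a MultiIndex. For an edge list
# we only want two plain columns, so move the index back into columns,
# drop the numeric suffix and give the value column a cleaner name.
edges = (
    long_df.reset_index()
    .drop(columns="number")
    .rename(columns={"destination_": "destination"})
)
print(edges)
```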
Example:
import pandas as pd

d = {'source': ['a', 'b'],
     'destination_1': ['contact_1', 'contact_6'],
     'destination_2': ['contact_2', 'contact_7']}
df = pd.DataFrame(d)
pd.wide_to_long(df, stubnames='destination_', i=['source'], j='number')
Output:
              destination_
source number
a      1         contact_1
b      1         contact_6
a      2         contact_2
b      2         contact_7
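One caveat: wide_to_long requires the i columns to uniquely identify each row and raises a ValueError otherwise. If the same source can appear on more than one of the 100 rows, a sketch of one workaround is to add a row id to i (the row column here is a hypothetical helper, not part of the original data):

```python
import pandas as pd

# Hypothetical frame in which source "a" appears on two rows.
df = pd.DataFrame({
    "source": ["a", "a", "b"],
    "destination_1": ["contact_1", "contact_3", "contact_6"],
    "destination_2": ["contact_2", "contact_4", "contact_7"],
})

# With i=["source"] alone, the id is not unique, so wide_to_long raises.
try:
    pd.wide_to_long(df, stubnames="destination_", i=["source"], j="number")
    raised = False
except ValueError as err:
    print("wide_to_long refused:", err)
    raised = True

# Workaround: add a hypothetical "row" id so that (source, row) is unique.
with_row = df.reset_index().rename(columns={"index": "row"})
long_df = pd.wide_to_long(
    with_row, stubnames="destination_", i=["source", "row"], j="number"
)
edges = (
    long_df.reset_index()[["source", "destination_"]]
    .rename(columns={"destination_": "destination"})
)
print(edges)
```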