Skip to content
Advertisement

Python DataFrame: How to connect different columns with the same name and merge them into one column

Problem

I have a df that has many columns with the same column name. I wish to use the same column name as a key to do like UNION in SQL.

Example

see example data:

cie = ['y','n','y','n']
words = [['bank', 'payment'],['student', 'loan','payment'],['bank', 'payment'],['student', 'loan']]
df = pd.DataFrame(data=words, index=cie)

df:

    0   1   2
y   bank    payment None
n   student loan    payment
y   bank    payment None
n   student loan    None

df.T:

    y       n       y       n
0   bank    student bank    student
1   payment loan    payment loan
2   None    payment None    None

I need to combine the two y columns since I want to calculate how many times the words in y leads to the results to be sure. Ideally the results should be:

    y       n
0   bank    student
1   payment loan
2   None    payment
3   bank    student
4   payment loan
5   None    None

I tried many methods but they do not work. Can anybody help?

Advertisement

Answer

IIUC first melt , the using cumcount create the additional key , now you will see the problem became pivot

s=df.reset_index().melt('index')
s.variable=s.groupby('index').cumcount()
s.pivot(*s.columns).T
Out[43]: 
index           n        y
variable                  
0         student     bank
1         student     bank
2            loan  payment
3            loan  payment
4         payment     None
5            None     None
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement