Problem
I have a df that has many columns with the same column name. I wish to use the same column name as a key to do like UNION in SQL.
Example
see example data:
cie = ['y','n','y','n'] words = [['bank', 'payment'],['student', 'loan','payment'],['bank', 'payment'],['student', 'loan']] df = pd.DataFrame(data=words, index=cie)
df:
0 1 2 y bank payment None n student loan payment y bank payment None n student loan None
df.T:
y n y n 0 bank student bank student 1 payment loan payment loan 2 None payment None None
I need to combine the two y columns since I want to calculate how many times the words in y leads to the results to be sure. Ideally the results should be:
y n 0 bank student 1 payment loan 2 None payment 3 bank student 4 payment loan 5 None None
I tried many methods but they do not work. Can anybody help?
Advertisement
Answer
IIUC first melt
, the using cumcount
create the additional key , now you will see the problem became pivot
s=df.reset_index().melt('index') s.variable=s.groupby('index').cumcount() s.pivot(*s.columns).T Out[43]: index n y variable 0 student bank 1 student bank 2 loan payment 3 loan payment 4 payment None 5 None None