Problem
I have a df that has many columns with the same column name. I wish to use the same column name as a key to do like UNION in SQL.
Example
see example data:
cie = ['y','n','y','n'] words = [['bank', 'payment'],['student', 'loan','payment'],['bank', 'payment'],['student', 'loan']] df = pd.DataFrame(data=words, index=cie)
df:
0 1 2 y bank payment None n student loan payment y bank payment None n student loan None
df.T:
y n y n 0 bank student bank student 1 payment loan payment loan 2 None payment None None
I need to combine the two y columns since I want to calculate how many times the words in y leads to the results to be sure. Ideally the results should be:
y n 0 bank student 1 payment loan 2 None payment 3 bank student 4 payment loan 5 None None
I tried many methods but they do not work. Can anybody help?
Advertisement
Answer
IIUC first melt , the using cumcount create the additional key , now you will see the problem became pivot
s=df.reset_index().melt('index')
s.variable=s.groupby('index').cumcount()
s.pivot(*s.columns).T
Out[43]:
index n y
variable
0 student bank
1 student bank
2 loan payment
3 loan payment
4 payment None
5 None None