Problem
I have a df that has many columns with the same column name. I wish to use the same column name as a key to do like UNION in SQL.
Example
see example data:
JavaScript
x
4
1
cie = ['y','n','y','n']
2
words = [['bank', 'payment'],['student', 'loan','payment'],['bank', 'payment'],['student', 'loan']]
3
df = pd.DataFrame(data=words, index=cie)
4
df:
JavaScript
1
6
1
0 1 2
2
y bank payment None
3
n student loan payment
4
y bank payment None
5
n student loan None
6
df.T:
JavaScript
1
5
1
y n y n
2
0 bank student bank student
3
1 payment loan payment loan
4
2 None payment None None
5
I need to combine the two y columns since I want to calculate how many times the words in y leads to the results to be sure. Ideally the results should be:
JavaScript
1
8
1
y n
2
0 bank student
3
1 payment loan
4
2 None payment
5
3 bank student
6
4 payment loan
7
5 None None
8
I tried many methods but they do not work. Can anybody help?
Advertisement
Answer
IIUC first melt
, the using cumcount
create the additional key , now you will see the problem became pivot
JavaScript
1
13
13
1
s=df.reset_index().melt('index')
2
s.variable=s.groupby('index').cumcount()
3
s.pivot(*s.columns).T
4
Out[43]:
5
index n y
6
variable
7
0 student bank
8
1 student bank
9
2 loan payment
10
3 loan payment
11
4 payment None
12
5 None None
13