I’m trying to figure out the most efficient way to combine the rows of a dataframe such as the one below, so that rows sharing the same value in A end up on a single row.
I’ve tried pd.merge, and thought about using the rank function, but cannot figure out a way.
Thanks in advance.
df1
| A    | B          | C          |
| ---- | ---------- | ---------- |
| TBK1 | 2022-01-01 | 2022-04-04 |
| TBK1 | 2022-02-02 | 2021-01-09 |
| TBK3 | 2022-05-07 | 2023-02-04 |
What I’m trying to achieve is this
df2
| A    | B          | C          | D          | E          |
| ---- | ---------- | ---------- | ---------- | ---------- |
| TBK1 | 2022-01-01 | 2022-04-04 | 2022-02-02 | 2021-01-09 |
| TBK3 | 2022-05-07 | 2023-02-04 | NaN        | NaN        |
Answer
You might want to use groupby with unstack, as advised in this answer:
```python
import pandas as pd
from string import ascii_uppercase

# Reproduce the data
df = pd.DataFrame()
df['A'] = ['TBK1', 'TBK1', 'TBK3']
df['B'] = ['2022-01-01', '2022-02-02', '2022-05-07']
df['C'] = ['2022-04-04', '2021-01-09', '2023-02-04']

# Number the rows within each group of A (0 for the first row, 1 for the second, ...)
s = df.groupby(['A']).cumcount()
# Unstack that counter into the columns, then order the columns by occurrence
df1 = df.set_index(['A', s]).unstack().sort_index(level=1, axis=1)
# Rename the columns to B, C, D, ... (this also flattens the MultiIndex)
df1.columns = [l for l in ascii_uppercase[1:len(df1.columns)+1]]
# Move A from the index back into a regular column
df1 = df1.reset_index()

print(df1)

#       A           B           C           D           E
# 0  TBK1  2022-01-01  2022-04-04  2022-02-02  2021-01-09
# 1  TBK3  2022-05-07  2023-02-04         NaN         NaN
```
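If more than a handful of rows can share the same key, naming the columns from `ascii_uppercase` runs out of letters after Z. The following is a minimal sketch of the same groupby/cumcount/unstack approach wrapped in a function, with column names built from the original name plus the occurrence number. The function name `widen_by_key` and the `B_1`, `C_1`, ... naming scheme are illustrative assumptions, not part of the question or of pandas.

```python
import pandas as pd


def widen_by_key(df: pd.DataFrame, key: str) -> pd.DataFrame:
    """Spread repeated rows of `key` across columns (hypothetical helper)."""
    # Position of each row within its group: 0, 1, 2, ...
    pos = df.groupby(key).cumcount()
    # Move that position into the columns, then order columns by occurrence
    wide = df.set_index([key, pos]).unstack().sort_index(level=1, axis=1)
    # Flatten the (column, position) MultiIndex into names like "B_1"
    # (assumed naming scheme, chosen only for this example)
    wide.columns = [f"{col}_{i + 1}" for col, i in wide.columns]
    return wide.reset_index()


df = pd.DataFrame({
    "A": ["TBK1", "TBK1", "TBK3"],
    "B": ["2022-01-01", "2022-02-02", "2022-05-07"],
    "C": ["2022-04-04", "2021-01-09", "2023-02-04"],
})

out = widen_by_key(df, "A")
print(out)
```

For the sample data this should give one row per value of A, with columns A, B_1, C_1, B_2, C_2 and NaN where a key has fewer rows than the largest group.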