I have two dataframes (taken from pd.to_clipboard()
, suggest using pd.read_clipboard()
)
df_a
:
DateField Team FirstVal SecondVal
0 1/1/2001 Team1 578 32631
1 1/1/2001 Team2 570 54718
2 1/1/2001 Team3 992 16092
3 1/1/2001 Team4 323 96744
4 1/1/2001 Team5 591 51103
5 2/1/2001 Team1 407 85107
6 2/1/2001 Team2 980 62692
7 2/1/2001 Team3 668 55523
8 2/1/2001 Team4 622 81807
9 2/1/2001 Team5 966 18505
10 3/1/2001 Team1 606 41817
11 3/1/2001 Team2 547 39217
12 3/1/2001 Team3 794 72819
13 3/1/2001 Team4 509 49918
14 3/1/2001 Team5 318 49998
15 4/1/2001 Team1 340 48283
16 4/1/2001 Team2 380 26229
17 4/1/2001 Team3 307 79244
18 4/1/2001 Team4 374 22626
19 4/1/2001 Team5 918 99260
and df_b
:
DateField Team1 Team2 Team3 Team4 Team5
0 1/1/2001 49 44 95 28 9
1 1/2/2001 72 37 3 30 85
2 1/3/2001 47 6 53 93 3
3 1/4/2001 15 96 11 96 81
4 1/5/2001 14 95 11 66 12
5 1/6/2001 29 62 19 96 74
6 1/7/2001 36 10 25 38 96
7 1/8/2001 24 39 1 76 44
8 1/9/2001 9 3 32 2 25
9 1/10/2001 25 75 52 19 9
10 1/11/2001 17 67 61 90 36
11 1/12/2001 15 17 36 54 13
What I am looking to do is add a third column to df_a
, say ThirdVal
, which contains the value in df_b
where the DateField
and Team
align. My issue is that df_b
is transposed and formatted awry compared to df_a
.
I have looked into pd.pivot()
but have been unsuccessful in pivoting the dataframe to make it joinable, but I am not sure what to fill in for the columns
argument. Should I be looking into a different function?
Advertisement
Answer
What you’re looking for is not pivot
, but melt
.
df_c = df_b.melt(id_vars = 'DateField', value_vars = ['Team1', 'Team2','Team3','Team4','Team5'])
Or, for a more general approach, where all columns not specified in id_vars
will be used as value_vars
:
df_c = df_b.melt(id_vars = 'DateField')
Gives you a DataFrame looking like this:
DateField variable value
0 1/1/2001 Team1 49
1 1/2/2001 Team1 72
2 1/3/2001 Team1 47
3 1/4/2001 Team1 15
4 1/5/2001 Team1 14
5 1/6/2001 Team1 29
6 1/7/2001 Team1 36
7 1/8/2001 Team1 24
8 1/9/2001 Team1 9
9 1/10/2001 Team1 25
10 1/11/2001 Team1 17
11 1/12/2001 Team1 15
12 1/1/2001 Team2 44
13 1/2/2001 Team2 37
14 1/3/2001 Team2 6
15 1/4/2001 Team2 96
16 1/5/2001 Team2 95
17 1/6/2001 Team2 62
18 1/7/2001 Team2 10
19 1/8/2001 Team2 39
20 1/9/2001 Team2 3
21 1/10/2001 Team2 75
22 1/11/2001 Team2 67
23 1/12/2001 Team2 17
24 1/1/2001 Team3 95
25 1/2/2001 Team3 3
26 1/3/2001 Team3 53
27 1/4/2001 Team3 11
28 1/5/2001 Team3 11
29 1/6/2001 Team3 19
30 1/7/2001 Team3 25
31 1/8/2001 Team3 1
32 1/9/2001 Team3 32
33 1/10/2001 Team3 52
34 1/11/2001 Team3 61
35 1/12/2001 Team3 36
36 1/1/2001 Team4 28
37 1/2/2001 Team4 30
38 1/3/2001 Team4 93
39 1/4/2001 Team4 96
40 1/5/2001 Team4 66
41 1/6/2001 Team4 96
42 1/7/2001 Team4 38
43 1/8/2001 Team4 76
44 1/9/2001 Team4 2
45 1/10/2001 Team4 19
46 1/11/2001 Team4 90
47 1/12/2001 Team4 54
48 1/1/2001 Team5 9
49 1/2/2001 Team5 85
50 1/3/2001 Team5 3
51 1/4/2001 Team5 81
52 1/5/2001 Team5 12
53 1/6/2001 Team5 74
54 1/7/2001 Team5 96
55 1/8/2001 Team5 44
56 1/9/2001 Team5 25
57 1/10/2001 Team5 9
58 1/11/2001 Team5 36
59 1/12/2001 Team5 13
Which you can then merge with your df_a
:
df_a.merge(df_c, how = 'left', left_on = ['DateField','Team'], right_on = ['DateField','variable'])
To get output (not missing values are due to left join as I wanted to preserve df_a
data, since you mentioned adding a column to it):
DateField Team FirstVal SecondVal variable value
0 1/1/2001 Team1 578 32631 Team1 49.0
1 1/1/2001 Team2 570 54718 Team2 44.0
2 1/1/2001 Team3 992 16092 Team3 95.0
3 1/1/2001 Team4 323 96744 Team4 28.0
4 1/1/2001 Team5 591 51103 Team5 9.0
5 2/1/2001 Team1 407 85107
6 2/1/2001 Team2 980 62692
7 2/1/2001 Team3 668 55523
8 2/1/2001 Team4 622 81807
9 2/1/2001 Team5 966 18505
10 3/1/2001 Team1 606 41817
11 3/1/2001 Team2 547 39217
12 3/1/2001 Team3 794 72819
13 3/1/2001 Team4 509 49918
14 3/1/2001 Team5 318 49998
15 4/1/2001 Team1 340 48283
16 4/1/2001 Team2 380 26229
17 4/1/2001 Team3 307 79244
18 4/1/2001 Team4 374 22626
19 4/1/2001 Team5 918 99260