I have two dataframes (taken from pd.to_clipboard()
, suggest using pd.read_clipboard()
)
df_a
:
DateField Team FirstVal SecondVal 0 1/1/2001 Team1 578 32631 1 1/1/2001 Team2 570 54718 2 1/1/2001 Team3 992 16092 3 1/1/2001 Team4 323 96744 4 1/1/2001 Team5 591 51103 5 2/1/2001 Team1 407 85107 6 2/1/2001 Team2 980 62692 7 2/1/2001 Team3 668 55523 8 2/1/2001 Team4 622 81807 9 2/1/2001 Team5 966 18505 10 3/1/2001 Team1 606 41817 11 3/1/2001 Team2 547 39217 12 3/1/2001 Team3 794 72819 13 3/1/2001 Team4 509 49918 14 3/1/2001 Team5 318 49998 15 4/1/2001 Team1 340 48283 16 4/1/2001 Team2 380 26229 17 4/1/2001 Team3 307 79244 18 4/1/2001 Team4 374 22626 19 4/1/2001 Team5 918 99260
and df_b
:
DateField Team1 Team2 Team3 Team4 Team5 0 1/1/2001 49 44 95 28 9 1 1/2/2001 72 37 3 30 85 2 1/3/2001 47 6 53 93 3 3 1/4/2001 15 96 11 96 81 4 1/5/2001 14 95 11 66 12 5 1/6/2001 29 62 19 96 74 6 1/7/2001 36 10 25 38 96 7 1/8/2001 24 39 1 76 44 8 1/9/2001 9 3 32 2 25 9 1/10/2001 25 75 52 19 9 10 1/11/2001 17 67 61 90 36 11 1/12/2001 15 17 36 54 13
What I am looking to do is add a third column to df_a
, say ThirdVal
, which contains the value in df_b
where the DateField
and Team
align. My issue is that df_b
is transposed and formatted awry compared to df_a
.
I have looked into pd.pivot()
but have been unsuccessful in pivoting the dataframe to make it joinable, but I am not sure what to fill in for the columns
argument. Should I be looking into a different function?
Advertisement
Answer
What you’re looking for is not pivot
, but melt
.
df_c = df_b.melt(id_vars = 'DateField', value_vars = ['Team1', 'Team2','Team3','Team4','Team5'])
Or, for a more general approach, where all columns not specified in id_vars
will be used as value_vars
:
df_c = df_b.melt(id_vars = 'DateField')
Gives you a DataFrame looking like this:
DateField variable value 0 1/1/2001 Team1 49 1 1/2/2001 Team1 72 2 1/3/2001 Team1 47 3 1/4/2001 Team1 15 4 1/5/2001 Team1 14 5 1/6/2001 Team1 29 6 1/7/2001 Team1 36 7 1/8/2001 Team1 24 8 1/9/2001 Team1 9 9 1/10/2001 Team1 25 10 1/11/2001 Team1 17 11 1/12/2001 Team1 15 12 1/1/2001 Team2 44 13 1/2/2001 Team2 37 14 1/3/2001 Team2 6 15 1/4/2001 Team2 96 16 1/5/2001 Team2 95 17 1/6/2001 Team2 62 18 1/7/2001 Team2 10 19 1/8/2001 Team2 39 20 1/9/2001 Team2 3 21 1/10/2001 Team2 75 22 1/11/2001 Team2 67 23 1/12/2001 Team2 17 24 1/1/2001 Team3 95 25 1/2/2001 Team3 3 26 1/3/2001 Team3 53 27 1/4/2001 Team3 11 28 1/5/2001 Team3 11 29 1/6/2001 Team3 19 30 1/7/2001 Team3 25 31 1/8/2001 Team3 1 32 1/9/2001 Team3 32 33 1/10/2001 Team3 52 34 1/11/2001 Team3 61 35 1/12/2001 Team3 36 36 1/1/2001 Team4 28 37 1/2/2001 Team4 30 38 1/3/2001 Team4 93 39 1/4/2001 Team4 96 40 1/5/2001 Team4 66 41 1/6/2001 Team4 96 42 1/7/2001 Team4 38 43 1/8/2001 Team4 76 44 1/9/2001 Team4 2 45 1/10/2001 Team4 19 46 1/11/2001 Team4 90 47 1/12/2001 Team4 54 48 1/1/2001 Team5 9 49 1/2/2001 Team5 85 50 1/3/2001 Team5 3 51 1/4/2001 Team5 81 52 1/5/2001 Team5 12 53 1/6/2001 Team5 74 54 1/7/2001 Team5 96 55 1/8/2001 Team5 44 56 1/9/2001 Team5 25 57 1/10/2001 Team5 9 58 1/11/2001 Team5 36 59 1/12/2001 Team5 13
Which you can then merge with your df_a
:
df_a.merge(df_c, how = 'left', left_on = ['DateField','Team'], right_on = ['DateField','variable'])
To get output (not missing values are due to left join as I wanted to preserve df_a
data, since you mentioned adding a column to it):
DateField Team FirstVal SecondVal variable value 0 1/1/2001 Team1 578 32631 Team1 49.0 1 1/1/2001 Team2 570 54718 Team2 44.0 2 1/1/2001 Team3 992 16092 Team3 95.0 3 1/1/2001 Team4 323 96744 Team4 28.0 4 1/1/2001 Team5 591 51103 Team5 9.0 5 2/1/2001 Team1 407 85107 6 2/1/2001 Team2 980 62692 7 2/1/2001 Team3 668 55523 8 2/1/2001 Team4 622 81807 9 2/1/2001 Team5 966 18505 10 3/1/2001 Team1 606 41817 11 3/1/2001 Team2 547 39217 12 3/1/2001 Team3 794 72819 13 3/1/2001 Team4 509 49918 14 3/1/2001 Team5 318 49998 15 4/1/2001 Team1 340 48283 16 4/1/2001 Team2 380 26229 17 4/1/2001 Team3 307 79244 18 4/1/2001 Team4 374 22626 19 4/1/2001 Team5 918 99260