Skip to content
Advertisement

Pivot and merge two pandas dataframes

I have two dataframes (taken from pd.to_clipboard(), suggest using pd.read_clipboard())

df_a:

    DateField   Team    FirstVal    SecondVal
0   1/1/2001    Team1   578 32631
1   1/1/2001    Team2   570 54718
2   1/1/2001    Team3   992 16092
3   1/1/2001    Team4   323 96744
4   1/1/2001    Team5   591 51103
5   2/1/2001    Team1   407 85107
6   2/1/2001    Team2   980 62692
7   2/1/2001    Team3   668 55523
8   2/1/2001    Team4   622 81807
9   2/1/2001    Team5   966 18505
10  3/1/2001    Team1   606 41817
11  3/1/2001    Team2   547 39217
12  3/1/2001    Team3   794 72819
13  3/1/2001    Team4   509 49918
14  3/1/2001    Team5   318 49998
15  4/1/2001    Team1   340 48283
16  4/1/2001    Team2   380 26229
17  4/1/2001    Team3   307 79244
18  4/1/2001    Team4   374 22626
19  4/1/2001    Team5   918 99260

and df_b:

    DateField   Team1   Team2   Team3   Team4   Team5
0   1/1/2001    49  44  95  28  9
1   1/2/2001    72  37  3   30  85
2   1/3/2001    47  6   53  93  3
3   1/4/2001    15  96  11  96  81
4   1/5/2001    14  95  11  66  12
5   1/6/2001    29  62  19  96  74
6   1/7/2001    36  10  25  38  96
7   1/8/2001    24  39  1   76  44
8   1/9/2001    9   3   32  2   25
9   1/10/2001   25  75  52  19  9
10  1/11/2001   17  67  61  90  36
11  1/12/2001   15  17  36  54  13

What I am looking to do is add a third column to df_a, say ThirdVal, which contains the value in df_b where the DateField and Team align. My issue is that df_b is transposed and formatted awry compared to df_a.

I have looked into pd.pivot() but have been unsuccessful in pivoting the dataframe to make it joinable, but I am not sure what to fill in for the columns argument. Should I be looking into a different function?

Advertisement

Answer

What you’re looking for is not pivot, but melt.

df_c = df_b.melt(id_vars = 'DateField', value_vars = ['Team1', 'Team2','Team3','Team4','Team5'])

Or, for a more general approach, where all columns not specified in id_vars will be used as value_vars:

df_c = df_b.melt(id_vars = 'DateField')

Gives you a DataFrame looking like this:

    DateField   variable    value
0   1/1/2001    Team1   49
1   1/2/2001    Team1   72
2   1/3/2001    Team1   47
3   1/4/2001    Team1   15
4   1/5/2001    Team1   14
5   1/6/2001    Team1   29
6   1/7/2001    Team1   36
7   1/8/2001    Team1   24
8   1/9/2001    Team1   9
9   1/10/2001   Team1   25
10  1/11/2001   Team1   17
11  1/12/2001   Team1   15
12  1/1/2001    Team2   44
13  1/2/2001    Team2   37
14  1/3/2001    Team2   6
15  1/4/2001    Team2   96
16  1/5/2001    Team2   95
17  1/6/2001    Team2   62
18  1/7/2001    Team2   10
19  1/8/2001    Team2   39
20  1/9/2001    Team2   3
21  1/10/2001   Team2   75
22  1/11/2001   Team2   67
23  1/12/2001   Team2   17
24  1/1/2001    Team3   95
25  1/2/2001    Team3   3
26  1/3/2001    Team3   53
27  1/4/2001    Team3   11
28  1/5/2001    Team3   11
29  1/6/2001    Team3   19
30  1/7/2001    Team3   25
31  1/8/2001    Team3   1
32  1/9/2001    Team3   32
33  1/10/2001   Team3   52
34  1/11/2001   Team3   61
35  1/12/2001   Team3   36
36  1/1/2001    Team4   28
37  1/2/2001    Team4   30
38  1/3/2001    Team4   93
39  1/4/2001    Team4   96
40  1/5/2001    Team4   66
41  1/6/2001    Team4   96
42  1/7/2001    Team4   38
43  1/8/2001    Team4   76
44  1/9/2001    Team4   2
45  1/10/2001   Team4   19
46  1/11/2001   Team4   90
47  1/12/2001   Team4   54
48  1/1/2001    Team5   9
49  1/2/2001    Team5   85
50  1/3/2001    Team5   3
51  1/4/2001    Team5   81
52  1/5/2001    Team5   12
53  1/6/2001    Team5   74
54  1/7/2001    Team5   96
55  1/8/2001    Team5   44
56  1/9/2001    Team5   25
57  1/10/2001   Team5   9
58  1/11/2001   Team5   36
59  1/12/2001   Team5   13

Which you can then merge with your df_a:

df_a.merge(df_c, how = 'left', left_on = ['DateField','Team'], right_on = ['DateField','variable'])

To get output (not missing values are due to left join as I wanted to preserve df_a data, since you mentioned adding a column to it):

    DateField   Team    FirstVal    SecondVal   variable    value
0   1/1/2001    Team1   578 32631   Team1   49.0
1   1/1/2001    Team2   570 54718   Team2   44.0
2   1/1/2001    Team3   992 16092   Team3   95.0
3   1/1/2001    Team4   323 96744   Team4   28.0
4   1/1/2001    Team5   591 51103   Team5   9.0
5   2/1/2001    Team1   407 85107       
6   2/1/2001    Team2   980 62692       
7   2/1/2001    Team3   668 55523       
8   2/1/2001    Team4   622 81807       
9   2/1/2001    Team5   966 18505       
10  3/1/2001    Team1   606 41817       
11  3/1/2001    Team2   547 39217       
12  3/1/2001    Team3   794 72819       
13  3/1/2001    Team4   509 49918       
14  3/1/2001    Team5   318 49998       
15  4/1/2001    Team1   340 48283       
16  4/1/2001    Team2   380 26229       
17  4/1/2001    Team3   307 79244       
18  4/1/2001    Team4   374 22626       
19  4/1/2001    Team5   918 99260       
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement