I have two data frames. I want to add the columns from the second data frame to the first one by matching the column information in data frame 1.
The first data frame: dd
BCS libcs
0 AAA-1 1
1 ABA-1 1
2 ACD-1 1
3 AAT-1 1
4 AAR-1 1
... ...
188453 TCC-61 61
188454 TTG-61 61
188455 CTG-61 61
188456 ATG-61 61
188457 TTT-61 61
print(dd.shape)
(188458, 2)
The second dataframe: df
sample_id DState 0 H20_AGS AG 1 H21_AGS AG 2 H22_BGS BG 3 H23_AGS AG 4 H24_CGS CG .. ... ... 56 H90_DGS DG 57 H91_DGS DG 58 H92_BGS BG 59 H93_BGS BG 60 H94_CGS CG print(df.shape) (61, 2)
There are 61 unique items in column libcs in dataframe1 and 61 rows in the second data frame. I want to merge based on the matching column information.
Expected Output:
BCS libcs sample_id DState
0 AAA-1 1 H20_AGS AG
1 ABA-1 1 H20_AGS AG
2 ACD-1 1 H20_AGS AG
3 AAT-1 1 H20_AGS AG
4 AAR-1 1 H20_AGS AG
... ...
188453 TCC-61 61 H94_CGS CG
188454 TTG-61 61 H94_CGS CG
188455 CTG-61 61 H94_CGS CG
188456 ATG-61 61 H94_CGS CG
188457 TTT-61 61 H94_CGS CG
I tried the below code but it seems like it is not working
dd.libcs.value_counts() dd.index = dd.groupby(level=0).cumcount() C = dd.join(df).reset_index(drop=True) print (C)
Advertisement
Answer
Use merge
>>> dd.merge(df, left_on='libcs', right_on=(df.index+1).astype(str))
BCS libcs sample_id DState
0 AAA-1 1 H20_AGS AG
1 ABA-1 1 H20_AGS AG
2 ACD-1 1 H20_AGS AG
3 AAT-1 1 H20_AGS AG
4 AAR-1 1 H20_AGS AG
5 TCC-61 61 H94_CGS CG
6 TTG-61 61 H94_CGS CG
7 CTG-61 61 H94_CGS CG
8 ATG-61 61 H94_CGS CG
9 TTT-61 61 H94_CGS CG