I have two data frames. I want to add the columns from the second data frame to the first one by matching the column information in data frame 1.
The first data frame: dd
BCS libcs 0 AAA-1 1 1 ABA-1 1 2 ACD-1 1 3 AAT-1 1 4 AAR-1 1 ... ... 188453 TCC-61 61 188454 TTG-61 61 188455 CTG-61 61 188456 ATG-61 61 188457 TTT-61 61 print(dd.shape) (188458, 2)
The second dataframe: df
sample_id DState 0 H20_AGS AG 1 H21_AGS AG 2 H22_BGS BG 3 H23_AGS AG 4 H24_CGS CG .. ... ... 56 H90_DGS DG 57 H91_DGS DG 58 H92_BGS BG 59 H93_BGS BG 60 H94_CGS CG print(df.shape) (61, 2)
There are 61 unique items in column libcs
in dataframe1 and 61 rows in the second data frame. I want to merge based on the matching column information.
Expected Output:
BCS libcs sample_id DState 0 AAA-1 1 H20_AGS AG 1 ABA-1 1 H20_AGS AG 2 ACD-1 1 H20_AGS AG 3 AAT-1 1 H20_AGS AG 4 AAR-1 1 H20_AGS AG ... ... 188453 TCC-61 61 H94_CGS CG 188454 TTG-61 61 H94_CGS CG 188455 CTG-61 61 H94_CGS CG 188456 ATG-61 61 H94_CGS CG 188457 TTT-61 61 H94_CGS CG
I tried the below code but it seems like it is not working
dd.libcs.value_counts() dd.index = dd.groupby(level=0).cumcount() C = dd.join(df).reset_index(drop=True) print (C)
Advertisement
Answer
Use merge
>>> dd.merge(df, left_on='libcs', right_on=(df.index+1).astype(str)) BCS libcs sample_id DState 0 AAA-1 1 H20_AGS AG 1 ABA-1 1 H20_AGS AG 2 ACD-1 1 H20_AGS AG 3 AAT-1 1 H20_AGS AG 4 AAR-1 1 H20_AGS AG 5 TCC-61 61 H94_CGS CG 6 TTG-61 61 H94_CGS CG 7 CTG-61 61 H94_CGS CG 8 ATG-61 61 H94_CGS CG 9 TTT-61 61 H94_CGS CG