Skip to content
Advertisement

Comparing Two dataframes of pandas on the basis of condition of two columns

I have two dataframe in which columns are different, And I need to search and the map the data in new file. I am sharing the dataframes and also desired output

DF1

    Ref.X               Ref.Y            Ref.No
    255623              145244651        XF1234
    45454544            NA               DF7886
    1300256655          3511614646       DF8948
    5546446166          NA               VX4578
    1230001556          NA               GL8947
    4548754545          4548545877       DF7896
    3652656             NA               XF4895
    7884045455          45681247         GL8597
    148428              1242154661       XF3564

DF2

    Type      STR1            STR2
    ABJH      45656644        145244651
    ABJH      844654989       1242154661
    BHJH      51654241654     3511614646
    BHJH      1230001556      255565656
    ABJH      5546446166      8946549849
    ABJH      45454544        544895655
    BHJH      3652656         565464447

Now I need to a output from above two dataframes.

Conditions

  1. Need to search Ref.Y in STR2, if available then pick the “Type” for output.
  2. if Ref.Y is “NA” then need to search Ref.X in STR1, if available then pick the “Type” for output.
  3. Also I need “Ref.No” in desired output correspond to the “Type” Ref.X or Ref.Y found in DF2

Desired Output

    Ref.X               Ref.Y            Ref.No     Int_Type
    255623              145244651        XF1234     ABJH
    45454544            NA               DF7886     ABJH
    1300256655          3511614646       DF8948     BHJH
    5546446166          NA               VX4578     ABJH
    1230001556          NA               GL8947     BHJH
    3652656             NA               XF4895     BHJH
    148428              1242154661       XF3564     ABJH

Thanks.

Advertisement

Answer

You can try repace NA string to <NA> type then use np.where

df1 = df1.replace({'NA': pd.NA})

df1['Int_Type'] = np.where(df1['Ref.Y'].isna(),
                       df1['Ref.X'].map(dict(zip(df2['STR1'], df2['Type']))),
                       df1['Ref.Y'].map(dict(zip(df2['STR2'], df2['Type']))))

Or you can directly compare with NA string:

df1['Int_Type'] = np.where(df1['Ref.Y'].eq('NA'),
        Ref.X         Ref.Y  Ref.No  Int_Type
0      255623  1.452447e+08  XF1234  ABJH
1      148428  1.242155e+09  XF3564  ABJH
2  1300256655  3.511615e+09  DF8948  BHJH
3  1230001556           NaN  GL8947  BHJH
4  5546446166           NaN  VX4578  ABJH
5    45454544           NaN  DF7886  ABJH
6     3652656           NaN  XF4895  BHJH
7  4548754545  4.548546e+09  DF7896   NaN
8  7884045455  4.568125e+07  GL8597   NaN

To get Ref.No correspond to the “Type” Ref.X or Ref.Y found in DF2, you can drop the NAN column in Int_Type

df1 = df1.dropna(subset='Int_Type')
print(df1)

        Ref.X         Ref.Y  Ref.No Int_Type
0      255623  1.452447e+08  XF1234     ABJH
1      148428  1.242155e+09  XF3564     ABJH
2  1300256655  3.511615e+09  DF8948     BHJH
3  1230001556           NaN  GL8947     BHJH
4  5546446166           NaN  VX4578     ABJH
5    45454544           NaN  DF7886     ABJH
6     3652656           NaN  XF4895     BHJH
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement