Compare two dataframe column values and join with condition in python?

Question

I need to join the two DataFrames below, df1 and df2, on the Id column to produce df_output. Each Id value is a list, and a row counts as a match only when every element of the Id list in df1 is also in the Id list in df2.

Accepted Answer

While this isn't a highly efficient solution, you can use some sets to solve this problem.

    matches = df1["Id"].apply(set) <= df2["Id"].apply(set)
    out = df1.copy()
    out.loc[matches, df2.columns.difference(["Id"])] = df2
    print(out)

                    Id  Value Product_Name
    0  [101, 102, 103]  10001         Shoe
    1  [101, 102, 104]  10000        jeans
    2  [101, 102, 105]  10002      make-up
    3  [101, 107, 105]  10003          NaN

In the above snippet:

- matches = df1["Id"].apply(set) <= df2["Id"].apply(set) returns a boolean Series that is True where every element of a row in df1["Id"] is in the corresponding row of df2["Id"], and False otherwise.
- Instead of performing an actual merge, we can simply align the two DataFrames on that boolean Series.

If you want to test Ids against each other across both DataFrames, rather than row by row, you can take the cartesian product of both DataFrames, filter it down to the inner join via the set criteria, and then append back any missing left join keys.

    import pandas as pd

    out = (
        pd.merge(df1, df2, how="cross")
        # keep only the pairs where Id_x is a subset of Id_y
        .loc[lambda df: df["Id_x"].map(set) <= df["Id_y"].map(set)]
        .pipe(
            # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
            lambda df: pd.concat(
                [
                    df,
                    df1.loc[~df1["Id"].isin(df["Id_x"])].rename(columns={"Id": "Id_x"}),
                ]
            )
        )
        .reset_index(drop=True)
    )
    print(out)

                  Id_x  Value                  Id_y Product_Name
    0  [101, 102, 103]  10001  [101, 102, 103, 104]         Shoe
    1  [101, 102, 104]  10000  [101, 102, 103, 104]         Shoe
    2  [101, 102, 104]  10000  [101, 102, 109, 104]        jeans
    3  [101, 102, 105]  10002  [101, 105, 102, 108]      make-up
    4  [101, 107, 105]  10003                   NaN          NaN
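For anyone who wants to try the cross-join idea end to end, here is a minimal self-contained sketch. The sample frames are an assumption, reconstructed from the printed output in this answer rather than taken from the question, and the helper name subset_left_join and the temporary _left_pos column are hypothetical names introduced only for illustration. It tracks unmatched left rows by position instead of comparing the list-valued Id column directly, since lists are not hashable.

```python
import pandas as pd

# Sample data reconstructed from the printed output above (an assumption;
# the question's original frames may differ).
df1 = pd.DataFrame({
    "Id": [[101, 102, 103], [101, 102, 104], [101, 102, 105], [101, 107, 105]],
    "Value": [10001, 10000, 10002, 10003],
})
df2 = pd.DataFrame({
    "Id": [[101, 102, 103, 104], [101, 102, 109, 104], [101, 105, 102, 108]],
    "Product_Name": ["Shoe", "jeans", "make-up"],
})


def subset_left_join(left, right, key="Id"):
    """Left join where a pair matches if set(left[key]) <= set(right[key])."""
    # Record each left row's position so unmatched rows can be found later
    # without hashing the list-valued key column.
    left = left.reset_index(drop=True).rename_axis("_left_pos").reset_index()

    # Cartesian product of both frames (requires pandas >= 1.2).
    pairs = left.merge(right, how="cross", suffixes=("_x", "_y"))

    # Subset test for every (left, right) pair.
    mask = pd.Series(
        [set(a) <= set(b) for a, b in zip(pairs[f"{key}_x"], pairs[f"{key}_y"])],
        index=pairs.index,
        dtype=bool,
    )
    matched = pairs.loc[mask]

    # Left rows that matched nothing are added back with NaN on the right side.
    unmatched = left.loc[~left["_left_pos"].isin(matched["_left_pos"])]
    unmatched = unmatched.rename(columns={key: f"{key}_x"})

    out = pd.concat([matched, unmatched], ignore_index=True)
    return (
        out.sort_values("_left_pos", kind="stable")
        .drop(columns="_left_pos")
        .reset_index(drop=True)
    )


out = subset_left_join(df1, df2)
print(out)
```

With these sample frames the result should contain the same five rows as the second printed table above. The cross join builds len(df1) * len(df2) candidate pairs, so this stays practical only for small to medium inputs.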
