I need to join the below dataframe based on some condition.
df1: Id Value [101,102,103] 10001 [101,102,104] 10000 [101,102,105] 10002 [101,107,105] 10003 df2: Id Product_Name [101,102,103,104] Shoe [101,102,109,104] jeans [101,105,102,108] make-up [101,105,106,118] shirt
df_output
Id Value Product_Name [101,102,103] 10001 Shoe -- Every value present in df2.Id list [101,102,104] 10000 Jeans -- Every value present in df2.Id list [101,102,105] 10002 Make-Up -- Every value present in df2.Id list [101,107,105] 10003 NaN -- Not Every Value matches in df2.Id list.
I need to join two dataframe df1, df2 based on Id column but every element should be in df.Id list that’s when we consider it a match.
Python Code: import pandas as pd df_output = pd.merge(df1, df2, on='Id', how='left')
Advertisement
Answer
While this isn’t a highly efficient solution, you can use some set
s to solve this problem.
matches = df1["Id"].apply(set) <= df2["Id"].apply(set) out = df1.copy() out.loc[matches, df2.columns.difference(["Id"])] = df2 print(out) Id Value Product_Name 0 [101, 102, 103] 10001 Shoe 1 [101, 102, 104] 10000 jeans 2 [101, 102, 105] 10002 make-up 3 [101, 107, 105] 10003 NaN
In the above snippet:
matches = df1["Id"].apply(set) <= df2["Id"].apply(set)
returns a booleanSeries
that is True where the contents of each row in df1[‘Id’] is in the corresponding row in df2[‘Id’], and False otherwise- Instead of performing an actual
merge
we can simply align the 2 DataFrames on the aforementioned booleanSeries
If you want to test Ids against eachother in both dataframes, you can take the cartesian product of both DataFrames, filter it down to the inner join via the set criteria, and then append back any missing left join keys.
out = ( pd.merge(df1, df2, how="cross") .loc[lambda df: df["Id_x"].map(set) <= df["Id_y"].map(set)] .pipe( lambda df: df.append( df1.loc[~df1["Id"].isin(df["Id_x"])].rename(columns={"Id": "Id_x"}) ) ) .reset_index(drop=True) ) print(out) Id_x Value Id_y Product_Name 0 [101, 102, 103] 10001 [101, 102, 103, 104] Shoe 1 [101, 102, 104] 10000 [101, 102, 103, 104] Shoe 2 [101, 102, 104] 10000 [101, 102, 109, 104] jeans 3 [101, 102, 105] 10002 [101, 105, 102, 108] make-up 4 [101, 107, 105] 10003 NaN NaN