Skip to content
Advertisement

Finding similar rows in two dataframes using pandas

I have two data frames, The first one is the root data frame, second one is obtained from first data frame (which is based on a pattern that “Name” must be repeated 3 times and “Subset” must of the pattern as shown in dataframe 2 below).

Based on these two dataframes, i need to add a “Remark” column to root dataframe and the column writes “Yes” is particular row in dataframe 1 is present in dataframe 2, else it should be blank.

DataFrame1-

Name    Subset            Value
A       67-A-5678          14
A       58-ABC-87555       187
A       45-ASH-87954       5465
T       78-A-8793          4533
T       52-O-5642          5644
S       34-A-8785          454
S       58-ASO-98978       54
S       23-ASH-87895       784
X       98-X-87876         455
X       87-ABC-54578       4545
X       56-ASH-89667       854
Y       09-D-98644         45
Y       87-ABC-78834       98
Y       87-ASH-87455A      4566
L       67-A-87545         78
L       89-GHS-08753       12 
L       78-PHU-09876       655 

DataFrame2-

Name    Subset             Value
A       67-A-5678          14       
A       58-ABC-87555       187      
A       45-ASH-87954       5465     
X       98-X-87876         455
X       87-ABC-54578       4545
X       56-ASH-89667       854
Y       09-D-98644         45
Y       87-ABC-78834       98
Y       87-ASH-87455A      4566

Output Dataframe-

Name    Subset            Value    Remark
A       67-A-5678          14      Yes
A       58-ABC-87555       187     Yes 
A       45-ASH-87954       5465    Yes
T       78-A-8793          4533
T       52-O-5642          5644
S       34-A-8785          454
S       58-ASO-98978       54
S       23-ASH-87895       784
X       98-X-87876         455     Yes
X       87-ABC-54578       4545    Yes
X       56-ASH-89667       854     Yes
Y       09-D-98644         45      Yes
Y       87-ABC-78834       98      Yes
Y       87-ASH-87455A      4566    Yes
L       67-A-87545         78
L       89-GHS-08753       12 
L       78-PHU-09876       655

P.S. Actual dataset can be of many columns/rows.

Advertisement

Answer

You can merge with indicator=True and use the “both” property to get the matching rows:

(df1.merge(df2, on=list(df1.columns), how='left', indicator=True)
    .assign(**{'Remark Added': lambda d: d['_merge'].eq('both').map({True: 'Yes', False: ''})})
    .drop(columns='_merge')
)

NB. to see how this works, comment the .drop(columns='_merge') line to keep the temporary _merge column

output:

   Name         Subset  Value Remark Added
0     A      67-A-5678     14          Yes
1     A   58-ABC-87555    187          Yes
2     A   45-ASH-87954   5465          Yes
3     T      78-A-8793   4533             
4     T      52-O-5642   5644             
5     S      34-A-8785    454             
6     S   58-ASO-98978     54             
7     S   23-ASH-87895    784             
8     X     98-X-87876    455          Yes
9     X   87-ABC-54578   4545          Yes
10    X   56-ASH-89667    854          Yes
11    Y     09-D-98644     45          Yes
12    Y   87-ABC-78834     98          Yes
13    Y  87-ASH-87455A   4566          Yes
14    L     67-A-87545     78             
15    L   89-GHS-08753     12             
16    L   78-PHU-09876    655             
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement