Skip to content
Advertisement

Pandas: return rows that have two matching columns commonality

I am trying to write a commonality script which will return rows in a pandas dataframe that have two matching columns, and also will sum up the number of rows with matches into a new column

OPERATION and MACHINE are the columns to match

Input:

BATCH OPERATION MACHINE DATE
1A 4000 Printer1 01-Jan-22
1A 2000 Fax1 02-Jan-22
1B 4000 Printer2 03-Jan-22
1B 2000 Phone1 04-Jan-22
1C 2000 Phone2 05-Jan-22
1A 2000 Phone2 06-Jan-22

Output: just the last two rows in this example with OPERATION =2000 and MACHINE = Phone2

BATCH OPERATION MACHINE DATE MATCHES
1C 2000 Phone2 05-Jan-22 2
1A 2000 Phone2 06-Jan-22 2

I have tried several merge and duplicated df commands but can’t get it work.

Advertisement

Answer

df[df.groupby(['OPERATION', 'MACHINE'])['BATCH'].transform('count') > 1].assign(MATCHES = df.groupby(['OPERATION', 'MACHINE'])['BATCH'].transform('count'))

  BATCH  OPERATION MACHINE       DATE  MATCHES
4    1C       2000  Phone2  05-Jan-22        2
5    1A       2000  Phone2  06-Jan-22        2

OR

df['MATCHES'] = df.groupby(['OPERATION', 'MACHINE'])['BATCH'].transform('count')
df[df['MATCHES']>1]

Note: one may be faster than other depending on the source dataframe

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement