Pandas deleting rows based on same string in columns

    Manufacturer           Buy Box Seller
0   Goli                   Goli Nutrition Inc.
1   Hanes                  3rd Street Brands
2   NaN                    Inspiring Life
3   Sports Research        Sports Research
4   Beckham Luxury Linen   Thalestris Co.

Hello, I am using a pandas DataFrame to clean this file and want to delete the rows whose Buy Box Seller column contains the manufacturer's name. For example, the first row (index 0) should be deleted because its Buy Box Seller value contains the string ‘Goli’.

Answer

There are missing values, so first replace them with DataFrame.fillna. Then use DataFrame.apply with axis=1 to test, row by row, whether the Manufacturer value is not in the Buy Box Seller string, and use the resulting mask for boolean indexing:

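# Fill NaN with a placeholder string so the `in` test never sees a float.
mask = (df.fillna('Missing vals')
          .apply(lambda x: x['Manufacturer'] not in x['Buy Box Seller'], axis=1))
# Keep only rows where the manufacturer name is absent from the seller name.
df = df[mask]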
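
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the sample table from the question by hand (the variable names `filled` and `cleaned` are just illustrative, not from the original answer) and applies the same fillna / apply / boolean-indexing steps. Note the `in` test is a case-sensitive substring match.

```python
import numpy as np
import pandas as pd

# Sample data typed in from the question's table.
df = pd.DataFrame({
    "Manufacturer": ["Goli", "Hanes", np.nan,
                     "Sports Research", "Beckham Luxury Linen"],
    "Buy Box Seller": ["Goli Nutrition Inc.", "3rd Street Brands",
                       "Inspiring Life", "Sports Research", "Thalestris Co."],
})

# Replace NaN with a placeholder so every cell is a string.
filled = df.fillna("Missing vals")

# True where the manufacturer name does NOT occur inside the seller name.
mask = filled.apply(
    lambda row: row["Manufacturer"] not in row["Buy Box Seller"], axis=1
)

# Index the original frame so the kept rows still show NaN, not the placeholder.
cleaned = df[mask]
print(cleaned)
```

With this sample data, the rows at index 0 (Goli) and 3 (Sports Research) are dropped and rows 1, 2 and 4 remain.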
User contributions licensed under: CC BY-SA