Skip to content
Advertisement

How to get all the rows with the same values on a certain set of columns of an other specified row in Pandas?

In a setup similar to this:

>>> import pandas as pd
>>> from random import randint
>>> df = pd.DataFrame({'A': [randint(1, 9) for x in range(10)],
                   'B': [randint(1, 9)*10 for x in range(10)],
                   'C': [randint(1, 9)*100 for x in range(10)]})
>>> df
   A   B    C
0  9  80  900
1  9  70  700
2  5  70  900
3  8  80  900
4  7  50  200
5  9  30  900
6  2  80  900
7  2  80  400
8  5  80  300
9  7  80  900

My question is how to get ALL the rows in the dataframe with the same values on a certain set of columns ( let’s say for example {B,C} ) of an other specified row ( for example row with index 3)

I want this (index 3, set {B,C}):

   A   B    C
0  9  80  900
3  8  80  900 # this is the rows specified
6  2  80  900
9  7  80  900

The problem now is that in my case my set of columns ({B,C}) is composed of more than 200 columns, and i can’t find a way to generate such a long condition. For the problem you can assume the column are enumerated from 0 to n.

Advertisement

Answer

You can have the subset of columns as a list and get the values at the given index for the subset of columns using .loc accessor, then check for equalities and call all for axis=1, finally get the resulting dataframe for this masking.

>>> cols = ['B', 'C']
>>> index = 3
>>> df[(df[cols]==df.loc[index, cols]).all(1)]

OUTPUT:

   A   B    C
0  9  80  900
3  8  80  900
6  2  80  900
9  7  80  900
User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement