In a setup similar to this:
>>> import pandas as pd
>>> from random import randint
>>> df = pd.DataFrame({'A': [randint(1, 9) for x in range(10)],
...                    'B': [randint(1, 9)*10 for x in range(10)],
...                    'C': [randint(1, 9)*100 for x in range(10)]})
>>> df
   A   B    C
0  9  80  900
1  9  70  700
2  5  70  900
3  8  80  900
4  7  50  200
5  9  30  900
6  2  80  900
7  2  80  400
8  5  80  300
9  7  80  900
My question is how to get all the rows in the DataFrame that have the same values as another specified row (for example, the row with index 3) on a certain set of columns (for example, {B, C}).
I want this (index 3, set {B,C}):
   A   B    C
0  9  80  900
3  8  80  900  # this is the specified row
6  2  80  900
9  7  80  900
The problem is that in my case the set of columns ({B, C} above) is composed of more than 200 columns, and I can't find a way to generate such a long condition. For this problem, you can assume the columns are numbered from 0 to n.
Answer
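You can keep the subset of columns as a list and get the values at the given index for that subset using the .loc accessor. Then compare the subset of the DataFrame against those values for equality and call all with axis=1, which gives a boolean mask that is True only for rows matching on every column. Finally, index the DataFrame with that mask to get the result.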
>>> cols = ['B', 'C']
>>> index = 3
>>> df[(df[cols] == df.loc[index, cols]).all(axis=1)]
OUTPUT:
   A   B    C
0  9  80  900
3  8  80  900
6  2  80  900
9  7  80  900
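For the case with more than 200 columns, the same mask can be built from a column list generated programmatically. The following is a minimal sketch, not a definitive implementation: the helper name rows_matching, the wide test frame, and the column range 20 to 229 are made up for illustration, and the sample data from the question is hard-coded so the result is reproducible.

```python
import numpy as np
import pandas as pd


def rows_matching(df, index, cols):
    """Return the rows of df equal to row `index` on every column in `cols`.

    Hypothetical helper wrapping the one-liner from the answer above.
    """
    reference = df.loc[index, cols]              # Series indexed by column label
    mask = (df[cols] == reference).all(axis=1)   # True where every column matches
    return df[mask]


# The sample data from the question, hard-coded instead of random.
df = pd.DataFrame({
    'A': [9, 9, 5, 8, 7, 9, 2, 2, 5, 7],
    'B': [80, 70, 70, 80, 50, 30, 80, 80, 80, 80],
    'C': [900, 700, 900, 900, 200, 900, 900, 400, 300, 900],
})
small = rows_matching(df, 3, ['B', 'C'])
print(small)

# A wide frame whose columns are labelled 0..249 (assumed layout).
rng = np.random.default_rng(0)
wide = pd.DataFrame(rng.integers(0, 2, size=(50, 250)))
wide.loc[10] = wide.loc[3]            # force row 10 to duplicate row 3

# Build the long column list instead of typing it out.
wide_cols = list(range(20, 230))      # 210 columns
matches = rows_matching(wide, 3, wide_cols)
print(matches.index.tolist())
```

Because the comparison is vectorized over the whole column list, no long hand-written condition is needed. Note that NaN never compares equal to NaN, so if the reference row holds NaN in any of the selected columns, this mask matches no rows at all.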