In a setup similar to this:
>>> import pandas as pd
>>> from random import randint
>>> df = pd.DataFrame({'A': [randint(1, 9) for x in range(10)],
'B': [randint(1, 9)*10 for x in range(10)],
'C': [randint(1, 9)*100 for x in range(10)]})
>>> df
A B C
0 9 80 900
1 9 70 700
2 5 70 900
3 8 80 900
4 7 50 200
5 9 30 900
6 2 80 900
7 2 80 400
8 5 80 300
9 7 80 900
My question is how to get ALL the rows in the dataframe with the same values on a certain set of columns ( let’s say for example {B,C} ) of an other specified row ( for example row with index 3)
I want this (index 3, set {B,C}):
A B C 0 9 80 900 3 8 80 900 # this is the rows specified 6 2 80 900 9 7 80 900
The problem now is that in my case my set of columns ({B,C}) is composed of more than 200 columns, and i can’t find a way to generate such a long condition. For the problem you can assume the column are enumerated from 0 to n.
Advertisement
Answer
You can have the subset of columns as a list and get the values at the given index for the subset of columns using .loc accessor, then check for equalities and call all for axis=1, finally get the resulting dataframe for this masking.
>>> cols = ['B', 'C'] >>> index = 3 >>> df[(df[cols]==df.loc[index, cols]).all(1)]
OUTPUT:
A B C 0 9 80 900 3 8 80 900 6 2 80 900 9 7 80 900