In a setup similar to this:
>>> import pandas as pd >>> from random import randint >>> df = pd.DataFrame({'A': [randint(1, 9) for x in range(10)], 'B': [randint(1, 9)*10 for x in range(10)], 'C': [randint(1, 9)*100 for x in range(10)]}) >>> df A B C 0 9 80 900 1 9 70 700 2 5 70 900 3 8 80 900 4 7 50 200 5 9 30 900 6 2 80 900 7 2 80 400 8 5 80 300 9 7 80 900
My question is how to get ALL the rows in the dataframe with the same values on a certain set of columns ( let’s say for example {B,C} ) of an other specified row ( for example row with index 3)
I want this (index 3, set {B,C}):
A B C 0 9 80 900 3 8 80 900 # this is the rows specified 6 2 80 900 9 7 80 900
The problem now is that in my case my set of columns ({B,C}) is composed of more than 200 columns, and i can’t find a way to generate such a long condition. For the problem you can assume the column are enumerated from 0 to n.
Advertisement
Answer
You can have the subset of columns as a list and get the values at the given index for the subset of columns using .loc
accessor, then check for equalities and call all
for axis=1
, finally get the resulting dataframe for this masking.
>>> cols = ['B', 'C'] >>> index = 3 >>> df[(df[cols]==df.loc[index, cols]).all(1)]
OUTPUT:
A B C 0 9 80 900 3 8 80 900 6 2 80 900 9 7 80 900