I have a pandas dataframe and I need to select the rows where all the columns are equal to 0.00. How can I do that?
Answer
First, we’ll set up some example data to work on. Here there are 3 columns, but your data could have more or fewer.

    import pandas as pd

    data = [
        {"A": 1, "B": 1, "C": 1},
        {"A": 0, "B": 1, "C": 1},
        {"A": 1, "B": 0, "C": 1},
        {"A": 0, "B": 0, "C": 1},
        {"A": 1, "B": 1, "C": 0},
        {"A": 0, "B": 1, "C": 0},
        {"A": 1, "B": 0, "C": 0},
        {"A": 0, "B": 0, "C": 0},
    ]
    df = pd.DataFrame(data)

Next, define a function that accepts a row from a dataframe and performs some test on it; in this case, it checks that all values in the row are equal to zero.

    def all_zero(row):
        # True only when every value in the row equals zero
        return all(v == 0 for v in row.values)

Once the test function is defined, apply it to the dataframe along axis 1 (row by row):
df.apply(all_zero, axis=1)
This returns a boolean series showing, for each row, whether the test passed or failed.

    0    False
    1    False
    2    False
    3    False
    4    False
    5    False
    6    False
    7     True
    dtype: bool

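As an aside, the same kind of boolean series can usually be built without a Python-level function, by comparing the whole dataframe to zero and then requiring `True` across each row. The sketch below is an alternative to the `apply` approach, not a replacement the answer depends on; the `demo` frame and the `mask` name are made up for illustration.

```python
import pandas as pd

# Small illustrative frame (hypothetical data, not the frame used in the answer).
demo = pd.DataFrame({"A": [1, 0, 0], "B": [0, 0, 0], "C": [1, 1, 0]})

# Compare every cell to zero, then require True in every column of a row (axis=1).
mask = (demo == 0).all(axis=1)
print(mask)
```

On large frames this form tends to be faster than `apply`, because the comparison runs on whole columns instead of calling a Python function once per row.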
You can use this boolean series as a row selector (or mask, as some people call it) to retrieve only those rows for which the test returns True.
df[df.apply(all_zero, axis=1)]
index | A | B | C |
---|---|---|---|
7 | 0 | 0 | 0 |
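Since the question mentions values equal to 0.00, the columns may well be floats, and an exact `== 0` test can miss values that are only approximately zero after arithmetic. A minimal sketch of a tolerance-based version follows, assuming that "close enough to zero" is acceptable for your data; the `floats` frame, the `near_zero` name, and the `atol=1e-9` threshold are illustrative choices, not something from the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical float data: row 1 holds a tiny non-zero value left by rounding.
floats = pd.DataFrame({"A": [0.0, 1e-12, 0.5], "B": [0.0, 0.0, 0.0]})

# np.isclose against 0.0 reduces to abs(value) <= atol; check it for every column.
near_zero = np.isclose(floats.to_numpy(), 0.0, atol=1e-9).all(axis=1)

# Use the boolean array as a mask, just like the boolean series above.
selected = floats[near_zero]
print(selected)
```

Pick the tolerance to match the scale of your data; if your values are exact (integers, or floats that were never computed), the plain equality test is enough.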