I have a large DataFrame structured as follows:
    import pandas as pd

    df = pd.DataFrame({'name1': [1, 0, 1, 1],
                       'name2': [0, 0, 0, 1],
                       'name3': [1, 1, 1, 1],
                       'namen': [0, 0, 0, 0]},
                      index=['label1', 'label2', 'label3', 'labeln'])

    >>> df
            name1  name2  name3  namen
    label1      1      0      1      0
    label2      0      0      1      0
    label3      1      0      1      0
    labeln      1      1      1      0
I am trying to build a function that takes n row names as arguments, sums the values in each column over those rows, and returns the names of the columns whose sum equals n.
For instance, using label1, label2 and label3 as inputs I would like to obtain the following output:
    def common_terms(*nargs):
        ...  # the function

    >>> common_terms('label1', 'label2', 'label3')
    (name3)
or
    >>> common_terms('label1', 'label3')
    (name1, name3)
I have little experience writing functions in Python and am really stuck on this. Could you kindly help me make progress?
Answer
Select the rows with loc, test with all() whether every value in each column is 1, then use the resulting boolean Series to filter its own index:
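    def common_terms(*nargs):
        # Select the requested rows; all() is True for columns that are 1 in every one of them
        i = df.loc[list(nargs)].all()
        # Keep only the column names flagged True
        return i.index[i].tolist()

    print(common_terms('label1', 'label2', 'label3'))
    ['name3']

    print(common_terms('label1', 'label3'))
    ['name1', 'name3']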
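The question phrases the rule as "the column sum equals n", so a literal version of that rule may also be useful. The following is only a minimal sketch: the name common_terms_by_sum and the choice to pass the DataFrame in as the first argument are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Same data as constructed in the question.
df = pd.DataFrame(
    {'name1': [1, 0, 1, 1],
     'name2': [0, 0, 0, 1],
     'name3': [1, 1, 1, 1],
     'namen': [0, 0, 0, 0]},
    index=['label1', 'label2', 'label3', 'labeln'],
)

def common_terms_by_sum(frame, *labels):
    # Hypothetical helper: applies the "sum equals n" rule literally.
    # Sum each column over the selected rows only.
    sums = frame.loc[list(labels)].sum()
    # A column qualifies when its sum equals the number of selected rows.
    mask = sums == len(labels)
    return mask.index[mask].tolist()

print(common_terms_by_sum(df, 'label1', 'label2', 'label3'))  # ['name3']
print(common_terms_by_sum(df, 'label1', 'label3'))            # ['name1', 'name3']
```

For data that contains only 0 and 1, this gives the same result as the all() approach. The two can differ if other values appear, since all() only checks that every value is non-zero, while the sum can reach n through other combinations of values.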