Function that takes n rows as input and returns co…

I have a large DataFrame structured as follows:

import pandas as pd

df = pd.DataFrame({'name1': [1, 0, 1, 0],
                   'name2': [0, 0, 1, 0],
                   'name3': [1, 0, 1, 0],
                   'namen': [1, 1, 1, 0]},
                  index=['label1', 'label2', 'label3', 'labeln'])
>>> df
        name1  name2  name3  namen
label1      1      0      1      1
label2      0      0      0      1
label3      1      1      1      1
labeln      0      0      0      0

I am trying to build a function that takes n row labels as arguments, sums the values in each column over those rows, and returns the names of the columns whose sum equals n.

For instance, with label1, label2 and label3 as inputs, I would like to obtain the following output:

def common_terms(*nargs):
    ...  # the function body I am trying to write

>>> common_terms('label1', 'label2', 'label3')
(namen)

>>> common_terms('label1', 'label3')
(name1, name3, namen)

I have little experience writing functions in Python and am really stuck on this. Could you help me make progress?

Answer

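Select the requested rows with loc and use all() to test whether every value in each column is 1. Because the data contain only 0 and 1, this is the same as the column sum equaling n. Then use the resulting boolean Series to filter its own index: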

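def common_terms(*nargs):
    # Select the requested rows, then check per column whether every value is truthy (1)
    i = df.loc[list(nargs)].all()
    # i is a boolean Series indexed by column name; keep the names where it is True
    return i.index[i].tolist()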

print(common_terms('label1', 'label2', 'label3'))
['namen']

print(common_terms('label1', 'label3'))
['name1', 'name3', 'namen']
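The all() test above relies on the values being 0 or 1. If the data might contain other values, a variant that follows the question's wording literally (sum each column over the selected rows and compare the sum with n) may be closer to what is wanted. This is a minimal sketch, assuming the example data from the question; common_terms_by_sum is a hypothetical name, and the DataFrame is passed in as an argument instead of being read from a global:

```python
import pandas as pd

# Example data from the question: one row per label, 0/1 values per column.
df = pd.DataFrame({'name1': [1, 0, 1, 0],
                   'name2': [0, 0, 1, 0],
                   'name3': [1, 0, 1, 0],
                   'namen': [1, 1, 1, 0]},
                  index=['label1', 'label2', 'label3', 'labeln'])


def common_terms_by_sum(frame, *labels):
    """Return the columns whose values, summed over the given rows, equal len(labels).

    Hypothetical helper for illustration; frame is passed explicitly.
    """
    # Column-wise sum over the selected rows only.
    sums = frame.loc[list(labels)].sum()
    # Keep the column names whose sum equals n, the number of labels passed.
    return sums[sums == len(labels)].index.tolist()


print(common_terms_by_sum(df, 'label1', 'label2', 'label3'))  # ['namen']
print(common_terms_by_sum(df, 'label1', 'label3'))            # ['name1', 'name3', 'namen']
```

For 0/1 data both versions return the same columns. They differ when, for example, a column holds 2 and 0 for two selected rows: the sum equals n even though not every value is 1. In either version, a label that is not in the index makes loc raise a KeyError.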


Answer