
Function that takes n rows as input and returns column names if sum in column equals n

I have a large DataFrame structured as follows:

import pandas as pd

# Each inner list is one row (label); columns are the names.
df = pd.DataFrame([[1, 0, 1, 1],
                   [0, 0, 0, 1],
                   [1, 1, 1, 1],
                   [0, 0, 0, 0]],
                  index=['label1', 'label2', 'label3', 'labeln'],
                  columns=['name1', 'name2', 'name3', 'namen'])
>>> df
        name1  name2  name3  namen
label1      1      0      1      1
label2      0      0      0      1
label3      1      1      1      1
labeln      0      0      0      0

I am trying to build a function that takes n row names as arguments, sums the values in each column over those rows, and returns the names of the columns whose sum equals n.

For instance, using label1, label2 and label3 as inputs I would like to obtain the following output:

def common_terms(*nargs):
    ...  # the function body I am missing

>>> common_terms('label1', 'label2', 'label3')
(namen)

or

>>> common_terms('label1', 'label3')
(name1, name3, namen)

I have little experience writing functions in Python and am really stuck on this. Could you kindly help me make progress?


Answer

Select the requested rows with loc, test with all() whether every value in each column is 1, then use the resulting boolean Series to filter its own index:

def common_terms(*nargs):
   i = df.loc[list(nargs)].all()
   return i.index[i].tolist()

print(common_terms('label1', 'label2', 'label3'))
['namen']

print(common_terms('label1', 'label3'))
['name1', 'name3', 'namen']
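
For comparison, here is a minimal sketch that follows the wording of the question literally: it sums each column over the selected rows and keeps the columns whose sum equals the number of rows passed in. The name common_terms_sum and the explicit frame parameter are illustrative assumptions, not part of the original answer, and the DataFrame is rebuilt here from the table printed in the question. On 0/1 data this should give the same result as the all() approach; on data with other values the two can differ.

```python
import pandas as pd

# The 0/1 table printed in the question: rows are labels, columns are names.
df = pd.DataFrame([[1, 0, 1, 1],
                   [0, 0, 0, 1],
                   [1, 1, 1, 1],
                   [0, 0, 0, 0]],
                  index=['label1', 'label2', 'label3', 'labeln'],
                  columns=['name1', 'name2', 'name3', 'namen'])

def common_terms_sum(frame, *labels):
    """Return the columns whose sum over the given rows equals len(labels)."""
    selected = frame.loc[list(labels)]   # keep only the requested rows
    totals = selected.sum()              # column-wise sum over those rows
    mask = totals == len(labels)         # True where the sum equals n
    return totals.index[mask].tolist()

print(common_terms_sum(df, 'label1', 'label2', 'label3'))  # ['namen']
print(common_terms_sum(df, 'label1', 'label3'))            # ['name1', 'name3', 'namen']
```

Passing the DataFrame as an argument, rather than reading a global df, keeps the function reusable with other tables.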
User contributions licensed under: CC BY-SA