
List comprehension with multiple conditions on different columns

I have the following df,

import pandas as pd

data = [['Male', 'Agree'], ['Male', 'Agree'], ['Male', 'Disagree'], ['Female', 'Neutral']]

df = pd.DataFrame(data, columns=['Sex', 'Opinion'])
df

I would like to get the total number of males who either Agree or Disagree. I expect the answer to be 3 but instead get 9.

sum([True for x in df['Opinion'] for y in df['Sex'] if x in ['Agree','Disagree'] if y=='Male'])

I have done this through other methods and I’m trying to understand list comprehension better.

Answer

Let’s unpack this a bit. The original statement

total = sum([True for x in df['Opinion'] for y in df['Sex'] if x in ['Agree','Disagree'] if y=='Male'])

is equivalent to

total = 0
for x in df['Opinion']:
    for y in df['Sex']:
        if x in ['Agree', 'Disagree']:
            if y=='Male':
                total += 1

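It should be clear in this case why you get 9: the nested loops pair every opinion with every sex (4 × 4 = 16 combinations) instead of going row by row. Three opinions are Agree or Disagree and three sexes are Male, so 3 × 3 = 9 combinations pass both tests.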
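
To make that count concrete, here is a small sketch that uses plain lists in place of the two DataFrame columns. The names opinions, sexes and pairs are mine, not from the question, and itertools.product is only there to show that two for clauses behave like a cross product.

```python
from itertools import product

# Plain lists standing in for df['Opinion'] and df['Sex'] (same values as the question).
opinions = ['Agree', 'Agree', 'Disagree', 'Neutral']
sexes = ['Male', 'Male', 'Male', 'Female']

# Two `for` clauses in a comprehension visit every (x, y) combination,
# in the same order as itertools.product.
pairs = [(x, y) for x in opinions for y in sexes]
assert pairs == list(product(opinions, sexes))

matching_opinions = sum(x in ['Agree', 'Disagree'] for x in opinions)  # 3
males = sum(y == 'Male' for y in sexes)                                # 3

# Same filter as the original comprehension, applied to the cross product.
nested_total = sum(1 for x, y in pairs if x in ['Agree', 'Disagree'] and y == 'Male')

print(len(pairs), matching_opinions, males, nested_total)  # 16 3 3 9
```

The nested total is simply the product of the two independent counts, which is why it has nothing to do with how the rows line up.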

What you actually want is to consider only corresponding pairs from two equal-sized iterables. Python's built-in zip does just this:

total = 0
for x,y in zip(df['Opinion'], df['Sex']):
    if x in ['Agree', 'Disagree'] and y=='Male':
        total += 1

or as a comprehension (strictly speaking, a generator expression passed to sum)

total = sum(1 for x,y in zip(df['Opinion'], df['Sex']) if x in ['Agree', 'Disagree'] and y=='Male')
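
One caveat on the "equal sized" point: zip stops at the shorter input without raising an error. Two columns of the same DataFrame always have the same length, so this does not affect the question, but the sketch below shows the behaviour with plain lists. The helper name count_matches is hypothetical and only used for illustration.

```python
def count_matches(opinions, sexes):
    """Hypothetical helper: count rows where the opinion is Agree/Disagree and the sex is Male."""
    return sum(1 for x, y in zip(opinions, sexes)
               if x in ('Agree', 'Disagree') and y == 'Male')


# Plain lists with the same values as the question's columns.
opinions = ['Agree', 'Agree', 'Disagree', 'Neutral']
sexes = ['Male', 'Male', 'Male', 'Female']

total = count_matches(opinions, sexes)
print(total)  # 3

# zip stops at the shorter input, so a truncated list silently changes the count.
short_total = count_matches(opinions, sexes[:2])
print(short_total)  # 2
```

On Python 3.10 or later, zip(opinions, sexes, strict=True) raises ValueError when the lengths differ, which can be a safer default when the two sequences come from different sources.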
User contributions licensed under: CC BY-SA