I have the following df,
import pandas as pd
data = [['Male', 'Agree'], ['Male', 'Agree'], ['Male', 'Disagree'], ['Female', 'Neutral']]
df = pd.DataFrame(data, columns=['Sex', 'Opinion'])
df
and would like to count the rows where Sex is Male and Opinion is either Agree or Disagree. I expect the answer to be 3 but get 9 instead.
sum([True for x in df['Opinion'] for y in df['Sex'] if x in ['Agree','Disagree'] if y=='Male'])
I have already solved this with other methods; I am trying to understand list comprehensions better.
Answer
Let’s unpack this a bit. The original statement
total = sum([True for x in df['Opinion'] for y in df['Sex'] if x in ['Agree','Disagree'] if y=='Male'])
is equivalent to
total = 0
for x in df['Opinion']:
    for y in df['Sex']:
        if x in ['Agree', 'Disagree']:
            if y=='Male':
                total += 1
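Written this way, it should be clear why you get 9. The two loops are nested, so every Opinion value is paired with every Sex value, not just the one from its own row. Three of the four opinions are Agree or Disagree and three of the four sexes are Male, so 3 × 3 = 9 of the 16 combinations pass both tests.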
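To see where the 9 comes from, here is a minimal sketch that uses plain lists in place of the DataFrame columns. The names opinions, sexes, all_pairs and matching are illustrative and not from the original code. itertools.product generates the same pairs that the two nested loops visit.

```python
from itertools import product

# Plain lists standing in for df['Opinion'] and df['Sex'] (assumed stand-ins).
opinions = ['Agree', 'Agree', 'Disagree', 'Neutral']
sexes = ['Male', 'Male', 'Male', 'Female']

# Nested loops visit every (opinion, sex) combination, i.e. the Cartesian product.
all_pairs = list(product(opinions, sexes))

# Keep the combinations that pass both conditions from the comprehension.
matching = [(x, y) for x, y in all_pairs
            if x in ['Agree', 'Disagree'] and y == 'Male']

print(len(all_pairs))  # 4 opinions x 4 sexes
print(len(matching))   # 3 matching opinions x 3 matching sexes
```

The first count is 16 and the second is 9, which matches the unexpected result in the question.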
What you actually want is to consider only corresponding pairs, i.e. the Opinion and Sex values that come from the same row. Python's built-in zip does exactly this: it walks two equal-length iterables in step and yields one pair per position,
total = 0
for x,y in zip(df['Opinion'], df['Sex']):
    if x in ['Agree', 'Disagree'] and y=='Male':
        total += 1
or as a comprehension (here a generator expression passed straight to sum)
total = sum(1 for x,y in zip(df['Opinion'], df['Sex']) if x in ['Agree', 'Disagree'] and y=='Male')
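As a quick check of the zip approach, the sketch below again uses plain lists as stand-ins for the two columns (opinions, sexes and pairs are illustrative names, not part of the original code). It first materialises what zip yields, then counts the pairs that satisfy both conditions.

```python
# Plain lists standing in for df['Opinion'] and df['Sex'] (assumed stand-ins).
opinions = ['Agree', 'Agree', 'Disagree', 'Neutral']
sexes = ['Male', 'Male', 'Male', 'Female']

# zip pairs the values position by position: one tuple per row.
pairs = list(zip(opinions, sexes))
print(pairs)

# Count the rows where both conditions hold for the same row.
total = sum(1 for x, y in pairs if x in ['Agree', 'Disagree'] and y == 'Male')
print(total)
```

Because each opinion is only ever compared with the sex from the same position, there are four pairs rather than sixteen, and the count comes out as 3.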
 
						