I have the following df,
import pandas as pd
data = [['Male', 'Agree'], ['Male', 'Agree'], ['Male', 'Disagree'], ['Female', 'Neutral']]
df = pd.DataFrame(data, columns=['Sex', 'Opinion'])
df
and would like to count the rows where Sex is Male and Opinion is either Agree or Disagree. I expect the answer to be 3 but get 9 instead.
sum([True for x in df['Opinion'] for y in df['Sex'] if x in ['Agree', 'Disagree'] if y == 'Male'])
I have done this through other methods and I’m trying to understand list comprehension better.
Answer
Let’s unpack this a bit. The original statement
total = sum([True for x in df['Opinion'] for y in df['Sex'] if x in ['Agree', 'Disagree'] if y == 'Male'])
is equivalent to
total = 0
for x in df['Opinion']:
    for y in df['Sex']:
        if x in ['Agree', 'Disagree']:
            if y == 'Male':
                total += 1
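I think it should be clear in this case why you get 9: the two loops are nested, so each of the 3 opinions that is Agree or Disagree gets paired with every one of the 3 Male entries, and 3 × 3 = 9.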
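To see where the 9 comes from, here is a small sketch that counts each factor on its own and compares the product with the nested comprehension. The variable names (matching_opinions, males, nested_total) are mine, not from the question.

```python
import pandas as pd

data = [['Male', 'Agree'], ['Male', 'Agree'], ['Male', 'Disagree'], ['Female', 'Neutral']]
df = pd.DataFrame(data, columns=['Sex', 'Opinion'])

# Count each condition independently, ignoring which row it came from.
matching_opinions = sum(1 for x in df['Opinion'] if x in ['Agree', 'Disagree'])
males = sum(1 for y in df['Sex'] if y == 'Male')

# The nested comprehension visits every (opinion, sex) combination,
# not just the pairs that share a row.
nested_total = sum(
    1
    for x in df['Opinion']
    for y in df['Sex']
    if x in ['Agree', 'Disagree']
    if y == 'Male'
)

print(matching_opinions, males, nested_total)
```

The nested count equals the product of the two independent counts, which is the sign that rows are being combined rather than matched.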
What you actually want is to consider only corresponding pairs from two equal-sized iterables. Python’s handy zip built-in does just this:
total = 0
for x, y in zip(df['Opinion'], df['Sex']):
    if x in ['Agree', 'Disagree'] and y == 'Male':
        total += 1
or as a comprehension
total = sum(1 for x,y in zip(df['Opinion'], df['Sex']) if x in ['Agree', 'Disagree'] and y=='Male')
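As a sanity check, the zip-based count can be compared against a pandas boolean mask, which is probably close to one of the "other methods" mentioned in the question (that part is my assumption). The names zip_total, mask and mask_total are only for illustration.

```python
import pandas as pd

data = [['Male', 'Agree'], ['Male', 'Agree'], ['Male', 'Disagree'], ['Female', 'Neutral']]
df = pd.DataFrame(data, columns=['Sex', 'Opinion'])

# Row-wise pairing with zip, as in the comprehension above.
zip_total = sum(
    1
    for x, y in zip(df['Opinion'], df['Sex'])
    if x in ['Agree', 'Disagree'] and y == 'Male'
)

# Same condition as a boolean mask; True counts as 1 when summed.
mask = df['Opinion'].isin(['Agree', 'Disagree']) & (df['Sex'] == 'Male')
mask_total = int(mask.sum())

print(zip_total, mask_total)
```

Both approaches test the two conditions on the same row, so they should agree; the mask version is usually the more idiomatic choice on a large DataFrame, while the comprehension is the better exercise for understanding zip.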