I have the following df,
import pandas as pd

data = [['Male', 'Agree'], ['Male', 'Agree'], ['Male', 'Disagree'], ['Female', 'Neutral']]
df = pd.DataFrame(data, columns=['Sex', 'Opinion'])
df
and would like to get the total number of Males who either Agree or Disagree. I expect the answer to be 3 but instead get 9.
sum([True for x in df['Opinion'] for y in df['Sex'] if x in ['Agree','Disagree'] if y=='Male'])
I have done this through other methods and I’m trying to understand list comprehension better.
Answer
Let’s unpack this a bit. The original statement
total = sum([True for x in df['Opinion'] for y in df['Sex'] if x in ['Agree','Disagree'] if y=='Male'])
is equivalent to
total = 0
for x in df['Opinion']:
    for y in df['Sex']:
        if x in ['Agree', 'Disagree']:
            if y == 'Male':
                total += 1
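Written this way, it should be clear why you get 9: the nested loops pair every opinion with every sex (4 × 4 = 16 combinations), so each of the 3 Agree/Disagree opinions is counted once for each of the 3 Male entries, giving 3 × 3 = 9.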
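To make the pairing visible, here is a minimal sketch that uses plain Python lists in place of the DataFrame columns. The names `opinions`, `sexes`, `nested_pairs`, and `matches` are assumptions for illustration and hold the same values as the question's data. It checks that two `for` clauses in one comprehension walk the full Cartesian product.

```python
from itertools import product

# Plain lists standing in for df['Opinion'] and df['Sex'] (illustrative assumption)
opinions = ['Agree', 'Agree', 'Disagree', 'Neutral']
sexes = ['Male', 'Male', 'Male', 'Female']

# Two `for` clauses visit every (x, y) combination, not row-wise pairs
nested_pairs = [(x, y) for x in opinions for y in sexes]
assert nested_pairs == list(product(opinions, sexes))

# Keep only the combinations that pass both filters
matches = [(x, y) for x, y in nested_pairs
           if x in ['Agree', 'Disagree'] and y == 'Male']

print(len(nested_pairs))  # 16
print(len(matches))       # 9
```

Each match here combines an opinion from one row with a sex from any row, which is why the count exceeds the number of rows.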
What you actually want is to consider only corresponding pairs from two equal-sized iterables. Python's handy zip built-in does just this:
total = 0
for x, y in zip(df['Opinion'], df['Sex']):
    if x in ['Agree', 'Disagree'] and y == 'Male':
        total += 1
or, as a comprehension-style generator expression passed to sum:
total = sum(1 for x, y in zip(df['Opinion'], df['Sex']) if x in ['Agree', 'Disagree'] and y == 'Male')
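As a quick sanity check, the same zip-based count can be run on plain lists. This is only a sketch: `opinions`, `sexes`, and `rows` are illustrative names standing in for the DataFrame columns, holding the same values as the question's data.

```python
# Plain lists standing in for df['Opinion'] and df['Sex'] (illustrative assumption)
opinions = ['Agree', 'Agree', 'Disagree', 'Neutral']
sexes = ['Male', 'Male', 'Male', 'Female']

# zip yields one (opinion, sex) tuple per row position
rows = list(zip(opinions, sexes))
print(rows)
# [('Agree', 'Male'), ('Agree', 'Male'), ('Disagree', 'Male'), ('Neutral', 'Female')]

# Count rows where both conditions hold for the same row
total = sum(1 for x, y in rows if x in ('Agree', 'Disagree') and y == 'Male')
print(total)  # 3
```

Note that zip stops at the end of the shorter input, so it only pairs rows correctly when both iterables have the same length, as two columns of one DataFrame do.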