Summarizing a pandas DataFrame by group using a custom function results in wrong output

I have a pandas DataFrame that I want to summarize by group, using a custom function that resolves to a boolean value.

Consider the following data. df describes 4 people, and for each person the fruits they like.

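A minimal sketch of what such a DataFrame could look like. The original data is not shown here, so the people, most of the fruits, and the column name fruit are assumptions made up for illustration. Only the name column and the fruits apricot and apple come from the question.

```python
import pandas as pd

# Hypothetical data: 4 people, one row per (person, fruit) pair.
# The names and the "fruit" column label are invented for this sketch.
df = pd.DataFrame({
    "name": ["john", "john", "john", "danny", "danny", "rachel", "rachel", "sue"],
    "fruit": ["apple", "apricot", "banana", "apple", "kiwi", "apricot", "apple", "banana"],
})

print(df)
```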

I want to summarize this table to find the people who like both apricot and apple. In other words, my desired output is the following table:


My attempt

I first defined a function that checks whether one or more strings exist in a target list:

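The original function body is not shown, so the following is only a guess at its shape: a helper that accepts either a single string or a list of strings and relies on the in operator. Only the function name and the parameter name haystack appear in the post. The needle parameter and the body are assumptions.

```python
def is_needle_in_haystack(needle, haystack):
    """Return True if every needle string is found in haystack."""
    # Accept a single string or a collection of strings (assumed behaviour).
    needles = [needle] if isinstance(needle, str) else list(needle)
    # Relies on `in`, which does a value lookup when haystack is a list.
    return all(item in haystack for item in needles)
```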

Examples showing that is_needle_in_haystack() works:

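The kind of check meant here might look like the sketch below. It repeats an assumed definition of the function so that it runs on its own. The inputs are invented plain Python lists, which is the case where the function behaves as intended.

```python
def is_needle_in_haystack(needle, haystack):
    # Assumed implementation: every needle string must be in haystack.
    needles = [needle] if isinstance(needle, str) else list(needle)
    return all(item in haystack for item in needles)

# With plain lists, `in` compares against the list's values.
single = is_needle_in_haystack("apple", ["apple", "kiwi"])
both = is_needle_in_haystack(["apricot", "apple"], ["apple", "apricot", "banana"])
missing = is_needle_in_haystack(["apricot", "apple"], ["apple", "kiwi"])

print(single, both, missing)  # True True False
```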

Then I used is_needle_in_haystack() while grouping df by name:

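A sketch that reproduces the symptom, assuming the hypothetical data and function shape used for illustration (neither is the original code). Each group is passed to the function as a Series, and with this setup every person comes out as False, even those who like both fruits.

```python
import pandas as pd

def is_needle_in_haystack(needle, haystack):
    # Assumed implementation: uses `in` directly on whatever it is given.
    needles = [needle] if isinstance(needle, str) else list(needle)
    return all(item in haystack for item in needles)

# Hypothetical data; names and the "fruit" column are invented.
df = pd.DataFrame({
    "name": ["john", "john", "john", "danny", "danny", "rachel", "rachel", "sue"],
    "fruit": ["apple", "apricot", "banana", "apple", "kiwi", "apricot", "apple", "banana"],
})

# Each group's "fruit" column reaches the lambda as a pandas Series.
result = df.groupby("name")["fruit"].agg(
    lambda fruits: is_needle_in_haystack(["apricot", "apple"], fruits)
)

print(result)
```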

Then why do I get the following output, which is clearly not as expected?


What have I done wrong in my code?


Answer

The problem is that haystack is a Series when the function is called inside .agg. Change the function to:

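One possible way to make that change is to convert haystack to a plain list of values before using in. This is a sketch under the same assumptions as before (invented data, assumed function body) and is not necessarily the exact code from the answer.

```python
import pandas as pd

def is_needle_in_haystack(needle, haystack):
    needles = [needle] if isinstance(needle, str) else list(needle)
    # Iterating a Series yields its values, so this gives a plain list
    # and `in` becomes a value lookup again. Lists pass through unchanged.
    values = list(haystack)
    return all(item in values for item in needles)

# Hypothetical data; names and the "fruit" column are invented.
df = pd.DataFrame({
    "name": ["john", "john", "john", "danny", "danny", "rachel", "rachel", "sue"],
    "fruit": ["apple", "apricot", "banana", "apple", "kiwi", "apricot", "apple", "banana"],
})

result = df.groupby("name")["fruit"].agg(
    lambda fruits: is_needle_in_haystack(["apricot", "apple"], fruits)
)

print(result)  # john and rachel are True; danny and sue are False
```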

Output


The in operator on a Series checks the index labels, not the values, so it returns False here. For example:

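A small illustration of this behaviour, using a made-up Series. The membership test on the Series itself looks at the index labels (0 and 1 here), while the same test on a list of its values behaves the way the question expected.

```python
import pandas as pd

s = pd.Series(["apple", "apricot"])  # default index labels: 0, 1

in_series = "apple" in s           # looks for "apple" among the index labels
label_in_series = 0 in s           # 0 is an index label
in_values = "apple" in s.tolist()  # looks among the actual values

print(in_series, label_in_series, in_values)  # False True True
```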
User contributions licensed under: CC BY-SA