How can I find rows in a Pandas DataFrame where the sum of 2 rows is greater than some value?

Question

In a dataset like the one below, I'm trying to group the rows by attr_1 and attr_2, and if the sum of the count column exceeds a threshold (in this case 100), I want to keep the original rows.

    account attr_1 attr_2 count
    ABC     X1     Y1     25
    DEF     X1     Y1     100
    ABC     X2     Y2     150
    DEF     X2     Y2     0
    ABC

Accepted Answer

You can use groupby + filter. The lambda passed to filter provides a scalar condition for each group (min_count is the threshold, 100 in the question):

    df.groupby(['attr_1', 'attr_2']).filter(lambda g: g['count'].sum() >= min_count)

      account attr_1 attr_2  count
    0     ABC     X1     Y1     25
    1     DEF     X1     Y1    100
    2     ABC     X2     Y2    150
    3     DEF     X2     Y2      0

Or use groupby + transform to create a filter condition that's compatible with the original data frame:

    df[df.groupby(['attr_1', 'attr_2'])['count'].transform('sum').ge(min_count)]

      account attr_1 attr_2  count
    0     ABC     X1     Y1     25
    1     DEF     X1     Y1    100
    2     ABC     X2     Y2    150
    3     DEF     X2     Y2      0
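For anyone who wants to run both approaches end to end, here is a minimal, self-contained sketch. The first four rows are the ones shown in the question. The two X3/Y3 rows are hypothetical, added only so that one group falls below the threshold and the filtering becomes visible. The names kept_filter, kept_transform and group_sum are illustrative, not from the original answer.

```python
import pandas as pd

# Rows 0-3 come from the question; rows 4-5 (X3/Y3) are hypothetical,
# added only so that one group sums to less than the threshold.
df = pd.DataFrame({
    "account": ["ABC", "DEF", "ABC", "DEF", "ABC", "DEF"],
    "attr_1":  ["X1", "X1", "X2", "X2", "X3", "X3"],
    "attr_2":  ["Y1", "Y1", "Y2", "Y2", "Y3", "Y3"],
    "count":   [25, 100, 150, 0, 10, 20],
})
min_count = 100  # threshold stated in the question

# Approach 1: filter keeps every row of each group for which
# the lambda returns True.
kept_filter = df.groupby(["attr_1", "attr_2"]).filter(
    lambda g: g["count"].sum() >= min_count
)

# Approach 2: transform broadcasts each group's sum back onto its rows,
# giving a Series aligned with df that can be used as a boolean mask.
group_sum = df.groupby(["attr_1", "attr_2"])["count"].transform("sum")
kept_transform = df[group_sum.ge(min_count)]

print(kept_transform)
```

Both variants should keep the X1/Y1 and X2/Y2 rows and drop the hypothetical X3/Y3 group. The transform version avoids calling a Python lambda once per group, which may matter when there are many groups.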

Answer