How to assign a value to a column for a subset of dataframe based on a condition in Pandas?

Question

I have a data frame df:

index  A  class  label
0      4  0      0
1      5  1      0
2      6  0      0
3      7  1      0

I want to change the label to 1 if the mean of column A over the rows with class 0 is greater than the mean of all values in column A. How can I do this?

Accepted Answer

Group by 'class' with pandas.DataFrame.groupby, take the groupby.mean of 'A' for each group, and check whether it is greater than df['A'].mean(). Then use pandas.Series.map to map that boolean Series onto df['class'], convert it with astype(int), and assign the result to df['label']:

>>> df['label'] = df['class'].map(
...     df.groupby('class')['A'].mean() > df['A'].mean()
... ).astype(int)
>>> df
   index  A  class  label
0      0  4      0      0
1      1  5      1      1
2      2  6      0      0
3      3  7      1      1

Since you are checking only for class == 0, you need to add another boolean mask on df['class']:

>>> df['label'] = (
...     df['class'].map(df.groupby('class')['A'].mean() > df['A'].mean())
...     & (~df['class'].astype(bool))
... ).astype(int)
>>> df
   index  A  class  label
0      0  4      0      0  # because (4+6)/2 < (4+5+6+7)/4
1      1  5      1      0  # class 1, excluded by the mask
2      2  6      0      0  # because (4+6)/2 < (4+5+6+7)/4
3      3  7      1      0  # class 1, excluded by the mask

So even if your code had worked, you would not have noticed, because the condition is not fulfilled for this data: the mean of A for class 0 is 5, which is below the overall mean of 5.5.
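For reference, here is a minimal self-contained sketch of the approach in this answer, so both variants can be run side by side. The variable names (overall_mean, class_means, above, label_all, label_class0, is_class0) are illustrative and not from the original answer, the extra 'index' column is left out for brevity, and the final .loc form is an alternative way to express the class 0 check that the answer does not use:

```python
import pandas as pd

# Sample data from the question (the extra 'index' column is omitted here).
df = pd.DataFrame({"A": [4, 5, 6, 7], "class": [0, 1, 0, 1], "label": [0, 0, 0, 0]})

overall_mean = df["A"].mean()                  # mean of all values in A
class_means = df.groupby("class")["A"].mean()  # one mean of A per class

# For every row: does the mean of its class exceed the overall mean?
above = df["class"].map(class_means > overall_mean)

# Variant 1: flag rows of any class whose mean exceeds the overall mean.
label_all = above.astype(int)

# Variant 2: apply the check to class 0 rows only.
label_class0 = (above & df["class"].eq(0)).astype(int)

# Alternative (an assumption, not part of the answer): evaluate the
# class 0 condition once as a scalar and assign with .loc.
is_class0 = df["class"] == 0
if df.loc[is_class0, "A"].mean() > overall_mean:
    df.loc[is_class0, "label"] = 1

print(label_all.tolist())
print(label_class0.tolist())
print(df["label"].tolist())
```

With the sample data, the class 0 mean is below the overall mean, so variant 2 and the .loc form leave every label at 0. Changing the values in A so that the class 0 rows have the larger mean is a quick way to confirm that the assignment actually fires.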
