
How to assign a value to a column for a subset of dataframe based on a condition in Pandas?

I have a data frame:


df:

index  A  class  label
0      4  0      0
1      5  1      0
2      6  0      0
3      7  1      0

I want to set label to 1 for the rows with class 0 if the mean of column A over those rows is greater than the mean of all values in column A.

How can I do this in a few lines of code?

I tried this, but it didn't work:



Answer

Group by 'class' with pandas.DataFrame.groupby, take the mean of 'A' in each group, and check whether it is greater than df['A'].mean(). Then use pandas.Series.map to map that boolean series onto df['class'], convert the result with astype(int), and assign it to df['label'].
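A minimal sketch of that approach, reconstructed from the description above rather than copied from the answer. The sample frame is rebuilt from the question, and the name `group_above` is an assumption introduced here for readability.

```python
import pandas as pd

# Sample frame rebuilt from the question
df = pd.DataFrame({
    "A": [4, 5, 6, 7],
    "class": [0, 1, 0, 1],
    "label": [0, 0, 0, 0],
})

# One boolean per class: is the group's mean of A above the overall mean of A?
group_above = df.groupby("class")["A"].mean() > df["A"].mean()

# Map each row's class to its group's boolean, then convert True/False to 1/0
df["label"] = df["class"].map(group_above).astype(int)

print(df)
```

Note that this version labels every class whose group mean exceeds the overall mean, not only class 0.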


Since you are checking only for class == 0, you need to combine the result with another boolean mask on df['class'].
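A sketch of how the extra mask could be combined with the mapped result, again reconstructed from the description and not necessarily the answer's exact code. The names `group_above` and `is_class_0` are assumptions introduced for illustration.

```python
import pandas as pd

# Sample frame rebuilt from the question
df = pd.DataFrame({
    "A": [4, 5, 6, 7],
    "class": [0, 1, 0, 1],
    "label": [0, 0, 0, 0],
})

# One boolean per class: is the group's mean of A above the overall mean of A?
group_above = df.groupby("class")["A"].mean() > df["A"].mean()

# Restrict the assignment to rows whose class is 0
is_class_0 = df["class"] == 0

# A row gets label 1 only if it is in class 0 AND its group mean beats the overall mean
df["label"] = (df["class"].map(group_above).astype(bool) & is_class_0).astype(int)

print(df)
```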


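So even if your code had worked, you would not have noticed, because the condition is not fulfilled for this sample data: the mean of A for class 0 is 5.0, which is not greater than the overall mean of 5.5, so every label stays 0.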

User contributions licensed under: CC BY-SA