I have a data frame:
    import pandas as pd
    df = pd.DataFrame([[0,4,0,0], [1,5,1,0], [2,6,0,0], [3,7,1,0]], columns=['index', 'A', 'class', 'label'])
df:
index | A | class | label |
---|---|---|---|
0 | 4 | 0 | 0 |
1 | 5 | 1 | 0 |
2 | 6 | 0 | 0 |
3 | 7 | 1 | 0 |
I want to change the label to 1 if the mean of column A over the rows with class 0 is greater than the mean of all values in column A.
How can I do this in a few lines of code?
I tried this, but it didn't work:
    if df[df['class'] == 0]['A'].mean() > df['A'].mean():
        df[df['class']]['lable'] = 1
Answer
Use pandas.DataFrame.groupby on 'class', take the groupby.mean of 'A' for each group, check whether it is greater than df['A'].mean(), then use pandas.Series.map to map that boolean series onto df['class'], convert it with astype(int), and assign the result to df['label']:
    >>> df['label'] = df['class'].map(
    ...     df.groupby('class')['A'].mean() > df['A'].mean()
    ... ).astype(int)
    >>> df
       index  A  class  label
    0      0  4      0      0
    1      1  5      1      1
    2      2  6      0      0
    3      3  7      1      1
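To see why only the class 1 rows end up with label 1 here, it can help to split the one-liner into its intermediate steps. This is a minimal sketch assuming the same df as in the question; the names class_means, overall_mean and above are only illustrative and do not come from the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 4, 0, 0], [1, 5, 1, 0], [2, 6, 0, 0], [3, 7, 1, 0]],
    columns=['index', 'A', 'class', 'label'],
)

# Mean of A per class: class 0 -> (4 + 6) / 2, class 1 -> (5 + 7) / 2
class_means = df.groupby('class')['A'].mean()

# Mean of A over all rows: (4 + 5 + 6 + 7) / 4
overall_mean = df['A'].mean()

# Boolean Series indexed by class value: is that class's mean above the overall mean?
above = class_means > overall_mean

# Look up each row's class in `above`, then convert True/False to 1/0
df['label'] = df['class'].map(above).astype(int)
```

Printing class_means, overall_mean and above one at a time shows which classes pass the comparison before anything is written to df['label'].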
Since you are checking only for class == 0, you need to add another boolean mask on df['class']:
    >>> df['label'] = (
    ...     df['class'].map(df.groupby('class')['A'].mean() > df['A'].mean())
    ...     & (~df['class'].astype(bool))
    ... ).astype(int)
    >>> df
       index  A  class  label
    0      0  4      0      0  # class 0, but (4+6)/2 < (4+5+6+7)/4
    1      1  5      1      0  # class 1, excluded by the mask
    2      2  6      0      0  # class 0, but (4+6)/2 < (4+5+6+7)/4
    3      3  7      1      0  # class 1, excluded by the mask
So even if your code had worked, you would not have noticed, because the condition is not fulfilled for this data: the mean of A for class 0 is 5, which is below the overall mean of 5.5.
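Separately from the data, the attempt in the question has problems of its own: 'lable' is a typo for 'label', df[df['class']] tries to select columns named 0 and 1 rather than rows, and assigning through chained indexing is not a reliable way to write back into the original frame. A minimal sketch of the same if-based idea using a boolean mask with df.loc, assuming the intent is to set the label only on the class 0 rows (is_class0 is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 4, 0, 0], [1, 5, 1, 0], [2, 6, 0, 0], [3, 7, 1, 0]],
    columns=['index', 'A', 'class', 'label'],
)

# Boolean mask selecting the rows whose class is 0
is_class0 = df['class'] == 0

# Compare the class 0 mean of A with the mean of A over all rows
if df.loc[is_class0, 'A'].mean() > df['A'].mean():
    # Single .loc assignment writes directly into df (no chained indexing)
    df.loc[is_class0, 'label'] = 1
```

With the sample data the condition is false (5 is not greater than 5.5), so every label stays 0.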