I have a data frame:
    import pandas as pd

    df = pd.DataFrame([[0, 4, 0, 0],
                       [1, 5, 1, 0],
                       [2, 6, 0, 0],
                       [3, 7, 1, 0]], columns=['index', 'A', 'class', 'label'])
df:
index | A | class | label |
---|---|---|---|
0 | 4 | 0 | 0 |
1 | 5 | 1 | 0 |
2 | 6 | 0 | 0 |
3 | 7 | 1 | 0 |
I want to change the label to 1 if the mean of column A over the rows with class 0 is greater than the mean of column A over all rows.
How can I do this in a few lines of code?
I tried this, but it didn't work:
    if df[df['class'] == 0]['A'].mean() > df['A'].mean():
        df[df['class']]['lable'] = 1
Answer
Use pandas.DataFrame.groupby on 'class', take groupby.mean of 'A' for each group, and check whether each group mean is greater than df['A'].mean(). Then use pandas.Series.map to map that boolean series onto df['class'], convert the result with astype(int), and assign it to df['label']:
    >>> df['label'] = df['class'].map(
    ...     df.groupby('class')['A'].mean() > df['A'].mean()
    ... ).astype(int)
    >>> df
       index  A  class  label
    0      0  4      0      0
    1      1  5      1      1
    2      2  6      0      0
    3      3  7      1      1
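To see why this works, it can help to look at the intermediate values. The following is a minimal sketch that rebuilds the frame from the question and computes each step separately; the variable names group_means, overall_mean, and is_above are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame([[0, 4, 0, 0],
                   [1, 5, 1, 0],
                   [2, 6, 0, 0],
                   [3, 7, 1, 0]], columns=['index', 'A', 'class', 'label'])

# Mean of A per class: class 0 -> 5.0, class 1 -> 6.0
group_means = df.groupby('class')['A'].mean()

# Mean of A over all rows: 5.5
overall_mean = df['A'].mean()

# Boolean Series indexed by class: 0 -> False, 1 -> True
is_above = group_means > overall_mean

# map() looks up each row's class in is_above, giving one boolean per row
df['label'] = df['class'].map(is_above).astype(int)

print(df['label'].tolist())  # [0, 1, 0, 1]
```

Because is_above is indexed by the class values, map treats it as a lookup table from class to True/False.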
Since you are checking only for class == 0, you need to add another boolean mask on df['class']:
    >>> df['label'] = (df['class'].map(
    ...     df.groupby('class')['A'].mean() > df['A'].mean()
    ... ) & (~df['class'].astype(bool))
    ... ).astype(int)
    >>> df
       index  A  class  label
    0      0  4      0      0  # because (4+6)/2 < (4+5+6+7)/4
    1      1  5      1      0  # masked out: class != 0
    2      2  6      0      0  # because (4+6)/2 < (4+5+6+7)/4
    3      3  7      1      0  # masked out: class != 0
So even if your code had worked, you would not have noticed, because the condition is not met for this data: the class-0 mean of A, (4+6)/2 = 5, is below the overall mean, (4+5+6+7)/4 = 5.5.
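For completeness, the attempt in the question can also be made to work almost as written. In that attempt, df[df['class']] uses the class values as column labels instead of filtering rows, and 'lable' is a misspelling of 'label'. The sketch below assumes the intent is to set label to 1 only on the class-0 rows; it selects them with a boolean mask and assigns through df.loc, which avoids chained indexing. The name class0 is illustrative.

```python
import pandas as pd

df = pd.DataFrame([[0, 4, 0, 0],
                   [1, 5, 1, 0],
                   [2, 6, 0, 0],
                   [3, 7, 1, 0]], columns=['index', 'A', 'class', 'label'])

# Boolean mask selecting the class-0 rows
class0 = df['class'] == 0

# One condition for the whole frame: class-0 mean of A vs. overall mean of A
if df.loc[class0, 'A'].mean() > df['A'].mean():
    # Assign through .loc so the original frame is modified, not a copy
    df.loc[class0, 'label'] = 1

print(df['label'].tolist())  # [0, 0, 0, 0] because 5.0 is not greater than 5.5
```

With the sample data the labels stay at 0. Changing A in row 0 from 4 to 10, for example, would make the class-0 mean 8 against an overall mean of 7, and the two class-0 rows would then get label 1.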