I have this DataFrame. I want to group the ages into ranges 1-5, 6-10, 11-15, etc., and replace every value in a range with the mean of that range.
      Name  Age
    0    x    5
    1    y    7
    2    z    2
    3    p    9
    4    q   12
    5    r    6
    6    s    5
    7    t    1
    8    u   13
    9    v   10
Now I want to add a column ageGroup that contains the mean age of each range. For example, the ages that fall in the range 1-5 are 5, 2, 5 and 1, so each of those rows gets (5+2+5+1) // 4 = 3. Similarly, the range 6-10 gives (7+9+6+10) // 4 = 8, and the range 11-15 gives (12+13) // 2 = 12.
So the expected output is:

      Name  Age  ageGroup
    0    x    5         3
    1    y    7         8
    2    z    2         3
    3    p    9         8
    4    q   12        12
    5    r    6         8
    6    s    5         3
    7    t    1         3
    8    u   13        12
    9    v   10         8
Answer
You can use pd.cut to bin the ages, then group by those bins with groupby and use transform to write each bin's mean back to its rows:
    max_age = 15
    step = 5
    # Bin edges 0, 5, 10, 15 give the intervals (0, 5], (5, 10], (10, 15].
    df['ageGroup'] = df.groupby(pd.cut(df['Age'], range(0, max_age + step, step)))['Age'].transform('mean').round()
    print(df)

      Name  Age  ageGroup
    0    x    5       3.0
    1    y    7       8.0
    2    z    2       3.0
    3    p    9       8.0
    4    q   12      12.0
    5    r    6       8.0
    6    s    5       3.0
    7    t    1       3.0
    8    u   13      12.0
    9    v   10       8.0
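The question computes the group value with floor division (//) and expects integers, while the snippet above rounds and returns floats. A minimal self-contained sketch of a variant that floors the mean instead is shown below. It rebuilds the sample data from the question; deriving max_age from the data, the observed=True argument, and the helper names bins and group_mean are choices made here for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": list("xyzpqrstuv"),
    "Age": [5, 7, 2, 9, 12, 6, 5, 1, 13, 10],
})

step = 5
# Assumption: derive the upper edge from the data instead of hard-coding it.
# This is the smallest multiple of `step` that covers the largest age (15 here).
max_age = int(np.ceil(df["Age"].max() / step) * step)

# Edges 0, 5, 10, 15 produce right-closed intervals (0, 5], (5, 10], (10, 15].
bins = pd.cut(df["Age"], range(0, max_age + step, step))

# transform("mean") returns one value per original row, aligned with df's index.
group_mean = df.groupby(bins, observed=True)["Age"].transform("mean")

# Floor the mean to mirror the `//` used in the question, then store as integers.
df["ageGroup"] = np.floor(group_mean).astype(int)

print(df)
```

Flooring and rounding agree on this data, but they differ in general: a group mean of 3.75 would round to 4 and floor to 3. Note also that an age of 0 falls outside the first interval (0, 5], so this sketch assumes all ages are at least 1, as in the question.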