I have a DataFrame:
df = pd.DataFrame([["A",1,98,56,61], ["B",1,99,54,36], ["C",1,97,32,83],["B",1,96,31,90], ["C",1,45,32,12], ["A",1,67,33,55], ["C",1,54,65,73], ["A",1,34,84,98], ["B",1,76,12,99]], columns=["id","date","c1","c2","c3"])
I need to calculate the z-score for columns “c1”, “c2”, and “c3”, grouped by “id”, and get the result back in the shape of the original DataFrame, without using a loop.
Expected output:
df_out = pd.DataFrame([["A",1,1.21179,-0.079921,-0.543442], ["B",1,0.84893,1.26172,-1.401826], ["C",1,1.395551,-0.707107,0.860437],["B",1,0.55507,-0.077644,0.539164], ["C",1,-0.89609,-0.707107,-1.402194], ["A",1,0.025511,-1.182827,-0.858988], ["C",1,-0.49946,1.414214,0.541757], ["A",1,-1.237301,1.262748,1.40243], ["B",1,-1.404,-1.184075,0.862662]], columns=["id","date","c1","c2","c3"])
How can I do this?
Answer
Use GroupBy.transform with DataFrame.join:
from scipy.stats import zscore

# Grouping by both 'id' and 'date' keeps 'date' out of the transformed columns;
# join() then reattaches the z-scores to the key columns by index.
df = df[['id','date']].join(df.groupby(['id','date']).transform(zscore))
print(df)

  id  date        c1        c2        c3
0  A     1  1.211790 -0.079921 -0.543442
1  B     1  0.848930  1.261720 -1.401826
2  C     1  1.395551 -0.707107  0.860437
3  B     1  0.555070 -0.077644  0.539164
4  C     1 -0.896090 -0.707107 -1.402194
5  A     1  0.025511 -1.182827 -0.858988
6  C     1 -0.499460  1.414214  0.541757
7  A     1 -1.237301  1.262748  1.402430
8  B     1 -1.404000 -1.184075  0.862662
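If "date" can take more than one value, grouping by both "id" and "date" no longer gives the same groups as grouping by "id" alone. As a rough sketch of an alternative, assuming only pandas is available, the z-score can be computed per "id" on just the value columns and assigned back. The names `cols`, `zscore_population`, and `out` are illustrative and not part of the original answer; `ddof=0` is used because that is the default of `scipy.stats.zscore`, whereas pandas' `std` defaults to `ddof=1`.

```python
import pandas as pd

df = pd.DataFrame(
    [["A", 1, 98, 56, 61], ["B", 1, 99, 54, 36], ["C", 1, 97, 32, 83],
     ["B", 1, 96, 31, 90], ["C", 1, 45, 32, 12], ["A", 1, 67, 33, 55],
     ["C", 1, 54, 65, 73], ["A", 1, 34, 84, 98], ["B", 1, 76, 12, 99]],
    columns=["id", "date", "c1", "c2", "c3"],
)

# Columns to standardize (illustrative name, not from the original answer)
cols = ["c1", "c2", "c3"]

def zscore_population(s):
    # Population z-score (ddof=0), matching the default of scipy.stats.zscore
    return (s - s.mean()) / s.std(ddof=0)

out = df.copy()
# transform() returns a frame aligned with the original index,
# so it can be assigned straight back onto the selected columns
out[cols] = df.groupby("id")[cols].transform(zscore_population)
print(out.round(6))
```

For the sample data this should give the same values as the `scipy` version, since "date" is constant here. The "id" and "date" columns are left untouched.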