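Here is some sample code:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 2), columns=list('AB'))
# Attempt: this stores the Rolling object itself in column C, not the window values
df['C'] = df.B.rolling(window=3)
```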
Output:
```
          A         B                                       C
0 -0.108897  1.877987  Rolling [window=3,center=False,axis=0]
1 -1.276055 -0.424382  Rolling [window=3,center=False,axis=0]
2  1.578561 -1.094649  Rolling [window=3,center=False,axis=0]
3 -0.443294  1.683261  Rolling [window=3,center=False,axis=0]
4  0.674124  0.281077  Rolling [window=3,center=False,axis=0]
5  0.587773  0.697557  Rolling [window=3,center=False,axis=0]
6 -0.258038 -1.230902  Rolling [window=3,center=False,axis=0]
7 -0.443269  0.647107  Rolling [window=3,center=False,axis=0]
8  0.347187  0.753585  Rolling [window=3,center=False,axis=0]
9 -0.369179  0.975155  Rolling [window=3,center=False,axis=0]
```
I want each row of column ‘C’ to hold that row's window of values as a list, like [0.1231, -1.132, 0.8766]. I tried using `rolling().apply()`, but in vain.
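Why both attempts fail: `df.B.rolling(window=3)` returns a lazy `Rolling` object, which is what ended up in every row of ‘C’ in the output above. Its reductions, and `Rolling.apply`, compute one float per window, so a function that returns the window as a list is rejected. A small sketch illustrating this; the Series `s` and the lambdas are made up for the demo, and the exact exception type raised for the list-returning function may vary between pandas versions:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])  # illustrative data, not from the question
r = s.rolling(window=3)

# rolling() alone only describes the windows; nothing is computed yet
print(type(r).__name__)  # Rolling

# A reduction yields one float per row; rows without a full window get NaN
sums = r.sum()
print(sums.tolist())  # [nan, nan, 6.0, 9.0, 12.0]

# apply() must also return a single scalar per window
spans = r.apply(lambda w: w[-1] - w[0], raw=True)
print(spans.tolist())  # [nan, nan, 2.0, 2.0, 2.0]

# Returning the window itself (a list) cannot be stored in the float result
try:
    r.apply(lambda w: list(w), raw=True)
    list_apply_failed = False
except (TypeError, ValueError):
    list_apply_failed = True
print(list_apply_failed)
```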
Expected Output:
```
          A         B                                  C
0 -0.108897  1.877987                                 []
1 -1.276055 -0.424382                                 []
2  1.578561 -1.094649  [-1.094649, -0.424382, 1.877987]
3 -0.443294  1.683261  [1.683261, -1.094649, -0.424382]
4  0.674124  0.281077  [0.281077, 1.683261, -1.094649]
5  0.587773  0.697557  [0.697557, 0.281077, 1.683261]
6 -0.258038 -1.230902  [-1.230902, 0.697557, 0.281077]
7 -0.443269  0.647107  [0.647107, -1.230902, 0.697557]
8  0.347187  0.753585  [0.753585, 0.647107, -1.230902]
9 -0.369179  0.975155  [0.975155, 0.753585, 0.647107]
```
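Put precisely, the expected ‘C’ value for row i is [B[i], B[i-1], B[i-2]] (newest value first) once three values are available, and an empty list for the first two rows. A plain-Python sketch of that definition, using the B values from the table above, can serve as a reference to check a vectorized solution against. The helper name `expected_windows` is hypothetical:

```python
def expected_windows(values, win=3):
    """For each position i, return the last `win` values ending at i,
    newest first, or [] when fewer than `win` values are available."""
    out = []
    for i in range(len(values)):
        if i < win - 1:
            out.append([])
        else:
            # values[i - win + 1 : i + 1] is oldest-first, so reverse it
            out.append(values[i - win + 1:i + 1][::-1])
    return out


B = [1.877987, -0.424382, -1.094649, 1.683261, 0.281077,
     0.697557, -1.230902, 0.647107, 0.753585, 0.975155]

for i, window in enumerate(expected_windows(B)):
    print(i, window)
```

The loop is O(n · win) and is meant for checking correctness, not for speed.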
Answer
You could use `np.lib.stride_tricks`:
```python
import numpy as np
import pandas as pd

as_strided = np.lib.stride_tricks.as_strided

# Random data as in the question, so your values will differ
df = pd.DataFrame(np.random.randn(10, 2), columns=list('AB'))

df
#           A         B
# 0 -0.272824 -1.606357
# 1 -0.350643  0.000510
# 2  0.247222  1.627117
# 3 -1.601180  0.550903
# 4  0.803039 -1.231291
# 5 -0.536713 -0.313384
# 6 -0.840931 -0.675352
# 7 -0.930186 -0.189356
# 8  0.151349  0.522533
# 9 -0.046146  0.507406

win = 3  # window size

# https://stackoverflow.com/a/47483615/4909087
# Build a (len(df) - win + 1, win) view over column B without copying.
# The strides tuple is repeated (e.g. (8,) -> (8, 8) for float64), so stepping
# one row or one column both advance one element: row i is B[i], B[i+1], B[i+2].
b = df.B.to_numpy()
v = as_strided(b, (len(b) - (win - 1), win), b.strides * 2, writeable=False)

v
# array([[ -1.60635669e+00,   5.10129842e-04,   1.62711678e+00],
#        [  5.10129842e-04,   1.62711678e+00,   5.50902812e-01],
#        [  1.62711678e+00,   5.50902812e-01,  -1.23129111e+00],
#        [  5.50902812e-01,  -1.23129111e+00,  -3.13383794e-01],
#        [ -1.23129111e+00,  -3.13383794e-01,  -6.75352179e-01],
#        [ -3.13383794e-01,  -6.75352179e-01,  -1.89356194e-01],
#        [ -6.75352179e-01,  -1.89356194e-01,   5.22532550e-01],
#        [ -1.89356194e-01,   5.22532550e-01,   5.07405549e-01]])

# One list per window, aligned with the row where the window ends;
# the first win - 1 rows have no full window, so alignment leaves them NaN
df['C'] = pd.Series(v.tolist(), index=df.index[win - 1:])

df
#           A         B                                                C
# 0 -0.272824 -1.606357                                              NaN
# 1 -0.350643  0.000510                                              NaN
# 2  0.247222  1.627117  [-1.606356691642917, 0.0005101298424200881, 1.
# 3 -1.601180  0.550903  [0.0005101298424200881, 1.6271167809032248, 0.
# 4  0.803039 -1.231291  [1.6271167809032248, 0.5509028122535129, -1.23
# 5 -0.536713 -0.313384  [0.5509028122535129, -1.2312911105674484, -0.3
# 6 -0.840931 -0.675352  [-1.2312911105674484, -0.3133837943758246, -0.
# 7 -0.930186 -0.189356  [-0.3133837943758246, -0.6753521794378446, -0.
# 8  0.151349  0.522533  [-0.6753521794378446, -0.18935619377656243, 0.
# 9 -0.046146  0.507406  [-0.18935619377656243, 0.52253255045267, 0.507
```
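To see what the shape and strides arguments do, and to get output in exactly the expected format, here is a sketch that goes slightly beyond the answer above. It first applies `as_strided` to a tiny array, then swaps in `np.lib.stride_tricks.sliding_window_view` (available since NumPy 1.20), which builds the same windows with bounds checking instead of hand-written strides. It then reverses each row so the newest value comes first and pads the first `win - 1` rows with empty lists instead of NaN, as in the question's expected output. The data are the B values from the question:

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import as_strided, sliding_window_view

win = 3

# Tiny demo: a float64 array advances 8 bytes per element
a = np.arange(5, dtype=np.float64)
print(a.strides)      # (8,)
print(a.strides * 2)  # (8, 8): tuple repetition, not multiplication
# One row step and one column step both advance one element,
# so row i of the view starts at a[i]
print(as_strided(a, (len(a) - win + 1, win), a.strides * 2, writeable=False))
# [[0. 1. 2.]
#  [1. 2. 3.]
#  [2. 3. 4.]]

# Same kind of windows without manual stride arithmetic (NumPy >= 1.20)
df = pd.DataFrame({'B': [1.877987, -0.424382, -1.094649, 1.683261, 0.281077,
                         0.697557, -1.230902, 0.647107, 0.753585, 0.975155]})
windows = sliding_window_view(df['B'].to_numpy(), win)  # read-only, shape (8, 3)

# Reverse each window (newest first) and pad the first win - 1 rows with []
df['C'] = [[] for _ in range(win - 1)] + windows[:, ::-1].tolist()
print(df)
```

Like `as_strided`, `sliding_window_view` returns a view, so no data is copied until `.tolist()` builds the Python lists.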