Suppose I have two very simple arrays with numpy:
    import numpy as np
    reference = np.array([0, 1, 2, 3, 0, 0, 0, 7, 8, 9, 10])
    probe = np.zeros(3)
I would like to find which slice of array reference has the highest Pearson's correlation coefficient with array probe. To do that, I would like to split reference into overlapping sub-arrays in a for loop, shifting by one element of reference at a time, and compare each sub-array against probe. I did the slicing using the inelegant code below:
    from statistics import correlation
    for i in range(0, len(reference)):
        # get the slice of the data
        sliced_data = reference[i:i+len(probe)]
        # only calculate the correlation when probe and reference have the same number of elements
        if len(sliced_data) == len(probe):
            my_rho = correlation(sliced_data, probe)
I have one issue and one question about this code:
1- When I run the code, I get the error below:
        my_rho = correlation(sliced_data, probe)
      File "/usr/lib/python3.10/statistics.py", line 919, in correlation
        raise StatisticsError('at least one of the inputs is constant')
    statistics.StatisticsError: at least one of the inputs is constant
2- Is there a more elegant way of doing such slicing in Python?
Answer
You can use sliding_window_view to get the successive windows. For a vectorized computation of the correlation, use a custom function:
    import numpy as np
    from numpy.lib.stride_tricks import sliding_window_view as swv

    def np_corr(X, y):
        # adapted from https://stackoverflow.com/a/71253141
        # X: 2D array with one window per row, y: 1D probe of the same length
        n = len(y)
        denom = np.sqrt((n * np.sum(X**2, axis=-1) - np.sum(X, axis=-1) ** 2)
                        * (n * np.sum(y**2) - np.sum(y) ** 2))
        num = n * np.sum(X * y[None, :], axis=-1) - np.sum(X, axis=-1) * np.sum(y)
        # where the denominator is 0 (a constant window or probe) the result
        # stays at the 0 supplied through out= instead of being uninitialized
        return np.divide(num, denom, out=np.zeros(denom.shape), where=denom != 0)

    corr = np_corr(swv(reference, len(probe)), probe)
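Output (these values correspond to a linearly increasing probe such as np.arange(3); with the all-zero probe from the question the denominator is zero for every window, so the correlation is undefined):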
    array([ 1.        ,  1.        , -0.65465367, -0.8660254 ,  0.        ,
            0.8660254 ,  0.91766294,  1.        ,  1.        ])
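As a cross-check, and to answer the "which slice" part of the question, here is a minimal sketch that computes the same correlations window by window with np.corrcoef and then picks the best window. It rests on two assumptions that are not in the original post: the probe is replaced by a non-constant one (np.arange(3)), because Pearson's correlation is undefined when an input has zero variance (which is what the StatisticsError reports), and constant windows are marked as NaN rather than 0. The helper name pearson_or_nan is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

reference = np.array([0, 1, 2, 3, 0, 0, 0, 7, 8, 9, 10])
# Assumption: a non-constant probe; the all-zero probe from the question
# has zero variance, so no correlation can be computed for it.
probe = np.arange(3, dtype=float)

# One row per overlapping window, shifted by one element each time.
windows = sliding_window_view(reference, len(probe))
print(windows.shape)  # (9, 3)


def pearson_or_nan(a, b):
    """Pearson's r, or NaN when either input is constant (hypothetical helper)."""
    if np.std(a) == 0 or np.std(b) == 0:
        return np.nan
    return np.corrcoef(a, b)[0, 1]


corr = np.array([pearson_or_nan(w, probe) for w in windows])

# nanargmax ignores the NaN entries; if several windows tie,
# it returns the first maximum it finds.
best = int(np.nanargmax(corr))
best_slice = reference[best:best + len(probe)]
print(best, best_slice, corr[best])
```

The Python-level loop is slower than the vectorized function above, but it is easy to verify by eye. Several windows of this reference are perfectly linear, so they tie at a correlation of about 1 and a tie-breaking rule may be needed in practice.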