Computing the correlation coefficient between two multi-dimensional arrays

I have two arrays that have the shapes N X T and M X T. I’d like to compute the correlation coefficient across T between every possible pair of rows n and m (from N and M, respectively).

What’s the fastest, most pythonic way to do this? (Looping over N and M would seem to me to be neither fast nor pythonic.) I’m expecting the answer to involve numpy and/or scipy. Right now my arrays are numpy arrays, but I’m open to converting them to a different type.

I’m expecting my output to be an array with the shape N X M.

N.B. When I say “correlation coefficient,” I mean the Pearson product-moment correlation coefficient.

Here are some things to note:

  • The numpy function correlate requires input arrays to be one-dimensional.
  • The numpy function corrcoef accepts two-dimensional arrays, but they must have the same shape.
  • The scipy.stats function pearsonr requires input arrays to be one-dimensional.
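To pin down the desired output, here is a slow reference sketch that uses the explicit double loop mentioned above. The function name corr_rows_loop is made up for illustration, and it relies on np.corrcoef applied to one pair of 1-D rows at a time. It is meant only as a correctness baseline, not as the fast solution being asked for.

```python
import numpy as np

def corr_rows_loop(A, B):
    """Pearson r between every row of A (N x T) and every row of B (M x T)."""
    N, M = A.shape[0], B.shape[0]
    out = np.empty((N, M))
    for n in range(N):
        for m in range(M):
            # np.corrcoef on two 1-D arrays returns a 2 x 2 matrix;
            # the off-diagonal entry is the coefficient for this pair.
            out[n, m] = np.corrcoef(A[n], B[m])[0, 1]
    return out

# Small made-up inputs, just to show the shapes involved
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 10))   # N = 3, T = 10
B = rng.standard_normal((4, 10))   # M = 4, T = 10
R = corr_rows_loop(A, B)           # shape (3, 4)
```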

Answer

Correlation (default ‘valid’ case) between two 2D arrays:

You can simply use matrix multiplication via np.dot: for inputs A (N X T) and B (M X T), np.dot(A, B.T) gives the N X M result.

For each pair of rows (row1, row2) taken from the two input arrays, the correlation in the default "valid" mode corresponds to the entry at position (row1, row2) of the multiplication result.
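As a minimal sketch of that idea (the array names A and B and the random test data are assumptions, not part of the original listing), the matrix product can be cross-checked against np.correlate, whose default mode is "valid":

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))   # N x T
B = rng.standard_normal((4, 5))   # M x T

# Entry (i, j) is the sum over T of A[i] * B[j]
out = np.dot(A, B.T)              # N x M

# Cross-check one pair of rows against np.correlate (default mode "valid"),
# which returns a length-1 array for two equal-length 1-D inputs
i, j = 1, 2
check = np.correlate(A[i], B[j])[0]
print(np.isclose(out[i, j], check))
```

Note that this is the raw sliding-product correlation, not the normalised Pearson coefficient.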


Row-wise Correlation Coefficient calculation for two 2D arrays:


This is based upon a solution to the question "How to apply corr2 functions in Multidimentional arrays in MATLAB".
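The original listing did not survive here, so the following is only a sketch of how the row-wise correlation coefficient could be computed with the same matrix-multiplication idea: centre each row, take all cross-products with np.dot, and divide by the product of the row norms. The function name corr2_coeff and the test data are assumptions for illustration.

```python
import numpy as np

def corr2_coeff(A, B):
    # Subtract each row's mean (the mean is taken across the T columns)
    A_mA = A - A.mean(axis=1, keepdims=True)
    B_mB = B - B.mean(axis=1, keepdims=True)

    # Sum of squared deviations for every row
    ssA = (A_mA ** 2).sum(axis=1)
    ssB = (B_mB ** 2).sum(axis=1)

    # Cross-products of centred rows divided by the product of their norms
    return np.dot(A_mA, B_mB.T) / np.sqrt(np.outer(ssA, ssB))

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 10))   # N x T
B = rng.standard_normal((4, 10))   # M x T
R = corr2_coeff(A, B)              # N x M array of Pearson coefficients
```

A row with zero variance makes the denominator zero, so the corresponding entries come out as nan (with a NumPy runtime warning); handle such rows separately if they can occur.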

Benchmarking

This section compares the runtime of the proposed approach against the generate_correlation_map approach and the loopy pearsonr-based approach listed in the other answer (the latter taken from the function test_generate_correlation_map(), without the value-correctness verification code at the end of it). Please note that the timings for the proposed approach also include a check at the start for an equal number of columns in the two input arrays, as is also done in that other answer. The runtimes are listed next.
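A comparison of this kind could be set up along the following lines. This is only a sketch under stated assumptions: the function names, array sizes and the use of timeit are illustrative, the loop baseline uses np.corrcoef per pair as a stand-in for pearsonr, and generate_correlation_map from the other answer is not reproduced. It makes no claim about the numbers it will print on any given machine.

```python
import timeit
import numpy as np

def corr2_coeff_checked(A, B):
    # Column-count check up front, as described in the text
    if A.shape[1] != B.shape[1]:
        raise ValueError("A and B must have the same number of columns")
    A_mA = A - A.mean(axis=1, keepdims=True)
    B_mB = B - B.mean(axis=1, keepdims=True)
    ssA = (A_mA ** 2).sum(axis=1)
    ssB = (B_mB ** 2).sum(axis=1)
    return np.dot(A_mA, B_mB.T) / np.sqrt(np.outer(ssA, ssB))

def corr_loop(A, B):
    # Loopy baseline: one coefficient per (row of A, row of B) pair
    out = np.empty((A.shape[0], B.shape[0]))
    for n in range(A.shape[0]):
        for m in range(B.shape[0]):
            out[n, m] = np.corrcoef(A[n], B[m])[0, 1]
    return out

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
B = rng.standard_normal((40, 100))

# Best of three repeats, reported as seconds per call
t_vec = min(timeit.repeat(lambda: corr2_coeff_checked(A, B), number=10, repeat=3)) / 10
t_loop = min(timeit.repeat(lambda: corr_loop(A, B), number=1, repeat=3))
print(f"vectorized: {t_vec:.6f} s per call")
print(f"loop:       {t_loop:.6f} s per call")
```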

Case #1:


Case #2:


Case #3:


The other, loopy pearsonr-based approach seemed too slow, but here are the runtimes for one small data size:
User contributions licensed under: CC BY-SA