Skip to content
Advertisement

Time & memory complexity management with multi-dimensional matrices using parallelisation and numpy

I have a time series of very large matrices. I am trying to speed up the process and was wondering the most optimal way to do this. The two things that came to mind are to parallelize the process using numba or to apply a function to the matrices such as with np.apply_along_axis.

Speed and memory complexity are very important. I have enclosed some example code to generate these matrices. The true ones are much larger (shapes larger than (400, 400, 400, 400)). I assume the two functions “determineShiftsJmax” and “addPadding” are the most time complex, given the nested loops.

import numpy as np

def determineShiftsJmax(m, jmaxR, jmaxC):
    layerR = min(m, jmaxR - 1)
    layerC = min(m, jmaxC - 1)

    nodesR = 2 * layerR + 1
    nodesC = 2 * layerC + 1

    u = range(0, nodesR)
    b = range(0, nodesC)

    mat = np.zeros((nodesR, nodesC), dtype=object)
    for x, i in enumerate(u):
        for y, j in enumerate(b):
            up = (j, 2 * layerC - j)
            left = (i, 2 * layerR - i)
            mat[x, y] = (left, up)
    if (jmaxC <= jmaxR) and (m >= jmaxC):
        res = np.pad(mat, 1, mode="edge")
    elif (jmaxR <= jmaxC) and (m >= jmaxR):
        res = np.pad(mat, 1, mode="edge")
    else:
        res = mat
    return res

def addPadding(array, shift, shape):
    paddedMatrix = []
    for i in range(shape[0]):
        for j in range(shape[1]):
            padding = np.pad(array[i, j], shift[i, j])
            paddedMatrix.append(padding)
    paddedMatrix = np.array(paddedMatrix)
    return paddedMatrix

shapeE = [(1, 1, 3, 3),
          (3, 3, 5, 5),
          (5, 5, 7, 7),
          (7, 7, 9, 7),
          (9, 7, 11, 7),
          (11, 7, 11, 7),
          (11, 7, 11, 7),
          (11, 7, 11, 7),
          (11, 7, 11, 7),
          (11, 7, 11, 7)]

shapeI = [(1, 1, 3, 3),
          (3, 3, 3, 3),
          (5, 5, 3, 3),
          (7, 7, 3, 3),
          (9, 7, 3, 3),
          (11, 7, 3, 3),
          (11, 7, 3, 3),
          (11, 7, 3, 3),
          (11, 7, 3, 3),
          (11, 7, 3, 3)]


qs = [np.ones(x) for x in shapeI]

jmaxR = 3
jmaxC = 5


# Time step 0
m = 0
shift = determineShiftsJmax(m, jmaxC, jmaxR)
newMatrix = addPadding(qs[m], shift, shapeI[m])


# all time Steps
newMatrices = []
for m, shape in enumerate(shapeI):
    shift = determineShiftsJmax(m, jmaxC, jmaxR)
    newMatrix = addPadding(qs[m], shift, shape)
    newMatrix = newMatrix.reshape(shapeE[m])
    newMatrices.append(newMatrix)


Advertisement

Answer

It is not just python loops that slow down your code. So before going into parallelisation, you should try to improve the memory layout.

A numpy array is a continuous sequence of bytes in memory. So whenever you pre- or append something (e.g. padding) you need to create a new one with a larger size, set the padded value and copy the remaining part. This makes your code very inefficient. The key is to allocate the output array only once and then copy the parts of the incoming array into their destination. See the example below:

def crate_new_matirx(old_matrix, new_shape, m, jmaxR, jmaxC):
    old_shape = old_matrix.shape
    # allocate the memory
    new_matrix = np.zeros_like(old_matrix, shape=new_shape)
    
    # calculate the shift magic
    layerR = min(m, jmaxR - 1)
    layerC = min(m, jmaxC - 1)

    nodesR = 2 * layerR + 1
    nodesC = 2 * layerC + 1
    
    repeat = (
        (jmaxC <= jmaxR) and (m >= jmaxC) or
        (jmaxR <= jmaxC) and (m >= jmaxR)
    )

    # do the loop to copy the input at the right location
    for l in range(new_shape[0]):
        for k in range(new_shape[1]):
            jlow = min(max(0, l - int(repeat)), new_shape[2] - old_shape[2])
            jhigh = jlow + old_shape[2]
            ilow = min(max(0, k - int(repeat)), new_shape[3] - old_shape[3])
            ihigh = ilow + old_shape[3]
            new_matrix[l, k, jlow:jhigh, ilow:ihigh] = old_matrix[l, k, :, :]
    return new_matrix

However, you mentioned, that you are expecting shapes of (400, 400, 400, 400), so your output array can easily fill many GB of memory, assuming float64, you will need about 200 GB. Knowing that most of the values will be zeros may lead to the question, if you really need to crate the padded matrices in memory. I do not know what you want to do with them afterwards, but if your algorithm will need those matrices for any subsequent calculation, you could also try to get an index mapping to determine if the value you try to access is zero or an actual data value. You could also write a function to get the inner matrix for given outer two indexes (l, k) and call it when needed.

User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement