I have a large numpy array with the following structure:
```
array([['A', 0.0, 0.0], ['B2', 1.0, 0.0], ['B4', 2.0, 3.0], ['AX1', 3.0, 1.0],
       ['C2', 0.0, 2.0], ['D3', 2.0, 1.0], ['X4', 3.0, 8.0], ['BN', 2.0, 9.0],
       ['VF', 12.0, 25.0], ['L', 1.0, 3.0], ..., ['s', 2.0, 27.0], ['P', 0.0, 0.0]],
      dtype=object)
```
I'm using Cython to try to speed up the processing as much as possible. The argument `dataset` in the code below is the above array.
```
%%cython
cpdef permut1(dataset):
    cdef int i
    cdef int j
    cdef str x
    cdef str v
    xlist = []
    for i, x in enumerate(dataset[:, 0]):
        for j, v in enumerate(dataset[:, 0]):
            xlist.append((x, v, dataset[i][1], dataset[j][2]))
    return xlist
```
However, when running the above code with and without Cython, I get the following times:
without cython: 0:00:00.945872
with cython: 0:00:00.561925
Any ideas on how I can use Cython to speed this up even more?
thanks
Answer
Generally with numpy, you want to:
- Put only one data type into an array: avoid `dtype=object` and keep the strings in a separate array (a minimal split is sketched after this list). Otherwise every element access has to check the data type internally, and that slows things down. This is equally true for Cython.
- Avoid element-wise access and use only operations on entire arrays instead. For your case, build the indices up in an integer array and express the indexing of your input array as one operation.
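For the first point, a minimal sketch of what that split could look like (the names `labels` and `numbers` are just illustrative; `dataset` is the object array from the question):

```python
import numpy as np

# Split the mixed object array into two homogeneous arrays:
# one string array for the labels, one float array for the numeric columns.
labels = dataset[:, 0].astype(str)           # 'A', 'B2', 'B4', ...
numbers = dataset[:, 1:].astype(np.float64)  # shape (n, 2), plain float64
```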
E.g., for the second point:
```python
import numpy as np

a = np.array(..., dtype=float)      # input number columns only from above
fa = a.flatten()                    # helps to use 1d indices
far = fa.reshape((fa.shape[0], 1))  # make 2d for hstack()
idxs = np.indices((a.shape[0], a.shape[0]))
idxs1 = idxs[0].flatten()           # 0,0,0,...,1,1,1,...
idxs2 = idxs[1].flatten()           # 0,1,2,...,0,1,2,...
np.hstack((far[idxs1], far[idxs2]))
```
No Cython needed (unless you really need complex element-wise calculations).
Since you previously iterated with O(n^2) operations, the above should also work out to a speedup even if you first have to convert your input array to this format.
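Putting both points together, here is a sketch of how the converted arrays plus the index trick could reproduce the tuples from your original loop (the names `labels`, `col1`, `col2`, `i_idx`, `j_idx`, `pair_labels` and `pair_numbers` are just illustrative):

```python
import numpy as np

n = dataset.shape[0]
labels = dataset[:, 0].astype(str)       # string column in its own array
col1 = dataset[:, 1].astype(np.float64)  # second column as floats
col2 = dataset[:, 2].astype(np.float64)  # third column as floats

# Index arrays enumerating every (i, j) pair, like the two nested loops.
i_idx, j_idx = np.indices((n, n))
i_idx = i_idx.ravel()                    # 0,0,...,0,1,1,...
j_idx = j_idx.ravel()                    # 0,1,...,n-1,0,1,...

# Columns of the result: labels[i], labels[j], col1[i], col2[j]
pair_labels = np.stack((labels[i_idx], labels[j_idx]), axis=1)  # (n*n, 2) strings
pair_numbers = np.stack((col1[i_idx], col2[j_idx]), axis=1)     # (n*n, 2) floats
```

If you really need the list of tuples afterwards you can zip these arrays back together, but keeping the result as two homogeneous arrays avoids dropping back into Python objects and keeps everything vectorized.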