Performance comparison: Why is it faster to copy an entire numpy Matrix and then change one column than to just use numpy.column_stack?

Question

I am trying to improve the performance of some Python code. In that code, one column of a matrix (numpy-array) has to be changed temporarily. The given code looks as follows: Now I thought it should be a big improvement to not create a copy of the entire matrix A (in the example used, the matrix is 500×5…

Accepted Answer

The copy method of NumPy arrays triggers native code that simply copies all of the array data at maximum CPU speed. If the array is 500x500 with 8 bytes per element, we are talking about ~2MB of data, which fits comfortably even in the cache of your CPU. And NumPy only has to create metadata for a single Python object.

On the other hand, column_stack runs some Python code (although not on fine-grained objects, or it would be worse), and ends up copying the array nonetheless: it takes your slices of the current array (a slice is not copied), but then calls np.concatenate internally, which triggers the copy. So you just add the overhead of copying the data in parts, plus some juggling to create on the order of 10 Python-level array objects in the process (between slicing, concatenating, etc.). That accounts for the roughly 10% extra time you see.
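The points above can be checked with a small sketch. The question's original code is not shown here, so the function names, the 500x500 shape, the column index and the random data below are assumptions made only for illustration. The sketch confirms that both approaches produce the same matrix, that the slices passed to column_stack are views while its result is a fresh copy, and then times the two variants.

```python
import timeit

import numpy as np


def replace_by_copy(a, j, col):
    """Copy the whole matrix once, then overwrite column j in the copy."""
    out = a.copy()
    out[:, j] = col
    return out


def replace_by_column_stack(a, j, col):
    """Rebuild the matrix from two slices (views) and the new column."""
    # The slices are views, but column_stack concatenates them into a new array.
    return np.column_stack((a[:, :j], col, a[:, j + 1:]))


# Hypothetical test data: a 500x500 float64 matrix and one replacement column.
rng = np.random.default_rng(0)
A = rng.random((500, 500))
new_col = rng.random(500)
j = 250

# 500 * 500 * 8 bytes, i.e. the ~2MB mentioned above.
print("array size in bytes:", A.nbytes)

# Slicing alone does not copy; the stacked result does not share memory with A.
print("slice is a view:", np.shares_memory(A[:, :j], A))
print("stacked result shares memory:",
      np.shares_memory(replace_by_column_stack(A, j, new_col), A))

# Both approaches yield the same matrix.
same = np.array_equal(replace_by_copy(A, j, new_col),
                      replace_by_column_stack(A, j, new_col))
print("results identical:", same)

# Time both variants; absolute numbers depend on the machine and NumPy version.
for fn in (replace_by_copy, replace_by_column_stack):
    seconds = timeit.timeit(lambda: fn(A, j, new_col), number=200)
    print(f"{fn.__name__}: {seconds / 200 * 1e6:.1f} microseconds per call")
```

The measured gap will vary with hardware, array size and NumPy version, so treat the timings as a way to reproduce the comparison rather than as fixed figures.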


Answer