I am looking for a fast way to preserve large numpy arrays. I want to save them to disk in a binary format, then read them back into memory relatively quickly. cPickle is not fast enough, unfortunately.
I found numpy.savez and numpy.load. But the weird thing is, numpy.load loads an npy file into a “memory-map”. That makes regular manipulation of the arrays really slow. For example, something like this would be really slow:
#!/usr/bin/python
import numpy as np
import time
from tempfile import TemporaryFile

n = 10000000
a = np.arange(n)
b = np.arange(n) * 10
c = np.arange(n) * -0.5

outfile = TemporaryFile()
np.savez(outfile, a=a, b=b, c=c)
outfile.seek(0)

t = time.time()
z = np.load(outfile)
print("loading time =", time.time() - t)

t = time.time()
aa = z['a']
bb = z['b']
cc = z['c']
print("assigning time =", time.time() - t)
More precisely, the loading line is really fast, but the remaining lines, which assign the arrays to variables, are ridiculously slow:
loading time = 0.000220775604248
assigning time = 2.72940087318
Is there any better way of preserving numpy arrays? Ideally, I want to be able to store multiple arrays in one file.
Answer
I’m a big fan of hdf5 for storing large numpy arrays. There are two options for dealing with hdf5 in Python:

- h5py
- PyTables
Both are designed to work with numpy arrays efficiently.
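For example, here is a minimal sketch using h5py; the file name arrays.h5 and the dataset names are just illustrative. It stores several arrays in one file and reads them back as plain in-memory numpy arrays:

import numpy as np
import h5py

n = 10000000
a = np.arange(n)
b = np.arange(n) * 10
c = np.arange(n) * -0.5

# Write several arrays into a single HDF5 file, one named dataset each.
with h5py.File('arrays.h5', 'w') as f:
    f.create_dataset('a', data=a)
    f.create_dataset('b', data=b)
    f.create_dataset('c', data=c)

# Read them back; [...] forces a full read into a regular numpy array.
with h5py.File('arrays.h5', 'r') as f:
    aa = f['a'][...]
    bb = f['b'][...]
    cc = f['c'][...]

PyTables exposes the same HDF5 file format through its own API (e.g. tables.open_file), so a file of plain datasets written with h5py can generally be opened with PyTables as well.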