Best way to preserve NumPy arrays on disk

I am looking for a fast way to preserve large NumPy arrays. I want to save them to disk in a binary format, then read them back into memory relatively quickly. Unfortunately, cPickle is not fast enough.

I found numpy.savez and numpy.load. The odd thing is that numpy.load loads an npy file into a "memory-map", which makes ordinary manipulation of the arrays really slow. For example, something like this would be really slow:

More precisely, the first line is really fast, but the remaining lines that assign the arrays to obj are ridiculously slow:

Is there a better way of preserving NumPy arrays? Ideally, I want to be able to store multiple arrays in one file.

Answer

I’m a big fan of HDF5 for storing large NumPy arrays. There are two options for working with HDF5 in Python:

PyTables: http://www.pytables.org/

h5py: http://www.h5py.org/

Both are designed to work efficiently with NumPy arrays.
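As a rough illustration of the h5py route, here is a minimal sketch of storing several arrays in one HDF5 file and reading them back. The helper names save_arrays and load_arrays, the file name arrays.h5, and the sample arrays are made up for this example; it also assumes the file holds only top-level datasets (no nested groups).

```python
import os
import tempfile

import h5py
import numpy as np


def save_arrays(path, arrays):
    """Write each named array as its own dataset in a single HDF5 file."""
    with h5py.File(path, "w") as f:
        for name, arr in arrays.items():
            f.create_dataset(name, data=arr)


def load_arrays(path):
    """Read every top-level dataset fully into memory as a NumPy array.

    Assumes the file contains only datasets at the top level, no groups.
    """
    with h5py.File(path, "r") as f:
        # Indexing a dataset with [()] copies its whole contents into memory.
        return {name: f[name][()] for name in f.keys()}


# Hypothetical sample data, small enough to run instantly.
n = 1000
arrays = {
    "a": np.arange(n),
    "b": np.arange(n) * 10,
    "c": np.arange(n) * -0.5,
}

path = os.path.join(tempfile.mkdtemp(), "arrays.h5")
save_arrays(path, arrays)
restored = load_arrays(path)
```

Reading with [()] pulls each dataset into RAM, so later operations run on ordinary in-memory arrays. For arrays too large for memory, h5py also allows slicing a dataset directly (for example f["a"][:100]) so that only the requested part is read from disk.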

User contributions licensed under: CC BY-SA