
Tag: io

Efficiently reading small pieces from multiple HDF5 files?

I have one HDF5 file per day, which contains compressed data for many assets. Specifically, each h5 file contains 5000 assets and is organized as a key-value structure. The data of each asset has the same format and size, and altogether I have around 1000 days of data. Now the task is to do ad-hoc anal…

Optimal way to use multiprocessing for many files

So I have a large list of files that need to be processed into CSVs. Each file itself is quite large, and each line is a string. Each line of the files could represent one of three types of data, each of which is processed a bit differently. My current solution looks like the following: I iterate through the …
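A common pattern for this kind of workload is to parallelize over whole files rather than over individual lines, so each worker streams one file from disk and writes its own CSV. The sketch below assumes, purely for illustration, that a line's type is its first space-separated token (`A`, `B` or `C`); `PARSERS`, `convert_file` and `convert_all` are hypothetical names, and the parsers are placeholders for the real type-specific processing.

```python
import csv
import multiprocessing as mp
from functools import partial
from pathlib import Path

# Hypothetical parsers for the three line types; the real formats are not
# shown in the question, so these only split the remainder of the line.
PARSERS = {
    "A": lambda rest: ["A", *rest.split()],
    "B": lambda rest: ["B", *rest.split()],
    "C": lambda rest: ["C", rest.strip()],
}


def convert_file(src: str, out_dir: str) -> str:
    """Stream one input file line by line into one CSV and return the CSV path."""
    out_path = Path(out_dir) / (Path(src).stem + ".csv")
    with open(src, encoding="utf-8") as fin, \
            open(out_path, "w", newline="", encoding="utf-8") as fout:
        writer = csv.writer(fout)
        for line in fin:
            # Assumption: the first space-separated token names the line type.
            kind, _, rest = line.rstrip("\n").partition(" ")
            parser = PARSERS.get(kind)
            if parser is not None:  # lines of an unknown type are skipped
                writer.writerow(parser(rest))
    return str(out_path)


def convert_all(paths, out_dir: str, processes=None) -> list:
    """Process whole files in parallel, one file per task."""
    worker = partial(convert_file, out_dir=out_dir)
    with mp.Pool(processes) as pool:
        # chunksize=1 hands out one file at a time, so a few very large files
        # are not batched onto the same worker; results arrive as files finish.
        return list(pool.imap_unordered(worker, paths, chunksize=1))
```

Call `convert_all(paths, "out")` from inside an `if __name__ == "__main__":` block; that guard is required on platforms where multiprocessing uses the spawn or forkserver start method. Handing each worker a whole file keeps inter-process traffic down to one path in and one path out, instead of pickling every line.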

How to read a sequence of files as one file (fileinput lacks read)?

I need to read a sequence of files, and fileinput seemed to be just what was needed, but then I realized it lacks a read method. What is the canonical way (if any) to do this? (Explicit concatenation would be wasteful.) Is there a known technical or security reason that fileinput does not support read? This related quest…
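One way to get a file-like object with a working `read()` without concatenating anything up front is to subclass `io.RawIOBase`, implement `readinto()` so that it moves on to the next file at EOF, and let `io.BufferedReader` (and optionally `io.TextIOWrapper`) supply `read()`, `readline()` and iteration. This is a minimal sketch, not something the standard library ships; `ChainedReader` and `open_chain` are hypothetical names.

```python
import io
from typing import Iterable


class ChainedReader(io.RawIOBase):
    """Read-only binary stream that yields the contents of several files back to back."""

    def __init__(self, paths: Iterable[str]):
        self._paths = iter(paths)
        self._current = None  # the file currently being read, if any

    def readable(self) -> bool:
        return True

    def readinto(self, buffer) -> int:
        while True:
            if self._current is None:
                path = next(self._paths, None)
                if path is None:
                    return 0  # every file is exhausted: report EOF
                self._current = open(path, "rb")
            n = self._current.readinto(buffer)
            if n:
                return n
            # The current file is finished; close it and continue with the next one.
            self._current.close()
            self._current = None

    def close(self) -> None:
        if self._current is not None:
            self._current.close()
            self._current = None
        super().close()


def open_chain(paths, encoding=None):
    """Return a buffered binary stream, or a text stream if an encoding is given."""
    buffered = io.BufferedReader(ChainedReader(paths))
    return io.TextIOWrapper(buffered, encoding=encoding) if encoding else buffered
```

Only one underlying file is open at a time and data is pulled in buffer-sized pieces, so memory use does not grow with the total size of the files.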

How to json dump inside Zipfile.open process?

I am trying to write a JSON file inside a ZipFile that is backed by a BytesIO buffer. The result is later saved in a Django file field. However, the data is never dumped into json_file, and it is hard to debug because no error message is reported. Answer: Your code ‘shadows’ zipfile, which won’t be a probl…
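Following the answer's point about shadowing, a minimal sketch that keeps the `zipfile` module name free, writes the JSON through `ZipFile.open(..., "w")` (available since Python 3.6), and reads the buffer only after the archive has been closed might look like this. `build_zip_with_json` and the member name `data.json` are assumptions for illustration.

```python
import io
import json
import zipfile


def build_zip_with_json(data, member_name: str = "data.json") -> bytes:
    """Serialize data as JSON into a new ZIP archive held entirely in memory."""
    buffer = io.BytesIO()
    # Use a distinct variable name (zf) so the zipfile module is not shadowed.
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # ZipFile.open(..., "w") returns a binary stream; json.dump writes str,
        # so wrap the member in a TextIOWrapper that encodes to UTF-8.
        with zf.open(member_name, "w") as raw:
            with io.TextIOWrapper(raw, encoding="utf-8") as json_file:
                json.dump(data, json_file)
    # ZipFile writes its central directory on close, so read the bytes only now.
    return buffer.getvalue()
```

The returned bytes could then be handed to the Django field, for example wrapped in `django.core.files.base.ContentFile`; that step is not shown here. Reading the buffer before the `with zipfile.ZipFile(...)` block exits is a common cause of an incomplete archive.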

Print ‘std err’ value from statsmodels OLS results

(Sorry to ask, but http://statsmodels.sourceforge.net/ is currently down and I can’t access the docs.) I’m doing a linear regression using statsmodels. I know how to print out the full set of results, but I need a way to print out only the values of coef…