
How to save a list in a pandas dataframe cell to a HDF5 table format?

I have a dataframe that I want to save to an HDF5 file in the appendable (table) format. The dataframe looks like this:


And the code that replicates the issue is:


Unfortunately, it returns this error:


I am aware that I can save each value in a separate column. That does not help my extended use case, because the lists might have variable lengths.

I know I could convert each list to a string and rebuild it from the string later, but if I start converting every column to a string, I might as well use a text format like CSV instead of a binary one like HDF5.

Is there a standard way of saving lists into hdf5 table format?


Answer

Python lists present a challenge when writing to HDF5 because they may contain different types. For example, this is a perfectly valid list: [1, 'two', 3.0]. Also, if I understand your Pandas 'column1' data correctly, it may contain lists of different lengths. There is no (simple) way to represent this as an HDF5 dataset. That’s why you got the “[mixed] object dtype” message: the conversion of the dataframe creates an intermediate object that is written as a dataset, the dtype of the converted list data is “O” (object), and HDF5 has no equivalent of that generic object type.
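To see the object dtype for yourself, here is a minimal sketch. The dataframe is hypothetical (only the name 'column1' comes from the question; 'column0' and all values are invented for illustration), and it assumes PyTables is installed so that `to_hdf` can run.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data: 'column0' and the values are made up for illustration.
df = pd.DataFrame({
    "column0": [1.5, 2.5],
    "column1": [[1, 2, 3], [4, 5, 6]],
})

# A column of Python lists is stored with the generic object dtype.
print(df["column1"].dtype)

error = None
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.h5")
    try:
        # The appendable "table" format cannot serialize the list column.
        df.to_hdf(path, key="df", format="table")
    except TypeError as exc:
        error = exc

print(type(error).__name__)
print(error)
```

The caught message should name the offending column, which is a quick way to find which columns need converting.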

However, all is not lost. If we can make some assumptions about your data, we can wrangle it into an HDF5 dataset. Assumptions: 1) all df list entries are the same type (int in this case), and 2) all df lists are the same length. (We can handle lists of different lengths, but it is more complicated.) Also, you will need to use a different package to write the HDF5 data (either PyTables or h5py). PyTables is the underlying package for Pandas HDF5 support, and h5py is widely used. The choice is yours.
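Both assumptions are easy to check before exporting. This is only a sketch on a hypothetical dataframe (the values and 'column0' are invented); the names `same_length` and `all_ints` are mine, not part of any library.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "column0": [10, 20, 30],
    "column1": [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
})

# Assumption 2: every list has the same length.
lengths = df["column1"].map(len)
same_length = bool(lengths.nunique() == 1)

# Assumption 1: every list element is an int.
all_ints = bool(
    df["column1"].map(lambda xs: all(isinstance(x, int) for x in xs)).all()
)

print("same length:", same_length)
print("all ints:", all_ints)
```

If either check fails, the fixed-shape approach described here does not apply directly.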

Before I post the code, here is an outline of the process:

  1. Create a NumPy record array (aka recarray) from the dataframe.
  2. Define the desired type and shape for the HDF5 dataset (as an Atom for PyTables, or a dtype for h5py).
  3. Create the dataset with the Atom/dtype definition above (could be done on 1 line, but it is easier to read this way).
  4. Loop over rows of the recarray (from Step 1), and write data to rows of the dataset. This converts the list to the equivalent array.

Code to create recarray (adds 2 rows to your dataframe):

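A minimal sketch of Step 1, not the exact listing: the dataframe below is hypothetical (invented values, three rows, a made-up 'column0'), and the recarray comes from `DataFrame.to_records()`.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "column0": [10, 20, 30],
    "column1": [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
})

# Step 1: convert the dataframe to a NumPy record array.
# index=False leaves the dataframe index out of the records.
rec_arr = df.to_records(index=False)

# The list column is still an object ('O') field at this point.
print(rec_arr.dtype)
print(rec_arr[0])
```

The object field is what Steps 2 to 4 replace with a fixed-shape integer field.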

PyTables-specific code to export data:

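A minimal PyTables sketch under the same assumptions (integer lists, all of length 3). The data, the file name and the node name `df_data` are hypothetical. Note that this sketch describes the table with a NumPy dtype rather than an Atom, and fills a structured array row by row before appending it in one call.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import tables as tb

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "column0": [10, 20, 30],
    "column1": [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
})
rec_arr = df.to_records(index=False)

# Step 2: describe one row; column1 becomes a fixed-shape (3,) int field.
tb_dt = np.dtype([("column0", "i8"), ("column1", "i8", (3,))])

# Step 4: loop over the records; assigning a list converts it to an array row.
out = np.empty(len(rec_arr), dtype=tb_dt)
for i, rec in enumerate(rec_arr):
    out["column0"][i] = rec["column0"]
    out["column1"][i] = rec["column1"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lists_tb.h5")
    # Step 3: create the table from the dtype, then append the rows.
    with tb.open_file(path, mode="w") as h5f:
        table = h5f.create_table("/", "df_data", description=tb_dt)
        table.append(out)
    # Read the table back to confirm the layout.
    with tb.open_file(path, mode="r") as h5f:
        stored = h5f.root.df_data.read()

print(stored["column1"])
```

Because the result is a PyTables table, more rows with the same dtype can be appended later with `table.append()`.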

h5py-specific code to export data:

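A minimal h5py sketch under the same assumptions (integer lists, all of length 3). The data, the file name and the dataset name `df_data` are hypothetical. For simplicity it fills a structured array in memory and writes it in one call instead of writing row by row to the dataset.

```python
import os
import tempfile

import h5py
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "column0": [10, 20, 30],
    "column1": [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
})
rec_arr = df.to_records(index=False)

# Step 2: compound dtype; column1 is a fixed-shape (3,) int field.
h5_dt = np.dtype([("column0", "i8"), ("column1", "i8", (3,))])

# Step 4: loop over the records; assigning a list converts it to an array row.
out = np.empty(len(rec_arr), dtype=h5_dt)
for i, rec in enumerate(rec_arr):
    out["column0"][i] = rec["column0"]
    out["column1"][i] = rec["column1"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lists_h5py.h5")
    # Step 3: create the dataset from the structured array.
    with h5py.File(path, "w") as h5f:
        h5f.create_dataset("df_data", data=out)
    # Read the dataset back to confirm the layout.
    with h5py.File(path, "r") as h5f:
        stored = h5f["df_data"][:]

print(stored["column1"])
```

Unlike the PyTables table, a dataset created this way has a fixed size; appending later would need a resizable dataset (created with `maxshape`).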
User contributions licensed under: CC BY-SA