Save a list of dictionaries with numpy arrays

Question

I have a dataset structured as a list. Each element of the list is a dictionary containing a key "sample", whose value is a numpy array with shape (2048,3), and a "category", which is the class of that sample. The dataset length is 8000. I tried to save it as JSON, but it said it can't serialize numpy arrays. What's the best way to save it?

Accepted Answer

Creating an example specific to your data requires more details about the dictionaries in the list. I created an example that assumes every dictionary has:

- A unique value for the "category" key. This value is used as the dataset name.
- A "sample" key holding the array you want to save.

The code below creates some data, writes it to an HDF5 file with the h5py package, then reads the data back into a new list of dictionaries. It is a good starting point for your problem.

```python
import numpy as np
import h5py

a0, a1 = 10, 5
arr1 = np.arange(a0*a1).reshape(a0,a1)
arr2 = np.arange(a0*a1,2*a0*a1).reshape(a0,a1)
arr3 = np.arange(2*a0*a1,3*a0*a1).reshape(a0,a1)
dataset = [{"sample":arr1, "category":"Cat"},
           {"sample":arr2, "category":"Dog"},
           {"sample":arr3, "category":"Fish"},
          ]

# Create the HDF5 file with "category" as dataset name and "sample" as the data
with h5py.File('SO_73499414.h5', 'w') as h5f:
    for ds_dict in dataset:
        h5f.create_dataset(ds_dict["category"], data=ds_dict["sample"])

# Retrieve the HDF5 data with "category" as dataset name and "sample" as the data
ds_list = []
with h5py.File('SO_73499414.h5', 'r') as h5f:
    for ds_name in h5f:
        print(ds_name,'\n',h5f[ds_name]) # prints name and dataset attributes
        print(h5f[ds_name][()]) # prints the dataset values (as an array)
        # add data and name to list
        ds_list.append({"sample":h5f[ds_name][()], "category":ds_name})
```

Here is a second method for when the category values aren't unique.

```python
import numpy as np
import h5py

a0, a1 = 10, 5
arr1 = np.arange(a0*a1).reshape(a0,a1)
arr2 = np.arange(a0*a1,2*a0*a1).reshape(a0,a1)
arr3 = np.arange(2*a0*a1,3*a0*a1).reshape(a0,a1)
arr4 = np.arange(3*a0*a1,4*a0*a1).reshape(a0,a1)
dataset = [{"sample":arr1, "category":"Cat"},
           {"sample":arr2, "category":"Dog"},
           {"sample":arr3, "category":"Cat"},
           {"sample":arr4, "category":"Dog"}
          ]

# Create the HDF5 file with the dataset name from a counter and "sample" as the data
# "category" is saved as a dataset attribute
with h5py.File('SO_73499414.h5', 'w') as h5f:
    for i, ds_dict in enumerate(dataset):
        ds = h5f.create_dataset(f'ds_{i:04}', data=ds_dict["sample"])
        ds.attrs["category"] = ds_dict["category"]

# Retrieve the HDF5 data with "sample" as the data and "category" from the attribute
ds_list = []
with h5py.File('SO_73499414.h5', 'r') as h5f:
    for ds_name in h5f:
        print(ds_name,'\n',h5f[ds_name]) # prints name and dataset attributes
        print(h5f[ds_name].attrs["category"]) # prints the category attribute
        print(h5f[ds_name][()]) # prints the dataset values (as an array)
        # add data and name to list
        ds_list.append({"sample":h5f[ds_name][()], "category":h5f[ds_name].attrs["category"]})
```
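Because every sample in the question has the same shape (2048,3), another option is to stack all arrays into one 3-D dataset and keep the categories in a parallel string dataset, so 8000 samples live in two datasets instead of 8000. This is a minimal sketch under that same-shape assumption, and it also assumes the categories are strings. The helper names `save_dataset`/`load_dataset`, the dataset names "samples"/"categories", and the file name "samples.h5" are hypothetical choices for illustration.

```python
import numpy as np
import h5py


def save_dataset(path, dataset):
    """Stack every "sample" into one (N, 2048, 3) array; store "category" alongside it.

    Assumes all samples share one shape and every category is a str.
    """
    samples = np.stack([d["sample"] for d in dataset])
    categories = [d["category"] for d in dataset]
    with h5py.File(path, "w") as h5f:
        # gzip is lossless, so the arrays come back bit-for-bit identical
        h5f.create_dataset("samples", data=samples, compression="gzip")
        # variable-length UTF-8 strings for the category labels
        h5f.create_dataset("categories", data=categories, dtype=h5py.string_dtype())


def load_dataset(path):
    """Rebuild the original list of {"sample": ..., "category": ...} dictionaries."""
    with h5py.File(path, "r") as h5f:
        samples = h5f["samples"][()]
        # asstr() decodes the stored bytes back into Python str objects
        categories = h5f["categories"].asstr()[()]
    return [{"sample": s, "category": c} for s, c in zip(samples, categories)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Small stand-in for the real 8000-sample list
    data = [{"sample": rng.random((2048, 3)), "category": c} for c in ["Cat", "Dog", "Cat"]]
    save_dataset("samples.h5", data)
    restored = load_dataset("samples.h5")
    print(len(restored), restored[0]["sample"].shape, restored[0]["category"])
```

Row `i` of "samples" and entry `i` of "categories" belong together, so the list order is preserved on reload. Stacking needs the whole list in memory at once; for much larger data you could instead create the dataset with a fixed shape and fill it one row at a time.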


Answer