Extracting datasets from 1 HDF5 file to multiple files

I previously asked a question about generating images from an HDF5 file. Now I have another problem: generating new h5 files from an existing one.

For instance, I have a file [ABC.h5] that contains a dataset of images and a dataset of their ground-truth density maps. The keys are [images, density_maps].

Instead of the single h5 file, I want [GT_001.h5], [GT_002.h5], …, each holding the [density_maps] entry extracted for one image.

How can I achieve this? Thanks a lot.

[EDIT] Here is more related information. Thank you @kcw78 for the guidance. In the original dataset in the CRSNet, each image file has its ground-truth density map in a separate h5 file. This density map is <HDF5 dataset "density": shape (544, 932), type "<f4"> <class 'h5py._hl.dataset.Dataset'>. Therefore, in this dataset, each IMG_001.jpg has a corresponding IMG_001.h5.

The dataset I have is a single h5 file containing: <HDF5 dataset "density_maps": shape (300, 380, 676, 1), type "<f4"> <class 'h5py._hl.dataset.Dataset'> and <HDF5 dataset "images": shape (300, 380, 676, 1), type "|u1"> <class 'h5py._hl.dataset.Dataset'>

I have successfully generated the corresponding images from the file. My current problem is how to loop over the density maps and copy each one into a new h5 file, so that every image has a corresponding density map h5. In other words, how can I produce IMG_001.h5, … from this single HDF5 file?

Answer

This answers your question based on my interpretation of your data. If it doesn’t solve your problem, please clarify the summary below.

First, please be careful with the term “dataset”. It has a specific meaning in h5py. You use “dataset” to refer to a set of data used for training and testing a CNN. That gets confusing when there are also datasets inside an HDF5 file.

Based on your explanation, this is my understanding of the different files you have for training and testing.

Your original set of training and testing data in the CRSNet:
image files: IMG_###.jpg
ground truth density map files: IMG_###.h5, each holding one dataset: name="density", shape=(544, 932), type="<f4"
You have pairs of image and density files: one .jpg and one .h5 file for each of IMG_001 through IMG_NNN.

Your new set of training and testing data:
H5 Filename: [ABC.h5]
H5 Dataset 1: name="images", shape=(300, 380, 676, 1), type="|u1"
H5 Dataset 2: name="density_maps", shape=(300, 380, 676, 1), type="<f4"
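
Before extracting anything, it may be worth confirming that the file really has the layout summarized above. The following is a minimal sketch, not part of the original answer: describe_h5 is a hypothetical helper name, and it only uses standard h5py calls (File, visititems, Dataset) to list every dataset with its shape and type string.

```python
import h5py

def describe_h5(path):
    """Return {dataset_name: (shape, dtype_string)} for every dataset in the file."""
    info = {}
    with h5py.File(path, "r") as h5r:
        def visit(name, obj):
            # visititems also visits groups; keep only real datasets
            if isinstance(obj, h5py.Dataset):
                info[name] = (obj.shape, obj.dtype.str)
        h5r.visititems(visit)
    return info
```

Calling describe_h5('ABC.h5') on a file like the one described should report the "images" and "density_maps" entries with their shapes, which tells you how many maps the extraction loop will write.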

You have extracted the data from the “images” dataset in this .h5 file to create IMG_###.jpg (like your original set of training and testing data). Now you want to extract arrays from the “density_maps” dataset in the .h5 file to create IMG_###.h5.

If so, the process is the same as the image extraction procedure. The only difference is that you write the data to a .h5 file instead of a .jpg file. The code below shows the idea.

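import h5py

with h5py.File('yourfile.h5', 'r') as h5r:
    dmaps = h5r['density_maps']
    for i in range(dmaps.shape[0]):
        # Read the i-th density map from the source file
        dmap_arr = dmaps[i, :]
        # Files are numbered from IMG_000; use i+1 if your images start at IMG_001
        with h5py.File(f'IMG_{i:03}.h5', 'w') as h5w:
            h5w.create_dataset('density_maps', data=dmap_arr)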

Note that with the dataset shape given above, dmap_arr will have shape=(380, 676, 1). To get a 2D array like the original density maps, you can reshape with .reshape(380, 676). Like this:

        dmap_arr = h5r['density_maps'][i,:].reshape(380, 676)
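
Putting the loop and the reshape together, here is a minimal sketch of a reusable function. Everything beyond the loop shown above is an assumption for illustration: split_density_maps and its parameters are hypothetical names, dst_name="density" assumes your loader expects the key used in the original IMG_###.h5 files, and start=1 assumes your extracted images are numbered from IMG_001. Adjust both to match how you named the .jpg files.

```python
import os
import h5py

def split_density_maps(src_path, out_dir, src_name="density_maps",
                       dst_name="density", start=1):
    """Write one .h5 file per density map and return the paths written."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with h5py.File(src_path, "r") as h5r:
        dset = h5r[src_name]
        for i in range(dset.shape[0]):
            dmap = dset[i]                      # one map, e.g. shape (380, 676, 1)
            if dmap.ndim == 3 and dmap.shape[-1] == 1:
                dmap = dmap[..., 0]             # drop the trailing axis -> (380, 676)
            out_path = os.path.join(out_dir, f"IMG_{i + start:03d}.h5")
            with h5py.File(out_path, "w") as h5w:
                h5w.create_dataset(dst_name, data=dmap)
            written.append(out_path)
    return written
```

For example, split_density_maps('ABC.h5', 'ground_truth') would write ground_truth/IMG_001.h5 onward, one file per entry in "density_maps". Indexing the trailing axis away instead of hard-coding .reshape(380, 676) keeps the function usable if the map size differs.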
User contributions licensed under: CC BY-SA