I have two folders (one for train and one for test), and each one has around 10 files in h5 format. I want to read them and use them in a dataset. I have a function to read them, but I don't know how to use it to read the files in my class.
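    import h5py
    from torch.utils.data import Dataset


    def read_h5(path):
        # Load the full 'image' and 'label' datasets into memory,
        # closing the file once the arrays have been read.
        with h5py.File(path, 'r') as data:
            image = data['image'][:]
            label = data['label'][:]
        return image, label


    class Myclass(Dataset):
        def __init__(self, split='train', transform=None):
            raise NotImplementedError

        def __len__(self):
            raise NotImplementedError

        def __getitem__(self, index):
            raise NotImplementedError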
Do you have a suggestion? Thank you in advance
Answer
This might be a start for what you want to do. I implemented __init__(), but not __len__() or __getitem__(). The user provides the path, and the init function calls the class method read_h5() to get the arrays of image and label data. There is a short main that creates class objects from 2 different H5 files. Modify the paths list with the folder and filenames for all of your training and testing data.
    import h5py


    class H5_data():
        def __init__(self, path):  # split='train', transform=None):
            self.path = path
            self.image, self.label = H5_data.read_h5(path)

        @classmethod
        def read_h5(cls, path):
            # [()] reads the entire dataset into a numpy array.
            with h5py.File(path, 'r') as data:
                image = data['image'][()]
                label = data['label'][()]
            return image, label


    paths = ['train_0.h5', 'test_0.h5']
    for path in paths:
        h5_test = H5_data(path)
        print(f'For HDF5 file: {path}')
        print(f'image data, shape: {h5_test.image.shape}; dtype: {h5_test.image.dtype}')
        print(f'label data, shape: {h5_test.label.shape}; dtype: {h5_test.label.dtype}')
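If the files sit in one folder per split, the paths list does not have to be typed by hand. A minimal sketch, assuming a layout such as root/train/*.h5 and root/test/*.h5 (the folder names and the helper list_h5_files are hypothetical, not from the question):

```python
from pathlib import Path


def list_h5_files(root, split):
    """Return the sorted .h5 file paths found directly under root/split."""
    folder = Path(root) / split  # e.g. data/train (assumed layout)
    # Sorting keeps the file order stable between runs.
    return sorted(str(p) for p in folder.glob("*.h5"))
```

With that layout, something like paths = list_h5_files("data", "train") + list_h5_files("data", "test") would feed the loop above.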
IMHO, creating a class that holds the array data is overkill (and could lead to memory problems if you have really large datasets). It is more memory efficient to create h5py dataset objects and access the data when you need it. Note that these dataset objects are only usable while the file is open, so work with them inside the with block. The example below does the same as the code above, without creating a class object with numpy arrays.
    import h5py

    paths = ['train_0.h5', 'test_0.h5']
    for path in paths:
        with h5py.File(path, 'r') as data:
            # These are h5py dataset objects; no array data is loaded here.
            image = data['image']
            label = data['label']
            # Shape and dtype must be queried while the file is still open.
            print(f'For HDF5 file: {path}')
            print(f'image data, shape: {image.shape}; dtype: {image.dtype}')
            print(f'label data, shape: {label.shape}; dtype: {label.dtype}')
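To get back to the original question, the same read-on-demand idea can be carried into __len__() and __getitem__(). The following is only a sketch under assumptions that the question does not confirm: every file holds several samples stacked along the first axis of 'image' and 'label', and all files for a split live in one folder. The class name H5FolderDataset is made up for illustration. It is written as a plain class so that it runs without PyTorch; inheriting from torch.utils.data.Dataset should not require changing either method.

```python
import bisect
from pathlib import Path

import h5py


class H5FolderDataset:
    """Map-style dataset over all .h5 files in one folder (assumed layout)."""

    def __init__(self, folder, transform=None):
        self.paths = sorted(Path(folder).glob("*.h5"))
        self.transform = transform
        # offsets[i] is the global index of the first sample in file i.
        # Only shapes are read here; the arrays stay on disk.
        self.offsets = [0]
        for path in self.paths:
            with h5py.File(path, "r") as f:
                # Assumption: axis 0 of 'image' indexes the samples.
                self.offsets.append(self.offsets[-1] + f["image"].shape[0])

    def __len__(self):
        return self.offsets[-1]

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Find the file that contains this global index.
        file_idx = bisect.bisect_right(self.offsets, index) - 1
        local = index - self.offsets[file_idx]
        # Read just this one sample from disk.
        with h5py.File(self.paths[file_idx], "r") as f:
            image = f["image"][local]
            label = f["label"][local]
        if self.transform is not None:
            image = self.transform(image)
        return image, label
```

Opening the file on every __getitem__() call keeps the class simple and avoids sharing an open file handle between DataLoader worker processes, at the cost of some overhead per sample. If each file is actually a single sample rather than a stack, __len__() would simply be the number of files and the offset lookup could be dropped.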