PyTorch: Dataloader for time series task

I have a Pandas dataframe with n rows and k columns loaded into memory. I would like to get batches for a forecasting task in which each training example has shape (q, k), where q is the number of consecutive rows taken from the original dataframe. The first example would cover rows 0:128, the next rows 128:256, and so on. Ultimately, one batch should have the shape (32, q, k), with 32 being the batch size.

Since TensorDataset from data_utils does not work here, I am wondering what the best approach would be. I tried np.array_split() to obtain an array whose first dimension is the number of possible windows of q rows, with the aim of writing a custom DataLoader, but the reshape is not guaranteed to work because not all of the resulting arrays have the same shape.
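To illustrate the shape problem, here is a minimal NumPy sketch. The sizes (n = 10, k = 3, q = 4) and the variable names are assumptions chosen for illustration only. It shows that splitting every q rows leaves a shorter final chunk, and that trimming the rows that do not fill a complete window makes a plain reshape safe.

```python
import numpy as np

# Hypothetical sizes: 10 rows, 3 columns, windows of 4 rows
n, k, q = 10, 3, 4
data = np.arange(n * k, dtype=np.float32).reshape(n, k)

# Splitting at every q-th row leaves a shorter last chunk when n % q != 0
chunks = np.array_split(data, np.arange(q, n, q))
uneven_shapes = [c.shape for c in chunks]  # the last chunk has only 2 rows

# Dropping the incomplete tail first makes the reshape well defined
n_windows = n // q
windows = data[: n_windows * q].reshape(n_windows, q, k)
print(uneven_shapes, windows.shape)
```

Whether to drop or pad the incomplete tail depends on the task; dropping is simply the shortest option here.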

Here is a minimal example to make this clearer. In this case, the batch size is 3 and q is 2:


The dataset:


The first batch in this case should have the shape (3,2,3) and look like:
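As a sketch of the intended result, the snippet below builds such a batch by hand. The dataframe contents and the column names "a", "b", "c" are made up for illustration and are not the asker's data. Only the batch size of 3, q = 2, and the three columns come from the description above.

```python
import numpy as np
import pandas as pd

batch_size, q = 3, 2

# Made-up data: 12 rows and 3 columns (hypothetical values and column names)
df = pd.DataFrame(np.arange(36).reshape(12, 3), columns=["a", "b", "c"])

# Take consecutive, non-overlapping slices of q rows and stack them
examples = [df.iloc[i * q:(i + 1) * q].to_numpy() for i in range(batch_size)]
first_batch = np.stack(examples)

print(first_batch.shape)
```

Each element along the first axis of first_batch is one (q, k) training example, in the same order as the rows of the dataframe.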



Answer

You can write your own analog of TensorDataset. To do this, you need to inherit from the Dataset class.
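A minimal sketch of such a subclass follows. The class name WindowDataset, the made-up dataframe, and the choice to drop a trailing incomplete window are all assumptions for illustration, not necessarily what the answer originally contained. It relies only on the standard Dataset protocol (`__len__` and `__getitem__`) and on DataLoader's default collation, which stacks equally shaped tensors along a new first dimension.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader


class WindowDataset(Dataset):
    """Hypothetical dataset serving consecutive, non-overlapping windows of q rows."""

    def __init__(self, df: pd.DataFrame, q: int):
        self.q = q
        self.data = torch.as_tensor(df.to_numpy(dtype=np.float32))

    def __len__(self):
        # Number of complete windows; a trailing partial window is dropped
        return len(self.data) // self.q

    def __getitem__(self, idx):
        start = idx * self.q
        return self.data[start:start + self.q]  # shape (q, k)


# Made-up data: 12 rows and 3 columns (hypothetical values and column names)
df = pd.DataFrame(np.arange(36).reshape(12, 3), columns=["a", "b", "c"])
loader = DataLoader(WindowDataset(df, q=2), batch_size=3, shuffle=False)

first_batch = next(iter(loader))
print(first_batch.shape)
```

For an actual forecasting setup, `__getitem__` would typically return a (window, target) pair; how the target is defined is not specified in the question, so it is left out here.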

User contributions licensed under: CC BY-SA