
How to perform operations on very big torch tensors without splitting them

My Task:

I’m trying to calculate the pairwise distance between every pair of samples in two big tensors (for k-Nearest-Neighbours). That is, given a tensor test with shape (b1,c,h,w) and a tensor train with shape (b2,c,h,w), I need || test[i]-train[j] || for every i,j (where both test[i] and train[j] have shape (c,h,w), as those are samples in the batch).

The Problem

Both train and test are very big, so I can’t fit them into RAM.

My current solution

For a start, I did not construct these tensors in one go. As I build them, I split the data tensor into chunks and save each chunk separately to disk, so I end up with files {Testtest_1,...,Testtest_n} and {Traintrain_1,...,Traintrain_m}. Then, in a nested for loop, I load every Testtest_i and Traintrain_j, calculate the distances for that pair of chunks, and save them.

This semi-pseudo-code might explain

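A minimal sketch of the nested loop described above, in Python with PyTorch. The helper names (`save_chunks`, `pairwise_distances`), the chunk sizes, and the file naming are assumptions made for illustration, not the asker's actual code. Each sample is flattened to a vector so that `torch.cdist` returns the L2 distance between whole (c,h,w) samples.

```python
import os
import tempfile

import torch


def save_chunks(data, chunk_size, directory, prefix):
    """Split `data` along dim 0 and save each piece to disk; return the paths."""
    paths = []
    for n, chunk in enumerate(torch.split(data, chunk_size)):
        path = os.path.join(directory, f"{prefix}_{n}.pt")  # hypothetical naming
        torch.save(chunk, path)
        paths.append(path)
    return paths


def pairwise_distances(test_paths, train_paths):
    """Build the (b1, b2) matrix of L2 distances one chunk pair at a time."""
    rows = []
    for test_path in test_paths:
        # Flatten (n, c, h, w) -> (n, c*h*w) so each sample is one vector.
        test_chunk = torch.load(test_path).flatten(start_dim=1)
        row = []
        for train_path in train_paths:
            train_chunk = torch.load(train_path).flatten(start_dim=1)
            row.append(torch.cdist(test_chunk, train_chunk, p=2))
        rows.append(torch.cat(row, dim=1))
    return torch.cat(rows, dim=0)


# Toy-sized demo; real data would be far larger.
torch.manual_seed(0)
test = torch.randn(5, 3, 4, 4)
train = torch.randn(7, 3, 4, 4)
with tempfile.TemporaryDirectory() as tmp:
    test_paths = save_chunks(test, 2, tmp, "test")
    train_paths = save_chunks(train, 3, tmp, "train")
    dist = pairwise_distances(test_paths, train_paths)
```

Only one test chunk and one train chunk are in memory at a time, but this sketch still assembles the full (b1, b2) result. For k-NN on very large inputs it may be preferable to keep only the k smallest distances per test sample (for example with `torch.topk(..., largest=False)`) instead of storing every block.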

What I thought might work

I came across the FAISS repository, in which they explain that this process can be sped up (maybe?) using their solutions, though I’m not quite sure how. Regardless, any approach would help!


Answer

Did you check the FAISS documentation?

If what you need is the L2 norm (torch.cdist uses p=2 by default), then it is quite straightforward. The code below is an adaptation of the FAISS docs to your example:

User contributions licensed under: CC BY-SA