How to perform operations on very big torch tensors without splitting them

Question

My Task: I&#8217;m trying to calculate the pair-wise distance between every two samples in two big tensors (for k-Nearest-Neighbours), That is &#8211; given tensor test with shape (b1,c,h,w) and tensor train with shape (b2,c,h,w), I need || test[i]-train[j] || for every i,j. (where both test[i] and train[j] h…

Accepted Answer

Did you check the FAISS documentation?If what you need is the L2 norm (torch.cidst uses p=2 as default parameter) then it is quite straightforward. Code below is an adaptation of the FAISS docs to your example:import faissimport numpy as npd = 64                           # dimensionnb = 100000                      # database sizenq = 10000                       # nb of queriesnp.random.seed(1234)             # make reproduciblex_test = np.random.random((nb, d)).astype('float32')x_test[:, 0] += np.arange(nb) / 1000.x_train = np.random.random((nq, d)).astype('float32')x_train[:, 0] += np.arange(nq) / 1000.index = faiss.IndexFlatL2(d)   # build the indexprint(index.is_trained)index.add(x_test)                  # add vectors to the indexprint(index.ntotal)k= 100 # take the 100 closest neighborsD, I = index.search(x_train, k)     # actual searchprint(I[:5])                   # neighbors of the 100 first queriesprint(I[-5:])                  # neighbors of the 100 last queries

Advertisement

Answer