I am trying to align two embeddings (textual and visual). For the visual embedding, I am using VGG as the encoder; its output is a 1x1000 embedding. For the textual encoder, I am using a Transformer whose output is shaped 1x712. What I want is to convert both of these vectors to the same dimension, 512.
```python
img_features.shape, txt_features.shape  # (1, 1000), (1, 712)
```
How can I do this in PyTorch? Should I add a final layer to each architecture that maps its output to 512?
Answer
One option is to apply a differentiable PCA operator such as `torch.pca_lowrank`.
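Note that PCA estimates its components from many samples, so a single `(1, 1000)` vector is not enough to fit it. Here is a minimal sketch of the PCA route, assuming you have a batch of image features to fit on (the batch size and tensor names are placeholders):

```python
import torch

# Hypothetical batch of image features; PCA needs many samples
# to estimate components, so a single (1, 1000) vector won't do.
N = 2048
img_features = torch.randn(N, 1000)  # placeholder for real VGG outputs

# Low-rank PCA: V has shape (1000, 512); its columns are principal directions.
U, S, V = torch.pca_lowrank(img_features, q=512, center=True)

# Project the centered features onto the 512 principal components.
img_reduced = (img_features - img_features.mean(dim=0)) @ V[:, :512]
print(img_reduced.shape)  # torch.Size([2048, 512])
```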
Alternatively, an easier solution is to use two fully connected adapter layers to learn two mappings: one for your image features (`1000 -> n`), the other for the textual features (`712 -> n`). Then you can choose a fusion strategy to combine the two features shaped `(1, n)`: either point-wise addition/multiplication (in those cases `n` should be equal to `512`), or concatenation followed by a final learned mapping `n*2 -> 512`, as sketched below.
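Here is a minimal sketch of the adapter approach covering both fusion strategies. The class and argument names are illustrative, not part of any library:

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Maps both modalities to a shared dimension n, then fuses them."""
    def __init__(self, img_dim=1000, txt_dim=712, n=512, fusion="add"):
        super().__init__()
        self.img_adapter = nn.Linear(img_dim, n)  # 1000 -> n
        self.txt_adapter = nn.Linear(txt_dim, n)  # 712  -> n
        self.fusion = fusion
        if fusion == "concat":
            # Concatenation yields a (1, 2n) vector, so learn a
            # final mapping n*2 -> 512 as described above.
            self.final = nn.Linear(2 * n, 512)

    def forward(self, img_features, txt_features):
        img_emb = self.img_adapter(img_features)  # (1, n)
        txt_emb = self.txt_adapter(txt_features)  # (1, n)
        if self.fusion == "add":
            return img_emb + txt_emb  # point-wise addition: requires n == 512
        return self.final(torch.cat([img_emb, txt_emb], dim=-1))

head = FusionHead(fusion="concat")
img_features, txt_features = torch.randn(1, 1000), torch.randn(1, 712)
print(head(img_features, txt_features).shape)  # torch.Size([1, 512])
```

Both adapters (and the optional final layer) are trained jointly with whatever loss you use to align the two modalities.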