I am having trouble understanding something about tensors. For VGG (https://www.tensorflow.org/api_docs/python/tf/keras/applications/VGG16), we start from a batch of colour images of shape (None, 224, 224, 3) and apply 64 2D convolutional filters. The output is a tensor of shape (None, 224, 224, 64), as the model summary shows. However, each filter must process all 3 colour channels, so my intuition tells me the output tensor should have shape (None, 224, 224, 3, 64). Could someone explain why my reasoning is wrong? Thank you for your explanations.
Answer
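Each filter has shape (kernel_height, kernel_width, input_channels), so a single filter already spans all 3 colour channels. At each position, the filter multiplies its weights with the patch under it and sums over height, width, and channels, which yields a single number. With 'SAME' padding (and stride 1), sliding one filter over the input therefore gives an output of shape (input_height, input_width). Stacking the outputs of all the filters gives (input_height, input_width, n_filters).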
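To make the shape argument concrete, here is a minimal sketch of a 'SAME'-padded, stride-1 convolution written in plain Python so that every loop is visible. It is not how Keras implements Conv2D. The names conv2d_same and shape, the toy sizes (a 5x5 RGB image and four 3x3 filters), and the filter layout (n_filters, kernel_height, kernel_width, input_channels) are all assumptions chosen for illustration. Keras stores the same weights as (kernel_height, kernel_width, input_channels, filters).

```python
# Naive "SAME"-padded 2D convolution (stride 1) in plain Python.
# All names and sizes here are illustrative assumptions, not Keras internals.

def conv2d_same(image, filters):
    """image: nested list [H][W][C_in]; filters: list of [kh][kw][C_in] kernels.
    Returns a nested list [H][W][n_filters]."""
    height, width, in_channels = len(image), len(image[0]), len(image[0][0])
    output = [[[0.0] * len(filters) for _ in range(width)] for _ in range(height)]
    for f, kernel in enumerate(filters):
        kh, kw = len(kernel), len(kernel[0])
        pad_h, pad_w = kh // 2, kw // 2  # zero padding, valid for odd kernel sizes
        for y in range(height):
            for x in range(width):
                total = 0.0
                for i in range(kh):
                    for j in range(kw):
                        yy, xx = y + i - pad_h, x + j - pad_w
                        if 0 <= yy < height and 0 <= xx < width:
                            # The sum also runs over the input channels,
                            # so the channel axis does not survive.
                            for c in range(in_channels):
                                total += image[yy][xx][c] * kernel[i][j][c]
                output[y][x][f] = total  # one scalar per position per filter
    return output


def shape(nested):
    """Return the shape of a regular nested list as a tuple."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# Hypothetical toy sizes: a 5x5 RGB image and four 3x3 filters.
image = [[[1.0, 2.0, 3.0] for _ in range(5)] for _ in range(5)]
filters = [[[[1.0, 1.0, 1.0] for _ in range(3)] for _ in range(3)] for _ in range(4)]

result = conv2d_same(image, filters)
print(shape(image))    # (5, 5, 3)
print(shape(filters))  # (4, 3, 3, 3)
print(shape(result))   # (5, 5, 4)
```

The innermost loop over c is where the 3 colour channels are summed away, which is why the output has no separate axis of size 3. Applying the same logic to the shapes in the question, 64 filters that each span 3 input channels give (224, 224, 64) per image.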