I was wondering whether creating the model by passing activity_regularizer='l1_l2'
as an argument to Conv2D()
from tensorflow import keras
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from tensorflow.keras.optimizers import Adam

model = keras.Sequential()
model.add(Conv2D(filters=16, kernel_size=(6, 6), strides=3, padding='valid',
                 activation='relu', activity_regularizer='l1_l2',
                 input_shape=X_train[0].shape))
model.add(Dropout(0.2))
model.add(MaxPooling2D(pool_size=(3, 1), strides=3, padding='valid'))
model.add(Dropout(0.3))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
history = model.fit(X_train, y_train, epochs=10,
                    validation_data=(X_val, y_val), verbose=0)
will make any mathematical difference compared with creating the model by adding model.add(ActivityRegularization(l1=..., l2=...))
separately?
from tensorflow.keras.layers import ActivityRegularization

model = keras.Sequential()
model.add(Conv2D(filters=16, kernel_size=(6, 6), strides=3, padding='valid',
                 activation='relu', input_shape=X_train[0].shape))
model.add(Dropout(0.2))
model.add(ActivityRegularization(l1=some_number, l2=some_number))
model.add(MaxPooling2D(pool_size=(3, 1), strides=3, padding='valid'))
model.add(Dropout(0.3))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
history = model.fit(X_train, y_train, epochs=10,
                    validation_data=(X_val, y_val), verbose=0)
It is hard for me to tell, as training always involves some randomness, but the results seem similar.
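To take the training randomness out of the picture, I suppose one could also compare the penalty the two setups add to the loss on a fixed batch, with the l1/l2 coefficients written out explicitly instead of the 'l1_l2' string (the batch shape and coefficients below are just placeholders):

import numpy as np
from tensorflow import keras
from tensorflow.keras import regularizers
from tensorflow.keras.layers import Conv2D, ActivityRegularization

x = np.random.rand(4, 36, 36, 1).astype('float32')  # placeholder batch

# Regularizer inside the convolution layer, with explicit coefficients
m1 = keras.Sequential([
    Conv2D(16, (6, 6), strides=3, padding='valid', activation='relu',
           activity_regularizer=regularizers.L1L2(l1=0.01, l2=0.01),
           input_shape=x.shape[1:]),
])

# Same convolution, regularizer attached as a separate layer
m2 = keras.Sequential([
    Conv2D(16, (6, 6), strides=3, padding='valid', activation='relu',
           input_shape=x.shape[1:]),
    ActivityRegularization(l1=0.01, l2=0.01),
])

# Copy the weights so both convolutions produce identical outputs
m2.layers[0].set_weights(m1.layers[0].get_weights())

m1(x)  # activity losses are recorded when the model is called
m2(x)
print(sum(m1.losses), sum(m2.losses))  # same penalty in both cases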
One additional question I have is: I accidentally passed the activity_regularizer='l1_l2'
argument to the MaxPooling2D() layer before, and the code ran. How can that be, considering that activity_regularizer
is not listed as an argument of MaxPooling2D() in the TensorFlow documentation?
Answer
Technically, if you are not applying any other transformation to the layer output in between, applying the activity regularizer inside the convolution layer and applying it as a separate layer right after the convolution are the same: both add the same penalty on the layer's output activations to the loss (assuming the l1/l2 coefficients are identical in both cases). However, applying it outside the convolution layer gives the user more flexibility. For instance, the user might want to regularize the output units after a skip connection has been set up, instead of directly after the convolution. It is just like having the activation function inside the convolution layer versus using keras.activations
to apply the activation after the convolution layer. Sometimes this is done after batch normalization.
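As a rough sketch of that flexibility (functional API, with hypothetical shapes and coefficients), the regularizer can be attached to the merged output of a skip connection instead of to the convolution itself:

from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(32, 32, 16))   # hypothetical input shape
x = layers.Conv2D(16, 3, padding='same', activation='relu')(inputs)
x = layers.BatchNormalization()(x)
x = layers.Add()([inputs, x])               # skip connection
# regularize the activations after the residual add, not the raw conv output
x = layers.ActivityRegularization(l1=1e-4, l2=1e-4)(x)
outputs = layers.GlobalAveragePooling2D()(x)
model = keras.Model(inputs, outputs)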
For your second question, the MaxPool2D
layer does accept the activity_regularizer argument. Even though this is not mentioned in its documentation, it makes sense intuitively, since the user might want to regularize the outputs after max-pooling; under the hood the argument is handled by the base Layer class that every Keras layer inherits from. You can check that activity_regularizer
works not only with the MaxPool2D
layer but also with other layers, such as the BatchNormalization
layer, for the same reason.
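A minimal sketch of this (hypothetical shapes and coefficients), showing both layers accepting the argument:

from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    layers.Conv2D(16, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    # accepted via the base Layer kwargs, even though the MaxPooling2D
    # documentation does not list activity_regularizer explicitly
    layers.MaxPooling2D(pool_size=(2, 2),
                        activity_regularizer=regularizers.L1L2(l1=1e-4, l2=1e-4)),
    # works for BatchNormalization as well: the penalty is applied to its outputs
    layers.BatchNormalization(activity_regularizer=regularizers.L1L2(l1=1e-4, l2=1e-4)),
    layers.Flatten(),
    layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()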