
Is passing activity_regularizer as an argument to Conv2D() the same as applying it separately right after Conv2D()? (TensorFlow)

I was wondering whether creating the model by passing activity_regularizer='l1_l2' as an argument to Conv2D()

from tensorflow import keras
from tensorflow.keras.layers import Conv2D, Dense, Dropout, Flatten, MaxPooling2D
from tensorflow.keras.optimizers import Adam

model = keras.Sequential()
model.add(Conv2D(filters=16, kernel_size=(6, 6), strides=3, padding='valid', activation='relu',
                 activity_regularizer='l1_l2', input_shape=X_train[0].shape))
model.add(Dropout(0.2))
model.add(MaxPooling2D(pool_size=(3, 1), strides=3, padding='valid'))
model.add(Dropout(0.3))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))
model.compile(optimizer=Adam(learning_rate=0.001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()
history = model.fit(X_train, y_train, epochs=10, validation_data=(X_val, y_val), verbose=0)

will make any mathematical difference compared to creating the model by adding model.add(ActivityRegularization(l1=..., l2=...)) separately?

from tensorflow import keras
from tensorflow.keras.layers import ActivityRegularization, Conv2D, Dense, Dropout, Flatten, MaxPooling2D
from tensorflow.keras.optimizers import Adam

model = keras.Sequential()
model.add(Conv2D(filters=16, kernel_size=(6, 6), strides=3, padding='valid', activation='relu',
                 input_shape=X_train[0].shape))
model.add(Dropout(0.2))
model.add(ActivityRegularization(l1=some_number, l2=some_number))
model.add(MaxPooling2D(pool_size=(3, 1), strides=3, padding='valid'))
model.add(Dropout(0.3))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))
model.compile(optimizer=Adam(learning_rate=0.001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()
history = model.fit(X_train, y_train, epochs=10, validation_data=(X_val, y_val), verbose=0)

For me, it is hard to tell, as training always involves some randomness. But the results seem similar.

One additional question I have is: I accidentally passed the activity_regularizer='l1_l2' argument to the MaxPooling2D() layer before, and the code ran. How can that be, considering that activity_regularizer is not listed as a possible argument for MaxPooling2D() in the TensorFlow documentation?


Answer

Technically, if you are not applying any other transformation to the layer output, applying the activity regularizer inside the layer and applying it immediately after the convolution layer are mathematically the same. Note that in your second snippet the ActivityRegularization layer sits after a Dropout layer, so at training time the penalty is computed on the dropped-out activations; for exact equivalence, place it directly on the convolution output and use l1 and l2 factors that match the ones the string identifier 'l1_l2' resolves to. That said, applying it outside the convolution layer gives the user more flexibility. For instance, the user might want to regularize the output units after the skip connections are set up instead of directly after the convolution. It is analogous to having an activation function inside the convolution layer versus applying it via keras.activations after the convolution layer; sometimes this is done after batch normalization.
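Here is a minimal sketch of the equivalence. The input shape (64, 64, 1) and the 0.01 factors are illustrative assumptions, not values from the question; the point is that with matching factors and matching weights, both variants add the same penalty to the training loss.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import regularizers
from tensorflow.keras.layers import ActivityRegularization, Conv2D

# Variant 1: regularizer passed inside the layer.
inside = keras.Sequential([
    Conv2D(16, (6, 6), strides=3, activation='relu',
           activity_regularizer=regularizers.l1_l2(l1=0.01, l2=0.01),
           input_shape=(64, 64, 1)),
])

# Variant 2: regularizer applied as a separate layer, directly on the output.
outside = keras.Sequential([
    Conv2D(16, (6, 6), strides=3, activation='relu', input_shape=(64, 64, 1)),
    ActivityRegularization(l1=0.01, l2=0.01),
])

# Copy the conv weights so both models produce identical activations.
outside.layers[0].set_weights(inside.layers[0].get_weights())

x = tf.ones((1, 64, 64, 1))
inside(x)
outside(x)
print(inside.losses)   # one activity-regularization penalty ...
print(outside.losses)  # ... equal to the one added inside the layer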

For your second question, the MaxPool2D layer does accept the activity_regularizer argument. Even though it is not mentioned in the MaxPool2D documentation, activity_regularizer is an argument of the base keras.layers.Layer class, from which every Keras layer inherits. It also makes sense intuitively, since the user might want to regularize the outputs after max-pooling. You can check that activity_regularizer works not only with the MaxPool2D layer but also with other layers such as the BatchNormalization layer, for the same reason.
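A quick way to see this in action (the pool size and input shape below are hypothetical, chosen just for the demonstration): pass activity_regularizer to MaxPooling2D, run the model once, and the penalty shows up in model.losses.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import MaxPooling2D

model = keras.Sequential([
    MaxPooling2D(pool_size=2, activity_regularizer='l1_l2',
                 input_shape=(8, 8, 1)),  # accepted via the base Layer class
])

model(tf.ones((1, 8, 8, 1)))  # call once so the penalty tensor is created
print(model.losses)           # contains the activity-regularization loss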
