I was wondering whether creating the model by passing activity_regularizer='l1_l2' as an argument to Conv2D() makes any mathematical difference compared to adding model.add(ActivityRegularization(l1=…, l2=…)) separately? For me it is hard to tell, as training always involves some randomness, but the results seem similar. One additional question I have: I accidentally passed the activity_regularizer='l1_l2' argument to
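For what it's worth, both routes add the same mathematical term to the training loss: a penalty of l1·Σ|a| + l2·Σa² over the layer's output activations a. Passing activity_regularizer attaches that penalty to the Conv2D output, and an ActivityRegularization layer placed immediately after applies the identical formula to the same tensor, so the penalties coincide when the coefficients match. A minimal numpy sketch of that equivalence (the activation values and the 0.01 coefficients are illustrative assumptions, not necessarily what the 'l1_l2' string resolves to):

```python
import numpy as np

# Illustrative activations standing in for a layer's output tensor.
acts = np.array([0.5, -1.2, 3.0, -0.1])
l1, l2 = 0.01, 0.01  # illustrative coefficients

# Penalty an l1_l2 activity regularizer computes on the layer output:
penalty_inline = l1 * np.abs(acts).sum() + l2 * np.square(acts).sum()

# Penalty an ActivityRegularization(l1=..., l2=...) layer adds for the
# same tensor flowing through it -- the identical formula:
penalty_layer = l1 * np.abs(acts).sum() + l2 * np.square(acts).sum()

print(np.isclose(penalty_inline, penalty_layer))  # both penalties match
```

So any differences you see between the two setups should come down to training randomness, not the regularization term itself.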
L1/L2 regularization in PyTorch
How do I add L1/L2 regularization in PyTorch without manually computing it? Answer See the torch.optim documentation: every built-in optimizer accepts a weight_decay parameter, which applies L2 regularization to the weights (note that in Adam this decay is coupled with the adaptive step sizes; AdamW implements decoupled weight decay instead). There is no corresponding built-in option for L1, so an L1 penalty has to be added to the loss manually.
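A minimal sketch of both pieces, assuming a toy linear model (the sizes and the l1_lambda coefficient are illustrative):

```python
import torch

# Tiny model; layer sizes are illustrative.
model = torch.nn.Linear(4, 1)

# L2 regularization comes from the optimizer's weight_decay argument.
opt = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-2)

x = torch.randn(8, 4)
y = torch.randn(8, 1)
loss = torch.nn.functional.mse_loss(model(x), y)

# L1 has no optimizer switch, so add the penalty to the loss by hand.
l1_lambda = 1e-3  # illustrative coefficient
l1_penalty = sum(p.abs().sum() for p in model.parameters())
loss = loss + l1_lambda * l1_penalty

opt.zero_grad()
loss.backward()
opt.step()
```

Because the L1 term is part of the loss, its subgradient flows through backward() like any other term, while the L2 decay is applied inside the optimizer step.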