Hello, I’m working on object detection with the TensorFlow 2 Object Detection API, training through the model_main_tf2.py file. Normally I can use an early-stopping callback with model.fit(), but when I train with model_main_tf2.py and a pipeline .config file I’m not able to implement it, because I can’t locate any model.fit() call in that file. Is there any way I can implement early stopping for model_main_tf2.py? Please help me.
Link to the file: https://github.com/tensorflow/models/blob/master/research/object_detection/model_main_tf2.py
Answer
I had a look inside the model_main_tf2.py file. Let’s take the following piece of code:
model_lib_v2.train_loop(
    pipeline_config_path=FLAGS.pipeline_config_path,
    model_dir=FLAGS.model_dir,
    train_steps=FLAGS.num_train_steps,
    use_tpu=FLAGS.use_tpu,
    checkpoint_every_n=FLAGS.checkpoint_every_n,
    record_summaries=FLAGS.record_summaries)
Instead of running the training through fit, a custom training loop is used. The code above calls the function that performs the training; model_lib_v2 is just another file of the repo that you’ve linked.
If you have a look at the train_loop function, you’ll see that at some point the following code is executed:
with tf.GradientTape() as tape:
  losses_dict, _ = _compute_losses_and_predictions_dicts(
      detection_model, features, labels,
      training_step=training_step,
      add_regularization_loss=add_regularization_loss)
  losses_dict = normalize_dict(losses_dict, num_replicas)

trainable_variables = detection_model.trainable_variables
total_loss = losses_dict['Loss/total_loss']
gradients = tape.gradient(total_loss, trainable_variables)
GradientTape basically records the forward pass and computes the gradients needed to update the weights of the model during the training phase. I won’t go into much detail, but if you are interested you can have a look at the tf.GradientTape documentation.
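To make the idea concrete, here is a minimal, self-contained sketch of a tape-based update step (a toy model and loss written for illustration, not code from the Object Detection API):

import tensorflow as tf

# Toy example: fit y = w * x with a single trainable weight.
w = tf.Variable(1.0)
x = tf.constant(3.0)
y_true = tf.constant(6.0)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.05)

for step in range(5):
    with tf.GradientTape() as tape:
        y_pred = w * x                      # forward pass
        loss = tf.square(y_true - y_pred)   # scalar loss
    # The tape has recorded the forward pass, so it can differentiate the
    # loss with respect to every trainable variable it watched.
    grads = tape.gradient(loss, [w])
    optimizer.apply_gradients(zip(grads, [w]))
    print(step, float(loss), float(w))

This is exactly the role the GradientTape plays inside train_loop, just with the detection model’s variables and its total loss instead of the toy ones above.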
Now, you are interested in adding an early stopping callback, but you don’t have a fit. You can still add early stopping, but in a different way.
You can follow a strategy like the one below (refer to this TensorFlow tutorial for the full code; helpers such as train_step, test_step and the metric objects are defined there):
epochs = 100
patience = 5  # you can play with these values to obtain the best config
wait = 0
best = float('inf')

for epoch in range(epochs):
    start_time = time.time()

    # training (calling the function that holds the GradientTape)
    for step, (x_batch_train, y_batch_train) in enumerate(ds_train):
        loss_value = train_step(x_batch_train, y_batch_train)

    # updating the metrics after the whole training loop on a single epoch
    train_acc = train_acc_metric.result()
    train_loss = train_loss_metric.result()
    train_acc_metric.reset_states()
    train_loss_metric.reset_states()
    print("Training acc over epoch: %.4f" % (train_acc.numpy(),))

    # evaluating the model just trained in a new epoch, on the validation data
    for x_batch_val, y_batch_val in ds_test:
        test_step(x_batch_val, y_batch_val)

    # updating the metrics for validation
    val_acc = val_acc_metric.result()
    val_loss = val_loss_metric.result()
    val_acc_metric.reset_states()
    val_loss_metric.reset_states()
    print("Validation acc: %.4f" % (float(val_acc),))
    print("Time taken: %.2fs" % (time.time() - start_time))

    # The early stopping strategy: stop the training if `val_loss` does not
    # decrease over a certain number of epochs.
    wait += 1
    if val_loss < best:
        best = val_loss
        wait = 0
    if wait >= patience:
        break
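To tie this back to model_main_tf2.py: one possible way to apply the same idea without rewriting train_loop is to call it in chunks of steps and check your validation metric between chunks, since train_loop resumes from the latest checkpoint in model_dir. The sketch below is only an illustration of that pattern under these assumptions, not code from the repo; in particular, compute_validation_loss is a hypothetical helper you would have to implement yourself (for example by evaluating the latest checkpoint the same way model_main_tf2.py does when given --checkpoint_dir):

from object_detection import model_lib_v2

pipeline_config_path = 'pipeline.config'   # your .config file
model_dir = 'training/'                    # where checkpoints are written
steps_per_round = 1000                     # steps trained between validation checks
max_rounds = 100
patience = 5

def compute_validation_loss(pipeline_config_path, model_dir):
    # Hypothetical placeholder: evaluate the latest checkpoint on your
    # validation set and return a scalar loss (e.g. 'Loss/total_loss').
    raise NotImplementedError

wait = 0
best = float('inf')

for round_idx in range(1, max_rounds + 1):
    # Train up to `round_idx * steps_per_round` total steps; each call picks up
    # from the latest checkpoint saved in model_dir.
    model_lib_v2.train_loop(
        pipeline_config_path=pipeline_config_path,
        model_dir=model_dir,
        train_steps=round_idx * steps_per_round,
        checkpoint_every_n=steps_per_round)

    val_loss = compute_validation_loss(pipeline_config_path, model_dir)

    # Same early stopping bookkeeping as above.
    wait += 1
    if val_loss < best:
        best = val_loss
        wait = 0
    if wait >= patience:
        print('Early stopping after %d rounds' % round_idx)
        break

The alternative is to copy train_loop into your own script and place the early-stopping check directly inside its training loop, which gives you finer control at the cost of maintaining a modified copy.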