I am migrating my training loop to the TensorFlow 2.0 API. In eager execution mode, tf.GradientTape replaces tf.gradients. The question is: do they have the same functionality? Specifically:
- In the function gradient(): is the parameter output_gradients equivalent to grad_ys in the old API?
- What about the parameters colocate_gradients_with_ops, aggregation_method, and gate_gradients of tf.gradients? Are they deprecated due to lack of use? Can they be replaced by other methods in the 2.0 API? Are they needed in eager execution at all?
 
- Is the function jacobian() equivalent to tf.python.ops.parallel_for.gradients?
Answer
Please find the response below.
- Regarding output_gradients and grad_ys: Yes, they can be considered the same.
Detailed Explanation: Info about output_gradients is mentioned in GitHub -> imperative_grad.py, as shown below.
output_gradients: if not None, a list of gradient provided for each Target, or None if we are to use the target’s computed downstream gradient,
Info about grad_ys is mentioned in TF Site as shown below:
grad_ys: is a list of tensors of the same length as ys that holds the initial gradients for each y in ys. When grad_ys is None, we fill in a tensor of ‘1’s of the shape of y for each y in ys. A user can provide their own initial grad_ys to compute the derivatives using a different initial gradient for each y (e.g., if one wanted to weight the gradient differently for each value in each y).
From the above explanations, and from the code below from page 394 of the book Hands-On Machine Learning with Scikit-Learn & TensorFlow,
we can conclude that, just as theta is initialized with a random value in the snippet, a custom initial gradient can be supplied explicitly, using grad_ys in the old API or output_gradients in the new one (a TF 2.0 sketch follows the snippet).
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0), name="theta")
gradients = tf.gradients(mse, [theta])[0]
training_op = tf.assign(theta, theta - learning_rate * gradients)
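For comparison, here is a minimal TF 2.0 sketch (not from the book; the variables are illustrative) showing that a custom initial gradient can be supplied either as grad_ys to tf.gradients or as output_gradients to GradientTape.gradient():

    import tensorflow as tf

    x = tf.Variable([1.0, 2.0, 3.0])
    weight = tf.constant([0.1, 1.0, 10.0])   # custom initial gradient for y

    with tf.GradientTape() as tape:
        y = x * x                            # dy/dx = 2 * x

    # TF 2.0: output_gradients plays the role of grad_ys
    grads = tape.gradient(y, x, output_gradients=weight)
    print(grads)                             # weight * 2 * x -> [0.2, 4.0, 60.0]

    # TF 1.x equivalent (inside a graph/session):
    #   grads = tf.gradients(y, [x], grad_ys=[weight])[0]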
- Regarding colocate_gradients_with_ops: It is not needed for eager execution, as it is related to the control flow context of graphs (see the sketch after the code below).
Detailed Explanation: colocate_gradients_with_ops points to the code below from GitHub -> ops.py. The control flow context is part of the concept of a context, which belongs to graphs, as explained on the TF site under Graphs.
 def _colocate_with_for_gradient(self, op, gradient_uid,
                                  ignore_existing=False):
    with self.colocate_with(op, ignore_existing):
      if gradient_uid is not None and self._control_flow_context is not None:
        self._control_flow_context.EnterGradientColocation(op, gradient_uid)
        try:
          yield
        finally:
          self._control_flow_context.ExitGradientColocation(op, gradient_uid)
      else:
        yield
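As an illustration (a minimal sketch of my own, not part of the TF documentation): in eager execution, gradient ops run immediately, so device placement is controlled directly with tf.device around the tape and the gradient call rather than through colocate_gradients_with_ops:

    import tensorflow as tf

    x = tf.Variable(3.0)
    with tf.device("/CPU:0"):                # explicit placement in eager mode
        with tf.GradientTape() as tape:
            y = x * x
        dy_dx = tape.gradient(y, x)          # gradient ops run under the same device scope
    print(dy_dx)                             # tf.Tensor(6.0, shape=(), dtype=float32)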
- Regarding aggregation_method: The equivalent of this parameter has been implemented in 2.0, named _aggregate_grads, as shown in the GitHub source.
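To illustrate (a minimal sketch of my own): when a source contributes to the target through several paths, GradientTape sums the partial gradients automatically, which is the aggregation that _aggregate_grads performs internally:

    import tensorflow as tf

    x = tf.Variable(2.0)
    with tf.GradientTape() as tape:
        y = x * x + 3.0 * x                  # x is used twice; partial gradients are summed
    print(tape.gradient(y, x))               # 2*x + 3 -> tf.Tensor(7.0, shape=(), dtype=float32)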
- Regarding gate_gradients: Not needed for eager execution, as this is also related to the graph context.
Detailed Explanation: As shown in the code below from GitHub -> gradients_utils.py, if gate_gradients is True, then some operations are added to the graph using the function _colocate_with_for_gradient, which in turn depends on the control flow context of graphs (a tf.function sketch follows the code).
    if gate_gradients and len([x for x in in_grads if x is not None]) > 1:
      with ops.device(None):
        with ops._colocate_with_for_gradient(  # pylint: disable=protected-access
            None, gradient_uid, ignore_existing=True):
          in_grads = control_flow_ops.tuple(in_grads)
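If gate_gradients (or any of the other graph-only parameters) is still required, tf.gradients remains available in TF 2.0 inside graph mode, i.e. within a tf.function. A minimal sketch, assuming the parameter is only meaningful during graph construction:

    import tensorflow as tf

    @tf.function                             # tf.gradients only works while a graph is being built
    def graph_grads(x):
        y = x * x + 3.0 * x
        return tf.gradients(y, [x], gate_gradients=True)[0]

    print(graph_grads(tf.constant(2.0)))     # 2*x + 3 -> tf.Tensor(7.0, shape=(), dtype=float32)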
- Regarding jacobian: Yes, they can be considered the same; GradientTape.jacobian() is built on the parallel_for machinery.
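A minimal sketch comparing the two (note that tensorflow.python.ops.parallel_for is an internal module, so its location may change between versions):

    import tensorflow as tf
    from tensorflow.python.ops.parallel_for import gradients as pfor_gradients

    x = tf.constant([1.0, 2.0, 3.0])

    # Public TF 2.0 API: GradientTape.jacobian
    with tf.GradientTape() as tape:
        tape.watch(x)
        y = x * x
    print(tape.jacobian(y, x))               # 3x3 matrix with 2*x on the diagonal

    # Internal parallel_for jacobian; it builds a graph, so wrap it in a tf.function
    @tf.function
    def pfor_jacobian(x):
        y = x * x
        return pfor_gradients.jacobian(y, x)

    print(pfor_jacobian(x))                  # same 3x3 Jacobian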