I am migrating my training loop to the TensorFlow 2.0 API. In eager execution mode, tf.GradientTape replaces tf.gradients. The question is: do they have the same functionality? Specifically:

In the function gradient():

- Is the parameter output_gradients equivalent to grad_ys in the old API?
- What about the parameters colocate_gradients_with_ops, aggregation_method, gate_gradients of tf.gradients? Are they deprecated due to lack of use? Can they be replaced by other methods in the 2.0 API? Are they needed in Eager Execution at all?

Is the function jacobian() equivalent to tf.python.ops.parallel_for.gradients?
Answer
Please find the response below.
- Regarding output_gradients and grad_ys: Yes, they can be considered the same.
Detailed Explanation: Info about output_gradients is mentioned in Github -> imperative_grad.py, as shown below.
output_gradients: if not None, a list of gradient provided for each Target, or None if we are to use the target’s computed downstream gradient,
Info about grad_ys is mentioned on the TF site, as shown below:
grad_ys: is a list of tensors of the same length as ys that holds the initial gradients for each y in ys. When grad_ys is None, we fill in a tensor of ‘1’s of the shape of y for each y in ys. A user can provide their own initial grad_ys to compute the derivatives using a different initial gradient for each y (e.g., if one wanted to weight the gradient differently for each value in each y).
From the above explanations, and from the code below, from page 394 of the book Hands-On Machine Learning with Scikit-Learn & TensorFlow, we can conclude that the initial value of theta can be a random value, and we can pass that using the parameter output_gradients or grad_ys.
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0), name="theta")
gradients = tf.gradients(mse, [theta])[0]
training_op = tf.assign(theta, theta - learning_rate * gradients)
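As a minimal sketch of the equivalence (the tensors and the weighting value 2.0 below are made up for illustration), the initial gradient that grad_ys supplied in tf.gradients can be supplied through output_gradients in tape.gradient():

import tensorflow as tf

x = tf.constant([[1.0, 2.0]])
w = tf.Variable([[3.0], [4.0]])

with tf.GradientTape() as tape:
    y = tf.matmul(x, w)  # shape (1, 1)

# Weight the upstream gradient by 2.0 instead of the default 1.0;
# output_gradients plays the role that grad_ys played in tf.gradients.
grad = tape.gradient(y, w, output_gradients=tf.constant([[2.0]]))

# Rough 1.x graph-mode equivalent (shown as a comment, since tf.gradients
# does not run under eager execution):
# grads = tf.gradients(y, [w], grad_ys=[tf.constant([[2.0]])])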
- Regarding colocate_gradients_with_ops: Yes, it is not needed for Eager Execution, as it is related to the Control Flow Context of Graphs.
Detailed Explanation: colocate_gradients_with_ops points to the code below, mentioned in Github -> ops.py. Control Flow Context is related to the concept of Context, which is related to Graphs, as explained in TF Site -> Graphs.
def _colocate_with_for_gradient(self, op, gradient_uid, ignore_existing=False):
    with self.colocate_with(op, ignore_existing):
        if gradient_uid is not None and self._control_flow_context is not None:
            self._control_flow_context.EnterGradientColocation(op, gradient_uid)
            try:
                yield
            finally:
                self._control_flow_context.ExitGradientColocation(op, gradient_uid)
        else:
            yield
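For context, here is a minimal sketch (assuming the tf.compat.v1 graph-mode API, with a made-up model) of where this flag used to be passed; tape.gradient() exposes no counterpart, because in eager mode there is no graph whose gradient ops need to be colocated:

import tensorflow as tf

# 1.x-style graph mode; in 2.0 this requires the compat.v1 API and
# disabling eager execution.
tf.compat.v1.disable_eager_execution()

x = tf.compat.v1.placeholder(tf.float32, shape=[None, 2])
w = tf.Variable(tf.ones([2, 1]))
loss = tf.reduce_mean(tf.matmul(x, w))

# Gradient ops are placed on the same devices as the forward ops they
# differentiate.
grads = tf.compat.v1.gradients(loss, [w], colocate_gradients_with_ops=True)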
- Regarding aggregation_method: The equivalent of this parameter has been implemented in 2.0, named _aggregate_grads, as shown in the Github link. (A 1.x usage sketch covering both aggregation_method and gate_gradients follows the code below.)
- Regarding gate_gradients: Not needed for Eager Execution, as this is also related to the Graph Context.
Detailed Explanation: As shown in the code below from Github -> gradients_util.py, if gate_gradients is True, then some operations are added to the graph using the function _colocate_with_for_gradient, which in turn depends on the Control Flow Context of Graphs.
if gate_gradients and len([x for x in in_grads if x is not None]) > 1:
    with ops.device(None):
        with ops._colocate_with_for_gradient(  # pylint: disable=protected-access
            None, gradient_uid, ignore_existing=True):
            in_grads = control_flow_ops.tuple(in_grads)
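As a minimal 1.x graph-mode sketch (assuming the compat.v1 API; the model is made up), both parameters are simply passed to tf.gradients, while tape.gradient() in 2.0 exposes neither of them:

import tensorflow as tf

tf.compat.v1.disable_eager_execution()

x = tf.compat.v1.placeholder(tf.float32, shape=[None, 3])
w = tf.Variable(tf.ones([3, 1]))
loss = tf.reduce_mean(tf.matmul(x, w))

# aggregation_method controls how gradients arriving from multiple paths are
# summed; in 2.0 this is handled internally by _aggregate_grads.
# gate_gradients=True tuples the gradients so none is used before all are
# computed, which only makes sense when building a graph.
grads = tf.compat.v1.gradients(
    loss, [w],
    gate_gradients=True,
    aggregation_method=tf.AggregationMethod.EXPERIMENTAL_TREE)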
- Regarding jacobian: Yes, they are the same.
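As a minimal sketch (the function y = x * x is made up for illustration; the parallel_for import path is internal to TensorFlow and may move between versions), tape.jacobian() gives the full Jacobian that the parallel_for-based helper computes in graph mode:

import tensorflow as tf
# Internal module that tape.jacobian() builds on; shown for comparison.
from tensorflow.python.ops.parallel_for import gradients as pfor_gradients

x = tf.constant([1.0, 2.0, 3.0])

with tf.GradientTape() as tape:
    tape.watch(x)          # x is a constant, so watch it explicitly
    y = x * x              # elementwise square, shape (3,)

# 2.0 eager API: Jacobian dy/dx, shape (3, 3), with 2*x on the diagonal.
jac = tape.jacobian(y, x)

# Graph-mode counterpart (commented out, because pfor_gradients.jacobian
# expects graph tensors rather than eager tensors):
# jac_graph = pfor_gradients.jacobian(y, x)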