Detach the derivative through the Hessian #425

Merged: 19 commits merged into main from bda.2023.01.12.backward-detach on Jan 19, 2023

Conversation

@bamos (Contributor) commented Jan 12, 2023

Motivation and Context

@joeaortiz has found that the implicit derivative derivation assumes the linearized Jacobian is detached, which the code doesn't do. Differentiating the Newton step without detaching it may incorrectly add terms to the derivative. This doesn't impact many cases, including many of the examples, because the derivative through the Jacobian turns out to be zero (since it is the third derivative of the objective). This PR detaches the Jacobian for the cases where that derivative is not zero and adds such a case to the derivative tests. I don't think the detach I've added impacts any other use cases (and all of the tests still pass with it added).
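
For reference, here is a sketch of why the detach matters, in my own notation rather than anything from the code. For a least-squares objective with residual $e(\theta, \phi)$ and Jacobian $J = \partial e / \partial \theta$, the Gauss-Newton step is

$$\delta\theta = -\left(J^\top J\right)^{-1} J^\top e .$$

Backpropagating through this step without detaching $J$ also differentiates through $J$ itself, which brings in second derivatives of $e$ (third derivatives of the objective). The implicit derivative derivation treats $J$ as a constant, so these terms must not appear. When $e$ is affine in the optimization variables they vanish anyway, which is why the missing detach went unnoticed in many of the examples.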

How Has This Been Tested

This issue can be reproduced in the backward pass example by squaring the error to make these higher-order terms appear:

def quad_error_fn(optim_vars, aux_vars):
    a, b = optim_vars
    x, y = aux_vars
    est = a.tensor * x.tensor.square() + b.tensor
    # Squaring the residual is what introduces the higher-order terms.
    err = (y.tensor - est)**2
    return err
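
The "Numeric derivative" rows below come from a finite-difference check. As a generic illustration of that kind of check (my own helper, not the example's code), assuming a scalar-valued objective f and a parameter tensor x:

import torch

def numeric_grad(f, x, eps=1e-4):
    # Central finite differences: g[i] ~ (f(x + eps*e_i) - f(x - eps*e_i)) / (2*eps)
    g = torch.zeros_like(x)
    for i in range(x.numel()):
        d = torch.zeros_like(x).view(-1)
        d[i] = eps
        d = d.view_as(x)
        g.view(-1)[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g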

Output from before this PR (without the detach)

  • Implicit and truncated derivatives are slightly off.
  • The unrolled one is correct since these derivatives are part of the optimization process.
  • DLM doesn't work with the errors squared since it assumes they aren't, so I've left the terms unsquared in the example (but added a test that squares them for all of the other modes). \cc @rtqichen
--- backward_mode=unroll
[ 1.1731514e-03 -7.1603751e-01  1.0089772e-01  2.3340979e-01
  3.5123870e-02 -3.7375614e-01  1.9657709e-02 -1.2930521e+00
  1.8306887e-01 -3.1813970e-01]

--- backward_mode=implicit
[ 1.7856072e-03 -1.0842836e+00  1.5127251e-01  3.5055295e-01
  5.2643321e-02 -5.5838758e-01  2.9702932e-02 -1.9221746e+00
  2.7487200e-01 -4.8069519e-01]

--- backward_mode=truncated, backward_num_iterations=5
[ 1.2274925e-03 -7.4516952e-01  1.0397888e-01  2.4094889e-01
  3.6182307e-02 -3.8375673e-01  2.0420844e-02 -1.3211163e+00
  1.8896094e-01 -3.3034897e-01]

--- backward_mode=dlm
[ 1.1149611e+02 -3.8616781e+04  3.9213835e+03  5.5711709e+03
  4.6857378e+03 -1.3844049e+03  1.6241917e+02 -2.2353323e+05
  5.3557306e+02 -1.4936550e+03]

--- Numeric derivative
[ 1.17315142e-03 -7.16037512e-01  1.00897722e-01  2.33409792e-01
  3.51238698e-02 -3.73756140e-01  1.96577087e-02 -1.29305208e+00
  1.83068871e-01 -3.18139702e-01]

=== Runtimes
Forward: 3.10e-02 s +/- 1.93e-03 s
Backward (unroll): 9.24e-03 s +/- 1.94e-03 s
Backward (implicit) 6.93e-04 s +/- 1.30e-04 s
Backward (truncated, 5 steps) 3.36e-03 s +/- 6.75e-04 s
Backward (dlm) 2.22e-03 s +/- 1.53e-04 s

Output from this PR (with the detach)

  • Implicit and truncated derivatives are corrected; unrolled is unimpacted.
  • Runtime is slightly faster since it's not computing the additional derivatives.
--- backward_mode=unroll
[ 1.1736895e-03 -7.1499848e-01  1.0213172e-01  2.3029083e-01
  3.5493679e-02 -3.8408920e-01  1.9351948e-02 -1.2940415e+00
  1.8987459e-01 -3.1147364e-01]

--- backward_mode=implicit
[ 1.1736881e-03 -7.1499836e-01  1.0213167e-01  2.3029073e-01
  3.5493661e-02 -3.8408923e-01  1.9351928e-02 -1.2940412e+00
  1.8987444e-01 -3.1147367e-01]

--- backward_mode=truncated, backward_num_iterations=5
[ 1.1736895e-03 -7.1499848e-01  1.0213172e-01  2.3029083e-01
  3.5493679e-02 -3.8408920e-01  1.9351948e-02 -1.2940415e+00
  1.8987459e-01 -3.1147364e-01]

--- backward_mode=dlm
[ 1.1149611e+02 -3.8616781e+04  3.9213835e+03  5.5711709e+03
  4.6857378e+03 -1.3844049e+03  1.6241917e+02 -2.2353323e+05
  5.3557306e+02 -1.4936550e+03]

--- Numeric derivative
[ 1.17368950e-03 -7.14998484e-01  1.02131717e-01  2.30290830e-01
  3.54936793e-02 -3.84089202e-01  1.93519481e-02 -1.29404151e+00
  1.89874589e-01 -3.11473638e-01]

=== Runtimes
Forward: 3.07e-02 s +/- 1.32e-03 s
Backward (unroll): 3.50e-03 s +/- 1.47e-04 s
Backward (implicit) 2.91e-04 s +/- 1.23e-05 s
Backward (truncated, 5 steps) 1.23e-03 s +/- 9.79e-05 s
Backward (dlm) 2.15e-03 s +/- 1.80e-04 s

Types of changes

  • I am detaching the Jacobian in the dense and sparse linearization code so that any backend's Jacobian will be detached (see the sketch below).
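
As a rough illustration of what the detach does (a hypothetical helper, not the actual Theseus linearization code): the Jacobian is dropped from the autograd graph before the Gauss-Newton system is formed, so gradients flow only through the residual.

import torch

def detached_gauss_newton_step(err_fn, theta, aux):
    # err_fn(theta, aux) returns the residual vector e; theta requires grad.
    e = err_fn(theta, aux)
    # Residual Jacobian w.r.t. theta, built with a graph so it could in
    # principle be differentiated through...
    J = torch.autograd.functional.jacobian(
        lambda t: err_fn(t, aux), theta, create_graph=True
    )
    # ...but the implicit-derivative derivation treats it as a constant, so it
    # is detached here (the analogue of the change in this PR).
    J = J.detach()
    AtA = J.T @ J  # Gauss-Newton approximation of the Hessian
    Atb = J.T @ e  # gradient term; e still carries its autograd graph
    return theta - torch.linalg.solve(AtA, Atb)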

Discussion point

I believe detaching this will always be the correct mode when implicitly differentiating, and when using the truncated and unrolled modes once an optimal solution is found. However, the detach removes some higher-order terms in the truncated and unrolled modes that may be useful if only a suboptimal solution is found. This issue should only come up when these derivatives are non-zero (e.g., when the error is non-linear), so it may not impact many of the existing use cases. We could consider adding an option to enable these derivatives (and disable the detaches) in case there are some truncated/unrolled settings where they are useful.
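
If such an option were added, it might look roughly like the following. This is a hypothetical sketch, not the actual Theseus API (the follow-up commits below mention a 'detached_hessian' flag for the sparse linearization, but the code here is only illustrative).

import torch

class LinearizationSketch:
    # Hypothetical toggle: detach by default so the implicit/truncated/unrolled
    # gradients match the derivation; keep the autograd graph only when the
    # caller wants the higher-order terms from a suboptimal solution.
    def __init__(self, detach_hessian: bool = True):
        self.detach_hessian = detach_hessian

    def process_jacobian(self, jac: torch.Tensor) -> torch.Tensor:
        return jac.detach() if self.detach_hessian else jac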

@facebook-github-bot added the CLA Signed label on Jan 12, 2023

@joeaortiz (Contributor) left a comment:

Looks good to me. I have verified this with another example I had been investigating, and it now gives the correct gradient expected from theory.

@rtqichen self-requested a review on January 13, 2023

@rtqichen (Contributor) left a comment:

Right, the DLM formula was derived based on a sum-of-squares assumption; it doesn't seem straightforward to generalize to higher powers.

@bamos changed the base branch from main to austinw.collision_shape on January 15, 2023
@bamos changed the base branch from austinw.collision_shape back to main on January 15, 2023
@bamos force-pushed the bda.2023.01.12.backward-detach branch from eb006be to b6e0d32 on January 15, 2023
@bamos changed the title from "Detach the derivative through the Jacobian" to "Detach the derivative through the Hessian" on Jan 18, 2023

@mhmukadam (Contributor) left a comment:

LGTM! Awesome teamwork here: thanks for leading the PR and the updates @bamos, thanks for catching the issue and for the test case @joeaortiz, and thanks for the fixes on the sparse solvers @luisenp.

…in the sparse solvers (#434)

* Added a flag for detaching the Hessian of sparse linearization.

* Created utility to compute A_grad of sparse solvers. Includes support for detaching the contribution of the Hessian.

* Changed all sparse autograd to use compute_A_grad utility.

* Changed sparse solvers to respect sparse linearization 'detached_hessian' flag.

* Changed backward test to also test sparse solvers.

* Fixed incorrect use of GN for truncated.
@luisenp merged commit 78bcaf3 into main on Jan 19, 2023
@luisenp deleted the bda.2023.01.12.backward-detach branch on January 19, 2023