Detach the derivative through the Hessian #425

Merged: 19 commits merged into main from bda.2023.01.12.backward-detach on Jan 19, 2023

Conversation

@bamos (Contributor) commented Jan 12, 2023

Motivation and Context

@joeaortiz has found that the implicit derivative derivation assumes the linearized Jacobian is detached, which the code doesn't do. Differentiating the Newton step without detaching it may incorrectly add terms to the derivative. This doesn't impact many cases, including many of the examples, because the derivative through the Jacobian turns out to be zero (since it is the third derivative of the objective). This PR detaches the Jacobian for the cases where that derivative is not zero and adds such a case to the derivative tests. I don't think the detach I've added impacts any other use cases (and all of the tests still pass with it added).
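
For reference, here is a sketch of why the detach matters, in my own notation rather than anything from the code. For a least-squares objective with residual $e(\theta, \phi)$ and Jacobian $J = \partial e / \partial \theta$, the Gauss-Newton step is

$$\delta\theta = -\left(J^\top J\right)^{-1} J^\top e .$$

Backpropagating through this step without detaching $J$ also differentiates through $J$ itself, which brings in second derivatives of $e$ (third derivatives of the objective). The implicit derivative derivation treats $J$ as a constant, so these terms must not appear. When $e$ is affine in the optimization variables they vanish anyway, which is why the missing detach went unnoticed in many of the examples.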

How Has This Been Tested

This issue can be reproduced in the backward pass example by squaring the error to make these higher-order terms appear:

def quad_error_fn(optim_vars, aux_vars):
    a, b = optim_vars
    x, y = aux_vars
    est = a.tensor * x.tensor.square() + b.tensor
    # Squaring the residual is what introduces the higher-order terms.
    err = (y.tensor - est)**2
    return err
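
The "Numeric derivative" rows below come from a finite-difference check. As a generic illustration of that kind of check (my own helper, not the example's code), assuming a scalar-valued objective f and a parameter tensor x:

import torch

def numeric_grad(f, x, eps=1e-4):
    # Central finite differences: g[i] ~ (f(x + eps*e_i) - f(x - eps*e_i)) / (2*eps)
    g = torch.zeros_like(x)
    for i in range(x.numel()):
        d = torch.zeros_like(x).view(-1)
        d[i] = eps
        d = d.view_as(x)
        g.view(-1)[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g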

Output from before this PR (without the detach)

  • Implicit and truncated derivatives are slightly off.
  • The unrolled one is correct since these derivatives are part of the optimization process.
  • DLM doesn't work with the errors squared since it assumes they aren't, so I've left the terms unsquared in the example (but added a test that squares them for all of the other modes). \cc @rtqichen
--- backward_mode=unroll
[ 1.1731514e-03 -7.1603751e-01  1.0089772e-01  2.3340979e-01
  3.5123870e-02 -3.7375614e-01  1.9657709e-02 -1.2930521e+00
  1.8306887e-01 -3.1813970e-01]

--- backward_mode=implicit
[ 1.7856072e-03 -1.0842836e+00  1.5127251e-01  3.5055295e-01
  5.2643321e-02 -5.5838758e-01  2.9702932e-02 -1.9221746e+00
  2.7487200e-01 -4.8069519e-01]

--- backward_mode=truncated, backward_num_iterations=5
[ 1.2274925e-03 -7.4516952e-01  1.0397888e-01  2.4094889e-01
  3.6182307e-02 -3.8375673e-01  2.0420844e-02 -1.3211163e+00
  1.8896094e-01 -3.3034897e-01]

--- backward_mode=dlm
[ 1.1149611e+02 -3.8616781e+04  3.9213835e+03  5.5711709e+03
  4.6857378e+03 -1.3844049e+03  1.6241917e+02 -2.2353323e+05
  5.3557306e+02 -1.4936550e+03]

--- Numeric derivative
[ 1.17315142e-03 -7.16037512e-01  1.00897722e-01  2.33409792e-01
  3.51238698e-02 -3.73756140e-01  1.96577087e-02 -1.29305208e+00
  1.83068871e-01 -3.18139702e-01]

=== Runtimes
Forward: 3.10e-02 s +/- 1.93e-03 s
Backward (unroll): 9.24e-03 s +/- 1.94e-03 s
Backward (implicit) 6.93e-04 s +/- 1.30e-04 s
Backward (truncated, 5 steps) 3.36e-03 s +/- 6.75e-04 s
Backward (dlm) 2.22e-03 s +/- 1.53e-04 s

Output from this PR (with the detach)

  • Implicit and truncated derivatives are corrected; unrolled is unimpacted.
  • Runtime is slightly faster since it's not computing the additional derivatives.
--- backward_mode=unroll
[ 1.1736895e-03 -7.1499848e-01  1.0213172e-01  2.3029083e-01
  3.5493679e-02 -3.8408920e-01  1.9351948e-02 -1.2940415e+00
  1.8987459e-01 -3.1147364e-01]

--- backward_mode=implicit
[ 1.1736881e-03 -7.1499836e-01  1.0213167e-01  2.3029073e-01
  3.5493661e-02 -3.8408923e-01  1.9351928e-02 -1.2940412e+00
  1.8987444e-01 -3.1147367e-01]

--- backward_mode=truncated, backward_num_iterations=5
[ 1.1736895e-03 -7.1499848e-01  1.0213172e-01  2.3029083e-01
  3.5493679e-02 -3.8408920e-01  1.9351948e-02 -1.2940415e+00
  1.8987459e-01 -3.1147364e-01]

--- backward_mode=dlm
[ 1.1149611e+02 -3.8616781e+04  3.9213835e+03  5.5711709e+03
  4.6857378e+03 -1.3844049e+03  1.6241917e+02 -2.2353323e+05
  5.3557306e+02 -1.4936550e+03]

--- Numeric derivative
[ 1.17368950e-03 -7.14998484e-01  1.02131717e-01  2.30290830e-01
  3.54936793e-02 -3.84089202e-01  1.93519481e-02 -1.29404151e+00
  1.89874589e-01 -3.11473638e-01]

=== Runtimes
Forward: 3.07e-02 s +/- 1.32e-03 s
Backward (unroll): 3.50e-03 s +/- 1.47e-04 s
Backward (implicit) 2.91e-04 s +/- 1.23e-05 s
Backward (truncated, 5 steps) 1.23e-03 s +/- 9.79e-05 s
Backward (dlm) 2.15e-03 s +/- 1.80e-04 s

Types of changes

  • I am detaching the Jacobian in the dense and sparse linearization code so that any backend's Jacobian will be detached (see the sketch below).
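
As a rough illustration of what the detach does (a hypothetical helper, not the actual Theseus linearization code): the Jacobian is dropped from the autograd graph before the Gauss-Newton system is formed, so gradients flow only through the residual.

import torch

def detached_gauss_newton_step(err_fn, theta, aux):
    # err_fn(theta, aux) returns the residual vector e; theta requires grad.
    e = err_fn(theta, aux)
    # Residual Jacobian w.r.t. theta, built with a graph so it could in
    # principle be differentiated through...
    J = torch.autograd.functional.jacobian(
        lambda t: err_fn(t, aux), theta, create_graph=True
    )
    # ...but the implicit-derivative derivation treats it as a constant, so it
    # is detached here (the analogue of the change in this PR).
    J = J.detach()
    AtA = J.T @ J  # Gauss-Newton approximation of the Hessian
    Atb = J.T @ e  # gradient term; e still carries its autograd graph
    return theta - torch.linalg.solve(AtA, Atb)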

Discussion point

I believe detaching this will always be the correct mode when implicitly differentiating, and when using the truncated and unrolled modes once an optimal solution is found. However, the detach removes some higher-order terms in the truncated and unrolled modes that may be useful if only a suboptimal solution is found. This issue should only come up when these derivatives are non-zero (e.g., when the error is non-linear), so it may not impact many of the existing use cases. We could consider adding an option to enable these derivatives (and disable the detaches) in case there are some truncated/unrolled settings where they are useful.
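
If such an option were added, it might look roughly like the following. This is a hypothetical sketch, not the actual Theseus API (the follow-up commits below mention a 'detached_hessian' flag for the sparse linearization, but the code here is only illustrative).

import torch

class LinearizationSketch:
    # Hypothetical toggle: detach by default so the implicit/truncated/unrolled
    # gradients match the derivation; keep the autograd graph only when the
    # caller wants the higher-order terms from a suboptimal solution.
    def __init__(self, detach_hessian: bool = True):
        self.detach_hessian = detach_hessian

    def process_jacobian(self, jac: torch.Tensor) -> torch.Tensor:
        return jac.detach() if self.detach_hessian else jac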

@facebook-github-bot added the CLA Signed label on Jan 12, 2023

@joeaortiz (Contributor) left a comment:

Looks good to me. I have verified this with another example I had been investigating, and it now gives the correct gradient expected from theory.

@rtqichen self-requested a review on January 13, 2023

@rtqichen (Contributor) left a comment:

Right, the DLM formula was derived based on a sum-of-squares assumption; it doesn't seem straightforward to generalize to higher powers.

@bamos changed the base branch from main to austinw.collision_shape on January 15, 2023
@bamos changed the base branch from austinw.collision_shape back to main on January 15, 2023
@bamos force-pushed the bda.2023.01.12.backward-detach branch from eb006be to b6e0d32 on January 15, 2023
@bamos changed the title from "Detach the derivative through the Jacobian" to "Detach the derivative through the Hessian" on Jan 18, 2023

@mhmukadam (Contributor) left a comment:

LGTM! Awesome teamwork here: thanks for leading the PR and the updates @bamos, thanks for catching the issue and for the test case @joeaortiz, and thanks for the fixes on the sparse solvers @luisenp.

…in the sparse solvers (#434)

* Added a flag for detaching the Hessian of sparse linearization.

* Created utility to compute A_grad of sparse solvers. Includes support for detaching the contribution of the Hessian.

* Changed all sparse autograd to use compute_A_grad utility.

* Changed sparse solvers to respect sparse linearization 'detached_hessian' flag.

* Changed backward test to also test sparse solvers.

* Fixed incorrect use of GN for truncated.
@luisenp merged commit 78bcaf3 into main on Jan 19, 2023
@luisenp deleted the bda.2023.01.12.backward-detach branch on January 19, 2023