Detach the derivative through the Hessian #425
Conversation
Looks good to me. Have verified this with another example I had been investigating and it now gives the correct gradient expected from theory.
Right, the DLM formula was derived based on a sum-of-squares assumption. It doesn't seem straightforward to generalize to higher powers.
LGTM! Awesome teamwork here. Thanks for leading the PR and the updates, @bamos; thanks for catching the issue and for the test case, @joeaortiz; and thanks for the fixes on the sparse solvers, @luisenp.
…in the sparse solvers (#434)

* Added a flag for detaching the hessian of sparse linearization.
* Created utility to compute A_grad of sparse solvers. Includes support for detaching the contribution of the hessian.
* Changed all sparse autograd to use compute_A_grad utility.
* Changed sparse solvers to respect sparse linearization 'detached_hessian' flag.
* Changed backward test to also test sparse solvers.
* Fixed incorrect use of GN for truncated.
Motivation and Context
@joeaortiz has found that the implicit derivative derivation assumes the linearized Jacobian is detached, which the code doesn't do. Differentiating the Newton step without it detached may incorrectly add terms to the derivative. This doesn't affect many cases, including many of the examples, because the derivative through the Jacobian turns out to be zero (it is the third derivative of the objective). This PR detaches the Jacobian for the cases where that derivative is non-zero and adds such a case to the derivative tests. I don't think the detach I've added affects any other use cases (and all of the tests still pass with it added).
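For concreteness, here is a minimal plain-PyTorch sketch of a dense Gauss-Newton step with the detach in place. This is only an illustration of the idea, not the actual change inside Theseus' linearization and solver classes: the Jacobian's contribution to the approximate Hessian J^T J is detached so that backpropagation through the step does not pick up these higher-order terms.

```python
import torch


def gauss_newton_step(J: torch.Tensor, err: torch.Tensor) -> torch.Tensor:
    """One dense Gauss-Newton update with the Hessian contribution detached.

    J:   (dim_err, dim_vars) Jacobian of the error at the current iterate.
    err: (dim_err,) error vector at the current iterate.
    """
    # Detach the Jacobian when forming the approximate Hessian J^T J, so
    # autograd does not differentiate through it (the "derivative through
    # the Hessian" in the PR title).
    AtA = J.detach().t() @ J.detach()
    # The right-hand side is left attached in this sketch, so gradients
    # still flow through the error (and its Jacobian) here.
    Atb = J.t() @ err
    delta = torch.linalg.solve(AtA, Atb)
    return delta  # applied as x_new = x - delta
```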
How Has This Been Tested
This issue can be reproduced in the backward pass example by squaring the error to make these higher-order terms appear:
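For example, here is a hedged sketch of a cost function in that spirit, loosely following the cost in Theseus' backward-modes example (the values and names below are illustrative placeholders, not the exact script in the repo). Squaring the residual inside the error function is what makes the derivative through the Jacobian non-zero:

```python
import torch
import theseus as th

# Illustrative data and optimization variables.
x = th.Variable(torch.randn(1, 10), name="x")
y = th.Variable(torch.randn(1, 10), name="y")
a = th.Vector(1, name="a")
b = th.Vector(1, name="b")


def quad_error_fn(optim_vars, aux_vars):
    a, b = optim_vars
    x, y = aux_vars
    err = y.tensor - a.tensor * x.tensor**2 - b.tensor
    return err**2  # squaring the error makes the higher-order terms appear


cost_fn = th.AutoDiffCostFunction(
    [a, b], quad_error_fn, 10, aux_vars=[x, y], name="quadratic_cost"
)
```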
Output from before this PR (without the detach)
Output from this PR (with the detach)
Types of changes
Discussion point
I believe detaching this will always be the correct mode when implicitly differentiating, and also for the truncated and unrolled modes when an optimal solution is found. However, the detach removes some higher-order terms in the truncated and unrolled modes that may be useful if only a suboptimal solution is found. The issue only comes up when these derivatives are non-zero (e.g., when the error is non-linear), so it may not affect many of the existing use cases. We could consider adding an option to enable these derivatives (and disable the detaches) in case there are truncated/unrolled settings where they are useful.
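As a purely hypothetical sketch of what such an opt-out could look like (the squashed commit above adds a 'detached_hessian' flag to the sparse linearization, but the function and flag name below are not Theseus API; they only mark where the choice would take effect in a step like the one sketched under Motivation and Context):

```python
import torch


def gn_hessian(J: torch.Tensor, detach_hessian: bool = True) -> torch.Tensor:
    # detach_hessian=True matches this PR: correct for implicit differentiation,
    # and for truncated/unrolled modes at an optimal solution.
    # detach_hessian=False would keep the higher-order terms that can matter for
    # truncated/unrolled differentiation at a suboptimal solution.
    Jh = J.detach() if detach_hessian else J
    return Jh.t() @ Jh  # approximate Hessian J^T J
```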