
Add diagonal scaling method and re-enable LM adaptive + ellipsoidal #393

Merged — luisenp merged 3 commits from lep.diagonal_scaling into main on Dec 9, 2022

Conversation

luisenp (Contributor) commented on Dec 2, 2022

This adds a linearization diagonal scaling method, with implementations for dense and sparse linearizations. The sparse implementation could be made faster with a custom kernel to avoid a Python loop over rows, but this should be good enough for an initial implementation.
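For illustration, here is a rough sketch of the two paths (simplified, hypothetical signatures, not the merged code): the dense case reduces to a column-wise sum of squares of `A`, and the sparse case shows the kind of Python loop over rows mentioned above.

```python
import torch

def diagonal_scaling_dense(v: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
    # diag(AtA)_j = sum_i A_ij^2, so AtA never needs to be materialized.
    # A: (batch, num_rows, num_cols), v: (batch, num_cols)
    return v * (A**2).sum(dim=1)

def diagonal_scaling_sparse(
    v: torch.Tensor,          # (batch, num_cols)
    A_val: torch.Tensor,      # CSR values, (batch, nnz)
    A_row_ptr: torch.Tensor,  # CSR row pointers, (num_rows + 1,)
    A_col_ind: torch.Tensor,  # CSR column indices, (nnz,)
    num_cols: int,
) -> torch.Tensor:
    diag = v.new_zeros(v.shape[0], num_cols)
    # This Python loop over rows is the part a custom kernel could replace.
    for i in range(len(A_row_ptr) - 1):
        start, end = int(A_row_ptr[i]), int(A_row_ptr[i + 1])
        cols = A_col_ind[start:end]
        # Column indices within a CSR row are unique, so the indexed
        # in-place add accumulates each row's squared entries correctly.
        diag[:, cols] += A_val[:, start:end] ** 2
    return v * diag
```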

@luisenp added the "enhancement" (New feature or request) label on Dec 2, 2022
@luisenp requested a review from fantaosha on Dec 2, 2022
@luisenp self-assigned this on Dec 2, 2022
@facebook-github-bot added the "CLA Signed" label on Dec 2, 2022
fantaosha (Contributor) left a comment:

LGTM. Only some minor comments.

theseus/optimizer/dense_linearization.py (inline comment resolved)
```python
assert isinstance(self._damping, torch.Tensor)
damping = self._damping.view(-1, 1)
if ellipsoidal_damping:
    damping = linearization.diagonal_scaling(damping)
# Deliberately using Atb before updating the variables, according to
# the LM reference above
den = (delta * (damping * delta + linearization.Atb.squeeze(2))).sum(dim=1) / 2
```
fantaosha (Contributor) commented:

In trust region methods, `den` is also called the predicted reduction. In addition, a test could be added for `den`, as we discussed before.
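For reference, a short derivation of why the `den` above is the predicted reduction (standard LM notation assumed: the step $\delta$ solves $(A^\top A + \lambda D)\,\delta = A^\top b$ with damping matrix $D$):

```latex
% Quadratic model of the cost at the current iterate:
%   m(\delta) = \tfrac{1}{2}\lVert A\delta - b \rVert^2
% Predicted reduction, using the LM normal equations
%   A^\top A\,\delta = A^\top b - \lambda D\,\delta:
\begin{align*}
m(0) - m(\delta)
  &= \delta^\top A^\top b - \tfrac{1}{2}\,\delta^\top A^\top A\,\delta \\
  &= \delta^\top A^\top b
     - \tfrac{1}{2}\,\delta^\top\bigl(A^\top b - \lambda D\,\delta\bigr) \\
  &= \tfrac{1}{2}\,\delta^\top\bigl(\lambda D\,\delta + A^\top b\bigr),
\end{align*}
% which matches den = (delta * (damping * delta + Atb)).sum(dim=1) / 2,
% with damping * delta playing the role of \lambda D \delta.
```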

luisenp (Contributor, Author) replied:

I'll add this to the TrustRegion class in the Dogleg PR (remind me if I forget). We'll unify these two in a later PR.

theseus/optimizer/sparse_linearization.py (outdated; inline comment resolved)
luisenp (Contributor, Author) commented on Dec 3, 2022

@fantaosha That's correct. Out of our 3 sparse solvers, 2 don't need to explicitly construct AtA. The only exception is LUCudaSparseSolver, which does it here. But note that this class has no access to our linearization structures. Also note that calling lin.AtA on a sparse linearization actually raises NotImplementedError. I'd prefer to leave it like this for now, until we have an actual use for such a method.

On the other hand, your question made me realize that caching the solution of linear solvers could also be useful now that we are allowing retries. I'll add this in a later PR.
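As a rough sketch of that caching idea (hypothetical class and names; per the commit message below, the merged PR only caches diag(AtA)), the cache would be invalidated whenever the system is relinearized:

```python
from typing import Optional

import torch

class CachedDiagScaling:
    # Hypothetical sketch: memoize diag(AtA) so that repeated damping
    # retries within one LM iteration don't recompute it; invalidate
    # whenever the linearization is refreshed (i.e., A changes).
    def __init__(self) -> None:
        self._diag: Optional[torch.Tensor] = None

    def get(self, A: torch.Tensor) -> torch.Tensor:
        if self._diag is None:
            # Dense case for brevity: diag(AtA) is the column-wise sum
            # of squares of A, shape (batch, num_cols).
            self._diag = (A**2).sum(dim=1)
        return self._diag

    def invalidate(self) -> None:
        self._diag = None
```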

fantaosha (Contributor) left a comment:

LGTM. Only one minor comment.

theseus/optimizer/sparse_linearization.py (outdated; inline comment resolved)
theseus/optimizer/dense_linearization.py (inline comment resolved)
@luisenp force-pushed the lep.diagonal_scaling branch from 4bbea0f to 9dd36ba on Dec 5, 2022
@luisenp force-pushed the lep.sparse_matrix_vector_prod branch from 146e0bf to 1d703b1 on Dec 5, 2022
@luisenp force-pushed the lep.diagonal_scaling branch 3 times, most recently from 122798c to 8c5c348 on Dec 5, 2022
@luisenp force-pushed the lep.sparse_matrix_vector_prod branch from 4a78a03 to be37500 on Dec 8, 2022
@luisenp force-pushed the lep.diagonal_scaling branch from 8c5c348 to bed3dab on Dec 8, 2022
@luisenp force-pushed the lep.sparse_matrix_vector_prod branch from be37500 to 1e9dcb9 on Dec 8, 2022
Base automatically changed from lep.sparse_matrix_vector_prod to main on Dec 8, 2022
@luisenp force-pushed the lep.diagonal_scaling branch from bed3dab to b4535c7 on Dec 8, 2022
@luisenp merged commit 1d3ef23 into main on Dec 9, 2022
@luisenp deleted the lep.diagonal_scaling branch on Dec 9, 2022
mhmukadam (Contributor) left a comment:

LGTM!

suddhu pushed a commit to suddhu/theseus that referenced this pull request on Jan 21, 2023:

Add diagonal scaling method and re-enable LM adaptive + ellipsoidal (facebookresearch#393)

* Add a linearization method to do hessian scaling.
* Add caching for diag(AtA).