Ridge Regularization in linear solve: consistency #131

Open
Algue-Rythme opened this issue Dec 16, 2021 · 0 comments

It will be easier to discuss this on GitHub than in an internal Google Doc.

The issue

Current state of API:

| Function | Without ridge | With ridge r > 0 | Remark |
| --- | --- | --- | --- |
| solve_cg | Ax=b | (A+rI)x=b | well posed because A is PSD |
| solve_gmres | Ax=b | (A+rI)x=b | ill-posed if A=-rI |
| solve_bicgstab | Ax=b | (A+rI)x=b | ill-posed if A=-rI |
| solve_normal_cg | A^TAx=A^Tb | (A^TA+rI)x=A^Tb | well posed because A^TA is PSD |

There are consistency issues here. With ridge regularization, we would expect solve_normal_cg to solve (A^T+rI)(A+rI)x=(A^T+rI)b, i.e. the normal equations of the regularized system (A+rI)x=b. As it stands, solve_cg, solve_gmres and solve_bicgstab are interchangeable when r > 0, but none of them is interchangeable with solve_normal_cg. Worse: when r=0, all four are interchangeable with each other (at least for PD matrices).
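To make the inconsistency concrete, here is a minimal NumPy sketch. It does not call the library solvers: it assumes each solver converges to the exact solution of the system listed above and uses dense `np.linalg.solve` as a stand-in. The helper names (`solve_shifted`, `solve_normal_ridge`, `solve_normal_shifted`) are hypothetical and only exist for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)  # symmetric positive definite, so all four systems apply
b = rng.standard_normal(n)
I = np.eye(n)


def solve_shifted(A, b, r):
    """(A + rI) x = b: the system targeted by solve_cg / solve_gmres / solve_bicgstab."""
    return np.linalg.solve(A + r * I, b)


def solve_normal_ridge(A, b, r):
    """(A^T A + rI) x = A^T b: the system currently targeted by solve_normal_cg."""
    return np.linalg.solve(A.T @ A + r * I, A.T @ b)


def solve_normal_shifted(A, b, r):
    """(A^T + rI)(A + rI) x = (A^T + rI) b: normal equations of the shifted system."""
    return np.linalg.solve((A.T + r * I) @ (A + r * I), (A.T + r * I) @ b)


# Without ridge, the shifted and normal-equation systems share the same solution.
agree_without_ridge = np.allclose(solve_shifted(A, b, 0.0), solve_normal_ridge(A, b, 0.0))

# With ridge, the current solve_normal_cg system has a different solution...
r = 0.5
agree_with_ridge = np.allclose(solve_shifted(A, b, r), solve_normal_ridge(A, b, r))

# ...whereas the normal equations of the shifted system recover the shifted solution.
consistent_variant_matches = np.allclose(solve_shifted(A, b, r), solve_normal_shifted(A, b, r))

print(agree_without_ridge, agree_with_ridge, consistent_variant_matches)
```

The three flags show the pattern described above: agreement at r=0, disagreement between the A+rI and A^TA+rI formulations at r > 0, and agreement again once the normal equations are formed from A+rI.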

Discussion:

Tikhonov regularization regularizes with A^TA+rI, just like solve_normal_cg. This guarantees a well-posed problem.

Other observation: most scikit-learn solvers for Ridge regression use the A^TA+rI trick.
No one uses A+rI on a general matrix A: in general, it only makes sense to do so on a PSD matrix.
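A small NumPy check of why A^TA+rI is the safe choice, assuming r > 0. It looks at the degenerate case A=-rI from the table, plus one random non-symmetric matrix `G` (a made-up example, not taken from any test suite): the eigenvalues of A^TA+rI are bounded below by r, while A+rI can collapse to the zero matrix.

```python
import numpy as np

r = 0.1
n = 3
I = np.eye(n)

# Degenerate case from the table: A = -rI.
A = -r * I
shifted = A + r * I          # zero matrix, so (A + rI) x = b is ill-posed
tikhonov = A.T @ A + r * I   # (r^2 + r) I, still positive definite

rank_shifted = np.linalg.matrix_rank(shifted)
min_eig_tikhonov = np.linalg.eigvalsh(tikhonov).min()

# General (non-symmetric) matrix: G^T G is PSD, so G^T G + rI has eigenvalues >= r.
rng = np.random.default_rng(0)
G = rng.standard_normal((n, n))
min_eig_random = np.linalg.eigvalsh(G.T @ G + r * I).min()

print(rank_shifted, min_eig_tikhonov, min_eig_random)
```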

Solution

Two solutions:

  1. Change (A^TA+rI)x=A^Tb into (A^T+rI)(A+rI)x=(A^T+rI)b in solve_normal_cg. In this case, however, the problem is ill-posed for A=-rI.
  2. Remove ridge regularization from gmres/bicgstab, because the regularization can currently produce a matrix with a worse condition number than A (unexpected behavior when regularizing a system). This happens in particular for NSD matrices.
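The conditioning claim in the second option can be checked on a tiny example. The sketch below uses a hand-picked 2x2 negative definite matrix (a special case of NSD, chosen only for illustration) and compares condition numbers before and after adding the ridge term, for both the A+rI and the A^TA+rI formulations.

```python
import numpy as np

A = np.diag([-1.0, -0.1])  # negative definite, eigenvalues -1 and -0.1
I = np.eye(2)


def cond_shifted(r):
    """Condition number of A + rI (ridge as applied by gmres/bicgstab per the table)."""
    return np.linalg.cond(A + r * I)


def cond_tikhonov(r):
    """Condition number of A^T A + rI (Tikhonov-style ridge)."""
    return np.linalg.cond(A.T @ A + r * I)


r = 0.05
# Shifting pushes the eigenvalue -0.1 toward zero, so conditioning gets worse.
shift_makes_it_worse = cond_shifted(r) > cond_shifted(0.0)
# Adding rI to the PSD matrix A^T A lifts its smallest eigenvalue, so conditioning improves.
tikhonov_makes_it_better = cond_tikhonov(r) < cond_tikhonov(0.0)

print(cond_shifted(0.0), cond_shifted(r))
print(cond_tikhonov(0.0), cond_tikhonov(r))
```

Here the eigenvalues of A+rI are -0.95 and -0.05, so the ratio of largest to smallest magnitude grows relative to A, which is the "unexpected behavior" mentioned above.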

I am in favor of the second option, to remain consistent with the literature, unless we can prove that the A+rI approach makes sense.
