You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
It will be easier to discuss this on github rather than internal Google doc.
The issue
Current state of API:
Function
Without ridge
With ridge r > 0
Remark
solve_cg
Ax=b
(A+rI)=b
well posed because A is PSD
solve_gmres
Ax=b
(A+rI)=b
ill-posed if A=-rI
solve_bicgstab
Ax=b
(A+rI)=b
ill-posed if A=-rI
solve_normal_cg
A^TAx=A^Tb
(A^TA+rI)x=A^Tb
well posed because A^TA is PSD
There are consistency issues here: with ridge regularization we expect (A^T+rI)(A+rI)x=(A^T+rI)b for solve_normal_cg. Consequently all of solve_cg, solve_gmres and solve_bicgstab are interchangeable when r > 0, but not with solve_nornal_cg. Worse: when r=0 they are all interchangeable with each other (at least for PD matrices).
Discussion:
Tikhonov regularization regularizes with A^TA+rI - just like solve_cg. This guarantees a well posed problem.
Other observation: most solvers of Sklearn for Ridge regression uses the A^TA+rI trick.
No one uses A+rI on a general matrix A: it only makes sense to do so on PSD matrix in general.
Solution
Two solutions:
Change (A^TA+rI)x=A^Tb into (A^T+rI)(A+rI)x=(A^T+rI)b, but in this case the problem is ill-posed for A=-rI.
Remove ridge regularization from gmres/bicgstab because currently the regularization may lead to matrix with worse condition number (unexpected behavior when we regularize a system). This happens in particular for NSD matrices.
I am in favor of the second option to remain consistent with literature; unless we can prove that the A+rI approach makes sense.
The text was updated successfully, but these errors were encountered:
It will be easier to discuss this on github rather than internal Google doc.
The issue
Current state of API:
There are consistency issues here: with ridge regularization we expect
(A^T+rI)(A+rI)x=(A^T+rI)b
forsolve_normal_cg
. Consequently all ofsolve_cg
,solve_gmres
andsolve_bicgstab
are interchangeable whenr > 0
, but not withsolve_nornal_cg
. Worse: whenr=0
they are all interchangeable with each other (at least for PD matrices).Discussion:
Tikhonov regularization
regularizes with
A^TA+rI
- just likesolve_cg
. This guarantees a well posed problem.Other observation: most solvers of Sklearn for Ridge regression uses the
A^TA+rI
trick.No one uses
A+rI
on a general matrix A: it only makes sense to do so on PSD matrix in general.Solution
Two solutions:
(A^TA+rI)x=A^Tb
into(A^T+rI)(A+rI)x=(A^T+rI)b
, but in this case the problem is ill-posed forA=-rI
.I am in favor of the second option to remain consistent with literature; unless we can prove that the
A+rI
approach makes sense.The text was updated successfully, but these errors were encountered: