Lanczos Method Seminar for Eigenvalue Reading Group
Andre Leger
1 Introduction and Notation
• Eigenvalue Problem: $Ax = \lambda x$, $A \in \mathbb{C}^{N \times N}$, $x \in \mathbb{C}^N$.
• Throughout we take $A$ symmetric with real entries, $A = A^T$, so that $\lambda \in \mathbb{R}$.
• The vectors $q_i$ are orthonormal if
1. $q_i^T q_i = 1$,
2. $\|q_i\|_2 = 1$,
3. $q_i^T q_j = 0$ for $i \neq j$,
4. $Q^T = Q^{-1}$, where $Q = [q_1, \ldots, q_N]$.
2 A Reminder of the Power Method
• We recall that the Power Method is used to find the eigenvector associated with the eigenvalue of largest magnitude.
• Simply iterate $x^{k+1} = c\,Ax^k$, where $c$ is a normalisation constant to prevent $x^{k+1}$ from growing large.
• As $k \to \infty$, $x^{k+1} \to v_1$, the eigenvector associated with the eigenvalue $\lambda_1$, where $|\lambda_1| > |\lambda_2| \ge |\lambda_3| \ge \cdots \ge |\lambda_N|$.
• We obtain the maximum eigenvalue by the Rayleigh Quotient (a short sketch in code follows at the end of this section):
$$R(A, x^k) = \frac{(x^k)^T A x^k}{\|x^k\|_2^2}$$
• Why don’t we just use the QR method? Well, if $A$ is sparse, applying an iteration of the QR approach does not maintain the sparsity of the matrix. INEFFICIENT.
• Note: We only find ONE eigenvector and eigenvalue. What if we want more?
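To make this concrete, here is a minimal sketch of the Power Method with the Rayleigh Quotient in Python/NumPy (my own illustration, not part of the original notes; the test matrix, seed, and iteration count are arbitrary choices):

```python
import numpy as np

def power_method(A, num_iters=200, seed=0):
    """Power iteration x^{k+1} = c A x^k with c chosen so ||x^{k+1}||_2 = 1."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    for _ in range(num_iters):
        x = A @ x
        x /= np.linalg.norm(x)        # the normalisation constant c
    rayleigh = (x @ A @ x) / (x @ x)  # R(A, x) = x^T A x / ||x||_2^2
    return rayleigh, x

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])            # small symmetric test matrix
lam, v = power_method(A)
print(lam)                            # ~4.618, the dominant eigenvalue
```

Note how only the current iterate is kept; the next section asks what we gain by saving all of them.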
3 The Idea Behind Lanczos Method
• Let’s follow the Power Method, but save each iterate, such that we obtain
$$v, \; Av, \; A^2 v, \; \ldots, \; A^{k-1} v$$
• These vectors form the Krylov Space
$$\mathcal{K}_k(A, v) = \operatorname{span}\{v, \; Av, \; A^2 v, \; \ldots, \; A^{k-1} v\}$$
• So after $n$ iterations
$$v, \; Av, \; \ldots, \; A^{n-1} v$$
are linearly independent, and the sought eigenvector $x$ can be formed from this space.
• By the Power Method, the $n$-th iterate tends to an eigenvector, hence the sequence becomes (numerically) linearly dependent; but we want a sequence of linearly independent vectors.
• Hence we orthogonalise the vectors; this is the basis of the Lanczos Method (see the sketch below).
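A minimal sketch of this idea (again my own illustration): save the power-method iterates and orthogonalise them by Gram-Schmidt to obtain an orthonormal basis of the Krylov space. For numerical stability each new iterate is generated from the orthogonalised vector, which spans the same space in exact arithmetic:

```python
import numpy as np

def krylov_basis(A, v, k):
    """Orthonormal basis of K_k(A, v) = span{v, Av, ..., A^{k-1} v},
    built by Gram-Schmidt on the saved power-method iterates."""
    Q = []
    w = v.astype(float)
    for _ in range(k):
        for q in Q:                       # orthogonalise against basis so far
            w = w - (q @ w) * q
        w = w / np.linalg.norm(w)
        Q.append(w)
        w = A @ w                         # next iterate of the sequence
    return np.column_stack(Q)

A = np.diag([1.0, 2.0, 3.0, 4.0])
Q = krylov_basis(A, np.ones(4), 3)
print(np.allclose(Q.T @ Q, np.eye(3)))    # True: columns are orthonormal
```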
4 Lanczos Method
• Assume we have orthonormal vectors
$$q_1, q_2, \ldots, q_N$$
• Simply let $Q = [q_1, q_2, \ldots, q_k]$, hence
$$Q^T Q = I$$
• We want to reduce $A$ to a tridiagonal matrix $T$ by applying a similarity transformation:
$$Q^T A Q = T \quad \text{or} \quad AQ = QT$$
• So we define $T$ to be
$$T_{k+1,k} = \begin{pmatrix}
\alpha_1 & \beta_1 & & & \\
\beta_1 & \alpha_2 & \beta_2 & & \\
 & \beta_2 & \alpha_3 & \ddots & \\
 & & \ddots & \ddots & \beta_{k-1} \\
 & & & \beta_{k-1} & \alpha_k \\
 & & & & \beta_k
\end{pmatrix} \in \mathbb{C}^{(k+1) \times k}$$
• After $k$ steps we have $AQ_k = Q_{k+1} T_{k+1,k}$ for $A \in \mathbb{C}^{N \times N}$, $Q_k \in \mathbb{C}^{N \times k}$, $Q_{k+1} \in \mathbb{C}^{N \times (k+1)}$, $T_{k+1,k} \in \mathbb{C}^{(k+1) \times k}$.
• We observe that
$$AQ_k = Q_{k+1} T_{k+1,k} = Q_k T_{k,k} + \beta_k q_{k+1} e_k^T$$
• Now $AQ = QT$, hence
$$A\,[q_1, q_2, \ldots, q_k] = [q_1, q_2, \ldots, q_k]\; T_k$$
• The first column of the left-hand side matrix is given by
$$Aq_1 = \alpha_1 q_1 + \beta_1 q_2$$
• The $i$th column by
$$Aq_i = \beta_{i-1} q_{i-1} + \alpha_i q_i + \beta_i q_{i+1}, \qquad (\dagger) \qquad i = 2, \ldots$$
• We wish to find the alphas and betas, so multiply $(\dagger)$ by $q_i^T$, so that
$$\begin{aligned}
q_i^T A q_i &= q_i^T \beta_{i-1} q_{i-1} + q_i^T \alpha_i q_i + q_i^T \beta_i q_{i+1} \\
&= \beta_{i-1} q_i^T q_{i-1} + \alpha_i q_i^T q_i + \beta_i q_i^T q_{i+1} \\
&= \alpha_i q_i^T q_i = \alpha_i,
\end{aligned}$$
by orthonormality of the $q_i$.
• We obtain $\beta_i$ by rearranging $(\dagger)$ into the recurrence formula
$$r_i \equiv \beta_i q_{i+1} = A q_i - \alpha_i q_i - \beta_{i-1} q_{i-1}$$
• We assume $\beta_i \neq 0$ and so $\beta_i = \|r_i\|_2$.
• We may now determine the next orthonormal vector
$$q_{i+1} = \frac{r_i}{\beta_i}.$$
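To check this derivation numerically, the following sketch (my own addition; the helper name and test matrix are arbitrary) runs the recurrence $(\dagger)$ and verifies that $Q^T A Q$ equals the tridiagonal matrix built from the computed $\alpha_i$ and $\beta_i$:

```python
import numpy as np

def lanczos_coeffs(A, q1, k):
    """Run the recurrence (dagger): r_i = A q_i - alpha_i q_i - beta_{i-1} q_{i-1},
    with alpha_i = q_i^T A q_i, beta_i = ||r_i||_2, q_{i+1} = r_i / beta_i."""
    n = A.shape[0]
    Q = np.zeros((n, k))
    alpha, beta = np.zeros(k), np.zeros(k - 1)
    Q[:, 0] = q1 / np.linalg.norm(q1)
    for i in range(k):
        r = A @ Q[:, i]
        alpha[i] = Q[:, i] @ r                 # alpha_i = q_i^T A q_i
        r -= alpha[i] * Q[:, i]
        if i > 0:
            r -= beta[i - 1] * Q[:, i - 1]
        if i < k - 1:
            beta[i] = np.linalg.norm(r)        # beta_i = ||r_i||_2 (assumed nonzero)
            Q[:, i + 1] = r / beta[i]          # next orthonormal vector
    return Q, alpha, beta

rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6))
A = (M + M.T) / 2                              # symmetric test matrix
Q, alpha, beta = lanczos_coeffs(A, rng.standard_normal(6), 4)
T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
print(np.allclose(Q.T @ A @ Q, T))             # True: Q^T A Q is tridiagonal
```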
5 A Little Proof - Omit from Seminar
Lemma: All vectors $q_{i+1}$ generated by the three-term recurrence $(\dagger)$ are orthogonal to all $q_k$ for $k < i$.
Proof
• We assume $q_{i+1}^T q_i = 0 = q_{i+1}^T q_{i-1}$ and, by the induction step, $q_i^T q_k = 0$ for $k < i$.
• We prove $q_{i+1}^T q_k = 0$ for $k < i$.
• Multiply $(\dagger)$ by $q_k$ for $k \le i - 2$ and we show $q_k$, $q_i$ are $A$-orthogonal. Hence
$$\begin{aligned}
q_k^T A q_i &= q_k^T A^T q_i = (A q_k)^T q_i \\
&= (\beta_{k-1} q_{k-1} + \alpha_k q_k + \beta_k q_{k+1})^T q_i \\
&= \beta_{k-1} q_{k-1}^T q_i + \alpha_k q_k^T q_i + \beta_k q_{k+1}^T q_i \\
&= 0 + 0 + 0 = 0
\end{aligned}$$
• Now multiply $(\dagger)$ by $q_k$ so that
$$q_k^T A q_i = \beta_{i-1} q_k^T q_{i-1} + \alpha_i q_k^T q_i + \beta_i q_k^T q_{i+1}$$
Rearranging we obtain
$$\beta_i q_k^T q_{i+1} = q_k^T A q_i - \beta_{i-1} q_k^T q_{i-1} - \alpha_i q_k^T q_i = 0,$$
and since $\beta_i \neq 0$ this gives $q_{i+1}^T q_k = 0$.
6 The Lanczos Algorithm
Initialise: choose $r = q_0$ and let $\beta_0 = \|r\|_2$
Begin Loop: for $j = 1, 2, \ldots$
    $q_j = r / \beta_{j-1}$
    $r = A q_j$
    $r = r - q_{j-1} \beta_{j-1}$
    $\alpha_j = q_j^T r$
    $r = r - q_j \alpha_j$
    Orthogonalise if necessary
    $\beta_j = \|r\|_2$
    Compute approximate eigenvalues of $T_j$
    Test convergence (see remarks)
End Loop
(A runnable sketch of this loop follows.)
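A direct transcription of the loop above into Python/NumPy (a minimal sketch of my own; here “orthogonalise if necessary” is realised as full reorthogonalisation against all previous $q_j$, and the convergence test of Remarks 1 is left out):

```python
import numpy as np

def lanczos(A, q0, m):
    """Lanczos iteration following the loop above.
    Returns Q_m, the tridiagonal T_m and beta_m (for the residual test)."""
    n = len(q0)
    Q = np.zeros((n, m + 1))                  # column 0 plays the role of q_0 = 0
    alpha, beta = np.zeros(m), np.zeros(m + 1)
    r = q0.astype(float)
    beta[0] = np.linalg.norm(r)               # beta_0 = ||r||_2
    for j in range(1, m + 1):
        Q[:, j] = r / beta[j - 1]             # q_j = r / beta_{j-1}
        r = A @ Q[:, j]
        r -= Q[:, j - 1] * beta[j - 1]
        alpha[j - 1] = Q[:, j] @ r            # alpha_j = q_j^T r
        r -= Q[:, j] * alpha[j - 1]
        r -= Q[:, 1:j] @ (Q[:, 1:j].T @ r)    # "orthogonalise if necessary"
        beta[j] = np.linalg.norm(r)           # beta_j = ||r||_2
    T = np.diag(alpha) + np.diag(beta[1:m], 1) + np.diag(beta[1:m], -1)
    return Q[:, 1:], T, beta[m]
```

The eigenvalues of the returned $T$, e.g. via `np.linalg.eigvalsh(T)`, then approximate the extreme eigenvalues of $A$, as discussed next.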
7 Remarks 1: Finding the Eigenvalues and Eigenvectors
• So how do we find the eigenvalues and eigenvectors?
• If $\beta_k = 0$ then
1. We diagonalise the matrix $T_k$ using the simple QR method to find the exact eigenvalues:
$$T_k = S_k \operatorname{diag}(\lambda_1, \ldots, \lambda_k)\, S_k^T,$$
where the matrix $S_k$ is orthogonal, $S_k S_k^T = I$.
2. The exact eigenvectors are given correspondingly in the columns of the matrix $Y$, where
$$S_k^T Q_k^T A Q_k S_k = \operatorname{diag}(\lambda_1, \ldots, \lambda_k),$$
so that $Y = Q_k S_k$.
3. We converge to the k largest eigenvalues. The proof is very difficult and is omitted.
• In practice, however, $\beta_k$ is never exactly zero. Hence we only converge towards the eigenvalues.
– After $k$ steps we have $AQ_k = Q_k T_{k,k} + \beta_k q_{k+1} e_k^T$
– For $\beta_k$ small we obtain approximations to the eigenvalues, $\theta_i \approx \lambda_i$, by
$$T_k = S_k \operatorname{diag}(\theta_1, \ldots, \theta_k)\, S_k^T$$
– We multiply $AQ_k$ by $S_k$ from above, so that
$$\begin{aligned}
A Q_k S_k &= Q_k T_{k,k} S_k + \beta_k q_{k+1} e_k^T S_k \\
&= Q_k S_k \operatorname{diag}(\theta_1, \ldots, \theta_k)\, S_k^T S_k + \beta_k q_{k+1} e_k^T S_k \\
A Y_k &= Y_k \operatorname{diag}(\theta_1, \ldots, \theta_k) + \beta_k q_{k+1} e_k^T S_k \\
A Y_k e_j &= Y_k \operatorname{diag}(\theta_1, \ldots, \theta_k)\, e_j + \beta_k q_{k+1} e_k^T S_k e_j \\
A y_j &= y_j \theta_j + \beta_k q_{k+1} S_{kj} \\
A y_j - \theta_j y_j &= \beta_k q_{k+1} S_{kj}
\end{aligned}$$
$$\therefore \quad \|A y_j - \theta_j y_j\|_2 = |\beta_k|\,|S_{kj}| \qquad (1)$$
– So if $\beta_k \to 0$, it follows that $\theta_j \to \lambda_j$.
– Otherwise $|\beta_k|\,|S_{kj}|$ needs to be small to have a good approximation, hence the convergence criterion (evaluated in the sketch below)
$$|\beta_k|\,|S_{kj}| < \epsilon$$
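Putting these remarks into code (my own sketch; `lanczos` refers to the helper sketched after Section 6, and the tolerance is an arbitrary choice): given $Q_k$, $T_k$ and $\beta_k$, we diagonalise $T_k$, form the Ritz vectors, and apply the criterion $|\beta_k||S_{kj}| < \epsilon$ from (1):

```python
import numpy as np

def ritz_pairs(Q, T, beta_k, tol=1e-8):
    """Diagonalise T_k = S_k diag(theta) S_k^T and form Y_k = Q_k S_k.
    A Ritz pair (theta_j, y_j) is accepted when |beta_k| |S_kj| < tol,
    since by (1) this bounds the residual ||A y_j - theta_j y_j||_2."""
    theta, S = np.linalg.eigh(T)                        # T_k is symmetric
    Y = Q @ S                                           # Ritz vectors y_j
    residual_bound = np.abs(beta_k) * np.abs(S[-1, :])  # |beta_k| |S_kj|
    converged = residual_bound < tol
    return theta[converged], Y[:, converged]

# Usage with the earlier (hypothetical) helper:
#   theta, Y = ritz_pairs(*lanczos(A, np.ones(A.shape[0]), 30))
```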
8 Remarks 2: Difficulties with the Lanczos Method
• In practice, the problem is that the orthogonality is not preserved.
• As soon as one eigenvalue converges all the basis vectors qi pick up perturbations biased
toward the direction of the corresponding eigenvector and orthogonality is lost.
• A “ghost” copy of the eigenvalue will appear again in the tridiagonal matrix T.
• To counter this we fully re-orthonormalize the sequence by using Gram-Schmidt or even QR.
• However, either approach would be expensive if the dimension of the Krylov space is large.
• So instead a selective re-orthonormalization is pursued. More specifically, the practical approach is to orthonormalize “half-way”, i.e., to within the square root of machine precision, $\sqrt{\epsilon_M}$.
• If the eigenvalues of $A$ are not well separated, then we can use a shift and employ the matrix
$$(A - \sigma I)^{-1},$$
following the shifted inverse power method, to generate the appropriate Krylov subspaces (a sketch follows).
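A hedged sketch of this shift-and-invert variant (my own illustration, assuming a dense symmetric NumPy matrix; `scipy.linalg.lu_factor`/`lu_solve` are used so the shifted matrix is factored only once): each product $Aq$ in the Lanczos loop is replaced by a solve with $A - \sigma I$, and the Ritz values $\theta$ of the inverted operator are mapped back via $\lambda = \sigma + 1/\theta$:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def shift_invert_lanczos(A, sigma, q0, m):
    """Lanczos on (A - sigma I)^{-1}: eigenvalues of A near the shift sigma
    become the largest, best-separated eigenvalues of the inverted operator."""
    lu = lu_factor(A - sigma * np.eye(A.shape[0]))   # factor once, reuse
    n = len(q0)
    Q = np.zeros((n, m))
    alpha, beta = np.zeros(m), np.zeros(m)
    r = q0 / np.linalg.norm(q0)
    q_prev, b_prev = np.zeros(n), 0.0
    for j in range(m):
        Q[:, j] = r
        w = lu_solve(lu, r)                          # apply (A - sigma I)^{-1} q_j
        alpha[j] = Q[:, j] @ w
        w -= alpha[j] * Q[:, j] + b_prev * q_prev
        beta[j] = np.linalg.norm(w)
        q_prev, b_prev, r = Q[:, j], beta[j], w / beta[j]
    T = np.diag(alpha) + np.diag(beta[:m - 1], 1) + np.diag(beta[:m - 1], -1)
    theta = np.linalg.eigvalsh(T)                    # Ritz values of the inverse
    return sigma + 1.0 / theta                       # map back: lambda = sigma + 1/theta
```

Eigenvalues of $A$ closest to the shift $\sigma$ become the largest and best-separated eigenvalues of $(A - \sigma I)^{-1}$, which is exactly where the Lanczos iteration converges fastest.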