
PRINCIPLES OF CIRCUIT SIMULATION

Lecture 9.
Linear Solver:
LU Solver and Sparse Matrix

Guoyong Shi, PhD


shiguoyong@ic.sjtu.edu.cn
School of Microelectronics
Shanghai Jiao Tong University
Fall 2010

Outline
Part 1:
• Gaussian Elimination
• LU Factorization
• Pivoting
• Doolittle Method and Crout Method
• Summary
Part 2: Sparse Matrix



Motivation
• Either in Sparse Tableau Analysis (STA) or in
Modified Nodal Analysis (MNA), we have to
solve a linear system of equations: Ax = b

STA form:

$$\begin{pmatrix} A & 0 & 0 \\ 0 & I & -A^{T} \\ K_i & K_v & 0 \end{pmatrix} \begin{pmatrix} i \\ v \\ e \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ S \end{pmatrix}$$

MNA example:

$$\begin{bmatrix} \frac{1}{R_1} + G_2 + \frac{1}{R_3} & -G_2 - \frac{1}{R_3} \\ -\frac{1}{R_3} & \frac{1}{R_3} + \frac{1}{R_4} \end{bmatrix} \begin{pmatrix} e_1 \\ e_2 \end{pmatrix} = \begin{pmatrix} 0 \\ I_{S5} \end{pmatrix}$$


Motivation
• Even in nonlinear circuit analysis, after
"linearization", one again has to solve a
system of linear equations: Ax = b.
• Many other engineering problems also require
solving a system of linear equations.

• Typically, the matrix size is in the thousands to millions.

• The system must be solved thousands to millions of times
in one simulation cycle.
• That's why we'd like to have very efficient
linear solvers!


Problem Description
Problem:
Solve Ax = b
A: n×n (real, non-singular), x: n×1, b: n×1
Methods:
– Direct Methods (this lecture)
Gaussian Elimination, LU Decomposition, Crout
– Indirect, Iterative Methods (another lecture)
Gauss-Jacobi, Gauss-Seidel, Successive Over
Relaxation (SOR), Krylov



Gaussian Elimination -- Example
The system

$$\begin{cases} 2x + y = 5 \\ x + 2y = 4 \end{cases}$$

Eliminating on the augmented matrix:

$$\left(\begin{array}{cc|c} 2 & 1 & 5 \\ 1 & 2 & 4 \end{array}\right) \rightarrow \left(\begin{array}{cc|c} 2 & 1 & 5 \\ 0 & \tfrac{3}{2} & \tfrac{3}{2} \end{array}\right) \Rightarrow \begin{cases} x = 2 \\ y = 1 \end{cases}$$

The same row operations give the LU factorization:

$$\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ \tfrac{1}{2} & 1 \end{pmatrix} \begin{pmatrix} 2 & 1 \\ 0 & \tfrac{3}{2} \end{pmatrix}$$


Use of LU Factorization
With A = LU, the system becomes

$$\underbrace{\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}}_{A} \begin{pmatrix} x \\ y \end{pmatrix} = \underbrace{\begin{pmatrix} 1 & 0 \\ \tfrac{1}{2} & 1 \end{pmatrix}}_{L} \underbrace{\begin{pmatrix} 2 & 1 \\ 0 & \tfrac{3}{2} \end{pmatrix}}_{U} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 5 \\ 4 \end{pmatrix}$$

Define

$$\begin{pmatrix} u \\ v \end{pmatrix} := \begin{pmatrix} 2 & 1 \\ 0 & \tfrac{3}{2} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$$

Solving the L-system:

$$\begin{pmatrix} 1 & 0 \\ \tfrac{1}{2} & 1 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 5 \\ 4 \end{pmatrix} \quad\Rightarrow\quad \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 5 \\ \tfrac{3}{2} \end{pmatrix}$$

Triangular systems are easy to solve (by forward or back substitution).
Use of LU Factorization
The L-system gave

$$\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 5 \\ \tfrac{3}{2} \end{pmatrix}$$

Solving the U-system:

$$\begin{pmatrix} 2 & 1 \\ 0 & \tfrac{3}{2} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 5 \\ \tfrac{3}{2} \end{pmatrix} \quad\Rightarrow\quad \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$$


LU Factorization

A = LU = U The task of L & U


L factorization is to find the
elements in matrices L and U.
A x = L (U x ) = Ly = b

1. Let y = Ux.
2. Solve y from Ly = b
3. Solve x from Ux = y

2010-11-15 Lecture 9 slide 9
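Steps 2 and 3 are plain triangular solves. A minimal C sketch of both (an illustration, not a library routine; dense row-major arrays, L with a unit diagonal as produced by Gaussian elimination):

/* Solve L y = b, L unit lower triangular (diagonal entries are 1). */
void forward_subst(int n, const double *l, const double *b, double *y) {
    for (int i = 0; i < n; i++) {
        double s = b[i];
        for (int j = 0; j < i; j++)
            s -= l[i*n + j] * y[j];   /* subtract known terms */
        y[i] = s;                     /* l[i][i] == 1: no division */
    }
}

/* Solve U x = y, U upper triangular; assumes u[i][i] != 0. */
void back_subst(int n, const double *u, const double *y, double *x) {
    for (int i = n - 1; i >= 0; i--) {
        double s = y[i];
        for (int j = i + 1; j < n; j++)
            s -= u[i*n + j] * x[j];   /* subtract already-solved x's */
        x[i] = s / u[i*n + i];
    }
}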


Advantages of LU Factorization
• When solving Ax = b for multiple right-hand sides b
with the same A, we LU-factorize A only once.
• In circuit simulation, the entries of A may change,
but the structure of A does not.
– This fact can be used to speed up repeated LU
factorization.
– It is implemented as symbolic factorization in the
"sparse1.3" solver in Spice 3f4.


Gaussian Elimination
• Gaussian elimination is a process of row
transformation

Write Ax = b as an augmented matrix:

$$\left[\begin{array}{ccccc|c} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} & b_2 \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} & b_3 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} & b_n \end{array}\right]$$

Eliminate the lower triangular part.
Gaussian Elimination
Assume $a_{11} \neq 0$. For each row $i = 2, \ldots, n$, subtract $\frac{a_{i1}}{a_{11}}$ times row 1 from row $i$:

$$\left[\begin{array}{ccccc|c} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} & b_2 \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} & b_3 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} & b_n \end{array}\right] \rightarrow \left[\begin{array}{ccccc|c} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & b_1 \\ 0 & a_{22}' & a_{23}' & \cdots & a_{2n}' & b_2' \\ 0 & a_{32}' & a_{33}' & \cdots & a_{3n}' & b_3' \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & a_{n2}' & a_{n3}' & \cdots & a_{nn}' & b_n' \end{array}\right]$$

The first column is zeroed below the pivot; the remaining entries are updated.
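The whole elimination fits in a few lines of C (a sketch: dense, in place, no pivoting, so every pivot a_kk is assumed nonzero; pivoting is treated later in the lecture):

/* In-place Gaussian elimination on a dense n x n row-major matrix a
   with right-hand side b. */
void gaussian_elim(int n, double *a, double *b) {
    for (int k = 0; k < n - 1; k++) {            /* pivot column */
        for (int i = k + 1; i < n; i++) {        /* rows below the pivot */
            double m = a[i*n + k] / a[k*n + k];  /* multiplier a_ik / a_kk */
            a[i*n + k] = 0.0;                    /* eliminated entry */
            for (int j = k + 1; j < n; j++)
                a[i*n + j] -= m * a[k*n + j];    /* update the row */
            b[i] -= m * b[k];                    /* update the RHS */
        }
    }
}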
Eliminating 1st Column
• Column elimination is equivalent to a row transformation:

$$L_1^{-1} A = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ -\frac{a_{21}}{a_{11}} & 1 & 0 & \cdots & 0 \\ -\frac{a_{31}}{a_{11}} & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \\ -\frac{a_{n1}}{a_{11}} & 0 & 0 & \cdots & 1 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix}$$

$$L_1^{-1} A = A^{(2)} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & a_{22}^{(2)} & a_{23}^{(2)} & \cdots & a_{2n}^{(2)} \\ 0 & a_{32}^{(2)} & a_{33}^{(2)} & \cdots & a_{3n}^{(2)} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & a_{n2}^{(2)} & a_{n3}^{(2)} & \cdots & a_{nn}^{(2)} \end{bmatrix}$$

The right-hand side is transformed the same way: $b \rightarrow b^{(2)}$.
Eliminating 2nd Column
Assume $a_{22}^{(2)} \neq 0$:

$$L_2^{-1} A^{(2)} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & -\frac{a_{32}^{(2)}}{a_{22}^{(2)}} & 1 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \\ 0 & -\frac{a_{n2}^{(2)}}{a_{22}^{(2)}} & 0 & \cdots & 1 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & a_{22}^{(2)} & a_{23}^{(2)} & \cdots & a_{2n}^{(2)} \\ 0 & a_{32}^{(2)} & a_{33}^{(2)} & \cdots & a_{3n}^{(2)} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & a_{n2}^{(2)} & a_{n3}^{(2)} & \cdots & a_{nn}^{(2)} \end{bmatrix}$$

$$L_2^{-1} A^{(2)} = A^{(3)} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & a_{22}^{(2)} & a_{23}^{(2)} & \cdots & a_{2n}^{(2)} \\ 0 & 0 & a_{33}^{(3)} & \cdots & a_{3n}^{(3)} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & a_{n3}^{(3)} & \cdots & a_{nn}^{(3)} \end{bmatrix}$$

with the right-hand side updated to $b^{(3)}$.
Continue on Elimination
• Suppose all diagonal pivots are nonzero. Applying the elementary transformations in sequence reduces A to upper triangular form:

$$L_{n-1}^{-1} \cdots L_2^{-1} L_1^{-1} \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix} = A^{(n)} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & a_{22}^{(2)} & a_{23}^{(2)} & \cdots & a_{2n}^{(2)} \\ 0 & 0 & a_{33}^{(3)} & \cdots & a_{3n}^{(3)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a_{nn}^{(n)} \end{bmatrix}$$

$A^{(n)}$ is upper triangular, and the system becomes $A^{(n)} x = b^{(n)}$.
Triangular System
Gaussian elimination ends up with the following
upper triangular system of equations:

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & a_{22}^{(2)} & a_{23}^{(2)} & \cdots & a_{2n}^{(2)} \\ 0 & 0 & a_{33}^{(3)} & \cdots & a_{3n}^{(3)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a_{nn}^{(n)} \end{bmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2^{(2)} \\ b_3^{(3)} \\ \vdots \\ b_n^{(n)} \end{pmatrix}$$

Solve this system from the bottom up: $x_n, x_{n-1}, \ldots, x_1$.


LU Factorization
• Gaussian elimination leads to LU factorization:

$$\left(L_{n-1}^{-1} \cdots L_2^{-1} L_1^{-1}\right) A = U \qquad\Rightarrow\qquad A = \left(L_1 L_2 \cdots L_{n-1}\right) U = LU$$

The multipliers form the lower triangle of L, and U is the reduced upper triangle:

$$L = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ \frac{a_{21}}{a_{11}} & 1 & 0 & \cdots & 0 \\ \frac{a_{31}}{a_{11}} & \frac{a_{32}^{(2)}}{a_{22}^{(2)}} & 1 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \\ \frac{a_{n1}}{a_{11}} & \frac{a_{n2}^{(2)}}{a_{22}^{(2)}} & \cdots & * & 1 \end{bmatrix}, \qquad U = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & a_{22}^{(2)} & a_{23}^{(2)} & \cdots & a_{2n}^{(2)} \\ 0 & 0 & a_{33}^{(3)} & \cdots & a_{3n}^{(3)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a_{nn}^{(n)} \end{bmatrix}$$


Complexity of LU
For the first elimination step ($a_{11} \neq 0$), each of the $n-1$ rows below the pivot needs one division (the multiplier $a_{i1}/a_{11}$) and about $n$ multiplications, so the step costs roughly

$$(n-1) \cdot n \approx n^2 \quad \text{mul/div, for } n \gg 1$$

Summing over all elimination steps:

$$\sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6} \sim O(n^3)$$


Cost of Back-Substitution
$$\begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & a_{22}^{(2)} & a_{23}^{(2)} & \cdots & a_{2n}^{(2)} \\ 0 & 0 & a_{33}^{(3)} & \cdots & a_{3n}^{(3)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a_{nn}^{(n)} \end{bmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2^{(2)} \\ b_3^{(3)} \\ \vdots \\ b_n^{(n)} \end{pmatrix}$$

Back-substitution proceeds from the last unknown upward:

$$x_n = \frac{b_n^{(n)}}{a_{nn}^{(n)}}, \qquad x_{n-1} = \frac{b_{n-1}^{(n-1)} - a_{n-1,n}^{(n-1)} x_n}{a_{n-1,n-1}^{(n-1)}}, \qquad \ldots$$

Total number of mul/div:

$$\sum_{i=1}^{n} i = \frac{n(n+1)}{2} \sim O(n^2)$$


Zero Diagonal
Example 1: After two steps of Gaussian elimination, rows 3 and 4 of the reduced matrix contain the block

$$\begin{bmatrix} \tfrac{1}{R} & -\tfrac{1}{R} & 1 & 0 \\ -\tfrac{1}{R} & \tfrac{1}{R} & 0 & 1 \end{bmatrix}$$

in columns 3 to 6. Eliminating column 3 (multiplier $-1$) cancels the $\pm\tfrac{1}{R}$ terms exactly, so row 4 becomes $(0\;\;0\;\;0\;\;0\;\;1\;\;1)$ and the next pivot position (4, 4) holds an exact zero.

Gaussian elimination cannot continue.


Pivoting
Solution 1:
Interchange rows to bring a non-zero element into the pivot position (k, k):

$$\begin{bmatrix} 0 & 1 & 1 \\ \times & \times & 0 \\ \times & \times & 0 \end{bmatrix} \rightarrow \begin{bmatrix} \times & \times & 0 \\ 0 & 1 & 1 \\ \times & \times & 0 \end{bmatrix}$$

Solution 2: How about column exchange? Yes, but then the unknowns are re-ordered as well:

$$\begin{bmatrix} 0 & 1 & 1 \\ \times & \times & 0 \\ \times & \times & 0 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 0 & 1 \\ \times & \times & 0 \\ \times & \times & 0 \end{bmatrix}$$

In general both rows and columns can be exchanged!
Small Diagonal
Example 2:

$$\begin{bmatrix} 1.25 \times 10^{-4} & 1.25 \\ 12.5 & 12.5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 6.25 \\ 75 \end{bmatrix}$$

Assume finite arithmetic: with 3-digit floating point, eliminating $x_1$ from row 2 (row 2 := row 2 $- 10^5 \times$ row 1, since $12.5 / (1.25 \times 10^{-4}) = 10^5$) gives

$$\begin{bmatrix} 1.25 \times 10^{-4} & 1.25 \\ 0 & -1.25 \times 10^{5} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 6.25 \\ -6.25 \times 10^{5} \end{bmatrix}$$

The 12.5 and the 75 are rounded off in 3-digit arithmetic. Back-substitution then yields

$$x_2 = 5, \qquad (1.25 \times 10^{-4})\, x_1 + (1.25)\, x_2 = 6.25 \;\Rightarrow\; x_1 = 0$$

Unfortunately, (0, 5) is not the solution. Checking the 2nd
equation: 12.5 · 0 + 12.5 · 5 = 62.5 ≠ 75.
Accuracy Depends on Pivoting
With pivoting (interchange the rows so that 12.5, not $1.25 \times 10^{-4}$, is the pivot), the same 3-digit arithmetic gives an accurate answer.

Reason:
a11 (the pivot) is too small relative to the other numbers!

Solution: Don't choose a small element to do elimination. Pick a
large element by row / column interchanges.

The correct solution to 5-digit accuracy is

x1 = 1.0001
x2 = 5.0000
What causes accuracy problem?
• Ill-conditioning: the matrix A is close to singular
• Round-off error: relative magnitudes too big

Example: the pair x − y = 0, x + y = 1 is well-conditioned,
but the pair x − y = 0, x − 1.01y = 0.01 is ill-conditioned:
its two equations are nearly parallel, so small round-off
perturbations produce large errors in the solution.


Pivoting Strategies
1. Partial Pivoting

2. Complete Pivoting

3. Threshold Pivoting



Pivoting Strategy 1
1. Partial Pivoting (row interchange only):
Choose r as the smallest integer such that

$$\left| a_{rk}^{(k)} \right| = \max_{j = k, \ldots, n} \left| a_{jk}^{(k)} \right|$$

The search area is rows k to n of column k of the reduced matrix.
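A C sketch of this search plus the row interchange (dense row-major storage as in the earlier elimination sketch; an illustration, not the Sparse1.3 implementation):

#include <math.h>

/* Partial pivoting at step k: find the smallest r in [k, n) with the
   largest |a[r][k]|, then swap rows r and k of A and of b.
   Returns r, or -1 if the whole column is zero (singular matrix). */
int partial_pivot(int n, double *a, double *b, int k) {
    int r = k;
    for (int j = k + 1; j < n; j++)
        if (fabs(a[j*n + k]) > fabs(a[r*n + k]))  /* strict >: keeps smallest r */
            r = j;
    if (a[r*n + k] == 0.0)
        return -1;
    if (r != k) {
        for (int j = 0; j < n; j++) {  /* interchange rows r and k */
            double t = a[k*n + j]; a[k*n + j] = a[r*n + j]; a[r*n + j] = t;
        }
        double t = b[k]; b[k] = b[r]; b[r] = t;
    }
    return r;
}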


Pivoting Strategy 2
2. Complete Pivoting (row and column interchange):
Choose r and s as the smallest integers such that

$$\left| a_{rs}^{(k)} \right| = \max_{i, j = k, \ldots, n} \left| a_{ij}^{(k)} \right|$$

The search area is the whole reduced matrix: rows k to n and columns k to n.


Pivoting Strategy 3
3. Threshold Pivoting:
a. Apply partial pivoting only if $\left| a_{kk}^{(k)} \right| < \varepsilon_p \left| a_{rk}^{(k)} \right|$
b. Apply complete pivoting only if $\left| a_{kk}^{(k)} \right| < \varepsilon_p \left| a_{rs}^{(k)} \right|$

where $\varepsilon_p$ is user specified, and

$$\left| a_{rk}^{(k)} \right| = \max_{j = k, \ldots, n} \left| a_{jk}^{(k)} \right|, \qquad \left| a_{rs}^{(k)} \right| = \max_{i, j = k, \ldots, n} \left| a_{ij}^{(k)} \right|$$

Implemented in Spice 3f4.


Variants of LU Factorization
• Doolittle Method
• Crout Method
• Motivated by directly filling in L/U elements in the
storage space of the original matrix "A".
$$A = LU = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ \ell_{21} & 1 & 0 & \cdots & 0 \\ \ell_{31} & \ell_{32} & 1 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \\ \ell_{n1} & \ell_{n2} & \ell_{n3} & \cdots & 1 \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} & u_{13} & \cdots & u_{1n} \\ 0 & u_{22} & u_{23} & \cdots & u_{2n} \\ 0 & 0 & u_{33} & \cdots & u_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & u_{nn} \end{bmatrix}$$

Reuse the storage: the computed $\ell_{ij}$ and $u_{ij}$ overwrite the entries $a_{ij}$ in the array that held A.
Variants of LU Factorization
Hence we need a sequential method that processes the rows and
columns of A in a certain order, such that processed rows / columns are
not used in the later processing. The factorization A = LU is as on the
previous slide: the unit-diagonal L and the upper triangle U together
reuse the storage of A.


Doolittle Method – 1
Keep the first row. Since the 1st row of L is $(1, 0, \ldots, 0)$, comparing the 1st row of A = LU gives the 1st row of U, i.e., U(1, :):

$$\left( u_{11} \;\; u_{12} \;\; u_{13} \;\; \cdots \;\; u_{1n} \right) = \left( a_{11} \;\; a_{12} \;\; a_{13} \;\; \cdots \;\; a_{1n} \right)$$


Doolittle Method – 2

Then solve the 1st column of L, i.e., L(2:n, 1). Comparing the 1st column of A = LU:

$$\begin{bmatrix} a_{21} \\ a_{31} \\ \vdots \\ a_{n1} \end{bmatrix} = \begin{bmatrix} \ell_{21} \\ \ell_{31} \\ \vdots \\ \ell_{n1} \end{bmatrix} u_{11}, \qquad u_{11} = a_{11}$$


Doolittle Method – 3
Solve the 2nd row of U, i.e., U(2, 2:n). Comparing the 2nd row of A = LU:

$$\ell_{21} \left( u_{12} \;\; u_{13} \;\; \cdots \;\; u_{1n} \right) + \left( u_{22} \;\; u_{23} \;\; \cdots \;\; u_{2n} \right) = \left( a_{22} \;\; a_{23} \;\; \cdots \;\; a_{2n} \right)$$


Doolittle Method – 4
In the same storage array, the Doolittle method computes rows of U and columns of L alternately.

The computation order of the Doolittle method:
(1) row 1 of U → (2) column 1 of L → (3) row 2 of U → (4) column 2 of L → (5) row 3 of U → (6) column 3 of L → …
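A compact C sketch of Doolittle factorization in exactly this order, overwriting a dense row-major copy of A in place (illustration only: no pivoting, all pivots assumed nonzero):

/* On exit: U occupies the upper triangle (including the diagonal),
   L's multipliers the strict lower triangle; L's unit diagonal is implicit. */
void doolittle(int n, double *a) {
    for (int k = 0; k < n; k++) {
        /* k-th row of U: u_kj = a_kj - sum_{m<k} l_km * u_mj */
        for (int j = k; j < n; j++) {
            double s = a[k*n + j];
            for (int m = 0; m < k; m++)
                s -= a[k*n + m] * a[m*n + j];
            a[k*n + j] = s;
        }
        /* k-th column of L: l_ik = (a_ik - sum_{m<k} l_im * u_mk) / u_kk */
        for (int i = k + 1; i < n; i++) {
            double s = a[i*n + k];
            for (int m = 0; m < k; m++)
                s -= a[i*n + m] * a[m*n + k];
            a[i*n + k] = s / a[k*n + k];   /* assumes u_kk != 0 */
        }
    }
}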


Crout Method
• Similar to the Doolittle method, but starts from the 1st
column (Doolittle starts from the 1st row):

$$A = LU = \begin{bmatrix} \ell_{11} & 0 & 0 & \cdots & 0 \\ \ell_{21} & \ell_{22} & 0 & \cdots & 0 \\ \ell_{31} & \ell_{32} & \ell_{33} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \\ \ell_{n1} & \ell_{n2} & \ell_{n3} & \cdots & \ell_{nn} \end{bmatrix} \begin{bmatrix} 1 & u_{12} & u_{13} & \cdots & u_{1n} \\ 0 & 1 & u_{23} & \cdots & u_{2n} \\ 0 & 0 & 1 & \cdots & u_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}$$

The diagonals of U are normalized!

The computation order of the Crout method:
(1) column 1 of L → (2) row 1 of U → (3) column 2 of L → (4) row 2 of U → …
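For comparison, a C sketch of the Crout variant under the same assumptions (dense, in place, no pivoting); the division now happens in the U rows, because here L carries the diagonal and U the implicit unit diagonal:

void crout(int n, double *a) {
    for (int k = 0; k < n; k++) {
        /* k-th column of L: l_ik = a_ik - sum_{m<k} l_im * u_mk */
        for (int i = k; i < n; i++) {
            double s = a[i*n + k];
            for (int m = 0; m < k; m++)
                s -= a[i*n + m] * a[m*n + k];
            a[i*n + k] = s;
        }
        /* k-th row of U: u_kj = (a_kj - sum_{m<k} l_km * u_mj) / l_kk */
        for (int j = k + 1; j < n; j++) {
            double s = a[k*n + j];
            for (int m = 0; m < k; m++)
                s -= a[k*n + m] * a[m*n + j];
            a[k*n + j] = s / a[k*n + k];   /* assumes l_kk != 0 */
        }
    }
}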
Storage of LU Factorization
Using only one 2-dimensional array! At each step the finished rows of U and columns of L stay in the top and left parts of the array, while the lower-right block still holds the reduced matrix (A^(3), A^(4), ... as the factorization proceeds).

• In a sparse matrix implementation, this type of storage
requires growing memory space, because fill-ins are
created during the factorization.


Summary
• LU factorization has been used in virtually all
circuit simulators
– Good for multiple RHS and sensitivity calculation
• Pivoting is required to handle zero diagonals
and to improve numerical accuracy
– Partial pivoting (row exchange): tradeoff between
accuracy and efficiency
– Matrix condition number is used to analyze the
effect of round-off errors and numerical stability



PRINCIPLES OF CIRCUIT SIMULATION

Part 2.
Programming Techniques
for Sparse Matrices

Outline
• Why Sparse Matrix Techniques?
• Sparse Matrix Data Structure
• Markowitz Pivoting
• Diagonal Pivoting for MNA Matrices
• Modified Markowitz pivoting
• How to Handle Sparse RHS
• Summary



Why Sparse Matrix?
Motivation:
– n = 10³ equations
– Complexity of Gaussian elimination ~ O(n³)
– n = 10³ → ~10⁹ floating-point operations
→ roughly 10 seconds on a 1 GHz computer
→ storage: 10⁶ words

Exploiting sparsity:
– MNA → about 3 nonzeros per row
– Gaussian elimination can then reach an empirical complexity of
~O(n^1.1) to O(n^1.5)


Sparse Matrix Programming
• Use a linked-list data structure
– to avoid storing zeros
– this used to be hard before the 1980s: in Fortran!
• Avoid trivial operations: 0 · x = 0, 0 + x = x
• Distinguish two kinds of zero:
– Structural zeros – always 0, independent of the
numerical operations
– Numerical zeros – zeros resulting from computation
• Avoid losing sparsity (very important!)
– sparsity changes with pivoting


Non-zero Fill-ins
• Gaussian elimination causes nonzero fill-ins: wherever the pivot
row has a nonzero, the eliminated rows acquire one too.

[Figure: an example matrix before and after one elimination step;
entries that were zero become nonzero (fill-ins).]


How to Maintain Sparsity
• One should choose appropriate pivoting (during
Gaussian Elimination, G.E.) to avoid a large increase
in the number of fill-ins.

[Figure: a matrix with a full first row and first column fills in
completely after G.E.; after row/column reordering that moves the
full row and column to the end, G.E. introduces no fill-ins.]


Markowitz Criterion
• Markowitz criterion
– at the kth pivot step, A^(k) is the reduced matrix
– NZ = nonzero
– the number of NZs in a row (column) is also called the row
(column) degree
– the column degrees can be used for column ordering

[Figure: a 5×5 reduced matrix A^(k) with row degrees r_i
(4, 3, 2, 2, 2 = # NZ in each row) and column degrees c_j
(3, 3, 2, 2, 3 = # NZ in each column) marked.]
Markowitz Product
• If Gaussian elimination pivots on (i, j):
Markowitz product = (r_i − 1)(c_j − 1)
= the maximum possible number of fill-ins when
pivoting at (i, j)

• Recommendations (implemented in Sparse1.3):
– Best: the pivot element with the largest magnitude and the
smallest Markowitz product
– Try a threshold test after choosing the smallest
Markowitz product (M.P.)
– Break ties (equal M.P.) by choosing the element with the
largest magnitude
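A small C sketch of the tie-break rule (the candidate list, the degree arrays r[] and c[], and the helper name best_pivot are all hypothetical; Sparse1.3's real search runs over its linked-list structure):

#include <math.h>

/* A pivot candidate: position (i, j) and its numerical value. */
typedef struct { int i, j; double value; } Cand;

/* Smallest Markowitz product (r_i - 1)(c_j - 1); ties broken by
   largest magnitude. Assumes ncand >= 1. */
Cand best_pivot(const Cand *cand, int ncand, const int *r, const int *c) {
    Cand best = cand[0];
    long best_mp = (long)(r[best.i] - 1) * (long)(c[best.j] - 1);
    for (int k = 1; k < ncand; k++) {
        long mp = (long)(r[cand[k].i] - 1) * (long)(c[cand[k].j] - 1);
        if (mp < best_mp ||
            (mp == best_mp && fabs(cand[k].value) > fabs(best.value))) {
            best = cand[k];
            best_mp = mp;
        }
    }
    return best;
}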
Sparse Matrix Data Structure
Example matrix:

r\c    1     2     3
1      1.0   1.2   0
2      0     1.5   0
3      2.1   0     1.7

Matrix element structure:

struct elem {
    double value;             /* entry value (the slide's "real" type; double here) */
    int row;                  /* row index */
    int col;                  /* column index */
    struct elem *next_in_row; /* next nonzero in the same row */
    struct elem *next_in_col; /* next nonzero in the same column */
};
typedef struct elem Element;
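As an illustration of how the links are used, a tiny helper that walks a row list (hypothetical; it assumes row-list heads like the FirstInRow[] array on the next slide):

/* Return the element at the given column of a row, or NULL if that
   position is structurally zero. */
Element *find_in_row(Element *first_in_row, int col) {
    for (Element *e = first_in_row; e != NULL; e = e->next_in_row)
        if (e->col == col)
            return e;
    return NULL;
}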


Data Structure in Sparse 1.3
• Sparse 1.3 was written by Ken Kundert, 1985~1988, then a PhD
student at Berkeley, later with Cadence Design Systems, Inc.

[Figure: the linked-list layout for the example matrix. FirstInRow[i]
and FirstInCol[j] point to the first nonzero of each row and column,
and diag[i] points to each diagonal element; the elements, each
storing (value, row, col), are (1.0, 1, 1), (1.2, 1, 2), (1.5, 2, 2),
(2.1, 3, 1), (1.7, 3, 3).]


ASTAP Data Structure
• ASTAP is an IBM simulator using STA (Sparse
Tableau Analysis). It stores the example matrix row-wise
in three arrays:

Row Pointers: 1 3 4 6 -1
Col Indices:  1 2 2 1 3 -1
Values:       1.0 1.2 1.5 2.1 1.7   (values stored row-wise)

✓ Row Pointers point to the beginning of each row's Col Indices (-1 terminates).
✓ Nonzeros in the same row are indexed contiguously by their column indices.
✓ This layout is used by many iterative sparse solvers.
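This is essentially the compressed sparse row (CSR) format. A minimal C sketch with 0-based indices instead of the slide's 1-based, -1-terminated arrays, plus a matrix-vector product as the typical access pattern:

/* CSR: row i occupies positions [row_ptr[i], row_ptr[i+1]) of
   col_idx[] and val[]; row_ptr has n+1 entries. */
typedef struct {
    int n;
    const int *row_ptr;
    const int *col_idx;
    const double *val;
} CsrMatrix;

/* y = A x */
void csr_matvec(const CsrMatrix *a, const double *x, double *y) {
    for (int i = 0; i < a->n; i++) {
        double s = 0.0;
        for (int p = a->row_ptr[i]; p < a->row_ptr[i+1]; p++)
            s += a->val[p] * x[a->col_idx[p]];
        y[i] = s;
    }
}

/* The 3x3 example matrix in this form: */
static const int    row_ptr[] = {0, 2, 3, 5};
static const int    col_idx[] = {0, 1, 1, 0, 2};
static const double val[]     = {1.0, 1.2, 1.5, 2.1, 1.7};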


Key Loops in a SPICE Program

The circuit equations

$$C \frac{dx}{dt} = f(x, t)$$

are discretized at each time point (update the stamps related to time), e.g. with backward Euler:

$$C x_{n+1} = C x_n + h \cdot f(x_{n+1}, t_{n+1}) + \cdots$$

and the resulting nonlinear equations are solved by Newton-Raphson (linearized at the point x):

$$C x_{n+1}^{(k)} = \left[ \frac{\partial f}{\partial x} \right] \left[ x_{n+1}^{(k)} - x_{n+1}^{(k-1)} \right] + \cdots, \qquad A = \frac{\partial f\!\left( x_{n+1}^{(k-1)},\, t_{n+1} \right)}{\partial x}$$

Each Newton iteration invokes the linear solver, then updates x := x + Δx; each accepted time step updates t := t + Δt.
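The nesting of these loops can be sketched in C; every helper here (update_time_stamps, update_nonlinear_stamps, solve_linear) is a hypothetical stub standing in for the stamping and LU code, not an actual SPICE routine:

#include <math.h>

#define N 3   /* toy system size */
static void update_time_stamps(double A[N][N], double b[N], double t) {}
static void update_nonlinear_stamps(double A[N][N], double b[N], const double x[N]) {}
static void solve_linear(double A[N][N], double dx[N], const double b[N]) {}

void transient(double x[N], double t_stop, double h, double tol) {
    double A[N][N] = {{0}}, b[N] = {0}, dx[N] = {0};
    for (double t = 0.0; t < t_stop; t += h) {   /* time loop: t := t + dt */
        update_time_stamps(A, b, t);             /* stamps that depend on t */
        double err;
        do {                                     /* Newton-Raphson loop */
            update_nonlinear_stamps(A, b, x);    /* linearize at current x */
            solve_linear(A, dx, b);              /* LU-factor and solve A dx = b */
            err = 0.0;
            for (int i = 0; i < N; i++) {
                x[i] += dx[i];                   /* x := x + dx */
                err += fabs(dx[i]);
            }
        } while (err > tol);
    }
}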


Linear Solves in Simulation
The same loop structure in a picture: update the time-dependent stamps, iterate Newton-Raphson at the current point (invoking the linear solver and updating x := x + Δx), then advance t := t + Δt.

At each time point, Ax = b has to be solved many times, and this repeats over all time points.
Structure of Matrix Stamps
• In circuit simulation, the matrix being solved
repeatedly has the same structure;
• only some entries vary at different frequency
or time points.

[Figure: a typical matrix structure, with entries marked
C = constant, T = time varying, and X = nonlinear (varying even at
the same time point).]


Strategies for Efficiency
• Utilizing the structural information can greatly
improve the solving efficiency.

• Strategies:
– Weighted Markowitz Product
– Reuse the LU factorization
– Iterative solvers (with preconditioning)
– ...



A Good (Sparse) LU Solver
Properties of a good LU solver:
• Should have a good column ordering algorithm.
• With a good column ordering, partial (row)
pivoting would be enough!
• Should have an ordering/elimination separated
design:
– i.e., ordering is separated from elimination.
– SuperLU does this,
– but Sparse1.3 doesn’t.



Optimal Ordering is NP-hard
• The ordering has a significant impact on the
memory and computational requirements of
the later stages.
• However, finding the optimal ordering for A
(in the sense of minimizing fill-in) has been
proven to be NP-complete.
• Heuristics must be used for all but simple (or
specially structured) cases.

M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness,
W.H. Freeman, New York, 1979.


Column Ordering
Why important?
• A good column ordering greatly reduces the
number of fill-ins, resulting in a vast speedup.
• However, searching for a minimum-degree pivot
at each elimination step (as Sparse 1.3 does) is not
efficient.
• It is best to obtain a good ordering before elimination
(as SuperLU does), but this is not easy!


Available Ordering Algorithms
SuperLU uses the following ordering algorithms:

• Multiple Minimum Degree (MMD) applied to the
structure of AᵀA.
– Mostly good
• Multiple Minimum Degree (MMD) applied to the
structure of Aᵀ + A.
– Mostly good
• Column Approximate Minimum Degree
(COLAMD).
– Mostly not good!


Summary
• Exploiting sparsity reduces CPU time and
memory
• Markowitz algorithm reflects a good tradeoff
between overhead (computation of MP) and
savings (fewer fill-ins)
• Use weighted Markowitz to account for
different types of element stamps in nonlinear
dynamic circuit simulation
• Consider sparse RHS and selective unknowns
for speedup



No-turn-in Exercise
• Spice3f4 contains a solver called Sparse 1.3 (in
src/lib/sparse)
• This is an independent solver that can be used outside
Spice3f4.
• Download the sparse package from the course web
page (sparse.tar.gz) (or ask TA).
• Find the test program called "spTest.c".
• Modify this program if necessary so that you can run
the solver.
• Create some test matrices to test the sparse solver.
• Compare the solved results to those obtained with MATLAB.



Software
• Sparse1.3 is in C and was programmed by Dr. Ken
Kundert (fellow of Cadence; architect of Spectre).
• Source code is available from
http://www.netlib.org/sparse/
• SparseLib++ is in C++ and comes from NIST. The
authors are J. Dongarra, A. Lumsdaine, R. Pozo, and
K. Remington.
• See "A Sparse Matrix Library in C++ for High
Performance Architectures," Proc. of the Second
Object Oriented Numerics Conference, pp. 214-218,
1994.
• The paper and the C++ source code are available from
http://math.nist.gov/sparselib%2b%2b/


References
1. G. Dahlquist and A. Björck, Numerical Methods (translated by
N. Anderson), Prentice-Hall, Englewood Cliffs, New Jersey, 1974.
2. W. J. McCalla, Fundamentals of Computer-Aided Circuit Simulation,
Kluwer Academic Publishers. (Chapter 3, "Sparse Matrix Methods")
3. Albert Ruehli (Ed.), Circuit Analysis, Simulation and Design,
North-Holland, 1986. (K. Kundert, "Sparse Matrix Techniques")
4. J. Dongarra, A. Lumsdaine, R. Pozo, and K. Remington, "A Sparse
Matrix Library in C++ for High Performance Architectures," Proc. of
the Second Object Oriented Numerics Conference, pp. 214-218, 1994.
