Lecture 4 Control
Chapter 1
The problem is one of constrained functional minimization, and has several approaches,
namely:
4. The Hamilton-Jacobi equation, solved for the special case of a linear time-invariant plant
with a quadratic performance criterion (called the performance index), which takes the
form of the matrix Riccati (1724) equation.
1.1.1 Quadratic performance index
If the state and control variables in equations (9.4) and (9.5) are squared, then the
performance index becomes quadratic. The advantage of a quadratic performance index is
that, for a linear system, it has a mathematical solution that yields a linear control law of the
form
$$u(t) = -Kx(t)$$
Example:
Determine
(a) the Riccati matrix P
(b) the state feedback matrix K
(c) the closed-loop eigenvalues
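The plant data for this example are not reproduced in these extracted notes. As an illustrative sketch only (the double-integrator plant and the weights below are assumptions, not the example's data), the three quantities can be computed numerically with scipy.linalg.solve_continuous_are:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical plant and weights, chosen only for illustration
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])       # double integrator
B = np.array([[0.0],
              [1.0]])
Q = np.diag([1.0, 1.0])          # state weighting (symmetric, positive definite)
R = np.array([[1.0]])            # control weighting

# (a) Riccati matrix P: stabilizing solution of A'P + PA - P B R^{-1} B' P + Q = 0
P = solve_continuous_are(A, B, Q, R)

# (b) state feedback matrix K = R^{-1} B' P
K = np.linalg.solve(R, B.T @ P)

# (c) closed-loop eigenvalues of (A - BK)
eigs = np.linalg.eigvals(A - B @ K)

print("P =\n", P)
print("K =", K)
print("closed-loop eigenvalues:", eigs)
```

Since the assumed (A, B) is controllable and Q, R are positive definite, the printed closed-loop eigenvalues should lie strictly in the left half-plane.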
An optimal policy has the property that, no matter what the previous decisions (i.e. controls)
have been, the remaining decisions must constitute an optimal policy with regard to the
state resulting from those previous decisions, i.e.
u(k) = f (x(k))
Example
Suppose that we are driving from Point A to Point C, and we ask what is the shortest
path in miles. If A and C represent Los Angeles and Boston, for example, there are many
paths to choose from! Assume that one way or another we have found the best path, and
that a Point B lies along this path, say Las Vegas. Let X be an arbitrary point on this best
path east of Las Vegas. If we were to now solve the optimization problem for getting from only Las Vegas
to Boston, this same arbitrary point X would be along the new optimal path as well. The
point is a subtle one: the optimization problem from Las Vegas to Boston is easier than that
from Los Angeles to Boston, and the idea is to use this property backwards through time to
evolve the optimal path, beginning in Boston.
Nodal Travel. We now add some structure to the above experiment. Consider now traveling
from point A (Los Angeles) to Point D (Boston). Suppose there are only three places to cross
the Rocky Mountains, B1,B2,B3, and three places to cross the Mississippi River, C1,C2,C3.
By way of notation, we say that the path from A to B1 is AB1. Suppose that all of the paths
(and distances) from A to the B-nodes are known, as are those from the B-nodes to the
C-nodes, and the C-nodes to the terminal point D. There are nine unique paths from A to
D. A brute-force approach sums up the total distance for all the possible paths, and picks
the shortest one.
In terms of computations, we could summarize that this method requires nine additions
of three numbers, equivalent to eighteen additions of two numbers. The comparison of
numbers is relatively cheap. The dynamic programming approach has two steps. First, from
each B-node, pick the best path to D. There are three possible paths from B1 to D, for
example, and nine paths total from the B-level to D. Store the best paths as B1D|opt ,
B2D|opt , B3D|opt . This operation involves nine additions of two numbers. Second, compute
the distance for each of the possible paths from A to D, constrained to the optimal paths
from the B-nodes onward: AB1 + B1D|opt , AB2 + B2D|opt , or AB3 + B3D|opt .
The combined path with the shortest distance is the total solution; this second step
involves three sums of two numbers, and total optimization is done in twelve additions of
two numbers. Needless to say, this example gives only a mild advantage to the dynamic
programming approach over brute force. The gap widens vastly, however, as one increases
the dimensions of the solution space. In general, if there are $s$ layers of nodes (e.g., rivers
or mountain ranges), and each has width $n$ (e.g., $n$ river crossing points), the brute-force
approach takes $s n^s$ additions, while the dynamic programming procedure involves only
$n^2(s-1) + n$ additions. In the case of $n = 5$, $s = 5$, brute force requires 15625 additions;
dynamic programming needs only 105.
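To make this counting concrete, here is a minimal sketch (not from the notes; the node layout and distances are hypothetical) of the backward sweep: starting from D, the cost-to-go at each layer is computed from the layer ahead, exactly as in the B-node step described above.

```python
import random

def shortest_path_dp(legs):
    """Backward DP over a layered graph.
    legs[i][j][k] = distance from node j in layer i to node k in layer i + 1;
    layer 0 holds only the start node A, the final layer only the end node D."""
    cost_to_go = [0.0]                    # cost-to-go from the terminal node D
    for layer in reversed(legs):          # sweep backwards, from D towards A
        cost_to_go = [min(d + c for d, c in zip(row, cost_to_go)) for row in layer]
    return cost_to_go[0]                  # optimal A-to-D distance

# Hypothetical distances: s = 2 intermediate layers (B, C), each of width n = 3
random.seed(1)
n = 3
legs = [
    [[random.uniform(100, 500) for _ in range(n)]],                    # A -> B nodes
    [[random.uniform(100, 500) for _ in range(n)] for _ in range(n)],  # B -> C nodes
    [[random.uniform(100, 500)] for _ in range(n)],                    # C nodes -> D
]
print("shortest A-to-D distance:", round(shortest_path_dp(legs), 1))
```

Each backward sweep reuses the stored cost-to-go of the layer ahead, which is exactly where the savings over brute-force enumeration come from.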
The minimization of V (x(t), u(t)) is made by considering all the possible control inputs u in
the time interval (t, t + δt). As suggested by dynamic programming, the return at time t is
constructed from the return at t + δt, and the differential component due to l(x, u). If V is
smooth and has no explicit dependence on t, as written, then using Taylor's theorem
$$f(a + \delta a) = f(a) + \delta a\, f'(a) + \frac{\delta a^2}{2!} f''(a) + \text{h.o.t.}$$
$$V(x(t + \delta t), u(t + \delta t)) = V(x(t), u(t)) + \frac{\partial V}{\partial x}\frac{dx}{dt}\,\delta t + \text{h.o.t.} \qquad (1.4)$$
$$= V(x(t), u(t)) + \frac{\partial V}{\partial x}\left(Ax(t) + Bu(t)\right)\delta t$$
Now the control input u in the interval (t, t + δt) cannot affect V(x(t), u(t)), so inserting this
result into equation 1.3 and cancelling gives
$$0 = \min_{u}\left\{ l(x(t), u(t)) + \frac{\partial V}{\partial x}\left(Ax(t) + Bu(t)\right) \right\} \qquad (1.5)$$
We next make the assumption that V (x, u) has the following form:
$$V(x, u) = \frac{1}{2} x^T P x \qquad (1.6)$$
where P is a symmetric, positive-definite matrix. It follows that
$$\frac{\partial V}{\partial x} = x^T P \quad \Rightarrow \qquad (1.7)$$
$$0 = \min_{u}\left\{ l(x(t), u(t)) + x^T P \left(Ax(t) + Bu(t)\right) \right\}$$
We finally specify the instantaneous penalty function. The LQR employs the special quadratic
form
$$l(x, u) = \frac{1}{2} x^T Q x + \frac{1}{2} u^T R u \qquad (1.8)$$
where Q and R are both symmetric and positive definite. The matrices Q and R are to be
set by the user, and represent the main tuning knobs for the LQR. Substitution of this form
into the above equation, and setting the derivative with respect to u to zero gives
$$0 = u^T R + x^T P B \qquad (1.9)$$
$$u^T = -x^T P B R^{-1}$$
$$u = -R^{-1} B^T P x$$
The gain matrix for the feedback control is thus $K = R^{-1} B^T P$. Inserting this solution back
into equations 1.7 and 1.8, and eliminating u in favor of x, we have
$$0 = \frac{1}{2} x^T Q x - \frac{1}{2} x^T P B R^{-1} B^T P x + x^T P A x \qquad (1.10)$$
All the matrices here are symmetric except for $PA$; since $x^T P A x = x^T A^T P x$, we can make
its effect symmetric by letting
$$x^T P A x = \frac{1}{2} x^T P A x + \frac{1}{2} x^T A^T P x$$
leading to the final matrix-only result
$$0 = PA + A^T P - P B R^{-1} B^T P + Q \qquad (1.11)$$
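As a quick illustration (not in the original notes), for a scalar plant $\dot{x} = ax + bu$ with weights $q, r > 0$, equation 1.11 reduces to a quadratic in the scalar $p$:
$$0 = 2ap - \frac{b^2}{r}p^2 + q \quad\Longrightarrow\quad p = \frac{r}{b^2}\left(a + \sqrt{a^2 + \frac{b^2 q}{r}}\right) > 0,$$
so that
$$k = r^{-1} b\, p, \qquad a - bk = -\sqrt{a^2 + \frac{b^2 q}{r}} < 0,$$
i.e. taking the positive (stabilizing) root of the quadratic always yields a stable closed loop.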
1.3.3 Dynamic Programming problem (Discrete time systems)
Problem:
Find the control sequence u(k) and state trajectory of a dynamic system/plant
in discrete form
$$x(k + 1) = f(k, x(k), u(k))$$
minimizing the discrete performance index
$$J_k(x(k)) = \phi(x(N), N) + \sum_{i=k}^{N-1} F(i, x(i), u(i))$$
where N is the terminal time index and $\phi(x(N), N)$ is the terminal cost.
Solution:
Assume we have already found an optimal control from k + 1 onwards, i.e. $u(k+1), u(k+2),
\ldots, u(N-1)$, with the optimal cost $J^*_{k+1}(x(k+1))$. If we apply any control u(k) at time
k, the cost is given by
$$J_k(x(k), u(k)) = F(x(k), u(k)) + J^*_{k+1}(x(k+1))$$
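As a concrete illustration of this backward recursion (not part of the original notes), the following sketch uses a hypothetical scalar plant x(k+1) = x(k) + u(k) with quadratic stage and terminal costs, quantizing the state and control onto grids; all numerical values are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical scalar plant, stage cost and terminal cost (illustration only)
f = lambda x, u: x + u                 # x(k+1) = f(x(k), u(k))
F = lambda x, u: 0.5 * (x**2 + u**2)   # stage cost F(x(k), u(k))
phi = lambda x: 0.5 * x**2             # terminal cost phi(x(N))

N = 3
x_grid = np.linspace(-10.0, 10.0, 201)   # quantized state values
u_grid = np.linspace(-5.0, 5.0, 101)     # quantized control values

# Backward recursion: J*_N(x) = phi(x), then
#   J*_k(x) = min_u { F(x, u) + J*_{k+1}(f(x, u)) }
J = phi(x_grid)
policy = [None] * N
for k in reversed(range(N)):
    x_next = f(x_grid[:, None], u_grid[None, :])                       # all (x, u) pairs
    total = F(x_grid[:, None], u_grid[None, :]) + np.interp(x_next, x_grid, J)
    best = total.argmin(axis=1)                                        # best control per state
    policy[k] = u_grid[best]                                           # u*(k) as a lookup table
    J = total[np.arange(len(x_grid)), best]                            # J*_k on the grid

# Apply the resulting policy from a hypothetical initial state x(0) = 4
x = 4.0
for k in range(N):
    u = np.interp(x, x_grid, policy[k])
    print(f"k={k}: x={x:+.3f}  u={u:+.3f}")
    x = f(x, u)
```

Because the state is quantized, the result is only approximate; the point is the structure of the backward sweep, which mirrors the recursion above.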
Algorithm:
Example:
For N = 2, minimize the performance index
$$J_0 = \frac{1}{2}\left(x(2) - 20\right)^2 + \frac{1}{2}\sum_{k=0}^{N-1} \left(2x^2(k) + 4u^2(k)\right)$$
Using the principle of optimality, find the control sequence u(0) and u(1). Assume no constraints on
u(k).
Solution:
The state estimator design problem is to choose the observer gain L in the observer error equation
$$\dot{e} = (A - LC)e$$
Consider the dual system
$$\dot{x} = A^T x + C^T u \quad \text{with} \quad u = -Kx$$
Since the K obtained by LQ optimal control design is stabilizing as long as certain stabilizability
and detectability conditions are satisfied, $L = K^T$ can be used as a stabilizing observer
gain as well.
$$\dot{x} = A^T x + C^T u \quad \text{with} \quad u = -Kx, \qquad C^T \rightarrow B, \quad A^T \rightarrow A$$
Thus
$$AP + PA^T - PC^T R^{-1} C P + Q = 0$$
The stabilizing feedback gain K for the dual system is given by
$$K = L^T = R^{-1} C P \quad \Rightarrow \quad L = P C^T R^{-1}$$
∆x = x − x̂ = e
∆y = C∆x = Ce
The error dynamics equation is given by
$$\dot{e} = (A - LC)e + \omega - L\upsilon$$
$\omega(t)$ and $\upsilon(t)$ are random disturbances and $x_0$ is a random initial variable. Let the spectral
densities for the random processes $\omega(t)$ and $\upsilon(t)$ be $Q_f$ and $R_f$ respectively:
• zero-mean Gaussian
• white noise
$$\varepsilon[\omega(t)\omega^T(\tau)] = Q_f \,\delta(t - \tau)$$
$$\varepsilon[\upsilon(t)\upsilon^T(\tau)] = R_f \,\delta(t - \tau)$$
$$\varepsilon[\upsilon_i(t)\omega_j^T(\tau)] = 0$$
$$\varepsilon[x_0 x_0^T] = S$$
where $\varepsilon(x)$ is the expected value (mean) of the random variable $x$. In other words
$$J_t = \varepsilon\{\tilde{x}(t)\tilde{x}(t)^T\} \qquad (1.14)$$
Solution:
The state observer equation will take the form
$$\frac{d\hat{x}(t)}{dt} = A\hat{x}(t) + L\left[y(t) - C\hat{x}(t)\right] \qquad (1.15)$$
with a quadratic performance index
$$L = P_f C^T R_f^{-1} \qquad (1.17)$$
By duality with the LQR problem, the steady state filter takes the form
$$\frac{d\hat{x}(t)}{dt} = A\hat{x} + L(y - C\hat{x}) \qquad (1.19)$$
where
$$L = P_f C^T R_f^{-1}$$
and $P_f$ is the stabilizing solution of the CTARE
$$A P_f + P_f A^T - P_f C^T R_f^{-1} C P_f + Q_f = 0$$
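As a numerical sketch (not from the notes; the system matrices and noise intensities below are illustrative assumptions), the steady-state filter gain can be obtained by solving the dual CARE with scipy:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical system and noise spectral densities, for illustration only
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
C = np.array([[1.0, 0.0]])
Qf = np.diag([0.1, 0.1])    # process noise spectral density
Rf = np.array([[0.01]])     # measurement noise spectral density

# Filter CTARE: A Pf + Pf A' - Pf C' Rf^{-1} C Pf + Qf = 0,
# solved as the LQR CARE of the dual system (A', C')
Pf = solve_continuous_are(A.T, C.T, Qf, Rf)

# Steady-state observer / Kalman gain L = Pf C' Rf^{-1}
L = Pf @ C.T @ np.linalg.inv(Rf)

print("L =", L.ravel())
print("eig(A - LC):", np.linalg.eigvals(A - L @ C))
```

The printed eigenvalues of A − LC should have negative real parts, confirming that the estimation error dynamics are stable.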
References: [2, 1, 3]
Bibliography
[1] B.D.O. Anderson and J.B. Moore, Linear optimal control, Prentice-Hall Electrical Engi-
neering Series, Prentice-Hall, 1971.
[2] G.C. Goodwin, S.F. Graebe, and M.E. Salgado, Control system design, Prentice Hall,
2001.
[3] Katsuhiko Ogata, Modern control engineering - fifth edition, Addison-Wesley, 2010.