Lecture 4 Control

The document discusses optimal control and related topics. It introduces the linear quadratic regulator problem for finding an optimal control law for a linear system with a quadratic performance index. It describes the continuous-time differential matrix Riccati equation and algebraic Riccati equation solutions. It also discusses the linear quadratic tracking problem, dynamic programming approach using the principle of optimality, and the linear quadratic observer and Kalman filter as an optimal observer.

Table of Contents

1 LQ: Optimal Control
1.1 Optimal Control: LQR
1.1.1 Quadratic performance index
1.1.2 The Linear quadratic problem (LQR)
1.2 LQT: LQ Tracking Problem
1.2.1 Optimal tracking system
1.3 Optimal Control: Dynamic Programming
1.3.1 Principle of optimality
1.3.2 Dynamic Programming and Full-State Feedback (Continuous time systems)
1.3.3 Dynamic Programming problem (Discrete time systems)
1.4 LQ Observer and Kalman filter (Optimal observer)
1.4.1 LQ Observer
1.4.2 The Kalman filter single variable estimation problem
1.4.3 Kalman filter as optimal observer

Bibliography
Chapter 1

LQ: Optimal Control

1.1 Optimal Control: LQR


An optimal control system seeks to maximize the return from a system for the minimum
cost. In general terms, the optimal control problem is to find a control u(t) which causes
the system to follow an optimal trajectory x(t) that minimizes a performance criterion (the
performance index) J.

The problem is one of constrained functional minimization, and has several approaches
namely:

1. Variational calculus - Euler-Lagrange equations

2. The maximum principle of Pontryagin - Hamiltonian function

3. Dynamic programming method of Bellman - principle of optimality (Hamilton-Jacobi partial differential equation)

4. The Hamilton-Jacobi equation solved for the special case of the linear time-invariant plant
with a quadratic performance criterion (called the performance index), which takes the
form of the matrix Riccati (1724) equation.

1.1.1 Quadratic performance index

If the state and control variables in equations (9.4) and (9.5) are squared, then the
performance index becomes quadratic. The advantage of a quadratic performance index is
that for a linear system it has a mathematical solution that yields a linear control law of the
form u(t) = −Kx(t).

1.1.2 The Linear quadratic problem (LQR)


The Linear Quadratic Regulator (LQR) provides an optimal control law for a linear system
with a quadratic performance index.

This equation (9.23) is referred to as the Continuous-Time Differential Matrix Riccati Equation (CTDMRE).

This equation (9.25) is referred to as the Algebraic Riccati Equation (ARE).

Example:

Determine
(a) the Riccati matrix P
(b) the state feedback matrix K
(c) the closed-loop eigenvalues
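The system data for this example is not reproduced in the text above, so the following is only a sketch of how (a)-(c) can be computed numerically, assuming Python with NumPy/SciPy and placeholder plant and weighting matrices; substitute the example's actual A, B, Q, R to reproduce its numbers.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Placeholder plant and weights (NOT the example's data)
A = np.array([[0.0, 1.0],
              [0.0, -1.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.diag([1.0, 1.0])      # state weighting
R = np.array([[1.0]])        # control weighting

# (a) Riccati matrix P: stabilising solution of A'P + PA - PBR^-1B'P + Q = 0
P = solve_continuous_are(A, B, Q, R)

# (b) state feedback matrix K = R^-1 B' P, giving u = -Kx
K = np.linalg.solve(R, B.T @ P)

# (c) closed-loop eigenvalues of A - BK (all real parts should be negative)
eigs = np.linalg.eigvals(A - B @ K)

print("P =\n", P, "\nK =", K, "\nclosed-loop eigenvalues:", eigs)
```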

1.2 LQT: LQ Tracking Problem


The tracking or servomechanism problem is defined in section 9.1.1(e), and is directed at
applying a control u(t) to drive a plant so that the state vector x(t) follows a desired state
trajectory r(t) in some optimal manner.
1.2.1 Optimal tracking system.

Fig: 9.2 Optimal tracking system.



1.3 Optimal Control: Dynamic Programming


The basic idea of dynamic programming is to treat optimization as a discrete, multistage
problem: at each of a finite set of times, a decision is chosen from a finite number of possible
decisions based on an optimization criterion referred to as the principle of optimality.

1.3.1 Principle of optimality


If the system has reached an intermediate point on an optimal path to some goal, then the
remainder of the path must constitute the optimal path from the intermediate point to the
goal.

An optimal policy has the property that, no matter what the previous decisions (i.e. controls)
have been, the remaining decisions must constitute an optimal policy with regard to the
state resulting from those previous decisions, i.e.

u(k) = f (x(k))

Example

Figure: Nodal Travel

Suppose that we are driving from Point A to Point C, and we ask what is the shortest
path in miles. If A and C represent Los Angeles and Boston, for example, there are many
paths to choose from! Assume that one way or another we have found the best path, and
that a Point B lies along this path, say Las Vegas. Let X be an arbitrary point east of Las
Vegas. If we were to now solve the optimization problem for getting from only Las Vegas
to Boston, this same arbitrary point X would be along the new optimal path as well. The
point is a subtle one: the optimization problem from Las Vegas to Boston is easier than that
from Los Angeles to Boston, and the idea is to use this property backwards through time to
evolve the optimal path, beginning in Boston.
Nodal Travel. We now add some structure to the above experiment. Consider now traveling
from point A (Los Angeles) to Point D (Boston). Suppose there are only three places to cross
the Rocky Mountains, B1,B2,B3, and three places to cross the Mississippi River, C1,C2,C3.
By way of notation, we say that the path from A to B1 is AB1. Suppose that all of the paths
(and distances) from A to the B-nodes are known, as are those from the B-nodes to the
C-nodes, and the C-nodes to the terminal point D. There are nine unique paths from A to
D. A brute-force approach sums up the total distance for all the possible paths, and picks
the shortest one.
In terms of computations, we could summarize that this method requires nine additions
of three numbers, equivalent to eighteen additions of two numbers. The comparison of
numbers is relatively cheap. The dynamic programming approach has two steps. First, from
each B-node, pick the best path to D. There are three possible paths from B1 to D, for
example, and nine paths total from the B-level to D. Store the best paths as B1D|opt ,
B2D|opt , B3D|opt . This operation involves nine additions of two numbers. Second, compute
the distance for each of the possible paths from A to D, constrained to the optimal paths
from the B-nodes onward: AB1 + B1D|opt , AB2 + B2D|opt , or AB3 + B3D|opt .
The combined path with the shortest distance is the total solution; this second step
involves three sums of two numbers, and total optimization is done in twelve additions of
two numbers. Needless to say, this example gives only a mild advantage to the dynamic
programming approach over brute force. The gap widens vastly, however, as one increases
the dimensions of the solution space. In general, if there are s layers of nodes (e.g., rivers
or mountain ranges), and each has width n (e.g., n river crossing points), the brute force
approach will take s·nˢ additions, while the dynamic programming procedure involves only
n²(s − 1) + n additions. In the case of n = 5, s = 5, brute force requires 15625 additions;
dynamic programming needs only 105.
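The figure's actual distances are not reproduced here, so the short sketch below uses made-up numbers purely to illustrate the two bookkeeping schemes described above (brute-force enumeration versus storing the best B-to-D legs first); both must return the same shortest distance.

```python
import itertools

AB = {'B1': 300, 'B2': 250, 'B3': 400}                 # A -> B-nodes (invented distances)
BC = {('B1', 'C1'): 700, ('B1', 'C2'): 800, ('B1', 'C3'): 900,
      ('B2', 'C1'): 650, ('B2', 'C2'): 750, ('B2', 'C3'): 850,
      ('B3', 'C1'): 500, ('B3', 'C2'): 600, ('B3', 'C3'): 700}
CD = {'C1': 1200, 'C2': 1100, 'C3': 1000}              # C-nodes -> D

# Brute force: enumerate all nine A-B-C-D paths (each path costs 2 additions)
best_bf = min((AB[b] + BC[(b, c)] + CD[c], b, c)
              for b, c in itertools.product(AB, CD))

# Dynamic programming: first the best B->D leg for every B-node (9 additions),
# then A->B plus that stored optimum (3 additions)
BD_opt = {b: min(BC[(b, c)] + CD[c] for c in CD) for b in AB}
best_dp = min((AB[b] + BD_opt[b], b) for b in AB)

print("brute force:", best_bf)
print("dynamic programming:", best_dp)   # same optimal distance
```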

1.3.2 Dynamic Programming and Full-State Feedback (Continuous time systems)

We consider here the regulation problem, that is, of keeping xdesired = 0. The closed-loop
system thus is intended to reject disturbances and recover from initial conditions, but not
necessarily follow y-trajectories. There are several necessary definitions. First we define an
instantaneous penalty function l(x(t), u(t)), which is to be greater than zero for all nonzero
x and u. The cost associated with this penalty, along an optimal trajectory, is
J = ∫_0^∞ l(x(t), u(t)) dt    (1.1)
i.e., the integral over time of the instantaneous penalty. Finally, the optimal return is
the cost of the optimal trajectory remaining after time t:
V(x(t), u(t)) = ∫_t^∞ l(x(τ), u(τ)) dτ    (1.2)
We have directly from the dynamic programming principle

V(x(t), u(t)) = min_u {l(x(t), u(t))δt + V(x(t + δt), u(t + δt))}    (1.3)

The minimization of V (x(t), u(t)) is made by considering all the possible control inputs u in
the time interval (t, t + δt). As suggested by dynamic programming, the return at time t is
constructed from the return at t + δt, and the differential component due to l(x, u). If V is
smooth and has no explicit dependence on t, as written, then using Taylor's theorem

f(a + δa) = f(a) + δa f′(a) + (δa²/2!) f′′(a) + h.o.t.

V(x(t + δt), u(t + δt)) = V(x(t), u(t)) + (∂V/∂x)(dx/dt) δt + h.o.t.    (1.4)
                        = V(x(t), u(t)) + (∂V/∂x)(Ax(t) + Bu(t)) δt
Now the control input u in the interval (t, t + δt) cannot affect V(x(t), u(t)), so inserting this
result in equation 1.3 and cancelling gives

0 = min_u {l(x(t), u(t)) + (∂V/∂x)(Ax(t) + Bu(t))}    (1.5)
We next make the assumption that V(x, u) has the following form:

V(x, u) = ½ xᵀPx    (1.6)

where P is a symmetric, positive definite matrix. It follows that ∂V/∂x = xᵀP, so that

0 = min_u {l(x(t), u(t)) + xᵀP(Ax(t) + Bu(t))}    (1.7)

We finally specify the instantaneous penalty function. The LQR employs the special quadratic
form

l(x, u) = ½ xᵀQx + ½ uᵀRu    (1.8)

where Q and R are both symmetric and positive definite. The matrices Q and R are to be
set by the user, and represent the main tuning knobs for the LQR. Substitution of this form
into the above equation, and setting the derivative with respect to u to zero, gives

0 = uᵀR + xᵀPB    (1.9)
uᵀ = −xᵀPBR⁻¹
u = −R⁻¹BᵀPx

The gain matrix for the feedback control is thus K = R⁻¹BᵀP. Inserting this solution back
into equations 1.7 and 1.8, and eliminating u in favor of x, we have

0 = ½ xᵀQx − ½ xᵀPBR⁻¹BᵀPx + xᵀPAx    (1.10)

All the matrices here are symmetric except for PA; since xᵀPAx = xᵀAᵀPx, we can make
its effect symmetric by letting

xᵀPAx = ½ xᵀPAx + ½ xᵀAᵀPx

leading to the final matrix-only result

0 = PA + AᵀP − PBR⁻¹BᵀP + Q    (1.11)
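As a quick numerical sanity check of equation (1.11), a standard ARE solver can be used to confirm that the residual is numerically zero and that A − BK is stable; the third-order system below is an arbitrary illustrative choice, not one taken from the text.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Arbitrary controllable plant and weights, chosen only for illustration
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [-1.0, -2.0, -3.0]])
B = np.array([[0.0], [0.0], [1.0]])
Q = np.eye(3)
R = np.array([[2.0]])

P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)        # K = R^-1 B' P

# Residual of 0 = PA + A'P - PBR^-1B'P + Q should be near machine precision
residual = P @ A + A.T @ P - P @ B @ K + Q
print("ARE residual norm:", np.linalg.norm(residual))
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ K))
```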
1.3.3 Dynamic Programming problem (Discrete time systems)
Problem:
Find control sequence u(k) and state trajectory of a dynamic system/plant

ẋ(t) = f (t, x(t), u(t))

in discrete form
x(k + 1) = f (k, x(k), u(k))
minimizing the discrete performance index
J_k(x(k)) = φ(x(N), N) + Σ_{i=k}^{N−1} F(i, x(i), u(i))

Where N is the terminal point time index and φ(x(N ), N ) is the terminal cost.

Solution:
Assume we have already found an optimal control from k + 1 onwards, i.e. u(k + 1), u(k + 2),
. . ., u(N − 1), with the optimal cost J*_{k+1}(x(k + 1)). If we apply any control u(k) at time
k, the cost is given by

J_k(x(k), u(k)) = F(x(k), u(k)) + J*_{k+1}(x(k + 1))

According to Bellman, the optimal cost from time k on is

J*_k(x(k)) = min_{u(k)} {F(x(k), u(k)) + J*_{k+1}(x(k + 1))}

Hence, the optimization is over only one control vector at a time.

Algorithm:
Example:

Given a discrete time dynamic system

x(k + 1) = 4x(k) − 6u(k), x(0) = 8

and a performance index


J_0 = F(x(k), u(k)) + J*_{k+1}(x(k + 1))
    = ½ (x(2) − 20)² + Σ_{k=0}^{N−1=1} (2x²(k) + 4u²(k))

Using the principle of optimality, find the control sequence u(0) and u(1). Assume no constraints on u(k).
Solution:
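The worked solution that followed here is not reproduced in this text. As a sketch of the backward Bellman recursion for this example, assuming the performance index as written (stage cost 2x²(k) + 4u²(k), terminal cost ½(x(2) − 20)²) and unconstrained u(k), the two steps of the principle of optimality can be carried out symbolically, for instance with SymPy:

```python
import sympy as sp

x, u = sp.symbols('x u', real=True)

A, B = 4, -6                             # x(k+1) = 4 x(k) - 6 u(k)
x0 = 8                                   # x(0) = 8
J2 = sp.Rational(1, 2) * (x - 20)**2     # terminal cost at k = N = 2

def bellman_step(J_next):
    # J_k(x, u) = F(x, u) + J*_{k+1}(4x - 6u), with stage cost F = 2x^2 + 4u^2
    stage = 2*x**2 + 4*u**2 + J_next.subs(x, A*x + B*u)
    u_star = sp.solve(sp.diff(stage, u), u)[0]   # unconstrained minimiser in u
    J_star = sp.expand(stage.subs(u, u_star))    # optimal cost-to-go J*_k(x)
    return J_star, sp.simplify(u_star)

J1, u1_policy = bellman_step(J2)   # optimal cost-to-go and policy at k = 1
J0, u0_policy = bellman_step(J1)   # optimal cost-to-go and policy at k = 0

u0 = u0_policy.subs(x, x0)         # roll the optimal policy forward from x(0) = 8
x1 = A*x0 + B*u0
u1 = u1_policy.subs(x, x1)
print("u(0) =", u0, " u(1) =", u1, " J0* =", J0.subs(x, x0))
```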

1.4 LQ Observer and Kalman filter (Optimal observer)


1.4.1 LQ Observer
For a system of the form

ẋ(t) = Ax(t) + Bu(t) (1.12)


y(t) = Cx(t)

The state estimator design problem is to choose the observer gain L in the observer equation

x̂˙ = Ax̂ + Bu + L(y − C x̂)

with the observer error dynamics equation

ė = (A − LC)e

so that the observer error dynamics are stable.

The related (dual) state feedback problem is to choose K in

ẋ = Aᵀx + Cᵀu with u = −Kx

which implies ẋ = (Aᵀ − CᵀK)x, and (Aᵀ − CᵀK) is stable.
By choosing L = Kᵀ for the observer, the observer is ensured to be stable.

Since the K obtained by LQ optimal control design is stabilizing as long as certain stabilizability
and detectability conditions are satisfied, L = Kᵀ can be used as a stabilizing observer
gain as well.

To solve the LQ control problem for the dual system

ẋ = Aᵀx + Cᵀu with u = −Kx

transform the algebraic Riccati equation with Cᵀ → B, Aᵀ → A. Thus

AP + PAᵀ − PCᵀR⁻¹CP + Q = 0

The stabilizing feedback gain K for the dual system is given by

K = Lᵀ = R⁻¹CP ⇒ L = PCᵀR⁻¹

Where L is the observer gain
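A minimal numerical sketch of this dual construction, assuming NumPy/SciPy and an illustrative observable pair (A, C) with placeholder weights (any such choice would do):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative plant; (A, C) must be observable for a stabilising L to exist
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
C = np.array([[1.0, 0.0]])
Q = np.eye(2)            # design weight on the dual "state" side
R = np.array([[0.5]])    # design weight on the dual "control" side

# Dual Riccati equation: A P + P A' - P C' R^-1 C P + Q = 0
P = solve_continuous_are(A.T, C.T, Q, R)
L = P @ C.T @ np.linalg.inv(R)

print("observer gain L =", L.ravel())
print("observer eigenvalues:", np.linalg.eigvals(A - L @ C))   # should be stable
```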

1.4.2 The Kalman filter single variable estimation problem


In the design of state observers in previous sections (LQR and LQT), it was assumed that
the measurements y = Cx were noise free. In practice, this is not usually the case and
therefore the observed state vector x may also be contaminated with noise.

1.4.3 Kalman filter as optimal observer


For a linear stochastic system, process disturbance and measurement noise are included to
account for model uncertainty, and the plant/process model is modified to

ẋ(t) = Ax(t) + Bu(t) + ω(t) (1.13)


y(t) = Cx(t) + υ(t)

The observer equation is given by

x̂˙ = Ax̂ + Bu + L(y − C x̂)

with the observer error dynamics


ė = (A − LC)e
Let the errors be

∆x = x − x̂ = e
∆y = C∆x = Ce
The error dynamics equation is given by

ė = (A − LC)e + ω − Lυ

Formal formulation of Kalman filters

ω(t) and υ(t) are random disturbances and x0 is a random initial variable. Let the spectral
densities for random processes ω(t) and υ(t) be Qf and Rf respectively

The random variables/noise have the following properties

• They are stationary

• Zero-mean Gaussian

• White noise

These imply the following relationships:

ε[ω(t)ωᵀ(τ)] = Qf δ(t − τ)
ε[υ(t)υᵀ(τ)] = Rf δ(t − τ)
ε[υᵢ(t)ωⱼᵀ(τ)] = 0
ε[x₀x₀ᵀ] = S

Where ε(x) is the expected value (mean) of the random variable x. In other words

• ω(t) is unrelated to ω(τ) whenever τ ≠ t

• υ(t) is unrelated to υ(τ) whenever τ ≠ t

• υ(t) is not related to ω(τ) at all

• Qf is the expected size of ω(t)ωᵀ(τ)

• Rf is the expected size of υ(t)υᵀ(τ)

• S is the expected size of x₀x₀ᵀ


Our objective is to find a linear filter (observer) driven by y(t) which produces a state estimate
x̂(t) and at the same time the filter is optimized by minimizing the quadratic function

J_t = ε{x̃(t)x̃ᵀ(t)}    (1.14)

Where x̃(t) = x̂(t) − x(t) is the estimation error.

Solution:
The state observer equation will take the form
dx̂(t)/dt = Ax̂(t) + L[y(t) − Cx̂(t)]    (1.15)
with a quadratic performance index

J_t = ε{(x̂(t) − x(t))(x̂(t) − x(t))ᵀ} = ε{x̃(t)x̃ᵀ(t)}    (1.16)

Where L is the optimal gain which satisfies

L = Pf CᵀRf⁻¹    (1.17)

and Pf is the solution to

APf + Pf Aᵀ − Pf CᵀRf⁻¹CPf + Qf = 0    (1.18)


The solution has a very close connection to the LQR problem. The properties of the
optimal filter therefore follow directly from the optimal LQR solution by making the following
correspondences:

LQR    Kalman filter
A      Aᵀ
B      Cᵀ
Q      Qf
R      Rf

By duality with the LQR problem, the steady state filter takes the form

dx̂(t)/dt = Ax̂ + L(y − Cx̂)    (1.19)

Where

L = Pf CᵀRf⁻¹

and Pf is the stabilizing solution of the following CTARE

Qf − Pf CᵀRf⁻¹CPf + Pf Aᵀ + APf = 0    (1.20)
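The sketch below exercises this steady-state filter on a toy second-order plant, assuming NumPy/SciPy; the plant and the noise intensities Qf, Rf are invented for illustration, and the continuous white noise is approximated crudely in a forward-Euler simulation.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Toy plant and invented noise intensities (illustration only)
A = np.array([[0.0, 1.0],
              [-1.0, -0.5]])
C = np.array([[1.0, 0.0]])
Qf = 0.05 * np.eye(2)        # process-noise spectral density
Rf = np.array([[0.01]])      # measurement-noise spectral density

Pf = solve_continuous_are(A.T, C.T, Qf, Rf)   # CTARE (1.20) via the dual form
L = Pf @ C.T @ np.linalg.inv(Rf)              # L = Pf C' Rf^-1

# Forward-Euler simulation of the unforced plant and the filter
dt, steps = 0.01, 2000
rng = np.random.default_rng(1)
x = np.array([1.0, 0.0])      # true state
xhat = np.zeros(2)            # filter estimate
for _ in range(steps):
    w = rng.normal(0.0, np.sqrt(Qf[0, 0] / dt), size=2)   # crude white-noise approximation
    v = rng.normal(0.0, np.sqrt(Rf[0, 0] / dt), size=1)
    y = C @ x + v
    x = x + dt * (A @ x + w)
    xhat = xhat + dt * (A @ xhat + L @ (y - C @ xhat))

print("final estimation error:", x - xhat)
```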

References: [2, 1, 3]
Bibliography

[1] B.D.O. Anderson and J.B. Moore, Linear optimal control, Prentice-Hall Electrical Engi-
neering Series, Prentice-Hall, 1971.

[2] G.C. Goodwin, S.F. Graebe, and M.E. Salgado, Control system design, Prentice Hall,
2001.

[3] Katsuhiko Ogata, Modern control engineering - fifth edition, Addison-Wesley, 2010.

