ASE396 Methods of Estimation/Detection Scribe Notes

ASE 396: Model-Based Dete
tion and Estimation
Class Notes
Spring 2012
Contents
1 Overview of Estimation and Dete tion
1.1
Appli ations of Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.1.1
Fantasti ally simple: Least squares problem . . . . . . . . . . . . . . . . . . .
1.1.2
Fantasti ally omplex: Estimation of GPS . . . . . . . . . . . . . . . . . . . .
1.2
General Solution Pro edure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.3
We Want to Know Our Unknowns
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
1.4
Example: Missile Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2 Linear Algebra Review
2.1
Ve tors, matri es and operations
. . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2
Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10
2.2.1
Properties of norms
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11
2.2.2
S hwartz inequality
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11
Determinant and inverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11
2.3.1
Determinant
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11
2.3.2
Singularity and non-singularity . . . . . . . . . . . . . . . . . . . . . . . . . .
12
2.3.3
Matrix inversion
12
2.3.4
Inversion of partitioned matri es
2.3.5
Matrix inversion lemma
2.3
2.4
2.5
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .
12
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
13
Ve tor transformation and matrix fa torization . . . . . . . . . . . . . . . . . . . . .
13
2.4.1
The Householder Transformation . . . . . . . . . . . . . . . . . . . . . . . . .
13
2.4.2
QR Fa torization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
14
2.4.3
Cholesky Fa torization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
14
Eigenvalues and ve tors
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3 Review of Probability Theroy
14
15
3.1
Probability: what is it?
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
16
3.2
Axiomati Approa h to Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . .
17
4 Review of Probability and Statisti s
20
4.1
Conditional Probability
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
20
4.2
Independent Events and Independent Random Variables . . . . . . . . . . . . . . . .
20
4.2.1
20
Further impli ations of independen e . . . . . . . . . . . . . . . . . . . . . . .
CONTENTS
CONTENTS
4.3
Ve tor-Valued Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
21
4.4
Total Probability Theorem
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
22
4.5
Bayes's Theorem
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
22
4.5.1
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
23
4.6
Conditional Expe tation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
23
4.7
Gaussian Random Ve tors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
24
4.7.1
Bayesian Jargon
Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
24
4.8
Joint and Conditional Gaussian Random Variables . . . . . . . . . . . . . . . . . . .
24
4.9
Expe ted Value of a Quadrati Form . . . . . . . . . . . . . . . . . . . . . . . . . . .
25
5 Dete tion/Hypothesis Testing Basi s
26
5.1
Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
26
5.2
Neyman-Pearson Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
26
5.3
Example:
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
27
5.4
Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
29
6 Optimum Dete tion Stru tures

6.1
Dete tor Performan e
6.2
Composite Hypothesis Testing
30
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7 Estimation Basi s
30
30
34
7.1
The Problem of Parameter Estimation . . . . . . . . . . . . . . . . . . . . . . . . . .
34
7.2
Maximum Likelihood Estimators
34
7.3
Maximum A Posteriori Estimators
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
34
7.4
Example 1: ML Estimator for a Linear First-Order Dynami al System . . . . . . . .
35
7.5
Example 2: MAP Estimator for a Linear First-Order Dynami al System . . . . . . .
36
7.6
Least-Squares Estimators
7.7
Example 3: LS Estimator for a Linear First-Order Dynami al System
. . . . . . . .
36
7.8
Minimum Mean-Squared Error Estimators . . . . . . . . . . . . . . . . . . . . . . . .
37
7.9
Summary
37
. . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8 Linear estimation for stati systems

8.1
8.2
38
MAP estimator for Gaussian problems . . . . . . . . . . . . . . . . . . . . . . . . . .
38
8.1.1
40
Analysis of MAP Estimate
Bat h Least Squares Estimator

8.2.1
8.3
36
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
40
Properties of the Least Squares Estimator . . . . . . . . . . . . . . . . . . . .
42
Square-Root-Based LS Solutions
. . . . . . . . . . . . . . . . . . . . . . . . . . . . .
42
8.3.1
Re ursive Least Squares Estimator . . . . . . . . . . . . . . . . . . . . . . . .
44
8.3.2
Analysis of Re ursive LS Algorithm
46
. . . . . . . . . . . . . . . . . . . . . . .
8.4
Example: Maximum Likelihood Estimate
. . . . . . . . . . . . . . . . . . . . . . . .
46
8.5
Re ursive Approa h Using Square-Root LS Method . . . . . . . . . . . . . . . . . . .
47
8.5.1
Review of square-root LS method . . . . . . . . . . . . . . . . . . . . . . . . .
47
8.5.2
Re ursive Square-Root LS . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
48
CONTENTS
CONTENTS
9 Nonlinear Least Square Estimation
50
9.1
Basi s of nonlinear least square estimation . . . . . . . . . . . . . . . . . . . . . . . .
9.2
Newton-Rhaphson method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
51
9.3
Gauss-Newton Algorithm (Gill et al. 4.7.2)(with step length algorithm)
. . . . . . .
53
9.4
Levenberg-Marquart Method (LM) . . . . . . . . . . . . . . . . . . . . . . . . . . . .
54
10 Sto hasti Linear System Models
50
55
10.1 Continuous-time model for dynami systems . . . . . . . . . . . . . . . . . . . . . . .
55
10.2 White noise for sto hasti systems
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
56
10.3 Predi tion of mean and ovarian e
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
56
10.4 Dis rete-time models of sto hasti systems . . . . . . . . . . . . . . . . . . . . . . . .
57
10.5 Dis rete-time measurement model
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
59
10.6 Full dis rete-time model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
59
11 Kalman lter for dis rete-time linear system
61
12 Alternative formulas for ovarian e and gain of Kalman lter
65
12.0.1 Interpretation of the Kalman gain
. . . . . . . . . . . . . . . . . . . . . . . .
65
12.0.2 Generalization for Weighted Pro ess Noise . . . . . . . . . . . . . . . . . . . .
66
12.0.3 Deriving the Kalman Filter from a MAP approa h . . . . . . . . . . . . . . .
66
12.0.4 Setting up the ost fun tion . . . . . . . . . . . . . . . . . . . . . . . . . . . .
66
12.0.5 Minimizing the ost fun tion
. . . . . . . . . . . . . . . . . . . . . . . . . . .
67
12.0.6 Solving for the minimum- ost estimate . . . . . . . . . . . . . . . . . . . . . .
68
13 Stability and Consisten y of Kalman Filter

13.1 Stability of KF
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
13.2 Control of a System Estimated by KF
70
70
. . . . . . . . . . . . . . . . . . . . . . . . . .
71
13.3 Matrix Ri ati Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
72
13.4 Steady-State KF Equations
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
73
13.5 Steady-State Error Dynami s . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
73
13.6 Properties of KF Innovations
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
73
13.7 Likelihood of a Filter Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
74
14 Kalman Filter Consisten y
75
14.1 Consisten y Denition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
75
14.2 Statisti al test for KF performan e . . . . . . . . . . . . . . . . . . . . . . . . . . . .
75
14.2.1 Monte Carlo Simulation based test . . . . . . . . . . . . . . . . . . . . . . . .
76
14.2.2 Real-Time (Multiple-Runs) Tests . . . . . . . . . . . . . . . . . . . . . . . . .
77
14.2.3 Real-Time (Single-Run) Tests . . . . . . . . . . . . . . . . . . . . . . . . . . .
79
15 Correlated Pro ess and Measurement Noise

15.1 Aside: Realization Problem
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
15.2 Aside: Spe tral Fa torization
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
81
83
83
CONTENTS
CONTENTS
16 Information Filter/SRIF
16.1 Information Filter
84
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
84
16.1.1 Benets of the Information Filter . . . . . . . . . . . . . . . . . . . . . . . . .
85
16.1.2 Disadvantages of the Information Filter
85
16.2 Square Root Information Filtering
. . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . .
86
16.2.1 Propagation Step and Measurement Update . . . . . . . . . . . . . . . . . . .
88
16.3 Benets of the square root information lter
17 Smoothing
17.1 Estimate
x(k)
based on
Zj
. . . . . . . . . . . . . . . . . . . . . .
91
92
with
j>k
. . . . . . . . . . . . . . . . . . . . . . . . . .
92
17.2 Steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
93
18 Nonlinear Dieren e Equations from ZOH Nonlinear Dierential Equations
95
18.1 Zero-Order-Hold (ZOH) Assumption . . . . . . . . . . . . . . . . . . . . . . . . . . .
95
18.2 Varian e of the ZOH Pro ess Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . .
96
18.3 Partial Derivatives of Dieren e Equations . . . . . . . . . . . . . . . . . . . . . . . .
97
19 Nonlinear Estimation for Dynami al Systems

19.1 Standard EKF Methods for Approximation
19.1.1 Predi tion:
100
. . . . . . . . . . . . . . . . . . . . . . . 100
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
19.1.2 Measurement update:
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
19.2 EKF as an algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
20 Iterated Kalman Filter
104
20.1 Re-interpretation of EKF as a Gauss Newton Step
. . . . . . . . . . . . . . . . . . . 104
20.2 Iterated Kalman Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

20.3 Forward Ba kward Smoothing (BSEKF) . . . . . . . . . . . . . . . . . . . . . . . . . 107
21 Multiple Model (MM) Filtering

21.1 Stati Case
108
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
21.1.1 Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

21.1.2 Steps
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
21.2 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
22 The "Bootstrap" Parti le Filter

22.1 Motivation
111
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
22.1.1 Propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

22.1.2 Update
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
22.1.3 Importan e Sampling
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
22.2 Parti le Filter Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

k
22.2.1 How to hoose q[x(k)|z ]? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
22.3 Bootstrap Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
22.3.1 Note:
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
1 OVERVIEW OF ESTIMATION AND DETECTION
1 Overview of Estimation and Dete tion

S ribe: Kyle Wesson
Estimation
observations.
is the pro ess of inferring the value of a quantity from indire t and ina urate
Estimation an be thought of as working with a ontinuoum of values.
We will
frequently seek optimal solutions over heuristi solutions. The so- alled optimal solutions seek to
provide the best estimates with some quanti ation of their a ura y.
Dete tion
is the pro ess of making de isions based on our estimates. It inovlves hypothesis
testing against a set of nite possibilities and sele ting one that best represents our estimates.
Dete tion an be though of as a subset of estimation.
disturbances and
modeling errors
control
inputs
Dynamical
System
prior
information
measurement
errors
system
state
Measurement
System
measurements
State
Estimator
state estimate
and uncertainties
this gets you about 80% of the way

we often do this naturally (e.g., playing catch)
Figure 1: A blo k diagram illustrating the estimation pro ess.
1.1 Appli ations of Estimation

1.1.1 Fantasti ally simple: Least squares problem
We often work with equations of the form
z = Hx + w
where
is a ve tor of deterministi but unknown parameters,
is a matrix that linearly relates
to
z,
and
(1)
is a ve tor of Gaussian noise,
is a measurement ve tor.
1.1.2 Fantasti ally omplex: Estimation of GPS

Consider the IGS Tra king Network with 300 ground sites (shown in Fig. 2) tra king GPS
satellites in 6 planes with 4 (or more) satellites (SV) per plane.
Q: How do the sites know where they are?

A: They range o of several SVs whose positions are known.
Q: How do the SVs know where they are?
A: They range o of the sites... Oh no! ir ular logi .
When all of the quantities of interest are ompiled together, one has a staggeringly omplex
estimator:
non-linear state dynami s and measurement equations (square root sum of squares)
1.2 General Solution Pro edure
Figure 2: Map showing lo ation of the ground-based IGS sites (from
non-uniform gravitational potential
disturban e for es on SVs su h as irregular solar heating
rustal motion alters ground site lo ation
ionosphere and neutral atmosphere have ee ts
earth orientation parameter variations (EOP)
lo k variations
Goal:
http://igs b.jpl.nasa.gov)
estimate all site lo ations, all SV orbital parameters, atmospheri delays, slowly varying
SV disturban es, EOP, lo k errors. Wow!
1.2 General Solution Pro edure

1. Develop a physi s-based model that relates the unknown parameters or states to the measurements and identify in model the sour es of un ertainty, quantify levels.
2. Do an analysis to determine whether the unknown quantities are uniquely determinable from
the measurements (observability).
3. If answer to 1. is yes, apply estimation te hnique to estimate the unknown quantities. Also
estimate the a ura y of our estimates.
1.3 We Want to Know Our Unknowns

There are known knowns, These are things we know that we know. There are known
unknowns. That is to say, there are things that we known we don't know. But there
are also unknown unknowns. There are things we don't know we don't know.
Rumsfeld, Feb 12, 2002
In addition, we an onsider:
known an't-knowns: observability analysis says not uniquely determinable
unknown an't knowns: didn't do the observability analysis
Donald
1.4 Example: Missile Problem

x2
missle
x1
Figure 3: An illustration of the missile tra king problem.

Physi s:
x
1 = 0
(2)
x
2 = g
(3)
y1k = x1 (tk ) + n1k

y2k = x2 (tk ) + n2k
(4)
Measurements:
(5)
Assume:
E[n1k ] = E[n2k ] = 0
E[n1k n1j ] =
E[n2k n2j ] =
Unknown Quantities:
(6)
12 kj
22 kj
(7)
(8)
x1 (0), x2 (0), v1 (0), v2 (0) where v1 = x 1 and v2 = x 2 .

x1 (t) and x2 (t) (the dynami s
First, solve equation of motion to determine
model):
x1 (t) = x1 (0) + v1 (0)t
(9)
x2 (t) = x2 (0) + v2 (0)t gt /2
(10)
Next, determine the measurements:
y1 (t) = x1 (0) + v1 (0)t + n1k
(11)
(12)
y2 (t) = x2 (0) + v2 (0)t gt /2 + n2k
Q: Are x1 (0) (and other quantities) observable

A: Yes! (for n 2)
from
[y11 , y12 , y21 , . . . , y1n , y2n ]?
Now develop an estimation algorithm and apply it:

1 0
y11
y22 0 1
=
.
.
.
| {z }
z
.
.
.
.
.
.
t1
0
{z
H
.
.
.
x (0)

1
0
n11
0
x2 (0) gt2 /2 n22

t1
v (0) +
+
.
.
.
1
.
.
.
.
.
.
v2 (0)
} | {z } | {z } | {z }
x
Assuming some estimation strategy, we get the following quantities:

Then, we an predi t the impa t lo ation noting
x
1 (0), x
2 (0), v1 (0), v2 (0).
x2 (timpact ) = 0.
0=x
2 (0) + v2 (0)timpact gt2impact/2
p
2 (0)
v2 (0) + v22 (0) + 2g x
timpact =
g
timpact into previous equation for site estimate: x
1 (timpact ). We
ximpact . Re ursion: in orporate new measurements as they arise.
Plug
and
(13)
(14)
(15)
an also estimate
timpact
2 LINEAR ALGEBRA REVIEW
2 Linear Algebra Review

S ribe: Chirag Patel
2.1 Ve tors, matri es and operations

Ve tor
= ordered list of quantities:
v1
v2

v= .
..
vn
where
vj
is the
j th
s alar element and
Matrix
where
is an
nm
matrix,
v Rn .
a11
Outer Produ t:
s alar
If
a12
a21
A=
.
..
an1

A = aij
..
= aT b
b Rn1
if
and
a, b
a1m
..
.
.
.
Q: What is the rank of the outer produ t?

A: 1, it is a produ t of two rank one ve tors.
anm
(i, j)th
(17)
(18)
element (row
ve tors. Notation:
then
A = bcT Rnm
of a matrix is the number of independent rows or olumns.
Tra e:
of
If
is the sum of the diagonal elements: tr(A)
Symmetri Matrix:
n
P
i=1
A Rnm , B Rmn ,
if
then tr(AB)nn
A = AT
= tr(BA)mm
for a square matrix
olumn
j ).
(19)
Rank:
i,
AT = C T DT
n1
c Rm1 ,
.
.
.
is the
then
are
A Rnm ; aij
A = BC ,
Inner Produ t:
(16)
A.
aii .
ha, bi
with
aij = bi cj .
2.2 Norms
Quadrati Form:
If
a quadrati form in
P = PT,
x Rn1 .
then we an dene
= xT P x
(20)
n X
n
X
(21)
Or,
xi Pij xj
j=1 i=1
Positive deniteness: Given P = P T , if xT P x > 0 x 6= 0, then P is positive denite

T
(notation: P > 0). If only x P x 0, then P is positive semidenite (notation: P 0).

1
0.5
1 5
Q: Whi h is positive denite:
or
?
0.5
2
5 2
A: The rst one is, the se ond is not.

Q: Can one tell by inspe tion?
A: Not in general, but P > 0
all eigenvalues are positive (we will learn about eigenvalues
soon).
Weighted inner produ t: ha, bip = aT P b
is a weighted inner produ t if
P > 0.
2.2 Norms
Norm of x ould be:
l1 : kxk1 =
l2 : kxk2 =
n
P
i=1
|xi |,
n
P
i=1
x2i
Manhattan norm"
1/2
xT x,
Eu lidean norm"
l : kxk = max |xi |

i
Aside: The three norms dened above are shown in Figure
?? for the n = 2 ase.
The interse -
tions on the axes are the same in ea h ase.

Fun fa t #2: The l distan e is also alled the hessboard distan e be ause it is equal to the minimum number of moves a king must make to go from one square to another, if ea h square is of side
length
1.
Also, for an entertaining (and informative) read on taxi ab geometry (that uses the l1 - in-
stead of the l2 -norm for distan e), try

In general, the
p-norm
http://www.ams.org/samplings/feature- olumn/f ar -taxi.
an be dened
p 1:
1/p
kxp k = (|x1 |p + |x2 |p + + |xn |p )
10
(22)
2.3 Determinant and inverse
2.2.1 Properties of norms

Three properties of norms:
1.
kxk 0
2.
kxk = ||kxk
3.
ka + bk kak + kbk
If
P > 0,
and
then
kxk = 0
kxk2P =
i
x=0
- the triangle inequality
xT P x;
this is also a valid norm alled the
Hereafter, as a matter of notation,
P -weighted
norm.
kxk2 = kxk.
Indu ed matrix norm:

kAknm = max kAxk
(23)
|xT y| kxkkyk
(24)
kxk=1
2.2.2 S hwartz inequality
It's like saying
| cos | 1.

2.3.1 Determinant
Determinant of A where A is square = |A|
|A| =
where
Also,
Cij
is

a =a
with the
for a
ith
11
n
X
i=1
row and
aij |Cij |(1)i+j j = 1, . . .
(25)
j th olumn removed.

a b

c d = ad bc
(26)
matrix (this is not the absolute value, in ase
Q: Where does
the
determinant ome from?
d b
a b
1
1
, then A
= |A|
.
A: If A =
c
about a matrix.
This is re ursive until
11
is negative).
The determinant tells us ru ial information
2.3.2 Singularity and non-singularity

A Rnn
is singular if
|A| = 0.
is rank-de ient if
If
|A| 6= 0,
then
If
|A| = 0,
then
x 6= 0
But if
|A| 6= 0,
rank(A) < n.
is non-singular (all olumns are linearly independent).
then
su h that
Ax = 0.
Ax = 0 x = 0.
2.3.3 Matrix inversion

If
A Rnn
and
|A| 6= 0, A1
A1 =
|A|
su h that
|C11 |
A = BC ,
then
|C21 |
..
|C12 |
.
.
.
1+n
(1)
If
A1 A = AA1 = I .
A1 = C 1 B 1 ,
.
..
|C1n |
provided that
A, B ,
and
(1)n+1 |Cn1 |
.
.
.
.
.
|Cnn |
C
(27)
are non-singular.
Solution of linear equations: If A Rnn and |A| 6= 0, and b Rn1 ,

Ax = b by x = A1 b. Note that x is a unique solution and x Rn1 .
then we an solve
2.3.4 Inversion of partitioned matri es

Let
P
P = 11
P21
P12
P22
with
P11 , P22
square. If
P11
and
1
= P22 P21 P11
P12
are invertible (
is the S hur omplement), then:
P 1 =
V11
V21
Where
1
1
1
V11 = P11
+ P11
P12 1 P21 P11
1
V12 = P11
P12 1
1
V21 = 1 P21 P11
V22 = 1
Note that
dim(Pij ) = dim(Vij ).
12
V12
V22
(28)
2.4 Ve tor transformation and matrix fa torization
2.3.5 Matrix inversion lemma

(A + BCD)
= A1 A1 B DA1 B + C 1
Alternatively,
1. If
D = BT ,
1
= A1 A1 B B T A1 B + C 1
1
B T A1
A = P 1 , B = H T , D = H , C = R1 , then
1
1
P 1 + H T R1 H
= P P H T HP H T + R
HP
Orthonormal matri es (orthogonal matri es):

Q
DA1
(29)
then
A + BCB T
2. If
1
is orthonormal. Thus
All olumns of
If
QT Q = QQT = I .
Q is n n and |Q| 6= 0 and Q1 = QT ,
are perpendi ular to ea h other and have unit norm:
Important appli ation of orthogonal matrix:
kQxk = kxk: Q
(30)
(31)
then
qiT qj = ij .
is isometri .
kQxk2 = xT QT Qx = xT x = kxk2
2.4 Ve tor transformation and matrix fa torization

2.4.1 The Householder Transformation
See Bierman p59-63, Gill et. al. p35-36.
T

x2 xn , then there is a matrix
Let x = x1
for some
The matrix

Hx =
is symmetri and orthogonal.
H
0
su h that
T
(32)
is alled a Householder matrix.
kHxk = kxk =
(33)
Orthogonal transformation an be used to ompress all the magnitude ("energy") of a ve tor in to

a single omponent, zeroing all other omponents.
has the form
H=I
We an solve for
that makes

Hx =
2vv T
vT v
T
0 :

1
0

v = x + sign(x1 )kxk 0
..
.
0
13
(34)
(35)
2.5 Eigenvalues and ve tors
2.4.2 QR Fa torization
Given any
A Rmn ,
we an express A as a produ t of an orthogonal matrix and an upper
triangular matrix:
QR = A
QT Q = QQT = I , Q Rmm , and R Rmn . R is an upper triangular matrix,
ne essarily square (all elements of R below the main diagonal are zero).
1
If A is square and invertible, then so is R. What's more, R
is easy to ompute.

e
R
e is n n and upper triangular.
If A is m n with m > n, then R =
; R
0
where
Q=
(36)
and not
H1 H 2 . . . H p
{z
}
|
(37)
(Householder transforms)
p = min(m, n).
with
Use MATLAB:
[Q, R = qr(A);
2.4.3 Cholesky Fa torization

Given an n
T
that R R = P .
matrix
P = PT
with
is sort of a matrix square root of
P > 0,
it is possible to nd an upper triangular
su h
P.
Use MATLAB:
R = hol(P);
2.5 Eigenvalues and ve tors

A Rnn , s alars i (possible omplex) and asso iated ve tors vi , whose elements are
omplex if i C su h that Avi = i vi with kvi k 6= 0.
This an also be written as (A i I)vi = 0.
Thus, det(A i I) = 0 gives a degree n polynomial in i ( hara teristi equation) whose roots
are i .
T
If A = A : n eigenvalues of A are real and the n eigenve tors are independent and orthogonal.
Given
The onverse is not ne essarily true.
If
A > 0:
all
i > 0 ().
If
A 0:
all
i 0 ().
Also,
|A| =
n
Q
i=1
i ;
tr(A)
n
P
i .
i=1
14
3 REVIEW OF PROBABILITY THEROY
3 Review of Probability Theroy

S ribe: Lu Xia
Q: Where does randomness - a omponent randomness - originate?
A1: Essentially random: atomi s ale.
Heisenberg un entainty prin iple:
(Figure 4)
p1 (x)
p2 (v)
x0
v0
(a) Position of parti le
(b) Speed of parti le
Figure 4: Heisenberg un ertainty prin iple
x v
n
m
(38)
n plan k's onstant

m mass of parti le
"Our most pre ise des ription of nature must be in terms of probabilities" R.Feynman.
A2: Ee tively random

Assume a parti le is aught in a potential given by
along
and a parti le of mass
m = 1,
we have,
x
=
Add a bit of dampling:
V (x) =
dV
= x3 + x
dx
x4
4
x2
(Figure 5), for motion only
(39)
x
= x x3 + x,
x = v
(40)
x
= v = x x + x.
15
(41)
3.1 Probability: what is it?
Figure 5: potential
1.5
xdot
0.5
0.5
1.5
2
1.5
0.5
0
x
0.5
1.5
Figure 6: Phase portrait
Phase portrait: (Figure 6)
1 2
2 v + V (x) ,the total density of the basin of attra tion for some
approa hes 50%. This implies a requirement of in reasing energy pre ision
For large energies
E(x) =
neighbourhood E
E = E
E to spe ify the sink that we want.
alled "transient haos" in nonlinear dynami s. (Strogatz)
3.1 Probability: what is it?

(1). Measure of belief denition: make argumens for out omes based on, e.g. the phase portrait
in pro eeding example. (No experiments required perhaps imagined experiments maybe not
even that.)
(2). Relative frequen y denition:
P (A) = lim
This denition gives mathematitians ts:

1. In what sense doese the limit exist?
2. Can't evern perform
N =
experiments.
3. What about non repeatable experiments?
16
NB
N
(42)
3.2 Axiomati Approa h to Probability
experiement: a pro ess with (ee tively) random out omes.

event: a set of possible out omes of experiments.
S = sample space (all possible outcomes)
Figure 7: Sample Spa e

The event A o urs when an experiment has been done and the out ome is an element of A.
Probability of A: a number that satises the follows axioms.

0
= 1. (S = sample spa e, "sure event")
(3) Let A B , set of out omes in both A and B.
If A B = , then P (A B) = P (A) + P (B)
(1)P (A)
(2)P (S)
onsequen e:
P (A) = 1 P (A)
A=
"not A"
S =AA
AA =
(44)
(45)
P (A A) = P (A) + P (A) = 1
Assign probabilities by whatever means.(e.g.
(43)
relateive frequen y, means of belief.)
(46)
But they
must always satisfy these axioms.
Probability Density Fun tion (pdf)

event:
{x : d < x }
probability density fun tion:
x() , lim
d0
P ( d < x )
d
(47)
(48)
ommonly written as:
p(x), px (x)
P ( < x ) =
17
p(x)dx
(49)
Commulative Density fun tion:Px () , P (x < ) =
inf p(x)dx
Note:
"sure event", this onstraint also implies a normalization requirement on the pdf.
()2
1
e 22
p() =
2
R
Px ( ) = I .
(deathbed identity)
the
(50)
Dis rete-valued random variable

i ,
If x an only take a dis rete value
for
i = 1, . . . n,
then we an dene probability masses
i ,
i = 1, 2, . . . , n.
i = P (x = p )
p(x) =
n
X
i=1
()
i = 1
(51)
i (x i )
(52)
is Dira delta fun tion.
Expe ted value:

E[x] ,
xP (x)dx = x
(mean of x)
(53)
In general,
E[g(x)] =
g(x)p(x)dx
(54)
(55)
varian e of x:
x2 = E[(x x
)2 ] =
We say
If
x [a, b]
if
E[x] = a, x2 = b.
x =
(56)
(57)
Also,
is Gaussian,
(x x)2 p(x)dx = E[x2 ] x

2
x 0
is the standard deviation of x.
x(
1
(x)
)e 2x2 dx =
2
(58)
(59)
similarly,
x2 = 2
18
Joint PDF of 2 Random Variables

p(, ) ,
P {( d < x ) ( d < x )}
d0,d0
dd
lim
(60)
(61)
events on a joingt distribution are regions in 2-spa e.

out omes are points.
p(x) =
p(x, y)dy.
(62)
marginal density "integrate out" all the y dimension.
Cumulative distribution
Px,y (, ) = x
=
xp(x, y)dxdy =
A similar argument holds for
y.
p(x, y)dyx =
xp(x)dx
(63)
(64)
Varian e denitions follow a simillar pattern.
Covarian e:
cov(x, y) = E[(x x
)(y y)]
Z Z
=
(x x)(y y)p(x, y)dxdy
an show
dependent.
|xy 1,
cov(x, y)
= orrelation
x y
and if
xy = 0 x, y
(66)
= x yxy
xy =
(65)
oe ient
are un orrelated.
19
(67)
(68)
xy = 1 x, y
are linearly
4 REVIEW OF PROBABILITY AND STATISTICS
4 Review of Probability and Statisti s

S ribe: Na hiappan Valliappan
4.1 Conditional Probability

Are events
and
related in the sense that knowledge of the o urren e of one alter the
probability of the other? If one of the events, say

from
(the sure set) to
B,
has o urred, it then shrinks the sample spa e
as illustrated in Figure 8.
A B
Figure 8: Venn diagram illustrating onditional probability.

Now, to nd the probability that event
overlapping region
this is expressed as
AB
o urred (given
B ),
we need to divide the area of the
in the Venn diagram by the redu ed sample spa e
Pr(A | B) =
B.
Mathemati ally,
Pr(A B)
Pr(B)
(69)
Similarly, for random variables we have
p(x | y) =
p(x, y)
p(y)
(70)
4.2 Independent Events and Independent Random Variables

If knowledge of the o urren e of
does not alter the probability of
A,
then
and
are
independent events. Mathemati ally,
Pr(A) = Pr(A | B) =
Pr(A B)
Pr(A, B)
=
Pr(A, B) = Pr(A) Pr(B)
Pr(B)
Pr(B)
(71)
4.2.1 Further impli ations of independen e

Pr(A | B) = Pr(A) Pr(B | A) = Pr(B)
20
(72)
4.3 Ve tor-Valued Random Variables
S
B
A B
A
Figure 9: Does this Venn diagram express inde-
Figure 10:
penden e between A and B?
den e between events A and B
Q:
Venn diagram expressing indepen-
Does the Venn diagram in gure 9 express independen e? If not, how do we express inde-
penden e between events in a Venn diagram?
A:
A and B sin e
A and B are indepenthe Venn diagram we have Pr(A) 6= 0 and Pr(B) 6= 0.
no overlap between events A and B in the Venn diagram
No! The Venn diagram in gure 9 does not express independen e between
it leads to a ontradi tion.
It an be explained as follows.
Sin e events
Pr(A B) = Pr(A) Pr(B). From

Pr(A B) 6= 0. However, there is
= Pr(A B) = 0. Therefore, we run into a ontradi tion and gure 9 does not represent any
independen e between A and B . A orre t way to express independen e between events A and B
dent,
Hen e,
is illustrated in gure 10.

Random variables are independent if the joint probability density fun tion (pdf )
fa tored into the produ t of its marginal densities
p(x)
and
p(y)
p(x, y)
p(x, y) = p(x)p(y)
Also, if
and
Cov(x, y) =
Z Z
(x x)(y y)p(x, y) dxdy =

Cov(x, y)
=0
x y
(x x)p(x) dx
and
denote
n-valued
(y y)p(y) dy = 0
(no oupling between x and y)
4.3 Ve tor-Valued Random Variables

x
(73)
are independent random variables, we have
= xy =
Let
an be
i.e.,
ve tor random variables.
x1
x2

x= .
..
xn
21

1
2

=.
..
n
(74)
(75)
4.4 Total Probability Theorem
The joint pdf des ribing the statisti s of the random ve tor
is dened as follows.
px () = px (1 , 2 , . . . , n )
Pr {(1 d1 < x1 < 1 ) (2 d2 < x2 < 2 ) (n dn < xn < n )}
d1 0,
d1 d2 . . . dn
, lim
(76)
d2 0,
...
dn 0
The expe ted value of the random ve tor
is denoted by
E[x]
(or equivalently,
dened as
E[x] =
...
) Rn
x
and is
xp(x) dx1 dx2 . . . dxn
(77)
Pxx Rnn and is dened as

Z
T
] = (x x)(x
T p(x)dx
Cov(x) = E[(x x)(x
x)
x)
The ovarian e matrix of
where
Pxx (i, j)
is denoted by
(78)
is given by
Pxx (i, j) =
The ovarian e matrix
...
Px x
(xi x
i )(xj x
j )p(x1 , x2 , . . . , xn ) dx1 dx2 . . . dxn
is symmetri i.e.,
T
Pxx = Pxx
.
Further,
there is no linear dependen e between any of the random elements of
Pxx 0
x.
and if
Pxx 0
then
4.4 Total Probability Theorem

If B1 , B2 . . . , Bn are mutually
{B1 B2 Bn } = S , then
ex lusive and exhaustive events i.e.,
Pr(A) =
n
X
{Bi Bj = }, i 6= j
Pr(A, Bi )
i=1
n
X
i=1
and
(79)
Pr(A | Bi ) Pr(Bi )
For random variables,
p(x) =
p(x, y)dy =
4.5 Bayes's Theorem
p(x | y)p(y)dy
(80)
B1 , B2 . . . , Bn be n mutually ex lusive and exhaustive events i.e., {Bi Bj = }, i 6= j

{B1 B2 Bn } = S , and let A be any event with Pr(A) > 0, then
Consider
and
Pr(Bi | A) =
Pr(A | Bi ) Pr(Bi )
Pr(Bi A)
= Pn
Pr(A)
j=1 Pr(A | Bj ) Pr(Bj )
22
(81)
4.6 Conditional Expe tation
Thus equation (81) allows us to reverse the onditioning in onditional probabilities. For densities,
p(y | x)p(x)
p(y)
p(x | y) =
(82)
Be ause of its utility in solving estimation problems, equation (82) is onsidered the
estimation .
workhorse for
Note that, the onditional density is a valid probability density fun tion and hen e
sums to 1.
p(x | y) dx =
p(y | x)p(x)
dx =
p(y)
p(y)
p(x, y)
dx =
=1
p(y)
p(y)
(83)
4.5.1 Bayesian Jargon

p(x)
is an
p(x | y)
apriori
is an
belief (or often referred to as the prior)
aposteriori
belief (or often referred to as the posterior)
Q: What if we have no prior belief ?

A: (A1) There are Bayesians and loset Bayesians!
onsidering a
diused
prior, say uniform over
R.
(A2) One an still apply Bayes postulate by
p(x) is entirely unknown, then
If
assume
p(x) = ,
a onstant independent of x (diused prior assumption).
p(x) =
It an be shown that
1
2
otherwise
|x|
divides out of the formula
(84)
p(y|x)p(x)
1
for an
p(y)
wide interval on either side
of 0.
4.6 Conditional Expe tation

Conditional expe tation of a random variable
E[x | y] =
In general,
E[x | y] 6= E[x] (unless x and y are

E[E[x | y]] = E[x].
given another random variable
xp(x | y) dx
E[x | y]p(x | y) dx
(85)
independent). Now, we prove the tower property of
Z +
is dened as
the expe tation operator
E[E[x | y]] =
xp(x | y)p(y) dydx

xp(x, y) dydx
= E[x] = x
.
23
(Marginalizing over y)
4.7 Gaussian Random Ve tors
4.7 Gaussian Random Ve tors

Given an
nn
Rn1 , we dene the

P (= P T ) and

T P 1 (x )
(x )
1
p(x) =
n
1 exp
2
(2) 2 |P | 2
symmetri matrix
4.7.1 Properties
1.
E[x] =
2.
T] = P
E[(x )(x
)
pdf of
as
(86)
4.8 Joint and Conditional Gaussian Random Variables1

Given two Gaussian random ve tors
sta king ve tors
and
z,
and
dene a new Gaussian random ve tor,
y,
by
z.
y=

x
z
dim(x, y, z)
Pyy
= (nx , ny , nz )

Pxx Pxz
= Cov(y) =
Pzx Pzz
(87)

nz
1
1
T Pyy
|Pzz | 2
(2) 2 exp 12 (y y)
(y y)
p(x, z)
p(y)
p(x | z) =
=
=

nx +nz
1
1
p(z)
p(z)
T Pzz
|Pyy | 2
(z z)
(2) 2 exp 21 (z z)
(88)
After mu h algebra (in luding blo k inverse formula et .), we an show that the pdf of the
onditional random variable
p(x | z)
(x | z)
is
1 T 12
|Pxx Pxz Pzz
Pxz |
1
1
1 T 1
1
Pxz Pzz
T )T (Pxx Pxz Pzz
Pxz Pzz
(z z)
Pxz ) (x x
(z (89)
z)))
exp( (x x
2
(2)
nx
2
Mean:
1
+ Pxz Pzz
E[x | z] = x
(z z)
This applies a orre tion term to
based on the value of
(90)
z.
Covarian e:
1 T
E[(x E[x | z])((x E[x | z]))T | z] = Pxx Pxz Pzz
Pxz
1 T
Pxx Pxz Pzz
Pxz 0. So,
with information about z .
Note that,
estimate
it redu es the ovarian e
1 We will revisit this topi later with regard to optimal estimation
24
Pxx
(91)
and hen e improves the
4.9 Expe ted Value of a Quadrati Form 4 REVIEW OF PROBABILITY AND STATISTICS
4.9 Expe ted Value of a Quadrati Form

Given a symmetri matrix
A(= AT )
and a random ve tor
with
E[x] = 0,
E[xT Ax] =
E[tr(xT Ax)]
=
=
E[tr(AxxT )]
tr(AE[xxT ])
(sin e
(sin e
xT Ax = tr(xT Ax))
tr(.) and E[.] are linear
tr(ACov(x))
(sin e
E[x] = 0
tr(APxx )
and
then we have
operators)
Cov(x) = xxT )
(92)
25
5 DETECTION/HYPOTHESIS TESTING BASICS
5 Dete tion/Hypothesis Testing Basi s

S ribe: Marlon Bright
5.1 Hypothesis Testing

Hypothesis Testing (HT) provides the theory behind our previously des ribed methods for onsisten y he king. HT is also impli it in multiple model estimation, in signal dete tion (e.g., GPS
signal a quisition, spoong dete tion,et .), et . Consider a parameter (or a ve tor of parameters)
Assume that
an take on only dis rete values:
{0 , 1 , , m1 }
Let
Hi
denote the hypothesis that the true value of
i .
is
Thus, we dene the following
m-ary
hypothesis:
H 0 : = 0
H 1 : = 1
.
.
.
H m : = m
Therefore, in the binary hypothesis ase we have, only
H 0 : = 0
H 1 : = 1
In this ase,
H0
is known as the
null hypothesis
and
H1
as the
alternative hypothesis.
We fo us
rst on binary HT. Let,
= P [ a ept H1 |H0
= P [ a ept H1 |H1
= P [ a ept H0 |H1
= 1 PD
PF
PD
PM
A de ision between
p(z|0 )
and
p(z|1 )
H0
and
H1
true
true
true
]
]
]
(false alarm)
(dete tion)
(miss)
is based on a set of random variables (observations)
(probability density fun tion of
given
and
z,
where
are true, respe tively) are
known.
5.2 Neyman-Pearson Lemma

The Neyman-Pearson Lemma (NP) tells us how to hoose optimally between
by optimal we mean: given a xed maximum
Let
>0
PF ,
maximize
PD .
be a threshold. NP states that the optimal test is of the form:
hoose
hoose
H0
H1
if
if
p(z|1 ) < p(z|0 )

p(z|1 ) > p(z|0 )
hoose randomly on equality
26
H0
and
H1 , where
5.3 Example:
NP, also, states that su h a test
exists
and is unique for every
0 PF 1
The Neyman Pearson Lemma goes on to give us the form of the test. It boils down to omparing
the likelihood ratio to a threshold:
(z) =
p(z|1 )
p(z|0 )
(z) =
p(z|H1 )
p(z|H0 )
whi h is sometimes written:
Apply the NP test as follows:

1. Assign
H0
2. Sele t
PF .
and
H1 .
By onvention, we often set
PF
willing to a ept a 5% han e that we reje t
= 0.05 or 0.01.
H0
PF
= 0.05 means that we are
(missile in oming) when true ("no missile").
3. Simplify the threshold test if possible:
H1
(z)
H0
(z)
4. Determine (or approximate) the distribution of

5. Cal ulate the proper value of
based on
PF
H1
H0
(z).
and on the distribution of
5.3 Example:
Assume that the re eived signal in a radar dete tion problem is
z = + ,
N (0, 2 )
and
H0 : = 0
H 1 : = 1
>0
Then:
p(z|0 ) = N (z; 0, 2 )
p(z|1 ) = N (z; 1 , 2 )
Form the log-likelihood ratio:
27
(z).
5.3 Example:

p(z|1 )
(z 1 )2 z 2
L(z) = log(z) = log
=
p(z|0 )
2 2

2z1 12
=
2 2
At this point the NP tests amounts to
H1
(z)
= log()
H0
We further simplify by noting that the NP test an be boiled down to:
H1
= 0
with
H0
Now set
to satisfy:
PF = P [z > 0 |0 ] =
2 2 + 12
21
p(z|0 ) dz
p(z|0 )
p(z|1 )
1
0
PM
PF
Figure 11: Binary Hypothesis PDFs. Shaded regions illustrate probability density of
MATLAB
ommand syntax for the above is:
PF = 1 lambda_0
PD = 1 PM = 1 -
norm df(lambda_0, theta_0, sigma);

= norminv(1-PF, theta_0, sigma);
norm df(lambda_0, theta_1, sigma);
PD;
28
PF
and
PM .
5.4 Remarks
5.4 Remarks
1. The hoi e of threshold depends only on
2.
PF
PF
p( (z)|0 ).
and
PF
applies to ea h measurement taken. The ombined
after repeated measurements may
be large.
3. Note that the NP approa h does not take into a ount any
of
4.
H0
Q:
and
a priori judgement on the likelihood
H1 .
PF ?
What rationale an we give for our hoi e of
The NP formulation doesn't give any
guidan e here.
A: Remarks 2 and
ost (AKA risk).
3 point us to a Bayesian framework where we onsider the
Pj = prior
Cjk = ost
probability of
of de iding
R = C00 P0 P [de iding H0 |H0

+C11 P1 P [de iding H1 |H1
is true]
is true]
Hj
expe eted
Hj
when
Hk
is true
+ C10 P0 P [de iding H1 |H0

+ C01 P0 P [de iding H0 |H1
is true]
is true]
Or in a synonymous form:
R = C00 P0 P [de iding H0 |H0
is true]
+ C10 P0 PF + C11 P1 PD + C01 P0 PM
Goal: minimize R
This formulation leads dire tly to a likelihood ratio test and it hooses the value of
for
you:
p(z|H1 )
p(z|H0 )
H1
H0
Q: If the Bayesian approa h automati ally hooses the "right" value of , then why take the
NP approa h, whi h seems to set
a ording to some seemingly arbitrary value of
A: It's often not possible or pra ti al to nd Cjk
29
and
Pj .
PF ?
6 OPTIMUM DETECTION STRUCTURES
6 Optimum Dete tion Stru tures

S ribe: Gezheng Wen
6.1 Dete tor Performan e

A dete tor designed a ording to the NP formulation is guaranteed to have the max
given
PF .
Even so,
The urve is alled
PD might
not be a eptable. It is often useful to plot
Re eiver Operating hara teristi s.
PD
PD
for a
as a fun tion of
PF .
Receiver Operating characteristics

1
0.9
0.8
0.7
PD
0.6
0.5
0.4
0.3
0.2
0.1
0
0.1
0.2
0.3
0.4
0.5
PF
0.6
0.7
0.8
0.9
Figure 12: Re eiver Operating hara teristi s

When
, it is easy to
(0, 0) and (1, 1).
dis riminate. When
p(z|0 ) = p(z|1 ),
the ROC is a straight line
going through
6.2 Composite Hypothesis Testing

In a simple hypothesis testing (HT), the parameter
orresponding to a xed value:
[0 , 1 , . . . , M1 ].
In a omposite HT,
an take on a range of values (e.g.,
[0, 2]).
In many ases of pra ti e
interest,the nal dete tion test again boils down to a likelihood ration test.
Example:
H0 :
z1 = w1
z2 = w2
H1 :
z1 = A cos + w1
z2 = A sin + w2
A > 0 is some positive onstant,

N (0, 2 ) and w1 , w2 , are independent
Our parameter is this ase is
where
is some random variable and

1
2
30
U[0, 2], w1, w2
H0
H1
Figure 13: (a)signal absent,(b)signal present
1 {0, A},2 [0, 2].

parameter spa e on whi h
where
The
lives is thus de omposed into two disjoint regions:
Figure 14: Two Regions
0 = { | 1 = 0}
1 = { | 1 = A}
HT boils down to:
(z) =
p(z | 1 )
p(z | 0 )
H1
H0
We an show that:
1
1
exp{
(z )T (z )}
2 2
2 2

1 cos 2
= (1 , 2 ) =
1 sin 2
p(z | ) =
31
1
1
exp{ 2 z T z}
2 2
2
Z2
p(z | 1 ) = p(z | 1 = A, 2 = )p ()d
p(z | 0 ) =
1
1
exp{ 2 (z (A, ))T (z (A, ))}d
4 2 2
2
This te hnique of averaging out is generalizable.

The key idea is to re over
From the likelihood ratio:
p(z | )
by averaging over
A
exp( 2
p(z | 1 )
2)
(z) =
=
p(z | 0 )
2
Z2
A
exp{ (z1 cos() + z2 sin()}d
2
Simplify further by going to the polar oordinates:
r=
q
z12 + z22
= arctan(
z2
)
z1
Then
z1 = r cos()
z2 = r sin()
Plug in and simplify

2
(z) =
A
exp( 2
2)
2
Z2
exp{
Note:I0 ( Ar
2 ) =
1st kind.
1
2
2
R
0
exp{ Ar
2 cos( )}d ,
where
I0
Ar
cos( )}d
2
is the 0th order modied Rossel fun tion of
Let
(z)
Sin e
I0 (x)
= I0 (
is monotoni ally in reasing in
(z)
x,
Ar
)
2
H0
Thus,the optimal test ompares
Under
Under
H0 ,
H1 ,
H0
the test is equivalent to:
H1
H1
2 I1
0 ( )
=
A
with a threshold.
is distributed as a Rayleigh distribution.

is distributed as a Ri e distribution.
32
Q: Why do we so often nd ourselves performing a orrelation as part of a dete tion test?
A: Here we go:
General Gaussian Problem:
H0 : z N (0 , P0 )
H1 : z N (1 , P1 )
Note:
j
Measurement in
may be orrelated (Pj may no be diagonal) and the values of elements of
may not be all the same.

We an show in
HW
(z) =
that HT redu es to:
1
1
(z 0 )T P01 (z 0 ) (z 1 )( T )P11 (z 1 )
2
2
H1
H0
In the general ase, nding an analyti al expression for the distribution of
(z)
But some spe ial ases are tra table.
Example:
Suppose
P0 = P1 = P,
then the test redu es to:
H1
(z) = P1 z
H0
where
= 0 .
0 = 0
Further suppose
and
P = 2 I.
(z) = T1 z
H1
H0
n
P
zi
()
> ( )
H1
<
H0
1i
Figure 15: Correlator blo k diagram
33
is not easy.
7 ESTIMATION BASICS
7 Estimation Basi s
S ribe: Zaher M. Kassas
7.1 The Problem of Parameter Estimation

Given a data set
Z k , {z(1), z(2), . . . , z(k)},
we wish to estimate some unknown parameter
x.
Our estimate will be a fun tion of the data set and possibly time, i.e.
= x(k,
Z k ) = f (k, Z k ).
x
f (k, Z k ),
There are several ways to designing the fun tion
i.e. ways to dening an estimator, whi h
will be explored next.
7.2 Maximum Likelihood Estimators

x, then we simply regard it as an unknown
p(Z |x) is known. Dene the likelihood fun tion
If we have no a priori information about the parameter

k
onstant ve tor. Assume that the onditional pdf

as
Z k (x) , p(Z k |x).

Then, the maximum likelihood (ML) estimator is one that maximizes su h likelihood fun tion,
namely
ML = arg max Z k (x)

x
x
By the rst order ne essary ondition (FONC) of optimality, we set the derivative of the likelihood
fun tion with respe t to
to zero, namely
Z k (x)
0.
x
This impli itly denes
ML
x
as the solution of
equations in
unknowns.
7.3 Maximum A Posteriori Estimators

Assume that we have some prior information about the parameter
the pdf
p(x),
in that it is a sample from

p(Z k |x). Then, the
alled the prior pdf. Assume that we know the onditional pdf
posterior distribution an be written by Bayes' rule as
p(x|Z k ) = R
p(Z k |x)p(x)
.
p(Z k |x)p(x)dx
(93)
The maximum a posteriori (MAP) estimator is one that maximizes the posterior distribution,
namely
MAP = arg max p(x|Z k )

x
x
It is worth noting that the denominator in (93) is onstant with respe t to the maximization
parameter
x;
therefore, it is immaterial in nding
MAP .
x
The ML and MAP estimators are the same if the prior pdf
p(x) = lim
, |x|
0, |x| >
34
1
2 ;
1
2 .
p(x)
is diuse, i.e.
7.4 Example 1: ML Estimator for a Linear First-Order Dynami al System

7 ESTIMATION BASICS
7.4 Example 1: ML Estimator for a Linear First-Order Dynami al System

Consider the linear dynami al system, hara terized by the dierential equation
y = ay.
The solution to this system an be readily found to be
y(t) = y(0)eat ,
where
y(0)
is the initial ondition. The obje tive is to estimate the parameter
x = y(0)
given the
dis rete measurements
z(j) = y(jt) + w(j),

where
is the sampling interval,
j = 1, 2, . . . , k,
w(j) N (0, 2 ),
and
E [w(i)w(j)] = 2 ij ,
with
ij
being the
yields the
Krone ker delta fun tion. The measurements an be re-written as
z(j) = j x + w(j),
where
j , eatj
and
j = 1, 2, . . . , k,
x , y(0).
By independen e of the measurements, the likelihood fun tion an be written as
Z k (x) , p(Z k |x) = p(z(1)|x)p(z(2)|x) p(z(k)|x).

The pdf
p(z(j)|x)
is nothing but the pdf
p(w),
shifted by
z(j) j x,
i.e.
p(z(j)|x) = N (z(j) j x, 2 ).
Therefore, we an write
p(Z k |x) = pw (z(1) 1 x)pw (z(2) 2 x) pw (z(k) k x),

where
pw (w) = N (w; 0, 2 ).
Therefore the likelihood fun tion be omes
Z k (x) =
Dierentiating
Z k (x)
k
1 X
1
exp
2 2
(2)k/2 k
j=1
[z(j) j x]
as per the FONC yields
k
X
1
[z(j) j x] (j ) 0
Z k (x). 2
j=1
Re ognizing that
ML estimate as
Z k (x) 6= 0
implies that the se ond term must be zero. Solving for
xML =
Pk
j=1
Pk
35
z(j)j
j=1
j2
7.5 Example 2: MAP Estimator for a Linear First-Order Dynami al System

7 ESTIMATION BASICS
7.5 Example 2: MAP Estimator for a Linear First-Order Dynami al

System
Consider the system in Example 1 with the additional information that x is a sample from a
x N (x , x2 ). The posterior pdf in
this ase is given by
Gaussian distribution with known mean and varian e, namely
k
X
1
1
2
[z(j) j x] 2 (x x )2 ,
exp 2
p(x|Z k ) = c1
2 j=1
2x
(2)(k+1)/2 k x
where
c1 = [p(Z k )]1
is onstant that is not a fun tion of x, hen e it is immaterial for purposes

p(x|Z k ) as per the FONC yields
of maximization. Dierentiating
k
X
1
1
p(x|Z k ). 2
[z(j) j x] j 2 (x x ) 0.
j=1
x
Re ognizing that
MAP estimate as
p(x|Z k ) 6= 0
implies that the se ond term must be zero. Solving for
x
MAP =
It is worth noting that as
x 0,
1
2
Pk
z(j)j +
j=1
1
2
Pk
j=1
j2 +
1
2 x
x
yields the
1
2
x
the ML and MAP estimators oin ide.
Note that we an re-write the posterior pdf as
where
2
new
is dened as

1
1
2
p(x|Z k ) = p
,
(x
)
exp
MAP
2
2
2new
2new
2
new
=
1
2
Pk
j=1
j2 +
1
2
x
i.e. the posterior pdf has the form of a Gaussian pdf with mean
x
MAP
and varian e
7.6 Least-Squares Estimators
2
new
.
Least-squares (LS) estimators aim at minimizing the ost fun tion, dened by the sum of the
squares of the error between the data and the model, denoted as
namely
LS = arg min C(k, Z k ) , ||||22 .

x
x
Hen e, LS estimators aim at nding
that minimize the Eu lidean norm of su h error ve tor.
7.7 Example 3: LS Estimator for a Linear First-Order Dynami al System

For the linear rst-order dynami al system in Example 1, the LS estimator is dened a ording
to
xLS
k
1 X
2
= arg min C(k, Z ) , 2
[z(j) j (x)] .
x
2 j=1
k
36
7.8 Minimum Mean-Squared Error Estimators

Dierentiating
C(k, Z k )
7 ESTIMATION BASICS
as per the FONC yields
k
1 X
[z(j) j x] (j ) 0.
2
j=1
Solving for
yields the LS estimator as
xLS =
Pk
j=1
Pk
z(j)j
j=1
j2
Note that the resulting LS estimator oin ided with the ML estimator. This stems from the fa t
that for Gaussian random variables, the ML estimation orresponds to a Eu lidean distan e metri .
7.8 Minimum Mean-Squared Error Estimators

Assume that
p(x)
and
p(Z k |x)
are known, whi h allow us to determine
p(x|Z k ).
The mini-
mum mean-squared error (MMSE) estimator aims at minimizing the ost fun tion dened by the
onditional mean of the squared estimation error, i.e.
MMSE = arg min C(

x
x, Z ) , E (
x x) |Z
x
Taking the derivative of
C(
x, Z k )
with respe t to
(
x x)2 p(x|Z k )dx.
as per the FONC and solving for
yields the
MMSE estimator to be the onditional mean, namely

MMSE = E x|Z k .
x
Q: Under what onditions is x MMSE = x MAP ?

A: Whenever the peak (mode) of p(x|Z k ) oin ides with its mean.
7.9 Summary
LS
ML
MAP
For Gaussian random variables, if
ML for Gaussian measurement noise.
MAP if a diuse prior is assumed.
MMSE if the mean and the mode of
p(x)
p(x|Z k )
oin ide.
is diuse, then: ML
37
MAP
LS
MMSE.
8 LINEAR ESTIMATION FOR STATIC SYSTEMS
8 Linear estimation for stati systems

S ribe: Joshua Yuan
8.1 MAP estimator for Gaussian problems

z = Hx + w, suppose we wish to estimate x Rnx x1 Assume that
w N (0, R) is nz 1 noise ve tor, and H is a known nz nx matrix. We also have a priori knowledge
T
Pxx ) The noise is also un orrelated, so the ovarian e matrix E[(x x)W
that x N (x,
]=0
Our approa h to this problem is to develop a joint PDF for x and z , then use our understanding
MAP su h that
of onditional Gaussian distributions to determine p(x | z). Thereby, we an nd x
Given a system model
MAP = arg max p(x | z)

x
(94)
In order to nd
p(x | z),
z.
we rst need
p(x, z),
so let's dene some things that will help us get
there. First, let's nd
z = E[z]
= E[Hz + w]
(95)
(96)
= HE[x] + E[w]
+0
= Hx
Next, we need ovarian e matri es
Pxz , Pzx ,
and
(97)
(98)
Pzz .
T]
Pxz = E[(x x)(z
z)
(99)
T]
= E[(x x)(Hx
+ w H x)
T
(100)
H + (x x)w
= E[(x x)(x
x)
]
(101)
H ] + E[(x x)w
= E[(x x)(x
x)
]
= Pxx H
(102)
(103)
(104)
Be ause of the symmetry of ovarian e matri es, we an also say that
T
Pzx = Pxz
= HPxx
For
(105)
Pzz ,
T]
Pzz = E[(z z)(z
z)
(106)
+ w)(H(x x)
+ w) ]
= E[(H(x x)
T
= HPxx H + R
38
(107)
(108)
8.1 MAP estimator for Gaussian problems8 LINEAR ESTIMATION FOR STATIC SYSTEMS
Now we an dene
p(x, z),
p(x | z) =
from whi h we an nd
p(x | z).
p(x, z)
p(z)
(109)
= c(z) exp
1
2
T Pxx
(x x)
T Pzx
(z z)
Pxz
Pzz

(x x)
(z z)
(110)
Re all from the linear algebra review that
(111)
1 T 1
Vxx = (Pxx Pxz Pzz
Pxz )
(112)
Pxx
Pzx
Pxz
Pzz
1
Vxx
Vzx
Vxz
Vzz
where
Vxz =
Vzz =
Now we an nd
Note that
MAP
x
1
Vxx Pxz Pzz
1
(Pzz Pxz Pxx
Pxz )1
by maximizing
p(x | z),
is a s alar value. Set
0=
Vxz
Vzz

(x x)
(z z)
x
(115)
(116)
C(x | z) = 0.
C
x
C
=
Now we solve for
(114)
or equivalently, we an minimize over
log(p(x | z) = const + C(x | z)

Vxx
1
T
T
(x x)
(z z)
C(x | z) =
Vzx
2
C(x | z)
(113)
(117)
x1
.
.
.
C
xnk
(118)
+ Vxz (z z)
= Vxx (x x)
(119)
1
MAP = x
Vxx
x
Vxz (z z)
(120)
MAP .
x
+
=x
1
Pxz Pzz
(z
z)
(121)
Using our original formulas for the ovarian e matri es, we get
MAP = x
+ Pxx H T (HPxx H T + R)1 (z H x)
x
39
(122)
8.2 Bat h Least Squares Estimator
8.1.1 Analysis of MAP Estimate

, E[x]
,
x
Let's nd
and
Pxx|z .
=xx
(123)
Pxx H (HPxx H + R)
= (x x)
T
= E[x x]
Pxx H (HPxx H + R)
E[x]
=0
Be ause
E[x]
= 0,
MAP
x
(z H x)
E[z H x]
(127)
(128)
(129)
] Pxz Pzz
T]
= E[(x x)(x
x)
E[(z z)(x
x)
1 T
T
T ]Pzz
E[(x x)(z
z)
Pxz + Pxz Pzz
E[(z z)(z
Note that
Pxx|z < Pxx ,
1 T
Pxz Pzz
Pxz
T
Pxx H (HPxx H T
and
1 T
T ]Pzz
z)
Pxz
(130)
(131)
+ R)
HPxx
(132)
implying we are getting a better result. However, if the elements of
be ome too large (the noise is too powerful),
assumes
(126)
]
= E[(x x)(x
x)
= Pxx
(125)
is an unbiased estimator.
x
T ]
Pxx|z = E[x
= Pxx
(124)
Pxx|z = Pxx .
Note also that the above analysis
w are Gaussian distributed.

x or w is not Gaussian? What
an we say about the form of the estimator then?

Q: What if
A: See Bar-Shalom, se tion 3.3; Even if they're not Gaussian, if E[w] = 0 and we know x , Pxx ,
and
R,
then
MAP
x
found from assuming Gaussian distribution is the optimal linear estimator.
However, there may still be a better non-linear estimator, i.e.
1
MAP = x
+ Pxz Pzz
x
(z z)
= (I
an estimator of the form
+ Dz ,
Cx
Pxz Pzz
H)x
a linaer ombination of
(133)
1
Pxz Pzz
z
and
(134)
is the best hoi e for C and D.

Given measurements
z(i) = H(i)x + w(i)
(with no a priori
distribution), suppose that
w(i) N (0, R(i))

E[w(i)w(j)] = 0, if i 6= j
T
R(i) = R(i) > 0
And we dene a (s alar) ost fun tion
J(k) =
k
X
i=1
J(k)
(135)
(136)
(137)
as
[z(i) H(i)x]T R(i)1 [z(i) H(i)x]
40
(138)
We want to use this ost fun tion to deemphasize any noisy measurements.
Note that now we have a time index, and no a priori knowledge.
1
p(z k | x) = C exp 2 J(k)
(139)
J(k) = 2 ln[p(z | x] + constant
So minimizing
J(k)
with respe t to
is equivalent to maximizing
estimator).
(140)
(x) = p(z k | x)
(ML
We need a hange of notation to in orporate data and parameters for ea h new time step.
z(1)
z(2)
Z ( k) = . Rnz k1
..
z(k)
H(1)
H(2)
H k = . Rnz knx
..
H(k)
w(1)
w(2)
k = . Rnz k1
..
w(k)
R(1)
0
.
.
0
R(2) 0
.
Rnz knz k
Rk =
.
.
..
..
0
0
0
0 R(k)
(141)
(142)
(143)
(144)
(145)
(R is a blo k diagonal matrix.)

Then we an rewrite
J(k)
as
J(k) = [Z k H k x]T (Rk )1 [Z k H k x]

The summation notation is now gone, but
J(k)
(146)
is the same as before, and still a s alar value.
Now to minimize J(k),
0=
J
x
J
x1
J
x2
.
.
.
J
xnx
(147)
(148)
= 2(H k )(Rk )1 (Z k H k x)
41
(149)
8.3 Square-Root-Based LS Solutions

This gives us
nx
number of equations, and
nx
unknowns. Now we solve for
k.
x
k = [(H k )T (Rk )1 H k ]1 (H k )T (Rk )1 Z k

x
T
= (H R
By dropping the
H)
H R
(150)
(151)
supers ripts, we get one of the deathbed identities, the normal equations.
8.2.1 Properties of the Least Squares Estimator

=xx
(152)
= x (H T R1 H)1 H T R1 z
T
= x (H R
= [I (H R
H)
H R
H)
H R
=0
(153)
(Hx + w)
(154)
H]x (H R
Using this, we an nd expe tation and varian e for
H)
H R
(155)
(156)
= (H T R1 H)1 H T R1 E[w]
E[x]
=0
(157)
(158)
As you an see here, the least squares estimator is unbiased.
Px x = (H T R1 H)1 H T R1 E[wT ]R1 H(H T R1 H)1

T
= (H R
This follows be ause the
1
with a neighboring R
.
H)
(159)
(160)
H T R1 and R1 an els with (H T R1 H)1 , and E[wT ] = R, an eling

Refer to Bierman for more detail on this topi .
H T R1 H must be invertible, whi h amounts to the parameter x being

T 1
observable. In some ases, x is only lo ally observable from the data, meaning that H R
H is
T 1
almost singular. In these ases, it's a bad idea to dire tly invert H R H ; small numeri al errors
.
an lead to large errors in x
Re all for a solution to exist,
The solution to this problem is to use square root algorithms.
They are more numeri ally ro-
bust, and also leads to a more elegant and intuitive interpretation of least squares.
Let
R), R = RT , R > 0.
z H x + w , w N (x,
MATLAB,
Ra = chol(R);
42
Use the Cholesky fa torization,
RaT Ra = R.
In
Then let
z = (Ra1 )T z
(161)
(RaT )1 z
(RaT z
RaT H
RaT w
(162)
=
=
H=
w=
(163)
(164)
(165)
Now we have a transformed measurement model,
z = Hx + w
E[w] =
E[ww ] =
=
=
(166)
E[RaT w ] = 0
E[RaT w wT Ra1 ]
RaT E[w wT ]Ra1
RaT RaT Ra Ra1
=I
Be ause
E[ww ] = I , w N (0, I).
(167)
(168)
(169)
(170)
(171)
Now our noise is standard uniform, and ni er to work with.
The ost fun tion of our problem is
J(k) = [z Hx]T [z Hx]

= kHx zk
(172)
(173)
Re all multiplying a ve tor by an orthonormal matrix doesn't hange its magnitude.
kvk = v T v
T
(174)
kQvk = v Q Qv
T
=v v
= kvk
(175)
(176)
(177)
Q: Can we multiply Hxz by some orthonormal matrix and leverly simplify the ost fun tion?
A: yes we an; use QR fa torization
R
= H,
Q
so let
T
T =Q
to get
T (Hx z)k2
J(k) = kQ
zk
2
= kRx
T
z = Q z
We an break this up further, like this:
43
(178)
(179)

2
R
o
z

J(k) =
x o

0
o x zo k + kk2
= kR
(180)
(181)
How do we minimize this? We solve rst for x, made possible be ause if rank(H) = nx (whi h
o Rnx nx and is invertible. We also have the solution
R
is needed for observability), then
d
kRo x zo k2
dx
o kR
o x zk
= 2R
1
o zo
=R
0=
LS
x
(182)
(183)
(184)
The solution was obtained without squaring anything. Unfortunately, the omponent norm of
is the irredu ible part of the ost. In other words,
Re all our expression for
Px x

J(k)
LS
x
= kk2
(185)
from before
Px x = (H T R1 H )1
(186)
(187)
(H Ra1 RaT H )1
T
1
= (H H)

T 1
T
T
Q
Ro
= R
Q
0
o
0
T
1
o R
o)
= (R
=
Remember that
kk2
is orthonormal, so
(188)
(189)
(190)
1 R
T )
(R
o
o
T Q
= I.
Q
(191)
Additionally, we know that
o
R
an be inverted
without problems at this point.

The matrix
in
oT R
o
HT H = R
is alled the
information matrix.
produ es a large amount of information about
x,
A large
leading to a small
HT H
indi ates data
Px x|z
.
8.3.1 Re ursive Least Squares Estimator

k+1
z k ), P (k, z k ) = Px x|z
x(k,
= H(k +
k and we get a new pie e of data, z
k+1
+ 1, z
1)x + w(k + 1). Ideally, we would like to nd x(k
) without starting from s rat h at ea h
Suppose we have
time step.
44
So let us set this up using sta ked ve tors and matri es:
k+1
H k+1
Rk+1
wk+1

zk
=
z(k + 1)
k
H
=
H k+1
k

R
0
=
0 R(k + 1)

wk
=
w(k + 1)
(192)
(193)
(194)
(195)
We an show that
J(k + 1) = J(k) + [z(k + 1) H(k + 1)x]T R1 (k + 1)[z(k + 1) H(k + 1)x]

J(k)
The key to the re ursion is to rewrite
as
z k )]T P 1 (k, z k )[x x(k,

z k )]
J(k) = [x x(k,
k
k 1
(R )
+ [z H x]
The term after the
(197)
[z H x]
(198)
sign is the irredu ible omponent that doesn't depend on
x.
Therefore,
z k )]T P 1 (k, z k )[x x(k,

z k )]
J(k + 1) = [x x(k,
T
+ [z(k + 1) H(k + 1)x] R
(196)
(199)
(k + 1)[z(k + 1) H(k + 1)x] + irreducibles
J
Then to minimize the ost fun tion, we set x
=0
and solve for
(200)
+ 1, z(k + 1),
x(k
J
x
z k )] zH T (k + 1)R1 (k + 1)[z(k + 1) H(k + 1)x]
= 2P 1 (k, z k )[x x(k,
0=
(201)
(202)
Solving this gives
+ 1, z(k + 1)) = [P 1 (k, z k ) + H T (k + 1)R1 (k + 1)H(k + 1)]1

x(k
z k ) + H T (k + 1)R1 (k + 1)z(k + 1)]
[P 1 (k, z k )x(k,
(203)
(204)
With some manipulation, we an get a ommon form we see in literature,
+ 1, z k+1 ) = x(k,
z k ) + W (k + 1)[z(k + 1) H(k + 1)x]
x(k
1
k
T
1
W (k + 1) = [P (k, z ) + H (k + 1)R (k + 1)H(k + 1)]1 H T (k + 1)R1 (k + 1)
This form of
is a feedba k/predi tion/ orre tion form with a gain matrix
45
W.
(205)
(206)
8.4 Example: Maximum Likelihood Estimate

8.3.2 Analysis of Re ursive LS Algorithm
We an analyze
+ 1, z k+1 )
x(k
like this,
+ 1, z k+1 ) = x x(k
+ 1, z k+1 )
x(k
(207)
z ) W (k + 1)w(k + 1)
= [I W (k + 1)H(k + 1)]x(k,
Let's assume that
z k )wT (k + 1)] = 0.
E[x(k,
(208)
Then,
+ 1, z k+1 )x
T (k + 1, z k+1 )]
P (k + 1, z k+1 ) = E[x(k
(209)
z )x
(k, z )][I W (k + 1)H(k + 1)]
= [I W (k + 1)H(k + 1)]E[x(k,
T
+ W (k + 1)R(k + 1)W (k + 1)
We an nd alternate formulas for
(210)
(211)
P (k + 1, z k+1 ) and W (k + 1) an be
derived using the matrix
inversion lemma.
P (k + 1, z k+1 ) = P 1 (k, z k ) + [H T (k + 1)R1 (k + 1)H(k + 1)]1

W (k + 1) = P (k + 1, z k+1 )H T (k + 1)R1 (k + 1)
P (k + 1, z k+1 ) = [I W (k + 1)H(k + 1)]P (k, z k )

W (k + 1) = P (k, z k )H T (k + 1)[H(k + 1)P (k, z k )H k (k + 1) + R(k + 1)]1
(212)
(213)
(214)
(215)
8.4 Example: Maximum Likelihood Estimate

Re all a previous example,
zj = j x + w,
oming from
y = ay
(216)
zj = y(jt) + w(j)
a(jt)
j = exp
(217)
(218)
(219)
The ML estimate is
x
(k, z ) =
Pk
j=1
Pk
j z(j)
j=1
j2
2
P (k, z k ) = LS
= Pk
(220)
(221)
2
j=1 j
, w(j) N (0, 2 )
(222)
Now, in the new notation we ame up with above,
H(k + 1) = k+1
R(k + 1) =
46
(223)
(224)
8.5 Re ursive Approa h Using Square-Root8 LSLINEAR

Method ESTIMATION FOR STATIC SYSTEMS
Then it follows that
W (k + 1) =
We absorbed the
+ 1, z k+1 ).
x(k
+ 1, z
x(k
2
k+1
k+1
Pk
j=1
j2
k+1
k+1
k+1
=
Pk
2
k+1 + j=1 j2
k+1
= Pk+1
2
j=1 j

2
Pk
k+1 + 2
2
j=1
(225)
(226)
(227)
term in the denominator into the summation. Using this, we an get a new
Pk j z(j)
h
Pk j z(j) i
k+1
j=1
j=1
)=
+ Pk+1
z(k + 1) k+1
Pk
Pk
2
2
2
j=1 j
j+1 j
j=1 j
Pk+1
j=1 j z(j)
= Pk+1
2
j=1 j
(228)
(229)
8.5 Re ursive Approa h Using Square-Root LS Method

8.5.1 Review of square-root LS method
Let's review the square-root LS method rst.
z = H x + w , w N (0, R), the data equation,
Given
1. Change of oordinates to turn data equation into a normalized form.
z = Hx + w, w N (0, I)
(230)
Do a Cholesky fa torization of R.
2. Set up the ost fun tion
J(x) = kHx zk2

Additionally, note that we have shown the
(231)
that minimizes a ost fun tion based on the
normalized data equation also minimizes a ost fun tion based on the original data equation.
Our solution
LS
x
will be the value of
that minimizes
J(x).
3. Transform the problem again
J(x) = kHx zk2
= kT (Hx z)k
(232)
(233)
LS
This works provided T is orthogonal, so we'll hoose a spe ial T that makes solving for x
R
= H from QR fa torization, so Q
is orthogonal, and R
is upper
T , and Q
easier. Let T = Q
47

triangular. Then,
T (Q
Rx
zk2
J(x) = kQ
Q
T zk2
= kRx

2
R
o
z

=
x o

0
o x zo k2 + kk2
= kR
4. Minimize
J(x)
Assuming that
o
R
(234)
(235)
(236)
(237)
is invertible,
o1 zo
LS = R
x
LS ) = kk
J(x
kk2 is
o
Note: if R
The
(238)
(239)
irredu ible, the ost due to data that did not quite t.
is not invertible, then the problem was not quite observable, and the original
matrix had some linearly dependent olumns.

5. Analysis of the Solution
oT R
o )1
Px x|z
= (R
1 R
T
=R
o
(240)
(241)
We get the result by taking the inversion inside to avoid al ulating square inversion. We an
solve for the information matrix,
T R
HT H = R
o o
(242)
Remember the information matrix is larger in the positive denite sense when you have lots
of information. This is equivalent to the Fisher information matrix for an e ient estimator.
this ase, the information matrix is equal to the Fisher information matrix.
8.5.2 Re ursive Square-Root LS
So for
Let's onsider the re ursive square root approa h we have.
We are assuming that we have
already done normalization on our data equation.
J(k + 1) = kH k+1 x z k+1 k2
(243)
We sta k in oming data like before, normalizing measurements as they ome in, assuming independent measurements.

2

Hk
zk

x
J(k + 1) =
H(k + 1)
z(k + 1)
= kH k x z k k2 + kH(k + 1)x z(k + 1)k2

o (k)x zo k2 + kH(k + 1)x z(k + 1)k2
= k(k)k2 + kR

2
R
o (k)
zo (k)

x
= k(k)k2 +
H(k + 1)
z(k + 1)
48
(244)
(245)
(246)
(247)
(248)

o (k)x zo k2 and kH(k + 1)x z(k + 1)k2 are so similar,
kR
minimize for the same x.
With our sta ked x oe ient ve tor, we an do QR fa torization,

o (k)
R
QR =
H(k + 1)
Be ause
And if we transform via
we an sta k them while we
(249)
T ,
T =Q

2
R
o (k + 1)
zo (k + 1)

x
J(k + 1) = k(k)k +
(k + 1)
0
o (k + 1)x zo (k + 1)k2
= k(k)k2 + k(k + 1)k2 + kR
2
Ignoring the irredu ible stu, we an now nd
+ 1, z k+1 ),
x(k
and the varian e
o1 (k + 1)zo (k + 1)
+ 1, z k+1 ) = R
x(k
P (k + 1, z k+1 ) = [(H k+1 )T H k+1 ]1
=
=
1
R
o
and
T
R
o
oT (k + 1)R
o (k +
[R
1
(k + 1)R
T (k
R
o
o
are square root information matri es.
49
1)]
(250)
(251)
P (k + 1, z k+1 ).
(252)
(253)
+ 1)
(254)
(255)
9 NONLINEAR LEAST SQUARE ESTIMATION
9 Nonlinear Least Square Estimation

S ribe: Chao Jia
9.1 Basi s of nonlinear least square estimation

Problem model:
z = h(x + w)
w N (0, R), w Rnz 1
fun tion of x. We an write
where
and
x Rnx 1 .
Normally
(256)
nz > nx . h(x)
is an
nx -by-1
Problem statement:
nd
h1 (x)
h2 (x)
h(x) = . .
..
hnz (x)
ve tor
(257)
to minimize the obje tive fun tion
JN LW (x
(nonlinear and
weighted).
JN LW (x) = [z h(x)]T R1 [z h(x)].
(258)
, our nonlinear LS estimate.

x that minimizes JN LW (x), then this be omes x
h(x) = Hx, also note that p(z|x) = C exp [ 21 JN LW (x)] (be ause
the noise is additive). Minimizing JN LW (x) is equivalent to maximizing the likelihood fun tion.
is also the ML estimate. It does not matter that h(x) is nonlinear. Hense, JN LW (x) has a
So x
If we an nd a unique
Note that in the linear ase
rigorous statisti al meaning.
Properties of JN LW (x):
1.
JN LW (x) 0
2.
JN LW (x) = 0 h(x) = z
(assumes
R > 0)
Use Cholesky fa torization to simplify the form of JN LW (x):

T
and wa = Ra w N (0, I). Then we have
ha (x) = RaT h(x),
RaT Ra = R.
JN LW (x) = (za ha (x))T (za ha (x))

= ||za ha (x)||
Drop the a's and onsider

now.
JN LW (x) = ||za ha (x)||2 .
50
Then let
za = RaT z ,
(259)
(260)
This will be our generi problem formulation
9.2 Newton-Rhaphson method
Aside:
The gradient operator is
x f T (x) =
T
) =
x = ( x

x1

x2
.
..
xn

x1

x2
. [f1 (x), f2 (x), . . . , fm (x)]
..
xn
The Ja obian is the transpose of this:

f
T
T
x = [x f (x)]
Dene
h
x xnom
= H(xnom ) = H .
Thus,
Hij =
f2
x1
f2
x2
.
.
.
f2
xn
x
f11
x2
.
.
.
f1
xn
hi
xj x
h(x)
(262)
(263)
x.
were linear, then from
(261)

J T
) = 2H T (z h(x)
0=(
x x
Our goal is to solve this equation for
To nd
Rnm

h
J
T
( ) = 2(z h(x)
T H.
= 2(z h(x)
x x
x x
So we need
Note that if
xn
We know that
..
fm
x1
fm
x2
.
.
.
fm
xn
nom
JN L is:
J

x1
J T
..
0=(
) = .

x x
J

First order ne essary ondition for the minimum
0 = 2H T (z h(x))
we have
= (H T H)1 H T z .
x
whi h satises the rst order ne essary ondition for minimum
JN L ,
rstly we will
onsider the Newton-Rhaphson (NR) method.

The NR method is originally used for nding zeros of a nonlinear fun tion.
As shown in the above gure, to nd the zero of a nonlinear fun tion
initial guess
x1
In Newton-Rhaphson
expansion:
x2 , where x2 = x1 x.
f (x1 )
method x = f (x ) . The NR
1
f (x),
we start from an
and update it as
method omes from rst order Taylor
0 = f (x) = f (x + x) = f (x1 ) + f (x1 )x + H.O.T.
x
=
f (x1 )
f (x1 )
51
(264)
Figure 16: Newton-Rhaphson method
In the urrent ontext (minimizing

Dene
=x
x
g,
x
then we have
JN L (x), suppose
= x
+x
g.
x
we have a guess for
and all it
g .
x
Based on the rst order ne essary ondition we need
g + x)[z
g + x)]
0 = H T (x
h(x
(265)
The ve tor Taylor series is dened as
= f (x
g + x)
f (x)
Let
f (x) = H T (x)[z h(x)],
(266)

f
g) + [
+ O(x
2)
= f (x
]x
x x g
(267)
then based on NR method we need to solve
g) + [
]x
0 = f (x
x x g
f
H T
h
=
[z h(x)] H T ( )
x
x
x
2h
= 2 [z h(x)] + H T H = V
x
f
2h
x2 is beyond a matrix, it is a tually a tensor of three indi es. x
f
entry in
x an be written as
where
=V

nz
X
2 hl
f
g )] + (H T H)ij
[zl hl (x
]ij =
x
xi xj x g
l=1
52
is symmetri . Ea h
(268)
9.3 Gauss-Newton Algorithm (Gill et al. 4.7.2)

(with step length algorithm)
g )(z h(x
g )) + V x)
. So we have x
= V 1 H T (x
g ). In
0 = H T (x
x
g + x
, then repeat this until onvergen e. (i.e., x
0)
for NR method just let x
g starts "su iently lose" to the optimum, then x
g onverges super linearly to the optimum
If x
under normal onditions (h(x) satises several smoothness requirements).
NR method says: solve
9.3 Gauss-Newton Algorithm (Gill et al. 4.7.2)

(with step length algorithm)
Problems with NR method:
1. painful to ompute
H
x
g
x
2. NR an diverge if
2h
x2
is "too far" from solution
soln produ es a small residual error, then ||z h(x

soln )|| is small, and it is reasonable to
If x
T
negle t the se ond order term in V , i.e., let V = H H .
= V 1 f (x
g ) = (H T H)H T [z h(x
g )].
Therefore, x
One an arrive at this expression via another straightforward route:
2
g ) Hx||
= ||z h(x)||
2
J(x)
= || z h(x
| {z }
(269)
Let
= ||z Hx||
2.
J(x)
The
in (269).
that minimize this ost fun tion is the same as
given
Comparison:
NR method applies a Taylor series expansion to the rst order ne essary onditions. On the
other hand, the GN method applies Taylor series to the measurement model:
To avoid divergen e, we modify the updating equation in GN method as
0 < 1.
g )+Hx+w
z = h(x
.
g x
g + x
, where
x
g + x]
is less than J[x
g ]. This guarantees onvergen e in
J[x
J(x) 0.
< J(x
g )?
Q: How do we know : 0 < 1 s.t. J(x g + x)
g + x)
, all the step length.
= J(x
A: Dene J()

dJ
= J(x

. ( hain rule)
g ) and d
Note: J(0)
)x
= ( J
x x
+x
Choose
s.t.
virtually
all situations be ause of ondition
Consider

dJ

d
=0
=
x
x x g
(270)
g )]T H(H T H)1 H T (z h(x

g ))
= 2[z h(x
g )))T (H T H)1 (H T (z h(x

g )))
= 2(H T (z h(x
(271)
(272)
This is a quadrati form. In ases where the nonlinear system is observable,

H T H > 0, whi h

dJ
implies rank(H) = nx (all olumns are linearly independent) and
d =0 < 0, with equality only
g ) = 0.
if z h(x
Thus for some small values of
g)
, J()
< J(x
is guaranteed!
Pra ti al but rude approa h for GN algorithm:
53
9.4 Levenberg-Marquart Method (LM)

1. Set
2.
= 1.
= 0), Jg
Jg = J(
new = J( = 1)
3. while
Jg new Jg
= /2
Jg new = J()
end
This will onverge to a lo al minimum.
9.4 Levenberg-Marquart Method (LM)

In ea h updating step of LM method, we have
g x
g + x
LM ,
x
where
LM = (H T H + I)1 H T [z h(x
g )]
x
with
If
0
LM
= 0 x
is equivalent to
in Gauss-Newton method with
The LM method does not use step size parameter
LM
GN
=0
=
=1
=0
instead it uses
(273)
= 1.
.
Pseudo LM algorithm:
1.
=0
2. he k if
H T H > 0,
3. ompute
something small, say
= ||H|| 0.001
LM ()
x
4. measure the ost

5. If
if not let
Jgnew Jg ,
g ), Jgnew = J(x
g + x
LM ())
Jg = J(x
then let
= max(2, ||H|| 0.001),
go to step (3)
Else we a ept the new guess
LM a hieves fast onvergen e near the solution if the residuals are small. If the residuals near
solution are not small, we may have to use full NR(Newton Rhaphson) method.
54
10 STOCHASTIC LINEAR SYSTEM MODELS
10 Sto hasti Linear System Models

S ribe: Noble Hatten
10.1 Continuous-time model for dynami systems
Figure 17: We will be dealing with NLTV sto hasti systems.

A ontinuous-time model for a dynami system is given by
x = A(t)x + B(t)u + D(t)

v (t)
(274)
x (the state ve tor) is nx 1, u (the input ve tor) is nu 1, v (the pro ess noise or disturban e)
nv 1, and the matri es A (the system matrix), B (the input gain), and D (the noise gain) are
where
is
appropriately dimensioned. The measurement model is given by
z(t) = C(t)x(t) + w(t)

where
(the measurement noise) is
Note: v
nz 1
and
(275)
(the measurement matrix) is
nz nx .
is ontinuous but not dierentiable, meaning that it annot properly be put into a
dierential equation. However, a more rigorous derivation of the equation still leads to the same
result.
Note:
If
v(t) = w(t)
= 0,
v(t)
x(t0 ) in order to predi t the

z() for t0 ), but sometimes
pdf of
data
x(t0 ) and u( ) for t0 t, one an predi t x and z
w(t)
are not equal to 0, it may be enough to know the
onditional pdfs of all future x(t) values ( onditioned on the
then, given
for the entire time interval. When
and
this is not the ase.
The solution of the above system is
x(t) = F (t, t0 )x(t0 ) +
F (t, )[B( )u( ) + D( )

v ( )]d
(276)
t0
where
Note: F
is the state transition matrix, sometimes denoted
(t, t0 ).
is dened by its properties:
F (t, t0 ) = A(t)F (t, t0 )

t
F (t0 , t0 ) = I
where
is the identity matrix. Other properties of
55
in lude
(277)
(278)
F (t, ) = F (t, )F (, )
F (t, ) = [F (, t)]1
If
A is onstant, then F (t, ) = F (t, 0) = eA(t ), where the matrix exponential is dened
as
eA(t ) = I + A (t ) +
1 2
A (t )2 + . . .
2!
(279)
(The matrix exponential may be al ulated in MATLAB using the expm() fun tion.)
If
is time-varying, then one must numeri ally integrate the matrix initial value problem in
order to determine
F (t, ).

v(t)
is white noise if
v(t)
is sto hasti ally independent of
v( )
for all
t 6=
and
E[
v (t)] = 0.
(The noise must be independent even when t and are very lose.) A onsequen e of whiteness
is that E[
v (t)
v T ( )] = V (t)(t ), whi h for V (t) = V = const implies that Svv (f ) = power
spe tral density of
v(t) = v ,
independent in time
zero mean
has ovarian e
i.e. the power spe trum is at. This implies that white noise is
V (t)(t )
This also implies a pro ess that has innite power be ause, at
t = , (t ) = .
However, we
"just go with" the tion of white noise be ause it is onvenient and an be a good approximation
over a frequen y band (as opposed to the entire frequen y spe trum).
10.3 Predi tion of mean and ovarian e

The predi tion of the mean is
E[x(t)] = F (t, t0 )E[x(t0 )] +

= F (t, t0 )x(t
0) +
x(t)
F (t, )[B( )u( )]d
(280)
t0
F (t, )B( )u( )d
(281)
t0
Additionally,
= A(t)x(t)
+ B(t)u(t)
x
meaning that the predi tion of the mean follows the linear system. If
(282)
E[
v (t)] = v
(not zero-
mean), then
= A(t)x(t)
+ B(t)u(t) + D(t)
x
v
56
(283)
10.4 Dis rete-time models of sto hasti systems10 STOCHASTIC LINEAR SYSTEM MODELS
whi h is still deterministi .
The ovarian e is
T]
Pxx (t) = E[(x x)(x
x)
Substituting for
(284)
gives
0 )] +
Pxx (t) = E[(F (t, t0 )[x(t0 ) x(t
t0
0 )] +
F (t, 1 )D(1 )
v (1 )d1 )(F (t, t0 )[x(t0 ) x(t
F (t, 2 )D(2 )
v (2 )d2 )T ]
t0
(285)
Expanding gives
0 ))(x(t0 ) x(t
0 ))T ]F T (t, t0 ) +
Pxx (t) = F (t, t0 )E[(x(t0 ) x(t
Z tZ
t0
F (t, 1 )D(1 )E[

v (1 )
v T (1 )]D T (2 )F (t, 2 )d1 d2
t0
(286)
where E[
v (1 )
v T (1 )] = V (1 )(1 2 ). Also, ross terms in the ovarian e go to zero be ause
0 ))(
E[(x(t0 ) x(t
v T )] = 0 for all > t0 due to the whiteness of the noise. The sifting property of
the Dira delta allows us to ollapse one integral:
Pxx (t) = F (t, t0 )Pxx (t0 )F (t, t0 ) +
F (t, 1 )D(1 )V (1 )D T (1 )F T (t, 1 )d1
(287)
t0
P xx (t) = A(t)Pxx (t) + Pxx (t)AT (t) + D(t)V (t)D T (t)
(288)
Note: P xx (t) is symmetri and linear in Pxx (t).

Note: If A(t) = A = const and if Re[eig(A)] < 0 eigenvalues of A (i.e.
and if
V (t) = V = const, D(t) = D = const,
Pxxss ,
the steady-state value. Thus,
and
V > 0,
then
Pxx (t)
the system is stable)
onverges to a onstant
0 = APxxss + Pxxss AT + DV D T
(289)
is a linear matrix equation known as the ontinuous time Lyapunov equation.
To solve this
equation in MATLAB, use the lyap() fun tion
Pxxss = lyap(A, DV D T )
(290)
10.4 Dis rete-time models of sto hasti systems

Assume a zero-order hold ontrol input:
u(t) = u(tk ) = uk
for
tk t tk+1 .
Then an equivalent dis rete-time model of our original ontinuous system is:
x(tk+1 ) = F (tk+1 , tk )x(tk ) + G(tk+1 , tk )u(tk ) + v(tk )

57
(291)
10.4 Dis rete-time models of sto hasti systems10 STOCHASTIC LINEAR SYSTEM MODELS
Figure 18: A zero-order hold ontrol input holds a onstant value for
t [tk , tk+1 ).
where
G(tk+1 , tk ) =
v(tk ) =
tk+1
tk
Z tk+1
F (tk+1 , )B( )d
(292)
F (tk+1 , )D( )
v ( )d
(293)
tk
v(tk )
is the dis rete-time pro ess noise disturban e. If
v(t)
is white noise, then
E[v(tk )] = 0
and
E[v(tk )v (tj )] =
tj+1 Z tk+1
tj
where
F (tk+1 , 1 )D(1 )E[

v (1 )
v T (2 )]D T (2 )F T (tj+1 , 2 )d2 d2
(294)
tk
E[
v (1 )
v T (2 )] = V (1 )(1 2 ).
E[v(tk )v T (tj )] = jk
tk+1
Thus,
F (tk+1 , 1 )D(1 )V (1 )D T (1 )F T (tk+1 , 1 )d1
(295)
tk
= jk Qk
jk appears
j = k .)
(The Krone ker delta

are independent unless
(296)
be ause, if the
windows do not overlap, then the intervals
We may now simplify the notation:
x(tk ) x(k)
(297)
v(tk ) v(k)
F (tk+1 , tk ) F (k)
(299)
u(tk ) u(k)
G(tk+1 , tk ) G(k)
V (tk+1 , tk ) V (k)
58
(298)
(300)
(301)
(302)
The dynami s model then be omes:
x(k + 1) = F (k)x(k) + G(k)u(k) + v(k)

E[v(k)] = 0 and E[v(k)v T (j)] = kj Qk .
and if tk+1 tk = t = const, then
where
stant),
(303)
For a time-invariant system (A, B, D, V on-
F (k) = F = eAt
Z t
G(k) = G =
eA Bd
Q(k) =
(304)
(305)
eA DV D T eA
(306)

The dis rete-time measurement model is
z(k) = H(k)x(k) + w(k)

where
w(k)
(307)
is dis rete-time white measurement noise. This implies
E[w(k)] = 0
(308)
E[w(k)wT (j)] = kj R(k)
(309)
, but
R(k) = RT (k) > 0. We think of z(k) as a sample from z(t) = C(t)x(t) + w(t)
it is not orre t to say that z(k) = z(tk ). The problem lies in the assumption of whiteness, and
. Be ause of this, E[w(t
k )w
T (tk )] = (0)Q = . The orre t way
therefore innite power, for w(t)
to obtain z(k) is to assume an anti-aliasing lter is used to low-pass lter z(t) before sampling.
where
This an be modeled as an average-and-sample operation:

By the Nyquist sampling theorem, we must sample at twi e the bandwidth of the anti-aliasing
1
= t
).
Now, R(k) be omes
lter (fsamp
1
R(k) = 2
t
where
tk
tk
tk t tk t
1 )w
T (2 )]d1 d2
E[w(
k)
Rw(t
t
(310)
1 )w
T (2 )] = Rw(
1 )(1 2 ).
E[w(
10.6 Full dis rete-time model
Combining the dis rete-time dynami s and measurement models, we obtain the full dis rete-time
model.
59
10.6 Full dis rete-time model
x(k + 1) = F (k)x(k) + G(k)u(k) + v(k)

z(k) = H(k)x(k) + w(k)
(311)
(312)
E[v(k)] = 0, E[v(k)v T (j)] = Q(k)kj , E[w(k)] = 0, E[w(k)wT (j)] = R(k)kj ,

E[w(k)v (j)] = 0 k, j .
with
and
We an solve the dis rete-time dynami s equation.
x(k) = [F (k 1)F (k 2) F (0)]x(0) +
Note:
When
2) F (i + 1)]
i = k 1,
k1
X
i=0
{[F (k 1)F (k 2) F (i + 1)][G(i)u(i) + v(i)]}

(313)
the identity matrix
should be used in pla e for
in the summation.
We an also predi t the statisti s of
[F (k 1)F (k
x(k).
+ 1) = E[x(k + 1)] = F (k)x(k)
x(k
+ G(k)u(k)
+ 1))(x(k + 1) x(k
+ 1))T ] = F (k)Pxx F T k + Q(k)
Pxx = E[(x(k + 1) x(k
(314)
(315)
Pxx exploits E[(x(k) x(k))v

(k)] = 0 to an el ross terms.
F (k) = F = const, and if Q(k) = Q = const (LTI system), and if max(abs(eig(F ))) < 1
The simpli ation of

If
(asymptoti ally stable dis rete-time system), then
P (k)xx Pxxss
where
Pxxss
(316)
is a steady-state ovarian e. We then have the dis rete-time Lyapunov equation
Pxxss = F Pxxss F T + Q
(317)
Pxxss = dlyap(F , Q)
(318)
In MATLAB:
Also,
to
Q > 0 Pxxss > 0.
Q: Where does the requirement that max(abs(eig(F ))) < 1 for stability ome from?
A: F = eAt . One an show that || = eRe()t , where is the eigenvalue of F orresponding
an eigenvalue of
A.
I
Re() < 0,
then
|| < 1.
60
11 KALMAN FILTER FOR DISCRETE-TIME LINEAR SYSTEM
11 Kalman lter for dis rete-time linear system

S ribe: Alan Bernstein
1) Dynami s and measurement model
x(k + 1) = F (k)x(k) + G(k)u(k) + v(k)

z(k) = H(k)x(k) + w(k)
E[v(k)v(j)T ] = Qk jk
E[v(k)] = 0
E[w(k)] = 0
E[w(k)w(j) ] = Rk jk
E[w(k)v(j)T ] = 0 k, j
Note:
and
(319)
(320)
(321)
(322)
(323)
un orrelated is not required, but will be assumed here to simplify analysis.
2) A priori information about initial state
E[x(0)|Z 0 ] = x
(0)
T
(325)
(326)
(327)
E[(x(0) x(0))(x(0) x(0)) |Z ] = P (0)
E[w(k)(x(0) x(0)) ] = 0 k 0
E[v(k)(x(0) x(0)) ] = 0 k 0
3)
v(k), w(k), x(0)
(324)
are all Gaussian random variables
Here we have hosen some spe i onditions for setting up the Kalman lter. Later, we will
relax some of these onditions, or investigate the impli ations of them being violated.
There are several generi estimation problems:
x
(k|Z k ) = x
(k|z0, ..., zk ) - Use measurements up to time k, to estimate
Filtering: Determine
state at time k. This an be done in real time, and ausally - it does not depend on future states.
Smoothing: Determine x
(j|Z k ), for j < k - Use future data to nd an improved estimate of a
histori al state. This is non ausal.
(j|Z k ), for
Predi tion: Determine x
j > k
- Estimation of a future state - this gives the worst
estimate of the three types.

Predi tion is always hard, espe ially when it's about the future - Grou ho Marx
We will on entrate on the ltering problem for now.
There are several dierent standard notations for the ltering problem. Bar-Shalom's notation
is unambiguous, but umbersome, so we will use a leaner alternative.
61
Bar-shalom
Humphreys
x
(k|Z k ) = x(k|k)
x
(k + 1|Z k ) = x(k + 1|k)
P (k|Z k ) = P (k|k)
P (k + 1|Z k ) = P (k + 1|k)
x
(k)
x(k + 1)
P (k)
P (k + 1)
others
+
name
x
(k)
x
(k + 1)
P + (k)
P (k + 1)
a posteriori state estimate

a priori state estimate
a posteriori state estimate error ovarian e
a priori state estimate error ovarian e
This notation is ni e be ause it orresponds to the prior in the stati estimation equations.
Filtering steps (derivation based on MMSE)
0) set
k = 0,
then
x
(k), P (k)
known
1) state and ovarian e propagation: predi t state and error ovarian e at step
on data through
k+1, onditioned
z(k)
state estimate propagation:
x
(k + 1) = E[x(k + 1)|Z k ]
(328)
x
(k + 1) = E[F (k)x(k) + G(k)u(k) + v(k)|Z ]

x
(k + 1) = F (k)E[x(k)|z k ] + G(k)u(k) +
E[v(k)|Z
k ]
x
(k + 1) = F (k)
x(k) + G(k)u(k)
(329)
(330)
(331)
state ovarian e propagation:
P (k + 1) = E[(x(k + 1) x
(k + 1))(x(k + 1) x(k + 1))T |Z k ]
P (k + 1) = F (k)P (k)F T (k) + Q(k)
(332)
(333)
Some additional steps for this an be found on page 204 of Bar-Shalom. The ross-terms are zero
due to the fa t that
v(k)
x
(k).
is zero mean and white, and orthogonal to
The rst term tends to de rease (sin e the absolute value of the eigenvalues of F are less than one),
but be ause of the additive
term, the overall ovarian e grows (in the positive denite sense).
2) measurement update: use
x
(k + 1) and P (k + 1) (a priori info)
and the measurement
z(k + 1)
to get an improved state estimate with a redu ed estimation error ovarian e, due to our measurement update. This next step has been solved previously in lass, in the review of linear algebra.
Bar-Shalom derivation: get distribution of
[x(k + 1) z(k + 1)]T ,
onditioned on
Z k+1 ,
and solve
by analogy to stati ase.
z(k + 1) = E[z(k + 1)|Z k ]
(334)
z(k + 1) = E[H(k + 1)x(k + 1) + w(k + 1)|Z

z(k + 1) = H(k + 1)E[x(k + 1)|z
z(k + 1) = H(k + 1)
x(k + 1)
62
k+1
k+1
]
((
((k+1
(1)|Z
E[w(k
]
] +(
((+
(335)
(336)
(337)
Pzz (k + 1) = E[(z(k + 1) z(k + 1))(z(k + 1) z(k + 1))T |Z k ]

Pzz (k + 1) = E[[H(k + 1)(x(k + 1) x
(k + 1)) + w(k + 1)]
[H(k + 1)(x(k + 1) x(k + 1)) + w(k + 1)]T |Z k ]

Pzz (k + 1) = H(k + 1)P (k + 1)H T (k + 1) + R(k + 1)
Pxz (k + 1) = E[(x(k + 1) x
(k + 1))(z(k + 1) z(k + 1))T |Z k ]
Pxz (k + 1) = E[(x(k + 1) x
(k + 1))(x(k + 1) x
(k + 1))T ]H T (k + 1)+
(
(
((((
((
E[(x(k (
+(
1)(
(
x(k
+ 1))wT (k + 1)]
(((
Pxz (k + 1) = P (k + 1)H T (k + 1)
Thus, we have all the moments required to spe ify
P ([x(k + 1)z(k + 1)]T |z k ),

x(k + 1)
x(k + 1)
P
N
,
z(k + 1)
H(k + 1)
x(k + 1)
H P
where the
(k + 1)
P H T
HP HT + R
so

(338)
index on the elements of the ovarian e matrix are suppressed for brevity, and
zk.
the distribution is onditioned on

We seek:
x
(k + 1) = E[x(k + 1)|Z k+1 ]
= E[x(k + 1)|z(k + 1), Z k ]
= E[x(k + 1)|z(k + 1)]
Here, the onditioning on
Zk
is impli it; by suppressing this onditioning, the form of this problem
is made to resemble the stati ase. So, the problem has now been redu ed to one we've already
solved (in Linear Estimation of Stati Systems):
1
x(k + 1) = x
(k + 1) + Pxz (k + 1)Pzz
(k + 1)[z(k + 1) z(k + 1)]
1
T
P (k + 1) = P (k + 1) Pxz (k + 1)Pzz (k + 1)Pxz

(k + 1)
substituting:

1
x = x + P H T H P H T + R
[z H x
]
P = P P H T [H P H T + R]1 H P
where the
(k + 1)
index is suppressed on ea h term in both expressions for brevity.
63
(339)
(340)

Using Bar-Shalom notation:
S(k + 1) = Pzz = H(k + 1)P (k + 1)H T (k + 1) + R(k + 1)

(k + 1) = z(k + 1) H(k + 1)
x(k + 1)
T
W (k + 1) = P (k + 1)H (k + 1)S(k + 1)
(innovation ovarian e )
(innovation )
(Kalman
gain matrix )
x
(k + 1) = x(k + 1) + W (k + 1)(k + 1)
P (k + 1) = P (k + 1) W (k + 1)S(k + 1)W T (k + 1)
Summary:
given
0) set
x
(0), P (0),
k=0
1) propagate state and ovarian e ( ompute
x
(k + 1), P (k + 1)
2) measurement update of state and ovarian e:

ompute:
(k + 1) = z(k + 1) H(k + 1)
x(k + 1)
T
S(k + 1) = H(k + 1)P (k + 1)H (k + 1) + R(k + 1)
(innovation )
(innovation ovarian e )
T
W (k + 1) = P (k + 1)H (k + 1)S(k + 1)
(gain )
x(k + 1) = x
(k + 1) + W (k + 1)(k + 1)
(a posteriori or ltered state estimate)
T
P (k + 1) = P (k + 1) W (k + 1)S(k + 1)W (k + 1)
(a posteriori state error ovarian e )
3)
k k+1
4) goto (1)
64
12 ALTERNATIVE FORMULAS FOR COVARIANCE AND GAIN OF KALMAN FILTER
12 Alternative formulas for ovarian e and gain of Kalman

lter
Note: These are algebrai ally equivalent to the formulas given previously
P (k + 1) = [P 1 (k + 1) + H T (k + 1)R1 (k + 1)H(k + 1)]1

P (k + 1) = [I W (K + 1)H(k + 1)]P (k + 1)[I W (K + 1)H(k + 1)]T
(341)
. . . + W (K + 1)R(k + 1)W T (K + 1)
(342)
The se ond of these is alled the Joseph Form of the state ovarian e update. It guarantees
P (k + 1) > 0, even with the limited pre ision of a omputer. The

P (k + 1) is always positive denite, so this an help implementation.
Kalman lter assumes that
W (k + 1) = P (k + 1)H T (k + 1)R1 (k + 1)
(343)
that
Sin e
P (k + 1)
is usually al ulated using
W (k + 1),
this last form is more useful for derivation
and analysis than as part of the ( ausal) lter.
12.0.1 Interpretation of the Kalman gain

What is the
W (k + 1)?
Again repeat the denition of it:
W (k + 1) = P (k + 1)H T (k + 1)[H(k + 1)P (k + 1)H T (k + 1) + R(k + 1)]1

x(k + 1) = x
(k + 1) + W (k + 1)(k + 1)
(344)
(345)
Remarks
As
If
If
W (k + 1) 0
then new measurements are not taken into a ount in
P (k + 1) is very small then W (k + 1) is also small and x(k + 1)

+ 1) being small implies that x
(k + 1) is a good estimate)
(k
(P
P (k + 1)
is so big that
an upper limit.
H(k + 1)P (k + 1)H T (k + 1) R(k + 1)
x
(k + 1)
diers little from
then
W (k + 1)
x(k + 1)
approa hes
In this ase we trust the measurements almost entirely in the subspa e in
whi h they give information.
The subspa e in whi h the measurements provide information

If
nz < nx
or the range
P (k + 1)
then the measurements provide information in an
H(k + 1). Spe i ally,

of H(k + 1):
determined by
nz -dimension
subspa e that is
this subspa e is the omplement of the null spa e of
z(k + 1) range[H(k + 1)] = (null[H(k + 1)])

gives the weighting of the update in this subspa e.
This an be visualized in two dimensions:
65
H(k + 1),
range[H(k + 1)]
state update
x
(k + 1)
x
(k + 1)
Figure 19: The state update o urs in the subspa e in whi h the measurements provide information
12.0.2 Generalization for Weighted Pro ess Noise

We an modify the dynami s equations to have weightings on the noise su h that:
x(k + 1) = F (k)x(k) + G(k)u(k) + (k)v(k)

Here setting
=I
(346)
re overs the previous (unweighted) form. With this generalization, the only
hange in the Kalman lter is in the ovarian e propagation, whi h is now given by:
P (k + 1) = F (k)P (k)F T (k) + (k)Q(k)T (k)
(347)
12.0.3 Deriving the Kalman Filter from a MAP approa h

The value of approa hing the ltering problem from a maximum a-posteriori estimation (MAP)
approa h:
E[x|Z k ]
Can be better for nonlinear problems or whenever
Aids interpretation of square-root ltering te hniques
Aids in statisti al hypothesis testing of the Kalman lter
is di ult to determine
For dis rete, statisti al, linear, time-varying (SLTV) systems, the result will be equivalent to
the MMSE-based derivation:
x
MAP (k) = x
MMSE (k)
12.0.4 Setting up the ost fun tion

Note: We will be estimating pro ess noise
v(k)
along with the state.
This omes out of the
equations anyway and sets us uf for doing smoothing in the future.

k
Note: Conditioning on Z is implied in all of this derivation but not shown expli itly for brevity
p[x(k + 1), v(k)|z(k + 1)] =
p[z(k + 1)|x(k + 1), v(k)]p[x(k + 1), v(k)]

p[z(k + 1)]
66
(348)

p[x(k + 1), v(k)|z(k + 1)] with respe t to x(k + 1) and v(k). This allows us to
p[z(k + 1)] and just maximize the numerator. This is equivalent to minimizing
J(x(k + 1), v(k)) (where log() denotes the natural logarithm):
Now maximize
essentially ignore
the ost fun tion
J(x(k + 1), v(k)) = log(p[z(k + 1)|x(k + 1), v(k)]) log(p[x(k + 1), v(k)])
(349)
Note that px(k+1),v(k) [x(k + 1), v(k)] = Cpx(k),v(k) [x(k), v(k)] where C is a onstant and the
subs ripts make lear that these are two distin t probability distributions.
Q: Why an we make the above transformation from p[x(k + 1), v(k)] to p[x(k), v(k)]?
A: Be ause x(k + 1) is a fun tion of x(k). More generally, for any invertible 1-to-1 fun tion
Y = g(X),
it an be shown that
pY [y] =
pX [g1 (y)]
| dy
dx |
Applying this transformation to the probability leads to an additive onstant in the ost fun tion
be ause of the
log,
and then this onstant an be ignored in minimizing the ost fun tion. The ost
fun tion be omes the sum of three parts:
1
J(x(k + 1), v(k)) = [z(k + 1) H(k + 1)x(k + 1)]T R1 (k + 1)
2
. . . [z(k + 1) H(k + 1)x(k + 1)]
1
1
(k)]T P 1 [x(k) x
(k)] + v T (k)Q1 (k
. . . + [x(k) x
2
2
12.0.5 Minimizing the ost fun tion

The ost fun tion developed above an be written su in tly as:
1
[z(k + 1) H(k + 1)x(k + 1)]T R1 (k + 1)[z(k + 1) H(k + 1)x(k + 1)]
2
1
1
(350)
. . . + [x(k) x(k)]T P 1 [x(k) x
(k)] + v T (k)Q1 (k)v(k)
2
2
J(x(k + 1), v(k)) =
Repla e
x(k)
with
x(k) = F 1 [x(k + 1) G(k)u(k) (k)v(k)]
1
2
1
(k)]T P 1
. . . + [F 1 [x(k + 1) G(k)u(k) (k)v(k)] x
2
1
. . . [F 1 [x(k + 1) G(k)u(k) (k)v(k)] x
(k)] + v T (k)Q1 (k)v(k) (351)
2
J(x(k + 1), v(k)) =
Simplify this some by re alling that
x
(k + 1) = F (k)
x(k) + G(k)u(k)
1
2
1
. . . + [F 1 [x(k + 1) (k)v(k) x
(k + 1)]]T P 1
2
1
(352)
. . . [F 1 [x(k + 1) (k)v(k) x
(k + 1)]] + v T (k)Q1 (k)v(k)
2
J(x(k + 1), v(k)) =
67

Now minimize
J(x(k + 1), v(k))
with respe t to
x(k + 1)
and
v(k)
by nding the respe tive
rst-order ne essary onditions

J
v(k)
J
x(k + 1)
T
T
= 0 = T (k)F T (k)P 1 (k)F 1 (k)[x(k + 1) (k)v(k) x

(k + 1)]
. . . + Q1 (k)v(k)
(a)
(353)
(b)
(354)
= 0 = H T (k + 1)R1 (k + 1)[z(k + 1) H(k + 1)x(k + 1)]

. . . + F T (k)P 1 (k)F 1 (k)[x(k + 1) (k)v(k) x
(k + 1)]
Now we wish to reformulate (a) and (b) above into a form su h that:

x(k + 1)
C1
=
v(k)
C2
Then we an solve by taking the inverse of
or by subsitution (Cholesky fa torization)
12.0.6 Solving for the minimum- ost estimate

In order to get the equations (a) and (b) into the desired form, rst solve (a) for
v(k)
v(k) = [T (k)F T (k)P 1 kF 1 (k)k + Q1 (k)]1 T (k)F T (k)P 1 kF 1 (k)[x(k + 1) x

(k + 1)]
Now using the matrix inversion lemma and then the denition of
P (k + 1):
v(k) = Q(k)T (k)[F (k)P (k)F T (k) + (k)Q(k)T (k)]1 [x(k + 1) x

(k + 1)]
T
1
v(k) = Q(k) (k)P (k + 1)[x(k + 1) x
(k + 1)]
(355)
(356)
(357)
Substitute this result into (b) to get the next equation, and then olle t terms and manipulate
to simplify.
0 = H T (k + 1)R1 (k + 1)[z(k + 1) H(k + 1)x(k + 1)]

. . . + F T (k)P 1 kF 1 (k)[x(k + 1) (k)Q(k)T (k)P 1 (k + 1)[x(k + 1) x
(k + 1)] x
(k + 1)]
(358)
= H (k + 1)R
(k + 1)[z(k + 1) H(k + 1)x(k + 1)]

. . . + F T (k)P 1 kF 1 (k)[I (k)Q(k)T (k)P 1 (k + 1)][x(k + 1) x
(k + 1)]
Now repla ing
with
P (k + 1)P 1 (k + 1)
(359)
we have:
0 = H T (k + 1)R1 (k + 1)[z(k + 1) H(k + 1)x(k + 1)]

. . . + F T (k)P 1 kF 1 (k)[P (k + 1) (k)Q(k)T (k)]P 1 (k + 1)[x(k + 1) x
(k + 1)]
68
(360)

(k + 1) it an be shown that
Then using the denition of P
T
(k)Q(k) (k)] = I so the equation be omes:
F T (k)P 1 kF 1 (k)[P (k + 1)
0 = H T (k + 1)R1 (k + 1)[z(k + 1) H(k + 1)x(k + 1)] + P 1 (k + 1)[x(k + 1) x

(k + 1)]
Now solve for
x(k + 1),
whi h we now all
(after the measurement update).
x
(k + 1)
(361)
be ause it is the a posteriori state estimate
Then manipulate the result to get it in the same form as the
previous Kalman lter derivation
x
(k + 1) = [P 1 (k + 1) + H T (k + 1)R1 (k + 1)H(k + 1)]1
. . . + [P 1 (k + 1)
x(k + 1) + H T (k + 1)R1 (k + 1)z(k + 1)]
1
= x(k + 1) + [P
(k + 1) + H (k + 1)R
(k + 1)H(k + 1)]
(362)
H (k + 1)R
. . . [z(k + 1) H(k + 1)
x(k + 1)]
= x(k + 1) + W (k + 1)[z(k + 1) H(k + 1)
x(k + 1)]
(k + 1)
(363)
(364)
As expe ted this agrees with the previous derivation. We an also substitute this ba k into
to get an astimate for it:
v(k)
v(k)
v(k) = Q(k)T (k)P 1 (k + 1)[

x(k + 1) x(k + 1)]
T
1
v(k) = Q(k) (k)P (k + 1)W (k + 1)[z(k + 1) H(k + 1)

x(k + 1)]
(365)
(366)
This is extra information whi h we did not get from the previous MMSE derivation.
The end result for the state guess
x
(k + 1)
is the same as for the MMSE-based derivation but
now there is also an estimate for the pro ess noise
v(k). W (k + 1)
is dened as before.
x
(k + 1) = x(k + 1) + W (k + 1)[z(k + 1) H(k + 1)
x(k + 1)]
T
1
v(k) = Q(k) (k)P (k + 1)W (k + 1)[z(k + 1) H(k + 1)

x(k + 1)]
Note: Even in a perfe t model
white.
v(k) 6= v(k)
be ause
69
v(k)
is onditioned on
z(k + 1)
(367)
(368)
and is not
13 STABILITY AND CONSISTENCY OF KALMAN FILTER
13 Stability and Consisten y of Kalman Filter

S ribe: Mi hael Szmuk
13.1 Stability of KF
Assume
v(k) = w(k) = 0 k
(zero-input stability).
We want to show that the error ve tor
de ays to zero:
e(k) = x(k) x(k)
(369)
Pseudo Proof:
+ 1)
e(k + 1) = x(k + 1) x(k
(370)
+ 1) + W (k + 1){z(k + 1) H(k + 1)x(k

+ 1)}]
= F (k)x(k) + G(k)u(k) [x(k
(371)
However,
z(k + 1) = H(k + 1)x(k + 1) = H(k + 1)[F (k)x(k) + G(k)u(k)]

+ 1) = F (k)x(k)
x(k
+ G(k)u(k)
Substituting the above auses the
u(k)'s
to an el, giving the error dynami s for
(372)
(373)
v(k) = w(k) = 0
k :
e(k + 1) = [I W (k + 1)H(k + 1)]F (k)e(k)
(374)
It is a little tri ky to prove stability be ause this is a time varying system. If the system was
not time varying, it would be possible to look at the modulus of the eigenvalues. To analyze, we
will use a Lyapunov-type energy method to prove stability. Dene an energy-like fun tion
V [k, e(k)] =
1 T
e (k)P 1 (k)e(k)
2
(375)
V is a weighted 2-norm of e(k) be ause P (k) > 0. Therefore, V 0,

if e(k) = 0. We wish to show that V always gets smaller as k in reases.
Then,
only
with equality if and
1 T
e (k + 1)P 1 (k + 1)e(k + 1)
2
1
= eT (k)[P (k) + D(k)]1 e(k)
2
V [k + 1, e(k + 1)] =
(376)
(377)
where
D(k) , F 1 (k)[Q(k)T + P (k + 1)H T (k + 1)R1 (k + 1)H(k + 1)P (k + 1)]F T (k)

Note
D(k) 0,
[P (k) + D(k)]1 < P 1 (k).

Q(k), R(k), F (k), and H(k) (i.e.
whi h implies
under suitable onditions on
Thus
(378)
V [k + 1, e(k + 1)] V [k, e(k)]

Q and R are not too big
observable,
or small, ontrollable with respe t to points of entry of pro ess noise, or "sto hasti ontrollability
and observability").
Here observable implies that all unstable or neutrally stable subspa es of original system are
observable. However, the original system need not be stable! Then, we an show that:
70
13.2 Control of a System Estimated

13 STABILITY
by KF
AND CONSISTENCY OF KALMAN FILTER
1.
P 1 (k) > 0
2.
V [k + N, e(k + N )] < V [k, e(k)]
Thus,
for some bound
is de reasing but not be ause
for some
P (k)
: 0 < 1
and some
N.
is in reasing.
13.2 Control of a System Estimated by KF

Re all the system:
x(k + 1) = F (k)x(k) + G(k)u(k) + (k)v(k)

z(k) = H(k)x(k) + W (k)
(379)
(380)
Re all the Kalman Filter equations:
+ 1)
(k + 1) = z(k + 1) H(k + 1)x(k
S(k + 1) = H(k + 1)P (k + 1)H T (k + 1) + R(k + 1)
W (k + 1) = P (k + 1)H T (k + 1)S 1 (k + 1)
+ 1) = x(k
+ 1) + W (k + 1)(k + 1)
x(k
P (k + 1) = P (k + 1) W (k + 1)S(k + 1)W T (k + 1)
Now, suppose we have a ontrol law
by:
u(k) = C(k)x(k).
Innovation
(381)
Innovation Covarian e
(382)
Kalman Gain Matrix
(383)
A Posteriori State Est.
(384)
A Posteriori State Error Cov.
(385)
Then, the losed-loop dynami s are given
x(k + 1) = [F (k) G(k)C(k)]x(k) + (k)v(k)

If the pair
(F, G)
(386)
is ontrollable then we an design a stabilizing ontroller. However, it may be
expensive, impra ti al, or impossible to measure all of the states. What if we use
x(k)
instead?
Can this stabilize the system?
Q:
for x(k)? Alternatively,

x(k)
?
x(k)
. The system be omes:
A: Then, u(k) = C(k)x(k)
Can we substitute
an we measure
z(k),
estimate
x(k),
and
feed ba k
x(k + 1) = F (k)x(k) G(k)C(k)x(k)

+ (k)v(k)
+ 1) = [I W (k + 1)H(k + 1)][F (k)x(k)
x(k
G(k)C(k)x(k)]+
W (k + 1){H(k + 1)[F (k)x(k) G(k)C(k)x(k)

+ (k)v(k)] + w(k + 1)}
Change oordinates:

x(k)
x(k)
x(k)
=
.
x(k)
e(k)
x(k) x(k)
(387)
(388)
(389)
Then, the dynami s of the overall
ontroller-estimator system are given by:
x(k + 1) = [F (k) G(k)C(k)]x(k) + G(k)C(k)e(k) + (k)v(k)
e(k + 1) = [I W (k + 1)H(k + 1)]F (k)e(k)+

[I W (k + 1)H(k + 1)](k)v(k) W (k + 1)w(k + 1)
71
(390)
(391)
(392)
13.3 Matrix Ri ati Equation 13 STABILITY AND CONSISTENCY OF KALMAN FILTER
Q: Is this system stable?

A: For analysis, ignore exogenous
inputs. Also, re ognize that error dynami s do not depend
on the state (not true for nonlinear systems). We already showed that
any linear KF that satises the sto hasti observability onditions, et .
e(k) 6= 0
e(k) 0
as
for
Then, after a long time
and the state dynami s obey
x(k + 1) = [F (k) G(k)C(k)]x(k)

whi h is stable by our hoi e of
(393)
C(k).
This is alled the separation prin iple. That is, one an design a full-state feedba k ontrol law
separately from the KF. When onne ted, the system will be stable. The ombined system ends up
with the properties of the two independent systems (in terms of poles and zeros). In pra ti e, the
poles of the observer should be to the left (i.e. faster) than those of the ontrolled plant or pro ess.
Note: If the estimator has modeling errors (e.g.
or
not perfe tly known), then the system
may not de ouple.
13.3 Matrix Ri ati Equation

Re all that
P (k + 1) = F (k)P (k)F T (k) + (k)Q(k)T (k)

We substituted for
P (k)
(394)
to get:
P (k + 1) = F (k){P (k) P (k)H T (k)[H(k)P (k)H T (k) + R(k)]1 H(k)P (k)}F T (k)+
T
(k)Q(k) (k)
This gives us a dynami model for
(395)
(396)
P (k),
whi h is nonlinear in
P (k).
This is alled the Matrix
Ri ati equation (MRE). Beware: Analysis of MRE is not easy! Spe ial ase: Steady state solution
for LTI systems:
and if
Q, R > 0
F (k) = F , G(k) = G,
et . If the pair
(F, H)
is observable,
then MRE onverges to a steady-state solution
Pss > 0,
(F, )
is ontrollable,
whi h an be determined
by solving the Algebrai Ri ati Equation (ARE):
Pss = F Pss Pss H T [H Pss H T + R]1 H Pss F T + QT

where
P (0) > 0.
(397)
The MATLAB fun tion for solving the ARE is:
[Wss , Pss , Pss ] = dlqe(F, , H, Q, R)
72
(398)
13.4 Steady-State KF Equations13 STABILITY AND CONSISTENCY OF KALMAN FILTER
13.4 Steady-State KF Equations

The steady-state Kalman Filter equations are given by
+ 1) = F x(k)
x(k
+ Gu(k)
+ 1) = x(k
+ 1) + Wss [z(k + 1) H(k + 1)x(k
+ 1)]
x(k
(399)
(400)
13.5 Steady-State Error Dynami s

The steady-state error dynami s are given by
e(k + 1) =Ass HF e(k)
(401)
Ass , I Wss H
(402)
where
We know that the dynami s of
e(k)
are stable from
max(abs(eig(Ass ))) < 1.
NB: Original system dynami s may not have been stable.
13.6 Properties of KF Innovations

, where E[(k)] =
The KF innovation ve tor is given by (k) = z(k) H(k)x(k)
T
show that E[(k) (j)] = S(k)kj (i.e. the innovation is white), where we re all that
0.
S(k) , H(k)P (k)H T (k) + R(k)

The whiteness of
(k)
We an
(403)
follows from the whiteness of the pro ess noise:
+ 1)]
v(k) = [T F T P 1 F 1 + Q1 ]1 T F T P 1 F 1 [x(k + 1) x(k
(404)
Using the Matrix Inversion Lemma,
+ 1)]
v(k) = Q(k)T (k)[P 1 (k + 1)[x(k + 1) x(k
(405)
P (k + 1) , F P F T + QT
(406)
where we re all that
Then,
0 = H T R1 [z(k + 1) Hx(k + 1)]+

+ 1)] x(k
+ 1)}
F T P 1 F 1 {x(k + 1) QT P 1 (k + 1)[x(k + 1) x(k
T
= H R
F
T
T
[z(k + 1) Hx(k + 1)]+

+ 1)]
P F 1 [P P 1 QT P 1 (k + 1)][x(k + 1) x(k
= H R
[z(k + 1) Hx(k + 1)]+

+ 1)]
F P F 1 [P (k + 1) QT ]P 1 (k + 1)[x(k + 1) x(k
T 1
1
+ 1)]
= H R [z(k + 1) Hx(k + 1)] + I P (k + 1)[x(k + 1) x(k
T
73
(407)
(408)
(409)
(410)
(411)
(412)
(413)
(414)
13.7 Likelihood of a Filter Model13 STABILITY AND CONSISTENCY OF KALMAN FILTER

where the last step is a hieved using the denition of
P (k + 1).
We an then solve for
+ 1) su h
x(k
that
+ 1) = [P 1 + H T R1 H]1 [P 1 x(k
+ 1) + H T R1 z(k + 1)]
x(k
(415)
This is just like the re ursive least-squares form. With further manipulation we an get:
+ 1) = x(k
+ 1) + [P 1 + H T (k + 1)R1 (k + 1)H(k + 1)]1
x(k
+ 1)]
H T (k + 1)R1 (k + 1)[z(k + 1) H(k + 1)x(k
+ 1) + W (k + 1)[z(k + 1) H(k + 1)x(k
+ 1)]
= x(k
Substituting this into the equation for
v(k)
(417)
(418)
gives
+ 1) x(k
+ 1)]
] = Q(k)T (k)P 1 (k + 1)[x(k
T
1
+ 1)]
= Q(k) (k)P (k + 1)W (k + 1)[z(k + 1) H(k + 1)x(k
v(k) = E[v(k)|Z
(416)
k+1
(419)
(420)
This gives us more information about the pro ess noise.
Q: How do we know if the lter is working properly?

A: If it is, then E[(k) T (j)] should behave as kj S(k).
If not, then there ould be a modeling
or oding error, or the system may be subje t to olored noise (i.e. non-white noise).
NB: This is only a ne essary ondition, not su ient.
That is, satisfying these onditions does
not imply that the lter will work properly.
13.7 Likelihood of a Filter Model

Consider the MAP form of the KF:
+ 1), (k)]
min{J[x(k + 1), (k)]} = J[x(k
1 T
= (k + 1)S 1 (k + 1)(k + 1)
2
(421)
(422)
Use this to get the likelihood fun tion for the orre tness of the KF model.
P [Z k |KF
mode]
1)]}
= C exp{J[x(1),
(0)]} exp{J[x(2),
(1)]}
exp{J[x(k),
(k
(423)
= C exp{
1
2
k
X
T (j)S 1 (j)(j)}
(424)
j=1
Suppose we have multiple KF models with dierent
Q: Whi h model do we trust the most?

A: The one with largest P [Z k |KF mode].
J ,
k
X
, P (0).
F , , Q, R, H , x(0)
This amounts to nding the KF with the minimum
T (j)S 1 (j)(j)
(425)
j=1
That is, the one with the minimum weighted least-square error. This leads to the Multiple Model
approa h.
74
14 KALMAN FILTER CONSISTENCY
14 Kalman Filter Consisten y

S ribe: Anamika Dubey
Re all the denition of onsisten y:
x(k)
T] = 0
E[x(k)
This doesn't hold in most auses as eviden e by the fa t that
(426)
P (k)
does not go to zero in most
ase. The ulprit is the pro ess noise. Hen e, for our purpose we de rease our requirements for
onsisten y.
14.1 Consisten y Denition

A state estimator (KF) is alled onsistent if its state estimation errors satisfy the onditions
given below. This is nite-sample onsisten y property, whi h means analysis for the estimation
errors based on a nite number of samples.
1. Unbiased estimator - Have zero mean
E[x(k)]
=0
(427)
2. E ient estimator - Have ovarian e matrix as al ulated by the lter
x(k)
T ] = P (k) = J 1 (k)
E[x(k)
where,
J 1 (k)
(428)
is the sher information matrix
In ontradi tion, the parameter estimator onsisten y is an asymptoti (innite size sample)
property. Typi ally the onsisten y riteria of the lter are as follows:
1. The state errors should have zero mean and have ovarian e matrix as al ulated by the lter.
2. The innovations should also have the same property as mentioned in 1.
3. The innovation should be a eptable as white.
The rst riteria, whi h is most important one, an be tested only in simulation (Monte- arlo
simulations). The last two riteria an be tested on real data (single-run / multiple-runs). In theory
these properties should hold but in pra ti e they might not. Following are the reasons for this:
Modeling error - an be solved using tuning the lter
Numeri al error - an be solved using square root information (SRI) te hnique
Programming error - solve by xing the bugs
14.2 Statisti al test for KF performan e

Consisten y of a lter an be tested in two ways.
First, o-line-test - Using multiple runs
(Monte- arlo simulations); Se ond, Real-time test - using real-time measurements by single-run or
multiple-runs of the experiment (if the experiment an be repeated).
75
Figure 20: Truth model simulation for monte- arlo test
14.2.1 Monte Carlo Simulation based test

Consider a truth model simulation
The truth model uses the dynami s equations to generate measurement
x(k).
z(k)
and state ve tors
The measurement ve tors are then used as input to the kalman lter (under evaluation) and
estimated states
x(k)
are generated.
Using the notation,
x(k)
= x(k) x(k)
(429)
dene normalized estimation error squared (NEES) as
T (k)P 1 (k)x(k)
(k) = x
If KF is working properly then
(k)
(430)
should be distributed as hi-square with
nx
degrees of free-
dom.
To demonstrate this further, We de ompose
and
(k)
P 1 (k) = V (k)(k)V T (k)
where,
V (k)V T (k) = I
is a diagonal matrix
let,
y(k)
= V T (k)x(k)
(431)
then,
(k) = y (k)
(k)y(k)
=
2
nx
X
yi
i=1
whi h is distributed as
We an do
with
nx
(432)
yi
degree of freedom, i.e. mean((k)) =
nx
monte arlo simulations of our truth model and lter the measurements and see
if average of the normalized estimation error squared (NEES) approa hes
76
nx .
Let,
k =
where
N
1 X i
(k)
N i=1
ith monte arlo simulation model. Then, if

nx as N approa hed to . Note that N k is
(433)
denote
approa hes to
the lter is working properly then

2N n .
distributed as
Probability distribution for average normalized state estimation error squared

0.014
Chisquare density
0.012
0.01
0.008
0.006
(1alpha)
0.004
0.002
(alpha/2)
(alpha/2)
Nr1
Nnx
Nr2
Figure 21: Hypothesis test for average NEES

If
2 hypothesis test, under H0 hypothesis that the lter

r1 k r2 ; (1 ) 100% of the times. s given by:
is false alarm probability. Then for
is onsistent,
should be limited to
N r2
N r1
We usually hoose
r1
and
r2
su h that
p()d = (1 )
= 0.01
or
(434)
= 0.05
In MATLAB:
r1
chi2inv( , N.nx )
2
N
and
r2
chi2inv(1
, N.nx )
2
Note: If these limits are violated then something is wrong with the lter.
14.2.2 Real-Time (Multiple-Runs) Tests
This is test is done on lter (KF) based on the real time data for the dynami model under
evaluation. The test is appli able for the experiments that an be repeated in real world. Hen e
the dynami model under evaluation should available for
real-time runs.
First ompute,
(k) = T (k)S 1 (k)(k)
77
(435)
If KF is working properly then this is distributed as
2nz .
We an do
N,
independent rum of the
real experiment (if it an be repeated) and al ulate :
k =
Note that
If
N k
is distributed as
2N nz .
is false alarm probability, then for
limited to
r1 k r2 , (1 ) 100%
N
1 X i
(k)
N i=1
(436)
hypothesis test for lter onsisten y,
of the times.
r1
and
r2
should be
are given by
In MATLAB:
r1
chi2inv( , N.nz )
2
N
Note:
and
r2
chi2inv(1
, N.nz )
2
If these limits are violated then something is wrong with the lter.
In addition to testing size of
(k),
we an test for its whiteness as well.
Compute the auto-
orrelation fun tion,
lm (k, j) = r
P
PN
i
li (k)m
(j)
2 P
2
N
N
i (k)
i (j)
i=1 l
i=1 m
i=1
(k, k) = 1
k = j then lm
k 6= j , then we expe t lm (k, k) to be small as ompared to 1,
For N large enough and k 6= j , this statisti s an be approximated
1
as varian e.
zero mean and
N
If
(437)
When
as normal distribution with
Normal Distribution
Probability distribution for autocorrelation function
(1alpha) *100 %
Figure 22: Hypothesis test for auto- orrelation fun tion
78
E[lm (k, j)] = 0
(438)
E[lm (k, j)Tlm (k, j)] =

hen e,
r lm (k, j) r , (1 ) 100% of the time.
Note:
Typi ally to do the whiteness test, we look
1
N
(439)
, 0, p
where, r is given as : r = norminv 1
2
(N )
at l = m and just look for k and k + 1
14.2.3 Real-Time (Single-Run) Tests

All the above tests assume that
independent runs have been made. While they an be used
on a single run, they might have a high variability. The question is whether one an a hieve a low
variability based on a single run, as a real-time implementation. This test for lter onsisten y is
alled
real-time onsisten y test.
These test are based on repla ing the ensemble averages by time averages based on the
ergodi ity
of the innovation sequen e.

Time-averaged normalized statisti s is given by :
1X T
(k)S 1 (k)(k)
=
(440)
k=1
is distributed as
then,
2nz .
Hen e, same hypothesis test applies, with only one experiment.
Similarly whiteness test an be done. The whiteness test statisti s for innovations, whi h are
time-step apart an be written as time-average auto orrelation.
P
k=1 l (k)m (k + j)
lm (j) = q P
2 P
2
( k=1 l (k)) ( k=1 m (k + j))
This statisti s is normally distributed for large
(441)
Furthermore it an be shown that for large
E[lm (j)] = 0
E[lm (j)Tlm (j)] =
(442)
(443)
Question: What if one of the above tests fail?

Answer: Consider lter tuning.
Strategy for lter tuning
Assume
vary
Q(k)
1). If
F (k), H(k), (k), R(k)
are orre t.
Sin e we know least about pro ess noise, Hen e,
to pass the onsisten y test.
(k)
is too small then it means model ts measurement too well.
79

This is the ase if
Q(k)
is too big. In this ase the system is making too mu h adjustment to
x(k)
in response to ea h measurement.
(k + 1) = F (k)P (k)F T (k) +
Re all P
(k)Q(k)T (k)
If Q(k) is too big then so is P (k + 1)

(k + 1)H T (k + 1) + R(k + 1)
Sin e, S(k + 1) = H(k + 1)P
whi h will make (k) too small.
Qold (k)
Solution is to de rease Q(k) su h that Qnew (k) =
10
2). If
(k)
is too big, then in rease
Q(k)
80
then
S 1 (k + 1)
will be too small,
15 CORRELATED PROCESS AND MEASUREMENT NOISE
15 Correlated Pro ess and Measurement Noise

S ribe: Henri Kjellberg
Base assumptions of the Kalman Filter:
1. Pro ess noise is white
2. Measurement noise is white
3. The two noises are un orrelated
Q: What if one or more of these assumptions is violated?

A: Let's ba k up and onsider our noise sour es in the frequen y domain.
time noise. Usually by white we mean spe i ally that:
Svv (f ) = power
Consider ontinuous
spe trum of
v=Const
V.
By the Weiner-Khon hin theorem, take the inverse Fourier transform to re over the auto- orrelation
fun tion:
Spe trally, at
E [
v (t) v (t + )] = R = F 1 [(S (f ))] = V ( )
(444)
un orrelated in time
This implies that a nonuniform power spe tra leads to auto- orelated noise:
6I
YY
5W
YY
Figure 23: An example of a low-pass pro ess

Strategy: the power spe trum of the auto- orrelated noise an be approximated as losely as
desired by the output of a linear subsystem driven by white noise.
ZKLWH
ZWQW
DXWRFRUUHODWHG
DXWRFRUUHODWHG
YW
XW
2ULJLQDO
6\VWHP
Figure 24: Power spe trum approximation linear system
x (t) = A (t) x (t) + B (t) u (t) + D (t) v (t)
81
(445)
15 CORRELATED PROCESS AND MEASUREMENT NOISE
z (t) = C (t) x (t) + w

(t) + n
(t)
Assume
(446)
E [
v (t)] = E [w
(t)] = E [
n (t)] = 0
ZWQW
ZKLWH
*V

ZWQW
YW
*V

XW
YW
2ULJLQDO
6\VWHP
Figure 25: Augmented system
Let
(t) =
v (t)
w
(t) + n (t)
The shaping or pre-whitening lters an be implemented in state spa e as follows:
x (t) = A x (t) + B n
(t)
(447)
(t) = C x (t) + D (t)
(448)
where
v (t)
(t)
(t) = w
n
(t)
(449)
The output of the shaping system an be used to drive the original systems. The augmented
dynami s be ome:

I
I
x (t)
x (t)
B (t)
A (t) D (t)
C
D (t)
D
=
+
u (t) +
0
0
(t)
x (t)
x (t)
0
0
A
B
{z
}
|

(450)
new A matrtix

x (t)
+ 0 I D (t)
z (t) = C (t) 0 I C
x (t)
Q: How do we develop the shaping lters G1 (s) and G2 (s)?

A: Estimate the power spe trum as a rational model (e.g., Svv (f ) =
transfer fun tion from
Svv (f )
and
Sww (f )
(451)
N (f )
D(f ) ) and derive the
via spe tral fa torization.
(See the standard texts on power spe trum parametrization).

One an often build up the desired power spe trum as a ombination of several building blo ks prototypi al Gauss-Markov pro esses. The above overs only auto- orrelated noise. See Bar Shalom
8.3 for ross- orrelated measurement and pro ess noise.
82
15.1 Aside: Realization Problem15 CORRELATED PROCESS AND MEASUREMENT NOISE
15.1 Aside: Realization Problem

We an re all the
Realization Problem,
where one attempts to go from an input-output
relationship governed by a transfer fun tion su h as
(s) u
y = G
(s)
(452)
into the state spa e model
x (t) =
A (t) x (t) + B (t) u (t)
(453)
y (t) =
C (t) x (t) + D (t) u (t)
(454)
For stri tly proper transfer fun tions (the degree of the numerator is less than the degree of the
denominator) one an reate a ontrollable anoni al form or an observable anoni al form. We do
this by exposing the oe ients of the numerator and denominator of the transfer fun tion. As an
example:
Given a transfer fun tion:
G (s) =
n1 s2 + n2 s + n3
s3 + d1 s2 + d2 s + d3
(455)
A state spa e model that is guaranteed to be ontrollable will take the form:

d2 d3
1
0
0 x (t) + 0 u (t)
1
0
0

y (t) = n1 n2 n3 x (t)
d1
x (t) = 1
0
(456)
(457)
Related MATLAB fun tions to investigate are: tf, ss, zpk, frd, ssdata, tf2ss.
15.2 Aside: Spe tral Fa torization
Spe tral fa torization involves taking a transfer fun tion su h as the one shown in Bar Shalom
on p. 67:
Svv () = S0
1
a2 + 2
(458)
And splitting it into two fun tions, one part with all the Right Hand Plane (RHP) roots and
the other in luding all the Left Hand Plane (LHP) roots.
Svv () =
1
1
S0
a + j a j
H () =
1
a + j
Is identied as the ausal transfer fun tion.
83
(459)
(460)
16 INFORMATION FILTER/SRIF
16 Information Filter/SRIF
S ribe: Ken Pesyna
Re all that the
a posteriori
state estimate error ovarian e matrix
P (k + 1)
an be dened in
terms of an update formula:
1
T
1
P (k + 1) =
(k + 1)H(k + 1)
P (k + 1) + |H (k + 1)R {z
}
matrix squaring operation
However, the matrix squaring operation is a bad idea numeri ally. It squares ondition number
of the matri es.
Note that
pT R
p > 0
P (k + 1) = R
may also be ill- onditioned for the same reason.
So may
R(k + 1).
Let's write
P (k + 1)
in terms of another update formula:
P (k + 1) = P (k + 1) W (k + 1)S 1 (k + 1)W T (k + 1)
(461)
P (k+1) > 0 but roundo errors (e.g. from non-innite

P (k + 1) indenite (not positive denite) or symmetri .
T
Joseph's form of P (k + 1) update to ensure P (k + 1) = P (k + 1) > 0, but this
a ura y of P (k + 1). A deterioration in a ura y of P (k + 1) an lead to garbage
Here, innite numeri al pre ision ensures

numeri al pre ision) an make
One an use
doesn't help the
results.
Bar Shalom introdu es the square root ovarian e lter, whi h keeps tra k of the square root of
the ovarian e matrix. But this requires the ability to update a Cholesky fa torization.

Strategy: work with
P 1 (k).
Instead of keeping tra k of
x(k + 1), P (k + 1), x(k + 1), P (k + 1),
we keep tra k of:
y(k)
y(k)
P 1 (k)
P 1 (k)
= P 1 (k)
x(k)
= P 1 (k)
x(k)
= I (k)
(463)
(464)
= I (k)
I (k) is known as the information matrix and is equal to

P (k) and is related to the Fisher information matrix I(k).
where
matrix
(462)
(465)
the inverse of the ovarian e
We an substitute these denitions into the Kalman Filter to get the Information Filter. After
mu h algebra in luding the matrix inversion lemma, we arrive at the following:
Let
A(k) = F T (k)I (k)F 1 (k)

Propagation step:
84
(466)
n

1 T o T

y(k + 1) = I A(k)(k) T (k)A(k)(k) + Q1 (k)
(k) F (k)
y (k) + A(k)G(k)u(k)
(467)

1 T
I(k + 1) = A(k) A(k)(k) T (k)A(k)(k) + Q1 (k)
(k)A(k)
|
{z
}
(468)
Information is de reasing due to pro ess noise.
Pro ess noise de reases the information during the propagation step. This is similar to a hole
in a metaphori al information bu ket; If
Qk = 0,
i.e. there is no pro ess noise, then there is no
information leaking out and
I(k + 1) = A(k) = F T (k)I (k)F 1 (k)
(469)
y(k + 1) = y(k + 1) + H T (k + 1)R1 (k + 1)z(k + 1)
(470)
The update step:
I (k + 1) = I(k + 1) +
H T (k + 1)R1 (k + 1)H(k + 1)
|
{z
}
(471)
information is in reasing from measurements

Here the information is in reasing due to new measurements. This is similar to an information
pipe lling the metaphori al information bu ket.
We an re over
and
by:
x
(k + 1) = I 1 (k + 1)
y (k + 1)
(472)
P (k + 1) = I 1 (k + 1)
(473)
16.1.1 Benets of the Information Filter
The information lter is more e ient than Kalman Filter if
nz > nx , nv
and if
R(k)
is
diagonal. Usually this not the ase, however.

For linear systems we an pi k the initial state estimate arbitrarily as long as we set
I (0) = 0.
This represents the diuse prior, i.e. no idea of our initial state. This setting the initial prior
to be diuse annot be as easily done with the regular Kalman Filter.
We ould set the
error ovarian e matrix to innity, but limited numeri al pre ision limits our ability to do so
(k) = I 1 (k)
y (k) until I (k) be omes invertible. If
in real systems. We annot ompute x
the system is observable, then
I (k)
will eventually be ome invertible. Waiting for I (k) to

k so that H k has full olumn rank in the bat h
be ome invertible is like waiting large enough

initialization problem.
16.1.2 Disadvantages of the Information Filter
This form of the information lter still involves squaring in the
85
H T R1 H
terms.

The square root information lter is one spe i implementation of the information lter whi h
involves no squaring of terms.
Dene:
T
Rxx
(k) Rxx (k) = I (k)
T
xx (k) = I (k)
Rxx (k) R
where
Rxx (k)
and
kk (k)
R
are the Choleski fa torizations of
(474)
(475)
I (k)
and
zx (k) = Rxx (k) x

(k)
zx (k) = Rxx (k) x

(k)
I(k)
respe tively, and
(476)
(477)
Also let
RaT (k)Ra (k)
= R(k)
(478)
Ha (k) = RaT (k) H (k)

za (k) = RaT (k) z (k)
(479)
wa (k) =
RaT
(k) w (k)
(480)
(481)
The transformed measurement equation be omes:
za (k) = Ha (k) x (k) + wa (k)
(482)
E [wa (k)] = 0
(483)
where

E wa (k) waT (j) = kj I
(484)
The dynami s model remains un hanged:
x (k + 1) = F (k) x (k) + G (k) u (k) + (k) v (k)
(485)
E [v (k)] = 0
(486)

E v (k) v T (j) = kj Q (k)
We now en ode prior information about
x (k)
and
v (k)
(487)
into so alled (square root) information
equations (AKA the data equations). Let
T
Rvv
(k) Rvv (k) = Q1 (k)
Note:
Rvv (k) = hol (inv (Q (k))) = [inv ( hol (Q (k)))]

86
(488)
zx (k) = Rxx (k) x (k) + wx (k) , wx (k) (0, I)
(489)
zv (k) = 0 = Rvv (k) v (k) + wv (k) , wv (k) (0, I)
(490)

E wx (k) wvT (k) = 0
(491)
These square root information equations store, or en ode, the state and pro ess noise estimates
and their ovarian es. We an re over our estimates from the information equations as long as
is invertible. If
time
and the
Rxx (k)
Rxx
is not invertible then the system is not observable from the data through
a priori info at time 0.
Note that
Rxx (k)
is upper triangular.
Let's now de ode the state from the state information equation:
1
x (k) = Rxx
(k) [zx (k) wx (k)]
Suppose we want our best estimate of
x(k)
denoted as
(492)
x
(k):
x
(k) = E [x (k) |k]
1
1
= Rxx
(k) E [zx (k) |k] Rxx
(k) E [wx (k) |k]
|
|
{z
}
{z
}
=
(494)
zx (k)
1
Rxx
(493)
(k) zx (k)
(495)
This result is onsistent with the previous denition in Eq. 476.

Let
1
x
(k) = x (k) x
(k) = Rxx
(k) wx (k)
P (k)

= E x
(k) x
T (k) |k

T
1
(k)
= Rxx
(k) E wx (k) wxT (k) Rxx
{z
}
|
(496)
(497)
(498)
1
T
= Rxx
(k) Rxx
(k)
= I
(k)
(499)
(500)
This result is onsistent with the previous denition in Eq. 474.

Likewise,
v (k) = E [v (k) |k] = 0
(501)
h
i
T
E (v (k) v (k)) (v (k) v (k)) |k = Q (k)
(502)
This result is also onsistent with previous denitions.
87
16.2.1 Propagation Step and Measurement Update

We need to gure out how to perform the propagation step and the measurement update on
these square root information equations.
We'll use the MAP approa h whi h is equivalent to
minimizing the negative natural log of the a posteriori of the
a posteriori
onditional probability
density fun tion. This amounts to minimizing the ost fun tion:
Ja [x (k) , v (k) , x (k + 1) , k] =
=
log (p)
1
1
[x(k) x
(k)]T P 1 (k)[...] + v T (k)Q1 (k)v(k)
2
2
1
+ [z(k + 1) H(k + 1)x(k + 1)]T R1 (k + 1)[...]
2
(503)
(504)
(505)
After normalization of the above form, an alternative formulation of the ost fun tion based in
square root information notation is:
Ja [x (k) , v (k) , x (k + 1) , k] =
1
1
2
2
kRxx (k) x (k) zx (k)k + kzv (k) Rvv (k) v (k)k
{z
} 2|
{z
}
2|
a priori x(k)
a priori v(k)
1
2
+ kHa (k + 1) x (k + 1) za (k + 1)k
(506)
{z
}
2|
measurement at k+1
The insight here is that the prior estimate of the state and pro ess noise an be expressed as a
measurement and thus formulated into the above ost fun tion.
Our task is to minimize
x (k + 1).
eliminate
Ja
x (k)
x (k), v (k), and

v (k), x (k + 1) and use this to
subje t to the dynami s model, whi h relates
We will solve the dynami s model for
x (k)
in terms of
from the ost fun tion:
x (k) = F 1 (k) [x (k + 1) G (k) u (k) (k) v (k)]

Simultaneously eliminate
Ja .
x (k)
and enfor e the dynami s onstraint, by substituting
We'll all this equivalent ost fun tion
(507)
x (k)
into
Jb :
Jb [v (k) , x (k + 1) , k](508)

2

1
Rvv (k)
0
v (k)
0

=
(509)
zx (k) + Rxx (k) F 1 (k) G (k) u (k)
2 Rxx (k) F 1 (k) (k) Rxx (k) F 1 (k) x (k + 1)
|

{z
}

Big blo k matrix
1
2
+ kHa (k + 1) x (k + 1) za (k + 1)k (510)
2
In the equation above, we used the following identity for the rst term:
2
a
2
2

kak + kbk =
b
88
(511)

Re all that
kT vk = kV k
Ta (k) = QT (k) from QR fa torization of the
orthonormal. Multiplying insides of rst term by Ta (k)
for T orthonormal. Let
big blo k matrix in Eq. 508 above.
Ta is
and rewrite as:
Jb [v (k) , x (k + 1) , k] =

2
1
v (k)
zv (k)
Rvv Rvx (k + 1)

zx (k)
Rxx (k + 1) x (k + 1)
2 0
1
+ kHa (k + 1) x (k + 1) za (k + 1)k2
2
(512)
NB: This was the propagation step!

Next Step: Just as we took a step of data equations and pa ked them into a ost fun tion, so
an we take the ost fun tion and unpa k it ba k into a set of data equations.
If we unpa k the rst term into its square root information equations, we obtain:
1. The
a posteriori square root information equation for v (k) as a fun tion of x (k + 1):
zv (k) = Rvv v (k) + Rvx (k + 1) x (k + 1) + w v (k) ,
w v (k) (O, I)
(513)
This equation is a by-produ t of the ltering pro ess. It is not used to determine the ltered
state estimate, but will be used in smoothing. Filtering implies ausality, smoothing implies
non ausality ( an use future information).
2. The
a priori state square root information matrix at k + 1:

zx (k + 1) = Rxx (k + 1) x (k + 1) + wx (k + 1) ,
Jb
Now minimizing
wrt.
Jb
0=
v (k)
w x (k) (O, I)
(514)
v (k):
T
Rvv (k)
| {z }
non-singular
This yields:

Rvv (k) v (k) + Rvx (k + 1) x (k + 1) zv (k)

1
v (k) = Rvv (k) zv (k) Rvx (k + 1) x (k + 1)
Eq. 516 is equivalent to the solution for
v (k)
(515)
(516)
in the MAP derivation of the Kalman Filter.
Substitute Eq. 516 into solution into Eq. 512, and sta king the remaining terms we get a new
yet equivalent ost fun tion:

2

1 R (k + 1)
z (k + 1)
Jc [x (k + 1) , k + 1] = xx
x (k + 1) x

za (k + 1)
2 Ha (k + 1)
|

{z
}
Matrix

A.
If Matrix A were square (and non-singular) we ould just take it's inverse to ompute
the lter's best estimate of
x(k + 1).
(517)
x
(k + 1)
However, it's not, so we want to QR fa torize this matrix
in order to push all the energy from the
Ha
term up into the top of the matrix, making it upper
89
triangular. This will de ouple the ost fun tion into a omponent that depends on
x(k + 1)
and
one that does not. We do this by performing QR fa torization on matrix A and the subsequent
and orthonormal transformation to the ost fun tion as before to get:

2

z (k + 1)
R (k + 1)
Jc (x (k + 1) , k + 1) = xx
x (k + 1) x

0
zr (k + 1)

|

{z
}
upper triangular

NB: This was the update step!
to
a posteriori.
1
1
2
2
kRxx (k + 1) x (k + 1) zx (k + 1)k + kzr (k + 1)k
2
2
(519)
The la k of bars above the terms indi ates that we have gone from
a priori
(518)
Unsta k to get:
Jc [x (k + 1) , k + 1] =
Now unpa k the impli it square root information equation from this ost fun tion to get:
1. The
a posteriori square root information equation for for the state at k + 1:

zx (k + 1) = Rxx (k + 1) x (k + 1) + wx (k + 1) ,
wx (k + 1) (0, I)
(520)
2. The residual error equation:
zr (k + 1) = wr (k + 1) ,
90
wr (k + 1) (0, I)
(521)
Aside:
Q: Where do the
Rxx , zx (k + 1), zr (k + 1),wx (k + 1)
and
wr (k + 1)
terms ome from?
A: They ome from the orthogonal transformation that o urs to the ost fun tion, i.e. Eq. 517,
after performing the QR fa torization and transforming the matri es. First, to make things lear,
let's unpa k Eq. 517 into its impli it square root information equations:
z x (k + 1) = Rxx (k + 1) x (k + 1) + w x (k + 1)
(522)
za (k + 1) = Ha (k + 1) x (k + 1) + wa (k + 1)
(523)
It's easy to see the impli it noise terms
w
x
and
wa .
A in the ost fun tion, i.e. in Eq. 517, and dening
Now performing QR fa torization on matrix

Ta (k + 1) = QT , where Q is from the QR
fa torization and then transforming ea h matrix by left multiplying by

Rxx (k + 1)
0

zx (k + 1)
zr (k + 1)

wx (k + 1)
wr (k + 1)
=
=
=
Ta (k + 1),
we arrive at:

xx (k + 1)
R
Ha (k + 1)

z (k + 1)
Ta (k + 1) x
za (k + 1)

w
x (k + 1)
Ta (k + 1)
wa (k + 1)
Ta (k + 1)
(524)
(525)
(526)
Ta (k + 1) is orthonormal, wx (k +1) and wr (k +1) retain the same distribution as w

x (k +1)
wa (k + 1), i.e.wx (k + 1) (0, I) , wr (k + 1) (0, I).
Now returning to Jc . We an minimize it by inspe tion.
Be ause
and
1
x
(k + 1) = Rxx
(k + 1) zx (k + 1)
(k + 1) su h that the rst term of of

Thus, we have solved for x
2
Jc = 12 kzr (k + 1)k , whi h be omes its minimum.
It an be shown that
(527)
Jc
be omes 0 and we are left
with
kzr (k + 1)k = zTr (k + 1) zr (k + 1) = T (k + 1) S (k + 1) (k + 1)

where
and
(528)
are terms dened earlier within the normal Kalman ltering (non-square-root-
information) ontext.
No matrix squaring: (only QR fa torization and inversion of R matri es when ne essary)
Very robust numeri ally
P (k)is
Note:
F (k)
guaranteed symmetri and positive denite be ause
1
T
P (k) = Rxx
(k) Rxx
(k)
must be invertible. If not our solution must be more fan y.
91
17 SMOOTHING
17 Smoothing
17.1 Estimate x(k) based on Z j with j > k

3 Classi Types:
Fixed point smoothing: Interested in
Fixed lag smoothing: Estimate
Fixed interval smoothing: Estimate
urrent
x(k)
for a xed
x(k), x(k + 1), ...
k,
but
keeps in reasing
using data that extends
samples past the
of interest
x(k)k
from 1 to N using all data on
k = 1, 2, , N
Fo us: Fixed interval smoothing

New Notation
x(k|N
) = x (k)
P (k|N ) = P (k)
v(k|N ) = v (k)
Preferred Implementation: Square-Root Information Smoother (SRIS)
Key Observation: Smoother equations fall out of the MAP estimation approa h.
First, exe tue
an SRIF forward (ltering) pass.

Square-Root Information Equations relating the state and pro ess noise
v (k) (0, I)
w
vv (0)v(0) + R
vx (1)x(1) + w
zv (0) = R
v (0)
.
.
.
zv (N 1) = Rvv (N 1)v(N 1) + Rvx (N )x(N ) + w

v (N 1)
Square-Root Information Equations for the residuals: Dis ard them, they are useless
zr (0) = wr (1)
.
.
.
zr (N ) = wr (N )
Square-Root Information Equation for the state at N
zx (N ) = Rxx (N )x(N ) + wx (N )
Invoke the Dynami s Model:
x(k + 1) = F (k)x(k) + G(k)u(k) + (k)v(k)
Smoothing is essentially a ba kwards iteration. Use the dynami s model to emilimate

of
x(N 1)
92
x(N ) in favor
17.2 Steps
17 SMOOTHING
17.2 Steps
1. Let:
zx (N ) = zx (N )
Rxx
(N ) = Rxx(N )
wx (N ) = wx (N )
2. (If needed) Compute:
1
x (N ) = Rxx
(N )z (N )
1
T
P (N ) = Rxx
(N )Rxx
(N )
3. Set
k = N 1.
The ost fun tion asso iated with the Square-Root Information equations at
k an be written as:
Jb [v(k), x(k + 1), k] =
vx (k + 1)x(k + 1) zv (k)||2 + 1 ||Rxx

||Rvv (k)v(k) + R
(k + 1)x(k + 1) zx (k + 1)||2
2
2
We seek to minimize this subje t to the dynami s equation.

4. Use the dynami s equation to eliminate
x(k + 1) in
sta king leads to:
Ja [v(k), x(k), k] =
1
||
2
favor of

x(k).

v(k)
x(k)
Substitute for
x(k + 1) and
||2
whose implied SR Information equations are:

vx (k + 1)G(k)u(k)
vv (k) + R
vx (k + 1)(k) R
vx (k + 1)F (k) v(k)
v (k)
zv (k) R
|R
w
=
+
zx (k + 1) Rxx
(k + 1)G(k)u(k)
Rxx
(k + 1)(k)
Rxx
(k + 1)F (k) x(k)
wx (k + 1)
Multiply both sides by some
Ta (k) = QT (k)
from qr-fa torization of the blo k matrix. This
does not hange the ost, but now the SR information equations are de oupled:
Where
Thus

wv (k)
(0, I)
xx (k)

zv (k)
Rvv (k) Rvx
(k) v(k)
w (k)
=
+ v
zx (k)
0
Rxx
(k) x(k)
xx (k)
Ja [v(k), x(k)] =
||R (k)v(k) + Rvx

(k)x(k) zv (k)||2
2 vv
5. Minimize the ost:
1
x (k) = Rxx
(k)zx (k) = E[x(k)|Z N ]
1
T
P (k) = Rxx
(k)Rxx
(k)
1
v (k) = Rvv
(k)[zv (k) Rvx
(k)x (k)] = E[v(k)|Z N ]
1
T
T
T
Pvv
(k) = Rvv
(k)[I + Rvx
(k)Rxx
(k)Rxx
(k)Rvx
(k)]Rvv
(k)
1
(k) = Rvv
(k)Rvx
(k)Rxx
(k)Rxx
(k)
Pvx
N ote : Ja [v (k), x (k), k] = 0
93
17.2 Steps
6. If
k = 0,
17 SMOOTHING
stop. Otherwise, de rement k by 1 and goto step 4. now using the SR Information
equations:
zx (k + 1) = Rxx
(k + 1)x(k + 1) + wx (k + 1)
vv (k)v(k) + R
vx (k + 1)x(k + 1) + w
zv (k) = R
v (k)
94
18 NONLINEAR DIFFERENCE EQUATIONS FROM ZOH NONLINEAR DIFFERENTIAL

EQUATIONS
18 Nonlinear Dieren e Equations from ZOH Nonlinear Differential Equations

S ribe: Daniel P. Shepard
We wish to nd an approximate Kalman Filter algorithm for non-linear systems as dieren e
equations of the form
x(k + 1) = f [k, x(k), u(k), v(k)]

z(k) = h [k, x(k)] + w(k)
(529)
(530)
We need to onstru t this from the ontinuous-time non-linear models of the form
x(t)
= f (t, x(t), u(t)) + D(t)
v (t)
z(t) = h (t, x(t)) + w(t)
(531)
(532)
Re all that we already did this in Chapter 10 for linear systems when
x(t)
= A(t)x(t) + B(t)u(t) + D(t)
v (t)
z(t) = C(t)x(t) + w(t)
(533)
x(k + 1) = F (k)x(k) + G(k)u(k) + (k)v(k)
(535)
(534)
was tranformed into
z(k) = H(k)x(k) + w(k)
(536)
18.1 Zero-Order-Hold (ZOH) Assumption

Under zero-order-hold (ZOH) assumptions, the sampling interval
enough that over the interval
is assumed to be small
tk = kt t < (k + 1)t = tk+1

u(t) u(k)
v(t) v(k)
w(k)
w(t)
(537)
(538)
(539)
This is illustrated in Fig. 26, whi h shows that the value of the ontrol is assumed onstant (i.e.
t. ZOH also requires that the

z(t) after anti-alias ltering to avoid innite varian e of w(k).
xk (t) be the ontinuous-time state in the interval tk t < tk+1 .
held over) over the sampling interval
measurement
z(k)
is a
sample of
Let
problem
95
Dene the initial-value

18.2 Varian e of the ZOH Pro ess Noise
EQUATIONS
Figure 26: The ontrol input under ZOH assumptions
x k (t) = f [t, xk (t), u(t)] + D(t)

v (t)
z(t) = h [t, x(t)] + w(t)

xk (tk ) = x(k)
(540)
(541)
(542)
We an solve for xk (t) on tk t < tk+1 . The solution depends on x(k), u(k), and v(k). Let
f [k, x(k), u(k), v(k)] = xk (tk+1 ), where f [] is some pro edure for integrating forward to tk+1 . In
MATLAB, this integration pro edure ould be ode45 or any other numeri al integration s heme.
18.2 Varian e of the ZOH Pro ess Noise

Q: Given
How do we relate
Q(t)
A: If t is small, then
and

E v(t)
v T ( ) = (t )Q(t)

T
E v(k)v (j) = kj Q(t)
(544)
?
Q(t)
i
h
f [k, x(k), u(k), v(k)] x(k) + t f (tk , x(k), u(k)) + D(tk )v(k)
This is simple Euler Integration. In this ase, the term of
tD(tk )v(k),
(543)
f []
(545)
orresponding to the pro ess noise is
whi h an be alternatively expressed as
tD(tk )v(k)
Both of these forms of the pro ess noise term of
tk+1
D( )
v ( ) d
tk
f []
96
have zero mean. Also,
(546)

18.3 Partial Derivatives of Dieren e Equations
EQUATIONS
cov
Z
cov (tD(tk )v(k)) = t2 D(tk )Q(k)DT (tk )

Z tk+1
tk+1
)DT ( ) d
D( )Q(
D( )
v ( ) d =
tk
tk
(548)
k )DT (tk )
tD(tk )Q(t
Equating these two ovarian es for small
yields that
Q(k) =
Note that
(547)
limt0 Q(k) = .
k)
Q(t
t
(549)
This result omes from the whiteness of
v(t).
Q: What if the measurement interval t is too large for the ZOH assumption to hold?
A: One an take m intermediate steps of length t
m between ea h measurement. Choose m su h
that
t
m is small enough that
Q(k) =
t )
Q(k
m
t
m
is a reasonable approximation. Thus, the model takes
the form
x(k + 1) = f [k, x(k), u(k), v(k)]

z(0) = h [0, x(0)] + w(0)
(550)
z(m) = h [m, x(m)] + w(m)
(rst measurement)
(551)
(next measurement) . . .
(552)
In other words, implement a KF by performing m propagation steps and then an update step, sin e
new measurements only arrive every
mt
se onds.

To design a nonlinear estimation algorithm, we'll need to know

f []
F (k) =
x(k) k,x(k),u(k),0

f []
(k) =
v(k)
(553)
(554)
k,x(k),u(k),0
Q: How do we nd these?

A: Re all that
x k (t) = f (t, xk (t), u(t)) + D(t)

v (t)
xk (tk ) = x(k)
Taking the partial with respe t to
x(k)
(555)
(556)
yields
97

EQUATIONS

f()
[x k (t)] =

x(k)
xk (t)

xk
x(k) t
t,x (t),u(k)
k

xk
= A(t)
x(k) t
Sin e
xk (tk ) = x(k)
(557)
and the order of dierentiation an be inter hanged, we have the following
initial-value problem for the state transition matrix

d xk (t)
xk (t)
= A(t)
dt x(k)
x(k)
xk (tk )
= Inx xnx
x(k)
(558)
(559)
xk (tk )
x(k) is similar to the state-transition matrix
ontinuous-time linear systems in Se tion 10.
This shows that
Similarly, for
F (t, tk )
from the dis ussions of the
(k)

xk (t)
d xk (t)
= A(t)
+ D(t)
dt v(k)
v(k)
xk (tk )
=0
v(k)
However, we want to know the derivatives of the dis rete
f [],
(560)
(561)
not the ontinuous
f().
These
derivatives are given by
xk (tk+1 )
f []
=
x(k)
x(k)
f []
xk (tk+1 )
=
v(k)
v(k)
This requires integration of Eqs. (558) and (560) from
tk
to
(562)
(563)
tk+1 .
This an be a omplished using
numeri al integration s hemes, su h as ode45, to integrate the two matrix dierential equations at
the same time we're integrating the
xk (t) dierential equation.
To do this, break the matri es apart
into olumn ve tors as
xk (t)
= [1 (t), 2 (t), . . . , nx (t)]
x(k)
xk (t)
= [1 (t), 2 (t), . . . , nv (t)]
v(k)
98
(564)
(565)

EQUATIONS
Then, the initial-value problems for these newly dened ve tors are
i (t) = A(t)i (t),
i = 1, 2, . . . , nx
i (tk ) = [0, 0, . . . , 0, |{z}

1 , 0, . . . , 0]
(566)
(567)
ith row
i (t) = A(t)i (t) + di (t),
i (tk ) = 0
where
D(t) = [d1 (t), d2 (t), . . . , dnv (t)].
i = 1, 2, . . . , nv
(569)
Now a large state ve tor an be dened as

T
Xbig = xTk , T1 , T2 , . . . , Tnx , 1T , 2T , . . . , nTv
This state ve tor has dimension
(568)
nx (nx + nv + 1)x1.
(570)
A large numeri al integration routine an be
written, with the appropriate previously dened initial onditions, to solve
X big = fbig (t, Xb ig, u(k), v(k))
99
(571)
19 NONLINEAR ESTIMATION FOR DYNAMICAL SYSTEMS
19 Nonlinear Estimation for Dynami al Systems

S ribe: Shaina Johl
Problem statement
Dynami s model:
x(k + 1) = f [k, x(k), u(k), v(k)]

E[v(k)] = 0,
(572)
E[v(k)v (j)] = kj Q(k)
(573)
Measurement model:
z(k) = h [k, x(k)] + w(k)
(574)
E [w(k)] = 0, E [w(k)w(j)] = jk R(k)
(575)
Q: How to optimally estimate x(k)?

A: See Bar Shalom 10.2
Our strategy is a sub-optimal strategy. Approximate the optimal estimates by applying linear
MMSE estimator to the non-linear problem.
Approximate
Approximate their ovarian es

x(k + 1) = E x(k + 1)|Z k

z(k + 1) = E z(k + 1)|Z k
(576)
(577)
P (k + 1), P kz (k + 1), P zz (k + 1)
(578)
If we an assume that the approximations are valid then we an use our old update equations
for the measurement update of the Kalman lter
x
(k + 1) =x(k + 1) + P xz (k + 1)Pzz (k + 1) [z(k + 1) z(k + 1)]
1
P (k + 1) =P (k + 1) P xz (k + 1)Pzz
(k +
T
1)P xz (k
+ 1)
(579)
(580)
19.1 Standard EKF Methods for Approximation

EKF Philosophy: Linearize the non-linear equations about the urrent best state estimate.
19.1.1 Predi tion:

x(k + 1) = E f [k, x(k), u(k), v(k)] |Z k
100
(581)
19.1 Standard EKF Methods for

19 Approximation
NONLINEAR ESTIMATION FOR DYNAMICAL SYSTEMS
Expand in a Taylor series expansion about
x(k) = x
(k), v(k) = v(k) = 0:
x
(k + 1) = E f [k, x
(k), u(k), 0] +
[x(k) x
(k)]
x k,x(k),u(k),0
|
{z
}
F (k)
f
+ Higher

v(k)
+
v(k)
k,
x(k),u(k),0
{z
}
|
Order Terms|Z
(k)
(582)
Negle t higher order terms, hoping that the linearization is valid over the likely values of
x(k)!
This is surprisingly ee tive.

Now take the expe tations:

x(k + 1) = f [k, x
(k), u(k), 0] + F (k) E[x(k) x(k)|Z k ] +(k) E v(k)|Z k
|
{z
}
|
{z
}
approximately=0
Note: Compute
x(k + 1)
via our numeri al pro edure
Q(k)
would have ae ted
But,
x(k + 1).
f [k, x(k),
h u(k),
h
i v(k)].
Wat h out: if we had retained the se ond order derivatives
(583)
2f
x2
and
2 f
v 2
then
P (k)
and
This is a property of non-linear lters in general.

P (k + 1) = E (x(k + 1) x(k + 1))( )T
(584)
x(k + 1) = x(k + 1) + F (k) [x(k) x

(k)] + (k)v(k) + HOT
| {z }
(585)
P (k + 1) = F (k)P (k)F T (k) + (k)Q(k)T (k)
(586)
negle t
Substituting and taking expe tations yields:
Note: This is the same as for the linear Kalman Filter. The only dieren e is that
(k)
F (k)
and
are omputed by numeri al integration.
19.1.2 Measurement update:

z(k + 1) = E h [k + 1, x(k + 1)] + w(k + 1)|Z k

|k+1,x(k+1) [x(k + 1) x(k + 1)] + HOT + w(k + 1)|Z k

= E h [k + 1, x(k + 1)] +
x
{z
}
|
H(k+1)

= h [k + 1, x(k + 1)] + H(k + 1) E x(k + 1) x(k + 1)|Z k + E w(k + 1)|Z k
|
{z
} |
{z
}
approximately=0
h [k + 1, x(k + 1)]
101
(587)
19.2 EKF as an algorithm
P ae ts z .

P xz (k + 1) = E (x(k + 1) x(k + 1))(z(k + 1) z(k + 1))T |Z k
(588)
z(k + 1) z(k + 1) + H(k + 1) [x(k + 1) x(k + 1)] + w(k + 1)
(589)
Note: If HOTs are retained, then
Note that
Therefore

P xz (k + 1) = E (x(k + 1) x(k + 1))(H(k + 1) [x(k + 1) x(k + 1) + w(k + 1)])T |Z k
T
= P (k + 1)H (k + 1)
(590)
(591)
Similarly,
P zz (k + 1) = H(k + 1)P (k + 1)H T (k + 1) + R(k + 1)
(592)

1. Start with
2. Set
x
(0), P (0)
k=0
3. Compute
x(k + 1) = f [k, x
(k), u(k), 0]

f
F (k) =
x k,x(k),u(k),0

f
(k) =
v(k) k,x(k),u(k),0
(593)
(594)
(595)
P (k + 1) = F (k)P (k)F T (k) + (k)Q(k)T (k)
(596)
(597)
4. Compute the Measurement Update
z(k + 1) =h[k + 1, x
(k + 1)]

h
H(k + 1) =
x
(598)
(599)
k+1,x(k+1)
(k + 1) =z(k + 1) z(k + 1)
S(k + 1) =H(k + 1)P (k + 1)H T (k + 1) + R(k + 1) = P zz (k + 1)
T
W (k + 1) =P (k + 1)H (k + 1)S (k + 1)
x(k + 1) =x(k + 1) + W (k + 1)(k + 1)
(603)
P (k + 1) =P (k + 1) W (k + 1)S(k + 1)W (k + 1)
102
(601)
(602)
Note that we an use alternate formulas instead of
(600)
and
for the Kalman lter.
(604)
5. Filter:
P (k + 1) = [P 1 (k + 1) + H T (k + 1)R1 (k + 1)H(k + 1)]1

Joseph form (guarantees that
P (k + 1) > 0:
P (k + 1) =[I W (k + 1)H(k + 1)]P (k + 1)[I W (k + 1)H(k + 1)]T

T
W (k + 1) =P (k + 1)H (k + 1)R
k
by 1 and go to Step 3.
103
(606)
(607)
(k + 1)
(608)
+ W (k + 1)R(k + 1)W (k + 1)
6. In rement
(605)
20 ITERATED KALMAN FILTER
20 Iterated Kalman Filter

S ribe: Aprajita Sant

It an be thought of as a step towards the solution of the nonlinear MAP problem.
Consider the MAP ost fun tion:
1
[x (k) x
(k)]T P 1 (k) [...]
2
1 T
v (k) Q1 (k) v (k)
2
1
T
{z (k + 1) h [k + 1, x (k + 1)]} R1 (k + 1) {...}
2
Ja [k, v (k) , x (k) , x(k + 1)] =

+
+
whi h omes from minimizing

p x (k) , v (k) , x (k + 1) Z k = Cexp(Ja )
Minimizing of Ja must be subje t to the nonlinear dynami s.
(609)
Dene a nonlinear inverted dynami s
fun tion:
x (k) = f 1 [k, x (k + 1) , u (k) , v(k)]

Here, the fun tion is dened su h that:
1
x (k + 1) = f [k, f
{k, x (k + 1) , u (k) , v (k)} , u (k) , v(k)]
The above an be better visualized by taking the example of our original nonlinear problem
f = F (k)x(k) + G(k)u(k) + (k)v(k)
(610)
f 1 = F 1 (k) [x(k + 1) G(k)u(k) (k)]
(611)
1
We an obtain f
by numeri ally integrating ba kward f in time.
1
to eliminate x(k) from the MAP ost fun tion and thereby dene:
Use f
Jb [k, v (k) , x(k + 1)] =

+
T
1 1
f [k, x(k + 1, u(k), v(k))] x
(k) P 1 (k) [...]
2
1 T
1
T
v (k) Q1 (k) v (k) + {z (k + 1) h [k + 1, x (k + 1)]} R1 (k + 1)(612)
{...}
2
2
This is just a weighted least squares ost fun tion for errors in the three equations given below:
1.
0 = f 1 [k, x(k + 1), u(k), v(k)] x(k)
2.
0 = v(k)
3.
0 = z(k + 1) h[k + 1, x(k + 1)]
with weighting
with weighting
P 1
Q1
with weighting
104
R1 (k + 1)
Strategy: We use the Gauss-Newton method to linearize and solve. We start by rst linearizing
x(k + 1) = x
(k + 1) and v(k) = 0.
x
(k)
about the rst guess at

below using our initial
Here
x
(k + 1) is obtained by the equation
x
(k + 1) = f [k, x
(k), u(k), v(j)]
The next step is to solve the linearized least squares problem for
(613)
x
(k)
and
v(k),
then re-linearize
and then iterate. We do this by linearizing ea h of the three equations above.

For 1) we have:
f 1
f [k, x
(k + 1) , u (k) , v (k)] +
x (k + 1)
1
f
[v (k) 0] x
(k)
v (k) k,x(k+1)
k,
x(k+1)
(614)
x
(k) = f 1 [k, x
(k + 1) , u (k) , v (k)]
In the above expression
the x
(k) s an el.
[x (k + 1) x (k + 1)]
from the denition of inverse. Thus
It an be shown that:
f 1
x (k + 1)
= F 1 (k) =
k,
x(k+1),u(k),0
f
x(k)
1
k,
x(k),u(k),0
And
f 1
v (k)
k,
x(k+1),u(k),0
1

f
= F 1 (k) (k) = F 1 (k)
v(k) k,x(k),u(k),0
The linearized equation then be omes:

0 = F 1 (k) [x (k + 1) x
(k + 1)] F 1 (k) (k) v(k)
Similarly, we linearize 3) as shown below:
0
(k + 1)]
= z (k + 1) h [k + 1, x
We dene,
H (k + 1) =
Also, we know,
h
x(k+1)
h
x (k + 1)
k,
x(k+1)
[x (k + 1) x
(k + 1)]
k,
x(k+1)
z (k + 1) = h [k + 1, x
(k + 1)]
The linearized equation for 3) redu es to:
0 = z (k + 1) z (k + 1) H(k + 1)[x (k + 1) x (k + 1)]

Summarize: To summarize the three linearized equations are:
1.

0 = F 1 (k) [x (k + 1) x (k + 1)] F 1 (k) (k) v(k)
105
20.2 Iterated Kalman Filter
2.
0 = v (k)
3.
0 = z (k + 1) z (k + 1) H (k + 1) [x (k + 1) x
(k + 1)]
.Jb
New ost fun tion obtained by substituting the linearized equations ba k into the ost fun tion
Jb [k, v (k) , x(k + 1)] =

+
+
1
1
T
[x (k + 1) x (k + 1) (k) v(k)] F T (k) P (k) F 1 (k) [...]
2
1 T
v (k) Q1 (k) v (k)
2
1
T
{z (k + 1) z (k + 1) H (k + 1) [x (k + 1) x
(k + 1)]} R1 (k + 1)(615)
{...}
2
The new ost fun tion is minimized w.r.t
x (k + 1)and v (k).
If the linearization is good, this is very
lose to maximizing the aposteriori likelihood fun tion and an be viewed as the justi ation for
EKF. Also, there are analogies to represent Extended Kalman lter as a square root information
lter.
20.2 Iterated Kalman Filter

The traditional EKF takes only one Gauss-Newton step. The iterated EKF is su h that it takes
multiple Gauss-Newton steps, re-linearizing only the measurement equation at ea h step.(One ould
1
also imagine re-linearizing the f
dynami s using.x
(k + 1) and v (k) Infa t, one an imagine relinearizing over the last N-steps whi h is the idea behind the BSEKF.
Jc [k + 1, x(k + 1)] = Jb [k, vOP T (k) , x(k + 1)]
Consider
(i.e the
v (k)that
minimizesJb )
The approximation omes from linearized dynami s (linearized about)x

(k). Measurement equation
remains non-linear.
Strategy:
iterate.
i (k
Let x
Start with a guess of
x (k + 1)solve
the linearized LS problem for a new guess, then
+ 1) be the estimated,x (k + 1) after the ith Gauss Newton step, note. x

0 (k + 1) = x (k + 1)
Also, let
H i (k + 1) =
The linearized measurement equation after the
h
x
ith
k+1,
xi (k+1)
step:

0 = z (k + 1) h k + 1, x
i (k + 1) H i (k + 1) x (k + 1) x
i (k + 1)
Relate these iterations ba k to standard KF equations by:
T
Jci
=
x(k + 1)
= P 1 (k + 1) [x (k + 1) x (k + 1)]

T
H i (k + 1)R1 (k + 1){z (k + 1) h k + 1, x
i (k + 1) H i (k + 1) x (k + 1) x
i (k +(616)
1) }
106
20.3 Forward Ba kward Smoothing (BSEKF)
Let,
P i+1 (k + 1) = [P 1 (k + 1) + H i (k + 1)R1 (k + 1)H i (k + 1)]

Then, solving for
x
i+1 (k + 1) =
+
x (k + 1)
yields:

T
x
i (k + 1) + P i (k + 1) H i (k + 1)R1 (k + 1) ){z (k + 1) h k + 1, x
i (k + 1) }
P i (k + 1) P 1 (k + 1) [
x (k + 1) x
i (k + 1)]
(617)
With,
P 0 (k + 1) = P (k + 1) and x0 (k + 1) = x
(k + 1)
Note that
xi (k + 1)
is the traditional (non-iterated) EKF estimate.
Stop iterating when, the expression below gets very small
i+1

x
(k + 1) x
i (k + 1) <
Use step size adjustment as before if worried about divergen e
20.3 Forward Ba kward Smoothing (BSEKF)
It is used to deal with urrent and several past measurement non-linearities plus dynami s nonlinearities.
Solve for,
x (k j) and v (k j) , j = m, m 1, ..., 0
Su h that it minimizes the expression below:
1
T
[x (k j) x (k m)] P 1 (k m) [...]
2
k1
1 X T
v (l) Q1 (l) v(l)
2
l=km
1
2
k
X
l=km1
{z (l) h [l, x (l))]} R
Su h that,
x (k + 1) = f [k, x (l) , u (k) , v (k)]

Steps:
x (k j)
Start with an initial guess of
Linearize the dynami s about this guess at ea h time.
Solve the linear smoothing problem with a step-size adjustment algorithm.
propagate it forward.
107
T
T
(l) {...}
(618)
21 MULTIPLE MODEL (MM) FILTERING
21 Multiple Model (MM) Filtering

S ribe: Yousof Mortazavi
Let
be a ve tor of unknown parametersit might ae t
any
of
F, G, , H, Q, R, x
(0), P (0).
A Bayesian approa h to multiple model ltering seeks the following:
p[x(k), |Z k ] = p[x(k)|, Z k ] p[|Z k ]

For onvenien e, let
take on values in
Then:
p[x(k)|j , Z k ]
{1 , 2 , . . . , M }.
is the posterior density of
p[ = j |Z k ] , j (k) is
x(k)
under the
the probability that the
j th
j th
model.
model is orre t given
Zk
21.1 Stati Case
Consider the stati ase, where
PM
j=1
j = 1).
(k) = = onst.
21.1.1 Strategy
1. Determine how to propagate
p[x(k)|Z k ] =
j (k)
to
PM
j (k + 1)
j=1 j (k) p[x(k)|j , Z

or MMSE. Also, al ulate P (k).
2. Find
] and use this to hoose an optimal x(k) per MAP
21.1.2 Steps
1. Propagate:
j (k)
p[j |Z k ]
= p[j |z(k), Z k1 ]
=
= j (k)
p[z(k)|j , Z k1 ] p[j |Z k1 ]
p[z(k)|Z k1 ]
p[z(k)|j , Z k1 ] j (k 1)
M
X
p[z(k)|l , Z k1 ] l (k 1)
l=1
The fa tor
p[z(k)|j , Z k1 ]
is the likelihood fun tion of model
at time
k.
In the linear Gaussian ase:
p[z(k)|j , Z k1 ] = N (z(k); Hj (k) xj (k), Sj (k))
= N ((k); 0, Sj (k)) = p(j (k))
where
Sj (k)
is the innovation ovarian e matrix for model
108
and
j (k) = z(k) Hj (k) xj (k)
KF
T1
21.2 Remarks
Z
x1
n1
21 MULTIPLE
MODEL (MM) FILTERING
mu1
x2
2. Estimate: n2
T2
KF
xhat
xMAP (k) =
x
MMSE
xM(k) =
KF
PMMSE (k) =
mu2
arg max p[x(k)|Z k ]

x(k)
E[x(k)|Z ] =
nM
TM
sum
M
X
M
X
x
j (k) j (k)
j=1
spread of the means

z
}|
{
muM(k)] [xj (k) xMMSE (k)]T

j (k) Pj (k) + [
xj (k) x
MMSE
j=1
propagate
muk1
Fig. 27 shows the s hemati for the MMSE ase.
Figure 27: Multiple model lter s hemati for MMSE ase
21.2 Remarks
Choosing

{1 , 2 , . . . , M }
is the subje t of ongoing resear h.
hard to distinguish similar

large
makes models hard to distinguish
Approximate multiple model lter for nonlinear systems an be designedjust repla e KF

with EKF.
If the orre t
is among the
j ,
then orresponding
j (k)
will approa h 1 as
Q: What an be done for time varying (k)

A1: Ad ho modi ation: impose an arti ial lower bound on the j (k)
A2: Dynami multiple model lter
Let
(k) {1 , 2 , . . . , M }
k .
and assume model swit hing is a Markov pro ess with transition
probabilities given by
pij = p[(k) = j |(k 1) = i ]

Then for a general non-linear system
x(k + 1) =
z(k) =
f [k, x(k), (k)]

h[k, x(k), (k)]
109
Hidden Markov Model
21.2 Remarks
There exist
Mk
21 MULTIPLE MODEL (MM) FILTERING

possible sequen es at time
k.
Q: How an we deal pra ti ally with su h exponential growth?

A:
Method
(1) retain only most probable
Complexity (# of lters)
M
MN
sequen es
(2) ombine sequen es that dier only before last
steps
Bar Shalom introdu es:
Generalized Pseudo-Bayesian estimator of rst order (GPB1) (N
Generalized Pseudo-Bayesian estimator of se ond order (GPB2) (N
Intera ting Multiple Model Estimator (IMM) (like
N =2
= 1)
but only
= 2)
M
lters used)
For additional material, please refer to Mayba k se tion 10.8 and Bar Shalom Se tion 11.6.
110
22 THE "BOOTSTRAP" PARTICLE FILTER
22 The "Bootstrap" Parti le Filter

22.1 Motivation
A more perfe t linear estimator. All MMSE estimators, in luding approximate te hniques su h
as the Extended Kalman Filter (EKF) and Sigma Point Filter (SPF), redu e to taking the approximate onditional mean and ovarian e:
x(k)
= E[x(k)|z k ]
T
P (k) = E[[x(k) x(k)][x(k)

x(k)]
]
Remarks
EKF-produ ed approximations for
SPF approximations are generally better
But both EKF and SPF are onsidering only the rst two moments of the posterior pdf
k
x(k)
and
P (k)
are poor
p[x(k)|z ]
Q: What if the posterior pdf is multi-modal?

Q: What would the estimation look like without approximation?
A: (Assume no ontrol input for simpli ity)
x(k + 1) = f [k, x(k), v(k)]
z(k + 1) = h[k + 1, x(k + 1)] + w(k + 1)
22.1.1 Propagation
Prior at
k+1
(Also known as the Chapman-Komorgorov Equation):
p[x(k + 1)|z k ] =
p[x(k + 1)|x(k)]p[x(k)|z k ]dx(k)
22.1.2 Update
p[x(k + 1)|z k+1 ] =
p[z(k + 1)|x(k + 1)]p[x(k + 1)|z k ]

p[z(k + 1)|z k ]
This re ursion allows us a ess to the entire posterior pdf.
We an then hoose to optimize our
estimate against any riterion we wish.
Q: Why should we settle for anything less than this optimal estimate?
A: Think about the one-dimensional problem: We an approa h optimality
by numeri al inte-
gration. But grid size must be small and the grid must apture the tails of the distribution.
Now think about the multi-dimensional problem: If we need 100 ells per dimension and we have
nx dimensions, the total number of ells is 100nx . Another problem: The grid-based method is not
111
22.2 Parti le Filter Algorithm
readily parallelizable.
The Parti le Filter suers from the same drawba ks as the grid-based method (massive memory and omputation is required even for modest
nx .
Not readily parallelizable), but we'll study it
anyway.
Key Idea: Estimate the posterior using weighted "parti les":
p[x(k)|z k ]
Ns
X
i=1
wi [x(k) i (k)]
X
wi = 1
are alled parti les or support points
To hoose the support points
and the weights
wi
we approximate the pro ess of drawing a sample
from a distribution.
Problem: For an arbitrary distribution, it is omputationally expensive to generate random samples.
22.1.3 Importan e Sampling

Suppose we an't draw from
p(x) e onomi ally, but we an from another distribution q(x), alled
the importan e density.

1.
q(x)
is non-zero everywhere
2.
q(x)
and
p(x)
an be approximated as:
Then
wi 's
p(x)
p(x)
is non-zero
have similar mean and ovarian e
are given by
p(x)
PNs
i=1
wi [x i ]
wi = c
Ns
where we draw {i }i=1 from
p(i )
q(i )
where is a normalizing onstant.

The approximation be omes exa t as
Open Questions:
Q:
Q:
Ns
How fast does the approximation approa h the truth?

What should
Ns
be for a given problem?

Basi Idea: Like the SPF ex ept that
1. We use more samples
2. We draw them randomly
3. We don't attempt to shoehorn transformed samples into a Gaussian distribution
112
q(x)
and
Generi Steps (Later we'll fo us spe i ally on the Bootstrap part):
1. Choose an importan e density

2. Draw samples
q[x(k)|z k ]
i (k), i = 1, 2, ..., Ns
3. Compute importan e weights:

4. If ne essary, approximate
from
wi (k) =
p[x(k)|z k ]
q[x(k)|z k ]
p[i (k)|z k ] P p[i (k)|z k ] 1

[ i q[i (k)|zk ] ]
q[i (k)|z k ]
PNs
i=1
And ompute basi estimation quantities:
wi (k)[x(k) i (k)]
x(k)
= E[x(k)|z k ]
wi (k)i (k)
T k
P (k) = E[[x(k) x(k)][x(k)

x(k)]
|z ]
X
T
wi (k)[i (k) x(k)][
i (k) x(k)]
i
This step is not ne essary unless you a tually have to provide a single estimate.
22.2.1 How to hoose q[x(k)|z k ]?

Choose q to fa tor in a onvenient way:
q[i (k)|i (0), , i (k 1)] q[i (0), , i (k 1)|z k ]

Consider the true density
p[i(0), , i (k)|z k ]
= p[i (0), , i (k)|z k1 , z(k)]
p[z(k)|i (0), , i (k), z k1 ]p[i (0), , i (k)|z k1 ]

p[z(k)|z k1 ]
p[z(k)|i (0), , i (k), z k1 ] = p[z(k)|i (k)] (zeroeth-order

k1
]
i (0), ,i (k)|z
p[i (0), ..., i (k)|z k ] = p[z(k)|i (k)]p[
p[z(k)|z k1 ]
Assume:
Then
Markov assumption)
Use Bayes' rule on the se ond fa tor in the numerator to get:
p[i (0), ..., i (k)|z k1 ] = p[i (k)|i (0), , i (k 1), z k1 ]p[i (0), ..., i (k 1)|z k1 ]
Assume:
p[i (k)|i (0), , i (k 1), z k1 ] = p[i (k)|i (k 1)]

p[i (0), ..., i (k)|z k ] =
(rst-order Markov assumption)
p[z(k)|i (k)]p[i (k)|i (k 1)]p[i (0), , i (k 1)|z k1 ]

p[z(k)|z k1 ]
Now onsider the weights:
wi (k) =
p[i (0), ..., i (k)|z k ]

q[i (0), ..., i (k)|z k ]
113
22.3 Bootstrap Algorithm
Then, making use of the above, we an write:
wi (k) = c
p[z(k)|i (k)]p[i (k)|i (k 1)] p[i (0), , i (k 1)|z k1 ]
q[i (k)|i (0), , i (k 1)]

q[i (0), , i (k 1)|z k ]
where the se ond fra tion is the same as

How to pi k
q[i (k)|i (0), , i (k 1)]?
Bootstrap method:
Then the
wi (k 1)
wi (k)
q = p[i (k)|i (k 1)]
be ome:
wi = c p[z(k)|i (k)] wi (k 1)
(This is similar to the Multiple-Model updates for the
i 's)

1. Draw initial parti les
i (0), i [1, Ns ]
from the known initial probability density
p[x(0)]
and
initialize weights one ea h parti le equally:
i (0) p[x(0)], i [1, Ns ]

1
, i [1, Ns ]
wi (0) =
Ns
2. Draw one sample of the pro ess noise for ea h parti le.
Note this step is performed only
q[x(k)] =
p[x(k)|x(k1)], whi h is generally made be ause pro ess noise is often assumed to be Gaussian.
be ause the bootstrap lter makes the parti ular hoi e of importan e density
vi (k 1) p[
v (k 1)], i [1, Ns ]
3. Propagate ea h parti le forward a ording to the dynami s fun tion. This is analogous to the
predi tion step in teh EKF or SPF, as it predi ts the parti le set forward in time without any
measurement updates. Noti e the parti le weights don't hange during this step.
i (k) = f [i (k 1), vi (k 1), u(k 1)]

4. Repeat steps 2 and 3 until the next measurement update.
Note that times between mea-
surements an be subdivided into multiple predi tions if ne essary for a ureate modeling or
omputational savings.
5. At the time of the measurement, re al ulate the weights on the parti les a ording to the
bootstrap weight update equation. Note that the use of the primed wi (k) to indi ate that it
is not yet normalized.
wi (k) = p[z(k)|i (k)] wi (k 1)

114
Very small likelihoods
p[z(k)|i (k)] may ause the numeri al underow problems in the parwi (k) might get set to zero be ause it is too small to represent in
ti le lter, i.e. a weight
double pre ision. To be safe, the parti les may be updated a ording to log-likelihoods:
log[wi (k)] = log[p[z(k)|i (k)]] + log[wi (k 1)]

wi (k) = exp[log(wi (k)) maxi [log(wi (k))]]
wi (k) indi ates that the doubly-primed weights are also not normalized and are dif
ferent from wi (k). In performing this log likelihood update, the value of the largest weight
maxi [log(wi (k))] has been subtra ted from ea h weight before taking teh exponent. This
Where
s ales all weights prior to the exponentiation for added numeri al robustness. In the parti ular (and typi al) ase of zero-mean additive Gaussian white measurement noise, the weight
update is parti ularly simple. That is, if
z(k) = h[x(k)] + w(k)
with
w(k) N (0, R),
then:
1
log[wi (k)] = [z(k) h(i (k))]T R1 (k)[z(k) h(i (k))] + log[wi (k 1)]
2
wi (k) = exp(log[wi (k)] maxi (log[wi (k)]))
Noti e in taking the log of the Gaussian likelihood, we drop the normalization onstant. That
onstant is the same for all weights, so it gets an elled when the weights are re-normalized
later on.
6. Re-normalize the weights so they sum to unity.
This is ne essary in order to preserve the
fa t that the set of parti les a tually represent a dis rete approximation to the posterior
probability density of
x(k).
w (k)
wi (k) = PNsi
i=1 wi (k)
7. Evaluate the ee tive number of parti les
s:
N
1
s = P
N
Ns
2
i=1 (wi (k))
8. Resample the parti les if the ee tive number

if
s < Ns /2, but

N
s
N
is too low. A de ent heuristi is to resample
resampling more or less than that may also be justied. Here is a ommon
resampling algorithm to be used when the parti le lter needs to be resampled:

(a) Choose a random number
(b) Find
su h that
Pm1
j=1
on
[0, 1]
uniformly
wj (k) <
Pm
j=1
= m (k) and winew (k) = N1s

new
(d) Repeat steps a through until i
(k), i
new
( ) Set i
(k)
wj (k)
= [1, Ns ]
are hosen
(e) Delete the old set of parti les and use the new set and new weights
Note that some old parti les might appear more than on e in the new set, whereas others
might disappear altogether.
115
9. Compute basi estimation statisti s when desired (but don't throw out the parti le set!)
x(k)
P (k)
Ns
X
i=1
wi (k)i (k)
i=1
wi (k)[i (k) x(k)][

i (k) x(k)]
10. Return to step 2 and ontinue
22.3.1 Note:
Q: What is p[z(k)|i (k)]?
A: Suppose z(k) = h[k, x(k)] + w(k), w(k) N (0, R)
Then
Ns
X
p[z(k)|i (k)] = N (z(k); h[i (k)], R)
116

ASE396 Methods of Estimation/Detection Scribe Notes

Uploaded by

ASE396 Methods of Estimation/Detection Scribe Notes

Uploaded by

ASE 396: Model-Based Dete

tion and Estimation

Appli ations of Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Fantasti ally simple: Least squares problem . . . . . . . . . . . . . . . . . . .

Fantasti ally omplex: Estimation of GPS . . . . . . . . . . . . . . . . . . . .

General Solution Pro edure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

We Want to Know Our Unknowns

Example: Missile Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2 Linear Algebra Review

Ve tors, matri es and operations

Determinant and inverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Singularity and non-singularity . . . . . . . . . . . . . . . . . . . . . . . . . .

Inversion of partitioned matri es

Matrix inversion lemma

Ve tor transformation and matrix fa torization . . . . . . . . . . . . . . . . . . . . .

The Householder Transformation . . . . . . . . . . . . . . . . . . . . . . . . .

Eigenvalues and ve tors

3 Review of Probability Theroy

Probability: what is it?

Axiomati Approa h to Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . .

4 Review of Probability and Statisti s

Independent Events and Independent Random Variables . . . . . . . . . . . . . . . .

Further impli ations of independen e . . . . . . . . . . . . . . . . . . . . . . .

Ve tor-Valued Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Total Probability Theorem

Conditional Expe tation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Gaussian Random Ve tors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Joint and Conditional Gaussian Random Variables . . . . . . . . . . . . . . . . . . .

Expe ted Value of a Quadrati Form . . . . . . . . . . . . . . . . . . . . . . . . . . .

5 Dete tion/Hypothesis Testing Basi s

6 Optimum Dete tion Stru tures

Dete tor Performan e

Composite Hypothesis Testing

The Problem of Parameter Estimation . . . . . . . . . . . . . . . . . . . . . . . . . .

Maximum Likelihood Estimators

Maximum A Posteriori Estimators

Example 1: ML Estimator for a Linear First-Order Dynami al System . . . . . . . .

Example 2: MAP Estimator for a Linear First-Order Dynami al System . . . . . . .

Example 3: LS Estimator for a Linear First-Order Dynami al System

Minimum Mean-Squared Error Estimators . . . . . . . . . . . . . . . . . . . . . . . .

8 Linear estimation for stati systems

MAP estimator for Gaussian problems . . . . . . . . . . . . . . . . . . . . . . . . . .

Analysis of MAP Estimate

Bat h Least Squares Estimator

Properties of the Least Squares Estimator . . . . . . . . . . . . . . . . . . . .

Re ursive Least Squares Estimator . . . . . . . . . . . . . . . . . . . . . . . .

Analysis of Re ursive LS Algorithm

Example: Maximum Likelihood Estimate

Re ursive Approa h Using Square-Root LS Method . . . . . . . . . . . . . . . . . . .

Review of square-root LS method . . . . . . . . . . . . . . . . . . . . . . . . .

9 Nonlinear Least Square Estimation

Basi s of nonlinear least square estimation . . . . . . . . . . . . . . . . . . . . . . . .

Gauss-Newton Algorithm (Gill et al. 4.7.2)(with step length algorithm)

Levenberg-Marquart Method (LM) . . . . . . . . . . . . . . . . . . . . . . . . . . . .

10 Sto hasti Linear System Models

10.1 Continuous-time model for dynami systems . . . . . . . . . . . . . . . . . . . . . . .

10.2 White noise for sto hasti systems

10.3 Predi tion of mean and ovarian e

10.4 Dis rete-time models of sto hasti systems . . . . . . . . . . . . . . . . . . . . . . . .

10.5 Dis rete-time measurement model

10.6 Full dis rete-time model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

11 Kalman lter for dis rete-time linear system

12 Alternative formulas for ovarian e and gain of Kalman lter

12.0.1 Interpretation of the Kalman gain

12.0.2 Generalization for Weighted Pro ess Noise . . . . . . . . . . . . . . . . . . . .

12.0.3 Deriving the Kalman Filter from a MAP approa h . . . . . . . . . . . . . . .

12.0.4 Setting up the ost fun tion . . . . . . . . . . . . . . . . . . . . . . . . . . . .

12.0.5 Minimizing the ost fun tion

12.0.6 Solving for the minimum- ost estimate . . . . . . . . . . . . . . . . . . . . . .

13 Stability and Consisten y of Kalman Filter

13.2 Control of a System Estimated by KF

13.3 Matrix Ri ati Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

13.4 Steady-State KF Equations

13.5 Steady-State Error Dynami s . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

13.6 Properties of KF Innovations

13.7 Likelihood of a Filter Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

11 Kalman lter for dis rete-time linear system

12 Alternative formulas for ovarian e and gain of Kalman lter

14.1 Consisten y Denition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

16.1.1 Benets of the Information Filter . . . . . . . . . . . . . . . . . . . . . . . . .

16.3 Benets of the square root information lter

18 Nonlinear Dieren e Equations from ZOH Nonlinear Dierential Equations

18.3 Partial Derivatives of Dieren e Equations . . . . . . . . . . . . . . . . . . . . . . . .

ionosphere and neutral atmosphere have ee ts