ASE396 Methods of Estimation/Detection Scribe Notes
Class Notes
Spring 2012

Contents (recoverable entries)

1  Overview of Estimation and Detection
2.2  Norms
2.2.1  Properties of norms
2.2.2  Schwartz inequality
2.3.1  Determinant
2.3.3  Matrix inversion
2.4.2  QR Factorization
2.4.3  Cholesky Factorization
4.1  Conditional Probability
4.5  Bayes's Theorem
4.7.1  Properties (Bayesian jargon)
4.9  Expected Value of a Quadratic Form
5.1  Hypothesis Testing
5.2  Neyman-Pearson Lemma
5.3  Example
5.4  Remarks
7  Estimation Basics
7.6  Least-Squares Estimators
7.9  Summary
8  Linear Estimation for Static Systems
8.3  Square-Root-Based LS Solutions
8.5.2  Recursive Square-Root LS
9.2  Newton-Raphson method
10  Stochastic Linear System Models
13.1  Stability of KF
14.2.2  Real-Time (Multiple-Runs) Tests
16  Information Filter/SRIF
16.1  Information Filter
17  Smoothing
17.1  Estimate x(k) based on Z^j with j > k
17.2  Steps
Estimation is the process of inferring the value of a quantity from indirect and inaccurate observations. We will frequently seek optimal solutions over heuristic solutions. The so-called optimal solutions seek to provide the best estimates with some quantification of their accuracy.

Detection is the process of making decisions based on our estimates. It involves hypothesis testing against a finite set of possibilities and selecting the one that best represents our estimates. Detection can be thought of as a subset of estimation.
[Block diagram: control inputs and disturbances/modeling errors drive the Dynamical System, which produces the system state; the Measurement System, corrupted by measurement errors, produces measurements; the State Estimator combines the measurements with prior information to produce the state estimate and its uncertainties.]
    z = H x + w                                                        (1)

where x is the state, H maps the state to the measurements, w is the measurement error, and z is a measurement vector.
Example (GPS, http://igscb.jpl.nasa.gov): when all of the quantities of interest are compiled together, one has a staggeringly complex estimator: non-linear state dynamics and measurement equations (square root sum of squares), clock variations, and more. Goal: estimate all site locations, all SV orbital parameters, atmospheric delays, and other slowly varying parameters. In addition, we can consider:
Example (missile impact-point prediction). Let x1 be the horizontal position and x2 the altitude of the vehicle, with velocities v1 = dx1/dt and v2 = dx2/dt. The dynamics are

    d²x1/dt² = 0                                                       (2)
    d²x2/dt² = -g                                                      (3)-(4)

Measurements at times tk:

    y1k = x1(tk) + n1k,    y2k = x2(tk) + n2k                          (5)

Assume:

    E[n1k] = E[n2k] = 0                                                (6)
    E[n1k n1j] = sigma1^2 delta_kj                                     (7)
    E[n2k n2j] = sigma2^2 delta_kj                                     (8)

Unknown quantities: x1(0), x2(0), v1(0), v2(0).

Solution of the dynamics (the model):

    x1(t) = x1(0) + v1(0) t                                            (9)-(10)
    x2(t) = x2(0) + v2(0) t - g t^2/2                                  (11)-(12)

Stacking the measurements from times t1, t2, ... gives a linear measurement equation:

    [ y11 ]   [ 1  0  t1  0  ] [ x1(0) ]   [     0       ]   [ n11 ]
    [ y21 ] = [ 0  1  0   t1 ] [ x2(0) ] + [ -g t1^2/2   ] + [ n21 ]
    [  :  ]   [      :       ] [ v1(0) ]   [     :       ]   [  :  ]
                               [ v2(0) ]

    i.e.   z = H x + (known gravity term) + n,    x = [x1(0), x2(0), v1(0), v2(0)]^T

The impact time satisfies x2(t_impact) = 0:

    0 = x2(0) + v2(0) t_impact - g t_impact^2/2                        (13)

    t_impact = [ v2(0) + sqrt( v2(0)^2 + 2 g x2(0) ) ] / g             (14)-(15)

Plug t_impact into the previous equation for the site estimate x1(t_impact). We can also estimate x_impact. Recursion: incorporate new measurements as they arise.
Vector:

    v = [ v1 ; v2 ; ... ; vn ],    v in R^n

where vj is the jth (scalar) element.

Matrix: A in R^{n x m} is an n-by-m matrix,

    A = [ a11  a12  ...  a1m ;
          a21   .          : ;
           :         .     : ;
          an1  ...       anm ] = [aij]                                 (16)

where aij is the (i,j)th element (row i, column j).

Inner product: <a, b> = a^T b is a scalar, for a, b in R^{n x 1}.      (17)

Outer product: if b in R^{n x 1} and c in R^{m x 1}, then A = b c^T in R^{n x m}, with aij = bi cj.    (18)

Transpose of a product: if A = BC, then A^T = C^T B^T.

Trace: tr(A) = sum_{i=1}^{n} aii. If A in R^{n x m} and B in R^{m x n}, then tr(AB)_{n x n} = tr(BA)_{m x m}.    (19)

Symmetric matrix: A = A^T.

Rank: rank(A) is the number of linearly independent columns (or rows) of A.
Quadratic form: if P = P^T, then we can define a quadratic form in x in R^{n x 1}:

    epsilon = x^T P x = sum_{j=1}^{n} sum_{i=1}^{n} xi Pij xj          (20)-(21)

P > 0 denotes a positive definite P (definition coming soon).
2.2 Norms

    l1:  ||x||_1 = sum_{i=1}^{n} |xi|        (the "Manhattan norm")

    l2:  ||x||_2 = ( sum_{i=1}^{n} xi^2 )^{1/2} = sqrt(x^T x)

More generally, the p-norm can be defined for p >= 1:

    ||x||_p = ( sum_{i=1}^{n} |xi|^p )^{1/p}                           (22)

Also, for an entertaining (and informative) read on taxicab geometry (which uses the l1 norm), see the reference mentioned in lecture.
2.2.1 Properties of norms

1.  ||x|| >= 0, and ||x|| = 0 iff x = 0
2.  ||alpha x|| = |alpha| ||x||
3.  ||a + b|| <= ||a|| + ||b||   (triangle inequality)

If P > 0, then ||x||_P^2 = x^T P x defines the P-weighted norm.        (23)

Note that ||x||_2 = ||x||.
2.2.2 Schwartz inequality

    |x^T y| <= ||x|| ||y||                                             (24)

which follows since |cos(theta)| <= 1 for the angle theta between x and y.
Cij
is
a =a
with the
for a
ith
11
n
X
i=1
row and
(25)
j th
olumn removed.
a b
c d = ad bc
(26)
Q: Where does
the
determinant
ome from?
d b
a b
1
1
, then A
= |A|
.
A: If A =
c
about a matrix.
11
is negative).
is singular if
|A| = 0.
is rank-de ient if
If
|A| 6= 0,
then
If
|A| = 0,
then
x 6= 0
But if
|A| 6= 0,
rank(A) < n.
then
su h that
Ax = 0.
Ax = 0 x = 0.
2.3.3 Matrix inversion

If A in R^{n x n} and |A| != 0, then A^{-1} exists such that A^{-1} A = A A^{-1} = I. In terms of cofactors,

    A^{-1} = (1/|A|) [  |C11|   -|C21|   ...  (-1)^{n+1}|Cn1| ;
                       -|C12|    |C22|   ...        :         ;
                          :                         :         ;
                      (-1)^{1+n}|C1n|    ...      |Cnn|       ]        (27)

(the transposed matrix of signed minors). If A = BC, then A^{-1} = C^{-1} B^{-1}, provided that A, B, and C are non-singular.
2.3.4 Block matrix inversion

Partition

    P = [ P11  P12 ;
          P21  P22 ]                                                   (28)

with P11 and P22 square. If P11 and Delta = P22 - P21 P11^{-1} P12 are invertible, then we can solve

    P^{-1} = [ V11  V12 ;
               V21  V22 ]

where

    V11 = P11^{-1} + P11^{-1} P12 Delta^{-1} P21 P11^{-1}
    V12 = -P11^{-1} P12 Delta^{-1}
    V21 = -Delta^{-1} P21 P11^{-1}
    V22 = Delta^{-1}

Note that dim(Pij) = dim(Vij).
2.3.5 Matrix inversion lemma

    (A + B C D)^{-1} = A^{-1} - A^{-1} B ( D A^{-1} B + C^{-1} )^{-1} D A^{-1}        (29)

1. If D = B^T, then

    (A + B C B^T)^{-1} = A^{-1} - A^{-1} B ( B^T A^{-1} B + C^{-1} )^{-1} B^T A^{-1}

2. If A = P^{-1}, B = H^T, D = H, C = R^{-1}, then

    ( P^{-1} + H^T R^{-1} H )^{-1} = P - P H^T ( H P H^T + R )^{-1} H P
Orthonormal matrices: Q is orthonormal if Q^T Q = Q Q^T = I; equivalently, all columns of Q satisfy qi^T qj = delta_ij.    (30)

Thus ||Qx|| = ||x||: Q is isometric, since

    ||Qx||^2 = x^T Q^T Q x = x^T x = ||x||^2                           (31)

2.4.1 Householder transformation

We can construct an orthonormal H that maps a given x to a multiple of the first unit vector,

    H x = [ -/+ ||x|| ; 0 ; ... ; 0 ]^T,   so that  ||Hx|| = ||x||     (32)-(33)

by taking

    H = I - 2 v v^T / (v^T v)                                          (34)

with

    v = x + sign(x1) ||x|| [ 1 ; 0 ; ... ; 0 ]                         (35)
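A minimal numerical check of the Householder construction (a sketch; the vector x and the dimension are illustrative):

    x = [3; 1; -2];
    v = x + sign(x(1))*norm(x)*[1; 0; 0];    % eq. (35)
    H = eye(3) - 2*(v*v')/(v'*v);            % eq. (34)
    H*x       % approximately [-norm(x); 0; 0]
    H'*H      % approximately the identity, so ||Hx|| = ||x||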
2.4.2 QR Factorization

Given any A in R^{m x n}, there exist Q and R such that

    Q R = A                                                            (36)

where Q^T Q = Q Q^T = I, Q in R^{m x m}, and R in R^{m x n}. R is an upper triangular matrix, not necessarily square (all elements of R below the main diagonal are zero).

If A is square and invertible, then so is R. What's more, R^{-1} is easy to compute.

If A is m x n with m > n, then R = [ Rtilde ; 0 ], where Rtilde is n x n and upper triangular.

Q is built from Householder transforms:

    Q = H1 H2 ... Hp,   with p = min(m, n)                             (37)

Use MATLAB:  [Q, R] = qr(A);
2.4.3 Cholesky Factorization

Given a matrix P = P^T with P > 0, there exists an upper triangular R such that R^T R = P.

Use MATLAB:  R = chol(P);

Definiteness and eigenvalues:

If A > 0: all lambda_i > 0 (positive definite).
If A >= 0: all lambda_i >= 0 (positive semi-definite).

Also, |A| = prod_{i=1}^{n} lambda_i  and  tr(A) = sum_{i=1}^{n} lambda_i.
(Figure 4: probability densities p1(x) and p2(v) about nominal values x0 and v0, with widths set by the uncertainties in x and v.)        (38)

"Our most precise description of nature must be in terms of probabilities." - R. Feynman

Example (double-well potential). Consider

    V(x) = x^4/4 - x^2/2,    dV/dx = x^3 - x                           (39)

Add a bit of damping (writing beta for the damping coefficient):

    d²x/dt² = -beta dx/dt - x^3 + x,   i.e.  dx/dt = v,  dv/dt = -beta v - x^3 + x     (40)-(41)

(Figure 5: the potential V(x) and the phase portrait in (x, dx/dt); trajectories settle into one of the two wells.)

With total energy E(x) = (1/2) v^2 + V(x), the density of the basin of attraction for some neighbourhood Delta-E about E approaches 50%. This implies a requirement of increasing energy precision Delta-E to specify the sink that we want.
Relative-frequency interpretation of probability:

    P(A) = lim_{N -> infinity} N_A / N                                 (42)

where N is the number of experiments and N_A the number in which A occurred.

Axioms: (1) P(A) >= 0; (2) P(S) = 1, where S is the sure event.        (43)

Consequence: P(Abar) = 1 - P(A), where Abar = "not A", since S = A union Abar and A intersect Abar = empty, so

    P(A union Abar) = P(A) + P(Abar) = 1                               (44)-(46)

Assign probabilities by whatever means (e.g. relative frequency, symmetry). But they must satisfy the axioms.

Random variables: for a continuous random variable x, using events {x : xi - d xi < x <= xi}, the probability density function (pdf) is

    p_x(xi) := lim_{d xi -> 0} P(xi - d xi < x <= xi) / d xi           (47)-(48)

(notation: p(x) or p_x(x)). Then

    P(alpha < x <= beta) = integral_alpha^beta p(x) dx                 (49)
Note: P(-infinity < x <= infinity) = integral of p(x) dx over the real line = 1 is the "sure event"; this constraint also implies a normalization requirement on the pdf.

Gaussian example:

    p(xi) = (1 / (sigma sqrt(2 pi))) exp( -(xi - mu)^2 / (2 sigma^2) )     (50)

and its integral over the real line equals 1 (the "deathbed identity").

Cumulative distribution: P_x(xi) = integral_{-infinity}^{xi} p(x) dx.

Discrete random variables: if x takes values xi_i with probabilities lambda_i = P(x = xi_i), i = 1, 2, ..., n, then

    p(x) = sum_{i=1}^{n} lambda_i delta(x - xi_i),    with  sum_{i=1}^{n} lambda_i = 1     (51)-(52)
Expectation:

    E[x] = integral x p(x) dx = xbar        (mean of x)                (53)

In general,

    E[g(x)] = integral g(x) p(x) dx                                    (54)

Variance of x:

    sigma_x^2 = E[(x - xbar)^2] = integral (x - xbar)^2 p(x) dx        (55)

We write x ~ (a, b) if E[x] = a and sigma_x^2 = b.                     (56)-(57)

Also, if x is Gaussian with xbar = 0,

    E[x] = integral x (1/(sigma sqrt(2 pi))) e^{-x^2/(2 sigma_x^2)} dx = 0       (58)-(59)

and similarly sigma_x^2 = sigma^2.
Joint densities. For two random variables x and y,

    p(x, y) := lim_{d xi -> 0, d eta -> 0} P{(xi - d xi < x <= xi) and (eta - d eta < y <= eta)} / (d xi d eta)     (60)-(61)

Marginal density:

    p(x) = integral p(x, y) dy                                         (62)

Cumulative distribution: P_{x,y}(xi, eta).

Mean:  xbar = double integral x p(x, y) dx dy = integral x p(x) dx, and similarly for ybar.     (63)-(64)

Covariance:

    cov(x, y) = E[(x - xbar)(y - ybar)] = double integral (x - xbar)(y - ybar) p(x, y) dx dy    (65)-(66)

Correlation coefficient:

    rho_xy = cov(x, y) / (sigma_x sigma_y),  i.e.  cov(x, y) = sigma_x sigma_y rho_xy           (67)-(68)

One can show |rho_xy| <= 1. If rho_xy = 0, x and y are uncorrelated; rho_xy = +/-1 means x and y are linearly dependent.
4.1 Conditional Probability

Two events A and B are related in the sense that knowledge of the occurrence of one alters the probability of the other, as illustrated in Figure 8 (Venn diagram of A, B, and A intersect B).

The probability of A given that B occurred (given B) is

    Pr(A | B) = Pr(A and B) / Pr(B)                                    (69)

For densities,

    p(x | y) = p(x, y) / p(y)                                          (70)

Independence: if knowledge of B does not alter the probability of A, then A and B are independent:

    Pr(A) = Pr(A | B) = Pr(A and B)/Pr(B) = Pr(A, B)/Pr(B)
    so   Pr(A, B) = Pr(A) Pr(B)                                        (71)-(72)
(Figure 9: Venn diagram with disjoint events A and B inside the sample space S. Figure 10: overlapping events A and B.)

Q: Does the Venn diagram in Figure 9 express independence? If not, how do we express independence on a Venn diagram?

A: No! The Venn diagram in Figure 9 does not express independence between A and B. It can be explained as follows: in the Venn diagram we have Pr(A) != 0 and Pr(B) != 0, but there is no overlap between events A and B, so Pr(A and B) = 0 != Pr(A) Pr(B). Since independent events with nonzero probability must satisfy Pr(A and B) = Pr(A) Pr(B) != 0, they must overlap. Hence disjointness is not independence.
If x and y are independent, then p(x, y) can be factored into the marginals p(x) and p(y):

    p(x, y) = p(x) p(y)

Also, if x and y are independent, then

    Cov(x, y) = double integral (x - xbar)(y - ybar) p(x) p(y) dx dy
              = [ integral (x - xbar) p(x) dx ] [ integral (y - ybar) p(y) dy ] = 0     (73)

so rho_xy = 0: independence implies uncorrelatedness (the converse does not hold in general).

Random vectors. Let x denote an n-valued random vector, i.e.,

    x = [ x1 ; x2 ; ... ; xn ],    xi = [ xi_1 ; xi_2 ; ... ; xi_n ]   (74)-(75)

The joint pdf describing the statistics of the random vector x is defined as follows:

    p_x(xi) = p_x(xi_1, xi_2, ..., xi_n)
            := lim_{d xi_1 -> 0, ..., d xi_n -> 0}
               Pr{ (xi_1 - d xi_1 < x1 <= xi_1) and (xi_2 - d xi_2 < x2 <= xi_2) and ... and (xi_n - d xi_n < xn <= xi_n) }
               / (d xi_1 d xi_2 ... d xi_n)                            (76)

The mean is denoted by E[x] (or equivalently xbar) in R^n and is defined as

    E[x] = integral ... integral xi p_x(xi) d xi_1 ... d xi_n          (77)

The covariance is denoted by P_xx and is given by

    Cov(x) = E[(x - xbar)(x - xbar)^T] = integral (x - xbar)(x - xbar)^T p(x) dx        (78)

with elements

    P_xx(i, j) = integral ... integral (xi - xbar_i)(xj - xbar_j) p(x1, x2, ..., xn) dx1 dx2 ... dxn

P_xx is symmetric, i.e., P_xx = P_xx^T. Further, P_xx >= 0, i.e. a^T P_xx a >= 0 for every vector a; P_xx > 0 (positive definite) when the inequality is strict for a != 0.
Total probability: if the events {Bi} partition the sample space (Bi intersect Bj = empty for i != j), then

    Pr(A) = sum_{i=1}^{n} Pr(A, Bi) = sum_{i=1}^{n} Pr(A | Bi) Pr(Bi)  (79)

and for densities

    p(x) = integral p(x, y) dy = integral p(x | y) p(y) dy             (80)

4.5 Bayes's Theorem

Consider Pr(Bi | A):

    Pr(Bi | A) = Pr(Bi and A)/Pr(A) = Pr(A | Bi) Pr(Bi) / sum_{j=1}^{n} Pr(A | Bj) Pr(Bj)      (81)

Thus equation (81) allows us to reverse the conditioning in conditional probabilities. For densities,

    p(x | y) = p(y | x) p(x) / p(y)                                    (82)

Because of its utility in solving estimation problems, equation (82) is considered the workhorse for estimation.

Note that the conditional density is a valid probability density function and hence sums to 1:

    integral p(x | y) dx = integral p(y | x) p(x) / p(y) dx = integral p(x, y)/p(y) dx = p(y)/p(y) = 1     (83)
Bayesian jargon: in (82), p(x) is an a priori (prior) pdf and p(x | y) is an a posteriori (posterior) pdf.

Diffuse prior: if we assume p(x) is diffuse over R, e.g.

    p(x) = { epsilon,  |x| <= 1/(2 epsilon);   0,  otherwise }         (84)

with epsilon small, it can be shown that p(x | y) = p(y | x) p(x)/p(y) is essentially proportional to p(y | x) for a wide interval on either side of 0 (the prior carries essentially no information).

Conditional expectation:

    E[x | y] = integral x p(x | y) dx                                  (85)

which in general is a function of y. Marginalizing over y,

    E[ E[x | y] ] = integral E[x | y] p(y) dy = E[x] = xbar
Gaussian random vectors: for x in R^n with mean mu and an n x n symmetric matrix P > 0, the Gaussian pdf is

    p(x) = 1/((2 pi)^{n/2} |P|^{1/2}) exp( -(1/2) (x - mu)^T P^{-1} (x - mu) )      (86)

4.7.1 Properties

1.  E[x] = mu
2.  E[(x - mu)(x - mu)^T] = P

We write the pdf of x as x ~ N(mu, P).
Conditioning for jointly Gaussian random vectors. Let x and z be jointly Gaussian and stack them as y:

    y = [ x ; z ],   dim(x, y, z) = (nx, ny, nz)                       (87)

    P_yy = Cov(y) = [ P_xx  P_xz ;
                      P_zx  P_zz ]

Then

    p(x | z) = p(x, z)/p(z)
             = [ (2 pi)^{-(nx+nz)/2} |P_yy|^{-1/2} exp( -(1/2)(y - ybar)^T P_yy^{-1} (y - ybar) ) ]
               / [ (2 pi)^{-nz/2} |P_zz|^{-1/2} exp( -(1/2)(z - zbar)^T P_zz^{-1} (z - zbar) ) ]      (88)

After much algebra (including the block inverse formula, etc.), we can show that the pdf of the conditional random variable (x | z) is

    p(x | z) = (2 pi)^{-nx/2} |P_xx - P_xz P_zz^{-1} P_xz^T|^{-1/2}
               exp( -(1/2) [x - xbar - P_xz P_zz^{-1}(z - zbar)]^T (P_xx - P_xz P_zz^{-1} P_xz^T)^{-1}
                           [x - xbar - P_xz P_zz^{-1}(z - zbar)] )      (89)

Mean:

    E[x | z] = xbar + P_xz P_zz^{-1} (z - zbar)                        (90)

This applies a correction term to xbar based on z.

Covariance:

    E[ (x - E[x | z])(x - E[x | z])^T | z ] = P_xx - P_xz P_zz^{-1} P_xz^T      (91)

Note that P_xz P_zz^{-1} P_xz^T >= 0. So the conditional covariance satisfies P_xx - P_xz P_zz^{-1} P_xz^T <= P_xx: we cannot do worse than the prior estimate by incorporating information about z.
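A small numerical sketch of (90)-(91) for scalar blocks (all numbers are illustrative):

    Pxx = 4; Pxz = 1.5; Pzz = 2;         % scalar blocks for simplicity
    xbar = 0; zbar = 1;  z = 1.8;        % an observed value of z
    x_cond = xbar + Pxz/Pzz*(z - zbar);  % eq. (90)
    P_cond = Pxx - Pxz/Pzz*Pxz;          % eq. (91)
    % P_cond (= 2.875) is smaller than Pxx (= 4): conditioning cannot hurt.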
4.9 Expected Value of a Quadratic Form

For a symmetric matrix A (= A^T) and a random vector x with E[x] = 0,

    E[x^T A x] = E[tr(x^T A x)]      (since x^T A x is a scalar, x^T A x = tr(x^T A x))
               = E[tr(A x x^T)]      (cyclic property of the trace)
               = tr(A E[x x^T])      (since tr(.) and E[.] are linear operators)
               = tr(A Cov(x))        (since E[x] = 0, Cov(x) = E[x x^T])
               = tr(A P_xx)                                            (92)
5.1 Hypothesis Testing

Assume that a parameter theta takes one of m values {theta_0, theta_1, ..., theta_{m-1}}. Let Hi be the hypothesis that theta = theta_i. This is an m-ary hypothesis test:

    H0: theta = theta_0
    H1: theta = theta_1
      ...
    H_{m-1}: theta = theta_{m-1}

Therefore, in the binary hypothesis case we have only

    H0: theta = theta_0
    H1: theta = theta_1

In this case, H0 is known as the null hypothesis and H1 as the alternative hypothesis. We focus on the binary case. Define

    PF = P[ accept H1 | H0 true ]            (false alarm)
    PD = P[ accept H1 | H1 true ]            (detection)
    PM = P[ accept H0 | H1 true ] = 1 - PD   (miss)

A decision between H0 and H1 is made given a measurement z and the likelihoods p(z | theta_0) and p(z | theta_1), where these densities are assumed known.

5.2 Neyman-Pearson Lemma

For a given allowable false-alarm probability PF = alpha > 0, maximize PD: choose H1 if the data favor it strongly enough, choose H0 otherwise.
For testing H0 against H1, and any desired PF with 0 <= PF <= 1, there exists a threshold lambda such that the most powerful test (the one maximizing PD) exists. The Neyman-Pearson Lemma goes on to give us the form of the test. It boils down to comparing the likelihood ratio to a threshold:

    Lambda(z) = p(z | theta_1) / p(z | theta_0) = p(z | H1) / p(z | H0)    >< lambda   (H1 if >, H0 if <)

Procedure:
1. Form the likelihood ratio Lambda(z) for H0 and H1.
2. Select the allowable PF, e.g. PF = 0.05 or 0.01.
3. Determine the threshold lambda from PF, based on the distribution of Lambda(z) under H0.
4. Decide H1 if Lambda(z) > lambda, otherwise decide H0.
5.3 Example

Assume that the received signal in a radar detection problem is

    z = theta + w,    w ~ N(0, sigma^2)

with hypotheses

    H0: theta = 0
    H1: theta = theta_1,   theta_1 > 0

Then:

    p(z | theta_0) = N(z; 0, sigma^2)
    p(z | theta_1) = N(z; theta_1, sigma^2)

Form the log-likelihood ratio:

    L(z) = log Lambda(z) = log[ p(z | theta_1)/p(z | theta_0) ]
         = -[ (z - theta_1)^2 - z^2 ] / (2 sigma^2)
         = ( 2 z theta_1 - theta_1^2 ) / (2 sigma^2)

At this point the NP test amounts to

    L(z)  >< log(lambda)      (H1 if >, H0 if <)

We further simplify by noting that the NP test can be boiled down to:

    z  >< lambda_0,    with  lambda_0 = ( 2 sigma^2 log(lambda) + theta_1^2 ) / (2 theta_1)

Now set lambda_0 to satisfy:

    PF = P[ z > lambda_0 | theta_0 ] = integral_{lambda_0}^{infinity} p(z | theta_0) dz

(Figure 11: binary hypothesis pdfs p(z | theta_0) and p(z | theta_1), centered at 0 and theta_1; the shaded tail of p(z | theta_0) above lambda_0 is PF, and the shaded tail of p(z | theta_1) below lambda_0 is PM.)

In MATLAB (one way to compute these):

    PF = 1 - normcdf(lambda_0, 0, sigma);
    PD = 1 - PM;     % PM = normcdf(lambda_0, theta_1, sigma)
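A short sketch of the full design loop for the example above (pick PF, get the threshold, evaluate PD); the numbers are illustrative:

    % Sketch: Neyman-Pearson design for the scalar radar example.
    sigma = 1; theta_1 = 2;                        % illustrative values
    PF = 0.05;                                     % allowable false-alarm probability
    lambda_0 = norminv(1 - PF, 0, sigma);          % threshold so P[z > lambda_0 | H0] = PF
    PD = 1 - normcdf(lambda_0, theta_1, sigma);    % detection probability under H1
    PM = 1 - PD;                                   % miss probability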
5.4 Remarks

1. The choice of threshold depends only on PF and p(Lambda(z) | theta_0).
2. For a given PF, the test maximizes PD, but PD may still be small unless PF is allowed to be large.
3. Note that the NP approach does not take into account any prior probabilities of H0 and H1.
4. Q: How should one choose PF? A: Remarks 2 and 3 suggest the NP lemma gives no guidance here; one alternative is the Bayesian approach, which minimizes an expected cost (AKA risk).

Bayes risk formulation. Let

    Pj  = prior probability of [Hj is true]
    Cjk = cost of deciding Hj when Hk is true

The expected cost (Bayes risk) is

    R = sum_j sum_k Cjk P[decide Hj | Hk is true] Pk

Or in a synonymous form: R = sum_j sum_k Cjk P[decide Hj, Hk is true].

Goal: minimize R. This formulation leads directly to a likelihood ratio test and it chooses the value of the threshold lambda for you:

    p(z | H1)/p(z | H0)  ><  lambda = (C10 - C00) P0 / [ (C01 - C11) P1 ]     (H1 if >, H0 if <)

Q: If the Bayesian approach automatically chooses the "right" value of lambda, then why take the NP approach, which seems to set lambda indirectly through PF?
A: Because we often do not have reliable costs Cjk and priors Pj, whereas PF is something we can specify directly.
Even so, PD might be small for the chosen PF. It is useful to plot PD for a given problem as a function of PF: the receiver operating characteristic (ROC) curve.

(Figure: ROC curve, PD versus PF, rising from (0, 0) to (1, 1) and lying above the diagonal.)

For the degenerate case p(z | theta_0) = p(z | theta_1), it is easy to show the ROC is the straight line going through (0, 0) and (1, 1).

In an m-ary test, theta is one of [theta_0, theta_1, ..., theta_{M-1}]. In a composite hypothesis test, one or both hypotheses involve an unknown or random nuisance parameter (e.g. a phase on [0, 2 pi]). By averaging over the nuisance parameters of no interest, the final detection test again boils down to a likelihood ratio test.
Example (detection of a sinusoid with unknown phase):

    H0:  z1 = w1,                 z2 = w2
    H1:  z1 = A cos(phi) + w1,    z2 = A sin(phi) + w2

where w1, w2 ~ N(0, sigma^2) are independent and the phase phi is unknown, with p_phi(phi) = 1/(2 pi) on [0, 2 pi]. The parameter sets are

    Theta_0 = { theta | theta_1 = 0 },    Theta_1 = { theta | theta_1 = A }

The hypothesis test boils down to

    Lambda(z) = p(z | Theta_1) / p(z | Theta_0)   >< lambda      (H1 if >, H0 if <)

We can show that

    p(z | theta) = 1/(2 pi sigma^2) exp{ -(z - mu)^T (z - mu) / (2 sigma^2) },
    mu = mu(theta_1, theta_2) = [ theta_1 cos(theta_2) ; theta_1 sin(theta_2) ]

    p(z | Theta_0) = 1/(2 pi sigma^2) exp{ -z^T z / (2 sigma^2) }

    p(z | Theta_1) = integral_0^{2 pi} p(z | theta_1 = A, theta_2 = phi) p_phi(phi) d phi
                   = 1/(4 pi^2 sigma^2) integral_0^{2 pi} exp{ -(z - mu(A, phi))^T (z - mu(A, phi)) / (2 sigma^2) } d phi

i.e. we obtain p(z | Theta_1) by averaging p(z | theta) over phi. Then

    Lambda(z) = exp(-A^2/(2 sigma^2)) (1/(2 pi)) integral_0^{2 pi} exp{ (A/sigma^2)(z1 cos(phi) + z2 sin(phi)) } d phi

Let r = sqrt(z1^2 + z2^2) and psi = arctan(z2/z1), so that z1 = r cos(psi) and z2 = r sin(psi). Then

    Lambda(z) = exp(-A^2/(2 sigma^2)) (1/(2 pi)) integral_0^{2 pi} exp{ (A r/sigma^2) cos(phi - psi) } d phi
              = exp(-A^2/(2 sigma^2)) I0( A r / sigma^2 )

Note: I0(A r/sigma^2) = (1/(2 pi)) integral_0^{2 pi} exp{ (A r/sigma^2) cos(phi - psi) } d phi, where I0 is the modified Bessel function of the first kind (order zero).

Since I0(x) is monotonically increasing in x, the test Lambda(z) >< lambda is equivalent to

    r  ><  lambda' = (sigma^2/A) I0^{-1}( lambda e^{A^2/(2 sigma^2)} )

Thus, the optimal test compares r = sqrt(z1^2 + z2^2) with a threshold. Under H0 the statistic r is Rayleigh distributed; under H1 it is Rician; these distributions can be used to set the threshold for a desired PF.
Q: Why do we so often find ourselves performing a correlation as part of a detection test?
A: Here we go.

General Gaussian problem:

    H0: z ~ N(mu_0, P0)
    H1: z ~ N(mu_1, P1)

(Note: mu_j and Pj are assumed known; this is the measurement model used in HW.)

The log-likelihood ratio is

    L(z) = (1/2)(z - mu_0)^T P0^{-1} (z - mu_0) - (1/2)(z - mu_1)^T P1^{-1} (z - mu_1)   >< log(lambda)

Example: suppose P0 = P1 = P. Then the test reduces to

    L'(z) = Delta-mu^T P^{-1} z  >< lambda',    Delta-mu = mu_1 - mu_0

Further suppose mu_0 = 0 and P = sigma^2 I. Then

    mu_1^T z = sum_i mu_{1i} z_i   ><  lambda''     (H1 if >, H0 if <)

i.e. the test correlates the data with the expected signal and compares the result to a threshold.

(Figure 15: correlator block diagram - multiply each z_i by mu_{1i}, sum, and compare with a threshold.)

Setting the threshold in the general Gaussian case is not easy.
7 Estimation Basics

Scribe: Zaher M. Kassas

We wish to estimate a parameter (or state) x from a data set Z^k. Our estimate will be a function of the data set and possibly time, i.e.

    xhat = xhat(k, Z^k) = f(k, Z^k)

Maximum likelihood (ML): maximize the likelihood function Lambda_{Z^k}(x) = p(Z^k | x). By the first order necessary condition (FONC) of optimality, we set the derivative of the likelihood function with respect to x to zero, namely

    d Lambda_{Z^k}(x) / dx |_{xhat_ML} = 0

This implicitly defines xhat_ML as the solution of nx equations in nx unknowns.

Maximum a posteriori (MAP): assume we also know p(x), called the prior pdf, and the conditional pdf p(Z^k | x). Then

    p(x | Z^k) = p(Z^k | x) p(x) / integral p(Z^k | x) p(x) dx          (93)

The maximum a posteriori (MAP) estimator is one that maximizes the posterior distribution, namely xhat_MAP = arg max_x p(x | Z^k). It is worth noting that the denominator in (93) is constant with respect to the maximization parameter x.

The ML and MAP estimators are the same if the prior pdf p(x) is diffuse, i.e.

    p(x) = lim_{epsilon -> 0}  { epsilon,  |x| <= 1/(2 epsilon);
                                 0,        |x| >  1/(2 epsilon) }
Example. Consider the scalar system

    dy/dt = a y

The solution to this system can be readily found to be y(t) = y(0) e^{a t}, where y(0) is unknown. We wish to estimate x := y(0) given the measurements

    z(j) = lambda_j x + w(j),    j = 1, 2, ..., k,    lambda_j := e^{a t_j}

with w(j) ~ N(0, sigma^2) and E[w(i) w(j)] = sigma^2 delta_ij, delta_ij being the Kronecker delta.

Since p(z(j) | x) is p_w(w) = N(w; 0, sigma^2) shifted by z(j) - lambda_j x, i.e. p(z(j) | x) = N(z(j); lambda_j x, sigma^2), the likelihood function is

    Lambda_{Z^k}(x) = 1/((2 pi)^{k/2} sigma^k) exp( -(1/(2 sigma^2)) sum_{j=1}^{k} [z(j) - lambda_j x]^2 )

Differentiating Lambda_{Z^k}(x) with respect to x and setting the result to zero,

    0 = d Lambda_{Z^k}(x)/dx  proportional to  Lambda_{Z^k}(x) (1/sigma^2) sum_{j=1}^{k} [z(j) - lambda_j x] lambda_j

Recognizing that Lambda_{Z^k}(x) != 0, we obtain the ML estimate as

    xhat_ML = sum_{j=1}^{k} z(j) lambda_j  /  sum_{j=1}^{k} lambda_j^2
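A quick simulation sketch of this ML estimator (the values of a, sigma, x, and the time grid are illustrative):

    % Sketch: simulate z(j) = lambda_j*x + w(j) and form the ML estimate above.
    a = -0.5; sigma = 0.1; x_true = 2;            % illustrative values
    t = (1:10)';                                  % measurement times
    lambda = exp(a*t);
    z = lambda*x_true + sigma*randn(size(t));
    x_ml = sum(z.*lambda) / sum(lambda.^2);       % approaches x_true as the noise shrinks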
Now suppose we also have prior information x ~ N(xbar, sigma_x^2). Then the posterior is

    p(x | Z^k) = c1 / ((2 pi)^{(k+1)/2} sigma^k sigma_x)
                 exp( -(1/(2 sigma^2)) sum_{j=1}^{k} [z(j) - lambda_j x]^2 - (1/(2 sigma_x^2)) (x - xbar)^2 )

where c1 = [p(Z^k)]^{-1} is constant for the purpose of maximization. Differentiating,

    0 = d p(x | Z^k)/dx  proportional to  p(x | Z^k) [ (1/sigma^2) sum_{j=1}^{k} [z(j) - lambda_j x] lambda_j - (1/sigma_x^2)(x - xbar) ]

Recognizing that p(x | Z^k) != 0, we obtain the MAP estimate as

    xhat_MAP = [ (1/sigma^2) sum_{j=1}^{k} z(j) lambda_j + (1/sigma_x^2) xbar ]
               / [ (1/sigma^2) sum_{j=1}^{k} lambda_j^2 + (1/sigma_x^2) ]

It is worth noting that as sigma_x -> infinity (a diffuse prior), xhat_MAP -> xhat_ML, while as sigma_x -> 0 the prior dominates and xhat_MAP -> xbar.

In fact the posterior can be written as

    p(x | Z^k) = 1/sqrt(2 pi sigma_new^2) exp( -(x - xhat_MAP)^2 / (2 sigma_new^2) )

where sigma_new^2 is defined by

    1/sigma_new^2 = (1/sigma^2) sum_{j=1}^{k} lambda_j^2 + 1/sigma_x^2

i.e. the posterior pdf has the form of a Gaussian pdf with mean xhat_MAP and variance sigma_new^2.
7.6 Least-Squares Estimators

Least-squares (LS) estimators aim at minimizing the cost function defined by the sum of the squares of the error between the data and the model, denoted C(k, Z^k), namely

    xhat_LS = arg min_x  C(k, Z^k) := (1/2) sum_{j=1}^{k} [z(j) - lambda_j(x)]^2

Setting the derivative to zero,

    0 = sum_{j=1}^{k} [z(j) - lambda_j x] lambda_j

Solving for x,

    xhat_LS = sum_{j=1}^{k} z(j) lambda_j / sum_{j=1}^{k} lambda_j^2

Note that the resulting LS estimator coincides with the ML estimator. This stems from the fact that for Gaussian random variables, ML estimation corresponds to a Euclidean distance metric.

Minimum mean-squared error (MMSE) estimation: assume p(x) and p(Z^k | x), and hence p(x | Z^k), are known. The MMSE estimator aims at minimizing the cost function defined by the conditional mean of the squared estimation error, i.e.

    C(xhat, Z^k) = integral (xhat - x)^2 p(x | Z^k) dx

Minimizing with respect to xhat yields

    xhat_MMSE = E[ x | Z^k ]

7.9 Summary

    LS:    requires no probabilistic model.
    ML:    requires p(Z^k | x).
    MAP:   requires p(Z^k | x) and the prior p(x).
    MMSE:  requires the posterior p(x | Z^k).

If p(x) is diffuse, then ML and MAP coincide; and in the Gaussian example above, LS, ML, MAP, and MMSE all coincide.
8 Linear Estimation for Static Systems

8.1 MAP estimator for Gaussian problems

Suppose that x ~ N(xbar, P_xx) and E[(x - xbar) w^T] = 0. Given a system model

    z = H x + w,    w ~ N(0, R)                                        (94)

our approach to this problem is to develop a joint PDF for x and z, then use our understanding of conditional Gaussian distributions to determine p(x | z). Thereby, we can find xhat_MAP such that p(x | z) is maximized.

In order to find p(x | z), we first need p(x, z), i.e. the first two moments of the stacked vector. First the mean of z:

    zbar = E[z] = E[Hx + w] = H E[x] + E[w] = H xbar + 0 = H xbar      (95)-(98)

Next, we need the covariance matrices P_xz, P_zx, and P_zz:

    P_xz = E[(x - xbar)(z - zbar)^T]
         = E[(x - xbar)(H x + w - H xbar)^T]
         = E[(x - xbar)(x - xbar)^T] H^T + E[(x - xbar) w^T]
         = P_xx H^T                                                    (99)-(103)

    P_zx = P_xz^T = H P_xx                                             (104)-(105)

For P_zz,

    P_zz = E[(z - zbar)(z - zbar)^T]
         = E[(H(x - xbar) + w)(H(x - xbar) + w)^T]
         = H P_xx H^T + R                                              (106)-(108)
Now we can define p(x, z), and hence p(x | z) = p(x, z)/p(z):          (109)

    p(x | z) = c(z) exp( -(1/2) [ (x - xbar)^T  (z - zbar)^T ] [ V_xx  V_xz ;  V_zx  V_zz ] [ (x - xbar) ; (z - zbar) ] )     (110)-(111)

where

    [ V_xx  V_xz ;        [ P_xx  P_xz ;
      V_zx  V_zz ]   =      P_zx  P_zz ]^{-1}

    V_xx = ( P_xx - P_xz P_zz^{-1} P_xz^T )^{-1}                       (112)
    V_xz = -V_xx P_xz P_zz^{-1}                                        (113)
    V_zz = ( P_zz - P_xz^T P_xx^{-1} P_xz )^{-1}                       (114)

Now we can find xhat_MAP by maximizing p(x | z), equivalently by setting the derivative of the quadratic form

    C(x | z) = [ (x - xbar)^T  (z - zbar)^T ] [ V_xx  V_xz ;  V_zx  V_zz ] [ (x - xbar) ; (z - zbar) ]     (115)-(116)

to zero. With dC/dx the vector of partials dC/dx1, ..., dC/dx_{nx},    (117)-(118)

    0 = dC/dx = 2 [ V_xx (x - xbar) + V_xz (z - zbar) ]                (119)

    xhat_MAP = xbar - V_xx^{-1} V_xz (z - zbar)                        (120)
             = xbar + P_xz P_zz^{-1} (z - zbar)                        (121)

Using our original formulas for the covariance matrices, we get

    xhat_MAP = xbar + P_xx H^T ( H P_xx H^T + R )^{-1} ( z - H xbar )  (122)
Let's find xtilde = x - xhat_MAP and P_xx|z:

    xtilde = (x - xbar) - P_xx H^T ( H P_xx H^T + R )^{-1} ( z - H xbar )      (123)-(126)

Taking the expectation,

    E[xtilde] = E[x - xbar] - P_xx H^T ( H P_xx H^T + R )^{-1} E[z - H xbar] = 0      (127)-(129)

Because E[xtilde] = 0, xhat_MAP is an unbiased estimator.

The a posteriori covariance is

    P_xx|z = E[xtilde xtilde^T]
           = E[(x - xbar)(x - xbar)^T] - E[(x - xbar)(z - zbar)^T] P_zz^{-1} P_xz^T
             - P_xz P_zz^{-1} E[(z - zbar)(x - xbar)^T] + P_xz P_zz^{-1} E[(z - zbar)(z - zbar)^T] P_zz^{-1} P_xz^T
           = P_xx - P_xz P_zz^{-1} P_xz^T                              (130)-(131)

Note that P_xz P_zz^{-1} P_xz^T = P_xx H^T ( H P_xx H^T + R )^{-1} H P_xx, so

    P_xx|z = P_xx - P_xx H^T ( H P_xx H^T + R )^{-1} H P_xx            (132)

This assumes the model is correct. Note also that if we know xbar, P_xx, H, and R, then

    xhat_MAP = xbar + P_xz P_zz^{-1} (z - zbar)
             = ( I - P_xz P_zz^{-1} H ) xbar + P_xz P_zz^{-1} z = C xbar + D z        (133)-(134)

i.e. a linear combination of xbar and z.
Weighted least squares (with no a priori information). Suppose we have no prior on x, only measurements z(i) = H(i) x + w(i), i = 1, ..., k, with E[w(i) w(j)^T] = R(i) delta_ij. Define the cost function

    J(k) = sum_{i=1}^{k} [z(i) - H(i) x]^T R^{-1}(i) [z(i) - H(i) x]   (135)-(138)

We want to use this cost function to deemphasize any noisy measurements. Note that now we have a time index, and no a priori knowledge. Minimizing J(k) with respect to x is equivalent to maximizing

    Lambda(x) = p(z^k | x)        (ML estimator)                       (139)-(140)

We need a change of notation to incorporate data and parameters for each new time step:

    Z^k = [ z(1) ; z(2) ; ... ; z(k) ]          in R^{nz k x 1}        (141)
    H^k = [ H(1) ; H(2) ; ... ; H(k) ]          in R^{nz k x nx}       (142)
    w^k = [ w(1) ; w(2) ; ... ; w(k) ]          in R^{nz k x 1}        (143)
    R^k = diag( R(1), R(2), ..., R(k) )         in R^{nz k x nz k}  (block diagonal)     (144)-(145)

Then J(k) = [Z^k - H^k x]^T (R^k)^{-1} [Z^k - H^k x].                  (146)

Setting the gradient to zero (dJ/dx is the vector of dJ/dx1, ..., dJ/dx_{nx}),

    0 = dJ/dx = -2 (H^k)^T (R^k)^{-1} ( Z^k - H^k x )                  (147)-(149)

    xhat = ( H^T R^{-1} H )^{-1} H^T R^{-1} z                          (150)-(152)

By dropping the k superscripts, we get one of the deathbed identities, the normal equations. (Inverting H^T R^{-1} H is an nx-by-nx operation.)

Unbiasedness: with z = Hx + w,

    xtilde = x - xhat = x - ( H^T R^{-1} H )^{-1} H^T R^{-1} ( H x + w )
           = [ I - ( H^T R^{-1} H )^{-1} H^T R^{-1} H ] x - ( H^T R^{-1} H )^{-1} H^T R^{-1} w
           = -( H^T R^{-1} H )^{-1} H^T R^{-1} w                       (153)-(156)

    E[xtilde] = -( H^T R^{-1} H )^{-1} H^T R^{-1} E[w] = 0             (157)-(158)

Estimation error covariance:

    P_xtilde = E[xtilde xtilde^T] = ( H^T R^{-1} H )^{-1}              (159)-(160)

This follows because the center R from E[w w^T] cancels with a neighboring R^{-1}.
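A small weighted-least-squares sketch of (150)-(152) and (159) (dimensions and numbers are illustrative):

    % Sketch: batch weighted LS via the normal equations.
    H = [1 0; 0 1; 1 1];                 % stacked measurement matrix (3 meas., 2 states)
    R = diag([0.1 0.1 0.4]);             % noisier third measurement gets less weight
    x_true = [1; 2];
    z = H*x_true + sqrtm(R)*randn(3,1);
    Rinv = inv(R);
    x_hat = (H'*Rinv*H) \ (H'*Rinv*z);   % normal equations, eq. (152)
    P = inv(H'*Rinv*H);                  % estimation error covariance, eq. (159)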
8.3 Square-Root-Based LS Solutions

Working with square roots of the covariance is more numerically robust, and also leads to a more elegant and intuitive interpretation of least squares.

Let z = H x + w, w ~ N(0, R), R = R^T, R > 0. In MATLAB, Ra = chol(R); so that Ra^T Ra = R. Then let

    zbar = Ra^{-T} z,    Hbar = Ra^{-T} H,    wbar = Ra^{-T} w         (161)-(164)

so that

    zbar = Hbar x + wbar                                               (165)-(166)

The normalized noise has

    E[wbar] = Ra^{-T} E[w] = 0                                         (167)
    E[wbar wbar^T] = Ra^{-T} E[w w^T] Ra^{-1} = Ra^{-T} Ra^T Ra Ra^{-1} = I       (168)-(173)

Recall that orthonormal transformations preserve the norm:

    ||v||^2 = v^T v,    ||Qv||^2 = v^T Q^T Q v = v^T v = ||v||^2       (174)-(177)

Q: Can we multiply (Hbar x - zbar) by some orthonormal matrix and cleverly simplify the cost function?
A: Yes we can; use QR factorization. Let Qbar Rbar = Hbar, so Qbar is orthonormal and Rbar is upper triangular, and take T = Qbar^T to get

    J(k) = || Qbar^T ( Hbar x - zbar ) ||^2 = || Rbar x - ztilde ||^2,    ztilde = Qbar^T zbar     (178)-(179)
Partitioning Rbar = [ Rbar_o ; 0 ] (with Rbar_o upper triangular and nx-by-nx) and ztilde = [ z_o ; e ] accordingly,

    J(k) = || [ Rbar_o ; 0 ] x - [ z_o ; e ] ||^2 = || Rbar_o x - z_o ||^2 + ||e||^2      (180)-(181)

How do we minimize this? We solve first for x, made possible because if rank(H) = nx (which we assume), then Rbar_o is in R^{nx x nx} and is invertible. We also have the solution:

    0 = d/dx || Rbar_o x - z_o ||^2 = 2 Rbar_o^T ( Rbar_o x - z_o )
    xhat_LS = Rbar_o^{-1} z_o                                          (182)-(184)

The solution was obtained without squaring anything. The component norm ||e||^2 is the irreducible part of the cost. In other words,

    J(k) |_{xhat_LS} = ||e||^2                                         (185)

From before,

    P_xtilde = ( H^T R^{-1} H )^{-1}
             = ( Hbar^T Hbar )^{-1}                       (using Hbar = Ra^{-T} H)
             = ( [ Rbar_o^T  0 ] Qbar^T Qbar [ Rbar_o ; 0 ] )^{-1}    (Qbar^T Qbar = I, orthonormal)
             = ( Rbar_o^T Rbar_o )^{-1}
             = Rbar_o^{-1} Rbar_o^{-T}                                 (186)-(191)

Additionally, we know that Rbar_o can be inverted (square, upper triangular, full rank).

    Hbar^T Hbar = Rbar_o^T Rbar_o   is called the information matrix.

A large H^T H (in the positive definite sense) means a lot of information about x, leading to a small P_xtilde|z.

Recursive LS: now suppose a new measurement z(k+1) arrives at the next time step.
So let us set this up using stacked vectors and matrices:

    z^{k+1} = [ z^k ; z(k+1) ]                                         (192)
    H^{k+1} = [ H^k ; H(k+1) ]                                         (193)
    R^{k+1} = [ R^k  0 ;  0  R(k+1) ]                                  (194)
    w^{k+1} = [ w^k ; w(k+1) ]                                         (195)

We can show that the cost can be written recursively as

    J(k+1) = [x - xhat(k, z^k)]^T P^{-1}(k, z^k) [x - xhat(k, z^k)]
             + [z(k+1) - H(k+1) x]^T R^{-1}(k+1) [z(k+1) - H(k+1) x]   (196)-(199)

i.e. the old data are summarized by xhat(k, z^k) and P(k, z^k). Then to minimize the cost function, we set dJ/dx = 0 to get xhat(k+1, z^{k+1}):

    0 = dJ/dx = 2 P^{-1}(k, z^k)[x - xhat(k, z^k)] - 2 H^T(k+1) R^{-1}(k+1)[z(k+1) - H(k+1) x]     (200)-(204)

    xhat(k+1, z^{k+1}) = xhat(k, z^k) + W(k+1) [ z(k+1) - H(k+1) xhat(k, z^k) ]                    (205)

    W(k+1) = [ P^{-1}(k, z^k) + H^T(k+1) R^{-1}(k+1) H(k+1) ]^{-1} H^T(k+1) R^{-1}(k+1)            (206)

This form of W requires inverting an nx-by-nx matrix.

Error analysis: with xtilde(k+1, z^{k+1}) = x - xhat(k+1, z^{k+1}),

    xtilde(k+1, z^{k+1}) = [ I - W(k+1) H(k+1) ] xtilde(k, z^k) - W(k+1) w(k+1)        (207)

and E[xtilde(k, z^k) w^T(k+1)] = 0. Then,                              (208)

    P(k+1, z^{k+1}) = E[ xtilde(k+1, z^{k+1}) xtilde^T(k+1, z^{k+1}) ]
                    = [I - W(k+1)H(k+1)] E[xtilde(k, z^k) xtilde^T(k, z^k)] [I - W(k+1)H(k+1)]^T
                      + W(k+1) R(k+1) W^T(k+1)                         (209)-(211)

We can find alternate formulas for P(k+1, z^{k+1}) and W(k+1) using the matrix inversion lemma.     (212)-(215)
Example (the scalar problem from before): measurements zj = lambda_j x + w(j), coming from dy/dt = a y,

    z(j) = y(j Delta-t) + w(j),    lambda_j = exp(a j Delta-t),    w(j) ~ N(0, sigma^2)      (216)-(219)

The ML (batch) estimate is

    xhat(k, z^k) = sum_{j=1}^{k} lambda_j z(j) / sum_{j=1}^{k} lambda_j^2                    (220)

    P(k, z^k) = sigma_LS^2 = sigma^2 / sum_{j=1}^{k} lambda_j^2                              (221)-(222)

For a new measurement, H(k+1) = lambda_{k+1} and R(k+1) = sigma^2, so                        (223)-(224)

    W(k+1) = [ sum_{j=1}^{k} lambda_j^2/sigma^2 + lambda_{k+1}^2/sigma^2 ]^{-1} lambda_{k+1}/sigma^2
           = lambda_{k+1} / sum_{j=1}^{k+1} lambda_j^2                                       (225)-(227)

We absorbed the lambda_{k+1}^2 term in the denominator into the summation. Using this, we can get a new estimate recursively:

    xhat(k+1, z^{k+1}) = sum_{j=1}^{k} lambda_j z(j)/sum_{j=1}^{k} lambda_j^2
                         + [ lambda_{k+1} / sum_{j=1}^{k+1} lambda_j^2 ]
                           [ z(k+1) - lambda_{k+1} sum_{j=1}^{k} lambda_j z(j)/sum_{j=1}^{k} lambda_j^2 ]
                       = sum_{j=1}^{k+1} lambda_j z(j) / sum_{j=1}^{k+1} lambda_j^2          (228)-(229)

which matches the batch solution, as expected.
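A compact simulation sketch of the recursive update (225)-(229), checked against the batch answer (values illustrative):

    % Sketch: recursive LS for z(j) = lambda_j*x + w(j), vs. the batch estimate.
    a = -0.3; dt = 1; sigma = 0.2; x_true = 1.5; k = 20;
    lambda = exp(a*dt*(1:k))';
    z = lambda*x_true + sigma*randn(k,1);
    x_hat = 0; S = 0;                             % S accumulates sum(lambda.^2)
    for j = 1:k
        S = S + lambda(j)^2;
        W = lambda(j)/S;                          % gain, eq. (227)
        x_hat = x_hat + W*(z(j) - lambda(j)*x_hat);   % eq. (228)
    end
    x_batch = sum(lambda.*z)/sum(lambda.^2);      % eq. (220); equals x_hat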
Square-root LS solution steps. Given z = H x + w, w ~ N(0, R):          (230)

1. Do a Cholesky factorization of R (Ra^T Ra = R) and normalize: zbar = Ra^{-T} z, Hbar = Ra^{-T} H, so zbar = Hbar x + wbar with wbar ~ N(0, I).

2. Set up the cost function                                             (231)

    J(x) = || Hbar x - zbar ||^2

The solution that minimizes this cost for the normalized data equation also minimizes a cost function based on the original data equation.

3. Transform with an orthogonal T: xhat_LS also minimizes

    J(x) = || T ( Hbar x - zbar ) ||^2                                  (232)-(233)

This works provided T is orthogonal, so we'll choose a special T that makes solving for xhat_LS easier. Let Qbar Rbar = Hbar from QR factorization, so Qbar is orthogonal and Rbar is upper triangular, and let T = Qbar^T:

    J(x) = || Qbar^T ( Qbar Rbar x - zbar ) ||^2 = || Rbar x - Qbar^T zbar ||^2
         = || [ Rbar_o ; 0 ] x - [ z_o ; e ] ||^2
         = || Rbar_o x - z_o ||^2 + ||e||^2                             (234)-(237)

4. Minimize J(x). Assuming that Rbar_o is invertible,

    xhat_LS = Rbar_o^{-1} z_o,    J(xhat_LS) = ||e||^2                  (238)-(239)

||e||^2 is irreducible, the cost due to data that did not quite fit.

Note: if Rbar_o is not invertible, then the problem was not quite observable, and the original H^T H would not have been invertible either.

The covariance:

    P_xtilde|z = ( Rbar_o^T Rbar_o )^{-1} = Rbar_o^{-1} Rbar_o^{-T}     (240)-(241)

We get the result by taking the inversion inside to avoid calculating a squared inverse. We can also solve for the information matrix,

    Hbar^T Hbar = Rbar_o^T Rbar_o                                       (242)

Remember the information matrix is larger in the positive definite sense when you have lots of information. This is equivalent to the Fisher information matrix for an efficient estimator; so for this case, the information matrix is equal to the Fisher information matrix.
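A MATLAB sketch of steps 1-4 (all numbers are illustrative):

    % Sketch: square-root (QR-based) weighted LS.
    H = [1 0; 0 1; 1 1];  R = diag([0.1 0.1 0.4]);
    x_true = [1; 2];  z = H*x_true + sqrtm(R)*randn(3,1);
    Ra = chol(R);                        % Ra'*Ra = R
    Hb = Ra' \ H;   zb = Ra' \ z;        % normalized data equation
    [Qb, Rb] = qr(Hb);
    zt = Qb'*zb;
    Ro = Rb(1:2, :);   zo = zt(1:2);     % square, upper-triangular block
    x_ls = Ro \ zo;                      % eq. (238)
    P = inv(Ro)*inv(Ro)';                % eqs. (240)-(241)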
8.5.2 Recursive Square-Root LS

We stack incoming data like before, normalizing measurements as they come in, assuming independent measurements:        (243)

    J(k+1) = || [ Hbar^k ; Hbar(k+1) ] x - [ zbar^k ; zbar(k+1) ] ||^2
           = || e(k) ||^2 + || [ Rbar_o(k) ; Hbar(k+1) ] x - [ z_o(k) ; zbar(k+1) ] ||^2       (244)-(248)

Now QR-factorize the stacked matrix:

    Qbar Rbar = [ Rbar_o(k) ; Hbar(k+1) ]                               (249)

Because T = Qbar^T is orthonormal,

    J(k+1) = || e(k) ||^2 + || [ Rbar_o(k+1) ; 0 ] x - [ z_o(k+1) ; e(k+1) ] ||^2
           = || e(k) ||^2 + || e(k+1) ||^2 + || Rbar_o(k+1) x - z_o(k+1) ||^2                  (250)-(251)

Minimizing gives xhat(k+1, z^{k+1}) and P(k+1, z^{k+1}):

    xhat(k+1, z^{k+1}) = Rbar_o^{-1}(k+1) z_o(k+1)                      (252)

    P(k+1, z^{k+1}) = [ (Hbar^{k+1})^T Hbar^{k+1} ]^{-1}
                    = [ Rbar_o^T(k+1) Rbar_o(k+1) ]^{-1}
                    = Rbar_o^{-1}(k+1) Rbar_o^{-T}(k+1)                 (253)-(255)
Nonlinear least squares. Given the measurement model

    z = h(x) + w,    w ~ N(0, R),  w in R^{nz x 1},  x in R^{nx x 1}    (256)

where h(x) is an nz-by-1 vector function of x, normally with nz > nx:

    h(x) = [ h1(x) ; h2(x) ; ... ; h_{nz}(x) ]                          (257)

Problem statement: find xhat that minimizes the nonlinear weighted cost

    J_NLW(x) = [z - h(x)]^T R^{-1} [z - h(x)]                           (258)

Properties of J_NLW(x):
1. J_NLW(x) >= 0
2. J_NLW(x) = 0 iff h(x) = z   (assumes R > 0)

As before, normalize with the Cholesky factor Ra^T Ra = R: let za = Ra^{-T} z and ha(x) = Ra^{-T} h(x), so we can work with the unweighted cost J_NL(x) = || za - ha(x) ||^2.       (259)-(260)

Aside (gradient/Jacobian notation): for f: R^n -> R^m,

    d f^T(x)/dx = [ df1/dx1  df2/dx1  ...  dfm/dx1 ;
                    df1/dx2  df2/dx2  ...  dfm/dx2 ;
                       :                            ;
                    df1/dxn  df2/dxn  ...  dfm/dxn ]   in R^{n x m}     (261)

Define the measurement Jacobian at a nominal point x_nom:

    H(x_nom) = dh/dx |_{x_nom},   Hij = dhi/dxj |_{x_nom}               (262)-(263)

To find the minimizer of J_NL, apply the first order necessary condition. With dJ/dx the row vector of partials,

    0 = ( dJ/dx )^T = -2 H^T(xhat) [ z - h(xhat) ]

Note that if h(x) = Hx (linear), this reduces to the normal equations and we have xhat = (H^T H)^{-1} H^T z.
9.2 Newton-Raphson method

To minimize J_NL, firstly we will review the Newton-Raphson (NR) root-finding method. To find a root of f(x), we start from an initial guess x1 and update it as

    x2 = x1 - f(x1)/f'(x1),   i.e.   Delta-x = -f(x1)/f'(x1)            (264)

Applied to minimizing J_NL(x): suppose xhat = xhat_g + Delta-x, where xhat_g is the current guess. The FONC at xhat is

    0 = H^T(xhat_g + Delta-x) [ z - h(xhat_g + Delta-x) ]               (265)

Let f(x) = -H^T(x)[z - h(x)]. Expanding f about xhat_g:

    0 = f(xhat_g + Delta-x) = f(xhat_g) + [ df/dx |_{xhat_g} ] Delta-x + O(Delta-x^2)        (266)-(267)

so

    Delta-x = -[ df/dx |_{xhat_g} ]^{-1} f(xhat_g)

where

    df/dx = -( d^2 h/dx^2 ) [z - h(x)] + H^T H  =: V                    (268)

Note d^2 h/dx^2 is beyond a matrix; it is actually a tensor of three indices, and df/dx = V is symmetric. Each (i, j) entry of df/dx can be written as

    [df/dx]_{ij} = -sum_{l=1}^{nz} ( d^2 hl/(dxi dxj) |_{xhat_g} ) [zl - hl(xhat_g)] + (H^T H)_{ij}

Remarks:
1. Computing the second-derivative tensor d^2 h/dx^2 can be expensive.
2. NR can diverge if the guess xhat_g is poor or the d^2 h/dx^2 terms dominate.
Gauss-Newton (GN) method. Instead of expanding the gradient, apply a Taylor series to the measurement model about the current guess xhat_g:

    z = h(xhat_g) + H Delta-x + w,    H = dh/dx |_{xhat_g}

so the cost is approximated by

    J(x) = || z - h(x) ||^2  ~  || (z - h(xhat_g)) - H Delta-x ||^2 = || delta-z - H Delta-x ||^2      (269)

This is a linear LS problem in Delta-x; its solution is Delta-x_GN = (H^T H)^{-1} H^T delta-z, with delta-z = z - h(xhat_g) the residual for the given guess.

Comparison: the NR method applies a Taylor series expansion to the first order necessary conditions. On the other hand, the GN method applies the Taylor series to the measurement model.

To avoid divergence, we modify the updating equation in the GN method as

    xhat_g <- xhat_g + alpha Delta-x,    0 < alpha <= 1

where alpha is chosen so that J[xhat_g + alpha Delta-x] is less than J[xhat_g]. This guarantees convergence in J since J(x) >= 0.

Q: How do we know there exists alpha, 0 < alpha <= 1, such that J(xhat_g + alpha Delta-x) < J(xhat_g)?
A: Define Jbar(alpha) = J(xhat_g + alpha Delta-x); call alpha the step length. Note Jbar(0) = J(xhat_g), and by the chain rule

    d Jbar/d alpha |_{alpha=0} = ( dJ/dx |_{xhat_g} ) Delta-x           (270)-(272)

If rank(H) = nx (all columns are linearly independent), then d Jbar/d alpha |_{alpha=0} < 0, with equality only if z - h(xhat_g) = 0. Thus for some small enough alpha, Jbar(alpha) < J(xhat_g) is guaranteed!
Pseudo line-search algorithm:
1. alpha = 1, Jg = Jbar(alpha = 0), Jg_new = Jbar(alpha = 1)
2. while Jg_new >= Jg
       alpha = alpha/2
       Jg_new = Jbar(alpha)
   end
This will converge to a local minimum.

Levenberg-Marquardt (LM) method. LM replaces the GN step by

    xhat_g <- xhat_g + Delta-x_LM,   where   Delta-x_LM = ( H^T H + lambda I )^{-1} H^T [ z - h(xhat_g) ]     (273)

with lambda >= 0. If lambda = 0, Delta-x_LM is equivalent to Delta-x_GN. LM does not use a step length alpha = 1; instead it uses lambda.

Pseudo LM algorithm:
1. lambda = 0
2. Check if H^T H + lambda I > 0; if not, let lambda = ||H|| * 0.001
3. Compute Delta-x_LM(lambda)
4. If Jg_new >= Jg, where Jg = J(xhat_g) and Jg_new = J(xhat_g + Delta-x_LM(lambda)), then increase lambda and go to step (3); otherwise accept the step.

LM achieves fast convergence near the solution if the residuals are small. If the residuals near the solution are not small, we may have to use the full NR (Newton-Raphson) method.
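A minimal damped Gauss-Newton/LM-style iteration for a toy range-measurement problem (the model h, the beacon positions, and the fixed lambda are illustrative assumptions; no lambda adaptation is shown):

    % Sketch: LM-damped Gauss-Newton for z = h(x) + w,
    % h(x) = distances from position x to three beacons.
    bx = [0 10 0]'; by = [0 0 10]';                 % beacon positions (illustrative)
    h = @(x) sqrt((x(1)-bx).^2 + (x(2)-by).^2);
    x_true = [3; 4]; z = h(x_true) + 0.01*randn(3,1);
    x = [1; 1];                                     % initial guess
    lambda = 1e-3;
    for iter = 1:20
        r  = z - h(x);                              % residual
        H  = [(x(1)-bx)./h(x), (x(2)-by)./h(x)];    % Jacobian of h at x
        dx = (H'*H + lambda*eye(2)) \ (H'*r);       % LM step, eq. (273)
        x  = x + dx;
    end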
10 Stochastic Linear System Models

Continuous-time model:

    dx/dt = A(t) x(t) + B(t) u(t) + D(t) vtilde(t)                      (274)
    z(t) = C(t) x(t) + wtilde(t)                                        (275)

where x (the state vector) is nx x 1, u (the input vector) is nu x 1, vtilde (the process noise or disturbance) is nv x 1, and the matrices A (the system matrix), B (the input gain), and D (the noise gain) are conformable. The measurement z is nz x 1 and C is nz x nx.

Note: vtilde(t) is continuous but not differentiable, meaning that it cannot properly be put into a differential equation. However, a more rigorous derivation of the equation still leads to the same result.

Note: If vtilde(t) = wtilde(t) = 0, the system is deterministic. When the noises are not equal to 0, it may be enough to know the conditional pdfs of all future x(t) values (conditioned on the data).

Solution: given x(t0) and u(t), the solution can be written using the state transition matrix F(t, t0):

    x(t) = F(t, t0) x(t0) + integral_{t0}^{t} F(t, tau) B(tau) u(tau) d tau
           + integral_{t0}^{t} F(t, tau) D(tau) vtilde(tau) d tau       (276)

Note: F satisfies dF(t, t0)/dt = A(t) F(t, t0) with F(t0, t0) = I, and  (277)-(278)

    F(t, tau) = F(t, sigma) F(sigma, tau)
    F(t, tau) = [ F(tau, t) ]^{-1}
If A is constant, then F(t, tau) = F(t - tau, 0) = e^{A(t - tau)}, where the matrix exponential is defined as

    e^{A(t - tau)} = I + A(t - tau) + (1/2!) A^2 (t - tau)^2 + ...      (279)

(The matrix exponential may be calculated in MATLAB using the expm() function.)

If A is time-varying, then one must numerically integrate the matrix initial value problem in order to determine F(t, tau).
White noise: vtilde(t) is white noise if vtilde(t) and vtilde(tau) are independent for all t != tau, and E[vtilde(t)] = 0 (zero mean). (The noise must be independent even when t and tau are very close.) A consequence of whiteness is that

    E[ vtilde(t) vtilde^T(tau) ] = V(t) delta(t - tau)

which for V(t) = V = const implies that Svv(f) = power spectral density of vtilde(t) = V, i.e. the power spectrum is flat. This also implies a process that has infinite power because, at t = tau, delta(t - tau) = infinity. However, we "just go with" the fiction of white noise because it is convenient and can be a good approximation over a frequency band (as opposed to the entire frequency spectrum).
Mean propagation: taking the expectation of the solution,

    xbar(t) = F(t, t0) xbar(t0) + integral_{t0}^{t} F(t, tau) B(tau) u(tau) d tau        (280)-(281)

Additionally, differentiating,

    d xbar/dt = A(t) xbar(t) + B(t) u(t)                                (282)

meaning that the prediction of the mean follows the linear system. If E[vtilde(t)] = vbar (not zero-mean), then

    d xbar/dt = A(t) xbar(t) + B(t) u(t) + D(t) vbar                    (283)

which is still deterministic.

The covariance is

    Pxx(t) = E[(x - xbar)(x - xbar)^T]                                  (284)

Substituting for x(t) gives

    Pxx(t) = E[ ( F(t, t0)[x(t0) - xbar(t0)] + integral_{t0}^{t} F(t, tau1) D(tau1) vtilde(tau1) d tau1 )
                ( F(t, t0)[x(t0) - xbar(t0)] + integral_{t0}^{t} F(t, tau2) D(tau2) vtilde(tau2) d tau2 )^T ]      (285)

Expanding gives

    Pxx(t) = F(t, t0) E[(x(t0) - xbar(t0))(x(t0) - xbar(t0))^T] F^T(t, t0)
             + double integral over [t0, t] of F(t, tau1) D(tau1) E[vtilde(tau1) vtilde^T(tau2)] D^T(tau2) F^T(t, tau2) d tau1 d tau2     (286)

where E[vtilde(tau1) vtilde^T(tau2)] = V(tau1) delta(tau1 - tau2). Also, cross terms in the covariance go to zero because E[(x(t0) - xbar(t0)) vtilde^T(tau)] = 0 for all tau > t0 due to the whiteness of the noise. The sifting property of the Dirac delta allows us to collapse one integral:

    Pxx(t) = F(t, t0) Pxx(t0) F^T(t, t0)
             + integral_{t0}^{t} F(t, tau) D(tau) V(tau) D^T(tau) F^T(t, tau) d tau      (287)

Differentiating gives the Lyapunov differential equation

    d Pxx/dt = A Pxx + Pxx A^T + D V D^T                                (288)

and if A is stable and V > 0, then Pxx(t) converges to a constant Pxx_ss satisfying

    0 = A Pxx_ss + Pxx_ss A^T + D V D^T                                 (289)

To solve this in MATLAB:  Pxx_ss = lyap(A, D*V*D')                      (290)

10.4 Discrete-time models of stochastic systems

Assume a zero-order hold on the control: u(t) = u(tk) = uk for tk <= t < tk+1. Then an equivalent discrete-time model of our original continuous system is:

    x(tk+1) = F(tk+1, tk) x(tk) + G(tk+1, tk) u(tk) + v(tk)             (291)
(Figure 18: a zero-order hold control input holds a constant value for t in [tk, tk+1).)

where

    G(tk+1, tk) = integral_{tk}^{tk+1} F(tk+1, tau) B(tau) d tau        (292)

    v(tk) = integral_{tk}^{tk+1} F(tk+1, tau) D(tau) vtilde(tau) d tau  (293)

The discrete-time process noise v(tk) inherits its statistics from vtilde(t): E[v(tk)] = 0 and

    E[v(tk) v^T(tj)] = double integral over [tj, tj+1] x [tk, tk+1] of
                       F(tk+1, tau1) D(tau1) E[vtilde(tau1) vtilde^T(tau2)] D^T(tau2) F^T(tj+1, tau2) d tau1 d tau2     (294)

where E[vtilde(tau1) vtilde^T(tau2)] = V(tau1) delta(tau1 - tau2). Thus,

    E[v(tk) v^T(tj)] = delta_jk integral_{tk}^{tk+1} F(tk+1, tau) D(tau) V(tau) D^T(tau) F^T(tk+1, tau) d tau = delta_jk Qk     (295)-(296)

(delta_jk appears because the delta function integrates to zero unless j = k.)

Shorthand notation:

    x(tk) -> x(k),  v(tk) -> v(k),  F(tk+1, tk) -> F(k),  u(tk) -> u(k),
    G(tk+1, tk) -> G(k),  V(tk+1, tk) -> V(k)                           (297)-(302)

For a time-invariant system with constant sample interval Delta-t (A, B, D, V constant),

    F(k) = F = e^{A Delta-t}                                            (303)-(304)
    G(k) = G = integral_0^{Delta-t} e^{A tau} B d tau                   (305)
    Q(k) = Q = integral_0^{Delta-t} e^{A tau} D V D^T e^{A^T tau} d tau (306)

Discrete-time measurements: z(k) = H(k) x(k) + w(k), with               (307)

    E[w(k)] = 0,   E[w(k) w^T(j)] = R(k) delta_jk,   R(k) = R^T(k) > 0  (308)-(309)

We think of z(k) as a sample from z(t) = C(t) x(t) + wtilde(t), but it is not correct to say that z(k) = z(tk). The problem lies in the assumption of whiteness, and therefore infinite power, for wtilde(t). Because of this, E[wtilde(tk) wtilde^T(tk)] = delta(0) Rtilde = infinity. The correct way to obtain z(k) is to assume an anti-aliasing filter is used to low-pass filter z(t) before sampling. For a filter that averages over the sample interval Delta-t (f_samp = 1/Delta-t),

    R(k) = (1/Delta-t^2) double integral over [tk - Delta-t, tk]^2 of E[wtilde(tau1) wtilde^T(tau2)] d tau1 d tau2
         = Rtilde / Delta-t                                             (310)

where E[wtilde(tau1) wtilde^T(tau2)] = Rtilde(tau1) delta(tau1 - tau2).

Combining the discrete-time dynamics and measurement models, we obtain the full discrete-time model.
    x(k+1) = F(k) x(k) + G(k) u(k) + v(k)                               (311)
    z(k) = H(k) x(k) + w(k)                                             (312)

The solution can be written as

    x(k) = [ F(k-1) F(k-2) ... F(0) ] x(0) + sum_{i=0}^{k-1} [ F(k-1) ... F(i+1) ] [ G(i) u(i) + v(i) ]     (313)

(Note: when i = k-1, the product of F's in the summation is taken to be the identity.)

Mean and covariance propagation:

    xbar(k+1) = F(k) xbar(k) + G(k) u(k)                                (314)

    Pxx(k+1) = E[(x(k+1) - xbar(k+1))(x(k+1) - xbar(k+1))^T] = F(k) Pxx(k) F^T(k) + Q(k)     (315)

If F is constant and stable, Pxx(k) -> Pxx_ss, where Pxx_ss satisfies   (316)

    Pxx_ss = F Pxx_ss F^T + Q                                           (317)

In MATLAB:  Pxx_ss = dlyap(F, Q)                                        (318)

Q: Where does the requirement that max(abs(eig(F))) < 1 for stability come from?
A: F = e^{A Delta-t}. One can show that |lambda| = e^{Re(mu) Delta-t}, where lambda is the eigenvalue of F corresponding to an eigenvalue mu of A. If Re(mu) < 0, then |lambda| < 1.
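A MATLAB sketch tying the discretization and the steady-state covariance together (the matrices are illustrative; G and Q are computed by numerical quadrature rather than a closed form):

    % Sketch: discretize a continuous LTI model and find the steady-state covariance.
    A = [0 1; -1 -0.5];  B = [0; 1];  D = [0; 1];  V = 0.1;  dt = 0.1;
    F = expm(A*dt);                                                   % eq. (304)
    G = integral(@(t) expm(A*t)*B, 0, dt, 'ArrayValued', true);       % eq. (305)
    Q = integral(@(t) expm(A*t)*D*V*D'*expm(A'*t), 0, dt, 'ArrayValued', true);  % eq. (306)
    Pss = dlyap(F, Q);                                                % eq. (318)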
The discrete-time Kalman filter: model assumptions.

    E[v(k)] = 0,        E[v(k) v(j)^T] = Qk delta_jk                    (319)-(320)
    E[w(k)] = 0,        E[w(k) w(j)^T] = Rk delta_jk                    (321)-(322)
    E[w(k) v(j)^T] = 0  for all k, j                                    (323)

Initial condition:

    E[x(0) | Z^0] = xhat(0),   E[(x(0) - xhat(0))(x(0) - xhat(0))^T] = P(0)      (324)-(325)
    E[w(k)(x(0) - xhat(0))^T] = 0  for all k >= 0                        (326)
    E[v(k)(x(0) - xhat(0))^T] = 0  for all k >= 0                        (327)

Here we have chosen some specific conditions for setting up the Kalman filter. Later, we will relax some of these conditions, or investigate the implications of them being violated.

There are several generic estimation problems:

Filtering: determine xhat(k | Z^k) = xhat(k | z0, ..., zk) - use measurements up to time k to estimate the state at time k. This can be done in real time, and causally - it does not depend on future states.

Smoothing: determine xhat(j | Z^k), for j < k - use future data to find an improved estimate of a historical state. This is noncausal.

Prediction: determine xhat(j | Z^k), for j > k.

Notation (a posteriori vs. a priori):

    quantity                         Bar-Shalom                    Humphreys      others
    filtered (a posteriori) state    xhat(k | Z^k) = x(k|k)        xhat(k)        xhat+(k)
    predicted (a priori) state       xhat(k+1 | Z^k) = x(k+1|k)    xbar(k+1)      xhat-(k+1)
    filtered covariance              P(k | Z^k) = P(k|k)           P(k)           P+(k)
    predicted covariance             P(k+1 | Z^k) = P(k+1|k)       Pbar(k+1)      P-(k+1)

This notation is nice because it corresponds to the prior in the static estimation equations.
Filtering steps (derivation based on MMSE)

0) Set k = 0, with xhat(k), P(k) known.

1) State and covariance propagation: predict state and error covariance at step k+1, conditioned on data through z(k):

    xbar(k+1) = E[x(k+1) | Z^k]
              = E[F(k) x(k) + G(k) u(k) + v(k) | Z^k]
              = F(k) E[x(k) | Z^k] + G(k) u(k) + E[v(k) | Z^k]
              = F(k) xhat(k) + G(k) u(k)                                (328)-(331)

    Pbar(k+1) = E[(x(k+1) - xbar(k+1))(x(k+1) - xbar(k+1))^T | Z^k]
              = F(k) P(k) F^T(k) + Q(k)                                 (332)-(333)

Some additional steps for this can be found on page 204 of Bar-Shalom. The cross-terms are zero due to the fact that v(k) is uncorrelated with xtilde(k).

The first term tends to decrease (when the absolute values of the eigenvalues of F are less than one), but because of the additive Q(k) term, the overall covariance grows (in the positive definite sense) during propagation.

2) Measurement update: combine xbar(k+1) and Pbar(k+1) (a priori info) with the new measurement z(k+1) to get an improved state estimate with a reduced estimation error covariance. This next step has been solved previously in class, in the review of linear estimation for static systems.

Bar-Shalom derivation: get the joint distribution of x(k+1) and z(k+1) conditioned on Z^k, and solve for the conditional mean and covariance.          (334)
First the predicted measurement:

    zbar(k+1) = E[z(k+1) | Z^k] = H(k+1) xbar(k+1) + E[w(k+1) | Z^k] = H(k+1) xbar(k+1)        (335)-(337)

Cross covariance:

    Pxz(k+1) = E[(x(k+1) - xbar(k+1))(z(k+1) - zbar(k+1))^T | Z^k]
             = E[(x(k+1) - xbar(k+1))(x(k+1) - xbar(k+1))^T] H^T(k+1)
               + E[(x(k+1) - xbar(k+1)) w^T(k+1)]        (second term is zero)
             = Pbar(k+1) H^T(k+1)

Thus, we have all the moments required to specify the joint conditional density:

    [ x(k+1) ]        ( [ xbar(k+1)        ]   [ Pbar          Pbar H^T       ] )
    [ z(k+1) ]  ~  N  ( [ H(k+1) xbar(k+1) ] , [ H Pbar        H Pbar H^T + R ] )      (338)

where the (k+1) index on the elements of the covariance matrix is suppressed for brevity, and everything is conditioned on Z^k. Now,

    xhat(k+1) = E[x(k+1) | Z^{k+1}] = E[x(k+1) | z(k+1), Z^k]

Here, the conditioning on Z^k is made to resemble the static case. So, the problem has now been reduced to one we've already solved (in Linear Estimation for Static Systems):

    xhat(k+1) = xbar(k+1) + Pxz(k+1) Pzz^{-1}(k+1) [z(k+1) - zbar(k+1)]
              = xbar + Pbar H^T ( H Pbar H^T + R )^{-1} [z - H xbar]    (339)

    P(k+1) = Pbar - Pbar H^T ( H Pbar H^T + R )^{-1} H Pbar             (340)

where the (k+1) arguments are again suppressed.
Define

    nu(k+1) = z(k+1) - H(k+1) xbar(k+1)                     (innovation)
    S(k+1) = H(k+1) Pbar(k+1) H^T(k+1) + R(k+1)             (innovation covariance)
    W(k+1) = Pbar(k+1) H^T(k+1) S^{-1}(k+1)                 (Kalman gain matrix)

so that

    xhat(k+1) = xbar(k+1) + W(k+1) nu(k+1)
    P(k+1) = Pbar(k+1) - W(k+1) S(k+1) W^T(k+1)

Summary (one cycle of the Kalman filter):

0) Set xhat(0), P(0), k = 0.
1) Propagate: xbar(k+1) = F(k) xhat(k) + G(k) u(k),  Pbar(k+1) = F(k) P(k) F^T(k) + Q(k).
2) Measurement update:
       nu(k+1) = z(k+1) - H(k+1) xbar(k+1)                  (innovation)
       S(k+1) = H(k+1) Pbar(k+1) H^T(k+1) + R(k+1)          (innovation covariance)
       W(k+1) = Pbar(k+1) H^T(k+1) S^{-1}(k+1)              (gain)
       xhat(k+1) = xbar(k+1) + W(k+1) nu(k+1)               (a posteriori or filtered state estimate)
       P(k+1) = Pbar(k+1) - W(k+1) S(k+1) W^T(k+1)          (a posteriori state error covariance)
3) k <- k+1
4) Go to (1).
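A compact sketch of one filter cycle in MATLAB (all system matrices and data are illustrative):

    % Sketch: one propagate + update cycle of the discrete-time Kalman filter.
    F = [1 0.1; 0 1];  G = [0; 0.1];  Q = 0.01*eye(2);
    H = [1 0];         R = 0.25;
    x_hat = [0; 0];    P = eye(2);    u = 1;     z = 0.3;   % illustrative data
    % propagation
    x_bar = F*x_hat + G*u;
    P_bar = F*P*F' + Q;
    % measurement update
    nu = z - H*x_bar;                  % innovation
    S  = H*P_bar*H' + R;               % innovation covariance
    W  = P_bar*H'/S;                   % Kalman gain
    x_hat = x_bar + W*nu;
    P     = P_bar - W*S*W';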
Equivalent forms of the covariance update are

    P(k+1) = [I - W(k+1) H(k+1)] Pbar(k+1)                              (341)
    P(k+1) = [I - W(k+1) H(k+1)] Pbar(k+1) [I - W(k+1) H(k+1)]^T
             + W(k+1) R(k+1) W^T(k+1)                                   (342)

The second of these is called the Joseph form of the state covariance update. It guarantees that P(k+1) stays symmetric and positive semi-definite even with round-off errors.

One can also show that

    W(k+1) = P(k+1) H^T(k+1) R^{-1}(k+1)                                (343)

Since P(k+1) depends on W(k+1), can this expression be used to compute W(k+1)? Only implicitly.     (344)-(345)

Remarks
- As the measurement becomes worthless (e.g. R -> infinity), W(k+1) -> 0 and xhat(k+1) approaches xbar(k+1).
- If Pbar(k+1) is so big that R is negligible, the update relies essentially on the measurement alone, and the a posteriori covariance approaches an upper limit set by the measurement geometry.
- If nz < nx, the measurement only provides information in the nz-dimensional subspace that is the range of H(k+1); range[H(k+1)] determines where the state update acts.

(Figure 19: the state update occurs in the subspace in which the measurements provide information.)

Process-noise gain: more generally the dynamics may include a gain matrix Gamma(k) on the process noise, x(k+1) = F(k) x(k) + G(k) u(k) + Gamma(k) v(k); setting Gamma = I recovers the previous (unweighted) form. With this generalization, the only change in the Kalman filter is in the covariance propagation, which is now given by:

    Pbar(k+1) = F(k) P(k) F^T(k) + Gamma(k) Q(k) Gamma^T(k)             (346)-(347)
MAP derivation of the Kalman filter. The MMSE estimate is E[x | Z^k]. For discrete, statistical, linear, time-varying (SLTV) systems, the result will be equivalent to the MMSE-based derivation:

    xhat_MAP(k) = xhat_MMSE(k)                                          (348)

Now maximize the joint posterior of x(k+1) and v(k), or equivalently minimize the cost function

    J(x(k+1), v(k)) = -log( p[z(k+1) | x(k+1), v(k)] ) - log( p[x(k+1), v(k)] )        (349)

Note that p_{x(k+1),v(k)}[x(k+1), v(k)] = C p_{x(k),v(k)}[x(k), v(k)], where C is a constant and the subscripts make clear that these are two distinct probability distributions.

Q: Why can we make the above transformation from p[x(k+1), v(k)] to p[x(k), v(k)]?
A: Because x(k+1) is a function of x(k). More generally, for any invertible 1-to-1 function Y = g(X), it can be shown that

    p_Y[y] = p_X[g^{-1}(y)] / |dy/dx|

Applying this transformation to the probability leads to an additive constant in the cost function because of the log, and then this constant can be ignored in minimizing the cost function. The cost becomes

    J(x(k+1), v(k)) = (1/2)[z(k+1) - H(k+1) x(k+1)]^T R^{-1}(k+1) [z(k+1) - H(k+1) x(k+1)]
                      + (1/2)[x(k) - xhat(k)]^T P^{-1}(k) [x(k) - xhat(k)]
                      + (1/2) v^T(k) Q^{-1}(k) v(k)                     (350)

Replace x(k) with F^{-1}(k)[x(k+1) - G(k) u(k) - Gamma(k) v(k)]:

    J = (1/2)[z(k+1) - H(k+1) x(k+1)]^T R^{-1}(k+1) [z(k+1) - H(k+1) x(k+1)]
        + (1/2)[F^{-1}(k)(x(k+1) - G(k) u(k) - Gamma(k) v(k)) - xhat(k)]^T P^{-1}(k) [same bracket]
        + (1/2) v^T(k) Q^{-1}(k) v(k)                                   (351)

Using xbar(k+1) = F(k) xhat(k) + G(k) u(k), this can be written

    J = (1/2)[z(k+1) - H(k+1) x(k+1)]^T R^{-1}(k+1) [z(k+1) - H(k+1) x(k+1)]
        + (1/2)[F^{-1}(k)(x(k+1) - Gamma(k) v(k) - xbar(k+1))]^T P^{-1}(k) [F^{-1}(k)(x(k+1) - Gamma(k) v(k) - xbar(k+1))]
        + (1/2) v^T(k) Q^{-1}(k) v(k)                                   (352)
Minimize J with respect to x(k+1) and v(k):

    ( dJ/d v(k) )^T = 0        (a)                                      (353)
    ( dJ/d x(k+1) )^T = 0      (b)                                      (354)

Now we wish to reformulate (a) and (b) above into a form such that

    [block matrix] [ x(k+1) ; v(k) ] = [ C1 ; C2 ]                      (355)

From (a), solve for v(k) in terms of x(k+1). Substitute this result into (b), and then collect terms and manipulate to simplify, defining the a priori covariance

    Pbar(k+1) := F(k) P(k) F^T(k) + Gamma(k) Q(k) Gamma^T(k)            (356)-(360)

After this algebra, solving for x(k+1) gives the a posteriori estimate xhat(k+1):        (361)

    xhat(k+1) = [ Pbar^{-1}(k+1) + H^T(k+1) R^{-1}(k+1) H(k+1) ]^{-1}
                [ Pbar^{-1}(k+1) xbar(k+1) + H^T(k+1) R^{-1}(k+1) z(k+1) ]
              = xbar(k+1) + [ Pbar^{-1}(k+1) + H^T(k+1) R^{-1}(k+1) H(k+1) ]^{-1}
                            H^T(k+1) R^{-1}(k+1) [ z(k+1) - H(k+1) xbar(k+1) ]
              = xbar(k+1) + W(k+1) [ z(k+1) - H(k+1) xbar(k+1) ]        (362)-(364)

As expected this agrees with the previous derivation. We can also substitute this back into (a) to get an estimate of the process noise:

    vhat(k) = Q(k) Gamma^T(k) Pbar^{-1}(k+1) [ xhat(k+1) - xbar(k+1) ]  (365)-(366)

This is extra information which we did not get from the previous MMSE derivation.

The end result for the state estimate xhat(k+1) is the same as before, with W(k+1) defined as before:

    xhat(k+1) = xbar(k+1) + W(k+1) [ z(k+1) - H(k+1) xbar(k+1) ]        (367)-(368)

Note vhat(k) != v(k); vhat(k) is conditioned on z(k+1), and it is not zero even though E[v(k)] = 0.
13.1 Stability of KF

Assume v(k) = w(k) = 0 for all k (zero-input stability). We want to show that the estimation error decays to zero.          (369)

Pseudo proof: define

    e(k+1) = x(k+1) - xhat(k+1)                                         (370)-(371)

However, x(k+1) = F(k) x(k) + G(k) u(k) and xhat(k+1) = xbar(k+1) + W(k+1)[z(k+1) - H(k+1) xbar(k+1)] with xbar(k+1) = F(k) xhat(k) + G(k) u(k). Substituting the above causes the u(k)'s to cancel, and with v(k) = w(k) = 0 for all k:          (372)-(373)

    e(k+1) = [ I - W(k+1) H(k+1) ] F(k) e(k)                            (374)

It is a little tricky to prove stability because this is a time varying system. If the system was not time varying, it would be possible to look at the modulus of the eigenvalues. To analyze, we will use a Lyapunov-type energy method to prove stability. Define an energy-like function

    V[k, e(k)] = (1/2) e^T(k) P^{-1}(k) e(k)                            (375)

Then,

    V[k+1, e(k+1)] = (1/2) e^T(k+1) P^{-1}(k+1) e(k+1)
                   = (1/2) e^T(k) [ P(k) + D(k) ]^{-1} e(k)             (376)-(377)

where D(k) >= 0, which implies V[k+1, e(k+1)] <= V[k, e(k)]: the energy is non-increasing.     (378)

Thus, provided the system is observable, and controllable with respect to points of entry of process noise ("stochastic controllability and observability"), we have a strict decrease over a window of steps.

Here observable implies that all unstable or neutrally stable subspaces of the original system are observable. However, the original system need not be stable! Then, we can show that:

1. P^{-1}(k) > 0 and P(k) remains bounded.
2. V[k+N, e(k+N)] <= gamma V[k, e(k)] for some gamma with 0 < gamma < 1 and some N.

Thus e(k) -> 0.                                                         (379)-(380)
(k + 1) = z(k + 1) H(k + 1)x(k
S(k + 1) = H(k + 1)P (k + 1)H T (k + 1) + R(k + 1)
W (k + 1) = P (k + 1)H T (k + 1)S 1 (k + 1)
+ 1) = x(k
+ 1) + W (k + 1)(k + 1)
x(k
P (k + 1) = P (k + 1) W (k + 1)S(k + 1)W T (k + 1)
Now, suppose we have a
ontrol law
by:
u(k) = C(k)x(k).
Innovation
(381)
Innovation Covarian e
(382)
(383)
(384)
(385)
(F, G)
(386)
expensive, impra ti al, or impossible to measure all of the states. What if we use
x(k)
instead?
Q:
an we measure
z(k),
estimate
x(k),
and
feed ba k
x(k
G(k)C(k)x(k)]+
x(k)
x(k)
x(k)
=
.
x(k)
e(k)
x(k) x(k)
(387)
(388)
(389)
71
(390)
(391)
(392)
on the state (not true for nonlinear systems). We already showed that
any linear KF that satises the sto
hasti
observability
onditions, et
.
e(k) 6= 0
e(k) 0
as
for
(393)
C(k).
This is
alled the separation prin
iple. That is, one
an design a full-state feedba
k
ontrol law
separately from the KF. When
onne
ted, the system will be stable. The
ombined system ends up
with the properties of the two independent systems (in terms of poles and zeros). In pra
ti
e, the
poles of the observer should be to the left (i.e. faster) than those of the
ontrolled plant or pro
ess.
Note: If the estimator has modeling errors (e.g. wrong F, H, Q, or R) or a bad P(k), the guarantees above need not hold.

Substituting the measurement update into the covariance propagation gives a single recursion for Pbar(k):          (394)

    Pbar(k+1) = F(k) { Pbar(k) - Pbar(k) H^T(k) [ H(k) Pbar(k) H^T(k) + R(k) ]^{-1} H(k) Pbar(k) } F^T(k)
                + Gamma(k) Q(k) Gamma^T(k)                              (395)-(396)

which is nonlinear in Pbar(k). This is the (discrete-time) matrix Riccati equation (MRE). Beware: analysis of the MRE is not easy! Special case: steady-state solution for LTI systems, F(k) = F, G(k) = G, etc. If the pair (F, H) is observable and (F, Gamma) is controllable, and Q, R > 0, then Pbar(k) converges to a unique Pss > 0, which can be determined independently of the initial P(0) > 0.          (397)
The steady-state Kalman filter uses the constant gain Wss computed from Pss:          (398)

    xbar(k+1) = F xhat(k) + G u(k)                                      (399)
    xhat(k+1) = xbar(k+1) + Wss [ z(k+1) - H xbar(k+1) ]                (400)-(401)

The error dynamics are then governed by the constant matrix

    Ass := ( I - Wss H ) F                                              (402)

and e(k) -> 0 when the eigenvalues of Ass lie inside the unit circle.   (403)

We can also write the process noise estimate as

    vhat(k) = [ Gamma^T F^{-T} P^{-1} F^{-1} Gamma + Q^{-1} ]^{-1} Gamma^T F^{-T} P^{-1} F^{-1} [ xhat(k+1) - xbar(k+1) ]     (404)
            = Q(k) Gamma^T(k) Pbar^{-1}(k+1) [ xhat(k+1) - xbar(k+1) ]  (405)

where Pbar(k+1) := F P F^T + Gamma Q Gamma^T.                           (406)
Then, manipulating the FONC into information (H^T R^{-1} ...) form,     (407)-(414)

we can solve for xhat(k+1) such that

    xhat(k+1) = [ Pbar^{-1} + H^T R^{-1} H ]^{-1} [ Pbar^{-1} xbar(k+1) + H^T R^{-1} z(k+1) ]        (415)

This is just like the recursive least-squares form. With further manipulation we can get:

    xhat(k+1) = xbar(k+1) + [ Pbar^{-1} + H^T(k+1) R^{-1}(k+1) H(k+1) ]^{-1}
                            H^T(k+1) R^{-1}(k+1) [ z(k+1) - H(k+1) xbar(k+1) ]
              = xbar(k+1) + W(k+1) [ z(k+1) - H(k+1) xbar(k+1) ]        (416)-(418)

and for the process noise,

    vhat(k) = E[v(k) | Z^{k+1}] = Q(k) Gamma^T(k) Pbar^{-1}(k+1) [ xhat(k+1) - xbar(k+1) ]
            = Q(k) Gamma^T(k) Pbar^{-1}(k+1) W(k+1) [ z(k+1) - H(k+1) xbar(k+1) ]        (419)-(420)
If the filter misbehaves, the KF model may be wrong: there may be a modeling error, or a coding error, or the system may be subject to colored noise (i.e. non-white noise). NB: well-behaved innovations are only a necessary condition for filter correctness, not sufficient.

The minimized cost at each step is

    min J[x(k+1), v(k)] = J[xhat(k+1), vhat(k)] = (1/2) nu^T(k+1) S^{-1}(k+1) nu(k+1)        (421)-(422)

Use this to get the likelihood function for the correctness of the KF model:

    P[Z^k | KF model] = C exp{-J[xhat(1), nu(0)]} exp{-J[xhat(2), nu(1)]} ... exp{-J[xhat(k), nu(k-1)]}
                      = C exp{ -(1/2) sum_{j=1}^{k} nu^T(j) S^{-1}(j) nu(j) }                (423)-(424)

Given candidate models (F, Gamma, Q, R, H, xhat(0), P(0)), choose the one that maximizes this likelihood, i.e. minimizes

    J := sum_{j=1}^{k} nu^T(j) S^{-1}(j) nu(j)                          (425)

That is, the one with the minimum weighted least-squares innovation error. This leads to the Multiple Model approach.
Filter consistency. Ideally the estimation error xtilde(k) would satisfy E[xtilde(k) xtilde^T(k)] -> 0, but this doesn't hold in most cases; the culprit is the process noise, as evidenced by the fact that P(k) does not go to zero. Hence, for our purposes we decrease our requirements for consistency to:          (426)

    E[xtilde(k)] = 0                                                    (427)
    E[xtilde(k) xtilde^T(k)] = P(k) = J^{-1}(k)                         (428)

where J(k) is the Fisher information. (In contrast, parameter-estimator consistency is an asymptotic (infinite sample size) property.)

Typically the consistency criteria of the filter are as follows:
1. The state errors should have zero mean and have covariance matrix as calculated by the filter.
2. The innovations should also have the same property as mentioned in 1.
3. The innovations should be acceptable as white.

The first criterion, which is the most important one, can be tested only in simulation (Monte Carlo simulations). The last two criteria can be tested on real data (single-run / multiple-runs). In theory these properties should hold but in practice they might not.

The tests are of two kinds: first, simulation tests using a truth model (Monte Carlo simulations); second, real-time tests using real-time measurements, by single-run or multiple-runs of the experiment (if the experiment can be repeated).
Monte Carlo (NEES) test. Simulate the truth model to generate true states x(k) and measurement vectors z(k). The measurement vectors are then used as input to the Kalman filter (under evaluation) and estimated states xhat(k) are generated. Define

    xtilde(k) = x(k) - xhat(k)                                          (429)

    eps(k) = xtilde^T(k) P^{-1}(k) xtilde(k)                            (430)

If the KF is working properly, then eps(k) is chi-square distributed with nx degrees of freedom.

To demonstrate this further, we decompose P^{-1}(k) using V(k) V^T(k) = I and a diagonal matrix Lambda(k), and let

    y(k) = Lambda^{-1/2}(k) V^T(k) xtilde(k)                            (431)

then

    eps(k) = y^T(k) y(k) = sum_{i=1}^{nx} yi^2                          (432)

which is distributed as chi-square with nx degrees of freedom, since the yi are independent, zero-mean, unit-variance Gaussians when the filter is consistent.

We can do N Monte Carlo simulations of our truth model, filter the measurements, and check the averaged NEES. Let

    epsbar(k) = (1/N) sum_{i=1}^{N} eps^i(k)                            (433)

where eps^i(k) denotes the NEES of the ith run. Then N epsbar(k) is distributed as chi-square with N nx degrees of freedom.

(Figure: chi-square density with mean at N nx; the central (1 - alpha) probability region lies between the lower and upper bounds N r1 and N r2, with probability alpha/2 in each tail.)

If the filter is consistent, epsbar(k) should be limited to

    r1 <= epsbar(k) <= r2

(1 - alpha) x 100% of the time. We usually choose r1 and r2 such that

    integral_{N r1}^{N r2} p(eps) d eps = 1 - alpha,    with alpha = 0.01 or 0.05        (434)

In MATLAB:

    r1 = chi2inv(alpha/2, N*nx)/N     and     r2 = chi2inv(1 - alpha/2, N*nx)/N

Note: If these limits are violated then something is wrong with the filter.
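A short sketch of the NEES bound computation (N, nx, and alpha are illustrative):

    % Sketch: two-sided (1-alpha) bounds on the N-run average NEES.
    N = 50; nx = 2; alpha = 0.05;
    r1 = chi2inv(alpha/2,   N*nx)/N;    % lower bound on epsbar(k)
    r2 = chi2inv(1-alpha/2, N*nx)/N;    % upper bound on epsbar(k)
    % A consistent filter keeps epsbar(k) within [r1, r2] about 95% of the time.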
14.2.2 Real-Time (Multiple-Runs) Tests

This test is done on the filter (KF) based on real-time data for the dynamic model under evaluation. The test is applicable for experiments that can be repeated in the real world; hence the dynamic model under evaluation should be available for real-time runs.

First compute the normalized innovation squared (NIS):

    eps_nu(k) = nu^T(k) S^{-1}(k) nu(k)                                 (435)

which, for a consistent filter, is distributed as chi-square with nz degrees of freedom. We can do N runs and average:

    epsbar_nu(k) = (1/N) sum_{i=1}^{N} eps_nu^i(k)                      (436)

Note that N epsbar_nu(k) is distributed as chi-square with N nz degrees of freedom. If the filter is consistent, epsbar_nu(k) should be limited to r1 <= epsbar_nu(k) <= r2, (1 - alpha) x 100% of the times, where r1 and r2 are given by

In MATLAB:   r1 = chi2inv(alpha/2, N*nz)/N   and   r2 = chi2inv(1 - alpha/2, N*nz)/N

Note: If these limits are violated then something is wrong with the filter.
Whiteness test (multiple runs). For the innovations nu(k), define the sample cross-correlation statistic over N runs:

    rhobar_lm(k, j) = sum_{i=1}^{N} nu_l^i(k) nu_m^i(j)
                      / sqrt( sum_{i=1}^{N} [nu_l^i(k)]^2  sum_{i=1}^{N} [nu_m^i(j)]^2 )        (437)

When k = j (and l = m), rhobar_lm(k, k) = 1. If k != j, then we expect rhobar_lm(k, j) to be small as compared to 1. For N large enough and k != j, this statistic can be approximated as zero mean with 1/N as variance (approximately normally distributed), so a (1 - alpha) x 100% acceptance region is        (438)

    |rhobar_lm(k, j)| <= r                                              (439)

where r is given as: r = norminv(1 - alpha/2, 0, 1/sqrt(N)). If Q(k) and R(k) are correct, the innovations nu(k) should pass these tests.
(N )
at l = m and just look for k and k + 1
on a single run, they might have a high variability. The question is whether one
an a
hieve a low
variability based on a single run, as a real-time implementation. This test for lter
onsisten
y is
alled
These test are based on repla ing the ensemble averages by time averages based on the
ergodi ity
1X T
(k)S 1 (k)(k)
=
(440)
k=1
is distributed as
then,
2nz .
Similarly whiteness test an be done. The whiteness test statisti s for innovations, whi h are
P
k=1 l (k)m (k + j)
lm (j) = q P
2 P
2
(441)
E[lm (j)] = 0
E[lm (j)Tlm (j)] =
(442)
(443)
Q(k)
1). If
are orre t.
(k)
79
Filter tuning: if the consistency tests fail, the usual knob is Q(k), which changes how strongly the filter corrects xhat(k) in response to each measurement. Recall

    Pbar(k+1) = F(k) P(k) F^T(k) + Gamma(k) Q(k) Gamma^T(k)

Increasing Q(k) increases Pbar(k+1) and hence the gain (through S^{-1}(k+1)), so the filter weights new measurements more heavily; decreasing Q(k) does the opposite.
Colored noise and shaping filters. Consider a continuous-time noise vtilde(t) with power spectrum Svv(f) = V = const. By the Wiener-Khinchin theorem, take the inverse Fourier transform to recover the autocorrelation function:

    E[ vtilde(t) vtilde(t + tau) ] = R_vv(tau) = F^{-1}[ Svv(f) ] = V delta(tau)        (444)

i.e. the noise is uncorrelated in time. This implies that a nonuniform power spectrum leads to auto-correlated (colored) noise.

(Figure: a flat spectrum corresponds to white noise; a rolled-off spectrum corresponds to autocorrelated noise v(t), w(t) entering the original system alongside the input u(t).)

We handle colored noise by driving the original system with the output of a shaping filter whose input is white. Let (writing x_s for the shaping-filter state)

    ntilde(t) = [ vtilde(t) ; wtilde(t) ],    E[vtilde(t)] = E[wtilde(t)] = E[ntilde(t)] = 0     (445)-(447)

    d x_s/dt = A_s x_s(t) + B_s ntilde(t)                               (448)
    [ v(t) ; w(t) ] = C_s x_s(t) + D_s ntilde(t)                        (449)

(Figure: white noise ntilde(t) passes through shaping filters G_v(s), G_w(s) to produce the colored v(t), w(t) that drive the original system.)

The output of the shaping system can be used to drive the original system. The augmented dynamics become:

    [ dx/dt   ]   [ A(t)   D(t) C_s ] [ x(t)   ]   [ B(t) ]          [ D(t) D_s ]
    [ dx_s/dt ] = [ 0      A_s      ] [ x_s(t) ] + [ 0    ] u(t)  +  [ B_s      ] ntilde(t)        (450)

(the block matrix is the new A matrix), and the measurement equation is augmented similarly so that z(t) picks up the shaping-filter states that generate w(t).          (451)

Given desired spectra Svv(f) and Sww(f) (as rational functions N(f)/D(f)), one can derive the shaping filter by spectral factorization and realization (next).
82
Realization Problem,
(s) u
y = G
(s)
(452)
x (t) =
(453)
y (t) =
(454)
For stri
tly proper transfer fun
tions (the degree of the numerator is less than the degree of the
denominator) one
an
reate a
ontrollable
anoni
al form or an observable
anoni
al form. We do
this by exposing the
oe
ients of the numerator and denominator of the transfer fun
tion. As an
example:
Given a transfer fun
tion:
G (s) =
n1 s2 + n2 s + n3
s3 + d1 s2 + d2 s + d3
(455)
A state spa e model that is guaranteed to be ontrollable will take the form:
d2 d3
1
0
0 x (t) + 0 u (t)
1
0
0
y (t) = n1 n2 n3 x (t)
d1
x (t) = 1
0
(456)
(457)
Related MATLAB fun tions to investigate are: tf, ss, zpk, frd, ssdata, tf2ss.
Spe
tral fa
torization involves taking a transfer fun
tion su
h as the one shown in Bar Shalom
on p. 67:
Svv () = S0
1
a2 + 2
(458)
And splitting it into two fun
tions, one part with all the Right Hand Plane (RHP) roots and
the other in
luding all the Left Hand Plane (LHP) roots.
Svv () =
1
1
S0
a + j a j
H () =
1
a + j
83
(459)
(460)
16 Information Filter/SRIF

Scribe: Ken Pesyna

16.1 Information Filter

Recall that the a posteriori covariance P(k+1) can be defined in terms of its inverse:

    P^{-1}(k+1) = Pbar^{-1}(k+1) + H^T(k+1) R^{-1}(k+1) H(k+1)

The H^T R^{-1} H term is a matrix squaring operation. However, the matrix squaring operation is a bad idea numerically: it squares the condition number of the matrices.

Note that P(k+1) = R_p^T R_p with R_p > 0 (a square-root factor always exists). We may also write P(k+1) in the usual update form

    P(k+1) = Pbar(k+1) - W(k+1) S^{-1}(k+1) W^T(k+1)                    (461)

Bar-Shalom introduces the square root covariance filter, which keeps track of the square root of the covariance matrix. But this requires the ability to update a Cholesky factorization.

The information filter instead propagates P^{-1}(k). Define

    ybar(k) = Pbar^{-1}(k) xbar(k),    Ibar(k) = Pbar^{-1}(k)           (462)-(463)
    yhat(k) = P^{-1}(k) xhat(k),       I(k) = P^{-1}(k)                 (464)-(465)

where I(k) is the information matrix, the inverse of the covariance matrix.          (466)

We can substitute these definitions into the Kalman filter to get the Information Filter. After much algebra, including the matrix inversion lemma, we arrive at the following. Let A(k) := F^{-T}(k) I(k) F^{-1}(k). Then

    ybar(k+1) = { I - A(k) Gamma(k) [ Gamma^T(k) A(k) Gamma(k) + Q^{-1}(k) ]^{-1} Gamma^T(k) } F^{-T}(k) yhat(k)
                + Ibar(k+1) G(k) u(k)                                   (467)

    Ibar(k+1) = A(k) - A(k) Gamma(k) [ Gamma^T(k) A(k) Gamma(k) + Q^{-1}(k) ]^{-1} Gamma^T(k) A(k)        (468)

The subtracted term shows that process noise decreases the information during the propagation step. This is similar to a hole in a metaphorical information bucket; if Qk = 0, no information leaks out.        (469)-(470)

The measurement update is simple in information coordinates:

    yhat(k+1) = ybar(k+1) + H^T(k+1) R^{-1}(k+1) z(k+1)
    I(k+1) = Ibar(k+1) + H^T(k+1) R^{-1}(k+1) H(k+1)                    (471)

(the added term is the information gained from the measurement), and the state and covariance are recovered by:

    xhat(k+1) = I^{-1}(k+1) yhat(k+1)                                   (472)
    P(k+1) = I^{-1}(k+1)                                                (473)

The information filter is attractive when nz > nx, nv, and when R(k) is easy to invert. It also handles the case of no prior information: we can set Ibar(0) = 0. This represents the diffuse prior, i.e. no idea of our initial state. Setting the initial prior to be diffuse cannot be as easily done with the regular Kalman Filter: it corresponds to setting the initial error covariance matrix to infinity, but limited numerical precision limits our ability to do so in real systems. We cannot compute xhat(k) = I^{-1}(k) yhat(k) until I(k) becomes invertible; if the system is observable, then I(k) eventually becomes invertible from the accumulated H^T R^{-1} H terms.
The Square-Root Information Filter (SRIF) works with square roots of the information matrices. Define

Rxxᵀ(k) Rxx(k) = Î(k)        (474)
R̄xxᵀ(k) R̄xx(k) = Ī(k)        (475)

where Rxx(k) and R̄xx(k) are the (upper-triangular) square-root information matrices corresponding to Î(k) and Ī(k), with corresponding right-hand sides

zx(k) ≜ Rxx(k) x̂(k)        (476)
z̄x(k) ≜ R̄xx(k) x̄(k)        (477)
Also let R_aᵀ(k) R_a(k) = R(k), and normalize the measurement equation by R_a⁻ᵀ(k):

z_a(k) = R_a⁻ᵀ(k) z(k),   H_a(k) = R_a⁻ᵀ(k) H(k),   w_a(k) = R_a⁻ᵀ(k) w(k)        (478)-(482)

so that

E[w_a(k)] = 0        (483)
E[w_a(k) w_aᵀ(j)] = δ_kj I        (484)-(485)

The process noise is zero-mean and white,

E[v(k)] = 0        (486)
E[v(k) vᵀ(j)] = δ_kj Q(k)        (487)

and is assumed uncorrelated with x(k). Its square-root information matrix satisfies

R̄vvᵀ(k) R̄vv(k) = Q⁻¹(k)        (488)

Note: the square-root information equations for the state and process noise are

zx(k) = Rxx(k) x(k) + wx(k)        (489)
zv(k) = Rvv(k) v(k) + wv(k)        (490)
E[wx(k) wvᵀ(k)] = 0        (491)
These square root information equations store, or encode, the state and process noise estimates and their covariances. We can recover our estimates from the information equations as long as Rxx(k) is invertible. If Rxx(k) is not invertible, then the system is not observable from the data through time k, and the estimate cannot yet be recovered. Note that Rxx(k) is upper triangular.
Let's now decode the state from the state information equation:

x(k) = Rxx⁻¹(k) [zx(k) − wx(k)]        (492)

Suppose we want our best estimate of x(k), denoted x̂(k):

x̂(k) = E[x(k) | k]        (493)
      = Rxx⁻¹(k) E[zx(k) | k] − Rxx⁻¹(k) E[wx(k) | k]        (494)
      = Rxx⁻¹(k) zx(k)        (495)

since the second expectation is zero. The estimation error is

x̃(k) = x(k) − x̂(k) = −Rxx⁻¹(k) wx(k)        (496)
P(k) = E[x̃(k) x̃ᵀ(k) | k]        (497)
     = Rxx⁻¹(k) E[wx(k) wxᵀ(k)] Rxx⁻ᵀ(k)        (498)
     = Rxx⁻¹(k) Rxx⁻ᵀ(k)        (499)
     = Î⁻¹(k)        (500)-(501)
Similarly, for the process noise,

E[(v(k) − v̂(k)) (v(k) − v̂(k))ᵀ | k] = Q(k)        (502)
The MAP estimate maximizes the a posteriori conditional probability density function. This amounts to minimizing the cost function:

Ja[x(k), v(k), x(k+1), k] = −log(p)
  = ½ [x(k) − x̂(k)]ᵀ P⁻¹(k) [...] + ½ vᵀ(k) Q⁻¹(k) v(k)
    + ½ [z(k+1) − H(k+1) x(k+1)]ᵀ R⁻¹(k+1) [...]        (503)-(505)

After normalization of the above form, an alternative formulation of the cost function based on square root information notation is:

Ja[x(k), v(k), x(k+1), k] = ½ ‖Rxx(k) x(k) − zx(k)‖²   (a priori x(k))
    + ½ ‖zv(k) − Rvv(k) v(k)‖²   (a priori v(k))
    + ½ ‖Ha(k+1) x(k+1) − za(k+1)‖²   (measurement at k+1)        (506)

The insight here is that the prior estimate of the state and process noise can be expressed as a measurement and thus formulated into the above cost function.
Our task is to minimize Ja. Use the dynamics to eliminate x(k) in terms of x(k+1):

x(k) = F⁻¹(k) [x(k+1) − Γ(k) v(k) − G(k) u(k)]        (507)

Substituting this expression for x(k) into Ja turns it into Jb:
Jb[v(k), x(k+1), k] =
  ½ ‖ [ Rvv(k)               0            ;
        −Rxx(k)F⁻¹(k)Γ(k)    Rxx(k)F⁻¹(k) ] [ v(k) ; x(k+1) ]
      − [ 0 ; zx(k) + Rxx(k)F⁻¹(k)G(k)u(k) ] ‖²        (508)-(509)
  + ½ ‖ Ha(k+1) x(k+1) − za(k+1) ‖²        (510)

where the stacked matrix is the "big block matrix."
In the equation above, we used the following identity for the first term:

‖a‖² + ‖b‖² = ‖ [ a ; b ] ‖²        (511)

Also, ‖T v‖ = ‖v‖ for any orthonormal T. Let Ta(k) = Q_aᵀ(k) from a QR factorization of the big block matrix; Ta is orthonormal. Multiplying the inside of the first term by Ta(k) gives:
Jb[v(k), x(k+1), k] =
  ½ ‖ [ R̄vv(k)   R̄vx(k+1) ;
        0         R̄xx(k+1) ] [ v(k) ; x(k+1) ] − [ z̄v(k) ; z̄x(k+1) ] ‖²
  + ½ ‖ Ha(k+1) x(k+1) − za(k+1) ‖²        (512)
1. The first block row is the a priori square root information equation for v(k) as a function of x(k+1):

z̄v(k) = R̄vv(k) v(k) + R̄vx(k+1) x(k+1) + w̄v(k),   w̄v(k) ~ (0, I)        (513)

This equation is a by-product of the filtering process. It is not used to determine the filtered state estimate, but will be used in smoothing. Filtering implies causality, smoothing implies noncausality (it can use future information).

2. The second block row is the a priori square root information equation for the state:

z̄x(k+1) = R̄xx(k+1) x(k+1) + w̄x(k),   w̄x(k) ~ (0, I)        (514)

Now minimize Jb with respect to v(k):
0 = ∂Jb/∂v(k) = R̄vvᵀ(k) { R̄vv(k) v(k) + R̄vx(k+1) x(k+1) − z̄v(k) }        (515)

Since R̄vvᵀ(k) is non-singular, this yields:

v(k) = R̄vv⁻¹(k) [ z̄v(k) − R̄vx(k+1) x(k+1) ]        (516)
Substitute the solution in Eq. 516 into Eq. 512 and stack the remaining terms to get a new yet equivalent cost function:

Jc[x(k+1), k+1] = ½ ‖ [ R̄xx(k+1) ; Ha(k+1) ] x(k+1) − [ z̄x(k+1) ; za(k+1) ] ‖²        (517)

Call the stacked matrix [ R̄xx(k+1) ; Ha(k+1) ] "Matrix A." If Matrix A were square (and non-singular) we could just take its inverse to compute the filter's best estimate x̂(k+1) of x(k+1). It is not square, however, so we instead transform it to be upper
triangular. This will decouple the cost function into a component that depends on x(k+1) and one that does not. We do this by performing a QR factorization on Matrix A and applying the resulting orthonormal transformation to the cost function, as before, to get:

Jc(x(k+1), k+1) = ½ ‖ [ Rxx(k+1) ; 0 ] x(k+1) − [ zx(k+1) ; zr(k+1) ] ‖²        (518)

where Rxx(k+1) is upper triangular. The lack of bars above the terms indicates that we have gone from a priori to a posteriori. Unstack to get:

Jc[x(k+1), k+1] = ½ ‖ Rxx(k+1) x(k+1) − zx(k+1) ‖² + ½ ‖ zr(k+1) ‖²        (519)
Now unpack the implicit square root information equations from this cost function to get:

1. The a posteriori square root information equation for the state:

zx(k+1) = Rxx(k+1) x(k+1) + wx(k+1),   wx(k+1) ~ (0, I)        (520)

2. The residual equation:

zr(k+1) = wr(k+1),   wr(k+1) ~ (0, I)        (521)
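Numerically, Eqs. (517)-(521) amount to one QR factorization of the stacked matrix. A minimal sketch of this measurement-update step is below, assuming the a priori pair (R̄xx, z̄x) and the normalized measurement pair (Ha, za) are already available; dimensions and values are placeholders.

```python
import numpy as np

def srif_meas_update(Rxx_bar, zx_bar, Ha, za):
    """QR-based SRIF measurement update (Eqs. 517-521).

    Stacks the prior information equation on top of the normalized
    measurement equation and re-triangularizes with one QR factorization.
    """
    nx = Rxx_bar.shape[0]
    A = np.vstack([Rxx_bar, Ha])                 # "Matrix A" in Eq. (517)
    b = np.concatenate([zx_bar, za])
    Q, R = np.linalg.qr(A, mode="complete")      # A = Q R, with Q orthonormal
    Ta = Q.T                                     # Ta(k+1) = Q^T
    Rxx = R[:nx, :]                              # upper-triangular posterior SRI matrix
    zb = Ta @ b
    zx, zr = zb[:nx], zb[nx:]                    # Eqs. (520)-(521)
    x_hat = np.linalg.solve(Rxx, zx)             # Eq. (527): x_hat = Rxx^{-1} zx
    return Rxx, zx, zr, x_hat
```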
Aside:
Q: Where do the wx(k+1) and wr(k+1) come from?
A: They come from the orthogonal transformation that is applied to the cost function, i.e. Eq. 517, after performing the QR factorization and transforming the matrices. First, to make things clear, let's unpack Eq. 517 into its implicit square root information equations:

z̄x(k+1) = R̄xx(k+1) x(k+1) + w̄x(k+1)        (522)
za(k+1) = Ha(k+1) x(k+1) + wa(k+1)        (523)

with error terms w̄x and wa. Applying the orthonormal transformation Ta(k+1), we arrive at:

[ Rxx(k+1) ; 0 ] = Ta(k+1) [ R̄xx(k+1) ; Ha(k+1) ]        (524)
[ zx(k+1) ; zr(k+1) ] = Ta(k+1) [ z̄x(k+1) ; za(k+1) ]        (525)
[ wx(k+1) ; wr(k+1) ] = Ta(k+1) [ w̄x(k+1) ; wa(k+1) ]        (526)

Because Ta(k+1) is orthonormal, the transformed noise terms remain zero-mean with identity covariance, and

x̂(k+1) = Rxx⁻¹(k+1) zx(k+1)        (527)
Note: F(k) and the other terms appearing in Jc are the quantities defined earlier within the normal Kalman filtering (non-square-root-information) context, and the covariance is recovered from the square-root information matrix by

P(k) = Rxx⁻¹(k) Rxx⁻ᵀ(k)        (528)
17 Smoothing
In filtering we estimate the current state x(k) as data arrive. In smoothing, the index of interest k is fixed while the amount of data keeps increasing: we estimate x(k) for k = 1, 2, ..., N using all N measurements. The smoothed quantities are denoted

x̂(k|N) = x*(k)
P(k|N) = P*(k)
v̂(k|N) = v*(k)
Preferred Implementation: Square-Root Information Smoother (SRIS).
Key Observation: The smoother equations fall out of the MAP estimation approach.

From the filtering pass we have stored, for k = 0, ..., N−1, the by-product square-root information equations (with w̄v(k) ~ (0, I)):

z̄v(0) = R̄vv(0) v(0) + R̄vx(1) x(1) + w̄v(0)
  ⋮
z̄v(N−1) = R̄vv(N−1) v(N−1) + R̄vx(N) x(N) + w̄v(N−1)

the residual equations

zr(1) = wr(1)
  ⋮
zr(N) = wr(N)

and the square-root information equation for the state at N:

zx(N) = Rxx(N) x(N) + wx(N)

Invoke the dynamics model to eliminate x(N) in favor of x(N−1), and proceed backward.
17.2 Steps
1. Let:

zx*(N) = zx(N)
Rxx*(N) = Rxx(N)
wx*(N) = wx(N)

2. (If needed) Compute:

x*(N) = Rxx*⁻¹(N) zx*(N)
P*(N) = Rxx*⁻¹(N) Rxx*⁻ᵀ(N)

3. Set k = N − 1.
4. The cost function associated with the square-root information equations at k can be written as

Ja[v(k), x(k), k] = ½ ‖ ... ‖²

in terms of v(k), x(k), and x(k+1). Substitute for x(k+1) using the dynamics, x(k+1) = F(k)x(k) + G(k)u(k) + Γ(k)v(k), to eliminate x(k+1) in favor of x(k). The stacked square-root information equations become:

[ z̄v(k) − R̄vx(k+1) G(k) u(k) ;
  zx*(k+1) − Rxx*(k+1) G(k) u(k) ]
  = [ R̄vv(k) + R̄vx(k+1) Γ(k)    R̄vx(k+1) F(k) ;
      Rxx*(k+1) Γ(k)              Rxx*(k+1) F(k) ] [ v(k) ; x(k) ]
  + [ w̄v(k) ; wx*(k+1) ]
Apply Ta(k) = Q_aᵀ(k) from a QR factorization of the stacked matrix above. This does not change the cost, but now the SR information equations are decoupled:

[ zv*(k) ; zx*(k) ] = [ Rvv*(k)   Rvx*(k) ;
                        0          Rxx*(k) ] [ v(k) ; x(k) ] + [ wv*(k) ; wx*(k) ]

where wv*(k) ~ (0, I) and wx*(k) ~ (0, I).
5. The decoupled cost function is

Ja[v(k), x(k)] = ½ ‖ Rvv*(k) v(k) + Rvx*(k) x(k) − zv*(k) ‖² + ½ ‖ Rxx*(k) x(k) − zx*(k) ‖²

and minimizing it gives the smoothed quantities:

x*(k) = Rxx*⁻¹(k) zx*(k) = E[x(k) | Z^N]
P*(k) = Rxx*⁻¹(k) Rxx*⁻ᵀ(k)
v*(k) = Rvv*⁻¹(k) [ zv*(k) − Rvx*(k) x*(k) ] = E[v(k) | Z^N]
Pvv*(k) = Rvv*⁻¹(k) [ I + Rvx*(k) Rxx*⁻¹(k) Rxx*⁻ᵀ(k) Rvx*ᵀ(k) ] Rvv*⁻ᵀ(k)
Pvx*(k) = −Rvv*⁻¹(k) Rvx*(k) Rxx*⁻¹(k) Rxx*⁻ᵀ(k)
6. If k = 0, stop. Otherwise, decrement k by 1 and go to step 4, now using the square-root information equations

zx*(k+1) = Rxx*(k+1) x(k+1) + wx*(k+1)        (529)
z̄v(k) = R̄vv(k) v(k) + R̄vx(k+1) x(k+1) + w̄v(k)        (530)
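The backward recursion in steps 4-6 is again one QR factorization per step. Here is a minimal sketch of one backward smoother step, assuming the filter stored (R̄vv, R̄vx, z̄v) at each k and (Rxx*, zx*) is available at k+1; the dynamics matrices F, G, Γ (Gam) and the control u are placeholders.

```python
import numpy as np

def sris_backward_step(Rvv_bar, Rvx_bar, zv_bar, Rxx_s, zx_s, F, G, Gam, u):
    """One backward step of the square-root information smoother (steps 4-5)."""
    nv, nx = Rvv_bar.shape[0], F.shape[0]
    # Stacked SR information equations in (v(k), x(k)) after eliminating x(k+1)
    big = np.block([[Rvv_bar + Rvx_bar @ Gam, Rvx_bar @ F],
                    [Rxx_s @ Gam,             Rxx_s @ F]])
    rhs = np.concatenate([zv_bar - Rvx_bar @ G @ u,
                          zx_s  - Rxx_s  @ G @ u])
    Q, R = np.linalg.qr(big, mode="complete")
    Tb = Q.T @ rhs
    Rvv_s, Rvx_s = R[:nv, :nv], R[:nv, nv:]        # decoupled blocks
    Rxx_s_new = R[nv:nv + nx, nv:]
    zv_s, zx_s_new = Tb[:nv], Tb[nv:nv + nx]
    # Smoothed state and process noise at k
    x_s = np.linalg.solve(Rxx_s_new, zx_s_new)
    v_s = np.linalg.solve(Rvv_s, zv_s - Rvx_s @ x_s)
    return Rxx_s_new, zx_s_new, x_s, v_s
```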
We need to construct this from continuous-time nonlinear models of the form

ẋ(t) = f(t, x(t), u(t)) + D(t) ṽ(t)        (531)-(532)

Recall that we already did this in Chapter 10 for linear systems, where

ẋ(t) = A(t) x(t) + B(t) u(t) + D(t) ṽ(t)        (533)

The discretized model is obtained by integrating over each sampling interval Δt = t_{k+1} − t_k, which is assumed to be small.        (534)-(539)

This is illustrated in Fig. 26, which shows that the value of the control is assumed constant (i.e. a zero-order hold) over each interval, and that the measurement z(k) is a sample of the continuous output at t_k. Let x_k(t) denote the solution of the dynamics on t_k ≤ t < t_{k+1} with initial condition x_k(t_k) = x(k); this sets up the discretization problem.        (540)-(542)
We can solve for x_k(t) on t_k ≤ t < t_{k+1}. The solution depends on x(k), u(k), and v(k). Let f[k, x(k), u(k), v(k)] = x_k(t_{k+1}), where f[·] is some procedure for integrating forward to t_{k+1}. In MATLAB, this integration procedure could be ode45 or any other numerical integration scheme.
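In Python, the same f[·] procedure can be sketched with SciPy's solve_ivp (an analogue of ode45); the continuous dynamics f_cont below is a placeholder standing in for the problem-specific f(t, x, u).

```python
import numpy as np
from scipy.integrate import solve_ivp

def f_cont(t, x, u):
    """Placeholder continuous-time dynamics f(t, x, u); replace with the real model."""
    return np.array([x[1], -x[0] + u])

def f_discrete(k, xk, uk, vk, tk, tk1, D):
    """f[k, x(k), u(k), v(k)]: integrate the dynamics from t_k to t_{k+1}
    with u and v held constant over the interval (zero-order hold)."""
    rhs = lambda t, x: f_cont(t, x, uk) + D @ vk
    sol = solve_ivp(rhs, (tk, tk1), xk, rtol=1e-9, atol=1e-12)
    return sol.y[:, -1]          # x_k(t_{k+1})
```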
Q: How do we relate the continuous-time process noise intensity Q̃(t), with

E[ṽ(t) ṽᵀ(τ)] = δ(t − τ) Q̃(t)        (543)

to the discrete-time covariance Q(k), with

E[v(k) vᵀ(j)] = δ_kj Q(k)        (544)

A: If Δt is small, then

f[k, x(k), u(k), v(k)] ≈ x(k) + Δt [ f(t_k, x(k), u(k)) + D(t_k) v(k) ]        (545)

This is simple Euler integration. In this case, the term Δt D(t_k) v(k) plays the role of

∫_{t_k}^{t_{k+1}} D(τ) ṽ(τ) dτ        (546)

in f[·]. Matching the covariances of the two terms,

cov[ Δt D(t_k) v(k) ] = Δt² D(t_k) Q(k) Dᵀ(t_k),   cov[ ∫_{t_k}^{t_{k+1}} D(τ) ṽ(τ) dτ ] ≈ Δt D(t_k) Q̃(t_k) Dᵀ(t_k)        (547)-(548)

yields

Q(k) = Q̃(t_k) / Δt        (549)

Note that lim_{Δt→0} Q(k) = ∞, consistent with the infinite variance of the continuous-time white noise ṽ(t).
Q: What if the measurement interval Δt is too large for the ZOH assumption to hold?
A: One can take m intermediate steps of length Δt/m between each measurement. Choose m such that Δt/m is small enough that the approximations above hold, so that

Q(k) = Q̃(k Δt/m) / (Δt/m)        (550)

The filter then propagates m times and updates at the first measurement (551), propagates m times and updates at the next measurement (552), and so on. In other words, implement a KF by performing m propagation steps and then an update step, since new measurements only arrive every m sub-steps.
The linearized discrete-time model requires the Jacobians of f[·]:

F(k) = ∂f[·]/∂x(k) |_{k, x̂(k), u(k), 0}        (553)
Γ(k) = ∂f[·]/∂v(k) |_{k, x̂(k), u(k), 0}        (554)

Differentiating the continuous-time dynamics of x_k(t) with respect to x(k)        (555)-(556)

yields
∂/∂x(k) [ẋ_k(t)] = ∂/∂x(k) f(t, x_k(t), u(k)) = ∂f/∂x |_{t, x_k(t), u(k)} · ∂x_k(t)/∂x(k) = A(t) ∂x_k(t)/∂x(k)        (557)

Since x_k(t_k) = x(k),

d/dt [ ∂x_k(t)/∂x(k) ] = A(t) ∂x_k(t)/∂x(k)        (558)
∂x_k(t_k)/∂x(k) = I_{nx×nx}        (559)

This shows that ∂x_k(t)/∂x(k) is analogous to the state-transition matrix F(t, t_k) of continuous-time linear systems in Section 10. Similarly, for Γ(k):
d/dt [ ∂x_k(t)/∂v(k) ] = A(t) ∂x_k(t)/∂v(k) + D(t)        (560)
∂x_k(t_k)/∂v(k) = 0        (561)

These derivatives of f[·] are evaluated at the end of the interval:

∂f[·]/∂x(k) = ∂x_k(t_{k+1})/∂x(k)        (562)
∂f[·]/∂v(k) = ∂x_k(t_{k+1})/∂v(k)        (563)

This requires integration of Eqs. (558) and (560) from t_k to t_{k+1}.
One can use numerical integration schemes, such as ode45, to integrate the two matrix differential equations at the same time we're integrating the state x_k(t). Write the sensitivity matrices column by column:

∂x_k(t)/∂x(k) = [ φ1(t), φ2(t), ..., φ_nx(t) ]        (564)
∂x_k(t)/∂v(k) = [ γ1(t), γ2(t), ..., γ_nv(t) ]        (565)

dφ_i/dt = A(t) φ_i(t),   φ_i(t_k) = the ith unit vector,   i = 1, 2, ..., nx        (566)-(567)
dγ_i/dt = A(t) γ_i(t) + d_i(t),   γ_i(t_k) = 0,   i = 1, 2, ..., nv        (568)-(569)

where d_i(t) is the ith column of D(t). Stack everything into one long vector,

X_big = [ x_kᵀ, φ1ᵀ, φ2ᵀ, ..., φ_nxᵀ, γ1ᵀ, γ2ᵀ, ..., γ_nvᵀ ]ᵀ        (570)

which is nx(nx + nv + 1) × 1, and integrate it in a single call to the ODE solver.        (571)
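A minimal sketch of this stacked integration is below, assuming a placeholder Jacobian A_of(t, x, u) = ∂f/∂x evaluated along the trajectory and a constant D; it propagates x_k(t) together with ∂x_k/∂x(k) and ∂x_k/∂v(k) in one solve_ivp call.

```python
import numpy as np
from scipy.integrate import solve_ivp

def propagate_with_sensitivities(f_cont, A_of, D, xk, uk, vk, tk, tk1):
    """Integrate x_k(t), dx_k/dx(k), and dx_k/dv(k) together (Eqs. 558, 560, 564-570)."""
    nx, nv = xk.size, D.shape[1]

    def rhs(t, Xbig):
        x = Xbig[:nx]
        Phi = Xbig[nx:nx + nx * nx].reshape(nx, nx)       # dx_k(t)/dx(k)
        Gam = Xbig[nx + nx * nx:].reshape(nx, nv)         # dx_k(t)/dv(k)
        A = A_of(t, x, uk)                                # Jacobian df/dx along the trajectory
        dx = f_cont(t, x, uk) + D @ vk
        dPhi = A @ Phi                                    # Eq. (558)
        dGam = A @ Gam + D                                # Eq. (560)
        return np.concatenate([dx, dPhi.ravel(), dGam.ravel()])

    X0 = np.concatenate([xk, np.eye(nx).ravel(), np.zeros(nx * nv)])  # Eqs. (559), (561)
    sol = solve_ivp(rhs, (tk, tk1), X0, rtol=1e-9, atol=1e-12)
    Xf = sol.y[:, -1]
    x_next = Xf[:nx]
    F_k = Xf[nx:nx + nx * nx].reshape(nx, nx)             # F(k), Eq. (562)
    Gam_k = Xf[nx + nx * nx:].reshape(nx, nv)             # Gamma(k), Eq. (563)
    return x_next, F_k, Gam_k
```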
Problem statement.
Dynamics model:

x(k+1) = f[k, x(k), u(k), v(k)]        (572)-(573)

Measurement model:

z(k) = h[k, x(k)] + w(k)        (574)-(575)

Approximate

x̄(k+1) = E[x(k+1) | Z^k]        (576)
z̄(k+1) = E[z(k+1) | Z^k]        (577)

and the covariances P̄(k+1), P̄xz(k+1), P̄zz(k+1).        (578)

If we can assume that these approximations are valid, then we can use our old update equations for the measurement update of the Kalman filter:

x̂(k+1) = x̄(k+1) + P̄xz(k+1) P̄zz⁻¹(k+1) [ z(k+1) − z̄(k+1) ]        (579)
P(k+1) = P̄(k+1) − P̄xz(k+1) P̄zz⁻¹(k+1) P̄xzᵀ(k+1)        (580)
Expand f about x(k) = x̂(k), v(k) = v̄(k) = 0:        (581)

x̄(k+1) = E{ f[k, x̂(k), u(k), 0]
          + ∂f/∂x |_{k, x̂(k), u(k), 0} [x(k) − x̂(k)]      (this Jacobian is F(k))
          + ∂f/∂v |_{k, x̂(k), u(k), 0} v(k)               (this Jacobian is Γ(k))
          + Higher Order Terms  |  Z^k }        (582)

Neglect the higher order terms, hoping that the linearization is valid over the likely values of x(k)!

x̄(k+1) = f[k, x̂(k), u(k), 0] + F(k) E[x(k) − x̂(k) | Z^k] + Γ(k) E[v(k) | Z^k]        (583)

where both expectations are approximately zero.
Note: Compute x̄(k+1) by passing x̂(k) through the full nonlinear function f[k, x̂(k), u(k), 0], not through the linearization. A more accurate x̄(k+1) would also involve the second derivatives ∂²f/∂x² and ∂²f/∂v², weighted by P(k) and Q(k), but we neglect those terms. The prediction covariance is

P̄(k+1) = E[ (x(k+1) − x̄(k+1)) ( · )ᵀ | Z^k ]        (584)
        = F(k) P(k) Fᵀ(k) + Γ(k) Q(k) Γᵀ(k)        (585)-(586)

where higher-order terms have again been neglected.
Note: This is the same as for the linear Kalman Filter. The only difference is that F(k) and Γ(k) come from linearizing about the current estimate. The predicted measurement follows the same pattern:

z̄(k+1) = E[ h[k+1, x(k+1)] + w(k+1) | Z^k ]
        = h[k+1, x̄(k+1)] + H(k+1) E[x(k+1) − x̄(k+1) | Z^k] + E[w(k+1) | Z^k]
        ≈ h[k+1, x̄(k+1)]        (587)

where H(k+1) = ∂h/∂x |_{k+1, x̄(k+1)}, and both expectations are approximately zero. Strictly, the neglected second-order terms mean that P̄ affects z̄.
P̄xz(k+1) = E[ (x(k+1) − x̄(k+1)) (z(k+1) − z̄(k+1))ᵀ | Z^k ]        (588)-(589)

Note that z(k+1) − z̄(k+1) ≈ H(k+1) [x(k+1) − x̄(k+1)] + w(k+1). Therefore

P̄xz(k+1) = E[ (x(k+1) − x̄(k+1)) (H(k+1) [x(k+1) − x̄(k+1)] + w(k+1))ᵀ | Z^k ]        (590)
          = P̄(k+1) Hᵀ(k+1)        (591)

Similarly, P̄zz(k+1) = H(k+1) P̄(k+1) Hᵀ(k+1) + R(k+1)        (592)
1. Initialize with x̂(0) and P(0).
2. Set k = 0.
3. Compute

x̄(k+1) = f[k, x̂(k), u(k), 0]        (593)
F(k) = ∂f/∂x |_{k, x̂(k), u(k), 0}        (594)
Γ(k) = ∂f/∂v |_{k, x̂(k), u(k), 0}        (595)
P̄(k+1) = F(k) P(k) Fᵀ(k) + Γ(k) Q(k) Γᵀ(k)        (596)-(597)
z̄(k+1) = h[k+1, x̄(k+1)]        (598)
H(k+1) = ∂h/∂x |_{k+1, x̄(k+1)}        (599)

4. Update:

ν(k+1) = z(k+1) − z̄(k+1)        (600)
S(k+1) = H(k+1) P̄(k+1) Hᵀ(k+1) + R(k+1) = P̄zz(k+1)        (601)
W(k+1) = P̄(k+1) Hᵀ(k+1) S⁻¹(k+1)        (602)
x̂(k+1) = x̄(k+1) + W(k+1) ν(k+1)        (603)
P(k+1) = P̄(k+1) − W(k+1) S(k+1) Wᵀ(k+1)        (604)
5. Filter: when P(k+1) > 0, the gain can equivalently be written as

W(k+1) = P(k+1) Hᵀ(k+1) R⁻¹(k+1)        (605)-(607)

and the covariance update can be written in Joseph form, whose final term is + W(k+1) R(k+1) Wᵀ(k+1).        (608)

6. Increment k by 1 and go to Step 3.
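For concreteness, here is a minimal sketch of one EKF cycle implementing steps 3-4 above; f_d, h, and the Jacobian callables F_jac, Gamma_jac, H_jac are placeholders standing in for the problem-specific models.

```python
import numpy as np

def ekf_step(x_hat, P, u, z, f_d, F_jac, Gamma_jac, h, H_jac, Q, R, k):
    """One EKF propagation + measurement update (Eqs. 593-604)."""
    # Propagation (step 3)
    x_bar = f_d(k, x_hat, u, np.zeros(Q.shape[0]))        # Eq. (593)
    F = F_jac(k, x_hat, u)                                # Eq. (594)
    Gam = Gamma_jac(k, x_hat, u)                          # Eq. (595)
    P_bar = F @ P @ F.T + Gam @ Q @ Gam.T                 # Eqs. (596)-(597)
    z_bar = h(k + 1, x_bar)                               # Eq. (598)
    H = H_jac(k + 1, x_bar)                               # Eq. (599)
    # Measurement update (step 4)
    nu = z - z_bar                                        # Eq. (600)
    S = H @ P_bar @ H.T + R                               # Eq. (601)
    W = P_bar @ H.T @ np.linalg.inv(S)                    # Eq. (602)
    x_new = x_bar + W @ nu                                # Eq. (603)
    P_new = P_bar - W @ S @ W.T                           # Eq. (604)
    return x_new, P_new
```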
Minimize the cost function:

Ja = ½ [x(k) − x̂(k)]ᵀ P⁻¹(k) [...] + ½ vᵀ(k) Q⁻¹(k) v(k)
     + ½ {z(k+1) − h[k+1, x(k+1)]}ᵀ R⁻¹(k+1) {...}        (609)

The above can be better visualized by taking the example of our original nonlinear problem,

x(k+1) = f[k, x(k), u(k), v(k)]        (610)-(611)
(611)
1
We
an obtain f
by numeri
ally integrating ba
kward f in time.
1
to eliminate x(k) from the MAP
ost fun
tion and thereby dene:
Use f
T
1 1
f [k, x(k + 1, u(k), v(k))] x
(k) P 1 (k) [...]
2
1 T
1
T
v (k) Q1 (k) v (k) + {z (k + 1) h [k + 1, x (k + 1)]} R1 (k + 1)(612)
{...}
2
2
This is just a weighted least squares
ost fun
tion for errors in the three equations given below:
1.
2.
0 = v(k)
3.
with weighting
with weighting
P 1
Q1
with weighting
104
R1 (k + 1)
Strategy: We use the Gauss-Newton method to linearize and solve. We start by first linearizing about x(k+1) = x̄(k+1) and v(k) = 0, where x̄(k+1) is obtained from

x̄(k+1) = f[k, x̂(k), u(k), v̄(k)],   v̄(k) = 0        (613)

The next step is to solve the linearized least-squares problem for x(k+1) and v(k), then re-linearize about the new solution and repeat. Linearize f⁻¹:
f⁻¹[k, x(k+1), u(k), v(k)] ≈ f⁻¹[k, x̄(k+1), u(k), 0]
   + ∂f⁻¹/∂x(k+1) |_{k, x̄(k+1)} [x(k+1) − x̄(k+1)]
   + ∂f⁻¹/∂v(k) |_{k, x̄(k+1)} [v(k) − 0]        (614)

Since x̂(k) = f⁻¹[k, x̄(k+1), u(k), v̄(k)], the x̂(k)'s cancel in equation 1 above.
It can be shown that:

∂f⁻¹/∂x(k+1) |_{k, x̄(k+1), u(k), 0} = F⁻¹(k) = [ ∂f/∂x(k) |_{k, x̂(k), u(k), 0} ]⁻¹

and

∂f⁻¹/∂v(k) |_{k, x̄(k+1), u(k), 0} = −F⁻¹(k) Γ(k) = −F⁻¹(k) ∂f/∂v(k) |_{k, x̂(k), u(k), 0}

We also define

H(k+1) = ∂h/∂x(k+1) |_{k+1, x̄(k+1)}

and we know z̄(k+1) = h[k+1, x̄(k+1)]. The three linearized equations are then:

1. 0 = F⁻¹(k) [x(k+1) − x̄(k+1)] − F⁻¹(k) Γ(k) v(k)
2. 0 = v(k)
3. 0 = z(k+1) − z̄(k+1) − H(k+1) [x(k+1) − x̄(k+1)]
The new cost function Jb is obtained by substituting the linearized equations back into the cost function:

Jb = ½ [x(k+1) − x̄(k+1) − Γ(k) v(k)]ᵀ F⁻ᵀ(k) P⁻¹(k) F⁻¹(k) [...]
     + ½ vᵀ(k) Q⁻¹(k) v(k)
     + ½ {z(k+1) − z̄(k+1) − H(k+1) [x(k+1) − x̄(k+1)]}ᵀ R⁻¹(k+1) {...}        (615)

Minimizing Jb with respect to x(k+1) and v(k) is close to maximizing the a posteriori likelihood function and can be viewed as the justification for the EKF. Also, there are analogies that represent the Extended Kalman Filter as a square-root information filter.
Consider iterating on this solution (i.e., on the x̂(k+1) and v̂(k) that minimize Jb). Let x̂ⁱ(k+1) denote the estimate after the ith iteration, and let

Hⁱ(k+1) = ∂h/∂x |_{k+1, x̂ⁱ(k+1)}

The linearized measurement equation after the ith step is

0 = z(k+1) − h[k+1, x̂ⁱ(k+1)] − Hⁱ(k+1) [x(k+1) − x̂ⁱ(k+1)]

Setting the gradient of the cost to zero,

∂Jcⁱ/∂x(k+1) = P̄⁻¹(k+1) [x(k+1) − x̄(k+1)]
             − Hⁱᵀ(k+1) R⁻¹(k+1) { z(k+1) − h[k+1, x̂ⁱ(k+1)] − Hⁱ(k+1) [x(k+1) − x̂ⁱ(k+1)] } = 0        (616)
Solving for x(k+1) and calling the result x̂^{i+1}(k+1) yields:

x̂^{i+1}(k+1) = x̂ⁱ(k+1)
  + Pⁱ(k+1) Hⁱᵀ(k+1) R⁻¹(k+1) { z(k+1) − h[k+1, x̂ⁱ(k+1)] }
  + Pⁱ(k+1) P̄⁻¹(k+1) [ x̄(k+1) − x̂ⁱ(k+1) ]        (617)

with P⁰(k+1) = P̄(k+1) and x̂⁰(k+1) = x̄(k+1).
(k + 1)
Note that
xi (k + 1)
i+1
x
(k + 1) x
i (k + 1)
<
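A minimal sketch of this iteration (Eq. 617) is below; h and H_jac are placeholder callables, and P_i is recomputed each pass as (P̄⁻¹ + HⁱᵀR⁻¹Hⁱ)⁻¹, which is one common choice and an assumption on my part rather than something stated in the notes.

```python
import numpy as np

def iterated_update(x_bar, P_bar, z, h, H_jac, R, k, n_iter=5, tol=1e-8):
    """Iterated measurement update in the spirit of Eq. (617)."""
    Rinv = np.linalg.inv(R)
    Pbar_inv = np.linalg.inv(P_bar)
    x_i = x_bar.copy()
    for _ in range(n_iter):
        H_i = H_jac(k + 1, x_i)
        P_i = np.linalg.inv(Pbar_inv + H_i.T @ Rinv @ H_i)   # assumed form of P^i(k+1)
        x_next = (x_i
                  + P_i @ H_i.T @ Rinv @ (z - h(k + 1, x_i))
                  + P_i @ Pbar_inv @ (x_bar - x_i))
        converged = np.linalg.norm(x_next - x_i) < tol
        x_i = x_next
        if converged:
            break
    return x_i, P_i
```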
This machinery can also be used to deal with the current and several past measurement nonlinearities plus dynamics nonlinearities. Solve for

x̂(k−j) and v̂(k−j),   j = m, m−1, ..., 0

such that they minimize the expression below:

J = ½ [x(k−m) − x̄(k−m)]ᵀ P̄⁻¹(k−m) [...]
    + ½ Σ_{l=k−m}^{k−1} vᵀ(l) Q⁻¹(l) v(l)
    + ½ Σ_{l=k−m+1}^{k} {z(l) − h[l, x(l)]}ᵀ R⁻¹(l) {...}        (618)

subject to the dynamics. Once x̂(k−j) is found, propagate it forward.
21 Multiple Model (MM) Filtering

Suppose any of F, G, Γ, H, Q, R, x̂(0), P(0) depends on a parameter α that can take on values in {α1, α2, ..., αM}. Then p[x(k) | αj, Z^k] is the density of x(k) under the jth model, and p[α = αj | Z^k] ≜ μj(k) is the probability that the jth model is the correct one given Z^k (with Σ_{j=1}^M μj = 1). For now assume α(k) = α = const.
(k) = = onst.
21.1.1 Strategy
1. Determine how to propagate
p[x(k)|Z k ] =
j (k)
to
PM
j (k + 1)
2. Find
21.1.2 Steps
1. Propagate:
j (k)
p[j |Z k ]
= p[j |z(k), Z k1 ]
=
= j (k)
p[z(k)|j , Z k1 ] p[j |Z k1 ]
p[z(k)|Z k1 ]
p[z(k)|j , Z k1 ] j (k 1)
M
X
p[z(k)|l , Z k1 ] l (k 1)
l=1
The fa tor
p[z(k)|j , Z k1 ]
at time
k.
where
Sj (k)
108
[Figure: a bank of M Kalman filters, one per model α1, ..., αM, run in parallel; their estimates x̂j(k) are combined using the weights μj(k).]

2. Estimate:

x̂_MAP(k) = x̂_{j*}(k),   where j* = argmax_j μj(k)

x̂_MMSE(k) = E[x(k) | Z^k] = Σ_{j=1}^M x̂_j(k) μj(k)

P_MMSE(k) = Σ_{j=1}^M μj(k) { Pj(k) + [x̂_j(k) − x̂_MMSE(k)] [x̂_j(k) − x̂_MMSE(k)]ᵀ }
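A minimal sketch of the probability update and MMSE combination in steps 1-2 is below, assuming each model's Kalman filter has already produced an innovation nu_j, innovation covariance S_j, estimate xhat_j, and covariance P_j at time k.

```python
import numpy as np

def mm_update(mus_prev, nus, Ss, xhats, Ps):
    """Multiple-model probability update and MMSE combination (Section 21.1.2)."""
    M = len(mus_prev)
    likes = np.empty(M)
    for j in range(M):
        S, nu = Ss[j], nus[j]
        # Gaussian likelihood p[z(k) | alpha_j, Z^{k-1}] with covariance S_j
        likes[j] = np.exp(-0.5 * nu @ np.linalg.solve(S, nu)) / \
                   np.sqrt(np.linalg.det(2.0 * np.pi * S))
    mus = likes * mus_prev
    mus /= mus.sum()                                      # normalize over the M models
    # MMSE combination of the per-model estimates
    x_mmse = sum(mu * xh for mu, xh in zip(mus, xhats))
    P_mmse = sum(mu * (P + np.outer(xh - x_mmse, xh - x_mmse))
                 for mu, P, xh in zip(mus, Ps, xhats))
    return mus, x_mmse, P_mmse
```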
21.2 Remarks

Choosing the model set {α1, α2, ..., αM} is important. If the true α is among the αj, then the corresponding μj(k) will approach 1 as k → ∞.

If instead the parameter can change with time, α(k) ∈ {α1, α2, ..., αM}, one can assume model switching is a Markov process with given transition probabilities; the dynamics x(k+1) = ... and measurement z(k) = ... then depend on the active model α(k).
With switching models there exist M^k possible model-history sequences after k steps, so the complexity (number of filters) would grow without bound. Practical algorithms limit the model history to the last N steps: there are then M^N sequences (e.g., N = 1 or N = 2), but only M filters are actually used.

For additional material, please refer to Maybeck Section 10.8 and Bar-Shalom Section 11.6.
A more perfect linear estimator? All MMSE estimators, including approximate techniques such as the Extended Kalman Filter (EKF) and Sigma Point Filter (SPF), reduce to taking the approximate conditional mean and covariance:

x̂(k) = E[x(k) | z^k]
P(k) = E[ (x(k) − x̂(k)) (x(k) − x̂(k))ᵀ | z^k ]

But both the EKF and SPF consider only the first two moments of the posterior pdf p[x(k) | z^k]; when that pdf is strongly non-Gaussian, x̂(k) and P(k) may be poor summaries of it.
22.1.1 Propagation

The prior at k+1 follows from the Chapman-Kolmogorov equation:

p[x(k+1) | z^k] = ∫ p[x(k+1) | x(k)] p[x(k) | z^k] dx(k)

22.1.2 Update

p[x(k+1) | z^{k+1}] = p[z(k+1) | x(k+1)] p[x(k+1) | z^k] / p[z(k+1) | z^k]

Q: Why should we settle for anything less than this optimal estimate?
A: Think about the one-dimensional problem: we can approach optimality by numerical integration. But the grid size must be small and the grid must capture the tails of the distribution. Now think about the multi-dimensional problem: if we need 100 cells per dimension and we have nx dimensions, the total number of cells is 100^nx. Another problem: the grid-based method is not readily parallelizable.
The Particle Filter suffers from some of the same drawbacks as the grid-based method (massive memory and computation are required even for modest nx), but it is widely used anyway.

Key Idea: Estimate the posterior using weighted "particles":

p[x(k) | z^k] ≈ Σ_{i=1}^{Ns} wi δ[x(k) − χi(k)],   Σ_i wi = 1

where the particles χi(k) are samples drawn from a distribution and the wi are their weights.
Problem: For an arbitrary distribution, it is computationally expensive to generate random samples. The fix is importance sampling: choose an importance density q(x) such that

1. q(x) is non-zero everywhere p(x) is non-zero, and
2. both q(x) and p(x) can be evaluated, with q(x) easy to draw samples from.

Then p(x) can be approximated as

p(x) ≈ Σ_{i=1}^{Ns} wi δ[x − χi]

where we draw {χi}_{i=1}^{Ns} from q(x) and the wi's are given by

wi = c · p(χi) / q(χi)

Q: How large must Ns be for this approximation to be useful?
Applied to the filtering problem, choose an importance density q[x(k) | z^k], draw the particles χi(k), i = 1, 2, ..., Ns, from it, and set

wi(k) = c · p[χi(k) | z^k] / q[χi(k) | z^k],   so that   p[x(k) | z^k] ≈ Σ_{i=1}^{Ns} wi(k) δ[x(k) − χi(k)]

We can then compute basic estimation quantities:

x̂(k) = E[x(k) | z^k] ≈ Σ_{i=1}^{Ns} wi(k) χi(k)
P(k) ≈ Σ_{i=1}^{Ns} wi(k) [χi(k) − x̂(k)] [χi(k) − x̂(k)]ᵀ

This step is not necessary unless you actually have to provide a single estimate.
To make the weights computable recursively, consider the joint density p[χi(0), ..., χi(k) | z^k]. Assume the dynamics are Markov; then

p[χi(0), ..., χi(k) | z^{k−1}] = p[χi(k) | χi(0), ..., χi(k−1), z^{k−1}] · p[χi(0), ..., χi(k−1) | z^{k−1}]

and the weights wi(k) can be written in terms of wi(k−1).

Bootstrap method: choose the importance density equal to the dynamics prior, q[x(k) | x(k−1)] = p[x(k) | x(k−1)]. Then the wi(k) become:

wi(k) = c · p[z(k) | χi(k)] · wi(k−1)

(This is similar to the Multiple-Model updates for the μj's.)
1. Draw the initial particles χi(0), i ∈ [1, Ns], from p[x(0)] and give them equal weights. The importance-density choice q[x(k)] = p[x(k) | x(k−1)] is the particular choice that defines the bootstrap filter; it is generally made because the process noise is often assumed to be Gaussian, so the dynamics prior is easy to sample.

2. Draw process noise samples vi(k−1) ~ p[v(k−1)], i ∈ [1, Ns].

3. Propagate each particle forward according to the dynamics function. This is analogous to the prediction step in the EKF or SPF, as it predicts the particle set forward in time without any measurement updates. Notice the particle weights don't change during this step.
4. The propagation between measurements can be subdivided into multiple predictions if necessary for accurate modeling or computational savings.

5. At the time of the measurement, recalculate the weights on the particles according to the bootstrap weight update equation; the primed w′i(k) indicates that it is not yet normalized:

w′i(k) = p[z(k) | χi(k)] · wi(k−1)

Evaluating p[z(k) | χi(k)] directly may cause numerical underflow problems: w′i(k) might get set to zero because it is too small to represent in double precision. To be safe, the particle weights may be updated using log-likelihoods, where max_i[log(w′i(k))] is subtracted from each log-weight before taking the exponent. This scales all weights prior to the exponentiation for added numerical robustness. In the particular (and typical) case of zero-mean additive Gaussian white measurement noise, the weight update is particularly simple. That is, if z(k) = h(x(k)) + w(k) with w(k) ~ N(0, R(k)), then:

log[w′i(k)] = −½ [z(k) − h(χi(k))]ᵀ R⁻¹(k) [z(k) − h(χi(k))] + log[wi(k−1)]
w′i(k) = exp( log[w′i(k)] − max_i(log[w′i(k)]) )

Notice that in taking the log of the Gaussian likelihood, we drop the normalization constant. That constant is the same for all weights, so it gets cancelled when the weights are re-normalized later on.
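A minimal sketch of this log-domain weight update for the Gaussian measurement-noise case is below; h is a placeholder measurement function and chis holds the Ns propagated particles.

```python
import numpy as np

def bootstrap_weight_update(chis, log_w_prev, z, h, Rinv):
    """Log-domain bootstrap weight update for z(k) = h(x(k)) + w(k), w ~ N(0, R)."""
    log_w = np.empty(len(chis))
    for i, chi in enumerate(chis):
        resid = z - h(chi)
        log_w[i] = -0.5 * resid @ Rinv @ resid + log_w_prev[i]
    w = np.exp(log_w - log_w.max())     # shift by the max log-weight before exponentiating
    w /= w.sum()                        # step 6: re-normalize so the weights sum to one
    return w
```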
6. Re-normalize the weights so they sum to unity. This preserves the fact that the set of particles actually represents a discrete approximation to the posterior probability density of x(k):

wi(k) = w′i(k) / Σ_{i=1}^{Ns} w′i(k)

7. Compute the effective number of particles N̂s:

N̂s = 1 / Σ_{i=1}^{Ns} (wi(k))²
8. Resample when N̂s falls below a threshold (a common choice is Ns/2); resampling more or less often than that may also be justified. Here is a common resampling procedure:

(a)-(d) Draw Ns values η_l, l = 1, ..., Ns, uniformly on [0, 1]. For each η_l, choose the particle index m such that

Σ_{j=1}^{m−1} wj(k) < η_l ≤ Σ_{j=1}^{m} wj(k)

and copy χm(k) into the new particle set.

(e) Delete the old set of particles and use the new set and new weights.

Note that some old particles might appear more than once in the new set, whereas others might disappear altogether.
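A minimal sketch of this resampling step (multinomial resampling via the cumulative weights) is below; it follows the inequality above and resets the weights to 1/Ns, which is the usual convention and an assumption here.

```python
import numpy as np

def resample(chis, w):
    """Multinomial resampling: draw Ns new particles according to the weights w."""
    Ns = len(w)
    csum = np.cumsum(w)
    etas = np.random.rand(Ns)                      # eta_l drawn uniformly on [0, 1]
    idx = np.searchsorted(csum, etas)              # smallest m with cumulative weight >= eta_l
    new_chis = [chis[m] for m in idx]              # some particles repeat, others disappear
    new_w = np.full(Ns, 1.0 / Ns)                  # reset to equal weights
    return new_chis, new_w

# Effective sample size used to decide when to resample (step 7):
# Ns_hat = 1.0 / np.sum(w**2)
```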
9. Compute basic estimation statistics when desired (but don't throw out the particle set!):

x̂(k) ≈ Σ_{i=1}^{Ns} wi(k) χi(k)
P(k) ≈ Σ_{i=1}^{Ns} wi(k) [χi(k) − x̂(k)] [χi(k) − x̂(k)]ᵀ

22.3.1 Note:

Q: What is p[z(k) | χi(k)]?
A: Suppose z(k) = h[k, x(k)] + w(k), with w(k) ~ N(0, R). Then p[z(k) | χi(k)] is the Gaussian density with mean h[k, χi(k)] and covariance R, evaluated at the measured z(k).