HW 6 Sol
1. Linear estimator. Consider a channel with the observation Y = XZ, where the signal X and the noise Z are uncorrelated Gaussian random variables. Let E[X] = 1, E[Z] = 2, σX² = 5, and σZ² = 8.
Solution:
(a) We know that the best linear estimate is given by the formula

X̂ = (Cov(X, Y)/σY²)(Y − E(Y)) + E(X).

Note that X and Z Gaussian and uncorrelated implies they are independent. Therefore E(Y) = E(X)E(Z) = 2 and σY² = E(X²)E(Z²) − (E(X)E(Z))² = 6 · 12 − 4 = 68, so if Y were Gaussian its pdf would be

fY(y) = (1/√(2π × 68)) e^(−(y−2)²/(2×68)).

On the other hand, as a function of the two random variables X and Z, the product Y = XZ has pdf

fY(y) = ∫_{−∞}^{∞} fX(x) fZ(y/x) (1/|x|) dx,

which is not a Gaussian density.
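These moments can be confirmed by simulation. The following is a minimal sketch, assuming NumPy is available; the sample size and seed are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
X = rng.normal(1.0, np.sqrt(5.0), n)      # E[X] = 1, Var(X) = 5
Z = rng.normal(2.0, np.sqrt(8.0), n)      # E[Z] = 2, Var(Z) = 8
Y = X * Z                                 # observation Y = XZ

print(Y.mean(), Y.var())                  # ~ 2 and ~ 68
print(np.cov(X, Y)[0, 1] / Y.var())       # slope of the best linear estimate

# Y = XZ is not Gaussian: its standardized fourth moment is far from 3.
Ys = (Y - Y.mean()) / Y.std()
print((Ys ** 4).mean())                   # ~ 8, versus 3 for a Gaussian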
2. Additive-noise channel with path gain. Consider the additive noise channel shown in the figure
below, where X and Z are zero mean and uncorrelated, and a and b are constants.
[Figure: the signal X is scaled by a, the noise Z is added, and the sum is scaled by b, so the channel output is Y = b(aX + Z).]
Find the MMSE linear estimate of X given Y and its MSE in terms only of σX, σZ, a, and b.

Solution: The best linear estimate is

X̂ = (Cov(X, Y)/σY²)(Y − E(Y)) + E(X),

where

E(X) = 0,
E(Y) = b(aE(X) + E(Z)) = 0,
Cov(X, Y) = E(XY) − E(X)E(Y) = E(Xb(aX + Z)) = ab σX²,
σY² = E(Y²) − (E(Y))² = E(b²(aX + Z)²) = b²a²σX² + b²σZ².

Substituting,

X̂ = (ab σX²/(b²a²σX² + b²σZ²)) Y = (a σX²/(b(a²σX² + σZ²))) Y,

with

MSE = σX² − Cov²(X, Y)/σY² = σX² − a²σX⁴/(a²σX² + σZ²) = σX²σZ²/(a²σX² + σZ²).
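As a sanity check, here is a short simulation sketch (assuming NumPy; the values of a, b, σX, σZ below are arbitrary illustrative choices):

import numpy as np

rng = np.random.default_rng(1)
a, b, sx, sz = 1.5, 0.7, 2.0, 3.0          # illustrative constants and standard deviations
n = 1_000_000
X = rng.normal(0.0, sx, n)
Z = rng.normal(0.0, sz, n)
Y = b * (a * X + Z)

# Empirical linear-MMSE coefficient vs. the closed form a*sx^2 / (b*(a^2*sx^2 + sz^2))
coef = np.cov(X, Y)[0, 1] / Y.var()
print(coef, a * sx**2 / (b * (a**2 * sx**2 + sz**2)))

# Empirical MSE of the linear estimate vs. sx^2*sz^2 / (a^2*sx^2 + sz^2)
print(np.mean((X - coef * Y) ** 2), sx**2 * sz**2 / (a**2 * sx**2 + sz**2))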
3. Image processing. A pixel signal X ∼ U[−k, k] is digitized to obtain

X̃ = i + 1/2, if i < X ≤ i + 1, i = −k, −k + 1, . . . , k − 2, k − 1.

To improve the visual appearance, the digitized value X̃ is dithered by adding an independent noise Z with mean E(Z) = 0 and variance Var(Z) = N to obtain Y = X̃ + Z.
Solution:
E(X) = 0,
E(Y) = E(X̃) + E(Z) = 0,
σY² = Var(X̃) + Var(Z) = Σ_{i=−k}^{k−1} (i + 1/2)² (1/(2k)) + N = (1/(4k)) Σ_{i=0}^{k−1} (2i + 1)² + N = (4k² − 1)/12 + N.
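The variance formula can be checked numerically. Below is a sketch assuming NumPy, with k and N chosen arbitrarily; it quantizes samples of X, adds the dither, and compares the empirical variance of Y with (4k² − 1)/12 + N.

import numpy as np

rng = np.random.default_rng(2)
k, N = 4, 0.5                              # illustrative pixel range and dither variance
n = 1_000_000
X = rng.uniform(-k, k, n)
Xq = np.floor(X) + 0.5                     # digitized value: i + 1/2 on the cell (i, i+1]
Z = rng.normal(0.0, np.sqrt(N), n)         # any zero-mean noise with variance N would do
Y = Xq + Z

print(Y.var(), (4 * k**2 - 1) / 12 + N)    # the two values should be close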
4. Which of the following matrices can be the covariance matrix of a random vector?

(a) [1 2; 0 2]    (b) [2 1; 1 2]    (c) [1 1 1; 1 2 2; 1 2 3]    (d) [1 1 2; 1 2 3; 2 3 3]

Solution: A covariance matrix must be symmetric and positive semidefinite. For (d), taking x = (2, 0, −1) gives the quadratic form

[2 0 −1] [1 1 2; 1 2 3; 2 3 3] [2; 0; −1] = −1 < 0,

so the matrix in (d) is not positive semidefinite and cannot be a covariance matrix.
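Assuming the matrix in (d) is as displayed above, the conclusion can be double-checked numerically (a sketch using NumPy): a covariance matrix must be symmetric positive semidefinite, so a negative quadratic form or a negative eigenvalue rules it out.

import numpy as np

Sd = np.array([[1.0, 1.0, 2.0],
               [1.0, 2.0, 3.0],
               [2.0, 3.0, 3.0]])
v = np.array([2.0, 0.0, -1.0])

print(v @ Sd @ v)                  # -1.0, so Sd is not positive semidefinite
print(np.linalg.eigvalsh(Sd))      # at least one eigenvalue is negative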
5. Gaussian random vector. Given a Gaussian random vector X ∼ N(µ, Σ), where µ = (1 5 2)^T and

Σ = [1 1 0; 1 4 0; 0 0 9].
Solution:
(a) i. The marginal pdfs of a jointly Gaussian pdf are Gaussian. Therefore X1 ∼ N (1, 1).
ii. Since X2 and X3 are independent (σ23 = 0), the variance of the sum is the sum
of the variances. Also the sum of two jointly Gaussian random variables is also
Gaussian. Therefore X2 + X3 ∼ N (7, 13).
iii. Since 2X1 + X2 + X3 is a linear transformation of a Gaussian random vector,

2X1 + X2 + X3 = [2 1 1] [X1; X2; X3],

it is a Gaussian random variable with mean and variance

µ = [2 1 1] [1; 5; 2] = 9   and   σ² = [2 1 1] [1 1 0; 1 4 0; 0 0 9] [2; 1; 1] = 21.
Thus 2X1 + X2 + X3 ∼ N (9, 21).
iv. Since σ13 = σ23 = 0 and the random variables are jointly Gaussian, X3 is independent of the pair (X1, X2). Therefore the conditional pdf of X3 given (X1, X2) is the same as the pdf of X3, which is N(2, 9).
v. We use the general formula for the conditional Gaussian pdf:

X2 | {X1 = x1} ∼ N(Σ21 Σ11⁻¹ (x1 − µ1) + µ2, Σ22 − Σ21 Σ11⁻¹ Σ12) = N(1 · (x1 − 1) + 5, 4 − 1) = N(x1 + 4, 3).
(c) In general, AX ∼ N(AµX, AΣX A^T). For this problem, with A = [2 1 1; 1 −1 1],

µY = AµX = [2 1 1; 1 −1 1] [1; 5; 2] = [9; −2],

ΣY = AΣX A^T = [2 1 1; 1 −1 1] [1 1 0; 1 4 0; 0 0 9] [2 1; 1 −1; 1 1] = [21 6; 6 12].

Thus Y ∼ N([9; −2], [21 6; 6 12]).
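These calculations are easy to verify with a few lines of NumPy (a sketch; each printed value should match the corresponding quantity derived above):

import numpy as np

mu = np.array([1.0, 5.0, 2.0])
Sigma = np.array([[1.0, 1.0, 0.0],
                  [1.0, 4.0, 0.0],
                  [0.0, 0.0, 9.0]])

# (a)iii: mean and variance of 2X1 + X2 + X3
c = np.array([2.0, 1.0, 1.0])
print(c @ mu, c @ Sigma @ c)                       # 9.0, 21.0

# (a)v: slope of E(X2 | X1 = x1) and Var(X2 | X1 = x1); the conditional mean is x1 + 4
print(Sigma[1, 0] / Sigma[0, 0],
      Sigma[1, 1] - Sigma[1, 0]**2 / Sigma[0, 0])  # 1.0, 3.0

# (c): mean vector and covariance matrix of Y = AX
A = np.array([[2.0, 1.0, 1.0],
              [1.0, -1.0, 1.0]])
print(A @ mu)                                      # [ 9. -2.]
print(A @ Sigma @ A.T)                             # [[21. 6.], [6. 12.]]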
6. Gaussian Markov chain. Let X, Y, and Z be jointly Gaussian random variables with zero
mean and unit variance, i.e., E(X) = E(Y ) = E(Z) = 0 and E(X 2 ) = E(Y 2 ) = E(Z 2 ) = 1. Let
ρX,Y denote the correlation coefficient between X and Y , and let ρY,Z denote the correlation
coefficient between Y and Z. Suppose that X and Z are conditionally independent given Y.
(a) Express ρX,Z, the correlation coefficient of X and Z, in terms of ρX,Y and ρY,Z.
(b) Find the minimum MSE estimate of Z given X and Y.
Solution:
(a) By definition,

ρX,Z = Cov(X, Z)/(σX σZ) = Cov(X, Z) = E(XZ),

since X, Y, and Z have zero mean and unit variance. Now E(X|Y) can be easily calculated from the bivariate Gaussian conditional density:

E(X|Y) = E(X) + (ρX,Y σX/σY)(Y − E(Y)) = ρX,Y Y.
Similarly, we have
E(Z|Y ) = ρY,Z Y.
Therefore, combining the above,
ρX,Z = E(XZ)
     = E[E(XZ|Y)]
     = E[E(X|Y)E(Z|Y)]     (by conditional independence of X and Z given Y)
     = E(ρX,Y ρY,Z Y²)
     = ρX,Y ρY,Z E(Y²)
     = ρX,Y ρY,Z.
(b) X, Y and Z are jointly Gaussian random variables. Thus, the minimum MSE estimate
of Z given (X, Y ) is linear.
Σ(X,Y) = [1 ρX,Y; ρX,Y 1],
Σ(X,Y)Z = [E(XZ); E(YZ)] = [ρX,Z; ρY,Z],
ΣZ(X,Y) = [ρX,Z ρY,Z].
Therefore,
Ẑ = ΣZ(X,Y) Σ(X,Y)⁻¹ [X; Y]
  = [ρX,Z ρY,Z] [1 ρX,Y; ρX,Y 1]⁻¹ [X; Y]
  = (1/(1 − ρX,Y²)) [ρX,Z ρY,Z] [1 −ρX,Y; −ρX,Y 1] [X; Y]
  = (1/(1 − ρX,Y²)) [0  ρY,Z − ρX,Y²ρY,Z] [X; Y],

where the last equality follows from the result of (a). Thus,

Ẑ = [0 ρY,Z] [X; Y] = ρY,Z Y.
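A quick numerical check of part (b), assuming NumPy; the correlation values below are arbitrary, with ρX,Z set to ρX,Y ρY,Z as part (a) requires. Solving the normal equations should return the weights (0, ρY,Z).

import numpy as np

rxy, ryz = 0.6, -0.3
rxz = rxy * ryz                       # from part (a)

Sigma_XY = np.array([[1.0, rxy],
                     [rxy, 1.0]])     # covariance matrix of (X, Y)
cross = np.array([rxz, ryz])          # cross-covariances E[XZ], E[YZ]

w = np.linalg.solve(Sigma_XY, cross)  # weights of the best linear estimate of Z
print(w)                              # [0. -0.3], i.e. Zhat = ryz * Y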
7. Prediction of an autoregressive process. Let X be a random vector with zero mean and
covariance matrix
ΣX =
[ 1          α          α²         · · ·   α^(n−1)
  α          1          α          · · ·   α^(n−2)
  α²         α          1          · · ·   α^(n−3)
  ⋮                                         ⋮
  α^(n−1)    α^(n−2)    α^(n−3)    · · ·   1       ],

i.e., (ΣX)ij = α^|i−j|, for |α| < 1. Given that X1, X2, . . . , Xn−1 are observed, find the best linear MSE estimate (predictor) of Xn and compute its MSE.
Solution: We have (ΣX)ij = α^|i−j|. By defining Y = (X1 X2 · · · Xn−1)^T, we have

ΣY =
[ 1          α          · · ·   α^(n−2)
  α          1          · · ·   α^(n−3)
  ⋮                              ⋮
  α^(n−2)    α^(n−3)    · · ·   1       ],

ΣYX = [α^(n−1) α^(n−2) · · · α]^T,
ΣXY = [α^(n−1) α^(n−2) · · · α],
σXn² = 1.
Therefore,

X̂n = ΣXY ΣY⁻¹ Y
   = [α^(n−1) · · · α] ΣY⁻¹ Y
   = h^T Y               (where h^T = ΣXY ΣY⁻¹)
   = [0 0 · · · 0 α] Y   (since h^T = [0 0 · · · 0 α] satisfies h^T ΣY = ΣXY)
   = α Xn−1,
and

MSE = σXn² − ΣXY ΣY⁻¹ ΣYX = 1 − h^T ΣYX = 1 − [0 0 · · · 0 α] [α^(n−1) · · · α]^T = 1 − α · α = 1 − α².
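The closed forms h = (0, . . . , 0, α) and MSE = 1 − α² can be verified by solving the normal equations directly (a sketch assuming NumPy; n and α are arbitrary):

import numpy as np

n, alpha = 6, 0.7                                        # illustrative size and AR parameter
idx = np.arange(n)
Sigma = alpha ** np.abs(np.subtract.outer(idx, idx))     # Sigma_X[i, j] = alpha^|i-j|

Sigma_Y  = Sigma[:n-1, :n-1]                             # covariance of Y = (X1, ..., X_{n-1})
Sigma_YX = Sigma[:n-1, n-1]                              # cross-covariances with X_n

h = np.linalg.solve(Sigma_Y, Sigma_YX)
print(h)                                                 # ~ [0, 0, 0, 0, 0.7]
print(1.0 - h @ Sigma_YX)                                # ~ 0.51 = 1 - alpha^2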
8. Noise cancellation. A classical problem in statistical signal processing involves estimating a
weak signal (e.g., the heart beat of a fetus) in the presence of a strong interference (the heart
beat of its mother) by making two observations; one with the weak signal present and one
without (by placing one microphone on the mother’s belly and another close to her heart).
The observations can then be combined to estimate the weak signal by “cancelling out” the
interference. The following is a simple version of this application.
Let the weak signal X be a random variable with mean µ and variance P , and the observations
be Y1 = X + Z1 (Z1 being the strong interference), and Y2 = Z1 + Z2 (Z2 is a measurement
noise), where Z1 and Z2 are zero mean with variances N1 and N2 , respectively. Assume that
X, Z1 and Z2 are uncorrelated. Find the best linear MSE estimate of X given Y1 and Y2 and its MSE. Interpret the results.
Solution: This is a vector linear MSE problem. Since Z1 and Z2 are zero mean, µX = µY1 = µ
and µY2 = 0. We first normalize the random variables by subtracting off their means to get
X ′ = X − µ, and
Y′ = [Y1 − µ; Y2].
Now using the orthogonality principle we can find the best linear MSE estimate X̂ ′ of X ′ . To
do so we first find
ΣY = [P + N1  N1; N1  N1 + N2]   and   ΣYX = [P; 0].
Thus,

X̂′ = ΣYX^T ΣY⁻¹ Y′
   = [P 0] (1/(P(N1 + N2) + N1N2)) [N1 + N2  −N1; −N1  P + N1] Y′
   = (P/(P(N1 + N2) + N1N2)) [N1 + N2  −N1] Y′.

The best linear MSE estimate is X̂ = X̂′ + µ. Thus,

X̂ = (P/(P(N1 + N2) + N1N2)) ((N1 + N2)(Y1 − µ) − N1Y2) + µ
  = (1/(P(N1 + N2) + N1N2)) (P((N1 + N2)Y1 − N1Y2) + N1N2µ).
The MSE can be calculated by
MSE = σX² − ΣYX^T ΣY⁻¹ ΣYX
    = P − (P/(P(N1 + N2) + N1N2)) [N1 + N2  −N1] [P; 0]
    = P − P²(N1 + N2)/(P(N1 + N2) + N1N2)
    = P N1N2/(P(N1 + N2) + N1N2).
The equation for the MSE makes perfect sense. First, note that if N1 and N2 are held constant but P goes to infinity, the MSE tends to N1N2/(N1 + N2). Next, note that if both N1 and N2 go to infinity, the MSE goes to σX², i.e., the estimate becomes worthless. Finally, note that if either N1 or N2 goes to 0, the MSE also goes to 0. This is because the estimator will then use the measurement with zero noise variance and perfectly determine the signal X.
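A simulation sketch of the noise-cancellation estimate (assuming NumPy; µ, P, N1, N2 are illustrative values, and Gaussian samples are used only for convenience, since the linear MMSE solution depends on second moments alone):

import numpy as np

rng = np.random.default_rng(3)
mu, P, N1, N2 = 1.0, 4.0, 2.0, 0.5
n = 1_000_000
X  = rng.normal(mu, np.sqrt(P), n)
Z1 = rng.normal(0.0, np.sqrt(N1), n)
Z2 = rng.normal(0.0, np.sqrt(N2), n)
Y1, Y2 = X + Z1, Z1 + Z2

den  = P * (N1 + N2) + N1 * N2
Xhat = P * ((N1 + N2) * (Y1 - mu) - N1 * Y2) / den + mu

print(np.mean((X - Xhat) ** 2))        # empirical MSE
print(P * N1 * N2 / den)               # closed form P*N1*N2 / (P(N1+N2) + N1*N2)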
Solutions to Additional Exercises
1. Worst noise distribution. Consider an additive noise channel Y = X+Z, where the signal X ∼
N (0, P ) and the noise Z has zero mean and variance N . Assume X and Z are independent.
Find a distribution of Z that maximizes the minimum MSE of estimating X given Y , i.e., the
distribution of the worst noise Z that has the given mean and variance. You need to justify
your answer.
Solution: The worst noise distribution is Gaussian, i.e., Z ∼ N(0, N). To prove this, we show that the MSE corresponding to any other distribution of Z is less than or equal to the MSE for Gaussian noise, i.e., MSE_NonG ≤ MSE_G.
We know that for any noise distribution, MMSE estimation is no worse than linear MMSE estimation, so MSE_NonG ≤ LMSE. The linear MMSE estimate of X given Y and its MSE are given by

X̂ = (Cov(X, Y)/σY²)(Y − E(Y)) + E(X) = (P/(P + N)) Y,
LMSE = σX² − Cov²(X, Y)/σY² = P − P²/(P + N) = NP/(P + N).
Note that the LMSE depends only on the second moments of X and Z, so the MSE corresponding to any distribution of Z is upper bounded by the same LMSE, i.e., MSE_NonG ≤ NP/(P + N). When Z is Gaussian and independent of X, (X, Y) are jointly Gaussian, and then MSE_G equals the LMSE, i.e., MSE_G = NP/(P + N). Hence,

MSE_NonG ≤ NP/(P + N) = MSE_G,

which shows that Gaussian noise is the worst.
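To illustrate the inequality, the sketch below (assuming NumPy; binary noise is one arbitrary non-Gaussian choice with the required mean and variance) evaluates the exact MMSE estimator for binary noise by Monte Carlo and compares its MSE with the Gaussian value NP/(P + N):

import numpy as np

rng = np.random.default_rng(4)
P, N = 1.0, 1.0
n = 1_000_000
X = rng.normal(0.0, np.sqrt(P), n)
Z = np.sqrt(N) * rng.choice([-1.0, 1.0], n)    # binary noise: zero mean, variance N
Y = X + Z

# For binary noise, X | {Y = y} is supported on y - sqrt(N) and y + sqrt(N),
# weighted by the Gaussian density of X at those two points.
def fX(x):
    return np.exp(-x**2 / (2 * P))

w_minus, w_plus = fX(Y + np.sqrt(N)), fX(Y - np.sqrt(N))
Xhat = ((Y + np.sqrt(N)) * w_minus + (Y - np.sqrt(N)) * w_plus) / (w_minus + w_plus)

print(np.mean((X - Xhat) ** 2))   # MMSE for binary noise: below N*P/(P+N)
print(N * P / (P + N))            # MMSE (= LMSE) when the noise is Gaussian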
2. Jointly Gaussian random variables. Let X and Y be jointly Gaussian random variables with
pdf
fX,Y(x, y) = (1/(π√(3/4))) e^(−(1/2)(4x²/3 + 16y²/3 + 8xy/3 − 8x − 16y + 16)).
Solution:
(a) We can write the joint pdf for X and Y jointly Gaussian as

fX,Y(x, y) = exp{−[a(x − µX)² + b(y − µY)² + c(x − µX)(y − µY)]} / (2πσXσY√(1 − ρX,Y²)),
where
a = 1/(2(1 − ρX,Y²)σX²),   b = 1/(2(1 − ρX,Y²)σY²),   c = −2ρX,Y/(2(1 − ρX,Y²)σXσY).

Matching coefficients with the given exponent,

a = 2/3,   b = 8/3,   c = 4/3,
and we get three equations in three unknowns
ρX,Y = −c/(2√(ab)) = −1/2,
σX² = 1/(2(1 − ρX,Y²)a) = 1,
σY² = 1/(2(1 − ρX,Y²)b) = 1/4.
Matching the first-order terms gives

2aµX + cµY = 4,
2bµY + cµX = 8,

so µX = 2 and µY = 1.
Finally
Cov(X, Y) = ρX,Y σX σY = −1/4.
(b) X and Y are jointly Gaussian random variables. Thus, the minimum MSE estimate of
X given Y is linear
E(X|Y) = (Cov(X, Y)/σY²)(Y − µY) + µX = −(Y − 1) + 2 = 3 − Y,

and the minimum MSE is

MMSE = E(Var(X|Y)) = (1 − ρX,Y²)σX² = 3/4.
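The recovered parameters can be checked by comparing the stated pdf with the N((2, 1), Σ) density, where Σ = [1 −1/4; −1/4 1/4] (a sketch assuming NumPy; the evaluation points are arbitrary):

import numpy as np

mu = np.array([2.0, 1.0])
Sigma = np.array([[1.0, -0.25],
                  [-0.25, 0.25]])
Sinv = np.linalg.inv(Sigma)

def f_given(x, y):
    # pdf exactly as stated in the problem
    q = 4*x**2/3 + 16*y**2/3 + 8*x*y/3 - 8*x - 16*y + 16
    return np.exp(-0.5 * q) / (np.pi * np.sqrt(0.75))

def f_gauss(x, y):
    # bivariate Gaussian pdf with the recovered mean vector and covariance matrix
    v = np.array([x, y]) - mu
    return np.exp(-0.5 * v @ Sinv @ v) / (2 * np.pi * np.sqrt(np.linalg.det(Sigma)))

rng = np.random.default_rng(5)
for x, y in rng.normal([2.0, 1.0], 1.0, size=(5, 2)):
    print(f_given(x, y), f_gauss(x, y))   # the two columns agree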
3. Markov chain. Suppose X1 and X3 are independent given X2 . Show that
f(x1, x2, x3) = f(x1)f(x2|x1)f(x3|x2) = f(x3)f(x2|x3)f(x1|x2).

Solution: Since X1 and X3 are conditionally independent given X2, f(x3|x1, x2) = f(x3|x2) and f(x1|x2, x3) = f(x1|x2). Therefore, using the definition of conditional density,

f(x1, x2, x3) = f(x1)f(x2|x1)f(x3|x1, x2) = f(x1)f(x2|x1)f(x3|x2),
f(x1, x2, x3) = f(x3)f(x2|x3)f(x1|x2, x3) = f(x3)f(x2|x3)f(x1|x2),

which establishes both factorizations.
4. Conditional Gaussian pdf. Let X be a zero-mean Gaussian random variable and Y a zero-mean Gaussian random vector that is jointly Gaussian with X. Show that the conditional pdf of X given Y = y is Gaussian by completing the following steps.
(a) Let X̂ be the best MSE linear estimate of X given Y. Then X̂ and X − X̂ are individually zero-mean Gaussians. Find their variances.
(b) Show that X̂ and X − X̂ are independent.
(c) Now write X = X̂ + (X − X̂). If Y = y, then X = ΣXY ΣY⁻¹ y + (X − X̂).
(d) Now complete the proof.
Solution:
(a) Let X̂ be the best MSE linear estimate of X given Y. In the MSE vector case section of Lecture Notes #6 it was shown that X̂ and X − X̂ are individually zero-mean Gaussian random variables with variances ΣXY ΣY⁻¹ ΣYX and σX² − ΣXY ΣY⁻¹ ΣYX, respectively.
(b) The random variables X̂ and X − X̂ are jointly Gaussian since they are obtained by
a linear transformation of the GRV [ Y X ]T . By orthogonality, X̂ and X − X̂ are
uncorrelated, so they are also independent. By the same reasoning, X − X̂ and Y are
independent.
(c) Now write X = X̂ + (X − X̂). Then given Y = y,

X = ΣXY ΣY⁻¹ y + (X − X̂),

since X − X̂ is independent of Y.
(d) Thus X | {Y = y} is Gaussian with mean ΣXY ΣY⁻¹ y and variance σX² − ΣXY ΣY⁻¹ ΣYX.
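A small numerical illustration of the conclusion (a sketch assuming NumPy; the joint covariance below is an arbitrary positive-definite example with scalar X as the first coordinate and Y as the remaining two):

import numpy as np

# Joint covariance of (X, Y1, Y2), all zero mean; X is the scalar being estimated.
S = np.array([[2.0, 0.8, 0.3],
              [0.8, 1.0, 0.2],
              [0.3, 0.2, 1.5]])
Sigma_XY = S[0, 1:]
Sigma_Y  = S[1:, 1:]
y = np.array([0.5, -1.0])                 # an arbitrary observed value of Y

cond_mean = Sigma_XY @ np.linalg.solve(Sigma_Y, y)
cond_var  = S[0, 0] - Sigma_XY @ np.linalg.solve(Sigma_Y, Sigma_XY)
print(cond_mean, cond_var)

# Monte Carlo cross-check: keep samples with Y close to y and compare the
# empirical conditional mean and variance of X with the formulas above.
rng = np.random.default_rng(6)
V = rng.multivariate_normal(np.zeros(3), S, size=2_000_000)
mask = np.all(np.abs(V[:, 1:] - y) < 0.05, axis=1)
print(V[mask, 0].mean(), V[mask, 0].var())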
5. Additive nonwhite Gaussian noise channel. Let Yi = X + Zi for i = 1, 2, . . . , n be n ob-
servations of a signal X ∼ N(0, P ). The additive noise random variables Z1 , Z2 , . . . , Zn are
zero mean jointly Gaussian random variables that are independent of X and have correlation
E(Zi Zj) = N · 2^(−|i−j|) for 1 ≤ i, j ≤ n.
(a) Find the best linear MMSE estimate of X given Y1, Y2, . . . , Yn. (Hint: the coefficients of the best estimate are of the form h^T = [ a b b · · · b b a ].)
(b) Find its minimum mean square error.
Solution: (a) The best linear estimate is X̂ = h^T Y, where Y = (Y1 Y2 · · · Yn)^T and h satisfies the normal equations ΣY h = ΣYX, with (ΣY)ij = Cov(Yi, Yj) = P + N · 2^(−|i−j|) and ΣYX = [P P · · · P]^T, i.e.,
[P; P; · · · ; P] =
[ P + N            P + N/2          P + N/4      · · ·   P + N/2^(n−2)   P + N/2^(n−1)
  P + N/2          P + N            P + N/2      · · ·   P + N/2^(n−3)   P + N/2^(n−2)
  ⋮                                                                       ⋮
  P + N/2^(n−2)    P + N/2^(n−3)    · · ·                P + N           P + N/2
  P + N/2^(n−1)    P + N/2^(n−2)    · · ·                P + N/2         P + N        ] [h1; h2; · · · ; hn−1; hn].
By the hint, there are only two degrees of freedom, a and b. Solving this system using the first two rows of the matrix, we obtain

[h1; h2; · · · ; hn−1; hn] = (P/(3N + (n + 2)P)) [2; 1; · · · ; 1; 2].
(b) The minimum mean square error is

MSE = σX² − h^T ΣYX = P − (n + 2)P · (P/(3N + (n + 2)P)) = 3NP/(3N + (n + 2)P).
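Both the coefficient pattern and the MSE can be verified by solving the normal equations directly (a sketch assuming NumPy; n, P, and N are arbitrary):

import numpy as np

n, P, N = 6, 2.0, 1.0
idx = np.arange(n)
Sigma_Z = N * 0.5 ** np.abs(np.subtract.outer(idx, idx))   # E[Zi Zj] = N 2^(-|i-j|)
Sigma_Y = P + Sigma_Z                                      # Cov(Yi, Yj) = P + E[Zi Zj]
Sigma_YX = P * np.ones(n)

h = np.linalg.solve(Sigma_Y, Sigma_YX)
print(h * (3 * N + (n + 2) * P) / P)       # ~ [2, 1, 1, 1, 1, 2]
print(P - h @ Sigma_YX)                    # MSE from the normal equations
print(3 * N * P / (3 * N + (n + 2) * P))   # closed form 3NP / (3N + (n+2)P)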