
Lecture Notes 6

1 The Likelihood Function


Definition. Let $X^n = (X_1, \dots, X_n)$ have joint density $p(x^n; \theta) = p(x_1, \dots, x_n; \theta)$ where
$\theta \in \Theta$. The likelihood function $L : \Theta \to [0, \infty)$ is defined by

$$
L(\theta) \equiv L(\theta; x^n) = p(x^n; \theta)
$$

where $x^n$ is fixed and $\theta$ varies in $\Theta$.

1. The likelihood function is a function of θ.


2. The likelihood function is not a probability density function.
3. If the data are iid then the likelihood is
$$
L(\theta) = \prod_{i=1}^n p(x_i; \theta) \qquad \text{(iid case only)}.
$$

4. The likelihood is only defined up to a constant of proportionality.


5. The likelihood function is used (i) to generate estimators (the maximum likelihood
estimator) and (ii) as a key ingredient in Bayesian inference; see the sketch after this list.
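
To make properties 3 and 5(i) concrete, here is a minimal Python sketch (not part of the original notes); the function name loglik, the use of scipy, and the simulated data are illustrative choices, using the $N(\mu, 1)$ model of Example 2 below.

import numpy as np
from scipy.stats import norm

def loglik(mu, x):
    # Log-likelihood of mu for iid N(mu, 1) data: the product of densities
    # (property 3) becomes a sum of log densities.
    return np.sum(norm.logpdf(x, loc=mu, scale=1.0))

# Illustrative data; any fixed sample x^n will do.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=100)

# Property 5(i): maximizing L over a grid of mu values gives (approximately)
# the maximum likelihood estimator, which for this model is the sample mean.
grid = np.linspace(0.0, 4.0, 401)
mle = grid[np.argmax([loglik(m, x) for m in grid])]
print(mle, x.mean())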

Example 1 Suppose that $X = (X_1, X_2, X_3) \sim \text{Multinomial}(n, p)$ where
$$
p = (p_1, p_2, p_3) = (\theta, \theta, 1 - 2\theta).
$$
So
$$
p(x; \theta) = \binom{n}{x_1\, x_2\, x_3} p_1^{x_1} p_2^{x_2} p_3^{x_3} = \binom{n}{x_1\, x_2\, x_3} \theta^{x_1 + x_2} (1 - 2\theta)^{x_3}.
$$
Suppose that $X = (1, 3, 2)$. Then
$$
L(\theta) = \frac{6!}{1!\, 3!\, 2!}\, \theta^{1} \theta^{3} (1 - 2\theta)^{2} \propto \theta^{4} (1 - 2\theta)^{2}.
$$
Now suppose that $X = (2, 2, 2)$. Then
$$
L(\theta) = \frac{6!}{2!\, 2!\, 2!}\, \theta^{2} \theta^{2} (1 - 2\theta)^{2} \propto \theta^{4} (1 - 2\theta)^{2}.
$$
Hence, the likelihood function is the same for these two datasets.
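
A quick numerical check of Example 1, as an illustrative Python sketch (not from the notes; the helper multinomial_loglik and the use of scipy.special.gammaln are illustrative choices): the two log-likelihoods differ by a constant, so the likelihoods are proportional.

import numpy as np
from scipy.special import gammaln

def multinomial_loglik(theta, x):
    # Log-likelihood of theta for x ~ Multinomial(n, (theta, theta, 1 - 2*theta)).
    x = np.asarray(x)
    n = x.sum()
    log_coef = gammaln(n + 1) - gammaln(x + 1).sum()
    return log_coef + (x[0] + x[1]) * np.log(theta) + x[2] * np.log(1 - 2 * theta)

thetas = np.linspace(0.01, 0.49, 5)
diff = [multinomial_loglik(t, [1, 3, 2]) - multinomial_loglik(t, [2, 2, 2]) for t in thetas]
print(np.round(diff, 6))  # the same value for every theta: the likelihoods are proportional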

Example 2 $X_1, \dots, X_n \sim N(\mu, 1)$. Then
$$
L(\mu) = \left(\frac{1}{2\pi}\right)^{n/2} \exp\left\{ -\frac{1}{2}\sum_{i=1}^n (x_i - \mu)^2 \right\} \propto \exp\left\{ -\frac{n}{2} (\bar{x} - \mu)^2 \right\}.
$$

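The last proportionality uses the decomposition of the sum of squares around $\bar{x}$ (the cross term vanishes because $\sum_i (x_i - \bar{x}) = 0$):
$$
\sum_{i=1}^n (x_i - \mu)^2 = \sum_{i=1}^n (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2,
$$
and the first term on the right does not depend on $\mu$, so it is absorbed into the constant of proportionality (property 4 above).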
Example 3 Let $X_1, \dots, X_n \sim \text{Bernoulli}(p)$. Then
$$
L(p) \propto p^{X} (1 - p)^{n - X}
$$
for $p \in [0, 1]$ where $X = \sum_i X_i$.
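
A sanity check on Example 3, again as an illustrative Python sketch (not from the notes; the two samples are arbitrary): two binary samples of the same length with the same sum $X$ yield the same likelihood function, which is exactly the equivalence used in Theorem 4 below.

import numpy as np

def bernoulli_loglik(p, x):
    # Log-likelihood of p for iid Bernoulli(p) data x; depends on x only through sum(x).
    x = np.asarray(x)
    s = x.sum()
    return s * np.log(p) + (len(x) - s) * np.log(1 - p)

x1 = [1, 1, 0, 0, 1, 0]   # X = 3
x2 = [0, 1, 0, 1, 1, 0]   # different ordering, same X = 3
grid = np.linspace(0.01, 0.99, 99)
print(np.allclose([bernoulli_loglik(p, x1) for p in grid],
                  [bernoulli_loglik(p, x2) for p in grid]))   # True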

Theorem 4 Write $x^n \sim y^n$ if $L(\theta; x^n) \propto L(\theta; y^n)$. The partition induced by $\sim$ is the minimal sufficient partition.

Proof. Homework. $\Box$

2 Likelihood, Sufficiency and the Likelihood Principle


The likelihood function is a minimal sufficient statistic. That is, if we define the equivalence
relation $x^n \sim y^n$ when $L(\theta; x^n) \propto L(\theta; y^n)$, then the resulting partition is minimal sufficient.
Does this mean that the likelihood function contains all the relevant information? Some
people say that it does; this claim is sometimes called the likelihood principle. That is, the likelihood
principle says that the likelihood function contains all the information in the data.
This is FALSE. Here is a simple example to illustrate why. Let $C = \{c_1, \dots, c_N\}$ be
a finite set of constants. For simplicity, assume that $c_j \in \{0, 1\}$ (although this is not
important). Let $\theta = \frac{1}{N}\sum_{j=1}^N c_j$. Suppose we want to estimate $\theta$. We proceed as follows. Let
$S_1, \dots, S_N \sim \text{Bernoulli}(\pi)$ where $\pi$ is known. If $S_j = 1$ you get to see $c_j$. Otherwise, you do
not. (This is an example of survey sampling.) The likelihood function is
$$
\prod_j \pi^{S_j} (1 - \pi)^{1 - S_j}.
$$

The unknown parameter does not appear in the likelihood. In fact, there are no unknown
parameters in the likelihood! The likelihood function contains no information at all.
But we can estimate $\theta$. Let
$$
\hat{\theta} = \frac{1}{N\pi} \sum_{j=1}^N c_j S_j.
$$
Then $E(\hat{\theta}) = \theta$, since $E(S_j) = \pi$. Hoeffding's inequality implies that
$$
P(|\hat{\theta} - \theta| > \epsilon) \leq 2 e^{-2N\epsilon^2 \pi^2}.
$$
Hence, $\hat{\theta}$ is close to $\theta$ with high probability.
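
A small simulation, as an illustrative Python sketch (not from the notes; the values of $N$, $\pi$, and the random seed are arbitrary): even though the likelihood contains no information about $\theta$, the estimator $\hat{\theta}$ concentrates around $\theta$, as the Hoeffding bound predicts.

import numpy as np

rng = np.random.default_rng(1)

N = 10_000
pi = 0.3                                # known sampling probability
c = rng.integers(0, 2, size=N)          # fixed constants c_j in {0, 1}
theta = c.mean()                        # the parameter theta = (1/N) * sum_j c_j

# theta_hat = (1 / (N * pi)) * sum_j c_j S_j, recomputed over many draws of S.
reps = np.array([(c * rng.binomial(1, pi, size=N)).sum() / (N * pi)
                 for _ in range(1000)])
print(theta, reps.mean(), np.abs(reps - theta).max())
# reps.mean() is close to theta (unbiasedness) and the worst deviation is small.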


Summary: the minimal sufficient statistic has all the information you need to compute
the likelihood. But that does not mean that all the information is in the likelihood.
