Data Analysis
Zixu Wang
3.5 Multidimensional random variables
4 Probability distribution
4.1 Bernoulli distribution
4.2 Multinomial distribution
4.3 Poisson distribution
4.4 Compound Poisson distribution
4.5 Geometric distribution
4.6 Pascal distribution
4.7 Hypergeometric distribution
4.8 Uniform distribution
4.9 Exponential distribution
4.10 Gamma distribution
4.11 Beta distribution
4.12 Normal distribution
4.13 Cauchy distribution
4.14 Landau distribution
4.15 χ² Distribution
4.16 T Distribution
4.17 F Distribution
6.5 Bayes Formula
7 Parameter estimation
7.1 Estimate and likelihood function
7.1.1 Consistency
7.1.2 Unbiasedness
7.1.3 Effectiveness and the least variance
7.1.4 Sufficiency
7.2 Interval estimation
7.2.1 Pivot variable method
7.2.2 Large sample method
7.3 Confidence interval for normal distribution
10 Moments method
11 Interval estimation
Chapter 1
Preliminary probability theory
1.1 Probability
• Statistical probability
P(A) = \lim_{N→∞} \frac{n}{N}.
• Classical probability
P(A) = k/n.
• Geometric probability
P(A) = \frac{ω_A}{Ω},
–
P(S) = 1.
–
P(A_1 ∪ A_2 ∪ ··· ∪ A_n) = \sum_{k=1}^{n} P(A_k),
for pairwise disjoint events A_k.
–
P(A_1 ∪ A_2 ∪ ··· ∪ A_n ∪ ···) = \sum_{k=1}^{∞} P(A_k),
for a countable sequence of pairwise disjoint events.
1.3 Properties of probability
•
P(A) + P(Ā) = 1.
•
P(∅) = 0.
•
P(A) ⩾ P(B),
if A ⊃ B.
•
\sum_i P(A_i) = 1,
if {A_i} is a division (partition) of S.
•
P(A − B) = P(A) − P(B),
if A ⊃ B.
•
P(A ∪ B) = P(A) + P(B) − P(AB).
• Generalization: define
S_1 = \sum_{i=1}^{n} P(A_i),
S_2 = \sum_{i<j} P(A_i A_j),
S_3 = \sum_{i<j<k} P(A_i A_j A_k),
and so on. Then
P\left(\bigcup_{i=1}^{n} A_i\right) ≡ P(A_1 ∪ A_2 ∪ ··· ∪ A_n) = S_1 − S_2 + S_3 − ··· + (−1)^{n−1} S_n,
which is the addition theorem of probability.
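As a quick numerical check of the addition theorem, here is a minimal Python sketch (the die sample space and the three events are an illustrative choice, not from the text) comparing the inclusion-exclusion sum with a direct computation:

from itertools import combinations

# Equiprobable sample space: one roll of a fair die.
S = set(range(1, 7))
events = [{1, 2, 3}, {2, 4, 6}, {3, 6}]  # hypothetical events A1, A2, A3

def prob(e):
    return len(e) / len(S)

# Direct probability of the union.
direct = prob(set().union(*events))

# S_m = sum of P over all m-fold intersections, combined with sign (-1)^(m-1).
incl_excl = sum(
    (-1) ** (m - 1)
    * sum(prob(set.intersection(*c)) for c in combinations(events, m))
    for m in range(1, len(events) + 1)
)

print(direct, incl_excl)  # both print 0.8333...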
1.4 Conditional probability
•
P(B|A) = \frac{P(AB)}{P(A)},
or
P(A|B) = \frac{P(AB)}{P(B)}.
• From these definitions we can get:
–
P(A|B) = 0,
if AB = ∅.
–
P(A|B) = 1 − P(Ā|B).
1.4.2 Independence
•
P (AB) = P (A) · P (B)
Chapter 2
Random variables and their distribution
•
F(x_2) − F(x_1) = P(x_1 < X ⩽ x_2) ⩾ 0.
•
F(x_min) = 0,  F(x_max) = 1.
•
f(x) ⩾ 0.
•
\int_Ω f(x) dx = F(x_max) = 1,
which is called the normalization of the probability density.
•
\int_{x_1}^{x_2} f(x) dx = F(x_2) − F(x_1).
•
f(x) = F′(x).
2.2 Distribution of a function of a random variable
Y is a transformation of X, namely
Y = Y (X).
2.2.1 Discrete
•
p(y_j) = q_j = \sum_{y(x_i) = y_j} p(X = x_i).
•
p(y_i) = p(x_i) = p(x(y_i)),
if the map from X to Y is one-to-one.
2.2.2 Continuous
•
g(y) = \sum_{i=1}^{k} f(x_i(y)) \left|\frac{dx_i(y)}{dy}\right|.
•
g(y) = f(x(y)) \left|\frac{dx(y)}{dy}\right|,
if the transformation is one-to-one.
or
E(Y) = E\{g(X)\} = \int_Ω g(x) f(x) dx.
• Median
P(X ⩽ x_p) ⩾ p,  P(X ⩾ x_p) ⩾ 1 − p,  0 < p < 1;
the quantile x_p with p = 0.5 is the median.
• Most probable value
p(x_pro) = max{p(x_1), p(x_2), ···}
or
f(x_pro) = \max_{x∈Ω} f(x).
2.3.1 Moments
• The l-th order moment of X with respect to C is denoted by
α_l ≡ E\{(X − C)^l\}.
• Relationship:
µ_n = \sum_{k=0}^{n} \binom{n}{k} λ_{n−k} (−µ)^k,
λ_n = \sum_{k=0}^{n} \binom{n}{k} µ_{n−k} µ^k.
•
µ_0 = 1,  µ_1 = 0,  µ_2 = σ².
•
λ_0 = 1,  λ_1 = µ,  λ_2 = σ² + µ².
• Variance is defined as µ_2 and can be calculated by
V(X) = E(X²) − µ².
• Skewness
γ_1 ≡ \frac{µ_3}{µ_2^{3/2}} = \frac{E\{(X − µ)³\}}{σ³}.
• Kurtosis
γ_2 ≡ \frac{µ_4}{µ_2²} − 3 = \frac{E\{(X − µ)⁴\}}{σ⁴} − 3.
For a complex random variable
Z = X + iY,
E(Z) = E(X) + iE(Y).
• The characteristic function of a random variable is defined as the Fourier transform of the probability density function:
φ_X(t) ≡ E(e^{itX}) = \sum_k p(x_k) e^{itx_k} = \int_{−∞}^{∞} e^{itx} f(x) dx.
•
f(x) = \frac{1}{2π} \int_{−∞}^{∞} φ_X(t) e^{−ixt} dt;
•
F(b) − F(a) = \frac{1}{2π} \int_{−∞}^{∞} \frac{i}{t}\left(e^{−itb} − e^{−ita}\right) φ_X(t) dt.
2.4.1 Properties
•
φ(0) = E(e⁰) = 1.
•
|φ(t)| ⩽ 1.
•
φ_Y(t) = e^{ibt} φ_X(at)
is the characteristic function of Y = aX + b.
•
φ_Z(t) = φ_X(t) · φ_Y(t)
is the characteristic function of Z = X + Y when X and Y are independent.
• The moments follow from derivatives at t = 0:
λ_n = E(X^n) = i^{−n} φ_X^{(n)}(0) = i^{−n} \left.\frac{d^n φ_X(t)}{dt^n}\right|_{t=0}.
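To see the moment relation in action, here is a small sympy sketch (using the Poisson characteristic function φ(t) = exp(µ(e^{it} − 1)) from Chapter 4 as the test case) that recovers λ_1 and λ_2 by differentiating at t = 0:

import sympy as sp

t, mu = sp.symbols('t mu', positive=True)
phi = sp.exp(mu * (sp.exp(sp.I * t) - 1))  # Poisson characteristic function

# lambda_n = i^(-n) * d^n(phi)/dt^n evaluated at t = 0
lam1 = sp.simplify(sp.I**-1 * sp.diff(phi, t, 1).subs(t, 0))
lam2 = sp.simplify(sp.I**-2 * sp.diff(phi, t, 2).subs(t, 0))

print(lam1)                         # mu
print(lam2)                         # mu**2 + mu
print(sp.simplify(lam2 - lam1**2))  # the variance: mu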
2.5 Probability-generating function for discrete variables
•
φ_X(t) = \sum_k p(x_k) e^{itx_k} = E(e^{itX});
with
Z = e^{it}
we have
G(Z) ≡ E(Z^X) = \sum_k p(x_k) Z^{x_k}.
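As a quick check of this definition, a minimal sympy sketch (assuming the geometric generating function G(Z) = pZ/(1 − (1 − p)Z) that appears in Chapter 4) confirming the normalization G(1) = 1 and the mean E(r) = G′(1):

import sympy as sp

Z, p = sp.symbols('Z p', positive=True)
G = p * Z / (1 - (1 - p) * Z)  # PGF of the geometric distribution

print(sp.simplify(G.subs(Z, 1)))              # 1   (normalization)
print(sp.simplify(sp.diff(G, Z).subs(Z, 1)))  # 1/p (the mean)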
Chapter 3
Multidimensional random variables and their distribution
3.1 Two-dimension
•
F(x, y) = P(X ⩽ x, Y ⩽ y)
is the distribution function of (X, Y).
• Define
P(X = x_i, Y = y_j) = p_{ij},  i, j = 1, 2, ···,
then
F(x, y) = \sum_{x_i ⩽ x,\, y_j ⩽ y} p_{ij}.
Similarly,
F(x, y) = \int_{y_{min}}^{y} \int_{x_{min}}^{x} f(u, v) du dv,
with
f(x, y) = \frac{∂²F(x, y)}{∂x∂y}.
•
F_X(x) = F(x, y_max),  F_Y(y) = F(x_max, y).
• Independence:
F(x, y) = F_X(x) · F_Y(y)
or
f(x, y) = f_X(x) · f_Y(y).
3.2 Conditional probability distribution
• Discrete
P(X = x_i | Y = y_j) = \frac{P(X = x_i, Y = y_j)}{P(Y = y_j)} = \frac{p_{ij}}{p_j},  i = 1, 2, ···.
• Continuous
f(x|y) = \frac{f(x, y)}{f_Y(y)}.
• When H = X,
E(X) = \int_{Ω_y} \int_{Ω_x} x f(x, y) dx dy.
Therefore,
E(aX + bY) = \int_{Ω_y} \int_{Ω_x} (ax + by) f(x, y) dx dy = aE(X) + bE(Y).
• Given H(X, Y) = X^l Y^m,
λ_{lm} = E(X^l Y^m).
3.3.1 Covariance
• Define covariance
cov(X, Y) ≡ µ_{11} = E\{[X − E(X)][Y − E(Y)]\}
= E(XY) − E(X)E(Y) − E(X)E(Y) + E(X)E(Y)
= E(XY) − E(X)E(Y).
–
cov(X, Y) = cov(Y, X).
–
cov(aX, bY) = ab\,cov(X, Y).
–
cov(X_1 + X_2, Y) = cov(X_1, Y) + cov(X_2, Y).
•
V(aX + bY) = a²V(X) + b²V(Y) + 2ab\,cov(X, Y).
• Correlation coefficient
ρ_{XY} ≡ \frac{cov(X, Y)}{σ(X)σ(Y)}.
–
|ρ_{XY}| ⩽ 1.
–
|ρ_{XY}| = 1
if and only if Y = aX + b.
• Schwarz inequality,
where
V_{ij} = cov(X_i, X_j).
• Discrete
F(u, v) = P(U ⩽ u, V ⩽ v) = \sum_{(i,j)∈D} p_{ij}.
• Continuous
g(u, v) = f(x, y) \left|J\!\left(\frac{x, y}{u, v}\right)\right|,
where the Jacobian
J\!\left(\frac{x, y}{u, v}\right) = \frac{∂(x, y)}{∂(u, v)} = \begin{vmatrix} ∂x/∂u & ∂y/∂u \\ ∂x/∂v & ∂y/∂v \end{vmatrix} ≠ 0.
3.4.1 Z = X + Y
• Discrete
P_Z(z_k) = \sum_i P_X(x_i) P_Y(z_k − x_i) = \sum_j P_X(z_k − y_j) P_Y(y_j).
• Continuous
F_Z(z) = P(Z ⩽ z) = \iint_{x+y⩽z} f(x, y) dx dy.
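In the discrete case the sum rule above is an ordinary convolution of the two probability vectors, which numpy carries out directly; a minimal sketch (two fair dice are an illustrative choice):

import numpy as np

# PMFs of two independent fair dice X and Y on the values 1..6.
p_x = np.full(6, 1 / 6)
p_y = np.full(6, 1 / 6)

# P_Z(z) = sum_i P_X(x_i) P_Y(z - x_i): a discrete convolution.
p_z = np.convolve(p_x, p_y)  # supports the sums 2..12

for z, q in enumerate(p_z, start=2):
    print(z, round(q, 4))
print(p_z.sum())  # 1.0, as a normalization check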
3.4.2 Z = X − Y
•
f_Z(z) = \int f(z + y, y) dy.
3.4.3 Z = XY
•
f_Z(z) = \int f\!\left(\frac{z}{y}, y\right) \frac{1}{|y|} dy = \int_{0}^{+∞} \frac{1}{y} f\!\left(\frac{z}{y}, y\right) dy − \int_{−∞}^{0} \frac{1}{y} f\!\left(\frac{z}{y}, y\right) dy.
3.4.4 Z = X/Y
•
f_Z(z) = \int |y| f(zy, y) dy.
3.4.5 Z = X² + Y²
•
f_Z(z) = \frac{1}{2} \int_{0}^{2π} f(\sqrt{z}\cos φ, \sqrt{z}\sin φ) dφ.
Chapter 4
Probability distribution
4.1 Bernoulli distribution
• Value of expectation
E(X) = p.
• Variance
V(X) = p(1 − p).
• Binomial distribution
– Let
r = X_1 + X_2 + ··· + X_n.
–
B(r; n, p) = \binom{n}{r} p^r (1 − p)^{n−r},  r = 0, 1, ···, n,
F(x; n, p) = \sum_{r=0}^{x} B(r; n, p),  x = 0, 1, ···, n.
–
µ ≡ E(r) = np,  V(r) = np(1 − p),
γ_1 = \frac{1 − 2p}{[np(1 − p)]^{1/2}},  γ_2 = \frac{1 − 6p(1 − p)}{np(1 − p)}.
4.2 Multinomial distribution
•
E = A_1 + A_2 + ··· + A_l,
and
P(A_j) = p_j,  j = 1, 2, ···, l.
•
M(r; n, p) = \frac{n!}{r_1! r_2! ··· r_l!} p_1^{r_1} p_2^{r_2} ··· p_l^{r_l}.
• Value of expectation
E(r_j) = np_j,  j = 1, 2, ···, l.
• Variance
V(r_j) = np_j(1 − p_j).
• Covariance
cov(r_i, r_j) = −np_i p_j,  i ≠ j.
4.3 Poisson distribution
•
E(r) = G′(1) = µ,
V(r) = G″(1) + G′(1) − [G′(1)]² = µ,
namely,
E(r) = V(r) = µ.
• Characteristic function
φ(t) = e^{µ(e^{it} − 1)}.
•
γ_1 = \frac{1}{\sqrt{µ}},  γ_2 = \frac{1}{µ},
λ_{k+1} = µ\left(λ_k + \frac{dλ_k}{dµ}\right).
• Poisson's theorem: when n → ∞ with np = µ held fixed, the binomial distribution B(r; n, p) tends to the Poisson distribution with mean µ.
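A numerical illustration of Poisson's theorem using only the standard library: hold µ = np fixed, let n grow, and compare the binomial pmf at one point with e^{−µ}µ^r/r! (the values of µ and r are arbitrary):

from math import comb, exp, factorial

mu, r = 3.0, 2  # compare P(r = 2) for mu = 3
poisson = exp(-mu) * mu**r / factorial(r)

for n in (10, 100, 1000, 10000):
    p = mu / n
    binom = comb(n, r) * p**r * (1 - p)**(n - r)
    print(n, binom)       # approaches the Poisson value

print('limit:', poisson)  # ~0.2240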
4.5 Geometric distribution
• Value of expectation
E(r) = p^{−1}.
• Variance
V(r) = (1 − p)/p².
•
γ_1 = \frac{2 − p}{(1 − p)^{1/2}},
γ_2 = \frac{p² − 6p + 6}{1 − p}.
• Probability-generating function
G(Z) = \frac{pZ}{1 − (1 − p)Z}.
4.6 Pascal distribution
• Value of expectation
E(r) = k/p.
• Variance
V(r) = k(1 − p)/p².
•
γ_1 = \frac{2 − p}{\sqrt{k(1 − p)}},
γ_2 = \frac{p² − 6p + 6}{k(1 − p)}.
• Probability-generating function
G(Z) = \left[\frac{pZ}{1 − (1 − p)Z}\right]^k.
4.7 Hypergeometric distribution
• Value of expectation
E(r) = \frac{na}{N}.
• Variance
V(r) = \frac{N − n}{N − 1} · \frac{na}{N}\left(1 − \frac{a}{N}\right).
4.8 Uniform distribution
• Value of expectation
E(X) = \int_a^b x f(x) dx = \frac{a + b}{2}.
• Variance
V(X) = \int_a^b [x − E(X)]² f(x) dx = \frac{(b − a)²}{12}.
•
γ_1 = 0,  γ_2 = −1.2.
• Characteristic function
φ(t) = \frac{e^{itb} − e^{ita}}{it(b − a)}.
4.10 Gamma distribution
•
f(x; α, β) = \frac{β^α}{Γ(α)} x^{α−1} e^{−βx},  α, β > 0,  0 ⩽ x < ∞.
• Value of expectation
E(X) = α/β.
• Variance
V(X) = α/β².
•
γ_1 = \frac{2}{\sqrt{α}},  γ_2 = \frac{6}{α}.
• Characteristic function
φ(t) = (1 − it/β)^{−α}.
4.11 Beta distribution
• Variance
V(X) = \frac{mn}{(m + n)²(m + n + 1)}.
•
γ_1 = \frac{2(n − m)\sqrt{m + n + 1}}{(m + n + 2)\sqrt{mn}},
γ_2 = \frac{3(m + n + 1)\left[2(m + n)² + mn(m + n − 6)\right]}{mn(m + n + 2)(m + n + 3)} − 3.
• Characteristic function
φ(t) = \frac{Γ(m + n)}{Γ(m)} \sum_{k=0}^{∞} \frac{Γ(m + k)(it)^k}{Γ(m + n + k)Γ(k + 1)}.
4.12 Normal distribution
•
E(X) = µ,  V(X) = σ²,  γ_1 = γ_2 = 0.
4.14 Landau distribution
4.15 χ² Distribution
•
f(y; n) = \frac{1}{Γ(n/2)\, 2^{n/2}} y^{n/2 − 1} e^{−y/2},  y ⩾ 0,
with n degrees of freedom.
•
F(y; n) = \frac{1}{Γ(n/2)\, 2^{n/2}} \int_0^y u^{n/2 − 1} e^{−u/2} du.
• Characteristic function
φ(t) = (1 − 2it)^{−n/2}.
•
E(Y) = (−i) \left.\frac{dφ(t)}{dt}\right|_{t=0} = (−i)\left(−\frac{n}{2}\right)(−2i) = n,
V(Y) = E(Y²) − [E(Y)]² = n² + 2n − n² = 2n.
•
γ_1 = \frac{µ_3}{µ_2^{3/2}} = 2\sqrt{\frac{2}{n}},  γ_2 = \frac{µ_4}{µ_2²} − 3 = \frac{12}{n}.
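A Monte Carlo sketch (numpy; the degrees of freedom and the sample size are arbitrary) checking E(Y) = n and V(Y) = 2n by building χ² variates as sums of squared standard normals:

import numpy as np

rng = np.random.default_rng(0)
n, trials = 5, 200_000  # n degrees of freedom

# A chi-square variate with n DOF is a sum of n squared N(0, 1) variates.
y = (rng.standard_normal((trials, n)) ** 2).sum(axis=1)

print(y.mean(), n)     # ~5.0  vs 5
print(y.var(), 2 * n)  # ~10.0 vs 10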
4.16 T Distribution
•
f(t; n) = \frac{Γ\left(\frac{n+1}{2}\right)}{\sqrt{nπ}\, Γ\left(\frac{n}{2}\right)} \left(1 + \frac{t²}{n}\right)^{−\frac{n+1}{2}},  −∞ < t < ∞.
4.17 F Distribution
Chapter 5
\lim_{n→∞} P\{|X_i − a| ⩾ ε\} = 0,
or
\lim_{n→∞} P\{|X_i − a| < ε\} = 1.
• Generally,
P\left\{\left|\frac{1}{n}\sum_{i=1}^{n} X_i − \frac{1}{n}\sum_{i=1}^{n} µ_i\right| < ε\right\} ⩾ 1 − \frac{V\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)}{ε²} ⩾ 1 − \frac{C}{nε²}.
5.1.3 Bernoulli law of large numbers
•
\lim_{n→∞} P\left\{\left|\frac{m}{n} − p\right| < ε\right\} = 1.
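A simulation sketch of the Bernoulli law (numpy; p and the sample sizes are arbitrary choices) showing the relative frequency m/n settling near p:

import numpy as np

rng = np.random.default_rng(1)
p = 0.3  # illustrative success probability

for n in (100, 10_000, 1_000_000):
    m = rng.binomial(n, p)  # number of successes in n trials
    print(n, m / n)         # m/n -> p as n grows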
• Lyapunov condition:
\lim_{n→∞} \frac{1}{B_n^{2+δ}} \sum_{i=1}^{n} E|X_i − µ_i|^{2+δ} = 0.
With
Y = \frac{\sum_{i=1}^{n} X_i − \sum_{i=1}^{n} µ_i}{B_n},
we have
\lim_{n→∞} F(y) = \lim_{n→∞} P\left\{\frac{\sum_{i=1}^{n} X_i − \sum_{i=1}^{n} µ_i}{B_n} ⩽ y\right\} = \int_{−∞}^{y} \frac{1}{\sqrt{2π}} e^{−t²/2} dt.
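A numpy sketch of this convergence (i.i.d. exponential summands are an arbitrary choice; here µ_i = 1 and V(X_i) = 1, so B_n = √n): standardized sums land close to N(0, 1):

import numpy as np

rng = np.random.default_rng(2)
n, trials = 100, 50_000

# Sums of n i.i.d. exponential(1) variates, standardized by B_n = sqrt(n).
x = rng.exponential(1.0, size=(trials, n))
y = (x.sum(axis=1) - n) / np.sqrt(n)

print(y.mean(), y.std())  # ~0, ~1
print((y <= 1.0).mean())  # ~0.84, close to Phi(1) = 0.8413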
Chapter 6
• Define
D_n = \max_{−∞ < x < ∞} |F_n^*(x) − F(x)|.
6.2 Statistics
6.2.1 Order Statistic
• X_1^{(n)}, X_2^{(n)}, ···, X_n^{(n)} is the order statistic of X_1, X_2, ···, X_n, where the observed values are sorted in order of size.
•
M_1 = 0,
M_2 = Λ_2 − Λ_1²,
M_3 = Λ_3 − 3Λ_2Λ_1 + 2Λ_1³,
M_4 = Λ_4 − 4Λ_3Λ_1 + 6Λ_2Λ_1² − 3Λ_1⁴.
r = \frac{\sum_{i=1}^{n} X_i Y_i − n\bar{X}\bar{Y}}{\left(\sum_{i=1}^{n} X_i² − n\bar{X}²\right)^{1/2} \left(\sum_{i=1}^{n} Y_i² − n\bar{Y}²\right)^{1/2}}.
• For multiple dimensions, {X_1, ···, X_k} has observed values x_{ij}, i = 1, ···, n, j = 1, ···, k.
• Mean of x_j
\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij},  j = 1, 2, ···, k.
• Variance of x_j
s_j² = \frac{1}{n − 1}\sum_{i=1}^{n} (x_{ij} − \bar{x}_j)²,  j = 1, 2, ···, k.
• Covariance of x_j and x_l
s_{jl} = \frac{1}{n − 1}\sum_{i=1}^{n} (x_{ij} − \bar{x}_j)(x_{il} − \bar{x}_l),  j, l = 1, 2, ···, k.
• Correlation coefficient
r_{jl} = \frac{s_{jl}}{s_j s_l},  j, l = 1, 2, ···, k,
written in the form of a matrix:
R = \begin{pmatrix} 1 & r_{12} & ··· & r_{1k} \\ r_{21} & 1 & ··· & r_{2k} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ r_{k1} & r_{k2} & ··· & 1 \end{pmatrix}.
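These sample quantities match numpy's cov and corrcoef (which use the same 1/(n − 1) convention); a small sketch on synthetic data:

import numpy as np

rng = np.random.default_rng(3)
n = 500  # observations of k = 3 variables (synthetic)
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)  # correlated with x1
x3 = rng.normal(size=n)
data = np.column_stack([x1, x2, x3])      # shape (n, k)

S = np.cov(data, rowvar=False)       # s_jl with the 1/(n - 1) factor
R = np.corrcoef(data, rowvar=False)  # r_jl = s_jl / (s_j s_l)

print(S.round(3))
print(R.round(3))  # ones on the diagonal, r_12 near 0.8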
6.5 Bayes Formula
• Marginal probability
P(A_i) = \sum_{j=1}^{n} P(A_i ∩ B_j),
while
\sum_i P(A_i) = 1 = \sum_j P(B_j).
• Total probability
P(A) = \sum_{j=1}^{n} P(A|B_j) · P(B_j).
• Bayes formula
P(B_i|A) = \frac{P(A|B_i) P(B_i)}{\sum_{j=1}^{n} P(A|B_j) P(B_j)}.
• Bayes problem
Prior probability, in Bayesian statistics, is the probability of an event before new data is collected, denoted by
P(B_i).
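A worked numerical instance of the total-probability and Bayes formulas (the disease-test numbers are an illustrative assumption, not from the text):

# Hypothetical priors P(B_j): disease present / absent.
prior = {'disease': 0.01, 'healthy': 0.99}
# Hypothetical likelihoods P(A | B_j) for a positive test A.
like = {'disease': 0.95, 'healthy': 0.05}

# Total probability: P(A) = sum_j P(A | B_j) P(B_j).
p_a = sum(like[b] * prior[b] for b in prior)

# Bayes formula: posterior P(B_i | A).
post = {b: like[b] * prior[b] / p_a for b in prior}

print(p_a)   # 0.059
print(post)  # disease: ~0.161, healthy: ~0.839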
Chapter 7
Parameter estimation
•
L = L(X_1, X_2, ···, X_n|ϑ) = \prod_{i=1}^{n} f(X_i|ϑ) ≡ L(X|ϑ).
7.1.1 Consistency
• T converges in probability to ϑ:
P\left(|\hat{ϑ}_n − ϑ| > ε\right) < η.
7.1.2 Unbiasedness
• When n is finite, we need a further criterion. If the expected value of T is equal to ϑ,
E(T) = ϑ,
then T is an unbiased estimate of ϑ.
•
E(T) = ϑ + b(ϑ),  b ≠ 0,
is a biased estimate. A proper estimate should satisfy
\lim_{n→∞} E(T) = ϑ.
7.1.3 Effectiveness and the least variance
• Since
E\left(−\frac{∂² \ln L}{∂ϑ²}\right) = E\left[A(ϑ)\left(\frac{∂τ}{∂ϑ} + \frac{∂b}{∂ϑ}\right)\right] = A(ϑ)\left(\frac{∂τ}{∂ϑ} + \frac{∂b}{∂ϑ}\right),
the least variance can be written as
MVB ≡ \min[V(T)] = \frac{∂τ/∂ϑ + ∂b/∂ϑ}{A(ϑ)},
where
\frac{∂ \ln L}{∂ϑ} = A(ϑ)\left[T − τ(ϑ) − b(ϑ)\right].
7.1.4 Sufficiency
• If the estimator T(X_1, X_2, ···, X_n) uses all the information in X_1, X_2, ···, X_n, then T is a sufficient estimator of ϑ. Namely,
f(X_1, X_2, ···, X_n|T)
does not depend on ϑ.
• Fisher-Neyman theorem: sufficiency ⇔
L(X_1, X_2, ···, X_n|ϑ) = \prod_{i=1}^{n} f(X_i; ϑ) = G(T|ϑ) H(X_1, X_2, ···, X_n),
t_a = t(ϑ_a),  t_b = t(ϑ_b).
Chapter 8
Maximum likelihood method
The likelihood L(x|ϑ) is a function of ϑ.
• The maximum likelihood principle. According to this principle, we should use the ϑ̂ at which the likelihood function attains its maximum:
L(x|\hat{ϑ}) ⩾ L(x|ϑ).
•
ϑ̂ = ϑ̂(x1 , · · · , xn )
is called the maximum likelihood estimator.
•
\left.\frac{∂}{∂ϑ} \ln L(X|ϑ)\right|_{ϑ=\hat{ϑ}} = \left.\sum_{i=1}^{n} \frac{∂}{∂ϑ} \ln f(X_i|ϑ)\right|_{ϑ=\hat{ϑ}} = 0,
\left.\frac{∂²}{∂ϑ²} \ln L(X|ϑ)\right|_{ϑ=\hat{ϑ}} = \left.\sum_{i=1}^{n} \frac{∂²}{∂ϑ²} \ln f(X_i|ϑ)\right|_{ϑ=\hat{ϑ}} < 0.
The first equation is called the likelihood equation.
8.2 Maximum likelihood estimation of normal population parameters
8.2.1 Maximum likelihood estimation of mean µ
• When every observed value has the same σ:
L(X; σ|µ) = \prod_{i=1}^{n} \frac{1}{\sqrt{2π}\,σ} \exp\left[−\frac{1}{2}\left(\frac{X_i − µ}{σ}\right)²\right],
then
\frac{∂ \ln L}{∂µ} = \frac{∂}{∂µ} \sum_{i=1}^{n} \left[−\frac{1}{2}\ln(2πσ²) − \frac{1}{2}\left(\frac{X_i − µ}{σ}\right)²\right] = 0,
and we have
\hat{µ} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}.
The variance is
V(\hat{µ}) = \frac{σ²}{n}.
• When the observed values have errors σ_i respectively:
\hat{µ} = \frac{\sum_{i=1}^{n} X_i/σ_i²}{\sum_{i=1}^{n} 1/σ_i²} = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i},
with variance
V(\hat{µ}) = \frac{1}{\sum_{i=1}^{n} 1/σ_i²} = \frac{1}{\sum_{i=1}^{n} w_i}.
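A small numpy sketch of this weighted mean; the measurements and their errors are made-up numbers:

import numpy as np

# Hypothetical measurements X_i with individual errors sigma_i.
x = np.array([10.2, 9.8, 10.5, 10.0])
sigma = np.array([0.3, 0.2, 0.5, 0.2])

w = 1.0 / sigma**2                  # weights w_i = 1/sigma_i^2
mu_hat = np.sum(w * x) / np.sum(w)  # weighted mean
var_mu = 1.0 / np.sum(w)            # V(mu_hat) = 1 / sum w_i

print(mu_hat, np.sqrt(var_mu))      # estimate and its standard error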
8.2.2 Maximum likelihood estimation of the variance σ²
•
\hat{σ}² = \frac{1}{n}\sum_{i=1}^{n} (X_i − \bar{X})²,
with
V(\hat{σ}²) = \frac{2σ⁴}{n}.
• The unbiased estimate is
S² = \frac{1}{n − 1}\sum_{i=1}^{n} (X_i − \bar{X})² = \frac{n}{n − 1}\hat{σ}².
Chapter 9
Chapter 10
Moments method
Chapter 11
Interval estimation
Chapter 12
Chapter 13
Chapter 14
Chapter 15
Chapter 16
Chapter 17