
The Notebook of Data Analysis

Zixu Wang

January 30, 2024


Contents

1 Preliminary probability theory
  1.1 Probability
  1.2 Axiomatic approach to probability
  1.3 Properties of probability
  1.4 Conditional probability
    1.4.1 Other probabilities
    1.4.2 Independence

2 Random variables and their distribution
  2.1 Random variables
    2.1.1 Discrete random variables
    2.1.2 Continuous random variables
  2.2 Distribution of a function of a random variable
    2.2.1 Discrete
    2.2.2 Continuous
  2.3 Numerical characteristics
    2.3.1 Moments
    2.3.2 Chebyshev's inequality
  2.4 Characteristic function of a random variable
    2.4.1 Properties
    2.4.2 Relationship with moments
  2.5 Probability-generating function for discrete variables

3 Multidimensional random variables and their distribution
  3.1 Two dimensions
  3.2 Conditional probability distribution
  3.3 Numerical characteristics
    3.3.1 Covariance
  3.4 Distribution of a function of two-dimensional random variables
    3.4.1 Z = X + Y
    3.4.2 Z = X − Y
    3.4.3 Z = XY
    3.4.4 Z = X/Y
    3.4.5 Z = X^2 + Y^2
  3.5 Multidimensional random variables

4 Probability distributions
  4.1 Bernoulli distribution
  4.2 Multinomial distribution
  4.3 Poisson distribution
  4.4 Compound Poisson distribution
  4.5 Geometric distribution
  4.6 Pascal distribution
  4.7 Hypergeometric distribution
  4.8 Uniform distribution
  4.9 Exponential distribution
  4.10 Gamma distribution
  4.11 Beta distribution
  4.12 Normal distribution
  4.13 Cauchy distribution
  4.14 Landau distribution
  4.15 χ² distribution
  4.16 t distribution
  4.17 F distribution

5 Law of large numbers & central limit theorem
  5.1 Law of large numbers
    5.1.1 Chebyshev's law of large numbers
    5.1.2 Khinchin's law of large numbers
    5.1.3 Bernoulli's law of large numbers
    5.1.4 Poisson's law of large numbers
  5.2 Central limit theorem
    5.2.1 Independent and identically distributed
    5.2.2 Lyapunov theorem
    5.2.3 De Moivre–Laplace theorem

6 Samples and their distribution
  6.1 Random samples and their distribution function
  6.2 Statistics
    6.2.1 Order statistics
    6.2.2 Sample mean
    6.2.3 Sample variance
    6.2.4 The k-th order origin moment of the sample
    6.2.5 The k-th order central moment of the sample
    6.2.6 Sample skewness
    6.2.7 Sample kurtosis
    6.2.8 Sample covariance
    6.2.9 Sample correlation coefficient
  6.3 Statistics and their numerical characteristics
  6.4 Sample distribution
  6.5 Bayes formula

7 Parameter estimation
  7.1 Estimators and the likelihood function
    7.1.1 Consistency
    7.1.2 Unbiasedness
    7.1.3 Efficiency and the least variance
    7.1.4 Sufficiency
  7.2 Interval estimation
    7.2.1 Pivot variable method
    7.2.2 Large sample method
  7.3 Confidence interval for the normal distribution

8 Maximum likelihood method
  8.1 The maximum likelihood principle
  8.2 Maximum likelihood estimation of normal population parameters
    8.2.1 Maximum likelihood estimation of the mean µ
    8.2.2 Maximum likelihood estimation of the variance σ²
    8.2.3 Maximum likelihood estimation of both µ and σ²

9 Least squares method

10 Moments method

11 Interval estimation

12
13
14
15
16
17
Chapter 1

Preliminary probability theory

1.1 Probability

• Statistical probability
  $P(A) = \lim_{N \to \infty} \frac{n}{N}.$

• Classical probability
  $P(A) = k/n.$

• Geometric probability
  $P(A) = \frac{\omega_A}{\omega_S}.$

1.2 Axiomatic approach to probability

• $0 \leqslant P(A) \leqslant 1,$

  $P(S) = 1,$

  and, for mutually exclusive events $A_1, A_2, \cdots$:
  $P(A_1 \cup A_2 \cup \cdots \cup A_n) = \sum_{k=1}^{n} P(A_k),$
  $P(A_1 \cup A_2 \cup \cdots \cup A_n \cup \cdots) = \sum_{k=1}^{\infty} P(A_k).$

• The three definitions above satisfy these axioms.

1.3 Properties of probability

$P(A) + P(\bar{A}) = 1.$

$P(\varnothing) = 0.$

$P(A) \geqslant P(B)$ if $A \supset B$.

$\sum_i P(A_i) = 1$ if $\{A_i\}$ is a partition of $S$.

$P(A - B) = P(A) - P(B)$ if $A \supset B$.

$P(A \cup B) = P(A) + P(B) - P(AB).$

• Generalization: define
  $S_1 = \sum_{i=1}^{n} P(A_i),$
  $S_2 = \sum_{1 \leqslant i < j \leqslant n} P(A_i A_j),$
  $S_3 = \sum_{1 \leqslant i < j < k \leqslant n} P(A_i A_j A_k),$
  and so on. Then
  $P\left(\bigcup_{i=1}^{n} A_i\right) \equiv P(A_1 \cup A_2 \cup \cdots \cup A_n) = S_1 - S_2 + S_3 - \cdots + (-1)^{n-1} S_n,$
  which is the addition theorem of probability.
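As a quick sanity check, the addition theorem can be verified by direct enumeration on a finite sample space with equally likely outcomes (a minimal sketch; the three events below are arbitrary illustrative sets, not taken from the text):

```python
from itertools import combinations

# Equally likely outcomes on S = {0, ..., 9}; A1, A2, A3 are arbitrary events.
S = set(range(10))
events = [{0, 1, 2, 3}, {2, 3, 4, 5}, {3, 5, 6, 7}]
P = lambda A: len(A) / len(S)  # classical probability P(A) = k/n

lhs = P(set.union(*events))    # P(A1 ∪ A2 ∪ A3)

# S1 - S2 + S3: alternating sums of P over single, double, triple intersections
rhs = sum(
    (-1) ** (m - 1) * sum(P(set.intersection(*c)) for c in combinations(events, m))
    for m in range(1, len(events) + 1)
)

assert abs(lhs - rhs) < 1e-12
print(lhs, rhs)  # both 0.8
```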

1.4 Conditional probability

$P(B|A) = \frac{P(AB)}{P(A)},$
or
$P(A|B) = \frac{P(AB)}{P(B)}.$

• We can get
  $P(AB) = P(B|A) \cdot P(A) = P(A|B) \cdot P(B),$
  which is the multiplication theorem of probability.

1.4.1 Other probabilities

$P(A|B) = 1$ if $A \supset B$.

$P(A|B) = 0$ if $AB = \varnothing$.

$P(A|B) = 1 - P(\bar{A}|B).$

1.4.2 Independence

$P(AB) = P(A) \cdot P(B).$
Chapter 2

Random variables and their distribution

2.1 Random variables

• A random variable is a variable whose value is determined by the outcome of a random experiment; formally, it is a function that assigns a value to each outcome of the experiment.

• CDF (cumulative distribution function):
  The cumulative distribution function of a real-valued random variable X, evaluated at x, is the probability that X takes a value less than or equal to x. It is denoted by F(x).

  $0 \leqslant F(x) \leqslant 1.$

  $F(x_2) - F(x_1) = P(x_1 < X \leqslant x_2) \geqslant 0.$

  $F(x_{\min}) = 0, \quad F(x_{\max}) = 1.$

2.1.1 Discrete random variables

• $F(x) = \sum_{x_i \leqslant x} p_i.$

2.1.2 Continuous random variables

• $F(x) = \int_{x_{\min}}^{x} f(t)\,dt,$
  where f(x) is called the probability density function.

  $f(x) \geqslant 0.$

• $\int f(x)\,dx = F(x_{\max}) = 1,$
  which is called the normalization of the probability density.

• $\int_{x_1}^{x_2} f(x)\,dx = F(x_2) - F(x_1),$

  $f(x) = F'(x).$

2.2 Distribution of a function of a random variable

Y is a transformation of X, namely Y = Y(X).

2.2.1 Discrete

• $p(y_j) = q_j = \sum_{y(x_i) = y_j} p(X = x_i).$

  $p(y_i) = p(x_i) = p(x(y_i)),$
  if the map from X to Y is one to one.

2.2.2 Continuous

$g(y) = \sum_{i=1}^{k} f(x_i(y)) \left| \frac{dx_i(y)}{dy} \right|.$

$g(y) = f(x(y)) \left| \frac{dx(y)}{dy} \right|,$
if one to one.

2.3 Numerical characteristics

• Mathematical expectation
  $E(Y) = E\{g(X)\} = \sum_i g(x_i) p(x_i)$
  or
  $E(Y) = E\{g(X)\} = \int g(x) f(x)\,dx.$

• Quantile $x_p$:
  $P(X \leqslant x_p) \geqslant p, \quad P(X \geqslant x_p) \geqslant 1 - p, \quad 0 < p < 1;$
  the median is the quantile with p = 0.5.

• Most probable value (mode)
  $p(x_{\mathrm{pro}}) = \max\{p(x_1), p(x_2), \cdots\}$
  or
  $f(x_{\mathrm{pro}}) = \max_{x \in \Omega} f(x).$

2.3.1 Moments

• The l-th order moment of X with respect to C is denoted by
  $\alpha_l \equiv E\{(X - C)^l\}.$

  When C = 0, it is called the origin moment (algebraic moment) of X:
  $\lambda_l \equiv E(X^l).$

  When C = µ, it is called the central moment:
  $\mu_l \equiv E\{(X - \mu)^l\}.$

• Relationship
  $\mu_n = \sum_{k=0}^{n} \binom{n}{k} \lambda_{n-k} (-\mu)^k,$
  $\lambda_n = \sum_{k=0}^{n} \binom{n}{k} \mu_{n-k} \mu^k.$

  $\mu_0 = 1, \quad \mu_1 = 0, \quad \mu_2 = \sigma^2;$
  $\lambda_0 = 1, \quad \lambda_1 = \mu, \quad \lambda_2 = \sigma^2 + \mu^2.$

• Variance is defined as $\mu_2$ and can be calculated by
  $V(X) = E(X^2) - \mu^2.$

• Skewness
  $\gamma_1 \equiv \frac{\mu_3}{\mu_2^{3/2}} = \frac{E\{(X - \mu)^3\}}{\sigma^3}.$

• Kurtosis
  $\gamma_2 \equiv \frac{\mu_4}{\mu_2^2} - 3 = \frac{E\{(X - \mu)^4\}}{\sigma^4} - 3.$

2.3.2 Chebyshev's inequality

$P\{|X - \mu| \geqslant \varepsilon\} \leqslant \frac{\sigma^2}{\varepsilon^2},$
or
$P\{|X - \mu| < \varepsilon\sigma\} \geqslant 1 - 1/\varepsilon^2.$

$P\{|X - \mu| < 3\sigma\} \geqslant 0.8889,$

$P\{|X - \mu| < 4\sigma\} \geqslant 0.9375.$
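A minimal simulation sketch of the inequality (assuming an exponential population with µ = σ = 1, an arbitrary choice): the observed tail probabilities stay below the Chebyshev bound.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)  # mu = 1, sigma = 1
mu, sigma = 1.0, 1.0

for k in (2, 3, 4):
    tail = np.mean(np.abs(x - mu) >= k * sigma)  # P{|X - mu| >= k*sigma}
    bound = 1.0 / k**2                           # Chebyshev: sigma^2/(k*sigma)^2
    print(f"k={k}: empirical {tail:.4f} <= bound {bound:.4f}")
```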

2.4 Characteristic function of a random variable

• Complex random variable
  $Z = X + iY,$
  with
  $E(Z) = E(X) + iE(Y).$

• The characteristic function of a random variable is defined as the Fourier transform of the probability density function:
  $\varphi_X(t) \equiv E(e^{itX}) = \sum_k p(x_k) e^{itx_k} = \int_{-\infty}^{\infty} e^{itx} f(x)\,dx.$

• $f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \varphi_X(t) e^{-ixt}\,dt;$

• $F(b) - F(a) = \frac{i}{2\pi} \int_{-\infty}^{\infty} \frac{1}{t} \left( e^{-itb} - e^{-ita} \right) \varphi_X(t)\,dt.$

2.4.1 Properties

$\varphi(0) = E(e^0) = 1.$

$|\varphi(t)| \leqslant 1.$

$\varphi_Y(t) = e^{ibt} \varphi_X(at)$
is the characteristic function of Y = aX + b.

$\varphi_Z(t) = \varphi_X(t) \cdot \varphi_Y(t)$
is the characteristic function of Z = X + Y when X and Y are independent.

2.4.2 Relationship with moments

• With the origin moment:
  $\varphi_X^{(n)}(0) = i^n \int_{-\infty}^{\infty} x^n f(x)\,dx = i^n E\{X^n\},$
  or
  $\lambda_n = E(X^n) = i^{-n} \varphi_X^{(n)}(0) = i^{-n} \left[ \frac{d^n \varphi_X(t)}{dt^n} \right]_{t=0}.$

• With the central moment:
  $\mu_n = i^{-n} \varphi_{X-\mu}^{(n)}(0) = i^{-n} \left[ \frac{d^n}{dt^n} \left( e^{-i\mu t} \varphi_X(t) \right) \right]_{t=0}.$
2.5 Probability-generating function for discrete variables

• $\varphi_X(t) = \sum_k p(x_k) e^{itx_k} = E(e^{itX});$
  with
  $Z = e^{it}$
  we have
  $G(Z) \equiv E(Z^X) = \sum_k p(x_k) Z^{x_k},$
  which is called the probability-generating function.
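As an illustrative sketch (the fair die below is an assumed example, not from the text): derivatives of G(Z) at Z = 0 recover the probabilities, and G'(1) gives the expectation.

```python
import sympy as sp

Z = sp.symbols('Z')
# Fair die: X in {1,...,6}, p(x_k) = 1/6, so G(Z) = sum_k p(x_k) Z^{x_k}.
G = sum(sp.Rational(1, 6) * Z**k for k in range(1, 7))

mean = sp.diff(G, Z).subs(Z, 1)                      # E(X) = G'(1) = 7/2
p3 = sp.diff(G, Z, 3).subs(Z, 0) / sp.factorial(3)   # P(X=3) = G'''(0)/3! = 1/6
print(mean, p3)
```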

Chapter 3

Multidimensional random variables and their distribution

3.1 Two dimensions

$F(x, y) = P(X \leqslant x, Y \leqslant y)$
is the distribution function of {X, Y}.

• Define
  $P(X = x_i, Y = y_j) = p_{ij}, \quad i, j = 1, 2, \cdots,$
  then
  $F(x, y) = \sum_{x_i \leqslant x,\, y_j \leqslant y} p_{ij}.$

  Similarly,
  $F(x, y) = \int_{y_{\min}}^{y} \int_{x_{\min}}^{x} f(u, v)\,du\,dv,$
  with
  $f(x, y) = \frac{\partial^2 F(x, y)}{\partial x \partial y}.$

  $F_X(x) = F(x, y_{\max}), \quad F_Y(y) = F(x_{\max}, y).$

• Independence
  $F(x, y) = F_X(x) \cdot F_Y(y)$
  or
  $f(x, y) = f_X(x) \cdot f_Y(y).$

3.2 Conditional probability distribution

• Discrete
  $P(X = x_i | Y = y_j) = \frac{P(X = x_i, Y = y_j)}{P(Y = y_j)} = \frac{p_{ij}}{p_{\cdot j}}, \quad i = 1, 2, \cdots.$

• Continuous
  $f(x|y) = \frac{f(x, y)}{f_Y(y)}.$

3.3 Numerical characteristics

• $E\{H(X, Y)\} = \int_{\Omega_y} \int_{\Omega_x} H(x, y) f(x, y)\,dx\,dy,$

  $V\{H(X, Y)\} = E\{[H(X, Y) - E(H(X, Y))]^2\}.$

• When H = X,
  $E(X) = \int_{\Omega_y} \int_{\Omega_x} x f(x, y)\,dx\,dy.$

  Therefore,
  $E(aX + bY) = \int_{\Omega_y} \int_{\Omega_x} (ax + by) f(x, y)\,dx\,dy = aE(X) + bE(Y).$

• Given $H(X, Y) = X^l Y^m$,
  $\lambda_{lm} = E(X^l Y^m)$
  is called the (l + m)-th order origin moment. If X and Y are independent, then
  $E(XY) = E(X) \cdot E(Y).$

3.3.1 Covariance

• Define covariance
  $\operatorname{cov}(X, Y) \equiv \mu_{11} = E\{[X - E(X)][Y - E(Y)]\} = E(XY) - E(X)E(Y).$

  $\operatorname{cov}(X, Y) = \operatorname{cov}(Y, X).$

  $\operatorname{cov}(aX, bY) = ab \operatorname{cov}(X, Y).$

  $\operatorname{cov}(X_1 + X_2, Y) = \operatorname{cov}(X_1, Y) + \operatorname{cov}(X_2, Y).$

  $V(aX + bY) = a^2 V(X) + b^2 V(Y) + 2ab \operatorname{cov}(X, Y).$

• Correlation coefficient
  $\rho_{XY} \equiv \frac{\operatorname{cov}(X, Y)}{\sigma(X)\sigma(Y)},$
  with
  $|\rho_{XY}| \leqslant 1,$
  and $|\rho_{XY}| = 1$ exactly when Y = aX + b.

• Schwarz inequality
  $[\operatorname{cov}(X, Y)]^2 - V(X)V(Y) \leqslant 0.$

• Covariance matrix (error matrix)
  $V = \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix},$
  where
  $V_{ij} = \operatorname{cov}(X_i, X_j).$

3.4 Distribution of a function of two-dimensional random variables

$U = U(X, Y), \quad V = V(X, Y).$

• Discrete
  $F(u, v) = P(U \leqslant u, V \leqslant v) = \sum_{(i,j) \in D} p_{ij}.$

• Continuous
  $g(u, v) = f(x, y) \left| J\!\left( \frac{x, y}{u, v} \right) \right|,$
  where
  $J\!\left( \frac{x, y}{u, v} \right) = \frac{\partial(x, y)}{\partial(u, v)} = \begin{vmatrix} \frac{\partial x}{\partial u} & \frac{\partial y}{\partial u} \\ \frac{\partial x}{\partial v} & \frac{\partial y}{\partial v} \end{vmatrix} \neq 0.$

3.4.1 Z = X + Y

• Discrete (X and Y independent):
  $P_Z(z_k) = \sum_i P_X(x_i) P_Y(z_k - x_i) = \sum_j P_X(z_k - y_j) P_Y(y_j).$

• Continuous
  $F_Z(z) = P(Z \leqslant z) = \iint_{x+y \leqslant z} f(x, y)\,dx\,dy.$

  When X and Y are independent:
  $f_Z(z) = \int_{\Omega_y} f_X(z - y) f_Y(y)\,dy = \int_{\Omega_x} f_X(x) f_Y(z - x)\,dx,$
  which is called the convolution formula.
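A numerical sketch of the convolution formula (assuming two independent Uniform(0,1) variables, a standard textbook case): discretizing the densities and convolving them reproduces the triangular density of Z = X + Y.

```python
import numpy as np

dx = 0.001
x = np.arange(0, 1, dx)
fX = np.ones_like(x)           # f_X = 1 on [0, 1]
fY = np.ones_like(x)           # f_Y = 1 on [0, 1]

fZ = np.convolve(fX, fY) * dx  # f_Z(z) = ∫ f_X(z - y) f_Y(y) dy
z = np.arange(len(fZ)) * dx

# Exact result is triangular: f_Z(z) = z on [0,1] and 2 - z on [1,2].
print(np.interp(0.5, z, fZ), np.interp(1.0, z, fZ))  # ≈ 0.5, ≈ 1.0
```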


3.4.2 Z = X − Y

• $f_Z(z) = \int f(z + y, y)\,dy.$

3.4.3 Z = XY

• $f_Z(z) = \int f\!\left(\frac{z}{y}, y\right) \frac{1}{|y|}\,dy = \int_0^{+\infty} \frac{1}{y} f\!\left(\frac{z}{y}, y\right) dy - \int_{-\infty}^{0} \frac{1}{y} f\!\left(\frac{z}{y}, y\right) dy.$

3.4.4 Z = X/Y

• $f_Z(z) = \int |y| f(zy, y)\,dy.$

3.4.5 Z = X^2 + Y^2

• $f_Z(z) = \frac{1}{2} \int_0^{2\pi} f(\sqrt{z}\cos\varphi, \sqrt{z}\sin\varphi)\,d\varphi.$

3.5 Multidimensional random variables
Chapter 4

Probability distributions

4.1 Bernoulli distribution

• The Bernoulli distribution is a discrete distribution having two possible outcomes, labeled X = 1 (success) and X = 0 (failure):
  $P(X = 1) = p, \quad P(X = 0) = 1 - p.$

• Expectation
  $E(X) = p.$

• Variance
  $V(X) = p(1 - p).$

• Binomial distribution

  – Let
    $r = X_1 + X_2 + \cdots + X_n.$

  – $B(r; n, p) = \binom{n}{r} p^r (1 - p)^{n-r}, \quad r = 0, 1, \cdots, n.$

    $F(x; n, p) = \sum_{r=0}^{x} B(r; n, p), \quad x = 0, 1, \cdots, n.$

    $\mu \equiv E(r) = np, \quad V(r) = np(1 - p).$

    $\gamma_1 = \frac{1 - 2p}{[np(1 - p)]^{1/2}}, \quad \gamma_2 = \frac{1 - 6p(1 - p)}{np(1 - p)}.$

4.2 Multinomial distribution

The sure event decomposes into l mutually exclusive outcomes,
$S = A_1 + A_2 + \cdots + A_l,$
with
$P(A_j) = p_j, \quad j = 1, 2, \cdots, l.$

$M(r; n, p) = \frac{n!}{r_1! r_2! \cdots r_l!} p_1^{r_1} p_2^{r_2} \cdots p_l^{r_l}.$

• Expectation
  $E(r_j) = np_j, \quad j = 1, 2, \cdots, l.$

• Variance
  $V(r_j) = np_j(1 - p_j).$

• Covariance
  $\operatorname{cov}(r_i, r_j) = -np_i p_j, \quad i \neq j.$

4.3 Poisson distribution

$P(r; \mu) = \frac{1}{r!} \mu^r e^{-\mu}, \quad r = 0, 1, 2, \cdots.$

• Probability-generating function
  $G(Z) \equiv E(Z^r) = \sum_{r=0}^{\infty} Z^r \frac{1}{r!} \mu^r e^{-\mu} = \sum_{r=0}^{\infty} \frac{1}{r!} (\mu Z)^r e^{-\mu} = e^{\mu(Z-1)}.$

  $E(r) = G'(1) = \mu,$
  $V(r) = G''(1) + G'(1) - [G'(1)]^2 = \mu,$
  namely,
  $E(r) = V(r) = \mu.$

• Characteristic function
  $\varphi(t) = e^{\mu(e^{it} - 1)}.$

  $\gamma_1 = \frac{1}{\sqrt{\mu}}, \quad \gamma_2 = \frac{1}{\mu},$

  $\lambda_{k+1} = \mu \left( \lambda_k + \frac{d\lambda_k}{d\mu} \right).$

• Poisson's theorem: when
  $n \to \infty, \quad np = \mu \text{ fixed},$
  the binomial distribution tends towards the Poisson distribution.
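A quick numerical sketch of the theorem (the parameter values are illustrative): with np = µ held fixed, the binomial pmf approaches the Poisson pmf as n grows.

```python
import numpy as np
from scipy.stats import binom, poisson

mu = 3.0
r = np.arange(11)
for n in (10, 100, 1000):
    p = mu / n  # keep np = mu fixed
    err = np.max(np.abs(binom.pmf(r, n, p) - poisson.pmf(r, mu)))
    print(f"n={n:5d}: max |B(r;n,p) - P(r;mu)| = {err:.5f}")
```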

4.4 Compound Poisson distribution

$r = \sum_{i=1}^{n} r_i.$

4.5 Geometric distribution

$g(r; p) = p(1 - p)^{r-1}, \quad r = 1, 2, \cdots.$

• Expectation
  $E(r) = p^{-1}.$

• Variance
  $V(r) = (1 - p)/p^2.$

  $\gamma_1 = (2 - p)/(1 - p)^{1/2},$
  $\gamma_2 = (p^2 - 6p + 6)/(1 - p).$

• Probability-generating function
  $G(Z) = \frac{pZ}{1 - (1 - p)Z}.$

4.6 Pascal distribution

• $P_k(r; p) = \binom{r-1}{k-1} p^k (1 - p)^{r-k}, \quad 0 \leqslant p \leqslant 1, \quad r = k, k+1, \cdots.$

• Expectation
  $E(r) = k/p.$

• Variance
  $V(r) = k(1 - p)/p^2.$

• $\gamma_1 = (2 - p)/\sqrt{k(1 - p)},$
  $\gamma_2 = (p^2 - 6p + 6)/(k(1 - p)).$

• Probability-generating function
  $G(Z) = \left( \frac{pZ}{1 - (1 - p)Z} \right)^k.$

• When k = 1, the Pascal distribution reduces to the geometric distribution.

4.7 Hypergeometric distribution

$P(r; N, n, a) = \binom{a}{r} \binom{N-a}{n-r} \bigg/ \binom{N}{n}, \quad r = 0, 1, 2, \cdots, \min(a, n).$

• Expectation
  $E(r) = \frac{na}{N}.$

• Variance
  $V(r) = \frac{N - n}{N - 1} \cdot \frac{na}{N} \left( 1 - \frac{a}{N} \right).$

4.8 Uniform distribution

• $f(x) = \begin{cases} \frac{1}{b-a}, & a \leqslant x \leqslant b, \\ 0, & \text{else}, \end{cases}$

  $F(x) = \begin{cases} 0, & x < a, \\ \frac{x-a}{b-a}, & a \leqslant x \leqslant b, \\ 1, & x > b. \end{cases}$

• Expectation
  $E(X) = \int_a^b x f(x)\,dx = \frac{a + b}{2}.$

• Variance
  $V(X) = \int_a^b [x - E(X)]^2 f(x)\,dx = \frac{(b - a)^2}{12}.$

  $\gamma_1 = 0, \quad \gamma_2 = -1.2.$

• Characteristic function
  $\varphi(t) = \frac{e^{itb} - e^{ita}}{it(b - a)}.$

4.9 Exponential distribution

$f(x; \lambda) = \begin{cases} \lambda e^{-\lambda x}, & x \geqslant 0, \\ 0, & x < 0, \end{cases}$

$F(x) = 1 - e^{-\lambda x}.$

• Expectation
  $E(X) = \frac{1}{\lambda}.$

• Variance
  $V(X) = \frac{1}{\lambda^2}.$

  $\gamma_1 = 2, \quad \gamma_2 = 6.$

• Characteristic function
  $\varphi(t) = \left( 1 - \frac{it}{\lambda} \right)^{-1}.$

4.10 Gamma distribution

$f(x; \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}, \quad \alpha, \beta > 0, \quad 0 \leqslant x < \infty.$

• Expectation
  $E(X) = \alpha/\beta.$

• Variance
  $V(X) = \alpha/\beta^2.$

  $\gamma_1 = \frac{2}{\sqrt{\alpha}}, \quad \gamma_2 = \frac{6}{\alpha}.$

• Characteristic function
  $\varphi(t) = (1 - it/\beta)^{-\alpha}.$

4.11 Beta distribution

$f(X) = \frac{\Gamma(m + n)}{\Gamma(m)\Gamma(n)} X^{m-1} (1 - X)^{n-1}, \quad 0 \leqslant X \leqslant 1.$

• Expectation
  $E(X) = \frac{m}{m + n}.$

• Variance
  $V(X) = \frac{mn}{(m + n)^2 (m + n + 1)}.$

• $\gamma_1 = \frac{2(n - m)\sqrt{m + n + 1}}{(m + n + 2)\sqrt{mn}},$

  $\gamma_2 = \frac{3(m + n + 1)\left[ 2(m + n)^2 + mn(m + n - 6) \right]}{mn(m + n + 2)(m + n + 3)} - 3.$

• Characteristic function
  $\varphi(t) = \frac{\Gamma(m + n)}{\Gamma(m)} \sum_{k=0}^{\infty} \frac{\Gamma(m + k)\,(it)^k}{\Gamma(m + n + k)\,\Gamma(k + 1)}.$

4.12 Normal distribution

$f(x) \equiv N(\mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty,$

$F(x) = \frac{1}{\sqrt{2\pi}\sigma} \int_{-\infty}^{x} e^{-\frac{(t-\mu)^2}{2\sigma^2}}\,dt,$

$E(X) = \mu, \quad V(X) = \sigma^2, \quad \gamma_1 = \gamma_2 = 0.$

4.13 Cauchy distribution

$f(x) = \frac{1}{\pi} \cdot \frac{1}{1 + x^2}, \quad -\infty < x < \infty,$

$\varphi(t) = e^{-|t|}.$

4.14 Landau distribution

4.15 χ² distribution

$f(y; n) = \frac{1}{\Gamma\!\left(\frac{n}{2}\right) 2^{n/2}} y^{\frac{n}{2}-1} e^{-\frac{y}{2}}, \quad y \geqslant 0,$
with n degrees of freedom.

• $F(y; n) = \frac{1}{\Gamma\!\left(\frac{n}{2}\right) 2^{n/2}} \int_0^y u^{\frac{n}{2}-1} e^{-\frac{u}{2}}\,du.$

• Characteristic function
  $\varphi(t) = (1 - 2it)^{-n/2},$
  so
  $E(Y) = (-i) \left. \frac{d\varphi(t)}{dt} \right|_{t=0} = (-i)\left(-\frac{n}{2}\right)(-2i) = n,$
  $V(Y) = E(Y^2) - [E(Y)]^2 = n^2 + 2n - n^2 = 2n.$

• $\gamma_1 = \frac{\mu_3}{(\mu_2)^{3/2}} = 2\sqrt{\frac{2}{n}}, \quad \gamma_2 = \frac{\mu_4}{\mu_2^2} - 3 = \frac{12}{n}.$
4.16 t distribution

• $f(t; n) = \frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\!\left(\frac{n}{2}\right)} \left( 1 + \frac{t^2}{n} \right)^{-\frac{n+1}{2}}, \quad -\infty < t < \infty.$

4.17 F distribution
Chapter 5

Law of large numbers & central limit theorem

5.1 Law of large numbers

• Convergence in probability:
  $\lim_{n \to \infty} P\{|X_n - a| \geqslant \varepsilon\} = 0,$
  or
  $\lim_{n \to \infty} P\{|X_n - a| < \varepsilon\} = 1.$

5.1.1 Chebyshev's law of large numbers

• $\lim_{n \to \infty} P\left\{ \left| \frac{1}{n}\sum_{i=1}^{n} X_i - \mu \right| < \varepsilon \right\} = 1.$

• Generally,
  $P\left\{ \left| \frac{1}{n}\sum_{i=1}^{n} X_i - \frac{1}{n}\sum_{i=1}^{n} \mu_i \right| < \varepsilon \right\} \geqslant 1 - \frac{V\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)}{\varepsilon^2} \geqslant 1 - \frac{C}{n\varepsilon^2}.$

5.1.2 Khinchin's law of large numbers

• $\lim_{n \to \infty} P\left\{ \left| \frac{1}{n}\sum_{i=1}^{n} X_i - \mu \right| < \varepsilon \right\} = 1.$

5.1.3 Bernoulli's law of large numbers

• $\lim_{n \to \infty} P\left\{ \left| \frac{m}{n} - p \right| < \varepsilon \right\} = 1.$

5.1.4 Poisson's law of large numbers

• $\lim_{n \to \infty} P\left\{ \left| \frac{m}{n} - \frac{p_1 + p_2 + \cdots + p_n}{n} \right| < \varepsilon \right\} = 1.$

5.2 Central limit theorem

5.2.1 Independent and identically distributed

• When $X_1, X_2, \cdots$ are independent and identically distributed (iid), they have the same expectation and variance.

• $Y = \frac{\sum_{j=1}^{n} X_j - n\mu}{\sqrt{n}\,\sigma}$
  satisfies
  $\lim_{n \to \infty} F(y) = \lim_{n \to \infty} P(Y \leqslant y) = \int_{-\infty}^{y} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\,dt.$

5.2.2 Lyapunov theorem

• When $X_1, X_2, \cdots$ are independent with finite expectations and variances, write
  $B_n^2 = \sum_{i=1}^{n} \sigma_i^2.$

  If for some δ > 0
  $\lim_{n \to \infty} \frac{1}{B_n^{2+\delta}} \sum_{i=1}^{n} E|X_i - \mu_i|^{2+\delta} = 0,$
  then
  $Y = \frac{\sum_{i=1}^{n} X_i - \sum_{i=1}^{n} \mu_i}{B_n}$
  satisfies
  $\lim_{n \to \infty} F(y) = \lim_{n \to \infty} P\left( \frac{\sum_{i=1}^{n} X_i - \sum_{i=1}^{n} \mu_i}{B_n} \leqslant y \right) = \int_{-\infty}^{y} \frac{1}{\sqrt{2\pi}} e^{-\frac{t^2}{2}}\,dt.$

5.2.3 De Moivre–Laplace theorem
Chapter 6

Samples and their distribution

6.1 Random samples and their distribution function

$F(x_1, x_2, \cdots, x_n) = \prod_{i=1}^{n} F(x_i),$
$f(x_1, x_2, \cdots, x_n) = \prod_{i=1}^{n} f(x_i).$

• Sort the $x_i$ in order of size:
  $x_1^* \leqslant x_2^* \leqslant \cdots \leqslant x_n^*.$

  Then we have the empirical distribution function
  $F_n(x) = \begin{cases} 0, & x < x_1^*, \\ k/n, & x_k^* \leqslant x < x_{k+1}^*, \quad k = 1, \cdots, n-1, \\ 1, & x \geqslant x_n^*. \end{cases}$

• Define
  $D_n = \max_{-\infty < x < \infty} |F_n(x) - F(x)|.$

  The Glivenko theorem states that
  $P\left\{ \lim_{n \to \infty} D_n = 0 \right\} = 1.$
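A short sketch of the empirical distribution function and D_n (assuming a standard normal population, an arbitrary choice), illustrating Glivenko's theorem numerically:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
for n in (100, 1_000, 10_000):
    x = np.sort(rng.normal(size=n))     # ordered sample x*_1 <= ... <= x*_n
    F = norm.cdf(x)                     # true CDF at the order statistics
    Fn_right = np.arange(1, n + 1) / n  # F_n at each jump
    Fn_left = np.arange(0, n) / n       # F_n just before each jump
    Dn = max(np.abs(Fn_right - F).max(), np.abs(Fn_left - F).max())
    print(f"n={n:6d}: D_n = {Dn:.4f}")  # D_n -> 0 as n grows
```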

6.2 Statistics

6.2.1 Order statistics

• $X_1^{(n)}, X_2^{(n)}, \cdots, X_n^{(n)}$ are the order statistics of $X_1, X_2, \cdots, X_n$, with the observed values sorted in order of size:
  $x_1^* \leqslant x_2^* \leqslant \cdots \leqslant x_n^*.$

6.2.2 Sample mean

$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i.$

6.2.3 Sample variance

• $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{1}{n-1} \left( \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \right).$

6.2.4 The k-th order origin moment of the sample

$A_k = \frac{1}{n} \sum_{i=1}^{n} X_i^k, \quad k = 1, 2, \cdots.$

6.2.5 The k-th order central moment of the sample

$M_k = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^k, \quad k = 1, 2, \cdots.$

$M_1 = 0,$
$M_2 = A_2 - A_1^2,$
$M_3 = A_3 - 3A_2 A_1 + 2A_1^3,$
$M_4 = A_4 - 4A_3 A_1 + 6A_2 A_1^2 - 3A_1^4.$

6.2.6 Sample skewness

$g_1 = \frac{M_3}{(M_2)^{3/2}}.$

6.2.7 Sample kurtosis

$g_2 = \frac{M_4}{M_2^2} - 3.$

6.2.8 Sample covariance

$S_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \frac{1}{n-1} \left( \sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y} \right).$

6.2.9 Sample correlation coefficient

• $r = \frac{S_{XY}}{S_X S_Y} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\left[ \sum_{i=1}^{n} (X_i - \bar{X})^2 \right]^{1/2} \left[ \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \right]^{1/2}} = \frac{\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}}{\left( \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \right)^{1/2} \left( \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2 \right)^{1/2}}.$
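A minimal sketch computing these statistics with NumPy (the data are synthetic, generated only for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=2.0, scale=1.5, size=1000)
y = 0.8 * x + rng.normal(size=1000)

xbar = x.mean()                            # sample mean
S2 = np.sum((x - xbar)**2) / (len(x) - 1)  # sample variance S^2
M2 = np.mean((x - xbar)**2)                # central moments M_k
M3 = np.mean((x - xbar)**3)
M4 = np.mean((x - xbar)**4)
g1 = M3 / M2**1.5                          # sample skewness
g2 = M4 / M2**2 - 3                        # sample kurtosis
r = np.corrcoef(x, y)[0, 1]                # sample correlation coefficient
print(xbar, S2, g1, g2, r)
```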

6.3 Statistics and their numerical characteristics

$E(\bar{X}) = E(X),$
$V(\bar{X}) = \frac{1}{n} V(X).$

• For multiple dimensions, $\{X_1, \cdots, X_k\}$ has observed values
  $(x_{11}, \cdots, x_{1k}), (x_{21}, \cdots, x_{2k}), \cdots, (x_{n1}, \cdots, x_{nk}),$
  or, translated into matrix form,
  $X_{n \times k} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1k} \\ x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix} = \begin{pmatrix} x_{(1)}^T \\ x_{(2)}^T \\ \vdots \\ x_{(n)}^T \end{pmatrix} = (x_1, x_2, \cdots, x_k).$

• Mean of $x_j$:
  $\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}, \quad j = 1, 2, \cdots, k.$

• Variance of $x_j$:
  $s_j^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2, \quad j = 1, 2, \cdots, k.$

• Covariance of $x_j$ and $x_l$:
  $s_{jl} = \frac{1}{n-1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{il} - \bar{x}_l), \quad j, l = 1, 2, \cdots, k,$
  written in matrix form:
  $S = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1k} \\ s_{21} & s_{22} & \cdots & s_{2k} \\ \vdots & \vdots & & \vdots \\ s_{k1} & s_{k2} & \cdots & s_{kk} \end{pmatrix}.$

• Correlation coefficient:
  $r_{jl} = \frac{s_{jl}}{s_j s_l}, \quad j, l = 1, 2, \cdots, k,$
  written in matrix form:
  $R = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1k} \\ r_{21} & 1 & \cdots & r_{2k} \\ \vdots & \vdots & & \vdots \\ r_{k1} & r_{k2} & \cdots & 1 \end{pmatrix}.$

6.4 Sample distribution

6.5 Bayes formula

• Marginal probability
  $P(A_i) = \sum_{j=1}^{n} P(A_i \cap B_j),$
  while
  $\sum_i P(A_i) = 1 = \sum_j P(B_j).$

• Total probability
  $P(A) = \sum_{j=1}^{n} P(A|B_j) \cdot P(B_j).$

• Bayes formula
  $P(B_i|A) = \frac{P(A|B_i) P(B_i)}{\sum_{j=1}^{n} P(A|B_j) P(B_j)} = \frac{P(A|B_i) P(B_i)}{P(A)}.$

• Bayes problem
  The prior probability, in Bayesian statistics, is the probability of an event before new data are collected, denoted by
  $P(B_i).$
  The posterior probability is the revised or updated probability of the event after taking the new information into account, denoted by
  $P(B_i|A).$
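A small worked sketch of the formula (the diagnostic-test numbers below are invented for illustration):

```python
# Hypothetical rare-condition test: B1 = "has condition", B2 = "does not".
P_B = [0.01, 0.99]          # priors P(B_i)
P_A_given_B = [0.95, 0.05]  # likelihoods P(A|B_i), with A = "test positive"

# Total probability: P(A) = sum_j P(A|B_j) P(B_j)
P_A = sum(l * p for l, p in zip(P_A_given_B, P_B))

# Bayes formula: posterior P(B1|A) = P(A|B1) P(B1) / P(A)
posterior = P_A_given_B[0] * P_B[0] / P_A
print(P_A, posterior)  # 0.059, ≈ 0.161
```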

Chapter 7

Parameter estimation

7.1 Estimators and the likelihood function

• ϑ is a parameter of the variable X, and T is an estimator of ϑ. The estimated value is
  $T = T(X_1, X_2, \cdots, X_n).$
  An estimator should be unbiased, efficient, and consistent with ϑ; the best estimator has the least variance and is sufficient.

  $L = L(X_1, X_2, \cdots, X_n | \vartheta) = \prod_{i=1}^{n} f(X_i | \vartheta) \equiv L(X|\vartheta),$
  the joint probability density of $X_1, X_2, \cdots, X_n$ regarded as a function of ϑ, is called the likelihood function.

7.1.1 Consistency

• T converges in probability to ϑ: for n large enough,
  $P\left( \left| \hat{\vartheta}_n - \vartheta \right| > \varepsilon \right) < \eta.$

• For example, the sample mean is a consistent estimator of the expectation.

7.1.2 Unbiasedness

• When n is finite, we need a further criterion. If the expected value of T is equal to ϑ,
  $E(T) = \vartheta,$
  then T is an unbiased estimator of ϑ.

  $E(T) = \vartheta + b(\vartheta), \quad b \neq 0,$
  is a biased estimator. A proper estimator should satisfy
  $b(\vartheta) \sim 1/n^k, \quad k > 1,$
  so that
  $\lim_{n \to \infty} E(T) = \vartheta;$
  such an estimator is called asymptotically unbiased.

7.1.3 Efficiency and the least variance

• The estimator variance is bounded below by
  $V(T) \geqslant \frac{\left( \frac{\partial\tau}{\partial\vartheta} + \frac{\partial b}{\partial\vartheta} \right)^2}{E\left\{ -\frac{\partial^2 \ln L}{\partial\vartheta^2} \right\}},$
  denoted by MVB (minimum variance bound), where
  $E(T) = \tau(\vartheta) + b(\vartheta),$
  and L is the joint probability density, namely
  $\int \cdots \int L\,dX = 1.$

• Since
  $E\left\{ -\frac{\partial^2 \ln L}{\partial\vartheta^2} \right\} = E\left\{ A(\vartheta) \cdot \left( \frac{\partial\tau}{\partial\vartheta} + \frac{\partial b}{\partial\vartheta} \right) \right\} = A(\vartheta) \left( \frac{\partial\tau}{\partial\vartheta} + \frac{\partial b}{\partial\vartheta} \right),$
  the least variance can be written as
  $\mathrm{MVB} \equiv \min[V(T)] = \frac{\frac{\partial\tau}{\partial\vartheta} + \frac{\partial b}{\partial\vartheta}}{A(\vartheta)},$
  where
  $\frac{\partial \ln L}{\partial\vartheta} = A(\vartheta)\left[ T - \tau(\vartheta) - b(\vartheta) \right].$
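As a worked sketch (the Poisson case is a standard example, added here for concreteness): for X_i ~ Poisson(ϑ) with T = X̄, τ(ϑ) = ϑ and b = 0, one finds ∂ln L/∂ϑ = (n/ϑ)(X̄ − ϑ), i.e. A(ϑ) = n/ϑ, so MVB = ϑ/n, which V(X̄) = ϑ/n attains. A symbolic check:

```python
import sympy as sp

theta, n, xbar = sp.symbols('theta n xbar', positive=True)

# log-likelihood of n iid Poisson(theta) observations, dropping theta-free terms:
# ln L = n*xbar*ln(theta) - n*theta + const
lnL = n * xbar * sp.log(theta) - n * theta

score = sp.diff(lnL, theta)              # (n/theta)*(xbar - theta)
A = sp.simplify(score / (xbar - theta))  # A(theta) = n/theta
MVB = sp.simplify(1 / A)                 # tau' = 1, b = 0  =>  MVB = theta/n
print(A, MVB)
```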

7.1.4 Sufficiency

• If the estimator $T(X_1, X_2, \cdots, X_n)$ uses all the information in $X_1, X_2, \cdots, X_n$, then T is a sufficient estimator of ϑ. Namely,
  $f(X_1, X_2, \cdots, X_n | T)$
  is independent of ϑ.

• Fisher–Neyman theorem: sufficiency ⇔
  $L(X_1, X_2, \cdots, X_n | \vartheta) = \prod_{i=1}^{n} f(X_i; \vartheta) = G(T|\vartheta) H(X_1, X_2, \cdots, X_n).$

• Darmois principle: when
  $f(x; \vartheta) = \exp\left[ \alpha(x) a(\vartheta) + \beta(x) + c(\vartheta) \right],$
  there is a sufficient estimator.

7.2 Interval estimation

If
$\gamma = P(\vartheta_a \leqslant \vartheta \leqslant \vartheta_b),$
then $[\vartheta_a, \vartheta_b]$ is called the confidence interval of ϑ with probability γ, and γ is called the confidence level.

7.2.1 Pivot variable method

• $\gamma = \int_{t_a}^{t_b} g(t)\,dt,$
  where t is a one-to-one function of ϑ:
  $t_a = t(\vartheta_a), \quad t_b = t(\vartheta_b).$

7.2.2 Large sample method

7.3 Confidence interval for the normal distribution
Chapter 8

Maximum likelihood method

8.1 The maximum likelihood principle

$L(x|\vartheta)\,dx = \prod_{i=1}^{n} f(x_i|\vartheta)\,dx_i.$
This is a function of ϑ.

• The maximum likelihood principle: according to this principle, we should use the $\hat{\vartheta}$ at which the likelihood function attains its maximum:
  $L(x|\hat{\vartheta}) \geqslant L(x|\vartheta).$

  $\hat{\vartheta} = \hat{\vartheta}(x_1, \cdots, x_n)$
  is called the maximum likelihood estimator.

• $\left. \frac{\partial}{\partial\vartheta} \ln L(X|\vartheta) \right|_{\vartheta=\hat{\vartheta}} = \left. \frac{\partial}{\partial\vartheta} \sum_{i=1}^{n} \ln f(X_i|\vartheta) \right|_{\vartheta=\hat{\vartheta}} = 0,$
  $\left. \frac{\partial^2}{\partial\vartheta^2} \ln L(X|\vartheta) \right|_{\vartheta=\hat{\vartheta}} = \left. \frac{\partial^2}{\partial\vartheta^2} \sum_{i=1}^{n} \ln f(X_i|\vartheta) \right|_{\vartheta=\hat{\vartheta}} < 0;$
  the first of these is called the likelihood equation.

8.2 Maximum likelihood estimation of normal population parameters

8.2.1 Maximum likelihood estimation of the mean µ

• When every observed value has the same σ:
  $L(X; \sigma | \mu) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\sigma} \exp\left[ -\frac{1}{2} \left( \frac{X_i - \mu}{\sigma} \right)^2 \right],$
  then
  $\frac{\partial \ln L}{\partial\mu} = \frac{\partial}{\partial\mu} \sum_{i=1}^{n} \left[ -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{1}{2}\left( \frac{X_i - \mu}{\sigma} \right)^2 \right] = 0,$
  and we have
  $\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} X_i = \bar{X}.$
  The variance is
  $V(\hat{\mu}) = \frac{\sigma^2}{n}.$

• When the observed values have errors $\sigma_i$ respectively:
  $\hat{\mu} = \frac{\sum_{i=1}^{n} X_i/\sigma_i^2}{\sum_{i=1}^{n} 1/\sigma_i^2} = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i},$
  with variance
  $V(\hat{\mu}) = \frac{1}{\sum_{i=1}^{n} 1/\sigma_i^2} = \frac{1}{\sum_{i=1}^{n} w_i}.$
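A minimal sketch of the weighted mean (the measurements and errors σ_i below are made up for illustration):

```python
import numpy as np

# Hypothetical measurements of the same quantity with different errors sigma_i.
x = np.array([10.2, 9.8, 10.5, 10.0])
sigma = np.array([0.1, 0.2, 0.5, 0.1])

w = 1.0 / sigma**2                  # weights w_i = 1/sigma_i^2
mu_hat = np.sum(w * x) / np.sum(w)  # weighted ML estimate of mu
err = np.sqrt(1.0 / np.sum(w))      # sqrt(V(mu_hat))
print(f"mu = {mu_hat:.3f} +- {err:.3f}")
```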

8.2.2 Maximum likelihood estimation of the variance σ²

$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)^2,$

$V(\hat{\sigma}^2) = \frac{2\sigma^4}{n}.$

8.2.3 Maximum likelihood estimation of both µ and σ²

$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} X_i = \bar{X},$

$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2.$

$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{n}{n-1} \hat{\sigma}^2,$
so the ML estimator $\hat{\sigma}^2$ is biased, and the sample variance $S^2$ is its unbiased correction.
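A simulation sketch of that bias (the parameters are arbitrary): averaged over many samples, σ̂² approaches (n−1)/n · σ² while S² centers on σ².

```python
import numpy as np

rng = np.random.default_rng(4)
n, trials, sigma2 = 5, 100_000, 4.0
x = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))

sig2_ml = np.mean((x - x.mean(axis=1, keepdims=True))**2, axis=1)  # sigma_hat^2
S2 = sig2_ml * n / (n - 1)                                         # S^2

print(sig2_ml.mean(), (n - 1) / n * sigma2)  # biased:   ≈ 3.2 vs 3.2
print(S2.mean(), sigma2)                     # unbiased: ≈ 4.0 vs 4.0
```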

Chapter 9

Least squares method

Chapter 10

Moments method

Chapter 11

Interval estimation

Chapter 12

Chapter 13

Chapter 14

Chapter 15

Chapter 16

Chapter 17
