Data Analysis
Zixu Wang
3.5 Multidimensional random variables
4 Probability distribution
4.1 Bernoulli distribution
4.2 Multinomial distribution
4.3 Poisson distribution
4.4 Compound Poisson distribution
4.5 Geometric distribution
4.6 Pascal distribution
4.7 Hypergeometric distribution
4.8 Uniform distribution
4.9 Exponential distribution
4.10 Gamma distribution
4.11 Beta distribution
4.12 Normal distribution
4.13 Cauchy distribution
4.14 Landau distribution
4.15 χ² Distribution
4.16 T Distribution
4.17 F Distribution
6.5 Bayes Formula
7 Parameter estimation
7.1 Estimate and likelihood function
7.1.1 Consistency
7.1.2 Unbiasedness
7.1.3 Effectiveness and the least variance
7.1.4 Sufficiency
7.2 Interval estimation
7.2.1 Pivot variable method
7.2.2 Large sample method
7.3 Confidence interval for normal distribution
10 Moments method
11 Interval estimation
Chapter 1
Preliminary probability theory
1.1 Probability
• Statistical probability
P(A) = \lim_{N→∞} \frac{n}{N}.
• Classical probability
P(A) = k/n.
• Geometric probability
P(A) = \frac{ω_A}{Ω},
–
P(S) = 1.
–
P(A_1 ∪ A_2 ∪ ··· ∪ A_n) = \sum_{k=1}^{n} P(A_k),
for pairwise disjoint events A_k.
–
P(A_1 ∪ A_2 ∪ ··· ∪ A_n ∪ ···) = \sum_{k=1}^{∞} P(A_k),
for a countable sequence of pairwise disjoint events.
1.3 Properties of probability
•
P(A) + P(Ā) = 1.
•
P(∅) = 0.
•
P(A) ⩾ P(B),
if A ⊃ B.
•
\sum_i P(A_i) = 1,
if {A_i} is a division (partition) of S.
•
P(A − B) = P(A) − P(B),
if A ⊃ B.
•
P(A ∪ B) = P(A) + P(B) − P(AB).
• Generalization: define
S_1 = \sum_{i=1}^{n} P(A_i),
S_2 = \sum_{i<j} P(A_i A_j),
S_3 = \sum_{i<j<k} P(A_i A_j A_k),
and so on. Then
P\left(\bigcup_{i=1}^{n} A_i\right) ≡ P(A_1 ∪ A_2 ∪ ··· ∪ A_n) = S_1 − S_2 + S_3 − ··· + (−1)^{n−1} S_n,
which is the addition theorem of probability.
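As a quick numerical check of the addition theorem, here is a minimal Python sketch (the die sample space and the three events are an illustrative choice, not from the text) comparing the inclusion-exclusion sum with a direct computation:

from itertools import combinations

# Equiprobable sample space: one roll of a fair die.
S = set(range(1, 7))
events = [{1, 2, 3}, {2, 4, 6}, {3, 6}]  # hypothetical events A1, A2, A3

def prob(e):
    return len(e) / len(S)

# Direct probability of the union.
direct = prob(set().union(*events))

# S_m = sum of P over all m-fold intersections, combined with sign (-1)^(m-1).
incl_excl = sum(
    (-1) ** (m - 1)
    * sum(prob(set.intersection(*c)) for c in combinations(events, m))
    for m in range(1, len(events) + 1)
)

print(direct, incl_excl)  # both print 0.8333...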
1.4 Conditional probability
•
P(B|A) = \frac{P(AB)}{P(A)},
or
P(A|B) = \frac{P(AB)}{P(B)}.
• From these definitions we can get:
–
P(A|B) = 0,
if AB = ∅.
–
P(A|B) = 1 − P(Ā|B).
1.4.2 Independence
•
P (AB) = P (A) · P (B)
Chapter 2
Random variables and their distribution
•
F(x_2) − F(x_1) = P(x_1 < X ⩽ x_2) ⩾ 0.
•
F(x_min) = 0,  F(x_max) = 1.
•
f(x) ⩾ 0.
•
\int_Ω f(x) dx = F(x_max) = 1,
which is called the normalization of the probability density.
•
\int_{x_1}^{x_2} f(x) dx = F(x_2) − F(x_1).
•
f(x) = F′(x).
2.2 Distribution of a function of a random variable
Y is a transformation of X, namely
Y = Y (X).
2.2.1 Discrete
•
p(y_j) = q_j = \sum_{y(x_i) = y_j} p(X = x_i).
•
p(y_i) = p(x_i) = p(x(y_i)),
if the map from X to Y is one-to-one.
2.2.2 Continuous
•
g(y) = \sum_{i=1}^{k} f(x_i(y)) \left|\frac{dx_i(y)}{dy}\right|.
•
g(y) = f(x(y)) \left|\frac{dx(y)}{dy}\right|,
if the transformation is one-to-one.
or
E(Y) = E\{g(X)\} = \int_Ω g(x) f(x) dx.
• Median
P(X ⩽ x_p) ⩾ p,  P(X ⩾ x_p) ⩾ 1 − p,  0 < p < 1;
the quantile x_p with p = 0.5 is the median.
• Most probable value
p(x_pro) = max{p(x_1), p(x_2), ···}
or
f(x_pro) = \max_{x∈Ω} f(x).
2.3.1 Moments
• The l-th order moment of X with respect to C is denoted by
α_l ≡ E\{(X − C)^l\}.
• Relationship:
µ_n = \sum_{k=0}^{n} \binom{n}{k} λ_{n−k} (−µ)^k,
λ_n = \sum_{k=0}^{n} \binom{n}{k} µ_{n−k} µ^k.
•
µ_0 = 1,  µ_1 = 0,  µ_2 = σ².
•
λ_0 = 1,  λ_1 = µ,  λ_2 = σ² + µ².
• Variance is defined as µ_2 and can be calculated by
V(X) = E(X²) − µ².
• Skewness
γ_1 ≡ \frac{µ_3}{µ_2^{3/2}} = \frac{E\{(X − µ)³\}}{σ³}.
• Kurtosis
γ_2 ≡ \frac{µ_4}{µ_2²} − 3 = \frac{E\{(X − µ)⁴\}}{σ⁴} − 3.
For a complex random variable
Z = X + iY,
E(Z) = E(X) + iE(Y).
• The characteristic function of a random variable is defined as the Fourier transform of the probability density function:
φ_X(t) ≡ E(e^{itX}) = \sum_k p(x_k) e^{itx_k} = \int_{−∞}^{∞} e^{itx} f(x) dx.
•
f(x) = \frac{1}{2π} \int_{−∞}^{∞} φ_X(t) e^{−ixt} dt;
•
F(b) − F(a) = \frac{1}{2π} \int_{−∞}^{∞} \frac{i}{t}\left(e^{−itb} − e^{−ita}\right) φ_X(t) dt.
2.4.1 Properties
•
φ(0) = E(e⁰) = 1.
•
|φ(t)| ⩽ 1.
•
φ_Y(t) = e^{ibt} φ_X(at)
is the characteristic function of Y = aX + b.
•
φ_Z(t) = φ_X(t) · φ_Y(t)
is the characteristic function of Z = X + Y when X and Y are independent.
• The moments follow from derivatives at t = 0:
λ_n = E(X^n) = i^{−n} φ_X^{(n)}(0) = i^{−n} \left.\frac{d^n φ_X(t)}{dt^n}\right|_{t=0}.
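To see the moment relation in action, here is a small sympy sketch (using the Poisson characteristic function φ(t) = exp(µ(e^{it} − 1)) from Chapter 4 as the test case) that recovers λ_1 and λ_2 by differentiating at t = 0:

import sympy as sp

t, mu = sp.symbols('t mu', positive=True)
phi = sp.exp(mu * (sp.exp(sp.I * t) - 1))  # Poisson characteristic function

# lambda_n = i^(-n) * d^n(phi)/dt^n evaluated at t = 0
lam1 = sp.simplify(sp.I**-1 * sp.diff(phi, t, 1).subs(t, 0))
lam2 = sp.simplify(sp.I**-2 * sp.diff(phi, t, 2).subs(t, 0))

print(lam1)                         # mu
print(lam2)                         # mu**2 + mu
print(sp.simplify(lam2 - lam1**2))  # the variance: mu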
2.5 Probability-generating function for discrete variables
•
φ_X(t) = \sum_k p(x_k) e^{itx_k} = E(e^{itX});
with
Z = e^{it}
we have
G(Z) ≡ E(Z^X) = \sum_k p(x_k) Z^{x_k}.
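As a quick check of this definition, a minimal sympy sketch (assuming the geometric generating function G(Z) = pZ/(1 − (1 − p)Z) that appears in Chapter 4) confirming the normalization G(1) = 1 and the mean E(r) = G′(1):

import sympy as sp

Z, p = sp.symbols('Z p', positive=True)
G = p * Z / (1 - (1 - p) * Z)  # PGF of the geometric distribution

print(sp.simplify(G.subs(Z, 1)))              # 1   (normalization)
print(sp.simplify(sp.diff(G, Z).subs(Z, 1)))  # 1/p (the mean)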
Chapter 3
Multidimensional random variables and their distribution
3.1 Two-dimension
•
F(x, y) = P(X ⩽ x, Y ⩽ y)
is the distribution function of (X, Y).
• Define
P(X = x_i, Y = y_j) = p_{ij},  i, j = 1, 2, ···,
then
F(x, y) = \sum_{x_i ⩽ x,\, y_j ⩽ y} p_{ij}.
Similarly,
F(x, y) = \int_{y_{min}}^{y} \int_{x_{min}}^{x} f(u, v) du dv,
with
f(x, y) = \frac{∂²F(x, y)}{∂x∂y}.
•
F_X(x) = F(x, y_max),  F_Y(y) = F(x_max, y).
• Independence:
F(x, y) = F_X(x) · F_Y(y)
or
f(x, y) = f_X(x) · f_Y(y).
3.2 Conditional probability distribution
• Discrete
P(X = x_i | Y = y_j) = \frac{P(X = x_i, Y = y_j)}{P(Y = y_j)} = \frac{p_{ij}}{p_j},  i = 1, 2, ···.
• Continuous
f(x|y) = \frac{f(x, y)}{f_Y(y)}.
• When H = X,
E(X) = \int_{Ω_y} \int_{Ω_x} x f(x, y) dx dy.
Therefore,
E(aX + bY) = \int_{Ω_y} \int_{Ω_x} (ax + by) f(x, y) dx dy = aE(X) + bE(Y).
• Given H(X, Y) = X^l Y^m,
λ_{lm} = E(X^l Y^m).
3.3.1 Covariance
• Define covariance
cov(X, Y) ≡ µ_{11} = E\{[X − E(X)][Y − E(Y)]\}
= E(XY) − E(X)E(Y) − E(X)E(Y) + E(X)E(Y)
= E(XY) − E(X)E(Y).
–
cov(X, Y) = cov(Y, X).
–
cov(aX, bY) = ab\,cov(X, Y).
–
cov(X_1 + X_2, Y) = cov(X_1, Y) + cov(X_2, Y).
•
V(aX + bY) = a²V(X) + b²V(Y) + 2ab\,cov(X, Y).
• Correlation coefficient
ρ_{XY} ≡ \frac{cov(X, Y)}{σ(X)σ(Y)}.
–
|ρ_{XY}| ⩽ 1.
–
|ρ_{XY}| = 1
if and only if Y = aX + b.
• Schwarz inequality,
where
V_{ij} = cov(X_i, X_j).
• Discrete
F(u, v) = P(U ⩽ u, V ⩽ v) = \sum_{(i,j)∈D} p_{ij}.
• Continuous
g(u, v) = f(x, y) \left|J\!\left(\frac{x, y}{u, v}\right)\right|,
where the Jacobian
J\!\left(\frac{x, y}{u, v}\right) = \frac{∂(x, y)}{∂(u, v)} = \begin{vmatrix} ∂x/∂u & ∂y/∂u \\ ∂x/∂v & ∂y/∂v \end{vmatrix} ≠ 0.
3.4.1 Z = X + Y
• Discrete
P_Z(z_k) = \sum_i P_X(x_i) P_Y(z_k − x_i) = \sum_j P_X(z_k − y_j) P_Y(y_j).
• Continuous
F_Z(z) = P(Z ⩽ z) = \iint_{x+y⩽z} f(x, y) dx dy.
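In the discrete case the sum rule above is an ordinary convolution of the two probability vectors, which numpy carries out directly; a minimal sketch (two fair dice are an illustrative choice):

import numpy as np

# PMFs of two independent fair dice X and Y on the values 1..6.
p_x = np.full(6, 1 / 6)
p_y = np.full(6, 1 / 6)

# P_Z(z) = sum_i P_X(x_i) P_Y(z - x_i): a discrete convolution.
p_z = np.convolve(p_x, p_y)  # supports the sums 2..12

for z, q in enumerate(p_z, start=2):
    print(z, round(q, 4))
print(p_z.sum())  # 1.0, as a normalization check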
3.4.2 Z = X − Y
•
f_Z(z) = \int f(z + y, y) dy.
3.4.3 Z = XY
•
f_Z(z) = \int f\!\left(\frac{z}{y}, y\right) \frac{1}{|y|} dy = \int_{0}^{+∞} \frac{1}{y} f\!\left(\frac{z}{y}, y\right) dy − \int_{−∞}^{0} \frac{1}{y} f\!\left(\frac{z}{y}, y\right) dy.
3.4.4 Z = X/Y
•
f_Z(z) = \int |y| f(zy, y) dy.
3.4.5 Z = X² + Y²
•
f_Z(z) = \frac{1}{2} \int_{0}^{2π} f(\sqrt{z}\cos φ, \sqrt{z}\sin φ) dφ.
Chapter 4
Probability distribution
4.1 Bernoulli distribution
• Value of expectation
E(X) = p.
• Variance
V(X) = p(1 − p).
• Binomial distribution
– Let
r = X_1 + X_2 + ··· + X_n.
–
B(r; n, p) = \binom{n}{r} p^r (1 − p)^{n−r},  r = 0, 1, ···, n,
F(x; n, p) = \sum_{r=0}^{x} B(r; n, p),  x = 0, 1, ···, n.
–
µ ≡ E(r) = np,  V(r) = np(1 − p),
γ_1 = \frac{1 − 2p}{[np(1 − p)]^{1/2}},  γ_2 = \frac{1 − 6p(1 − p)}{np(1 − p)}.
4.2 Multinomial distribution
•
E = A_1 + A_2 + ··· + A_l,
and
P(A_j) = p_j,  j = 1, 2, ···, l.
•
M(r; n, p) = \frac{n!}{r_1! r_2! ··· r_l!} p_1^{r_1} p_2^{r_2} ··· p_l^{r_l}.
• Value of expectation
E(r_j) = np_j,  j = 1, 2, ···, l.
• Variance
V(r_j) = np_j(1 − p_j).
• Covariance
cov(r_i, r_j) = −np_i p_j,  i ≠ j.
4.3 Poisson distribution
•
E(r) = G′(1) = µ,
V(r) = G″(1) + G′(1) − [G′(1)]² = µ,
namely,
E(r) = V(r) = µ.
• Characteristic function
φ(t) = e^{µ(e^{it} − 1)}.
•
γ_1 = \frac{1}{\sqrt{µ}},  γ_2 = \frac{1}{µ},
λ_{k+1} = µ\left(λ_k + \frac{dλ_k}{dµ}\right).
• Poisson's theorem: when n → ∞ with np = µ held fixed, the binomial distribution B(r; n, p) tends to the Poisson distribution with mean µ.
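A numerical illustration of Poisson's theorem using only the standard library: hold µ = np fixed, let n grow, and compare the binomial pmf at one point with e^{−µ}µ^r/r! (the values of µ and r are arbitrary):

from math import comb, exp, factorial

mu, r = 3.0, 2  # compare P(r = 2) for mu = 3
poisson = exp(-mu) * mu**r / factorial(r)

for n in (10, 100, 1000, 10000):
    p = mu / n
    binom = comb(n, r) * p**r * (1 - p)**(n - r)
    print(n, binom)       # approaches the Poisson value

print('limit:', poisson)  # ~0.2240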
4.5 Geometric distribution
• Value of expectation
E(r) = p^{−1}.
• Variance
V(r) = (1 − p)/p².
•
γ_1 = \frac{2 − p}{(1 − p)^{1/2}},
γ_2 = \frac{p² − 6p + 6}{1 − p}.
• Probability-generating function
G(Z) = \frac{pZ}{1 − (1 − p)Z}.
4.6 Pascal distribution
• Value of expectation
E(r) = k/p.
• Variance
V(r) = k(1 − p)/p².
•
γ_1 = \frac{2 − p}{\sqrt{k(1 − p)}},
γ_2 = \frac{p² − 6p + 6}{k(1 − p)}.
• Probability-generating function
G(Z) = \left[\frac{pZ}{1 − (1 − p)Z}\right]^k.
4.7 Hypergeometric distribution
• Value of expectation
E(r) = \frac{na}{N}.
• Variance
V(r) = \frac{N − n}{N − 1} · \frac{na}{N}\left(1 − \frac{a}{N}\right).
4.8 Uniform distribution
• Value of expectation
E(X) = \int_a^b x f(x) dx = \frac{a + b}{2}.
• Variance
V(X) = \int_a^b [x − E(X)]² f(x) dx = \frac{(b − a)²}{12}.
•
γ_1 = 0,  γ_2 = −1.2.
• Characteristic function
φ(t) = \frac{e^{itb} − e^{ita}}{it(b − a)}.
4.10 Gamma distribution
•
f(x; α, β) = \frac{β^α}{Γ(α)} x^{α−1} e^{−βx},  α, β > 0,  0 ⩽ x < ∞.
• Value of expectation
E(X) = α/β.
• Variance
V(X) = α/β².
•
γ_1 = \frac{2}{\sqrt{α}},  γ_2 = \frac{6}{α}.
• Characteristic function
φ(t) = (1 − it/β)^{−α}.
4.11 Beta distribution
• Variance
V(X) = \frac{mn}{(m + n)²(m + n + 1)}.
•
γ_1 = \frac{2(n − m)\sqrt{m + n + 1}}{(m + n + 2)\sqrt{mn}},
γ_2 = \frac{3(m + n + 1)\left[2(m + n)² + mn(m + n − 6)\right]}{mn(m + n + 2)(m + n + 3)} − 3.
• Characteristic function
φ(t) = \frac{Γ(m + n)}{Γ(m)} \sum_{k=0}^{∞} \frac{Γ(m + k)(it)^k}{Γ(m + n + k)Γ(k + 1)}.
4.12 Normal distribution
•
E(X) = µ,  V(X) = σ²,  γ_1 = γ_2 = 0.
4.14 Landau distribution
4.15 χ² Distribution
•
f(y; n) = \frac{1}{Γ(n/2)\, 2^{n/2}} y^{n/2 − 1} e^{−y/2},  y ⩾ 0,
with n degrees of freedom.
•
F(y; n) = \frac{1}{Γ(n/2)\, 2^{n/2}} \int_0^y u^{n/2 − 1} e^{−u/2} du.
• Characteristic function
φ(t) = (1 − 2it)^{−n/2}.
•
E(Y) = (−i) \left.\frac{dφ(t)}{dt}\right|_{t=0} = (−i)\left(−\frac{n}{2}\right)(−2i) = n,
V(Y) = E(Y²) − [E(Y)]² = n² + 2n − n² = 2n.
•
γ_1 = \frac{µ_3}{µ_2^{3/2}} = 2\sqrt{\frac{2}{n}},  γ_2 = \frac{µ_4}{µ_2²} − 3 = \frac{12}{n}.
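A Monte Carlo sketch (numpy; the degrees of freedom and the sample size are arbitrary) checking E(Y) = n and V(Y) = 2n by building χ² variates as sums of squared standard normals:

import numpy as np

rng = np.random.default_rng(0)
n, trials = 5, 200_000  # n degrees of freedom

# A chi-square variate with n DOF is a sum of n squared N(0, 1) variates.
y = (rng.standard_normal((trials, n)) ** 2).sum(axis=1)

print(y.mean(), n)     # ~5.0  vs 5
print(y.var(), 2 * n)  # ~10.0 vs 10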
4.16 T Distribution
•
f(t; n) = \frac{Γ\left(\frac{n+1}{2}\right)}{\sqrt{nπ}\, Γ\left(\frac{n}{2}\right)} \left(1 + \frac{t²}{n}\right)^{−\frac{n+1}{2}},  −∞ < t < ∞.
4.17 F Distribution
Chapter 5
\lim_{n→∞} P\{|X_i − a| ⩾ ε\} = 0,
or
\lim_{n→∞} P\{|X_i − a| < ε\} = 1.
• Generally,
P\left\{\left|\frac{1}{n}\sum_{i=1}^{n} X_i − \frac{1}{n}\sum_{i=1}^{n} µ_i\right| < ε\right\} ⩾ 1 − \frac{V\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)}{ε²} ⩾ 1 − \frac{C}{nε²}.
5.1.3 Bernoulli law of large numbers
•
\lim_{n→∞} P\left\{\left|\frac{m}{n} − p\right| < ε\right\} = 1.
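A simulation sketch of the Bernoulli law (numpy; p and the sample sizes are arbitrary choices) showing the relative frequency m/n settling near p:

import numpy as np

rng = np.random.default_rng(1)
p = 0.3  # illustrative success probability

for n in (100, 10_000, 1_000_000):
    m = rng.binomial(n, p)  # number of successes in n trials
    print(n, m / n)         # m/n -> p as n grows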
• Lyapunov condition:
\lim_{n→∞} \frac{1}{B_n^{2+δ}} \sum_{i=1}^{n} E|X_i − µ_i|^{2+δ} = 0.
With
Y = \frac{\sum_{i=1}^{n} X_i − \sum_{i=1}^{n} µ_i}{B_n},
we have
\lim_{n→∞} F(y) = \lim_{n→∞} P\left\{\frac{\sum_{i=1}^{n} X_i − \sum_{i=1}^{n} µ_i}{B_n} ⩽ y\right\} = \int_{−∞}^{y} \frac{1}{\sqrt{2π}} e^{−t²/2} dt.
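A numpy sketch of this convergence (i.i.d. exponential summands are an arbitrary choice; here µ_i = 1 and V(X_i) = 1, so B_n = √n): standardized sums land close to N(0, 1):

import numpy as np

rng = np.random.default_rng(2)
n, trials = 100, 50_000

# Sums of n i.i.d. exponential(1) variates, standardized by B_n = sqrt(n).
x = rng.exponential(1.0, size=(trials, n))
y = (x.sum(axis=1) - n) / np.sqrt(n)

print(y.mean(), y.std())  # ~0, ~1
print((y <= 1.0).mean())  # ~0.84, close to Phi(1) = 0.8413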
Chapter 6
• Define
D_n = \max_{−∞ < x < ∞} |F_n^*(x) − F(x)|.
6.2 Statistics
6.2.1 Order Statistic
• X_1^{(n)}, X_2^{(n)}, ···, X_n^{(n)} is the order statistic of X_1, X_2, ···, X_n, where the observed values are sorted in order of size.
•
M_1 = 0,
M_2 = Λ_2 − Λ_1²,
M_3 = Λ_3 − 3Λ_2Λ_1 + 2Λ_1³,
M_4 = Λ_4 − 4Λ_3Λ_1 + 6Λ_2Λ_1² − 3Λ_1⁴.
r = \frac{\sum_{i=1}^{n} X_i Y_i − n\bar{X}\bar{Y}}{\left(\sum_{i=1}^{n} X_i² − n\bar{X}²\right)^{1/2} \left(\sum_{i=1}^{n} Y_i² − n\bar{Y}²\right)^{1/2}}.
• For multiple dimensions, {X_1, ···, X_k} has observed values x_{ij}, i = 1, ···, n, j = 1, ···, k.
• Mean of x_j
\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij},  j = 1, 2, ···, k.
• Variance of x_j
s_j² = \frac{1}{n − 1}\sum_{i=1}^{n} (x_{ij} − \bar{x}_j)²,  j = 1, 2, ···, k.
• Covariance of x_j and x_l
s_{jl} = \frac{1}{n − 1}\sum_{i=1}^{n} (x_{ij} − \bar{x}_j)(x_{il} − \bar{x}_l),  j, l = 1, 2, ···, k.
• Correlation coefficient
r_{jl} = \frac{s_{jl}}{s_j s_l},  j, l = 1, 2, ···, k,
written in the form of a matrix:
R = \begin{pmatrix} 1 & r_{12} & ··· & r_{1k} \\ r_{21} & 1 & ··· & r_{2k} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ r_{k1} & r_{k2} & ··· & 1 \end{pmatrix}.
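These sample quantities match numpy's cov and corrcoef (which use the same 1/(n − 1) convention); a small sketch on synthetic data:

import numpy as np

rng = np.random.default_rng(3)
n = 500  # observations of k = 3 variables (synthetic)
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)  # correlated with x1
x3 = rng.normal(size=n)
data = np.column_stack([x1, x2, x3])      # shape (n, k)

S = np.cov(data, rowvar=False)       # s_jl with the 1/(n - 1) factor
R = np.corrcoef(data, rowvar=False)  # r_jl = s_jl / (s_j s_l)

print(S.round(3))
print(R.round(3))  # ones on the diagonal, r_12 near 0.8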
6.5 Bayes Formula
• Marginal probability
P(A_i) = \sum_{j=1}^{n} P(A_i ∩ B_j),
while
\sum_i P(A_i) = 1 = \sum_j P(B_j).
• Total probability
P(A) = \sum_{j=1}^{n} P(A|B_j) · P(B_j).
• Bayes formula
P(B_i|A) = \frac{P(A|B_i) P(B_i)}{\sum_{j=1}^{n} P(A|B_j) P(B_j)}.
• Bayes problem
Prior probability, in Bayesian statistics, is the probability of an event before new data is collected, denoted by
P(B_i).
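A worked numerical instance of the total-probability and Bayes formulas (the disease-test numbers are an illustrative assumption, not from the text):

# Hypothetical priors P(B_j): disease present / absent.
prior = {'disease': 0.01, 'healthy': 0.99}
# Hypothetical likelihoods P(A | B_j) for a positive test A.
like = {'disease': 0.95, 'healthy': 0.05}

# Total probability: P(A) = sum_j P(A | B_j) P(B_j).
p_a = sum(like[b] * prior[b] for b in prior)

# Bayes formula: posterior P(B_i | A).
post = {b: like[b] * prior[b] / p_a for b in prior}

print(p_a)   # 0.059
print(post)  # disease: ~0.161, healthy: ~0.839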
Chapter 7
Parameter estimation
•
L = L(X_1, X_2, ···, X_n|ϑ) = \prod_{i=1}^{n} f(X_i|ϑ) ≡ L(X|ϑ).
7.1.1 Consistency
• T converges in probability to ϑ:
P\left(|\hat{ϑ}_n − ϑ| > ε\right) < η.
7.1.2 Unbiasedness
• When n is finite, we need a further criterion. If the expected value of T is equal to ϑ,
E(T) = ϑ,
then T is an unbiased estimate of ϑ.
•
E(T) = ϑ + b(ϑ),  b ≠ 0,
is a biased estimate. A proper estimate should satisfy
\lim_{n→∞} E(T) = ϑ.
7.1.3 Effectiveness and the least variance
• Since
E\left(−\frac{∂² \ln L}{∂ϑ²}\right) = E\left[A(ϑ)\left(\frac{∂τ}{∂ϑ} + \frac{∂b}{∂ϑ}\right)\right] = A(ϑ)\left(\frac{∂τ}{∂ϑ} + \frac{∂b}{∂ϑ}\right),
the least variance can be written as
MVB ≡ \min[V(T)] = \frac{∂τ/∂ϑ + ∂b/∂ϑ}{A(ϑ)},
where
\frac{∂ \ln L}{∂ϑ} = A(ϑ)\left[T − τ(ϑ) − b(ϑ)\right].
7.1.4 Sufficiency
• If the estimator T(X_1, X_2, ···, X_n) uses all the information in X_1, X_2, ···, X_n, then T is a sufficient estimator of ϑ. Namely,
f(X_1, X_2, ···, X_n|T)
does not depend on ϑ.
• Fisher-Neyman theorem: sufficiency ⇔
L(X_1, X_2, ···, X_n|ϑ) = \prod_{i=1}^{n} f(X_i; ϑ) = G(T|ϑ) H(X_1, X_2, ···, X_n),
t_a = t(ϑ_a),  t_b = t(ϑ_b).
Chapter 8
Maximum likelihood method
The likelihood L(x|ϑ) is a function of ϑ.
• The maximum likelihood principle. According to this principle, we should use the ϑ̂ at which the likelihood function attains its maximum:
L(x|\hat{ϑ}) ⩾ L(x|ϑ).
•
ϑ̂ = ϑ̂(x1 , · · · , xn )
is called the maximum likelihood estimator.
•
\left.\frac{∂}{∂ϑ} \ln L(X|ϑ)\right|_{ϑ=\hat{ϑ}} = \left.\sum_{i=1}^{n} \frac{∂}{∂ϑ} \ln f(X_i|ϑ)\right|_{ϑ=\hat{ϑ}} = 0,
\left.\frac{∂²}{∂ϑ²} \ln L(X|ϑ)\right|_{ϑ=\hat{ϑ}} = \left.\sum_{i=1}^{n} \frac{∂²}{∂ϑ²} \ln f(X_i|ϑ)\right|_{ϑ=\hat{ϑ}} < 0.
The first equation is called the likelihood equation.
8.2 Maximum likelihood estimation of normal population parameters
8.2.1 Maximum likelihood estimation of mean µ
• When every observed value has the same σ:
L(X; σ|µ) = \prod_{i=1}^{n} \frac{1}{\sqrt{2π}\,σ} \exp\left[−\frac{1}{2}\left(\frac{X_i − µ}{σ}\right)²\right],
then
\frac{∂ \ln L}{∂µ} = \frac{∂}{∂µ} \sum_{i=1}^{n} \left[−\frac{1}{2}\ln(2πσ²) − \frac{1}{2}\left(\frac{X_i − µ}{σ}\right)²\right] = 0,
and we have
\hat{µ} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}.
The variance is
V(\hat{µ}) = \frac{σ²}{n}.
• When the observed values have errors σ_i respectively:
\hat{µ} = \frac{\sum_{i=1}^{n} X_i/σ_i²}{\sum_{i=1}^{n} 1/σ_i²} = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i},
with variance
V(\hat{µ}) = \frac{1}{\sum_{i=1}^{n} 1/σ_i²} = \frac{1}{\sum_{i=1}^{n} w_i}.
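A small numpy sketch of this weighted mean; the measurements and their errors are made-up numbers:

import numpy as np

# Hypothetical measurements X_i with individual errors sigma_i.
x = np.array([10.2, 9.8, 10.5, 10.0])
sigma = np.array([0.3, 0.2, 0.5, 0.2])

w = 1.0 / sigma**2                  # weights w_i = 1/sigma_i^2
mu_hat = np.sum(w * x) / np.sum(w)  # weighted mean
var_mu = 1.0 / np.sum(w)            # V(mu_hat) = 1 / sum w_i

print(mu_hat, np.sqrt(var_mu))      # estimate and its standard error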
8.2.2 Maximum likelihood estimation of the variance σ²
•
\hat{σ}² = \frac{1}{n}\sum_{i=1}^{n} (X_i − \bar{X})²,
with
V(\hat{σ}²) = \frac{2σ⁴}{n}.
• The unbiased estimate is
S² = \frac{1}{n − 1}\sum_{i=1}^{n} (X_i − \bar{X})² = \frac{n}{n − 1}\hat{σ}².
Chapter 9
Chapter 10
Moments method
Chapter 11
Interval estimation
Chapter 12
Chapter 13
Chapter 14
Chapter 15
Chapter 16
Chapter 17