Notes PDF
John Garza
September 5, 2019
Contents
Preface
1 What is Statistics?
1.1 Introduction
1.2 Using R in this Course
2 Basic Probability
2.1 A Review of Basic Set Theory
2.2 Experiments and Probability
2.3 Counting
2.4 Conditional Probability and Independent Events
2.5 The Additive and Multiplicative Laws
2.6 Bayes' Rule
2.7 Random Variables and Random Samples
Bibliography
Index
Preface
These notes have been prepared for the Statistics course at UT Permian Basin. The notes
are intended to be used with the matching lecture videos provided in the course's Canvas
pages. The goal of these notes is to provide a simplified summary of the content of
the required course textbook. These notes have been prepared using LaTeX, and R code has
been woven into the files using http://yihui.name/knitr/.
John Garza
September 5, 2019
Chapter 1
What is Statistics?
1.1 Introduction
Population
The complete set of observations of interest.
Sample
A subset of observations selected from the population.
Statistic
A number computed from a sample. The sample mean, sample variance, and sample standard
deviation below are examples.
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

$$s = \sqrt{s^2}$$
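As a quick numeric check (an added sketch, not part of the original notes), the following R code computes these three statistics for a small made-up data vector and compares them with R's built-in mean(), var(), and sd():

x <- c(4.1, 5.6, 3.8, 7.2, 5.0)
n <- length(x)
x_bar <- sum(x) / n                    # sample mean
s2 <- sum((x - x_bar) ^ 2) / (n - 1)   # sample variance
s <- sqrt(s2)                          # sample standard deviation
c(x_bar, mean(x))                      # the two values agree
c(s2, var(x))
c(s, sd(x))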
1.2 Using R in this Course
Introduction
We will be using R, a free data-analysis program available at https://www.r-project.
org/. After downloading R, install RStudio, available at https://www.rstudio.com/home/, on
your computer. Set up a folder named ACTS131 on your computer; this folder will contain
all of your files for the course. As a student in this course you will have free access to all
premium content at www.datacamp.com.
Chapter 2
Basic Probability
2.1 A Review of Basic Set Theory
Set Notation
The elements of a set are not ordered. A set is determined only by its elements. For
example, if A = {x, y, z} then it is also true that A = {y, z, x}.
In order for set theory to be consistent, we suppose that there is a largest possible
set. This set is called the universal set and is often denoted by a capital S.
There is a unique empty set, which is denoted ∅. The empty set is also called
the null set.
For a set A that contains a finite number of elements, we denote the size or cardinality
of A by either N(A) or |A|. If the cardinality of a set is zero, then the set is
the empty set.
Subsets
If A and B are sets, then A is a subset of B if and only if every element of A is also
an element of B. This is denoted by A ⊆ B.
Proper Subsets
If A ⊆ B and A ≠ B, then A is a proper subset of B. This is denoted by A ⊂ B.
Equality of Sets
Two sets A and B are equal if every element of A is an element of B and every
element of B is an element of A. This is denoted by A = B. A = B if and only if
A ⊆ B and B ⊆ A.
Set Operations
The Basic Set Operations
union: $A \cup B = \{x \in S \mid x \in A \text{ or } x \in B\}$
intersection: $A \cap B = \{x \in S \mid x \in A \text{ and } x \in B\}$
difference: $A - B = \{x \in S \mid x \in A \text{ and } x \notin B\}$
complement: $\overline{A} = \{x \in S \mid x \notin A\}$
De Morgan's Laws

$$\overline{A \cap B} = \overline{A} \cup \overline{B} \qquad \overline{A \cup B} = \overline{A} \cap \overline{B}$$
Distributive Laws
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
$$\overline{\overline{A}} = A$$
$$\overline{A \cap B \cap C} = \overline{A} \cup \overline{B} \cup \overline{C} \qquad \overline{A \cup B \cup C} = \overline{A} \cap \overline{B} \cap \overline{C}$$
Set Operations in R
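The code chunk for this section did not survive extraction. The following is a minimal sketch of R's built-in set functions, using small example vectors chosen here for illustration:

A <- c(1, 2, 3, 4)
B <- c(3, 4, 5, 6)
S <- 1:10                 # a universal set for the complement
union(A, B)               # A ∪ B
intersect(A, B)           # A ∩ B
setdiff(A, B)             # A − B
setdiff(S, A)             # the complement of A relative to S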
https://cran.r-project.org/web/packages/VennDiagram/VennDiagram.pdf
https://scriptsandstatistics.wordpress.com/2018/04/26/how-to-plot-venn-diagrams
i) 34 watch CBS.
n12 <- 7
n13 <- 6
n23 <- 5
[Venn diagram: viewer counts for NBC, CBS, and ABC]
2.2 Experiments and Probability
Experiment
An experiment is the process by which an observation is made.
Sample Space
A sample space associated with an experiment is the set of all possible observa-
tions. A sample space will usually be denoted by S. The elements of the sample
space are called sample points.
Sample Point
Event
An event is a subset of the sample space. The elements of an event are called
sample points.
Simple Event
A simple event is an event that cannot be decomposed. Each simple event contains
a unique element, a sample point.
Compound Event
A compound event is an event that can be decomposed into two or more simple events.
Probability Function
A probability function P assigns a number P(A) to each event A so that the following
axioms hold.
Axiom 1: P(A) ≥ 0.
Axiom 2: P(S) = 1.
Axiom 3: If A₁, A₂, … are pairwise mutually exclusive events, then
$P\!\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$.
2.3 Counting
2.3 Counting
Introduction
For discrete sample spaces, probability functions require us to count the number of sample
points in an event or in the sample space. In this section we will review some of the basic
counting definitions and theorems, and we will see how R can be used to compute these
quantities.
Let S be a finite sample space where every simple event has equal probability. Then

$$P(E) = \frac{|E|}{|S|}$$
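For example (an added illustration, not from the original notes), the probability that a five-card poker hand consists entirely of hearts can be computed by counting with choose():

# P(all five cards are hearts) = C(13, 5) / C(52, 5)
choose(13, 5) / choose(52, 5)
[1] 0.0004951981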
$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$$

The Binomial Theorem

$$(x + y)^n = \sum_{i=0}^{n} \binom{n}{i} x^i y^{n-i}$$

Pascal's Formula

$$\binom{n+1}{r} = \binom{n}{r-1} + \binom{n}{r}$$

A Binomial Identity

$$\binom{n}{r} = \binom{n}{n-r}$$
$$\binom{n}{0} = 1 = \binom{n}{n} \qquad \binom{n}{1} = n = \binom{n}{n-1} \qquad \binom{n}{2} = \frac{n(n-1)}{2}$$
Permutations
The number of ordered arrangements of r objects chosen from n distinct objects is

$$P_r^n = \frac{n!}{(n-r)!}$$
The number of ways to select k objects from n distinct objects:

                          Unordered              Ordered
Repetition Allowed        $\binom{n+k-1}{k}$     $n^k$
Repetition not Allowed    $\binom{n}{k}$         $P_k^n$
Multinomial Coefficients
The number of ways of partitioning n distinct objects into k distinct subsets of sizes
n₁, …, n_k, where n₁ + ⋯ + n_k = n, is called a multinomial coefficient.

$$\binom{n}{n_1\, n_2 \cdots n_k} = \frac{n!}{n_1! \cdots n_k!}$$

Multinomial Identity

$$(x_1 + \cdots + x_k)^n = \sum \binom{n}{n_1\, n_2 \cdots n_k} x_1^{n_1} \cdots x_k^{n_k}$$

where the sum runs over all nonnegative integers n₁, …, n_k with n₁ + ⋯ + n_k = n.
Let S be a finite set that is the union of k mutually disjoint sets B₁, …, B_k. Then

$$|S| = |B_1| + \cdots + |B_k|$$

Inclusion/Exclusion Rule
For any finite sets A and B,

$$|A \cup B| = |A| + |B| - |A \cap B|$$
Counting Functions in R
R has several built in functions for counting and we can define our own functions for those
that are not built-in already.
[1] 120
# Defining a permutations function-----
perm <-
# computes the number of ordered subsets of size k from a set of size n
# (fixed: the denominator must be (n - k)!, per the permutation formula)
function(n, k){factorial(n) / factorial(n - k)}
# Example
perm(4, 2)
[1] 12
multinom <-
# computes a multinomial coefficient
function(v){factorial(sum(v)) / prod(factorial(v))}
# Example
multinom(c(1, 2, 3))
[1] 60
#End of Script
2.4 Conditional Probability and Independent Events
Conditional Probability
Let A and B be events and suppose that P (B) > 0. The conditional probability of
an event A given that an event B has occurred, is defined as
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
P (A ∩ B) = P (A|B)P (B)
P (A ∩ B) = P (B|A)P (A)
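As an added sanity check by simulation (not part of the original notes): roll two dice, let A be the event that the sum is 8 and B the event that the first die is even, and compare the empirical conditional probability with the counting answer 3/18.

set.seed(1)
n <- 1e5
d1 <- sample(1:6, size = n, replace = TRUE)
d2 <- sample(1:6, size = n, replace = TRUE)
A <- (d1 + d2 == 8)        # sum is 8
B <- (d1 %% 2 == 0)        # first die is even
mean(A & B) / mean(B)      # empirical P(A|B) = P(A and B) / P(B)
3 / 18                     # exact value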
Let A and B be events such that B ⊂ A, P(B) > 0 and P(A) > 0. Then

$$P(A \mid B) = 1 \qquad\text{and}\qquad P(B \mid A) = \frac{P(B)}{P(A)}$$

$$P(\overline{A} \mid B) = 1 - P(A \mid B)$$
Independent Events
Two events A and B are independent if and only if any of the following hold.

$$P(A \mid B) = P(A) \qquad P(B \mid A) = P(B) \qquad P(A \cap B) = P(A)\,P(B)$$
Dependent Events
Two events are dependent if and only if any of the following hold.

$$P(A \mid B) \neq P(A) \qquad P(B \mid A) \neq P(B) \qquad P(A \cap B) \neq P(A)\,P(B)$$
The following are equivalent; if one of them is true then so are all of the others.
• A and B are independent
• A and $\overline{B}$ are independent
• $\overline{A}$ and B are independent
• $\overline{A}$ and $\overline{B}$ are independent
Events A, B and C are said to be mutually independent if all of the following hold.

$$P(A \cap B) = P(A)\,P(B)$$
$$P(A \cap C) = P(A)\,P(C)$$
$$P(B \cap C) = P(B)\,P(C)$$
$$P(A \cap B \cap C) = P(A)\,P(B)\,P(C)$$
2.5 The Additive and Multiplicative Laws
The Multiplicative Law

$$P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$$

If A and B are independent, then $P(A \cap B) = P(A)\,P(B)$.
The Additive Law

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

Complement Law

$$P(A) + P(\overline{A}) = 1$$
2.6 Bayes' Rule
Partitions
The Law of Total Probability
Let S be a sample space and let B₁, …, B_k be a partition of S such that P(B_j) > 0
for j ∈ {1, …, k}. Then for any event A

$$P(A) = \sum_{j=1}^{k} P(A \mid B_j)\,P(B_j) = \sum_{j=1}^{k} P(A \cap B_j)$$
Bayes' Rule
Let S be a sample space and let B₁, …, B_k be a partition of S such that P(B_j) > 0
for j ∈ {1, …, k}. Then for any event E

$$P(B_j \mid E) = \frac{P(E \mid B_j)\,P(B_j)}{\sum_{i=1}^{k} P(E \mid B_i)\,P(B_i)}$$
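A quick numeric illustration of Bayes' Rule (added here; the numbers are invented for the example): a test is positive with probability 0.99 for those with a condition and 0.05 for those without, and the condition has prevalence 0.01.

p_B  <- c(0.01, 0.99)    # P(B_1) = has condition, P(B_2) = does not
p_EB <- c(0.99, 0.05)    # P(positive | B_j)
(p_EB[1] * p_B[1]) / sum(p_EB * p_B)   # P(B_1 | positive)
[1] 0.1666667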
Elementary Partitions
1. {B, $\overline{B}$} is a partition.
2. $P(A) = P(A \cap B) + P(A \cap \overline{B})$
2.7 Random Variables and Random Samples
Random Variables
A random variable is a real-valued function defined on a sample space.
Random Sample
If a population has size N and if all samples from the population of size n have
an equal probability of being selected, the sampling is said to be random and the
resulting sample is called a random sample.
$$\text{number of random samples of size } n \text{ from a population of size } N = \binom{N}{n}$$
Chapter 3
Discrete Random Variables
Countably Infinite
Countable
Notation
Definition of P (X = x)
The expected value of a random variable can be undefined. In order for the expected
value to be defined, the above sum must be absolutely convergent, that is
$$\sum_{\text{all } x} \Big[\, |x| \times p_X(x) \,\Big] < \infty$$
Variance

$$V[X] = E\big[(X - \mu)^2\big] = E[X^2] - \big(E[X]\big)^2$$
Standard Deviation
The standard deviation of the random variable X is the positive square root of the
variance of X. The standard deviation is sometimes denoted $\sigma_X$.
Linear Transformations
1. $E[aX + b] = aE[X] + b$
2. $E[b] = b$
3. $V[b] = 0$
4. $V[aX + b] = a^2 V[X]$
5. $V[X + b] = V[X]$
6. $V[aX] = a^2 V[X]$
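These identities are easy to spot-check by simulation (an added sketch, not in the original chunk set):

set.seed(7)
X <- rnorm(1e5, mean = 3, sd = 2)
a <- 4; b <- -1
c(mean(a * X + b), a * mean(X) + b)    # E[aX + b] versus a E[X] + b
c(var(a * X + b), a ^ 2 * var(X))      # V[aX + b] versus a^2 V[X]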
3.3 Moments and Moment Generating Functions
Note
Notice that $\mu_2 = \sigma_X^2$ and $\mu_1' = \mu_X$.
Note
The definition requires that ∃ b ∈ R+ such that mX (t) < ∞ for all −b ≤ t ≤ b
Theorem

$$m_X^{(k)}(0) = E\big[X^k\big]$$

Note

$$V[X] = m_X^{(2)}(0) - \Big(m_X^{(1)}(0)\Big)^2$$
Note
mX (0) = E[e0X ]
= E[e0 ]
= E[1]
= 1
Note

$$E[a^X] = E\big[e^{\ln(a^X)}\big] = E\big[e^{X \ln a}\big] = m_X(\ln a)$$
Linear Transformations
Let Y = aX + b. Then
mY (t) = E[etY ]
= E[et(aX+b) ]
= E[eatX etb ]
= etb mX (at)
table of contents 49
MATH 3301 / Notes Section 3.3
table of contents 50
Section 3.4 MATH 3301 / Notes
Introduction
Many random variables can take only the values 0, 1, 2, 3, …. Such variables may be called
counts; examples include geometric, binomial, Poisson, hypergeometric, and negative
binomial variables. The probability generating function is useful for computing the expected
value and variance of these variables.
Let X be a random variable that takes only the values 0, 1, 2, 3, … and define
$p_n = P[X = n]$. The probability generating function of X is

$$P_X(t) = E[t^X] = \sum_{j=0}^{\infty} p_j t^j = p_0 + p_1 t + p_2 t^2 + \cdots$$
Factorial Moments
The kth factorial moment of X is $\mu_{[k]} = E[X(X-1)\cdots(X-k+1)]$. For example,
$\mu_{[1]} = E[X]$.
Theorem

$$\frac{d^k P_X(t)}{dt^k}\bigg|_{t=1} = P_X^{(k)}(1) = \mu_{[k]} = E[X(X-1)\cdots(X-k+1)]$$
Note

$$V[X] = P_X''(1) + P_X'(1) - \Big(P_X'(1)\Big)^2$$
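As an added numeric check: for a Poisson random variable with mean λ the probability generating function is $P_X(t) = e^{\lambda(t-1)}$, so $P_X'(1) = \lambda$ and the formula above should return V[X] = λ. A sketch using finite differences:

lambda <- 3
P  <- function(t){exp(lambda * (t - 1))}          # PGF of a Poisson(lambda)
h  <- 1e-4
P1 <- (P(1 + h) - P(1 - h)) / (2 * h)             # approximates P'(1) = lambda
P2 <- (P(1 + h) - 2 * P(1) + P(1 - h)) / h ^ 2    # approximates P''(1) = lambda^2
c(E = P1, V = P2 + P1 - P1 ^ 2)                   # both approximately 3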
3.5 Discrete Uniform Distributions
Definition
The uniform discrete distribution assigns equal probabilities to a finite set of numbers.
If X is uniform discrete on {1, 2, …, n} then

$$p(x) = \begin{cases} \dfrac{1}{n}, & x \in \{1, \ldots, n\} \\ 0, & \text{otherwise} \end{cases}$$

$$E[X] = \frac{n+1}{2} \qquad V[X] = \frac{n^2 - 1}{12}$$
$$m_X(t) = E[e^{Xt}] = \sum_{\text{all } x} p_x\, e^{tx} = \sum_{j=1}^{n} \frac{1}{n} e^{jt} = \frac{e^t}{n} \cdot \frac{e^{nt} - 1}{e^t - 1}$$
n <- 23
x <- seq(from = 1, to = n, by = 1)
x
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
# Define the probability function
p_x <- rep(1 / n, n)   # (reconstructed: equal probability 1/n for each value; original line lost)
p_x
[1] 0.04347826 0.04347826 0.04347826 0.04347826 0.04347826 0.04347826
[7] 0.04347826 0.04347826 0.04347826 0.04347826 0.04347826 0.04347826
[13] 0.04347826 0.04347826 0.04347826 0.04347826 0.04347826 0.04347826
[19] 0.04347826 0.04347826 0.04347826 0.04347826 0.04347826
# Compute the expected value of X using the formula
E_X <- (n + 1) / 2
E_X
[1] 12
# Compute the Expected value using the definition
sum(x * p_x)
[1] 12
# Compute the variance using the formula
V_x <- (n ^ 2 - 1) / 12
V_x
[1] 44
# Compute the variance directly
E2_X <- sum(x ^ 2 * p_x)   # (reconstructed: E[X^2] from the definition; original line lost)
E2_X - (E_X) ^ 2
[1] 44
3.6 Bernoulli Distributions
Definition
A Bernoulli random variable X takes the value 1 with probability p (a success) and the
value 0 with probability q = 1 − p (a failure).

$$E[X] = p \qquad V[X] = pq \qquad E[X^n] = p \qquad m_X(t) = q + pe^t$$
Bernoulli Trial
3.7 Binomial Distributions
Binomial Experiments
Properties
For a binomial random variable X with n trials and success probability p (q = 1 − p),

$$p(x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, \ldots, n$$

$$E[X] = np \qquad V[X] = npq \qquad m_X(t) = (q + pe^t)^n$$
$$\text{mode} = \begin{cases} \lfloor (n+1)p \rfloor, & (n+1)p \notin \mathbb{Z} \\ (n+1)p \text{ and } (n+1)p - 1, & (n+1)p \in \{1, \ldots, n\} \end{cases}$$
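A quick check of the mode formula (an added sketch): with n = 10 and p = 0.3, (n + 1)p = 3.3 is not an integer, so the mode should be ⌊3.3⌋ = 3.

n <- 10
p <- 0.3
(n + 1) * p                        # 3.3
which.max(dbinom(0:n, n, p)) - 1   # mode = 3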
Note
Note that the argument prob is used to specify the probability of success. In the R
functions above the arguments p and q stand for probability and quantile.
[1] 0.2001209
#X is binomial with p = 0.234 and n = 27. Compute P[X <= 11]
pbinom(11, size = 27, prob = 0.234)
[1] 0.9871378
[1] 0.0009037255
#X is binomial with n = 781 and p = 0.602. Compute the 71st percentile of X
qbinom(0.71, size = 781, prob = 0.602)
[1] 478
[Figure: binomial probability function for x = 0, …, 10]
# Create a histogram for a random sample of 1000 from X
# (reconstructed: the original line creating M did not survive extraction)
M <- data_frame(rb = rbinom(n = 1000, size = 1023, prob = 0.439))
G <-
ggplot(data = M) +
geom_histogram(
mapping = aes(x = rb, y = ..density..),
fill = 'darkgreen',
color = 'darkgreen',
alpha = 0.5,
size = 0.5,
bins = 33) +
labs(
x = 'random binomial numbers',
y = 'relative frequency',
title = 'Random Binomial Numbers, p = 0.439, n = 1023',
subtitle = NULL,
caption = 'Summer 2019')+
theme_classic() +
theme(
text = element_text(size = 16, face = 'italic', family = 'serif', color = 'black'),
axis.ticks = element_blank())
[Figure: Random Binomial Numbers histogram, p = 0.439, n = 1023]
# Create a graph of the empirical distribution function of a large random sample from X
rb <- rbinom(n = 1e6, size = 25, prob = 0.439)
tb <- table(rb)
nb <- sort(unique(rb))
cb <- cumsum(tb) / length(rb)
M <- data_frame(nb = nb, cb = cb)   # (reconstructed: this line did not survive extraction)
G <-
ggplot(data = M) +
geom_col(
mapping = aes(x = nb, y = cb),
fill = 'maroon',
color = 'maroon',
alpha = 0.5,
width = 1,
size = 0.5) +
scale_x_continuous(
breaks = seq(from = min(rb), to = max(rb), by = 2))+
labs(
x = 'random binomial numbers',
y = 'cumulative frequency',
title = 'Random Binomial Numbers, p = 0.439, n =25',
subtitle = NULL,
caption = 'Summer 2019') +
theme_bw() +
theme(
text = element_text(size = 16, face = 'italic', family = 'serif', color = 'black'),
axis.ticks = element_blank())
[Figure: Random Binomial Numbers empirical CDF, p = 0.439, n = 25; caption 'Summer 2019']
3.8 Geometric Distributions
For the random variable X from the geometric distribution with parameter p ∈ (0, 1)
and q = 1 − p
pX (x) = q x−1 p, where x ∈ Z+
$$p_X(x) = q^{x-1} p \qquad p_X(x+1) = q\,p_X(x) \qquad P[X \le x] = 1 - q^x$$

$$E[X] = \frac{1}{p} \qquad V[X] = \frac{1-p}{p^2}$$

$$P_X(t) = \frac{pt}{1 - qt} \qquad m_X(t) = \frac{pe^t}{1 - qe^t} \qquad \text{mode} = 1$$
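Note that R's geometric functions count failures before the first success (the Y parameterization described next) rather than trials. An added sketch checking P[X ≤ x] = 1 − qˣ for the trial-counting version:

p <- 0.3; q <- 1 - p; x <- 5
pgeom(x - 1, prob = p)   # X <= x trials is the same event as at most x - 1 failures
1 - q ^ x                # closed form; both equal 0.83193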
For the random variable Y from the geometric distribution with parameter p ∈ (0, 1),
counting failures before the first success,

$$p_Y(y) = q^y p, \quad y \in \{0, 1, 2, \ldots\} \qquad P[Y \le y] = 1 - q^{y+1}$$

$$E[Y] = \frac{1-p}{p} \qquad V[Y] = \frac{1-p}{p^2}$$

$$P_Y(t) = \frac{p}{1 - qt} \qquad m_Y(t) = \frac{p}{1 - qe^t} \qquad \text{mode} = 0$$
$$\sum_{j=1}^{\infty} r^j = \frac{r}{1-r} \quad \text{for } -1 < r < 1$$

$$\sum_{j=0}^{\infty} r^j \text{ diverges for } |r| \ge 1$$
Example

$$\sum_{j=0}^{\infty} \left(\frac{2}{3}\right)^j = \frac{1}{1 - 2/3} = \frac{1}{1/3} = 3$$
[Figure: a geometric probability function]
3.9 Negative Binomial Distributions
Introduction
1. The experiment consists of identical trials.
2. Each trial results in success or failure, with P(success) = p on every trial.
3. The trials are independent.
4. The random variable X is the number of trials until the rth success.
5. The random variable Y is the number of failures until the rth success.
6. Note that X = Y + r.
$$E[X] = \frac{r}{p} \qquad V[X] = \frac{r(1-p)}{p^2} \qquad m_X(t) = \left(\frac{pe^t}{1 - qe^t}\right)^r$$

Note:
In the equations above, X is the number of trials until the rth success.
$$E[Y] = \frac{r(1-p)}{p} \qquad V[Y] = \frac{r(1-p)}{p^2} \qquad m_Y(t) = \left(\frac{p}{1 - qe^t}\right)^r$$

Note:
In the equations above, Y is the number of failures until the rth success.
[1] 0.109375
# Compute the probability that Y <= 4
pnbinom(4, prob = 0.5, size = 6)
[1] 0.3769531
# Compute the probability that Y>6
pnbinom(6, prob = 0.5, size = 6, lower.tail = FALSE)
[1] 0.387207
# Solve for the smallest value w such that P[Y<=w] > 0.5
qnbinom(0.5, size = 6, prob = 0.5)
[1] 5
# Solve for the largest value w such that P[Y>w] > 0.4
qnbinom(0.4, size = 6, prob = 0.5, lower.tail = FALSE)
[1] 6
[Figure: a negative binomial probability function]
# (reconstructed: the original chunk creating M did not survive extraction)
M <- data_frame(rv = rnbinom(n = 1e4, size = 6, prob = 0.5))
G <-
ggplot(data = M) +
geom_histogram(
mapping = aes(x = rv, y = ..density..),
fill = 'darkorange',
color = 'darkorange',
alpha = 0.5,
breaks = seq(from = -0.5, to = max(M$rv) + 0.5, by = 1)) +
labs(
x = NULL,
y = NULL,
title = 'Random Negative Binomial Numbers, p = 0.5, n = 6',
subtitle = NULL,
caption = 'Summer 2019')+
theme(
text = element_text(size = 16, face = 'italic', family = 'serif', color = 'black'),
axis.ticks = element_blank())
[Figure: Random Negative Binomial Numbers, p = 0.5, n = 6; caption 'Summer 2019']
3.10 Hypergeometric Distributions
Introduction
For a hypergeometric random variable X, counting the successes in k draws without
replacement from a population of N objects of which m are successes,

$$p_X(x) = \frac{\dbinom{m}{x}\dbinom{N-m}{k-x}}{\dbinom{N}{k}}$$

$$E[X] = \frac{mk}{N} \qquad V[X] = k\,\frac{m}{N}\cdot\frac{N-m}{N}\cdot\frac{N-k}{N-1}$$
Identity

Since $\sum_{k=0}^{n} p_X(k) = 1$, we should expect the following identity to be true:

$$\sum_{i=0}^{k} \binom{m}{i}\binom{N-m}{k-i} = \binom{N}{k}$$
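This is Vandermonde's identity, and it is easy to verify numerically (an added sketch with m = 15, N = 32, k = 11):

m <- 15; N <- 32; k <- 11
sum(choose(m, 0:k) * choose(N - m, k - 0:k))
choose(N, k)   # same value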
[1] 0.3916084
# X is a hyper-geometric random variable with parameters m = 53, n = 100
# and k = 77
# Compute P[X <= 20]
phyper(q = 20, m = 53, n = 100, k = 77)
[1] 0.01773732
[1] 0.4269205
# X is a hyper-geometric random variable with m = 15, n = 17, and k = 11
# Compute the smallest value w such that P[X <= w] >= 0.65
qhyper(p = 0.65, m = 15, n = 17, k = 11)
[1] 6
# (reconstructed: the original chunk creating M did not survive extraction;
# parameters assumed from the previous example)
M <- data_frame(x = 0:10, y = dhyper(0:10, m = 15, n = 17, k = 11))
G <-
ggplot(
data = M,
mapping = aes(x = x, y =y)) +
geom_col(
color = 'darkviolet',
fill = 'darkviolet',
alpha = 0.7,
width = 1) +
scale_x_continuous(
breaks = seq(from = 0, to = 10, by = 1))+
theme_classic()+
labs(
x = NULL,
y = NULL,
title = 'A Hypergeometric Probability Distribution',
subtitle = NULL)+
theme(
text = element_text(size = 16, face = 'italic', family = 'serif', color = 'black'),
axis.ticks = element_blank())
[Figure: A Hypergeometric Probability Distribution]
# (reconstructed: the original chunk creating M did not survive extraction; parameters assumed)
M <- data_frame(x = rhyper(nn = 1e4, m = 50, n = 50, k = 50))
G <-
ggplot(
data = M,
mapping = aes(x = x, y = ..density..))+
geom_histogram(
color = 'darkturquoise',
fill = 'darkturquoise',
alpha = 0.7,
binwidth = 1) +
theme_bw() +
labs(
title = 'Random Hyper Geometric Numbers',
subtitle = NULL,
x = NULL,
y = NULL) +
theme(
text = element_text(size = 18, face = 'italic', family = 'serif', color = 'black'),
axis.ticks = element_blank())
[Figure: Random Hyper Geometric Numbers]
3.11 Poisson Distributions
Introduction
The Poisson distribution has a single parameter, often called λ. A Poisson
random variable X can take only the values 0, 1, 2, 3, …. Examples of Poisson
random variables include the number of accidents per day at a factory, the number
of typos per page in a printed book, and the number of customers arriving at a store
in the next hour.
$$p_X(k) = \frac{\lambda^k}{k!} e^{-\lambda} \qquad p_X(k) = p_X(k-1) \times \frac{\lambda}{k}$$

$$E[X] = \lambda \qquad V[X] = \lambda$$

$$m_X(t) = e^{\lambda(e^t - 1)} \qquad P_X(t) = e^{\lambda(t-1)}$$

$$\text{mode} = \begin{cases} \lfloor \lambda \rfloor, & \lambda \notin \mathbb{Z}^+ \\ \lambda \text{ and } \lambda - 1, & \lambda \in \mathbb{Z}^+ \end{cases}$$
$$e^x = \sum_{j=0}^{\infty} \frac{x^j}{j!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$$

The power series expansion is important for problems about Poisson distributions.
Note the following application. Let X be a Poisson random variable with mean λ.
Use the power series expansion of $e^x$ to verify that $\sum_{k=0}^{\infty} P[X = k] = 1$.
$$\sum_{k=0}^{\infty} P[X = k] = \sum_{k=0}^{\infty} e^{-\lambda}\,\frac{\lambda^k}{k!} = e^{-\lambda}\left(\sum_{k=0}^{\infty} \frac{\lambda^k}{k!}\right) = e^{-\lambda}\, e^{\lambda} = e^0 = 1$$
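The same fact can be checked numerically (an added sketch; the series is truncated at k = 100, which is plenty for small λ):

lambda <- 4
sum(dpois(0:100, lambda = lambda))
[1] 1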
Poisson Distributions
[1] 0.2240418
# X is a Poisson random variable with mean 5. Compute P[X <= 6]
ppois(q = 6, lambda = 5)
[1] 0.7621835
# X is a Poisson random variable with mean 4. Compute P[X > 5]
ppois(q = 5, lambda = 4, lower.tail = FALSE)
[1] 0.2148696
# X is a Poisson random variable with mean 2. Compute P[4 <= X <= 7]
ppois(q = 7, lambda = 2) - ppois(q = 3, lambda = 2)
[1] 0.1417798
# X is a Poisson random variable with mean 3.3. Solve for the
# smallest value of w such that P[X < w] > 0.99. Check your answer
qpois(p = 0.99, lambda = 3.3)
[1] 8
ppois(q = 8, lambda = 3.3)
[1] 0.9930882
ppois(q = 7, lambda = 3.3)
[1] 0.9802229
M <-
data_frame(
x = seq(from = 0, to = 20, by = 1),
y = dpois(x, lambda = 5))
G <-
ggplot(
data = M,
mapping = aes(x = x, y = y)) +
geom_col(
color = 'darkgreen',
fill = 'darkgreen',
alpha = 0.4,
width = 1)+
labs(
x = 'Random Variable',
y = NULL,
title = 'Poisson Probability Function',
subtitle = NULL) +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 18,
color = 'black',
face = 'italic',
family = 'serif'))
[Figure: Poisson Probability Function, λ = 5]
[Figure: a Poisson probability function]
# (reconstructed: the lines defining x and F_x did not survive extraction; λ = 5 assumed)
x <- 0:15
F_x <- ppois(x, lambda = 5)
# Plot F_x
M <- data_frame(x, F_x)
G <-
ggplot(
data = M,
mapping = aes(x = x, y = F_x)) +
geom_col(
color = 'darkorange',
fill = 'darkorange',
alpha = 0.6,
width = 1) +
labs(
title = 'A Cumulative Distribution Function / Poisson ',
subtitle = NULL,
x = 'x-values',
y = NULL) +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
color = 'black',
face = 'italic',
family = 'serif'))
[Figure: A Cumulative Distribution Function / Poisson]
# (reconstructed: s_x is the survival function P[X > x]; the original line did not survive)
s_x <- ppois(x, lambda = 5, lower.tail = FALSE)
# Plot s_x
M <- data_frame(x, s_x)
G <-
ggplot(
data = M,
mapping = aes(x = x, y = s_x)) +
geom_col(
color = 'red',
fill = 'red',
alpha = 0.3,
width = 1) +
labs(
title = 'The Survival Function of a Poisson Random Variable',
subtitle = NULL,
x = 'x-values',
y = NULL) +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
color = 'black',
face = 'italic',
family = 'serif'))
[Figure: The Survival Function of a Poisson Random Variable]
[Figure: a Poisson cumulative distribution function]
Chapter 4
Continuous Random Variables
Cumulative Distribution Functions
Let X be a random variable with cumulative distribution function $F_X(x) = P[X \le x]$. Then
1. $F_X(x)$ is non-decreasing
2. $F_X(x)$ is right-continuous
3. $\lim_{x \to -\infty} F_X(x) = 0$
4. $\lim_{x \to +\infty} F_X(x) = 1$
You can work with many CDFs in R. The script below demonstrates the four properties above
using the CDF of a standard normal random variable.
# (reconstructed: the original chunk creating M did not survive extraction)
M <- data_frame(x = seq(from = -4, to = 4, length = 1e3), y = pnorm(x))
G <-
ggplot(
data = M, mapping = aes(x = x, y = y)) +
geom_line(
color = 'red',
size = 1.2,
linetype = 'solid') +
labs(
title = 'A Cumulative Distribution Function',
subtitle = NULL,
x = 'x-values',
y = NULL,
caption = 'Statistics I / UT - Permian Basin') +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
face = 'italic',
family = 'serif',
color = 'black'))
[Figure: A Cumulative Distribution Function; caption 'Statistics I / UT - Permian Basin']
Let X be a random variable with cumulative distribution function $F_X(x)$. X is a continuous
random variable if
1. $F_X'(x)$ exists everywhere except possibly on a finite set in any finite interval
2. $F_X'(x)$ is continuous except possibly on a finite set in any finite interval

The probability density function of X is

$$f_X(x) = F_X'(x)$$
$$F_X(x) = \int_{-\infty}^{x} f(t)\, dt$$
If f(x) is a continuous function on the interval [a, b] and $F(x) = \int_a^x f(t)\,dt$, then

$$F'(x) = \frac{d}{dx}\left[\int_a^x f(t)\, dt\right] = f(x)$$
1. $f_X(x) \ge 0$ for all x

2. $\int_{-\infty}^{+\infty} f_X(x)\, dx = 1$

3. $P[a \le X \le b] = \int_a^b f_X(x)\, dx$

4. $P[a \le X \le b] = F_X(b) - F_X(a)$
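These properties can be checked numerically with integrate() (an added sketch using the standard normal density):

integrate(dnorm, lower = -Inf, upper = Inf)$value   # property 2: total area 1
integrate(dnorm, lower = -1, upper = 2)$value       # property 3
pnorm(2) - pnorm(-1)                                # property 4: same value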
Let X be a continuous random variable with density function fX (x). The mode(s) of X are the
value(s) of X that maximize fX (x)
Let X be a continuous random variable with probability density function $f_X(x)$. The support
of $f_X(x)$ is $\{x \mid f_X(x) \neq 0\}$.
Quantiles
Let X be a random variable with cumulative distribution function $F_X(x)$. For a number
p ∈ (0, 1), the pth quantile of X is the smallest number $\phi_p$ such that

$$P[X \le \phi_p] = F_X(\phi_p) \ge p$$
Percentiles
For a continuous random variable X with cumulative distribution function $F_X(x)$, the 100pth
percentile of X is the smallest number $\phi_p$ satisfying

$$P[X \le \phi_p] = F_X(\phi_p) = p$$
The following script is an example of working with a probability density function using R. The density is a
gamma density which you will learn about later in the chapter.
# (reconstructed: x and y were defined in lines lost to extraction; a shape-2, scale-1
# gamma density is assumed)
x <- seq(from = 0, to = 7, length = 1e3)
y <- dgamma(x, shape = 2, scale = 1)
# Create a data_frame
M <- data_frame(x = x, y = y)
[Figure: a gamma probability density function]
[1] 0.2068576
Let X be a continuous random variable with density function f(x). The expected
value of X is defined as

$$E[X] = \int_{-\infty}^{\infty} x f(x)\, dx$$

Let X be a continuous random variable with density f(x) and let g(X) be a function of X.
The expected value of g(X) is

$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\, dx$$
For a constant c ∈ ℝ,

$$E[c] = \int_{-\infty}^{\infty} c f(x)\, dx = c \int_{-\infty}^{\infty} f(x)\, dx = c \cdot 1 = c$$
1. E[aX] = aE[X]
2. E[aX + b] = aE[X] + b
3. E[ag(X) + b] = aE[g(X)] + b
$$f(x) = \begin{cases} 0.2, & -1 \le x \le 0 \\ 0.2 + 1.2x, & 0 < x \le 1 \\ 0, & \text{otherwise} \end{cases}$$
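The chunk defining f in R did not survive extraction; a direct translation of the piecewise density above, used by the integrate() calls below, is:

f <- function(x){
0.2 * (-1 <= x & x <= 0) + (0.2 + 1.2 * x) * (0 < x & x <= 1)
}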
# Expected Value
EX <- integrate(function(x){x * f(x)}, lower = -Inf, upper = +Inf)$value
EX
[1] 0.4
# Variance
VX <- integrate(function(x){(x - EX) ^ 2 * f(x)}, lower = -Inf, upper = +Inf)$value
VX
[1] 0.2733333
# (reconstructed: v marks the points where dashed segments are drawn; its original
# definition was lost)
v <- c(-0.5, 0.5)
M <-
data_frame(
x = seq(from = -1, to = 1, length = 1e3),
y = f(x))
G <-
ggplot(
data = M,
mapping = aes())+
geom_area(
mapping = aes(x = x, y = y),
fill = 'thistle',
color = 'thistle',
alpha = 0.7,
size = 1.3,
linetype = 'solid') +
ylim(
lower = 0,
upper = 1.5) +
xlim(
lower = -1.5,
upper = +1.5) +
geom_segment(
data = data_frame(x = v, y = f(v)),
mapping = aes(x = x, xend = x, y = 0, yend = y),
color = 'purple',
linetype = 'dashed',
size = 1.3) +
labs(
title = 'A Probability Density Function',
subtitle = NULL,
x = 'x-axis',
y = NULL,
caption = 'Statistics I / Section 4.4') +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
color = 'black',
face = 'italic',
family = 'serif'))
[Figure: A Probability Density Function; caption 'Statistics I / Section 4.4']
$$m_X(t) = E[e^{tX}] = \int_{-\infty}^{\infty} e^{xt} f(x)\, dx$$

$$m_{g(X)}(t) = E[e^{g(X)t}] = \int_{-\infty}^{\infty} e^{g(x)t} f(x)\, dx$$
Notes

$$m_X^{(k)}(0) = E[X^k] \qquad m_X(\ln a) = E[a^X]$$
Theorem

$$m_X^{(k)}(0) = E[X^k]$$

Note

$$V[X] = E[X^2] - \big(E[X]\big)^2 = m_X^{(2)}(0) - \Big(m_X^{(1)}(0)\Big)^2$$
Note
mX (0) = E[e0X ]
= E[e0 ]
= E[1]
= 1
Note

$$E[a^X] = E\big[e^{\ln(a^X)}\big] = E\big[e^{X \ln a}\big] = m_X(\ln a)$$
Linear Transformations
Let Y = aX + b. Then
mY (t) = E[etY ]
= E[et(aX+b) ]
= E[eatX etb ]
= etb mX (at)
Uniform Distributions
Let X be uniform on the interval [a, b]. Then

$$F_X(x) = \begin{cases} 0, & x < a \\ \dfrac{x-a}{b-a}, & a \le x \le b \\ 1, & b < x \end{cases}$$

$$E[X^n] = \frac{b^{n+1} - a^{n+1}}{(n+1)(b-a)} \qquad E[X] = \frac{a+b}{2} \qquad V[X] = \frac{(b-a)^2}{12}$$

$$m_X(t) = \frac{e^{bt} - e^{at}}{(b-a)t}$$
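An added sketch checking the mean and variance formulas by simulation with a = 0 and b = 2:

a <- 0; b <- 2
x <- runif(1e5, min = a, max = b)
c(mean(x), (a + b) / 2)        # E[X] = 1
c(var(x), (b - a) ^ 2 / 12)    # V[X] = 1/3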
# (reconstructed: the original chunk creating M did not survive extraction)
M <- data_frame(
xr = runif(n = 1e4, min = 0, max = 2),
xp = seq(from = 0, to = 2, length = 1e4),
fp = dunif(xp, min = 0, max = 2))
# Plot
G <-
ggplot(
data = M) +
geom_histogram(
mapping = aes(x = xr, y = ..density..),
color = 'purple',
fill = 'thistle',
alpha = 0.9,
breaks = seq(from = 0, to = 2, by = 0.05)) +
geom_line(
mapping = aes(x = xp, y = fp),
color = 'darkgreen',
size = 1.3,
linetype = 'solid') +
scale_x_continuous(
breaks = c(0, 1, 2),
labels = c(0, '', 2),
limits = c(-0.5, 2.5)) +
scale_y_continuous(
limits = c(0, 0.75)) +
theme_classic() +
labs(
title = 'Relative Frequency Histogram / Random Uniform Numbers',
subtitle = NULL,
x = 'x-values',
y = NULL) +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
face = 'italic',
family = 'serif',
color = 'black'))
[Figure: Relative Frequency Histogram / Random Uniform Numbers]
N <- data_frame(
xp = seq(from = 0, to = 2, length = 1e4),
yp = punif(xp, min = 0, max = 2))
# (reconstructed: M holds the empirical CDF of a random uniform sample; original lines lost)
xr <- runif(n = 1e4, min = 0, max = 2)
M <- data_frame(
xc = sort(unique(xr)),
yc = ecdf(xr)(xc))
# Plot
G <-
ggplot(
data = M) +
geom_col(
mapping = aes(x = xc, y = yc),
color = 'blue',
fill = 'powderblue',
alpha = 0.9,
size = 0.5,
linetype = 'solid') +
geom_line(
data = N,
mapping = aes(x = xp, y = yp),
color = 'darkgreen',
size = 1.3,
linetype = 'solid') +
labs(
# (labs arguments reconstructed; the original lines were lost in extraction)
title = 'Empirical CDF / Random Uniform Numbers',
x = 'x-values',
y = NULL)
[Figure: empirical CDF of the random uniform sample]
Gamma Distributions
For a gamma random variable X with shape parameter α > 0 and scale parameter β > 0,

$$f(x) = \begin{cases} \dfrac{x^{\alpha-1} e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}, & 0 \le x \\ 0, & \text{otherwise} \end{cases}$$

$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\, dx$$

$$E[X] = \alpha\beta \qquad V[X] = \alpha\beta^2 \qquad m_X(t) = (1 - \beta t)^{-\alpha}$$
Notation
1. $\Gamma(\alpha + 1) = \alpha\,\Gamma(\alpha)$
2. $\Gamma(n) = (n-1)!$ for a positive integer n
3. $\Gamma(1/2) = \sqrt{\pi}$
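An added numeric check of the mean formula using integrate(), with α = 3 and β = 2:

alpha <- 3; beta <- 2
integrate(function(x){x * dgamma(x, shape = alpha, scale = beta)},
lower = 0, upper = Inf)$value
alpha * beta   # both equal 6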
Let X and Y be independent gamma random variables with the same scale parameter β and
shape parameters α₁ and α₂. Define W = X + Y. Then W is gamma with shape α₁ + α₂ and
scale β.
gamma(3.8)
[1] 4.694174
gamma(5)
[1] 24
# (reconstructed: the original chunk creating M and prob did not survive extraction;
# a shape-2, scale-2 gamma is assumed)
M <- data_frame(
xp = seq(from = 0, to = 15, length = 1e3),
yp = dgamma(xp, shape = 2, scale = 2),
x = seq(from = 2, to = 7, length = 1e3),
y = dgamma(x, shape = 2, scale = 2))
prob <- pgamma(7, shape = 2, scale = 2) - pgamma(2, shape = 2, scale = 2)
G <-
ggplot(
data = M)+
geom_area(
mapping = aes(x = x, y = y),
fill = 'orange',
alpha = 0.3) +
geom_line(
mapping = aes(x = xp, y = yp),
color = 'orange',
size = 1.2,
linetype = 'solid') +
geom_text(
mapping = aes(x = 4.50, y = 0.05),
label = paste('P[2 < X < 7] = ', round(prob, digits = 2)),
size = 5,
color = 'black',
family = 'serif',
fontface = 'italic',
angle = 0) +
labs(
title = 'A Gamma Density Curve',
subtitle = NULL,
x = 'x-values',
y = NULL) +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
face = 'italic',
family = 'serif',
color = 'black'))
[Figure: A Gamma Density Curve, with the region for P[2 < X < 7] shaded]
M <-
expand.grid(
x = seq(from = 0, to = 10, length = 1e3),
shape = c(1, 2, 3),
scale = 2)
[Figure: gamma density curves, shape 1, 2, 3 and scale 2]
4.7 χ2 Distributions
Introduction
A χ2 random variable with ν degrees of freedom is a gamma random variable with
shape parameter α = ν/2 and scale parameter β = 2.

$$E[X] = \nu \qquad V[X] = 2\nu \qquad m_X(t) = (1 - 2t)^{-\nu/2}$$
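An added sketch confirming the gamma connection numerically:

nu <- 4
pchisq(3, df = nu)                     # P[X <= 3]
pgamma(3, shape = nu / 2, scale = 2)   # same value via the gamma CDF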
M <-
expand.grid(
x = seq(from = 0, to = 10, by = 0.01),
df = c(2, 3, 4)) %>%
mutate(y = dchisq(x, df = df))   # (reconstructed: the original mutate line was lost)
G <-
ggplot(
data = M) +
geom_line(
mapping = aes(x = x, y = y, color = as.factor(df)),
size = 1.2,
linetype = 'solid') +
scale_y_continuous(
limits = c(0, 0.75)) +
labs(
title = 'Chi-Squared Density Functions',
subtitle = NULL,
x = 'x-values',
y = NULL,
caption = 'Statistics I / Section 4.7',
color = 'Degrees of Freedom') +
theme(
legend.position = c(0.83, 0.87),
legend.key.width = unit(0.6, units = 'inches'),
axis.ticks = element_blank(),
text = element_text(
size = 16,
color = 'black',
face = 'italic',
family = 'serif'))
[Figure: Chi-Squared Density Functions for 2, 3, and 4 degrees of freedom; caption 'Statistics I / Section 4.7']
Exponential Distributions
An exponential random variable X with mean β > 0 has density $f(x) = \frac{1}{\beta} e^{-x/\beta}$ for x ≥ 0.

$$F_X(x) = 1 - e^{-x/\beta} \qquad s_X(x) = e^{-x/\beta} \qquad P[a \le X \le b] = e^{-a/\beta} - e^{-b/\beta}$$

$$E[X] = \beta \qquad V[X] = \beta^2 \qquad m_X(t) = (1 - \beta t)^{-1}$$
[1] 0.2231302
require(actuar)
# Use the actuar package. X is exponential with mean 0.5, calculate E[X^5]
mexp(order = 5, rate = 2)
[1] 3.75
integrate(function(x){x ^ 5 * dexp(x, rate = 2)}, lower = 0, upper = Inf)$value
[1] 3.75
# Calculate E[min(X,5)^2]
levexp(limit=5, rate = 2, order = 2)
[1] 0.4997503
# (fixed: the integrand must cap x at the limit 5, i.e. use min(x, 5); the result then
# matches levexp above)
integrate(f = function(x){pmin(x, 5) ^ 2 * dexp(x, rate = 2)},
lower = 0,
upper = Inf)$value
[1] 0.4997503
# Calculate E[2^x]
mgfexp(log(2), rate = 2, log = FALSE)
[1] 1.530394
# Calculate E[2^x] using integration
integrate(
function(x){2 ^ x * dexp(x, rate = 2, log = FALSE)},
lower = 0,
upper = Inf)$value
[1] 1.530394
# Create a vector of 1000 random numbers from an exponential distribution with mean 5
# and plot a histogram with 50 bins
M <- data_frame(x = rexp(n = 1000, rate = 1 / 5))   # (reconstructed: original line lost)
G <-
ggplot(
data = M) +
geom_histogram(
mapping = aes(x = x, y = ..density..),
color = 'red',
fill = 'red',
alpha = 0.6,
breaks = seq(from = 0, to = 30, by = 1)) +
labs(
x = 'Random Exponential Numbers',
y = NULL,
title = 'Relative Frequency Histogram / Exponential Numbers',
subtitle = NULL,
caption = 'Statistics I / Section 4.8') +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
color = 'black',
face = 'italic',
family = 'serif'))
[Figure: Relative Frequency Histogram / Exponential Numbers; caption 'Statistics I / Section 4.8']
[1] 0.04978707
integrate(f_cond, lower = a + b, upper = Inf)$value
[1] 0.04978708
# (reconstructed: the original chunk creating M did not survive extraction; three rates assumed)
M <-
expand.grid(
x = seq(from = 0, to = 10, length = 1e3),
rate = c(0.5, 1, 1.5)) %>%
mutate(y = dexp(x, rate = rate))
G <-
ggplot(
data = M) +
geom_line(
mapping = aes(x = x, y = y, color = as.factor(rate)),
size = 1.2) +
scale_color_manual(
name = 'rate',
values = c('darkred', 'darkgreen', 'darkorange'))+
labs(
title = 'Exponential Density Functions',
subtitle = NULL,
x = 'x-values',
y = NULL,
caption = 'Statistics I / Section 4.8') +
theme(
legend.position = c(0.85, 0.85),
axis.ticks = element_blank(),
text = element_text(
size = 14,
face = 'italic',
family = 'serif',
color = 'black'))
[Figure: Exponential Density Functions; caption 'Statistics I / Section 4.8']
Beta Distributions
The beta distribution has two parameters α > 0 and β > 0. The density function is
defined on the closed interval [0, 1]. For a random variable X with beta distribution
and parameters α > 0 and β > 0,

$$f_X(x) = \begin{cases} \dfrac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$$

$$B(\alpha, \beta) = \int_0^1 x^{\alpha-1} (1-x)^{\beta-1}\, dx = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}$$

$$E[X] = \frac{\alpha}{\alpha+\beta} \qquad V[X] = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} \qquad \text{mode} = \frac{\alpha-1}{\alpha+\beta-2}$$
The moment generating function of a beta random variable does not exist in closed form.
To use the beta distribution on an interval [a, b] instead of [0, 1], rescale:

$$y^* = \frac{y - a}{b - a} \quad \text{where } a \le y \le b$$

The new variable y* is defined on the interval [0, 1].
[1] 0.44118
# X is a beta random variable with alpha = 2 and beta = 6
# Find the first quartile of X
qbeta(p = 0.25, shape1 = 2, shape2 = 6)
[1] 0.137974
M <-
expand.grid(
x = seq(from = 0, to = 1, by = 0.01),
alpha = 3,
beta = c(1, 2, 3)) %>%
mutate(y = dbeta(x = x, shape1 = alpha, shape2 = beta))
G <-
ggplot(
data = M) +
geom_line(
mapping = aes(x = x, y = y, col = as.factor(beta)),
size = 1.2,
linetype = 'solid') +
scale_color_manual(
name = 'Beta',
values = topo.colors(3)) +
labs(
title = 'Beta Density Functions',
subtitle = NULL,
x = 'x-values',
y = 'Probability Density',
caption = 'Statistics I / Section 4.9') +
theme(
legend.position = c(0.3, 0.8),
legend.direction = 'horizontal',
axis.ticks = element_blank(),
text = element_text(
size = 16,
color = 'black',
family = 'serif',
face = 'italic'))
[Figure: Beta Density Functions for β = 1, 2, 3; caption 'Statistics I / Section 4.9']
# (reconstructed: the original chunk creating M did not survive extraction; the sample
# matches the dbeta(shape1 = 3, shape2 = 2) curve overlaid below)
M <- data_frame(x = rbeta(n = 1e4, shape1 = 3, shape2 = 2))
G <-
ggplot(
data = M) +
geom_histogram(
mapping = aes(x = x, y = ..density..),
color = 'black',
fill = 'purple',
alpha = 0.2,
breaks = seq(from = 0, to = 1, length = 30)) +
geom_line(
mapping = aes(x = x, y = dbeta(x, shape1 = 3, shape2 = 2)),
color = 'purple',
size = 1.2,
linetype = 'solid') +
labs(
x = 'Random Variable',
y = NULL,
title = 'Relative Frequency Histogram / Beta Numbers',
subtitle = NULL,
caption = 'Statistics I / Section 4.9') +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
color = 'black',
family = 'serif',
face = 'italic'))
[Figure: Relative Frequency Histogram / Beta Numbers; caption 'Statistics I / Section 4.9']
Normal Distributions
Introduction
The normal distribution is very important, and there are a lot of basic facts that you
will want to know. Let X be a normal random variable with mean µ and variance
σ²; we symbolize this by X ∼ N(µ, σ²). There is no closed form for the CDF; it
is simply expressed as an integral.

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}, \quad -\infty < x < \infty$$

$$F_X(x) = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(t-\mu)^2/(2\sigma^2)}\, dt$$

$$E[X] = \mu \qquad V[X] = \sigma^2 \qquad \text{mode} = \mu \qquad \text{median} = \mu$$

$$m_X(t) = e^{\mu t + t^2\sigma^2/2}$$
The Standard Normal Distribution
For a standard normal random variable Z,

$$\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt$$

$$E[Z] = 0 \qquad V[Z] = 1 \qquad \text{mode} = 0 \qquad \text{median} = 0$$

$$m_Z(t) = e^{t^2/2}$$
1. Φ(z) + Φ(−z) = 1
Extreme values in a data set or distribution are often referred to as outliers. One
definition of an outlier is a data value whose z-score is greater than +3 or less
than −3.
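An added sketch flagging outliers by z-score with scale():

set.seed(2)
x <- c(rnorm(100), 8)        # one planted extreme value
z <- as.numeric(scale(x))    # z-scores: (x - mean(x)) / sd(x)
x[abs(z) > 3]                # values flagged as outliers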
[1] 0.1586553
# X is a random variable with mean = 5 and sd = 2. Calculate P[X < 6]
pnorm(q = 6, mean = 5, sd = 2, lower.tail = TRUE)
[1] 0.6914625
# Z is a standard normal random variable. Compute the 66th percentile of Z
qnorm(p = 0.66, mean = 0, sd = 1)
[1] 0.4124631
# mu = -3, sigma = 4. Compute the smallest number w such that P[X > w] < 0.33
qnorm(p = 0.33, mean = -3, sd = 4, lower.tail = FALSE)
[1] -1.240347
# (reconstructed: the original chunk creating M did not survive extraction; two standard
# deviations assumed to match the legend)
M <-
expand.grid(
x = seq(from = -6, to = 6, length = 1e3),
sigma = c(1, 2)) %>%
mutate(y = dnorm(x, mean = 0, sd = sigma))
G <-
ggplot(
data = M) +
geom_line(
mapping = aes(x = x, y = y, color = as.factor(sigma)),
size = 1.3,
linetype = 'solid') +
scale_color_manual(
values = c('darkgreen', 'darkorange'),
name = 'Standard Deviation') +
labs(
title = 'Normal Density Functions',
caption = 'Statistics I / Section 4.10',
x = 'Normal Numbers',
y = NULL) +
theme(
legend.position = c(0.8, 0.9),
legend.key.width = unit(1, units = 'inches'),
axis.ticks = element_blank(),
text = element_text(
size = 16,
color = 'black',
face = 'italic',
family = 'serif'))
[Figure: Normal Density Functions; caption 'Statistics I / Section 4.10']
Chapter 5
Multivariate Probability Distributions
5.1 Introduction
In this chapter of the course, joint probability distributions are introduced. The first section
provides an example of using R and heatmaps to visualize a bivariate probability density.
library(tidyverse)
# (reconstructed: the original chunk creating M did not survive extraction; an independent
# bivariate normal density is assumed)
M <-
expand.grid(
x = seq(from = -3, to = 3, length = 200),
y = seq(from = -3, to = 3, length = 200)) %>%
mutate(density = dnorm(x) * dnorm(y))
G <-
ggplot(
data = M,
mapping = aes(x = x, y = y)) +
geom_tile(
mapping = aes(fill = density)) +
labs(
title = 'Bivariate Normal Density Function, John Garza',
subtitle = NULL,
x = 'x-axis',
y = 'y-axis',
caption = 'Statistics I / Section 5.1')+
scale_fill_continuous(
name = 'density',
type = 'viridis') +
theme_classic() +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
face = 'italic',
family = 'serif',
color = 'black'),
legend.position = c(0.86, 0.2))
[Figure: Bivariate Normal Density Function heatmap; caption 'Statistics I / Section 5.1']
ux <- -2.1
uy <- +0.5
sx <- +1.3
sy <- +2.3
rho <- -0.7
low = 'yellow',
high = 'steelblue') +
geom_point(
size = 1,
alpha = 0.2) +
scale_y_continuous(
breaks = c(-4, 0, 4, 8),
labels = c(-4, '', 4, 8)) +
scale_x_continuous(
breaks = c(-6, -4, -2, 0, 2),
labels = c(-6, -4, '', 0, 2)) +
labs(
title = 'Random Bivariate Normal Numbers',
caption = 'Statistics I / Test #3',
x = 'x-axis',
y = 'y-axis') +
theme(
plot.title = element_text(hjust = 0.5),
axis.title.x = element_text(vjust = +5),
axis.title.y = element_text(vjust = -3),
legend.position = c(0.09, 0.15),
text = element_text(
face = 'italic',
family = 'serif',
size = 16,
color = 'black'),
axis.ticks = element_blank())
G
[Figure: Random Bivariate Normal Numbers with density shading; caption 'Statistics I / Test #3']
Let X and Y be discrete random variables. The joint probability function for X
and Y is defined as
p(x, y) = P (X = x, Y = y)
−∞ < x < +∞
−∞ < y < +∞
For discrete random variables X and Y with joint probability function p(x, y),
1. $p(x, y) \ge 0$ for all x, y
2. $\sum_{\text{all } x,\,y} p(x, y) = 1$
Joint Distribution Function
The joint distribution function of X and Y is

$$F(x, y) = P(X \le x,\, Y \le y), \quad -\infty < x < +\infty,\; -\infty < y < +\infty$$
Let X and Y be random variables with joint distribution function F (x, y). Then
1. F (−∞, −∞) = 0
2. F (−∞, y) = 0
3. F (x, −∞) = 0
4. F (+∞, +∞) = 1
5. For x* ≥ x and y* ≥ y,
$$F(x^*, y^*) - F(x^*, y) - F(x, y^*) + F(x, y) = P(x < X \le x^*,\; y < Y \le y^*) \ge 0$$
Let X and Y be continuous random variables with joint distribution function F (x, y).
If there exists a non-negative function f (x, y) satisfying
$$F(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(s, t)\, dt\, ds \quad \text{for all } x \text{ and } y$$
Then X and Y are said to be jointly continuous and f (x, y) is called the joint
probability density function of X and Y .
Let X and Y be jointly continuous random variables density function f (x, y).
1. f (x, y) ≥ 0 ∀ (x, y)
2. $\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(x, y)\, dy\, dx = 1$

3. $P(a < X < b,\; c < Y < d) = \int_a^b \int_c^d f(x, y)\, dy\, dx$
Example
A device runs until either of two components fails, at which point the device stops running.
The joint density function of the lifetimes of the two components, both measured in hours,
is
$$f(x, y) = \begin{cases} \dfrac{x+y}{8}, & 0 < x < 2,\; 0 < y < 2 \\ 0, & \text{otherwise} \end{cases}$$
Calculate the probability that the device fails during its first hour of operation. Plot the
joint density function and also show the region corresponding to the probability that the
device fails during its first hour of operation.
Solution
# Limits
xl <- function(y){0}
xu <- function(y){2}
yl <- function(x){0}
yu <- function(x){2}
prob
[1] 0.625
N <-
data_frame(
x = c(0, 0, 1, 1, 2, 2),
xend = c(0, 1, 1, 2, 2, 0),
y = c(0, 2, 2, 1, 1, 0),
yend = c(2, 2, 1, 1, 0, 0))
G <-
ggplot(
data = M,
mapping = aes(x = x, y = y)) +
geom_tile(
mapping = aes(fill = z)) +
scale_x_continuous(
breaks = c(0.0, 0.5, 1.0, 1.5, 2.0),
labels = c('0.0', 0.5, '', 1.5, '2.0')) +
scale_y_continuous(
breaks = c(0.0, 0.5, 1.0, 1.5, 2.0),
labels = c('0.0', 0.5, '', 1.5, '2.0')) +
scale_fill_continuous(
name = 'Density',
type = 'viridis') +
labs(
x = 'x-axis',
y = 'y-axis',
title = 'A Bivariate Probability Density Function, John Garza',
caption = 'Statistics I / Section 5.2') +
geom_segment(
data = N,
mapping = aes(x = x, xend = xend, y = y, yend = yend),
size = 0.4) +
geom_text(
x = 0.5,
y = 0.5,
label = 'Area',
size = 7,
fontface = 'italic',
family = 'serif') +
theme_classic() +
theme(
legend.position = c(0.86, 0.2),
legend.direction = 'vertical',
text = element_text(
size = 16,
color = 'black',
family = 'serif',
face = 'italic'),
axis.line = element_blank(),
axis.title.x = element_text(
vjust = -0.8),
axis.title.y = element_text(
vjust = +0.4),
axis.ticks = element_blank())
[Figure: A Bivariate Probability Density Function with the failure region outlined; caption 'Statistics I / Section 5.2']
Marginal Probability Functions
Let X and Y be discrete random variables with joint probability function p(x, y).
The marginal probability functions of X and Y are defined as

$$p_X(x) = \sum_{\text{all } y} p(x, y) \qquad p_Y(y) = \sum_{\text{all } x} p(x, y)$$

Marginal Density Functions
Let X and Y be continuous random variables with joint density function f(x, y).
The marginal density functions of X and Y are

$$f_X(x) = \int_{-\infty}^{+\infty} f(x, y)\, dy \qquad f_Y(y) = \int_{-\infty}^{+\infty} f(x, y)\, dx$$
Let X and Y be discrete random variables with joint probability function p(x, y).
The conditional probability functions are defined as

$$p(x \mid y) = P(X = x \mid Y = y) = \frac{P(X = x,\, Y = y)}{P(Y = y)} = \frac{p(x, y)}{p_Y(y)}$$

$$p(y \mid x) = P(Y = y \mid X = x) = \frac{P(X = x,\, Y = y)}{P(X = x)} = \frac{p(x, y)}{p_X(x)}$$
Note:
F (x | y) = P (X ≤ x | Y = y)
Let X and Y be continuous random variables with joint density function f(x, y).
The conditional density functions are defined as

$$f(x \mid y) = \frac{f(x, y)}{f_Y(y)} \qquad f(y \mid x) = \frac{f(x, y)}{f_X(x)}$$
Note:
Introduction
Let X and Y be random variables with joint distribution function F (x, y). Then X
and Y are independent if and only if
F (x, y) = FX (x) × FY (y) for all real numbers x, y
Let X and Y be discrete random variables with joint probability function p(x, y)
and marginal probability functions pX (x) and pY (y). X and Y are independent if
and only if
p(x, y) = pX (x) × pY (y)
for all real numbers x and y
Let X and Y be continuous random variables with joint density function f (x, y) and
marginal density functions fX (x) and fY (y). Then X and Y are independent if and
only if
f (x, y) = fX (x) × fY (y)
for all real numbers x and y.
Let X and Y be continuous random variables with joint density function f (x, y).
Suppose that f (x, y) > 0 on the rectangular region defined by a ≤ x ≤ b and c ≤
y ≤ d and that f (x, y) = 0 otherwise. Then X and Y are independent random
variables if and only if there exist functions h(x) and g(y) such that
f (x, y) = h(x) × g(y)
where h(x) is a function of x only and g(y) is a function of y only.
Let X and Y be jointly continuous random variables with joint density function
$$f(x, y) = \begin{cases} 2e^{-x-2y}, & x > 0,\; y > 0 \\ 0, & \text{otherwise} \end{cases}$$
$$E[X + Y] = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} (x + y) f(x, y)\, dy\, dx$$
# Limits of integration
xl <- function(y){0}
xu <- function(y){Inf}
yl <- function(x){0}
yu <- function(x){Inf}
[1] 1.5
[Figure: the joint density of X and Y (plotting chunk lost in extraction)]
1. For a constant c ∈ ℝ, E[c] = c
2. E[c·g(X, Y)] = c·E[g(X, Y)]
3. E[g₁(X, Y) + ⋯ + g_k(X, Y)] = E[g₁(X, Y)] + ⋯ + E[g_k(X, Y)]
4. Let g(X) and h(Y) be functions of the independent random variables X and Y. Then
E[g(X)·h(Y)] = E[g(X)]·E[h(Y)]
Introduction
Covariance and correlation are very important topics and will play an important role
in your future studies. The definitions and equations presented in this section should
be learned very carefully. Additionally, there are many properties of covariance and
correlation to know and understand.
Let X and Y be random variables with means µX and µY . The covariance of X and
Y is
Cov(X, Y ) = E[(X − µX )(Y − µY )]
Correlation Coefficients, ρ

$$\rho_{XY} = \frac{Cov(X, Y)}{\sigma_X\, \sigma_Y}$$

Properties of covariance and correlation:
1. Cov(X, Y) = E[XY] − E[X]·E[Y]
2. Cov(X, Y) = Cov(Y, X)
3. Cov(X, X) = Var(X)
4. Cov(X + Y, Z) = Cov(X, Z) + Cov(Y, Z)
5. Cov(aX, Y) = a × Cov(X, Y)
6. Cov(X, Y + a) = Cov(X, Y)
7. Cov(X, Y) = ρ_XY · σ_X · σ_Y
8. −1 ≤ ρ_XY ≤ +1
9. |Cov(X, Y)| ≤ σ_X · σ_Y
$$f(x, y) = \begin{cases} \dfrac{8}{3}\, xy, & 0 \le x \le 1,\; x \le y \le 2x \\ 0, & \text{otherwise} \end{cases}$$
source('iterated_integral.R')
# Limits of integration
xl <- function(y){0}
xu <- function(y){1}
yl <- function(x){x}
yu <- function(x){2 * x}
# Calculate Cov(X, Y)
Cov <- E_XY - E_X * E_Y
Cov
[1] 0.04148148
# Standard Deviation, sd
st_dev <- function(x){sqrt(sum((x - mean(x)) ^ 2) / (length(x) - 1))}
rx <- runif(n = 1e3, min = -5, max = +5)
st_dev(rx)
[1] 2.820544
sd(rx)
[1] 2.820544
# Variance, var
var_x <- function(x){sum((x - mean(x)) ^ 2) / (length(x) - 1)}
rx <- rgamma(n = 1e4, scale = 2, shape = 2)
var_x(rx)
[1] 7.922016
var(rx)
[1] 7.922016
# Covariance, cov
cov_xy <- function(x,y){sum((x - mean(x)) * (y - mean(y))) / (length(x) - 1)}
rx <- rexp(n = 1e2, rate = 3)
ry <- rbinom(n = 1e2, size = 10, prob = 0.5)
cov_xy(rx,ry)
[1] 0.08773042
cov(rx,ry)
[1] 0.08773042
# Correlation, cor
cor_xy <- function(x,y){cov_xy(x,y) / (st_dev(x) * st_dev(y))}
rx <- rgamma(n = 1e4, shape = 2, scale = 3)
ry <- rbeta(n = 1e4, shape1 = 2, shape2 = 3)
cor_xy(rx, ry)
[1] 0.002670185
cor(rx, ry)
[1] 0.002670185
Example:
Analyze the Harman74.cor dataset by generating a heat map.
[Figure: heatmap of the Harman74.cor test covariances (plotting chunk lost in extraction)]
Introduction
Linear functions of random variables will appear often in your studies. It is
important to know how expected value, variance, and covariance relate to linear
combinations of random variables. You may want to derive these identities yourself;
we do not include the derivations here. The textbooks contain the derivations, which
rely on the elementary properties of expected value.
Notation
1. X₁, …, X_n are random variables
2. a₁, …, a_n are constants
3. U = a₁X₁ + ⋯ + a_nX_n
4. Y₁, …, Y_m are random variables
5. b₁, …, b_m are constants
6. W = b₁Y₁ + ⋯ + b_mY_m
1. $E[U] = E[a_1X_1 + \cdots + a_nX_n] = a_1E[X_1] + \cdots + a_nE[X_n] = \sum_{i=1}^{n} a_i\,E[X_i]$

2. $V[U] = V[a_1X_1 + \cdots + a_nX_n] = \sum_{i=1}^{n} a_i^2\, V[X_i] + 2\sum_{i<j} a_i a_j\, Cov(X_i, X_j)$

3. $Cov(U, W) = Cov(a_1X_1 + \cdots + a_nX_n,\; b_1Y_1 + \cdots + b_mY_m) = \sum_{i=1}^{n}\sum_{j=1}^{m} a_i b_j\, Cov(X_i, Y_j)$
The most common case of a linear combination consists of two variables and two
constants. It is a good idea to remember the formulas for this situation. Let X, Y,
and W be random variables and let a, b, and c be constants. Then

$$E[aX + bY] = aE[X] + bE[Y]$$
$$V[aX + bY] = a^2 V[X] + b^2 V[Y] + 2ab\,Cov(X, Y)$$
$$Cov(aX + bY,\, cW) = ac\,Cov(X, W) + bc\,Cov(Y, W)$$
# Compute V[2X - 3Y] using the identity V[aX+bY] = a^2V[X] + 2abCov(X,Y) + b^2V[Y]
a ^ 2 * var(X) + 2 * a * b * cov(X,Y) + b ^ 2 * var(Y)
[1] 123.0317
[1] 123.0317
Multinomial Experiments
1. The experiment consists of n identical trials.
2. Each trial results in one of k possible outcomes.
3. The probability that a single trial results in outcome i is p_i, constant across trials.
4. The trials are independent.
5. X_i counts the number of trials resulting in outcome i.
Notes
• p₁ + ⋯ + p_k = 1
• X₁ + ⋯ + X_k = n
Multinomial Distributions
1. $\sum_{i=1}^{k} p_i = 1$
2. $p_i > 0$ for i = 1, …, k
1. E[X_i] = n·p_i
2. V[X_i] = n·p_i(1 − p_i)
3. Cov(X_i, X_j) = −n·p_i·p_j for i ≠ j
# For a multinomial distribution with p_1 = 0.2, p_2 = 0.5, p_3 = 0.3 and k = 3
# Generate 1,000,000 random vectors from the distribution.
# Compute cov(X_1, X_2)
# Compare the result to the analytic formula cov(X_1, X_2) = -n * p_1 * p_2
M <- rmultinom(n = 1e6, size = 10, prob = c(0.2, 0.5, 0.3))  # (reconstructed; size 10 assumed)
cov(M[1, ], M[2, ])
[1] -1.002118
# using the formula cov(X_i, X_j) = -n * p_i * p_j
-10 * 0.2 * 0.5
[1] -1
If five auto policy holders are randomly selected for a study, what is the probability that
one policy holder is selected from each age group?
Bivariate Normal Distributions
The random variables X and Y have a bivariate normal distribution if the joint
density function of X and Y is

$$f(x, y) = \frac{e^{-Q/2}}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}$$

where

$$Q = \frac{1}{1-\rho^2}\left[\frac{(x-\mu_X)^2}{\sigma_X^2} - 2\rho\,\frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2}\right]$$

$$\rho = \frac{Cov(X, Y)}{\sigma_X\,\sigma_Y}$$
1. X and Y are each normally distributed
2. $E[X] = \mu_X$
3. $V[X] = \sigma_X^2$
4. $E[Y] = \mu_Y$
5. $V[Y] = \sigma_Y^2$
6. $X \mid Y = y \sim N\!\left(\mu_X + \rho\,\dfrac{\sigma_X}{\sigma_Y}(y - \mu_Y),\; (1-\rho^2)\,\sigma_X^2\right)$
7. $Y \mid X = x \sim N\!\left(\mu_Y + \rho\,\dfrac{\sigma_Y}{\sigma_X}(x - \mu_X),\; (1-\rho^2)\,\sigma_Y^2\right)$
µX = −1
σX = 1
µY = −1
$\sigma_Y = \sqrt{2}$
Cov(X, Y ) = −0.30
persp(x, y, z,
main = paste('Bivariate Normal Density '),
col = 'lightblue',
theta = 30,
phi = 20,
r = 50,
d = 0.1,
expand = 0.5,
ltheta = 90,
lphi = 180,
shade = 0.75,
ticktype = 'simple',
border = FALSE,
zlab = '',
xlab = '',
ylab = '',
box = FALSE)
Use iterated integration and the function defined in the previous example to
calculate P[−4 ≤ X ≤ −1, −1 ≤ Y ≤ 0].
# Source files
source('bivariate.R')
source('iterated_integral.R')
This is the same answer we got when we used the mvtnorm package.
Let X and Y be random variables and let g(X) be a function of X. The conditional
expectation of g(X) given Y = y is defined as

1. $E[g(X) \mid Y = y] = \int_{-\infty}^{+\infty} g(x) f(x \mid y)\, dx$ for X and Y jointly continuous.

2. $E[g(X) \mid Y = y] = \sum_{\text{all } x} g(x)\, p(x \mid y)$ for X and Y jointly discrete.
Conditioning Formulas
The following formulas are sometimes called the conditioning formulas. The proofs
are not too hard and rely on the identity f(x, y) = f(x|y)·f_Y(y). The proofs are
contained in the textbook.

1. $E[X] = E_Y\big[E_X[X \mid Y]\big]$

2. $V[X] = E_Y\big[V_X[X \mid Y]\big] + V_Y\big[E_X[X \mid Y]\big]$
Conditional Variance

$$V[X \mid Y = y] = E[X^2 \mid Y = y] - \big(E[X \mid Y = y]\big)^2$$
A machine has two components and fails when both components fail. The number of years
from now until the first component fails, X, and the number of years from now until the
machine fails, Y , are random variables with joint density function
$$f(x, y) = \begin{cases} \dfrac{1}{18}\, e^{-(x+y)/6}, & 0 < x < y \\ 0, & \text{otherwise} \end{cases}$$
Calculate V (Y |X = 2).
# (reconstructed: f_cond is the conditional density of Y given X = 2, namely
# f(y | x = 2) = (1/6) exp(-(y - 2)/6) for y > 2; the original definition was lost)
f_cond <- function(y){exp(-(y - 2) / 6) / 6}
# Calculate E[Y^2|X=2]
E_Y2 <- integrate(function(y){y ^ 2 * f_cond(y)}, lower = 2, upper = Inf)$value
# Calculate E[Y|X=2]
E_Y1 <- integrate(function(y){y ^ 1 * f_cond(y)}, lower = 2, upper = Inf)$value
# Calculate V[Y|X=2]
V_Y <- E_Y2 - (E_Y1)^2
V_Y
[1] 36
The first part of this section described conceptual formulas for theoretical models. In
practice you may instead find yourself working with data presented in an Excel
spreadsheet. Here we will look at importing bivariate data from Excel into R, plotting a
bivariate histogram, and then computing a conditional expected value based on the
spreadsheet data.
# Load the readxl package
library(readxl)
[Figure: bivariate histogram of the imported data, with counts by (x, y) bin]
Example:
M %>%
filter(Y > 1) %>%
mutate(sin_x = sin(X)) %>%
summarize(ex_conditional = mean(sin_x))
# A tibble: 1 x 1
ex_conditional
<dbl>
1 0.557
M %>%
filter(between(x = X, left = 1, right = 3)) %>%
mutate(sin_y = sin(Y)) %>%
summarize(variance = var(sin_y))
# A tibble: 1 x 1
variance
<dbl>
1 0.367
# Review M
head(M)
# A tibble: 6 x 3
stores products status
<chr> <chr> <chr>
1 Store - A Product - A high
2 Store - A Product - B very high
3 Store - A Product - C high
4 Store - A Product - D high
5 Store - A Product - E high
6 Store - A Product - F very low
unique(M$status)
[1] "high" "very high" "very low" "medium" "low"
# Set levels for status
M$status <- factor(M$status, levels = c('very high', 'high', 'medium', 'low', 'very low'))
axis.line = element_blank(),
axis.ticks = element_blank(),
text = element_text(
size = 16,
face = 'italic',
family = 'serif',
color = 'black'),
axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
[Figure: heatmap of danger levels (very high to very low) by store and product]
Chapter 6
Functions of Random Variables
6.1 Introduction
The book assumes that population sizes are much larger than sample sizes and that
random variables obtained through random sampling are independent and identically
distributed. As a result, if X₁, …, X_n is a random sample from a distribution with
density function f(x), then the joint density function is

$$f(x_1, x_2, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n)$$
If Y1 , Y2 , . . . , Yn is a random sample from a discrete distribution with probability
function p(y), then the joint probability function will be
p(y1 , y2 , . . . , yn ) = p(y1 )p(y2 ) · · · p(yn )
Introduction
There are three standard methods for finding the probability distribution of a function
of random variables: the method of distribution functions, the method of transformations,
and the method of moment-generating functions.
Note: The next sections will describe each of these methods in more detail.
Introduction
Let X be a continuous random variable with density function fX (x) and let U (X)
be a function of X. We can solve for the cumulative distribution function FU (u) of
U directly by integrating X over the region corresponding to U ≤ u. The density
function fU (u) can then be found by differentiation using
$$f_U(u) = \frac{d}{du}\Big[F_U(u)\Big]$$
The textbook summarizes the method of distribution functions in four steps. Let U
be a function of the random variables X₁, …, X_n.
1. Find the region in the (x₁, …, x_n) space where U = u
2. Find the region where U ≤ u
3. Integrate the joint density function f(x₁, …, x_n) over the region U ≤ u to obtain
F_U(u) = P[U ≤ u]
4. Differentiate F_U(u) to obtain the density f_U(u)
Note
You will have to do many practice questions to get good at this method.
Example:
Let X be a continuous random variable that is uniform over the interval [0, 1], and let
U = X². First,

$$F_X(x) = P[X \le x] = \int_0^x 1\, dt = x, \quad 0 \le x \le 1$$

Then

$$F_U(u) = P[U \le u] = P[X^2 \le u] = P[X \le \sqrt{u}\,] = F_X[\sqrt{u}\,] = \sqrt{u}$$

and differentiating gives the density

$$f_U(u) = \frac{d}{du}\big[\sqrt{u}\,\big] = \frac{1}{2\sqrt{u}}, \quad 0 \le u \le 1$$
# (reconstructed: the original chunk creating M did not survive extraction)
M <- data_frame(
x.squared = runif(n = 1e4, min = 0, max = 1) ^ 2,
us = seq(from = 0.001, to = 1, length = 1e4),
fu = 1 / (2 * sqrt(us)))
# Create a histogram of u
Gu <-
ggplot(
data = M) +
geom_histogram(
mapping = aes(x = x.squared, y = ..density..),
fill = 'darkred',
color = 'darkred',
size = 0.2,
alpha = 0.5,
bins = 77) +
geom_line(
mapping = aes(x = us, y = fu),
color = 'darkred',
size = 1.2) +
geom_hline(
yintercept = 1,
linetype = 'solid',
size = 1.2,
color = 'darkblue') +
labs(
title = 'Squared Random Uniform Numbers',
x = 'x.squared values',
y = 'Relative Frequency',
caption = 'Statistics I, Section 6.4') +
scale_y_continuous(
breaks = c(0, 1, 2, 3, 4, 5),
labels = c(0, 1, '', '', 4, 5),
limits = c(0, 5)) +
scale_x_continuous(
breaks = c(0.00, 0.25, 0.50, 0.75, 1.00),
labels = c('0.00', '0.25', '', '0.75', '1.00'),
limits = c(0, 1)) +
theme_bw() +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
family = 'serif',
face = 'italic',
color = 'black'),
plot.title = element_text(hjust = 0.5))
Gu
[Figure: Squared Random Uniform Numbers; caption 'Statistics I, Section 6.4']
Introduction
Let U = h(X) where h is a strictly increasing function. Then

$$F_U(u) = P[U \le u] = P[h(X) \le u] = P[X \le h^{-1}(u)] = F_X[h^{-1}(u)]$$
Now apply the chain rule for differentiation to obtain the density f_U(u).

$$f_U(u) = \frac{d}{du} F_X[h^{-1}(u)] = f_X[h^{-1}(u)] \cdot \frac{d}{du} h^{-1}(u) = f_X(x(u))\, x'(u)$$
Since h(x) is strictly increasing, h⁻¹(u) is also strictly increasing. Therefore

$$\frac{d}{du} h^{-1}(u) > 0 \qquad\text{and}\qquad \frac{d}{du} h^{-1}(u) = \left|\frac{d}{du} h^{-1}(u)\right|$$

so that

$$f_U(u) = f_X\big(h^{-1}(u)\big)\left|\frac{d}{du} h^{-1}(u)\right| = f_X(x(u))\,|x'(u)|$$
The transformation method in three steps, for W = g(X) with g monotone:
1. Solve w = g(x) for x to obtain $x = g^{-1}(w)$
2. Calculate the derivative $\dfrac{d}{dw} g^{-1}(w)$
3. $f_W(w) = f_X\big(g^{-1}(w)\big)\, \Big|\dfrac{d}{dw} g^{-1}(w)\Big|$
3. Apply the transformation formula to get the joint density of U and X, f(x, u)
4. Integrate out x to obtain the marginal density of U:

$$f_U(u) = \int_{-\infty}^{+\infty} f(x, u)\, dx$$
Example:
Let X be uniform on [1, 2] and let U = X².

Solution:

$$f(x) = \begin{cases} 1, & 1 \le x \le 2 \\ 0, & \text{otherwise} \end{cases}$$

$$u(x) = x^2 \qquad x(u) = \sqrt{u} \qquad x'(u) = \frac{1}{2\sqrt{u}}$$

$$f_U(u) = f_X(x(u))\,|x'(u)| = (1) \times \frac{1}{2\sqrt{u}} = \frac{1}{2\sqrt{u}}, \quad 1 \le u \le 4$$
# (reconstructed: the densities fx and fu did not survive extraction)
fx <- function(x){1 * (1 <= x & x <= 2)}
fu <- function(u){1 / (2 * sqrt(u)) * (1 <= u & u <= 4)}
# Create a data_frame
M <-
data_frame(
x = runif(n= 10000, min = 1, max = 2),
u = x ^ 2,
xs = seq(from = 1, to = 2, length = 1e4),
fx = fx(xs),
us = seq(from = 1, to = 4, length = 1e4),
fu = fu(us))
G <-
ggplot(
data = M) +
geom_line(
mapping = aes(x = us, y = fu),
color = 'darkgreen',
size = 1.2) +
geom_histogram(
mapping = aes(x = xs, y = ..density..),
fill = 'darkred',
alpha = 0.3,
bins = 77,
color = 'black',
size = 0.4) +
geom_histogram(
mapping = aes(x = u, y = ..density..),
fill = 'darkorange',
alpha = 0.2,
bins = 77,
color = 'black',
size = 0.4) +
labs(
title = 'The Transformation Formula',
y = 'Relative Frequency',
x = 'Random Numbers',
caption = 'Statistics I / Section 6.4') +
theme_bw() +
theme(
plot.margin = margin(unit = 'cm', c(1, 1, 1, 1)),
axis.title.y = element_text(vjust = +5),
axis.title.x = element_text(vjust = -5),
plot.title = element_text(hjust = 0.5),
axis.ticks = element_blank(),
text = element_text(
size = 16,
face = 'italic',
family = 'serif'))
G
[Figure: The Transformation Formula; caption 'Statistics I / Section 6.4']
Let X and Y be variables with moment generating functions mX (t) and mY (t). If
mX (t) = mY (t)
for all values of t then X and Y have the same probability distribution.
Theorem: Let X₁, …, X_n be independent random variables with moment generating
functions m₁(t), …, m_n(t), and let Y = X₁ + ⋯ + X_n. Then

$$m_Y(t) = E[e^{tY}] = E[e^{t(X_1 + \cdots + X_n)}] = E[e^{tX_1}] \times \cdots \times E[e^{tX_n}] = m_1(t) \times \cdots \times m_n(t)$$
$$\mu_Y = E[Y] = a_1\mu_1 + \cdots + a_n\mu_n$$

$$\sigma_Y^2 = V[Y] = a_1^2\sigma_1^2 + \cdots + a_n^2\sigma_n^2$$

$$Y \sim N(\mu_Y,\, \sigma_Y^2)$$
The textbook uses the results of this section to derive the following result.
1. Let X₁, …, X_n be independent normal random variables with means µ_i and variances σ_i²

2. Let $Z_i = \dfrac{X_i - \mu_i}{\sigma_i}$

3. Define $Y = \sum_{i=1}^{n} Z_i^2$

4. Then Y has a χ² distribution with n degrees of freedom.
Introduction
Let X and Y be continuous random variables with joint density function fX,Y(x, y). Suppose that U(X, Y) and V(X, Y) are functions of the random variables. How can we determine the joint density function of U and V? Under certain conditions, the bi-variate transformation method can be used. Before describing this method, let's extend the definition of support to joint density functions.
Let X and Y be continuous random variables with joint density function fX,Y(x, y). The support of fX,Y(x, y) is the set
\[ \text{support of } f_{X,Y}(x, y) = \left\{ (x, y) \in \mathbb{R}^2 \;\middle|\; f_{X,Y}(x, y) > 0 \right\} \]
Jacobian of a Transformation
\[ \frac{\partial(x, y)}{\partial(u, v)} = \begin{vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\[2ex] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{vmatrix} = \left(\frac{\partial x}{\partial u}\right)\left(\frac{\partial y}{\partial v}\right) - \left(\frac{\partial x}{\partial v}\right)\left(\frac{\partial y}{\partial u}\right) \]
Let X and Y be continuous random variables with joint density function fX,Y(x, y) and suppose that T : (x, y) -> (u, v) is a one-to-one function on the support of fX,Y(x, y). If x(u, v) and y(u, v) have continuous partial derivatives with respect to u and v, and if
\[ J = \frac{\partial(x, y)}{\partial(u, v)} = \left(\frac{\partial x}{\partial u}\right)\left(\frac{\partial y}{\partial v}\right) - \left(\frac{\partial x}{\partial v}\right)\left(\frac{\partial y}{\partial u}\right) \neq 0 \]
then the joint density function of U and V is
\[ f_{U,V}(u, v) = f_{X,Y}\big(x(u, v), y(u, v)\big) \times |J| \]
Example:
Let X and Y have joint density
\[ f(x, y) = \begin{cases} 8xy & , \ 0 \le x \le y \le 1 \\ 0 & , \ \text{otherwise} \end{cases} \]
and let U = X/Y and V = Y, so that
\[ x(u, v) = uv, \qquad y(u, v) = v \]
The support 0 <= x <= y <= 1 becomes 0 <= u <= 1 and 0 <= v <= 1. The partial derivatives are
\[ \frac{\partial x}{\partial u} = v, \qquad \frac{\partial x}{\partial v} = u, \qquad \frac{\partial y}{\partial u} = 0, \qquad \frac{\partial y}{\partial v} = 1 \]
so the Jacobian is
\[ J = \frac{\partial(x, y)}{\partial(u, v)} = \left(\frac{\partial x}{\partial u}\right)\left(\frac{\partial y}{\partial v}\right) - \left(\frac{\partial x}{\partial v}\right)\left(\frac{\partial y}{\partial u}\right) = (v)(1) - (u)(0) = v \]
Applying the transformation formula,
\[ f_{U,V}(u, v) = f\big(x(u, v), y(u, v)\big) \times |J| = 8(uv)(v)(v) = 8uv^3, \qquad 0 \le u \le 1, \ 0 \le v \le 1 \]
# Define f(x, y)
f <- function(x, y){8 * x * y * (0 <= x & x <= y & y <= 1)}
# Plot f on a grid (the head of this plot's code is reconstructed;
# geom_raster is an assumed choice)
Gf <-
  ggplot(
    data = expand.grid(
      x = seq(from = 0, to = 1, length = 200),
      y = seq(from = 0, to = 1, length = 200))) +
  geom_raster(mapping = aes(x = x, y = y, fill = f(x, y))) +
  scale_fill_gradient(name = 'Density') +
  theme(
    panel.border = element_rect(fill = NA, size = 1),
    axis.ticks = element_blank(),
    text = element_text(
      size = 16,
      family = 'serif',
      face = 'italic',
      color = 'black'))
Gf
# Define g(u, v)
g <- function(u, v){8 * u * v ^ 3 * (0 <= u & u <= 1) * (0 <= v & v <= 1)}
# Plot g on a grid (the head of this plot's code is reconstructed;
# geom_raster is an assumed choice)
Gg <-
  ggplot(
    data = expand.grid(
      u = seq(from = 0, to = 1, length = 200),
      v = seq(from = 0, to = 1, length = 200))) +
  geom_raster(mapping = aes(x = u, y = v, fill = g(u, v))) +
  scale_fill_gradient(name = 'Density') +
  theme(
    plot.title = element_text(hjust = 0.5),
    legend.position = c(0.15, 0.15),
    legend.key.width = unit(units = 'cm', 1.25),
    panel.border = element_rect(fill = NA, size = 1),
    axis.ticks = element_blank(),
    text = element_text(
      size = 16,
      family = 'serif',
      face = 'italic',
      color = 'black'))
Gg
[Figure: side-by-side plots of f(x, y) and f_{U,V}(u, v) on the unit square, density scale 0 to 8]
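As a Monte Carlo check of this example (a sketch, not from the textbook), note that f_{U,V}(u, v) = 8uv^3 factors as (2u)(4v^3), so U and V are independent with E[U] = 2/3 and E[V] = 4/5. Sampling (X, Y) from f and mapping to (U, V) = (X/Y, Y):
# Sample (X, Y) from f: the marginal of Y is 4y^3 and X given Y = y has density 2x / y^2
y <- runif(1e5) ^ (1 / 4)    # inverse CDF of F_Y(y) = y^4
x <- y * sqrt(runif(1e5))    # inverse CDF of F_{X|Y}(x) = x^2 / y^2
c(mean(x / y), 2 / 3)        # E[U] should be near 2/3
c(mean(y), 4 / 5)            # E[V] should be near 4/5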
Introduction
Let X1, ..., Xn be independent continuous random variables from a distribution with density function f(x) and cumulative distribution function F(x). Define
\[ X_{(1)} = \min\{X_1, \ldots, X_n\}, \qquad X_{(n)} = \max\{X_1, \ldots, X_n\} \]
The method of distribution functions can be used to find the densities of X_{(1)} and X_{(n)}. Let F_{(n)}(x) be the distribution function of X_{(n)} and let g_{(n)}(x) be the density function of X_{(n)}.
\[ F_{(n)}(x) = P[X_{(n)} \le x] = P[X_1 \le x, X_2 \le x, \ldots, X_n \le x] = \big[F(x)\big]^n \]
\[ g_{(n)}(x) = \frac{d}{dx}\Big[F_{(n)}(x)\Big] = \frac{d}{dx}\big[F(x)\big]^n = n \cdot f(x) \cdot \big[F(x)\big]^{n-1} \]
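A quick check (a sketch assuming a Uniform(0, 1) sample with n = 5, so that F(x) = x and g_(n)(x) = n x^(n-1)):
# Maximum of n uniforms compared with its theoretical mean n / (n + 1)
n <- 5
x_max <- replicate(1e4, max(runif(n)))
c(mean(x_max), n / (n + 1))   # both should be near 0.833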
The method of distribution functions can be used to derive the density and cumulative distribution function of min{X1, ..., Xn}. Let the cumulative distribution function and density of X_(1) be denoted by F_(1)(x) and g_(1)(x).
\[ F_{(1)}(x) = P[X_{(1)} \le x] = 1 - P[\min\{X_1, \ldots, X_n\} > x] = 1 - \big[1 - F(x)\big]^n \]
\[ g_{(1)}(x) = \frac{d}{dx}\Big[F_{(1)}(x)\Big] = \frac{d}{dx}\Big(1 - \big[1 - F(x)\big]^n\Big) = n \cdot f(x) \cdot \big[1 - F(x)\big]^{n-1} \]
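For example (a sketch with assumed parameters): if the Xi are exponential with rate lambda, then 1 - [1 - F(x)]^n = 1 - e^{-n lambda x}, so X_(1) is exponential with rate n lambda.
# Minimum of n exponentials compared with an exponential with rate n * lambda
lambda <- 2
n <- 4
x_min <- replicate(1e4, min(rexp(n, rate = lambda)))
c(mean(x_min), 1 / (n * lambda))   # both should be near 0.125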
More generally, the density function of the kth order statistic X_(k) is
\[ g_{(k)}(x) = \frac{n!}{(k-1)!\,(n-k)!} \times \big[F(x)\big]^{k-1} \times \big[1 - F(x)\big]^{n-k} \times f(x) \]
The textbook also uses the multinomial distribution to provide a heuristic derivation for the joint density function of two order statistics. Let j and k be elements of {1, 2, ..., n} such that j < k and let X1, ..., Xn be independent continuous random variables from a distribution with density function f(x) and cumulative distribution function F(x). Then the joint density function of the order statistics X_(j) and X_(k) is
\[ g_{(j),(k)}(x_j, x_k) = \frac{n!}{(j-1)!\,(k-1-j)!\,(n-k)!} \times \big[F(x_j)\big]^{j-1} \times \big[F(x_k) - F(x_j)\big]^{k-1-j} \times \big[1 - F(x_k)\big]^{n-k} \times f(x_j) \times f(x_k) \]
for x_j < x_k.
# Simulate the second order statistic of an exponential(1) sample of size n = 3
# (the distribution and sample size are illustrative assumptions)
n  <- 3      # sample size
nr <- 1e4    # number of simulated samples
r  <- rexp(n * nr, rate = 1)
# Create a matrix with one sample per row
A <- matrix(r, nrow = nr, ncol = n)
# Density of the 2nd order statistic: g_(2)(x) = n (n - 1) F(x) [1 - F(x)]^(n-2) f(x)
g_2 <- function(x){n * (n - 1) * pexp(x) * (1 - pexp(x)) ^ (n - 2) * dexp(x)}
M <-
  data_frame(
    X_2 = apply(A, 1, function(s) sort(s)[2]),
    x   = seq(from = 0, to = 3, length = nr))
G <-
ggplot(
data = M) +
geom_histogram(
mapping= aes(x = X_2, y = ..density..),
fill = 'darkorange',
alpha = 0.2,
color = 'black',
bins = 50) +
geom_line(
mapping = aes(x = x, y = g_2(x)),
color = 'darkgreen',
size = 1.2) +
scale_x_continuous(
limits = c(0, 3)) +
scale_y_continuous(
breaks = c(0.00, 0.25, 0.50, 0.75, 1.00),
labels = c('0.00', '0.25', '', '0.75', '1.00'),
limits = c(0, 1)) +
labs(
title = 'Second Order Statistic / Relative Frequency',
x = 'x-values',
y = 'Relative Frequency',
caption = 'Statistics / Section 6.7') +
theme(
axis.ticks = element_blank(),
axis.title.y = element_text(vjust = -6),
axis.title.x = element_text(vjust = +6),
text = element_text(
size = 16,
face = 'italic',
family = 'serif'),
plot.title = element_text(hjust = 0.5))
G
[Figure: Second Order Statistic: histogram of simulated X_(2) values with the density g_(2)(x) overlaid, x from 0 to 3]
7.1 Introduction
Statistic
Sampling Distribution
Example:
Generate 1000 samples, each of size 1000, from a gamma distribution with shape parameter 2 and scale parameter 3.
Compute the sample mean of each sample and plot a histogram of the results.
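A minimal data step for this simulation (the object M and its column means are assumed names):
# Simulate 1000 sample means, each from a gamma(shape = 2, scale = 3) sample of size 1000
M <-
  data_frame(
    means = replicate(
      n = 1000,
      expr = mean(rgamma(n = 1000, shape = 2, scale = 3))))
G <-
  ggplot(
    data = M) +
  geom_histogram(
    mapping = aes(x = means, y = ..density..),
    fill = 'darkred',
    alpha = 0.4,
    color = 'black',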
size = 0.2,
bins = 50) +
scale_y_continuous(
breaks = c(0, 1, 2, 3),
labels = c(0, '','', 3)) +
scale_x_continuous(
breaks = c(5.7, 6.0, 6.3),
labels = c(5.7, '',6.3)) +
labs(
title = 'The Sampling Distribution of the Sample Mean',
x = 'Sample Means',
y = 'Relative Frequency',
caption = 'Statistics I / Section 7.1') +
theme(
axis.ticks = element_blank(),
plot.title = element_text(hjust = 0.5),
text = element_text(
size = 16,
family = 'serif',
face = 'italic',
color = 'black'))
G
[Figure: The Sampling Distribution of the Sample Mean: histogram of sample means centered near 6.0]
In chapter 6, the method of moment generating functions was used to show that a linear combination of normal random variables is normal. The mean of a set of random variables is a linear combination of the variables. As a result, we have the following theorem. Let X1, ..., Xn be a random sample from a normal distribution with mean mu and variance sigma^2. Then
\[ \overline{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \sim N\big(\mu, \sigma^2/n\big) \]
\[ \mu_{\overline{X}} = \mu, \qquad \sigma^2_{\overline{X}} = \frac{\sigma^2}{n} \]
The mean stays the same, but the variance is reduced by a factor of 1/n.
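A quick simulation (a sketch assuming mu = 10, sigma = 2, and n = 25) illustrates the reduction in variance:
# Sample means from N(10, 4) with n = 25: the sd should be sigma / sqrt(n) = 0.4
xbar <- replicate(1e4, mean(rnorm(25, mean = 10, sd = 2)))
c(mean(xbar), sd(xbar))   # compare with 10 and 0.4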
\[ \overline{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\big(X_i - \overline{X}\big)^2 \]
1. \( \dfrac{(n-1)S^2}{\sigma^2} = \dfrac{1}{\sigma^2}\displaystyle\sum_{i=1}^{n}\big(X_i - \overline{X}\big)^2 \sim \chi^2 \) with df = (n - 1)
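As a numerical illustration (a sketch assuming n = 10 and sigma = 3):
# (n - 1) S^2 / sigma^2 compared with chi-square(n - 1)
n <- 10
sigma <- 3
w <- replicate(1e4, (n - 1) * var(rnorm(n, sd = sigma)) / sigma ^ 2)
c(mean(w), var(w))   # chi-square(9) has mean 9 and variance 2 * 9 = 18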
Let Z ~ N(0, 1) and let W ~ chi-square with nu degrees of freedom, with Z and W independent. Define
\[ T = \frac{Z}{\sqrt{W/\nu}} \]
Then T is said to have a t distribution with nu df.
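A minimal data step for the plot below (nu = 4 is an assumed value; M, T_vals, and t are the names the plotting code expects):
# Simulate T = Z / sqrt(W / nu) with nu = 4
nu <- 4
M <-
  data_frame(
    Z = rnorm(1e5),
    W = rchisq(1e5, df = nu),
    T_vals = Z / sqrt(W / nu),
    t = dt(T_vals, df = nu))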
# Plot a histogram
G <-
ggplot(
data = M) +
geom_histogram(
mapping = aes(x = T_vals, y = ..density..),
fill = 'darkred',
alpha = 0.4,
bins = 100,
size = 0.2,
color = 'black') +
scale_x_continuous(
breaks = c(-5.0, -2.5, 0.0, 2.5, 5.0),
labels = c('-5.0', '-2.5', '', '2.5', '5.0'),
limits = c(-5, 5)) +
scale_y_continuous(
breaks = c(0.0, 0.1, 0.2, 0.3, 0.4),
labels = c('0.0', '0.1', '', '', '0.4'),
limits = c(0.0, 0.44)) +
geom_line(
mapping = aes(x = T_vals, y = t),
size = 1.3,
color = 'navy')+
labs(
title = 'Simulating a t-Distribution',
x = 't-values',
y = 'Relative Frequency',
caption = 'Statistics I / Section 7.2') +
theme(
axis.ticks = element_blank(),
axis.title.y = element_text(vjust = -5),
axis.title.x = element_text(vjust = +5),
plot.title = element_text(hjust = 0.5),
text = element_text(
size = 16,
family = 'serif',
face = 'italic',
color = 'black'))
G
[Figure: Simulating a t-Distribution: histogram of simulated T values with the t density overlaid]
The F Distribution
If W1 ~ chi-square with nu1 df and W2 ~ chi-square with nu2 df are independent, then
\[ F = \frac{W_1/\nu_1}{W_2/\nu_2} \]
has an F distribution with nu1 numerator df and nu2 denominator df.
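A minimal data step for the plot below (nu1 = 5 and nu2 = 10 are assumed values; M, F_vals, and F are the names the plotting code expects):
# Simulate F = (W1 / nu1) / (W2 / nu2) with nu1 = 5 and nu2 = 10
nu1 <- 5
nu2 <- 10
M <-
  data_frame(
    F_vals = (rchisq(1e5, df = nu1) / nu1) / (rchisq(1e5, df = nu2) / nu2),
    F = df(F_vals, df1 = nu1, df2 = nu2))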
# Plot a histogram
G <-
ggplot(
data = M) +
geom_histogram(
mapping = aes(x = F_vals, y = ..density..),
fill = 'darkred',
alpha = 0.4,
bins = 100,
color = 'black',
size = 0.2) +
geom_line(
mapping = aes(x = F_vals, y = F),
size = 1.3,
color = 'navy')+
labs(
title = 'Simulating an F-Distribution',
x = 'F-values',
y = 'Relative Frequency',
caption = 'Statistics I / Section 7.2') +
scale_x_continuous(
breaks = c(0, 2, 4, 6, 8, 10),
labels = c(0, 2, 4, 6, 8, 10),
limits = c(0, 10)) +
scale_y_continuous(
breaks = c(0.0, 0.2, 0.4, 0.6, 0.8),
labels = c('0.0', '0.2', '', '', '0.8'),
limits = c(0.0, 0.85)) +
theme(
axis.ticks = element_blank(),
axis.title.y = element_text(vjust = -5),
axis.title.x = element_text(vjust = +5),
plot.title = element_text(hjust = 0.5),
text = element_text(
size = 16,
family = 'serif',
face = 'italic',
color = 'black'))
G
[Figure: Simulating an F-Distribution: histogram of simulated F values with the F density overlaid, F values from 0 to 10]
Summary
Suppose that
• X1, ..., Xn is a random sample from X ~ N(mu_X, sigma_X^2)
• Y1, ..., Ym is a random sample from Y ~ N(mu_Y, sigma_Y^2)
Then
• \( \sqrt{n}\left(\dfrac{\overline{X} - \mu_X}{\sigma_X}\right) \sim N(0, 1) \)
• \( \left(\dfrac{n-1}{\sigma_X^2}\right) S_X^2 \sim \chi^2 \) with df = (n - 1)
• \( \sqrt{n}\left(\dfrac{\overline{X} - \mu_X}{S_X}\right) \sim t \) distribution with df = (n - 1)
• \( F = \dfrac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2} \sim F \) distribution with numerator df = (n - 1) and denominator df = (m - 1)
Introduction
The central limit theorem will apply to any distribution with finite mean µ and finite
variance σ 2 . The central limit theorem says that if a random sample is large enough,
then the sample mean is approximately normal. This theorem is very important
and will allow us to compute approximate probabilities for the sums of a random
sample when we only know the mean and standard deviation and not the underlying
distribution.
\[ W = X_1 + X_2 + \cdots + X_n \sim N\big(n\mu,\, n\sigma^2\big) \quad \text{[approximately]} \]
Proof:
A proof of the central limit theorem is outside the focus of this course. The textbook describes the basic idea, which uses moment generating functions.
[Figure: histogram of simulated sums with a normal density overlaid, x values from 30 to 60]
Example #1
The total claim amount for a health insurance policy follows a distribution with density function
\[ f(x) = \begin{cases} \frac{1}{1000} e^{-(x/1000)} & , \text{ for } 0 < x \\ 0 & , \text{ otherwise} \end{cases} \]
The premium for the policy is set at the expected total claim amount plus 100. If 100 policies are sold, calculate the approximate probability that the insurance company will have claims exceeding the premiums collected. Graph a bell curve with a shaded region corresponding to this probability.
# Clear the environment
rm(list = ls())
# Population Parameters
# X is exponential with mean 1000
mu_x <- 1000
sigma_x <- 1000
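The computation for this example is sketched below (amount, ans, and M with columns x1, y1, x2, y2 are the names the plotting code expects).
# The total claims for 100 policies are approximately normal by the CLT
n_pol <- 100
mu_w <- n_pol * mu_x               # 100,000
sigma_w <- sqrt(n_pol) * sigma_x   # 10,000
# Premiums collected: 100 policies, each priced at E[X] + 100
amount <- n_pol * (mu_x + 100)     # 110,000
# P[total claims exceed premiums collected] = P[Z > 1]
ans <- pnorm(q = amount, mean = mu_w, sd = sigma_w, lower.tail = FALSE)
ans   # approximately 0.159
# Bell curve and shaded tail region for the plot
M <-
  data_frame(
    x1 = seq(from = mu_w - 4 * sigma_w, to = mu_w + 4 * sigma_w, length = 1e3),
    y1 = dnorm(x1, mean = mu_w, sd = sigma_w),
    x2 = seq(from = amount, to = mu_w + 4 * sigma_w, length = 1e3),
    y2 = dnorm(x2, mean = mu_w, sd = sigma_w))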
G <-
ggplot(
data = M)+
geom_line(
mapping = aes(x = x1, y = y1),
color = 'darkred',
size = 1.2)+
geom_ribbon(
mapping = aes(x = x2, ymin = 0, ymax = y2),
fill = 'darkred',
alpha = 0.3,
color = 'darkred',
size = 1.2)+
geom_text(
mapping = aes(x = amount+1000, y = mean(range(y2))),
label = paste('Area = ', round(ans, digits = 3)),
size = 6,
angle = 90,
color = 'black',
family = 'serif',
fontface = 'italic',
nudge_x = 1500,
nudge_y = -4e-6)+
labs(
title = 'The Central Limit Theorem',
caption = 'Statistics I / Section 7.3',
x = 'Total Claims',
y = 'Density')+
theme(
axis.ticks = element_blank(),
axis.title.y = element_text(vjust = +3),
axis.title.x = element_text(vjust = -3),
plot.title = element_text(hjust = 0.5),
text = element_text(
size = 16,
family = 'serif',
face = 'italic',
color = 'black'))
G
[Figure: The Central Limit Theorem: normal density of total claims with the tail area beyond the premiums collected shaded]
Introduction
Let X be an integer-valued random variable with expected value mu and variance sigma^2 and let X1, X2, ..., Xn be a random sample from X. Define
\[ S = X_1 + X_2 + \cdots + X_n \]
S is also integer valued and, by the Central Limit Theorem, is approximately normal: S is approximated by W ~ N(mu_S, sigma_S^2). In particular, let X ~ Binomial(n, p) with q = 1 - p, so that X is a sum of n independent Bernoulli(p) random variables. Then
• mu_X = np
• sigma_X^2 = npq
• From the Central Limit Theorem, X is approximately N(np, npq)
• Let W ~ N(np, npq). Then it follows from the continuity correction that
\[ P[X = k] \approx P\left[k - \tfrac{1}{2} \le W \le k + \tfrac{1}{2}\right] \]
\[ P[X \ge k] \approx P\left[W \ge k - \tfrac{1}{2}\right] \]
\[ P[X \le k] \approx P\left[W \le k + \tfrac{1}{2}\right] \]
How big must n be to use the normal approximation to the binomial distribution? The answer is given by either of the two rules
• \( 0 < p - 3\sqrt{pq/n} < p + 3\sqrt{pq/n} < 1 \)
• \( n > 9 \times \left( \dfrac{\max\{p, q\}}{\min\{p, q\}} \right) \)
# Binomial parameters
n <- 92
p <- 0.61
# Normal approximation parameters: mean and standard deviation
m <- n * p
s <- sqrt(n * p * (1 - p))
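For instance (k = 60 chosen for illustration), the exact binomial probability and the continuity-corrected normal approximation should be close:
# Compare P[X <= 60] exactly and via the continuity correction
k <- 60
c(pbinom(k, size = n, prob = p),
  pnorm(k + 0.5, mean = m, sd = s))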
For T a random variable with a t-distribution and nu degrees of freedom, the density function is
\[ f(t) = \frac{\Gamma[(\nu + 1)/2]}{\sqrt{\pi\nu}\;\Gamma(\nu/2)} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu + 1)/2} \]
where
\[ \Gamma(\alpha) = \int_0^{\infty} y^{\alpha - 1} e^{-y} \, dy \]
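As a sanity check (a sketch with nu = 4 and t = 1 assumed), the closed form agrees with R's built-in dt:
# Evaluate the t density formula directly and compare with dt
nu <- 4
t0 <- 1
manual <- gamma((nu + 1) / 2) / (sqrt(pi * nu) * gamma(nu / 2)) *
  (1 + t0 ^ 2 / nu) ^ (-(nu + 1) / 2)
c(manual, dt(t0, df = nu))   # both approximately 0.215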
The next plot compares a t density function and a standard normal density function. Note that both densities are symmetric about zero, but the t-distribution has more probability mass in its tails: t-distributions have heavier tails.
M <-
data_frame(
x = seq(from = -4, to = +4, length = 1e3),
dn = dnorm(x),
dt = dt(x, df = 4))
G <-
ggplot(
data = M) +
geom_line(
mapping = aes(x = x, y = dn, color = 'Standard Normal'),
size = 1.4) +
geom_line(
mapping = aes(x = x, y = dt, color = 't-Distribution'),
size = 1.4) +
scale_x_continuous(
breaks = c(-4, -2, 0, 2, 4),
labels = c(-4, -2, '', 2, 4)) +
scale_y_continuous(
breaks = c(0.0, 0.1, 0.2, 0.3, 0.4),
labels = c('0.0', '0.1', '', '0.3', '0.4'))+
scale_color_manual(
name = 'Density',
values = c('Standard Normal' = 'darkgreen', 't-Distribution' = 'maroon')) +
labs(
x = 'x-axis',
y = 'y-axis',
title = 'Comparing Normal and t Density Functions',
caption = 'Statistics I / Section 7.5') +
theme(
    axis.ticks = element_blank(),
    legend.position = c(0.85, 0.85))
G
[Figure: Comparing Normal and t Density Functions, x from -4 to 4]
Create a table of percentage points for the t-distribution corresponding to the picture below. Each column should correspond to t_alpha and each row should correspond to df.
M <-
data_frame(
x1 = seq(from = -4, to = 4, length = 1e3),
x2 = seq(from = 1, to = 4, length = 1e3),
y1 = dt(x1, df = 5),
y2 = dt(x2, df = 5))
G <-
ggplot(
data = M) +
geom_line(
mapping = aes(x = x1, y = y1),
color = 'red') +
geom_ribbon(
mapping = aes(x = x2, ymin = 0, ymax = y2),
fill = 'red',
alpha = 0.4) +
geom_text(
mapping = aes(x = 1.5, y = 0.05),
label = expression(alpha),
size = 7,
color = 'black') +
scale_x_continuous(
breaks = c(1),
labels = expression(x = t[alpha]))+
labs(
x = NULL,
y = NULL)+
theme_classic()+
theme(
axis.text.x = element_text(face = 'bold', family = 'serif', size = 16))
[Figure: t density with the upper-tail area alpha shaded to the right of t_alpha]
M <-
expand.grid(
df = seq(from = 1, to = 29, by = 1),
alpha = c(0.100, 0.050, 0.025, 0.010, 0.005)) %>%
mutate(t_alpha = qt(p = alpha, df = df, lower.tail = FALSE)) %>%
spread(key = alpha, value = t_alpha)
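A spot check of one entry of the table (illustrative choice): the upper-tail cutoff with alpha = 0.05 and 10 degrees of freedom.
# t_0.05 with df = 10
qt(p = 0.05, df = 10, lower.tail = FALSE)   # approximately 1.812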