Notes PDF
John Garza
September 5, 2019
Contents
Preface
1 What is Statistics?
1.1 Introduction
1.2 Using R in this Course
2 Basic Probability
2.1 A Review of Basic Set Theory
2.2 Experiments and Probability
2.3 Counting
2.4 Conditional Probability and Independent Events
2.5 The Additive and Multiplicative Laws
2.6 Bayes' Rule
2.7 Random Variables and Random Samples
Bibliography
Index
Preface
These notes have been prepared for the Statistics course at UT Permian Basin. The notes
are intended to be used with the matching lecture videos provided in the course's Canvas
pages. The goal of these notes is to provide a simplified summary of the content of
the required course textbook. These notes have been prepared using LaTeX, and R code has
been woven into the files using http://yihui.name/knitr/.
John Garza
September 5, 2019
Chapter 1
What is Statistics?
1.1 Introduction
Population
The complete set of observations of interest.
Sample
A subset of observations selected from the population.
Statistic
A number computed from a sample. The sample mean, sample variance, and sample standard
deviation below are examples.
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

$$s = \sqrt{s^2}$$
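As a quick numeric check (an added sketch, not part of the original notes), the following R code computes these three statistics for a small made-up data vector and compares them with R's built-in mean(), var(), and sd():

x <- c(4.1, 5.6, 3.8, 7.2, 5.0)
n <- length(x)
x_bar <- sum(x) / n                    # sample mean
s2 <- sum((x - x_bar) ^ 2) / (n - 1)   # sample variance
s <- sqrt(s2)                          # sample standard deviation
c(x_bar, mean(x))                      # the two values agree
c(s2, var(x))
c(s, sd(x))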
1.2 Using R in this Course
Introduction
We will be using R, a free data-analysis program available at https://www.r-project.
org/. After downloading R, install RStudio, available at https://www.rstudio.com/home/, on
your computer. Set up a folder named ACTS131 on your computer; this folder will contain
all of your files for the course. As a student in this course you will have free access to all
premium content at www.datacamp.com.
Chapter 2
Basic Probability
2.1 A Review of Basic Set Theory
Set Notation
The elements of a set are not ordered. A set is determined only by its elements. For
example, if A = {x, y, z} then it is also true that A = {y, z, x}.
In order for set theory to be consistent, we suppose that there is a largest possible
set. This set is called the universal set and is often denoted by a capital S.
There is a unique empty set, which is denoted ∅. The empty set is also called
the null set.
For a set A that contains a finite number of elements, we denote the size or cardinality
of A by either N(A) or |A|. If the cardinality of a set is zero, then the set is
the empty set.
Subsets
If A and B are sets, then A is a subset of B if and only if every element of A is also
an element of B. This is denoted by A ⊆ B.
Proper Subsets
If A ⊆ B and A ≠ B, then A is a proper subset of B. This is denoted by A ⊂ B.
Equality of Sets
Two sets A and B are equal if every element of A is an element of B and every
element of B is an element of A. This is denoted by A = B. A = B if and only if
A ⊆ B and B ⊆ A.
Set Operations
The Basic Set Operations
union: $A \cup B = \{x \in S \mid x \in A \text{ or } x \in B\}$
intersection: $A \cap B = \{x \in S \mid x \in A \text{ and } x \in B\}$
difference: $A - B = \{x \in S \mid x \in A \text{ and } x \notin B\}$
complement: $\overline{A} = \{x \in S \mid x \notin A\}$
De Morgan's Laws

$$\overline{A \cap B} = \overline{A} \cup \overline{B} \qquad \overline{A \cup B} = \overline{A} \cap \overline{B}$$
Distributive Laws
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
$$\overline{\overline{A}} = A$$
$$\overline{A \cap B \cap C} = \overline{A} \cup \overline{B} \cup \overline{C} \qquad \overline{A \cup B \cup C} = \overline{A} \cap \overline{B} \cap \overline{C}$$
Set Operations in R
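The code chunk for this section did not survive extraction. The following is a minimal sketch of R's built-in set functions, using small example vectors chosen here for illustration:

A <- c(1, 2, 3, 4)
B <- c(3, 4, 5, 6)
S <- 1:10                 # a universal set for the complement
union(A, B)               # A ∪ B
intersect(A, B)           # A ∩ B
setdiff(A, B)             # A − B
setdiff(S, A)             # the complement of A relative to S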
https://cran.r-project.org/web/packages/VennDiagram/VennDiagram.pdf
https://scriptsandstatistics.wordpress.com/2018/04/26/how-to-plot-venn-diagrams
i) 34 watch CBS.
n12 <- 7
n13 <- 6
n23 <- 5
[Venn diagram: viewer counts for NBC, CBS, and ABC]
2.2 Experiments and Probability
Experiment
An experiment is the process by which an observation is made.
Sample Space
A sample space associated with an experiment is the set of all possible observa-
tions. A sample space will usually be denoted by S. The elements of the sample
space are called sample points.
Sample Point
Event
An event is a subset of the sample space. The elements of an event are called
sample points.
Simple Event
A simple event is an event that cannot be decomposed. Each simple event contains
a unique element, a sample point.
Compound Event
A compound event is an event that can be decomposed into two or more simple events.
Probability Function
A probability function P assigns a number P(A) to each event A so that the following
axioms hold.
Axiom 1: P(A) ≥ 0.
Axiom 2: P(S) = 1.
Axiom 3: If A₁, A₂, … are pairwise mutually exclusive events, then
$P\!\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$.
2.3 Counting
2.3 Counting
Introduction
For discrete sample spaces, probability functions require us to count the number of sample
points in an event or in the sample space. In this section we will review some of the basic
counting definitions and theorems, and we will see how R can be used to compute these
quantities.
Let S be a finite sample space where every simple event has equal probability. Then

$$P(E) = \frac{|E|}{|S|}$$
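For example (an added illustration, not from the original notes), the probability that a five-card poker hand consists entirely of hearts can be computed by counting with choose():

# P(all five cards are hearts) = C(13, 5) / C(52, 5)
choose(13, 5) / choose(52, 5)
[1] 0.0004951981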
$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$$

The Binomial Theorem

$$(x + y)^n = \sum_{i=0}^{n} \binom{n}{i} x^i y^{n-i}$$

Pascal's Formula

$$\binom{n+1}{r} = \binom{n}{r-1} + \binom{n}{r}$$

A Binomial Identity

$$\binom{n}{r} = \binom{n}{n-r}$$
$$\binom{n}{0} = 1 = \binom{n}{n} \qquad \binom{n}{1} = n = \binom{n}{n-1} \qquad \binom{n}{2} = \frac{n(n-1)}{2}$$
Permutations
The number of ordered arrangements of r objects chosen from n distinct objects is

$$P_r^n = \frac{n!}{(n-r)!}$$
The number of ways to select k objects from n distinct objects:

                          Unordered              Ordered
Repetition Allowed        $\binom{n+k-1}{k}$     $n^k$
Repetition not Allowed    $\binom{n}{k}$         $P_k^n$
Multinomial Coefficients
The number of ways of partitioning n distinct objects into k distinct subsets of sizes
n₁, …, n_k, where n₁ + ⋯ + n_k = n, is called a multinomial coefficient.

$$\binom{n}{n_1\, n_2 \cdots n_k} = \frac{n!}{n_1! \cdots n_k!}$$

Multinomial Identity

$$(x_1 + \cdots + x_k)^n = \sum \binom{n}{n_1\, n_2 \cdots n_k} x_1^{n_1} \cdots x_k^{n_k}$$

where the sum runs over all nonnegative integers n₁, …, n_k with n₁ + ⋯ + n_k = n.
Let S be a finite set that is the union of k mutually disjoint sets B₁, …, B_k. Then

$$|S| = |B_1| + \cdots + |B_k|$$

Inclusion/Exclusion Rule
For any finite sets A and B,

$$|A \cup B| = |A| + |B| - |A \cap B|$$
Counting Functions in R
R has several built in functions for counting and we can define our own functions for those
that are not built-in already.
[1] 120
# Defining a permutations function-----
perm <-
# computes the number of ordered subsets of size k from a set of size n
# (fixed: the denominator must be (n - k)!, per the permutation formula)
function(n, k){factorial(n) / factorial(n - k)}
# Example
perm(4, 2)
[1] 12
multinom <-
# computes a multinomial coefficient
function(v){factorial(sum(v)) / prod(factorial(v))}
# Example
multinom(c(1, 2, 3))
[1] 60
#End of Script
2.4 Conditional Probability and Independent Events
Conditional Probability
Let A and B be events and suppose that P (B) > 0. The conditional probability of
an event A given that an event B has occurred, is defined as
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
P (A ∩ B) = P (A|B)P (B)
P (A ∩ B) = P (B|A)P (A)
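As an added sanity check by simulation (not part of the original notes): roll two dice, let A be the event that the sum is 8 and B the event that the first die is even, and compare the empirical conditional probability with the counting answer 3/18.

set.seed(1)
n <- 1e5
d1 <- sample(1:6, size = n, replace = TRUE)
d2 <- sample(1:6, size = n, replace = TRUE)
A <- (d1 + d2 == 8)        # sum is 8
B <- (d1 %% 2 == 0)        # first die is even
mean(A & B) / mean(B)      # empirical P(A|B) = P(A and B) / P(B)
3 / 18                     # exact value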
Let A and B be events such that B ⊂ A, P(B) > 0 and P(A) > 0. Then

$$P(A \mid B) = 1 \qquad\text{and}\qquad P(B \mid A) = \frac{P(B)}{P(A)}$$

$$P(\overline{A} \mid B) = 1 - P(A \mid B)$$
Independent Events
Two events A and B are independent if and only if any of the following hold.

$$P(A \mid B) = P(A) \qquad P(B \mid A) = P(B) \qquad P(A \cap B) = P(A)\,P(B)$$
Dependent Events
Two events are dependent if and only if any of the following hold.

$$P(A \mid B) \neq P(A) \qquad P(B \mid A) \neq P(B) \qquad P(A \cap B) \neq P(A)\,P(B)$$
The following are equivalent; if one of them is true then so are all of the others.
• A and B are independent
• A and $\overline{B}$ are independent
• $\overline{A}$ and B are independent
• $\overline{A}$ and $\overline{B}$ are independent
Events A, B and C are said to be mutually independent if all of the following hold.

$$P(A \cap B) = P(A)\,P(B)$$
$$P(A \cap C) = P(A)\,P(C)$$
$$P(B \cap C) = P(B)\,P(C)$$
$$P(A \cap B \cap C) = P(A)\,P(B)\,P(C)$$
2.5 The Additive and Multiplicative Laws
The Multiplicative Law

$$P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$$

If A and B are independent, then $P(A \cap B) = P(A)\,P(B)$.
The Additive Law

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

Complement Law

$$P(A) + P(\overline{A}) = 1$$
2.6 Bayes' Rule
Partitions
The Law of Total Probability
Let S be a sample space and let B₁, …, B_k be a partition of S such that P(B_j) > 0
for j ∈ {1, …, k}. Then for any event A

$$P(A) = \sum_{j=1}^{k} P(A \mid B_j)\,P(B_j) = \sum_{j=1}^{k} P(A \cap B_j)$$
Bayes' Rule
Let S be a sample space and let B₁, …, B_k be a partition of S such that P(B_j) > 0
for j ∈ {1, …, k}. Then for any event E

$$P(B_j \mid E) = \frac{P(E \mid B_j)\,P(B_j)}{\sum_{i=1}^{k} P(E \mid B_i)\,P(B_i)}$$
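A quick numeric illustration of Bayes' Rule (added here; the numbers are invented for the example): a test is positive with probability 0.99 for those with a condition and 0.05 for those without, and the condition has prevalence 0.01.

p_B  <- c(0.01, 0.99)    # P(B_1) = has condition, P(B_2) = does not
p_EB <- c(0.99, 0.05)    # P(positive | B_j)
(p_EB[1] * p_B[1]) / sum(p_EB * p_B)   # P(B_1 | positive)
[1] 0.1666667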
Elementary Partitions
1. {B, $\overline{B}$} is a partition.
2. $P(A) = P(A \cap B) + P(A \cap \overline{B})$
2.7 Random Variables and Random Samples
Random Variables
A random variable is a real-valued function defined on a sample space.
Random Sample
If a population has size N and if all samples from the population of size n have
an equal probability of being selected, the sampling is said to be random and the
resulting sample is called a random sample.
$$\text{number of random samples of size } n \text{ from a population of size } N = \binom{N}{n}$$
Chapter 3
Discrete Random Variables
Countably Infinite
Countable
Notation
Definition of P (X = x)
The expected value of a random variable can be undefined. In order for the expected
value to be defined, the above sum must be absolutely convergent, that is
$$\sum_{\text{all } x} \Big[\, |x| \times p_X(x) \,\Big] < \infty$$
Variance

$$V[X] = E\big[(X - \mu)^2\big] = E[X^2] - \big(E[X]\big)^2$$
Standard Deviation
The standard deviation of the random variable X is the positive square root of the
variance of X. The standard deviation is sometimes denoted $\sigma_X$.
Linear Transformations
1. $E[aX + b] = aE[X] + b$
2. $E[b] = b$
3. $V[b] = 0$
4. $V[aX + b] = a^2 V[X]$
5. $V[X + b] = V[X]$
6. $V[aX] = a^2 V[X]$
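These identities are easy to spot-check by simulation (an added sketch, not in the original chunk set):

set.seed(7)
X <- rnorm(1e5, mean = 3, sd = 2)
a <- 4; b <- -1
c(mean(a * X + b), a * mean(X) + b)    # E[aX + b] versus a E[X] + b
c(var(a * X + b), a ^ 2 * var(X))      # V[aX + b] versus a^2 V[X]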
3.3 Moments and Moment Generating Functions
Note
Notice that $\mu_2 = \sigma_X^2$ and $\mu_1' = \mu_X$.
Note
The definition requires that ∃ b ∈ R+ such that mX (t) < ∞ for all −b ≤ t ≤ b
Theorem

$$m_X^{(k)}(0) = E\big[X^k\big]$$

Note

$$V[X] = m_X^{(2)}(0) - \Big(m_X^{(1)}(0)\Big)^2$$
Note
mX (0) = E[e0X ]
= E[e0 ]
= E[1]
= 1
Note

$$E[a^X] = E\big[e^{\ln(a^X)}\big] = E\big[e^{X \ln a}\big] = m_X(\ln a)$$
Linear Transformations
Let Y = aX + b. Then
mY (t) = E[etY ]
= E[et(aX+b) ]
= E[eatX etb ]
= etb mX (at)
table of contents 49
MATH 3301 / Notes Section 3.3
table of contents 50
Section 3.4 MATH 3301 / Notes
Introduction
Many random variables can take only the values 0, 1, 2, 3, …. Such variables may be called
counts; examples include geometric, binomial, Poisson, hypergeometric, and negative
binomial variables. The probability generating function is useful for computing the expected
value and variance of these variables.
Let X be a random variable that takes only the values 0, 1, 2, 3, … and define
$p_n = P[X = n]$. The probability generating function of X is

$$P_X(t) = E[t^X] = \sum_{j=0}^{\infty} p_j t^j = p_0 + p_1 t + p_2 t^2 + \cdots$$
Factorial Moments
The kth factorial moment of X is $\mu_{[k]} = E[X(X-1)\cdots(X-k+1)]$. For example,
$\mu_{[1]} = E[X]$.
Theorem

$$\frac{d^k P_X(t)}{dt^k}\bigg|_{t=1} = P_X^{(k)}(1) = \mu_{[k]} = E[X(X-1)\cdots(X-k+1)]$$
Note

$$V[X] = P_X''(1) + P_X'(1) - \Big(P_X'(1)\Big)^2$$
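As an added numeric check: for a Poisson random variable with mean λ the probability generating function is $P_X(t) = e^{\lambda(t-1)}$, so $P_X'(1) = \lambda$ and the formula above should return V[X] = λ. A sketch using finite differences:

lambda <- 3
P  <- function(t){exp(lambda * (t - 1))}          # PGF of a Poisson(lambda)
h  <- 1e-4
P1 <- (P(1 + h) - P(1 - h)) / (2 * h)             # approximates P'(1) = lambda
P2 <- (P(1 + h) - 2 * P(1) + P(1 - h)) / h ^ 2    # approximates P''(1) = lambda^2
c(E = P1, V = P2 + P1 - P1 ^ 2)                   # both approximately 3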
3.5 Discrete Uniform Distributions
Definition
The uniform discrete distribution assigns equal probabilities to a finite set of numbers.
If X is uniform discrete on {1, 2, …, n} then

$$p(x) = \begin{cases} \dfrac{1}{n}, & x \in \{1, \ldots, n\} \\ 0, & \text{otherwise} \end{cases}$$

$$E[X] = \frac{n+1}{2} \qquad V[X] = \frac{n^2 - 1}{12}$$
$$m_X(t) = E[e^{Xt}] = \sum_{\text{all } x} p_x\, e^{tx} = \sum_{j=1}^{n} \frac{1}{n} e^{jt} = \frac{e^t}{n} \cdot \frac{e^{nt} - 1}{e^t - 1}$$
n <- 23
x <- seq(from = 1, to = n, by = 1)
x
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
# Define the probability function
p_x <- rep(1 / n, n)   # (reconstructed: equal probability 1/n for each value; original line lost)
p_x
[1] 0.04347826 0.04347826 0.04347826 0.04347826 0.04347826 0.04347826
[7] 0.04347826 0.04347826 0.04347826 0.04347826 0.04347826 0.04347826
[13] 0.04347826 0.04347826 0.04347826 0.04347826 0.04347826 0.04347826
[19] 0.04347826 0.04347826 0.04347826 0.04347826 0.04347826
# Compute the expected value of X using the formula
E_X <- (n + 1) / 2
E_X
[1] 12
# Compute the Expected value using the definition
sum(x * p_x)
[1] 12
# Compute the variance using the formula
V_x <- (n ^ 2 - 1) / 12
V_x
[1] 44
# Compute the variance directly
E2_X <- sum(x ^ 2 * p_x)   # (reconstructed: E[X^2] from the definition; original line lost)
E2_X - (E_X) ^ 2
[1] 44
3.6 Bernoulli Distributions
Definition
A Bernoulli random variable X takes the value 1 with probability p (a success) and the
value 0 with probability q = 1 − p (a failure).

$$E[X] = p \qquad V[X] = pq \qquad E[X^n] = p \qquad m_X(t) = q + pe^t$$
Bernoulli Trial
3.7 Binomial Distributions
Binomial Experiments
Properties
For a binomial random variable X with n trials and success probability p (q = 1 − p),

$$p(x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, \ldots, n$$

$$E[X] = np \qquad V[X] = npq \qquad m_X(t) = (q + pe^t)^n$$
$$\text{mode} = \begin{cases} \lfloor (n+1)p \rfloor, & (n+1)p \notin \mathbb{Z} \\ (n+1)p \text{ and } (n+1)p - 1, & (n+1)p \in \{1, \ldots, n\} \end{cases}$$
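A quick check of the mode formula (an added sketch): with n = 10 and p = 0.3, (n + 1)p = 3.3 is not an integer, so the mode should be ⌊3.3⌋ = 3.

n <- 10
p <- 0.3
(n + 1) * p                        # 3.3
which.max(dbinom(0:n, n, p)) - 1   # mode = 3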
Note
Note that the argument prob is used to specify the probability of success. In the R
functions above the arguments p and q stand for probability and quantile.
[1] 0.2001209
#X is binomial with p = 0.234 and n = 27. Compute P[X <= 11]
pbinom(11, size = 27, prob = 0.234)
[1] 0.9871378
[1] 0.0009037255
#X is binomial with n = 781 and p = 0.602. Compute the 71st percentile of X
qbinom(0.71, size = 781, prob = 0.602)
[1] 478
[Figure: binomial probability function for x = 0, …, 10]
# Create a histogram for a random sample of 1000 from X
# (reconstructed: the original line creating M did not survive extraction)
M <- data_frame(rb = rbinom(n = 1000, size = 1023, prob = 0.439))
G <-
ggplot(data = M) +
geom_histogram(
mapping = aes(x = rb, y = ..density..),
fill = 'darkgreen',
color = 'darkgreen',
alpha = 0.5,
size = 0.5,
bins = 33) +
labs(
x = 'random binomial numbers',
y = 'relative frequency',
title = 'Random Binomial Numbers, p = 0.439, n = 1023',
subtitle = NULL,
caption = 'Summer 2019')+
theme_classic() +
theme(
text = element_text(size = 16, face = 'italic', family = 'serif', color = 'black'),
axis.ticks = element_blank())
[Figure: Random Binomial Numbers histogram, p = 0.439, n = 1023]
# Create a graph of the empirical distribution function of a large random sample from X
rb <- rbinom(n = 1e6, size = 25, prob = 0.439)
tb <- table(rb)
nb <- sort(unique(rb))
cb <- cumsum(tb) / length(rb)
M <- data_frame(nb = nb, cb = cb)   # (reconstructed: this line did not survive extraction)
G <-
ggplot(data = M) +
geom_col(
mapping = aes(x = nb, y = cb),
fill = 'maroon',
color = 'maroon',
alpha = 0.5,
width = 1,
size = 0.5) +
scale_x_continuous(
breaks = seq(from = min(rb), to = max(rb), by = 2))+
labs(
x = 'random binomial numbers',
y = 'cumulative frequency',
title = 'Random Binomial Numbers, p = 0.439, n =25',
subtitle = NULL,
caption = 'Summer 2019') +
theme_bw() +
theme(
text = element_text(size = 16, face = 'italic', family = 'serif', color = 'black'),
axis.ticks = element_blank())
[Figure: Random Binomial Numbers empirical CDF, p = 0.439, n = 25; caption 'Summer 2019']
3.8 Geometric Distributions
For the random variable X from the geometric distribution with parameter p ∈ (0, 1)
and q = 1 − p
pX (x) = q x−1 p, where x ∈ Z+
$$p_X(x) = q^{x-1} p \qquad p_X(x+1) = q\,p_X(x) \qquad P[X \le x] = 1 - q^x$$

$$E[X] = \frac{1}{p} \qquad V[X] = \frac{1-p}{p^2}$$

$$P_X(t) = \frac{pt}{1 - qt} \qquad m_X(t) = \frac{pe^t}{1 - qe^t} \qquad \text{mode} = 1$$
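Note that R's geometric functions count failures before the first success (the Y parameterization described next) rather than trials. An added sketch checking P[X ≤ x] = 1 − qˣ for the trial-counting version:

p <- 0.3; q <- 1 - p; x <- 5
pgeom(x - 1, prob = p)   # X <= x trials is the same event as at most x - 1 failures
1 - q ^ x                # closed form; both equal 0.83193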
For the random variable Y from the geometric distribution with parameter p ∈ (0, 1),
counting failures before the first success,

$$p_Y(y) = q^y p, \quad y \in \{0, 1, 2, \ldots\} \qquad P[Y \le y] = 1 - q^{y+1}$$

$$E[Y] = \frac{1-p}{p} \qquad V[Y] = \frac{1-p}{p^2}$$

$$P_Y(t) = \frac{p}{1 - qt} \qquad m_Y(t) = \frac{p}{1 - qe^t} \qquad \text{mode} = 0$$
$$\sum_{j=1}^{\infty} r^j = \frac{r}{1-r} \quad \text{for } -1 < r < 1$$

$$\sum_{j=0}^{\infty} r^j \text{ diverges for } |r| \ge 1$$
Example

$$\sum_{j=0}^{\infty} \left(\frac{2}{3}\right)^j = \frac{1}{1 - 2/3} = \frac{1}{1/3} = 3$$
[Figure: a geometric probability function]
3.9 Negative Binomial Distributions
Introduction
1. The experiment consists of identical trials.
2. Each trial results in success or failure, with P(success) = p on every trial.
3. The trials are independent.
4. The random variable X is the number of trials until the rth success.
5. The random variable Y is the number of failures until the rth success.
6. Note that X = Y + r.
$$E[X] = \frac{r}{p} \qquad V[X] = \frac{r(1-p)}{p^2} \qquad m_X(t) = \left(\frac{pe^t}{1 - qe^t}\right)^r$$

Note:
In the equations above, X is the number of trials until the rth success.
$$E[Y] = \frac{r(1-p)}{p} \qquad V[Y] = \frac{r(1-p)}{p^2} \qquad m_Y(t) = \left(\frac{p}{1 - qe^t}\right)^r$$

Note:
In the equations above, Y is the number of failures until the rth success.
[1] 0.109375
# Compute the probability that Y <= 4
pnbinom(4, prob = 0.5, size = 6)
[1] 0.3769531
# Compute the probability that Y>6
pnbinom(6, prob = 0.5, size = 6, lower.tail = FALSE)
[1] 0.387207
# Solve for the smallest value w such that P[Y<=w] > 0.5
qnbinom(0.5, size = 6, prob = 0.5)
[1] 5
# Solve for the largest value w such that P[Y>w] > 0.4
qnbinom(0.4, size = 6, prob = 0.5, lower.tail = FALSE)
[1] 6
[Figure: a negative binomial probability function]
# (reconstructed: the original chunk creating M did not survive extraction)
M <- data_frame(rv = rnbinom(n = 1e4, size = 6, prob = 0.5))
G <-
ggplot(data = M) +
geom_histogram(
mapping = aes(x = rv, y = ..density..),
fill = 'darkorange',
color = 'darkorange',
alpha = 0.5,
breaks = seq(from = -0.5, to = max(M$rv) + 0.5, by = 1)) +
labs(
x = NULL,
y = NULL,
title = 'Random Negative Binomial Numbers, p = 0.5, n = 6',
subtitle = NULL,
caption = 'Summer 2019')+
theme(
text = element_text(size = 16, face = 'italic', family = 'serif', color = 'black'),
axis.ticks = element_blank())
[Figure: Random Negative Binomial Numbers, p = 0.5, n = 6; caption 'Summer 2019']
3.10 Hypergeometric Distributions
Introduction
For a hypergeometric random variable X, counting the successes in k draws without
replacement from a population of N objects of which m are successes,

$$p_X(x) = \frac{\dbinom{m}{x}\dbinom{N-m}{k-x}}{\dbinom{N}{k}}$$

$$E[X] = \frac{mk}{N} \qquad V[X] = k\,\frac{m}{N}\cdot\frac{N-m}{N}\cdot\frac{N-k}{N-1}$$
Identity

Since $\sum_{k=0}^{n} p_X(k) = 1$, we should expect the following identity to be true:

$$\sum_{i=0}^{k} \binom{m}{i}\binom{N-m}{k-i} = \binom{N}{k}$$
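This is Vandermonde's identity, and it is easy to verify numerically (an added sketch with m = 15, N = 32, k = 11):

m <- 15; N <- 32; k <- 11
sum(choose(m, 0:k) * choose(N - m, k - 0:k))
choose(N, k)   # same value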
[1] 0.3916084
# X is a hyper-geometric random variable with parameters m = 53, n = 100
# and k = 77
# Compute P[X <= 20]
phyper(q = 20, m = 53, n = 100, k = 77)
[1] 0.01773732
[1] 0.4269205
# X is a hyper-geometric random variable with m = 15, n = 17, and k = 11
# Compute the smallest value w such that P[X <= w] >= 0.65
qhyper(p = 0.65, m = 15, n = 17, k = 11)
[1] 6
# (reconstructed: the original chunk creating M did not survive extraction;
# parameters assumed from the previous example)
M <- data_frame(x = 0:10, y = dhyper(0:10, m = 15, n = 17, k = 11))
G <-
ggplot(
data = M,
mapping = aes(x = x, y =y)) +
geom_col(
color = 'darkviolet',
fill = 'darkviolet',
alpha = 0.7,
width = 1) +
scale_x_continuous(
breaks = seq(from = 0, to = 10, by = 1))+
theme_classic()+
labs(
x = NULL,
y = NULL,
title = 'A Hypergeometric Probability Distribution',
subtitle = NULL)+
theme(
text = element_text(size = 16, face = 'italic', family = 'serif', color = 'black'),
axis.ticks = element_blank())
[Figure: A Hypergeometric Probability Distribution]
# (reconstructed: the original chunk creating M did not survive extraction; parameters assumed)
M <- data_frame(x = rhyper(nn = 1e4, m = 50, n = 50, k = 50))
G <-
ggplot(
data = M,
mapping = aes(x = x, y = ..density..))+
geom_histogram(
color = 'darkturquoise',
fill = 'darkturquoise',
alpha = 0.7,
binwidth = 1) +
theme_bw() +
labs(
title = 'Random Hyper Geometric Numbers',
subtitle = NULL,
x = NULL,
y = NULL) +
theme(
text = element_text(size = 18, face = 'italic', family = 'serif', color = 'black'),
axis.ticks = element_blank())
[Figure: Random Hyper Geometric Numbers]
3.11 Poisson Distributions
Introduction
The Poisson distribution has a single parameter, often called λ. A Poisson
random variable X can take only the values 0, 1, 2, 3, …. Examples of Poisson
random variables include the number of accidents per day at a factory, the number
of typos per page in a printed book, and the number of customers arriving at a store
in the next hour.
$$p_X(k) = \frac{\lambda^k}{k!} e^{-\lambda} \qquad p_X(k) = p_X(k-1) \times \frac{\lambda}{k}$$

$$E[X] = \lambda \qquad V[X] = \lambda$$

$$m_X(t) = e^{\lambda(e^t - 1)} \qquad P_X(t) = e^{\lambda(t-1)}$$

$$\text{mode} = \begin{cases} \lfloor \lambda \rfloor, & \lambda \notin \mathbb{Z}^+ \\ \lambda \text{ and } \lambda - 1, & \lambda \in \mathbb{Z}^+ \end{cases}$$
$$e^x = \sum_{j=0}^{\infty} \frac{x^j}{j!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$$

The power series expansion is important for problems about Poisson distributions.
Note the following application. Let X be a Poisson random variable with mean λ.
Use the power series expansion of $e^x$ to verify that $\sum_{k=0}^{\infty} P[X = k] = 1$.
$$\sum_{k=0}^{\infty} P[X = k] = \sum_{k=0}^{\infty} e^{-\lambda}\,\frac{\lambda^k}{k!} = e^{-\lambda}\left(\sum_{k=0}^{\infty} \frac{\lambda^k}{k!}\right) = e^{-\lambda}\, e^{\lambda} = e^0 = 1$$
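The same fact can be checked numerically (an added sketch; the series is truncated at k = 100, which is plenty for small λ):

lambda <- 4
sum(dpois(0:100, lambda = lambda))
[1] 1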
Poisson Distributions
[1] 0.2240418
# X is a Poisson random variable with mean 5. Compute P[X <= 6]
ppois(q = 6, lambda = 5)
[1] 0.7621835
# X is a Poisson random variable with mean 4. Compute P[X > 5]
ppois(q = 5, lambda = 4, lower.tail = FALSE)
[1] 0.2148696
# X is a Poisson random variable with mean 2. Compute P[4 <= X <= 7]
ppois(q = 7, lambda = 2) - ppois(q = 3, lambda = 2)
[1] 0.1417798
# X is a Poisson random variable with mean 3.3. Solve for the
# smallest value of w such that P[X < w] > 0.99. Check your answer
qpois(p = 0.99, lambda = 3.3)
[1] 8
ppois(q = 8, lambda = 3.3)
[1] 0.9930882
ppois(q = 7, lambda = 3.3)
[1] 0.9802229
M <-
data_frame(
x = seq(from = 0, to = 20, by = 1),
y = dpois(x, lambda = 5))
G <-
ggplot(
data = M,
mapping = aes(x = x, y = y)) +
geom_col(
color = 'darkgreen',
fill = 'darkgreen',
alpha = 0.4,
width = 1)+
labs(
x = 'Random Variable',
y = NULL,
title = 'Poisson Probability Function',
subtitle = NULL) +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 18,
color = 'black',
face = 'italic',
family = 'serif'))
[Figure: Poisson Probability Function, λ = 5]
[Figure: a Poisson probability function]
# (reconstructed: the lines defining x and F_x did not survive extraction; λ = 5 assumed)
x <- 0:15
F_x <- ppois(x, lambda = 5)
# Plot F_x
M <- data_frame(x, F_x)
G <-
ggplot(
data = M,
mapping = aes(x = x, y = F_x)) +
geom_col(
color = 'darkorange',
fill = 'darkorange',
alpha = 0.6,
width = 1) +
labs(
title = 'A Cumulative Distribution Function / Poisson ',
subtitle = NULL,
x = 'x-values',
y = NULL) +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
color = 'black',
face = 'italic',
family = 'serif'))
[Figure: A Cumulative Distribution Function / Poisson]
# (reconstructed: s_x is the survival function P[X > x]; the original line did not survive)
s_x <- ppois(x, lambda = 5, lower.tail = FALSE)
# Plot s_x
M <- data_frame(x, s_x)
G <-
ggplot(
data = M,
mapping = aes(x = x, y = s_x)) +
geom_col(
color = 'red',
fill = 'red',
alpha = 0.3,
width = 1) +
labs(
title = 'The Survival Function of a Poisson Random Variable',
subtitle = NULL,
x = 'x-values',
y = NULL) +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
color = 'black',
face = 'italic',
family = 'serif'))
[Figure: The Survival Function of a Poisson Random Variable]
[Figure: a Poisson cumulative distribution function]
Chapter 4
Continuous Random Variables
Cumulative Distribution Functions
Let X be a random variable with cumulative distribution function $F_X(x) = P[X \le x]$. Then
1. $F_X(x)$ is non-decreasing
2. $F_X(x)$ is right-continuous
3. $\lim_{x \to -\infty} F_X(x) = 0$
4. $\lim_{x \to +\infty} F_X(x) = 1$
You can work with many CDFs in R. The script below demonstrates the four properties above
using the CDF of a standard normal random variable.
# (reconstructed: the original chunk creating M did not survive extraction)
M <- data_frame(x = seq(from = -4, to = 4, length = 1e3), y = pnorm(x))
G <-
ggplot(
data = M, mapping = aes(x = x, y = y)) +
geom_line(
color = 'red',
size = 1.2,
linetype = 'solid') +
labs(
title = 'A Cumulative Distribution Function',
subtitle = NULL,
x = 'x-values',
y = NULL,
caption = 'Statistics I / UT - Permian Basin') +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
face = 'italic',
family = 'serif',
color = 'black'))
[Figure: A Cumulative Distribution Function; caption 'Statistics I / UT - Permian Basin']
Let X be a random variable with cumulative distribution function $F_X(x)$. X is a continuous
random variable if
1. $F_X'(x)$ exists everywhere except possibly on a finite set in any finite interval
2. $F_X'(x)$ is continuous except possibly on a finite set in any finite interval

The probability density function of X is

$$f_X(x) = F_X'(x)$$
$$F_X(x) = \int_{-\infty}^{x} f(t)\, dt$$
If f(x) is a continuous function on the interval [a, b] and $F(x) = \int_a^x f(t)\,dt$, then

$$F'(x) = \frac{d}{dx}\left[\int_a^x f(t)\, dt\right] = f(x)$$
1. $f_X(x) \ge 0$ for all x

2. $\int_{-\infty}^{+\infty} f_X(x)\, dx = 1$

3. $P[a \le X \le b] = \int_a^b f_X(x)\, dx$

4. $P[a \le X \le b] = F_X(b) - F_X(a)$
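These properties can be checked numerically with integrate() (an added sketch using the standard normal density):

integrate(dnorm, lower = -Inf, upper = Inf)$value   # property 2: total area 1
integrate(dnorm, lower = -1, upper = 2)$value       # property 3
pnorm(2) - pnorm(-1)                                # property 4: same value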
Let X be a continuous random variable with density function fX (x). The mode(s) of X are the
value(s) of X that maximize fX (x)
Let X be a continuous random variable with probability density function $f_X(x)$. The support
of $f_X(x)$ is $\{x \mid f_X(x) \neq 0\}$.
Quantiles
Let X be a random variable with cumulative distribution function $F_X(x)$. For a number
p ∈ (0, 1), the pth quantile of X is the smallest number $\phi_p$ such that

$$P[X \le \phi_p] = F_X(\phi_p) \ge p$$
Percentiles
For a continuous random variable X with cumulative distribution function $F_X(x)$, the 100pth
percentile of X is the smallest number $\phi_p$ satisfying

$$P[X \le \phi_p] = F_X(\phi_p) = p$$
The following script is an example of working with a probability density function using R. The density is a
gamma density which you will learn about later in the chapter.
# (reconstructed: x and y were defined in lines lost to extraction; a shape-2, scale-1
# gamma density is assumed)
x <- seq(from = 0, to = 7, length = 1e3)
y <- dgamma(x, shape = 2, scale = 1)
# Create a data_frame
M <- data_frame(x = x, y = y)
[Figure: a gamma probability density function]
[1] 0.2068576
Let X be a continuous random variable with density function f(x). The expected
value of X is defined as

$$E[X] = \int_{-\infty}^{\infty} x f(x)\, dx$$

Let X be a continuous random variable with density f(x) and let g(X) be a function of X.
The expected value of g(X) is

$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\, dx$$
For a constant c ∈ ℝ,

$$E[c] = \int_{-\infty}^{\infty} c f(x)\, dx = c \int_{-\infty}^{\infty} f(x)\, dx = c \cdot 1 = c$$
1. E[aX] = aE[X]
2. E[aX + b] = aE[X] + b
3. E[ag(X) + b] = aE[g(X)] + b
$$f(x) = \begin{cases} 0.2, & -1 \le x \le 0 \\ 0.2 + 1.2x, & 0 < x \le 1 \\ 0, & \text{otherwise} \end{cases}$$
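The chunk defining f in R did not survive extraction; a direct translation of the piecewise density above, used by the integrate() calls below, is:

f <- function(x){
0.2 * (-1 <= x & x <= 0) + (0.2 + 1.2 * x) * (0 < x & x <= 1)
}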
# Expected Value
EX <- integrate(function(x){x * f(x)}, lower = -Inf, upper = +Inf)$value
EX
[1] 0.4
# Variance
VX <- integrate(function(x){(x - EX) ^ 2 * f(x)}, lower = -Inf, upper = +Inf)$value
VX
[1] 0.2733333
# (reconstructed: v marks the points where dashed segments are drawn; its original
# definition was lost)
v <- c(-0.5, 0.5)
M <-
data_frame(
x = seq(from = -1, to = 1, length = 1e3),
y = f(x))
G <-
ggplot(
data = M,
mapping = aes())+
geom_area(
mapping = aes(x = x, y = y),
fill = 'thistle',
color = 'thistle',
alpha = 0.7,
size = 1.3,
linetype = 'solid') +
ylim(
lower = 0,
upper = 1.5) +
xlim(
lower = -1.5,
upper = +1.5) +
geom_segment(
data = data_frame(x = v, y = f(v)),
mapping = aes(x = x, xend = x, y = 0, yend = y),
color = 'purple',
linetype = 'dashed',
size = 1.3) +
labs(
title = 'A Probability Density Function',
subtitle = NULL,
x = 'x-axis',
y = NULL,
caption = 'Statistics I / Section 4.4') +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
color = 'black',
face = 'italic',
family = 'serif'))
[Figure: A Probability Density Function; caption 'Statistics I / Section 4.4']
$$m_X(t) = E[e^{tX}] = \int_{-\infty}^{\infty} e^{xt} f(x)\, dx$$

$$m_{g(X)}(t) = E[e^{g(X)t}] = \int_{-\infty}^{\infty} e^{g(x)t} f(x)\, dx$$
Notes

$$m_X^{(k)}(0) = E[X^k] \qquad m_X(\ln a) = E[a^X]$$
Theorem

$$m_X^{(k)}(0) = E[X^k]$$

Note

$$V[X] = E[X^2] - \big(E[X]\big)^2 = m_X^{(2)}(0) - \Big(m_X^{(1)}(0)\Big)^2$$
Note
mX (0) = E[e0X ]
= E[e0 ]
= E[1]
= 1
Note

$$E[a^X] = E\big[e^{\ln(a^X)}\big] = E\big[e^{X \ln a}\big] = m_X(\ln a)$$
Linear Transformations
Let Y = aX + b. Then
mY (t) = E[etY ]
= E[et(aX+b) ]
= E[eatX etb ]
= etb mX (at)
Uniform Distributions
Let X be uniform on the interval [a, b]. Then

$$F_X(x) = \begin{cases} 0, & x < a \\ \dfrac{x-a}{b-a}, & a \le x \le b \\ 1, & b < x \end{cases}$$

$$E[X^n] = \frac{b^{n+1} - a^{n+1}}{(n+1)(b-a)} \qquad E[X] = \frac{a+b}{2} \qquad V[X] = \frac{(b-a)^2}{12}$$

$$m_X(t) = \frac{e^{bt} - e^{at}}{(b-a)t}$$
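An added sketch checking the mean and variance formulas by simulation with a = 0 and b = 2:

a <- 0; b <- 2
x <- runif(1e5, min = a, max = b)
c(mean(x), (a + b) / 2)        # E[X] = 1
c(var(x), (b - a) ^ 2 / 12)    # V[X] = 1/3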
# (reconstructed: the original chunk creating M did not survive extraction)
M <- data_frame(
xr = runif(n = 1e4, min = 0, max = 2),
xp = seq(from = 0, to = 2, length = 1e4),
fp = dunif(xp, min = 0, max = 2))
# Plot
G <-
ggplot(
data = M) +
geom_histogram(
mapping = aes(x = xr, y = ..density..),
color = 'purple',
fill = 'thistle',
alpha = 0.9,
breaks = seq(from = 0, to = 2, by = 0.05)) +
geom_line(
mapping = aes(x = xp, y = fp),
color = 'darkgreen',
size = 1.3,
linetype = 'solid') +
scale_x_continuous(
breaks = c(0, 1, 2),
labels = c(0, '', 2),
limits = c(-0.5, 2.5)) +
scale_y_continuous(
limits = c(0, 0.75)) +
theme_classic() +
labs(
title = 'Relative Frequency Histogram / Random Uniform Numbers',
subtitle = NULL,
x = 'x-values',
y = NULL) +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
face = 'italic',
family = 'serif',
color = 'black'))
[Figure: Relative Frequency Histogram / Random Uniform Numbers]
N <- data_frame(
xp = seq(from = 0, to = 2, length = 1e4),
yp = punif(xp, min = 0, max = 2))
# (reconstructed: M holds the empirical CDF of a random uniform sample; original lines lost)
xr <- runif(n = 1e4, min = 0, max = 2)
M <- data_frame(
xc = sort(unique(xr)),
yc = ecdf(xr)(xc))
# Plot
G <-
ggplot(
data = M) +
geom_col(
mapping = aes(x = xc, y = yc),
color = 'blue',
fill = 'powderblue',
alpha = 0.9,
size = 0.5,
linetype = 'solid') +
geom_line(
data = N,
mapping = aes(x = xp, y = yp),
color = 'darkgreen',
size = 1.3,
linetype = 'solid') +
labs(
# (labs arguments reconstructed; the original lines were lost in extraction)
title = 'Empirical CDF / Random Uniform Numbers',
x = 'x-values',
y = NULL)
[Figure: empirical CDF of the random uniform sample]
Gamma Distributions
For a gamma random variable X with shape parameter α > 0 and scale parameter β > 0,

$$f(x) = \begin{cases} \dfrac{x^{\alpha-1} e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}, & 0 \le x \\ 0, & \text{otherwise} \end{cases}$$

$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\, dx$$

$$E[X] = \alpha\beta \qquad V[X] = \alpha\beta^2 \qquad m_X(t) = (1 - \beta t)^{-\alpha}$$
Notation
1. $\Gamma(\alpha + 1) = \alpha\,\Gamma(\alpha)$
2. $\Gamma(n) = (n-1)!$ for a positive integer n
3. $\Gamma(1/2) = \sqrt{\pi}$
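An added numeric check of the mean formula using integrate(), with α = 3 and β = 2:

alpha <- 3; beta <- 2
integrate(function(x){x * dgamma(x, shape = alpha, scale = beta)},
lower = 0, upper = Inf)$value
alpha * beta   # both equal 6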
Let X and Y be independent gamma random variables with the same scale parameter β and
shape parameters α₁ and α₂. Define W = X + Y. Then W is gamma with shape α₁ + α₂ and
scale β.
gamma(3.8)
[1] 4.694174
gamma(5)
[1] 24
# (reconstructed: the original chunk creating M and prob did not survive extraction;
# a shape-2, scale-2 gamma is assumed)
M <- data_frame(
xp = seq(from = 0, to = 15, length = 1e3),
yp = dgamma(xp, shape = 2, scale = 2),
x = seq(from = 2, to = 7, length = 1e3),
y = dgamma(x, shape = 2, scale = 2))
prob <- pgamma(7, shape = 2, scale = 2) - pgamma(2, shape = 2, scale = 2)
G <-
ggplot(
data = M)+
geom_area(
mapping = aes(x = x, y = y),
fill = 'orange',
alpha = 0.3) +
geom_line(
mapping = aes(x = xp, y = yp),
color = 'orange',
size = 1.2,
linetype = 'solid') +
geom_text(
mapping = aes(x = 4.50, y = 0.05),
label = paste('P[2 < X < 7] = ', round(prob, digits = 2)),
size = 5,
color = 'black',
family = 'serif',
fontface = 'italic',
angle = 0) +
labs(
title = 'A Gamma Density Curve',
subtitle = NULL,
x = 'x-values',
y = NULL) +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
face = 'italic',
family = 'serif',
color = 'black'))
[Figure: A Gamma Density Curve, with the region for P[2 < X < 7] shaded]
M <-
expand.grid(
x = seq(from = 0, to = 10, length = 1e3),
shape = c(1, 2, 3),
scale = 2)
[Figure: gamma density curves, shape 1, 2, 3 and scale 2]
4.7 χ2 Distributions
Introduction
A χ2 random variable with ν degrees of freedom is a gamma random variable with
shape parameter α = ν/2 and scale parameter β = 2.

$$E[X] = \nu \qquad V[X] = 2\nu \qquad m_X(t) = (1 - 2t)^{-\nu/2}$$
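An added sketch confirming the gamma connection numerically:

nu <- 4
pchisq(3, df = nu)                     # P[X <= 3]
pgamma(3, shape = nu / 2, scale = 2)   # same value via the gamma CDF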
M <-
expand.grid(
x = seq(from = 0, to = 10, by = 0.01),
df = c(2, 3, 4)) %>%
mutate(y = dchisq(x, df = df))   # (reconstructed: the original mutate line was lost)
G <-
ggplot(
data = M) +
geom_line(
mapping = aes(x = x, y = y, color = as.factor(df)),
size = 1.2,
linetype = 'solid') +
scale_y_continuous(
limits = c(0, 0.75)) +
labs(
title = 'Chi-Squared Density Functions',
subtitle = NULL,
x = 'x-values',
y = NULL,
caption = 'Statistics I / Section 4.7',
color = 'Degrees of Freedom') +
theme(
legend.position = c(0.83, 0.87),
legend.key.width = unit(0.6, units = 'inches'),
axis.ticks = element_blank(),
text = element_text(
size = 16,
color = 'black',
face = 'italic',
family = 'serif'))
[Figure: Chi-Squared Density Functions for 2, 3, and 4 degrees of freedom; caption 'Statistics I / Section 4.7']
Exponential Distributions
An exponential random variable X with mean β > 0 has density $f(x) = \frac{1}{\beta} e^{-x/\beta}$ for x ≥ 0.

$$F_X(x) = 1 - e^{-x/\beta} \qquad s_X(x) = e^{-x/\beta} \qquad P[a \le X \le b] = e^{-a/\beta} - e^{-b/\beta}$$

$$E[X] = \beta \qquad V[X] = \beta^2 \qquad m_X(t) = (1 - \beta t)^{-1}$$
[1] 0.2231302
require(actuar)
# Use the actuar package. X is exponential with mean 0.5, calculate E[X^5]
mexp(order = 5, rate = 2)
[1] 3.75
integrate(function(x){x ^ 5 * dexp(x, rate = 2)}, lower = 0, upper = Inf)$value
[1] 3.75
# Calculate E[min(X,5)^2]
levexp(limit=5, rate = 2, order = 2)
[1] 0.4997503
# (fixed: the integrand must cap x at the limit 5, i.e. use min(x, 5); the result then
# matches levexp above)
integrate(f = function(x){pmin(x, 5) ^ 2 * dexp(x, rate = 2)},
lower = 0,
upper = Inf)$value
[1] 0.4997503
# Calculate E[2^x]
mgfexp(log(2), rate = 2, log = FALSE)
[1] 1.530394
# Calculate E[2^x] using integration
integrate(
function(x){2 ^ x * dexp(x, rate = 2, log = FALSE)},
lower = 0,
upper = Inf)$value
[1] 1.530394
# Create a vector of 1000 random numbers from an exponential distribution with mean 5
# and plot a histogram with 50 bins
M <- data_frame(x = rexp(n = 1000, rate = 1 / 5))   # (reconstructed: original line lost)
G <-
ggplot(
data = M) +
geom_histogram(
mapping = aes(x = x, y = ..density..),
color = 'red',
fill = 'red',
alpha = 0.6,
breaks = seq(from = 0, to = 30, by = 1)) +
labs(
x = 'Random Exponential Numbers',
y = NULL,
title = 'Relative Frequency Histogram / Exponential Numbers',
subtitle = NULL,
caption = 'Statistics I / Section 4.8') +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
color = 'black',
face = 'italic',
family = 'serif'))
[Figure: Relative Frequency Histogram / Exponential Numbers; caption 'Statistics I / Section 4.8']
[1] 0.04978707
integrate(f_cond, lower = a + b, upper = Inf)$value
[1] 0.04978708
# (reconstructed: the original chunk creating M did not survive extraction; three rates assumed)
M <-
expand.grid(
x = seq(from = 0, to = 10, length = 1e3),
rate = c(0.5, 1, 1.5)) %>%
mutate(y = dexp(x, rate = rate))
G <-
ggplot(
data = M) +
geom_line(
mapping = aes(x = x, y = y, color = as.factor(rate)),
size = 1.2) +
scale_color_manual(
name = 'rate',
values = c('darkred', 'darkgreen', 'darkorange'))+
labs(
title = 'Exponential Density Functions',
subtitle = NULL,
x = 'x-values',
y = NULL,
caption = 'Statistics I / Section 4.8') +
theme(
legend.position = c(0.85, 0.85),
axis.ticks = element_blank(),
text = element_text(
size = 14,
face = 'italic',
family = 'serif',
color = 'black'))
[Figure: Exponential Density Functions; caption 'Statistics I / Section 4.8']
Beta Distributions
The beta distribution has two parameters α > 0 and β > 0. The density function is
defined on the closed interval [0, 1]. For a random variable X with beta distribution
and parameters α > 0 and β > 0,

$$f_X(x) = \begin{cases} \dfrac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$$

$$B(\alpha, \beta) = \int_0^1 x^{\alpha-1} (1-x)^{\beta-1}\, dx = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}$$

$$E[X] = \frac{\alpha}{\alpha+\beta} \qquad V[X] = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} \qquad \text{mode} = \frac{\alpha-1}{\alpha+\beta-2}$$
The moment generating function of a beta random variable does not exist in closed form.
To use the beta distribution on an interval [a, b] instead of [0, 1], rescale:

$$y^* = \frac{y - a}{b - a} \quad \text{where } a \le y \le b$$

The new variable y* is defined on the interval [0, 1].
[1] 0.44118
# X is a beta random variable with alpha = 2 and beta = 6
# Find the first quartile of X
qbeta(p = 0.25, shape1 = 2, shape2 = 6)
[1] 0.137974
M <-
expand.grid(
x = seq(from = 0, to = 1, by = 0.01),
alpha = 3,
beta = c(1, 2, 3)) %>%
mutate(y = dbeta(x = x, shape1 = alpha, shape2 = beta))
G <-
ggplot(
data = M) +
geom_line(
mapping = aes(x = x, y = y, col = as.factor(beta)),
size = 1.2,
linetype = 'solid') +
scale_color_manual(
name = 'Beta',
values = topo.colors(3)) +
labs(
title = 'Beta Density Functions',
subtitle = NULL,
x = 'x-values',
y = 'Probability Density',
caption = 'Statistics I / Section 4.9') +
theme(
legend.position = c(0.3, 0.8),
legend.direction = 'horizontal',
axis.ticks = element_blank(),
text = element_text(
size = 16,
color = 'black',
family = 'serif',
face = 'italic'))
[Figure: Beta Density Functions for β = 1, 2, 3; caption 'Statistics I / Section 4.9']
# (reconstructed: the original chunk creating M did not survive extraction; the sample
# matches the dbeta(shape1 = 3, shape2 = 2) curve overlaid below)
M <- data_frame(x = rbeta(n = 1e4, shape1 = 3, shape2 = 2))
G <-
ggplot(
data = M) +
geom_histogram(
mapping = aes(x = x, y = ..density..),
color = 'black',
fill = 'purple',
alpha = 0.2,
breaks = seq(from = 0, to = 1, length = 30)) +
geom_line(
mapping = aes(x = x, y = dbeta(x, shape1 = 3, shape2 = 2)),
color = 'purple',
size = 1.2,
linetype = 'solid') +
labs(
x = 'Random Variable',
y = NULL,
title = 'Relative Frequency Histogram / Beta Numbers',
subtitle = NULL,
caption = 'Statistics I / Section 4.9') +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
color = 'black',
family = 'serif',
face = 'italic'))
[Figure: Relative Frequency Histogram / Beta Numbers; caption 'Statistics I / Section 4.9']
Normal Distributions
Introduction
The normal distribution is very important, and there are a lot of basic facts that you
will want to know. Let X be a normal random variable with mean µ and variance
σ²; we symbolize this by X ∼ N(µ, σ²). There is no closed form for the CDF; it
is simply expressed as an integral.

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}, \quad -\infty < x < \infty$$

$$F_X(x) = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(t-\mu)^2/(2\sigma^2)}\, dt$$

$$E[X] = \mu \qquad V[X] = \sigma^2 \qquad \text{mode} = \mu \qquad \text{median} = \mu$$

$$m_X(t) = e^{\mu t + t^2\sigma^2/2}$$
The Standard Normal Distribution
For a standard normal random variable Z,

$$\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt$$

$$E[Z] = 0 \qquad V[Z] = 1 \qquad \text{mode} = 0 \qquad \text{median} = 0$$

$$m_Z(t) = e^{t^2/2}$$
1. Φ(z) + Φ(−z) = 1
Extreme values in a data set or distribution are often referred to as outliers. One
definition of an outlier is a data value whose z-score is greater than +3 or less
than −3.
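An added sketch flagging outliers by z-score with scale():

set.seed(2)
x <- c(rnorm(100), 8)        # one planted extreme value
z <- as.numeric(scale(x))    # z-scores: (x - mean(x)) / sd(x)
x[abs(z) > 3]                # values flagged as outliers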
[1] 0.1586553
# X is a random variable with mean = 5 and sd = 2. Calculate P[X < 6]
pnorm(q = 6, mean = 5, sd = 2, lower.tail = TRUE)
[1] 0.6914625
# Z is a standard normal random variable. Compute the 66th percentile of Z
qnorm(p = 0.66, mean = 0, sd = 1)
[1] 0.4124631
# mu = -3, sigma = 4. Compute the smallest number w such that P[X > w] < 0.33
qnorm(p = 0.33, mean = -3, sd = 4, lower.tail = FALSE)
[1] -1.240347
# (reconstructed: the original chunk creating M did not survive extraction; two standard
# deviations assumed to match the legend)
M <-
expand.grid(
x = seq(from = -6, to = 6, length = 1e3),
sigma = c(1, 2)) %>%
mutate(y = dnorm(x, mean = 0, sd = sigma))
G <-
ggplot(
data = M) +
geom_line(
mapping = aes(x = x, y = y, color = as.factor(sigma)),
size = 1.3,
linetype = 'solid') +
scale_color_manual(
values = c('darkgreen', 'darkorange'),
name = 'Standard Deviation') +
labs(
title = 'Normal Density Functions',
caption = 'Statistics I / Section 4.10',
x = 'Normal Numbers',
y = NULL) +
theme(
legend.position = c(0.8, 0.9),
legend.key.width = unit(1, units = 'inches'),
axis.ticks = element_blank(),
text = element_text(
size = 16,
color = 'black',
face = 'italic',
family = 'serif'))
[Figure: Normal Density Functions; caption 'Statistics I / Section 4.10']
Chapter 5
Multivariate Probability Distributions
5.1 Introduction
In this chapter of the course, joint probability distributions are introduced. The first section
provides an example of using R and heatmaps to visualize a bivariate probability density.
library(tidyverse)
# (reconstructed: the original chunk creating M did not survive extraction; an independent
# bivariate normal density is assumed)
M <-
expand.grid(
x = seq(from = -3, to = 3, length = 200),
y = seq(from = -3, to = 3, length = 200)) %>%
mutate(density = dnorm(x) * dnorm(y))
G <-
ggplot(
data = M,
mapping = aes(x = x, y = y)) +
geom_tile(
mapping = aes(fill = density)) +
labs(
title = 'Bivariate Normal Density Function, John Garza',
subtitle = NULL,
x = 'x-axis',
y = 'y-axis',
caption = 'Statistics I / Section 5.1')+
scale_fill_continuous(
name = 'density',
type = 'viridis') +
theme_classic() +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
face = 'italic',
family = 'serif',
color = 'black'),
legend.position = c(0.86, 0.2))
[Figure: Bivariate Normal Density Function heatmap; caption 'Statistics I / Section 5.1']
ux <- -2.1
uy <- +0.5
sx <- +1.3
sy <- +2.3
rho <- -0.7
low = 'yellow',
high = 'steelblue') +
geom_point(
size = 1,
alpha = 0.2) +
scale_y_continuous(
breaks = c(-4, 0, 4, 8),
labels = c(-4, '', 4, 8)) +
scale_x_continuous(
breaks = c(-6, -4, -2, 0, 2),
labels = c(-6, -4, '', 0, 2)) +
labs(
title = 'Random Bivariate Normal Numbers',
caption = 'Statistics I / Test #3',
x = 'x-axis',
y = 'y-axis') +
theme(
plot.title = element_text(hjust = 0.5),
axis.title.x = element_text(vjust = +5),
axis.title.y = element_text(vjust = -3),
legend.position = c(0.09, 0.15),
text = element_text(
face = 'italic',
family = 'serif',
size = 16,
color = 'black'),
axis.ticks = element_blank())
G
[Figure: Random Bivariate Normal Numbers with density shading; caption 'Statistics I / Test #3']
Let X and Y be discrete random variables. The joint probability function for X
and Y is defined as
p(x, y) = P (X = x, Y = y)
−∞ < x < +∞
−∞ < y < +∞
For discrete random variables X and Y with joint probability function p(x, y),
1. $p(x, y) \ge 0$ for all x, y
2. $\sum_{\text{all } x,\,y} p(x, y) = 1$
Joint Distribution Function
The joint distribution function of X and Y is

$$F(x, y) = P(X \le x,\, Y \le y), \quad -\infty < x < +\infty,\; -\infty < y < +\infty$$
Let X and Y be random variables with joint distribution function F (x, y). Then
1. F (−∞, −∞) = 0
2. F (−∞, y) = 0
3. F (x, −∞) = 0
4. F (+∞, +∞) = 1
5. For x* ≥ x and y* ≥ y,
$$F(x^*, y^*) - F(x^*, y) - F(x, y^*) + F(x, y) = P(x < X \le x^*,\; y < Y \le y^*) \ge 0$$
Let X and Y be continuous random variables with joint distribution function F (x, y).
If there exists a non-negative function f (x, y) satisfying
$$F(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(s, t)\, dt\, ds \quad \text{for all } x \text{ and } y$$
Then X and Y are said to be jointly continuous and f (x, y) is called the joint
probability density function of X and Y .
Let X and Y be jointly continuous random variables density function f (x, y).
1. f (x, y) ≥ 0 ∀ (x, y)
2. $\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(x, y)\, dy\, dx = 1$

3. $P(a < X < b,\; c < Y < d) = \int_a^b \int_c^d f(x, y)\, dy\, dx$
Example
A device runs until either of two components fails, at which point the device stops running.
The joint density function of the lifetimes of the two components, both measured in hours,
is
$$f(x, y) = \begin{cases} \dfrac{x+y}{8}, & 0 < x < 2,\; 0 < y < 2 \\ 0, & \text{otherwise} \end{cases}$$
Calculate the probability that the device fails during its first hour of operation. Plot the
joint density function and also show the region corresponding to the probability that the
device fails during its first hour of operation.
Solution
# Limits
xl <- function(y){0}
xu <- function(y){2}
yl <- function(x){0}
yu <- function(x){2}
prob
[1] 0.625
N <-
data_frame(
x = c(0, 0, 1, 1, 2, 2),
xend = c(0, 1, 1, 2, 2, 0),
y = c(0, 2, 2, 1, 1, 0),
yend = c(2, 2, 1, 1, 0, 0))
G <-
ggplot(
data = M,
mapping = aes(x = x, y = y)) +
geom_tile(
mapping = aes(fill = z)) +
scale_x_continuous(
breaks = c(0.0, 0.5, 1.0, 1.5, 2.0),
labels = c('0.0', 0.5, '', 1.5, '2.0')) +
scale_y_continuous(
breaks = c(0.0, 0.5, 1.0, 1.5, 2.0),
labels = c('0.0', 0.5, '', 1.5, '2.0')) +
scale_fill_continuous(
name = 'Density',
type = 'viridis') +
labs(
x = 'x-axis',
y = 'y-axis',
title = 'A Bivariate Probability Density Function, John Garza',
caption = 'Statistics I / Section 5.2') +
geom_segment(
data = N,
mapping = aes(x = x, xend = xend, y = y, yend = yend),
size = 0.4) +
geom_text(
x = 0.5,
y = 0.5,
label = 'Area',
size = 7,
fontface = 'italic',
family = 'serif') +
theme_classic() +
theme(
legend.position = c(0.86, 0.2),
legend.direction = 'vertical',
text = element_text(
size = 16,
color = 'black',
family = 'serif',
face = 'italic'),
axis.line = element_blank(),
axis.title.x = element_text(
vjust = -0.8),
axis.title.y = element_text(
vjust = +0.4),
axis.ticks = element_blank())
[Figure: A Bivariate Probability Density Function with the failure region outlined; caption 'Statistics I / Section 5.2']
Marginal Probability Functions
Let X and Y be discrete random variables with joint probability function p(x, y).
The marginal probability functions of X and Y are defined as

$$p_X(x) = \sum_{\text{all } y} p(x, y) \qquad p_Y(y) = \sum_{\text{all } x} p(x, y)$$

Marginal Density Functions
Let X and Y be continuous random variables with joint density function f(x, y).
The marginal density functions of X and Y are

$$f_X(x) = \int_{-\infty}^{+\infty} f(x, y)\, dy \qquad f_Y(y) = \int_{-\infty}^{+\infty} f(x, y)\, dx$$
Let X and Y be discrete random variables with joint probability function p(x, y).
The conditional probability functions are defined as

$$p(x \mid y) = P(X = x \mid Y = y) = \frac{P(X = x,\, Y = y)}{P(Y = y)} = \frac{p(x, y)}{p_Y(y)}$$

$$p(y \mid x) = P(Y = y \mid X = x) = \frac{P(X = x,\, Y = y)}{P(X = x)} = \frac{p(x, y)}{p_X(x)}$$
Note:
F (x | y) = P (X ≤ x | Y = y)
Let X and Y be continuous random variables with joint density function f(x, y).
The conditional density functions are defined as

$$f(x \mid y) = \frac{f(x, y)}{f_Y(y)} \qquad f(y \mid x) = \frac{f(x, y)}{f_X(x)}$$
Note:
Introduction
Let X and Y be random variables with joint distribution function F (x, y). Then X
and Y are independent if and only if
F (x, y) = FX (x) × FY (y) for all real numbers x, y
Let X and Y be discrete random variables with joint probability function p(x, y)
and marginal probability functions pX (x) and pY (y). X and Y are independent if
and only if
p(x, y) = pX (x) × pY (y)
for all real numbers x and y
Let X and Y be continuous random variables with joint density function f (x, y) and
marginal density functions fX (x) and fY (y). Then X and Y are independent if and
only if
f (x, y) = fX (x) × fY (y)
for all real numbers x and y.
Let X and Y be continuous random variables with joint density function f (x, y).
Suppose that f (x, y) > 0 on the rectangular region defined by a ≤ x ≤ b and c ≤
y ≤ d and that f (x, y) = 0 otherwise. Then X and Y are independent random
variables if and only if there exist functions h(x) and g(y) such that
f (x, y) = h(x) × g(y)
where h(x) is a function of x only and g(y) is a function of y only.
Let X and Y be jointly continuous random variables with joint density function
$$f(x, y) = \begin{cases} 2e^{-x-2y}, & x > 0,\; y > 0 \\ 0, & \text{otherwise} \end{cases}$$
$$E[X + Y] = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} (x + y) f(x, y)\, dy\, dx$$
# Limits of integration
xl <- function(y){0}
xu <- function(y){Inf}
yl <- function(x){0}
yu <- function(x){Inf}
[1] 1.5
[Figure: the joint density of X and Y (plotting chunk lost in extraction)]
1. For a constant c ∈ ℝ, E[c] = c
2. E[c·g(X, Y)] = c·E[g(X, Y)]
3. E[g₁(X, Y) + ⋯ + g_k(X, Y)] = E[g₁(X, Y)] + ⋯ + E[g_k(X, Y)]
4. Let g(X) and h(Y) be functions of the independent random variables X and Y. Then
E[g(X)·h(Y)] = E[g(X)]·E[h(Y)]
Introduction
Covariance and correlation are very important topics and will play an important role
in your future studies. The definitions and equations presented in this section should
be learned very carefully. Additionally, there are many properties of covariance and
correlation to know and understand.
Let X and Y be random variables with means µX and µY . The covariance of X and
Y is
Cov(X, Y ) = E[(X − µX )(Y − µY )]
Correlation Coefficients, ρ

$$\rho_{XY} = \frac{Cov(X, Y)}{\sigma_X\, \sigma_Y}$$

Properties of covariance and correlation:
1. Cov(X, Y) = E[XY] − E[X]·E[Y]
2. Cov(X, Y) = Cov(Y, X)
3. Cov(X, X) = Var(X)
4. Cov(X + Y, Z) = Cov(X, Z) + Cov(Y, Z)
5. Cov(aX, Y) = a × Cov(X, Y)
6. Cov(X, Y + a) = Cov(X, Y)
7. Cov(X, Y) = ρ_XY · σ_X · σ_Y
8. −1 ≤ ρ_XY ≤ +1
9. |Cov(X, Y)| ≤ σ_X · σ_Y
$$f(x, y) = \begin{cases} \dfrac{8}{3}\, xy, & 0 \le x \le 1,\; x \le y \le 2x \\ 0, & \text{otherwise} \end{cases}$$
source('iterated_integral.R')
# Limits of integration
xl <- function(y){0}
xu <- function(y){1}
yl <- function(x){x}
yu <- function(x){2 * x}
# Calculate Cov(X, Y)
Cov <- E_XY - E_X * E_Y
Cov
[1] 0.04148148
# Standard Deviation, sd
st_dev <- function(x){sqrt(sum((x - mean(x)) ^ 2) / (length(x) - 1))}
rx <- runif(n = 1e3, min = -5, max = +5)
st_dev(rx)
[1] 2.820544
sd(rx)
[1] 2.820544
# Variance, var
var_x <- function(x){sum((x - mean(x)) ^ 2) / (length(x) - 1)}
rx <- rgamma(n = 1e4, scale = 2, shape = 2)
var_x(rx)
[1] 7.922016
var(rx)
[1] 7.922016
# Covariance, cov
cov_xy <- function(x,y){sum((x - mean(x)) * (y - mean(y))) / (length(x) - 1)}
rx <- rexp(n = 1e2, rate = 3)
ry <- rbinom(n = 1e2, size = 10, prob = 0.5)
cov_xy(rx,ry)
[1] 0.08773042
cov(rx,ry)
[1] 0.08773042
# Correlation, cor
cor_xy <- function(x,y){cov_xy(x,y) / (st_dev(x) * st_dev(y))}
rx <- rgamma(n = 1e4, shape = 2, scale = 3)
ry <- rbeta(n = 1e4, shape1 = 2, shape2 = 3)
cor_xy(rx, ry)
[1] 0.002670185
cor(rx, ry)
[1] 0.002670185
Example:
Analyze the Harman74.cor dataset by generating a heat map.
[Figure: heatmap of the Harman74.cor test covariances (plotting chunk lost in extraction)]
Introduction
Linear functions of random variables will appear often in your studies. It is
important to know how expected value, variance, and covariance relate to linear
combinations of random variables. You may want to derive these identities yourself;
we do not include the derivations here. The textbooks contain the derivations, which
rely on the elementary properties of expected value.
Notation
1. X₁, …, X_n are random variables
2. a₁, …, a_n are constants
3. U = a₁X₁ + ⋯ + a_nX_n
4. Y₁, …, Y_m are random variables
5. b₁, …, b_m are constants
6. W = b₁Y₁ + ⋯ + b_mY_m
1. $E[U] = E[a_1X_1 + \cdots + a_nX_n] = a_1E[X_1] + \cdots + a_nE[X_n] = \sum_{i=1}^{n} a_i\,E[X_i]$

2. $V[U] = V[a_1X_1 + \cdots + a_nX_n] = \sum_{i=1}^{n} a_i^2\, V[X_i] + 2\sum_{i<j} a_i a_j\, Cov(X_i, X_j)$

3. $Cov(U, W) = Cov(a_1X_1 + \cdots + a_nX_n,\; b_1Y_1 + \cdots + b_mY_m) = \sum_{i=1}^{n}\sum_{j=1}^{m} a_i b_j\, Cov(X_i, Y_j)$
The most common case of a linear combination consists of two variables and two
constants. It is a good idea to remember the formulas for this situation. Let X, Y,
and W be random variables and let a, b, and c be constants. Then

$$E[aX + bY] = aE[X] + bE[Y]$$
$$V[aX + bY] = a^2 V[X] + b^2 V[Y] + 2ab\,Cov(X, Y)$$
$$Cov(aX + bY,\, cW) = ac\,Cov(X, W) + bc\,Cov(Y, W)$$
# Compute V[2X - 3Y] using the identity V[aX+bY] = a^2V[X] + 2abCov(X,Y) + b^2V[Y]
a ^ 2 * var(X) + 2 * a * b * cov(X,Y) + b ^ 2 * var(Y)
[1] 123.0317
[1] 123.0317
Multinomial Experiments
1. The experiment consists of n identical trials.
2. Each trial results in one of k possible outcomes.
3. The probability that a single trial results in outcome i is p_i, constant across trials.
4. The trials are independent.
5. X_i counts the number of trials resulting in outcome i.
Notes
• p₁ + ⋯ + p_k = 1
• X₁ + ⋯ + X_k = n
Multinomial Distributions
1. $\sum_{i=1}^{k} p_i = 1$
2. $p_i > 0$ for i = 1, …, k
1. E[X_i] = n·p_i
2. V[X_i] = n·p_i(1 − p_i)
3. Cov(X_i, X_j) = −n·p_i·p_j for i ≠ j
# For a multinomial distribution with p_1 = 0.2, p_2 = 0.5, p_3 = 0.3 and k = 3
# Generate 1,000,000 random vectors from the distribution.
# Compute cov(X_1, X_2)
# Compare the result to the analytic formula cov(X_1, X_2) = -n * p_1 * p_2
M <- rmultinom(n = 1e6, size = 10, prob = c(0.2, 0.5, 0.3))  # (reconstructed; size 10 assumed)
cov(M[1, ], M[2, ])
[1] -1.002118
# using the formula cov(X_i, X_j) = -n * p_i * p_j
-10 * 0.2 * 0.5
[1] -1
If five auto policy holders are randomly selected for a study, what is the probability that
one policy holder is selected from each age group?
Bivariate Normal Distributions
The random variables X and Y have a bivariate normal distribution if the joint
density function of X and Y is

$$f(x, y) = \frac{e^{-Q/2}}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}$$

where

$$Q = \frac{1}{1-\rho^2}\left[\frac{(x-\mu_X)^2}{\sigma_X^2} - 2\rho\,\frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} + \frac{(y-\mu_Y)^2}{\sigma_Y^2}\right]$$

$$\rho = \frac{Cov(X, Y)}{\sigma_X\,\sigma_Y}$$
1. X and Y are each normally distributed
2. $E[X] = \mu_X$
3. $V[X] = \sigma_X^2$
4. $E[Y] = \mu_Y$
5. $V[Y] = \sigma_Y^2$
6. $X \mid Y = y \sim N\!\left(\mu_X + \rho\,\dfrac{\sigma_X}{\sigma_Y}(y - \mu_Y),\; (1-\rho^2)\,\sigma_X^2\right)$
7. $Y \mid X = x \sim N\!\left(\mu_Y + \rho\,\dfrac{\sigma_Y}{\sigma_X}(x - \mu_X),\; (1-\rho^2)\,\sigma_Y^2\right)$
µX = −1
σX = 1
µY = −1
$\sigma_Y = \sqrt{2}$
Cov(X, Y ) = −0.30
persp(x, y, z,
main = paste('Bivariate Normal Density '),
col = 'lightblue',
theta = 30,
phi = 20,
r = 50,
d = 0.1,
expand = 0.5,
ltheta = 90,
lphi = 180,
shade = 0.75,
ticktype = 'simple',
border = FALSE,
zlab = '',
xlab = '',
ylab = '',
box = FALSE)
Use iterated integration and the function defined in the previous example to
calculate P[−4 ≤ X ≤ −1, −1 ≤ Y ≤ 0].
# Source files
source('bivariate.R')
source('iterated_integral.R')
This is the same answer we got when we used the mvtnorm package.
Let X and Y be random variables and let g(X) be a function of X. The conditional
expectation of g(X) given Y = y is defined as

1. $E[g(X) \mid Y = y] = \int_{-\infty}^{+\infty} g(x) f(x \mid y)\, dx$ for X and Y jointly continuous.

2. $E[g(X) \mid Y = y] = \sum_{\text{all } x} g(x)\, p(x \mid y)$ for X and Y jointly discrete.
Conditioning Formulas
The following formulas are sometimes called the conditioning formulas. The proofs
are not too hard and rely on the identity f(x, y) = f(x|y)·f_Y(y). The proofs are
contained in the textbook.

1. $E[X] = E_Y\big[E_X[X \mid Y]\big]$

2. $V[X] = E_Y\big[V_X[X \mid Y]\big] + V_Y\big[E_X[X \mid Y]\big]$
Conditional Variance

$$V[X \mid Y = y] = E[X^2 \mid Y = y] - \big(E[X \mid Y = y]\big)^2$$
A machine has two components and fails when both components fail. The number of years
from now until the first component fails, X, and the number of years from now until the
machine fails, Y , are random variables with joint density function
$$f(x, y) = \begin{cases} \dfrac{1}{18}\, e^{-(x+y)/6}, & 0 < x < y \\ 0, & \text{otherwise} \end{cases}$$
Calculate V (Y |X = 2).
# (reconstructed: f_cond is the conditional density of Y given X = 2, namely
# f(y | x = 2) = (1/6) exp(-(y - 2)/6) for y > 2; the original definition was lost)
f_cond <- function(y){exp(-(y - 2) / 6) / 6}
# Calculate E[Y^2|X=2]
E_Y2 <- integrate(function(y){y ^ 2 * f_cond(y)}, lower = 2, upper = Inf)$value
# Calculate E[Y|X=2]
E_Y1 <- integrate(function(y){y ^ 1 * f_cond(y)}, lower = 2, upper = Inf)$value
# Calculate V[Y|X=2]
V_Y <- E_Y2 - (E_Y1)^2
V_Y
[1] 36
The first part of this section described conceptual formulas for theoretical models. In
practice you may instead find yourself working with data presented in an Excel
spreadsheet. Here we will look at importing bivariate data from Excel into R, plotting a
bivariate histogram, and then computing a conditional expected value based on the
spreadsheet data.
# Load the readxl package
library(readxl)
[Figure: bivariate histogram of the imported data, with counts by (x, y) bin]
Example:
M %>%
filter(Y > 1) %>%
mutate(sin_x = sin(X)) %>%
summarize(ex_conditional = mean(sin_x))
# A tibble: 1 x 1
ex_conditional
<dbl>
1 0.557
M %>%
filter(between(x = X, left = 1, right = 3)) %>%
mutate(sin_y = sin(Y)) %>%
summarize(variance = var(sin_y))
# A tibble: 1 x 1
variance
<dbl>
1 0.367
# Review M
head(M)
# A tibble: 6 x 3
stores products status
<chr> <chr> <chr>
1 Store - A Product - A high
2 Store - A Product - B very high
3 Store - A Product - C high
4 Store - A Product - D high
5 Store - A Product - E high
6 Store - A Product - F very low
unique(M$status)
[1] "high" "very high" "very low" "medium" "low"
# Set levels for status
M$status <- factor(M$status, levels = c('very high', 'high', 'medium', 'low', 'very low'))
axis.line = element_blank(),
axis.ticks = element_blank(),
text = element_text(
size = 16,
face = 'italic',
family = 'serif',
color = 'black'),
axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
[Figure: heatmap of danger levels (very high to very low) by store and product]
Chapter 6
Functions of Random Variables
6.1 Introduction
The book assumes that population sizes are much larger than sample sizes and that
random variables obtained through random sampling are independent and identically
distributed. As a result, if X₁, …, X_n is a random sample from a distribution with
density function f(x), then the joint density function is

$$f(x_1, x_2, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n)$$
If Y1 , Y2 , . . . , Yn is a random sample from a discrete distribution with probability
function p(y), then the joint probability function will be
p(y1 , y2 , . . . , yn ) = p(y1 )p(y2 ) · · · p(yn )
Introduction
There are three standard methods for finding the probability distribution of a function
of random variables: the method of distribution functions, the method of transformations,
and the method of moment-generating functions.
Note: The next sections will describe each of these methods in more detail.
Introduction
Let X be a continuous random variable with density function fX (x) and let U (X)
be a function of X. We can solve for the cumulative distribution function FU (u) of
U directly by integrating X over the region corresponding to U ≤ u. The density
function fU (u) can then be found by differentiation using
$$f_U(u) = \frac{d}{du}\Big[F_U(u)\Big]$$
The textbook summarizes the method of distribution functions in four steps. Let U
be a function of the random variables X₁, …, X_n.
1. Find the region in the (x₁, …, x_n) space where U = u
2. Find the region where U ≤ u
3. Integrate the joint density function f(x₁, …, x_n) over the region U ≤ u to obtain
F_U(u) = P[U ≤ u]
4. Differentiate F_U(u) to obtain the density f_U(u)
Note
You will have to do many practice questions to get good at this method.
Example:
Let X be a continuous random variable that is uniform over the interval [0, 1], and let
U = X². First,

$$F_X(x) = P[X \le x] = \int_0^x 1\, dt = x, \quad 0 \le x \le 1$$

Then

$$F_U(u) = P[U \le u] = P[X^2 \le u] = P[X \le \sqrt{u}\,] = F_X[\sqrt{u}\,] = \sqrt{u}$$

and differentiating gives the density

$$f_U(u) = \frac{d}{du}\big[\sqrt{u}\,\big] = \frac{1}{2\sqrt{u}}, \quad 0 \le u \le 1$$
# (reconstructed: the original chunk creating M did not survive extraction)
M <- data_frame(
x.squared = runif(n = 1e4, min = 0, max = 1) ^ 2,
us = seq(from = 0.001, to = 1, length = 1e4),
fu = 1 / (2 * sqrt(us)))
# Create a histogram of u
Gu <-
ggplot(
data = M) +
geom_histogram(
mapping = aes(x = x.squared, y = ..density..),
fill = 'darkred',
color = 'darkred',
size = 0.2,
alpha = 0.5,
bins = 77) +
geom_line(
mapping = aes(x = us, y = fu),
color = 'darkred',
size = 1.2) +
geom_hline(
yintercept = 1,
linetype = 'solid',
size = 1.2,
color = 'darkblue') +
labs(
title = 'Squared Random Uniform Numbers',
x = 'x.squared values',
y = 'Relative Frequency',
caption = 'Statistics I, Section 6.4') +
scale_y_continuous(
breaks = c(0, 1, 2, 3, 4, 5),
labels = c(0, 1, '', '', 4, 5),
limits = c(0, 5)) +
scale_x_continuous(
breaks = c(0.00, 0.25, 0.50, 0.75, 1.00),
labels = c('0.00', '0.25', '', '0.75', '1.00'),
limits = c(0, 1)) +
theme_bw() +
theme(
axis.ticks = element_blank(),
text = element_text(
size = 16,
family = 'serif',
face = 'italic',
color = 'black'),
plot.title = element_text(hjust = 0.5))
Gu
[Figure: Squared Random Uniform Numbers; caption 'Statistics I, Section 6.4']
Introduction
Let U = h(X) where h is a strictly increasing function. Then

$$F_U(u) = P[U \le u] = P[h(X) \le u] = P[X \le h^{-1}(u)] = F_X[h^{-1}(u)]$$
Now apply the chain rule for differentiation to obtain the density f_U(u).

$$f_U(u) = \frac{d}{du} F_X[h^{-1}(u)] = f_X[h^{-1}(u)] \cdot \frac{d}{du} h^{-1}(u) = f_X(x(u))\, x'(u)$$
Since h(x) is strictly increasing, h⁻¹(u) is also strictly increasing. Therefore

$$\frac{d}{du} h^{-1}(u) > 0 \qquad\text{and}\qquad \frac{d}{du} h^{-1}(u) = \left|\frac{d}{du} h^{-1}(u)\right|$$

so that

$$f_U(u) = f_X\big(h^{-1}(u)\big)\left|\frac{d}{du} h^{-1}(u)\right| = f_X(x(u))\,|x'(u)|$$
The transformation method in three steps, for W = g(X) with g monotone:
1. Solve w = g(x) for x to obtain $x = g^{-1}(w)$
2. Calculate the derivative $\dfrac{d}{dw} g^{-1}(w)$
3. $f_W(w) = f_X\big(g^{-1}(w)\big)\, \Big|\dfrac{d}{dw} g^{-1}(w)\Big|$
3. Apply the transformation formula to get the joint density of U and X, f(x, u)
4. Integrate out x to obtain the marginal density of U:

$$f_U(u) = \int_{-\infty}^{+\infty} f(x, u)\, dx$$
Example:
Let X be uniform on [1, 2] and let U = X².

Solution:

$$f(x) = \begin{cases} 1, & 1 \le x \le 2 \\ 0, & \text{otherwise} \end{cases}$$

$$u(x) = x^2 \qquad x(u) = \sqrt{u} \qquad x'(u) = \frac{1}{2\sqrt{u}}$$

$$f_U(u) = f_X(x(u))\,|x'(u)| = (1) \times \frac{1}{2\sqrt{u}} = \frac{1}{2\sqrt{u}}, \quad 1 \le u \le 4$$
# (reconstructed: the densities fx and fu did not survive extraction)
fx <- function(x){1 * (1 <= x & x <= 2)}
fu <- function(u){1 / (2 * sqrt(u)) * (1 <= u & u <= 4)}
# Create a data_frame
M <-
data_frame(
x = runif(n= 10000, min = 1, max = 2),
u = x ^ 2,
xs = seq(from = 1, to = 2, length = 1e4),
fx = fx(xs),
us = seq(from = 1, to = 4, length = 1e4),
fu = fu(us))
G <-
ggplot(
data = M) +
geom_line(
mapping = aes(x = us, y = fu),
color = 'darkgreen',
size = 1.2) +
geom_histogram(
mapping = aes(x = xs, y = ..density..),
fill = 'darkred',
alpha = 0.3,
bins = 77,
color = 'black',
size = 0.4) +
geom_histogram(
mapping = aes(x = u, y = ..density..),
fill = 'darkorange',
alpha = 0.2,
bins = 77,
color = 'black',
size = 0.4) +
labs(
title = 'The Transformation Formula',
y = 'Relative Frequency',
x = 'Random Numbers',
caption = 'Statistics I / Section 6.4') +
theme_bw() +
theme(
plot.margin = margin(unit = 'cm', c(1, 1, 1, 1)),
axis.title.y = element_text(vjust = +5),
axis.title.x = element_text(vjust = -5),
plot.title = element_text(hjust = 0.5),
axis.ticks = element_blank(),
text = element_text(
size = 16,
face = 'italic',
family = 'serif'))
G
[Figure: The Transformation Formula; caption 'Statistics I / Section 6.4']
Let X and Y be variables with moment generating functions mX (t) and mY (t). If
mX (t) = mY (t)
for all values of t then X and Y have the same probability distribution.
Theorem: Let X₁, …, X_n be independent random variables with moment generating
functions m₁(t), …, m_n(t), and let Y = X₁ + ⋯ + X_n. Then

$$m_Y(t) = E[e^{tY}] = E[e^{t(X_1 + \cdots + X_n)}] = E[e^{tX_1}] \times \cdots \times E[e^{tX_n}] = m_1(t) \times \cdots \times m_n(t)$$
$$\mu_Y = E[Y] = a_1\mu_1 + \cdots + a_n\mu_n$$

$$\sigma_Y^2 = V[Y] = a_1^2\sigma_1^2 + \cdots + a_n^2\sigma_n^2$$

$$Y \sim N(\mu_Y,\, \sigma_Y^2)$$
The textbook uses the results of this section to derive the following result.
1. Let X₁, …, X_n be independent normal random variables with means µ_i and variances σ_i²

2. Let $Z_i = \dfrac{X_i - \mu_i}{\sigma_i}$

3. Define $Y = \sum_{i=1}^{n} Z_i^2$

4. Then Y has a χ² distribution with n degrees of freedom.
Introduction
Let X and Y be continuous random variables with joint density function fX,Y(x, y). Suppose that U(X, Y) and V(X, Y) are functions of the random variables. How can we determine the joint density function of U and V? Under certain conditions, the bi-variate transformation method can be used. Before describing this method, let's extend the definition of support to joint density functions.
Let X and Y be continuous random variables with joint density function fX,Y(x, y). The support of fX,Y(x, y) is the set
\[ \text{support of } f_{X,Y}(x, y) = \left\{ (x, y) \in \mathbb{R}^2 \;\middle|\; f_{X,Y}(x, y) > 0 \right\} \]
Jacobian of a Transformation
\[ \frac{\partial(x, y)}{\partial(u, v)} = \begin{vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\[2ex] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{vmatrix} = \left(\frac{\partial x}{\partial u}\right)\left(\frac{\partial y}{\partial v}\right) - \left(\frac{\partial x}{\partial v}\right)\left(\frac{\partial y}{\partial u}\right) \]
Let X and Y be continuous random variables with joint density function fX,Y(x, y) and suppose that T : (x, y) -> (u, v) is a one-to-one function on the support of fX,Y(x, y). If x(u, v) and y(u, v) have continuous partial derivatives with respect to u and v, and if
\[ J = \frac{\partial(x, y)}{\partial(u, v)} = \left(\frac{\partial x}{\partial u}\right)\left(\frac{\partial y}{\partial v}\right) - \left(\frac{\partial x}{\partial v}\right)\left(\frac{\partial y}{\partial u}\right) \neq 0 \]
then the joint density function of U and V is
\[ f_{U,V}(u, v) = f_{X,Y}\big(x(u, v), y(u, v)\big) \times |J| \]
Example:
Let X and Y have joint density
\[ f(x, y) = \begin{cases} 8xy & , \ 0 \le x \le y \le 1 \\ 0 & , \ \text{otherwise} \end{cases} \]
and let U = X/Y and V = Y, so that
\[ x(u, v) = uv, \qquad y(u, v) = v \]
The support 0 <= x <= y <= 1 becomes 0 <= u <= 1 and 0 <= v <= 1. The partial derivatives are
\[ \frac{\partial x}{\partial u} = v, \qquad \frac{\partial x}{\partial v} = u, \qquad \frac{\partial y}{\partial u} = 0, \qquad \frac{\partial y}{\partial v} = 1 \]
so the Jacobian is
\[ J = \frac{\partial(x, y)}{\partial(u, v)} = \left(\frac{\partial x}{\partial u}\right)\left(\frac{\partial y}{\partial v}\right) - \left(\frac{\partial x}{\partial v}\right)\left(\frac{\partial y}{\partial u}\right) = (v)(1) - (u)(0) = v \]
Applying the transformation formula,
\[ f_{U,V}(u, v) = f\big(x(u, v), y(u, v)\big) \times |J| = 8(uv)(v)(v) = 8uv^3, \qquad 0 \le u \le 1, \ 0 \le v \le 1 \]
# Define f(x, y)
f <- function(x, y){8 * x * y * (0 <= x & x <= y & y <= 1)}
# Plot f on a grid (the head of this plot's code is reconstructed;
# geom_raster is an assumed choice)
Gf <-
  ggplot(
    data = expand.grid(
      x = seq(from = 0, to = 1, length = 200),
      y = seq(from = 0, to = 1, length = 200))) +
  geom_raster(mapping = aes(x = x, y = y, fill = f(x, y))) +
  scale_fill_gradient(name = 'Density') +
  theme(
    panel.border = element_rect(fill = NA, size = 1),
    axis.ticks = element_blank(),
    text = element_text(
      size = 16,
      family = 'serif',
      face = 'italic',
      color = 'black'))
Gf
# Define g(u, v)
g <- function(u, v){8 * u * v ^ 3 * (0 <= u & u <= 1) * (0 <= v & v <= 1)}
# Plot g on a grid (the head of this plot's code is reconstructed;
# geom_raster is an assumed choice)
Gg <-
  ggplot(
    data = expand.grid(
      u = seq(from = 0, to = 1, length = 200),
      v = seq(from = 0, to = 1, length = 200))) +
  geom_raster(mapping = aes(x = u, y = v, fill = g(u, v))) +
  scale_fill_gradient(name = 'Density') +
  theme(
    plot.title = element_text(hjust = 0.5),
    legend.position = c(0.15, 0.15),
    legend.key.width = unit(units = 'cm', 1.25),
    panel.border = element_rect(fill = NA, size = 1),
    axis.ticks = element_blank(),
    text = element_text(
      size = 16,
      family = 'serif',
      face = 'italic',
      color = 'black'))
Gg
[Figure: side-by-side plots of f(x, y) and f_{U,V}(u, v) on the unit square, density scale 0 to 8]
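As a Monte Carlo check of this example (a sketch, not from the textbook), note that f_{U,V}(u, v) = 8uv^3 factors as (2u)(4v^3), so U and V are independent with E[U] = 2/3 and E[V] = 4/5. Sampling (X, Y) from f and mapping to (U, V) = (X/Y, Y):
# Sample (X, Y) from f: the marginal of Y is 4y^3 and X given Y = y has density 2x / y^2
y <- runif(1e5) ^ (1 / 4)    # inverse CDF of F_Y(y) = y^4
x <- y * sqrt(runif(1e5))    # inverse CDF of F_{X|Y}(x) = x^2 / y^2
c(mean(x / y), 2 / 3)        # E[U] should be near 2/3
c(mean(y), 4 / 5)            # E[V] should be near 4/5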
Introduction
Let X1, ..., Xn be independent continuous random variables from a distribution with density function f(x) and cumulative distribution function F(x). Define
\[ X_{(1)} = \min\{X_1, \ldots, X_n\}, \qquad X_{(n)} = \max\{X_1, \ldots, X_n\} \]
The method of distribution functions can be used to find the densities of X_{(1)} and X_{(n)}. Let F_{(n)}(x) be the distribution function of X_{(n)} and let g_{(n)}(x) be the density function of X_{(n)}.
\[ F_{(n)}(x) = P[X_{(n)} \le x] = P[X_1 \le x, X_2 \le x, \ldots, X_n \le x] = \big[F(x)\big]^n \]
\[ g_{(n)}(x) = \frac{d}{dx}\Big[F_{(n)}(x)\Big] = \frac{d}{dx}\big[F(x)\big]^n = n \cdot f(x) \cdot \big[F(x)\big]^{n-1} \]
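A quick check (a sketch assuming a Uniform(0, 1) sample with n = 5, so that F(x) = x and g_(n)(x) = n x^(n-1)):
# Maximum of n uniforms compared with its theoretical mean n / (n + 1)
n <- 5
x_max <- replicate(1e4, max(runif(n)))
c(mean(x_max), n / (n + 1))   # both should be near 0.833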
The method of distribution functions can be used to derive the density and cumulative distribution function of min{X1, ..., Xn}. Let the cumulative distribution function and density of X_(1) be denoted by F_(1)(x) and g_(1)(x).
\[ F_{(1)}(x) = P[X_{(1)} \le x] = 1 - P[\min\{X_1, \ldots, X_n\} > x] = 1 - \big[1 - F(x)\big]^n \]
\[ g_{(1)}(x) = \frac{d}{dx}\Big[F_{(1)}(x)\Big] = \frac{d}{dx}\Big(1 - \big[1 - F(x)\big]^n\Big) = n \cdot f(x) \cdot \big[1 - F(x)\big]^{n-1} \]
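For example (a sketch with assumed parameters): if the Xi are exponential with rate lambda, then 1 - [1 - F(x)]^n = 1 - e^{-n lambda x}, so X_(1) is exponential with rate n lambda.
# Minimum of n exponentials compared with an exponential with rate n * lambda
lambda <- 2
n <- 4
x_min <- replicate(1e4, min(rexp(n, rate = lambda)))
c(mean(x_min), 1 / (n * lambda))   # both should be near 0.125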
More generally, the density function of the kth order statistic X_(k) is
\[ g_{(k)}(x) = \frac{n!}{(k-1)!\,(n-k)!} \times \big[F(x)\big]^{k-1} \times \big[1 - F(x)\big]^{n-k} \times f(x) \]
The textbook also uses the multinomial distribution to provide a heuristic derivation for the joint density function of two order statistics. Let j and k be elements of {1, 2, ..., n} such that j < k and let X1, ..., Xn be independent continuous random variables from a distribution with density function f(x) and cumulative distribution function F(x). Then the joint density function of the order statistics X_(j) and X_(k) is
\[ g_{(j),(k)}(x_j, x_k) = \frac{n!}{(j-1)!\,(k-1-j)!\,(n-k)!} \times \big[F(x_j)\big]^{j-1} \times \big[F(x_k) - F(x_j)\big]^{k-1-j} \times \big[1 - F(x_k)\big]^{n-k} \times f(x_j) \times f(x_k) \]
for x_j < x_k.
# Simulate the second order statistic of an exponential(1) sample of size n = 3
# (the distribution and sample size are illustrative assumptions)
n  <- 3      # sample size
nr <- 1e4    # number of simulated samples
r  <- rexp(n * nr, rate = 1)
# Create a matrix with one sample per row
A <- matrix(r, nrow = nr, ncol = n)
# Density of the 2nd order statistic: g_(2)(x) = n (n - 1) F(x) [1 - F(x)]^(n-2) f(x)
g_2 <- function(x){n * (n - 1) * pexp(x) * (1 - pexp(x)) ^ (n - 2) * dexp(x)}
M <-
  data_frame(
    X_2 = apply(A, 1, function(s) sort(s)[2]),
    x   = seq(from = 0, to = 3, length = nr))
G <-
ggplot(
data = M) +
geom_histogram(
mapping= aes(x = X_2, y = ..density..),
fill = 'darkorange',
alpha = 0.2,
color = 'black',
bins = 50) +
geom_line(
mapping = aes(x = x, y = g_2(x)),
color = 'darkgreen',
size = 1.2) +
scale_x_continuous(
limits = c(0, 3)) +
scale_y_continuous(
breaks = c(0.00, 0.25, 0.50, 0.75, 1.00),
labels = c('0.00', '0.25', '', '0.75', '1.00'),
limits = c(0, 1)) +
labs(
title = 'Second Order Statistic / Relative Frequency',
x = 'x-values',
y = 'Relative Frequency',
caption = 'Statistics / Section 6.7') +
theme(
axis.ticks = element_blank(),
axis.title.y = element_text(vjust = -6),
axis.title.x = element_text(vjust = +6),
text = element_text(
size = 16,
face = 'italic',
family = 'serif'),
plot.title = element_text(hjust = 0.5))
G
[Figure: Second Order Statistic: histogram of simulated X_(2) values with the density g_(2)(x) overlaid, x from 0 to 3]
7.1 Introduction
Statistic
Sampling Distribution
Example:
Generate 1000 samples, each of size 1000, from a gamma distribution with shape parameter 2 and scale parameter 3.
Compute the sample mean of each sample and plot a histogram of the results.
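A minimal data step for this simulation (the object M and its column means are assumed names):
# Simulate 1000 sample means, each from a gamma(shape = 2, scale = 3) sample of size 1000
M <-
  data_frame(
    means = replicate(
      n = 1000,
      expr = mean(rgamma(n = 1000, shape = 2, scale = 3))))
G <-
  ggplot(
    data = M) +
  geom_histogram(
    mapping = aes(x = means, y = ..density..),
    fill = 'darkred',
    alpha = 0.4,
    color = 'black',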
size = 0.2,
bins = 50) +
scale_y_continuous(
breaks = c(0, 1, 2, 3),
labels = c(0, '','', 3)) +
scale_x_continuous(
breaks = c(5.7, 6.0, 6.3),
labels = c(5.7, '',6.3)) +
labs(
title = 'The Sampling Distribution of the Sample Mean',
x = 'Sample Means',
y = 'Relative Frequency',
caption = 'Statistics I / Section 7.1') +
theme(
axis.ticks = element_blank(),
plot.title = element_text(hjust = 0.5),
text = element_text(
size = 16,
family = 'serif',
face = 'italic',
color = 'black'))
G
[Figure: The Sampling Distribution of the Sample Mean: histogram of sample means centered near 6.0]
In chapter 6, the method of moment generating functions was used to show that a linear combination of normal random variables is normal. The mean of a set of random variables is a linear combination of the variables. As a result, we have the following theorem. Let X1, ..., Xn be a random sample from a normal distribution with mean mu and variance sigma^2. Then
\[ \overline{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \sim N\big(\mu, \sigma^2/n\big) \]
\[ \mu_{\overline{X}} = \mu, \qquad \sigma^2_{\overline{X}} = \frac{\sigma^2}{n} \]
The mean stays the same, but the variance is reduced by a factor of 1/n.
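A quick simulation (a sketch assuming mu = 10, sigma = 2, and n = 25) illustrates the reduction in variance:
# Sample means from N(10, 4) with n = 25: the sd should be sigma / sqrt(n) = 0.4
xbar <- replicate(1e4, mean(rnorm(25, mean = 10, sd = 2)))
c(mean(xbar), sd(xbar))   # compare with 10 and 0.4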
\[ \overline{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\big(X_i - \overline{X}\big)^2 \]
1. \( \dfrac{(n-1)S^2}{\sigma^2} = \dfrac{1}{\sigma^2}\displaystyle\sum_{i=1}^{n}\big(X_i - \overline{X}\big)^2 \sim \chi^2 \) with df = (n - 1)
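As a numerical illustration (a sketch assuming n = 10 and sigma = 3):
# (n - 1) S^2 / sigma^2 compared with chi-square(n - 1)
n <- 10
sigma <- 3
w <- replicate(1e4, (n - 1) * var(rnorm(n, sd = sigma)) / sigma ^ 2)
c(mean(w), var(w))   # chi-square(9) has mean 9 and variance 2 * 9 = 18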
Let Z ~ N(0, 1) and let W ~ chi-square with nu degrees of freedom, with Z and W independent. Define
\[ T = \frac{Z}{\sqrt{W/\nu}} \]
Then T is said to have a t distribution with nu df.
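A minimal data step for the plot below (nu = 4 is an assumed value; M, T_vals, and t are the names the plotting code expects):
# Simulate T = Z / sqrt(W / nu) with nu = 4
nu <- 4
M <-
  data_frame(
    Z = rnorm(1e5),
    W = rchisq(1e5, df = nu),
    T_vals = Z / sqrt(W / nu),
    t = dt(T_vals, df = nu))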
# Plot a histogram
G <-
ggplot(
data = M) +
geom_histogram(
mapping = aes(x = T_vals, y = ..density..),
fill = 'darkred',
alpha = 0.4,
bins = 100,
size = 0.2,
color = 'black') +
scale_x_continuous(
breaks = c(-5.0, -2.5, 0.0, 2.5, 5.0),
labels = c('-5.0', '-2.5', '', '2.5', '5.0'),
limits = c(-5, 5)) +
scale_y_continuous(
breaks = c(0.0, 0.1, 0.2, 0.3, 0.4),
labels = c('0.0', '0.1', '', '', '0.4'),
limits = c(0.0, 0.44)) +
geom_line(
mapping = aes(x = T_vals, y = t),
size = 1.3,
color = 'navy')+
labs(
title = 'Simulating a t-Distribution',
x = 't-values',
y = 'Relative Frequency',
caption = 'Statistics I / Section 7.2') +
theme(
axis.ticks = element_blank(),
axis.title.y = element_text(vjust = -5),
axis.title.x = element_text(vjust = +5),
plot.title = element_text(hjust = 0.5),
text = element_text(
size = 16,
family = 'serif',
face = 'italic',
color = 'black'))
G
[Figure: Simulating a t-Distribution: histogram of simulated T values with the t density overlaid]
The F Distribution
If W1 ~ chi-square with nu1 df and W2 ~ chi-square with nu2 df are independent, then
\[ F = \frac{W_1/\nu_1}{W_2/\nu_2} \]
has an F distribution with nu1 numerator df and nu2 denominator df.
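A minimal data step for the plot below (nu1 = 5 and nu2 = 10 are assumed values; M, F_vals, and F are the names the plotting code expects):
# Simulate F = (W1 / nu1) / (W2 / nu2) with nu1 = 5 and nu2 = 10
nu1 <- 5
nu2 <- 10
M <-
  data_frame(
    F_vals = (rchisq(1e5, df = nu1) / nu1) / (rchisq(1e5, df = nu2) / nu2),
    F = df(F_vals, df1 = nu1, df2 = nu2))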
# Plot a histogram
G <-
ggplot(
data = M) +
geom_histogram(
mapping = aes(x = F_vals, y = ..density..),
fill = 'darkred',
alpha = 0.4,
bins = 100,
color = 'black',
size = 0.2) +
geom_line(
mapping = aes(x = F_vals, y = F),
size = 1.3,
color = 'navy')+
labs(
title = 'Simulating an F-Distribution',
x = 'F-values',
y = 'Relative Frequency',
caption = 'Statistics I / Section 7.2') +
scale_x_continuous(
breaks = c(0, 2, 4, 6, 8, 10),
labels = c(0, 2, 4, 6, 8, 10),
limits = c(0, 10)) +
scale_y_continuous(
breaks = c(0.0, 0.2, 0.4, 0.6, 0.8),
labels = c('0.0', '0.2', '', '', '0.8'),
limits = c(0.0, 0.85)) +
theme(
axis.ticks = element_blank(),
axis.title.y = element_text(vjust = -5),
axis.title.x = element_text(vjust = +5),
plot.title = element_text(hjust = 0.5),
text = element_text(
size = 16,
family = 'serif',
face = 'italic',
color = 'black'))
G
[Figure: Simulating an F-Distribution: histogram of simulated F values with the F density overlaid, F values from 0 to 10]
Summary
Suppose that
• X1, ..., Xn is a random sample from X ~ N(mu_X, sigma_X^2)
• Y1, ..., Ym is a random sample from Y ~ N(mu_Y, sigma_Y^2)
Then
• \( \sqrt{n}\left(\dfrac{\overline{X} - \mu_X}{\sigma_X}\right) \sim N(0, 1) \)
• \( \left(\dfrac{n-1}{\sigma_X^2}\right) S_X^2 \sim \chi^2 \) with df = (n - 1)
• \( \sqrt{n}\left(\dfrac{\overline{X} - \mu_X}{S_X}\right) \sim t \) distribution with df = (n - 1)
• \( F = \dfrac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2} \sim F \) distribution with numerator df = (n - 1) and denominator df = (m - 1)
Introduction
The central limit theorem will apply to any distribution with finite mean µ and finite
variance σ 2 . The central limit theorem says that if a random sample is large enough,
then the sample mean is approximately normal. This theorem is very important
and will allow us to compute approximate probabilities for the sums of a random
sample when we only know the mean and standard deviation and not the underlying
distribution.
\[ W = X_1 + X_2 + \cdots + X_n \sim N\big(n\mu,\, n\sigma^2\big) \quad \text{[approximately]} \]
Proof:
A proof of the central limit theorem is outside the focus of this course. The textbook describes the basic idea, which uses moment generating functions.
[Figure: histogram of simulated sums with a normal density overlaid, x values from 30 to 60]
Example #1
The total claim amount for a health insurance policy follows a distribution with density function
\[ f(x) = \begin{cases} \frac{1}{1000} e^{-(x/1000)} & , \text{ for } 0 < x \\ 0 & , \text{ otherwise} \end{cases} \]
The premium for the policy is set at the expected total claim amount plus 100. If 100 policies are sold, calculate the approximate probability that the insurance company will have claims exceeding the premiums collected. Graph a bell curve with a shaded region corresponding to this probability.
# Clear the environment
rm(list = ls())
# Population Parameters
# X is exponential with mean 1000
mu_x <- 1000
sigma_x <- 1000
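The computation for this example is sketched below (amount, ans, and M with columns x1, y1, x2, y2 are the names the plotting code expects).
# The total claims for 100 policies are approximately normal by the CLT
n_pol <- 100
mu_w <- n_pol * mu_x               # 100,000
sigma_w <- sqrt(n_pol) * sigma_x   # 10,000
# Premiums collected: 100 policies, each priced at E[X] + 100
amount <- n_pol * (mu_x + 100)     # 110,000
# P[total claims exceed premiums collected] = P[Z > 1]
ans <- pnorm(q = amount, mean = mu_w, sd = sigma_w, lower.tail = FALSE)
ans   # approximately 0.159
# Bell curve and shaded tail region for the plot
M <-
  data_frame(
    x1 = seq(from = mu_w - 4 * sigma_w, to = mu_w + 4 * sigma_w, length = 1e3),
    y1 = dnorm(x1, mean = mu_w, sd = sigma_w),
    x2 = seq(from = amount, to = mu_w + 4 * sigma_w, length = 1e3),
    y2 = dnorm(x2, mean = mu_w, sd = sigma_w))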
G <-
ggplot(
data = M)+
geom_line(
mapping = aes(x = x1, y = y1),
color = 'darkred',
size = 1.2)+
geom_ribbon(
mapping = aes(x = x2, ymin = 0, ymax = y2),
fill = 'darkred',
alpha = 0.3,
color = 'darkred',
size = 1.2)+
geom_text(
mapping = aes(x = amount+1000, y = mean(range(y2))),
label = paste('Area = ', round(ans, digits = 3)),
size = 6,
angle = 90,
color = 'black',
family = 'serif',
fontface = 'italic',
nudge_x = 1500,
nudge_y = -4e-6)+
labs(
title = 'The Central Limit Theorem',
caption = 'Statistics I / Section 7.3',
x = 'Total Claims',
y = 'Density')+
theme(
axis.ticks = element_blank(),
axis.title.y = element_text(vjust = +3),
axis.title.x = element_text(vjust = -3),
plot.title = element_text(hjust = 0.5),
text = element_text(
size = 16,
family = 'serif',
face = 'italic',
color = 'black'))
G
[Figure: The Central Limit Theorem: normal density of total claims with the tail area beyond the premiums collected shaded]
Introduction
Let X be an integer-valued random variable with expected value mu and variance sigma^2 and let X1, X2, ..., Xn be a random sample from X. Define
\[ S = X_1 + X_2 + \cdots + X_n \]
S is also integer valued and, by the Central Limit Theorem, is approximately normal: S is approximated by W ~ N(mu_S, sigma_S^2). In particular, let X ~ Binomial(n, p) with q = 1 - p, so that X is a sum of n independent Bernoulli(p) random variables. Then
• mu_X = np
• sigma_X^2 = npq
• From the Central Limit Theorem, X is approximately N(np, npq)
• Let W ~ N(np, npq). Then it follows from the continuity correction that
\[ P[X = k] \approx P\left[k - \tfrac{1}{2} \le W \le k + \tfrac{1}{2}\right] \]
\[ P[X \ge k] \approx P\left[W \ge k - \tfrac{1}{2}\right] \]
\[ P[X \le k] \approx P\left[W \le k + \tfrac{1}{2}\right] \]
How big must n be to use the normal approximation to the binomial distribution? The answer is given by either of the two rules
• \( 0 < p - 3\sqrt{pq/n} < p + 3\sqrt{pq/n} < 1 \)
• \( n > 9 \times \left( \dfrac{\max\{p, q\}}{\min\{p, q\}} \right) \)
# Binomial parameters
n <- 92
p <- 0.61
# Normal approximation parameters: mean and standard deviation
m <- n * p
s <- sqrt(n * p * (1 - p))
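For instance (k = 60 chosen for illustration), the exact binomial probability and the continuity-corrected normal approximation should be close:
# Compare P[X <= 60] exactly and via the continuity correction
k <- 60
c(pbinom(k, size = n, prob = p),
  pnorm(k + 0.5, mean = m, sd = s))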
For T a random variable with a t-distribution and nu degrees of freedom, the density function is
\[ f(t) = \frac{\Gamma[(\nu + 1)/2]}{\sqrt{\pi\nu}\;\Gamma(\nu/2)} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu + 1)/2} \]
where
\[ \Gamma(\alpha) = \int_0^{\infty} y^{\alpha - 1} e^{-y} \, dy \]
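As a sanity check (a sketch with nu = 4 and t = 1 assumed), the closed form agrees with R's built-in dt:
# Evaluate the t density formula directly and compare with dt
nu <- 4
t0 <- 1
manual <- gamma((nu + 1) / 2) / (sqrt(pi * nu) * gamma(nu / 2)) *
  (1 + t0 ^ 2 / nu) ^ (-(nu + 1) / 2)
c(manual, dt(t0, df = nu))   # both approximately 0.215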
The next plot compares a t density function and a standard normal density function. Note that both densities are symmetric about zero, but the t-distribution has more probability mass in its tails: t-distributions have heavier tails.
M <-
data_frame(
x = seq(from = -4, to = +4, length = 1e3),
dn = dnorm(x),
dt = dt(x, df = 4))
G <-
ggplot(
data = M) +
geom_line(
mapping = aes(x = x, y = dn, color = 'Standard Normal'),
size = 1.4) +
geom_line(
mapping = aes(x = x, y = dt, color = 't-Distribution'),
size = 1.4) +
scale_x_continuous(
breaks = c(-4, -2, 0, 2, 4),
labels = c(-4, -2, '', 2, 4)) +
scale_y_continuous(
breaks = c(0.0, 0.1, 0.2, 0.3, 0.4),
labels = c('0.0', '0.1', '', '0.3', '0.4'))+
scale_color_manual(
name = 'Density',
values = c('Standard Normal' = 'darkgreen', 't-Distribution' = 'maroon')) +
labs(
x = 'x-axis',
y = 'y-axis',
title = 'Comparing Normal and t Density Functions',
caption = 'Statistics I / Section 7.5') +
theme(
    axis.ticks = element_blank(),
    legend.position = c(0.85, 0.85))
G
[Figure: Comparing Normal and t Density Functions, x from -4 to 4]
Create a table of percentage points for the t-distribution corresponding to the picture below. Each column should correspond to t_alpha and each row should correspond to df.
M <-
data_frame(
x1 = seq(from = -4, to = 4, length = 1e3),
x2 = seq(from = 1, to = 4, length = 1e3),
y1 = dt(x1, df = 5),
y2 = dt(x2, df = 5))
G <-
ggplot(
data = M) +
geom_line(
mapping = aes(x = x1, y = y1),
color = 'red') +
geom_ribbon(
mapping = aes(x = x2, ymin = 0, ymax = y2),
fill = 'red',
alpha = 0.4) +
geom_text(
mapping = aes(x = 1.5, y = 0.05),
label = expression(alpha),
size = 7,
color = 'black') +
scale_x_continuous(
breaks = c(1),
labels = expression(x = t[alpha]))+
labs(
x = NULL,
y = NULL)+
theme_classic()+
theme(
axis.text.x = element_text(face = 'bold', family = 'serif', size = 16))
[Figure: t density with the upper-tail area alpha shaded to the right of t_alpha]
M <-
expand.grid(
df = seq(from = 1, to = 29, by = 1),
alpha = c(0.100, 0.050, 0.025, 0.010, 0.005)) %>%
mutate(t_alpha = qt(p = alpha, df = df, lower.tail = FALSE)) %>%
spread(key = alpha, value = t_alpha)
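A spot check of one entry of the table (illustrative choice): the upper-tail cutoff with alpha = 0.05 and 10 degrees of freedom.
# t_0.05 with df = 10
qt(p = 0.05, df = 10, lower.tail = FALSE)   # approximately 1.812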