WST312 Notes 2023
Chapter 1: Introduction
A mathematical model gives the value of a variable (the dependent variable) in terms of
the values of some other variables (the explanatory/independent variables). A model (of
any sort) is an imitation of a real world system or process. In the case of a mathematical
model the value of the dependent variable is given in terms of some mathematical
formula of the values of the explanatory variables.
In the case of a deterministic model the value of the dependent variable, given the values of the explanatory variables, is a single value determined by a mathematical formula. This type of model contains no random components.
In the case of probabilistic models the dependent variable can assume different values even if the values of all the explanatory variables remain the same. In this case it is the probabilities of the dependent variable that are uniquely determined by the values of the explanatory variables. To specify these probabilities it is sufficient to specify the distribution function of the dependent variable. This type of model recognizes the random nature of the dependent variable.
1. A brand switching model: A beer drinker can change the brand of beer he drinks from time to time. It is not predictable with absolute certainty when he will change from one brand of beer to another. To model the brand of beer he drinks at a certain point in time, we need to consider the probabilities that he will change from one brand of beer to another given that he has been drinking a certain brand of beer for a certain number of time periods.
2. Number of contributors to a pension fund: The number of contributors to a pension
fund during a certain period of time will differ in a random way from period to period. It
will depend on the number of deaths and persons retiring during that period and new
members joining the fund. The number of contributors during one period will affect the
number of contributors during the next period.
3. A dam problem: Consider the contents of a dam at the end of each week. The
contents of the dam will change in a random way during the week depending on the
inflow (amount of rain) - which is random - during the week. The contents at the end of
one week will affect the contents at the end of the next week.
4. An inventory problem: The level of inventory of a certain item in a shop will change
from time to time when customers buy that particular item or when an order for the item
is delivered. The times at which this will happen are not predictable i.e. they are random.
Once again the level of inventory at one point in time will affect the level of inventory
some time later.
5. A queuing problem: The number of customers in a queue will change if another
customer joins the queue or if the service of a customer is completed. The times at which
these changes occur are random. The length of the queue at a certain point in time will
depend on the length of the queue at times prior to that particular point in time.
6. Population growth models: The size of the population of a certain country at the end
of a year will depend on the size of the population at the end of the previous year, the
number of births and deaths during the year and the immigration and emigration during
the year which are all random variables.
Note that in all the examples above we are interested in the value of some variable at different points in time. The probabilities for the different possible values of the variable at a particular point in time may depend on the values of some other variables, but may also depend on previous values of that particular variable. In such cases knowledge of the values of the variable at previous points in time may be very useful to predict future values of the variable. In this course we will only consider stochastic models – specifically stochastic processes and Markov processes.
Chapter 2: Stochastic Processes
2.1 Description
Normally we are not interested in the value of a single random variable but rather a whole collection of random variables. Let us indicate the variables we are interested in by X_t where t is some element of the set T. We refer to t as the indexing parameter for the stochastic process. In many cases t denotes the time of the observation e.g. the size of a queue for all times in a certain interval may be of interest. We usually use 0 (but could also use 1) for the starting point for some process and X_0 is then the random variable that denotes the starting value of the process – which may be some constant with probability 1.
NOTE: If ω ∈ Ω, then X_t(ω) denotes the value of the random variable X_t for the element ω of the sample space. For different values of the indexing parameter t, say t_1 and t_2, we have that X_{t_1} and X_{t_2} are different random variables.
All the possible values that can be assumed by X_t, t ∈ T, are called the states of the stochastic process and the set of all possible states is called the state space. Let S denote the state space. Since the X_t's are random variables the possible values are real numbers i.e. S is a subset of the real numbers.
The set of all indexing parameters T is called the parameter space. If the parameter t is time, the value of X_t is called the state the process is in at time t. In many cases the indexing parameter t is time, but other parameter spaces are possible. The parameter space T is called discrete if it is finite or enumerable and is called continuous if it is not enumerable. Similarly the state space S can be either discrete or continuous.
To identify the nature of a stochastic process, a first step is to classify it on the basis of the nature of the parameter space and the nature of the state space. For example:
Example 2.1.1
Example 2.2.1
[Figure: a sample path of the process – the number of claims received (0 to 5) plotted against time (0 to 10).]
This graph shows a particular outcome of the 'experiment' which determines the observed values of the stochastic process for the element ω. The sample path above indicates that the first claim is received at time 2 since the function is 0 for t < 2 and equal to 1 for t = 2; the second claim is received at time 3 since the function is 1 for 2 ≤ t < 3 and is equal to 2 for t = 3; the third claim at time 6, the fourth at time 8, etc. If we repeat the experiment under the same conditions we will get a second value of ω, say ω_2, and then get the observed values of the stochastic process for all values of t i.e. we get another possible sample path. Each repetition of the experiment determines an element ω of Ω and therefore determines a sample path.
♪
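To make the idea of a sample path concrete, here is a small Python sketch (an illustration added to these notes, not part of the original text) that simulates such a claims-counting process, assuming for concreteness that the times between claims are independent exponential random variables (the Poisson process of section 2.4). Each run of the simulation corresponds to one element ω of Ω and yields one sample path; the function names are chosen for illustration only.

```python
import random

def claim_sample_path(rate, horizon, seed=None):
    """Simulate one sample path of a claims-counting process up to time `horizon`,
    assuming independent exponential inter-arrival times with the given rate."""
    rng = random.Random(seed)
    arrival_times = []
    t = rng.expovariate(rate)
    while t <= horizon:
        arrival_times.append(t)
        t += rng.expovariate(rate)
    return arrival_times

def state_at(arrival_times, t):
    """X_t(omega): the number of claims received up to and including time t."""
    return sum(1 for a in arrival_times if a <= t)

# Two repetitions of the 'experiment' give two different sample paths.
for omega in (1, 2):
    path = claim_sample_path(rate=0.5, horizon=10, seed=omega)
    print(f"omega_{omega}: claim times {[round(a, 2) for a in path]}, X_10 = {state_at(path, 10)}")
```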
Result 2.2.1 Given the probability space (Ω, ξ, P), we can determine the joint distribution of any finite number of the X_t's. Since each X_t is a random variable the set {ω : X_t(ω) ≤ x} is an element of ξ. Since ξ is a σ-field the intersection of any finite number of such sets is an element of ξ and hence P is defined for such a set. For any value n and any values t_1 < t_2 < t_3 < ... < t_n the joint distribution function of X_{t_1}, X_{t_2}, X_{t_3}, ..., X_{t_n} is given by

F_{X_{t_1}, X_{t_2}, ..., X_{t_n}}(x_1, x_2, ..., x_n) = P[X_{t_1}(ω) ≤ x_1, X_{t_2}(ω) ≤ x_2, ..., X_{t_n}(ω) ≤ x_n].

Thus, given the joint cumulative distribution function of X_{t_1}, X_{t_2}, X_{t_3}, ..., X_{t_n}, the joint density function or probability mass function of X_{t_1}, X_{t_2}, X_{t_3}, ..., X_{t_n} can be determined.
Conversely, if we have a set of random variables {X_t : t ∈ T} and know all the joint distribution functions F_{X_{t_1}, X_{t_2}, X_{t_3}, ..., X_{t_n}} for all values of n and t_1 < t_2 < t_3 < ... < t_n, we can find a probability space (Ω', ξ', P') and define functions {X'_t : t ∈ T} on this space such that the joint distribution of X'_{t_1}, X'_{t_2}, X'_{t_3}, ..., X'_{t_n} is the same as the joint distribution of X_{t_1}, X_{t_2}, X_{t_3}, ..., X_{t_n}. The probabilities of a stochastic process can therefore be defined by the joint distributions of all X_{t_1}, X_{t_2}, X_{t_3}, ..., X_{t_n}.
Note: To determine, or even just to list, all the joint distributions would be very cumbersome and hard to use in practical problems unless there is some structure in the nature of the process which simplifies the specification of all the joint distributions. In many cases a set of random variables with a very simple structure for all the joint distributions is defined first, and a set of new random variables – i.e. a stochastic process – is then defined in terms of the random variables with the simple structure.
Example 2.2.2
Consider a gambler that makes successive bets of R1 each. The outcomes of the different bets are independent events. Let X_n be the amount of money the gambler wins (or loses) at the n-th bet. Assume that the probability of winning a bet is p for all bets. Then the probability mass function of X_n is given by

f_{X_n}(x) = P[X_n = x] = { p       for x = 1
                            1 − p   for x = −1
                            0       otherwise }.

Since the X_n's are independent random variables, the joint probability mass function of X_1, X_2, ..., X_n is given by

f_{X_1, X_2, ..., X_n}(x_1, x_2, ..., x_n) = f_{X_1}(x_1) f_{X_2}(x_2) ⋯ f_{X_n}(x_n).

Let Y_n, n = 1, 2, 3, ... be the total winnings (or losses) of the gambler up to and including the n-th bet i.e. Y_n = X_1 + X_2 + ... + X_n. Each Y_n is therefore a linear transformation of the X_n's and the joint distribution of the Y_n's can therefore be obtained from the joint distribution of the X_n's. Thus we can use a stochastic process {X_n : n = 1, 2, 3, ...} for which the joint distributions are easily determined to define the stochastic process {Y_n : n = 1, 2, 3, ...}.
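The following minimal Python sketch (illustrative only; the function name is an assumption of this sketch) generates the independent X_n's and forms the partial sums Y_n, mirroring the construction above.

```python
import random

def simulate_winnings(p, n_bets, seed=None):
    """Simulate the gambler's cumulative winnings Y_1, ..., Y_n,
    where each bet wins R1 with probability p and loses R1 otherwise."""
    rng = random.Random(seed)
    y, path = 0, []
    for _ in range(n_bets):
        x = 1 if rng.random() < p else -1  # the bet outcome X_n
        y += x                             # Y_n = X_1 + ... + X_n
        path.append(y)
    return path

print(simulate_winnings(p=0.5, n_bets=10, seed=1))
```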
Definition A stochastic process {X_t : t ∈ T} is (strictly) stationary if for any value of n, any t_1 < t_2 < ... < t_n and any k, the joint distribution of X_{t_1}, X_{t_2}, ..., X_{t_n} is the same as the joint distribution of X_{t_1 + k}, X_{t_2 + k}, ..., X_{t_n + k}.
For n = 1 this means X_{t_1} has the same distribution as X_{t_1 + k} for any k i.e. all the X_t's have the same distribution. For n = 2 it means that for any t_1 and t_2 the joint distribution of X_{t_1}, X_{t_2} is the same as the joint distribution of X_{t_1 + k}, X_{t_2 + k}. Therefore a shift in time of both t_1 and t_2 by the same quantity k does not change the joint distribution of the random variables. Put differently, if the distance between t_1 and t_2 is the same as the distance between t_3 and t_4 the joint distribution of X_{t_1}, X_{t_2} is the same as the joint distribution of X_{t_3}, X_{t_4}. As a result we have, for example, that cov(X_{t_1}, X_{t_2}) = cov(X_{t_3}, X_{t_4}). Similarly for the cases n = 3, 4, ... etc.
A process {X_t : t ∈ T} is covariance (weakly) stationary if E[X_t] is the same for all t and cov(X_{t_1}, X_{t_2}) depends only on the distance t_2 − t_1. A stationary process is covariance stationary, but the converse is not necessarily true.
A stochastic process {X_t : t ∈ T} has the Markov property if, for any value of n, any values t_1 < t_2 < ... < t_n < t_{n+1} and any set A,

P[X_{t_{n+1}} ∈ A | X_{t_1} = x_{t_1}, X_{t_2} = x_{t_2}, ..., X_{t_n} = x_{t_n}] = P[X_{t_{n+1}} ∈ A | X_{t_n} = x_{t_n}]
i.e. probabilities of future events (what happens at time t_{n+1}) only depend on the state of the process at the latest point in time (t_n) for which we have information. The information about the state of the process at the previous points in time (t_1, ..., t_{n−1}) does not have any effect on the probabilities of future events and will therefore not help to make a better prediction of what will happen at time t_{n+1}.
For a stochastic process with a discrete state space, the process will have the Markov property if for any a ∈ S, for any value of n and for any values of t_1 < t_2 < ... < t_n < t_{n+1}, it is true that

P[X_{t_{n+1}} = a | X_{t_1} = x_{t_1}, X_{t_2} = x_{t_2}, ..., X_{t_n} = x_{t_n}] = P[X_{t_{n+1}} = a | X_{t_n} = x_{t_n}].
Theorem A stochastic process with independent increments has the Markov property.
1 (CT, ST)
Proof For any a ∈ S, for any value of n and for any values of t_1 < t_2 < ... < t_n < t_{n+1}, we have that

P[X_{t_{n+1}} = a | X_{t_1} = x_{t_1}, X_{t_2} = x_{t_2}, ..., X_{t_n} = x_{t_n}]
= P[X_{t_{n+1}} − X_{t_n} + x_{t_n} = a | X_{t_1} = x_{t_1}, X_{t_2} = x_{t_2}, ..., X_{t_n} = x_{t_n}]
= P[X_{t_{n+1}} − X_{t_n} + x_{t_n} = a]   (independent increments)
= P[X_{t_{n+1}} − X_{t_n} + x_{t_n} = a | X_{t_n} = x_{t_n}]   (independent increments)
= P[X_{t_{n+1}} = a | X_{t_n} = x_{t_n}]

i.e. the Markov property holds.
■
NB: The converse is not necessarily true i.e. a Markov process does not necessarily have
independent increments. (Exercise: Find an example to show that this is true.)
A Markov process with discrete parameter space T = {0, 1, 2, ...} and discrete state space (finite or countably infinite) S is called a Markov chain.
A Markov process with continuous parameter space T = [0, ∞) and discrete state space (finite or countably infinite) S is called a Markov jump process.
2.4.2 Random Walk Let {X_t : t = 0, 1, 2, ...} be a white noise process and define a sequence of random variables Y_0, Y_1, Y_2, ... where Y_0 = 0 and Y_t = Σ_{i=1}^{t} X_i for all t ≥ 1.
2.4.3 Moving Average Process Let {X_t : t = 0, 1, 2, ...} be a white noise process, let α_0, α_1, ..., α_p be constants and define Y_t = α_0 X_t + α_1 X_{t−1} + ... + α_p X_{t−p}, a weighted sum of the last p + 1 X's up to and including X_t. (The weight for the latest one (X_t) is α_0, the weight for the previous one (X_{t−1}) is α_1 etc.) Then {Y_t} is a moving average process of order p. A moving average process is a stationary process but in general is not a Markov process.
2.4.4 Poisson Process The Poisson process is often used as a model where we count the number of events that take place during some interval of time – for instance the number of accidents or the number of insurance claims. The Poisson process is not stationary (why?).
2.4.5 Compound Poisson Process Let {N_t : t ∈ [0, ∞)} be a Poisson process and let X_1, X_2, ... be independent, identically distributed random variables which are also independent of {N_t}. Define

Y_t = { 0                       if N_t = 0
        Σ_{i=1}^{N_t} X_i       if N_t > 0 }.

Then {Y_t : t ∈ [0, ∞)} is called a compound Poisson process. Such a process can be used as a model where N_t is the number of insurance claims up to time t, for example, and X_i is the amount of the i-th claim. Then Y_t is the total amount of claims received up to time t.
2.4.6 Brownian Motion A stochastic process {B_t : t ∈ [0, ∞)} with state space S = R is called Brownian motion if:
(i) {B_t} has independent increments.
(ii) Each increment B_t − B_s, s < t, has a normal distribution with expected value μ(t − s) and variance σ²(t − s), where μ and σ² > 0 are constants.
(iii) {B_t} has continuous sample paths.
Brownian motion is also referred to as a Wiener process.
The case when μ = 0 and σ² = 1 is referred to as standard Brownian motion.
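A minimal simulation sketch (an illustration, not part of the original notes), assuming B_0 = 0 and using property (ii) to generate independent normal increments on a discrete time grid:

```python
import random

def brownian_path(mu, sigma2, horizon, n_steps, seed=None):
    """Simulate Brownian motion with drift mu and variance parameter sigma2
    on a grid of n_steps points, using independent normal increments."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    b, path = 0.0, [0.0]  # assume B_0 = 0
    for _ in range(n_steps):
        # Each increment over an interval of length dt is normal with
        # mean mu*dt and standard deviation sqrt(sigma2*dt).
        b += rng.gauss(mu * dt, (sigma2 * dt) ** 0.5)
        path.append(b)
    return path

path = brownian_path(mu=0.0, sigma2=1.0, horizon=1.0, n_steps=1000, seed=42)
print(f"B_1 = {path[-1]:.4f}")  # approximately N(0, 1) for standard Brownian motion
```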
Let N be a random variable taking values 0, 1, 2, ... and let Z_1, Z_2, ... be independent, identically distributed random variables which are also independent of N. Let T = Σ_{i=1}^{N} Z_i (with T = 0 if N = 0). Then

E[T] = E_N[E[T | N]]
= P[N = 0] E[T | N = 0] + Σ_{n=1}^{∞} P[N = n] E[Σ_{i=1}^{n} Z_i | N = n]
= P[N = 0]·0 + Σ_{n=1}^{∞} P[N = n] E[Σ_{i=1}^{n} Z_i]   (since all Z_i are independent of N)
= Σ_{n=0}^{∞} P[N = n]·n E[Z]   (since n is a given constant)
= E[Z] Σ_{n=0}^{∞} P[N = n]·n
= E[Z] E[N].
So

var(T) = E_N[var(T | N)] + var_N(E[T | N])
= Σ_{n=0}^{∞} P[N = n] var(Σ_{i=1}^{N} Z_i | N = n) + var_N(E[Σ_{i=1}^{N} Z_i | N])
= Σ_{n=0}^{∞} P[N = n] var(Σ_{i=1}^{n} Z_i) + var_N(E[Σ_{i=1}^{N} Z_i | N]).

But

E[Σ_{i=1}^{N} Z_i | N = n] = E[Σ_{i=1}^{n} Z_i] = n E[Z]   (true for n > 0 and n = 0),

so

var(T) = Σ_{n=0}^{∞} P[N = n] n var(Z) + var_N(N E[Z])
= var(Z) Σ_{n=0}^{∞} P[N = n] n + E[Z]² var(N)
= var(Z) E[N] + E[Z]² var(N).
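A quick Monte Carlo check of these two formulas (a sketch added to these notes; the choice N uniform on {0,...,4} and Z_i ~ Exp(1) is an assumption made purely for illustration):

```python
import random

rng = random.Random(0)

def sample_T():
    """One draw of the random sum T = Z_1 + ... + Z_N (T = 0 when N = 0),
    with N uniform on {0,...,4} and Z_i ~ Exp(1), chosen for illustration."""
    n = rng.randrange(5)
    return sum(rng.expovariate(1.0) for _ in range(n))

samples = [sample_T() for _ in range(200_000)]
mean = sum(samples) / len(samples)
var = sum((t - mean) ** 2 for t in samples) / len(samples)

# Here E[N] = 2, var(N) = 2, E[Z] = 1, var(Z) = 1, so the formulas give
# E[T] = E[Z]E[N] = 2 and var(T) = var(Z)E[N] + E[Z]^2 var(N) = 4.
print(f"simulated mean {mean:.3f} (theory 2), variance {var:.3f} (theory 4)")
```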
Definition 2.5.1 Let U be a random variable which can take on the values 0, 1, 2, 3, ... and let p_i = P[U = i]. The probability generating function of U is defined by

u(z) = p_0 + p_1 z + p_2 z² + p_3 z³ + ... = Σ_{i=0}^{∞} p_i z^i.

This sum will converge for |z| < 1 since for |z| < 1

Σ_{i=0}^{∞} |p_i z^i| < Σ_{i=0}^{∞} p_i = 1,

since an absolutely convergent series is convergent.
Note that if we know the function u(z) and expand it as a power series the coefficient of z^i is equal to p_i i.e. knowing u(z) is equivalent to knowing all the probabilities.
Theorem 2.5.2 If u(z) is the probability generating function of U then u'(1) = E[U].
Proof
u'(z) = p_1 + 2p_2 z + 3p_3 z² + 4p_4 z³ + ...
i.e.
u'(1) = p_1 + 2p_2 + 3p_3 + 4p_4 + ... = Σ_{i=0}^{∞} i p_i = E[U]. ■
From the above it follows that var(U) = u''(1) + u'(1) − (u'(1))².
3 (CT) Proof

Σ_{i=0}^{∞} a_i Σ_{j=0}^{∞} b_j = [a_0 + a_1 + a_2 + ...][b_0 + b_1 + b_2 + ...]
= a_0 b_0 + a_0 b_1 + a_0 b_2 + a_0 b_3 + a_0 b_4 + ...
+ a_1 b_0 + a_1 b_1 + a_1 b_2 + a_1 b_3 + a_1 b_4 + ...
+ a_2 b_0 + a_2 b_1 + a_2 b_2 + a_2 b_3 + a_2 b_4 + ...
= a_0 b_0
+ a_1 b_0 + a_0 b_1
+ a_2 b_0 + a_1 b_1 + a_0 b_2
+ a_3 b_0 + a_2 b_1 + a_1 b_2 + a_0 b_3
+ ...

i.e. to get the sum of all the products a_i b_j first get the sum of the products on the various diagonals and then get the sum of all the sums on the diagonals. Note that on the k-th diagonal the sum of the subscripts is equal to k. Therefore

Σ_{i=0}^{∞} a_i Σ_{j=0}^{∞} b_j = Σ_{k=0}^{∞} Σ_{l=0}^{k} a_{k−l} b_l = Σ_{k=0}^{∞} Σ_{l=0}^{k} a_l b_{k−l}.
■
Note: This video will be interesting to watch: [Link: …bigger-than-others]
Theorem Let U and V be independent random variables taking values 0, 1, 2, ..., let u(z) be the probability generating function of the probabilities of U and let v(z) be the probability generating function of the probabilities of V. Let W = U + V and let w(z) be the probability generating function of W. Then w(z) = u(z)v(z).
w_k = P[U + V = k]
= P[U = 0 and V = k, or U = 1 and V = k−1, or U = 2 and V = k−2, ..., or U = k and V = 0]
= P[U = 0 and V = k] + P[U = 1 and V = k−1] + ... + P[U = k and V = 0]
(since the events are disjoint)
= P[U = 0]P[V = k] + P[U = 1]P[V = k−1] + ... + P[U = k]P[V = 0]
(since U and V are independent)
= u_0 v_k + u_1 v_{k−1} + ... + u_k v_0 = Σ_{i=0}^{k} u_i v_{k−i}.

This is called the convolution of {u} and {v}.
Hence

w(z) = Σ_{k=0}^{∞} w_k z^k
= Σ_{k=0}^{∞} Σ_{i=0}^{k} u_i v_{k−i} z^k
= Σ_{k=0}^{∞} Σ_{i=0}^{k} (u_i z^i)(v_{k−i} z^{k−i})
= Σ_{i=0}^{∞} u_i z^i Σ_{j=0}^{∞} v_j z^j
= u(z) v(z).
■
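A small numeric sketch of this result (illustrative only): the convolution of two probability vectors has, as its generating function, the product of the individual generating functions.

```python
def convolve(u, v):
    """Convolution of two probability mass functions given as lists:
    w_k = sum_{i<=k} u_i * v_{k-i}."""
    w = [0.0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            w[i + j] += ui * vj
    return w

def pgf(p, z):
    """Evaluate the probability generating function sum_i p_i z^i."""
    return sum(pi * z ** i for i, pi in enumerate(p))

# Two small illustrative distributions on {0,1,2} and {0,1}.
u = [0.2, 0.5, 0.3]
v = [0.6, 0.4]
w = convolve(u, v)
z = 0.7
print(abs(pgf(w, z) - pgf(u, z) * pgf(v, z)) < 1e-12)  # True: w(z) = u(z)v(z)
```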
Theorem 2.5.6 Let X_1, X_2, ... be independent, identically distributed random variables, each with moment generating function m_X(u), and let N be a random variable taking values 0, 1, 2, ... with p_n = P[N = n] and probability generating function Φ_N(z), where the X_i's are independent of N. Let T = Σ_{i=1}^{N} X_i. Then M_T(u) = Φ_N(m_X(u)).

Proof
M_T(u) = E[e^{uT}]
= E[e^{u Σ_{i=1}^{N} X_i}]
= E_N[ E[e^{u Σ_{i=1}^{N} X_i} | N] ]
= E[e^{u Σ_{i=1}^{N} X_i} | N = 0] p_0 + Σ_{n=1}^{∞} E[e^{u Σ_{i=1}^{N} X_i} | N = n] p_n
= E[e^{u·0}] p_0 + Σ_{n=1}^{∞} E[e^{u Σ_{i=1}^{n} X_i}] p_n   (since all X_i's are independent of N)
= 1·p_0 + Σ_{n=1}^{∞} [m_X(u)]^n p_n
= Σ_{n=0}^{∞} [m_X(u)]^n p_n
= Φ_N(m_X(u)). ■
Theorem 2.5.7 For a compound Poisson process with parameter λ the moment generating function of

Y_t = { Σ_{i=1}^{N_t} X_i   if N_t > 0
        0                   if N_t = 0 }

is given by e^{λt(M_X(u) − 1)}.

4 (CT, ST) Proof The probability generating function of a Poisson random variable with parameter λt is given by

Φ_{N_t}(u) = Σ_{n=0}^{∞} e^{−λt} ((λt)^n / n!) u^n   (since N_t − N_0 = N_t ~ Poisson(λt))
= e^{−λt} Σ_{n=0}^{∞} (λtu)^n / n!
= e^{−λt} e^{λtu} = e^{λt(u−1)}   (definition of e^x = Σ_{n=0}^{∞} x^n / n!)

and therefore by theorem 2.5.6, M_{Y_t}(u) = e^{λt(M_X(u) − 1)}. ■
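A hedged Monte Carlo sketch of theorem 2.5.7 (not part of the original notes), assuming for illustration λ = 2, t = 1.5 and claim amounts X_i ~ Exp(2), so that M_X(u) = 2/(2 − u) for u < 2:

```python
import math
import random

rng = random.Random(7)
lam, t, u = 2.0, 1.5, 0.3

def sample_Yt():
    """One draw of Y_t = X_1 + ... + X_{N_t} for a compound Poisson process."""
    # Sample N_t by counting exponential inter-arrival times up to time t.
    n, s = 0, rng.expovariate(lam)
    while s <= t:
        n += 1
        s += rng.expovariate(lam)
    return sum(rng.expovariate(2.0) for _ in range(n))  # X_i ~ Exp(2), illustrative

mc = sum(math.exp(u * sample_Yt()) for _ in range(100_000)) / 100_000
m_X = 2.0 / (2.0 - u)                   # MGF of Exp(2) at u < 2
theory = math.exp(lam * t * (m_X - 1))  # theorem 2.5.7
print(f"Monte Carlo {mc:.4f} vs theory {theory:.4f}")
```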
Test Yourself:
Exercise 2.1
(a) If you were to model the maximum daily temperature in Pretoria, what state space
and parameter space would you use? Describe the nature of these spaces.
(b) Suppose you have available a continuous record of temperature instead and wanted to
model Pretoria’s temperature at all times. Describe the parameter space, state space and
the nature of these spaces.
Exercise 2.2
Consider independent identically distributed random variables X_t for t = 1, 2, 3, ... where X_t = 1 with probability p and X_t = −1 with probability q = 1 − p. Let Y_0 = 0 and Y_t = Σ_{i=1}^{t} X_i i.e. {Y_t : t = 0, 1, 2, ...} is a random walk.
(a) Explain why the Y_t's are not identically distributed (HINT: Determine the distributions of Y_1 and Y_2).
(b) Explain why the Y_t's are not independent (HINT: determine the conditional distribution of Y_2 given that Y_1 = 1).
Exercise 2.3
For the process {Y_t : t = 0, 1, 2, ...} defined in exercise 2.2, determine
(a) P[Y_2 = 1, Y_4 = 3 | Y_0 = 0]
(b) P[Y_2 = 0, Y_4 = 2 | Y_0 = 0].
Exercise 2.4
To specify the probability structure of a stochastic process {X_t : t ∈ T} it is required to specify the distribution of X_t for all values of t ∈ T – TRUE or FALSE?
Exercise 2.5
For the process {Y_t : t = 0, 1, 2, ...} defined in exercise 2.2
(a) Determine P[Y_10 = 10] and P[Y_2 = 10].
(b) Is the random walk stationary?
Exercise 2.6
What can be concluded about the variance of a weakly stationary stochastic process {X_t : t ∈ T}?
Exercise 2.7
Prove that a random walk has the Markov property.
Exercise 2.8
Suppose that {Y_t : t = 0, 1, 2, ...} is a moving average process of order p.
(a) Prove that such a process is weakly stationary.
(b) Does the process have the Markov property? Explain.
(c) Does the process have independent increments? Explain.
Chapter 3: Markov Chains
For a Markov chain both the parameter space and the state space are discrete i.e. are finite or denumerable. For convenience we will take the parameter space to be T = {0, 1, 2, 3, ...} and the state space to be S = {1, 2, 3, ..., N} for some N, or S = {1, 2, 3, ...}.
The stochastic process {X_t : t ∈ T} is a Markov chain if for any value of n and for any values of t_1 < t_2 < ... < t_n < t_{n+1} ∈ T, it is true that

P[X_{t_{n+1}} = x_{t_{n+1}} | X_{t_1} = x_{t_1}, X_{t_2} = x_{t_2}, ..., X_{t_n} = x_{t_n}] = P[X_{t_{n+1}} = x_{t_{n+1}} | X_{t_n} = x_{t_n}]

for all x_{t_1}, x_{t_2}, ..., x_{t_n}, x_{t_{n+1}} ∈ S.
Let p_{i,j}^{(m,n)} = P[X_n = j | X_m = i] i.e. this is the probability that given that at time m the process was in state i it will be in state j at time n. These so-called transition probabilities play a vital role in the theory of Markov chains.
7 (CT)
Proof The event {X_{t_1} = i_1, X_{t_2} = i_2, ..., X_{t_n} = i_n} can take place in a number of different disjoint ways. Namely, the process can start in any state i_0 at time 0, then go to state i_1 at time t_1 etc. Hence

P[X_{t_1} = i_1, X_{t_2} = i_2, ..., X_{t_n} = i_n] = P[ ∪_{i_0 ∈ S} {X_0 = i_0, X_{t_1} = i_1, X_{t_2} = i_2, ..., X_{t_n} = i_n} ]
= Σ_{i_0 ∈ S} P[X_0 = i_0, X_{t_1} = i_1, X_{t_2} = i_2, ..., X_{t_n} = i_n]
= Σ_{i_0 ∈ S} P[X_0 = i_0] p_{i_0,i_1}^{(0,t_1)} p_{i_1,i_2}^{(t_1,t_2)} ⋯ p_{i_{n−1},i_n}^{(t_{n−1},t_n)}   (by the Markov property).
Theorem 3.1.2 shows that the joint distribution of any number of the X_t's can be determined in terms of the probabilities of X_0 and all the transition probabilities. We will now show that these transition probabilities must satisfy certain relationships.
8 (CT)
Proof Given that the process starts in state i at time m, the process must be in some state, say k, at time l, and then go from state k at time l to state j at time n i.e.

P[X_n = j | X_m = i] = P[ ∪_{k ∈ S} {X_l = k, X_n = j} | X_m = i ]
= Σ_{k ∈ S} P[X_l = k, X_n = j | X_m = i]
= Σ_{k ∈ S} p_{i,k}^{(m,l)} p_{k,j}^{(l,n)}   (the Chapman-Kolmogorov equations).

Let P^{(m,n)} be the matrix whose (i,j)-th element is p_{i,j}^{(m,n)} i.e. P^{(m,n)} = [p_{i,j}^{(m,n)}]. From the Chapman-Kolmogorov equations it follows that the (i,j)-th element of P^{(m,n)} is the i-th row of P^{(m,l)} multiplied by the j-th column of P^{(l,n)}. Hence P^{(m,n)} = P^{(m,l)} P^{(l,n)} for any m < l < n.
Example 3.1.1 Suppose that a machine used in some production process may be either in working order or out of order. Let us denote the two states by 1 and 2 respectively and divide time into certain periods (e.g. day, month, year etc.). Let X_m be the state of the machine at the start of the m-th period. Then S = {1,2} and T = {1,2,3,...} – therefore X_1 = 1 will indicate that the process is started at the beginning of the first period with a machine that is in order. Now suppose that if the machine is in working order at the start of the m-th period, the probability of the machine being in working order at the start of the (m+1)-th period is 100/(100+m). Note that this probability only depends on the state of the machine (being in order) at time m (start of the m-th period) and not on anything else that may have happened before that time – it does however depend on m (the 'age' of the machine). (Thus the process is Markov but not time-homogeneous.) Also, suppose that the probability that a machine that is out of order at the beginning of the m-th period is in working order at the beginning of the next period is given by 200/(200+m). This probability also does not depend on anything that might have happened before the start of the m-th period but does depend on m. This means that the process is a Markov chain with transition matrices

P^{(m,m+1)} = [ 100/(100+m)   1 − 100/(100+m) ]
              [ 200/(200+m)   1 − 200/(200+m) ]

i.e. P^{(1,2)} = [ 100/101  1/101 ]   and   P^{(2,3)} = [ 100/102  2/102 ]
                 [ 200/201  1/201 ]                     [ 200/202  2/202 ].
Hence

P^{(1,3)} = P^{(1,2)} P^{(2,3)} = [ 0.980488  0.019512 ]
                                  [ 0.980440  0.019560 ].

Thus, for instance, the probability that given that at the beginning of period 1 the machine is in order, it will be out of order at the beginning of the third period is 0.019512.
♪
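The matrix product above is easy to check numerically; the sketch below (using numpy, purely as an illustrative check added to these notes) rebuilds the one-step matrices and multiplies them:

```python
import numpy as np

def P(m):
    """One-step transition matrix P^(m, m+1) for the machine example."""
    a, b = 100 / (100 + m), 200 / (200 + m)
    return np.array([[a, 1 - a],
                     [b, 1 - b]])

P13 = P(1) @ P(2)  # Chapman-Kolmogorov: P^(1,3) = P^(1,2) P^(2,3)
print(P13.round(6))  # [[0.980488 0.019512] [0.98044 0.01956]]
```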
A tremendous simplification occurs if all one-step transition matrices are the same i.e. if p_{i,j}^{(m,m+1)} = p_{i,j} for all values of m. In this case the probability of a transition from state i to state j only depends on the particular states i and j, and does not depend on the particular point in time. Such a Markov chain is called time-homogeneous. In the rest of this chapter we will work only with time-homogeneous Markov chains. In all applications it must always be determined whether or not a Markov chain is time-homogeneous before results are applied.
Let P = [p_{i,j}]. If S is finite with N elements, we have that

P = [ p_11  p_12  p_13  ...  p_1N ]
    [ p_21  p_22  p_23  ...  p_2N ]
    [ p_31  p_32  p_33  ...  p_3N ]
    [ ...                         ]
    [ p_N1  p_N2  p_N3  ...  p_NN ],

and if S is infinite P is an infinite matrix. Note that the i-th row contains all the transition probabilities to go from state i to one of the states in S. Since it is certain that starting in state i the process will go to one of the states in one step, it is true that Σ_{j∈S} p_{i,j} = 1 for all i. A matrix with this property is called a stochastic matrix.
Example 3.2.1 Suppose that a process can be in any one of two states. If the process is in state 1 the probability is 0.6 that it will remain in state 1 and the probability is 0.4 that it will change to state 2. If the process is in state 2 the probability is 0.7 that it will remain in state 2 and the probability is 0.3 that it will change to state 1. Assume that these probabilities do not depend on the states occupied before the current time i.e. the process is a (time-homogeneous) Markov chain.
A gambling problem: Suppose that player A repeatedly bets R1 against player B. On each bet A wins R1 with probability p and loses R1 with probability q = 1 − p, and the outcomes of the bets are independent. The total capital of the two players together is R5, the game stops as soon as A has R0 or R5, and X_t denotes A's fortune after t bets, so that S = {0, 1, 2, 3, 4, 5}.
Note that this process is a Markov process but does not have independent increments – Why?
Some interesting questions about this process are, for instance, the probability that A will lose all his money and how this probability depends on the amount A had available at the start of the process, what the probabilities are that A will lose all his money after a certain number of bets, the expected duration of the game etc.
♪
3.3 Classification of States
State j is said to be accessible from state i if p_{i,j}^{(n)} > 0 for some n, and states i and j are said to communicate (written i ↔ j) if each is accessible from the other. For the gambling problem above:
- If 1 ≤ i < j ≤ 4 then the probability of going from i to j in j − i transitions is p^{j−i} > 0 (i.e. p_{i,j}^{(j−i)} > 0), and the probability of going from j to i in j − i transitions is q^{j−i} > 0 (i.e. p_{j,i}^{(j−i)} > 0) i.e. i is accessible from j. Therefore states 1, 2, 3, 4 all communicate with each other.
- If A's fortune is R0 then it will remain R0 i.e. for any number of transitions the probability is zero that A's fortune will change and therefore state 0 does not communicate with any other state. A state such as 0 is called absorbing. Similarly state 5 does not communicate with any other state and is also absorbing. ♪
10 (ST, E)
Proof It is given that p_{i,j}^{(m)} > 0. Suppose p_{j,k}^{(n)} > 0 for some n and some k ∈ C_1. Since i ↔ k (i and k belong to the same equivalence class C_1) there exists an s such that p_{k,i}^{(s)} > 0, so that by the Chapman-Kolmogorov equations

p_{j,i}^{(n+s)} ≥ p_{j,k}^{(n)} p_{k,i}^{(s)} > 0

from above i.e. i and j communicate. This would however mean that j ∈ C_1 since all states that communicate with i have to belong to the same equivalence class. This is a contradiction, thus we must have p_{j,k}^{(n)} = 0 for all k ∈ C_1 and all n. ■
We state the following three theorems without proof (a proof of the theorems can be found in the book by Bhat titled Applied Stochastic Processes).
From theorem 3.3.3 it follows that if state i is aperiodic (d_i = 1) and if p_{j,i}^{(m)} > 0, then there exists an N such that p_{j,i}^{(m+n)} > 0 for all n ≥ N i.e. p_{j,i}^{(s)} > 0 for all s ≥ N + m i.e. from some point onwards all p_{j,i}^{(s)}'s are > 0.
From theorem 3.3.4 it follows that all states in an equivalence class have the same period. An equivalence class of states with period 1 is called aperiodic.
Let f_{ij} be the probability, given that the process starts in state i, that the process will visit state j for the first time at some time. Note that f_{ij} > 0 if and only if state j is accessible from state i.
State i is called recurrent if f_{ii} = 1 i.e. if the process is in state i then it is certain that the process will return to state i some time.
State j is called transient if f_{jj} < 1 i.e. the process is in state j and there is a positive probability (1 − f_{jj}) that the process will never return to state j.
For the gambling problem, states 0 and 5 are recurrent since once the process is in one of these states then with probability 1 it will stay in that state i.e. it will return to that state with probability 1. States 1, 2, 3, 4 are transient states since there is a positive probability that the process will go from any of these states, say i, to state 0 and then never return to that state i.e. f_{ii} < 1.
11 (ST, E) Proof If state i is recurrent then the process, given that it started in state i, will with probability 1 return to state i at some time. As a result of the Markov property it will then be as if the process starts from scratch once the process has returned to state i – the probabilities depend only on the fact that the process starts in state i. The process will then again return to state i at some time with probability 1. With probability 1 there will be a next visit to state i and so there will be an infinite number of visits to state i with probability 1. The expected number of visits to state i is therefore infinite.
Conversely, if state i is transient then the process will visit state i again at some time with probability f_{ii} < 1. If there is a visit, the process will start over from scratch (why?) and there will be a next visit with probability f_{ii}. The number of visits to state i before no further visits will have a geometric distribution with parameter p = 1 − f_{ii}. [Regard a success as the event that there are no further visits to state i; the geometric distribution gives the probabilities that there will be a certain number of failures and then a success.] In this case the expected number of visits to state i is 1/(1 − f_{ii}) < ∞ since f_{ii} < 1.
State i is therefore recurrent if and only if the expected number of visits to state i is infinite.
Now consider the following random variable, given that the process started in state i:

I_n = { 1   if at time n the process is in state i
        0   if at time n the process is not in state i }.

Then E[I_n] = 1 × p_{i,i}^{(n)} + 0 × (1 − p_{i,i}^{(n)}) = p_{i,i}^{(n)}. Note that Σ_{n=0}^{∞} I_n = total number of visits to state i, so that the expected number of visits to state i equals Σ_{n=0}^{∞} p_{i,i}^{(n)}.
From theorem 3.3.6 it follows that if state i is transient, then Σ_{n=0}^{∞} p_{i,i}^{(n)} < ∞, from which it follows that lim_{n→∞} p_{i,i}^{(n)} = 0.
Now suppose that state i is recurrent and that i ↔ j, so that there exist m and n such that p_{j,i}^{(n)} > 0 and p_{i,j}^{(m)} > 0. Then

Σ_{l=0}^{∞} p_{j,j}^{(l)} ≥ Σ_{s=0}^{∞} p_{j,j}^{(n+s+m)}
≥ Σ_{s=0}^{∞} p_{j,i}^{(n)} p_{i,i}^{(s)} p_{i,j}^{(m)}   (Chapman-Kolmogorov equations)
= p_{j,i}^{(n)} p_{i,j}^{(m)} Σ_{s=0}^{∞} p_{i,i}^{(s)} = ∞

from theorem 3.3.6 since i is recurrent, and therefore j is also recurrent since Σ_{n=0}^{∞} p_{j,j}^{(n)} = ∞ (by theorem 3.3.6). ■
NOTE: All states in an equivalence class are therefore either all recurrent or all transient.
Theorem 3.3.10 If state j is transient then lim_{n→∞} p_{i,j}^{(n)} = 0 for all states i.
15 (ST, E) Proof The event that the process goes from state i to state j in n transitions is the union of the following mutually exclusive events:
1) the first visit to state j is after 1 transition and in the following n − 1 transitions the process goes from state j to state j,
2) the first visit to state j is after 2 transitions and in the following n − 2 transitions the process goes from state j to state j,
...
n − 1) the first visit to state j is after n − 1 transitions and in the following transition the process goes from state j to state j,
n) the first visit to state j is after n transitions.
Hence

p_{i,j}^{(n)} = Σ_{k=1}^{n−1} f_{ij}^{(k)} p_{jj}^{(n−k)} + f_{ij}^{(n)}
= Σ_{k=0}^{n} f_{ij}^{(k)} p_{jj}^{(n−k)}   where f_{ij}^{(0)} = 0 and p_{jj}^{(0)} = 1.
Then

Σ_{n=0}^{∞} p_{i,j}^{(n)} = Σ_{n=0}^{∞} Σ_{k=0}^{n} f_{ij}^{(k)} p_{jj}^{(n−k)} = Σ_{k=0}^{∞} f_{ij}^{(k)} Σ_{l=0}^{∞} p_{jj}^{(l)} = f_{ij} Σ_{l=0}^{∞} p_{jj}^{(l)} < ∞

since j is transient (theorem 3.3.6). Since Σ_{n=0}^{∞} p_{i,j}^{(n)} < ∞ it follows that lim_{n→∞} p_{ij}^{(n)} = 0 (result from analysis) for all states i.
■
If all states of a finite Markov chain were transient, then the limit of p_{ij}^{(n)} as n tends to infinity would be zero (by theorem 3.3.10) for all j, which is impossible since Σ_j p_{ij}^{(n)} = 1 for every n. A finite Markov chain must therefore have at least one recurrent state. ■
Definition 3.3.4 Suppose that state i is recurrent i.e. f_{ii} = 1. Let μ_{ii} = Σ_{n=0}^{∞} n f_{ii}^{(n)} i.e. μ_{ii} is the expected value of the number of transitions (time), given that the process starts in i, to return to state i for the first time.
State i is called positive recurrent if μ_{ii} < ∞ or null-recurrent if μ_{ii} = ∞ i.e. although the process returns to state i with probability 1, on average it may take an infinite amount of time for this to happen. A positive recurrent aperiodic state of a Markov chain is called ergodic.
Let T = {r+1, r+2, ..., m} be the set of all transient states and let T^C = {1, 2, ..., r} be the set of all recurrent states.
Suppose that i ∈ T and j ∈ T^C. Let g_{ij}^{(n)} be the probability that, given that the process begins in state i ∈ T, the process will visit only transient states for n − 1 transitions and at the n-th transition goes to state j ∈ T^C. Remember that once the process visits a recurrent state it can never go to a transient state again. The number of transitions n is therefore the number of transitions until a recurrent state is visited for the first time, after which the process never returns to a transient state.
Let g_{ij} = Σ_{n=1}^{∞} g_{ij}^{(n)} i.e. g_{ij} is the probability, given that the process started in transient state i, that the first recurrent state visited is state j.
Let G^{(n)} = [g_{ij}^{(n)}] and let G = [g_{ij}].
Theorem 3.4.2 Suppose that {X_t : t = 0, 1, 2, 3, ...} is a time-homogeneous Markov chain with m states and transition probability matrix given by

P = [ P_1  O ]
    [ R    Q ].

Then G^{(n)} = Q^{n−1} R and G = (I − Q)^{−1} R.
18 (ST) Proof Firstly we have that g_{ij}^{(1)} = p_{ij} for i ∈ T and j ∈ T^C since these are the probabilities that in 1 step the process will go from state i to state j i.e. G^{(1)} = R = IR = Q^0 R. For n > 1 we have that g_{ij}^{(n)} = Σ_{k∈T} p_{ik} g_{kj}^{(n−1)} since after 1 step the process must be in a transient state, say k, and then starting from that transient state k visit only transient states and after n − 1 steps visit a recurrent state, specifically state j, for the first time. Hence the (i,j)-th element of G^{(n)} is the i-th row of Q multiplied by the j-th column of G^{(n−1)} i.e. G^{(n)} = QG^{(n−1)}. Applying this result repeatedly we get that

G^{(n)} = QG^{(n−1)} = QQG^{(n−2)} = QQQG^{(n−3)} = ... = Q^{n−1}G^{(1)} = Q^{n−1}R.

Also

G = Σ_{n=1}^{∞} G^{(n)} = R + QR + Q²R + Q³R + ...
= (I + Q + Q² + Q³ + ...)R
= (I − Q)^{−1}R   from theorem 3.4.1.
■
For the gambling problem with p = 0.6 and q = 0.4 (transient states 1, 2, 3, 4 and recurrent states 0 and 5):

Q = [ 0    0.6  0    0   ]        R = [ 0.4  0   ]
    [ 0.4  0    0.6  0   ]            [ 0    0   ]
    [ 0    0.4  0    0.6 ]            [ 0    0   ]
    [ 0    0    0.4  0   ]            [ 0    0.6 ]

G^{(2)} = QR = [ 0     0    ]
               [ 0.16  0    ]
               [ 0     0.36 ]
               [ 0     0    ].

If A started with R2 then the probability that he loses everything after exactly 2 bets is 0.16 = 0.4 × 0.4, and the probability that A will have R5 after two bets if he started with R3 is 0.36 = 0.6 × 0.6.
Next we obtain

G^{(3)} = QG^{(2)} = [ 0.096  0     ]
                     [ 0      0.216 ]
                     [ 0.064  0     ]
                     [ 0      0.144 ].

The probability is 0.096 that A will have lost all his money after exactly 3 bets if he started with R1. Similarly the probability is 0.216 that A will have all R5 after exactly 3 bets if he started with R2. We also get that

G = (I − Q)^{−1}R = [  1    −0.6   0     0   ]^{−1} [ 0.4  0   ]
                    [ −0.4   1    −0.6   0   ]      [ 0    0   ]
                    [  0    −0.4   1    −0.6 ]      [ 0    0   ]
                    [  0     0    −0.4   1   ]      [ 0    0.6 ]

  = [ 1.54  1.35  1.07  0.64 ] [ 0.4  0   ]   [ 0.62  0.38 ]
    [ 0.90  2.25  1.78  1.07 ] [ 0    0   ] = [ 0.36  0.64 ]
    [ 0.47  1.18  2.25  1.35 ] [ 0    0   ]   [ 0.19  0.81 ]
    [ 0.19  0.47  0.90  1.54 ] [ 0    0.6 ]   [ 0.08  0.92 ].

From this we see that if A starts with R1 (and B with R4) then the probability that A will lose all his money some time is 0.62 and the probability is 0.38 that some time he will win all R5. If A starts with R2 (and B with R3) then the probabilities of losing everything or winning everything eventually are 0.36 and 0.64 respectively.
♪
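The computations of G^{(2)} and G in this example can be reproduced with a few lines of numpy (an illustrative check, not part of the original notes):

```python
import numpy as np

# Transient states 1-4 and recurrent (absorbing) states 0 and 5
# for the gambling problem with p = 0.6, q = 0.4.
Q = np.array([[0.0, 0.6, 0.0, 0.0],
              [0.4, 0.0, 0.6, 0.0],
              [0.0, 0.4, 0.0, 0.6],
              [0.0, 0.0, 0.4, 0.0]])
R = np.array([[0.4, 0.0],
              [0.0, 0.0],
              [0.0, 0.0],
              [0.0, 0.6]])

G2 = Q @ R                             # probabilities of absorption at exactly step 2
G = np.linalg.inv(np.eye(4) - Q) @ R   # eventual absorption probabilities
print(G2.round(2))
print(G.round(2))  # rows: start with R1..R4; columns: end with R0, R5
```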
Although theorem 3.4.2 is a very useful general method of calculating the probabilities and is also important from a theoretical point of view, there are certain cases where it is possible to get more explicit formulae for the probabilities, which also lead to further insights.
π_{i+1} − π_i = (q/p)(π_i − π_{i−1})   for i = 1, 2, ..., N − 1, with π_0 = 0.   (3.4.4)

Hence π_{i+1} − π_i = (q/p)^i π_1 for i = 1, 2, 3, ..., N − 1. But

π_i = π_1 + (π_2 − π_1) + (π_3 − π_2) + ... + (π_i − π_{i−1})
= π_1 [1 + (q/p) + (q/p)² + ... + (q/p)^{i−1}]
= { π_1 [1 − (q/p)^i] / [1 − (q/p)]   if p ≠ q
    i π_1                             if p = q }.   (3.4.5)
From (3.4.4) it follows that

1 = π_1 + (π_2 − π_1) + (π_3 − π_2) + ... + (π_{N−1} − π_{N−2}) + (1 − π_{N−1})
= π_1 [1 + q/p + (q/p)² + ... + (q/p)^{N−1}]
= { π_1 [1 − (q/p)^N] / [1 − q/p]   if p ≠ 1/2
    N π_1                           if p = 1/2 }.

Therefore

π_1 = { [1 − q/p] / [1 − (q/p)^N]   if p ≠ 1/2
        1/N                         if p = 1/2 }.   (3.4.6)
From (3.4.5) and (3.4.6) it follows that

π_i = { [1 − (q/p)^i] / [1 − (q/p)^N]   if p ≠ 1/2
        i/N                             if p = 1/2 }.   (3.4.7)

For p = 1/2 the probability that eventually A will win all the money available is equal to the proportion of the total capital available at the beginning of the game. If we let the total capital N tend to infinity we get that

lim_{N→∞} π_i = { 1 − (q/p)^i   if p > 1/2
                  0             if p ≤ 1/2 }.
If A plays against an adversary with an infinite amount of capital then A will eventually lose all his money with probability 1 if p ≤ 1/2. If p > 1/2 then A does have a chance to win 'all' the money and this probability is larger if he has more money available at the beginning of the game.
It is possible to obtain the results of (3.4.7) in an alternative way. If π_i is the probability that if player A starts with Ri he will eventually win all the money, then by conditioning on the outcome of the first bet, we get that

π_i = p π_{i+1} + q π_{i−1}.   (3.4.8)

We get this equation by arguing as follows: if A wins the first bet (which has probability p) he will have R(i+1) and then the probability that he will eventually win all the money is π_{i+1} – by using the Markov property. Similarly if A loses the first bet (with probability q) he will have R(i−1) and the probability of then eventually winning all the money is π_{i−1}.
Equation (3.4.8) is a homogeneous difference equation of order 2. The general solution of such an equation has two unknown constants in it. To determine these constants we make use of the two boundary conditions π_0 = 0 (if A starts with R0 the probability of eventually winning all the money is 0) and π_N = 1 (if A starts with RN the probability of eventually winning all the money is 1). It is easy to check that (3.4.7) is a solution of (3.4.8) satisfying these two conditions.
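As an illustrative sketch (not part of the original notes), formula (3.4.7) can be checked against the matrix result G = (I − Q)^{−1}R for N = 5 and p = 0.6:

```python
import numpy as np

def ruin_win_prob(i, N, p):
    """Formula (3.4.7): probability that A, starting with R i out of total R N,
    eventually wins all the money."""
    q = 1 - p
    return i / N if p == 0.5 else (1 - (q / p) ** i) / (1 - (q / p) ** N)

# Cross-check against G = (I - Q)^{-1} R for N = 5, p = 0.6.
N, p, q = 5, 0.6, 0.4
Q = np.diag([p] * (N - 2), 1) + np.diag([q] * (N - 2), -1)  # transient states 1..N-1
R = np.zeros((N - 1, 2)); R[0, 0] = q; R[-1, 1] = p         # columns: absorb at 0, N
G = np.linalg.inv(np.eye(N - 1) - Q) @ R
print([round(ruin_win_prob(i, N, p), 2) for i in range(1, N)])  # [0.38, 0.64, 0.81, 0.92]
print(G[:, 1].round(2))                                          # same values
```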
Suppose that i ∈ T and j ∈ T. Let h_{ij}^{(n)} be the probability, given that the process started in state i, that state j will be visited n times before the process visits some recurrent state. Let m_{ij} = Σ_{n=1}^{∞} n h_{ij}^{(n)} i.e. m_{ij} is the expected number of visits, given that the process starts in state i, to a transient state j before the process visits a recurrent state for the first time. Let T_{ij} be the random variable such that P[T_{ij} = n] = h_{ij}^{(n)} i.e. m_{ij} = E[T_{ij}].
Let T_i = Σ_{j∈T} T_{ij} i.e. T_i is the total number of visits to transient states given that the process started in state i. This means that T_i is the total time (number of transitions) spent in transient states. Thus E[T_i] = Σ_{j∈T} m_{ij} which is the sum of the elements in the i-th row of M.
Let X_i be the number of bets until the game ends, given that A starts with Ri, and let m_i = E[X_i]. Let Z be the outcome of the first bet (Z = 1 with probability p and Z = −1 with probability q). After the first bet the game starts afresh from A's new fortune i + Z, so X_i = 1 + X'_{i+Z} where X'_{i+Z} is distributed as X_{i+Z}. Then

m_i = E_Z[E[X_i | Z]] = q E[X_i | Z = −1] + p E[X_i | Z = 1].   (3.4.9)

But

E[X_i | Z = −1] = E[1 + X'_{i+Z} | Z = −1]
= E[1 + X'_{i−1} | Z = −1]
= E[1 + X'_{i−1}]   (since what happens after the first bet is independent of what happened in the first bet)
= 1 + m_{i−1}.

Similarly

E[X_i | Z = 1] = E[1 + X'_{i+Z} | Z = 1]
= E[1 + X'_{i+1} | Z = 1]
= E[1 + X'_{i+1}]
= 1 + m_{i+1}.

From (3.4.9) it then follows that

m_i = q(1 + m_{i−1}) + p(1 + m_{i+1})
or p m_{i+1} − m_i + q m_{i−1} = −1.   (3.4.10)

This is an inhomogeneous difference equation of the second order and the solution required must satisfy the boundary conditions m_0 = m_N = 0.
It is easy enough to check that the solution in this case is given by

m_i = { (1/(2p − 1)) [ N(1 − (q/p)^i)/(1 − (q/p)^N) − i ]   for p ≠ q
        i(N − i)                                            for p = q }.   (3.4.11)
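Similarly, (3.4.11) can be checked against the row sums of M = (I − Q)^{−1}, which give the expected total time spent in transient states (an illustrative numpy sketch added to these notes):

```python
import numpy as np

def expected_duration(i, N, p):
    """Formula (3.4.11): expected number of bets until the game ends."""
    q = 1 - p
    if p == 0.5:
        return i * (N - i)
    return (N * (1 - (q / p) ** i) / (1 - (q / p) ** N) - i) / (2 * p - 1)

N, p, q = 5, 0.6, 0.4
Q = np.diag([p] * (N - 2), 1) + np.diag([q] * (N - 2), -1)
M = np.linalg.inv(np.eye(N - 1) - Q)   # m_ij: expected visits to transient state j
durations = M.sum(axis=1)              # E[T_i]: row sums = expected time to absorption
print(durations.round(3))
print([round(expected_duration(i, N, p), 3) for i in range(1, N)])  # same values
```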
Suppose that the process starts in state i. The time (i.e. number of steps) taken by the process to go from state i to state j for the first time is called the first passage time of the transition i → j. When j = i we call the number of steps required for such a transition the recurrence time of state i.
Let F_{ij} be the first passage time of the transition i → j and let {f_{ij}^{(n)}} be its distribution i.e.

f_{ij}^{(n)} = P[X_n = j, X_r ≠ j for r = 1, 2, ..., n−1 | X_0 = i] = P[F_{ij} = n].
We are considering only the case of a single recurrent equivalence class (since the process is irreducible). Note then that all the states are recurrent since the process is finite, so we cannot transform P into the canonical form of (3.4.1). To determine f_{ij}^{(n)} we therefore modify the transition probability matrix P so that state j becomes absorbing i.e. we make p_{jj} = 1 and p_{ji} = 0 for all i ≠ j and leave all other transition probabilities unchanged. This new state j is recurrent. All other states of the chain then become transient because there is a positive probability to go from any state i ≠ j to j and then never return to state i. For the original process the probability of starting in state i and then visiting state j for the first time after n steps is the same as the probability, in the changed process, of starting in transient state i and then visiting a recurrent state, specifically state j, for the first time after n steps. Let P* be the changed probability matrix in canonical form i.e.

          j      1     2    ...   m
   j  [   1   |  0     0    ...   0    ]
      [ ----- | ---------------------- ]
   1  [ p_1j  |  p_11  p_12 ...   p_1m ]
P* = 2 [ p_2j |  p_21  p_22 ...   p_2m ]
      [  ...  |                        ]
   m  [ p_mj  |  p_m1  p_m2 ...   p_mm ]

where state j is the only recurrent state in P* and states 1, 2, ..., m with j excluded are the m − 1 transient states in P*. Let

P* = [ 1  O ]
     [ R  Q ]

where R is the j-th column of P with the j-th element excluded and Q is the matrix P with row j and column j excluded. The distribution of the first passage times from states i = 1, 2, ..., j−1, j+1, ..., m to state j is then given by theorem 3.4.2 as

F^{(n)} = Q^{n−1} R.
Theorem 3.5.1 Let P : m × m be a stochastic matrix (i.e. all elements are non-negative and the sum of each row is equal to 1). Let ε be the smallest element of P, so ε ≥ 0. Let X : m × 1 be any vector with minimum component a_0 and maximum component b_0. Let a_1 and b_1 be the minimum and maximum components respectively of PX. Then

a_0 ≤ a_1 and b_1 ≤ b_0 and b_1 − a_1 ≤ (1 − 2ε)(b_0 − a_0).
20 (CT)
Proof Let X* be the column vector obtained from X by replacing all components of X, except the minimum component, by the maximum component b_0. If the minimum component a_0 is in position k, then

X = (x_1, x_2, ..., a_0, ..., x_m)'   and   X* = (b_0, b_0, ..., a_0, ..., b_0)'.

Therefore X ≤ X* i.e. every component of X is less than or equal to the corresponding component of X*. Then

Σ_{j=1}^{m} p_{ij} x_j ≤ Σ_{j=1}^{m} p_{ij} x*_j   or   PX ≤ PX*.

Furthermore

Σ_{j=1}^{m} p_{ij} x*_j = Σ_{j≠k} p_{ij} b_0 + p_{ik} a_0
= (1 − p_{ik}) b_0 + p_{ik} a_0   (since Σ_{j=1}^{m} p_{ij} = 1)
= b_0 − p_{ik}(b_0 − a_0)
≤ b_0 − ε(b_0 − a_0)   (since all p_{ij} ≥ ε and b_0 − a_0 ≥ 0)

and since this is true for all i, i.e. for all elements of PX, we have that

b_1 ≤ b_0 − ε(b_0 − a_0).   (3.5.1)

Now apply the above result to −X. Note that the minimum element of −X is −b_0 and the maximum element is −a_0. Similarly the maximum element of P(−X) = −PX is −a_1. Hence

−a_1 ≤ (−a_0) − ε[(−a_0) − (−b_0)]
or −a_1 ≤ −a_0 − ε(−a_0 + b_0).   (3.5.2)

Adding (3.5.1) and (3.5.2) gives b_1 − a_1 ≤ (1 − 2ε)(b_0 − a_0), and since ε ≥ 0, (3.5.1) gives b_1 ≤ b_0 while (3.5.2) gives a_0 ≤ a_1. ■
NOTE: In case all elements of P are strictly positive it follows that ε > 0. In this case 1 − 2ε < 1 and then b_1 − a_1 < b_0 − a_0.
Since 1 − 2ε < 1 we see that lim_{n→∞} d_n ≤ lim_{n→∞} (1 − 2ε)^n = 0, so lim_{n→∞} d_n = 0 i.e. lim_{n→∞} a_n = lim_{n→∞} b_n.
This means that the minimum and maximum of the elements of P^n e_j tend to the same limit i.e. all elements of P^n e_j tend to the same limit. But P^n e_j is the j-th column of P^n i.e. all elements of the j-th column of P^n tend to the same limit, say π_j.
Recall that 0 < a_1 ≤ a_n ≤ b_n ≤ b_1 < 1 so that 0 < lim_{n→∞} a_n = π_j < 1.
Note also that 1 = Σ_{j=1}^{m} p_{ij}^{(n)} for all n and for all i i.e. 1 = Σ_{j=1}^{m} lim_{n→∞} p_{ij}^{(n)} = Σ_{j=1}^{m} π_j.
2. Now suppose that not all elements of P are non-zero. (NOT REQUIRED)
From theorem 3.3.5 it follows that there exists an N such that for all n ≥ N the p_{ij}^{(n)} are all non-zero. Using P^N instead of P as above, we get

d_{kN} ≤ (1 − 2ε_N)^k

where ε_N is the smallest element of P^N.
Note that theorem 3.5.1 is true even if ε is zero i.e. if a_0 and b_0 are the minimum and maximum elements of X, and if a_1 and b_1 are the minimum and maximum elements of PX, then (b_1 − a_1) ≤ (b_0 − a_0). Hence if d_{kN} is the difference between the maximum and minimum elements of P^{kN} e_j and d_{kN+l} is the difference between the maximum and minimum elements of P^{kN+l} e_j, then since P^{kN+l} e_j = P^l P^{kN} e_j, we have that

d_{kN+l} ≤ d_{kN} ≤ (1 − 2ε_N)^k   for l = 1, 2, 3, ..., N − 1.

Therefore we again obtain that lim_{n→∞} d_n = 0 i.e. lim_{n→∞} a_n = lim_{n→∞} b_n = π_j say. ■
Hence v′ = π′. ■
NOTE: From the last part of the proof above, i.e. (3.5.3), it follows that π′P^n = π′ for all n and in particular π′P = π′. These equations together with the equation π′1 = 1 provide a set of equations we can solve to determine the stationary distribution IF it exists, which according to theorem 3.5.2 will be the case if the Markov chain is aperiodic, irreducible and finite.
Examples a) If P = [ 0.3  0.7 ]  find the limiting distribution.
                   [ 0.1  0.9 ]

b) If P = [ 0.2  0.3  0.5 ]  find the limiting distribution.
          [ 0.1  0.5  0.4 ]
          [ 0    0    1   ]

c) If P = [ 0.3  0.7  0 ]  find the limiting distribution.
          [ 0.4  0.6  0 ]
          [ 1    0    0 ]
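For example a), the equations π′P = π′ and π′1 = 1 can be solved numerically as follows (an illustrative sketch added to these notes; the helper name stationary is an assumption of this sketch):

```python
import numpy as np

def stationary(P):
    """Solve pi' P = pi' together with pi' 1 = 1, i.e. the linear system
    (P' - I) pi = 0 with the normalisation row appended."""
    m = P.shape[0]
    A = np.vstack([P.T - np.eye(m), np.ones(m)])
    b = np.zeros(m + 1); b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

P = np.array([[0.3, 0.7],
              [0.1, 0.9]])
print(stationary(P))                     # [0.125 0.875]
print(np.linalg.matrix_power(P, 50)[0])  # rows of P^n converge to the same vector
```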
NOTE: Consider the problem of obtaining lim_{n→∞} P^n when the Markov chain has more than one recurrent equivalence class. For a transient state i and a state j in a recurrent class C_l one obtains

lim_{n→∞} p_{ij}^{(n)} = Σ_{k∈C_l} Σ_{r=1}^{∞} g_{ik}^{(r)} π_j = g_i(C_l) π_j,   i ∈ T, j ∈ C_l.
Example Find the limiting distribution if

P = [ 0.3  0.7  0    0    0    0   ]
    [ 0.5  0.5  0    0    0    0   ]
    [ 0.3  0.2  0    0    0    0.5 ]
    [ 0    0    0.8  0    0.2  0   ]
    [ 0    0    0    0.6  0    0.4 ]
    [ 0    0    0    0    0    1   ].
Let μ_{ij} = E[F_{ij}] be the expected first passage time from state i to state j, and let K be the state visited at the first step. If the first step is to state k ≠ j, then F_{ij} = 1 + F*_{kj}, where F*_{kj} is the transition time from state k to state j if the first step is not to j.
Therefore

E[F_{ij} | K = k] = { 1            if the first step is to k = j
                      1 + μ_{kj}   if the first step is to k ≠ j }

because of the Markov property (thus the probabilities for the random variables F_{kj} and F*_{kj} are the same and thus their expected values are the same).
Therefore

μ_{ij} = E_K[E[F_{ij} | K]]
= 1 × p_{ij} + Σ_{k=1, k≠j}^{m} (1 + μ_{kj}) p_{ik}
= 1 + Σ_{k=1, k≠j}^{m} μ_{kj} p_{ik}   (since Σ_{k=1}^{m} p_{ik} = 1)
= 1 + Σ_{k=1}^{m} p_{ik} μ_{kj} − p_{ij} μ_{jj}.
Let μ = [μ_{ij}], and let

μ_D = [ μ_11  0     ...  0    ]        E = [ 1  1  ...  1 ]
      [ 0     μ_22  ...  0    ]            [ 1  1  ...  1 ]
      [ ...                   ]            [ ...          ]
      [ 0     0     ...  μ_mm ]            [ 1  1  ...  1 ].

Then

μ = E + Pμ − Pμ_D   (3.5.4)

since

Pμ_D = [ p_11  ...  p_1m ] [ μ_11 ... 0    ]   [ p_11 μ_11  p_12 μ_22  ...  p_1m μ_mm ]
       [ p_21  ...  p_2m ] [ ...           ] = [ p_21 μ_11  p_22 μ_22  ...  p_2m μ_mm ]
       [ ...             ] [               ]   [ ...                                  ]
       [ p_m1  ...  p_mm ] [ 0   ...  μ_mm ]   [ p_m1 μ_11  p_m2 μ_22  ...  p_mm μ_mm ].

NOTE: Since μ_{ii} = 1/π_i < ∞ (since all π_i > 0) for an irreducible, aperiodic, finite chain, we see that all states are positive recurrent.
Occupation Times
By occupation time we mean the number of times (steps) the process occupies a certain state in a given period. Let N_{ij}^{(n)} be the number of times the process visits state j in n steps, given that initially the process was in state i. Then N_{ij}^{(n)}/n is the fraction of time the process visits j in n steps.
Let

Y_{ij}^{(k)} = { 1   if X_k = j, given X_0 = i
                 0   otherwise }.

Then

P[Y_{ij}^{(k)} = 1] = p_{ij}^{(k)}   and   P[Y_{ij}^{(k)} = 0] = 1 − p_{ij}^{(k)}

and hence E[Y_{ij}^{(k)}] = p_{ij}^{(k)}. We also have that N_{ij}^{(n)} = Σ_{k=1}^{n} Y_{ij}^{(k)} and therefore

E[(1/n) N_{ij}^{(n)}] = E[(1/n) Σ_{k=1}^{n} Y_{ij}^{(k)}] = (1/n) Σ_{k=1}^{n} p_{ij}^{(k)}.
Theorem 3.5.5 If lim_{k→∞} a_k = a then lim_{n→∞} (1/n) Σ_{k=1}^{n} a_k = a.
Proof (not required) Given ε > 0 there exists N(ε) such that |a_k − a| < ε/2 for all k > N(ε). Since lim_{n→∞} (1/n) Σ_{k=1}^{N(ε)} (a_k − a) = 0 there exists an M(ε) such that for n > M(ε)

| (1/n) Σ_{k=1}^{N(ε)} (a_k − a) | < ε/2.

Hence for n > max(N(ε), M(ε))

| (1/n) Σ_{k=1}^{n} a_k − a | ≤ | (1/n) Σ_{k=1}^{N(ε)} (a_k − a) | + (1/n) Σ_{k=N(ε)+1}^{n} |a_k − a|
≤ ε/2 + ((n − N(ε))/n) (ε/2)
≤ ε

i.e. lim_{n→∞} (1/n) Σ_{k=1}^{n} a_k = a. ■
Theorem 3.5.6 Let N_{ij}^{(n)} be the number of times that an aperiodic, irreducible m-state Markov chain visits state j in n steps given that initially the process was in state i. Then

lim_{n→∞} E[(1/n) N_{ij}^{(n)}] = lim_{n→∞} (1/n) Σ_{k=1}^{n} p_{ij}^{(k)} = π_j.

24 (CT) Proof From theorem 3.5.2 we have that lim_{n→∞} p_{ij}^{(n)} = π_j and from theorem 3.5.5 it then follows that lim_{n→∞} (1/n) Σ_{k=1}^{n} p_{ij}^{(k)} = π_j. ■
NOTE: Theorem 3.5.6 shows that the limiting probabilities, i.e. the π_j's, are also the fractions of time the Markov chain can be expected to occupy the various states in a large number of steps. From theorem 3.5.4 we also have that π_i = 1/μ_{ii} where μ_{ii} is the expected number of steps for a first return to i starting from i i.e. if it takes a long time on average to return to i we have a small probability that we will find the process in i and i will be occupied only a small fraction of the time.
Consider a Markov chain with states 0, 1, ..., a in which, from state i, the process moves up to state i + 1 with probability α_i and down to state i − 1 with probability 1 − α_i (from state 0 it stays in 0 with probability 1 − α_0, and from state a it stays in a with probability α_a) i.e.

P = [ 1−α_0  α_0    0        0    ...                     ]
    [ 1−α_1  0      α_1      0    ...                     ]
    [ 0      1−α_2  0        α_2  0    ...                ]
    [ 0      0      1−α_3    0    α_3  0   ...            ]
    [ ...                                                 ]
    [ 0  ...  0     1−α_{a−1}     0        α_{a−1}        ]
    [ 0  ...  0     0             1−α_a    α_a            ].

The equations to solve for the limit probabilities π_0, π_1, ..., π_a are as follows:

π_0 = π_0(1 − α_0) + π_1(1 − α_1)
π_1 = π_0 α_0 + π_2(1 − α_2)
π_2 = π_1 α_1 + π_3(1 − α_3)
π_3 = π_2 α_2 + π_4(1 − α_4)
...
π_i = π_{i−1} α_{i−1} + π_{i+1}(1 − α_{i+1})
...
π_{a−1} = π_{a−2} α_{a−2} + π_a(1 − α_a)
π_a = π_{a−1} α_{a−1} + π_a α_a.

From the first equation we can solve for π_1 in terms of π_0, namely π_1 = (α_0/(1 − α_1)) π_0.
From the second equation we can then solve for π_2 in terms of π_0, namely

π_2 = (α_0 α_1 / ((1 − α_1)(1 − α_2))) π_0.

In general we can solve for π_i in terms of π_0 from the i-th equation, namely

π_i = (α_0 α_1 ⋯ α_{i−1} / ((1 − α_1)(1 − α_2)⋯(1 − α_i))) π_0.

Since the sum of all the π_i's is equal to 1 we must therefore have that

1 = [ 1 + Σ_{i=1}^{a} (α_0 α_1 ⋯ α_{i−1}) / ((1 − α_1)(1 − α_2)⋯(1 − α_i)) ] π_0

i.e. π_i = [ 1 + Σ_{i=1}^{a} (α_0 α_1 ⋯ α_{i−1}) / ((1 − α_1)(1 − α_2)⋯(1 − α_i)) ]^{−1} × (α_0 α_1 ⋯ α_{i−1}) / ((1 − α_1)(1 − α_2)⋯(1 − α_i)).

[NOTE: This is a solution of the equations even if 1 − α_0 = 0 and α_a = 0 i.e. the Ehrenfest model.]
♪
Experience Rating
When deciding the premium that a policyholder should pay, many insurance companies
use information on the number of claims that the policyholder has actually made in
previous years. This is done on the grounds that this gives a better indication of the
likelihood of the policyholder making a claim in the future. Those who have made more
claims in the past are charged higher premiums than those who have made fewer claims.
This is particularly common in the case of motor insurance, where a No Claims Discount
(NCD) system is used by most insurance companies. Some insurers also use such a
system for other types of insurance such as household, group life and medical cover. A
NCD system operates by giving the policyholder a discount on the normal premium,
which is related directly to the number of "claim free years" that the policyholder has
experienced. The discount is expressed as a percentage of the normal premium. The
greater the number of past claim free years, the higher the level of the discount.
When deciding whether to make a claim, the policyholder then has to consider the effect it will have on the premium in subsequent years.
There are two parts to a NCD system: the discount categories and a set of rules for moving between these categories. In addition, in order to investigate the properties of a NCD system the probability that a policyholder makes a claim each year needs to be known.
The categories are often referred to as the number of "claim free years". However, the
rules for moving between categories are usually such that they do not actually relate to
the number of years since a claim. Rather than a claim resulting in a policyholder
returning to having no discount, it is common for the policyholder to simply move to
another category with a lower level of discount.
In category 0, the policyholder pays the full premium, which in practice will vary
between individuals according to their own personal circumstances such as age.
In category 1, the policyholder pays 75% of the full premium and in category 2 only 60%
of the full premium.
If a policyholder makes no claim in a year, he or she moves to the next higher category (or stays in category 2). If one or more claims is made, he or she moves down one category (or stays at zero discount). ♪
Let us consider an NCD system with m + 1 categories, namely 0, 1, 2, ..., m. Let p_{ij} be the probability that during a year a policyholder will move from category i to category j. These transition probabilities written in matrix form are as follows:

P = [ p_00  p_01  p_02  ...  p_0m ]
    [ p_10  p_11  p_12  ...  p_1m ]
    [ p_20  p_21  p_22  ...  p_2m ]
    [ ...                         ]
    [ p_m0  p_m1  p_m2  ...  p_mm ].

If we assume that these transition probabilities are the same for all years and that whether or not a claim occurs during a year is independent of whether or not a claim occurs during any other year, matrix P will be the transition probability matrix for a finite, time-homogeneous Markov chain. We will assume that for all i and j there exists a number of years, say n (which may depend on i and j), such that it is possible to go from category i to category j with a positive probability in n years. This means that all states communicate i.e. the Markov chain is irreducible, and since it is a finite Markov chain all states (i.e. categories) are recurrent.
Suppose that there are N policyholders in total, of whom N_i are in category i at time 0, and let q_i = N_i/N. Let X_{iu}^{(n)} be the category the u-th policyholder in category i at time 0 is in after n years.
Let

Y_{iu,j}^{(n)} = { 1   if X_{iu}^{(n)} = j
                   0   if X_{iu}^{(n)} ≠ j }

i.e. Y_{iu,j}^{(n)} is 1 if the u-th policyholder in category i at time 0 is in state j at time n. Hence Y_{iu,j}^{(n)} is equal to 1 with probability p_{ij}^{(n)} and 0 otherwise and therefore

E[Y_{iu,j}^{(n)}] = 1 × p_{ij}^{(n)} + 0 × (1 − p_{ij}^{(n)}) = p_{ij}^{(n)}.

We further have that N_{i,j}^{(n)} = Σ_{u=1}^{N_i} Y_{iu,j}^{(n)} is the number of policyholders which were in category i at time 0 who are in category j at time n, and E[N_{i,j}^{(n)}] = N_i × p_{ij}^{(n)}. Let N_j^{(n)} = Σ_{i=0}^{m} N_{i,j}^{(n)} i.e. it is the number of all policyholders which at time n are in category j. Hence E[N_j^{(n)}] = Σ_{i=0}^{m} N_i × p_{ij}^{(n)}. The expected proportion of policyholders in category j at time n is then given by

q_j^{(n)} = E[Σ_{i=0}^{m} N_{i,j}^{(n)} / N] = Σ_{i=0}^{m} (N_i/N) × p_{ij}^{(n)} = Σ_{i=0}^{m} q_i × p_{ij}^{(n)}.
Suppose all policyholders start in category 0 at time 0, i.e. q′ = (1  0  0), and that the transition probability matrix is

P = [ 0.2  0.8  0   ]
    [ 0.2  0    0.8 ]
    [ 0    0.2  0.8 ].

Then q′_1 = (1  0  0) P = (0.2  0.8  0)

i.e. after 1 period the expected proportion in category 0 is only 0.2 whereas the expected proportion in category 1 is 0.8. Also

q′_2 = (0.2  0.8  0) [ 0.2  0.8  0   ]
                     [ 0.2  0    0.8 ] = (0.2  0.16  0.64)
                     [ 0    0.2  0.8 ]

i.e. after 2 periods the expected proportion in state 2 is 0.64 etc.
If the full premium is R200 then the average premium after two steps will be

200(0.2) + 200(0.75)(0.16) + 200(0.6)(0.64) = 140.80.
♪
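The example can be reproduced with a short numpy sketch (illustrative only; P below is the matrix inferred above):

```python
import numpy as np

P = np.array([[0.2, 0.8, 0.0],    # category 0: claim (0.2) stay, no claim (0.8) move up
              [0.2, 0.0, 0.8],    # category 1: down on a claim, up otherwise
              [0.0, 0.2, 0.8]])   # category 2: down on a claim, stay otherwise
q0 = np.array([1.0, 0.0, 0.0])    # everyone starts with no discount

q2 = q0 @ np.linalg.matrix_power(P, 2)   # expected proportions after 2 years
discounts = np.array([1.0, 0.75, 0.60])  # fraction of the full premium paid
print(q2)                                # [0.2 0.16 0.64]
print(200 * (q2 * discounts).sum())      # 140.8, the average premium
```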
To prove the existence of the limit probabilities in this case we need the following theorem, which we state without proof.

Theorem 3.6.1 Let a_k ≥ 0 with a_1 > 0 and Σ_{k=0}^{∞} a_k = 1, and let b_k be such that Σ_{k=0}^{∞} |b_k| < ∞. Suppose that the sequence u_n satisfies

u_n − Σ_{k=0}^{n} a_k u_{n−k} = b_n,   n = 0, 1, 2, ....

Then lim_{n→∞} u_n exists and

lim_{n→∞} u_n = Σ_{k=0}^{∞} b_k / Σ_{k=0}^{∞} k a_k   if Σ_{k=0}^{∞} k a_k < ∞, and lim_{n→∞} u_n = 0 otherwise.

NOTE: Theorem 3.6.1 can be generalised to the case where it is not necessary to assume that a_1 > 0, but then it must be assumed that the greatest common divisor of all k such that a_k > 0 is equal to one.
In applications to Markov chains this will have the implication that the chain must be aperiodic.
We also need the following theorem, which will be proved.

Theorem 3.6.2 (will be given) Let y_n = Σ_{k=0}^{n} a_{n−k} x_k where a_m ≥ 0, Σ_{m=0}^{∞} a_m = a < ∞ and lim_{k→∞} x_k = c where c is a real number. Then lim_{n→∞} y_n = ac = Σ_{m=0}^{∞} a_m c.
Proof (not required) We have that

y_n − ac = Σ_{k=0}^{n} a_{n−k} x_k − c Σ_{m=0}^{∞} a_m
= Σ_{k=0}^{n} a_{n−k}(x_k − c) − c Σ_{m=n+1}^{∞} a_m.

Given ε > 0 let K(ε) be such that |x_k − c| < ε/3a for all k ≥ K(ε). Then for n > K(ε)

y_n − ac = Σ_{k=0}^{K(ε)} a_{n−k}(x_k − c) + Σ_{k=K(ε)+1}^{n} a_{n−k}(x_k − c) − c Σ_{m=n+1}^{∞} a_m.

Let M = max_{k≥0} |x_k − c|. Then let N(ε) ≥ K(ε) and also be such that for all n ≥ N(ε) it is true that

|c| Σ_{m=n+1}^{∞} a_m < ε/3   and   Σ_{k=0}^{K(ε)} a_{n−k} = Σ_{m=n−K(ε)}^{n} a_m < ε/3M.

Then for all n ≥ N(ε)

|y_n − ac| ≤ M (ε/3M) + a (ε/3a) + ε/3
= ε

i.e. lim_{n→∞} y_n = ac. ■
Proof Note that p_{ii}^{(0)} = 1 and f_{ii}^{(0)} = 0 i.e. p_{ii}^{(0)} − f_{ii}^{(0)} p_{ii}^{(0)} = 1, and in general

p_{ii}^{(n)} = Σ_{k=0}^{n} f_{ii}^{(k)} p_{ii}^{(n−k)}   for n ≥ 1 (see proof of theorem 3.3.10).

Hence

p_{ii}^{(n)} − Σ_{k=0}^{n} f_{ii}^{(k)} p_{ii}^{(n−k)} = { 1 for n = 0
                                                           0 for n ≥ 1 }.

Now apply theorem 3.6.1 with u_n = p_{ii}^{(n)} and a_k = f_{ii}^{(k)} [then a_k ≥ 0 for all k and Σ_{k=0}^{∞} a_k = 1 since all states are recurrent; b_0 = 1 and b_k = 0 for all k ≥ 1 so that Σ_{k=0}^{∞} b_k = 1; since all states are aperiodic the greatest common divisor of all k such that f_{ii}^{(k)} > 0 is 1 (assume this without proof); and Σ_{k=0}^{∞} k a_k = Σ_{k=0}^{∞} k f_{ii}^{(k)} = μ_{ii}.]
Therefore lim_{n→∞} p_{ii}^{(n)} exists, and setting y_n = p_{ji}^{(n)}, a_n = f_{ji}^{(n)} and x_n = p_{ii}^{(n)} and using theorem 3.6.2 we obtain that

lim_{n→∞} p_{ji}^{(n)} = lim_{n→∞} p_{ii}^{(n)}   if j ≠ i. ■
Proof If i ∈ C and j ∈ C then i ↔ j i.e. there exist n and m such that p_{ij}^{(n)} > 0 and p_{ji}^{(m)} > 0. Furthermore p_{jj}^{(m+v+n)} ≥ p_{ji}^{(m)} p_{ii}^{(v)} p_{ij}^{(n)} by the Chapman-Kolmogorov equations. As v → ∞, we get that

π_j ≥ p_{ji}^{(m)} p_{ij}^{(n)} π_i > 0   since π_i > 0

for all j ∈ C. ■
NOTE: Theorem 3.6.4 shows that positive recurrence is a class property. Similarly null recurrence is a class property – for if π_i = lim_{n→∞} p_{ii}^{(n)} = 0 for some i ∈ C then π_j = 0 for all j ∈ C, since if one of the π_j's is > 0 then all π_j's are > 0. The theorem also holds if the whole process is irreducible, aperiodic and recurrent.
We call π_i = lim_{n→∞} p_{ii}^{(n)} for i = 0, 1, 2, 3, ... the limiting distribution of the states of the Markov chain.
Theorem 3.6.5 (will be given) Suppose that a_{n,k} for n = 1, 2, 3, ... and k = 0, 1, 2, 3, ... are real numbers such that |a_{n,k}| ≤ M for all n = 1, 2, 3, ... and k = 0, 1, 2, 3, ..., and lim_{n→∞} a_{n,k} = a_k for k = 0, 1, 2, 3, .... Further, suppose that x_k, k = 0, 1, 2, 3, ... is a sequence of real numbers such that Σ_{k=0}^{∞} |x_k| < ∞. Then

lim_{n→∞} Σ_{k=0}^{∞} x_k a_{n,k} = Σ_{k=0}^{∞} x_k lim_{n→∞} a_{n,k} = Σ_{k=0}^{∞} x_k a_k.
Proof (not required) Given ε > 0 there exists N(ε) such that Σ_{k=n}^{∞} |x_k| < ε/3M for all n ≥ N(ε). Let Q = max{|x_k| : k = 0, 1, 2, 3, ..., N(ε)}. Then there exists M(ε) such that

|a_{n,k} − a_k| < ε/3QN(ε)   for all n ≥ M(ε), for k = 0, 1, 2, 3, ..., N(ε).

Hence for all n ≥ M(ε)

| Σ_{k=0}^{∞} x_k a_{n,k} − Σ_{k=0}^{∞} x_k a_k | ≤ Σ_{k=0}^{N(ε)} |x_k| |a_{n,k} − a_k| + Σ_{k=N(ε)+1}^{∞} |x_k| |a_{n,k}| + Σ_{k=N(ε)+1}^{∞} |x_k| |a_k|
≤ Q N(ε) (ε/3QN(ε)) + M (ε/3M) + M (ε/3M)
≤ ε.
■
Multiply both sides of (3.6.3) by $p_{ji}$ and then sum over all values of $j$ to get
$$\sum_{j=0}^{\infty} \pi_j p_{ji} \ge \sum_{j=0}^{\infty} p_{ji} \sum_{k=0}^{\infty} \pi_k p_{kj} = \sum_{k=0}^{\infty} \pi_k \sum_{j=0}^{\infty} p_{kj}\, p_{ji} = \sum_{k=0}^{\infty} \pi_k\, p_{ki}^{(2)} .$$
But $\pi_i \ge \sum_{j=0}^{\infty} \pi_j p_{ji}$ by (3.6.3), thus
$$\pi_i \ge \sum_{k=0}^{\infty} \pi_k\, p_{ki}^{(2)}$$
or, changing subscripts,
$$\pi_j \ge \sum_{k=0}^{\infty} \pi_k\, p_{kj}^{(2)} \quad\text{for all values of } j . \qquad (3.6.4)$$
Repeating the above procedure (multiplying both sides of (3.6.4) by $p_{ji}$ and then adding over all values of $j$) we get
$$\pi_j \ge \sum_{k=0}^{\infty} \pi_k\, p_{kj}^{(3)} \quad\text{for all values of } j$$
and in general
$$\pi_j \ge \sum_{k=0}^{\infty} \pi_k\, p_{kj}^{(n)} \quad\text{for all values of } j \text{ and all values of } n . \qquad (3.6.5)$$
Now suppose that one of the inequalities in (3.6.5) is a strict inequality. If we then sum over all values of $j$ we get, since each of the rows of $P^n$ sums to 1,
$$\sum_{j=0}^{\infty} \pi_j > \sum_{j=0}^{\infty}\sum_{k=0}^{\infty} \pi_k\, p_{kj}^{(n)} = \sum_{k=0}^{\infty} \pi_k \sum_{j=0}^{\infty} p_{kj}^{(n)} = \sum_{k=0}^{\infty} \pi_k ,$$
which is a contradiction, so equality must hold in (3.6.5) for every $j$ and $n$.
Now suppose that $x_0, x_1, x_2, \ldots$ is any set of numbers satisfying
$$x_i \ge 0, \quad \sum_{i=0}^{\infty} x_i = 1 \quad\text{and}\quad x_j = \sum_{i=0}^{\infty} x_i p_{ij} \text{ for all } j = 0,1,2,\ldots .$$
Then by the same methods as above we can show that
$$x_j = \sum_{k=0}^{\infty} x_k p_{kj} = \sum_{k=0}^{\infty} x_k\, p_{kj}^{(n)} \quad\text{for all } j \text{ and } n .$$
Hence if we let $n \to \infty$ and use theorem 3.6.5 we get that
$$x_j = \sum_{k=0}^{\infty} x_k \pi_j = \pi_j \sum_{k=0}^{\infty} x_k = \pi_j ,$$
i.e. any stationary distribution coincides with the limiting distribution.
With $n_{ij}$ denoting the number of observed transitions from state $i$ to state $j$, the likelihood of the observed path factorises as $L = L_1 L_2 \cdots L_s$ where $L_i = p_{i1}^{n_{i1}}\, p_{i2}^{n_{i2}}\cdots p_{is}^{n_{is}}$.
Note that the set of all the $n_{ij}$'s is a set of sufficient statistics for all the $p_{ij}$'s. Also note that $L_i$ only depends on $p_{i1}, p_{i2}, \ldots, p_{i,s-1}$ and that these parameters occur only in $L_i$. That means that to maximize $L$ we can maximize each $L_i$ individually with respect to the parameters that occur in it.
We will maximize $L_i$ without considering the restrictions $0 \le p_{ij} \le 1$ for all $i$ and $j$ and $p_{i1} + p_{i2} + \cdots + p_{i,s-1} \le 1$, and then show that these restrictions will in any case be satisfied by the estimates that maximize $L_i$.
Now
$$\ln L_i = n_{i1}\ln p_{i1} + n_{i2}\ln p_{i2} + \cdots + n_{i,s-1}\ln p_{i,s-1} + n_{is}\ln\left(1 - p_{i1} - p_{i2} - \cdots - p_{i,s-1}\right)$$
so that
$$\frac{\partial}{\partial p_{i1}}\ln L_i = \frac{n_{i1}}{p_{i1}} - \frac{n_{is}}{1 - p_{i1} - p_{i2} - \cdots - p_{i,s-1}} ,$$
$$\frac{\partial}{\partial p_{i2}}\ln L_i = \frac{n_{i2}}{p_{i2}} - \frac{n_{is}}{1 - p_{i1} - p_{i2} - \cdots - p_{i,s-1}} ,$$
and so on up to
$$\frac{\partial}{\partial p_{i,s-1}}\ln L_i = \frac{n_{i,s-1}}{p_{i,s-1}} - \frac{n_{is}}{1 - p_{i1} - p_{i2} - \cdots - p_{i,s-1}} .$$
Setting all the derivatives equal to zero to get the maximum likelihood estimates, we get that
$$\hat p_{ij} = \frac{n_{ij}}{n_{is}}\left(1 - \hat p_{i1} - \hat p_{i2} - \cdots - \hat p_{i,s-1}\right) \quad\text{for } j = 1,2,\ldots,s-1 . \qquad (3.7.1)$$
Let $n_i = n_{i1} + n_{i2} + \cdots + n_{is}$, i.e. $n_i$ is the number of times there was a transition from state $i$ to some state (including to $i$ itself). Now adding both sides of (3.7.1) over $j = 1,2,\ldots,s-1$ we get that
$$\hat p_{i1} + \hat p_{i2} + \cdots + \hat p_{i,s-1} = \frac{n_i - n_{is}}{n_{is}}\left(1 - \hat p_{i1} - \hat p_{i2} - \cdots - \hat p_{i,s-1}\right) = \frac{n_i}{n_{is}}\left(1 - \hat p_{i1} - \hat p_{i2} - \cdots - \hat p_{i,s-1}\right) - 1 + \hat p_{i1} + \hat p_{i2} + \cdots + \hat p_{i,s-1} ,$$
i.e. $1 - \hat p_{i1} - \hat p_{i2} - \cdots - \hat p_{i,s-1} = \frac{n_{is}}{n_i}$, and substituting this in (3.7.1) we get
$$\hat p_{ij} = \frac{n_{ij}}{n_i} . \qquad (3.7.2)$$
The maximum likelihood estimate of $p_{ij}$ is the proportion of times that, when there was a transition from state $i$, it was to state $j$. We also have that $0 \le \hat p_{ij} \le 1$ and
$$\hat p_{i1} + \hat p_{i2} + \cdots + \hat p_{i,s-1} = \frac{n_i - n_{is}}{n_i} \le 1 .$$
Let $N_{ij}$ be the random variable determined by the number of transitions from state $i$ to state $j$ and let $N_i$ be the random variable determined by the number of transitions from state $i$. Then the maximum likelihood estimator of $p_{ij}$ is given by
$$\tilde p_{ij} = \frac{N_{ij}}{N_i} \quad\text{for } j = 1,2,\ldots,s-1;\ i = 1,2,\ldots,s . \qquad (3.7.3)$$
Note that $E\left[\frac{N_{ij}}{N_i}\,\middle|\, N_i = k\right] = \frac{E[N_{ij}\mid N_i = k]}{k} = \frac{k\,p_{ij}}{k} = p_{ij}$ and therefore
$$E[\tilde p_{ij}] = E\left[E\left[\frac{N_{ij}}{N_i}\,\middle|\, N_i\right]\right] = E[p_{ij}] = p_{ij} ,$$
i.e. the maximum likelihood estimators of the $p_{ij}$'s are unbiased estimators based on a set of sufficient statistics. ■
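A minimal sketch (not in the notes) of the estimator in (3.7.2): count the one-step transitions $n_{ij}$ along an observed path and divide each row of counts by $n_i$. The path below is the 3-state sequence from exercise 3.8.1, relabelled to the states 0, 1, 2.

```python
import numpy as np

def estimate_transition_matrix(states, s):
    """MLE of the transition matrix from one observed path:
    p_hat[i, j] = n_ij / n_i, the proportion of transitions
    out of state i that went to state j."""
    n = np.zeros((s, s))
    for a, b in zip(states[:-1], states[1:]):
        n[a, b] += 1.0
    return n / n.sum(axis=1, keepdims=True)   # rows with n_i = 0 give nan

path = [0, 2, 1, 1, 0, 2, 2, 1, 2, 0, 1, 2, 1, 0, 0, 1, 1, 0, 2, 2]
print(estimate_transition_matrix(path, 3))
```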
$$\chi^2 = \sum_{i=1}^{s}\sum_{j=1}^{s}\sum_{k=1}^{s} \frac{\left(n_{ijk} - n_{ij}\,(n_{jk}/n_j)\right)^2}{n_{ij}\,(n_{jk}/n_j)} .$$
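The statistic can be computed mechanically from triple and pair counts; a rough sketch (not in the notes), assuming $n_{ij}$ and $n_{jk}$ are ordinary one-step transition counts and ignoring edge effects and the usual cell-merging rules for small expected counts:

```python
import numpy as np

def markov_chi_square(states, s):
    """Pearson chi-square comparing observed triple counts n_ijk with
    the counts n_ij * (n_jk / n_j) expected under the Markov hypothesis."""
    n2 = np.zeros((s, s))
    n3 = np.zeros((s, s, s))
    for a, b in zip(states[:-1], states[1:]):
        n2[a, b] += 1.0                        # n_ij
    for a, b, c in zip(states, states[1:], states[2:]):
        n3[a, b, c] += 1.0                     # n_ijk
    n1 = n2.sum(axis=1)                        # n_j
    chi2 = 0.0
    for i in range(s):
        for j in range(s):
            for k in range(s):
                if n1[j] > 0:
                    e = n2[i, j] * n2[j, k] / n1[j]
                    if e > 0:
                        chi2 += (n3[i, j, k] - e) ** 2 / e
    return chi2
```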
The model with no restrictions on the $p_{ijk}$'s, except that their sum is equal to 1, has $s^3 - 1$ parameters, and the number of independent parameters under the hypothesis that the process is a Markov chain is $s(s-1)$ (remember that the $p_{ij}$'s in every row add up to 1). The degrees of freedom for the $\chi^2$ statistic above is therefore $s^3 - 1 - s(s-1)$. However, if there are certain probabilities that by their very nature must be 0 we need to adjust the degrees of freedom, namely
degrees of freedom $= r - q + s - 1$
where $s$ denotes the number of states $i$ such that $n_i > 0$,
Exercise 3.8.1 Suppose that the following states were observed for a 3-state process:
1,3,2,2,1,3,3,2,3,1,2,3,2,1,1,2,2,1,3,3.
Show that the value of $\chi^2$ is 14.6111 with 20 degrees of freedom, which is not significant at the 5% level.
Test Yourself
Exercise 3.1 Show that for a Markov chain the definitions of the Markov property
given in section 2.3.5 and that given in section 3.1 are equivalent.
Exercise 3.2 Let qi = P[ X 0 = i] and let the one-step transition probabilities be given
by pij( n ,n +1) . Determine an expression for P[ X 3 = k ] .
Exercise 3.3 Assume that pij( n,n +1) = pij for all n . Show that P[ X m+ k = j | X m = i] does
not depend on m for all values of k . (Use induction on k .)
Exercise 3.4 Let q be a vector with i th element qi = P[ X 0 = i] . Let p5 be the vector
with i th element P[ X 5 = i ] . Let pij( n ,n +1) be the one-step transition probabilities for a
time-homogeneous Markov chain. Determine an expression for p5 in terms of q and
the matrices P ( n ,n +1) .
Exercise 3.5 What is the value of $\sum_{i\in S} P[X_n = i \mid X_{n+1} = j]$, where $P[X_n = i \mid X_{n+1} = j]$ is the probability that at time $n$ the state of the process was $i$, given that at time $n+1$ the process is in state $j$?
Exercise 3.6 Consider a time-homogeneous Markov chain with state space $S = \{0,1,2\}$ and transition probability matrix $P$, where
$$P = \begin{pmatrix} p & q & 0 \\ 0.5 & 0 & 0.5 \\ p - 0.5 & 0.7 & 0.2 \end{pmatrix} .$$
(a) Determine the values of $p$ and $q$.
(b) Calculate the transition probabilities $p_{ij}^{(3)}$.
Exercise 3.7 A motor insurance company grants either no discount (state 0), 25%
discount (state 1) or 50% discount (state 2). A claim-free year results in a transition to the
next higher state for the following year (or in the retention of maximum discount).
Similarly a year with one or more claims results in a transition to the next lower state for
the following year (or retention of no discount). The probability of a claim-free year is
0.75 for all years, and the outcomes in different years are independent events.
(a) Is this a Markov chain? Why?
(b) Determine the one-step transition probability matrix.
(c) What is the probability of starting with a 25% discount and ending up with 25%
discount after 4 years?
Exercise 3.8 A motor insurance company grants its customers either no discount, or
25%, 40% or 60% discount. A claim-free year results in a transition to the next higher
state for the following year (or in the retention of maximum discount). A year with one or
more claims results in a transition to the next lower level of discount if the previous year
was claim-free or two levels down if there was a claim in the previous year (or to no
discount if two levels lower is not possible). The probability of a claim-free year is 0.75
for all years, and the outcomes in different years are independent events.
Consider the following possible states for the system: no discount (state 0), 25% discount
(state 1), 40% discount and a claim-free previous year (state 2), 40% discount but a claim
the previous year (state 3) or 60 % discount (state 4).
(a) Is this a Markov chain? Why?
(b) Determine the one-step transition probability matrix.
(c) What is the probability of starting with no discount and ending up with maximum
discount after 5 years?
Exercise 3.9 A no claim discount system has four levels of discount – 0%, 20%, 40%
or 60%. A new policyholder starts at 0% discount. At the end of each policy year,
policyholders will change levels according to the following rules:
(i) At the end of a claim free year, a policyholder moves up one level, or remains
on the maximum discount.
(ii) At the end of a year in which exactly one claim is made, a policyholder drops
down one level, or remains on 0%.
(iii) At the end of a year in which more than one claim was made, a policyholder
drops to 0% discount.
The probability of a claim-free year for a policyholder is 0.7, the probability of exactly one claim is 0.2 and the probability of more than one claim is 0.1. The outcomes in different years are independent events.
(a) Determine the transition probability matrix P .
(b) Calculate P 2 .
(c) If a policyholder starts at 0%, what is the probability that after 5 years the
policyholder will be on maximum discount?
Exercise 3.10 Show that the probabilities of A winning all the money as given in example 3.4.1 agree with the formula derived in (3.4.7).
Exercise 3.11 Show that the formula for π i derived in (3.4.7) satisfies the difference
equation (3.4.8) as well as the boundary conditions π 0 = 0 and π N = 1 .
Exercise 3.12 (a) Show that the formula for the expected number of bets before the game ends in (3.4.11) satisfies the difference equation (3.4.10) and the boundary conditions $m_0 = m_N = 0$.
(b) Show that the formula agrees with the expected number of bets as derived in example 3.4.2 where $p = 0.6$ and $N = 5$.
Exercise 3.13 Consider the no claim discount example in exercise 3.8.
(a) Determine whether the process is an irreducible Markov chain.
(b) Are the states aperiodic?
(c) Explain why all the states are recurrent.
(d) Calculate the probabilities that a person not yet on the maximum discount will receive the maximum discount for the first time after 6 years.
Exercise 3.14 Consider a two-state time-homogeneous Markov chain with transition probability matrix
$$P = \begin{pmatrix} 1-a & a \\ b & 1-b \end{pmatrix} \quad\text{where } 0 < a < 1 \text{ and } 0 < b < 1 .$$
(a) Is this an irreducible chain?
(b) Is this an aperiodic chain?
(c) Determine the stationary distribution for this process. Compare your answer with the limiting distribution for a two-state Markov chain as can be determined from theorem 3.5.7.
Exercise 3.15 For the no claim discount example considered in exercise 3.13,
determine the stationary distribution.
Exercise 3.16 Consider a two-state time-homogeneous Markov chain with transition probability matrix
$$P = \begin{pmatrix} 1-a & a \\ b & 1-b \end{pmatrix} \quad\text{where } 0 < a < 1 \text{ and } 0 < b < 1 .$$
(a) Determine the recurrence times for both states.
(b) For a = 0.5 and b = 0.5 determine the expected recurrence times.
(c) For a = 0.8 and b = 0.1 determine the expected recurrence times.
Exercise 3.17 Consider the no claim discount example in exercise 3.13
(a) Determine the recurrence times.
(b) In the long run, what proportion of time will a person be on 60% discount?
(c) Does the answer in (b) depend on the state the person starts in?
In the same way as for Markov chains (theorems 3.1.1 and 3.1.2) we can show that for
any t 0 < t1 < t 2 < ... < t n and any values i0 , i1 , i2 ,..., in it is true that
$$P[X_{t_1} = i_1, X_{t_2} = i_2, \ldots, X_{t_n} = i_n \mid X_{t_0} = i_0] = P_{i_0,i_1}(t_0, t_1)\,P_{i_1,i_2}(t_1, t_2)\,P_{i_2,i_3}(t_2, t_3)\cdots P_{i_{n-1},i_n}(t_{n-1}, t_n) \qquad (4.1.1)$$
and that for any $0 < t_1 < t_2 < \cdots < t_n$ and any values $i_1, i_2, \ldots, i_n$ it is true that
$$P[X_{t_1} = i_1, X_{t_2} = i_2, \ldots, X_{t_n} = i_n] = \sum_{i_0\in S} q_{i_0}\,P_{i_0,i_1}(0, t_1)\,P_{i_1,i_2}(t_1, t_2)\,P_{i_2,i_3}(t_2, t_3)\cdots P_{i_{n-1},i_n}(t_{n-1}, t_n) \qquad (4.1.2)$$
where $q_i = P[X_0 = i]$.
It follows that if we know the initial probabilities ( qi ’s) and the transition probabilities
then all probabilities for the process can be determined.
As in the case of Markov chains we can in exactly the same way prove the Chapman-
Kolmogorov equations (see theorem 3.1.3) namely that for any s < u < t and any i and
j,
$$P_{ij}(s,t) = \sum_{k\in S} P_{ik}(s,u)\,P_{kj}(u,t) , \qquad (4.1.3)$$
i.e. the transition probabilities must satisfy the same type of relations as in the case of
Markov chains.
If we define P( s, t ) as the matrix whose (i, j ) th element is Pij ( s, t ) then it follows from
(4.1.3) that
P( s, t ) = P( s, u ) P(u , t ) (4.1.4)
In practice it turns out that instead of specifying the transition probabilities it is usually
easier to specify derivatives of the transition probabilities. Let us assume that the
functions Pij ( s, t ) are continuously differentiable in both variables. Note that
$$P_{ij}(s,s) = \begin{cases} 0 & \text{if } i \ne j \\ 1 & \text{if } i = j \end{cases} = \delta_{ij} .$$
Definition 4.1.2 The transition rate of going from state $i$ to state $j$ in the vicinity of time $s$ is defined as
$$\sigma_{ij}(s) = \lim_{h\to 0}\frac{P_{ij}(s, s+h) - P_{ij}(s, s)}{h} = \begin{cases} \displaystyle\lim_{h\to 0}\frac{P_{ij}(s, s+h)}{h} & \text{if } j \ne i \\[2mm] \displaystyle\lim_{h\to 0}\frac{P_{ij}(s, s+h) - 1}{h} & \text{if } j = i \end{cases} \;=\; \left.\frac{\partial}{\partial t}P_{ij}(s,t)\right|_{t=s} .$$
Definition 4.1.3 A function $f : \mathbb{R} \to \mathbb{R}$ is $o(h)$ if $\lim_{h\to 0}\frac{f(h)}{h} = 0$.
It is easy to show that if $f$ is $o(h)$ then $cf$ is $o(h)$ when $c$ is a constant; that if $g$ is also $o(h)$ then $f + g$ is $o(h)$; and that if $0 \le g(h) \le f(h)$ and $f$ is $o(h)$ then $g$ is $o(h)$.
RESULT 4.1.1(CT)
Let $f'(s) = \lim_{h\to 0}\frac{f(s+h) - f(s)}{h}$ and let $\delta(h)$ be such that
$$f(s+h) = f(s) + hf'(s) + \delta(h) .$$
Then
$$\frac{f(s+h) - f(s)}{h} = f'(s) + \frac{\delta(h)}{h}$$
and therefore
$$\lim_{h\to 0}\frac{f(s+h) - f(s)}{h} = f'(s) + \lim_{h\to 0}\frac{\delta(h)}{h} ,$$
i.e. $f'(s) = f'(s) + \lim_{h\to 0}\frac{\delta(h)}{h}$, or in other words $\lim_{h\to 0}\frac{\delta(h)}{h} = 0$, so that $\delta(h) = o(h)$ and therefore
$$f(s+h) = f(s) + hf'(s) + o(h) .$$
RESULT 4.1.2(CT)
For i ≠ j :
Note that Pij ( s, s + h) is the probability that the state of the process at time s is i and at
time $s+h$ has changed to $j$. This event includes all those cases where there is more than one change in the interval $(s, s+h)$, provided it started in $i$ and ended in $j$. Let
PijC ( s, s + h) be the probability that in the interval ( s, s + h) there was only the one change
from state i to state j . Let D be the event that there is more than one change and that
the state changes from i to j in ( s, s + h) . Therefore
Pij ( s, s + h) = PijC ( s, s + h) + P[ D] .
Let E be the event that more than one change occurs in the interval ( s, s + h)
irrespective of the state at the beginning and at the end. We will now assume that
P[ E ] = o(h) for all values of s . But D ⊂ E and therefore 0 ≤ P[ D] ≤ P[ E ] and therefore
P[D] is o(h) if P[ E ] = o(h) .
Hence $P_{ij}(s, s+h) = P_{ij}^C(s, s+h) + o(h)$. Therefore, for $i \ne j$,
$$\sigma_{ij}(s) = \lim_{h\to 0}\frac{P_{ij}(s, s+h)}{h} = \lim_{h\to 0}\frac{P_{ij}^C(s, s+h) + o(h)}{h} = \lim_{h\to 0}\frac{P_{ij}^C(s, s+h)}{h} ,$$
i.e. $\sigma_{ij}(s)$ is the limit of the rate of change from state $i$ to state $j$ at time $s$, irrespective of whether we include or exclude the possibility of more than one change. It also shows that for $i \ne j$,
$$P_{ij}(s, s+h) = h\sigma_{ij}(s) + o(h) \quad\text{and}\quad P_{ij}^C(s, s+h) = h\sigma_{ij}(s) + o(h) . \qquad (4.1.5)$$
For i = j let PiiC ( s, s + h) be the probability that at time s the process is in state i and at
time s + h is still in state i i.e. no changes took place. The probability Pii ( s, s + h)
includes the probability of a change away from i and then back again to state i i.e. the
probability of more than one change. Therefore similarly
$$P_{ii}(s, s+h) = P_{ii}^C(s, s+h) + o(h) .$$
Now note that $1 - P_{ii}^C(s, s+h)$ is the probability of some change in the interval $(s, s+h]$. Therefore
$$\sigma_{ii}(s) = \lim_{h\to 0}\frac{P_{ii}(s, s+h) - 1}{h} = \lim_{h\to 0}\frac{P_{ii}^C(s, s+h) + o(h) - 1}{h} = \lim_{h\to 0}\frac{P_{ii}^C(s, s+h) - 1}{h} ,$$
i.e. $\sigma_{ii}(s)$ is the negative of the rate of some change from state $i$ in the process at time $s$. It also shows that
$$P_{ii}(s, s+h) = 1 + h\sigma_{ii}(s) + o(h) \quad\text{and}\quad P_{ii}^C(s, s+h) = 1 + h\sigma_{ii}(s) + o(h) . \qquad (4.1.6)$$
so that if we let $h \to 0$ we obtain
$$\frac{\partial}{\partial t}P_{ij}(s,t) = P_{ij}(s,t)\,\sigma_{jj}(t) + \sum_{\substack{k\in S \\ k\ne j}} P_{ik}(s,t)\,\sigma_{kj}(t) = \sum_{k\in S} P_{ik}(s,t)\,\sigma_{kj}(t) ,$$
or in matrix form, with $A(t)$ the matrix whose $(i,j)$th element is $\sigma_{ij}(t)$,
$$\frac{\partial}{\partial t}P(s,t) = P(s,t)\,A(t) . \; ■$$
Kolmogorov’s Backward Equations
Theorem 4.1.2 For a Markov jump process with transition probabilities
Pij ( s, t ) and transition rates σ ij (s ) , we have that
$$\frac{\partial}{\partial s}P_{ij}(s,t) = -\sum_{k\in S}\sigma_{ik}(s)\,P_{kj}(s,t) ,$$
or in matrix form $\frac{\partial}{\partial s}P(s,t) = -A(s)\,P(s,t)$. This is known as Kolmogorov's Backward Equation.
28 (ST) Proof Similar to that of theorem 4.1.1.
■
A Property of Transition Rates
Theorem 4.1.3 For a Markov jump process with transition probabilities
Pij ( s, t ) and transition rates σ ij (s ) , we have that
$$\sigma_{ii}(s) = -\sum_{\substack{k\in S \\ k\ne i}}\sigma_{ik}(s) .$$
Proof We have that $\sum_{k\in S} P_{ik}(s,t) = 1$ since the process must be in one of the states in $S$ at time $t$. Differentiating both sides with respect to $t$ and then setting $t = s$ we get that
$$0 = \left.\frac{\partial}{\partial t}\sum_{k\in S} P_{ik}(s,t)\right|_{t=s} = \sum_{k\in S}\left.\frac{\partial}{\partial t}P_{ik}(s,t)\right|_{t=s} = \sum_{k\in S}\sigma_{ik}(s)$$
(assuming the differentiation and summation may be interchanged), i.e. $\sigma_{ii}(s) = -\sum_{k\ne i}\sigma_{ik}(s)$. ■
Theorem 4.1.3 implies that the sum of the elements in the i th row of the A matrix is
equal to 0.
Now suppose that $X$ is a square matrix and then define the matrix function $e^X$ as $e^X = \sum_{i=0}^{\infty}\frac{1}{i!}X^i$; note that $e^X$ is a matrix. In particular $e^{tA} = \sum_{i=0}^{\infty}\frac{1}{i!}A^i t^i$, so that
$$\frac{d}{dt}e^{tA} = \sum_{i=0}^{\infty}\frac{i}{i!}A^i t^{i-1} = \sum_{i=1}^{\infty}\frac{i}{i!}A^i t^{i-1} = \sum_{i=1}^{\infty}\frac{1}{(i-1)!}A^{i-1}t^{i-1}A = \sum_{i=0}^{\infty}\frac{1}{i!}A^i t^i A = e^{tA}A ,$$
i.e. $P(t) = e^{tA}$ satisfies equations (4.1.8) and $P(0) = I$ (we define $e^0 = I$).
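As a numerical aside (not in the notes), $P(t) = e^{tA}$ is easy to evaluate with a standard matrix-exponential routine; the two-state rate matrix below is hypothetical.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical transition rate matrix A: each row sums to 0 (theorem 4.1.3)
# and the off-diagonal entries are the rates sigma_ij.
A = np.array([[-2.0,  2.0],
              [ 1.0, -1.0]])

t = 0.7
P = expm(t * A)                 # P(t) = e^{tA}
print(P, P.sum(axis=1))         # transition probabilities; rows sum to 1

# Check the forward equation dP/dt = P(t) A with a finite difference.
h = 1e-6
dP = (expm((t + h) * A) - P) / h
print(np.allclose(dP, P @ A, atol=1e-4))    # True
```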
Determination of PiiC ( s, t )
Theorem 4.1.4 For a Markov jump process with transition probabilities
Pij ( s, t ) and transition rates σ ij (s ) , we have that
$$P_{ii}^C(s,t) = e^{\int_s^t \sigma_{ii}(u)\,du} .$$
30 (ST, E) Proof
$$P_{ii}^C(s, t+h) = P[X_u = i,\ s \le u \le t+h \mid X_s = i]$$
$$= P[X_u = i,\ s \le u \le t,\ X_u = i,\ t \le u \le t+h \mid X_s = i]$$
$$= P[X_u = i,\ s \le u \le t \mid X_s = i]\; P[X_u = i,\ t \le u \le t+h \mid X_s = i,\ X_u = i,\ s \le u \le t]$$
$$= P_{ii}^C(s,t)\; P[X_u = i,\ t \le u \le t+h \mid X_t = i] \quad\text{(by the Markov property)}$$
$$= P_{ii}^C(s,t)\, P_{ii}^C(t, t+h)$$
$$= P_{ii}^C(s,t)\left(1 + h\sigma_{ii}(t) + o(h)\right) . \quad\text{(from (4.1.6))}$$
Hence
$$\frac{\partial}{\partial t}P_{ii}^C(s,t) = \lim_{h\to 0}\frac{P_{ii}^C(s, t+h) - P_{ii}^C(s,t)}{h} = \lim_{h\to 0} P_{ii}^C(s,t)\left(\sigma_{ii}(t) + \frac{o(h)}{h}\right) = P_{ii}^C(s,t)\,\sigma_{ii}(t) . \qquad (4.1.9)$$
A solution of this equation is $P_{ii}^C(s,t) = e^{\int_s^t \sigma_{ii}(u)\,du}$, which can be checked by differentiating both sides with respect to $t$:
$$\frac{\partial}{\partial t}P_{ii}^C(s,t) = \frac{\partial}{\partial t}e^{\int_s^t \sigma_{ii}(u)\,du} = e^{\int_s^t \sigma_{ii}(u)\,du}\cdot\frac{\partial}{\partial t}\int_s^t \sigma_{ii}(u)\,du = e^{\int_s^t \sigma_{ii}(u)\,du}\cdot\sigma_{ii}(t)$$
by the Fundamental Theorem of Calculus. This solution also gives $P_{ii}^C(s,s) = e^{\int_s^s \sigma_{ii}(u)\,du} = e^0 = 1$, which is the correct boundary condition. ■
Theorem 4.2.1 Let $X_t$ be the number of events of some type that take place in the interval $[0, t]$. Then $\{X_t : t \in [0,\infty)\}$ is a stochastic process with state
space S = {0,1,2,...} . We will assume that the process that generates the events
satisfies the following conditions:
(i) X 0 = 0 i.e. no events take place at time 0. (4.2.1)
(ii) The number of events in disjoint intervals are independent random
variables. (4.2.2)
(iii) The probability of an event taking place in the interval (t , t + h] only
depends on the length of the interval i.e. the process is time-
homogeneous. (4.2.3)
(iv) The probability that one event occurs in the interval (t , t + h] is
λh + o(h) . (4.2.4)
(v) The probability that no events occur in the interval (t , t + h] is
1 − λh + o( h) . (4.2.5)
(vi) The probability that more than one event takes place in the interval $(t, t+h]$ is $o(h)$. (4.2.6)
Then { X t : t ∈ [0, ∞)} is a Poisson process with parameter λ .
31 (ST, E) Proof We must show the properties in definition 2.4.4 hold:
(i) We are given that X 0 = 0 .
(ii) Assumption (ii) implies that the number of events that take place in the interval
( s, t ] i.e. the increment X t − X s , is independent of the number of events that took place in
any interval [0, u ] if u ≤ s i.e. of { X u : 0 ≤ u ≤ s} . This implies that the process has
independent increments (and from theorem 2.3.1 it follows that the process has the
Markov property).
(iii) Assumption (iii) implies that the process is time homogeneous. Note that
Pij (t , t + h) = 0 if j < i since the number of events cannot decrease. This implies that
σ ij = 0 for j < i .
From assumptions (iv) to (vi) it follows that for $j \ge i$
$$P_{i,i+1}(t, t+h) = \lambda h + o(h) \text{ i.e. } \sigma_{i,i+1} = \lambda ,$$
$$P_{i,i+k}(t, t+h) = o(h) \text{ i.e. } \sigma_{i,i+k} = 0 \text{ for } k \ge 2 ,$$
$$P_{i,i}(t, t+h) = 1 - \lambda h + o(h) \text{ i.e. } \sigma_{i,i} = -\lambda .$$
Hence the process is a time-homogeneous Markov jump process with transition rate matrix given by
$$A = \begin{pmatrix} -\lambda & \lambda & 0 & 0 & \cdots \\ 0 & -\lambda & \lambda & 0 & \cdots \\ 0 & 0 & -\lambda & \lambda & \cdots \\ \vdots & & & \ddots & \end{pmatrix} .$$
From the Kolmogorov forward equations (for the time-homogeneous case; see 4.1.8) it follows that
$$\begin{pmatrix} \frac{d}{dt}P_{00}(t) & \frac{d}{dt}P_{01}(t) & \frac{d}{dt}P_{02}(t) & \cdots \\ \frac{d}{dt}P_{10}(t) & \frac{d}{dt}P_{11}(t) & \frac{d}{dt}P_{12}(t) & \cdots \\ \frac{d}{dt}P_{20}(t) & \frac{d}{dt}P_{21}(t) & \frac{d}{dt}P_{22}(t) & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix} = \begin{pmatrix} P_{00}(t) & P_{01}(t) & P_{02}(t) & \cdots \\ P_{10}(t) & P_{11}(t) & P_{12}(t) & \cdots \\ P_{20}(t) & P_{21}(t) & P_{22}(t) & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}\begin{pmatrix} -\lambda & \lambda & 0 & \cdots \\ 0 & -\lambda & \lambda & \cdots \\ 0 & 0 & -\lambda & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix} ,$$
i.e. for any $i$,
$$\frac{d}{dt}P_{i0}(t) = P_{i0}'(t) = -\lambda P_{i0}(t) \qquad (4.2.7)$$
and
$$P_{ij}'(t) = \lambda P_{i,j-1}(t) - \lambda P_{i,j}(t) \quad\text{for } j > 0 , \qquad (4.2.8)$$
with the initial condition that $P_{ij}(0) = \delta_{ij}$.
A solution of (4.2.7) is $P_{i0}(t) = \delta_{i0}e^{-\lambda t}$: indeed $\frac{d}{dt}\delta_{i0}e^{-\lambda t} = \delta_{i0}e^{-\lambda t}(-\lambda) = -\lambda P_{i0}(t)$, and it satisfies the initial condition $P_{i0}(0) = \delta_{i0}$: $P_{i0}(0) = \delta_{i0}e^{0} = \delta_{i0}$. Hence it follows that $P_{i0}(t) = 0$ if $i > 0$ and $P_{00}(t) = e^{-\lambda t}$.
For $j > 0$ a solution of (4.2.8) is given by
$$P_{ij}(t) = \begin{cases} e^{-\lambda t}\dfrac{(\lambda t)^{j-i}}{(j-i)!} & \text{for } j \ge i \\[2mm] 0 & \text{for } j < i . \end{cases}$$
Note that this solution satisfies the condition $P_{ij}(0) = \delta_{ij}$:
For $j = i$: $P_{ii}(t) = e^{-\lambda t}\frac{(\lambda t)^{i-i}}{(i-i)!} = e^{-\lambda t}$ and so $P_{ii}(0) = e^{0} = 1 = \delta_{ii}$.
For $j > i$:
$$\frac{d}{dt}P_{ij}(t) = -\lambda e^{-\lambda t}\frac{(\lambda t)^{j-i}}{(j-i)!} + e^{-\lambda t}\frac{\lambda(\lambda t)^{j-i-1}}{(j-i-1)!} = \lambda P_{i,j-1}(t) - \lambda P_{ij}(t) .$$
For $j < i$ we have that
$$\frac{d}{dt}P_{ij}(t) = 0 = 0 - 0 = \lambda P_{i,j-1}(t) - \lambda P_{ij}(t) .$$
It follows from definition 2.4.4 that { X t : t ∈ [0, ∞)} is a Poisson process. ■
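As a quick check (not in the notes), truncating the state space makes it possible to compare $e^{tA}$ for the Poisson rate matrix with the formula just derived; the truncation level below is arbitrary.

```python
import numpy as np
from math import factorial
from scipy.linalg import expm

lam, t, size = 2.0, 1.5, 30                        # truncate S at `size` states
A = -lam * np.eye(size) + lam * np.eye(size, k=1)  # -lam on, lam above, the diagonal
P = expm(t * A)

# P_0j(t) should match e^{-lam t} (lam t)^j / j! for small j.
for j in range(4):
    print(P[0, j], np.exp(-lam * t) * (lam * t) ** j / factorial(j))
```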
For a Poisson process { X t : t ∈ [0, ∞)} we have by definition that X 0 = 0 and that all
increments have a Poisson distribution i.e. can take on the values 0,1,2,.... The value of
X t for any sample path must be a non-decreasing function of t and must be integer
valued. Whenever there is an increase it must be a positive integer and we therefore
expect all jumps in the sample path to be of size 1. This can be proved rigorously but we
will simply assume it for this course.
Let us define an "event" as a point in time where there is an increase in the value of X t .
Theorem 4.2.2 Let { X t : t ∈ [0, ∞)} with state space S = {0,1,2,3,...} be a Poisson
process with parameter $\lambda$. Then the distribution of the time until the first event takes place is an exponential distribution with parameter $1/\lambda$.
32 (CT) Proof Let $T$ be the time from time 0 until the first event takes place. Then
$$P[T > t] = P[0 \text{ events in } (0, t]] = P[X_t - X_0 = 0] = e^{-\lambda t} \quad\text{since } X_t - X_0 \sim \text{Poisson}(\lambda(t-0)) ,$$
i.e. $1 - F_T(t) = e^{-\lambda t}$, or $F_T(t) = 1 - e^{-\lambda t}$, or $f_T(t) = \lambda e^{-\lambda t}$, which is an exponential density function with parameter $1/\lambda$. ■
Theorem 4.2.3 Let $\{X_t : t \in [0,\infty)\}$ with state space $S = \{0,1,2,3,\ldots\}$ be a Poisson process with parameter $\lambda$. Given any time $s > 0$, the distribution of the time from $s$ until the next event takes place is an exponential distribution with parameter $1/\lambda$.
33 (CT) Proof Let $T$ be the time from $s$ until the next event takes place. Then
$$P[T > t \mid X_s = i] = P[X_{s+t} = i \mid X_s = i] = P_{ii}(s, s+t) = e^{-\lambda(s+t-s)} = e^{-\lambda t} \quad\text{since } X_{s+t} - X_s \sim \text{Poisson}(\lambda t) .$$
Since this probability does not depend on $i$, the conditional probability is equal to the unconditional probability, i.e. $P[T > t] = e^{-\lambda t}$. Hence $1 - F_T(t) = e^{-\lambda t}$, so $F_T(t) = 1 - e^{-\lambda t}$ and $f_T(t) = \lambda e^{-\lambda t}$, which is an exponential density function with parameter $1/\lambda$, i.e. the distribution of $T$ is exponential with parameter $1/\lambda$. ■
Theorem 4.2.4 Let { X t : t ∈ [0, ∞)} with state space S = {0,1,2,3,...} be a Poisson
process with parameter λ . Let T1 be the time from 0 until the first event
takes place and let T2 be the time from the first event until the second event
takes place. Then T1 and T2 are independent exponential random variables
with parameter 1 / λ .
34 (ST)
Proof For any random variable $X$ with distribution function $F$ and density function $f$ we have that
$$f(x) = \frac{d}{dx}F(x) = \lim_{h\to 0}\frac{F(x+h) - F(x)}{h} = \lim_{h\to 0}\frac{P[x < X \le x+h]}{h} .$$
For convenience let X (t ) = X t and consider
$$\begin{aligned} &P[t_2 < T_2 \le t_2 + h_2 \mid T_1 = t_1] \\ &= P[X(t_1+t_2+h_2) \ge 2 \text{ and } X(t_1+t_2) = 1 \mid T_1 = t_1] \\ &= P[X(t_1+t_2+h_2) \ge 2 \text{ and } X(t_1+t_2) = 1 \mid X(t) = 0\ \forall\, t < t_1 \text{ and } X(t_1) = 1] \\ &= P[X(t_1+t_2+h_2) \ge 2 \text{ and } X(t_1+t_2) = 1 \mid X(t_1) = 1] \quad\text{(by the Markov property)} \\ &= P[X(t_1+t_2) = 1 \mid X(t_1) = 1]\; P[X(t_1+t_2+h_2) \ge 2 \mid X(t_1+t_2) = 1 \text{ and } X(t_1) = 1] \\ &\qquad\text{(since } P[A\cap B \mid C] = P[B\mid C]\,P[A\mid B\cap C]\text{)} \\ &= P_{11}(t_1, t_1+t_2)\; P[X(t_1+t_2+h_2) \ge 2 \mid X(t_1+t_2) = 1] \quad\text{(by the Markov property)} \\ &= e^{-\lambda t_2}\sum_{i=2}^{\infty} P_{1i}(t_1+t_2,\, t_1+t_2+h_2) \\ &= e^{-\lambda t_2}\left(P_{12}(t_1+t_2,\, t_1+t_2+h_2) + \sum_{i=3}^{\infty} P_{1i}(t_1+t_2,\, t_1+t_2+h_2)\right) \\ &= e^{-\lambda t_2}\left(\lambda h_2 + o(h_2) + o(h_2)\right) . \end{aligned}$$
Hence
$$\frac{P[t_2 < T_2 \le t_2 + h_2 \mid T_1 = t_1]}{h_2} = \lambda e^{-\lambda t_2} + \frac{o(h_2)}{h_2}$$
and if we let $h_2 \to 0$ we get that
$$f_{T_2\mid T_1}(t_2 \mid t_1) = \lambda e^{-\lambda t_2} ,$$
i.e. the conditional density function of T2 given that T1 = t1 is an exponential density
function with parameter 1 / λ . Since this density function does not depend on t1 it means
that it is equal to the unconditional density function of T2 which in turn implies that T1
and T2 are independent random variables both with an exponential distribution with
parameter 1 / λ . ■
NOTE: Theorem 4.2.4 can be extended to show that if Ti is the time between the (i − 1) th
and i th events, then T1 , T2 ,... are independent random variables all with an exponential
distribution with parameter 1 / λ .
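This note is also the basis of the standard way to simulate a Poisson process; a minimal sketch (not in the notes), with an arbitrary rate and horizon:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def simulate_poisson_event_times(lam, t_end):
    """Simulate the event times of a Poisson process with rate lam on
    [0, t_end] by summing independent exponential inter-event times
    (each with mean 1/lam)."""
    times = []
    t = rng.exponential(1.0 / lam)
    while t <= t_end:
        times.append(t)
        t += rng.exponential(1.0 / lam)
    return times

events = simulate_poisson_event_times(lam=2.0, t_end=10.0)
print(len(events))   # roughly Poisson with mean lam * t_end = 20
```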
Let t be a point in time at which an event does not occur. Let Rt be the time from t until
the next event occurs and S t be the time since the last event occurred.
Theorem 4.2.5 Let { X t : t ∈ [0, ∞)} with state space S = {0,1,2,3,...} be a Poisson
process with parameter λ . The distribution of Rt is independent of t and is
given by
P[ Rt ≤ x] = 1 − e − λx for x ≥ 0 .
35 (CT) Proof This follows from theorem 4.2.3, since in the proof of theorem 4.2.3 it does not matter whether or not an event took place at time $s$. ■
Theorem 4.2.6 Let { X t : t ∈ [0, ∞)} with state space S = {0,1,2,3,...} be a Poisson
process with parameter λ . The distribution of S t is given by
P[ S t = t ] = e − λt and P[ S t ≤ x] = 1 − e − λx for 0 ≤ x < t.
36 (CT) Proof Suppose that no event took place in (0, t ] . Then we have that, by theorem
4.2.4,
P[ S t = t ] = P[T1 > t ] = e − λt .
Now suppose that at least one event took place in the interval (0, t ] . For 0 ≤ x < t we
have that
P[ S t ≤ x] = P[at least one event in (t − x, t )]
= 1 − P[no event in (t − x, t )]
= 1 − e −λx by theorem 4.2.4.
■
Theorem 4.2.7 Let { X t : t ∈ [0, ∞)} with state space S = {0,1,2,3,...} be a Poisson
process with parameter λ . Then the distribution of the time until the n th
event takes place is a gamma distribution with parameters 1 / λ and n .
37 (CT) Proof
The distribution of the time until the n th event takes place is the distribution of the sum
of n independent random variables, i.e. the times between successive events, each
exponentially distributed with parameter 1 / λ which is a gamma distribution with
parameters 1 / λ and n . ■
$$\cdots = e^{-\lambda p_A(t-s)}\,\frac{[\lambda p_A(t-s)]^n}{n!} .$$
Thus $Y_t - Y_s \sim \text{Poisson}(\lambda p_A(t-s))$. ■
4.3 Pure Birth Process
in the size of the population when a birth takes place, it can be expected that as the
population size increases the chance of a birth in an interval of specified length will
increase if competition for limited resources is not a factor.
Let us consider a Markov process { X t : t ≥ 0} and state space S = {0,1,2,3,4,...} where the
following holds:
1. Events occurring in non-overlapping intervals are independent. (4.3.1)
2. (a) $P_{ii}(h) = 1 - \lambda_i h + o(h)$ (4.3.2)
(b) $P_{i,i+1}(h) = \lambda_i h + o(h)$ (4.3.3)
(c) $\sum_{j=i+2}^{\infty} P_{ij}(h) = o(h)$ (4.3.4)
Assumption 3 implies that the state of the system at t i.e. X t , will be the number of
"births" that took place in (0, t ] and is not the size of the population. Assumption 2(d)
implies that the state of the system can only increase i.e. it is still a counting process.
Since the state of the system is the number of births that took place, it makes sense to have $P_{01}(h) = \lambda_0 h + o(h) > 0$, i.e. "births" can take place while the state of the system is zero.
Of course births can also be considered to take place if immigration into the system from
outside the system is allowed and included in the concept of a "birth".
Note that $(P_{00}(t), P_{01}(t), P_{02}(t), \ldots)$ is the first row of the matrix $P(t) = \left[P_{ij}(t)\right]$. Similarly $\left(\frac{d}{dt}P_{00}(t), \frac{d}{dt}P_{01}(t), \frac{d}{dt}P_{02}(t), \ldots\right)$ is the first row of $\frac{d}{dt}P(t)$.
According to the Kolmogorov forward equations, namely $\frac{d}{dt}P(t) = P(t)A$, the first row of $\frac{d}{dt}P(t)$ must be equal to the first row of $P(t)A$, i.e. the first row of $P(t)$ multiplied by $A$, i.e. in this case
$$\left(\frac{d}{dt}P_{00}(t), \frac{d}{dt}P_{01}(t), \frac{d}{dt}P_{02}(t), \ldots\right) = (P_{00}(t), P_{01}(t), P_{02}(t), \ldots)\begin{pmatrix} -\lambda_0 & \lambda_0 & 0 & 0 & \cdots \\ 0 & -\lambda_1 & \lambda_1 & 0 & \cdots \\ 0 & 0 & -\lambda_2 & \lambda_2 & \cdots \\ 0 & 0 & 0 & -\lambda_3 & \ddots \\ \vdots & \vdots & \vdots & & \ddots \end{pmatrix} ,$$
i.e. $\frac{d}{dt}P_{00}(t) = -\lambda_0 P_{00}(t)$ and $\frac{d}{dt}P_{0n}(t) = -\lambda_n P_{0n}(t) + \lambda_{n-1}P_{0,n-1}(t)$ for $n > 0$. ■
To solve equations (4.3.7) and (4.3.8) we note that equation (4.3.7) is a differential equation of which a possible solution is
$$P_{00}(t) = ce^{-\lambda_0 t} \quad\text{where } c \text{ is a constant.} \qquad (4.3.9)$$
But since $P_{00}(0) = P[X_0 = 0 \mid X_0 = 0] = 1$, it follows from (4.3.9) that
$$1 = ce^{-\lambda_0(0)} = c , \quad\text{i.e.}\quad P_{00}(t) = e^{-\lambda_0 t} . \qquad (4.3.10)$$
For a specific set of $\lambda_k$'s a recursive solution for $n \ge 1$ is given by
$$P_{0n}(t) = \lambda_{n-1}e^{-\lambda_n t}\int_0^t e^{\lambda_n x}P_{0,n-1}(x)\,dx \qquad (4.3.11)$$
since, by the Fundamental Theorem of Calculus,
$$\frac{d}{dt}P_{0n}(t) = \lambda_{n-1}(-\lambda_n)e^{-\lambda_n t}\int_0^t e^{\lambda_n x}P_{0,n-1}(x)\,dx + \lambda_{n-1}e^{-\lambda_n t}e^{\lambda_n t}P_{0,n-1}(t) = -\lambda_n P_{0n}(t) + \lambda_{n-1}P_{0,n-1}(t) \quad\text{from (4.3.11)} ,$$
i.e. the formula in (4.3.11) will be a solution (NB: check that the boundary condition is also satisfied).
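The recursion (4.3.11) also lends itself to numerical evaluation; a sketch (not in the notes) using simple trapezoidal integration, with hypothetical birth rates. Taking all $\lambda_n = 1$ recovers the Poisson probabilities, which gives a handy check.

```python
import numpy as np

def pure_birth_p0(lams, t_grid):
    """Evaluate P_0n(t) on a time grid via (4.3.11):
    P_0n(t) = lam_{n-1} e^{-lam_n t} * integral_0^t e^{lam_n x} P_{0,n-1}(x) dx,
    where lams[n] is the birth rate in state n."""
    rows = [np.exp(-lams[0] * t_grid)]            # P_00(t) = e^{-lam_0 t}
    for n in range(1, len(lams)):
        integrand = np.exp(lams[n] * t_grid) * rows[n - 1]
        integral = np.concatenate(([0.0], np.cumsum(
            (integrand[1:] + integrand[:-1]) / 2 * np.diff(t_grid))))
        rows.append(lams[n - 1] * np.exp(-lams[n] * t_grid) * integral)
    return np.array(rows)

t = np.linspace(0.0, 5.0, 2001)
P = pure_birth_p0([1.0, 1.0, 1.0, 1.0], t)        # lam_n = 1: Poisson case
print(P[2, -1], np.exp(-5.0) * 5.0**2 / 2)        # both about 0.0842
```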
Such a process is once again a time-homogeneous Markov jump process with transition rate matrix
$$A = \begin{pmatrix} 0 & 0 & 0 & 0 & \cdots \\ \mu_1 & -\mu_1 & 0 & 0 & \cdots \\ 0 & \mu_2 & -\mu_2 & 0 & \cdots \\ \vdots & & \ddots & \ddots & \end{pmatrix} .$$
$$\frac{d}{dt}P_{i0}(t) = \mu_1 P_{i1}(t) ,$$
$$\frac{d}{dt}P_{in}(t) = -\mu_n P_{in}(t) + \mu_{n+1}P_{i,n+1}(t) \quad\text{for } 0 < n < i ,$$
$$\frac{d}{dt}P_{ii}(t) = -\mu_i P_{ii}(t) .$$
For the case $\mu_n = n\mu$ we get that
$$P_{in}(t) = \binom{i}{n}e^{-n\mu t}\left(1 - e^{-\mu t}\right)^{i-n} \qquad (4.4.7)$$
(which is a binomial distribution with $p = e^{-\mu t}$) since for $0 < n < i$
$$\frac{d}{dt}P_{in}(t) = \binom{i}{n}e^{-n\mu t}(-n\mu)\left(1 - e^{-\mu t}\right)^{i-n} + \binom{i}{n}e^{-n\mu t}(i-n)\left(1 - e^{-\mu t}\right)^{i-n-1}e^{-\mu t}\mu = -n\mu P_{in}(t) + (n+1)\mu\binom{i}{n+1}e^{-(n+1)\mu t}\left(1 - e^{-\mu t}\right)^{i-(n+1)} = -n\mu P_{in}(t) + (n+1)\mu P_{i,n+1}(t) ,$$
i.e. (4.4.7) satisfies the equations (show the other cases too: $n = i$ and $n = 0$) (NB: check that the boundary conditions are also satisfied for each case).
i.e. as time goes on the number of births goes to infinity with probability one.
For the pure death process with $\mu_n = n\mu$,
$$\lim_{t\to\infty}P_{in}(t) = \lim_{t\to\infty}\binom{i}{n}e^{-n\mu t}\left(1 - e^{-\mu t}\right)^{i-n} = 0 \quad\text{for all } i \ge n \ge 1$$
and
$$\lim_{t\to\infty}P_{i0}(t) = \lim_{t\to\infty}\left(1 - e^{-\mu t}\right)^i = 1 ,$$
i.e. as time goes on the size of the population goes to zero with probability one.
and similarly
$$\lim_{t\to\infty}\frac{d}{dt}P_{01}(t) = \lim_{t\to\infty}\frac{d}{dt}P_{10}(t) = \lim_{t\to\infty}\frac{d}{dt}P_{11}(t) = 0 .$$
Now let $p_0 = \lim_{t\to\infty}P_{00}(t) = \lim_{t\to\infty}P_{10}(t)$ and $p_1 = \lim_{t\to\infty}P_{01}(t) = \lim_{t\to\infty}P_{11}(t)$.
Now using the requirement that $\sum_{n=0}^{\infty}p_n = 1$ we get that
$$p_0\left(1 + \sum_{n=1}^{\infty}\frac{\lambda_0\lambda_1\cdots\lambda_{n-1}}{\mu_1\mu_2\cdots\mu_n}\right) = 1 ,$$
i.e.
$$p_n = \left(1 + \sum_{n=1}^{\infty}\frac{\lambda_0\lambda_1\cdots\lambda_{n-1}}{\mu_1\mu_2\cdots\mu_n}\right)^{-1}\frac{\lambda_0\lambda_1\cdots\lambda_{n-1}}{\mu_1\mu_2\cdots\mu_n} \qquad (4.6.8)$$
provided $\sum_{n=1}^{\infty}\frac{\lambda_0\lambda_1\cdots\lambda_{n-1}}{\mu_1\mu_2\cdots\mu_n}$ is finite.
For the simple birth and death process where $\lambda_n = \lambda$ and $\mu_n = \mu$ we get that
$$\frac{\lambda_0\lambda_1\cdots\lambda_{n-1}}{\mu_1\mu_2\cdots\mu_n} = \frac{\lambda\lambda\cdots\lambda}{\mu\mu\cdots\mu} = \left(\frac{\lambda}{\mu}\right)^n$$
and
$$1 + \sum_{n=1}^{\infty}\frac{\lambda_0\lambda_1\cdots\lambda_{n-1}}{\mu_1\mu_2\cdots\mu_n} = \sum_{n=0}^{\infty}\left(\frac{\lambda}{\mu}\right)^n = \frac{1}{1 - \lambda/\mu} \quad\text{provided } \frac{\lambda}{\mu} < 1 ,$$
i.e. $p_0 = 1 - \lambda/\mu$ and $p_n = \left(\frac{\lambda}{\mu}\right)^n\left(1 - \lambda/\mu\right)$ provided $\frac{\lambda}{\mu} < 1$.
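A small numerical sketch (not in the notes) of (4.6.8), truncating the state space; for constant rates it reproduces the geometric solution just derived.

```python
import numpy as np

def birth_death_stationary(lams, mus, n_max):
    """Stationary probabilities p_n from (4.6.8), truncated at n_max.
    lams[n] is lambda_n and mus[n] is mu_{n+1}."""
    ratios = [1.0]
    for n in range(1, n_max + 1):
        ratios.append(ratios[-1] * lams[n - 1] / mus[n - 1])
    p = np.array(ratios)
    return p / p.sum()

n_max = 50
p = birth_death_stationary([0.5] * n_max, [1.0] * n_max, n_max)
print(p[:4])   # approximately 0.5, 0.25, 0.125, 0.0625 since lambda/mu = 0.5
```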
SETTING: For the Poisson process there is a single transition rate, namely $\lambda$. Let $T_1$ be the time from time 0 until the first jump occurs. Let $T_i$ be the time from the $(i-1)$st jump until the $i$th jump. If we know the sample path for a Poisson process all the $T_i$'s can be
determined and conversely if all the Ti 's are known the complete sample path is known
i.e. all the information about the process is contained in the Ti 's.
LIKELIHOOD ESTIMATION: From the note following theorem 4.2.4 we have that the
Ti 's are independent exponential random variables all with parameter 1 / λ . Now suppose
that a Poisson process is observed until the n th jump occurs. The likelihood function is
then given by
$$L(\lambda) = \prod_{i=1}^{n}\lambda e^{-\lambda t_i} = \lambda^n e^{-\lambda\sum_{i=1}^{n} t_i} = \lambda^n e^{-\lambda t}$$
where $t$ is the total time until the $n$th event is observed. Then
$$\ln L(\lambda) = n\ln\lambda - \lambda t \quad\Rightarrow\quad \frac{d}{d\lambda}\ln L(\lambda) = \frac{n}{\lambda} - t ,$$
which is 0 if $\lambda$ is equal to $n/t$, i.e. the maximum likelihood estimator of $\lambda$ is $n/T$ where $T$ is the random variable for the total time until the $n$th event occurs.
In theory we can derive the distribution of the maximum likelihood estimator from the fact that $T$ has a gamma distribution with parameters $1/\lambda$ and $n$. Unfortunately this is not one of our standard distributions for inference. We can use the gamma distribution of $T$ to determine the expected value and variance of the estimator, namely $\frac{n}{n-1}\lambda$ and $\frac{n^2}{(n-1)^2(n-2)}\lambda^2$. For large values of $n$ the maximum likelihood estimator will have an approximate normal distribution with expected value $\lambda$ and variance equal to the Cramer-Rao lower bound.
The above assumes that the process is in fact a Poisson process. To test whether or not it
is a Poisson process, we should test whether the number of events has a Poisson
distribution and whether or not the times between successive events are independent and
exponentially distributed.
To test for the Poisson distribution, given enough observations, divide the interval [0, t ]
into k intervals of equal length. Then the number of events observed in each interval will
be a Poisson random variable with parameter λt / k . Use the MLE of λ to estimate the
expected number of events in each interval and then apply a χ 2 goodness of fit test.
To test for the independence of the times between events, calculate the serial correlation for these times and then apply a test to determine whether it is significantly different from 0. (Tests for serial correlation will be discussed in the course on Time Series Analysis in the second semester.)
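A rough sketch (not in the notes) of the MLE and the count-based goodness-of-fit test described above; the data are simulated, and the usual cell-merging rules for small expected counts are ignored.

```python
import numpy as np
from scipy.stats import poisson, chi2

rng = np.random.default_rng(seed=2)

def poisson_gof(event_times, t_end, k=50):
    """Split [0, t_end] into k equal subintervals, tabulate how many
    subintervals contain 0, 1, 2, ... events, and compare with the
    Poisson pmf with estimated mean lambda_hat * t_end / k."""
    counts, _ = np.histogram(event_times, bins=k, range=(0.0, t_end))
    lam_hat = len(event_times) / t_end            # MLE of lambda
    mean = lam_hat * t_end / k
    m = counts.max()
    observed = np.bincount(counts, minlength=m + 1).astype(float)
    expected = k * poisson.pmf(np.arange(m + 1), mean)
    expected[-1] = k - expected[:-1].sum()        # lump the upper tail
    stat = ((observed - expected) ** 2 / expected).sum()
    df = m - 1    # (m + 1) cells, minus 1, minus 1 estimated parameter
    return stat, chi2.sf(stat, df)

times = np.cumsum(rng.exponential(1.0 / 2.0, size=250))
times = times[times <= 100.0]                     # a rate-2 path on [0, 100]
print(len(times) / 100.0)                         # lambda_hat, close to 2
print(poisson_gof(times, 100.0))                  # (statistic, p-value)
```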
We define Wt as the length of time that the process remains in the state being occupied at
time t i.e. for every ω ∈ Ω and t ≥ 0 ,
Wt (ω ) = inf{s > 0 : X t + s (ω ) ≠ X t (ω )} .
… (theorem 4.1.1). Let $X_{W_t}$ be the state to which the process jumps when the first jump occurs. Then $W_t$ and $X_{W_t}$ are independent random variables and
$$P[X_{W_t} = j \mid X_t = i] = \frac{\sigma_{ij}}{-\sigma_{ii}} \quad\text{for } i \ne j .$$
By the Markov property and time-homogeneity, given that the first jump is to $j$, the waiting time for the next event is exponentially distributed with parameter $\frac{1}{-\sigma_{jj}}$, and the probability that the second jump is to $k$ is given by $\frac{\sigma_{jk}}{-\sigma_{jj}}$, etc. Furthermore the waiting times are independent random variables, since the waiting time for the second jump only depends on the state $j$ at the first jump and not on anything that happened before.
$$P[W_s > w \mid X_s = i] = e^{\int_s^{s+w}\sigma_{ii}(u)\,du} \quad\text{for } w > 0 , \text{ and } 1 \text{ otherwise.}$$
And for $w = 0$ we have $e^{\int_s^{s}\sigma_{ii}(u)\,du} = e^0 = 1$. ■
$$f_{W_s\mid X_s}(w\mid i) = -\sigma_{ii}(s+w)\,e^{\int_s^{s+w}\sigma_{ii}(u)\,du} .$$
Let $X_{s+}$ be the state to which the process jumps at the first jump after $s$.
$$\cdots = e^{\int_s^{s+w}\sigma_{ii}(u)\,du}\;P[X_{s+w+h} = j \mid X_{s+w} = i] . \quad\text{(by theorem 4.8.4 and the Markov property)}$$
If we divide by $h$ and let $h \to 0$ we get on the left hand side the joint conditional probability mass/density function of $X_{s+}$ and $W_s$ given $X_s$, i.e. $f_{X_{s+},\,W_s\mid X_s}(j, w \mid i)$. On the right hand side we get
$$e^{\int_s^{s+w}\sigma_{ii}(u)\,du}\,\sigma_{ij}(s+w) = \left(-\sigma_{ii}(s+w)\,e^{\int_s^{s+w}\sigma_{ii}(u)\,du}\right)\frac{\sigma_{ij}(s+w)}{-\sigma_{ii}(s+w)} .$$
Since the first factor is the density function of $W_s$ given $X_s = i$, the second factor must be the conditional probability that $X_{s+} = j$ given that $X_s = i$ and $W_s = w$. It also follows then that $W_s$ and $X_{s+}$ are not independent. ■
i.e. $f_{Y\mid Z}(y_i \mid z_j) = \sum_{\forall k} f_{X\mid Z}(x_k \mid z_j)\, f_{Y\mid X,Z}(y_i \mid x_k, z_j)$.
45 (CT) Proof
$$P[Y = y_i \mid Z = z_j] = P[X \in \{x_k\},\, Y = y_i \mid Z = z_j] = \sum_{\forall k} P[X = x_k,\, Y = y_i \mid Z = z_j] = \sum_{\forall k} f_{X,Y\mid Z}(x_k, y_i \mid z_j) = \sum_{\forall k} P[X = x_k \mid Z = z_j]\, P[Y = y_i \mid X = x_k, Z = z_j] . \; ■$$
For continuous random variables we similarly have
$$f_{Y\mid Z}(y\mid z) = \int_{-\infty}^{\infty}\frac{f_{X,Z}(x,z)}{f_Z(z)}\,\frac{f_{X,Y,Z}(x,y,z)}{f_{X,Z}(x,z)}\,dx = \int_{-\infty}^{\infty} f_{X\mid Z}(x\mid z)\, f_{Y\mid X,Z}(y\mid x,z)\,dx .$$
NOTE: Now let us consider the case where $X$ is a continuous random variable and $Y$ and $Z$ are discrete random variables. Let $P[X = x, Y = y_i, Z = z_j]$ denote the joint density/probability mass function of $X$, $Y$ and $Z$ at the point $(x, y_i, z_j)$. Similar to theorems 4.9.1 and 4.9.2 we will then get that
$$P[Y = y_i \mid Z = z_j] = \int_{-\infty}^{\infty} P[X = x,\, Y = y_i \mid Z = z_j]\,dx = \int_{-\infty}^{\infty} P[X = x \mid Z = z_j]\,P[Y = y_i \mid X = x, Z = z_j]\,dx .$$
For $i \ne j$,
$$P_{ij}(s,t) = P[X_t = j \mid X_s = i] = \sum_{\forall l \ne i}\int_0^{t-s} e^{\int_s^{s+w}\sigma_{ii}(u)\,du}\,\sigma_{il}(s+w)\,P_{lj}(s+w, t)\,dw .$$
Proof
$$P_{ij}(s,t) = \int_0^{t-s} f_{R_s\mid X_s}(w\mid i)\,P[X_t = j \mid X_s = i,\, R_s = w]\,dw \quad\text{since } P[A\cap B\mid C] = P[A\mid C]\,P[B\mid A\cap C]$$
$$= \int_0^{t-s}\left(-\sigma_{ii}(s+w)\right)e^{\int_s^{s+w}\sigma_{ii}(u)\,du}\,P[X_t = j \mid X_s = i,\, R_s = w]\,dw$$
$$= \sum_{\forall l\ne i}\int_0^{t-s}\left(-\sigma_{ii}(s+w)\right)e^{\int_s^{s+w}\sigma_{ii}(u)\,du}\,\frac{\sigma_{il}(s+w)}{-\sigma_{ii}(s+w)}\,P[X_t = j \mid X_{s+w} = l,\, X_s = i,\, R_s = w]\,dw \quad\text{(from theorem 4.8.5)}$$
$$= \sum_{\forall l\ne i}\int_0^{t-s} e^{\int_s^{s+w}\sigma_{ii}(u)\,du}\,\sigma_{il}(s+w)\,P[X_t = j \mid X_{s+w} = l]\,dw \quad\text{(by the Markov property)}$$
$$= \sum_{\forall l\ne i}\int_0^{t-s} e^{\int_s^{s+w}\sigma_{ii}(u)\,du}\,\sigma_{il}(s+w)\,P_{lj}(s+w, t)\,dw . \; ■$$
The result of theorem 4.9.3 is known as the integrated form of the Kolmogorov
equations. Instead of having the derivatives and the transition probabilities in a set of
equations we have the transition probabilities and integrals of the transition probability
functions in a set of equations. The advantage of the latter is that there are fairly efficient
algorithms to calculate a sequence of approximate values of the function even though we
are not able to solve the equations analytically.
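As an illustration (not from the notes, and using the differential rather than the integrated form), the forward equations can be solved with an off-the-shelf ODE routine; the time-dependent two-state rate matrix below is hypothetical.

```python
import numpy as np
from scipy.integrate import solve_ivp

def A(t):
    """Hypothetical time-dependent rate matrix A(t); rows sum to 0."""
    a, b = 1.0 + 0.5 * np.sin(t), 0.5
    return np.array([[-a, a], [b, -b]])

def rhs(t, p_flat):
    """Forward equation d/dt P(s, t) = P(s, t) A(t), flattened for the solver."""
    P = p_flat.reshape(2, 2)
    return (P @ A(t)).ravel()

s, t_end = 0.0, 3.0
sol = solve_ivp(rhs, (s, t_end), np.eye(2).ravel(), rtol=1e-8)
P_st = sol.y[:, -1].reshape(2, 2)
print(P_st, P_st.sum(axis=1))   # P(s, t_end); rows sum to 1
```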
There are many examples where the transition from one state to another may depend on the time the process has already spent in that state. Consider for example a process with three
states H="Healthy", S="Sick" and D="Dead". The probability of a sick person recovering
and being healthy again may depend on his age but also on how long the person has been
sick. That means that if we let X t be the state of the process at time t we will for
instance have that P[ X t = j | X u = i, s − r ≤ u ≤ s ] and P[ X t = j | X u = i, s − 2r ≤ u ≤ s ]
will be different even though in both cases it is given that X s = i i.e. the process will not
have the Markov property.
To work with a process with the Markov property we will have to define the state of the
system in such a way that knowledge of the present state of the process is enough to
determine the future probabilities of the process. This implies that we must incorporate
the length of stay in the present state in the definition of the state of the system. For a
process with states 0,1,2,3,... we then need to define the states of the process as elements
of the set S={0,1,2,...}×[0,∞) i.e. the state of the process is a vector (i, t ) where i
indicates the state and t denotes the time spent in the present state of the system. Since
the state space is no longer discrete we will no longer have a Markov jump process,
although it will be a Markov process.
We can develop a theory for such processes along the lines of this chapter; some results will necessarily change because of this. We will only give a few such results below.
Example 4.10.1 Consider a Healthy, Sick and Dead process. Let $X_t = (H, C_t)$ be the state of the process at time $t$ if the person is healthy and has been healthy for a period of $C_t$, etc. Suppose that the transition rate from a healthy state to a sick state is $\sigma(t)$, i.e. it does not depend on $C_t$. Similarly suppose that the transition rate from healthy to dead is $\mu(t)$, the transition rate from sick to healthy $\rho(t, C_t)$ and the transition rate from sick to dead $v(t, C_t)$.
Both transition rates out of state H do not depend on $C_t$ and we should therefore get the same as before for the distribution of the length of time the process remains in H, i.e.
$$P_{HH}^C(s,t) = e^{-\int_s^t\left[\sigma(x) + \mu(x)\right]dx} .$$
The probability that the person will stay sick from time $s$ until time $t$, given that at time $s$ the person has already been sick for a period of $w$, is given by
$$P[X_t = (S, w+t-s) \mid X_s = (S, w)] = e^{-\int_s^t\left[\rho(x,\, w-s+x) + v(x,\, w-s+x)\right]dx} .$$
Note that at a time $u$ between $s$ and $t$ the person has already been sick for a period of $w + u - s$, and this is updated continuously in the integral above. The formulae remain the same as before except that if a transition rate depends on $C_t$, the argument of that transition rate in the integral is changed from $(x)$ to $(x, w + x - s)$. ♪
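A short sketch (not in the notes) of the stay-sick probability in example 4.10.1, with hypothetical rate functions $\rho$ and $v$:

```python
import numpy as np
from scipy.integrate import quad

# Hypothetical rates: recovery and mortality for a sick person as
# functions of age x and duration sick d.
rho = lambda x, d: 0.5 * np.exp(-0.2 * d)     # sick -> healthy
v = lambda x, d: 0.01 + 0.001 * x             # sick -> dead

def stay_sick_prob(s, t, w):
    """P[sick throughout (s, t] | sick at s with current duration w]
    = exp(-integral_s^t [rho(x, w - s + x) + v(x, w - s + x)] dx)."""
    total, _ = quad(lambda x: rho(x, w - s + x) + v(x, w - s + x), s, t)
    return np.exp(-total)

print(stay_sick_prob(s=40.0, t=41.0, w=2.0))
```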
E.4. Consider a Markov process with state space $S = \{0,1,2\}$ and transition matrix:
$$P = \begin{pmatrix} p & q & 0 \\ \tfrac{1}{2} & 0 & \tfrac{1}{2} \\ p - \tfrac{1}{2} & \tfrac{7}{10} & \tfrac{1}{5} \end{pmatrix}$$
a) What can you say about the values of p and q ?
b) Calculate the transition probabilities pij(3) .
c) Draw the transition graph for the process represented by P .
(Transition Graph: diagram in which each state is represented as a node and an arrow is
drawn from node i to node j if pij > 0 indicating a direct transition from state i to state
j is possible. The value of pij is indicated above the arrow.)
E.5. Consider a Markov chain with only two states, $S = \{0,1\}$, and transition matrix
$$P = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{3} & \tfrac{2}{3} \end{pmatrix} .$$
a) What are the conditions for the stationary distribution of this chain to exist and be
unique?
b) Are these conditions satisfied? Explain fully.
c) Determine the stationary distribution for this chain.
E.6. Is the process with the following transition matrix irreducible?
$$P = \begin{pmatrix} \tfrac{1}{3} & 0 & \tfrac{2}{3} & 0 \\ \tfrac{1}{10} & \tfrac{1}{5} & \tfrac{3}{5} & \tfrac{1}{10} \\ 0 & \tfrac{1}{2} & \tfrac{1}{3} & \tfrac{1}{6} \\ \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{2} & 0 \end{pmatrix}$$
E.7. Is the process with the following transition matrix irreducible? What are the stationary distributions?
$$P = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} & 0 & 0 & 0 \\ \tfrac{1}{3} & \tfrac{2}{3} & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & \tfrac{2}{3} & \tfrac{1}{3} \\ 0 & 0 & 0 & \tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix}$$
E.8. Consider a time-homogeneous Markov chain with state space $S = \{0,1,2\}$ and transition matrix:
$$P = \begin{pmatrix} p + \tfrac{1}{10} & \tfrac{1}{10} & q \\ \tfrac{1}{5} & \tfrac{3}{10} & \tfrac{1}{2} \\ \tfrac{1}{5} & p + \tfrac{3}{10} & \tfrac{3}{10} \end{pmatrix} .$$
a) What are the values of p and q ?
b) Calculate the matrix of 3-step transition probabilities from state i to state j .