
WST312 Stochastic Processes 2023

Chapter 1: Introduction
A mathematical model gives the value of a variable (the dependent variable) in terms of
the values of some other variables (the explanatory/independent variables). A model (of
any sort) is an imitation of a real-world system or process. In the case of a mathematical
model the value of the dependent variable is given by some mathematical formula in
terms of the values of the explanatory variables.

Mathematical models can be categorized as being either deterministic or probabilistic (stochastic).

In the case of a deterministic model the value of the dependent variable, given the values
of the explanatory variables, can only be the single value given by the mathematical
formula. This type of model contains no random components.

In the case of probabilistic models the dependent variable can assume different values
even if the values of all the explanatory variables remain the same. In this case it is the
probabilities of the dependent variable that are uniquely determined by the values of the
explanatory variables. To specify the probabilities for the dependent variable it is
sufficient to specify the distribution function of the dependent variable. This type of
model recognizes the random nature of the dependent variable.

Examples where probabilistic models are preferred over deterministic models:

1. A brand switching model: A beer drinker can change the brand of beer he drinks
from time to time. It is not predictable with absolute certainty when he will change from
one brand of beer to another. To model the brand of beer he drinks at a certain point in
time, we need to consider the probabilities that he will change from one brand of beer to
another given that he has been drinking a certain brand of beer for a certain length of
time.
2. Number of contributors to a pension fund: The number of contributors to a pension
fund during a certain period of time will differ in a random way from period to period. It
will depend on the number of deaths and persons retiring during that period and new
members joining the fund. The number of contributors during one period will affect the
number of contributors during the next period.
3. A dam problem: Consider the contents of a dam at the end of each week. The
contents of the dam will change in a random way during the week depending on the
inflow (amount of rain) - which is random - during the week. The contents at the end of
one week will affect the contents at the end of the next week.
4. An inventory problem: The level of inventory of a certain item in a shop will change
from time to time when customers buy that particular item or when an order for the item
is delivered. The times at which this will happen are not predictable, i.e. they are random.
Once again the level of inventory at one point in time will affect the level of inventory
some time later.
5. A queuing problem: The number of customers in a queue will change if another
customer joins the queue or if the service of a customer is completed. The times at which
these changes occur are random. The length of the queue at a certain point in time will
depend on the length of the queue at times prior to that particular point in time.
6. Population growth models: The size of the population of a certain country at the end
of a year will depend on the size of the population at the end of the previous year, the
number of births and deaths during the year and the immigration and emigration during
the year which are all random variables.

Test Yourself: Determine in each of the above examples
i) Which are the dependent and explanatory variables?
ii) What specific characteristic of the model determines its stochastic nature?
iii) Whether any type of deterministic model could have been used.

Note that in all the examples above we are interested in the value of some variable at
different points in time. The probabilities for the different possible values of the variable
at a particular point in time may depend on the values of some other variables, but may
also depend on previous values of that particular variable. In such cases knowledge of the
values of the variable at previous points in time may be very useful to predict future
values of the variable. In this course we will only consider stochastic models,
specifically stochastic processes and Markov processes.

Chapter 2: Description and Definition of Stochastic Processes

2.1 Description
Normally we are not interested in the value of a single random variable but rather a whole
collection of random variables. Let us indicate the variables we are interested in by $X_t$
where $t$ is some element of the set $T$. We refer to $t$ as the indexing parameter for the
stochastic process. In many cases $t$ denotes the time of the observation, e.g. the size of a
queue for all times in a certain interval may be of interest. We usually use 0 (but could
also use 1) for the starting point of the process, and $X_0$ is then the random variable that
denotes the starting value of the process, which may be some constant with probability 1.

For each value $t \in T$, the random variable $X_t$ is a function defined on a probability
space $(\Omega, \xi, P)$ which is preferably the same for all $X_t$. On this probability space $P$ is a
probability measure which gives the probabilities of all the subsets of the sample space
$\Omega$ which are elements of the $\sigma$-field $\xi$.

NOTE: If $\omega \in \Omega$, then $X_t(\omega)$ denotes the value of the random variable $X_t$ for the
element $\omega$ of the sample space. For different values of the indexing parameter $t$, say $t_1$
and $t_2$, we have that $X_{t_1}$ and $X_{t_2}$ are different random variables.

Definition 2.1.1 A stochastic process is a collection of random variables $\{X_t : t \in T\}$.

All the possible values that can be assumed by $X_t$, $t \in T$, are called the states of the
stochastic process and the set of all possible states is called the state space. Let $S$ denote
the state space. Since the $X_t$'s are random variables the possible values are real
numbers, i.e. $S$ is a subset of the real numbers.

The set of all indexing parameters $T$ is called the parameter space. If the parameter $t$ is
time, the value of $X_t$ is called the state the process is in at time $t$. In many cases the
indexing parameter $t$ is time, but other parameter spaces are possible. The parameter
space $T$ is called discrete if it is finite or enumerable and is called continuous if it is not
enumerable. Similarly the state space $S$ can be either discrete or continuous.

To identify the nature of a stochastic process, a first step is to classify it on the basis
of the nature of the parameter space and the nature of the state space. For example:
Example 2.1.1

Nature of        Nature of Parameter Space
State Space      Discrete                                      Continuous

Discrete         T: Number of items inspected                  T: Time of day
                 S: Number of defective items observed         S: Number of persons in a queue

Continuous       T: Number of persons arriving at a bus stop   T: Time
                 S: Waiting time for the bus                   S: Content of a dam

2.2 Sample Paths and Probability Descriptions
Suppose that $\{X_t : t \in T\}$, where $T$ is some ordered set, is a stochastic process defined on
the probability space $(\Omega, \xi, P)$, i.e. $X_t : \Omega \to \mathbb{R}$ is a random variable defined on the
probability space $(\Omega, \xi, P)$ for all values of $t \in T$. A single performance of the
'experiment' corresponding to this sample space will determine an element $\omega$ from $\Omega$.
For this element $\omega \in \Omega$ we can determine the value of $X_t(\omega)$ for each $t \in T$. The set of
points $\{(t, X_t(\omega)) : t \in T\}$ is called a sample path of the process. For instance if $X_t$ is the
number of insurance claims received up to and including time $t$, then the sample path
may be as follows:

Example 2.2.1

[Figure 2.2.1: Number of Claims Received. A step-function sample path: the number of
claims (vertical axis, 0 to 5) plotted against time (horizontal axis, 0 to 10), with jumps at
times 2, 3, 6 and 8.]

This graph shows a particular outcome of the 'experiment' which determines the
observed values of the stochastic process for the element $\omega$. The sample path above
indicates that the first claim is received at time 2 since the function is 0 for $t < 2$ and
equal to 1 for $t = 2$; the second claim is received at time 3 since the function is 1 for
$2 \le t < 3$ and is equal to 2 for $t = 3$; the third claim at time 6, the fourth at time 8, etc. If
we repeat the experiment under the same conditions we will get a second value of $\omega$, say
$\omega_2$, and then get the observed values of the stochastic process for all values of $t$, i.e. we
get another possible sample path. Each repetition of the experiment determines an
element $\omega$ of $\Omega$ and therefore determines a sample path.

Result 2.2.1 Given the probability space $(\Omega, \xi, P)$, we can determine the joint
distribution of any finite number of the $X_t$'s. Since each $X_t$ is a random variable the set
$\{\omega : X_t(\omega) \le x\}$ is an element of $\xi$. Since $\xi$ is a $\sigma$-field the intersection of any finite
number of such sets is an element of $\xi$ and hence $P$ is defined for such a set. For any
value $n$ and any values $t_1 < t_2 < t_3 < \dots < t_n$ the joint distribution function of
$X_{t_1}, X_{t_2}, X_{t_3}, \dots, X_{t_n}$ is given by

$$F_{X_{t_1}, X_{t_2}, \dots, X_{t_n}}(x_1, x_2, \dots, x_n) = P[X_{t_1} \le x_1, X_{t_2} \le x_2, \dots, X_{t_n} \le x_n] = P[\{\omega : X_{t_1}(\omega) \le x_1, X_{t_2}(\omega) \le x_2, \dots, X_{t_n}(\omega) \le x_n\}].$$

Thus, if given the joint cumulative distribution function of $X_{t_1}, X_{t_2}, X_{t_3}, \dots, X_{t_n}$, the joint
density function or probability mass function of $X_{t_1}, X_{t_2}, X_{t_3}, \dots, X_{t_n}$ can be determined.

Conversely, if we have a set of random variables $\{X_t : t \in T\}$ and know all the joint
distribution functions $F_{X_{t_1}, X_{t_2}, X_{t_3}, \dots, X_{t_n}}$ for all values of $n$ and $t_1 < t_2 < t_3 < \dots < t_n$, we can
find a probability space $(\Omega', \xi', P')$ and define functions $\{X_t' : t \in T\}$ on this space such
that the joint distribution of $X_{t_1}', X_{t_2}', X_{t_3}', \dots, X_{t_n}'$ is the same as the joint distribution of
$X_{t_1}, X_{t_2}, X_{t_3}, \dots, X_{t_n}$. The probabilities of a stochastic process can therefore be defined by
the joint distributions of all $X_{t_1}, X_{t_2}, X_{t_3}, \dots, X_{t_n}$.

Note: To determine, or even just to list, all the joint distributions would be very
cumbersome and hard to use in some practical problems unless there is some structure in
the nature of the process which simplifies the specification of all the joint distributions. In
many cases a set of random variables with a very simple structure for all the joint
distributions is defined, and a set of new random variables, i.e. a stochastic process, is
then defined using the random variables with the simple structure.

Example 2.2.2

Consider a gambler that makes successive bets of R1 each. The outcomes of the different
bets are independent events. Let $X_n$ be the amount of money the gambler wins (or
loses) at the $n$th bet. Assume that the probability of winning a bet is $p$ for all bets. Then
the probability mass function of $X_n$ is given by

$$f_{X_n}(x) = P[X_n = x] = \begin{cases} p & \text{for } x = 1 \\ 1 - p & \text{for } x = -1 \\ 0 & \text{otherwise.} \end{cases}$$

Since the $X_n$'s are independent random variables, the joint probability mass function of
$X_1, X_2, \dots, X_n$ is given by

$$f_{X_1, X_2, \dots, X_n}(x_1, x_2, \dots, x_n) = f_{X_1}(x_1) f_{X_2}(x_2) \cdots f_{X_n}(x_n) \quad \text{for all } x_1, x_2, \dots, x_n.$$

Therefore $\{X_n : n = 1, 2, 3, \dots\}$ is a well-defined stochastic process. This is a stochastic
process called white noise, i.e. it consists of independent and identically distributed
random variables.

Let $Y_n$, $n = 1, 2, 3, \dots$ be the total winnings (or losses) of the gambler up to and including
the $n$th bet, i.e. $Y_n = X_1 + X_2 + \dots + X_n$. Each $Y_n$ is therefore a linear transformation of the
$X_n$'s and the joint distribution of the $Y_n$'s can therefore be obtained from the joint
distribution of the $X_n$'s. Thus we can use a stochastic process $\{X_n : n = 1, 2, 3, \dots\}$ for
which the joint distributions are easily determined to define the stochastic process
$\{Y_n : n = 1, 2, 3, \dots\}$.
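As a quick illustration (an addition, not part of the original notes), the following minimal Python sketch simulates this construction; the values chosen for $p$ and the number of bets are arbitrary:

import numpy as np

rng = np.random.default_rng(seed=1)
p = 0.5          # probability of winning a single R1 bet (arbitrary choice)
n_bets = 1000

# White noise: independent, identically distributed outcomes X_n in {+1, -1}
X = rng.choice([1, -1], size=n_bets, p=[p, 1 - p])

# Y_n = X_1 + ... + X_n: cumulative winnings after each bet
Y = np.cumsum(X)
print("Total winnings after", n_bets, "bets:", Y[-1])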

2.3 Some General Properties of Stochastic Processes

2.3.1 Stationary A stochastic process $\{X_t : t \in T\}$ is a stationary process if for any
integer $n$, any values $t_1 < t_2 < \dots < t_n$ which are all elements of $T$ and any value $k$ such
that $k + t_1, k + t_2, \dots, k + t_n$ are all elements of $T$, it is true that the joint distribution of
$X_{t_1}, X_{t_2}, \dots, X_{t_n}$ is the same as the joint distribution of $X_{t_1+k}, X_{t_2+k}, \dots, X_{t_n+k}$.

For $n = 1$ this means $X_{t_1}$ has the same distribution as $X_{t_1+k}$ for any $k$, i.e. all the $X_t$'s
have the same distribution. For $n = 2$ it means that for any $t_1$ and $t_2$ the joint
distribution of $X_{t_1}, X_{t_2}$ is the same as the joint distribution of $X_{t_1+k}, X_{t_2+k}$. Therefore a
shift in time of both $t_1$ and $t_2$ by the same quantity $k$ does not change the joint
distribution of the random variables. Put differently, if the distance between $t_1$ and $t_2$ is
the same as the distance between $t_3$ and $t_4$, the joint distribution of $X_{t_1}, X_{t_2}$ is the same
as the joint distribution of $X_{t_3}, X_{t_4}$. As a result we have, for example, that
$\mathrm{cov}(X_{t_1}, X_{t_2}) = \mathrm{cov}(X_{t_3}, X_{t_4})$. Similarly for the cases $n = 3, 4, \dots$ etc.

2.3.2 Covariance Stationary Process A stochastic process $\{X_t : t \in T\}$ is
called covariance stationary if $\mathrm{cov}(X_s, X_t)$ depends only on $t - s$, in other words,
depends only on how far apart the two random variables are in time.

A stationary process is covariance stationary, but the converse is not necessarily true.

2.3.3 Weakly Stationary Process A stochastic process $\{X_t : t \in T\}$ is weakly
stationary if all the $X_t$'s have the same expected value and if the process is covariance
stationary.

2.3.4 Independent Increments A stochastic process $\{X_t : t \in T\}$ is said to
have independent increments if the increment in the value of the process over any interval
$t$ to $t + s$ (i.e. $X_{t+s} - X_t$) is independent of what happened before or at time $t$, i.e. the
increment is independent of $\{X_u : 0 \le u \le t\}$. This means that the process is such that the
value of the change in the process from time $t$ to time $t + s$ is independent of the state of
the process before time $t$ and at time $t$.

2.3.5 The Markov Property A stochastic process $\{X_t : t \in T\}$ is said to have
the Markov property if for any set $A \subseteq S$, for any value of $n$ and for any values of
$t_1 < t_2 < \dots < t_n < t_{n+1}$, it is true that

$$P[X_{t_{n+1}} \in A \mid X_{t_1} = x_{t_1}, X_{t_2} = x_{t_2}, \dots, X_{t_n} = x_{t_n}] = P[X_{t_{n+1}} \in A \mid X_{t_n} = x_{t_n}],$$

i.e. probabilities of future events (what happens at time $t_{n+1}$) only depend on the state of
the process at the latest point in time ($t_n$) for which we have information. The
information about the state of the process at the previous points in time ($t_1, t_2, \dots, t_{n-1}$)
does not have any effect on the probabilities of future events and will therefore not help
to make a better prediction of what will happen at time $t_{n+1}$.
For a stochastic process with a discrete state space, the process will have the Markov
property if for any $a \in S$, for any value of $n$ and for any values of $t_1 < t_2 < \dots < t_n < t_{n+1}$,
it is true that

$$P[X_{t_{n+1}} = a \mid X_{t_1} = x_{t_1}, X_{t_2} = x_{t_2}, \dots, X_{t_n} = x_{t_n}] = P[X_{t_{n+1}} = a \mid X_{t_n} = x_{t_n}].$$

Theorem 2.3.1 A stochastic process $\{X_t : t \in T\}$ with independent increments
has the Markov property.

Proof 1 (CT, ST) For any $a \in S$, for any value of $n$ and for any values of
$t_1 < t_2 < \dots < t_n < t_{n+1}$, we have that

$$P[X_{t_{n+1}} = a \mid X_{t_1} = x_{t_1}, X_{t_2} = x_{t_2}, \dots, X_{t_n} = x_{t_n}]$$
$$= P[X_{t_{n+1}} - X_{t_n} + x_{t_n} = a \mid X_{t_1} = x_{t_1}, X_{t_2} = x_{t_2}, \dots, X_{t_n} = x_{t_n}]$$
$$= P[X_{t_{n+1}} - X_{t_n} + x_{t_n} = a] \quad \text{(independent increments)}$$
$$= P[X_{t_{n+1}} - X_{t_n} + x_{t_n} = a \mid X_{t_n} = x_{t_n}] \quad \text{(independent increments)}$$
$$= P[X_{t_{n+1}} = a \mid X_{t_n} = x_{t_n}],$$

i.e. the Markov property holds. ∎

NB: The converse is not necessarily true i.e. a Markov process does not necessarily have
independent increments. (Exercise: Find an example to show that this is true.)

A Markov process with discrete parameter space $T = \{0, 1, 2, \dots\}$ and discrete state space
(finite or countably infinite) $S$ is called a Markov chain.
A Markov process with continuous parameter space $T = [0, \infty)$ and discrete state space
(finite or countably infinite) $S$ is called a Markov jump process.

2.3.6 Martingale A discrete-time stochastic process $\{X_t : t = 0, 1, 2, \dots\}$ is a
martingale if
(i) $E[\,|X_t|\,] < \infty$ for all $t$
(ii) $E[X_n \mid X_1, X_2, X_3, \dots, X_m] = X_m$ for all $m < n$.
The second condition implies that if we know the values of the process up to time $m$, the best
(using the MSE criterion) guess or prediction of any future value of the process is the
latest value of the process that we know.

2.4 Some Specific Stochastic Processes

2.4.1 White Noise A stochastic process $\{X_t : t = 0, 1, 2, \dots\}$ is called white noise if
$X_0, X_1, X_2, \dots$ is an independent sequence of random variables, all of them identically
distributed.
White noise is often used as an intermediary process to define other stochastic processes.
White noise is a stationary process and possesses the Markov property. (Prove this
statement as an exercise.) It does not have independent increments.

2.4.2 Random Walk Let $\{X_t : t = 0, 1, 2, \dots\}$ be a white noise process and define a
sequence of random variables $Y_0, Y_1, Y_2, \dots$ where $Y_0 = 0$ and $Y_t = \sum_{i=1}^{t} X_i$ for all $t \ge 1$.
Then $\{Y_t : t = 0, 1, 2, \dots\}$ is called a random walk process.

A random walk is a process with independent increments and therefore has the Markov
property (prove this as an exercise). It is, however, not necessarily stationary (prove this
for the case where $E[X_t] = \mu \ne 0$).
A special case is when $X_t$ is either $+1$ or $-1$ with probabilities $p$ and $1 - p$ respectively.
This may represent a person (maybe drunk?) who, independently of previous steps, takes
a step to the right with probability $p$ or a step to the left with probability $1 - p$. Then $Y_t$
is the position of the person after $t$ steps. This is called a simple random walk.

2.4.3 Moving Average Process Suppose that $X_{-p}, X_{-p+1}, \dots, X_0, X_1, X_2, \dots$ is a
sequence of independent identically distributed random variables. Let $\alpha_0, \alpha_1, \dots, \alpha_p$ be
$p + 1$ real numbers. Let $Y_t = \sum_{i=0}^{p} \alpha_i X_{t-i}$ for $t = 0, 1, 2, \dots$. Then $Y_t$ is a weighted average of
the last $p + 1$ $X$'s up to and including $X_t$. (The weight for the latest one ($X_t$) is $\alpha_0$, the
weight for the previous one ($X_{t-1}$) is $\alpha_1$ etc.) Then $\{Y_t : t = 0, 1, 2, \dots\}$ is a moving average
process of order $p$. A moving average process is a stationary process but in general is
not a Markov process.

2.4.4 Poisson Process A Poisson process with parameter $\lambda > 0$ is a continuous
time process $\{N_t : t \in [0, \infty)\}$ with state space $S = \{0, 1, 2, \dots\}$ and the following properties:
(i) $N_0 = 0$
(ii) $\{N_t\}$ has independent increments
(iii) $\{N_t\}$ has stationary increments which have a Poisson distribution with parameter
proportional to the length of the interval, i.e.

$$P[N_t - N_s = n] = \frac{e^{-\lambda(t-s)} (\lambda(t-s))^n}{n!} \quad \text{for } n = 0, 1, 2, \dots$$

for all $s < t$.

The Poisson process is often used as a model where we count the number of events that
take place during some interval of time, for instance the number of accidents or the
number of insurance claims. The Poisson process is not stationary (why?).
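A small simulation sketch (an addition, assuming the standard fact that the times between events of a Poisson process are independent Exponential($\lambda$) variables); the rate and horizon are arbitrary choices:

import numpy as np

rng = np.random.default_rng(seed=1)
lam, horizon = 2.0, 10.0   # rate lambda and time horizon (arbitrary choices)

# Inter-event times are i.i.d. Exponential(lambda); their cumulative sums
# are the jump times of the sample path up to the horizon.
gaps = rng.exponential(scale=1.0 / lam, size=1000)
jump_times = np.cumsum(gaps)
jump_times = jump_times[jump_times <= horizon]

# N_t at t = horizon is the number of jumps so far; over many repetitions
# it would follow a Poisson(lambda * horizon) distribution.
print("N_10 =", len(jump_times), " (expected value", lam * horizon, ")")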

2.4.5 Compound Poisson Process Suppose that $\{N_t : t \in [0, \infty)\}$ is a Poisson
process with parameter $\lambda > 0$ and that $X_1, X_2, X_3, \dots$ is a sequence of independent
identically distributed random variables. Let

$$Y_t = \begin{cases} 0 & \text{if } N_t = 0 \\ \sum_{i=1}^{N_t} X_i & \text{if } N_t > 0. \end{cases}$$

Then $\{Y_t : t \in [0, \infty)\}$ is called a compound Poisson process. Such a process can be used
as a model where $N_t$ is the number of insurance claims up to time $t$, for example, and
$X_i$ is the amount of the $i$th claim. Then $Y_t$ is the total amount of claims received up to
time $t$.
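As a hedged illustration (not part of the notes), a Monte Carlo sketch of $Y_t$ with exponentially distributed claim amounts; by Theorem 2.5.1 below, $E[Y_t] = E[N_t]E[X] = \lambda t\, E[X]$:

import numpy as np

rng = np.random.default_rng(seed=1)
lam, t, n_paths = 1.5, 4.0, 100_000   # arbitrary claim rate, time and sample size
claim_mean = 2.0                      # arbitrary mean claim amount

# For each path: draw N_t ~ Poisson(lam * t), then sum N_t exponential claims
N = rng.poisson(lam * t, size=n_paths)
Y = np.array([rng.exponential(claim_mean, size=n).sum() for n in N])

print("simulated E[Y_t] =", Y.mean(), " theory:", lam * t * claim_mean)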

2.4.6 Brownian Motion A stochastic process $\{B_t : t \in [0, \infty)\}$ with state space
$S = \mathbb{R}$ is called Brownian motion if:
(i) $\{B_t\}$ has independent increments.
(ii) Each increment $B_t - B_s$, $s < t$, has a normal distribution with expected value $\mu(t-s)$
and variance $\sigma^2(t-s)$, where $\mu$ and $\sigma^2 > 0$ are constants.
(iii) $\{B_t\}$ has continuous sample paths.
Brownian motion is also referred to as a Wiener process.
The case when $\mu = 0$ and $\sigma^2 = 1$ is referred to as standard Brownian motion.
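A minimal simulation sketch (an addition, not from the notes): a discretized path built from independent normal increments, as property (ii) prescribes; the drift, variance and step count are arbitrary choices:

import numpy as np

rng = np.random.default_rng(seed=1)
mu, sigma2, t_end, n_steps = 0.1, 1.0, 1.0, 1000   # arbitrary parameters
dt = t_end / n_steps

# Increments over disjoint intervals of length dt are independent
# N(mu * dt, sigma^2 * dt) variables; cumulative sums give the path, B_0 = 0.
increments = rng.normal(mu * dt, np.sqrt(sigma2 * dt), size=n_steps)
B = np.concatenate(([0.0], np.cumsum(increments)))
print("B at time", t_end, "=", B[-1])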

2.5 Important Theoretical Results

Theorem 2.5.1 Suppose that $N$ is a non-negative integer valued random variable,
$Z_1, Z_2, Z_3, \dots$ are independent identically distributed random variables and that all $Z_i$'s
are independent of $N$. Let

$$T = \begin{cases} \sum_{i=1}^{N} Z_i & \text{if } N > 0 \\ 0 & \text{if } N = 0. \end{cases}$$

Then $E[T] = E[N] \cdot E[Z]$ and $\mathrm{var}(T) = E[N] \cdot \mathrm{var}(Z) + \mathrm{var}(N) \cdot (E[Z])^2$.
Proof We know that for any random variables $T$ and $N$, $E[T] = E_N[E[T \mid N]]$ (Bain and Engelhardt). So

$$E[T] = E_N[E[T \mid N]]$$
$$= P[N = 0] E[T \mid N = 0] + \sum_{n=1}^{\infty} P[N = n] \, E\Big[\sum_{i=1}^{N} Z_i \,\Big|\, N = n\Big]$$
$$= P[N = 0] \cdot 0 + \sum_{n=1}^{\infty} P[N = n] \, E\Big[\sum_{i=1}^{n} Z_i\Big] \quad \text{(since all } Z_i \text{ are independent of } N)$$
$$= \sum_{n=0}^{\infty} P[N = n] \cdot n \, E[Z] \quad \text{(since } n \text{ is a given constant)}$$
$$= E[Z] \sum_{n=0}^{\infty} P[N = n] \cdot n$$
$$= E[Z] \, E[N].$$

We know that for any random variables $T$ and $N$, $\mathrm{var}(T) = E_N[\mathrm{var}(T \mid N)] + \mathrm{var}_N(E[T \mid N])$ (Bain and Engelhardt). So

$$\mathrm{var}(T) = E_N[\mathrm{var}(T \mid N)] + \mathrm{var}_N(E[T \mid N])$$
$$= \sum_{n=0}^{\infty} P[N = n] \, \mathrm{var}\Big(\sum_{i=1}^{N} Z_i \,\Big|\, N = n\Big) + \mathrm{var}_N\Big(E\Big[\sum_{i=1}^{N} Z_i \,\Big|\, N\Big]\Big)$$
$$= \sum_{n=0}^{\infty} P[N = n] \, \mathrm{var}\Big(\sum_{i=1}^{n} Z_i\Big) + \mathrm{var}_N\Big(E\Big[\sum_{i=1}^{N} Z_i \,\Big|\, N\Big]\Big).$$

But
$$E\Big[\sum_{i=1}^{N} Z_i \,\Big|\, N = n\Big] = E\Big[\sum_{i=1}^{n} Z_i\Big] = n \, E[Z] \quad \text{(true for } n > 0 \text{ and } n = 0),$$
so
$$\mathrm{var}(T) = \sum_{n=0}^{\infty} P[N = n] \, n \, \mathrm{var}(Z) + \mathrm{var}_N(N \, E[Z])$$
$$= \mathrm{var}(Z) \sum_{n=0}^{\infty} P[N = n] \, n + (E[Z])^2 \, \mathrm{var}(N)$$
$$= \mathrm{var}(Z) \, E[N] + (E[Z])^2 \, \mathrm{var}(N). \; ∎$$
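A quick numerical sanity check of Theorem 2.5.1 (an added sketch; the Poisson/uniform choices are arbitrary):

import numpy as np

rng = np.random.default_rng(seed=1)
n_reps = 200_000

# N ~ Poisson(3), Z_i ~ Uniform(0, 1) independent of N
N = rng.poisson(3.0, size=n_reps)
T = np.array([rng.uniform(0.0, 1.0, size=n).sum() for n in N])

EZ, varZ, EN, varN = 0.5, 1.0 / 12.0, 3.0, 3.0
print(T.mean(), "vs E[N]E[Z] =", EN * EZ)
print(T.var(), "vs E[N]var(Z) + var(N)(E[Z])^2 =", EN * varZ + varN * EZ**2)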

Definition 2.5.1 Let $U$ be a random variable which can take on the values $0, 1, 2, 3, \dots$
and let $p_i = P[U = i]$. The probability generating function of $U$ is defined by

$$u(z) = p_0 + p_1 z + p_2 z^2 + p_3 z^3 + \dots = \sum_{i=0}^{\infty} p_i z^i.$$

This sum will converge for $|z| < 1$ since for $|z| < 1$

$$\sum_{i=0}^{\infty} |p_i z^i| < \sum_{i=0}^{\infty} p_i = 1,$$

and an absolutely convergent series is convergent.

Note that if we know the function $u(z)$ and expand it as a power series, the coefficient of
$z^i$ is equal to $p_i$, i.e. knowing $u(z)$ is equivalent to knowing all the probabilities.

Theorem 2.5.2 If $u(z)$ is the probability generating function of $U$ then $u'(1) = E[U]$.
Proof
$$u'(z) = p_1 + 2 p_2 z + 3 p_3 z^2 + 4 p_4 z^3 + \dots$$
i.e.
$$u'(1) = p_1 + 2 p_2 + 3 p_3 + 4 p_4 + \dots = 0 \cdot p_0 + 1 \cdot p_1 + 2 \cdot p_2 + 3 \cdot p_3 + 4 \cdot p_4 + \dots = E[U]$$
(by the definition of the expected value of a discrete random variable). ∎

Theorem 2.5.3 If $u(z)$ is the probability generating function of $U$ then
$u''(1) = E[U(U-1)]$.
Proof
$$u''(z) = 2 p_2 + 3 \cdot 2 \, p_3 z + 4 \cdot 3 \, p_4 z^2 + \dots$$
i.e.
$$u''(1) = 2 p_2 + 3 \cdot 2 \, p_3 + 4 \cdot 3 \, p_4 + \dots = 0 \cdot (-1) \, p_0 + 1 \cdot 0 \, p_1 + 2 \cdot 1 \, p_2 + 3 \cdot 2 \, p_3 + 4 \cdot 3 \, p_4 + \dots = E[U(U-1)]$$
(definition of discrete expected value). ∎

From the above it follows that $\mathrm{var}(U) = u''(1) + u'(1) - (u'(1))^2$.

NOTE: $u(z) = E[z^U] = E[e^{(\ln z) U}] = M_U(\ln z)$ where $M_U$ is the moment generating
function of $U$.
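The two derivative identities are easy to check numerically. Below is an added sketch using the Poisson($\lambda$) PGF $u(z) = e^{\lambda(z-1)}$ (a closed form that is also derived in Theorem 2.5.7) and central finite differences at $z = 1$:

import numpy as np

lam = 2.0                                  # Poisson parameter (arbitrary)
u = lambda z: np.exp(lam * (z - 1.0))      # PGF of Poisson(lam)

h = 1e-5
u1 = (u(1 + h) - u(1 - h)) / (2 * h)              # approximates u'(1)
u2 = (u(1 + h) - 2 * u(1) + u(1 - h)) / h**2      # approximates u''(1)

print("u'(1) ~", u1, " E[U] =", lam)              # Theorem 2.5.2
print("var ~", u2 + u1 - u1**2, " var(U) =", lam) # via Theorem 2.5.3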

Theorem 2.5.4 Let $a_0, a_1, a_2, \dots$ and $b_0, b_1, b_2, \dots$ be two non-negative sequences of
numbers. Then

$$\Big(\sum_{i=0}^{\infty} a_i\Big)\Big(\sum_{j=0}^{\infty} b_j\Big) = \sum_{k=0}^{\infty}\Big(\sum_{l=0}^{k} a_{k-l} b_l\Big) = \sum_{k=0}^{\infty}\Big(\sum_{l=0}^{k} a_l b_{k-l}\Big).$$

Proof 3 (CT)
$$\Big(\sum_{i=0}^{\infty} a_i\Big)\Big(\sum_{j=0}^{\infty} b_j\Big) = [a_0 + a_1 + a_2 + \dots][b_0 + b_1 + b_2 + \dots]$$
$$= a_0 b_0 + a_0 b_1 + a_0 b_2 + a_0 b_3 + a_0 b_4 + \dots$$
$$+ a_1 b_0 + a_1 b_1 + a_1 b_2 + a_1 b_3 + a_1 b_4 + \dots$$
$$+ a_2 b_0 + a_2 b_1 + a_2 b_2 + a_2 b_3 + a_2 b_4 + \dots$$
$$\vdots$$
$$= a_0 b_0$$
$$+ a_1 b_0 + a_0 b_1$$
$$+ a_2 b_0 + a_1 b_1 + a_0 b_2$$
$$+ a_3 b_0 + a_2 b_1 + a_1 b_2 + a_0 b_3$$
$$\vdots$$

i.e. to get the sum of all the products $a_i b_j$, first get the sum of the products on the various
diagonals and then get the sum of all the sums on the diagonals. Note that on the $k$th
diagonal the sum of the subscripts is equal to $k$. Therefore

$$\Big(\sum_{i=0}^{\infty} a_i\Big)\Big(\sum_{j=0}^{\infty} b_j\Big) = \sum_{k=0}^{\infty}\Big(\sum_{l=0}^{k} a_{k-l} b_l\Big) = \sum_{k=0}^{\infty}\Big(\sum_{l=0}^{k} a_l b_{k-l}\Big). \; ∎$$

Note: This video will be interesting to watch: [Link] bigger-than-others

Theorem 2.5.5 Let $U$ be a non-negative integer valued random variable with
$P[U = i] = u_i$ for $i = 0, 1, 2, 3, \dots$ and let $u(z) = \sum_{i=0}^{\infty} u_i z^i$, i.e. $u(z)$ is the generating function
of the probabilities of $U$. Let $V$ be an independent non-negative integer valued random
variable with $P[V = j] = v_j$, $j = 0, 1, 2, 3, \dots$ and let $v(z) = \sum_{j=0}^{\infty} v_j z^j$, i.e. $v(z)$ is the
generating function of the probabilities of $V$. Let $W = U + V$ and let $w(z)$ be the
probability generating function of $W$. Then $w(z) = u(z) v(z)$.

Proof We have for any integer $k \ge 0$ that

$$w_k = P[U + V = k]$$
$$= P[U = 0 \text{ and } V = k, \text{ or } U = 1 \text{ and } V = k-1, \dots, \text{ or } U = k \text{ and } V = 0]$$
$$= P[U = 0 \text{ and } V = k] + P[U = 1 \text{ and } V = k-1] + \dots + P[U = k \text{ and } V = 0] \quad \text{(since the events are disjoint)}$$
$$= P[U = 0]P[V = k] + P[U = 1]P[V = k-1] + \dots + P[U = k]P[V = 0] \quad \text{(since } U \text{ and } V \text{ are independent)}$$
$$= u_0 v_k + u_1 v_{k-1} + \dots + u_k v_0 = \sum_{i=0}^{k} u_i v_{k-i}.$$

This is called the convolution of $\{u_i\}$ and $\{v_j\}$. Hence

$$w(z) = \sum_{k=0}^{\infty} w_k z^k = \sum_{k=0}^{\infty} \sum_{i=0}^{k} u_i v_{k-i} z^k = \sum_{k=0}^{\infty} \sum_{i=0}^{k} (u_i z^i)(v_{k-i} z^{k-i}) = \sum_{i=0}^{\infty} u_i z^i \sum_{j=0}^{\infty} v_j z^j = u(z) v(z),$$

where the second-last equality follows from Theorem 2.5.4. ∎

NOTE: If $U_1$ and $U_2$ are non-negative integer valued, identically distributed and
independent random variables, i.e. both have the same probability generating function,
say $u(z)$, then the probability generating function of $U_1 + U_2$ is given by $[u(z)]^2$.

NOTE: If $U_1, U_2, U_3, \dots, U_n$ are non-negative integer valued, independent and identically
distributed random variables all with probability generating function $u(z)$, then the
probability generating function of $U_1 + U_2 + U_3 + \dots + U_n$ is given by $[u(z)]^n$.
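Theorem 2.5.5 can be checked numerically for variables with finite support: the pmf of $W = U + V$ is the discrete convolution of the two pmfs, and the PGF of $W$ factorizes. An added sketch with arbitrary pmfs:

import numpy as np

u_pmf = np.array([0.2, 0.5, 0.3])   # P[U = 0], P[U = 1], P[U = 2] (arbitrary)
v_pmf = np.array([0.6, 0.4])        # P[V = 0], P[V = 1] (arbitrary)

# pmf of W = U + V is the convolution of the two pmfs
w_pmf = np.convolve(u_pmf, v_pmf)
print("pmf of W:", w_pmf, " sums to", w_pmf.sum())

# Check w(z) = u(z) v(z) at an arbitrary point z
z = 0.7
pgf = lambda pmf, z: np.polyval(pmf[::-1], z)   # evaluates sum_i pmf[i] z^i
print(pgf(w_pmf, z), "vs", pgf(u_pmf, z) * pgf(v_pmf, z))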

Theorem 2.5.6 Let $N$ be a non-negative integer valued random variable with
probabilities $p_n = P[N = n]$ and with probability generating function $\Phi_N(z)$. Let
$X_1, X_2, X_3, \dots$ be a sequence of independent identically distributed random variables with
moment generating function $m_X(u)$. Suppose that $N$ is independent of all the $X_i$'s. Let

$$T = \begin{cases} \sum_{i=1}^{N} X_i & \text{if } N > 0 \\ 0 & \text{if } N = 0 \end{cases}$$

and let $M_T(u)$ be the moment generating function of $T$. Then
$$M_T(u) = \Phi_N(m_X(u)).$$
Proof
$$M_T(u) = E[e^{uT}] = E\Big[e^{u \sum_{i=1}^{N} X_i}\Big] = E_N\Big[E\Big[e^{u \sum_{i=1}^{N} X_i} \,\Big|\, N\Big]\Big]$$
$$= E\Big[e^{u \sum_{i=1}^{N} X_i} \,\Big|\, N = 0\Big] p_0 + \sum_{n=1}^{\infty} E\Big[e^{u \sum_{i=1}^{N} X_i} \,\Big|\, N = n\Big] p_n$$
$$= E[e^{u \cdot 0}] \, p_0 + \sum_{n=1}^{\infty} E\Big[e^{u \sum_{i=1}^{n} X_i}\Big] p_n \quad \text{(since all } X_i\text{'s are independent of } N)$$
$$= 1 \cdot p_0 + \sum_{n=1}^{\infty} [m_X(u)]^n p_n \quad \text{(since the } X_i\text{'s are independent)}$$
$$= \sum_{n=0}^{\infty} [m_X(u)]^n p_n = \Phi_N(m_X(u)) \quad \text{(since } [m_X(u)]^0 = 1). \; ∎$$

Theorem 2.5.7 For a compound Poisson process with parameter $\lambda$, the moment
generating function of

$$Y_t = \begin{cases} \sum_{i=1}^{N_t} X_i & \text{if } N_t > 0 \\ 0 & \text{if } N_t = 0 \end{cases}$$

is given by $e^{\lambda t (M_X(u) - 1)}$.

Proof 4 (CT, ST) The probability generating function of a Poisson random variable with
parameter $\lambda t$ is given by

$$\Phi_{N_t}(u) = \sum_{n=0}^{\infty} e^{-\lambda t} \frac{(\lambda t)^n}{n!} u^n \quad \text{(since } N_t - N_0 = N_t \sim \text{Poisson}(\lambda t))$$
$$= e^{-\lambda t} \sum_{n=0}^{\infty} \frac{(\lambda t u)^n}{n!} = e^{-\lambda t} e^{\lambda t u} = e^{\lambda t (u - 1)} \quad \Big(\text{using } e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}\Big)$$

and therefore by Theorem 2.5.6, $M_{Y_t}(u) = e^{\lambda t (M_X(u) - 1)}$. ∎

Test Yourself:
Exercise 2.1
(a) If you were to model the maximum daily temperature in Pretoria, what state space
and parameter space would you use? Describe the nature of these spaces.
(b) Suppose you have available a continuous record of temperature instead and wanted to
model Pretoria's temperature at all times. Describe the parameter space, state space and
the nature of these spaces.
Exercise 2.2
Consider independent identically distributed random variables $X_t$ for $t = 1, 2, 3, \dots$ where
$X_t = 1$ with probability $p$ and $X_t = -1$ with probability $q = 1 - p$. Let $Y_0 = 0$ and
$Y_t = \sum_{i=1}^{t} X_i$, i.e. $\{Y_t : t = 0, 1, 2, \dots\}$ is a random walk.
(a) Explain why the $Y_t$'s are not identically distributed (HINT: determine the
distributions of $Y_1$ and $Y_2$).
(b) Explain why the $Y_t$'s are not independent (HINT: determine the conditional
distribution of $Y_2$ given that $Y_1 = 1$).
Exercise 2.3
For the process $\{Y_t : t = 0, 1, 2, \dots\}$ defined in Exercise 2.2, determine
(a) $P[Y_2 = 1, Y_4 = 3 \mid Y_0 = 0]$
(b) $P[Y_2 = 0, Y_4 = 2 \mid Y_0 = 0]$.
Exercise 2.4
To specify the probability structure of a stochastic process $\{X_t : t \in T\}$ it is required to
specify the distribution of $X_t$ for all values of $t \in T$ - TRUE or FALSE?
Exercise 2.5
For the process $\{Y_t : t = 0, 1, 2, \dots\}$ defined in Exercise 2.2
(a) Determine $P[Y_{10} = 10]$ and $P[Y_2 = 10]$.
(b) Is the random walk stationary?
Exercise 2.6
What can be concluded about the variance of a weakly stationary stochastic process
$\{X_t : t \in T\}$?
Exercise 2.7
Prove that a random walk has the Markov property.
Exercise 2.8
Suppose that $\{Y_t : t = 0, 1, 2, \dots\}$ is a moving average process of order $p$.
(a) Prove that such a process is weakly stationary.
(b) Does the process have the Markov property? Explain.
(c) Does the process have independent increments? Explain.

Chapter 3: Markov Chains

3.1 Chapman-Kolmogorov Equations

For a Markov chain both the parameter space and the state space are discrete, i.e. finite
or denumerable. For convenience we will take the parameter space to be $T = \{0, 1, 2, 3, \dots\}$
and the state space to be $S = \{1, 2, 3, \dots, N\}$ for some $N$, or $S = \{1, 2, 3, \dots\}$.
The stochastic process $\{X_t : t \in T\}$ is a Markov chain if for any value of $n$ and for any
values of $t_1 < t_2 < \dots < t_n < t_{n+1} \in T$, it is true that

$$P[X_{t_{n+1}} = x_{t_{n+1}} \mid X_{t_1} = x_{t_1}, X_{t_2} = x_{t_2}, \dots, X_{t_n} = x_{t_n}] = P[X_{t_{n+1}} = x_{t_{n+1}} \mid X_{t_n} = x_{t_n}]$$

for all $x_{t_1}, x_{t_2}, \dots, x_{t_n}, x_{t_{n+1}} \in S$.
Let $p_{i,j}^{(m,n)} = P[X_n = j \mid X_m = i]$, i.e. this is the probability that, given that at time $m$ the
process was in state $i$, it will be in state $j$ at time $n$. These so-called transition
probabilities play a vital role in the theory of Markov chains.

Theorem 3.1.1 For a Markov chain $\{X_t : t \in T\}$ we have that for any
$t_0 < t_1 < \dots < t_n \in T$ and any values $i_0, i_1, \dots, i_n \in S$ it is true that

$$P[X_{t_1} = i_1, X_{t_2} = i_2, \dots, X_{t_n} = i_n \mid X_{t_0} = i_0] = p_{i_0, i_1}^{(t_0, t_1)} p_{i_1, i_2}^{(t_1, t_2)} p_{i_2, i_3}^{(t_2, t_3)} \cdots p_{i_{n-1}, i_n}^{(t_{n-1}, t_n)}.$$

Proof 6 (CT) We will only prove the result for $n = 2$. First, note that for any events $A$,
$B$ and $C$ it is true that

$$P[A \cap B \mid C] = \frac{P[A \cap B \cap C]}{P[C]} \quad \text{(definition of conditional probability)}$$
$$= \frac{P[A \cap C]}{P[C]} \cdot \frac{P[A \cap B \cap C]}{P[A \cap C]} = P[A \mid C] \cdot P[B \mid A \cap C] \quad \text{(definition of conditional probability).}$$

Therefore
$$P[X_{t_1} = i_1, X_{t_2} = i_2 \mid X_{t_0} = i_0] = P[X_{t_1} = i_1 \mid X_{t_0} = i_0] \cdot P[X_{t_2} = i_2 \mid X_{t_0} = i_0, X_{t_1} = i_1] \quad \text{(from the result proven above)}$$
$$= P[X_{t_1} = i_1 \mid X_{t_0} = i_0] \cdot P[X_{t_2} = i_2 \mid X_{t_1} = i_1] \quad \text{(Markov property)}$$
$$= p_{i_0, i_1}^{(t_0, t_1)} p_{i_1, i_2}^{(t_1, t_2)}.$$

The result then follows by induction for all $n$. ∎

Let $q_i = P[X_0 = i]$, i.e. the initial probabilities.

Theorem 3.1.2 For a Markov chain $\{X_t : t \in T\}$ we have that for any
$0 < t_1 < \dots < t_n \in T$ and any values $i_1, i_2, \dots, i_n \in S$ it is true that

$$P[X_{t_1} = i_1, X_{t_2} = i_2, \dots, X_{t_n} = i_n] = \sum_{i_0 \in S} q_{i_0} p_{i_0, i_1}^{(0, t_1)} p_{i_1, i_2}^{(t_1, t_2)} p_{i_2, i_3}^{(t_2, t_3)} \cdots p_{i_{n-1}, i_n}^{(t_{n-1}, t_n)}.$$

Proof 7 (CT) The event $\{X_{t_1} = i_1, X_{t_2} = i_2, \dots, X_{t_n} = i_n\}$ can take place in a number of
different disjoint ways. Namely, the process can start in any state $i_0$ at time 0, then go to
state $i_1$ at time $t_1$ etc. Hence

$$P[X_{t_1} = i_1, X_{t_2} = i_2, \dots, X_{t_n} = i_n] = P\Big[\bigcup_{i_0 \in S} \{X_0 = i_0, X_{t_1} = i_1, X_{t_2} = i_2, \dots, X_{t_n} = i_n\}\Big]$$
$$= \sum_{i_0 \in S} P[X_0 = i_0, X_{t_1} = i_1, X_{t_2} = i_2, \dots, X_{t_n} = i_n] \quad \text{(since the events are disjoint)}$$
$$= \sum_{i_0 \in S} P[X_0 = i_0] P[X_{t_1} = i_1, X_{t_2} = i_2, \dots, X_{t_n} = i_n \mid X_0 = i_0] \quad \text{(def. of conditional probability)}$$
$$= \sum_{i_0 \in S} q_{i_0} p_{i_0, i_1}^{(0, t_1)} p_{i_1, i_2}^{(t_1, t_2)} p_{i_2, i_3}^{(t_2, t_3)} \cdots p_{i_{n-1}, i_n}^{(t_{n-1}, t_n)} \quad \text{(by Theorem 3.1.1).} \; ∎$$

Theorem 3.1.2 shows that the joint distribution of any number of the $X_t$'s can be
determined in terms of the probabilities of $X_0$ and all the transition probabilities. We
will now show that these transition probabilities must satisfy certain relationships.

Theorem 3.1.3 The Chapman-Kolmogorov Equations
For a Markov chain $\{X_t : t \in T\}$ and $m < l < n$ we have that

$$p_{i,j}^{(m,n)} = \sum_{k \in S} p_{i,k}^{(m,l)} p_{k,j}^{(l,n)}.$$

Proof 8 (CT) Given that the process starts in state $i$ at time $m$, the process must be in
some state, say $k$, at time $l$ and then go from state $k$ at time $l$ to state $j$ at time $n$, i.e.

$$P[X_n = j \mid X_m = i] = P\Big[\bigcup_{k \in S} \{X_l = k, X_n = j\} \,\Big|\, X_m = i\Big]$$
$$= \sum_{k \in S} P[X_l = k, X_n = j \mid X_m = i] \quad \text{(since the events are disjoint)}$$
$$= \sum_{k \in S} P[X_l = k \mid X_m = i] P[X_n = j \mid X_l = k, X_m = i] \quad \text{(from the result in Theorem 3.1.1)}$$
$$= \sum_{k \in S} P[X_l = k \mid X_m = i] P[X_n = j \mid X_l = k] \quad \text{(Markov property)}$$

i.e. $p_{i,j}^{(m,n)} = \sum_{k \in S} p_{i,k}^{(m,l)} p_{k,j}^{(l,n)}$. ∎

Let $P^{(m,n)}$ be the matrix whose $(i,j)$th element is $p_{i,j}^{(m,n)}$, i.e. $P^{(m,n)} = [p_{i,j}^{(m,n)}]$. From the
Chapman-Kolmogorov equations it follows that the $(i,j)$th element of $P^{(m,n)}$ is the $i$th
row of $P^{(m,l)}$ multiplied by the $j$th column of $P^{(l,n)}$. Hence

$$P^{(m,n)} = P^{(m,l)} P^{(l,n)} \quad \text{for any } m < l < n.$$

Using this result repeatedly we obtain that

$$P^{(m,n)} = P^{(m,m+1)} P^{(m+1,n)} \quad \text{if } m + 1 < n$$
$$= P^{(m,m+1)} P^{(m+1,m+2)} P^{(m+2,n)} \quad \text{if } m + 2 < n$$
$$\vdots$$
$$= P^{(m,m+1)} P^{(m+1,m+2)} P^{(m+2,m+3)} \cdots P^{(n-1,n)}. \tag{3.1.1}$$

This result shows that the transition probabilities for a number of steps can be expressed
in terms of one-step transition probabilities. All that is therefore required to specify the
probabilities for a Markov chain are the one-step transition probabilities, i.e. the one-step
transition probability matrix.

Example 3.1.1 Suppose that a machine used in some production process may be either
in working order or out of order. Let us denote the two states by 1 and 2 respectively and
divide time into certain periods (e.g. day, month, year etc.). Let $X_m$ be the state of the
machine at the start of the $m$th period. Then $S = \{1, 2\}$ and $T = \{1, 2, 3, \dots\}$; therefore $X_1 = 1$
will indicate that the process is started at the beginning of the first period with a machine
that is in order. Now suppose that if the machine is in working order at the start of the
$m$th period, the probability of the machine being in working order at the start of the
$(m+1)$th period is $\frac{100}{100 + m}$. Note that this probability only depends on the state of the
machine (being in order) at time $m$ (start of the $m$th period) and not on anything else that
may have happened before that time; it does however depend on $m$ (the 'age' of the
machine). (Thus the process is Markov but not time-homogeneous.) Also, suppose that
the probability that a machine that is out of order at the beginning of the $m$th period is in
working order at the beginning of the next period is given by $\frac{200}{200 + m}$. This
probability also does not depend on anything that might have happened before the start of
the $m$th period but does depend on $m$. This means that the process is a Markov chain
with transition matrices

$$P^{(m,m+1)} = \begin{pmatrix} \frac{100}{100+m} & 1 - \frac{100}{100+m} \\ \frac{200}{200+m} & 1 - \frac{200}{200+m} \end{pmatrix}$$

i.e.

$$P^{(1,2)} = \begin{pmatrix} \frac{100}{101} & \frac{1}{101} \\ \frac{200}{201} & \frac{1}{201} \end{pmatrix} \quad \text{and} \quad P^{(2,3)} = \begin{pmatrix} \frac{100}{102} & \frac{2}{102} \\ \frac{200}{202} & \frac{2}{202} \end{pmatrix}.$$

Hence

$$P^{(1,3)} = P^{(1,2)} P^{(2,3)} = \begin{pmatrix} 0.980488 & 0.019512 \\ 0.980440 & 0.019560 \end{pmatrix}.$$

Thus, for instance, the probability that a machine in working order at the beginning of
period 1 will be out of order at the beginning of the third period is 0.019512.
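The matrix product above is easy to reproduce; a short added check in Python:

import numpy as np

P12 = np.array([[100/101, 1/101], [200/201, 1/201]])   # P^(1,2)
P23 = np.array([[100/102, 2/102], [200/202, 2/202]])   # P^(2,3)

# Chapman-Kolmogorov: P^(1,3) = P^(1,2) P^(2,3)
P13 = P12 @ P23
print(P13)   # approximately [[0.980488, 0.019512], [0.980440, 0.019560]]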

3.2 Time-Homogeneous Markov Chains

A tremendous simplification occurs if all one-step transition matrices are the same, i.e. if
$p_{i,j}^{(m,m+1)} = p_{i,j}$ for all values of $m$. In this case the probability of a transition from state $i$
to state $j$ only depends on the particular states $i$ and $j$, and does not depend on the
particular point in time. Such a Markov chain is called time-homogeneous. In the rest of
this chapter we will work only with time-homogeneous Markov chains. In all
applications it must always be determined whether or not a Markov chain is time-
homogeneous before results are applied.
Let $P = [p_{i,j}]$. If $S$ is finite with $N$ elements, we have that

$$P = \begin{pmatrix} p_{11} & p_{12} & p_{13} & \cdots & p_{1N} \\ p_{21} & p_{22} & p_{23} & \cdots & p_{2N} \\ p_{31} & p_{32} & p_{33} & \cdots & p_{3N} \\ \vdots & \vdots & \vdots & & \vdots \\ p_{N1} & p_{N2} & p_{N3} & \cdots & p_{NN} \end{pmatrix},$$

and if $S$ is infinite $P$ is an infinite matrix. Note that the $i$th row contains all the
transition probabilities to go from state $i$ to one of the states in $S$. Since it is certain that
starting in state $i$ the process will go to one of the states in one step, it is true that
$\sum_{j \in S} p_{i,j} = 1$ for all $i$. A matrix with this property is called a stochastic matrix.

From (3.1.1) it now follows that

$$P^{(m,n)} = P \cdot P \cdots P \quad (n - m \text{ times}) = P^{\,n-m},$$

i.e. the transition probabilities depend only on the number of steps $(n - m)$ and not on the
time that the transition starts.
Let $p_{i,j}^{(k)} = p_{i,j}^{(m,m+k)}$ and let $P^{(k)} = [p_{i,j}^{(k)}]$, i.e. the matrix $P^{(k)}$ contains all the probabilities
of going from one state to another in $k$ steps. From above we also see that

$$P^{(k)} = P^k. \tag{3.2.1}$$

The Chapman-Kolmogorov equations for a time-homogeneous Markov chain are given
by

$$p_{i,j}^{(n-m)} = \sum_{k \in S} p_{i,k}^{(l-m)} p_{k,j}^{(n-l)}. \tag{3.2.2}$$

Example 3.2.1 Suppose that a process can be in any one of two states. If the process is in
state 1 the probability is 0.6 that it will remain in state 1 and the probability is 0.4 that it
will change to state 2. If the process is in state 2 the probability is 0.7 that it will remain
in state 2 and the probability is 0.3 that it will change to state 1. Assume that in this case
all changes (increments) from any time $s$ to any time $t$, i.e. $X_t - X_s$, are independent of the
state of the process at times $0, 1, 2, \dots, s$, i.e. the process has independent increments (and is
therefore also a Markov process).

This process is a time-homogeneous Markov process with transition probability matrix

$$P = \begin{pmatrix} 0.6 & 0.4 \\ 0.3 & 0.7 \end{pmatrix}.$$

Then

$$P^{(2)} = P^2 = \begin{pmatrix} 0.6 & 0.4 \\ 0.3 & 0.7 \end{pmatrix} \begin{pmatrix} 0.6 & 0.4 \\ 0.3 & 0.7 \end{pmatrix} = \begin{pmatrix} 0.48 & 0.52 \\ 0.39 & 0.61 \end{pmatrix}$$

and

$$P^{(3)} = P^3 = \begin{pmatrix} 0.6 & 0.4 \\ 0.3 & 0.7 \end{pmatrix} \begin{pmatrix} 0.48 & 0.52 \\ 0.39 & 0.61 \end{pmatrix} = \begin{pmatrix} 0.444 & 0.556 \\ 0.417 & 0.583 \end{pmatrix} \quad \text{etc.}$$

Example 3.2.2 Suppose that players A and B gamble and that at each play the bet is
for R1. Both of them put up R1 at each play and the one that wins gets both rands, so that
after one play the winner will have R1 more than before the play and the loser will have
R1 less than before the play. Let $n$ be the total amount the two players together have
available to play. Let $Y_0$ be the amount player A has available to start with and let $Y_m$ be
the amount player A has after $m$ bets. They continue to play until one of them has won
all the money available. Therefore

$$Y_{m+1} = \begin{cases} Y_m + 1 & \text{if } 0 < Y_m < n \text{ and player A wins the } (m+1)\text{th bet} \\ Y_m - 1 & \text{if } 0 < Y_m < n \text{ and player A loses the } (m+1)\text{th bet} \\ Y_m & \text{if } Y_m = n \text{ or } 0. \end{cases}$$

The state of the process at time $m + 1$ only depends on the state of the process at time
$m$ and the result of the $(m+1)$th bet. If we assume that the result of any bet does not
depend on anything that happened before the bet, the process will be a Markov chain. It is
usual to assume that the results of the different bets are completely independent events
and that the result of any bet is not dependent on anything else that might have happened
in the process. If we also assume that the probability that A wins the bet is the same for
all bets, say $p$, then the process will be a time-homogeneous Markov chain. In this case
the matrix $P$ is given by

$$P = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ q & 0 & p & \cdots & 0 & 0 \\ 0 & q & 0 & p & \cdots & 0 \\ \vdots & & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & q & 0 & p \\ 0 & 0 & \cdots & 0 & 0 & 1 \end{pmatrix}.$$

Note that this process is a Markov process but does not have independent increments.
Why?
Some interesting questions about this process are, for instance, what the probability is
that A will lose all his money and how this probability depends on the amount A had
available at the start of the process, what the probabilities are that A will lose all his
money after a certain number of bets, the expected duration of the game, etc.
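A small added helper that builds this transition matrix for given $n$ and $p$ (a sketch; the function name gamblers_ruin_P is my own):

import numpy as np

def gamblers_ruin_P(n, p):
    # One-step transition matrix on states 0, 1, ..., n; 0 and n are absorbing
    q = 1.0 - p
    P = np.zeros((n + 1, n + 1))
    P[0, 0] = P[n, n] = 1.0
    for i in range(1, n):
        P[i, i - 1] = q   # A loses the bet
        P[i, i + 1] = p   # A wins the bet
    return P

print(gamblers_ruin_P(5, 0.5))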

3.3 Classification of States

Definition 3.3.1 State j is accessible from state i , if there exists an


n > 0 if i ≠ j (n ≥ 0 if i = j ) such that pi(,nj) > 0 . Two states i and j
communicate if j is accessible from state i and state i is accessible from
state j
i.e. there exists n > 0 if i ≠ j (n ≥ 0 if i = j ) and m > 0 if i ≠ j (m ≥ 0 if i = j )
such that pi(,nj) > 0 and p (jm,i ) > 0 .
Notation: If j is accessible from i it is indicated by i → j and if i and j communicate
it is indicated by i ↔ j .

Example 3.3.1 (continuation of Example 3.2.2)

Suppose players A and B have a total amount of R5 between them to gamble. For each
bet the probability that A will win the bet is $p$, $0 < p < 1$. Let $Y_m$ be the amount that A
has at time $m$. The possible states are $0, 1, 2, 3, 4, 5$ and the transition probability matrix is
then given by (where $q = 1 - p$)

$$P = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ q & 0 & p & 0 & 0 & 0 \\ 0 & q & 0 & p & 0 & 0 \\ 0 & 0 & q & 0 & p & 0 \\ 0 & 0 & 0 & q & 0 & p \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}.$$

The rows and columns correspond to the possible states $0, 1, 2, 3, 4, 5$ in that order.
- Let $i$ be any one of the states $1, 2, 3, 4$, i.e. when they can still continue with their
bets. If A loses the next $i$ bets then A's fortune will be R0 and the probability of
this event is $q^i > 0$ (i.e. $p_{i,0}^{(i)} > 0$), so 0 is accessible from $i$.
- If A wins the next $5 - i$ bets A's fortune will become R5 and the probability of this
event is $p^{5-i} > 0$ (i.e. $p_{i,5}^{(5-i)} > 0$), so 5 is accessible from state $i$.
- Now suppose that $1 \le i \le 4$ and $1 \le j \le 4$ and that $i < j$. If A's fortune is $i$ and he
wins the next $j - i$ bets his fortune will become $j$, and the probability of this event
is $p^{j-i} > 0$ (i.e. $p_{i,j}^{(j-i)} > 0$), so $j$ is accessible from $i$. If A's fortune is $j$ and he then
loses the next $j - i$ bets, A's fortune will become $i$, and the probability of this event
is $q^{j-i} > 0$ (i.e. $p_{j,i}^{(j-i)} > 0$), so $i$ is accessible from $j$. Therefore states 1, 2, 3, 4 all
communicate with each other.
- If A's fortune is R0 then it will remain 0, i.e. for any number of transitions the
probability is zero that A's fortune will change, and therefore state 0 does not
communicate with any other state. A state such as 0 is called absorbing. Similarly
state 5 does not communicate with any other state and is also absorbing. ♪

Theorem 3.3.1 If state $i$ communicates with state $j$ and state $j$
communicates with state $k$, then state $i$ communicates with state $k$, i.e.
$i \leftrightarrow j,\ j \leftrightarrow k \Rightarrow i \leftrightarrow k$.

Proof 9 (ST) Since $j$ is accessible from state $i$, there exists an $m$ such that $p_{i,j}^{(m)} > 0$,
and since state $k$ is accessible from state $j$, there exists an $n$ such that $p_{j,k}^{(n)} > 0$. From
the time-homogeneous Chapman-Kolmogorov equations (see 3.2.2) it follows that

$$p_{i,k}^{(m+n)} = \sum_{l \in S} p_{i,l}^{(m)} p_{l,k}^{(n)} \ge p_{i,j}^{(m)} p_{j,k}^{(n)} > 0$$

from the above results, i.e. $k$ is accessible from $i$. Similarly we can show that $i$ is
accessible from $k$ and hence $i$ communicates with $k$. ∎

From Theorem 3.3.1 it then follows that the states of a process can be divided into disjoint
subsets such that all states that belong to the same set communicate with each other and
do not communicate with any state in another set. Such sets of states that all
communicate with each other are called equivalence classes. Any state must belong to
some equivalence class, since it either does not communicate with any other state, in
which case it is an equivalence class by itself, or it does communicate with at least
one other state and then it belongs to the equivalence class of all states communicating
with it.
A process is called irreducible if there is only one equivalence class, i.e. if all states
communicate with each other.
For Example 3.2.2, $\{1, 2, 3, 4\}$ is an equivalence class since all these states communicate. The
class $\{0\}$ is an equivalence class by itself since state 0 does not communicate with any
other state. Similarly $\{5\}$ is an equivalence class. In this case the process is not
irreducible.

Theorem 3.3.2 Suppose that $\{X_t : t = 0, 1, 2, \dots\}$ is a time-homogeneous Markov
chain and that $C_1$ and $C_2$ are two equivalence classes of states of the chain.
If $i \in C_1$ and $j \in C_2$ and $p_{i,j}^{(m)} > 0$ for some $m$, then $p_{j,k}^{(n)} = 0$ for all $k \in C_1$ and
for all $n$.
(This means that if the process goes from a state in $C_1$ to a state in $C_2$ the process can
never return to a state in $C_1$.)

Proof 10 (ST, E) It is given that $p_{i,j}^{(m)} > 0$. Suppose $p_{j,k}^{(n)} > 0$ for some $n$ and some
$k \in C_1$. Since $i \leftrightarrow k$ ($i$ and $k$ belong to the same equivalence class $C_1$) there exists an
$s$ such that $p_{k,i}^{(s)} > 0$, so that by the Chapman-Kolmogorov equations

$$p_{j,i}^{(n+s)} \ge p_{j,k}^{(n)} p_{k,i}^{(s)} > 0$$

from the above, i.e. $i$ and $j$ communicate. This would however mean that $j \in C_1$ since all
states that communicate with $i$ have to belong to the same equivalence class. This is a
contradiction, thus we must have $p_{j,k}^{(n)} = 0$ for all $k \in C_1$ and all $n$. ∎

NOTE: Theorem 3.3.2 implies that if it is possible to go from a state in an equivalence
class to a state in another equivalence class, it is impossible to return to a state in the
previous equivalence class.
In Example 3.3.1 the set $\{1, 2, 3, 4\}$ is an equivalence class since all these states communicate
with each other and do not communicate with any state that does not belong to the set.
We also have that $\{0\}$ and $\{5\}$ are equivalence classes since each of these states only
communicates with itself. Note that the process can go from any state in $\{1, 2, 3, 4\}$ to $\{0\}$
but that it is then impossible to return to any of the states in $\{1, 2, 3, 4\}$. Similarly for $\{5\}$.

Definition 3.3.2 Suppose that $\{X_t : t = 0, 1, 2, 3, \dots\}$ is a time-homogeneous
Markov chain. Let $d_i = \gcd\{n : n \ge 1, p_{i,i}^{(n)} > 0\}$. Then $d_i$ is called the period of
state $i$. A state is called aperiodic if its period is 1.

Example 3.3.2 Suppose that $\{X_t : t = 0, 1, 2, 3, \dots\}$ is a time-homogeneous Markov chain
with $S = \{1, 2, 3\}$ and transition probability matrix equal to

$$P = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}.$$

If the process is in state 1 then it will always go to state 2, then it will always go to state 3
and then it will always go to state 1. Hence if the process starts in state 1 it can only
return to state 1 (for the first time) after 3 steps. Therefore
$d_1 = \gcd\{n : p_{1,1}^{(n)} > 0\} = \gcd\{3, 6, 9, \dots\} = 3$, i.e. state 1 has period 3. Similarly it can be
shown that states 2 and 3 also have period 3. ♪
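The period can also be read off numerically. The added sketch below collects the return times $n$ with $p_{1,1}^{(n)} > 0$ up to a finite horizon and takes their gcd (over a finite horizon this only checks the period, not proves it, but it is enough for this small chain):

import numpy as np
from math import gcd
from functools import reduce

P = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=float)

# All n <= 30 for which the n-step return probability to state 1 is positive
returns = [n for n in range(1, 31) if np.linalg.matrix_power(P, n)[0, 0] > 0]
print(returns, " period d_1 =", reduce(gcd, returns))   # 3, 6, 9, ... -> gcd 3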

We state the following three theorems without proof (a proof of the theorems can be
found in the book by Bhat titled Applied Stochastic Processes).

Theorem 3.3.3 Suppose that $\{X_t : t = 0, 1, 2, \dots\}$ is a time-homogeneous Markov
chain and that state $i$ has period $d_i \ge 1$. If $p_{j,i}^{(m)} > 0$, then there exists an integer
$N$ such that for $n \ge N$, $p_{j,i}^{(m + n d_i)} > 0$.

From Theorem 3.3.3 it follows that if state $i$ is aperiodic ($d_i = 1$) and if $p_{j,i}^{(m)} > 0$, then
there exists an $N$ such that $p_{j,i}^{(m+n)} > 0$ for all $n \ge N$, i.e. $p_{j,i}^{(s)} > 0$ for all $s \ge N + m$, i.e.
from some point onwards all $p_{j,i}^{(s)}$'s are $> 0$.

Theorem 3.3.4 Suppose that $\{X_n : n = 0, 1, 2, \dots\}$ is a time-homogeneous
Markov chain. If $i \leftrightarrow j$ then $i$ and $j$ have the same period.

From Theorem 3.3.4 it follows that all states in an equivalence class have the same period.
An equivalence class of states with period 1 is called aperiodic.

Theorem 3.3.5 Let $P$ be the transition matrix of an irreducible, aperiodic
FINITE Markov chain. Then there exists an $N$ such that for all $n \ge N$ the
$n$-step transition matrix $P^n$ has no zero elements.

Definition 3.3.3 Suppose that $\{X_t : t = 0, 1, 2, \dots\}$ is a time-homogeneous
Markov chain. Let

$$f_{ij}^{(0)} = 0 \quad \text{and} \quad f_{ij}^{(n)} = P[X_1 \ne j, X_2 \ne j, \dots, X_{n-1} \ne j, X_n = j \mid X_0 = i] \quad \text{for } n \ge 1,$$

i.e. $f_{ij}^{(n)}$ is the probability, given that the process started in state $i$, that the
process will visit state $j$ for the first time after $n$ transitions. Let
$f_{ij} = \sum_{n=0}^{\infty} f_{ij}^{(n)}$. Then $f_{ij}$ is the probability, given that the process started in state
$i$, that the process will visit state $j$ for the first time at some time. Note that
$f_{ij} > 0$ if and only if state $j$ is accessible from state $i$.
State $i$ is called recurrent if $f_{ii} = 1$, i.e. if the process is in state $i$ then it is
certain that the process will return to state $i$ at some time.
State $j$ is called transient if $f_{jj} < 1$, i.e. the process is in state $j$ and there is a
positive probability ($1 - f_{jj}$) that the process will never return to state $j$.

For Example 3.3.1 (gambling problem) states 0 and 5 are recurrent, since once the process
is in one of these states it will stay in that state with probability 1, i.e. it will return to
that state with probability 1. States 1, 2, 3, 4 are transient states since there is a positive
probability that the process will go from any of these states, say $i$, to state 0 and then
never return to state $i$, i.e. $f_{ii} < 1$.

Theorem 3.3.6 Suppose that $\{X_t : t = 0, 1, 2, 3, \dots\}$ is a time-homogeneous
Markov chain. State $i$ is recurrent if and only if $\sum_{n=0}^{\infty} p_{ii}^{(n)} = \infty$.

Proof 11 (ST, E) If state $i$ is recurrent then the process, given that it started in state $i$,
will with probability 1 return to state $i$ at some time. As a result of the Markov property
it will then be as if the process starts from scratch once the process has returned to state $i$;
the probabilities depend only on the fact that the process starts in state $i$. The process
will then again return to state $i$ at some time with probability 1. With probability 1 there
will be a next visit to state $i$, and so there will be an infinite number of visits to state $i$
with probability 1. The expected number of visits to state $i$ is therefore infinite.
Conversely, if state $i$ is transient then the process will return to state $i$ at some time
with probability $f_{ii} < 1$. If there is a visit, the process will start over from scratch (why?)
and there will be a next visit with probability $f_{ii}$. The number of visits to state $i$ before
no further visits occur will have a geometric distribution with parameter $p = 1 - f_{ii}$. [Regard a
success as the event that there are no further visits to state $i$; the geometric
distribution gives the probabilities that there will be a certain number of failures and then
a success.] In this case the expected number of visits to state $i$ is $\frac{1}{1 - f_{ii}} < \infty$ since
$f_{ii} < 1$.
State $i$ is therefore recurrent if and only if the expected number of visits to state $i$ is
infinite.

Now consider the following random variable, given that the process started in state $i$:

$$I_n = \begin{cases} 1 & \text{if at time } n \text{ the process is in state } i \\ 0 & \text{if at time } n \text{ the process is not in state } i. \end{cases}$$

Then $E[I_n] = 1 \times p_{i,i}^{(n)} + 0 \times (1 - p_{i,i}^{(n)}) = p_{i,i}^{(n)}$. Note that $\sum_{n=0}^{\infty} I_n$ is the total number of visits to
state $i$ and therefore

$$E[\text{total number of visits}] = E\Big[\sum_{n=0}^{\infty} I_n\Big] = \sum_{n=0}^{\infty} E[I_n] = \sum_{n=0}^{\infty} p_{i,i}^{(n)}. \; ∎$$

From Theorem 3.3.6 it follows that if state $i$ is transient, then $\sum_{n=0}^{\infty} p_{ii}^{(n)} < \infty$, from which it
then follows that $\lim_{n \to \infty} p_{ii}^{(n)} = 0$.

Theorem 3.3.7 If state $i$ is recurrent and if $i \leftrightarrow j$, then state $j$ is recurrent.

Proof 12 (ST, E) Let $m$ and $n$ be such that $p_{i,j}^{(m)} > 0$ and $p_{j,i}^{(n)} > 0$ (these exist since $i \leftrightarrow j$). Then
for any $s \ge 0$, $p_{j,j}^{(n+s+m)} \ge p_{j,i}^{(n)} p_{i,i}^{(s)} p_{i,j}^{(m)}$, since to go from $j$ to $i$ in $n$ transitions, then to go
from $i$ to $i$ in $s$ transitions and then to go from $i$ to $j$ in $m$ transitions is only one of
the possible ways to go from $j$ to $j$ in $n + s + m$ transitions. Hence

$$\sum_{l=0}^{\infty} p_{j,j}^{(l)} \ge \sum_{s=0}^{\infty} p_{j,j}^{(n+s+m)} \ge \sum_{s=0}^{\infty} p_{j,i}^{(n)} p_{i,i}^{(s)} p_{i,j}^{(m)} \quad \text{(Chapman-Kolmogorov equations)}$$
$$= p_{j,i}^{(n)} p_{i,j}^{(m)} \sum_{s=0}^{\infty} p_{i,i}^{(s)} = \infty$$

from Theorem 3.3.6 since $i$ is recurrent, and therefore $j$ is also recurrent since
$\sum_{n=0}^{\infty} p_{j,j}^{(n)} = \infty$ (by Theorem 3.3.6). ∎

NOTE: All states in an equivalence class are therefore either all recurrent or all
transient.

Theorem 3.3.8 If $i \leftrightarrow j$ and both states are recurrent, then $f_{ij} = 1$.

Proof 13 (ST) Suppose that $f_{ij} < 1$. Since $i$ is accessible from $j$ there exists an $m$ such
that $p_{j,i}^{(m)} > 0$. Since $f_{ij} < 1$, there is a positive probability that if $i$ is
reached from $j$ the process will never return to $j$, i.e. there is a positive probability
of at least $p_{j,i}^{(m)}(1 - f_{ij})$ that the process, if it starts from $j$, will never return to $j$, i.e.
$f_{jj} < 1$. This is a contradiction since state $j$ was assumed recurrent. ∎

Theorem 3.3.9 Suppose that $\{X_t : t = 0, 1, 2, \dots\}$ is a time-homogeneous Markov
chain. Suppose that $C_1$ and $C_2$ are two equivalence classes of states of the
chain. Then if the states in $C_1$ are recurrent and if $i \in C_1$ and $j \in C_2$ then
$p_{ij}^{(n)} = 0$ for all $n$.

Proof 14 (ST, E) Suppose $p_{i,j}^{(n)} > 0$ for some value of $n$. Then by Theorem 3.3.2,
$p_{j,i}^{(m)} = 0$ for all values of $m$, i.e. once the process has reached $j$ it will never return to $i$. This
means there is a positive probability that the process will never return to $i$,
which contradicts the fact that $i$ is recurrent. ∎

Theorem 3.3.10 Suppose that state $j$ is transient. Then $\lim_{n \to \infty} p_{ij}^{(n)} = 0$ for all
states $i$.

Proof 15 (ST, E) The event that the process goes from state $i$ to state $j$ in $n$ transitions
is the union of the following mutually exclusive events:
1) the first visit to state $j$ is after 1 transition and in the following $n - 1$ transitions the
process goes from state $j$ to state $j$,
2) the first visit to state $j$ is after 2 transitions and in the following $n - 2$ transitions the
process goes from state $j$ to state $j$,
...
$n-1$) the first visit to state $j$ is after $n - 1$ transitions and in the following transition the
process goes from state $j$ to state $j$,
$n$) the first visit to state $j$ is after $n$ transitions.
Hence

$$p_{i,j}^{(n)} = \sum_{k=1}^{n-1} f_{ij}^{(k)} p_{jj}^{(n-k)} + f_{ij}^{(n)} = \sum_{k=0}^{n} f_{ij}^{(k)} p_{jj}^{(n-k)} \quad \text{where } f_{ij}^{(0)} = 0 \text{ and } p_{jj}^{(0)} = 1.$$

Then, summing over $n \ge 1$ and using Theorem 2.5.4,

$$\sum_{n=1}^{\infty} p_{i,j}^{(n)} = \Big(\sum_{k=1}^{\infty} f_{ij}^{(k)}\Big)\Big(\sum_{m=0}^{\infty} p_{jj}^{(m)}\Big) = f_{ij} \sum_{m=0}^{\infty} p_{jj}^{(m)} < \infty$$

since $j$ is transient (Theorem 3.3.6). Since $\sum_{n=0}^{\infty} p_{ij}^{(n)} < \infty$ it follows that $\lim_{n \to \infty} p_{ij}^{(n)} = 0$
(a result from analysis) for all states $i$. ∎

Theorem 3.3.11 Suppose that $\{X_n : n = 0, 1, 2, \dots\}$ is a time-homogeneous
Markov chain. If the state space of the Markov chain consists of a FINITE
number of states, not all states can be transient.

Proof 16 (ST, E) If the number of states is $s$ then for all $i$ and for all $n$, $\sum_{j=1}^{s} p_{ij}^{(n)} = 1$. If
all states are transient, then the limit of $p_{ij}^{(n)}$ as $n$ tends to infinity is zero (by Theorem
3.3.10) for all $j$, which is impossible if the sum remains 1. ∎

Definition 3.3.4 Suppose that state $i$ is recurrent, i.e. $f_{ii} = 1$. Let $\mu_{ii} = \sum_{n=0}^{\infty} n f_{ii}^{(n)}$,
i.e. $\mu_{ii}$ is the expected value of the number of transitions (time), given that
the process starts in $i$, to return to state $i$ for the first time.
State $i$ is called positive recurrent if $\mu_{ii} < \infty$ or null-recurrent if $\mu_{ii} = \infty$, i.e.
although the process returns to state $i$ with probability 1, on average it may
take an infinite amount of time for this to happen. A positive recurrent
aperiodic state of a Markov chain is called ergodic.

3.4 Finite Markov Chains with Recurrent and Transient States

Suppose that {Y_t : t = 0,1,2,3,...} is a time-homogeneous Markov chain with m states. Suppose that r of the states are recurrent and that m − r are transient. For convenience let states 1,2,...,r be the recurrent states and states r+1, r+2,...,m be the transient states. In this case the transition probability matrix is as follows:
\[ P = \begin{pmatrix} P_1 & O \\ R & Q \end{pmatrix} \tag{3.4.1} \]
where P_1 is an r × r matrix and contains all the p_ij's where both i and j are recurrent states, O is an r × (m − r) matrix with all elements equal to 0 since the probability to get from a recurrent to a transient state is 0, R is an (m − r) × r matrix that contains all the p_ij's where i is transient and j is recurrent, and Q is an (m − r) × (m − r) matrix where both i and j are transient states.
For P in (3.4.1) we get that
\[ P^2 = \begin{pmatrix} P_1 & O \\ R & Q \end{pmatrix}\begin{pmatrix} P_1 & O \\ R & Q \end{pmatrix} = \begin{pmatrix} P_1 P_1 + OR & P_1 O + OQ \\ RP_1 + QR & RO + QQ \end{pmatrix} = \begin{pmatrix} P_1^2 & O \\ R_2 & Q^2 \end{pmatrix}, \text{ say,} \]
and
\[ P^3 = \begin{pmatrix} P_1 & O \\ R & Q \end{pmatrix}\begin{pmatrix} P_1^2 & O \\ R_2 & Q^2 \end{pmatrix} = \begin{pmatrix} P_1^3 & O \\ R_3 & Q^3 \end{pmatrix}, \text{ say, and in general} \]
\[ P^n = \begin{pmatrix} P_1^n & O \\ R_n & Q^n \end{pmatrix} \tag{3.4.2} \]
i.e. the elements of Q^n are the probabilities that the process will go from a transient state to a transient state in n transitions. From theorem 3.3.10 it follows that the elements of Q^n tend to 0 as n tends to infinity.
Theorem 3.4.1 Suppose that {X_t : t = 0,1,2,3,...} is a time-homogeneous Markov chain with m states and transition probability matrix \(P = \begin{pmatrix} P_1 & O \\ R & Q \end{pmatrix}\) as given above. Then (I − Q)^{-1} exists and is given by
\[ (I - Q)^{-1} = I + Q + Q^2 + Q^3 + \dots. \]

17 (ST) Proof By direct multiplication we have that
\[ (I - Q)(I + Q + Q^2 + \dots + Q^{n-1}) = I + Q + Q^2 + \dots + Q^{n-1} - \big[Q + Q^2 + \dots + Q^n\big] = I - Q^n \quad\text{for } n = 2,3,4,\dots \tag{3.4.3} \]
Now
\[ \lim_{n\to\infty} |I - Q^n| = \Big|\lim_{n\to\infty}(I - Q^n)\Big| \ \text{(the determinant is a continuous function of its elements)} = |I| \ \text{(all the elements of } Q^n \text{ tend to } 0) = 1. \]
Hence |I − Q^n| ≠ 0 for n large enough. But then
\[ 0 \ne |I - Q^n| = \big|(I - Q)(I + Q + Q^2 + \dots + Q^{n-1})\big| = |I - Q|\,\big|I + Q + Q^2 + \dots + Q^{n-1}\big| \]
i.e. |I − Q| ≠ 0, which means that (I − Q)^{-1} exists. If we let n tend to infinity in (3.4.3) we get that
\[ (I - Q)(I + Q + Q^2 + Q^3 + \dots) = I \]
and therefore (I − Q)^{-1} = I + Q + Q² + Q³ + .... ■

Let T = {r+1, r+2, ..., m} be the set of all transient states and let T^C = {1,2,...,r} be the set of all recurrent states.

Suppose that i ∈ T and j ∈ T^C. Let g_ij^(n) be the probability that, given that the process begins in state i ∈ T, the process visits only transient states for the first n − 1 transitions and at the n-th transition goes to state j ∈ T^C. Remember that once the process visits a recurrent state it can never go to a transient state again. The number of transitions n is therefore the number of transitions until a recurrent state is visited for the first time.

Let \(g_{ij} = \sum_{n=1}^{\infty} g_{ij}^{(n)}\), i.e. g_ij is the probability, given that the process started in i ∈ T, that the first recurrent state visited at some time is j ∈ T^C.

Let \(G^{(n)} = [g_{ij}^{(n)}]\) and let \(G = [g_{ij}]\).
Theorem 3.4.2 Suppose that {X_t : t = 0,1,2,3,...} is a time-homogeneous Markov chain with m states and transition probability matrix \(P = \begin{pmatrix} P_1 & O \\ R & Q \end{pmatrix}\). Then G^(n) = Q^{n−1}R and G = (I − Q)^{-1}R.

18 (ST) Proof Firstly we have that g_ij^(1) = p_ij for i ∈ T and j ∈ T^C since these are the probabilities that in 1 step the process will go from state i to state j, i.e. G^(1) = R = IR = Q^0 R. For n > 1 we have that \(g_{ij}^{(n)} = \sum_{k\in T} p_{ik}\, g_{kj}^{(n-1)}\) since after 1 step the process must be in a transient state, say k, and then starting from that transient state k visit only transient states and after n − 1 steps visit a recurrent state, specifically state j, for the first time. Hence the (i, j)-th element of G^(n) is the i-th row of Q multiplied by the j-th column of G^(n−1), i.e. G^(n) = QG^(n−1). Applying this result repeatedly we get that
\[ G^{(n)} = QG^{(n-1)} = QQG^{(n-2)} = \dots = Q^{n-1}G^{(1)} = Q^{n-1}R. \]
Also
\[ G = \sum_{n=1}^{\infty} G^{(n)} = R + QR + Q^2R + Q^3R + \dots = (I + Q + Q^2 + Q^3 + \dots)R = (I - Q)^{-1}R \]
from theorem 3.4.1. ■

Example 3.4.1 (continuation of example 3.3.1)

Suppose that the total money A and B gamble with is R5 and that A wins every bet with probability p = 0.6. Let Y_t be the amount of money that A has available at time t. The probability transition matrix in the form of (3.4.1) is
\[ P = \begin{pmatrix} P_1 & O \\ R & Q \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0.4 & 0 & 0 & 0.6 & 0 & 0 \\ 0 & 0 & 0.4 & 0 & 0.6 & 0 \\ 0 & 0 & 0 & 0.4 & 0 & 0.6 \\ 0 & 0.6 & 0 & 0 & 0.4 & 0 \end{pmatrix} \]
where row 1 and column 1 correspond to state 0, row 2 and column 2 correspond to state 5, row 3 and column 3 correspond to state 1, row 4 and column 4 to state 2, ..., and row 6 and column 6 to state 4. From theorem 3.4.2 it follows that
\[ G^{(1)} = R = \begin{pmatrix} 0.4 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0.6 \end{pmatrix}. \]
The number in row i and column 1 is the probability that A will have lost all his money after 1 step if he started with R i. The number in row i and column 2 is the probability that A will have won all R5 after 1 game if he started with R i. Also

 0 0.6 0 0  0.4 0   0 0 
0.4 0 0.6 0   0  
0  0.16 0 
G ( 2 ) = QR =   = .
 0 0.4 0 0.6  0 0  0 0.36
    
0 0 0.4 0   0 0.6  0 0 
If A started with R2 then the probability that he looses everything after exactly 2 bets is
0.16 = 0.4 × 0.4 and the probability that A will have R5 after two bets if he started with
R3 is 0.36 = 0.6 × 0.6 .
Next we obtain
\[ G^{(3)} = QG^{(2)} = \begin{pmatrix} 0 & 0.6 & 0 & 0 \\ 0.4 & 0 & 0.6 & 0 \\ 0 & 0.4 & 0 & 0.6 \\ 0 & 0 & 0.4 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 0.16 & 0 \\ 0 & 0.36 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0.096 & 0 \\ 0 & 0.216 \\ 0.064 & 0 \\ 0 & 0.144 \end{pmatrix}. \]
The probability is 0.096 that A will have lost all his money after exactly 3 bets if he started with R1. Similarly the probability is 0.216 that A will have all R5 after exactly 3 bets if he started with R2. We also get that
\[ G = (I - Q)^{-1}R = \begin{pmatrix} 1 & -0.6 & 0 & 0 \\ -0.4 & 1 & -0.6 & 0 \\ 0 & -0.4 & 1 & -0.6 \\ 0 & 0 & -0.4 & 1 \end{pmatrix}^{-1} \begin{pmatrix} 0.4 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0.6 \end{pmatrix} = \begin{pmatrix} 1.54 & 1.35 & 1.07 & 0.64 \\ 0.90 & 2.25 & 1.78 & 1.07 \\ 0.47 & 1.18 & 2.25 & 1.35 \\ 0.19 & 0.47 & 0.90 & 1.54 \end{pmatrix} \begin{pmatrix} 0.4 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0.6 \end{pmatrix} = \begin{pmatrix} 0.62 & 0.38 \\ 0.36 & 0.64 \\ 0.19 & 0.81 \\ 0.08 & 0.92 \end{pmatrix}. \]
From this we see that if A starts with R1 (and B with R4) then the probability that A will lose all his money at some time is 0.62 and the probability is 0.38 that at some time he will win all R5. If A starts with R2 (and B with R3) then the probabilities of losing everything or winning everything eventually are 0.36 and 0.64 respectively.
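
As a quick numerical check (a sketch, not part of the original notes), the matrices above can be reproduced with NumPy; the state ordering 0, 5, 1, 2, 3, 4 follows the canonical form used in this example.

```python
import numpy as np

# Transient block Q (states 1,2,3,4) and transient-to-recurrent block R
# (recurrent states 0 and 5) for the gambler's ruin chain with p = 0.6.
Q = np.array([[0.0, 0.6, 0.0, 0.0],
              [0.4, 0.0, 0.6, 0.0],
              [0.0, 0.4, 0.0, 0.6],
              [0.0, 0.0, 0.4, 0.0]])
R = np.array([[0.4, 0.0],
              [0.0, 0.0],
              [0.0, 0.0],
              [0.0, 0.6]])

G2 = Q @ R                             # G^(2): compare with 0.16 and 0.36 above
G = np.linalg.solve(np.eye(4) - Q, R)  # G = (I - Q)^{-1} R
print(np.round(G2, 2))
print(np.round(G, 2))                  # rows correspond to starting fortunes R1..R4
```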

Although theorem 3.4.2 is a very useful general method of calculating the probabilities and is also important from a theoretical point of view, there are certain cases where it is possible to get more explicit formulae for the probabilities, which also lead to further insights.

Example 3.4.2 (continuation of example 3.2.2)


Suppose that A and B between them have R N available to gamble and that for every bet A has probability p of winning the bet. As in example 3.4.1 the matrix P of transition probabilities can be given as follows (rows and columns ordered as 0, N, 1, 2, ..., N − 1):
\[ P = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\ q & 0 & 0 & p & 0 & \cdots & 0 & 0 & 0 \\ 0 & 0 & q & 0 & p & \cdots & 0 & 0 & 0 \\ \vdots & & & & & \ddots & & & \vdots \\ 0 & 0 & 0 & 0 & 0 & \cdots & q & 0 & p \\ 0 & p & 0 & 0 & 0 & \cdots & 0 & q & 0 \end{pmatrix}. \]
Let π_i be the probability, given A started with R i, that A will eventually win all R N, for i = 1,2,...,N−1. These probabilities are given in the second column of G = (I − Q)^{-1}R (theorem 3.4.2). To obtain the second column of the product of two matrices we must multiply the first matrix with the second column of the second matrix. Let π' = [π_1, π_2, ..., π_{N−1}] and let r_2 be the second column of R. Then π = (I − Q)^{-1} r_2, or (I − Q)π = r_2, i.e.
\[ \begin{pmatrix} 1 & -p & 0 & \cdots & 0 & 0 \\ -q & 1 & -p & \cdots & 0 & 0 \\ 0 & -q & 1 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 & -p \\ 0 & 0 & 0 & \cdots & -q & 1 \end{pmatrix} \begin{pmatrix} \pi_1 \\ \pi_2 \\ \pi_3 \\ \vdots \\ \pi_{N-2} \\ \pi_{N-1} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ p \end{pmatrix} \]
which will be the case if
\[ \begin{aligned} \pi_1 - p\pi_2 &= 0 &&\text{i.e. } (p+q)\pi_1 - p\pi_2 = 0 &&\text{i.e. } \pi_2 - \pi_1 = (q/p)\pi_1 \\ -q\pi_1 + \pi_2 - p\pi_3 &= 0 &&\text{i.e. } -q\pi_1 + (p+q)\pi_2 - p\pi_3 = 0 &&\text{i.e. } \pi_3 - \pi_2 = (q/p)(\pi_2 - \pi_1) \\ &\ \ \vdots \\ -q\pi_{i-1} + \pi_i - p\pi_{i+1} &= 0 &&\text{i.e. } -q\pi_{i-1} + (p+q)\pi_i - p\pi_{i+1} = 0 &&\text{i.e. } \pi_{i+1} - \pi_i = (q/p)(\pi_i - \pi_{i-1}) \\ &\ \ \vdots \\ -q\pi_{N-2} + \pi_{N-1} &= p &&\text{i.e. } -q\pi_{N-2} + (p+q)\pi_{N-1} = p &&\text{i.e. } 1 - \pi_{N-1} = (q/p)(\pi_{N-1} - \pi_{N-2}). \end{aligned} \tag{3.4.4} \]
Hence \(\pi_{i+1} - \pi_i = (q/p)^i \pi_1\) for i = 1,2,3,...,N−1. But
\[ \pi_i = \pi_1 + (\pi_2 - \pi_1) + (\pi_3 - \pi_2) + \dots + (\pi_i - \pi_{i-1}) = \pi_1\big[1 + (q/p) + (q/p)^2 + \dots + (q/p)^{i-1}\big] = \begin{cases} \pi_1\dfrac{1 - (q/p)^i}{1 - (q/p)} & \text{if } p \ne q \\[1ex] i\pi_1 & \text{if } p = q. \end{cases} \tag{3.4.5} \]
From (3.4.4) it follows that
\[ 1 = \pi_1 + (\pi_2 - \pi_1) + \dots + (\pi_{N-1} - \pi_{N-2}) + (1 - \pi_{N-1}) = \pi_1\big[1 + (q/p) + \dots + (q/p)^{N-1}\big] = \begin{cases} \pi_1\dfrac{1 - (q/p)^N}{1 - (q/p)} & \text{if } p \ne 1/2 \\[1ex] N\pi_1 & \text{if } p = 1/2. \end{cases} \]
Therefore
\[ \pi_1 = \begin{cases} \dfrac{1 - q/p}{1 - (q/p)^N} & \text{if } p \ne 1/2 \\[1ex] \dfrac{1}{N} & \text{if } p = 1/2. \end{cases} \tag{3.4.6} \]
From (3.4.5) and (3.4.6) it follows that
\[ \pi_i = \begin{cases} \dfrac{1 - (q/p)^i}{1 - (q/p)^N} & \text{if } p \ne 1/2 \\[1ex] \dfrac{i}{N} & \text{if } p = 1/2. \end{cases} \tag{3.4.7} \]
For p = 1/2 the probability that eventually A will win all the money available is equal to the proportion of the total capital available at the beginning of the game. If we let the total capital N tend to infinity we get that
\[ \lim_{N\to\infty} \pi_i = \begin{cases} 1 - (q/p)^i & \text{if } p > 1/2 \\ 0 & \text{if } p \le 1/2. \end{cases} \]
If A plays against an adversary with an infinite amount of capital then A will eventually lose all his money with probability 1 if p ≤ 1/2. If p > 1/2 then A does have a chance to win 'all' the money, and this probability is bigger if he has more money available at the beginning of the game.
It is possible to obtain the results of (3.4.7) in an alternative way. If π_i is the probability that if player A starts with R i he will eventually win all the money, then by conditioning on the outcome of the first bet, we get that
\[ \pi_i = p\pi_{i+1} + q\pi_{i-1}. \tag{3.4.8} \]
We get this equation by arguing as follows: if A wins the first bet (which has probability p) he will have R(i + 1) and then the probability that he will eventually win all the money is π_{i+1}, by using the Markov property. Similarly if A loses the first bet (with probability q) he will have R(i − 1) and the probability of then eventually winning all the money is π_{i−1}.
Equation (3.4.8) is a homogeneous difference equation of order 2. The general solution of such an equation has two unknown constants in it. To determine these constants we make use of the two boundary conditions π_0 = 0 (if A starts with R0 the probability of eventually winning all the money is 0) and π_N = 1 (if A starts with R N the probability of eventually winning all the money is 1). It is easy to check that (3.4.7) is a solution of (3.4.8) satisfying these two conditions.
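
As a sketch (not part of the notes), formula (3.4.7) can be checked numerically against the matrix computation of example 3.4.1 for N = 5 and p = 0.6:

```python
def win_prob(i, N, p):
    """Probability (3.4.7) that A, starting with R i, eventually wins all R N."""
    if p == 0.5:
        return i / N
    q = 1 - p
    return (1 - (q / p) ** i) / (1 - (q / p) ** N)

# Compare with the second column of G in example 3.4.1: 0.38, 0.64, 0.81, 0.92.
print([round(win_prob(i, 5, 0.6), 2) for i in range(1, 5)])
```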

Suppose that i ∈ T and j ∈ T. Let h_ij^(n) be the probability, given that the process started in state i, that state j will be visited n times before the process visits some recurrent state. Let \(m_{ij} = \sum_{n=1}^{\infty} n\,h_{ij}^{(n)}\), i.e. m_ij is the expected number of visits, given that the process starts in state i, to a transient state j before the process visits a recurrent state for the first time. Let T_ij be the random variable such that P[T_ij = n] = h_ij^(n), i.e. m_ij = E[T_ij].

Theorem 3.4.3 Suppose that {X_t : t = 0,1,2,3,...} is a time-homogeneous Markov chain with m states and transition probability matrix \(P = \begin{pmatrix} P_1 & O \\ R & Q \end{pmatrix}\). Let M = [m_ij]. Then M = (I − Q)^{-1}.
19 (CT) Proof Suppose that the process starts in state i ∈ T. Let K be the random variable determined by the state visited after the first transition. Then
\[ m_{ij} = E[T_{ij}] = E_K\big[E[T_{ij} \mid K]\big]. \]
For k ∈ T^C we see that
\[ T_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j. \end{cases} \]
In this case the process visits a recurrent state after 1 transition and then there will be no visits to the transient state j if j ≠ i. If j = i then there will be only the one visit to the transient state in which the process started; we regard the process as being in state i until the first transition takes place.
Thus E[T_ij | K = k] = δ_ij, where \(\delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j. \end{cases}\)
For k ∈ T we have that
\[ T_{ij} = \begin{cases} 1 + T'_{kj} & \text{if } j = i \\ 0 + T'_{kj} & \text{if } j \ne i \end{cases} \]
where T'_{kj} is the number of visits to state j after the first transition given that the process after the first transition starts in state k. If j = i then there is the visit to state i with which the process starts plus the number of visits to state j given that after the first transition the process starts in state k. Since the process is a Markov process, the probabilities for T'_{kj} are exactly the same as the probabilities for T_{kj}, i.e. E[T'_{kj}] = E[T_{kj}] = m_{kj}. Hence
\[ E[T_{ij} \mid K = k] = \begin{cases} 1 + m_{kj} & \text{if } i = j \\ 0 + m_{kj} & \text{if } i \ne j \end{cases} = \delta_{ij} + m_{kj}. \]
Therefore
\[ m_{ij} = E_K\big[E[T_{ij} \mid K]\big] = \sum_{k\in T^C} \delta_{ij}\, p_{ik} + \sum_{k\in T} \{\delta_{ij} + m_{kj}\}\, p_{ik} = \delta_{ij}\sum_{\forall k} p_{ik} + \sum_{k\in T} p_{ik} m_{kj} = \delta_{ij} + \sum_{k\in T} p_{ik} m_{kj} \quad\text{since } \sum_{\forall k} p_{ik} = 1. \]
Thus
\[ M = [m_{ij}] = \Big[\delta_{ij} + \sum_{k\in T} p_{ik} m_{kj}\Big] = [\delta_{ij}] + \Big[\sum_{k\in T} p_{ik} m_{kj}\Big] = I + QM \]
since \(\sum_{k\in T} p_{ik} m_{kj}\) is the i-th row of Q multiplied by the j-th column of M. It follows that (I − Q)M = I, i.e. M = (I − Q)^{-1}, since the inverse exists by the previous theorem. ■

Let \(T_i = \sum_{j\in T} T_{ij}\), i.e. T_i is the total number of visits to transient states given that the process started in state i. This means that T_i is the total time (number of transitions) spent in transient states. Thus \(E[T_i] = \sum_{j\in T} m_{ij}\), which is the sum of the elements in the i-th row of M.

Example 3.4.3 (continuation of example 3.4.1)


Suppose that the total amount available to A and B is R5 and that A wins every bet with probability p = 0.6. Then
\[ M = (I - Q)^{-1} = \begin{pmatrix} 1.54 & 1.35 & 1.07 & 0.64 \\ 0.90 & 2.25 & 1.78 & 1.07 \\ 0.47 & 1.18 & 2.25 & 1.35 \\ 0.19 & 0.47 & 0.90 & 1.54 \end{pmatrix}. \]
Note that after the first visit to one of the recurrent states 0 or 5 the game stops for all practical purposes. If A started with R2 there will be an average number of 1.78 times that A will have R3 before A has lost everything or has won everything. The vector with the sums of the elements in the rows of M is
\[ \begin{pmatrix} 4.60 \\ 6.00 \\ 5.25 \\ 3.10 \end{pmatrix}. \]
If A starts with R1 the average duration of the game will be 4.60 bets, and if he starts with R2 the average duration of the game will be 6.00, etc. The variances of the T_ij's can be determined in a similar way.
The fact that M = (I − Q)^{-1} is always true and can always be used to calculate the expected values. It is possible to get more explicit formulae in certain cases. ♪
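
As a short numerical sketch (not part of the notes), M and its row sums can be computed directly from the Q block of example 3.4.1:

```python
import numpy as np

# Q for the transient states 1,2,3,4 of the gambler's ruin chain with p = 0.6.
Q = np.array([[0.0, 0.6, 0.0, 0.0],
              [0.4, 0.0, 0.6, 0.0],
              [0.0, 0.4, 0.0, 0.6],
              [0.0, 0.0, 0.4, 0.0]])

M = np.linalg.inv(np.eye(4) - Q)   # expected visits to each transient state
print(np.round(M, 2))
print(np.round(M.sum(axis=1), 2))  # expected durations: 4.60, 6.00, 5.25, 3.10
```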

Example 3.4.4 (continuation of examples 3.2.2 and 3.4.3)


Let X_i be the number of bets until A either loses all his money or wins all the money if in state i. Let m_i = E[X_i] (obviously m_0 = m_N = 0). Let Z be the random variable determined by the change in A's fortune after one bet, i.e.
\[ Z = \begin{cases} -1 & \text{if A loses the first bet} \\ +1 & \text{if A wins the first bet.} \end{cases} \]
Let \(\tilde{X}_i\) be the number of bets after the first bet if A's fortune after one bet is i. By the Markov property the probabilities after 1 bet are the same as the probabilities before the first bet provided the process starts in the same state, i.e. \(E[\tilde{X}_i] = E[X_i] = m_i\).
Note that \(X_i = 1 + \tilde{X}_{i+Z}\).
Furthermore
\[ E[X_i] = E_Z\big[E[X_i \mid Z]\big]. \tag{3.4.9} \]
But
\[ E[X_i \mid Z = -1] = E[1 + \tilde{X}_{i-1} \mid Z = -1] = E[1 + \tilde{X}_{i-1}] = 1 + m_{i-1} \]
since what happens after the first bet is independent of what happened in the first bet. Similarly
\[ E[X_i \mid Z = 1] = E[1 + \tilde{X}_{i+1} \mid Z = 1] = E[1 + \tilde{X}_{i+1}] = 1 + m_{i+1}. \]
From (3.4.9) it then follows that
\[ m_i = q(1 + m_{i-1}) + p(1 + m_{i+1}) \quad\text{or}\quad pm_{i+1} - m_i + qm_{i-1} = -1. \tag{3.4.10} \]
This is an inhomogeneous difference equation of the second order and the solution required must satisfy the boundary conditions m_0 = m_N = 0.
It is easy enough to check that the solution in this case is given by
\[ m_i = \begin{cases} \dfrac{1}{2p-1}\left(\dfrac{N\big(1 - (q/p)^i\big)}{1 - (q/p)^N} - i\right) & \text{for } p \ne q \\[2ex] i(N - i) & \text{for } p = q. \end{cases} \tag{3.4.11} \]
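
As a sketch (not part of the notes), (3.4.11) can be checked against the row sums of M in example 3.4.3 (4.60, 6.00, 5.25, 3.10) for N = 5 and p = 0.6:

```python
def mean_duration(i, N, p):
    """Expected number of bets (3.4.11) until A is ruined or wins all R N."""
    if p == 0.5:
        return i * (N - i)
    q = 1 - p
    return (N * (1 - (q / p) ** i) / (1 - (q / p) ** N) - i) / (2 * p - 1)

print([round(mean_duration(i, 5, 0.6), 2) for i in range(1, 5)])
```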

3.5 Irreducible Finite Markov Chains and Limit Theorems

First Passage Times

Suppose that the process starts in state i . The time (i.e. number of steps)
taken by the process to go from state i to state j for the first time is called
the first passage time of the transition i → j . When j = i we call the number
of steps required for such a transition the recurrence time of state i .
Let F_ij be the first passage time of the transition i → j and let {f_ij^(n)} be its distribution, i.e.
\[ f_{ij}^{(n)} = P[X_n = j,\ X_r \ne j \text{ for } r = 1,2,\dots,n-1 \mid X_0 = i] = P[F_{ij} = n]. \]

We are considering only the case of a single recurrent equivalence class (since the process is irreducible). Note that all the states are recurrent since the process is finite, so we cannot transform P into the form of equation (3.4.1). To determine f_ij^(n) we therefore modify the transition probability matrix P so that state j becomes absorbing, i.e. we set p_jj = 1 and p_ji = 0 for all i ≠ j and leave all other transition probabilities unchanged. In the new chain state j is recurrent. All other states of the chain then become transient, because there is a positive probability of going from any state i ≠ j to j and then never returning to state i. For the original process the probability of starting in state i and then visiting state j for the first time after n steps is the same as the probability, in the changed process, of starting in the transient state i and then visiting a recurrent state, specifically state j, for the first time after n steps. Let P* be the changed probability matrix in canonical form, with state j listed first:
\[ P^* = \begin{pmatrix} 1 & O \\ R & Q \end{pmatrix} \]
where state j is the only recurrent state in P*, the states 1,2,...,m with j excluded are the m − 1 transient states in P*, R is the j-th column of P with the j-th element excluded, and Q is the matrix P with row j and column j excluded. The distribution of the first passage times from states i = 1,2,...,j−1, j+1,...,m to state j is then given by theorem 3.4.2 as
\[ F^{(n)} = Q^{n-1}R. \]
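
As an illustrative sketch (the 3-state chain used as input here is hypothetical, not from the notes), the construction above is easy to automate:

```python
import numpy as np

def first_passage_dist(P, j, n_max):
    """Distributions f_ij^(n), n = 1..n_max, via the absorbing modification:
    delete row/column j of P to get Q, take column j (without row j) as R,
    and use F^(n) = Q^(n-1) R from theorem 3.4.2."""
    keep = [k for k in range(P.shape[0]) if k != j]
    Q = P[np.ix_(keep, keep)]
    R = P[keep, j]
    F, Qpow = [], np.eye(len(keep))
    for _ in range(n_max):
        F.append(Qpow @ R)   # vector of f_ij^(n) for all i != j
        Qpow = Qpow @ Q
    return np.array(F)

P = np.array([[0.2, 0.8, 0.0],
              [0.2, 0.0, 0.8],
              [0.0, 0.2, 0.8]])
print(first_passage_dist(P, 0, 3))   # first passage into state 0
```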

Limit Theorems for Irreducible Finite Markov Chains with Ergodic States

Theorem 3.5.1 Let P : m × m be a stochastic matrix (i.e. all elements are non-negative and the sum of each row is equal to 1). Let ε ≥ 0 be the smallest element of P. Let X : m × 1 be any vector with minimum component a_0 and maximum component b_0. Let a_1 and b_1 be the minimum and maximum components respectively of PX. Then
\[ a_0 \le a_1, \quad b_1 \le b_0 \quad\text{and}\quad b_1 - a_1 \le (1 - 2\varepsilon)(b_0 - a_0). \]

20 (CT) Proof Let X* be the column vector obtained from X by replacing all components of X, except the minimum component, by the maximum component b_0. If the minimum component a_0 occurs in the k-th position, then
\[ X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{pmatrix} \quad\text{and}\quad X^* = \begin{pmatrix} x_1^* \\ x_2^* \\ \vdots \\ x_m^* \end{pmatrix} = \begin{pmatrix} b_0 \\ \vdots \\ b_0 \\ a_0 \\ b_0 \\ \vdots \\ b_0 \end{pmatrix} \quad (a_0 \text{ in the } k\text{-th position}). \]
Therefore X ≤ X*, i.e. every component of X is less than or equal to the corresponding component of X*. Then
\[ \sum_{j=1}^{m} p_{ij} x_j \le \sum_{j=1}^{m} p_{ij} x_j^* \quad\text{since all } p_{ij} \ge 0 \text{ and } x_j \le x_j^*\ \forall j, \]
i.e. PX ≤ PX*. Given that a_0 occurs in the k-th row of X* we have that
\[ \sum_{j=1}^{m} p_{ij} x_j \le \sum_{j=1}^{m} p_{ij} x_j^* = \sum_{\substack{j=1 \\ j\ne k}}^{m} p_{ij} b_0 + p_{ik} a_0 = (1 - p_{ik})b_0 + p_{ik} a_0 \quad\text{since } \sum_{j=1}^{m} p_{ij} = 1 \]
\[ = b_0 - p_{ik}(b_0 - a_0) \le b_0 - \varepsilon(b_0 - a_0) \quad\text{since all } p_{ij} \ge \varepsilon \text{ and } b_0 - a_0 \ge 0, \]
and since this is true for all i, i.e. for all elements of PX, we have that
\[ b_1 \le b_0 - \varepsilon(b_0 - a_0). \tag{3.5.1} \]
Now apply the above result to −X. Note that the minimum element of −X is −b_0 and the maximum element is −a_0. Similarly the maximum element of P(−X) = −PX is −a_1. Hence
\[ -a_1 \le (-a_0) - \varepsilon\big[(-a_0) - (-b_0)\big] \quad\text{or}\quad -a_1 \le -a_0 - \varepsilon(b_0 - a_0). \tag{3.5.2} \]
Adding (3.5.1) and (3.5.2) we get that
\[ b_1 - a_1 \le b_0 - a_0 - 2\varepsilon(b_0 - a_0) = (1 - 2\varepsilon)(b_0 - a_0). \]
From (3.5.1) it follows that b_1 ≤ b_0 and from (3.5.2) it follows that a_0 ≤ a_1. ■

NOTE: In case all elements of P are strictly positive it follows that ε > 0. In this case 1 − 2ε < 1 and then b_1 − a_1 < b_0 − a_0.

Theorem 3.5.2 Let P be the transition probability matrix of an aperiodic, irreducible, m-state finite Markov chain. Then
\[ \lim_{n\to\infty} P^n = \Pi = \begin{pmatrix} \pi' \\ \pi' \\ \vdots \\ \pi' \end{pmatrix} \quad\text{where } \pi' = (\pi_1, \pi_2, \dots, \pi_m) \text{ with } 0 < \pi_j < 1 \text{ and } \sum_{j=1}^{m} \pi_j = 1. \]
21 (ST, E) Proof 1. We will first prove the theorem if P has no zero elements.
Let ε be the smallest element of P. Let e_j be an m-component column vector with 1 in the j-th position and zero elsewhere. Let a_0 and b_0 be the minimum and maximum elements of e_j, i.e. a_0 = 0 and b_0 = 1. Let a_1 and b_1 be the minimum and maximum elements of Pe_j. Then by theorem 3.5.1, a_0 ≤ a_1 and b_1 ≤ b_0. Note further that Pe_j is the j-th column of P, i.e. the minimum element of Pe_j is strictly positive and the maximum element is strictly less than 1 since P has no zero elements and the sum of each row is 1, i.e. 0 < a_1 and b_1 < 1.
Now let a_n and b_n be the minimum and maximum elements of P^n e_j. Note that P^n e_j = P P^{n−1} e_j, i.e. a_{n−1} ≤ a_n and b_n ≤ b_{n−1} by theorem 3.5.1, and therefore
\[ 0 < a_1 \le a_2 \le \dots \le a_n \quad\text{and}\quad b_n \le b_{n-1} \le \dots \le b_1 < 1. \]
Let d_n = b_n − a_n. Then by theorem 3.5.1
\[ d_1 \le (1 - 2\varepsilon)(b_0 - a_0) = (1 - 2\varepsilon), \quad d_2 \le (1 - 2\varepsilon)d_1 \le (1 - 2\varepsilon)^2, \quad \dots, \quad d_n \le (1 - 2\varepsilon)d_{n-1} \le (1 - 2\varepsilon)^n. \]
Since 1 − 2ε < 1 we see that \(\lim_{n\to\infty} d_n \le \lim_{n\to\infty} (1 - 2\varepsilon)^n = 0\), so \(\lim_{n\to\infty} d_n = 0\), i.e. \(\lim_{n\to\infty} a_n = \lim_{n\to\infty} b_n\).
This means that the minimum and maximum of the elements of P^n e_j tend to the same limit, i.e. all elements of P^n e_j tend to the same limit. But P^n e_j is the j-th column of P^n, i.e. all elements of the j-th column of P^n tend to the same limit, say π_j.
Recall that 0 < a_1 ≤ a_n ≤ b_n ≤ b_1 < 1, so that \(0 < \lim_{n\to\infty} a_n = \pi_j < 1\).
Note also that \(1 = \sum_{j=1}^{m} p_{ij}^{(n)}\) for all n and for all i, i.e. \(1 = \sum_{j=1}^{m} \lim_{n\to\infty} p_{ij}^{(n)} = \sum_{j=1}^{m} \pi_j\).

2. Now suppose that not all elements of P are non-zero. (NOT REQUIRED)
From theorem 3.3.5 it follows that there exists an N such that for all n ≥ N the p_ij^(n) are all non-zero. Using P^N instead of P as above, we get
\[ d_{kN} \le (1 - 2\varepsilon_N)^k \]
where ε_N is the smallest element of P^N.
Note that theorem 3.5.1 is true even if ε is zero, i.e. if a_0 and b_0 are the minimum and maximum elements of X, and if a_1 and b_1 are the minimum and maximum elements of PX, then (b_1 − a_1) ≤ (b_0 − a_0). Hence if d_{kN} is the difference between the maximum and minimum elements of P^{kN} e_j and d_{kN+l} is the difference between the maximum and minimum elements of P^{kN+l} e_j, then since P^{kN+l} e_j = P^l P^{kN} e_j, we have that
\[ d_{kN+l} \le d_{kN} \le (1 - 2\varepsilon_N)^k \quad\text{for } l = 1,2,3,\dots,N-1. \]
Therefore we again obtain that \(\lim_{n\to\infty} d_n = 0\), i.e. \(\lim_{n\to\infty} a_n = \lim_{n\to\infty} b_n = \pi_j\), say. ■

Definition 3.5.1 Suppose that p' = (p_1, p_2, ..., p_m) is a row vector of probabilities such that \(\sum_{j=1}^{m} p_j = 1\). Then the probability distribution {p_j} is said to be a stationary distribution for an m-state Markov chain with transition probability matrix P if p' = p'P.

NOTE: If p is a stationary distribution, then
\[ p' = p'P = p'PP = p'P^2 = \dots = p'PP^{n-1} = p'P^n. \]
NOTE: Suppose that p is a stationary distribution for a Markov chain {X_t, t = 0,1,2,...}. If for any t, P[X_t = i] = p_i ∀ i, then \(P[X_{t+1} = i] = \sum_{j=1}^{m} p_j p_{ji}\), which is the i-th element of p' = p'P, i.e. P[X_{t+1} = i] = p_i ∀ i. This means that if {p_i} is the distribution of X_t it is also the distribution of X_{t+1} and therefore also of X_{t+2}, X_{t+3}, ....

Theorem 3.5.3 Let P be the transition matrix of a finite, aperiodic and irreducible Markov chain. Then there exists a unique probability vector π' = (π_1, π_2, ..., π_m) such that π'1 = 1 and
\[ \Pi P = \Pi \quad\text{and}\quad P\Pi = \Pi \]
where Π is a matrix with m identical rows each equal to π' and where 1 is a column vector consisting only of ones.
The probability vector π gives the stationary distribution of the process.

22 (ST, E) Proof Let \(\Pi = \lim_{n\to\infty} P^n\) as obtained in theorem 3.5.2. This implies that
\[ \Pi = \lim_{n\to\infty} P^{n+1} = \lim_{n\to\infty} P^n P = \Pi P, \quad\text{i.e. } \Pi P = \Pi. \]
Similarly \(\Pi = \lim_{n\to\infty} PP^n = P\Pi\), i.e. PΠ = Π.
We know that π'1 = 1 by theorem 3.5.2 as well.
Since ΠP = Π we have π'P = π' (Π is a matrix with m identical rows each equal to π'), thus π gives the stationary distribution.
To show uniqueness, let v be a probability vector with v'1 = 1 satisfying the relations VP = V and PV = V, where
\[ V = \begin{pmatrix} v' \\ v' \\ \vdots \\ v' \end{pmatrix}. \]
Note that
\[ \lim_{n\to\infty} v'P^n = v'\Pi = v_1\pi' + \dots + v_m\pi' = \pi' \quad\text{since } \sum_{j=1}^{m} v_j = 1. \]
But since VP = V, i.e. v' times P gives the first row of V, namely v', we have that
\[ v'P = v', \quad v'P^2 = v'P = v', \quad v'P^3 = v'P^2 = v', \quad \dots, \quad v'P^n = v'. \tag{3.5.3} \]
Therefore \(\lim_{n\to\infty} v'P^n = v'\). Hence v' = π'. ■

NOTE: From the last part of the proof above, i.e. (3.5.3), it follows that π'P^n = π' for all n and in particular π'P = π'. These equations, together with the equation π'1 = 1, provide a set of equations we can solve to determine the stationary distribution IF it exists, which according to theorem 3.5.2 will be the case if the Markov chain is aperiodic, irreducible and finite.

Examples a) If \(P = \begin{pmatrix} 0.3 & 0.7 \\ 0.1 & 0.9 \end{pmatrix}\), find the limiting distribution.
b) If \(P = \begin{pmatrix} 0.2 & 0.3 & 0.5 \\ 0.1 & 0.5 & 0.4 \\ 0 & 0 & 1 \end{pmatrix}\), find the limiting distribution.
c) If \(P = \begin{pmatrix} 0.3 & 0.7 & 0 \\ 0.4 & 0.6 & 0 \\ 1 & 0 & 0 \end{pmatrix}\), find the limiting distribution. (A numerical sketch for a) is given below.)

NOTE: Consider the problem of obtaining \(\lim_{n\to\infty} P^n\) when the Markov chain has more than one equivalence class. The following is pertinent:
1) If state j is absorbing, then \(\lim_{n\to\infty} p_{jj}^{(n)} = 1\) and \(\lim_{n\to\infty} p_{ji}^{(n)} = 0\) for i ≠ j.
2) If state j is transient, regardless of the initial state i, \(\lim_{n\to\infty} p_{ij}^{(n)} = 0\).
3) Once a Markov chain enters a state belonging to a recurrent class, it stays in that class permanently.
4) Having started from a transient state i, the probability that the Markov chain enters the recurrent state j when it eventually leaves the transient class is given by g_ij of theorem 3.4.2. Let state j belong to a recurrent class C_l and let \(\sum_{k\in C_l} g_{ik} = g_i(C_l)\). Since once in the recurrent class the transitions are governed by the transition probabilities within that class, we get that
\[ \pi_{ij} = \lim_{n\to\infty} p_{ij}^{(n)} = \lim_{n\to\infty} \sum_{k\in C_l} \sum_{r=1}^{n} g_{ik}^{(r)} p_{kj}^{(n-r)} = \sum_{k\in C_l}\Big(\sum_{r=1}^{\infty} g_{ik}^{(r)}\Big)\pi_j = g_i(C_l)\pi_j, \quad i \in T,\ j \in C_l, \]
where π_j is the limiting probability of state j within the recurrent class C_l.

0.3 0.7 0 0 0 0
0.5 0.5 0
 0 0 0 
0.3 0.2 0 0 0 0.5
Example Find the limiting distribution if P =  .
0 0 0.8 0 0.2 0 
0 0 0 0.6 0 0.4
 
 0 0 0 0 0 1 

Theorem 3.5.4 Let π' = (π_1, π_2, ..., π_m) be the limiting distribution of an aperiodic, irreducible m-state Markov chain with transition probability matrix P. Let F_ij be the first passage time of the transition i → j, and let µ_ij be its expected value. Then
\[ \mu_{ii} = \frac{1}{\pi_i}. \]
23 (ST) Proof Let K be a random variable determined by the state after the first step. Then
\[ F_{ij} = \begin{cases} 1 & \text{if the first step is to } K = k = j \\ 1 + F_{kj}^* & \text{if the first step is to } K = k \ne j \end{cases} \]
where F*_{kj} is the transition time from state k to state j if the first step is not to j. Therefore
\[ E[F_{ij} \mid K = k] = \begin{cases} 1 & \text{if the first step is to } k = j \\ 1 + \mu_{kj} & \text{if the first step is to } k \ne j \end{cases} \]
because of the Markov property (the probabilities for the random variables F_{kj} and F*_{kj} are the same and thus their expected values are the same).
Therefore
\[ \mu_{ij} = E_K\big[E[F_{ij} \mid K]\big] = 1 \times p_{ij} + \sum_{\substack{k=1 \\ k\ne j}}^{m} (1 + \mu_{kj}) p_{ik} = 1 + \sum_{\substack{k=1 \\ k\ne j}}^{m} \mu_{kj} p_{ik} \quad\text{since } \sum_{k=1}^{m} p_{ik} = 1 \]
\[ = 1 + \sum_{k=1}^{m} p_{ik}\mu_{kj} - p_{ij}\mu_{jj}. \]

Let µ = [µ_ij], let µ_D be the diagonal matrix with diagonal elements µ_11, µ_22, ..., µ_mm, and let E be the m × m matrix with all elements equal to 1. Then
\[ \mu = E + P\mu - P\mu_D \tag{3.5.4} \]
since Pµ_D is the matrix with (i, j)-th element p_ij µ_jj.
If we multiply both sides of (3.5.4) with π' we get that
\[ \pi'\mu = \pi'E + \pi'P\mu - \pi'P\mu_D = \pi'E + \pi'\mu - \pi'\mu_D \quad\text{since } \pi'P = \pi' \text{ by the note following theorem 3.5.3,} \]
i.e. π'µ_D = π'E, or
\[ [\pi_1\mu_{11}\ \ \pi_2\mu_{22}\ \ \cdots\ \ \pi_m\mu_{mm}] = [1\ \ 1\ \ \cdots\ \ 1], \]
i.e. π_i µ_ii = 1 for all i = 1,2,...,m. Since π_i > 0 for all i by theorem 3.5.2, we have µ_ii = 1/π_i for all i = 1,2,...,m. ■

NOTE: Since µ_ii = 1/π_i < ∞ (since all π_i > 0) for an irreducible, aperiodic, finite chain, we see that all states are positive recurrent.

Occupation Times
By occupation time we mean the number of times (steps) the process occupies a certain state in a given period. Let N_ij^(n) be the number of times the process visits state j in n steps, given that initially the process was in state i. Then N_ij^(n)/n is the fraction of time the process visits j in n steps.
Let
\[ Y_{ij}^{(k)} = \begin{cases} 1 & \text{if } X_k = j, \text{ given } X_0 = i \\ 0 & \text{otherwise.} \end{cases} \]
Then
\[ P[Y_{ij}^{(k)} = 1] = p_{ij}^{(k)} \quad\text{and}\quad P[Y_{ij}^{(k)} = 0] = 1 - p_{ij}^{(k)}, \]
and hence \(E[Y_{ij}^{(k)}] = p_{ij}^{(k)}\). We also have that \(N_{ij}^{(n)} = \sum_{k=1}^{n} Y_{ij}^{(k)}\) and therefore
\[ E\Big[\tfrac{1}{n} N_{ij}^{(n)}\Big] = E\Big[\tfrac{1}{n}\sum_{k=1}^{n} Y_{ij}^{(k)}\Big] = \tfrac{1}{n}\sum_{k=1}^{n} p_{ij}^{(k)}. \]

Theorem 3.5.5 (will be given) Suppose that a_1, a_2, a_3, ... is a sequence of real numbers such that \(\lim_{n\to\infty} a_n = a\). Then \(\lim_{n\to\infty}\big(\tfrac{1}{n}\sum_{k=1}^{n} a_k\big) = a\).

Proof (not required) Given ε > 0 there exists N(ε) such that |a_k − a| < ε/2 for all k > N(ε). Since \(\lim_{n\to\infty} \tfrac{1}{n}\sum_{k=1}^{N(\varepsilon)}(a_k - a) = 0\), there exists an M(ε) such that for n > M(ε)
\[ \Big|\tfrac{1}{n}\sum_{k=1}^{N(\varepsilon)}(a_k - a)\Big| < \varepsilon/2. \]
Hence for n > max{N(ε), M(ε)} we have that
\[ \Big|\tfrac{1}{n}\sum_{k=1}^{n} a_k - a\Big| = \Big|\tfrac{1}{n}\sum_{k=1}^{n}(a_k - a)\Big| \le \Big|\tfrac{1}{n}\sum_{k=1}^{N(\varepsilon)}(a_k - a)\Big| + \Big|\tfrac{1}{n}\sum_{k=N(\varepsilon)+1}^{n}(a_k - a)\Big| \le \varepsilon/2 + \frac{n - N(\varepsilon)}{n}\,\varepsilon/2 \le \varepsilon, \]
i.e. \(\lim_{n\to\infty}\big(\tfrac{1}{n}\sum_{k=1}^{n} a_k\big) = a\). ■
Theorem 3.5.6 Let N_ij^(n) be the number of times that an aperiodic, irreducible m-state Markov chain visits state j in n steps given that initially the process was in state i. Then
\[ \lim_{n\to\infty} E\Big[\tfrac{1}{n} N_{ij}^{(n)}\Big] = \lim_{n\to\infty} \tfrac{1}{n}\sum_{k=1}^{n} p_{ij}^{(k)} = \pi_j \]
where π' = (π_1, π_2, ..., π_m) is the limiting distribution.

24 (CT) Proof From theorem 3.5.2 we have that \(\lim_{n\to\infty} p_{ij}^{(n)} = \pi_j\), and from theorem 3.5.5 it then follows that \(\lim_{n\to\infty} \tfrac{1}{n}\sum_{k=1}^{n} p_{ij}^{(k)} = \pi_j\). ■

NOTE: Theorem 3.5.6 shows that the limiting probabilities, i.e. the π_j's, are also the fractions of time the Markov chain can be expected to occupy the various states over a large number of steps. From theorem 3.5.4 we also have that π_i = 1/µ_ii, where µ_ii is the expected number of steps for a first return to i starting from i, i.e. if it takes a long time on average to return to i we have a small probability of finding the process in i, and i will be occupied only a small fraction of the time.

Theorem 3.5.8 Suppose a two-state Markov chain has transition probability matrix
\[ P = \begin{pmatrix} 1-a & a \\ b & 1-b \end{pmatrix}. \]
Then the limiting distribution is given by
\[ (\pi_0, \pi_1) = \Big(\frac{b}{a+b},\ \frac{a}{a+b}\Big) \]
if a and b are not both zero.
Proof Let n → ∞ in P^(n) as obtained in theorem 3.5.7. ■

Example 3.5.1 Random Walk with Partially Reflecting Barriers

Consider a process that can be in any one of the states 0,1,2,...,a. If the state of the system is i then the state can only go to state i − 1 or i + 1, with probabilities 1 − α_i and α_i respectively. If the state of the system is 0 then instead of going to state −1 it remains in state 0, i.e. if the state of the system wants to move past state 0 it is reflected back to state 0, and the probability of this happening is 1 − α_0. Similarly, if the state of the system is a and wants to move to state a + 1, it is reflected back to state a, and the probability of this happening is α_a. We will assume that 1 − α_0 > 0 and that α_a > 0. On the other hand, if either 1 − α_0 = 1 or α_a = 1 these states become absorbing and the process will not be irreducible. We will therefore assume that 0 < 1 − α_i < 1 and 0 < α_i < 1 for all values of i.
The process will be a Markov process if the steps to the right or left are independent. All the states communicate and all states are recurrent, since not all states in a finite Markov chain can be transient. Similarly all states have the same period, and since p_00 = 1 − α_0 > 0 all states are aperiodic.

The transition probability matrix P is as follows:
\[ P = \begin{pmatrix} 1-\alpha_0 & \alpha_0 & 0 & 0 & \cdots & 0 & 0 \\ 1-\alpha_1 & 0 & \alpha_1 & 0 & \cdots & 0 & 0 \\ 0 & 1-\alpha_2 & 0 & \alpha_2 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & & \vdots \\ 0 & 0 & 0 & \cdots & 1-\alpha_{a-1} & 0 & \alpha_{a-1} \\ 0 & 0 & 0 & \cdots & 0 & 1-\alpha_a & \alpha_a \end{pmatrix}. \]
The equations to solve for the limit probabilities π_0, π_1, ..., π_a are as follows:
\[ \begin{aligned} \pi_0 &= \pi_0(1-\alpha_0) + \pi_1(1-\alpha_1) \\ \pi_1 &= \pi_0\alpha_0 + \pi_2(1-\alpha_2) \\ \pi_2 &= \pi_1\alpha_1 + \pi_3(1-\alpha_3) \\ &\ \ \vdots \\ \pi_i &= \pi_{i-1}\alpha_{i-1} + \pi_{i+1}(1-\alpha_{i+1}) \\ &\ \ \vdots \\ \pi_{a-1} &= \pi_{a-2}\alpha_{a-2} + \pi_a(1-\alpha_a) \\ \pi_a &= \pi_{a-1}\alpha_{a-1} + \pi_a\alpha_a. \end{aligned} \]
From the first equation we can solve for π_1 in terms of π_0, namely \(\pi_1 = \dfrac{\alpha_0}{1-\alpha_1}\pi_0\). From the second equation we can then solve for π_2 in terms of π_0, namely \(\pi_2 = \dfrac{\alpha_0\alpha_1}{(1-\alpha_1)(1-\alpha_2)}\pi_0\). In general we can solve for π_i in terms of π_0 from the i-th equation, namely
\[ \pi_i = \frac{\alpha_0\alpha_1\cdots\alpha_{i-1}}{(1-\alpha_1)(1-\alpha_2)\cdots(1-\alpha_i)}\pi_0. \]
Since the sum of all the π_i's is equal to 1 we must therefore have that
\[ 1 = \Big[1 + \sum_{i=1}^{a} \frac{\alpha_0\alpha_1\cdots\alpha_{i-1}}{(1-\alpha_1)(1-\alpha_2)\cdots(1-\alpha_i)}\Big]\pi_0, \]
i.e.
\[ \pi_i = \Big[1 + \sum_{i=1}^{a} \frac{\alpha_0\alpha_1\cdots\alpha_{i-1}}{(1-\alpha_1)(1-\alpha_2)\cdots(1-\alpha_i)}\Big]^{-1} \frac{\alpha_0\alpha_1\cdots\alpha_{i-1}}{(1-\alpha_1)(1-\alpha_2)\cdots(1-\alpha_i)}. \]
[NOTE: This is a solution of the equations even if 1 − α 0 = 0 and α a = 0 i.e. the
Ehrenfest model.]

In case α_i = p and 1 − α_i = q, p + q = 1, for all i, we have that
\[ 1 + \sum_{i=1}^{a} \frac{\alpha_0\alpha_1\cdots\alpha_{i-1}}{(1-\alpha_1)(1-\alpha_2)\cdots(1-\alpha_i)} = 1 + \sum_{i=1}^{a}\Big(\frac{p}{q}\Big)^i = \begin{cases} \dfrac{1 - (p/q)^{a+1}}{1 - (p/q)} & \text{if } p \ne q \\[1ex] a + 1 & \text{if } p = q = 1/2 \end{cases} \]
and therefore
\[ \pi_i = \begin{cases} \dfrac{1 - (p/q)}{1 - (p/q)^{a+1}}\Big(\dfrac{p}{q}\Big)^i & \text{for } p \ne q \\[2ex] \dfrac{1}{a+1} & \text{for } p = q = 1/2. \end{cases} \]
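
As a sketch (not in the notes), the product formula for the π_i's can be evaluated for any valid α_i's; here α_i = 0.6 for all i with a = 5, values chosen only for illustration:

```python
import numpy as np

def reflecting_walk_pi(alpha):
    """Limiting distribution via pi_i proportional to
    alpha_0 ... alpha_{i-1} / ((1-alpha_1) ... (1-alpha_i))."""
    a = len(alpha) - 1
    w = [1.0]
    for i in range(1, a + 1):
        w.append(w[-1] * alpha[i - 1] / (1 - alpha[i]))
    w = np.array(w)
    return w / w.sum()   # normalise so the probabilities sum to 1

alpha = [0.6] * 6        # constant case: p = 0.6, q = 0.4, a = 5
print(np.round(reflecting_walk_pi(alpha), 4))
```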


Experience Rating

When deciding the premium that a policyholder should pay, many insurance companies
use information on the number of claims that the policyholder has actually made in
previous years. This is done on the grounds that this gives a better indication of the
likelihood of the policyholder making a claim in the future. Those who have made more
claims in the past are charged higher premiums than those who have made fewer claims.
This is particularly common in the case of motor insurance, where a No Claims Discount
(NCD) system is used by most insurance companies. Some insurers also use such a
system for other types of insurance such as household, group life and medical cover. A
NCD system operates by giving the policyholder a discount on the normal premium,
which is related directly to the number of "claim free years" that the policyholder has
experienced. The discount is expressed as a percentage of the normal premium. The
greater the number of past claim free years, the higher the level of the discount.
When deciding whether to make a claim, the policyholder has then to consider the effect
it will have on the premium in subsequent years.

There are two parts to a NCD system: the discount categories and a set of rules for moving between these categories. In addition, in order to investigate the properties of a NCD system, the probability that a policyholder makes a claim each year needs to be known.
The categories are often referred to as the number of "claim free years ". However, the
rules for moving between categories are usually such that they do not actually relate to
the number of years since a claim. Rather than a claim resulting in a policyholder
returning to having no discount, it is common for the policyholder to simply move to
another category with a lower level of discount.

Example E1 Consider a NCD system with three categories:


Category Discount %
0 0
1 25
2 40

In category 0, the policyholder pays the full premium, which in practice will vary
between individuals according to their own personal circumstances such as age.
In category 1, the policyholder pays 75% of the full premium and in category 2 only 60%
of the full premium.
If a policyholder makes no claim in a year, he or she moves to the next higher category (or stays in category 2). If one or more claims is made, he or she moves down one category (or stays at zero discount). ♪

Let us consider an NCD system with m + 1 categories, namely 0,1,2,...,m. Let p_ij be the probability that during a year a policyholder will move from category i to category j. Written in matrix form these transition probabilities are
\[ P = \begin{pmatrix} p_{00} & p_{01} & p_{02} & \cdots & p_{0m} \\ p_{10} & p_{11} & p_{12} & \cdots & p_{1m} \\ p_{20} & p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & & & & \vdots \\ p_{m0} & p_{m1} & p_{m2} & \cdots & p_{mm} \end{pmatrix}. \]
If we assume that these transition probabilities are the same for all years, and that whether or not a claim occurs during a year is independent of what happens in any other year, matrix P will be the transition probability matrix of a finite, time-homogeneous Markov chain. We will assume that for all i and j there exists a number of years, say n (which may depend on i and j), such that it is possible to go from category i to category j with positive probability in n years. This means that all states communicate, i.e. the Markov chain is irreducible, and since it is a finite Markov chain all states (i.e. categories) are recurrent.

Example E2 Suppose that in example E1 the probability of a policyholder submitting a claim during any year is 0.2. Then the transition matrix is given by
\[ P = \begin{pmatrix} 0.2 & 0.8 & 0 \\ 0.2 & 0 & 0.8 \\ 0 & 0.2 & 0.8 \end{pmatrix}. \]
In this case
\[ P^2 = \begin{pmatrix} 0.20 & 0.16 & 0.64 \\ 0.04 & 0.32 & 0.64 \\ 0.04 & 0.16 & 0.80 \end{pmatrix}, \]
i.e. it is possible to go from any category to any other category in 2 years with positive probability, i.e. all states communicate. ♪
Suppose that at time 0 there are N policyholders, all with the same probability of submitting a claim during a year. Suppose that N_i of the policyholders are in category i at time 0, where N = N_0 + N_1 + ... + N_m. Let q_i = N_i/N, i.e. q_i is the proportion of policyholders in category i at time 0. Let q' = (q_0, q_1, ..., q_m) be the vector of proportions at time 0. Let X_iu^(n) be the category the u-th policyholder in category i at time 0 is in after n years.
Let
\[ Y_{iu,j}^{(n)} = \begin{cases} 1 & \text{if } X_{iu}^{(n)} = j \\ 0 & \text{if } X_{iu}^{(n)} \ne j, \end{cases} \]
i.e. Y_{iu,j}^(n) is 1 if the u-th policyholder in category i at time 0 is in category j at time n. Hence Y_{iu,j}^(n) is equal to 1 with probability p_ij^(n) and 0 otherwise, and therefore
\[ E[Y_{iu,j}^{(n)}] = 1 \times p_{ij}^{(n)} + 0 \times (1 - p_{ij}^{(n)}) = p_{ij}^{(n)}. \]
We further have that \(N_{i,j}^{(n)} = \sum_{u=1}^{N_i} Y_{iu,j}^{(n)}\) is the number of policyholders which were in category i at time 0 and are in category j at time n, and \(E[N_{i,j}^{(n)}] = N_i \times p_{ij}^{(n)}\). Let \(N_j^{(n)} = \sum_{i=0}^{m} N_{i,j}^{(n)}\), i.e. the number of policyholders which at time n are in category j. Hence \(E[N_j^{(n)}] = \sum_{i=0}^{m} N_i \times p_{ij}^{(n)}\). The expected proportion of policyholders in category j at time n is then given by
\[ q_j^{(n)} = E\Big[\sum_{i=0}^{m} N_{i,j}^{(n)}/N\Big] = \sum_{i=0}^{m} (N_i/N) \times p_{ij}^{(n)} = \sum_{i=0}^{m} q_i \times p_{ij}^{(n)}, \]
which is the vector q' multiplied by the j-th column of P^(n). The vector of expected proportions in the various categories at time n is therefore given by
\[ q_n' = \big(q_0^{(n)}, q_1^{(n)}, \dots, q_m^{(n)}\big) = q'P^{(n)} = q'P^n = q'P^{n-1}P = q_{n-1}'P. \]
This is the same formula as for the vector of probabilities at time n if at time 0 the probabilities are given by q'.

Example E3 (Continuation of example E1)

Recall the transition matrix
\[ P = \begin{pmatrix} 0.2 & 0.8 & 0 \\ 0.2 & 0 & 0.8 \\ 0 & 0.2 & 0.8 \end{pmatrix}. \]
Let q' = (1, 0, 0), i.e. at time 0 all policyholders are in category 0. Then
\[ q_1' = (1, 0, 0)\begin{pmatrix} 0.2 & 0.8 & 0 \\ 0.2 & 0 & 0.8 \\ 0 & 0.2 & 0.8 \end{pmatrix} = (0.2, 0.8, 0), \]
i.e. after 1 period the expected proportion in category 0 is only 0.2 whereas the expected proportion in category 1 is 0.8. Also
\[ q_2' = (0.2, 0.8, 0)\begin{pmatrix} 0.2 & 0.8 & 0 \\ 0.2 & 0 & 0.8 \\ 0 & 0.2 & 0.8 \end{pmatrix} = (0.2, 0.16, 0.64), \]
i.e. after 2 periods the expected proportion in category 2 is 0.64, etc.
If the full premium is R200 then the average premium after two steps will be
200(0.2) + 200(0.75)(0.16) + 200(0.6)(0.64) = 140.80.
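
As a sketch (not part of the notes), the iteration q_n' = q_{n−1}'P and the average premium are easy to reproduce:

```python
import numpy as np

P = np.array([[0.2, 0.8, 0.0],
              [0.2, 0.0, 0.8],
              [0.0, 0.2, 0.8]])
premium = 200 * np.array([1.0, 0.75, 0.60])  # full premium, 25% and 40% discount

q = np.array([1.0, 0.0, 0.0])   # everyone starts in category 0
for _ in range(2):
    q = q @ P                   # q_n' = q_{n-1}' P
print(q, q @ premium)           # (0.2, 0.16, 0.64) and 140.80
```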

Example E4 Suppose the number of claims by a policyholder in a single year follows a


binomial distribution with parameters 2 and p . If a policyholder claims once in the year
they move down a single category or stay in the lowest category. If a policyholder claims
twice in the year they move to the lowest category. If a policyholder doesn’t claim in the
year they move up a single category or stay in the highest category. Is the process
Markov? Set up the transition probability matrix if the three discount levels are 0%, 25%
and 40%.

Example E5 Suppose the probability of a claim in any given year is p. If a policyholder didn't claim the previous year they move up a single category or stay in the highest category. If a policyholder claimed two years ago and claimed in the previous year they move to the lowest category. If a policyholder didn't claim two years ago but did claim the previous year they move down a single category or stay in the lowest category. Is the process Markov? Set up the transition probability matrix if the four discount levels are 0%, 25%, 30% and 40%.

3.6 Markov Chains with Countably Infinite States

To prove the existence of the limit probabilities in this case we need the following
theorem which we state without proof.

Theorem 3.6.1 (will be given) Suppose that a_0, a_1, a_2, a_3, ... and b_0, b_1, b_2, b_3, ... are sequences of real numbers such that a_k ≥ 0 ∀k, b_k ≥ 0 ∀k, a_1 > 0, \(\sum_{k=0}^{\infty} a_k = 1\) and \(\sum_{k=0}^{\infty} b_k < \infty\). If a bounded sequence of real numbers u_0, u_1, u_2, u_3, ... satisfies
\[ u_n - \sum_{k=0}^{n} a_{n-k} u_k = b_n \quad\text{for } n = 0,1,2,3,\dots \]
or
\[ u_n - \sum_{k=0}^{n} u_{n-k} a_k = b_n \quad\text{for } n = 0,1,2,3,\dots \tag{3.6.1} \]
then \(\lim_{n\to\infty} u_n\) exists and
\[ \lim_{n\to\infty} u_n = \frac{\sum_{k=0}^{\infty} b_k}{\sum_{k=0}^{\infty} k a_k} \ \text{ if } \sum_{k=0}^{\infty} k a_k < \infty, \quad\text{and}\quad \lim_{n\to\infty} u_n = 0 \text{ otherwise.} \]

NOTE: Theorem 3.6.1 can be generalised to the case where it is not necessary to assume
that a1 > 0 but then it must be assumed that the greatest common divisor of all k such
that a k > 0 is equal to one.
In applications to Markov chains this will have the implication that the chain must be
aperiodic.
We also need the following theorem.

Theorem 3.6.2 (will be given) Let \(y_n = \sum_{k=0}^{n} a_{n-k} x_k\) where \(a_m \ge 0\), \(\sum_{m=0}^{\infty} a_m = a < \infty\) and \(\lim_{k\to\infty} x_k = c\), where c is a real number. Then \(\lim_{n\to\infty} y_n = ac = \big(\sum_{m=0}^{\infty} a_m\big)c\).
Proof (not required) We have that
\[ y_n - ac = \sum_{k=0}^{n} a_{n-k} x_k - c\sum_{m=0}^{\infty} a_m = \sum_{k=0}^{n} a_{n-k}(x_k - c) - c\sum_{m=n+1}^{\infty} a_m. \]
Given ε > 0, let K(ε) be such that |x_k − c| < ε/3a for all k ≥ K(ε). Let \(M = \max_{k\ge 0} |x_k - c|\). Then let N(ε) ≥ K(ε) also be such that for all n ≥ N(ε) it is true that
\[ |c|\sum_{m=n+1}^{\infty} a_m < \varepsilon/3 \quad\text{and}\quad \sum_{k=0}^{K(\varepsilon)} a_{n-k} = \sum_{m=n-K(\varepsilon)}^{n} a_m < \frac{\varepsilon}{3M}. \]
Then for n ≥ N(ε)
\[ |y_n - ac| \le M\sum_{k=0}^{K(\varepsilon)} a_{n-k} + \Big(\frac{\varepsilon}{3a}\Big)\sum_{k=K(\varepsilon)+1}^{n} a_{n-k} + |c|\sum_{m=n+1}^{\infty} a_m \le M\Big(\frac{\varepsilon}{3M}\Big) + \Big(\frac{\varepsilon}{3a}\Big)a + \frac{\varepsilon}{3} = \varepsilon, \]
i.e. \(\lim_{n\to\infty} y_n = ac\). ■

Theorem 3.6.3 Suppose that {X_n : n = 0,1,2,3,...} is an irreducible, recurrent, aperiodic time-homogeneous Markov chain with a countably infinite number of states. Let µ_ii be the mean recurrence time of state i, i.e. \(\mu_{ii} = \sum_{n=0}^{\infty} n f_{ii}^{(n)}\). Then
\[ \lim_{n\to\infty} p_{ii}^{(n)} = \frac{1}{\mu_{ii}} \quad\text{and}\quad \lim_{n\to\infty} p_{ji}^{(n)} = \lim_{n\to\infty} p_{ii}^{(n)} \ \text{ if } j \ne i. \]

Proof Note that p_ii^(0) = 1 and f_ii^(0) = 0, i.e. p_ii^(0) − f_ii^(0) p_ii^(0) = 1, and in general \(p_{ii}^{(n)} = \sum_{k=0}^{n} f_{ii}^{(k)} p_{ii}^{(n-k)}\) for n ≥ 1 (see the proof of theorem 3.3.10). Hence
\[ p_{ii}^{(n)} - \sum_{k=0}^{n} f_{ii}^{(k)} p_{ii}^{(n-k)} = \begin{cases} 1 & \text{for } n = 0 \\ 0 & \text{for } n \ge 1. \end{cases} \]
Now apply theorem 3.6.1 with u_n = p_ii^(n) and a_k = f_ii^(k). [Then a_k ≥ 0 ∀k and \(\sum_{k=0}^{\infty} a_k = 1\) since all states are recurrent; b_0 = 1 and b_k = 0 ∀k ≥ 1 so that \(\sum_{k=0}^{\infty} b_k = 1\); since all states are aperiodic the greatest common divisor of all k such that f_ii^(k) > 0 is 1 (assume this without proof); and \(\sum_{k=0}^{\infty} k a_k = \sum_{k=0}^{\infty} k f_{ii}^{(k)} = \mu_{ii}\).]
Therefore
\[ \lim_{n\to\infty} p_{ii}^{(n)} = \frac{\sum_{k=0}^{\infty} b_k}{\sum_{k=0}^{\infty} k a_k} = \frac{1}{\mu_{ii}}. \]
For j ≠ i we have that
\[ p_{ji}^{(n)} = \sum_{\nu=0}^{n} f_{ji}^{(\nu)} p_{ii}^{(n-\nu)} = \sum_{k=0}^{n} f_{ji}^{(n-k)} p_{ii}^{(k)} \quad\text{for } n \ge 0, \]
as in the proof of theorem 3.3.10. From theorem 3.3.8 it follows that \(f_{ji} = \sum_{\nu=0}^{\infty} f_{ji}^{(\nu)} = 1\) since all states are recurrent. Now setting y_n = p_ji^(n), a_n = f_ji^(n) and x_n = p_ii^(n) and using theorem 3.6.2 we obtain that \(\lim_{n\to\infty} p_{ji}^{(n)} = \lim_{n\to\infty} p_{ii}^{(n)}\) if j ≠ i. ■

NOTE: If state i is positive recurrent (µ_ii < ∞), then \(\lim_{n\to\infty} p_{ii}^{(n)} = 1/\mu_{ii} > 0\), and if i is null recurrent (µ_ii = ∞), then \(\lim_{n\to\infty} p_{ii}^{(n)} = 0\).

NOTE: Under the conditions of theorem 3.6.3, \(\lim_{n\to\infty} \tfrac{1}{n}\sum_{k=1}^{n} p_{ii}^{(k)} = \lim_{n\to\infty} p_{ii}^{(n)} = 1/\mu_{ii}\), using theorem 3.5.5.

Theorem 3.6.4 In an aperiodic, recurrent equivalence class C, if there exists a state i such that \(\lim_{n\to\infty} p_{ii}^{(n)} = \pi_i > 0\), then \(\pi_j = \lim_{n\to\infty} p_{jj}^{(n)} > 0\) for all j ∈ C.

Proof If i ∈ C and j ∈ C then i ↔ j, i.e. there exist n and m such that p_ij^(n) > 0 and p_ji^(m) > 0. Furthermore \(p_{jj}^{(m+v+n)} \ge p_{ji}^{(m)} p_{ii}^{(v)} p_{ij}^{(n)}\) by the Chapman-Kolmogorov equations. As v → ∞, we get that
\[ \pi_j \ge p_{ji}^{(m)} p_{ij}^{(n)} \pi_i > 0 \quad\text{since } \pi_i > 0 \]
for all j ∈ C. ■

NOTE: Theorem 3.6.4 shows that positive recurrence is a class property. Similarly, null recurrence is a class property, for if \(\pi_i = \lim_{n\to\infty} p_{ii}^{(n)} = 0\) for some i ∈ C then π_j = 0 for all j ∈ C, since if one of the π_j's is > 0 then all π_j's are > 0. The theorem also holds if the whole process is irreducible, aperiodic and recurrent.

We call \(\pi_i = \lim_{n\to\infty} p_{ii}^{(n)}\) for i = 0,1,2,3,... the limiting distribution of the states of the system. It is often denoted by \(\{\pi_i\}_{i=0}^{\infty}\).

Theorem 3.6.5 (will be given) Suppose that a_{n,k} for n = 1,2,3,... and k = 0,1,2,3,... are real numbers such that |a_{n,k}| ≤ M for all n = 1,2,3,... and k = 0,1,2,3,..., and \(\lim_{n\to\infty} a_{n,k} = a_k\) for k = 0,1,2,3,.... Further, suppose that x_k, k = 0,1,2,3,..., is a sequence of real numbers such that \(\sum_{k=0}^{\infty} |x_k| < \infty\). Then
\[ \lim_{n\to\infty} \sum_{k=0}^{\infty} x_k a_{n,k} = \sum_{k=0}^{\infty} x_k \lim_{n\to\infty} a_{n,k} = \sum_{k=0}^{\infty} x_k a_k. \]

Proof (not required) Given ε > 0 there exists N(ε) such that \(\sum_{k=n}^{\infty} |x_k| < \frac{\varepsilon}{3M}\) ∀n ≥ N(ε). Let \(Q = \max\{|x_k| : k = 0,1,2,3,\dots,N(\varepsilon)\}\). Then there exists M(ε) such that
\[ |a_{n,k} - a_k| < \frac{\varepsilon}{3QN(\varepsilon)} \quad \forall n \ge M(\varepsilon) \text{ for } k = 0,1,2,3,\dots,N(\varepsilon). \]
Hence for all n ≥ M(ε)
\[ \Big|\sum_{k=0}^{\infty} x_k a_{n,k} - \sum_{k=0}^{\infty} x_k a_k\Big| \le \sum_{k=0}^{N(\varepsilon)} |x_k||a_{n,k} - a_k| + \sum_{k=N(\varepsilon)+1}^{\infty} |x_k||a_{n,k}| + \sum_{k=N(\varepsilon)+1}^{\infty} |x_k||a_k| \le QN(\varepsilon)\frac{\varepsilon}{3QN(\varepsilon)} + M\frac{\varepsilon}{3M} + M\frac{\varepsilon}{3M} \le \varepsilon. \quad ■ \]

Theorem 3.6.6 In an irreducible Markov chain with ergodic states, the limiting probabilities \(\{\pi_i\}_{i=0}^{\infty}\) satisfy the equations
\[ \pi_j = \sum_{i=0}^{\infty} \pi_i p_{ij} \text{ for } j = 0,1,2,\dots, \text{ i.e. } \pi' = \pi'P, \quad\text{and}\quad \sum_{i=0}^{\infty} \pi_i = 1. \]
The limiting distribution is stationary.
The limiting distribution is uniquely determined by
\[ \pi_i \ge 0, \quad \sum_{i=0}^{\infty} \pi_i = 1 \quad\text{and}\quad \pi_j = \sum_{i=0}^{\infty} \pi_i p_{ij} \text{ for all } j = 0,1,2,\dots. \]

Proof For every n and M
\[ 1 = \sum_{j=0}^{\infty} p_{ij}^{(n)} \ge \sum_{j=0}^{M} p_{ij}^{(n)}. \]
Letting n → ∞ we get that \(1 \ge \sum_{j=0}^{M} \pi_j\), and since this is true for any M we get that
\[ 1 \ge \sum_{j=0}^{\infty} \pi_j. \tag{3.6.2} \]
We further have, by the Chapman-Kolmogorov equations,
\[ p_{ij}^{(n+1)} = \sum_{k=0}^{\infty} p_{ik}^{(n)} p_{kj} \ge \sum_{k=0}^{M} p_{ik}^{(n)} p_{kj} \quad\text{for all } M. \]
Letting n → ∞ we obtain that \(\pi_j \ge \sum_{k=0}^{M} \pi_k p_{kj}\) for all M, and therefore
\[ \pi_j \ge \sum_{k=0}^{\infty} \pi_k p_{kj}. \tag{3.6.3} \]
Multiply both sides of (3.6.3) by p_ji and then sum over all values of j to get
\[ \sum_{j=0}^{\infty} \pi_j p_{ji} \ge \sum_{j=0}^{\infty} p_{ji} \sum_{k=0}^{\infty} \pi_k p_{kj} = \sum_{k=0}^{\infty} \pi_k \sum_{j=0}^{\infty} p_{kj} p_{ji} = \sum_{k=0}^{\infty} \pi_k p_{ki}^{(2)} \]
by the Chapman-Kolmogorov equations. But \(\pi_i \ge \sum_{j=0}^{\infty} \pi_j p_{ji}\) by (3.6.3), thus
\[ \pi_i \ge \sum_{k=0}^{\infty} \pi_k p_{ki}^{(2)}, \quad\text{or}\quad \pi_j \ge \sum_{k=0}^{\infty} \pi_k p_{kj}^{(2)} \text{ for all values of } j, \text{ by changing subscripts.} \tag{3.6.4} \]
Repeating the above procedure (multiply both sides of (3.6.4) by p_ji and sum over all values of j), we get
\[ \pi_j \ge \sum_{k=0}^{\infty} \pi_k p_{kj}^{(3)} \quad\text{for all values of } j, \]
and in general
\[ \pi_j \ge \sum_{k=0}^{\infty} \pi_k p_{kj}^{(n)} \quad\text{for all values of } j \text{ and all values of } n. \tag{3.6.5} \]
Now suppose that one of the inequalities in (3.6.5) is a strict inequality. If we then sum over all values of j we get that, since each of the rows of P^n sums to 1,
\[ \sum_{j=0}^{\infty} \pi_j > \sum_{j=0}^{\infty}\sum_{k=0}^{\infty} \pi_k p_{kj}^{(n)} = \sum_{k=0}^{\infty} \pi_k \sum_{j=0}^{\infty} p_{kj}^{(n)} = \sum_{k=0}^{\infty} \pi_k, \]
which is a contradiction; thus
\[ \pi_j = \sum_{k=0}^{\infty} \pi_k p_{kj}^{(n)} \quad\text{for all values of } j \text{ and } n. \tag{3.6.6} \]
Now let n → ∞; since \(\sum_{k=0}^{\infty} \pi_k\) converges and \(p_{kj}^{(n)} \le 1\) for all k and n, we get from (3.6.6) and theorem 3.6.5 that
\[ \pi_j = \sum_{k=0}^{\infty} \pi_k \pi_j = \pi_j \sum_{k=0}^{\infty} \pi_k \quad\text{for all values of } j. \tag{3.6.7} \]
Since all π_j > 0 by positive recurrence, it follows from (3.6.7) that \(\sum_{k=0}^{\infty} \pi_k = 1\).
Since (3.6.6) holds for all n it is also true for n = 1, i.e.
\[ \pi_j = \sum_{k=0}^{\infty} \pi_k p_{kj} \quad\text{for all values of } j, \tag{3.6.8} \]
which means that \(\{\pi_i\}_{i=0}^{\infty}\) is a stationary distribution, since then π'P = π'.
Now suppose that \(\{x_i\}_{i=0}^{\infty}\) also satisfies the conditions
\[ x_i \ge 0, \quad \sum_{i=0}^{\infty} x_i = 1 \quad\text{and}\quad x_j = \sum_{i=0}^{\infty} x_i p_{ij} \text{ for all } j = 0,1,2,\dots. \]
Then by the same methods as above we can show that
\[ x_j = \sum_{k=0}^{\infty} x_k p_{kj} = \sum_{k=0}^{\infty} x_k p_{kj}^{(n)} \quad\text{for all } j \text{ and } n. \]
Hence if we let n → ∞ and use theorem 3.6.5 we get that
\[ x_j = \sum_{k=0}^{\infty} x_k \pi_j = \pi_j \sum_{k=0}^{\infty} x_k = \pi_j, \]
i.e. the limiting distribution \(\{\pi_i\}_{i=0}^{\infty}\) is uniquely determined by the conditions. ■

3.7 Estimation of Parameters for a Markov Chain


SETTING: Suppose that {X_t : t = 0,1,2,3,...} is a time-homogeneous Markov chain with state space S = {1,2,...,s}.
GIVEN: Let x_1, x_2, ..., x_n be observed values for X_1, X_2, ..., X_n where the process started in state i_0.
PARAMETERS TO ESTIMATE: The parameters for this model are the p_ij's; however, we must have that p_i1 + p_i2 + ... + p_is = 1 for all i, i.e. p_is = 1 − p_i1 − p_i2 − ... − p_{i,s−1} for i = 1,2,...,s. The likelihood function will therefore depend on the parameters p_i1, p_i2, ..., p_{i,s−1} for i = 1,2,...,s, i.e. s(s − 1) parameters in total, which must also be such that 0 ≤ p_ij ≤ 1 for all i and j and p_i1 + p_i2 + ... + p_{i,s−1} ≤ 1 for all i = 1,2,...,s.
LIKELIHOOD ESTIMATION:
RESULT 3.7.1 (ST, E) The likelihood function is given by
\[ L = L(p_{11}, \dots, p_{1,s-1}, p_{21}, \dots, p_{2,s-1}, \dots, p_{s1}, \dots, p_{s,s-1}) = p_{i_0,x_1}\, p_{x_1,x_2}\, p_{x_2,x_3} \cdots p_{x_{n-1},x_n} \]
(by theorem 3.1.1 and time-homogeneity)
\[ = \prod_{i=1}^{s} (p_{i1})^{n_{i1}} (p_{i2})^{n_{i2}} \cdots (p_{i,s-1})^{n_{i,s-1}} (1 - p_{i1} - p_{i2} - \dots - p_{i,s-1})^{n_{is}} \]
(where n_ij is the number of times there was a transition from state i to j)
\[ = \prod_{i=1}^{s} L_i, \text{ say.} \]
Note that the n_ij's form a set of sufficient statistics for the p_ij's. Also, note that L_i only depends on p_i1, p_i2, ..., p_{i,s−1} and that these parameters only occur in L_i. That means that to maximize L we can maximize each L_i individually with respect to the parameters that occur in L_i.
We will maximize L_i without considering the restrictions 0 ≤ p_ij ≤ 1 for all i and j and p_i1 + p_i2 + ... + p_{i,s−1} ≤ 1, and then show that these restrictions will in any case be satisfied for the estimates that maximize L_i.
Now
\[ \ln L_i = n_{i1}\ln p_{i1} + n_{i2}\ln p_{i2} + \dots + n_{i,s-1}\ln p_{i,s-1} + n_{is}\ln(1 - p_{i1} - p_{i2} - \dots - p_{i,s-1}). \]

\[ \frac{\partial}{\partial p_{i1}} \ln L_i = \frac{n_{i1}}{p_{i1}} - \frac{n_{is}}{1 - p_{i1} - p_{i2} - \dots - p_{i,s-1}}, \quad \frac{\partial}{\partial p_{i2}} \ln L_i = \frac{n_{i2}}{p_{i2}} - \frac{n_{is}}{1 - p_{i1} - p_{i2} - \dots - p_{i,s-1}}, \quad \dots, \quad \frac{\partial}{\partial p_{i,s-1}} \ln L_i = \frac{n_{i,s-1}}{p_{i,s-1}} - \frac{n_{is}}{1 - p_{i1} - p_{i2} - \dots - p_{i,s-1}}. \]
Setting all the derivatives equal to zero to get the maximum likelihood estimates, we get that
\[ \hat{p}_{ij} = \frac{n_{ij}}{n_{is}}\big(1 - \hat{p}_{i1} - \hat{p}_{i2} - \dots - \hat{p}_{i,s-1}\big) \quad\text{for } j = 1,2,\dots,s-1. \tag{3.7.1} \]
Let n_i = n_i1 + n_i2 + ... + n_is, i.e. n_i is the number of times there was a transition from state i to some state (including to i itself). Now adding both sides of (3.7.1) over j = 1,2,...,s−1 we get that
\[ \hat{p}_{i1} + \hat{p}_{i2} + \dots + \hat{p}_{i,s-1} = \frac{n_i - n_{is}}{n_{is}}\big(1 - \hat{p}_{i1} - \dots - \hat{p}_{i,s-1}\big) = \frac{n_i}{n_{is}}\big(1 - \hat{p}_{i1} - \dots - \hat{p}_{i,s-1}\big) - 1 + \hat{p}_{i1} + \dots + \hat{p}_{i,s-1}, \]
i.e. \(1 - \hat{p}_{i1} - \hat{p}_{i2} - \dots - \hat{p}_{i,s-1} = \dfrac{n_{is}}{n_i}\), and substituting this in (3.7.1) we get
\[ \hat{p}_{ij} = \frac{n_{ij}}{n_i}. \tag{3.7.2} \]
The maximum likelihood estimate of p_ij is the proportion of times that, when there was a transition from state i, it was to state j. We also have that 0 ≤ p̂_ij ≤ 1 and
\[ \hat{p}_{i1} + \hat{p}_{i2} + \dots + \hat{p}_{i,s-1} = \frac{n_i - n_{is}}{n_i} \le 1. \]
Let N_ij be the random variable determined by the number of transitions from state i to state j and let N_i be the random variable determined by the number of transitions from state i. Then the maximum likelihood estimator of p_ij is given by
\[ \tilde{p}_{ij} = \frac{N_{ij}}{N_i} \quad\text{for } j = 1,2,\dots,s-1;\ i = 1,2,\dots,s. \tag{3.7.3} \]
Note that \(E\Big[\dfrac{N_{ij}}{N_i} \,\Big|\, N_i = k\Big] = E\Big[\dfrac{N_{ij}}{k} \,\Big|\, N_i = k\Big] = \dfrac{kp_{ij}}{k} = p_{ij}\) and therefore
\[ E[\tilde{p}_{ij}] = E_{N_i}\Big[E\Big[\frac{N_{ij}}{N_i} \,\Big|\, N_i\Big]\Big] = E_{N_i}[p_{ij}] = p_{ij}, \]

i.e. the maximum likelihood estimators of the pij 's are unbiased estimators based on a set
of sufficient statistics. ■
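
As a sketch (not part of the notes), the estimates (3.7.2) are easy to compute from an observed path by counting transitions; the test data here is the observed sequence of exercise 3.8.1:

```python
import numpy as np

def estimate_P(path, s):
    """Maximum likelihood estimates p_ij = n_ij / n_i from an observed path
    (states coded 1..s; the first element plays the role of i_0)."""
    n = np.zeros((s, s))
    for a, b in zip(path[:-1], path[1:]):
        n[a - 1, b - 1] += 1                 # count transition a -> b
    rows = n.sum(axis=1, keepdims=True)      # n_i = transitions out of i
    return np.divide(n, rows, out=np.zeros_like(n), where=rows > 0)

path = [1, 3, 2, 2, 1, 3, 3, 2, 3, 1, 2, 3, 2, 1, 1, 2, 2, 1, 3, 3]
print(estimate_P(path, 3))
```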

3.8 Testing Whether or Not a Stochastic Process with Discrete


Parameter Space and Finite Discrete State Space is a Time-
Homogeneous Markov Chain
SETTING: Let {X_t : t = 0,1,2,3,...} be a stochastic process with state space S = {1,2,3,...,s}.
AIM: To fully test whether or not the process is a time-homogeneous Markov chain would require testing an enormous number of properties of the process and require large amounts of data. For instance, define p_{i_0,i_1,i_2,...,i_n} as P[X_{t+1} = i_1, X_{t+2} = i_2, ..., X_{t+n} = i_n | X_t = i_0]. This should be equal to p_{i_0,i_1} × p_{i_1,i_2} × ... × p_{i_{n−1},i_n} if the process is a Markov chain. An estimate of p_{i_0,i_1,i_2,...,i_n} would be n_{i_0,i_1,i_2,...,i_n}/n_{i_0}, where n_{i_0,i_1,i_2,...,i_n} is the number of times that the process went from i_0, then to i_1, then to i_2, ..., from i_{n−1} to i_n, and n_{i_0} is the total number of times the process was in state i_0 and went to some state. This would mean estimating s^n parameters and would require a very large number of observations. In practice it is generally considered sufficient to test the hypothesis for n = 3 and to use all the triplets n_ijk as data.
TEST: If there are no restrictions on the p_ijk parameters there are s³ − 1 independent parameters to estimate (the sum of all of them is equal to 1) and we would use n_ijk/n_i as the estimate. We make use of the χ²-goodness-of-fit test. Under the hypothesis that the process is a Markov chain we have that p_ijk = p_ij p_jk, and our estimate would be (n_ij/n_i)(n_jk/n_j). The observed frequency of going from i to j and then to k is n_ijk and the expected frequency of this happening if it is a Markov chain is n_ij(n_jk/n_j). The test statistic based on this is
\[ \chi^2 = \sum_{i=1}^{s}\sum_{j=1}^{s}\sum_{k=1}^{s} \frac{\big(n_{ijk} - n_{ij}(n_{jk}/n_j)\big)^2}{n_{ij}(n_{jk}/n_j)}. \]
The model with no restrictions on the p_ijk's, except that their sum is equal to 1, has s³ − 1 parameters, and the number of independent parameters under the hypothesis that the process is a Markov chain is s(s − 1) (remember the p_ij's in every row add up to 1). The degrees of freedom for the χ² statistic above is therefore s³ − 1 − s(s − 1). However, if there are certain probabilities that by their very nature must be 0, we need to adjust the degrees of freedom, namely
degrees of freedom = r − q + s − 1
where s denotes the number of states i such that n_i > 0,

q denotes the number of pairs (i, j ) for which nij > 0


and r denotes the number of triplets (i, j , k ) such that nij n jk > 0 .
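
As a sketch (not part of the notes), the statistic can be computed from triplet counts. Depending on how the boundary transitions are counted (e.g. whether n_ij is taken over the first n − 2 steps only), the value may differ slightly from the 14.6111 quoted in exercise 3.8.1 below:

```python
from collections import Counter

def markov_chi_square(path):
    """Chi-square statistic for the Markov property: observed triplet counts
    n_ijk against expected counts n_ij * n_jk / n_j."""
    pairs = Counter(zip(path[:-1], path[1:]))
    triples = Counter(zip(path[:-2], path[1:-1], path[2:]))
    n_j = Counter(path[:-1])                 # times state j was left
    chi2 = 0.0
    for (i, j), nij in pairs.items():
        for k in set(path):
            njk = pairs.get((j, k), 0)
            if njk == 0:
                continue                     # triplet excluded (n_ij * n_jk = 0)
            expected = nij * njk / n_j[j]
            chi2 += (triples.get((i, j, k), 0) - expected) ** 2 / expected
    return chi2

path = [1, 3, 2, 2, 1, 3, 3, 2, 3, 1, 2, 3, 2, 1, 1, 2, 2, 1, 3, 3]
print(round(markov_chi_square(path), 4))
```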

Exercise 3.8.1 Suppose that the following states were observed for a 3-state process:
1,3,2,2,1,3,3,2,3,1,2,3,2,1,1,2,2,1,3,3.
Show that the value of χ² is 14.6111 with 20 degrees of freedom, which is not significant at the 5% level.

Test Yourself
Exercise 3.1 Show that for a Markov chain the definitions of the Markov property
given in section 2.3.5 and that given in section 3.1 are equivalent.
Exercise 3.2 Let qi = P[ X 0 = i] and let the one-step transition probabilities be given
by pij( n ,n +1) . Determine an expression for P[ X 3 = k ] .
Exercise 3.3 Assume that pij( n,n +1) = pij for all n . Show that P[ X m+ k = j | X m = i] does
not depend on m for all values of k . (Use induction on k .)
Exercise 3.4 Let q be a vector with i th element qi = P[ X 0 = i] . Let p5 be the vector
with i th element P[ X 5 = i ] . Let pij( n ,n +1) be the one-step transition probabilities for a
time-homogeneous Markov chain. Determine an expression for p5 in terms of q and
the matrices P ( n ,n +1) .
Exercise 3.5 What is the value of $\sum_{i\in S} P[X_n = i \mid X_{n+1} = j]$, where $P[X_n = i \mid X_{n+1} = j]$ is the probability that at time $n$ the state of the process was $i$ given that at time $n+1$ the process is in state $j$?
Exercise 3.6 Consider a time-homogeneous Markov chain with state space S = {0,1,2}
and transition probability matrix P , where
$$P = \begin{pmatrix} p & q & 0\\ 0.5 & 0 & 0.5\\ p-0.5 & 0.7 & 0.2 \end{pmatrix}.$$
(a) Determine the values of p and q .
(b) Calculate the transition probabilities pij(3) .
Exercise 3.7 A motor insurance company grants either no discount (state 0), 25%
discount (state 1) or 50% discount (state 2). A claim-free year results in a transition to the
next higher state for the following year (or in the retention of maximum discount).
Similarly a year with one or more claims results in a transition to the next lower state for
the following year (or retention of no discount). The probability of a claim-free year is
0.75 for all years and what happens in different years are independent events.
(a) Is this a Markov chain? Why?
(b) Determine the one-step transition probability matrix.
(c) What is the probability of starting with a 25% discount and ending up with 25%
discount after 4 years?

Exercise 3.8 A motor insurance company grants its customers either no discount, or
25%, 40% or 60% discount. A claim-free year results in a transition to the next higher
state for the following year (or in the retention of maximum discount). A year with one or
more claims results in a transition to the next lower level of discount if the previous year
was claim-free or two levels down if there was a claim in the previous year (or to no
discount if two levels lower is not possible). The probability of a claim-free year is 0.75
for all years and what happens in different years are independent events.
Consider the following possible states for the system: no discount (state 0), 25% discount
(state 1), 40% discount and a claim-free previous year (state 2), 40% discount but a claim
the previous year (state 3) or 60% discount (state 4).
(a) Is this a Markov chain? Why?
(b) Determine the one-step transition probability matrix.
(c) What is the probability of starting with no discount and ending up with maximum
discount after 5 years?
Exercise 3.9 A no claim discount system has four levels of discount – 0%, 20%, 40%
or 60%. A new policyholder starts at 0% discount. At the end of each policy year,
policyholders will change levels according to the following rules:
(i) At the end of a claim free year, a policyholder moves up one level, or remains
on the maximum discount.
(ii) At the end of a year in which exactly one claim is made, a policyholder drops
down one level, or remains on 0%.
(iii) At the end of a year in which more than one claim was made, a policyholder
drops to 0% discount.
The probability of a claim free year for a policyholder is 0.7, the probability of exactly
one claim is 0.2 and the probability of more than one claim is 0.1. What happens in
different years are independent events.
(a) Determine the transition probability matrix P .
(b) Calculate P 2 .
(c) If a policyholder starts at 0%, what is the probability that after 5 years the
policyholder will be on maximum discount?
Exercise 3.10 Show that the probabilities of A winning all the money as given in
example 3.4.1 agrees with the formula derived in (3.4.7).
Exercise 3.11 Show that the formula for π i derived in (3.4.7) satisfies the difference
equation (3.4.8) as well as the boundary conditions π 0 = 0 and π N = 1 .
Exercise 3.12 (a) Show that the formula for the expected number of bets before the game ends in (3.4.11) satisfies the difference equation (3.4.10) and the boundary conditions $m_0 = m_N = 0$.
(b) Show that the formula agrees with the expected number of bets as derived in example 3.4.2 where p = 0.6 and N = 5.
Exercise 3.13 Consider the no claim discount example in exercise 3.8.
(a) Determine whether the process is an irreducible Markov chain.
(b) Are the states aperiodic?
(c) Explain why all the states are recurrent.

(d) Calculate the probabilities that a person not yet on the maximum discount,
will receive the maximum discount for the first time after 6 years.
Exercise 3.14 Consider a two-state time-homogeneous Markov chain with transition
probability matrix $P = \begin{pmatrix}1-a & a\\ b & 1-b\end{pmatrix}$ where $0 < a < 1$ and $0 < b < 1$.
(a) Is this an irreducible chain?
(b) Is this an aperiodic chain?
(c) Determine the stationary distribution for this process. Compare your answer with
the limiting distribution for a two-state Markov chain as can be determined from
theorem 3.5.7.
Exercise 3.15 For the no claim discount example considered in exercise 3.13,
determine the stationary distribution.
Exercise 3.16 Consider a two-state time-homogeneous Markov chain with transition
probability matrix $P = \begin{pmatrix}1-a & a\\ b & 1-b\end{pmatrix}$ where $0 < a < 1$ and $0 < b < 1$.
(a) Determine the recurrence times for both states.
(b) For a = 0.5 and b = 0.5 determine the expected recurrence times.
(c) For a = 0.8 and b = 0.1 determine the expected recurrence times.
Exercise 3.17 Consider the no claim discount example in exercise 3.13
(a) Determine the recurrence times.
(b) In the long run, what proportion of time will a person be on 60% discount?
(c) Does the answer in (b) depend on the state the person starts in?

Chapter 4: Markov Jump Processes

4.1 Transition Rates and The Kolmogorov Equations

Definition 4.1.1 A stochastic process { X t : t ∈ T } with parameter space


T = [0, ∞ ) and state space S = {1,2,3,..., N } for some N or S = {1,2,3,...} is called
a Markov jump process if for any value of n and any values of
t1 < t 2 < t 3 < ... < t n < t n +1 ∈ T it is true that
$$P[X_{t_{n+1}} = x_{t_{n+1}} \mid X_{t_1} = x_{t_1}, X_{t_2} = x_{t_2}, \ldots, X_{t_n} = x_{t_n}] = P[X_{t_{n+1}} = x_{t_{n+1}} \mid X_{t_n} = x_{t_n}]$$
for all $x_{t_1}, x_{t_2}, \ldots, x_{t_n}$ and $x_{t_{n+1}} \in S$.

The quantities $P[X_t = j \mid X_s = i] = P_{ij}(s,t)$, say, are called the transition probabilities of the process.

In the same way as for Markov chains (theorems 3.1.1 and 3.1.2) we can show that for any $t_0 < t_1 < t_2 < \cdots < t_n$ and any values $i_0, i_1, i_2, \ldots, i_n$ it is true that
$$P[X_{t_1} = i_1, X_{t_2} = i_2, \ldots, X_{t_n} = i_n \mid X_{t_0} = i_0] = P_{i_0,i_1}(t_0,t_1)P_{i_1,i_2}(t_1,t_2)P_{i_2,i_3}(t_2,t_3)\cdots P_{i_{n-1},i_n}(t_{n-1},t_n) \qquad (4.1.1)$$
and that for any $0 < t_1 < t_2 < \cdots < t_n$ and any values $i_1, i_2, \ldots, i_n$ it is true that
$$P[X_{t_1} = i_1, X_{t_2} = i_2, \ldots, X_{t_n} = i_n] = \sum_{i_0\in S} q_{i_0}P_{i_0,i_1}(0,t_1)P_{i_1,i_2}(t_1,t_2)P_{i_2,i_3}(t_2,t_3)\cdots P_{i_{n-1},i_n}(t_{n-1},t_n) \qquad (4.1.2)$$
where $q_i = P[X_0 = i]$.
It follows that if we know the initial probabilities ( qi ’s) and the transition probabilities
then all probabilities for the process can be determined.
As in the case of Markov chains we can in exactly the same way prove the Chapman-
Kolmogorov equations (see theorem 3.1.3) namely that for any s < u < t and any i and
j,
$$P_{ij}(s,t) = \sum_{k\in S}P_{ik}(s,u)P_{kj}(u,t) \qquad (4.1.3)$$
i.e. the transition probabilities must satisfy the same type of relations as in the case of
Markov chains.
If we define P( s, t ) as the matrix whose (i, j ) th element is Pij ( s, t ) then it follows from
(4.1.3) that
P( s, t ) = P( s, u ) P(u , t ) (4.1.4)

In practice it turns out that instead of specifying the transition probabilities it is usually
easier to specify derivatives of the transition probabilities. Let us assume that the
functions Pij ( s, t ) are continuously differentiable in both variables. Note that
$$P_{ij}(s,s) = \begin{cases}0 & \text{if } i \neq j\\ 1 & \text{if } i = j\end{cases} = \delta_{ij}.$$
Definition 4.1.2 The transition rate of going from state $i$ to state $j$ in the vicinity of time $s$ is defined as
$$\sigma_{ij}(s) = \lim_{h\to 0}\frac{P_{ij}(s,s+h) - P_{ij}(s,s)}{h} = \begin{cases}\lim_{h\to 0}\dfrac{P_{ij}(s,s+h)}{h} & \text{if } j \neq i\\[2mm] \lim_{h\to 0}\dfrac{P_{ij}(s,s+h) - 1}{h} & \text{if } j = i\end{cases} = \left[\frac{\partial}{\partial t}P_{ij}(s,t)\right]_{t=s}.$$
Definition 4.1.3 A function $f:\mathbb{R}\to\mathbb{R}$ is $o(h)$ if $\lim_{h\to 0}\dfrac{f(h)}{h} = 0$.

It is easy to show that if $f$ is $o(h)$ then $cf$ is $o(h)$ for any constant $c$; that if $g$ is also $o(h)$ then $f + g$ is $o(h)$; and that if $0 \le g(h) \le f(h)$ and $f$ is $o(h)$ then $g$ is $o(h)$.

RESULT 4.1.1(CT)

$$P_{ij}(s,s+h) = P_{ij}(s,s) + h\sigma_{ij}(s) + o(h) = \begin{cases} h\sigma_{ij}(s) + o(h) & \text{if } i \neq j\\ 1 + h\sigma_{ij}(s) + o(h) & \text{if } i = j.\end{cases}$$

Let $f'(s) = \lim_{h\to 0}\dfrac{f(s+h)-f(s)}{h}$ and let $\delta(h)$ be such that
$$f(s+h) = f(s) + hf'(s) + \delta(h).$$
Then
$$\frac{f(s+h)-f(s)}{h} = f'(s) + \frac{\delta(h)}{h}$$
and therefore
$$\lim_{h\to 0}\frac{f(s+h)-f(s)}{h} = f'(s) + \lim_{h\to 0}\frac{\delta(h)}{h}$$
i.e. $f'(s) = f'(s) + \lim_{h\to 0}\frac{\delta(h)}{h}$, or in other words $\lim_{h\to 0}\frac{\delta(h)}{h} = 0$, so that $\delta(h) = o(h)$ and therefore
$$f(s+h) = f(s) + hf'(s) + o(h).$$

From this note and definition 4.1.2 of $\sigma_{ij}(s)$ it then follows that
$$P_{ij}(s,s+h) = P_{ij}(s,s) + h\sigma_{ij}(s) + o(h) = \begin{cases} h\sigma_{ij}(s) + o(h) & \text{if } i \neq j\\ 1 + h\sigma_{ij}(s) + o(h) & \text{if } i = j.\end{cases}$$
The coefficient of $h$ is the transition rate.

Conversely, if $P_{ij}(s,s+h) = P_{ij}(s,s) + hq(s) + o(h)$, then
$$\sigma_{ij}(s) = \lim_{h\to 0}\frac{P_{ij}(s,s+h) - P_{ij}(s,s)}{h} = \lim_{h\to 0}\left(q(s) + \frac{o(h)}{h}\right) = q(s)$$
i.e. the coefficient of $h$ is the transition rate.

RESULT 4.1.2(CT)

$P_{ij}(s,s+h) = \delta_{ij} + h\sigma_{ij}(s) + o(h)$ and $P^C_{ij}(s,s+h) = \delta_{ij} + h\sigma_{ij}(s) + o(h)$

For i ≠ j :
Note that Pij ( s, s + h) is the probability that the state of the process at time s is i and at
time $s+h$ has changed to $j$. This event includes all those cases where there is more than one change in the interval $(s, s+h)$, provided it started in $i$ and ended in $j$. Let
PijC ( s, s + h) be the probability that in the interval ( s, s + h) there was only the one change
from state i to state j . Let D be the event that there is more than one change and that
the state changes from i to j in ( s, s + h) . Therefore
Pij ( s, s + h) = PijC ( s, s + h) + P[ D] .
Let E be the event that more than one change occurs in the interval ( s, s + h)
irrespective of the state at the beginning and at the end. We will now assume that
P[ E ] = o(h) for all values of s . But D ⊂ E and therefore 0 ≤ P[ D] ≤ P[ E ] and therefore
P[D] is o(h) if P[ E ] = o(h) .
Hence $P_{ij}(s,s+h) = P^C_{ij}(s,s+h) + o(h)$. Therefore, for $i \neq j$,
$$\sigma_{ij}(s) = \lim_{h\to 0}\frac{P_{ij}(s,s+h)}{h} = \lim_{h\to 0}\frac{P^C_{ij}(s,s+h) + o(h)}{h} = \lim_{h\to 0}\frac{P^C_{ij}(s,s+h)}{h}$$
i.e. $\sigma_{ij}(s)$ is the limit of the rate of change from state $i$ to state $j$ at time $s$ irrespective of whether we include or exclude the possibility of more than one change.
It also shows that, for $i \neq j$,
$$P_{ij}(s,s+h) = h\sigma_{ij}(s) + o(h) \quad\text{and}\quad P^C_{ij}(s,s+h) = h\sigma_{ij}(s) + o(h). \qquad (4.1.5)$$
For i = j let PiiC ( s, s + h) be the probability that at time s the process is in state i and at
time s + h is still in state i i.e. no changes took place. The probability Pii ( s, s + h)

includes the probability of a change away from i and then back again to state i i.e. the
probability of more than one change. Therefore similarly
$$P_{ii}(s,s+h) = P^C_{ii}(s,s+h) + o(h).$$

Now note that $1 - P^C_{ii}(s,s+h)$ is the probability of some change in the interval $(s, s+h]$. Therefore
$$\sigma_{ii}(s) = \lim_{h\to 0}\frac{P_{ii}(s,s+h) - 1}{h} = \lim_{h\to 0}\frac{P^C_{ii}(s,s+h) + o(h) - 1}{h} = \lim_{h\to 0}\frac{P^C_{ii}(s,s+h) - 1}{h}$$
i.e. $\sigma_{ii}(s)$ is the negative of the rate of some change from state $i$ in the process at time $s$.
It also shows that
$$P_{ii}(s,s+h) = 1 + h\sigma_{ii}(s) + o(h) \quad\text{and}\quad P^C_{ii}(s,s+h) = 1 + h\sigma_{ii}(s) + o(h). \qquad (4.1.6)$$

Kolmogorov’s Forward Equations


Since the transition probabilities must satisfy certain conditions (the Chapman-
Kolmogorov equations), the transition rates must satisfy certain conditions.

Theorem 4.1.1 For a Markov jump process with transition probabilities


$P_{ij}(s,t)$ and transition rates $\sigma_{ij}(s)$, we have that
$$\frac{\partial}{\partial t}P_{ij}(s,t) = \sum_{k\in S} P_{ik}(s,t)\sigma_{kj}(t)$$
or in matrix form
$$\begin{pmatrix}\frac{\partial}{\partial t}P_{00}(s,t) & \frac{\partial}{\partial t}P_{01}(s,t) & \frac{\partial}{\partial t}P_{02}(s,t) & \cdots\\ \frac{\partial}{\partial t}P_{10}(s,t) & \frac{\partial}{\partial t}P_{11}(s,t) & \frac{\partial}{\partial t}P_{12}(s,t) & \cdots\\ \frac{\partial}{\partial t}P_{20}(s,t) & \frac{\partial}{\partial t}P_{21}(s,t) & \frac{\partial}{\partial t}P_{22}(s,t) & \cdots\\ \vdots & \vdots & \vdots & \ddots\end{pmatrix} = \begin{pmatrix}P_{00}(s,t) & P_{01}(s,t) & P_{02}(s,t) & \cdots\\ P_{10}(s,t) & P_{11}(s,t) & P_{12}(s,t) & \cdots\\ P_{20}(s,t) & P_{21}(s,t) & P_{22}(s,t) & \cdots\\ \vdots & \vdots & \vdots & \ddots\end{pmatrix}\begin{pmatrix}\sigma_{00}(t) & \sigma_{01}(t) & \sigma_{02}(t) & \cdots\\ \sigma_{10}(t) & \sigma_{11}(t) & \sigma_{12}(t) & \cdots\\ \sigma_{20}(t) & \sigma_{21}(t) & \sigma_{22}(t) & \cdots\\ \vdots & \vdots & \vdots & \ddots\end{pmatrix}$$
or $\frac{\partial}{\partial t}P(s,t) = P(s,t)A(t)$ where $A(t)$ is the matrix of transition rates at time $t$.
This equation is known as Kolmogorov's forward equation.

27 (ST) Proof From (4.1.3) we have that
$$\begin{aligned}P_{ij}(s,t+h) &= \sum_{k\in S} P_{ik}(s,t)P_{kj}(t,t+h)\\ &= P_{ij}(s,t)P_{jj}(t,t+h) + \sum_{\substack{k\in S\\ k\neq j}} P_{ik}(s,t)P_{kj}(t,t+h)\\ &= P_{ij}(s,t)\left(1 + h\sigma_{jj}(t) + o(h)\right) + \sum_{\substack{k\in S\\ k\neq j}} P_{ik}(s,t)\left(h\sigma_{kj}(t) + o(h)\right)\end{aligned}$$
(from (4.1.5) and (4.1.6)), i.e.
$$\frac{P_{ij}(s,t+h) - P_{ij}(s,t)}{h} = P_{ij}(s,t)\left(\sigma_{jj}(t) + \frac{o(h)}{h}\right) + \sum_{\substack{k\in S\\ k\neq j}} P_{ik}(s,t)\left(\sigma_{kj}(t) + \frac{o(h)}{h}\right)$$
so that if we let $h\to 0$,
$$\frac{\partial}{\partial t}P_{ij}(s,t) = P_{ij}(s,t)\sigma_{jj}(t) + \sum_{\substack{k\in S\\ k\neq j}} P_{ik}(s,t)\sigma_{kj}(t) = \sum_{k\in S} P_{ik}(s,t)\sigma_{kj}(t).$$


Kolmogorov’s Backward Equations
Theorem 4.1.2 For a Markov jump process with transition probabilities $P_{ij}(s,t)$ and transition rates $\sigma_{ij}(s)$, we have that
$$\frac{\partial}{\partial s}P_{ij}(s,t) = -\sum_{k\in S}\sigma_{ik}(s)P_{kj}(s,t)$$
or in matrix form $\frac{\partial}{\partial s}P(s,t) = -A(s)P(s,t)$. This is known as Kolmogorov's Backward Equation.
28 (ST) Proof Similar to that of theorem 4.1.1.

A Property of Transition Rates
Theorem 4.1.3 For a Markov jump process with transition probabilities $P_{ij}(s,t)$ and transition rates $\sigma_{ij}(s)$, we have that
$$\sigma_{ii}(s) = -\sum_{\substack{k\in S\\ k\neq i}}\sigma_{ik}(s).$$

29 (ST) Proof We have that $\sum_{k\in S}P_{ik}(s,t) = 1$ since if the process is in state $i$ at time $s$ it must be in one of the states in $S$ at time $t$. Differentiating both sides with respect to $t$ and then setting $t = s$ we get that
$$0 = \left[\frac{\partial}{\partial t}\sum_{k\in S}P_{ik}(s,t)\right]_{t=s} = \sum_{k\in S}\left[\frac{\partial}{\partial t}P_{ik}(s,t)\right]_{t=s} = \sum_{k\in S}\sigma_{ik}(s)$$
i.e. $\sigma_{ii}(s) = -\sum_{k\neq i}\sigma_{ik}(s)$. ■

Theorem 4.1.3 implies that the sum of the elements in the i th row of the A matrix is
equal to 0.

The Time-Homogeneous Case

The Markov jump process is time-homogeneous if $P_{ij}(s,t)$ only depends on $t - s$. Then
$$\sigma_{ij}(s) = \lim_{h\to 0}\frac{P_{ij}(s,s+h) - P_{ij}(s,s)}{h} = \lim_{h\to 0}\frac{P_{ij}(h) - \delta_{ij}}{h}$$
which is independent of $s$, i.e. it is a constant, say $\sigma_{ij}$, and similarly the matrix $A(s)$ is constant for all values of $s$, say $A(s) = A$. Note that $P(s,s) = I$.
Let $P(t)$ denote $P(0,t)$. Then the Kolmogorov forward equation is given by
$$\frac{d}{dt}P(t) = P(t)A. \qquad (4.1.8)$$

Now suppose that $X$ is a square matrix and define the matrix function $e^X$ as $\sum_{i=0}^{\infty}\frac{1}{i!}X^i$ - note that $e^X$ is a matrix. In particular $e^{tA} = \sum_{i=0}^{\infty}\frac{1}{i!}A^i t^i$, i.e.
$$\frac{d}{dt}e^{tA} = \sum_{i=1}^{\infty}\frac{i}{i!}A^i t^{i-1} = \left(\sum_{i=1}^{\infty}\frac{1}{(i-1)!}A^{i-1}t^{i-1}\right)A = \left(\sum_{i=0}^{\infty}\frac{1}{i!}A^i t^i\right)A = e^{tA}A$$
i.e. $P(t) = e^{tA}$ satisfies equation (4.1.8) and $P(0) = I$ (we define $e^{\mathbf{0}} = I$).
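In practice $e^{tA}$ is computed numerically. The following is a minimal Python sketch (for illustration only; the rates lam and mu are assumed values, and `scipy.linalg.expm` is a standard matrix-exponential routine):

```python
import numpy as np
from scipy.linalg import expm

# Illustrative generator: the two-state rate matrix of section 4.5,
# with assumed rates lam = 2, mu = 1.
lam, mu = 2.0, 1.0
A = np.array([[-lam, lam],
              [ mu, -mu]])

t = 0.5
P_t = expm(t * A)        # P(t) = e^{tA} solves P'(t) = P(t)A with P(0) = I
print(P_t)
print(P_t.sum(axis=1))   # each row of a transition probability matrix sums to 1
```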

Determination of $P^C_{ii}(s,t)$
Theorem 4.1.4 For a Markov jump process with transition probabilities $P_{ij}(s,t)$ and transition rates $\sigma_{ij}(s)$, we have that
$$P^C_{ii}(s,t) = e^{\int_s^t \sigma_{ii}(u)\,du}.$$
30 (ST, E) Proof
$$\begin{aligned} P^C_{ii}(s,t+h) &= P[X_u = i, s \le u \le t+h \mid X_s = i]\\ &= P[X_u = i, s\le u\le t,\; X_u = i, t\le u\le t+h \mid X_s = i]\\ &= P[X_u = i, s\le u\le t \mid X_s = i]\,P[X_u = i, t\le u\le t+h \mid X_s = i, X_u = i, s\le u\le t]\\ &= P^C_{ii}(s,t)\,P[X_u = i, t\le u\le t+h \mid X_t = i] \quad\text{(by the Markov property)}\\ &= P^C_{ii}(s,t)\,P^C_{ii}(t,t+h)\\ &= P^C_{ii}(s,t)\left(1 + h\sigma_{ii}(t) + o(h)\right). \quad\text{(from (4.1.6))}\end{aligned}$$

Hence
$$\frac{\partial}{\partial t}P^C_{ii}(s,t) = \lim_{h\to 0}\frac{P^C_{ii}(s,t+h) - P^C_{ii}(s,t)}{h} = \lim_{h\to 0}P^C_{ii}(s,t)\left(\sigma_{ii}(t) + \frac{o(h)}{h}\right) = P^C_{ii}(s,t)\sigma_{ii}(t). \qquad (4.1.9)$$
A solution of this equation is $P^C_{ii}(s,t) = e^{\int_s^t \sigma_{ii}(u)\,du}$, which can be checked by differentiating both sides with respect to $t$:
$$\frac{\partial}{\partial t}P^C_{ii}(s,t) = \frac{\partial}{\partial t}e^{\int_s^t \sigma_{ii}(u)\,du} = e^{\int_s^t \sigma_{ii}(u)\,du}\cdot\frac{\partial}{\partial t}\int_s^t \sigma_{ii}(u)\,du = e^{\int_s^t \sigma_{ii}(u)\,du}\cdot\sigma_{ii}(t)$$
by the Fundamental Theorem of Calculus.
This solution also gives $P^C_{ii}(s,s) = e^{\int_s^s \sigma_{ii}(u)\,du} = e^0 = 1$, which is the correct boundary condition. ■

4.2 The Poisson Process

Theorem 4.2.1 Let X t be the number of events of some type that takes place
in the interval [0, t ] . Then { X t : t ∈ [0, ∞)} is a stochastic process with state
space S = {0,1,2,...} . We will assume that the process that generates the events
satisfies the following conditions:
(i) X 0 = 0 i.e. no events take place at time 0. (4.2.1)
(ii) The number of events in disjoint intervals are independent random
variables. (4.2.2)
(iii) The probability of an event taking place in the interval (t , t + h] only
depends on the length of the interval i.e. the process is time-
homogeneous. (4.2.3)
(iv) The probability that one event occurs in the interval (t , t + h] is
λh + o(h) . (4.2.4)
(v) The probability that no events occur in the interval (t , t + h] is
1 − λh + o( h) . (4.2.5)
(vi) The probability that more than one event takes place in the interval

(t , t + h] is o(h) . (4.2.6)
Then { X t : t ∈ [0, ∞)} is a Poisson process with parameter λ .
31 (ST, E) Proof We must show the properties in definition 2.4.4 hold:
(i) We are given that X 0 = 0 .
(ii) Assumption (ii) implies that the number of events that take place in the interval
( s, t ] i.e. the increment X t − X s , is independent of the number of events that took place in
any interval [0, u ] if u ≤ s i.e. of { X u : 0 ≤ u ≤ s} . This implies that the process has
independent increments (and from theorem 2.3.1 it follows that the process has the
Markov property).
(iii) Assumption (iii) implies that the process is time homogeneous. Note that
Pij (t , t + h) = 0 if j < i since the number of events cannot decrease. This implies that
σ ij = 0 for j < i .
From assumptions (iv) to (vi) it follows that for j ≥ i
Pi ,i +1 (t , t + h) = λh + o(h) i.e. σ i ,i +1 = λ ,
Pi ,i + k (t , t + h) = o(h) i.e. σ i ,i + k = 0 for k ≥ 2,
Pi ,i (t , t + h) = 1 − λh + o(h) i.e. σ i ,i = −λ.
Hence the process is a time-homogeneous Markov jump process with transition rate
matrix given by
$$A = \begin{pmatrix}-\lambda & \lambda & 0 & 0 & \cdots\\ 0 & -\lambda & \lambda & 0 & \cdots\\ 0 & 0 & -\lambda & \lambda & \cdots\\ 0 & 0 & 0 & -\lambda & \ddots\\ \vdots & \vdots & \vdots & \ddots & \ddots\end{pmatrix}.$$
From the Kolmogorov forward equations (for the time homogeneous case - see 4.1.8) it follows that
$$\begin{pmatrix}\frac{d}{dt}P_{00}(t) & \frac{d}{dt}P_{01}(t) & \frac{d}{dt}P_{02}(t) & \cdots\\ \frac{d}{dt}P_{10}(t) & \frac{d}{dt}P_{11}(t) & \frac{d}{dt}P_{12}(t) & \cdots\\ \frac{d}{dt}P_{20}(t) & \frac{d}{dt}P_{21}(t) & \frac{d}{dt}P_{22}(t) & \cdots\\ \vdots & \vdots & \vdots & \ddots\end{pmatrix} = \begin{pmatrix}P_{00}(t) & P_{01}(t) & P_{02}(t) & \cdots\\ P_{10}(t) & P_{11}(t) & P_{12}(t) & \cdots\\ P_{20}(t) & P_{21}(t) & P_{22}(t) & \cdots\\ \vdots & \vdots & \vdots & \ddots\end{pmatrix}\begin{pmatrix}-\lambda & \lambda & 0 & 0 & \cdots\\ 0 & -\lambda & \lambda & 0 & \cdots\\ 0 & 0 & -\lambda & \lambda & \cdots\\ \vdots & \vdots & & \ddots & \ddots\end{pmatrix}$$
For any $i$,
$$\frac{d}{dt}P_{i0}(t) = P_{i0}'(t) = -\lambda P_{i0}(t) \qquad (4.2.7)$$
and
$$P_{ij}'(t) = \lambda P_{i,j-1}(t) - \lambda P_{ij}(t) \quad\text{for } j > 0 \qquad (4.2.8)$$
with the initial condition that $P_{ij}(0) = \delta_{ij}$.

A solution of (4.2.7) is $P_{i0}(t) = \delta_{i0}e^{-\lambda t}$: indeed $\frac{d}{dt}\delta_{i0}e^{-\lambda t} = \delta_{i0}e^{-\lambda t}(-\lambda) = -\lambda P_{i0}(t)$, and it satisfies the initial condition $P_{i0}(0) = \delta_{i0}$: $P_{i0}(0) = \delta_{i0}e^{0} = \delta_{i0}$. Hence it follows that $P_{i0}(t) = 0$ if $i > 0$ and $P_{00}(t) = e^{-\lambda t}$.
For $j > 0$ a solution of (4.2.8) is given by
$$P_{ij}(t) = \begin{cases} e^{-\lambda t}\dfrac{(\lambda t)^{j-i}}{(j-i)!} & \text{for } j \ge i\\ 0 & \text{for } j < i.\end{cases}$$
Note that this solution satisfies the condition $P_{ij}(0) = \delta_{ij}$:
For $j = i$: $P_{ii}(t) = e^{-\lambda t}\frac{(\lambda t)^{i-i}}{(i-i)!} = e^{-\lambda t}$ and so $P_{ii}(0) = e^{0} = 1 = \delta_{ii}$.
For $j \neq i$: $P_{ij}(0) = e^{0}\frac{(\lambda\cdot 0)^{j-i}}{(j-i)!} = 0 = \delta_{ij}$.
We will simply show that it is a solution and not try to derive the solution.
For $j = i$ we have that $P_{ii}(t) = e^{-\lambda t}$, i.e.
$$\frac{d}{dt}P_{ii}(t) = -\lambda e^{-\lambda t} = 0 - \lambda P_{ii}(t) = \lambda P_{i,i-1}(t) - \lambda P_{ii}(t)$$
since $i - 1 < i$.
For $j > i$ we have that
$$\frac{d}{dt}P_{ij}(t) = \lambda e^{-\lambda t}\frac{(\lambda t)^{j-i-1}}{(j-i-1)!} - \lambda e^{-\lambda t}\frac{(\lambda t)^{j-i}}{(j-i)!} = \lambda P_{i,j-1}(t) - \lambda P_{ij}(t).$$
For $j < i$ we have that
$$\frac{d}{dt}P_{ij}(t) = 0 = 0 - 0 = \lambda P_{i,j-1}(t) - \lambda P_{ij}(t).$$
It follows from definition 2.4.4 that $\{X_t : t\in[0,\infty)\}$ is a Poisson process. ■
It follows from definition 2.4.4 that { X t : t ∈ [0, ∞)} is a Poisson process. ■

For a Poisson process { X t : t ∈ [0, ∞)} we have by definition that X 0 = 0 and that all
increments have a Poisson distribution i.e. can take on the values 0,1,2,.... The value of
X t for any sample path must be a non-decreasing function of t and must be integer
valued. Whenever there is an increase it must be a positive integer and we therefore
expect all jumps in the sample path to be of size 1. This can be proved rigorously but we
will simply assume it for this course.

Let us define an "event" as a point in time where there is an increase in the value of X t .

Theorem 4.2.2 Let { X t : t ∈ [0, ∞)} with state space S = {0,1,2,3,...} be a Poisson
process with parameter λ . Then the distribution of the time until the first
event takes place is an exponential distribution with parameter 1 / λ .

32 (CT) Proof Let T be the time from time 0 until the first event takes place.
Then
$$P[T > t] = P[0 \text{ events in } (0,t]] = P[X_t - X_0 = 0] = e^{-\lambda t} \quad\text{since } X_t - X_0 \sim \text{Poisson}(\lambda(t-0))$$
i.e. $1 - F_T(t) = e^{-\lambda t}$, or $F_T(t) = 1 - e^{-\lambda t}$, so that $f_T(t) = \lambda e^{-\lambda t}$,
which is an exponential density function with parameter 1 / λ . ■

Theorem 4.2.3 Let { X t : t ∈ [0, ∞)} with state space S = {0,1,2,3,...} be a Poisson
process with parameter λ . Given any time s > 0 , the distribution of the time
from s until the next event takes place is an exponential distribution with
parameter 1 / λ .

33 (CT) Proof Let T be the time from s until the next event takes place.
Then
$$P[T > t \mid X_s = i] = P[X_{s+t} = i \mid X_s = i] = P_{ii}(s, s+t) = e^{-\lambda(s+t-s)} = e^{-\lambda t}$$
since $X_{s+t} - X_s \sim \text{Poisson}(\lambda t)$.
Since this probability does not depend on $i$, the conditional probability is equal to the unconditional probability, i.e. $P[T > t] = e^{-\lambda t}$. Hence the distribution of $T$ is exponential with parameter $1/\lambda$: $1 - F_T(t) = e^{-\lambda t}$ implies $F_T(t) = 1 - e^{-\lambda t}$, which implies $f_T(t) = \lambda e^{-\lambda t}$, an exponential density function with parameter $1/\lambda$. ■

Theorem 4.2.4 Let { X t : t ∈ [0, ∞)} with state space S = {0,1,2,3,...} be a Poisson
process with parameter λ . Let T1 be the time from 0 until the first event
takes place and let T2 be the time from the first event until the second event
takes place. Then T1 and T2 are independent exponential random variables
with parameter 1 / λ .

34 (ST)
Proof For any random variable X with distribution function F and density
function f we have that
$$f(x) = \frac{d}{dx}F(x) = \lim_{h\to 0}\frac{F(x+h) - F(x)}{h} = \lim_{h\to 0}\frac{P[x < X \le x+h]}{h}.$$
For convenience let X (t ) = X t and consider

$$\begin{aligned} &P[t_2 < T_2 \le t_2 + h_2 \mid T_1 = t_1]\\ &= P[X(t_1+t_2+h_2) \ge 2 \text{ and } X(t_1+t_2) = 1 \mid T_1 = t_1]\\ &= P[X(t_1+t_2+h_2) \ge 2 \text{ and } X(t_1+t_2) = 1 \mid X(t) = 0\ \forall\, t < t_1 \text{ and } X(t_1) = 1]\\ &= P[X(t_1+t_2+h_2) \ge 2 \text{ and } X(t_1+t_2) = 1 \mid X(t_1) = 1] \quad\text{(by the Markov property)}\\ &= P[X(t_1+t_2) = 1 \mid X(t_1) = 1]\,P[X(t_1+t_2+h_2)\ge 2 \mid X(t_1+t_2) = 1 \text{ and } X(t_1) = 1]\\ &\qquad\text{(since } P[A\cap B \mid C] = P[B\mid C]P[A\mid B\cap C])\\ &= P_{11}(t_1, t_1+t_2)\,P[X(t_1+t_2+h_2)\ge 2 \mid X(t_1+t_2) = 1] \quad\text{(by the Markov property)}\\ &= e^{-\lambda t_2}\sum_{i=2}^{\infty}P_{1i}(t_1+t_2, t_1+t_2+h_2)\\ &= e^{-\lambda t_2}\left(P_{12}(t_1+t_2, t_1+t_2+h_2) + \sum_{i=3}^{\infty}P_{1i}(t_1+t_2, t_1+t_2+h_2)\right)\\ &= e^{-\lambda t_2}\left(\lambda h_2 + o(h_2) + o(h_2)\right). \end{aligned}$$
Hence
$$\frac{P[t_2 < T_2 \le t_2 + h_2 \mid T_1 = t_1]}{h_2} = \lambda e^{-\lambda t_2} + \frac{o(h_2)}{h_2}$$
and if we let $h_2 \to 0$ we get that
$$f_{T_2\mid T_1}(t_2\mid t_1) = \lambda e^{-\lambda t_2}$$
i.e. the conditional density function of T2 given that T1 = t1 is an exponential density
function with parameter 1 / λ . Since this density function does not depend on t1 it means
that it is equal to the unconditional density function of T2 which in turn implies that T1
and T2 are independent random variables both with an exponential distribution with
parameter 1 / λ . ■

NOTE: Theorem 4.2.4 can be extended to show that if Ti is the time between the (i − 1) th
and i th events, then T1 , T2 ,... are independent random variables all with an exponential
distribution with parameter 1 / λ .
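This note also gives a direct way to simulate a Poisson process: generate i.i.d. exponential inter-event times and accumulate them. A minimal Python sketch (for illustration only; note that numpy's `exponential` is parametrised by the mean $1/\lambda$):

```python
import numpy as np

rng = np.random.default_rng(0)

def poisson_event_times(lam, t_max):
    """Event times of a Poisson process on (0, t_max], built from the i.i.d.
    exponential inter-event times T_1, T_2, ... of theorem 4.2.4."""
    times, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / lam)   # mean inter-event time is 1/lambda
        if t > t_max:
            return np.array(times)
        times.append(t)

# X_t is then the number of event times <= t; e.g. with lam = 3:
events = poisson_event_times(3.0, 10.0)
print(len(events))   # roughly Poisson(30) over repeated runs
```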

Let t be a point in time at which an event does not occur. Let Rt be the time from t until
the next event occurs and S t be the time since the last event occurred.

Theorem 4.2.5 Let { X t : t ∈ [0, ∞)} with state space S = {0,1,2,3,...} be a Poisson
process with parameter λ . The distribution of Rt is independent of t and is
given by
P[ Rt ≤ x] = 1 − e − λx for x ≥ 0 .
35 (CT) Proof This follows from theorem 4.2.3, since in the proof of theorem 4.2.3 it does not matter whether or not an event took place at time s. ■

Theorem 4.2.6 Let { X t : t ∈ [0, ∞)} with state space S = {0,1,2,3,...} be a Poisson
process with parameter λ . The distribution of S t is given by
P[ S t = t ] = e − λt and P[ S t ≤ x] = 1 − e − λx for 0 ≤ x < t.
36 (CT) Proof Suppose that no event took place in (0, t ] . Then we have that, by theorem
4.2.4,
P[ S t = t ] = P[T1 > t ] = e − λt .
Now suppose that at least one event took place in the interval (0, t ] . For 0 ≤ x < t we
have that
P[ S t ≤ x] = P[at least one event in (t − x, t )]
= 1 − P[no event in (t − x, t )]
= 1 − e −λx by theorem 4.2.4.

Theorem 4.2.7 Let { X t : t ∈ [0, ∞)} with state space S = {0,1,2,3,...} be a Poisson
process with parameter λ . Then the distribution of the time until the n th
event takes place is a gamma distribution with parameters 1 / λ and n .
37 (CT) Proof
The distribution of the time until the n th event takes place is the distribution of the sum
of n independent random variables, i.e. the times between successive events, each
exponentially distributed with parameter 1 / λ which is a gamma distribution with
parameters 1 / λ and n . ■

Theorem 4.2.8: Sums of Poisson Processes Suppose that { X t : t ∈ [0, ∞)}


and {Yt : t ∈ [0, ∞)} are two independent Poisson processes with parameters λ1
and λ2 respectively. Let Z t = X t + Yt . Then {Z t : t ∈ [0, ∞)} is a Poisson process
with parameter λ1 + λ2 .
38 (ST) Proof
We have that Z 0 = X 0 + Y0 = 0 + 0 = 0 .
We also have that X t − X s is independent of { X u : 0 ≤ u ≤ s} and of {Yu : 0 ≤ u ≤ s} since
the processes are independent. Similarly Yt − Ys is independent of {Yu : 0 ≤ u ≤ s} and
of { X u : 0 ≤ u ≤ s} .
Hence Z t − Z s = X t − X s + Yt − Ys is independent { X u : 0 ≤ u ≤ s} and {Yu : 0 ≤ u ≤ s} and
therefore independent of {Z u = X u + Yu : 0 ≤ u ≤ s} i.e. the Z t 's have independent
increments.
Furthermore Pi ,i +u ( s, t ) = P[ Z t − Z s = u ] = P[ X t − X s + Yt − Ys = u ]. But X t − X s and
Yt − Ys are independent Poisson random variables with parameters λ1 (t − s ) and
λ 2 (t − s ) respectively and therefore Z t − Z s is a Poisson random variable with parameter

λ1 (t − s ) + λ 2 (t − s ) = (λ1 + λ 2 )(t − s ) (sums of independent Poisson random variables –


see second year work).

Theorem 4.2.9: Thinning of Poisson Processes Suppose that { X t : t ∈ [0, ∞)}


is a Poisson process with parameter λ . Suppose that events that take place
are of two types, say type A and type B. For each event the probability that it
will be of type A is p A and whether or not an event is of type A is
independent of the process and of what happened at other events. Let Yt be the number of events of type A observed in (0, t] . Then {Yt : t ∈ [0, ∞)} is a
Poisson process with parameter λp A .
39 (ST) Proof
Firstly $Y_0 = 0$ since by definition no events (of any kind) have occurred by time 0.
The number of events of type A in the interval $(s,t]$ is independent of $\{X_u : 0\le u\le s\}$ and of whether or not any of the events in $(0,s]$ were of type A. Hence $Y_t - Y_s$ is independent of $\{Y_u : 0\le u\le s\}$, i.e. the $Y_t$'s have independent increments. Now
$$\begin{aligned} P[Y_t - Y_s = n] &= \sum_{i=n}^{\infty}P[X_t - X_s = i \text{ and } n \text{ of the } i \text{ events are type A}]\\ &= \sum_{i=n}^{\infty}P[X_t - X_s = i]\,P[n \text{ of the } i \text{ events are type A} \mid X_t - X_s = i]\\ &= \sum_{i=n}^{\infty}e^{-\lambda(t-s)}\frac{[\lambda(t-s)]^i}{i!}\binom{i}{n}(p_A)^n(1-p_A)^{i-n}\\ &\qquad\text{since } X_t - X_s \sim \text{Poi}(\lambda(t-s)) \text{ and the number of type A events among } i \text{ events} \sim \text{bin}(i, p_A)\\ &= e^{-\lambda(t-s)}\frac{[\lambda p_A(t-s)]^n}{n!}\sum_{i=n}^{\infty}\frac{[\lambda(1-p_A)(t-s)]^{i-n}}{(i-n)!}\\ &= e^{-\lambda(t-s)}\frac{[\lambda p_A(t-s)]^n}{n!}e^{\lambda(1-p_A)(t-s)}\\ &= e^{-\lambda p_A(t-s)}\frac{[\lambda p_A(t-s)]^n}{n!}.\end{aligned}$$
Thus $Y_t - Y_s \sim \text{Poisson}(\lambda p_A(t-s))$. ■
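The thinning property can be checked quickly by simulation. A minimal Python sketch (the values of lam, p_A and t are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
lam, p_A, t = 5.0, 0.3, 100.0

n_events = rng.poisson(lam * t)                 # X_t ~ Poisson(lam * t)
n_type_A = (rng.random(n_events) < p_A).sum()   # each event is type A with prob p_A
print(n_type_A, "type-A events; theoretical mean:", lam * p_A * t)
```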

4.3 Pure Birth Process

In the Poisson process the parameter λ remains constant i.e.


Pii (t , t + h) = Pii (h) = 1 − λh + o(h) and Pi ,i +1 (t , t + h) = Pi ,i +1 (h) = λh + o(h)
do not depend on i , the state of the system at time t . In many applications this may not
be a realistic model. For instance if X t is the size of the population and "events" increase

in the size of the population when a birth takes place, it can be expected that as the
population size increases the chance of a birth in an interval of specified length will
increase if competition for limited resources is not a factor.
Let us consider a Markov process { X t : t ≥ 0} and state space S = {0,1,2,3,4,...} where the
following holds:
1. Events occurring in non-overlapping intervals are independent. (4.3.1)
2. (a) $P_{ii}(h) = 1 - \lambda_i h + o(h)$ (4.3.2)
(b) $P_{i,i+1}(h) = \lambda_i h + o(h)$ (4.3.3)
(c) $\sum_{j=i+2}^{\infty} P_{ij}(h) = o(h)$ (4.3.4)
(d) $P_{ij}(h) = 0$ for all $j < i$. (4.3.5)
3. $X_0 = 0$. (4.3.6)

Assumption 3 implies that the state of the system at t i.e. X t , will be the number of
"births" that took place in (0, t ] and is not the size of the population. Assumption 2(d)
implies that the state of the system can only increase i.e. it is still a counting process.
Since the state of the system is the number of births that took place, it makes sense to
have P01 (h) = λ0 h + o(h) > 0 i.e. "births" can take place with the state of the system zero.
Of course births can also be considered to take place if immigration into the system from
outside the system is allowed and included in the concept of a "birth".

Theorem 4.3.1 Let $\{X_t : t\ge 0\}$ with state space $S = \{0,1,2,3,4,\ldots\}$ be a pure birth process as defined in (4.3.1)-(4.3.6). Then
$$\frac{d}{dt}P_{00}(t) = -\lambda_0 P_{00}(t) \qquad (4.3.7)$$
and
$$\frac{d}{dt}P_{0n}(t) = -\lambda_n P_{0n}(t) + \lambda_{n-1}P_{0,n-1}(t) \quad\text{for } n > 0 \qquad (4.3.8)$$
40 (ST) Proof The assumptions imply that the process is a time-homogeneous Markov jump process with transition rates given by
$$A = \begin{pmatrix}-\lambda_0 & \lambda_0 & 0 & 0 & \cdots\\ 0 & -\lambda_1 & \lambda_1 & 0 & \cdots\\ 0 & 0 & -\lambda_2 & \lambda_2 & \cdots\\ 0 & 0 & 0 & -\lambda_3 & \ddots\\ \vdots & \vdots & \vdots & \ddots & \ddots\end{pmatrix}.$$
Note that $(P_{00}(t), P_{01}(t), P_{02}(t), \ldots)$ is the first row of the matrix $P(t) = [P_{ij}(t)]$. Similarly $\left(\frac{d}{dt}P_{00}(t), \frac{d}{dt}P_{01}(t), \frac{d}{dt}P_{02}(t), \ldots\right)$ is the first row of $\frac{d}{dt}P(t)$.

According to the Kolmogorov forward equations, namely $\frac{d}{dt}P(t) = P(t)A$, the first row of $\frac{d}{dt}P(t)$ must be equal to the first row of $P(t)A$, i.e. the first row of $P(t)$ multiplied by $A$, i.e. in this case
$$\left(\frac{d}{dt}P_{00}(t), \frac{d}{dt}P_{01}(t), \frac{d}{dt}P_{02}(t), \ldots\right) = (P_{00}(t), P_{01}(t), P_{02}(t), \ldots)\begin{pmatrix}-\lambda_0 & \lambda_0 & 0 & 0 & \cdots\\ 0 & -\lambda_1 & \lambda_1 & 0 & \cdots\\ 0 & 0 & -\lambda_2 & \lambda_2 & \cdots\\ \vdots & \vdots & & \ddots & \ddots\end{pmatrix}$$
i.e. $\frac{d}{dt}P_{00}(t) = -\lambda_0 P_{00}(t)$ and $\frac{d}{dt}P_{0n}(t) = -\lambda_n P_{0n}(t) + \lambda_{n-1}P_{0,n-1}(t)$ for $n > 0$. ■
To solve equations (4.3.7) and (4.3.8) we note that equation (4.3.7) is a differential equation of which a possible solution is
$$P_{00}(t) = ce^{-\lambda_0 t} \quad\text{where } c \text{ is a constant.} \qquad (4.3.9)$$
But since $P_{00}(0) = P[X_0 = 0 \mid X_0 = 0] = 1$, it follows from (4.3.9) that
$$1 = ce^{-\lambda_0(0)} = c \quad\text{i.e.}\quad P_{00}(t) = e^{-\lambda_0 t}. \qquad (4.3.10)$$
For a specific set of $\lambda_k$'s a recursive solution for $n \ge 1$ is given by
$$P_{0n}(t) = \lambda_{n-1}e^{-\lambda_n t}\int_0^t e^{\lambda_n x}P_{0,n-1}(x)\,dx \qquad (4.3.11)$$
since
$$\begin{aligned}\frac{d}{dt}P_{0n}(t) &= \lambda_{n-1}(-\lambda_n)e^{-\lambda_n t}\int_0^t e^{\lambda_n x}P_{0,n-1}(x)\,dx + \lambda_{n-1}e^{-\lambda_n t}e^{\lambda_n t}P_{0,n-1}(t) \quad (F.T.o.C.)\\ &= -\lambda_n\left(\lambda_{n-1}e^{-\lambda_n t}\int_0^t e^{\lambda_n x}P_{0,n-1}(x)\,dx\right) + \lambda_{n-1}P_{0,n-1}(t)\\ &= -\lambda_n P_{0n}(t) + \lambda_{n-1}P_{0,n-1}(t) \quad\text{from (4.3.11)}\end{aligned}$$
i.e. the formula in (4.3.11) will be a solution (NB: check that the boundary condition is also satisfied).

The Yule Process


The Yule process is a special case where $\lambda_n = n\lambda$ and $X_0 = 1$, i.e. the population at time 0 is 1 and the probability of a birth is proportional to the size of the population. In this case the equations reduce to
$$\frac{d}{dt}P_{11}(t) = -\lambda P_{11}(t) \quad\text{and}\quad \frac{d}{dt}P_{1n}(t) = -n\lambda P_{1n}(t) + (n-1)\lambda P_{1,n-1}(t).$$

Note there is no state 0 in this case. The solution of these equations is
$$P_{1n}(t) = e^{-\lambda t}\left(1 - e^{-\lambda t}\right)^{n-1},$$
since for $n > 1$,
$$\begin{aligned}\frac{d}{dt}P_{1n}(t) &= -\lambda e^{-\lambda t}(1-e^{-\lambda t})^{n-1} + e^{-\lambda t}(n-1)(1-e^{-\lambda t})^{n-2}\lambda e^{-\lambda t}\\ &= -\lambda P_{1n}(t) + (n-1)\lambda e^{-\lambda t}\left[1 - (1-e^{-\lambda t})\right](1-e^{-\lambda t})^{n-2}\\ &= -\lambda P_{1n}(t) + (n-1)\lambda e^{-\lambda t}(1-e^{-\lambda t})^{n-2} - (n-1)\lambda e^{-\lambda t}(1-e^{-\lambda t})^{n-1}\\ &= -\lambda P_{1n}(t) + (n-1)\lambda P_{1,n-1}(t) - (n-1)\lambda P_{1n}(t)\\ &= -n\lambda P_{1n}(t) + (n-1)\lambda P_{1,n-1}(t)\end{aligned}$$
i.e. the equations are satisfied (show the other case $n = 1$ as well) (NB: check that the boundary conditions are also satisfied for both).
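A quick simulation check of the Yule solution (for illustration only; `yule_population` is a hypothetical helper): in state $n$ the wait until the next birth is exponential with rate $n\lambda$, and the resulting $X_t$ should follow the geometric law $P_{1n}(t) = e^{-\lambda t}(1-e^{-\lambda t})^{n-1}$, whose mean is $e^{\lambda t}$:

```python
import numpy as np

rng = np.random.default_rng(2)

def yule_population(lam, t_max):
    """One sample of X_{t_max} for a Yule process with X_0 = 1: in state n
    the wait until the next birth is exponential with rate n * lam."""
    n, t = 1, 0.0
    while True:
        t += rng.exponential(1.0 / (n * lam))
        if t > t_max:
            return n
        n += 1

lam, t = 1.0, 2.0
samples = [yule_population(lam, t) for _ in range(10000)]
# Theory: geometric with success probability p = e^{-lam t}, so mean e^{lam t}.
print(np.mean(samples), np.exp(lam * t))
```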

4.4 Pure Death Process


Consider a physical phenomenon in which there is no increase in the population size once
the process started. Given an initial population size i > 0 individuals "die" at a certain
rate eventually reducing the size to zero.
Let us consider a Markov process { X t : t ≥ 0} and state space S = {0,1,2,3,..., i} where the
following holds:
1. Events occurring in non-overlapping intervals are independent. (4.4.1)
2. (a) $P_{jj}(h) = 1 - \mu_j h + o(h)$ (4.4.2)
(b) $P_{j,j-1}(h) = \mu_j h + o(h)$ (4.4.3)
(c) $\sum_{r=0}^{j-2} P_{jr}(h) = o(h)$ (4.4.4)
(d) $P_{jr}(h) = 0$ for all $r > j$. (4.4.5)
3. $X_0 = i$. (4.4.6)

Such a process is once again a time-homogeneous Markov jump process with transition rate matrix
$$A = \begin{pmatrix}0 & 0 & 0 & \cdots & 0 & 0 & 0\\ \mu_1 & -\mu_1 & 0 & \cdots & 0 & 0 & 0\\ 0 & \mu_2 & -\mu_2 & \cdots & 0 & 0 & 0\\ \vdots & & & \ddots & & & \vdots\\ 0 & 0 & 0 & \cdots & \mu_{i-1} & -\mu_{i-1} & 0\\ 0 & 0 & 0 & \cdots & 0 & \mu_i & -\mu_i\end{pmatrix}.$$
Let $P_{in}(t) = P[X_t = n \mid X_0 = i]$. It then follows from the Kolmogorov forward equations that

$$\begin{aligned}\frac{d}{dt}P_{i0}(t) &= \mu_1 P_{i1}(t)\\ \frac{d}{dt}P_{in}(t) &= -\mu_n P_{in}(t) + \mu_{n+1}P_{i,n+1}(t) \quad\text{for } 0 < n < i\\ \frac{d}{dt}P_{ii}(t) &= -\mu_i P_{ii}(t).\end{aligned}$$
For the case $\mu_n = n\mu$ we get that
$$P_{in}(t) = \binom{i}{n}e^{-n\mu t}(1 - e^{-\mu t})^{i-n} \qquad (4.4.7)$$
(which is a binomial distribution with $p = e^{-\mu t}$) since for $0 < n < i$
$$\begin{aligned}\frac{d}{dt}P_{in}(t) &= \binom{i}{n}e^{-n\mu t}(-n\mu)(1-e^{-\mu t})^{i-n} + \binom{i}{n}e^{-n\mu t}(i-n)(1-e^{-\mu t})^{i-n-1}e^{-\mu t}\mu\\ &= -n\mu P_{in}(t) + (n+1)\mu\binom{i}{n+1}e^{-(n+1)\mu t}(1-e^{-\mu t})^{i-(n+1)}\\ &= -n\mu P_{in}(t) + (n+1)\mu P_{i,n+1}(t)\end{aligned}$$
i.e. (4.4.7) satisfies the equations (show the other cases too: $n = i$ and $n = 0$) (NB: check that the boundary conditions are also satisfied for each case).

4.5 Two – State Markov Processes

For the Poisson process we have that
$$\lim_{t\to\infty}P_{0n}(t) = \lim_{t\to\infty}e^{-\lambda t}\frac{(\lambda t)^n}{n!} = 0 \quad\text{for all } n \ge 0$$
which essentially means that as time goes on the number of events tends to infinity with probability one.
Similarly for the pure birth process with $\lambda_n = n\lambda$,
$$\lim_{t\to\infty}P_{1n}(t) = \lim_{t\to\infty}e^{-\lambda t}(1 - e^{-\lambda t})^{n-1} = 0 \quad\text{for all } n \ge 0$$
i.e. as time goes on the number of births goes to infinity with probability one.
For the pure death process with $\mu_n = n\mu$,
$$\lim_{t\to\infty}P_{in}(t) = \lim_{t\to\infty}\binom{i}{n}e^{-n\mu t}(1 - e^{-\mu t})^{i-n} = 0 \quad\text{for all } i \ge n \ge 1$$
and
$$\lim_{t\to\infty}P_{i0}(t) = \lim_{t\to\infty}(1 - e^{-\mu t})^i = 1$$
i.e. as time goes on the size of the population goes to zero with probability one.

Now consider a Markov process { X t : t ≥ 0} with state space S = {0,1} where


1. Events in disjoint intervals are independent.
2. P00 (h) = 1 − λh + o(h)

P01 (h) = λh + o(h)


P10 (h) = µh + o(h)
P11 (h) = 1 − µh + o(h) .
This is also a time-homogeneous Markov jump process with transition rate matrix
$$A = \begin{pmatrix}-\lambda & \lambda\\ \mu & -\mu\end{pmatrix}.$$
It follows from the Kolmogorov forward equations that
$$\begin{aligned}\frac{d}{dt}P_{00}(t) &= -\lambda P_{00}(t) + \mu P_{01}(t) \qquad (4.5.1)\\ \frac{d}{dt}P_{01}(t) &= \lambda P_{00}(t) - \mu P_{01}(t) \qquad (4.5.2)\\ \frac{d}{dt}P_{10}(t) &= -\lambda P_{10}(t) + \mu P_{11}(t) \qquad (4.5.3)\\ \frac{d}{dt}P_{11}(t) &= \lambda P_{10}(t) - \mu P_{11}(t). \qquad (4.5.4)\end{aligned}$$
The solution to these equations is given by (prove they are solutions)
$$\begin{aligned}P_{00}(t) &= \frac{\mu}{\lambda+\mu} + \frac{\lambda}{\lambda+\mu}e^{-(\lambda+\mu)t}, & P_{10}(t) &= \frac{\mu}{\lambda+\mu} - \frac{\mu}{\lambda+\mu}e^{-(\lambda+\mu)t},\\ P_{01}(t) &= \frac{\lambda}{\lambda+\mu} - \frac{\lambda}{\lambda+\mu}e^{-(\lambda+\mu)t}, & P_{11}(t) &= \frac{\lambda}{\lambda+\mu} + \frac{\mu}{\lambda+\mu}e^{-(\lambda+\mu)t}.\end{aligned}$$
In this case
$$\lim_{t\to\infty}P_{00}(t) = \lim_{t\to\infty}P_{10}(t) = \frac{\mu}{\lambda+\mu} \quad\text{and}\quad \lim_{t\to\infty}P_{01}(t) = \lim_{t\to\infty}P_{11}(t) = \frac{\lambda}{\lambda+\mu}$$
i.e. there exists a limit distribution for finding the process in a certain state and this limit distribution does not depend on the initial state of the system.
Although it is possible to prove the above statement under certain general conditions, we
will simply assume in future that such limiting distributions exist. We will also show in
this example that it is possible to find the limit distribution without first solving the
Kolmogorov equations.

In the first place note that
$$\lim_{t\to\infty}\frac{d}{dt}P_{00}(t) = \lim_{t\to\infty}(-\lambda)e^{-(\lambda+\mu)t} = 0$$
and similarly
$$\lim_{t\to\infty}\frac{d}{dt}P_{01}(t) = \lim_{t\to\infty}\frac{d}{dt}P_{10}(t) = \lim_{t\to\infty}\frac{d}{dt}P_{11}(t) = 0.$$

Now let $p_0 = \lim_{t\to\infty}P_{00}(t) = \lim_{t\to\infty}P_{10}(t)$ and $p_1 = \lim_{t\to\infty}P_{01}(t) = \lim_{t\to\infty}P_{11}(t)$.
Letting $t\to\infty$ on both sides of equations 4.5.1 to 4.5.4, we get that
$$\begin{aligned}0 &= -\lambda p_0 + \mu p_1\\ 0 &= \lambda p_0 - \mu p_1\\ 0 &= -\lambda p_0 + \mu p_1\\ 0 &= \lambda p_0 - \mu p_1.\end{aligned}$$
Note that the first two equations are the same as the third and fourth equations and that the second equation is the negative of the first equation. We therefore have only one independent equation, and from this equation we get that $p_1 = (\lambda/\mu)p_0$. To determine the solution of the equations we also make use of the fact that $p_0 + p_1 = 1$. Together with the one equation we have we can then determine the solution $p_0 = \frac{\mu}{\lambda+\mu}$ and $p_1 = \frac{\lambda}{\lambda+\mu}$, which is the same as the limits we obtained above.
It is also possible to show that these limiting probabilities are the limits of the proportions of time the process spends in each state.
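The limiting distribution can also be obtained numerically by solving the balance equations directly: one independent row of $pA = 0$ together with $p_0 + p_1 = 1$. A minimal Python sketch with assumed rates:

```python
import numpy as np

# Generator of the two-state process with illustrative (assumed) rates.
lam, mu = 2.0, 1.0
A = np.array([[-lam, lam],
              [ mu, -mu]])

# Solve p A = 0 subject to sum(p) = 1 by replacing one redundant
# balance equation with the normalisation constraint.
M = np.vstack([A.T[:-1], np.ones(2)])
b = np.array([0.0, 1.0])
p = np.linalg.solve(M, b)
print(p, [mu / (lam + mu), lam / (lam + mu)])   # matches the closed form
```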

4.6 Birth and Death Processes


Let a "birth" be an event signifying an increase in the population and a "death" an event
signifying a decrease in the population. Now suppose these two types of events satisfy
the following assumptions:
1. Events in disjoint intervals are independent.
2. The event of a birth in a certain interval is independent of the event of a death in
the same interval.
3. If the size of the population at time t is n , then
(a) the probability of a birth in the interval (t , t + h) is λ n h + o(h) ,
(b) the probability of no births in the interval (t , t + h) is 1 − λ n h + o(h) ,
(c) the probability of more than one birth in the interval (t , t + h) is o(h) .
4. If the size of the population at time t is n ≥ 1 , then
(a) the probability of a death in the interval (t , t + h) is µ n h + o(h) ,
(b) the probability of no deaths in the interval (t , t + h) is 1 − µ n h + o(h) ,
(c) the probability of more than one death in the interval (t , t + h) is o(h) .
5. If the size of the population at time t is 0 the probability of a death in the interval
(t , t + h) is o(h) .
6. The probability of more than one birth and/or death in the interval (t , t + h) is
o(h) .

Let X t be the size of the population at time t . Then



$$P_{00}(t,t+h) = 1 - \lambda_0 h + o(h) \quad\text{and}\quad P_{01}(t,t+h) = \lambda_0 h + o(h)$$
and for $n \ge 1$
$$\begin{aligned}P_{n,n-1}(t,t+h) &= [\mu_n h + o(h)][1 - \lambda_n h + o(h)] + o(h) = \mu_n h + o(h),\\ P_{n,n}(t,t+h) &= [1 - \mu_n h + o(h)][1 - \lambda_n h + o(h)] + o(h) = 1 - \mu_n h - \lambda_n h + o(h),\\ P_{n,n+1}(t,t+h) &= [\lambda_n h + o(h)][1 - \mu_n h + o(h)] + o(h) = \lambda_n h + o(h)\end{aligned}$$
and
$$P_{nj}(t,t+h) = o(h) \quad\text{for } j \neq n-1, n \text{ or } n+1.$$
It then follows that the process is a time-homogeneous Markov jump process with transition rate matrix
$$A = \begin{pmatrix}-\lambda_0 & \lambda_0 & 0 & 0 & \cdots\\ \mu_1 & -\lambda_1-\mu_1 & \lambda_1 & 0 & \cdots\\ 0 & \mu_2 & -\lambda_2-\mu_2 & \lambda_2 & \cdots\\ 0 & 0 & \mu_3 & -\lambda_3-\mu_3 & \ddots\\ \vdots & \vdots & & \ddots & \ddots\end{pmatrix}.$$

From the Kolmogorov forward equations we get that
$$\frac{d}{dt}P_{00}(t) = -\lambda_0 P_{00}(t) + \mu_1 P_{01}(t) \qquad (4.6.1)$$
and
$$\frac{d}{dt}P_{0n}(t) = \lambda_{n-1}P_{0,n-1}(t) - (\mu_n + \lambda_n)P_{0n}(t) + \mu_{n+1}P_{0,n+1}(t) \quad\text{for } n > 0. \qquad (4.6.2)$$
If at time 0 the size of the population is 1, $\lambda_n = n\lambda$ and $\mu_n = n\mu$, the solution is given by
$$P_{10}(t) = \xi_t = \mu\,\frac{1 - e^{-(\lambda-\mu)t}}{\lambda - \mu e^{-(\lambda-\mu)t}} \qquad (4.6.3)$$
and
$$P_{1n}(t) = (1 - \xi_t)(1 - \eta_t)\eta_t^{\,n-1} \quad\text{where } \eta_t = (\lambda/\mu)\xi_t. \qquad (4.6.4)$$
The limiting distribution can be found by letting $t\to\infty$ in (4.6.3) and (4.6.4). However, more general results, i.e. not only for $\lambda_n = n\lambda$, $\mu_n = n\mu$ and $i = 1$, can be determined as follows. Let $t\to\infty$ in (4.6.1) and (4.6.2) and use that $\lim_{t\to\infty}\frac{\partial}{\partial t}P_{ij}(s,t) = 0$:
$$\lim_{t\to\infty}\frac{\partial}{\partial t}P_{ij}(s,t) = \lim_{t\to\infty}\lim_{h\to 0}\frac{P_{ij}(s,t+h) - P_{ij}(s,t)}{h} = \lim_{h\to 0}\lim_{t\to\infty}\frac{P_{ij}(s,t+h) - P_{ij}(s,t)}{h} = \lim_{h\to 0}0 = 0$$
provided the interchange of limits is in order.


Let $\lim_{t\to\infty}P_{in}(s,t) = p_n$. We then obtain that
$$0 = -\lambda_0 p_0 + \mu_1 p_1 \quad\text{for } n = 0 \qquad (4.6.5)$$
and

$$0 = -(\mu_n + \lambda_n)p_n + \lambda_{n-1}p_{n-1} + \mu_{n+1}p_{n+1} \quad\text{for } n > 0. \qquad (4.6.6)$$
From (4.6.5) we get that $p_1 = (\lambda_0/\mu_1)p_0$. From (4.6.6) with $n = 1$ we then get
$$0 = -(\mu_1+\lambda_1)p_1 + \lambda_0 p_0 + \mu_2 p_2 = -(\mu_1+\lambda_1)(\lambda_0/\mu_1)p_0 + \lambda_0 p_0 + \mu_2 p_2 = -\lambda_0 p_0 - (\lambda_0\lambda_1/\mu_1)p_0 + \lambda_0 p_0 + \mu_2 p_2$$
or $\mu_2 p_2 = \frac{\lambda_0\lambda_1}{\mu_1}p_0$, i.e. $p_2 = \frac{\lambda_0\lambda_1}{\mu_1\mu_2}p_0$.
Using inductive arguments it can be shown that
$$p_n = \frac{\lambda_0\lambda_1\cdots\lambda_{n-1}}{\mu_1\mu_2\cdots\mu_n}p_0. \qquad (4.6.7)$$


Now using the requirement that $\sum_{n=0}^{\infty}p_n = 1$ we get that
$$p_0\left(1 + \sum_{n=1}^{\infty}\frac{\lambda_0\lambda_1\cdots\lambda_{n-1}}{\mu_1\mu_2\cdots\mu_n}\right) = 1$$
i.e.
$$p_n = \left(1 + \sum_{n=1}^{\infty}\frac{\lambda_0\lambda_1\cdots\lambda_{n-1}}{\mu_1\mu_2\cdots\mu_n}\right)^{-1}\frac{\lambda_0\lambda_1\cdots\lambda_{n-1}}{\mu_1\mu_2\cdots\mu_n} \qquad (4.6.8)$$
provided $\sum_{n=1}^{\infty}\frac{\lambda_0\lambda_1\cdots\lambda_{n-1}}{\mu_1\mu_2\cdots\mu_n}$ is finite.

For the simple birth and death process where $\lambda_n = \lambda$ and $\mu_n = \mu$ we get that
$$\frac{\lambda_0\lambda_1\cdots\lambda_{n-1}}{\mu_1\mu_2\cdots\mu_n} = \frac{\lambda\lambda\cdots\lambda}{\mu\mu\cdots\mu} = \left(\frac{\lambda}{\mu}\right)^n$$
and
$$1 + \sum_{n=1}^{\infty}\frac{\lambda_0\lambda_1\cdots\lambda_{n-1}}{\mu_1\mu_2\cdots\mu_n} = \sum_{n=0}^{\infty}\left(\frac{\lambda}{\mu}\right)^n = \frac{1}{1-\lambda/\mu} \quad\text{provided } \frac{\lambda}{\mu} < 1$$
i.e. $p_0 = 1 - \lambda/\mu$ and $p_n = (1 - \lambda/\mu)\left(\frac{\lambda}{\mu}\right)^n$ provided $\frac{\lambda}{\mu} < 1$.

4.7 Estimation of Transition Rates for a Markov Jump Process


The Poisson Process

SETTING: For the Poisson process there is a single transition rate namely λ . Let T1 be
the time from time 0 until the first jump occurs. Let Ti be the time from the (i − 1) st jump

until the i th jump. If we know the sample path for a Poisson process all the Ti 's can be
determined and conversely if all the Ti 's are known the complete sample path is known
i.e. all the information about the process is contained in the Ti 's.
LIKELIHOOD ESTIMATION: From the note following theorem 4.2.4 we have that the $T_i$'s are independent exponential random variables all with parameter $1/\lambda$. Now suppose that a Poisson process is observed until the $n$th jump occurs. The likelihood function is then given by
$$L(\lambda) = \prod_{i=1}^{n}\lambda e^{-\lambda t_i} = \lambda^n e^{-\lambda\sum_{i=1}^n t_i} = \lambda^n e^{-\lambda t}$$
where $t$ is the total time until the $n$th event is observed. Then
$$\ln L(\lambda) = n\ln\lambda - \lambda t \quad\Rightarrow\quad \frac{d}{d\lambda}\ln L(\lambda) = n/\lambda - t$$
which is 0 if $\lambda$ is equal to $n/t$, i.e. the maximum likelihood estimator of $\lambda$ is $n/T$ where $T$ is the random variable for the total time until the $n$th event occurs.

In theory we can derive the distribution of the maximum likelihood estimator from the fact that $T$ has a gamma distribution with parameters $1/\lambda$ and $n$. Unfortunately this is not one of our standard distributions for inference. We can use the gamma distribution of $T$ to determine the expected value and variance of the estimator, namely $\frac{n}{n-1}\lambda$ and $\frac{n^2}{(n-1)^2(n-2)}\lambda^2$ respectively. For large values of $n$ the maximum likelihood estimator will have an approximate normal distribution with expected value $\lambda$ and variance equal to the Cramer-Rao lower bound.

CONFIDENCE INTERVAL: It is, however, quite straightforward to get confidence intervals and test hypotheses about $\lambda$. Since $T \sim \text{GAM}(1/\lambda, n)$ we have that $2T/(1/\lambda) = 2\lambda T \sim \chi^2(2n)$, i.e.
$$1 - 2\alpha = P[\chi^2_{\alpha}(2n) \le 2\lambda T \le \chi^2_{1-\alpha}(2n) \mid \lambda] = P\left[\frac{\chi^2_{\alpha}(2n)}{2T} \le \lambda \le \frac{\chi^2_{1-\alpha}(2n)}{2T} \,\middle|\, \lambda\right]$$
for all values of $\lambda$. If $t$ is the observed value of $T$, then
$$\left(\frac{\chi^2_{\alpha}(2n)}{2t},\; \frac{\chi^2_{1-\alpha}(2n)}{2t}\right)$$
is a $100(1-2\alpha)\%$ confidence interval for $\lambda$.
To test the hypothesis $H_0: \lambda = \lambda_0$ at a $2\alpha$ significance level, reject $H_0$ if $\lambda_0$ is not in the $100(1-2\alpha)\%$ confidence interval for $\lambda$. (See the practical for more details on how to apply this.)
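A sketch of the interval computation (for illustration only; `poisson_rate_ci` is a hypothetical helper and the data are assumed): since $2\lambda T \sim \chi^2(2n)$, the endpoints are $\chi^2$ quantiles divided by $2t$:

```python
from scipy.stats import chi2

def poisson_rate_ci(n, t, level=0.90):
    """Equal-tailed CI for lambda after observing the n-th event at total time t,
    using 2 * lambda * T ~ chi2(2n)."""
    alpha = (1 - level) / 2
    lower = chi2.ppf(alpha, 2 * n) / (2 * t)
    upper = chi2.ppf(1 - alpha, 2 * n) / (2 * t)
    return lower, upper

# Illustrative (assumed) data: the 20th event observed after 8 time units.
print(20 / 8, poisson_rate_ci(20, 8.0))   # MLE and 90% confidence interval
```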

The above assumes that the process is in fact a Poisson process. To test whether or not it
is a Poisson process, we should test whether the number of events has a Poisson
distribution and whether or not the times between successive events are independent and
exponentially distributed.

To test for the Poisson distribution, given enough observations, divide the interval [0, t ]
into k intervals of equal length. Then the number of events observed in each interval will
be a Poisson random variable with parameter λt / k . Use the MLE of λ to estimate the
expected number of events in each interval and then apply a χ 2 goodness of fit test.

To test for the independence of times between events, calculate the serial correlation for
these times and then apply a test to determine whether it is significantly different from 0. (Tests
for serial correlation will be discussed in the course on Time Series Analysis in the
second semester.)

4.8 The Structure of Markov Processes

The Time-Homogeneous Case

We define Wt as the length of time that the process remains in the state being occupied at
time t i.e. for every ω ∈ Ω and t ≥ 0 ,
Wt (ω ) = inf{s > 0 : X t + s (ω ) ≠ X t (ω )} .

Theorem 4.8.1 $P[W_0 > u \mid X_0 = i] = e^{\sigma_{ii}u}$.

41 (ST, E) Proof
We note that $P[W_0 > u \mid X_0 = i] = P^C_{ii}(0, 0+u)$, since the waiting time from time 0 will be greater than $u$ if and only if the process remains in state $i$ from time 0 to $0+u$. From theorem 4.1.4 we therefore have that
$$P[W_0 > u \mid X_0 = i] = P^C_{ii}(0, u) = e^{\int_0^u \sigma_{ii}(v)\,dv} = e^{\sigma_{ii}u}$$
since in the time-homogeneous case $\sigma_{ii}(v) = \sigma_{ii}$ is constant. ■

Theorem 4.8.2 Suppose that $\{X_t : t\ge 0\}$ is a time-homogeneous Markov jump process. Let $W_t$ be the time from $t$ until the first jump occurs. Let $X_{W_t}$ be the state to which the process jumps when the first jump occurs. Then $W_t$ and $X_{W_t}$ are independent random variables and
$$P[X_{W_t} = j \mid X_t = i] = \frac{\sigma_{ij}}{-\sigma_{ii}} \quad\text{for } i \neq j.$$

42 (ST, E) Proof Consider
$$\begin{aligned}&P[X_{t+u+h} = j, u < W_t \le u+h \mid X_t = i]\\ &= P[X_{t+u+h} = j, W_t > u \mid X_t = i] \quad\text{(since } X_{t+u+h} = j \neq i \text{ implies } W_t \le u+h)\\ &= P[W_t > u \mid X_t = i]\,P[X_{t+u+h} = j \mid W_t > u, X_t = i] \quad\text{(since } P[A\cap B\mid C] = P[B\mid C]P[A\mid B\cap C])\\ &= e^{\sigma_{ii}u}\,P[X_{t+u+h} = j \mid X_{u+t} = i] \quad\text{(by theorem 4.8.1 and the Markov property)}\\ &= e^{\sigma_{ii}u}\,P_{ij}(h). \quad\text{(by time-homogeneity)}\end{aligned}$$
If we divide the left hand side by $h$ and let $h\to 0$ we get the joint conditional probability mass/density function of $X_{W_t}$ and $W_t$ given $X_t$. If we divide the right hand side by $h$ and let $h\to 0$ we get $e^{\sigma_{ii}u}\sigma_{ij}$. Note that one factor only depends on $j$ and the other only on $u$ (this is for a fixed $i$). Hence $X_{W_t}$ and $W_t$ are independent random variables. Furthermore
$$e^{\sigma_{ii}u}\sigma_{ij} = -\sigma_{ii}e^{\sigma_{ii}u}\left(\frac{\sigma_{ij}}{-\sigma_{ii}}\right)$$
where $-\sigma_{ii}e^{\sigma_{ii}u}$ is the conditional density function of $W_t$, i.e. $\frac{\sigma_{ij}}{-\sigma_{ii}}$ must be the probability that the first jump is to $j$ given that the process started in state $i$ at time $t$. ■

By the Markov property and time-homogeneity, given that the first jump is to $j$, the waiting time for the next event is exponentially distributed with parameter $\frac{1}{-\sigma_{jj}}$ and the probability that the second jump is to $k$ is given by $\frac{\sigma_{jk}}{-\sigma_{jj}}$, etc. Furthermore the waiting times are independent random variables since the waiting time for the second jump only depends on the state $j$ at the first jump and not on anything that happened before.
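This jump-chain description gives a direct simulation recipe for a time-homogeneous Markov jump process: hold an exponential time with rate $-\sigma_{ii}$, then jump to $j$ with probability $\sigma_{ij}/(-\sigma_{ii})$. A minimal Python sketch (for illustration only; the generator used here is the rate matrix from extra question E.11 at the end of these notes):

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_mjp(A, start, t_max):
    """Sample path of a time-homogeneous Markov jump process with generator A:
    hold in state i for an Exp(-sigma_ii) time, then jump to j with
    probability sigma_ij / (-sigma_ii)."""
    A = np.asarray(A, dtype=float)
    path, i, t = [(0.0, start)], start, 0.0
    while True:
        rate = -A[i, i]
        if rate <= 0:                 # absorbing state: no further jumps
            return path
        t += rng.exponential(1.0 / rate)
        if t > t_max:
            return path
        probs = A[i].copy()
        probs[i] = 0.0                # off-diagonal rates sum to `rate`
        i = int(rng.choice(len(probs), p=probs / rate))
        path.append((t, i))

A = [[-3.0, 2.0, 1.0],
     [ 0.0, -2.0, 2.0],
     [ 0.0,  0.0, 0.0]]              # rate matrix of extra question E.11
print(simulate_mjp(A, 0, 5.0))       # list of (jump time, new state)
```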

The Time-Inhomogeneous Case

Theorem 4.8.3 Let $\{X_t : t\ge 0\}$ be a Markov jump process with transition rate matrix $A(t) = [\sigma_{ij}(t)]$. Let $W_t$ be the waiting time from time $t$ until the next jump occurs. Then
$$P[W_s > w \mid X_s = i] = \begin{cases} e^{\int_s^{s+w}\sigma_{ii}(u)\,du} & \text{for } w > 0\\ 1 & \text{otherwise.}\end{cases}$$

43 (ST, E) Proof
We note that $P[W_s > w \mid X_s = i] = P^C_{ii}(s, s+w)$, since the waiting time from time $s$ will be greater than $w$ if and only if the process remains in state $i$ from time $s$ to $s+w$. From theorem 4.1.4 we therefore have that
$$P[W_s > w \mid X_s = i] = P^C_{ii}(s, s+w) = e^{\int_s^{s+w}\sigma_{ii}(u)\,du}.$$
And for $w = 0$ we have $e^{\int_s^s \sigma_{ii}(u)\,du} = e^0 = 1$. ■

Note that the density function of $W_s$ given that $X_s = i$ is given by
$$f_{W_s\mid X_s}(w\mid i) = -\sigma_{ii}(s+w)\,e^{\int_s^{s+w}\sigma_{ii}(u)\,du}.$$
Let $X_{s^+}$ be the state to which the process jumps at the first jump after $s$.

Theorem 4.8.4 Let $\{X_t : t\ge 0\}$ be a Markov jump process with transition rate matrix $A(t) = [\sigma_{ij}(t)]$. Then
$$P[X_{s^+} = j \mid X_s = i, W_s = w] = \frac{\sigma_{ij}(s+w)}{-\sigma_{ii}(s+w)}.$$
(Note that this probability depends on $w$, i.e. $W_s$ and $X_{s^+}$ are not independent random variables.)

44 (ST, E) Proof Consider
$$\begin{aligned}P[X_{s+w+h} = j, w < W_s \le w+h \mid X_s = i] &= P[X_{s+w+h} = j, W_s > w \mid X_s = i]\\ &= P[W_s > w \mid X_s = i]\,P[X_{s+w+h} = j \mid X_s = i, W_s > w]\\ &\qquad\text{(by } P[A\cap B\mid C] = P[B\mid C]P[A\mid B\cap C])\\ &= e^{\int_s^{s+w}\sigma_{ii}(u)\,du}\,P[X_{s+w+h} = j \mid X_{s+w} = i].\\ &\qquad\text{(by theorem 4.8.3 and the Markov property)}\end{aligned}$$
If we divide by $h$ and let $h\to 0$ we get on the left hand side the joint conditional probability mass/density function of $X_{s^+}$ and $W_s$ given $X_s$. On the right hand side we get
$$e^{\int_s^{s+w}\sigma_{ii}(u)\,du}\,\sigma_{ij}(s+w) = \left(-\sigma_{ii}(s+w)\,e^{\int_s^{s+w}\sigma_{ii}(u)\,du}\right)\left(\frac{\sigma_{ij}(s+w)}{-\sigma_{ii}(s+w)}\right).$$
Since the first factor is the density function of $W_s$ given $X_s = i$, the second factor must be the conditional probability that $X_{s^+} = j$ given that $X_s = i$ and $W_s = w$. It also follows then that $W_s$ and $X_{s^+}$ are not independent. ■

4.9 The Integrated Form of the Kolmogorov Equations

Theorem 4.9.1 Suppose that $X$, $Y$ and $Z$ are jointly distributed discrete random variables with possible values $x_1, x_2, \ldots$, $y_1, y_2, \ldots$ and $z_1, z_2, \ldots$ respectively. Then
$$P[Y = y_i \mid Z = z_j] = \sum_{\forall k} P[X = x_k \mid Z = z_j]\,P[Y = y_i \mid X = x_k, Z = z_j]$$
i.e. $f_{Y\mid Z}(y_i\mid z_j) = \sum_{\forall k} f_{X\mid Z}(x_k\mid z_j)\,f_{Y\mid X,Z}(y_i\mid x_k, z_j)$.

45 (CT) Proof
$$\begin{aligned}P[Y = y_i \mid Z = z_j] &= P[X \in \{x_1, x_2, \ldots\}, Y = y_i \mid Z = z_j]\\ &= \sum_{\forall k} P[X = x_k, Y = y_i \mid Z = z_j] = \sum_{\forall k} f_{X,Y\mid Z}(x_k, y_i\mid z_j)\\ &= \sum_{\forall k} P[X = x_k \mid Z = z_j]\,P[Y = y_i \mid X = x_k, Z = z_j] \quad\text{(by the conditional probability rule)}\\ &= \sum_{\forall k} f_{X\mid Z}(x_k\mid z_j)\,f_{Y\mid X,Z}(y_i\mid x_k, z_j). \;\blacksquare\end{aligned}$$

Theorem 4.9.2 Suppose that $X$, $Y$ and $Z$ are jointly distributed continuous random variables. Then
$$f_{Y\mid Z}(y\mid z) = \int_{-\infty}^{\infty} f_{X\mid Z}(x\mid z)\,f_{Y\mid X,Z}(y\mid x,z)\,dx.$$

46 (CT) Proof
$$\begin{aligned}f_{Y\mid Z}(y\mid z) = \frac{f_{Y,Z}(y,z)}{f_Z(z)} &= \int_{-\infty}^{\infty}\frac{f_{X,Y,Z}(x,y,z)}{f_Z(z)}\,dx\\ &= \int_{-\infty}^{\infty}\frac{f_{X,Z}(x,z)}{f_Z(z)}\cdot\frac{f_{X,Y,Z}(x,y,z)}{f_{X,Z}(x,z)}\,dx\\ &= \int_{-\infty}^{\infty}f_{X\mid Z}(x\mid z)\,f_{Y\mid X,Z}(y\mid x,z)\,dx. \;\blacksquare\end{aligned}$$

NOTE: Now let us consider the case where $X$ is a continuous random variable and $Y$ and $Z$ are discrete random variables. Let $P[X = x, Y = y_i, Z = z_j]$ denote the joint density/probability mass function of $X$, $Y$ and $Z$ at the point $(x, y_i, z_j)$. Similar to theorems 4.9.1 and 4.9.2 we will then get that
$$P[Y = y_i \mid Z = z_j] = \int_{-\infty}^{\infty}P[X = x, Y = y_i \mid Z = z_j]\,dx = \int_{-\infty}^{\infty}P[X = x \mid Z = z_j]\,P[Y = y_i \mid X = x, Z = z_j]\,dx.$$

Theorem 4.9.3 Let $\{X_t : t\ge 0\}$ be a Markov jump process with transition rate matrix $A(t) = [\sigma_{ij}(t)]$. Let $R_s$ be the time from time $s$ (where an event does not occur) until the next jump takes place (i.e. $R_s$ is the residual time) and let $X_{s^+}$ be the state to which the process jumps at time $s + R_s$, i.e. $X_{s^+} = X_{s+R_s}$. Then for $i \neq j$
$$P_{ij}(s,t) = P[X_t = j \mid X_s = i] = \sum_{\forall l \neq i}\int_0^{t-s} e^{\int_s^{s+w}\sigma_{ii}(u)\,du}\,\sigma_{il}(s+w)\,P_{lj}(s+w, t)\,dw.$$

47 (ST, E) Proof
$$\begin{aligned}P_{ij}(s,t) &= P[X_t = j \mid X_s = i]\\ &= P[R_s \in (0, t-s), X_t = j \mid X_s = i] \quad\text{(if } i \neq j \text{ then } 0 < R_s < t-s)\\ &= \int_0^{t-s} f_{R_s, X_t\mid X_s}(w, j\mid i)\,dw\\ &= \int_0^{t-s} f_{R_s\mid X_s}(w\mid i)\,P[X_t = j \mid X_s = i, R_s = w]\,dw \quad\text{(since } P[A\cap B\mid C] = P[A\mid C]P[B\mid A\cap C])\\ &= \int_0^{t-s} e^{\int_s^{s+w}\sigma_{ii}(u)\,du}\left(-\sigma_{ii}(s+w)\right)P[X_t = j \mid X_s = i, R_s = w]\,dw \quad\text{(by the note following theorem 4.8.3)}\\ &= \int_0^{t-s} e^{\int_s^{s+w}\sigma_{ii}(u)\,du}\left(-\sigma_{ii}(s+w)\right)\sum_{\forall l\neq i}P[X_{s+w} = l, X_t = j \mid X_s = i, R_s = w]\,dw\\ &= \int_0^{t-s} e^{\int_s^{s+w}\sigma_{ii}(u)\,du}\left(-\sigma_{ii}(s+w)\right)\sum_{\forall l\neq i}P[X_{s+w} = l \mid X_s = i, R_s = w]\,P[X_t = j \mid X_{s+w} = l, X_s = i, R_s = w]\,dw\\ &= \sum_{\forall l\neq i}\int_0^{t-s} e^{\int_s^{s+w}\sigma_{ii}(u)\,du}\left(-\sigma_{ii}(s+w)\right)\frac{\sigma_{il}(s+w)}{-\sigma_{ii}(s+w)}\,P[X_t = j \mid X_{s+w} = l, X_s = i, R_s = w]\,dw \quad\text{(from theorem 4.8.4)}\\ &= \sum_{\forall l\neq i}\int_0^{t-s} e^{\int_s^{s+w}\sigma_{ii}(u)\,du}\,\sigma_{il}(s+w)\,P[X_t = j \mid X_{s+w} = l]\,dw \quad\text{(by the Markov property)}\\ &= \sum_{\forall l\neq i}\int_0^{t-s} e^{\int_s^{s+w}\sigma_{ii}(u)\,du}\,\sigma_{il}(s+w)\,P_{lj}(s+w, t)\,dw. \;\blacksquare\end{aligned}$$

The result of theorem 4.9.3 is known as the integrated form of the Kolmogorov
equations. Instead of having the derivatives and the transition probabilities in a set of
equations we have the transition probabilities and integrals of the transition probability
functions in a set of equations. The advantage of the latter is that there are fairly efficient
algorithms to calculate a sequence of approximate values of the function even though we
are not able to solve the equations analytically.

4.10 Processes where the Transition Rates Also Depend on the Length of Time it is Already in a State

There are many examples where the transition from one state to another may depend on
the time the process already is in that state. Consider for example a process with three
states H="Healthy", S="Sick" and D="Dead". The probability of a sick person recovering
and being healthy again may depend on his age but also on how long the person has been
sick. That means that if we let X t be the state of the process at time t we will for
instance have that P[ X t = j | X u = i, s − r ≤ u ≤ s ] and P[ X t = j | X u = i, s − 2r ≤ u ≤ s ]
will be different even though in both cases it is given that X s = i i.e. the process will not
have the Markov property.
To work with a process with the Markov property we will have to define the state of the
system in such a way that knowledge of the present state of the process is enough to
determine the future probabilities of the process. This implies that we must incorporate
the length of stay in the present state in the definition of the state of the system. For a
process with states 0,1,2,3,... we then need to define the states of the process as elements
of the set S={0,1,2,...}×[0,∞) i.e. the state of the process is a vector (i, t ) where i
indicates the state and t denotes the time spent in the present state of the system. Since
the state space is no longer discrete we will no longer have a Markov jump process,
although it will be a Markov process.
We can develop a theory for such processes along the lines of this chapter and results will
necessarily have to change because of this. We will only give a few such results below.

Example 4.10.1 Consider a Healthy, Sick and Dead process. Let X t = ( H , Ct ) be the
state of the process at time t if the person is healthy and has been healthy for a period of
C t etc. Suppose that the transition rate from a healthy state to a sick state is σ (t ) i.e. it
does not depend on C t . Similarly suppose that the transition rate from healthy to dead is

$\mu(t)$, the transition rate from sick to healthy $\rho(t, C_t)$, and the transition rate from sick to dead $v(t, C_t)$.
Both transition rates out of state H do not depend on $C_t$ and we should therefore get the same as before for the distribution of the length of time the process remains in H, i.e.
$$P^C_{HH}(s,t) = e^{-\int_s^t[\sigma(x) + \mu(x)]\,dx}.$$
The probability that the person will stay sick from time $s$ until time $t$, given that at time $s$ the person was already sick for a period of $w$, is given by
$$P[X_t = (S, w+t-s) \mid X_s = (S, w)] = e^{-\int_s^t[\rho(x,\, w-s+x) + v(x,\, w-s+x)]\,dx}.$$
Note that at time $u$ between $s$ and $t$ the person has already been sick for a period of $w + u - s$ and this is updated continuously in the integral above. The formulae remain the same as before except that if a transition rate depends on $C_t$, the argument for that transition rate in the integral is changed from $(x)$ to $(x, w + x - s)$. ♪

Extra Questions (for all chapters)


E.1. Suppose you run a business that sells and provides service for a range of expensive
sports cars. Each car sells for between R400000 and R500000 (cash only) and you sell
about 10 to 20 cars each year. The ‘life blood’ of the business is the regular servicing and
maintenance of the cars you have sold previously. Describe the characteristics of a
stochastic process(es) that might be a suitable model for the balance on your company’s
bank account (i.e. parameter and state spaces).
E.2. A woman in a restaurant has R100 in her pocket. She wishes to purchase a bottle of
champagne for R200. Spotting a slot machine she decides to gamble her money. For
every game she wins, the machine pays her R10, in addition to the R10 that the game
costs. If she loses the game, she loses her R10. The probability that she wins any game is
p , and this is independent of winning any other game. Let pi denote the probability that
she accumulates R200 before going broke, assuming she starts with R10i.
a) Set up a difference equation for pi (with boundary conditions) that could be used to
determine the probability that she gets her champagne.
b) Solve the difference equation if p = 0.4 .
c) Set up a similar difference equation (with boundary conditions) for ei , the expected
number of bets until she either gets the bottle of champagne or goes broke, assuming she
starts with R10 i . (This equation will be nonhomogeneous.)
d) Solve the difference equation in (c) for p = 0.4 by first considering the related
homogeneous equation. (The solution for the nonhomogeneous equation will have an
extra term f (i ) - you do not need to know how to find this term.)
E.3. For a Markov chain { X t : t = 0,1,2,...} with qi = P[ X 0 = i ] , write down a formula
for
a) P[ X 3 = k ] where k ∈ S
b) P[ X 3 = k | X 0 = i ] where k ∈ S
c) P[ X 3 = k , X 2 = j ] where k , j ∈ S

d) What is meant by the expression $\sum_{i\in S} P[X_n = i \mid X_{n+1} = j]$ and what is it equal to?

E.4. Consider a Markov process with state space S = {0,1,2} and transition matrix:
$$P = \begin{pmatrix} p & q & 0\\[1mm] \frac{1}{2} & 0 & \frac{1}{2}\\[1mm] p - \frac{1}{2} & \frac{7}{10} & \frac{1}{5} \end{pmatrix}$$
a) What can you say about the values of p and q ?
b) Calculate the transition probabilities pij(3) .
c) Draw the transition graph for the process represented by P .

(Transition Graph: diagram in which each state is represented as a node and an arrow is
drawn from node i to node j if pij > 0 indicating a direct transition from state i to state
j is possible. The value of pij is indicated above the arrow.)
E.5. Consider a Markov chain with only two states, S = {0,1} , and transition matrix
1 1 
P= 2 2 .
 3
1 2
3 
a) What are the conditions for the stationary distribution of this chain to exist and be
unique?
b) Are these conditions satisfied? Explain fully.
c) Determine the stationary distribution for this chain.
 1/ 2 0 2/3 0 
1 / 10 1 / 5 3 / 5 1 / 10
E.6. Is the following process irreducible P =  ?
 0 1/ 2 1/ 3 1/ 6 
 
 1/ 4 1/ 4 1/ 2 0 
1 / 2 1 / 2 0 0 0 
1 / 3 2 / 3 0 0
 0 
E.7. Is the following process irreducible P =  0 0 1 0 0  ? What are the
 
 0 0 0 2 / 3 1 / 3
 0 0 0 1 / 2 1 / 2
stationary distributions?
E.8. Consider a time-homogeneous Markov chain with state space S = {0,1,2} and transition matrix:
$$P = \begin{pmatrix}p + \frac{1}{10} & \frac{1}{10} & q\\[1mm] \frac{1}{5} & \frac{3}{10} & \frac{1}{2}\\[1mm] \frac{1}{5} & p + \frac{3}{10} & \frac{3}{10}\end{pmatrix}.$$
a) What are the values of p and q ?
b) Calculate the matrix of 3-step transition probabilities from state i to state j .

c) Draw a transition graph for this process.


E.9. Explain how a Poisson process could be used to model motor insurance claims.
E.10. What is the expected value of W0 , the first waiting time for a time-homogeneous
Markov jump process starting in state i ?
E.11. A 3-state time-homogeneous Markov jump process is determined by the following matrix of transition rates:
$$A = \begin{pmatrix}-3 & 2 & 1\\ 0 & -2 & 2\\ 0 & 0 & 0\end{pmatrix}.$$
The distribution at time 0 is $\left[\tfrac{1}{3} \;\; \tfrac{1}{3} \;\; \tfrac{1}{3}\right]$. Find the distribution at time 1. (Very important type of question!)
