
Applied Dynamic Programming

In these notes, we will deal with a fundamental tool of dynamic macroeconomics: dynamic programming. Dynamic programming is a very convenient way of writing a large set of dynamic problems in economic analysis, as most of the properties of this tool are now well established and understood.¹ In these lectures, we will not deal with all the theoretical properties attached to this tool, but will rather give some recipes to solve economic problems by means of dynamic programming. In order to understand the problem, we will first deal with deterministic models before extending the analysis to stochastic ones. We shall, however, start with some preliminary definitions and theorems that justify this approach.

1 The Bellman Equation and Associated Theorems

1.1 Heuristic Derivation of the Bellman Equation

Let us consider the case of an agent that has to decide on the path of a set of control variables, $\{y_t\}_{t=0}^{\infty}$, in order to maximize the discounted sum of its future payoffs, $u(y_t, x_t)$, where $x_t$ is a vector of state variables assumed to evolve according to
$$x_{t+1} = h(x_t, y_t), \quad x_0 \text{ given.}$$
We finally make the assumption that the model is Markovian.
The optimal value our agent can derive from this maximization process is given by
$$V(x_t) = \max_{\{y_{t+s} \in \mathcal{D}(x_{t+s})\}_{s=0}^{\infty}} \sum_{s=0}^{\infty} \beta^s u(y_{t+s}, x_{t+s}) \tag{1}$$

¹For a mathematical exposition of the problem, see Bertsekas [1976]; for a more economic approach, see Lucas, Stokey and Prescott [1989].


where $\mathcal{D}$ is the set of all feasible decisions for the variables of choice. The function $V(x_t)$ is the value function and corresponds to the payoff of the agent when she adopts her optimal plan. Note that the value function is a function of the state variable only. Indeed, since the model is Markovian, the current state summarizes all the information needed to take decisions: the whole path can be predicted once the state variable is observed. Therefore, the value at $t$ is only a function of $x_t$. Equation (1) may now be rewritten as
$$V(x_t) = \max_{\{y_t \in \mathcal{D}(x_t),\, \{y_{t+s} \in \mathcal{D}(x_{t+s})\}_{s=1}^{\infty}\}} \; u(y_t, x_t) + \sum_{s=1}^{\infty} \beta^s u(y_{t+s}, x_{t+s}) \tag{2}$$
Making the change of variable $k = s - 1$, (2) rewrites
$$V(x_t) = \max_{\{y_t \in \mathcal{D}(x_t),\, \{y_{t+1+k} \in \mathcal{D}(x_{t+1+k})\}_{k=0}^{\infty}\}} \; u(y_t, x_t) + \sum_{k=0}^{\infty} \beta^{k+1} u(y_{t+1+k}, x_{t+1+k})$$
or
$$V(x_t) = \max_{y_t \in \mathcal{D}(x_t)} \; u(y_t, x_t) + \beta \max_{\{y_{t+1+k} \in \mathcal{D}(x_{t+1+k})\}_{k=0}^{\infty}} \sum_{k=0}^{\infty} \beta^{k} u(y_{t+1+k}, x_{t+1+k}) \tag{3}$$
Note that, by definition, we have
$$V(x_{t+1}) = \max_{\{y_{t+1+k} \in \mathcal{D}(x_{t+1+k})\}_{k=0}^{\infty}} \sum_{k=0}^{\infty} \beta^{k} u(y_{t+1+k}, x_{t+1+k})$$
such that (3) rewrites as
$$V(x_t) = \max_{y_t \in \mathcal{D}(x_t)} \; u(y_t, x_t) + \beta V(x_{t+1}) \tag{4}$$
This is the so-called Bellman equation that lies at the core of dynamic programming theory. This equation is associated, in each and every period $t$, with a set of optimal policy functions for $y$ and $x$, which are defined by
$$\{y_t, x_{t+1}\} \in \operatorname*{argmax}_{y_t \in \mathcal{D}(x_t)} \; u(y_t, x_t) + \beta V(x_{t+1}) \tag{5}$$

Our problem is now to solve (4) for the function $V(x_t)$. This problem is particularly complicated since we are not solving for just a point that would satisfy the equation; we are interested in finding a function that satisfies it. We are therefore leaving the realm of real analysis for that of functional analysis.
A simple procedure to find a solution would be the following:
1. Make an initial guess on the form of the value function, $V_0(x_t)$.
2. Update the guess using the Bellman equation, such that
$$V_{i+1}(x_t) = \max_{y_t \in \mathcal{D}(x_t)} \; u(y_t, x_t) + \beta V_i(h(x_t, y_t))$$
3. If $V_{i+1}(x_t) = V_i(x_t)$, then a fixed point has been found and the problem is solved; if not, go back to 2 and iterate until convergence.
Therefore, solving the Bellman equation just amounts to finding the fixed point of an operator $T$,
$$V_{i+1} = T V_i$$
where $T$ stands for the list of operations involved in the computation of the Bellman equation. The problem is then that of the existence and the uniqueness of this fixed point. Mathematical theory has provided conditions for the existence and uniqueness of a solution.

1.2 Existence and uniqueness of a solution


Definition
A metric space is a set $S$, together with a metric $\rho: S \times S \to \mathbb{R}_+$, such that for all $x, y, z \in S$:
1. $\rho(x, y) \geq 0$, with $\rho(x, y) = 0$ if and only if $x = y$,
2. $\rho(x, y) = \rho(y, x)$,
3. $\rho(x, z) \leq \rho(x, y) + \rho(y, z)$.

Definition
A sequence $\{x_n\}_{n=0}^{\infty}$ in $S$ converges to $x \in S$ if for each $\varepsilon > 0$ there exists an integer $N_\varepsilon$ such that
$$\rho(x_n, x) < \varepsilon \quad \text{for all } n \geq N_\varepsilon$$

Definition
A sequence $\{x_n\}_{n=0}^{\infty}$ in $S$ is a Cauchy sequence if for each $\varepsilon > 0$ there exists an integer $N_\varepsilon$ such that
$$\rho(x_n, x_m) < \varepsilon \quad \text{for all } n, m \geq N_\varepsilon$$

Definition
A metric space $(S, \rho)$ is complete if every Cauchy sequence in $S$ converges to a point in $S$.

Definition
Let $(S, \rho)$ be a metric space and $T: S \to S$ a function mapping $S$ into itself. $T$ is a contraction mapping (with modulus $\beta$) if for some $\beta \in (0,1)$,
$$\rho(Tx, Ty) \leq \beta \rho(x, y) \quad \text{for all } x, y \in S.$$

We then have the following remarkable theorem that establishes the existence and uniqueness of the fixed point of a contraction mapping.

Theorem (Contraction Mapping Theorem)
If $(S, \rho)$ is a complete metric space and $T: S \to S$ is a contraction mapping with modulus $\beta \in (0,1)$, then
1. $T$ has exactly one fixed point $V \in S$ such that $V = TV$,
2. for any $V_0 \in S$, $\rho(T^n V_0, V) \leq \beta^n \rho(V_0, V)$, with $n = 0, 1, 2, \ldots$

Since we are endowed with all the tools we need to prove the theorem, we shall do it.

Proof: In order to prove 1, we shall first prove that if we select any sequence $\{V_n\}_{n=0}^{\infty}$ such that, for each $n$, $V_n \in S$ and
$$V_{n+1} = T V_n$$
then this sequence converges to some $V \in S$. In order to show convergence of $\{V_n\}_{n=0}^{\infty}$, we shall prove that $\{V_n\}_{n=0}^{\infty}$ is a Cauchy sequence. First of all, note that the contraction property of $T$ implies that
$$\rho(V_2, V_1) = \rho(T V_1, T V_0) \leq \beta \rho(V_1, V_0)$$
and therefore
$$\rho(V_{n+1}, V_n) = \rho(T V_n, T V_{n-1}) \leq \beta \rho(V_n, V_{n-1}) \leq \ldots \leq \beta^n \rho(V_1, V_0)$$
Now consider two terms of the sequence, $V_m$ and $V_n$, $m > n$. The triangle inequality implies that
$$\rho(V_m, V_n) \leq \rho(V_m, V_{m-1}) + \rho(V_{m-1}, V_{m-2}) + \ldots + \rho(V_{n+1}, V_n)$$
therefore, making use of the previous result, we have
$$\rho(V_m, V_n) \leq \left(\beta^{m-1} + \beta^{m-2} + \ldots + \beta^n\right) \rho(V_1, V_0) \leq \frac{\beta^n}{1-\beta}\, \rho(V_1, V_0)$$
Since $\beta \in (0,1)$, $\beta^n \to 0$ as $n \to \infty$, so that for each $\varepsilon > 0$ there exists $N_\varepsilon \in \mathbb{N}$ such that $\rho(V_m, V_n) < \varepsilon$. Hence $\{V_n\}_{n=0}^{\infty}$ is a Cauchy sequence and it therefore converges. Further, since we have assumed that $S$ is complete, $V_n$ converges to $V \in S$.
We now have to show that $V = TV$ in order to complete the proof of the first part. Note that, for each $\varepsilon > 0$, and for $V_0 \in S$, the triangle inequality implies
$$\rho(V, TV) \leq \rho(V, V_n) + \rho(V_n, TV)$$
But since $\{V_n\}_{n=0}^{\infty}$ is a Cauchy sequence, we have
$$\rho(V, TV) \leq \rho(V, V_n) + \rho(V_n, TV) \leq \frac{\varepsilon}{2} + \frac{\varepsilon}{2}$$
for large enough $n$; therefore $V = TV$.
Hence, we have proven that $T$ possesses a fixed point and have therefore established its existence. We now have to prove uniqueness. This can be obtained by contradiction. Suppose there exists another function, say $W \in S$, that satisfies $W = TW$. Then the definition of the fixed point implies
$$\rho(V, W) = \rho(TV, TW)$$
but the contraction property implies
$$\rho(V, W) = \rho(TV, TW) \leq \beta \rho(V, W)$$
which, since $\beta \in (0,1)$, implies $\rho(V, W) = 0$ and so $V = W$. The limit is therefore unique.
Proving 2 is straightforward, as
$$\rho(T^n V_0, V) = \rho(T^n V_0, TV) \leq \beta \rho(T^{n-1} V_0, V)$$
but we have $\rho(T^{n-1} V_0, V) = \rho(T^{n-1} V_0, TV)$, such that
$$\rho(T^n V_0, V) = \rho(T^n V_0, TV) \leq \beta \rho(T^{n-1} V_0, V) \leq \beta^2 \rho(T^{n-2} V_0, V) \leq \ldots \leq \beta^n \rho(V_0, V)$$
which completes the proof.
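Part 2 of the theorem also gives a rough way to gauge how many iterations are needed to reach a given accuracy. As an illustrative back-of-the-envelope computation (a crude bound, not part of the formal argument), with the discount factor $\beta = 0.96$ used later in these notes and an initial guess satisfying $\rho(V_0, V) \leq 1$, reaching a tolerance $\varepsilon = 10^{-6}$ requires roughly
$$\beta^n \rho(V_0, V) \leq \varepsilon \;\Longleftrightarrow\; n \geq \frac{\log \varepsilon}{\log \beta} \approx \frac{\log(10^{-6})}{\log(0.96)} \approx 339$$
iterations, which is why the rate of convergence $\beta$ matters so much for the speed of the value iteration algorithm discussed below.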

This theorem is of great importance as it establishes that any operator that possesses the contraction property will exhibit a unique fixed point, which therefore provides some rationale for the algorithm we were designing in the previous section. It also ensures that, if the operator satisfies the contraction property, simple iterations will deliver the solution for any initial condition. It therefore remains to provide conditions for the operator defined by the Bellman equation to be a contraction. These are provided by the next theorem.


Theorem (Blackwell's Sufficient Conditions)
Let $X \subseteq \mathbb{R}^\ell$ and let $B(X)$ be the space of bounded functions $V: X \to \mathbb{R}$ equipped with the uniform metric. Let $T: B(X) \to B(X)$ be an operator satisfying
1. (Monotonicity) Let $V, W \in B(X)$; if $V(x) \leq W(x)$ for all $x \in X$, then $TV(x) \leq TW(x)$.
2. (Discounting) There exists some constant $\beta \in (0,1)$ such that for all $V \in B(X)$ and $a \geq 0$, we have
$$T(V + a) \leq TV + \beta a$$
Then $T$ is a contraction with modulus $\beta$.

Proof: Let us consider two functions $V, W \in B(X)$ satisfying 1 and 2, and such that
$$V \leq W + \rho(V, W)$$
Monotonicity first implies that
$$TV \leq T(W + \rho(V, W))$$
and discounting then implies
$$TV \leq TW + \beta \rho(V, W)$$
We therefore get
$$TV - TW \leq \beta \rho(V, W)$$
Likewise, if we now consider that $W \leq V + \rho(V, W)$, we end up with
$$TW - TV \leq \beta \rho(V, W)$$
Consequently, we have
$$|TV - TW| \leq \beta \rho(V, W)$$
so that
$$\rho(TV, TW) \leq \beta \rho(V, W)$$
which defines a contraction. This completes the proof.

This theorem is extremely useful as it gives us simple tools to check whether a problem is a contraction, and therefore permits us to check whether the simple algorithm we defined above is appropriate for the problem at hand.

As an example, let us consider the optimal growth model, for which the Bellman equation writes
$$V(k_t) = \max_{c_t \in \mathcal{C}} \; u(c_t) + \beta V(k_{t+1})$$
with $k_{t+1} = F(k_t) - c_t$. In order to save on notation, let us drop the time subscript and denote the next period capital stock by $k'$. Plugging the law of motion of capital into the utility function, the Bellman equation rewrites
$$V(k) = \max_{k' \in \mathcal{K}} \; u(F(k) - k') + \beta V(k')$$
where we now have to find the optimal next period capital stock, i.e. the optimal investment policy. Let us now define the operator $T$ as
$$(TV)(k) = \max_{k' \in \mathcal{K}} \; u(F(k) - k') + \beta V(k')$$
We would like to know whether $T$ is a contraction and therefore whether there exists a unique function $V$ such that
$$V(k) = (TV)(k)$$
In order to achieve this task, we just have to check whether $T$ is monotonic and satisfies the discounting property.
1. Monotonicity: Let us consider two candidate value functions, $V$ and $W$, such that $V(k) \leq W(k)$ for all $k \in \mathcal{K}$. What we want to show is that $(TV)(k) \leq (TW)(k)$. In order to do that, let us denote by $\widetilde{k}'$ the optimal next period capital stock, that is
$$(TV)(k) = u(F(k) - \widetilde{k}') + \beta V(\widetilde{k}')$$
But now, since $V(k) \leq W(k)$ for all $k \in \mathcal{K}$, we have $V(\widetilde{k}') \leq W(\widetilde{k}')$, such that it should be clear that
$$(TV)(k) \leq u(F(k) - \widetilde{k}') + \beta W(\widetilde{k}') \leq \max_{k' \in \mathcal{K}} \; u(F(k) - k') + \beta W(k') = (TW)(k)$$
Hence we have shown that $V(k) \leq W(k)$ implies $(TV)(k) \leq (TW)(k)$ and have therefore established monotonicity.
2. Discounting: Let us consider a candidate value function, $V$, and a positive constant $a$. Then
$$(T(V + a))(k) = \max_{k' \in \mathcal{K}} \; u(F(k) - k') + \beta \left(V(k') + a\right) = \max_{k' \in \mathcal{K}} \; u(F(k) - k') + \beta V(k') + \beta a = (TV)(k) + \beta a$$
Therefore, the Bellman equation satisfies discounting in the case of the optimal growth model.
Hence, the optimal growth model satisfies Blackwell's sufficient conditions for a contraction mapping, and therefore the value function exists and is unique. We are now in a position to design a numerical algorithm to solve the Bellman equation.


2 Deterministic Dynamic Programming

2.1 Value Function Iteration

The contraction mapping theorem gives us a straightforward way to compute the solution to the Bellman equation: iterate on the operator $T$, such that $V_{i+1} = TV_i$, up to the point where the distance between two successive value functions is small enough. Basically this amounts to applying the following algorithm:
1. Decide on a grid, $\mathcal{X}$, of admissible values for the state variable $x$,
$$\mathcal{X} = \{x_1, \ldots, x_N\}$$
and formulate an initial guess for the value function $V_0(x)$. Finally, choose a stopping criterion $\varepsilon > 0$.
2. For each $x_\ell \in \mathcal{X}$, $\ell = 1, \ldots, N$, compute
$$V_{i+1}(x_\ell) = \max_{x' \in \mathcal{X}} \; u(y(x_\ell, x'), x_\ell) + \beta V_i(x')$$
3. If $\|V_{i+1}(x) - V_i(x)\| < \varepsilon$, go to the next step; otherwise go back to 2.
4. Compute the final solution as
$$y^\star(x) = y(x, x')$$
In order to better understand the algorithm, let us consider a simple example and go back to the optimal growth model, with
$$u(c) = \frac{c^{1-\sigma} - 1}{1-\sigma}$$
and
$$k' = k^\alpha - c + (1-\delta)k \geq 0$$
Then the Bellman equation writes
$$V(k) = \max_{0 \leq c \leq k^\alpha + (1-\delta)k} \; \frac{c^{1-\sigma} - 1}{1-\sigma} + \beta V(k')$$
From the law of motion of capital we can determine consumption as
$$c = k^\alpha + (1-\delta)k - k'$$
such that, plugging this result into the Bellman equation, we have
$$V(k) = \max_{0 \leq k' \leq k^\alpha + (1-\delta)k} \; \frac{\left(k^\alpha + (1-\delta)k - k'\right)^{1-\sigma} - 1}{1-\sigma} + \beta V(k')$$
Now, let us define a grid of $N$ feasible values for $k$ such that we have
$$\mathcal{K} = \{k_1, \ldots, k_N\}$$
and an initial value function $V_0(k)$, that is a vector of $N$ numbers that relate each $k_\ell$ to a value. Note that this may be anything we want, as we know by the contraction mapping theorem that the algorithm will converge, but if we want it to converge fast enough it may be a good idea to impose a good initial guess. Finally, we need a stopping criterion.
Then, for each $\ell = 1, \ldots, N$, we compute the feasible values that can be taken by the quantity inside the max operator of the Bellman equation,
$$Q_{\ell,h} \equiv \frac{\left(k_\ell^\alpha + (1-\delta)k_\ell - k_h'\right)^{1-\sigma} - 1}{1-\sigma} + \beta V(k_h') \quad \text{for } h \text{ feasible}$$
It is important to understand what "$h$ feasible" means. Indeed, we only compute consumption when it is positive, which restricts the number of possible values for $k'$. Namely, we want $k'$ to satisfy
$$0 \leq k' \leq k^\alpha + (1-\delta)k$$
which puts an upper bound $\overline{h}$ on the index $h$. When the grid of values is uniform, that is when $k_h = \underline{k} + (h-1)\Delta_k$, this bound can be computed as
$$\overline{h} = E\left[\frac{k_\ell^\alpha + (1-\delta)k_\ell - \underline{k}}{\Delta_k}\right] + 1$$
where $E(x)$ denotes the integer part of $x$.
Then we find
$$Q^\star_{\ell} = \max_{h = 1, \ldots, \overline{h}} Q_{\ell,h}$$
and set
$$V_{i+1}(k_\ell) = Q^\star_{\ell}$$
and keep in memory the index $h^\star = \operatorname{argmax}_{h=1,\ldots,\overline{h}} Q_{\ell,h}$, such that we have
$$k'(k_\ell) = k_{h^\star}$$

In Figure 1, we report the value function and the decision rules obtained from the deterministic optimal growth model with $\alpha = 0.3$, $\beta = 0.96$, $\delta = 0.1$ and $\sigma = 2$. The grid for the capital stock is composed of 1000 data points ranging from $(1-\Delta_k)k^\star$ to $(1+\Delta_k)k^\star$, where $k^\star$ denotes the steady state and $\Delta_k = 0.75$. The algorithm² is then as follows.

²The code reported in these notes and those that follow are not efficient from a computational point of view, as they are intended to help you understand the method without adding any coding complications. Much faster implementations can be obtained by vectorizing the code.


Matlab Code: Value iteration (OGM)

% Parameters of the economy
sigma   = 2.00;    % utility parameter
delta   = 0.10;    % depreciation rate
beta    = 0.96;    % discount factor
alpha   = 0.30;    % capital elasticity of output
% Parameters of the algorithm
nbk     = 2000;    % number of data points in the grid
crit    = 1;       % initial convergence criterion
tol     = 1e-6;    % tolerance parameter
% Setup the grid
ks      = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));  % steady state capital stock
dev     = 0.75;                      % maximal deviation from steady state
kmin    = (1-dev)*ks;                % lower bound on the grid
kmax    = (1+dev)*ks;                % upper bound on the grid
kgrid   = linspace(kmin,kmax,nbk)';  % builds the grid
% Initial conditions
v       = zeros(nbk,1);   % value function
tv      = zeros(nbk,1);   % updated value function
dr      = zeros(nbk,1);   % decision rule (will contain indices)
% Main loop
iter    = 1;
while crit>tol
   for i=1:nbk
      % compute indexes for which consumption is positive
      imax = sum(kgrid<kgrid(i)^alpha+(1-delta)*kgrid(i));
      % consumption and utility
      c    = kgrid(i)^alpha+(1-delta)*kgrid(i)-kgrid(1:imax);
      util = (c.^(1-sigma)-1)/(1-sigma);
      % find value function
      [tv(i),dr(i)] = max(util+beta*v(1:imax));
   end
   crit = max(abs(tv-v));    % compute convergence criterion
   v    = tv;                % update the value function
   fprintf('Iteration: %d\tCrit: %g\n',iter,crit)
   iter = iter+1;
end
% Final decision rules
kpgrid  = kgrid(dr);                             % next period capital stock
cgrid   = kgrid.^alpha+(1-delta)*kgrid-kpgrid;   % consumption


Figure 1: Deterministic OGM (Decision rules, Value iteration)
[Three panels plotting the value function, next period capital and consumption against the capital stock $k_t$.]

2.2 Taking advantage of interpolation

A possible improvement of the method is to have a much looser grid on the capital stock but a denser grid on the control variable (consumption in the optimal growth model). Then the next period value of the state variable can be computed much more precisely. However, because of this precision and the fact that the capital grid is coarser, it may be the case that the computed optimal value for the next period state variable does not lie on the grid, which implies that the value function is unknown at this particular value. Therefore, we need to use an interpolation scheme to get an approximation of the value function at this value. One advantage of this approach is that it involves fewer function evaluations and is usually less costly in terms of CPU time. The algorithm is then as follows:
1. Decide on a grid, $\mathcal{X}$, of admissible values for the state variable $x$,
$$\mathcal{X} = \{x_1, \ldots, x_N\}$$
and a grid, $\mathcal{Y}$, of admissible values for the control variable $y$,
$$\mathcal{Y} = \{y_1, \ldots, y_M\} \quad \text{with } M \gg N$$
Formulate an initial guess for the value function $V_0(x)$ and choose a stopping criterion $\varepsilon > 0$.
2. For each $x_\ell \in \mathcal{X}$, $\ell = 1, \ldots, N$, compute
$$x'_{\ell,j} = h(y_j, x_\ell), \quad j = 1, \ldots, M$$
compute an interpolated value function $\widetilde{V}_i(x'_{\ell,j})$ at each $x'_{\ell,j} = h(y_j, x_\ell)$, and then
$$V_{i+1}(x_\ell) = \max_{y \in \mathcal{Y}} \; u(y, x_\ell) + \beta \widetilde{V}_i(x'_{\ell,j})$$
3. If $\|V_{i+1}(x) - V_i(x)\| < \varepsilon$, go to the next step; otherwise go back to 2.
The Matlab code for this approach is reported next. A cubic spline interpolation method is used for the value function, with 20 nodes for the capital stock and 10000 nodes for consumption. The algorithm converges starting from the initial condition for the value function
$$V_0(k) = \frac{\left((c^\star/y^\star)\, k^\alpha\right)^{1-\sigma} - 1}{(1-\sigma)(1-\beta)}$$

Matlab Code: Value iteration with interpolation (OGM)

clear all
clc
% Parameters of the economy
sigma   = 2.00;    % utility parameter
delta   = 0.10;    % depreciation rate
beta    = 0.96;    % discount factor
alpha   = 0.30;    % capital elasticity of output
rho     = 0.80;    % persistence of the shock (not used in the deterministic case)
se      = 0.05;    % volatility of the shock (not used in the deterministic case)
% Parameters of the algorithm
dev     = 0.75;       % maximal deviation from steady state
nbk     = 20;         % number of grid points in k
nbc     = 10000;      % number of grid points in c
crit    = 1;          % initial convergence criterion
tol     = 1e-6;       % tolerance parameter
method  = 'linear';   % interpolation method
% Setup the grid
ks      = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));
kmin    = (1-dev)*ks;                % lower bound on the grid
kmax    = (1+dev)*ks;                % upper bound on the grid
kgrid   = linspace(kmin,kmax,nbk)';  % builds the grid
% Initial conditions
v       = zeros(nbk,1);   % value function
tv      = zeros(nbk,1);   % updated value function
kpgrid  = zeros(nbk,1);   % decision rule for k(t+1)
ygrid   = kgrid.^alpha;   % output
% Main loop
iter    = 1;
while crit>tol
   for i=1:nbk
      % consumption, next period capital and utility
      c    = linspace(0,ygrid(i),nbc);
      k1   = kgrid(i)^alpha+(1-delta)*kgrid(i)-c;
      util = (c.^(1-sigma)-1)/(1-sigma);
      % find value function (interpolated at next period capital)
      vint      = interp1(kgrid,v,k1,method);
      [v1,dr]   = max(util+beta*vint);
      tv(i)     = v1;
      kpgrid(i) = k1(dr);
   end
   crit = max(abs(tv-v));    % compute convergence criterion
   v    = tv;                % update the value function
   fprintf('Iteration: %3d\tCrit: %g\n',iter,crit)
   iter = iter+1;
end
% Output
ygrid   = kgrid.^alpha;                      % output
cgrid   = ygrid+(1-delta)*kgrid-kpgrid;      % consumption

2.3 Policy iterations: Howard Improvement

The simple value iteration algorithm has the attractive feature of being particularly simple to implement. However, it is a slow procedure, especially for infinite horizon problems, since it can be shown that this procedure converges at the rate $\beta$, which is usually close to 1! Furthermore, it computes unnecessary quantities during the algorithm, which slows down convergence. Often, computation speed is really important, for instance when one wants to perform a sensitivity analysis of the results obtained in a model using different parameterizations. Hence, we would like to be able to speed up convergence. This can be achieved by applying the so-called Howard improvement method. The idea of this method is to iterate on policy functions rather than on the value function. The algorithm may be described as follows:
1. Set $i = 0$, guess an initial feasible decision rule for the control variable, $y = f_i(x)$, and compute the value associated with this guess, assuming that this rule is operative forever,
$$V(x_t) = \sum_{s=0}^{\infty} \beta^s u(f_i(x_{t+s}), x_{t+s})$$
taking care of the fact that $x_{t+1} = h(x_t, y_t) = h(x_t, f_i(x_t))$. Set a stopping criterion $\varepsilon > 0$.
2. Find a new policy rule $y = f_{i+1}(x)$ such that
$$f_{i+1}(x) \in \operatorname*{argmax}_{y} \; u(y, x) + \beta V(x')$$
with $x' = h(x, y)$.
3. Check whether $\|f_{i+1}(x) - f_i(x)\| < \varepsilon$; if yes then stop, otherwise go back to 2.
Note that this method differs fundamentally from the value iteration algorithm in at least two dimensions:
(i) one iterates on the policy function rather than on the value function;
(ii) the decision rule is used forever, whereas it is assumed that it is used for only two consecutive periods in the value iteration algorithm. It is precisely this last feature that accelerates convergence.
Note that when computing the value function we actually have to solve a linear system of the form
$$V_{i+1}(x_\ell) = u(f_{i+1}(x_\ell), x_\ell) + \beta V_{i+1}\!\left(h(x_\ell, f_{i+1}(x_\ell))\right), \quad \ell = 1, \ldots, N$$
for $V_{i+1}(x_\ell)$, which may be rewritten as
$$V_{i+1}(x_\ell) = u(f_{i+1}(x_\ell), x_\ell) + \beta Q V_{i+1}(x_\ell), \quad \ell = 1, \ldots, N$$
where $Q$ is an $(N \times N)$ matrix
$$Q_{\ell j} = \begin{cases} 1 & \text{if } x' \equiv h(f_{i+1}(x_\ell), x_\ell) = x_j \\ 0 & \text{otherwise} \end{cases}$$
Note that although it is a big matrix, $Q$ is sparse, which can be exploited in solving the system, to get
$$V_{i+1}(x) = (I - \beta Q)^{-1} u(f_{i+1}(x), x)$$

Matlab Code: Policy iteration (OGM)

clear all
clc
% Parameters of the economy
sigma   = 2.00;    % utility parameter
delta   = 0.10;    % depreciation rate
beta    = 0.96;    % discount factor
alpha   = 0.30;    % capital elasticity of output
% Parameters of the algorithm
nbk     = 2000;    % number of data points in the grid
crit    = 1;       % initial convergence criterion
epsi    = 1e-6;    % tolerance parameter
% Setup the grid
dev     = 0.5;     % maximal deviation from steady state
ks      = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));
kmin    = (1-dev)*ks;                % lower bound on the grid
kmax    = (1+dev)*ks;                % upper bound on the grid
kgrid   = linspace(kmin,kmax,nbk)';  % builds the grid
% Initial conditions
v       = zeros(nbk,1);   % value function
v1      = zeros(nbk,1);   % updated value function
kp0     = zeros(nbk,1);   % previous decision rule
dr      = zeros(nbk,1);   % decision rule (will contain indices)
% Main loop
while crit>epsi
   for i=1:nbk
      % compute indexes for which consumption is positive
      imax = sum(kgrid<kgrid(i)^alpha+(1-delta)*kgrid(i));
      % consumption and utility
      c    = kgrid(i)^alpha+(1-delta)*kgrid(i)-kgrid(1:imax);
      util = (c.^(1-sigma)-1)/(1-sigma);
      % find new policy rule
      [v1(i),dr(i)] = max(util+beta*v(1:imax));
   end
   % decision rules and utility
   kp   = kgrid(dr);
   c    = kgrid.^alpha+(1-delta)*kgrid-kp;
   util = (c.^(1-sigma)-1)/(1-sigma);
   % update the value: solve (I-beta*Q)*v = u
   Q    = sparse(nbk,nbk);
   J    = sub2ind([nbk nbk],(1:nbk)',dr);
   Q(J) = 1;
   Tv   = (speye(nbk)-beta*Q)\util;
   crit = max(abs(kp-kp0));
   v    = Tv;
   kp0  = kp;
end

As experienced in the particular example of the optimal growth model, the policy iteration algorithm only requires a few iterations (21). Unfortunately, we have to solve the linear system
$$(I - \beta Q) V_{i+1} = u(y, x)$$
which may be particularly costly when the number of grid points is large. Therefore, it was proposed to replace the matrix inversion by an additional iteration step, leading to the so-called modified policy iteration with $k$ steps, which replaces the linear problem by the following iteration scheme:
1. Set $J_0 = V_i$.
2. Iterate $k$ times on
$$J_{j+1} = u(y, x) + \beta Q J_j, \quad j = 0, \ldots, k-1$$
3. Set $V_{i+1} = J_k$.
When $k \to \infty$, $J_k$ tends toward the solution of the linear system. In that case, the main loop in the Matlab code is slightly different.
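To see why the $k$-step scheme approximates the exact solution, note (as a short side remark) that, since the spectral radius of $\beta Q$ is smaller than one, the solution of the linear system can be expanded as a Neumann series,
$$V_{i+1} = (I - \beta Q)^{-1} u = \sum_{j=0}^{\infty} (\beta Q)^j u$$
so that iterating $J_{j+1} = u + \beta Q J_j$ from $J_0 = V_i$ simply accumulates the first $k$ terms of this series, plus a term $(\beta Q)^k V_i$ that vanishes as $k$ grows.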
Matlab Code: Modified Policy iteration (OGM), main loop only

K = 100;    % number of iteration steps on J (relies on the setup of the policy iteration code above)
% Main loop
while crit>epsi
   for i=1:nbk
      % compute indexes for which consumption is positive
      imax = sum(kgrid<kgrid(i)^alpha+(1-delta)*kgrid(i));
      % consumption and utility
      c    = kgrid(i)^alpha+(1-delta)*kgrid(i)-kgrid(1:imax);
      util = (c.^(1-sigma)-1)/(1-sigma);
      % find new policy rule
      [v1(i),dr(i)] = max(util+beta*v(1:imax));
   end
   % decision rules and utility
   kp   = kgrid(dr);
   c    = kgrid.^alpha+(1-delta)*kgrid-kp;
   util = (c.^(1-sigma)-1)/(1-sigma);
   % update the value by iterating K times rather than inverting
   Q    = sparse(nbk,nbk);
   J    = sub2ind([nbk nbk],(1:nbk)',dr);
   Q(J) = 1;
   Jk   = v1;
   for k=1:K
      Jk = util+beta*Q*Jk;
   end
   Tv   = Jk;
   crit = max(abs(kp-kp0));
   v    = Tv;
   kp0  = kp;
end

2.4 Endogenous Grid Method

The endogenous grid method was initially proposed by Carroll [2006], who suggested formulating a grid for the next period state variable rather than for its current value. One merit of this approach is that it allows one to take advantage of some first order conditions and thereby accelerate the maximization step. The version presented here differs from that proposed in Carroll [2006] in that it does not take advantage of some redefinition of variables that would accelerate the algorithm even further. This is done (i) to keep notation and intuition simple and (ii) to show that the algorithm can be combined with a root finding problem relatively easily.
Assume the problem to be solved takes the form
$$V(x_t) = \max_{y_t} \; u(y_t, x_t) + \beta V(x_{t+1})$$
with $x_{t+1} = h(x_t, y_t)$. The algorithm then works as follows:
1. Set a grid for $x_{t+1}$ and an initial value function $V_i(x_{t+1})$ ($i = 0$ in the first iteration).
2. Use the first order condition to get $y_t^\star = \widetilde{y}(x_{t+1}, x_t)$:
$$\frac{\partial u(y_t, x_t)}{\partial y_t} + \beta \frac{\partial V_i(x_{t+1})}{\partial x_{t+1}} \frac{\partial h(x_t, y_t)}{\partial y_t} = 0 \;\Longrightarrow\; y_t^\star = \widetilde{y}(x_{t+1}, x_t)$$
3. Use the transition equation to find the optimal $x_t^\star = x^\star(x_{t+1})$:
$$x_{t+1} = h(x_t, y_t) = h(x_t, \widetilde{y}(x_{t+1}, x_t)) \;\Longrightarrow\; x_t^\star = x^\star(x_{t+1})$$
and update $y_t$ such that
$$y_t^\star = \widetilde{y}(x_{t+1}, x^\star(x_{t+1})) \equiv y^\star(x_{t+1})$$
4. Then compute
$$V_{i+1}(x_t^\star) = u(y_t^\star, x_t^\star) + \beta V_i(x_{t+1})$$
5. Interpolate $V_{i+1}(x_t^\star)$ on the grid for $x_{t+1}$ and update the value.
6. If $\|V_{i+1}(x_{t+1}) - V_i(x_{t+1})\| < \varepsilon$ then stop, else go back to 2.
Note that the initial guess for the value function now plays a much more important role than before, as its partial derivative is needed to compute the optimal decision rule for the control variable $y_t$. The computation of the partial derivative can be achieved by mixing interpolation and numerical differentiation in a rather standard way. Also note how step 4 does not involve maximization, as the optimal decision rule has already been obtained from step 2.


In the optimal growth model case, things are rather simple, as the utility is only a function of $c_t$ and the capital stock can be solved for very simply. From the first order condition on consumption, and given a grid for the next period capital stock $k_{t+1}$, it is readily obtained that
$$c_t = \left(\beta V_k(k_{t+1})\right)^{-\frac{1}{\sigma}}$$
which can then be used in the capital accumulation equation to obtain the current period capital stock $k_t$ by solving³
$$k_t^\alpha + (1-\delta)k_t - c_t - k_{t+1} = 0$$
We denote by $k_t^\star$ the solution to this equation. This process defines a grid for the current period capital stock which is endogenously determined, hence the name of the method. The current value function is then given by
$$V(k_t^\star) = \frac{c_t^{1-\sigma} - 1}{1-\sigma} + \beta V(k_{t+1})$$
Since the new value function is evaluated at nodes that do not necessarily lie on the grid for the next period capital stock, the updated value function has to be interpolated on these nodes.

³In the version proposed by Carroll [2006] this step is further accelerated by rewriting the problem in terms of available resources $Y_t = k_t^\alpha + (1-\delta)k_t$.
Matlab Code: Endogenous Grid Method (OGM)

% Parameters of the economy
sigma   = 2.00;    % utility parameter
delta   = 0.10;    % depreciation rate
beta    = 0.96;    % discount factor
alpha   = 0.30;    % capital elasticity of output
% Parameters of the algorithm
nbk     = 2000;       % number of data points in the grid
crit    = 1;          % initial convergence criterion
tol     = 1e-6;       % tolerance parameter
method  = 'linear';   % interpolation method
% Setup the grid (for next period capital)
dev     = 0.5;        % maximal deviation from steady state
ks      = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));
kmin    = (1-dev)*ks;                % lower bound on the grid
kmax    = (1+dev)*ks;                % upper bound on the grid
kpgrid  = linspace(kmin,kmax,nbk)';  % grid for next period capital
% Initial values
c0      = 1.2*(kpgrid.^alpha+(1-delta)*kpgrid);
v0      = (c0.^(1-sigma)-1)/((1-sigma)*(1-beta));   % initial value function
kgrid   = ks*ones(nbk,1);                           % initial guess for current capital
% Main loop
while crit>tol
   dv0 = diff_value(kpgrid,v0,method);   % derivative of the value function
   c   = (beta*dv0).^(-1/sigma);         % consumption from the first order condition
   tmp = (c.^(1-sigma)-1)/(1-sigma)+beta*v0;
   % solve k^alpha+(1-delta)*k-k'-c=0 for current capital (Newton iterations)
   crk = 1;
   while crk>tol
      f0    = kgrid.^alpha+(1-delta)*kgrid-kpgrid-c;
      df0   = alpha*kgrid.^(alpha-1)+1-delta;
      k1    = kgrid-f0./df0;
      crk   = max(abs(kgrid-k1));
      kgrid = k1;
   end
   % interpolate the new value function back on the grid for k'
   v1   = interp1(kgrid,tmp,kpgrid,method);
   crit = max(abs(v0-v1));
   v0   = v1;
end
% Output
cgrid   = kgrid.^alpha+(1-delta)*kgrid-kpgrid;   % consumption
vgrid   = interp1(kpgrid,v1,kgrid,method);       % value function on the endogenous grid
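The helper diff_value called above (and again in the stochastic version of Section 3) is not reproduced in these notes. A minimal sketch of one possible implementation, assuming it simply returns a finite-difference approximation of the derivative of the value function with respect to capital at each grid point (central differences in the interior, one-sided differences at the boundaries; the interpolation method argument is kept only to match the call used above):

function dv = diff_value(kgrid,v,method)
% Finite-difference approximation of dV/dk, column by column
% (one column of v per value of the shock in the stochastic case).
% The interpolation method argument is not used in this simple sketch.
n  = length(kgrid);
dv = zeros(size(v));
dv(1,:)     = (v(2,:)-v(1,:))/(kgrid(2)-kgrid(1));
dv(n,:)     = (v(n,:)-v(n-1,:))/(kgrid(n)-kgrid(n-1));
dv(2:n-1,:) = (v(3:n,:)-v(1:n-2,:))./repmat(kgrid(3:n)-kgrid(1:n-2),1,size(v,2));

Mixing interpolation with a higher-order differentiation scheme, as mentioned above, could deliver a smoother derivative, but this simple version is enough to run the code.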

2.5 Parametric dynamic programming

The last technique we will describe borrows from approximation theory, using either orthogonal polynomials or spline functions. The idea is actually to make a guess for the functional form of the value function and iterate on the parameters of this functional form. The algorithm then works as follows:
1. Choose a functional form for the value function $\widetilde{V}(x; \Theta)$, a grid of interpolating nodes $\mathcal{X} = \{x_1, \ldots, x_N\}$, a stopping criterion $\varepsilon > 0$ and an initial vector of parameters $\Theta_0$.
2. Using the conjecture for the value function, perform the maximization step in the Bellman equation, that is, compute
$$w_\ell = T\!\left(\widetilde{V}(x_\ell, \Theta_i)\right) = \max_{y} \; u(y, x_\ell) + \beta \widetilde{V}(x', \Theta_i) \quad \text{s.t. } x' = h(y, x_\ell), \text{ for } \ell = 1, \ldots, N$$
3. Using the approximation method you have chosen, compute a new vector of parameters $\Theta_{i+1}$ such that $\widetilde{V}(x, \Theta_{i+1})$ approximates the data $(x_\ell, w_\ell)$.
4. If $\|\widetilde{V}(x, \Theta_{i+1}) - \widetilde{V}(x, \Theta_i)\| < \varepsilon$ then stop, otherwise go back to 2.
First note that for this method to be implementable, we need the payoff function and the value function to be continuous. The approximation function may be a combination of polynomials, neural networks, or splines. Note that during the optimization step we may have to rely on a numerical maximization algorithm, and the approximation method may involve numerical minimization in order to solve a nonlinear least-squares problem of the form
$$\Theta_{i+1} \in \operatorname*{Argmin}_{\Theta} \; \sum_{\ell=1}^{N} \left(w_\ell - \widetilde{V}(x_\ell; \Theta)\right)^2$$
This algorithm is usually much faster than value iteration as it does not require iterating on a large grid. As an example, I will once again focus on the optimal growth problem we have been dealing with so far, and I will approximate the value function by
$$\widetilde{V}(k; \Theta) = \sum_{i=0}^{p} \theta_i\, \varphi_i\!\left(2\,\frac{\log(k) - \underline{k}}{\overline{k} - \underline{k}} - 1\right)$$
where $\{\varphi_i(\cdot)\}_{i=0}^{p}$ is a set of Chebychev polynomials and $\underline{k}$, $\overline{k}$ denote the logarithms of the lower and upper bounds of the grid. In the example, I set $p = 5$ and used 20 nodes. Figure 2 reports the decision rule and the value function in this case, and Table 1 reports the parameters of the approximation function. The algorithm converged in 242 iterations, but it took much less time than value iteration.
Matlab Code: Parametric dynamic programming (OGM)

% Parameters of the economy
sigma   = 1.50;    % utility parameter
delta   = 0.10;    % depreciation rate
beta    = 0.95;    % discount factor
alpha   = 0.30;    % capital elasticity of output
% Parameters of the algorithm
nbk     = 20;      % number of points in the grid
p       = 10;      % order of polynomials
crit    = 1;       % convergence criterion
iter    = 1;       % iteration counter
epsi    = 1e-6;    % convergence parameter
% Setup the grid
ks      = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));
dev     = 0.9;                               % maximal deviation from k*
kmin    = log((1-dev)*ks);                   % lower bound on the grid (in logs)
kmax    = log((1+dev)*ks);                   % upper bound on the grid (in logs)
rk      = -cos((2*(1:nbk)'-1)*pi/(2*nbk));   % Chebychev interpolating nodes
kgrid   = exp(kmin+(rk+1)*(kmax-kmin)/2);    % mapping into the capital space
%
% Initial guess for the approximation
%
v       = ((kgrid.^alpha).^(1-sigma)-1)/((1-sigma)*(1-beta));
X       = chebychev(rk,p);
th0     = X\v;
Tv      = zeros(nbk,1);
kp      = zeros(nbk,1);
%
% Main loop
%
while crit>epsi
   k0 = kgrid(1);
   for i=1:nbk
      param = [alpha beta delta sigma kmin kmax p kgrid(i)];
      % minimize -(u+beta*V) with respect to next period capital
      kp(i) = fminsearch(@(x) tv(x,param,th0),k0);
      k0    = kp(i);
      Tv(i) = -tv(kp(i),param,th0);
   end
   theta = X\Tv;               % new vector of parameters
   crit  = max(abs(Tv-v));     % convergence criterion
   v     = Tv;
   th0   = theta;
   iter  = iter+1;
end

Matlab Code: Parametric dynamic programming (extra functions)

function res=tv(kp,param,theta)
alpha   = param(1);
beta    = param(2);
delta   = param(3);
sigma   = param(4);
kmin    = param(5);
kmax    = param(6);
n       = param(7);
k       = param(8);
kp      = sqrt(kp.^2);                    % insures positivity of k'
v       = value(kp,[kmin kmax n],theta);  % computes the value function
c       = k.^alpha+(1-delta)*k-kp;        % computes consumption
d       = find(c<=0);                     % find negative consumption
c(d)    = NaN;                            % drop c<=0
util    = (c.^(1-sigma)-1)/(1-sigma);     % computes utility
util(d) = -1e12;                          % utility set to a low number for c<=0
res     = -(util+beta*v);                 % compute -TV (we minimize)

function v = value(k,param,theta)
kmin    = param(1);
kmax    = param(2);
n       = param(3);
k       = 2*(log(k)-kmin)/(kmax-kmin)-1;  % map capital into [-1,1]
v       = chebychev(k,n)*theta;

function Tx=chebychev(x,n)
X  = x(:);
lx = size(X,1);
if n<0; error('n should be a positive integer'); end
switch n
case 0
   Tx = ones(lx,1);
case 1
   Tx = [ones(lx,1) X];
otherwise
   Tx = [ones(lx,1) X];
   for i=3:n+1
      Tx = [Tx 2*X.*Tx(:,i-1)-Tx(:,i-2)];
   end
end

Figure 2: Deterministic OGM (Decision rules, Parametric DP)
[Three panels plotting the value function, next period capital stock and consumption against the capital stock $k$.]

Table 1: Value function approximation

θ0        θ1       θ2       θ3       θ4        θ5
-0.2334   3.0686   0.2543   0.0153   -0.0011   -0.0002

A potential problem with the use of Chebychev polynomials is that they do not impose any assumption on the shape of the value function, which we know to be concave and strictly increasing in this case. This is why Judd [1998] recommends using shape-preserving methods such as the Schumaker approximation. Judd and Solnick [1994] have successfully applied this latter technique to the optimal growth model and found that the approximation was very good and dominated other methods (they actually get the same precision with 12 nodes as that achieved with a 1200 data point grid using a value iteration technique).

3 Stochastic Dynamic Programming

In a large number of problems we have to deal with stochastic shocks (just think of a standard RBC model); the dynamic programming technique can obviously be extended to deal with such problems. This section will first show how we can obtain the Bellman equation, before addressing some important issues concerning the discretization of shocks. It will then describe the implementation of the value iteration and policy iteration techniques for the stochastic case. The code for the endogenous grid method is reported at the end of the text.

3.1 The Bellman Equation

The stochastic problem differs from the deterministic problem in that we now have to take expectations. The problem then defines a value function which has as arguments the state variable $x_t$ but also the stationary shock $s_t$, whose sequence $\{s_t\}_{t=0}^{+\infty}$ satisfies
$$s_{t+1} = \Phi(s_t, \varepsilon_{t+1}) \tag{6}$$
where $\varepsilon$ is a white noise process. The value function is therefore given by
$$V(x_t, s_t) = \max_{\{y_{t+\tau} \in \mathcal{D}(x_{t+\tau}, s_{t+\tau})\}_{\tau=0}^{\infty}} \; E_t \sum_{\tau=0}^{\infty} \beta^\tau u(y_{t+\tau}, x_{t+\tau}, s_{t+\tau}) \tag{7}$$
subject to (6) and
$$x_{t+1} = h(x_t, y_t, s_t) \tag{8}$$
Since $y_t$, $x_t$ and $s_t$ are either perfectly observed or decided in period $t$, they are known at $t$, such that we may rewrite the value function as
$$V(x_t, s_t) = \max_{\{y_t \in \mathcal{D}(x_t, s_t),\, \{y_{t+\tau} \in \mathcal{D}(x_{t+\tau}, s_{t+\tau})\}_{\tau=1}^{\infty}\}} \; u(y_t, x_t, s_t) + E_t \sum_{\tau=1}^{\infty} \beta^\tau u(y_{t+\tau}, x_{t+\tau}, s_{t+\tau})$$
or
$$V(x_t, s_t) = \max_{y_t \in \mathcal{D}(x_t, s_t)} \; u(y_t, x_t, s_t) + \max_{\{y_{t+\tau} \in \mathcal{D}(x_{t+\tau}, s_{t+\tau})\}_{\tau=1}^{\infty}} E_t \sum_{\tau=1}^{\infty} \beta^\tau u(y_{t+\tau}, x_{t+\tau}, s_{t+\tau})$$
Using the change of variable $k = \tau - 1$, this rewrites
$$V(x_t, s_t) = \max_{y_t \in \mathcal{D}(x_t, s_t)} \; u(y_t, x_t, s_t) + \beta \max_{\{y_{t+1+k} \in \mathcal{D}(x_{t+1+k}, s_{t+1+k})\}_{k=0}^{\infty}} E_t \sum_{k=0}^{\infty} \beta^{k} u(y_{t+1+k}, x_{t+1+k}, s_{t+1+k})$$
It is important at this point to recall that
$$E_t\!\left(X(\varepsilon_{t+\tau})\right) = \int X(\varepsilon_{t+\tau}) f(\varepsilon_{t+\tau}|\varepsilon_t)\, d\varepsilon_{t+\tau} = \int\!\!\int X(\varepsilon_{t+\tau}) f(\varepsilon_{t+\tau}|\varepsilon_{t+\tau-1}) f(\varepsilon_{t+\tau-1}|\varepsilon_t)\, d\varepsilon_{t+\tau}\, d\varepsilon_{t+\tau-1} = \int \cdots \int X(\varepsilon_{t+\tau}) f(\varepsilon_{t+\tau}|\varepsilon_{t+\tau-1}) \cdots f(\varepsilon_{t+1}|\varepsilon_t)\, d\varepsilon_{t+\tau} \cdots d\varepsilon_{t+1}$$
which is a corollary of the law of iterated projections, such that the value function rewrites
$$V(x_t, s_t) = \max_{y_t \in \mathcal{D}(x_t, s_t)} \; u(y_t, x_t, s_t) + \beta \max_{\{y_{t+1+k} \in \mathcal{D}(x_{t+1+k}, s_{t+1+k})\}_{k=0}^{\infty}} E_t E_{t+1} \sum_{k=0}^{\infty} \beta^{k} u(y_{t+1+k}, x_{t+1+k}, s_{t+1+k})$$
or
$$V(x_t, s_t) = \max_{y_t \in \mathcal{D}(x_t, s_t)} \; u(y_t, x_t, s_t) + \beta \max_{\{y_{t+1+k} \in \mathcal{D}(x_{t+1+k}, s_{t+1+k})\}_{k=0}^{\infty}} \int \left[ E_{t+1} \sum_{k=0}^{\infty} \beta^{k} u(y_{t+1+k}, x_{t+1+k}, s_{t+1+k}) \right] f(s_{t+1}|s_t)\, ds_{t+1}$$
Note that, because each value of the shock defines a particular mathematical object, the maximization of the integral corresponds to the integral of the maximization; the max operator and the integral are therefore interchangeable, so that we get
$$V(x_t, s_t) = \max_{y_t \in \mathcal{D}(x_t, s_t)} \; u(y_t, x_t, s_t) + \beta \int \left[ \max_{\{y_{t+1+k} \in \mathcal{D}(x_{t+1+k}, s_{t+1+k})\}_{k=0}^{\infty}} E_{t+1} \sum_{k=0}^{\infty} \beta^{k} u(y_{t+1+k}, x_{t+1+k}, s_{t+1+k}) \right] f(s_{t+1}|s_t)\, ds_{t+1}$$
By definition, the term under the integral corresponds to $V(x_{t+1}, s_{t+1})$, such that the value function rewrites
$$V(x_t, s_t) = \max_{y_t \in \mathcal{D}(x_t, s_t)} \; u(y_t, x_t, s_t) + \beta \int V(x_{t+1}, s_{t+1}) f(s_{t+1}|s_t)\, ds_{t+1}$$
or equivalently
$$V(x_t, s_t) = \max_{y_t \in \mathcal{D}(x_t, s_t)} \; u(y_t, x_t, s_t) + \beta E_t V(x_{t+1}, s_{t+1})$$
which is precisely the Bellman equation for the stochastic dynamic programming problem.

3.2 Discretization of the Shocks

A very important problem that arises whenever we deal with value iteration or policy iteration in
a stochastic environment is that of the discretization of the space spanned by the shocks. Indeed,


the use of a continuous support for the stochastic shocks is unfeasible for a computer that can
only deal with discrete supports. We therefore have to transform the continuous problem into
a discrete one with the constraint that the asymptotic properties of the continuous and the
discrete processes should be the same. The question we therefore face is: does there exist a
discrete representation for s which is equivalent to its continuous original representation? The
answer to this question is yes. In particular as soon as we deal with (V)AR processes, we can
use a very powerful tool: Markov chains.

Markov Chains: A Markov chain is a sequence of random values whose probability distribution at a given point in time depends upon the value attained at the previous point in time. We will restrict ourselves to discrete-time Markov chains, in which the state changes at certain discrete time instants, indexed by an integer variable $t$. At each time step $t$, the Markov chain is in a state, denoted by $s \in \mathcal{S} \equiv \{s_1, \ldots, s_M\}$. $\mathcal{S}$ is called the state space.
The Markov chain is described in terms of transition probabilities $\pi_{ij}$. This transition probability should be interpreted as follows: if the economy is in state $s_i$ in period $t$, the probability that the next state is equal to $s_j$ is $\pi_{ij}$. We therefore get the following definition.

Definition
A Markov chain is a stochastic process with a discrete state space $\mathcal{S}$, such that the conditional distribution of $s_{t+1}$ is independent of all previously attained states given $s_t$:
$$\pi_{ij} = \operatorname{Prob}(s_{t+1} = s_j | s_t = s_i), \quad s_i, s_j \in \mathcal{S}.$$

The important assumption we shall make concerning Markov processes is that the transition probabilities $\pi_{ij}$ apply as soon as state $s_i$ is reached, no matter the history of the shocks nor how the system got to state $s_i$. In other words, there is no hysteresis. From a mathematical point of view, this corresponds to the so-called Markov property
$$P(s_{t+1} = s_j | s_t = s_i, s_{t-1} = s_{i_{t-1}}, \ldots, s_0 = s_{i_0}) = P(s_{t+1} = s_j | s_t = s_i) = \pi_{ij}$$
for all periods $t$, all states $s_i, s_j \in \mathcal{S}$, and all sequences $\{s_{i_n}\}_{n=0}^{t-1}$ of earlier states. Thus, the probability of the next state $s_{t+1}$ only depends on the current realization of $s$.


The transition probabilities $\pi_{ij}$ must of course satisfy
1. $\pi_{ij} \geq 0$ for all $i, j = 1, \ldots, M$,
2. $\sum_{j=1}^{M} \pi_{ij} = 1$ for all $i = 1, \ldots, M$.
All of the elements of a Markov chain model can then be encoded in a transition probability matrix
$$\Pi = \begin{pmatrix} \pi_{11} & \ldots & \pi_{1M} \\ \vdots & \ddots & \vdots \\ \pi_{M1} & \ldots & \pi_{MM} \end{pmatrix}$$
Note that $\Pi^k_{ij}$ then gives the probability that $s_{t+k} = s_j$ given that $s_t = s_i$. In the long run, we obtain the steady state equivalent for a Markov chain: the invariant or stationary distribution.

Definition
A stationary distribution for a Markov chain is a distribution $\pi$ such that
1. $\pi_j \geq 0$ for all $j = 1, \ldots, M$,
2. $\sum_{j=1}^{M} \pi_j = 1$,
3. $\pi \Pi = \pi$.
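As a small illustration, the stationary distribution can be computed numerically by iterating on $\pi \leftarrow \pi \Pi$ from an arbitrary starting distribution; a minimal Matlab sketch for a generic transition matrix PI (the variable name matches the one used in the codes below):

M   = size(PI,1);
pi0 = ones(1,M)/M;                 % start from the uniform distribution
for it=1:1000
   pi1 = pi0*PI;                   % one step of pi <- pi*PI
   if max(abs(pi1-pi0))<1e-12, break, end
   pi0 = pi1;
end
disp(pi0)                          % approximate stationary distribution

For the symmetric two-state chain used later in these notes, this returns (0.5, 0.5).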

Moment approach to discretization: A first simple way to tackle this problem is to rely on a method of moments. The idea is that the continuous process and its discrete approximation should possess the same asymptotic properties in terms of conditional first and second order moments. We will apply it later to the optimal growth model.
Nevertheless, while this approach is straightforward when we restrict ourselves to a 2-state representation of the process (as illustrated in Section 3.3), it becomes cumbersome when we want to deal with more states. We can then rely on a quadrature-based method proposed by Tauchen and Hussey [1991].
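For the two-state case, the moment-matching discretization derived in Section 3.3 boils down to a couple of lines; a minimal sketch, for an AR(1) process with persistence rho and innovation standard deviation se (the same parameter names as in the codes below):

rho   = 0.80;                  % persistence of the AR(1) process
se    = 0.05;                  % standard deviation of the innovation
p     = (1+rho)/2;             % probability of remaining in the same state
a     = se/sqrt(1-rho*rho);    % absolute value of the two states
agrid = [-a a];                % discrete states
PI    = [p 1-p;1-p p];         % transition matrix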

Gaussian-quadrature approach to discretization: Tauchen and Hussey [1991] provide a simple way to discretize VAR processes relying on Gaussian quadrature. This note will only present the case of an AR(1) process of the form⁴
$$s_{t+1} = \rho s_t + (1-\rho)\overline{s} + \varepsilon_{t+1}$$
where $\varepsilon_{t+1} \rightsquigarrow \mathcal{N}(0, \sigma_\varepsilon^2)$. This implies that
$$\int \frac{1}{\sigma_\varepsilon\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{s_{t+1} - \rho s_t - (1-\rho)\overline{s}}{\sigma_\varepsilon}\right)^2\right\} ds_{t+1} = \int f(s_{t+1}|s_t)\, ds_{t+1} = 1$$
which illustrates the fact that $s$ is a continuous random variable. Tauchen and Hussey [1991] propose to replace the integral by
$$\int \frac{f(s_{t+1}|s_t)}{f(s_{t+1}|\overline{s})}\, f(s_{t+1}|\overline{s})\, ds_{t+1} \equiv \int \Phi(s_{t+1}; s_t, \overline{s})\, f(s_{t+1}|\overline{s})\, ds_{t+1} = 1$$
where $f(s_{t+1}|\overline{s})$ denotes the density of $s_{t+1}$ conditional on the fact that $s_t = \overline{s}$ (in fact the unconditional density function), which in our case implies that
$$\Phi(s_{t+1}; s_t, \overline{s}) \equiv \frac{f(s_{t+1}|s_t)}{f(s_{t+1}|\overline{s})} = \exp\left\{-\frac{1}{2}\left[\left(\frac{s_{t+1} - \rho s_t - (1-\rho)\overline{s}}{\sigma_\varepsilon}\right)^2 - \left(\frac{s_{t+1} - \overline{s}}{\sigma_\varepsilon}\right)^2\right]\right\}$$
We can then use the standard linear transformation $z_t = (s_t - \overline{s})/(\sigma_\varepsilon \sqrt{2})$ to get
$$\int \frac{1}{\sqrt{\pi}} \exp\left\{-\left(z_{t+1} - \rho z_t\right)^2 + z_{t+1}^2\right\} \exp\left\{-z_{t+1}^2\right\} dz_{t+1} = 1$$
for which we can use a Gauss-Hermite quadrature. Assume then that we have the quadrature nodes $z_j$ and weights $\omega_j$, $j = 1, \ldots, n$; the quadrature leads to the formula
$$\frac{1}{\sqrt{\pi}} \sum_{j=1}^{n} \omega_j \Phi(z_j; z_i, \overline{s}) \simeq 1$$
In other words, we might interpret the quantity $\omega_j \Phi(z_j; z_i, \overline{s})/\sqrt{\pi}$ as an estimate $\widehat{\pi}_{ij} \equiv \operatorname{Prob}(s_{t+1} = s_j | s_t = s_i)$ of the transition probability from state $i$ to state $j$. However, it is important to remember that the quadrature is just an approximation, such that $\sum_{j=1}^{n} \widehat{\pi}_{ij} = 1$ will generally not hold exactly. Tauchen and Hussey therefore propose the following modification:
$$\widehat{\pi}_{ij} = \frac{\omega_j \Phi(z_j; z_i, \overline{s})}{\sqrt{\pi}\,\widehat{s}_i} \quad \text{where} \quad \widehat{s}_i = \frac{1}{\sqrt{\pi}} \sum_{j=1}^{n} \omega_j \Phi(z_j; z_i, \overline{s})$$

⁴In their article, Tauchen and Hussey [1991] consider more general processes.

Matlab Code: Tauchen-Hussey's procedure

function [s,p]=tauch_hussey(xbar,rho,sigma,n)
% xbar  : mean of the process
% rho   : persistence parameter
% sigma : volatility
% n     : number of nodes
% returns the states s and the transition probabilities p
[xx,wx] = gauss_herm(n);          % Gauss-Hermite nodes and weights
s       = sqrt(2)*sigma*xx+xbar;  % discrete states
x       = xx(:,ones(n,1));        % current state nodes (constant across columns)
z       = x';                     % next period state nodes (constant across rows)
w       = wx(:,ones(n,1))';       % quadrature weights attached to next period nodes
% computation of the transition probabilities
p       = (exp(z.*z-(z-rho*x).*(z-rho*x)).*w)./sqrt(pi);
sx      = sum(p,2);               % row sums
p       = p./sx(:,ones(n,1));     % normalization: each row sums to one
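The routine gauss_herm, which returns the nodes and weights of the Gauss-Hermite quadrature, is not reproduced in the notes. A minimal sketch of one possible implementation, based on the standard Golub-Welsch eigenvalue method for the weight function $e^{-x^2}$ (the function name and signature simply follow the call above):

function [x,w] = gauss_herm(n)
% Gauss-Hermite nodes and weights for the weight function exp(-x^2),
% obtained from the eigenvalues/eigenvectors of the Jacobi matrix.
b       = sqrt((1:n-1)'/2);          % off-diagonal elements of the Jacobi matrix
J       = diag(b,1)+diag(b,-1);      % symmetric tridiagonal Jacobi matrix
[V,D]   = eig(J);
[x,i]   = sort(diag(D));             % nodes = eigenvalues, sorted in increasing order
w       = sqrt(pi)*(V(1,i)'.^2);     % weights from the first components of the eigenvectors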

3.3 Value Iteration

As in the deterministic case, the convergence of the simple value function iteration procedure is ensured by the contraction mapping theorem. This is however a bit more subtle than that, as we have to deal with the convergence of a probability measure, which goes far beyond this introduction to dynamic programming.⁵ The algorithm is basically the same as in the deterministic case, up to the delicate point that expectations have to be computed at each iteration. It writes as follows:
1. Decide on a grid, $\mathcal{X}$, of admissible values for the state variable $x$,
$$\mathcal{X} = \{x_1, \ldots, x_N\}$$
and for the shocks, $s$,
$$\mathcal{S} = \{s_1, \ldots, s_M\}$$
together with the transition matrix $\Pi = (\pi_{ij})$. Formulate an initial guess for the value function $V_0(x)$ and choose a stopping criterion $\varepsilon > 0$.
2. For each $x_\ell \in \mathcal{X}$, $\ell = 1, \ldots, N$, and $s_k \in \mathcal{S}$, $k = 1, \ldots, M$, compute
$$V_{i+1}(x_\ell, s_k) = \max_{x' \in \mathcal{X}} \; u(y(x_\ell, s_k, x'), x_\ell, s_k) + \beta \sum_{j=1}^{M} \pi_{kj} V_i(x', s_j)$$
3. If $\|V_{i+1}(x, s) - V_i(x, s)\| < \varepsilon$, go to the next step; otherwise go back to 2.
4. Compute the final solution as
$$y^\star(x, s) = y(x, x'(x, s), s)$$

⁵The interested reader should refer to Lucas et al. [1989], chapter 9.
As in the deterministic case, we will illustrate the method relying on the optimal growth model, with
$$u(c) = \frac{c^{1-\sigma}-1}{1-\sigma}$$
and
$$k' = \exp(a)k^\alpha - c + (1-\delta)k$$
where $a' = \rho a + \varepsilon'$. Then the Bellman equation writes
$$V(k, a) = \max_{c} \; \frac{c^{1-\sigma}-1}{1-\sigma} + \beta \int V(k', a')\, d\mu(a'|a)$$
From the law of motion of capital we can determine consumption as
$$c = \exp(a)k^\alpha + (1-\delta)k - k'$$
such that, plugging this result into the Bellman equation, we have
$$V(k, a) = \max_{k'} \; \frac{\left(\exp(a)k^\alpha + (1-\delta)k - k'\right)^{1-\sigma}-1}{1-\sigma} + \beta \int V(k', a')\, d\mu(a'|a)$$
A first problem that we encounter is that we would like to be able to evaluate the integral involved by the rational expectation. We therefore have to discretize the shock. Here, we will consider that the technology shock can be accurately approximated by a 2-state Markov chain, such that $a$ can take on 2 values $a_1$ and $a_2$ ($a_1 < a_2$). We will also assume that the transition matrix is symmetric, such that
$$\Pi = \begin{pmatrix} \pi & 1-\pi \\ 1-\pi & \pi \end{pmatrix}$$
$a_1$, $a_2$ and $\pi$ are selected such that the process reproduces the conditional first and second order moments of the AR(1) process:
First order moments
$$\pi a_1 + (1-\pi) a_2 = \rho a_1$$
$$(1-\pi) a_1 + \pi a_2 = \rho a_2$$
Second order moments
$$\pi a_1^2 + (1-\pi) a_2^2 - (\rho a_1)^2 = \sigma_\varepsilon^2$$
$$(1-\pi) a_1^2 + \pi a_2^2 - (\rho a_2)^2 = \sigma_\varepsilon^2$$
From the first two equations we get $a_1 = -a_2$ and $\pi = (1+\rho)/2$. Plugging these last two results into the two last equations, we get $a_1 = -\sqrt{\sigma_\varepsilon^2/(1-\rho^2)}$. Hence, we will actually work with a value function of the form
$$V(k, a_k) = \max_{c} \; \frac{c^{1-\sigma}-1}{1-\sigma} + \beta \sum_{j=1}^{2} \pi_{kj} V(k', a_j')$$
Now, let us define a grid of $N$ feasible values for $k$ such that we have
$$\mathcal{K} = \{k_1, \ldots, k_N\}$$
and an initial value function $V_0(k)$, that is a vector of $N$ numbers that relate each $k_\ell$ to a value. Note that this may be anything we want, as we know by the contraction mapping theorem that the algorithm will converge, but if we want it to converge fast enough it may be a good idea to impose a good initial guess. Finally, we need a stopping criterion.
In Figure 3, we report the value function and the decision rules obtained for the stochastic optimal growth model with $\alpha = 0.3$, $\beta = 0.96$, $\delta = 0.1$, $\sigma = 2$, $\rho = 0.8$ and $\sigma_\varepsilon = 0.05$. The grid for the capital stock is composed of 2000 data points ranging from 25% below to 25% above the deterministic steady state.

Figure 3: Stochastic OGM (Decision rules, Value iteration)
[Three panels plotting the value function, next period capital and consumption against the capital stock $k_t$.]

Matlab Code: Value iteration (Stochastic OGM)

% Parameters of the economy
sigma   = 2.00;    % utility parameter
delta   = 0.10;    % depreciation rate
beta    = 0.96;    % discount factor
alpha   = 0.30;    % capital elasticity of output
rho     = 0.80;    % persistence of the shock
se      = 0.05;    % volatility of the shock
% Parameters of the algorithm
dev     = 0.75;    % maximal deviation from steady state
nbk     = 2000;    % number of data points in the grid
crit    = 1;       % initial convergence criterion
tol     = 1e-6;    % tolerance parameter
% Shocks
p       = (1+rho)/2;
a       = se/sqrt(1-rho*rho);
PI      = [p 1-p;1-p p];
agrid   = [-a a];
nba     = length(agrid);
% Setup the grid
ks      = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));
kmin    = (1-dev)*ks;                % lower bound on the grid
kmax    = (1+dev)*ks;                % upper bound on the grid
kgrid   = linspace(kmin,kmax,nbk)';  % builds the grid
% Initial conditions
v       = zeros(nbk,nba);   % value function
tv      = zeros(nbk,nba);   % updated value function
dr      = zeros(nbk,nba);   % decision rule (will contain indices)
% Main loop
iter    = 1;
while crit>tol
   for j=1:nba
      for i=1:nbk
         % compute indexes for which consumption is positive
         imax = sum(kgrid<exp(agrid(j))*kgrid(i)^alpha+(1-delta)*kgrid(i));
         % consumption and utility
         c    = exp(agrid(j))*kgrid(i)^alpha+(1-delta)*kgrid(i)-kgrid(1:imax);
         util = (c.^(1-sigma)-1)/(1-sigma);
         % expected value function
         EV   = v(1:imax,:)*PI(j,:)';
         [tv(i,j),dr(i,j)] = max(util+beta*EV);
      end
   end
   crit = max(max(abs(tv-v)));   % compute convergence criterion
   v    = tv;                    % update the value function
   fprintf('Iteration: %3d\tCrit: %g\n',iter,crit)
   iter = iter+1;
end
% Output
kpgrid  = kgrid(dr);                                    % next period capital stock
ygrid   = kron(exp(agrid),kgrid.^alpha);                % output
cgrid   = ygrid+(1-delta)*repmat(kgrid,1,nba)-kpgrid;   % consumption

3.4 Policy iterations

As in the deterministic case, we may want to accelerate the simple value iteration using Howard improvement. The stochastic case does not differ that much from the deterministic case, apart from the fact that we now have to deal with a different decision rule for each state of the shock, which implies the computation of different Q matrices. The algorithm may be described as follows:
1. Set an initial feasible set of decision rules for the control variable, $y = f_0(x, s_k)$, $k = 1, \ldots, M$, and compute the value associated with this guess, assuming that this rule is operative forever, taking care of the fact that $x_{t+1} = h(x_t, y_t, s_t) = h(x_t, f_i(x_t, s_t), s_t)$ with $i = 0$. Set a stopping criterion $\varepsilon > 0$.


2. Find a new policy rule $y = f_{i+1}(x, s_k)$, $k = 1, \ldots, M$, such that
$$f_{i+1}(x, s_k) \in \operatorname*{argmax}_{y} \; u(y, x, s_k) + \beta \sum_{j=1}^{M} \pi_{kj} V(x', s_j)$$
with $x' = h(x, y, s_k)$.
3. Check whether $\|f_{i+1}(x, s) - f_i(x, s)\| < \varepsilon$; if yes then stop, otherwise go back to 2.
When computing the value function we actually have to solve a linear system of the form
$$V_{i+1}(x_\ell, s_k) = u(f_{i+1}(x_\ell, s_k), x_\ell, s_k) + \beta \sum_{j=1}^{M} \pi_{kj} V_{i+1}\!\left(h(x_\ell, f_{i+1}(x_\ell, s_k), s_k), s_j\right)$$
for $V_{i+1}(x_\ell, s_k)$ (for all $x_\ell \in \mathcal{X}$ and $s_k \in \mathcal{S}$), which may be rewritten as
$$V_{i+1}(x_\ell, s_k) = u(f_{i+1}(x_\ell, s_k), x_\ell, s_k) + \beta\, \pi_{k\cdot}\, Q\, V_{i+1}(x_\ell, \cdot)$$
where, for each $s_k$, $Q$ is an $(N \times N)$ matrix
$$Q_{\ell j} = \begin{cases} 1 & \text{if } x' \equiv h(f_{i+1}(x_\ell, s_k), x_\ell) = x_j \\ 0 & \text{otherwise} \end{cases} \quad x_\ell \in \mathcal{X}$$
Note that although it is a big matrix, $Q$ is sparse, which can be exploited in solving the system, to get
$$V_{i+1}(x, s) = (I - \beta Q)^{-1} u(f_{i+1}(x, s), x, s)$$
We apply this algorithm to the same optimal growth model as in the previous section.
Matlab Code: Policy iteration (Stochastic OGM)

% Parameters of the economy
sigma   = 2.00;    % utility parameter
delta   = 0.10;    % depreciation rate
beta    = 0.96;    % discount factor
alpha   = 0.30;    % capital elasticity of output
rho     = 0.80;    % persistence of the shock
se      = 0.05;    % volatility of the shock
% Parameters of the algorithm
dev     = 0.75;    % maximal deviation from steady state
nbk     = 2000;    % number of data points in the grid
crit    = 1;       % initial convergence criterion
epsi    = 1e-6;    % tolerance parameter
% Shocks
p       = (1+rho)/2;
a       = se/sqrt(1-rho*rho);
PI      = [p 1-p;1-p p];
agrid   = exp([-a a]);
nba     = length(agrid);
% Setup the grid
ks      = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));
kmin    = (1-dev)*ks;                % lower bound on the grid
kmax    = (1+dev)*ks;                % upper bound on the grid
kgrid   = linspace(kmin,kmax,nbk)';  % builds the grid
% Initial conditions
v       = zeros(nbk,nba);   % value function
v1      = zeros(nbk,nba);   % updated value function
kp0     = zeros(nbk,nba);   % previous decision rule
dr      = zeros(nbk,nba);   % decision rule (will contain indices)
% Main loop
while crit>epsi
   for j=1:nba
      for i=1:nbk
         % compute indexes for which consumption is positive
         imax = sum(kgrid<agrid(j)*kgrid(i)^alpha+(1-delta)*kgrid(i));
         % consumption and utility
         c    = agrid(j)*kgrid(i)^alpha+(1-delta)*kgrid(i)-kgrid(1:imax);
         util = (c.^(1-sigma)-1)/(1-sigma);
         % find new policy rule
         EV   = v(1:imax,:)*PI(j,:)';
         [v1(i,j),dr(i,j)] = max(util+beta*EV);
      end
   end
   % decision rules and utility
   kp   = kgrid(dr);
   c    = kron(agrid,kgrid.^alpha)+(1-delta)*repmat(kgrid,1,nba)-kp;
   util = (c.^(1-sigma)-1)/(1-sigma);
   % build the Q matrix and update the value
   Q    = sparse(nbk*nba,nbk*nba);
   for j=1:nba
      Q0    = sparse(nbk,nbk);
      J     = sub2ind([nbk nbk],(1:nbk)',dr(:,j));
      Q0(J) = 1;
      Q((j-1)*nbk+1:j*nbk,:) = kron(PI(j,:),Q0);
   end
   Tv   = (speye(nbk*nba)-beta*Q)\util(:);
   v    = reshape(Tv,nbk,nba);
   crit = max(max(abs(kp-kp0)));
   kp0  = kp;
end

As in the deterministic case, the modified k-step iterations can be used to update the value function rather than solving the big linear system.
The endogenous grid method is as easily adapted to the stochastic case as the previous algorithms; only the Matlab code is reported.
Matlab Code: Endogenous Grid Method (Stochastic OGM)

clear all
% Parameters of the economy
sigma   = 2.00;    % utility parameter
delta   = 0.10;    % depreciation rate
beta    = 0.96;    % discount factor
alpha   = 0.30;    % capital elasticity of output
rho     = 0.80;    % persistence of the shock
se      = 0.05;    % volatility of the shock
% Parameters of the algorithm
dev     = 0.75;       % maximal deviation from steady state
nbk     = 2000;       % number of data points in the grid
crit    = 1;          % initial convergence criterion
tol     = 1e-6;       % tolerance parameter
method  = 'linear';   % interpolation method
% Shocks
a       = se/sqrt(1-rho*rho);
agrid   = exp([-a a]);
nba     = length(agrid);
p       = (1+rho)/2;
PI      = [p 1-p;1-p p];
% Setup the grid (for next period capital)
kss     = ((1-beta*(1-delta))/(alpha*beta))^(1/(alpha-1));
kmin    = (1-dev)*kss;               % lower bound on the grid
kmax    = (1+dev)*kss;               % upper bound on the grid
kpgrid  = linspace(kmin,kmax,nbk)';  % builds the grid
% Initial values
kgrid   = kss*ones(nbk,nba);         % initial guess for current capital
v0      = 1.5*(kpgrid.^alpha+(1-delta)*kpgrid);
v0      = repmat((v0.^(1-sigma)-1)/((1-sigma)*(1-beta)),1,nba);
v1      = v0;
% Main loop
while crit>tol
   dv0 = diff_value(kpgrid,v0,method);   % derivative of the value function
   c   = (beta*(dv0*PI')).^(-1/sigma);   % consumption from the first order condition
   u   = (c.^(1-sigma)-1)/(1-sigma);
   tmp = u+beta*(v0*PI');
   for i=1:nba
      % solve for current capital given the shock (Newton iterations)
      crk = 1;
      k0  = kgrid(:,i);
      while crk>tol
         f0  = agrid(i)*k0.^alpha+(1-delta)*k0-kpgrid-c(:,i);
         df0 = alpha*agrid(i)*k0.^(alpha-1)+(1-delta);
         k1  = k0-f0./df0;
         crk = max(abs(k0-k1));
         k0  = k1;
      end
      kgrid(:,i) = k0;
      v1(:,i)    = interp1(k0,tmp(:,i),kpgrid,method);
   end
   crit = max(max(abs(v0-v1)));
   v0   = v1;
end
% Output
ygrid   = kron(agrid,ones(nbk,1)).*kgrid.^alpha;        % output
cgrid   = ygrid+(1-delta)*kgrid-repmat(kpgrid,1,nba);   % consumption
igrid   = ygrid-cgrid;                                  % investment


References

Bertsekas, D., Dynamic Programming and Stochastic Control, New York: Academic Press, 1976.

Carroll, C.D., The Method of Endogenous Gridpoints for Solving Dynamic Stochastic Optimization Problems, Economics Letters, 2006, 91 (3), 312-320.

Judd, K. and A. Solnick, Numerical Dynamic Programming with Shape-Preserving Splines, Manuscript, Hoover Institution, 1994.

Judd, K.L., Numerical Methods in Economics, Cambridge, Massachusetts: MIT Press, 1998.

Lucas, R., N. Stokey, and E. Prescott, Recursive Methods in Economic Dynamics, Cambridge (MA): Harvard University Press, 1989.

Tauchen, G. and R. Hussey, Quadrature Based Methods for Obtaining Approximate Solutions to Nonlinear Asset Pricing Models, Econometrica, 1991, 59 (2), 371-396.
