Lecture Notes ESS101 2023
Sebastien Gros
with
Bo Egardt
2023
L(q, q̇) = T(q, q̇) − V(q)

d/dt ∂L/∂q̇ − ∂L/∂q = Q
θ̂ = arg max_θ P[θ | y] = arg max_θ P[y | θ] P[θ] / ∫ P[y | θ] P[θ] dθ
PREFACE
These Lecture Notes were originally prepared in 2018 by Sebastien Gros for a
significantly revised course Modelling and Simulation. Since 2020, the
following changes have been made to the original manuscript:
• The chapter on system identification has been largely rewritten, and later
revised. The probabilistic aspects have been de-emphasized in order to
make the presentation more easily accessible. The “Bayesian approach”
has been omitted, and Maximum Likelihood estimation is for optional
reading. The focus is now shifted towards Prediction Error Methods, in-
cluding Least-squares, and a fairly thorough introductory section is de-
voted to simple curve-fitting. The section on practical aspects of system
identification is also new.
• Chapter 2 on physics-based modelling is new.
• In Chapter 8, Collocation methods are for optional reading.
3 Lagrange mechanics 51
3.1 Kinetic Energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.2 Potential Energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.3 Lagrange Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.4 External forces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.5 Constrained Lagrange Mechanics . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.5.1 Handling Models from Constrained Lagrange . . . . . . . . . . . . . . . 80
3.6 Consistency conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.7 Constraints drift . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4 Newton Method 89
4.1 Basic idea of Newton method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.2 Convergence of the Newton method . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.2.1 Convergence rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.2.2 Reduced Newton steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.3 Implicit Function Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.4 Jacobian Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
4.5 Newton Methods for Unconstrained Optimization . . . . . . . . . . . . . . . . 100
4.5.1 Gauss-Newton Hessian approximation . . . . . . . . . . . . . . . . . . . 102
4.5.2 Convex optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5 System Identification (SysId) 105
5.1 Introductory example: fitting a function to data . . . . . . . . . . . . . . . . . . 106
5.2 Parameter Estimation for Linear Dynamic Systems . . . . . . . . . . . . . . . . 115
5.2.1 A special case: linear regression . . . . . . . . . . . . . . . . . . . . . . . 117
5.2.2 Predictions for linear black-box models . . . . . . . . . . . . . . . . . . . 120
5.2.3 Prediction Error Methods (PEM) . . . . . . . . . . . . . . . . . . . . . . . 123
5.2.4 Properties of the PEM estimate . . . . . . . . . . . . . . . . . . . . . . . . 125
5.3 System identification in practice . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
5.3.1 Design of experimental conditions . . . . . . . . . . . . . . . . . . . . . . 127
5.3.2 Pretreatment of data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
5.3.3 Model structure selection . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
5.3.4 Model validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
5.4 The maximum likelihood method* . . . . . . . . . . . . . . . . . . . . . . . . . . 133
8.5 RK methods for implicit DAEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
8.5.1 RK method for semi-explicit DAE models . . . . . . . . . . . . . . . . . . 196
1 NOTATION & BACKGROUND MATERIAL
Most often, we will denote vectors in bold math format, i.e. a ∈ Rn. Scalars will be denoted
in the standard math format, i.e. a ∈ R. Matrices will be denoted using capital letters, i.e.
A ∈ Rn×m. There will be a few exceptions to these rules.
We will use subscripts to denote the elements of vectors and matrices, i.e. a_i ∈ R will be the
i:th element of vector a, and A_ij the element in the i:th row and j:th column of matrix A.
Generic functions will obey the same notation rules. We will reserve the capital letter I for
the identity matrix.
But any operator ρ taking an element of the vector space under consideration into R+, and
having the following properties, can serve as a norm:
• ρ(x) = 0 if and only if x = 0
• ρ(αx) = |α| ρ(x) for any scalar α
• ρ(x + y) ≤ ρ(x) + ρ(y) (the triangle inequality)
There are several norms that differ from (1.1), e.g. the 1-norm

‖x‖₁ = Σ_{k=1}^{n} |x_k|   (1.2)
Note that since all norms are equivalent (i.e. boundedness in one norm implies bounded-
ness in the others), we often omit (when irrelevant) which type of norm we use.
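The 1-norm (1.2) and the equivalence of norms can be illustrated with a small numpy sketch (the vector below is an arbitrary choice; the bound ‖x‖₂ ≤ ‖x‖₁ ≤ √n ‖x‖₂ is the standard equivalence on Rⁿ):

```python
import numpy as np

x = np.array([3.0, -4.0, 12.0])
n = x.size

norm1 = np.sum(np.abs(x))          # 1-norm, as in (1.2)
norm2 = np.sqrt(np.sum(x**2))      # Euclidean 2-norm

# Norm equivalence on R^n: ||x||_2 <= ||x||_1 <= sqrt(n) * ||x||_2
assert norm2 <= norm1 <= np.sqrt(n) * norm2
print(norm1, norm2)   # -> 19.0 13.0
```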
We will use the notation

⟨·, ·⟩   (1.4)

to denote scalar products on any vector space. On Rn, the scalar product between two
vectors x, y ∈ Rn reads as:

⟨x, y⟩ = x⊤y   (1.5)
1.2 SOME BASICS FROM LINEAR ALGEBRA
Let us review here some basic principles of linear algebra that can be useful in this course.
• The eigenvalues λ and eigenvectors v of a square matrix A satisfy

Av = λv   (1.6)
i.e. the eigenvectors are transformed into scaled versions of themselves by the ap-
plication of matrix A. The eigenvalues can be computed by solving the polynomial
equation
det (A − λI ) = 0 (1.7)
• For a square matrix A and any vector x, the quadratic form is a scalar:

x⊤Ax ∈ R   (1.8)

The matrix A is said to be positive semi-definite if

x⊤Ax ≥ 0   (1.9)

for any x.
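These properties are easy to verify numerically; a minimal numpy sketch checking (1.6) and (1.9) on a small symmetric example matrix (an arbitrary illustrative choice):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # symmetric example matrix

# Eigenvalues/eigenvectors: A v = lambda v, cf. (1.6)
lam, V = np.linalg.eig(A)
for k in range(2):
    assert np.allclose(A @ V[:, k], lam[k] * V[:, k])

# A has nonnegative eigenvalues (1 and 3), so x^T A x >= 0
# for any x, cf. (1.9)
rng = np.random.default_rng(0)
for _ in range(100):
    x = rng.standard_normal(2)
    assert x @ A @ x >= 0.0
```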
• The Jacobian of a function f : R^m → R^n with respect to its argument x is a R^{n×m} matrix given by:

∂f/∂x = [ ∂f₁/∂x₁ … ∂f₁/∂x_m
             ⋮           ⋮
          ∂f_n/∂x₁ … ∂f_n/∂x_m ]   (1.11)
• It is useful to have the Jacobians of some matrix functions readily available, in partic-
ular for the Lagrange modelling chapter. In the following expressions, A is a matrix:
∂(Ax)/∂x = A   (1.12)

∂(x⊤Ax)/∂x = x⊤(A + A⊤)   (1.13)
• It will be convenient at times to use the gradient operator ∇, which is essentially
the transpose of the Jacobian operator, i.e.

∇_x f = (∂f/∂x)⊤   (1.14)
The gradient is most often used for scalar functions f ∈ R, but the notion of gradient
can be readily applied to all functions.
• We will need a few times to distinguish between total and partial derivatives. The
notion can be tricky and even ambiguous sometimes, but let us recall here the basic
principle. Consider a function:
f(x, y)   (1.17)
where y = g (x). Then the partial derivative ignores that y is intrinsically a function of
x, i.e.
∂f(x, y)/∂x   (1.18)
disregards this dependency, while the total derivative takes it into account:

df(x, y)/dx = ∂f(x, y)/∂x + ∂f(x, y)/∂y · ∂y/∂x = ∂f(x, y)/∂x + ∂f(x, y)/∂y · ∂g/∂x   (1.19)
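The distinction can be checked symbolically, e.g. with sympy; the functions f(x, y) = x²y and g(x) = sin(x) below are illustrative choices:

```python
import sympy as sp

x, y = sp.symbols('x y')

f = x**2 * y           # example f(x, y)
g = sp.sin(x)          # y = g(x)

# Partial derivative (1.18): y treated as independent of x
df_dx_partial = sp.diff(f, x)                     # 2*x*y

# Total derivative (1.19): the chain rule adds the y-dependence
df_dx_total = df_dx_partial + sp.diff(f, y) * sp.diff(g, x)

# Cross-check: substitute y = g(x) first, then differentiate
check = sp.diff(f.subs(y, g), x)
assert sp.simplify(df_dx_total.subs(y, g) - check) == 0
```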
1.4 ORDINARY DIFFERENTIAL EQUATIONS (ODE)
Ordinary Differential Equations (ODEs) will play a crucial role in this course. ODEs are a
way to relate the outputs y of a system to the inputs u acting on the system. Generally
speaking, an ODE reads as
ϕ(y^(m), y^(m−1), …, y, u^(m−1), u^(m−2), …, u) = 0   (1.20)
where we use the notation

y^(k) = d^k y / dt^k.   (1.21)
A trivial example of such an equation is the motion of a mass m attached to a spring of
constant K , and subject to an external force u and viscous friction of parameter ξ. The
ODE describing such a system reads as:
ϕ = m ÿ + ξ ẏ + K y − u = 0 (1.22)
Most often we prefer to work with ODEs in their state-space form rather than in the form
(1.20). An ODE in the state-space is most often written as:
ẋ = f (x, u) (1.23)
for some state x ∈ Rn . E.g. the spring-mass example (1.22) in its state-space form reads as:
ẋ = [ x₂ ; (u − ξx₂ − Kx₁)/m ],   x = [ y ; ẏ ]   (1.24)
A complete state-space model is then made of the dynamics (1.23) and an output function
y = h (x, u) telling us what we can “measure" or alternatively what we “care about" in the
system. The full state-space model then reads as:
ẋ = f (x, u) (1.25a)
y = h (x, u) (1.25b)
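As a quick sanity check, the spring-mass model (1.24) can be simulated numerically; a minimal scipy sketch (the parameter values are illustrative) verifies that, with positive damping, the mass settles at the static equilibrium y = u/K:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Spring-mass-damper (1.22) in state-space form (1.24):
# x1 = y (position), x2 = y' (velocity)
m, xi, K = 1.0, 0.5, 2.0
u = 1.0   # constant external force

def f(t, x):
    return [x[1], (u - xi * x[1] - K * x[0]) / m]

sol = solve_ivp(f, (0.0, 50.0), [0.0, 0.0], rtol=1e-8, atol=1e-10)

# With positive damping, the trajectory settles at y = u/K, y' = 0
assert abs(sol.y[0, -1] - u / K) < 1e-3
assert abs(sol.y[1, -1]) < 1e-3
```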
An ODE can also be given in the fully implicit form:

F(ẋ, x, u) = 0   (1.26)

i.e. it delivers the derivatives of the state not via a function f that can be evaluated
directly, but implicitly via the equations (1.26).
1.5 LINEAR ODES
A very important class of models is the class of linear models, where functions f , h are
linear. We can then rewrite (1.25) in the form:
ẋ = Ax + Bu (1.27a)
y = C x + Du (1.27b)
for some matrices A ∈ Rnx ×nx , B ∈ Rnx ×nu , C ∈ Rn y ×nx , D ∈ Rn y ×nu . Linear systems have the
superposition property, i.e. if x1(t) and x2(t) are solutions of (1.27) for the input profiles
u1(t), u2(t) respectively, then α1 x1(t) + α2 x2(t) is the solution corresponding to the input
profile α1 u1(t) + α2 u2(t). Linear ODEs are “nice" to work with because one can treat them
using the powerful theorems provided by functional analysis, which, among other
things, give us the Fourier and Laplace transforms.
When the matrices A, B, C, D are fixed, system (1.27) is said to be Linear Time Invariant (LTI).
If any of the matrices A(t), B(t), C(t), D(t) is a function of time, system (1.27) is said to be
Linear Time Varying (LTV).
1.6 LINEARIZATION
Linearization consists in forming locally valid linear approximations of nonlinear ODEs
(1.25). The linearization of an ODE is nothing more than an application of the Taylor ex-
pansion, to a first order. I.e. the functions f , h are replaced by their first-order approxima-
tion with respect to all arguments. In that context, we will consider deviations in the states
and inputs from a given reference, and establish how this deviation will evolve in time, to
a first-order approximation. Let us build this step-by-step. Consider first the first-order
expansion of function f. Consider a reference trajectory x(t), solution of (1.25a) for a given
reference input profile u(t) and reference initial conditions x0. Consider then deviations Δu(t)
and Δx0 in the input profile and/or initial conditions, and let Δx(t) be the resulting deviation
in the ODE trajectories. We observe that:

ẋ(t) + Δẋ(t) = f(x(t) + Δx(t), u(t) + Δu(t))   (1.28)

must hold. For small deviations Δu(t), Δx(t), we can form the first-order Taylor expansion
of the right-hand side of (1.28), obtaining:
ẋ(t) + Δẋ(t) = f(x(t), u(t)) + ∂f/∂x|_{x(t),u(t)} Δx(t) + ∂f/∂u|_{x(t),u(t)} Δu(t) + O(‖Δx‖², ‖Δu‖²)   (1.29a)

y(t) + Δy(t) = h(x(t), u(t)) + ∂h/∂x|_{x(t),u(t)} Δx(t) + ∂h/∂u|_{x(t),u(t)} Δu(t) + O(‖Δx‖², ‖Δu‖²)   (1.29b)
Since ẋ(t) = f(x(t), u(t)), we equivalently get:

Δẋ(t) ≈ A(t)Δx(t) + B(t)Δu(t)   (1.30a)

Δy(t) ≈ C(t)Δx(t) + D(t)Δu(t)   (1.30b)
for the time-varying matrices:
A(t) = ∂f/∂x|_{x(t),u(t)},   B(t) = ∂f/∂u|_{x(t),u(t)}   (1.31a)

C(t) = ∂h/∂x|_{x(t),u(t)},   D(t) = ∂h/∂u|_{x(t),u(t)}   (1.31b)
A very commonly used special case of (1.30)-(1.31) is when the reference inputs and trajec-
tories u(t), x(t) are constant, in which case the matrices A(t), B(t), C(t), D(t) are also constant
(because they come from Jacobians evaluated at fixed arguments). Such trajectories are in
fact stationary points of the dynamics, and can be found by solving:

f(x₀, u₀) = 0   (1.32)

and using x(t) = x₀, u(t) = u₀ such that ẋ = 0. We can then use this reference trajectory in
(1.31).
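The Jacobians (1.31) can also be approximated numerically by finite differences, which is useful when analytical expressions are cumbersome. A minimal sketch, using an illustrative pendulum-like dynamics (not one of the examples in these notes):

```python
import numpy as np

# Illustrative nonlinear dynamics with a stationary point at the origin
def f(x, u):
    return np.array([x[1], -np.sin(x[0]) - 0.1 * x[1] + u])

# Stationary point: f(x0, u0) = 0, cf. (1.32)
x0, u0 = np.zeros(2), 0.0
assert np.allclose(f(x0, u0), 0.0)

# Jacobians (1.31) by central finite differences
eps = 1e-6
A = np.column_stack([(f(x0 + eps * e, u0) - f(x0 - eps * e, u0)) / (2 * eps)
                     for e in np.eye(2)])
B = ((f(x0, u0 + eps) - f(x0, u0 - eps)) / (2 * eps)).reshape(2, 1)

# Compare with the analytical linearization at the origin
assert np.allclose(A, [[0.0, 1.0], [-1.0, -0.1]], atol=1e-6)
assert np.allclose(B, [[0.0], [1.0]], atol=1e-6)
```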
1.7 SOLUTION OF ODES
Let us now turn to a crucial question in this course. Consider an ODE in its state-space
form
ẋ = f (x, u) (1.39)
An important question to ask here is “provided the initial conditions x(0) = x0 , is there a
trajectory x(t ) solution of (1.39), and is it unique?".
Before treating that question, let us consider two simple examples showing that it is not
trivial.
Example 1.2 (Finite escape time). Consider the ODE

ẋ = x²,   x(0) = 1   (1.40)

which has

x(t) = 1/(1 − t)   (1.41)
as solution. We illustrate the trajectory x(t ) in the figure below. One can easily observe
from (1.41) that the state x(t ) becomes arbitrarily large as t approaches 1. The trajectories,
in fact, do not exist at and beyond t = 1.
Figure 1.1: Illustration of the solution (1.41). The trajectories reach ∞ in finite time (when
getting close to t = 1).
This simple example illustrates that the solution of an ODE can “explode" in finite time,
and cease to exist beyond a limited time interval.
Example 1.3 (Non-uniqueness of solutions). Consider the ODE
ẋ = √|x|,   x(0) = 0   (1.42)

One can verify that

x(t) = 0 for t ≤ t₀, and x(t) = (t − t₀)²/4 for t > t₀   (1.43)

is an admissible solution for (1.42), regardless of the choice of t₀ ≥ 0. In other words, there
are infinitely many trajectories x(t) (for infinitely many choices of t₀) that are solutions of
(1.42). We illustrate these solutions in the
next Figure.
Figure 1.2: Illustration of the solutions (1.43) for different values of t0 (dashed lines).
Let us introduce two theorems that unpack the two issues (existence and uniqueness) raised in
the examples above. The first theorem deals with the existence of the ODE solution.
Consider the ODE

ẋ = f(x)   (1.44)

If f satisfies the condition¹

‖f(x) − f(y)‖ ≤ c ‖x − y‖,   ∀ x, y   (1.45)

for some finite constant c, then the solution of (1.44) exists and is unique for all t.
¹This property is labelled “Lipschitz continuity".
Let us return to the first example above. One can observe that for

f(x) = x²   (1.46)

the condition (1.45) requires |x² − y²| = |x + y| · |x − y| ≤ c |x − y|. Since the term x + y can
be arbitrarily large, we cannot find a constant c for which (1.45) holds.
Unfortunately, property (1.45) can be difficult to verify for non-trivial ODEs. Let us intro-
duce a second theorem that makes our life easier.
Consider the ODE

ẋ = f(x)   (1.48)

If f is continuously differentiable (i.e. the Jacobian ∂f/∂x exists and is continuous), then
the solution to the ODE exists and is unique on some time interval.
Let us return to the second example above. One can observe that
∂f/∂x = 1/(2√x)   (1.49)
such that f is not differentiable at x = 0. This results in the non-uniqueness of the solution of
(1.42). Note that the first theorem does not apply to the ODE (1.42) either, as the function f
has an “infinite slope" at x = 0, such that condition (1.45) does not hold around x = 0. Note
also that the second theorem is a “weaker" version of the first one: it requires weaker
conditions, but guarantees existence and uniqueness only on some time interval.
Before closing, let us consider the case of linear ODEs. We observe that the theorems above
readily apply to any linear ODE, as linear functions satisfy (1.45) and are continuously
differentiable. That is, linear ODEs always have a unique solution over the time interval
[0, ∞[. The solution may, however, be unbounded, i.e. it can grow forever if the ODE is
unstable, but there is no specific time at which the solution becomes infinite (unlike (1.40)).
We ought to recall here the definition of a “matrix exponential" like e At . Here we simply
extend the series corresponding to the exponential function to matrices. More specifically,
similarly to:
e^t = 1 + t + t²/2! + … = Σ_{k=0}^{∞} t^k/k!   (1.52)
we define:
e^{At} = I + At + (At)²/2! + … = Σ_{k=0}^{∞} (At)^k/k!   (1.53)
Note that here the power A^k of a matrix does not correspond to taking the power of the ma-
trix entries, but rather to multiplying the matrix by itself k times. We additionally observe
that the differentiation rule:
that the differentiation rule:
d/dt e^{at} = a e^{at}   (1.54)
becomes for the matrix exponential function:
d/dt e^{At} = A e^{At}   (1.55)
Note that the matrix exponential function is “expm.m" in Matlab, and does not (necessar-
ily) correspond to the classic, entry-wise exponential function “exp.m".
We can conclude here that LTI ODEs do not need to be “simulated" as one can compute
their solution explicitly from (1.51). For nonlinear ODEs (and some LTV ODEs), however,
the best way of building their trajectories is by using computer-based simulations.
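A small Python sketch illustrates the homogeneous LTI solution x(t) = e^{At} x(0) (scipy provides the matrix exponential as scipy.linalg.expm, mirroring Matlab's expm) together with the differentiation rule (1.55); the matrix A below is an arbitrary stable example:

```python
import numpy as np
from scipy.linalg import expm

# LTI dynamics x' = A x; the homogeneous solution is x(t) = e^{At} x(0)
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])    # eigenvalues -1 and -2
x0 = np.array([1.0, 0.0])
t = 1.5

x_t = expm(A * t) @ x0

# Cross-check via the eigen-decomposition A = V diag(lam) V^{-1},
# which gives e^{At} = V diag(e^{lam t}) V^{-1}
lam, V = np.linalg.eig(A)
x_ref = (V @ np.diag(np.exp(lam * t)) @ np.linalg.inv(V)) @ x0
assert np.allclose(x_t, np.real(x_ref))

# Differentiation rule (1.55): d/dt e^{At} = A e^{At}, checked numerically
h = 1e-6
dexp = (expm(A * (t + h)) - expm(A * (t - h))) / (2 * h)
assert np.allclose(dexp, A @ expm(A * t), atol=1e-6)
```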
where the system input u is described as a discrete sequence u 0 , . . . , u ∞ , and similarly for
the system output y. Similarly to the Laplace transform, the Z -transform allows one to
treat dynamics in the form (1.56) in the “polynomial world".
The Z-transform has a number of useful properties (such as e.g. linearity), among which
the time-shift property Z{y_{k−1}} = z^{−1} Y(z). Applying it to the difference equation (1.56) gives

Y(z) = (Σ_{i=0}^{N_b} b_i z^{−i}) / (1 − Σ_{i=1}^{N_a} a_i z^{−i}) U(z)   (1.60)
Formally, calculations in the Z -domain, using the Z -transform, can equivalently be car-
ried out in the time domain using the shift operator q, defined from q y k = y k+1 or its in-
verse q −1 , satisfying q −1 y k = y k−1 . Using this notation, the system defined by the difference
equation (1.56) can be written
y_k = Σ_{i=0}^{N_b} b_i u_{k−i} + Σ_{i=1}^{N_a} a_i y_{k−i} = Σ_{i=0}^{N_b} b_i q^{−i} u_k + Σ_{i=1}^{N_a} a_i q^{−i} y_k,   (1.61)
or, equivalently,
(1 − Σ_{i=1}^{N_a} a_i q^{−i}) y_k = (Σ_{i=0}^{N_b} b_i q^{−i}) u_k   (1.62)
By introducing polynomials in the (backward) shift operator, defined as
A(q) = 1 − Σ_{i=1}^{N_a} a_i q^{−i},   B(q) = Σ_{i=0}^{N_b} b_i q^{−i},   (1.63)

the model (1.62) can be written compactly as A(q) y_k = B(q) u_k.
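The difference equation (1.61) is straightforward to simulate; a minimal first-order example (the coefficients are an illustrative choice) checks that the step response settles at the static gain B(1)/A(1):

```python
import numpy as np

# Difference equation (1.61): y_k = sum_i b_i u_{k-i} + sum_i a_i y_{k-i}
# Illustrative first-order example: y_k = 0.5*y_{k-1} + 1.0*u_{k-1}
a = [0.5]          # a_1
b = [0.0, 1.0]     # b_0, b_1

N = 100
u = np.ones(N)     # unit step input
y = np.zeros(N)
for k in range(1, N):
    y[k] = sum(bi * u[k - i] for i, bi in enumerate(b) if k - i >= 0) \
         + sum(ai * y[k - i] for i, ai in enumerate(a, start=1) if k - i >= 0)

# Static gain: B(1)/A(1) = (b0 + b1)/(1 - a1) = 1/0.5 = 2
assert abs(y[-1] - 2.0) < 1e-9
```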
1.10 PROBABILITY
In the system identification chapter, some basic concepts from probability theory will be
used. Let us review some basic principles and notations here.
RANDOM VARIABLES. In order to model uncertainties like disturbances and noise, the
concept of a random variable is used. A real, scalar (or univariate) random variable (r.v.) X
is defined by its (Cumulative) Distribution Function (CDF), describing the probability that
X takes a value less than or equal to x:

F_X(x) = P[X ≤ x]
We will occasionally use the extension to vector (multivariate) r.v., defined in an analogous
way.
In most cases, we will use the Probability Density Function (PDF) to characterize the ran-
dom variable. The PDF f X (x) of a continuous r.v. is defined by
F_X(x) = ∫_{−∞}^{x} f_X(y) dy   (1.67)
The PDF has an intuitive interpretation: f X (x) indicates the “relative” likelihood that the
r.v. X takes the value x. For a multivariate random variable X , the PDF f X (x) is a function
from Rn to R.
INDEPENDENCE. Two random variables X and Y with joint PDF f_{X,Y}(x, y) are called
independent if f_{X,Y}(x, y) = f_X(x) · f_Y(y). This definition can be extended to any collection of
random variables, and the term mutual independence is then often used.
In the multivariate case, we will need the concept of the covariance of a random vector with
itself (auto-covariance), and we will refer to the covariance matrix defined as

Σ = E[(X − µ)(X − µ)⊤]
NORMAL DISTRIBUTION. The most familiar and most often used random variable distribution
is the normal or Gaussian distribution. It is characterized by the PDF

f_X(x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)},   (1.71)
where µ is the mean, σ is the standard deviation, and σ2 is the variance of the r.v. X . In
short, this can be written as X ∼ N (µ, σ2 ). Analogously, the multivariate normal (Gaus-
sian) distribution has a PDF
f_X(x) = (1/((2π)^{n/2} (det Σ)^{1/2})) e^{−(1/2)(x−µ)⊤ Σ^{−1} (x−µ)},   (1.72)
where µ (a vector) is the mean and Σ is the covariance matrix. Notation: X ∼ N (µ, Σ).
For a stochastic process {v (t )} with covariance function R v (τ), the spectral density Φv (ω) is
defined as the discrete Fourier transform of its covariance function:
Φ_v(ω) = Σ_{t=−∞}^{∞} R_v(t) e^{−itω}.   (1.74)
The spectral density brings information on the “frequency content” of typical realizations
of the stochastic process, with useful engineering applications. Parseval's formula applied
to stochastic processes shows that the variance (“power") of the stochastic process can
be thought of as the summing-up of all contributions along the frequency axis:

E v(t)² = (1/2π) ∫_{−π}^{π} Φ_v(ω) dω.   (1.75)
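Relation (1.75) can be verified numerically for a process with finitely many nonzero covariances; the MA(1) covariance function below is an illustrative choice:

```python
import numpy as np

# Covariance function of an MA(1) process v(t) = e(t) + c*e(t-1),
# with white-noise variance sigma2 (illustrative values)
c, sigma2 = 0.8, 2.0
R = {0: (1 + c**2) * sigma2, 1: c * sigma2, -1: c * sigma2}

# Spectral density (1.74) evaluated on a grid covering [-pi, pi]
w = np.linspace(-np.pi, np.pi, 20001)
Phi = sum(Rt * np.exp(-1j * t * w) for t, Rt in R.items()).real

# Parseval (1.75): the power equals the average of Phi over one period
dw = w[1] - w[0]
power = np.sum(Phi[:-1]) * dw / (2 * np.pi)

assert abs(power - R[0]) < 1e-9   # E v(t)^2 = R_v(0)
```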
2 BUILDING MODELS FROM PHYSICS
Recall from the introductory chapter that our “favourite” ODE model is the state-space
model (1.25), repeated here for convenience:
ẋ = f (x, u) (2.1a)
y = h (x, u) (2.1b)
In this chapter, we will discuss the process of going from characterizing a system from its
physical properties to determining a useful state-space model of the type (2.1). This pro-
cess, often referred to as physical modelling or first-principles modelling, should be familiar
from e.g. the basic control course. We will nevertheless spend some effort to refresh some
of the basic ideas, and to illustrate these on examples from different domains.
One of the lessons to be learnt in this chapter is that the state-space model (2.1) is not quite
sufficient to model all systems that we are interested in. In particular, we will encounter a
class of more general models, based on differential-algebraic equations or DAE. Such mod-
els will be dealt with in some detail later in the course.
The modelling process can be summarized in three steps:
1. Analyze the system's function and structure
2. Determine basic relations and equations
3. Formulate a model
ANALYZE THE SYSTEM'S FUNCTION AND STRUCTURE. One of the first actions in the mod-
elling process is often to identify how the system can be viewed as a connection of subsys-
tems. The rationale for doing this is that it is usually beneficial to adopt the divide-and-
conquer paradigm, by which the (big) modelling task is divided into smaller tasks, each
dealing with a separate subsystem. When identifying the subsystems, one of the criteria is
that the interactions/connections between subsystems become as simple as possible.
In this first step, it is also important to carry out a basic analysis of the system's function.
This means, for example, identifying which physical mechanisms are important, which
quantities/variables describe these mechanisms, and what their qualitative relations are.
When doing this, domain expertise (or intuition!) is important to decide which
phenomena are important and which can be neglected, which dynamic effects are slow
and which are fast etc.
The result of the first step is typically some type of graph or block diagram that gives an
overview of the system, along with a list of the most important variables. If the system is
decomposed into subsystems, connections between them can have different “semantics”,
depending on context.
Example 2.1 (Combustion engine). In the figure below, a rough sketch of an internal com-
bustion engine is shown. The sketch illustrates the airflow through the throttle valve into
the intake manifold, how the injected fuel is mixed with the air and transported into the
cylinder for combustion, and how the resulting pressure is transformed into mechanical
torque.
Figure 2.1: The workings of an internal combustion engine illustrated by a simple sketch.
Based on this simple sketch, a more systematic description of the different physical pro-
cesses involved is obtained through the block diagram shown below. In the block diagram,
the most important variables that couple the subsystems are also shown. The block dia-
gram with its listed variables would be a useful output from the first step of the modelling
process.
Figure 2.2: A block diagram of the internal combustion engine.
DETERMINE BASIC RELATIONS AND EQUATIONS. In the second step, the basically qualitative
description obtained from the first step is made quantitative. This is done by determining
equations describing the physical mechanisms and relating the variables involved.
This is where you apply knowledge gained in courses on physics, mechanics, electricity,
thermodynamics etc. Two different types of equations can be distinguished:
• Balance equations encode principles of mass balance, energy balance, force balance
etc. These equations typically relate several variables of the same kind, i.e. having the
same units, which is also the case for e.g. Kirchhoff’s voltage and current laws.
• Constitutive relations, on the other hand, describe relations between different kinds
of variables. An example is Ohm’s law, which characterizes the relation between volt-
age and current for an ideal resistor; another example is the ideal gas law, which links
pressure, volume and temperature for an ideal gas.
When forming the equations, it is a good habit to check dimensions, i.e. to make sure that
all terms in an equation have the same physical unit—this is the most basic quality check
you can apply to a model, and it is easily done!
The result from the second step is a collection of equations, some of which may involve
derivatives, i.e. they are differential equations. However, it is likely that a number of
algebraic equations are also obtained.
FORMULATE A MODEL. The last step in the simplified modelling workflow aims at arriv-
ing at a final model. This is accomplished by simplifying all the equations obtained from
the previous step—superfluous variables can often be eliminated by substitutions and by
solving simple equations etc. Further simplifications may be achieved by linearizing the
equations. The final goal is often to obtain a state-space model (or, in the linear case, a
transfer function), but in some cases this is not possible and we need to accept a model
in the form of an implicit differential equation or a differential-algebraic equation (DAE),
containing a mix of differential and algebraic equations. The latter case is illustrated in the
following simple example.
Example 2.2 (Nonlinear resistance). Consider the electrical circuit depicted below, con-
sisting of a resistor and a capacitor in series.
(Figure: a voltage source u in series with a resistor R and a capacitor C; the current is i and the capacitor voltage is v_C.)

Assuming a linear resistor, the current i can be eliminated, and the circuit is described by
the state-space model

C dv_C/dt = (u − v_C)/R   (2.2)
However, assuming the resistor is nonlinear with a voltage-current relation

u_R = R₁ i + R₂ i⁵   (2.3)

we cannot solve for i any longer, and the model needs to be stated in terms of a differential-
algebraic equation or a DAE:
i = C dv_C/dt   (2.4a)

u = v_C + R₁ i + R₂ i⁵   (2.4b)
We will return to a much more in-depth discussion of DAE models later in the course.
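One simple (if crude) way of handling the DAE (2.4) numerically is to solve the algebraic equation (2.4b) for i at each time step with a root-finder, and integrate (2.4a) with an explicit scheme. A minimal Python sketch with illustrative parameter values:

```python
import numpy as np
from scipy.optimize import brentq

# Semi-explicit treatment of the DAE (2.4): at each step, solve the
# algebraic equation u = vC + R1*i + R2*i^5 for i, then integrate vC' = i/C.
C, R1, R2, u = 1.0, 1.0, 0.5, 1.0

def current(vC):
    # Root of the resistor relation (2.4b); the residual is monotone in i
    return brentq(lambda i: vC + R1 * i + R2 * i**5 - u, -10.0, 10.0)

dt, T = 1e-3, 20.0
vC = 0.0
for _ in range(int(T / dt)):        # explicit Euler on (2.4a)
    vC += dt * current(vC) / C

# At steady state the current vanishes and vC approaches u
assert abs(vC - u) < 1e-3
```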
We will now illustrate in a few examples how the proposed workflow can be used. The
examples will be kept simple, though, which means that the full extent of the first step
cannot be captured.
Example 2.3 (Electric circuit). Consider the electrical circuit depicted in the figure below,
and let us develop a state-space model for the circuit.
(Figure: a circuit driven by terminal voltages v_a and v_b, with capacitors C₁, C₂ (voltages v₁, v₂, currents i₁, i₂), resistors R₃, R₄ (voltages v₃, v₄, currents i₃, i₄), and inductor L₅ (voltage v₅, current i₅).)
2. For the energy storage components, the following differential equations describe the
dynamics:
C₁ dv₁/dt = i₁   (2.5a)

C₂ dv₂/dt = i₂   (2.5b)

L₅ di₅/dt = v₅   (2.5c)
The resistors can be characterized by the static, constitutive relations
v 3 = R3i 3 (2.5d)
v 4 = R4i 4 . (2.5e)
Finally, Kirchhoff’s voltage and current laws can be applied to give the balance equa-
tions
v1 = va − v3 (2.5f)
v2 = v1 − v4 (2.5g)
vb = v2 − v5 (2.5h)
i3 = i1 + i4 (2.5i)
i4 = i2 + i5 (2.5j)
3. In the search for a state-space model, the three differential equations suggest that
v 1 , v 2 , and i 5 could serve as state variables; these are also associated with the energy
storage in the capacitors and the inductor, indicating that they are indeed natural
candidates for state variables. By using the remaining equations, the state equations
can be derived as follows:
C₁ dv₁/dt = i₁ = i₃ − i₄ = v₃/R₃ − v₄/R₄ = (v_a − v₁)/R₃ − (v₁ − v₂)/R₄   (2.6a)

C₂ dv₂/dt = i₂ = i₄ − i₅ = (v₁ − v₂)/R₄ − i₅   (2.6b)

L₅ di₅/dt = v₅ = v₂ − v_b   (2.6c)
Notice that we have reduced the initial 10 equations into 3 equations; several vari-
ables have been eliminated in this process, but they can easily be calculated from
the state variables. We can now obtain a final state-space model in vector form by
introducing x = (v 1 , v 2 , i 5 ) and u = (v a , v b ):
ẋ(t) = [ −1/(R₃C₁) − 1/(R₄C₁)   1/(R₄C₁)    0
          1/(R₄C₂)              −1/(R₄C₂)   −1/C₂
          0                      1/L₅        0     ] x(t)
     + [ 1/(R₃C₁)   0
         0          0
         0         −1/L₅ ] u(t)   (2.7)
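As a quick check of the model, the matrices in (2.7) can be assembled numerically; since the resistors dissipate energy, one expects the circuit to be asymptotically stable. A sketch with all component values set to 1 (an arbitrary illustrative choice):

```python
import numpy as np

# State-space matrices from (2.7), all component values set to 1
R3 = R4 = C1 = C2 = L5 = 1.0
A = np.array([[-1/(R3*C1) - 1/(R4*C1), 1/(R4*C1),  0.0],
              [ 1/(R4*C2),            -1/(R4*C2), -1/C2],
              [ 0.0,                   1/L5,       0.0]])
B = np.array([[1/(R3*C1), 0.0],
              [0.0,       0.0],
              [0.0,      -1/L5]])

# The resistors dissipate energy, so the circuit is asymptotically
# stable: all eigenvalues of A lie in the open left half-plane
assert np.max(np.linalg.eigvals(A).real) < 0.0
```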
Example 2.4 (DC motor with load). Consider the DC motor illustrated in the figure below.
The motor is supplied with a DC source with voltage u. The motor drives a rotating load,
characterized by its moment of inertia J and friction coefficient b. We would like to derive
a state-space model and a block diagram with transfer functions, describing the motor.
(Figure: DC motor circuit with supply voltage u, resistance R (voltage u_R), inductance L (voltage u_L), current i, and back-emf u_m, driving a load with inertia J at rotational speed ω.)
1. The structure of this electro-mechanical system can be illustrated by the simple dia-
gram below. The two blocks represent the electrical and the mechanical subsystems,
respectively. We also see how the subsystems are connected. The electrical subsys-
tem delivers by induction the torque Td = k m i , where i is the current and k m is the
torque constant. Conversely, the rotation causes a back-emf, i.e. a voltage u m = k e ω,
where ω is the rotational speed and k e is a constant.
(Block diagram: the electrical subsystem, driven by u, produces the torque T_d = k_m i acting on the mechanical subsystem, which in turn feeds back the back-emf u_m = k_e ω.)
2. Letting u_m denote the voltage over the motor and u_R, u_L be the component voltages,
the constitutive relations and Kirchhoff's voltage law give the following equations for
the electrical subsystem:

u = u_R + u_L + u_m   (2.8a)

u_R = R i   (2.8b)

u_L = L di/dt   (2.8c)

u_m = k_e ω   (2.8d)
For the mechanical subsystem, Newton’s equation (a torque balance) and the consti-
tutive relations give:
J dω/dt = T_d − T_f   (2.9a)

T_f = b ω   (2.9b)

T_d = k_m i   (2.9c)
3. Choosing the differentiated variables, i and ω, as state variables, the following state-
space model is readily derived:
L di/dt = −R i − k_e ω + u   (2.10a)

J dω/dt = k_m i − b ω   (2.10b)
An alternative representation can be obtained by applying the Laplace transform to
the equations and solving for T_d(s) = k_m I(s) and Ω(s):
T_d(s) = k_m/(Ls + R) · (U(s) − k_e Ω(s))   (2.11a)

Ω(s) = 1/(Js + b) · T_d(s)   (2.11b)
These equations can be depicted in a block diagram, which clearly shows how the
electrical and mechanical subsystems are connected via a feedback caused by the
back-emf.
(Block diagram: u enters a summing junction, where the back-emf u_m = k_e ω is subtracted; the result passes through k_m/(Ls + R) to produce T_d, which passes through 1/(Js + b) to produce ω.)
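The state-space model (2.10) is easy to simulate directly; a minimal Euler sketch (with illustrative parameter values) checks the steady-state speed ω = k_m u/(Rb + k_m k_e) implied by (2.10):

```python
import numpy as np

# DC motor model (2.10) with illustrative parameter values
L, R, J, b, km, ke = 0.1, 1.0, 0.05, 0.2, 0.5, 0.5
u = 10.0   # constant supply voltage

# Explicit Euler simulation of (2.10a)-(2.10b)
dt, T = 1e-4, 5.0
i = w = 0.0
for _ in range(int(T / dt)):
    di = (-R * i - ke * w + u) / L
    dw = (km * i - b * w) / J
    i, w = i + dt * di, w + dt * dw

# Steady state from (2.10): 0 = -R*i - ke*w + u and 0 = km*i - b*w
# gives w = km*u / (R*b + km*ke)
w_ss = km * u / (R * b + km * ke)
assert abs(w - w_ss) < 1e-3 * w_ss
```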
Example 2.5 (Hydraulic system). Consider the hydraulic (fluid) system depicted in the fig-
ure below, consisting of two interconnected tanks containing a liquid. The terminal pres-
sures in the pipes are p a and p b . The laminar flows q 3 and q 4 are subject to linear flow re-
sistances. The flow q₅, however, can be considered frictionless, but instead the inertial
effects are important. These assumptions illustrate how domain knowledge is needed to
judge which phenomena are important and which can be neglected.
(Figure: two tanks connected by pipes; inflow q₃ from terminal pressure p_a through resistance R₃, flow q₄ between the tanks through resistance R₄, and outflow q₅ towards terminal pressure p_b.)
1. Assuming the liquid is incompressible, we can start by observing that the tanks
represent accumulation of mass and that this accumulation is related to the build-up
of potential energy (measured by volume or level). Similarly, based on the statement
that inertial effects are important for the flow q 5 , the corresponding flow velocity rep-
resents kinetic energy (measured by flow-rate q 5 ). Both these observations point to
dynamic effects, described by differential equations. The external pressures p a and
p b are inputs to the system.
2. A mass balance for the tanks (with cross-sectional areas A 1 and A 2 ) and a force bal-
ance (Newton’s equation) for the outflow pipe (with cross-sectional area A) give the
following differential equations (ρ is the density of the liquid):
A₁ dh₁/dt = q₁   (2.12a)

A₂ dh₂/dt = q₂   (2.12b)

ρl dq₅/dt = A(p₂ − p_b)   (2.12c)
In addition, we have two constitutive relations for the linear flow resistances R 3 and
R 4 , constitutive relations linking pressure and level in the tanks, and balance equa-
tions for the flows:
p a − p 1 = R3 q3 (2.13a)
p 1 − p 2 = R4 q4 (2.13b)
p 1 = ρg h 1 (2.13c)
p 2 = ρg h 2 (2.13d)
q1 = q3 − q4 (2.13e)
q2 = q4 − q5 (2.13f)
3. Using pressures p 1 , p 2 and flow-rate q 5 as state variables, we can now form a state-
space model by combining all equations in (2.12) and (2.13):
(A₁/ρg) dp₁/dt = q₁ = q₃ − q₄ = (p_a − p₁)/R₃ − (p₁ − p₂)/R₄   (2.14a)

(A₂/ρg) dp₂/dt = q₂ = q₄ − q₅ = (p₁ − p₂)/R₄ − q₅   (2.14b)

(ρl/A) dq₅/dt = p₂ − p_b   (2.14c)
S OME REMARKS ON THE EXAMPLES . Let us finish this section by making a few general
remarks related to the physical modelling workflow:
• One particular aspect of domain knowledge is that there are “standard” choices of
state variables that you quickly get accustomed to. Examples include positions and
velocities for masses in mechanical systems, charge of capacitors, current of induc-
tors, accumulated mass or volume in fluid systems, and enthalpy or temperature in
thermal systems.
• Similarly, there are many “standard" simplifications, such as the assumptions of point
masses, massless components, or no friction in mechanical systems; incompressible
liquids in hydraulics; perfect mixing, ideal gas, and no heat losses in chemical engineering.
• A good habit during the modelling work is to make assumptions and approximations
explicit, i.e. to clearly state and document them. This makes it easier for you or some-
one else to go back and check the validity of these.
• Another good habit is to always make dimensions/units explicit, and to check that
dimensions are compatible in equations (this has not been done rigorously in the
examples in the interest of brevity).
• We finally stress again that the initial step, involving e.g. the decomposition of the
system into interconnected subsystems, is far more important in realistic modelling tasks
than in the “toy examples” presented here.
Figure 2.9: A mechanical system with the same model as the electrical system in Fig. 2.4
and the hydraulic system in Fig. 2.8. The system consists of a mass m connected
to two springs (k1, k2) and two viscous dampers (b1, b2), acted on by the forces
Fa and Fb.
This observation of analogy between different physical domains can be taken further and
is formalized within the framework of bond-graphs (sw: bindnings-grafer), see e.g. [4] for
an introduction. At the core of this framework is the observation that in many types of
systems, the basic mechanisms of energy transformation, accumulation and dissipation
are conveyed by a pair of variables, referred to as effort (or potential) and flow variables,
respectively. Denoting these two types of variables e and f , the power trans-
formed/accumulated/dissipated is given by their product, i.e. P = e · f .
Table 2.1 summarizes the analogies between the three domains considered above, and how
the basic concepts can be generalized. We can see that resistance, inductance and capaci-
tance are concepts that apply generally. Whereas resistance is a static, dissipative element,
inductance and capacitance both represent dynamic effects that involve energy storage. For
example, an inductance stores the energy ½ α f² and, similarly, a capacitance stores the
energy ½ β e² (you are encouraged to check what these formulas reveal in the special cases!).

Table 2.1: Basic elements involving effort and flow and their specializations in different do-
mains.

               General          Electrical      Hydraulic        Mechanical
Intensity      e                u               p                F
Flow           f                i               q                v
Power          P = e·f          P = u·i         P = p·q          P = F·v
Inductance     e = α df/dt      u = L di/dt     p = Lf dq/dt     F = m dv/dt
Capacitance    f = β de/dt      i = C du/dt     q = Cf dp/dt     v = (1/k) dF/dt
Resistance     e = γf           u = Ri          p = Rf q         F = bv
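As a quick sanity check of the claim that a capacitance stores the energy ½ β e², one can accumulate the power P = e · f with f = β de/dt along an arbitrary effort trajectory; evaluating e at the midpoint of each step makes the sum telescope, so the accumulated energy depends only on the end points. (A minimal numerical sketch, not from the text.)

```python
def stored_energy(beta, e_traj):
    """Accumulate P*dt = e * (beta * de/dt) * dt = beta * e * de along e_traj."""
    energy = 0.0
    for k in range(1, len(e_traj)):
        de = e_traj[k] - e_traj[k - 1]
        e_mid = 0.5 * (e_traj[k] + e_traj[k - 1])  # midpoint makes the sum telescope
        energy += beta * e_mid * de
    return energy
```

Whatever path the effort takes from e = 0 to e = 2, the stored energy equals ½ · β · 2².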
Beyond these general remarks, there are some useful guidelines and practical hints that can
be of some help in the modelling task. We will discuss some of these in this section.
It is desirable for the modeller to have a clear idea of the modelling purpose, i.e. the in-
tended use of the model. The purpose could be to get some general insights into the work-
ings of the system, to make some important design decisions, to simply determine some
parameters in the system, or to perform control design. The intended use of the model
naturally leads to requirements on model quality. If, for example, the model is going to be
used for control design, there will be higher requirements on model fidelity in certain fre-
quency ranges than in others. It is not always easy to quantify the required model quality,
but it is nevertheless important to have this aspect “in the back of your mind”.
All models are approximations of reality, and as such they come with a certain region of
validity—it is important not to use the model outside this region without extreme care! A
simple example of this rule is when linearization is performed to get a simple model; by
performing the linearization at several operating points, the region of validity of the complete
model can be extended.
Model validation is the action taken to verify that the model requirements are met. Typi-
cally, it involves simulating the model under some relevant conditions and comparing the
results with data from the real system; it may also include analysis. The tests done
during model validation may motivate modifications to steps taken earlier in the modelling
process. This underlines the important fact that modelling is an iterative process, as illus-
trated in the figure below.
(Figure: the iterative modelling workflow. Physical modelling of the real system (steps 1-3)
produces a model; model validation compares the model against data from the real system,
and may trigger model modifications and renewed modelling.)
MODEL APPROXIMATIONS
As already pointed out, approximations and simplifications are always carried out to some
extent in the modelling work. They can be introduced already during the initial modelling, but
they may also be the result of going back and revising some of the decisions, e.g. in order to
reduce the final model complexity. Some common situations are described in the following.
• Approximate a nonlinear sensor characteristic with a linear one.

R = 2.4 Ω
L = 0.25 mH
J = 1.1 × 10−3 kg m²
b = 0.0025 Nm/(rad/s)
From these parameter values, we can compute the mechanical and electrical time con-
stants for the two transfer functions:
τm = J /b = 0.44 s (2.16a)
τe = L/R = 0.0001 s (2.16b)
It is evident that for many applications, we can safely neglect the electrical time constant
and thus simplify the model by replacing the transfer function km/(Ls + R) by the steady-state
gain km/R.
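The effect of neglecting the fast electrical dynamics can be checked numerically. The sketch below simulates a standard DC-motor model, L di/dt = V − Ri − km ω and J dω/dt = km i − b ω, with the parameter values above; the motor constant km and the input voltage are placeholders chosen for illustration, since they are not given in the text. The reduced model replaces the electrical dynamics by its steady state i = (V − km ω)/R.

```python
def motor_speed(V=1.0, R=2.4, L=0.25e-3, J=1.1e-3, b=0.0025,
                km=0.1, dt=1e-5, t_end=2.0, reduced=False):
    """Forward-Euler simulation; returns the angular velocity at t_end."""
    i, w = 0.0, 0.0
    for _ in range(int(t_end / dt)):
        if reduced:
            i = (V - km * w) / R            # electrical dynamics at steady state
        else:
            i += dt * (V - R * i - km * w) / L
        w += dt * (km * i - b * w) / J
    return w
```

Both variants settle at the same stationary speed ω = km V/(Rb + km²); on the mechanical time scale the reduced model is practically indistinguishable from the full one.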
It is of course also possible to think about time-scale separation in the frequency domain.
As an example of this, we know from the basic control course that the model needs to be
accurate in the mid-frequency range when used for feedback control design. The following
example illustrates the point.
Example 2.7 (Open vs closed-loop [2]). Consider first three different systems described by
the transfer functions
G(s) = 1/((s + 1)(s + a)),    a = 0, ±0.01    (2.17)
It can be seen that one system is stable, one is unstable, and one is marginally stable. It is
thus not surprising that the three systems have quite different open-loop step responses, as
seen in Figure 2.11 (top left). However, when (a unity gain) feedback is applied, the closed-
loop step responses turn out to be very similar (same figure, top right).
Figure 2.11: Open-loop (left column) and closed-loop (right column) step responses and
Bode diagrams for the systems in (2.17).
The result can be explained by looking at the corresponding Bode diagrams. The open loop
frequency responses in Figure 2.11 (bottom left) differ only in the low-frequency region.
However, it is the mid/high-frequency region that matters for the closed-loop frequency
response, which can be seen to be virtually identical (bottom right).
Figure 2.12: Open-loop (left column) and closed-loop (right column) step responses and
Bode diagrams for the systems in (2.18).
One of the conclusions is that open-loop simulation may not always be the best option to
validate a model!
AGGREGATION OF STATES. Another way to reduce the complexity of the model is to “lump”
states together and represent them by one state only, e.g. the average. A classical example
of this is a distributed parameter system, whose state space is infinite-dimensional, described
e.g. by a partial differential equation. Such systems frequently appear in chemical
engineering (reactors, heat exchangers etc.) and in systems involving heat conduction and
convection. Here are a couple more examples of state aggregation:
• A battery in an electric car consists of many small cells, each with its own state of
charge, temperature etc. In order to study effects on system level, it is common to
lump these states together and to approximate the cell states with an average taken
over the cells in a larger battery module, or even in the entire battery.
• In vehicle dynamics, many lateral effects may be studied by aggregating the left
and right halves of the vehicle, leading to a so-called bicycle model.
It has already been pointed out that it is a good habit to always check dimensions/units in
the equations used and also in the final model. Closely related to this is scaling of variables,
i.e. basically deciding which units to use, which can have an impact on the numerical prop-
erties during simulation. This can be taken one step further by employing normalization
of variables, to be illustrated below. In short, scaling and normalization can contribute to
• simplifying equations.
We will illustrate in a couple of examples how scaling and normalization may be used. For
a more in-depth discussion and a formal treatment, please consult [4].
Example 2.8 (Mass-spring system). A simple mass-spring system can be described by the
equation
m ẍ(t ) + kx(t ) = F (t ), (2.19)
where F is the external force acting on the system. The system is characterized by the two
parameters m and k. However, we can simplify the study of the system properties by using
normalization. Define the following dimensionless position and time variables:
z = x/L (2.20a)
s
k
τ= t, (2.20b)
m
34
where L is some chosen length scale. This leads to the new differential equation

d²z/dτ² = d²(x/L)/dt² · (dt/dτ)² = (F − kx)/(mL) · (m/k) = F/(kL) − z = u − z,    (2.21)
where the new (also dimensionless) input u = F /(kL) has been introduced. The conclusion
is that every mass-spring system of the type (2.19) can be studied in terms of the differential
equation
z̈ + z = u, (2.22)
given the particular interpretation (using re-scaling) of the variables z and u.
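This equivalence can be checked numerically: simulating (2.19) in physical units and (2.22) in normalized units, with time steps related by dτ = √(k/m) · dt, yields trajectories related by z = x/L at every step. A small sketch with arbitrary illustrative parameter values (semi-implicit Euler is used for both, so the two iterations correspond exactly):

```python
def step_physical(m, k, F, x, v, dt, n):
    """Semi-implicit Euler for m*x'' + k*x = F."""
    for _ in range(n):
        v += dt * (F - k * x) / m
        x += dt * v
    return x

def step_normalized(u, z, w, dtau, n):
    """Semi-implicit Euler for z'' + z = u (derivatives w.r.t. tau)."""
    for _ in range(n):
        w += dtau * (u - z)
        z += dtau * w
    return z
```

With e.g. m = 2, k = 8, F = 4 and L = 0.5, we have ω = √(k/m) = 2 and u = F/(kL) = 1; starting from x = 0.1 at rest (i.e. z = 0.2), the two trajectories coincide after rescaling.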
Example 2.9 (Cart with pendulum [1]). The figure below depicts a simple model for a cart
with a balancing inverted pendulum. This model can be seen as a simple representation of
e.g. a Segway.
(Figure: a cart of mass M, driven by a horizontal force F, balancing an inverted pendulum
of length l at angle θ.)
The following second order, vector differential equation can be derived for the system:

[ M + m        −ml cos θ ] [ q̈ ]   [ c q̇ + ml sin θ · θ̇² ]   [ F ]
[ −ml cos θ    J + ml²   ] [ θ̈ ] + [ γθ̇ − mg l sin θ     ] = [ 0 ]    (2.23)
τ = ω0 t    (2.25a)
x = q/l    (2.25b)
u = F/((M + m) l ω0²)    (2.25c)
We leave it as an exercise to verify that this leads to the following differential equations for
the new variables:
d²x/dτ² − α cos θ · d²θ/dτ² + α sin θ · (dθ/dτ)² = u,    α = m/(M + m)    (2.26a)
−β cos θ · d²x/dτ² + d²θ/dτ² − sin θ = 0,    β = ml²/(J + ml²)    (2.26b)
It can be seen that the scaling introduced has resulted in a model with only two parameters
α and β, which clearly simplifies the study of system properties as a function of parameter
values. It can finally be noted that in the case when m ≪ M and J ≪ ml², we have α ≈ 0
and β ≈ 1, which simplifies the equations even further.
ẋ = f (x, z ) (2.27a)
ǫż = g (x, z ) (2.27b)
where 0 < ǫ ≪ 1. This way to formulate the model suggests that the system can be thought
of as a connection of a system with “slow” dynamics (2.27a) and a system with “fast” dy-
namics in (2.27b). Following the idea of time-scale separation, the fast dynamics would
be neglected and replaced by the corresponding stationary solutions (formally obtained by
putting ǫ to zero). This procedure results in the DAE model
ẋ = f (x, z ) (2.28a)
0 = g (x, z ) (2.28b)
A valid question to ask at this point is how this approximation can be justified mathemati-
cally. Before providing an answer to this question, let us have a look at an example, which
illustrates that the procedure is not entirely foolproof.
The limit case ǫ = 0 gives the DAE
Clearly, the two cases differ qualitatively. For example, the DAE (2.32) requires that u is dif-
ferentiable as it appears time-differentiated. This is in contrast with (2.29) for which any
integrable input profile is admissible.
The trajectories for (2.29) are illustrated in Fig. 2.14 for a few different values of ǫ > 0. It can
be seen that for decreasing values of ǫ, the trajectories approach the solution of the DAE
(2.30), which are shown in bold.
Figure 2.14: Trajectories for (2.29) with a piecewise constant input. The grey curves repre-
sent the trajectories for ǫ ranging from 10−1 to 4·10−3 with the initial conditions
x1 (0) = x2 (0) = 1 and for a = 2. The black curve represents the solution to the
DAE (2.30).
We could try to explain the outcome in the example with a bit of intuition. Since x1 is
expected to change slowly, assume we “freeze” x1 and see what happens to x2 in (2.29b).
Well, we expect x2 to quickly converge to the stationary solution given by (2.30b), provided
(2.29b) is stable, i.e. if a > 0. With this condition, we can also solve (2.30b) for x2 as a
function of the “frozen” x1 (which is in reality changing slowly). This intuitive explanation
is indeed supported by the following theorem due to Tikhonov:
ẋ = f (x, z ) (2.34a)
ǫż = g (x, z ) (2.34b)
where 0 < ǫ ≪ 1 is very small. Let us label xǫ (t ), z ǫ (t ) the solution to (2.34). Suppose:
38
Then
lim xǫ (t ), z ǫ (t ) = x0 (t ), z 0 (t ) (2.35)
ǫ→0
ẋ = f (x, z ) (2.36a)
0 = g (x, z ) (2.36b)
The limit (2.35) is to be understood in the sense of “almost everywhere" formally defined
in the context of measure theory. From an intuitive point of view, the trajectories match
everywhere but on a union of time intervals that are “infinitely short”.
Going back to Example 2.10, we can see that the hypotheses of Theorem 3 are satisfied for
(2.29) if a > 0, such that ∂g/∂z = −a is full rank and (2.29b) is stable. However, if a ≤ 0, then
there is no mathematical support for approximating (2.29) with (2.30).
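The convergence stated in (2.35) can also be observed numerically on a minimal linear system of the form (2.34), here with the illustrative choice f(x, z) = −x + z and g(x, z) = x − a z (not the system from the text). For a = 2 > 0, ∂g/∂z = −a is full rank and the fast dynamics are stable, and the corresponding DAE solution is x0(t) = x(0) e^(−t/2):

```python
def singular_perturbation(eps, a=2.0, x=1.0, z=0.0, t_end=1.0):
    """Euler simulation of x' = -x + z, eps * z' = x - a*z."""
    dt = eps / 50.0                 # step chosen to resolve the fast time scale
    for _ in range(int(t_end / dt)):
        dx = -x + z
        dz = (x - a * z) / eps
        x, z = x + dt * dx, z + dt * dz
    return x
```

As ε decreases, the trajectory approaches the DAE solution, in line with Tikhonov's theorem.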
Example 2.11. Let us consider two masses moving horizontally (positions x1 and x2 ), sub-
ject to a velocity-dependent friction with parameter ρ, and connected by elastic links of
constant K . The dynamics of this two-mass system can be modelled via the linear ODE:
(Figure: two masses m1 and m2 at positions x1 and x2, connected by elastic links of
constant K, each subject to friction ρ; the input u acts on mass 1.)
We can find an approximate model in two steps as follows. By first assuming the oscillatory
dynamics of mass 1 are very fast compared to those of mass 2, i.e. m1 ≪ m2 and m1/K ≪ 1,
(2.37) can be approximated by setting m1/K to zero, giving the new model:

(ρ/K) ẋ1 = (x2 − x1) − x1 + (1/K) u    (2.38a)
m2 ẍ2 = K(x1 − x2) − ρ ẋ2.    (2.38b)
Assuming the dynamics of mass 1 are oscillatory, it can readily be shown that m1/K ≪ 1 im-
plies that ρ/K ≪ 1 (you are encouraged to verify this!), so that we can form the further ap-
proximation:

0 = x2 − 2x1 + (1/K) u    (2.39a)
m2 ẍ2 = K(x1 − x2) − ρ ẋ2,    (2.39b)
which is a DAE since state x1 does not appear time differentiated. We can easily observe
that (2.39) boils down to eliminating ẋ1 , ẍ1 from equation (2.37a). Alternatively, we could
have arrived at this by putting (2.37a) in the state-space form (2.34b) with a suitable choice
of ǫ and then setting ǫ to zero; please refer to Exercises.
Figure 2.16 compares the trajectories of the ODE (2.37) with those of the DAE (2.39). One can
verify that for both applications of the Tikhonov theorem, (2.38) and (2.39), the hypotheses
apply.
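The reduction can be reproduced numerically. The code below simulates the full two-mass ODE, written in the form consistent with (2.38) and (2.39), i.e. m1 ẍ1 = K(x2 − 2x1) − ρ ẋ1 + u, against the reduced DAE (2.39), where x1 is obtained from the algebraic constraint. The parameter values follow Figure 2.16, with an illustrative constant input u = 1:

```python
def two_mass_ode(m1=1e-4, m2=1.0, K=1e3, rho=1.0, u=1.0, dt=1e-5, t_end=1.0):
    """Full ODE: both masses integrated (semi-implicit Euler)."""
    x1 = v1 = x2 = v2 = 0.0
    for _ in range(int(t_end / dt)):
        v1 += dt * (K * (x2 - 2.0 * x1) - rho * v1 + u) / m1
        v2 += dt * (K * (x1 - x2) - rho * v2) / m2
        x1 += dt * v1
        x2 += dt * v2
    return x1, x2

def two_mass_dae(m2=1.0, K=1e3, rho=1.0, u=1.0, dt=1e-5, t_end=1.0):
    """Reduced DAE (2.39): x1 eliminated via 0 = x2 - 2*x1 + u/K."""
    x2 = v2 = 0.0
    for _ in range(int(t_end / dt)):
        x1 = 0.5 * (x2 + u / K)     # algebraic constraint (2.39a)
        v2 += dt * (K * (x1 - x2) - rho * v2) / m2
        x2 += dt * v2
    return 0.5 * (x2 + u / K), x2
```

The full trajectory nearly satisfies the algebraic constraint at all times, and the x2 trajectories of the two models agree closely.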
Figure 2.16: Illustration of the Tikhonov theorem for (2.37). The piecewise constant input
illustrated in the lower-right graph was selected. The grey curves represent
the trajectories for m 1 ranging from 10−1 to 4 · 10−4 with the initial conditions
x1 (0) = x2 (0) = 1 and ẋ1,2 (0) = 0. The parameters ρ = 1, K = 103 and m 2 = 1
were selected. The black curve represents the solution to the DAE (2.39).
Let us finish by providing some intuition behind the application of the Tikhonov theorem
for the two-mass example. One can observe that the approximation at play here is essen-
tially to replace the dynamics of mass 1 by its “steady-state” approximation. This means
that we consider that the motion of the mass decays instantaneously to its steady state
provided by (2.39a). One should understand here that “steady-state" does not mean “not
moving", as e.g. (2.39a) still allows x1 to move. However, in the approximation (2.39) the
motion of x1 is entirely dictated by the motion of x2, which moves x1 through different
steady-state positions. This principle can very often be readily applied to models containing
very fast dynamics. One ought to observe that the dynamics created by the DAE (2.39) do not
contain very fast dynamics (as opposed to e.g. (2.38), which still contains very fast dynamics
for x1). This observation will connect to Section 7.4.
Let us start by recalling the proposed workflow for physical modelling in Section 2.1:
1. Structure the system, e.g. by decomposing it into interconnected subsystems
2. Write down the basic equations
3. Formulate a model
It is clear that steps 1 and 2 depend heavily on your skills as a modeller. Step 3, on the other
hand, is largely a matter of “book-keeping”, involving substitutions, sorting etc, with the
main goal to arrive at a model that is possible to simulate. In many cases, this amounts to
finding a state-space model, for which computations are easily organized. Taking a closer
look at what happens in this step, there are several issues to discuss.
MODEL FOR COMPUTATIONS. One reason why we prefer state-space models is that these
are easy to simulate. The explicit ODE form makes it (in principle) very simple to recur-
sively compute an approximate solution by iterating a difference equation approximat-
ing the ODE. In this sense, the model can be seen as a representation of a computational
scheme, as illustrated in the following example.
Example 2.12 (Predators and prey [4]). The following model has been suggested to study
variations in populations for a pair of mutually dependent predator-prey, e.g. lynx and
hares:
Ṅ1(t) = λ1 N1(t) − (γ1 − α1 N2(t)) N1(t)    (2.40a)
Ṅ2(t) = λ2 N2(t) − (γ2 + α2 N1(t)) N2(t)    (2.40b)
The figure below illustrates the characteristic, cyclic variations when simulating the model.
Figure 2.17: Variations in populations of lynx and hares using the model (2.40).
The Matlab/Simulink model used to simulate the system is depicted below. It can be seen
that the model is in fact built according to the data-flow used at every iteration of the sim-
ulation. The two integrators holding the states N1 and N2 act as starting points for the
calculations, and completing all assignments “downstream” finally gives the derivatives Ṅ1
and Ṅ2 , allowing a state update by approximating the derivative by a difference quotient.
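The data-flow just described is easy to mimic in code: evaluate the right-hand side of (2.40) from the current states, then update the states with a difference quotient (forward Euler). The parameter values below are illustrative placeholders, not those used for Figure 2.17:

```python
def simulate_populations(N1, N2, lam1=0.0, gam1=1.0, alf1=0.1,
                         lam2=1.0, gam2=0.0, alf2=0.1,
                         dt=1e-3, n_steps=20000):
    """Forward-Euler iteration of the predator-prey model (2.40)."""
    traj = [(N1, N2)]
    for _ in range(n_steps):
        dN1 = lam1 * N1 - (gam1 - alf1 * N2) * N1   # (2.40a): predator
        dN2 = lam2 * N2 - (gam2 + alf2 * N1) * N2   # (2.40b): prey
        N1, N2 = N1 + dt * dN1, N2 + dt * dN2
        traj.append((N1, N2))
    return traj
```

Starting below the equilibrium N1 = (λ2 − γ2)/α2 = 10, N2 = (γ1 − λ1)/α1 = 10, the populations remain positive and cycle around it, reproducing the characteristic behaviour of Figure 2.17.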
The following example illustrates what is happening when a state-space model is formed
for the electric circuit in Example 2.3.
Example 2.13 (Electric circuit, revisited). In the figure below, we repeat the schematic of
the circuit, the list of equations (left) that we derived in step 2 of the modelling work-flow,
and the final state-space model (right) obtained in step 3.
(Figure: the circuit of Example 2.3, with capacitors C1 and C2 (voltages v1, v2), resistors
R3 and R4 (voltages v3, v4, currents i3, i4), inductor L5 (current i5), and source voltages
va and vb.)

C1 v̇1 = i1
C2 v̇2 = i2
L5 i̇5 = v5
v3 = R3 i3
v4 = R4 i4
v1 = va − v3
v2 = v1 − v4
vb = v2 − v5
i3 = i1 + i4
i4 = i2 + i5

With x = (v1, v2, i5)ᵀ and u = (va, vb)ᵀ:

ẋ = [ −1/(R3C1) − 1/(R4C1)    1/(R4C1)     0
        1/(R4C2)             −1/(R4C2)    −1/C2
        0                     1/L5         0     ] x + [ 1/(R3C1)    0
                                                         0           0
                                                         0          −1/L5 ] u
Figure 2.19: The collection of basic equations (left) and the state-space model (right), both
describing the electrical circuit (top).
We can see here that considerable effort is needed in order to go from the basic equations,
describing the system, to a form that is suitable for computations. Moreover, the state-
space model has a structure that is far from the simple structure of the basic equations,
and parameters appear in combinations and in several instances.
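The “book-keeping” of step 3 can itself be expressed as code: starting from the states, each basic equation of Figure 2.19 is used once as an assignment, and the state derivatives come out at the end. The numeric parameter values below are arbitrary placeholders. By linearity, evaluating this routine at unit states/inputs reproduces, column by column, the matrices of the state-space model:

```python
R3, R4, C1, C2, L5 = 100.0, 200.0, 1e-3, 2e-3, 0.5  # placeholder values

def circuit_rhs(x, u):
    """One sweep through the basic equations of Example 2.13."""
    v1, v2, i5 = x
    va, vb = u
    v3 = va - v1        # from v1 = va - v3
    v4 = v1 - v2        # from v2 = v1 - v4
    v5 = v2 - vb        # from vb = v2 - v5
    i3 = v3 / R3        # from v3 = R3 i3
    i4 = v4 / R4        # from v4 = R4 i4
    i1 = i3 - i4        # from i3 = i1 + i4
    i2 = i4 - i5        # from i4 = i2 + i5
    return (i1 / C1, i2 / C2, v5 / L5)   # C1 v1' = i1, C2 v2' = i2, L5 i5' = v5
```

For instance, evaluating at x = (1, 0, 0), u = (0, 0) yields the first column of the A matrix, whose top entry is −1/(R3 C1) − 1/(R4 C1).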
Based on the examples, it is natural to ask the following question: can we automate step 3
of the modelling workflow by leaving it to the computer to manipulate the basic equations?
Well, why not? After all, the computer is very good at routine tasks like sorting, substitu-
tions etc. Leaving step 3 to computer tools would also shift our focus a bit by viewing the
collection of basic equations from step 2 as the main result of our modelling, even as the
model itself; this is what is stressed in the term equation based modelling.
Let us finally recall Example 2.2, where it was demonstrated that it is not always
possible to go from a collection of basic equations to a state-space model. Instead, we
were left with a DAE model for the circuit with a nonlinear resistance. To detect such cases
would be another task for computer tools. In addition, we will see later in the Lagrange
modelling chapter that both implicit ODE and DAE models may be preferable to
state-space models to keep model complexity down.
Example 2.14 (Two versions of an electrical circuit). The figure below depicts two versions
of basically the same electrical circuit. The only difference is how the circuit is driven—by
a voltage source (left) or a current source (right).
Figure 2.20: Two versions of the same electrical circuit driven by a voltage source (left) and
a current source (right), respectively.
The basic equations that describe the electrical circuit are as follows:
C v̇ C = i (2.41a)
v R = Ri (2.41b)
v R + vC = v (∗) (2.41c)
where the last equation is valid for the voltage source driven circuit only.
Let us view the voltage over the resistance as output, y = vR. We can then readily form
state-space models for the two cases:

Voltage source:                               Current source:
v̇C(t) = −(1/(RC)) vC(t) + (1/(RC)) v(t)       v̇C(t) = (1/C) i(t)
y(t) = −vC(t) + v(t)                          y(t) = R i(t)        (2.42)
It is clear that the final model changes significantly, even for this small modification of the
circuit! On the other hand, the original equations (2.41) differ only by the application of
Kirchhoff’s voltage law for the voltage source case. Another way to explain what is going on
is to stress that the basic resistor equation
v R = Ri (2.43)
44
is just a declarative statement of how voltage and current for a resistor relate, and no causal-
ity is implied. When forming the state-space models, on the other hand, there is implicitly
a causality implied, since the models are very close to the computations to be performed.
This means that the resistor equation is really used in two different ways, namely to per-
form the following assignments:

i := vR/R    or    vR := R i    (2.44)
In spite of its simplicity, the example points at an important aspect of equation based mod-
elling. By viewing the model as just a collection of equations, i.e. a declarative and a-causal
(without causality) description, no implications concerning the organization of compu-
tations are made. This means that we hand over to computer algorithms to analyze the
causality of the model, and to organize computations accordingly. This is in contrast to
state-space models or Simulink models, which are imperative, i.e. built on assignments in
a specific order tied to the causality of the model.
COMPOSITE MODELS. Let us take another look at Example 2.13 and focus on step 2, i.e.
forming the basic equations for the circuit. It is clear that the equations can be divided
into two groups, one describing the components and one describing the couplings between
these:
Components: Couplings:
C 1 v̇ 1 = i 1 v1 = va − v3
C 2 v̇ 2 = i 2 v2 = v1 − v4
L 5 i˙5 = v 5 vb = v2 − v5
v 3 = R3i 3 i3 = i1 + i4
v 4 = R4i 4 i4 = i2 + i5
Figure 2.21: The basic equations for the circuit in Example 2.13, divided into two groups.
This observation can take us further when dealing with composite models, i.e. models built
up by components and/or subsystems. The component equations remain the same re-
gardless of which circuit we build. The coupling equations, on the other hand, encode
Kirchhoff’s laws describing how the components are connected. These laws follow from
the basic rules that the potentials of two connected pins are the same, and that the
currents flowing into a connection node sum up to 0. Similar rules apply for other pairs of
effort/potential and flow variables, as discussed in Section 2.2. This way of viewing model
connections has been formalized within the modelling language Modelica [5].
45
2.5.2 THE MODELICA LANGUAGE
Modelica is a modelling language that supports equation based, declarative and a-causal
physical modelling along the principles discussed in this section. Furthermore, re-use of
models is facilitated by building models from connections of sub-models and components.
We will give a brief “teaser” on Modelica here, but for further study of the language and its
features, we refer to the many publications freely available, see [6].
Let us start by taking another look at Example 2.14, now in Modelica terms.
Example 2.15 (Two electrical circuits, revisited). All electrical components and circuits
need connectors or pins. This is how it looks in Modelica:
connector Pin
  Real v;
  flow Real i;
end Pin;
Note that the pin is characterized by two variables, namely the effort/potential variable v
and the flow/current variable i . Implicitly, this tells us that potentials are set equal when
pins are connected, and currents are set up to sum to 0. This is all handled by Modelica
“under the hood” when declaring a connection:

connect(pin1,pin2)  ⇔   pin1.v = pin2.v;
                        pin1.i + pin2.i = 0;
Note the dot notation to be interpreted as a genitive s, e.g. pin1.v is “pin1’s v variable”.
We are now ready to define the basic Resistor and Capacitor components (the keyword
der is used here to denote the derivative of a variable):
model Resistor
  parameter Real R;
  Pin n,p;
  Real i,u;
equation
  u = p.v - n.v;
  p.i + n.i = 0;
  i = p.i;
  u = R*i;
end Resistor;

model Capacitor
  parameter Real C;
  Pin n,p;
  Real i,u;
equation
  u = p.v - n.v;
  p.i + n.i = 0;
  i = p.i;
  der(u)*C = i;
end Capacitor;
It may seem unnecessarily complicated to define such simple components with a number
of equations. However, the good news is that this allows us to generalize to many different
components, using the same constructs. Moreover, basic components are pre-defined and
available in libraries, which means we don’t have to bother about details as users.
Using these components, we can now define the circuit with a voltage source in Modelica:

model Circuit1
  Resistor R (R=100);
  Capacitor C (C=0.001);
  VSource Vs (u=1);
equation
  connect(Vs.p,R.n);
  connect(R.p,C.n);
  connect(C.p,Vs.n);
end Circuit1;
Notice that in order to change this model from using a voltage source to a current source,
we only need to change one line in the code!
We have demonstrated above how models can be defined textually using the Modelica lan-
guage. There are, in addition, graphical tools available, allowing the user to define mod-
els by graphically connecting components and models earlier defined and collected in li-
braries. It should be stressed, however, that connecting models graphically does not change
the fundamental, declarative and a-causal nature of the model.
PARTIAL MODELS. Modelica is a rich language and offers many more features, which are
outside the scope of this introductory presentation. We will briefly illustrate one feature
though, namely the concept of a partial model, which resembles the inheritance feature
found in object-oriented languages.
An important observation is that many (component) models share some common charac-
teristics. Therefore, it is tempting to systematically develop such models by successively
specializing from more general models. This is exactly the idea of a partial model. Let us
illustrate by defining a general and basic electrical component, called OnePort, which in
turn makes use of a connector Pin, which we have come across earlier, but now slightly
refined:
connector Pin
  Voltage v;
  flow Current i;
end Pin;
partial model OnePort
  Voltage v;
  Current i;
  Pin p, n;
equation
  v = p.v - n.v;
  p.i + n.i = 0;
  i = p.i;
end OnePort;
The partial model OnePort needs specialization in order to be well-defined, and it basically
amounts to specify how the current through the component relates to the voltage drop.
In this way, it is straightforward to define models for the standard components, e.g. the
resistor and capacitor:
model Resistor
  extends OnePort;
  parameter Real R=1 "Resistance in [Ohm]";
equation
  v = R*i;
end Resistor;

model Capacitor
  extends OnePort;
  parameter Real C=0.001 "Capacitance in [F]";
equation
  der(v)*C = i;
end Capacitor;
The same idea applies when defining an inductor, or any other simple one-port compo-
nent. In the code segments shown above, a couple of Modelica features are also shown:
the possibility to define default parameter values (which can later be changed, of course) and
units, the latter allowing automatic checking that a model is based on compatible units.
So far, we have used examples from the electrical domain to illustrate some of the basic
ideas in Modelica. However, Modelica is aimed for multi-domain modelling based on the
same principles, and we finish this brief tour by giving an electro-mechanical example,
namely the DC motor.
Example 2.16 (DC-motor [3]). A DC-motor model is given below in both textual and graph-
ical form. It can be seen that a special component EMF represents the coupling between the
electrical and mechanical domains. It describes both the torque induced by the current to
the motor and the back-emf, i.e. the voltage arising due to the rotation of the motor.
model DCMotor
Resistor R(R=100);
Inductor L(L=100);
VsourceDC DC(f=10);
Ground G;
EMF emf(k=10, J=10, b=2);
Inertia load;
equation
connect(DC.p,R.n);
connect(R.p,L.n);
connect(L.p,emf.n);
connect(emf.p,DC.n);
connect(DC.n,G.p);
  connect(emf.flange_b, load.flange_a);
end DCMotor;
Despite the fact that the DC motor model is quite simple, it nevertheless generates quite
a few equations, as seen in the following list. Luckily, it is up to the modelling software to
handle these equations, e.g. to generate code for simulations.
3 LAGRANGE MECHANICS
In the previous chapter, we discussed some general principles and guidelines for physical
modelling. It was pointed out, however, that it is important to complement this with do-
main specific knowledge and guidelines. In this chapter, we will go one step further and
show how domain specific techniques can facilitate and strengthen the physical modelling
process, and this will be done by studying the basics of Lagrange mechanics. Lagrange me-
chanics is a powerful tool to build mathematical models for complex mechanical systems,
and is used in many applications. Lagrange mechanics
• allows us to describe arbitrarily complex mechanical systems, including parts in relative
motion, accelerated frames, and gyroscopic and centrifugal effects, without hassle;
• often allows us to build simple models (in terms of the complexity of the resulting model
equations), even for mechanical systems that are otherwise described by very com-
plex models.
We define the generalized coordinates of a mechanical system as a vector of time-varying
“coordinates" q (t ) ∈ Rnq that must be able to describe the “configuration" of the system at
a given time t . One can construe these coordinates as forming a “snapshot" of the system.
The snapshot does not tell us how the system will evolve in time, but it can tell us in what
configuration the system is at a given time. The generalized coordinates typically gather
positions and angles, but can also include more abstract representations of the system. For
the sake of simplicity, in the following we will omit the explicit dependence of the general-
ized coordinates and other time-varying objects in the notation, i.e. we will write the
time-varying generalized coordinates q(t) simply as q.
• The generalized coordinates q describe the configuration of the entire mechanical
system we consider. I.e. if our punctual mass is part of a mechanical system having
the generalized coordinates q, then its position p is a function of q (in the sense that
p can be calculated from q). More formally, one could write p(q). The mass velocity
ṗ is then a direct result of the chain rule:

ṗ = (∂p/∂q) q̇    (3.2)
T = ½ Σ_{i=1}^{N} mᵢ ‖ṗᵢ‖² = ½ Σ_{i=1}^{N} mᵢ ṗᵢᵀ ṗᵢ    (3.3)
The generalized coordinates describe the configuration of the whole system, i.e. they
describe the position pᵢ of each mass. Each velocity is given by the chain rule:

ṗᵢ = (∂pᵢ/∂q) q̇    (3.4)
The kinetic energy of the N masses is then given by:

T = ½ Σᵢ mᵢ ṗᵢᵀ ṗᵢ = ½ Σᵢ mᵢ q̇ᵀ (∂pᵢ/∂q)ᵀ (∂pᵢ/∂q) q̇ = ½ q̇ᵀ ( Σᵢ mᵢ (∂pᵢ/∂q)ᵀ (∂pᵢ/∂q) ) q̇    (3.5)
• The observation above holds for systems having infinitely many “particles” (e.g. distributed-mass systems). Generally, the kinetic energy function will take the form:

T(q, q̇) = (1/2) q̇⊤ W(q) q̇   (3.7)

where q and q̇ are time-varying, and where W(q) is a square, symmetric, positive-definite matrix of size ℝ^{nq×nq}, possibly a function of q.
Example 3.1 (Mass on a rigid rod). Imagine a punctual mass m attached to the origin by a
rigid, massless rod of length L, and moving in the plane x − y. We can describe the system
by the angle θ between the x axis and the rod. This angle alone describes entirely this
simple system, as knowing θ specifies in what “configuration" (or position) the system is.
Our (unique) generalized coordinate here is q = θ ∈ R.
Figure 3.1: Illustration of a mass on a rigid rod.
W = mL²   (3.11)

and is constant.
Example 3.2 (Mass on an elastic rod). Imagine a punctual mass m attached to the origin
by an elastic, massless rod of varying length L, and moving in the plane x − y.
Figure 3.2: A mass on an elastic rod.
To describe the system we now need the angle θ between the x axis and the rod, and the
length L (as the latter is also changing and therefore not simply a constant). Our generalized
coordinates are then:
q = [θ; L]   (3.12)
as both are needed at a given time t to describe the “configuration" (or position) of the
system. The position of the mass in the x − y plane is still given by:
p = L [cos θ; sin θ]   (3.13)
but now L is no longer a constant and is part of our generalized coordinates. The chain rule
then tells us that:
ṗ = L θ̇ [−sin θ; cos θ] + L̇ [cos θ; sin θ]   (3.14)
The kinetic energy then reads as:

T = (1/2) m ṗ⊤ ṗ
  = (1/2) m L² θ̇² [−sin θ; cos θ]⊤ [−sin θ; cos θ] + (1/2) m L̇² [cos θ; sin θ]⊤ [cos θ; sin θ] + m L L̇ θ̇ [−sin θ; cos θ]⊤ [cos θ; sin θ]
  = (1/2) m L² θ̇² + (1/2) m L̇² = (1/2) q̇⊤ [m L², 0; 0, m] q̇   (3.15)

since the cross term vanishes ([−sin θ; cos θ]⊤ [cos θ; sin θ] = 0).
Figure 3.3: A rigid rod with mass.
The rod position is again described by the angle θ alone (we consider the length L fixed as
the rod is rigid), hence q = θ. We can consider our rod as made of infinitely many punctual
masses distributed along the rod axis. If we locate these masses via their position ν ∈ [0, L]
along the rod axis, we can describe their position in the plane x − y by:
p(ν) = ν [cos θ; sin θ]   (3.17)
The overall kinetic energy of the rod is given by the summation of the kinetic energy of all the infinitesimal masses (each having a mass (m/L) dν). It can be computed by the integral:
T = (1/2)(m/L) ∫₀ᴸ (ν θ̇ [−sin θ; cos θ])⊤ (ν θ̇ [−sin θ; cos θ]) dν
  = (1/2)(m/L) ∫₀ᴸ ν² θ̇² dν = (m/(6L)) θ̇² ν³ |₀ᴸ = (1/6) m L² θ̇²   (3.19)
Our “matrix" W is now reduced to 2/3 of the one we had for the mass concentrated at the
end of the rod, i.e.
2
W = mL 2 (3.20)
3
(one can compare (3.19) and (3.10) to see that).
• The potential energy due to gravity in most “standard" mechanical applications de-
rives from:
V = mg z (3.21)
which gives the potential energy of a punctual mass m, where “z" is the “height" of
the mass in the field of gravity. Consider e.g. in the examples above that the mass m
is concentrated at the end of the rigid rod. The position of the mass is given by (3.17),
such that its vertical position is given by:
p_z = L sin θ   (3.22)

such that its potential energy reads as:

V = m g L sin θ   (3.23)
In contrast, consider that the mass is distributed throughout the rod. Computing
the potential energy then follows similar lines as (3.19), i.e. we need to “sum up"
the potential energy of every “particle” of the rod. A “particle” is described here as an infinitesimal piece of the rod, of length dν, having a mass (m/L) dν and a vertical position ν sin θ. We can then do:
V = (mg/L) ∫₀ᴸ ν sin θ dν = (mg/L) (1/2) ν² sin θ |₀ᴸ = (1/2) m g L sin θ   (3.24)
hence the potential energy corresponds to considering that the whole mass of the rod is concentrated at the half-length (this is to be contrasted with the kinetic energy computation, where one can consider that the whole mass of the rod is concentrated at a distance L/√3 ≈ 0.58 L along the rod!).
• The considerations above are valid for mechanical systems evolving “at a small scale"
in the field of gravity, for which one can consider the field as “straight" and uniform.
For mechanical systems evolving at a “large scale” such as e.g. a satellite orbiting a planet, one needs to apply (at least) the genuine formula for classical gravity, i.e.:

V = −G m / r   (3.25)
where r is the distance to the center of the planet.
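The rod integrals (3.19) and (3.24) above are quick to verify in a CAS. A minimal sketch in Python with SymPy (shown as a freely available alternative to the Matlab Symbolic Toolbox; note the factor 1/2 inherited from (3.3) in the kinetic term):

```python
# Minimal SymPy sketch (not from the notes): the rod integrals (3.19) and (3.24)
import sympy as sp

nu, L, m, g, theta, thetadot = sp.symbols('nu L m g theta thetadot', positive=True)

# kinetic energy: the particle at nu moves with speed nu*thetadot
T = sp.integrate(sp.Rational(1, 2) * (m / L) * (nu * thetadot)**2, (nu, 0, L))
W = sp.diff(T, thetadot, 2)      # since T = (1/2) W thetadot^2
print(T, W)                      # m*L**2*thetadot**2/6 and W = m*L**2/3

# potential energy: the particle at nu has vertical position nu*sin(theta)
V = sp.integrate((m * g / L) * nu * sp.sin(theta), (nu, 0, L))
print(V)                         # m*g*L*sin(theta)/2
```

The value W = mL²/3 is the familiar moment of inertia of a uniform rod about its end.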
Note that the Lagrange formalism can handle potential energy from other sources (mag-
netic field, electric field, etc.), but we will not consider these cases in this course. Another
source of potential energy that is commonly present in mechanical systems is potential en-
ergy stored in flexible components. We will often simply consider springs in this course, but
any part of the system that undergoes elastic deformations does in principle store potential energy.
Figure 3.4: Illustration of a spring. The position x = 0 is set at the spring rest length. The rigidity constant of the spring is k.

The force produced by the spring reads as:

F = −k x   (3.26)
where x is the position of the spring end, relative to its rest position (position at which the
spring yields no force). The potential energy stored in the spring is given by:
V = ∫ₓ⁰ −k ν dν = −(1/2) k ν² |ₓ⁰ = (1/2) k x²   (3.27)
Note that if the reference frame is chosen such that the rest length of the spring is not at
x = 0 but rather at an arbitrary position x = x0 , then the potential energy is given by:
V = (1/2) k (x − x₀)²   (3.28)
Example 3.5 (2D spring). Let us consider now a slightly more complex example illustrated
in the figure below.
Figure 3.5: Illustration of a spring moving in the plane. The rigidity constant of the spring
is k. We suppose that the rest length of the spring is L 0 .
3.3 L AGRANGE E QUATION
Before detailing the Euler-Lagrange equation, we need to define the Lagrange function
L(q, q̇) = T(q, q̇) − V(q) ∈ ℝ   (3.33)
simply made by subtracting the potential energy from the kinetic energy. In general,
the Lagrange function takes the generalized coordinates q and their time derivative q̇ as
arguments. Using the observations of the sections above, we observe that the Lagrange
function takes the form:
L(q, q̇) = (1/2) q̇⊤ W(q) q̇ − V(q)   (3.34)
The Euler-Lagrange equation then reads as:
d/dt ∂L/∂q̇ − ∂L/∂q = 0   (3.35)
and defines the model of the mechanical system. It is useful to observe here that (3.35) defines the equations as a row vector. Indeed, e.g. ∂L/∂q ∈ ℝ^{1×nq} is a row vector. One can verify that the first term is (of course) also a row vector, such that the difference between the two terms is well defined. As we tend to prefer working with equations in a column vector form,
it can be useful when needed to rewrite (3.35) in its transposed version, i.e.:
(d/dt) ∇q̇ L − ∇q L = 0   (3.36)
where we use the “gradient" notation:
∇q̇ L = (∂L/∂q̇)⊤ ,  ∇q L = (∂L/∂q)⊤   (3.37)
This transformation is cosmetic and of secondary importance, but it is useful to point it out: in the following we will transpose the terms appearing in (3.35) whenever it helps us write things in a neat and structured form.
Before investigating how (3.35) delivers the model of the system, it is worth pausing here to understand the mathematical meaning of the first term in (3.35). Indeed, the partial derivative of the Lagrange function L with respect to the time derivative of the generalized coordinates q̇, i.e. ∂L/∂q̇, may appear surprising or ambiguous. In fact, it is a very straightforward operation. In order to perform it correctly, one has to consider q̇ as a variable in itself, independent of all other variables, and take the classical differential operations accordingly. More specifically, considering (3.34), the operation ∂L/∂q̇ simply yields:
∇q̇ L = ∇q̇ ( (1/2) q̇⊤ W(q) q̇ − V(q) ) = ∇q̇ ( (1/2) q̇⊤ W(q) q̇ ) = W(q) q̇.   (3.38)
Forming the term (d/dt) ∂L/∂q̇ in the Lagrange equation requires a total differentiation with respect to time (operator d/dt). Using the chain rule, this yields:

(d/dt)(W(q) q̇) = ∂/∂q̇ (W(q) q̇) q̈ + ∂/∂q (W(q) q̇) q̇ = W(q) q̈ + ∂/∂q (W(q) q̇) q̇   (3.39)
We can then pack the developments above to observe that the Lagrange equation yields:
W(q) q̈ + ∂/∂q (W(q) q̇) q̇ − ∇q L = 0,  where ∇q L = ∇q T − ∇q V   (3.40)
In order to understand the “mechanics” behind using the Lagrange equations, it is useful to run through a handful of simple and classical examples. It is worth underlining here that the computations performed hereafter are best carried out in a Computer Algebra System (CAS) such as the Matlab Symbolic Toolbox. Indeed, the computations required to deploy Lagrange mechanics are very systematic, but quickly become involved. It is best to perform them on a computer. In the exercises associated with this course, we recommend using this approach to generate your models.
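To give a flavour of this automation, here is a minimal sketch in Python with SymPy (the course exercises use the Matlab Symbolic Toolbox; SymPy is a freely available alternative, and the steps are identical). It derives the model of the mass on a rigid rod of Example 3.1, with the potential (3.23):

```python
# Minimal SymPy sketch (not from the notes): automating (3.35) for Example 3.1
import sympy as sp

t = sp.symbols('t')
m, L, g = sp.symbols('m L g', positive=True)
theta = sp.Function('theta')(t)              # generalized coordinate q = theta

# Lagrange function: T = 1/2 m L^2 thetadot^2, V = m g L sin(theta)
T = sp.Rational(1, 2) * m * L**2 * theta.diff(t)**2
V = m * g * L * sp.sin(theta)
Lag = T - V

# Euler-Lagrange equation (3.35): d/dt dL/d(thetadot) - dL/d(theta) = 0
eq = sp.diff(Lag.diff(theta.diff(t)), t) - Lag.diff(theta)
thetaddot = sp.solve(sp.Eq(eq, 0), theta.diff(t, 2))[0]
print(thetaddot)                             # -g*cos(theta(t))/L
```

The same pipeline (build T and V, form L, differentiate, solve for q̈) carries over unchanged to the larger examples below.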
1. The Lagrange equation delivers the differential equation in an implicit form, i.e. it is a
set of equations relating the generalized coordinates and their first and second-order
time derivatives, i.e. q , q̇ , q̈ .
2. The Lagrange equation allows one to compute the system acceleration q̈ for a given system configuration q and its time derivative q̇ if the matrix W(q) is invertible. It can be used for simulating the system, if the initial conditions q, q̇ are provided (i.e. we need to specify the “position” and “velocity” of the system in order to predict its future trajectory). For q, q̇ given, the Lagrange equations can be solved for the acceleration q̈, from which the time evolution of q, q̇ can be computed (i.e. q is obtained from integrating q̇ and q̇ is obtained by integrating q̈).
3. The term

∇q L = ∇q T − ∇q V   (3.41)

in the Lagrange equation generally introduces forces that are intrinsic to the system (as opposed to forces applied “externally” to the system), such as e.g. forces deriving from potentials (coming from the “V” part, see the remark following (3.51)) and centrifugal forces (coming from the “T” part).
4. The Lagrange equation is linear in the accelerations q̈. This is true for Lagrange functions in the form (3.34), and stems from (3.39), which yields the first term in the Lagrange equation in the form W(q) q̈, which is linear in q̈.
5. Because the accelerations enter linearly in the Lagrange equation, we can solve it for q̈. More specifically, if one writes the Lagrange equation in the form (3.40), the accelerations q̈ can be explicitly expressed as:

q̈ = W(q)⁻¹ [ ∇q L − ∂/∂q (W(q) q̇) q̇ ]   (3.42)

Unfortunately, writing the model in this explicit form is not always a very good move. Indeed, unless the vector of generalized coordinates q is of very low dimension (e.g. nq ≤ 2), the inverse W(q)⁻¹ can be very complex, even if the matrix W(q) is fairly simple. An exception to this observation is the case where the matrix W(q) is constant (i.e. not actually a function of q). In this case

∇q T = 0  ⇒  ∇q L = −∇q V  and  ∂/∂q (W(q) q̇) q̇ = 0   (3.43)

such that the explicit model simply reads as:

q̈ = −W⁻¹ ∇q V   (3.44)

where the matrix W is purely “numerical” (i.e. it contains only numbers, not expressions), and is therefore easy to invert.
Let us try to nail down these remarks via a series of examples, starting simple and ramping
up to complex systems.
Example 3.6 (Spring-mass system). Consider the spring-mass system depicted in Fig. 3.6
below.
Figure 3.6: Vertical spring-mass system. The mass “0" position is set at the spring rest
length. The rigidity constant of the spring is k, and the hanging mass is m.
Let us deploy the principles of Lagrange modelling on this simple system. Our goal is to
unpack what the Lagrange equation does, and what the different terms “physically" corre-
spond to. Let us start with setting up our generalized coordinates. We need to describe the
mass position. Fig. 3.6 proposes to set the “0" position at the rest length of the spring, and
consider that x increases when the mass goes down. This choice is not unique: we could
set this “0" position anywhere we want and decide that x increases when the mass goes up.
The kinetic energy function is simple to calculate for this kind of system. We can use (3.1),
where our position p ≡ x is a scalar (we work in 1D). We then simply get:
T = (1/2) m ẋ²   (3.45)
One can observe here that our kinetic energy function is in the form (3.7), with W (x) = m.
The potential energy is composed of the sum of gravity and of the energy stored in the
spring (note that energies always add). These two terms read as:
V_gravity = −m g x  and  V_spring = (1/2) k x²   (3.46)
Note the minus sign on Vgravity . One can easily guess it by observing that if x increases the
mass goes down and therefore the potential energy decreases. This requires the minus sign.
The potential energy of a spring is always given by the quadratic form used here, i.e. it is
given by the elongation of the spring (from rest length) squared, multiplied by the rigidity
and divided by two.
We can now assemble the Lagrange function:
L = T − (V_gravity + V_spring) = (1/2) m ẋ² + m g x − (1/2) k x²   (3.47)
Once the Lagrange function is assembled, the modelling does not require “intelligence"
anymore, as the rest is just calculus and algebraic manipulations in order to extract a differ-
ential equation from (3.35). This procedure can actually be automated, and its deployment
entrusted to computer tools. Let us do the exercise of computing (3.35) here, starting with:
because W is constant. We will see later on that having a matrix W (q ) that is constant
simplifies significantly the resulting model equations.
We can turn to computing:
∇q L = ∇x L = mg − kx (3.51)
We ought to observe that this term delivers the forces acting on the system resulting from
the potentials present in the system. We can now assemble the Lagrange equations, i.e.
(3.35) for this simple example reads as:
m ẍ − mg + kx = 0 (3.52)
Because the acceleration ẍ enters linearly in the Lagrange equation, we can solve it for ẍ.
Here we can trivially manipulate (3.52) to get:
m ẍ = mg − kx. (3.53)
This equation is essentially the Newton equation “F = m · a" for the spring-mass system.
Indeed, we observe that:

m ẍ = m g − k x,  where m ẍ ≡ m · a  and  m g − k x ≡ F   (3.54)
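As remark 2. above suggests, the resulting ODE is easy to simulate numerically once ẍ is available explicitly. A minimal sketch with SciPy (the numerical values are illustrative assumptions, not from the notes):

```python
# Minimal SciPy sketch (not from the notes): simulating (3.53) with assumed values
import numpy as np
from scipy.integrate import solve_ivp

m, k, g = 1.0, 4.0, 9.81                 # illustrative values, not from the notes

def rhs(t, s):                           # first-order form, state s = (x, xdot)
    x, xdot = s
    return [xdot, g - (k / m) * x]

# start at the spring rest position, at rest
sol = solve_ivp(rhs, (0.0, 10.0), [0.0, 0.0], rtol=1e-9, atol=1e-9)
x = sol.y[0]

# the mass oscillates between 0 and 2*m*g/k around the equilibrium x* = m*g/k
print(x.min(), x.max(), m * g / k)
```

With these initial conditions the closed-form solution is x(t) = (mg/k)(1 − cos(√(k/m) t)), which the simulation reproduces.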
Example 3.7 (Linear crane). We will now consider a linear crane as depicted in Fig. 3.7.
This kind of system is e.g. important for loading/unloading cargo ships in large harbors.
The proposed generalized coordinates are visible in Fig. 3.7, i.e.
q = [x; θ] ∈ ℝ²   (3.55)
where x is the position of the cart on the rail, and θ is the angle of the rod linking the cart to the hanging mass with respect to the vertical. We observe here that there is
no problem mixing positions and angles (and any other physical quantity describing the
“configuration" of a mechanical system) in the generalized coordinates.
Figure 3.7: Simple hanging crane. The mass M moves on a rail (position x) and mass m
is linked to it via a massless rod of length L. We describe the position of the
hanging mass via the angle θ.
We can now compute the kinetic and potential energy functions. As energies are additive,
we can compute them separately for the two masses. The kinetic energy of the cart is simply
T_cart = (1/2) M ẋ². The energy of the hanging mass is a bit more complex to compute here, but
can still be simply derived from formula (3.1). Indeed, we observe that the position of the
hanging mass is given by:
p_m = [x + L sin θ; −L cos θ]   (3.56)
Note that the positive sign in the first line will specify in which direction a positive angle
will “rotate" the hanging mass. We can then compute:
ṗ_m = ∂p_m/∂q · q̇ = [ẋ + L θ̇ cos θ; L θ̇ sin θ]   (3.57)
and
‖ṗ_m‖² = ṗ_m⊤ ṗ_m = … = L² θ̇² + ẋ² + 2 L θ̇ ẋ cos θ   (3.58)
T = (1/2) M ẋ² + (1/2) m ṗ_m⊤ ṗ_m = (1/2)(m + M) ẋ² + (1/2) m L² θ̇² + m L θ̇ ẋ cos θ   (3.59)
We can deduce that the matrix W (q ) is now:
W(q) = [m + M, m L cos θ; m L cos θ, m L²]   (3.60)
and is not constant in this example (it is a function of θ). It can be useful to observe here that the matrix W(q) can be readily computed from the kinetic energy function by computing its Hessian with respect to q̇, i.e.

W(q) = ∂²T(q, q̇)/∂q̇²   (3.61)
The potential energy of the hanging crane is just gravity, and is related to the “z" position
of the hanging mass, i.e.:
V = m g [0 1] p_m = −m g L cos θ   (3.62)
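The Hessian short-cut (3.61) is convenient to exercise in a CAS. A minimal SymPy sketch recovering (3.60) from the kinetic energy (3.59) (SymPy used here as an illustrative stand-in for the Matlab Symbolic Toolbox):

```python
# Minimal SymPy sketch (not from the notes): W(q) as the Hessian (3.61) of (3.59)
import sympy as sp

m, M, L, theta = sp.symbols('m M L theta', positive=True)
xdot, thetadot = sp.symbols('xdot thetadot')

# kinetic energy (3.59) of the hanging crane
T = (sp.Rational(1, 2) * (m + M) * xdot**2
     + sp.Rational(1, 2) * m * L**2 * thetadot**2
     + m * L * thetadot * xdot * sp.cos(theta))

# Hessian of T with respect to qdot = (xdot, thetadot) recovers (3.60)
W = sp.hessian(T, [xdot, thetadot])
print(W)
```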
Example 3.8 (Elastic crane). Let us consider an elastic crane, identical to the one depicted
in Fig. 3.7, but where the rod is not rigid, hence the length L becomes variable, and the
associated potential energy stored in the rod is then given by:
V_rod = (1/2) K (L − L₀)²   (3.71)
where L 0 is the rest-length of the rod. We will now adopt the generalized coordinates
q = [x; θ; L]   (3.72)
The kinetic energy of the system now needs to account for L being time-varying, but the
computations performed previously do not change. I.e. we can compute:
ṗ_m = ∂p_m/∂q · q̇ = [ẋ + L θ̇ cos θ + L̇ sin θ; L θ̇ sin θ − L̇ cos θ]   (3.73)
and
‖ṗ_m‖² = ṗ_m⊤ ṗ_m = … = L² θ̇² + ẋ² + 2 L θ̇ ẋ cos θ + L̇² + 2 L̇ ẋ sin θ   (3.74)
The kinetic energy of the hanging crane is therefore given by:
T = (1/2) M ẋ² + (1/2) m ṗ_m⊤ ṗ_m = (1/2)(m + M) ẋ² + (1/2) m L² θ̇² + m L θ̇ ẋ cos θ + (1/2) m L̇² + m L̇ ẋ sin θ   (3.75)
The matrix W(q) is a bit more complex in this example. It reads as:
W(q) = [m + M, m L cos θ, m sin θ; m L cos θ, m L², 0; m sin θ, 0, m]   (3.76)
It is the same as (3.60), but with an extra row and an extra column stemming from the new
coordinate L. Our potential energy now includes the new term (3.71) corresponding to the
elastic rod. It reads as:
V = −m g L cos θ + (1/2) K (L − L₀)²   (3.77)
We can then assemble the Lagrange function and compute the different terms of the La-
grange equation. We have:
∇q̇ L = [ (M + m) ẋ + m L θ̇ cos θ + m L̇ sin θ; m L² θ̇ + m L ẋ cos θ; m L̇ + m ẋ sin θ ] = W(q) q̇   (3.78a)

(d/dt) ∇q̇ L = [ (M + m) ẍ + m L cos θ θ̈ − m L sin θ θ̇² + 2 m L̇ cos θ θ̇ + m L̈ sin θ; m L cos θ ẍ + m L² θ̈ − m L θ̇ ẋ sin θ + m L̇ ẋ cos θ + 2 m L L̇ θ̇; m L̈ + m ẍ sin θ + m θ̇ ẋ cos θ ]   (3.78b)

∇q L = [ 0; −m (L g sin θ − L̇ ẋ cos θ + L θ̇ ẋ sin θ); m L θ̇² + m ẋ cos θ θ̇ − K (L − L₀) + m g cos θ ]   (3.78c)
The Lagrange equation then reads as:
[ (M + m) ẍ + m sin θ L̈ + m L cos θ θ̈ − m L sin θ θ̇² + 2 m L̇ cos θ θ̇; m L (L θ̈ + 2 L̇ θ̇ + ẍ cos θ + g sin θ); m L̈ + m sin θ ẍ + K (L − L₀) − m g cos θ − m L θ̇² ] = 0   (3.79)

As previously, the acceleration q̈ enters linearly in (3.79). We can again observe that (3.79) can be written as:

W(q) q̈ + [ m θ̇ (2 L̇ cos θ − L θ̇ sin θ); m L (2 L̇ θ̇ + g sin θ); −m L θ̇² + K (L − L₀) − m g cos θ ] = 0   (3.80)
and the acceleration can be computed explicitly using:
q̈ = −W(q)⁻¹ [ m θ̇ (2 L̇ cos θ − L θ̇ sin θ); m L (2 L̇ θ̇ + g sin θ); −m L θ̇² + K (L − L₀) − m g cos θ ]   (3.81)
Unfortunately, (3.81) yields very long expressions, which we skip writing here.
Example 3.9 (Elastic crane, cont’d). Let us revisit the elastic crane example with a different
choice of generalized coordinates. Instead of specifying the position of the hanging mass
using (θ, L) (these are polar coordinates), it is equally valid to use cartesian coordinates. Let
us describe the hanging mass using
p = [p₁; p₂] ∈ ℝ²   (3.82)
as illustrated in Fig. 3.8, and we choose to order our generalized coordinates q as:

q = [x; p] ∈ ℝ³.   (3.83)
Figure 3.8: Simple hanging crane similar to Fig. 3.7. Here we use a cartesian coordinate
system.
The kinetic energy then reads as:
T = (1/2) M ẋ² + (1/2) m ṗ⊤ ṗ   (3.84)
and the W (q ) matrix reads as:
W(q) = [M, 0, 0; 0, m, 0; 0, 0, m]   (3.85)
(note that the minus sign here is due to the reference frame chosen for the hanging mass,
with the vertical basis vector oriented down, see Fig. 3.8). Hence the Lagrange function
reads as:
L = T − V = (1/2) M ẋ² + (1/2) m ṗ⊤ ṗ − (1/2) K (L − L₀)² + m g p₂   (3.88)
We can then compute:
∇q̇ L = [M ẋ; m ṗ]   (3.89a)

(d/dt) ∇q̇ L = [M ẍ; m p̈]   (3.89b)

∇q L = [0; 0; m g] + (K (L − L₀)/L) [p₁ − x; x − p₁; −p₂]   (3.89c)
Figure 3.9: Illustration of the elastic hanging chain
Example 3.10 (Hanging chain). Consider an elastic hanging chain as depicted in Fig. 3.9.
The chain is made of N punctual masses each of mass m that can move in a 3-dimensional
space, linked together by elastic rods of rigidity K , and linked to two fixed points by elastic
rods as well. The “configuration" of the hanging chain can be described by listing the posi-
tions p 1,...,N of each mass in the vector q ∈ R3N . We will label the fixed end-points as p 0 and
p N+1 , and these two points will not be part of the generalized coordinates q . For the sake
of simplicity, we will consider that the elastic links have a rest-length L 0 = 0.
The kinetic energy is given by
T = (1/2) m q̇⊤ q̇,   (3.91)
and hence the matrix W (q ) reads as
W(q) = m I,   (3.92)
where I ∈ ℝ^{3N×3N} is the identity matrix, and is therefore constant. The potential energy function reads as (the first sum below being the gravity term, the second the spring term):

V = Σ_{k=1}^N m g [0 0 1] p_k + (1/2) K Σ_{k=0}^N ‖p_{k+1} − p_k‖²   (3.93)
where p 0 and p N+1 are the fixed positions of the end points. The Lagrange function reads
as:
L = (1/2) m q̇⊤ q̇ − Σ_{k=1}^N m g [0 0 1] p_k − (1/2) K Σ_{k=0}^N ‖p_{k+1} − p_k‖²   (3.94)
We can then compute:
∇q̇ L = m q̇   (3.95a)

(d/dt) ∇q̇ L = m q̈   (3.95b)

∇q L = K [p₀ − p₁; p₁ − p₂; …; p_{N−1} − p_N] + K [p₂ − p₁; p₃ − p₂; …; p_{N+1} − p_N] + m [g; g; …; g],  g = [0; 0; −g]   (3.95c)
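Because W = m I is constant here, remark 5. of Section 3.3 applies and the accelerations follow directly from the gradient, q̈ = ∇q L / m. A minimal NumPy sketch evaluating the gradient (3.95c) for a small chain (all numerical values are illustrative assumptions, not from the notes):

```python
# Minimal NumPy sketch (not from the notes): evaluating (3.95c) for a small chain
import numpy as np

N, m, K, g = 3, 0.5, 50.0, 9.81                    # illustrative values
p0 = np.array([0.0, 0.0, 0.0])                     # fixed left end
pN1 = np.array([1.0, 0.0, 0.0])                    # fixed right end

def grad_L(q):
    """q has shape (N, 3): positions p_1..p_N; returns the gradient of L."""
    pts = np.vstack([p0, q, pN1])
    G = np.zeros_like(q)
    for k in range(1, N + 1):
        # spring terms K(p_{k-1} - p_k) + K(p_{k+1} - p_k), gravity -m*g on z
        G[k - 1] = K * (pts[k - 1] - pts[k]) + K * (pts[k + 1] - pts[k])
        G[k - 1, 2] -= m * g
    return G

# W = m*I is constant, so the accelerations are simply qddot = grad_L(q)/m
q = np.linspace(p0, pN1, N + 2)[1:-1]              # masses on the straight line
print(grad_L(q) / m)   # spring terms cancel: pure downward acceleration -g
```

On the straight, equally spaced configuration the spring terms cancel exactly, so the chain initially accelerates straight down, which is the expected onset of sagging.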
Example 3.11 (2-pendulum crane). Let us consider now a 2-pendulum crane as illustrated
in Fig. 3.10.
Figure 3.10: Illustration of the 2-pendulum crane. The cart of mass m moves on a rail (posi-
tion x, not shown here) and the two hanging masses also of mass m are linked
to it via massless rods of length L. We describe the position of the hanging
masses via the angle θ1,2 (relative to the verticals).
We proceed with building the kinetic and potential energy functions for this system. To that
end, it is best to start with describing the position of the two hanging masses (in a cartesian
reference frame), as functions of the generalized coordinates
q = [x; θ₁; θ₂]   (3.98)
where the cart is at the position p₀ = [x; 0]. The first hanging mass is at a cartesian position:

p₁ = [x; 0] + R(θ₁) [0; −L]   (3.100)

where

R(θ₁) = [cos θ₁, −sin θ₁; sin θ₁, cos θ₁]   (3.101)

and the second hanging mass is at the position:

p₂ = p₁ + R(θ₂) [0; −L]   (3.102)
We can then readily apply the chain rule (3.2) to obtain ṗ₀, ṗ₁, ṗ₂. The kinetic energy function then reads as (the cart and the two hanging masses are all of mass m):

T = (1/2) Σ_{k=0}^2 m ṗ_k⊤ ṗ_k   (3.103)
We can then compute the different terms of the Lagrange equation. In particular, the term appearing in (3.40) reads as:

∂/∂q (W(q) q̇) q̇ = −m L [ 2 sin θ₁ θ̇₁² + sin θ₂ θ̇₂²; L θ̇₁ θ̇₂ sin(θ₁ − θ₂) + 2 ẋ θ̇₁ sin θ₁ − L θ̇₂² sin(θ₁ − θ₂); −L θ̇₁ θ̇₂ sin(θ₁ − θ₂) + ẋ θ̇₂ sin θ₂ + L θ̇₁² sin(θ₁ − θ₂) ]   (3.108)
One can assemble the resulting model in e.g. its explicit form, by computing (identically to (3.42)):

q̈ = W(q)⁻¹ [ ∇q L − ∂/∂q (W(q) q̇) q̇ ]   (3.109)

However, the symbolic complexity resulting from computing (3.109) is high (and not reported here for this reason).
∇q E = Q (3.110)
i.e. they describe the change of energy in the system when the generalized coordinates q
are “moved a little bit".
This concept is often presented in the literature via the following equation:
δW = 〈Q , δq 〉 (3.111)
which is essentially saying the same as (3.110), i.e. a “small" motion δq , combined with the
external forces and moments produces a “small" amount of work δW (a change of energy
in the system), and the generalized forces Q relate the small motion to small amount of
work produced. Once the generalized forces Q are known, they can be readily included in
the Lagrange formalism using:
d/dt ∂L/∂q̇ − ∂L/∂q = Q⊤   (3.112)
A fairly systematic procedure can be derived from these concepts whenever a force or a torque is applied punctually somewhere in the system. Suppose that a force given by vector
F ∈ Rn (where n = 1, 2 or 3 is the number of dimensions in which we are working) in a given
fixed reference frame R is applied at a specific point of the system, having a position p ∈ Rn
in the same reference frame R. We then observe that a small change in the generalized
coordinates yields a small displacement of the position p given by the Jacobian ∂p/∂q, and a small work:

δW = F⊤ (∂p/∂q) δq   (3.113)
It follows that in this case the generalized force corresponding to F is given by:
Q = (∂p/∂q)⊤ F = ∇q p F   (3.114)
This principle can be easily extended to several forces. When a set of forces F 1,...,m is applied
to a set of points p 1,...,m , then the generalized force is given by:
Q = Σ_{i=1}^m (∂p_i/∂q)⊤ F_i ≡ Σ_{i=1}^m ∇q p_i F_i   (3.115)
A couple of simple examples illustrate how these concepts can be used in practice:
• Consider the simple case of a point whose position is described in an orthonormal reference frame, such that the generalized coordinates are q ∈ ℝ³, and subject to a
force F ∈ R3 described in the same reference frame. A small displacement q → q +δq
combined with the force F yields a small amount of work:
δW = 〈 F , δq 〉 = F ⊤ δq (3.116)
such that in this specific case Q = F , i.e. the generalized force in this system is the
force F itself. One can verify that this is also given by (3.115). Indeed, since the posi-
tion of the point is readily given by the generalized coordinates, i.e. p = q , it follows
that an application of (3.115) yields (where m = 1):
Q = (∂p/∂q)⊤ F = F,  since ∂p/∂q = I   (3.117)
• The very same principle applies to moments and rotations. Consider an axis of rotation whose position is described via the angle θ ∈ ℝ, and subject to a moment T ∈ ℝ. The amount of work produced by the moment T for a small motion δθ is then given by:
δW = 〈 T , δθ 〉 = T δθ (3.118)
Example 3.12 (2-pendulum crane, cont’d). Consider the crane example illustrated in Fig.
3.10. Recall that the position of the two masses in a fixed reference frame is given by (see
(3.100) and (3.102)):
p₁ = [x; 0] + R(θ₁) [0; −L]   (3.119a)

p₂ = p₁ + R(θ₂) [0; −L]   (3.119b)
If forces F₁, F₂ ∈ ℝ² are applied to masses m₁, m₂ respectively, the generalized force reads as:

Q = [1, 0; L cos θ₁, L sin θ₁; 0, 0] F₁ + [1, 0; L cos θ₁, L sin θ₁; L cos θ₂, L sin θ₂] F₂   (3.122)
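The matrices in (3.122) are just the transposed Jacobians appearing in (3.115), and a CAS generates them mechanically. A minimal SymPy sketch (illustrative; the symbol names are assumptions of the sketch):

```python
# Minimal SymPy sketch (not from the notes): generalized forces (3.122) via (3.115)
import sympy as sp

x, th1, th2, L = sp.symbols('x theta1 theta2 L')
q = sp.Matrix([x, th1, th2])

def R(a):   # planar rotation matrix, as in (3.101)
    return sp.Matrix([[sp.cos(a), -sp.sin(a)], [sp.sin(a), sp.cos(a)]])

# positions (3.119) of the two hanging masses
p1 = sp.Matrix([x, 0]) + R(th1) * sp.Matrix([0, -L])
p2 = p1 + R(th2) * sp.Matrix([0, -L])

F1 = sp.Matrix(sp.symbols('F11 F12'))   # assumed force components
F2 = sp.Matrix(sp.symbols('F21 F22'))

# Q = sum of transposed Jacobians times the forces, formula (3.115)
Q = p1.jacobian(q).T * F1 + p2.jacobian(q).T * F2
print(sp.simplify(Q))
```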
• In the spring-mass system, any position x is “acceptable" for the system (even if for
practical reasons the position of the mass may be restricted in a real system)
• In the hanging crane example, any value of θ and x are valid (even if in practice the
rail may have a limited length and the hanging mass may not be physically allowed
to get above the rail)
To clarify where we are going here, let us consider a simple example where the generalized
coordinates are not free to “move independently".
Example 3.13 (Bowl). Let us consider a “bowl” in 3D, described by the scalar equation c(p) = 0 (see Fig. 3.11 for an illustration), where p ∈ ℝ³ are cartesian coordinates and

c(p) = p₃ − (1/4)(p₂² + p₁²) ∈ ℝ   (3.123)
Let us consider a mass m moving in 3D, but forced to slide on the surface of the bowl. The
position of the mass can be described by the cartesian coordinates p ∈ R3 , but since the
mass moves at the surface of the bowl, the position p is not “free" to move everywhere, but
is “constrained"
¡ ¢ to move in the 2D “space" described by (3.123). Indeed, only positions that
satisfy c p = 0 are admissible. Put differently, the generalized coordinates are not “free" to
move independently. We will return to the example after having formally introduced some
new concepts.
Fortunately, the Lagrange formalism we have discussed so far can be readily extended to
constrained problems, with minimal changes. The construction of the kinetic and potential
energy functions for the system can be done without any regard to the constraint function
c (q ) = 0. The constraints appear in the Lagrange function, which is then assembled as:
L(q, q̇, z) = T(q, q̇) − V(q) − 〈z, c(q)〉   (3.124)
where z is a new set of variables, usually labelled the Lagrange multipliers associated to the constraint function c. When the function c(q) has the vector space ℝ^{nc} as image space (i.e. when it returns a “standard” vector), then z ∈ ℝ^{nc} and we can rewrite (3.124) simply as²:

L(q, q̇, z) = T(q, q̇) − V(q) − z⊤ c(q)   (3.125)
Beyond this point, the Lagrange formalism applies without any modification, i.e. the dynamics associated to a system of kinetic and potential energy T and V, constrained to evolve on the manifold c(q) = 0, are described by:
d/dt ∂L/∂q̇ − ∂L/∂q = Q⊤   (3.126a)

c(q) = 0   (3.126b)
The mathematical justification for this construction is beyond the scope of this course, but we will try to build some intuition via examples. Before deploying examples of application of the constrained Lagrange equation, we can do the same work as in (3.38) and (3.39). I.e. one can verify that the equalities:
∇q̇ L = W(q) q̇   (3.127)
and
(d/dt) ∇q̇ L = (d/dt)(W(q) q̇) = W(q) q̈ + ∂/∂q (W(q) q̇) q̇   (3.128)
still hold, and are actually not impacted by the presence of the constraints c(q). The constraints modify the Lagrange equation via the term:

∇q L = ∇q T − ∇q V − ∇q c · z   (3.129)
We ought to recall here remark 3. of Section 3.3: the term ∇q L in the Lagrange equation
holds forces that are “intrinsic" to the system (potential, centrifugal). This remark still holds
here, and we will see in the illustrative examples below that the term ∇q c · z is in fact akin
to a force “keeping the system on the manifold c ".
² The generic form (3.124) allows one to consider constraint functions in “exotic” vector spaces such as e.g. matrix spaces or Hilbert spaces. We will discuss only the simple case of “standard” vector spaces here.
As observed previously in Section 3.3, we can assemble the Lagrange equation (3.126) in
the form:
W(q) q̈ + ∂/∂q (W(q) q̇) q̇ − ∇q T + ∇q V + ∇q c · z = Q   (3.130a)

c(q) = 0   (3.130b)
We can observe again here that the Lagrange equation (and its explicit forms (3.131) and (3.132)) delivers the acceleration q̈ as a function of q, q̇, Q and of the Lagrange multipliers z. Hence, a simulation of the model could be produced for given initial conditions q(0), q̇(0), given external forces Q(t) (given at all times) and given Lagrange multipliers z(t). In order to compute a simulation of the model equations, we therefore need to somehow calculate the (time-varying) Lagrange multipliers z at every time instant of the simulation. We will investigate in the next section how to do that.
Let us now consider a list of examples in order to digest these theoretical concepts, and to
gather some intuitions on the meaning of the constrained Lagrange equation (3.126).
Example 3.14 (Bowl, cont’d). Let us start with our bowl example of Fig. 3.11 to detail the
procedure behind the constrained Lagrange equation (3.126). The mass position is de-
scribed in the cartesian coordinates q ≡ p ∈ R3 , such that the kinetic and potential energy
function read as:
T = (1/2) m ṗ⊤ ṗ,  V = m g [0 0 1] p   (3.133)
Note that we can assemble these functions by completely disregarding the existence of the constraint c(q) = 0. We then assemble the constrained Lagrange function as per (3.125):
L = (1/2) m ṗ⊤ ṗ − m g [0 0 1] p − z⊤ (p₃ − (1/4)(p₂² + p₁²))   (3.134)
Note that here the constraint is scalar, such that z ∈ ℝ, and the scalar product z⊤ c(q) is a simple product, i.e. we can rewrite the Lagrange function:
L = (1/2) m ṗ⊤ ṗ − m g [0 0 1] p − z (p₃ − (1/4)(p₂² + p₁²))   (3.135)
and deploy the Lagrange equation as usual:
∇q̇ L = m ṗ   (3.136a)

(d/dt) ∇q̇ L = m p̈   (3.136b)

∇q L = [0; 0; −m g] − ∇q c · z,  where ∇q c = [−p₁/2; −p₂/2; 1]   (3.136c)
The dynamics of the mass in the bowl are then given by:

m p̈ = [0; 0; −m g] − z [−p₁/2; −p₂/2; 1]   (3.137a)

0 = p₃ − (1/4)(p₂² + p₁²)   (3.137b)
As detailed in the theory above, it is worth here briefly underlining the following:
• Equation (3.137b) is scalar, and describes the condition that the position p ought to satisfy in order to “be on” the manifold (i.e. the bowl here). However, one ought to observe that (3.137b) does not deliver the unknown z. As a matter of fact, z does not even appear in (3.137b). Hence, while (3.137) provides 4 equations for the 4 unknown variables q̈, z, it cannot be solved for them as such, as it fails to deliver z.
Example 3.15 (Crane, revisited). Let us revisit the crane of Fig. 3.8 in cartesian coordinates,
assuming that the link between the cart and the hanging mass is (infinitely) rigid. The
kinetic energy function is built in the same way as in the elastic example above, i.e. we use
the coordinates
\[
p = \begin{bmatrix} p_1 \\ p_2 \end{bmatrix} \in \mathbb{R}^2 \tag{3.138}
\]
for the position of the hanging mass, and x for the position of the cart. We choose to order our generalized coordinates q as:
\[
q = \begin{bmatrix} x \\ p \end{bmatrix} \tag{3.139}
\]
The kinetic energy function then reads as:
\[
T = \frac{1}{2} M \dot x^2 + \frac{1}{2} m \dot p^\top \dot p \tag{3.140}
\]
and the W matrix reads as (3.85), i.e.:
\[
W = \begin{bmatrix} M & 0 & 0 \\ 0 & m & 0 \\ 0 & 0 & m \end{bmatrix} \tag{3.141}
\]
and is constant. The potential energy now involves only the energy from gravity, i.e.
\[
V = -mg\, p_2 \tag{3.142}
\]
The constraint then fixes the distance between the cart and the hanging mass. The vector describing the link is given by:
\[
\delta = p - \begin{bmatrix} x \\ 0 \end{bmatrix} \tag{3.143}
\]
and the constraint imposed by the link in the system can e.g. be written as:
\[
\left( \delta^\top \delta \right)^{\frac{1}{2}} - L = 0 \tag{3.144}
\]
For computational reasons, it is useful to "get rid" of the square root function in (3.144) and to scale the constraint by ½, i.e. we write:
\[
c(q) = \frac{1}{2}\left( \delta^\top \delta - L^2 \right) = 0 \tag{3.145}
\]
which is equivalent to (3.144). The Lagrange function then reads as:
\[
L = \frac{1}{2} M \dot x^2 + \frac{1}{2} m \dot p^\top \dot p + mg\, p_2 - \frac{1}{2} z \left( \delta^\top \delta - L^2 \right) \tag{3.146}
\]
We can then deploy the Lagrange equation as usual:
\[
\nabla_{\dot q} L = W \dot q \tag{3.147a}
\]
\[
\frac{\mathrm{d}}{\mathrm{d}t} \nabla_{\dot q} L = W \ddot q \tag{3.147b}
\]
\[
\nabla_q L = \begin{bmatrix} 0 \\ 0 \\ mg \end{bmatrix} - z \begin{bmatrix} x - p_1 \\ p_1 - x \\ p_2 \end{bmatrix} \tag{3.147c}
\]
In a more explicit form, the model reads as:
\[
\ddot q = W^{-1} \left( \begin{bmatrix} 0 \\ 0 \\ mg \end{bmatrix} - z \begin{bmatrix} x - p_1 \\ p_1 - x \\ p_2 \end{bmatrix} \right) \tag{3.149a}
\]
\[
0 = \frac{1}{2}\left( \delta^\top \delta - L^2 \right) \tag{3.149b}
\]
The observations already made above concerning solving (3.149) for z still hold here. It is interesting at this stage to compare (3.149) to its equivalent (3.70) developed in polar coordinates (i.e. using the angle θ to describe the position of the hanging mass). While (3.149) comprises more equations than (3.70) (4 vs. 2), its symbolic complexity is lower. Indeed, while (3.70) comprises several trigonometric terms (sin and cos) and a division, (3.149) includes only bilinear terms (i.e. products of the variables).
Example 3.16 (2-pendulum crane, revisited). We will now revisit the 2-pendulum crane illustrated in Fig. 3.10. Modelling this system using polar coordinates (i.e. the angles of the pendulums) resulted in a model of very high symbolic complexity (see (3.107)-(3.108) and the following remarks). Let us see the outcome of approaching the modelling of the same system using cartesian coordinates and constrained Lagrange. We select the
generalized coordinates:
\[
q = \begin{bmatrix} x \\ p_1 \\ p_2 \end{bmatrix}, \qquad p_{1,2} \in \mathbb{R}^2 \tag{3.150}
\]
where
\[
\delta_1 = \begin{bmatrix} x \\ 0 \end{bmatrix} - p_1, \qquad \delta_2 = p_2 - p_1 \tag{3.154}
\]
where z ∈ ℝ². We can then compute
\[
\nabla_{\dot q} L = m \dot q \tag{3.156a}
\]
\[
\frac{\mathrm{d}}{\mathrm{d}t} \nabla_{\dot q} L = m \ddot q \tag{3.156b}
\]
\[
\nabla_q L = \begin{bmatrix} 0 \\ 0 \\ mg \\ 0 \\ mg \end{bmatrix} - \begin{bmatrix} x - \begin{bmatrix} 1 & 0 \end{bmatrix} p_1 & 0 \\ -\delta_1 & -\delta_2 \\ 0 & \delta_2 \end{bmatrix} z \tag{3.156c}
\]
This ought to be compared to (3.107)-(3.108), where the final model was not even provided because of its very high symbolic complexity. The reduction of symbolic complexity results from the choice of cartesian coordinates, which yields a constant and diagonal W matrix. As a result, the complex term \(\frac{\partial}{\partial q}\left( W(q)\, \dot q \right) \dot q\) in
\[
\ddot q = W(q)^{-1} \left( Q + \nabla_q T - \nabla_q V - \nabla_q c \cdot z - \frac{\partial}{\partial q}\left( W(q)\, \dot q \right) \dot q \right) \tag{3.159}
\]
disappears, and the inverse W(q)⁻¹ is trivial.
In the previous section we have seen that we can write the accelerations q̈ as
\[
\ddot q = W^{-1} \left[\, Q - \nabla_q V - \nabla_q c \cdot z \,\right] \tag{3.160}
\]
As observed before, in order to compute a simulation from (3.160) we need the (time-
varying) Lagrange multipliers z at every time instant of the simulation.
Unfortunately, a problem arises here. Indeed, since we use equation (3.126a) (or its equiv-
alent form (3.130a)) to compute q̈, we are left with equation (3.126b) to determine the un-
known z . However, equation (3.126b) does not deliver z ; indeed it is not even a function of
z , and thus cannot inform us about z . We will see in the following how to circumvent this
problem, and we will put some further formalism around this issue in Chapter 6.
Running simulations for mechanical systems requires one to be able to compute the system accelerations q̈ as a function of the positions and velocities q, q̇ and of the external forces Q. In the unconstrained case, this is readily feasible as long as the matrix W(q) is full rank. In the constrained case, we have raised the issue before that a model arising from constrained Lagrange and taking the form:
\[
\frac{\mathrm{d}}{\mathrm{d}t} \frac{\partial L}{\partial \dot q} - \frac{\partial L}{\partial q} = Q^\top \tag{3.161a}
\]
\[
c(q) = 0 \tag{3.161b}
\]
does not deliver as such the accelerations of the system. Indeed, while equation (3.161a) delivers the acceleration q̈ via its explicit version:
\[
\ddot q = W(q)^{-1} \left( Q + \nabla_q T - \nabla_q V - \nabla_q c \cdot z - \frac{\partial}{\partial q}\left( W(q)\, \dot q \right) \dot q \right) \tag{3.162}
\]
the accelerations q̈ are functions of the unknown z, which is not delivered by (3.161b), as c(q) is not even a function of z. In this section we will see how to tackle this problem.
If one sets z = 0 in the dynamic equation (3.161a), then one can verify that the equation would be describing a "free" motion, i.e. what the system would do if the constraints c(q) = 0 were not present at all. E.g. in the "bowl" example above, the ball would be in free fall, and in the 2-pendulum crane example, the hanging masses would be in free fall and the cart would move along its rail as if not connected to anything. One can therefore construe ∇qc · z in (3.162) as a term that enforces the constraints c(q) = 0 by manipulating the q̈ resulting from (3.162) via adequate selections of the variables z. As a matter of fact, the combination of terms
\[
W(q)^{-1} \nabla_q c \cdot z \tag{3.163}
\]
arising in (3.162) yields an acceleration that is added to the other "sources" of acceleration (e.g. stemming from forces, gravity, centrifugal effects, etc.). One can observe that W(q)⁻¹∇qc is a matrix of dimension n_q × n_c, such that the term (3.163) yields accelerations in the subspace spanned by the columns of W(q)⁻¹∇qc.
Let us then build some intuition about what happens here. We know that z ought to be chosen such that the accelerations q̈ enforce the constraints c(q) = 0. However, the constraints specify conditions on the system position, not on its accelerations. We ought to understand, though, that the positions q are obviously influenced by the accelerations q̈ (via 2 integrations), such that the accelerations influence c(q) (via 2 integrations). In order to choose z adequately, we need to make the impact of the accelerations on c(q) appear explicitly.
One should observe that if c(q) = 0 is enforced throughout the trajectory of the system (i.e. at every time instant), then:
\[
\frac{\mathrm{d}^k}{\mathrm{d}t^k} c(q) = 0 \tag{3.164}
\]
also holds at all times for any k ≥ 0. We also observe that \(\frac{\mathrm{d}^2}{\mathrm{d}t^2} c(q) = 0\) is a condition where the accelerations appear explicitly. In order to see that, we simply need to apply some chain rules:
\[
\dot c(q, \dot q) = \frac{\mathrm{d}}{\mathrm{d}t} c(q) = \frac{\partial c}{\partial q} \dot q \tag{3.165a}
\]
\[
\ddot c(q, \dot q, \ddot q) = \frac{\mathrm{d}^2}{\mathrm{d}t^2} c(q) = \frac{\partial c}{\partial q} \ddot q + \frac{\partial}{\partial q}\left( \frac{\partial c}{\partial q} \dot q \right) \dot q \tag{3.165b}
\]
We can then assemble the Lagrange equation (3.126a) and equation (3.165b) in the new model:
\[
\frac{\mathrm{d}}{\mathrm{d}t} \frac{\partial L}{\partial \dot q} - \frac{\partial L}{\partial q} = Q^\top \tag{3.166a}
\]
\[
\ddot c(q, \dot q, \ddot q) = 0 \tag{3.166b}
\]
However, as observed previously for the matrix W(q) alone, inverting the symbolic matrix M(q) can yield an extremely complex matrix M(q)⁻¹, even if M(q) is fairly "simple". For this reason, it is often preferable to work with the model (3.168) in its implicit form, or to treat the matrix inversion M(q)⁻¹ numerically when deploying model (3.169).
Let us revisit some of the examples of Section 3.5 in the light of the model transformation
described above.
Example 3.17 (Bowl, cont’d). In the bowl Example 3.13, the model equations determined
were
\[
m \ddot p = -mg \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} - z \begin{bmatrix} -\frac{1}{2} p_1 \\ -\frac{1}{2} p_2 \\ 1 \end{bmatrix} \tag{3.170a}
\]
\[
0 = p_3 - \frac{1}{4}\left( p_2^2 + p_1^2 \right) \tag{3.170b}
\]
where (3.170b) is c(q) = 0 written explicitly. In order to perform the model transformation detailed above on this model, we need to time-differentiate (3.170b) twice:
\[
c(q) = p_3 - \frac{1}{4}\left( p_2^2 + p_1^2 \right) \tag{3.171a}
\]
\[
\dot c(q, \dot q) = \dot p_3 - \frac{1}{2}\left( p_2 \dot p_2 + p_1 \dot p_1 \right) \tag{3.171b}
\]
\[
\ddot c(q, \dot q, \ddot q) = \ddot p_3 - \frac{1}{2}\left( p_2 \ddot p_2 + p_1 \ddot p_1 + \dot p_2^2 + \dot p_1^2 \right) \tag{3.171c}
\]
These expressions can be obtained directly as above, or by using the construction (3.165). The transformed model then reads as:
\[
m \ddot p = -mg \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} - z \begin{bmatrix} -\frac{1}{2} p_1 \\ -\frac{1}{2} p_2 \\ 1 \end{bmatrix} \tag{3.172a}
\]
\[
0 = \ddot p_3 - \frac{1}{2}\left( p_2 \ddot p_2 + p_1 \ddot p_1 + \dot p_2^2 + \dot p_1^2 \right) \tag{3.172b}
\]
It can be put in the form (3.168):
\[
\underbrace{\begin{bmatrix} m & 0 & 0 & -\frac{1}{2} p_1 \\ 0 & m & 0 & -\frac{1}{2} p_2 \\ 0 & 0 & m & 1 \\ -\frac{1}{2} p_1 & -\frac{1}{2} p_2 & 1 & 0 \end{bmatrix}}_{:= M(q)} \begin{bmatrix} \ddot p_1 \\ \ddot p_2 \\ \ddot p_3 \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ -mg \\ \frac{1}{2}\left( \dot p_2^2 + \dot p_1^2 \right) \end{bmatrix} \tag{3.173}
\]
The determinant of matrix M(q) reads as:
\[
\det\left( M(q) \right) = -\frac{m^2}{4} \left( p_1^2 + p_2^2 + 4 \right) < 0 \tag{3.174}
\]
and is non-zero for any position p, such that (3.173) is always well defined. The inverse of matrix M(q) is fairly complex, such that writing (3.173) explicitly is tedious.
Example 3.18 (Crane, cont’d). Let us look at the crane Example 3.15, which had the original
model
\[
W \ddot q = \begin{bmatrix} 0 \\ 0 \\ mg \end{bmatrix} - z \begin{bmatrix} x - p_1 \\ p_1 - x \\ p_2 \end{bmatrix} \tag{3.175a}
\]
\[
0 = \frac{1}{2}\left( \delta^\top \delta - L^2 \right) \tag{3.175b}
\]
where \(\delta = p - \begin{bmatrix} x \\ 0 \end{bmatrix}\) and
\[
W = \begin{bmatrix} M & 0 & 0 \\ 0 & m & 0 \\ 0 & 0 & m \end{bmatrix} \tag{3.176}
\]
We can then proceed with time-differentiating the constraints:
\[
c(q) = \frac{1}{2}\left( \delta^\top \delta - L^2 \right) \tag{3.177a}
\]
\[
\dot c(q, \dot q) = \delta^\top \dot \delta \tag{3.177b}
\]
\[
\ddot c(q, \dot q, \ddot q) = \delta^\top \ddot \delta + \dot \delta^\top \dot \delta \tag{3.177c}
\]
while the transformed model reads as:
\[
\frac{\mathrm{d}}{\mathrm{d}t} \frac{\partial L}{\partial \dot q} - \frac{\partial L}{\partial q} = Q^\top \tag{3.180a}
\]
\[
\ddot c(q, \dot q, \ddot q) = 0 \tag{3.180b}
\]
An important question to ask here is whether the transformed model (3.180) is equivalent to the original model (3.179). Since the trajectories arising from the original model (3.179) satisfy c = 0 at all times, they also satisfy (3.180b) at all times. Hence, the trajectories of the original model are also trajectories of the transformed model (3.180). Unfortunately, the converse is not always true. Indeed, the trajectories of the transformed model (3.180) satisfy c̈ = 0 at all times, which does not entail that they satisfy c = 0 at all times. Actually, if the trajectories satisfy c̈ = 0 at all times, then one can verify that the constraints c must obey the time evolution:
\[
c\left( q(t) \right) = c\left( q(0) \right) + t \cdot \dot c\left( q(0), \dot q(0) \right) \tag{3.181}
\]
such that c(q(t)) = 0 does not necessarily hold. Fortunately, (3.181) delivers the conditions that are required in order for the trajectories of the transformed model (3.180) to be trajectories of the original model (3.179). Indeed, (3.181) readily tells us that c(q) = 0 holds at all times if:
\[
C\left( q(0), \dot q(0) \right) := \begin{bmatrix} c\left( q(0) \right) \\ \dot c\left( q(0), \dot q(0) \right) \end{bmatrix} = 0 \tag{3.182a}
\]
Let us build some intuition behind these observations using the bowl example above. The consistency conditions for the bowl read as:
\[
C\left( q(0), \dot q(0) \right) := \begin{bmatrix} p_3 - \frac{1}{4} p_1^2 - \frac{1}{4} p_2^2 \\ \dot p_3 - \frac{1}{2} \dot p_1 p_1 - \frac{1}{2} \dot p_2 p_2 \end{bmatrix} \tag{3.183}
\]
While these expressions do not carry much intuition, they have a very strong geometric interpretation. The first condition in C simply states that the initial condition q(0) must satisfy the constraint c(q) = 0, which is a quite natural requirement. The second condition states that:
\[
\dot c\left( q(0), \dot q(0) \right) = \frac{\partial c}{\partial q} \dot q = \nabla_q c^\top \dot q = 0 \tag{3.184}
\]
i.e. it requires that the scalar product between the gradient ∇qc and the velocities q̇ is zero. In other words, it requires the velocities q̇ to be orthogonal to the gradient ∇qc. This requirement has a very intuitive interpretation in the bowl example, see Fig. 3.12. Indeed,
Figure 3.12: Consistency conditions for the bowl example. The condition ∇q c ⊤ q̇ = 0 simply
requires that q̇(0) is tangent to the bowl.
the gradient ∇qc describes a normal to the surface described by the equation c(q) = 0. In our bowl example, this surface is the bowl itself (more complex systems may not deliver such a simple picture), and the condition ∇qc(q(0))⊤q̇(0) = 0 essentially requires that the initial velocities q̇(0) are tangent to the bowl surface. This requirement is physically needed in order for the ball to slide on the surface of the bowl.
on the trajectories q(t) computed from the transformed model. However, this time evolution holds if and only if the condition c̈ = 0 is imposed. While this condition is imposed by the transformed model (3.166) (and all the variants we have looked at), when simulating the system the condition is only imperfectly imposed, for numerical reasons. As a result, the time evolution (3.185) is "noisy", and tends to accumulate the numerical errors over time. This effect is illustrated in Fig. 3.13, which is the outcome of simulating the transformed model (3.172) with consistent initial conditions, but a low integration accuracy. As a result, the condition c̈ = 0 is not enforced very accurately, and a drift in the constraints appears.
This issue can be treated via Baumgarte stabilization, whereby the transformed model
does not impose c̈ = 0, but rather dynamics on c that stabilizes them to zero. This can be
e.g. done by imposing:
c̈ + 2αċ + α2 c = 0 (3.186)
[Figure: two panels showing c(q) (top) and ċ(q) (bottom) as functions of time t ∈ [0, 10], both on a 10⁻⁴ scale.]
Figure 3.13: Illustration of the constraints drift for the bowl example.
for some α > 0 (such that the dynamics (3.186) are stable, with two real poles at −α). The
transformed model then reads as:
\[
\frac{\mathrm{d}}{\mathrm{d}t} \frac{\partial L}{\partial \dot q} - \frac{\partial L}{\partial q} = Q^\top \tag{3.187a}
\]
\[
\ddot c + 2\alpha \dot c + \alpha^2 c = 0 \tag{3.187b}
\]
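A sketch of the effect of (3.186) on the bowl model, assuming numpy, explicit Euler with a deliberately coarse step, and made-up values for m, g and α: the last row of the linear system (3.173) is modified so that it imposes c̈ = −2αċ − α²c instead of c̈ = 0, and the worst constraint violation over the simulation is recorded with and without the stabilization.

```python
import numpy as np

# Sketch: Baumgarte stabilization (3.186) on the bowl model.
m, g, alpha = 1.0, 9.81, 5.0

def step(p, pdot, dt, baumgarte):
    c = p[2] - 0.25 * (p[0]**2 + p[1]**2)
    cdot = pdot[2] - 0.5 * (p[0]*pdot[0] + p[1]*pdot[1])
    M = np.array([
        [m, 0.0, 0.0, -0.5*p[0]],
        [0.0, m, 0.0, -0.5*p[1]],
        [0.0, 0.0, m, 1.0],
        [-0.5*p[0], -0.5*p[1], 1.0, 0.0],
    ])
    rhs4 = 0.5 * (pdot[0]**2 + pdot[1]**2)
    if baumgarte:
        rhs4 += -2*alpha*cdot - alpha**2 * c  # stabilized constraint dynamics
    rhs = np.array([0.0, 0.0, -m*g, rhs4])
    pddot = np.linalg.solve(M, rhs)[:3]
    return p + dt*pdot, pdot + dt*pddot

def worst_drift(baumgarte, steps=2000, dt=5e-3):
    """Largest |c(q)| seen along a coarse Euler simulation."""
    p, pdot = np.array([1.0, 0.0, 0.25]), np.array([0.0, 1.0, 0.0])
    worst = 0.0
    for _ in range(steps):
        p, pdot = step(p, pdot, dt, baumgarte)
        worst = max(worst, abs(p[2] - 0.25*(p[0]**2 + p[1]**2)))
    return worst

drift_plain = worst_drift(baumgarte=False)
drift_stabilized = worst_drift(baumgarte=True)
```

With the coarse step, the unstabilized simulation lets c drift away, while the stabilized one keeps the violation bounded near zero, which is precisely the role of (3.186).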
4 NEWTON METHOD
The Newton method is a central tool for simulation, optimization, and estimation. It is therefore no surprise that we encounter it here. Generally speaking, the Newton method aims at solving a set of equations that we can e.g. write as:
\[
\varphi(x, y) = 0 \tag{4.1}
\]
in x ∈ ℝ^{n_x} for a given y ∈ ℝ^{n_y}. Clearly, in order for the problem of finding x to be well-posed, (4.1) needs to hold "as many equations as unknown variables", i.e. function φ must map into ℝ^{n_x}. We will see later that this condition is not sufficient. When the system of equations described by (4.1) is nonlinear, finding a solution x can typically not be done explicitly (i.e. we typically cannot provide a set of symbolic expressions that describe x as a function of y). This does not mean that x is not perfectly well described as a function of y by (4.1). Formally, one says that x is an implicit function of y defined by (4.1).
The Newton method then allows one to compute x as a function of y numerically. In this
chapter we will introduce the basics of the Newton method.
as a (full) Newton step that corrects the guess x to the update x₊. See Fig. 4.1 for an illustration. Using this construction, the Newton method is then iterative, in the sense that (4.5) is repeated until the solution of φ(x, y) = 0 is reached (or approached closely enough). More specifically, the Newton method uses the algorithm:

Input: variable y, initial guess x, and tolerance tol
while ‖φ(x, y)‖∞ ≥ tol do
    Compute
    \[
    \varphi(x, y) \quad \text{and} \quad \frac{\partial \varphi(x, y)}{\partial x} \tag{4.7}
    \]
    Compute the Newton step Δx from
    \[
    \frac{\partial \varphi(x, y)}{\partial x} \Delta x + \varphi(x, y) = 0 \tag{4.8}
    \]
    Take the Newton step
    \[
    x \leftarrow x + \Delta x \tag{4.9}
    \]
return x
[Figure: a scalar function φ(x, y), its linearization at the candidate x, the update x₊ and the solution x⋆.]
Figure 4.1: Illustration of the Newton principle on a scalar equation (i.e. n_x = 1 and y is not used here). The update x₊ is obtained by finding the root of the linear approximation (grey line) of φ(x, y) built at a candidate point x. The update x₊ is closer to the solution x⋆ of φ(x, y) = 0 than the candidate x.
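The algorithm above can be sketched in a few lines (not from the notes; numpy and a made-up 2 × 2 test system are assumed):

```python
import numpy as np

# Sketch of the full-step Newton iteration on a small system φ(x, y) = 0.
def newton(phi, jac, x, y, tol=1e-10, max_iter=50):
    """Full-step Newton: solve the linearized system (4.8) and update x."""
    for _ in range(max_iter):
        r = phi(x, y)
        if np.linalg.norm(r, np.inf) < tol:
            break
        dx = np.linalg.solve(jac(x, y), -r)   # ∂φ/∂x Δx = -φ
        x = x + dx
    return x

# Made-up example: x1² + x2² = y and x1 = x2 (solution on the diagonal)
phi = lambda x, y: np.array([x[0]**2 + x[1]**2 - y, x[0] - x[1]])
jac = lambda x, y: np.array([[2*x[0], 2*x[1]], [1.0, -1.0]])
x_star = newton(phi, jac, x=np.array([1.0, 0.5]), y=2.0)
```

Note that each iteration solves the linear system (4.8) rather than forming the inverse Jacobian explicitly, which is both cheaper and numerically safer.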
A few crucial remarks ought to be made here:
• If it converges, the Newton method converges to a solution x⋆ of φ(x, y) = 0.³
• Each step of the Newton method requires evaluating the function φ(x, y) and its Jacobian ∂φ(x, y)/∂x, and solving the linear system (4.8).
• The Newton method requires that the square Jacobian matrix ∂φ(x, y)/∂x is full rank (i.e. invertible), in order for the linear system (4.8) to be well-posed.
• If the function φ(x, y) is linear in x (and well posed), then the Newton method finds the solution x⋆ in one step. It is then fully equivalent to solving the linear system.
Let us first assume that the Newton iteration presented above converges, and let us investigate its convergence rate, i.e. how quickly ‖x − x⋆‖ decreases. This question is addressed as follows:

Theorem 4. If the full-step Newton iteration converges, then it converges at a quadratic rate, i.e.
\[
\left\| x_+ - x^\star \right\| \le C \cdot \left\| x - x^\star \right\|^2 \tag{4.10}
\]
Before delivering a formal proof of the quadratic convergence rate, one ought to understand that this result is fairly intuitive. Indeed, one can observe that at every iteration, the error that the Newton step makes is of order 2. Hence, every iteration "removes" the first-order inaccuracy between the solution candidate and the solution to φ(x, y) = 0. It follows that at every step, the Newton method retains an error of order ‖x − x⋆‖². This observation alone gives an intuition of why (4.10) ought to be true. We provide next a proof of Theorem 4. This proof is not part of the examination though.

³ Note that there may be more than one solution! Hence the Newton method could converge to different points, depending on where it is started.
Proof. For the sake of simplicity, let us dismiss the fixed argument y, and let us write the Newton step as
\[
\Delta x = -M^{-1} \varphi(x), \qquad M = \frac{\partial \varphi(x)}{\partial x} \tag{4.11}
\]
We first observe that since φ(x⋆) = 0, the equality
\[
x_+ - x^\star = x - x^\star - M^{-1} \left( \varphi(x) - \varphi(x^\star) \right) \tag{4.12}
\]
holds. We then use classical results from analysis to observe that:
\[
\varphi(x) - \varphi(x^\star) = \int_0^1 \frac{\partial \varphi\left( x + \tau\left( x^\star - x \right) \right)}{\partial x} \left( x - x^\star \right) \mathrm{d}\tau \tag{4.13}
\]
Using (4.12) and (4.13), we can write:
\[
x_+ - x^\star = x - x^\star - M^{-1} \left( \int_0^1 \frac{\partial \varphi\left( x + \tau\left( x^\star - x \right) \right)}{\partial x} \mathrm{d}\tau \right) \left( x - x^\star \right) \tag{4.14}
\]
or equivalently
\[
x_+ - x^\star = M^{-1} \left( M - \int_0^1 \frac{\partial \varphi\left( x + \tau\left( x^\star - x \right) \right)}{\partial x} \mathrm{d}\tau \right) \left( x - x^\star \right) \tag{4.15}
\]
We can modify (4.15) as:
\[
x_+ - x^\star = M^{-1} \left( M - \frac{\partial \varphi(x)}{\partial x} - \int_0^1 \left[ \frac{\partial \varphi\left( x + \tau\left( x^\star - x \right) \right)}{\partial x} - \frac{\partial \varphi(x)}{\partial x} \right] \mathrm{d}\tau \right) \left( x - x^\star \right) \tag{4.16}
\]
We can then bound (using triangular inequalities):
\[
\left\| x_+ - x^\star \right\| \le \left\| M^{-1} \left( M - \frac{\partial \varphi(x)}{\partial x} \right) \right\| \left\| x - x^\star \right\| + \left\| M^{-1} \int_0^1 \left[ \frac{\partial \varphi\left( x + \tau\left( x^\star - x \right) \right)}{\partial x} - \frac{\partial \varphi(x)}{\partial x} \right] \mathrm{d}\tau \right\| \left\| x - x^\star \right\| \tag{4.17}
\]
Since M = ∂φ(x)/∂x, the first term is zero. Moreover, the term under the integral sign in the second term can be bounded (on any closed set around x⋆) by
\[
\left\| M^{-1} \left( \frac{\partial \varphi(x)}{\partial x} - \frac{\partial \varphi(x^\star)}{\partial x} \right) \right\| \le c \left\| x - x^\star \right\| \tag{4.18}
\]
Let us illustrate this convergence rate for the example provided in Fig. 4.3. A few remarks can be useful here:
• A quadratic convergence rate is a very strong form of convergence. It means that at every iteration the number of accurate digits is doubled, i.e. over the iterations, the error ‖x − x⋆‖ follows a decay of e.g. the form
\[
\left\| x - x^\star \right\| = 10^{-1},\; 10^{-2},\; 10^{-4},\; 10^{-8},\; 10^{-16}
\]
Hence a few iterations are typically sufficient to reach machine precision (ε = 10⁻¹⁶).
• While (4.18) appears very technical, it can be understood as a "measure" of how nonlinear the function φ(x) is. Indeed, if φ(x) is linear in x, i.e.
\[
\varphi(x) = A x + b \tag{4.21}
\]
(we dismiss the argument y here) for a matrix A and a vector b, then
\[
\frac{\partial \varphi(x)}{\partial x} = A, \quad \forall\, x \tag{4.22}
\]
and the left-hand side of (4.18) is zero, such that c = 0. Conversely, if φ(x) is strongly nonlinear, then its Jacobian ∂φ(x)/∂x varies a lot and the difference
\[
\frac{\partial \varphi\left( x + \tau\left( x^\star - x \right) \right)}{\partial x} - \frac{\partial \varphi(x)}{\partial x} \tag{4.23}
\]
can be very large, yielding a large constant c.
• The proof above also informs us on the condition required for the Newton iteration to converge. Indeed, in order for the bound (4.10) to guarantee the decay of ‖x − x⋆‖, one ought to require that the initial guess x provided to the algorithm is such that:
\[
\left\| x - x^\star \right\| \le (2c)^{-1} \tag{4.24}
\]
[Figure: two panels showing five Newton steps on a scalar function φ(x, y), starting from a guess x; the left iteration converges, the right one diverges.]
Figure 4.2: Newton iteration on a nonlinear, scalar function φ(x, y) (five steps are displayed here). On the left graph the iteration converges to the solution x⋆ of φ(x⋆, y) = 0, while on the right graph the iteration diverges. Convergence is obtained when the initial guess provided to the full-step Newton iteration is close enough to a solution x⋆, while an initial guess further away (possibly) results in an unstable iteration.
[Figure 4.3: the error ‖x − x⋆‖ versus the iteration count (1 to 5), on a logarithmic scale, illustrating the quadratic convergence rate.]
4.2.2 REDUCED NEWTON STEPS
An important consequence of this result is that while the Newton method presented above may diverge, a careful selection of reduced Newton steps, i.e. modifications of x in the direction of the Newton step Δx but possibly scaled down, must necessarily converge, as long as the Newton steps Δx exist. This motivates the modification of the full-step Newton algorithm into:
Input: variable y, initial guess x, and tolerance tol
while ‖φ(x, y)‖∞ ≥ tol do
    Compute
    \[
    \varphi(x, y) \quad \text{and} \quad \frac{\partial \varphi(x, y)}{\partial x} \tag{4.33}
    \]
    Compute the Newton step Δx from
    \[
    \frac{\partial \varphi(x, y)}{\partial x} \Delta x + \varphi(x, y) = 0 \tag{4.34}
    \]
    Select a step size t ∈ ]0, 1]
    Take the Newton step
    \[
    x \leftarrow x + t \Delta x \tag{4.35}
    \]
return x ≈ x⋆
Finding an adequate step size t ∈ ]0, 1] is typically performed via a line-search strategy, based on first testing a full step (t = 1), and then reducing it if the condition (4.26) is not met. Clever methods exist to decide whether a step size t is "good enough", or too short or too long. Other methods also exist to adjust the Newton step Δx and guarantee convergence (see e.g. trust-region techniques), but they are more difficult to describe and implement.
A Newton iteration equipped with the possibility of taking reduced steps is guaranteed to converge to a solution x⋆ of φ(x, y) = 0 as long as the Newton steps Δx exist throughout the iterations. It shall be underlined here that when reduced steps (i.e. using t < 1) are necessary, the quadratic convergence rate detailed in Theorem 4 is lost. The Newton iteration with reduced steps has the following behavior:
• when x is "far" from a solution x⋆, then reduced steps (t < 1) are necessary, and the algorithm converges "slowly". The resulting convergence rate can be very poor, even though it is often close to linear in practice
• after a certain number of iterations, x becomes close enough to x⋆ and full steps (t = 1) become acceptable. The convergence then becomes quadratic. Once x is close to x⋆, full steps can be taken all the way to x⋆ and the convergence is very fast.

[Figure: five reduced Newton steps on a scalar function φ(x), ending at a point where the iteration fails.]
Figure 4.4: Newton iteration with reduced steps on a nonlinear, scalar function φ(x) (five steps are displayed here). Here the iteration does not diverge, but it fails at a point where ∂φ(x, y)/∂x = 0. At this point, the linear system (4.8) does not have a well-defined solution and the Newton step Δx ceases to exist.
The Newton algorithm with reduced steps converges globally to a solution x⋆ unless the linear system (4.34) becomes ill-posed. In order to develop some intuition on this potential (and frequent) issue, let us consider the example illustrated in Fig. 4.4. A simple inspection reveals that the Newton iteration may converge to a point at which ∂φ(x, y)/∂x = 0, where the Newton step ceases to exist. In such a case, Theorem 5 ceases to apply, and the Newton iteration fails. One can also readily see that if the Newton iteration were started closer to the solution x⋆ (black dot in the graph), then it would converge to x⋆.
in terms of x. In many cases, the equations depend on some other argument, labelled y here, which in practice can gather e.g. parameters or data appearing in the equations. If one changes y, it is natural to expect that the solution x to φ(x, y) = 0 is affected. Formally, we ought to think of x as being implicitly a function of y, i.e. we should consider x as a function x(y) that we usually cannot write explicitly, but which is implicitly defined by
\[
\varphi\left( x(y),\, y \right) = 0 \tag{4.37}
\]
Some questions naturally arise then, such as e.g.: is x(y) well-defined, is it differentiable with respect to y, and what is its derivative? These questions are the object of the Implicit Function Theorem (IFT), which is one of the cornerstones of the field of numerical analysis.

Theorem 6. (IFT, simplified version): let function φ(x, y) be smooth, and consider a point (x̄, ȳ) such that φ(x̄, ȳ) = 0. Suppose that the Jacobian
\[
\left. \frac{\partial \varphi(x, y)}{\partial x} \right|_{x = \bar x,\, y = \bar y} \tag{4.38}
\]
The proof of the IFT is fairly involved and we will not do it here. However, equality (4.40) is trivial to prove.
\[
\frac{\mathrm{d}}{\mathrm{d}y} \varphi\left( x(y),\, y \right) = 0 \quad \text{holds} \quad \forall\, y \in \mathbb{Y} \tag{4.41}
\]
The implicit function theorem is essential in numerical analysis; let us briefly review what it provides us:
• even though the function x(y) cannot be written explicitly and can only be evaluated numerically (using a Newton iteration), its derivative is trivial to compute, using (4.40) at the point x found by the Newton iteration for a given point y.
• the IFT guarantees that if φ(x, y) = 0 is well posed (i.e. the Jacobian (4.38) is full rank) at a point y, then it has a neighborhood where it is also well posed, i.e. we can "move" y (locally), and x "moves" accordingly, and also locally.
• One ought to observe that the assumption of the Jacobian ∂φ(x̄, ȳ)/∂x being full rank is also required for the Newton iteration to be well-behaved (i.e. for the linear system (4.34) to be well-posed). In other words, if the Newton iteration is well-behaved, then the IFT holds. Conversely, the assumption of the IFT must hold in order for a system of equations φ(x, y) = 0 to be solvable for x.
\[
M \approx \frac{\partial \varphi(x)}{\partial x} \tag{4.43}
\]
The resulting Newton-type step reads as:
\[
\Delta x = -M^{-1} \varphi(x) \tag{4.44}
\]
The use of an approximate Jacobian has an impact on the convergence of the Newton method. Let us briefly review how it is changed. In order to do that, we will revisit Theorems 4 and 5 with the Jacobian approximation.
Theorem 7. The convergence of the full-step Newton method with an approximate Jacobian follows:
\[
\left\| x_+ - x^\star \right\| \le \left( \kappa + c \left\| x - x^\star \right\| \right) \left\| x - x^\star \right\| \tag{4.45}
\]
for some constants c, κ > 0.
and we observe that if M ≠ ∂φ(x)/∂x, the first term in the right-hand side of the inequality does not disappear. We can then bound (on any closed set around x⋆):
\[
\left\| M^{-1} \left( M - \frac{\partial \varphi(x)}{\partial x} \right) \right\| = \left\| I - M^{-1} \frac{\partial \varphi(x)}{\partial x} \right\| \le \kappa, \quad \forall\, x \tag{4.47}
\]
while the second term in the right-hand side of (4.46) can be bounded as in Theorem 4. Bound (4.45) follows.
It is useful here to make some remarks concerning Theorem 7.
• The bound (4.45) predicts a linear convergence rate, due to the term κ‖x − x⋆‖ in (4.45). A linear convergence rate is significantly slower than a quadratic convergence rate, as it does not go through a "collapse" to very small values. See Fig. 4.5 for an illustration.
[Figure 4.5: the error ‖x − x⋆‖ versus the iteration count (1 to 5), on a logarithmic scale, illustrating a linear convergence rate.]
• Bound (4.47) informs us on the impact of the error between M and the true Jacobian ∂φ(x)/∂x. Indeed, for M = ∂φ(x)/∂x, κ = 0 can be selected, and one recovers a quadratic convergence rate (see Th. 4). If M and ∂φ(x)/∂x are "very different", then κ has to be large. One can observe from (4.45) that κ < 1 must hold in order for the convergence of the full-step quasi-Newton iteration to be guaranteed (even for x arbitrarily close to x⋆).
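The contrast between Theorem 4 and Theorem 7 can be observed in a tiny experiment: an exact-Jacobian iteration versus one where M is frozen at the initial guess (a so-called chord iteration, one common Jacobian approximation). The scalar test equation x³ = 2 is made up:

```python
# Sketch: exact Newton vs a frozen-Jacobian (chord) iteration on x³ = 2.
phi = lambda x: x**3 - 2.0
dphi = lambda x: 3 * x**2
x_star = 2.0 ** (1.0 / 3.0)

def run(x, M=None, iters=8):
    """Return the error |x - x*| after each step; M=None uses the exact Jacobian."""
    errors = []
    for _ in range(iters):
        J = dphi(x) if M is None else M   # exact vs approximate Jacobian
        x = x - phi(x) / J
        errors.append(abs(x - x_star))
    return errors

err_exact = run(x=1.5)                  # quadratic convergence (Theorem 4)
err_frozen = run(x=1.5, M=dphi(1.5))    # linear convergence (Theorem 7)
```

Printing the two error sequences shows the "collapse" of the exact iteration against the steady geometric decay of the frozen-Jacobian one.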
for x. The Newton methods detailed above then apply directly to solving (4.49). More specifically, a Newton iteration for solving (4.49) reads as:

Input: variable y, initial guess x, and tolerance tol
while ‖∇ₓΦ(x, y)‖∞ ≥ tol do
    Compute
    \[
    \nabla_x \Phi(x, y) \quad \text{and} \quad \nabla_x^2 \Phi(x, y) \tag{4.50}
    \]
    Compute the Newton step Δx from (4.51), select a step size t ∈ ]0, 1], and take the step
    \[
    x \leftarrow x + t \Delta x \tag{4.52}
    \]
return x ≈ x⋆
Let us review the results presented in the previous sections in the specific context of optimization.
• The convergence of Newton for optimization is quadratic in a neighborhood of the solution x⋆ and "slow" otherwise. If the Hessian ∇ₓ²Φ(x, y) is approximated by M, then the method has a linear convergence rate (i.e. Theorem 7 applies).
• The iteration can fail at a point where (4.51) is ill-defined, i.e. at a point where the Hessian of Φ is rank-deficient.
In the context of optimization problems, an additional remark can be made concerning the
choice of Hessian approximation, summarized hereafter.
if M > 0 and for t > 0 sufficiently small.
\[
\min_x \; \frac{1}{2} \left\| \phi(x) - y \right\|^2 \tag{4.57}
\]
for some function φ : ℝ^{n_x} → ℝ^{n_φ}.
Optimization problems of the form (4.57) are very common in system identification, see Chapter 5. The gradient and Hessian of the cost function Φ(x, y) = ½‖φ(x) − y‖² then read as:
\[
\nabla_x \Phi(x, y) = \nabla_x \phi(x) \left( \phi(x) - y \right) \tag{4.58}
\]
\[
\nabla_x^2 \Phi(x, y) = \nabla_x \phi(x)\, \nabla_x \phi(x)^\top + \left[ \nabla_{x_i, x_j} \phi(x) \left( \phi(x) - y \right) \right]_{i,j} \tag{4.59}
\]
where [·]_{i,j} denotes a matrix whose elements i, j are detailed between the brackets. For many fitting problems, the evaluation of the second term in the Hessian ∇ₓ²Φ(x, y) is expensive, such that the Gauss-Newton Hessian approximation:
\[
\nabla_x^2 \Phi(x, y) \approx M_{\mathrm{GN}} = \nabla_x \phi(x)\, \nabla_x \phi(x)^\top \tag{4.60}
\]
is often used instead. This approximation is adequate when either:
• the function φ(x) is not very nonlinear, such that its second-order derivatives ∇_{x_i, x_j}φ(x) are small, or
• the optimal solution to the fitting problem (4.57) yields φ(x⋆) ≈ y, such that φ(x) − y is small.
Both cases justify the dismissal of the second term [∇_{x_i, x_j}φ(x)(φ(x) − y)]_{i,j} in ∇ₓ²Φ(x, y).
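A sketch of a Gauss-Newton iteration for a fitting problem of the form (4.57), using the Hessian approximation (4.60); the exponential model and the noise-free data are made up for illustration:

```python
import numpy as np

# Sketch: Gauss-Newton for min_x 0.5‖φ(x) - y‖², fitting y ≈ a·exp(b·t).
# The parameters are x = (a, b); data are generated noise-free here.
t = np.linspace(0.0, 1.0, 20)
a_true, b_true = 2.0, -1.3
y = a_true * np.exp(b_true * t)

def residual(x):
    return x[0] * np.exp(x[1] * t) - y      # φ(x) - y

def jacobian(x):
    # Columns: ∂/∂a and ∂/∂b of the model output
    return np.column_stack([np.exp(x[1] * t), x[0] * t * np.exp(x[1] * t)])

x = np.array([1.0, 0.0])                    # initial guess
for _ in range(20):
    J, r = jacobian(x), residual(x)
    # Gauss-Newton step: (JᵀJ) Δx = -Jᵀ r, i.e. M_GN replaces the Hessian
    x = x + np.linalg.solve(J.T @ J, -J.T @ r)
```

Because the data are noise-free, the optimal residual is zero and the Gauss-Newton iteration converges essentially as fast as an exact Newton iteration here, in line with the second bullet above.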
It is useful to observe that the Gauss-Newton Hessian approximation is by construction positive semi-definite, as M_GN ≥ 0. In order to ensure its strict positive definiteness, it is common to add some regularization, i.e. to use a Hessian approximation of the form:
4.5.2 CONVEX OPTIMIZATION
However, a point satisfying condition (4.62) is not necessarily the solution to the underlying optimization problem (4.48). This problem is illustrated in Fig. 4.6.
[Figure: a nonconvex function Φ(x) with its global minimum (black dot), a local minimum (circle) and a local maximum (square).]
Figure 4.6: Illustration of the difficulty of finding the minimum of a function Φ(x) via solving (4.62). Several points satisfy (4.62) here. The black dot is the true minimum, but (4.62) also yields a so-called local minimum (circle), which is lower than its neighbours but not lower than the black dot. Condition (4.62) also returns (possibly local) maxima (square).
Mathematically speaking, we would say that condition (4.62) is necessary but not sufficient to deliver the solution to (4.48). An exception to this lack of equivalence between condition (4.62) and solving (4.48) is when the cost function Φ(x, y) is convex in x, i.e. if
\[
\nabla_x^2 \Phi(x, y) > 0, \quad \forall\, x \tag{4.63}
\]
In this case, the solution to (4.62) is unique and delivers the solution to (4.48). Moreover, the Newton iteration with reduced steps is guaranteed to converge to the solution of (4.48).
4.6 SUMMARY
The results concerning the Newton method are a bit convoluted. We provide hereafter a brief summary condensing the main points:
• Exact reduced Newton steps Δx improve ‖φ‖ for sufficiently small step sizes t ∈ ]0, 1]
• Inexact reduced Newton steps Δx improve ‖φ‖ for a sufficiently small step size t ∈ ]0, 1] if M is sufficiently close to ∂φ/∂x. In the context of optimization, M > 0 and sufficiently small steps t ∈ ]0, 1] reduce the cost function Φ.
• Exact full (t = 1) Newton steps converge quadratically if close enough to the solution
• Inexact full (t = 1) Newton steps converge linearly if close enough to the solution and if the Jacobian approximation is "sufficiently good"
• The Newton iteration fails if ∂φ/∂x becomes singular
• Newton methods with reduced steps converge in two phases: a damped (slow) phase when reduced steps (t < 1) are needed, and a quadratic/linear phase when full steps are possible.
5 SYSTEM IDENTIFICATION (SYSID)
So far, we have primarily explored one main approach to create models for dynamical systems, namely physics-based modelling, where physical knowledge is encoded in mathematical relations describing the system. In some applications, however, it may be difficult
to determine and quantify the underlying physical mechanisms in a system. In such cases,
there is an alternative approach that is based on collecting data from experiments with
the system, and, based on analysis of the data, creating a mathematical model. Such experimental, or data-driven, modelling is often referred to as system identification (or SysId for short).⁵
System identification may be an attractive alternative to physical modelling, e.g. when the
system is too complex to analyze in terms of physical mechanisms, or when the limited requirements on model fidelity do not motivate a possibly time-consuming physical modelling effort. It should be noted that in practice, a combination of physical modelling and
system identification is often used: physical modelling may give important hints about de-
pendencies and qualitative relationships, and system identification may then be used to
quantitatively determine model parameters.
The system identification problem (in the parametric case) basically amounts to adjusting
a model (with adjustable parameters) to data. The principle may be depicted as in the figure
below. An input sequence u is applied to the real system, generating an output sequence
y . The model depends on a set of parameters θ, and generates an output sequence ŷ (u, θ)
from the input sequence u. The idea is then to adjust the parameters θ such that the model
output ŷ matches in some sense the output y from the real system.
[Diagram: the input sequence u is applied both to the real system, producing the output y, and to the model, producing ŷ = ŷ(u, θ). SysId (parametric): adjust θ such that ŷ matches y.]
Note that we will work in discrete time when applying system identification. Indeed, since
the measurements are always collected in a discrete fashion from the real system, it makes
sense to build a theory where the model output is also discrete-time.
⁵ This chapter is much influenced by the treatment in the book [4]. See also the course homepage.
For the system identification problem thus described, we can already at this stage observe
that there are a few key issues that have to be considered when we want to apply it for
system modelling:
• Selection of model structure: the model ŷ (u, θ) can take various forms, allowing e.g.
both linear and nonlinear dynamics, different parametrizations etc.
• Experiment design: this involves e.g. selection of inputs and outputs to be used and
construction of the input sequence u to be applied to the system.
• Algorithm design: we need to define what is meant by a good fit of the model to data,
and how to find the best model parameter vector θ.
• Model validation: how can we assess the resulting model and whether it fulfils its purpose?
We will discuss each of these in the sequel, but in order to introduce some of the basic
concepts and ideas, we will start by investigating a much simpler problem, namely to fit a
function to experimental data.
DATA. Our investigation relies on experimental data. Hence, we assume a series of mea-
surements of the quantities x and y is available. The N data-points are labelled x(1), . . ., x(N )
and y(1), . . . , y(N ), respectively, and they are depicted in Fig. 5.1 (left) for a particular data
set with N = 100. The precise conditions for the data collection may vary, but the basic as-
sumption is that for the N known values of x (determined by the experimenter or in some
other way), the corresponding values of y are measured. Usually, these measurements are
corrupted by disturbances—this can be suspected already from the scattered data-points
in Fig. 5.1 .
OVERFIT. In our search for a relation between x and y, a very simplistic (and a bit naive)
approach would be to rely completely on all measurements and assume they are reflecting
the “true” relation. This would mean that our model is obtained by simply connecting all
the data-points, as depicted in Fig. 5.1 (right). Note that we have here introduced the no-
tation ŷ = ŷ(x) to denote the predicted model output, corresponding to any (limited to the
interval [0, 1]) value of the input variable x. If we want to avoid the sharp “corners” of the
function ŷ(x), we could optionally use smooth curves instead of straight lines when con-
necting the data-points.
Figure 5.1: Illustration of the curve fitting example: experimental data (left) and a naive,
overfitted model (right).
There are at least two ways to criticise what we have just done. The first objection is that
the procedure is likely to give very different results, depending on the particular realization
of the disturbances affecting the measurements. Thus, we have been tempted to rely too
much on the individual data-points, resulting in an over-fitted model. A natural remedy for
this would be to collect several output measurements for each of a set of pre-selected val-
ues of x and to take the average as the corresponding model output. However, this would
require that we have full control over the choice of x values and also raises additional ques-
tions, such as how to select these values from a continuous range, and how to interpolate
the model output in-between the set of pre-selected x values.
More importantly, we would still be left with the second objection: we have not at all taken
into account a fundamental assumption (a prejudice, if you want), namely that we almost
always postulate that the sought-for relation is a smooth one. Hence, we do not expect an
erratic behaviour of the function such as the one shown in Fig. 5.1 (right). Another way to ex-
press the assumption on smoothness is that data-points contain information not only for
specific values of x, but also for nearby values. This fact can be exploited in order to come
up with numerous superior methods for curve fitting, but we will limit our discussion to
parametric methods.
ŷ = f (x, θ), (5.1)

where f is a predefined, known function of the “input” variable x and the adjustable parameter vector θ. The idea with this parametric model, formulated already in the beginning of this chapter, is to adjust θ in such a way as to make the model output ŷ match the measured output as closely as possible.
MODEL STRUCTURE. Having come this far, the natural question to ask is how to choose
the function f (x, θ), also referred to as the model structure. There are basically two ways
to go. The first alternative is to use some physical insight, suggesting possible types of
relations between x and y. For example, if we want to experimentally determine the rela-
tion between the pressure drop over and the flow through a restriction in a pipe (such as
a valve), then basic physical laws tell us that a quadratic dependence would be expected.
Then a good first attempt would be the model structure
ŷ = θx 2 , (5.2)
containing only one parameter to adjust for the best fit to data.
The other alternative is to accept that there is no a priori information about promising
model structures. We can then resort to standard choices, such as polynomials, and hope-
fully be guided by inspection of experimental data. In our example, the data-points de-
picted in Fig 5.1 suggest that there may be a simple linear relationship between x and y:
ŷ = a + bx = θ⊤ ϕ, (5.3)
where we have defined the parameter vector θ and the regression vector ϕ (holding the regressors 1 and x) as

θ = [a b]⊤ , ϕ = [1 x]⊤ . (5.4)
The model structure defined by (5.3) is probably known to you as a “straight-line approx-
imation”. It is an example of a linear regression model, since the parameter vector to be
adjusted or estimated enters linearly. This property of the model will turn out to be impor-
tant.
PARAMETER ESTIMATION. Once the model structure has been defined—at least as a first attempt—we would like to test how well the model works by determining the values of a and b that give as good a fit as possible between the assumed model and the data. Let us formalize the posed problem a bit:
• The model (5.3) can be used to “guess” or predict values of y, given values of x. Using ϕ(i ) = [1 x(i )]⊤ , the predictions corresponding to the N data-points will be denoted

ŷ(i |θ) = θ⊤ ϕ(i ), i = 1, . . . , N , (5.5)

where the notation emphasizes the fact that the predictions depend on the parameter vector θ.
• In practice, there will always be a discrepancy between what the model predicts, i.e. ŷ(i |θ), and the measured data point y(i ). Accordingly, we define the residual ε as

ε(i , θ) = y(i ) − ŷ(i |θ). (5.6)
• Our desire is now to find the model parameters, which make the residuals as small
as possible in some sense. To this end, it is common to define a scalar loss function
or criterion VN (θ) to be minimized with respect to θ. The celebrated least-squares
method, going all the way back to Gauss, provides one solution to this problem and
will be described next, and we will then return to the example.
LEAST-SQUARES. The least-squares (LS) method for a linear regression model is based on
forming a measure of the model fit by summing the squares of the residuals, i.e. the least-
squares criterion is defined as
VN (θ) = (1/N) ∑_{i=1}^{N} ε²(i , θ) = (1/N) ∑_{i=1}^{N} ( y(i ) − θ⊤ ϕ(i ) )² . (5.7)
The least-squares estimate (LSE) θ̂N is then obtained by minimizing this criterion, i.e.

θ̂N = arg min_θ VN (θ),

where the notation stresses the fact that the resulting estimate depends on N data points.
VN (θ) = (1/N) ∑_{i=1}^{N} ε²(i , θ) = (1/N) ∑_{i=1}^{N} ( y(i ) − θ⊤ ϕ(i ) )²

= (1/N) ∑_{i=1}^{N} y²(i ) − 2θ⊤ f N + θ⊤ R N θ,

where f N = (1/N) ∑_{i=1}^{N} ϕ(i )y(i ) and R N = (1/N) ∑_{i=1}^{N} ϕ(i )ϕ⊤(i ). Completing the square then gives

VN (θ) = (1/N) ∑_{i=1}^{N} y²(i ) − f N⊤ R N^{-1} f N + (θ − R N^{-1} f N )⊤ R N (θ − R N^{-1} f N ),
assuming the matrix inverse exists (equivalently, R N is positive definite). Since the last term
is the only term that depends on θ and since it is a nonnegative quadratic form, the least-
squares estimate θ̂N is obtained by simply putting it to zero, i.e.
θ̂N = R N^{-1} f N = ( (1/N) ∑_{i=1}^{N} ϕ(i )ϕ⊤(i ) )^{-1} (1/N) ∑_{i=1}^{N} ϕ(i )y(i ) (5.8)
REMARK. An alternative derivation goes as follows. Define the vector y and the matrix Φ as

y = [ y(1) · · · y(N ) ]⊤ , Φ = [ ϕ(1) · · · ϕ(N ) ]⊤ , (5.9)
thus holding all the data, the measured outputs and all regression vectors. Then the LS
criterion can be written as (modulo a factor 2/N )
VN (θ) = (1/2) ‖y − Φθ‖² = (1/2) (y − Φθ)⊤ (y − Φθ) (5.10)
The LS solution is found by differentiating w.r.t. θ:
dVN (θ)/dθ = θ⊤ Φ⊤ Φ − y⊤ Φ = 0, (5.11)
giving
θ̂N = (Φ⊤ Φ)−1 Φ⊤ y , (5.12)
which is of course identical to (5.8) (can you verify this?). Note that the solution can be
interpreted as an approximate solution of the overdetermined system of linear equations
y = Φθ.
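As an aside, the estimate (5.12) is straightforward to compute numerically. The Python sketch below fits the straight-line model to synthetic data; the "true" parameter values and the noise level are illustrative choices, not taken from the data set in Fig. 5.1.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
x = rng.uniform(0.0, 1.0, N)
y = 0.2 + 1.0 * x + 0.05 * rng.standard_normal(N)   # "true" a = 0.2, b = 1.0, plus noise

Phi = np.column_stack([np.ones(N), x])              # rows are phi(i)^T = [1, x(i)]
theta_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ y) # (Phi^T Phi)^{-1} Phi^T y, eq. (5.12)
a_hat, b_hat = theta_hat
```

In practice one would let a library solve the overdetermined system directly, e.g. `np.linalg.lstsq(Phi, y, rcond=None)`, which avoids explicitly forming Φ⊤Φ.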
We leave it as an exercise to show that a weighted 2-norm replacing (5.10), i.e.

VN (θ) = (1/2) ‖y − Φθ‖²_W = (1/2) (y − Φθ)⊤ W (y − Φθ), (5.13)

for any symmetric, positive definite matrix W , gives the weighted least squares solution

θ̂N = (Φ⊤ W Φ)^{-1} Φ⊤ W y . (5.14)
Let us now return to the curve fitting example. Since the model (5.3) is a linear regression,
the LS estimate is obtained by applying (5.8) for the data set at hand:
θ̂N = [ âN b̂N ]⊤ = ( (1/N) ∑_{i=1}^{N} ϕ(i )ϕ⊤(i ) )^{-1} (1/N) ∑_{i=1}^{N} ϕ(i )y(i ) (5.15)

where

(1/N) ∑_{i=1}^{N} ϕ(i )ϕ⊤(i ) = (1/N) [ N , ∑_{i=1}^{N} x(i ) ; ∑_{i=1}^{N} x(i ) , ∑_{i=1}^{N} x²(i ) ] (5.16)

(1/N) ∑_{i=1}^{N} ϕ(i )y(i ) = (1/N) [ ∑_{i=1}^{N} y(i ) ; ∑_{i=1}^{N} x(i )y(i ) ] (5.17)
The linear model predictions are shown in Fig. 5.2 (left) along with the data points. At first
sight, it seems as if the simple, linear model serves its purpose quite well.
Figure 5.2: Curve fitting of a model using a linear function (left) and a quadratic function
(right).
MODEL ORDER. When taking a closer look at the model fit to data in Fig. 5.2 (left), it can
be noticed that the data-points tend to lie above the fitted straight line for small and large
values of x, and below for intermediate values. This observation suggests that perhaps a
quadratic function would better describe the data. We can easily test this hypothesis by
redefining the parameter vector and the regression vector as
θ = [a b c]⊤ , ϕ = [1 x x²]⊤ . (5.18)
Notice that the model is still a linear regression, although one of the regressors is nonlinear
in x! As seen in Fig. 5.2 (right), the quadratic model clearly gives an improved fit to data;
the loss function is approximately halved compared to the straight line fit.
In fact, we could extend the model in this way to any polynomial in x or, for that matter,
with any regressor being a nonlinear function of x—the only thing that will change is the
definition of the regression vector ϕ. What is important for the derivation of the LS esti-
mate, however, is that the model ŷ = θ⊤ ϕ is linear in the parameters θ.
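To illustrate, the sketch below (synthetic data, illustrative coefficients) fits a quadratic model simply by adding the regressor x² as an extra column of the regression matrix; the estimator itself is unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 200)
y = 0.5 - 1.0 * x + 2.0 * x**2 + 0.02 * rng.standard_normal(200)

# regressors 1, x, x^2: nonlinear in x, but the model stays linear in theta
Phi = np.column_stack([np.ones_like(x), x, x**2])
theta_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
```

Any other nonlinear functions of x (e.g. sin(x), exp(x)) could be used as columns in the same way.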
Let us return to our curve fitting example and for a moment pretend that the data is actually
generated by the “true system”
y(i ) = θ0⊤ ϕ(i ) + e(i ), (5.19)
where the sequence {e(i )} consists of independent, identically distributed (i.i.d.) random
variables with variance σ2 (see Section 1.10 for a discussion of i.i.d. random variables). We
have used the word “pretend” to stress the fact that it is highly unlikely that the real data
obeys (5.19). However, the idealized assumption makes it possible to perform some analy-
sis in order to better understand how the least-squares method behaves.
A consequence of the assumption (5.19) is that the estimate θ̂N is itself a random variable,
taking a numerical value that depends on the particular realization of the noise sequence.
When we repeat the experiment, the noise realization will be different and, hence, the es-
timate as well. Considering the fact that θ̂N is a random variable, we can ask ourselves
how this random variable can be characterized. There are a few classical and important
concepts at hand, which we will now discuss:
• Bias. The concept of bias is used in order to describe what happens “on average”, if
we repeat the experiment many times. If there is a systematic error when estimating a
“true” parameter θ0 , there is a bias. Conversely, we say that the estimate is unbiased⁶ if
E[θ̂N ] = θ0 (5.20)
• Consistency. Bias and variance describe properties of the estimate for finite data
records. In addition, a natural question is how the estimate θ̂N behaves when the
number of data N tends to infinity. The estimator is called consistent if

θ̂N → θ0 as N → ∞.

It should be remarked here that the limit of a random variable can be defined in sev-
eral ways. We will use the concept of almost sure convergence or convergence with
probability 1 (w.p. 1). This means, loosely speaking, that the estimate converges in
the usual sense for almost all realizations (or, equivalently, realizations for which this
does not hold have probability 0). In this case, it is common to refer to the property
as strong consistency.
Returning to our linear regression example, we can now state the following properties of
the LS estimate θ̂N :
⁶ Swedish: medelvärdesriktig
1. The estimate is unbiased, which is easily proved:
E[θ̂N ] = E[ ( (1/N) ∑_{i=1}^{N} ϕ(i )ϕ⊤(i ) )^{-1} (1/N) ∑_{i=1}^{N} ϕ(i )y(i ) ]

= θ0 + ( (1/N) ∑_{i=1}^{N} ϕ(i )ϕ⊤(i ) )^{-1} E[ (1/N) ∑_{i=1}^{N} ϕ(i )e(i ) ] = θ0 (5.23)
2. The covariance (see Section 1.10) of the parameter estimate is, using (5.23):
E[(θ̂N − θ0 )(θ̂N − θ0 )⊤ ] = R N^{-1} E[ ( (1/N) ∑ ϕe ) ( (1/N) ∑ ϕe )⊤ ] R N^{-1}

= (1/N²) R N^{-1} ∑_{i,j=1}^{N} ϕ(i )ϕ( j )⊤ E[e(i )e( j )] R N^{-1} = (σ²/N) R N^{-1} . (5.24)

In the last step, the fact that the regressors are deterministic (or, alternatively, independent of the noise) has been used, together with the i.i.d. property of the noise sequence (implying that E[e(i )e( j )] = 0, i ≠ j ).
This result seems quite natural: the spread increases with noise variance and de-
creases with increasing number of data-points. In addition, the spread depends on
the matrix R N , which can be interpreted as a measure of the information available in
the regressors about the model parameters.
3. The estimate is strongly consistent, provided that

(1/N) ∑_{i=1}^{N} ϕ(i )ϕ⊤(i ) → R∞ , w.p. 1 as N → ∞ (5.25)

(1/N) ∑_{i=1}^{N} ϕ(i )e(i ) → 0, w.p. 1 as N → ∞ (5.26)

4. The noise variance σ² can be estimated from the minimal value of the criterion,

σ̂²N = VN (θ̂N ) = (1/N) ∑_{i=1}^{N} ε²(i , θ̂N ) (5.28)
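The covariance formula (5.24) can be checked empirically by Monte Carlo simulation. In the Python sketch below, the regressors are kept fixed while the noise is re-drawn many times; all numerical values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
N, sigma = 50, 0.1
x = np.linspace(0.0, 1.0, N)
Phi = np.column_stack([np.ones(N), x])              # fixed (deterministic) regressors
R_N = Phi.T @ Phi / N
theta0 = np.array([0.2, 1.0])                       # "true" parameters

estimates = []
for _ in range(5000):                               # repeat the experiment many times
    y = Phi @ theta0 + sigma * rng.standard_normal(N)
    estimates.append(np.linalg.solve(Phi.T @ Phi, Phi.T @ y))

cov_mc = np.cov(np.array(estimates).T)              # sample covariance of theta-hat
cov_theory = sigma**2 / N * np.linalg.inv(R_N)      # formula (5.24)
```

The two matrices agree to within the Monte Carlo error, which shrinks as the number of noise realizations grows.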
Let us make some brief comments on these results. First, it should again be stressed that
the properties of the LS estimate have been derived under ideal conditions, the most im-
portant one being that the true system belongs to the model set, i.e. the collection of models obtained by varying the parameter vector θ. The “pay-back” we get in return for this strong assumption is the insight provided by these results, which we can hope hold at least approximately for more realistic scenarios.
Another relevant remark is that the characterization of the least-squares method is in terms
of the parameter estimate θ̂N , which is, strictly speaking, not our main concern. What we
are really interested in is to fit a function to data, and the parameter vector θ is just a vehicle
for doing this. However, there is a strong coupling, of course. To illustrate this, assume that
the data is described by, instead of (5.19), the following equation:

y(i ) = f (x(i )) + e(i ), (5.29)

where f (·) is the unknown function to model, and {e(i )} is an i.i.d. noise sequence with
variance σ2 . Let us now calculate the expected value of the LS criterion:
E[VN (θ)] = E[ (1/N) ∑_{i=1}^{N} ( y(i ) − θ⊤ ϕ(i ) )² ] = E[ (1/N) ∑_{i=1}^{N} ( f (x(i )) + e(i ) − θ⊤ ϕ(i ) )² ]

= (1/N) ∑_{i=1}^{N} ( f (x(i )) − θ⊤ ϕ(i ) )² + σ² (5.30)
This result shows that there are two contributions to the (expected) LS loss function. The
first term constitutes the systematic error (bias), and the second term is a contribution from
the noise. The bias is determined by how well we can approximate the true function f by
a linear regression (linear or quadratic function of x in our example). Notice that the value
of θ that minimizes the bias depends on the set of data-points {x(i )}, since more emphasis
will be put on intervals with more data-points than intervals with scarce data. For example,
the data depicted in Fig. 5.1 indicates that less emphasis will be put on the interval [0.6, 0.8] when fitting a model, since this interval contains few data-points. Even if the argument is
based on the expected value of the LS loss function, it gives an idea of how the parameter
estimation can be affected by the experimental conditions. We will return to this later when
discussing identification of dynamic models.
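The decomposition (5.30) can also be verified numerically. In the sketch below, a straight line is fitted to data generated by a nonlinear function f; the choice f(x) = sin(2x) and all other numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
N, sigma = 200, 0.1
x = np.linspace(0.0, 1.0, N)
f = np.sin(2.0 * x)                                  # "true" nonlinear function
Phi = np.column_stack([np.ones(N), x])               # straight-line regressors

# best straight-line approximation of f on these x-points (minimizes the bias term)
theta_star, *_ = np.linalg.lstsq(Phi, f, rcond=None)
bias_term = np.mean((f - Phi @ theta_star) ** 2)     # first term in (5.30)

# average the realized loss V_N(theta_star) over many noise realizations
losses = [np.mean((f + sigma * rng.standard_normal(N) - Phi @ theta_star) ** 2)
          for _ in range(2000)]
expected_loss = np.mean(losses)                      # should approach bias_term + sigma^2
```

The averaged loss matches the sum of the systematic (bias) term and the noise variance σ², as predicted by (5.30).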
This concludes the discussion on the introductory curve fitting example, based on apply-
ing the least-squares method to a linear regression model. Although the problem of fitting
a model of a static function to data is different from fitting a dynamic model to data—for
example, it does not involve time per se—many of the concepts introduced and discussed
will turn out to be useful also in the dynamic case.
A final remark is that the problem we have discussed can be formulated already at the out-
set in a probabilistic context, i.e. characterizing the task in terms of stochastic variables. In
doing so, the maximum-likelihood (ML) method offers a general and powerful technique
to formulate and solve many parameter estimation problems. It turns out that the least-
squares method, although formulated here without any stochastic framework, is in fact a
special case of the ML method. A brief account of the ML technique is provided for the
interested reader in Section 5.4.
5.2 PARAMETER ESTIMATION FOR LINEAR DYNAMIC SYSTEMS
We will now turn to the task of identifying dynamic models from experimental data. We recall the basic formulation of the (parametric) system identification problem, as formulated in the beginning of this chapter. The basic idea is to choose a model, which de-
pends on a number of parameters, collected in the parameter vector θ, and then to adjust
the parameters so that the model behaviour is close to the real system’s behaviour, as rep-
resented by data recorded from experiments. In this way, you can view the parameters as
“tuning knobs” of the model, and the tuning should be done to mimic the real system as
closely as possible.
A natural question to ask at this point is what the model, with its parameter vector θ, often referred to as a model structure, should look like. Clearly, there are many choices, and any
type of qualitative a priori knowledge of the system may guide the decision. One useful dis-
tinction of model structures is between tailor-made and general-purpose models (a similar
distinction was discussed for the curve fitting example).
Tailor-made models are typically the result of some type of physical modelling effort. The
modelling may for example lead to a state-space model with a specific structure, but with
a number of parameters that are unknown. Such a white-box model could look like:
ẋ(t ) = f (x(t ), u(t ), θ), y(t ) = h(x(t ), θ), θ = [ θ1 · · · θd ]⊤ (5.31)
In this case, we typically trust that the model (5.31), including the functions f , h, describe
the system well, but lacking the precise values of the parameters θ, we want to determine
or estimate these from data.
Example 5.1 (DC motor). Let us recall the state space model previously derived for a DC
motor:
ẋ = [ −R/L , −K e /L ; K m /J , −b/J ] x + [ 1/L ; 0 ] u

y = [ 0 1 ] x
where the state vector holds current i and angular speed ω, i.e. x = (i , ω), the input u is the
voltage applied, and the output y is the angular speed.
If the electrical time constant is neglected, the transfer function from voltage to angular
speed is given by
Ω(s)/U (s) = K /(1 + sτ), K = K m /(K m K e + bR), τ = R J /(K m K e + bR)
While there are five parameters in the physical model (R, L, J , K e , K m ), the transfer function
can be characterized by only two parameters. The implication is that not all parameters are
identifiable, i.e. can be determined from experimental data describing the input-output re-
lations of the system. On the other hand, it is enough to find appropriate values for K and
τ in order to capture the dynamics of the DC motor. The fact that not all physical parame-
ters can be determined from input-output data is not uncommon for models derived from
physical principles.
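The identifiability issue can be made concrete with a few lines of Python. Assuming no friction (b = 0) for simplicity, two different physical parameter sets give exactly the same (K, τ); all numerical values below are made up.

```python
def K_tau(R, J, Ke, Km, b=0.0):
    """Gain K and time constant tau of the DC motor transfer function."""
    den = Km * Ke + b * R
    return Km / den, R * J / den

# original parameters vs. a scaled set (J -> c*J, Km -> c*Km), with b = 0
c = 3.0
K1, tau1 = K_tau(R=1.0, J=0.01, Ke=0.05, Km=0.05)
K2, tau2 = K_tau(R=1.0, J=c * 0.01, Ke=0.05, Km=c * 0.05)
```

Both parameter sets produce identical input-output behaviour, so no experiment on (u, y) can distinguish them: only K and τ are identifiable.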
Motivated by the example above, it seems natural to ask whether we could instead focus
on models that only try to capture the input-output behaviour of the system. Such models
need not be based on physical relationships, as long as they could describe the dynamics,
and they could therefore be termed general-purpose models.
General-purpose models are also termed black-box models, suggesting that the internal workings of the model are not of interest, only the external, input-output behaviour. Black-box models may be linear or non-linear, and formulated in continuous or discrete time.
The following example shows a simple and well-known model.
Example 5.2 (Step-response analysis). Already in the basic control course a simple system
identification tool was employed, namely a step response test. In the simplest case, we ex-
ploit the fact that a first order system, given by the transfer function
G(s) = K /(1 + sT ),
has a step response shown in the figure below.
[Figure: step response of the first-order system, reaching 63% of the final value K at time t = T .]
Now, if the input sequence u is chosen as a step function and applied to a real system,
whose dynamics can be well approximated by a first order system, then the model param-
eters θ = (K , T ) can at least approximately be found from a plot of the step response—the
model parameters are adjusted to fit the data, i.e. the recorded step response. The proce-
dure can thus be described as a simple example of system identification.
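The procedure can be sketched in a few lines of Python: simulate a noise-free first-order step response, then read off K from the final value and T from the 63% crossing. All numbers are illustrative.

```python
import numpy as np

K_true, T_true, dt = 2.0, 0.5, 0.001
t = np.arange(0.0, 5.0, dt)
y = K_true * (1.0 - np.exp(-t / T_true))            # noise-free step response

K_hat = y[-1]                                       # final value approximates K
T_hat = t[np.argmax(y >= 0.63 * K_hat)]             # first time y reaches 63% of K
```

With measurement noise, one would average over several step tests or fit the whole curve instead of reading off two points.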
In this course we will, however, focus our attention on linear, discrete-time models of the
form
y(t ) = G(q, θ)u(t ) + w(t ) = G(q, θ)u(t ) + H(q, θ)e(t ), (5.32)
where
G(q, θ) = B (q, θ)/F (q, θ) = ( b1 q^{-1} + . . . + b_{nb} q^{-nb} ) / ( 1 + f1 q^{-1} + . . . + f_{nf} q^{-nf} ) (5.33a)

H (q, θ) = C (q, θ)/D(q, θ) = ( 1 + c1 q^{-1} + . . . + c_{nc} q^{-nc} ) / ( 1 + d1 q^{-1} + . . . + d_{nd} q^{-nd} ) (5.33b)
• Recall that the operator q is the time-domain counterpart of z, i.e. qu(t ) = u(t + 1) and q^{-1}u(t ) = u(t − 1). See Section 1.9.
• The first term of the model (5.32) means that the u − y dependence is modelled by a
general, linear time-invariant transfer function including a time delay.
We will investigate the problem to estimate parameters of linear black-box models of the
type (5.32) in some detail. However, before treating the general case, we will introduce
some of the basic ideas in the context of the simpler linear regression case, where some of
the results obtained for the curve fitting example in Section 5.1 will turn out to be useful.
Recall that much of the discussion on curve fitting in Section 5.1 was centered around models providing predictions of the form (5.5), repeated here for convenience:

ŷ(i |θ) = θ⊤ ϕ(i ).
We also noted in passing that there is a great deal of flexibility in using different regressors in
ϕ. In fact, this observation opens up for applying the techniques also to dynamic systems,
and a couple of simple examples will illustrate this.
Example 5.3 (Finite impulse response model). Consider the Finite Impulse Response (FIR)
(why is it called this?) model with input u and output y, given by
y(t ) = ∑_{i=1}^{nb} b_i u(t − i ) + e(t ) = θ⊤ ϕ(t ) + e(t ), (5.35)
where the vector θ holds the parameters b 1 , . . . , b nb , and the vector ϕ holds delayed values
of u. This is clearly a linear regression model, although the regressors are now lagged (de-
layed) input signals to a dynamic model, which is also the reason why we have switched
notation from i to t to stress that time is now the independent variable. In the same way as
before, the model gives rise to the prediction

ŷ(t |t − 1, θ) = θ⊤ ϕ(t ), (5.36)

where the notation now gives emphasis to the fact that the prediction is made one step into
the future, i.e. the model (which depends on θ) is used to predict the output at time t , given
information available at time t − 1 (note that there is no way to predict future values of
e(·), since it is an i.i.d. sequence). The conclusion is that the least-squares method can be
applied to estimate the parameters of the FIR model, and the least-squares estimate (LSE)
is given by (5.8), again repeated here for easy reference:
θ̂N = R N^{-1} f N = ( (1/N) ∑_{t=1}^{N} ϕ(t )ϕ⊤(t ) )^{-1} (1/N) ∑_{t=1}^{N} ϕ(t )y(t ) (5.37)
Moreover, taking a close look at the analysis of the LSE performed in Section 5.1, we can
also conclude that the properties of the LSE still hold under the assumption that the noise
is uncorrelated with the input.
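A minimal numerical version of this example, with illustrative coefficients and a white-noise input, could look as follows:

```python
import numpy as np

rng = np.random.default_rng(4)
b_true = np.array([0.5, 0.3, -0.2])                 # "true" FIR coefficients, n_b = 3
N = 500
u = rng.standard_normal(N)                          # white-noise input (informative)
e = 0.01 * rng.standard_normal(N)

nb = len(b_true)
# phi(t) = [u(t-1), ..., u(t-nb)]^T, stacked as rows for t = nb, ..., N-1
Phi = np.column_stack([u[nb - i: N - i] for i in range(1, nb + 1)])
y = Phi @ b_true + e[nb:]                           # simulated output of the FIR system
b_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)     # the LSE (5.37)
```

With a white-noise input the regression matrix is well-conditioned and the estimate is accurate; as the next example shows, other input choices can be far less informative.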
We discussed briefly in Section 5.1 that the choice of experimental data, i.e. selection of
points {x(i )}, has an impact on the resulting curve fitting. A similar observation can be
made in the dynamic case, as illustrated in the next example.
Example 5.4 (Estimation of an FIR model with different inputs). Let us illustrate next esti-
mation of an FIR model with n b = 20, assuming data is generated by a real system with the
same structure and noise added. Figure 5.3 shows the results for different inputs (left-hand
graphs), yielding different output of the real system and model (centred graphs), and set
of parameters θ̂ (right-hand side graphs). One can observe that the choice of input has a
strong impact on the quality of the parameter estimate θ̂. Indeed, a poor parameter esti-
mation can be obtained, even though the model output fits the system output fairly well.
In such a case, one would suspect that some of the conditions under which we analyzed the properties of the LSE are not fulfilled (which one?).
Figure 5.3: Illustration of the least-squares approach applied to an FIR model. The input
sequences u are displayed on the left-hand side graphs. The black signals repre-
sent the outputs y (centred graphs) and parameters (right-hand side graphs) of
the real system, while the grey signals represent the outputs ŷ (centred graphs)
and estimated parameters (right-hand side graphs) of the model.
Example 5.5 (Auto-regressive model). The FIR model describes a transfer function from u
to y, having zeros only (but no poles). We can add poles, however, by extending the FIR
model with auto-regressive terms,
y(t ) = ∑_{i=1}^{nb} b_i u(t − i ) − ∑_{i=1}^{na} a_i y(t − i ) + e(t ) = θ⊤ ϕ(t ) + e(t ), (5.38)
and this way get an Auto-Regressive model with eXogenous input (ARX) model. The vector θ
now contains both a- and b-parameters, and ϕ contains delayed values of both input and
output. The prediction is still given by (5.36), and is based on previous inputs and outputs.
Again, we can conclude that the least-squares method can be applied to estimate the pa-
rameters, and the LSE is given by (5.8) with the new definitions of the vectors involved.
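A first-order ARX example in Python, with illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(5)
a1, b1 = -0.7, 0.5                                  # "true" parameters (stable pole)
N = 1000
u = rng.standard_normal(N)
e = 0.02 * rng.standard_normal(N)

# simulate y(t) = -a1*y(t-1) + b1*u(t-1) + e(t)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a1 * y[t - 1] + b1 * u[t - 1] + e[t]

# phi(t) = [-y(t-1), u(t-1)]^T, so that y(t) = theta^T phi(t) + e(t)
Phi = np.column_stack([-y[:-1], u[:-1]])
theta_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ y[1:])
```

Note that the regression vector now contains a lagged output; since e(t) is i.i.d. and hence independent of y(t − 1), the LSE still behaves well.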
Motivated by the FIR and ARX examples, we would now like to extend the techniques to the
general linear, black-box model (5.32), repeated here for convenience:

y(t ) = G(q, θ)u(t ) + H (q, θ)e(t ), (5.39)

where

G(q, θ) = B (q, θ)/F (q, θ) = ( b1 q^{-1} + . . . + b_{nb} q^{-nb} ) / ( 1 + f1 q^{-1} + . . . + f_{nf} q^{-nf} ) (5.40a)

H (q, θ) = C (q, θ)/D(q, θ) = ( 1 + c1 q^{-1} + . . . + c_{nc} q^{-nc} ) / ( 1 + d1 q^{-1} + . . . + d_{nd} q^{-nd} ) (5.40b)
In block diagram form, the model (5.39) would look as depicted below:

[Block diagram: the input u is filtered by B (q)/F (q), the noise e by C (q)/D(q), and the two contributions are summed to form the output y.]
• Postulating the model above means we are interested in modelling both input-output
behaviour by the transfer function G, and the disturbance character by the noise
transfer function H (unless H = 1).
If we write the model transfer functions in terms of their impulse response coefficients, we
have
G(q, θ) = ∑_{k=1}^{∞} g (k, θ) q^{-k} , H (q, θ) = 1 + ∑_{k=1}^{∞} h(k, θ) q^{-k} (5.41)
Now, using this representation for H in (5.39), we see that the disturbance term w(t ) is a
weighted sum of previous noise terms e(t − k). With the assumption that {e(·)} are i.i.d.
random variables, it is clear that the term e(t ) in w(t ) is completely unpredictable. On the
other hand, all previous noise terms are present in old outputs, so in principle it should be
possible to extract these from measurements of the output (along with the inputs). Indeed,
this program can be carried out mathematically by re-writing the output y(t ) as

y(t ) = H^{-1}(q)G(q)u(t ) + ( 1 − H^{-1}(q) ) y(t ) + e(t ), (5.42)

where we have omitted the argument θ for simplicity. As argued above, the last term in
this expression cannot be predicted. The remaining two terms consist of filtered input and
output signals up till time t − 1 (why?). Hence, these two terms will form the best mean-square prediction of the output:

ŷ(t |t − 1, θ) = H^{-1}(q)G(q)u(t ) + ( 1 − H^{-1}(q) ) y(t ), (5.43)

leading to the optimal prediction error ε(t , θ) = e(t ) (for θ known). By using the definitions (5.40), the prediction can be expressed in the model polynomials as

ŷ(t |t − 1, θ) = ( D(q)B (q)/(C (q)F (q)) ) u(t ) + ( 1 − D(q)/C (q) ) y(t ). (5.44)
In order to make it a bit more concrete, let us take a look at some special cases.
Example 5.6 (Linear regression model structures). We have already encountered the FIR
and the ARX cases, which can be obtained by the choices F (q) = C (q) = D(q) = 1 (FIR) and C (q) = 1, F (q) = D(q) = A(q) (ARX), respectively:

FIR: y(t ) = B (q, θ)u(t ) + e(t ), ARX: y(t ) = ( B (q, θ)/A(q, θ) ) u(t ) + ( 1/A(q, θ) ) e(t ). (5.45)
These two model structures are depicted in block diagram form below (FIR left, ARX right),
where it can be clearly seen that in the ARX model structure, the polynomial A(q) is shared
between the input-output and noise-output transfer functions.
[Block diagrams of the FIR (left) and ARX (right) model structures.]
The predictors are obtained as special cases of (5.44):

FIR: ŷ(t |t − 1, θ) = B (q, θ)u(t ), ARX: ŷ(t |t − 1, θ) = B (q, θ)u(t ) + ( 1 − A(q, θ) ) y(t ). (5.46)
Recall that we have in earlier examples observed that these expressions can both be written
as linear regressions ŷ(t |t − 1, θ) = θ⊤ ϕ(t ), which in turn gave us the opportunity to derive
explicit solutions for the least-squares estimates.
There are two more special cases that we would like to mention, cases that will turn out to be different from the ones discussed so far.
Example 5.7 (Non-linear regression model structures). The ARMAX (Auto-Regressive, Mov-
ing Average with eXogenous input) and OE (Output Error) model structures are obtained by
choosing F (q) = D(q) = A(q) and C (q) = D(q) = 1, respectively. We thus get the models and
predictors as follows:
ARMAX:
y(t ) = ( B (q, θ)/A(q, θ) ) u(t ) + ( C (q, θ)/A(q, θ) ) e(t ) ⇔ A(q, θ)y(t ) = B (q, θ)u(t ) + C (q, θ)e(t )
ŷ(t |t − 1, θ) = ( B (q, θ)/C (q, θ) ) u(t ) + ( (C (q, θ) − A(q, θ))/C (q, θ) ) y(t ) (5.47a)

OE:
y(t ) = ( B (q, θ)/F (q, θ) ) u(t ) + e(t )
ŷ(t |t − 1, θ) = ( B (q, θ)/F (q, θ) ) u(t ) (5.47b)
[Block diagrams of the ARMAX (left) and OE (right) model structures.]
Let us reflect a bit on the results in the last example and make a couple of observations:
• We can notice that the predictions for these cases are not simply a linear combina-
tion of a few delayed, measured quantities, as in the FIR and ARX cases. Rather, the
predictions are given as the outputs of filters with measured u and y as inputs.
• In order for these predictor filters to produce meaningful results, we have to ensure
that C (q) and F (q), respectively, are stable polynomials.
• In the OE case, the prediction is generated by filtering u only, i.e. by simply simu-
lating the model with input u; the measured outputs are not used to compute the
prediction.
For the OE case, the predictor (5.47b) can equivalently be written as

F(q,θ) ŷ(t|t−1, θ) = B(q,θ) u(t), i.e. ŷ(t|t−1, θ) = (1 − F(q,θ)) ŷ(t|t−1, θ) + B(q,θ) u(t)   (5.49)

The first term in this expression contains delayed values of the prediction itself, rather than
measured quantities. We can still gather delayed signals, u and ŷ, in a vector ϕ to get an
expression for the prediction like

ŷ(t|t−1, θ) = θ⊤ϕ(t, θ),   (5.50)

but the important distinction from earlier is that ϕ now depends on θ via ŷ. The conclusion
is that the prediction is nonlinear in θ. The same conclusion holds for the ARMAX case,
where the predictor can be re-written as

ŷ(t|t−1, θ) = B(q,θ)u(t) + (1 − A(q,θ))y(t) + (C(q,θ) − 1)ε(t, θ),

where the last expression contains previous values of the prediction error ε(t, θ).
Once we have determined a model structure, or rather the predictor associated with the
model structure, we can devise a parameter estimation scheme that is a fairly straightfor-
ward generalization of the ones encountered so far. The following “recipe” gives rise to an
entire class of Prediction Error Methods (PEM):
Given a model structure with associated predictor ŷ(t|t−1, θ), compute the prediction errors
ε(t, θ) = y(t) − ŷ(t|t−1, θ) from the data, and estimate the parameters by minimizing a
criterion of the form

VN(θ) = (1/N) ∑_{t=1}^N l(t, θ, ε(t, θ)),

i.e. θ̂N = arg min_θ VN(θ).
The following remarks should be made:
• The PEM recipe is quite general and can be applied to both white-box and black-box
models.
• The choice of l can be made in various ways. The very common least-squares crite-
rion corresponds to the choice l (t , θ, ε(t , θ)) = ε2 (t , θ).
REMARK: The maximum-likelihood method, described in Section 5.4, corresponds to
choosing l as the so-called negative log-likelihood function.
Let’s assume we want to apply the PEM recipe in order to fit a model to data using a least-
squares criterion, i.e. we would like to minimize the cost
VN(θ) = (1/N) ∑_{t=1}^N ε²(t, θ) = (1/N) ∑_{t=1}^N (y(t) − ŷ(t|t−1, θ))²   (5.53)
Based on the discussion we had concerning the property of the predictor, we should dis-
tinguish between the two following situations:
LLS: For the linear regression case, including the ARX and FIR models, the predictor ŷ(t |t −
1, θ) = θ⊤ ϕ(t ) is linear in θ. The problem is then a Linear Least-Squares (LLS) prob-
lem, with the property that the estimate—the minimizer of VN (θ)—can be given ex-
plicitly or be computed as the solution of a system of linear equations. To refresh this,
we note that the cost VN(θ) can be differentiated and set to zero to find the solution:

∇θ VN(θ) = (2/N) ∑_{t=1}^N ∇θ ε(t, θ) ε(t, θ) = −(2/N) ∑_{t=1}^N ∇θ ŷ(t|t−1, θ) (y(t) − ŷ(t|t−1, θ)) = 0,   (5.54)

which, after inserting ŷ(t|t−1, θ) = θ⊤ϕ(t), gives the system of linear equations

[(1/N) ∑_{t=1}^N ϕ(t)ϕ⊤(t)] · θ = (1/N) ∑_{t=1}^N ϕ(t)y(t)   (5.55)
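To make (5.55) concrete, here is a small numerical sketch (illustrative second-order ARX data with made-up coefficients; numpy only) forming the regressors and solving the normal equations:

```python
import numpy as np

rng = np.random.default_rng(6)
N = 1000
u = rng.standard_normal(N)
y = np.zeros(N)
# Made-up ARX system: y(t) = 1.5 y(t-1) - 0.7 y(t-2) + u(t-1) + e(t)
for t in range(2, N):
    y[t] = 1.5 * y[t-1] - 0.7 * y[t-2] + u[t-1] + 0.05 * rng.standard_normal()

# Regressors phi(t) = [y(t-1), y(t-2), u(t-1)], stacked for all t
Phi = np.column_stack([y[1:-1], y[:-2], u[1:-1]])
Y = y[2:]

# Normal equations (5.55): [1/N sum phi phi^T] theta = 1/N sum phi y
R = Phi.T @ Phi / len(Y)
f = Phi.T @ Y / len(Y)
theta = np.linalg.solve(R, f)
print(theta)   # close to [1.5, -0.7, 1.0]
```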
NLS: For the other model structures, including ARMAX and OE, the predictor is nonlinear
in θ. The problem is then a Nonlinear Least Squares (NLS) problem, for which the
minimizer of VN (θ) has to be found by an iterative search. Indeed, we have already
come across a general tool for this, namely the Newton method. The method would
be applied to the same necessary condition for a minimum, i.e.
∇θ VN(θ) = −(2/N) ∑_{t=1}^N ∇θ ŷ(t|t−1, θ) (y(t) − ŷ(t|t−1, θ)) = 0,   (5.56)
the difference being that ŷ(t|t−1, θ) = θ⊤ϕ(t, θ) is no longer linear in θ.
Observe that, when applying an iterative method to find the minimum of the crite-
rion, both predictions and the gradient of the predictions (see below) are obtained
by filtering operations, and since the filters depend on θ, these signals need to be
re-evaluated at each iteration.
REMARK. In the NLS case, e.g. for the ARMAX and OE models, we need an expression for
the gradient of the prediction, ∇θ ŷ(t|t−1, θ), in order to apply the Newton method. We will
illustrate for the OE case how this can be worked out, and we start by showing in detail how
to go from (5.49) to (5.50):

ŷ(t|t−1, θ) = (1 − F(q,θ)) ŷ(t|t−1, θ) + B(q,θ) u(t) = θ⊤ϕ(t, θ),

where

θ⊤ = [f₁ ⋯ f_{n_f}  b₁ ⋯ b_{n_b}]   (5.58a)
ϕ⊤(t, θ) = [−ŷ(t−1) ⋯ −ŷ(t−n_f)  u(t−1) ⋯ u(t−n_b)]   (5.58b)

This implies that we can calculate the derivatives of the prediction with respect to the pa-
rameters (using here total derivatives, observing that previous values of the predictions de-
pend on the parameters):

(d/db_i) ŷ(t|t−1, θ) = (1 − F(q,θ)) (d/db_i) ŷ(t|t−1, θ) + u(t−i)   (5.59a)
(d/df_i) ŷ(t|t−1, θ) = (1 − F(q,θ)) (d/df_i) ŷ(t|t−1, θ) − ŷ(t−i|t−i−1, θ)   (5.59b)

Solving these recursions compactly gives

∇θ ŷ(t|t−1, θ) = (1/F(q,θ)) ϕ(t, θ)   (5.60)
Hence, the gradient is a filtered version of the regressor vector. The derivation for the AR-
MAX case is analogous.
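As an illustration of how (5.59)–(5.60) are used in practice, the following sketch (not from the notes; the system, noise level and damping factor are made up) fits a first-order OE model with a damped Gauss-Newton iteration, computing both predictions and gradients by the filtering recursions:

```python
import numpy as np

def oe_predict_and_grad(theta, u):
    """First-order OE predictor yhat(t) = -f1*yhat(t-1) + b1*u(t-1),
    with the gradient obtained from the recursions (5.59)."""
    f1, b1 = theta
    N = len(u)
    yhat = np.zeros(N)
    dy_df1 = np.zeros(N)   # d yhat / d f1, recursion (5.59b)
    dy_db1 = np.zeros(N)   # d yhat / d b1, recursion (5.59a)
    for t in range(1, N):
        yhat[t] = -f1 * yhat[t-1] + b1 * u[t-1]
        dy_df1[t] = -f1 * dy_df1[t-1] - yhat[t-1]
        dy_db1[t] = -f1 * dy_db1[t-1] + u[t-1]
    return yhat, np.column_stack([dy_df1, dy_db1])

rng = np.random.default_rng(0)
N = 500
u = rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):                 # "true" system: f1 = 0.7, b1 = 2.0
    y[t] = -0.7 * y[t-1] + 2.0 * u[t-1]
y = y + 0.1 * rng.standard_normal(N)  # output-error noise

theta = np.array([0.0, 1.0])          # initial guess [f1, b1]
for _ in range(50):                   # damped Gauss-Newton iterations
    yhat, J = oe_predict_and_grad(theta, u)
    eps = y - yhat
    theta = theta + 0.5 * np.linalg.solve(J.T @ J, J.T @ eps)
    theta[0] = np.clip(theta[0], -0.95, 0.95)   # keep the predictor filter stable
print(theta)                          # converges towards [0.7, 2.0]
```

Note that, exactly as remarked above, both `yhat` and the gradient columns are re-filtered at every iteration since the filters depend on θ.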
Properties of the PEM estimate θ̂N , such as bias and variance, can be established similarly
to least-squares applied to the non-dynamic case. We briefly state a few results below.
BIAS. In real applications, the system to be modelled is typically more complex than what
the model can capture. The implication is that, even with very large data sets, there will be
a discrepancy between the model and the real system; this is what we have earlier termed
systematic error or bias. One way to interpret the bias is to view the parameter estimation as
an attempt to find an approximation of the real system, an approximation that is in some
sense the best one can find within the given model structure. The “systematic” approxi-
mation error—the bias—will, however, depend on how the experiments were conducted,
since e.g. different types of inputs will highlight different features of the system.
It is difficult to characterize the bias in general terms, since it depends on e.g. properties
of the real system and the model structure, as well as properties of the input. However, by
assuming that the number of data points N tends to infinity, a fundamental property that
can be used to investigate the bias in special cases can be derived. Let us first note that,
under quite general conditions, VN(θ) converges to E ε²(t, θ) as N → ∞. This in turn can be
used to prove the following:

θ̂N → {θ | E ε²(t, θ) attains its minimum}   (5.61)
In essence, this result encodes what we can hope for: the parameter estimate converges
to the set of parameters minimizing the variance of the prediction errors, corresponding to
the best model within the given model structure.
The result quoted above is somewhat implicit. It is also formulated in terms of the param-
eter estimate θ̂, which is of less interest for the case of black-box models (cf. the discussion
on the curve-fitting example on p. 114). In order to get a bit more insight into characteri-
zation of bias, let us look at an example:
Example 5.8 (Bias distribution in the OE case). Recall the Output Error (OE) model struc-
ture with the predictor
ŷ(t|t−1, θ) = B(q,θ)/F(q,θ) u(t)   (5.62)
Now, assume the data is generated by the system

y(t) = G(q)u(t) + e(t),   (5.63)

where G(q) is a transfer function and e(·) is any noise sequence, which is uncorrelated with
the input u(·). Using (5.62) and the uncorrelatedness, we can then obtain an expression for
the variance of the prediction errors:

E ε²(t, θ) = E[(G(q) − B(q,θ)/F(q,θ))u(t) + e(t)]² = E[(G(q) − B(q,θ)/F(q,θ))u(t)]² + E e²(t)   (5.64)
The interpretation is that, asymptotically, the parameter estimate will be determined such
that the systematic error, given by the first term in this expression, will be minimized. Similarly
to the curve-fitting case, the fit of the model transfer function B(q,θ)/F(q,θ) to the system
transfer function G(q) will depend on the input u(t). This dependence can be expressed
more explicitly in the frequency domain by exploiting Parseval’s formula:

E ε²(t, θ) = (1/2π) ∫_{−π}^{π} |G(e^{iω}) − B(e^{iω},θ)/F(e^{iω},θ)|² Φ_u(ω) dω + (1/2π) ∫_{−π}^{π} Φ_e(ω) dω,   (5.65)
where Φu (ω) and Φe (ω) are the spectral densities of the input and the noise, respectively.
From this expression, it can clearly be seen how the fit of the model transfer function will
be weighted by the frequency distribution of input power. This is quite intuitive and gives
a clear indication of how the spectral properties of the input excitation can be chosen to
prioritize certain frequency regions when fitting the model to data.
The example above was formulated for the OE model structure. For the general model
structure (5.39), a similar result can be derived, but the interpretation is less transparent,
since the fitting of the input-output transfer function and the noise model transfer function
are interconnected.
VARIANCE. The random fluctuations of the parameter estimate due to the finite number
of (noise corrupted) data can be characterized in a similar way as discussed before for the
linear regression case. Basically, the only thing that is changed is that the regression vector
ϕ is replaced by the gradient of the prediction, i.e. ∇θ ŷ. Thus, assuming there is a “true” pa-
rameter θ0 , the covariance of the parameter estimate is approximately and asymptotically
given by:
E[(θ̂N − θ0)(θ̂N − θ0)⊤] ≈ (σ²/N) R⁻¹,   (5.66)

where σ² is the noise variance, and

R = E[∇θ ŷ(t, θ) ∇θ ŷ⊤(t, θ)].   (5.67)
In fact, it can also be shown, under general conditions, that the estimate is asymptotically
normally distributed with mean θ0 and covariance matrix given in (5.66), or

√N (θ̂N − θ0) ∼ AsN(0, σ²R⁻¹)   (5.68)
With the experimental or data-driven approach to system identification, data is clearly fun-
damental. Indeed, carrying out the experiments and recording the chosen signals during
the experiment is the first necessary step. It should be stressed right away that it is usu-
ally worthwhile spending some effort to prepare the experiment and the data collection
carefully. The reason for this is that the quality of the final outcome of the system identi-
fication, the model, is largely dictated by the quality of the data, and significant efforts are
usually required to prepare and carry out additional experiments. Several factors influence
the quality of the data obtained, and some examples are given next.
CHOICE OF OPERATING POINT. The general rule is to collect data at operating points similar
to those where the model is to be applied. If the system under study has significant
nonlinearities, the resulting model will depend on the operating point used. In some cases,
in particular if linear models are used, several models covering different parts of the oper-
ating window may have to be used. One example of this is when a so called gain scheduled
controller is designed based on the models.
[Flowchart of the iterative system identification procedure: Experiment design → Data collection → Pretreatment of data → Parameter estimation → Model validation → “Model ok?”. If no, return to an earlier step; if yes, done.]
CHOICE OF SAMPLING INTERVAL. When data is collected, the system is almost always
sampled at discrete time instants. This necessarily implies loss of information, and hence it is
important to make a conscious decision on which sampling interval to use. Sampling too
slowly may imply that dynamics in the interesting frequency region are not accurately
reflected. Sampling too fast, on the other hand, may cause the modelling to be focussed on
e.g. irrelevant, high-frequency disturbances—and, not least, to generate a lot of data! A
rule-of-thumb is that 6–10 samples per settling time of a step response is usually a good
starting-point. A final remark concerning sampling is the risk of aliasing and the consequent
need to use (analog) filtering of the signals before sampling.
CHOICE OF INPUT SIGNAL. When carrying out an experiment on the system, you usually
need to decide which inputs should be applied. This is an opportunity to guide the
system identification process to produce high-quality models. Again, thinking in terms of
frequency properties usually helps intuition: if we intend to use the model in certain
frequency regions, we need to make sure that the applied input has high content (spectral
power) in those regions. Indeed, we saw in Example 5.4 that the lack of proper excitation—
in this case exemplified by a single sinusoid—may lead to poorly estimated parameters,
although the predictions could still be good.
Choosing the amplitude of the input is usually a balance between getting good signal-to-noise
ratios (accuracy) and staying within operating constraints. In addition, in the presence of
non-linearities, it may be wise to limit the amplitudes (saturation in actuators is an extreme
example). Sometimes the inputs to the system are not decided by the experimenter,
e.g. when the system is in normal operation, or when the input is generated by feedback.
Such situations require some extra care, but will not be further pursued here.
Once data is collected, the rest of the system identification process basically amounts to
data processing. Traditionally, this involves a fair amount of engineering skills used to guide
the processing of the data⁷, and this work is supported by interactive system identification
tools, e.g. available in Matlab or Mathematica.
DETECTION OF ABNORMAL DATA. The first step in the data processing is to prepare the data
to be used for system identification. A good piece of advice is to simply look at the data as
the very first step—the human eye and brain are powerful tools to quickly detect strange features
of data, and this could save a lot of work later on in the process. One standard action is
to remove outliers, i.e. data-points that are obviously wrong, due to e.g. loss of data or a
temporary failure of a sensor.
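As a minimal sketch of such pre-treatment (illustrative; the threshold k is a matter of taste), outliers can be flagged with a robust test based on the median absolute deviation:

```python
import numpy as np

def remove_outliers(y, k=5.0):
    """Flag samples deviating more than k robust standard deviations
    from the median, using the median absolute deviation (MAD)."""
    med = np.median(y)
    sigma = 1.4826 * np.median(np.abs(y - med))   # MAD -> std for Gaussian data
    is_outlier = np.abs(y - med) > k * sigma
    return y[~is_outlier], is_outlier

rng = np.random.default_rng(1)
y = rng.standard_normal(1000)
y[[10, 500]] = [40.0, -35.0]          # two corrupted samples
y_clean, is_outlier = remove_outliers(y)
print(is_outlier.sum())               # the two corrupted samples are flagged
```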
FILTERING OF DATA. Viewing the collected data as raw data, it may be worth “polishing”
the data prior to system identification. In the interest of focussing the modelling on the
most important frequency region, it may be a good idea to remove low-frequency content
by e.g. eliminating non-zero means or trends in the data. This can be done in different ways,
e.g. by fitting a constant or a straight line to the data, or by using differentiated data. In a
similar way, high-frequency content may be removed by low-pass filtering, e.g. when it has
been discovered that an unnecessarily high sampling frequency has been used.
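A corresponding sketch of trend removal (illustrative; numpy only), fitting and subtracting a straight line before identification:

```python
import numpy as np

def detrend(y):
    """Remove the best straight-line fit (offset and linear trend) from y."""
    t = np.arange(len(y))
    a, b = np.polyfit(t, y, deg=1)    # least-squares fit y ~ a*t + b
    return y - (a * t + b)

rng = np.random.default_rng(2)
t = np.arange(200)
y = 0.05 * t + 3.0 + rng.standard_normal(200)   # drift + offset + noise
y_d = detrend(y)
print(y_d.mean())   # essentially zero: offset and trend removed
```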
Prior to the actual parameter estimation, we need to decide on the model structure to be
used, and we have seen that there are plenty of possible choices.
WHITE OR BLACK BOX? In the beginning of this chapter, we discussed the choice between
tailor-made and general-purpose models, or white-box vs. black-box models. If physics-based
relations can be used as a starting-point for the modelling, then the white-box alternative
may be preferable. The PEM principle can still be used to formulate a system identification
problem, with the formation of the predictor being one important step. In this
introductory treatment of the SysId process, we have chosen to focus on the alternative
route via black-box models. There are still choices to be made, however...
CHOICE OF PARAMETRIZATION. There are, within the general black-box model (5.39), several
special model structures such as ARX, OE and ARMAX. The selection may be guided
by some application insights, whether there is a desire to model disturbances or not, or
⁷This is to some extent challenged by the current development within machine learning.
simply by a desire to keep it simple—not a bad starting point! In addition to the choice
of parametrization, you need to choose model order (number of parameters in each of the
polynomials) and possibly also a time delay of the model. These are difficult choices, and
in the end it is usually a process of trial-and-error, guided by different tools for model vali-
dation.
Once the model structure, including e.g. model order, has been decided, the actual param-
eter estimation is basically an automatic step, carried out by the software tool. The resulting
model needs to be scrutinized, however, a step that is usually referred to as model valida-
tion. Note that this is not only a final step to ensure that the model fits its purpose, but it
is also a tool to assess the different choices made earlier in the process, e.g. selection of
model orders. The outcome may very well lead to going back to a previous step, making
different choices and repeating the parameter estimation step.
TESTING MODEL QUALITY. There are many ways to test the model quality. The most
immediate is to evaluate the loss function used in the PEM approach, and a ranking of
different models can easily be made. This could be a crude first step, but more information
can be gathered by studying time series based on the model, e.g.:
• The predicted output, ŷ(t|t−1, θ), using the model, can be compared with the real output
y(t).
• The simulated, noise-free output of the model, y m (t ) = G(q, θ)u(t ), can be compared
with the real output y(t ).
Notice the distinction between the two: the predicted output is at the core of the PEM,
whereas the simulated output neglects the disturbance model and therefore usually devi-
ates more from the measured output. In the FIR and OE cases, the two are the same.
In addition to the above, the model can be studied in various ways, e.g. plotting of fre-
quency response, investigating poles and zeros etc.
MODEL ORDER SELECTION. When testing model quality as described above, e.g. by
comparing the values of the loss functions for several candidate models, it should come as
no surprise that increased model order—and hence model flexibility—will always improve
the fit. However, increasing the model order too much poses a risk: the model captures
specifics of the noise realization rather than relevant system dynamics. This phenomenon is
called overfit. A pragmatic way to avoid overfit is to look for a “knee” in the curve depicting
how the loss function decreases with model order, as illustrated in Figure 5.5.
Another way to judge what is a significant improvement when increasing the model order is
to use a formal test quantity. One such quantity is Akaike’s final prediction error criterion,
which is defined as

FPE = (1 + n/N)/(1 − n/N) · VN(θ̂N),   (5.69)
[Figure 5.5: Loss function V(θ̂) for increasing model orders, a typical plot.]
where n is model order and N is number of data-points. By looking for the model with the
smallest value of the test quantity, it is possible to make a trade-off between improved fit
and increased model complexity.
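As a small illustration of (5.69) (the loss values below are made up), the trade-off can be computed directly:

```python
# Akaike's FPE (5.69) for a set of candidate model orders, using
# hypothetical loss values V_N(theta_hat) from an identification run.
N = 50                                       # number of data points
V = {1: 1.90, 2: 0.52, 3: 0.515, 4: 0.51}    # loss vs. model order (made up)

fpe = {n: (1 + n / N) / (1 - n / N) * v for n, v in V.items()}
best = min(fpe, key=fpe.get)
print(best)   # -> 2: the small improvements for n > 2 do not pay off
```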
ANALYZING PREDICTION ERRORS. The prediction errors comprise the very basis for the
PEM approach. Under certain assumptions, the statistical properties of the prediction errors
obtained after parameter estimation, the residuals, can be analyzed. This leads to two
useful tests:
• Cross-correlation test: If the model structure is able to “extract” all available information
in the input when forming the prediction of the output, we would expect
the residuals to be uncorrelated with the input. Indeed, approximately the following
holds and can be used to formulate statistical tests:

R̂_εu(τ) = 0 for τ > 0, if the model fit is good; R̂_εu(τ) ≠ 0 for τ < 0 is an indication of feedback.   (5.70)
• Autocorrelation test: If the model structure basically fulfills the idealized assumptions
we used in the derivation, then we expect the residuals to be an i.i.d. sequence
of random variables, i.e. R̂_ε(τ) ≈ 0 for τ ≠ 0.
Figure 5.6 shows plots of two test quantities used for these tests, for a specific example.
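A sketch of how such a cross-correlation test quantity can be computed (illustrative; the threshold ±1.96/√N corresponds to a 95% significance level under the null hypothesis):

```python
import numpy as np

def xcorr_test(eps, u, max_lag=20):
    """Normalized sample cross-correlation between residuals and input,
    together with the +/- 1.96/sqrt(N) 95% threshold."""
    N = len(eps)
    e0, u0 = eps - eps.mean(), u - u.mean()
    denom = np.sqrt(np.sum(e0**2) * np.sum(u0**2))
    r = np.array([np.sum(e0[tau:] * u0[:N - tau])
                  for tau in range(1, max_lag + 1)]) / denom
    return r, 1.96 / np.sqrt(N)

rng = np.random.default_rng(3)
N = 2000
u = rng.standard_normal(N)
eps = rng.standard_normal(N)           # ideal case: residuals independent of u
r, thresh = xcorr_test(eps, u)
print((np.abs(r) > thresh).sum())      # only few (if any) lags should exceed
```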
CROSS VALIDATION. For all of the above validation techniques, it is a good rule to try the
model on fresh data. This is called cross validation and thus means that you separate data
into two sets: the first one is used (after pre-processing) for parameter estimation, and the
second one is used for assessing the model. In this way you avoid the risk that the model
is picking up phenomena that are coupled to the particular realization of the noise, rather
than properties of the system.
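A minimal sketch of such a split (illustrative first-order ARX data): fit on the first half of the data, evaluate the loss on the fresh second half:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 400
u = rng.standard_normal(N)
y = np.zeros(N)
# Made-up ARX data: y(t) = 0.5 y(t-1) + u(t-1) + e(t)
for t in range(1, N):
    y[t] = 0.5 * y[t-1] + u[t-1] + 0.1 * rng.standard_normal()

Phi = np.column_stack([y[:-1], u[:-1]])   # phi(t) = [y(t-1), u(t-1)]
Y = y[1:]

n = len(Y) // 2                           # estimation data: first half
theta = np.linalg.lstsq(Phi[:n], Y[:n], rcond=None)[0]
V_est = np.mean((Y[:n] - Phi[:n] @ theta) ** 2)
V_val = np.mean((Y[n:] - Phi[n:] @ theta) ** 2)   # loss on fresh data
print(theta, V_est, V_val)   # theta near [0.5, 1.0]; both losses near 0.01
```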
[Figure 5.6: Example of residual tests. Top: autocorrelation test of residuals; Bottom: cross-correlation between input and residuals. Test thresholds shown with dotted lines.]
5.4 THE MAXIMUM LIKELIHOOD METHOD*
The least-squares, and more generally the prediction error methods, as presented, are not
based on any particular description of the uncertainties of the model. We will now discuss
a different approach to parameter estimation, which is based on a stochastic, or probabilis-
tic, description of the uncertainties.
The starting point for the discussion is to assume that a θ-dependent model is given in
terms of a probability density function (PDF) for the random variable Y , namely f Y (θ, y ).
The task is to use this parametrized model to estimate the parameters θ, given an observa-
tion y∗ of y. Intuitively, the “likelihood of making an observation y = y∗” is proportional to
L(θ) = f_Y(θ, y∗), and the latter is called the likelihood function. Hence, selecting the parameters
most likely to yield the measurements y∗ corresponds to

θ̂ = arg max_θ L(θ) = arg max_θ f_Y(θ, y∗)
The maximum-likelihood method is a very general principle and can be applied to many
different problems. Our interest is to apply it to the estimation of parameters in dynamical
models. We will, however, begin by having another look at the simple curve fitting example,
and see how the MLE can be constructed.
Example 5.9 (MLE for curve fitting). In order to apply the ML method, we need to postulate
an explicit model for the uncertainty, or the mismatch between measured data and model
predictions. It is common to assume that the model output is corrupted by additive noise,
which for the linear regression case becomes

y = Φθ + e,   (5.72)

where y and e are vectors with stacked variables as in (5.9). We assume {e(i, θ)} are i.i.d.
random variables with PDF f e (x; θ). Now, the “likelihood to make an observation y = y ∗ ”
is the same as the “likelihood that e takes the value y ∗ − Φθ”. Hence, denoting the data
available (i.e. measured outputs y and regressors Φ) by Z N , the likelihood function is given
by
f(θ, Z^N) = ∏_{i=1}^N f_e(ε(i, θ); θ)   (5.73)
where we have used the independence assumption, and the residuals are given by (5.6).
Now, maximizing the likelihood function is equivalent to maximizing the logarithm of the
function—the log likelihood function—allowing us to get a simplified expression for the
maximum likelihood estimate:

θ̂N = arg max_θ log f(θ, Z^N) = arg min_θ (1/N) ∑_{i=1}^N −log f_e(ε(i, θ); θ)
The derived MLE can be applied to any probability distribution, but we will get further
insight by assuming that the noise has a (centred) Gaussian distribution, a common as-
sumption. This means that the PDF is given by

f_e(ε, θ) = (1/(√(2π)σ)) e^{−ε²/(2σ²)}   (5.76)

implying

−log f_e(ε, θ) = log σ + (1/2)(ε²/σ²) + const.   (5.77)
We should observe here that the right hand side depends on both θ (via ε) and σ. By in-
cluding σ among the parameters to estimate, we therefore get the following expression for
the MLE:
(θ̂, σ̂)_N = arg min_{σ,θ} ( log σ + (1/(2σ²)) · (1/N) ∑_{i=1}^N ε²(i, θ) ),
which finally gives the estimates (check this by finding the local minima!)
θ̂N = arg min_θ (1/N) ∑_{i=1}^N ε²(i, θ)   (5.78)

σ̂²_N = (1/N) ∑_{i=1}^N ε²(i, θ̂N)   (5.79)
The important conclusion is that the maximum-likelihood estimate for the case with inde-
pendent, identically and normally distributed noise variables is identical to the least-squares
estimate (at least for the linear regression case)! This should leave us with a certain degree
of trust for the least-squares criterion, although it was initially introduced in a fairly ad-hoc
manner.
REMARK. In the derivation above, we assumed that the noise sequence consists of i.i.d.
random variables. If we instead assume e has a multi-variate normal distribution, denoted

e ∼ N(0, Σ),   (5.80)

then the PDF and the negative log-likelihood become

f_e(ε, θ) = (1/det(2πΣ)^{1/2}) e^{−(1/2) ε⊤Σ⁻¹ε},   (5.81)

−log f_e(ε, θ) = const + (1/2) ε⊤Σ⁻¹ε.   (5.82)
We see that the ML method in this case corresponds to minimizing a weighted LS criterion
as in (5.13), and that the weight matrix is given by the inverse of the covariance matrix, i.e.
W = Σ⁻¹. For the special case with independent noise variables with variances {λᵢ}, i.e.
Σ = diag(λᵢ), the resulting estimate becomes

θ̂N = arg min_θ ∑_{i=1}^N ε²(i, θ)/λᵢ,   (5.83)
so that each of the data-points used for the estimation is used with a weight that reflects
the “credibility” of the data-point, as reflected by its measurement noise variance.
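A numerical sketch of this weighted least-squares estimate, θ̂ = (Φ⊤WΦ)⁻¹Φ⊤Wy with W = diag(1/λᵢ) (illustrative data; known per-sample noise variances are assumed):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 300
Phi = np.column_stack([np.ones(N), rng.standard_normal(N)])
theta_true = np.array([1.0, 2.0])
lam = rng.uniform(0.01, 4.0, N)                   # known per-sample noise variances
y = Phi @ theta_true + np.sqrt(lam) * rng.standard_normal(N)

W = np.diag(1.0 / lam)                            # weight = inverse variance
theta_wls = np.linalg.solve(Phi.T @ W @ Phi, Phi.T @ W @ y)
theta_ls = np.linalg.lstsq(Phi, y, rcond=None)[0]
print(theta_wls, theta_ls)   # both near [1, 2]; WLS is typically tighter
```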
Let us conclude this section with an example, illustrating that the maximum-likelihood
method can be applied to other PDFs than the Gaussian.
Example 5.10 (ML for uniformly distributed noise). Consider a single scalar measurement,
y = y ∗ ∈ R, used to estimate a single scalar parameter θ ∈ R, based on the underlying model
y = θ · u + e, (5.84)
where u ∈ R is a given (single) input. Estimating the parameter θ in this case seems fairly
obvious, as one ought to compute θ̂ = y/u, but let us see where the ML method takes us.
With e assumed to be uniformly distributed in the interval [−1, 1], the PDF is

f_e(x) = 1/2 for x ∈ [−1, 1], and f_e(x) = 0 otherwise.   (5.85)

The likelihood function is thus L(θ) = f_e(y∗ − θ · u), which takes its maximum value 1/2
precisely when |y∗ − θ · u| ≤ 1.
If we rule out the degenerate case u = 0, the MLE is thus given by a set of θ,

θ ∈ [(y − 1)/u, (y + 1)/u],   (5.87)

for which the likelihood function takes the maximum value 0.5. Although our initial guess
for θ̂ is within this set, there is no reason it should be given any particular preference,
considering the assumptions made.
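The flat likelihood over this set is easy to check numerically (illustrative sketch):

```python
import numpy as np

def likelihood(theta, y, u):
    """L(theta) = f_e(y - theta*u) for e ~ Uniform[-1, 1]."""
    e = y - theta * u
    return 0.5 if -1.0 <= e <= 1.0 else 0.0

y, u = 3.0, 2.0
# The maximizing set (5.87) is [(y-1)/u, (y+1)/u] = [1.0, 2.0]
thetas = np.linspace(0.5, 2.5, 201)
L = np.array([likelihood(th, y, u) for th in thetas])
print(L.max(), (L == 0.5).sum())   # the likelihood is flat at 0.5 on the whole set
```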
The concept of efficiency expresses that the estimator has the least possible covariance of
the estimate; the details follow below.

Let θ̂(y∗) be any unbiased estimator of θ = θ0. Then the covariance of the estimate is
bounded from below by the Cramér-Rao inequality:

Cov θ̂(y∗) ≥ M⁻¹,

where M is the Fisher information matrix, M = E[∇θ log f_Y(θ, y) ∇θ log f_Y(θ, y)⊤] evaluated at θ = θ0.
6 DIFFERENTIAL ALGEBRAIC EQUATIONS (DAES)
We have so far formally seen ordinary differential equations (ODEs), which describe the
time evolution of a vector of variables via relationships of the form:

ẋ = f(x, u)   (6.1)

where x ∈ R^nx is a vector of differential states and u ∈ R^nu a vector of “external variables",
often referred to as inputs (any variable that is not defined by (6.1), but rather “externally"
can be classified as an “input") . For the sake of simplicity, the time dependence (t ) of these
variables is usually omitted.
In this chapter, we will approach a new form of differential equations, labelled Differential-
Algebraic Equations (DAEs), which are widely used in the modelling of complex and large-
scale systems. DAEs have features and properties that are unlike those of ODEs. We will
investigate them here.
Example 6.1.

• Consider the differential equation

ẋ₁ = 2x₁ + x₂   (6.2a)
0 = 3x₁ − x₂   (6.2b)

Here we clearly observe that the state variable x₂ does not appear in its
time-differentiated form ẋ₂, such that the equations do not define ẋ₂, but rather x₂.
Put differently, x₂ is not a differential state but an algebraic state. Hence (6.2) is a
DAE.
• Consider the differential equation

ẋ₁ + 2x₁ + ẋ₂ + x₂ = 0   (6.3a)
2ẋ₁ + x₁ + 2ẋ₂ + 2x₂ = 0   (6.3b)

Here both variables ẋ₁ and ẋ₂ appear time-differentiated. However, one can also
observe that replacing (6.3b) by −2 · (6.3a) + (6.3b) yields:

ẋ₁ + 2x₁ + ẋ₂ + x₂ = 0   (6.4a)
−3x₁ = 0   (6.4b)
Here we observe that (6.4) does not define the differential states ẋ1 and ẋ2 (in the
sense that one cannot solve (6.4) to obtain ẋ1 , ẋ2 ). However, (6.4) still provides a well-
defined trajectory for x1 and x2 . Indeed, (6.4b) specifies that x1 = 0, such that (6.4a)
reads as the simple ODE:
ẋ2 + x2 = 0 (6.5)
Clearly, in order for the notion of DAE to be rigorous, we need a more formal definition.
Definition 1. Consider a differential equation having the state vector x ∈ R^nx, and defined
by the equation

F(ẋ, x, u, t) = 0   (6.6)

such that the Jacobian ∂F/∂ẋ is rank-deficient. Then (6.6) is a DAE. If ∂F/∂ẋ is instead full
rank, (6.6) is an (implicit) ODE.

Let us revisit the previous examples in the light of this definition.

• Differential equation (6.2) can be written as

F(ẋ, x) = [ẋ₁ − 2x₁ − x₂; 3x₁ − x₂] = 0   (6.8)

such that

∂F/∂ẋ = [1 0; 0 0]   (6.9)

is rank-deficient because the second column (row) of the Jacobian of F is zero. One
can generalize this observation: algebraic states yield columns of zeros in the Jacobian
∂F/∂ẋ and make it rank-deficient.
• Differential equation (6.3) can be written as

F(ẋ, x) = [ẋ₁ + 2x₁ + ẋ₂ + x₂; 2ẋ₁ + x₁ + 2ẋ₂ + 2x₂] = 0   (6.10)

such that

∂F/∂ẋ = [1 1; 2 2]   (6.11)

is rank-deficient, since the second row is twice the first. Hence (6.3) is also a DAE,
even though both states appear time-differentiated.
In practice, most often DAEs are arising because some of the states are purely algebraic
(i.e. they do not appear time-differentiated), as in (6.2). In order to stress the difference
between differential and algebraic states, it is common to use the notation x for the differ-
ential states, and z the algebraic states. E.g. (6.2) would be written as:
ẋ = 2x + z (6.12a)
0 = 3x − z (6.12b)
Example 6.2. Let us investigate two “freak" differential equations that can switch between
being ODEs and DAEs.

• Consider the differential equation

u ẋ + x = 0   (6.13)

For u ≠ 0, (6.13) defines ẋ = −x/u and is hence an ODE. For u = 0, however, (6.13)
reduces to the purely algebraic equation

x = 0   (6.16)

Hence (6.13) can switch between being an ODE and a DAE depending on the input
u.
• Consider the differential equation

ẋ₁ + x₁ − u = 0   (6.17a)
(x₁ − x₂) ẋ₂ + x₁ − x₂ = 0   (6.17b)

If the initial conditions satisfy x₁(0) = x₂(0), then (6.17) is equivalent to

ẋ₁ + x₁ − u = 0   (6.19a)
x₁ − x₂ = 0   (6.19b)

hence it enforces x₁(t) = x₂(t) for all time, and (6.17) remains a DAE. However, if the
initial conditions satisfy x₁(0) ≠ x₂(0), then (6.17) starts as an ODE and remains an
ODE throughout its trajectories.
These examples are meant to draw attention to the fact that the notion of DAE can
be fairly convoluted. In most practical cases, however, DAEs arise because the differential
equations hold variables that do not appear time-differentiated. In this context, the prob-
lem of DAEs having exotic behaviors as in the examples above does not arise.
DAEs are commonly presented in one of the following forms.

• Fully-implicit DAEs are generally written as

F(ẋ, x, u) = 0   (6.20)

where

det[∂F/∂ẋ] = 0   (6.21)

If the rank-deficiency arises because some of the states x do not appear
time-differentiated in F (hence creating columns of zeros in the Jacobian ∂F/∂ẋ), then
they are commonly labelled as “z" states, and (6.20) is rewritten as:

F(ẋ, x, z, u) = 0   (6.22)
• Semi-explicit DAEs split explicitly the differential and algebraic equations, and can
generally be written in the form:

ẋ = f(x, z, u)   (6.24a)
0 = g(x, z, u)   (6.24b)

A semi-explicit DAE can always be put in the fully-implicit form (6.22) via the construction

F(ẋ, ż, x, z, u) = [ẋ − f(x, z, u); g(x, z, u)] = 0   (6.25)

Conversely, a fully-implicit DAE (6.22) can be put in semi-explicit form by introducing
the additional algebraic variable v:

ẋ = v   (6.26a)
0 = F(v, x, z, u)   (6.26b)

We observe that (6.24) is a DAE by construction. Indeed, one can apply Definition 1
to (6.25) (using the same construction as in (6.23)) and observe that:

det([∂F/∂ẋ  ∂F/∂ż]) = det([I 0; 0 0]) = 0   (6.27)
• Linear DAEs are DAEs where the underlying functions are linear. They are often put
in the form:

E ẋ = Ax + Bu   (6.28)

with matrix E rank-deficient, or in the semi-explicit form:

ẋ = Ax + Bu + Cz   (6.29a)
0 = Dx + Eu + Fz   (6.29b)

For matrix F full rank, we observe that the algebraic variables z can be eliminated
using (6.29b) and replaced into (6.29a), such that the DAE can be reduced to an ODE
with the addition of some “output variables" z:

ẋ = (A − CF⁻¹D)x + (B − CF⁻¹E)u   (6.30a)
z = −F⁻¹(Dx + Eu)   (6.30b)
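A numerical sketch of this elimination (illustrative matrices; note that the symbols E and F are reused from (6.29)):

```python
import numpy as np

# Semi-explicit linear DAE:  xdot = A x + B u + C z,  0 = D x + E u + F z
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
C = np.array([[0.0], [1.0]])
D = np.array([[1.0, 0.0]])
E = np.array([[0.5]])
F = np.array([[2.0]])          # full rank, so z can be eliminated

# Reduced ODE (6.30):  xdot = (A - C F^-1 D) x + (B - C F^-1 E) u
Finv = np.linalg.inv(F)
A_red = A - C @ Finv @ D
B_red = B - C @ Finv @ E

x = np.array([[1.0], [0.0]])
u = np.array([[1.0]])
z = -Finv @ (D @ x + E @ u)    # (6.30b)
# Consistency check: both formulations give the same xdot
assert np.allclose(A @ x + B @ u + C @ z, A_red @ x + B_red @ u)
print(z)                       # -> [[-0.75]]
```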
Similarly to ODEs, nonlinear DAEs can be linearized into linear DAEs. E.g. a fully-implicit
DAE of the form (6.22) can be linearized to

(∂F/∂ẋ) Δẋ + (∂F/∂x) Δx + (∂F/∂z) Δz + (∂F/∂u) Δu = 0   (6.31)

and a semi-explicit DAE of the form (6.24) can be linearized to

Δẋ = (∂f/∂x) Δx + (∂f/∂u) Δu + (∂f/∂z) Δz   (6.32a)
0 = (∂g/∂x) Δx + (∂g/∂u) Δu + (∂g/∂z) Δz   (6.32b)

and therefore takes the form (6.29). It is interesting to note here that the elimination
of the algebraic variables Δz in (6.32) is possible if the Jacobian ∂g/∂z is full rank
throughout the trajectory. We will get back to this notion soon.
One ought to understand that in order for a DAE, e.g. in the semi-explicit form:

ẋ = f(x, z, u)   (6.33a)
0 = g(x, z, u)   (6.33b)

to be “solvable", one needs to be able to compute the differential state derivative ẋ and
the algebraic states z for a given differential state x and input u. Indeed, if ẋ, z can be readily
obtained (possibly numerically) from (6.33) for any given x, u, then one can propagate the
system trajectories and build the solutions corresponding to (6.33). In order to build this
intuition, let us consider the following examples.
Example 6.3.

• Consider the linear DAE

F = [ẋ₁ − z; ẋ₂ − x₁; ẋ₁ + x₂ − u] = 0   (6.34)

One can verify that, using simple algebraic manipulations, (6.34) can be rewritten as

z = u − x₂   (6.35a)
ẋ₂ = x₁   (6.35b)
ẋ₁ = u − x₂   (6.35c)
• Consider a linear DAE similar to but slightly different than (6.34)
ẋ1 − z
F = ẋ2 − x1 = 0 (6.36)
x2 − u
One can verify that no algebraic manipulation can deliver ẋ and z from (6.36). Indeed, (6.36) also reads as:
M [ẋ1 ; ẋ2 ; z] = [0 ; x1 ; x2 − u],  with M := [1 0 −1 ; 0 1 0 ; 0 0 0] (6.37)
and since matrix M is rank deficient (its last row is zero), one cannot manipulate the equations defined by (6.36) in order to extract ẋ and z. In contrast, we observe that (6.34) can be rewritten as
[1 0 −1 ; 0 1 0 ; 1 0 0] [ẋ1 ; ẋ2 ; z] = [0 ; x1 ; u − x2] (6.38)
and its matrix is full rank, allowing us to build (6.35). We ought to observe that the matrix M is obtained by taking the Jacobian
[∂F/∂ẋ  ∂F/∂z] (6.39)
which needs to be full rank in order for the DAE to be solvable for ẋ and z.
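The rank arguments in this example are easy to check numerically. A minimal sketch (for square matrices such as those in (6.37) and (6.38), full rank is equivalent to a nonzero determinant):

```python
def det3(m):
    # determinant of a 3x3 matrix, cofactor expansion along the first row
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# matrix M of (6.37): last row zero, hence rank deficient ("hard" DAE)
M_hard = [[1, 0, -1],
          [0, 1, 0],
          [0, 0, 0]]

# matrix of (6.38): full rank, so (x1dot, x2dot, z) can be extracted
M_easy = [[1, 0, -1],
          [0, 1, 0],
          [1, 0, 0]]
```

A nonzero determinant of `M_easy` confirms that (6.34) is solvable for ẋ and z, while `det3(M_hard) = 0` confirms that (6.36) is not.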
The principles deployed in the examples above can be generalized to nonlinear DAEs, as is
stated next.
Theorem 9. A fully implicit DAE
F (ẋ, z, x, u) = 0 (6.40)
with function F smooth can be readily solved (i.e. solved for ẋ, z) if the Jacobian
[∂F/∂ẋ  ∂F/∂z] (6.41)
is full rank locally for any x, u. It follows that (6.40) can be solved. Another way of construing this result is that if the Jacobian (6.41) is full rank, then a Newton iteration deployed on (6.40) in order to compute ẋ, z converges locally (because the linear system (4.8) is well-posed).
One can readily apply Theorem 9 to semi-explicit DAEs.
Corollary 1. A semi-explicit DAE
ẋ = f (x, z, u) (6.43a)
0 = g (x, z, u) (6.43b)
with function g smooth can be readily solved (i.e. solved for ẋ, z) if the Jacobian
∂g/∂z (6.44)
is full rank on all trajectories z, x, u.
Proof. Recall that a semi-explicit DAE can be transformed into a fully implicit one via the transformation (6.25), recalled here:
F (ẋ, x, z, u) = [ẋ − f (x, z, u) ; g (x, z, u)] = 0 (6.45)
The Jacobian (6.41) of (6.45) reads as [I  −∂f/∂z ; 0  ∂g/∂z], which is full rank if ∂g/∂z is full rank.
An intuitive way of construing Corollary 1 is by observing that if the Jacobian (6.44) is full
rank, then the algebraic equation (6.43b) can be solved for z at any point x, u (see IFT, The-
orem 6). The solution z can then be injected in (6.43a) to obtain ẋ.
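This intuition can be sketched in code. The example below uses a toy semi-explicit DAE (an arbitrary choice of f and g, not taken from the text) whose Jacobian ∂g/∂z = 1 + eᶻ is always full rank, and solves (6.43b) for z with a scalar Newton iteration before evaluating (6.43a):

```python
import math

# toy semi-explicit DAE (arbitrary illustrative choice):
#   xdot = f(x, z, u) = -x + z
#   0    = g(x, z, u) = z + exp(z) - u*x,   dg/dz = 1 + exp(z) > 0 (full rank)
def f(x, z, u):
    return -x + z

def g(x, z, u):
    return z + math.exp(z) - u * x

def dg_dz(x, z, u):
    return 1.0 + math.exp(z)

def solve_z(x, u, z0=0.0, tol=1e-12):
    # Newton iteration on g(x, z, u) = 0, solving (6.43b) for z;
    # well-posed because dg/dz is full rank
    z = z0
    for _ in range(50):
        step = g(x, z, u) / dg_dz(x, z, u)
        z -= step
        if abs(step) < tol:
            break
    return z

x, u = 1.0, 2.0
z = solve_z(x, u)      # algebraic state from (6.43b)
xdot = f(x, z, u)      # differential state derivative from (6.43a)
```

Repeating these two operations at every time point yields the system trajectories, which is precisely what the numerical DAE solvers discussed later do.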
Theorem 9 and its Corollary 1 tell us that there are DAEs that are “easy" to solve (those satisfying the full rankness of the Jacobians (6.41) and (6.44)) and DAEs that are “hard" to solve (those where the Jacobians (6.41) and (6.44) are rank deficient). We will see next that Theorem 9 and its Corollary 1 have to do with the differential index of DAEs.
Definition 2. The differential index of a DAE is the number of times the operator d/dt must be applied to the equations (plus possibly an arbitrary amount of algebraic manipulations) in order to convert the DAE into an ODE.
Definition 2 is non-trivial and can require some care when being applied. Let us work through a few examples in order to gather some intuition on how it works.
Example 6.4.
• Consider the DAE (6.35), recalled here:
z = u − x2 (6.47a)
ẋ2 = x1 (6.47b)
ẋ1 = u − x2 (6.47c)
Recall that (6.35) is an “easy" DAE since it satisfies the assumption of Theorem 9. We also observe that a single application of d/dt on (6.47a) yields:
ż = u̇ − ẋ2 (6.48a)
ẋ2 = x1 (6.48b)
ẋ1 = u − x2 (6.48c)
or equivalently
ż = u̇ − x1 (6.49a)
ẋ2 = x1 (6.49b)
ẋ1 = u − x2 (6.49c)
which is an ODE in the states x1, x2, z. A single time differentiation has hence converted (6.47) into an ODE, such that the DAE (6.47) is of index 1.
• Consider now the DAE (6.36), recalled here:
ẋ1 − z = 0 (6.50a)
ẋ2 − x1 = 0 (6.50b)
x2 − u = 0 (6.50c)
Recall that (6.36) is a “hard" DAE since it fails the assumption of Theorem 9. A time
differentiation of (6.36) (last row) yields
ẋ1 − z = 0 (6.51a)
ẋ2 − x1 = 0 (6.51b)
ẋ2 − u̇ = 0 (6.51c)
Substituting ẋ2 = u̇ (last equation) into the second equation, this can be rewritten as
ẋ1 − z = 0 (6.52a)
u̇ − x1 = 0 (6.52b)
ẋ2 − u̇ = 0 (6.52c)
A second time differentiation applied on (6.52b) yields
ẋ1 − z = 0 (6.53a)
ü − ẋ1 = 0 (6.53b)
ẋ2 − u̇ = 0 (6.53c)
Substituting ẋ1 = ü (second equation) into the first equation yields
ü − z = 0 (6.54a)
ü − ẋ1 = 0 (6.54b)
ẋ2 − u̇ = 0 (6.54c)
A third time differentiation, applied on (6.54a), finally yields the ODE
ż = d³u/dt³ , ẋ1 = ü, ẋ2 = u̇ (6.55)
A total count of 3 time differentiations was used to transform the DAE (6.36) to the
ODE (6.55). It follows that DAE (6.36) is of index 3.
Note that the same principles can be applied to nonlinear DAEs (see e.g. Section 6.4 below).
Let us now make the connection between the concept of differential index and Theorem 9 and its Corollary 1.
Theorem 10. A fully-implicit DAE
F (ẋ, z, x, u) = 0 (6.56)
is of index 1 if and only if the Jacobian (6.41) is full rank.
Corollary 2. The same result extends to semi-explicit DAEs, i.e. a DAE
ẋ = f (x, z, u) (6.60a)
0 = g (x, z, u) (6.60b)
is of index 1 if and only if the Jacobian (6.44) is full rank.
Proof. If (6.60) is of index 1, then a single time differentiation on the algebraic equation yields an ODE, i.e.
ẋ = f (x, z, u) (6.61a)
0 = (d/dt) g (x, z, u) (6.61b)
is an ODE, i.e. equivalently, using the chain rule,
ẋ = f (x, z, u) (6.62a)
0 = (∂g/∂x) ẋ + (∂g/∂z) ż + (∂g/∂u) u̇ (6.62b)
is an ODE. It follows that
ẋ = f (x, z, u) (6.63a)
ż = −(∂g/∂z)⁻¹ ((∂g/∂x) ẋ + (∂g/∂u) u̇) (6.63b)
is an ODE, which requires the Jacobian ∂g/∂z to be full rank.
The message to take home from Theorem 10 and Corollary 2 is that index-1 DAEs are “easy" to solve in the sense that the equations readily deliver ẋ and z. DAEs of index more than 1 are generally referred to as “high-index" DAEs, and are notoriously more difficult to handle. When approaching DAEs numerically, index-1 DAEs are clearly preferred. This does not mean that high-index DAEs cannot be treated, but they are often best treated via a so-called index-reduction procedure, which we will introduce in Section 6.4 and further discuss in Section 6.5.
Let us now observe that the model arising from constrained Lagrange mechanics,
(d/dt)(∂L/∂q̇) − ∂L/∂q = Q (6.64a)
c (q) = 0 (6.64b)
is in fact a semi-explicit DAE, where (6.64b) is the algebraic equation and (6.64a) yields (af-
ter minor treatments) an explicit ODE.
We have seen that (6.64) does not readily deliver q̈, z , and that it ought to be modified
by time-differentiating the constraint equation (6.64b) twice, delivering the model (3.166)
recalled here:
(d/dt)(∂L/∂q̇) − ∂L/∂q = Q (6.65a)
c̈ (q, q̇, q̈) = 0 (6.65b)
It is interesting here to verify the differential index of (6.64) by applying Definition 2. The differential state for a mechanical system reads as
x = [q ; q̇] (6.66)
One can observe that (6.67) readily delivers ẋ, z for W (q ) M(q ) full rank. It follows that
(6.67) fulfils the assumption of Theorem 9, and is therefore of index 1. Since (6.67) has been
obtained after two time differentiations of (6.64), it follows that (6.64) is an index-3 DAE.
We can summarize this observation as:
(6.64b) (index 3) —[ d²/dt² ]→ (6.65b) (index 1) —[ d/dt ]→ ODE
The important message to take home here is that a high-index DAE such as (6.64) can be
transformed into an index-1 DAE (i.e. (6.65)) via time differentiations and algebraic ma-
nipulations. The trick we have explored to transform the DAEs arising from constrained
Lagrange mechanics into low-index DAEs is not limited to this special case, but is a generic approach to transform “hard" (i.e. high-index) DAEs into “easy" ones (i.e. low-index DAEs),
commonly referred to as an index-reduction procedure. We further detail this in the next
section.
6.5 INDEX REDUCTION
The index reduction of DAEs is identical to assessing their index as per Definition 2, with the
minor difference that the procedure ought to be stopped one step before reaching an ODE
(i.e. at the index-1 step). More specifically, one ought to perform time differentiations and
algebraic manipulations until an index-1 DAE emerges. It is difficult to automate this pro-
cedure in general (although some software packages offer index-reduction capabilities), as
the algebraic manipulations to be performed typically require some insights into the DAE.
However, we can attempt a detailed description of the procedure.
1. Write the DAE in the semi-explicit form
ẋ = f (x, z, u) (6.68a)
0 = g (x, z, u) (6.68b)
2. Identify a subset of algebraic equations in (6.68b) that can be solved for a subset of
algebraic variables z .
3. Apply d/dt on the remaining algebraic equations containing some differential states x j ; this leads to terms ẋ j appearing in these differentiated equations.
4. Substitute the terms ẋ j by the corresponding expressions f j (x, z, u); this delivers new algebraic equations to replace those differentiated in (6.68b).
This procedure is not always straightforward to deploy. Let us consider a couple of exam-
ples to understand it better.
Example 6.5.
• Consider the DAE
ẋ = [x2 + z 2 ; z 1 + u] (6.69a)
0 = [x1 − x2 ; z 2 − x1] =: g (6.69b)
We observe that
∂g/∂z = [0 0 ; 0 1] (6.70)
is rank deficient, such that (6.69) is of index > 1 (see Corollary 2). We observe that the
second equation of (6.69b) delivers z 2 as:
z2 = x1 (6.71)
We then apply d/dt on the first equation in (6.69b), which yields ẋ1 − ẋ2 = 0 and, after substituting ẋ from (6.69a):
x2 + z 2 − z 1 − u = 0 (6.73)
We obtain the new DAE where the differentiated algebraic equation has been re-
placed by (6.73):
ẋ = [x2 + z 2 ; z 1 + u] (6.74a)
0 = [x2 + z 2 − z 1 − u ; z 2 − x1] =: g̃ (6.74b)
We observe that
∂g̃/∂z = [−1 1 ; 0 1] (6.75)
is full rank, such that (6.74) is of index 1. This concludes the procedure. We can
additionally deduce from the index reduction that (6.69) is of index 2.
• Consider now the DAE
ẋ = Ax − bz (6.76a)
0 = (1/2)(x⊤x − 1) (6.76b)
for some matrix A and vector b, and where z ∈ R. Here the algebraic part is:
g (x) = (1/2)(x⊤x − 1) (6.77)
such that ∂g (x)/∂z = 0 is rank deficient by construction. We then take the time derivative of g (x):
g̃ (x, z) = (d/dt) g (x) = x⊤ẋ = x⊤(Ax − bz) = 0 (6.78)
We observe that
∂g̃ (x, z)/∂z = −x⊤b (6.79)
is full rank if x⊤b ≠ 0. Note that z is here readily delivered by g̃ (x, z) = 0. Indeed:
z = (x⊤Ax)/(x⊤b) (6.80)
The index-reduced DAE then reads as:
ẋ = Ax − bz (6.81a)
0 = x ⊤ (Ax − bz) (6.81b)
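The index-reduced DAE (6.81) can be simulated by recovering z from (6.80) at every step. A minimal sketch, with an arbitrary illustrative choice of A and b (any choice keeping x⊤b nonzero along the trajectory works):

```python
# index-reduced DAE (6.81): z is recovered from (6.80), z = x^T A x / x^T b,
# and x is propagated with an explicit Euler step on (6.81a).
# A and b are arbitrary illustrative choices (x^T b must stay nonzero).
A = [[0.0, 1.0],
     [-1.0, -0.1]]
b = [1.0, 0.5]

def z_of(x):
    # formula (6.80)
    xAx = sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))
    xb = x[0] * b[0] + x[1] * b[1]
    return xAx / xb

x = [1.0, 0.0]                # consistent initial condition: x^T x = 1
dt, N = 1e-3, 500
for _ in range(N):
    z = z_of(x)
    v = [A[0][0] * x[0] + A[0][1] * x[1] - b[0] * z,
         A[1][0] * x[0] + A[1][1] * x[1] - b[1] * z]   # (6.81a)
    x = [x[0] + dt * v[0], x[1] + dt * v[1]]           # explicit Euler step

# the consistency condition c(x) = (x^T x - 1)/2 is only enforced through
# its derivative, so it drifts slowly due to the integration error
drift = 0.5 * (x[0] ** 2 + x[1] ** 2 - 1.0)
```

The small but nonzero `drift` illustrates the consistency-condition drift discussed next.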
Similarly to Lagrange mechanics, the reduction of the index of a DAE requires one to collect
a set of consistency conditions that need to be enforced in order for the resulting index-1
DAE to match the original one. Generally speaking, when performing the index reduction,
one ought to collect all the algebraic equations on which a time differentiation is per-
formed, and add them to the list of consistency conditions. E.g. in the example above, the
first equation in (6.69b) is time-differentiated and is therefore the (only) consistency con-
dition.
Similarly to the Lagrange context, consistency conditions can drift numerically over long simulations, and ought to be corrected if needed. Describing the introduction of a Baumgarte stabilization in generic index-reduced DAEs in a general form can be difficult, and we will leave this question out of these notes. In simple cases (such as e.g. the Lagrange context), the principle is fairly simple.
7 EXPLICIT INTEGRATION METHODS - RUNGE-KUTTA
For the remainder of these notes we will discuss the numerical treatment of differential
equations. Simulating a system in e.g. the state-space form
ẋ = f (x, u), x(0) = x0 (7.1)
over a time interval [0, T ] consists essentially in computing a discrete sequence of state
vectors
x0 , . . . , x N (7.2)
that approximate the true and continuous trajectories x(t ), t ∈ [0, T ] solution of (7.1) on a
given time grid t0 , . . . , t N sufficiently accurately that they are useful for whichever goal we
have to tackle, i.e. we want e.g. the approximation
xk ≈ x(tk ), k = 0, . . . , N
to hold. For most model equations f , the sequence (7.2) can only be built numerically in
a computer. For the remainder of the course, we will focus on understanding some (non-
exhaustive) modern methods to build a sequence (7.2) of simulated states that are “reason-
ably close" to the true model trajectories.
The explicit Euler method builds the sequence (7.2) via the recursion
xk+1 = xk + ∆t f (xk , u k ) (7.4)
where u k = u(tk ), starting from the given initial conditions x0 . The Euler step is illustrated in Figure 7.1: it essentially linearly extrapolates the model dynamics f (xk , u k ) in order to build the next discrete state xk+1 from the current one xk .
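In code, the recursion (7.4) is a short loop; a minimal sketch:

```python
import math

def euler(f, x0, u, t0, dt, N):
    # explicit Euler scheme (7.4): x_{k+1} = x_k + dt * f(x_k, u_k), u_k = u(t_k)
    xs = [x0]
    t = t0
    for _ in range(N):
        x = xs[-1]
        xs.append(x + dt * f(x, u(t)))
        t += dt
    return xs

# example: xdot = -x (no input); exact solution x(t) = exp(-t) * x0
xs = euler(lambda x, u: -x, 1.0, lambda t: 0.0, 0.0, 0.01, 100)
err = abs(xs[-1] - math.exp(-1.0))  # global error at T = 1, of order dt
```

The error `err` shrinks roughly proportionally to ∆t, which is the order-1 behavior analysed below.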
8
i.e. the amount of computations needed to reach a given accuracy
Figure 7.1: Illustration of the principle underlying the explicit Euler scheme (7.4)
It ought to be intuitively clear here that the larger the time step ∆t is chosen, the larger the discrepancy between the true model trajectories x(t ) and the simulated ones x0 , . . . , x N becomes.
Indeed, Euler is essentially ignoring the “curvature" of the model trajectories between the
discrete states and replacing it with a straight line. The longer the step, the “more curva-
ture" is ignored.
It is useful to formally analyse the discrepancy between the simulated states and the true
trajectories, and relate them to the choice and parameters (e.g. ∆t ) of the integration
method. In order to do that, it is very useful to compute the one-step error, i.e. assuming
that xk = x(tk ) (i.e. the simulation is exact at time tk ), then what is the error at time tk+1 , i.e.
what is kxk+1 − x(tk+1 )k? This can be done via Taylor arguments. Indeed, we observe that:
x(tk+1 ) = xk+1 + (∆t²/2) ẍ(ξ), where xk+1 = x(tk ) + ∆t · ẋ(tk ) (7.5)
for some ξ ∈ [tk , tk+1 ] (this is the Taylor expansion with an explicit remainder). It follows
that:
∥xk+1 − x(tk+1 )∥ = (∆t²/2) ∥ẍ(ξ)∥ ≤ (∆t²/2) max_{ξ∈[tk ,tk+1 ]} ∥ẍ(ξ)∥ (7.6)
• The one-step error is worse when ẍ is large, i.e. if the model trajectory is more “curved".
Using this result we can then discuss the global error of the explicit Euler integration method,
i.e. what is the final integration error kx N − x(T )k? We can answer that question almost di-
rectly from (7.6). Indeed, we make the following observations:
• An integration up to time T with a time step ∆t requires N = T/∆t Euler steps
• Each step contributes a one-step error of order ∆t² (cf. (7.6)), such that after N steps the simulation error ∥x N − x(T )∥ will be of the order N ∆t² = T ∆t . We will use the notation ∥x N − x(T )∥ = O (∆t ).
Example 7.1. We illustrate next these principles on the integration of the following dynam-
ics:
ẋ = f (x, u) = [σ(x2 − x1 ) ; x1 (ρ − x3 ) − x2 ; x1 x2 − βx3 ] (7.9)
for σ = 10, β = 3, ρ = 28 and starting at the initial conditions x0 = [1 1 1]⊤ . Figures
7.2-7.4 illustrate the model trajectories obtained from the explicit Euler scheme (7.4) for
different step sizes ∆t , and figure 7.5 reports the global error, i.e. the error observed at the
end of the simulation.
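The simulations of Figures 7.2-7.4 can be reproduced along the following lines; a sketch of the explicit Euler integration of (7.9):

```python
# dynamics (7.9) with sigma = 10, beta = 3, rho = 28 (no input u)
def f_dyn(x):
    x1, x2, x3 = x
    return [10.0 * (x2 - x1),
            x1 * (28.0 - x3) - x2,
            x1 * x2 - 3.0 * x3]

def euler_sim(x0, dt, T):
    # explicit Euler scheme (7.4), applied componentwise
    x, N = list(x0), round(T / dt)
    for _ in range(N):
        dx = f_dyn(x)
        x = [x[i] + dt * dx[i] for i in range(3)]
    return x

x_coarse = euler_sim([1.0, 1.0, 1.0], 1e-2, 1.0)   # as in Figure 7.2
x_fine = euler_sim([1.0, 1.0, 1.0], 1e-4, 1.0)     # much more accurate
# the end points differ notably: at dt = 1e-2 the global error is large
diff = max(abs(x_coarse[i] - x_fine[i]) for i in range(3))
```

Repeating this for a range of step sizes and comparing against a high-accuracy reference yields the global-error curve of Figure 7.5.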
Figure 7.2: Numerical integration of the equation (7.9). In dark the trajectories from the
explicit Euler scheme (7.4) using ∆t = 10−2 , and in grey the trajectories obtained
using a very high accuracy integration method.
Figure 7.3: Numerical integration of the equation (7.9). In dark the trajectories from the ex-
plicit Euler scheme (7.4) using ∆t = 10−2.5 , and in grey the trajectories obtained
using a very high accuracy integration method.
Figure 7.4: Numerical integration of the equation (7.9). In dark the trajectories from the
explicit Euler scheme (7.4) using ∆t = 10−3 , and in grey the trajectories obtained
using a very high accuracy integration method.
Figure 7.5: Global error vs. ∆t using the explicit Euler scheme (7.4).
Beyond the accuracy of integration methods, another crucial aspect to discuss is their sta-
bility. In order to introduce this concept, let us consider the trivial stable scalar linear sys-
tem:
ẋ = −λx, x(0) = x0 (7.10)
for λ > 0 and arbitrary initial conditions x0 . The dynamics (7.10) are linear and therefore have the explicit solution:
x(t ) = e^{−λt} x0 (7.11)
Let us nonetheless consider deploying the explicit Euler method on (7.10). The Euler steps then read as:
xk+1 = xk − ∆t λxk = (1 − λ∆t ) xk (7.12)
We observe that (7.12) is a linear discrete dynamic system, and that it becomes unstable
for |1 − λ∆t | > 1. Since λ, ∆t > 0, this happens if 1 − λ∆t < −1, i.e. if
∆t > 2/λ (7.13)
In other words, the explicit Euler method delivers an unstable simulation for the dynam-
ics (7.10) when the time step ∆t is too large compared to the pole of the dynamics λ. We
need to stress here that the dynamics (7.10) are stable, only their numerical simulation is
possibly unstable. We illustrate these observations in Fig. 7.6 below.
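The stability boundary ∆t = 2/λ is easy to verify numerically; a sketch with λ = 10, as in Fig. 7.6:

```python
def euler_scalar(lam, dt, N, x0=1.0):
    # explicit Euler on xdot = -lam * x: x_{k+1} = (1 - lam*dt) * x_k, cf. (7.12)
    x = x0
    for _ in range(N):
        x = (1.0 - lam * dt) * x
    return x

lam = 10.0                                  # pole of the stable dynamics (7.10)
x_stable = euler_scalar(lam, 0.05, 50)      # dt < 2/lam = 0.2: |1 - lam*dt| < 1
x_unstable = euler_scalar(lam, 0.25, 50)    # dt > 2/lam: |1 - lam*dt| = 1.5 > 1
```

With ∆t = 0.05 the simulated state decays to (numerical) zero, while with ∆t = 0.25 it grows without bound, even though the underlying dynamics are stable.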
Figure 7.6: Illustration of the stability problem of the Euler method for λ = 10. Only a
small enough step-size ∆t ≤ 0.2 allows the method to be stable (left and mid-
dle graph), while ∆t > 0.2 yields an unstable simulation (right graph).
We will need to discuss the stability of each integration method we will study. This stabil-
ity issue is in fact studied in a slightly more general way for various integration methods,
namely the test system
ẋ = λx (7.14)
is used, with λ ∈ C, and the stability of the integration method is described in terms of its region of stability in the complex plane, i.e. the set of values λ∆t yielding a stable simulation. In the specific case of the explicit Euler method, the region of stability is the disc of radius one centered at −1, i.e.
|1 + λ∆t | ≤ 1
Figure 7.7: Stability region of the explicit Euler scheme (7.4) in the complex plane. All com-
binations of λ,∆t such that λ∆t lies within the circle result in a stable (though
not necessarily accurate) integration.
Obviously, the larger the region of stability is, the more “stable" the method is considered.
Summary
• The explicit Euler method is a method of order 1, i.e. the global error is ∥x N − x(T )∥ = O (∆t ). This is the lowest order that one can accept from an integration method, as one expects from a method that the error gets smaller with the step size ∆t .
• The explicit Euler method can become unstable for a too large step size. Its region of
stability for the test system (7.14) is fairly small and given in Fig. 7.7
The exact trajectories of the model dynamics (7.1) obey the following integral relationship:
x(tk+1 ) = x(tk ) + ∫_{tk}^{tk+1} f (x(t ), u(t )) dt (7.15)
The key idea behind many integration methods is to try to provide good evaluations of the
integral term in (7.15). In fact, one can interpret the explicit Euler method as a crude way
of doing that. Indeed, explicit Euler approximates:
∫_{tk}^{tk+1} f (x(t ), u(t )) dt ≈ (tk+1 − tk ) f (x(tk ), u(tk )) ≈ ∆t f (xk , u k ) (7.16)
A natural question then is how we can improve the approximation (7.16). To that end, one can e.g. use a mid-point rule, based on the approximation
∫_{tk}^{tk+1} f (x(t ), u(t )) dt ≈ ∆t f (x(tk + ∆t/2), u(tk + ∆t/2)) (7.17)
which tends to be better than (7.16) whenever the model trajectories have a “low curva-
ture". This observation is illustrated in Figure 7.9.
Figure 7.9: Illustration of the approximation (7.17). The light grey area is ∫_{tk}^{tk+1} f (x(t ), u(t )) dt (including the dark grey area), which the mid-point rule approximates with ∆t f (x(tk + ∆t/2), u(tk + ∆t/2)), i.e. the rectangular grey area.
Unfortunately, x(tk + ∆t/2) is unknown and needs to be itself replaced by an approximation built upon xk . Here we rely again on an explicit Euler “half" step, i.e. we use:
x(tk + ∆t/2) ≈ xk + (∆t/2) f (xk , u k ) (7.18)
Combining (7.18) and (7.17), we obtain the “mid-point" RK2 integration scheme:
xk+1 = xk + ∆t f (xk + (∆t/2) f (xk , u k ), u(tk + ∆t/2)) (7.19)
Despite the somewhat convoluted construction, we can fairly easily show that this RK2
scheme is of order 2 (hence the name). In order to compute the order, we will use a Taylor
argument again. In order to reduce the complexity of the notation, let us do the calculation
assuming that the model dynamics do not have an input u. Assuming again that xk = x(tk )
(i.e. the integration is exact at time tk ), we observe that:
x(tk+1 ) = x(tk ) + ∆t · f (x(tk )) + (∆t²/2) ḟ (x(tk )) + O(∆t³) (7.20)
where ḟ = (∂f/∂x) f . We additionally observe that (7.19) in effect does:
xk+1 = xk + ∆t · f (xk + (∆t/2) f (x(tk ))) (7.21)
and we finally observe that
f (xk + (∆t/2) f (x(tk ))) = f (x(tk )) + (∆t/2) (∂f/∂x)|_{xk} f (x(tk )) + O(∆t²) (7.22)
Hence
xk+1 = xk + ∆t · f (x(tk )) + (∆t²/2) (∂f/∂x)|_{xk} f (x(tk )) + O(∆t³) (7.23)
We can then conclude that the one-step error of the RK2 scheme is of order 3, and using a
similar reasoning as for the explicit Euler scheme, the global error of the RK2 scheme is of
order 2, i.e.
∥x N − x(T )∥ = O(∆t²) (7.25)
(see Figure 7.10 for an illustration). We will discuss the stability of RK2 later, together with
the other RK methods.
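The order difference between the explicit Euler scheme (7.4) and the mid-point RK2 scheme (7.19) can be observed numerically; a sketch on the scalar dynamics ẋ = −x:

```python
import math

def euler_step(f, x, dt):
    # explicit Euler step (7.4)
    return x + dt * f(x)

def rk2_step(f, x, dt):
    # mid-point RK2 step (7.19): Euler half step, then full step from midpoint
    x_mid = x + 0.5 * dt * f(x)
    return x + dt * f(x_mid)

def simulate(step, f, x0, dt, T):
    x, N = x0, round(T / dt)
    for _ in range(N):
        x = step(f, x, dt)
    return x

f = lambda x: -x                 # test dynamics, exact solution exp(-t)
exact = math.exp(-1.0)
e1 = abs(simulate(euler_step, f, 1.0, 0.01, 1.0) - exact)  # O(dt)
e2 = abs(simulate(rk2_step, f, 1.0, 0.01, 1.0) - exact)    # O(dt^2)
```

At ∆t = 0.01 the RK2 error is orders of magnitude below the Euler error, reflecting the stronger slope seen in Figure 7.10.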
Figure 7.10: Global error of the RK2 method (7.19), and comparison with the explicit Eu-
ler scheme (7.4) (RK1), as obtained from numerical experiments. One can see
the difference between having a first or second order method. The stronger
slope of the RK2 method (in the log-log plot) indicates a higher power in the
relationship ∆t to kx N − x (T )k.
7.3 G ENERAL RK METHODS
The principles of the RK2 method can be generalized. We introduce here the general for-
mula for RK methods, which is a generalization of (7.19).
K 1 = f (xk + ∆t ∑_{j=1}^{s} a_{1j} K j , u(tk + c1 ∆t )) (7.26a)
⋮
K i = f (xk + ∆t ∑_{j=1}^{s} a_{ij} K j , u(tk + ci ∆t )) (7.26b)
⋮
K s = f (xk + ∆t ∑_{j=1}^{s} a_{sj} K j , u(tk + cs ∆t )) (7.26c)
xk+1 = xk + ∆t ∑_{i=1}^{s} b i K i (7.26d)
where the coefficients ai j , b i , c j define the RK method, and s is the number of stages of the
method. We can easily observe that both the explicit Euler method and the RK2 methods
introduced above can be written in this form. Indeed, it can be verified that:
a11 = 0, b 1 = 1, c1 = 0 (7.27)
yields the explicit Euler method (s = 1). It is customary to summarize the coefficients of an RK method in a so-called Butcher tableau:
c1 | a11 . . . a1s
 ⋮ |  ⋮         ⋮
cs | a s1 . . . a ss
   | b 1 . . . b s
E.g. explicit Euler has the Butcher tableau
0 | 0
  | 1
and is therefore an RK1 method (because of order 1). The “mid-point" RK2 method has the
Butcher tableau
0   | 0    0
1/2 | 1/2  0
    | 0    1
Other RK2 methods exist, with Butcher tableaus of the same size but with different coeffi-
cients, e.g.
0   | 0    0          0 | 0    0
2/3 | 2/3  0          1 | 1    0
    | 1/4  3/4          | 1/2  1/2
RK methods resulting from (7.26) ought to be divided in two categories explicit and im-
plicit. In order to make these terms clear, let us consider all RK methods having two stages
(s = 2). In this case (7.26) reads as:
K 1 = f (xk + ∆t (a11 K 1 + a12 K 2 ), u(tk + c1 ∆t )) (7.29a)
K 2 = f (xk + ∆t (a21 K 1 + a22 K 2 ), u(tk + c2 ∆t )) (7.29b)
xk+1 = xk + ∆t (b 1 K 1 + b 2 K 2 ) (7.29c)
We observe that for a11 , a12 , a22 = 0 in (7.29) (such as in all the RK2 methods we have looked at above), K 1 can be computed explicitly from xk , u(·), and then K 2 can be computed explicitly from xk , u(·), K 1 . However, for any of a11 , a12 , a22 non-zero, this cannot be done, as equations (7.29a)-(7.29b) become implicit (the unknowns K 1,2 appear on both sides of the equalities and are linked through the function f ). We then say that the RK scheme is implicit. One can trivially see from the Butcher tableau of an RK method whether the method is explicit or implicit. We state this next.
A Butcher tableau defines an explicit integrator if and only if the diagonal and upper-triangular elements are all zero, i.e. if and only if ai j = 0 for any j ≥ i . Otherwise it defines an implicit method.
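This criterion translates directly into code; a sketch checking the a-coefficients of a tableau (the implicit example coefficients below are arbitrary):

```python
def is_explicit(a):
    # a is the s-by-s matrix of RK coefficients a_ij (0-based indexing);
    # the method is explicit iff a_ij = 0 for all j >= i, i.e. the
    # diagonal and everything above it are zero
    s = len(a)
    return all(a[i][j] == 0 for i in range(s) for j in range(i, s))

a_midpoint = [[0.0, 0.0],
              [0.5, 0.0]]   # "mid-point" RK2: strictly lower triangular
a_implicit = [[0.25, 0.0],
              [0.5, 0.25]]  # nonzero diagonal (arbitrary values): implicit
```

Here `is_explicit(a_midpoint)` holds while `is_explicit(a_implicit)` does not.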
The distinction is important in practice. Indeed, while explicit RK methods require simply
computing the K i sequentially (K 1 then K 2 , etc.), implicit RK methods require solving the
equations together and numerically, typically using a Newton method. The latter is computationally more expensive because it requires solving linear systems a number of times for each integration step. However, as we will see later, implicit RK schemes have powerful
properties that make them attractive, even when computational time is important.
It is worth discussing one of the most commonly used RK methods, the explicit RK4 (order 4), which requires s = 4 stages. It has the Butcher tableau⁹
0   | 0    0    0    0
1/2 | 1/2  0    0    0
1/2 | 0    1/2  0    0
1   | 0    0    1    0
    | 1/6  1/3  1/3  1/6
It is a popular approach because it has a very good trade-off between computational com-
plexity (CPU time) and accuracy.
The order-4 property means that the bound ∥x N − x(T )∥ ≤ c∆t⁴ holds for some c > 0 and for ∆t small enough. We should probably stress here the implication of having an order 4. The consequence of the order 4 is that if one divides
the time step ∆t by 2, then the amount of computation required to run a simulation
on a time interval [0, T ] is doubled, but the numerical error of the simulation is di-
vided by 24 = 16. This ∆t 4 effect of the order 4 makes it “cheap" to gain accuracy
(cheap in terms of computational time).
For the sake of completeness, let us write the pseudo-code corresponding to an RK4 method.
9
Note that other Butcher tableaus can generate order 4 RK methods, but this one is arguably the most com-
monly used.
RK4 method for explicit ODEs
Algorithm: for k = 0, . . . , N − 1, compute
K 1 = f (xk , u(tk ))
K 2 = f (xk + (∆t/2) K 1 , u(tk + ∆t/2))
K 3 = f (xk + (∆t/2) K 2 , u(tk + ∆t/2))
K 4 = f (xk + ∆t K 3 , u(tk + ∆t ))
xk+1 = xk + (∆t/6) (K 1 + 2K 2 + 2K 3 + K 4 )
return x1,...,N
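The same algorithm in runnable form; a minimal sketch of the RK4 step, tested on the scalar dynamics ẋ = −x:

```python
import math

def rk4_step(f, x, t, u, dt):
    # one step of the classical RK4 scheme, following its Butcher tableau
    K1 = f(x, u(t))
    K2 = f(x + 0.5 * dt * K1, u(t + 0.5 * dt))
    K3 = f(x + 0.5 * dt * K2, u(t + 0.5 * dt))
    K4 = f(x + dt * K3, u(t + dt))
    return x + (dt / 6.0) * (K1 + 2 * K2 + 2 * K3 + K4)

f = lambda x, u: -x          # test dynamics, exact solution exp(-t)
u = lambda t: 0.0            # no input
x, t, dt = 1.0, 0.0, 0.1
for _ in range(10):          # integrate to T = 1
    x = rk4_step(f, x, t, u, dt)
    t += dt
err = abs(x - math.exp(-1.0))   # global error, O(dt^4)
```

Even with the coarse step ∆t = 0.1, the error at T = 1 is tiny, which is the "cheap accuracy" effect of the order 4.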
Let us discuss more the integration order of explicit RK methods. Before diving in, we
should discuss more the implication of the number of stages s that the different RK meth-
ods have. It can be observed that the number of stages of an explicit RK method defines the
computational complexity of evaluating one step xk → xk+1 in the integrator. Indeed, each
integration step (i.e. the execution of (7.26)) requires s evaluations of the model equations
f and each of these evaluations translates directly into a (typically) fixed CPU time. We can deduce that the computational cost of one step of an explicit integrator is roughly proportional to s. In that sense, integrators with a low number of stages appear preferable.
On the other hand, integrators with a larger number of stages s also achieve higher orders
(let’s label them o), and therefore achieve a higher accuracy for a given ∆t . Indeed, remem-
ber that the simulation error is bounded by:
kx N − x(T )k ≤ c∆t o (7.33)
for some c > 0 and ∆t small. Note that we should always consider ∆t small, i.e. ∆t^o becomes exponentially smaller as o increases. Hence by increasing the number of stages s we pay proportionally more in computations but we gain exponentially in accuracy. The gain tends to outweigh the loss because of the “power effect" of the order.
The RK methods we have discussed so far suggest that the number of stages s of the method
equals its order o. Indeed, we have seen that:
• Explicit Euler has an order o = 1 and is an RK method with s = 1 stage. If ∆t is divided
by 2 then the error is divided by 2.
• The RK2 methods we have seen have an order o = 2 and have s = 2 stages. If ∆t is
divided by 2 then the error is divided by 4.
• The RK4 method we have seen has order o = 4 and has s = 4 stages. If ∆t is divided by
2 then the error is divided by 16.
From these observations, it may appear that the higher s the better. Unfortunately, this pat-
tern o = s breaks beyond o = 4. Let us stress this fact in the following table (these numbers
are not straightforward to establish, and we will leave that question out of these notes).
order o                1  2  3  4  5  6  7  8
minimal stages s       1  2  3  4  6  7  9  11
Figure 7.11: Global error of the different RK methods, as obtained from numerical experi-
ments.
This effect has some important consequences in practice on the choice of an integration
method. Indeed, the crux of any integration method is to achieve a given accuracy with
a minimum computational budget. There are two ways of increasing the integrator accu-
racy: increasing the order o, i.e. increasing the number of stage s, or decreasing ∆t . Having
these two options makes the choice of integrator (choice of order and choice of time step)
non-trivial.
However, using the table above, we can in fact somewhat formalize this choice. Let us
assume that we want the global error to be limited to some given number Tol.
• In order to meet this tolerance, the bound (7.33) requires c∆t^o ≤ Tol, i.e. a step size ∆t ≤ (Tol/c)^{1/o}
• In order to carry out a simulation on the time interval [0, T ], the number of integrator steps required is:
N = T/∆t (7.36)
and for a number of stages s, the number of evaluations n of the system dynamics f is:
n = N · s = sT/∆t ≥ sT (Tol/c)^{−1/o} (7.37)
We deduce from this simple reasoning that the computational cost per unit of simulation time is at least:
n/T ≥ s (Tol/c)^{−1/o} (7.38)
where o and s are related via the table above. It is then interesting to chart this relationship; we do that in the figures below. Here it becomes clear that a very low or very high order is not optimal, and that the optimum is in the order range 3-6.
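The reasoning behind (7.38) can be charted in a few lines; a sketch using the minimal stage counts s per order o for explicit RK methods:

```python
# cost per unit of simulation time, n/T >= s * (Tol/c)**(-1/o), cf. (7.38)
stages = {1: 1, 2: 2, 3: 3, 4: 4, 5: 6}  # minimal stages s per order o

def cost_per_time(o, tol_over_c):
    return stages[o] * tol_over_c ** (-1.0 / o)

tol_over_c = 1e-8                        # a tight accuracy requirement
costs = {o: cost_per_time(o, tol_over_c) for o in stages}
# low orders are dramatically more expensive at tight tolerances:
# o = 1 needs 1e8 evaluations per unit time, o = 4 only a few hundred
```

Sweeping `tol_over_c` over several decades reproduces the curves of Figure 7.12, with the optimum moving with the tolerance but staying away from the extreme orders.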
Figure 7.12: Illustration of formula (7.38) for different Tol/c. The horizontal axis represents the order o of the method, and the vertical axis the computational cost per simulation time T , i.e. n/T . The optimum is achieved in the middle range, and not for low nor high order methods.
Figure 7.13: Illustration of the number of function evaluations required to reach different
levels of simulation accuracy for a linear test example. Here the RK5 scheme
beats the other schemes for high accuracies, and RK4 beats the other schemes
for lower accuracies. Note that this result is model-dependent.
Before closing this section, a last point ought to be stressed regarding the order o achieved
by different RK methods. Indeed, we have seen the relationship between the order o of
different RK methods and the number of stages s required to achieve that order. We need to
stress here that the order o of an RK method does not follow simply from using the adequate
amount of stages. E.g. the RK2 schemes (o = 2) we have seen require s = 2 stages, but they
also require the coefficients a, b, c to be chosen adequately. E.g. in the case of an s = 2 stage method, an order o = 2 is achieved if the coefficients satisfy:
b 1 + b 2 = 1, b 2 c2 = 1/2, a21 = c2 (7.39)
Similarly, for the other RK methods, the coefficients cannot take arbitrary values if one wants the method to achieve its highest possible order.
The regions of stability for different orders are depicted in Figure 7.14. One can observe
that the regions increase with the order, but their size is fairly limited.
Figure 7.14: Illustration of the region of stability for the explicit RK methods 1 to 5. One can
observe that the stability region grows with the order, but it nonetheless covers
a limited range of admissible λ∆t .
The numerical stability considerations discussed above have important practical consequences. Indeed, while the dynamics (7.40) are clearly of no interest, they operate as a
trivial test system to discuss how integration methods react to fast dynamics, i.e. more
specifically, time constants in the dynamics that are in the same ballpark as the integration
step ∆t . The question of whether an integration method is capable of handling such fast
dynamics is crucial in the context of stiff systems, which appear very often in models for
engineering application such as mechanical and electrical systems. It is important to “de-
tect" the stiffness of a model when trying to simulate it.
We have in fact already encountered stiff systems in the DAE chapter. Indeed, (2.34) essen-
tially describes a model having very fast dynamics (for ǫ small) on the states z, and slower
dynamics on the states x. DAEs allow one to approximate these fast states via their “de-
cayed" values (i.e. their trajectories after their fast, stable dynamics have decayed). How-
ever, one may want to process these dynamics without using DAE formulations, in which
case the fast dynamics have to be dealt with.
Example 7.2 (Stiff system). As an example of a stiff system, consider the model equations
ẋ = [ 0   5   1   0 ;
     −5   0   0   1/2 ;
      0   0   −10⁻¹/ǫ   1/ǫ ;
      0   1/ǫ   −1/ǫ   −10⁻¹/ǫ ] x (7.42)
for ǫ = 5 · 10−4 . The corresponding trajectories are illustrated in Fig. 7.15 and 7.16. One can
observe that states x1,2 evolve slowly (although they are influenced by the fast states and
have some small oscillations in the beginning), while the states x3,4 oscillate fast and decay
to slow trajectories. This kind of system can e.g. arise in mechanical systems or electrical circuits when some parts have eigenfrequencies that are much higher than those of other parts.
They are especially expensive to treat in numerical simulations because the fast dynamics
require a small ∆t in order for the numerical simulation to be stable, while long simulations
(T large) are required in order to see the evolution of the “slow" states.
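The cost of stiffness can be illustrated on a toy system (an arbitrary stand-in with one slow and one fast mode, not the system (7.42)): the step size of the explicit Euler method is limited by the fast pole, even if one only cares about the slow state:

```python
def euler_modes(dt, N):
    # toy stiff system (an arbitrary stand-in, NOT the system (7.42)):
    # one slow mode (pole -1) and one fast mode (pole -1000), decoupled
    x_slow, x_fast = 1.0, 1.0
    for _ in range(N):
        x_slow += dt * (-1.0) * x_slow
        x_fast += dt * (-1000.0) * x_fast
    return x_slow, x_fast

# the step size is dictated by the FAST pole: explicit Euler is stable only
# for dt < 2/1000, even though the slow mode alone would allow dt up to 2
xs_ok, xf_ok = euler_modes(5e-4, 1000)   # dt < 2e-3: both modes stable
xs_bad, xf_bad = euler_modes(3e-3, 200)  # dt > 2e-3: fast mode explodes
```

Simulating the slow state over a long horizon hence requires very many small steps, which is exactly the expense described above.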
Figure 7.15: Illustration of the trajectories of the stiff system (7.42).
Figure 7.16: Illustration of the trajectories of the stiff system (7.42) vs. a simulation using
RK4 with ∆t = 1.5 · 10−3 (dotted curve). The initial conditions for the “fast"
states are chosen such that ż = 0 at t = 0. The small step size is close to the
limit for which the simulation becomes stable.
We will see later on that stiff systems are in fact one of the motivations for using implicit
integration methods.
Example 7.3 (Van der Pol oscillator). A classic example of an ODE triggering the difficulties
mentioned above is the Van der Pol oscillator, whose equations read as:

ẋ1 = x2 ,   ẋ2 = µ (1 − x1²) x2 − x1        (7.43)

for some parameter µ > 0,
and e.g. the trajectories displayed in Fig. 7.17. One can observe that the trajectories are
fairly “benign" most of the time, but regularly go through very sharp changes. The Van der
Pol oscillator is a challenging ODE for numerical integration as very fine time steps ∆t are
needed to “survive" the sharp changes, while fairly large time steps can be used on the rest
of the trajectories.
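To make the discussion concrete, the following Python sketch simulates a Van der Pol oscillator with a fixed-step RK4 integrator. The value µ = 5, the initial state and the horizon are illustrative choices of ours, not necessarily those used to produce Fig. 7.17.

```python
# Fixed-step RK4 on a Van der Pol oscillator (sketch; mu = 5, the initial
# state and the horizon are illustrative choices, not those of Fig. 7.17).

MU = 5.0

def f(x):
    x1, x2 = x
    return (x2, MU * (1.0 - x1 * x1) * x2 - x1)

def rk4_step(x, dt):
    def axpy(x, k, c):                       # x + c*k, elementwise
        return tuple(xi + c * ki for xi, ki in zip(x, k))
    k1 = f(x)
    k2 = f(axpy(x, k1, dt / 2))
    k3 = f(axpy(x, k2, dt / 2))
    k4 = f(axpy(x, k3, dt))
    return tuple(x[i] + dt / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
                 for i in range(2))

x, dt = (1.0, 0.0), 1e-3
traj = [x]
for _ in range(int(20.0 / dt)):              # simulate t in [0, 20]
    x = rk4_step(x, dt)
    traj.append(x)

# The limit-cycle amplitude in x1 is close to 2 for any mu > 0.
print(max(abs(p[0]) for p in traj))
```

With a step small enough to survive the sharp transitions, the trajectory settles on the well-known limit cycle; increasing ∆t substantially typically degrades or destabilizes the simulation, which is precisely what motivates adaptive steps.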
Figure 7.17: Illustration of the Van der Pol trajectories.
It then makes sense to use varying step sizes ∆t throughout the trajectories. Adaptive integration schemes are a practical approach to do precisely that. The key idea here is to adjust the step size ∆t during the simulation in order to meet the required accuracy. In practice, adaptive RK schemes perform an error control at every RK step: when the error is larger than acceptable, the step size is reduced and a shorter step is attempted, until the tolerance is met. Adaptive integrators require a baseline to assess the error at each step. A common practice is to compute the integration step using two different Butcher tableaus and compare their outcomes. If their discrepancy is above a certain level, then the step size is reduced.
Let us e.g. consider the RK45 adaptive integration scheme, which is implemented in Matlab
in the function ode45.m. At each RK step, the procedure:
• Generates two steps from xk , let us call them xk+1 and x̂k+1 , using two different
Butcher tableaus.
• Computes the error estimate e = ∥xk+1 − x̂k+1 ∥.
• Reduces ∆t if the error e is above some tolerance, and computes the steps again until
the tolerance is met
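The error-control loop described above can be sketched as follows. Here we use a deliberately simple embedded pair (an explicit Euler step inside a Heun step) and an elementary halve-on-reject / grow-on-accept rule on ẋ = −x; the pair, tolerance and growth factor are illustrative assumptions, not the actual ode45 controller.

```python
# Sketch of embedded-pair step-size control on xdot = -x (illustrative:
# an Euler/Heun pair and a simple halve/grow rule, not the ode45 controller).

def f(x):
    return -x

def step_pair(x, dt):
    k1 = f(x)
    k2 = f(x + dt * k1)
    x_low = x + dt * k1                  # explicit Euler step (order 1)
    x_high = x + dt / 2 * (k1 + k2)      # Heun step (order 2)
    return x_high, abs(x_high - x_low)   # keep higher order + error estimate

tol, dt, t, x, T = 1e-6, 0.5, 0.0, 1.0, 5.0
accepted, rejected = 0, 0
while T - t > 1e-12:
    dt = min(dt, T - t)
    x_new, err = step_pair(x, dt)
    if err > tol:
        dt *= 0.5                        # reject: retry with a shorter step
        rejected += 1
        continue
    t, x = t + dt, x_new                 # accept the step
    accepted += 1
    dt *= 1.2                            # grow the step cautiously

print(accepted, rejected, x)             # x should be close to exp(-5)
```

Note how the loop displays the "memory" of ∆t discussed below: after a rejection the step stays small for a while and only grows back gradually.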
We observe here that ∆t has a "memory" in the sense that if a very low ∆t was needed at a step k, then it will remain small for some steps. One ought to observe that this procedure is somewhat computationally expensive, as it computes two different steps at every RK step. In order to limit the computational expense, a common approach is to use two Butcher tableaus that differ only in the b coefficients, i.e. in forming the linear combination (7.26d), while the coefficients a and c are the same, such that equations (7.26a)-(7.26c) need to be computed only once. The two steps xk+1 and x̂k+1 are therefore computed as:
xk+1 = xk + ∆t Σ_{i=1}^{s} bi K i        (7.44a)

x̂k+1 = xk + ∆t Σ_{i=1}^{s} b̂ i K i        (7.44b)
The Butcher tableau for adaptive RK methods is then presented with two lines in the “b"
part of the tableau, the first one for the “classic" b and the second for b̂. E.g. the Butcher
tableau used in the ode45.m Matlab function reads as10 :
0    |
1/5  | 1/5
3/10 | 3/40        9/40
4/5  | 44/45       −56/15       32/9
8/9  | 19372/6561  −25360/2187  64448/6561  −212/729
1    | 9017/3168   −355/33      46732/5247  49/176         −5103/18656
1    | 35/384      0            500/1113    125/192        −2187/6784     11/84
-----+------------------------------------------------------------------------------
  b  | 35/384      0            500/1113    125/192        −2187/6784     11/84     0
  b̂  | 5179/57600  0            7571/16695  393/640        −92097/339200  187/2100  1/40
One can observe that this method has s = 7 stages. The step xk+1 is of order o = 5, while x̂k+1 is of order o = 4, such that their discrepancy provides an estimate of the integration error. Figure 7.18 illustrates the step size ∆t selected by ode45.m when treating the Van der Pol oscillator (7.43).
One can finally note that the last line of the "a" part of the tableau is identical to the first line of the "b" part, and that K 1 is given by:

K 1 = f (xk , u(tk ))

This implies that K 1 of step k + 1 matches K 7 of step k, such that the last K 7 can be reused as the next K 1 .
10. The RK method coded by this tableau is referred to as the RK45 Dormand-Prince method.
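One can check this structure of the tableau numerically. The sketch below stores the last a-row and the two b-rows from the tableau above as exact fractions, and verifies the "first-same-as-last" property together with the consistency condition Σ bi = 1 that any RK method must satisfy.

```python
# Verify two structural properties of the Dormand-Prince tableau above,
# using exact rational arithmetic (sketch).
from fractions import Fraction as F

a7 = [F(35, 384), F(0), F(500, 1113), F(125, 192), F(-2187, 6784), F(11, 84)]
b  = [F(35, 384), F(0), F(500, 1113), F(125, 192), F(-2187, 6784), F(11, 84),
      F(0)]
b_hat = [F(5179, 57600), F(0), F(7571, 16695), F(393, 640),
         F(-92097, 339200), F(187, 2100), F(1, 40)]

# FSAL ("first same as last"): the 7th stage of step k evaluates f at
# x_{k+1}, which is exactly the 1st stage of step k + 1.
assert a7 == b[:6]

# Consistency: both b-rows sum to 1, as required for any RK method.
assert sum(b) == 1 and sum(b_hat) == 1
print("FSAL and consistency verified")
```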
Figure 7.18: Illustration of the step sizes selected by ode45.m for simulating the Van Der Pol
trajectories. Whenever the trajectories are going through a “sharp" turn, the
step size drops drastically in order to keep the integration error under control.
It then increases slowly again to reduce the computational burden.
8 IMPLICIT INTEGRATION METHODS – RUNGE-KUTTA
In the explicit RK section, we have seen that a Butcher tableau can define explicit or implicit RK methods. We have discussed explicit RK methods in detail. It is time now to discuss implicit RK methods. While implicit methods are computationally more expensive (one needs to solve equations at every RK step, as opposed to simply evaluating the K i sequentially as in explicit methods), we will see that they have some striking advantages.

Implicit RK methods

The simplest implicit RK method is the implicit Euler method:

xk+1 = xk + ∆t f (xk+1 , u(tk+1 ))        (8.2)

The name "implicit" ought to be fairly clear from (8.2). Indeed, obtaining xk+1 from (8.2) cannot be done via a simple function evaluation, but must be done by solving (8.2) for xk+1 , i.e. one needs to find a solution to

r (xk+1 , xk , u k+1 ) := xk + ∆t f (xk+1 , u k+1 ) − xk+1 = 0        (8.3)
in terms of xk+1 . Note that r ∈ Rn where n is the size of the state space, i.e. x ∈ Rn . Solving
(8.3) is typically done via applying the Newton method covered earlier in these notes. The
deployment of the implicit Euler algorithm is then detailed below.
Implicit Euler method
Algorithm:
Input: Initial conditions x0 , input profile u(.), step size ∆t
for k = 0, . . . , N − 1 do
    Guess for xk+1 (one can e.g. use xk+1 = xk )
    while ∥r (xk+1 , xk , u k+1 )∥ > Tol do
        Compute the solution ∆xk+1 to the linear system:
            ∂r (xk+1 , xk , u k+1 )/∂xk+1 · ∆xk+1 + r (xk+1 , xk , u k+1 ) = 0        (8.4)
        Update:
            xk+1 ← xk+1 + α∆xk+1
        for some step size α ∈ ]0, 1] (a full step α = 1 generally works for implicit
        integrators)
return x1,...,N
Note that this procedure is significantly more complex than the simple update (8.1) of the
explicit Euler method. In particular, computing the Newton step (8.4) requires computing the Jacobian ∂r (xk+1 , xk , u k+1 )/∂xk+1 and forming its matrix factorization (i.e. solving the linear system). This procedure must be repeated at each RK step xk → xk+1 , i.e. many times in a complete simulation. Note that the Jacobian ∂r (xk+1 , xk , u k+1 )/∂xk+1 reads as:

∂r (xk+1 , xk , u k+1 )/∂xk+1 = ∆t ∂ f (xk+1 , u k+1 )/∂xk+1 − I

One can moreover show that the local integration error of the implicit Euler method satisfies ∥x(tk+1 ) − xk+1 ∥ ≤ c ∆t² for some c > 0 and for ∆t sufficiently small. Hence the order of the implicit Euler method is identical to that of the explicit Euler method.
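As an illustration, the following scalar sketch deploys the implicit Euler method with the Newton loop above on the model ẋ = −x³ (a model of our choosing, with an analytic Jacobian), taking a full Newton step α = 1.

```python
# Implicit Euler with a scalar Newton loop (sketch; the model xdot = -x**3
# and its Jacobian are our illustrative choices).

def f(y):
    return -y ** 3

def dfdy(y):
    return -3.0 * y ** 2

def implicit_euler_step(xk, dt, tol=1e-12, alpha=1.0):
    y = xk                              # initial guess for x_{k+1}
    for _ in range(50):
        r = xk + dt * f(y) - y          # residual r(x_{k+1}, x_k), cf. (8.3)
        if abs(r) < tol:
            break
        drdy = dt * dfdy(y) - 1.0       # Jacobian of r w.r.t. x_{k+1}
        y += -alpha * r / drdy          # Newton step, cf. (8.4), alpha = 1
    return y

x, dt = 1.0, 0.05
for _ in range(200):                    # simulate t in [0, 10]
    x = implicit_euler_step(x, dt)

# The exact solution of xdot = -x**3, x(0) = 1, is x(t) = (1 + 2t)**-0.5.
print(x, (1.0 + 2.0 * 10.0) ** -0.5)
```

The first-order accuracy discussed above is visible here: halving ∆t roughly halves the deviation from the exact solution.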
Let us now unpack one of the main motivations for using, in some cases, implicit methods
for simulating a model. Let us come back to the stability issue of the explicit Euler method
(8.1), and recall that the iteration is unstable on the (stable) test system

ẋ = λx        (8.8)

for ∆t · |λ| > 2 (taking λ real and negative).
We can make a very similar computation for the implicit Euler scheme, i.e. we observe that the implicit Euler method (8.2) deployed on the test system (8.8) reads as
xk+1 = xk + ∆t λxk+1 (8.9)
or equivalently

xk+1 = xk / (1 − λ∆t )        (8.10)
Hence the Implicit Euler method (8.2) is stable if
|1 − λ∆t | > 1 (8.11)
Observe that Re(λ) < 0 is required in order for (8.8) to be stable, and it follows that (8.11) then always holds. This result is quite striking. Indeed, it entails that the implicit Euler method is stable regardless of how "fast" the time constants of the model are. This property is called A-stability, and means that the stability region covers the whole left half of the complex plane (as opposed
to the limited regions depicted in Figure 7.14). A-stability allows one to treat stiff dynamics
without taking special care of the instability of the method, i.e. by taking “fairly large" steps
∆t despite the fast time constant λ of the model.
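A small numerical experiment illustrates this: on the test system with a fast λ and a step ∆t far above the explicit stability limit, the explicit Euler iteration blows up while the implicit Euler iteration decays. The values below are illustrative.

```python
# Stability comparison on xdot = lam*x with a "fast" time constant
# (illustrative values: dt*|lam| = 10, far above the explicit limit of 2).
lam, dt, N = -1000.0, 0.01, 100

xe = xi = 1.0
for _ in range(N):
    xe = xe + dt * lam * xe              # explicit Euler: xk+1 = (1 + lam*dt) xk
    xi = xi / (1.0 - lam * dt)           # implicit Euler: cf. (8.10)

print(abs(xe), abs(xi))                  # explicit diverges, implicit decays
```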
Example 8.1 (Stiff system, cont’d). Let us reuse the stiff model (7.42) in Example 7.2 and
apply the implicit Euler method to simulate it. The outcome is illustrated in Fig. 8.1 for
∆t = 1 · 10−3 and ∆t = 2 · 10−2 . One can observe that the implicit Euler method remains stable even for "long" steps, and approximates the fast dynamics by letting them decay to their damped values.
Figure 8.1: Illustration of the trajectories of the stiff system (7.42) and its simulation via
the implicit Euler method using ∆t = 2 · 10−2 (round markers) and ∆t = 1 · 10−3
(square markers).
8.2 IMPLICIT RUNGE-KUTTA METHODS
Let us now study higher-order implicit Runge-Kutta methods. As already explained in the explicit RK section, a Butcher tableau that is not strictly lower-triangular describes an implicit method. The RK equations (7.26), recalled here:
K 1 = f ( xk + ∆t Σ_{j=1}^{s} a1j K j , u(tk + c1 ∆t ) )        (8.12a)
...
K i = f ( xk + ∆t Σ_{j=1}^{s} ai j K j , u(tk + ci ∆t ) )        (8.12b)
...
K s = f ( xk + ∆t Σ_{j=1}^{s} a s j K j , u(tk + c s ∆t ) )        (8.12c)

xk+1 = xk + ∆t Σ_{i=1}^{s} bi K i        (8.12d)
are implicit in K 1,...,s , as ai j ≠ 0 for some j ≥ i . In this case we need to solve (8.12a)-(8.12c) numerically, typically using the Newton method. More specifically, similarly to (8.3), we write:

r (K , xk , u(.)) := [ f ( xk + ∆t Σ_{j=1}^{s} a1j K j , u(tk + c1 ∆t ) ) − K 1
                       ...
                       f ( xk + ∆t Σ_{j=1}^{s} ai j K j , u(tk + ci ∆t ) ) − K i
                       ...
                       f ( xk + ∆t Σ_{j=1}^{s} a s j K j , u(tk + c s ∆t ) ) − K s ] = 0        (8.13)

and solve (8.13) for K . Note that r ∈ R^{n·s} and K ∈ R^{n·s} , where n is the dimension of the state space (i.e. x ∈ R^n ) and s the number of stages of the IRK method. The Newton method then works as follows:
IRK for explicit ODEs
Algorithm:
Input: Initial conditions x0 , input profile u(.), Butcher tableau, step size ∆t
for k = 0, . . . , N − 1 do
    Guess for K (one can e.g. use K i = xk )
    while ∥r (K , xk , u(.))∥ > Tol do
        Compute the solution ∆K to the linear system:
            ∂r (K , xk , u(.))/∂K · ∆K + r (K , xk , u(.)) = 0        (8.15)
        with r given by (8.13). Update:
            K ← K + α∆K        (8.16)
        for some step size α ∈ ]0, 1] (a full step α = 1 generally works for implicit
        integrators)
    Take RK step:
        xk+1 = xk + ∆t Σ_{i=1}^{s} bi K i        (8.17)
return x1 , . . . , x N
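As an illustration, the sketch below specializes this algorithm to the one-stage Gauss method (Butcher tableau a11 = 1/2, b1 = 1, c1 = 1/2, i.e. the implicit midpoint rule, of order 2), applied to the scalar model ẋ = −x³ of our choosing.

```python
# One-stage Gauss IRK (implicit midpoint: a11 = 1/2, b1 = 1, c1 = 1/2) on
# xdot = f(x) = -x**3 (scalar sketch of the "IRK for explicit ODEs" loop).

def f(x):
    return -x ** 3

def dfdx(x):
    return -3.0 * x ** 2

def irk_step(xk, dt, tol=1e-12):
    K = f(xk)                                 # initial guess for the stage
    for _ in range(50):
        arg = xk + dt * 0.5 * K               # state at the collocation point
        r = f(arg) - K                        # residual, cf. (8.13)
        if abs(r) < tol:
            break
        drdK = dfdx(arg) * dt * 0.5 - 1.0     # Jacobian of r w.r.t. K
        K -= r / drdK                         # full Newton step
    return xk + dt * K                        # RK step (8.17) with b1 = 1

x, dt = 1.0, 0.1
for _ in range(100):                          # simulate t in [0, 10]
    x = irk_step(x, dt)

print(x, (1.0 + 2.0 * 10.0) ** -0.5)          # exact solution is 1/sqrt(1+2t)
```

Despite having a single stage, this method is of order 2 and markedly more accurate than the implicit Euler method at the same step size.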
The main computational complexity of this procedure lies typically in solving the linear system (8.15), which involves the (possibly dense) Jacobian matrix ∂r (K , xk , u(.))/∂K of size n·s × n·s. Forming the Jacobian matrix can also be fairly expensive, as it requires evaluating the Jacobians of the system dynamics:

∂ f ( xk + ∆t Σ_{j=1}^{s} ai j K j , u(tk + ci ∆t ) ) / ∂K l ∈ R^{n×n}        (8.18)

for i , l = 1, . . . , s.
Concerning the order of IRK methods, the picture here is fairly striking. Indeed, recall that
explicit RK methods achieve an order o = s (number of stages) for s ≤ 4 and then the order
“stalls" and does not increase as fast as s. This was bad news for high-order explicit RK
methods in terms of efficiency (complexity vs. accuracy).
We can now compare the explicit RK methods (ERK) to the implicit RK methods (IRK). Some IRK methods (e.g. the collocation methods based on Gauss-Legendre points, discussed below) achieve an order o = 2s, i.e. they require dramatically fewer stages than ERK methods to achieve the same order. Let us illustrate this statement in Figure 8.2.
Figure 8.2: Comparison of the global error of the explicit RK2 method (7.19) and an implicit RK2 method, as obtained from numerical experiments. One can see the high order o = 2s of the IRK method compared to the order o = s of the ERK method. One can also observe how the error of the IRK method stops decreasing at about 10−10 –10−11 , where it reaches the accuracy of the linear algebra.
Unfortunately, this striking difference does not mean that IRK methods are necessarily better than the ERKs. In order to unpack that statement, let us investigate the efficiency of IRK methods (complexity for a given accuracy), along similar lines as what we did for the ERK methods.

• Let us assume that we want the global error to be limited to some given number Tol, i.e. c ∆t^o ≤ Tol for some constant c > 0, such that the step size ∆t is limited to:

∆t ≤ (Tol/c)^{1/o}        (8.20)
• In order to carry out a simulation on the time interval [0, T ], the number of integrator steps required per simulation time T is:

N = T/∆t ≥ T (Tol/c)^{−1/o}        (8.21)
• The similarity with ERK methods stops here. Indeed, the complexity of IRK methods is typically dominated by solving the linear system (8.15). The complexity of solving this system is typically in the order of the cube of the size of the Jacobian matrix if the matrix is dense11 , i.e. of (n · s)³ , where n is the state size. This system must typically be solved several times, say m times, in order to reach a good accuracy in solving the IRK equations. The complexity per time step is then in the order of:

C = O ( m n³ s³ )        (8.22)
We deduce from this simple reasoning that the computational cost per unit of simulation time is of the order

C · N / T = O ( (Tol/c)^{−1/o} m n³ s³ ) = O ( (Tol/c)^{−1/o} m n³ o³ / 8 )        (8.23)

where the last equality uses s = o/2, i.e. the order achieved by Gauss-Legendre collocation methods.
One can put (8.23) in contrast with the complexity of explicit RK methods (7.38), recalled here:

s N / T ≥ s (Tol/c)^{−1/o}        (8.24)

counting the evaluations of f per unit of simulation time,
but the comparison is arguably a bit difficult, as we are trying to compare evaluations of the
model dynamics f in explicit methods to forming and solving linear systems in implicit
methods. We can, however, compare the explicit and implicit approaches via numerical
experiments. E.g. Fig. 8.3 depicts the computational time vs. integration accuracy for ex-
plicit and implicit RK methods of various orders (for ∆t fixed) for the Van der Pol oscillator
(7.43). The observations we make in this specific example are fairly consistent for differ-
ent models. Implicit methods can be a bit more computationally expensive than explicit
ones, though the difference is typically mild. This picture is dramatically changed for stiff
systems, where implicit integrators are typically required to have a numerically stable inte-
gration scheme.
11. The cube can be lowered if the Jacobian is sparse or structured, which can be the case for IRK methods. Linear algebra packages are very efficient at finding such structures, lowering the complexity of solving the linear system. Hence in numerical experiments one often observes a lower complexity.
Figure 8.3: Illustration of the computational time vs. accuracy for IRK and ERK methods
for the Van der Pol system (7.43) with ∆t = 10−2 . One can observe the different
behaviors of the implicit and explicit methods, and that even though implicit
methods reach very high order for few stages, the computational cost of solving
linear systems at every RK step makes the IRK method typically more expensive
than the ERK ones. Note that these results are only illustrative, and may change
depending on the coding language, computer architecture, system, etc.
We will now show how the stability of RK schemes can be analyzed more systematically,
and indicate why IRK methods are different from ERK methods in this respect. Start by
writing the Butcher array for a general RK method as
c1 | a11 · · · a1s        c1 | a1⊤
.. | ..        ..         .. | ..          c | A
cs | as1 · · · ass   =    cs | as⊤   =     --+---        (8.25)
   | b1  · · · bs            | b⊤            | b⊤
where {a i }, b, and c are column vectors and A is a matrix. With this notation, applying the RK scheme to the test system ẋ = λx results in the equations

K 1 = λ(xk + ∆t · a 1⊤ K )        (8.26a)
...
K s = λ(xk + ∆t · a s⊤ K )        (8.26b)

or, compactly, K = λ(1 xk + ∆t · A K ), where 1 denotes the vector of ones. Solving this linear system for K and inserting the result into the state update xk+1 = xk + ∆t · b ⊤ K yields

xk+1 = ( 1 + µ b ⊤ (I − µA)⁻¹ 1 ) xk

The pole of this first-order difference equation, as a function of µ = λ∆t , is called the stability function and is given by

R(µ) = 1 + µb ⊤ (I − µA)⁻¹ 1.        (8.30)
The stability function characterizes the stability region of the RK scheme via the condition
|R(µ)| ≤ 1,   µ = λ∆t        (8.31)
As mentioned already in Section 8.1, we refer to the scheme as being A-stable, when the
stability region includes the entire left half-plane of the complex plane.
Based on the derived expression for the stability function R(µ) = 1 + µb ⊤ (I − µA)⁻¹ 1, it is in principle possible to analyze the stability properties once the matrix A and the vector b of the Butcher array are defined, but in practice it may still be difficult, of course. However, focussing for a moment on explicit RK schemes, we can be more concrete. Note that, for ERK methods, A is strictly lower-triangular, implying that det(I − µA) = 1 (this also follows from (I − µA)⁻¹ = I + µA + . . . + µ^{s−1} A^{s−1} , since A^s = 0). It follows that R(µ) is a polynomial in µ; for ERK schemes with o = s, it is given by

R(µ) = 1 + µ + . . . + (1/s!) µ^s
For IRK schemes, on the other hand, the following holds:
• The stability function R(µ) is a rational function of µ rather than a polynomial, which is what makes it possible for |R(µ)| ≤ 1 to hold on the entire left half-plane.
• Many (but not all) implicit schemes are A-stable, meaning that they can "survive" (i.e. they are stable for) any fast dynamics. Some of them have an even stronger form of stability (L-stability), but we will not detail this in these notes.
Example 8.2 (Stability of ERK methods). Let us compute the stability function for a few ERK methods, using the formula (8.30). For the explicit Euler method, with Butcher tableau

0 | 0
  | 1        (8.32)

we get R(µ) = 1 + µ. For the RK2 method

0   | 0    0
1/2 | 1/2  0        (8.34)
    | 0    1

we get R(µ) = 1 + µ + µ²/2, and for the RK2 method

0 | 0    0
1 | 1    0        (8.36)
  | 1/2  1/2

we compute

R(µ) = 1 + µ [1/2 1/2] [ 1 0 ; µ 1 ] [ 1 ; 1 ] = 1 + µ + µ²/2.        (8.37)

Notice that the two RK2 methods have the same stability function, as is the case for all o = s = 2 ERK methods.
Example 8.3 (Stability of IRK methods). A couple of examples of IRK methods verify that the stability function is indeed a rational function. For the implicit Euler method, with Butcher tableau

1 | 1
  | 1        (8.38)

we get R(µ) = 1/(1 − µ). For the two-stage IRK method

0 | 0    0
1 | 1/2  1/2        (8.40)
  | 1/2  1/2

we compute

R(µ) = 1 + µ [1/2 1/2] [ 1 0 ; −µ/2 1 − µ/2 ]⁻¹ [ 1 ; 1 ]
     = 1 + µ [1/2 1/2] (1/(1 − µ/2)) [ 1 − µ/2 0 ; µ/2 1 ] [ 1 ; 1 ]
     = 1 + µ/(1 − µ/2) = (1 + µ/2)/(1 − µ/2).        (8.41)

The region in which

| (1 + µ/2)/(1 − µ/2) | < 1,        (8.42)

is exactly the left half-plane (can you verify this?), i.e. the method is A-stable.
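The stability functions of this example can also be evaluated numerically from (8.30). The sketch below does so for the tableaus (8.38) and (8.40), hand-rolling the (at most 2 × 2) linear solves, and spot-checks A-stability at a few points with Re(µ) < 0.

```python
# Evaluate R(mu) = 1 + mu * b^T (I - mu*A)^{-1} * 1 for the two tableaus of
# Example 8.3, hand-rolling the small linear solves (sketch).

def R_implicit_euler(mu):
    # Tableau (8.38): A = [[1]], b = [1]; (I - mu*A)^{-1} 1 = 1/(1 - mu)
    return 1 + mu / (1 - mu)

def R_irk2(mu):
    # Tableau (8.40): A = [[0, 0], [1/2, 1/2]], b = [1/2, 1/2].
    # Solve (I - mu*A) v = [1, 1] by forward substitution:
    v1 = 1.0
    v2 = (1.0 + mu / 2 * v1) / (1 - mu / 2)
    return 1 + mu * (0.5 * v1 + 0.5 * v2)

# Check against the closed forms derived above, plus A-stability samples:
for mu in (-0.5, -3.0 + 1j, -0.01 + 2j):          # sample points, Re(mu) < 0
    assert abs(R_implicit_euler(mu) - 1 / (1 - mu)) < 1e-12
    assert abs(R_irk2(mu) - (1 + mu / 2) / (1 - mu / 2)) < 1e-12
    assert abs(R_implicit_euler(mu)) < 1 and abs(R_irk2(mu)) < 1
print("A-stability spot checks passed")
```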
8.3 COLLOCATION METHODS*
There are different families of implicit RK methods, but in this course we will investigate the family of collocation methods, which have very strong properties and a fairly intuitive interpretation. The key idea behind collocation methods is to approximate the model trajectories via polynomials. This is intrinsically an interpolation problem, classically treated via the Lagrange polynomials

ℓi (τ) = ∏_{j=1, j≠i}^{s} (τ − τ j ) / (τi − τ j )

based on a grid τ1,...,s ∈ [0, 1] (see Figure 8.4). These polynomials have two interesting features
• For a suitable choice of grid, they are orthogonal, i.e.

∫₀¹ ℓi (τ) ℓ j (τ) dτ = 0 if i ≠ j        (8.45)

• They satisfy:

ℓi (τ j ) = 1 if i = j , and ℓi (τ j ) = 0 otherwise        (8.46)

resulting in the following property on the polynomial p defined in (8.48) below:

p (τi , K ) = K i ,   i = 1, . . . , s        (8.47)
The key idea behind the collocation method is to approximate12 the true model trajectories by using13 :

ẋ (tk + τ · ∆t ) = p (τ, K ) = Σ_{i=1}^{s} K i ℓi (τ)        (8.48)
12. Sometimes the method is presented by considering that the trajectories x(t ) themselves are approximated by the polynomials instead of their time derivatives ẋ(t ). This does not change much the mathematics of the collocation schemes.
13. Note that in the following developments we use the label x(.) to denote the trajectories provided by the integrator.
Figure 8.4: Illustration of the Lagrange polynomials for s = 4. The grid τ1,...,4 is displayed via
the vertical dotted lines (note that τ1 = 0). Property (8.46) is readily visible.
Figure 8.5: Illustration of the polynomial p(τ, K ) = Σ_{i=1}^{s} K i ℓi (τ) for s = 4 and for an arbitrary coefficient vector K ∈ R4 , with K i ∈ R. One can observe the property (8.47), i.e. p(τi , K ) = K i .
on the interval [tk , tk+1 ] (corresponding to τ ∈ [0, 1]) by selecting the coefficients K 1,...,s ∈
Rn . Note that each interval [tk , tk+1 ] (for k = 0, . . . , N −1) will have its own, possibly different
coefficients
K = [ K 1 ; . . . ; K s ]        (8.49)
where K i ∈ Rn .
We should now formulate equations that let us determine the parameters K . The key idea here is to enforce the dynamics on the grid points τ1 , . . . , τs , i.e. we want to determine the K 1,...,s such that:

ẋ ( tk + ∆t · τ j ) = p ( τ j , K ) = f ( x ( tk + ∆t · τ j ), u ( tk + ∆t · τ j ) )        (8.50)

holds for j = 1, . . . , s, where the first equality holds by construction from (8.47). In order to further specify the equations above, we need to relate x ( tk + τ j · ∆t ) (i.e. the integral of (8.48)) to the coefficients K . We do this next. We first observe that:
x ( tk + τ · ∆t ) = xk + ∫₀^{τ·∆t} ẋ ( tk + ν ) dν        (8.52)

where xk is the initial state of the interval [tk , tk+1 ]. Making the change of variable ν = ξ · ∆t , such that dν = ∆t · dξ, we can rewrite (8.52) as:

x ( tk + τ · ∆t ) = xk + ∆t ∫₀^τ ẋ ( tk + ∆t · ξ ) dξ        (8.53)
Using the approximation (8.48) in (8.53), we then obtain:

x ( tk + τ · ∆t ) = xk + ∆t ∫₀^τ p ( ξ, K ) dξ        (8.54a)
                = xk + ∆t Σ_{i=1}^{s} K i ∫₀^τ ℓi (ξ) dξ        (8.54b)
                = xk + ∆t Σ_{i=1}^{s} K i L i (τ)        (8.54c)

where we define:

L i (τ) = ∫₀^τ ℓi (ξ) dξ        (8.55)
Note that L i (τ) can be easily computed explicitly as it is simply the integration of the poly-
nomial ℓi . We can then use (8.54c) in (8.50) to get:
K j = f ( xk + ∆t Σ_{i=1}^{s} K i L i (τ j ), u ( tk + ∆t · τ j ) )        (8.56)

where we used ẋ ( tk + ∆t · τ j ) = p ( τ j , K ) = K j ,
which should hold for all grid points j = 1, . . . , s. It follows that on a given interval [tk , tk+1 ]
the collocation equations read as:
K 1 = f ( xk + ∆t Σ_{i=1}^{s} K i L i (τ1 ), u ( tk + ∆t · τ1 ) )        (8.57a)
...
K j = f ( xk + ∆t Σ_{i=1}^{s} K i L i (τ j ), u ( tk + ∆t · τ j ) )        (8.57c)
...
K s = f ( xk + ∆t Σ_{i=1}^{s} K i L i (τs ), u ( tk + ∆t · τs ) )        (8.57e)

xk+1 = xk + ∆t Σ_{i=1}^{s} K i L i (1)        (8.57f)
At this stage, it is very useful to compare (8.57) to the RK equations (7.26) and realise that they are identical if one defines:

a j i = L i (τ j ),   b i = L i (1),   c j = τ j        (8.58)

We also observe that once one picks the grid points τ1 , . . . , τs , the polynomials ℓ1,...,s (τ) are defined, and so are their integrals L i (τ). It follows that the coefficients (8.58) of the Butcher tableau are entirely defined via the grid points τ1 , . . . , τs . We also see here the role of the variables K 1,...,s in RK methods. Indeed, (8.56) shows us that these variables hold the state derivatives ẋ at the grid points τ1 , . . . , τs .
Note that not all IRK methods are collocation methods, as one could choose a dense Butcher tableau with coefficients that do not satisfy (8.58) for any choice of grid points τ1 , . . . , τs . Indeed, one can easily see that the Butcher tableau has s² + 2s degrees of freedom, while the grid point selection offers only s degrees of freedom, such that not all choices of coefficients a, b, c can arise from (8.58).
We now need to briefly discuss how to choose the grid points τ1 , . . . , τs effectively. There are
a few possible choices for a given number of stages s. The most commonly used for treating
ODEs is the Gauss-Legendre method, which chooses the grid points τ1 , . . . , τs as the roots
of the polynomial:
P s (τ) = (1/s!) · dˢ/dτˢ [ (τ² − τ)ˢ ]        (8.59)
i.e. we select the grid points τ1 , . . . , τs such that:
P s (τ j ) = 0, j = 1, . . . , s (8.60)
This selection rule may sound mysterious. Its motivation, though, has solid and deep roots
in the Gauss quadrature theory, but we will not discuss this further here. Equipped with
this rule, we have a procedure to build IRK methods (i.e. their Butcher tableau):
1. Select the number of stages s
2. Compute the grid points τ1 , . . . , τs as the roots of (8.59), i.e. via (8.60)
3. Form the Lagrange polynomials ℓ1,...,s (τ) and their integrals L i (τ), and assemble the Butcher tableau via (8.58)
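Carrying out this procedure for s = 2 takes only a few lines. The sketch below computes the roots of (8.59) with the quadratic formula, integrates the (linear) Lagrange basis in closed form, and assembles the tableau via (8.58), recovering the classical order-4 two-stage Gauss tableau.

```python
# Two-stage Gauss-Legendre collocation: build the Butcher tableau from the
# grid points via (8.58) (numerical sketch).

# Step 2: roots of P2(tau) = 6*tau^2 - 6*tau + 1, cf. (8.59) with s = 2.
d = (36 - 24) ** 0.5
tau = [(6 - d) / 12, (6 + d) / 12]

# Step 3: L_i(tau) = integral of the Lagrange basis ell_i from 0 to tau;
# for s = 2 the basis is linear, so its integral is a simple quadratic.
def L(i, t):
    j = 1 - i
    return (t ** 2 / 2 - tau[j] * t) / (tau[i] - tau[j])

a = [[L(i, tau[j]) for i in range(2)] for j in range(2)]   # a[j][i] = L_i(tau_j)
b = [L(i, 1.0) for i in range(2)]                          # b_i = L_i(1)
c = tau                                                    # c_j = tau_j

print(a, b, c)
# Known order-4 Gauss tableau: a = [[1/4, 1/4 - sqrt(3)/6],
# [1/4 + sqrt(3)/6, 1/4]], b = [1/2, 1/2], c = [1/2 -+ sqrt(3)/6].
```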
An additional benefit of using polynomial interpolation for simulation models is that the
polynomials provide us with an approximation of the state trajectories at any point in time,
i.e. unlike other E/IRK methods, which deliver only the states on the time grid t0,...,N , collocation methods can be prompted to deliver the states in-between, using (8.53), i.e. the true state trajectories (say x^true ) are approximated at any time point in [0, T ] by:

x^true ( tk + ∆t · τ ) ≈ xk + ∆t Σ_{i=1}^{s} K i L i (τ),   τ ∈ [0, 1]        (8.61)
Note that the order of approximation o = 2s unfortunately only holds on the grid t 0,...,N ,
such that this approximation is typically a bit worse than order 2s.
8.4 RK METHODS FOR IMPLICIT ODES
Some models cannot easily be written in the explicit form ẋ = f (x, u), but are naturally delivered in the fully implicit form:

F (ẋ, x, u) = 0        (8.63)
This is e.g. the case for models of complex mechanical systems arising in Lagrange mechanics, taking the form (see (3.40)), where we use v ≡ q̇:
[ I   0      ] [ q̇ ]   [ v                                ]
[ 0   W (q)  ] [ v̇ ] = [ Q + ∇q L − ( ∂(W (q) v )/∂q ) v  ]        (8.64)

where [ q̇ ; v̇ ] ≡ ẋ, and where the symbolic inverse of the matrix W (q) is very complex to write. In that case, it is
best to avoid trying to form the explicit version of (8.64), i.e.
[ q̇ ]   [ I   0      ]⁻¹ [ v                                ]
[ v̇ ] = [ 0   W (q)  ]   [ Q + ∇q L − ( ∂(W (q) v )/∂q ) v  ]        (8.65)
and work with the implicit form (8.64) directly. One trivial, but not necessarily effective
approach would be to use an explicit RK method while forming the matrix inverse numer-
ically in (8.65) at every integrator step.
A usually more effective approach is to use an implicit RK method directly on the implicit
equation (8.64). Note that we can easily write (8.64) as (8.63), using:
F (ẋ, x) = [ I   0      ] ẋ − [ v                                ] = 0        (8.66)
           [ 0   W (q)  ]      [ Q + ∇q L − ( ∂(W (q) v )/∂q ) v  ]
Recall that the variables K 1,...,s in collocation-based RK methods hold the state derivatives ẋ at the grid points τ1 , . . . , τs . The modifications of the IRK equations for treating (8.63)
therefore read as:

F ( K 1 , xk + ∆t Σ_{j=1}^{s} a1j K j , u(tk + c1 ∆t ) ) = 0        (8.68a)
...
F ( K i , xk + ∆t Σ_{j=1}^{s} ai j K j , u(tk + ci ∆t ) ) = 0        (8.68b)
...
F ( K s , xk + ∆t Σ_{j=1}^{s} a s j K j , u(tk + c s ∆t ) ) = 0        (8.68c)

xk+1 = xk + ∆t Σ_{j=1}^{s} b j K j        (8.68d)
The rest works as IRK schemes for explicit ODEs (see algorithm “IRK for explicit ODEs").
Here it is useful to reflect on the IRK method in comparison to using an ERK method for solving the implicit ODE (8.63). Indeed, since the ODE does not readily deliver the state derivative ẋ, even when using an explicit method one would have to solve the implicit equation for ẋ at every stage of the RK step, typically via a Newton method. We can then observe that
• Deploying an implicit RK method requires solving one implicit equation of size n · s, in order to get an order o = 2s (for collocation methods).
• Deploying an explicit RK method requires solving s implicit equations of size n (one per stage, in order to retrieve the corresponding ẋ), in order to get an order o ≤ s.
Comparing formally the efficiency of the two approaches can be tricky, but the bottom
line here is that if the ODE model is implicit, using an implicit RK method to perform the
simulation can be a very good choice, independently of questions of stiffness.
8.5 RK METHODS FOR IMPLICIT DAES
In order to close this chapter, it remains to be discussed how to treat DAEs numerically. We in fact already have all the tools required to tackle this problem; we only need to clarify a few points carefully. Let us consider DAEs in the fully implicit form (6.22), recalled here:
F (ẋ, z , x, u) = 0 (8.70)
DAEs can be treated very similarly to implicit ODEs, but we need to understand how the
algebraic variables z are treated here.
At every time step, the algebraic variables z ought to be considered as “free" variables that
need to be determined independently of the other time steps, and they need to be adjusted
so that (8.70) holds. More specifically, while an implicit ODE ought to be treated via impos-
ing
F ( K i , xk + ∆t Σ_{j=1}^{s} ai j K j , u(tk + ci ∆t ) ) = 0        (8.71)
for i = 1, . . . , s at each RK step xk → xk+1 . Correspondingly, for a DAE of the form (8.70), we ought to impose

F ( K i , z i , xk + ∆t Σ_{j=1}^{s} ai j K j , u(tk + ci ∆t ) ) = 0        (8.72)
for i = 1, . . . , s at each RK step xk → xk+1 . Note that here the variables z i ∈ Rm operate as "unknowns" that must be determined alongside the variables K i ∈ Rn . The complete IRK
equations for DAEs then read as:
r (w , xk , u(.)) := [ F ( K 1 , z 1 , xk + ∆t Σ_{j=1}^{s} a1j K j , u(tk + c1 ∆t ) )
                       ...
                       F ( K i , z i , xk + ∆t Σ_{j=1}^{s} ai j K j , u(tk + ci ∆t ) )
                       ...
                       F ( K s , z s , xk + ∆t Σ_{j=1}^{s} a s j K j , u(tk + c s ∆t ) ) ] = 0        (8.73)

where w stacks the variables K 1,...,s and z 1,...,s .
Note that if x ∈ Rn and z ∈ Rm , then the function F returns vectors of dimension n + m. It follows that w ∈ R^{s(n+m)} . Moreover, the "residual" function r in (8.73) returns a vector of dimension s(n + m), and solving (8.73) provides the variables w . The set of equations (8.73) must be solved at each RK step xk → xk+1 in order to build the complete simulation on the time interval [0, T ], which is done as previously using the Newton method.
For the sake of clarity, let us write down explicitly the pseudo-code required to perform the
numerical simulation of a fully implicit DAE model.
IRK for implicit DAEs
Algorithm:
Input: Initial conditions x0 , input profile u(.), Butcher tableau, step size ∆t
for k = 0, . . ., N − 1 do
Guess for w (one can e.g. use K i = xk , z i = 0)
while kr (w , xk , u(.))k > Tol do
Compute the solution ∆w to the linear system:
    ∂r (w , xk , u(.))/∂w · ∆w + r (w , xk , u(.)) = 0        (8.75)
with r given by (8.73). Update:
w ← w + α∆w (8.76)
for some step size α ∈]0, 1] (a full step α = 1 generally works for implicit
integrators)
Take RK step:
    xk+1 = xk + ∆t Σ_{i=1}^{s} bi K i        (8.77)
return x1,...,N
Note that the remarks at the end of section 8.4 also hold here, i.e. in order to deploy an
explicit RK method on the DAE (8.70), one would have to solve it for ẋ and z at every stage
of the RK steps, typically requiring the deployment of a Newton method.
Adapting the approach presented above to the semi-explicit DAE case is fairly straightfor-
ward. Let us do it here nonetheless. Recall that semi-explicit DAEs are in the form:
ẋ = f (z , x, u) (8.78a)
0 = g (z , x, u) (8.78b)
Here we ought to impose at each i = 1, . . . , s:

K i = f ( z i , xk + ∆t Σ_{j=1}^{s} ai j K j , u(tk + ci ∆t ) )        (8.79a)

0 = g ( z i , xk + ∆t Σ_{j=1}^{s} ai j K j , u(tk + ci ∆t ) )        (8.79b)

and then apply the same procedure as described above to perform the simulation.
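As an illustration, the sketch below deploys a one-stage IRK method (implicit Euler, a11 = b1 = c1 = 1) on the small index-1 semi-explicit DAE ẋ = −z, 0 = z − x² (a model of our choosing, with ∂g/∂z = 1 trivially full rank), solving the 2 × 2 Newton system by hand.

```python
# Implicit Euler (one-stage IRK, a11 = b1 = c1 = 1) on the index-1
# semi-explicit DAE  xdot = -z,  0 = z - x**2  (illustrative model, ours).
# Per RK step, Newton solves r(K, z) = 0 with Cramer's rule on the 2x2 system.

def dae_step(x, dt, tol=1e-12):
    K, z = 0.0, x ** 2                       # initial guess for the unknowns
    for _ in range(50):
        y = x + dt * K                       # stage state x_{k+1}
        r1 = -z - K                          # f(z, y) - K
        r2 = z - y ** 2                      # g(z, y)
        if abs(r1) + abs(r2) < tol:
            break
        # Jacobian [[dr1/dK, dr1/dz], [dr2/dK, dr2/dz]]:
        j11, j12 = -1.0, -1.0
        j21, j22 = -2.0 * dt * y, 1.0
        det = j11 * j22 - j12 * j21
        dK = (-r1 * j22 + r2 * j12) / det    # solve J [dK; dz] = -[r1; r2]
        dz = (-j11 * r2 + j21 * r1) / det
        K, z = K + dK, z + dz
    return x + dt * K                        # x_{k+1} = x_k + dt * b1 * K

x, dt = 1.0, 0.01
for _ in range(100):                         # simulate t in [0, 1]
    x = dae_step(x, dt)

print(x)   # the DAE reduces to xdot = -x**2, whose solution is 1/(1+t)
```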
Let us now make a connection between the problem of DAE differential index and numerical simulations. Recall that a semi-explicit DAE (8.78) is of index 1 (i.e. an "easy" DAE) if the Jacobian ∂g /∂z is full rank (i.e. invertible) on the system trajectories. In order to make a
simple point, let us consider an IRK method having a single stage s = 1 to treat the DAE. In
that simple case, we observe that the RK equations (8.80) boil down to:

r (w , xk , u(.)) = [ f ( z 1 , xk + ∆t a11 K 1 , u(tk + c1 ∆t ) ) − K 1
                     g ( z 1 , xk + ∆t a11 K 1 , u(tk + c1 ∆t ) ) ] = 0        (8.81)
Suppose then that we let ∆t → 0, i.e. we investigate the behavior of the method as the accuracy becomes arbitrarily high. Then the Jacobian matrix tends to:
lim_{∆t→0} ∂r (w , xk , u(.))/∂w = [ −I   ∂ f /∂z
                                      0   ∂g /∂z ]        (8.83)
which is full rank (invertible) if and only if ∂g (x, u)/∂z is full rank. Recall that we need to deploy a Newton method in order to solve r (w , xk , u(.)) = 0, which requires the Jacobian ∂r (w , xk , u(.))/∂w to be invertible, and therefore requires ∂g (x, u)/∂z to be full rank. We can therefore conclude from this simple example that the IRK schemes presented in these notes ought to be deployed only on DAEs of differential index 1.
Note that if ∆t > 0, the Jacobian matrix may nonetheless be full rank, and one may conclude that the method is fine. However, the small analysis above shows
us that “something is intrinsically wrong" when deploying the IRK schemes we have stud-
ied on high-index DAEs (index above 1). We will leave out of this course further analysis
of this question. Before closing this chapter, it ought to be underlined here that one can
deploy some of the RK schemes we have investigated on high-index DAEs, and get a tra-
jectory computed from the method. Hence, there may not be a clear sign when deploying
the method that “something is not right". The trajectories obtained, however, are typically
nonsensical. Hence one ought to check the index of the DAE before trusting the output of
a classic RK method.
9 SENSITIVITY OF SIMULATIONS
Our last chapter will deal with a topic that is crucial in many fields of engineering that
make use of numerical simulations. In many problems it is important not only to compute
accurate and reliable simulations for an ODE or a DAE, but also to get some information
on how this simulation would be affected by a change in the model parameters. To be a bit
more formal, let us consider an ODE depending on a fixed parameter p:
ẋ = f ( x, p ),   x(0) = x0        (9.1)
Note that we will omit the inputs in our developments here, and discuss in the end what
role they play. Suppose that we have computed a simulation of (9.1), i.e. we have a se-
quence:
xk ≈ x(tk ) (9.2)
on a given time grid t0 , . . . , t N . Clearly, if one was to modify "a bit" either the initial conditions x0 and/or the parameters p, then the sequence x1,...,N would be affected. For nonlinear dynamics f , one cannot assess the impact of changing the parameters and/or initial conditions x0 on the simulation without redoing the entire simulation. However, one can
use first-order arguments and compute the approximate effect of changing the parameters and initial conditions, i.e. one can try to compute:

∂xk /∂x0 ,   ∂xk /∂p        (9.3)

The problem of computing the underlying continuous-time sensitivities

∂x(t )/∂x0 ,   ∂x(t )/∂p        (9.4)

can appear tricky, but let us consider the following construction. Suppose that we know the trajectory x(t ) associated to (9.1). We observe that:

d/dt ( ∂x(t )/∂x0 ) = ( ∂ f /∂x ) · ( ∂x(t )/∂x0 )        (9.5a)
d/dt ( ∂x(t )/∂p ) = ( ∂ f /∂x ) · ( ∂x(t )/∂p ) + ∂ f /∂p        (9.5b)

where all Jacobians are evaluated at x(t ) and p. We moreover observe that
∂x(0) ∂x(0)
= I, =0 (9.6)
∂x0 ∂p
Let us label:
A(t) = ∂x(t)/∂x0 ,   B(t) = ∂x(t)/∂p   (9.7)
We can then write the dynamic system for A and B using (9.5)-(9.6):
Ȧ(t) = (∂f/∂x) A(t) ,   A(0) = I   (9.8a)
Ḃ(t) = (∂f/∂x) B(t) + ∂f/∂p ,   B(0) = 0   (9.8b)
We then observe that (9.8) is a linear time-varying system with a matrix-valued state (A
and B). It is often referred to as the variational equations associated to (9.1). In principle,
one could answer the sensitivity question raised above by performing the integration of the
dynamics (9.1) together with (9.8) using any numerical integration method.
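As an illustration, the following sketch integrates a scalar ODE together with its variational equations (9.8) using an explicit Euler scheme. The dynamics ẋ = −p x and the function names are hypothetical choices made for this example; they admit a closed-form solution against which the sensitivities can be checked.

```python
def integrate_with_variational(f, dfdx, dfdp, x0, p, dt, N):
    # Explicit Euler on the ODE (9.1) together with its variational
    # equations (9.8), here for a scalar state: A(t) = dx/dx0, B(t) = dx/dp.
    x, A, B = x0, 1.0, 0.0          # A(0) = I, B(0) = 0, cf. (9.6)
    for _ in range(N):
        # All right-hand sides are evaluated at the current x(t), as in (9.5)
        x, A, B = (x + dt * f(x, p),
                   A + dt * dfdx(x, p) * A,
                   B + dt * (dfdx(x, p) * B + dfdp(x, p)))
    return x, A, B

# Hypothetical example: x' = -p*x, with exact solution x0*exp(-p*t), so that
# dx/dx0 = exp(-p*t) and dx/dp = -t*x0*exp(-p*t) at time t = N*dt
f    = lambda x, p: -p * x
dfdx = lambda x, p: -p
dfdp = lambda x, p: -x
x, A, B = integrate_with_variational(f, dfdx, dfdp, x0=1.0, p=2.0, dt=1e-3, N=1000)
```

For this linear example the three computed quantities are all close to ±e⁻², matching the closed-form sensitivities up to the Euler discretization error.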
Such an approach, however, would not deliver exact derivatives. Indeed, the variational
equations express the sensitivity of the true trajectories of the system, whereas the sequence
x1,...,N arises from numerical simulation and only approximates those trajectories. Moreover,
a numerical integration of (9.8) is also only an approximation of the trajectories A(t), B(t).
As a result, the sensitivities generated can be quite inaccurate.
For these reasons, the variational approach is not very often used to treat the sensitivity
question, and is rather replaced by so-called Algorithmic Differentiation methods, which seek
to compute exact derivatives of the inexact simulation of the dynamics. Let us briefly study
these techniques next.
Consider the explicit Euler scheme applied to (9.1):

xk+1 = xk + ∆t f(xk, p)   (9.9)

Computing (9.3) for the simulation (9.9) boils down to a careful application of the chain
rule on (9.9). We observe that:
∂xk+1/∂x0 = ∂xk/∂x0 + ∆t (∂f/∂xk)(∂xk/∂x0) ,   ∂x0/∂x0 = I   (9.10a)
∂xk+1/∂p = ∂xk/∂p + ∆t ( (∂f/∂xk)(∂xk/∂p) + ∂f/∂p ) ,   ∂x0/∂p = 0   (9.10b)
where all Jacobians are evaluated on the baseline simulation x1,...,N and for the parameter
value p. We observe that (9.10) is in fact a discrete linear dynamic system, assigning
dynamics to the matrices (9.3). Indeed, let us label
Ak = ∂xk/∂x0 ,   Bk = ∂xk/∂p   (9.11)
In contrast to the variational approach, the sensitivities delivered by processing the discrete
dynamics (9.10) provide the exact (up to machine precision) derivatives of the sequence
x1,...,N delivered by the explicit Euler scheme, even though that sequence is itself only an
approximation of the true trajectories.
An attentive reader, however, may have observed that if one treats both (9.1) and (9.8) using
an explicit Euler scheme, then one recovers the algorithm above. Hence, in this specific case,
the variational equations and Algorithmic Differentiation coincide. This observation does not
hold in general.
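The recursion (9.10)-(9.11) can be sketched as follows for a scalar state; the logistic dynamics f(x, p) = p x (1 − x) and all names are hypothetical choices for this example. The sensitivities produced agree with central finite differences of the Euler map itself to high accuracy, illustrating that they differentiate the numerical sequence rather than the true trajectories.

```python
def euler_sim(x0, p, dt, N):
    # Plain explicit Euler simulation (9.9) of x' = f(x, p), f = p*x*(1-x)
    x = x0
    for _ in range(N):
        x = x + dt * p * x * (1.0 - x)
    return x

def euler_sim_with_sens(x0, p, dt, N):
    # Recursion (9.10)-(9.11): A = dx_k/dx0, B = dx_k/dp, propagated along
    # the simulation, with Jacobians evaluated on the baseline iterates.
    x, A, B = x0, 1.0, 0.0
    for _ in range(N):
        dfdx = p * (1.0 - 2.0 * x)   # df/dx at the current iterate
        dfdp = x * (1.0 - x)         # df/dp at the current iterate
        x, A, B = (x + dt * p * x * (1.0 - x),
                   (1.0 + dt * dfdx) * A,
                   (1.0 + dt * dfdx) * B + dt * dfdp)
    return x, A, B

x, A, B = euler_sim_with_sens(0.1, 2.0, 0.01, 500)

# Central finite differences of the *numerical* Euler map for comparison
h = 1e-6
A_fd = (euler_sim(0.1 + h, 2.0, 0.01, 500) - euler_sim(0.1 - h, 2.0, 0.01, 500)) / (2 * h)
B_fd = (euler_sim(0.1, 2.0 + h, 0.01, 500) - euler_sim(0.1, 2.0 - h, 0.01, 500)) / (2 * h)
```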
9.3 ALGORITHMIC DIFFERENTIATION OF EXPLICIT RUNGE-KUTTA METHODS
The explicit RK equations for (9.1) arising from a lower-diagonal Butcher tableau can be
written as:
K1 = f(xk, p)   (9.14a)
⋮
Ki = f( xk + ∆t Σ_{j=1}^{i−1} aij Kj , p )   (9.14b)
⋮
Ks = f( xk + ∆t Σ_{j=1}^{s−1} asj Kj , p )   (9.14c)

xk+1 = xk + ∆t Σ_{j=1}^{s} bj Kj   (9.14d)
(observe that the summations in the formation of K1,...,s run only over previously computed
stages, and hence entail by construction an explicit method). Although it is a bit tedious,
one can apply the chain rule to (9.14) in order to extract the sensitivities. Indeed, one can
observe that:

∂Ki/∂xk = (∂f/∂x) ( I + ∆t Σ_{j=1}^{i−1} aij ∂Kj/∂xk ) ,   ∂Ki/∂p = (∂f/∂x) ∆t Σ_{j=1}^{i−1} aij ∂Kj/∂p + ∂f/∂p   (9.15)

where the Jacobians ∂f/∂x and ∂f/∂p are evaluated at the arguments of f in (9.14). The
sensitivities of xk+1 then follow by differentiating (9.14d), leading to the algorithm below.
Explicit RK scheme with sensitivity generation
Algorithm:

xk+1 = xk + ∆t Σ_{j=1}^{s} bj Kj   (9.18a)
Ak+1 = ( I + ∆t Σ_{j=1}^{s} bj ∂Kj/∂xk ) Ak   (9.18b)
Bk+1 = ( I + ∆t Σ_{j=1}^{s} bj ∂Kj/∂xk ) Bk + ∆t Σ_{j=1}^{s} bj ∂Kj/∂p   (9.18c)
Note that these computations are tedious to derive and to program. Fortunately, very
efficient Algorithmic Differentiation tools such as CasADi can perform these operations
automatically.
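A minimal sketch of (9.14)-(9.18) for a scalar state, using the classic RK4 tableau; since the state is scalar, the matrices I, Ak, Bk reduce to scalars. The dynamics ẋ = −p sin x and all names are hypothetical choices, and the result is checked against central finite differences of the RK4 map.

```python
import numpy as np

# Classic RK4 tableau; the strictly lower-triangular a_ij make the scheme explicit
a_tab = np.array([[0.0, 0, 0, 0], [0.5, 0, 0, 0], [0, 0.5, 0, 0], [0, 0, 1.0, 0]])
b_tab = np.array([1/6, 1/3, 1/3, 1/6])

f    = lambda x, p: -p * np.sin(x)     # hypothetical toy dynamics
dfdx = lambda x, p: -p * np.cos(x)
dfdp = lambda x, p: -np.sin(x)

def rk_step_sens(x, A, B, p, dt):
    # One explicit RK step (9.14) with the stage sensitivities dK_i/dx_k and
    # dK_i/dp propagated by the chain rule (9.15), combined as in (9.18).
    s = len(b_tab)
    K, Kx, Kp = np.zeros(s), np.zeros(s), np.zeros(s)
    for i in range(s):
        xi  = x + dt * (a_tab[i, :i] @ K[:i])       # stage argument of f
        dxi = 1.0 + dt * (a_tab[i, :i] @ Kx[:i])    # its derivative w.r.t. x_k
        dpi = dt * (a_tab[i, :i] @ Kp[:i])          # its derivative w.r.t. p
        K[i]  = f(xi, p)
        Kx[i] = dfdx(xi, p) * dxi
        Kp[i] = dfdx(xi, p) * dpi + dfdp(xi, p)
    x1 = x + dt * (b_tab @ K)                                # (9.18a)
    A1 = (1.0 + dt * (b_tab @ Kx)) * A                       # (9.18b)
    B1 = (1.0 + dt * (b_tab @ Kx)) * B + dt * (b_tab @ Kp)   # (9.18c)
    return x1, A1, B1

def simulate(x0, p, dt, N):
    x, A, B = x0, 1.0, 0.0
    for _ in range(N):
        x, A, B = rk_step_sens(x, A, B, p, dt)
    return x, A, B

x, A, B = simulate(1.0, 0.5, 0.05, 100)
h = 1e-6
A_fd = (simulate(1.0 + h, 0.5, 0.05, 100)[0] - simulate(1.0 - h, 0.5, 0.05, 100)[0]) / (2 * h)
B_fd = (simulate(1.0, 0.5 + h, 0.05, 100)[0] - simulate(1.0, 0.5 - h, 0.05, 100)[0]) / (2 * h)
```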
9.4 ALGORITHMIC DIFFERENTIATION OF IMPLICIT RUNGE-KUTTA METHODS
The IRK equations for (9.1) read as:
K1 = f( xk + ∆t Σ_{j=1}^{s} a1j Kj , p )   (9.19a)
⋮
Ki = f( xk + ∆t Σ_{j=1}^{s} aij Kj , p )   (9.19b)
⋮
Ks = f( xk + ∆t Σ_{j=1}^{s} asj Kj , p )   (9.19c)

xk+1 = xk + ∆t Σ_{j=1}^{s} bj Kj   (9.19d)
One ought to observe that the matrix inverse (or rather the matrix factorization, for the
linear-algebra savvy among the readers) needed in (9.22) can be reused in (9.24), such that
the latter is inexpensive to compute. We then observe that (9.15) can be readily used to
compute the sensitivities of xk+1, as in the explicit RK method. We can summarize these
observations in the following pseudo-code.
IRK for explicit ODEs with sensitivities
Algorithm:
Input: Initial conditions x0, input profile u(.), Butcher tableau, step size ∆t
Set A0 = I, B0 = 0
for k = 0, . . . , N − 1 do
Guess for K (one can e.g. use Ki = xk)
while ∥r(K, xk, u(.))∥ > Tol do
Compute the solution ∆K to the linear system:

(∂r(K, xk, u(.))/∂K) ∆K + r(K, xk, u(.)) = 0   (9.25)

with r given by (8.13). Update:

K ← K + α∆K   (9.26)

for some step size α ∈ ]0, 1] (a full step α = 1 generally works for implicit
integrators)
Reuse the matrix factorization required to solve (9.25) to compute:

∂K/∂p = −(∂r(K, xk, p)/∂K)⁻¹ (∂r(K, xk, p)/∂p)   (9.27a)
∂K/∂xk = −(∂r(K, xk, p)/∂K)⁻¹ (∂r(K, xk, p)/∂xk)   (9.27b)

Form xk+1 via (9.19d), and use (9.27) to form the sensitivities Ak+1 and Bk+1
as in the explicit RK method
return
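The pseudo-code above can be sketched for a 1-stage IRK scheme (the implicit midpoint rule) on a scalar ODE: the residual is r(K, x, p) = K − f(x + ∆t/2 K, p), solved by a full-step Newton iteration as in (9.25)-(9.26), and the sensitivities of the stage follow from the implicit function theorem as in (9.27). In the scalar case the "factorization" of ∂r/∂K reduces to a division, which is reused for both sensitivities. The dynamics ẋ = −p x³ and all names are hypothetical choices for this example.

```python
f    = lambda x, p: -p * x**3          # hypothetical toy dynamics
dfdx = lambda x, p: -3.0 * p * x**2
dfdp = lambda x, p: -x**3

def midpoint_step_sens(x, A, B, p, dt, tol=1e-14):
    # One step of the implicit midpoint rule, r(K, x, p) = K - f(x + dt/2*K, p)
    K = f(x, p)                                     # initial guess for the stage
    for _ in range(50):                             # Newton iteration (9.25)-(9.26)
        r = K - f(x + 0.5 * dt * K, p)
        if abs(r) < tol:
            break
        drdK = 1.0 - 0.5 * dt * dfdx(x + 0.5 * dt * K, p)
        K -= r / drdK                               # full step, alpha = 1
    xi = x + 0.5 * dt * K
    drdK = 1.0 - 0.5 * dt * dfdx(xi, p)             # reused for both sensitivities
    dKdx = dfdx(xi, p) / drdK                       # (9.27b): -(dr/dK)^-1 dr/dx
    dKdp = dfdp(xi, p) / drdK                       # (9.27a): -(dr/dK)^-1 dr/dp
    x1 = x + dt * K                                 # (9.19d), s = 1, b_1 = 1
    A1 = (1.0 + dt * dKdx) * A
    B1 = (1.0 + dt * dKdx) * B + dt * dKdp
    return x1, A1, B1

def simulate(x0, p, dt, N):
    x, A, B = x0, 1.0, 0.0
    for _ in range(N):
        x, A, B = midpoint_step_sens(x, A, B, p, dt)
    return x, A, B

x, A, B = simulate(1.0, 1.0, 0.1, 50)
h = 1e-6
A_fd = (simulate(1.0 + h, 1.0, 0.1, 50)[0] - simulate(1.0 - h, 1.0, 0.1, 50)[0]) / (2 * h)
B_fd = (simulate(1.0, 1.0 + h, 0.1, 50)[0] - simulate(1.0, 1.0 - h, 0.1, 50)[0]) / (2 * h)
```

Note that the signs in (9.27) cancel against the signs of ∂r/∂x and ∂r/∂p here, since r carries −f.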
9.5 SENSITIVITY WITH RESPECT TO INPUTS
Let us finish these lecture notes by discussing the computation of the sensitivity of a
simulation with respect to the input u applied to the dynamics ẋ = f(x, u). Since u(.) is, in
general, a profile (i.e. a function of time on the interval [0, T]) as opposed to a finite set
of parameters p, discussing sensitivities in the form explored here is a priori inadequate.
However, the classic approach in numerical simulations is to consider that the input profile
u(.) is in fact parameterized by a finite set of parameters, i.e. we consider that the input is
given by:
u(t , p) (9.29)
Any derivative in the algorithms above is then simply replaced by its chain-rule expansion:
∂(·)/∂p = (∂(·)/∂u)(∂u/∂p)   (9.30)

e.g.

∂f/∂p = (∂f/∂u)(∂u/∂p)   (9.31)
The input parametrization can take many forms. It can e.g. be convenient to use an
approach similar to the one used in collocation-based IRK schemes. In that context, the input
profile is e.g. given by:
u(tk + τ∆t, p) = Σ_{i=1}^{s} pk,i ℓi(τ)   (9.32)
where we have:
p = [ p0,1 , . . . , p0,s , . . . , pN−1,1 , . . . , pN−1,s ]⊤   (9.33)
Hence the input profile is piecewise smooth, and interpolates the points pk,i on the time
intervals [tk, tk+1] for k = 0, . . . , N − 1. This principle is illustrated in Figure 9.1. A
widespread alternative input parametrization is the piecewise-constant approach, which uses:

u(t, p) = pk   for   t ∈ [tk, tk+1)

This principle is illustrated in Figure 9.2. All the sensitivity principles we studied above can
be readily applied.
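As a sketch of the piecewise-constant case, consider the linear dynamics ẋ = −x + u discretized by explicit Euler with one integration step per control interval; the chain rule (9.30) makes ∂u/∂pk equal to one on the k-th interval and zero elsewhere. The dynamics and all names are hypothetical choices for this example.

```python
import numpy as np

def sim(x0, p, dt):
    # Explicit Euler for x' = -x + u, with u(t) = p[k] on [t_k, t_{k+1})
    x = x0
    for k in range(len(p)):
        x = x + dt * (-x + p[k])
    return x

def sim_with_input_sens(x0, p, dt):
    # B[k] = d x_N / d p[k]; the chain rule (9.30)-(9.31) turns df/dp_k into
    # (df/du)(du/dp_k), with du/dp_k = 1 on interval k and 0 elsewhere.
    x = x0
    B = np.zeros(len(p))
    for k in range(len(p)):
        B = (1.0 + dt * (-1.0)) * B      # df/dx = -1 propagates all sensitivities
        B[k] += dt * 1.0                 # df/du = 1, active only on interval k
        x = x + dt * (-x + p[k])
    return x, B

p = np.array([1.0, 0.5, -0.2, 0.8])
x, B = sim_with_input_sens(0.0, p, dt=0.25)

# Finite-difference check on the third control parameter
h = 1e-6
e2 = np.zeros(4); e2[2] = h
B2_fd = (sim(0.0, p + e2, 0.25) - sim(0.0, p - e2, 0.25)) / (2 * h)
```

Since the dynamics are linear, the propagated sensitivity d x_N/d p[2] equals ∆t (1 − ∆t)^{N−1−k} = 0.25 · 0.75 exactly.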
Figure 9.1: Illustration of the collocation-based approach to the input parametrization for
N = 4.
Figure 9.2: Illustration of the piecewise-constant approach to the input parametrization for
N = 4.
The implicit Runge-Kutta (IRK) methods differ from explicit Runge-Kutta (ERK) methods in terms of computational complexity and stability when dealing with differential algebraic equations (DAEs). Implicit methods generally involve solving a system of equations at each step, which makes them computationally more expensive than explicit methods. However, this computational cost is justified by their superior stability properties; implicit methods are often A-stable, meaning they can handle stiff problems and maintain stability over a wider range of step sizes. Explicit methods, on the other hand, are not A-stable and can become unstable with larger step sizes, especially when dealing with the stiff dynamics common in DAEs. The stability advantage of IRK methods makes them suitable for high-index DAEs, where stability and robust handling of fast dynamics are crucial.
DAEs can be transformed into ODEs through suitable manipulation of their constraints: differentiating the algebraic equations with respect to time until the algebraic variables appear as explicit time derivatives allows conversion to an ODE. The differential index is fundamental in this process, as it indicates the number of differentiations required to express the system without algebraic equations. A DAE with a high differential index may require several differentiations to reach an ODE form, indicating computational complexity and potential numerical stability challenges. Solving DAEs often involves reducing the differential index, thus simplifying the integration.
The Maximum Likelihood (ML) method provides several advantages over the least-squares method for parameter estimation in system identification. First, ML is built on a probabilistic framework that explicitly accounts for the noise model, allowing a more complete analysis under random variability; this makes the ML method applicable to various types of probability distributions. Second, while the basic least-squares approach assumes independent, identically distributed noise, and yields unbiased estimates under those conditions, ML extends this to cases with correlated noise through the use of covariance matrices, which adjust the weight of each data point based on its variance. Last, least-squares is essentially a special case of ML under Gaussian noise assumptions, whereas ML can be set up for different noise characteristics and distributions, providing a more robust estimation mechanism when the noise deviates from the ideal assumptions. ML's versatility in handling noise thus accounts for its advantages over least-squares.
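The relation between ML and least squares under Gaussian noise can be sketched as follows: with known, sample-dependent noise variances, maximizing the likelihood of a linear model is equivalent to weighted least squares with weights 1/σᵢ². The data-generation setup, the model y = θx, and all names are hypothetical choices for this example.

```python
import numpy as np

# Hypothetical data: y = theta*x + e, with known sample-dependent noise levels
rng = np.random.default_rng(0)
N = 2000
x = rng.uniform(1.0, 2.0, N)
sigma = rng.uniform(0.1, 2.0, N)              # known std. deviation of each sample
theta_true = 3.0
y = theta_true * x + sigma * rng.standard_normal(N)

# Ordinary least squares: minimizes sum_i (y_i - theta*x_i)^2
theta_ls = (x @ y) / (x @ x)

# Maximum Likelihood under Gaussian noise with known variances: equivalent to
# *weighted* least squares with weights 1/sigma_i^2 (the covariance information)
w = 1.0 / sigma**2
theta_ml = ((w * x) @ y) / ((w * x) @ x)
```

Both estimates are unbiased, but the ML/weighted estimate down-weights the noisy samples and therefore has a markedly lower variance.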
One might prefer a model structure based on physical laws when there is a clear understanding of the system's mechanics and when precise predictions under varied conditions are crucial. This approach can lead to more accurate models when the system behaviour is aligned with known physical principles, aiding in extrapolation. Conversely, general-purpose models are preferred when the system is too complex or poorly understood, allowing flexibility in capturing the dynamics without detailed physical insight. They are useful when data-driven insights can adequately model the relationships despite lacking a fully detailed physical understanding, such as in exploratory analysis or when rapid prototyping is required.
Implicit methods maintain stability in stiff systems while allowing much larger time steps, as they inherently manage rapid changes in the system dynamics more effectively than explicit methods. Implicit methods, such as the implicit Euler or implicit Runge-Kutta schemes, solve equations involving both the current and the future state variables at each step, which inherently stabilizes the integration in the presence of stiffness; explicit methods struggle here because they require very small step sizes just to remain stable. This quality makes implicit methods particularly well suited for stiff DAEs where stability is critical.
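This stability difference can be sketched on the classic stiff test equation ẋ = −λx. With λ∆t = 5, each explicit Euler step multiplies the state by (1 − λ∆t) = −4 and the iteration diverges, while each implicit Euler step multiplies it by 1/(1 + λ∆t) = 1/6 and the iteration decays, as the exact solution does. The numerical values are a hypothetical example.

```python
# Stiff test equation x' = -lam * x, with lam*dt = 5
lam, dt, N = 100.0, 0.05, 50
x_exp = 1.0
x_imp = 1.0
for _ in range(N):
    x_exp = x_exp + dt * (-lam * x_exp)   # explicit Euler: x <- (1 - lam*dt) * x
    x_imp = x_imp / (1.0 + lam * dt)      # implicit Euler: solve x1 = x + dt*(-lam*x1)
```

After 50 steps the explicit iterate has blown up to roughly 4⁵⁰ ≈ 10³⁰, while the implicit iterate has decayed towards zero.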
Parametric models in system identification provide a structured approach to model the relationship between inputs and outputs using a parameterized function. The choice of model, whether based on linear or nonlinear dynamics, affects the accuracy and reliability of the estimation process. Selecting an incorrect model structure can lead to overfitting or underfitting, affecting the model's ability to generalize from data. The model's complexity and the number of parameters also influence the computational effort and precision of parameter estimation. Properly chosen parametric models help achieve a balance between simplicity and fidelity to the real system.
Lagrange multipliers (denoted \( z \)) in constrained Lagrange equations enforce the constraints \( c(q) = 0 \) in the system's dynamics by introducing additional terms \( \nabla_q c \, z \) into the Lagrange equations. They serve as a mechanism to ensure that the constraints are satisfied during the motion, effectively acting like additional forces that keep the system on the constraint manifold \( c(q) = 0 \). The dynamics of a constrained system are influenced by these multipliers as they alter the resultant forces acting on the system, contributing terms to the equations of motion that are akin to restoring forces preventing deviation from the constraints. Consequently, these multipliers have to be computed at each time step to correctly simulate the constrained dynamics, and the accelerations \( \ddot{q} \) are expressed as functions not only of the positions \( q \), velocities \( \dot{q} \), and external forces \( Q \), but also of these Lagrange multipliers.
Model validation is critical in the system identification process because it ensures that a model meets its intended purpose and requirements, which may vary depending on whether the model is used for insight, parameter determination, or control design. It helps to determine how well a model reflects the real system and can highlight the need for modifications if discrepancies are found; it is thus integral to the iterative nature of the modelling process. Effective model validation involves simulating the model under relevant conditions and comparing the results with actual system data, as well as conducting various tests such as predicted-versus-measured output comparisons and statistical tests like cross-correlation and autocorrelation of the prediction errors. Additionally, cross-validation on fresh data can be useful to ensure the model is not merely capturing noise but accurately represents the system dynamics.
The Lagrange function for the hanging chain incorporates gravitational and spring energies by treating the potential energy as the sum of the gravitational energy and the energy stored in the springs. The gravitational term is expressed as \(V_{gravity} = mg \sum_{k=1}^{N} \left[0, 0, 1 \right] p_k\), where \(g\) is the gravitational acceleration and \(p_k\) represents the positions of the individual masses along the chain. The spring energy is given by a sum over the elastic links between the masses, expressed as \(\frac{1}{2}K \sum_{k=0}^{N} \lVert p_{k+1} - p_k \rVert^2\), where \(K\) is the rigidity of the springs. The total Lagrange function then becomes \(L = \frac{1}{2}m \dot{q}^\top \dot{q} - V_{gravity} - V_{spring}\), showing how both energy types are incorporated into the dynamics of the chain.
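A small sketch evaluating these two potential-energy terms numerically for a chain of point masses suspended between two fixed endpoints. The function name, the fixed-endpoint convention and the test configuration are hypothetical additions, not taken from the notes.

```python
import numpy as np

def chain_potential(p, m, g, K, p_start, p_end):
    # p: (N, 3) array of free mass positions; p_start, p_end: fixed endpoints.
    # Gravitational term: m*g * sum of the z-coordinates [0,0,1] @ p_k
    V_grav = m * g * np.sum(p[:, 2])
    # Spring term: 0.5*K * sum over links of ||p_{k+1} - p_k||^2,
    # including the two links attached to the fixed endpoints
    pts = np.vstack([p_start, p, p_end])
    diffs = np.diff(pts, axis=0)
    V_spring = 0.5 * K * np.sum(diffs**2)
    return V_grav + V_spring
```

For a single mass at (0, 0, −1) between endpoints (±1, 0, 0), with m = 1, g = 9.81 and K = 2, the gravitational term is −9.81 and the spring term is 0.5 · 2 · (2 + 2) = 4.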
Sensitivity analysis in implicit Runge-Kutta methods is generally more straightforward than one might expect, because the implicit stage equations directly couple the states and parameters: once the stages are defined by an implicit system of equations, their sensitivities follow from the implicit function theorem, and the matrix factorization computed for the Newton iterations can be reused, reducing the computational cost. In explicit methods, by contrast, the stage sensitivities must be propagated stage by stage through repeated applications of the chain rule.