Master Qfin at Wu Vienna Lecture Optimization: R Udiger Frey

Introduction and Unconstrained Optimization
Master QFin at WU Vienna Lecture Optimization

R udiger Frey
ruediger.frey@wu.ac.at http://statmath.wu.ac.at/frey
Spring 2014
1 / 29
Admin
Dates of the lecture. 4.3., 6.3., 11.3.,18.3. (incl mid-term)
25.3., 1.4., 8.4. (nal exam at some point before easter)

Examination. 10 % home assignments, 30% Midterm Exam
(unit4) 60% Final Exam

Tutor. Giorgo Ottonello, 2nd year QFin. He will correct
homework assignments
2 / 29
Useful references
R. Frey, lecture notes Optimization, available on learn@WU,
2014
Bertsekas, D., Nonlinear Programming, Athena Scientic
Publishing, 1999
Griva, Nash, Sofer, Linear and Nonlinear optimization, SIAM
publishing, 2009
3 / 29
Overview
Introduction and Unconstrained Optimization Introduction Mathematical Background Unconstrained Optimization: Theory
4 / 29
Optimization problems
In its most general for an optimization problem is minimize f (x ) subject to x X (1)
is a subset of Rn , and the cost Here the set of admissible points X to R. Often the admissible points function f is a function from X are further restricted by explicit inequality constraints. Note that maximization problems can be addressed by replacing f with f , as supx X f (x ) = inf x X {f (x )}.
5 / 29
Types of optimization problems.

is of continuous nature such Continuous problems. Here X = Rn or sets of the form as X = {x Rn : g (x ) 0 for some g : Rn Rn }. These X problems are usually tackled with calculus or convex analysis. is a (usually large) nite set (as in Discrete problems. Here X network optimization)
Nonlinear programming. Here f is nonlinear or the
is specied by nonlinear equations. constrained set X

Linear programming. Here f and g are linear, and (1) takes
the form min c x such that Ax b for c , x Rn , a matrix A Rmn , b Rm and m n.

6 / 29
Optimization problems in nance, economics and statistics.

a) Portfolio optimization. We give two examples.
Maximization of expected utility.
Rn
max E (u (V0 + (ST S0 ))) ,
1, . . . S n) Here V0 is the initial wealth of an investor; S0 = (S0 0 1 ( ), . . . S n ( )) the is the initial asset price; ST ( ) = (ST T terminal asset price; represents the portfolio strategy. The increasing and concave function u : R R is the utility function that is used to model the risk aversion of the investor.
Markowitz problem. Here one looks for the minimal-variance
portfolio under all portfolios with a given mean.
7 / 29
Optimization problems ctd.

b) Calibration problems. Denote by g1 (), . . . , gm () model prices of m nancial instruments for given parameter vector the observed market price. Rn and by g , . . . , gm X 1 Model calibration leads to the optimization problem 1 min 2 X
m
(gi () gi )2 .
i =1
If gi is linear in we have a standard regression problem; otherwise one speaks of a generalized regression problem. c) Maximum likelihood methods in statistics. d) Financial mathematics. Duality results from convex analysis are crucial in nancial mathematics (rst fundamental theorem of asset pricing or superhedging duality).
8 / 29
Overview
1. Unconstrained Optimization
Necessary and sucient optimality conditions Numerical methods
2. Lagrange multiplier theory and Karush-Kuhn-Tucker theory 3. Convex optimization and duality
Convexity and separation; The dual problem Duality results and economic applications
9 / 29
Dierentiability and partial derivatives

Consider some f : U Rn R, (x1 , . . . , xn )t f (x1 , . . . , xn ), where U is an open subset of Rn , and some x U . Denition. (1) f is called continuously dierentiable on U (f C 1 (U )) if for all x U all partial derivatives exist and if the partial derivatives are are continuous functions of x . (2) More generally, a function f : U Rn Rm , (x1 , . . . , xn )t (f1 (x1 , . . . , xn ), . . . , fm (x1 , . . . , xn ))t is continuously dierentiable on U if all components f1 , . . . , fm belong to C 1 (U ).
10 / 29
Example: Quadratic form
Consider a symmetric 2 2 matrix A and let

2 2 f (x1 , x2 ) = x t Ax = a11 x1 + 2a12 x1 x2 + a22 x2 .
f (x ) f (x ) = 2a11 x1 + 2a12 x2 = (2Ax )1 and = (2Ax )2 x1 x2
11 / 29
Gradient and Jacobi matrix

Suppose that f : U R is in C 1 (U ). Then the column vector
t
f (x ) =
f f x1 (x ), . . . , xn (x )
is the gradient of f .
For a C 1 function g : U Rm the Jacobi matrix is given by
Jg (x ) =
g1 ( x ) x1
...
. . .
g1 (x ) xn
. . .
gm (x ) x1
gm (x ) xn
Sometimes one uses also the gradient matrix g (x ) = Jg (x )t = (g1 (x ), . . . , gm (x )).
12 / 29
First order (Taylor) approximation

Consider some C 1 function f : U R. Then for any x , y U f (y ) f (x ) = f (x )t (y x ) + R (x , y x ) where it holds that lim
z 0 R (x ,z ) z
(2)
= 0.
Idea. The function f can be approximated locally around x by the ane mapping y f (x ) + f (x )t (y x ). Similarly, we get for a C 1 function g : U Rm that g (y ) g (x ) = Jg (x )(y x )+ R (x , y x ) with lim R (x , z ) = 0. z
z 0
13 / 29
Chain rule
Theorem. Consider C 1 functions f : Rn Rm and g : Rk Rn and let h := f g . Then h is C 1 and it holds for the Jacobi matrix that Jh(x ) = Jf (g (x ))Jg (x ), i.e. the Jacobian of the concatenation is the product of the individual Jacobi matrices. Example. (Derivative along a vector). Consider a C 1 functions f : Rn R We want to consider the function f along the straight line (t ) := x + tv , for t R, x , v Rn . We have J (t ) = v , Jf (x ) = (f (x ))t and hence d d f ((t )) = (f (x +tv ))t v , in particular f ((0)) = (f (x ))t v . dt dt
14 / 29
Second derivatives
Denition. Consider C 1 function f : U Rn R. Then the rst f (x ) order partial derivatives xi , 1 i n, are themselves functions from U to R. 1. If all partial derivatives are C 1 functions, f is called twice continuously dierentiable on U (f C 2 (U )). Fix i , j {1, . . . , n}. Then one writes
f xi (x ) 2f (x ) := xi xj xj
for the second partial derivative in direction xi and xj . 2. For f C 2 (U ) the matrix Hf with Hfij (x ) = Hessian matrix of f .
2f xi xj (x )
is the
15 / 29
Theorem of Young and Schwarz

Theorem. Consider f C 2 (U ). Then the Hessian matrix is symmetric, that is 2f 2f (x ) = (x ) , xi xj xj xi 1 i , j n.
It follows that the Hessian is a symmetric matrix, that is Hfij (x ) = Hfji (x ), 1 i , j n. In particular, the deniteness of Hf can be checked using eigenvalues: HF is positive semi-denite if all eigenvalues are strictly positive and positive semidenite if all eigenvalues are non-negative.
16 / 29
Example
3 x + x 2 x 2 + x + x 2 . Then we have (1) Consider f (x1 , x2 ) = x1 2 1 1 2 2
2f 2 = 6x1 x2 + 2x2 , 2 x1
2f 2 = 2x1 + 2, 2 x2
2f 2 = 3x1 + 4x1 x2 . x1 x2
(2) Consider f (x ) = x t Ax for some symmetric matrix A. Then Hf (x ) = 2A.
17 / 29
Second order Taylor expansion

Theorem. If f is C 2 (U ) and x , y U the Taylor formula becomes 1 f (y ) f (x ) = f (x )t (y x )+ (y x )t Hf (x )(y x )+ R2 (x , y x ) 2 where lim
z 0 R 2 ( x ,z ) z 2
= 0.
Idea. f can be approximated locally around x U by the quadratic function 1 y f (x ) + f (x )t (y x ) + (y x )t Hf (x )(y x ) . 2 Locally, this is a better approximation than the rst order Taylor approximation.
18 / 29
Unconstrained optimization: the problem
In this section we consider problems of the form = Rn minimize f (x ) for x X Moreover, we assume that f is once or twice continuously dierentiable. is an open subset of Rn . Most results hold also in the case where X (3)
19 / 29
Local and global optima

Denition. Consider the optimization problem (3). 1. x is called (unconstrained) local minimum of f if there is some > 0 such that f (x ) f (x ) for all x Rn with x x < . 2. x is called global minimum of f , if f (x ) f (x ) x Rn . 3. x is said to be a strict local/global minimum if the inequality f (x ) f (x ) is strict for x = x . 4. The value of the problem is f := inf {f (x ) : x Rn } Remark Local and global maxima are dened analogously.
20 / 29
Necessary optimality conditions

Proposition. Suppose that x U is a local optimum of f . 1. If f is C 1 in U , then f (x ) = 0. (First Order Necessary Condition or FONC). 2. If moreover f C 2 (U ) then Hf (x ) is positive semi-denite (Second Order Necessary Condition or SONC). Comments.
x Rn with f (x ) = 0 is called stationary point of f . Proof is based on Taylor formula. Necessary conditions for a local maximum: f (x ) = 0,
Hf (x ) negative semidenite.
Necessary conditions in general not sucient: consider
f (x ) = x 3 , x = 0.
21 / 29
Sucient optimality conditions

Proposition. (Sucient conditions.) Let f : U Rn R be C 2 on U . Suppose that x U satises the conditions f (x ) = 0, Hf (x ) strictly positive denite Then x is a local minimum. Comments.
Sucient conditions not necessary: Consider eg.
(4)
f (x ) = x 4 , x = 0.
No global statements possible.
22 / 29
The case of convex functions

Denition. (Convex sets and functions) i) A set X Rn is convex if x1 , x2 X , [0, 1] the convex combination x1 + (1 )x2 belongs to X . ii) A function f : X Rn R (X convex) is called convex if x1 , x2 X , [0, 1] f (x1 + (1 )x2 ) f (x1 ) + (1 )f (x2 ); f is strict convex if the inequality is strict for (0, 1). iii) f : X Rn R is concave f is (strict) convex holds in (5). Strict concavity is dened in the same way. (5)
23 / 29
Characterizations of Convexity
Lemma. Consider an open convex set X Rn . A C 1 function f : X Rn is convex on the if and only if it holds for all x , z X that f (z ) f (x ) + f (x ) (z x ). If f is C 2 a necessary and sucient condition for the convexity of f on X is the condition that Hf (x ) is positive semi-denite for all x X. Comments
f is concave on U Hf is negative semidenite on U . Note that we may decide convexity or concavity by nding the
eigenvalues of Hf (x ).
24 / 29
Example
2 + 2x x x 2 . Is f convex, Problem. Let f (x1 , x2 ) = 2x1 x2 x1 1 2 2 concave or none of both ?
Solution. The symmetric matrix representing the quadratic part of f is 1 1 A= 1 1 An easy computation gives for the Hessian that Hf (x ) = 2A. Hence we need to check the deniteness of A. Approach via eigenvalues. The characteristic polynomial of A is P () = 2 + 2; the equation P () = 0 has solutions (eigenvalues) 2 and 0. Hence, A is negative semidenite and the function is concave.
25 / 29
Optimality conditions for convex functions
Proposition. Let f : X R be a convex function on some convex set X Rn . Then 1. A local minimum of f over X is also a global minimum. If f is strictly convex, there exists at most one global minimum. 2. If X is open, the condition f (x ) = 0 is necessary and sucient for x X to be a global minimum of f .
26 / 29
Example: Quadratic cost functions

n Let f (x ) = 1 2 x Qx b x , x R for a symmetric n n matrix Q n and some b R . Then we have
f (x ) = Qx b and Hf (x ) = Q . a) Local minima. If x is a local minimum we must have f (x ) = Qx b = 0, Hf (x ) = Q positive semi-denite; hence if Q is not positive semi-denite, f has no local minima. b) If Q is positive semi-denite, f is convex. In that case we need not distinguish global and local minima, and f has a global minimum if and only if there is some x with Qx = b . c) If Q is positive denite, Q 1 exists and the unique global minimum is attained at x = Q 1 b .
27 / 29
Existence results for a global minimum

Proposition(Weierstrass Theorem) Let X Rn be non-empty and suppose that f : X R is continuous in X . Suppose moreover, that one of the following three conditions holds (1) X is compact (closed and bounded). (2) X is closed and f is coercive, that is (x k )k N X with x k one has lim f (x k ) =
k
(3) There is some R such that the level set {x X : f (x ) } is non-empty and compact. Then f has at least one global minimum and the set of all global minima is compact. Remark. The result holds also for lower semicontinous functions. (f is called lower semicontinuous if for all x X , all (x k )k N with x k x lim inf k f (x k ) f (x ).)
28 / 29

Master Qfin at Wu Vienna Lecture Optimization: R Udiger Frey

Uploaded by

Master Qfin at Wu Vienna Lecture Optimization: R Udiger Frey

Uploaded by

Introduction and Unconstrained Optimization