OPTIMIZATION USING SEQUENTIAL QUADRATIC PROGRAMMING
by
NISHANT KUMAR
(1201EE25)
In Partial Fulfilment
of the Requirements for the award of the degree
BACHELOR OF TECHNOLOGY
This is to certify that the work contained in the thesis titled OPTIMIZATION USING
SEQUENTIAL QUADRATIC PROGRAMMING , submitted by Nishant Kumar,
to the Indian Institute of Technology, Patna, for the award of the degree of Bachelor of
Technology, is a bona fide record of the research work done by him under my supervi-
sion. The contents of this thesis, in full or in parts, have not been submitted to any other
Institute or University for the award of any degree or diploma.
Dr. S. Sivasubramani
Supervisor
Assistant Professor
Dept. of Electrical Engineering
IIT-Patna, 800 013
Place: Patna
Date: 1st December 2015
ACKNOWLEDGEMENTS
ABSTRACT
Solving optimization problems is a central theme not only in operations research but
also in several other research areas. In this thesis the basic approach to optimization
through sequential quadratic programming is covered. The quadratic approximation of
a scalar-valued function of a vector argument is discussed under appropriate assumptions.
Emphasis is placed on the formulation of a quadratic subproblem that is assumed to
reflect the local properties of the original problem.
TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
NOTATION
1 INTRODUCTION
3 Unconstrained Optimization
3.1 Gradient Methods
3.1.1 Steepest Descent Method
3.1.2 Conjugate Gradient Method
3.1.3 Newton's Method
3.1.4 Inverse Hessian Update
3.1.5 Direct Hessian Updating
3.2 Sequential Quadratic Method
3.2.1 SQP and Inverse Hessian Update Method
4 Constrained Optimization
NOTATION

α     Steplength parameter
d_x   Direction of next iteration
L     Lagrangian function
∇     Gradient operator
H     Hessian operator
CHAPTER 1
INTRODUCTION
\[
\begin{aligned}
\min_{x} \quad & f(x) \\
\text{subject to:} \quad & h(x) = 0 \\
& g(x) \le 0
\end{aligned}
\qquad \text{(NLP)}
\]
where f : R^n → R, h : R^n → R^m, and g : R^n → R^p. Such problems arise in a variety
of applications in science, engineering, industry and management. In the form (NLP) the
problem is quite general; it includes as special cases linear and quadratic programs, in
which the constraint functions h and g are affine and f is linear or quadratic. While
these problems are important and numerous, the great strength of the SQP method is its
ability to solve problems with nonlinear constraints. For this reason it is assumed that
(NLP) contains at least one nonlinear constraint function.
Many practical optimization demands are met by SQP, for example in the formulation
and development of a mathematical framework for the solution of the contingency-
constrained optimal power flow (OPF), which minimizes the total cost of a base-case
operating state [1].
The basic idea of SQP is to model (NLP) at a given approximate solution, say x_k, by a
quadratic programming subproblem, and then to use the solution of this subproblem to
construct a better approximation x_{k+1}. This process is iterated to create a sequence of
approximations that, it is hoped, will converge to a solution x^*. The key to understand-
ing the performance and theory of SQP is the fact that, with an appropriate choice of
quadratic subproblem, the method can be viewed as the natural extension of Newton
and quasi-Newton methods to the constrained optimization setting. Thus one would
expect SQP methods to share the characteristics of Newton-like methods, namely, rapid
convergence when the iterates are close to the solution but possible erratic behavior that
needs to be carefully controlled when the iterates are far from a solution. While this
correspondence is valid in general, the presence of constraints makes both the analysis
and implementation of SQP methods significantly more complex.
CHAPTER 2
The basic SQP method deals with problems (NLP) for which it is assumed that all the
functions in (NLP) are three times continuously differentiable.
The gradient of a scalar valued function is denoted by ∇, e.g., ∇f (x).
All vectors are assumed to be column vectors and superscript t is used to denote the
transpose.
A key function, one that plays a central role in all of the theory of constrained optimiza-
tion, is the scalar-valued Lagrangian function defined by
\[
L(x, u, v) = f(x) + u^t h(x) + v^t g(x),
\]
where u ∈ R^m and v ∈ R^p are the multiplier vectors. Now x^* will represent any
particular local solution of (NLP). The following conditions are assumed to hold true
for each such solution.
1. The first order necessary conditions hold, i.e., there exist optimal multiplier
vectors u^* and v^* ≥ 0 such that
\[
\nabla_x L(x^*, u^*, v^*) = 0 \qquad \text{and} \qquad g_i(x^*)\, v_i^* = 0, \quad i = 1, \dots, p.
\]
Any real or complex valued function f which is infinitely differentiable will have a
series expansion with respect to a point x0 given by
\[
f(x) = f(x_0) + \frac{f'(x_0)}{1!}(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \frac{f'''(x_0)}{3!}(x - x_0)^3 + \cdots
\]
For approximating the function about the point x_0, the higher-order terms are neglected.
The quadratic approximation of f about x_0 is then
\[
q(x) = f(x_0) + \frac{f'(x_0)}{1!}(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 .
\]
The same series, written for a scalar-valued function of a vector argument, gives the
quadratic approximation
\[
q(X) = f(X_0) + \nabla f(X_0)^t (X - X_0) + \frac{1}{2}\,(X - X_0)^t\, \nabla_{xx}^2 f(X_0)\,(X - X_0),
\]
where X is the argument vector of the scalar function f and X_0 is the point about which
the approximation is carried out. [2]
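For example, taking f(x, y) = 3x^2 + y^4 (one of the test functions used later in Chapter 3)
and expanding about, say, the point X_0 = (1, 1)^t gives f(X_0) = 4, ∇f(X_0) = (6, 4)^t and
∇_{xx}^2 f(X_0) = diag(6, 12), so that
\[
q(X) = 4 + 6(x - 1) + 4(y - 1) + 3(x - 1)^2 + 6(y - 1)^2 .
\]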
The SQP method is an iterative method in which, at a current iterate xk , the step to
the next iterate is obtained through information generated by solving a quadratic sub-
problem. The subproblem is assumed to reflect in some way the local properties of the
original problem. The major reason for using a quadratic subproblem, i.e., a problem
with a quadratic objective function and linear constraints, is that such problems are rel-
atively easy to solve and yet, in their objective function, can reflect the nonlinearities of
the original problem.
A major concern in SQP methods is the choice of appropriate quadratic subproblems.
At a current approximation xk a reasonable choice for the constraints is a linearization
of the actual constraints about xk . Thus the quadratic subproblem will have the form
\[
\begin{aligned}
\min_{d_x} \quad & (r^k)^t d_x + \tfrac{1}{2}\, d_x^t B_k d_x \\
\text{subject to:} \quad & \nabla h(x_k)^t d_x + h(x_k) = 0 \\
& \nabla g(x_k)^t d_x + g(x_k) \le 0 .
\end{aligned}
\]
The choice of the vector r^k and the matrix B_k in the objective is motivated by the
observation that, if the optimal multipliers u^* and v^* were known, (NLP) could be
replaced by the problem
\[
\begin{aligned}
\min_{x} \quad & L(x, u^*, v^*) \\
\text{subject to:} \quad & h(x) = 0 \\
& g(x) \le 0 .
\end{aligned}
\]
Note that the constraint functions are included in the objective function of this equiva-
lent problem. Although the optimal multipliers are not known, approximations u^k and
v^k to the multipliers can be maintained as part of the iterative process. Then, given a
current iterate (x_k, u_k, v_k), the quadratic Taylor series approximation in x for the
Lagrangian is
\[
L(x_k, u_k, v_k) + \nabla L(x_k, u_k, v_k)^t d_x + \tfrac{1}{2}\, d_x^t\, HL(x_k, u_k, v_k)\, d_x .
\]
A strong motivation for using this function as the objective function in the quadratic
subproblem is that it generates iterates that are identical to those generated by Newton’s
method when applied to the system composed of the first order necessary condition
(condition 1) and the constraint equations (including the active inequality constraints).
This means that the resulting algorithm will have good local convergence properties. In
spite of these local convergence properties there are good reasons to consider choices
other than the actual Hessian of the Lagrangian, for example approximating matrices
that have properties that permit the quadratic subproblem to be solved at any xk and the
resulting algorithm to be amenable to a global convergence analysis. Letting B_k be an
approximation of HL(x_k, u_k, v_k), we can write the quadratic subproblem as:
\[
\begin{aligned}
\min_{d_x} \quad & \nabla L(x_k, u_k, v_k)^t d_x + \tfrac{1}{2}\, d_x^t B_k d_x \\
\text{subject to:} \quad & \nabla h(x_k)^t d_x + h(x_k) = 0 \\
& \nabla g(x_k)^t d_x + g(x_k) \le 0 .
\end{aligned}
\]
The form of the quadratic subproblem most often found in the literature, and the one
that will be employed is
\[
\begin{aligned}
\min_{d_x} \quad & \nabla f(x_k)^t d_x + \tfrac{1}{2}\, d_x^t B_k d_x \\
\text{subject to:} \quad & \nabla h(x_k)^t d_x + h(x_k) = 0 \\
& \nabla g(x_k)^t d_x + g(x_k) \le 0
\end{aligned}
\qquad \text{(QP)}
\]
These two forms are equivalent for problems with only equality constraints since, by
virtue of the linearized constraints, the term ∇h(x_k)^t d_x is constant and the objective
function becomes ∇f(x_k)^t d_x + (1/2) d_x^t B_k d_x up to an additive constant. The two
subproblems are not quite equivalent in the inequality-constrained case unless the multi-
plier estimate v^k is zero for all inactive linear constraints. However, (QP) is equivalent
to the Lagrangian-based subproblem above for the slack-variable formulation of (NLP)
given by
\[
\begin{aligned}
\min_{x,\, z} \quad & f(x) \\
\text{subject to:} \quad & h(x) = 0 \\
& g(x) + z = 0 \\
& z \ge 0 .
\end{aligned}
\]
To complete the iteration, new estimates of the multipliers are needed. There are several
ways in which these can be chosen, but one obvious approach is to use the optimal
multipliers of the quadratic subproblem.
Let the optimal multipliers of (QP) be denoted by u_qp and v_qp, and set
\[
d_u = u_{qp} - u_k, \qquad d_v = v_{qp} - v_k .
\]
The new iterates are then
\[
\begin{aligned}
x_{k+1} &= x_k + \alpha\, d_x \\
u_{k+1} &= u_k + \alpha\, d_u \\
v_{k+1} &= v_k + \alpha\, d_v
\end{aligned}
\]
for some selection of the steplength parameter α. Once the new iterates are con-
structed, the problem functions and derivatives are evaluated and a prescribed choice of
B_{k+1} is calculated. [2]
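As a minimal sketch of how one such iteration could be organised in MATLAB (an
illustration, not the thesis code), suppose function handles gradf, h, gradh, g and gradg
evaluate ∇f, h, ∇h, g and ∇g, that Bk holds the current Hessian approximation, and that
the quadratic subproblem is solved with quadprog from the Optimization Toolbox; the
sign convention of the multipliers returned by quadprog is assumed to match the
Lagrangian defined earlier.

function [xk, uk, vk] = sqp_step(xk, uk, vk, Bk, gradf, h, gradh, g, gradg, alpha)
    % One SQP iteration: solve the quadratic subproblem (QP) for the step dx,
    % read off the QP multipliers, and update the iterates with steplength alpha.
    Aeq = gradh(xk)';   beq = -h(xk);    % linearized equality constraints
    Ain = gradg(xk)';   bin = -g(xk);    % linearized inequality constraints
    [dx, ~, ~, ~, lambda] = quadprog(Bk, gradf(xk), Ain, bin, Aeq, beq);
    du = lambda.eqlin   - uk;            % multiplier steps from the QP solution
    dv = lambda.ineqlin - vk;
    xk = xk + alpha*dx;
    uk = uk + alpha*du;
    vk = vk + alpha*dv;
end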
CHAPTER 3
Unconstrained Optimization
To understand the benefit of optimizing using SQP, unconstrained optimization will be
considered first. The problem statement is
\[
\min_{x}\; f(x),
\]
where f(x) can be any linear or nonlinear function that is to be minimized under no
constraints. There are several algorithms for the unconstrained minimization of a multi-
variable function; a few of the widely used ones are:
• Steepest descent method
• Conjugate gradient method
• Newton's method
• Quasi-Newton (Hessian update) methods
3.1 Gradient Methods

Gradient methods rely on an important idea, iterative descent, which works as follows:
starting from an initial point, a sequence of points is generated along search directions
d_k so that the objective decreases at every step. Since the steplength α > 0, this requires
d_k to be a descent direction, i.e.,
\[
\nabla f(x_k)^T d_k < 0 .
\]
After the computation of d_k at each step, the next point of the iteration is obtained as
\[
x_{k+1} = x_k + \alpha d_k .
\]
To evaluate the steplength α, i.e., the step to be taken along d_k so as to obtain a suffi-
cient decrease of the function in that direction, Armijo's rule is implemented. Defining
\[
\phi(\alpha) = f(x_k + \alpha d_k),
\]
the value of α is first chosen large (here α = 1) and the following sufficient-decrease
condition is checked:
\[
\phi(\alpha) \le \phi(0) + \sigma\, \alpha\, \nabla f(x_k)^T d_k, \qquad \sigma \in (0, 1).
\]
If the condition is not satisfied, α is reduced (for example halved) and the condition is
checked again.

3.1.1 Steepest Descent Method

In the steepest descent method the search direction is the negative gradient,
\[
d_k = -\nabla f(x_k) .
\]
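A small sketch of steepest descent with Armijo backtracking, assuming the objective f
and its gradient gradf are supplied as function handles; the sufficient-decrease parameter
and the halving factor are illustrative choices.

function xk = steepest_descent(f, gradf, xk, tol, maxit)
    sigma = 1e-4;                          % sufficient-decrease parameter
    for iter = 1:maxit
        gk = gradf(xk);
        if norm(gk) <= tol                 % stationary point reached
            break;
        end
        dk = -gk;                          % steepest descent direction
        alpha = 1;                         % start with a large steplength
        % Armijo's rule: shrink alpha until sufficient decrease holds
        while f(xk + alpha*dk) > f(xk) + sigma*alpha*(gk'*dk)
            alpha = alpha/2;
        end
        xk = xk + alpha*dk;
    end
end

For example, steepest_descent(@(x) 3*x(1)^2 + x(2)^4, @(x) [6*x(1); 4*x(2)^3], [1; 1],
1e-6, 500) returns a point close to the minimizer of the first test function of Section 3.2.1.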
3.1.2 Conjugate Gradient Method
In the conjugate gradient method the search direction combines the current negative
gradient with the previous direction,
\[
d_k = -\nabla f(x_k) + \beta_k\, d_{k-1}, \qquad \text{where} \quad
\beta_k = \frac{\lVert \nabla f(x_k) \rVert^2}{\lVert \nabla f(x_{k-1}) \rVert^2} .
\]
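A brief MATLAB sketch of this update, reusing the Armijo backtracking of the previous
sketch and adding a restart to the negative gradient whenever the computed direction
fails to be a descent direction (an illustrative safeguard, not taken from the thesis):

function xk = conjugate_gradient(f, gradf, xk, tol, maxit)
    sigma = 1e-4;
    gk = gradf(xk);
    dk = -gk;                              % first step: steepest descent
    for iter = 1:maxit
        if norm(gk) <= tol
            break;
        end
        if gk'*dk >= 0
            dk = -gk;                      % restart if dk is not a descent direction
        end
        alpha = 1;                         % Armijo backtracking as before
        while f(xk + alpha*dk) > f(xk) + sigma*alpha*(gk'*dk)
            alpha = alpha/2;
        end
        xk = xk + alpha*dk;
        gnew = gradf(xk);
        beta = (gnew'*gnew)/(gk'*gk);      % beta_k from the formula above
        dk = -gnew + beta*dk;              % conjugate direction update
        gk = gnew;
    end
end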
3.1.3 Newton's Method

Newton's method is based on the quadratic approximation
\[
f(x + \Delta x) \approx f(x) + \nabla f(x)^T \Delta x + \tfrac{1}{2}\, \Delta x^T \nabla^2 f(x)\, \Delta x .
\]
Since the gradient of this approximation must vanish at its minimizer,
\[
\nabla f(x) + H \Delta x = 0, \qquad \Delta x = -H^{-1} \nabla f(x),
\]
and the search direction is therefore
\[
d_k = -H_k^{-1} \nabla f(x_k),
\]
which satisfies the descent condition ∇f(x_k)^T d_k < 0 whenever H_k is positive definite.
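A corresponding sketch of Newton's method, assuming a handle hessf for the Hessian is
available and that the Hessian is nonsingular at the iterates visited:

function xk = newton_method(gradf, hessf, xk, tol, maxit)
    for iter = 1:maxit
        gk = gradf(xk);
        if norm(gk) <= tol
            break;
        end
        dk = -(hessf(xk)\gk);   % solve H*dk = -gradf rather than forming inv(H)
        xk = xk + dk;           % full Newton step (alpha = 1)
    end
end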
3.1.4 Inverse Hessian Update
At the start of the iteration it is assumed that H_k^{-1} = I; the inverse of the Hessian
matrix is then updated as the iterations proceed, so that no explicit matrix inversion is
required at any step.
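One common choice for such an update, used here purely as an illustration, is the BFGS
inverse update with s_k = x_{k+1} − x_k and y_k = ∇f(x_{k+1}) − ∇f(x_k):

function Hinv = bfgs_inverse_update(Hinv, sk, yk)
    % BFGS update of the inverse Hessian approximation, starting from Hinv = eye(n)
    rho = 1/(yk'*sk);                 % curvature term; assumes yk'*sk > 0
    n   = numel(sk);
    V   = eye(n) - rho*(sk*yk');
    Hinv = V*Hinv*V' + rho*(sk*sk');  % standard BFGS inverse formula
end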
3.2 Sequential Quadratic Method

In the sequential quadratic method the function is replaced, at the current point X_0, by
its quadratic approximation
\[
Q(X) = f(X_0) + \nabla f(X_0)^t (X - X_0) + \tfrac{1}{2}\,(X - X_0)^t\, \nabla_{xx}^2 f(X_0)\,(X - X_0),
\]
and the minimizer of this approximation is obtained by setting
\[
\nabla Q(X) = 0,
\]
which results in a set of n linear equations:
\[
\begin{aligned}
a_{11} x_1 + a_{12} x_2 + a_{13} x_3 + \cdots + a_{1n} x_n &= b_1 \\
a_{21} x_1 + a_{22} x_2 + a_{23} x_3 + \cdots + a_{2n} x_n &= b_2 \\
&\;\,\vdots \\
a_{n1} x_1 + a_{n2} x_2 + a_{n3} x_3 + \cdots + a_{nn} x_n &= b_n .
\end{aligned}
\]
• Check whether ‖∇f(x_k)‖ ≤ ε for a small tolerance ε. If yes, then the point is the
required optimizer; else proceed to the next step.
• The set of n linear equations is solved, and the point so obtained is x_{k+1}. The above
steps are then repeated iteratively, as sketched below.
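In the spirit of the symbolic MATLAB code of Chapter 4, these steps might be realised
as follows for the first test function of the next subsection; the starting point [1; 1] and
the tolerance are illustrative choices.

syms x y
f     = 3*x^2 + y^4;                  % first test function from Section 3.2.1
vars  = [x; y];
gradf = gradient(f, vars);            % symbolic gradient
hessf = hessian(f, vars);             % symbolic Hessian
Xk    = [1; 1];                       % suggested starting point
tol   = 1e-6;
while true
    G = double(subs(gradf, vars, Xk));
    if norm(G) <= tol                 % stopping test on the gradient norm
        break;
    end
    H  = double(subs(hessf, vars, Xk));
    Xk = Xk + H\(-G);                 % solve the n linear equations grad Q = 0
end
Xk                                    % optimizer, approximately [0; 0]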
To compare the performance of SQP and the Inverse Hessian Update method, two sample
examples were taken:
\[
\min\; f = 3x^2 + y^4
\]
Table 3.1
\[
\min\; f = 3y^6 + x^8 + 1
\]
Table 3.2
• The number of iterations carried out for the minimization of the function 3x^2 + y^4
was 16 for SQP and 20 for Inverse Hessian Update.
• The run times of the two algorithms were very close.
• For the function 3y^6 + x^8 + 1, the number of iterations was 17 for SQP and 276 for
Inverse Hessian Update.
• The run times of the two algorithms differed greatly, owing to the large difference in
the number of iterations carried out.
CHAPTER 4
Constrained Optimization
The constrained optimization of the objective function is carried out by using the SQP
algorithm.
Matlab Codes
function QuadraticFunction
    f_str = input('enter the function: ', 's');
    f = inline(f_str);                 % build the function from the typed string
    n = nargin(f);                     % number of input arguments of f
    X_char = argnames(f);              % names of the arguments of f
    X = sym('X', [n, 1]);              % declare symbolic column vector X (n x 1)
    i = 1;
    while (i <= n)
        X(i, 1) = sym(X_char{i});      % replace placeholders by the actual variable names
        i = i + 1;
    end
    Xk = input('suggest a point: ');
    G = Grad(f, X, Xk)                 % gradient of f evaluated at Xk
    F = Fvalue(f, X, Xk)               % value of f at Xk
    H = Hvalue(f, X, Xk)               % Hessian of f evaluated at Xk
    q = F + transpose(G)*(X - Xk) + 1/2*transpose(X - Xk)*H*(X - Xk)  % quadratic model
end
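For instance, assuming the helper functions Grad, Fvalue and Hvalue evaluate the
gradient, the function value and the Hessian at Xk, entering the function 3*x^2 + y^4 and
the point [1; 1] should produce, up to rearrangement, the quadratic model
q = 4 + 6*(x - 1) + 4*(y - 1) + 3*(x - 1)^2 + 6*(y - 1)^2, matching the worked example
in Chapter 2.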
grad_val = temp1;
end
k = 1;
while (i <= n)
    while (j <= n)
        while (k <= n)
            % substitute the numeric value of the k-th variable of the point Xc
            % into the (i,j) entry of the symbolic Hessian temp3
            temp3(i, j) = subs(temp3(i, j), arg_vector(k, 1), Xc(k, 1));
            k = k + 1;
        end
        k = 1;
        j = j + 1;
    end
    j = 1;
    i = i + 1;
end
hes_val = temp3;
end
REFERENCES