ELE704 - Lecture Notes - II - 04/04/2024
Constrained Optimization
Outline
- Duality
- Optimality Conditions
- Constrained Optimization Algorithms
Duality
- For any feasible x̃ and any λ ≥ 0, we have g(x̃) ≤ 0 and h(x̃) = 0, so

  λᵀg(x̃) + νᵀh(x̃) ≤ 0.

- So,

  ℓ(λ, ν) = inf_{x∈D} L(x, λ, ν) ≤ L(x̃, λ, ν) ≤ f(x̃)   ∀ feasible x̃  □
Examples:
- Least Squares (LS) solution of linear equations:

  min xᵀx
  s.t. Ax = b

- The Lagrangian is

  L(x, ν) = xᵀx + νᵀ(Ax − b)

- Setting ∇ₓL(x, ν) = 0 gives x = −(1/2)Aᵀν, so the dual function is

  ℓ(ν) = −(1/4)νᵀAAᵀν − bᵀν
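As a quick numerical check (my own sketch, not part of the notes), the dual above is maximized where AAᵀν = −2b, and the recovered primal point x = −(1/2)Aᵀν should coincide with the minimum-norm solution A⁺b:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))     # wide A: Ax = b is underdetermined
b = rng.standard_normal(3)

# Maximize l(nu) = -(1/4) nu^T A A^T nu - b^T nu  =>  A A^T nu = -2b
nu = np.linalg.solve(A @ A.T, -2 * b)

# Recover the primal minimizer from grad_x L = 2x + A^T nu = 0
x_dual = -0.5 * A.T @ nu

# Compare with the minimum-norm least-squares solution
x_pinv = np.linalg.pinv(A) @ b
print(np.allclose(x_dual, x_pinv))  # True: zero duality gap here
```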
- Linear Programming (LP):

  min cᵀx
  s.t. Ax = b
       −x ≤ 0

- The Lagrangian is

  L(x, λ, ν) = cᵀx − λᵀx + νᵀ(Ax − b)

  which is affine in x, so ℓ(λ, ν) = −bᵀν if Aᵀν − λ + c = 0 and −∞ otherwise.
- Two-way partitioning:

  min xᵀWx
  s.t. x²ⱼ = 1,  j = 1, . . . , N

- If we relax the constraint to be ∥x∥² = N, i.e., Σⱼ₌₁ᴺ x²ⱼ = N, then the problem becomes easy to solve:

  min xᵀWx
  s.t. ∥x∥² = N

  with optimal value p∗ = N λmin(W), a lower bound for the original problem.
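A small numerical sketch (an illustration of my own, with a random symmetric W) confirming that the relaxation yields a lower bound on the ±1-constrained problem:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
N = 8
B = rng.standard_normal((N, N))
W = (B + B.T) / 2                              # random symmetric W

# Lower bound from the relaxation ||x||^2 = N
bound = N * np.linalg.eigvalsh(W)[0]

# Brute-force the original problem over all x in {-1, +1}^N
best = np.inf
for x in product([-1.0, 1.0], repeat=N):
    xv = np.asarray(x)
    best = min(best, xv @ W @ xv)

print(bound <= best + 1e-12)                   # True: N*lambda_min(W) <= p*
```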
- The Lagrange dual problem is

  max ℓ(λ, ν)
  s.t. λ ≥ 0

- The solution of the above problem over the dual feasible set is called the dual optimal point (λ∗, ν∗) (i.e., the optimal Lagrange multipliers), with

  p̂∗ = ℓ(λ∗, ν∗) ≤ p∗
- Standard form LP:

  min cᵀx
  s.t. Ax = b
       x ≥ 0

- Its dual problem is

  max −bᵀν
  s.t. Aᵀν − λ + c = 0
       λ ≥ 0

- which, after eliminating the slack variable λ, becomes

  max −bᵀν
  s.t. Aᵀν + c ≥ 0

- Inequality form LP:

  min cᵀx
  s.t. Ax ≤ b

- Its dual problem is

  max −bᵀλ
  s.t. Aᵀλ + c = 0
       λ ≥ 0
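These LP primal-dual pairs can be checked numerically; a minimal sketch with scipy.optimize.linprog on a small instance (the data here are mine, chosen for illustration):

```python
import numpy as np
from scipy.optimize import linprog

# Standard form LP: min c^T x  s.t.  Ax = b, x >= 0
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 1.0])
c = np.array([1.0, 2.0, 3.0])
primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 3)

# Dual: max -b^T nu  s.t.  A^T nu + c >= 0, nu free.
# linprog minimizes, so minimize b^T nu subject to -A^T nu <= c.
dual = linprog(b, A_ub=-A.T, b_ub=c, bounds=[(None, None)] * 2)

print(primal.fun, -dual.fun)   # equal values: strong duality holds for LP
```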
Weak Duality
- We know that any dual feasible point provides a lower bound on p∗, i.e.,

  p̂∗ = ℓ(λ∗, ν∗) ≤ p∗

  which is called weak duality; the difference p∗ − p̂∗ is the (optimal) duality gap.
Strong Duality
- Strong duality refers to the case where the duality gap is zero, i.e.,
p̂∗ = p∗
Slater’s Condition

- For the convex problem

  min f(x)
  s.t. g(x) ≤ 0
       Ax = b

  strong duality holds if it is strictly feasible, i.e., ∃x̃ such that

  g(x̃) < 0,  Ax̃ = b
Saddle-point Interpretation

- Weak duality can be expressed as

  p̂∗ = sup_{λ≥0,ν} inf_x L(x, λ, ν) ≤ inf_x sup_{λ≥0,ν} L(x, λ, ν) = p∗

- i.e., with strong duality we can switch inf and sup for λ ≥ 0.

- In general, a pair (w̃, z̃) with w̃ ∈ W, z̃ ∈ Z is called a saddle-point of f if

  f(w̃, z) ≤ f(w̃, z̃) ≤ f(w, z̃)   ∀w ∈ W, ∀z ∈ Z

- i.e.,

  inf_{w∈W} f(w, z̃) = f(w̃, z̃) = sup_{z∈Z} f(w̃, z)

  and the strong max-min property holds with the value f(w̃, z̃).
- For Lagrange duality, if x∗ and λ∗ are optimal points for the primal and
dual problems with strong duality (zero duality gap), then they form a
saddle-point for the Lagrangian, or vice-versa (i.e., the converse is also
true).
Certificate of Suboptimality
- A dual feasible point (λ, ν) gives a lower bound on the optimal value,

  ℓ(λ, ν) ≤ p∗

- Then,

  f(x) − p∗ ≤ f(x) − ℓ(λ, ν)

- for a primal feasible point x and a dual feasible point (λ, ν), where the duality gap associated with these points is

  f(x) − ℓ(λ, ν)

- in other words,

  p∗ ∈ [ℓ(λ, ν), f(x)]
Stopping Criterion
- If an algorithm produces the sequences x(k) and (λ(k), ν(k)), check for

  f(x(k)) − ℓ(λ(k), ν(k)) < ϵ

  which guarantees that x(k) is ϵ-suboptimal.
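A generic sketch of this test inside an iterative scheme; f, ell and primal_dual_step stand for hypothetical user-supplied callables:

```python
def solve_with_gap_stopping(f, ell, primal_dual_step, x, lam, nu,
                            eps=1e-6, max_iter=1000):
    """Iterate until the duality gap certifies eps-suboptimality."""
    for _ in range(max_iter):
        if f(x) - ell(lam, nu) < eps:   # gap upper-bounds f(x) - p*
            break
        x, lam, nu = primal_dual_step(x, lam, nu)
    return x, lam, nu
```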
Complementary Slackness
- Assume strong duality holds, x∗ is primal optimal and (λ∗, ν∗) is dual optimal. Then

  f(x∗) = ℓ(λ∗, ν∗)
        = inf_x ( f(x) + λ∗ᵀg(x) + ν∗ᵀh(x) )
        = inf_x ( f(x) + Σᵢ λ∗ᵢ gᵢ(x) + Σⱼ ν∗ⱼ hⱼ(x) )
        ≤ f(x∗) + Σᵢ λ∗ᵢ gᵢ(x∗) + Σⱼ ν∗ⱼ hⱼ(x∗)
        ≤ f(x∗)

  since Σᵢ λ∗ᵢ gᵢ(x∗) ≤ 0 and Σⱼ ν∗ⱼ hⱼ(x∗) = 0. Hence both inequalities hold with equality, and since each term satisfies λ∗ᵢ gᵢ(x∗) ≤ 0, we must have

  λ∗ᵢ gᵢ(x∗) = 0   ∀i
- In other words, at the optimal points the following hold:

  gᵢ(x∗) ≤ 0                                                (primal constraint)
  hⱼ(x∗) = 0                                                (primal constraint)
  λ∗ᵢ ≥ 0                                                   (dual constraint)
  λ∗ᵢ gᵢ(x∗) = 0                                            (complementary slackness)
  ∇f(x∗) + Σᵢ λ∗ᵢ ∇gᵢ(x∗) + Σⱼ ν∗ⱼ ∇hⱼ(x∗) = 0              (zero gradient of the Lagrangian)

- These are known as the Karush-Kuhn-Tucker (KKT) conditions.
- If f (x), gi (x) and hj (x) are convex and x̃, λ̃ and ν̃ satisfy KKT
conditions, then they are optimal points, i.e.,
- from complementary slackness, f(x̃) = L(x̃, λ̃, ν̃)
- from the last condition, ℓ(λ̃, ν̃) = L(x̃, λ̃, ν̃) = inf_x L(x, λ̃, ν̃). Note that L(x, λ̃, ν̃) is convex in x.
- Thus,
f (x̃) = ℓ(λ̃, ν̃).
Example 18:
  min (1/2)xᵀQx + cᵀx + r   (Q : SPD)
  s.t. Ax = b
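Since Q is SPD, ∇ₓL = Qx + c + Aᵀν = 0 gives x = −Q⁻¹(c + Aᵀν), and Ax = b then yields AQ⁻¹Aᵀν = −(b + AQ⁻¹c). A minimal numpy sketch of this dual-based solution (random data for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 6, 2
B = rng.standard_normal((N, N))
Q = B @ B.T + N * np.eye(N)                      # SPD
A = rng.standard_normal((M, N))
b = rng.standard_normal(M)
c = rng.standard_normal(N)

# Solve A Q^{-1} A^T nu = -(b + A Q^{-1} c), then recover x*
nu = np.linalg.solve(A @ np.linalg.solve(Q, A.T),
                     -(b + A @ np.linalg.solve(Q, c)))
x = -np.linalg.solve(Q, c + A.T @ nu)
print(np.allclose(A @ x, b))                     # primal feasibility holds
```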
Example 19:
Solution:
- If strong duality holds and if dual optimal solution (λ∗ , ν ∗ ) exists, then
we can compute a primal optimal solution from the dual solutions.
- If the dual problem is easier to solve (e.g. has fewer dimensions or has an analytical solution), then solving the dual problem to find the optimal dual parameters (λ∗, ν∗) first, and then solving

  x∗ = argmin_x L(x, λ∗, ν∗)

  can be computationally advantageous.

- For example, consider the problem

  min Σᵢ₌₁ᴺ fᵢ(xᵢ)
  s.t. aᵀx = b

- The dual function is

  ℓ(ν) = inf_x ( Σᵢ fᵢ(xᵢ) + ν(aᵀx − b) ) = −bν − Σᵢ₌₁ᴺ fᵢ∗(−νaᵢ)
- NOTE: The conjugate function f∗(y) is the maximum gap between the linear function yᵀx and f(x) (see Boyd Section 3.3). If f(x) is differentiable, this occurs at a point x where ∇f(x) = y. Note that f∗(y) is a convex function.

  f∗(y) = sup_{x ∈ dom f} ( yᵀx − f(x) )
- Once ν∗ is found, the primal optimal point is recovered from ∇ₓL(x, ν∗) = 0, i.e., componentwise

  ∂fᵢ(xᵢ)/∂xᵢ = −ν∗aᵢ
- Original problem (primal and dual):

  min f(x)                          max ℓ(λ, ν)
  s.t. g(x) ≤ 0, h(x) = 0           s.t. λ ≥ 0

- Perturbed problem (primal and dual):

  min f(x)                          max ℓ(λ, ν) − uᵀλ − vᵀν
  s.t. g(x) ≤ u, h(x) = v           s.t. λ ≥ 0
- Here u = [ui ]L×1 and v = [vj ]M ×1 are called the perturbations. When
ui = 0 and vj = 0, the problem becomes the original problem. If ui > 0,
it means we have relaxed the i-th inequality constraint, and if ui < 0, it
means we have tightened the i-th inequality constraint.
- Let us use the notation p∗(u, v) to denote the optimal value of the perturbed problem. Thus, the optimal value of the original problem is p∗ = p∗(0, 0).

- Assume that strong duality holds and an optimal dual solution exists, i.e., p∗ = ℓ(λ∗, ν∗). Then, we can show that

  p∗(u, v) ≥ p∗ − λ∗ᵀu − ν∗ᵀv
- If λ∗ᵢ is large and uᵢ < 0, then p∗(u, v) is guaranteed to increase greatly.
- If λ∗ᵢ is small and uᵢ > 0, then p∗(u, v) will not decrease too much.
- If |ν∗ⱼ| is large
  - If ν∗ⱼ > 0 and vⱼ < 0, then p∗(u, v) is guaranteed to increase greatly.
  - If ν∗ⱼ < 0 and vⱼ > 0, then p∗(u, v) is guaranteed to increase greatly.
- If |ν∗ⱼ| is small
  - If ν∗ⱼ > 0 and vⱼ > 0, then p∗(u, v) will not decrease too much.
  - If ν∗ⱼ < 0 and vⱼ < 0, then p∗(u, v) will not decrease too much.
Introduction

- Consider the general constrained optimization problem

  min f(x)
  s.t. g(x) ≤ 0
       h(x) = 0
Primal Methods
See Luenberger Chapter 12.
- A primal method is a search method that works on the original problem
directly by searching through the feasible region for the optimal solution.
Each point in the process is feasible and the value of the objective
function constantly decreases.
- For a problem with N variables and M equality constraints, primal
methods work in the feasible space of dimension N − M .
- Advantages:
- x(k) are all composed of feasible points.
- If x(k) is a convergent sequence, it converges at least to a local
minimum.
- Do not rely on special problem structure, e.g. convexity, in other
words, primal methods are applicable to general non-linear problems.
- Disadvantages:
- must start from a feasible initial point.
- may fail to converge for inequality constraints if precaution is not
taken.
- Update equation is
x(k+1) = x(k) + α(k) d(k)
- d(k) must be a descent direction and x(k) + α(k) d(k) must be contained
in the feasible region, i.e., d(k) must be a feasible direction for some
α(k) > 0.
- In the direction finding problem given below, the last line (the normalization constraint) ensures a bounded solution. The other constraints assure that vectors of the form x(k) + α(k)d(k) will be feasible for sufficiently small α(k) > 0, and subject to these conditions, d(k) is chosen to line up as closely as possible with the negative gradient of f(x(k)). In some sense this will result in the locally best direction in which to proceed. The overall procedure progresses by generating feasible directions in this manner, and moving along them to decrease the objective.
There are two major shortcomings of feasible direction methods that require
that they be modified in most cases.
- The first shortcoming is that for general problems there may not exist any
feasible directions. If, for example, a problem had nonlinear equality
constraints, we might find ourselves in the situation where no straight line
from x(k) has a feasible segment. For such problems it is necessary either
to relax our requirement of feasibility by allowing points to deviate
slightly from the constraint surface or to introduce the concept of moving
along curves rather than straight lines.
- Let A denote the active set, i.e., the indices of the inequality constraints that are active at x∗. Then

  gᵢ(x∗) = 0,  i ∈ A
  gᵢ(x∗) < 0,  i ∉ A
  λᵢ ≥ 0,      i ∈ A
  λᵢ = 0,      i ∉ A
- The surface defined by the working set W will be called the working
surface.
- Solve the equality-constrained problem

  min f(x)
  s.t. gᵢ(x) = 0,  i ∈ W

  to find λᵢ.
- If ∃i ∈ W such that λi < 0, then remove the i-th constraint from the
working set.
- Consider the linearly constrained problem

  min f(x)
  s.t. Ax = b
- If we use the first order Taylor approximation around the point x(k), such that f(x) = f(x(k) + d) ≅ f(x(k)) + ∇ᵀf(x(k))d for small enough d, then we will have the direction finding problem (DFP)

  min ∇ᵀf(x(k))d
  s.t. Ad = 0
       dᵀId ≤ 1
- dᵀId ≤ 1 is the Euclidean unit ball (∥d∥₂² ≤ 1); thus this is the projected GD algorithm.
- Solve α(k) = argmin f (x(k) + αd(k) ) using a line search algorithm, e.g.
exact or backtracking line search.
Projection:
- The DFP (with the more general normalization dᵀQd ≤ 1) is another constrained optimization problem which should satisfy its own KKT conditions, i.e.,

  Ad(k) = 0
  d(k)ᵀQd(k) ≤ 1
  ∇f(x(k)) + 2βk Qd(k) + Aᵀλk = 0
  βk ≥ 0
  βk (1 − d(k)ᵀQd(k)) = 0

- Solving the stationarity condition gives

  2βk d(k) = −P(k) ∇f(x(k))

- where

  P(k) = Q⁻¹ − Q⁻¹Aᵀ (AQ⁻¹Aᵀ)⁻¹ AQ⁻¹

  is called the projection matrix.

- As 2βk is just a scaling factor, we can safely write that the descent direction d(k) is given by

  d(k) = −P(k) ∇f(x(k))
PSDA with DFP Algorithm: Given a feasible point x(k) (i.e., x(k) ∈ F)

- Find the active constraint set W(k) and form the A (actually A(k)) matrix
- Calculate

  P(k) = Q⁻¹ − Q⁻¹Aᵀ (AQ⁻¹Aᵀ)⁻¹ AQ⁻¹
  d(k) = −P(k) ∇f(x(k))

- If d(k) ≠ 0
  - Find α

    α1 = max {α : x(k) + αd(k) is feasible}
    α2 = argmin {f(x(k) + αd(k)) : 0 ≤ α ≤ α1}

    and set x(k+1) = x(k) + α2 d(k)
- If d(k) = 0
  - Find λ

    λ = −(AQ⁻¹Aᵀ)⁻¹ AQ⁻¹ ∇f(x(k))

  - If λ ≥ 0, stop: x(k) satisfies the KKT conditions. Otherwise, remove a constraint with λᵢ < 0 from the working set and repeat.
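A minimal sketch of the direction computation (the helper name dfp_direction is mine); with Q = I it reduces to the projected gradient direction:

```python
import numpy as np

def dfp_direction(grad, A, Q):
    """d = -P grad with P = Q^{-1} - Q^{-1} A^T (A Q^{-1} A^T)^{-1} A Q^{-1}."""
    Qinv_g = np.linalg.solve(Q, grad)
    Qinv_At = np.linalg.solve(Q, A.T)
    lam = np.linalg.solve(A @ Qinv_At, A @ Qinv_g)
    return -(Qinv_g - Qinv_At @ lam)

# Example: Q = I projects the negative gradient onto the nullspace of A
A = np.array([[1.0, 1.0, 1.0]])
g = np.array([1.0, -2.0, 0.5])
d = dfp_direction(g, A, np.eye(3))
print(np.allclose(A @ d, 0), g @ d < 0)   # feasible direction and descent
```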
- The active constraints are the two equalities and the inequality x4 ≥ 0,
thus
  A = [ 2 1 1 4 ]
      [ 1 1 2 1 ]
      [ 0 0 0 1 ]

- So,

  AAᵀ = [ 22 9 4 ]
        [  9 7 1 ]
        [  4 1 1 ]

  (AAᵀ)⁻¹ = (1/11) [   6  −5 −19 ]
                   [  −5   6  14 ]
                   [ −19  14  73 ]
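These matrices are easy to verify numerically:

```python
import numpy as np

A = np.array([[2.0, 1.0, 1.0, 4.0],
              [1.0, 1.0, 2.0, 1.0],
              [0.0, 0.0, 0.0, 1.0]])

print(A @ A.T)                         # [[22 9 4], [9 7 1], [4 1 1]]
print(11 * np.linalg.inv(A @ A.T))     # [[6 -5 -19], [-5 6 14], [-19 14 73]]
```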
Nonlinear constraints:
- Consider the problem which defines only the active constraints
min f (x)
s.t. h(x) = 0
- In this case the updated point must be projected onto the constraint surface. For the projected gradient descent method, the projection matrix is given by

  P(k) = I − Jᵀh(x(k)) ( Jh(x(k)) Jᵀh(x(k)) )⁻¹ Jh(x(k))
  min f(x)
  s.t. Ax = b

- Minimum occurs at

  Ax∗ = b
  ∇f(x∗) + Aᵀν∗ = 0
Quadratic Minimization
  min (1/2)xᵀQx + cᵀx + r
  s.t. Ax = b

- Minimum occurs at

  Ax∗ = b
  Qx∗ + c + Aᵀν∗ = 0

  i.e.,

  [ Q  Aᵀ ] [ x∗ ]   [ −c ]
  [ A  0  ] [ ν∗ ] = [  b ]

  where the coefficient matrix is called the KKT matrix.
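A minimal numpy sketch solving this KKT system directly (the small test instance is mine):

```python
import numpy as np

def solve_eq_qp(Q, c, A, b):
    """min (1/2) x^T Q x + c^T x + r  s.t. Ax = b, via the KKT system."""
    N, M = Q.shape[0], A.shape[0]
    KKT = np.block([[Q, A.T],
                    [A, np.zeros((M, M))]])
    sol = np.linalg.solve(KKT, np.concatenate([-c, b]))
    return sol[:N], sol[N:]                  # x*, nu*

# min (x1 - 1)^2 + (x2 - 2)^2  s.t.  x1 + x2 = 1   =>   x* = (0, 1)
Q = 2 * np.eye(2)
c = np.array([-2.0, -4.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
x, nu = solve_eq_qp(Q, c, A, b)
print(x, A @ x)                              # x* satisfies Ax = b
```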
- There are, of course, many possible choices for the elimination matrix F, which can be chosen as any matrix in R^(N×(N−M)) whose range (columnspace) equals the nullspace of the matrix A, i.e., R(F) = N(A). If F is one such matrix and T ∈ R^((N−M)×(N−M)) is nonsingular, then F̃ = FT is also a suitable elimination matrix, since R(F̃) = R(F) = N(A).
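A sketch of the elimination approach on the same QP, building one valid choice of F with scipy.linalg.null_space:

```python
import numpy as np
from scipy.linalg import null_space

Q = 2 * np.eye(2)
c = np.array([-2.0, -4.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

F = null_space(A)                               # R(F) = N(A), orthonormal columns
x_hat = np.linalg.lstsq(A, b, rcond=None)[0]    # a particular solution of Ax = b

# Reduced problem: min_z (1/2) z^T (F^T Q F) z + (Q x_hat + c)^T F z
z = np.linalg.solve(F.T @ Q @ F, -F.T @ (Q @ x_hat + c))
x = F @ z + x_hat
print(x)                                        # same minimizer: [0. 1.]
```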
Example 23:
min f (x)
s.t. Ax = b
- The Newton step ∆xnt at a feasible point x is found from

  [ H(x)  Aᵀ ] [ ∆xnt ]   [ −∇f(x) ]
  [ A     0  ] [  ν∗  ] = [    0   ]

  where the coefficient matrix is the KKT matrix.

- At the optimal point,

  Ax∗ = b
  ∇f(x∗) + Aᵀν∗ = 0
- The Newton decrement λ(x) is the same as the one used for the unconstrained problem, i.e.,

  λ(x) = ( ∆xntᵀ H(x) ∆xnt )^(1/2) = ∥∆xnt∥_H(x)

  being the norm of the Newton step in the norm defined by H(x).
- λ²(x)/2, being a good estimate of f(x) − p∗ (i.e., f(x) − p∗ ≈ λ²(x)/2), can be used in the stopping criterion λ²(x)/2 ≤ ϵ.
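A compact sketch of the resulting feasible-start Newton method (full unit steps for simplicity; a line search could be added):

```python
import numpy as np

def newton_eq(grad_f, hess_f, A, x, eps=1e-8, max_iter=50):
    """Feasible-start Newton for min f(x) s.t. Ax = b.
    x must satisfy Ax = b; stops when lambda(x)^2 / 2 <= eps."""
    M = A.shape[0]
    for _ in range(max_iter):
        g, H = grad_f(x), hess_f(x)
        KKT = np.block([[H, A.T], [A, np.zeros((M, M))]])
        dx = np.linalg.solve(KKT, np.concatenate([-g, np.zeros(M)]))[:x.size]
        if dx @ H @ dx / 2 <= eps:      # lambda(x)^2 / 2
            break
        x = x + dx                      # feasibility is preserved: A dx = 0
    return x
```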
- The equality constrained problem is equivalent to an unconstrained problem over z:

  min f(x) s.t. Ax = b   ≡   min f̃(z) = f(Fz + x̂)

  with R(F) = N(A) and Ax̂ = b, where x = Fz + x̂.
- The Newton decrement λ̃(z) is the same as the Newton decrement of the
original problem
Penalty Methods
min f (x)
s.t. x ∈ F
- The idea is to replace this problem by the following penalty problem

  min f(x) + cP(x)

  where c ∈ R++ is a constant and P(x) : Rᴺ → R is the penalty function.
- Here
- P (x) is continuous
- P (x) ≥ 0 ∀x ∈ RN
- P (x) = 0 iff x ∈ F
- In general, the following quadratic penalty function is used

  P(x) = (1/2) Σᵢ₌₁ᴸ (max{0, gᵢ(x)})² + (1/2) Σⱼ₌₁ᴹ (hⱼ(x))²

  where gᵢ(x) ≤ 0 are the inequality constraints and hⱼ(x) = 0 are the equality constraints.
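A direct sketch of this penalty function, with the constraints passed as lists of callables:

```python
def quadratic_penalty(x, g_list, h_list):
    """P(x) = 1/2 sum_i max(0, g_i(x))^2 + 1/2 sum_j h_j(x)^2."""
    return (0.5 * sum(max(0.0, g(x)) ** 2 for g in g_list)
            + 0.5 * sum(h(x) ** 2 for h in h_list))

# Example: g(x) = 1 - x encodes the constraint x >= 1
print(quadratic_penalty(2.0, [lambda x: 1.0 - x], []),   # 0.0 (feasible)
      quadratic_penalty(0.0, [lambda x: 1.0 - x], []))   # 0.5 (infeasible)
```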
- For example, consider P(x) = (1/2) Σᵢ₌₁² (max{0, gᵢ(x)})² with g₁(x) = x − b and g₂(x) = a − x, i.e., the feasible set is the interval [a, b].
- For large c, the minimum point of the penalty problem will be in a region
where P (x) is small.
- For increasing c, the solution will approach the feasible region F and as
c → ∞ the penalty problem will converge to a solution of the constrained
problem.
Penalty Method:
- Let {c(k)}, k = 1, 2, . . . be a sequence tending to ∞ such that ∀k, c(k) ≥ 0 and c(k+1) > c(k).

- Let

  q(c, x) = f(x) + cP(x)

  and for each k solve the penalty problem

  min q(c(k), x)

  to obtain x(k).
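A minimal sketch of this loop (the test problem is mine), warm-starting each solve from the previous solution; scipy.optimize.minimize plays the role of the unconstrained solver:

```python
import numpy as np
from scipy.optimize import minimize

# min (x1 - 2)^2 + (x2 - 2)^2  s.t. x1 + x2 = 1  (exact solution: [0.5, 0.5])
f = lambda x: (x[0] - 2) ** 2 + (x[1] - 2) ** 2
h = lambda x: x[0] + x[1] - 1
q = lambda x, c: f(x) + 0.5 * c * h(x) ** 2     # q(c, x) with quadratic penalty

x = np.zeros(2)
for c in [1.0, 10.0, 100.0, 1000.0]:            # increasing c^(k)
    x = minimize(lambda z: q(z, c), x).x        # warm start from previous x
print(x)                                        # approaches [0.5, 0.5]
```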
Barrier Methods
min f (x)
s.t. x ∈ F
- The idea is to replace this problem by the following barrier problem

  min f(x) + (1/c) B(x)
  s.t. x ∈ interior of F

  where c ∈ R++ is a constant and B(x) : Rᴺ → R is the barrier function.
- Here, barrier function B(x) is defined on the interior of F such that
- B(x) is continuous
- B(x) ≥ 0
- B(x) → ∞ as x approaches the boundary of F
- Ideally,

  B(x) = { 0,  x ∈ interior F
         { ∞,  x ∉ interior F
- There are several approximations. Two common barrier functions for the
inequality constraints are given below
  Log barrier:     B(x) = − Σᵢ₌₁ᴸ log(−gᵢ(x))

  Inverse barrier: B(x) = − Σᵢ₌₁ᴸ 1/gᵢ(x)
- For example, consider B(x) = −1/g1 (x) − 1/g2 (x), g1 (x) = x − b and
g2 (x) = a − x.
Barrier Method:
- Let {c(k)}, k = 1, 2, . . . be a sequence tending to ∞ such that ∀k, c(k) ≥ 0 and c(k+1) > c(k).
- Let

  r(c, x) = f(x) + (1/c) B(x)

  and for each k solve the barrier problem

  min r(c(k), x)
  s.t. x ∈ interior of F
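A one-dimensional sketch of this loop (the example is mine): min (x − 2)² over F = [0, 1] with a log barrier; the iterates stay in the interior and approach the boundary solution x = 1 as c grows:

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 2.0) ** 2

def r(x, c):                                    # r(c, x) = f + (1/c) B
    if not (0.0 < x[0] < 1.0):
        return np.inf                           # outside the interior of F
    return f(x) - (np.log(x[0]) + np.log(1.0 - x[0])) / c

x = np.array([0.5])                             # strictly feasible start
for c in [1.0, 10.0, 100.0, 1000.0]:            # increasing c^(k)
    x = minimize(lambda z: r(z, c), x, method="Nelder-Mead").x
print(x)                                        # tends to the solution x = 1
```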
- Let the penalty function be P(x) = γ(p(x)), where γ(y) is the squared Euclidean norm

  γ(y) = yᵀy  ⇒  P(x) = pᵀ(x)p(x) = Σᵢ₌₁ᴸ (pᵢ(x))²
- The Hessian of the above problem becomes more and more ill-conditioned as c → ∞.

- Defining

  q(c, x) = f(x) + c γ(p(x))

  the Hessian of q(c, x) involves F(x), G(x) and Γ(x), the Hessians of f(x), g(x) and γ(x) respectively, and Jp(x), the Jacobian of p(x); the terms scaled by c are what make the Hessian ill-conditioned for large c.
Interior-Point Methods

- Consider the problem

  min f(x)
  s.t. g(x) ≤ 0
       Ax = b

- Minimum occurs at

  p∗ = inf {f(x) | g(x) ≤ 0, Ax = b} = f(x∗)
- The function

  ϕ(x) = − Σᵢ₌₁ᴸ log(−gᵢ(x))

  is called the logarithmic barrier function.
- We will modify Newton's algorithm to solve the above problem. So, we will need

  ∇ϕ(x) = − Σᵢ₌₁ᴸ (1/gᵢ(x)) ∇gᵢ(x)

  Hϕ(x) = Σᵢ₌₁ᴸ (1/gᵢ²(x)) ∇gᵢ(x)∇ᵀgᵢ(x) − Σᵢ₌₁ᴸ (1/gᵢ(x)) Hgᵢ(x)

  where Hϕ(x) = ∇²ϕ(x) and Hgᵢ(x) = ∇²gᵢ(x) are the Hessians of ϕ(x) and gᵢ(x), respectively.
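These two formulas translate directly into code; a sketch with the constraints supplied as callables:

```python
import numpy as np

def barrier_grad_hess(x, g_list, grad_list, hess_list):
    """Gradient and Hessian of phi(x) = -sum_i log(-g_i(x))."""
    grad = np.zeros(x.size)
    hess = np.zeros((x.size, x.size))
    for g, dg, Hg in zip(g_list, grad_list, hess_list):
        gi, dgi = g(x), dg(x)
        grad -= dgi / gi
        hess += np.outer(dgi, dgi) / gi ** 2 - Hg(x) / gi
    return grad, hess

# Example: one affine constraint g(x) = a^T x - b0 (its Hessian is zero)
a, b0 = np.array([1.0, 2.0]), 4.0
gr, H = barrier_grad_hess(np.zeros(2),          # strictly feasible point
                          [lambda x: a @ x - b0],
                          [lambda x: a],
                          [lambda x: np.zeros((2, 2))])
```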
Central Path

- For each c > 0, the point x∗(c) on the central path is characterized by the centrality conditions

  Ax∗(c) = b,   gᵢ(x∗(c)) < 0  ∀i
  c∇f(x∗(c)) + ∇ϕ(x∗(c)) + Aᵀν̂ = 0

  for some ν̂ ∈ Rᴹ.
- For example, for the inequality form LP

  min eᵀx
  s.t. Ax ≤ b

  the logarithmic barrier is ϕ(x) = −Σᵢ₌₁ᴸ log(bᵢ − aᵢᵀx), with

  ∇ϕ(x) = Aᵀd,   Hϕ(x) = Aᵀ diag(d)² A

  where dᵢ = 1/(bᵢ − aᵢᵀx).
- Since x is strictly feasible, we have d > 0, so the Hessian of ϕ(x), Hϕ (x)
is nonsingular if and only if A has rank N , i.e. full-rank.
- The dashed curves in the previous figure show three contour lines of the
logarithmic barrier function ϕ(x). The central path converges to the
optimal point x∗ as c → ∞. Also shown is the point on the central path
with c = 10. The optimality condition at this point can be verified
geometrically: The line eT x = eT x∗ (10) is tangent to the contour line of
ϕ(x) through x∗ (10).
- So, the dual function optimal value p̂∗(c) = ℓ(λ∗(c), ν∗(c)) is finite and given as

  ℓ(λ∗(c), ν∗(c)) = f(x∗(c)) + Σᵢ₌₁ᴸ λ∗ᵢ(c) gᵢ(x∗(c)) + ν∗ᵀ(c)(Ax∗(c) − b)
                  = f(x∗(c)) − L/c

  since each term λ∗ᵢ(c) gᵢ(x∗(c)) = −1/c and Ax∗(c) − b = 0.
- The duality gap is L/c, and

  f(x∗(c)) − p∗ ≤ L/c

  goes to zero as c → ∞.
KKT Interpretation
- Complementary slackness λᵢgᵢ(x) = 0 is replaced by −λ∗ᵢ gᵢ(x∗) = 1/c.
- As c → ∞, x∗ (c), λ∗ (c) and ν ∗ (c) almost satisfy the KKT optimality
conditions.
- Let λᵢ = −1/(c gᵢ(x)). Then

  ∇f(x) − Σᵢ₌₁ᴸ (1/(c gᵢ(x))) ∇gᵢ(x) + Aᵀν = 0
  Ax = b
- Replacing x by x + d in the first condition and linearizing around x gives

  ĝ + Ĥd + Aᵀν ≈ 0,   Ad = 0

  i.e., the modified Newton step d solves

  Ĥd + Aᵀν = −ĝ
  Ad = 0

  where

  Ĥ = H(x) − Σᵢ₌₁ᴸ (1/(c gᵢ(x))) Hgᵢ(x) + Σᵢ₌₁ᴸ (1/(c gᵢ²(x))) ∇gᵢ(x)∇ᵀgᵢ(x)

  ĝ = ∇f(x) − Σᵢ₌₁ᴸ (1/(c gᵢ(x))) ∇gᵢ(x)
- In matrix form,

  [ Ĥ  Aᵀ ] [ d ]   [ −ĝ ]
  [ A  0  ] [ ν ] = [  0 ]

- whose solution gives the modified Newton step ∆xnt and ν∗nt.

- Using this Newton step, the Interior-Point Method (i.e., Barrier Method) can be constructed.
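Putting the pieces together for the inequality form LP above (no equality constraints, so the Aᵀν block drops out): a barrier-method sketch of my own, with Newton inner iterations and the L/c gap as the outer stopping rule:

```python
import numpy as np

def lp_barrier(e, A, b, x, c=1.0, mu=10.0, eps=1e-6, tol=1e-9):
    """Sketch: min e^T x s.t. Ax <= b, x strictly feasible (Ax < b)."""
    L = A.shape[0]
    while L / c > eps:                              # duality gap bound L/c
        for _ in range(50):                         # inner loop: Newton
            d = 1.0 / (b - A @ x)
            grad = c * e + A.T @ d                  # grad of c e^T x + phi(x)
            hess = A.T @ ((d ** 2)[:, None] * A)    # A^T diag(d)^2 A
            dx = np.linalg.solve(hess, -grad)
            if -grad @ dx / 2 <= tol:               # Newton decrement test
                break
            t = 1.0                                 # damp step to stay feasible
            while np.any(A @ (x + t * dx) >= b):
                t *= 0.5
            x = x + t * dx
        c *= mu                                     # outer loop: c^(k+1) = mu c^(k)
    return x

# Example: min x1 + x2 over the box 0 <= x <= 1  (solution [0, 0])
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.array([1.0, 1.0, 0.0, 0.0])
print(lp_barrier(np.array([1.0, 1.0]), A, b, x=np.array([0.5, 0.5])))
```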
- Accuracy of centering:
- Computing x∗ (c) exactly is not necessary since the central path has
no significance beyond the fact that it leads to a solution of the
original problem as c → ∞; inexact centering will still yield a
sequence of points x(k) that converges to an optimal point. Inexact
centering, however, means that the points λ∗ (c) and ν ∗ (c),
computed from the first two equations given in the section titled
"Dual points from central path", are not exactly dual feasible. This
can be corrected by adding a correction term to these formulae,
which yields a dual feasible point provided the computed x is near
the central path, i.e., x∗ (c).
- Choice of µ:
  - µ provides a trade-off in the number of iterations for the inner and outer loops.
  - small µ: at each step the inner loop starts from a very good point, so few inner loop iterations are required, but too many outer loop iterations may be required.
  - large µ: fewer outer loop iterations are required, but each inner loop starts further from the new central point and may require many iterations.
- Choice of c(0):
  - large c(0): the first run of the inner loop may require too many iterations.
  - small c(0): the first inner loop is easy, but more outer loop iterations may be required.
Example 25:
Solution:
Example 26:
Solution:
- To find a strictly feasible point for the inequalities gᵢ(x) ≤ 0 (phase I), solve

  min s
  s.t. gᵢ(x) ≤ s,  i = 1, 2, . . . , L
       Ax = b

  over the variables (x, s).
- Then, apply the interior-point method to solve the above problem. There
are three cases depending on the optimal value p̄∗
- If p̄∗ < 0, then a strictly feasible solution is reached. Moreover if
(x, s) is feasible with s < 0, then x satisfies gi (x) < 0. This means
we do not need to solve the optimization problem with high
accuracy; we can terminate when s < 0.
- If p̄∗ > 0, then there is no feasible solution.
- If p̄∗ = 0, and the minimum is attained at x∗ and s∗ = 0, then the
set of inequalities is feasible, but not strictly feasible. However, if
p̄∗ = 0 and the minimum is not attained, then the inequalities are
infeasible.
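For linear inequalities Ax ≤ b this phase I problem is itself an LP in (x, s); a small sketch using scipy.optimize.linprog (the instance is mine):

```python
import numpy as np
from scipy.optimize import linprog

# Phase I for Ax <= b: min s  s.t.  Ax - s*1 <= b, variables (x, s)
A = np.array([[ 1.0,  1.0],
              [-1.0,  0.0],
              [ 0.0, -1.0]])
b = np.array([1.0, 0.0, 0.0])          # triangle: x >= 0, x1 + x2 <= 1

L, N = A.shape
c = np.zeros(N + 1); c[-1] = 1.0       # minimize s
A_ub = np.hstack([A, -np.ones((L, 1))])
res = linprog(c, A_ub=A_ub, b_ub=b, bounds=[(None, None)] * (N + 1))
x, s = res.x[:N], res.x[-1]
print(s < 0, x)                        # s < 0 certifies strict feasibility
```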
Example 27:
Solution: