Constrained Optimization
Joshua Wilde, revised by Isabel Tecu, Takeshi Suzuki and María José Boccardi
1 General Problem
The general problem we study is
max f(x1, . . . , xn) subject to:
g1(x1, . . . , xn) ≤ b1, . . . , gk(x1, . . . , xn) ≤ bk,
h1(x1, . . . , xn) = c1, . . . , hm(x1, . . . , xn) = cm.
The function f (x) is called the objective function, g(x) is called an inequality constraint, and
h(x) is called an equality constraint. In the above problem there are k inequality constraints and
m equality constraints. In the following we will always assume that f, g and h are C¹ functions, i.e.
that they are differentiable and their derivatives are continuous.
Notice that this problem differs from the regular unconstrained optimization problem in that instead
of finding the maximum of f(x) over all points, we are finding the maximum of f(x) only over the points which
satisfy the constraints.
Example: Maximize f(x) = x² subject to 0 ≤ x ≤ 1.
Solution: We know that f (x) is strictly monotonically increasing over the domain, therefore the
maximum (if it exists) must lie at the largest number in the domain. Since we are optimizing over a
compact set, the point x = 1 is the maximal number in the domain, and therefore it is the maximum.
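As a quick numerical sanity check of this example (an added sketch using scipy, not part of the original notes), we can minimize −f over the interval, since numerical solvers conventionally minimize:

```python
# Check the example: max x^2 subject to 0 <= x <= 1.
# scipy minimizes, so we minimize -f(x) = -x^2 over [0, 1].
from scipy.optimize import minimize_scalar

res = minimize_scalar(lambda x: -x**2, bounds=(0, 1), method="bounded")
print(res.x)  # close to 1, the constrained maximizer
```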
This problem was easy because we could visualize the graph of f(x) in our minds and see that it
was strictly monotonically increasing over the domain. However, we need a method to find constrained
maxima of functions even when we can't picture them in our minds.
2 Equality Constraints
Suppose first that there is a single equality constraint, so the problem is
max f(x1, . . . , xn) subject to:
h(x1, . . . , xn) = c.
Now draw level sets of the function f(x1, . . . , xn). Since we might not be able to achieve the
unconstrained maximum of the function due to our constraint, we seek to find the value of x which gets
us onto the highest level curve of f(x) while remaining on the constraint curve h(x) = c. Notice also that at this
point the constraint curve will be just tangent to the level curve of f(x).
Call the point which maximizes the optimization problem x∗ (also referred to as the maximizer).
Since at x∗ the level curve of f(x) is tangent to the constraint curve h(x) = c, it must also be the case that the
gradient of f(x∗) is in the same direction as the gradient of h(x∗), or
∇f(x∗) = λ∇h(x∗)
for some scalar λ, called the Lagrange multiplier. Define the Lagrangian
L(x, λ) = f(x) − λ [h(x) − c].
Then setting the partial derivatives of this function with respect to x equal to zero will yield the first
order conditions for a constrained maximum:
∇f (x∗ ) − λ∇h(x∗ ) = 0.
Setting the partial derivative with respect to λ equal to zero gives us our original constraint back:
h(x∗ ) − c = 0.
So the first order conditions for this problem are simply ∇L(x, λ) = 0.
Remember that points obtained using this formula may or may not be a maximum or minimum, since
the first order conditions are only necessary conditions. They only give us candidate solutions.
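To make this concrete, here is a small worked instance (an added sketch using sympy; the problem max xy subject to x + y = 2 is chosen for illustration and is not from the notes). Solving ∇L = 0 recovers the candidate point together with its multiplier:

```python
# Candidate points for max f(x, y) = x*y subject to h(x, y) = x + y = 2,
# found by solving the first order conditions grad L = 0.
import sympy as sp

x, y, lam = sp.symbols("x y lam", real=True)
L = x*y - lam*(x + y - 2)                     # Lagrangian f - lam*(h - c)
foc = [sp.diff(L, v) for v in (x, y, lam)]    # dL/dx, dL/dy, dL/dlam
sols = sp.solve(foc, [x, y, lam], dict=True)
print(sols)  # the unique candidate: x = y = 1 with lam = 1
```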
There is another more subtle way that this process may fail, however. Consider the case where
∇h(x∗ ) = 0, or in other words, the point which maximizes f (x) is also a critical point of h(x).
Remember our necessary condition for a maximum:
∇f(x∗) = λ∇h(x∗).
Since ∇h(x∗) = 0, this implies that ∇f(x∗) = 0. However, this is the necessary condition for an
unconstrained optimization problem, not a constrained one! In effect, when ∇h(x∗) = 0, the constraint
is no longer taken into account in the problem, and therefore we may arrive at the wrong solution.
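A classic illustration of this failure (an added sketch using sympy; the particular functions are hypothetical, chosen for illustration): maximize f(x, y) = −x on the curve x³ − y² = 0. The constraint forces x ≥ 0, so the maximum is at (0, 0), yet ∇h(0, 0) = 0 and the first order conditions have no solution at all:

```python
# NDCQ failure: max f = -x subject to h = x^3 - y^2 = 0.
# On the constraint set x = y^(2/3) >= 0, so the maximizer is (0, 0),
# but grad h vanishes there and the Lagrange conditions miss it.
import sympy as sp

x, y, lam = sp.symbols("x y lam", real=True)
f = -x
h = x**3 - y**2
gradh0 = [sp.diff(h, v).subs({x: 0, y: 0}) for v in (x, y)]
print(gradh0)                                  # [0, 0]: NDCQ fails at the maximizer
foc = [sp.diff(f - lam*h, v) for v in (x, y)] + [h]
print(sp.solve(foc, [x, y, lam], dict=True))   # the FOC have no solution
```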
Now consider the problem with several equality constraints:
max f(x1, . . . , xn) subject to:
h1(x1, . . . , xn) = c1, . . . , hm(x1, . . . , xn) = cm.
Let's first talk about how the Lagrangian approach might fail. As we saw for one constraint, if
∇hi(x∗) = 0 for some i, then that constraint drops out of the equation. Now consider the Jacobian matrix, whose
rows are the gradients of the different hi(x∗):
Dh(x∗) = [ ∇h1(x∗) ; . . . ; ∇hm(x∗) ] = [ ∂h1(x∗)/∂x1 · · · ∂h1(x∗)/∂xn ; . . . ; ∂hm(x∗)/∂x1 · · · ∂hm(x∗)/∂xn ]
Notice that if any of the ∇hi (x∗ )'s is zero, then that constraint will not be taken into account in
the analysis. Also, there will be a row of zeros in the Jacobian, and therefore the Jacobian will not
be full rank. The generalization of the condition ∇h(x∗) ≠ 0 from the case m = 1 is that
the Jacobian matrix must be of rank m. Otherwise, one of the constraints is not being taken into
account, and the analysis fails. We call this condition the non-degenerate constraint qualification
(NDCQ).
Note that we only have to check whether the NDCQ holds at points in the constraint set, since points
outside the constraint set are not solution candidates anyway. If we find that the NDCQ is violated at
some point within our constraint set, we have to add this point to our
candidate solution set, since the Lagrangian technique simply does not give us any information about this
point.
The Lagrangian for the multi-constraint optimization problem is
L(x1, . . . , xn, λ1, . . . , λm) = f(x1, . . . , xn) − ∑_{i=1}^{m} λi [hi(x1, . . . , xn) − ci]
and the first order conditions are
∂L/∂x1 = 0, . . . , ∂L/∂xn = 0, ∂L/∂λ1 = 0, . . . , ∂L/∂λm = 0.
Example: Maximize f(x, y, z) = xyz subject to x² + y² = 1 and x + z = 1.
First we check the NDCQ. The Jacobian of the constraints is
Dh = [ 2x 2y 0 ; 1 0 1 ].
Notice that the rank is 1 if and only if both x = y = 0. However, if this is the case, then our first
constraint fails to hold. Therefore, the rank is 2 for all points in the constraint set, and so we don't
need to worry about the NDCQ. The Lagrangian is
L = xyz − λ1 [x² + y² − 1] − λ2 [x + z − 1],
with first order conditions
yz − 2λ1 x − λ2 = 0, xz − 2λ1 y = 0, xy − λ2 = 0, x² + y² = 1, x + z = 1.
Assume y ≠ 0. Solving the second equation for λ1 and the third for λ2, substituting both into the first equation, and using y² = 1 − x² and z = 1 − x yields
(1 − x) [(1 − x²) − x² − x(1 + x)] = 0
(1 − x) (1 − 3x² − x) = 0.
This yields x ∈ { 1, (−1 + √13)/6, (−1 − √13)/6 }.
Let's analyze x = 1 first. From the second constraint we have that z = 0, and from the first constraint
we have that y = 0. That contradicts our assumption that y ≠ 0, so this cannot be a solution.
Plugging the other two values of x into the constraints, we get four candidate solutions:
x ≈ 0.4343, y ≈ ±0.9008, z ≈ 0.5657 and x ≈ −0.7676, y ≈ ±0.6409, z ≈ 1.7676.
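We can verify that these candidates satisfy the first order conditions (an added numerical check using sympy, not part of the original notes):

```python
# Verify that the candidate x = (-1 + sqrt(13))/6 from the example satisfies
# all first order conditions of L = xyz - l1*(x^2 + y^2 - 1) - l2*(x + z - 1).
import sympy as sp

x, y, z, l1, l2 = sp.symbols("x y z l1 l2", real=True)
L = x*y*z - l1*(x**2 + y**2 - 1) - l2*(x + z - 1)
grad = [sp.diff(L, v) for v in (x, y, z, l1, l2)]

xv = (-1 + sp.sqrt(13)) / 6      # candidate root of 1 - 3x^2 - x = 0
yv = sp.sqrt(1 - xv**2)          # from the first constraint
zv = 1 - xv                      # from the second constraint
l1v = xv*zv / (2*yv)             # from the FOC xz - 2*l1*y = 0
l2v = xv*yv                      # from the FOC xy - l2 = 0

vals = {x: xv, y: yv, z: zv, l1: l1v, l2: l2v}
residuals = [float(g.subs(vals)) for g in grad]
print(residuals)                 # all numerically zero
```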
3 Inequality Constraints
Now consider a problem with a single inequality constraint:
max f(x1, . . . , xn) subject to:
g(x1, . . . , xn) ≤ b.
In this case the solution is not constrained to the curve g(x) = b, but merely bounded by it.
In order to understand the new conditions, imagine the graph of the level sets which we talked about
before. Instead of being constrained to the curve g(x) = b, the domain is now bounded by it instead.
However, the boundary is still the same curve as before. Notice that there is still a point
where the boundary is tangent to some level set of f(x). The question now is whether the constraint
is binding or not binding.
Remember that we are trying to find candidate points for a global maximum. Restricting our original
domain X to the set of points where g(x) ≤ b gives us two types of candidate points: points on the
boundary g(x) = b where g(x) = b is tangent to the level curves of f(x), and local maxima of f(x) for
which g(x) ≤ b. The first type we can find by the constrained FOC ∇f(x) = λ∇g(x), and the second
type we can find by the unconstrained FOC ∇f(x) = 0. Let's look at each of these in turn.
Case 1: Candidates along the boundary (constraint binding). This is the case where an
unconstrained maximum lies outside of the constraint set. In other words, the inequality constrains
us from reaching a maximum of f. In this case, the gradient of f(x) points in the steepest
direction up the graph. The gradient of g(x) points towards the set g(x) ≥ b (since it points in the direction
of increasing g(x)). Therefore ∇g(x) is pointing in the same direction as ∇f(x), which implies that
λ ≥ 0. So necessary conditions for a solution on the boundary are
∇f(x) − λ∇g(x) = 0, g(x) − b = 0 and λ ≥ 0.
Case 2: Candidates not on the boundary (constraint not binding). This is the case where
an unconstrained maximum lies inside the constraint set. In other words, the inequality does not
constrain us from reaching this maximum of f. The first order condition is simply ∇f(x) = 0, which
we can rewrite (to take the same shape as above) as
∇f(x) − λ∇g(x) = 0 and λ = 0.
In summary, either the constraint is binding (tight), that is g(x) − b = 0 and λ ≥ 0, or the constraint
is not binding (slack), and then λ = 0. We can summarize this in what is called
the complementary slackness condition
[g(x) − b] λ = 0.
This works because if the constraint is binding, then g(x) − b = 0, and if the constraint is not binding,
then we want to ignore it by having λ = 0.
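The two cases can be worked through mechanically. As an added illustration (a sketch using sympy; the problem max −(x − 2)² subject to x ≤ 1 is hypothetical, chosen for illustration), we solve each branch of the complementary slackness condition separately:

```python
# Case analysis for max f(x) = -(x - 2)^2 subject to g(x) = x <= 1.
import sympy as sp

x, lam = sp.symbols("x lam", real=True)
f, g, b = -(x - 2)**2, x, 1
stationarity = sp.diff(f, x) - lam*sp.diff(g, x)   # dL/dx = 0

slack = sp.solve([stationarity, lam], [x, lam], dict=True)          # lam = 0 branch
tight = sp.solve([stationarity, sp.Eq(g, b)], [x, lam], dict=True)  # g = b branch
print(slack)  # x = 2: violates x <= 1, so rejected
print(tight)  # x = 1 with lam = 2 >= 0: the candidate
```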
As we can see, whether or not the constraint binds, the Lagrange
multiplier must always be greater than or equal to 0. Therefore, another new set of conditions is
λi ≥ 0 ∀ i.
We can summarize all these FOC in terms of the Lagrangian, which we define as before to be
L(x, λ) = f(x) − λ (g(x) − b):
∂L(x, λ)/∂xi = ∂f(x)/∂xi − λ ∂g(x)/∂xi = 0    for all i = 1, . . . , n
λ ∂L(x, λ)/∂λ = −λ [g(x) − b] = 0    (complementary slackness)
∂L(x, λ)/∂λ = −[g(x) − b] ≥ 0    (original constraint)
λ ≥ 0
Now consider the problem with several inequality constraints:
max f(x1, . . . , xn) subject to:
g1(x1, . . . , xn) ≤ b1, . . . , gk(x1, . . . , xn) ≤ bk.
In order to understand the new NDCQ, we must realize that if a constraint does not bind, we don't
care if it drops out of the equation. The point of the NDCQ was to ensure that binding constraints do
not drop out. Therefore, the NDCQ with inequality constraints is the same as with equality constraints,
except for the fact that we only care about the Jacobian matrix of the binding constraints, i.e. the
Jacobian of the constraints with λi > 0. (Notice that we cannot tell in advance which constraints will
be binding, so we need to check all of them when we check the NDCQ before computing the solution
candidates.)
The following first order conditions will tell us the candidate points for a maximum:
∂L/∂x1 = 0, . . . , ∂L/∂xn = 0
λ1 [g1(x1, . . . , xn) − b1] = 0, . . . , λk [gk(x1, . . . , xn) − bk] = 0
g1(x1, . . . , xn) ≤ b1, . . . , gk(x1, . . . , xn) ≤ bk
λ1 ≥ 0, . . . , λk ≥ 0
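As a numerical illustration (an added sketch using scipy; the problem max −(x − 2)² − (y − 2)² subject to x + y ≤ 2 is hypothetical, chosen for illustration), a constrained solver finds the boundary candidate where the constraint binds:

```python
# Maximize f(x, y) = -(x - 2)^2 - (y - 2)^2 subject to x + y <= 2.
# The unconstrained maximum (2, 2) is infeasible, so the constraint binds
# and the constrained maximizer is x = y = 1.
from scipy.optimize import minimize

res = minimize(
    lambda v: (v[0] - 2)**2 + (v[1] - 2)**2,               # minimize -f
    x0=[0.0, 0.0],
    constraints=[{"type": "ineq", "fun": lambda v: 2 - v[0] - v[1]}],
)
print(res.x)  # close to [1, 1]
```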
4 Mixed constraints
Now consider the general problem, in which we have equality and inequality constraints:
max_{x∈Rⁿ} f(x1, . . . , xn) subject to:
g1 (x1 , . . . , xn ) ≤ b1 , . . . , gk (x1 , . . . , xn ) ≤ bk ,
h1 (x1 , . . . , xn ) = c1 , . . . , hm (x1 , . . . , xn ) = cm .
Summarizing the conditions for equality and inequality constraints that we found above, we can
formulate the following theorem:
Theorem: Suppose that x∗ is a local maximizer of f on the constraint set given by the
k inequalities and m equalities above. Without loss of generality, assume that the first k₀
inequality constraints are binding and that the other k − k₀ inequality constraints are not
binding. Further suppose that the Jacobian of the equality constraints and the binding
inequality constraints at x∗ has full rank. Form the Lagrangian
L(x, λ1, . . . , λk, µ1, . . . , µm) = f(x) − ∑_{i=1}^{k} λi [gi(x) − bi] − ∑_{i=1}^{m} µi [hi(x) − ci]
Then there exist multipliers λ∗1, . . . , λ∗k and µ∗1, . . . , µ∗m such that
1. ∂L/∂xi (x∗, λ∗, µ∗) = 0 for all i ∈ {1, . . . , n}
2. λ∗i [gi(x∗) − bi] = 0 for all i ∈ {1, . . . , k}
3. hi(x∗) = ci for all i ∈ {1, . . . , m}
4. gi(x∗) ≤ bi for all i ∈ {1, . . . , k}
5. λ∗i ≥ 0 for all i ∈ {1, . . . , k}
Note again that this theorem gives us merely first order, necessary conditions: If x∗ is a maximum
and if the NDCQ holds at x∗, then there exist Lagrange multipliers for which the conditions hold
true. Finding all tuples (x, λ, µ) for which the conditions hold will therefore give us a set of candidate
solutions. We still need to check whether these candidates are actually maximizers. (Conditions that
can be used to do so will be taught in your first semester math class, or you can read Simon & Blume,
chapter 19.) Notice also that we may not find all candidate solutions using the Lagrangian method if
the NDCQ is violated at some points in the constraint set.
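A small mixed-constraint instance (an added sketch using scipy; the problem max x + y subject to the equality x² + y² = 1 and the inequality x ≤ 0 is hypothetical, chosen for illustration) shows a solver handling both constraint types at once:

```python
# Maximize f(x, y) = x + y subject to x^2 + y^2 = 1 (equality) and x <= 0.
# Without the inequality the maximizer is (sqrt(2)/2, sqrt(2)/2); with it,
# the inequality binds and the solution moves to (0, 1).
from scipy.optimize import minimize

res = minimize(
    lambda v: -(v[0] + v[1]),                                  # minimize -f
    x0=[-0.5, 0.5],
    constraints=[
        {"type": "eq", "fun": lambda v: v[0]**2 + v[1]**2 - 1},
        {"type": "ineq", "fun": lambda v: -v[0]},              # x <= 0
    ],
)
print(res.x)  # close to [0, 1]
```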
Meaning of the multiplier: See Simon & Blume, Chapter 19.1.
5 Minimization Problems
So far we have only discussed maximization problems. What happens if we are instead looking to
minimize a function given certain constraints? There are different ways in which we can adapt the
above Lagrangian technique to minimization problems. The easiest is probably
the following:
Suppose we are given a minimization problem
min_{x∈Rⁿ} f(x1, . . . , xn) subject to:
g1 (x1 , . . . , xn ) ≤ b1 , . . . , gk (x1 , . . . , xn ) ≤ bk ,
h1(x1, . . . , xn) = c1, . . . , hm(x1, . . . , xn) = cm.
Finding the minimum of f on a certain domain is really the same as finding the maximum of −f on
that domain. So we can transform the above problem into the maximization problem
max_{x∈Rⁿ} −f(x1, . . . , xn) subject to:
g1 (x1 , . . . , xn ) ≤ b1 , . . . , gk (x1 , . . . , xn ) ≤ bk ,
h1(x1, . . . , xn) = c1, . . . , hm(x1, . . . , xn) = cm.
We can find candidate solutions for this problem as discussed above. They will also be candidate
solutions of the original minimization problem.
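For instance (an added sketch using sympy; minimizing (x − 3)² subject to x ≤ 1 is hypothetical, chosen for illustration), applying the Lagrangian for max −f recovers the minimizer of f:

```python
# Minimize f(x) = (x - 3)^2 subject to x <= 1 by maximizing -f.
# Binding-constraint branch of the FOC for max -f(x) subject to x <= 1;
# the slack branch (lam = 0) gives x = 3, which is infeasible.
import sympy as sp

x, lam = sp.symbols("x lam", real=True)
f = (x - 3)**2
L = -f - lam*(x - 1)                        # Lagrangian for max -f
tight = sp.solve([sp.diff(L, x), sp.Eq(x, 1)], [x, lam], dict=True)
print(tight)  # x = 1 with lam = 4 >= 0: candidate minimizer of f
```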
6 Kuhn-Tucker Notation
The most common problems in economics are maximization problems dealing with only inequality
constraints. Many of these constraints come in the form of non-negativity constraints, such as requiring
consumption to be weakly positive. Consider the following problem:
max_{x∈Rⁿ} f(x1, . . . , xn) subject to:
g1(x1, . . . , xn) ≤ b1, . . . , gm(x1, . . . , xn) ≤ bm,
x1 ≥ 0, . . . , xn ≥ 0.
We could treat the non-negativity constraints like any other inequality constraints, assigning a multiplier vj to
each constraint −xj ≤ 0 and forming the Lagrangian
L(x, λ, v) = f(x) − ∑_{j=1}^{m} λj [gj(x) − bj] + ∑_{j=1}^{n} vj xj.
The Kuhn-Tucker Lagrangian L̃ instead omits the multipliers on the non-negativity constraints,
L̃(x, λ) = f(x) − ∑_{j=1}^{m} λj [gj(x) − bj],
so the first order conditions ∂L/∂xj = ∂L̃/∂xj + vj = 0 imply
∂L̃/∂xj = −vj ∀ j = 1, . . . , n.
Plugging these into the conditions vj ≥ 0 and vj xj = 0 for the non-negativity constraints, we have that
xj ∂L̃/∂xj = 0 and ∂L̃/∂xj ≤ 0.
In summary, the following are the first order conditions for the Kuhn-Tucker Lagrangian:
∂L̃/∂xj ≤ 0 and xj ∂L̃/∂xj = 0 ∀ j = 1, . . . , n
∂L̃/∂λj ≥ 0 and λj ∂L̃/∂λj = 0 ∀ j = 1, . . . , m
λj ≥ 0 ∀ j = 1, . . . , m
This is 2n + 3m conditions, n fewer than before. If we are dealing with a lot of non-negativity
constraints, it is therefore faster to use the Kuhn-Tucker Lagrangian.
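As an added numerical illustration (a sketch using scipy; the problem max xy subject to x + y ≤ 2, x ≥ 0, y ≥ 0 is hypothetical, chosen for illustration), non-negativity constraints map naturally onto solver bounds:

```python
# Maximize f(x, y) = x*y subject to x + y <= 2, x >= 0, y >= 0.
# The non-negativity constraints enter as bounds; the remaining
# inequality binds and the maximizer is x = y = 1.
from scipy.optimize import minimize

res = minimize(
    lambda v: -v[0]*v[1],                                     # minimize -f
    x0=[0.5, 0.5],
    bounds=[(0, None), (0, None)],
    constraints=[{"type": "ineq", "fun": lambda v: 2 - v[0] - v[1]}],
)
print(res.x)  # close to [1, 1]
```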
Exercises
1. There are two commodities: x and y. Let the consumer's consumption set be R²₊ and his
preference relation on his consumption set be represented by u(x, y) = −(x − 4)² − y². When
his initial wealth is 2 and the relative price is 1, solve his utility maximization problem if it is
well defined.
2. Let f : R₊ → R and f(x) = −(x + 1)² + 2. Solve the maximization problem if it is well defined.
3. Let f : R²₊ → R and f(x, y) = 2y − x². When (x, y) must be on the unit disc, i.e., x² + y² ≤ 1,
solve the minimization problem if it is well defined.¹
7 Homework
¹ This is the same problem as in Example 18.11 of Simon and Blume (1994).