Jan Brinkhuis
Convex Analysis
for Optimization
A Unified Approach
Graduate Texts in Operations Research
Series Editors
Richard Boucherie
University of Twente
Enschede, The Netherlands
Johann Hurink
University of Twente
Enschede, The Netherlands
This series contains compact volumes on the mathematical foundations of Oper-
ations Research, in particular in the areas of continuous, discrete and stochastic
optimization. Inspired by the PhD course program of the Dutch Network on the
Mathematics of Operations Research (LNMB), and by similar initiatives in other
territories, the volumes in this series offer an overview of mathematical methods for
post-master students and researchers in Operations Research. Books in the series are
based on the established theoretical foundations in the discipline, teach the needed
practical techniques and provide illustrative examples and applications.
Jan Brinkhuis
Econometric Institute
Erasmus University Rotterdam
Rotterdam, The Netherlands
To Grażyna
Preface
A Unified Method Borrowed from Geometry Convex analysis, with its wealth of formulas and calculus rules, immediately intrigued me, having a background in pure mathematics and working at a school of economics. This background helped me to see that certain phenomena in various branches of geometry that can be dealt with successfully by one tested method, called homogenization, run parallel to phenomena in convex analysis. This suggested using the same tested method in convex analysis as well, and it turned out to work well beyond my initial expectations. Here are sketches of two examples of such parallels and of their simple but characteristic use in convex analysis.
First Example of Homogenization in Convex Analysis Taking the homogenization of circles, ellipses, parabolas, and hyperbolas amounts to viewing them as conic sections—intersections of a plane and the boundary of a cone. This has been known for more than 2000 years to be a fruitful point of view. Parallel to this, taking the homogenization of a convex set amounts to viewing this convex set as the intersection of a hyperplane and a homogeneous convex set, also called a convex cone. This, too, turns out to be a fruitful point of view. For example, consider the task of proving the following proposition: a convex set is closed under finite convex combinations. The usual proof, by induction, requires some formula manipulation. The homogenization method reduces this to the task of proving that a convex cone is closed under positive linear combinations, and the proof of this, by induction, requires no effort, as the sketch below shows. In general, one can always reduce a task that involves convex sets to a task that involves convex cones, and as a rule the latter task is easier. This reduction is the idea of the single method for convex analysis that is proposed in this book.
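To make the parallel concrete, here is the reduction in symbols—a sketch only, using the notation c(A) for the homogenization of a convex set A that is introduced in Chap. 1.

```latex
% Induction step for a convex cone C; the case k = 2, that is,
% a, b \in C, \sigma, \tau > 0 \Rightarrow \sigma a + \tau b \in C, is the definition:
\sigma_1 a_1 + \cdots + \sigma_k a_k
  = \underbrace{(\sigma_1 a_1 + \cdots + \sigma_{k-1} a_{k-1})}_{\in\, C
      \text{ by the induction hypothesis}} \; + \; \sigma_k a_k \;\in\; C.

% Dehomogenization: for a convex combination of points a_i of a convex set A
% (\rho_i \ge 0, \sum_i \rho_i = 1; drop the terms with \rho_i = 0), the lifted
% points (a_i, 1) lie in the convex cone c(A), so
\rho_1 (a_1, 1) + \cdots + \rho_k (a_k, 1)
  = \Bigl(\sum\nolimits_i \rho_i a_i,\, 1\Bigr)
  \in c(A) \cap (X \times \{1\}) = A \times \{1\},
% hence \rho_1 a_1 + \cdots + \rho_k a_k \in A.
```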
Second Example of Homogenization in Convex Analysis In algebraic and
differential geometry, the homogenization operation often gives sets that are not
closed. Then it is natural to make these sets closed by adding the so-called points
at infinity. This is known to be a fruitful method for dealing with the behavior “at
infinity.” The added points model two-sided or one-sided directions. Parallel to this,
taking the homogenization of an unbounded closed convex set gives a convex cone
that is not closed, and again it is natural to make it closed by taking its closure and
to view the added points as points at infinity of the given convex set. The added
points turn out to correspond exactly to a well-known important auxiliary concept
in convex analysis: recession directions of the convex set. These are directions in
which one can travel forever inside the convex set. It is a conceptual advantage of
the homogenization method that it makes this auxiliary concept arise in a natural
way and that it makes it easier to work with it. This turns out to be a fruitful
method to work with unbounded convex sets. For example, consider the task of proving the following theorem of Krein–Milman: each point of a closed convex set
is the sum of a convex combination of extreme points and a conic combination of
extreme recession directions. The usual proof, by induction, requires a relatively
intricate argument involving recession directions. The homogenization method
gives a reduction to the task of proving that a closed convex cone is generated by
its extreme rays. The proof of this, by induction, requires no effort. In general, one
can always reduce a task that involves an unbounded convex set—and that therefore
requires the use of recession directions—to a task that involves a closed convex
cone, and as a rule this is an easier task. In other words, the distinction between
bounded and unbounded convex sets disappears, in some sense.
The homogenization method made everything fall into its proper place for me.
On the Need for Calculus Rules for Convex Sets For example, I was initially
puzzled by the calculus rules for convex sets—formulas for a convexity preserving
operation (such as the intersection of two convex sets) followed by the duality
operator for convex sets. These rules are of fundamental interest, but do they have
practical applications? If not, it might be better to omit them from a textbook on
convex analysis, especially as they are relatively intricate, involving, for example,
the concept of sublinear functions. The homogenization method gives the following
insight. In order to solve concrete convex optimization problems, one has to
compute subdifferentials, certain auxiliary convex sets; this requires calculus rules
for convex sets, and these are essentially the formulas mentioned above, as you will
see.
A Unified Definition of the Properties Closed and Proper Another example is
that the homogenization method reveals that the standard theory of convex analysis
is completely satisfactory and straightforward, apart from one complication. To
understand what this is, one has to keep in mind that convex sets and functions that
occur in applications have two nice properties, closedness and properness; these
properties make it easy to work with them. However, when you work with them,
you might be led to new convex sets and functions that do not possess these nice
properties; then these are not so easy to work with. Another complication is that
previously the definitions of the properties closed and proper for various convex
objects (sets and functions) had to be made individually and seemed to involve
Prerequisites This book is meant for master, graduate, and PhD students who need
to learn the basics of convex analysis, for use in some of the many applications
of convex analysis, such as, for example, machine learning, robust optimization,
and economics. Prerequisites to reading the book are: some familiarity with linear
algebra, the differential calculus, and Lagrange multipliers. The necessary back-
ground information can be found elsewhere in standard texts, often in appendices:
for example, pp. 503–527 and pp. 547–550 from the textbook [1] “Optimization:
Insights and Applications,” written jointly with Vladimir M. Tikhomirov.
Convex Analysis: What It Is To begin with, a convex set is a subset of n-
dimensional space Rn that consists of one piece and has no holes or dents. A convex
function is a function on a convex set for which the region above its graph is a
convex set. Finally, a convex optimization problem is the problem of minimizing a convex function. Convex analysis is the theory of convex sets, convex functions, and, as its climax, convex optimization problems. The central result is that each boundary point of a
convex set lies on a hyperplane that has the convex set entirely on one of its two
sides. This hyperplane can be seen as a linearization at this point of the convex set,
or even better, as a linearization at this point of the boundary of the convex set.
This is the only successful realization of the idea of approximating objects that are
nonlinear (and so are difficult to work with) by objects that are linear (and so are
easy to work with thanks to linear algebra) other than the celebrated realization of
this idea that is called differential calculus. The latter proceeds by taking derivatives
of functions and leads to the concept of tangent space.
Convex Analysis: Why You Should Be Interested One reason that convex
analysis is a standard tool in so many different areas is the analogy with differential
calculus. Another reason for the importance of convex analysis is that if you want to
find in an efficient and reliable way a good approximation of an optimal solution of
an optimization problem, then it is almost always possible to find it if this problem
is convex, but it is rarely possible otherwise.
[Figure: the graph of z = F(x, y), the point (x̂, 0, F(x̂, 0)) above the minimizer x̂, and the tangent plane at this point, with slope η in the y-direction.]
This follows from the minimality property of x̂. Therefore, the equation of this plane is of the form z − F(x̂, 0) = ηy for some real number η. This number η is the Lagrange multiplier for the present situation. This procedure can be
generalized to arbitrary smooth or convex functions F (x, y) with x ∈ Rn and
y ∈ Rm . For smooth F one has to use smooth linearization, for convex F one
has to use convex linearization. This gives the main result of smooth optimization,
respectively, convex optimization. Thus the difference between the derivations of
the two results is only in the method of linearization that is used.
Novelty: One Single Method Convex analysis requires a large number of different operations, for example, maximum, sum, convex hull of the minimum, conjugate (dual), subdifferential, etc., and formulas relating them, such as the Fenchel–Moreau theorem f** = f, the Moreau–Rockafellar theorem ∂(f + g)(x) = ∂f(x) + ∂g(x), the Dubovitskii–Milyutin theorem ∂ max(f, g)(x) = co(∂f(x) ∪ ∂g(x)) if f(x) = g(x), etc., depending on the task at hand. Previously, all these operations and formulas had to be learned one by one, and all proofs had to be studied separately.
The results on convex analysis in this textbook are the same as in all textbooks on
convex analysis. However, what is different in this textbook is that a single approach
is given that can be applied to all tasks involved in convex analysis: construction
of operations and formulas relating them, formulation of results and construction
of proofs. Using this approach should save a lot of time. Once you have grasped
the meaning of this approach, you will see that the structure of convex analysis is
transparent.
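As a first taste of such a formula in action, here is a small numerical experiment (an illustration of ours, not taken from the book): it approximates the conjugate f*(y) = sup_x (xy − f(x)) of the convex function f(x) = x^2 on a grid and checks that the biconjugate f** recovers f, as the Fenchel–Moreau theorem predicts.

```python
import numpy as np

x = np.linspace(-3, 3, 601)    # primal grid
y = np.linspace(-6, 6, 1201)   # dual grid of slopes

f = x**2
# Conjugate f*(y) = sup_x (x*y - f(x)), the sup approximated by a max over the grid.
f_star = np.max(np.outer(y, x) - f[None, :], axis=1)
# Biconjugate f**(x) = sup_y (x*y - f*(y)).
f_biconj = np.max(np.outer(x, y) - f_star[None, :], axis=1)

print(np.max(np.abs(f_biconj - f)))  # small discretization error: f** = f
```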
Description of the Unified Method The foundation of the unified approach is the
homogenization method. This method consists of three steps:
1. homogenize, that is, translate a given task into the language of nonnegative
homogeneous convex sets, also called convex cones,
2. work with convex cones, which is relatively easy,
3. dehomogenize, that is, translate back to get the answer to the task at hand.
The use of homogenization in convex analysis is borrowed from its use in
geometry. Therefore, we first take a look at its use in geometry.
History of the Unified Method As long ago as about 200 BCE, Apollonius of Perga used an embryonic form of homogenization in his eight-volume work "Conics," the apex of ancient Greek mathematics. He showed that totally different
curves—circles, ellipses, parabolas, and hyperbolas—have many common proper-
ties as they are all conic sections, intersections of a plane with a cone.
Figure 2 shows an ice cream cone and boundaries of intersections with four
planes. This gives, for the horizontal plane a circle, for the slightly slanted plane an
ellipse, for the plane that is parallel to a ray on the boundary of the cone a parabola,
and for the plane that is even more slanted a branch of a hyperbola.
Thus, apparently unrelated curves can be seen to have common properties
because they are formed in the same way: as intersections of one cone with various
planes. This phenomenon runs parallel to the fact that totally different convex
sets—that is, sets in column space Rn that consist of one piece and have no
holes or dents—have many common properties and that this can be explained by
homogenization: each convex set is the intersection of a hyperplane and a convex
cone, that is, a convex set that is positive homogeneous—containing all positive
scalar multiples for each of its points.
Figure 3 shows the homogenization process in a graphical form. A convex set
A is drawn in the horizontal plane R2 . If one vertical dimension is added and
three-dimensional space R3 is considered, then the set A can be lifted up in three-
dimensional space in a vertical direction to level 1; this gives the set A × {1}. Next,
the convex cone C that is the union of all open rays with endpoint the origin O
and running through a point of A × {1} is drawn. This convex cone C is called the
homogenization or conification of A. Note that there is no loss of information in
going from A to C. If you intersect C with the horizontal plane at level 1 and drop
the intersection down vertically onto the horizontal plane at level 0, then you get A
back again. This description of a convex set as the intersection of a hyperplane and
a convex cone helps by reducing the task of proving any property of a convex set to
the task of proving a property of a convex cone, which is easier.
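In code the passage back and forth is a one-liner in each direction. The following sketch (the helper names are ours) conifies a point of A by lifting it to level 1 and scaling, and deconifies a point of the cone by scaling its last coordinate back to 1:

```python
import numpy as np

def conify_point(a, t):
    """A point of the homogenization c(A): t * (a, 1), for a in A and t > 0."""
    return t * np.append(np.asarray(a, dtype=float), 1.0)

def deconify_point(c):
    """The point of A on the open ray through c (needs a positive last coordinate)."""
    c = np.asarray(c, dtype=float)
    assert c[-1] > 0, "only non-horizontal rays correspond to points of A"
    return (c / c[-1])[:-1]

a = np.array([2.0, -1.0])       # a point of a convex set A in R^2
c = conify_point(a, 3.0)        # a point of the cone c(A) in R^3
print(deconify_point(c))        # [ 2. -1.]: no information is lost
```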
Working with Unboundedness in Geometry by the Unified Method In 1415,
the Florentine architect Filippo Brunelleschi made the first picture that used linear
perspective. In this technique, horizontal lines that have the same direction intersect
at the horizon in one point called the vanishing point.
However, open rays are infinite sets, and Möbius preferred to model points as
points and not as infinite sets. Therefore, he took the intersection of each ray with
the standard unit sphere x^2 + y^2 + z^2 = 1. This creates a modified model of the set that consists of the points in the plane and the one-sided directions in the plane, the hemisphere model. In this model, the point (x, y) in the plane is modeled as the point (x^2 + y^2 + 1)^{-1/2}(x, y, 1) of the open upper hemisphere x^2 + y^2 + z^2 = 1, z > 0, and the one-sided direction in the plane given by the unit vector (x̃, ỹ) is modeled as the point (x̃, ỹ, 0) on the circle that bounds the hemisphere, x^2 + y^2 = 1, z = 0.
The advantage of the hemisphere model over the ray model is that points and
directions are modeled as points instead of as infinite sets. Note that the circle
created by the hemisphere model is the analogue for the plane of the auxiliary sphere
that Gauss recommended for three-dimensional space. Comparing the hemisphere
model of Möbius and the Gauss model, the hemisphere model is more convenient to
use because it does not require the artificial constructs required by the Gauss model
(“we employ as an auxiliary, a sphere of unit radius described about an arbitrary
center”).
One more simplification is possible. Looking down at the upper hemisphere from
high above makes it look like a disk in the plane. To be more precise, one gets this
top-view model of the plane and its one-sided directions by orthogonal projection of
the hemisphere on the horizontal coordinate plane z = 0: then a point (x, y) in the plane is modeled as the point (x^2 + y^2 + 1)^{-1/2}(x, y) of the standard open unit disk x^2 + y^2 < 1 in the plane R2, and a one-sided direction in the plane given by the unit vector (x̃, ỹ) is modeled as this unit vector, which is a point of the circle that bounds the disk, x^2 + y^2 = 1. The advantage of this top-view model over the hemisphere
model is that it is one dimension lower. This top-view model for the plane and its
one-sided directions is the closed unit disk in the plane itself, but the hemisphere
model for the plane and its one-sided directions requires three-dimensional space.
This top-view model for three-dimensional space and its one-sided directions is
the closed unit ball in three-dimensional space itself, but the hemisphere model for
three-dimensional space and its directions requires four-dimensional space, which
is hard to visualize.
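In coordinates the two models are easy to program. The sketch below (the function names are ours) maps a point of the plane to its hemisphere and top-view representatives, and illustrates numerically that points escaping to infinity in the direction (1, 0) tend, in the top-view model, to the boundary point (1, 0) of the disk:

```python
import numpy as np

def hemisphere_model(x, y):
    """(x, y) -> (x^2 + y^2 + 1)^(-1/2) (x, y, 1) on the open upper hemisphere."""
    return np.array([x, y, 1.0]) / np.sqrt(x * x + y * y + 1.0)

def top_view_model(x, y):
    """(x, y) -> (x^2 + y^2 + 1)^(-1/2) (x, y) in the open unit disk."""
    return np.array([x, y]) / np.sqrt(x * x + y * y + 1.0)

for t in [1.0, 10.0, 1000.0]:
    print(top_view_model(t, 0.0))   # approaches the boundary point (1, 0)
```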
The following line from the novel, “Dom dzienny, dom nocny” (House of day,
house of night) by Olga Tokarczuk can be read as evoking the idea of modeling the
points and one-sided directions of the plane by means of the top-view model:
Widzę kolistą linię horyzontu, który zamyka dolinę ze wszystkich stron.
I see the circular line of the horizon, which encloses the valley on all sides.
Similarly to the way in which the three models were created for the plane, models
can be created for the line, for three-dimensional space, and more generally for a
vector space of any finite dimension.
Figure 5 illustrates the ray, hemisphere, and top-view models in the simplest case,
for dimension one, the line. In this figure, the extended real line R̄ = R ∪ {−∞, +∞}
is modeled. This is the real line together with the two one-sided directions in the real
line—right (+∞) and left (−∞). A real number r is modeled in the ray model by the open ray through the point (r, 1).
Recession Directions of a Convex Set We want to illustrate how taking the closure
of one of the models of a convex set A corresponds precisely to taking the closure
of A and simultaneously adjoining the recession directions of A. To begin with, we
give the definition of a recession direction. A recession direction of a convex set A is
a one-sided direction in which one can travel forever inside the convex set A without
leaving it, after having started at an arbitrary point of A. A recession direction exists precisely if A is unbounded. The set of all vectors having such a direction forms, together with the origin, the so-called recession cone RA. So a recession direction of A can be defined as an open ray of RA, that is, the set of all positive scalar multiples of a nonzero vector of RA.
Illustration in Dimension One Figure 6 illustrates in dimension one, the line R,
that recession vectors of a closed convex set arise by taking the closure of the ray
model of this convex set. The ray model of an unbounded closed convex set A in
the plane is drawn, the half-line A = [a, +∞) for some a > 0. The set A has
one recession direction, “right” (+∞). The direction “right” is modeled in the ray
model by the horizontal open ray with endpoint the origin that points to the right.
This is the unique ray of the recession cone RA . You can see intuitively in this
figure that this horizontal ray belongs to the closure of the ray model of A. Indeed,
the beginning of a sequence of rays is drawn—the arrow suggests the direction of
the sequence—and the rays of this sequence are less and less slanted, and their limit
is precisely the horizontal ray that points to the right. This shows that the direction
“right” is a point in the closure of the ray model of A; moreover, it does not lie in
the ray model of A as all rays in this model are non-horizontal. In fact, it is the only
such point.
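The convergence of the rays can be checked numerically: the ray through the point a of A is spanned by (a, 1), and the normalized spanning vectors tend to the horizontal direction (1, 0) as a → +∞. A small sketch of ours:

```python
import numpy as np

# The ray of the ray model through the point a of A = [a0, +inf) is spanned by (a, 1).
for a in [1.0, 10.0, 100.0, 10000.0]:
    v = np.array([a, 1.0])
    print(v / np.linalg.norm(v))   # tends to (1, 0): the recession direction "right"
```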
[Figures: the ray model of A = [a, +∞) with the lifted set A × {1}, the origin O and the recession cone RA; three top-view models of convex sets A with their recession cones RA.]

The shaded regions represent the closure of the top-view model of A. To be more precise, the points in the shaded
region that do not lie on the circle model the points of A itself; the points in the
shaded region that do lie on the circle model the recession directions of A, that is,
the rays of the recession cone RA . The straight arrows with initial point at the center
of the disk represent straight paths from the origin that lie completely inside A. The
straight arrows either end inside the circle at the boundary of the model of A, or
they end on the circle. In the first case, the path ends on the boundary of A. In the
second case, the path is a half-line, it never ends.
Looking at Fig. 7, from left to right, you see three top-view models: of a bounded
set (such as a set bounded by a circle or an ellipse), of a set having precisely one
recession direction (such as a set bounded by a parabola), and of a set having
infinitely many recession directions (such as a set bounded by a branch of a
hyperbola). In each case, you see that if you take the closure of the top-view model
of A, then what happens is that the recession directions are added. From left to right,
you see: no point is added, the unique recession direction is added, all the infinitely
many recession directions are added.
Recession Directions in the Three Models In the three models that we consider,
the ray model, the hemisphere model, and the top-view model, a recession direction
of a convex set A ⊆ Rn is modeled as follows: in the ray model as a horizontal ray
that is the limit of a sequence of non-horizontal rays that represent points of A, in
the hemisphere model as a point on the boundary of the hemisphere that is the limit
of a sequence of points on the open hemisphere that represent points of A, and in the
top-view model as a point on the boundary of the ball that is the limit of a sequence
of non-boundary points of the ball that represent points of A.
Conclusion of the Application of Homogenization to Unbounded Convex Sets
The unified method proposed in this book, homogenization, and the use of the
three visualization models make unbounded convex sets as simple to work with
as bounded ones: all that is required is to take the closure of one of the three models
of a given convex set. Then recession directions arise without requiring an artificial
construction and the geometric insight provided by the visualization models makes
it easy to construct the proofs of properties of recession directions.
Comparison with the Literature The novelty of the present book compared to
existing textbooks on convex analysis is that all convex analysis tasks are simplified
by performing them using only the homogenization method. Moreover, many
motivating examples and applications as well as numerous exercises are given. Each
chapter has an abstract (“why” and “what”) and a road map. Other textbooks include:
Dimitri P. Bertsekas [2], Jonathan M. Borwein and Adrian S. Lewis [3], Lieven
Vandenberghe and Stephen Boyd [4], Werner Fenchel [5], Osman Güler [6], Jean-
Baptiste Hiriart-Urruty and Claude Lemaréchal [7, 8], Georgii G. Magaril-Ilyaev
and Vladimir M. Tikhomirov [9], Ralph Tyrrell Rockafellar [10] and [11], Vladimir
M. Tikhomirov [12], and Jan van Tiel [13].
The use of the homogenization method in convex analysis is well-known to
experts. One example is the treatment of polarity in projective space in “Convex
Cones, Sets and Functions” (1953) by Werner Fenchel in [5]. Another one is the
use of extreme points at infinity in “Convex Analysis” (1970) by Ralph Tyrrell
Rockafellar in [10]. Still another example is the concept of cosmic space in
“Variational Analysis” (1997) by Ralph Tyrrell Rockafellar and Roger Wets in [14].
However, it does not appear to be widely known that homogenization is useful
for any task in convex analysis. For example, attempts have been made to obtain
complete lists of rules for calculating the dual object of a convex object (such as
a convex set or function) but without using homogenization. A recent attempt was
described in “Convex Analysis: Theory and Applications” (2003) by Georgii G.
Magaril-Il’yaev and Vladimir M. Tikhomirov in [9]. Subsequently, it was shown that
the homogenization method leads to the construction of complete lists—in "Duality and calculus of convex objects (theory and applications)," Sbornik Mathematics, Volume 198, Number 2 (2007), jointly with Vladimir M. Tikhomirov in [15], and
also in [16].
The methods that are explained in this book can be applied to convex optimiza-
tion problems. For a small but very interesting minority, one can get the optimal
solutions exactly or by means of some law that characterizes the optimal solution.
Numerous examples are given in this book. These methods are also used, for
example, in Yurii Nesterov [17] and in Aharon Ben-Tal [18], to construct algorithms,
which can find efficiently and reliably approximations to optimal solutions for
almost all convex optimization problems. These algorithms are not considered in
this book, as it is not yet clear how helpful the homogenization method is in the
analysis of these algorithms.
How Is the Book Organized? Each chapter starts with an abstract (why and what),
a road map, a section providing motivation (optional), and it ends with a section
providing applications (optional), and many exercises (the more challenging ones
are marked by a star).
Chapters 1–4 present all standard results on convex sets. Each result is proved
in the same systematic way using the homogenization method, that is, by reduction
to results on convex cones. To prove some of these results on convex cones, simple
techniques and tricks, described in the text, have to be applied.
The remaining chapters use the homogenization method on its own to derive,
as corollaries of the standard results on convex sets, the standard results for
convex functions (Chaps. 5 and 6) and for convex optimization (Chaps. 7 and 8).
In Chaps. 7 and 8, many concrete smooth and convex optimization problems are
solved completely, all by one systematic four-step method. The consistent use
of homogenization for all tasks (definitions, constructions, explicit formulas, and
proofs) reveals that the structure of convex analysis is very transparent. To avoid
repetition, only proofs containing new ideas will be described in the book. Finally,
in Chap. 9 some additional convex optimization applications are given.
1 Convex Sets: Basic Properties

Abstract
• Why. The central object of convex analysis is a convex set. Whenever weighted
averages play a role, such as in the analysis by Nash of the question ‘what is a
fair bargain?’, one is led to consider convex sets.
• What. An introduction to convex sets and to the reduction of these to convex
cones (‘homogenization’) is given. The ray, hemisphere and top-view models
to visualize convex sets are explained. The classic results of Radon, Helly and
Carathéodory are proved using homogenization. It is explained how convex
cones are equivalent to preference relations on vector spaces. It is proved that the
Minkowski sum of a large collection of sets of vectors approximates the convex
hull of the union of the collection (lemma of Shapley–Folkman).
Road Map For each figure, the explanation below the figure has to be considered
as well.
• Definition 1.2.1 and Fig. 1.3 (convex set).
• Definition 1.2.6 and Fig. 1.4 (convex cone; note that a convex cone is not required
to contain the origin).
• Figure 1.5 (the ray, hemisphere and top-view model for a convex cone in
dimension one; similar in higher dimensions: for dimension two, see Figs. 1.8
and 1.9).
• Figure 1.10 (the (minimal) homogenization of a convex set in dimension two and
in the bounded case).
• Figure 1.12 (the ray, hemisphere and top-view model for a convex set in
dimension one; for dimension two, see Figs. 1.13 and 1.14).
• Definition 1.7.2 and Fig. 1.15 (convex hull).
• Definition 1.7.4 and Fig. 1.16 (conic hull; note that the conic hull contains the
origin).
• Definitions 1.7.6, 1.7.9, Proposition 1.7.8, its proof and the explanation preceding
it (the simplest illustration of the homogenization method).
• Section 1.8.1 (the ingredients for the construction of binary operations for convex
sets).
• Theorems 1.9.1, 1.10.1, 1.12.1 and the structure of their proofs (theorems of
Radon, Helly, Carathéodory, proved by the homogenization method).
• Definitions 1.12.6, 1.12.9 and 1.12.11 (polyhedral set, polyhedral cone and
Weyl’s theorem).
• Definition 1.10.5, Corollary 1.12.4 and its proof (use of compactness in the
proof).
• The idea of Sect. 1.13 (preference relations: a different angle on convex cones).
1.1 *Motivation: Fair Bargains

The aim of this section is to give an example of how a convex set can arise in a
natural way. You can skip this section, and similar first sections in later chapters, if
you want to come to the point immediately.
John Nash has solved the problem of finding a fair outcome of a bargaining
opportunity (see [1]). This problem does not involve convex sets, but its solution
requires convex sets and their properties.
Here is how I learned about Nash bargaining. Once I was puzzled by the outcome
of bargaining between two students, exchanging a textbook and sunglasses. The
prices of book and glasses were comparable—the sunglasses being slightly more
expensive. The bargain that was struck was that the student who owned the most
expensive item, the sunglasses, paid an additional amount of money. What could be
the reason for this? Maybe this student had poor bargaining skills? Someone told
me that the problem of what is a fair bargain in the case of equal bargaining skills
had been considered by Nash. The study of this theory, Nash bargaining, was an
eye-opener: it explained that the bargain I had witnessed and that initially surprised
me, was a fair one.
Nash bargaining will be explained by means of a simple example. Consider Alice
and Bob. They possess certain goods and these have known utilities (≈pleasure) to
them. Alice has a bat and a box, Bob has a ball. The bat has utility 1 to Alice and
utility 2 to Bob; the box has utility 2 to Alice and utility 2 to Bob; the ball has utility
4 to Alice and utility 1 to Bob.
If Alice and Bob were to come to the bargaining agreement to exchange all their
goods, they would both be better off: their total utility would increase—for Alice
from 3 to 4 and for Bob from 1 to 4. Bob would profit much more than Alice from
this bargain. So this bargain does not seem fair.
Now suppose that it is possible to exchange goods with a certain probability.
Then we could modify this bargain by agreeing that Alice will give the box to
Bob with probability 1/2. For example, a coin could be tossed: if heads comes
up, then Alice should give the box to Bob, otherwise Alice keeps her box. This
bargain increases expected utility for Alice from 3 to 5 and for Bob from 1 to 3.
This looks more fair. More is true: we will see that this is the unique fair bargain in
this situation.
To begin with, we model the bargain where bat, box and ball are handed over with
probability p1 , p2 , p3 respectively. To keep formulas as simple as possible, we do
this by working with a different but equivalent measure of utility after the bargain:
expected change in utility instead of expected utility. To this end, we consider the
vector in R2 that has as coordinates the expected changes in total utility for Alice
and Bob:

p1 (−1, 2) + p2 (−2, 2) + p3 (4, −1).

Indeed, this is the difference of the following two vectors: the vector of expected utilities after the bargain,

(1 − p1)(1, 0) + (1 − p2)(2, 0) + (1 − p3)(0, 1) + p1 (0, 2) + p2 (0, 2) + p3 (4, 0),

and the vector of the utilities before the bargain, (1, 0) + (2, 0) + (0, 1). Thus the
utilities have been adjusted.
We allow Alice and Bob to make no agreement; in this example, it is decided in
advance that in case of no agreement, each one keeps her or his own goods.
Now we model the bargaining opportunity of Alice and Bob: as a pair (F, v),
where F , the set of bargains, is the set of all vectors p1 (−1, 2) + p2 (−2, 2) +
p3 (4, −1) with 0 ≤ p1 , p2 , p3 ≤ 1, and v, the disagreement point, is the origin
(0, 0). Figure 1.1 illustrates this bargaining opportunity.
The set F contains v and it has two properties that are typical for all bargaining
opportunities. We have to define these properties, compactness and convexity. That
F is compact means that it is bounded—there exists a bound M > 0 for the absolute
values of the coordinates of F , |xi | ≤ M for all x ∈ F, i = 1, 2—and that it
is closed—for each convergent sequence in R2 that is contained in F , its limit is
also contained in F . These properties are easily checked and they are evident from
Fig. 1.1. So F is compact. That F is convex means that it consists of one piece, and
has no holes or dents. Figure 1.2 illustrates the concept of convex set.
For a precise definition, see Definition 1.2.1. We readily see from Fig. 1.1 that F
is convex.
[Figure 1.1: the set F of bargains and the disagreement point v.]

[Figure 1.2: five sets, labeled yes, yes, no, no, no according to whether they are convex.]

A rule ϕ assigns to each bargaining opportunity (F, v) an outcome ϕ(F, v) ∈ F. The outcome should be fair in
some sense to be made precise. To this end, we give three properties that we can
reasonably expect a rule ϕ to have, in order for it to deserve the title ‘fair rule’.
1. For each bargaining opportunity (F, v), there does not exist x ∈ F for which
x > ϕ(F, v).
This represents the idea that each individual wishes to maximize the utility to
himself of the bargain.
2. For each two bargaining opportunities (E, v) and (F, v) for which E ⊆ F and
ϕ(F, v) ∈ E, one has ϕ(E, v) = ϕ(F, v).
This represents the idea that eliminating from F feasible points (other than the disagreement point) that would not have been chosen should not affect the outcome.
3. If a bargaining opportunity (F, v) can be brought by transformations of the two
utility functions of the type vi = ai ui + bi , i = 1, 2 with ai > 0, in symmetric
form, in the sense that (r, s) ∈ F implies (s, r) ∈ F and v is on the line y = x,
then if (F, v) is written in symmetric form, one has that ϕ(F, v) is on the line
y = x.
This represents equality of bargaining skills: if both have the same bargaining
position, then they will get the same utility out of it.
What comes now surprises most people; it certainly surprised me. These requirements single out one unique rule among all possible rules. This unique rule deserves to be called the fair rule: it is reasonable to demand that a fair rule satisfies the three properties above, and moreover it is the only rule satisfying these properties. Furthermore, it turns out that this rule can be described by means of an optimization problem.
Theorem 1.1.1 (Nash) There is precisely one rule ϕ that satisfies the three
requirements above. This rule associates to a bargaining opportunity (F, v)—
consisting of a compact convex set F in the plane R2 and a point v ∈ F for which
there exists w ∈ F with w > v—the unique solution of the following optimization
problem:

max f(x) = (x1 − v1)(x2 − v2), x ∈ F, x ≥ v.
Example 1.1.2 (The Fair Bargain for Alice and Bob) Now we can verify that the
following bargain is optimal for the bargaining opportunity of Alice and Bob: Bob
gives the ball to Alice, Alice gives the bat to Bob, and a coin is tossed: if it lands on
heads, then Alice keeps the box, if it lands on tails, then Alice gives the box to Bob.
That is, the objects are handed over with probabilities p1 = 1, p2 = 1/2, p3 = 1.
This gives the vector (−1, 2) + (1/2)(−2, 2) + (4, −1) = (2, 2) in the set F of allowable
bargains. By Theorem 1.1.1, it suffices to check that (2, 2) is a point of maximum
of the function f (x) = (x1 − v1 )(x2 − v2 ) = x1 x2 on the set F , which is drawn in
Fig. 1.1.
To do this, it suffices to check that the line x1 + x2 = 4 separates the set F and
the solution set of f (x) ≥ f (2, 2), that is, of x1 x2 ≥ 4, x1 ≥ 0, x2 ≥ 0, and to
check that (2, 2) is the only point on this line that belongs to both sets. Well, on the
one hand, two adjacent vertices of the polygon F , the points (1, 3) and (3, 1)—the
two vertices in Fig. 1.1 that lie in the first orthant x1 ≥ 0, x2 ≥ 0—lie on the line
x1 + x2 = 4, so F lies on one side of this line and its intersection with this line
is the closed line interval with endpoints (1, 3) and (3, 1). On the other hand, the
solution set of x1 x2 ≥ 4, x1 ≥ 0, x2 ≥ 0—which is bounded by one branch of the
hyperbola x1 x2 = 4—lies on the other side of the line x1 + x2 = 4, and its only
common point with this line is (2, 2); this follows as the line x1 + x2 = 4 is tangent
to the hyperbola x1 x2 = 4 at the point (2, 2). This completes the verification that
the given bargain is the unique fair one.
Later, we will solve the bargaining problem of Alice and Bob in a systematic
way, using the theory of convex optimization, in Example 8.3.5.
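If you want to see this optimum emerge numerically, the following brute-force search (a sketch of ours; the book solves the problem geometrically above and systematically in Example 8.3.5) scans a grid of probabilities (p1, p2, p3) and maximizes f(x) = x1 x2 over the resulting points of F with x ≥ v = (0, 0):

```python
import numpy as np

gains = np.array([[-1.0, 2.0],    # bat handed over: change (-1, +2)
                  [-2.0, 2.0],    # box handed over: change (-2, +2)
                  [ 4.0, -1.0]])  # ball handed over: change (+4, -1)

best, best_p = -np.inf, None
grid = np.linspace(0.0, 1.0, 51)
for p1 in grid:
    for p2 in grid:
        for p3 in grid:
            x = p1 * gains[0] + p2 * gains[1] + p3 * gains[2]
            if x[0] >= 0 and x[1] >= 0 and x[0] * x[1] > best:
                best, best_p = x[0] * x[1], (p1, p2, p3)

print(best_p, best)   # approximately (1.0, 0.5, 1.0) and 4.0: the fair bargain (2, 2)
```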
Thus, by using convex sets and their properties, a very interesting result has been obtained. The applications are limited by the fact that they require that the two participants cooperate and give enough information, so that the compact convex set F can be determined. The temptation to influence the shape of F, and so the outcome, by giving untruthful information must be resisted. However, this result sheds light on bargaining opportunities that are carried out in an amicable way, for
example among friends. This is precisely the case with the bargain of a textbook and
sunglasses, which led me to study Nash bargaining. Nash bargaining makes clear
that this is a fair bargain! Indeed, the student who owned the textbook had passed her exam, had no more need for the book, and could sell it to one of the many students who had to take the course next year; the other student really needed the textbook as she had to take the course. The student who owned the sunglasses had, on second thought, regrets that she had bought them; the other student liked them but did not
have to buy them. If you translate this information into a bargaining opportunity
of the form (F, v), you will see that you get that the fair outcome according to
Nash agrees with the outcome of the textbook and sunglasses bargaining that I had
observed in reality.
If you are interested in the proof of Theorem 1.1.1, here is a sketch. It already
contains some properties of convexity in embryonic form. These properties will be
explained in detail later in this book.
Proof of the Nash Bargaining Theorem We begin by proving that the optimiza-
tion problem has a unique optimal solution. By the theorem of Weierstrass that a
continuous function on a nonempty compact set assumes its maximal value, there
exists at least one optimal solution. This optimal solution is > v by the assumption
that there exists w ∈ F for which w > v and so f (w) > f (v) = 0. The
optimization problem can be rewritten, by taking the logarithm, changing the sign,
and replacing the constraints x ≥ v by x > v, as follows:

min g(x) = − ln(x1 − v1) − ln(x2 − v2), x ∈ F, x > v.
The function g is convex, that is, its epigraph, the region above or on the graph
of g, that is, the set of pairs (x, ρ) for which x ∈ F, x > v, g(x) ≤ ρ, is a convex
set in three-dimensional space R3 . The function g is even strictly convex, that is, it
has the additional property that its graph contains no line segment of positive length.
We do not display the verification of these properties of g. These properties imply
that g cannot have more than one minimum. We show this by contradiction. If there
would be two different minima x and y, then g(x) = g(y) and so, by convexity of
g, the line segment with endpoints (x, g(x)) and (y, g(y)) lies nowhere below the
graph of g. But by the minimality property of x and y, this line segment must lie
on the graph of g. This contradicts the strict convexity of g. Thus we have proved
that the optimization problem max f (x) = (x1 − v1 )(x2 − v2 ), x ∈ F, x ≥ v has a
unique solution.
Now assume that ϕ is a rule that satisfies the three requirements. We want to
prove that ϕ(F, v) is the unique solution of the optimization problem max f (x) =
(x1 − v1 )(x2 − v2 ), x ∈ F, x ≥ v for each bargaining opportunity (F, v). We
can assume, by an appropriate choice of the utility functions of the two individuals,
that the two coordinates of the optimal solution of the optimization problem above
are equal to 1 and that v1 = v2 = 0. Then (1, 1) is the solution of the problem
max x1 x2 , x ∈ F, x ≥ 0. We claim that F is contained in the half-plane G given
by x1 + x2 ≤ 2. We prove this by contradiction. Assume that there exists a point
x ∈ F outside this half-plane. Then the line interval with endpoints (1, 1) and x
is contained in F , by the convexity of F . This line interval must contain points y
near (1, 1) with y1 y2 > 1 as a simple picture makes clear (and a calculation can
show rigorously). This is in contradiction with the maximality of (1, 1). As G is
symmetric, we get ϕ(G, 0) = (1, 1). Moreover ϕ(G, 0) ∈ F so ϕ(F, 0) = (1, 1).
Thus we have proved that a rule ϕ that satisfies the three requirements, must be
given by the optimization problem above.
Finally, it is not hard to show that, conversely, the rule that is given by the
optimization problem, satisfies the three requirements. We do not display the
verification.
1.2 Convex Sets and Cones: Definitions

Now we begin properly: with our central concept, the convex set. We will use the
notation R+ = [0, +∞) and R++ = (0, +∞).
Definition 1.2.1 A set A ⊆ Rn , where Rn is the space of n-columns, is called
convex if it contains for each two of its points also the entire line segment having
these points as endpoints,
a, b ∈ A, ρ, σ ∈ R+ , ρ + σ = 1 ⇒ ρa + σ b ∈ A.
[Figure 1.3: five subsets of the plane, labeled yes, yes, no, no, no according to whether they are convex.]

The convex subsets of the line R can be enumerated:

∅, (a, b), [a, b], (a, b], [a, b), (a, +∞), (−∞, a), [a, +∞), (−∞, a], (−∞, +∞).
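Definition 1.2.1 can also be probed experimentally: sampling pairs of points and a random weight can refute convexity, though it can never prove it. A small sketch of ours, on a disk (convex) and an annulus (not convex, it has a hole):

```python
import numpy as np

rng = np.random.default_rng(1)

def in_disk(p):     # the closed unit disk: convex
    return np.linalg.norm(p) <= 1.0

def in_annulus(p):  # a ring: not convex
    return 0.5 <= np.linalg.norm(p) <= 1.0

def looks_convex(member, trials=10000):
    for a, b in rng.uniform(-1, 1, size=(trials, 2, 2)):
        if member(a) and member(b):
            rho = rng.uniform()
            if not member(rho * a + (1 - rho) * b):
                return False   # a segment leaves the set: not convex
    return True                # no counterexample found

print(looks_convex(in_disk), looks_convex(in_annulus))   # True False
```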
the i-th vertex as the first one of its two boundary vertices, going counter-
clockwise. Then ϕ1 , . . . , ϕk is a sequence of positive numbers in (0, π ) with
sum (k −2)π , and l1 , . . . , lk is a sequence of positive numbers. All such pairs
of sequences are obtained for a unique k-sided convex polygon, if convex
polygons that differ by rotations and translations are identified. This result
has no simple generalization beyond the plane.
(c) Here is another way to enumerate convex polygons. Let a k-sided convex
polygon be given. Let T be the set of its outside normals t of unit length, ‖t‖ = (t1^2 + t2^2)^{1/2} = 1; for each t ∈ T, let at be the length of the side with normal t. Then Σ_{t∈T} at t = 0. All such pairs of sequences are obtained for a
unique k-sided polygon, if polygons that differ by rotations and translations
are identified. The advantage of this result is that it has a nice straightforward
generalization beyond the plane.
(d) Note that a convex k-sided polygon has k vertices, and that this set of vertices
characterizes the polygon. However, the characterization of sets of k points
in the plane that are the vertices of a convex k-sided polygon is not so simple.
This is a disadvantage of enumerating polygons by their vertices.
(e) For each convex set A that contains a small disk with center the origin and
for each ε > 0, there is a convex polygon P such that P ⊆ A ⊆ (1 + ε)P.
Each open ray in Rn contains a unique unit vector, and so the collection of open
rays of a cone in Rn can be viewed as a subset of the standard unit sphere x1^2 + · · · + xn^2 = 1. Each open ray Ru with positive last coordinate, un > 0, contains a
unique vector with last coordinate equal to 1, (x1 , . . . , xn−1 , 1); this gives a point
(x1 , . . . , xn−1 ) in Rn−1 . Therefore, the collection of open rays of a cone in Rn that
is contained in the halfspace xn > 0 can be viewed as a subset of Rn−1 .
Example 1.2.5 (A Cone Determined by a Positive Homogeneous Function) Let f : Rn → R be a positive homogeneous function of degree ν ∈ R, that is, f(ρx) = ρ^ν f(x) for all ρ > 0 and x ∈ Rn. Then the solution set of f(x) = 0 is a cone. For example, the function fn(x1, x2, x3) = x2^2 x3 − x1^3 + n^2 x1 x3^2 is positive homogeneous of degree 3 for all n ∈ R. The set of solutions of fn = 0 for which x3 ≥ 0 is a cone that has two horizontal rays, R(0,1,0) and R(0,−1,0), and the set of rays that are contained in the open halfspace x3 > 0 can be viewed as the curve x2^2 = x1^3 − n^2 x1 in the plane R2.
[Aside. This curve is related to the famous unsolved congruent number problem: determine whether a square-free natural number n is a congruent number, that is, the area of a right triangle with rational number sides. Such triangles correspond one-to-one to the nontrivial rational points on the curve x2^2 = x1^3 − n^2 x1 (nontrivial means x2 ≠ 0). There is an easily testable criterion for the existence of such points, but its validity depends on the truth of the Birch and Swinnerton-Dyer conjecture, which is one of the Millennium Problems ('million dollar problems').]
We are only interested in a special type of cones.
Definition 1.2.6 A convex cone C ⊆ Rn is a set in Rn that is simultaneously a cone
and a convex set. A simpler description of a convex cone is that it is a set C ⊆ Rn
that contains for each two elements also all their positive linear combinations,
a, b ∈ C, σ, τ ∈ R++ ⇒ σ a + τ b ∈ C.
Note that the intersection of any collection of convex cones in Rn is again a convex
cone.
Warning In some texts, convex cones are required to contain the origin. This hardly
matters: if we adjoin the origin to a convex cone that does not contain the origin,
we get a convex cone containing the origin. However, for some constructions it
is convenient to allow convex cones that do not contain the origin. The precise
relation between convex cones that do not contain the origin and convex cones that
do contain the origin is given in Exercise 10.
Example 1.2.7 (Convex Cone)
1. Figure 1.4 illustrates the concept of convex cone.
The left hand side illustrates the defining property in the simpler description.
The right side illustrates why the word ‘cone’ has been chosen: a typical three-
dimensional cone looks like an extended ice-cream cone.
2. Convex cones in the line R can be enumerated:

∅, (0, +∞), (−∞, 0), [0, +∞), (−∞, 0], (−∞, +∞).
3. Convex cones in the plane can be enumerated. If they are closed (that is contain
their boundary points), then the list is: ∅, R2 and each region with boundary the
two legs of an angle ≤ π , with legs having their endpoint at the origin.
4. Three interesting convex cones in R3 :
(a) the first orthant R3+ = {x ∈ R3 | x1 ≥ 0, x2 ≥ 0, x3 ≥ 0},
(b) the Lorentz cone or ice cream cone L3 = {x ∈ R3 | x3 ≥ (x1^2 + x2^2)^{1/2}},
(c) the positive semidefinite cone: identify (a, b, c) ∈ R3 with the symmetric 2 × 2 matrix with rows (a, b) and (b, c); then this cone consists of the positive semidefinite matrices, that is, those with a ≥ 0, c ≥ 0 and ac − b^2 ≥ 0 (membership tests for all three cones are sketched in code below).
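A sketch of these membership tests (the helper names are ours; for the positive semidefinite cone we use the identification with symmetric 2 × 2 matrices given in (c)):

```python
import numpy as np

def in_orthant(x):
    x1, x2, x3 = x
    return x1 >= 0 and x2 >= 0 and x3 >= 0

def in_lorentz(x):
    x1, x2, x3 = x
    return x3 >= np.hypot(x1, x2)          # x3 >= sqrt(x1^2 + x2^2)

def in_psd(x):
    a, b, c = x                            # the matrix with rows (a, b), (b, c)
    return a >= 0 and c >= 0 and a * c - b * b >= 0

for test in (in_orthant, in_lorentz, in_psd):
    print(test((1.0, 1.0, 2.0)), test((1.0, 2.0, 1.0)))
```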
1.3 Chapters 1–4 in the Special Case of Subspaces

The aim of this section is to give some initial insight into the contents of Chaps. 1–4
of this book. The aim of these chapters is to present the properties of convex sets
and now we display the specializations of some of these properties from convex
sets to special convex sets, subspaces. A subspace of Rn is a subset L of Rn that is
closed under sums and scalar multiples, x, y ∈ L, α, β ∈ R ⇒ αx + βy ∈ L. A
subspace can be viewed as the central concept from linear algebra: it is the solution
set of a finite system of homogeneous linear equations in n variables. A subspace
is a convex cone and so, in particular, it is a convex set. The specializations of the
results from Chaps. 1–4 from convex sets to subspaces are standard results from
linear algebra.
Chapter 1 A subspace of X = Rn is defined to be a nonempty subset L ⊆ X that
is closed under linear combinations of two elements:
a, b ∈ L, ρ, σ ∈ R ⇒ ρa + σ b ∈ L.
Consider a system of two linear equations in two unknowns,

a11 x1 + a12 x2 = b1,
a21 x1 + a22 x2 = b2,
where for each equation the coefficients on the left side are not both zero. The system is either undetermined, or uniquely determined, or false. This distinction of cases, and in particular the intriguing choice of terminology 'false', might suggest that this is a substantial phenomenon. The distinction can be expressed as follows, using sets:
the solution set of the system is either infinite, or it consists of one element, or it
is the empty set. In geometric language: two lines in the plane are either identical,
or they intersect in one point, or they are parallel. A variant of homogenization (for
simplicity, we work with nonzero multiples instead of with positive multiples) turns
the system into the following form:
a11 x1 + a12 x2 = b1 x3 ,
a21 x1 + a22 x2 = b2 x3 ,
If these equations are not multiples of each other, then the system always has a unique nonzero solution up to scalar multiples. In geometric language: the intersection of two different planes in space through the origin is always a line. So homogenization makes the distinction of cases disappear, and it reveals the simple essence of the phenomenon of false systems of equations.
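In coordinates, the homogenized statement can be verified in one line: the vectors (a11, a12, −b1) and (a21, a22, −b2) are normals of the two planes through the origin, and their cross product spans the intersection line. A sketch with made-up numbers:

```python
import numpy as np

row1 = np.array([1.0, 2.0, -3.0])   # normal of  x1 + 2 x2 = 3 x3
row2 = np.array([2.0, -1.0, 4.0])   # normal of 2 x1 -   x2 = -4 x3

direction = np.cross(row1, row2)    # spans the solution line of the system
print(direction, row1 @ direction, row2 @ direction)   # the dot products are 0
```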
Chapter 2 There are two standard binary operations for subspaces, the intersection
L1 ∩ L2 and the Minkowski sum
L1 + L2 = {x + y | x ∈ L1 , y ∈ L2 }.
These can be described systematically as follows. The subspace L1 + L2 is the image of the set L1 × L2 under the linear function sum map +X : X × X → X, given by the recipe (x, y) → x + y. The subspace L1 ∩ L2 is the inverse image of the same set L1 × L2 under the linear function diagonal map ΔX : X → X × X, given by the recipe x → (x, x). This description has the advantage that it reveals a relation between ∩ and +, as the linear functions ΔX and +X are each other's transpose with respect to the inner product on X × X,

(x, y) · (v, w) = x1 v1 + · · · + xn vn + y1 w1 + · · · + yn wn.

The orthogonal complement of a subspace L ⊆ X is defined by

L⊥ = {a ∈ X | a · x = a1 x1 + · · · + an xn = 0 ∀x ∈ L}.
That is, L⊥ consists of all a ∈ X for which L is contained in the solution set of the
equation a1 x1 + · · · + an xn = 0. One has the duality theorem L⊥⊥ = L. Moreover, one has the calculus rule that K1 + K2 and L1 ∩ L2 are each other's orthogonal complements if Ki and Li are each other's orthogonal complements for i = 1, 2. This rule is an immediate consequence of the systematic description of ∩ and + and the following property: if the subspaces K, L ⊆ Rn are each other's orthogonal complements, then for each m × n-matrix M, the following two subspaces of Rm are each other's orthogonal complements: {Mx | x ∈ L}, the image of L under M, and {y | M^T y ∈ K}, the inverse image of K under the transpose of M.
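This last property can be tested numerically. The sketch below (random data of ours) computes the inverse image of K = L⊥ under M^T as a null space and checks that it is orthogonal to the image of L under M:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3
L = rng.standard_normal((n, 2))     # basis of a 2-dimensional subspace L of R^n
M = rng.standard_normal((m, n))

P = L @ np.linalg.pinv(L)           # orthogonal projector of R^n onto L
# y lies in the inverse image of K under M^T  iff  M^T y is orthogonal to L,
# i.e. iff P (M^T y) = 0: so take the null space of P M^T via the SVD.
_, s, Vt = np.linalg.svd(P @ M.T)
inv_image = Vt[np.sum(s > 1e-10):]  # rows spanning {y | M^T y in K}

print(np.abs(inv_image @ (M @ L)).max())   # ~0: the two subspaces are orthogonal
```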
1.4 Convex Cones: Visualization by Models

The aim of this section is to present three models for convex cones C ⊆ X = Rn
that lie in the upper halfspace xn ≥ 0: the ray model, the hemisphere model and the
top-view model.
Example 1.4.1 (Ray, Hemisphere and Top-View Model for a Convex Cone) Fig-
ure 1.5 illustrates the three models in one picture for a convex cone C in the plane
R2 that lies above or on the horizontal axis.
The ray model visualizes this convex cone C ⊆ R2 by viewing C \ {02 } as a
collection of open rays, and then considering the set whose elements are these rays;
the hemisphere model visualizes C by means of an arc: the intersection of C with the
upper half-circle x1^2 + x2^2 = 1, x2 ≥ 0; the top-view model visualizes C by looking
from high above at the arc, or, to be precise, by taking the interval in [−1, +1] that
is the orthogonal projection of the arc onto the horizontal axis.
The three models are all useful for visualization in low dimensions. Fortunately,
most, maybe all, properties of convex cones in Rn for general n can be understood
by looking at such low dimensional examples. Therefore, making pictures in your
head or with pencil and paper by means of one of these models is of great help
to understand properties of convex cones—and so to understand the entire convex
analysis, as we will see.
One often chooses a representative for each open ray, in order to replace open rays—
which are infinite sets of points—by single points. A convenient way to do this is
to normalize: that is, to choose the unit vector on each ray. Thus the one-sided
directions in the space X = Rn are modeled as the points on the standard unit sphere SX = Sn = {x ∈ X | ‖x‖ = 1} in X. The set of unit vectors in a convex cone C ⊆ X = Rn will be called the sphere model for C. The subsets of the standard unit sphere SX that one gets in this way are precisely the geodesically convex subsets of SX. A subset T of SX is called geodesically convex if for each two different points p, q of T that are not antipodes (p ≠ −q), the shortest curve on SX that connects
them is entirely contained in T . This curve is called the geodesic connecting these
two points. Note that for two different points p, q on SX that are not antipodes, there
is a unique great circle on SX that contains them. A great circle on SX is a circle on
SX with center the origin, that is, it is the intersection of the sphere SX with a two
dimensional subspace of X. This great circle through p and q gives two curves on
SX on this circle connecting the two points, a short one and a long one. The short
one is the geodesic connecting these points.
Example 1.4.3 (Sphere Model for a Convex Cone)
1. Figure 1.5 above illustrates the sphere model for convex cones in dimension two.
The convex cone C is modeled by an arc.
2. Figure 1.6 illustrates the sphere model for convex cones in dimension three, and
it illustrates the concepts great circle and geodesic on SX for X = R3 .
Two great circles are drawn. These model two convex cones that are planes
through the origin. For the two marked points on one of these circles, the
segment on this circle on the front of the sphere connecting the two points is
their geodesic. Moreover, you see that two planes in R3 through the origin (and
remember that a plane is a convex cone) are modeled in the sphere model for convex cones by two great circles on the sphere.
3. Figure 1.7 is another illustration of the sphere model for a convex cone C in
dimension two.
Here the convex cone C is again modeled by an arc. The sphere model is
convenient for visualizing the one-sided directions determined by a convex cone
C ⊆ X = Rn for dimension n up to three: for n = 3, a one-sided direction in X is modeled as a point on the sphere SX ⊆ R3.

[Figures 1.6 and 1.7: the sphere SX with great circles and geodesics; arcs modeling convex cones C.]
Often, we will be working with convex cones in X = Rn that lie above or on the
horizontal coordinate hyperplane xn = 0. Then the unit vectors of these convex
cones lie on the upper hemisphere x1^2 + · · · + xn^2 = 1, xn ≥ 0. Then the sphere
model is called the hemisphere model.
Example 1.4.4 (Hemisphere Model for a Convex Cone)
1. Figure 1.5 above illustrates the hemisphere model for convex cones in dimension
two. The convex cone C is modeled by an arc on the upper half-circle.
2. Figure 1.8 illustrates the hemisphere model for three convex cones C in
dimension three.
The models of these three convex cones C are shaded regions on the upper
hemisphere. These regions are indicated by the same letter as the convex cone
that they model, by C. The number of points in the model of C that lie on the
circle that bounds the hemisphere is, from left to right: zero, one, infinitely many.
This means that the number of horizontal rays of the convex cone C is, from left
to right: zero, one, infinitely many.
Project the hemisphere model of a convex cone C ⊆ X = Rn orthogonally onto the horizontal coordinate hyperplane xn = 0; this gives a subset of the standard closed unit ball in Rn−1. This subset will be called the top-view model for a convex cone.
Example 1.4.5 (Top-View Model for a Convex Cone)
1. Figure 1.5 above illustrates the top-view model for convex cones in dimension
two. Here the convex cone C is modeled by a closed line segment contained in
the closed interval [−1, 1], the standard closed unit ball B1 in R1 .
2. Figure 1.9 illustrates the top-view model for three convex cones C in dimension
three.
The models for these three convex cones C are shaded regions in the standard
closed unit disk B2 in the plane R2 , which are indicated by the same letter as the
convex cones that they model, by C. All three convex cones that are modeled,
are chosen in such a way that they contain the positive vertical coordinate axis
{(0, 0, ρ) | ρ > 0} in their interior. As a consequence, their top-view models
contain in their interior the center of the disk. The arrows represent images under
orthogonal projection on the plane x3 = 0 of some geodesics that start at the top
of the hemisphere. The number of points in the model of C that lie on the circle
is, from left to right: zero, one, infinitely many. This means that the number of
horizontal rays of the convex cone C is, from left to right: zero, one, infinitely
many.
The top-view model for convex cones can be used for visualization of convex
cones in one dimension higher than the sphere and the hemisphere model, as it
models one-sided directions of C ⊆ X = Rn as points in the ball BRn−1 ⊆ Rn−1 :
so it can be used for visualization up to and including dimension four, whereas the
sphere and hemisphere model can be used only up to and including dimension three.
The top-view model has only one disadvantage: the orthogonal projections of the
geodesics between two points on the hemisphere, which are needed to characterize
the subsets of the ball that model convex cones, are less nice than the geodesics
themselves. These geodesics on the hemisphere are nice: they are halves of great
circles. Their projections onto the ball are less nice: they are halves of ellipses that
are contained inside the ball and that are tangent to the boundary of the ball at two
antipodes on the ball. The exception to this are the projections of halves of great
circles through the top of the hemisphere: they are nice—they are just straight line
segments connecting two antipodes on the boundary of the ball, and so they run
through the center of the ball.
Example 1.4.6 (Straight Geodesics in the Top-View Model) Figure 1.9 above illus-
trates the projections of these geodesics: some straight arrows are drawn from the
center of the disk to the boundary of the top-view model for the convex cone C.
Now we come to the distinctive feature of this book: the consistent use of the
description of a convex set in terms of a convex cone, its homogenization or
conification.
Now we are going to present homogenization of convex sets, the main ingredient
of the unified approach to convex analysis that is used in this book. That is, we are
going to show that every convex set in X = Rn —and we recall that this is our basic
object of study—can be described by a convex cone in a space one dimension higher,
X × R = Rn+1 , that lies entirely on or above the horizontal coordinate hyperplane
xn+1 = 0, that is, it lies in the closed upper halfspace xn+1 ≥ 0.
Example 1.5.1 (Homogenization of a Bounded Convex Set) Figure 1.10 illustrates
the idea of homogenization of convex sets for a bounded convex set A in the plane
R2 .
This crucial picture can be kept in mind throughout this book!
In the figure, the convex set, drawn lying on the floor, is indicated by the same letter as the set itself, by A. This set is lifted upward in
vertical direction to level 1, giving the set A × {1}. Then the union of all open rays
that start at the origin and that run through a point of the set A × {1} is taken. This
union is defined to be the homogenization or conification C = c(A) of A. Then A
is called the (convex set) dehomogenization or deconification of C.
There is no loss of information if we go from A to C: we can recover A from
C as follows. Intersect C with the horizontal plane at level 1. This gives the set
A × {1}. Then drop this set down on the floor in vertical direction. This gives the set
A, viewed as lying on the floor.
The formal definition is as follows. It is convenient to reverse the order and to
define first the dehomogenization and then the homogenization. If C ⊆ X × R =
Rn+1 is a convex cone and C ⊆ X × R+ , where R+ = [0, +∞)—that is, C lies on
or above the horizontal hyperplane X × {0}, or, to be more precise, C is contained
in the closed upper halfspace xn+1 ≥ 0—then the intersection C ∩ (X × {1}) is of
the form A × {1} for some unique set A = d(C) ⊆ X. This set is convex. Here is
the verification.
If a, b ∈ A and ρ, σ ∈ [0, 1] with ρ + σ = 1, then (a, 1) and (b, 1) lie in C, and so, as C is a convex cone, the point ρ(a, 1) + σ(b, 1) = (ρa + σb, 1) lies in C and also in X × {1}; so it lies in A × {1}. This gives ρa + σb ∈ A. This proves
that A is a convex set, as claimed. All convex sets A ⊆ X are the dehomogenization
of some convex cone C ⊆ X × R+ .
Definition 1.5.2 The (convex set) dehomogenization or deconification of a convex
cone C ⊆ X × R+ is the convex set A = d(C) ⊆ X defined by
A × {1} = C ∩ (X × {1}).
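To make the correspondence concrete, here is a minimal computational sketch in Python (an illustration of ours, not part of the book; the names conify_sample and deconify are our own): the conification of a finite set is represented by sample points on the generating rays, and the original set is recovered by scaling each point back to the level xn+1 = 1.

```python
import numpy as np

def conify_sample(A, scales=(0.5, 1.0, 2.0)):
    """Sample points of the cone generated by A x {1} for a finite A in R^n.

    Each row a of A is lifted to (a, 1); multiplying by positive scalars
    gives points on the open ray through (a, 1)."""
    A = np.asarray(A, dtype=float)
    lifted = np.hstack([A, np.ones((A.shape[0], 1))])   # the set A x {1}
    return np.vstack([t * lifted for t in scales])

def deconify(C):
    """Recover A from cone points: intersect with the level x_{n+1} = 1
    by dividing each point by its (positive) last coordinate."""
    C = np.asarray(C, dtype=float)
    assert np.all(C[:, -1] > 0), "points must lie in the open upper halfspace"
    return (C[:, :-1].T / C[:, -1]).T

A = np.array([[2.0, 3.0], [-1.0, 0.5]])
print(np.unique(np.round(deconify(conify_sample(A)), 12), axis=0))
# recovers exactly the two rows of A: no information is lost
```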
Example 1.5.3 (Deconification) The convex cones
C1 = {x ∈ R^3 | x1^2 + x2^2 − x3^2 ≤ 0, x3 ≥ 0},
C2 = {x ∈ R^3 | 2x1^2 + 3x2^2 − 5x3^2 ≤ 0, x3 ≥ 0},
C3 = {x ∈ R^3 | −x1^2 + x2x3 ≤ 0, x3 ≥ 0},
C4 = {x ∈ R^3 | −x1x2 + x3^2 ≤ 0, x1 ≥ 0, x2 ≥ 0, x3 ≥ 0}
have as deconifications the convex sets that have the following boundaries, respectively: the circle x1^2 + x2^2 = 1, the ellipse 2x1^2 + 3x2^2 = 5, the parabola x2 = x1^2, and the branch of the hyperbola x1x2 = 1 that is contained in the first quadrant x ≥ 0.
We have already stated the fact that each convex set A ⊆ X is the deconification
of a suitable convex cone C ⊆ X × R+ . This convex cone C, a homogenization of
A, is not unique. To give a precise description of the non-uniqueness we need the
important concept recession direction.
Definition 1.5.4 Let A ⊆ X = Rn be a convex set. A recession vector of A is a
vector c ∈ X such that a + tc ∈ A for all a ∈ A and all t ≥ 0. This means, if c ≠ 0n,
that one can travel in A in the one-sided direction of c without ever leaving A. So
you can escape to infinity within A in this direction. The recession cone RA of the
convex set A is the set in X consisting of all recession vectors of A. The set RA is
a convex cone. A recession direction of A is a one-sided direction of RA , that is, an
open ray of the convex cone RA .
Example 1.5.5 (Recession Cone (Geometric)) Figure 1.11 illustrates the concept of
recession cone for convex sets with boundary an ellipse, a parabola and a branch of
a hyperbola respectively.
On the left, a closed convex set A is drawn with boundary an ellipse. This set is bounded; there are no recession directions: RA = {0n}. In the middle, a closed convex set A is drawn with boundary a parabola. This set is unbounded; there is a
unique recession direction: RA is a closed ray. This recession direction is indicated
by a dotted half-line: c is a nonzero recession vector. On the right, a closed convex
set A is drawn with boundary a branch of a hyperbola. This set is unbounded; there are infinitely many recession directions: RA is the closed convex cone in the plane
with legs the directions of the two asymptotes to the branch of the hyperbola that is
the boundary of A.
Example 1.5.6 (Recession Cone (Analytic))
1. Ellipse. The recession cone of 2x1^2 + 3x2^2 ≤ 5 is {02}.
2. Parabola. The recession cone of x2 ≥ x1^2 is the closed ray containing (0, 1).
3. Hyperbola. The recession cone of x1x2 ≥ 1, x1 ≥ 0, x2 ≥ 0 is the first quadrant R^2_+.
At this point, it is recommended that you find all the recession vectors of some
simple convex subsets of the line and the plane, such as in Exercise 27.
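In the same spirit as Exercise 27, candidate recession vectors can also be tested numerically. The following sketch (our own illustration, not from the book) checks the defining condition a + tc ∈ A on a finite sample of points a ∈ A and step sizes t for the parabola set of Example 1.5.6; a finite sample can of course only falsify, never prove, the condition.

```python
import numpy as np

def in_parabola_set(p):
    """Membership test for A = {(x1, x2) | x2 >= x1^2}."""
    x1, x2 = p
    return x2 >= x1 ** 2

def looks_like_recession_vector(c, members, in_set, ts=(0.5, 1.0, 10.0, 1e3)):
    """Necessary condition: a + t*c stays in A for sampled a in A, t >= 0."""
    return all(in_set(a + t * np.asarray(c)) for a in members for t in ts)

sample = [np.array([0.0, 0.0]), np.array([2.0, 5.0]), np.array([-3.0, 9.0])]
print(looks_like_recession_vector((0.0, 1.0), sample, in_parabola_set))  # True
print(looks_like_recession_vector((1.0, 0.0), sample, in_parabola_set))  # False
```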
The following result describes all conifications of a given nonempty convex set
A ⊆ X.
Proposition 1.5.7 Let A ⊆ X = R^n be a nonempty convex set. For each conification C ⊆ X × R+ of A, the intersection of C with the open upper halfspace X × R++ is the smallest convex cone containing A × {1}. The conifications of A are precisely the convex cones between the convex cones cmin(A) = R++(A × {1}) and cmax(A) = cmin(A) ∪ (RA × {0}), that is, the convex cones C with cmin(A) ⊆ C ⊆ cmax(A).
So cmin (A) and cmax (A) are the minimal and maximal conifications of A respec-
tively. We will choose the minimal homogenization of A to be our default
homogenization. We write c(A) = cmin (A) and we call c(A) the homogenization of
A. The recession cone RA of a convex set A ⊆ X is an important concept in convex
analysis. It arises here naturally from homogenization: RA × {0} is the intersection
of cmax (A), the maximal conification of A, with the horizontal hyperplane X × {0}.
The main point that this book tries to make is that all concepts in convex analysis,
their properties and all proofs, arise naturally from homogenization. The recession
cone is the first example of this phenomenon. In Chap. 3, the recession cone will be
considered in more detail.
Example 1.5.8 (Minimal and Maximal Conification)
1. Consider the convex set A = (−7, +∞). This is the solution set of the inequality x1 > −7. Homogenization of this inequality gives x1 > −7x2, that is, x2 > −(1/7)x1. So cmin(A) = {x ∈ R^2 | x2 > −(1/7)x1, x2 > 0}. One has RA = [0, +∞). It follows that cmax(A) = {x ∈ R^2 | x2 > −(1/7)x1, x2 ≥ 0}.
2. Here is a numerical illustration of the minimal and maximal homogenization for
the three convex sets in Fig. 1.10 with a convenient choice of coordinates.
• Ellipse. Let the convex set A be the solution set of x1^2 + 5x2^2 − 1 ≤ 0. Then the recession cone is RA = {02}. The minimal homogenization c(A) = cmin(A) is the solution set of
x1^2 + 5x2^2 − x3^2 ≤ 0, x3 > 0,
and the maximal homogenization cmax(A) is the solution set of
x1^2 + 5x2^2 − x3^2 ≤ 0, x3 ≥ 0.
• Parabola. Let the convex set A be the solution set of x2 ≥ x1^2. Then the recession cone RA is the solution set of x1 = 0, x2 ≥ 0. The minimal homogenization c(A) = cmin(A) is the solution set of
x2x3 ≥ x1^2, x3 > 0,
and the maximal homogenization cmax(A) is the solution set of
x2x3 ≥ x1^2, x2 ≥ 0, x3 ≥ 0.
• Hyperbola. Let the convex set A be the solution set of x1^2 − x2^2 − 1 ≤ 0, x2 ≥ 0. Then the recession cone RA is the epigraph of the absolute value function, the solution set of x2 ≥ |x1|. The minimal homogenization c(A) = cmin(A) is the solution set of
x1^2 − x2^2 − x3^2 ≤ 0, x2 ≥ 0, x3 > 0,
and the maximal homogenization cmax(A) is the solution set of
x1^2 − x2^2 − x3^2 ≤ 0, x2 ≥ 0, x3 ≥ 0.
Now we formulate the unified method that will be used in this book for all tasks:
the homogenization method or conification method. This method consists of three
steps:
1. conify (homogenize)—translate into the language of convex cones (homoge-
neous convex sets) by taking the conifications of all convex sets involved,
2. work in terms of convex cones, that is, homogeneous convex sets, to carry out
the task,
3. deconify (dehomogenize)—translate back.
For visualization of convex sets A in X = Rn , we will make use of the ray model,
the hemisphere model and the top-view model for a convex set.
Example 1.6.1 (Ray Model, Hemisphere Model and Top-View Model for a Convex
Set) Figure 1.12 illustrates the three models in the case of a convex set in the line
R.
A convex set A ⊆ R is drawn; R is identified with the horizontal axis in the
plane R2 ; the set A is lifted up to level 1; the open rays that start from the origin
and that run through a point of this lifted up set form the ray model for A; the arc
that consists of the intersections of these rays with the upper half-circle form the
hemisphere model for A; the orthogonal projection of this arc onto the horizontal
axis forms the top-view model for A.
Fig. 1.12 Ray, hemisphere and top-view models for a convex set
The three models for visualization of convex sets are the combination of two
constructions that have been presented in the previous two sections: homogenization
of convex sets in X, and visualization of convex cones in X ×R = Rn+1 lying above
the horizontal hyperplane X × {0} by means of the ray, hemisphere and top-view
model for convex cones.
Ray Model for a Convex Set Each point x ∈ X = Rn can be represented by the open
ray in X × R++ that contains the point (x, 1). Thus each subset of X is modeled
as a set of open rays in the open upper halfspace X × R++ . This turns out to be
especially fruitful for visualizing convex sets if we want to understand their behavior
at infinity. We will see this in Chap. 3. We emphasize that this is an important issue.
Hemisphere Model for a Convex Set Consider the upper hemisphere {(x, ρ) ∈ X × R | x1^2 + · · · + xn^2 + ρ^2 = 1, ρ ≥ 0}. Combining the ray model above with the hemisphere model for convex cones, each subset of X is modeled by a subset of this hemisphere.
Example 1.6.2 (Hemisphere Model for a Convex Set) Figure 1.13 illustrates the hemisphere model for three convex sets A in the plane R^2. The models of these three sets are regions on the upper hemisphere; the question is whether their closures contain points of the circle that bounds the upper hemisphere. You see from left to right: (1) the closed
convex set A is bounded and the model of A is closed, (2) the closed convex set A is unbounded and the model of A is not closed, but if we add one suitable point on the boundary of the hemisphere, then the model becomes closed, (3) the closed convex set A is unbounded and the model of A is not closed, but if we add a suitable infinite set of points on the boundary of the hemisphere, then the model becomes closed. The set of points that would be added in the middle and on the right to make the model closed as explained above is denoted by RA. The reason for this notation is that the points of this set correspond to the rays of the recession cone RA of A.
Top-View Model for a Convex Set Now consider the standard closed unit ball in X = R^n, Bn = {x ∈ X | x1^2 + · · · + xn^2 ≤ 1}. Each point (x, ρ) in the hemisphere above, that is, each solution of x1^2 + · · · + xn^2 + ρ^2 = 1, ρ ≥ 0, can be represented by a point in this ball: the point x. Note here that (x, 0) is the orthogonal projection of the point (x, ρ) onto the horizontal hyperplane X × {0}. Combining this with the hemisphere model above, we get for each subset of X a bijection to a subset of the ball Bn, in fact, to a subset of the open ball UX = Un = {x ∈ X | ‖x‖ < 1}. Here a point x ∈ X corresponds to the point ‖(x, 1)‖^{−1}x ∈ Un.
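The passage from a point x ∈ X to its top-view representative can be made explicit; here is a small sketch (ours, not from the book), showing that points far from the origin are mapped close to the boundary circle, in line with the interpretation of the boundary as 'the horizon'.

```python
import numpy as np

def top_view(x):
    """Map x in R^n to ||(x, 1)||^{-1} * x, a point of the open unit ball."""
    x = np.asarray(x, dtype=float)
    return x / np.sqrt(1.0 + np.dot(x, x))

for r in (1.0, 10.0, 1000.0):
    p = top_view(np.array([r, 0.0]))
    print(r, np.linalg.norm(p))  # the norm tends to 1 as r grows
```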
Example 1.6.3 (Top-View Model for a Convex Set) Figure 1.14 illustrates the top-
view model for three convex sets A in the plane R2 that contain the origin as an
interior point.
The top-view models of these three convex sets A are subsets of the standard
open unit disk; these are also indicated by A. In particular, these models include no
points of the circle that bounds the disk. On the left, you see the model of a closed
bounded A; this model is a closed set in the interior of the disk. In the middle and on
the right, you see the model of a closed unbounded A. These models are not closed.
In the middle, you should add one suitable point from the circle in order to make it
closed. On the right, you should add a suitable infinite set of points from the circle to
make it closed. The sets of points that have to be added in order to make the model
closed have again been indicated by the notation for the recession cone, by RA.
The top-view model for a convex set will be especially fruitful for visualizing
convex sets if we want to understand their behavior at infinity. The advantage of the
top-view model over the hemisphere model is that it is in one dimension lower. For
example, we can visualize convex sets in three-dimensional space R^3 as subsets of the open ball x1^2 + x2^2 + x3^2 < 1 and convex sets in the plane R^2 as subsets of the open disk x1^2 + x2^2 < 1.
The definition of the concept of convex set serves, as all definitions do, for academic
precision only: for every concept that one considers, a definition has to be given. If
we work with a concept, we will never use its definition, but instead a user-friendly
calculus. For convex sets, this means that we want basic examples of convex sets
as well as convexity preserving operations, rules to make new convex sets from old
ones. Here are, to begin with, some basic convex sets: the three golden cones, the
convex hull of a set, and the conic hull of a set.
The most celebrated basic examples of convex sets are the following three ‘golden’
convex cones.
• The first orthant R^n_+ is the set of points in R^n with nonnegative coordinates.
• The Lorentz cone Ln , also called the ice cream cone, is the epigraph of the
Euclidean norm on Rn , that is, the region in Rn+1 above or on the graph of
the Euclidean norm,
{(x, y) | x ∈ R^n, y ∈ R, y ≥ (x1^2 + · · · + xn^2)^{1/2}}.
• The semidefinite cone is the set of positive semidefinite symmetric n × n-matrices, viewed as a convex cone in the vector space of symmetric n × n-matrices (which can be identified with R^{n(n+1)/2}; see Exercise 24).
In suitable coordinates, the Lorentz cone L^2 is the solution set of x1x2 ≥ x3^2, x1 ≥ 0, x2 ≥ 0. This is the maximal conification cmax(A) of the closed convex set A that has boundary the branch of the hyperbola x1x2 = 1 that is contained in the first quadrant x ≥ 0.
The convex hull co(S) of a set S ⊆ X is the smallest convex set containing S, and the conic hull cone(S) of S is the smallest convex cone containing S.
Definition 1.7.6 A convex combination of a finite set S = {s1, . . . , sk} ⊂ X = R^n is a linear combination of the elements of S,
ρ1s1 + · · · + ρksk,
with ρi ∈ R+, 1 ≤ i ≤ k and ρ1 + · · · + ρk = 1.
Example 1.7.7 (Convex Combination) Let a, b, c be three points in R2 that do not
lie on one line. Then the set of convex combinations of a, b, c is the triangle with
vertices a, b, c.
Proposition 1.7.8 The convex hull of a set S ⊆ X = Rn consists of the convex
combinations of finite subsets of S.
In particular, a convex set is closed under taking convex combinations of a finite
set of points.
This could be called a primal description—or a description from the inside—of
a convex set. Each choice of a finite subset of the convex set gives an approximation
from the inside: the set of all convex combinations of this subset. Figure 1.17
illustrates the primal description of a convex set.
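The primal description can be checked computationally: deciding whether x lies in the convex hull of a finite set S amounts, after homogenization, to a conic feasibility problem, namely finding ρ ≥ 0 with Σ ρi (si, 1) = (x, 1). The sketch below is our own illustration (the function in_convex_hull is ours), using scipy.optimize.linprog for the feasibility test.

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(x, S):
    """Is x a convex combination of the rows of S?

    Homogenized form: find rho >= 0 with sum_i rho_i * (s_i, 1) = (x, 1);
    the last equation forces the coefficients to sum to 1."""
    S, x = np.asarray(S, dtype=float), np.asarray(x, dtype=float)
    A_eq = np.vstack([S.T, np.ones(S.shape[0])])     # columns are (s_i, 1)
    b_eq = np.append(x, 1.0)
    res = linprog(c=np.zeros(S.shape[0]), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * S.shape[0], method="highs")
    return res.status == 0                           # 0 means: feasible

S = [[0, 0], [1, 0], [0, 1]]
print(in_convex_hull([0.25, 0.25], S), in_convex_hull([1.0, 1.0], S))  # True False
```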
Proposition 1.7.8 will be proved by the homogenization method. This is the first
of many proofs of results for convex sets that will be given in this way. Therefore,
we make here explicit how all these proofs will be organized.
1. We begin by stating a version of the result for convex cones, that is, for homoge-
neous convex sets. This version is obtained by conification (or homogenization)
of the original statement, which is for convex sets. For brevity we only display
the outcome, which is a statement for convex cones. So we do not display how
we got this statement for convex cones from the original statement for convex
sets.
2. Then we prove this convex cone version, working with convex cones.
3. Finally, we translate back by deconification (or dehomogenization).
We need a definition.
Definition 1.7.9 A conic combination of a finite set S = {s1 , . . . , sk } ⊂ X = Rn is
a nonnegative linear combination of the elements of S,
ρ1 s1 + · · · + ρk sk ,
with ρi ∈ R+ , 1 ≤ i ≤ k.
Example 1.7.10 (Conic Combination) The set of conic combinations of the stan-
dard basis e1 = (1, 0, . . . , 0) , . . . , en = (0, . . . , 0, 1) is the first orthant Rn+ .
Proof of Proposition 1.7.8
1. Convex cone version. We consider the following statement:
The conic hull of a set S ⊆ X consists of all conic combinations of finite subsets
of S.
2. Work with convex cones. The statement above holds true. Indeed, all conic combinations of finite subsets of S must lie in cone(S) (this follows immediately by induction on k), and as it is readily checked that these combinations form a convex cone containing S, they form the conic hull of S.
3. Translate back. The statement above implies the statement of the proposition: for S ⊆ X, the convex cone cone(S × {1}) consists of all elements ρ1(s1, 1) + · · · + ρk(sk, 1) with s1, . . . , sk ∈ S and ρ1, . . . , ρk ∈ R+. Such an element lies in X × {1} precisely if ρ1 + · · · + ρk = 1. Deconification therefore gives that the convex hull of S consists of the convex combinations of finite subsets of S.
Now we already mention briefly something that is the main subject of Chap. 4
and that is the heart of convex analysis. For convex sets, one has not only a
primal description, that is, a description from the inside, but also a so-called dual
description, that is, a description from the outside. Each choice of a finite system
of linear inequalities in n variables which is satisfied by all points of the convex set
gives an approximation from the outside: the set of all solutions of this system.
Example 1.7.11 (Dual Description of a Convex Set) Figure 1.18 illustrates the dual description of a convex set.
In the figure, six lines are drawn that do not intersect the convex set A. For each
of them we cut away the side of the line that is disjoint from A. This gives a six-sided
polygon that is an outer approximation of A.
The first operation is the Cartesian product. If we have two convex sets A ⊆ Y =
Rm , B ⊆ X = Rn , then their Cartesian product is the set
A × B = {(a, b) | a ∈ A, b ∈ B}.
The second operation is the image of a convex set under a linear function. Let
M be an m × n-matrix. This matrix determines a linear function Rn → Rm by the
recipe x → Mx. For each convex set A ⊆ Rn , its image under M is the set
MA = {Ma | a ∈ A}.
The third operation is the inverse image of a convex set under a linear function. For each convex set B ⊆ R^m, its inverse image under M is the set
BM = {a ∈ R^n | Ma ∈ B}.
We make, for later purposes, the following observations for the two nice properties
that convex sets can have, closedness and properness. The question is whether these
properties are preserved under the three operations on convex sets that have been
defined above. Closedness of convex sets is preserved by Cartesian products, by
inverse images under linear functions, but not by images under linear functions.
Example 1.8.2 (Closedness Not Always Preserved Under Linear Image) Figure 1.19 illustrates that the image of a closed convex set under a linear function need not be closed.
The shaded region, the solution set of xy ≥ 1, x ≥ 0, is a closed convex set, but its orthogonal projection onto the x-axis is the open half-line x > 0, y = 0. It is not closed: it does not contain the origin.
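A quick numerical illustration of this phenomenon (ours, not the book's): points on the boundary xy = 1 project to x-values that get arbitrarily close to 0, yet 0 itself is not the projection of any point of the set.

```python
import numpy as np

# A = {(x, y) | xy >= 1, x >= 0} is closed; its projection onto the
# x-axis is the open half-line x > 0.
for t in (1.0, 10.0, 1e3, 1e6):
    point = np.array([1.0 / t, t])   # lies on the boundary xy = 1 of A
    print(point[0])                  # the projections approach 0 ...
# ... but 0 itself is never attained: xy >= 1 forces x > 0.
```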
A sufficient condition for preservation of closedness of a convex set A ⊆ Rn by
the image under a linear function Rn → Rm determined by an m × n-matrix M
is that the kernel of M (the solution set of Mx = 0) and the recession cone RA of
A have only the origin in common, as we will see in Chap. 3. Because of this, we
will sometimes work with open convex sets instead of closed ones: openness of convex sets is preserved by images under a linear function R^n → R^m determined by an m × n-matrix M, provided this linear function is surjective, that is, provided the rows of M are linearly independent. Properness of convex sets is preserved by
Cartesian products, by images under linear functions, but not by inverse images
under linear functions.
For a finite set of points in X = Rn , we will give a concept of ‘points in the middle’.
Theorem 1.9.1 (Radon's Theorem) A finite set S ⊂ X = R^n of at least n + 2 points can be partitioned into two disjoint sets B ('blue') and R ('red') whose convex hulls intersect,
co(B) ∩ co(R) ≠ ∅.
Figure 1.20 illustrates Radon’s theorem for four specific points in the plane and
for five specific points in three-dimensional space.
A common point of the convex hulls of B and R is called a Radon point of S. It
can be considered as a point in the middle of S.
1.9 Radon’s Theorem 33
2. Work with convex cones. As #(S) > n, we can choose a nontrivial linear relation
among the elements of S. As each element of S has positive last coordinate, it
follows that in this nontrivial linear relation, at least one coefficient is positive
and at least one is negative. We rewrite this linear relation, leaving all terms
with positive coefficient on the left side of the equality sign and bringing the
remaining terms to the other side, giving an equality of two conic combinations.
Thus S has been partitioned into two disjoint nonempty sets B and R, and one
has an equality
μb b = νr r.
b∈B r∈R
two disjoint sets B × {1} and R × {1} whose conic hulls intersect in a nonzero
point. That is, we have the equation
λb (b, 1) = μr (r, 1)
b∈B r∈R
λb = μr .
b∈B r∈R
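As Exercise 42 below observes, this proof is an algorithm. Here is one possible implementation sketch (ours, not the book's; it uses the singular value decomposition to find a nontrivial linear relation among the lifted points):

```python
import numpy as np

def radon_partition(S):
    """Radon partition of at least n + 2 points in R^n, via conification.

    Returns index sets B and R and a common point of co(B) and co(R)."""
    S = np.asarray(S, dtype=float)
    m, n = S.shape
    assert m >= n + 2
    lifted = np.hstack([S, np.ones((m, 1))])     # the points (s, 1)
    # A nontrivial linear relation among the lifted points: a null vector
    # of the (n+1) x m matrix whose columns are the points (s, 1).
    _, _, vt = np.linalg.svd(lifted.T)
    coeffs = vt[-1]                              # sum_i coeffs_i * (s_i, 1) = 0
    B, R = np.where(coeffs > 0)[0], np.where(coeffs <= 0)[0]
    total = coeffs[B].sum()                      # positive, by the proof above
    return B, R, (coeffs[B] @ S[B]) / total

S = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(radon_partition(S))  # e.g. the two diagonals, meeting at (0.5, 0.5)
```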
For the induction step in the proof of Helly's theorem (Theorem 1.10.1), assume that the statement holds for some m ≥ n and that J ⊆ I with #(J) = m + 1. Choose for each j ∈ J a nonzero element
cj ∈ ∩_{k∈J\{j}} Ck.
The assumption in Helly’s theorem that the collection of convex sets is finite is
essential.
Example 1.10.4 (Finiteness Assumption in Helly’s Theorem)
1. The infinite collection of convex subsets [n, +∞) ⊂ R, n ∈ N has no common
point, but every finite subcollection has a common point. The reason for the
trouble here is that the common points of finite subcollections run away to
infinity.
2. Another example where things go wrong is the infinite collection (0, 1/n) ⊂ R, n ∈ N. The reason for the trouble here is that common points of finite subcollections tend to a point that is not in the intersection, in fact, that does not belong to any of these sets: zero.
If the two reasons for failure given above are prevented, then one gets a
version of the result for infinite collections. This requires the topological concept
of compactness (in Chap. 3 more details about topology will be given).
Note that the assumption of Theorem 1.10.7 can be relaxed: it is sufficient that the
convex sets from the collection are closed and that at least one of them is compact.
Many interesting results can be proved with unexpected ease when you observe
that Helly’s theorem can be applied. We formulate a number of these results. This
material is optional. At a first reading, you can just look at the statements of the
results below. Some of the proofs of these results are quite challenging. At the end
of the chapter, hints are given.
Proposition 1.11.1 If each three points from a set in the plane R2 of at least three
elements can be enclosed inside a unit disk, then the entire set can be enclosed inside
a unit disk.
Here is a formulation in nontechnical language of the next result: if there is a spot of diameter d on a tablecloth, then we can cover it with a circular napkin of radius d/√3.
Proposition 1.11.2 (Jung's Theorem) If the distance between any two points of a set in the plane R^2 is at most 1, then the set is contained in a disk of radius 1/√3.
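Following the hint given at the end of this chapter, Jung's theorem reduces by Helly's theorem to the case of three points. The sketch below is our own numerical probe (the function enclosing_radius is ours): it scales random triples to diameter 1 and observes that the smallest enclosing disk has radius at most 1/√3.

```python
import numpy as np

def enclosing_radius(p, q, r):
    """Radius of the smallest disk containing three points in the plane."""
    pts = [np.asarray(v, dtype=float) for v in (p, q, r)]
    # If some side, taken as a diameter, covers the third point, the
    # smallest such disk wins (obtuse and degenerate cases).
    radii = [np.linalg.norm(pts[i] - pts[(i + 1) % 3]) / 2
             for i in range(3)
             if np.linalg.norm(pts[(i + 2) % 3] - (pts[i] + pts[(i + 1) % 3]) / 2)
                <= np.linalg.norm(pts[i] - pts[(i + 1) % 3]) / 2 + 1e-12]
    if radii:
        return min(radii)
    # Otherwise the circumscribed disk is smallest: R = abc / (4 * area).
    u, v = pts[1] - pts[0], pts[2] - pts[0]
    area = abs(u[0] * v[1] - u[1] * v[0]) / 2
    a, b, c = (np.linalg.norm(pts[0] - pts[1]), np.linalg.norm(pts[1] - pts[2]),
               np.linalg.norm(pts[2] - pts[0]))
    return a * b * c / (4 * area)

rng = np.random.default_rng(1)
worst = 0.0
for _ in range(2000):
    pts = rng.random((3, 2))
    diam = max(np.linalg.norm(pts[i] - pts[j]) for i in range(3) for j in range(i))
    worst = max(worst, enclosing_radius(*(pts / diam)))   # diameter scaled to 1
print(worst, 1 / np.sqrt(3))  # worst stays below, and approaches, 1/sqrt(3)
```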
We can also give a similar result about a disk contained in a given convex set in
the plane.
Proposition 1.11.3 (Blaschke’s Theorem) Every bounded convex set A in the
plane of width 1 contains a disk of radius 1/3 (the width is the smallest distance
between parallel supporting lines; a line is called supporting if it contains a point
of the set and moreover has the set entirely on one of its two sides).
Remark This last result can be generalized to polynomials of any given degree. Then the polynomial function of degree < n that best approximates the function x → x^n on [−1, 1] can be given explicitly: by the recipe x → x^n − 2^{−(n−1)} cos(n arccos x).
We have seen that each point of the convex hull of a set S ⊆ X = Rn is a convex
combination of a finite subset of S. This result can be refined.
Theorem 1.12.1 (Carathéodory’s Theorem) Each point in the convex hull of a
set S ⊆ Rn is a convex combination of an affinely independent subset of S. In
particular, it is a convex combination of at most n + 1 points of S.
Example 1.12.2 (Carathéodory’s Theorem) Figure 1.23 illustrates this result for a
specific set of five points in the plane R2 .
Each point in the convex hull of this set is by Proposition 1.7.8 a convex
combination of these five points. But by Carathéodory’s theorem it is even a convex
combination of three of these points—not always the same three points. Indeed, in
the figure the convex hull is split up into three triangles. Each point in the convex
hull belongs to one of the triangles. Therefore, it is a convex combination of the
three vertices of the triangle. This is in agreement with Carathéodory’s theorem.
Proof of Carathéodory’s Theorem We use the homogenization method.
1. Convex cone version. We consider the following statement:
For a set S ⊆ X = Rn , every point in its conic hull cone(S) is a conic
combination of a linearly independent subset of S.
2. Work with convex cones. Choose an arbitrary x ∈ cone(S). Let r be the minimal
number for which x can be expressed as a conic combination of some finite subset
T of S of r elements—#T = r. Choose such a subset T and write x as a conic
combination of T . All coefficients of this conic combination must be positive by
the minimality property of r.
We claim that T is linearly independent. We prove this claim by contradiction. Suppose there were a nontrivial linear relation among the elements of T. We may assume that one of its coefficients is positive (by multiplying the relation by −1 if necessary). Then we subtract a positive multiple of this linear relation from the conic combination of T that represents x, taking this multiple maximal subject to the condition that all coefficients remain nonnegative. Then at least one coefficient becomes zero, so x is written as a conic combination of fewer than r elements of S. This contradicts the minimality of r and so proves the claim.
The convex cone version of Carathéodory’s theorem, which has been established
in the proof above, is worth stating separately.
Proposition 1.12.3 Every point in the conic hull of a set S ⊆ X = Rn is a
conic combination of a linearly independent subset of S. In particular, it is a conic
combination of at most n elements of S.
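Exercise 45 asks to make the algorithm that is implicit in the proof explicit. One possible sketch (ours, in Python): as long as the support of the conic combination is linearly dependent, subtract the largest feasible multiple of a nontrivial linear relation, which zeroes out at least one coefficient.

```python
import numpy as np

def caratheodory_reduce(S, rho, tol=1e-10):
    """Thin a conic combination x = sum_i rho_i * S[i] (rho >= 0) down to a
    linearly independent support, keeping the represented point fixed."""
    S, rho = np.asarray(S, dtype=float), np.array(rho, dtype=float)
    while True:
        support = np.where(rho > tol)[0]
        T = S[support]
        if len(support) == 0 or np.linalg.matrix_rank(T) == len(support):
            return rho                       # support linearly independent
        # A nontrivial relation sum_i c_i * T[i] = 0, from the null space.
        _, _, vt = np.linalg.svd(T.T)
        c = vt[-1]
        if c.max() < -c.min():
            c = -c                           # make the big coefficients positive
        pos = c > tol
        t = np.min(rho[support][pos] / c[pos])   # largest feasible step
        rho[support] = rho[support] - t * c      # >= 0, with a new zero entry

S = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
rho = np.array([0.3, 0.4, 0.2, 0.1])
x = rho @ S
rho2 = caratheodory_reduce(S, rho)
print(np.count_nonzero(rho2 > 1e-10) <= 2, np.allclose(rho2 @ S, x))  # True True
```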
Carathéodory’s theorem has many applications. The following one is especially
useful.
Corollary 1.12.4 The convex hull of a compact set S ⊆ X = Rn is closed.
In the proof, we will need the following property of the concept of compactness.
Recall that a function f from a set S ⊆ X = Rn to Y = Rm is called continuous if
limx→a f (x) = f (a) for all a ∈ S.
Proposition 1.12.5 The image of a compact set S ⊆ X = Rn under a continuous
function S → Y , with Y = Rm , is again compact.
Here we recall from Example 1.8.2 that the image of a closed set under a linear
function need not be closed.
Proof of Corollary 1.12.4 By Carathéodory's theorem, co(S) is the image of the function Δ × S^{n+1} → X given by the recipe
((ρ1, . . . , ρn+1), (s1, . . . , sn+1)) → ρ1s1 + · · · + ρn+1sn+1,
where Δ = {ρ ∈ R^{n+1} | ρi ≥ 0 for all i, ρ1 + · · · + ρn+1 = 1}. The set Δ × S^{n+1} is compact, as it is closed and bounded, and the function is continuous. Hence co(S) is compact by Proposition 1.12.5; in particular, it is closed.
Convex cones arise naturally if you consider a preference relation for the elements of X = R^n. A preference (relation) ⪰ on X is a partial ordering on X that satisfies the following conditions:
• x ⪰ x,
• x ⪰ y, u ⪰ v ⇒ x + u ⪰ y + v,
• x ⪰ y, α ≥ 0 ⇒ αx ⪰ αy.
If ⪰ is a preference relation on X, then it can be verified that the set {x ∈ X | x ⪰ 0} is a convex cone C containing the origin. Conversely, each convex cone C ⊆ X containing the origin gives a preference relation ⪰ on X—defined by x ⪰ y iff x − y ∈ C. That is, there is a natural bijection between the set of preference relations on X and the set of convex cones in X that contain the origin. One can translate properties of a preference relation ⪰ on X to its corresponding convex cone C ⊆ X. For example, ⪰ is antisymmetric (that is, x ⪰ y and y ⪰ x implies x = y) iff the convex cone C does not contain any line through the origin. A convex cone containing the origin that does not contain any line through the origin is called a pointed convex cone.
Example 1.13.1 (A Preference Relation on Space-Time) The following example of
a preference relation in space-time, the four dimensional space that is considered in
special relativity theory (see Susskind and Friedman [2]), plays a key role in that
theory. This explains why the convex cone y ≥ (x1^2 + · · · + xn^2)^{1/2} is named after
the physicist Lorentz. No information can travel faster than with the speed of light
(for convenience, we assume that this speed equals 1, by suitable choices of units of
length and time). Therefore, an event e that takes place at the point (x, y, z) at time t according to some observer can influence an event e′ that takes place at the point (x′, y′, z′) at time t′ according to the same observer if and only if light that is sent at
the moment t from the point (x, y, z) in the direction of the point (x′, y′, z′) reaches this point at a time ≤ t′. Then one writes e′ ⪰ e. This defines a preference relation on space-time R^4 = {(x, y, z, t)}. Its corresponding convex cone is the Lorentz cone
L^3 = {(x, y, z, t) | t ≥ (x^2 + y^2 + z^2)^{1/2}}.
Indeed, light that is sent from the point (0, 0, 0) at time 0 in the direction of a point (x, y, z) will reach this point at a time ≤ t iff t ≥ (x^2 + y^2 + z^2)^{1/2}.
Here are some interesting properties of this preference relation from special relativity theory. Whether one has for two given events e and e′ that e′ ⪰ e or not does not depend on the observer. Therefore, one sometimes says that e′ ⪰ e means that event e′ happens absolutely after event e. Note that it can happen for two events e and e′ that e′ happens after e according to some observer but that e happens after e′ according to another observer.
You might like to do the following experiment yourself (with pencil and paper or
with a computer). Choose a large number of arbitrary nonempty subsets S1 , . . . , Sm
of the plane R2 , and make a drawing of their Minkowski sum S = S1 + · · · + Sm =
{s1 +· · ·+sm | si ∈ Si , 1 ≤ i ≤ m}. That is, S is the set that you get by picking in all
possible ways one element from each set and then taking the vector sum. You might
expect that S can be any set in the plane. However, it has a very special property:
each point in the convex hull of S lies close to a point of S.
Example 1.14.1 (Shapley–Folkman Lemma) Figures 1.24, 1.25 and 1.26 illustrate
this experiment. Figure 1.24 shows a choice of sets: ten sets, each consisting of two
antipodal points.
Figure 1.25 shows the scaled outcome of the experiment, the Minkowski sum of
the ten sets.
Figure 1.26 also gives the convex hull of the Minkowski sum.
This empirical phenomenon can be formulated as a precise result (see Starr [3]).
This holds not only for the plane, but more generally in each dimension. To begin with, note the easy property that co(S) = co(S1) + · · · + co(Sm); that is, each x ∈ co(S) has a representation x = x1 + · · · + xm such that xi ∈ co(Si) for all i. Now we give a stronger version of this property, and this is the promised precise result.
Lemma 1.14.2 (Shapley–Folkman Lemma) Let Si (i = 1, . . . , m), with m ≥ n, be nonempty subsets of R^n. Write S = S1 + · · · + Sm for their Minkowski sum. Then each x ∈ co(S) has a representation x = x1 + · · · + xm such that xi ∈ co(Si) for all i, and xi ∈ Si for at least m − n indices i.
This result indeed implies the empirical observation above. For example, if each set Si, 1 ≤ i ≤ m, is contained in a ball of radius r, then the Shapley–Folkman lemma implies that each point in co(S) lies within distance 2nr of a point in S. In a more striking formulation, each point in the convex hull of the average sum (1/m)(S1 + · · · + Sm) lies within distance 2nr/m of a point of this average sum; note that this distance tends to zero as m → ∞.
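The experiment above can also be repeated numerically; the following sketch (ours, not the book's) sums ten two-point sets in the plane, samples points of the convex hull of the Minkowski sum, and measures their distance to the nearest point of the sum itself.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
m = 10
# Ten sets, each consisting of two antipodal points, as in Fig. 1.24.
sets = [np.array([v, -v]) for v in rng.normal(size=(m, 2))]

# The Minkowski sum S: all 2^m = 1024 choices of one point per set.
sum_points = np.array([np.sum(choice, axis=0) for choice in product(*sets)])

# Sample points of co(S) and measure the distance to the nearest point of S.
worst = 0.0
for _ in range(200):
    w = rng.random(len(sum_points))
    x = (w / w.sum()) @ sum_points
    worst = max(worst, np.min(np.linalg.norm(sum_points - x, axis=1)))
print(worst)  # small compared to the size of S, as the lemma predicts
```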
Proof of Lemma 1.14.2 Let x ∈ co(S). Then x can be written as x = Σ_{i=1}^{m} Σ_{j=1}^{li} aij yij, where, for each i, the points yij (1 ≤ j ≤ li) lie in Si, the coefficients aij are nonnegative, and Σ_{j=1}^{li} aij = 1. Consider the following vectors in R^n × R^m:
z = (x, 1, 1, . . . , 1),
z1j = (y1j, 1, 0, . . . , 0),
...................
zmj = (ymj, 0, 0, . . . , 1).
Then we have z = Σ_{i=1}^{m} Σ_{j=1}^{li} aij zij. Now we apply the convex cone version of Carathéodory's theorem, Proposition 1.12.3. It follows that z can be expressed as z = Σ_{i=1}^{m} Σ_{j=1}^{li} bij zij, in which one has for all pairs (i, j) that bij ≥ 0 and for at most m + n pairs (i, j) that bij > 0. This amounts to:
x = Σ_{i=1}^{m} Σ_{j=1}^{li} bij yij and Σ_{j=1}^{li} bij = 1 for all i.
Now write xi = Σ_{j=1}^{li} bij yij for all i. Then x = Σ_{i=1}^{m} xi and xi ∈ co(Si) for each i. As the number of pairs (i, j) for which bij > 0 is at most m + n, and as for each i one has bij > 0 for at least one j, there are at most n indices i that have bij > 0 for more than one j. Hence xi ∈ Si for at least m − n indices i.
1.15 Exercises
1. Make a drawing similar to Fig. 3, but now for a landscape with a straight road
that stretches out to the horizon, with lampposts along the road.
2. Check that the set F that is used to model the bargaining opportunity in the
Alice and Bob example, is indeed a convex set.
3. Make a drawing of the pair (F, v) in the Alice and Bob example and check that
this gives Fig. 1.1.
4. Show that the list of all convex subsets of the line R given in Example 1.2.3 is
correct.
5. *Prove the statements about convex polygons in the plane R2 in Example 1.2.3.
6. *Prove the statement about approximating a bounded convex set in the plane
R2 by polygons in Example 1.2.3.
7. Give a complete list of all convex cones in the plane R2 .
8. Consider the perspective function P : Rn × R++ → Rn given by the recipe
(x, t) → t −1 x. Show that the image of a convex set A ⊆ Rn × R++ under the
perspective function is convex.
9. Determine for each one of the following statements whether it is correct or
not.
(a) If you adjoin the origin to a convex cone that does not contain the origin,
then you get a convex cone containing the origin.
(b) If you delete the origin from a convex cone that contains the origin, then
you get a convex cone that does not contain the origin.
Hint: one statement is correct and one is wrong.
10. Prove the following statements, which describe the relation between convex
cones that do not contain the origin and convex cones that do contain the
origin.
(a) For each convex cone C ⊆ Rn that does not contain the origin, the union
with the origin, C ∪ {0n }, is a pointed convex cone.
(b) Each convex cone C ⊆ Rn that contains the origin is the Minkowski sum
of a pointed convex cone D and a subspace L,
C = D + L = {d + l | d ∈ D, l ∈ L}.
11. Consider a system of two linear equations in two variables,
a11x1 + a12x2 = b1,
a21x1 + a22x2 = b2.
(a) Show that homogenization of the system gives
a11x1 + a12x2 = b1x3,
a21x1 + a22x2 = b2x3.
(b) Assume that the equations are not scalar multiples of each other. Show
that the homogenized system has a unique (up to positive scalar multiples)
solution with x3 ≥ 0.
(c) Show that translation of this uniqueness result into geometric language
gives the following statement. Consider two closed half-planes in the upper
halfspace x3 ≥ 0 that are both bounded by lines in the horizontal plane
x3 = 0 through the origin, and that are both not contained in the horizontal
plane x3 = 0. Then their intersection is a closed ray in the upper halfspace
x3 ≥ 0.
12. Show that a subspace of Rn is a convex cone, and so in particular a convex set.
13. Let M be an m × n-matrix. Show that the subspaces {x ∈ R^n | Mx = 0} and {M⊤y | y ∈ R^m} of R^n are each other's orthogonal complement.
14. Give precise proofs for the results on subspaces from linear algebra that are
listed in Sect. 1.2 as specializations of results on convex sets in the Chaps 1–
23. Show that the first orthant, the Lorentz cone and the semidefinite cone are
convex cones and that these convex cones are pointed.
24. Show that the vector space of symmetric n × n-matrices can be identified with R^m where m = n(n + 1)/2, by means of stacking.
25. Show that viewing circles, ellipses, parabolas and branches of hyperbolas as
conic sections is an example of the homogenization of convex sets.
26. Make a version of Fig. 1.11 for an unbounded convex set in the plane. That is,
draw the conification for such a set.
27. Determine the recession cones of some simple convex sets, such as (7, +∞), {(x, y) | x ≥ 0, y ≥ 0}, {(x, y) | x > 0, y > 0}, {(x, y) ∈ R^2 | y ≥ x^2}, and {(x, y) ∈ R^2 | xy ≥ 1, x ≥ 0, y ≥ 0}.
28. Let A ⊆ X = Rn be a nonempty convex set. Show that the largest set S in Rn
for which adjoining S × {0} to the convex cone c(A) preserves the convex cone
property, is the recession cone of A, S = RA , defined to be the set of all r ∈ X
for which a + tr ∈ A for all t ≥ 0 and all a ∈ A.
29. Show that the recession cone RA of a nonempty closed convex set A ⊆ Rn is a
closed convex cone.
30. Prove the statements of Proposition 1.5.7.
31. Show that the minimal conification of an open convex set is open.
32. Show that the maximal conification of a closed convex set is closed.
33. Show that the maximal conification of an open convex set is never open.
34. Show that the minimal conification of a closed convex set is never closed.
35. Let A ⊆ X = Rn be a nonempty convex set. Show that cone(A), the conic hull
of A, is equal to R+ A = {ρa | ρ ≥ 0, a ∈ A}.
36. Verify that the following three operations make new convex sets (respectively
convex cones) from old ones: the Cartesian product, image and inverse image
under a linear function.
37. As an exercise in the homogenization method, derive the property that the
image and inverse image of a convex set under a linear function are convex
sets from the analogous property for convex cones.
38. Show that the image of the closed convex set {(x, y) | x, y ≥ 0, xy ≥ 1} under
the linear function R2 → R given by the recipe (x, y) → x is a convex set in
R that is not closed.
39. Show that the property closedness is preserved by the following two operations:
the Cartesian product and the inverse image under a linear function.
40. Show that the image MA of a closed convex set A ⊆ Rn under a linear function
M : Rn → Rm is closed if the kernel of M, that is, the solution set of Mx = 0,
and the recession cone of A, RA , have only the origin in common.
41. Show that the image MA of an open convex set A ⊆ Rn under a surjective
linear function M : Rn → Rm is open.
42. Show that the proof of Theorem 1.9.1 gives an algorithm for finding the sets B and R and a point in co(B) ∩ co(R).
43. Show that the proof of Theorem 1.10.1 gives in principle an algorithm for
finding a common point of the given collection of convex sets, provided one
has a common point of each subcollection of n + 1 of these sets.
44. Show that the assumption in Theorem 1.10.7 that all convex sets of the given
collection are compact can be relaxed to: all convex sets of the given collection
are closed and at least one of them is bounded.
45. Make the algorithm that is implicit in the proof of Theorem 1.12.1 explicit.
46. Is the smallest closed set that contains a given convex set always convex?
47. Is the smallest convex set that contains a given closed set always closed?
48. Give a constructive proof ('an algorithm'), without using any results, for Weyl's theorem in the case of a tetrahedron, the convex hull of an affinely independent set. To be precise, prove the following statements.
1.16 Hints for Applications of Helly's Theorem
Here are hints for the applications of the theorem of Helly that are formulated in
Sect. 1.11.
• Proposition 1.11.1. Consider the collection of unit disks with center an element
of the set.
• Proposition 1.11.2. Show that each three points of the set are included in a disk of radius 1/√3.
• Proposition 1.11.3. Let l run over the set L of supporting lines of A, define for
each l ∈ L the set Al to consist of all points in A of distance to l at most 1/3 and
then consider the collection of sets Al , l ∈ L.
• Proposition 1.11.4. Consider the collection of closed half-planes in the plane
which contains on its boundary a side of the polygon and which contains also all
points of the polygon close to this side.
• Proposition 1.11.5. Consider the collection of convex hulls of subsets of S of at least nm/(n + 1) points, where m = #(S).
• Proposition 1.11.6. Let l run over the set L of parallel line segments, define for each l ∈ L the set Al consisting of all lines that intersect l, and then consider the collection Al, l ∈ L.
References
Abstract
• Why. You need the Minkowski sum and the convex hull of the union, two binary operations on convex sets, to solve optimization problems. A convex function f has a minimum at a given point x̂ iff the origin lies in the subdifferential ∂f(x̂), a certain convex set. Now suppose that f is made from two convex functions f1, f2 with known subdifferentials at x̂, by one of the following two binary operations for convex functions: either the pointwise sum (f = f1 + f2) or the pointwise maximum (f = max(f1, f2)), and, in the latter case, f1(x̂) = f2(x̂). Then ∂f(x̂) can be calculated, and so it can be checked whether it contains the origin or not: ∂(f1 + f2)(x̂) is equal to the Minkowski sum of the convex sets ∂f1(x̂) and ∂f2(x̂), and ∂(max(f1, f2))(x̂) is equal to the convex hull of the union of ∂f1(x̂) and ∂f2(x̂) in the difficult case that f1(x̂) = f2(x̂).
• What. The four standard binary operations for convex sets are the Minkowski
sum, the intersection, the convex hull of the union and the inverse sum. The
defining formulas for these binary operations look completely different, but
they can all be generated in exactly the same systematic way by a reduction to
convex cones (‘homogenization’). These binary operations preserve closedness
for polyhedral sets but not for arbitrary convex sets.
Road Map
• Section 2.1.1 (motivation binary operations convex sets).
• Section 2.1.2 (crash course in working coordinate-free).
• Section 2.2 (construction binary operations +, ∩ for convex cones by homog-
enization; the fact that the sum map and the diagonal map are each other's transpose).
• Section 2.3 (construction of one binary operation for convex sets by homogeniza-
tion).
• Proposition 2.4.1 and Fig. 2.1 (list for the construction of the standard binary
operations +, ∩, co∪, # for convex sets by homogenization; note that not all of
these preserve the nice properties closedness and properness).
Binary operations on convex sets have been considered already a long time ago, for example in the classic work [1]. In [2], and later in [3], attempts have been made to construct complete lists of binary operations on various types of convex objects. This
plan was realized using the method presented in this chapter, by homogenization, in
[4] and [5].
A driving idea behind convex analysis is the parallel with differential analysis. Let us apply this idea to the minimization of a function f : R^n → R. If f is differentiable, then one has, at a minimum x̂, that f′(x̂) = 0: the derivative of f at x̂ is zero. If, moreover, f is built up from differentiable functions for which one knows the derivative at x̂, as f = f1 + f2 or as f = f1f2, then f′(x̂) can be expressed in f1′(x̂) and f2′(x̂). Here are the convex parallels of these two facts. If f is convex (that is, the region above its graph is a convex set), then x̂ is a minimum iff 0 ∈ ∂f(x̂): zero lies in a certain convex set, called the subdifferential of f at x̂. If moreover f is built up from convex functions for which one knows the subdifferential at x̂, as f = f1 + f2 or as f = max(f1, f2), then ∂f(x̂) can be expressed in ∂f1(x̂) and ∂f2(x̂), using binary operations for convex sets. We will see in Chap. 7 that these parallel facts are used to solve optimization problems.
In order to present the constructions of the binary operations for convex sets as
simply as possible, we will work coordinate-free. In this section, we recall the
minimal knowledge that is needed for this. More information can be found in any
textbook on linear algebra.
The idea is that if you work with column spaces, matrices, dot products and the
transpose of a matrix, you can do this in another—more abstract—language, which
does not make use of coordinates.
• Column spaces are described by finite dimensional vector spaces. These are
defined as sets V that are equipped with a function V ×V → V , which is denoted
by (v, w) → v + w and called vector addition, and with a function R × V → V ,
which is denoted by (α, v) → αv and called scalar multiplication. In order to
make sure that you can work with vector addition and scalar multiplication in the
same way that you do in column spaces, you impose a number of suitable axioms
on them such as α(v + w) = αv + αw. Finite dimensionality of V is ensured by
requiring that there exist v1 , . . . , vn ∈ V such that each v ∈ V can be written as
α1v1 + · · · + αnvn for suitable numbers α1, . . . , αn ∈ R.
The notion of finite dimensional vector space is essentially the same as the
notion of column space, in particular, it is not more general. If the natural number
n above is chosen to be minimal, then v1 , . . . , vn is called a basis of V . The
reason for this terminology is that each v ∈ V can be written uniquely as α1 v1 +
· · · + αn vn for suitable numbers α1 , . . . , αn ∈ R. Then you can replace v by the
column vector with coordinates α1 , . . . , αn (from top to bottom). If you do this,
then you are essentially working with column space Rn with the usual vector
addition and scalar multiplication. So a finite dimensional vector space is indeed
essentially a column space.
• Matrices are described by linear functions between two finite dimensional vector spaces V and W. These are defined as functions L : V → W that send sums to sums, L(u + v) = L(u) + L(v), and scalar multiples to the same scalar multiples, L(αv) = αL(v).
The notion of a linear function is essentially the same as the notion of a matrix.
Let L : V → W be a linear function between finite dimensional vector spaces.
Choose a basis v1 , . . . , vn of V and a basis w1 , . . . , wm of W . Then there exists
a unique m × n-matrix M such that if you go over to coordinates, and so get
a function Rn → Rm , then this function is given by left multiplication by the
matrix M, that is, by the recipe x → Mx, where Mx is matrix multiplication.
So the notion of a linear function is indeed essentially the same as the notion of
a matrix.
Here we point out one advantage of working with linear functions over
working with matrices. This reveals the deeper idea behind the matrix product:
it is just the coordinate version of composition of linear functions (such a
composition is of course again a linear function).
• The dot product x · y = x1y1 + · · · + xnyn is described by an inner product on a finite dimensional vector space V. This is defined to be a function V × V → R—denoted by (u, v) → ⟨u, v⟩ = ⟨u, v⟩V and called inner product—on which suitable axioms are imposed so that you can work with it just as you do with the dot product.
The notion of an inner product is essentially the same as the notion of the dot product; in particular, it is not more general. Let ⟨·, ·⟩ be an inner product on a finite dimensional vector space V. Choose an orthonormal basis v1, . . . , vn of V (that is, ⟨vi, vj⟩ equals 0 if i ≠ j and it equals 1 if i = j). Go over to coordinates, and so get an inner product on R^n. Then this inner product is the dot product. So the notion of inner product is indeed essentially the same as the notion of the dot product.
A finite dimensional vector space together with an inner product is called a
finite dimensional inner product space. So a finite dimensional inner product
space is essentially a column space considered with the dot product.
• The transpose of a matrix is described by the transpose L⊤ of a linear function L between two finite dimensional inner product spaces V and W. This is defined as the unique linear function L⊤ : W → V for which
⟨L(v), w⟩W = ⟨v, L⊤(w)⟩V
for all v ∈ V, w ∈ W. Note that (L⊤)⊤ = L. That is, for two linear functions L : V → W and M : W → V we have L⊤ = M iff M⊤ = L. Then L and M are said to be each other's transpose.
The notion of the transpose of a linear function is essentially the same as the notion of the transpose of a matrix. Let L : V → W be a linear function between finite dimensional inner product spaces. Choose orthonormal bases v1, . . . , vn of V and w1, . . . , wm of W. Then we go over to coordinates, and so L is turned into an m × n-matrix and L⊤ is turned into an n × m-matrix. Then these two matrices are each other's transpose.
One can do everything that was done in Chap. 1 coordinate-free. For example, one
can define convex sets and convex cones in vector spaces, one can take Cartesian
products of convex sets, one can take the image of a convex set under a linear
function between two vector spaces and one can take the inverse image of a convex
set under a linear function between two vector spaces.
The aim of this section is to construct the two binary operations for the homo-
geneous case, that is, for convex cones—intersection and Minkowski sum—in a
systematic way: by homogenization. Moreover, it is shown that these two binary
operations are closely related—by means of the transpose operator. For these tasks,
it is convenient to work coordinate-free. However, we recall that a finite dimensional
inner product space is essentially just Rn equipped with the dot product.
We consider two special linear functions for a finite dimensional inner product
space X:
• the sum map +X : X × X → X is the linear function with recipe (x, y) → x + y,
• the diagonal map ΔX : X → X × X is the linear function with recipe x → (x, x).
The function +X needs no justification—it gives the formal description of addition in X. The function ΔX gives a formal description for the equality relation x = y on pairs (x, y) of elements of X: these pairs are precisely those of the form ΔX(x). Moreover, we now show that these two special linear functions +X and ΔX are closely related: they are each other's transpose. This fundamental property will be used later, for example in Chap. 6, to derive calculus rules.
Proposition 2.2.1 (Relation Sum and Diagonal Map) Let X be a finite dimensional inner product space. Then the sum map +X and the diagonal map ΔX are each other's transpose.
Proof For all u ∈ X and all (v, w) ∈ X × X, we have
⟨u, ΔX⊤(v, w)⟩ = ⟨ΔX(u), (v, w)⟩ = ⟨(u, u), (v, w)⟩ = ⟨u, v⟩ + ⟨u, w⟩ = ⟨u, v + w⟩ = ⟨u, +X(v, w)⟩.
It follows that ΔX⊤ = +X, as required.
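In coordinates—this is Exercise 9 below—the proposition says simply that the matrix of +X is the transpose of the matrix of ΔX. A short numerical check (our own illustration):

```python
import numpy as np

n = 3
I = np.eye(n)
sum_map = np.hstack([I, I])        # matrix of +_X : R^n x R^n -> R^n
diagonal_map = np.vstack([I, I])   # matrix of Delta_X : R^n -> R^n x R^n
print(np.array_equal(sum_map.T, diagonal_map))  # True
```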
Now we construct binary operations on convex cones. For two convex cones
C, D ⊆ X, we take, to begin with, the Cartesian product C × D, which is a convex
cone in X × X. Now we use the two operations by linear functions on convex sets—
image and inverse image.
We take (C × D)ΔX, the inverse image of C × D under the diagonal map ΔX. This is a convex cone in X. Note that this is the intersection C ∩ D of C and D:
(C × D)ΔX = C ∩ D,
as x lies in this inverse image iff ΔX(x) = (x, x) ∈ C × D, that is, iff x ∈ C and x ∈ D.
We take +X (C × D), the image of C × D under the sum map +X . This is also a
convex cone in X. Note that this is the Minkowski sum C + D of C and D:
+X (C × D) = C + D.
The aim of this section is to give a systematic construction for binary operations for
convex sets: by the homogenization method. We repeat once again that this means
that we begin by working with convex cones and then we dehomogenize.
The skill that is demonstrated in this section is somewhat technical. By reading
the explanation of the construction of binary operations below to get the idea,
moreover studying carefully the example to understand the details, furthermore
repeating the calculation from the example yourself, and finally constructing in the
same way one of the other three binary operations, you can master it.
The precise way to construct a binary operation on convex sets by homogeniza-
tion, is as follows. We take the space that contains the homogenizations of convex
sets in a finite dimensional vector space X (we recall that X is essentially Rn ): this
space is the product space X×R. Then we make, for each of the two factors X and R
of this space, a choice between + and −. Let the choice for X be ε1 and let the choice
for R be ε2. This gives 2^2 = 4 possibilities for (ε1, ε2): (++), (+−), (−+), (−−).
Each choice (ε1 , ε2 ) will lead to a binary operation ◦(ε1 ,ε2 ) on convex sets.
Let (ε1 , ε2 ) be one of the four choices, and let A and B be two ‘old’ convex sets
in X. We use the homogenization method to construct a ‘new’ convex set A◦(ε1 ,ε2 ) B
in X.
1. Homogenize. We conify A and B. This gives the conifications c(A) and c(B)
respectively. These are convex cones in X × R++ .
2. Work with homogeneous convex sets, that is, with convex cones. We are going to
associate to the old convex cones C = c(A) and D = c(B) a new convex cone
E that is contained in X × R++ .
We take the Cartesian product C × D. This is a convex cone in X × R++ × X × R++. We reorder the factors, giving a convex cone in (X × X) × (R++ × R++). Then we deal with each of the two factors X × X and R++ × R++ separately, according to the choices ε1 and ε2: for the choice +, we take the image under the sum map (+X : X × X → X for the first factor, and the analogous map +R for the second one); for the choice −, we take the inverse image under the diagonal map (ΔX : X → X × X for the first factor, and the analogous map ΔR for the second one). This gives a convex cone E contained in X × R++.
3. Deconify (dehomogenize). We define A ◦(ε1,ε2) B to be the deconification of E.
Here is this construction carried out in detail for the choice (+−).
To explain what this binary operation is, let A and B be two ‘old’ convex sets in
X. We have to make a new convex set A ◦(+−) B in X from A and B. We use the
homogenization method.
1. Homogenize. We conify A and B. This gives the conifications c(A) and c(B)
respectively. These are convex cones in X × R++ .
2. Work with homogeneous convex sets, that is, with convex cones. We want to
associate to the old convex cones C = c(A) and D = c(B) a new convex cone
E that is contained in X × R++ .
We take the Cartesian product C × D. This is a convex cone in X × R++ × X × R++. We reorder the factors, giving a convex cone in (X × X) × (R++ × R++). The choice (+−) means that we take the image of the factor X × X under the sum map +X and the inverse image of the factor R++ × R++ under the diagonal map ΔR. This gives a convex cone E such that E ⊆ X × R++.
Now we use that C = c(A) and D = c(B). It follows that E consists of all (z, γ) ∈ X × R++ for which there exists an element (sa, s, tb, t) of C × D—so a ∈ A, b ∈ B and s, t ∈ R++—for which
z = sa + tb and γ = s = t.
That is, z = γ(a + b) for some a ∈ A and b ∈ B. That is,
E = c(A + B),
and so deconification gives
A ◦(+−) B = A + B,
the Minkowski sum of A and B.
Here is a list that gives which choice leads to which binary operation if we construct
the binary operations for convex sets systematically, by the homogenization method.
Proposition 2.4.1 (Systematic Construction Binary Operations for Convex Sets) The outcome of the 4 = 2^2 possibilities (±, ±) of choosing for each one of the two factors of X × R between + and − is as follows:
• (+−) gives the Minkowski sum +, defined by
A + B = {a + b | a ∈ A, b ∈ B},
• (−−) gives the intersection ∩, defined by
A ∩ B = {x ∈ X | x ∈ A & x ∈ B},
• (++) gives the convex hull of the union co∪, defined by
A co∪ B = co(A ∪ B),
• (−+) gives the inverse sum #, defined by
A # B = ∪_{0≤ρ≤1} (ρA ∩ (1 − ρ)B).
Example 2.4.2 (Minkowski Sum) For n = 1, the Minkowski sum of two intervals is given by
[a1, b1] + [a2, b2] = [a1 + a2, b1 + b2].
For n = 2, the Minkowski sum of two rectangles whose sides are parallel to the coordinate axes is given by
[a1, b1] × [c1, d1] + [a2, b2] × [c2, d2] = [a1 + a2, b1 + b2] × [c1 + c2, d1 + d2].
This can be used to compute the Minkowski sum of two arbitrary convex sets in
R2 by approximating these by a disjoint union of such rectangles and then using
that if two convex sets A, B in R2 are written as the disjoint union of convex sets,
A = ∪i∈I Ai and B = ∪j ∈J Bj , then A + B = ∪i∈I,j ∈J (Ai + Bj ). For general n,
one can proceed in a similar way as for n = 2.
Example 2.4.3 (Convex Hull of the Union) Figure 2.1 illustrates the convex hull of
the union.
Figure 2.1 shows that if we want to determine the convex hull of the union of two convex sets A, B analytically in the interesting case that A ⊄ B and B ⊄ A, then we have to determine the outer common tangents to A and B and the points of tangency.
Example 2.4.4 (Inverse Sum) Let A, B ⊆ R^n be convex sets. Then 0n ∈ A#B iff 0n ∈ A or 0n ∈ B. For a nonzero vector v ∈ R^n, one has v ∈ A#B iff v lies in the inverse sum of the following two convex subsets of the open ray R generated by v: A ∩ R and B ∩ R. It remains to determine the inverse sum of two intervals in (0, +∞):
[b, a] # [d, c] = [(1/b + 1/d)^{−1}, (1/a + 1/c)^{−1}]
for all a > b > 0 and c > d > 0. This formula explains the terminology inverse sum.
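The interval formula can be confirmed by brute force: the sketch below (ours, not the book's) approximates A # B = ∪_{0≤ρ≤1} (ρA ∩ (1 − ρ)B) for two intervals on a grid of values of ρ and compares the result with the harmonic endpoints given by the formula.

```python
import numpy as np

def inverse_sum_of_intervals(b, a, d, c, steps=10001):
    """Brute-force A # B for A = [b, a], B = [d, c] inside (0, +infty)."""
    lo, hi = np.inf, 0.0
    for rho in np.linspace(0.0, 1.0, steps):
        # rho*A = [rho*b, rho*a] and (1 - rho)*B = [(1-rho)*d, (1-rho)*c]
        left = max(rho * b, (1 - rho) * d)
        right = min(rho * a, (1 - rho) * c)
        if left <= right:                    # nonempty intersection
            lo, hi = min(lo, left), max(hi, right)
    return lo, hi

a, b, c, d = 3.0, 2.0, 5.0, 1.0              # a > b > 0 and c > d > 0
print(inverse_sum_of_intervals(b, a, d, c))
print(1 / (1 / b + 1 / d), 1 / (1 / a + 1 / c))  # the endpoints agree
```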
2.5 Exercises
9. Give a 'coordinate proof' of Proposition 2.2.1: show that the matrices of +X and ΔX for X = R^n, equipped with the dot product, are each other's transpose.
10. Verify the two formulas (C × D)ΔX = C ∩ D and +X(C × D) = C + D for convex cones C, D ⊆ X = R^n.
11. Show that each one of the two standard binary operations ◦ = ∩, + for convex cones is commutative, A ◦ B = B ◦ A, and associative, (A ◦ B) ◦ C = A ◦ (B ◦ C). Do this in two ways: by the explicit formulas for these binary operations and in terms of their systematic construction.
12. Check that the set E constructed in Sect. 2.3 is a convex cone in X × R for
which E ⊆ X × R++ .
13. (a) Show that the following two binary operations for convex sets, the
Minkowski sum and the convex hull of the union, are the same for convex
cones.
(b) Show that the following two binary operations for convex sets, the intersec-
tion and the inverse sum, are the same for convex cones.
14. Show that each one of the four well-known binary operations on convex sets ◦ = ∩, +, co∪, # is commutative, A ◦ B = B ◦ A, and associative, (A ◦ B) ◦ C = A ◦ (B ◦ C). Do this in two ways: by the explicit formulas for these binary operations and in terms of their systematic construction.
15. Prove Proposition 2.4.1.
References
Abstract
• Why. Convex sets have many useful special topological properties. The chal-
lenging concept of recession directions of a convex set has to be mastered: this is
needed for work with unbounded convex sets. Here is an example of the use of
recession directions: they can turn ‘non-existence’ (of a bound for a convex set
or of an optimal solution for a convex optimization problem) into existence (of a
recession direction). This gives a certificate for non-existence.
• What. The recession directions of a convex set can be obtained by taking the
closure of its homogenization. These directions can be visualized by means of
the three models as ‘points at infinity’ of a convex set, lying on ‘the horizon’. All
topological properties of convex sets can be summarized by one geometrically
intuitive result only: if you adjoin to a convex set A their points at infinity, then
the outcome has always the same ‘shape’, whether A is bounded or unbounded:
it is a slightly deformed open ball (its relative interior) that is surrounded on
all sides by a ‘peel’ (its relative boundary with its points at infinity adjoined).
In particular, this is essentially a reduction of unbounded convex sets to the
simpler bounded convex sets. This single result implies all standard topological
properties of convex sets. For example, there is a very close relation between
the relative interior and the closure of a convex set: they can be obtained from
each other by taking the relative interior in one direction and the closure in the
other. Another example is that the relative boundary of a convex set has a
‘continuity’ property.
Road Map
• Definition 1.5.4 and Fig. 3.1 (recession vector).
• Section 3.2 (list of topological notions).
• Proposition 3.2.1, Definition 1.5.4, Fig. 3.2 (construction of recession vectors by
homogenization).
• Corollary 3.2.4 (unboundedness is equivalent to the existence of a recession direction).
• Figures 3.3, 3.4, 3.5, 3.6, 3.7 (visualization of recession directions as points on
the horizon).
• Theorem 3.5.7 (the shape of bounded and unbounded convex sets).
• Figure 3.10 (the idea behind the continuity of the relative boundary of a convex
set).
• Corollary 3.6.2 (all standard topological properties of convex sets).
• Figure 3.11 (warning that the image under a linear function of a closed convex
set need not be closed).
3.1 *Crash Course in Topological Notions
The study of convex sets A leads to the need to consider limits of those sequences of
points in A that tend either to a boundary point of A or ‘to a recession direction of A’
(in particular, the Euclidean norm of the points then goes to infinity). The most convenient way to do
this, is to use topological notions. In this section, we recall the minimal knowledge
that is needed for this. More information can be found in any textbook on analysis
or on point-set topology.
A set U ⊆ X = Rn is called open if it is the union of a collection of open balls.
An open ball is the solution set UX(a, r) = Un(a, r) = U(a, r) of an inequality of
the form ‖x − a‖ < r for some vector a ∈ Rn and some positive real number r. The
sets ∅ and X are open, the intersection of a finite collection of open sets is open and
the union of any collection of open sets is open. For any set S ⊆ X, the union of all
open sets contained in S is called the interior of S and it is denoted by int(S). This
is the largest open subset of S. Points in int(S) are called interior points of S.
A set C ⊆ X is called closed if its complement X\C is open. The sets ∅ and X are
closed, the union of a finite collection of closed sets is closed and the intersection of
any collection of closed sets is closed. For any S ⊆ X, the intersection of all closed
sets containing S is called the closure of S and it is denoted by cl(S). This is the
smallest closed set containing S.
For each set S ⊆ X, the points from the closure of S that are not contained in the
interior of S are called boundary points of S.
A set S ⊆ X is called bounded if there exists K > 0 such that |si | ≤ K for all
s ∈ S and all i ∈ {1, . . . , n}. Otherwise, S is called an unbounded set. A set S ⊆ X
is called compact if it is closed and bounded.
An infinite sequence of points v1, v2, . . . in X has limit v ∈ X—notation
limk→∞ vk = v—if there exists for each ε > 0 a natural number N such that
‖vk − v‖ < ε for all k ≥ N. Here ‖ · ‖ is the Euclidean norm on X, given by
‖x‖ = (x₁² + · · · + xₙ²)^{1/2} for all x ∈ X. Then the sequence is called convergent.
An alternative definition of a closed set in X is that it is a subset A of X such that
for every sequence of points in A that is convergent in X, also its limit is a point
in A.
Let S ⊆ X = Rn, T ⊆ Y = Rm, a ∈ S, b ∈ Y and f : S → T a function. Then
one says that, for s → a, the limit of f(s) equals b—notation lims→a f(s) = b—if
there exists for each ε > 0 a number δ > 0 such that ‖f(s) − b‖ < ε for all s ∈ S
for which ‖s − a‖ < δ.
The main aim of this section is to state the result that the recession cone RA of a
convex set A ⊆ X = Rn , defined in Definition 1.5.4, can be constructed by means
of a closure operation. The proof will be given in the next section.
Figure 3.1 illustrates the concept of recession cone of a convex set.
Proposition 3.2.1 For each nonempty closed convex set A ⊆ X = Rn, the convex
cone RA × {0} is the intersection of cl(c(A)), the closure of the conification c(A)
of A (the set of positive multiples of points of A × {1}), with the horizontal
hyperplane X × {0}.
Note that this result implies that the recession cone RA of a nonempty closed
convex set A is a closed convex cone containing the origin. Indeed, cl(c(A))
is a closed convex cone containing the origin, and so RA × {0}, which you get from
it by intersection with the horizontal hyperplane X × {0}, is also a closed convex
cone containing the origin. This means that RA is a closed convex cone containing
the origin.
Example 3.2.2 (Recession Cone by Means of a Closure Operation (Geometric))
Figure 3.2 illustrates Proposition 3.2.1 for the closed half-line A = [0, +∞) ⊂
X = R.
Fig. 3.1 The recession cone RA of a convex set A
Fig. 3.2 The recession cone of the half-line A = [0, +∞) obtained by a closure operation
The usual definition of the recession cone gives that A has a unique recession
direction, ‘go right’, described by the open ray {(x, 0) | x > 0}. The convex cone
c(A) is the solution set of x ≥ 0, y > 0. Its closure is the solution
set of x ≥ 0, y ≥ 0. So if you take the closure, then you add all points (x, 0) with
x ≥ 0. So the direction ‘go right’ is added.
Figure 3.2 can also be used to illustrate this proposition in terms of the ray model
(consider the sequence of open rays that is drawn and for which the angle with
the positive axis tends to zero; note that the positive x-axis models the recession
direction) and the hemisphere model (consider the sequence of dotted points on
the arc that tend to the encircled point; note that this point models the recession
direction).
Example 3.2.3 (Recession Cone by Means of a Closure Operation (Analytic))
1. Ellipse. Let A ⊆ R² be the solution set of 2x₁² + 3x₂² ≤ 5 (a convex set in the plane
whose boundary is an ellipse). The (minimal) conification c(A) = cmin(A) ⊆
R² × R₊₊ can be obtained by replacing xᵢ by xᵢ/x₃ with x₃ > 0, for i = 1, 2, and
then multiplying by x₃². This gives the inequality 2x₁² + 3x₂² ≤ 5x₃². Its solution
set in R² × R₊₊ is c(A). Then the maximal conification cmax(A) can be obtained
by taking the closure. This gives that cmax(A) is the solution set in R² × R₊ of
the inequality 2x₁² + 3x₂² ≤ 5x₃². The recession cone RA ⊆ R² can be obtained
by putting x₃ = 0 in this inequality. This gives the inequality 2x₁² + 3x₂² ≤ 0. Its
solution set in R² is {0₂}. This is RA.
2. Parabola. Let A ⊆ R² be the solution set of x₂ ≥ x₁² (a convex set in the
plane whose boundary is a parabola). We proceed as above. This gives that
the (minimal) conification c(A) = cmin(A) ⊆ R² × R₊₊ is the solution set of
x₂x₃ ≥ x₁². The maximal conification cmax(A) can be obtained by taking the
closure. One can check that this gives that cmax(A) is the solution set in R² × R₊
of the system x₂x₃ ≥ x₁², x₂ ≥ 0. The recession cone RA ⊆ R² can be obtained
by putting x₃ = 0 in the first inequality. This gives the system of inequalities
0 ≥ x₁², x₂ ≥ 0. Its solution set in R² is the closed vertical ray in R² that points
upward, {(0, ρ) | ρ ≥ 0}. This is RA.
3. Hyperbola. Let A be the solution set of x₁x₂ ≥ 1, x₁ ≥ 0, x₂ ≥ 0 (a convex set
in the plane whose boundary is a branch of a hyperbola). We proceed again as
above. This gives that the (minimal) conification c(A) = cmin(A) ⊆ R² × R₊₊ is
the solution set of x₁x₂ ≥ x₃², x₁ ≥ 0, x₂ ≥ 0. The maximal conification cmax(A)
can be obtained by taking the closure. One can check that this gives that cmax(A)
is the solution set in R² × R₊ of the system x₁x₂ ≥ x₃², x₁ ≥ 0, x₂ ≥ 0. The
recession cone RA ⊆ R² can be obtained by putting x₃ = 0 in the first inequality.
This gives the system of inequalities x₁x₂ ≥ 0, x₁ ≥ 0, x₂ ≥ 0. Its solution set
in R² is the first quadrant R²₊. This is RA.
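The homogenization step in these computations can be automated. Here is a minimal sketch with sympy (an illustration, not from the book): substitute xᵢ → xᵢ/x₃, clear denominators, and then set x₃ = 0 to read off the recession cone.

import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3', real=True)

def homogenize(lhs, rhs, degree):
    # Homogenize the inequality lhs <= rhs by x_i -> x_i/x3 (x3 > 0)
    # and multiplication by x3**degree.
    sub = {x1: x1 / x3, x2: x2 / x3}
    return (sp.expand(lhs.subs(sub) * x3**degree),
            sp.expand(rhs.subs(sub) * x3**degree))

# Ellipse: 2*x1**2 + 3*x2**2 <= 5 has a degree-2 homogenization.
lhs, rhs = homogenize(2*x1**2 + 3*x2**2, sp.Integer(5), 2)
print(lhs, '<=', rhs)                          # 2*x1**2 + 3*x2**2 <= 5*x3**2
print(lhs.subs(x3, 0), '<=', rhs.subs(x3, 0))  # ... <= 0, so RA = {(0, 0)}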
Proposition 3.2.1 has the following consequence.
Corollary 3.2.4 Let A ⊆ X = Rn be a given nonempty closed convex set. Then the
following conditions are equivalent:
1. A is bounded,
2. A has no recession directions,
3. c(A) ∪ {0n+1} is closed.
For each nonempty closed convex set A, there is a natural bijection between the set
of recession directions of A and the set of ‘horizontal’ open rays of cl(c(A)), that
is, those rays that are not open rays of c(A).
Example 3.2.5 (Unboundedness of Non-convex Sets) The subset of the plane R²
that is the solution set of x² + 1 ≥ y ≥ x² is not convex; this set is unbounded, but it
contains no half-lines. So the fact that unboundedness of a convex set is equivalent
to the existence of a recession direction is a special property of convex sets. It
does not hold for arbitrary subsets of Rn. Checking unboundedness of non-convex
sets is therefore much harder than checking unboundedness of convex sets.
Now we prove the results from the previous section. Here is the proof of the main
result.
Proof of Proposition 3.2.1 It suffices to prove the second statement as both the
recession cone RA and the closure of the conification of A, cl(c(A)), do not change
if A is replaced by its closure cl(A).
⊆: Let (v, ρ) ∈ cl(c(A)) \ c(A). Choose a sequence ρn (an , 1), n ∈ N in c(A)—
so an ∈ A and ρn > 0 for all n—that has limit (v, ρ).
First we prove by contradiction that ρ = 0. Assume ρ > 0. Then the sequence
(an , 1), n ∈ N in c(A) has limit (ρ −1 v, 1) as limn→∞ ρn = ρ > 0. That is,
limn→∞ an = ρ −1 v, so ρ −1 v ∈ A as A is closed. Therefore, (v, ρ) = ρ(ρ −1 v, 1)
lies in c(A). This is the required contradiction.
So ρ = 0. By the choice of the sequence above, we have limn→∞ ρn an = v and
limn→∞ ρn = ρ = 0. Choose a ∈ A. For each t ≥ 0, we consider the sequence
((1 − tρn )a + tρn an )n in A, where we start the sequence from the index n = Nt
where Nt is chosen large enough such that tρn < 1 for all n ≥ Nt . This choice
is possible as limn→∞ ρn = 0. This sequence ((1 − tρn)a + tρn an)n converges to
a + tv as ρn → 0 and ρn an → v, and so a + tv ∈ A as A is closed. This proves that
a + tv ∈ A for all t ≥ 0 and so, by definition, v is a recession vector of A, that is,
v ∈ RA. This establishes the inclusion ⊆.
⊇: Let v ∈ RA. If v = 0n, then we choose a ∈ A and note that (v, 0) = 0n+1 =
limt↓0 t(a, 1) ∈ cl(c(A)). Now let v ≠ 0n. Choose a ∈ A such that a + tv ∈ A
for all t ≥ 0. Then t⁻¹(a + tv, 1) ∈ c(A) for all t > 0 and limt→∞ t⁻¹(a + tv, 1) = (v, 0). So
(v, 0) lies in the closure of c(A), that is, (v, 0) ∈ cl(c(A)). As the last coordinate of
(v, 0) is zero, we have (v, 0) ∉ c(A), as c(A) ⊆ X × R₊₊. So we have proved that
(v, 0) ∈ cl(c(A)) \ c(A). This establishes the inclusion ⊇.
Here is the proof of the corollary.
Proof of Corollary 3.2.4 We prove the following implications. 3 ⇒ 1: If A is
unbounded, we can choose a sequence an, n ∈ N in A for which limn→∞ ‖an‖ =
+∞. Then we can choose a subsequence ank, k ∈ N with ank ≠ 0n for all k and
such that, moreover, the sequence ‖ank‖⁻¹ank, k ∈ N is convergent, say to v ∈ Sn.
This is possible as the unit sphere Sn is compact and as each sequence in a compact
set has a convergent subsequence. Then the sequence (‖ank‖⁻¹ank, ‖ank‖⁻¹), k ∈ N
in c(A) converges to (v, 0), which does not lie in c(A) ∪ {0n+1}. So c(A) ∪ {0n+1} is
not closed. This proves the implication: c(A) ∪ {0n+1} closed ⇒ A bounded.
2 ⇒ 3: follows from Proposition 3.2.1.
1 ⇒ 2: follows from the definition of recession direction.
The last statement of the corollary follows immediately from Proposition 3.2.1.
3.4 Illustrations Using Models
Fig. 3.3 The ray model for the recession direction of a closed half-line A
Fig. 3.4 The hemisphere model for the recession direction of a closed half-line A
Fig. 3.5 The hemisphere model for the recession directions of the plane
Example 3.4.2 (Recession Directions and the Hemisphere Model in Dimension One)
Figure 3.4 illustrates again the case of a closed half-line, A = [a, +∞) ⊆ X = R
with a > 0. Now the hemisphere model is used.
You see that A is visualized as a half-open arc Â (the bottom point is
not included) in the hemisphere model, and that the unique recession direction (‘go
right’) is visualized as the bottom point of the arc, also in the hemisphere model. If
we would take the closure of this half-open arc, the effect would be that
this bottom point would be added to it. The closure of the conification, cl(c(A)), is
also indicated in the figure.
Example 3.4.3 (Recession Directions and the Hemisphere Model for the Plane)
Figure 3.5 illustrates the case of the plane A = X = R2 . The hemisphere model is
used.
The plane A = X is visualized as the standard open upper hemisphere X̂, and
the set of recession directions of X, that is, of the open rays of the recession cone
RX, is visualized by the circle that bounds the upper hemisphere. This last set is
indicated in the same way as the recession cone of A = X, by RX. If you take the
closure of the model of A = X, then what is added is precisely the model of the set
of recession directions. Indeed, if we take the closure of the open upper hemisphere,
then the circle that bounds the upper hemisphere is added to it.
Example 3.4.4 (Recession Directions and the Hemisphere Model for Convex Sets
Bounded by an Ellipse, a Parabola, and a Branch of a Hyperbola) Figure 3.6
illustrates the case of three closed convex sets A in the plane R2 . The hemisphere
model is used.
Fig. 3.6 The hemisphere model for the recession directions of some convex sets in the plane
Fig. 3.7 The top-view model for the recession directions of some convex sets in the plane
Each one of the three shaded regions is denoted in the same way as the convex set
that it models, as A; the set of points of the shaded region that lie on the boundary
of the hemisphere is denoted in the same way as the recession cone of A, whose set
of rays it models, as RA .
On the left, the model of A is closed; this corresponds to the fact that A has no
recession directions, or equivalently, to the fact that A is bounded. In the middle,
the model of A is not closed: you have to add one suitable point from the circle
that bounds the hemisphere to make it closed. This point corresponds to the unique
recession direction of A, the unique open ray of the recession cone RA . In particular,
A is not bounded. On the right, the model of A is not closed: you have to add a
suitable closed arc from the circle that bounds the hemisphere to make it closed.
The points of this arc correspond to the recession directions of A. These points
correspond to the open rays of RA . In particular, A is not bounded.
Example 3.4.5 (Recession Directions and the Top-View Model for Convex Sets
Bounded by an Ellipse, a Parabola, and a Branch of a Hyperbola) Figure 3.7
illustrates also the case of three closed convex sets A in the plane R2 , just as Fig. 3.6,
but it differs slightly. In Fig. 3.7, the convex sets contain the origin in their interior,
the top-view model is used, and some arrows are drawn.
The arrows in Fig. 3.7 have to be explained. These model straight paths in the
convex set A that start at the origin of the plane. If an arrow ends in the interior of
the disk, then the path in A ends at a boundary point of A; if the arrow ends at a
boundary point of the disk, then the path in A does not end: it is a half-line.
Example 3.4.6 (Recession Directions and the Ray Model for n-Dimensional Space
Rn ) Consider the case A = X = Rn . Note to begin with that cl(c(X)) equals
the closed upper halfspace X × R+ . Its non-horizontal open rays correspond to the
points of X, and its horizontal open rays correspond to the one-sided directions of
X, that is, to Sn, the standard unit sphere in Rn. Indeed, these non-horizontal rays
are the rays {ρ(x, 1) | ρ > 0} where x ∈ X, and these horizontal rays are the rays
{ρ(x, 0) | ρ > 0} where x ∈ Sn.
Example 3.4.7 (Gauss and Recession Directions) It is interesting that Gauss
already used a similar model to visualize one-sided directions in space.
Disquisitiones, in quibus de directionibus variarum
rectarum in spatio agitur, plerumque ad maius
perspicuitatis et simplicitatis fastigium evehuntur,
in auxilium vocando superficiem sphaericam radio = 1
circa centrum arbitrarium descriptam, cuius singula
puncta repraesentare censebuntur directiones rectarum
radiis ad illa terminatis parallelarum.
Investigations, in which the directions of various straight lines in space are to be considered,
attain a high degree of clearness and simplicity if we employ, as an auxiliary, a sphere of
unit radius described about an arbitrary center, and suppose the different points of the sphere
to represent the directions of straight lines parallel to the radii ending at these points. (First
sentence of General Investigations of Curved Surfaces: Carl Friedrich Gauss, 1828.)
This fits in with the hemisphere model. The auxiliary sphere of unit radius that
is recommended by Gauss is precisely the boundary of the open upper hemisphere
x₁² + x₂² + x₃² + x₄² = 1, x₄ > 0 that models the points of three dimensional space.
Indeed, this boundary is x₁² + x₂² + x₃² = 1, x₄ = 0, that is, it is a sphere of radius 1.
radius 1. So the auxiliary sphere recommended by Gauss agrees completely with
the visualization of recession vectors in the hemisphere model, which has been
explained above.
3.5 The Shape of a Convex Set
The aim of this section is to give the main result of this chapter—a result describing
the ‘shape’ of a convex set. We will formulate this result in terms of the hemisphere
model. We begin by giving a parametrization of the standard upper hemisphere.
Definition 3.5.1 The angle ∠(u, w) between two unit vectors u, w ∈ Rp is the
number ϕ ∈ [0, π] for which cos ϕ = u · w.
Definition 3.5.2 (Parametrization of the Upper Hemisphere) For each v ∈
SX = Sn, the standard unit sphere in X = Rn, and each t ∈ [0, ½π], we define
the point xv(t) in the standard upper hemisphere in X × R = Rn+1, given by
x₁² + · · · + xₙ₊₁² = 1, xₙ₊₁ ≥ 0, implicitly by the following two conditions: the
angle between xv(t) and (0n, 1) equals t, and xv(t) lies on the quarter circle from
(0n, 1) to (v, 0); explicitly,

xv(t) = ((sin t)v, cos t).
This parametrization of the standard upper hemisphere is not unique for the point
(0n , 1): then one can take t = 0 and each v ∈ Sn .
Example 3.5.3 (Parametrization of the Upper Hemisphere) Figure 3.8 illustrates
this parametrization.
In order to visualize how the point xv(t) depends on t for a given unit vector v,
the vector v and the vector (0n, 1) are drawn as (1, 0) and (0, 1) in the plane R²,
respectively. The point on the standard unit circle in the plane R² that lies in the first
quadrant R²₊ such that the length of the arc with endpoints (0, 1) and this point is t,
is drawn: this point is xv(t).
The hemisphere model for X = Rn and the one-sided directions in X can
be expressed algebraically in terms of the parametrization of the standard upper
hemisphere given above. The point xv(t) on the hemisphere models the point
(tan t)v in X if 0 ≤ t < ½π and it models the one-sided direction in X given
by the unit vector v if t = ½π.
The following result is a first step towards the main result; it is also of
independent interest. From now on, we will take topological notions for a convex
set A ⊆ Rn always relative to the affine hull of A.
Proposition 3.5.4 Each nonempty convex set A ⊆ X = Rn has a relative interior
point.
Proof Take a maximal affinely independent set of points a1 , . . . , ar in A. Then the
set of points ρ1 a1 + · · · + ρr ar with ρ1 , . . . , ρr > 0, ρ1 + · · · + ρr = 1 is open in
aff(A) and so all its points are relative interior points of A.
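As a tiny numeric illustration of this proof (a sketch, not from the book): for the triangle A = co{a₁, a₂, a₃} inside the plane z = 1 of R³, the points a₁, a₂, a₃ form a maximal affinely independent subset, and every combination with all weights positive is a relative interior point of A (A has empty interior in R³, but nonempty relative interior).

a1, a2, a3 = (0, 0, 1), (1, 0, 1), (0, 1, 1)
weights = (1/3, 1/3, 1/3)  # positive weights summing to one
p = tuple(sum(w * a[i] for w, a in zip(weights, (a1, a2, a3)))
          for i in range(3))
print(p)  # (0.333..., 0.333..., 1.0): a relative interior point of A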
This result gives a better grip on convex sets. Whenever you deal with a convex
set, it helps to choose a relative interior point and then consider the position of the
convex set in comparison with the chosen point. We will do this below in order to
simplify the formulation and proof of the main result, Theorem 3.5.7.
To be more precise, Proposition 3.5.4 shows that we can always assume wlog
that a given nonempty convex set A ⊆ Rn contains the origin in its interior.
Indeed, we can choose a relative interior point and translate A to a new position A′ such
that this point is brought to the origin; this turns the affine hull of A into the span of
A′. Finally, we consider A′ as a convex subset of the subspace span(A′) and we give
a coordinate description of span(A′) by choosing a basis for it. This concludes the
reduction to a convex set that has the origin as an interior point.
Before we state the main result formally, we will explain the geometric meaning
of it.
Example 3.5.5 (The Main Result on the Shape of a Convex Set for a Closed Ball)
The following description of the shape of a closed ball in Rn is the view that will
be given below on arbitrary convex sets. If you travel in the direction of
some unit vector v, with constant speed and in a straight line, starting from the
center of the ball, then the following will happen. You travel initially through the
interior of the ball, then at one moment you will hit the boundary of the ball,
and after that moment you will travel forever outside the ball. The hitting time does
not depend on the direction v.
For the purpose of explaining the geometric meaning of the main result, the
top-view model is the most convenient one; for the purpose of giving a formal
description of the result, the hemisphere model is more convenient.
Example 3.5.6 (The Main Result on the Shape of a Convex Set for Sets in the Plane
Bounded by an Ellipse, a Parabola, and a Branch of a Hyperbola) Figure 3.9
illustrates the main result for three closed convex sets A in the plane R2 that contain
the origin in their interiors: the left one is bounded by an ellipse, the middle one is
bounded by a parabola and the right one is bounded by a branch of a hyperbola.
Here is some explanation for this figure. For each one of these three convex sets
A, its model is indicated by the same letter as the set itself, by A; the model of the
set of recession directions is indicated by the same letter as the recession cone, by
RA . Recall here that the recession directions of A correspond to the open rays of
RA . Each arrow in the figure models either a closed line segment with one endpoint
the origin and the other endpoint a boundary point of A, or it models a half-line
contained in A with one endpoint the origin.
A
A RA A
RA
Fig. 3.9 The shape of a closed convex set together with its recession directions
Now we describe the main result for these three sets A. Here we get the same
description for three different looking convex sets, after adding recession directions.
Suppose that you travel, in each one of the three cases, in a straight line in A, starting
from the origin and going in some direction v (a unit vector), with speed increasing
in such a way that the speed would be constant in the hemisphere model. Then in the
top-view model, you travel also in a straight line. At first, you will travel through the
interior of the model of A. Then at one moment, τ(v), you will hit the boundary
of the model of A. If this boundary point does not lie on the boundary of the closed
standard unit disk, then after that moment you will travel outside the model of A
until you hit the boundary of the disk. The hitting time τ(v) of the boundary of
the model of A depends continuously on the initial direction v. This completes the
statement of the main result for these three convex sets A. So we see that the three
sets A, which look completely different from each other, have exactly the same
shape as a closed disk, after we adjoin to A the set of its recession directions.
Now we are ready to state the main result formally and in general. As already
announced, we do this in terms of the hemisphere model, as this allows the
simplest formulation. We use the parametrization of the upper hemisphere from
Definition 3.5.2. We recall that all topological notions for subsets of the standard
upper hemisphere {x ∈ Sk | xk ≥ 0} are taken relative to the standard unit sphere
Sk = {x ∈ Rk | ‖x‖ = 1}.
Theorem 3.5.7 (Shape of a Convex Set) Let A ⊆ X = Rn be a closed convex set
that contains the origin in its interior. Consider the hemisphere model cl(c(A)) ∩
Sn+1 for the convex set A to which the set of its recession directions has been added.
For each v ∈ Sn, there exists a unique τ(v) ∈ (0, ½π] such that the point xv(t) lies
in the interior of cl(c(A)) ∩ Sn+1 for t < τ (v), on the boundary of cl(c(A)) ∩ Sn+1
for t = τ (v), and outside cl(c(A)) ∩ Sn+1 for t > τ (v). Moreover, τ (v) depends
continuously on v.
The meaning of this result is that if you adjoin the set of recession directions to a
convex set A, then the outcome always has the same shape, which is, for example,
the shape of a closed ball in Rd where d is the dimension of the affine hull of A.
Note that the continuity of the function τ (·) means that the union of the relative
boundary of a convex set and its set of recession directions has a continuity property
in the hemisphere model: if you travel at unit speed along a geodesic starting from
a relative interior point, then the time it takes to reach a point of this union, depends
continuously on the initial direction.
A nice feature of this result is that it implies immediately all standard topological
properties of convex sets, bounded as well as unbounded ones, as we will see in the
next section.
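To make the theorem concrete numerically, here is a minimal sketch (an illustration with an assumed example set, not from the book): for A = {(x, y) | y ≥ x² − 1}, which contains the origin in its interior, the hitting time equals τ(v) = arctan R(v), where R(v) = sup{t ≥ 0 | tv ∈ A}, and τ(v) = ½π exactly when v is a recession direction of A.

import math

def in_A(x, y):
    return y >= x * x - 1  # membership test for the convex set A

def tau(v, t_max=1e6, iters=100):
    # Hitting time of the boundary of the model of A, by bisection on t.
    if in_A(t_max * v[0], t_max * v[1]):
        return math.pi / 2  # v is (numerically) a recession direction
    lo, hi = 0.0, t_max
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if in_A(mid * v[0], mid * v[1]) else (lo, mid)
    return math.atan(lo)

for phi in (0.0, 0.25 * math.pi, 0.5 * math.pi):  # sweep the direction v
    v = (math.cos(phi), math.sin(phi))
    print(round(phi, 3), round(tau(v), 4))  # tau varies continuously in v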
Proof of Theorem 3.5.7 The set cl(c(A)) is a closed convex cone contained in the
halfspace X × R₊ and it contains an open ball with center (0n, 1) and some radius
r > 0. This implies all statements of the theorem except the continuity statement,
which follows from the lemma below.
Example 3.5.8 (Continuity Property of the Boundary of a Convex Set) Figure 3.10
illustrates the following auxiliary result and it makes clear why this result implies
a ‘continuity’ property for the boundary of a convex set (this is a matter of
the boundary being squeezed at some point between two sets with ‘continuity’
properties at this point).
Lemma 3.5.9 Let A ⊆ X = Rn be a closed convex set that contains an open ball
U (a, ε) and let x be a boundary point of A. Then the convex hull of the point x and
the open ball U (a, ε) is contained in A, and its point reflection with respect to the
point x is disjoint from A, apart from the point x.
Proof The first statement follows from the convexity of A, together with the
assumptions that x ∈ A and U (a, ε) ⊆ A. The second statement is immediate
from the easy observation that the convex hull of any point of the reflection and
the open ball U (a, ε) contains x in its interior. So if a point of the reflection would
belong to A, then by the convexity of A and by U (a, ε) ⊆ A we would get that x is
in the interior of A. This would contradict the assumption that x is a boundary point
of A.
To derive the required continuity statement precisely from this lemma requires
some care. You have to intersect the convex cone cl(c(A)) with a hyperplane of
X × R (an affine subspace of dimension n of X × R = Rn+1) that does not contain the point
(v, 0). We do not display the details of this derivation.
3.6 Topological Properties of a Convex Set
The aim of this section is to give a list of all standard topological properties of
convex sets. It is an advantage of the chosen approach that all topological properties
of convex sets are a part of or easy consequences of one result, Theorem 3.5.7.
Definition 3.6.1 An extreme point of a convex set A ⊆ X = Rn is a point of A that
is not the average of two different points of A. An extreme recession direction of A
is a recession direction such that a vector that represents this direction is not the
average of two linearly independent recession vectors of A. We write extr(A) for the set of
extreme points of A and extrr(A) for the set of unit vectors that represent an extreme
recession direction of A.
The kernel of a linear function between finite dimensional inner product spaces
L : X → Y is defined to be the following subspace of X:
ker(L) = {x ∈ X | L(x) = 0Y }.
Statement 9 is, in the bounded case, due to Minkowski for n = 3 and to Steinitz
for any n. The result was generalized by Krein and Milman, still in the bounded case,
to suitable infinite dimensional spaces (those that are locally convex), but there one
might have to take the closure. Now the terminology Krein–Milman is often used
also for the finite dimensional case. The assumption that A contains no line can be
made essentially wlog. Indeed, for each closed convex set A ⊆ X = Rn one has

A = LA + (A ∩ LA⊥),

where LA is the lineality space of A and

LA⊥ = {x ∈ X | x · l = 0 ∀l ∈ LA}.
The statements 1–7 follow immediately from the known shape of a convex set,
that is, from Theorem 3.5.7. Statement 10 is a combination of Krein–Milman and
Carathéodory’s theorem 1.12.1. It remains to prove the statements 8 and 9.
Proof Krein–Milman Take a nonempty closed subset S of the upper hemisphere
that is convex in the sense that for two different and non-antipodal points in S the
geodesic that connects them is contained in S. For each point s ∈ S, let Ls be the
subspace of X of all tangent vectors at s to geodesics that have s as an interior point.
Now pick an element p ∈ S. If Lp = {0n }, then stop. Otherwise, take a nonzero
v ∈ Lp and choose the geodesic through p which has v as tangent vector at p.
Consider the endpoints q, r of the part of this geodesic that lies in S. Then repeat
this step for q and r. Note that Lq, Lr ⊆ Lp but Lq, Lr are not equal to Lp, as they
do not contain v. As the dimensions go down at each step, this process must lead to
a finite set of points s with Ls = {0n } such that p lies in the convex hull of these
points.
Now we prove statement 8.
Proof Statement 8 under the Assumption on the Recession Cone By the assumption,
we get a surjective continuous function from the intersection SX ∩ C to the
intersection SY ∩ LC, defined by u ↦ Lu/‖Lu‖. So SY ∩ LC is compact, as
compactness is preserved under taking the image under a continuous mapping.
Hence the convex cone LC is closed.
Sketch Proof Statement 8 for the Polyhedral Case The statement holds if L is
injective: then we may assume wlog that L is defined by appending m − n zeros
to each vector x ∈ Rn (by working with coordinates in X and Y with
respect to suitable bases); this shows that if A is a polyhedral set, then LA
is also a polyhedral set and so LA is closed. So as each linear function is
the composition of injective linear functions and linear functions of the form
πr : Rr+1 → Rr given by the recipe (x1 , . . . , xr , xr+1 ) → (x1 , . . . , xr ), it
suffices to prove the statement for linear functions of the latter type. Note that each
polyhedral set P in Rr+1 is the solution set of a system of linear inequalities of
the form Li (x1 , . . . , xr ) ≥ xr+1 , xr+1 ≥ Rj (x1 , . . . , xr ), Sk (x1 , . . . , xr ) ≥ 0
for all i ∈ I, j ∈ J, k ∈ K for suitable finite index sets I, J, K. Here
Li , Rj , Sk are affine functions of x1 , . . . , xr . Then the image πr (P ) of P under
πr is readily seen to be the solution set of the system of linear inequalities
Li (x1 , . . . , xr ) ≥ Rj (x1 , . . . , xr ), Sk (x1 , . . . , xr ) ≥ 0 for all i ∈ I, j ∈ J, k ∈ K.
That is, it is again a polyhedral set and in particular it is closed.
The method that was used in this proof is called the Fourier-Motzkin elimination,
a method to eliminate variables in a system of linear inequalities.
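Here is a minimal sketch of this elimination step (an illustration, not the book's notation): rows with a positive coefficient of the last variable give upper bounds, rows with a negative coefficient give lower bounds, and each pair is combined so that the variable cancels.

from itertools import product

def eliminate_last(rows):
    # rows: list of (a, b) encoding a[0]*x1 + ... + a[-1]*xr <= b.
    # Returns an equivalent system in the variables x1, ..., x_{r-1}.
    uppers = [(a, b) for a, b in rows if a[-1] > 0]
    lowers = [(a, b) for a, b in rows if a[-1] < 0]
    zeros = [(a[:-1], b) for a, b in rows if a[-1] == 0]
    out = list(zeros)
    for (au, bu), (al, bl) in product(uppers, lowers):
        cu, cl = au[-1], -al[-1]  # positive scalars
        # cl*(upper row) + cu*(lower row) cancels the last variable:
        a = [cl * u + cu * l for u, l in zip(au[:-1], al[:-1])]
        out.append((a, cl * bu + cu * bl))
    return out

# Project the triangle x >= 0, y >= 0, x + y <= 1 onto the x-axis:
system = [([-1, 0], 0), ([0, -1], 0), ([1, 1], 1)]
print(eliminate_last(system))  # [-x <= 0, x <= 1], i.e. the interval [0, 1]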
3.8 *Applications of Recession Directions
We give two applications of recession vectors. One can usually convince a non-
expert that something exists by giving an example. To convince him/her that
something does not exist is usually much harder. However, in some cases we
are lucky and then we can translate non-existence of something into existence of
something else.
How to convince a non-expert that a convex set A is unbounded, that is, that there
exists no upper bound for the absolute value of the coordinates of the points in A?
Well, the nonexistence of a bound for a convex set A is equivalent to the existence
of a nonzero recession vector for A. So a recession vector for A is valuable: it is
a certificate for unboundedness for A. So you can convince a non-expert that there
exists no bound for A by giving him/her an explicit nonzero recession vector for A.
How to convince a non-expert that there does not exist a global minimum for a
linear function l on a closed convex set A? This non-existence is equivalent to the
existence of a nonzero recession vector of A along which the function l descends.
So such a vector is valuable: it is a certificate for insolubility for the considered
optimization problem. So you can convince a non-expert that there does not exist an
optimal solution for the given optimization problem by giving him/her an explicit
nonzero recession vector of A along which the function l descends.
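Here is a toy version of such a certificate (a sketch, not from the book): for A = {(x, y) | y ≥ x²}, the vector c = (0, 1) is a recession vector, so a + tc stays in A for every a ∈ A and every t ≥ 0, which certifies that A is unbounded.

def in_A(x, y):
    return y >= x * x  # membership test for A

a, c = (2.0, 5.0), (0.0, 1.0)  # a point of A and a recession vector
print(all(in_A(a[0] + t * c[0], a[1] + t * c[1]) for t in range(1000)))
# True: the whole half-line a + t*c, t >= 0, lies in A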
3.9 Exercises
1. Give a direct proof that for an interior point a of a convex set A ⊆ Rn and an
arbitrary point b̃ ∈ A, the half-open line segment [a, b̃) is contained in the
interior of A.
Hint: see Fig. 3.12.
2. Show that both the closure and the relative interior of a convex set are convex
sets.
3. Show that the following two properties of a set C ⊆ Rn are equivalent (both
define the concept of closed set):
(a) the complement Rn \ C is open,
(b) C contains along with each convergent sequence of its points also its limit.
4. Show that a set S ⊆ Rn is compact, that is, closed and bounded, iff every
sequence in S has a subsequence that converges to an element in S.
5. Show that the following three properties of a function f : S → T with
S ⊆ Rn , T ⊆ Rm are equivalent (all three define the concept of continuous
function):
(a) lims→a f (s) = f (a) for all a ∈ S,
(b) the inverse image under f of every set that is open relative to T is open
relative to S,
(c) the inverse image under f of every set that is closed relative to T is closed
relative to S.
6. Show that one can write each nonempty closed convex set A as a Minkowski
sum D + L where D is a closed convex set that contains no line and L is a
subspace. In fact, L is uniquely determined to be LA, the lineality space of A.
Moreover, one can choose D to be the intersection A ∩ LA⊥.
This result allows us to restrict attention to nonempty closed convex sets that
do not contain a line.
7. Show that the following conditions on a convex cone C ⊆ Rn × R+ containing
the origin are equivalent:
(a) C is closed,
(b) the model of C in the ray model is closed in the sense that if the angles
between the rays of a sequence of rays of C and a given ray in Rn × R+
tend to zero, then this given ray is also a ray of C,
(c) the model of C in the hemisphere model is closed,
(d) the model of C in the top-view model is closed.
8. Show that an unbounded convex set has always a recession direction, but that an
arbitrary unbounded subset of Rn does not always have a recession direction.
9. Determine the recession cones of some simple convex sets, such as (7, +∞),
{(x, y) ∈ R2 | y ≥ x 2 } and {(x, y) ∈ R2 | xy ≥ 1, x ≥ 0, y ≥ 0}.
10. Give an example of a convex set A ⊆ Rn, two recession vectors c, c′ and a
point a ∈ A such that a + tc ∈ A ∀t ≥ 0 but not a + tc′ ∈ A ∀t ≥ 0.
11. Show that the recession cone RA of a nonempty convex set A ⊂ Rn is a convex
cone.
12. Show that the recession cone RA of a nonempty closed convex set A ⊂ Rn is
closed.
13. Show that the recession cone RA of a nonempty convex set A ⊂ Rn that is not
closed need not be closed.
14. Let A be a nonempty closed convex set in X = Rn. Show that a unit vector
c ∈ X is in the recession cone RA of A iff there exists a sequence a1, a2, . . . of
nonzero points in A for which ‖ai‖ → ∞ and ai/‖ai‖ → c for i → ∞.
15. Show that the recession cone RA of a nonempty closed convex set A ⊆ X = Rn
consists—for any choice of a ∈ A—of all c ∈ X for which a + tc ∈ A for all
t ≥ 0.
Figure 3.13 illustrates this.
So for closed convex sets, a simplification of the definition of recession
vector is possible: we can take the same arbitrary choice of a in the entire set
A, for all recession vectors c.
16. Verify in detail that Fig. 3.2 is in agreement with Proposition 3.2.1. Do this
using each one of the models (ray, hemisphere, top-view).
17. Verify that the gauge of a closed convex set that contains the origin in its
interior, is well-defined, in the following way. Let A ⊆ X = Rn be a closed
convex set that contains the origin in its interior. Show that there exists a unique
sublinear function g : X → R+ for which
A = {x ∈ X | g(x) ≤ 1}.
18. Show that the epigraph of the gauge g of a closed convex set A that contains
the origin in its interior, is equal to the closure of the homogenization of A,
epi(g) = cl(c(A)).
19. Let A ⊆ X = Rn be a closed convex set that has the origin as an interior point,
and let g be its gauge. Show that
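In connection with Exercises 17–19: a standard formula that produces the gauge (offered here as a sketch, not necessarily the book's intended construction) is

g(x) = inf{λ > 0 | x ∈ λA} for all x ∈ X,

and with this formula one can check directly that A = {x ∈ X | g(x) ≤ 1} and that epi(g) = cl(c(A)).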
4 Convex Sets: Dual Description
Abstract
• Why. The heart of the matter of convex analysis is the following phenomenon:
there is a perfect symmetry (called ‘duality’) between the two ways in which a
closed proper convex set can be described: from the inside, by its points (‘primal
description’), and from the outside, by the halfspaces that contain it (‘dual
description’). Applications of duality include the theorems of the alternative:
non-existence of a solution for a system of linear inequalities is equivalent to
existence of a solution for a certain other such system. The best known of these
results is Farkas’ lemma. Another application is the celebrated result from finance
that European call options have to be priced by a precise rule, the formula of
Black-Scholes. Last but not least, duality is very important for practitioners of
optimization methods, as we will see in Chap. 8.
• What. Duality for convex sets is stated and a novel proof is given: this amounts
to just throwing a small ball against a convex set. Many equivalent versions of
the duality result are given: the supporting hyperplane theorem, the separation
theorems, the theorem of Hahn–Banach, the fact that a duality operator on convex
sets containing the origin, the polar set operator B → B ◦ , is an involution. For
each one of the four binary operations on convex sets a rule is given of the type
(B₁ ⊙ B₂)◦ = B₁◦ ⊙′ B₂◦, where ⊙′ is another one of the four binary operations on
convex sets, and where B₁, B₂ are convex sets containing the origin. Each one of
these four rules usually requires a separate proof, but homogenization generates
a unified proof. This involves a construction of the polar set operator by means
of a duality operator for convex cones, the polar cone operator. The following
technical problem requires attention: all convex sets that occur in applications
have two nice properties, closedness and properness, but they might lose these if
you work with them. For example, B₁ ⊙ B₂ need not have these properties if B₁
and B₂ have them and ⊙ is one of the four binary operations on convex sets; then
the rule above might not hold. This rule only holds under suitable assumptions.
Road Map
1. Definition 4.2.1, Theorem 4.2.5, Figs. 4.5, 4.6 (the duality theorem for convex
sets, remarks on the two standard proofs, the novel ball throwing proof).
2. Figure 4.7, Definition 4.3.1, Theorem 4.3.3, hint (reformulation duality theo-
rem: supporting hyperplane theorem).
3. Figure 4.8, Definition 4.3.4, Theorem 4.3.6, hint (reformulations duality theo-
rem: separation theorems).
4. Definitions 4.3.7, 4.3.9, Theorem 4.3.11, hint (reformulation duality theorem:
theorem of Hahn–Banach).
5. Definition 4.3.12, Theorem 4.3.15, hint, remark (reformulation duality theorem:
polar set operation is involution).
6. Figure 4.9, Definition 4.3.16, Theorem 4.3.19, hint, Theorem 4.3.20, hint
(reformulations of duality theorem: (1) polar cone operator is involution, (2)
polar cone of C ≠ Rn is nontrivial; self-duality of the three golden cones).
7. The main idea of the Sects. 4.5 and 4.6 (construction polar set operator by
homogenization; pay attention to the minus sign in the bilinear mapping on
X × R and the proof that the ‘type’ of a homogenization of a convex set
containing the origin remains unchanged under the polar cone operator).
8. Figure 4.12 (geometric construction of polar set).
9. Propositions 4.6.1, 4.6.2 (calculus rules for computing polar cones).
10. Propositions 4.6.3, 4.6.4 (calculus rules for computing polar sets, take note of
the structure of the proof by homogenization).
11. Section 4.8 (Minkowski-Weyl representation, many properties of convex sets
simplify for polyhedral sets).
12. Theorem 4.8.2, structure proofs (theorems of the alternative, short proofs thanks
to calculus rules).
4.1 *Motivation
We recall that the first sections of chapters, which motivate the contents of the
chapter, are optional material and are not used in the remainder of the chapter.
The aim of this section is to give an example of a curved line that is described in a
dual way, by its tangent lines.
A child draws lines on a sheet of paper using a ruler and according to some
precise plan she has learned from a friend, who in turn has learned it from an older
brother. For each line, one endpoint of the ruler is placed somewhere at the
bottom side of the sheet and the other at the left side of the sheet. The drawing
is shown in Fig. 4.1. The fun of it is that drawing all these straight lines leads to
something that is not straight: a curved line! The lines are tangent lines to the curve:
the curve is created by its tangent lines.
What is fun for us is to determine the equation for this curved line. You can see
that the region above the curve is a convex set in the plane and that the lines that
are drawn are all outside this region and tangent to the region. This description of a
convex set in the plane by lines lying outside this set is called the dual description
of the set.
You can try to derive the equation already now. A calculation using the theory of
this chapter will be given in Sect. 4.8.
The aim of this section is to state an economic application of the dual description of
a convex set: how a regulator can make sure that a firm produces a certain good in a
desired way—for example respecting the environment. How to do this?
Consider a product that requires n resources (such as labor, electricity, water)
to manufacture. The product can be manufactured in many ways. Each way is
described by a resource vector—an element in a closed convex set P in the first
positive orthant Rn++ for which the recession cone is the first orthant Rn+ . In addition,
it is assumed that P is strictly convex—its boundary contains no line segments
of positive length. The regulator can set the prices πi ≥ 0, 1 ≤ i ≤ n, for the
resources. Then the firm will choose a way of manufacturing x = (x1 , . . . , xn ) ∈ P
that minimizes cost of production π1 x1 + · · · + πn xn . We will prove in Sect. 4.8,
using the dual description of the convex set P , that for any reasonable way of
manufacturing v ∈ P , the regulator can make sure that the firm chooses v. Here,
reasonable means that v should be a boundary point of P —indeed other points will
never be chosen by a firm as they can be replaced by points that give a lower price
of production, whatever the prices are. Figure 4.2 illustrates how the firm chooses
the way of manufacturing for given prices that minimizes cost of production.
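As a toy instance of this mechanism (a sketch with assumed data, not from the book): take n = 2 and P = {x ∈ R²₊₊ | x₁x₂ ≥ 1}, whose recession cone is R²₊. By the inequality of arithmetic and geometric means, the cost π₁x₁ + π₂x₂ is minimized over P on the boundary x₁x₂ = 1, at x = (√(π₂/π₁), √(π₁/π₂)).

import math

def firms_choice(pi1, pi2):
    # Cost-minimizing way of manufacturing over P = {x1*x2 >= 1, x > 0}.
    return (math.sqrt(pi2 / pi1), math.sqrt(pi1 / pi2))

# To steer the firm to a desired boundary point v = (v1, 1/v1), the
# regulator can set prices proportional to (1/v1, v1):
v1 = 2.0
print(firms_choice(1 / v1, v1))  # (2.0, 0.5) = v, as desired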
How can you convince a non-expert, who wants to know for sure whether a certain
system of equations and inequalities has a solution or not, of the truth? If there
exists a solution, then you should find one and give it: then everyone can check that
it is indeed a solution. But how to do this if there is no solution? In the case of
a linear system, there is a way. It is possible to give certificates of insolubility by
means of so-called theorems of the alternative. This possibility depends on the dual
description of convex sets. This will be explained in Sect. 4.8.
Aristotle writes in his book ‘Politics’, in the fourth century BCE, that another philosopher,
Thales of Miletus, once paid the owners of olive presses an amount of money
to secure the rights to use these presses at harvest time. When this time came and
there was a great demand for these presses due to a huge harvest, as Thales had
predicted, he sold his rights to those who needed the presses and made a great profit.
Thus he created what is now called European call options, the right—but not the
obligation—to buy something at a fixed future date at a fixed price. For thousands
of years, such options were traded although there was no method for determining
their value. Then Black and Scholes discovered a model to price them. They showed
how one could make money without risk if such options were not priced using
their model. Therefore, from then on, all these options have been traded using the
model of Black and Scholes.
In Sect. 4.8, we will apply the dual description of a convex set to establish the
result of Black and Scholes for the pricing of European call options in a simple but
characteristic situation.
4.2 Duality Theorem
The aim of this section is to present the duality theorem for a closed convex set.
The meaning of this result is that the two descriptions of such a set—the primal
description (from the inside) and the dual description (from the outside)—are
equivalent.
Definition 4.2.1 A subset H of Rn is a closed halfspace if it is the solution set of
an inequality of the type a1 x1 + · · · + an xn ≤ b for a1 , . . . , an , b ∈ R and ai = 0
for at least one i. So, H is the set of points on one of the two sides of the hyperplane
a1 x1 + · · · + an xn = b (including the hyperplane).
A nonempty closed convex set A ⊆ Rn can be approximated from the inside, as
the convex hull of a finite subset {a1 , . . . , ak } of A, as we have already explained.
Fig. 4.3 A primal approximation of a convex set A: the convex hull of finitely many points a₁, . . . , a₇ of A
Fig. 4.4 A dual approximation of a convex set A: the intersection of finitely many halfplanes H₁, . . . , H₆ containing A
Example 4.2.2 (Approximation Convex Set from the Inside) Figure 4.3 illustrates a
primal approximation of a convex set.
In the figure, seven points in a given convex set A in the plane and their convex
hull are drawn. The convex hull is an approximation of A from the inside, also called
a primal approximation.
The set A can also be approximated from the outside, as the intersection of a
finite set of closed halfspaces H1 , . . . , Hl containing A.
Example 4.2.3 (Approximation Convex Set from the Outside) Figure 4.4 illustrates
a dual approximation of a convex set.
In the figure, six halfplanes containing a given convex set A in the plane and their
intersection are drawn. The intersection is an approximation of A from the outside,
also called a dual approximation.
Most of the standard results from convex analysis are a consequence of the
following central result of convex analysis, the meaning of which is that ‘in the limit’
the two types of approximation, from the inside and from the outside, coincide.
Example 4.2.4 (Approximation Convex Set from the Inside and the Outside) Figure
4.5 illustrates both types of approximation, primal and dual, together in
one picture.
In the figure, both a primal and a dual approximation of a convex set A in the
plane are drawn. The convex set A is squashed between the primal and the dual
approximation.
Now we formulate the duality theorem for closed convex sets. This is the central
result of convex analysis.
Theorem 4.2.5 (Duality Theorem for Convex Sets) A closed convex set is the
intersection of the closed halfspaces containing it.
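A small numeric illustration (a sketch, not from the book): the closed unit disk in R² is the intersection of the supporting halfspaces u · x ≤ 1 over unit vectors u, and already finitely many of them approximate it well from the outside.

import math

def in_dual_approx(x, k=100):
    # Is x in the intersection of k supporting halfspaces of the unit disk?
    for j in range(k):
        u = (math.cos(2 * math.pi * j / k), math.sin(2 * math.pi * j / k))
        if u[0] * x[0] + u[1] * x[1] > 1:
            return False
    return True

print(in_dual_approx((0.5, 0.5)))    # True: the point lies in the disk
print(in_dual_approx((1.1, 0.0)))    # False: cut off by a halfspace
print(in_dual_approx((0.99, 0.99)))  # False: outside the disk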
Example 4.2.6 (Duality Theorem and the Shape of Beach Pebbles) There are two
types of beach pebbles: sea pebbles and river pebbles. Sea pebbles usually have
the shape of a bounded closed convex set in three dimensional space R³. River
pebbles usually do not have a convex shape. What might be the reason for this?
Pebbles form gradually, sometimes over billions of years, as the water of the
ocean or the river washes over rock particles. River current is gentler than ocean
current. Therefore, the outside effect of ocean current on a pebble is that the pebble
is exactly equal to its outside approximation—the intersection of the halfspaces
containing it. However, the outside effect of river current on a pebble achieves in
general only that it is roughly equal to its outside approximation. By the duality
theorem for convex sets, Theorem 4.2.5, the sets in three dimensional space that
are equal to their outer approximation are precisely the closed convex sets. So the
difference in the shapes between sea pebbles and river pebbles illustrates the duality
theorem for convex sets.
To prove the duality theorem, one has to show that there exists for each point
p outside A a hyperplane that separates p and A, that is, such that the point p
and the set A lie on different sides of it (where each side is considered to contain
the hyperplane). There are two standard ways for proving this theorem: the proof by
means of the shortest distance problem, which produces the hyperplane in one blow,
and the standard proof of the Hahn–Banach theorem (for the finite dimensional
case), which produces step by step a chain of affine subspaces L1 ⊂ L2 ⊂ · · · ⊂
Ln−1 of Rn , such that the dimension of Li is i for all i and such that moreover,
Ln−1 is the desired hyperplane (see Exercises 3 and 9 respectively). The proof of
the Hahn–Banach theorem is usually given in a course on functional analysis for the
general, infinite dimensional, case.
Here a novel proof is presented.
Example 4.2.7 (Ball-Throwing Proof Duality Theorem) Figure 4.6 illustrates this
proof.
The idea of the proof can be explained as follows for n = 3. Throw a small
ball outside A with center p in a straight line till it hits A, say at the point q. Then
take the tangent plane at q to the ball in its final position. Then geometric intuition
suggests that this plane separates the point p and the set A.
Now we give a more precise version of this proof. Figure 4.6 also illustrates this
more precise version.
Proof The statement of the theorem holds for A = ∅ (all closed halfspaces contain
∅ and their intersection is ∅) and A = Rn (the set of closed halfspaces containing
Rn is the empty set and their intersection is Rn ; here we use the convention that the
intersection of the empty set of subsets of a set S is S). So assume that A = ∅, Rn .
Take an arbitrary p ∈ Rn \ A. Choose ε > 0 such that the ball B(p, ε) is disjoint
with the closed convex set A. Choose a ∈ A. Define the point
pt = (1 − t)p + ta
for all t ∈ [0, 1]. So if t runs from 0 to 1, then pt runs in a straight line from p to
a. Let t̄ be the smallest value of t for which B(pt, ε) ∩ A ≠ ∅. This exists as A
is closed. We claim that B(pt̄, ε) and A have a unique common element q. Indeed,
let q be a common element. Then, for each x ∈ B(pt̄, ε) \ {q} there exists t < t̄
such that x ∈ B(pt, ε), and so, by the minimality property of t̄, x ∉ A. Now we use
the fact that the open half-lines with endpoint a boundary point of a given closed
ball that run through a point of the ball, form an open halfspace that is bounded by
the hyperplane that is tangent to the ball at this boundary point. Apply this property
to the ball B(pt¯, ε) and the point q. This shows that the hyperplane that is tangent
to the ball B(pt¯, ε) at the point q—which has equation (x − q) · (pt¯ − q) = 0—
separates the point p and the set A.
4.3 Other Versions of the Duality Theorem
The aim of this section is to give no fewer than six equivalent reformulations of the
important Theorem 4.2.5. These are often used. We only give hints for the proofs of
the equivalence.
We need a definition.
Definition 4.3.1 A hyperplane H supports a nonempty closed convex set A ⊂ Rn
at a point x̄ ∈ A if A lies entirely on one of the two sides of H (it is allowed to have
points on H ) and if, moreover, the point x̄ lies on H . Then at least one of the two
closed halfspaces bounded by H contains A. Such a halfspace is called a supporting
halfspace of A.
Example 4.3.2 (Supporting Halfspace) In Fig. 4.7, a convex set A in the plane R²
and a boundary point x̄ of A are drawn. A halfplane H is drawn that contains A
entirely and that contains the point x̄ on the line that is the boundary of H.
In Fig. 4.8, two convex sets A and B in the plane are drawn that are separated
by a line. On the left, this line has no points in common with A or B; on the right,
this line has points in common with A or B, actually with both, and moreover with
their intersection A ∩ B.

Theorem 4.3.6 (Separation of Two Convex Sets) Let A, B ⊆ Rn be nonempty
convex sets for which

ri(A) ∩ ri(B) = ∅.

Then A and B can be separated by a hyperplane: there exist α ∈ Rn \ {0n} and
β, γ ∈ R with β ≤ γ such that

α₁x₁ + · · · + αₙxₙ ≤ β ∀x ∈ A

and

α₁x₁ + · · · + αₙxₙ ≥ γ ∀x ∈ B.
Hint for the equivalence to the duality theorem 4.2.5: separating nonempty
convex sets A and B in Rn by a hyperplane is equivalent to finding a linear function
l : Rn → R for which l(a) ≥ l(b) for all a ∈ A and b ∈ B, that is, for which
l(a − b) ≥ 0 for all a ∈ A and b ∈ B. This is equivalent to separating the convex
set A − B = {a − b | a ∈ A, b ∈ B} from the origin by a hyperplane.
The following result is the finite dimensional case of an important result from
functional analysis. We need some definitions.
94 4 Convex Sets: Dual Description
restrict the generality as long as we consider one nonempty convex set at a time
(which is what we are doing now): this assumption can be achieved by translation.
If A ⊆ X = Rn is a nonempty convex set, then choose ã ∈ A and replace A by
B = A − ã = {a − ã | a ∈ A} or—equivalently—move the origin to the point ã.
Definition 4.3.12 The polar set of a convex set B in X = Rn that contains the
origin is defined to be the following closed convex set in X that contains the origin
B ◦ = {y ∈ X | x · y ≤ 1 ∀x ∈ B}.
Note that for a nonzero y ∈ X one has that y ∈ B ◦ iff the solution set of x · y ≤ 1
is a closed halfspace in X that contains B and for which the origin does not lie on
the boundary of the halfspace. Therefore, the polar set operator is called a duality
operator for convex sets containing the origin. Indeed, it turns the primal description
of such a set (by points inside the set) into the dual description (by closed halfspaces
containing the set that do not contain the origin on their boundary).
Example 4.3.13 (Polar Set Operator) Here are some illustrations of the polar set
operator.
• The polar set of a subspace is its orthogonal complement.
• The polar set of the unit ball in Rn with respect to some norm N, {x ∈
Rn | N(x) ≤ 1}, is the unit ball in Rn with respect to the dual norm N∗,
{y ∈ Rn | N∗(y) ≤ 1}.
• The polar set P ◦ of a polygon P in the plane R2 that contains the origin is again
a polygon that contains the origin. The vertices of each of these two polygons
correspond 1 − 1 to the edges of the other polygon. Indeed, for each vertex of P ,
the supporting halfplanes to P that have the vertex on the boundary line form an
edge of P ◦ . Moreover, for each edge of P , the unique supporting halfplane to P
that contains the edge on its boundary line is a vertex of P ◦ .
• The polar set P ◦ of a polytope P in three dimensional space R3 that contains the
origin is again a polytope that contains the origin. The vertices of each of these
two polytopes correspond 1 − 1 to the faces of the other polytope; the edges of P
correspond 1 − 1 to the edges of P ◦ . Indeed, for each vertex of P , the supporting
halfspaces to P that have the vertex on the boundary plane form a face of P ◦ .
Moreover, for each face of P , the unique supporting halfspace to P that contains
the face on its boundary plane is a vertex of P ◦ . Furthermore, for each edge of
P , the supporting halfspaces to P that have the edge on the boundary plane form
an edge of P ◦ .
• The results above for polytopes in R2 and R3 can be extended to polytopes in Rn
for general n.
• The polar set of a regular polytope that contains the origin is again a regular
polytope that contains the origin:
– if P is a regular polygon in R2 with k vertices, then P ◦ is a regular polygon
with k vertices;
– if P is a regular simplex in Rn , then P ◦ is a regular simplex;
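A quick numeric check of the polygon example above (a sketch, not from the book): the polar set of the square P with vertices (±1, ±1) is the ‘diamond’ co{(±1, 0), (0, ±1)}, and by convexity it suffices to test the defining inequality x · y ≤ 1 at the vertices of P.

vertices_P = [(1, 1), (1, -1), (-1, 1), (-1, -1)]

def in_polar(y):
    # y is in the polar set of P iff x . y <= 1 for all x in P;
    # checking the vertices of P suffices.
    return all(x[0] * y[0] + x[1] * y[1] <= 1 for x in vertices_P)

print(in_polar((1, 0)))      # True: a vertex of the diamond
print(in_polar((0.5, 0.5)))  # True: on the boundary of the diamond
print(in_polar((0.6, 0.6)))  # False: outside, since 0.6 + 0.6 > 1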
Definition 4.3.14 The bipolar set B◦◦ of a convex set B ⊆ X = Rn that contains
the origin is (B◦)◦, the polar set of the polar set of B,

B◦◦ = (B◦)◦.
Now we are ready to give the promised reformulation of the duality theo-
rem 4.2.5, in terms of the polar set operator.
Theorem 4.3.15 A closed convex set B ⊆ X = Rn that contains the origin is equal
to its bipolar set,
B ◦◦ = B.
In particular, the bipolar set operator is the same operator as the closure
operator: B◦◦ = cl(B) for every convex set B ⊆ X = Rn that contains the origin.
Hint for the equivalence to the duality theorem 4.2.5. Elements of B ◦ correspond
to closed halfspaces containing B that do not contain the origin on their boundary.
So B = B ◦◦ is equivalent to the property that B is the intersection of closed
halfspaces.
An advantage of this reformulation is that it shows that there is a complete
symmetry between the primal description and the dual description of a closed
convex set that contains the origin. Indeed, both are given by a convex set—B
for the primal description and B̃ for the dual one—and these convex sets are each
other’s polar set—B̃ = B ◦ and B = B̃ ◦ . So taking the dual description of the dual
description brings you back to the primal description.
Definition 4.3.16 The polar cone of a convex cone C ⊆ Rn is defined to be the following closed convex cone

C ◦ = {y ∈ Rn | x · y ≤ 0 ∀x ∈ C}.
This operator is called a duality operator for convex cones; it turns the primal
description of a closed convex cone (by its rays) into the dual description (by the
halfspaces containing the convex cone that have the origin on their boundary: for
each nonzero vector y ∈ C ◦ , the set of solutions x of the inequality x · y ≤ 0 is such
a halfspace).
Fig. 4.9 Convex cones C and their polar cones C ◦ in the plane and in three dimensional space
Warning The terminologies polar cone and polar set are similar and the notations
are even the same, but these operators are defined by different formulas. However,
these operators agree for convex cones.
Example 4.3.17 (Polar Cone Operator)
1. Figure 4.9 illustrates the polar cone operator for convex cones in the plane and
in three dimensional space. Note the orthogonality property in both pictures: for
each ray on the boundary of a closed convex cone C, there is a unique ray on C ◦
for which the angle is minimal; this minimal angle is always π/2.
2. The polar cone of a convex cone is the polar set of that convex cone.
3. For each one of the three ‘golden cones’ C (the first orthant, the Lorentz cone and the positive semi-definite cone) one has C ◦ = −C = {−c | c ∈ C}. Sometimes one defines, for a convex cone C ⊆ Rn , the dual cone C ∗ = {y ∈ Rn | x · y ≥ 0 ∀x ∈ C} = −C ◦ , and a convex cone C is called self-dual if C ∗ = C. Then one can say that the three golden cones are self-dual.
One gets the following reformulation of the duality theorem 4.2.5 in terms of
convex cones.
Definition 4.3.18 The bipolar cone C ◦◦ of a convex cone C ⊆ X = Rn containing
the origin is (C ◦ )◦ , the polar cone of the polar cone of C,
C ◦◦ = (C ◦ )◦ .
Theorem 4.3.19 A closed convex cone C ⊆ X = Rn is equal to its bipolar cone,

C ◦◦ = C.
In particular, the bipolar cone operator on nonempty convex cones is the same
operator as the closure operator: C ◦◦ = cl(C).
Hint for the equivalence to the duality theorem 4.2.5. One should use the
construction of the polar set operator by the homogenization method, which will
be given later in this chapter.
In fact, one can moreover give the following weak-looking reformulation of the duality theorem 4.2.5 in terms of convex cones. This weak-looking reformulation represents, in a sense, the essence of the duality of convex sets, and so the essence of convex analysis.
Theorem 4.3.20 The polar cone of a convex cone C ⊆ Rn with C ≠ Rn that contains the origin is nontrivial: that is, C ◦ ≠ {0n }.
Hint for the equivalence to Theorem 4.3.19 (and so to the duality theorem 4.2.5). In order to derive the stronger-looking Theorem 4.3.19 from the weak-looking Theorem 4.3.20, one should use that separating a point p outside C from C by a hyperplane is equivalent to separating the origin by a hyperplane from a suitable convex cone constructed from C and p.
The aim of this section is to give a coordinate-free construction of the polar cone
operator. This will be convenient for giving an equivalent definition for the polar set
operator, this time in a systematic way (rather than by an explicit formula, such as
we have done before)—by homogenization.
Definition 4.4.1 Let X, Y be two finite dimensional vector spaces. A mapping b : X × Y → R is bilinear if it is linear in both arguments, that is,

b(αx + βy, z) = αb(x, z) + βb(y, z) for all α, β ∈ R, x, y ∈ X, z ∈ Y,

b(x, αy + βz) = αb(x, y) + βb(x, z) for all α, β ∈ R, x ∈ X, y, z ∈ Y.

Definition 4.4.2 Let X, Y be two finite dimensional vector spaces and b : X × Y → R a bilinear mapping. Then b is non-degenerate if for each nonzero x ∈ X the linear function on Y given by the recipe y → b(x, y) is not the zero function, and if moreover for each nonzero y ∈ Y the linear function on X given by the recipe x → b(x, y) is not the zero function.

Definition 4.4.3 A pair of finite dimensional vector spaces X, Y is said to be in duality if a non-degenerate bilinear mapping b : X × Y → R is given.
Proposition 4.4.4 (A Standard Pair of Vector Spaces in Duality) Let X = Rn and Y = Rn . The mapping b : X × Y → R given by the recipe (x, y) → x · y, that is, the standard inner product, called the dot product, makes from X and Y two vector spaces in duality.
We only work with finite dimensional vector spaces, and for them the standard pair of vector spaces in duality given above is essentially the only example.
Remark The reason for introducing a second space Y , and not just taking an inner product on X, is as follows. For infinite dimensional vector spaces X of interest, there is often no natural generalization of the concept of inner product. Then the best one can do is to find a different space Y and a non-degenerate bilinear mapping b : X × Y → R. Even for finite dimensional spaces, it is often more natural to introduce a second space, although it has the same dimension as X, so it could be taken equal to X. For example, if one considers consumption vectors as column vectors x, then one models price vectors as row vectors p. However, for our purpose, the really compelling reason to introduce Y and a bilinear map b : X × Y → R is that we will need bilinear mappings b other than the inner product.
Example 4.4.5 (Useful Nonstandard Bilinear Mappings) Here are the two nonstandard pairs of spaces in duality that we will need.
1. Let X = Rn+1 and Y = Rn+1 . The mapping b : X × Y → R given by the recipe ((x, α), (y, β)) → x · y − αβ makes from X and Y two vector spaces in duality. Note the minus sign.
The polar cone of a convex cone C ⊆ X with respect to a pair of vector spaces X, Y in duality by b is defined by

C ◦ = {y ∈ Y | b(x, y) ≤ 0 ∀x ∈ C}.

The convex cones R+ (0X , 1) and X × R+ are convex cones that are each other’s polar cone with respect to the bilinear pairing defined by the recipe b((x, α), (y, β)) = x · y − αβ.
The aim of this section is to define the polar set operator by homogenization. This
definition will make it easier to establish properties of the polar set operator.
Here is a detail that needs some attention before we start: the homogenization
c(B) of a convex set B ⊆ X = Rn that contains the origin, is contained in the
space X × R. For defining the polar cone (c(B))◦ , we have to define a suitable
nondegenerate bilinear mapping on the space X × R. We take the following non-
degenerate bilinear mapping b:
b((x, α), (y, β)) = x · y − αβ ∀(x, α), (y, β) ∈ X × R.

Recall that a conification C of B is a convex cone for which

R+ (0X , 1) ⊆ C ⊆ X × R+ ,

and that this implies

R+ (0X , 1) ⊆ cl(C) ⊆ X × R+ .
Note that the second property of Proposition 4.5.1 is not true for arbitrary convex
sets. In Chap. 6, it is explained how the duality of arbitrary convex sets can be
described without translating them first to convex sets that contain the origin. This
will be necessary, as for applications in convex optimization, we will have to deal
simultaneously with two convex sets that do not have a common point.
Now we are ready to construct the polar set operator by the homogenization
method.
Proposition 4.5.2 The duality operator on convex sets containing the origin that
one gets by the homogenization method (first conify, then apply the polar cone
operator, finally deconify) is equal to the polar set operator.
Proof Here are the three steps of the construction of the duality operator on convex
sets containing the origin that one gets by the homogenization method.
1. Conify. Choose a conification C of the given convex set B ⊆ X = Rn containing the origin, for example cl(c(B)), the closure of the homogenization of B. Note that C is a convex cone in X × R for which R+ (0X , 1) ⊆ C ⊆ X × R+ .
2. Work with convex cones. Take the polar cone with respect to the bilinear mapping b defined above, that is,

C ◦ = {(y, β) ∈ X × R | x · y − αβ ≤ 0 ∀(x, α) ∈ C}.
As the convex cones R+ (0X , 1) and X × R+ are each other’s polar cone with
respect to this bilinear pairing, it follows that
X × R+ ⊇ C ◦ ⊇ R+ (0X , 1).
3. Deconify. Intersect C ◦ with X × {1}: a point (y, 1) lies in C ◦ iff x · y ≤ 1 for all x ∈ B, so this intersection is B ◦ × {1}. Therefore the deconification of C ◦ is B ◦ , the polar set of B, as required.
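The three steps can be traced numerically. The following sketch (Python with NumPy; the interval B = [−1, 2] and all sampling parameters are assumptions of the sketch) samples a conification of B, applies the polar cone operator with respect to b, and deconifies at level 1; the outcome agrees with the polar set B ◦ = [−1, 1/2].

import numpy as np

B_lo, B_hi = -1.0, 2.0   # B = [-1, 2], a convex set in R containing 0

# Sample points (t*x, t) of the conification of B.
ts = np.linspace(0.01, 3.0, 60)
xs = np.linspace(B_lo, B_hi, 60)
C = np.array([(t * x, t) for t in ts for x in xs])

def in_polar_cone(y, beta):
    # (y, beta) lies in the polar cone iff x*y - alpha*beta <= 0 on C
    return np.all(C[:, 0] * y - C[:, 1] * beta <= 1e-9)

# Deconify: intersect the polar cone with the level beta = 1.
grid = np.linspace(-3.0, 3.0, 601)
B_polar = [y for y in grid if in_polar_cone(y, 1.0)]
print(min(B_polar), max(B_polar))   # approximately -1.0 and 0.5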
4.6 Calculus Rules for the Polar Set Operator

For every type of convex object, there are calculus rules. These can be used to compute the duality operator for this type of objects. Here we give these calculus rules for convex sets B containing the origin; in this case the duality operator is the polar set operator. We have already the rule B ◦◦ = B if B is closed. Now we will also give
set operator. We have already the rule B ◦◦ = B if B is closed. Now we will also give
rules expressing (B1 ◦ B2 )◦ , for a binary operation ◦ (sum, intersection, convex hull
of the union or inverse sum) in terms of B1◦ and B2◦ , and rules expressing (MB)◦
and (BN )◦ in terms of B ◦ for linear functions M and N .
We will derive the calculus rules for convex sets by using the homogenization
method. In order to do this, we need the calculus rules for convex cones C to begin
with. Therefore, we prepare by giving these rules. We have already the rule C ◦◦ = C
if C is closed and contains the origin.
We will halve the number of formulas needed by introducing a handy shorthand notation. We will write, for two convex cones C, C̃ ⊆ X = Rn that are in duality, in the sense that

C ◦ = C̃, C̃ ◦ = cl(C),

the shorthand

C ←d→ C̃.

Note that this implies that C̃ is closed; however, C does not have to be closed. It also implies that C and C̃ both contain the origin.
Proposition 4.6.1 (Calculus Rules for Binary Operations on Convex Cones) Let C, C̃, D, D̃ be closed convex cones in X = Rn for which C ←d→ C̃ and D ←d→ D̃. Then

C + D ←d→ C̃ ∩ D̃.
Proposition 4.6.2 (Calculus Rules for the Two Actions of Linear Functions on Convex Cones) Let M : Rn → Rm be a linear function, and C, C̃ convex cones for which C ←d→ C̃. Then

MC ←d→ C̃M.
This result follows from the duality theorem for convex cones. Under suitable
assumptions, the cone MC in Proposition 4.6.2 is closed (see Exercise 42).
Now we are ready to present the calculus rules for convex sets containing the origin. Again we use a similar shorthand notation. We will write, for two convex sets B, B̃ ⊆ X = Rn containing the origin that are in duality, in the sense that

B ◦ = B̃, B̃ ◦ = cl(B),

the shorthand

B ←d→ B̃.
Note that this implies that B̃ is closed; however B does not have to be closed.
Proposition 4.6.3 (Calculus Rules for Binary Operations on Convex Sets Containing the Origin) Let A, Ã, B, B̃ be convex sets in X = Rn containing the origin, for which A ←d→ Ã and B ←d→ B̃. Then

A co∪ B ←d→ Ã ∩ B̃,

A + B ←d→ Ã # B̃.
This result can be proved by the homogenization method, using the systematic
definition of the binary operations for convex sets.
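For symmetric intervals in R1 , where all polar sets can be written down by hand, the first rule of Proposition 4.6.3 can be checked directly. The following small sketch (Python; the intervals are assumed examples) does this, using that the polar set of [−a, a] is [−1/a, 1/a] and that the convex hull of the union of two symmetric intervals is the larger one.

def polar_interval(lo, hi):
    # polar set of an interval [lo, hi] with lo < 0 < hi: [1/lo, 1/hi]
    return (1.0 / lo, 1.0 / hi)

a, b = 2.0, 3.0
A, B = (-a, a), (-b, b)

co_union = (min(A[0], B[0]), max(A[1], B[1]))        # A co-union B
lhs = polar_interval(*co_union)                      # (A co-union B) polar
A_t, B_t = polar_interval(*A), polar_interval(*B)    # the polar sets
rhs = (max(A_t[0], B_t[0]), min(A_t[1], B_t[1]))     # their intersection

print(lhs, rhs)   # both are (-1/3, 1/3), as the calculus rule predicts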
Now we give the calculus rules for the two actions of linear functions (image and
inverse image).
Proposition 4.6.4 (Calculus Rules for the Two Actions of Linear Functions on Convex Sets Containing the Origin) Let M : Rn → Rm be a linear function, and B, B̃ ⊂ Rn convex sets containing the origin for which B ←d→ B̃. Then

MB ←d→ B̃M.
This result follows by the conification method from the corresponding result for
convex cones. Again one ‘can omit the closure operator’ in the formulas above under
suitable assumptions. We do not display these conditions.
Example 4.6.5 (Calculus Rules for Bounded Convex Sets Containing the Origin)
The calculus rules above simplify if one restricts attention to the following type of convex object: bounded convex sets that contain the origin in their interior. Part of the assumptions, that the convex sets contain the origin in their interior, is essentially no restriction of the generality: it can be achieved by a translation that brings a relative interior point of the convex set to the origin, followed by the restriction of the entire space to the linear span of the translated copy of the convex set. Note that the polar set of such a set is of the same type, and so the polar set operator preserves the type. Indeed, these convex sets are deconifications of convex cones C for which

epi(r −1 ‖ · ‖) ⊆ C ⊆ epi(R −1 ‖ · ‖)

for some numbers R > r > 0, where ‖ · ‖ is the Euclidean norm. If you apply the polar cone operator with respect to b, then you get

epi(R‖ · ‖) ⊆ C ◦ ⊆ epi(r‖ · ‖).
The simplification in the calculus rules for bounded sets containing the origin in
their interior, is that the closure operations in the calculus formulas can be omitted.
This follows from the fact that the image of a compact set under a continuous
function is again compact.
Conclusion The unified definition of the four classic binary operations for convex
sets containing the origin, by means of the homogenization method, makes it
possible to prove their calculus formulas in one blow. They are a consequence of the
calculus formulas for the two actions on convex cones by linear functions—image
and inverse image.
4.7 Duality for Polyhedral Sets

The aim of this section is to present the duality theory for polyhedral sets.
Polyhedral sets are the convex sets par excellence. They are the convex sets that
occur in Linear Programming. Each convex set can be approximated as closely as
desired by polyhedral sets. The theorems of convexity are essentially simpler and
stronger for polyhedral sets than for arbitrary convex sets. Polyhedral sets have an
interesting specific structure: vertices, edges, facets. Some properties of the duality
for polyhedral sets have already been given in Example 4.3.13.
A closed convex cone in X × R+ , where X = Rn , has as deconification a
polyhedral set iff it is a polyhedral cone.
We begin with the representation theorem for polyhedral sets P , which is the
central result.
We give a version of the representation theorem where the representation only
depends on one choice, that of a basis for the lineality space of the polyhedral set P .
Note that the definition of a polyhedral set, as the intersection of a finite collection
of closed halfspaces, gives a dual description of polyhedral sets. The representation
theorem gives a primal description as well.
Theorem 4.7.1 (Minkowski-Weyl Representation) Let P ⊆ X = Rn be a polyhedral set. Consider the following sets: S is the set of extreme points of P ∩ L⊥P , T is a set of representatives of the extreme rays of P ∩ L⊥P , and U is a basis of LP , the lineality space of P . Then each point of P can be represented in the following form

Σs∈S αs s + Σt∈T βt t + Σu∈U γu u,

where αs ∈ R+ ∀s ∈ S, Σs∈S αs = 1, βt ∈ R+ ∀t ∈ T and γu ∈ R ∀u ∈ U .
Conversely, each set that is the Minkowski sum of the convex hull of a finite set and the conic hull of a finite set is a polyhedral set.
Example 4.7.2 (Minkowski-Weyl Representation) Figure 4.13 illustrates the
Minkowski-Weyl representation for four different polyhedral sets P in the plane
R2 . From left to right:
• a bounded P is the convex hull of its vertices (extreme points),
• an angle is the conic hull of its two legs (extreme recession directions),
• an unbounded P that contains no line is the Minkowski sum of the convex hull
of its extreme points and the conic hull of its extreme recession directions,
• a general polyhedral set P is the Minkowski sum of a polyhedral set that contains
no line, and a subspace.
Note that this result implies that a bounded convex set is the convex hull of a
finite set iff it is the intersection of a finite collection of closed halfspaces.
The proof of Theorem 4.7.1 is left as an exercise. It can be proved from first
principles by the homogenization method (recommended!) or it can be derived from
the theorem of Krein–Milman for arbitrary—possibly unbounded—convex sets that
we have given.
It follows that the polar set of a polyhedral set that contains the origin is again
a polyhedral set and that the polar cone of a polyhedral cone is again a polyhedral
cone.
We use again the shorthand notation P ←d→ P̃ for convex polyhedral sets in duality. Note that now there is no closure operation in its definition: it just means P ◦ = P̃ , P̃ ◦ = P . So there is complete symmetry between P and P̃ . This is the advantage of working with polyhedral sets.
If we take this into account, then the theorems for general convex sets specialize
to the following results for polyhedral sets.
Theorem 4.7.3 (Theorem on Polyhedral Sets) The following statements hold:
1. The polar set of a polyhedral set is again a polyhedral set.
2. The image and inverse image of a polyhedral set under a linear transformation
are again polyhedral sets.
3. The polyhedral property is preserved by the standard binary operations for
convex sets—sum, intersection, convex hull of the union, inverse sum.
4. For polyhedral sets P , Q ⊂ Rn containing the origin, the following formulas hold:

P + Q ←d→ P # Q,

P co∪ Q ←d→ P ∩ Q.
5. For polyhedral sets P , P̃ ⊂ Rn containing the origin for which P ←d→ P̃ and M ∈ Rm×n , the following formula holds:

MP ←d→ P̃ M.
We do not go into the definitions of vertices, edges and facets of polyhedral sets.
However we invite you to think about what happens to the structure of vertices,
edges and facets of a polyhedral set when you apply the polar set operator. Try to
guess this. Hint: try to figure out first the case of the platonic solids—tetrahedron,
cube, octahedron, dodecahedron, icosahedron—if they contain the origin.
Conclusion Polyhedral sets containing the origin are the nicest class of convex sets.
In particular, the polyhedral set property is preserved under taking the image under
a linear transformation and under taking the polar set. This makes the duality theory
for polyhedral sets very simple: it involves no closure operator.
4.8 Applications
The aim of this section is to give some applications of duality for convex sets.
We present the theorems of the alternative. We show that for each system of linear
inequalities and equations, one can write down another such system such that either
the first or the second system is soluble. These are results about polyhedral sets,
as polyhedral sets are solution sets of finite systems of linear inequalities and
equations. We do this for some specific forms of systems.
We start with an example.
Example 4.8.1 (Theorems of the Alternative) Consider the system of linear inequal-
ities
2x + 3y ≤ 7
6x − 5y ≤ 8
−3x + 4y ≤ 2
This system is soluble. It is easy to convince a non-expert of this. You just have to
produce an example of a solution. For example, you give (2, 1). You could call this
a certificate of solubility. This holds more generally for each system of inequalities
and equations: if it is soluble, then each solution can be used to give a proof of the
solubility.
Now consider another system of linear inequalities
2x − 3y + 5z ≤ 4
6x − y − 4z ≤ 1
−18x + 11y − 7z ≤ −15.
This system is insoluble. Indeed, multiply the three inequalities by the nonnegative coefficients 3, 2, 1 respectively and add them up: on the left-hand side the variables cancel, as 3 · 2 + 2 · 6 + 1 · (−18) = 0, 3 · (−3) + 2 · (−1) + 1 · 11 = 0 and 3 · 5 + 2 · (−4) + 1 · (−7) = 0, while the right-hand side becomes 3 · 4 + 2 · 1 + 1 · (−15) = −1. This gives

0 · x + 0 · y + 0 · z ≤ −1,

an inequality of the form 0 ≤ a for some negative number a. Therefore, the assumption that a solution to the system of linear inequalities exists leads to a contradiction. Hence, there exists no solution. It seems reasonable to expect that non-experts will be convinced by this argument.
You could call this a certificate of insolubility. Such certificates do not exist for
arbitrary systems of inequalities and equations. But fortunately they always exist if
the inequalities and equations are linear. This is the content of the theorems of the
alternative.
We do not go into the efficient method to find such certificates—by means of the
simplex method. Of course, it is not necessary that a non-expert understands how
the certificate—here the coefficients 3, 2, 1—has been found. This is a ‘trick of the
trade’.
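The certificate can of course be verified mechanically. Here is a minimal check (Python with NumPy; the data are copied from the example above, and the multipliers y = (3, 2, 1) are the certificate).

import numpy as np

A = np.array([[2.0, -3.0, 5.0],
              [6.0, -1.0, -4.0],
              [-18.0, 11.0, -7.0]])
b = np.array([4.0, 1.0, -15.0])
y = np.array([3.0, 2.0, 1.0])   # the nonnegative multipliers

print(y @ A)   # [0. 0. 0.]: the variables cancel
print(y @ b)   # -1.0: so a solution of Ax <= b would give 0 <= -1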
Now we give several theorems of the alternative.
Theorem 4.8.2 (Theorems of the Alternative) Let X = Rn , Y = Rm , A ∈ Rm×n
and b ∈ Y .
1. Minkowski [1] and Farkas [2]. Either the system

Ax = b, x ≥ 0

or the system

A⊤ y ≥ 0, b · y < 0

is soluble.
2. Fan Tse (Ky Fan) [3]. Either the system

Ax ≤ b

or the system

A⊤ y = 0, y ≥ 0, b · y < 0

is soluble.
3. Either the system

Ax ≤ b, x ≥ 0

or the system

A⊤ y ≥ 0, y ≥ 0, b · y < 0

is soluble.
The first one of these theorems of the alternative is usually called Farkas’ lemma. The proofs of these theorems of the alternative will show how handy the calculus rules for the binary operations are. In order to avoid the occurrence of minus signs, we work with dual cones and not with polar cones. Recall that they only differ by a minus sign: C ∗ = −C ◦ . We will use (Rn+ )∗ = Rn+ .
Proof
1. The solubility of Ax = b, x ≥ 0 means that b ∈ ARn+ . The convex cone ARn+ is polyhedral, and therefore

ARn+ = (ARn+ )∗∗ = ((A⊤ )−1 (Rn+ ))∗ .

So either b ∈ ARn+ , or there exists y with A⊤ y ≥ 0 and b · y < 0. This settles 1.
2. The solubility of Ax ≤ b means that b ∈ ARn + Rm+ . The convex cone ARn + Rm+ is polyhedral, and therefore

ARn + Rm+ = (ARn + Rm+ )∗∗ = ((A⊤ )−1 ({0n }) ∩ Rm+ )∗ .

This settles 2.
3. The solubility of Ax ≤ b, x ≥ 0 means that b ∈ ARn+ + Rm+ . The convex cone ARn+ + Rm+ is polyhedral, and therefore

ARn+ + Rm+ = (ARn+ + Rm+ )∗∗ = ((A⊤ )−1 (Rn+ ) ∩ Rm+ )∗ .
This settles 3.
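Computationally, one can decide which of the two alternatives in item 1 holds by solving two small linear programs. The following sketch assumes SciPy is available; the function name and the test data are hypothetical illustrations, not from the text.

import numpy as np
from scipy.optimize import linprog

def farkas_alternative(A, b):
    # First system: Ax = b, x >= 0 (a pure feasibility problem).
    res = linprog(c=np.zeros(A.shape[1]), A_eq=A, b_eq=b,
                  bounds=[(0, None)] * A.shape[1])
    if res.status == 0:
        return "first system soluble", res.x
    # Second system: A'y >= 0, b.y < 0. Minimize b.y subject to -A'y <= 0;
    # this homogeneous problem is unbounded below iff a certificate exists.
    res = linprog(c=b, A_ub=-A.T, b_ub=np.zeros(A.shape[1]),
                  bounds=[(None, None)] * A.shape[0])
    return "second system soluble", res.status   # status 3 = unbounded

A = np.array([[1.0, 1.0], [1.0, -1.0]])
print(farkas_alternative(A, np.array([2.0, 0.0])))    # x = (1, 1) works
print(farkas_alternative(A, np.array([-1.0, 0.0])))   # certificate exists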
For one more theorem of the alternative, by Gordan, see Exercise 50.
Conclusion The calculus rules for binary operations allow very efficient proofs for
the so-called theorems of the alternative. These theorems show the existence of
certificates of non-solubility of systems of linear inequalities.
We present, as announced in Sect. 4.1.4, the result of Black and Scholes for the
pricing of European call options in a simple but characteristic situation.
Consider a stock with current value p. Assume that there are two possible
scenarios at a fixed future date: either the value will go down to vd , or the value
will go up to vu . The question is to determine the ‘correct’ current value π for the
right to buy the stock at the fixed future date for a given price q. Such a right is
called a European call option. This given price is always chosen between the low
value vd and the high value vu , of course, vd < q < vu . Other choices make no
sense. Black and Scholes showed that there exists a unique ‘correct’ current value
for the option. It is characterized by the following interesting property:
There is an—auxiliary—unique probability distribution for the two scenarios such that, for
both the stock and the option, the current value is equal to the expected value at the fixed
future date.
This property is easy to memorize. The condition on the stock and the option
is called the martingale property. Keep in mind that the statement is not that this
probability distribution describes what the probabilities really are; it is just an
auxiliary gadget.
Let us first see why and how this determines the value of the option now. We
choose notation for the probability distribution: let yd be the probability that the
value will go down and let yu be the probability that the value will go up. So
yd , yu ≥ 0 with yd + yu = 1.
Now we put ourselves in the position of an imaginary person who thinks that the
probability for the value of the stock to go down is yd and that the probability for
the value of the stock to go up is yu . We do not imply at all that these thoughts have
any relation to the real probabilities. The probability distribution that we consider is
just an auxiliary gadget.
The expected value of the stock at the future date is yd vd + yu vu . Thus the
property above gives the equality p = yd vd + yu vu . When the price of the stock
goes down, then the right to buy something for the price q that has at that date a
lower price vd is useless. So this right will not be used. Therefore, the value of the
option at that date is 0. However, when the price of the stock goes up, then you
can buy stock for the price q and then sell it immediately for the price vu . This
gives a profit vu − q. Therefore, the value of the option at that date turns out to
be vu − q. It follows that the expected value of the option at the future date is
yd · 0 + yu (vu − q) = yu (vu − q). Thus the property above gives the equality
π = yu (vu − q). Now we have three linear equations in the unknown quantities
π, yd , yu :
yd + yu = 1,
p = yd vd + yu vu ,
π = yu (vu − q).
These equations have a unique solution and for this solution yd , yu ≥ 0 as one
readily checks. This concludes the explanation how the property above determines
a unique current value for the option.
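These three equations are easy to solve in closed form; the following sketch (Python; the numbers in the example are illustrative assumptions) computes the auxiliary probabilities and the option value.

def price_european_call(p, v_d, v_u, q):
    # Solve y_d + y_u = 1 and p = y_d*v_d + y_u*v_u, then price the option
    # as its expected value y_u*(v_u - q) under these probabilities.
    y_u = (p - v_d) / (v_u - v_d)
    y_d = 1.0 - y_u
    assert 0.0 <= y_u <= 1.0, "requires v_d <= p <= v_u"
    return y_d, y_u, y_u * (v_u - q)

# Stock worth 100 now, worth 80 or 120 at the future date; strike price 110.
print(price_european_call(p=100.0, v_d=80.0, v_u=120.0, q=110.0))
# (0.5, 0.5, 5.0): the 'correct' current value of the option is 5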
It remains to justify the property above. The source of this is the fact that in
financial markets, there are always people on the lookout for an opportunity to
make money without taking any risk, called arbitrage opportunities. Their activities
lead to changes in prices, which make these opportunities disappear. Therefore, the
assumption is made that there are no arbitrage opportunities. We are going to see
that this implies the property above.
To see this, we first have to give a precise description of the absence of arbitrage
opportunities. There are three assets here: money, stock, options. We assume that
there exists no portfolio of these assets that has currently a negative value but at
the fixed future date a nonnegative value in each one of the two scenarios. That
is, there exist no real numbers xm , xs , xo for which xm + xs p + xo π < 0 and
xm + xs vd + xo · 0 ≥ 0, xm + xs vu + xo (vu − q) ≥ 0.
Note that here we assume that the value of money stays the same from now till
the future date and that we allow going short, that is, xm , xs , xo are allowed to be
negative—this means holding negative amounts of the asset, as well as allowing
xm , xs , xo to be not integers.
Now we are ready to strike: we apply Farkas’ lemma. This gives the property
above, as can be checked readily. This concludes the proof of the result of Black
and Scholes.
The aim of this section is to determine the equation of the curve which is the
outcome of the child drawing of Sect. 4.1.1. Figure 4.14 gives this drawing again.
Let the length of the drawer be 1. We choose coordinates in such a way that the
x-axis lies horizontally at a height of 1 above the bottom of the sheet and that the
y-axis lies vertically at a distance of 1 to the right of the left side of the sheet. Then
the lines ax + by = 1 that the child draws enclose a closed convex set A containing
the origin; they are tangent lines to the boundary of A. This boundary is a curve. The lines that the child draws correspond to the points of the curve that forms the boundary of the polar set A◦ . By the duality theorem, the polar set of A◦ is A. The points on the boundary of A = (A◦ )◦ , the polar set of A◦ , correspond to the tangent lines to the boundary of A◦ . This discussion suggests how to do the calculation. To keep this
calculation as simple as possible we change to the following coordinates: the bottom
of the sheet is the x-axis and the left-hand side of the sheet is the y-axis. Note that
now the convex set A does not contain the origin. However, this does not matter for
the present purpose.
The lines that the child draws are the lines ax + by = 1 for which the distance between the intercepts (a −1 , 0) and (0, b−1 ) is 1. Thus the pair (a, b) has to satisfy, by the theorem of Pythagoras, the equation

a −2 + b−2 = 1.
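The enclosed curve itself can be traced numerically as the envelope of the drawn lines. The following sketch (Python with NumPy; the parametrization 1/a = cos t, 1/b = sin t, which satisfies a −2 + b−2 = 1, is an assumption of the sketch) intersects neighboring lines and checks that the resulting points satisfy the classical astroid equation x 2/3 + y 2/3 = 1.

import numpy as np

ts = np.linspace(0.05, np.pi / 2 - 0.05, 200)
a, b = 1.0 / np.cos(ts), 1.0 / np.sin(ts)   # coefficients of the drawn lines

pts = []
for i in range(len(ts) - 1):
    # the intersection of the lines for parameters t_i and t_{i+1}
    M = np.array([[a[i], b[i]], [a[i + 1], b[i + 1]]])
    pts.append(np.linalg.solve(M, np.ones(2)))
pts = np.array(pts)

# The envelope points should satisfy x^(2/3) + y^(2/3) = 1 (an astroid arc).
print(np.max(np.abs(pts[:, 0] ** (2 / 3) + pts[:, 1] ** (2 / 3) - 1.0)))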
Conclusion By using the duality theorem for convex sets we can derive from the
dual description of a convex curved line (‘its tangent lines’) an equation for the
curved line.
The aim of this section is to prove the result stated in Sect. 4.1.2: for each reasonable
way to manufacture a product, there exist prices for the resources such that this way
will be chosen by a cost minimizing firm.
Let notations and definitions be as in Sect. 4.1.2. Let a reasonable way to manufacture the product be chosen: this is a boundary point x̄ of P . Apply the supporting hyperplane theorem 4.3.3 to x̄ and P . This gives a nonzero vector p ∈ Rn such that the halfspace p · (x − x̄) ≥ 0 supports P at x̄. As RP = Rn+ and as x̄ is efficient, it follows that p > 0. If we now set prices for the resources to be p1 , . . . , pn , then we get, by the definition of supporting hyperplane, that the problem to minimize p · x subject to x ∈ P has optimal solution x̄. This completes the analysis of the manufacturing problem.
Conclusion The supporting hyperplane theorem can be applied to show the follow-
ing result. For each efficient way to manufacture a product, there exist prices for the
resources such that this efficient way will be chosen by a cost minimizing firm.
4.9 Exercises
1. In Sect. 4.1.2, it is stated that boundary points of P are the reasonable ways of
manufacturing. Why is this the case?
2. Check the details of the ball throwing proof for the duality theorem for convex
sets, Theorem 4.2.5, which is given in this chapter.
3. One of the two standard proofs for duality: by means of the shortest distance problem. Consider the set-up of Theorem 4.2.5 and assume A ≠ ∅, A ≠ Rn . Show that for each p ∈ Rn \ A, there exists a unique point x̂ ∈ A which is closest to p and show that the closed halfspace {x ∈ Rn | (x − x̂) · (x̂ − p) ≥ 0} contains A but not p. Figure 4.15 illustrates the shortest distance problem.
Figure 4.16 illustrates the proof for duality by means of the shortest distance problem for a convex cone C in the plane. A disk with center p is blown up till it hits the convex cone, at some point x̂. Then the tangent line to the disk at the point x̂ has C on one side and the point p on the other side.
4. *Let p ∈ Rn and let A ⊆ Rn be a nonempty closed convex set. Show that the
shortest distance problem for the point p and the set A has a unique solution.
This unique solution is sometimes called the projection of the point p on the
convex set A. This result is called the Hilbert projection theorem.
5. Theorem of Moreau. *Here is the main result on the shortest distance problem.
Let C, D be two closed convex cones in X = Rn that are each other’s polar
cone, C = D ◦ , D = C ◦ . Let x ∈ X be given. Show that there exist unique
c ∈ C and d ∈ D for which
x = c + d, c · d = 0.
That is, show that the function |ψ(x)| assumes its maximum on the solution set
of N (x) = 1 (‘the sphere of the norm N ’).
8. Prove the statement in Example 4.3.10.
9. Hahn–Banach. One of the two standard proofs for duality: the Hahn–Banach
proof. The following proof of the duality theorem, has the advantage that it can
be extended to infinite dimensional spaces (this only requires that you use in
addition Zorn’s lemma); it is essentially the proof of one of the main theorems
of functional analysis, the theorem of Hahn–Banach.
Figure 4.18 illustrates the induction step of the Hahn–Banach proof for
duality, for the step from dimension two to dimension three. Here we consider
the weakest looking version of duality, Theorem 4.3.20.
C1 ⊆ C2 ⇒ C1◦ ⊇ C2◦ .
16. Show that for two closed convex cones in duality, C, C̃, one has that C has a
nonempty interior iff C ◦ is pointed (that is, it contains no entire line).
17. Show that for two closed convex sets containing the origin, B1 , B2 ⊆ Rn , one
has the implication
B1 ⊆ B2 ⇒ B1◦ ⊇ B2◦ .
18. Let B ⊆ X = Rn be a closed convex set containing the origin. Check that B ◦
is a closed convex set in X containing the origin.
19. Show that for two closed convex sets containing the origin, B, B̃, that are each
other’s polar set, one has that B is bounded iff B̃ contains the origin in its
interior.
Hint. Proceed in three steps:
(a) Show that B is bounded iff its conification c(B) is contained in the epigraph of ε‖ · ‖ for some ε > 0.
(b) Show that B contains the origin in its interior iff its conification c(B) contains the epigraph of N ‖ · ‖ for some N > 0.
(c) Show that the epigraphs of N −1 ‖ · ‖ and N ‖ · ‖ are each other’s polar cone with respect to the bilinear pairing b((x, α), (y, β)) = x · y − αβ.
20. Show that the polar set of a subspace is its orthogonal complement.
21. Show that the polar set of the standard unit ball in Rn with respect to some norm ‖ · ‖ is the standard unit ball in Rn with respect to the dual norm ‖ · ‖∗ .
22. Show that the fact that the polar set operator B → B ◦ is an involution—that is,
Theorem 4.3.15—is indeed a reformulation of the duality theorem for convex
sets, Theorem 4.2.5.
Hint. Check first that for every nonzero v ∈ Rn , one has v ∈ B ◦ iff the closed
halfspace given by the inequality x · v ≤ 1 contains B.
23. Prove the weakest looking version of the duality theorem (‘polar cone of a
proper convex cone is nontrivial’) by means of the ball-throwing proof, where
the ball is not necessarily thrown in a straight line.
24. Show that for every closed convex cone C ⊂ X = Rn that has nonempty
interior, there exists a basis of the vector space X such that if you take
coordinates with respect to this basis, C is the epigraph of a function g :
Rn−1 → R+ for which epi(g) is a convex cone.
25. *Show that for every closed convex cone C ⊂ X = Rn that has nonempty
interior and that contains no line through the origin, there exists a basis of the
vector space X such that if you take coordinates with respect to this basis, C is
the epigraph of a function g : Rn−1 → R+ for which epi(g) is a convex cone
and g(x) = 0 iff x = 0.
26. *Show that a given closed convex cone does not contain any line through the
origin iff its polar cone has a nonempty interior.
27. *Show that the three golden cones—the first orthant, the Lorentz cone and the
positive semidefinite cone—are self-dual, that is, show that they are equal to
their dual cone.
28. *Prove Proposition 4.4.4.
29. Let X, Y be two finite dimensional vector spaces which are put in duality by a non-degenerate bilinear mapping b : X × Y → R.
(a) *Show that X and Y have the same dimension (= n).
(b) Choose bases for X and Y and identify vectors of X and Y with the coordinate vectors with respect to these bases. Thus X and Y are identified
44. *Show that ‘the closure operators cannot be omitted’ from the formulas above in general. That is, one might have, for suitable pairs of closed convex cones in duality (C, C̃) and (D, D̃) in Rn ,

C + D ≠ (C̃ ∩ D̃)◦

and

(C̃M)◦ ≠ MC.
45. *Derive Proposition 4.6.4 from the corresponding result for convex cones,
Proposition 4.6.2.
46. *Derive Proposition 4.6.3 from Proposition 4.6.4 by means of the homogeniza-
tion method.
Hint. Use that the sum mapping +Y : Y × Y → Y and the diagonal mapping ΔY : Y → Y × Y , y → (y, y), are each other’s transpose for each finite dimensional inner product space Y .
47. Formulate Theorem 4.7.3 for polyhedral cones.
48. *Let P be the solution set of the system of linear inequalities in n variables aj · x ≤ bj , 1 ≤ j ≤ r, where aj ∈ Rn and bj ∈ R. Show that for each extreme point x̄ of this polyhedral set there is a subset J of {1, 2, . . . , r} for which the vectors aj , j ∈ J form a basis of Rn and x̄ is the unique solution of the system of linear equations aj · x = bj , j ∈ J .
49. Consider the five platonic solids—tetrahedron, cube, octahedron, dodecahe-
dron, icosahedron. We assume that they contain the origin—after a translation;
this is done for convenience: then we can give dual descriptions by means of
the polar set operator.
(a) Show that these are polyhedral sets.
(b) Show that the polar set of a platonic solid is again a platonic solid.
(c) *Determine for each platonic solid which platonic solid is its polar set.
50. Gordan’s theorem of the alternative. *Let a1 , . . . , ar ∈ Rn . Prove the
following statement.
Gordan [5]. Either the system a1 · x < 0, . . . , ar · x < 0 is soluble, or the system μ1 a1 + · · · + μr ar = 0, μ ≠ 0, μ ≥ 0 is soluble.
51. Show that the coefficients a and b of a tangent ax + by = 1 to the curve in the child drawing example analyzed in Sect. 4.8.3 satisfy the equation a 2/3 + b2/3 = 1.
52. Now the child makes a second drawing (after the drawing that is analysed in
Sect. 4.8.3). Again she draws straight lines from the left-hand side of the paper
to the bottom of the paper. This time she is doing this in such a way that the
sum of the distances of the endpoints of each line to the bottom-left corner of
the paper is constant. Again these lines enclose a closed convex set. Find the
equation of the curve formed by the boundary points of this closed convex set.
References
5 Convex Functions: Basic Properties

Abstract
• Why. A second main object of convex analysis is a convex function. Convex
functions can help to describe convex sets: these are infinite sets, but they can
often be described by a formula for a convex function, so in finite terms. More-
over, in many optimization applications, the function that has to be minimized is
convex, and then the convexity is used to solve the problem.
• What. In the previous chapters, we have invested considerable time and effort in
convex sets and convex cones, proving all their standard properties. Now there
is good news. No more essentially new properties have to be established in the
remainder of this book. It remains to reap the rewards. In Chaps. 5 and 6, we consider, to begin with, convex functions: the dual properties in Chap. 6, and the properties that do not require duality (the primal properties) in this chapter. This
requires convex sets: convex functions are special functions on convex sets. Even
better, they can be expressed entirely in terms of convex sets: convex functions
are functions for which the epigraph, that is, the region above the graph, is a
convex set. Some properties of convex functions follow immediately from a
property of a convex set, applied to the epigraph. An example is the continuity
property of convex functions. For other properties, a deeper investigation is
required: one has to take the homogenization of the epigraph, and then one should
apply a property of convex cones. An example is the unified construction of the
eight standard binary operations on convex functions—sum, maximum, convex
hull of the minimum, infimal convolution, Kelley’s sum, and three nameless
ones—by means of the homogenization method. The defining formulas for them
look completely different from each other, but they can all be generated in exactly
the same systematic way by a reduction to convex cones (‘homogenization’).
One has for convex functions the same technical problem as for convex sets:
all convex functions that occur in applications have the nice properties of being
closed and proper, but if you work with them and make new functions out of
them, then they might lose these properties. Two examples of such work are:
(1) the consideration, in a sensitivity analysis for an optimization problem, of
the optimal value function, and (2) the application of a binary operation. Finally,
twice continuously differentiable functions f (x) are convex iff their Hessian f ″(x) is positive semidefinite for all x.
Road Map
1. Definitions 5.2.1, 5.2.4, 5.2.6, 5.2.9 and 5.2.10 (convex function, defined either
by Jensen’s inequality or by the convexity of a set: its (strict) epigraph).
2. Proposition 5.3.1 (continuity property of a convex function).
3. Propositions 5.3.6, 5.3.9 (first and second order characterizations of convex
functions).
4. Figure 5.7 and Definition 5.4.2 (construction of a convex function by homogenization).
5. Definitions 5.2.12, 5.3.4 (two nice properties for convex functions, closedness
and properness).
6. Definitions 5.5.1, 5.5.2 (the two actions by linear functions on convex functions).
7. Figure 5.8, Definition 5.6.1, Proposition 5.6.6 (binary operations on convex
functions—pointwise sum, pointwise maximum, convex hull of pointwise min-
imum, infimal convolution, Kelley’s sum and three nameless ones—defined by
formulas and constructed by homogenization).
5.1 *Motivation
The aim of this section is to give two examples of how a convex set, which is usually
an infinite set, can be described by a function, which can often be described in finite
terms: by a formula.
Example 5.1.1 (An Unbounded Closed Convex Set Viewed as an Epigraph) Sup-
pose that A ⊆ X = Rn is an unbounded closed convex set. Then it has a nonzero
recession vector v. Assume that −v is not a recession vector for A. Then we can
view A as an epigraph, that is, as the region above the graph of a function. Figure 5.1
illustrates how this can be done.
Here is a precise description. Let L be the hyperplane in X through the origin
orthogonal to v, that is,
L = v ⊥ = {x ∈ X | x · v = 0}.
In this way, the convex set A is described completely in terms of the function f :
it is essentially the epigraph of f . The function f is called a convex function as its
epigraph is a convex set. Moreover, f is called lower-semicontinuous or closed as
its epigraph is a closed set. There are many functions defined by a formula—that is,
in finite terms—that can be verified to be convex functions, as we will see. For such
a function f , one gets a convenient ‘finite’ description of the convex set epif , the
epigraph of f —by the formula for the function f .
Example 5.1.2 (A Closed Convex Set Described by Its Gauge) The Euclidean norm
on Rn can be described in terms of the standard closed unit ball Bn as x = inf{t >
0 | x/t ∈ Bn }. Now we generalize this construction. Suppose that we have a closed
convex set A ⊆ X = Rn that contains the origin in its interior (recall that this
condition can be assumed for a given nonempty closed convex set by translating
it to a new position in such a way that a relative interior point is brought to the
origin, and then restricting the space from Rn to the span of the translated copy of
the given set). Then the closure of c(A), the homogenization of A, is the epigraph
of a function pA : X → R+ . One has

pA (x) = inf{t > 0 | x/t ∈ A} ∀x ∈ X.

This function is called the Minkowski function or the gauge of A. The set A can be expressed in terms of this function by means of

A = {x ∈ X | pA (x) ≤ 1}.
If pA can be given by a formula, then this gives again a convenient finite description of the convex set A: by the formula for its gauge pA , which is a convex function, as its epigraph is a convex cone and so a convex set.
Figure 5.2 illustrates this description for the case that A is bounded.
A subset A of the plane R2 is drawn. It is a bounded closed convex set containing
the origin in its interior. Some arrows are drawn from the origin to the boundary of
A. On such an arrow the value of the gauge pA of A grows linearly from 0 to 1.
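The gauge can also be evaluated numerically from a membership test for A alone, using pA (x) = inf{t > 0 | x/t ∈ A} and a bisection on t. In the following sketch (Python with NumPy) the set A, the bracket t_hi and the iteration count are assumptions chosen for illustration.

import numpy as np

def in_A(x):
    # membership test for A = {x | x1^2 + 2*x2^2 <= 1}, an assumed example
    return x[0] ** 2 + 2.0 * x[1] ** 2 <= 1.0

def gauge(x, oracle=in_A, t_hi=1e6, iters=60):
    t_lo = 0.0
    for _ in range(iters):        # x/t lies in A exactly for t >= gauge(x)
        t = 0.5 * (t_lo + t_hi)
        if oracle(np.asarray(x) / t):
            t_hi = t
        else:
            t_lo = t
    return t_hi

x = np.array([3.0, 4.0])
print(gauge(x), np.sqrt(x[0] ** 2 + 2.0 * x[1] ** 2))   # both about 6.403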
Conclusion We have seen two methods to describe convex sets by means of functions; this is convenient, as functions can often be described by a formula. Then an ‘infinity’ (a convex set) is captured in finite terms. The first method is to view an unbounded convex set, with the aid of a recession direction, as the epigraph of a function. The second method is to describe a convex set by its gauge.
5.1.2 Why Convex Functions that Are Not Nice Can Arise
in Applications
In this chapter, we will define two nice properties, closedness and properness, that all
convex functions that occur in applications possess. Now we describe an operation
that has to be carried out in many applications and that leads to a convex function
that might not have these two nice properties. This is the reason why one allows
in the theory of convex functions also functions that do not have these two nice
properties.
Example 5.1.3 (The Importance of Sensitivity Analysis) Sometimes the sensitivity
of an optimal solution is considered to be more important than the optimal solution
itself. Suppose, for example, that a company wants to start a new project. To begin
with, some possible scenarios for how to go about it might be created. Then, in order
to help the responsible managers to make a choice, the expected revenue minus cost
holds.
In geometric terms, Jensen’s inequality means that each chord with endpoints on
the graph of a proper convex function f lies entirely above or on the graph of f .
Figure 5.3 illustrates this property.
Here we make an observation that is often useful, as it allows one to reduce some arguments involving a convex function of several variables to the case of a convex function of one variable: a function is convex iff its restriction to each line is convex.
Example 5.2.2 (Convex Function) Here are some characteristic examples of convex
functions. Later we will show how it can be verified that a function is convex in an
easier way than by checking Jensen’s inequality.
• Each linear function f (x) = a1 x1 + · · · + an xn ; more generally, each quadratic function f (x) = x ⊤Ax + b⊤ x + c = Σi,j aij xi xj + Σk bk xk + c for which the matrix A + A⊤ is positive semi-definite.
• The function f (x) = (x12 + · · · + xn2 )1/2 , the Euclidean length. This function is not differentiable at x = 0.
• The function f (x) = 1/(1 − x 2 ), x ∈ (−1, 1). This function is not defined on
the entire line R.
• The function f on the nonnegative real numbers R+ for which f (x) = 0 for
x > 0 and f (0) = 1. This function is not continuous at 0.
It is often convenient to extend a convex function f : A → R to the entire space
X by defining its value everywhere outside A formally to be +∞. The resulting
function X → (−∞, +∞] is then called a proper convex function on X.
Example 5.2.3 (Indicator Function) Let A ⊆ X = Rn be a nonempty convex set.
Then the indicator function of A, defined to be the function δA : X → (−∞, +∞]
given by the recipe x → 0 if x ∈ A and x → +∞ otherwise, is a proper convex
function on X.
However, a more efficient way to define convex functions (and then to distinguish
between proper and improper ones) is by reducing this concept to the concept of
convex set: a function is defined to be convex if the region above its graph is a
convex set. Now we formulate this definition in a precise way, using the concept
epigraph of a function.
Definition 5.2.4 The epigraph of a function f on X = Rn taking values in the
extended real number line R = [−∞, +∞] = R ∪ {−∞, +∞} is the region above
its graph, that is, it is the set

{(x, ρ) ∈ X × R | ρ ≥ f (x)}

in X × R.
We give an informal explanation of the idea behind this definition. Suppose we
are given a continuous function of two variables. Consider its graph, a surface in
three-dimensional space. Pour concrete on this surface and let it turn to stone. This
stone is the epigraph.
You do not lose any information by taking the epigraph of a function, as a function f can be retrieved from its epigraph as follows:

f (x) = min{ρ ∈ R | (x, ρ) ∈ epif }

if the set of real numbers ρ for which (x, ρ) ∈ epif is nonempty and bounded below,

f (x) = +∞

if this set is empty, and

f (x) = −∞

if the set of real numbers ρ for which (x, ρ) ∈ epif is unbounded below.
This can be expressed by means of one formula, by using the concept of infimum.
Definition 5.2.5 The infimum inf S of a set S ⊆ R is the largest lower bound r ∈ R = R ∪ {±∞} of S. Note that we allow the infimum to be −∞ (this means that S is unbounded below) or +∞ (this means that S is empty).
It is a fundamental property of the real numbers that this concept is well-defined.
It always exists, but the minimum of a set S ⊆ R does not always exist. If the
minimum of a set S ⊆ R does exist, then it is equal to the infimum of S. Therefore,
the infimum is a sort of surrogate minimum.
A function can be retrieved from its epigraph by means of the following formula:
f (x) = inf{ρ ∈ R | (x, ρ) ∈ epif } ∀x ∈ X.
a first example of how properties of convex functions follow from properties of convex sets.
Proposition 5.2.7 Let f : X = Rn → R be a convex function and let γ ∈ R. Then
the sublevel set {x ∈ X | f (x) ≤ γ } and the strict sublevel set {x ∈ X | f (x) < γ }
are convex sets.
Proof We give the proof for the sublevel set; the proof for the strict sublevel set is similar. As f is a convex function, its epigraph is convex. Its intersection with the closed halfspace X × (−∞, γ ], which is a convex set, is again a convex set. Therefore, the image of this intersection under the orthogonal projection from X × R onto X, given by (x, α) → x, is a convex set. This image is by definition the sublevel set {x ∈ X | f (x) ≤ γ }.
Example 5.2.8 (Convex Sets as Sublevel Sets of Convex Functions) The following sets are convex as they are sublevel sets of convex functions.
• Ellipse. The solution set of 2x12 + 3x22 ≤ 5 is a sublevel set of x → 2x12 + 3x22 .
• Parabola. The solution set of x2 ≥ x12 is a sublevel set of x → x12 − x2 .
• Hyperbola. The solution set of x1 x2 ≥ 1, x1 ≥ 0, x2 ≥ 0 is a sublevel set of the function x → − ln x1 − ln x2 defined on R2++ .
Sometimes, the definition of a convex function, in terms of the strict epigraph, is
more convenient.
Definition 5.2.9 The strict epigraph of a function f on X = Rn taking values in the extended real number line R = [−∞, +∞] = R ∪ {−∞, +∞} is the set

{(x, ρ) ∈ X × R | ρ > f (x)}

in X × R.
If you take the strict epigraph of a function, you do not lose any information
either.
Definition 5.2.10 A function f : X = Rn → R is convex if its strict epigraph is a
convex set in X × R.
Definition 5.2.11 The effective domain of a convex function f on X = Rn is the subset dom(f ) = A = {x ∈ X | f (x) ≠ +∞} of X.
Note that the effective domain of a convex function is a convex set. Note also
that we allow a convex function to take value −∞ on its effective domain.
The convex functions that are of interest in the first place are the following ones.
Definition 5.2.12 A convex function f : X = Rn → R is proper if it does not
assume the value −∞ and if moreover it does not only assume the value +∞.
Convex functions that are not proper are called improper.
In other words, a proper convex function on X arises from a function f :
A → R, where A is a nonempty convex set in X that satisfies Jensen’s inequality
rA− (x) = −∞ ∀x ∈ A, rA− (x) = +∞ ∀x ∉ A,

and

rA+ (x) = −∞ ∀x ∈ ri(A), rA+ (x) = +∞ ∀x ∉ ri(A).

Then rA− , rA+ are improper convex functions. In fact they are essentially the only improper convex functions. To be precise, a convex function f : X → R is improper iff one has, for A = dom(f ), that rcl(A)− ≤ f ≤ rcl(A)+ .
Figure 5.5 illustrates this characterization of improper convex functions.
We cannot avoid improper convex functions; this has been explained in
Sect. 5.1.2.
Conclusion The most fruitful definition of a convex function is by means of its
property that the epigraph is a convex set. One has to allow in the theory so-called
improper convex functions on X = Rn ; these can be classified almost completely
in terms of closed convex sets in X.
sufficient condition for the following stronger concept than convexity. This stronger
concept will be used when we consider optimization in Chap. 7.
Definition 5.3.8 A proper convex function is called strictly convex if it is convex
and if, moreover, its graph does not contain an interval of positive length.
Proposition 5.3.9 Assume that A is open. Then a twice differentiable function f : A → R is convex iff its Hessian f ″(x) is positive semidefinite for each x ∈ A. Moreover, if f ″(x) is positive definite for all x ∈ A, then f is strictly convex.
Example 5.3.10 (Second Order Conditions for (Strict) Convexity)
1. The function f (x) = ex is strictly convex as f ″(x) = ex > 0 for all x.
2. The function f (x) = 1/(1 − x 2 ), −1 < x < 1, is strictly convex as f ″(x) = 2(1 − x 2 )(1 + 3x 2 )/(1 − x 2 )4 > 0.
3. The function f (x1 , x2 ) = ln(ex1 + ex2 ) is convex, as f11 = f22 = ex1 +x2 (ex1 + ex2 )−2 > 0 and f11 f22 − (f12 )2 = 0, but it is not strictly convex.

For a quadratic function, one can check its convexity in terms of its Hessian. This criterion gives an explicit description of the following nice explicit class of convex functions: the quadratic functions that are convex.
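This criterion is easy to apply mechanically: a quadratic function f (x) = x ⊤Ax + b⊤ x + c is convex iff A + A⊤ is positive semidefinite. The following sketch (Python with NumPy; the matrices are assumed examples) tests this via the eigenvalues of the symmetrized matrix.

import numpy as np

def quadratic_is_convex(A, tol=1e-12):
    S = A + A.T   # the Hessian of x -> x'Ax + b'x + c is A + A'
    return bool(np.all(np.linalg.eigvalsh(S) >= -tol))

print(quadratic_is_convex(np.array([[2.0, 0.0], [0.0, 3.0]])))    # True
print(quadratic_is_convex(np.array([[0.0, 1.0], [-1.0, 0.0]])))   # True
print(quadratic_is_convex(np.array([[1.0, 0.0], [0.0, -1.0]])))   # False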
on the floor. This gives the strict epigraph of f , from which the function f can be
recovered.
Now we give a formal definition of the description of a convex function f : X →
R, where X = Rn , in terms of a convex cone. Again, just as for the homogenization
of a convex set, it is convenient to give first the definition of deconification and then
the definition of conification.
Let C ⊆ X × R × R = Rn+2 be a convex cone for which
C + R+ (0X , 1, 0) ⊆ C ⊆ X × R × R+ .
Note that this implies the following condition (and if C is closed it is even equivalent
to it):
R+ (0X , 1, 0) ⊆ cl(C) ⊆ X × R × R+ .
Then
C ∩ (X × R × {1}) = A × {1}
for some convex set A ⊆ X × R that has (0X , 1) as a recession vector. Then this
set A is the convex set deconification of the convex cone C. Consider the function
f : X → R that is defined by
for all x ∈ X. This function f can be expressed directly in terms of the convex cone
C by
This function is readily verified to be convex. All convex functions f on X are the
dehomogenization of some convex cone C ⊆ X × R × R = Rn+2 for which
C + R+ (0X , 1, 0) ⊆ C ⊆ X × R × R+ .
C + R+ (0X , 1, 0) ⊆ C ⊆ X × R × R+
For example, the convex cones

C2 = {x ∈ R3 | x2 x3 ≥ x12 , x3 ≥ 0},

C3 = {x ∈ R3 | x1 x2 ≥ x32 , x3 ≥ 0}

are such convex cones: C2 is the closure of the conification of the function x → x 2 on R, and C3 is the closure of the conification of the function x → 1/x on (0, ∞). A conification of a convex function f on X is a convex cone C ⊆ X × R × R for which

C + R+ (0X , 1, 0) ⊆ C ⊆ X × R × R+ .
This convex cone is not unique. The following result describes all conifications of a
given convex function f on X.
Proposition 5.4.4 Let f be a convex function on X = Rn . The conifications C ⊆
X × R × R+ of f are the intermediate convex cones of the following two convex
cones: the minimal conification
This result follows immediately from the definitions. We choose the minimal
homogenization of f to be our default homogenization. We write c(f ) = cmin (f )
and we call c(f ) the homogenization of f .
Conclusion A convex function can be described in terms of a convex cone: by
taking the conification of the convex set which is the strict epigraph of the convex
function.
The aim of this section is to give the following two ways to make a new convex
function from an old one by means of a linear function—by taking the image and
by taking the inverse image.
Let M : Rn → Rm be a linear function—that is, essentially an m × n-matrix.
Definition 5.5.1 The inverse image f M of a convex function f on Rm under M is
the convex function on Rn given by
(f M)(x) = f (M(x)) ∀x ∈ Rn .
This definition shows that the chosen notation for inverse images of convex sets
and functions is natural and convenient. Note that taking the inverse image of a
convex function does not preserve properness.
Definition 5.5.2 The image Mg of a convex function g on Rn under M is the convex function on Rm given by

(Mg)(y) = inf{g(x) | x ∈ Rn , M(x) = y} ∀y ∈ Rm .
In terms of the reduction of convex functions to convex cones, the image and
inverse image can be characterized as follows.
If C is a conification of f , then CM is a conification of f M. To be more precise,
CM is the inverse image of C under the linear function Rn+2 → Rm+2 given by the
recipe

(x, ρ, t) → (M(x), ρ, t).

In terms of epigraphs, one has

epi(f M) = (epif )M

and the strict epigraph of Mg is the image of the strict epigraph of g under the linear function (x, ρ) → (M(x), ρ).
This is a first example where the strict epigraph is more convenient than
the epigraph. It is the reason why the strict epigraph is chosen to define the
homogenization of a convex function.
Warning The properties properness and closedness are not preserved under taking
images and inverse images (the only preservation is that of closedness under inverse
images).
The aim of this section is to construct the five standard binary operations for convex
functions in a systematic way—by the homogenization method. So far we did
not need homogenization of convex functions: it was enough to describe convex
functions by means of their epigraph or strict epigraph. But in this section, this is
not enough: we do need homogenization, to reduce all the way to convex cones.
Here are five standard binary operations for convex functions, that is, rules to make a new convex function X = Rn → R from two old ones.
Definition 5.6.1
• f + g (pointwise sum), (f + g)(x) = f (x) + g(x) for all x ∈ X,
• f ∨ g (pointwise maximum), (f ∨ g)(x) = max(f (x), g(x)) for all x ∈ X,
• f co∧ g (convex hull of the minimum), the maximal convex underestimator of f and g; that is, the strict epigraph of f co∧ g is the convex hull of the strict epigraph of f ∧ g = min(f, g), the pointwise minimum of f and g.
• f ⊕ g (infimal convolution), f ⊕ g(x) = inf{f (x1 ) + g(x2 ) | x = x1 + x2 } for
all x ∈ X.
• f #g (Kelley’s sum), f #g(x) = inf{f (x1 ) ∨ g(x2 ) | x = x1 + x2 } (where a ∨ b =
max(a, b) for a, b ∈ R) for all x ∈ X.
Note that for two of these operations, the pointwise sum and the pointwise maximum, the value of the new function at a point depends only on the values of f and g at this same point.
Example 5.6.2 (Binary Operations for Convex Functions (Geometric)) Figure 5.8
illustrates the pointwise maximum and the convex hull of the pointwise minimum.
To get the graph of f ∨ g one should follow everywhere the highest one of the
graphs of f and g. To get the graph of f co ∧ g, one should first follow everywhere
the lowest one of the graphs of f and g; then one should make the resulting graph
convex, by taking the graph of the highest convex minorizing function.
Example 5.6.3 (Binary Operations for Convex Functions (Analytic)) Consider the two convex functions f (x) = x 2 and g(x) = (x − 1)2 = x 2 − 2x + 1.
1. (f + g)(x) = x 2 + (x − 1)2 = 2x 2 − 2x + 1,
2. (f ∨ g)(x) = (x − 1)2 = x 2 − 2x + 1 for x ≤ 1/2 and = x 2 for x ≥ 1/2,
3. (f co∧ g)(x) = x 2 for x ≤ 0, = 0 for 0 ≤ x ≤ 1, and = (x − 1)2 = x 2 − 2x + 1 for x ≥ 1.
4. (f ⊕ g)(x) = infu (u2 + (x − u − 1)2 ) by definition; by completion of the square of the quadratic function of u inside the brackets, this is equal to

infu [ 2(u − (x − 1)/2)2 + (x − 1)2 /2 ] = (x − 1)2 /2 = x 2 /2 − x + 1/2.
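The analytic computation in item 4 can be cross-checked numerically by minimizing over the splitting variable on a grid. A small sketch (Python with NumPy; the grid is an assumption of the sketch):

import numpy as np

f = lambda x: x ** 2
g = lambda x: (x - 1.0) ** 2

us = np.linspace(-5.0, 5.0, 2001)                 # grid for the splitting
inf_conv = lambda x: np.min(f(us) + g(x - us))    # (f ⊕ g)(x), approximately

for x in [-1.0, 0.0, 0.5, 2.0]:
    print(x, inf_conv(x), 0.5 * (x - 1.0) ** 2)   # the two values agree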
Proof As the functions f, g are convex, the sets epi(f ), epi(g) are convex.
Therefore the intersection epi(f )∩epi(g) is convex. This intersection is by definition
of the binary operation max equal to epi(f ∨g). So the function f ∨g is convex.
Warning The properties closedness and properness are not always preserved under
binary operations. For example, if A1 , A2 are nonempty disjoint convex sets in Rn
and f1 : A1 → R, f2 : A2 → R are convex functions, then f1 + f2 has effective
domain A1 ∩ A2 = ∅, and so it is not proper.
Now a systematic construction is given of the five binary operations of Defini-
tion 5.6.1: by the homogenization method. That this can be done is an advantage
of the homogenization method. It will be useful later, when we give for
convex functions the convex calculus rules for their dual description and for their
subdifferentials. Recall at this point how this was done for convex sets. Here the
procedure will be similar.
The precise way to construct a binary operation on convex functions by homogenization, is
as follows. We take the space that contains the homogenizations of convex functions
on X (we recall that X is essentially Rn ): this space is the product space X × R × R.
Then we make for each of the three factors of this space—X, the first factor R,
and the second factor R—a choice between + and −. Let the choice for X be ε1 ,
let the choice for the first factor R be ε2 , and let the choice for the second factor
R be ε3 . This gives 23 = 8 possibilities for (ε1 , ε2 , ε3 ): (+ + +), (+ + −), (+ −
+), (+ − −), (− + +), (− + −), (− − +), (− − −). Each choice (ε1 , ε2 , ε3 ) will
lead to a binary operation ◦(ε1 ,ε2 ,ε3 ) on convex functions. Let (ε1 , ε2 , ε3 ) be one of
the eight choices, and let f and g be two ‘old’ convex functions on X. We will use
the homogenization method to construct a ‘new’ convex function f ◦(ε1 ,ε2 ,ε3 ) g on
X.
1. Conify. We conify f and g. This gives conifications c(f ) and c(g) respectively.
These are convex cones C in X × R × R = Rn+2 for which C + R+ (0X , 1, 0) ⊆
C ⊆ X × R × R+ .
2. Work with convex cones. We want to associate to the old convex cones C =
c(f ) and D = c(g) a new convex cone E in X × R × R = Rn+2 for which
E + R+ (0X , 1, 0) ⊆ E ⊆ X × R × R+ .
We take the Cartesian product C × D. This is a convex cone in X × R × R+ ×
X × R × R+ . We rearrange its factors:
(X × X) × (R × R) × (R+ × R+ ).
This has three factors: the first factor is X × X, the second factor is R × R and
third factor is R+ × R+ .
Now we view C ×D as a subset of (X ×X)×(R×R)×(R+ ×R+ ). Then we act
on C × D as follows, using the triplet (ε1 , ε2 , ε3 ) that we have chosen. We define
the promised new cone E to be the outcome of applying simultaneously three
operations on C × D: the first operation is taking the image under the mapping
+_X if ε₁ = +, but it is the inverse image under the diagonal mapping Δ_X : X → X × X,
z ↦ (z, z), if ε₁ = −; the second operation is taking the image under the mapping +_ℝ on the second
factor if ε₂ = +, but the inverse image under the diagonal mapping Δ_ℝ if ε₂ = −; and the third
operation acts in the same way on the third factor, according to ε₃.
3. Deconify. Each convex cone F for which
$$F + \mathbb{R}_+(0_X, 1, 0) \subseteq F \subseteq X \times \mathbb{R} \times \mathbb{R}_+$$
is a conification of a convex function on X. The cone E constructed in the previous step is
such a cone,
$$E + \mathbb{R}_+(0_X, 1, 0) \subseteq E \subseteq X \times \mathbb{R} \times \mathbb{R}_+,$$
and the convex function that it conifies is defined to be f ∘(ε₁,ε₂,ε₃) g.
Let us now carry out this construction explicitly for the triplet (− + −), starting from
the conifications C = c(f) and D = c(g) obtained in step 1.
2. Work with convex cones. We want to associate to the old convex cones C = c(f )
and D = c(g) a new convex cone E in X × R × R = Rn+2 for which
E + R+ (0X , 1, 0) ⊆ E ⊆ X × R × R+ .
We take the Cartesian product C × D and rearrange its factors:
(X × X) × (ℝ × ℝ) × (ℝ₊ × ℝ₊).
This has three factors: the first factor is X × X, the second factor is ℝ × ℝ and the
third factor is ℝ₊ × ℝ₊.
Now we view C × D as a subset of (X × X) × (ℝ × ℝ) × (ℝ₊ × ℝ₊). Then we act on
C × D as follows, using the triplet (ε₁, ε₂, ε₃) = (− + −) that we have chosen. We
define the promised new cone E to be the outcome of applying simultaneously
the inverse image under Δ_X on the first of the three factors of (X × X) × (ℝ ×
ℝ) × (ℝ₊ × ℝ₊) (as the triplet (− + −) has − on the first place), the image under
+_ℝ on the second of the three factors (as the triplet (− + −) has + on the second
place), and the inverse image under Δ_ℝ on the third of the three factors (as the
triplet (− + −) has − on the third place).
This yields the promised new cone E; it satisfies
$$E + \mathbb{R}_+(0_X, 1, 0) \subseteq E \subseteq X \times \mathbb{R} \times \mathbb{R}_+.$$
Explicitly, E consists of all elements (z, γ, τ) ∈ X × ℝ × ℝ for which there exists
an element ((x, α, ρ), (y, β, σ)) ∈ C × D such that
$$z = x = y,\quad \gamma = \alpha + \beta,\quad \tau = \rho = \sigma.$$
One can use this description to check that E is a convex cone in X × R × R for
which the closure is intermediate between R+ (0X , 1, 0) and X × R × R+ .
Now we use that C = c(f) and D = c(g). It follows that E consists of all triplets
(z, γ, τ) ∈ X × ℝ × ℝ for which there exists an element
$$(s(x, \xi, 1), t(y, \eta, 1)) = (sx, s\xi, s, ty, t\eta, t) \in c(f) \times c(g)$$
such that
$$z = sx = ty,\quad \gamma = s\xi + t\eta,\quad \tau = s = t.$$
For τ = s = t > 0 these conditions mean that (z, γ, τ) = τ(x, ξ + η, 1) with ξ > f(x)
and η > g(x), that is, with ξ + η running over all numbers > f(x) + g(x). Hence
$$E = c(f + g),$$
and so
$$f \circ_{(-+-)} g = f + g:$$
the triplet (− + −) generates the pointwise sum.
The aim of this section is to define the recession cone of a convex function.
The recession cone of the epigraph of a proper convex function on X = ℝⁿ always
contains the vector (0_X, 1). This is not interesting. Therefore, we define recession
vectors of a convex function f by taking the vertical projections on X of the recession
vectors of the epigraph of f that do not point upward—that is, their last coordinate
should be non-positive.
Definition 5.7.1 For each convex function f on Rn and each real number γ the
sublevel set of f for level γ is the following convex set:
Vγ = {x | f (x) ≤ γ }.
The recession cone R_f of a proper convex function f on X is defined to be the
recession cone of any nonempty sublevel set V_γ of f.
[Fig. 5.9: the recession cone R_f of a convex function, a cone with apex 0]
This concept is well-defined: which non-empty sublevel set is chosen does not
matter. Figure 5.9 illustrates the concept of recession cone of a convex function.
Thus, the recession cone of a convex function is defined to be ‘the interesting
part’ of the recession cone of the epigraph of the function. Here is the main
application of this concept.
Proposition 5.7.3 If f is closed and Rf = {0}, then f assumes its minimum.
It might be surprising that this somewhat artificial looking concept can be defined
in a natural way, as a dual concept to the concept of effective domain of a convex
function. We will see this in Chap. 6.
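A small Python sketch (an illustrative addition, not part of the formal development; the function f below is a chosen example and the check is numerical evidence, not a proof) can test membership of the recession cone: for a closed proper convex function, v is a recession vector precisely if f does not increase along v from any base point.

```python
# Illustrative sketch (not from the book): numerically test whether v is a
# recession direction of a convex f, using that for closed proper convex f,
# v is a recession vector iff f(x + t v) <= f(x) for all x and all t >= 0.
import numpy as np

def is_recession_direction(f, v, trials=200, t_max=50.0, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x = rng.normal(scale=3.0, size=len(v))        # random base point
        f0 = f(x)
        for t in np.linspace(0.1, t_max, 25):
            if f(x + t * np.asarray(v)) > f0 + 1e-9:  # f increased: not recession
                return False
    return True                                        # evidence only, no proof

f = lambda z: np.exp(z[0]) + z[1]**2   # convex; R_f = {(v1, 0) : v1 <= 0}
print(is_recession_direction(f, (-1.0, 0.0)))   # True
print(is_recession_direction(f, (-1.0, 0.5)))   # False: y^2 grows along v
print(is_recession_direction(f, ( 1.0, 0.0)))   # False: e^x grows along v
```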
5.8 *Applications
The aim of this section is to give applications of the concept of convex function.
First, we explain how convex functions can be used to describe some convex sets.
By the second order criterion for convexity, one can give explicit examples of
convex functions of one variable such as x², −x^{1/2}, 1/x, eˣ, −ln x (x > 0). For
such a function f, its epigraph {(x, y) | y ≥ f(x)} is a convex set in the plane ℝ².
More generally, the second order conditions can be used to verify the convexity of
functions of several variables such as f(x) = xᵀAx + aᵀx + α where A is a positive
semidefinite n × n-matrix, a is an n-column vector and α a real number. Besides, each norm
on Rn is a convex function. By repeatedly applying binary operations one gets many
more convex functions. Then for each convex function f of n variables constructed
explicitly, one gets an explicit construction of a convex set—its epigraph.
Conclusion One can give explicit examples of convex sets by constructing convex
functions and taking their epigraphs.
[Fig. 5.10: a concave utility function U; the points a, c, pa + qb, b on the horizontal axis, where c is the certainty equivalent of the prospect]
The aim of this section is to explain why, in some valuable applications in economic
theory, convex functions occur that are not specified.
Suppose that someone expects to receive an amount of money a with probability
p ∈ [0, 1] and an amount of money b with probability q = 1−p. Then the expected
amount of money he will receive is pa + qb. If he is risk averse, then he would
prefer to receive an amount of money slightly less than pa + qb for sure. This can
be modeled by means of a utility function U of the person: a prospect of getting the
amount c for sure is equivalent to the prospect above if U (c) = pU (a) + qU (b).
It is reasonable to require that a utility function is increasing: ‘the more the better’.
Moreover, it should be concave (that is −U should be convex), as the person is risk
averse. Figure 5.10 illustrates the modeling of risk averse behavior by concavity of
the utility function.
Thus risk averse behavior is modeled by a concave function that is not specified.
In economic theory, many problems are modeled by convex functions and analyzed
fruitfully, although these functions are not specified by a formula; one only assumes
these functions to be convex or concave.
Conclusion The property of a function to be convex or concave corresponds
sometimes to desired properties in economic models. In the analysis of these models
these properties are assumed and used in the analysis, without a specification of the
functions by a formula.
5.9 Exercises
4. Show that a convex function f : [0, +∞) → R with f (0) ≤ 0 has the
following property:
f (x) + f (y) ≤ f (x + y)
for all x, y.
5. Prove for a convex function on [a, b] (where a < b) Hadamard’s inequality
$$f\Big(\frac{a+b}{2}\Big) \le \frac{1}{b-a}\int_a^b f(x)\,dx \le \frac{f(a)+f(b)}{2}.$$
$$f\Big(\sum_{i=1}^{m} \alpha_i x_i\Big) \le \sum_{i=1}^{m} \alpha_i f(x_i)$$
for αᵢ ≥ 0, $\sum_{i=1}^{m}\alpha_i = 1$, xᵢ ∈ X ∀i.
30. Compare the proof you have found in Exercise 29 to the usual proof (‘derivation
from Jensen’s inequality for two points’).
31. Show that the inverse image f M of a convex function f under a linear function
M is again a convex function.
32. Show that taking the inverse image of a convex function under a linear function
does not preserve properness.
33. Show that the epigraph of the inverse image of a convex function under a linear
function is the inverse image of the epigraph of this convex function.
34. Show that the image Mg of a convex function g under a linear function M is
again a convex function.
35. Show that taking the image of a convex function under a linear function does
not preserve properness.
36. *Show that the image Mg of a convex function g : Rn → R under M can be
characterized as the function Mg : Rm → R for which the strict epigraph is
the image of the strict epigraph of g under the mapping Rn × R → Rm × R :
(x, ρ) → (Mx, ρ).
37. *Show that the five binary operations in Definition 5.6.1 are well-defined. For
the pointwise maximum, this is done in the main text.
38. Prove the following geometric interpretation for the binary operation infimal
convolution of convex functions: this amounts to taking the Minkowski sum of
the strict epigraphs.
39. *Repeat this construction, but now for the triplet (Δ_X, Δ_ℝ, Δ_ℝ) and show that
this gives the pointwise maximum.
40. Repeat this construction again, but now for the triplets (+_X, +_ℝ, +_ℝ) and
(+_X, +_ℝ, Δ_ℝ) and show that this gives the convex hull of the pointwise
minimum and the infimal convolution respectively.
41. Construct with the conification method Kelley’s sum by a suitable triplet.
42. Construct with the conification method formulas for the three nameless binary
operations on convex functions. Try them for some simple concrete convex
functions on R.
43. Show that the recession cone of a closed proper convex function is well-defined.
That is, show that all nonempty sublevel sets of a convex function have the same
recession cone.
44. Give a homogenization definition of the recession cone of a convex function.
Chapter 6
Convex Functions: Dual Description
Abstract
• Why. The phenomenon duality for a proper closed convex function, the possibil-
ity to describe it from the outside of its epigraph, by graphs of affine (constant
plus linear) functions, has to be investigated. This has to be done for its own sake
and as a preparation for the duality theory of convex optimization problems. An
illustration of the power of duality is the following task, which is challenging
without duality but easy if you use duality: prove the convexity of the main
function from geometric programming, ln(ex1 + · · · + exn ), a smooth function
that approximates the non-smooth function max(x1 , . . . , xn ).
• What. The duality for convex functions can be formulated efficiently by the
property of a duality operator on convex functions that have the two nice
properties (proper and closed), the conjugate function operator f → f ∗ : this is
an involution. For each one of the eight binary operations on convex functions,
one has a rule of the type (f ∘ g)* = f* • g* where • is another one of the
eight binary operations on convex functions. Again, homogenization generates a
unified proof for these eight rules. This requires the construction of the conjugate
function operator by means of a duality operator for convex cones (the polar cone
operator). There is again a technical complication due to the fact that the rules
only hold if all convex functions involved, f, g, f ∘ g and f* • g*, have the two
nice properties.
Even more important, for applications, is that one has, for the two binary
operations + and max on convex functions, a rule for subdifferentials: the
theorem of Moreau-Rockafellar ∂(f1 +f2 )(x) = ∂f1 (x)+∂f2 (x) and the theorem
of Dubovitskii-Milyutin ∂ max(f1 , f2 )(x) = ∂f1 (x)co ∪ ∂f2 (x) in the difficult
case f1 (x) = f2 (x). Homogenization generates a unified proof for these two
rules. Again, these rules only hold under suitable assumptions.
Again, there is no need to prove something new: all duality results for convex
functions follow from properties of convex sets (their epigraphs) or of convex
cones (their homogenizations).
Road Map
1. Figure 6.1, Definition 6.2.3, (conjugate function).
2. Figure 6.2 (dual norm is example of either conjugate function or duality operator
by homogenization).
3. Theorem 6.4.2 and structure proof (duality theorem for proper closed convex
functions: conjugate operator is an involution, proof by homogenization).
4. Propositions 6.5.1, 6.5.2 (calculus rules for computation conjugate functions).
5. Take note of the structure of Sect. 6.5 (duality between convex sets and sublinear
functions; this is a preparation for the proof of calculus rules for subdifferentials).
6. Definitions 6.7.1, 6.7.2, Propositions 6.7.8–6.7.11 (subdifferential convex func-
tion at a point, nonemptiness and compactness, at points of differentiability,
Fenchel-Moreau, Dubovitskii-Milyutin, subdifferential norm).
6.1 *Motivation
The dual description of a convex function is fundamental and turns up often, for
example in duality theory for convex optimization. Here we mention a striking
example of the power of using the dual description of a convex function. We give an
example of a function for which it is quite involved to verify convexity by means of
the second order criterion. However, this will be surprisingly simple if you use its
dual description, as we will see at the end of this chapter.
Consider the function f : Rn → R given implicitly by
$$e^y = e^{x_1} + \cdots + e^{x_n}.$$
The aim of this section is to define the duality operator for proper closed convex
functions—the conjugate function operator.
Example 6.2.1 Figure 6.1 illustrates the definition of the conjugate function.
The graph of a convex function f : R → R is drawn. Its primal (‘inner’)
description is its epigraph. Its dual (‘outer’: outside the epigraph) description is the
[Fig. 6.1: a line y = ax − b lying under or on the graph of f; the vertical intercepts −b and −f*(a) are marked]
collection of lines in the plane that lie entirely under or on the graph. Each one of
these lines can be described by an equation y = ax − b. It turns out that the set
of pairs (a, b) that one gets is precisely the epigraph of a convex function f ∗ . This
function f ∗ is the dual description of the convex function f , called the conjugate
function of f . Figure 6.1 shows that f* is given by the following formula:
$$f^*(a) = \sup_x\,(ax - f(x)).$$
Indeed, the condition that the line y = ax − b lies entirely under or on the graph
of f means that
$$ax - b \le f(x)\quad \forall x,$$
that is,
$$ax - f(x) \le b\quad \forall x,$$
that is,
$$\sup_x\,(ax - f(x)) \le b.$$
This shows that the set of pairs (a, b) for which the line y = ax − b lies entirely
under or on the graph of f is indeed the epigraph of the function with recipe a ↦
sup_x (ax − f(x)).
The formula above for f ∗ implies that f ∗ is convex, as for each x the function
a → ax − f (x) is affine and so convex, and the pointwise supremum of a collection
of convex functions is again convex.
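The formula for f* invites numerical experimentation. Here is a minimal Python sketch (an illustrative addition; the helper name is ours, and it assumes the supremum is attained within a bounded grid) computing the conjugate of f(x) = x², whose conjugate is known to be a²/4.

```python
# Illustrative sketch (not from the book): approximate f*(a) = sup_x (a x - f(x))
# by a maximum over a bounded grid of x values.
import numpy as np

def conjugate_on_grid(f_vals, xs, a_grid):
    # Only trustworthy for slopes a whose maximizer lies inside the grid.
    return np.array([np.max(a * xs - f_vals) for a in a_grid])

xs = np.linspace(-5, 5, 2001)
a_grid = np.linspace(-3, 3, 13)
fstar = conjugate_on_grid(xs**2, xs, a_grid)          # f(x) = x^2
assert np.allclose(fstar, a_grid**2 / 4, atol=1e-4)   # known: (x^2)* = a^2/4
print(fstar)
```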
Example 6.2.2 (Comparison of Conjugate Function and Derivative) Let f : ℝ →
ℝ be a differentiable convex function; assume for simplicity that f′ : ℝ → ℝ is
a bijection. Then both the functions f* and f′ give descriptions of the set of the
tangent lines to the graph of f. In the case of the conjugate function f*, the set is
described as a family (L_a)_{a∈ℝ} indexed by the slope of the tangent line: L_a is the
graph of the function with recipe x ↦ ax − f*(a). In the case of the derivative f′,
the set is described as a family (M_b)_{b∈ℝ} indexed by the x-coordinate b of the point of
tangency of the tangent line and the graph of f: M_b is the graph of the function with
recipe x ↦ f(b) + f′(b)(x − b). The description by means of f* has the advantage
over the description by f′ that if f is convex but not differentiable, this description
includes all supporting lines at points of nondifferentiability of f. These arguments
will be generalized in Proposition 6.7.5 to arbitrary proper convex functions: then
one gets a relation between the conjugate function and the subdifferential function,
which is the analogue of the derivative for convex functions.
Now we give the formal definition of the duality operator for convex functions.
Definition 6.2.3 Let a proper convex function f be given. That is, let a nonempty
convex set A ⊆ X = Rn and a convex function f : A → R be given. Then the
conjugate function of f is the convex function f* on X given by
$$f^*(y) = \sup_{x \in A}\,(x \cdot y - f(x)).$$
[Fig. 6.2: the dual norm as an example of the conjugate function; the label −‖·‖* marks the vertical intercept]
The aim of this section is to define the conjugate function operator by homoge-
nization. This alternative definition will make it easier to establish properties of the
conjugate function operator: they can be derived from similar properties of the polar
cone operator.
Here is a detail that needs some attention before we start: the conification c(f ) of
a convex function f on X = Rn is contained in the space X × R × R. For defining
the polar cone (c(f ))◦ , we have to define a suitable nondegenerate bilinear mapping
on X × ℝ × ℝ. We take the following nondegenerate bilinear mapping b:
$$b((x, \alpha, \alpha'), (y, \beta, \beta')) = x \cdot y - \alpha\beta' - \alpha'\beta$$
for all (x, α, α′), (y, β, β′) ∈ X × ℝ × ℝ. Note the minus signs and the transposition!
The following result gives a nice property of a convex function: its dual object is
again a convex function.
Proposition 6.3.1 Let X = Rn .
1. A convex cone C ⊆ X × R × R is a conification of a convex function f on X iff
R₊(0_X, 1, 0) ⊆ cl(C) ⊆ X × ℝ × ℝ₊.
2. If C is a conification of a convex function on X, then the polar cone C° is also
a conification of a convex function on X.
Proof The first statement follows by definition of conification of a convex set. The
second statement follows from the fact that the convex cones R+ (0X , 1, 0) and X ×
R×R+ are each other’s polar cone with respect to the bilinear mapping b. Therefore
we get that
R+ (0X , 1, 0) ⊆ cl(C) ⊆ X × R × R+
implies
$$(X \times \mathbb{R} \times \mathbb{R}_+)^\circ \subseteq C^\circ \subseteq (\mathbb{R}_+(0_X, 1, 0))^\circ,$$
and so it gives
X × R × R+ ⊇ C ◦ ⊇ R+ (0X , 1, 0),
as required.
Now we are ready to construct the conjugate function operator by the homoge-
nization method.
Proposition 6.3.2 The duality operator on convex functions that one gets by the
homogenization method (first conify, then apply the polar cone operator, finally
deconify) is equal to the conjugate function operator.
Indeed, by the previous proposition, the duality operator by the homogenization
method gives, when applied to an ‘old’ convex function f on X, a ‘new’ convex
function on X. A calculation, which we do not display, shows that the new convex
function is f ∗ , the conjugate function of f .
This is yet another example where a formula from convex analysis—here
the defining formula for the conjugate function operator—is generated by the
homogenization method.
Conclusion The duality operator for proper closed convex functions—the conju-
gate function operator—has been constructed by the homogenization method.
The aim of this section is to give the duality theorem for convex functions.
Definition 6.4.1 The biconjugate function f ∗∗ of a proper convex function f on
X = Rn is the conjugate function of the conjugate function of f ,
f ∗∗ = (f ∗ )∗ .
Theorem 6.4.2 (Duality Theorem for Convex Functions) A closed proper convex
function f on X = Rn is equal to its biconjugate function f ∗∗ ,
f ∗∗ = f.
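The duality theorem can be observed numerically. Here is a small Python sketch (an illustrative addition, not from the text) checking f** = f for the closed proper convex function f(x) = |x|, whose conjugate is the indicator of [−1, 1]; large grid values stand in for +∞.

```python
# Illustrative sketch (not from the book): numerical check of f** = f.
import numpy as np

xs  = np.linspace(-4, 4, 1601)
as_ = np.linspace(-4, 4, 1601)

f = np.abs(xs)
fstar = np.array([np.max(a * xs - f) for a in as_])      # ~0 on [-1,1], large outside
fstarstar = np.array([np.max(x * as_ - fstar) for x in xs])

assert np.max(np.abs(fstarstar - f)) < 1e-2
print("f** = f confirmed on the grid for f(x) = |x|")
```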
The aim of this section is to present calculus rules to compute conjugate functions.
We already have given one rule above: (f ∗ )∗ = f if f is proper and closed. Now
we will also give rules for expressing (f ◦ g)∗ in f ∗ and g ∗ where ◦ is a binary
operation for convex functions (sum, maximum, convex hull of the minimum, or
infimal convolution) and for expressing (Mf )∗ and (f N )∗ in f ∗ for linear functions
M and N .
Just as we did for convex sets, we will halve the number of formulas by the
introduction of a shorthand notation. We will write, for two proper convex functions
f, f̃ : X = ℝⁿ → ℝ that are in duality, in the sense that f̃ = f*, as follows:
$$f \stackrel{d}{\leftrightarrow} \tilde f.$$
Then we will say that (f, f˜) is a pair in duality. Note that here f need not be closed
(that is, lower-semicontinuous).
Here are the promised calculus rules for conjugate functions.
Proposition 6.5.1 (Calculus Rules for the Conjugate Function Operator) Let
both (f, f̃) and (g, g̃) be pairs of proper convex functions on X = ℝⁿ in duality,
$$f \stackrel{d}{\leftrightarrow} \tilde f,\qquad g \stackrel{d}{\leftrightarrow} \tilde g.$$
Then
$$f + g \stackrel{d}{\leftrightarrow} \tilde f \oplus \tilde g$$
and
$$f \vee g \stackrel{d}{\leftrightarrow} \tilde f \;\mathrm{co}{\wedge}\; \tilde g,$$
provided, for each of these two rules, that the outcomes of the binary operations
involved are proper.
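The first rule can be checked numerically. The following Python sketch (an illustrative addition; the conjugates of f and g are entered in their known closed forms, and all suprema/infima are taken over bounded grids) verifies (f + g)* = f* ⊕ g* for f(x) = x² and g(x) = (x − 1)².

```python
# Illustrative sketch (not from the book): check (f + g)* = f* (+) g* on grids.
import numpy as np

xs  = np.linspace(-8, 8, 3201)
as_ = np.linspace(-2, 2, 81)

def conj(h_vals, a):                      # h*(a) = sup_x (a x - h(x)) on the grid
    return np.max(a * xs - h_vals)

f, g = xs**2, (xs - 1.0)**2
lhs = np.array([conj(f + g, a) for a in as_])             # (f + g)*

fstar = lambda b: b**2 / 4.0                               # known conjugates
gstar = lambda b: b + b**2 / 4.0
bs = np.linspace(-10, 10, 4001)
rhs = np.array([np.min(fstar(bs) + gstar(a - bs)) for a in as_])  # (f* (+) g*)(a)

assert np.max(np.abs(lhs - rhs)) < 1e-3
print("(f+g)* = f* (+) g* confirmed numerically")
```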
The proofs of these results in a traditional approach are relatively involved.
However, we can get a unified proof by the homogenization method. Indeed,
these results follow immediately from the following convex calculus rules and the
definition of the binary operations by the homogenization method (recall that this
involves the special linear functions +_Y and Δ_Y).
Proposition 6.5.2 Let M ∈ ℝ^{m×n} and let (g, g̃) be a pair of proper convex functions on ℝⁿ
in duality,
$$g \stackrel{d}{\leftrightarrow} \tilde g.$$
Then
$$Mg \stackrel{d}{\leftrightarrow} \tilde g M.$$
The aim of this section is to prepare for the non-differentiable analogue of the
derivative of a differentiable function—the subdifferential of a convex function f
at a point x, a concept that we need for optimization.
Example 6.6.1 (Subdifferential) Figure 6.3 illustrates the subdifferential for func-
tions on X = Rn for n = 1.
Then the convex cone C is called a homogenization of the sublinear function p. The
homogenizations of p are the cones C for which
$$\mathrm{epi}_s(p) \subseteq C \subseteq \mathrm{epi}(p).$$
Now we define the duality operators for proper sublinear functions and for
general nonempty convex sets.
Definition 6.6.4 The support function of a nonempty convex set A ⊆ X = Rn is
the proper closed sublinear function s(A) : X → R ∪ {+∞} given by
y → sup{x · y | x ∈ A}.
The dual object of a proper sublinear function p on X is the convex set
$$\partial p = \{y \in X \mid x \cdot y \le p(x)\ \forall x \in X\}.$$
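For a polytope A = conv(V), the supremum in the definition of the support function is attained at a vertex, so s(A) can be evaluated by a finite maximum. A minimal Python sketch (an illustrative addition; the example set is ours):

```python
# Illustrative sketch (not from the book): support function of a polytope
# A = conv(V): a linear function attains its supremum over conv(V) at a vertex.
import numpy as np

def support_function(vertices, y):
    return max(np.dot(x, y) for x in vertices)

square = [np.array(v) for v in [(1, 1), (1, -1), (-1, 1), (-1, -1)]]
print(support_function(square, np.array([2.0, 3.0])))   # 5.0 = |2| + |3|
# The support function of this max-norm unit ball is the l1-norm, as expected.
```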
These two duality operators are defined by explicit formulas which might look
like they come out of nowhere and the duality theorem 6.6.8 might look like a
new phenomenon that requires a new proof. However, one can also define these
operators, equivalently, in a systematic way: by homogenization. This gives the
same duality operators and then the duality theorem 6.6.8 is seen to be an immediate
consequence of the duality theorem for convex cones. Indeed, let p be a proper
sublinear function on X = Rn . Choose a homogenization C of p. This is a convex
cone in X × R, for which
Apply the polar cone operator with respect to the bilinear mapping b on X × ℝ given
by
$$b((x, \alpha), (y, \beta)) = x \cdot y - \alpha\beta.$$
This gives the polar cone C°.
Now use that the convex cones R+ (0X , 1) and X × R+ are each other’s dual with
respect to b, and that the convex cones R(0X , 1) and X × {0} are also each other’s
dual with respect to b. It follows that we get
X × R+ ⊇ C ◦ , X × {0} ⊇ C ◦ .
That is,
C ◦ ⊆ X × R+ , C ◦ ⊆ X × {0}.
for the operations +, ∨ on sublinear functions. Therefore we only display the four
duality rules for sublinear functions.
Proposition 6.6.10 (Calculus Rules for the Sublinear Functions) Let p, q be
proper closed sublinear functions on X = Rn . Then the following formulas hold.
1. ∂(p + q) = cl(∂p + ∂q),
2. ∂(p ∨ q) = cl((∂p) co ∪ (∂q)),
3. ∂(p ⊕ q) = ∂p ∩ ∂q,
4. ∂(p # q) = ∂p # ∂q,
provided, for each one of these rules, that the outcomes of the two binary operations
involved are proper.
The closure operators can be omitted if the relative interiors of the effective
domains of p, q have a common point,
$$\mathrm{ri}(\mathrm{dom}\, p) \cap \mathrm{ri}(\mathrm{dom}\, q) \ne \emptyset.$$
We emphasize that for applications in convex optimization, we need only the first
two rules.
We do not display the details of the proofs, by homogenization. These are similar
to the proofs of the corresponding rules for other convex objects than sublinear
functions, such as convex sets containing the origin or proper convex functions.
Conversely, one can give rules for the support function of ‘new convex sets’ in
terms of old ones. We do not need these rules so we will not display them.
Conclusion The dual description of a proper sublinear function is a general
nonempty convex set. This will be needed for the definition of the ‘surrogate’ for
the derivative of a convex function in a point of non-differentiability.
The aim of this section is to present the basics for subgradients and subdifferentials
of convex functions at a point. These concepts play a role in optimization. For
convex functions they are the analogue of the derivative.
Here is the usual definition of the subdifferential of a proper convex function at
a point.
Definition 6.7.1 A subgradient of a proper convex function f on X = ℝⁿ at a point
x from its effective domain dom(f) is a vector d ∈ X for which
$$f(x') \ge f(x) + d \cdot (x' - x)\quad \forall x' \in X.$$
In words, if an affine function on X has the same value at x as f and its graph lies
nowhere above the graph of f , then its slope is called a subgradient of f at x.
A subgradient can be characterized by means of the conjugate function: d ∈ ∂f(x)
iff f(x) + f*(d) = d · x. This result follows from the definitions. It often allows to
compute the subdifferential ∂f(x) if the conjugate function f* is known.
The subdifferential of a proper convex function at a point can be expressed in
terms of directional derivatives as follows.
Often, one can calculate subdifferentials ∂f (x) of a convex function f at a point
x, by means of the following result.
Definition 6.7.6 The directional derivative f′(x; v) ∈ ℝ of a proper convex
function f : X = ℝⁿ → ℝ at a point x ∈ dom(f) in direction v ∈ X is the
limit
$$f'(x; v) = \lim_{t \downarrow 0} \frac{f(x + tv) - f(x)}{t}.$$
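Definition 6.7.6 can be tried out numerically: the one-sided difference quotients converge as t ↓ 0. A small Python sketch (an illustrative addition; the example function is a choice of ours, not from the text):

```python
# Illustrative sketch (not from the book): approximate f'(x; v) by one-sided
# difference quotients with t decreasing to 0.
import numpy as np

def directional_derivative(f, x, v, ts=(1e-2, 1e-4, 1e-6)):
    x = np.asarray(x, dtype=float); v = np.asarray(v, dtype=float)
    return [(f(x + t * v) - f(x)) / t for t in ts]

f = lambda z: np.abs(z[0]) + z[1]**2          # convex, kink along x1 = 0
print(directional_derivative(f, (0.0, 1.0), ( 1.0, 0.0)))  # all 1.0
print(directional_derivative(f, (0.0, 1.0), (-1.0, 0.0)))  # also 1.0:
# f'(x; v) = |v1| here, a sublinear but not linear function of v,
# which is exactly the non-differentiability at the kink.
```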
Proposition 6.7.7 Let f be a proper convex function on X = ℝⁿ and let x ∈ dom(f).
Then
$$\partial f(x) = \{d \in X \mid d \cdot v \le f'(x; v)\ \forall v \in X\}.$$
The proof is easy: it suffices to prove this result for the case of a function of one
real variable, by restriction of f to lines through the point x. The proof in this case
follows readily from the definitions.
In passing, we mention that this property of convex functions is no exception to
the rule that all properties of convex functions can be proved by applying a property
of a convex set or a convex cone. It is a consequence of a property of the tangent
cone TA (x) = cl[R+ (A − x)] of a convex set A at a point x ∈ A, but we do not
consider the concept of tangent cone.
The meaning of Proposition 6.7.7 is twofold. First it gives a systematic definition,
by homogenization, of the subdifferential of a proper convex function f at a point
x of its effective domain. Second it gives a method that allows to calculate the
subdifferential of a convex function at a point in many examples.
Concerning the first point. One takes the sublinear function f′(x; ·) given by the
directional derivatives of f at x,
$$v \mapsto f'(x; v).$$
Then one takes the dual description of this sublinear function, the convex set
∂(f′(x; ·)). Proposition 6.7.7 shows that this definition is equivalent to the definition
given above.
Concerning the second point. Proposition 6.7.7 reduces the task to calculate
a subdifferential of a convex function at a point to the task to calculate, using
differential calculus, one-sided derivatives of functions of one variable at a point.
This systematic definition of the subdifferential of a convex function at a point
makes it possible to give straightforward proofs—by the homogenization method—
of all its properties. Here are some of these properties.
Proposition 6.7.8 The subdifferential of a proper convex function at a relative
interior point of the effective domain is a nonempty compact convex set.
The subdifferential is, at a point of differentiability, essentially the derivative.
Proposition 6.7.9 If f : X = ℝⁿ → ℝ is a proper convex function and x ∈
ri(dom(f)), then f is differentiable in x iff ∂f(x) consists of one element. Then this
element is f′(x), the derivative of f at x.
Moreover, we get the following calculus rules for the subdifferential of a proper
convex function at a point of its effective domain.
Proposition 6.7.10 (Calculus Rules for Subdifferential Convex Function) Let
f, g be two proper convex functions on X = Rn . Then the following formulas hold
true:
1. ∂(f ∨ g)(x) = cl(∂f (x)co ∪ ∂g(x)) if f (x) = g(x)
2. ∂(f + g)(x) = cl(∂f (x) + ∂g(x)).
provided, for each of these formulas, that the outcomes of the two binary
operations involved are proper.
The formulas of Proposition 6.7.10 are celebrated theorems. The first one is
called the theorem of Dubovitskii-Milyutin (note that the rule for ∂(f ∨ g)(x) in
the case f(x) ≠ g(x) is simply that it is equal to ∂f(x) if f(x) > g(x) and that it is
equal to ∂g(x) if f(x) < g(x)), the second one the theorem of Moreau-Rockafellar.
Usually it requires considerable effort to prove them; by the systematic approach—
the homogenization method—you get them essentially for free: they are immediate
consequences of the corresponding rules for sublinear functions: the first two
rules from Proposition 6.6.10.
Proposition 6.7.11 The subdifferential of a norm N on X = Rn at the origin is
equal to the unit ball of the dual norm N ∗ ,
{x ∈ X | N ∗ (x) ≤ 1}.
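Proposition 6.7.11 can be tested numerically. In the sketch below (an illustrative Python addition; the test is probabilistic, over random sample points), N is the ℓ¹-norm on ℝ², whose dual norm is the maximum norm, so d should be a subgradient of N at 0 exactly when max(|d₁|, |d₂|) ≤ 1.

```python
# Illustrative sketch (not from the book): test the subgradient inequality
# N(x) >= d . x at the origin, for N the l1-norm on R^2.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 2))                 # random test points

def is_subgradient_at_0(d, X):
    return bool(np.all(np.abs(X).sum(axis=1) >= X @ np.asarray(d) - 1e-12))

print(is_subgradient_at_0(( 1.0,  1.0), X))    # True : max-norm equal to 1
print(is_subgradient_at_0(( 0.3, -0.9), X))    # True : max-norm below 1
print(is_subgradient_at_0(( 1.1,  0.0), X))    # False: max-norm above 1
```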
The material in this section can be generated by the homogenization method; the
details are not displayed here.
Conclusion A surrogate for the derivative—the subdifferential—has been defined
which is also defined for points of nondifferentiability of convex functions. This
is done by a formula and equivalently by means of the duality between sublinear
functions and convex sets. The latter definition can be used to derive the calculus
rules for the subdifferential.
One can define the conification of a norm N on X = Rn in two ways. In the first
place, one can take the convex cone in X × R × R that is the conification of N
as a convex function. However, a simpler concept of conification is possible: the
convex cone in X × R that is the strict epigraph of N . These two conifications give
as dual object of N different convex functions. The conification of N as a convex
function gives the indicator function of the standard unit ball of the standard dual norm
N ∗ , which is given by N ∗ (y) = supN (x)=1 x · y for all y ∈ X. That is, this function
takes the value 0 in this ball and the value +∞ outside this ball. The epigraph gives
the standard dual norm N ∗ itself. Each norm is continuous. So if N, Ñ are norms
on X, then Ñ = N ∗ iff N = Ñ ∗ ; then one says that the norms N and Ñ are dual to
each other. One gets four binary operations on norms, the restrictions of the binary
operations on convex functions +, ⊕, ∨, co∧. The binary operations + and ⊕ are
dual to each other, as are the binary operations ∨ and co∧.
The aim of this section is to give an example that illustrates the power of the dual
description of convex functions.
Example 6.9.1 (Proving the Convexity of a Function by Means of Its Conjugate
Function) We consider the problem to verify that the function
$$f(x) = \ln(e^{x_1} + \cdots + e^{x_n})$$
is convex. First we compute, without worrying about rigor, its conjugate function
f*(y) = sup_x (x · y − f(x)). The stationarity conditions of this maximization problem,
y_i = e^{x_i}/(e^{x_1} + ⋯ + e^{x_n}) for 1 ≤ i ≤ n, can only be satisfied if
$$\sum_{i=1}^n y_i = 1,\quad y > 0.$$
For each such y, the following choice of x is a solution of the stationarity conditions:
$$x_i = \ln y_i,\quad 1 \le i \le n.$$
Substitution gives the value
$$\sum_{i=1}^n y_i \ln y_i.$$
So we have computed f*. It is the function g(y) that takes the value $\sum_{i=1}^n y_i \ln y_i$
if $\sum_{i=1}^n y_i = 1$, y ≥ 0 (with, for simplicity of notation, the convention 0 ln 0 = 0—
note here that lim_{t↓0} t ln t = 0) and that takes the value +∞ otherwise. If we would
take the conjugate function of g we get f back, by the duality theorem.
After this ‘secret and private preparation’, we can start the official proof that the
function f is convex. We define out of the blue the following function g : Rn → R:
$$g(y) = \sum_{i=1}^n y_i \ln y_i \quad \text{if } \sum_{i=1}^n y_i = 1,\ y > 0,$$
and which takes the value +∞ otherwise. It is our good right to consider any
function we like. There is no need to explain in a proof how we have come up
with our ideas to make the proof work. Now one sees straightaway that g is convex.
Just calculate the second derivative of η ↦ η ln η—it is η⁻¹ and so it is positive if
η > 0. Hence η ln η is convex and so g is convex.
Now we compute the conjugate function of g: for each x ∈ Rn we have by
definition that g ∗ (x) is the optimal value of the problem to maximize the function
y → x · y − g(y) subject to the constraints
$$\sum_{i=1}^n y_i = 1,\quad y \ge 0.$$
Writing down the stationary conditions of this problem and solving them gives
readily that the value of this problem is f (x). It follows that g ∗ = f . Hence f is a
convex function, as each conjugate function is convex.
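The computation can be confirmed numerically: the maximizer of x · y − g(y) over the simplex is y_i = e^{x_i}/∑_j e^{x_j}, and substituting it reproduces f(x). A small Python sketch (an illustrative addition, not from the text; the test point x is arbitrary):

```python
# Illustrative sketch (not from the book): confirm that g* equals log-sum-exp.
import numpy as np

def logsumexp(x):
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))     # stable ln(e^x1 + ... + e^xn)

def g_star(x):
    y = np.exp(x - logsumexp(x))                 # softmax: maximizer of x.y - g(y)
    return float(np.dot(x, y) - np.sum(y * np.log(y)))

x = np.array([0.3, -1.2, 2.0])
print(g_star(x), logsumexp(x))                   # the two values agree: g* = f
```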
Conclusion To prove that the central function from geometric programming is
convex, one can derive non-rigorously a formula for its conjugate function. This
formula is easily seen to define a convex function. Therefore, its conjugate function
is convex. It can be shown readily in a rigorous way that this convex function is the
original given function.
6.10 Exercises
These inequalities show that the function ln(∑_i e^{x_i}) is a good smooth approxi-
mation of the useful but non-differentiable function max_i{x_i}.
2. *
(a) Show that the condition that a line y = ax − b lies entirely under the graph
of a given convex function f : ℝ → ℝ can be written as follows:
$$ax - f(x) \le b \quad \forall x \in \mathbb{R}.$$
(c) Conclude that the pairs (a, b) form the epigraph of the function a →
f ∗ (a) = supx (ax − f (x)).
(d) Show that the conjugate function f ∗ is convex, as it is written as the
supremum of convex functions lx : a → ax − f (x) where x runs over
R.
3. *Check that the conjugate function f ∗ of a proper closed convex function f
is again a proper closed convex function on X. Show that its effective domain
consists of the slopes a of all affine functions a · x − b that minorize f .
4. *Compute the conjugates of the following convex functions of one variable:
(a) ax² + bx + c (for a ≥ 0),
(b) eˣ,
(c) −ln x, x > 0,
(d) x ln x, x > 0,
(e) xʳ, x ≥ 0 (for r ≥ 1),
(f) −xʳ, x > 0 (for 0 ≤ r ≤ 1),
(g) xʳ, x > 0 (for r ≤ 0),
(h) |x| + |x − a|,
(i) ||x| − 1|,
(j) √(1 + x²),
(k) −ln(1 − eˣ),
(l) ln(1 + eˣ).
5. Compute the conjugate of a convex quadratic function of n variables, that is for
f(x) = xᵀCx + cᵀx + γ where C is a positive semidefinite n × n-matrix,
c ∈ ℝⁿ and γ ∈ ℝ.
6. *Calculate the conjugate function of the convex function f : Rn → R : x →
max{x1 , . . . , xn }.
7. Show that ‖·‖* is a norm: directly from the defining formula and from the
definition by the homogenization method.
f ≤ g ⇒ f ∗ ≥ g∗.
26. Prove the following expression for the subdifferential of a proper convex
function f : X = ℝⁿ → ℝ at a point x of its effective domain in terms of the
directional derivatives of f at x:
$$\sup\{d \cdot v \mid d \in \partial f(x)\} = f'(x; v) \quad \forall v \in X.$$
27. *Let X = ℝⁿ, let f be a proper convex function on X, let x̄ ∈ dom(f) and let
η ∈ X. Write A = epi(f) and v = (x̄, f(x̄)). Show that the closed halfspace
in X × ℝ having v on its bounding hyperplane and for which the vector (η, −1)
is an out of the halfspace pointing normal—that is, the halfspace η · (x − x̄) −
(ρ − f(x̄)) ≤ 0—supports A at v iff η ∈ ∂f(x̄).
28. Derive the theorems of Moreau-Rockafellar and Dubovitskii-Milyutin from
the corresponding rules for sublinear functions: the first two rules from
Proposition 6.6.10.
29. Give the rule for ∂(f ∨ g)(x) in the case f(x) ≠ g(x).
30. *Compute the subdifferential of the Euclidean norm on the plane—‖·‖ : ℝ² →
[0, ∞) given by the recipe x = (x₁, x₂) ↦ (x₁² + x₂²)^{1/2}—at each point.
31. Find the subdifferential ∂f (x, y) of the function
f(x, y) = max{x, 2y} + 2√(x² + y²)
at the points (x, y) = (3, 2); (0, −1); (2, 1); (0, 0).
32. Find the subdifferential ∂f (x, y) for the function
f(x, y) = max{2x, y} + √(x² + y²)
40. Prove all results mentioned in Sect. 6.8, and determine which choice (++), (+
−), (−+), (−−) gives rise to which binary operation on norms +, ⊕, ∨, co∧.
41. *Formulate all results on norms N on X = Rn that are mentioned in Sect. 6.8,
in terms of their unit balls {x ∈ X | N(x) ≤ 1}. In particular, determine how the
binary operations on norms correspond to the binary operations on unit balls of
norms.
42. *Write down the stationarity conditions of the optimization problem y ↦ x ·
y − g(y), where g(y) = $\sum_{i=1}^n y_i \ln y_i$ for all y > 0, subject to the constraints
$$\sum_{i=1}^n y_i = 1,\quad y > 0,$$
and solve them. Compute the optimal value of this problem as a function of x.
Chapter 7
Convex Problems: The Main Questions
Abstract
• Why. Convex optimization includes linear programming, quadratic program-
ming, semidefinite programming, least squares problems and shortest distance
problems. It is necessary to have theoretical tools to solve these problems.
Finding optimal solutions exactly or by means of a law that characterizes them,
is possible for a small minority of problems, but this minority contains very
interesting problems. Therefore, most problems have to be solved numerically,
by an algorithm. An analysis of the performance of algorithms for convex
optimization requires techniques that are different from those presented in this
book; therefore such an analysis falls outside the scope of this book.
• What. We give the answers to the main questions of convex optimization (except
the last one), and in this chapter these answers are compared to those for smooth
optimization:
1. What are the conditions for optimality? A convex function f assumes its
infimum at a point x̂ if and only if 0 ∈ ∂f(x̂)—Fermat's theorem in the convex
case. Now assume, moreover, that a convex perturbation has been chosen: a
convex function of two vector variables F(x, y) for which f(x) = F(x, 0) ∀x;
here y ∈ Y = ℝᵐ represents changes in some data of the problem to minimize
f (such as prices or budgets). Then f assumes its infimum at x̂ if and only
if f(x̂) = L(x̂, η, η₀) and 0 ∈ ∂L(x̂, η, η₀) for some selection of multipliers
(η, η₀) (provided the bad case η₀ = 0 is excluded)—Lagrange's multiplier
method in the convex case. Here L is the Lagrange function of the perturbation
function F , which will be defined in this chapter. The conditions of optimality
play the central role in the complete analysis of optimization problems. The
complete analysis can always be done in one and the same systematic four
step method. This is illustrated for some examples of optimization problems.
2. How to establish in advance the existence of optimal solutions? A suitable
compactness assumption implies that there exists at least one optimal solution.
3. How to establish in advance the uniqueness of the optimal solution? A suitable
strict convexity assumption implies that there is at most one solution.
4. What is the sensitivity of the optimal value S(y) for small changes y in the
data of the problem? The multiplier η is an element of the subdifferential
∂S(0) provided η0 = 1, so then it is a measure for the sensitivity.
5. Can one find in a guaranteed efficient way approximations for the optimal
solution, by means of an algorithm? This is possible for almost all convex
problems.
Road Map
1. Section 7.1 (convex optimization problem, types: LP, QP, SDP, least squares,
shortest distance).
2. Theorems 7.2.1–7.2.3, Definition 7.2.5, Theorems 7.2.6, 7.2.7, Example 7.2.3
(existence and uniqueness for convex problems).
3. Section 7.3 (definitions leading up to the concept of smooth local minimum).
4. Theorem 7.4.1, Example 7.4.4 (Fermat’s theorem (smooth case)).
5. Figure 7.2, Proposition 7.5.1 (no need for local minima in convex optimization).
6. Figure 7.3, Theorem 7.6.1, Example 7.6.5 (Fermat’s theorem (convex case)).
7. Section 7.7 (perturbation of a problem).
8. Theorem 7.8.1, Example 7.8.3 (Lagrange multipliers (smooth case)).
9. Figure 7.7, Proposition 7.9.1, Definitions 7.9.2–7.9.4, Theorem 7.9.5, Defini-
tion 8.2.2 (Lagrange multipliers (convex case)).
10. Figure 7.8 (generalized optimal solutions always exist).
11. Section 7.11 (list advantages convex optimization).
min a · x, x ∈ L, x ∈ C,
• least squares problems (LS): this is the problem to minimize the length of the
error vector Ax − b where A is a given m × n-matrix of rank m, b is a given
column vector in Rm and x runs over X = Rn .
• shortest distance problems: this is the problem to find the point on a given convex
set A ⊆ X that has the shortest distance to a given point p ∈ X.
We will emphasize throughout this chapter the parallel between convex and
smooth optimization. With the latter we mean, roughly speaking, optimization
problems that can have equality constraints and where all functions that describe
the problem are continuously differentiable—that is, all their partial derivatives
exist and are continuous. A precise definition of smooth optimization will be given
later, when we need it. The main questions are the same for smooth problems as for
convex problems, but the answers are much more satisfying for convex problems.
7.2.1 Existence
7.2.2 Uniqueness
We need a definition.
Definition 7.2.5 A convex set A ⊆ X is strictly convex if its relative boundary
contains no closed line segment of positive length.
Now we are ready to give the promised second uniqueness result.
Theorem 7.2.6 There is at most one optimal solution for the problem to minimize
a linear function l on a strictly convex set A that is not constant on this set.
Proof The set of optimal solutions is a convex set that is a subset of the relative
boundary of A. Therefore, it does not contain any closed line segment of positive
length, as A is strictly convex. It follows that it is either empty or a singleton.
Here is a third class of convex optimization problems for which one has
uniqueness.
Theorem 7.2.7 The problem to find a point on a closed convex set A ⊆ X having
shortest distance to a given point p ∈ X has a unique optimal solution.
There are no uniqueness results for global optima of non-convex optimization
problems.
Here is an illustration of the use of these existence and uniqueness results for convex
optimization.
Example 7.2.8 (The Waist of Three Lines in Space) Once, Dr. John Tyrrell, lecturer
at King’s College London, challenged us PhD students to crack an unsolved
problem: to give a proof for the following phenomenon. If an elastic band is
stretched around three iron wires in space that are mutually disjoint and at least
two of which are not parallel, then the band will always slide to an equilibrium
position, and this position does not depend on how you stretch initially the elastic
band around the wires. In other words, the three lines have a unique waist.
We all did a lot of thinking and calculating but all was in vain. Only many years
later, when I gave a course on convex analysis, did I realize how simple the solution
is. It does not require any calculation.
The problem can be formalized as follows: show that for three given mutually
disjoint lines l₁, l₂, l₃ in space ℝ³—at least two of them should not be parallel—the
problem to minimize the function
$$f(p_1, p_2, p_3) = |p_1 - p_2| + |p_2 - p_3| + |p_3 - p_1|,$$
where the point pᵢ runs over the line lᵢ for i = 1, 2, 3, possesses a unique optimal
solution. This problem is illustrated in Fig. 7.1.
[Fig. 7.1: an elastic band stretched around three mutually disjoint lines l₁, l₂, l₃, touching them at points p₁, p₂, p₃]
This is a convex optimization problem as all three terms of f are convex.
This gives already some information: the function f cannot have any stationary
points (points where all partial derivatives are equal to zero) that are not a
global minimum. In particular, there can be no local minimum that is not a global
minimum.
Existence of an optimal solution follows from Theorem 7.2.2: indeed, choose
qᵢ, rᵢ ∈ lᵢ for i = 1, 2, 3, such that qⱼ ≠ rⱼ for all j, and write
pᵢ(t) = (1 − t)qᵢ + trᵢ for all t ∈ ℝ₊. Then it follows that the three terms of
f(p₁(t), p₂(t), p₃(t)) = |p₁(t) − p₂(t)| + |p₂(t) − p₃(t)| + |p₃(t) − p₁(t)| are
the Euclidean norm of a vector function of the form a + tb and for at least one of the
three b is not the zero vector. This implies that f(p₁(t), p₂(t), p₃(t)) is for large t
nondecreasing, as required.
Uniqueness of an optimal solution follows from Theorem 7.2.3: indeed, the
function f is strictly convex. We check this now. As all three terms of f are convex,
it suffices to show that at least one of them is strictly convex. Let pi (t) = qi + tri
with qi , ri ∈ R3 be a parametric description of line li for i = 1, 2, 3.
Consider the differences
$$p_1(t) - p_2(t),\quad p_2(t) - p_3(t),\quad p_3(t) - p_1(t).$$
None of these three functions takes the value zero as the lines are mutually disjoint
and not all three can be constant, as the lines l1 , l2 , l3 are not all parallel. We assume
that p1 (t) − p2 (t) is not constant, as we may without restricting the generality of the
argument. It follows readily that the term |p1 (t) − p2 (t)| is strictly convex. Hence
the function f is strictly convex.
This settles the ‘waist of three lines in space’ problem.
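The waist can also be computed numerically, which illustrates the uniqueness statement. In the Python sketch below (an illustrative addition; the three lines are example data of ours, chosen to be mutually disjoint and not all parallel, and scipy is assumed available), several starting points lead to the same minimizer.

```python
# Illustrative sketch (not from the book): compute the waist of three lines
# p_i(t_i) = q_i + t_i r_i by minimizing the band length over (t1, t2, t3).
import numpy as np
from scipy.optimize import minimize

q = [np.array([0., 0., 0.]), np.array([1., 0., 1.]), np.array([0., 2., -1.])]
r = [np.array([1., 0., 0.]), np.array([0., 1., 0.]), np.array([1., 1., 1.])]

def band_length(t):
    p = [q[i] + t[i] * r[i] for i in range(3)]
    return (np.linalg.norm(p[0] - p[1]) + np.linalg.norm(p[1] - p[2])
            + np.linalg.norm(p[2] - p[0]))

# Different starting points converge to the same waist: uniqueness in action.
for t0 in [np.zeros(3), np.array([5., -3., 2.]), np.array([-4., 4., -4.])]:
    res = minimize(band_length, t0)
    print(np.round(res.x, 4), round(res.fun, 6))
```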
$$\big(G(x + h) - G(x) - G'(x)(h)\big)/\|h\| \to 0 \quad \text{for } h \to 0.$$
The following result gives a partial explanation why the concept of continuous
differentiability is convenient.
Proposition 7.3.4 A function G = (g₁, . . . , g_m) : U → Y—where X =
ℝⁿ, Y = ℝᵐ, U an open subset of ℝⁿ—is continuously differentiable iff all partial
derivatives ∂gᵢ/∂xⱼ exist and are continuous. Then one has the following formula for the
derivative of G:
$$G'(x) = \Big(\frac{\partial g_i}{\partial x_j}(x)\Big)_{ij}.$$
$$f(x) \ge f(\hat x) \quad \forall x \in V.$$
point x̂, is the kernel of the derivative of G at x̂, {h ∈ X | G′(x̂)(h) = 0}:
$$T_{\hat x}(\{x \in X \mid G(x) = 0\}) = \ker G'(\hat x).$$
This result is an equivalent version of two related central results from differential
calculus: the inverse function theorem and the implicit function theorem. Note in
particular that the tangent cone in Theorem 7.3.8 is a subspace. Therefore, the
tangent cone of the solution set of G(x) = 0 at a solution x for which G ( x ) is
surjective is called tangent space instead of tangent cone.
Now we can understand the sense of the definition of smooth local minimum
above. Indeed, if the graph of a function f is described near a point (x̂, f(x̂)) as the
solution set of a C¹-vector function G with G′((x̂, f(x̂))) surjective, then the graph
of f is near the point (x̂, f(x̂)) smooth in the following sense: it looks like an affine
subspace—
$$(\hat x, f(\hat x)) + T_{(\hat x, f(\hat x))} = (\hat x, f(\hat x)) + \ker G'((\hat x, f(\hat x))).$$
$$f'(\hat x) = 0,$$
This result follows immediately from the definitions of the derivative and the
tangent cone. Alternatively, it is a special case of the tangent space theorem given
above: consider the function with recipe (x, z) → G(x) − z.
Now we are ready to give the proof of Theorem 7.4.1 in the promised style.
7.4 Fermat’s Theorem (Smooth Case) 183
Proof of Fermat’s Theorem in the Smooth Case Assume that x is a point of local
minimum of the function f . To prove the theorem, it suffices by Proposition 7.4.2
to prove the following claim:
(X × 0) + T( x )) (f ) ⊂ X × R.
x ,f (
( x )) + tk dk = (
x , f ( x + tk xk , f (
x ) + tk ρk ) ∈ f .
x ) + tk ρk = f (
f ( x + txk ) ≥ f (
x ),
(X × 0) + T( x )) (f ) ⊂ X × R.
x ,f (
It is convenient that for each smooth or convex optimization that can be analyzed
completely, the analysis can always be presented in the same systematic way, which
we call the four step method. This method proceeds as follows.
1. Modeling (and existence and/or uniqueness). The problem is modeled in a
precise way. Moreover, existence and/or uniqueness of the optimal solution are
proved if this is desirable.
2. Optimality conditions. The optimality conditions are written down for the
problem.
3. Analysis. The optimality conditions are analyzed.
4. Conclusion. The optimal solution(s) is/are given.
Now we give the first illustration of this four step method.
Before we are going to give Fermat’s theorem in the convex case, we explain in this
section why the concept local minimum, which plays a role in Fermat’s theorem in
the smooth case, plays no role in convex optimization.
Proposition 7.5.1 Each local minimum of a proper convex function is also a global
minimum.
Figure 7.2 illustrates this result and its proof.
Proof We argue by contradiction. Assume f is a proper convex function on X and
x̂ is a local minimum but not a global minimum. Then we can pick x̄ such that
f(x̄) < f(x̂). Now consider a point P on the segment with endpoints (x̂, f(x̂))
and (x̄, f(x̄)) close to (x̂, f(x̂)). That is,
$$P = (1 - \alpha)(\hat x, f(\hat x)) + \alpha(\bar x, f(\bar x))$$
with α ∈ (0, 1) sufficiently small. What 'sufficiently small' means will become
clear in a moment. On the one hand, P lies in the epigraph of f, and therefore, by
[Fig. 7.2: a local minimum of a proper convex function is a global minimum; the points x̄, x̂_α, x̂ on the horizontal axis]
the local minimality of x̂, we get that for α > 0 sufficiently small, P is so close
to (x̂, f(x̂)) that the last coordinate of P is ≥ f(x̂). On the other hand, this last
coordinate equals (1−α)f(x̂)+αf(x̄), and—by f(x̄) < f(x̂)—this last coordinate
is < (1 − α)f(x̂) + αf(x̂) = f(x̂). So we get the required contradiction.
In order to obtain Fermat's theorem for convex problems, you should replace the
equation f′(x) = 0 by the inclusion 0 ∈ ∂f(x).
Theorem 7.6.1 (Fermat's Theorem (Convex Case)) Let f : X → (−∞, +∞]
be a proper convex function and let x̂ ∈ dom(f). Then x̂ is an optimal solution of
the convex optimization problem min_x f(x) iff
$$0 \in \partial f(\hat x).$$
This theorem is equivalent to the following proper inclusion:
$$(X \times 0) + T_{(\hat x, f(\hat x))}(\mathrm{epi}(f)) \subset X \times \mathbb{R}.$$
Compare this reformulation of Fermat’s theorem in the convex case to the one
given for Fermat’s theorem in the smooth case in Remark 7.4.3. Note that the
epigraph and the graph of a function f : X → (−∞, +∞] are related: the epigraph
is obtained from the graph by adding at each of its points the vertical ray pointing
upward,
$$\mathrm{epi}(f) = \mathrm{graph}(f) + (\{0_X\} \times \mathbb{R}_+).$$
[Fig. 7.3: Fermat's theorem in the convex case, at the minimizer x̂]
In passing, we mention that here we come across another situation where the
tangent cone can be determined easily.
Proposition 7.6.3 The tangent cone to a convex set A ⊆ X at a point ā ∈ A is the
closure of the cone generated by the set A − ā,
$$T_A(\bar a) = \mathrm{cl}[\mathbb{R}_+(A - \bar a)].$$
We emphasize that Fermat's theorem 7.6.1 in the convex case gives a criterion
for global optimality: that is, the condition 0 ∈ ∂f(x̂) holds if and only if x̂
is an optimal solution for the convex optimization problem min_x f(x). In other
words, it is a necessary and sufficient condition for global optimality. Therefore, the
ideal of the theory of optimization is reached here, in some sense. In comparison,
Fermat’s theorem 7.4.1 in the smooth case gives only a necessary condition for local
optimality.
Now we give some examples.
Example 7.6.4 (Minimization of a Complicated Looking Function) Solve the prob-
lem to minimize
$$f(x, y) = x^2 + 2y^2 - 3x - y + 2\sqrt{x^2 - 4x + y^2 + 4}.$$
1. Modeling and convexity. $f(x, y) = (x\ y)\begin{pmatrix}1 & 0\\ 0 & 2\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix} - 3x - y + 2|(x, y) - (2, 0)| \to \min$,
(x, y) ∈ (ℝ²)ᵀ. The function f(x, y) is strictly convex, as its building pattern
reveals: it is the sum of the strictly convex function $(x\ y)\begin{pmatrix}1 & 0\\ 0 & 2\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix}$
and the two convex functions −3x − y and 2|(x, y) − (2, 0)|.
2. Criterion. Convex Fermat: 0 ∈ ∂f(x, y). There is a unique point of non-
differentiability, (2, 0); in this point, f has subdifferential the sum of the
derivative (x² + 2y² − 3x − y)′ = (2x − 3, 4y − 1) taken in the point (2, 0)—that
is, (1, −1)—and the unit disk centered at the origin scaled by the factor 2. That
is, we have to take the disk with center the origin and radius 2 and translate it
over the vector (1, −1). This gives the disk with center at (1, −1) and radius 2.
3. Analysis. Let us try the point of non differentiability to begin with. It is a special
point. Its subdifferential is a relatively large set so we have a fair chance that it
contains the origin. All other points have a subdifferential consisting of one point
only. It would be unlikely that at a randomly chosen such point the derivative is
zero. Trying to find a point of differentiability that is a solution requires therefore
the solution of the stationarity equations. These are two complicated equations
in two unknowns. Let us hope that we are lucky with the investigation of the
point of non-differentiability. Then we do not have to solve this system. Well, the
criterion for optimality of the point of non-differentiability (2, 0) is: the origin
has to lie inside the disk with center (1, −1) and radius 2. This is illustrated in
7.6 Fermat’s Theorem (Convex Case) 187
−1
Fig. 7.4. This picture shows that this is√indeed the case. To be more precise, the
distance between (1, −1) and (0, 0) is 2, which is indeed smaller than 2.
4. Conclusion. The problem has a unique solution (2, 0).
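The analysis in step 3 can be confirmed numerically. The sketch below (an illustrative Python addition, not from the text) checks the membership criterion |(1, −1)| ≤ 2 and samples f around (2, 0).

```python
# Illustrative sketch (not from the book): confirm 0 in the subdifferential at
# the kink (2, 0) and sample f around that point.
import numpy as np

grad_smooth = np.array([2*2 - 3, 4*0 - 1])     # gradient of the smooth part at (2, 0)
print(np.linalg.norm(grad_smooth) <= 2.0)      # True: 0 lies in the translated disk

f = lambda x, y: x**2 + 2*y**2 - 3*x - y + 2*np.hypot(x - 2, y)
rng = np.random.default_rng(0)
pts = (2, 0) + rng.normal(scale=2.0, size=(10000, 2))
print(all(f(2.0, 0.0) <= f(px, py) + 1e-12 for px, py in pts))  # True: minimum
```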
In the next example, both calculus rules for subdifferentials—Moreau-
Rockafellar and Dubovitskii-Milyutin—are used.
Example 7.6.5 (Application of Moreau-Rockafellar and Dubovitskii-Milyutin)
Minimize the function
$$\sqrt{x^2 + y^2} + |x| + \tfrac{3}{2}x + y^4 + \tfrac{1}{2}y.$$
1. Modeling and existence. min f(x, y) = √(x² + y²) + |x| + (3/2)x + y⁴ + (1/2)y,
(x, y) ∈ ℝ². This problem is convex.
2. Convex Fermat: 0 ∈ ∂f (x, y).
3. Analysis. Again we try the point of nondifferentiability (x, y) = (0, 0). We
have
• ∂(√(x² + y²))(0, 0) = B(0, 1), the disk with radius 1 and center at the origin;
• ∂((3/2)x + y⁴ + (1/2)y)(0, 0) = {((3/2)x + y⁴ + (1/2)y)′(0, 0)} = {(3/2, 1/2)};
• ∂|x|(0, 0) = ∂ max(−x, x)(0, 0), and this is by Dubovitskii-Milyutin equal to
{(−1, 0)} co∪ {(1, 0)}, the closed line segment on the x-axis with endpoints
(−1, 0) and (1, 0).
This gives, by Moreau-Rockafellar, that ∂f(0, 0) is the Minkowski sum of
B(0, 1), the interval [−1, 1] on the x-axis, and the singleton {(3/2, 1/2)}. Then you
have the pleasure of drawing a picture of this set. This reveals that ∂f(0, 0) is
an upright square with sides 2 and with center the point (3/2, 1/2), to which on both
vertical sides halfdisks have been attached. This set contains the origin.
4. Conclusion. The point (0, 0) is an optimal solution.
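Again the conclusion can be confirmed numerically: the Minkowski sum contains the origin iff some s ∈ [−1, 1] brings the point (−3/2 − s, −1/2) into the unit disk. A one-line scan in Python (an illustrative addition):

```python
# Illustrative sketch (not from the book): check 0 in the Minkowski sum
# B(0,1) + [-1,1] x {0} + {(3/2, 1/2)} by scanning the segment parameter s.
import numpy as np

ss = np.linspace(-1.0, 1.0, 2001)
dists = np.hypot(-1.5 - ss, -0.5)
print(dists.min() <= 1.0)   # True: s = -1 gives distance ~0.707, so 0 in the sum
```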
These were numerical examples. Now we give a convincing example of an
application to the facility location problem of Fermat-Weber.
Example 7.6.6 (The Facility Location Problem of Fermat-Weber) A facility has to
be placed for which the average distance to three given points ai , i = 1, 2, 3—that
[Fig. 7.5: the Torricelli point: the three segments to a₁, a₂, a₃ make angles of 120° with each other]
do not lie on one straight line—is minimal. This is also called the Fermat-Torricelli-
Steiner problem. The usual textbook solution is by the theorem of Fermat. The
cases that a Torricelli point does not exist are often omitted or require an awkward
additional analysis. Below we use the convex Fermat theorem, and this gives an
efficient treatment that includes all cases.
1. Modeling and existence: f(x) = (1/3)∑ᵢ₌₁³ |x − aᵢ| → min, x ∈ ℝ²; f(x) → ∞
for |x| → ∞, so by Weierstrass an optimal solution x̂ exists. The
function f is strictly convex so the optimal solution is unique.
2. Convex Fermat: 0 ∈ ∂f(x̂); this subdifferential equals (1/3)∑ᵢ₌₁³ (x̂ − aᵢ)/|x̂ − aᵢ| if x̂ ≠
a₁, a₂, a₃ and it equals {u | |u| ≤ 1} + (aᵢ − aⱼ)/|aᵢ − aⱼ| + (aᵢ − aₖ)/|aᵢ − aₖ| for x̂ = aᵢ, where we
write j, k for the two other indices.
product space. Note that we allow f to take the value +∞. Consider the problem
$$\min f(x),\ x \in X. \qquad (P)$$
Embed it in a family of problems
$$\min f_y(x),\ x \in X, \qquad (P_y)$$
where the index y runs over a finite dimensional inner product space Y, and where
f_y is a function X → (−∞, +∞] for all y ∈ Y, and where f₀ = f. The vector y
represents changes in some of the data of problem (P ). This family of optimization
problems can be described efficiently by just one object, by the function F : X ×
Y → (−∞, +∞] defined by F (x, y) = fy (x) for all x ∈ X, y ∈ Y . The function
F is called the perturbation function and (Py ) is called a perturbed problem.
Here are two examples where perturbations arise naturally.
Example 7.7.1 (Perturbation)
1. Suppose you want to choose a location on a road that is close to a given number
of locations; to be precise the sum of the squares of the distances from this point
to the given points has to be minimized. Consider the numerical example that
there are three given locations, given by points on the line R, two of which are
precisely known, 3 and 5, and one is approximately known, y ≈ 0. Then you
choose X = Y = R and F (x, y) = (x − y)2 + (x − 3)2 + (x − 5)2 for all
x, y ∈ R.
2. Suppose that you want to find the point in the plane R2 that satisfies the inequality
x1 + 3x2 ≤ 1 and that is as close as possible to the point (1, 2). We are interested
in the sensitivity of the solution to a change of the right hand side of the inequality, which
might represent a budget. Then you choose X = R2 , Y = R and F (x, y) =
(x1 − 1)2 + (x2 − 2)2 if x1 + 3x2 − 1 + y ≤ 0 and F (x, y) = +∞ otherwise.
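For the first perturbation example, the optimal value function S can be computed in closed form, and one can observe directly that S is convex in y (this is proved in general in Sect. 7.9). A small Python sketch (an illustrative addition, not from the text):

```python
# Illustrative sketch (not from the book): the optimal value function S(y) for
# F(x, y) = (x - y)^2 + (x - 3)^2 + (x - 5)^2. Stationarity in x gives the
# minimizer x*(y) = (y + 8) / 3 (the mean of y, 3 and 5).
import numpy as np

def S(y):
    xstar = (y + 8.0) / 3.0
    return (xstar - y)**2 + (xstar - 3.0)**2 + (xstar - 5.0)**2

ys = np.linspace(-2, 2, 9)
print(np.array([S(y) for y in ys]).round(3))
# Midpoint convexity on a sample: S((a+b)/2) <= (S(a)+S(b))/2.
a, b = -2.0, 2.0
print(S((a + b) / 2) <= (S(a) + S(b)) / 2)     # True
```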
the derivative of the Lagrange function with respect to x at x̂ is zero,
$$L'(\hat x; \lambda) = \lambda_0 f_0'(\hat x) + \cdots + \lambda_m f_m'(\hat x) = 0,$$
$$(X \times 0_Y \times 0) + T_{(\hat x, 0, f(\hat x))}(\mathrm{graph}\,F) \subset X \times Y \times \mathbb{R}.$$
Indeed, graph F can be seen as the solution set of the C¹-function with recipe x ↦
(z, y) = (f₀(x), −f₁(x), . . . , −f_m(x)), so by Proposition 7.4.2, $T_{(\hat x, 0, f(\hat x))}(\mathrm{graph}\,F)$
can be seen as the graph of the function with recipe
$$x \mapsto (z, y) = (f_0'(\hat x)x, -f_1'(\hat x)x, \ldots, -f_m'(\hat x)x).$$
The claim implies that this tangent space lies in a hyperplane with equation η · y −
η₀z = 0 for suitable η = (λ₁, . . . , λ_m) ∈ Y, η₀ = λ₀ ∈ ℝ, not both zero. This gives
the required Lagrange equations.
To prove the proper inclusion above, it suffices to show that the intersection of the
tangent space $T_{(\hat x, 0, f(\hat x))}(\mathrm{graph}\,F)$ and the subspace {y = 0} is contained in X × Y × 0.
Indeed, this implies that the intersection of $(X \times 0_Y \times 0) + T_{(\hat x, 0, f(\hat x))}(\mathrm{graph}\,F)$ and the
subspace {y = 0} is contained in X × 0_Y × 0; therefore $(X \times 0_Y \times 0) + T_{(\hat x, 0, f(\hat x))}(\mathrm{graph}\,F)$
cannot be the entire space X × Y × ℝ.
To begin with, the tangent space to the intersection F ∩ {y = 0} at the point
x ) is equal to the intersection of the tangent spaces to F and {y = 0} at
x , 0, f (
(
this point. We omit the easy verification. So, as f (x) = F (x, 0) for all x, it remains
to show that the tangent space to f at the point ( x )) is contained in X × 0.
x , f (
This follows immediately from the local minimality of f at x.
Remark 7.8.2 This sketch shows that Lagrange’s multiplier method in the smooth
case is equivalent to the proper inclusion
(X × 0Y × 0) + T( x )) (F ) ⊂ X × Y × R.
x ,0,f (
∂L/∂x1 = λ0(2x1 + 12x2) + λ1(8x1) = 0,
∂L/∂x2 = λ0(12x1 + 4x2) + λ1(2x2) = 0,
as x1 cannot be equal to zero (this would imply that x2 = 0, and this contradicts the constraint). This gives x2 = (8/3)x1 or x2 = −(3/2)x1. In the first (resp. second) case, substitution in the constraint gives the points ±(3/2, 4) (resp. ±(2, −3)), and
f0(3/2, 4) = f0(−3/2, −4) = 106¼.
4. Conclusion. (2, −3) and (−2, 3) are global minima and (3/2, 4) and (−3/2, −4) are global maxima.
7.9 Lagrange Multipliers (Convex Case)
Now we give the method of Lagrange multipliers for convex problems. The setup is the same as in Sect. 7.7. We write S(y) for the optimal value of the problem (Py), for all y ∈ Y. Thus we get a function S : Y → R, called the optimal value function.
Proposition 7.9.1 If the perturbation function F is convex, then the optimal value
function S is convex.
The proof of this result shows how convenient the concept of strict epigraph can
be (proving the result using the epigraph is not convenient).
Proof By definition, epi_s(S), the strict epigraph of S, is the image of epi_s(F), the strict epigraph of F, under the projection X × Y × R → Y × R given by the recipe (x, y, z) → (y, z). As F is a convex function, epi_s(F) is a convex set. As convexity of sets is preserved under taking the image under a linear function, we get that epi_s(S) is a convex set, and so S is a convex function.
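As a small numerical illustration of this proposition, the following sketch approximates the optimal value function S of the perturbation F(x, y) = (x − y)² + (x − 3)² + (x − 5)² from Example 7.7.1 on a grid and tests midpoint convexity; the grid and the sample points are arbitrary choices, and the code is only a sketch.

```python
# Minimal numerical illustration of Proposition 7.9.1 with the
# perturbation function of Example 7.7.1 (a sketch, not the book's code).
import numpy as np

def F(x, y):
    return (x - y)**2 + (x - 3)**2 + (x - 5)**2

GRID = np.linspace(-10.0, 20.0, 30001)   # fine grid for the inner minimization

def S(y):
    # optimal value function: minimize F(., y) over the grid
    return F(GRID, y).min()

rng = np.random.default_rng(0)
for _ in range(5):
    y1, y2 = rng.uniform(-5.0, 10.0, size=2)
    # midpoint convexity, up to the grid's discretization error
    assert S((y1 + y2) / 2) <= (S(y1) + S(y2)) / 2 + 1e-4
print("midpoint convexity of S confirmed on sample pairs")
```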
Figure 7.6 illustrates the shadow price interpretation of a multiplier η. By the proposition above, if S(0) is finite there exists a closed halfspace in Y × R that supports the epigraph of S at the point (0, S(0)). Let (η, η0) be an outward normal of this halfspace. If this halfspace is not vertical, that is, if η0 ≠ 0, then we can scale (η, η0) so that η0 = −1; then S(y) ≥ S(0) + η · y for all y ∈ Y, that is, η is a subgradient of S at 0: a shadow price.
3. If the Slater condition holds, then in some neighborhood of 0Y the optimal value function does not take the value +∞. Hence, a separating halfspace to epi(S) at the point (0, S(0)) cannot be vertical.
Remark 7.9.6 The Lagrange multiplier method in the convex case is equivalent to the following proper inclusion:
(X × 0Y × 0) + T_(x̂,0,f(x̂))(epi(F)) ⊂ X × Y × R.
Compare this reformulation for Lagrange’s multiplier method in the convex case
to the reformulations for Fermat’s theorem in the smooth and convex case and
the reformulation for Lagrange’s multiplier method in the smooth case, given in
Remarks 7.4.3, 7.6.2, and 7.8.2 respectively.
At first sight, it might not be clear how the method of Lagrange for the
convex case can be used to solve optimization problems: the Lagrange condition
is formulated in terms of the optimal value function S, which is a priori unknown.
However, if we replace S by its definition in terms of the function F , we get a
repeated minimization: first over x ∈ X and then over y ∈ Y . Changing the
order of this repeated minimization brings the Lagrange condition in a form that is
convenient to use. Now we give the outcome of this reformulation of the Lagrange condition. We need a definition and a notation for the minimization over y ∈ Y in the reversed order.
Definition 7.9.7 The Lagrange function L : X × ((Y × [0, +∞)) \ {0_{Y×R}}) → R is given by the recipe
L(x, (η, η0)) = inf_{y∈Y} (η0F(x, y) − η · y).
[Fig. 7.6: the graph z = F(x, y), supported at the point (x̂, 0, F(x̂, 0)) by a tangent plane with slope η.]
this road; to be precise, the sum of the squares of the distances from this point to the given points has to be minimized. Consider the numerical example that there are three given locations, given by points on the line R, two of which are precisely known, 3 and 5, and one is approximately known, 0. This can be modeled as follows:
f(x) = x² + (x − 3)² + (x − 5)² → min, x ∈ R. (P)
This is a convex optimization problem. We embed this problem into the following family of optimization problems:
fy(x) = (x − y)² + (x − 3)² + (x − 5)² → min, x ∈ R. (Py)
Here y models the position of the location that is approximately known. These problems (Py) are all convex optimization problems.
It is straightforward to solve the problem (P ) and to determine the sensitivity of
its optimal values to small changes in the location that is approximately known to
be 0. The solution of (Py ) is seen to be the average location (3 + 5 + y)/3; therefore,
the solution of (P ) is 8/3, and substitution of x = (3 + 5 + y)/3 in fy (x) and then
taking the derivative at y = 0 gives η = S′(0) = −16/3. So now that we are in the
comfortable possession of these answers, we can use this numerical problem to get
familiar with the convex method of Lagrange and all concepts involved.
The family of problems (Py )y can be described by one perturbation function,
F (x, y) = (x − y)2 + (x − 3)2 + (x − 5)2 for all x, y ∈ R. This is a proper closed
convex function. Note that the Slater condition holds: each point x̄ ∈ R is a Slater
point as F(x, y) < +∞ for all x, y ∈ R. The Lagrange function is
L(x, η) = inf_{y∈R} ((x − y)² + (x − 3)² + (x − 5)² − ηy).
The expression that has to be minimized is convex. Carry out this minimization by putting the derivative with respect to y equal to zero: 2(y − x) − η = 0. So y = x + (1/2)η. Substitution gives the following formula for the Lagrange function,
L(x, η) = −(1/4)η² − xη + (x − 3)² + (x − 5)².
This function is convex in x for each η. The convex method of Lagrange multipliers gives that x̂ is a solution of the given problem (P) iff there exists η ∈ R such that f(x̂) = L(x̂, η) and L_x(x̂, η) = 0. Here we have used that, as L(x, η) is convex in x, the condition L(x̂, η) = min_x L(x, η) is equivalent to the condition L_x(x̂, η) = 0. So we have, writing x instead of x̂ for simplicity:
x² + (x − 3)² + (x − 5)² = −(1/4)η² − xη + (x − 3)² + (x − 5)²,
−η + 2(x − 3) + 2(x − 5) = 0.
The second equation gives η = 4x − 16. Substitution in the first equation and simplifying gives 9x² − 48x + 64 = 0, that is, (3x − 8)² = 0, so x = 8/3, and so η = 4 · (8/3) − 16 = −16/3. This is in agreement with the outcomes above.
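This computation can be reproduced mechanically; the following sketch (illustrative only, with ad-hoc names) eliminates y symbolically, forms L(x, η), and solves the two conditions.

```python
# Symbolic check of the convex method of Lagrange for the road-location
# example; an illustrative sketch, not the book's code.
from sympy import symbols, diff, simplify, solve

x, y, eta = symbols('x y eta', real=True)
F = (x - y)**2 + (x - 3)**2 + (x - 5)**2

# minimize F - eta*y over y: set the derivative to zero, 2(y - x) - eta = 0
y_star = solve(diff(F - eta*y, y), y)[0]       # y = x + eta/2
L = simplify((F - eta*y).subs(y, y_star))      # -eta**2/4 - x*eta + (x-3)**2 + (x-5)**2

# convex Lagrange conditions: f(x) = L(x, eta) and dL/dx = 0
f = F.subs(y, 0)
print(solve([simplify(f - L), diff(L, x)], [x, eta], dict=True))
# expected: [{x: 8/3, eta: -16/3}]
```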
7.10 *Generalized Optimal Solutions Always Exist
The aim of this section is to explain by means of examples that for any given convex optimization problem, an optimal solution in a suitable generalized sense always exists.
Example 7.10.1 (Generalized Optimal Solution for a Convex Optimization Problem)
Figure 7.8 illustrates the existence of optimal solutions using the top-view model, for a better insight into the existence question.
[Figs. 7.7 and 7.8: six one-dimensional convex problems and their top-view models; the discussion of cases 1–3 can be read off from the figures.]
4. Here the optimal value is −∞ and f tends to −∞ with less than linear speed. We have assumed wlog that lim_{x→+∞} f(x) = −∞. In the top-view model, you see again that there is a unique generalized optimal solution, represented by the point (1, 0). This point corresponds to the fact that the recession cone of f consists of the ray generated by the vector (1, 0). Note that the curve drawn inside the circle is tangent to the circle at the point (1, 0).
5. Here, f has a downward sloping asymptote for x → +∞. In the top-view model,
you see that the set of generalized solutions is a circle segment: one endpoint is
(1, 0) and the other endpoint corresponds to the slope of the asymptote mentioned
above. Note that the curve drawn inside the circle is not tangent to the circle at the ‘other’ endpoint.
6. Here, f is unbounded below for x → +∞ but it has no asymptote. In the top-
view model, you see that the set of generalized solutions is again a circle segment:
one endpoint is (1, 0) and the other endpoint corresponds to the arrow on the left
hand side, which is a steepest recession vector of the epigraph of f. Note that the curve drawn inside the circle is tangent to the circle at the ‘other’ endpoint.
That there exists an optimal solution in a generalized sense, for each of the six
problems in Fig. 7.7, is no coincidence. It is not hard to define this solution concept
in general, in the same spirit, and then it follows from the compactness of the ball
that there always exists an optimal solution in this generalized sense. All existence
results for ordinary optimal solutions above can be derived from this general result.
Conclusion For each convex optimization problem, there exists a solution in
a natural generalized sense. This gives existence of ordinary solutions under
assumptions on recession vectors.
The aim of this section is to list advantages of working with convex optimization
problems.
1. All local minima are global minima.
2. The optimality conditions are criteria for global minimality: they are sufficient
as well as necessary conditions for optimality. That is, they hold if and only if
the point under investigation is a minimum.
In particular, note the following advantages:
(a) No need to prove existence of an optimal solution if you can find a solution
of the optimality conditions and are satisfied with just one optimal solution.
However, if the analysis of the optimality conditions leads only to equations
which characterize the optimal solutions and not to a closed formula, then it
is essential that you check independently the existence of an optimal solution
to ensure that these equations have a solution.
7.12 Exercises
1. Show that the optimization problem in Nash’s theorem that characterizes the
fair bargain for a bargaining opportunity, Theorem 1.1.1, is equivalent to the
following optimization problem
(1/2)|x + y − 2| + |3x − 2y − 1| + x² − x + y + e^{x−3y} → min
31. For all possible values of the parameters a, b > 0, solve the problem
√(x² + y²) + 2√((x − a)² + (y − b)²) → min
32. Two villages lie on the same side of a straight highway. The first is at a distance of 1 km from the highway; the second is at a distance of 6 km from the highway and of 13 km from the first village. Where should a gasoline station be placed so that the sum of its distances to the two villages and to the highway is minimal?
33. Formulate and prove an analogue to the Fermat-Torricelli-Steiner theorem for
four points in the three dimensional space.
34. For all possible values of the parameter α > 0, solve the problem
max(2x + 1, x + y) + α√(x² + y² − 4x + 4) → min
Try to keep the number of constraints in the new problem small (the smaller the better).
Chapter 8
Optimality Conditions: Reformulations
Abstract
• Why. In applications, many different forms of the conditions for optimality for
convex optimization are used: duality theory, the Karush–Kuhn–Tucker (KKT)
conditions, the minimax and saddle point theorem, Fenchel duality. Therefore, it
is important to know what they are and how they are related.
• What.
– Duality theory. To a convex optimization problem, one can often associate a
concave optimization problem with a completely different variable vector of
optimization, but with the same optimal value. This is called the dual problem.
– KKT. This is the most popular form for the optimality conditions. We also
present a reformulation of KKT in terms of subdifferentials, which is easier to
work with.
– Minimax and saddle point. When two parties optimize against each other,
then this sometimes leads to an equilibrium, where both have no incentive
to change the current situation. This equilibrium can be described, in formal language, by a saddle point, that is, by vectors x̂ ∈ Rⁿ and ŷ ∈ Rᵐ for which, for a suitable function F(x, y), one has that F(x̂, ŷ) equals both min_x max_y F(x, y) and max_y min_x F(x, y).
– Fenchel duality. We do not describe this result in this abstract.
Road Map
• Figure 8.1 and Definition 8.1.2 (the dual problem and its geometric intuition).
• Proposition 8.1.3 (explicit description of the dual problem).
• Figure 8.2, Theorem 8.1.5 (duality theory in one picture).
• Definitions 8.2.1–8.2.4, Theorem 8.2.5, numerical example (convex program-
ming problem, Lagrange function, Slater condition, KKT theorem).
• The idea of Definition 8.2.8 (KKT in the context of a perturbation of a problem).
• Theorem 8.3.1, numerical example (KKT in subdifferential form).
• Section 8.4 (minimax, maximin, saddle point, minimax theorem of von Neu-
mann).
• Section 8.5 (Fenchel duality).
S(0) ≥ ϕ(η) ∀η ∈ Y.
This concludes the explanation that the dual problem is the problem of finding the
best lower bound for the optimal value of (P ) of a certain type.
Now we give a formal definition of the dual problem.
Definition 8.1.2 The dual problem (D) is the problem to maximize the function ϕ : Y → R given by ϕ = −S*. Note that ϕ is concave (that is, minus this function is convex).
Note that the dual problem depends on the chosen family of perturbed problems (Py), y ∈ Y—that is, on F—and not only on the primal problem—that is, not only on f.
[Fig. 8.1: the graph of the dual objective ϕ(η).]
Now comes the crux of the proof. We interchange the two infimum operations. This gives the desired identity.
This result is useful: experience has shown that often one can calculate—for a given function F—the Lagrange function L and then the function ϕ. The point is that, for F(x, y) − η · y, minimizing first with respect to x and then with respect to y is usually more complicated than doing these minimizations in the other order. Therefore you can proceed as follows: choose an element η, and compute ϕ(η) by using Proposition 8.1.3; this is something valuable: an explicit lower bound for the value of the primal problem:
S(0) ≥ ϕ(η).
Example 8.1.4 (Duality Theory in One Picture) Our findings are essentially the full
duality theory. Duality theory can be seen in one simple picture, Fig. 8.2, which
shows that ϕ(η) is maximal if it equals S(0), and then η is a subgradient for S at
0—so it is a measure of sensitivity for the optimal value.
f(x) ≥ ϕ(η).
S(y) ≥ S(0) + η̂ · y ∀y ∈ Y.
Strong duality need not always hold: if (P ) and (D) have a different optimal
value, then one says that there is a duality gap.
Example 8.1.6 (Duality Theory) Consider the problem to minimize x12 + 2x1 x2 +
3x22 under the constraint x1 + 2x2 ≥ 1. We perturb this problem by replacing the
constraint by
x1 + 2x2 ≥ y.
As
f(x̂) = 2/3 = ϕ(η̂),
it follows that
x̂ = (1/3, 1/3)
is a solution of the given problem. Observe that it lies on the boundary of the feasible set: the constraint is binding.
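As a sanity check, a generic solver reproduces these values; the sketch below is illustrative only.

```python
# Numerical sanity check for Example 8.1.6; an illustrative sketch.
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0]**2 + 2*x[0]*x[1] + 3*x[1]**2
cons = ({'type': 'ineq', 'fun': lambda x: x[0] + 2*x[1] - 1},)   # x1 + 2*x2 >= 1
res = minimize(f, x0=np.array([1.0, 1.0]), constraints=cons)
print(res.x, res.fun)    # approx (1/3, 1/3) and 2/3; the constraint is binding
```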
For a coordinate-free treatment of duality theory for LP problems, see Sect. 9.8.
The oldest pair of primal–dual problems was published in 1755 in the journal Ladies Diary ([1]). See Sect. 9.5 for a description of this pair.
There is complete symmetry between primal and dual problem: the dual problem
can be embedded in a family of perturbed problems in a natural way such that if we
take its associated dual problem, we are back to square one: the primal problem.
Proposition 8.1.7 (Duality Scheme) Let F : X × Y → R be a proper closed
convex function. Then one has the following scheme:
8.2 Karush–Kuhn–Tucker Theorem: Traditional Version
min f0(x)
s.t. fi(x) ≤ 0, 1 ≤ i ≤ m.
Note again that we get the ideal situation: a criterion for optimality for a feasible point x̂, as the KKT conditions are necessary and sufficient for optimality.
Example 8.2.6 (KKT Example with One Constraint) Solve the shortest distance
problem for the point (1, 2) and the half-plane x1 + 3x2 ≤ 1.
The easiest way to solve this simple problem is geometrically: if you draw a
figure, then you see that you have to determine the orthogonal projection of the
point (1, 2) on the line x1 + 3x2 = 1 and to compute its distance to the point (1, 2).
The purpose of the analytical solution below is to illustrate the use of the KKT
conditions.
We will see that the analysis leads to the distinction of two cases; working out
these cases is routine but tedious, and the details will not be displayed below. We will
assume that we have a hunch which of the two cases leads to the optimal solution.
If this case leads to a point that satisfies the KKT conditions, then it is optimal,
as the KKT conditions are not only necessary but also sufficient. So then we do
not have to investigate the other case. This possibility of making use of a hunch
is an advantage of the KKT conditions. We emphasize here that the analysis of an
optimization problem with equality constraints by means of the Lagrange equations
leads also often to a distinction of cases, but then all cases have to be analyzed,
even if the first case that is considered leads to a point that satisfies the Lagrange
equations.
1. Modeling and convexity.
2. KKT conditions:
2(x1 − 1) + η = 0,
2(x2 − 2) + 3η = 0,
η(x1 + 3x2 − 1) = 0.
3. Analysis. Suppose you have a hunch that in the optimum the constraint is binding, that is, x̂1 + 3x̂2 − 1 = 0. Then the KKT conditions boil down to:
2(x1 − 1) + η = 0,
2(x2 − 2) + 3η = 0,
x1 + 3x2 − 1 = 0,
η ≥ 0. Now you solve these three linear equations in three unknowns. This gives x1 = 2/5, x2 = 1/5, η = 6/5. Then you check that this solution satisfies the condition η ≥ 0.
4. Conclusion. The given problem has a unique solution x̂ = (2/5, 1/5).
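The three linear equations of the binding case can be solved with any linear-algebra routine; a minimal sketch:

```python
# Solving the binding case of Example 8.2.6 with numpy; an illustrative sketch.
import numpy as np

# unknowns (x1, x2, eta):
#   2(x1 - 1) +   eta = 0
#   2(x2 - 2) + 3 eta = 0
#   x1 + 3 x2         = 1
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 3.0],
              [1.0, 3.0, 0.0]])
b = np.array([2.0, 4.0, 1.0])
x1, x2, eta = np.linalg.solve(A, b)
print(x1, x2, eta)   # 0.4, 0.2, 1.2
assert eta >= 0      # the multiplier sign condition holds: KKT confirmed
```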
Example 8.2.7 (KKT Problem with Two Constraints) Solve the shortest distance
problem for the point (2, 3) and the set of points that have at most distance 2 to the
origin and that have moreover first coordinate at least 1.
Again, the easiest way to solve this problem is geometrically. A precise drawing
shows that one should determine the positive scalar multiple of (2, 3) that has length
2. Then one should take the distance of this point to (2, 3). The purpose of the
analytical solution below is to illustrate the KKT conditions for a problem with
several inequality constraints.
For a problem with m inequality constraints, 2m cases have to be distinguished.
So in this example the number of cases is 4. Again we assume that we have a hunch.
Again, as working out a case is routine but tedious, the details will not be displayed
below.
1. Modeling and convexity.
2. KKT conditions:
∂L/∂x1 = 2(x1 − 2) + 2η1x1 − η2 = 0,
∂L/∂x2 = 2(x2 − 3) + 2η1x2 = 0,
η1, η2 ≥ 0,
η1(x1² + x2² − 4) = 0,
η2(1 − x1) = 0.
3. Analysis. Suppose we have a hunch that in the optimum only the first constraint is binding. We start by considering this case (there are four cases). That is, x1² + x2² = 4 and 1 − x1 < 0. Then the KKT conditions lead, by some tedious arguments, to the solution x1 = 4/√13, x2 = 6/√13 and η1 = (√13 − 2)/2, η2 = 0. By the sufficiency of the KKT conditions, it follows that this is the unique solution.
So we do not have to consider the other three cases; our educated guess turned
out to be right.
4. Conclusion. The problem has a unique solution: x̂ = (4/√13, 6/√13).
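A generic constrained solver confirms this conclusion; the starting point below is an arbitrary feasible guess, and the code is only a sketch.

```python
# Verifying the hunch for Example 8.2.7 numerically; an illustrative sketch.
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 2)**2 + (x[1] - 3)**2
cons = ({'type': 'ineq', 'fun': lambda x: 4 - x[0]**2 - x[1]**2},   # |x| <= 2
        {'type': 'ineq', 'fun': lambda x: x[0] - 1})                # x1 >= 1
res = minimize(f, x0=np.array([1.5, 1.0]), constraints=cons)
print(res.x)                                 # approx (1.109, 1.664)
print(np.array([4.0, 6.0]) / np.sqrt(13))    # the exact solution (4, 6)/sqrt(13)
```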
Now we are going to indicate how Theorem 8.2.5 can be derived from the convex
method of Lagrange multipliers. First, we embed the convex programming problem
in a family of perturbed problems.
Definition 8.2.8 For each y = (y1, ..., ym) ∈ Rᵐ the standard perturbed problem (Py) of a convex programming problem is the optimization problem
min f0(x)
s.t. fi(x) + yi ≤ 0, 1 ≤ i ≤ m.
8.3 KKT in Subdifferential Form
The aim of this section is to reformulate the optimality conditions as the Karush–Kuhn–Tucker (KKT) conditions in subdifferential form.
Vladimir Protasov has discovered a formulation of the KKT conditions in terms
of subdifferentials. This is very convenient in concrete examples. We only display
the main statement.
Theorem 8.3.1 (Karush–Kuhn–Tucker Conditions for a Convex Programming Problem in Subdifferential Form) Assume there exists a Slater point for a given convex programming problem. Let x̂ be a feasible point. Then x̂ is a solution iff
0n ∈ co(∂f0(x̂), ∂fk(x̂), k ∈ K),
where K = {k ∈ {1, ..., m} | fk(x̂) = 0}, the set of constraints that are active at x̂.
To illustrate the use of the subdifferential version of the KKT conditions, we first
consider the two run-of-the-mill examples that were solved in the previous section
by the traditional version of the KKT conditions. We have seen that working with
the traditional version leads to calculations that are routine but tedious. Now we will
see that working with the subdifferential version leads to more pleasant activities:
drawing a picture and figuring out how the insight that the picture provides can be
translated into analytical conditions. Therefore, the subdifferential version of KKT
is an attractive alternative.
Example 8.3.2 (Subdifferential Version of KKT for a Shortest Distance Problem)
1. Modeling and convexity.
y⁴ − y − 4x → min,
Finally, we consider the bargaining example of Alice and Bob that was presented
in Sect. 1.1 and that was solved there. However, this time the problem will not be
solved by ad-hoc arguments, but by the KKT conditions, in subdifferential form.
Example 8.3.5 (Subdifferential KKT for a Bargaining Problem)
One can make an educated guess about the solution: Bob should give his ball,
which has utility 1 to him, to Alice, to whom it has the much higher utility 4; in
return, Alice should give her bat, which has utility 1 to her, to Bob, to whom it has
the higher utility 2; moreover, as Alice profits more from this exchange than Bob,
she should give the box with some probability to Bob; this probability should be >0
and <1.
By Theorem 1.1.1, it suffices to solve the problem to maximize the function
of two variables f (x) = x1 x2 on the set F ⊆ R2 of points x = p1 (−1, 2) +
p2(−2, 2) + p3(4, −1) for which 0 ≤ p1, p2, p3 ≤ 1. The educated guess above means that you have a hunch that in the optimum, p1 = p3 = 1 and 0 < p2 < 1.
1. Modeling and convexity. Write g = −ln f = −ln(−p1 − 2p2 + 4p3) − ln(2p1 + 2p2 − p3).
2. Optimality conditions at p̂ = (1, p2, 1) with 0 < p2 < 1 (p1 and p3 at their upper bounds, p2 interior):
∂g/∂p1 ≤ 0,
∂g/∂p2 = 0,
∂g/∂p3 ≤ 0.
The condition ∂g/∂p2 = 0 reads
2/(−p1 − 2p2 + 4p3) − 2/(2p1 + 2p2 − p3) = 0.
Substitution of p1 = p3 = 1 gives
2/(3 − 2p2) − 2/(1 + 2p2) = 0.
This can be rewritten as 3 − 2p2 = 1 + 2p2, that is, as 4p2 = 2. This has unique solution p2 = 1/2.
It remains to check that p̂ = (1, 1/2, 1) satisfies the first and the third condition above. These conditions are
1/(−p1 − 2p2 + 4p3) − 2/(2p1 + 2p2 − p3) ≤ 0,
−4/(−p1 − 2p2 + 4p3) + 1/(2p1 + 2p2 − p3) ≤ 0.
Now we substitute p̂ = (1, 1/2, 1) in the left hand sides of these inequalities and check that we get a non-positive outcome. For the first one, we get
1/(−1 − 1 + 4) − 2/(2 + 1 − 1) = 1/2 − 1 < 0.
For the third one, we get
−4/(−1 − 1 + 4) + 1/(2 + 1 − 1) = −2 + 1/2 < 0.
4. Conclusion. The hunch that it is optimal for Bob to give the ball to Alice, and for Alice to give the bat to Bob, is correct; moreover, it is then optimal that Alice gives the box to Bob with probability 1/2.
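A numerical cross-check of this conclusion (a sketch only): minimize −ln f over the box [0, 1]³ with a generic solver.

```python
# Numerical cross-check of the bargaining solution p = (1, 1/2, 1);
# an illustrative sketch, not the book's code.
import numpy as np
from scipy.optimize import minimize

def neg_log_f(p):
    x1 = -p[0] - 2*p[1] + 4*p[2]
    x2 = 2*p[0] + 2*p[1] - p[2]
    if x1 <= 0 or x2 <= 0:
        return 1e10                      # outside the region x > 0: heavy penalty
    return -(np.log(x1) + np.log(x2))    # minimize -ln f = -ln(x1 * x2)

res = minimize(neg_log_f, x0=np.array([0.9, 0.4, 0.9]), bounds=[(0.0, 1.0)] * 3)
print(res.x)   # approx (1, 0.5, 1)
```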
There is a relation between the minimax and the maximin that holds in general.
Proposition 8.4.2 (Minimax Inequality) The minimax is not smaller than the maximin:
inf_{x∈X} sup_{η∈Y} L(x, η) ≥ sup_{η∈Y} inf_{x∈X} L(x, η).
In order to formulate the proof, and for later purposes, it is useful to make the
following definitions.
Definition 8.4.3 The following two optimization problems are associated to L. Write
f(x) = sup_{η∈Y} L(x, η) for all x ∈ X
and
ϕ(η) = inf_{x∈X} L(x, η) for all η ∈ Y,
and consider the primal problem min_x f(x) and the dual problem max_η ϕ(η).
Proof of Proposition 8.4.2 One has
inf_{x∈X} sup_{η∈Y} L(x, η) = inf_{x∈X} f(x) ≥ sup_{η∈Y} ϕ(η) = sup_{η∈Y} inf_{x∈X} L(x, η), (∗)
where the middle inequality holds as f(x) ≥ L(x, η) ≥ ϕ(η) for all x ∈ X and η ∈ Y.
L(x, η̂) ≥ L(x̂, η̂) ≥ L(x̂, η)
for all x ∈ X, η ∈ Y.
The following result shows that the concept of saddle point is precisely what is
needed for the present purpose.
f(x̂) = L(x̂, η̂) = ϕ(η̂).
A careful reader might observe that if the function L(x, η) is concave in η, then the inner maximization problem is equivalent to a convex optimization problem, which can be dualized to obtain an inner convex minimization problem over some new decision vector, say λ. After such a dualization, one would have an outer minimization over x and an inner minimization over λ. Because both the inner and the outer problem would then be minimizations, and the resulting function could be jointly convex, they could be considered jointly as a single convex optimization problem. Hence, there seems to be no reason to consider a saddle point problem as a separate entity. Unfortunately, the dualization of the inner problem could increase the dimensionality of the problem significantly, and this would outweigh the benefit of having a min problem instead of a min-max problem. For that reason, in many applications, such as machine learning, one prefers to stay with the min-max form of a given problem and solve it as such, using so-called saddle-point algorithms.
8.6 Exercises
1. Show that the objective function ϕ of the dual problem is concave (that is, minus
this function is convex).
2. Write out the definition ϕ = −S ∗ and conclude that this gives the description
illustrated in Fig. 8.1 and described verbally just below this figure.
3. *Derive Theorem 8.1.5 from Lagrange's principle in the convex case, and show that Theorem 8.1.5 is in fact equivalent to Lagrange's principle in the convex case.
4. Show that the duality theorem is essentially equivalent to Lagrange's principle in the convex case.
5. Illustrate all statements of Theorem 8.1.5 by means of Figs. 8.1 and 8.2.
6. **Check all statements implicit in Proposition 8.1.7. Give a precise formulation
of the statement above that there is complete symmetry between the primal and
dual problem and prove this statement.
7. Write down the KKT conditions for the optimization problem that characterizes the fair bargain for a bargaining opportunity that is given in Nash's theorem 1.1.1, after first having rewritten it as a convex optimization problem.
11. Formulate the Slater condition, the KKT conditions and the KKT theorem for a convex programming problem with inequality and equality constraints:
min f0(x)
s.t. fi(x) ≤ 0, 1 ≤ i ≤ m′, fi(x) = 0, m′ + 1 ≤ i ≤ m,
inf_x ( f(x) − ∑_{i=1}^{m} (−δ_{fi}(x)) ),
where δ_{fi}(x) is the indicator function of the set {x | fi(x) ≤ 0}:
δ_{fi}(x) = 0 if fi(x) ≤ 0, and +∞ otherwise.
24. Solve the shortest distance problem for the point (1, 2) and the half-plane x1 +
3x2 ≤ 1.
25. *Solve the problem to minimize y⁴ − y − 4x subject to the constraints
max(x + 1, e^y) + 2√(x² + y²) ≤ 3y + 2√5
and
√(x² + y² − 4x − 2y + 5) + x² − 2y ≤ 0.
27. Find the minimal value a such that the system of inequalities has a solution:
|x + y − 2| + 2|x − y| + 2^x − 7y ≤ a,
x⁴ + 2y² + √(x² + y²) − 5 ≤ a,
3√(x² + y² − 2y + 1) + e^x − 3y ≤ a.
31. Use Theorem 1.1.1, the reformulation of the problem in this theorem as a
convex problem min(− ln f (x)), x ∈ F, x > v and the KKT conditions to
determine the fair bargain for two persons, Alice (who possesses a bat and a
box) and Bob (who possesses a ball). The bat has utility 1 to Alice and utility
2 to Bob; the box has utility 2 to Alice and utility 2 to Bob; the ball has utility
4 to Alice and utility 1 to Bob. If they do not come to an agreement, then they
both keep what they have.
32. *(Creative problem). Suggest a method of approximate solution of the linear regression model with the uniform metric:
f(a) = max_{k=1,...,N} |⟨a, x^{(k)}⟩| → min, a ∈ Rᵈ, ⟨a, a⟩ = 1.
Reference
Chapter 9
Application to Convex Problems
Abstract
• Why. The aim of this chapter is to teach by means of some additional examples
the craft of making a complete analysis of a convex optimization problem. These
examples will illustrate all theoretical concepts and results in this book. This
phenomenon is in the spirit of the quote by Cervantes. Enjoy watching the frying
of eggs in this chapter and then fry some eggs yourself!
• What. In this chapter, the following problems are solved completely; in brackets
the technique that they illustrate is indicated.
– Least squares (convex Fermat).
– Generalized Fermat-Weber location problem (convex Fermat).
– RAS method (KKT).
– How to take a penalty (minimax and saddle point).
– Ladies Diary problem (duality theory).
– Second welfare theorem (KKT).
– Minkowski’s theorem on an enumeration of convex polytopes (KKT).
– Duality for LP (duality theory).
– Solving LP by taking a limit (the interior point algorithms are based on convex
analysis).
The aim of this section is to present a method that is often used in practice, the least
squares method.
Suppose you have many data points (xi , yi ) with 1 ≤ i ≤ N . Suppose you are
convinced that they should lie on a straight line y = ax + b if not for some small
errors. You wonder what are the best values of a and b to take. This can be viewed
as the problem to find the best approximate solution for the following system of N
equations in the unknowns a, b:
yi = axi + b, 1 ≤ i ≤ N.
More generally, often one wants to find the best approximate solution of a linear
system of m equations in n unknowns Ax = b. Here A is a given m × n-matrix,
b is a given m-column and x is a variable n-column. Usually the columns of A
are linearly independent. Then what is done in practice is the following: the system Ax = b is replaced by the following auxiliary system AᵀAx = Aᵀb. It can be proved that this system has a unique solution. This is taken as the approximate solution of Ax = b.
Now we give the justification of this method. It shows that, in some sense, this
is the best one can do. For each x̄ ∈ Rn , the Euclidean length of the error vector
e = Ax̄ − b is a measure of the quality of x̄ as approximate solution of the system
Ax = b. The smaller this norm, the better the quality.
This suggests to consider the problem to minimize the length of the error vector, f(x) = ‖Ax − b‖. This is equivalent to minimizing its square,
g(x) = ‖Ax − b‖².
This quadratic function is strictly convex and coercive (that is, its value tends to +∞ if ‖x‖ → +∞), as the matrix product AᵀA is positive definite. Therefore it has a unique point of minimum. This is the solution of g′(x) = 0, that is, of AᵀAx = Aᵀb. This justifies that the method described above is best, in some sense.
This method is called the method of the least squares, as by definition ‖Ax − b‖² is a sum of squares and this sum is minimized.
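A minimal numerical sketch of this recipe, with made-up data: fit a line y = ax + b to noisy points via the normal equations and compare with a built-in least squares routine.

```python
# Least squares via the normal equations A^T A x = A^T b; a minimal sketch.
import numpy as np

rng = np.random.default_rng(1)
xs = np.linspace(0.0, 10.0, 50)
ys = 2.0 * xs + 1.0 + rng.normal(scale=0.3, size=xs.size)

A = np.column_stack([xs, np.ones_like(xs)])    # unknowns (a, b) in y = a*x + b
coef = np.linalg.solve(A.T @ A, A.T @ ys)      # the normal equations
print(coef)                                    # approx (2.0, 1.0)
print(np.linalg.lstsq(A, ys, rcond=None)[0])   # agrees with the built-in routine
```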
Now we apply the convex Fermat theorem to a generalization of the famous facility
location problem of Fermat-Weber to find a point that has minimal sum of distances
to the vertices of a given triangle. The generalized problem is to find a point in n-
dimensional space that has minimal sum of distances to a given finite set of points.
This leads to a solution that is much shorter than all known geometric solutions of
this problem.
Consider the problem to minimize the function
f(x) = ∑_{i=1}^{d} |x − xi|.
Here x1 , . . . , xd are d given distinct points in Rn and x runs over Rn . If you take
n = 2 and d = 3, then you get the problem of Fermat-Torricelli-Steiner.
The function f is continuous and coercive, so there exists a point of minimum x̂. The function f is convex, so such a point is characterized by the inclusion 0 ∈ ∂f(x̂). We write fi(x) = |x − xi| for all i, and ui(x) = (x − xi)/|x − xi| for all x ∈ X and all i for which x ≠ xi; note that then ui(x) is a unit vector that has the same direction as x − xi. We distinguish two cases:
1. The point x̂ is not equal to any of the points xi. In this case, each one of the functions fi is differentiable and f′(x̂) = ∑_i ui(x̂). Therefore, the point x̂ is an optimal solution of the problem iff f′(x̂) = ∑_{i=1}^{d} ui(x̂) = 0.
2. The point x̂ is equal to xj for some j. Then ∂fj(x̂) = B(0, 1), the closed ball with radius 1 and center the origin, and ∂fi(x̂) = {ui(x̂)} for all i ≠ j, as all these functions fi are differentiable at the point x̂. Now we apply the theorem of Moreau-Rockafellar. This gives that
∂f(xj) = B(0, 1) + ∑_{i≠j} ui(xj) = B(∑_{i≠j} ui(xj), 1).
Therefore, xj is an optimal solution iff the closed ball with center ∑_{i≠j} ui(xj) and radius 1 contains the origin. This means that
|∑_{i≠j} ui(xj)| ≤ 1.
9.3 Most Likely Matrix with Given Row and Column Sums
Suppose you want to know the entries of a positive square matrix, but you only know
its row and column sums. Then you do not have enough information. Now suppose
that you have reason to believe that the matrix is the most likely matrix with row
and column sums as given. This is the case in a number of applications, such as
the estimate or prediction of input-output matrices. To be more precise, suppose
that these row and column sums are positive integers, and assume that all entries
of the matrix you are looking for are positive integers. Assume moreover that the
matrix has been made by making many consecutive independent random decisions
to assign one unit to one of the entries—with equal probability for each entry. Then
ask for the most likely outcome given that the row and column sums are as given.
Taking the required matrix to be the most likely one in the sense just explained is
called the RAS method.
Now we explain how to find this most likely matrix. Using Stirling's approximation ln n! = n ln n − n + O(ln n), one gets the following formula for the logarithm of the probability that a given matrix (Tij) is the outcome:
C − ∑_{i,j} [Tij ln Tij − Tij],
where C is a constant that does not depend on the choice of the matrix. Then
the problem to find the most likely matrix with given row and column sums is
approximated by the following optimization problem:
min f(T) = ∑_{i,j} [Tij ln Tij − Tij],
subject to
∑_j Tij = Si ∀i,
∑_i Tij = Cj ∀j.
The aim of this section is to illustrate minimax and the saddle point theorem.
We consider the taking of a penalty kick, both from the point of view of the goalkeeper and from the point of view of the football player taking the penalty. For
simplicity we assume that each one has only two options. The player can aim left or
right (as seen from the direction of the keeper), the keeper can dive left or right. So
their choices lead to four possibilities. For each one the probability is known that the
keeper will stop the ball: these probabilities are stored in a 2 × 2-matrix A. The first
and second rows represent the choice of the keeper to dive left and right respectively;
the first and second columns represent the choice of the player to aim left and right
respectively. Both know this matrix. It is easy to understand that it is in the interest
of each one that his choice will be a surprise for the other. Therefore, they both
choose randomly, with carefully determined probabilities. These probabilities they
have to make known to the other. Which choices of probabilities should they make
for given A?
The point of view of the keeper, who knows his adversary is smart, is to determine the following maximin (where we let I = {x = (x1, x2) | x1, x2 ≥ 0, x1 + x2 = 1}):
max_{x∈I} min_{y∈I} xᵀAy.
The point of view of the player, who also knows the keeper is smart, is to determine the following minimax:
min_{y∈I} max_{x∈I} xᵀAy.
By von Neumann's minimax theorem, these two values are equal, and it so happens that there exists a choice for each of the two players where this value is realized, such that nobody has an incentive, after hearing the choice of his adversary, to change his own choice. This choice is a saddle point of the function xᵀAy.
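For a 2 × 2 matrix without a pure saddle point, the optimal mixed strategies equalize the adversary's payoffs and can be written in closed form. A sketch, using the matrix of Exercise 8 below as data:

```python
# Optimal mixed strategies for the 2x2 penalty game; an illustrative sketch
# with the matrix of Exercise 8 (a_ll=0.9, a_lr=0.4, a_rl=0.2, a_rr=0.8).
import numpy as np

A = np.array([[0.9, 0.4],
              [0.2, 0.8]])   # stop probabilities; rows: keeper, columns: player

den = A[0, 0] - A[1, 0] - A[0, 1] + A[1, 1]
x1 = (A[1, 1] - A[1, 0]) / den          # keeper's probability of diving left
y1 = (A[1, 1] - A[0, 1]) / den          # player's probability of aiming left
x = np.array([x1, 1 - x1])
y = np.array([y1, 1 - y1])
print(x, y, x @ A @ y)   # (6/11, 5/11), (4/11, 7/11), value 6.4/11
```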
9.5 Ladies Diary Problem
The aim of this section is to describe the oldest example of the duality theorem for optimization problems.
In 1755 Th. Moss posed the following problem in the journal Ladies Diary:
In the three sides of an equilateral field stand three trees at the distance of 10, 12 and 16
chains from one another, to find the content of the field, it being the greatest.
Keep in mind that this was before the existence of scientific journals; it was not
unusual to find scientific discoveries published for the first time in the Ladies Diary.
Already in the beginning of the seventeenth century, Fermat posed at the end of
a celebrated essay on finding minima and maxima by pre-calculus methods (the dif-
ferential calculus was only discovered later, by Newton and Leibniz independently)
the following challenge:
Let he who does not approve of my method attempt the solution of the following problem:
Given three points in the plane, find a fourth point such that the sum of its distances to the
three given points is a minimum!
[Fig. 9.1: the three trees at mutual distances 16, 12 and 10, and the relation between the solutions of the two problems.]
These two optimization problems look unrelated. However, if you solve one, then
you have the solution of the other one. The relation between their solutions is given
in Fig. 9.1.
Each one of these two problems is just the dual description of the other one.
Conclusion You have seen the historically first example of the remarkable phenom-
ena that seemingly unrelated problems, one minimization and the other maximiza-
tion, are in fact essentially equivalent.
9.6 The Second Welfare Theorem
The aim of this section is to illustrate the claim that you should be interested in
duality of convex sets as this is the heart of convexity. We give the celebrated
second welfare theorem. This gives a positive answer to the following intriguing
question. Can price mechanisms achieve each reasonable allocation of goods
without interference, provided one corrects the starting position, for example, by
means of education in the case of the labor market?
We will present the celebrated second welfare theorem that asserts that price
mechanisms can achieve each reasonable allocation of goods without interference,
provided one corrects the starting position. We first present the problem in the form
of a parable.
A group of people is going to have lunch. Each one has brought some food.
Initially, each person intends to eat his own food. Then one of them offers his
neighbor an apple in exchange for her pear which he prefers. She accepts as she
prefers an apple to a pear. So this exchange makes them both better off, because of
the difference in their tastes.
Now suppose that someone who knows the tastes of the people of the group
comes up with a ‘socially desirable’ reallocation of all the food. He wants to realize
this using a price system, in the following way. He assigns suitable prices to each
sort of food, all the food is put on the table, and everyone gets a personal budget
to choose his own favorite food. The budget will not necessarily be the value of
the lunch that the person in question has brought to the lunch. The starting position
has been corrected. Show that the prices and the budgets can be chosen in such a
way that the self-interest of all individuals leads precisely to the planned socially
desirable allocation, under natural assumptions.
Solution. The situation can be modeled as follows. We consider n consumers
1, 2, . . . , n and k goods 1, . . . , k. A nonnegative vector in Rk is called a con-
sumption vector. It represents a bundle of the k goods. The consumers have initial
endowments ω1 , . . . , ωn . These are consumption vectors representing the initial
lunches. An allocation is a sequence of n consumption vectors x = (x1 , . . . , xn ).
The allocation is called admissible if
∑_{i=1}^{n} xi ≤ ∑_{i=1}^{n} ωi.
That is, the allocation can be carried out given the initial endowments. The
consumers have utility functions, defined on the set of consumption vectors. These
represent the individual preferences of the consumers as follows. If ui(xi) > ui(xi′), then this means that consumer i prefers consumption vector xi to xi′. If ui(xi) = ui(xi′), then this means that consumer i is indifferent between the consumption vectors xi and xi′. The utility functions are assumed to be strictly quasi-concave
(that is, for each γ ≥ 0 the set {x | ui (x) ≥ γ } is convex for each i) and strictly
monotonic increasing in each variable. An admissible allocation is called Pareto
efficient if there is no other admissible allocation that makes everyone better off.
This is equivalent to the following more striking looking property: each allocation
that makes one person better off, makes at least one person worse off. A price vector
is a nonnegative row vector p = (p1 , . . . , pk ). It represents a choice of prices of the
k goods. A budget vector is a nonnegative row vector b = (b1 , . . . , bn ). It represents
a choice of budgets for all consumers. For each choice of price vector p, each choice
of budget vector b and each consumer we consider the problem
max ui (xi ),
subject to
p · xi ≤ bi ,
n
xi ≤ ωj ,
j =1
xi ∈ Rk+ .
This problem models the behavior of consumer i, who chooses the best consumption vector he can get for his budget. This problem has a unique point of maximum
di(p, b). This will be called the demand function of consumer i. A nonzero price vector p and a budget vector b for which
∑_{i=1}^{n} bi = p · ∑_{i=1}^{n} ωi,
that is, total budget equals total value of the initial endowments, are said to give a Walrasian equilibrium if the sequence of demands
b = (p · x1*, ..., p · xn*),
d(p, b) = p · x*.
∑_{i=1}^{k} ai ni = 0
min −V(h), ∑_{i=1}^{k} ai hi = 1, −hi ≤ 0, i = 1, ..., k,
where
L = −λ0V(h) − ∑_{i=1}^{k} λi hi + λ_{k+1} (∑_{i=1}^{k} ai hi − 1)
−Si − λi + λ_{k+1} ai = 0, i = 1, ..., k.
Multiplying this equality by the vector ni, taking the sum over all i, and taking into account that ∑_{i=1}^{k} Si ni = ∑_{i=1}^{k} ai ni = 0, we obtain ∑_{i=1}^{k} λi ni = 0. We denote by I the set of all indices i for which λi > 0. Then ∑_{i∈I} λi ni = 0 and, taking into account the condition of complementary slackness, ĥi = 0 for all i ∈ I. On the other hand, the polytope P(ĥ) has at least one interior point z, that is, a point for which ni · z < ĥi, i = 1, ..., k. Multiplying by λi and taking the sum over all i ∈ I, we get 0 < ∑_{i∈I} λi ĥi = 0. This contradiction means that I is empty, that is, λi = 0 and so Si = λ_{k+1} ai for all i = 1, ..., k. Therefore, the volumes of the boundaries of the polytope P(ĥ) are proportional to the numbers ai. Carrying out a homothety with the required coefficient, we obtain a polytope for which the volumes of the boundaries are equal to these numbers.
Remark It can be proved that the function − ln V (h) is strictly convex, using the
Brunn-Minkowski inequality; this implies the uniqueness in Minkowski’s theorem
on polytopes.
9.8 Duality Theory for LP
The aim of this section is to derive in a geometric way the formulas that are usually
given to define dual problems. A natural perturbation of the data will be chosen, and
then the dual problem associated to this perturbation will be determined. This is of
interest for the following reason. There are several standard forms for LP-problems.
Often one defines for each type separately its dual problem. The coordinate-free method allows one to see that all formulas for dual problems are just explicit versions of only one coordinate-free, and so geometrically intuitive, concept of dual problem.
To begin with, we will proceed as much as possible coordinate free—for better insight, starting from scratch.
Definition 9.8.1 A linear programming problem is an optimization problem P of the form
min_p p · d0, p ∈ P⁺,
where
P⁺ = {p ∈ P | p ≥ 0}.
We choose the following perturbation: consider the set of all (x, y, z) ∈ X × X × R for which
x ∈ P + y, x ≥ 0, z ≥ x · d0.
This set is readily seen to be a polyhedral set and to be the epigraph of a function F : X × X → (−∞, +∞].
Now we derive the following explicit description of the dual problem D.
Proposition 9.8.2 The dual problem D is the following problem:
max_η p0 · η, η ∈ D̃, d0 − η ≥ 0,
where
D̃ = P̃⊥ = {d̃ ∈ X | d̃ · p̃ = 0 ∀p̃ ∈ P̃},
which becomes [provided p ≥ 0, otherwise L(p, η) = +∞; in this step, we use the equivalence of p ∈ (P̃ + y)⁺ with y ∈ P̃ + p and p ≥ 0],
which becomes [provided η ∈ D̃, otherwise L(p, η) = −∞; in this step, use that inf_{u∈P̃} u · η equals 0 if η ∈ D̃ and that it equals −∞ otherwise],
Thus, the Lagrange function has been determined. The next step is to deter-
mine the objective function of the dual problem—that is, to determine ϕ(η) =
infp∈X L(p, η) for all η ∈ X.
max_η p0 · η, η ∈ D̃, d0 − η ≥ 0,
as required.
Now duality theory takes the following symmetric form.
Theorem 9.8.3 (Coordinate Free Duality Theory for LP) Let P = P̃ + p0 , D =
D̃ + d0 ⊂ X = Rn be two affine subspaces for which P̃ and D̃ are subspaces
of X which are each other’s orthogonal complement. Consider the following LP
problems
min_p p · d0, p ∈ P⁺, (P)
min_x cᵀx, Ax = b, x ≥ 0, (P)
max_{y,s} bᵀy, Aᵀy + s = c, s ≥ 0. (D)
min_x cᵀx, Ax ≥ b, x ≥ 0, (P)
max_y bᵀy, Aᵀy ≤ c, y ≥ 0. (D)
min_x cᵀx
s.t. Ax = b,
x ⪰ 0.
max_{y,s} bᵀy
s.t. Aᵀy + s = c,
s ⪰* 0.
P̃ = D̃ ⊥ = {x ∈ X | x · y = 0 ∀y ∈ D̃}
and, equivalently,
D̃ = P̃ ⊥ = {x ∈ X | x · y = 0 ∀y ∈ P̃ }.
We let X₊₊, P₊₊, D₊₊ denote the subsets of all points in X, P, D respectively for which all coordinates are positive. We use the Hadamard product, coordinate-wise
multiplication v ◦ w = (v1 w1 , . . . , vn wn ) ∈ Rn of two vectors v, w ∈ Rn . Now we
are ready to state the promised surprising fact.
Proposition 9.9.1 The map P++ × D++ → X++ given by pointwise
multiplication—(p, d) → p ◦ d—is a bijection.
Figure 9.2 illustrates this proposition.
Proof Choose an arbitrary point x ∈ Rⁿ₊₊. We have to prove that there exist unique p ∈ P₊₊, d ∈ D₊₊ for which x = p ◦ d, where ◦ is the Hadamard product.
[Fig. 9.2: the bijection (p, d) → p ◦ d from P₊₊ × D₊₊ onto X₊₊.]
subject to
p ∈ P++ , d ∈ D++ .
ln v = (ln v1, ..., ln vn) ∀v ∈ Rⁿ₊₊.
P = P̃ + s = {p̃ + s | p̃ ∈ P̃ }
and
D = D̃ + s = {d̃ + s | d̃ ∈ D̃}.
(d − x ◦ p⁻¹) · p̃ = 0 ∀p̃ ∈ P̃,
(p − x ◦ d⁻¹) · d̃ = 0 ∀d̃ ∈ D̃.
and
p^{1/2} ◦ d^{−1/2} ◦ D̃
P̃ = D̃ ⊥ = {x ∈ X | x · y = 0 ∀y ∈ D̃}
and, equivalently,
D̃ = P̃ ⊥ = {x ∈ X | x · y = 0 ∀y ∈ P̃ }.
Then P and D have a unique common point s; this can be checked using that each
element can be written in a unique way as the sum of an element of P̃ and an
element of D̃ = P̃ ⊥ . Then P = s + P̃ and D = s + D̃. Assume moreover that both
P and D contain positive vectors. One can consider the problem to find nonnegative
The equation p ◦ d = v can be rewritten as
p̄ ◦ d̄ + p̄ ◦ (d − d̄) + (p − p̄) ◦ d̄ + (p − p̄) ◦ (d − d̄) = v.
Now omit the last term, which we expect to be negligible if we would substitute the solution: indeed, then both p − p̄ and d − d̄ become small vectors, so (p − p̄) ◦ (d − d̄) will become very small. This leads to the linear vector equation
p̄ ◦ d̄ + p̄ ◦ (d − d̄) + (p − p̄) ◦ d̄ = v.
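Once P̃ and D̃ are parametrized, the linearized equation is an ordinary linear system. A minimal sketch in R², with entirely made-up data (P̃ spanned by (1, 1), D̃ its orthogonal complement):

```python
# One Newton step for p◦d = v on affine subspaces P = p_bar + span{a},
# D = d_bar + span{b}; an illustrative sketch in R^2 with made-up data.
import numpy as np

a = np.array([1.0, 1.0])        # spans the subspace P~
b = np.array([1.0, -1.0])       # spans D~, the orthogonal complement of P~
p_bar = np.array([2.0, 1.0])    # current iterate in P (made-up)
d_bar = np.array([1.0, 2.0])    # current iterate in D (made-up)
v = np.array([1.0, 1.0])        # target of the equation p ◦ d = v

# linearized equation: p_bar◦d_bar + p_bar◦(beta*b) + (alpha*a)◦d_bar = v
M = np.column_stack([a * d_bar, p_bar * b])
alpha, beta = np.linalg.solve(M, v - p_bar * d_bar)
p_new, d_new = p_bar + alpha * a, d_bar + beta * b
print(p_new, d_new, p_new * d_new)   # p ◦ d has moved toward v
```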
9.10 Exercises
1. Show that there exists an optimal solution for the problem of the waist of three
lines in space.
2. (a) Check that the set of all (x, y, z) ∈ X × X × R for which x ∈ P + y, x ≥ 0, z ≥ x · d0 is a polyhedral set which is the epigraph of a function F : X × X → (−∞, +∞].
(b) Give an explicit description of this function F.
3. Derive Theorem 9.8.3 from the duality theory for convex optimization.
4. Derive the explicit form for the dual problem of an LP problem in standard
form that is given in Proposition 9.8.4.
5. Derive the explicit form for the dual problem of an LP problem in symmetric
form that is given in Proposition 9.8.5.
6. Derive the explicit form for the dual problem of a conic optimization problem
given in Proposition 9.8.6.
7. Give a coordinate free description of a pair of primal-dual conic optimization
problems and formulate complementarity conditions for such pairs.
8. Determine the best strategies for both players in the penalty problem from Sect. 9.4 if a_ll = 0.9, a_lr = 0.4, a_rl = 0.2, a_rr = 0.8.
9. Give a precise formalization of the maximization problem by Th. Moss from
1755 formulated in Sect. 9.5.
10. Give a precise formalization of the minimization problem by Fermat formulated
in Sect. 9.5.
11. Show that there exists a unique global optimal solution to the problem
subject to
p ∈ P++ , d ∈ D++
subject to
p ∈ P++ , d ∈ D++ ,
(d − x ◦ p⁻¹) · p̃ = 0 ∀p̃ ∈ P̃,
(p − x ◦ d⁻¹) · d̃ = 0 ∀d̃ ∈ D̃.
Appendix A
A.1 Symbols
R Real numbers.
Rn Real n-column vectors.
R+ Nonnegative numbers, [0, +∞) = {x ∈ R | x ≥ 0}.
R++ Positive numbers, (0, +∞) = {x ∈ R | x > 0}.
R̄ Extended real numbers, [−∞, +∞] = R ∪ {−∞, +∞}.
[a, b] For a, b ∈ R, a ≤ b: {x ∈ R | a ≤ x ≤ b}.
(a, b) For a, b ∈ R, a ≤ b: {x ∈ R | a < x < b}.
[a, b) For a, b ∈ R, a ≤ b: {x ∈ R | a ≤ x < b}.
(a, b] For a, b ∈ R, a ≤ b, {x ∈ R | a < x ≤ b}.
Rc For nonzero c ∈ X = Rⁿ: the open ray spanned by c, {ρc | ρ > 0}.
X A finite dimensional (f.d.) inner product space with inner product ⟨·, ·⟩; often (without loss of generality) Rⁿ with the dot product x · y = x1y1 + ··· + xnyn.
‖·‖_X The Euclidean norm on X, ‖x‖_X = ⟨x, x⟩^{1/2} for all x ∈ X. For X = Rⁿ this is (x1² + ··· + xn²)^{1/2}.
∠(u, v) For unit vectors u, v ∈ Rⁿ: angle ϕ between u and v, cos ϕ = u · v, ϕ ∈ [0, π].
L⊥ For subspace L of X: orthogonal complement of L, the subspace {x ∈
X | x · y = 0 ∀y ∈ L}.
Mᵀ For a linear function M : X → Y: transpose, the linear function Y → X characterized by ⟨Mᵀ(y), x⟩ = ⟨y, M(x)⟩ for all x ∈ X and y ∈ Y.
⪰ Preference relation on X.
span(S) For a subset S ⊆ X: the span of S, the minimal subspace of X containing
S (where minimal means ‘minimal with respect to inclusion’).
aff(S) For a subset S ⊆ X: the affine hull of S, the minimal affine subset of X
containing S (where minimal means ‘minimal with respect to inclusion’).
ker(L) For a linear function L : X → Y , where X, Y are f.d. inner product
spaces: the kernel or nullspace of L, the subspace {x ∈ X | L(x) = 0} of
X.
N* For a norm N on X: the dual norm on X, N*(y) = max_{x≠0} ⟨x, y⟩/N(x) for all y ∈ X.
‖·‖_p For p ∈ [1, +∞]: the p-norm on Rⁿ, ‖x‖_p = [|x1|^p + ··· + |xn|^p]^{1/p} if p ≠ +∞, and ‖x‖_∞ = max(|x1|, ..., |xn|). So ‖x‖₂ is the Euclidean norm.
x◦y For x, y ∈ Rn : Hadamard product of x and y, the pointwise product
(x1 y1 , . . . , xn yn ) .
A.1.4 Analysis
A.1.5 Topology
co(S) For a set S ⊆ X: the convex hull of S, the minimal convex set
containing S (minimal means ‘minimal under inclusion’).
cone(S) For a set S ⊆ X: the conic hull of S, the minimal convex cone
containing S ∪ {0X } (minimal means ‘minimal under inclusion’).
MA For a convex set A ⊆ X and a linear function M : X → Y , where Y
is also a f.d. inner product space: image of A under M, the convex set
{M(a) | a ∈ A} of Y .
AN For f.d. inner product spaces X, Y , a linear function N : X → Y ,
and a convex set A ⊆ Y : inverse image of A under N , the convex
set {x ∈ X | N(x) ∈ A}.
A1 + A2 For convex sets A1 , A2 ⊆ X: Minkowski sum of A1 and A2 , {a1 +
a2 | a1 ∈ A1 , a2 ∈ A2 }.
A1 ∩ A2 For convex sets A1 , A2 ⊆ X: intersection of A1 and A2 , {x ∈ X | x ∈
A1 & x ∈ A2 }.
A1 co ∪ A2 For convex sets A1 , A2 ⊆ X: convex hull of the union of A1 and A2 :
co(A1 ∪ A2 ).
A1 #A2 For convex sets A1 , A2 ⊆ X: inverse sum of A1 and A2 ,
∪α∈[0,1] (αA1 ∩ (1 − α)A2 ).
LA For a nonempty convex set A: the lineality space of A, the maximal
subspace of X that consists of recession vectors of A, RA ∩ R−A
(maximal means maximal with respect to inclusion).
ri(A) For a convex set A ⊆ X: the relative interior of A, the maximal rel-
atively open subset of aff(A) containing A (maximal means maximal
with respect to inclusion).
extr(A) For a convex set A: the set of extreme points of A, points that are not
the average of two different points of A.
extrr(A) For a convex set A: the set of extreme recession directions of A,
recession directions Ru for which the vector u is not the average
of two linearly independent vectors v, w for which Rv and Rw are
recession directions of A.
rA− For a convex set A ⊆ X: the minimal improper convex function for A
(where minimal means minimal with respect to the pointwise ordering
for functions), the improper convex function X → R with recipe x →
−∞ if x ∈ cl(A) and x → +∞ otherwise.
rA+ For a convex set A ⊆ X: the maximal improper convex function
for A (where maximal means maximal with respect to the pointwise
ordering for functions), the improper convex function X → R with
recipe x → −∞ if x ∈ ri(A) and x → +∞ otherwise.
B Often used for a convex set that contains the origin.
B◦ For a convex set B ⊆ X that contains the origin: the polar set, the
convex set containing the origin {y ∈ X | x, y ≤ 1 ∀x ∈ B}.
B ◦◦ For a convex set B ⊆ X that contains the origin: the bipolar set, the
closed convex set that contains the origin (B ◦ )◦ .
B ←d→ B̃ For convex sets containing the origin B, B̃ ⊆ X: B and B̃ are in duality, B° = B̃ and B̃° = cl(B). Warning: B̃ is closed, but B is not necessarily closed.
C Often used for a convex cone.
C◦ For a convex cone C ⊆ X: polar cone, the closed convex cone {y ∈
X | x, y ≤ 0 ∀x ∈ C}.
C ◦◦ For a convex cone C ⊆ X: bipolar cone, the closed convex cone
(C ◦ )◦ .
C∗ For a convex cone C ⊆ X: the dual cone, the closed convex cone
−C ◦ = {y ∈ X | x, y ≥ 0 ∀x ∈ C}.
C ←d→ C̃ For convex cones C, C̃ ⊆ X: C and C̃ are in duality, C° = C̃ and C̃° = cl(C). Warning: C̃ is closed, but C is not necessarily closed.
A.1.10 Optimization
B ◦◦ = B if B is closed.
A co∪ B ←d→ A° ∩ B°.
A + B ←d→ A° # B°.
MB ←d→ B°M if M is a linear function.
C°° = C if C is closed.
C + D ←d→ C° ∩ D°.
MC ←d→ C°M if M is a linear function.
B.1.3 Subspaces
L⊥⊥ = L.
K + L ←d→ K⊥ ∩ L⊥.
ML ←d→ L⊥M if M is a linear function.
f** = f if f is closed.
f + g ←d→ f* ⊕ g*.
f ∨ g ←d→ f* co∧ g*.
Mg ←d→ g*M.
∂s(A) = A if A is closed.
s∂(p) = p if p is closed.
∂(p + q) = cl(∂p + ∂q).
∂(p ∨ q) = cl((∂p) co ∪ (∂q)).
∂(p ⊕ q) = cl(∂p ∩ ∂q).
∂(p # q) = cl(∂p # ∂q).
B.1.7 Norms
N** = N.
N1 ⊕ N2 ←d→ N1* + N2*.
N1 co∧ N2 ←d→ N1* ∨ N2*.
Index
A
Affine function
  linear plus constant, 39
Affine hull, 67
Affinely independent, 34
Affine subspace, 67, 175, 238, 240
Angle, 74
B
Barrier methods, 238
Basis, 55
Biconjugate function, 156
Bilinear mapping, 99
Binary operation for convex cones
  intersection, 56
  Minkowski sum, 56
Binary operation for convex functions
  convex hull of the minimum, 139, 144
C
Calculus rules
  convex cones, 103, 104
  convex functions, 157
  convex sets, 161
  convex sets containing the origin, 103
  sublinear functions, 161
Carathéodory's theorem, 40
Centerpoint theorem, 39
Certificate for
  insolubility, 81
  unboundedness, 81
Child drawing, 112, 120
Closed
  convex function, 133
Closed halfspace, 88
Closed set, 38, 66
Closure, 66
F
Farkas' lemma, 110
Fenchel, xx
Fenchel duality, 221
Fenchel-Moreau theorem, 157
Fermat's theorem (convex), 185
Fermat theorem, 182
Finite dimensional vector space, 54
First orthant, 26
Fourier-Motzkin elimination, 81
Four step method, 183
Functional analysis, 93
G
Gauss, 74
Geodesic
  on standard unit sphere, 15
Geodesically convex set
  in standard unit sphere, 15
Golden convex cone
  first orthant, 26
  Lorentz cone, 26
  positive semidefinite cone, 26
H
Hadamard's inequality, 147
Helly's theorem, 35
Hemisphere
  parametrization, 74
Hemisphere model
  convex cone, 16
  convex set, 24
Hessian, 134
Hilbert projection theorem, 115
Homogenization, vii
  convex set, 18
Homogenization method, 23, 103
  binary operations convex functions, 143
  binary operations convex sets, 57, 60, 144
  calculus convex functions, 158
    subdifferential, 164
  calculus sublinear functions, 161
  Carathéodory's theorem, 40
  duality for convex functions, 157
  finite convex combinations, 29
  Helly's theorem, 36
  organization of proofs, 29
  polar set, 99, 102
  properties subdifferential
    convex functions, 164
  Radon's theorem, 33
  subdifferential, 161
  support function, 161
Hyperplane, 78, 88
I
Ice cream cone, 26
Iff, 8
Image
  convex function, 138
Indicator function, 128
In duality
  convex cones, 103
  convex sets containing the origin, 104
  polyhedral sets, 107
Infimum, 129
Inner product, 55
Inner product space, 55
Interior, 66
Interior point, 66
Interior point methods, 238
Intersection
  convex cones, 56
  convex sets, 60
Inverse image
  convex function, 138
Inverse sum
  convex sets, 61
Involution, 94
J
Jensen's inequality, 127
Jung's theorem, 38
K
Karush–Kuhn–Tucker theorem, 210
Kernel, 79
KKT in usual form, 210
Krasnoselski's theorem, 39
L
Lagrange function, 194, 210
Lagrange multipliers (convex), 192
Least squares, 225
Least squares problem, 176
Limit of a function at a point, 66
Limit of a sequence, 66
Lineality space
  of a convex cone, 48
  of a convex set, 80
Linear function, 55
O
Open, 66
Open ball, 66
Open ray, 9, 14
Optimality conditions
  duality theory, 206
  Fenchel duality, 220
  KKT in subdifferential form, 213
  KKT in usual form, 210
  minimax and saddle points, 217
Optimal solution, 174
Optimal value, 174
Optimal value function, 192
Orthogonal complement, 13
  of an affine subspace, 238, 240
Orthonormal basis, 55
Q
Quadratic programming, 175
R
Radon point, 32
Radon's theorem, 32
Ray model
  convex cone, 14
  convex set, 24
Recession cone
  convex function, 144
  convex set, 20
Recession direction, 20
Recession vector, 20
Relative boundary
  convex set, 67
S
Saddle function, 217
Saddle point, 218
Saddle value, 218
Second welfare theorem
  in praise of price mechanisms, 230
Self-dual convex cone, 97
Semidefinite programming, 175
Separated by a hyperplane, 92
Separation theorem, 92
Shapley–Folkman lemma, 44
Shortest distance problem, 176
Slater condition, 193, 210
Slater point, 193, 210
Smooth local minimum, 181
Spaces in duality, 99
Sphere model, 15
  convex cone, 15
Standard closed unit ball, 17
Standard unit sphere, 15
Strict epigraph, 130
Strictly convex, 134
Strictly convex set, 178
Strict sublevel set, 130
Strong duality, 208
Subdifferential
  convex function, 163
  sublinear function, 160
Subgradient
  convex function, 162
Sublevel set, 130, 144
Sublinear function, 159
Subspace, 11, 12
Sum map, 56
Support function, 160
T
Tangent cone, 164, 181
Tangent space, 182
Tangent space theorem, 181
Tangent vector, 181
Theorem of Chebyshev
  for affine functions, 39
Theorem of Hahn–Banach, 93, 115
Theorem of Moreau, 115
Theorem of the alternative, 108
  Gale, 110
  Gordan, 120
  Ky Fan, 109
  Minkowski, Farkas, 109
Theorem of Weierstrass, 67
Topological notion, 66
Top-view model
  convex cone, 17
  convex set, 25
Transpose of a linear function, 55
U
Unbounded, 66
Uniqueness
  convex optimization, 177
W
Waist of three lines, 178
Weak duality, 208
Weyl's theorem, 43
Width, 38
Wlog, 36