Jan Brinkhuis
Convex Analysis
for Optimization
A Unified Approach
Graduate Texts in Operations Research
Series Editors
Richard Boucherie
University of Twente
Enschede, The Netherlands
Johann Hurink
University of Twente
Enschede, The Netherlands
This series contains compact volumes on the mathematical foundations of Oper-
ations Research, in particular in the areas of continuous, discrete and stochastic
optimization. Inspired by the PhD course program of the Dutch Network on the
Mathematics of Operations Research (LNMB), and by similar initiatives in other
territories, the volumes in this series offer an overview of mathematical methods for
post-master students and researchers in Operations Research. Books in the series are
based on the established theoretical foundations in the discipline, teach the needed
practical techniques and provide illustrative examples and applications.
Jan Brinkhuis
Econometric Institute
Erasmus University Rotterdam
Rotterdam, The Netherlands
To Grażyna
Preface
A Unified Method Borrowed from Geometry Convex analysis, with its wealth of formulas and calculus rules, immediately intrigued me, having a background in pure mathematics and working at a school of economics. This background helped me to see that certain phenomena in various branches of geometry that can be dealt with successfully by one tested method, called homogenization, run parallel to phenomena in convex analysis. This suggested using the same tested method in convex analysis as well, and it turned out to work well beyond my initial expectations. Here are sketches of two examples of such parallels and of their simple but characteristic use in convex analysis.
First Example of Homogenization in Convex Analysis Taking the homogenization of circles, ellipses, parabolas, and hyperbolas amounts to viewing them as conic sections—intersections of a plane and the boundary of a cone. This has been known for more than 2000 years to be a fruitful point of view. Parallel to this, taking the homogenization of a convex set amounts to viewing this convex set as the intersection of a hyperplane and a homogeneous convex set, also called a convex cone. This, too, turns out to be a fruitful point of view. For example, consider the task of proving the following proposition: a convex set is closed under finite convex combinations. The usual proof, by induction, requires some formula manipulation. The homogenization method reduces this to the task of proving that a convex cone is closed under positive linear combinations, and the proof of this, by induction, requires no effort, as the sketch below shows. In general, one can always reduce a task that involves convex sets to a task that involves convex cones, and as a rule the latter task is easier. This reduction is the idea of the single method for convex analysis that is proposed in this book.
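To make the parallel concrete, here is the reduction in symbols—a sketch only, using the notation c(A) for the homogenization of a convex set A that is introduced in Chap. 1.

```latex
% Induction step for a convex cone C; the case k = 2, that is,
% a, b \in C, \sigma, \tau > 0 \Rightarrow \sigma a + \tau b \in C, is the definition:
\sigma_1 a_1 + \cdots + \sigma_k a_k
  = \underbrace{(\sigma_1 a_1 + \cdots + \sigma_{k-1} a_{k-1})}_{\in\, C
      \text{ by the induction hypothesis}} \; + \; \sigma_k a_k \;\in\; C.

% Dehomogenization: for a convex combination of points a_i of a convex set A
% (\rho_i \ge 0, \sum_i \rho_i = 1; drop the terms with \rho_i = 0), the lifted
% points (a_i, 1) lie in the convex cone c(A), so
\rho_1 (a_1, 1) + \cdots + \rho_k (a_k, 1)
  = \Bigl(\sum\nolimits_i \rho_i a_i,\, 1\Bigr)
  \in c(A) \cap (X \times \{1\}) = A \times \{1\},
% hence \rho_1 a_1 + \cdots + \rho_k a_k \in A.
```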
Second Example of Homogenization in Convex Analysis In algebraic and
differential geometry, the homogenization operation often gives sets that are not
closed. Then it is natural to make these sets closed by adding the so-called points
at infinity. This is known to be a fruitful method for dealing with the behavior “at
infinity.” The added points model two-sided or one-sided directions. Parallel to this,
taking the homogenization of an unbounded closed convex set gives a convex cone
that is not closed, and again it is natural to make it closed by taking its closure and
to view the added points as points at infinity of the given convex set. The added
points turn out to correspond exactly to a well-known important auxiliary concept
in convex analysis: recession directions of the convex set. These are directions in
which one can travel forever inside the convex set. It is a conceptual advantage of
the homogenization method that it makes this auxiliary concept arise in a natural
way and that it makes it easier to work with it. This turns out to be a fruitful
method to work with unbounded convex sets. For example, consider the task of proving the following theorem of Krein–Milman: each point of a closed convex set
is the sum of a convex combination of extreme points and a conic combination of
extreme recession directions. The usual proof, by induction, requires a relatively
intricate argument involving recession directions. The homogenization method
gives a reduction to the task of proving that a closed convex cone is generated by
its extreme rays. The proof of this, by induction, requires no effort. In general, one
can always reduce a task that involves an unbounded convex set—and that therefore
requires the use of recession directions—to a task that involves a closed convex
cone, and as a rule this is an easier task. In other words, the distinction between
bounded and unbounded convex sets disappears, in some sense.
The homogenization method made everything fall into its proper place for me.
On the Need for Calculus Rules for Convex Sets For example, I was initially
puzzled by the calculus rules for convex sets—formulas for a convexity preserving
operation (such as the intersection of two convex sets) followed by the duality
operator for convex sets. These rules are of fundamental interest, but do they have
practical applications? If not, it might be better to omit them from a textbook on
convex analysis, especially as they are relatively intricate, involving, for example,
the concept of sublinear functions. The homogenization method gives the following
insight. In order to solve concrete convex optimization problems, one has to
compute subdifferentials, certain auxiliary convex sets; this requires calculus rules
for convex sets, and these are essentially the formulas mentioned above, as you will
see.
A Unified Definition of the Properties Closed and Proper Another example is
that the homogenization method reveals that the standard theory of convex analysis
is completely satisfactory and straightforward, apart from one complication. To
understand what this is, one has to keep in mind that convex sets and functions that
occur in applications have two nice properties, closedness and properness; these
properties make it easy to work with them. However, when you work with them,
you might be led to new convex sets and functions that do not possess these nice
properties; then these are not so easy to work with. Another complication is that
previously the definitions of the properties closed and proper for various convex
objects (sets and functions) had to be made individually and seemed to involve
Prerequisites This book is meant for master, graduate, and PhD students who need
to learn the basics of convex analysis, for use in some of the many applications
of convex analysis, such as, for example, machine learning, robust optimization,
and economics. Prerequisites to reading the book are: some familiarity with linear
algebra, the differential calculus, and Lagrange multipliers. The necessary back-
ground information can be found elsewhere in standard texts, often in appendices:
for example, pp. 503–527 and pp. 547–550 from the textbook [1] “Optimization:
Insights and Applications,” written jointly with Vladimir M. Tikhomirov.
Convex Analysis: What It Is To begin with, a convex set is a subset of n-
dimensional space Rn that consists of one piece and has no holes or dents. A convex
function is a function on a convex set for which the region above its graph is a
convex set. Finally, a convex optimization problem is the problem of minimizing a convex function. Convex analysis is the theory of convex sets, convex functions, and, as its climax, convex optimization problems. The central result is that each boundary point of a
convex set lies on a hyperplane that has the convex set entirely on one of its two
sides. This hyperplane can be seen as a linearization at this point of the convex set,
or even better, as a linearization at this point of the boundary of the convex set.
This is the only successful realization of the idea of approximating objects that are
nonlinear (and so are difficult to work with) by objects that are linear (and so are
easy to work with thanks to linear algebra) other than the celebrated realization of
this idea that is called differential calculus. The latter proceeds by taking derivatives
of functions and leads to the concept of tangent space.
Convex Analysis: Why You Should Be Interested One reason that convex
analysis is a standard tool in so many different areas is the analogy with differential
calculus. Another reason for the importance of convex analysis is that if you want to
find in an efficient and reliable way a good approximation of an optimal solution of
an optimization problem, then it is almost always possible to find it if this problem
is convex, but it is rarely possible otherwise.
[Figure: the graph of z = F(x, y), the point (x̂, 0, F(x̂, 0)) above the minimizer x̂, and the tangent plane at this point, with slope η in the y-direction.]
This follows from the minimality property of x̂. Therefore, the equation of this plane is of the form z − F(x̂, 0) = ηy for some real number η. This number η is the Lagrange multiplier for the present situation. This procedure can be
generalized to arbitrary smooth or convex functions F (x, y) with x ∈ Rn and
y ∈ Rm . For smooth F one has to use smooth linearization, for convex F one
has to use convex linearization. This gives the main result of smooth optimization,
respectively, convex optimization. Thus the difference between the derivations of
the two results is only in the method of linearization that is used.
Novelty: One Single Method Convex analysis requires a large number of different operations, for example, maximum, sum, convex hull of the minimum, conjugate (dual), subdifferential, etc., and formulas relating them, such as the Fenchel–Moreau theorem f** = f, the Moreau–Rockafellar theorem ∂(f + g)(x) = ∂f(x) + ∂g(x), the Dubovitskii–Milyutin theorem ∂ max(f, g)(x) = co(∂f(x) ∪ ∂g(x)) if f(x) = g(x), etc., depending on the task at hand. Previously, all these operations and formulas had to be learned one by one, and all proofs had to be studied separately.
The results on convex analysis in this textbook are the same as in all textbooks on
convex analysis. However, what is different in this textbook is that a single approach
is given that can be applied to all tasks involved in convex analysis: construction
of operations and formulas relating them, formulation of results and construction
of proofs. Using this approach should save a lot of time. Once you have grasped
the meaning of this approach, you will see that the structure of convex analysis is
transparent.
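As a first taste of such a formula in action, here is a small numerical experiment (an illustration of ours, not taken from the book): it approximates the conjugate f*(y) = sup_x (xy − f(x)) of the convex function f(x) = x^2 on a grid and checks that the biconjugate f** recovers f, as the Fenchel–Moreau theorem predicts.

```python
import numpy as np

x = np.linspace(-3, 3, 601)    # primal grid
y = np.linspace(-6, 6, 1201)   # dual grid of slopes

f = x**2
# Conjugate f*(y) = sup_x (x*y - f(x)), the sup approximated by a max over the grid.
f_star = np.max(np.outer(y, x) - f[None, :], axis=1)
# Biconjugate f**(x) = sup_y (x*y - f*(y)).
f_biconj = np.max(np.outer(x, y) - f_star[None, :], axis=1)

print(np.max(np.abs(f_biconj - f)))  # small discretization error: f** = f
```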
Description of the Unified Method The foundation of the unified approach is the
homogenization method. This method consists of three steps:
1. homogenize, that is, translate a given task into the language of nonnegative
homogeneous convex sets, also called convex cones,
2. work with convex cones, which is relatively easy,
3. dehomogenize, that is, translate back to get the answer to the task at hand.
The use of homogenization in convex analysis is borrowed from its use in
geometry. Therefore, we first take a look at its use in geometry.
History of the Unified Method As long ago as about 200 BCE, Apollonius of Perga used an embryonic form of homogenization in his eight-volume work "Conics," the apex of ancient Greek mathematics. He showed that totally different
curves—circles, ellipses, parabolas, and hyperbolas—have many common proper-
ties as they are all conic sections, intersections of a plane with a cone.
Figure 2 shows an ice cream cone and boundaries of intersections with four
planes. This gives, for the horizontal plane a circle, for the slightly slanted plane an
ellipse, for the plane that is parallel to a ray on the boundary of the cone a parabola,
and for the plane that is even more slanted a branch of a hyperbola.
Thus, apparently unrelated curves can be seen to have common properties
because they are formed in the same way: as intersections of one cone with various
planes. This phenomenon runs parallel to the fact that totally different convex
sets—that is, sets in column space Rn that consist of one piece and have no
holes or dents—have many common properties and that this can be explained by
homogenization: each convex set is the intersection of a hyperplane and a convex
cone, that is, a convex set that is positive homogeneous—containing all positive
scalar multiples for each of its points.
Figure 3 shows the homogenization process in a graphical form. A convex set
A is drawn in the horizontal plane R2 . If one vertical dimension is added and
three-dimensional space R3 is considered, then the set A can be lifted up in three-
dimensional space in a vertical direction to level 1; this gives the set A × {1}. Next,
the convex cone C that is the union of all open rays with endpoint the origin O
and running through a point of A × {1} is drawn. This convex cone C is called the
homogenization or conification of A. Note that there is no loss of information in
going from A to C. If you intersect C with the horizontal plane at level 1 and drop
the intersection down vertically onto the horizontal plane at level 0, then you get A
back again. This description of a convex set as the intersection of a hyperplane and
a convex cone helps by reducing the task of proving any property of a convex set to
the task of proving a property of a convex cone, which is easier.
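In code the passage back and forth is a one-liner in each direction. The following sketch (the helper names are ours) conifies a point of A by lifting it to level 1 and scaling, and deconifies a point of the cone by scaling its last coordinate back to 1:

```python
import numpy as np

def conify_point(a, t):
    """A point of the homogenization c(A): t * (a, 1), for a in A and t > 0."""
    return t * np.append(np.asarray(a, dtype=float), 1.0)

def deconify_point(c):
    """The point of A on the open ray through c (needs a positive last coordinate)."""
    c = np.asarray(c, dtype=float)
    assert c[-1] > 0, "only non-horizontal rays correspond to points of A"
    return (c / c[-1])[:-1]

a = np.array([2.0, -1.0])       # a point of a convex set A in R^2
c = conify_point(a, 3.0)        # a point of the cone c(A) in R^3
print(deconify_point(c))        # [ 2. -1.]: no information is lost
```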
Working with Unboundedness in Geometry by the Unified Method In 1415,
the Florentine architect Filippo Brunelleschi made the first picture that used linear
perspective. In this technique, horizontal lines that have the same direction intersect
at the horizon in one point called the vanishing point.
However, open rays are infinite sets, and Möbius preferred to model points as
points and not as infinite sets. Therefore, he took the intersection of each ray with
the standard unit sphere x^2 + y^2 + z^2 = 1. This creates a modified model of the set that consists of the points in the plane and the one-sided directions in the plane, the hemisphere model. In this model, the point (x, y) in the plane is modeled as the point (x^2 + y^2 + 1)^{-1/2}(x, y, 1) of the open upper hemisphere x^2 + y^2 + z^2 = 1, z > 0, and the one-sided direction in the plane given by the unit vector (x̃, ỹ) is modeled as the point (x̃, ỹ, 0) on the circle that bounds the hemisphere, x^2 + y^2 = 1, z = 0.
The advantage of the hemisphere model over the ray model is that points and
directions are modeled as points instead of as infinite sets. Note that the circle
created by the hemisphere model is the analogue for the plane of the auxiliary sphere
that Gauss recommended for three-dimensional space. Comparing the hemisphere
model of Möbius and the Gauss model, the hemisphere model is more convenient to
use because it does not require the artificial constructs required by the Gauss model
(“we employ as an auxiliary, a sphere of unit radius described about an arbitrary
center”).
One more simplification is possible. Looking down at the upper hemisphere from
high above makes it look like a disk in the plane. To be more precise, one gets this
top-view model of the plane and its one-sided directions by orthogonal projection of
the hemisphere on the horizontal coordinate plane z = 0: then a point (x, y) in the plane is modeled as the point (x^2 + y^2 + 1)^{-1/2}(x, y) of the standard open unit disk x^2 + y^2 < 1 in the plane R2, and a one-sided direction in the plane given by the unit vector (x̃, ỹ) is modeled as this unit vector, which is a point of the circle that bounds the disk, x^2 + y^2 = 1. The advantage of this top-view model over the hemisphere
model is that it is one dimension lower. This top-view model for the plane and its
one-sided directions is the closed unit disk in the plane itself, but the hemisphere
model for the plane and its one-sided directions requires three-dimensional space.
This top-view model for three-dimensional space and its one-sided directions is
the closed unit ball in three-dimensional space itself, but the hemisphere model for
three-dimensional space and its directions requires four-dimensional space, which
is hard to visualize.
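In coordinates the two models are easy to program. The sketch below (the function names are ours) maps a point of the plane to its hemisphere and top-view representatives, and illustrates numerically that points escaping to infinity in the direction (1, 0) tend, in the top-view model, to the boundary point (1, 0) of the disk:

```python
import numpy as np

def hemisphere_model(x, y):
    """(x, y) -> (x^2 + y^2 + 1)^(-1/2) (x, y, 1) on the open upper hemisphere."""
    return np.array([x, y, 1.0]) / np.sqrt(x * x + y * y + 1.0)

def top_view_model(x, y):
    """(x, y) -> (x^2 + y^2 + 1)^(-1/2) (x, y) in the open unit disk."""
    return np.array([x, y]) / np.sqrt(x * x + y * y + 1.0)

for t in [1.0, 10.0, 1000.0]:
    print(top_view_model(t, 0.0))   # approaches the boundary point (1, 0)
```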
The following line from the novel, “Dom dzienny, dom nocny” (House of day,
house of night) by Olga Tokarczuk can be read as evoking the idea of modeling the
points and one-sided directions of the plane by means of the top-view model:
Widzę kolistą linię horyzontu, który zamyka dolinę ze wszystkich stron.
I see the circular line of the horizon, which encloses the valley on all sides.
Similarly to the way in which the three models were created for the plane, models
can be created for the line, for three-dimensional space, and more generally for a
vector space of any finite dimension.
Figure 5 illustrates the ray, hemisphere, and top-view models in the simplest case,
for dimension one, the line. In this figure, the extended real line R̄ = R ∪ {−∞, +∞}
is modeled. This is the real line together with the two one-sided directions in the real
line—right (+∞) and left (−∞). A real number r is modeled in the ray model by the open ray through the point (r, 1).
Recession Directions of a Convex Set We want to illustrate how taking the closure
of one of the models of a convex set A corresponds precisely to taking the closure
of A and simultaneously adjoining the recession directions of A. To begin with, we
give the definition of a recession direction. A recession direction of a convex set A is
a one-sided direction in which one can travel forever inside the convex set A without
leaving it, after having started at an arbitrary point of A. A recession direction exists precisely if A is unbounded. The set of all vectors having such a direction forms, together with the origin, the so-called recession cone RA. So a recession direction of A can be defined as an open ray of RA, that is, the set of all positive scalar multiples of a nonzero vector of RA.
Illustration in Dimension One Figure 6 illustrates in dimension one, the line R,
that recession vectors of a closed convex set arise by taking the closure of the ray
model of this convex set. The ray model of an unbounded closed convex set A in
the plane is drawn, the half-line A = [a, +∞) for some a > 0. The set A has
one recession direction, “right” (+∞). The direction “right” is modeled in the ray
model by the horizontal open ray with endpoint the origin that points to the right.
This is the unique ray of the recession cone RA . You can see intuitively in this
figure that this horizontal ray belongs to the closure of the ray model of A. Indeed,
the beginning of a sequence of rays is drawn—the arrow suggests the direction of
the sequence—and the rays of this sequence are less and less slanted, and their limit
is precisely the horizontal ray that points to the right. This shows that the direction
“right” is a point in the closure of the ray model of A; moreover, it does not lie in
the ray model of A as all rays in this model are non-horizontal. In fact, it is the only
such point.
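The convergence of the rays can be checked numerically: the ray through the point a of A is spanned by (a, 1), and the normalized spanning vectors tend to the horizontal direction (1, 0) as a → +∞. A small sketch of ours:

```python
import numpy as np

# The ray of the ray model through the point a of A = [a0, +inf) is spanned by (a, 1).
for a in [1.0, 10.0, 100.0, 10000.0]:
    v = np.array([a, 1.0])
    print(v / np.linalg.norm(v))   # tends to (1, 0): the recession direction "right"
```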
[Figures: the ray model of A = [a, +∞) with the lifted set A × {1}, the origin O and the recession cone RA; three top-view models of convex sets A with their recession cones RA.]

The shaded regions represent the closure of the top-view model of A. To be more precise, the points in the shaded
region that do not lie on the circle model the points of A itself; the points in the
shaded region that do lie on the circle model the recession directions of A, that is,
the rays of the recession cone RA . The straight arrows with initial point at the center
of the disk represent straight paths from the origin that lie completely inside A. The
straight arrows either end inside the circle at the boundary of the model of A, or
they end on the circle. In the first case, the path ends on the boundary of A. In the
second case, the path is a half-line, it never ends.
Looking at Fig. 7, from left to right, you see three top-view models: of a bounded
set (such as a set bounded by a circle or an ellipse), of a set having precisely one
recession direction (such as a set bounded by a parabola), and of a set having
infinitely many recession directions (such as a set bounded by a branch of a
hyperbola). In each case, you see that if you take the closure of the top-view model
of A, then what happens is that the recession directions are added. From left to right,
you see: no point is added, the unique recession direction is added, all the infinitely
many recession directions are added.
Recession Directions in the Three Models In the three models that we consider,
the ray model, the hemisphere model, and the top-view model, a recession direction
of a convex set A ⊆ Rn is modeled as follows: in the ray model as a horizontal ray
that is the limit of a sequence of non-horizontal rays that represent points of A, in
the hemisphere model as a point on the boundary of the hemisphere that is the limit
of a sequence of points on the open hemisphere that represent points of A, and in the
top-view model as a point on the boundary of the ball that is the limit of a sequence
of non-boundary points of the ball that represent points of A.
Conclusion of the Application of Homogenization to Unbounded Convex Sets
The unified method proposed in this book, homogenization, and the use of the
three visualization models make unbounded convex sets as simple to work with
as bounded ones: all that is required is to take the closure of one of the three models
of a given convex set. Then recession directions arise without requiring an artificial
construction and the geometric insight provided by the visualization models makes
it easy to construct the proofs of properties of recession directions.
Comparison with the Literature The novelty of the present book compared to
existing textbooks on convex analysis is that all convex analysis tasks are simplified
by performing them using only the homogenization method. Moreover, many
motivating examples and applications as well as numerous exercises are given. Each
chapter has an abstract (“why” and “what”) and a road map. Other textbooks include:
Dimitri P. Bertsekas [2], Jonathan M. Borwein and Adrian S. Lewis [3], Lieven
Vandenberghe and Stephen Boyd [4], Werner Fenchel [5], Osman Güler [6], Jean-
Baptiste Hiriart-Urruty and Claude Lemaréchal [7, 8], Georgii G. Magaril-Ilyaev
and Vladimir M. Tikhomirov [9], Ralph Tyrrell Rockafellar [10] and [11], Vladimir
M. Tikhomirov [12], and Jan van Tiel [13].
The use of the homogenization method in convex analysis is well-known to
experts. One example is the treatment of polarity in projective space in “Convex
Cones, Sets and Functions” (1953) by Werner Fenchel in [5]. Another one is the
use of extreme points at infinity in “Convex Analysis” (1970) by Ralph Tyrrell
Rockafellar in [10]. Still another example is the concept of cosmic space in
“Variational Analysis” (1997) by Ralph Tyrrell Rockafellar and Roger Wets in [14].
However, it does not appear to be widely known that homogenization is useful
for any task in convex analysis. For example, attempts have been made to obtain
complete lists of rules for calculating the dual object of a convex object (such as
a convex set or function) but without using homogenization. A recent attempt was
described in “Convex Analysis: Theory and Applications” (2003) by Georgii G.
Magaril-Il’yaev and Vladimir M. Tikhomirov in [9]. Subsequently, it was shown that
the homogenization method leads to the construction of complete lists—in "Duality and calculus of convex objects (theory and applications)," Sbornik Mathematics, Volume 198, Number 2 (2007), jointly with Vladimir M. Tikhomirov in [15], and
also in [16].
The methods that are explained in this book can be applied to convex optimiza-
tion problems. For a small but very interesting minority, one can get the optimal
solutions exactly or by means of some law that characterizes the optimal solution.
Numerous examples are given in this book. These methods are also used, for
example, in Yurii Nesterov [17] and in Aharon Ben-Tal [18], to construct algorithms,
which can find efficiently and reliably approximations to optimal solutions for
almost all convex optimization problems. These algorithms are not considered in
this book, as it is not yet clear how helpful the homogenization method is in the
analysis of these algorithms.
How Is the Book Organized? Each chapter starts with an abstract (why and what),
a road map, a section providing motivation (optional), and it ends with a section
providing applications (optional), and many exercises (the more challenging ones
are marked by a star).
Chapters 1–4 present all standard results on convex sets. Each result is proved
in the same systematic way using the homogenization method, that is, by reduction
to results on convex cones. To prove some of these results on convex cones, simple
techniques and tricks, described in the text, have to be applied.
The remaining chapters use the homogenization method on its own to derive,
as corollaries of the standard results on convex sets, the standard results for
convex functions (Chaps. 5 and 6) and for convex optimization (Chaps. 7 and 8).
In Chaps. 7 and 8, many concrete smooth and convex optimization problems are
solved completely, all by one systematic four-step method. The consistent use
of homogenization for all tasks (definitions, constructions, explicit formulas, and
proofs) reveals that the structure of convex analysis is very transparent. To avoid
repetition, only proofs containing new ideas will be described in the book. Finally,
in Chap. 9 some additional convex optimization applications are given.
1 Convex Sets: Basic Properties

Abstract
• Why. The central object of convex analysis is a convex set. Whenever weighted
averages play a role, such as in the analysis by Nash of the question ‘what is a
fair bargain?’, one is led to consider convex sets.
• What. An introduction to convex sets and to the reduction of these to convex
cones (‘homogenization’) is given. The ray, hemisphere and top-view models
to visualize convex sets are explained. The classic results of Radon, Helly and
Carathéodory are proved using homogenization. It is explained how convex
cones are equivalent to preference relations on vector spaces. It is proved that the
Minkowski sum of a large collection of sets of vectors approximates the convex
hull of the union of the collection (lemma of Shapley–Folkman).
Road Map For each figure, the explanation below the figure has to be considered
as well.
• Definition 1.2.1 and Fig. 1.3 (convex set).
• Definition 1.2.6 and Fig. 1.4 (convex cone; note that a convex cone is not required
to contain the origin).
• Figure 1.5 (the ray, hemisphere and top-view model for a convex cone in
dimension one; similar in higher dimensions: for dimension two, see Figs. 1.8
and 1.9).
• Figure 1.10 (the (minimal) homogenization of a convex set in dimension two and
in the bounded case).
• Figure 1.12 (the ray, hemisphere and top-view model for a convex set in
dimension one; for dimension two, see Figs. 1.13 and 1.14).
• Definition 1.7.2 and Fig. 1.15 (convex hull).
• Definition 1.7.4 and Fig. 1.16 (conic hull; note that the conic hull contains the
origin).
• Definitions 1.7.6, 1.7.9, Proposition 1.7.8, its proof and the explanation preceding
it (the simplest illustration of the homogenization method).
• Section 1.8.1 (the ingredients for the construction of binary operations for convex
sets).
• Theorems 1.9.1, 1.10.1, 1.12.1 and the structure of their proofs (theorems of
Radon, Helly, Carathéodory, proved by the homogenization method).
• Definitions 1.12.6, 1.12.9 and 1.12.11 (polyhedral set, polyhedral cone and
Weyl’s theorem).
• Definition 1.10.5, Corollary 1.12.4 and its proof (use of compactness in the
proof).
• The idea of Sect. 1.13 (preference relations: a different angle on convex cones).
1.1 *Motivation: Fair Bargains

The aim of this section is to give an example of how a convex set can arise in a
natural way. You can skip this section, and similar first sections in later chapters, if
you want to come to the point immediately.
John Nash has solved the problem of finding a fair outcome of a bargaining
opportunity (see [1]). This problem does not involve convex sets, but its solution
requires convex sets and their properties.
Here is how I learned about Nash bargaining. Once I was puzzled by the outcome
of bargaining between two students, exchanging a textbook and sunglasses. The
prices of book and glasses were comparable—the sunglasses being slightly more
expensive. The bargain that was struck was that the student who owned the most
expensive item, the sunglasses, paid an additional amount of money. What could be
the reason for this? Maybe this student had poor bargaining skills? Someone told
me that the problem of what is a fair bargain in the case of equal bargaining skills
had been considered by Nash. The study of this theory, Nash bargaining, was an
eye-opener: it explained that the bargain I had witnessed and that initially surprised
me, was a fair one.
Nash bargaining will be explained by means of a simple example. Consider Alice
and Bob. They possess certain goods and these have known utilities (≈pleasure) to
them. Alice has a bat and a box, Bob has a ball. The bat has utility 1 to Alice and
utility 2 to Bob; the box has utility 2 to Alice and utility 2 to Bob; the ball has utility
4 to Alice and utility 1 to Bob.
If Alice and Bob were to come to the bargaining agreement to exchange all their
goods, they would both be better off: their total utility would increase—for Alice
from 3 to 4 and for Bob from 1 to 4. Bob would profit much more than Alice from
this bargain. So this bargain does not seem fair.
Now suppose that it is possible to exchange goods with a certain probability.
Then we could modify this bargain by agreeing that Alice will give the box to
Bob with probability 1/2. For example, a coin could be tossed: if heads comes
up, then Alice should give the box to Bob, otherwise Alice keeps her box. This
bargain increases expected utility for Alice from 3 to 5 and for Bob from 1 to 3.
This looks more fair. More is true: we will see that this is the unique fair bargain in
this situation.
To begin with, we model the bargain where bat, box and ball are handed over with
probability p1 , p2 , p3 respectively. To keep formulas as simple as possible, we do
this by working with a different but equivalent measure of utility after the bargain:
expected change in utility instead of expected utility. To this end, we consider the
vector in R2 that has as coordinates the expected changes in total utility for Alice
and Bob:

p1 (−1, 2) + p2 (−2, 2) + p3 (4, −1).

Indeed, this is the difference of the following two vectors: the vector of expected utilities after the bargain,

(1 − p1)(1, 0) + (1 − p2)(2, 0) + (1 − p3)(0, 1) + p1 (0, 2) + p2 (0, 2) + p3 (4, 0),

and the vector of the utilities before the bargain, (1, 0) + (2, 0) + (0, 1). Thus the
utilities have been adjusted.
We allow Alice and Bob to make no agreement; in this example, it is decided in
advance that in case of no agreement, each one keeps her or his own goods.
Now we model the bargaining opportunity of Alice and Bob: as a pair (F, v),
where F , the set of bargains, is the set of all vectors p1 (−1, 2) + p2 (−2, 2) +
p3 (4, −1) with 0 ≤ p1 , p2 , p3 ≤ 1, and v, the disagreement point, is the origin
(0, 0). Figure 1.1 illustrates this bargaining opportunity.
The set F contains v and it has two properties that are typical for all bargaining
opportunities. We have to define these properties, compactness and convexity. That
F is compact means that it is bounded—there exists a bound M > 0 for the absolute
values of the coordinates of F , |xi | ≤ M for all x ∈ F, i = 1, 2—and that it
is closed—for each convergent sequence in R2 that is contained in F , its limit is
also contained in F . These properties are easily checked and they are evident from
Fig. 1.1. So F is compact. That F is convex means that it consists of one piece, and
has no holes or dents. Figure 1.2 illustrates the concept of convex set.
For a precise definition, see Definition 1.2.1. We readily see from Fig. 1.1 that F
is convex.
[Figure 1.1: the set F of bargains and the disagreement point v.]

[Figure 1.2: five sets, labeled yes, yes, no, no, no according to whether they are convex.]

A rule ϕ assigns to each bargaining opportunity (F, v) an outcome ϕ(F, v) ∈ F. The outcome should be fair in
some sense to be made precise. To this end, we give three properties that we can
reasonably expect a rule ϕ to have, in order for it to deserve the title ‘fair rule’.
1. For each bargaining opportunity (F, v), there does not exist x ∈ F for which
x > ϕ(F, v).
This represents the idea that each individual wishes to maximize the utility to
himself of the bargain.
2. For each two bargaining opportunities (E, v) and (F, v) for which E ⊆ F and
ϕ(F, v) ∈ E, one has ϕ(E, v) = ϕ(F, v).
This represents the idea that eliminating from F feasible points (other than the disagreement point) that would not have been chosen should not affect the outcome.
3. If a bargaining opportunity (F, v) can be brought by transformations of the two
utility functions of the type vi = ai ui + bi , i = 1, 2 with ai > 0, in symmetric
form, in the sense that (r, s) ∈ F implies (s, r) ∈ F and v is on the line y = x,
then if (F, v) is written in symmetric form, one has that ϕ(F, v) is on the line
y = x.
This represents equality of bargaining skills: if both have the same bargaining
position, then they will get the same utility out of it.
What comes now surprises most people; it certainly surprised me. These requirements single out one unique rule among all possible rules. This unique rule deserves to be called the fair rule: it is reasonable to demand that a fair rule satisfies the three properties above, and moreover it is the only rule satisfying these properties. Furthermore, it turns out that this rule can be described by means of an optimization problem.
Theorem 1.1.1 (Nash) There is precisely one rule ϕ that satisfies the three
requirements above. This rule associates to a bargaining opportunity (F, v)—
consisting of a compact convex set F in the plane R2 and a point v ∈ F for which
there exists w ∈ F with w > v—the unique solution of the following optimization
problem:

max f(x) = (x1 − v1)(x2 − v2), x ∈ F, x ≥ v.
Example 1.1.2 (The Fair Bargain for Alice and Bob) Now we can verify that the
following bargain is optimal for the bargaining opportunity of Alice and Bob: Bob
gives the ball to Alice, Alice gives the bat to Bob, and a coin is tossed: if it lands on
heads, then Alice keeps the box, if it lands on tails, then Alice gives the box to Bob.
That is, the objects are handed over with probabilities p1 = 1, p2 = 1/2, p3 = 1.
This gives the vector (−1, 2) + (1/2)(−2, 2) + (4, −1) = (2, 2) in the set F of allowable
bargains. By Theorem 1.1.1, it suffices to check that (2, 2) is a point of maximum
of the function f (x) = (x1 − v1 )(x2 − v2 ) = x1 x2 on the set F , which is drawn in
Fig. 1.1.
To do this, it suffices to check that the line x1 + x2 = 4 separates the set F and
the solution set of f (x) ≥ f (2, 2), that is, of x1 x2 ≥ 4, x1 ≥ 0, x2 ≥ 0, and to
check that (2, 2) is the only point on this line that belongs to both sets. Well, on the
one hand, two adjacent vertices of the polygon F , the points (1, 3) and (3, 1)—the
two vertices in Fig. 1.1 that lie in the first orthant x1 ≥ 0, x2 ≥ 0—lie on the line
x1 + x2 = 4, so F lies on one side of this line and its intersection with this line
is the closed line interval with endpoints (1, 3) and (3, 1). On the other hand, the
solution set of x1 x2 ≥ 4, x1 ≥ 0, x2 ≥ 0—which is bounded by one branch of the
hyperbola x1 x2 = 4—lies on the other side of the line x1 + x2 = 4, and its only
common point with this line is (2, 2); this follows as the line x1 + x2 = 4 is tangent
to the hyperbola x1 x2 = 4 at the point (2, 2). This completes the verification that
the given bargain is the unique fair one.
Later, we will solve the bargaining problem of Alice and Bob in a systematic
way, using the theory of convex optimization, in Example 8.3.5.
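If you want to see this optimum emerge numerically, the following brute-force search (a sketch of ours; the book solves the problem geometrically above and systematically in Example 8.3.5) scans a grid of probabilities (p1, p2, p3) and maximizes f(x) = x1 x2 over the resulting points of F with x ≥ v = (0, 0):

```python
import numpy as np

gains = np.array([[-1.0, 2.0],    # bat handed over: change (-1, +2)
                  [-2.0, 2.0],    # box handed over: change (-2, +2)
                  [ 4.0, -1.0]])  # ball handed over: change (+4, -1)

best, best_p = -np.inf, None
grid = np.linspace(0.0, 1.0, 51)
for p1 in grid:
    for p2 in grid:
        for p3 in grid:
            x = p1 * gains[0] + p2 * gains[1] + p3 * gains[2]
            if x[0] >= 0 and x[1] >= 0 and x[0] * x[1] > best:
                best, best_p = x[0] * x[1], (p1, p2, p3)

print(best_p, best)   # approximately (1.0, 0.5, 1.0) and 4.0: the fair bargain (2, 2)
```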
Thus, by using convex sets and their properties, a very interesting result has been obtained. The applications are limited by the fact that they require that the two participants cooperate and give enough information, so that the compact convex set F can be determined. The temptation to influence the shape of F, and so the outcome, by giving untruthful information must be resisted. However, this result sheds light on bargaining opportunities that are carried out in an amicable way, for
example among friends. This is precisely the case with the bargain of a textbook and
sunglasses, which led me to study Nash bargaining. Nash bargaining makes clear
that this is a fair bargain! Indeed, the student who owned the textbook had passed her exam, had no more need for the book, and could sell it to one of the many students who had to take the course next year; the other student really needed the textbook as she had to take the course. The student who owned the sunglasses had, on second thought, regrets that she had bought them; the other student liked them but did not
have to buy them. If you translate this information into a bargaining opportunity
of the form (F, v), you will see that you get that the fair outcome according to
Nash agrees with the outcome of the textbook and sunglasses bargaining that I had
observed in reality.
If you are interested in the proof of Theorem 1.1.1, here is a sketch. It already
contains some properties of convexity in embryonic form. These properties will be
explained in detail later in this book.
Proof of the Nash Bargaining Theorem We begin by proving that the optimiza-
tion problem has a unique optimal solution. By the theorem of Weierstrass that a
continuous function on a nonempty compact set assumes its maximal value, there
exists at least one optimal solution. This optimal solution is > v by the assumption
that there exists w ∈ F for which w > v and so f (w) > f (v) = 0. The
optimization problem can be rewritten, by taking the logarithm, changing the sign,
and replacing the constraints x ≥ v by x > v, as follows:

min g(x) = − ln(x1 − v1) − ln(x2 − v2), x ∈ F, x > v.
The function g is convex, that is, its epigraph, the region above or on the graph
of g, that is, the set of pairs (x, ρ) for which x ∈ F, x > v, g(x) ≤ ρ, is a convex
set in three-dimensional space R3 . The function g is even strictly convex, that is, it
has the additional property that its graph contains no line segment of positive length.
We do not display the verification of these properties of g. These properties imply
that g cannot have more than one minimum. We show this by contradiction. If there
would be two different minima x and y, then g(x) = g(y) and so, by convexity of
g, the line segment with endpoints (x, g(x)) and (y, g(y)) lies nowhere below the
graph of g. But by the minimality property of x and y, this line segment must lie
on the graph of g. This contradicts the strict convexity of g. Thus we have proved
that the optimization problem max f (x) = (x1 − v1 )(x2 − v2 ), x ∈ F, x ≥ v has a
unique solution.
Now assume that ϕ is a rule that satisfies the three requirements. We want to
prove that ϕ(F, v) is the unique solution of the optimization problem max f (x) =
(x1 − v1 )(x2 − v2 ), x ∈ F, x ≥ v for each bargaining opportunity (F, v). We
can assume, by an appropriate choice of the utility functions of the two individuals,
that the two coordinates of the optimal solution of the optimization problem above
are equal to 1 and that v1 = v2 = 0. Then (1, 1) is the solution of the problem
max x1 x2 , x ∈ F, x ≥ 0. We claim that F is contained in the half-plane G given
by x1 + x2 ≤ 2. We prove this by contradiction. Assume that there exists a point
x ∈ F outside this half-plane. Then the line interval with endpoints (1, 1) and x
is contained in F , by the convexity of F . This line interval must contain points y
near (1, 1) with y1 y2 > 1 as a simple picture makes clear (and a calculation can
show rigorously). This is in contradiction with the maximality of (1, 1). As G is
symmetric, we get ϕ(G, 0) = (1, 1). Moreover ϕ(G, 0) ∈ F so ϕ(F, 0) = (1, 1).
Thus we have proved that a rule ϕ that satisfies the three requirements, must be
given by the optimization problem above.
Finally, it is not hard to show that, conversely, the rule that is given by the
optimization problem, satisfies the three requirements. We do not display the
verification.
1.2 Convex Sets and Cones: Definitions

Now we begin properly: with our central concept, the convex set. We will use the
notation R+ = [0, +∞) and R++ = (0, +∞).
Definition 1.2.1 A set A ⊆ Rn , where Rn is the space of n-columns, is called
convex if it contains for each two of its points also the entire line segment having
these points as endpoints,
a, b ∈ A, ρ, σ ∈ R+ , ρ + σ = 1 ⇒ ρa + σ b ∈ A.
[Figure 1.3: five subsets of the plane, labeled yes, yes, no, no, no according to whether they are convex.]

The convex subsets of the line R can be enumerated:

∅, (a, b), [a, b], (a, b], [a, b), (a, +∞), (−∞, a), [a, +∞), (−∞, a], (−∞, +∞).
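Definition 1.2.1 can also be probed experimentally: sampling pairs of points and a random weight can refute convexity, though it can never prove it. A small sketch of ours, on a disk (convex) and an annulus (not convex, it has a hole):

```python
import numpy as np

rng = np.random.default_rng(1)

def in_disk(p):     # the closed unit disk: convex
    return np.linalg.norm(p) <= 1.0

def in_annulus(p):  # a ring: not convex
    return 0.5 <= np.linalg.norm(p) <= 1.0

def looks_convex(member, trials=10000):
    for a, b in rng.uniform(-1, 1, size=(trials, 2, 2)):
        if member(a) and member(b):
            rho = rng.uniform()
            if not member(rho * a + (1 - rho) * b):
                return False   # a segment leaves the set: not convex
    return True                # no counterexample found

print(looks_convex(in_disk), looks_convex(in_annulus))   # True False
```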
the i-th vertex as the first one of its two boundary vertices, going counter-
clockwise. Then ϕ1 , . . . , ϕk is a sequence of positive numbers in (0, π ) with
sum (k −2)π , and l1 , . . . , lk is a sequence of positive numbers. All such pairs
of sequences are obtained for a unique k-sided convex polygon, if convex
polygons that differ by rotations and translations are identified. This result
has no simple generalization beyond the plane.
(c) Here is another way to enumerate convex polygons. Let a k-sided convex
polygon be given. Let T be the set of its outside normals t of unit length, ‖t‖ = (t1^2 + t2^2)^{1/2} = 1; for each t ∈ T, let at be the length of the side with normal t. Then Σ_{t∈T} at t = 0. All such pairs of sequences are obtained for a
unique k-sided polygon, if polygons that differ by rotations and translations
are identified. The advantage of this result is that it has a nice straightforward
generalization beyond the plane.
(d) Note that a convex k-sided polygon has k vertices, and that this set of vertices
characterizes the polygon. However, the characterization of sets of k points
in the plane that are the vertices of a convex k-sided polygon is not so simple.
This is a disadvantage of enumerating polygons by their vertices.
(e) For each convex set A that contains a small disk with center the origin and
for each ε > 0, there is a convex polygon P such that P ⊆ A ⊆ (1 + ε)P.
Each open ray in Rn contains a unique unit vector, and so the collection of open
rays of a cone in Rn can be viewed as a subset of the standard unit sphere x1^2 + · · · + xn^2 = 1. Each open ray Ru with positive last coordinate, un > 0, contains a
unique vector with last coordinate equal to 1, (x1 , . . . , xn−1 , 1); this gives a point
(x1 , . . . , xn−1 ) in Rn−1 . Therefore, the collection of open rays of a cone in Rn that
is contained in the halfspace xn > 0 can be viewed as a subset of Rn−1 .
Example 1.2.5 (A Cone Determined by a Positive Homogeneous Function) Let f : Rn → R be a positive homogeneous function of degree ν ∈ R, that is, f(ρx) = ρ^ν f(x) for all ρ > 0 and x ∈ Rn. Then the solution set of f(x) = 0 is a cone. For example, the function fn(x1, x2, x3) = x2^2 x3 − x1^3 + n^2 x1 x3^2 is positive homogeneous of degree 3 for all n ∈ R. The set of solutions of fn = 0 for which x3 ≥ 0 is a cone that has two horizontal rays, R(0,1,0) and R(0,−1,0), and the set of rays that are contained in the open halfspace x3 > 0 can be viewed as the curve x2^2 = x1^3 − n^2 x1 in the plane R2.
[Aside. This curve is related to the famous unsolved congruent number problem: determine whether a square-free natural number n is a congruent number, that is, the area of a right triangle with rational number sides. Such triangles correspond one-to-one to the nontrivial rational points on the curve x2^2 = x1^3 − n^2 x1 (nontrivial means x2 ≠ 0). There is an easily testable criterion for the existence of such points, but its validity depends on the truth of the Birch and Swinnerton-Dyer conjecture, which is one of the Millennium Problems ('million dollar problems').]
We are only interested in a special type of cones.
Definition 1.2.6 A convex cone C ⊆ Rn is a set in Rn that is simultaneously a cone
and a convex set. A simpler description of a convex cone is that it is a set C ⊆ Rn
that contains for each two elements also all their positive linear combinations,
a, b ∈ C, σ, τ ∈ R++ ⇒ σ a + τ b ∈ C.
Note that the intersection of any collection of convex cones in Rn is again a convex
cone.
Warning In some texts, convex cones are required to contain the origin. This hardly
matters: if we adjoin the origin to a convex cone that does not contain the origin,
we get a convex cone containing the origin. However, for some constructions it
is convenient to allow convex cones that do not contain the origin. The precise
relation between convex cones that do not contain the origin and convex cones that
do contain the origin is given in Exercise 10.
Example 1.2.7 (Convex Cone)
1. Figure 1.4 illustrates the concept of convex cone.
The left hand side illustrates the defining property in the simpler description.
The right side illustrates why the word ‘cone’ has been chosen: a typical three-
dimensional cone looks like an extended ice-cream cone.
2. Convex cones in the line R can be enumerated:

∅, (0, +∞), (−∞, 0), [0, +∞), (−∞, 0], (−∞, +∞).
3. Convex cones in the plane can be enumerated. If they are closed (that is contain
their boundary points), then the list is: ∅, R2 and each region with boundary the
two legs of an angle ≤ π , with legs having their endpoint at the origin.
4. Three interesting convex cones in R3 :
(a) the first orthant R3+ = {x ∈ R3 | x1 ≥ 0, x2 ≥ 0, x3 ≥ 0},
(b) the Lorentz cone or ice cream cone L3 = {x ∈ R3 | x3 ≥ (x1^2 + x2^2)^{1/2}},
(c) the positive semidefinite cone: identify (a, b, c) ∈ R3 with the symmetric 2 × 2 matrix with rows (a, b) and (b, c); then this cone consists of the positive semidefinite matrices, that is, those with a ≥ 0, c ≥ 0 and ac − b^2 ≥ 0 (membership tests for all three cones are sketched in code below).
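A sketch of these membership tests (the helper names are ours; for the positive semidefinite cone we use the identification with symmetric 2 × 2 matrices given in (c)):

```python
import numpy as np

def in_orthant(x):
    x1, x2, x3 = x
    return x1 >= 0 and x2 >= 0 and x3 >= 0

def in_lorentz(x):
    x1, x2, x3 = x
    return x3 >= np.hypot(x1, x2)          # x3 >= sqrt(x1^2 + x2^2)

def in_psd(x):
    a, b, c = x                            # the matrix with rows (a, b), (b, c)
    return a >= 0 and c >= 0 and a * c - b * b >= 0

for test in (in_orthant, in_lorentz, in_psd):
    print(test((1.0, 1.0, 2.0)), test((1.0, 2.0, 1.0)))
```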
1.3 Chapters 1–4 in the Special Case of Subspaces

The aim of this section is to give some initial insight into the contents of Chaps. 1–4
of this book. The aim of these chapters is to present the properties of convex sets
and now we display the specializations of some of these properties from convex
sets to special convex sets, subspaces. A subspace of Rn is a subset L of Rn that is
closed under sums and scalar multiples, x, y ∈ L, α, β ∈ R ⇒ αx + βy ∈ L. A
subspace can be viewed as the central concept from linear algebra: it is the solution
set of a finite system of homogeneous linear equations in n variables. A subspace
is a convex cone and so, in particular, it is a convex set. The specializations of the
results from Chaps. 1–4 from convex sets to subspaces are standard results from
linear algebra.
Chapter 1 A subspace of X = Rn is defined to be a nonempty subset L ⊆ X that
is closed under linear combinations of two elements:
a, b ∈ L, ρ, σ ∈ R ⇒ ρa + σ b ∈ L.
Consider a system of two linear equations in two unknowns,

a11 x1 + a12 x2 = b1,
a21 x1 + a22 x2 = b2,
where for each equation the coefficients on the left side are not both zero. The system is either undetermined, or uniquely determined, or false. This distinction of cases, and in particular the intriguing choice of terminology 'false', might suggest that this is a substantial phenomenon. The distinction can be expressed as follows, using sets:
the solution set of the system is either infinite, or it consists of one element, or it
is the empty set. In geometric language: two lines in the plane are either identical,
or they intersect in one point, or they are parallel. A variant of homogenization (for
simplicity, we work with nonzero multiples instead of with positive multiples) turns
the system into the following form:
a11 x1 + a12 x2 = b1 x3 ,
a21 x1 + a22 x2 = b2 x3 ,
If these equations are not multiples of each other, then the system always has a unique nonzero solution up to scalar multiples. In geometric language: the intersection of two different planes in space through the origin is always a line. So homogenization makes the distinction of cases disappear, and it reveals the simple essence of the phenomenon of false systems of equations.
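In coordinates, the homogenized statement can be verified in one line: the vectors (a11, a12, −b1) and (a21, a22, −b2) are normals of the two planes through the origin, and their cross product spans the intersection line. A sketch with made-up numbers:

```python
import numpy as np

row1 = np.array([1.0, 2.0, -3.0])   # normal of  x1 + 2 x2 = 3 x3
row2 = np.array([2.0, -1.0, 4.0])   # normal of 2 x1 -   x2 = -4 x3

direction = np.cross(row1, row2)    # spans the solution line of the system
print(direction, row1 @ direction, row2 @ direction)   # the dot products are 0
```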
Chapter 2 There are two standard binary operations for subspaces, the intersection
L1 ∩ L2 and the Minkowski sum
L1 + L2 = {x + y | x ∈ L1 , y ∈ L2 }.
These can be described systematically as follows. The subspace L1 + L2 is the image of the set L1 × L2 under the linear function sum map +X : X × X → X, given by the recipe (x, y) → x + y. The subspace L1 ∩ L2 is the inverse image of the same set L1 × L2 under the linear function diagonal map ΔX : X → X × X, given by the recipe x → (x, x). This description has the advantage that it reveals a relation between ∩ and +, as the linear functions ΔX and +X are each other's transpose with respect to the inner product on X × X,

(x, y) · (v, w) = x1 v1 + · · · + xn vn + y1 w1 + · · · + yn wn.

The orthogonal complement of a subspace L ⊆ X is defined by

L⊥ = {a ∈ X | a · x = a1 x1 + · · · + an xn = 0 ∀x ∈ L}.
That is, L⊥ consists of all a ∈ X for which L is contained in the solution set of the
equation a1 x1 + · · · + an xn = 0. One has the duality theorem L⊥⊥ = L. Moreover, one has the calculus rule that K1 + K2 and L1 ∩ L2 are each other's orthogonal complements if Ki and Li are each other's orthogonal complements for i = 1, 2. This rule is an immediate consequence of the systematic description of ∩ and + and the following property: if the subspaces K, L ⊆ Rn are each other's orthogonal complements, then for each m × n-matrix M, the following two subspaces of Rm are each other's orthogonal complements: {Mx | x ∈ L}, the image of L under M, and {y | M^T y ∈ K}, the inverse image of K under the transpose of M.
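This last property can be tested numerically. The sketch below (random data of ours) computes the inverse image of K = L⊥ under M^T as a null space and checks that it is orthogonal to the image of L under M:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3
L = rng.standard_normal((n, 2))     # basis of a 2-dimensional subspace L of R^n
M = rng.standard_normal((m, n))

P = L @ np.linalg.pinv(L)           # orthogonal projector of R^n onto L
# y lies in the inverse image of K under M^T  iff  M^T y is orthogonal to L,
# i.e. iff P (M^T y) = 0: so take the null space of P M^T via the SVD.
_, s, Vt = np.linalg.svd(P @ M.T)
inv_image = Vt[np.sum(s > 1e-10):]  # rows spanning {y | M^T y in K}

print(np.abs(inv_image @ (M @ L)).max())   # ~0: the two subspaces are orthogonal
```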
1.4 Convex Cones: Visualization by Models

The aim of this section is to present three models for convex cones C ⊆ X = Rn
that lie in the upper halfspace xn ≥ 0: the ray model, the hemisphere model and the
top-view model.
Example 1.4.1 (Ray, Hemisphere and Top-View Model for a Convex Cone) Fig-
ure 1.5 illustrates the three models in one picture for a convex cone C in the plane
R2 that lies above or on the horizontal axis.
The ray model visualizes this convex cone C ⊆ R2 by viewing C \ {02 } as a
collection of open rays, and then considering the set whose elements are these rays;
the hemisphere model visualizes C by means of an arc: the intersection of C with the
upper half-circle x1^2 + x2^2 = 1, x2 ≥ 0; the top-view model visualizes C by looking
from high above at the arc, or, to be precise, by taking the interval in [−1, +1] that
is the orthogonal projection of the arc onto the horizontal axis.
The three models are all useful for visualization in low dimensions. Fortunately,
most, maybe all, properties of convex cones in Rn for general n can be understood
by looking at such low dimensional examples. Therefore, making pictures in your
head or with pencil and paper by means of one of these models is of great help
to understand properties of convex cones—and so to understand the entire convex
analysis, as we will see.
One often chooses a representative for each open ray, in order to replace open rays—
which are infinite sets of points—by single points. A convenient way to do this is
to normalize: that is, to choose the unit vector on each ray. Thus the one-sided
directions in the space X = Rn are modeled as the points on the standard unit sphere SX = Sn = {x ∈ X | ‖x‖ = 1} in X. The set of unit vectors in a convex cone C ⊆ X = Rn will be called the sphere model for C. The subsets of the standard unit sphere SX that one gets in this way are precisely the geodesically convex subsets of SX. A subset T of SX is called geodesically convex if for each two different points p, q of T that are not antipodes (p ≠ −q), the shortest curve on SX that connects
them is entirely contained in T . This curve is called the geodesic connecting these
two points. Note that for two different points p, q on SX that are not antipodes, there
is a unique great circle on SX that contains them. A great circle on SX is a circle on
SX with center the origin, that is, it is the intersection of the sphere SX with a two
dimensional subspace of X. This great circle through p and q gives two curves on
SX on this circle connecting the two points, a short one and a long one. The short
one is the geodesic connecting these points.
Example 1.4.3 (Sphere Model for a Convex Cone)
1. Figure 1.5 above illustrates the sphere model for convex cones in dimension two.
The convex cone C is modeled by an arc.
2. Figure 1.6 illustrates the sphere model for convex cones in dimension three, and
it illustrates the concepts great circle and geodesic on SX for X = R3 .
Two great circles are drawn. These model two convex cones that are planes
through the origin. For the two marked points on one of these circles, the
segment on this circle on the front of the sphere connecting the two points is
their geodesic. Moreover, you see that two planes in R3 through the origin (and
remember that a plane is a convex cone) are modeled in the sphere model for convex cones by two great circles on the sphere.
3. Figure 1.7 is another illustration of the sphere model for a convex cone C in
dimension two.
Here the convex cone C is again modeled by an arc. The sphere model is
convenient for visualizing the one-sided directions determined by a convex cone
C ⊆ X = Rn for dimension n up to three: for n = 3, a one-sided direction in X is modeled as a point on the sphere SX ⊆ R3.

[Figures 1.6 and 1.7: the sphere SX with great circles and geodesics; arcs modeling convex cones C.]
Often, we will be working with convex cones in X = Rn that lie above or on the
horizontal coordinate hyperplane xn = 0. Then the unit vectors of these convex
cones lie on the upper hemisphere x1^2 + · · · + xn^2 = 1, xn ≥ 0. Then the sphere
model is called the hemisphere model.
Example 1.4.4 (Hemisphere Model for a Convex Cone)
1. Figure 1.5 above illustrates the hemisphere model for convex cones in dimension
two. The convex cone C is modeled by an arc on the upper half-circle.
2. Figure 1.8 illustrates the hemisphere model for three convex cones C in
dimension three.
The models of these three convex cones C are shaded regions on the upper
hemisphere. These regions are indicated by the same letter as the convex cone
that they model, by C. The number of points in the model of C that lie on the
circle that bounds the hemisphere is, from left to right: zero, one, infinitely many.
This means that the number of horizontal rays of the convex cone C is, from left
to right: zero, one, infinitely many.
Project the hemisphere model of a convex cone C ⊆ X = Rn orthogonally onto the horizontal coordinate hyperplane xn = 0; this gives a subset of the standard closed unit ball in Rn−1. This subset will be called the top-view model for a convex cone.
Example 1.4.5 (Top-View Model for a Convex Cone)
1. Figure 1.5 above illustrates the top-view model for convex cones in dimension
two. Here the convex cone C is modeled by a closed line segment contained in
the closed interval [−1, 1], the standard closed unit ball B1 in R1 .
2. Figure 1.9 illustrates the top-view model for three convex cones C in dimension
three.
The models for these three convex cones C are shaded regions in the standard
closed unit disk B2 in the plane R2 , which are indicated by the same letter as the
convex cones that they model, by C. All three convex cones that are modeled,
are chosen in such a way that they contain the positive vertical coordinate axis
{(0, 0, ρ) | ρ > 0} in their interior. As a consequence, their top-view models
contain in their interior the center of the disk. The arrows represent images under
orthogonal projection on the plane x3 = 0 of some geodesics that start at the top
of the hemisphere. The number of points in the model of C that lie on the circle
is, from left to right: zero, one, infinitely many. This means that the number of
horizontal rays of the convex cone C is, from left to right: zero, one, infinitely
many.
The top-view model for convex cones can be used for visualization of convex
cones in one dimension higher than the sphere and the hemisphere model, as it
models one-sided directions of C ⊆ X = Rn as points in the ball BRn−1 ⊆ Rn−1 :
so it can be used for visualization up to and including dimension four, whereas the
sphere and hemisphere model can be used only up to and including dimension three.
The top-view model has only one disadvantage: the orthogonal projections of the
geodesics between two points on the hemisphere, which are needed to characterize
the subsets of the ball that model convex cones, are less nice than the geodesics
themselves. These geodesics on the hemisphere are nice: they are halves of great
circles. Their projections onto the ball are less nice: they are halves of ellipses that
are contained inside the ball and that are tangent to the boundary of the ball at two
antipodes on the ball. The exception to this are the projections of halves of great
circles through the top of the hemisphere: they are nice—they are just straight line
segments connecting two antipodes on the boundary of the ball, and so they run
through the center of the ball.
Example 1.4.6 (Straight Geodesics in the Top-View Model) Figure 1.9 above illus-
trates the projections of these geodesics: some straight arrows are drawn from the
center of the disk to the boundary of the top-view model for the convex cone C.
Now we come to the distinctive feature of this book: the consistent use of the
description of a convex set in terms of a convex cone, its homogenization or
conification.
Now we are going to present homogenization of convex sets, the main ingredient
of the unified approach to convex analysis that is used in this book. That is, we are
going to show that every convex set in X = Rn —and we recall that this is our basic
object of study—can be described by a convex cone in a space one dimension higher,
X × R = Rn+1 , that lies entirely on or above the horizontal coordinate hyperplane
xn+1 = 0, that is, it lies in the closed upper halfspace xn+1 ≥ 0.
Example 1.5.1 (Homogenization of a Bounded Convex Set) Figure 1.10 illustrates
the idea of homogenization of convex sets for a bounded convex set A in the plane
R2 .
This crucial picture can be kept in mind throughout this book!
In the figure, the convex set, drawn lying on the floor, is indicated by the same letter as the set itself, by A. This set is lifted upward in
vertical direction to level 1, giving the set A × {1}. Then the union of all open rays
that start at the origin and that run through a point of the set A × {1} is taken. This
union is defined to be the homogenization or conification C = c(A) of A. Then A
is called the (convex set) dehomogenization or deconification of C.
There is no loss of information if we go from A to C: we can recover A from
C as follows. Intersect C with the horizontal plane at level 1. This gives the set
A × {1}. Then drop this set down on the floor in vertical direction. This gives the set
A, viewed as lying on the floor.
The formal definition is as follows. It is convenient to reverse the order and to
define first the dehomogenization and then the homogenization. If C ⊆ X × R =
Rn+1 is a convex cone and C ⊆ X × R+ , where R+ = [0, +∞)—that is, C lies on
or above the horizontal hyperplane X × {0}, or, to be more precise, C is contained
in the closed upper halfspace xn+1 ≥ 0—then the intersection C ∩ (X × {1}) is of
the form A × {1} for some unique set A = d(C) ⊆ X. This set is convex. Here is
the verification.
If a, b ∈ A and ρ, σ ∈ [0, 1] with ρ + σ = 1, then (a, 1) and (b, 1) lie in C, and so, as C is a convex cone, the point ρ(a, 1) + σ(b, 1) = (ρa + σb, 1) lies in C and also in X × {1}; so it lies in A × {1}. This gives ρa + σb ∈ A. This proves
that A is a convex set, as claimed. All convex sets A ⊆ X are the dehomogenization
of some convex cone C ⊆ X × R+ .
Definition 1.5.2 The (convex set) dehomogenization or deconification of a convex
cone C ⊆ X × R+ is the convex set A = d(C) ⊆ X defined by
A × {1} = C ∩ (X × {1}).
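To make the correspondence concrete, here is a minimal computational sketch in Python (an illustration of ours, not part of the book; the names conify_sample and deconify are our own): the conification of a finite set is represented by sample points on the generating rays, and the original set is recovered by scaling each point back to the level xn+1 = 1.

```python
import numpy as np

def conify_sample(A, scales=(0.5, 1.0, 2.0)):
    """Sample points of the cone generated by A x {1} for a finite A in R^n.

    Each row a of A is lifted to (a, 1); multiplying by positive scalars
    gives points on the open ray through (a, 1)."""
    A = np.asarray(A, dtype=float)
    lifted = np.hstack([A, np.ones((A.shape[0], 1))])   # the set A x {1}
    return np.vstack([t * lifted for t in scales])

def deconify(C):
    """Recover A from cone points: intersect with the level x_{n+1} = 1
    by dividing each point by its (positive) last coordinate."""
    C = np.asarray(C, dtype=float)
    assert np.all(C[:, -1] > 0), "points must lie in the open upper halfspace"
    return (C[:, :-1].T / C[:, -1]).T

A = np.array([[2.0, 3.0], [-1.0, 0.5]])
print(np.unique(np.round(deconify(conify_sample(A)), 12), axis=0))
# recovers exactly the two rows of A: no information is lost
```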
Example 1.5.3 (Deconification) The convex cones
C1 = {x ∈ R^3 | x1^2 + x2^2 − x3^2 ≤ 0, x3 ≥ 0},
C2 = {x ∈ R^3 | 2x1^2 + 3x2^2 − 5x3^2 ≤ 0, x3 ≥ 0},
C3 = {x ∈ R^3 | −x1^2 + x2x3 ≤ 0, x3 ≥ 0},
C4 = {x ∈ R^3 | −x1x2 + x3^2 ≤ 0, x1 ≥ 0, x2 ≥ 0, x3 ≥ 0}
have as deconifications the convex sets that have the following boundaries, respectively: the circle x1^2 + x2^2 = 1, the ellipse 2x1^2 + 3x2^2 = 5, the parabola x2 = x1^2, and the branch of the hyperbola x1x2 = 1 that is contained in the first quadrant x ≥ 0.
We have already stated the fact that each convex set A ⊆ X is the deconification
of a suitable convex cone C ⊆ X × R+ . This convex cone C, a homogenization of
A, is not unique. To give a precise description of the non-uniqueness we need the
important concept recession direction.
Definition 1.5.4 Let A ⊆ X = Rn be a convex set. A recession vector of A is a
vector c ∈ X such that a + tc ∈ A for all a ∈ A and all t ≥ 0. This means, if c ≠ 0n,
that one can travel in A in the one-sided direction of c without ever leaving A. So
you can escape to infinity within A in this direction. The recession cone RA of the
convex set A is the set in X consisting of all recession vectors of A. The set RA is
a convex cone. A recession direction of A is a one-sided direction of RA , that is, an
open ray of the convex cone RA .
Example 1.5.5 (Recession Cone (Geometric)) Figure 1.11 illustrates the concept of
recession cone for convex sets with boundary an ellipse, a parabola and a branch of
a hyperbola respectively.
On the left, a closed convex set A is drawn with boundary an ellipse. This set is bounded; there are no recession directions: RA = {0n}. In the middle, a closed convex set A is drawn with boundary a parabola. This set is unbounded; there is a
unique recession direction: RA is a closed ray. This recession direction is indicated
by a dotted half-line: c is a nonzero recession vector. On the right, a closed convex
set A is drawn with boundary a branch of a hyperbola. This set is unbounded; there are infinitely many recession directions: RA is the closed convex cone in the plane
with legs the directions of the two asymptotes to the branch of the hyperbola that is
the boundary of A.
Example 1.5.6 (Recession Cone (Analytic))
1. Ellipse. The recession cone of 2x1^2 + 3x2^2 ≤ 5 is {02}.
2. Parabola. The recession cone of x2 ≥ x1^2 is the closed ray containing (0, 1).
3. Hyperbola. The recession cone of x1x2 ≥ 1, x1 ≥ 0, x2 ≥ 0 is the first quadrant R^2_+.
At this point, it is recommended that you find all the recession vectors of some
simple convex subsets of the line and the plane, such as in Exercise 27.
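In the same spirit as Exercise 27, candidate recession vectors can also be tested numerically. The following sketch (our own illustration, not from the book) checks the defining condition a + tc ∈ A on a finite sample of points a ∈ A and step sizes t for the parabola set of Example 1.5.6; a finite sample can of course only falsify, never prove, the condition.

```python
import numpy as np

def in_parabola_set(p):
    """Membership test for A = {(x1, x2) | x2 >= x1^2}."""
    x1, x2 = p
    return x2 >= x1 ** 2

def looks_like_recession_vector(c, members, in_set, ts=(0.5, 1.0, 10.0, 1e3)):
    """Necessary condition: a + t*c stays in A for sampled a in A, t >= 0."""
    return all(in_set(a + t * np.asarray(c)) for a in members for t in ts)

sample = [np.array([0.0, 0.0]), np.array([2.0, 5.0]), np.array([-3.0, 9.0])]
print(looks_like_recession_vector((0.0, 1.0), sample, in_parabola_set))  # True
print(looks_like_recession_vector((1.0, 0.0), sample, in_parabola_set))  # False
```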
The following result describes all conifications of a given nonempty convex set
A ⊆ X.
Proposition 1.5.7 Let A ⊆ X = R^n be a nonempty convex set. For each conification C ⊆ X × R+ of A, the intersection of C with the open upper halfspace X × R++ is the smallest convex cone containing A × {1}. The conifications of A are precisely the convex cones between the convex cones cmin(A) = R++(A × {1}) and cmax(A) = cmin(A) ∪ (RA × {0}), that is, the convex cones C with cmin(A) ⊆ C ⊆ cmax(A).
So cmin (A) and cmax (A) are the minimal and maximal conifications of A respec-
tively. We will choose the minimal homogenization of A to be our default
homogenization. We write c(A) = cmin (A) and we call c(A) the homogenization of
A. The recession cone RA of a convex set A ⊆ X is an important concept in convex
analysis. It arises here naturally from homogenization: RA × {0} is the intersection
of cmax (A), the maximal conification of A, with the horizontal hyperplane X × {0}.
The main point that this book tries to make is that all concepts in convex analysis,
their properties and all proofs, arise naturally from homogenization. The recession
cone is the first example of this phenomenon. In Chap. 3, the recession cone will be
considered in more detail.
Example 1.5.8 (Minimal and Maximal Conification)
1. Consider the convex set A = (−7, +∞). This is the solution set of the inequality x1 > −7. Homogenization of this inequality gives x1 > −7x2, that is, x2 > −(1/7)x1. So cmin(A) = {x ∈ R^2 | x2 > −(1/7)x1, x2 > 0}. One has RA = [0, +∞). It follows that cmax(A) = {x ∈ R^2 | x2 > −(1/7)x1, x2 ≥ 0}.
2. Here is a numerical illustration of the minimal and maximal homogenization for
the three convex sets in Fig. 1.10 with a convenient choice of coordinates.
• Ellipse. Let the convex set A be the solution set of x1^2 + 5x2^2 − 1 ≤ 0. Then the recession cone is RA = {02}. The minimal homogenization c(A) = cmin(A) is the solution set of
x1^2 + 5x2^2 − x3^2 ≤ 0, x3 > 0,
and the maximal homogenization cmax(A) is the solution set of
x1^2 + 5x2^2 − x3^2 ≤ 0, x3 ≥ 0.
• Parabola. Let the convex set A be the solution set of x2 ≥ x1^2. Then the recession cone RA is the solution set of x1 = 0, x2 ≥ 0. The minimal homogenization c(A) = cmin(A) is the solution set of
x2x3 ≥ x1^2, x3 > 0,
and the maximal homogenization cmax(A) is the solution set of
x2x3 ≥ x1^2, x2 ≥ 0, x3 ≥ 0.
• Hyperbola. Let the convex set A be the solution set of x1^2 − x2^2 − 1 ≤ 0, x2 ≥ 0. Then the recession cone RA is the epigraph of the absolute value function, the solution set of x2 ≥ |x1|. The minimal homogenization c(A) = cmin(A) is the solution set of
x1^2 − x2^2 − x3^2 ≤ 0, x2 ≥ 0, x3 > 0,
and the maximal homogenization cmax(A) is the solution set of
x1^2 − x2^2 − x3^2 ≤ 0, x2 ≥ 0, x3 ≥ 0.
Now we formulate the unified method that will be used in this book for all tasks:
the homogenization method or conification method. This method consists of three
steps:
1. conify (homogenize)—translate into the language of convex cones (homoge-
neous convex sets) by taking the conifications of all convex sets involved,
2. work in terms of convex cones, that is, homogeneous convex sets, to carry out
the task,
3. deconify (dehomogenize)—translate back.
For visualization of convex sets A in X = Rn , we will make use of the ray model,
the hemisphere model and the top-view model for a convex set.
Example 1.6.1 (Ray Model, Hemisphere Model and Top-View Model for a Convex
Set) Figure 1.12 illustrates the three models in the case of a convex set in the line
R.
A convex set A ⊆ R is drawn; R is identified with the horizontal axis in the
plane R2 ; the set A is lifted up to level 1; the open rays that start from the origin
and that run through a point of this lifted up set form the ray model for A; the arc
that consists of the intersections of these rays with the upper half-circle form the
hemisphere model for A; the orthogonal projection of this arc onto the horizontal
axis forms the top-view model for A.
Fig. 1.12 Ray, hemisphere and top-view models for a convex set
The three models for visualization of convex sets are the combination of two
constructions that have been presented in the previous two sections: homogenization
of convex sets in X, and visualization of convex cones in X ×R = Rn+1 lying above
the horizontal hyperplane X × {0} by means of the ray, hemisphere and top-view
model for convex cones.
Ray Model for a Convex Set Each point x ∈ X = Rn can be represented by the open
ray in X × R++ that contains the point (x, 1). Thus each subset of X is modeled
as a set of open rays in the open upper halfspace X × R++ . This turns out to be
especially fruitful for visualizing convex sets if we want to understand their behavior
at infinity. We will see this in Chap. 3. We emphasize that this is an important issue.
Hemisphere Model for a Convex Set Consider the upper hemisphere {(x, ρ) ∈ X × R | x1^2 + · · · + xn^2 + ρ^2 = 1, ρ ≥ 0}. Combining the ray model above with the hemisphere model for convex cones, each subset of X is modeled by a subset of this hemisphere.
Example 1.6.2 (Hemisphere Model for a Convex Set) Figure 1.13 illustrates the hemisphere model for three convex sets A in the plane R^2. The models of these three sets are regions on the upper hemisphere; the question is whether their closures contain points of the circle that bounds the upper hemisphere. You see from left to right: (1) the closed
convex set A is bounded and the model of A is closed, (2) the closed convex set A is unbounded and the model of A is not closed, but if we add one suitable point on the boundary of the hemisphere, then the model becomes closed, (3) the closed convex set A is unbounded and the model of A is not closed, but if we add a suitable infinite set of points on the boundary of the hemisphere, then the model becomes closed. The set of points that would be added in the middle and on the right to make the model closed as explained above is denoted by RA. The reason for this notation is that the points of this set correspond to the rays of the recession cone RA of A.
Top-View Model for a Convex Set Now consider the standard closed unit ball in X = R^n, Bn = {x ∈ X | x1^2 + · · · + xn^2 ≤ 1}. Each point (x, ρ) in the hemisphere above, that is, each solution of x1^2 + · · · + xn^2 + ρ^2 = 1, ρ ≥ 0, can be represented by a point in this ball: the point x. Note here that (x, 0) is the orthogonal projection of the point (x, ρ) onto the horizontal hyperplane X × {0}. Combining this with the hemisphere model above, we get for each subset of X a bijection to a subset of the ball Bn, in fact, to a subset of the open ball UX = Un = {x ∈ X | ‖x‖ < 1}. Here a point x ∈ X corresponds to the point ‖(x, 1)‖^{−1}x ∈ Un.
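The passage from a point x ∈ X to its top-view representative can be made explicit; here is a small sketch (ours, not from the book), showing that points far from the origin are mapped close to the boundary circle, in line with the interpretation of the boundary as 'the horizon'.

```python
import numpy as np

def top_view(x):
    """Map x in R^n to ||(x, 1)||^{-1} * x, a point of the open unit ball."""
    x = np.asarray(x, dtype=float)
    return x / np.sqrt(1.0 + np.dot(x, x))

for r in (1.0, 10.0, 1000.0):
    p = top_view(np.array([r, 0.0]))
    print(r, np.linalg.norm(p))  # the norm tends to 1 as r grows
```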
Example 1.6.3 (Top-View Model for a Convex Set) Figure 1.14 illustrates the top-
view model for three convex sets A in the plane R2 that contain the origin as an
interior point.
The top-view models of these three convex sets A are subsets of the standard
open unit disk; these are also indicated by A. In particular, these models include no
points of the circle that bounds the disk. On the left, you see the model of a closed
bounded A; this model is a closed set in the interior of the disk. In the middle and on
the right, you see the model of a closed unbounded A. These models are not closed.
In the middle, you should add one suitable point from the circle in order to make it
closed. On the right, you should add a suitable infinite set of points from the circle to
make it closed. The sets of points that have to be added in order to make the model
closed have again been indicated by the notation for the recession cone, by RA.
The top-view model for a convex set will be especially fruitful for visualizing
convex sets if we want to understand their behavior at infinity. The advantage of the
top-view model over the hemisphere model is that it is in one dimension lower. For
example, we can visualize convex sets in three-dimensional space R^3 as subsets of the open ball x1^2 + x2^2 + x3^2 < 1 and convex sets in the plane R^2 as subsets of the open disk x1^2 + x2^2 < 1.
The definition of the concept of convex set serves, as all definitions do, for academic
precision only: for every concept that one considers, a definition has to be given. If
we work with a concept, we will never use its definition, but instead a user-friendly
calculus. For convex sets, this means that we want basic examples of convex sets
as well as convexity preserving operations, rules to make new convex sets from old
ones. Here are, to begin with, some basic convex sets: the three golden cones, the
convex hull of a set, and the conic hull of a set.
The most celebrated basic examples of convex sets are the following three ‘golden’
convex cones.
• The first orthant R^n_+ is the set of points in R^n with nonnegative coordinates.
• The Lorentz cone Ln , also called the ice cream cone, is the epigraph of the
Euclidean norm on Rn , that is, the region in Rn+1 above or on the graph of
the Euclidean norm,
{(x, y) | x ∈ R^n, y ∈ R, y ≥ (x1^2 + · · · + xn^2)^{1/2}}.
• The semidefinite cone is the set of positive semidefinite symmetric n × n-matrices, viewed as a convex cone in the vector space of symmetric n × n-matrices (which can be identified with R^{n(n+1)/2}; see Exercise 24).
In suitable coordinates, the Lorentz cone L^2 is the solution set of x1x2 ≥ x3^2, x1 ≥ 0, x2 ≥ 0. This is the maximal conification cmax(A) of the closed convex set A that has boundary the branch of the hyperbola x1x2 = 1 that is contained in the first quadrant x ≥ 0.
The convex hull co(S) of a set S ⊆ X is the smallest convex set containing S, and the conic hull cone(S) of S is the smallest convex cone containing S.
Definition 1.7.6 A convex combination of a finite set S = {s1, . . . , sk} ⊂ X = R^n is a linear combination of the elements of S,
ρ1s1 + · · · + ρksk,
with ρi ∈ R+, 1 ≤ i ≤ k and ρ1 + · · · + ρk = 1.
Example 1.7.7 (Convex Combination) Let a, b, c be three points in R2 that do not
lie on one line. Then the set of convex combinations of a, b, c is the triangle with
vertices a, b, c.
Proposition 1.7.8 The convex hull of a set S ⊆ X = Rn consists of the convex
combinations of finite subsets of S.
In particular, a convex set is closed under taking convex combinations of a finite
set of points.
This could be called a primal description—or a description from the inside—of
a convex set. Each choice of a finite subset of the convex set gives an approximation
from the inside: the set of all convex combinations of this subset. Figure 1.17
illustrates the primal description of a convex set.
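The primal description can be checked computationally: deciding whether x lies in the convex hull of a finite set S amounts, after homogenization, to a conic feasibility problem, namely finding ρ ≥ 0 with Σ ρi (si, 1) = (x, 1). The sketch below is our own illustration (the function in_convex_hull is ours), using scipy.optimize.linprog for the feasibility test.

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(x, S):
    """Is x a convex combination of the rows of S?

    Homogenized form: find rho >= 0 with sum_i rho_i * (s_i, 1) = (x, 1);
    the last equation forces the coefficients to sum to 1."""
    S, x = np.asarray(S, dtype=float), np.asarray(x, dtype=float)
    A_eq = np.vstack([S.T, np.ones(S.shape[0])])     # columns are (s_i, 1)
    b_eq = np.append(x, 1.0)
    res = linprog(c=np.zeros(S.shape[0]), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * S.shape[0], method="highs")
    return res.status == 0                           # 0 means: feasible

S = [[0, 0], [1, 0], [0, 1]]
print(in_convex_hull([0.25, 0.25], S), in_convex_hull([1.0, 1.0], S))  # True False
```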
Proposition 1.7.8 will be proved by the homogenization method. This is the first
of many proofs of results for convex sets that will be given in this way. Therefore,
we make here explicit how all these proofs will be organized.
1. We begin by stating a version of the result for convex cones, that is, for homoge-
neous convex sets. This version is obtained by conification (or homogenization)
of the original statement, which is for convex sets. For brevity we only display
the outcome, which is a statement for convex cones. So we do not display how
we got this statement for convex cones from the original statement for convex
sets.
2. Then we prove this convex cone version, working with convex cones.
3. Finally, we translate back by deconification (or dehomogenization).
We need a definition.
Definition 1.7.9 A conic combination of a finite set S = {s1 , . . . , sk } ⊂ X = Rn is
a nonnegative linear combination of the elements of S,
ρ1 s1 + · · · + ρk sk ,
with ρi ∈ R+ , 1 ≤ i ≤ k.
Example 1.7.10 (Conic Combination) The set of conic combinations of the stan-
dard basis e1 = (1, 0, . . . , 0) , . . . , en = (0, . . . , 0, 1) is the first orthant Rn+ .
Proof of Proposition 1.7.8
1. Convex cone version. We consider the following statement:
The conic hull of a set S ⊆ X consists of all conic combinations of finite subsets
of S.
2. Work with convex cones. The statement above holds true. Indeed, all conic combinations of finite subsets of S must lie in cone(S) (this follows immediately by induction on k), and as it is readily checked that these combinations form a convex cone containing S, they form the conic hull of S.
3. Translate back. The statement above implies the statement of the proposition: for S ⊆ X, the convex cone cone(S × {1}) consists of all elements ρ1(s1, 1) + · · · + ρk(sk, 1) with s1, . . . , sk ∈ S and ρ1, . . . , ρk ∈ R+. Such an element lies in X × {1} precisely if ρ1 + · · · + ρk = 1. Deconification therefore gives that the convex hull of S consists of the convex combinations of finite subsets of S.
Now we already mention briefly something that is the main subject of Chap. 4
and that is the heart of convex analysis. For convex sets, one has not only a
primal description, that is, a description from the inside, but also a so-called dual
description, that is, a description from the outside. Each choice of a finite system
of linear inequalities in n variables which is satisfied by all points of the convex set
gives an approximation from the outside: the set of all solutions of this system.
Example 1.7.11 (Dual Description of a Convex Set) Figure 1.18 illustrates the dual description of a convex set.
In the figure, six lines are drawn that do not intersect the convex set A. For each
of them we cut away the side of the line that is disjoint from A. This gives a six-sided
polygon that is an outer approximation of A.
The first operation is the Cartesian product. If we have two convex sets A ⊆ Y =
Rm , B ⊆ X = Rn , then their Cartesian product is the set
A × B = {(a, b) | a ∈ A, b ∈ B}.
The second operation is the image of a convex set under a linear function. Let
M be an m × n-matrix. This matrix determines a linear function Rn → Rm by the
recipe x → Mx. For each convex set A ⊆ Rn , its image under M is the set
MA = {Ma | a ∈ A}.
The third operation is the inverse image of a convex set under a linear function. For each convex set B ⊆ R^m, its inverse image under M is the set
BM = {a ∈ R^n | Ma ∈ B}.
We make, for later purposes, the following observations for the two nice properties
that convex sets can have, closedness and properness. The question is whether these
properties are preserved under the three operations on convex sets that have been
defined above. Closedness of convex sets is preserved by Cartesian products, by
inverse images under linear functions, but not by images under linear functions.
Example 1.8.2 (Closedness Not Always Preserved Under Linear Image) Figure 1.19 illustrates that the image of a closed convex set under a linear function need not be closed.
The shaded region, the solution set of xy ≥ 1, x ≥ 0, is a closed convex set, but its orthogonal projection onto the x-axis is the open half-line x > 0, y = 0. It is not closed: it does not contain the origin.
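A quick numerical illustration of this phenomenon (ours, not the book's): points on the boundary xy = 1 project to x-values that get arbitrarily close to 0, yet 0 itself is not the projection of any point of the set.

```python
import numpy as np

# A = {(x, y) | xy >= 1, x >= 0} is closed; its projection onto the
# x-axis is the open half-line x > 0.
for t in (1.0, 10.0, 1e3, 1e6):
    point = np.array([1.0 / t, t])   # lies on the boundary xy = 1 of A
    print(point[0])                  # the projections approach 0 ...
# ... but 0 itself is never attained: xy >= 1 forces x > 0.
```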
A sufficient condition for preservation of closedness of a convex set A ⊆ Rn by
the image under a linear function Rn → Rm determined by an m × n-matrix M
is that the kernel of M (the solution set of Mx = 0) and the recession cone RA of
A have only the origin in common, as we will see in Chap. 3. Because of this, we
will sometimes work with open convex sets instead of closed ones: openness of convex sets is preserved by images under a linear function R^n → R^m determined by an m × n-matrix M, provided this linear function is surjective, that is, provided the rows of M are linearly independent. Properness of convex sets is preserved by
Cartesian products, by images under linear functions, but not by inverse images
under linear functions.
For a finite set of points in X = Rn , we will give a concept of ‘points in the middle’.
Theorem 1.9.1 (Radon's Theorem) A finite set S ⊂ X = R^n of at least n + 2 points can be partitioned into two disjoint sets B ('blue') and R ('red') whose convex hulls intersect,
co(B) ∩ co(R) ≠ ∅.
Figure 1.20 illustrates Radon’s theorem for four specific points in the plane and
for five specific points in three-dimensional space.
A common point of the convex hulls of B and R is called a Radon point of S. It
can be considered as a point in the middle of S.
1.9 Radon’s Theorem 33
2. Work with convex cones. As #(S) > n, we can choose a nontrivial linear relation
among the elements of S. As each element of S has positive last coordinate, it
follows that in this nontrivial linear relation, at least one coefficient is positive
and at least one is negative. We rewrite this linear relation, leaving all terms
with positive coefficient on the left side of the equality sign and bringing the
remaining terms to the other side, giving an equality of two conic combinations.
Thus S has been partitioned into two disjoint nonempty sets B and R, and one
has an equality
μb b = νr r.
b∈B r∈R
two disjoint sets B × {1} and R × {1} whose conic hulls intersect in a nonzero
point. That is, we have the equation
λb (b, 1) = μr (r, 1)
b∈B r∈R
λb = μr .
b∈B r∈R
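As Exercise 42 below observes, this proof is an algorithm. Here is one possible implementation sketch (ours, not the book's; it uses the singular value decomposition to find a nontrivial linear relation among the lifted points):

```python
import numpy as np

def radon_partition(S):
    """Radon partition of at least n + 2 points in R^n, via conification.

    Returns index sets B and R and a common point of co(B) and co(R)."""
    S = np.asarray(S, dtype=float)
    m, n = S.shape
    assert m >= n + 2
    lifted = np.hstack([S, np.ones((m, 1))])     # the points (s, 1)
    # A nontrivial linear relation among the lifted points: a null vector
    # of the (n+1) x m matrix whose columns are the points (s, 1).
    _, _, vt = np.linalg.svd(lifted.T)
    coeffs = vt[-1]                              # sum_i coeffs_i * (s_i, 1) = 0
    B, R = np.where(coeffs > 0)[0], np.where(coeffs <= 0)[0]
    total = coeffs[B].sum()                      # positive, by the proof above
    return B, R, (coeffs[B] @ S[B]) / total

S = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(radon_partition(S))  # e.g. the two diagonals, meeting at (0.5, 0.5)
```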
For the induction step in the proof of Helly's theorem (Theorem 1.10.1), assume that the statement holds for some m ≥ n and that J ⊆ I with #(J) = m + 1. Choose for each j ∈ J a nonzero element
cj ∈ ∩_{k∈J\{j}} Ck.
The assumption in Helly’s theorem that the collection of convex sets is finite is
essential.
Example 1.10.4 (Finiteness Assumption in Helly’s Theorem)
1. The infinite collection of convex subsets [n, +∞) ⊂ R, n ∈ N has no common
point, but every finite subcollection has a common point. The reason for the
trouble here is that the common points of finite subcollections run away to
infinity.
2. Another example where things go wrong is the infinite collection (0, 1/n) ⊂ R, n ∈ N. The reason for the trouble here is that common points of finite subcollections tend to a point that is not in the intersection, in fact, that does not belong to any of these sets: zero.
If the two reasons for failure given above are prevented, then one gets a
version of the result for infinite collections. This requires the topological concept
of compactness (in Chap. 3 more details about topology will be given).
Note that the assumption of Theorem 1.10.7 can be relaxed: it is sufficient that the
convex sets from the collection are closed and that at least one of them is compact.
Many interesting results can be proved with unexpected ease when you observe
that Helly’s theorem can be applied. We formulate a number of these results. This
material is optional. At a first reading, you can just look at the statements of the
results below. Some of the proofs of these results are quite challenging. At the end
of the chapter, hints are given.
Proposition 1.11.1 If each three points from a set in the plane R2 of at least three
elements can be enclosed inside a unit disk, then the entire set can be enclosed inside
a unit disk.
Here is a formulation in nontechnical language of the next result: if there is a spot of diameter d on a tablecloth, then we can cover it with a circular napkin of radius d/√3.
Proposition 1.11.2 (Jung's Theorem) If the distance between any two points of a set in the plane R^2 is at most 1, then the set is contained in a disk of radius 1/√3.
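Following the hint given at the end of this chapter, Jung's theorem reduces by Helly's theorem to the case of three points. The sketch below is our own numerical probe (the function enclosing_radius is ours): it scales random triples to diameter 1 and observes that the smallest enclosing disk has radius at most 1/√3.

```python
import numpy as np

def enclosing_radius(p, q, r):
    """Radius of the smallest disk containing three points in the plane."""
    pts = [np.asarray(v, dtype=float) for v in (p, q, r)]
    # If some side, taken as a diameter, covers the third point, the
    # smallest such disk wins (obtuse and degenerate cases).
    radii = [np.linalg.norm(pts[i] - pts[(i + 1) % 3]) / 2
             for i in range(3)
             if np.linalg.norm(pts[(i + 2) % 3] - (pts[i] + pts[(i + 1) % 3]) / 2)
                <= np.linalg.norm(pts[i] - pts[(i + 1) % 3]) / 2 + 1e-12]
    if radii:
        return min(radii)
    # Otherwise the circumscribed disk is smallest: R = abc / (4 * area).
    u, v = pts[1] - pts[0], pts[2] - pts[0]
    area = abs(u[0] * v[1] - u[1] * v[0]) / 2
    a, b, c = (np.linalg.norm(pts[0] - pts[1]), np.linalg.norm(pts[1] - pts[2]),
               np.linalg.norm(pts[2] - pts[0]))
    return a * b * c / (4 * area)

rng = np.random.default_rng(1)
worst = 0.0
for _ in range(2000):
    pts = rng.random((3, 2))
    diam = max(np.linalg.norm(pts[i] - pts[j]) for i in range(3) for j in range(i))
    worst = max(worst, enclosing_radius(*(pts / diam)))   # diameter scaled to 1
print(worst, 1 / np.sqrt(3))  # worst stays below, and approaches, 1/sqrt(3)
```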
We can also give a similar result about a disk contained in a given convex set in
the plane.
Proposition 1.11.3 (Blaschke’s Theorem) Every bounded convex set A in the
plane of width 1 contains a disk of radius 1/3 (the width is the smallest distance
between parallel supporting lines; a line is called supporting if it contains a point
of the set and moreover has the set entirely on one of its two sides).
Remark This last result can be generalized to polynomials of any given degree. Then the polynomial function of degree < n that best approximates the function x → x^n on [−1, 1] can be given explicitly: by the recipe x → x^n − 2^{−(n−1)} cos(n arccos x).
We have seen that each point of the convex hull of a set S ⊆ X = Rn is a convex
combination of a finite subset of S. This result can be refined.
Theorem 1.12.1 (Carathéodory’s Theorem) Each point in the convex hull of a
set S ⊆ Rn is a convex combination of an affinely independent subset of S. In
particular, it is a convex combination of at most n + 1 points of S.
Example 1.12.2 (Carathéodory’s Theorem) Figure 1.23 illustrates this result for a
specific set of five points in the plane R2 .
Each point in the convex hull of this set is by Proposition 1.7.8 a convex
combination of these five points. But by Carathéodory’s theorem it is even a convex
combination of three of these points—not always the same three points. Indeed, in
the figure the convex hull is split up into three triangles. Each point in the convex
hull belongs to one of the triangles. Therefore, it is a convex combination of the
three vertices of the triangle. This is in agreement with Carathéodory’s theorem.
Proof of Carathéodory’s Theorem We use the homogenization method.
1. Convex cone version. We consider the following statement:
For a set S ⊆ X = Rn , every point in its conic hull cone(S) is a conic
combination of a linearly independent subset of S.
2. Work with convex cones. Choose an arbitrary x ∈ cone(S). Let r be the minimal
number for which x can be expressed as a conic combination of some finite subset
T of S of r elements—#T = r. Choose such a subset T and write x as a conic
combination of T . All coefficients of this conic combination must be positive by
the minimality property of r.
We claim that T is linearly independent. We prove this claim by contradiction. Suppose there were a nontrivial linear relation among the elements of T. We may assume that one of its coefficients is positive (by multiplying the relation by −1 if necessary). Then we subtract a positive multiple of this linear relation from the conic combination of T that represents x, taking this multiple maximal subject to the condition that all coefficients remain nonnegative. Then at least one coefficient becomes zero, so x is written as a conic combination of fewer than r elements of S. This contradicts the minimality of r and so proves the claim.
The convex cone version of Carathéodory’s theorem, which has been established
in the proof above, is worth stating separately.
Proposition 1.12.3 Every point in the conic hull of a set S ⊆ X = Rn is a
conic combination of a linearly independent subset of S. In particular, it is a conic
combination of at most n elements of S.
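Exercise 45 asks to make the algorithm that is implicit in the proof explicit. One possible sketch (ours, in Python): as long as the support of the conic combination is linearly dependent, subtract the largest feasible multiple of a nontrivial linear relation, which zeroes out at least one coefficient.

```python
import numpy as np

def caratheodory_reduce(S, rho, tol=1e-10):
    """Thin a conic combination x = sum_i rho_i * S[i] (rho >= 0) down to a
    linearly independent support, keeping the represented point fixed."""
    S, rho = np.asarray(S, dtype=float), np.array(rho, dtype=float)
    while True:
        support = np.where(rho > tol)[0]
        T = S[support]
        if len(support) == 0 or np.linalg.matrix_rank(T) == len(support):
            return rho                       # support linearly independent
        # A nontrivial relation sum_i c_i * T[i] = 0, from the null space.
        _, _, vt = np.linalg.svd(T.T)
        c = vt[-1]
        if c.max() < -c.min():
            c = -c                           # make the big coefficients positive
        pos = c > tol
        t = np.min(rho[support][pos] / c[pos])   # largest feasible step
        rho[support] = rho[support] - t * c      # >= 0, with a new zero entry

S = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
rho = np.array([0.3, 0.4, 0.2, 0.1])
x = rho @ S
rho2 = caratheodory_reduce(S, rho)
print(np.count_nonzero(rho2 > 1e-10) <= 2, np.allclose(rho2 @ S, x))  # True True
```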
Carathéodory’s theorem has many applications. The following one is especially
useful.
Corollary 1.12.4 The convex hull of a compact set S ⊆ X = Rn is closed.
In the proof, we will need the following property of the concept of compactness.
Recall that a function f from a set S ⊆ X = Rn to Y = Rm is called continuous if
limx→a f (x) = f (a) for all a ∈ S.
Proposition 1.12.5 The image of a compact set S ⊆ X = Rn under a continuous
function S → Y , with Y = Rm , is again compact.
Here we recall from Example 1.8.2 that the image of a closed set under a linear
function need not be closed.
Proof of Corollary 1.12.4 By Carathéodory's theorem, co(S) is the image of the function Δ × S^{n+1} → X given by the recipe
((ρ1, . . . , ρn+1), (s1, . . . , sn+1)) → ρ1s1 + · · · + ρn+1sn+1,
where Δ = {ρ ∈ R^{n+1} | ρi ≥ 0 for all i, ρ1 + · · · + ρn+1 = 1}. The set Δ × S^{n+1} is compact, as it is closed and bounded, and the function is continuous. Hence co(S) is compact by Proposition 1.12.5; in particular, it is closed.
Convex cones arise naturally if you consider a preference relation for the elements of X = R^n. A preference (relation) ⪰ on X is a partial ordering on X that satisfies the following conditions:
• x ⪰ x,
• x ⪰ y, u ⪰ v ⇒ x + u ⪰ y + v,
• x ⪰ y, α ≥ 0 ⇒ αx ⪰ αy.
If ⪰ is a preference relation on X, then it can be verified that the set {x ∈ X | x ⪰ 0} is a convex cone C containing the origin. Conversely, each convex cone C ⊆ X containing the origin gives a preference relation ⪰ on X—defined by x ⪰ y iff x − y ∈ C. That is, there is a natural bijection between the set of preference relations on X and the set of convex cones in X that contain the origin. One can translate properties of a preference relation ⪰ on X to its corresponding convex cone C ⊆ X. For example, ⪰ is antisymmetric (that is, x ⪰ y and y ⪰ x implies x = y) iff the convex cone C does not contain any line through the origin. A convex cone containing the origin that does not contain any line through the origin is called a pointed convex cone.
Example 1.13.1 (A Preference Relation on Space-Time) The following example of
a preference relation in space-time, the four dimensional space that is considered in
special relativity theory (see Susskind and Friedman [2]), plays a key role in that
theory. This explains why the convex cone y ≥ (x1^2 + · · · + xn^2)^{1/2} is named after
the physicist Lorentz. No information can travel faster than with the speed of light
(for convenience, we assume that this speed equals 1, by suitable choices of units of
length and time). Therefore, an event e that takes place at the point (x, y, z) at time t according to some observer can influence an event e′ that takes place at the point (x′, y′, z′) at time t′ according to the same observer if and only if light that is sent at
the moment t from the point (x, y, z) in the direction of the point (x′, y′, z′) reaches this point at a time ≤ t′. Then one writes e′ ⪰ e. This defines a preference relation on space-time R^4 = {(x, y, z, t)}. Its corresponding convex cone is the Lorentz cone
L^3 = {(x, y, z, t) | t ≥ (x^2 + y^2 + z^2)^{1/2}}.
Indeed, light that is sent from the point (0, 0, 0) at time 0 in the direction of a point (x, y, z) will reach this point at a time ≤ t iff t ≥ (x^2 + y^2 + z^2)^{1/2}.
Here are some interesting properties of this preference relation from special relativity theory. Whether one has for two given events e and e′ that e′ ⪰ e or not does not depend on the observer. Therefore, one sometimes says that e′ ⪰ e means that event e′ happens absolutely after event e. Note that it can happen for two events e and e′ that e′ happens after e according to some observer but that e happens after e′ according to another observer.
You might like to do the following experiment yourself (with pencil and paper or
with a computer). Choose a large number of arbitrary nonempty subsets S1 , . . . , Sm
of the plane R2 , and make a drawing of their Minkowski sum S = S1 + · · · + Sm =
{s1 +· · ·+sm | si ∈ Si , 1 ≤ i ≤ m}. That is, S is the set that you get by picking in all
possible ways one element from each set and then taking the vector sum. You might
expect that S can be any set in the plane. However, it has a very special property:
each point in the convex hull of S lies close to a point of S.
Example 1.14.1 (Shapley–Folkman Lemma) Figures 1.24, 1.25 and 1.26 illustrate
this experiment. Figure 1.24 shows a choice of sets: ten sets, each consisting of two
antipodal points.
Figure 1.25 shows the scaled outcome of the experiment, the Minkowski sum of
the ten sets.
Figure 1.26 also gives the convex hull of the Minkowski sum.
This empirical phenomenon can be formulated as a precise result (see Starr [3]).
This holds not only for the plane, but more generally in each dimension. To begin with, note the easy property that co(S) = co(S1) + · · · + co(Sm); that is, each x ∈ co(S) has a representation x = x1 + · · · + xm such that xi ∈ co(Si) for all i. Now we give a stronger version of this property, and this is the promised precise result.
Lemma 1.14.2 (Shapley–Folkman Lemma) Let Si (i = 1, . . . , m), with m ≥ n, be nonempty subsets of R^n. Write S = S1 + · · · + Sm for their Minkowski sum. Then each x ∈ co(S) has a representation x = x1 + · · · + xm such that xi ∈ co(Si) for all i, and xi ∈ Si for at least m − n indices i.
This result indeed implies the empirical observation above. For example, if each set Si, 1 ≤ i ≤ m, is contained in a ball of radius r, then the Shapley–Folkman lemma implies that each point in co(S) lies within distance 2nr of a point in S. In a more striking formulation, each point in the convex hull of the average sum (1/m)(S1 + · · · + Sm) lies within distance 2nr/m of a point of this average sum; note that this distance tends to zero as m → ∞.
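The experiment above can also be repeated numerically; the following sketch (ours, not the book's) sums ten two-point sets in the plane, samples points of the convex hull of the Minkowski sum, and measures their distance to the nearest point of the sum itself.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
m = 10
# Ten sets, each consisting of two antipodal points, as in Fig. 1.24.
sets = [np.array([v, -v]) for v in rng.normal(size=(m, 2))]

# The Minkowski sum S: all 2^m = 1024 choices of one point per set.
sum_points = np.array([np.sum(choice, axis=0) for choice in product(*sets)])

# Sample points of co(S) and measure the distance to the nearest point of S.
worst = 0.0
for _ in range(200):
    w = rng.random(len(sum_points))
    x = (w / w.sum()) @ sum_points
    worst = max(worst, np.min(np.linalg.norm(sum_points - x, axis=1)))
print(worst)  # small compared to the size of S, as the lemma predicts
```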
Proof of Lemma 1.14.2 Let x ∈ co(S). Then x can be written as x = Σ_{i=1}^{m} Σ_{j=1}^{li} aij yij, where, for each i, the points yij (1 ≤ j ≤ li) lie in Si, the coefficients aij are nonnegative, and Σ_{j=1}^{li} aij = 1. Consider the following vectors in R^n × R^m:
z = (x, 1, 1, . . . , 1),
z1j = (y1j, 1, 0, . . . , 0),
...................
zmj = (ymj, 0, 0, . . . , 1).
Then we have z = Σ_{i=1}^{m} Σ_{j=1}^{li} aij zij. Now we apply the convex cone version of Carathéodory's theorem, Proposition 1.12.3. It follows that z can be expressed as z = Σ_{i=1}^{m} Σ_{j=1}^{li} bij zij, in which one has for all pairs (i, j) that bij ≥ 0 and for at most m + n pairs (i, j) that bij > 0. This amounts to:
x = Σ_{i=1}^{m} Σ_{j=1}^{li} bij yij and Σ_{j=1}^{li} bij = 1 for all i.
Now write xi = Σ_{j=1}^{li} bij yij for all i. Then x = Σ_{i=1}^{m} xi and xi ∈ co(Si) for each i. As the number of pairs (i, j) for which bij > 0 is at most m + n, and as for each i one has bij > 0 for at least one j, there are at most n indices i that have bij > 0 for more than one j. Hence xi ∈ Si for at least m − n indices i.
1.15 Exercises
1. Make a drawing similar to Fig. 3, but now for a landscape with a straight road
that stretches out to the horizon, with lampposts along the road.
2. Check that the set F that is used to model the bargaining opportunity in the
Alice and Bob example, is indeed a convex set.
3. Make a drawing of the pair (F, v) in the Alice and Bob example and check that
this gives Fig. 1.1.
4. Show that the list of all convex subsets of the line R given in Example 1.2.3 is
correct.
5. *Prove the statements about convex polygons in the plane R2 in Example 1.2.3.
6. *Prove the statement about approximating a bounded convex set in the plane
R2 by polygons in Example 1.2.3.
7. Give a complete list of all convex cones in the plane R2 .
8. Consider the perspective function P : Rn × R++ → Rn given by the recipe
(x, t) → t −1 x. Show that the image of a convex set A ⊆ Rn × R++ under the
perspective function is convex.
9. Determine for each one of the following statements whether it is correct or
not.
(a) If you adjoin the origin to a convex cone that does not contain the origin,
then you get a convex cone containing the origin.
(b) If you delete the origin from a convex cone that contains the origin, then
you get a convex cone that does not contain the origin.
Hint: one statement is correct and one is wrong.
10. Prove the following statements, which describe the relation between convex
cones that do not contain the origin and convex cones that do contain the
origin.
(a) For each convex cone C ⊆ Rn that does not contain the origin, the union
with the origin, C ∪ {0n }, is a pointed convex cone.
(b) Each convex cone C ⊆ Rn that contains the origin is the Minkowski sum
of a pointed convex cone D and a subspace L,
C = D + L = {d + l | d ∈ D, l ∈ L}.
11. Consider a system of two linear equations in two variables,
a11x1 + a12x2 = b1,
a21x1 + a22x2 = b2.
(a) Show that homogenization of the system gives
a11x1 + a12x2 = b1x3,
a21x1 + a22x2 = b2x3.
(b) Assume that the equations are not scalar multiples of each other. Show
that the homogenized system has a unique (up to positive scalar multiples)
solution with x3 ≥ 0.
(c) Show that translation of this uniqueness result into geometric language
gives the following statement. Consider two closed half-planes in the upper
halfspace x3 ≥ 0 that are both bounded by lines in the horizontal plane
x3 = 0 through the origin, and that are both not contained in the horizontal
plane x3 = 0. Then their intersection is a closed ray in the upper halfspace
x3 ≥ 0.
12. Show that a subspace of Rn is a convex cone, and so in particular a convex set.
13. Let M be an m × n-matrix. Show that the subspaces {x ∈ R^n | Mx = 0} and {M⊤y | y ∈ R^m} of R^n are each other's orthogonal complement.
14. Give precise proofs for the results on subspaces from linear algebra that are
listed in Sect. 1.2 as specializations of results on convex sets in the Chaps 1–
23. Show that the first orthant, the Lorentz cone and the semidefinite cone are
convex cones and that these convex cones are pointed.
24. Show that the vector space of symmetric n × n-matrices can be identified with R^m where m = n(n + 1)/2, by means of stacking.
25. Show that viewing circles, ellipses, parabolas and branches of hyperbolas as
conic sections is an example of the homogenization of convex sets.
26. Make a version of Fig. 1.11 for an unbounded convex set in the plane. That is,
draw the conification for such a set.
27. Determine the recession cones of some simple convex sets, such as (7, +∞), {(x, y) | x ≥ 0, y ≥ 0}, {(x, y) | x > 0, y > 0}, {(x, y) ∈ R^2 | y ≥ x^2}, and {(x, y) ∈ R^2 | xy ≥ 1, x ≥ 0, y ≥ 0}.
28. Let A ⊆ X = Rn be a nonempty convex set. Show that the largest set S in Rn
for which adjoining S × {0} to the convex cone c(A) preserves the convex cone
property, is the recession cone of A, S = RA , defined to be the set of all r ∈ X
for which a + tr ∈ A for all t ≥ 0 and all a ∈ A.
29. Show that the recession cone RA of a nonempty closed convex set A ⊆ Rn is a
closed convex cone.
30. Prove the statements of Proposition 1.5.7.
31. Show that the minimal conification of an open convex set is open.
32. Show that the maximal conification of a closed convex set is closed.
33. Show that the maximal conification of an open convex set is never open.
34. Show that the minimal conification of a closed convex set is never closed.
35. Let A ⊆ X = Rn be a nonempty convex set. Show that cone(A), the conic hull
of A, is equal to R+ A = {ρa | ρ ≥ 0, a ∈ A}.
36. Verify that the following three operations make new convex sets (respectively
convex cones) from old ones: the Cartesian product, image and inverse image
under a linear function.
37. As an exercise in the homogenization method, derive the property that the
image and inverse image of a convex set under a linear function are convex
sets from the analogous property for convex cones.
38. Show that the image of the closed convex set {(x, y) | x, y ≥ 0, xy ≥ 1} under
the linear function R2 → R given by the recipe (x, y) → x is a convex set in
R that is not closed.
39. Show that the property closedness is preserved by the following two operations:
the Cartesian product and the inverse image under a linear function.
40. Show that the image MA of a closed convex set A ⊆ Rn under a linear function
M : Rn → Rm is closed if the kernel of M, that is, the solution set of Mx = 0,
and the recession cone of A, RA , have only the origin in common.
41. Show that the image MA of an open convex set A ⊆ Rn under a surjective
linear function M : Rn → Rm is open.
42. Show that the proof of Theorem 1.9.1 gives an algorithm for finding the sets B and R and a point in co(B) ∩ co(R).
43. Show that the proof of Theorem 1.10.1 gives in principle an algorithm for
finding a common point of the given collection of convex sets, provided one
has a common point of each subcollection of n + 1 of these sets.
44. Show that the assumption in Theorem 1.10.7 that all convex sets of the given
collection are compact can be relaxed to: all convex sets of the given collection
are closed and at least one of them is bounded.
45. Make the algorithm that is implicit in the proof of Theorem 1.12.1 explicit.
46. Is the smallest closed set that contains a given convex set always convex?
47. Is the smallest convex set that contains a given closed set always closed?
48. Give a constructive proof ('an algorithm'), without using any results, for Weyl's theorem in the case of a tetrahedron, the convex hull of an affinely independent set. To be precise, prove the following statements.
1.16 Hints for Applications of Helly's Theorem
Here are hints for the applications of the theorem of Helly that are formulated in
Sect. 1.11.
• Proposition 1.11.1. Consider the collection of unit disks with center an element
of the set.
• Proposition 1.11.2. Show that each three points of the set are included in a disk of radius 1/√3.
• Proposition 1.11.3. Let l run over the set L of supporting lines of A, define for
each l ∈ L the set Al to consist of all points in A of distance to l at most 1/3 and
then consider the collection of sets Al , l ∈ L.
• Proposition 1.11.4. Consider the collection of closed half-planes in the plane
which contains on its boundary a side of the polygon and which contains also all
points of the polygon close to this side.
• Proposition 1.11.5. Consider the collection of convex hulls of subsets of S of at least nm/(n + 1) points, where m = #(S).
• Proposition 1.11.6. Let l run over the set L of parallel line segments, define for each l ∈ L the set Al consisting of all lines that intersect l, and then consider the collection Al, l ∈ L.
References
Abstract
• Why. You need the Minkowski sum and the convex hull of the union, two binary operations on convex sets, to solve optimization problems. A convex function f has a minimum at a given point x̂ iff the origin lies in the subdifferential ∂f(x̂), a certain convex set. Now suppose that f is made from two convex functions f1, f2 with known subdifferentials at x̂, by one of the following two binary operations for convex functions: either the pointwise sum (f = f1 + f2) or the pointwise maximum (f = max(f1, f2)), and, in the latter case, f1(x̂) = f2(x̂). Then ∂f(x̂) can be calculated, and so it can be checked whether it contains the origin or not: ∂(f1 + f2)(x̂) is equal to the Minkowski sum of the convex sets ∂f1(x̂) and ∂f2(x̂), and ∂(max(f1, f2))(x̂) is equal to the convex hull of the union of ∂f1(x̂) and ∂f2(x̂) in the difficult case that f1(x̂) = f2(x̂).
• What. The four standard binary operations for convex sets are the Minkowski
sum, the intersection, the convex hull of the union and the inverse sum. The
defining formulas for these binary operations look completely different, but
they can all be generated in exactly the same systematic way by a reduction to
convex cones (‘homogenization’). These binary operations preserve closedness
for polyhedral sets but not for arbitrary convex sets.
Road Map
• Section 2.1.1 (motivation binary operations convex sets).
• Section 2.1.2 (crash course in working coordinate-free).
• Section 2.2 (construction binary operations +, ∩ for convex cones by homog-
enization; the fact that the sum map and the diagonal map are each other's transpose).
• Section 2.3 (construction of one binary operation for convex sets by homogeniza-
tion).
• Proposition 2.4.1 and Fig. 2.1 (list for the construction of the standard binary
operations +, ∩, co∪, # for convex sets by homogenization; note that not all of
these preserve the nice properties closedness and properness).
Binary operations on convex sets have been considered already a long time ago, for example in the classic work [1]. In [2], and later in [3], attempts have been made to construct complete lists of binary operations on various types of convex objects. This
plan was realized using the method presented in this chapter, by homogenization, in
[4] and [5].
A driving idea behind convex analysis is the parallel with differential analysis. Let us apply this idea to the minimization of a function f : R^n → R. If f is differentiable, then one has, at a minimum x̂, that f′(x̂) = 0: the derivative of f at x̂ is zero. If, moreover, f is built up from differentiable functions for which one knows the derivative at x̂, as f = f1 + f2 or as f = f1f2, then f′(x̂) can be expressed in f1′(x̂) and f2′(x̂). Here are the convex parallels of these two facts. If f is convex (that is, the region above its graph is a convex set), then x̂ is a minimum iff 0 ∈ ∂f(x̂): zero lies in a certain convex set, called the subdifferential of f at x̂. If moreover f is built up from convex functions for which one knows the subdifferential at x̂, as f = f1 + f2 or as f = max(f1, f2), then ∂f(x̂) can be expressed in ∂f1(x̂) and ∂f2(x̂), using binary operations for convex sets. We will see in Chap. 7 that these parallel facts are used to solve optimization problems.
In order to present the constructions of the binary operations for convex sets as
simply as possible, we will work coordinate-free. In this section, we recall the
minimal knowledge that is needed for this. More information can be found in any
textbook on linear algebra.
The idea is that if you work with column spaces, matrices, dot products and the
transpose of a matrix, you can do this in another—more abstract—language, which
does not make use of coordinates.
• Column spaces are described by finite dimensional vector spaces. These are
defined as sets V that are equipped with a function V ×V → V , which is denoted
by (v, w) → v + w and called vector addition, and with a function R × V → V ,
which is denoted by (α, v) → αv and called scalar multiplication. In order to
make sure that you can work with vector addition and scalar multiplication in the
same way that you do in column spaces, you impose a number of suitable axioms
on them such as α(v + w) = αv + αw. Finite dimensionality of V is ensured by
requiring that there exist v1 , . . . , vn ∈ V such that each v ∈ V can be written as
α1v1 + · · · + αnvn for suitable numbers α1, . . . , αn ∈ R.
The notion of finite dimensional vector space is essentially the same as the
notion of column space, in particular, it is not more general. If the natural number
n above is chosen to be minimal, then v1 , . . . , vn is called a basis of V . The
reason for this terminology is that each v ∈ V can be written uniquely as α1 v1 +
· · · + αn vn for suitable numbers α1 , . . . , αn ∈ R. Then you can replace v by the
column vector with coordinates α1 , . . . , αn (from top to bottom). If you do this,
then you are essentially working with column space Rn with the usual vector
addition and scalar multiplication. So a finite dimensional vector space is indeed
essentially a column space.
• Matrices are described by linear functions between two finite dimensional vector spaces V and W. These are defined as functions L : V → W that send sums to sums, L(u + v) = L(u) + L(v), and scalar multiples to the same scalar multiples, L(αv) = αL(v).
The notion of a linear function is essentially the same as the notion of a matrix.
Let L : V → W be a linear function between finite dimensional vector spaces.
Choose a basis v1 , . . . , vn of V and a basis w1 , . . . , wm of W . Then there exists
a unique m × n-matrix M such that if you go over to coordinates, and so get
a function Rn → Rm , then this function is given by left multiplication by the
matrix M, that is, by the recipe x → Mx, where Mx is matrix multiplication.
So the notion of a linear function is indeed essentially the same as the notion of
a matrix.
Here we point out one advantage of working with linear functions over
working with matrices. This reveals the deeper idea behind the matrix product:
it is just the coordinate version of composition of linear functions (such a
composition is of course again a linear function).
• The dot product x · y = x1y1 + · · · + xnyn is described by an inner product on a finite dimensional vector space V. This is defined to be a function V × V → R—denoted by (u, v) → ⟨u, v⟩ = ⟨u, v⟩V and called inner product—on which suitable axioms are imposed so that you can work with it just as you do with the dot product.
The notion of an inner product is essentially the same as the notion of the dot product; in particular, it is not more general. Let ⟨·, ·⟩ be an inner product on a finite dimensional vector space V. Choose an orthonormal basis v1, . . . , vn of V (that is, ⟨vi, vj⟩ equals 0 if i ≠ j and it equals 1 if i = j). Go over to coordinates, and so get an inner product on R^n. Then this inner product is the dot product. So the notion of inner product is indeed essentially the same as the notion of the dot product.
A finite dimensional vector space together with an inner product is called a
finite dimensional inner product space. So a finite dimensional inner product
space is essentially a column space considered with the dot product.
• The transpose of a matrix is described by the transpose L⊤ of a linear function L between two finite dimensional inner product spaces V and W. This is defined as the unique linear function L⊤ : W → V for which
⟨L(v), w⟩W = ⟨v, L⊤(w)⟩V
for all v ∈ V, w ∈ W. Note that (L⊤)⊤ = L. That is, for two linear functions L : V → W and M : W → V we have L⊤ = M iff M⊤ = L. Then L and M are said to be each other's transpose.
The notion of the transpose of a linear function is essentially the same as the notion of the transpose of a matrix. Let L : V → W be a linear function between finite dimensional inner product spaces. Choose orthonormal bases v1, . . . , vn of V and w1, . . . , wm of W. Then we go over to coordinates, and so L is turned into an m × n-matrix and L⊤ is turned into an n × m-matrix. Then these two matrices are each other's transpose.
One can do everything that was done in Chap. 1 coordinate-free. For example, one
can define convex sets and convex cones in vector spaces, one can take Cartesian
products of convex sets, one can take the image of a convex set under a linear
function between two vector spaces and one can take the inverse image of a convex
set under a linear function between two vector spaces.
The aim of this section is to construct the two binary operations for the homo-
geneous case, that is, for convex cones—intersection and Minkowski sum—in a
systematic way: by homogenization. Moreover, it is shown that these two binary
operations are closely related—by means of the transpose operator. For these tasks,
it is convenient to work coordinate-free. However, we recall that a finite dimensional
inner product space is essentially just Rn equipped with the dot product.
We consider two special linear functions for a finite dimensional inner product
space X:
• the sum map +X : X × X → X is the linear function with recipe (x, y) → x + y,
• the diagonal map ΔX : X → X × X is the linear function with recipe x → (x, x).
The function +X needs no justification—it gives the formal description of addition in X. The function ΔX gives a formal description for the equality relation x = y on pairs (x, y) of elements of X: these pairs are precisely those of the form ΔX(x). Moreover, we now show that these two special linear functions +X and ΔX are closely related: they are each other's transpose. This fundamental property will be used later, for example in Chap. 6, to derive calculus rules.
Proposition 2.2.1 (Relation Sum and Diagonal Map) Let X be a finite dimensional inner product space. Then the sum map +X and the diagonal map ΔX are each other's transpose.
Proof For all u ∈ X and all (v, w) ∈ X × X, we have
⟨u, ΔX⊤(v, w)⟩ = ⟨ΔX(u), (v, w)⟩ = ⟨(u, u), (v, w)⟩ = ⟨u, v⟩ + ⟨u, w⟩ = ⟨u, v + w⟩ = ⟨u, +X(v, w)⟩.
It follows that ΔX⊤ = +X, as required.
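In coordinates—this is Exercise 9 below—the proposition says simply that the matrix of +X is the transpose of the matrix of ΔX. A short numerical check (our own illustration):

```python
import numpy as np

n = 3
I = np.eye(n)
sum_map = np.hstack([I, I])        # matrix of +_X : R^n x R^n -> R^n
diagonal_map = np.vstack([I, I])   # matrix of Delta_X : R^n -> R^n x R^n
print(np.array_equal(sum_map.T, diagonal_map))  # True
```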
Now we construct binary operations on convex cones. For two convex cones
C, D ⊆ X, we take, to begin with, the Cartesian product C × D, which is a convex
cone in X × X. Now we use the two operations by linear functions on convex sets—
image and inverse image.
We take (C × D)ΔX, the inverse image of C × D under the diagonal map ΔX. This is a convex cone in X. Note that this is the intersection C ∩ D of C and D:
(C × D)ΔX = C ∩ D,
as x lies in this inverse image iff ΔX(x) = (x, x) ∈ C × D, that is, iff x ∈ C and x ∈ D.
We take +X (C × D), the image of C × D under the sum map +X . This is also a
convex cone in X. Note that this is the Minkowski sum C + D of C and D:
+X (C × D) = C + D.
The aim of this section is to give a systematic construction for binary operations for
convex sets: by the homogenization method. We repeat once again that this means
that we begin by working with convex cones and then we dehomogenize.
The skill that is demonstrated in this section is somewhat technical. By reading
the explanation of the construction of binary operations below to get the idea,
moreover studying carefully the example to understand the details, furthermore
repeating the calculation from the example yourself, and finally constructing in the
same way one of the other three binary operations, you can master it.
The precise way to construct a binary operation on convex sets by homogeniza-
tion, is as follows. We take the space that contains the homogenizations of convex
sets in a finite dimensional vector space X (we recall that X is essentially Rn ): this
space is the product space X×R. Then we make, for each of the two factors X and R
of this space, a choice between + and −. Let the choice for X be ε1 and let the choice
for R be ε2. This gives 2^2 = 4 possibilities for (ε1, ε2): (++), (+−), (−+), (−−).
Each choice (ε1 , ε2 ) will lead to a binary operation ◦(ε1 ,ε2 ) on convex sets.
Let (ε1 , ε2 ) be one of the four choices, and let A and B be two ‘old’ convex sets
in X. We use the homogenization method to construct a ‘new’ convex set A◦(ε1 ,ε2 ) B
in X.
1. Homogenize. We conify A and B. This gives the conifications c(A) and c(B)
respectively. These are convex cones in X × R++ .
2. Work with homogeneous convex sets, that is, with convex cones. We are going to
associate to the old convex cones C = c(A) and D = c(B) a new convex cone
E that is contained in X × R++ .
We take the Cartesian product C × D. This is a convex cone in X × R++ × X × R++. We reorder the factors, giving a convex cone in (X × X) × (R++ × R++). Then we deal with each of the two factors X × X and R++ × R++ separately, according to the choices ε1 and ε2: for the choice +, we take the image under the sum map (+X : X × X → X for the first factor, and the analogous map +R for the second one); for the choice −, we take the inverse image under the diagonal map (ΔX : X → X × X for the first factor, and the analogous map ΔR for the second one). This gives a convex cone E contained in X × R++.
3. Deconify (dehomogenize). We define A ◦(ε1,ε2) B to be the deconification of E.
Here is this construction carried out in detail for the choice (+−).
To explain what this binary operation is, let A and B be two ‘old’ convex sets in
X. We have to make a new convex set A ◦(+−) B in X from A and B. We use the
homogenization method.
1. Homogenize. We conify A and B. This gives the conifications c(A) and c(B)
respectively. These are convex cones in X × R++ .
2. Work with homogeneous convex sets, that is, with convex cones. We want to
associate to the old convex cones C = c(A) and D = c(B) a new convex cone
E that is contained in X × R++ .
We take the Cartesian product C × D. This is a convex cone in X × R++ × X × R++. We reorder the factors, giving a convex cone in (X × X) × (R++ × R++). The choice (+−) means that we take the image of the factor X × X under the sum map +X and the inverse image of the factor R++ × R++ under the diagonal map ΔR. This gives a convex cone E such that E ⊆ X × R++.
Now we use that C = c(A) and D = c(B). It follows that E consists of all (z, γ) ∈ X × R++ for which there exists an element (sa, s, tb, t) of C × D—so a ∈ A, b ∈ B and s, t ∈ R++—for which
z = sa + tb and γ = s = t.
That is, z = γ(a + b) for some a ∈ A and b ∈ B. That is,
E = c(A + B),
and so deconification gives
A ◦(+−) B = A + B,
the Minkowski sum of A and B.
Here is a list that gives which choice leads to which binary operation if we construct
the binary operations for convex sets systematically, by the homogenization method.
Proposition 2.4.1 (Systematic Construction Binary Operations for Convex Sets) The outcome of the 4 = 2^2 possibilities (±, ±) of choosing for each one of the two factors of X × R between + and − is as follows:
• (+−) gives the Minkowski sum +, defined by
A + B = {a + b | a ∈ A, b ∈ B},
• (−−) gives the intersection ∩, defined by
A ∩ B = {x ∈ X | x ∈ A & x ∈ B},
• (++) gives the convex hull of the union co∪, defined by
A co∪ B = co(A ∪ B),
• (−+) gives the inverse sum #, defined by
A # B = ∪_{0≤ρ≤1} (ρA ∩ (1 − ρ)B).
Example 2.4.2 (Minkowski Sum) For n = 1, the Minkowski sum of two intervals is given by
[a1, b1] + [a2, b2] = [a1 + a2, b1 + b2].
For n = 2, the Minkowski sum of two rectangles whose sides are parallel to the coordinate axes is given by
[a1, b1] × [c1, d1] + [a2, b2] × [c2, d2] = [a1 + a2, b1 + b2] × [c1 + c2, d1 + d2].
This can be used to compute the Minkowski sum of two arbitrary convex sets in
R2 by approximating these by a disjoint union of such rectangles and then using
that if two convex sets A, B in R2 are written as the disjoint union of convex sets,
A = ∪i∈I Ai and B = ∪j ∈J Bj , then A + B = ∪i∈I,j ∈J (Ai + Bj ). For general n,
one can proceed in a similar way as for n = 2.
Example 2.4.3 (Convex Hull of the Union) Figure 2.1 illustrates the convex hull of
the union.
Figure 2.1 shows that if we want to determine the convex hull of the union of two convex sets A, B analytically in the interesting case that A ⊄ B and B ⊄ A, then we have to determine the outer common tangents to A and B and the points of tangency.
Example 2.4.4 (Inverse Sum) Let A, B ⊆ R^n be convex sets. Then 0n ∈ A#B iff 0n ∈ A or 0n ∈ B. For a nonzero vector v ∈ R^n, one has v ∈ A#B iff v lies in the inverse sum of the following two convex subsets of the open ray R generated by v: A ∩ R and B ∩ R. It remains to determine the inverse sum of two intervals in (0, +∞):
[b, a] # [d, c] = [(1/b + 1/d)^{−1}, (1/a + 1/c)^{−1}]
for all a > b > 0 and c > d > 0. This formula explains the terminology inverse sum.
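The interval formula can be confirmed by brute force: the sketch below (ours, not the book's) approximates A # B = ∪_{0≤ρ≤1} (ρA ∩ (1 − ρ)B) for two intervals on a grid of values of ρ and compares the result with the harmonic endpoints given by the formula.

```python
import numpy as np

def inverse_sum_of_intervals(b, a, d, c, steps=10001):
    """Brute-force A # B for A = [b, a], B = [d, c] inside (0, +infty)."""
    lo, hi = np.inf, 0.0
    for rho in np.linspace(0.0, 1.0, steps):
        # rho*A = [rho*b, rho*a] and (1 - rho)*B = [(1-rho)*d, (1-rho)*c]
        left = max(rho * b, (1 - rho) * d)
        right = min(rho * a, (1 - rho) * c)
        if left <= right:                    # nonempty intersection
            lo, hi = min(lo, left), max(hi, right)
    return lo, hi

a, b, c, d = 3.0, 2.0, 5.0, 1.0              # a > b > 0 and c > d > 0
print(inverse_sum_of_intervals(b, a, d, c))
print(1 / (1 / b + 1 / d), 1 / (1 / a + 1 / c))  # the endpoints agree
```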
2.5 Exercises
9. Give a 'coordinate proof' of Proposition 2.2.1: show that the matrices of +X and ΔX for X = R^n, equipped with the dot product, are each other's transpose.
10. Verify the two formulas (C × D)ΔX = C ∩ D and +X(C × D) = C + D for convex cones C, D ⊆ X = R^n.
11. Show that each one of the two standard binary operations ◦ = ∩, + for convex cones is commutative, A ◦ B = B ◦ A, and associative, (A ◦ B) ◦ C = A ◦ (B ◦ C). Do this in two ways: by the explicit formulas for these binary operations and in terms of their systematic construction.
12. Check that the set E constructed in Sect. 2.3 is a convex cone in X × R for
which E ⊆ X × R++ .
13. (a) Show that the following two binary operations for convex sets, the
Minkowski sum and the convex hull of the union, are the same for convex
cones.
(b) Show that the following two binary operations for convex sets, the intersec-
tion and the inverse sum, are the same for convex cones.
14. Show that each one of the four well-known binary operations on convex sets ◦ = ∩, +, co∪, # is commutative, A ◦ B = B ◦ A, and associative, (A ◦ B) ◦ C = A ◦ (B ◦ C). Do this in two ways: by the explicit formulas for these binary operations and in terms of their systematic construction.
15. Prove Proposition 2.4.1.
References
Abstract
• Why. Convex sets have many useful special topological properties. The chal-
lenging concept of recession directions of a convex set has to be mastered: this is
needed for work with unbounded convex sets. Here is an example of the use of
recession directions: they can turn ‘non-existence’ (of a bound for a convex set
or of an optimal solution for a convex optimization problem) into existence (of a
recession direction). This gives a certificate for non-existence.
• What. The recession directions of a convex set can be obtained by taking the
closure of its homogenization. These directions can be visualized by means of
the three models as ‘points at infinity’ of a convex set, lying on ‘the horizon’. All
topological properties of convex sets can be summarized by one geometrically
intuitive result only: if you adjoin to a convex set A their points at infinity, then
the outcome has always the same ‘shape’, whether A is bounded or unbounded:
it is a slightly deformed open ball (its relative interior) that is surrounded on
all sides by a ‘peel’ (its relative boundary with its points at infinity adjoined).
In particular, this is essentially a reduction of unbounded convex sets to the
simpler bounded convex sets. This single result implies all standard topological
properties of convex sets. For example, there is a very close relation between
the relative interior and the closure of a convex set: they can be obtained from
each other by taking the relative interior in one direction and the closure in the
other. Another example is that the relative boundary of a convex set has a
‘continuity’ property.
Road Map
• Definition 1.5.4 and Fig. 3.1 (recession vector).
• Section 3.2 (list of topological notions).
• Proposition 3.2.1, Definition 1.5.4, Fig. 3.2 (construction of recession vectors by
homogenization).
• Corollary 3.2.4 (unboundedness is equivalent to the existence of a recession direction).
• Figures 3.3, 3.4, 3.5, 3.6, 3.7 (visualization of recession directions as points on
the horizon).
• Theorem 3.5.7 (the shape of bounded and unbounded convex sets).
• Figure 3.10 (the idea behind the continuity of the relative boundary of a convex
set).
• Corollary 3.6.2 (all standard topological properties of convex sets).
• Figure 3.11 (warning that the image under a linear function of a closed convex
set need not be closed).
3.1 *Crash Course in Topological Notions
The study of convex sets A leads to the need to consider limits of those sequences of
points in A that tend either to a boundary point of A or ‘to a recession direction of A’
(in particular, the Euclidean norm of the points then goes to infinity). The most convenient way to do
this, is to use topological notions. In this section, we recall the minimal knowledge
that is needed for this. More information can be found in any textbook on analysis
or on point-set topology.
A set U ⊆ X = Rn is called open if it is the union of a collection of open balls.
An open ball is the solution set UX(a, r) = Un(a, r) = U(a, r) of an inequality of
the form ‖x − a‖ < r for some vector a ∈ Rn and some positive real number r. The
sets ∅ and X are open, the intersection of a finite collection of open sets is open and
the union of any collection of open sets is open. For any set S ⊆ X, the union of all
open sets contained in S is called the interior of S and it is denoted by int(S). This
is the largest open subset of S. Points in int(S) are called interior points of S.
A set C ⊆ X is called closed if its complement X\C is open. The sets ∅ and X are
closed, the union of a finite collection of closed sets is closed and the intersection of
any collection of closed sets is closed. For any S ⊆ X, the intersection of all closed
sets containing S is called the closure of S and it is denoted by cl(S). This is the
smallest closed set containing S.
For each set S ⊆ X, the points from the closure of S that are not contained in the
interior of S are called boundary points of S.
A set S ⊆ X is called bounded if there exists K > 0 such that |si | ≤ K for all
s ∈ S and all i ∈ {1, . . . , n}. Otherwise, S is called an unbounded set. A set S ⊆ X
is called compact if it is closed and bounded.
An infinite sequence of points v1, v2, . . . in X has limit v ∈ X—notation
limk→∞ vk = v—if there exists for each ε > 0 a natural number N such that
‖vk − v‖ < ε for all k ≥ N. Here ‖ · ‖ is the Euclidean norm on X, given by
‖x‖ = (x₁² + · · · + xₙ²)^{1/2} for all x ∈ X. Then the sequence is called convergent.
An alternative definition of a closed set in X is that it is a subset A of X such that
for every sequence of points in A that is convergent in X, also its limit is a point
in A.
Let S ⊆ X = Rn, T ⊆ Y = Rm, a ∈ S, b ∈ Y and f : S → T a function. Then
one says that, for s → a, the limit of f(s) equals b—notation lims→a f(s) = b—if
there exists for each ε > 0 a number δ > 0 such that ‖f(s) − b‖ < ε for all s ∈ S
for which ‖s − a‖ < δ.
The main aim of this section is to state the result that the recession cone RA of a
convex set A ⊆ X = Rn , defined in Definition 1.5.4, can be constructed by means
of a closure operation. The proof will be given in the next section.
Figure 3.1 illustrates the concept of recession cone of a convex set.
Proposition 3.2.1 For each nonempty closed convex set A ⊆ X = Rn, the convex
cone RA × {0} is the intersection of cl(c(A)), the closure of the conification c(A)
of A (the set of positive multiples of points of A × {1}), with the horizontal
hyperplane X × {0}.
Note that this result implies that the recession cone RA of a nonempty closed
convex set A is a closed convex cone containing the origin. Indeed, cl(c(A))
is a closed convex cone containing the origin, and so RA × {0}, which you get from
it by intersection with the horizontal hyperplane X × {0}, is also a closed convex
cone containing the origin. This means that RA is a closed convex cone containing
the origin.
Example 3.2.2 (Recession Cone by Means of a Closure Operation (Geometric))
Figure 3.2 illustrates Proposition 3.2.1 for the closed half-line A = [0, +∞) ⊂
X = R.
Fig. 3.1 The recession cone RA of a convex set A
Fig. 3.2 The recession cone of the half-line A = [0, +∞) obtained by a closure operation
The usual definition of the recession cone gives that A has a unique recession
direction, ‘go right’, described by the open ray {(x, 0) | x > 0}. The convex cone
c(A) is the solution set of x ≥ 0, y > 0. Its closure is the solution
set of x ≥ 0, y ≥ 0. So if you take the closure, then you add all points (x, 0) with
x ≥ 0. So the direction ‘go right’ is added.
Figure 3.2 can also be used to illustrate this proposition in terms of the ray model
(consider the sequence of open rays that is drawn and for which the angle with
the positive axis tends to zero; note that the positive x-axis models the recession
direction) and the hemisphere model (consider the sequence of dotted points on
the arc that tend to the encircled point; note that this point models the recession
direction).
Example 3.2.3 (Recession Cone by Means of a Closure Operation (Analytic))
1. Ellipse. Let A ⊆ R² be the solution set of 2x₁² + 3x₂² ≤ 5 (a convex set in the plane
whose boundary is an ellipse). The (minimal) conification c(A) = cmin(A) ⊆
R² × R₊₊ can be obtained by replacing xᵢ by xᵢ/x₃ with x₃ > 0, for i = 1, 2, and
then multiplying by x₃². This gives the inequality 2x₁² + 3x₂² ≤ 5x₃². Its solution
set in R² × R₊₊ is c(A). Then the maximal conification cmax(A) can be obtained
by taking the closure. This gives that cmax(A) is the solution set in R² × R₊ of
the inequality 2x₁² + 3x₂² ≤ 5x₃². The recession cone RA ⊆ R² can be obtained
by putting x₃ = 0 in this inequality. This gives the inequality 2x₁² + 3x₂² ≤ 0. Its
solution set in R² is {0₂}. This is RA.
2. Parabola. Let A ⊆ R² be the solution set of x₂ ≥ x₁² (a convex set in the
plane whose boundary is a parabola). We proceed as above. This gives that
the (minimal) conification c(A) = cmin(A) ⊆ R² × R₊₊ is the solution set of
x₂x₃ ≥ x₁². The maximal conification cmax(A) can be obtained by taking the
closure. One can check that this gives that cmax(A) is the solution set in R² × R₊
of the system x₂x₃ ≥ x₁², x₂ ≥ 0. The recession cone RA ⊆ R² can be obtained
by putting x₃ = 0 in the first inequality. This gives the system of inequalities
0 ≥ x₁², x₂ ≥ 0. Its solution set in R² is the closed vertical ray in R² that points
upward, {(0, ρ) | ρ ≥ 0}. This is RA.
3. Hyperbola. Let A be the solution set of x₁x₂ ≥ 1, x₁ ≥ 0, x₂ ≥ 0 (a convex set
in the plane whose boundary is a branch of a hyperbola). We proceed again as
above. This gives that the (minimal) conification c(A) = cmin(A) ⊆ R² × R₊₊ is
the solution set of x₁x₂ ≥ x₃², x₁ ≥ 0, x₂ ≥ 0. The maximal conification cmax(A)
can be obtained by taking the closure. One can check that this gives that cmax(A)
is the solution set in R² × R₊ of the system x₁x₂ ≥ x₃², x₁ ≥ 0, x₂ ≥ 0. The
recession cone RA ⊆ R² can be obtained by putting x₃ = 0 in the first inequality.
This gives the system of inequalities x₁x₂ ≥ 0, x₁ ≥ 0, x₂ ≥ 0. Its solution set
in R² is the first quadrant R²₊. This is RA.
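The homogenization step in these computations can be automated. Here is a minimal sketch with sympy (an illustration, not from the book): substitute xᵢ → xᵢ/x₃, clear denominators, and then set x₃ = 0 to read off the recession cone.

import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3', real=True)

def homogenize(lhs, rhs, degree):
    # Homogenize the inequality lhs <= rhs by x_i -> x_i/x3 (x3 > 0)
    # and multiplication by x3**degree.
    sub = {x1: x1 / x3, x2: x2 / x3}
    return (sp.expand(lhs.subs(sub) * x3**degree),
            sp.expand(rhs.subs(sub) * x3**degree))

# Ellipse: 2*x1**2 + 3*x2**2 <= 5 has a degree-2 homogenization.
lhs, rhs = homogenize(2*x1**2 + 3*x2**2, sp.Integer(5), 2)
print(lhs, '<=', rhs)                          # 2*x1**2 + 3*x2**2 <= 5*x3**2
print(lhs.subs(x3, 0), '<=', rhs.subs(x3, 0))  # ... <= 0, so RA = {(0, 0)}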
Proposition 3.2.1 has the following consequence.
Corollary 3.2.4 Let A ⊆ X = Rn be a given nonempty closed convex set. Then the
following conditions are equivalent:
1. A is bounded,
2. A has no recession directions,
3. c(A) ∪ {0n+1} is closed.
For each nonempty closed convex set A, there is a natural bijection between the set
of recession directions of A and the set of ‘horizontal’ open rays of cl(c(A)), that
is, those rays that are not open rays of c(A).
Example 3.2.5 (Unboundedness of Non-convex Sets) The subset of the plane R²
that is the solution set of x² + 1 ≥ y ≥ x² is not convex; this set is unbounded, but it
contains no half-lines. So the fact that unboundedness of a convex set is equivalent
to the existence of a recession direction is a special property of convex sets. It
does not hold for arbitrary subsets of Rn. Checking unboundedness of non-convex
sets is therefore much harder than checking unboundedness of convex sets.
Now we prove the results from the previous section. Here is the proof of the main
result.
Proof of Proposition 3.2.1 It suffices to prove the second statement as both the
recession cone RA and the closure of the conification of A, cl(c(A)), do not change
if A is replaced by its closure cl(A).
⊆: Let (v, ρ) ∈ cl(c(A)) \ c(A). Choose a sequence ρn (an , 1), n ∈ N in c(A)—
so an ∈ A and ρn > 0 for all n—that has limit (v, ρ).
First we prove by contradiction that ρ = 0. Assume ρ > 0. Then the sequence
(an , 1), n ∈ N in c(A) has limit (ρ −1 v, 1) as limn→∞ ρn = ρ > 0. That is,
limn→∞ an = ρ −1 v, so ρ −1 v ∈ A as A is closed. Therefore, (v, ρ) = ρ(ρ −1 v, 1)
lies in c(A). This is the required contradiction.
So ρ = 0. By the choice of the sequence above, we have limn→∞ ρn an = v and
limn→∞ ρn = ρ = 0. Choose a ∈ A. For each t ≥ 0, we consider the sequence
((1 − tρn )a + tρn an )n in A, where we start the sequence from the index n = Nt
where Nt is chosen large enough such that tρn < 1 for all n ≥ Nt . This choice
is possible as limn→∞ ρn = 0. This sequence ((1 − tρn)a + tρn an)n converges to
a + tv as ρn → 0 and ρn an → v, and so a + tv ∈ A as A is closed. This proves that
a + tv ∈ A for all t ≥ 0 and so, by definition, v is a recession vector of A, that is,
v ∈ RA. This establishes the inclusion ⊆.
⊇: Let v ∈ RA. If v = 0n, then we choose a ∈ A and note that (v, 0) = 0n+1 =
limt↓0 t(a, 1) ∈ cl(c(A)). Now let v ≠ 0n. Choose a ∈ A such that a + tv ∈ A
for all t ≥ 0. Then t⁻¹(a + tv, 1) ∈ c(A) for all t > 0 and limt→∞ t⁻¹(a + tv, 1) = (v, 0). So
(v, 0) lies in the closure of c(A), that is, (v, 0) ∈ cl(c(A)). As the last coordinate of
(v, 0) is zero, we have (v, 0) ∉ c(A), as c(A) ⊆ X × R₊₊. So we have proved that
(v, 0) ∈ cl(c(A)) \ c(A). This establishes the inclusion ⊇.
Here is the proof of the corollary.
Proof of Corollary 3.2.4 We prove the following implications. 3 ⇒ 1: If A is
unbounded, we can choose a sequence an, n ∈ N in A for which limn→∞ ‖an‖ =
+∞. Then we can choose a subsequence ank, k ∈ N with ank ≠ 0n for all k and
such that, moreover, the sequence ‖ank‖⁻¹ank, k ∈ N is convergent, say to v ∈ Sn.
This is possible as the unit sphere Sn is compact and as each sequence in a compact
set has a convergent subsequence. Then the sequence (‖ank‖⁻¹ank, ‖ank‖⁻¹), k ∈ N
in c(A) converges to (v, 0), which does not lie in c(A) ∪ {0n+1}. So c(A) ∪ {0n+1} is
not closed. This proves the implication: c(A) ∪ {0n+1} closed ⇒ A bounded.
2 ⇒ 3: follows from Proposition 3.2.1.
1 ⇒ 2: follows from the definition of recession direction.
The last statement of the corollary follows immediately from Proposition 3.2.1.
3.4 Illustrations Using Models
Fig. 3.3 The ray model for the recession direction of a closed half-line A
Fig. 3.4 The hemisphere model for the recession direction of a closed half-line A
Fig. 3.5 The hemisphere model for the recession directions of the plane
Example 3.4.2 (Recession Directions and the Hemisphere Model in Dimension One)
Figure 3.4 illustrates again the case of a closed half-line, A = [a, +∞) ⊆ X = R
with a > 0. Now the hemisphere model is used.
You see that A is visualized as a half-open arc Â (the bottom point is
not included) in the hemisphere model, and that the unique recession direction (‘go
right’) is visualized as the bottom point of the arc, also in the hemisphere model. If
we would take the closure of this half-open arc, the effect would be that
this bottom point would be added to it. The closure of the conification, cl(c(A)), is
also indicated in the figure.
Example 3.4.3 (Recession Directions and the Hemisphere Model for the Plane)
Figure 3.5 illustrates the case of the plane A = X = R2 . The hemisphere model is
used.
The plane A = X is visualized as the standard open upper hemisphere X̂, and
the set of recession directions of X, that is, of the open rays of the recession cone
RX, is visualized by the circle that bounds the upper hemisphere. This last set is
indicated in the same way as the recession cone of A = X, by RX. If you take the
closure of the model of A = X, then what is added is precisely the model of the set
of recession directions. Indeed, if we take the closure of the open upper hemisphere,
then the circle that bounds the upper hemisphere is added to it.
Example 3.4.4 (Recession Directions and the Hemisphere Model for Convex Sets
Bounded by an Ellipse, a Parabola, and a Branch of a Hyperbola) Figure 3.6
illustrates the case of three closed convex sets A in the plane R2 . The hemisphere
model is used.
Fig. 3.6 The hemisphere model for the recession directions of some convex sets in the plane
Fig. 3.7 The top-view model for the recession directions of some convex sets in the plane
Each one of the three shaded regions is denoted in the same way as the convex set
that it models, as A; the set of points of the shaded region that lie on the boundary
of the hemisphere is denoted in the same way as the recession cone of A, whose set
of rays it models, as RA .
On the left, the model of A is closed; this corresponds to the fact that A has no
recession directions, or equivalently, to the fact that A is bounded. In the middle,
the model of A is not closed: you have to add one suitable point from the circle
that bounds the hemisphere to make it closed. This point corresponds to the unique
recession direction of A, the unique open ray of the recession cone RA . In particular,
A is not bounded. On the right, the model of A is not closed: you have to add a
suitable closed arc from the circle that bounds the hemisphere to make it closed.
The points of this arc correspond to the recession directions of A. These points
correspond to the open rays of RA . In particular, A is not bounded.
Example 3.4.5 (Recession Directions and the Top-View Model for Convex Sets
Bounded by an Ellipse, a Parabola, and a Branch of a Hyperbola) Figure 3.7
illustrates also the case of three closed convex sets A in the plane R2 , just as Fig. 3.6,
but it differs slightly. In Fig. 3.7, the convex sets contain the origin in their interior,
the top-view model is used, and some arrows are drawn.
The arrows in Fig. 3.7 have to be explained. These model straight paths in the
convex set A that start at the origin of the plane. If an arrow ends in the interior of
the disk, then the path in A ends at a boundary point of A; if the arrow ends at a
boundary point of the disk, then the path in A does not end: it is a half-line.
Example 3.4.6 (Recession Directions and the Ray Model for n-Dimensional Space
Rn ) Consider the case A = X = Rn . Note to begin with that cl(c(X)) equals
the closed upper halfspace X × R+ . Its non-horizontal open rays correspond to the
points of X, and its horizontal open rays correspond to the one-sided directions of
X, that is, to Sn, the standard unit sphere in Rn. Indeed, these non-horizontal rays
are the rays {ρ(x, 1) | ρ > 0} where x ∈ X, and these horizontal rays are the rays
{ρ(x, 0) | ρ > 0} where x ∈ Sn.
Example 3.4.7 (Gauss and Recession Directions) It is interesting that Gauss
already used a similar model to visualize one-sided directions in space.
Disquisitiones, in quibus de directionibus variarum
rectarum in spatio agitur, plerumque ad maius
perspicuitatis et simplicitatis fastigium evehuntur,
in auxilium vocando superficiem sphaericam radio = 1
circa centrum arbitrarium descriptam, cuius singula
puncta repraesentare censebuntur directiones rectarum
radiis ad illa terminatis parallelarum.
Investigations, in which the directions of various straight lines in space are to be considered,
attain a high degree of clearness and simplicity if we employ, as an auxiliary, a sphere of
unit radius described about an arbitrary center, and suppose the different points of the sphere
to represent the directions of straight lines parallel to the radii ending at these points. (First
sentence of General Investigations of Curved Surfaces: Carl Friedrich Gauss, 1828.)
This fits in with the hemisphere model. The auxiliary sphere of unit radius that
is recommended by Gauss is precisely the boundary of the open upper hemisphere
x₁² + x₂² + x₃² + x₄² = 1, x₄ > 0 that models the points of three dimensional space.
Indeed, this boundary is x₁² + x₂² + x₃² = 1, x₄ = 0, that is, it is a sphere of radius 1.
radius 1. So the auxiliary sphere recommended by Gauss agrees completely with
the visualization of recession vectors in the hemisphere model, which has been
explained above.
3.5 The Shape of a Convex Set
The aim of this section is to give the main result of this chapter—a result describing
the ‘shape’ of a convex set. We will formulate this result in terms of the hemisphere
model. We begin by giving a parametrization of the standard upper hemisphere.
Definition 3.5.1 The angle ∠(u, w) between two unit vectors u, w ∈ Rp is the
number ϕ ∈ [0, π] for which cos ϕ = u · w.
Definition 3.5.2 (Parametrization of the Upper Hemisphere) For each v ∈
SX = Sn, the standard unit sphere in X = Rn, and each t ∈ [0, ½π], we define
the point xv(t) in the standard upper hemisphere in X × R = Rn+1, given by
x₁² + · · · + xₙ₊₁² = 1, xₙ₊₁ ≥ 0, implicitly by the following two conditions: the
angle between xv(t) and (0n, 1) equals t, and xv(t) lies on the quarter circle from
(0n, 1) to (v, 0); explicitly,

xv(t) = ((sin t)v, cos t).
This parametrization of the standard upper hemisphere is not unique for the point
(0n , 1): then one can take t = 0 and each v ∈ Sn .
Example 3.5.3 (Parametrization of the Upper Hemisphere) Figure 3.8 illustrates
this parametrization.
In order to visualize how the point xv(t) depends on t for a given unit vector v,
the vector v and the vector (0n, 1) are drawn as (1, 0) and (0, 1) in the plane R²,
respectively. The point on the standard unit circle in the plane R² that lies in the first
quadrant R²₊ such that the length of the arc with endpoints (0, 1) and this point is t,
is drawn: this point is xv(t).
The hemisphere model for X = Rn and the one-sided directions in X can
be expressed algebraically in terms of the parametrization of the standard upper
hemisphere given above. The point xv(t) on the hemisphere models the point
(tan t)v in X if 0 ≤ t < ½π and it models the one-sided direction in X given
by the unit vector v if t = ½π.
The following result is a first step towards the main result; it is also of
independent interest. From now on, we will take topological notions for a convex
set A ⊆ Rn always relative to the affine hull of A.
Proposition 3.5.4 Each nonempty convex set A ⊆ X = Rn has a relative interior
point.
Proof Take a maximal affinely independent set of points a1 , . . . , ar in A. Then the
set of points ρ1 a1 + · · · + ρr ar with ρ1 , . . . , ρr > 0, ρ1 + · · · + ρr = 1 is open in
aff(A) and so all its points are relative interior points of A.
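As a tiny numeric illustration of this proof (a sketch, not from the book): for the triangle A = co{a₁, a₂, a₃} inside the plane z = 1 of R³, the points a₁, a₂, a₃ form a maximal affinely independent subset, and every combination with all weights positive is a relative interior point of A (A has empty interior in R³, but nonempty relative interior).

a1, a2, a3 = (0, 0, 1), (1, 0, 1), (0, 1, 1)
weights = (1/3, 1/3, 1/3)  # positive weights summing to one
p = tuple(sum(w * a[i] for w, a in zip(weights, (a1, a2, a3)))
          for i in range(3))
print(p)  # (0.333..., 0.333..., 1.0): a relative interior point of A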
This result gives a better grip on convex sets. Whenever you deal with a convex
set, it helps to choose a relative interior point and then consider the position of the
convex set in comparison with the chosen point. We will do this below in order to
simplify the formulation and proof of the main result, Theorem 3.5.7.
To be more precise, Proposition 3.5.4 shows that we can always assume wlog
that a given nonempty convex set A ⊆ Rn contains the origin in its interior.
Indeed, we can choose a relative interior point and translate A to a new position A′ such
that this point is brought to the origin; this turns the affine hull of A into the span of
A′. Finally, we consider A′ as a convex subset of the subspace span(A′) and we give
a coordinate description of span(A′) by choosing a basis for it. This concludes the
reduction to a convex set that has the origin as an interior point.
Before we state the main result formally, we will explain the geometric meaning
of it.
Example 3.5.5 (The Main Result on the Shape of a Convex Set for a Closed Ball)
The following description of the shape of a closed ball in Rn is the view that will
be given below on arbitrary convex sets. If you travel in the direction of
some unit vector v, with constant speed and in a straight line, starting from the
center of the ball, then the following will happen. You travel initially through the
interior of the ball, then at one moment you will hit the boundary of the ball,
and after that moment you will travel forever outside the ball. The hitting time does
not depend on the direction v.
For the purpose of explaining the geometric meaning of the main result, the
top-view model is the most convenient one; for the purpose of giving a formal
description of the result, the hemisphere model is more convenient.
Example 3.5.6 (The Main Result on the Shape of a Convex Set for Sets in the Plane
Bounded by an Ellipse, a Parabola, and a Branch of a Hyperbola) Figure 3.9
illustrates the main result for three closed convex sets A in the plane R2 that contain
the origin in their interiors: the left one is bounded by an ellipse, the middle one is
bounded by a parabola and the right one is bounded by a branch of a hyperbola.
Here is some explanation for this figure. For each one of these three convex sets
A, its model is indicated by the same letter as the set itself, by A; the model of the
set of recession directions is indicated by the same letter as the recession cone, by
RA . Recall here that the recession directions of A correspond to the open rays of
RA . Each arrow in the figure models either a closed line segment with one endpoint
the origin and the other endpoint a boundary point of A, or it models a half-line
contained in A with one endpoint the origin.
A
A RA A
RA
Fig. 3.9 The shape of a closed convex set together with its recession directions
Now we describe the main result for these three sets A. Here we get the same
description for three different looking convex sets, after adding recession directions.
Suppose that you travel, in each one of the three cases, in a straight line in A, starting
from the origin and going in some direction v (a unit vector), with speed increasing
in such a way that the speed would be constant in the hemisphere model. Then in the
top-view model, you travel also in a straight line. At first, you will travel through the
interior of the model of A. Then at one moment, τ(v), you will hit the boundary
of the model of A. If this boundary point does not lie on the boundary of the closed
standard unit disk, then after that moment you will travel outside the model of A
until you hit the boundary of the disk. The hitting time τ(v) of the boundary of
the model of A depends continuously on the initial direction v. This completes the
statement of the main result for these three convex sets A. So we see that the three
sets A, which look completely different from each other, have exactly the same
shape as a closed disk, after we adjoin to A the set of its recession directions.
Now we are ready to state the main result formally and in general. As already
announced, we do this in terms of the hemisphere model, as this allows the
simplest formulation. We use the parametrization of the upper hemisphere from
Definition 3.5.2. We recall that all topological notions for subsets of the standard
upper hemisphere {x ∈ Sk | xk ≥ 0} are taken relative to the standard unit sphere
Sk = {x ∈ Rk | ‖x‖ = 1}.
Theorem 3.5.7 (Shape of a Convex Set) Let A ⊆ X = Rn be a closed convex set
that contains the origin in its interior. Consider the hemisphere model cl(c(A)) ∩
Sn+1 for the convex set A to which the set of its recession directions has been added.
For each v ∈ Sn, there exists a unique τ(v) ∈ (0, ½π] such that the point xv(t) lies
in the interior of cl(c(A)) ∩ Sn+1 for t < τ (v), on the boundary of cl(c(A)) ∩ Sn+1
for t = τ (v), and outside cl(c(A)) ∩ Sn+1 for t > τ (v). Moreover, τ (v) depends
continuously on v.
The meaning of this result is that if you adjoin the set of recession directions to a
convex set A, then the outcome always has the same shape, which is, for example,
the shape of a closed ball in Rd where d is the dimension of the affine hull of A.
Note that the continuity of the function τ (·) means that the union of the relative
boundary of a convex set and its set of recession directions has a continuity property
in the hemisphere model: if you travel at unit speed along a geodesic starting from
a relative interior point, then the time it takes to reach a point of this union, depends
continuously on the initial direction.
A nice feature of this result is that it implies immediately all standard topological
properties of convex sets, bounded as well as unbounded ones, as we will see in the
next section.
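To make the theorem concrete numerically, here is a minimal sketch (an illustration with an assumed example set, not from the book): for A = {(x, y) | y ≥ x² − 1}, which contains the origin in its interior, the hitting time equals τ(v) = arctan R(v), where R(v) = sup{t ≥ 0 | tv ∈ A}, and τ(v) = ½π exactly when v is a recession direction of A.

import math

def in_A(x, y):
    return y >= x * x - 1  # membership test for the convex set A

def tau(v, t_max=1e6, iters=100):
    # Hitting time of the boundary of the model of A, by bisection on t.
    if in_A(t_max * v[0], t_max * v[1]):
        return math.pi / 2  # v is (numerically) a recession direction
    lo, hi = 0.0, t_max
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if in_A(mid * v[0], mid * v[1]) else (lo, mid)
    return math.atan(lo)

for phi in (0.0, 0.25 * math.pi, 0.5 * math.pi):  # sweep the direction v
    v = (math.cos(phi), math.sin(phi))
    print(round(phi, 3), round(tau(v), 4))  # tau varies continuously in v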
Proof of Theorem 3.5.7 The set cl(c(A)) is a closed convex cone contained in the
halfspace X × R₊ and it contains an open ball with center (0n, 1) and some radius
r > 0. This implies all statements of the theorem except the continuity statement,
which follows from the lemma below.
Example 3.5.8 (Continuity Property of the Boundary of a Convex Set) Figure 3.10
illustrates the following auxiliary result and it makes clear why this result implies
a ‘continuity’ property for the boundary of a convex set (this is a matter of
the boundary being squeezed at some point between two sets with ‘continuity’
properties at this point).
Lemma 3.5.9 Let A ⊆ X = Rn be a closed convex set that contains an open ball
U (a, ε) and let x be a boundary point of A. Then the convex hull of the point x and
the open ball U (a, ε) is contained in A, and its point reflection with respect to the
point x is disjoint from A, apart from the point x.
Proof The first statement follows from the convexity of A, together with the
assumptions that x ∈ A and U (a, ε) ⊆ A. The second statement is immediate
from the easy observation that the convex hull of any point of the reflection and
the open ball U (a, ε) contains x in its interior. So if a point of the reflection would
belong to A, then by the convexity of A and by U (a, ε) ⊆ A we would get that x is
in the interior of A. This would contradict the assumption that x is a boundary point
of A.
To derive the required continuity statement precisely from this lemma requires
some care. You have to intersect the convex cone cl(c(A)) with a hyperplane of
X × R (an affine subspace of dimension n of X × R = Rn+1) that does not contain the point
(v, 0). We do not display the details of this derivation.
3.6 Topological Properties of a Convex Set
The aim of this section is to give a list of all standard topological properties of
convex sets. It is an advantage of the chosen approach that all topological properties
of convex sets are a part of or easy consequences of one result, Theorem 3.5.7.
Definition 3.6.1 An extreme point of a convex set A ⊆ X = Rn is a point of A that
is not the average of two different points of A. An extreme recession direction of A
is a recession direction such that a vector that represents this direction is not the
average of two linearly independent recession vectors of A. We write extr(A) for the set of
extreme points of A and extrr(A) for the set of unit vectors that represent an extreme
recession direction of A.
The kernel of a linear function between finite dimensional inner product spaces
L : X → Y is defined to be the following subspace of X:
ker(L) = {x ∈ X | L(x) = 0Y }.
Statement 9 is, in the bounded case, due to Minkowski for n = 3 and to Steinitz
for any n. The result was generalized by Krein and Milman, still in the bounded case,
to suitable infinite dimensional spaces (those that are locally convex), but there one
might have to take the closure. Now the terminology Krein–Milman is often used
also for the finite dimensional case. The assumption that A contains no line can be
made essentially wlog. Indeed, for each closed convex set A ⊆ X = Rn one has

A = LA + (A ∩ LA⊥),

where LA is the lineality space of A and

LA⊥ = {x ∈ X | x · l = 0 ∀l ∈ LA}.
The statements 1–7 follow immediately from the known shape of a convex set,
that is, from Theorem 3.5.7. Statement 10 is a combination of Krein–Milman and
Carathéodory’s theorem 1.12.1. It remains to prove the statements 8 and 9.
Proof Krein–Milman Take a nonempty closed subset S of the upper hemisphere
that is convex in the sense that for two different and non-antipodal points in S the
geodesic that connects them is contained in S. For each point s ∈ S, let Ls be the
subspace of X of all tangent vectors at s to geodesics that have s as an interior point.
Now pick an element p ∈ S. If Lp = {0n }, then stop. Otherwise, take a nonzero
v ∈ Lp and choose the geodesic through p which has v as tangent vector at p.
Consider the endpoints q, r of the part of this geodesic that lies in S. Then repeat
this step for q and r. Note that Lq, Lr ⊆ Lp but Lq, Lr are not equal to Lp, as they
do not contain v. As the dimensions go down at each step, this process must lead to
a finite set of points s with Ls = {0n } such that p lies in the convex hull of these
points.
Now we prove statement 8.
Proof Statement 8 under the Assumption on the Recession Cone By the assumption,
we get a surjective continuous function from the intersection SX ∩ C to the
intersection SY ∩ LC, defined by u ↦ Lu/‖Lu‖. So SY ∩ LC is compact, as
compactness is preserved under taking the image under a continuous mapping.
Hence the convex cone LC is closed.
Sketch Proof Statement 8 for the Polyhedral Case The statement holds if L is
injective: then we may assume wlog that L is defined by appending m − n zeros
to each vector x ∈ Rn (by working with coordinates in X and Y with
respect to suitable bases); this shows that if A is a polyhedral set, then LA
is also a polyhedral set and so LA is closed. So as each linear function is
the composition of injective linear functions and linear functions of the form
πr : Rr+1 → Rr given by the recipe (x1 , . . . , xr , xr+1 ) → (x1 , . . . , xr ), it
suffices to prove the statement for linear functions of the latter type. Note that each
polyhedral set P in Rr+1 is the solution set of a system of linear inequalities of
the form Li (x1 , . . . , xr ) ≥ xr+1 , xr+1 ≥ Rj (x1 , . . . , xr ), Sk (x1 , . . . , xr ) ≥ 0
for all i ∈ I, j ∈ J, k ∈ K for suitable finite index sets I, J, K. Here
Li , Rj , Sk are affine functions of x1 , . . . , xr . Then the image πr (P ) of P under
πr is readily seen to be the solution set of the system of linear inequalities
Li (x1 , . . . , xr ) ≥ Rj (x1 , . . . , xr ), Sk (x1 , . . . , xr ) ≥ 0 for all i ∈ I, j ∈ J, k ∈ K.
That is, it is again a polyhedral set and in particular it is closed.
The method that was used in this proof is called the Fourier-Motzkin elimination,
a method to eliminate variables in a system of linear inequalities.
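Here is a minimal sketch of this elimination step (an illustration, not the book's notation): rows with a positive coefficient of the last variable give upper bounds, rows with a negative coefficient give lower bounds, and each pair is combined so that the variable cancels.

from itertools import product

def eliminate_last(rows):
    # rows: list of (a, b) encoding a[0]*x1 + ... + a[-1]*xr <= b.
    # Returns an equivalent system in the variables x1, ..., x_{r-1}.
    uppers = [(a, b) for a, b in rows if a[-1] > 0]
    lowers = [(a, b) for a, b in rows if a[-1] < 0]
    zeros = [(a[:-1], b) for a, b in rows if a[-1] == 0]
    out = list(zeros)
    for (au, bu), (al, bl) in product(uppers, lowers):
        cu, cl = au[-1], -al[-1]  # positive scalars
        # cl*(upper row) + cu*(lower row) cancels the last variable:
        a = [cl * u + cu * l for u, l in zip(au[:-1], al[:-1])]
        out.append((a, cl * bu + cu * bl))
    return out

# Project the triangle x >= 0, y >= 0, x + y <= 1 onto the x-axis:
system = [([-1, 0], 0), ([0, -1], 0), ([1, 1], 1)]
print(eliminate_last(system))  # [-x <= 0, x <= 1], i.e. the interval [0, 1]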
3.8 *Applications of Recession Directions
We give two applications of recession vectors. One can usually convince a non-
expert that something exists by giving an example. To convince him/her that
something does not exist is usually much harder. However, in some cases we
are lucky and then we can translate non-existence of something into existence of
something else.
How to convince a non-expert that a convex set A is unbounded, that is, that there
exists no upper bound for the absolute value of the coordinates of the points in A?
Well, the nonexistence of a bound for a convex set A is equivalent to the existence
of a nonzero recession vector for A. So a recession vector for A is valuable: it is
a certificate for unboundedness for A. So you can convince a non-expert that there
exists no bound for A by giving him/her an explicit nonzero recession vector for A.
How to convince a non-expert that there does not exist a global minimum for a
linear function l on a closed convex set A? This non-existence is equivalent to the
existence of a nonzero recession vector of A along which the function l descends.
So such a vector is valuable: it is a certificate for insolubility for the considered
optimization problem. So you can convince a non-expert that there does not exist an
optimal solution for the given optimization problem by giving him/her an explicit
nonzero recession vector of A along which the function l descends.
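Here is a toy version of such a certificate (a sketch, not from the book): for A = {(x, y) | y ≥ x²}, the vector c = (0, 1) is a recession vector, so a + tc stays in A for every a ∈ A and every t ≥ 0, which certifies that A is unbounded.

def in_A(x, y):
    return y >= x * x  # membership test for A

a, c = (2.0, 5.0), (0.0, 1.0)  # a point of A and a recession vector
print(all(in_A(a[0] + t * c[0], a[1] + t * c[1]) for t in range(1000)))
# True: the whole half-line a + t*c, t >= 0, lies in A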
3.9 Exercises
1. Give a direct proof that for an interior point a of a convex set A ⊆ Rn and an
arbitrary point b̃ ∈ A, the half-open line segment [a, b̃) is contained in the
interior of A.
Hint: see Fig. 3.12.
2. Show that both the closure and the relative interior of a convex set are convex
sets.
3. Show that the following two properties of a set C ⊆ Rn are equivalent (both
define the concept of closed set):
(a) the complement Rn \ C is open,
(b) C contains along with each convergent sequence of its points also its limit.
4. Show that a set S ⊆ Rn is compact, that is, closed and bounded, iff every
sequence in S has a subsequence that converges to an element in S.
5. Show that the following three properties of a function f : S → T with
S ⊆ Rn , T ⊆ Rm are equivalent (all three define the concept of continuous
function):
(a) lims→a f (s) = f (a) for all a ∈ S,
(b) the inverse image under f of every set that is open relative to T is open
relative to S,
(c) the inverse image under f of every set that is closed relative to T is closed
relative to S.
6. Show that one can write each nonempty closed convex set A as a Minkowski
sum D + L where D is a closed convex set that contains no line and L is a
subspace. In fact, L is uniquely determined to be LA, the lineality space of A.
Moreover, one can choose D to be the intersection A ∩ LA⊥.
This result allows us to restrict attention to nonempty closed convex sets that
do not contain a line.
7. Show that the following conditions on a convex cone C ⊆ Rn × R+ containing
the origin are equivalent:
(a) C is closed,
(b) the model of C in the ray model is closed in the sense that if the angles
between the rays of a sequence of rays of C and a given ray in Rn × R+
tend to zero, then this given ray is also a ray of C,
(c) the model of C in the hemisphere model is closed,
(d) the model of C in the top-view model is closed.
8. Show that an unbounded convex set has always a recession direction, but that an
arbitrary unbounded subset of Rn does not always have a recession direction.
9. Determine the recession cones of some simple convex sets, such as (7, +∞),
{(x, y) ∈ R2 | y ≥ x 2 } and {(x, y) ∈ R2 | xy ≥ 1, x ≥ 0, y ≥ 0}.
10. Give an example of a convex set A ⊆ Rn, two recession vectors c, c′ and a
point a ∈ A such that a + tc ∈ A ∀t ≥ 0 but not a + tc′ ∈ A ∀t ≥ 0.
11. Show that the recession cone RA of a nonempty convex set A ⊂ Rn is a convex
cone.
12. Show that the recession cone RA of a nonempty closed convex set A ⊂ Rn is
closed.
13. Show that the recession cone RA of a nonempty convex set A ⊂ Rn that is not
closed need not be closed.
14. Let A be a nonempty closed convex set in X = Rn. Show that a unit vector
c ∈ X is in the recession cone RA of A iff there exists a sequence a1, a2, . . . of
nonzero points in A for which ‖ai‖ → ∞ and ai/‖ai‖ → c for i → ∞.
15. Show that the recession cone RA of a nonempty closed convex set A ⊆ X = Rn
consists—for any choice of a ∈ A—of all c ∈ X for which a + tc ∈ A for all
t ≥ 0.
Figure 3.13 illustrates this.
So for closed convex sets, a simplification of the definition of recession
vector is possible: we can take the same arbitrary choice of a in the entire set
A, for all recession vectors c.
16. Verify in detail that Fig. 3.2 is in agreement with Proposition 3.2.1. Do this
using each one of the models (ray, hemisphere, top-view).
17. Verify that the gauge of a closed convex set that contains the origin in its
interior, is well-defined, in the following way. Let A ⊆ X = Rn be a closed
convex set that contains the origin in its interior. Show that there exists a unique
sublinear function g : X → R+ for which
A = {x ∈ X | g(x) ≤ 1}.
18. Show that the epigraph of the gauge g of a closed convex set A that contains
the origin in its interior, is equal to the closure of the homogenization of A,
epi(g) = cl(c(A)).
19. Let A ⊆ X = Rn be a closed convex set that has the origin as an interior point,
and let g be its gauge. Show that
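In connection with Exercises 17–19: a standard formula that produces the gauge (offered here as a sketch, not necessarily the book's intended construction) is

g(x) = inf{λ > 0 | x ∈ λA} for all x ∈ X,

and with this formula one can check directly that A = {x ∈ X | g(x) ≤ 1} and that epi(g) = cl(c(A)).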
4 Convex Sets: Dual Description
Abstract
• Why. The heart of the matter of convex analysis is the following phenomenon:
there is a perfect symmetry (called ‘duality’) between the two ways in which a
closed proper convex set can be described: from the inside, by its points (‘primal
description’), and from the outside, by the halfspaces that contain it (‘dual
description’). Applications of duality include the theorems of the alternative:
non-existence of a solution for a system of linear inequalities is equivalent to
existence of a solution for a certain other such system. The best known of these
results is Farkas’ lemma. Another application is the celebrated result from finance
that European call options have to be priced by a precise rule, the formula of
Black-Scholes. Last but not least, duality is very important for practitioners of
optimization methods, as we will see in Chap. 8.
• What. Duality for convex sets is stated and a novel proof is given: this amounts
to just throwing a small ball against a convex set. Many equivalent versions of
the duality result are given: the supporting hyperplane theorem, the separation
theorems, the theorem of Hahn–Banach, the fact that a duality operator on convex
sets containing the origin, the polar set operator B → B ◦ , is an involution. For
each one of the four binary operations on convex sets a rule is given of the type
(B₁ ⊙ B₂)◦ = B₁◦ ⊙′ B₂◦, where ⊙′ is another one of the four binary operations on
convex sets, and where B₁, B₂ are convex sets containing the origin. Each one of
these four rules usually requires a separate proof, but homogenization generates
a unified proof. This involves a construction of the polar set operator by means
of a duality operator for convex cones, the polar cone operator. The following
technical problem requires attention: all convex sets that occur in applications
have two nice properties, closedness and properness, but they might lose these if
you work with them. For example, B₁ ⊙ B₂ need not have these properties if B₁
and B₂ have them and ⊙ is one of the four binary operations on convex sets; then
the rule above might not hold. This rule only holds under suitable assumptions.
Road Map
1. Definition 4.2.1, Theorem 4.2.5, Figs. 4.5, 4.6 (the duality theorem for convex
sets, remarks on the two standard proofs, the novel ball throwing proof).
2. Figure 4.7, Definition 4.3.1, Theorem 4.3.3, hint (reformulation duality theo-
rem: supporting hyperplane theorem).
3. Figure 4.8, Definition 4.3.4, Theorem 4.3.6, hint (reformulations duality theo-
rem: separation theorems).
4. Definitions 4.3.7, 4.3.9, Theorem 4.3.11, hint (reformulation duality theorem:
theorem of Hahn–Banach).
5. Definition 4.3.12, Theorem 4.3.15, hint, remark (reformulation duality theorem:
polar set operation is involution).
6. Figure 4.9, Definition 4.3.16, Theorem 4.3.19, hint, Theorem 4.3.20, hint
(reformulations of duality theorem: (1) polar cone operator is involution, (2)
polar cone of C ≠ Rn is nontrivial; self-duality of the three golden cones).
7. The main idea of the Sects. 4.5 and 4.6 (construction polar set operator by
homogenization; pay attention to the minus sign in the bilinear mapping on
X × R and the proof that the ‘type’ of a homogenization of a convex set
containing the origin remains unchanged under the polar cone operator).
8. Figure 4.12 (geometric construction of polar set).
9. Propositions 4.6.1, 4.6.2 (calculus rules for computing polar cones).
10. Propositions 4.6.3, 4.6.4 (calculus rules for computing polar sets, take note of
the structure of the proof by homogenization).
11. Section 4.8 (Minkowski-Weyl representation, many properties of convex sets
simplify for polyhedral sets).
12. Theorem 4.8.2, structure proofs (theorems of the alternative, short proofs thanks
to calculus rules).
4.1 *Motivation
We recall that the first sections of chapters, which motivate the contents of the
chapter, are optional material and are not used in the remainder of the chapter.
The aim of this section is to give an example of a curved line that is described in a
dual way, by its tangent lines.
A child draws lines on a sheet of paper using a ruler and according to some
precise plan she has learned from a friend, who in turn has learned it from an older
brother. For each line, one endpoint of the ruler is placed somewhere at the
bottom side of the sheet and the other at the left side of the sheet. The drawing
is shown in Fig. 4.1. The fun of it is that drawing all these straight lines leads to
something that is not straight: a curved line! The lines are tangent lines to the curve:
the curve is created by its tangent lines.
What is fun for us is to determine the equation for this curved line. You can see
that the region above the curve is a convex set in the plane and that the lines that
are drawn are all outside this region and tangent to the region. This description of a
convex set in the plane by lines lying outside this set is called the dual description
of the set.
You can try to derive the equation already now. A calculation using the theory of
this chapter will be given in Sect. 4.8.
The aim of this section is to state an economic application of the dual description of
a convex set: how a regulator can make sure that a firm produces a certain good in a
desired way—for example respecting the environment. How to do this?
Consider a product that requires n resources (such as labor, electricity, water)
to manufacture. The product can be manufactured in many ways. Each way is
described by a resource vector—an element in a closed convex set P in the first
positive orthant Rn++ for which the recession cone is the first orthant Rn+ . In addition,
it is assumed that P is strictly convex—its boundary contains no line segments
of positive length. The regulator can set the prices πi ≥ 0, 1 ≤ i ≤ n, for the
resources. Then the firm will choose a way of manufacturing x = (x1 , . . . , xn ) ∈ P
that minimizes cost of production π1 x1 + · · · + πn xn . We will prove in Sect. 4.8,
using the dual description of the convex set P , that for any reasonable way of
manufacturing v ∈ P , the regulator can make sure that the firm chooses v. Here,
reasonable means that v should be a boundary point of P —indeed other points will
never be chosen by a firm as they can be replaced by points that give a lower price
of production, whatever the prices are. Figure 4.2 illustrates how the firm chooses
the way of manufacturing for given prices that minimizes cost of production.
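As a toy instance of this mechanism (a sketch with assumed data, not from the book): take n = 2 and P = {x ∈ R²₊₊ | x₁x₂ ≥ 1}, whose recession cone is R²₊. By the inequality of arithmetic and geometric means, the cost π₁x₁ + π₂x₂ is minimized over P on the boundary x₁x₂ = 1, at x = (√(π₂/π₁), √(π₁/π₂)).

import math

def firms_choice(pi1, pi2):
    # Cost-minimizing way of manufacturing over P = {x1*x2 >= 1, x > 0}.
    return (math.sqrt(pi2 / pi1), math.sqrt(pi1 / pi2))

# To steer the firm to a desired boundary point v = (v1, 1/v1), the
# regulator can set prices proportional to (1/v1, v1):
v1 = 2.0
print(firms_choice(1 / v1, v1))  # (2.0, 0.5) = v, as desired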
How can you convince a non-expert, who wants to know for sure whether a certain
system of equations and inequalities has a solution or not, of the truth? If there
exists a solution, then you should find one and give it: then everyone can check that
it is indeed a solution. But how to do this if there is no solution? In the case of
a linear system, there is a way. It is possible to give certificates of insolubility by
means of so-called theorems of the alternative. This possibility depends on the dual
description of convex sets. This will be explained in Sect. 4.8.
Aristotle writes in his book ‘Politics’, in the fourth century BCE, that another philosopher,
Thales of Miletus, once paid the owners of olive presses an amount of money
to secure the rights to use these presses at harvest time. When this time came and
there was a great demand for these presses due to a huge harvest, as Thales had
predicted, he sold his rights to those who needed the presses and made a great profit.
Thus he created what is now called European call options, the right—but not the
obligation—to buy something at a fixed future date at a fixed price. For thousands
of years, such options were traded although there was no method for determining
their value. Then Black and Scholes discovered a model to price them. They showed
how one could make money without risk if such options were not priced using
their model. Therefore, from then on, all these options have been traded using the
model of Black and Scholes.
In Sect. 4.8, we will apply the dual description of a convex set to establish the
result of Black and Scholes for the pricing of European call options in a simple but
characteristic situation.
4.2 Duality Theorem
The aim of this section is to present the duality theorem for a closed convex set.
The meaning of this result is that the two descriptions of such a set—the primal
description (from the inside) and the dual description (from the outside)—are
equivalent.
Definition 4.2.1 A subset H of Rn is a closed halfspace if it is the solution set of
an inequality of the type a1 x1 + · · · + an xn ≤ b for a1 , . . . , an , b ∈ R and ai = 0
for at least one i. So, H is the set of points on one of the two sides of the hyperplane
a1 x1 + · · · + an xn = b (including the hyperplane).
A nonempty closed convex set A ⊆ Rn can be approximated from the inside, as
the convex hull of a finite subset {a1 , . . . , ak } of A, as we have already explained.
Fig. 4.3 A primal approximation of a convex set A: the convex hull of finitely many points a₁, . . . , a₇ of A
Fig. 4.4 A dual approximation of a convex set A: the intersection of finitely many halfplanes H₁, . . . , H₆ containing A
Example 4.2.2 (Approximation Convex Set from the Inside) Figure 4.3 illustrates a
primal approximation of a convex set.
In the figure, seven points in a given convex set A in the plane and their convex
hull are drawn. The convex hull is an approximation of A from the inside, also called
a primal approximation.
The set A can also be approximated from the outside, as the intersection of a
finite set of closed halfspaces H1 , . . . , Hl containing A.
Example 4.2.3 (Approximation Convex Set from the Outside) Figure 4.4 illustrates
a dual approximation of a convex set.
In the figure, six halfplanes containing a given convex set A in the plane and their
intersection are drawn. The intersection is an approximation of A from the outside,
also called a dual approximation.
Most of the standard results from convex analysis are a consequence of the
following central result of convex analysis, the meaning of which is that ‘in the limit’
the two types of approximation, from the inside and from the outside, coincide.
Example 4.2.4 (Approximation Convex Set from the Inside and the Outside) Figure
4.5 illustrates both types of approximation, primal and dual, together in
one picture.
In the figure, both a primal and a dual approximation of a convex set A in the
plane are drawn. The convex set A is squashed between the primal and the dual
approximation.
Now we formulate the duality theorem for closed convex sets. This is the central
result of convex analysis.
Theorem 4.2.5 (Duality Theorem for Convex Sets) A closed convex set is the
intersection of the closed halfspaces containing it.
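A small numeric illustration (a sketch, not from the book): the closed unit disk in R² is the intersection of the supporting halfspaces u · x ≤ 1 over unit vectors u, and already finitely many of them approximate it well from the outside.

import math

def in_dual_approx(x, k=100):
    # Is x in the intersection of k supporting halfspaces of the unit disk?
    for j in range(k):
        u = (math.cos(2 * math.pi * j / k), math.sin(2 * math.pi * j / k))
        if u[0] * x[0] + u[1] * x[1] > 1:
            return False
    return True

print(in_dual_approx((0.5, 0.5)))    # True: the point lies in the disk
print(in_dual_approx((1.1, 0.0)))    # False: cut off by a halfspace
print(in_dual_approx((0.99, 0.99)))  # False: outside the disk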
Example 4.2.6 (Duality Theorem and the Shape of Beach Pebbles) There are two
types of beach pebbles: sea pebbles and river pebbles. Sea pebbles usually have
the shape of a bounded closed convex set in three dimensional space R³. River
pebbles usually do not have a convex shape. What might be the reason for this?
Pebbles form gradually, sometimes over billions of years, as the water of the
ocean or the river washes over rock particles. River current is gentler than ocean
current. Therefore, the outside effect of ocean current on a pebble is that the pebble
is exactly equal to its outside approximation—the intersection of the halfspaces
containing it. However, the outside effect of river current on a pebble achieves in
general only that it is roughly equal to its outside approximation. By the duality
theorem for convex sets, Theorem 4.2.5, the sets in three dimensional space that
are equal to their outer approximation are precisely the closed convex sets. So the
difference in the shapes between sea pebbles and river pebbles illustrates the duality
theorem for convex sets.
To prove the duality theorem, one has to show that there exists for each point
p outside A a hyperplane that separates p and A, that is, such that the point p
and the set A lie on different sides of it (where each side is considered to contain
the hyperplane). There are two standard ways for proving this theorem: the proof by
means of the shortest distance problem, which produces the hyperplane in one blow,
and the standard proof of the Hahn–Banach theorem (for the finite dimensional
case), which produces step by step a chain of affine subspaces L1 ⊂ L2 ⊂ · · · ⊂
Ln−1 of Rn , such that the dimension of Li is i for all i and such that moreover,
Ln−1 is the desired hyperplane (see Exercises 3 and 9 respectively). The proof of
the Hahn–Banach theorem is usually given in a course on functional analysis for the
general, infinite dimensional, case.
Here a novel proof is presented.
Example 4.2.7 (Ball-Throwing Proof Duality Theorem) Figure 4.6 illustrates this
proof.
The idea of the proof can be explained as follows for n = 3. Throw a small
ball outside A with center p in a straight line till it hits A, say at the point q. Then
take the tangent plane at q to the ball in its final position. Then geometric intuition
suggests that this plane separates the point p and the set A.
Now we give a more precise version of this proof. Figure 4.6 also illustrates this
more precise version.
Proof The statement of the theorem holds for A = ∅ (all closed halfspaces contain
∅ and their intersection is ∅) and A = Rn (the set of closed halfspaces containing
Rn is the empty set and their intersection is Rn ; here we use the convention that the
intersection of the empty set of subsets of a set S is S). So assume that A = ∅, Rn .
Take an arbitrary p ∈ Rn \ A. Choose ε > 0 such that the ball B(p, ε) is disjoint
with the closed convex set A. Choose a ∈ A. Define the point
pt = (1 − t)p + ta
for all t ∈ [0, 1]. So if t runs from 0 to 1, then pt runs in a straight line from p to
a. Let t̄ be the smallest value of t for which B(pt, ε) ∩ A ≠ ∅. This exists as A
is closed. We claim that B(pt̄, ε) and A have a unique common element q. Indeed,
let q be a common element. Then, for each x ∈ B(pt̄, ε) \ {q} there exists t < t̄
such that x ∈ B(pt, ε), and so, by the minimality property of t̄, x ∉ A. Now we use
the fact that the open half-lines with endpoint a boundary point of a given closed
ball that run through a point of the ball, form an open halfspace that is bounded by
the hyperplane that is tangent to the ball at this boundary point. Apply this property
to the ball B(pt¯, ε) and the point q. This shows that the hyperplane that is tangent
to the ball B(pt¯, ε) at the point q—which has equation (x − q) · (pt¯ − q) = 0—
separates the point p and the set A.
4.3 Other Versions of the Duality Theorem
The aim of this section is to give no fewer than six equivalent reformulations of the
important Theorem 4.2.5. These are often used. We only give hints for the proofs of
the equivalence.
We need a definition.
Definition 4.3.1 A hyperplane H supports a nonempty closed convex set A ⊂ Rn
at a point x̄ ∈ A if A lies entirely on one of the two sides of H (it is allowed to have
points on H ) and if, moreover, the point x̄ lies on H . Then at least one of the two
closed halfspaces bounded by H contains A. Such a halfspace is called a supporting
halfspace of A.
Example 4.3.2 (Supporting Halfspace) In Fig. 4.7, a convex set A in the plane R²
and a boundary point x̄ of A are drawn. A halfplane H is drawn that contains A
entirely and that contains the point x̄ on the line that is the boundary of H.
In Fig. 4.8, two convex sets A and B in the plane are drawn that are separated
by a line. On the left, this line has no points in common with A or B; on the right,
this line has points in common with A or B, actually with both, and moreover with
their intersection A ∩ B.

Theorem 4.3.6 (Separation of Two Convex Sets) Let A, B ⊆ Rn be nonempty
convex sets for which

ri(A) ∩ ri(B) = ∅.

Then A and B can be separated by a hyperplane: there exist α ∈ Rn \ {0n} and
β, γ ∈ R with β ≤ γ such that

α₁x₁ + · · · + αₙxₙ ≤ β ∀x ∈ A

and

α₁x₁ + · · · + αₙxₙ ≥ γ ∀x ∈ B.
Hint for the equivalence to the duality theorem 4.2.5: separating nonempty
convex sets A and B in Rn by a hyperplane is equivalent to finding a linear function
l : Rn → R for which l(a) ≥ l(b) for all a ∈ A and b ∈ B, that is, for which
l(a − b) ≥ 0 for all a ∈ A and b ∈ B. This is equivalent to separating the convex
set A − B = {a − b | a ∈ A, b ∈ B} from the origin by a hyperplane.
The following result is the finite dimensional case of an important result from
functional analysis. We need some definitions.
94 4 Convex Sets: Dual Description
restrict the generality as long as we consider one nonempty convex set at a time
(which is what we are doing now): this assumption can be achieved by translation.
If A ⊆ X = Rn is a nonempty convex set, then choose ã ∈ A and replace A by
B = A − ã = {a − ã | a ∈ A} or—equivalently—move the origin to the point ã.
Definition 4.3.12 The polar set of a convex set B in X = Rn that contains the
origin is defined to be the following closed convex set in X that contains the origin
B ◦ = {y ∈ X | x · y ≤ 1 ∀x ∈ B}.
Note that for a nonzero y ∈ X one has that y ∈ B ◦ iff the solution set of x · y ≤ 1
is a closed halfspace in X that contains B and for which the origin does not lie on
the boundary of the halfspace. Therefore, the polar set operator is called a duality
operator for convex sets containing the origin. Indeed, it turns the primal description
of such a set (by points inside the set) into the dual description (by closed halfspaces
containing the set that do not contain the origin on their boundary).
Example 4.3.13 (Polar Set Operator) Here are some illustrations of the polar set
operator.
• The polar set of a subspace is its orthogonal complement.
• The polar set of the unit ball in Rn with respect to some norm N, {x ∈
Rn | N(x) ≤ 1}, is the unit ball in Rn with respect to the dual norm N∗,
{y ∈ Rn | N∗(y) ≤ 1}.
• The polar set P ◦ of a polygon P in the plane R2 that contains the origin is again
a polygon that contains the origin. The vertices of each of these two polygons
correspond 1 − 1 to the edges of the other polygon. Indeed, for each vertex of P ,
the supporting halfplanes to P that have the vertex on the boundary line form an
edge of P ◦ . Moreover, for each edge of P , the unique supporting halfplane to P
that contains the edge on its boundary line is a vertex of P ◦ .
• The polar set P ◦ of a polytope P in three dimensional space R3 that contains the
origin is again a polytope that contains the origin. The vertices of each of these
two polytopes correspond 1 − 1 to the faces of the other polytope; the edges of P
correspond 1 − 1 to the edges of P ◦ . Indeed, for each vertex of P , the supporting
halfspaces to P that have the vertex on the boundary plane form a face of P ◦ .
Moreover, for each face of P , the unique supporting halfspace to P that contains
the face on its boundary plane is a vertex of P ◦ . Furthermore, for each edge of
P , the supporting halfspaces to P that have the edge on the boundary plane form
an edge of P ◦ .
• The results above for polytopes in R2 and R3 can be extended to polytopes in Rn
for general n.
• The polar set of a regular polytope that contains the origin is again a regular
polytope that contains the origin:
– if P is a regular polygon in R2 with k vertices, then P ◦ is a regular polygon
with k vertices;
– if P is a regular simplex in Rn , then P ◦ is a regular simplex;
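A quick numeric check of the polygon example above (a sketch, not from the book): the polar set of the square P with vertices (±1, ±1) is the ‘diamond’ co{(±1, 0), (0, ±1)}, and by convexity it suffices to test the defining inequality x · y ≤ 1 at the vertices of P.

vertices_P = [(1, 1), (1, -1), (-1, 1), (-1, -1)]

def in_polar(y):
    # y is in the polar set of P iff x . y <= 1 for all x in P;
    # checking the vertices of P suffices.
    return all(x[0] * y[0] + x[1] * y[1] <= 1 for x in vertices_P)

print(in_polar((1, 0)))      # True: a vertex of the diamond
print(in_polar((0.5, 0.5)))  # True: on the boundary of the diamond
print(in_polar((0.6, 0.6)))  # False: outside, since 0.6 + 0.6 > 1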
Definition 4.3.14 The bipolar set B◦◦ of a convex set B ⊆ X = Rn that contains
the origin is (B◦)◦, the polar set of the polar set of B,

B◦◦ = (B◦)◦.
Now we are ready to give the promised reformulation of the duality theo-
rem 4.2.5, in terms of the polar set operator.
Theorem 4.3.15 A closed convex set B ⊆ X = Rn that contains the origin is equal
to its bipolar set,
B ◦◦ = B.
In particular, the bipolar set operator is the same operator as the closure
operator: B◦◦ = cl(B) for every convex set B ⊆ X = Rn that contains the origin.
Hint for the equivalence to the duality theorem 4.2.5. Elements of B ◦ correspond
to closed halfspaces containing B that do not contain the origin on their boundary.
So B = B ◦◦ is equivalent to the property that B is the intersection of closed
halfspaces.
An advantage of this reformulation is that it shows that there is a complete
symmetry between the primal description and the dual description of a closed
convex set that contains the origin. Indeed, both are given by a convex set—B
for the primal description and B̃ for the dual one—and these convex sets are each
other’s polar set—B̃ = B ◦ and B = B̃ ◦ . So taking the dual description of the dual
description brings you back to the primal description.
Definition 4.3.16 The polar cone of a convex cone C ⊆ Rn is defined to be the following closed convex cone

C ◦ = {y ∈ Rn | x · y ≤ 0 ∀x ∈ C}.
This operator is called a duality operator for convex cones; it turns the primal
description of a closed convex cone (by its rays) into the dual description (by the
halfspaces containing the convex cone that have the origin on their boundary: for
each nonzero vector y ∈ C ◦ , the set of solutions x of the inequality x · y ≤ 0 is such
a halfspace).
Fig. 4.9 Convex cones C and their polar cones C ◦ in the plane and in three dimensional space
Warning The terminologies polar cone and polar set are similar and the notations
are even the same, but these operators are defined by different formulas. However,
these operators agree for convex cones.
Example 4.3.17 (Polar Cone Operator)
1. Figure 4.9 illustrates the polar cone operator for convex cones in the plane and
in three dimensional space. Note the orthogonality property in both pictures: for
each ray on the boundary of a closed convex cone C, there is a unique ray on C ◦
for which the angle is minimal; this minimal angle is always π/2.
2. The polar cone of a convex cone is the polar set of that convex cone.
3. For each one of the three ‘golden cones’ C (the first orthant, the Lorentz cone and the positive semi-definite cone) one has C ◦ = −C = {−c | c ∈ C}. Sometimes one defines, for a convex cone C ⊆ Rn , the dual cone C ∗ = {y ∈ Rn | x · y ≥ 0 ∀x ∈ C} = −C ◦ , and a convex cone C is called self-dual if C ∗ = C. Then one can say that the three golden cones are self-dual.
One gets the following reformulation of the duality theorem 4.2.5 in terms of
convex cones.
Definition 4.3.18 The bipolar cone C ◦◦ of a convex cone C ⊆ X = Rn containing
the origin is (C ◦ )◦ , the polar cone of the polar cone of C,
C ◦◦ = (C ◦ )◦ .
Theorem 4.3.19 A closed convex cone C ⊆ X = Rn is equal to its bipolar cone,

C ◦◦ = C.
In particular, the bipolar cone operator on nonempty convex cones is the same
operator as the closure operator: C ◦◦ = cl(C).
Hint for the equivalence to the duality theorem 4.2.5. One should use the
construction of the polar set operator by the homogenization method, which will
be given later in this chapter.
In fact, one can moreover give the following weak-looking reformulation of the duality theorem 4.2.5 in terms of convex cones. This weak-looking reformulation represents, in a sense, the essence of the duality of convex sets, and so the essence of convex analysis.
Theorem 4.3.20 The polar cone of a convex cone C ⊆ Rn with C ≠ Rn that contains the origin is nontrivial: that is, C ◦ ≠ {0n }.
Hint for the equivalence to Theorem 4.3.19 (and so to the duality theorem 4.2.5). In order to derive the stronger-looking Theorem 4.3.19 from the weak-looking Theorem 4.3.20, one should use that separating a point p outside C from C by a hyperplane is equivalent to separating the origin by a hyperplane from a suitable convex cone constructed from C and p.
The aim of this section is to give a coordinate-free construction of the polar cone
operator. This will be convenient for giving an equivalent definition for the polar set
operator, this time in a systematic way (rather than by an explicit formula, such as
we have done before)—by homogenization.
Definition 4.4.1 Let X, Y be two finite dimensional vector spaces. A mapping b : X × Y → R is bilinear if it is linear in both arguments, that is,

b(αx + βy, z) = αb(x, z) + βb(y, z) for all α, β ∈ R, x, y ∈ X, z ∈ Y,

b(x, αy + βz) = αb(x, y) + βb(x, z) for all α, β ∈ R, x ∈ X, y, z ∈ Y.

Definition 4.4.2 Let X, Y be two finite dimensional vector spaces and b : X × Y → R a bilinear mapping. Then b is non-degenerate if for each nonzero x ∈ X the linear function on Y given by the recipe y → b(x, y) is not the zero function, and if moreover for each nonzero y ∈ Y the linear function on X given by the recipe x → b(x, y) is not the zero function.

Definition 4.4.3 A pair of finite dimensional vector spaces X, Y is said to be in duality if a non-degenerate bilinear mapping b : X × Y → R is given.
Proposition 4.4.4 (A Standard Pair of Vector Spaces in Duality) Let X = Rn and Y = Rn . The mapping b : X × Y → R given by the recipe (x, y) → x · y, that is, the standard inner product, called the dot product, makes from X and Y two vector spaces in duality.
We only work with finite dimensional vector spaces, and for them the standard pair of vector spaces in duality given above is essentially the only example.
Remark The reason for introducing a second space Y , and not just taking an inner product on X, is as follows. For infinite dimensional vector spaces X of interest, there is often no natural generalization of the concept of inner product. Then the best one can do is to find a different space Y and a non-degenerate bilinear mapping b : X × Y → R. Even for finite dimensional spaces, it is often more natural to introduce a second space, although it has the same dimension as X, so it could be taken equal to X. For example, if one considers consumption vectors as column vectors x, then one models price vectors as row vectors p. However, for our purpose, the really compelling reason to introduce Y and a bilinear map b : X × Y → R is that we will need bilinear mappings b other than the inner product.
Example 4.4.5 (Useful Nonstandard Bilinear Mappings) Here are the two nonstandard pairs of spaces in duality that we will need.
1. Let X = Rn+1 and Y = Rn+1 . The mapping b : X × Y → R given by the recipe ((x, α), (y, β)) → x · y − αβ makes from X and Y two vector spaces in duality. Note the minus sign.
The polar cone of a convex cone C ⊆ X with respect to a pair of vector spaces X, Y in duality by b is defined by

C ◦ = {y ∈ Y | b(x, y) ≤ 0 ∀x ∈ C}.

The convex cones R+ (0X , 1) and X × R+ are convex cones that are each other’s polar cone with respect to the bilinear pairing defined by the recipe b((x, α), (y, β)) = x · y − αβ.
The aim of this section is to define the polar set operator by homogenization. This
definition will make it easier to establish properties of the polar set operator.
Here is a detail that needs some attention before we start: the homogenization
c(B) of a convex set B ⊆ X = Rn that contains the origin, is contained in the
space X × R. For defining the polar cone (c(B))◦ , we have to define a suitable
nondegenerate bilinear mapping on the space X × R. We take the following non-
degenerate bilinear mapping b:
b((x, α), (y, β)) = x · y − αβ ∀(x, α), (y, β) ∈ X × R.

Recall that a conification C of B is a convex cone for which

R+ (0X , 1) ⊆ C ⊆ X × R+ ,

and that this implies

R+ (0X , 1) ⊆ cl(C) ⊆ X × R+ .
Note that the second property of Proposition 4.5.1 is not true for arbitrary convex
sets. In Chap. 6, it is explained how the duality of arbitrary convex sets can be
described without translating them first to convex sets that contain the origin. This
will be necessary, as for applications in convex optimization, we will have to deal
simultaneously with two convex sets that do not have a common point.
Now we are ready to construct the polar set operator by the homogenization
method.
Proposition 4.5.2 The duality operator on convex sets containing the origin that
one gets by the homogenization method (first conify, then apply the polar cone
operator, finally deconify) is equal to the polar set operator.
Proof Here are the three steps of the construction of the duality operator on convex
sets containing the origin that one gets by the homogenization method.
1. Conify. Choose a conification C of the given convex set B ⊆ X = Rn containing the origin, for example cl(c(B)), the closure of the homogenization of B. Note that C is a convex cone in X × R for which R+ (0X , 1) ⊆ C ⊆ X × R+ .
2. Work with convex cones. Take the polar cone with respect to the bilinear mapping b defined above, that is,

C ◦ = {(y, β) ∈ X × R | x · y − αβ ≤ 0 ∀(x, α) ∈ C}.
As the convex cones R+ (0X , 1) and X × R+ are each other’s polar cone with
respect to this bilinear pairing, it follows that
X × R+ ⊇ C ◦ ⊇ R+ (0X , 1).
3. Deconify. Intersect C ◦ with X × {1}: a point (y, 1) lies in C ◦ iff x · y ≤ 1 for all x ∈ B, so this intersection is B ◦ × {1}. Therefore the deconification of C ◦ is B ◦ , the polar set of B, as required.
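The three steps can be traced numerically. The following sketch (Python with NumPy; the interval B = [−1, 2] and all sampling parameters are assumptions of the sketch) samples a conification of B, applies the polar cone operator with respect to b, and deconifies at level 1; the outcome agrees with the polar set B ◦ = [−1, 1/2].

import numpy as np

B_lo, B_hi = -1.0, 2.0   # B = [-1, 2], a convex set in R containing 0

# Sample points (t*x, t) of the conification of B.
ts = np.linspace(0.01, 3.0, 60)
xs = np.linspace(B_lo, B_hi, 60)
C = np.array([(t * x, t) for t in ts for x in xs])

def in_polar_cone(y, beta):
    # (y, beta) lies in the polar cone iff x*y - alpha*beta <= 0 on C
    return np.all(C[:, 0] * y - C[:, 1] * beta <= 1e-9)

# Deconify: intersect the polar cone with the level beta = 1.
grid = np.linspace(-3.0, 3.0, 601)
B_polar = [y for y in grid if in_polar_cone(y, 1.0)]
print(min(B_polar), max(B_polar))   # approximately -1.0 and 0.5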
4.6 Calculus Rules for the Polar Set Operator

For every type of convex object, there are calculus rules. These can be used to compute the duality operator for this type of objects. Here we give these calculus rules for convex sets B containing the origin; in this case the duality operator is the polar set operator. We have already the rule B ◦◦ = B if B is closed. Now we will also give
set operator. We have already the rule B ◦◦ = B if B is closed. Now we will also give
rules expressing (B1 ◦ B2 )◦ , for a binary operation ◦ (sum, intersection, convex hull
of the union or inverse sum) in terms of B1◦ and B2◦ , and rules expressing (MB)◦
and (BN )◦ in terms of B ◦ for linear functions M and N .
We will derive the calculus rules for convex sets by using the homogenization
method. In order to do this, we need the calculus rules for convex cones C to begin
with. Therefore, we prepare by giving these rules. We have already the rule C ◦◦ = C
if C is closed and contains the origin.
We will halve the number of formulas needed by introducing a handy shorthand notation. We will write, for two convex cones C, C̃ ⊆ X = Rn that are in duality, in the sense that

C ◦ = C̃, C̃ ◦ = cl(C),

the shorthand

C ←d→ C̃.

Note that this implies that C̃ is closed; however, C does not have to be closed. It also implies that C and C̃ both contain the origin.
Proposition 4.6.1 (Calculus Rules for Binary Operations on Convex Cones) Let C, C̃, D, D̃ be closed convex cones in X = Rn for which C ←d→ C̃ and D ←d→ D̃. Then

C + D ←d→ C̃ ∩ D̃.
Proposition 4.6.2 (Calculus Rules for the Two Actions of Linear Functions on Convex Cones) Let M : Rn → Rm be a linear function, and C, C̃ convex cones for which C ←d→ C̃. Then

MC ←d→ C̃M.
This result follows from the duality theorem for convex cones. Under suitable
assumptions, the cone MC in Proposition 4.6.2 is closed (see Exercise 42).
Now we are ready to present the calculus rules for convex sets containing the origin. Again we use a similar shorthand notation. We will write, for two convex sets B, B̃ ⊆ X = Rn containing the origin that are in duality, in the sense that

B ◦ = B̃, B̃ ◦ = cl(B),

the shorthand

B ←d→ B̃.
Note that this implies that B̃ is closed; however B does not have to be closed.
Proposition 4.6.3 (Calculus Rules for Binary Operations on Convex Sets Containing the Origin) Let A, Ã, B, B̃ be convex sets in X = Rn containing the origin, for which A ←d→ Ã and B ←d→ B̃. Then

A co∪ B ←d→ Ã ∩ B̃,

A + B ←d→ Ã # B̃.
This result can be proved by the homogenization method, using the systematic
definition of the binary operations for convex sets.
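For symmetric intervals in R1 , where all polar sets can be written down by hand, the first rule of Proposition 4.6.3 can be checked directly. The following small sketch (Python; the intervals are assumed examples) does this, using that the polar set of [−a, a] is [−1/a, 1/a] and that the convex hull of the union of two symmetric intervals is the larger one.

def polar_interval(lo, hi):
    # polar set of an interval [lo, hi] with lo < 0 < hi: [1/lo, 1/hi]
    return (1.0 / lo, 1.0 / hi)

a, b = 2.0, 3.0
A, B = (-a, a), (-b, b)

co_union = (min(A[0], B[0]), max(A[1], B[1]))        # A co-union B
lhs = polar_interval(*co_union)                      # (A co-union B) polar
A_t, B_t = polar_interval(*A), polar_interval(*B)    # the polar sets
rhs = (max(A_t[0], B_t[0]), min(A_t[1], B_t[1]))     # their intersection

print(lhs, rhs)   # both are (-1/3, 1/3), as the calculus rule predicts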
Now we give the calculus rules for the two actions of linear functions (image and
inverse image).
Proposition 4.6.4 (Calculus Rules for the Two Actions of Linear Functions on Convex Sets Containing the Origin) Let M : Rn → Rm be a linear function, and B, B̃ ⊂ Rn convex sets containing the origin for which B ←d→ B̃. Then

MB ←d→ B̃M.
This result follows by the conification method from the corresponding result for
convex cones. Again one ‘can omit the closure operator’ in the formulas above under
suitable assumptions. We do not display these conditions.
Example 4.6.5 (Calculus Rules for Bounded Convex Sets Containing the Origin)
The calculus rules above simplify if one restricts attention to the following type of convex object: bounded convex sets that contain the origin in their interior. Part of the assumptions, that the convex sets contain the origin in their interior, is essentially no restriction of the generality: it can be achieved by a translation that brings a relative interior point of the convex set to the origin, followed by the restriction of the entire space to the linear span of the translated copy of the convex set. Note that the polar set of such a set is of the same type, and so the polar set operator preserves the type. Indeed, these convex sets are deconifications of convex cones C for which

epi(r −1 ‖ · ‖) ⊆ C ⊆ epi(R −1 ‖ · ‖)

for some numbers R > r > 0, where ‖ · ‖ is the Euclidean norm. If you apply the polar cone operator with respect to b, then you get

epi(R‖ · ‖) ⊆ C ◦ ⊆ epi(r‖ · ‖).
The simplification in the calculus rules for bounded sets containing the origin in
their interior, is that the closure operations in the calculus formulas can be omitted.
This follows from the fact that the image of a compact set under a continuous
function is again compact.
Conclusion The unified definition of the four classic binary operations for convex
sets containing the origin, by means of the homogenization method, makes it
possible to prove their calculus formulas in one blow. They are a consequence of the
calculus formulas for the two actions on convex cones by linear functions—image
and inverse image.
4.7 Duality for Polyhedral Sets

The aim of this section is to present the duality theory for polyhedral sets.
Polyhedral sets are the convex sets par excellence. They are the convex sets that
occur in Linear Programming. Each convex set can be approximated as closely as
desired by polyhedral sets. The theorems of convexity are essentially simpler and
stronger for polyhedral sets than for arbitrary convex sets. Polyhedral sets have an
interesting specific structure: vertices, edges, facets. Some properties of the duality
for polyhedral sets have already been given in Example 4.3.13.
A closed convex cone in X × R+ , where X = Rn , has as deconification a
polyhedral set iff it is a polyhedral cone.
We begin with the representation theorem for polyhedral sets P , which is the
central result.
We give a version of the representation theorem where the representation only
depends on one choice, that of a basis for the lineality space of the polyhedral set P .
Note that the definition of a polyhedral set, as the intersection of a finite collection
of closed halfspaces, gives a dual description of polyhedral sets. The representation
theorem gives a primal description as well.
Theorem 4.7.1 (Minkowski-Weyl Representation) Let P ⊆ X = Rn be a polyhedral set. Consider the following sets: S is the set of extreme points of P ∩ L⊥P , T is a set of representatives of the extreme rays of P ∩ L⊥P , and U is a basis of LP , the lineality space of P . Then each point of P can be represented in the following form

Σs∈S αs s + Σt∈T βt t + Σu∈U γu u,

where αs ∈ R+ ∀s ∈ S, Σs∈S αs = 1, βt ∈ R+ ∀t ∈ T and γu ∈ R ∀u ∈ U .
Conversely, each set that is the Minkowski sum of the convex hull of a finite set and the conic hull of a finite set is a polyhedral set.
Example 4.7.2 (Minkowski-Weyl Representation) Figure 4.13 illustrates the
Minkowski-Weyl representation for four different polyhedral sets P in the plane
R2 . From left to right:
• a bounded P is the convex hull of its vertices (extreme points),
• an angle is the conic hull of its two legs (extreme recession directions),
• an unbounded P that contains no line is the Minkowski sum of the convex hull
of its extreme points and the conic hull of its extreme recession directions,
• a general polyhedral set P is the Minkowski sum of a polyhedral set that contains
no line, and a subspace.
Note that this result implies that a bounded convex set is the convex hull of a
finite set iff it is the intersection of a finite collection of closed halfspaces.
The proof of Theorem 4.7.1 is left as an exercise. It can be proved from first
principles by the homogenization method (recommended!) or it can be derived from
the theorem of Krein–Milman for arbitrary—possibly unbounded—convex sets that
we have given.
It follows that the polar set of a polyhedral set that contains the origin is again
a polyhedral set and that the polar cone of a polyhedral cone is again a polyhedral
cone.
We use again the shorthand notation P ←d→ P̃ for convex polyhedral sets in duality. Note that now there is no closure operation in its definition: it just means P ◦ = P̃ , P̃ ◦ = P . So there is complete symmetry between P and P̃ . This is the advantage of working with polyhedral sets.
If we take this into account, then the theorems for general convex sets specialize
to the following results for polyhedral sets.
Theorem 4.7.3 (Theorem on Polyhedral Sets) The following statements hold:
1. The polar set of a polyhedral set is again a polyhedral set.
2. The image and inverse image of a polyhedral set under a linear transformation
are again polyhedral sets.
3. The polyhedral property is preserved by the standard binary operations for
convex sets—sum, intersection, convex hull of the union, inverse sum.
4. For polyhedral sets P , Q ⊂ Rn containing the origin, the following formulas hold:

P + Q ←d→ P # Q,

P co∪ Q ←d→ P ∩ Q.
5. For polyhedral sets P , P̃ ⊂ Rn containing the origin for which P ←d→ P̃ and M ∈ Rm×n , the following formula holds:

MP ←d→ P̃ M.
We do not go into the definitions of vertices, edges and facets of polyhedral sets.
However we invite you to think about what happens to the structure of vertices,
edges and facets of a polyhedral set when you apply the polar set operator. Try to
guess this. Hint: try to figure out first the case of the platonic solids—tetrahedron,
cube, octahedron, dodecahedron, icosahedron—if they contain the origin.
Conclusion Polyhedral sets containing the origin are the nicest class of convex sets.
In particular, the polyhedral set property is preserved under taking the image under
a linear transformation and under taking the polar set. This makes the duality theory
for polyhedral sets very simple: it involves no closure operator.
4.8 Applications
The aim of this section is to give some applications of duality for convex sets.
We present the theorems of the alternative. We show that for each system of linear
inequalities and equations, one can write down another such system such that either
the first or the second system is soluble. These are results about polyhedral sets,
as polyhedral sets are solution sets of finite systems of linear inequalities and
equations. We do this for some specific forms of systems.
We start with an example.
Example 4.8.1 (Theorems of the Alternative) Consider the system of linear inequal-
ities
2x + 3y ≤ 7
6x − 5y ≤ 8
−3x + 4y ≤ 2
This system is soluble. It is easy to convince a non-expert of this. You just have to
produce an example of a solution. For example, you give (2, 1). You could call this
a certificate of solubility. This holds more generally for each system of inequalities
and equations: if it is soluble, then each solution can be used to give a proof of the
solubility.
Now consider another system of linear inequalities
2x − 3y + 5z ≤ 4
6x − y − 4z ≤ 1
−18x + 11y − 7z ≤ −15.
This system is insoluble. Indeed, multiply the three inequalities by the nonnegative coefficients 3, 2, 1 respectively and add them up: on the left-hand side the variables cancel, as 3 · 2 + 2 · 6 + 1 · (−18) = 0, 3 · (−3) + 2 · (−1) + 1 · 11 = 0 and 3 · 5 + 2 · (−4) + 1 · (−7) = 0, while the right-hand side becomes 3 · 4 + 2 · 1 + 1 · (−15) = −1. This gives

0 · x + 0 · y + 0 · z ≤ −1,

an inequality of the form 0 ≤ a for some negative number a. Therefore, the assumption that a solution to the system of linear inequalities exists leads to a contradiction. Hence, there exists no solution. It seems reasonable to expect that non-experts will be convinced by this argument.
You could call this a certificate of insolubility. Such certificates do not exist for
arbitrary systems of inequalities and equations. But fortunately they always exist if
the inequalities and equations are linear. This is the content of the theorems of the
alternative.
We do not go into the efficient method to find such certificates—by means of the
simplex method. Of course, it is not necessary that a non-expert understands how
the certificate—here the coefficients 3, 2, 1—has been found. This is a ‘trick of the
trade’.
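The certificate can of course be verified mechanically. Here is a minimal check (Python with NumPy; the data are copied from the example above, and the multipliers y = (3, 2, 1) are the certificate).

import numpy as np

A = np.array([[2.0, -3.0, 5.0],
              [6.0, -1.0, -4.0],
              [-18.0, 11.0, -7.0]])
b = np.array([4.0, 1.0, -15.0])
y = np.array([3.0, 2.0, 1.0])   # the nonnegative multipliers

print(y @ A)   # [0. 0. 0.]: the variables cancel
print(y @ b)   # -1.0: so a solution of Ax <= b would give 0 <= -1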
Now we give several theorems of the alternative.
Theorem 4.8.2 (Theorems of the Alternative) Let X = Rn , Y = Rm , A ∈ Rm×n
and b ∈ Y .
1. Minkowski [1] and Farkas [2]. Either the system

Ax = b, x ≥ 0

or the system

A⊤ y ≥ 0, b · y < 0

is soluble.
2. Fan Tse (Ky Fan) [3]. Either the system

Ax ≤ b

or the system

A⊤ y = 0, y ≥ 0, b · y < 0

is soluble.
3. Either the system

Ax ≤ b, x ≥ 0

or the system

A⊤ y ≥ 0, y ≥ 0, b · y < 0

is soluble.
The first one of these theorems of the alternative is usually called Farkas’ lemma. The proofs of these theorems of the alternative will show how handy the calculus rules for the binary operations are. In order to avoid the occurrence of minus signs, we work with dual cones and not with polar cones. Recall that they only differ by a minus sign: C ∗ = −C ◦ . We will use (Rn+ )∗ = Rn+ .
Proof
1. The solubility of Ax = b, x ≥ 0 means that b ∈ ARn+ . The convex cone ARn+ is polyhedral, and therefore

ARn+ = (ARn+ )∗∗ = ((A⊤ )−1 (Rn+ ))∗ .

So either b ∈ ARn+ , or there exists y with A⊤ y ≥ 0 and b · y < 0. This settles 1.
2. The solubility of Ax ≤ b means that b ∈ ARn + Rm+ . The convex cone ARn + Rm+ is polyhedral, and therefore

ARn + Rm+ = (ARn + Rm+ )∗∗ = ((A⊤ )−1 ({0n }) ∩ Rm+ )∗ .

This settles 2.
3. The solubility of Ax ≤ b, x ≥ 0 means that b ∈ ARn+ + Rm+ . The convex cone ARn+ + Rm+ is polyhedral, and therefore

ARn+ + Rm+ = (ARn+ + Rm+ )∗∗ = ((A⊤ )−1 (Rn+ ) ∩ Rm+ )∗ .
This settles 3.
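Computationally, one can decide which of the two alternatives in item 1 holds by solving two small linear programs. The following sketch assumes SciPy is available; the function name and the test data are hypothetical illustrations, not from the text.

import numpy as np
from scipy.optimize import linprog

def farkas_alternative(A, b):
    # First system: Ax = b, x >= 0 (a pure feasibility problem).
    res = linprog(c=np.zeros(A.shape[1]), A_eq=A, b_eq=b,
                  bounds=[(0, None)] * A.shape[1])
    if res.status == 0:
        return "first system soluble", res.x
    # Second system: A'y >= 0, b.y < 0. Minimize b.y subject to -A'y <= 0;
    # this homogeneous problem is unbounded below iff a certificate exists.
    res = linprog(c=b, A_ub=-A.T, b_ub=np.zeros(A.shape[1]),
                  bounds=[(None, None)] * A.shape[0])
    return "second system soluble", res.status   # status 3 = unbounded

A = np.array([[1.0, 1.0], [1.0, -1.0]])
print(farkas_alternative(A, np.array([2.0, 0.0])))    # x = (1, 1) works
print(farkas_alternative(A, np.array([-1.0, 0.0])))   # certificate exists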
For one more theorem of the alternative, by Gordan, see Exercise 50.
Conclusion The calculus rules for binary operations allow very efficient proofs for
the so-called theorems of the alternative. These theorems show the existence of
certificates of non-solubility of systems of linear inequalities.
We present, as announced in Sect. 4.1.4, the result of Black and Scholes for the
pricing of European call options in a simple but characteristic situation.
Consider a stock with current value p. Assume that there are two possible
scenarios at a fixed future date: either the value will go down to vd , or the value
will go up to vu . The question is to determine the ‘correct’ current value π for the
right to buy the stock at the fixed future date for a given price q. Such a right is
called a European call option. This given price is always chosen between the low
value vd and the high value vu , of course, vd < q < vu . Other choices make no
sense. Black and Scholes showed that there exists a unique ‘correct’ current value
for the option. It is characterized by the following interesting property:
There is an—auxiliary—unique probability distribution for the two scenarios such that, for
both the stock and the option, the current value is equal to the expected value at the fixed
future date.
This property is easy to memorize. The condition on the stock and the option
is called the martingale property. Keep in mind that the statement is not that this
probability distribution describes what the probabilities really are; it is just an
auxiliary gadget.
Let us first see why and how this determines the value of the option now. We
choose notation for the probability distribution: let yd be the probability that the
value will go down and let yu be the probability that the value will go up. So
yd , yu ≥ 0 with yd + yu = 1.
Now we put ourselves in the position of an imaginary person who thinks that the
probability for the value of the stock to go down is yd and that the probability for
the value of the stock to go up is yu . We do not imply at all that these thoughts have
any relation to the real probabilities. The probability distribution that we consider is
just an auxiliary gadget.
The expected value of the stock at the future date is yd vd + yu vu . Thus the
property above gives the equality p = yd vd + yu vu . When the price of the stock
goes down, then the right to buy something for the price q that has at that date a
lower price vd is useless. So this right will not be used. Therefore, the value of the
option at that date is 0. However, when the price of the stock goes up, then you
can buy stock for the price q and then sell it immediately for the price vu . This
gives a profit vu − q. Therefore, the value of the option at that date turns out to
be vu − q. It follows that the expected value of the option at the future date is
yd · 0 + yu (vu − q) = yu (vu − q). Thus the property above gives the equality
π = yu (vu − q). Now we have three linear equations in the unknown quantities
π, yd , yu :
yd + yu = 1,
p = yd vd + yu vu ,
π = yu (vu − q).
These equations have a unique solution and for this solution yd , yu ≥ 0 as one
readily checks. This concludes the explanation how the property above determines
a unique current value for the option.
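These three equations are easy to solve in closed form; the following sketch (Python; the numbers in the example are illustrative assumptions) computes the auxiliary probabilities and the option value.

def price_european_call(p, v_d, v_u, q):
    # Solve y_d + y_u = 1 and p = y_d*v_d + y_u*v_u, then price the option
    # as its expected value y_u*(v_u - q) under these probabilities.
    y_u = (p - v_d) / (v_u - v_d)
    y_d = 1.0 - y_u
    assert 0.0 <= y_u <= 1.0, "requires v_d <= p <= v_u"
    return y_d, y_u, y_u * (v_u - q)

# Stock worth 100 now, worth 80 or 120 at the future date; strike price 110.
print(price_european_call(p=100.0, v_d=80.0, v_u=120.0, q=110.0))
# (0.5, 0.5, 5.0): the 'correct' current value of the option is 5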
It remains to justify the property above. The source of this is the fact that in
financial markets, there are always people on the lookout for an opportunity to
make money without taking any risk, called arbitrage opportunities. Their activities
lead to changes in prices, which make these opportunities disappear. Therefore, the
assumption is made that there are no arbitrage opportunities. We are going to see
that this implies the property above.
To see this, we first have to give a precise description of the absence of arbitrage
opportunities. There are three assets here: money, stock, options. We assume that
there exists no portfolio of these assets that has currently a negative value but at
the fixed future date a nonnegative value in each one of the two scenarios. That
is, there exist no real numbers xm , xs , xo for which xm + xs p + xo π < 0 and
xm + xs vd + xo · 0 ≥ 0, xm + xs vu + xo (vu − q) ≥ 0.
Note that here we assume that the value of money stays the same from now till
the future date and that we allow going short, that is, xm , xs , xo are allowed to be
negative—this means holding negative amounts of the asset, as well as allowing
xm , xs , xo to be not integers.
Now we are ready to strike: we apply Farkas’ lemma. This gives the property
above, as can be checked readily. This concludes the proof of the result of Black
and Scholes.
The aim of this section is to determine the equation of the curve which is the
outcome of the child drawing of Sect. 4.1.1. Figure 4.14 gives this drawing again.
Let the length of the drawer be 1. We choose coordinates in such a way that the
x-axis lies horizontally at a height of 1 above the bottom of the sheet and that the
y-axis lies vertically at a distance of 1 to the right of the left side of the sheet. Then
the lines ax + by = 1 that the child draws enclose a closed convex set A containing
the origin; they are tangent lines to the boundary of A. This boundary is a curve. The lines that the child draws correspond to the points of the curve that forms the boundary of the polar set A◦ . By the duality theorem, the polar set of A◦ is A. The points on the boundary of A = (A◦ )◦ , the polar set of A◦ , correspond to the tangent lines to the boundary of A◦ . This discussion suggests how to do the calculation. To keep this
calculation as simple as possible we change to the following coordinates: the bottom
of the sheet is the x-axis and the left-hand side of the sheet is the y-axis. Note that
now the convex set A does not contain the origin. However, this does not matter for
the present purpose.
The lines that the child draws are the lines ax + by = 1 for which the distance between the intercepts (a −1 , 0) and (0, b−1 ) is 1. Thus the pair (a, b) has to satisfy, by the theorem of Pythagoras, the equation

a −2 + b−2 = 1.
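The enclosed curve itself can be traced numerically as the envelope of the drawn lines. The following sketch (Python with NumPy; the parametrization 1/a = cos t, 1/b = sin t, which satisfies a −2 + b−2 = 1, is an assumption of the sketch) intersects neighboring lines and checks that the resulting points satisfy the classical astroid equation x 2/3 + y 2/3 = 1.

import numpy as np

ts = np.linspace(0.05, np.pi / 2 - 0.05, 200)
a, b = 1.0 / np.cos(ts), 1.0 / np.sin(ts)   # coefficients of the drawn lines

pts = []
for i in range(len(ts) - 1):
    # the intersection of the lines for parameters t_i and t_{i+1}
    M = np.array([[a[i], b[i]], [a[i + 1], b[i + 1]]])
    pts.append(np.linalg.solve(M, np.ones(2)))
pts = np.array(pts)

# The envelope points should satisfy x^(2/3) + y^(2/3) = 1 (an astroid arc).
print(np.max(np.abs(pts[:, 0] ** (2 / 3) + pts[:, 1] ** (2 / 3) - 1.0)))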
Conclusion By using the duality theorem for convex sets we can derive from the
dual description of a convex curved line (‘its tangent lines’) an equation for the
curved line.
The aim of this section is to prove the result stated in Sect. 4.1.2: for each reasonable
way to manufacture a product, there exist prices for the resources such that this way
will be chosen by a cost minimizing firm.
Let notations and definitions be as in Sect. 4.1.2. Let a reasonable way to manufacture the product be chosen: this is a boundary point x̄ of P . Apply the supporting hyperplane theorem 4.3.3 to x̄ and P . This gives a nonzero vector p ∈ Rn such that the halfspace p · (x − x̄) ≥ 0 supports P at x̄. As RP = Rn+ and as x̄ is efficient, it follows that p > 0. If we now set prices for the resources to be p1 , . . . , pn , then we get, by the definition of supporting hyperplane, that the problem to minimize p · x subject to x ∈ P has optimal solution x̄. This completes the analysis of the manufacturing problem.
Conclusion The supporting hyperplane theorem can be applied to show the follow-
ing result. For each efficient way to manufacture a product, there exist prices for the
resources such that this efficient way will be chosen by a cost minimizing firm.
4.9 Exercises
1. In Sect. 4.1.2, it is stated that boundary points of P are the reasonable ways of
manufacturing. Why is this the case?
2. Check the details of the ball throwing proof for the duality theorem for convex
sets, Theorem 4.2.5, which is given in this chapter.
3. One of the two standard proofs for duality: by means of the shortest distance problem. Consider the set-up of Theorem 4.2.5 and assume A ≠ ∅, A ≠ Rn . Show that for each p ∈ Rn \ A, there exists a unique point x̂ ∈ A which is closest to p and show that the closed halfspace {x ∈ Rn | (x − x̂) · (x̂ − p) ≥ 0} contains A but not p. Figure 4.15 illustrates the shortest distance problem.
Figure 4.16 illustrates the proof for duality by means of the shortest distance problem for a convex cone C in the plane. A disk with center p is blown up till it hits the convex cone, at some point x̂. Then the tangent line to the disk at the point x̂ has C on one side and the point p on the other side.
4. *Let p ∈ Rn and let A ⊆ Rn be a nonempty closed convex set. Show that the
shortest distance problem for the point p and the set A has a unique solution.
This unique solution is sometimes called the projection of the point p on the
convex set A. This result is called the Hilbert projection theorem.
5. Theorem of Moreau. *Here is the main result on the shortest distance problem.
Let C, D be two closed convex cones in X = Rn that are each other’s polar
cone, C = D ◦ , D = C ◦ . Let x ∈ X be given. Show that there exist unique
c ∈ C and d ∈ D for which
x = c + d, c · d = 0.
That is, show that the function |ψ(x)| assumes its maximum on the solution set
of N (x) = 1 (‘the sphere of the norm N ’).
8. Prove the statement in Example 4.3.10.
9. Hahn–Banach. One of the two standard proofs for duality: the Hahn–Banach
proof. The following proof of the duality theorem, has the advantage that it can
be extended to infinite dimensional spaces (this only requires that you use in
addition Zorn’s lemma); it is essentially the proof of one of the main theorems
of functional analysis, the theorem of Hahn–Banach.
Figure 4.18 illustrates the induction step of the Hahn–Banach proof for
duality, for the step from dimension two to dimension three. Here we consider
the weakest looking version of duality, Theorem 4.3.20.
C1 ⊆ C2 ⇒ C1◦ ⊇ C2◦ .
16. Show that for two closed convex cones in duality, C, C̃, one has that C has a
nonempty interior iff C ◦ is pointed (that is, it contains no entire line).
17. Show that for two closed convex sets containing the origin, B1 , B2 ⊆ Rn , one
has the implication
B1 ⊆ B2 ⇒ B1◦ ⊇ B2◦ .
18. Let B ⊆ X = Rn be a closed convex set containing the origin. Check that B ◦
is a closed convex set in X containing the origin.
19. Show that for two closed convex sets containing the origin, B, B̃, that are each
other’s polar set, one has that B is bounded iff B̃ contains the origin in its
interior.
Hint. Proceed in three steps:
(a) Show that B is bounded iff its conification c(B) is contained in the epigraph of ε‖ · ‖ for some ε > 0.
(b) Show that B contains the origin in its interior iff its conification c(B) contains the epigraph of N ‖ · ‖ for some N > 0.
(c) Show that the epigraphs of N −1 ‖ · ‖ and N ‖ · ‖ are each other’s polar cone with respect to the bilinear pairing b((x, α), (y, β)) = x · y − αβ.
20. Show that the polar set of a subspace is its orthogonal complement.
21. Show that the polar set of the standard unit ball in Rn with respect to some norm ‖ · ‖ is the standard unit ball in Rn with respect to the dual norm ‖ · ‖∗ .
22. Show that the fact that the polar set operator B → B ◦ is an involution—that is,
Theorem 4.3.15—is indeed a reformulation of the duality theorem for convex
sets, Theorem 4.2.5.
Hint. Check first that for every nonzero v ∈ Rn , one has v ∈ B ◦ iff the closed
halfspace given by the inequality x · v ≤ 1 contains B.
23. Prove the weakest looking version of the duality theorem (‘polar cone of a
proper convex cone is nontrivial’) by means of the ball-throwing proof, where
the ball is not necessarily thrown in a straight line.
24. Show that for every closed convex cone C ⊂ X = Rn that has nonempty
interior, there exists a basis of the vector space X such that if you take
coordinates with respect to this basis, C is the epigraph of a function g :
Rn−1 → R+ for which epi(g) is a convex cone.
25. *Show that for every closed convex cone C ⊂ X = Rn that has nonempty
interior and that contains no line through the origin, there exists a basis of the
vector space X such that if you take coordinates with respect to this basis, C is
the epigraph of a function g : Rn−1 → R+ for which epi(g) is a convex cone
and g(x) = 0 iff x = 0.
26. *Show that a given closed convex cone does not contain any line through the
origin iff its polar cone has a nonempty interior.
27. *Show that the three golden cones—the first orthant, the Lorentz cone and the
positive semidefinite cone—are self-dual, that is, show that they are equal to
their dual cone.
28. *Prove Proposition 4.4.4.
29. Let X, Y be two finite dimensional vector spaces which are put in duality by a non-degenerate bilinear mapping b : X × Y → R.
(a) *Show that X and Y have the same dimension (= n).
(b) Choose bases for X and Y and identify vectors of X and Y with the coordinate vectors with respect to these bases. Thus X and Y are identified
44. *Show that ‘the closure operators cannot be omitted’ from the formulas above in general. That is, one might have, for suitable pairs of closed convex cones in duality (C, C̃) and (D, D̃) in Rn ,

C + D ≠ (C̃ ∩ D̃)◦

and

(C̃M)◦ ≠ MC.
45. *Derive Proposition 4.6.4 from the corresponding result for convex cones,
Proposition 4.6.2.
46. *Derive Proposition 4.6.3 from Proposition 4.6.4 by means of the homogeniza-
tion method.
Hint. Use that the sum mapping +Y : Y × Y → Y and the diagonal mapping ΔY : Y → Y × Y , y → (y, y), are each other’s transpose for each finite dimensional inner product space Y .
47. Formulate Theorem 4.7.3 for polyhedral cones.
48. *Let P be the solution set of the system of linear inequalities in n variables aj · x ≤ bj , 1 ≤ j ≤ r, where aj ∈ Rn and bj ∈ R. Show that for each extreme point x̄ of this polyhedral set there is a subset J of {1, 2, . . . , r} for which the vectors aj , j ∈ J form a basis of Rn and x̄ is the unique solution of the system of linear equations aj · x = bj , j ∈ J .
49. Consider the five platonic solids—tetrahedron, cube, octahedron, dodecahe-
dron, icosahedron. We assume that they contain the origin—after a translation;
this is done for convenience: then we can give dual descriptions by means of
the polar set operator.
(a) Show that these are polyhedral sets.
(b) Show that the polar set of a platonic solid is again a platonic solid.
(c) *Determine for each platonic solid which platonic solid is its polar set.
50. Gordan’s theorem of the alternative. *Let a1 , . . . , ar ∈ Rn . Prove the
following statement.
Gordan [5]. Either the system a1 · x < 0, . . . , ar · x < 0 is soluble, or the system μ1 a1 + · · · + μr ar = 0, μ ≠ 0, μ ≥ 0 is soluble.
51. Show that the coefficients a and b of a tangent ax + by = 1 to the curve in the child drawing example analyzed in Sect. 4.8.3 satisfy the equation a 2/3 + b2/3 = 1.
52. Now the child makes a second drawing (after the drawing that is analysed in
Sect. 4.8.3). Again she draws straight lines from the left-hand side of the paper
to the bottom of the paper. This time she is doing this in such a way that the
sum of the distances of the endpoints of each line to the bottom-left corner of
the paper is constant. Again these lines enclose a closed convex set. Find the
equation of the curve formed by the boundary points of this closed convex set.
References
5 Convex Functions: Basic Properties

Abstract
• Why. A second main object of convex analysis is a convex function. Convex
functions can help to describe convex sets: these are infinite sets, but they can
often be described by a formula for a convex function, so in finite terms. More-
over, in many optimization applications, the function that has to be minimized is
convex, and then the convexity is used to solve the problem.
• What. In the previous chapters, we have invested considerable time and effort in
convex sets and convex cones, proving all their standard properties. Now there
is good news. No more essentially new properties have to be established in the
remainder of this book. It remains to reap the rewards. In Chaps. 5 and 6, we consider, to begin with, convex functions: the dual properties in Chap. 6, and the properties that do not require duality (the primal properties) in this chapter. This
requires convex sets: convex functions are special functions on convex sets. Even
better, they can be expressed entirely in terms of convex sets: convex functions
are functions for which the epigraph, that is, the region above the graph, is a
convex set. Some properties of convex functions follow immediately from a
property of a convex set, applied to the epigraph. An example is the continuity
property of convex functions. For other properties, a deeper investigation is
required: one has to take the homogenization of the epigraph, and then one should
apply a property of convex cones. An example is the unified construction of the
eight standard binary operations on convex functions—sum, maximum, convex
hull of the minimum, infimal convolution, Kelley’s sum, and three nameless
ones—by means of the homogenization method. The defining formulas for them
look completely different from each other, but they can all be generated in exactly
the same systematic way by a reduction to convex cones (‘homogenization’).
One has for convex functions the same technical problem as for convex sets:
all convex functions that occur in applications have the nice properties of being
closed and proper, but if you work with them and make new functions out of
them, then they might lose these properties. Two examples of such work are:
(1) the consideration, in a sensitivity analysis for an optimization problem, of
the optimal value function, and (2) the application of a binary operation. Finally,
twice continuously differentiable functions f (x) are convex iff their Hessian f ″(x) is positive semidefinite for all x.
Road Map
1. Definitions 5.2.1, 5.2.4, 5.2.6, 5.2.9 and 5.2.10 (convex function, defined either
by Jensen’s inequality or by the convexity of a set: its (strict) epigraph).
2. Proposition 5.3.1 (continuity property of a convex function).
3. Propositions 5.3.6, 5.3.9 (first and second order characterizations of convex
functions).
4. Figure 5.7 and Definition 5.4.2 (construction of a convex function by homogenization).
5. Definitions 5.2.12, 5.3.4 (two nice properties for convex functions, closedness
and properness).
6. Definitions 5.5.1, 5.5.2 (the two actions by linear functions on convex functions).
7. Figure 5.8, Definition 5.6.1, Proposition 5.6.6 (binary operations on convex
functions—pointwise sum, pointwise maximum, convex hull of pointwise min-
imum, infimal convolution, Kelley’s sum and three nameless ones—defined by
formulas and constructed by homogenization).
5.1 *Motivation
The aim of this section is to give two examples of how a convex set, which is usually
an infinite set, can be described by a function, which can often be described in finite
terms: by a formula.
Example 5.1.1 (An Unbounded Closed Convex Set Viewed as an Epigraph) Sup-
pose that A ⊆ X = Rn is an unbounded closed convex set. Then it has a nonzero
recession vector v. Assume that −v is not a recession vector for A. Then we can
view A as an epigraph, that is, as the region above the graph of a function. Figure 5.1
illustrates how this can be done.
Here is a precise description. Let L be the hyperplane in X through the origin
orthogonal to v, that is,
L = v ⊥ = {x ∈ X | x · v = 0}.
In this way, the convex set A is described completely in terms of the function f :
it is essentially the epigraph of f . The function f is called a convex function as its
epigraph is a convex set. Moreover, f is called lower-semicontinuous or closed as
its epigraph is a closed set. There are many functions defined by a formula—that is,
in finite terms—that can be verified to be convex functions, as we will see. For such
a function f , one gets a convenient ‘finite’ description of the convex set epif , the
epigraph of f —by the formula for the function f .
Example 5.1.2 (A Closed Convex Set Described by Its Gauge) The Euclidean norm
on Rn can be described in terms of the standard closed unit ball Bn as x = inf{t >
0 | x/t ∈ Bn }. Now we generalize this construction. Suppose that we have a closed
convex set A ⊆ X = Rn that contains the origin in its interior (recall that this
condition can be assumed for a given nonempty closed convex set by translating
it to a new position in such a way that a relative interior point is brought to the
origin, and then restricting the space from Rn to the span of the translated copy of
the given set). Then the closure of c(A), the homogenization of A, is the epigraph
of a function pA : X → R+ . One has

pA (x) = inf{t > 0 | x/t ∈ A} ∀x ∈ X.

This function is called the Minkowski function or the gauge of A. The set A can be expressed in terms of this function by means of

A = {x ∈ X | pA (x) ≤ 1}.
If pA can be given by a formula, then this gives again a convenient finite description of the convex set A: by the formula for its gauge pA , which is a convex function, as its epigraph is a convex cone and so a convex set.
Figure 5.2 illustrates this description for the case that A is bounded.
A subset A of the plane R2 is drawn. It is a bounded closed convex set containing
the origin in its interior. Some arrows are drawn from the origin to the boundary of
A. On such an arrow the value of the gauge pA of A grows linearly from 0 to 1.
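The gauge can also be evaluated numerically from a membership test for A alone, using pA (x) = inf{t > 0 | x/t ∈ A} and a bisection on t. In the following sketch (Python with NumPy) the set A, the bracket t_hi and the iteration count are assumptions chosen for illustration.

import numpy as np

def in_A(x):
    # membership test for A = {x | x1^2 + 2*x2^2 <= 1}, an assumed example
    return x[0] ** 2 + 2.0 * x[1] ** 2 <= 1.0

def gauge(x, oracle=in_A, t_hi=1e6, iters=60):
    t_lo = 0.0
    for _ in range(iters):        # x/t lies in A exactly for t >= gauge(x)
        t = 0.5 * (t_lo + t_hi)
        if oracle(np.asarray(x) / t):
            t_hi = t
        else:
            t_lo = t
    return t_hi

x = np.array([3.0, 4.0])
print(gauge(x), np.sqrt(x[0] ** 2 + 2.0 * x[1] ** 2))   # both about 6.403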
Conclusion We have seen two methods to describe convex sets by means of functions; this is convenient, as functions can often be described by a formula. Then an ‘infinity’ (a convex set) is captured in finite terms. The first method is to view an unbounded convex set, with the aid of a recession direction, as the epigraph of a function. The second method is to describe a convex set by its gauge.
5.1.2 Why Convex Functions that Are Not Nice Can Arise
in Applications
In this chapter, we will define two nice properties, closedness and properness, that all
convex functions that occur in applications possess. Now we describe an operation
that has to be carried out in many applications and that leads to a convex function
that might not have these two nice properties. This is the reason why one allows
in the theory of convex functions also functions that do not have these two nice
properties.
Example 5.1.3 (The Importance of Sensitivity Analysis) Sometimes the sensitivity
of an optimal solution is considered to be more important than the optimal solution
itself. Suppose, for example, that a company wants to start a new project. To begin
with, some possible scenarios for how to go about it might be created. Then, in order
to help the responsible managers to make a choice, the expected revenue minus cost
holds.
In geometric terms, Jensen’s inequality means that each chord with endpoints on
the graph of a proper convex function f lies entirely above or on the graph of f .
Figure 5.3 illustrates this property.
Here we make an observation that is often useful, as it allows one to reduce some arguments involving a convex function of several variables to the case of a convex function of one variable: a function is convex iff its restriction to each line is convex.
Example 5.2.2 (Convex Function) Here are some characteristic examples of convex
functions. Later we will show how it can be verified that a function is convex in an
easier way than by checking Jensen’s inequality.
• Each linear function f (x) = a1 x1 + · · · + an xn ; more generally, each quadratic function f (x) = x ⊤Ax + b⊤ x + c = Σi,j aij xi xj + Σk bk xk + c for which the matrix A + A⊤ is positive semi-definite.
• The function f (x) = (x12 + · · · + xn2 )1/2 , the Euclidean length. This function is not differentiable at x = 0.
• The function f (x) = 1/(1 − x 2 ), x ∈ (−1, 1). This function is not defined on
the entire line R.
• The function f on the nonnegative real numbers R+ for which f (x) = 0 for
x > 0 and f (0) = 1. This function is not continuous at 0.
It is often convenient to extend a convex function f : A → R to the entire space
X by defining its value everywhere outside A formally to be +∞. The resulting
function X → (−∞, +∞] is then called a proper convex function on X.
Example 5.2.3 (Indicator Function) Let A ⊆ X = Rn be a nonempty convex set.
Then the indicator function of A, defined to be the function δA : X → (−∞, +∞]
given by the recipe x → 0 if x ∈ A and x → +∞ otherwise, is a proper convex
function on X.
However, a more efficient way to define convex functions (and then to distinguish
between proper and improper ones) is by reducing this concept to the concept of
convex set: a function is defined to be convex if the region above its graph is a
convex set. Now we formulate this definition in a precise way, using the concept
epigraph of a function.
Definition 5.2.4 The epigraph of a function f on X = Rn taking values in the
extended real number line R = [−∞, +∞] = R ∪ {−∞, +∞} is the region above
its graph, that is, it is the set

{(x, ρ) ∈ X × R | ρ ≥ f (x)}

in X × R.
We give an informal explanation of the idea behind this definition. Suppose we
are given a continuous function of two variables. Consider its graph, a surface in
three-dimensional space. Pour concrete on this surface and let it turn to stone. This
stone is the epigraph.
You do not lose any information by taking the epigraph of a function, as a function f can be retrieved from its epigraph as follows:

f (x) = min{ρ ∈ R | (x, ρ) ∈ epif }

if the set of real numbers ρ for which (x, ρ) ∈ epif is nonempty and bounded below,

f (x) = +∞

if this set is empty, and

f (x) = −∞

if the set of real numbers ρ for which (x, ρ) ∈ epif is unbounded below.
This can be expressed by means of one formula, by using the concept of infimum.
Definition 5.2.5 The infimum inf S of a set S ⊆ R is the largest lower bound r ∈ R = R ∪ {±∞} of S. Note that we allow the infimum to be −∞ (this means that S is unbounded below) or +∞ (this means that S is empty).
It is a fundamental property of the real numbers that this concept is well-defined.
It always exists, but the minimum of a set S ⊆ R does not always exist. If the
minimum of a set S ⊆ R does exist, then it is equal to the infimum of S. Therefore,
the infimum is a sort of surrogate minimum.
A function can be retrieved from its epigraph by means of the following formula:
f (x) = inf{ρ ∈ R | (x, ρ) ∈ epif } ∀x ∈ X.
a first example of how properties of convex functions follow from properties of convex sets.
Proposition 5.2.7 Let f : X = Rn → R be a convex function and let γ ∈ R. Then
the sublevel set {x ∈ X | f (x) ≤ γ } and the strict sublevel set {x ∈ X | f (x) < γ }
are convex sets.
Proof We give the proof for the sublevel set; the proof for the strict sublevel set is similar. As f is a convex function, its epigraph is convex. Its intersection with the closed halfspace X × (−∞, γ ], which is a convex set, is again a convex set. Therefore, the image of this intersection under the orthogonal projection from X × R onto X, given by (x, α) → x, is a convex set. This image is by definition the sublevel set {x ∈ X | f (x) ≤ γ }.
Example 5.2.8 (Convex Sets as Sublevel Sets of Convex Functions) The following sets are convex as they are sublevel sets of convex functions.
• Ellipse. The solution set of 2x12 + 3x22 ≤ 5 is a sublevel set of x → 2x12 + 3x22 .
• Parabola. The solution set of x2 ≥ x12 is a sublevel set of x → x12 − x2 .
• Hyperbola. The solution set of x1 x2 ≥ 1, x1 ≥ 0, x2 ≥ 0 is a sublevel set of the function x → − ln x1 − ln x2 defined on R2++ .
Sometimes, the definition of a convex function, in terms of the strict epigraph, is
more convenient.
Definition 5.2.9 The strict epigraph of a function f on X = Rn taking values in the extended real number line R = [−∞, +∞] = R ∪ {−∞, +∞} is the set

{(x, ρ) ∈ X × R | ρ > f (x)}

in X × R.
If you take the strict epigraph of a function, you do not lose any information
either.
Definition 5.2.10 A function f : X = Rn → R is convex if its strict epigraph is a
convex set in X × R.
Definition 5.2.11 The effective domain of a convex function f on X = Rn is the subset dom(f ) = A = {x ∈ X | f (x) ≠ +∞} of X.
Note that the effective domain of a convex function is a convex set. Note also
that we allow a convex function to take value −∞ on its effective domain.
The convex functions that are of interest in the first place are the following ones.
Definition 5.2.12 A convex function f : X = Rn → R is proper if it does not
assume the value −∞ and if moreover it does not only assume the value +∞.
Convex functions that are not proper are called improper.
In other words, a proper convex function on X arises from a function f :
A → R, where A is a nonempty convex set in X that satisfies Jensen’s inequality
rA− (x) = −∞ ∀x ∈ A, rA− (x) = +∞ ∀x ∉ A,

and

rA+ (x) = −∞ ∀x ∈ ri(A), rA+ (x) = +∞ ∀x ∉ ri(A).

Then rA− , rA+ are improper convex functions. In fact they are essentially the only improper convex functions. To be precise, a convex function f : X → R is improper iff one has, for A = dom(f ), that rcl(A)− ≤ f ≤ rcl(A)+ .
Figure 5.5 illustrates this characterization of improper convex functions.
We cannot avoid improper convex functions; this has been explained in
Sect. 5.1.2.
Conclusion The most fruitful definition of a convex function is by means of its
property that the epigraph is a convex set. One has to allow in the theory so-called
improper convex functions on X = Rn ; these can be classified almost completely
in terms of closed convex sets in X.
sufficient condition for the following stronger concept than convexity. This stronger
concept will be used when we consider optimization in Chap. 7.
Definition 5.3.8 A proper convex function is called strictly convex if it is convex
and if, moreover, its graph does not contain an interval of positive length.
Proposition 5.3.9 Assume that A is open. Then a twice differentiable function f : A → R is convex iff its Hessian f ″(x) is positive semidefinite for each x ∈ A. Moreover, if f ″(x) is positive definite for all x ∈ A, then f is strictly convex.
Example 5.3.10 (Second Order Conditions for (Strict) Convexity)
1. The function f (x) = ex is strictly convex as f ″(x) = ex > 0 for all x.
2. The function f (x) = 1/(1 − x 2 ), −1 < x < 1, is strictly convex as f ″(x) = 2(1 − x 2 )(1 + 3x 2 )/(1 − x 2 )4 > 0.
3. The function f (x1 , x2 ) = ln(ex1 + ex2 ) is convex, as f11 = f22 = ex1 +x2 (ex1 + ex2 )−2 > 0 and f11 f22 − (f12 )2 = 0, but it is not strictly convex.

For a quadratic function, one can check its convexity in terms of its Hessian. This criterion gives an explicit description of the following nice explicit class of convex functions: the quadratic functions that are convex.
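This criterion is easy to apply mechanically: a quadratic function f (x) = x ⊤Ax + b⊤ x + c is convex iff A + A⊤ is positive semidefinite. The following sketch (Python with NumPy; the matrices are assumed examples) tests this via the eigenvalues of the symmetrized matrix.

import numpy as np

def quadratic_is_convex(A, tol=1e-12):
    S = A + A.T   # the Hessian of x -> x'Ax + b'x + c is A + A'
    return bool(np.all(np.linalg.eigvalsh(S) >= -tol))

print(quadratic_is_convex(np.array([[2.0, 0.0], [0.0, 3.0]])))    # True
print(quadratic_is_convex(np.array([[0.0, 1.0], [-1.0, 0.0]])))   # True
print(quadratic_is_convex(np.array([[1.0, 0.0], [0.0, -1.0]])))   # False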
on the floor. This gives the strict epigraph of f , from which the function f can be
recovered.
Now we give a formal definition of the description of a convex function f : X →
R, where X = Rn , in terms of a convex cone. Again, just as for the homogenization
of a convex set, it is convenient to give first the definition of deconification and then
the definition of conification.
Let C ⊆ X × R × R = Rn+2 be a convex cone for which
C + R+ (0X , 1, 0) ⊆ C ⊆ X × R × R+ .
Note that this implies the following condition (and if C is closed it is even equivalent
to it):
R+ (0X , 1, 0) ⊆ cl(C) ⊆ X × R × R+ .
Then
C ∩ (X × R × {1}) = A × {1}
for some convex set A ⊆ X × R that has (0X , 1) as a recession vector. Then this
set A is the convex set deconification of the convex cone C. Consider the function
f : X → R that is defined by
for all x ∈ X. This function f can be expressed directly in terms of the convex cone
C by
This function is readily verified to be convex. All convex functions f on X are the
dehomogenization of some convex cone C ⊆ X × R × R = Rn+2 for which
C + R+ (0X , 1, 0) ⊆ C ⊆ X × R × R+ .
C + R+ (0X , 1, 0) ⊆ C ⊆ X × R × R+
For example, the convex cones

C2 = {x ∈ R3 | x2 x3 ≥ x12 , x3 ≥ 0},

C3 = {x ∈ R3 | x1 x2 ≥ x32 , x3 ≥ 0}

are such convex cones: C2 is the closure of the conification of the function x → x 2 on R, and C3 is the closure of the conification of the function x → 1/x on (0, ∞). A conification of a convex function f on X is a convex cone C ⊆ X × R × R for which

C + R+ (0X , 1, 0) ⊆ C ⊆ X × R × R+ .
This convex cone is not unique. The following result describes all conifications of a
given convex function f on X.
Proposition 5.4.4 Let f be a convex function on X = Rn . The conifications C ⊆
X × R × R+ of f are the intermediate convex cones of the following two convex
cones: the minimal conification
This result follows immediately from the definitions. We choose the minimal
homogenization of f to be our default homogenization. We write c(f ) = cmin (f )
and we call c(f ) the homogenization of f .
Conclusion A convex function can be described in terms of a convex cone: by
taking the conification of the convex set which is the strict epigraph of the convex
function.
The aim of this section is to give the following two ways to make a new convex
function from an old one by means of a linear function—by taking the image and
by taking the inverse image.
Let M : Rn → Rm be a linear function—that is, essentially an m × n-matrix.
Definition 5.5.1 The inverse image f M of a convex function f on Rm under M is
the convex function on Rn given by
(f M)(x) = f (M(x)) ∀x ∈ Rn .
This definition shows that the chosen notation for inverse images of convex sets
and functions is natural and convenient. Note that taking the inverse image of a
convex function does not preserve properness.
Definition 5.5.2 The image Mg of a convex function g on Rn under M is the convex function on Rm given by

(Mg)(y) = inf{g(x) | x ∈ Rn , M(x) = y} ∀y ∈ Rm .
In terms of the reduction of convex functions to convex cones, the image and
inverse image can be characterized as follows.
If C is a conification of f , then CM is a conification of f M. To be more precise,
CM is the inverse image of C under the linear function Rn+2 → Rm+2 given by the
recipe

(x, ρ, t) → (M(x), ρ, t).

In terms of epigraphs, one has

epi(f M) = (epif )M

and the strict epigraph of Mg is the image of the strict epigraph of g under the linear function (x, ρ) → (M(x), ρ).
This is a first example where the strict epigraph is more convenient than
the epigraph. It is the reason why the strict epigraph is chosen to define the
homogenization of a convex function.
Warning The properties properness and closedness are not preserved under taking
images and inverse images (the only preservation is that of closedness under inverse
images).
The aim of this section is to construct the five standard binary operations for convex
functions in a systematic way—by the homogenization method. So far we did
not need homogenization of convex functions: it was enough to describe convex
functions by means of their epigraph or strict epigraph. But in this section, this is
not enough: we do need homogenization, to reduce all the way to convex cones.
Here are five standard binary operations for convex functions, that is, rules to make a new convex function X = Rn → R from two old ones.
Definition 5.6.1
• f + g (pointwise sum), (f + g)(x) = f (x) + g(x) for all x ∈ X,
• f ∨ g (pointwise maximum), (f ∨ g)(x) = max(f (x), g(x)) for all x ∈ X,
• f co∧ g (convex hull of the minimum), the maximal convex underestimator of f and g; that is, the strict epigraph of f co∧ g is the convex hull of the strict epigraph of f ∧ g = min(f, g), the pointwise minimum of f and g.
• f ⊕ g (infimal convolution), f ⊕ g(x) = inf{f (x1 ) + g(x2 ) | x = x1 + x2 } for
all x ∈ X.
• f #g (Kelley’s sum), f #g(x) = inf{f (x1 ) ∨ g(x2 ) | x = x1 + x2 } (where a ∨ b =
max(a, b) for a, b ∈ R) for all x ∈ X.
Note that for two of these operations, the pointwise sum and the pointwise maximum, the value of the new function at a point depends only on the values of f and g at this same point.
Example 5.6.2 (Binary Operations for Convex Functions (Geometric)) Figure 5.8
illustrates the pointwise maximum and the convex hull of the pointwise minimum.
To get the graph of f ∨ g one should follow everywhere the highest one of the
graphs of f and g. To get the graph of f co ∧ g, one should first follow everywhere
the lowest one of the graphs of f and g; then one should make the resulting graph
convex, by taking the graph of the highest convex minorizing function.
Example 5.6.3 (Binary Operations for Convex Functions (Analytic)) Consider the two convex functions f (x) = x 2 and g(x) = (x − 1)2 = x 2 − 2x + 1.
1. (f + g)(x) = x 2 + (x − 1)2 = 2x 2 − 2x + 1,
2. (f ∨ g)(x) = (x − 1)2 = x 2 − 2x + 1 for x ≤ 1/2 and = x 2 for x ≥ 1/2,
3. (f co∧ g)(x) = x 2 for x ≤ 0, = 0 for 0 ≤ x ≤ 1, and = (x − 1)2 = x 2 − 2x + 1 for x ≥ 1.
4. (f ⊕ g)(x) = infu (u2 + (x − u − 1)2 ) by definition; by completion of the square of the quadratic function of u inside the brackets, this is equal to

infu [ 2(u − (x − 1)/2)2 + (x − 1)2 /2 ] = (x − 1)2 /2 = x 2 /2 − x + 1/2.
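The analytic computation in item 4 can be cross-checked numerically by minimizing over the splitting variable on a grid. A small sketch (Python with NumPy; the grid is an assumption of the sketch):

import numpy as np

f = lambda x: x ** 2
g = lambda x: (x - 1.0) ** 2

us = np.linspace(-5.0, 5.0, 2001)                 # grid for the splitting
inf_conv = lambda x: np.min(f(us) + g(x - us))    # (f ⊕ g)(x), approximately

for x in [-1.0, 0.0, 0.5, 2.0]:
    print(x, inf_conv(x), 0.5 * (x - 1.0) ** 2)   # the two values agree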
Proof As the functions f, g are convex, the sets epi(f ), epi(g) are convex.
Therefore the intersection epi(f )∩epi(g) is convex. This intersection is by definition
of the binary operation max equal to epi(f ∨g). So the function f ∨g is convex.
Warning The properties closedness and properness are not always preserved under
binary operations. For example, if A1 , A2 are nonempty disjoint convex sets in Rn
and f1 : A1 → R, f2 : A2 → R are convex functions, then f1 + f2 has effective
domain A1 ∩ A2 = ∅, and so it is not proper.
Now a systematic construction is given of the five binary operations of Defini-
tion 5.6.1: by the homogenization method. That this can be done is an advantage
of the homogenization method. It will be useful later, when we give for
convex functions the convex calculus rules for their dual description and for their
subdifferentials. Recall at this point how this was done for convex sets. Here the
procedure will be similar.
The precise way to construct a binary operation on convex functions by homogenization, is
as follows. We take the space that contains the homogenizations of convex functions
on X (we recall that X is essentially Rn ): this space is the product space X × R × R.
Then we make for each of the three factors of this space—X, the first factor R,
and the second factor R—a choice between + and −. Let the choice for X be ε1 ,
let the choice for the first factor R be ε2 , and let the choice for the second factor
R be ε3 . This gives 23 = 8 possibilities for (ε1 , ε2 , ε3 ): (+ + +), (+ + −), (+ −
+), (+ − −), (− + +), (− + −), (− − +), (− − −). Each choice (ε1 , ε2 , ε3 ) will
lead to a binary operation ◦(ε1 ,ε2 ,ε3 ) on convex functions. Let (ε1 , ε2 , ε3 ) be one of
the eight choices, and let f and g be two ‘old’ convex functions on X. We will use
the homogenization method to construct a ‘new’ convex function f ◦(ε1 ,ε2 ,ε3 ) g on
X.
1. Conify. We conify f and g. This gives conifications c(f ) and c(g) respectively.
These are convex cones C in X × R × R = Rn+2 for which C + R+ (0X , 1, 0) ⊆
C ⊆ X × R × R+ .
2. Work with convex cones. We want to associate to the old convex cones C =
c(f ) and D = c(g) a new convex cone E in X × R × R = Rn+2 for which
E + R+ (0X , 1, 0) ⊆ E ⊆ X × R × R+ .
We take the Cartesian product C × D. This is a convex cone in X × R × R+ ×
X × R × R+ . We rearrange its factors:
(X × X) × (R × R) × (R+ × R+ ).
This has three factors: the first factor is X × X, the second factor is R × R and
third factor is R+ × R+ .
Now we view C ×D as a subset of (X ×X)×(R×R)×(R+ ×R+ ). Then we act
on C × D as follows, using the triplet (ε1 , ε2 , ε3 ) that we have chosen. We define
the promised new cone E to be the outcome of applying simultaneously three
operations on C × D: the first operation is taking the image under the mapping
+_X if ε₁ = +, but it is the inverse image under the diagonal mapping Δ_X : X → X × X,
z ↦ (z, z), if ε₁ = −; the second operation is taking the image under the mapping +_ℝ on the second
factor if ε₂ = +, but the inverse image under the diagonal mapping Δ_ℝ if ε₂ = −; and the third
operation acts in the same way on the third factor, according to ε₃.
3. Deconify. Each convex cone F for which
$$F + \mathbb{R}_+(0_X, 1, 0) \subseteq F \subseteq X \times \mathbb{R} \times \mathbb{R}_+$$
is a conification of a convex function on X. The cone E constructed in the previous step is
such a cone,
$$E + \mathbb{R}_+(0_X, 1, 0) \subseteq E \subseteq X \times \mathbb{R} \times \mathbb{R}_+,$$
and the convex function that it conifies is defined to be f ∘(ε₁,ε₂,ε₃) g.
Let us now carry out this construction explicitly for the triplet (− + −), starting from
the conifications C = c(f) and D = c(g) obtained in step 1.
2. Work with convex cones. We want to associate to the old convex cones C = c(f )
and D = c(g) a new convex cone E in X × R × R = Rn+2 for which
E + R+ (0X , 1, 0) ⊆ E ⊆ X × R × R+ .
We take the Cartesian product C × D and rearrange its factors:
(X × X) × (ℝ × ℝ) × (ℝ₊ × ℝ₊).
This has three factors: the first factor is X × X, the second factor is ℝ × ℝ and the
third factor is ℝ₊ × ℝ₊.
Now we view C × D as a subset of (X × X) × (ℝ × ℝ) × (ℝ₊ × ℝ₊). Then we act on
C × D as follows, using the triplet (ε₁, ε₂, ε₃) = (− + −) that we have chosen. We
define the promised new cone E to be the outcome of applying simultaneously
the inverse image under Δ_X on the first of the three factors of (X × X) × (ℝ ×
ℝ) × (ℝ₊ × ℝ₊) (as the triplet (− + −) has − on the first place), the image under
+_ℝ on the second of the three factors (as the triplet (− + −) has + on the second
place), and the inverse image under Δ_ℝ on the third of the three factors (as the
triplet (− + −) has − on the third place).
This yields the promised new cone E; it satisfies
$$E + \mathbb{R}_+(0_X, 1, 0) \subseteq E \subseteq X \times \mathbb{R} \times \mathbb{R}_+.$$
Explicitly, E consists of all elements (z, γ, τ) ∈ X × ℝ × ℝ for which there exists
an element ((x, α, ρ), (y, β, σ)) ∈ C × D such that
$$z = x = y,\quad \gamma = \alpha + \beta,\quad \tau = \rho = \sigma.$$
One can use this description to check that E is a convex cone in X × R × R for
which the closure is intermediate between R+ (0X , 1, 0) and X × R × R+ .
Now we use that C = c(f) and D = c(g). It follows that E consists of all triplets
(z, γ, τ) ∈ X × ℝ × ℝ for which there exists an element
$$(s(x, \xi, 1), t(y, \eta, 1)) = (sx, s\xi, s, ty, t\eta, t) \in c(f) \times c(g)$$
such that
$$z = sx = ty,\quad \gamma = s\xi + t\eta,\quad \tau = s = t.$$
For τ = s = t > 0 these conditions mean that (z, γ, τ) = τ(x, ξ + η, 1) with ξ > f(x)
and η > g(x), that is, with ξ + η running over all numbers > f(x) + g(x). Hence
$$E = c(f + g),$$
and so
$$f \circ_{(-+-)} g = f + g:$$
the triplet (− + −) generates the pointwise sum.
The aim of this section is to define the recession cone of a convex function.
The recession cone of the epigraph of a proper convex function on X = ℝⁿ always
contains the vector (0_X, 1). This is not interesting. Therefore, we define recession
vectors of a convex function f by taking the vertical projections on X of the recession
vectors of the epigraph of f that do not point upward—that is, their last coordinate
should be non-positive.
Definition 5.7.1 For each convex function f on Rn and each real number γ the
sublevel set of f for level γ is the following convex set:
Vγ = {x | f (x) ≤ γ }.
The recession cone R_f of a proper convex function f on X is defined to be the
recession cone of any nonempty sublevel set V_γ of f.
[Fig. 5.9: the recession cone R_f of a convex function, a cone with apex 0]
This concept is well-defined: which non-empty sublevel set is chosen does not
matter. Figure 5.9 illustrates the concept of recession cone of a convex function.
Thus, the recession cone of a convex function is defined to be ‘the interesting
part’ of the recession cone of the epigraph of the function. Here is the main
application of this concept.
Proposition 5.7.3 If f is closed and Rf = {0}, then f assumes its minimum.
It might be surprising that this somewhat artificial looking concept can be defined
in a natural way, as a dual concept to the concept of effective domain of a convex
function. We will see this in Chap. 6.
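A small Python sketch (an illustrative addition, not part of the formal development; the function f below is a chosen example and the check is numerical evidence, not a proof) can test membership of the recession cone: for a closed proper convex function, v is a recession vector precisely if f does not increase along v from any base point.

```python
# Illustrative sketch (not from the book): numerically test whether v is a
# recession direction of a convex f, using that for closed proper convex f,
# v is a recession vector iff f(x + t v) <= f(x) for all x and all t >= 0.
import numpy as np

def is_recession_direction(f, v, trials=200, t_max=50.0, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x = rng.normal(scale=3.0, size=len(v))        # random base point
        f0 = f(x)
        for t in np.linspace(0.1, t_max, 25):
            if f(x + t * np.asarray(v)) > f0 + 1e-9:  # f increased: not recession
                return False
    return True                                        # evidence only, no proof

f = lambda z: np.exp(z[0]) + z[1]**2   # convex; R_f = {(v1, 0) : v1 <= 0}
print(is_recession_direction(f, (-1.0, 0.0)))   # True
print(is_recession_direction(f, (-1.0, 0.5)))   # False: y^2 grows along v
print(is_recession_direction(f, ( 1.0, 0.0)))   # False: e^x grows along v
```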
5.8 *Applications
The aim of this section is to give applications of the concept of convex function.
First, we explain how convex functions can be used to describe some convex sets.
By the second order criterion for convexity, one can give explicit examples of
convex functions of one variable such as x², −x^{1/2}, 1/x, eˣ, −ln x (x > 0). For
such a function f, its epigraph {(x, y) | y ≥ f(x)} is a convex set in the plane ℝ².
More generally, the second order conditions can be used to verify the convexity of
functions of several variables such as f(x) = xᵀAx + aᵀx + α where A is a positive
semidefinite n × n-matrix, a is an n-column vector and α a real number. Besides, each norm
on Rn is a convex function. By repeatedly applying binary operations one gets many
more convex functions. Then for each convex function f of n variables constructed
explicitly, one gets an explicit construction of a convex set—its epigraph.
Conclusion One can give explicit examples of convex sets by constructing convex
functions and taking their epigraphs.
[Fig. 5.10: a concave utility function U; the points a, c, pa + qb, b on the horizontal axis, where c is the certainty equivalent of the prospect]
The aim of this section is to explain why, in some valuable applications in economic
theory, convex functions occur that are not specified.
Suppose that someone expects to receive an amount of money a with probability
p ∈ [0, 1] and an amount of money b with probability q = 1−p. Then the expected
amount of money he will receive is pa + qb. If he is risk averse, then he would
prefer to receive an amount of money slightly less than pa + qb for sure. This can
be modeled by means of a utility function U of the person: a prospect of getting the
amount c for sure is equivalent to the prospect above if U (c) = pU (a) + qU (b).
It is reasonable to require that a utility function is increasing: ‘the more the better’.
Moreover, it should be concave (that is −U should be convex), as the person is risk
averse. Figure 5.10 illustrates the modeling of risk averse behavior by concavity of
the utility function.
Thus risk averse behavior is modeled by a concave function that is not specified.
In economic theory, many problems are modeled by convex functions and analyzed
fruitfully, although these functions are not specified by a formula; one only assumes
these functions to be convex or concave.
Conclusion The property of a function to be convex or concave corresponds
sometimes to desired properties in economic models. In the analysis of these models
these properties are assumed and used in the analysis, without a specification of the
functions by a formula.
5.9 Exercises
4. Show that a convex function f : [0, +∞) → R with f (0) ≤ 0 has the
following property:
f (x) + f (y) ≤ f (x + y)
for all x, y.
5. Prove for a convex function on [a, b] (where a < b) Hadamard’s inequality
$$f\Big(\frac{a+b}{2}\Big) \le \frac{1}{b-a}\int_a^b f(x)\,dx \le \frac{f(a)+f(b)}{2}.$$
$$f\Big(\sum_{i=1}^{m} \alpha_i x_i\Big) \le \sum_{i=1}^{m} \alpha_i f(x_i)$$
for αᵢ ≥ 0, $\sum_{i=1}^{m}\alpha_i = 1$, xᵢ ∈ X ∀i.
30. Compare the proof you have found in Exercise 29 to the usual proof (‘derivation
from Jensen’s inequality for two points’).
31. Show that the inverse image f M of a convex function f under a linear function
M is again a convex function.
32. Show that taking the inverse image of a convex function under a linear function
does not preserve properness.
33. Show that the epigraph of the inverse image of a convex function under a linear
function is the inverse image of the epigraph of this convex function.
34. Show that the image Mg of a convex function g under a linear function M is
again a convex function.
35. Show that taking the image of a convex function under a linear function does
not preserve properness.
36. *Show that the image Mg of a convex function g : Rn → R under M can be
characterized as the function Mg : Rm → R for which the strict epigraph is
the image of the strict epigraph of g under the mapping Rn × R → Rm × R :
(x, ρ) → (Mx, ρ).
37. *Show that the five binary operations in Definition 5.6.1 are well-defined. For
the pointwise maximum, this is done in the main text.
38. Prove the following geometric interpretation for the binary operation infimal
convolution of convex functions: this amounts to taking the Minkowski sum of
the strict epigraphs.
39. *Repeat this construction, but now for the triplet (Δ_X, Δ_ℝ, Δ_ℝ) and show that
this gives the pointwise maximum.
40. Repeat this construction again, but now for the triplets (+_X, +_ℝ, +_ℝ) and
(+_X, +_ℝ, Δ_ℝ) and show that this gives the convex hull of the pointwise
minimum and the infimal convolution respectively.
41. Construct with the conification method Kelley’s sum by a suitable triplet.
42. Construct with the conification method formulas for the three nameless binary
operations on convex functions. Try them for some simple concrete convex
functions on R.
43. Show that the recession cone of a closed proper convex function is well-defined.
That is, show that all nonempty sublevel sets of a convex function have the same
recession cone.
44. Give a homogenization definition of the recession cone of a convex function.
Chapter 6
Convex Functions: Dual Description
Abstract
• Why. The phenomenon duality for a proper closed convex function, the possibil-
ity to describe it from the outside of its epigraph, by graphs of affine (constant
plus linear) functions, has to be investigated. This has to be done for its own sake
and as a preparation for the duality theory of convex optimization problems. An
illustration of the power of duality is the following task, which is challenging
without duality but easy if you use duality: prove the convexity of the main
function from geometric programming, ln(ex1 + · · · + exn ), a smooth function
that approximates the non-smooth function max(x1 , . . . , xn ).
• What. The duality for convex functions can be formulated efficiently by the
property of a duality operator on convex functions that have the two nice
properties (proper and closed), the conjugate function operator f → f ∗ : this is
an involution. For each one of the eight binary operations on convex functions,
one has a rule of the type (f ∘ g)* = f* • g* where • is another one of the
eight binary operations on convex functions. Again, homogenization generates a
unified proof for these eight rules. This requires the construction of the conjugate
function operator by means of a duality operator for convex cones (the polar cone
operator). There is again a technical complication due to the fact that the rules
only hold if all convex functions involved, f, g, f ∘ g and f* • g*, have the two
nice properties.
Even more important, for applications, is that one has, for the two binary
operations + and max on convex functions, a rule for subdifferentials: the
theorem of Moreau-Rockafellar ∂(f1 +f2 )(x) = ∂f1 (x)+∂f2 (x) and the theorem
of Dubovitskii-Milyutin ∂ max(f1 , f2 )(x) = ∂f1 (x)co ∪ ∂f2 (x) in the difficult
case f1 (x) = f2 (x). Homogenization generates a unified proof for these two
rules. Again, these rules only hold under suitable assumptions.
Again, there is no need to prove something new: all duality results for convex
functions follow from properties of convex sets (their epigraphs) or of convex
cones (their homogenizations).
Road Map
1. Figure 6.1, Definition 6.2.3, (conjugate function).
2. Figure 6.2 (dual norm is example of either conjugate function or duality operator
by homogenization).
3. Theorem 6.4.2 and structure proof (duality theorem for proper closed convex
functions: conjugate operator is an involution, proof by homogenization).
4. Propositions 6.5.1, 6.5.2 (calculus rules for computation conjugate functions).
5. Take note of the structure of Sect. 6.5 (duality between convex sets and sublinear
functions; this is a preparation for the proof of calculus rules for subdifferentials).
6. Definitions 6.7.1, 6.7.2, Propositions 6.7.8–6.7.11 (subdifferential convex func-
tion at a point, nonemptiness and compactness, at points of differentiability,
Fenchel-Moreau, Dubovitskii-Milyutin, subdifferential norm).
6.1 *Motivation
The dual description of a convex function is fundamental and turns up often, for
example in duality theory for convex optimization. Here we mention a striking
example of the power of using the dual description of a convex function. We give an
example of a function for which it is quite involved to verify convexity by means of
the second order criterion. However, this will be surprisingly simple if you use its
dual description, as we will see at the end of this chapter.
Consider the function f : Rn → R given implicitly by
$$e^y = e^{x_1} + \cdots + e^{x_n}.$$
The aim of this section is to define the duality operator for proper closed convex
functions—the conjugate function operator.
Example 6.2.1 Figure 6.1 illustrates the definition of the conjugate function.
The graph of a convex function f : R → R is drawn. Its primal (‘inner’)
description is its epigraph. Its dual (‘outer’: outside the epigraph) description is the
[Fig. 6.1: a line y = ax − b lying under or on the graph of f; the vertical intercepts −b and −f*(a) are marked]
collection of lines in the plane that lie entirely under or on the graph. Each one of
these lines can be described by an equation y = ax − b. It turns out that the set
of pairs (a, b) that one gets is precisely the epigraph of a convex function f ∗ . This
function f ∗ is the dual description of the convex function f , called the conjugate
function of f . Figure 6.1 shows that f* is given by the following formula:
$$f^*(a) = \sup_x\,(ax - f(x)).$$
Indeed, the condition that the line y = ax − b lies entirely under or on the graph
of f means that
$$ax - b \le f(x)\quad \forall x,$$
that is,
$$ax - f(x) \le b\quad \forall x,$$
that is,
$$\sup_x\,(ax - f(x)) \le b.$$
This shows that the set of pairs (a, b) for which the line y = ax − b lies entirely
under or on the graph of f is indeed the epigraph of the function with recipe a ↦
sup_x (ax − f(x)).
The formula above for f ∗ implies that f ∗ is convex, as for each x the function
a → ax − f (x) is affine and so convex, and the pointwise supremum of a collection
of convex functions is again convex.
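The formula for f* invites numerical experimentation. Here is a minimal Python sketch (an illustrative addition; the helper name is ours, and it assumes the supremum is attained within a bounded grid) computing the conjugate of f(x) = x², whose conjugate is known to be a²/4.

```python
# Illustrative sketch (not from the book): approximate f*(a) = sup_x (a x - f(x))
# by a maximum over a bounded grid of x values.
import numpy as np

def conjugate_on_grid(f_vals, xs, a_grid):
    # Only trustworthy for slopes a whose maximizer lies inside the grid.
    return np.array([np.max(a * xs - f_vals) for a in a_grid])

xs = np.linspace(-5, 5, 2001)
a_grid = np.linspace(-3, 3, 13)
fstar = conjugate_on_grid(xs**2, xs, a_grid)          # f(x) = x^2
assert np.allclose(fstar, a_grid**2 / 4, atol=1e-4)   # known: (x^2)* = a^2/4
print(fstar)
```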
Example 6.2.2 (Comparison of Conjugate Function and Derivative) Let f : ℝ →
ℝ be a differentiable convex function; assume for simplicity that f′ : ℝ → ℝ is
a bijection. Then both the functions f* and f′ give descriptions of the set of the
tangent lines to the graph of f. In the case of the conjugate function f*, the set is
described as a family (L_a)_{a∈ℝ} indexed by the slope of the tangent line: L_a is the
graph of the function with recipe x ↦ ax − f*(a). In the case of the derivative f′,
the set is described as a family (M_b)_{b∈ℝ} indexed by the x-coordinate b of the point of
tangency of the tangent line and the graph of f: M_b is the graph of the function with
recipe x ↦ f(b) + f′(b)(x − b). The description by means of f* has the advantage
over the description by f′ that if f is convex but not differentiable, this description
includes all supporting lines at points of nondifferentiability of f. These arguments
will be generalized in Proposition 6.7.5 to arbitrary proper convex functions: then
one gets a relation between the conjugate function and the subdifferential function,
which is the analogue of the derivative for convex functions.
Now we give the formal definition of the duality operator for convex functions.
Definition 6.2.3 Let a proper convex function f be given. That is, let a nonempty
convex set A ⊆ X = Rn and a convex function f : A → R be given. Then the
conjugate function of f is the convex function f* on X given by
$$f^*(y) = \sup_{x \in A}\,(x \cdot y - f(x)).$$
[Fig. 6.2: the dual norm as an example of the conjugate function; the label −‖·‖* marks the vertical intercept]
The aim of this section is to define the conjugate function operator by homoge-
nization. This alternative definition will make it easier to establish properties of the
conjugate function operator: they can be derived from similar properties of the polar
cone operator.
Here is a detail that needs some attention before we start: the conification c(f ) of
a convex function f on X = Rn is contained in the space X × R × R. For defining
the polar cone (c(f ))◦ , we have to define a suitable nondegenerate bilinear mapping
on X × ℝ × ℝ. We take the following nondegenerate bilinear mapping b:
$$b((x, \alpha, \alpha'), (y, \beta, \beta')) = x \cdot y - \alpha\beta' - \alpha'\beta$$
for all (x, α, α′), (y, β, β′) ∈ X × ℝ × ℝ. Note the minus signs and the transposition!
The following result gives a nice property of a convex function: its dual object is
again a convex function.
Proposition 6.3.1 Let X = Rn .
1. A convex cone C ⊆ X × R × R is a conification of a convex function f on X iff
R₊(0_X, 1, 0) ⊆ cl(C) ⊆ X × ℝ × ℝ₊.
2. If C is a conification of a convex function on X, then the polar cone C° is also
a conification of a convex function on X.
Proof The first statement follows by definition of conification of a convex set. The
second statement follows from the fact that the convex cones R+ (0X , 1, 0) and X ×
R×R+ are each other’s polar cone with respect to the bilinear mapping b. Therefore
we get that
R+ (0X , 1, 0) ⊆ cl(C) ⊆ X × R × R+
implies
$$(X \times \mathbb{R} \times \mathbb{R}_+)^\circ \subseteq C^\circ \subseteq (\mathbb{R}_+(0_X, 1, 0))^\circ,$$
and so it gives
X × R × R+ ⊇ C ◦ ⊇ R+ (0X , 1, 0),
as required.
Now we are ready to construct the conjugate function operator by the homoge-
nization method.
Proposition 6.3.2 The duality operator on convex functions that one gets by the
homogenization method (first conify, then apply the polar cone operator, finally
deconify) is equal to the conjugate function operator.
Indeed, by the previous proposition, the duality operator by the homogenization
method gives, when applied to an ‘old’ convex function f on X, a ‘new’ convex
function on X. A calculation, which we do not display, shows that the new convex
function is f ∗ , the conjugate function of f .
This is yet another example where a formula from convex analysis—here
the defining formula for the conjugate function operator—is generated by the
homogenization method.
Conclusion The duality operator for proper closed convex functions—the conju-
gate function operator—has been constructed by the homogenization method.
The aim of this section is to give the duality theorem for convex functions.
Definition 6.4.1 The biconjugate function f ∗∗ of a proper convex function f on
X = Rn is the conjugate function of the conjugate function of f ,
f ∗∗ = (f ∗ )∗ .
Theorem 6.4.2 (Duality Theorem for Convex Functions) A closed proper convex
function f on X = Rn is equal to its biconjugate function f ∗∗ ,
f ∗∗ = f.
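The duality theorem can be observed numerically. Here is a small Python sketch (an illustrative addition, not from the text) checking f** = f for the closed proper convex function f(x) = |x|, whose conjugate is the indicator of [−1, 1]; large grid values stand in for +∞.

```python
# Illustrative sketch (not from the book): numerical check of f** = f.
import numpy as np

xs  = np.linspace(-4, 4, 1601)
as_ = np.linspace(-4, 4, 1601)

f = np.abs(xs)
fstar = np.array([np.max(a * xs - f) for a in as_])      # ~0 on [-1,1], large outside
fstarstar = np.array([np.max(x * as_ - fstar) for x in xs])

assert np.max(np.abs(fstarstar - f)) < 1e-2
print("f** = f confirmed on the grid for f(x) = |x|")
```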
The aim of this section is to present calculus rules to compute conjugate functions.
We already have given one rule above: (f ∗ )∗ = f if f is proper and closed. Now
we will also give rules for expressing (f ◦ g)∗ in f ∗ and g ∗ where ◦ is a binary
operation for convex functions (sum, maximum, convex hull of the minimum, or
infimal convolution) and for expressing (Mf )∗ and (f N )∗ in f ∗ for linear functions
M and N .
Just as we did for convex sets, we will halve the number of formulas by the
introduction of a shorthand notation. We will write, for two proper convex functions
f, f̃ : X = ℝⁿ → ℝ that are in duality, in the sense that f̃ = f*, as follows:
$$f \stackrel{d}{\leftrightarrow} \tilde f.$$
Then we will say that (f, f˜) is a pair in duality. Note that here f need not be closed
(that is, lower-semicontinuous).
Here are the promised calculus rules for conjugate functions.
Proposition 6.5.1 (Calculus Rules for the Conjugate Function Operator) Let
both (f, f̃) and (g, g̃) be pairs of proper convex functions on X = ℝⁿ in duality,
$$f \stackrel{d}{\leftrightarrow} \tilde f,\qquad g \stackrel{d}{\leftrightarrow} \tilde g.$$
Then
$$f + g \stackrel{d}{\leftrightarrow} \tilde f \oplus \tilde g$$
and
$$f \vee g \stackrel{d}{\leftrightarrow} \tilde f \;\mathrm{co}{\wedge}\; \tilde g,$$
provided, for each of these two rules, that the outcomes of the binary operations
involved are proper.
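The first rule can be checked numerically. The following Python sketch (an illustrative addition; the conjugates of f and g are entered in their known closed forms, and all suprema/infima are taken over bounded grids) verifies (f + g)* = f* ⊕ g* for f(x) = x² and g(x) = (x − 1)².

```python
# Illustrative sketch (not from the book): check (f + g)* = f* (+) g* on grids.
import numpy as np

xs  = np.linspace(-8, 8, 3201)
as_ = np.linspace(-2, 2, 81)

def conj(h_vals, a):                      # h*(a) = sup_x (a x - h(x)) on the grid
    return np.max(a * xs - h_vals)

f, g = xs**2, (xs - 1.0)**2
lhs = np.array([conj(f + g, a) for a in as_])             # (f + g)*

fstar = lambda b: b**2 / 4.0                               # known conjugates
gstar = lambda b: b + b**2 / 4.0
bs = np.linspace(-10, 10, 4001)
rhs = np.array([np.min(fstar(bs) + gstar(a - bs)) for a in as_])  # (f* (+) g*)(a)

assert np.max(np.abs(lhs - rhs)) < 1e-3
print("(f+g)* = f* (+) g* confirmed numerically")
```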
The proofs of these results in a traditional approach are relatively involved.
However, we can get a unified proof by the homogenization method. Indeed,
these results follow immediately from the following convex calculus rules and the
definition of the binary operations by the homogenization method (recall that this
involves the special linear functions +_Y and Δ_Y).
Proposition 6.5.2 Let M ∈ ℝ^{m×n} and let (g, g̃) be a pair of proper convex functions on ℝⁿ
in duality,
$$g \stackrel{d}{\leftrightarrow} \tilde g.$$
Then
$$Mg \stackrel{d}{\leftrightarrow} \tilde g M.$$
The aim of this section is to prepare for the non-differentiable analogue of the
derivative of a differentiable function—the subdifferential of a convex function f
at a point x, a concept that we need for optimization.
Example 6.6.1 (Subdifferential) Figure 6.3 illustrates the subdifferential for func-
tions on X = Rn for n = 1.
Then the convex cone C is called a homogenization of the sublinear function p. The
homogenizations of p are the cones C for which
$$\mathrm{epi}_s(p) \subseteq C \subseteq \mathrm{epi}(p).$$
Now we define the duality operators for proper sublinear functions and for
general nonempty convex sets.
Definition 6.6.4 The support function of a nonempty convex set A ⊆ X = Rn is
the proper closed sublinear function s(A) : X → R ∪ {+∞} given by
y → sup{x · y | x ∈ A}.
The dual object of a proper sublinear function p on X is the convex set
$$\partial p = \{y \in X \mid x \cdot y \le p(x)\ \forall x \in X\}.$$
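For a polytope A = conv(V), the supremum in the definition of the support function is attained at a vertex, so s(A) can be evaluated by a finite maximum. A minimal Python sketch (an illustrative addition; the example set is ours):

```python
# Illustrative sketch (not from the book): support function of a polytope
# A = conv(V): a linear function attains its supremum over conv(V) at a vertex.
import numpy as np

def support_function(vertices, y):
    return max(np.dot(x, y) for x in vertices)

square = [np.array(v) for v in [(1, 1), (1, -1), (-1, 1), (-1, -1)]]
print(support_function(square, np.array([2.0, 3.0])))   # 5.0 = |2| + |3|
# The support function of this max-norm unit ball is the l1-norm, as expected.
```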
These two duality operators are defined by explicit formulas which might look
like they come out of nowhere and the duality theorem 6.6.8 might look like a
new phenomenon that requires a new proof. However, one can also define these
operators, equivalently, in a systematic way: by homogenization. This gives the
same duality operators and then the duality theorem 6.6.8 is seen to be an immediate
consequence of the duality theorem for convex cones. Indeed, let p be a proper
sublinear function on X = Rn . Choose a homogenization C of p. This is a convex
cone in X × R, for which
Apply the polar cone operator with respect to the bilinear mapping b on X × ℝ given
by
$$b((x, \alpha), (y, \beta)) = x \cdot y - \alpha\beta.$$
This gives the polar cone C°.
Now use that the convex cones R+ (0X , 1) and X × R+ are each other’s dual with
respect to b, and that the convex cones R(0X , 1) and X × {0} are also each other’s
dual with respect to b. It follows that we get
X × R+ ⊇ C ◦ , X × {0} ⊇ C ◦ .
That is,
C ◦ ⊆ X × R+ , C ◦ ⊆ X × {0}.
for the operations +, ∨ on sublinear functions. Therefore we only display the four
duality rules for sublinear functions.
Proposition 6.6.10 (Calculus Rules for the Sublinear Functions) Let p, q be
proper closed sublinear functions on X = Rn . Then the following formulas hold.
1. ∂(p + q) = cl(∂p + ∂q),
2. ∂(p ∨ q) = cl((∂p) co ∪ (∂q)),
3. ∂(p ⊕ q) = ∂p ∩ ∂q,
4. ∂(p # q) = ∂p # ∂q,
provided, for each one of these rules, that the outcomes of the two binary operations
involved are proper.
The closure operators can be omitted if the relative interiors of the effective
domains of p, q have a common point,
$$\mathrm{ri}(\mathrm{dom}\, p) \cap \mathrm{ri}(\mathrm{dom}\, q) \ne \emptyset.$$
We emphasize that for applications in convex optimization, we need only the first
two rules.
We do not display the details of the proofs, by homogenization. These are similar
to the proofs of the corresponding rules for other convex objects than sublinear
functions, such as convex sets containing the origin or proper convex functions.
Conversely, one can give rules for the support function of ‘new convex sets’ in
terms of old ones. We do not need these rules so we will not display them.
Conclusion The dual description of a proper sublinear function is a general
nonempty convex set. This will be needed for the definition of the ‘surrogate’ for
the derivative of a convex function in a point of non-differentiability.
The aim of this section is to present the basics for subgradients and subdifferentials
of convex functions at a point. These concepts play a role in optimization. For
convex functions they are the analogue of the derivative.
Here is the usual definition of the subdifferential of a proper convex function at
a point.
Definition 6.7.1 A subgradient of a proper convex function f on X = ℝⁿ at a point
x from its effective domain dom(f) is a vector d ∈ X for which
$$f(x') \ge f(x) + d \cdot (x' - x)\quad \forall x' \in X.$$
In words, if an affine function on X has the same value at x as f and its graph lies
nowhere above the graph of f , then its slope is called a subgradient of f at x.
A subgradient can be characterized by means of the conjugate function: d ∈ ∂f(x)
iff f(x) + f*(d) = d · x. This result follows from the definitions. It often allows to
compute the subdifferential ∂f(x) if the conjugate function f* is known.
The subdifferential of a proper convex function at a point can be expressed in
terms of directional derivatives as follows.
Often, one can calculate subdifferentials ∂f (x) of a convex function f at a point
x, by means of the following result.
Definition 6.7.6 The directional derivative f′(x; v) ∈ ℝ of a proper convex
function f : X = ℝⁿ → ℝ at a point x ∈ dom(f) in direction v ∈ X is the
limit
$$f'(x; v) = \lim_{t \downarrow 0} \frac{f(x + tv) - f(x)}{t}.$$
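Definition 6.7.6 can be tried out numerically: the one-sided difference quotients converge as t ↓ 0. A small Python sketch (an illustrative addition; the example function is a choice of ours, not from the text):

```python
# Illustrative sketch (not from the book): approximate f'(x; v) by one-sided
# difference quotients with t decreasing to 0.
import numpy as np

def directional_derivative(f, x, v, ts=(1e-2, 1e-4, 1e-6)):
    x = np.asarray(x, dtype=float); v = np.asarray(v, dtype=float)
    return [(f(x + t * v) - f(x)) / t for t in ts]

f = lambda z: np.abs(z[0]) + z[1]**2          # convex, kink along x1 = 0
print(directional_derivative(f, (0.0, 1.0), ( 1.0, 0.0)))  # all 1.0
print(directional_derivative(f, (0.0, 1.0), (-1.0, 0.0)))  # also 1.0:
# f'(x; v) = |v1| here, a sublinear but not linear function of v,
# which is exactly the non-differentiability at the kink.
```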
Proposition 6.7.7 Let f be a proper convex function on X = ℝⁿ and let x ∈ dom(f).
Then
$$\partial f(x) = \{d \in X \mid d \cdot v \le f'(x; v)\ \forall v \in X\}.$$
The proof is easy: it suffices to prove this result for the case of a function of one
real variable, by restriction of f to lines through the point x. The proof in this case
follows readily from the definitions.
In passing, we mention that this property of convex functions is no exception to
the rule that all properties of convex functions can be proved by applying a property
of a convex set or a convex cone. It is a consequence of a property of the tangent
cone TA (x) = cl[R+ (A − x)] of a convex set A at a point x ∈ A, but we do not
consider the concept of tangent cone.
The meaning of Proposition 6.7.7 is twofold. First it gives a systematic definition,
by homogenization, of the subdifferential of a proper convex function f at a point
x of its effective domain. Second it gives a method that allows to calculate the
subdifferential of a convex function at a point in many examples.
Concerning the first point. One takes the sublinear function f′(x; ·) given by the
directional derivatives of f at x,
$$v \mapsto f'(x; v).$$
Then one takes the dual description of this sublinear function, the convex set
∂(f′(x; ·)). Proposition 6.7.7 shows that this definition is equivalent to the definition
given above.
Concerning the second point. Proposition 6.7.7 reduces the task to calculate
a subdifferential of a convex function at a point to the task to calculate, using
differential calculus, one-sided derivatives of functions of one variable at a point.
This systematic definition of the subdifferential of a convex function at a point
makes it possible to give straightforward proofs—by the homogenization method—
of all its properties. Here are some of these properties.
Proposition 6.7.8 The subdifferential of a proper convex function at a relative
interior point of the effective domain is a nonempty compact convex set.
The subdifferential is, at a point of differentiability, essentially the derivative.
Proposition 6.7.9 If f : X = ℝⁿ → ℝ is a proper convex function and x ∈
ri(dom(f)), then f is differentiable in x iff ∂f(x) consists of one element. Then this
element is f′(x), the derivative of f at x.
Moreover, we get the following calculus rules for the subdifferential of a proper
convex function at a point of its effective domain.
Proposition 6.7.10 (Calculus Rules for Subdifferential Convex Function) Let
f, g be two proper convex functions on X = Rn . Then the following formulas hold
true:
1. ∂(f ∨ g)(x) = cl(∂f (x)co ∪ ∂g(x)) if f (x) = g(x)
2. ∂(f + g)(x) = cl(∂f (x) + ∂g(x)).
provided, for each of these formulas, that the outcomes of the two binary
operations involved are proper.
The formulas of Proposition 6.7.10 are celebrated theorems. The first one is
called the theorem of Dubovitskii-Milyutin (note that the rule for ∂(f ∨ g)(x) in
the case f(x) ≠ g(x) is simply that it is equal to ∂f(x) if f(x) > g(x) and that it is
equal to ∂g(x) if f(x) < g(x)), the second one the theorem of Moreau-Rockafellar.
Usually it requires considerable effort to prove them; by the systematic approach—
the homogenization method—you get them essentially for free: they are immediate
consequences of the corresponding rules for sublinear functions: the first two
rules from Proposition 6.6.10.
Proposition 6.7.11 The subdifferential of a norm N on X = Rn at the origin is
equal to the unit ball of the dual norm N ∗ ,
{x ∈ X | N ∗ (x) ≤ 1}.
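Proposition 6.7.11 can be tested numerically. In the sketch below (an illustrative Python addition; the test is probabilistic, over random sample points), N is the ℓ¹-norm on ℝ², whose dual norm is the maximum norm, so d should be a subgradient of N at 0 exactly when max(|d₁|, |d₂|) ≤ 1.

```python
# Illustrative sketch (not from the book): test the subgradient inequality
# N(x) >= d . x at the origin, for N the l1-norm on R^2.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 2))                 # random test points

def is_subgradient_at_0(d, X):
    return bool(np.all(np.abs(X).sum(axis=1) >= X @ np.asarray(d) - 1e-12))

print(is_subgradient_at_0(( 1.0,  1.0), X))    # True : max-norm equal to 1
print(is_subgradient_at_0(( 0.3, -0.9), X))    # True : max-norm below 1
print(is_subgradient_at_0(( 1.1,  0.0), X))    # False: max-norm above 1
```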
The material in this section can be generated by the homogenization method; the
details are not displayed here.
Conclusion A surrogate for the derivative—the subdifferential—has been defined
which is also defined for points of nondifferentiability of convex functions. This
is done by a formula and equivalently by means of the duality between sublinear
functions and convex sets. The latter definition can be used to derive the calculus
rules for the subdifferential.
One can define the conification of a norm N on X = Rn in two ways. In the first
place, one can take the convex cone in X × R × R that is the conification of N
as a convex function. However, a simpler concept of conification is possible: the
convex cone in X × R that is the strict epigraph of N . These two conifications give
as dual object of N different convex functions. The conification of N as a convex
function gives the indicator function of the standard unit ball of the standard dual norm
N ∗ , which is given by N ∗ (y) = supN (x)=1 x · y for all y ∈ X. That is, this function
takes the value 0 in this ball and the value +∞ outside this ball. The epigraph gives
the standard dual norm N ∗ itself. Each norm is continuous. So if N, Ñ are norms
on X, then Ñ = N ∗ iff N = Ñ ∗ ; then one says that the norms N and Ñ are dual to
each other. One gets four binary operations on norms, the restrictions of the binary
operations on convex functions +, ⊕, ∨, co∧. The binary operations + and ⊕ are
dual to each other, as are the binary operations ∨ and co∧.
The aim of this section is to give an example that illustrates the power of the dual
description of convex functions.
Example 6.9.1 (Proving the Convexity of a Function by Means of Its Conjugate
Function) We consider the problem to verify that the function
$$f(x) = \ln(e^{x_1} + \cdots + e^{x_n})$$
is convex. First we compute, without worrying about rigor, its conjugate function
f*(y) = sup_x (x · y − f(x)). The stationarity conditions of this maximization problem,
y_i = e^{x_i}/(e^{x_1} + ⋯ + e^{x_n}) for 1 ≤ i ≤ n, can only be satisfied if
$$\sum_{i=1}^n y_i = 1,\quad y > 0.$$
For each such y, the following choice of x is a solution of the stationarity conditions:
$$x_i = \ln y_i,\quad 1 \le i \le n.$$
Substitution gives the value
$$\sum_{i=1}^n y_i \ln y_i.$$
So we have computed f*. It is the function g(y) that takes the value $\sum_{i=1}^n y_i \ln y_i$
if $\sum_{i=1}^n y_i = 1$, y ≥ 0 (with, for simplicity of notation, the convention 0 ln 0 = 0—
note here that lim_{t↓0} t ln t = 0) and that takes the value +∞ otherwise. If we would
take the conjugate function of g we get f back, by the duality theorem.
After this ‘secret and private preparation’, we can start the official proof that the
function f is convex. We define out of the blue the following function g : Rn → R:
$$g(y) = \sum_{i=1}^n y_i \ln y_i \quad \text{if } \sum_{i=1}^n y_i = 1,\ y > 0,$$
and which takes the value +∞ otherwise. It is our good right to consider any
function we like. There is no need to explain in a proof how we have come up
with our ideas to make the proof work. Now one sees straightaway that g is convex.
Just calculate the second derivative of η ↦ η ln η—it is η⁻¹ and so it is positive if
η > 0. Hence η ln η is convex and so g is convex.
Now we compute the conjugate function of g: for each x ∈ Rn we have by
definition that g ∗ (x) is the optimal value of the problem to maximize the function
y → x · y − g(y) subject to the constraints
$$\sum_{i=1}^n y_i = 1,\quad y \ge 0.$$
Writing down the stationary conditions of this problem and solving them gives
readily that the value of this problem is f (x). It follows that g ∗ = f . Hence f is a
convex function, as each conjugate function is convex.
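The computation can be confirmed numerically: the maximizer of x · y − g(y) over the simplex is y_i = e^{x_i}/∑_j e^{x_j}, and substituting it reproduces f(x). A small Python sketch (an illustrative addition, not from the text; the test point x is arbitrary):

```python
# Illustrative sketch (not from the book): confirm that g* equals log-sum-exp.
import numpy as np

def logsumexp(x):
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))     # stable ln(e^x1 + ... + e^xn)

def g_star(x):
    y = np.exp(x - logsumexp(x))                 # softmax: maximizer of x.y - g(y)
    return float(np.dot(x, y) - np.sum(y * np.log(y)))

x = np.array([0.3, -1.2, 2.0])
print(g_star(x), logsumexp(x))                   # the two values agree: g* = f
```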
Conclusion To prove that the central function from geometric programming is
convex, one can derive non-rigorously a formula for its conjugate function. This
formula is easily seen to define a convex function. Therefore, its conjugate function
is convex. It can be shown readily in a rigorous way that this convex function is the
original given function.
6.10 Exercises
These inequalities show that the function ln(∑_i e^{x_i}) is a good smooth approxi-
mation of the useful but non-differentiable function max_i{x_i}.
2. *
(a) Show that the condition that a line y = ax − b lies entirely under the graph
of a given convex function f : ℝ → ℝ can be written as follows:
$$ax - f(x) \le b \quad \forall x \in \mathbb{R}.$$
(c) Conclude that the pairs (a, b) form the epigraph of the function a →
f ∗ (a) = supx (ax − f (x)).
(d) Show that the conjugate function f ∗ is convex, as it is written as the
supremum of convex functions lx : a → ax − f (x) where x runs over
R.
3. *Check that the conjugate function f ∗ of a proper closed convex function f
is again a proper closed convex function on X. Show that its effective domain
consists of the slopes a of all affine functions a · x − b that minorize f .
4. *Compute the conjugates of the following convex functions of one variable:
(a) ax² + bx + c (for a ≥ 0),
(b) eˣ,
(c) −ln x, x > 0,
(d) x ln x, x > 0,
(e) xʳ, x ≥ 0 (for r ≥ 1),
(f) −xʳ, x > 0 (for 0 ≤ r ≤ 1),
(g) xʳ, x > 0 (for r ≤ 0),
(h) |x| + |x − a|,
(i) ||x| − 1|,
(j) √(1 + x²),
(k) −ln(1 − eˣ),
(l) ln(1 + eˣ).
5. Compute the conjugate of a convex quadratic function of n variables, that is for
f(x) = xᵀCx + cᵀx + γ where C is a positive semidefinite n × n-matrix,
c ∈ ℝⁿ and γ ∈ ℝ.
6. *Calculate the conjugate function of the convex function f : Rn → R : x →
max{x1 , . . . , xn }.
7. Show that ‖·‖* is a norm: directly from the defining formula and from the
definition by the homogenization method.
f ≤ g ⇒ f ∗ ≥ g∗.
26. Prove the following expression for the subdifferential of a proper convex
function f : X = ℝⁿ → ℝ at a point x of its effective domain in terms of the
directional derivatives of f at x:
$$\sup\{d \cdot v \mid d \in \partial f(x)\} = f'(x; v) \quad \forall v \in X.$$
27. *Let X = ℝⁿ, let f be a proper convex function on X, let x̄ ∈ dom(f) and let
η ∈ X. Write A = epi(f) and v = (x̄, f(x̄)). Show that the closed halfspace
in X × ℝ having v on its bounding hyperplane and for which the vector (η, −1)
is an out of the halfspace pointing normal—that is, the halfspace η · (x − x̄) −
(ρ − f(x̄)) ≤ 0—supports A at v iff η ∈ ∂f(x̄).
28. Derive the theorems of Moreau-Rockafellar and Dubovitskii-Milyutin from
the corresponding rules for sublinear functions: the first two rules from
Proposition 6.6.10.
29. Give the rule for ∂(f ∨ g)(x) in the case f(x) ≠ g(x).
30. *Compute the subdifferential of the Euclidean norm on the plane—‖·‖ : ℝ² →
[0, ∞) given by the recipe x = (x₁, x₂) ↦ (x₁² + x₂²)^{1/2}—at each point.
31. Find the subdifferential ∂f (x, y) of the function
f(x, y) = max{x, 2y} + 2√(x² + y²)
at the points (x, y) = (3, 2); (0, −1); (2, 1); (0, 0).
32. Find the subdifferential ∂f (x, y) for the function
f(x, y) = max{2x, y} + √(x² + y²)
40. Prove all results mentioned in Sect. 6.8, and determine which choice (++), (+
−), (−+), (−−) gives rise to which binary operation on norms +, ⊕, ∨, co∧.
41. *Formulate all results on norms N on X = Rn that are mentioned in Sect. 6.8,
in terms of their unit balls {x ∈ X | N(x) ≤ 1}. In particular, determine how the
binary operations on norms correspond to the binary operations on unit balls of
norms.
42. *Write down the stationarity conditions of the optimization problem y ↦ x ·
y − g(y), where g(y) = $\sum_{i=1}^n y_i \ln y_i$ for all y > 0, subject to the constraints
$$\sum_{i=1}^n y_i = 1,\quad y > 0,$$
and solve them. Compute the optimal value of this problem as a function of x.
Chapter 7
Convex Problems: The Main Questions
Abstract
• Why. Convex optimization includes linear programming, quadratic program-
ming, semidefinite programming, least squares problems and shortest distance
problems. It is necessary to have theoretical tools to solve these problems.
Finding optimal solutions exactly or by means of a law that characterizes them,
is possible for a small minority of problems, but this minority contains very
interesting problems. Therefore, most problems have to be solved numerically,
by an algorithm. An analysis of the performance of algorithms for convex
optimization requires techniques that are different from those presented in this
book; therefore such an analysis falls outside the scope of this book.
• What. We give the answers to the main questions of convex optimization (except
the last one), and in this chapter these answers are compared to those for smooth
optimization:
1. What are the conditions for optimality? A convex function f assumes its
infimum at a point x̂ if and only if 0 ∈ ∂f(x̂)—Fermat's theorem in the convex
case. Now assume, moreover, that a convex perturbation has been chosen: a
convex function of two vector variables F(x, y) for which f(x) = F(x, 0) ∀x;
here y ∈ Y = ℝᵐ represents changes in some data of the problem to minimize
f (such as prices or budgets). Then f assumes its infimum at x̂ if and only
if f(x̂) = L(x̂, η, η₀) and 0 ∈ ∂L(x̂, η, η₀) for some selection of multipliers
(η, η₀) (provided the bad case η₀ = 0 is excluded)—Lagrange's multiplier
method in the convex case. Here L is the Lagrange function of the perturbation
function F , which will be defined in this chapter. The conditions of optimality
play the central role in the complete analysis of optimization problems. The
complete analysis can always be done in one and the same systematic four
step method. This is illustrated for some examples of optimization problems.
2. How to establish in advance the existence of optimal solutions? A suitable
compactness assumption implies that there exists at least one optimal solution.
3. How to establish in advance the uniqueness of the optimal solution? A suitable
strict convexity assumption implies that there is at most one solution.
4. What is the sensitivity of the optimal value S(y) for small changes y in the
data of the problem? The multiplier η is an element of the subdifferential
∂S(0) provided η0 = 1, so then it is a measure for the sensitivity.
5. Can one find in a guaranteed efficient way approximations for the optimal
solution, by means of an algorithm? This is possible for almost all convex
problems.
Road Map
1. Section 7.1 (convex optimization problem, types: LP, QP, SDP, least squares,
shortest distance).
2. Theorems 7.2.1–7.2.3, Definition 7.2.5, Theorems 7.2.6, 7.2.7, Example 7.2.3
(existence and uniqueness for convex problems).
3. Section 7.3 (definitions leading up to the concept of smooth local minimum).
4. Theorem 7.4.1, Example 7.4.4 (Fermat’s theorem (smooth case)).
5. Figure 7.2, Proposition 7.5.1 (no need for local minima in convex optimization).
6. Figure 7.3, Theorem 7.6.1, Example 7.6.5 (Fermat’s theorem (convex case)).
7. Section 7.7 (perturbation of a problem).
8. Theorem 7.8.1, Example 7.8.3 (Lagrange multipliers (smooth case)).
9. Figure 7.7, Proposition 7.9.1, Definitions 7.9.2–7.9.4, Theorem 7.9.5, Defini-
tion 8.2.2 (Lagrange multipliers (convex case)).
10. Figure 7.8 (generalized optimal solutions always exist).
11. Section 7.11 (list advantages convex optimization).
min a · x, x ∈ L, x ∈ C,
• least squares problems (LS): this is the problem to minimize the length of the
error vector Ax − b where A is a given m × n-matrix of rank m, b is a given
column vector in Rm and x runs over X = Rn .
• shortest distance problems: this is the problem to find the point on a given convex
set A ⊆ X that has the shortest distance to a given point p ∈ X.
We will emphasize throughout this chapter the parallel between convex and
smooth optimization. With the latter we mean, roughly speaking, optimization
problems that can have equality constraints and where all functions that describe
the problem are continuously differentiable—that is, all their partial derivatives
exist and are continuous. A precise definition of smooth optimization will be given
later, when we need it. The main questions are the same for smooth problems as for
convex problems, but the answers are much more satisfying for convex problems.
7.2.1 Existence
7.2.2 Uniqueness
We need a definition.
Definition 7.2.5 A convex set A ⊆ X is strictly convex if its relative boundary
contains no closed line segment of positive length.
Now we are ready to give the promised second uniqueness result.
Theorem 7.2.6 There is at most one optimal solution for the problem to minimize
a linear function l on a strictly convex set A that is not constant on this set.
Proof The set of optimal solutions is a convex set that is a subset of the relative
boundary of A. Therefore, it does not contain any closed line segment of positive
length, as A is strictly convex. It follows that it is either empty or a singleton.
Here is a third class of convex optimization problems for which one has
uniqueness.
Theorem 7.2.7 The problem to find a point on a closed convex set A ⊆ X having
shortest distance to a given point p ∈ X has a unique optimal solution.
There are no uniqueness results for global optima of non-convex optimization
problems.
Here is an illustration of the use of these existence and uniqueness results for convex
optimization.
Example 7.2.8 (The Waist of Three Lines in Space) Once, Dr. John Tyrrell, lecturer
at King’s College London, challenged us PhD students to crack an unsolved
problem: to give a proof for the following phenomenon. If an elastic band is
stretched around three iron wires in space that are mutually disjoint and at least
two of which are not parallel, then the band will always slide to an equilibrium
position, and this position does not depend on how you stretch initially the elastic
band around the wires. In other words, the three lines have a unique waist.
We all did a lot of thinking and calculating but all was in vain. Only many years
later, when I gave a course on convex analysis, did I realize how simple the solution
is. It does not require any calculation.
The problem can be formalized as follows: show that for three given mutually
disjoint lines l₁, l₂, l₃ in space ℝ³—at least two of them should not be parallel—the
problem to minimize the function
$$f(p_1, p_2, p_3) = |p_1 - p_2| + |p_2 - p_3| + |p_3 - p_1|,$$
where the point pᵢ runs over the line lᵢ for i = 1, 2, 3, possesses a unique optimal
solution. This problem is illustrated in Fig. 7.1.
[Fig. 7.1: an elastic band stretched around three mutually disjoint lines l₁, l₂, l₃, touching them at points p₁, p₂, p₃]
This is a convex optimization problem as all three terms of f are convex.
This gives already some information: the function f cannot have any stationary
points (points where all partial derivatives are equal to zero) that are not a
global minimum. In particular, there can be no local minimum that is not a global
minimum.
Existence of an optimal solution follows from Theorem 7.2.2: indeed, choose
qᵢ, rᵢ ∈ lᵢ for i = 1, 2, 3, such that qⱼ ≠ rⱼ for all j, and write
pᵢ(t) = (1 − t)qᵢ + trᵢ for all t ∈ ℝ₊. Then it follows that the three terms of
f(p₁(t), p₂(t), p₃(t)) = |p₁(t) − p₂(t)| + |p₂(t) − p₃(t)| + |p₃(t) − p₁(t)| are
the Euclidean norm of a vector function of the form a + tb and for at least one of the
three b is not the zero vector. This implies that f(p₁(t), p₂(t), p₃(t)) is for large t
nondecreasing, as required.
Uniqueness of an optimal solution follows from Theorem 7.2.3: indeed, the
function f is strictly convex. We check this now. As all three terms of f are convex,
it suffices to show that at least one of them is strictly convex. Let pi (t) = qi + tri
with qi , ri ∈ R3 be a parametric description of line li for i = 1, 2, 3.
Consider the differences
$$p_1(t) - p_2(t),\quad p_2(t) - p_3(t),\quad p_3(t) - p_1(t).$$
None of these three functions takes the value zero as the lines are mutually disjoint
and not all three can be constant, as the lines l1 , l2 , l3 are not all parallel. We assume
that p1 (t) − p2 (t) is not constant, as we may without restricting the generality of the
argument. It follows readily that the term |p1 (t) − p2 (t)| is strictly convex. Hence
the function f is strictly convex.
This settles the ‘waist of three lines in space’ problem.
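The waist can also be computed numerically, which illustrates the uniqueness statement. In the Python sketch below (an illustrative addition; the three lines are example data of ours, chosen to be mutually disjoint and not all parallel, and scipy is assumed available), several starting points lead to the same minimizer.

```python
# Illustrative sketch (not from the book): compute the waist of three lines
# p_i(t_i) = q_i + t_i r_i by minimizing the band length over (t1, t2, t3).
import numpy as np
from scipy.optimize import minimize

q = [np.array([0., 0., 0.]), np.array([1., 0., 1.]), np.array([0., 2., -1.])]
r = [np.array([1., 0., 0.]), np.array([0., 1., 0.]), np.array([1., 1., 1.])]

def band_length(t):
    p = [q[i] + t[i] * r[i] for i in range(3)]
    return (np.linalg.norm(p[0] - p[1]) + np.linalg.norm(p[1] - p[2])
            + np.linalg.norm(p[2] - p[0]))

# Different starting points converge to the same waist: uniqueness in action.
for t0 in [np.zeros(3), np.array([5., -3., 2.]), np.array([-4., 4., -4.])]:
    res = minimize(band_length, t0)
    print(np.round(res.x, 4), round(res.fun, 6))
```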
$$\big(G(x + h) - G(x) - G'(x)(h)\big)/\|h\| \to 0 \quad \text{for } h \to 0.$$
The following result gives a partial explanation why the concept of continuous
differentiability is convenient.
Proposition 7.3.4 A function G = (g₁, . . . , g_m) : U → Y—where X =
ℝⁿ, Y = ℝᵐ, U an open subset of ℝⁿ—is continuously differentiable iff all partial
derivatives ∂gᵢ/∂xⱼ exist and are continuous. Then one has the following formula for the
derivative of G:
$$G'(x) = \Big(\frac{\partial g_i}{\partial x_j}(x)\Big)_{ij}.$$
$$f(x) \ge f(\hat x) \quad \forall x \in V.$$
point x̂, is the kernel of the derivative of G at x̂, {h ∈ X | G′(x̂)(h) = 0}:
$$T_{\hat x}(\{x \in X \mid G(x) = 0\}) = \ker G'(\hat x).$$
This result is an equivalent version of two related central results from differential
calculus: the inverse function theorem and the implicit function theorem. Note in
particular that the tangent cone in Theorem 7.3.8 is a subspace. Therefore, the
tangent cone of the solution set of G(x) = 0 at a solution x for which G ( x ) is
surjective is called tangent space instead of tangent cone.
Now we can understand the sense of the definition of smooth local minimum
above. Indeed, if the graph of a function f is described near a point (x̂, f(x̂)) as the
solution set of a C¹-vector function G with G′((x̂, f(x̂))) surjective, then the graph
of f is near the point (x̂, f(x̂)) smooth in the following sense: it looks like an affine
subspace—
$$(\hat x, f(\hat x)) + T_{(\hat x, f(\hat x))} = (\hat x, f(\hat x)) + \ker G'((\hat x, f(\hat x))).$$
$$f'(\hat x) = 0,$$
This result follows immediately from the definitions of the derivative and the
tangent cone. Alternatively, it is a special case of the tangent space theorem given
above: consider the function with recipe (x, z) → G(x) − z.
Now we are ready to give the proof of Theorem 7.4.1 in the promised style.
7.4 Fermat’s Theorem (Smooth Case) 183
Proof of Fermat’s Theorem in the Smooth Case Assume that x is a point of local
minimum of the function f . To prove the theorem, it suffices by Proposition 7.4.2
to prove the following claim:
(X × 0) + T( x )) (f ) ⊂ X × R.
x ,f (
( x )) + tk dk = (
x , f ( x + tk xk , f (
x ) + tk ρk ) ∈ f .
x ) + tk ρk = f (
f ( x + txk ) ≥ f (
x ),
(X × 0) + T( x )) (f ) ⊂ X × R.
x ,f (
It is convenient that for each smooth or convex optimization that can be analyzed
completely, the analysis can always be presented in the same systematic way, which
we call the four step method. This method proceeds as follows.
1. Modeling (and existence and/or uniqueness). The problem is modeled in a
precise way. Moreover, existence and/or uniqueness of the optimal solution are
proved if this is desirable.
2. Optimality conditions. The optimality conditions are written down for the
problem.
3. Analysis. The optimality conditions are analyzed.
4. Conclusion. The optimal solution(s) is/are given.
Now we give the first illustration of this four step method.
Before we are going to give Fermat’s theorem in the convex case, we explain in this
section why the concept local minimum, which plays a role in Fermat’s theorem in
the smooth case, plays no role in convex optimization.
Proposition 7.5.1 Each local minimum of a proper convex function is also a global
minimum.
Figure 7.2 illustrates this result and its proof.
Proof We argue by contradiction. Assume f is a proper convex function on X and
x̂ is a local minimum but not a global minimum. Then we can pick x̄ such that
f(x̄) < f(x̂). Now consider a point P on the segment with endpoints (x̂, f(x̂))
and (x̄, f(x̄)) close to (x̂, f(x̂)). That is,
$$P = (1 - \alpha)(\hat x, f(\hat x)) + \alpha(\bar x, f(\bar x))$$
with α ∈ (0, 1) sufficiently small. What 'sufficiently small' means will become
clear in a moment. On the one hand, P lies in the epigraph of f, and therefore, by
[Fig. 7.2: a local minimum of a proper convex function is a global minimum; the points x̄, x̂_α, x̂ on the horizontal axis]
the local minimality of x̂, we get that for α > 0 sufficiently small, P is so close
to (x̂, f(x̂)) that the last coordinate of P is ≥ f(x̂). On the other hand, this last
coordinate equals (1−α)f(x̂)+αf(x̄), and—by f(x̄) < f(x̂)—this last coordinate
is < (1 − α)f(x̂) + αf(x̂) = f(x̂). So we get the required contradiction.
In order to obtain Fermat's theorem for convex problems, you should replace the
equation f′(x) = 0 by the inclusion 0 ∈ ∂f(x).
Theorem 7.6.1 (Fermat's Theorem (Convex Case)) Let f : X → (−∞, +∞]
be a proper convex function and let x̂ ∈ dom(f). Then x̂ is an optimal solution of
the convex optimization problem min_x f(x) iff
$$0 \in \partial f(\hat x).$$
This theorem is equivalent to the following proper inclusion:
$$(X \times 0) + T_{(\hat x, f(\hat x))}(\mathrm{epi}(f)) \subset X \times \mathbb{R}.$$
Compare this reformulation of Fermat’s theorem in the convex case to the one
given for Fermat’s theorem in the smooth case in Remark 7.4.3. Note that the
epigraph and the graph of a function f : X → (−∞, +∞] are related: the epigraph
is obtained from the graph by adding at each of its points the vertical ray pointing
upward,
$$\mathrm{epi}(f) = \mathrm{graph}(f) + (\{0_X\} \times \mathbb{R}_+).$$
[Fig. 7.3: Fermat's theorem in the convex case, at the minimizer x̂]
In passing, we mention that here we come across another situation where the
tangent cone can be determined easily.
Proposition 7.6.3 The tangent cone to a convex set A ⊆ X at a point ā ∈ A is the
closure of the cone generated by the set A − ā,
$$T_A(\bar a) = \mathrm{cl}[\mathbb{R}_+(A - \bar a)].$$
We emphasize that Fermat's theorem 7.6.1 in the convex case gives a criterion
for global optimality: that is, the condition 0 ∈ ∂f(x̂) holds if and only if x̂
is an optimal solution for the convex optimization problem min_x f(x). In other
words, it is a necessary and sufficient condition for global optimality. Therefore, the
ideal of the theory of optimization is reached here, in some sense. In comparison,
Fermat’s theorem 7.4.1 in the smooth case gives only a necessary condition for local
optimality.
Now we give some examples.
Example 7.6.4 (Minimization of a Complicated Looking Function) Solve the prob-
lem to minimize
$$f(x, y) = x^2 + 2y^2 - 3x - y + 2\sqrt{x^2 - 4x + y^2 + 4}.$$
1. Modeling and convexity. $f(x, y) = (x\ y)\begin{pmatrix}1 & 0\\ 0 & 2\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix} - 3x - y + 2|(x, y) - (2, 0)| \to \min$,
(x, y) ∈ (ℝ²)ᵀ. The function f(x, y) is strictly convex, as its building pattern
reveals: it is the sum of the strictly convex function $(x\ y)\begin{pmatrix}1 & 0\\ 0 & 2\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix}$
and the two convex functions −3x − y and 2|(x, y) − (2, 0)|.
2. Criterion. Convex Fermat: 0 ∈ ∂f(x, y). There is a unique point of non-
differentiability, (2, 0); in this point, f has subdifferential the sum of the
derivative (x² + 2y² − 3x − y)′ = (2x − 3, 4y − 1) taken in the point (2, 0)—that
is, (1, −1)—and the unit disk centered at the origin scaled by the factor 2. That
is, we have to take the disk with center the origin and radius 2 and translate it
over the vector (1, −1). This gives the disk with center at (1, −1) and radius 2.
3. Analysis. Let us try the point of non differentiability to begin with. It is a special
point. Its subdifferential is a relatively large set so we have a fair chance that it
contains the origin. All other points have a subdifferential consisting of one point
only. It would be unlikely that at a randomly chosen such point the derivative is
zero. Trying to find a point of differentiability that is a solution requires therefore
the solution of the stationarity equations. These are two complicated equations
in two unknowns. Let us hope that we are lucky with the investigation of the
point of non-differentiability. Then we do not have to solve this system. Well, the
criterion for optimality of the point of non-differentiability (2, 0) is: the origin
has to lie inside the disk with center (1, −1) and radius 2. This is illustrated in
7.6 Fermat’s Theorem (Convex Case) 187
−1
Fig. 7.4. This picture shows that this is√indeed the case. To be more precise, the
distance between (1, −1) and (0, 0) is 2, which is indeed smaller than 2.
4. Conclusion. The problem has a unique solution (2, 0).
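The analysis in step 3 can be confirmed numerically. The sketch below (an illustrative Python addition, not from the text) checks the membership criterion |(1, −1)| ≤ 2 and samples f around (2, 0).

```python
# Illustrative sketch (not from the book): confirm 0 in the subdifferential at
# the kink (2, 0) and sample f around that point.
import numpy as np

grad_smooth = np.array([2*2 - 3, 4*0 - 1])     # gradient of the smooth part at (2, 0)
print(np.linalg.norm(grad_smooth) <= 2.0)      # True: 0 lies in the translated disk

f = lambda x, y: x**2 + 2*y**2 - 3*x - y + 2*np.hypot(x - 2, y)
rng = np.random.default_rng(0)
pts = (2, 0) + rng.normal(scale=2.0, size=(10000, 2))
print(all(f(2.0, 0.0) <= f(px, py) + 1e-12 for px, py in pts))  # True: minimum
```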
In the next example, both calculus rules for subdifferentials—Moreau-
Rockafellar and Dubovitskii-Milyutin—are used.
Example 7.6.5 (Application of Moreau-Rockafellar and Dubovitskii-Milyutin)
Minimize the function
$$\sqrt{x^2 + y^2} + |x| + \tfrac{3}{2}x + y^4 + \tfrac{1}{2}y.$$
1. Modeling and existence. min f(x, y) = √(x² + y²) + |x| + (3/2)x + y⁴ + (1/2)y,
(x, y) ∈ ℝ². This problem is convex.
2. Convex Fermat: 0 ∈ ∂f (x, y).
3. Analysis. Again we try the point of nondifferentiability (x, y) = (0, 0). We
have
• ∂(√(x² + y²))(0, 0) = B(0, 1), the disk with radius 1 and center at the origin;
• ∂((3/2)x + y⁴ + (1/2)y)(0, 0) = {((3/2)x + y⁴ + (1/2)y)′(0, 0)} = {(3/2, 1/2)};
• ∂|x|(0, 0) = ∂ max(−x, x)(0, 0), and this is by Dubovitskii-Milyutin equal to
{(−1, 0)} co∪ {(1, 0)}, the closed line segment on the x-axis with endpoints
(−1, 0) and (1, 0).
This gives, by Moreau-Rockafellar, that ∂f(0, 0) is the Minkowski sum of
B(0, 1), the interval [−1, 1] on the x-axis, and the singleton {(3/2, 1/2)}. Then you
have the pleasure of drawing a picture of this set. This reveals that ∂f(0, 0) is
an upright square with sides 2 and with center the point (3/2, 1/2), to which on both
vertical sides halfdisks have been attached. This set contains the origin.
4. Conclusion. The point (0, 0) is an optimal solution.
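Again the conclusion can be confirmed numerically: the Minkowski sum contains the origin iff some s ∈ [−1, 1] brings the point (−3/2 − s, −1/2) into the unit disk. A one-line scan in Python (an illustrative addition):

```python
# Illustrative sketch (not from the book): check 0 in the Minkowski sum
# B(0,1) + [-1,1] x {0} + {(3/2, 1/2)} by scanning the segment parameter s.
import numpy as np

ss = np.linspace(-1.0, 1.0, 2001)
dists = np.hypot(-1.5 - ss, -0.5)
print(dists.min() <= 1.0)   # True: s = -1 gives distance ~0.707, so 0 in the sum
```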
These were numerical examples. Now we give a convincing example of an
application to the facility location problem of Fermat-Weber.
Example 7.6.6 (The Facility Location Problem of Fermat-Weber) A facility has to
be placed for which the average distance to three given points ai , i = 1, 2, 3—that
[Fig. 7.5: the Torricelli point: the three segments to a₁, a₂, a₃ make angles of 120° with each other]
do not lie on one straight line—is minimal. This is also called the Fermat-Torricelli-
Steiner problem. The usual textbook solution is by the theorem of Fermat. The
cases that a Torricelli point does not exist are often omitted or require an awkward
additional analysis. Below we use the convex Fermat theorem, and this gives an
efficient treatment that includes all cases.
1. Modeling and existence: f(x) = (1/3)∑ᵢ₌₁³ |x − aᵢ| → min, x ∈ ℝ²; f(x) → ∞
for |x| → ∞, so by Weierstrass an optimal solution x̂ exists. The
function f is strictly convex so the optimal solution is unique.
2. Convex Fermat: 0 ∈ ∂f(x̂); this subdifferential equals (1/3)∑ᵢ₌₁³ (x̂ − aᵢ)/|x̂ − aᵢ| if x̂ ≠
a₁, a₂, a₃ and it equals {u | |u| ≤ 1} + (aᵢ − aⱼ)/|aᵢ − aⱼ| + (aᵢ − aₖ)/|aᵢ − aₖ| for x̂ = aᵢ, where we
write j, k for the two other indices.
product space. Note that we allow f to take the value +∞. Consider the problem
$$\min f(x),\ x \in X. \qquad (P)$$
Embed it in a family of problems
$$\min f_y(x),\ x \in X, \qquad (P_y)$$
where the index y runs over a finite dimensional inner product space Y, and where
f_y is a function X → (−∞, +∞] for all y ∈ Y, and where f₀ = f. The vector y
represents changes in some of the data of problem (P ). This family of optimization
problems can be described efficiently by just one object, by the function F : X ×
Y → (−∞, +∞] defined by F (x, y) = fy (x) for all x ∈ X, y ∈ Y . The function
F is called the perturbation function and (Py ) is called a perturbed problem.
Here are two examples where perturbations arise naturally.
Example 7.7.1 (Perturbation)
1. Suppose you want to choose a location on a road that is close to a given number
of locations; to be precise the sum of the squares of the distances from this point
to the given points has to be minimized. Consider the numerical example that
there are three given locations, given by points on the line R, two of which are
precisely known, 3 and 5, and one is approximately known, y ≈ 0. Then you
choose X = Y = R and F (x, y) = (x − y)2 + (x − 3)2 + (x − 5)2 for all
x, y ∈ R.
2. Suppose that you want to find the point in the plane R2 that satisfies the inequality
x1 + 3x2 ≤ 1 and that is as close as possible to the point (1, 2). We are interested
in the sensitivity of the solution to a change of the right hand side of the inequality, which
might represent a budget. Then you choose X = R2 , Y = R and F (x, y) =
(x1 − 1)2 + (x2 − 2)2 if x1 + 3x2 − 1 + y ≤ 0 and F (x, y) = +∞ otherwise.
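For the first perturbation example, the optimal value function S can be computed in closed form, and one can observe directly that S is convex in y (this is proved in general in Sect. 7.9). A small Python sketch (an illustrative addition, not from the text):

```python
# Illustrative sketch (not from the book): the optimal value function S(y) for
# F(x, y) = (x - y)^2 + (x - 3)^2 + (x - 5)^2. Stationarity in x gives the
# minimizer x*(y) = (y + 8) / 3 (the mean of y, 3 and 5).
import numpy as np

def S(y):
    xstar = (y + 8.0) / 3.0
    return (xstar - y)**2 + (xstar - 3.0)**2 + (xstar - 5.0)**2

ys = np.linspace(-2, 2, 9)
print(np.array([S(y) for y in ys]).round(3))
# Midpoint convexity on a sample: S((a+b)/2) <= (S(a)+S(b))/2.
a, b = -2.0, 2.0
print(S((a + b) / 2) <= (S(a) + S(b)) / 2)     # True
```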
the derivative of the Lagrange function with respect to x at x̂ is zero,
$$L'(\hat x; \lambda) = \lambda_0 f_0'(\hat x) + \cdots + \lambda_m f_m'(\hat x) = 0,$$
$$(X \times 0_Y \times 0) + T_{(\hat x, 0, f(\hat x))}(\mathrm{graph}\,F) \subset X \times Y \times \mathbb{R}.$$
Indeed, graph F can be seen as the solution set of the C¹-function with recipe x ↦
(z, y) = (f₀(x), −f₁(x), . . . , −f_m(x)), so by Proposition 7.4.2, $T_{(\hat x, 0, f(\hat x))}(\mathrm{graph}\,F)$
can be seen as the graph of the function with recipe
$$x \mapsto (z, y) = (f_0'(\hat x)x, -f_1'(\hat x)x, \ldots, -f_m'(\hat x)x).$$
The claim implies that this tangent space lies in a hyperplane with equation η · y −
η₀z = 0 for suitable η = (λ₁, . . . , λ_m) ∈ Y, η₀ = λ₀ ∈ ℝ, not both zero. This gives
the required Lagrange equations.
To prove the proper inclusion above, it suffices to show that the intersection of the
tangent space $T_{(\hat x, 0, f(\hat x))}(\mathrm{graph}\,F)$ and the subspace {y = 0} is contained in X × Y × 0.
Indeed, this implies that the intersection of $(X \times 0_Y \times 0) + T_{(\hat x, 0, f(\hat x))}(\mathrm{graph}\,F)$ and the
subspace {y = 0} is contained in X × 0_Y × 0; therefore $(X \times 0_Y \times 0) + T_{(\hat x, 0, f(\hat x))}(\mathrm{graph}\,F)$
cannot be the entire space X × Y × ℝ.
To begin with, the tangent space to the intersection F ∩ {y = 0} at the point
x ) is equal to the intersection of the tangent spaces to F and {y = 0} at
x , 0, f (
(
this point. We omit the easy verification. So, as f (x) = F (x, 0) for all x, it remains
to show that the tangent space to f at the point ( x )) is contained in X × 0.
x , f (
This follows immediately from the local minimality of f at x.
Remark 7.8.2 This sketch shows that Lagrange’s multiplier method in the smooth
case is equivalent to the proper inclusion
(X × 0Y × 0) + T( x )) (F ) ⊂ X × Y × R.
x ,0,f (
∂L/∂x1 = λ0(2x1 + 12x2) + λ1(8x1) = 0,
∂L/∂x2 = λ0(12x1 + 4x2) + λ1(2x2) = 0,
as x1 cannot be equal to zero (this would imply that x2 = 0, and this contradicts the constraint). This gives x2 = (8/3)x1 or x2 = −(3/2)x1. In the first (resp. second) case, substitution in the constraint gives the points ±(3/2, 4) (resp. ±(2, −3)), and
f0(3/2, 4) = f0(−3/2, −4) = 106¼.
4. Conclusion. (2, −3) and (−2, 3) are global minima and (3/2, 4) and (−3/2, −4) are global maxima.
7.9 Lagrange Multipliers (Convex Case)
Now we give the method of Lagrange multipliers for convex problems. The setup is the same as in Sect. 7.7. We write S(y) for the optimal value of the problem (Py), for all y ∈ Y. Thus we get a function S : Y → R, called the optimal value function.
Proposition 7.9.1 If the perturbation function F is convex, then the optimal value
function S is convex.
The proof of this result shows how convenient the concept of strict epigraph can
be (proving the result using the epigraph is not convenient).
Proof By definition, epi_s(S), the strict epigraph of S, is the image of epi_s(F), the strict epigraph of F, under the projection X × Y × R → Y × R given by the recipe (x, y, z) → (y, z). As F is a convex function, epi_s(F) is a convex set. As convexity of sets is preserved under taking the image under a linear function, we get that epi_s(S) is a convex set, and so S is a convex function.
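As a small numerical illustration of this proposition, the following sketch approximates the optimal value function S of the perturbation F(x, y) = (x − y)² + (x − 3)² + (x − 5)² from Example 7.7.1 on a grid and tests midpoint convexity; the grid and the sample points are arbitrary choices, and the code is only a sketch.

```python
# Minimal numerical illustration of Proposition 7.9.1 with the
# perturbation function of Example 7.7.1 (a sketch, not the book's code).
import numpy as np

def F(x, y):
    return (x - y)**2 + (x - 3)**2 + (x - 5)**2

GRID = np.linspace(-10.0, 20.0, 30001)   # fine grid for the inner minimization

def S(y):
    # optimal value function: minimize F(., y) over the grid
    return F(GRID, y).min()

rng = np.random.default_rng(0)
for _ in range(5):
    y1, y2 = rng.uniform(-5.0, 10.0, size=2)
    # midpoint convexity, up to the grid's discretization error
    assert S((y1 + y2) / 2) <= (S(y1) + S(y2)) / 2 + 1e-4
print("midpoint convexity of S confirmed on sample pairs")
```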
Figure 7.6 illustrates the shadow price interpretation of a multiplier η. By the proposition above, if S(0) is finite there exists a closed halfspace in Y × R that supports the epigraph of S at the point (0, S(0)). Let (η, η0) be an outward normal of this halfspace. If this halfspace is not vertical, that is, if η0 ≠ 0, then we can scale (η, η0) so that η0 = −1; then S(y) ≥ S(0) + η · y for all y ∈ Y, that is, η is a subgradient of S at 0: a shadow price.
3. If the Slater condition holds, then in some neighborhood of 0Y the optimal value function does not take the value +∞. Hence, a separating halfspace to epi(S) at the point (0, S(0)) cannot be vertical.
Remark 7.9.6 The Lagrange multiplier method in the convex case is equivalent to the following proper inclusion:
(X × 0Y × 0) + T_(x̂,0,f(x̂))(epi(F)) ⊂ X × Y × R.
Compare this reformulation for Lagrange’s multiplier method in the convex case
to the reformulations for Fermat’s theorem in the smooth and convex case and
the reformulation for Lagrange’s multiplier method in the smooth case, given in
Remarks 7.4.3, 7.6.2, and 7.8.2 respectively.
At first sight, it might not be clear how the method of Lagrange for the
convex case can be used to solve optimization problems: the Lagrange condition
is formulated in terms of the optimal value function S, which is a priori unknown.
However, if we replace S by its definition in terms of the function F , we get a
repeated minimization: first over x ∈ X and then over y ∈ Y . Changing the
order of this repeated minimization brings the Lagrange condition in a form that is
convenient to use. Now we give the outcome of this reformulation of the Lagrange condition. We need a definition and a notation for the minimization over y ∈ Y in the reversed order.
Definition 7.9.7 The Lagrange function L : X × ((Y × [0, +∞)) \ {0_{Y×R}}) → R is given by the recipe
L(x, (η, η0)) = inf_{y∈Y} (η0F(x, y) − η · y).
[Fig. 7.6: the graph z = F(x, y), supported at the point (x̂, 0, F(x̂, 0)) by a tangent plane with slope η.]
this road; to be precise, the sum of the squares of the distances from this point to the given points has to be minimized. Consider the numerical example that there are three given locations, given by points on the line R, two of which are precisely known, 3 and 5, and one is approximately known, 0. This can be modeled as follows:
f(x) = x² + (x − 3)² + (x − 5)² → min, x ∈ R. (P)
This is a convex optimization problem. We embed this problem into the following family of optimization problems:
fy(x) = (x − y)² + (x − 3)² + (x − 5)² → min, x ∈ R. (Py)
Here y models the position of the location that is approximately known. These problems (Py) are all convex optimization problems.
It is straightforward to solve the problem (P ) and to determine the sensitivity of
its optimal values to small changes in the location that is approximately known to
be 0. The solution of (Py ) is seen to be the average location (3 + 5 + y)/3; therefore,
the solution of (P ) is 8/3, and substitution of x = (3 + 5 + y)/3 in fy (x) and then
taking the derivative at y = 0 gives η = S′(0) = −16/3. So now that we are in the
comfortable possession of these answers, we can use this numerical problem to get
familiar with the convex method of Lagrange and all concepts involved.
The family of problems (Py )y can be described by one perturbation function,
F (x, y) = (x − y)2 + (x − 3)2 + (x − 5)2 for all x, y ∈ R. This is a proper closed
convex function. Note that the Slater condition holds: each point x̄ ∈ R is a Slater
point as F(x, y) < +∞ for all x, y ∈ R. The Lagrange function is
L(x, η) = inf_{y∈R} ((x − y)² + (x − 3)² + (x − 5)² − ηy).
The expression that has to be minimized is convex. Carry out this minimization by putting the derivative with respect to y equal to zero: 2(y − x) − η = 0. So y = x + (1/2)η. Substitution gives the following formula for the Lagrange function,
L(x, η) = −(1/4)η² − xη + (x − 3)² + (x − 5)².
This function is convex in x for each η. The convex method of Lagrange multipliers gives that x̂ is a solution of the given problem (P) iff there exists η ∈ R such that f(x̂) = L(x̂, η) and L_x(x̂, η) = 0. Here we have used that, as L(x, η) is convex in x, the condition L(x̂, η) = min_x L(x, η) is equivalent to the condition L_x(x̂, η) = 0. So we have, writing x instead of x̂ for simplicity:
x² + (x − 3)² + (x − 5)² = −(1/4)η² − xη + (x − 3)² + (x − 5)²,
−η + 2(x − 3) + 2(x − 5) = 0.
The second equation gives η = 4x − 16. Substitution in the first equation and simplifying gives 9x² − 48x + 64 = 0, that is, (3x − 8)² = 0, so x = 8/3, and so η = 4 · (8/3) − 16 = −16/3. This is in agreement with the outcomes above.
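This computation can be reproduced mechanically; the following sketch (illustrative only, with ad-hoc names) eliminates y symbolically, forms L(x, η), and solves the two conditions.

```python
# Symbolic check of the convex method of Lagrange for the road-location
# example; an illustrative sketch, not the book's code.
from sympy import symbols, diff, simplify, solve

x, y, eta = symbols('x y eta', real=True)
F = (x - y)**2 + (x - 3)**2 + (x - 5)**2

# minimize F - eta*y over y: set the derivative to zero, 2(y - x) - eta = 0
y_star = solve(diff(F - eta*y, y), y)[0]       # y = x + eta/2
L = simplify((F - eta*y).subs(y, y_star))      # -eta**2/4 - x*eta + (x-3)**2 + (x-5)**2

# convex Lagrange conditions: f(x) = L(x, eta) and dL/dx = 0
f = F.subs(y, 0)
print(solve([simplify(f - L), diff(L, x)], [x, eta], dict=True))
# expected: [{x: 8/3, eta: -16/3}]
```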
7.10 *Generalized Optimal Solutions Always Exist
The aim of this section is to explain by means of examples that for any given convex optimization problem, an optimal solution in a suitable generalized sense always exists.
Example 7.10.1 (Generalized Optimal Solution for a Convex Optimization Problem)
Figure 7.8 illustrates the existence of optimal solutions using the top-view model, for a better insight into the existence question.
[Figs. 7.7 and 7.8: six one-dimensional convex problems and their top-view models; the discussion of cases 1–3 can be read off from the figures.]
4. Here the optimal value is −∞ and f tends to −∞ with less than linear speed. We have assumed wlog that lim_{x→+∞} f(x) = −∞. In the top-view model, you see again that there is a unique generalized optimal solution, represented by the point (1, 0). This point corresponds to the fact that the recession cone of f consists of the ray generated by the vector (1, 0). Note that the curve drawn inside the circle is tangent to the circle at the point (1, 0).
5. Here, f has a downward sloping asymptote for x → +∞. In the top-view model,
you see that the set of generalized solutions is a circle segment: one endpoint is
(1, 0) and the other endpoint corresponds to the slope of the asymptote mentioned
above. Note that the curve drawn inside the circle is not tangent to the circle at the ‘other’ endpoint.
6. Here, f is unbounded below for x → +∞ but it has no asymptote. In the top-
view model, you see that the set of generalized solutions is again a circle segment:
one endpoint is (1, 0) and the other endpoint corresponds to the arrow on the left
hand side, which is a steepest recession vector of the epigraph of f. Note that the curve drawn inside the circle is tangent to the circle at the ‘other’ endpoint.
That there exists an optimal solution in a generalized sense, for each of the six
problems in Fig. 7.7, is no coincidence. It is not hard to define this solution concept
in general, in the same spirit, and then it follows from the compactness of the ball
that there always exists an optimal solution in this generalized sense. All existence
results for ordinary optimal solutions above can be derived from this general result.
Conclusion For each convex optimization problem, there exists a solution in
a natural generalized sense. This gives existence of ordinary solutions under
assumptions on recession vectors.
The aim of this section is to list advantages of working with convex optimization
problems.
1. All local minima are global minima.
2. The optimality conditions are criteria for global minimality: they are sufficient
as well as necessary conditions for optimality. That is, they hold if and only if
the point under investigation is a minimum.
In particular, note the following advantages:
(a) No need to prove existence of an optimal solution if you can find a solution
of the optimality conditions and are satisfied with just one optimal solution.
However, if the analysis of the optimality conditions leads only to equations
which characterize the optimal solutions and not to a closed formula, then it
is essential that you check independently the existence of an optimal solution
to ensure that these equations have a solution.
7.12 Exercises
1. Show that the optimization problem in Nash’s theorem that characterizes the
fair bargain for a bargaining opportunity, Theorem 1.1.1, is equivalent to the
following optimization problem
(1/2)|x + y − 2| + |3x − 2y − 1| + x² − x + y + e^{x−3y} → min
31. For all possible values of the parameters a, b > 0, solve the problem
√(x² + y²) + 2√((x − a)² + (y − b)²) → min
32. Two villages lie on the same side of a straight highway. The first is at a distance of 1 km from the highway; the second is at a distance of 6 km from the highway and of 13 km from the first village. Where should a gasoline station be placed so that the sum of its distances to the two villages and to the highway is minimal?
33. Formulate and prove an analogue to the Fermat-Torricelli-Steiner theorem for
four points in the three dimensional space.
34. For all possible values of the parameter α > 0, solve the problem
max(2x + 1, x + y) + α√(x² + y² − 4x + 4) → min
Try to keep the number of constraints in the new problem small (the smaller the better).
Chapter 8
Optimality Conditions: Reformulations
Abstract
• Why. In applications, many different forms of the conditions for optimality for
convex optimization are used: duality theory, the Karush–Kuhn–Tucker (KKT)
conditions, the minimax and saddle point theorem, Fenchel duality. Therefore, it
is important to know what they are and how they are related.
• What.
– Duality theory. To a convex optimization problem, one can often associate a
concave optimization problem with a completely different variable vector of
optimization, but with the same optimal value. This is called the dual problem.
– KKT. This is the most popular form for the optimality conditions. We also
present a reformulation of KKT in terms of subdifferentials, which is easier to
work with.
– Minimax and saddle point. When two parties optimize against each other,
then this sometimes leads to an equilibrium, where both have no incentive
to change the current situation. This equilibrium can be described, in formal language, by a saddle point, that is, by vectors x̂ ∈ Rⁿ and ŷ ∈ Rᵐ for which, for a suitable function F(x, y), one has that F(x̂, ŷ) equals both min_x max_y F(x, y) and max_y min_x F(x, y).
– Fenchel duality. We do not describe this result in this abstract.
Road Map
• Figure 8.1 and Definition 8.1.2 (the dual problem and its geometric intuition).
• Proposition 8.1.3 (explicit description of the dual problem).
• Figure 8.2, Theorem 8.1.5 (duality theory in one picture).
• Definitions 8.2.1–8.2.4, Theorem 8.2.5, numerical example (convex program-
ming problem, Lagrange function, Slater condition, KKT theorem).
• The idea of Definition 8.2.8 (KKT in the context of a perturbation of a problem).
• Theorem 8.3.1, numerical example (KKT in subdifferential form).
• Section 8.4 (minimax, maximin, saddle point, minimax theorem of von Neu-
mann).
• Section 8.5 (Fenchel duality).
S(0) ≥ ϕ(η) ∀η ∈ Y.
This concludes the explanation that the dual problem is the problem of finding the
best lower bound for the optimal value of (P ) of a certain type.
Now we give a formal definition of the dual problem.
Definition 8.1.2 The dual problem (D) is the problem to maximize the function ϕ : Y → R given by ϕ = −S*. Note that ϕ is concave (that is, minus this function is convex).
Note that the dual problem depends on the chosen family of perturbed problems (Py), y ∈ Y—that is, on F—and not only on the primal problem—that is, not only on f.
[Fig. 8.1: the graph of the dual objective ϕ(η).]
Now comes the crux of the proof. We interchange the two infimum operations. This gives the desired identity.
This result is useful: experience has shown that often one can calculate—for a given function F—the Lagrange function L and then the function ϕ. The point is that, for F(x, y) − η · y, minimizing first with respect to x and then with respect to y is usually more complicated than doing these minimizations in the other order. Therefore you can proceed as follows: choose an element η, and compute ϕ(η) by using Proposition 8.1.3; this is something valuable: an explicit lower bound for the value of the primal problem:
S(0) ≥ ϕ(η).
Example 8.1.4 (Duality Theory in One Picture) Our findings are essentially the full
duality theory. Duality theory can be seen in one simple picture, Fig. 8.2, which
shows that ϕ(η) is maximal if it equals S(0), and then η is a subgradient for S at
0—so it is a measure of sensitivity for the optimal value.
f(x) ≥ ϕ(η).
S(y) ≥ S(0) + η̂ · y ∀y ∈ Y.
Strong duality need not always hold: if (P ) and (D) have a different optimal
value, then one says that there is a duality gap.
Example 8.1.6 (Duality Theory) Consider the problem to minimize x12 + 2x1 x2 +
3x22 under the constraint x1 + 2x2 ≥ 1. We perturb this problem by replacing the
constraint by
x1 + 2x2 ≥ y.
As
f(x̂) = 2/3 = ϕ(η̂),
it follows that
x̂ = (1/3, 1/3)
is a solution of the given problem. Observe that it lies on the boundary of the feasible set: the constraint is binding.
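As a sanity check, a generic solver reproduces these values; the sketch below is illustrative only.

```python
# Numerical sanity check for Example 8.1.6; an illustrative sketch.
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0]**2 + 2*x[0]*x[1] + 3*x[1]**2
cons = ({'type': 'ineq', 'fun': lambda x: x[0] + 2*x[1] - 1},)   # x1 + 2*x2 >= 1
res = minimize(f, x0=np.array([1.0, 1.0]), constraints=cons)
print(res.x, res.fun)    # approx (1/3, 1/3) and 2/3; the constraint is binding
```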
For a coordinate-free treatment of duality theory for LP problems, see Sect. 9.8.
The oldest pair of primal–dual problems was published in 1755 in the journal Ladies Diary ([1]). See Sect. 9.5 for a description of this pair.
There is complete symmetry between primal and dual problem: the dual problem
can be embedded in a family of perturbed problems in a natural way such that if we
take its associated dual problem, we are back to square one: the primal problem.
Proposition 8.1.7 (Duality Scheme) Let F : X × Y → R be a proper closed
convex function. Then one has the following scheme:
8.2 Karush–Kuhn–Tucker Theorem: Traditional Version
min f0(x)
s.t. fi(x) ≤ 0, 1 ≤ i ≤ m.
Note again that we get the ideal situation: a criterion for optimality for a feasible point x̂, as the KKT conditions are necessary and sufficient for optimality.
Example 8.2.6 (KKT Example with One Constraint) Solve the shortest distance
problem for the point (1, 2) and the half-plane x1 + 3x2 ≤ 1.
The easiest way to solve this simple problem is geometrically: if you draw a
figure, then you see that you have to determine the orthogonal projection of the
point (1, 2) on the line x1 + 3x2 = 1 and to compute its distance to the point (1, 2).
The purpose of the analytical solution below is to illustrate the use of the KKT
conditions.
We will see that the analysis leads to the distinction of two cases; working out
these cases is routine but tedious, and the details will not be displayed below. We will
assume that we have a hunch which of the two cases leads to the optimal solution.
If this case leads to a point that satisfies the KKT conditions, then it is optimal,
as the KKT conditions are not only necessary but also sufficient. So then we do
not have to investigate the other case. This possibility of making use of a hunch
is an advantage of the KKT conditions. We emphasize here that the analysis of an
optimization problem with equality constraints by means of the Lagrange equations
leads also often to a distinction of cases, but then all cases have to be analyzed,
even if the first case that is considered leads to a point that satisfies the Lagrange
equations.
1. Modeling and convexity.
2. KKT conditions:
2(x1 − 1) + η = 0,
2(x2 − 2) + 3η = 0,
η(x1 + 3x2 − 1) = 0.
3. Analysis. Suppose you have a hunch that in the optimum the constraint is binding, that is, x̂1 + 3x̂2 − 1 = 0. Then the KKT conditions boil down to:
2(x1 − 1) + η = 0,
2(x2 − 2) + 3η = 0,
x1 + 3x2 − 1 = 0,
η ≥ 0. Now you solve these three linear equations in three unknowns. This gives x1 = 2/5, x2 = 1/5, η = 6/5. Then you check that this solution satisfies the condition η ≥ 0.
4. Conclusion. The given problem has a unique solution x̂ = (2/5, 1/5).
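The three linear equations of the binding case can be solved with any linear-algebra routine; a minimal sketch:

```python
# Solving the binding case of Example 8.2.6 with numpy; an illustrative sketch.
import numpy as np

# unknowns (x1, x2, eta):
#   2(x1 - 1) +   eta = 0
#   2(x2 - 2) + 3 eta = 0
#   x1 + 3 x2         = 1
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 3.0],
              [1.0, 3.0, 0.0]])
b = np.array([2.0, 4.0, 1.0])
x1, x2, eta = np.linalg.solve(A, b)
print(x1, x2, eta)   # 0.4, 0.2, 1.2
assert eta >= 0      # the multiplier sign condition holds: KKT confirmed
```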
Example 8.2.7 (KKT Problem with Two Constraints) Solve the shortest distance
problem for the point (2, 3) and the set of points that have at most distance 2 to the
origin and that have moreover first coordinate at least 1.
Again, the easiest way to solve this problem is geometrically. A precise drawing
shows that one should determine the positive scalar multiple of (2, 3) that has length
2. Then one should take the distance of this point to (2, 3). The purpose of the
analytical solution below is to illustrate the KKT conditions for a problem with
several inequality constraints.
For a problem with m inequality constraints, 2m cases have to be distinguished.
So in this example the number of cases is 4. Again we assume that we have a hunch.
Again, as working out a case is routine but tedious, the details will not be displayed
below.
1. Modeling and convexity.
2. KKT conditions:
∂L/∂x1 = 2(x1 − 2) + 2η1x1 − η2 = 0,
∂L/∂x2 = 2(x2 − 3) + 2η1x2 = 0,
η1, η2 ≥ 0,
η1(x1² + x2² − 4) = 0,
η2(1 − x1) = 0.
3. Analysis. Suppose we have a hunch that in the optimum only the first constraint is binding. We start by considering this case (there are four cases). That is, x1² + x2² = 4 and 1 − x1 < 0. Then the KKT conditions lead, by some tedious arguments, to the solution x1 = 4/√13, x2 = 6/√13 and η1 = (√13 − 2)/2, η2 = 0. By the sufficiency of the KKT conditions, it follows that this is the unique solution.
So we do not have to consider the other three cases; our educated guess turned
out to be right.
4. Conclusion. The problem has a unique solution: x̂ = (4/√13, 6/√13).
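A generic constrained solver confirms this conclusion; the starting point below is an arbitrary feasible guess, and the code is only a sketch.

```python
# Verifying the hunch for Example 8.2.7 numerically; an illustrative sketch.
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 2)**2 + (x[1] - 3)**2
cons = ({'type': 'ineq', 'fun': lambda x: 4 - x[0]**2 - x[1]**2},   # |x| <= 2
        {'type': 'ineq', 'fun': lambda x: x[0] - 1})                # x1 >= 1
res = minimize(f, x0=np.array([1.5, 1.0]), constraints=cons)
print(res.x)                                 # approx (1.109, 1.664)
print(np.array([4.0, 6.0]) / np.sqrt(13))    # the exact solution (4, 6)/sqrt(13)
```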
Now we are going to indicate how Theorem 8.2.5 can be derived from the convex
method of Lagrange multipliers. First, we embed the convex programming problem
in a family of perturbed problems.
Definition 8.2.8 For each y = (y1, ..., ym) ∈ Rᵐ the standard perturbed problem (Py) of a convex programming problem is the optimization problem
min f0(x)
s.t. fi(x) + yi ≤ 0, 1 ≤ i ≤ m.
8.3 KKT in Subdifferential Form
The aim of this section is to reformulate the optimality conditions as the Karush–Kuhn–Tucker (KKT) conditions in subdifferential form.
Vladimir Protasov has discovered a formulation of the KKT conditions in terms
of subdifferentials. This is very convenient in concrete examples. We only display
the main statement.
Theorem 8.3.1 (Karush–Kuhn–Tucker Conditions for a Convex Programming Problem in Subdifferential Form) Assume there exists a Slater point for a given convex programming problem. Let x̂ be a feasible point. Then x̂ is a solution iff
0n ∈ co(∂f0(x̂), ∂fk(x̂), k ∈ K),
where K = {k ∈ {1, ..., m} | fk(x̂) = 0}, the set of constraints that are active at x̂.
To illustrate the use of the subdifferential version of the KKT conditions, we first
consider the two run-of-the-mill examples that were solved in the previous section
by the traditional version of the KKT conditions. We have seen that working with
the traditional version leads to calculations that are routine but tedious. Now we will
see that working with the subdifferential version leads to more pleasant activities:
drawing a picture and figuring out how the insight that the picture provides can be
translated into analytical conditions. Therefore, the subdifferential version of KKT
is an attractive alternative.
Example 8.3.2 (Subdifferential Version of KKT for a Shortest Distance Problem)
1. Modeling and convexity.
y⁴ − y − 4x → min,
Finally, we consider the bargaining example of Alice and Bob that was presented
in Sect. 1.1 and that was solved there. However, this time the problem will not be
solved by ad-hoc arguments, but by the KKT conditions, in subdifferential form.
Example 8.3.5 (Subdifferential KKT for a Bargaining Problem)
One can make an educated guess about the solution: Bob should give his ball,
which has utility 1 to him, to Alice, to whom it has the much higher utility 4; in
return, Alice should give her bat, which has utility 1 to her, to Bob, to whom it has
the higher utility 2; moreover, as Alice profits more from this exchange than Bob,
she should give the box with some probability to Bob; this probability should be >0
and <1.
By Theorem 1.1.1, it suffices to solve the problem to maximize the function
of two variables f (x) = x1 x2 on the set F ⊆ R2 of points x = p1 (−1, 2) +
p2(−2, 2) + p3(4, −1) for which 0 ≤ p1, p2, p3 ≤ 1. The educated guess above means that you have a hunch that in the optimum, p1 = p3 = 1 and 0 < p2 < 1.
1. Modeling and convexity. Write g = −ln f = −ln(−p1 − 2p2 + 4p3) − ln(2p1 + 2p2 − p3).
2. Optimality conditions at p̂ = (1, p2, 1) with 0 < p2 < 1 (p1 and p3 at their upper bounds, p2 interior):
∂g/∂p1 ≤ 0,
∂g/∂p2 = 0,
∂g/∂p3 ≤ 0.
The condition ∂g/∂p2 = 0 reads
2/(−p1 − 2p2 + 4p3) − 2/(2p1 + 2p2 − p3) = 0.
Substitution of p1 = p3 = 1 gives
2/(3 − 2p2) − 2/(1 + 2p2) = 0.
This can be rewritten as 3 − 2p2 = 1 + 2p2, that is, as 4p2 = 2. This has unique solution p2 = 1/2.
It remains to check that p̂ = (1, 1/2, 1) satisfies the first and the third condition above. These conditions are
1/(−p1 − 2p2 + 4p3) − 2/(2p1 + 2p2 − p3) ≤ 0,
−4/(−p1 − 2p2 + 4p3) + 1/(2p1 + 2p2 − p3) ≤ 0.
Now we substitute p̂ = (1, 1/2, 1) in the left hand sides of these inequalities and check that we get a non-positive outcome. For the first one, we get
1/(−1 − 1 + 4) − 2/(2 + 1 − 1) = 1/2 − 1 < 0.
For the third one, we get
−4/(−1 − 1 + 4) + 1/(2 + 1 − 1) = −2 + 1/2 < 0.
4. Conclusion. The hunch that it is optimal for Bob to give the ball to Alice, and for Alice to give the bat to Bob, is correct; moreover, it is then optimal that Alice gives the box to Bob with probability 1/2.
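A numerical cross-check of this conclusion (a sketch only): minimize −ln f over the box [0, 1]³ with a generic solver.

```python
# Numerical cross-check of the bargaining solution p = (1, 1/2, 1);
# an illustrative sketch, not the book's code.
import numpy as np
from scipy.optimize import minimize

def neg_log_f(p):
    x1 = -p[0] - 2*p[1] + 4*p[2]
    x2 = 2*p[0] + 2*p[1] - p[2]
    if x1 <= 0 or x2 <= 0:
        return 1e10                      # outside the region x > 0: heavy penalty
    return -(np.log(x1) + np.log(x2))    # minimize -ln f = -ln(x1 * x2)

res = minimize(neg_log_f, x0=np.array([0.9, 0.4, 0.9]), bounds=[(0.0, 1.0)] * 3)
print(res.x)   # approx (1, 0.5, 1)
```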
There is a relation between the minimax and the maximin that holds in general.
Proposition 8.4.2 (Minimax Inequality) The minimax is not smaller than the maximin:
inf_{x∈X} sup_{η∈Y} L(x, η) ≥ sup_{η∈Y} inf_{x∈X} L(x, η).
In order to formulate the proof, and for later purposes, it is useful to make the
following definitions.
Definition 8.4.3 The following two optimization problems are associated to L. Write
f(x) = sup_{η∈Y} L(x, η) for all x ∈ X
and
ϕ(η) = inf_{x∈X} L(x, η) for all η ∈ Y,
and consider the primal problem min_x f(x) and the dual problem max_η ϕ(η).
Proof of Proposition 8.4.2 One has
inf_{x∈X} sup_{η∈Y} L(x, η) = inf_{x∈X} f(x) ≥ sup_{η∈Y} ϕ(η) = sup_{η∈Y} inf_{x∈X} L(x, η), (∗)
where the middle inequality holds as f(x) ≥ L(x, η) ≥ ϕ(η) for all x ∈ X and η ∈ Y.
L(x, η̂) ≥ L(x̂, η̂) ≥ L(x̂, η)
for all x ∈ X, η ∈ Y.
The following result shows that the concept of saddle point is precisely what is
needed for the present purpose.
f(x̂) = L(x̂, η̂) = ϕ(η̂).
A careful reader might observe that if the function L(x, η) is concave in η, then the inner maximization problem is equivalent to a convex optimization problem, which can be dualized to obtain an inner convex minimization problem over some new decision vector, say λ. After such a dualization, one would have an outer minimization over x and an inner minimization over λ. Because both the inner and the outer problem would then be minimizations, and the resulting function could be jointly convex, they could be considered jointly as a single convex optimization problem. Hence, there seems to be no reason to consider a saddle point problem as a separate entity. Unfortunately, the dualization of the inner problem could increase the dimensionality of the problem significantly, and this would outweigh the benefit of having a min problem instead of a min-max problem. For that reason, in many applications, such as machine learning, one prefers to stay with the min-max form of a given problem and solve it as such, using so-called saddle-point algorithms.
8.6 Exercises
1. Show that the objective function ϕ of the dual problem is concave (that is, minus
this function is convex).
2. Write out the definition ϕ = −S ∗ and conclude that this gives the description
illustrated in Fig. 8.1 and described verbally just below this figure.
3. *Derive Theorem 8.1.5 from Lagrange's principle in the convex case, and show that Theorem 8.1.5 is in fact equivalent to Lagrange's principle in the convex case.
4. Show that the duality theorem is essentially equivalent to Lagrange's principle in the convex case.
5. Illustrate all statements of Theorem 8.1.5 by means of Figs. 8.1 and 8.2.
6. **Check all statements implicit in Proposition 8.1.7. Give a precise formulation
of the statement above that there is complete symmetry between the primal and
dual problem and prove this statement.
7. Write down the KKT conditions for the optimization problem that characterizes the fair bargain for a bargaining opportunity that is given in Nash's theorem 1.1.1, after first having rewritten it as a convex optimization problem.
11. Formulate the Slater condition, the KKT conditions and the KKT theorem for a convex programming problem with inequality and equality constraints:
min f0(x)
s.t. fi(x) ≤ 0, 1 ≤ i ≤ m′, fi(x) = 0, m′ + 1 ≤ i ≤ m,
inf_x ( f(x) − ∑_{i=1}^{m} (−δ_{fi}(x)) ),
where δ_{fi}(x) is the indicator function of the set {x | fi(x) ≤ 0}:
δ_{fi}(x) = 0 if fi(x) ≤ 0, and +∞ otherwise.
24. Solve the shortest distance problem for the point (1, 2) and the half-plane x1 +
3x2 ≤ 1.
25. *Solve the problem to minimize y⁴ − y − 4x subject to the constraints
max(x + 1, e^y) + 2√(x² + y²) ≤ 3y + 2√5
and
√(x² + y² − 4x − 2y + 5) + x² − 2y ≤ 0.
27. Find the minimal value a such that the system of inequalities has a solution:
|x + y − 2| + 2|x − y| + 2^x − 7y ≤ a,
x⁴ + 2y² + √(x² + y²) − 5 ≤ a,
3√(x² + y² − 2y + 1) + e^x − 3y ≤ a.
31. Use Theorem 1.1.1, the reformulation of the problem in this theorem as a
convex problem min(− ln f (x)), x ∈ F, x > v and the KKT conditions to
determine the fair bargain for two persons, Alice (who possesses a bat and a
box) and Bob (who possesses a ball). The bat has utility 1 to Alice and utility
2 to Bob; the box has utility 2 to Alice and utility 2 to Bob; the ball has utility
4 to Alice and utility 1 to Bob. If they do not come to an agreement, then they
both keep what they have.
32. *(Creative problem). Suggest a method of approximate solution of the linear regression model with the uniform metric:
f(a) = max_{k=1,...,N} |⟨a, x^{(k)}⟩| → min, a ∈ Rᵈ, ⟨a, a⟩ = 1.
Reference
Chapter 9
Application to Convex Problems
Abstract
• Why. The aim of this chapter is to teach by means of some additional examples
the craft of making a complete analysis of a convex optimization problem. These
examples will illustrate all theoretical concepts and results in this book. This
phenomenon is in the spirit of the quote by Cervantes. Enjoy watching the frying
of eggs in this chapter and then fry some eggs yourself!
• What. In this chapter, the following problems are solved completely; in brackets
the technique that they illustrate is indicated.
– Least squares (convex Fermat).
– Generalized Fermat-Weber location problem (convex Fermat).
– RAS method (KKT).
– How to take a penalty (minimax and saddle point).
– Ladies Diary problem (duality theory).
– Second welfare theorem (KKT).
– Minkowski’s theorem on an enumeration of convex polytopes (KKT).
– Duality for LP (duality theory).
– Solving LP by taking a limit (the interior point algorithms are based on convex
analysis).
The aim of this section is to present a method that is often used in practice, the least
squares method.
Suppose you have many data points (xi , yi ) with 1 ≤ i ≤ N . Suppose you are
convinced that they should lie on a straight line y = ax + b if not for some small
errors. You wonder what are the best values of a and b to take. This can be viewed
as the problem to find the best approximate solution for the following system of N
equations in the unknowns a, b:
yi = axi + b, 1 ≤ i ≤ N.
More generally, often one wants to find the best approximate solution of a linear
system of m equations in n unknowns Ax = b. Here A is a given m × n-matrix,
b is a given m-column and x is a variable n-column. Usually the columns of A
are linearly independent. Then what is done in practice is the following: the system Ax = b is replaced by the following auxiliary system AᵀAx = Aᵀb. It can be proved that this system has a unique solution. This is taken as the approximate solution of Ax = b.
Now we give the justification of this method. It shows that, in some sense, this
is the best one can do. For each x̄ ∈ Rn , the Euclidean length of the error vector
e = Ax̄ − b is a measure of the quality of x̄ as approximate solution of the system
Ax = b. The smaller this norm, the better the quality.
This suggests to consider the problem to minimize the length of the error vector, f(x) = ‖Ax − b‖. This is equivalent to minimizing its square,
g(x) = ‖Ax − b‖².
This quadratic function is strictly convex and coercive (that is, its value tends to +∞ if ‖x‖ → +∞), as the matrix product AᵀA is positive definite. Therefore it has a unique point of minimum. This is the solution of g′(x) = 0, that is, of AᵀAx = Aᵀb. This justifies that the method described above is best, in some sense.
This method is called the method of the least squares, as by definition ‖Ax − b‖² is a sum of squares and this sum is minimized.
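A minimal numerical sketch of this recipe, with made-up data: fit a line y = ax + b to noisy points via the normal equations and compare with a built-in least squares routine.

```python
# Least squares via the normal equations A^T A x = A^T b; a minimal sketch.
import numpy as np

rng = np.random.default_rng(1)
xs = np.linspace(0.0, 10.0, 50)
ys = 2.0 * xs + 1.0 + rng.normal(scale=0.3, size=xs.size)

A = np.column_stack([xs, np.ones_like(xs)])    # unknowns (a, b) in y = a*x + b
coef = np.linalg.solve(A.T @ A, A.T @ ys)      # the normal equations
print(coef)                                    # approx (2.0, 1.0)
print(np.linalg.lstsq(A, ys, rcond=None)[0])   # agrees with the built-in routine
```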
Now we apply the convex Fermat theorem to a generalization of the famous facility
location problem of Fermat-Weber to find a point that has minimal sum of distances
to the vertices of a given triangle. The generalized problem is to find a point in n-
dimensional space that has minimal sum of distances to a given finite set of points.
This leads to a solution that is much shorter than all known geometric solutions of
this problem.
Consider the problem to minimize the function
f(x) = ∑_{i=1}^{d} |x − xi|.
Here x1 , . . . , xd are d given distinct points in Rn and x runs over Rn . If you take
n = 2 and d = 3, then you get the problem of Fermat-Torricelli-Steiner.
The function f is continuous and coercive, so there exists a point of minimum x̂. The function f is convex, so such a point is characterized by the inclusion 0 ∈ ∂f(x̂). We write fi(x) = |x − xi| for all i, and ui(x) = (x − xi)/|x − xi| for all x ∈ X and all i for which x ≠ xi; note that then ui(x) is a unit vector that has the same direction as x − xi. We distinguish two cases:
1. The point x̂ is not equal to any of the points xi. In this case, each one of the functions fi is differentiable and f′(x̂) = ∑_i ui(x̂). Therefore, the point x̂ is an optimal solution of the problem iff f′(x̂) = ∑_{i=1}^{d} ui(x̂) = 0.
2. The point x̂ is equal to xj for some j. Then ∂fj(x̂) = B(0, 1), the closed ball with radius 1 and center the origin, and ∂fi(x̂) = {ui(x̂)} for all i ≠ j, as all these functions fi are differentiable at the point x̂. Now we apply the theorem of Moreau-Rockafellar. This gives that
∂f(xj) = B(0, 1) + ∑_{i≠j} ui(xj) = B(∑_{i≠j} ui(xj), 1).
Therefore, xj is an optimal solution iff the closed ball with center ∑_{i≠j} ui(xj) and radius 1 contains the origin. This means that
|∑_{i≠j} ui(xj)| ≤ 1.
9.3 Most Likely Matrix with Given Row and Column Sums
Suppose you want to know the entries of a positive square matrix, but you only know
its row and column sums. Then you do not have enough information. Now suppose
that you have reason to believe that the matrix is the most likely matrix with row
and column sums as given. This is the case in a number of applications, such as
the estimate or prediction of input-output matrices. To be more precise, suppose
that these row and column sums are positive integers, and assume that all entries
of the matrix you are looking for are positive integers. Assume moreover that the
matrix has been made by making many consecutive independent random decisions
to assign one unit to one of the entries—with equal probability for each entry. Then
ask for the most likely outcome given that the row and column sums are as given.
Taking the required matrix to be the most likely one in the sense just explained is
called the RAS method.
Now we explain how to find this most likely matrix. Using Stirling's approximation ln n! = n ln n − n + O(ln n), one gets the following formula for the logarithm of the probability that a given matrix (Tij) is the outcome:
C − ∑_{i,j} [Tij ln Tij − Tij],
where C is a constant that does not depend on the choice of the matrix. Then
the problem to find the most likely matrix with given row and column sums is
approximated by the following optimization problem:
min f(T) = ∑_{i,j} [Tij ln Tij − Tij],
subject to
∑_j Tij = Si ∀i,
∑_i Tij = Cj ∀j.
The aim of this section is to illustrate minimax and the saddle point theorem.
We consider the taking of a penalty kick, both from the point of view of the goalkeeper and from the point of view of the football player taking the penalty. For
simplicity we assume that each one has only two options. The player can aim left or
right (as seen from the direction of the keeper), the keeper can dive left or right. So
their choices lead to four possibilities. For each one the probability is known that the
keeper will stop the ball: these probabilities are stored in a 2 × 2-matrix A. The first
and second rows represent the choice of the keeper to dive left and right respectively;
the first and second columns represent the choice of the player to aim left and right
respectively. Both know this matrix. It is easy to understand that it is in the interest
of each one that his choice will be a surprise for the other. Therefore, they both
choose randomly, with carefully determined probabilities. These probabilities they
have to make known to the other. Which choices of probabilities should they make
for given A?
The point of view of the keeper, who knows his adversary is smart, is to determine the following maximin (where we let I = {x = (x1, x2) | x1, x2 ≥ 0, x1 + x2 = 1}):
max_{x∈I} min_{y∈I} xᵀAy.
The point of view of the player, who also knows the keeper is smart, is to determine the following minimax:
min_{y∈I} max_{x∈I} xᵀAy.
By von Neumann's minimax theorem, these two values are equal, and it so happens that there exists a choice for each of the two players where this value is realized, such that nobody has an incentive, after hearing the choice of his adversary, to change his own choice. This choice is a saddle point of the function xᵀAy.
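For a 2 × 2 matrix without a pure saddle point, the optimal mixed strategies equalize the adversary's payoffs and can be written in closed form. A sketch, using the matrix of Exercise 8 below as data:

```python
# Optimal mixed strategies for the 2x2 penalty game; an illustrative sketch
# with the matrix of Exercise 8 (a_ll=0.9, a_lr=0.4, a_rl=0.2, a_rr=0.8).
import numpy as np

A = np.array([[0.9, 0.4],
              [0.2, 0.8]])   # stop probabilities; rows: keeper, columns: player

den = A[0, 0] - A[1, 0] - A[0, 1] + A[1, 1]
x1 = (A[1, 1] - A[1, 0]) / den          # keeper's probability of diving left
y1 = (A[1, 1] - A[0, 1]) / den          # player's probability of aiming left
x = np.array([x1, 1 - x1])
y = np.array([y1, 1 - y1])
print(x, y, x @ A @ y)   # (6/11, 5/11), (4/11, 7/11), value 6.4/11
```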
9.5 Ladies Diary Problem
The aim of this section is to describe the oldest example of the duality theorem for optimization problems.
In 1755 Th. Moss posed the following problem in the journal Ladies Diary:
In the three sides of an equilateral field stand three trees at the distance of 10, 12 and 16
chains from one another, to find the content of the field, it being the greatest.
Keep in mind that this was before the existence of scientific journals; it was not
unusual to find scientific discoveries published for the first time in the Ladies Diary.
Already in the beginning of the seventeenth century, Fermat posed at the end of
a celebrated essay on finding minima and maxima by pre-calculus methods (the dif-
ferential calculus was only discovered later, by Newton and Leibniz independently)
the following challenge:
Let he who does not approve of my method attempt the solution of the following problem:
Given three points in the plane, find a fourth point such that the sum of its distances to the
three given points is a minimum!
[Fig. 9.1: the three trees at mutual distances 16, 12 and 10, and the relation between the solutions of the two problems.]
These two optimization problems look unrelated. However, if you solve one, then
you have the solution of the other one. The relation between their solutions is given
in Fig. 9.1.
Each one of these two problems is just the dual description of the other one.
Conclusion You have seen the historically first example of the remarkable phenom-
ena that seemingly unrelated problems, one minimization and the other maximiza-
tion, are in fact essentially equivalent.
9.6 The Second Welfare Theorem
The aim of this section is to illustrate the claim that you should be interested in
duality of convex sets as this is the heart of convexity. We give the celebrated
second welfare theorem. This gives a positive answer to the following intriguing
question. Can price mechanisms achieve each reasonable allocation of goods
without interference, provided one corrects the starting position, for example, by
means of education in the case of the labor market?
We will present the celebrated second welfare theorem that asserts that price
mechanisms can achieve each reasonable allocation of goods without interference,
provided one corrects the starting position. We first present the problem in the form
of a parable.
A group of people is going to have lunch. Each one has brought some food.
Initially, each person intends to eat his own food. Then one of them offers his
neighbor an apple in exchange for her pear which he prefers. She accepts as she
prefers an apple to a pear. So this exchange makes them both better off, because of
the difference in their tastes.
Now suppose that someone who knows the tastes of the people of the group
comes up with a ‘socially desirable’ reallocation of all the food. He wants to realize
this using a price system, in the following way. He assigns suitable prices to each
sort of food, all the food is put on the table, and everyone gets a personal budget
to choose his own favorite food. The budget will not necessarily be the value of
the lunch that the person in question has brought to the lunch. The starting position
has been corrected. Show that the prices and the budgets can be chosen in such a
way that the self-interest of all individuals leads precisely to the planned socially
desirable allocation, under natural assumptions.
Solution. The situation can be modeled as follows. We consider n consumers
1, 2, . . . , n and k goods 1, . . . , k. A nonnegative vector in Rk is called a con-
sumption vector. It represents a bundle of the k goods. The consumers have initial
endowments ω1 , . . . , ωn . These are consumption vectors representing the initial
lunches. An allocation is a sequence of n consumption vectors x = (x1 , . . . , xn ).
The allocation is called admissible if
∑_{i=1}^{n} xi ≤ ∑_{i=1}^{n} ωi.
That is, the allocation can be carried out given the initial endowments. The
consumers have utility functions, defined on the set of consumption vectors. These
represent the individual preferences of the consumers as follows. If ui(xi) > ui(xi′), then this means that consumer i prefers consumption vector xi to xi′. If ui(xi) = ui(xi′), then this means that consumer i is indifferent between the consumption vectors xi and xi′. The utility functions are assumed to be strictly quasi-concave
(that is, for each γ ≥ 0 the set {x | ui (x) ≥ γ } is convex for each i) and strictly
monotonic increasing in each variable. An admissible allocation is called Pareto
efficient if there is no other admissible allocation that makes everyone better off.
This is equivalent to the following more striking looking property: each allocation
that makes one person better off, makes at least one person worse off. A price vector
is a nonnegative row vector p = (p1 , . . . , pk ). It represents a choice of prices of the
k goods. A budget vector is a nonnegative row vector b = (b1 , . . . , bn ). It represents
a choice of budgets for all consumers. For each choice of price vector p, each choice
of budget vector b and each consumer we consider the problem
max ui (xi ),
subject to
p · xi ≤ bi ,
n
xi ≤ ωj ,
j =1
xi ∈ Rk+ .
This problem models the behavior of consumer i, who chooses the best consumption vector he can get for his budget. This problem has a unique point of maximum
di(p, b). This will be called the demand function of consumer i. A nonzero price vector p and a budget vector b for which
∑_{i=1}^{n} bi = p · ∑_{i=1}^{n} ωi,
that is, total budget equals total value of the initial endowments, are said to give a Walrasian equilibrium if the sequence of demands
b = (p · x1*, ..., p · xn*),
d(p, b) = p · x*.
∑_{i=1}^{k} ai ni = 0
min −V(h), ∑_{i=1}^{k} ai hi = 1, −hi ≤ 0, i = 1, ..., k,
where
L = −λ0V(h) − ∑_{i=1}^{k} λi hi + λ_{k+1} (∑_{i=1}^{k} ai hi − 1)
−Si − λi + λ_{k+1} ai = 0, i = 1, ..., k.
Multiplying this equality by the vector ni, taking the sum over all i, and taking into account that ∑_{i=1}^{k} Si ni = ∑_{i=1}^{k} ai ni = 0, we obtain ∑_{i=1}^{k} λi ni = 0. We denote by I the set of all indices i for which λi > 0. Then ∑_{i∈I} λi ni = 0 and, taking into account the condition of complementary slackness, ĥi = 0 for all i ∈ I. On the other hand, the polytope P(ĥ) has at least one interior point z, that is, a point for which ni · z < ĥi, i = 1, ..., k. Multiplying by λi and taking the sum over all i ∈ I, we get 0 < ∑_{i∈I} λi ĥi = 0. This contradiction means that I is empty, that is, λi = 0 and so Si = λ_{k+1} ai for all i = 1, ..., k. Therefore, the volumes of the boundaries of the polytope P(ĥ) are proportional to the numbers ai. Carrying out a homothety with the required coefficient, we obtain a polytope for which the volumes of the boundaries are equal to these numbers.
Remark It can be proved that the function − ln V (h) is strictly convex, using the
Brunn-Minkowski inequality; this implies the uniqueness in Minkowski’s theorem
on polytopes.
9.8 Duality Theory for LP
The aim of this section is to derive in a geometric way the formulas that are usually
given to define dual problems. A natural perturbation of the data will be chosen, and
then the dual problem associated to this perturbation will be determined. This is of
interest for the following reason. There are several standard forms for LP-problems.
Often one defines for each type separately its dual problem. The coordinate-free method allows one to see that all formulas for dual problems are just explicit versions of only one coordinate-free, and so geometrically intuitive, concept of dual problem.
To begin with, we will proceed as much as possible coordinate free—for better insight, starting from scratch.
Definition 9.8.1 A linear programming problem is an optimization problem P of the form
min_p p · d0, p ∈ P⁺,
where
P⁺ = {p ∈ P | p ≥ 0}.
We choose the following perturbation: consider the set of all (x, y, z) ∈ X × X × R for which
x ∈ P + y, x ≥ 0, z ≥ x · d0.
This set is readily seen to be a polyhedral set and to be the epigraph of a function F : X × X → (−∞, +∞].
Now we derive the following explicit description of the dual problem D.
Proposition 9.8.2 The dual problem D is the following problem:
max_η p0 · η, η ∈ D̃, d0 − η ≥ 0,
where
D̃ = P̃⊥ = {d̃ ∈ X | d̃ · p̃ = 0 ∀p̃ ∈ P̃},
which becomes [provided p ≥ 0, otherwise L(p, η) = +∞; in this step, we use the equivalence of p ∈ (P̃ + y)⁺ with y ∈ P̃ + p and p ≥ 0],
which becomes [provided η ∈ D̃, otherwise L(p, η) = −∞; in this step, use that inf_{u∈P̃} u · η equals 0 if η ∈ D̃ and that it equals −∞ otherwise],
Thus, the Lagrange function has been determined. The next step is to deter-
mine the objective function of the dual problem—that is, to determine ϕ(η) =
infp∈X L(p, η) for all η ∈ X.
max_η p0 · η, η ∈ D̃, d0 − η ≥ 0,
as required.
Now duality theory takes the following symmetric form.
Theorem 9.8.3 (Coordinate Free Duality Theory for LP) Let P = P̃ + p0 , D =
D̃ + d0 ⊂ X = Rn be two affine subspaces for which P̃ and D̃ are subspaces
of X which are each other’s orthogonal complement. Consider the following LP
problems
min_p p · d0, p ∈ P⁺, (P)
min_x cᵀx, Ax = b, x ≥ 0, (P)
max_{y,s} bᵀy, Aᵀy + s = c, s ≥ 0. (D)
min_x cᵀx, Ax ≥ b, x ≥ 0, (P)
max_y bᵀy, Aᵀy ≤ c, y ≥ 0. (D)
min_x cᵀx
s.t. Ax = b,
x ⪰ 0.
max_{y,s} bᵀy
s.t. Aᵀy + s = c,
s ⪰* 0.
P̃ = D̃ ⊥ = {x ∈ X | x · y = 0 ∀y ∈ D̃}
and, equivalently,
D̃ = P̃ ⊥ = {x ∈ X | x · y = 0 ∀y ∈ P̃ }.
We let X₊₊, P₊₊, D₊₊ denote the subsets of all points in X, P, D respectively for which all coordinates are positive. We use the Hadamard product, coordinate-wise
multiplication v ◦ w = (v1 w1 , . . . , vn wn ) ∈ Rn of two vectors v, w ∈ Rn . Now we
are ready to state the promised surprising fact.
Proposition 9.9.1 The map P++ × D++ → X++ given by pointwise
multiplication—(p, d) → p ◦ d—is a bijection.
Figure 9.2 illustrates this proposition.
Proof Choose an arbitrary point x ∈ Rⁿ₊₊. We have to prove that there exist unique p ∈ P₊₊, d ∈ D₊₊ for which x = p ◦ d, where ◦ is the Hadamard product.
[Fig. 9.2: the bijection (p, d) → p ◦ d from P₊₊ × D₊₊ onto X₊₊.]
subject to
p ∈ P++ , d ∈ D++ .
ln v = (ln v1, ..., ln vn) ∀v ∈ Rⁿ₊₊.
P = P̃ + s = {p̃ + s | p̃ ∈ P̃ }
and
D = D̃ + s = {d̃ + s | d̃ ∈ D̃}.
(d − x ◦ p⁻¹) · p̃ = 0 ∀p̃ ∈ P̃,
(p − x ◦ d⁻¹) · d̃ = 0 ∀d̃ ∈ D̃.
and
p^{1/2} ◦ d^{−1/2} ◦ D̃
P̃ = D̃ ⊥ = {x ∈ X | x · y = 0 ∀y ∈ D̃}
and, equivalently,
D̃ = P̃ ⊥ = {x ∈ X | x · y = 0 ∀y ∈ P̃ }.
Then P and D have a unique common point s; this can be checked using that each
element can be written in a unique way as the sum of an element of P̃ and an
element of D̃ = P̃ ⊥ . Then P = s + P̃ and D = s + D̃. Assume moreover that both
P and D contain positive vectors. One can consider the problem to find nonnegative
The equation p ◦ d = v can be rewritten as
p̄ ◦ d̄ + p̄ ◦ (d − d̄) + (p − p̄) ◦ d̄ + (p − p̄) ◦ (d − d̄) = v.
Now omit the last term, which we expect to be negligible if we would substitute the solution: indeed, then both p − p̄ and d − d̄ become small vectors, so (p − p̄) ◦ (d − d̄) will become very small. This leads to the linear vector equation
p̄ ◦ d̄ + p̄ ◦ (d − d̄) + (p − p̄) ◦ d̄ = v.
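Once P̃ and D̃ are parametrized, the linearized equation is an ordinary linear system. A minimal sketch in R², with entirely made-up data (P̃ spanned by (1, 1), D̃ its orthogonal complement):

```python
# One Newton step for p◦d = v on affine subspaces P = p_bar + span{a},
# D = d_bar + span{b}; an illustrative sketch in R^2 with made-up data.
import numpy as np

a = np.array([1.0, 1.0])        # spans the subspace P~
b = np.array([1.0, -1.0])       # spans D~, the orthogonal complement of P~
p_bar = np.array([2.0, 1.0])    # current iterate in P (made-up)
d_bar = np.array([1.0, 2.0])    # current iterate in D (made-up)
v = np.array([1.0, 1.0])        # target of the equation p ◦ d = v

# linearized equation: p_bar◦d_bar + p_bar◦(beta*b) + (alpha*a)◦d_bar = v
M = np.column_stack([a * d_bar, p_bar * b])
alpha, beta = np.linalg.solve(M, v - p_bar * d_bar)
p_new, d_new = p_bar + alpha * a, d_bar + beta * b
print(p_new, d_new, p_new * d_new)   # p ◦ d has moved toward v
```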
9.10 Exercises
1. Show that there exists an optimal solution for the problem of the waist of three
lines in space.
2. (a) Check that the set of all (x, y, z) ∈ X × X × R for which x ∈ P + y, x ≥ 0, z ≥ x · d0 is a polyhedral set which is the epigraph of a function F : X × X → (−∞, +∞].
(b) Give an explicit description of this function F.
3. Derive Theorem 9.8.3 from the duality theory for convex optimization.
4. Derive the explicit form for the dual problem of an LP problem in standard
form that is given in Proposition 9.8.4.
5. Derive the explicit form for the dual problem of an LP problem in symmetric
form that is given in Proposition 9.8.5.
6. Derive the explicit form for the dual problem of a conic optimization problem
given in Proposition 9.8.6.
7. Give a coordinate free description of a pair of primal-dual conic optimization
problems and formulate complementarity conditions for such pairs.
8. Determine the best strategies for both players in the penalty problem from Sect. 9.4 if a_ll = 0.9, a_lr = 0.4, a_rl = 0.2, a_rr = 0.8.
9. Give a precise formalization of the maximization problem by Th. Moss from
1755 formulated in Sect. 9.5.
10. Give a precise formalization of the minimization problem by Fermat formulated
in Sect. 9.5.
11. Show that there exists a unique global optimal solution to the problem
subject to
p ∈ P++ , d ∈ D++
subject to
p ∈ P++ , d ∈ D++ ,
(d − x ◦ p⁻¹) · p̃ = 0 ∀p̃ ∈ P̃,
(p − x ◦ d⁻¹) · d̃ = 0 ∀d̃ ∈ D̃.
Appendix A
A.1 Symbols
R Real numbers.
Rn Real n-column vectors.
R+ Nonnegative numbers, [0, +∞) = {x ∈ R | x ≥ 0}.
R++ Positive numbers, (0, +∞) = {x ∈ R | x > 0}.
R̄ Extended real numbers, [−∞, +∞] = R ∪ {−∞, +∞}.
[a, b] For a, b ∈ R, a ≤ b: {x ∈ R | a ≤ x ≤ b}.
(a, b) For a, b ∈ R, a ≤ b: {x ∈ R | a < x < b}.
[a, b) For a, b ∈ R, a ≤ b: {x ∈ R | a ≤ x < b}.
(a, b] For a, b ∈ R, a ≤ b, {x ∈ R | a < x ≤ b}.
Rc For nonzero c ∈ X = Rⁿ: the open ray spanned by c, {ρc | ρ > 0}.
X A finite dimensional (f.d.) inner product space with inner product ⟨·, ·⟩; often (without loss of generality) Rⁿ with the dot product x · y = x1y1 + ··· + xnyn.
‖·‖_X The Euclidean norm on X, ‖x‖_X = ⟨x, x⟩^{1/2} for all x ∈ X. For X = Rⁿ this is (x1² + ··· + xn²)^{1/2}.
∠(u, v) For unit vectors u, v ∈ Rⁿ: angle ϕ between u and v, cos ϕ = u · v, ϕ ∈ [0, π].
L⊥ For subspace L of X: orthogonal complement of L, the subspace {x ∈
X | x · y = 0 ∀y ∈ L}.
Mᵀ For a linear function M : X → Y: transpose, the linear function Y → X characterized by ⟨Mᵀ(y), x⟩ = ⟨y, M(x)⟩ for all x ∈ X and y ∈ Y.
⪰ Preference relation on X.
span(S) For a subset S ⊆ X: the span of S, the minimal subspace of X containing
S (where minimal means ‘minimal with respect to inclusion’).
aff(S) For a subset S ⊆ X: the affine hull of S, the minimal affine subset of X
containing S (where minimal means ‘minimal with respect to inclusion’).
ker(L) For a linear function L : X → Y , where X, Y are f.d. inner product
spaces: the kernel or nullspace of L, the subspace {x ∈ X | L(x) = 0} of
X.
N* For a norm N on X: the dual norm on X, N*(y) = max_{x≠0} ⟨x, y⟩/N(x) for all y ∈ X.
‖·‖_p For p ∈ [1, +∞]: the p-norm on Rⁿ, ‖x‖_p = [|x1|^p + ··· + |xn|^p]^{1/p} if p ≠ +∞, and ‖x‖_∞ = max(|x1|, ..., |xn|). So ‖x‖₂ is the Euclidean norm.
x◦y For x, y ∈ Rn : Hadamard product of x and y, the pointwise product
(x1 y1 , . . . , xn yn ) .
A.1.4 Analysis
A.1.5 Topology
co(S) For a set S ⊆ X: the convex hull of S, the minimal convex set
containing S (minimal means ‘minimal under inclusion’).
cone(S) For a set S ⊆ X: the conic hull of S, the minimal convex cone
containing S ∪ {0X } (minimal means ‘minimal under inclusion’).
MA For a convex set A ⊆ X and a linear function M : X → Y , where Y
is also a f.d. inner product space: image of A under M, the convex set
{M(a) | a ∈ A} of Y .
AN For f.d. inner product spaces X, Y , a linear function N : X → Y ,
and a convex set A ⊆ Y : inverse image of A under N , the convex
set {x ∈ X | N(x) ∈ A}.
A1 + A2 For convex sets A1 , A2 ⊆ X: Minkowski sum of A1 and A2 , {a1 +
a2 | a1 ∈ A1 , a2 ∈ A2 }.
A1 ∩ A2 For convex sets A1 , A2 ⊆ X: intersection of A1 and A2 , {x ∈ X | x ∈
A1 & x ∈ A2 }.
A1 co ∪ A2 For convex sets A1 , A2 ⊆ X: convex hull of the union of A1 and A2 :
co(A1 ∪ A2 ).
A1 #A2 For convex sets A1 , A2 ⊆ X: inverse sum of A1 and A2 ,
∪α∈[0,1] (αA1 ∩ (1 − α)A2 ).
LA For a nonempty convex set A: the lineality space of A, the maximal
subspace of X that consists of recession vectors of A, RA ∩ R−A
(maximal means maximal with respect to inclusion).
ri(A) For a convex set A ⊆ X: the relative interior of A, the maximal rel-
atively open subset of aff(A) containing A (maximal means maximal
with respect to inclusion).
extr(A) For a convex set A: the set of extreme points of A, points that are not
the average of two different points of A.
extrr(A) For a convex set A: the set of extreme recession directions of A,
recession directions Ru for which the vector u is not the average
of two linearly independent vectors v, w for which Rv and Rw are
recession directions of A.
rA− For a convex set A ⊆ X: the minimal improper convex function for A
(where minimal means minimal with respect to the pointwise ordering
for functions), the improper convex function X → R with recipe x →
−∞ if x ∈ cl(A) and x → +∞ otherwise.
rA+ For a convex set A ⊆ X: the maximal improper convex function
for A (where maximal means maximal with respect to the pointwise
ordering for functions), the improper convex function X → R with
recipe x → −∞ if x ∈ ri(A) and x → +∞ otherwise.
B Often used for a convex set that contains the origin.
B◦ For a convex set B ⊆ X that contains the origin: the polar set, the
convex set containing the origin {y ∈ X | x, y ≤ 1 ∀x ∈ B}.
B ◦◦ For a convex set B ⊆ X that contains the origin: the bipolar set, the
closed convex set that contains the origin (B ◦ )◦ .
B ←d→ B̃ For convex sets containing the origin B, B̃ ⊆ X: B and B̃ are in duality, B° = B̃ and B̃° = cl(B). Warning: B̃ is closed, but B is not necessarily closed.
C Often used for a convex cone.
C◦ For a convex cone C ⊆ X: polar cone, the closed convex cone {y ∈
X | x, y ≤ 0 ∀x ∈ C}.
C ◦◦ For a convex cone C ⊆ X: bipolar cone, the closed convex cone
(C ◦ )◦ .
C∗ For a convex cone C ⊆ X: the dual cone, the closed convex cone
−C ◦ = {y ∈ X | x, y ≥ 0 ∀x ∈ C}.
C ←d→ C̃ For convex cones C, C̃ ⊆ X: C and C̃ are in duality, C° = C̃ and C̃° = cl(C). Warning: C̃ is closed, but C is not necessarily closed.
A.1.10 Optimization
B ◦◦ = B if B is closed.
A co∪ B ←d→ A° ∩ B°.
A + B ←d→ A° # B°.
MB ←d→ B°M if M is a linear function.
C°° = C if C is closed.
C + D ←d→ C° ∩ D°.
MC ←d→ C°M if M is a linear function.
B.1.3 Subspaces
L⊥⊥ = L.
K + L ←d→ K⊥ ∩ L⊥.
ML ←d→ L⊥M if M is a linear function.
f** = f if f is closed.
f + g ←d→ f* ⊕ g*.
f ∨ g ←d→ f* co∧ g*.
Mg ←d→ g*M.
∂s(A) = A if A is closed.
s∂(p) = p if p is closed.
∂(p + q) = cl(∂p + ∂q).
∂(p ∨ q) = cl((∂p) co ∪ (∂q)).
∂(p ⊕ q) = cl(∂p ∩ ∂q).
∂(p # q) = cl(∂p # ∂q).
B.1.7 Norms
N** = N.
N1 ⊕ N2 ←d→ N1* + N2*.
N1 co∧ N2 ←d→ N1* ∨ N2*.
Index
A
Affine function
  linear plus constant, 39
Affine hull, 67
Affinely independent, 34
Affine subspace, 67, 175, 238, 240
Angle, 74
B
Barrier methods, 238
Basis, 55
Biconjugate function, 156
Bilinear mapping, 99
Binary operation for convex cones
  intersection, 56
  Minkowski sum, 56
Binary operation for convex functions
  convex hull of the minimum, 139, 144
C
Calculus rules
  convex cones, 103, 104
  convex functions, 157
  convex sets, 161
  convex sets containing the origin, 103
  sublinear functions, 161
Carathéodory's theorem, 40
Centerpoint theorem, 39
Certificate for
  insolubility, 81
  unboundedness, 81
Child drawing, 112, 120
Closed
  convex function, 133
Closed halfspace, 88
Closed set, 38, 66
Closure, 66
F
Farkas' lemma, 110
Fenchel, xx
Fenchel duality, 221
Fenchel-Moreau theorem, 157
Fermat's theorem (convex), 185
Fermat theorem, 182
Finite dimensional vector space, 54
First orthant, 26
Fourier-Motzkin elimination, 81
Four step method, 183
Functional analysis, 93
G
Gauss, 74
Geodesic
  on standard unit sphere, 15
Geodesically convex set
  in standard unit sphere, 15
Golden convex cone
  first orthant, 26
  Lorentz cone, 26
  positive semidefinite cone, 26
H
Hadamard's inequality, 147
Helly's theorem, 35
Hemisphere
  parametrization, 74
Hemisphere model
  convex cone, 16
  convex set, 24
Hessian, 134
Hilbert projection theorem, 115
Homogenization, vii
  convex set, 18
Homogenization method, 23, 103
  binary operations convex functions, 143
  binary operations convex sets, 57, 60, 144
  calculus convex functions, 158
    subdifferential, 164
  calculus sublinear functions, 161
  Carathéodory's theorem, 40
  duality for convex functions, 157
  finite convex combinations, 29
  Helly's theorem, 36
  organization of proofs, 29
  polar set, 99, 102
  properties subdifferential
    convex functions, 164
  Radon's theorem, 33
  subdifferential, 161
  support function, 161
Hyperplane, 78, 88
I
Ice cream cone, 26
Iff, 8
Image
  convex function, 138
Indicator function, 128
In duality
  convex cones, 103
  convex sets containing the origin, 104
  polyhedral sets, 107
Infimum, 129
Inner product, 55
Inner product space, 55
Interior, 66
Interior point, 66
Interior point methods, 238
Intersection
  convex cones, 56
  convex sets, 60
Inverse image
  convex function, 138
Inverse sum
  convex sets, 61
Involution, 94
J
Jensen's inequality, 127
Jung's theorem, 38
K
Karush–Kuhn–Tucker theorem, 210
Kernel, 79
KKT in usual form, 210
Krasnoselski's theorem, 39
L
Lagrange function, 194, 210
Lagrange multipliers (convex), 192
Least squares, 225
Least squares problem, 176
Limit of a function at a point, 66
Limit of a sequence, 66
Lineality space
  of a convex cone, 48
  of a convex set, 80
Linear function, 55
O
Open, 66
Open ball, 66
Open ray, 9, 14
Optimality conditions
  duality theory, 206
  Fenchel duality, 220
  KKT in subdifferential form, 213
  KKT in usual form, 210
  minimax and saddle points, 217
Optimal solution, 174
Optimal value, 174
Optimal value function, 192
Orthogonal complement, 13
  of an affine subspace, 238, 240
Orthonormal basis, 55
Q
Quadratic programming, 175
R
Radon point, 32
Radon's theorem, 32
Ray model
  convex cone, 14
  convex set, 24
Recession cone
  convex function, 144
  convex set, 20
Recession direction, 20
Recession vector, 20
Relative boundary
  convex set, 67
S
Saddle function, 217
Saddle point, 218
Saddle value, 218
Second welfare theorem
  in praise of price mechanisms, 230
Self-dual convex cone, 97
Semidefinite programming, 175
Separated by a hyperplane, 92
Separation theorem, 92
Shapley–Folkman lemma, 44
Shortest distance problem, 176
Slater condition, 193, 210
Slater point, 193, 210
Smooth local minimum, 181
Spaces in duality, 99
Sphere model, 15
  convex cone, 15
Standard closed unit ball, 17
Standard unit sphere, 15
Strict epigraph, 130
Strictly convex, 134
Strictly convex set, 178
Strict sublevel set, 130
Strong duality, 208
Subdifferential
  convex function, 163
  sublinear function, 160
Subgradient
  convex function, 162
Sublevel set, 130, 144
Sublinear function, 159
Subspace, 11, 12
Sum map, 56
Support function, 160
T
Tangent cone, 164, 181
Tangent space, 182
Tangent space theorem, 181
Tangent vector, 181
Theorem of Chebyshev
  for affine functions, 39
Theorem of Hahn–Banach, 93, 115
Theorem of Moreau, 115
Theorem of the alternative, 108
  Gale, 110
  Gordan, 120
  Ky Fan, 109
  Minkowski, Farkas, 109
Theorem of Weierstrass, 67
Topological notion, 66
Top-view model
  convex cone, 17
  convex set, 25
Transpose of a linear function, 55
U
Unbounded, 66
Uniqueness
  convex optimization, 177
W
Waist of three lines, 178
Weak duality, 208
Weyl's theorem, 43
Width, 38
Wlog, 36