Set-Valued Analysis 9: 159–168, 2001.
© 2001 Kluwer Academic Publishers. Printed in the Netherlands.
Convexity of Nonlinear Image of a Small
Ball with Applications to Optimization
B. T. POLYAK
Institute for Control Science, Moscow, Russia. e-mail: boris@[Link]
(Received: 11 May 2000)
Abstract. Let f : X → Y be a nonlinear differentiable map, where X, Y are Hilbert spaces and B(a, r) is a ball in X with center a and radius r. Suppose f′(x) is Lipschitz in B(a, r) with Lipschitz constant L and f′(a) is a surjection: f′(a)X = Y; this implies the existence of ν > 0 such that ||f′(a)∗ y|| ≥ ν||y||, ∀y ∈ Y. Then, if ε < min{r, ν/(2L)}, the image F = f(B(a, ε)) of the ball B(a, ε) is convex. This result has numerous applications in optimization and control. First, duality theory holds for nonconvex mathematical programming problems with the extra constraint ||x − a|| ≤ ε. Special effective algorithms for such optimization problems can be constructed as well. Second, the reachable set for ‘small power control’ is convex. This leads to various results in optimal control.
Mathematics Subject Classifications (2000): 52A05, 58E25, 90C46.
Key words: convexity, image, optimization, nonlinear transformation, duality, nonconvex program-
ming, optimal control.
1. Introduction
Convexity plays a key role in optimization and control theory. If a mathematical
programming problem is convex, then the duality theorem holds and effective nu-
merical methods can be constructed [1]. However, convexity is an exception for
general nonlinear optimization problems.
In the present paper, we describe a new class of problems which are nonconvex in their original form but can nevertheless be treated by convex techniques. The basic mathematical result is a new theorem asserting the convexity of the nonlinear image of a small ball in a Hilbert space
(Section 2). The case of a quadratic map is considered as an example in Section 3.
The duality theory for nonconvex mathematical programming problems, relying on
the new convexity principle, is developed in Section 4. Special numerical methods
for solving such problems are also provided. Various applications to control prob-
lems are described in Section 5. They are based on the convexity of the reachable
set for a nonlinear system with ‘small power control’.
2. Basic Result
Let X, Y be two Hilbert spaces and let f : X → Y be a nonlinear map with Lipschitz derivative on a ball B(a, r) = {x ∈ X : ||x − a|| ≤ r}, thus
||f′(x) − f′(z)|| ≤ L||x − z|| ∀x, z ∈ B(a, r). (1)
Suppose that a is a regular point of f, i.e. the linear operator f′(a) maps X onto Y; then there exists ν > 0 such that
||f′(a)∗ y|| ≥ ν||y|| ∀y ∈ Y. (2)
For instance, if X, Y are finite-dimensional, X = Rn, Y = Rm, then this condition holds if rank f′(a) = m; in this case ν = σ1(f′(a)), the least singular value of f′(a).
THEOREM 2.1. If (1), (2) hold and ε < min{r, ν/(2L)}, then the image of the ball B(a, ε) = {x ∈ X : ||x − a|| ≤ ε} under the map f is convex, i.e. F = {f(x) : x ∈ B(a, ε)} is a convex set in Y.
We need the following results:
LEMMA 2.1. A ball in a Hilbert space is strongly convex: if x1, x2 ∈ B(a, ε), x0 = (x1 + x2)/2, then B(x0, ρ) ⊂ B(a, ε) for ρ = ||x1 − x2||²/(8ε).
This result is well known and follows immediately from the parallelogram
equality.
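In detail (with d = ||x1 − x2|| introduced only for brevity), the parallelogram equality gives

\[
\|x_0 - a\|^2 = \tfrac{1}{2}\|x_1 - a\|^2 + \tfrac{1}{2}\|x_2 - a\|^2 - \tfrac{1}{4}\|x_1 - x_2\|^2 \le \varepsilon^2 - \tfrac{d^2}{4},
\]

so for every v with ||v|| ≤ ρ = d²/(8ε),

\[
\|x_0 + v - a\| \le \sqrt{\varepsilon^2 - \tfrac{d^2}{4}} + \frac{d^2}{8\varepsilon} \le \varepsilon,
\]

where the last inequality follows from \((\varepsilon - d^2/(8\varepsilon))^2 = \varepsilon^2 - d^2/4 + d^4/(64\varepsilon^2)\) together with d ≤ 2ε.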
LEMMA 2.2. Suppose there exist L, ρ, µ > 0 such that
||f′(x) − f′(z)|| ≤ L||x − z|| ∀x, z ∈ B(x0, ρ),
||f′(x)∗ y|| ≥ µ||y|| ∀y ∈ Y, ∀x ∈ B(x0, ρ),
||f(x0) − y0|| ≤ ρµ;
then the equation f(x) = y0 has a solution x∗ ∈ B(x0, ρ) and
||x∗ − x0|| ≤ ||f(x0) − y0||/µ.
This Lemma coincides with Corollary 1, Theorem 1 of [2].
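In the finite-dimensional case, the solution whose existence Lemma 2.2 asserts can also be computed numerically. The following Python sketch (assuming numpy; the function names are ours, and the scheme is a plain Gauss–Newton-type iteration rather than the exact method of [2]) illustrates this:

import numpy as np

def solve_in_ball(f, jac, x0, y0, max_iter=100, tol=1e-10):
    # Seek x near x0 with f(x) = y0 by Gauss-Newton-type steps.
    # Under the assumptions of Lemma 2.2 the iterates stay close to x0
    # and the limit x* satisfies ||x* - x0|| <= ||f(x0) - y0|| / mu.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r = f(x) - y0
        if np.linalg.norm(r) < tol:
            break
        J = jac(x)                                    # Jacobian f'(x), an m x n matrix
        x = x - np.linalg.lstsq(J, r, rcond=None)[0]  # least-norm step solving J dx = -r
    return x

Since f′ is surjective near x0, each linear system is solvable, and one can expect the residual ||f(x) − y0|| to shrink rapidly when the initial residual is small.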
Proof of Theorem 2.1. Let x1, x2 be arbitrary points in B(a, ε) ⊂ B(a, r), yi = f(xi) ∈ F, i = 1, 2. Denote x0 = (x1 + x2)/2, y0 = (y1 + y2)/2. To prove convexity of F it suffices to find x∗ ∈ B(a, ε) such that f(x∗) = y0. We have
y1 = f(x0) + f′(x0)(x1 − x0) + Δ1,  y2 = f(x0) + f′(x0)(x2 − x0) + Δ2,
where
||Δi|| ≤ L||xi − x0||²/2 = L||x1 − x2||²/8,  i = 1, 2,
due to (1), see, e.g., [3, Theorem 3.2.12]. Hence
y0 = f(x0) + Δ0,  Δ0 = (Δ1 + Δ2)/2,  ||Δ0|| ≤ L||x1 − x2||²/8.
All conditions of Lemma 2.2 are satisfied for µ = ν − Lε > 0, ρ = ||x1 − x2||²/(8ε), because (1), (2) hold, B(x0, ρ) ⊂ B(a, ε) due to Lemma 2.1, and
||f(x0) − y0|| = ||Δ0|| ≤ L||x1 − x2||²/8 = Lρε ≤ ρ(ν − Lε) = ρµ.
Moreover,
||f′(x)∗ y|| ≥ ||f′(a)∗ y|| − ||(f′(x)∗ − f′(a)∗)y|| ≥ ν||y|| − L||x − a|| ||y|| ≥ (ν − Lε)||y|| = µ||y||  for x ∈ B(x0, ρ).
Thus Lemma 2.2 provides the desired x∗ and the proof of convexity of F is completed. ✷
Remark 1. We presented a proof based on Lemma 2.2 (which was proved in [2] by using a version of the Newton method). Another proof can be obtained by exploiting modern techniques related to the Ljusternik theorem (see, e.g., [4, Theorem 2.7], [5]). However, the proofs of Ljusternik-like results are also based on the Newton method.
Remark 2. The idea of Theorem 2.1 is very simple. The ball B(a, ε) is strongly convex, thus its image under the linear map f′(a) is strongly convex as well. But convexity cannot be lost under a nonlinear map f that is close enough to its linearization. The same reasoning explains why the result cannot be extended to Banach spaces, where a ball is not strongly convex.
Remark 3. The result still holds if we replace the ball by any other strongly convex set (e.g. by a nondegenerate ellipsoid).
Remark 4. We can verify some additional properties of F: it is a strictly convex set; it has a nonempty interior which is generated by the interior points of B(a, ε); its boundary is the image of the sphere ||x − a|| = ε. Indeed, we have proved that all points in B(a, ε) are regular for the map f: ||f′(x)∗ y|| ≥ µ||y||, µ > 0, for all y. A regular image of an open set is open, thus {f(x) : ||x − a|| < ε} is an open set and cannot contain points of ∂F.
Remark 5. The smoothness assumptions of Theorem 2.1 cannot be seriously relaxed. For instance, A. Ioffe constructed a counter-example with f continuously differentiable but not of class C1,1, for which the result is false.
3. Example: Quadratic Transformation
In many cases, the conditions of Theorem 2.1 can be effectively checked and the
radius ε of the ball can be estimated. One such example is a quadratic transforma-
tion.
Let x ∈ Rn and f(x) = (f1(x), . . . , fm(x))^T, where the fi(x) are quadratic functions:
fi(x) = (1/2)(Ai x, x) + (ai, x),  Ai = Ai^T ∈ Rn×n,  ai ∈ Rn,  i = 1, . . . , m. (3)
Take a = 0, that is B = {x : ||x|| ≤ ε}. Then fi′(x) = Ai x + ai and (1) is satisfied on Rn with
L = (||A1||² + · · · + ||Am||²)^{1/2},
where ||Ai|| stands for the operator norm of the matrix Ai. Consider the matrix A with columns ai: A = (a1 | a2 | . . . | am). Then f′(0)^T y = Ay, and if rank A = m, then (2) holds with ν = σ1(A), the minimal singular value of A, that is ν = (λ1(A^T A))^{1/2}, where λ1 is the minimal eigenvalue of the corresponding matrix. Hence, Theorem 2.1 states:
PROPOSITION 3.1. If ε < ν/(2L), then the image of the ball B under the map f is convex: F = {f(x) : ||x|| ≤ ε} is a convex set in Rm.
This is in sharp contrast with the results on images of arbitrary balls under quadratic transformations, where convexity can be validated [6] only under some very restrictive assumptions.
For instance, let n = m = 2 and
f1(x) = x1 x2 − x1,  f2(x) = x1 x2 + x2. (4)
Then the estimates above guarantee that F is convex for ε < ε∗ = 1/(2√2) ≈ 0.3536. It can be proved directly for this case that F is convex for ε ≤ ε∗ and loses convexity for ε > ε∗. Thus, the estimate provided by Proposition 3.1 is tight for this example. Figure 1 shows the images of the ε-discs {x ∈ R2 : ||x|| ≤ ε} under the mapping (4) for various values of ε.
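The constants entering Proposition 3.1 are easy to compute numerically. The following Python sketch (assuming numpy; all variable names are ours) evaluates L, ν and the threshold ν/(2L) for the map (4) and samples the image of an ε-disc, reproducing the kind of picture shown in Figure 1:

import numpy as np

# Map (4) written in the form (3): fi(x) = (1/2)(Ai x, x) + (ai, x)
A1 = np.array([[0.0, 1.0], [1.0, 0.0]]); a1 = np.array([-1.0, 0.0])   # f1 = x1*x2 - x1
A2 = np.array([[0.0, 1.0], [1.0, 0.0]]); a2 = np.array([0.0, 1.0])    # f2 = x1*x2 + x2

# Lipschitz constant of f'(x) and regularity constant nu
L = np.sqrt(sum(np.linalg.norm(Ai, 2)**2 for Ai in (A1, A2)))          # = sqrt(2)
nu = np.linalg.svd(np.column_stack((a1, a2)), compute_uv=False).min()  # = 1
print(nu / (2 * L))                         # 0.3535..., the threshold of Proposition 3.1

# Sample the image F = {f(x) : ||x|| <= eps} to inspect its shape
f = lambda x: np.array([x[0]*x[1] - x[0], x[0]*x[1] + x[1]])
rng = np.random.default_rng(0)
eps = 0.3
u = rng.normal(size=(5000, 2))
x = u / np.linalg.norm(u, axis=1, keepdims=True) * eps * np.sqrt(rng.uniform(size=(5000, 1)))
image = np.array([f(xi) for xi in x])       # scatter of 'image' is (approximately) convex

For ε slightly above 1/(2√2) the same sampling shows the loss of convexity mentioned above.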
4. Duality in Local Optimization Problems
Figure 1. Images of ε-discs for various ε.
Simultaneously with the standard mathematical programming problem
min f0(x),  x ∈ Rn,
fi(x) ≤ 0,  i = 1, . . . , l, (5)
fi(x) = 0,  i = l + 1, . . . , m,
consider its local version with the extra constraint
min f0(x),  x ∈ Rn,
fi(x) ≤ 0,  i = 1, . . . , l,
fi(x) = 0,  i = l + 1, . . . , m, (6)
||x − a|| ≤ ε.
Suppose that the functions fi(x), i = 0, 1, . . . , m, are of class C1,1 on B(a, ε). Construct the Lagrange function
L(x, y) = y0 f0(x) + y1 f1(x) + · · · + ym fm(x). (7)
Denote Y+ = {y ∈ Rm+1 : yi ≥ 0, i = 0, 1, . . . , l}. We assume that a is a feasible point of (5); moreover, we can assume without loss of generality that all inequality constraints are active at a: fi(a) = 0, i = 1, . . . , m, since otherwise they play no role in (6) and can be discarded for ε small enough. Finally, we suppose that the gradients of fi(x), i = 0, 1, . . . , m, at a are linearly independent, i.e. no y ≠ 0 exists such that Lx(a, y) = 0. If there are no inequality constraints, this condition means that a is not a stationary point of (5). In the presence of inequality constraints, this condition is more restrictive than the assumption ‘a is not a Kuhn–Tucker point of problem (5)’. For instance, it implies m < n, i.e., the number of active constraints at a is less than the dimension.
THEOREM 4.1. Under the above assumptions, there exists ε∗ > 0 such that a solution x∗ of (6) with 0 < ε < ε∗ exists, is unique, lies on the boundary of B(a, ε): ||x∗ − a|| = ε, and the following inequality holds:
L(x, y∗) ≥ L(x∗, y∗)  ∀x : ||x − a|| ≤ ε (8)
for some y∗ ∈ Y+, y∗ ≠ 0, yi∗ fi(x∗) = 0, i = 1, . . . , l.
Proof. Problem (6) is equivalent to the optimization problem in the ‘image space’:
min f0,  f ∈ F,  fi ≤ 0, i = 1, . . . , l,
F = {f(x) : ||x − a|| ≤ ε}, (9)
where
f = (f0, f1, . . . , fm) ∈ Rm+1,  f(x) = (f0(x), f1(x), . . . , fm(x)).
The point a is a regular point for f(x) because the gradients fi′(a) are linearly independent. Theorem 2.1 guarantees the convexity of F for ε small enough. Thus (9) is a convex problem and, for its solution f∗ = f(x∗), there exists a separating hyperplane determined by some y∗ ∈ Rm+1, y∗ ≠ 0:
(y∗, f) ≥ (y∗, f∗)  ∀f ∈ F,  (y∗, g) ≤ (y∗, f∗)  ∀g : g0 ≤ f0∗, gi ≤ 0, i = 1, . . . , l, gi = 0, i = l + 1, . . . , m.
The first inequality is equivalent to (8); the second yields y∗ ∈ Y+ and the complementarity conditions yi∗ fi(x∗) = 0. The other statements of the theorem follow from Remark 4 to Theorem 2.1. ✷
COROLLARY 4.1. If a point x∗ is a solution of (6), then there exists y∗ ∈ Y+, y∗ ≠ 0, yi∗ fi(x∗) = 0, i = 1, . . . , m, such that
x∗ = a − ε Lx(x∗, y∗)/||Lx(x∗, y∗)||. (10)
Indeed, (10) is a necessary condition for x∗ to be a minimum point of the function L(x, y∗) on B(a, ε). The factor ε in (10) is due to the condition ||x∗ − a|| = ε.
COROLLARY 4.2. Introduce
ψ(y) = min{L(x, y) : ||x − a|| ≤ ε};
if x∗ is a solution of (6), then there exists y∗ ∈ Y+ such that
L(x∗, y∗) = max{ψ(y) : y ∈ Y+}.
Under some Slater-like condition we can ensure y0∗ ≠ 0, so that y0∗ can be taken equal to one.
THEOREM 4.2. Suppose that the following regularity condition holds: for any ε > 0 and any σ ∈ Rm with σi = 1, i = 1, . . . , l, |σi| = 1, i = l + 1, . . . , m, there exists xσ such that
σi fi(xσ) < 0,  i = 1, . . . , m,  ||xσ − a|| ≤ ε. (11)
Then in Theorem 4.1 we can take y0∗ = 1, and (8) is a necessary and sufficient condition for optimality in (6).
Proof. From (8) we get
y0∗ (f0(x) − f0(x∗)) + y1∗ f1(x) + · · · + ym∗ fm(x) ≥ 0  ∀x : ||x − a|| ≤ ε.
Take σ with σi = sign yi∗ (σi = 1 if yi∗ = 0) and the corresponding xσ. If y0∗ = 0, then y1∗ f1(xσ) + · · · + ym∗ fm(xσ) < 0 (because y∗ ≠ 0), which contradicts the inequality above for x = xσ. Thus y0∗ > 0, and of course we can scale y∗ to make y0∗ = 1. Condition (8) is obviously sufficient for optimality if y0∗ = 1. ✷
Regularity condition (11) can be replaced by other ones, e.g.: fi′(a), i = l + 1, . . . , m, are linearly independent and there exists h ∈ Rn such that (fi′(a), h) = 0, i = l + 1, . . . , m, and (fi′(a), h) < 0, i = 1, . . . , l.
Let us show how these results work for the case of quadratic functions. Consider (6) with a = 0 and
fi(x) = (1/2)(Ai x, x) + (ai, x) + αi,  i = 0, 1, . . . , m.
Suppose that αi ≤ 0, i = 1, . . . , l, αi = 0, i = l + 1, . . . , m, and that the assumptions of Proposition 3.1 are satisfied (with obvious changes of notation). Then Theorem 4.1 can be applied, with
L(x, y) = (1/2)(A(y)x, x) + (a(y), x) + α(y),
A(y) = y0 A0 + · · · + ym Am,  a(y) = y0 a0 + · · · + ym am,  α(y) = y0 α0 + · · · + ym αm.
Then ψ(y) can be computed by solving a ball-constrained quadratic problem:
ψ(y) = (1/2) min{(A(y)x, x) + 2(a(y), x) : ||x|| ≤ ε} + α(y). (12)
This problem is always tractable (even if A(y) is not positive definite) and can be effectively solved [6]. Thus we can calculate ψ(y), and it is not hard to calculate a subgradient of ψ(y) as well. Hence we can apply the subgradient method for the maximization of ψ(y) on Y+.
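A minimal Python sketch of this dual scheme is given below (assuming numpy; all names are ours). It evaluates ψ(y) through the Lagrangian (7), solving the ball-constrained quadratic subproblem by an eigenvalue decomposition and bisection on the multiplier, a generic substitute for the specialized solvers of [6] that ignores the degenerate ‘hard case’; the outer loop is projected subgradient ascent on Y+ with diminishing steps.

import numpy as np

def min_quadratic_on_ball(B, b, eps):
    # Global minimizer of (1/2)(Bx, x) + (b, x) over ||x|| <= eps
    # (trust-region subproblem; the degenerate 'hard case' is not handled).
    w, Q = np.linalg.eigh(B)
    c = Q.T @ b
    x_of = lambda lam: -Q @ (c / (w + lam))
    if w[0] > 0 and np.linalg.norm(x_of(0.0)) <= eps:
        return x_of(0.0)                    # unconstrained minimizer already inside the ball
    lo, hi = max(0.0, -w[0]) + 1e-12, max(0.0, -w[0]) + 1.0
    while np.linalg.norm(x_of(hi)) > eps:   # bracket the multiplier
        hi *= 2.0
    for _ in range(100):                    # bisection: ||x(lam)|| decreases in lam
        lam = 0.5 * (lo + hi)
        lo, hi = (lam, hi) if np.linalg.norm(x_of(lam)) > eps else (lo, lam)
    return x_of(hi)

def dual_ascent(As, avs, alphas, l, eps, steps=300):
    # Maximize psi(y) = min_{||x|| <= eps} L(x, y) over Y+ (y_i >= 0 for i <= l)
    # by projected subgradient ascent; f_i evaluated at the minimizer x(y)
    # gives a subgradient of the concave function psi.
    y = np.ones(len(As))                    # one multiplier per function f_0, ..., f_m
    for k in range(1, steps + 1):
        B = sum(yi * Ai for yi, Ai in zip(y, As))
        b = sum(yi * ai for yi, ai in zip(y, avs))
        x = min_quadratic_on_ball(B, b, eps)
        g = np.array([0.5 * x @ Ai @ x + ai @ x + al
                      for Ai, ai, al in zip(As, avs, alphas)])
        y = y + g / k                       # diminishing step sizes
        y[:l + 1] = np.maximum(y[:l + 1], 0.0)   # projection onto Y+
    return y, x

In practice one can also fix y[0] = 1, in line with Theorem 4.2, and maximize only over the remaining multipliers.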
In the more general case, when the fi(x) are nonquadratic functions, minimization of L(x, y) on a ball can be performed by a special iterative method. Consider the simplest optimization problem
min{f(x) : ||x − a|| ≤ ε} (13)
and the iterative method
x^{k+1} = a − ε f′(x^k)/||f′(x^k)||. (14)
THEOREM 4.3. Suppose that f : Rn → R1 is C1,1 on B(a, ε):
||f′(x) − f′(y)|| ≤ L||x − y||,  x, y ∈ B(a, ε),  ε < ||f′(a)||/(2L). (15)
Then:
(a) The solution x∗ of (13) exists and is unique, ||x∗ − a|| = ε, and the necessary and sufficient optimality condition holds:
x∗ = a − ε f′(x∗)/||f′(x∗)||. (16)
(b) Method (14) converges with a linear rate of convergence for any x^0 ∈ B(a, ε):
||x^k − x∗|| ≤ q^k ||x^0 − x∗||,  q = εL/(||f′(a)|| − εL) = O(ε) < 1. (17)
Proof. Statement (a) follows from Theorem 4.1 and Corollary 4.1.
If we subtract (16) from (14), we get
x^{k+1} − x∗ = −ε (f′(x^k)/||f′(x^k)|| − f′(x∗)/||f′(x∗)||).
For any τ > 0 and any x ∈ Rn with ||x|| ≥ τ, the vector τx/||x|| is the projection of x onto the ball B(0, τ). Projection is a nonexpanding map, so we can proceed (with τ = ||f′(a)|| − εL, which bounds ||f′(x)|| from below on B(a, ε)):
||x^{k+1} − x∗|| ≤ (ε/τ)||f′(x^k) − f′(x∗)|| ≤ q||x^k − x∗||.
This yields the desired estimate (17). ✷
Note that (14) can be considered as the conditional gradient method [7] for solving (13) with a special stepsize rule. However, its structure is rather peculiar: each new step is performed from the point a, not from x^k.
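A direct implementation of (14) takes only a few lines. The sketch below (Python with numpy; the objective and all names are ours, with the gradient supplied by the caller) also shows a typical call on a smooth nonconvex function:

import numpy as np

def minimize_on_ball(grad, a, eps, max_iter=200, tol=1e-12):
    # Iteration (14): x^{k+1} = a - eps * f'(x^k) / ||f'(x^k)||.
    # Converges linearly to the solution of (13) when eps < ||f'(a)||/(2L), cf. Theorem 4.3.
    a = np.asarray(a, dtype=float)
    x = a.copy()
    for _ in range(max_iter):
        g = grad(x)
        x_new = a - eps * g / np.linalg.norm(g)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Example: f(x) = sin(x1) + x1*x2 + 0.5*cos(x2) minimized on a small ball around a = (1, 1)
f_grad = lambda x: np.array([np.cos(x[0]) + x[1], x[0] - 0.5 * np.sin(x[1])])
x_star = minimize_on_ball(f_grad, a=[1.0, 1.0], eps=0.1)

The returned point satisfies the fixed-point condition (16) up to the tolerance.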
5. Control Applications
We consider very briefly (with no technical details) some control applications of
the ‘image convexity’ principle.
Convexity of the reachable set. A general nonlinear control system
ẋ = F(x, u, t),  x ∈ Rn,  u ∈ Rm,  0 ≤ t ≤ T,  x(0) = c (18)
with L2-bounded control
u ∈ U = { u : ∫_0^T ||u(t)||² dt ≤ ε } (19)
defines a reachable set
S = {x(T ) : x(t) is a solution of (18), u ∈ U }. (20)
Suppose that the linearized system
ż = Fx(x0, 0, t)z + Fu(x0, 0, t)u,  z(0) = 0 (21)
is controllable [8]; here x0 is the solution of the nominal system
ẋ0 = F(x0, 0, t),  x0(0) = c.
Then (under some technical assumptions that guarantee the smoothness of the map f : u → x(T)) we can conclude that, for ε small enough, the reachable set S is convex. Indeed, we can apply Theorem 2.1 with X = L2, Y = Rn, f : u → x(T). The controllability of (21) ensures the regularity of this map at u = 0.
Sufficiency of the maximum principle. Consider the optimal control problem
min φ(x(T)), (22)
where x(t) is a solution of (18) subject to the constraint (19), the terminal time T is fixed, and the function φ : Rn → R1 is convex. Then this optimal control problem is equivalent to the finite-dimensional one min{φ(x) : x ∈ S}, which is convex under the above conditions. Thus the first-order necessary condition for the extremum (which can be written in the form of the maximum principle [8]) is also sufficient; that is, the maximum principle is a sufficient condition for optimality in (18), (19), (22).
Also, from Theorem 4.1 we obtain that the solution is unique and that it satisfies (19) with equality.
Numerical methods. The iterative method (14) can be applied to solve the optimal control problem (18), (19), (22). It takes the following form. At the kth iteration we have an approximation u^k = u^k(t), 0 ≤ t ≤ T, and calculate x^k as the solution of (18) with u = u^k. Then the gradient of the objective function can be found as
f′(u^k) = −Fu^T(x^k, u^k, t) ψ^k(t),
where ψ^k is the solution of the adjoint system
ψ̇ = −Fx^T(x^k, u^k, t) ψ,  ψ(T) = −φ′(x^k(T)).
The updated control is then found by (14), where the L2 norm is used. Theorem 4.3 guarantees the convergence of this method to the optimal control.
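The sketch below (Python with numpy) shows this procedure for an illustrative scalar system with a quadratic terminal cost, using a crude Euler discretization of (18) and of the adjoint system; the system F, the cost φ, the step counts and all names are our own choices and serve only to convey the structure of the method.

import numpy as np

# Illustrative data: xdot = -x + u + 0.1*x^2 on [0, T], terminal cost phi(x(T)) = (x(T) - 0.5)^2
T, N = 1.0, 200
dt = T / N
eps = 0.2                                   # L2 radius of the admissible controls (the set U in (19))
c = 0.0

F        = lambda x, u: -x + u + 0.1 * x**2
F_x      = lambda x, u: -1.0 + 0.2 * x
F_u      = lambda x, u: 1.0
phi_grad = lambda xT: 2.0 * (xT - 0.5)

def simulate(u):                            # forward Euler for (18)
    x = np.empty(N + 1); x[0] = c
    for k in range(N):
        x[k + 1] = x[k] + dt * F(x[k], u[k])
    return x

def gradient(u, x):                         # f'(u) via the adjoint system
    psi = np.empty(N + 1)
    psi[N] = -phi_grad(x[N])                # psi(T) = -phi'(x(T))
    for k in range(N - 1, -1, -1):          # backward Euler for psidot = -F_x^T psi
        psi[k] = psi[k + 1] + dt * F_x(x[k], u[k]) * psi[k + 1]
    return np.array([-F_u(x[k], u[k]) * psi[k] for k in range(N)])

u = np.zeros(N)                             # start from the center of the control ball
for it in range(50):                        # update rule (14) with the L2 norm, a = 0
    g = gradient(u, simulate(u))
    u = -eps * g / np.sqrt(dt * np.sum(g**2))
print(simulate(u)[-1])                      # terminal state of the computed control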
Discrete maximum principle. It is well known that, in general, the Pontryagin
maximum principle is not valid for discrete-time systems [9]. However, it can be
validated for ‘small power’ control.
Let the states xk ∈ Rn and controls uk ∈ Rm be described by nonlinear differ-
ence equations
xk+1 = F (xk , uk , k), x0 = c, k = 0, 1, . . . , N − 1. (23)
Our objective is
min φ(x(N)) (24)
subject to the l2-type constraint
||u0||² + ||u1||² + · · · + ||uN−1||² ≤ ε. (25)
Then under the condition of controllability of the linearized system, we can prove
(as was done above for the continuous-time case) that the reachable set is convex
if ε is small enough. The standard technique allows us to obtain the maximum
principle under this convexity assumption.
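A quick numerical experiment illustrates the claim. The sketch below (Python with numpy; the planar system and all names are ours) samples control sequences satisfying (25) and collects the corresponding endpoints; for small ε the resulting point cloud is visually convex, while for larger ε it need not be.

import numpy as np

def step(x, u, k):                          # an illustrative planar system of the form (23)
    return np.array([x[0] + 0.1 * x[1] + u[0],
                     x[1] + 0.1 * x[0] * x[1] + u[1]])

def endpoint(u, c):                         # propagate (23) for a whole control sequence
    x = c.copy()
    for k in range(len(u)):
        x = step(x, u[k], k)
    return x

N, eps = 10, 0.05
c = np.array([1.0, 0.0])
rng = np.random.default_rng(0)

points = []
for _ in range(3000):
    u = rng.normal(size=(N, 2))
    u *= np.sqrt(eps * rng.uniform()) / np.linalg.norm(u)   # enforce (25): sum_k ||u_k||^2 <= eps
    points.append(endpoint(u, c))
points = np.array(points)                   # the scatter of 'points' approximates the reachable set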
6. Conclusions
The general ‘image convexity’ principle is presented and some of its applications to
optimization and control are described. It is possible that many more applications
will arise in various fields of functional analysis and numerical analysis.
References
1. Rockafellar, R. T.: Convex Analysis, Princeton Univ. Press, Princeton, 1970.
2. Polyak, B.: Gradient methods for solving equations and inequalities, USSR Comput. Math. Math.
Phys. 4(6) (1964), 17–32.
3. Ortega, J. M. and Rheinboldt, W. C.: Iterative Solution of Nonlinear Equations in Several
Variables, Academic Press, New York, 1970.
4. Dmitruk, A. V., Miljutin, A. A. and Osmolovskii, N. M.: Ljusternik theorem and extremum
theory, Russian Math. Surveys 35(6) (1980), 11–46.
5. Ioffe, A. D.: On the local surjection property, Nonlinear Anal. 11(5) (1987), 565–592.
6. Polyak, B. T.: Convexity of quadratic transformations and its use in control and optimization, J.
Optim. Theory Appl. 99(3) (1998), 553–583.
7. Bertsekas, D. P.: Nonlinear Programming, Athena Scientific, Belmont, MA, 1998.
8. Lee, E. B. and Markus, L.: Foundations of Optimal Control Theory, Wiley, New York, 1970.
9. Jordan, B. W. and Polak, E.: Theory of a class of discrete optimal control systems, J. Electr.
Control 17(6) (1964).