
1 Structure of Class

Formulation: Lec. 1

Geometry: Lec. 2-4

Simplex Method: Lec. 5-8

Duality Theory: Lec. 9-11

Sensitivity Analysis: Lec. 12

Robust Optimization: Lec. 13

Large scale optimization: Lec. 14-15

Network Flows: Lec. 16-17

The Ellipsoid method: Lec. 18-19

Interior point methods: Lec. 20-21

Semidefinite optimization: Lec. 22

Discrete Optimization: Lec. 24-25

2 Requirements
Homeworks: 30%

Midterm Exam: 30%

Final Exam: 40%

Important tie breaker: contributions to class

Use of CPLEX for solving optimization problems

3 Lecture Outline
History of Optimization

Where do LOPs Arise?

Examples of Formulations
4 History of Optimization
Fermat, 1638; Newton, 1670

min f(x),  x: scalar

Euler, 1755

min f(x1, ..., xn)

Lagrange, 1797

min f(x1, ..., xn)
s.t. gk(x1, ..., xn) = 0,  k = 1, ..., m

Euler, Lagrange: problems in infinite dimensions, calculus of variations.

5 Nonlinear Optimization
5.1 The general problem

6 What is Linear Optimization?

6.1 Formulation
minimize 3x1 + x2

subject to x1 + 2x2 ≥ 2

2x1 + x2 ≥ 3
x1 ≥ 0, x2 ≥ 0

minimize c′x
subject to Ax ≥ b
x ≥ 0
7 History of LO

7.1 The pre-algorithmic period

Fourier, 1826: Method for solving systems of linear inequalities.

de la Vallée Poussin: simplex-like method for objective functions with absolute values.

von Neumann, 1928: game theory, duality.

Farkas, Minkowski, Carathéodory, 1870-1930: Foundations.

7.2 The modern period

George Dantzig, 1947: Simplex method

1950s Applications.

1960s Large Scale Optimization.

1970s Complexity theory.

1979 The ellipsoid algorithm.

1980s Interior point algorithms.

1990s Semidefinite and conic optimization.

2000s Robust Optimization.

8 Where do LOPs Arise?

8.1 Wide Applicability
Transportation

Air traffic control, Crew scheduling,

Movement of Truck Loads
9 Transportation Problem
9.1 Data
m plants, n warehouses

si: supply of ith plant, i = 1, ..., m

dj: demand of jth warehouse, j = 1, ..., n

9.2 Decision Variables

9.2.1 Formulation
xij = number of units to send i → j
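A sketch of the resulting LP, assuming a per-unit shipping cost cij from plant i to warehouse j (the cost data are not shown on this slide):

min Σ_{i=1}^{m} Σ_{j=1}^{n} cij xij
s.t. Σ_{j=1}^{n} xij ≤ si, i = 1, ..., m
Σ_{i=1}^{m} xij ≥ dj, j = 1, ..., n
xij ≥ 0.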

10 Sorting through LO

11 Investment under taxation
You have purchased si shares of stock i at price qi, i = 1, ..., n

Current price of stock i is pi

You expect that the price of stock i one year from now will be ri

You pay a capital-gains tax at the rate of 30% on any capital gains at the time of the sale.

You want to raise C amount of cash after taxes.

You pay 1% in transaction costs.

Example: You sell 1,000 shares at $50 per share; you had bought them at $30 per share; net cash is:

50 × 1,000 − 0.30 × (50 − 30) × 1,000

Five investment choices A, B, C, D, E.

A, C, and D are available in 1993.

B is available in 1994.

Cash earns 6% per year.

$1,000,000 in 1993.

12.1 Cash Flow per Dollar Invested

12.2 Formulation
12.2.1 Decision Variables
A, ..., E: amount invested in each choice, in $ millions
Casht: amount invested in cash in period t, t = 1, 2, 3

max 1.06Cash3 + 1.00B + 1.75D + 1.40E

s.t. A + C + D + Cash1 ≤ 1
Cash2 + B ≤ 0.3A + 1.1C + 1.06Cash1
Cash3 + 1.0E ≤ 1.0A + 0.3B + 1.06Cash2
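A numerical sketch of this LP with scipy.optimize.linprog; the cash-flow coefficients are the ones reconstructed above, so treat them as assumptions rather than the lecture's exact data. Variables are ordered (A, B, C, D, E, Cash1, Cash2, Cash3), in $ millions.

from scipy.optimize import linprog

#       A     B    C     D      E    Cash1 Cash2 Cash3
obj = [0.0, -1.0, 0.0, -1.75, -1.40, 0.0, 0.0, -1.06]   # maximize => negate
A_ub = [
    [1.0,  0.0,  1.0, 1.0, 0.0,  1.0,  0.0, 0.0],       # 1993 budget <= 1
    [-0.3, 1.0, -1.1, 0.0, 0.0, -1.06, 1.0, 0.0],       # 1994 cash balance
    [-1.0, -0.3, 0.0, 0.0, 1.0,  0.0, -1.06, 1.0],      # 1995 cash balance
]
b_ub = [1.0, 0.0, 0.0]
res = linprog(obj, A_ub=A_ub, b_ub=b_ub)   # all variables >= 0 by default
print(res.x, -res.fun)                     # optimal plan, final wealth ($M)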

13 Manufacturing
13.1 Data
n products, m raw materials

bi: available units of material i.

aij: # units of material i product j needs in order to be produced.

13.2 Formulation
13.2.1 Decision variables
xj = amount of product j produced.

max Σ_{j=1}^{n} qj xj
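The constraints implied by the data above (a sketch; here qj is taken to be the profit per unit of product j, which the slide does not define):

s.t. Σ_{j=1}^{n} aij xj ≤ bi, i = 1, ..., m
xj ≥ 0, j = 1, ..., n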
14 Capacity Expansion
14.1 Data and Constraints
Dt: forecasted demand for electricity at year t
Et: existing capacity (in oil) available at t
ct: cost to produce 1MW using coal capacity

nt: cost to produce 1MW using nuclear capacity

No more than 20% nuclear

Coal plants last 20 years

Nuclear plants last 15 years

14.2 Decision Variables

xt: amount of coal capacity brought on line in year t.
yt: amount of nuclear capacity brought on line in year t.

wt: total coal capacity in year t.

zt: total nuclear capacity in year t.
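A sketch of a formulation consistent with these data; the exact constraints are not recoverable from the slide, so the lifetimes and the 20% rule are modeled directly as stated:

min Σ_t (ct xt + nt yt)
s.t. wt = Σ_{s=t−19}^{t} xs   (coal plants last 20 years)
zt = Σ_{s=t−14}^{t} ys   (nuclear plants last 15 years)
wt + zt + Et ≥ Dt   (meet demand)
zt ≤ 0.2 (wt + zt + Et)   (no more than 20% nuclear)
xt, yt ≥ 0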

15 Scheduling
15.1 Decision variables
Hospital wants to make a weekly night-shift schedule for its nurses

dj: demand for nurses, j = 1, ..., 7

Every nurse works 5 days in a row
Goal: hire minimum number of nurses
Decision Variables
xj: # nurses starting their week on day j
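A sketch of the complete LP implied by these data, assuming the week wraps around (indices taken mod 7) and ignoring integrality of the xj:

min Σ_{j=1}^{7} xj
s.t. xj−4 + xj−3 + xj−2 + xj−1 + xj ≥ dj, j = 1, ..., 7
xj ≥ 0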

16 Revenue Management
16.1 The industry
Deregulation in 1978

- Carriers were only allowed to fly certain routes. Hence airlines such as Northwest, Eastern, Southwest, etc.

- Fares determined by Civil Aeronautics Board (CAB) based on mileage and other costs (CAB no longer exists)

Post Deregulation
anyone can fly, anywhere
fares determined by carrier (and the market)

17 Revenue Management

Huge sunk and fixed costs

Very low variable costs per passenger ($10/passenger or less)
Strong economically competitive environment
Near-perfect information and negligible cost of information
Highly perishable inventory
Result: Multiple fares
18 Revenue Management
18.1 Data

n origins, 1 hub, n destinations

2 classes (for simplicity): Q-class, Y-class

Revenues: r^Q_ij, r^Y_ij, i = 1, ..., n; j = 1, ..., n

Capacities:

Expected demands: D^Q_ij, D^Y_ij

18.2 LO Formulation

18.2.1 Decision Variables
Qij: # of Q-class customers we accept from i to j
Yij: # of Y-class customers we accept from i to j

maximize Σ_{i,j} ( r^Q_ij Qij + r^Y_ij Yij )
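A sketch of the full formulation, assuming each itinerary i → j uses the legs i → hub and hub → j, with leg capacities written Ci (into the hub) and C̄j (out of the hub); these capacity symbols are illustrative, since the slide's capacity data did not survive:

maximize Σ_{i,j} ( r^Q_ij Qij + r^Y_ij Yij )
s.t. Σ_j (Qij + Yij) ≤ Ci, i = 1, ..., n
Σ_i (Qij + Yij) ≤ C̄j, j = 1, ..., n
0 ≤ Qij ≤ D^Q_ij, 0 ≤ Yij ≤ D^Y_ij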

19 Revenue Management

We estimate that RM has generated $1.4 billion in incremental revenue for American Airlines in the last three years alone. This is not a one-time benefit. We expect RM to generate at least $500 million annually for the foreseeable future. As we continue to invest in the enhancement of DINAMO we expect to capture an even larger revenue premium.
20 Messages
20.1 How to formulate?
1. Define your decision variables clearly.
2. Write constraints and objective function.

What is a good LO formulation?

A formulation with a small number of variables and constraints, and in which the matrix A is sparse.

21.1 The general problem

22 Convex functions

f : S → R
For all x1, x2 ∈ S and λ ∈ [0, 1]:

f(λx1 + (1 − λ)x2) ≤ λf(x1) + (1 − λ)f(x2)

f(x) is concave if −f(x) is convex.

23 On the power of LO

min f(x) = max_i ( d′i x + ci )

s.t. Ax ≥ b
24 On the power of LO

min Σ_j cj |xj|

s.t. Ax ≥ b
Idea: |xj| = max{ xj, −xj }

min Σ_j cj zj
s.t. Ax ≥ b
xj ≤ zj
−xj ≤ zj
Message: Minimizing piecewise linear convex functions can be modelled by LO
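A minimal sketch of this reformulation in Python with scipy.optimize.linprog; the data (c, A, b) below are made-up illustrations, not from the lecture.

import numpy as np
from scipy.optimize import linprog

c = np.array([2.0, 1.0])        # costs c_j >= 0 multiplying |x_j|
A = np.array([[1.0, -1.0]])     # one constraint: x1 - x2 >= 2
b = np.array([2.0])

n = len(c)
# Variables (x, z).  Minimize sum_j c_j z_j subject to
#   Ax >= b,  x_j <= z_j,  -x_j <= z_j   (so z_j >= |x_j|).
obj = np.concatenate([np.zeros(n), c])
# linprog uses A_ub @ v <= b_ub, so Ax >= b becomes -Ax <= -b.
A_ub = np.vstack([
    np.hstack([-A, np.zeros((A.shape[0], n))]),   # -Ax <= -b
    np.hstack([np.eye(n), -np.eye(n)]),           #  x - z <= 0
    np.hstack([-np.eye(n), -np.eye(n)]),          # -x - z <= 0
])
b_ub = np.concatenate([-b, np.zeros(2 * n)])
res = linprog(obj, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * n + [(0, None)] * n)
print(res.x[:n], res.fun)       # optimal x and the value of sum_j c_j |x_j|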
MIT OpenCourseWare
http://ocw.mit.edu

6.251J / 15.081J Introduction to Mathematical Programming


Fall 2009

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
15.081J/6.251J Introduction to Mathematical
Programming
Lecture 2: Geometry of Linear Optimization I
1 Outline: Slide 1
1. What is the central problem?
2. Standard Form.
3. Preliminary Geometric Insights.
4. Geometric Concepts (Polyhedra, "Corners").
5. Equivalence of algebraic and geometric concepts.

2 Central Problem Slide 2

minimize c′x

subject to a′i x = bi, i ∈ M1
a′i x ≥ bi, i ∈ M2
a′i x ≤ bi, i ∈ M3
xj ≥ 0, j ∈ N1
xj free, j ∈ N2

2.1 Standard Form Slide 3

minimize c′x

subject to Ax = b
x ≥ 0
Characteristics
• Minimization problem
• Equality constraints
• Non-negative variables

2.2 Transformations Slide 4

max c′x ⇔ −min(−c′x)

a′i x ≤ bi ⇔ a′i x + si = bi, si ≥ 0

a′i x ≥ bi ⇔ a′i x − si = bi, si ≥ 0

xj free ⇔ xj = xj+ − xj−, xj+ ≥ 0, xj− ≥ 0

2.3 Example Slide 5

maximize x1 − x2
subject to x1 + x2 ≤ 1
x1 + 2x2 ≥ 1
x1 free, x2 ≥ 0

−minimize −x1+ + x1− + x2
subject to x1+ − x1− + x2 + s1 = 1
x1+ − x1− + 2x2 − s2 = 1
x1+, x1−, x2, s1, s2 ≥ 0

+ ;

3 Preliminary Insights
Slide 6
minimize −x1 − x2

subject to x1 + 2x2 ≤ 3

2x1 + x2 ≤ 3

x1, x2 ≥ 0

[Figure: the feasible region, with the cost lines −x1 − x2 = 0, −x1 − x2 = z, −x1 − x2 = −2 and the optimal vertex (1, 1).]

Slide 7
−x1 + x2 ≤ 1
x1 ≥ 0

x2 ≥ 0
Slide 8
• There exists a unique optimal solution.
• There exist multiple optimal solutions; in this case, the set of optimal solutions can be either bounded or unbounded.

[Figure: the feasible set above, with the cost vectors c = (1, 0), c = (−1, −1), c = (0, 1), c = (1, 1) illustrating the different cases.]

• The optimal cost is −∞, and no feasible solution is optimal.

• The feasible set is empty.

4 Polyhedra
4.1 Definitions Slide 9
• The set { x | a′x = b } is called a hyperplane.

• The set { x | a′x ≤ b } is called a halfspace.

• The intersection of many halfspaces is called a polyhedron.

• A polyhedron is a convex set, i.e., if x, y ∈ P, then λx + (1 − λ)y ∈ P for λ ∈ [0, 1].

[Figure: (a) a hyperplane a′x = b and the two halfspaces it defines; (b) a polyhedron bounded by the hyperplanes a′i x = bi, i = 1, ..., 5.]
5 Corners
5.1 Extreme Points Slide 10
• Polyhedron P = { x | Ax ≤ b }

• x ∈ P is an extreme point of P

if ∄ y, z ∈ P (y ≠ x, z ≠ x):

x = λy + (1 − λ)z, 0 < λ < 1

[Figure: a polyhedron with extreme points (such as w, x) and non-extreme points (such as u, v, y, z).]

5.2 Vertex Slide 11

• x ∈ P is a vertex of P if ∃ c:

x is the unique optimum of

minimize c′y

subject to y ∈ P

5.3 Basic Feasible Solution Slide 12

P = { (x1, x2, x3) | x1 + x2 + x3 = 1, x1, x2, x3 ≥ 0 }
Slide 13
Points A, B, C: 3 constraints active
Point E: 2 constraints active
Suppose we add 2x1 + 2x2 + 2x3 = 2.

4
'w }
.
w

=c
c 'y
{y | P

c
.
x
'x }
{y | c
'y = c

x3

A.
E . P . C
x2

D . .
B
x1

Then 3 hyperplanes are tight, but the constraints are not linearly independent.
Slide 14
Intuition: a point at which n inequalities are tight and the corresponding equations are linearly independent.

P = { x ∈ ℜn | Ax ≤ b }
• a1, ..., am rows of A
• x ∈ P

• I = { i | a′i x = bi }

Definition: x is a basic feasible solution if the subspace spanned by {ai, i ∈ I} is ℜn.
5.3.1 Degeneracy
Slide 15
• If |I| = n, then ai, i ∈ I, are linearly independent: x nondegenerate.
• If |I| > n, then there exist n linearly independent vectors among {ai, i ∈ I}: x degenerate.

[Figure: (a) a nondegenerate and (b) a degenerate corner of a polyhedron P, with points B, C, E.]

5.3.2 Example
Slide 16
min x1 + 5x2 − 2x3
s.t. x1 + x2 + x3 ≤ 4
x1 ≤ 2
x3 ≤ 3
3x2 + x3 ≤ 6
x1, x2, x3 ≥ 0
Slide 17

6 Equivalence of definitions
Slide 18
Theorem: P = { x | Ax ≤ b }. Let x ∈ P.

x is a vertex ⇔ x is an extreme point ⇔ x is a BFS.


6.1 Proof Slide 19
1. Vertex ⇒ extreme point
∃c : c′x < c′y ∀y ∈ P, y ≠ x

If x is not an extreme point, ∃ y, z ≠ x, λ ∈ (0, 1):

x = λy + (1 − λ)z. But c′x < c′y and c′x < c′z

⇒ c′x = λc′y + (1 − λ)c′z > c′x, contradiction
Slide 20
2. Extreme point ⇒ BFS
Suppose x is not a BFS.

Let I = { i : a′i x = bi }. But the ai, i ∈ I, do not span all of ℜn ⇒ ∃z ≠ 0 : a′i z = 0, i ∈ I

Let
x1 = x + ǫz,

x2 = x − ǫz ⇒

a′i x1 = bi,
a′i x2 = bi, i ∈ I

Slide 21
i ∉ I : a′i x < bi ⇒ a′i (x + ǫz) < bi,
a′i (x − ǫz) < bi
for ǫ small enough.
⇒ x1, x2 ∈ P; yet x = (x1 + x2)/2 ⇒

x not an extreme point: contradiction Slide 22


3. BFS ⇒ vertex

x∗ BFS

I = { i : a′i x∗ = bi }

Let di = 1 for i ∈ I, di = 0 for i ∉ I, and set

c′ = −d′A

Then c′x∗ = −d′Ax∗ = −Σ_{i=1}^{m} di a′i x∗ = −Σ_{i∈I} a′i x∗ = −Σ_{i∈I} bi. Slide 23

But ∀x ∈ P : a′i x ≤ bi ⇒

c′x = −Σ_{i∈I} a′i x ≥ −Σ_{i∈I} bi = c′x∗

⇒ x∗ optimum of min c′x, x ∈ P.
Why unique?
Equality holds only if a′i x = bi for all i ∈ I; since {ai, i ∈ I} spans ℜn, a′i x = bi, i ∈ I, has a unique solution x = x∗.

15.081J/6.251J Introduction to Mathematical
Programming
Lecture 3: Geometry of Linear Optimization II
1 Outline Slide 1
BFS for standard form polyhedra
Deeper understanding of degeneracy
Existence of extreme points
Optimality of Extreme Points
Representation of Polyhedra

2 BFS for standard form polyhedra


Slide 2
Ax = b and x ≥ 0

m × n matrix A has linearly independent rows

x ∈ ℜn is a basic solution if and only if Ax = b, and there exist indices B(1), ..., B(m) such that:

- The columns AB(1), ..., AB(m) are linearly independent

- If i ≠ B(1), ..., B(m), then xi = 0


2.1 Construction of BFS Slide 3

Procedure for constructing basic solutions
1. Choose m linearly independent columns AB(1), ..., AB(m)
2. Let xi = 0 for all i ≠ B(1), ..., B(m)
3. Solve Ax = b for xB(1), ..., xB(m)

Ax = b → BxB + N xN = b
xN = 0, xB = B −1 b

2.2 Example 1 Slide 4

[ 1 1 2 1 0 0 0 ]       [ 8 ]
[ 0 1 6 0 1 0 0 ]  x =  [ 12 ]
[ 1 0 0 0 0 1 0 ]       [ 4 ]
[ 0 1 0 0 0 0 1 ]       [ 6 ]

A4, A5, A6, A7 basic columns

Solution: x = (0, 0, 0, 8, 12, 4, 6), a BFS

Another basis: A3, A5, A6, A7 basic columns.

Solution: x = (0, 0, 4, 0, −12, 4, 6), not a BFS
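A quick numerical check of the second basis, assuming numpy is available:

import numpy as np

A = np.array([[1, 1, 2, 1, 0, 0, 0],
              [0, 1, 6, 0, 1, 0, 0],
              [1, 0, 0, 0, 0, 1, 0],
              [0, 1, 0, 0, 0, 0, 1]], dtype=float)
b = np.array([8, 12, 4, 6], dtype=float)

basic = [2, 4, 5, 6]               # 0-indexed: columns A3, A5, A6, A7
B = A[:, basic]
x = np.zeros(7)
x[basic] = np.linalg.solve(B, b)   # x = (0, 0, 4, 0, -12, 4, 6)
print(x)                           # x5 < 0: basic, but not feasible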

2.3 Geometric intuition Slide 5

[Figure: the columns A1, A2, A3 and A4 = −A1 drawn as vectors in the plane.]

2.4 Example 2

Slide 6
General form

Slide 7
x1 + x2 + x3 ≤ 4

x1 ≤ 2

x3 ≤ 3

3x2 + x3 ≤ 6

x1, x2, x3 ≥ 0

Standard form
x1 + x2 + x3 + s1 = 4
x1 + s2 = 2
x3 + s3 = 3
3x2 + x3 + s4 = 6
x1, x2, x3, s1, ..., s4 ≥ 0

Slide 8
Using the definition for BFS in polyhedra in general form:

Choose tight constraints at (1, 0, 3): x1 + x2 + x3 = 4, x3 = 3, x2 = 0

Check if (1, 1, 1)′, (0, 0, 1)′, (0, 1, 0)′ span ℜ3 (they do)

Slide 9

Using the definition for BFS in polyhedra in standard form:

Pick the basic variables: x1, x3, s2, s4: xB = (x1, x3, s2, s4)

Pick the nonbasic variables: x2, s1, s3: xN = (x2, s1, s3)

Slide 10

Partition A:

      x1 x2 x3 s1 s2 s3 s4
A = [  1  1  1  1  0  0  0 ] = [B  N]
    [  1  0  0  0  1  0  0 ]
    [  0  3  1  0  0  1  0 ]
    [  0  0  1  0  0  0  1 ]

Slide 11

B = [ 1 1 0 0 ]    N = [ 1 1 0 ]    B non-singular
    [ 1 0 1 0 ]        [ 0 0 0 ]
    [ 0 1 0 0 ]        [ 0 0 1 ]
    [ 0 1 0 1 ]        [ 3 0 0 ]

xN = 0, xB = B −1 b ⇒ (x1, x3, s2, s4)′ = (1, 3, 1, 3)′

3 Degeneracy for standard form polyhedra

3.1 Definition Slide 12
A BFS x of P = { x ∈ ℜn : Ax = b, A : m × n, x ≥ 0 } is called degenerate if it contains more than n − m zeros.

x is non-degenerate if it contains exactly n − m zeros.

3.2 Example 2, revisited Slide 13

In previous example:
(2, 2, 0, 0, 0, 3, 0) degenerate: n = 7, m = 4

More than n − m = 7 − 4 = 3 zeros.

Ambiguity about which are basic variables:
(x1, x2, x3, x6) one choice
(x1, x2, x6, x7) another choice

3.3 Extreme points and BFS Slide 14

Consider again the extreme point (2, 2, 0, 0, 0, 3, 0)

How do we construct the basis?

B = { A1, A2, A6 } = { (1, 1, 0, 0)′, (1, 0, 0, 3)′, (0, 0, 1, 0)′ }

Slide 15
Columns in B are linearly independent.

Rank(A) = 4

|B| = 3 < 4

Can we augment B?

Choices:
- B′ = B ∪ {A3}: basic variables x1, x2, x3, x6
- B′ = B ∪ {A7}: basic variables x1, x2, x6, x7

- How many choices do we have?

3.4 Degeneracy and geometry Slide 16


Whether a BFS is degenerate may depend on the particular representation
of a polyhedron.

P = { (x1, x2, x3) | x1 − x2 = 0, x1 + x2 + 2x3 = 2, x1, x2, x3 ≥ 0 }.

n = 3, m = 2 and n − m = 1. (1, 1, 0) is nondegenerate, while (0, 0, 1) is degenerate.

Consider the representation P = { (x1, x2, x3) | x1 − x2 = 0, x1 + x2 + 2x3 = 2, x1 ≥ 0, x3 ≥ 0 }: (0, 0, 1) is now nondegenerate.

3.5 Conclusions Slide 17


An extreme point corresponds to possibly many bases in the presence of
degeneracy.
A basic feasible solution, however, corresponds to a unique extreme point.
Degeneracy is not a purely geometric property.

4 Existence of extreme points
Slide 18

[Figure: two polyhedra P and Q]

Note that P = { (x1, x2) : 0 ≤ x1 ≤ 1 } does not have an extreme point, while P′ = { (x1, x2) : x1 ≥ x2, x1 ≥ 0, x2 ≥ 0 } has one. Why?
4.1 Definition Slide 19
A polyhedron P ⊂ ℜn contains a line if there exists a vector x ∈ P and a nonzero vector d ∈ ℜn such that x + λd ∈ P for all scalars λ.
4.2 Theorem Slide 20
Suppose that the polyhedron P = { x ∈ ℜn | a′i x ≥ bi, i = 1, ..., m } is nonempty. Then, the following are equivalent:
(a) The polyhedron P has at least one extreme point.

(b) The polyhedron P does not contain a line.

(c) There exist n vectors out of the family a1, ..., am which are linearly independent.
4.3 Corollary Slide 21
Polyhedra in standard form contain an extreme point.
Bounded polyhedra contain an extreme point.
4.4 Proof Slide 22
Let P = { x | Ax = b, x ≥ 0 } ≠ ∅, rank(A) = m. If there exists a feasible solution in P, then there is an extreme point.
Proof

Let x = (x1, ..., xt, 0, ..., 0), s.t. x ∈ P. Consider B = {A1, A2, ..., At}

If {A1, A2, ..., At} are linearly independent we can augment, to find a basis, and thus a BFS exists.

If {A1, A2, ..., At} are dependent:

d1 A1 + · · · + dt At = 0 (some di ≠ 0)
But x1 A1 + · · · + xt At = b
⇒ (x1 + θd1)A1 + · · · + (xt + θdt)At = b

Consider xj(θ) = xj + θdj for j = 1, ..., t, and 0 otherwise.
Slide 23
Clearly A x(θ) = b

Let: θ1 = max_{j: dj > 0} ( −xj/dj )  (θ1 = −∞ if all dj ≤ 0)

θ2 = min_{j: dj < 0} ( −xj/dj )  (θ2 = +∞ if all dj ≥ 0)

For θ1 ≤ θ ≤ θ2: x(θ) ≥ 0, since xj + θdj ≥ 0 ⇔ θ ≥ −xj/dj when dj > 0 and θ ≤ −xj/dj when dj < 0.
Slide 24
Since at least one of (d1, ..., dt) ≠ 0, at least one of θ1, θ2 is finite, say θ1.
But then x(θ1) ≥ 0 and the number of nonzeros has decreased.
4.5 Example 3 Slide 25
P = { x | x1 + x2 + x3 = 2, x1 + x4 = 1, x1, ..., x4 ≥ 0 }

x = ( 1/2, 1/2, 1, 1/2 )

A1 = (1, 1)′, A2 = (1, 0)′, A3 = (1, 0)′, A4 = (0, 1)′

1 · A1 − 1 · A2 + 0 · A3 − 1 · A4 = 0
Slide 26
Consider: x(θ) = ( 1/2 + θ, 1/2 − θ, 1, 1/2 − θ ) for −1/2 ≤ θ ≤ 1/2:

x(θ) ∈ P.

Note x(−1/2) = (0, 1, 1, 1) and x(1/2) = (1, 0, 1, 0).



5 Optimality of Extreme Points

5.1 Theorem Slide 27

Consider
min c′x

s.t. x ∈ P = { x ∈ ℜn | Ax ≥ b }.

P has no line and it has an optimal solution.

Then, there exists an optimal solution which is an extreme point of P.
5.2 Proof Slide 28
v: optimal value of the cost c′x.

Q: set of optimal solutions, i.e.,

Q = { x | c′x = v, Ax ≥ b }

Q ⊂ P and P contains no lines, so Q does not contain any lines, hence it has an extreme point x∗.

Slide 29

Claim: x∗ is an extreme point of P.

Suppose not: ∃ y, w ≠ x∗ : x∗ = λy + (1 − λ)w, y, w ∈ P, 0 < λ < 1.

v = c′x∗ = λc′y + (1 − λ)c′w

c′y ≥ v,
c′w ≥ v ⇒ c′y = c′w = v ⇒ y, w ∈ Q

⇒ x∗ is NOT an extreme point of Q, CONTRADICTION.
6 Representation of Polyhedra

6.1 Theorem Slide 30

A nonempty and bounded polyhedron is the convex hull of its extreme points.

[Figure: a bounded polyhedron P with points y, z, u, and the set Q cut out by the hyperplane a′i∗ x = bi∗.]
15.081J/6.251J Introduction to Mathematical
Programming

Lecture 4: Geometry of Linear Optimization III


1 Outline
Slide 1
1. Projections of Polyhedra
2. Fourier-Motzkin Elimination Algorithm
3. Optimality Conditions

2 Projections of polyhedra
Slide 2
• πk : ℜn → ℜk projects x onto its first k coordinates:

πk (x) = πk (x1 , . . . , xn ) = (x1 , . . . , xk ).

• Πk (S) = { πk (x) | x ∈ S };

Equivalently

Πk (S) = { (x1 , . . . , xk ) | there exist xk+1 , . . . , xn s.t. (x1 , . . . , xn ) ∈ S }.

[Figure: a set S in ℜ3 and its projections Π2 (S) and Π1 (S).]

2.1 The Elimination Algorithm


2.1.1 By example
Slide 3
• Consider the polyhedron

x1 + x2 ≥ 1

x1 + x2 + 2x3 ≥ 2
2x1 + 3x3 ≥ 3
x1 − 4x3 ≥ 4
−2x1 + x2 − x3 ≥ 5.

• We rewrite these constraints

0 ≥ 1 − x1 − x2
x3 ≥ 1 − (x1 /2) − (x2 /2)
x3 ≥ 1 − (2x1 /3)
−1 + (x1 /4) ≥ x3
−5 − 2x1 + x2 ≥ x3 .

• Eliminate variable x3 , obtaining the polyhedron Q

0 ≥ 1 − x1 − x2
−1 + x1 /4 ≥ 1 − (x1 /2) − (x2 /2)
−1 + x1 /4 ≥ 1 − (2x1 /3)
−5 − 2x1 + x2 ≥ 1 − (x1 /2) − (x2 /2)
−5 − 2x1 + x2 ≥ 1 − (2x1 /3).

2.2 The Elimination Algorithm

Slide 4
1. Rewrite Σ_{j=1}^{n} aij xj ≥ bi in the form

ain xn ≥ − Σ_{j=1}^{n−1} aij xj + bi , i = 1, . . . , m;

if ain ≠ 0, divide both sides by ain . By letting x̄ = (x1 , . . . , xn−1 ), P is represented by:

xn ≥ di + f ′i x̄, if ain > 0,
dj + f ′j x̄ ≥ xn , if ajn < 0,
0 ≥ dk + f ′k x̄, if akn = 0.

2. Let Q be the polyhedron in ℜn−1 defined by:

dj + f ′j x̄ ≥ di + f ′i x̄, if ain > 0 and ajn < 0,

0 ≥ dk + f ′k x̄, if akn = 0.

Theorem:

The polyhedron Q constructed by the elimination algorithm is equal to the projection Πn−1 (P ) of P .
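A minimal Python sketch of one elimination step, assuming each constraint is stored as a pair (a, b) meaning a′x ≥ b; the function name and data layout are illustrative, not from the lecture.

def eliminate_last(rows):
    """rows: list of (a, b) meaning a'x >= b; returns the projection of
    the polyhedron onto the first n-1 variables (the slide's Q)."""
    pos = [(a, b) for a, b in rows if a[-1] > 0]    # lower bounds on x_n
    neg = [(a, b) for a, b in rows if a[-1] < 0]    # upper bounds on x_n
    out = [(a[:-1], b) for a, b in rows if a[-1] == 0]
    for ap, bp in pos:
        for an, bn in neg:
            # Combining the lower bound from (ap, bp) with the upper
            # bound from (an, bn) yields a new ">=" constraint.
            alpha, beta = ap[-1], -an[-1]
            a_new = [alpha * u + beta * v for u, v in zip(an[:-1], ap[:-1])]
            out.append((a_new, alpha * bn + beta * bp))
    return out

# The five constraints from the example above; eliminating x3 gives Q.
P = [([1, 1, 0], 1), ([1, 1, 2], 2), ([2, 0, 3], 3),
     ([1, 0, -4], 4), ([-2, 1, -1], 5)]
Q = eliminate_last(P)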

2.3 Implications
Slide 5
• Let P ⊂ ℜn+k be a polyhedron. Then, the set

{ x ∈ ℜn | there exists y ∈ ℜk such that (x, y) ∈ P }

is also a polyhedron.

• Let P ⊂ ℜn be a polyhedron and let A be an m × n matrix. Then, the

set Q = {Ax | x ∈ P } is also a polyhedron.

• The convex hull of a finite number of vectors is a polyhedron.

2.4 Algorithm for LO


Slide 6
• Consider min c′ x subject to x ∈ P .
• Define a new variable x0 and introduce the constraint x0 = c′ x.
• Apply the elimination algorithm n times to eliminate the variables x1 , . . . , xn
• We are left with the set
Q = { x0 | there exists x ∈ P such that x0 = c′ x },

and the optimal cost is equal to the smallest element of Q.

3 Optimality Conditions
3.1 Feasible directions
Slide 7
• We are at x ∈ P and we contemplate moving away from x, in the direction

of a vector d ∈ ℜn .

• We need to consider those choices of d that do not immediately take us

outside the feasible set.

• A vector d ∈ ℜn is said to be a feasible direction at x, if there exists a

positive scalar θ for which x + θd ∈ P .

Slide 8

Slide 9
• x is a BFS to the standard form problem corresponding to a basis B.
• xi = 0, i ∈ N , xB = B −1 b.
• We consider moving away from x, to a new vector x + θd, by selecting a nonbasic variable xj and increasing it to a positive value θ, while keeping the remaining nonbasic variables at zero.

• Algebraically, dj = 1, and di = 0 for every nonbasic index i other than j.

• The vector xB of basic variables changes to xB + θdB .
• Feasibility: A(x + θd) = b ⇒ Ad = 0.
• 0 = Ad = Σ_{i=1}^{n} Ai di = Σ_{i=1}^{m} AB(i) dB(i) + Aj = BdB + Aj ⇒ dB = −B −1 Aj .

• Nonnegativity constraints?

– If x nondegenerate, xB > 0; thus xB + θdB ≥ 0 for θ sufficiently small.

– If x degenerate, then d is not always a feasible direction. Why?

• Effects in cost?

Cost change: c′ d = cj − c′B B −1 Aj . This quantity is called the reduced cost c̄j of the variable xj .

3.2 Theorem
Slide 10
• x BFS associated with basis B
• c̄ reduced costs

Then

• If c̄ ≥ 0 ⇒ x optimal
• x optimal and non-degenerate ⇒ c̄ ≥ 0

3.3 Proof
• y arbitrary feasible solution
• d = y − x ⇒ Ax = Ay = b ⇒ Ad = 0
Slide 11
⇒ BdB + Σ_{i∈N} Ai di = 0

⇒ dB = − Σ_{i∈N} B −1 Ai di

⇒ c′ d = c′B dB + Σ_{i∈N} ci di
Slide 12
= Σ_{i∈N} (ci − c′B B −1 Ai ) di = Σ_{i∈N} c̄i di

• Since yi ≥ 0 and xi = 0, i ∈ N , then di = yi − xi ≥ 0, i ∈ N
• c′ d = c′ (y − x) ≥ 0 ⇒ c′ y ≥ c′ x
⇒ x optimal

(b) Your turn
15.081J/6.251J Introduction to Mathematical
Programming

Lecture 5: The Simplex Method I


1 Outline
Slide 1
• Reduced Costs
• Optimality conditions
• Improving the cost
• Unboundness
• The Simplex algorithm
• The Simplex algorithm on degenerate problems

2 Matrix View
Slide 2
min c′ x
s.t. Ax = b
x≥0

x = (xB , xN ) xB basic variables


xN non-basic variables

A = [B, N ]

Ax = b ⇒ B · xB + N · xN = b

⇒ xB + B −1 N xN = B −1 b
⇒ xB = B −1 b − B −1 N xN

2.1 Reduced Costs


Slide 3
z = c′B xB + c′N xN
= c′B (B −1 b − B −1 N xN ) + c′N xN
= c′B B −1 b + (c′N − c′B B −1 N )xN

c̄j = cj − c′B B −1 Aj : reduced cost
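A numerical sketch of this formula, assuming numpy and the basis B = [A1, A3, A6, A7] from the example later in this lecture:

import numpy as np

A = np.array([[1, 1, 1, 1, 0, 0, 0],
              [1, 0, 0, 0, 1, 0, 0],
              [0, 0, 1, 0, 0, 1, 0],
              [0, 3, 1, 0, 0, 0, 1]], dtype=float)
c = np.array([1, 5, -2, 0, 0, 0, 0], dtype=float)
basic = [0, 2, 5, 6]                 # columns A1, A3, A6, A7

B = A[:, basic]
p = np.linalg.solve(B.T, c[basic])   # p' = c_B' B^{-1}
c_bar = c - A.T @ p                  # reduced costs
print(np.round(c_bar, 6))            # (0, 7, 0, 2, -3, 0, 0)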

2.2 Optimality Conditions


Slide 4
Recall Theorem:

• x BFS associated with basis B


• c̄ reduced costs

Then

• If c̄ ≥ 0 ⇒ x optimal
• x optimal and non-degenerate ⇒ c̄ ≥ 0

3 Improving the Cost
Slide 5
• Suppose c̄j = cj − c′B B −1 Aj < 0

Can we improve the cost?

• Let dB = −B −1 Aj

dj = 1, di = 0, i ≠ B(1), . . . , B(m), j.

• Let y = x + θ · d, θ > 0 scalar


Slide 6
c′ y − c′ x = θ · c′ d
= θ · (c′B dB + cj dj )
= θ · (cj − c′B B −1 Aj )
= θ · c̄j
Thus, if c̄j < 0 the cost will decrease.

4 Unboundness
Slide 7
• Is y = x + θ · d feasible?

Since Ad = 0 ⇒ Ay = Ax = b

• y ≥ 0 ?

If d ≥ 0 ⇒ x + θ · d ≥ 0 ∀ θ ≥ 0

⇒ objective unbounded.

5 Improvement
Slide 8
If di < 0, then
xi + θdi ≥ 0 ⇒ θ ≤ −xi /di

⇒ θ∗ = min_{i | di < 0} ( −xi /di )

⇒ θ∗ = min_{i=1,...,m | dB(i) < 0} ( −xB(i) /dB(i) )

5.1 Example
Slide 9
min x1 + 5x2 −2x3
s.t. x1 + x2 + x3 ≤4
x1 ≤2
x3 ≤3
3x2 + x3 ≤6
x1 , x2 , x3 ≥0

[Figure: the feasible set, with vertices (0,0,3), (1,0,3), (2,0,2), (0,1,3), (0,2,0), (2,2,0).]
Slide 10

Slide 11
A1 A2 A3 A4 A5 A6 A7

[ 1 1 1 1 0 0 0 ]   [ x1 ]   [ 4 ]
[ 1 0 0 0 1 0 0 ] · [ x2 ] = [ 2 ]
[ 0 0 1 0 0 1 0 ]   [ .. ]   [ 3 ]
[ 0 3 1 0 0 0 1 ]   [ x7 ]   [ 6 ]

B = [A1 , A3 , A6 , A7 ]
BFS: x = (2, 0, 2, 0, 0, 1, 4)′   Slide 12

B = [ 1 1 0 0 ]    B −1 = [  0  1 0 0 ]    c̄′ = (0, 7, 0, 2, −3, 0, 0)
    [ 1 0 0 0 ]           [  1 −1 0 0 ]
    [ 0 1 1 0 ]           [ −1  1 1 0 ]
    [ 0 1 0 1 ]           [ −1  1 0 1 ]

d5 = 1, d2 = d4 = 0, (d1 , d3 , d6 , d7 )′ = −B −1 A5 = (−1, 1, −1, −1)′   Slide 13

y ′ = x′ + θd′ = (2 − θ, 0, 2 + θ, 0, θ, 1 − θ, 4 − θ)
What happens as θ increases?
θ∗ = min_{i=1,...,m | dB(i) < 0} ( −xB(i) /dB(i) ) = min{ 2/1, 1/1, 4/1 } = 1.
l = 6 (A6 exits the basis).
New solution
y = (1, 0, 3, 0, 1, 0, 3)′   Slide 14
New basis B = [A1 , A3 , A5 , A7 ]   Slide 15

[Figure: the feasible set again, showing the move from (2,0,2) to (1,0,3).]

B = [ 1 1 0 0 ]    B −1 = [  1 0 −1 0 ]
    [ 1 0 1 0 ]           [  0 0  1 0 ]
    [ 0 1 0 0 ]           [ −1 1  1 0 ]
    [ 0 1 0 1 ]           [  0 0 −1 1 ]

c̄′ = c′ − c′B B −1 A = (0, 4, 0, −1, 0, 3, 0)
Need to continue, column A4 enters the basis.

6 Correctness
Slide 16
−xB(l) /dB(l) = min_{i=1,...,m, dB(i) < 0} ( −xB(i) /dB(i) ) = θ∗

Theorem

• B̄ = {AB(i) , i ≠ l, Aj } is a basis

• y = x + θ∗ d is a BFS associated with basis B̄.
7 The Simplex Algorithm

Slide 17
1. Start with basis B = [AB(1) , . . . , AB(m) ] and a BFS x.

2. Compute c̄j = cj − c′B B −1 Aj
• If c̄j ≥ 0 for all j: x optimal; stop.
• Else select j : c̄j < 0.

Slide 18
3. Compute u = −d = B −1 Aj .
• If u ≤ 0 ⇒ cost unbounded; stop
• Else
4. θ∗ = min_{1≤i≤m, ui > 0} xB(i) /ui = xB(l) /ul
5. Form a new basis by replacing AB(l) with Aj .
6. yj = θ∗ ,
yB(i) = xB(i) − θ∗ ui
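A bare-bones Python/numpy sketch of the steps above, assuming a full-rank A and a known initial feasible basis (no Phase I and no anticycling rule); it mirrors the algorithm rather than being a robust implementation.

import numpy as np

def simplex(A, b, c, basic):
    m, n = A.shape
    basic = list(basic)
    while True:
        B = A[:, basic]
        x_B = np.linalg.solve(B, b)                # current BFS values
        p = np.linalg.solve(B.T, c[basic])         # p' = c_B' B^{-1}
        c_bar = c - A.T @ p                        # reduced costs
        if np.all(c_bar >= -1e-9):                 # step 2: optimal
            x = np.zeros(n); x[basic] = x_B
            return x, c @ x
        j = int(np.argmin(c_bar))                  # entering column
        u = np.linalg.solve(B, A[:, j])            # step 3
        if np.all(u <= 1e-9):
            raise ValueError("cost unbounded")
        ratios = [x_B[i] / u[i] if u[i] > 1e-9 else np.inf for i in range(m)]
        l = int(np.argmin(ratios))                 # step 4: exiting index
        basic[l] = j                               # step 5: new basis

# The example from this lecture, starting from basis (A1, A3, A6, A7).
A = np.array([[1., 1, 1, 1, 0, 0, 0], [1, 0, 0, 0, 1, 0, 0],
              [0, 0, 1, 0, 0, 1, 0], [0, 3, 1, 0, 0, 0, 1]])
b = np.array([4., 2, 3, 6])
c = np.array([1., 5, -2, 0, 0, 0, 0])
print(simplex(A, b, c, [0, 2, 5, 6]))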

7.1 Finite Convergence


Slide 19
Theorem:
• P = {x | Ax = b, x ≥ 0} ≠ ∅
• Every BFS non-degenerate

Then

• Simplex method terminates after a finite number of iterations

• At termination, we have an optimal basis B or we have a direction d : Ad = 0, d ≥ 0, c′ d < 0 and the optimal cost is −∞.

7.2 Degenerate problems


Slide 20
• θ∗ can equal zero (why?) ⇒ y = x, although B̄ ≠ B.
• Even if θ∗ > 0, there might be a tie in

min_{1≤i≤m, ui > 0} xB(i) /ui ⇒

next BFS degenerate.
• Finite termination not guaranteed; cycling is possible.
Slide 21
[Figure: a degenerate vertex y at which the constraints x1 = 0, . . . , x6 = 0 define several coinciding bases; the simplex method can cycle among them.]
7.3 Pivot Selection
Slide 22
• Choices for the entering column:
(a) Choose a column Aj , with c̄j < 0, whose reduced cost is the most negative.

(b) Choose a column with c̄j < 0 for which the corresponding cost decrease θ∗ |c̄j | is largest.

• Choices for the exiting column:

smallest subscript rule: out of all variables eligible to exit the basis, choose

one with the smallest subscript.

7.4 Avoiding Cycling


Slide 23
• Cycling can be avoided by carefully selecting which variables enter and

exit the basis.

• Example: among all variables with c̄j < 0, pick the smallest subscript;

among all variables eligible to exit the basis, pick the one with the smallest

subscript.

15.081J/6.251J Introduction to Mathematical
Programming

Lecture 6: The Simplex Method II


1 Outline
Slide 1
• Revised Simplex method
• The full tableau implementation
• Anticycling

2 Revised Simplex
Slide 2
Initial data: A, b, c
1. Start with basis B = [AB(1) , . . . , AB(m) ]
and B −1 .
2. Compute p′ = c′B B −1 and c̄j = cj − p′ Aj
• If c̄j ≥ 0 for all j: x optimal; stop.
• Else select j : c̄j < 0.
Slide 3
3. Compute u = B −1 Aj .
• If u ≤ 0 ⇒ cost unbounded; stop
• Else
4. θ∗ = min_{1≤i≤m, ui > 0} xB(i) /ui = xB(l) /ul
5. Form a new basis B by replacing AB(l) with Aj .
6. yj = θ∗ , yB(i) = xB(i) − θ∗ ui
Slide 4
7. Form [B −1 |u]
8. Add to each one of its rows a multiple of the lth row in order to make the

last column equal to the unit vector el .

The first m columns form B̄ −1 .

2.1 Example
Slide 5
min x1 + 5x2 −2x3
s.t. x1 + x2 + x3 ≤4
x1 ≤2
x3 ≤3
3x2 + x3 ≤6
x1 , x2 , x3 ≥0
Slide 6

B = [A1 , A3 , A6 , A7 ], BFS: x = (2, 0, 2, 0, 0, 1, 4)′

c̄ = (0, 7, 0, 2, −3, 0, 0)

B = [ 1 1 0 0 ]    B −1 = [  0  1 0 0 ]
    [ 1 0 0 0 ]           [  1 −1 0 0 ]
    [ 0 1 1 0 ]           [ −1  1 1 0 ]
    [ 0 1 0 1 ]           [ −1  1 0 1 ]

(u1 , u3 , u6 , u7 )′ = B −1 A5 = (1, −1, 1, 1)′
θ∗ = min{ 2/1, 1/1, 4/1 } = 1, l = 6 (A6 exits the basis). Slide 7

[B −1 | u] = [  0  1 0 0 |  1 ]
             [  1 −1 0 0 | −1 ]
             [ −1  1 1 0 |  1 ]
             [ −1  1 0 1 |  1 ]

⇒ B̄ −1 = [  1 0 −1 0 ]
          [  0 0  1 0 ]
          [ −1 1  1 0 ]
          [  0 0 −1 1 ]

2.2 Practical issues


Slide 8
• Numerical Stability
B −1 needs to be computed from scratch once in a while, as errors accumulate
• Sparsity
B −1 is represented in terms of sparse triangular matrices

3 Full tableau implementation


Slide 9
−c′B B −1 b | c′ − c′B B −1 A
B −1 b     | B −1 A

or, in more detail,

−c′B xB | c̄1 . . . c̄n
xB(1)   |
  ...   | B −1 A1 . . . B −1 An
xB(m)   |

3.1 Example
Slide 10
min −10x1 − 12x2 − 12x3
s.t. x1 + 2x2 + 2x3 ≤ 20
2x1 + x2 + 2x3 ≤ 20
2x1 + 2x2 + x3 ≤ 20
x1 , x2 , x3 ≥ 0
min −10x1 − 12x2 − 12x3
s.t. x1 + 2x2 + 2x3 + x4 = 20
2x1 + x2 + 2x3 + x5 = 20
2x1 + 2x2 + x3 + x6 = 20
x1 , . . . , x6 ≥ 0
BFS: x = (0, 0, 0, 20, 20, 20)′
B=[A4 , A5 , A6 ] Slide 11

x1 x2 x3 x4 x5 x6
0 −10 −12 −12 0 0 0
x4 = 20 1 2 2 1 0 0
x5 = 20 2* 1 2 0 1 0
x6 = 20 2 2 1 0 0 1

c̄′ = c′ − c′B B −1 A = c′ = (−10, −12, −12, 0, 0, 0) Slide 12

x1 x2 x3 x4 x5 x6
100 0 −7 −2 0 5 0
x4 = 10 0 1.5 1* 1 −0.5 0
x1 = 10 1 0.5 1 0 0.5 0
x6 = 0 0 1 −1 0 −1 1
Slide 13

x1 x2 x3 x4 x5 x6
120 0 −4 0 2 4 0
x3 = 10 0 1.5 1 1 −0.5 0

x1 = 0 1 −1 0 −1 1 0

x6 = 10 0 2.5* 0 1 −1.5 1

Slide 14

x1 x2 x3 x4 x5 x6
136 0 0 0 3.6 1.6 1.6
x3 = 4 0 0 1 0.4 0.4 −0.6
x1 = 4 1 0 0 −0.6 0.4 0.4
x2 = 4 0 1 0 0.4 −0.6 0.4
Slide 15
[Figure: the feasible region with vertices A = (0,0,0), B = (0,0,10), C = (0,10,0), D = (10,0,0) and the optimum E = (4,4,4).]

4 Comparison of implementations
Slide 16
                   Full tableau    Revised simplex
Memory             O(mn)           O(m^2)
Worst-case time    O(mn)           O(mn)
Best-case time     O(mn)           O(m^2)

5 Anticycling
5.1 Degeneracy in Practice
Slide 17
Does degeneracy really happen in practice? Consider the assignment polytope:

Σ_{j=1}^{n} xij = 1, i = 1, . . . , n

Σ_{i=1}^{n} xij = 1, j = 1, . . . , n

xij ≥ 0

n! vertices: for each vertex there exist 2^{n−1} n^{n−2} different bases; for n = 8, each vertex has 33,554,432 bases.
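A one-line check of the count just quoted, assuming the stated formula:

n = 8
print(2**(n - 1) * n**(n - 2))   # 33554432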

5.2 Perturbations
Slide 18
(P ) min c′ x              (Pǫ ) min c′ x
s.t. Ax = b                s.t. Ax = b + (ǫ, ǫ^2 , . . . , ǫ^m )′
x ≥ 0                      x ≥ 0.

5.2.1 Theorem
Slide 19
∃ ǫ1 > 0 such that for all 0 < ǫ < ǫ1

Ax = b + (ǫ, . . . , ǫ^m )′
x ≥ 0

is non-degenerate.

5.2.2 Proof
Slide 20
Let B1 , . . . , Br be all the bases.

Br^{−1} ( b + (ǫ, . . . , ǫ^m )′ ) = ( b̄1^r + β11^r ǫ + · · · + β1m^r ǫ^m , . . . , b̄m^r + βm1^r ǫ + · · · + βmm^r ǫ^m )′

where Br^{−1} = (βij^r ) and Br^{−1} b = (b̄1^r , . . . , b̄m^r )′ .
Slide 21
• b̄i^r + βi1^r θ + · · · + βim^r θ^m is a polynomial in θ
• Roots θ^r_{i,1} , θ^r_{i,2} , . . . , θ^r_{i,m}

• If ǫ ≠ θ^r_{i,1} , . . . , θ^r_{i,m} ⇒ b̄i^r + βi1^r ǫ + · · · + βim^r ǫ^m ≠ 0.
• Let ǫ1 be the smallest positive root ⇒ for 0 < ǫ < ǫ1 all right-hand sides are ≠ 0 ⇒ non-degeneracy.
5.3 Lexicography
Slide 22
• u is lexicographically larger than v, written u >L v, if u ≠ v and the first nonzero component of u − v is positive.

• Example:
(0, 2, 3, 0) >L (0, 2, 1, 4),
(0, 4, 5, 0) <L (1, 2, 1, 2).

5.4 Lexicography-Perturbation
5.4.1 Theorem
Slide 23
Let B be a basis of Ax = b, x ≥ 0. Then B is feasible for Ax = b + (ǫ, . . . , ǫ^m )′ , x ≥ 0 for sufficiently small ǫ if and only if

ui = (b̄i , βi1 , . . . , βim ) >L 0, ∀ i

where B −1 = (βij ) and (B −1 b)i = b̄i .

5.4.2 Proof
Slide 24
B is feasible for the perturbed problem ⇔ B −1 ( b + (ǫ, . . . , ǫ^m )′ ) ≥ 0 ⇔
b̄i + βi1 ǫ + · · · + βim ǫ^m ≥ 0 ∀ i
⇔ the first non-zero component of ui = (b̄i , βi1 , . . . , βim ) is positive ∀ i.

5.5 Summary
Slide 25
1. We start with (P ): Ax = b, x ≥ 0
2. We introduce (Pǫ ): Ax = b + (ǫ, . . . , ǫ^m )′ , x ≥ 0
3. A basis is feasible + non-degenerate in (Pǫ ) ⇔ ui >L 0 in (P ).
4. If we maintain ui >L 0 in (P ) ⇒ (Pǫ ) is non-degenerate ⇒ Simplex is finite in (Pǫ ) for sufficiently small ǫ.

5.6 Lexicographic pivoting rule


Slide 26
1. Choose an entering column Aj arbitrarily, as long as c̄j < 0; u = B −1 Aj .
2. For each i with ui > 0, divide the ith row of the tableau (including the
entry in the zeroth column) by ui and choose the lexicographically smallest
row. If row l is lexicographically smallest, then the lth basic variable xB(l)
exits the basis.

6
5.6.1 Example
Slide 27
• j = 3

1 0 5 3 ···
• 2 4 6 −1 ···
3 0 7 9 ···

• xB(1) /u1 = 1/3 and xB(3) /u3 = 3/9 = 1/3.


• We divide the first and third rows of the tableau by u1 = 3 and u3 = 9,

respectively, to obtain:

1/3 0 5/3 1 ···


• ∗ ∗ ∗ ∗ ···
1/3 0 7/9 1 ···

• Since 7/9 < 5/3, the third row is chosen to be the pivot row, and the

variable xB(3) exits the basis.

5.6.2 Uniqueness
Slide 28
• Why lexicographic pivoting rule always leads to a unique choice for the

exiting variable?

• Otherwise, two rows in tableau proportional ⇒ rank(B −1 A) < m ⇒

rank(A) < m

5.7 Theorem
Slide 29
If simplex starts with all the rows in the simplex tableau, other than the zeroth
row, lexicographically positive and the lexicographic pivoting rule is followed,
then
(a) Every row of the simplex tableau, other than the zeroth row, remains

lexicographically positive throughout the algorithm.

(b) The zeroth row strictly increases lexicographically at each iteration.


(c) The simplex method terminates after a finite number of iterations.

5.8 Smallest subscript


pivoting rule
Slide 30
1. Find the smallest j for which the reduced cost c̄j is negative and have the

column Aj enter the basis.

2. Out of all variables xi that are tied in the test for choosing an exiting

variable, select the one with the smallest value of i.

15.081J/6.251J Introduction to Mathematical
Programming

Lecture 7: The Simplex Method III


1 Outline
Slide 1
• Finding an initial BFS
• The complete algorithm
• The column geometry
• Computational efficiency
• The diameter of polyhedra and the Hirsch conjecture

2 Finding an initial BFS


Slide 2
• Goal: Obtain a BFS of Ax = b, x ≥ 0

or decide that LOP is infeasible.

• Special case: b ≥ 0

Ax ≤ b, x ≥ 0

⇒ Ax + s = b, x, s ≥ 0
s = b, x=0

2.1 Artificial variables


Slide 3
Ax = b, x≥0
1. Multiply rows with −1 to get b ≥ 0.
2. Introduce artificial variables y, start with initial BFS y = b, x = 0, and

apply simplex to auxiliary problem

min y1 + y2 + . . . + ym
s.t. Ax + y = b
x, y ≥ 0
Slide 4
3. If cost > 0 ⇒ LOP infeasible; stop.

4. If cost = 0 and no artificial variable is in the basis, then a BFS was found.

5. Else, all yi∗ = 0, but some are still in the basis. Say we have AB(1) , . . . , AB(k)

in basis k < m. There are m − k additional columns of A to form a basis.

Slide 5

6. Drive artificial variables out of the basis: If the lth basic variable is artificial, examine the lth row of B −1 A. If all elements = 0 ⇒ row redundant. Otherwise pivot on a ≠ 0 element.

2.2 Example
Slide 6
min x1 + x2 + x3
s.t. x1 + 2x2 + 3x3 = 3
−x1 + 2x2 + 6x3 = 2
4x2 + 9x3 = 5
3x3 + x4 = 1
x1 , . . . , x4 ≥ 0.
min x5 + x6 + x7 + x8
s.t. x1 + 2x2 + 3x3 + x5 = 3
−x1 + 2x2 + 6x3 + x6 = 2
4x2 + 9x3 + x7 = 5
3x3 + x4 + x8 = 1
x1 , . . . , x8 ≥ 0.
Slide 7
x1 x2 x3 x4 x5 x6 x7 x8
−11 0 −8 −21 −1 0 0 0 0
x5 = 3 1 2 3 0 1 0 0 0
x6 = 2 −1 2 6 0 0 1 0 0
x7 = 5 0 4 9 0 0 0 1 0
x8 = 1 0 0 3 1* 0 0 0 1

x1 x2 x3 x4 x5 x6 x7 x8
−10 0 −8 −18 0 0 0 0 1
x5 = 3 1 2 3 0 1 0 0 0
x6 = 2 −1 2 6 0 0 1 0 0
x7 = 5 0 4 9 0 0 0 1 0
x4 = 1 0 0 3* 1 0 0 0 1
Slide 8

x1 x2 x3 x4 x5 x6 x7 x8
−4 0 −8 0 6 0 0 0 7
x5 = 2 1 2 0 −1 1 0 0 −1
x6 = 0 −1 2* 0 −2 0 1 0 −2
x7 = 2 0 4 0 −3 0 0 1 −3
x3 = 1/3 0 0 1 1/3 0 0 0 1/3

x1 x2 x3 x4 x5 x6 x7 x8
−4 −4 0 0 −2 0 4 0 −1
x5 = 2 2* 0 0 1 1 −1 0 1
x2 = 0 −1/2 1 0 −1 0 1/2 0 −1
x7 = 2 2 0 0 1 0 −2 1 1
x3 = 1/3 0 0 1 1/3 0 0 0 1/3
Slide 9

x1 x2 x3 x4 x5 x6 x7 x8
0 0 0 0 0 2 2 0 1
x1 = 1 1 0 0 1/2 1/2 −1/2 0 1/2
x2 = 1/2 0 1 0 −3/4 1/4 1/4 0 −3/4
x7 = 0 0 0 0 0 −1 −1 1 0
x3 = 1/3 0 0 1 1/3 0 0 0 1/3
Slide 10
x1 x2 x3 x4
∗ ∗ ∗ ∗ ∗
x1 = 1 1 0 0 1/2
x2 = 1/2 0 1 0 −3/4
x3 = 1/3 0 0 1 1/3

3 A complete Algorithm for LO


Slide 11
Phase I:
1. By multiplying some of the constraints by −1, change the problem so that

b ≥ 0.

�m
2. Introduce y1 , . . . , ym , if necessary, and apply the simplex method to min i=1 yi .
3. If cost> 0, original problem is infeasible; STOP.
4. If cost= 0, a feasible solution to the original problem has been found.
5. Drive artificial variables out of the basis, potentially eliminating redundant

rows.

Slide 12
Phase II:

1. Let the final basis and tableau obtained from Phase I be the initial basis
and tableau for Phase II.
2. Compute the reduced costs of all variables for this initial basis, using the
cost coefficients of the original problem.
3. Apply the simplex method to the original problem.

3.1 Possible outcomes


Slide 13
1. Infeasible: Detected at Phase I.
2. A has linearly dependent rows: Detected at Phase I, eliminate redundant
rows.
3. Unbounded (cost= −∞): detected at Phase II.
4. Optimal solution: Terminate at Phase II in optimality check.

4 The big-M method


Slide 14
min Σ_{j=1}^{n} cj xj + M Σ_{i=1}^{m} yi
s.t. Ax + y = b
x, y ≥ 0

5 The Column Geometry


Slide 15
min c′ x
s.t. Ax = b
e′ x = 1
x ≥ 0
x1 (A1 ; c1 ) + x2 (A2 ; c2 ) + · · · + xn (An ; cn ) = (b ; z),

where (Ai ; ci ) stacks the column Ai on top of its cost ci .
Slide 16
Slide 17
6 Computational efficiency
Slide 18
Exceptional practical behavior: linear in n
Worst case
max xn
s.t. ǫ ≤ x1 ≤ 1
ǫxi−1 ≤ xi ≤ 1 − ǫxi−1 , i = 2, . . . , n
Slide 19
Theorem Slide 20

[Figure: the column geometry: the points (Ai ; ci ) in ℜ^{m+1}, the requirement line through b, and the sequence of bases (initial, next, optimal) tracing the lower envelope. Below: (a) the unit cube and (b) its ǫ-perturbed (Klee-Minty) version.]

• The feasible set has 2^n vertices

• The vertices can be ordered so that each one is adjacent to and has lower cost than the previous one.

• There exists a pivoting rule under which the simplex method requires 2^n − 1 changes of basis before it terminates.

7 The Diameter of polyhedra


Slide 21
• Given a polyhedron P , and x, y vertices of P , the distance d(x, y) is the

minimum number of jumps from one vertex to an adjacent one to reach y

starting from x.

• The diameter D(P ) is the maximum of d(x, y) ∀x, y.


Slide 22
• Δ(n, m) as the maximum of D(P ) over all bounded polyhedra in ℜn that

are represented in terms of m inequality constraints.

• Δu (n, m) is like Δ(n, m) but for possibly unbounded polyhedra.

7.1 The Hirsch Conjecture


Slide 23
• Δ(2, m) = ⌊m/2⌋, Δu (2, m) = m − 2

[Figure: (a) a bounded and (b) an unbounded two-dimensional polyhedron attaining these diameters.]

• Hirsch Conjecture: Δ(n, m) ≤ m − n.


Slide 24
• We know that

Δu (n, m) ≥ m − n + ⌊n/5⌋

Δ(n, m) ≤ Δu (n, m) < m^{1+log2 n} = (2n)^{log2 m}

15.081J/6.251J Introduction to Mathematical
Programming

Lecture 8: Duality Theory I


1 Outline
Slide 1
• Motivation of duality
• General form of the dual
• Weak and strong duality
• Relations between primal and dual
• Economic Interpretation
• Complementary Slackness

2 Motivation
2.1 An idea from Lagrange
Slide 2
Consider the LOP, called the primal with optimal solution x∗

min c′ x
s.t. Ax = b
x≥0

Relax the constraint


g(p) = min c′ x + p′ (b − Ax)
s.t. x ≥ 0

g(p) ≤ c′ x∗ + p′ (b − Ax∗ ) = c′ x∗
Get the tightest lower bound, i.e.,

max g(p)

g(p) = min_{x≥0} ( c′ x + p′ (b − Ax) )
= p′ b + min_{x≥0} (c′ − p′ A)x

Note that
min_{x≥0} (c′ − p′ A)x = 0 if c′ − p′ A ≥ 0′ , and −∞ otherwise.

Dual: max g(p) ⇔ max p′ b
s.t. p′ A ≤ c′

3 General form of the dual

Slide 3
Primal                          Dual
min c′ x                        max p′ b
s.t. a′i x ≥ bi , i ∈ M1        s.t. pi ≥ 0, i ∈ M1
a′i x ≤ bi , i ∈ M2             pi ≤ 0, i ∈ M2
a′i x = bi , i ∈ M3             pi free, i ∈ M3
xj ≥ 0, j ∈ N1                  p′ Aj ≤ cj , j ∈ N1
xj ≤ 0, j ∈ N2                  p′ Aj ≥ cj , j ∈ N2
xj free, j ∈ N3                 p′ Aj = cj , j ∈ N3

3.1 Example
Slide 4
min x1 + 2x2 + 3x3 max 5p1 + 6p2 + 4p3
s.t. −x1 + 3x2 =5 s.t. p1 free
2x1 − x2 + 3x3 ≥ 6 p2 ≥0
x3 ≤ 4 p3 ≤0
x1 ≥ 0 −p1 + 2p2 ≤1
x2 ≤ 0 3p1 − p2 ≥2
x3 free, 3p2 + p3 = 3.
Slide 5
Primal          min       max       Dual
constraints     ≥ bi      ≥ 0       variables
                ≤ bi      ≤ 0
                = bi      free
variables       ≥ 0       ≤ cj      constraints
                ≤ 0       ≥ cj
                free      = cj

Theorem: The dual of the dual is the primal.

3.2 A matrix view


Slide 6
min c′ x max p′ b
s.t. Ax = b s.t. p′ A ≤ c′
x ≥ 0
min c′ x max p′ b
s.t. Ax ≥ b s.t. p′ A = c′
p≥0

4 Weak Duality
Slide 7
Theorem:
If x is primal feasible and p is dual feasible then p′ b ≤ c′ x
Proof
p′ b = p′ Ax ≤ c′ x

Corollary:

If x is primal feasible, p is dual feasible, and p′ b = c′ x, then x is optimal in

the primal and p is optimal in the dual.

5 Strong Duality
Slide 8
Theorem: If the LOP has optimal solution, then so does the dual, and optimal

costs are equal.

Proof:

min c′ x
s.t. Ax = b
x ≥ 0
Apply Simplex; optimal solution x, basis B.
Optimality conditions:
c′ − c′B B −1 A ≥ 0′
Slide 9
Define p′ = c′B B −1 ⇒ p′ A ≤ c′
⇒ p dual feasible for
max p′ b
s.t. p′ A ≤ c′

p′ b = c′B B −1 b = c′B xB = c′ x
⇒ x, p are primal and dual optimal

5.1 Intuition
Slide 10
[Figure: at the optimal vertex x∗ the cost vector c lies in the cone of the active constraint vectors: c = p1 a1 + p2 a2 , p1 , p2 ≥ 0.]
6 Relations between primal and dual
Slide 11
              Finite opt.   Unbounded   Infeasible
Finite opt.        *
Unbounded                                   *
Infeasible                      *           *
7 Economic Interpretation
Slide 12
• x optimal nondegenerate solution: B −1 b > 0
• Suppose b changes to b + d for some small d
• How is the optimal cost affected?
• For small d feasibilty unaffected
• Optimality conditions unaffected
• New cost c′B B −1 (b + d) = p′ (b + d)
• If resource i changes by di , cost changes by pi di : “Marginal Price”

8 Complementary slackness
8.1 Theorem
Slide 13
Let x primal feasible and p dual feasible. Then x, p optimal if and only if

pi (a′i x − bi ) = 0, ∀i

xj (cj − p′ Aj ) = 0, ∀j

8.2 Proof
Slide 14
• ui = pi (a′i x − bi ) and vj = (cj − p′ Aj )xj
• If x primal feasible and p dual feasible, we have ui ≥ 0 and vj ≥ 0 for all

i and j.

• Also c′ x − p′ b = Σ_i ui + Σ_j vj .

• By the strong duality theorem, if x and p are optimal, then c′ x = p′ b ⇒

ui = vj = 0 for all i, j.

• Conversely, if ui = vj = 0 for all i, j, then c′ x = p′ b,

• ⇒ x and p are optimal.

8.3 Example
Slide 15

min 13x1 + 10x2 + 6x3          max 8p1 + 3p2
s.t. 5x1 + x2 + 3x3 = 8        s.t. 5p1 + 3p2 ≤ 13
3x1 + x2 = 3                   p1 + p2 ≤ 10
x1 , x2 , x3 ≥ 0               3p1 ≤ 6

Is x∗ = (1, 0, 1)′ optimal? Slide 16

Complementary slackness: x1 > 0 ⇒ 5p1 + 3p2 = 13; x3 > 0 ⇒ 3p1 = 6

⇒ p1 = 2, p2 = 1
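A numerical verification of the complementary slackness conditions for this example, assuming numpy:

import numpy as np

A = np.array([[5., 1, 3], [3, 1, 0]])
b = np.array([8., 3])
c = np.array([13., 10, 6])
x = np.array([1., 0, 1])
p = np.array([2., 1])

print(A @ x - b)             # equality constraints: both zero
print(x * (c - A.T @ p))     # x_j (c_j - p'A_j) = 0 for all j
print(c @ x, p @ b)          # both 19: equal costs confirm optimality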

15.081J/6.251J Introduction to Mathematical
Programming

Lecture 9: Duality Theory II


1 Outline
Slide 1
• Strict complementary slackness
• Geometry of duality
• The dual simplex algorithm
• Duality and degeneracy

2 Strict Complementary Slackness


Slide 2
Assume that both problems have an optimal solution:

min c′ x max p′ b
s.t. Ax ≥ b s.t. p′ A ≤ c′
x ≥ 0, p ≥ 0.

There exist optimal solutions to the primal and to the dual that satisfy
• For every j, either xj > 0 or p′ Aj < cj .
• For every i, we have either a′i x > bi or pi > 0.

2.1 Example
Slide 3
min 5x1 + 5x2
s.t. x1 + x2 ≥ 2
2x1 − x2 ≥ 0
x1 , x2 ≥ 0.

• Is (2/3, 4/3) strictly complementary?


• Which are all the strictly complementary solutions?

3 The Geometry of Duality


Slide 4
min c′ x
s.t. a′i x ≥ bi , i = 1, . . . , m

max p′ b
s.t. Σ_{i=1}^{m} pi ai = c
p ≥ 0

[Figure: four situations A-D showing when the cost vector c lies in the cone generated by the active constraint vectors ai at a point, and hence when a feasible dual solution exists; the last panel shows the optimal x∗.]

4 Dual Simplex Algorithm


4.1 Motivation
Slide 5
• In simplex method B −1 b ≥ 0
• Primal optimality condition

c′ − c′B B −1 A ≥ 0′

same as dual feasibility


• Simplex is a primal algorithm: maintains primal feasibility and works

towards dual feasibility

• Dual algorithm: maintains dual feasibility and works towards primal


feasibility
Slide 6
−c′B xB | c̄1 . . . c̄n
xB(1)   |
  ...   | B −1 A1 . . . B −1 An
xB(m)   |

• Do not require B −1 b ≥ 0
• Require c̄ ≥ 0 (dual feasibility)
• Dual cost is p′ b = c′B B −1 b = c′B xB

• If B −1 b ≥ 0 then both dual feasibility and primal feasibility, and also

same cost ⇒ optimality

• Otherwise, change basis

4.2 An iteration
Slide 7
1. Start with basis matrix B and all reduced costs ≥ 0.

2. If B −1 b ≥ 0 optimal solution found; else, choose l s.t. xB(l) < 0.

3. Consider the lth row (pivot row) xB(l) , v1 , . . . , vn . If ∀i vi ≥ 0 then dual

optimal cost = +∞ and algorithm terminates.

Slide 8
4. Else, let j s.t. c̄j /|vj | = min_{i | vi < 0} c̄i /|vi |
5. Pivot element vj : Aj enters the basis and AB(l) exits.

[Figure: (a) the primal feasible set and (b) the dual feasible set for the example below, with the corresponding basic solutions A-E and the dual simplex path.]
4.3 An example
Slide 9
min x1 + x2
s.t. x1 + 2x2 ≥ 2
x1 ≥ 1
x1 , x2 ≥ 0
min x1 + x2 max 2p1 + p2
s.t. x1 + 2x2 − x3 = 2 s.t. p1 + p2 ≤ 1
x1 − x4 = 1 2p1 ≤ 1
x1 , x2 , x3 , x4 ≥ 0 p1 , p2 ≥ 0
Slide 10
x1 x2 x3 x4
0 1 1 0 0
x3 = −2 −1 −2* 1 0
x4 = −1 −1 0 0 1
Slide 11

x1 x2 x3 x4
−1 1/2 0 1/2 0
x2 = 1 1/2 1 −1/2 0
x4 = −1 −1* 0 0 1

x1 x2 x3 x4
−3/2 0 0 1/2 1/2
x2 = 1/2 0 1 −1/2 1/2
x1 = 1 1 0 0 −1

[Figure: (a) primal and (b) dual feasible sets for the degenerate example below, showing basic solutions A, A′, A′′, B, C, D.]
5 Duality and Degeneracy


Slide 12
• Any basis matrix B leads to the dual basic solution p′ = c′B B −1 .
• The dual constraint p′ Aj = cj is active if and only if the reduced cost c̄j is zero.

• Since p is m-dimensional, dual degeneracy implies more than m reduced

costs that are zero.

• Dual degeneracy is obtained whenever there exists a nonbasic variable

whose reduced cost is zero.

5.1 Example
Slide 13
min 3x1 + x2 max 2p1
s.t. x1 + x2 − x3 = 2 s.t. p1 + 2p2 ≤ 3
2x1 − x2 − x4 = 0 p1 − p2 ≤ 1
x1 , x2 , x3 , x4 ≥ 0, p1 , p2 ≥ 0.
Equivalent primal problem

min 3x1 + x2
s.t. x1 + x2 ≥ 2
2x1 − x2 ≥ 0
x1 , x2 ≥ 0.
Slide 14
Slide 15
• Four basic solutions in primal: A, B, C, D.
• Six distinct basic solutions in dual: A, A′ , A′′ , B, C, D.
• Different bases may lead to the same basic solution for the primal, but

to different basic solutions for the dual. Some are feasible and some are

infeasible.

5.2 Degeneracy and uniqueness
Slide 16
• If dual has a nondegenerate optimal solution, the primal problem has a

unique optimal solution.

• It is possible, however, for the dual to have a degenerate optimal solution while the primal still has a unique optimal solution.

15.081J/6.251J Introduction to Mathematical
Programming

Lecture 10: Duality Theory III


[Figure: the cone generated by the columns A1 , A2 , A3 , and the vector b.]

1 Outline
Slide 1
• Farkas lemma
• Asset pricing
• Cones and extreme rays
• Representation of Polyhedra

2 Farkas lemma
Slide 2
Theorem:

Exactly one of the following two alternatives hold:

1. ∃x ≥ 0 s.t. Ax = b.
2. ∃p s.t. p′ A ≥ 0′ and p′ b < 0.
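A sketch of testing the two alternatives numerically with scipy.optimize.linprog; the data (A, b) below are only an illustration:

import numpy as np
from scipy.optimize import linprog

A = np.array([[1., 1], [1, -1]])
b = np.array([1., 2])

# Alternative 1: is there x >= 0 with Ax = b?
res = linprog(np.zeros(2), A_eq=A, b_eq=b)   # bounds default to x >= 0
print(res.status)    # 0 if feasible, 2 if infeasible

# Alternative 2: minimize p'b subject to p'A >= 0, p free.  If alternative 1
# fails, this problem is unbounded, and any feasible p with p'b < 0 is a
# certificate of infeasibility.
res2 = linprog(b, A_ub=-A.T, b_ub=np.zeros(2),
               bounds=[(None, None)] * 2)
print(res2.status)   # 3 (unbounded) exactly when Ax = b, x >= 0 fails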

2.1 Proof
Slide 3
"⇒"
If ∃x ≥ 0 s.t. Ax = b, and if p′ A ≥ 0′ , then p′ b = p′ Ax ≥ 0
"⇐"
Assume there is no x ≥ 0 s.t. Ax = b

(P ) max 0′ x          (D) min p′ b
s.t. Ax = b            s.t. p′ A ≥ 0′
x ≥ 0

(P) infeasible ⇒ (D) either unbounded or infeasible


Since p = 0 is feasible ⇒ (D) unbounded
⇒ ∃p : p′ A ≥ 0′ and p′ b < 0

3 Asset Pricing
Slide 4
• n different assets
• m possible states of nature
• one dollar invested in some asset i, and state of nature is s, we receive a

payoff of rsi

• m × n payoff matrix:

R = [ r11 . . . r1n ]
    [  .  . . .  .  ]
    [ rm1 . . . rmn ]
Slide 5
• xi : amount held of asset i. A portfolio of assets is x = (x1 , . . . , xn ).
• A negative value of xi indicates a “short” position in asset i: this amounts

to selling |xi | units of asset i at the beginning of the period, with a promise

to buy them back at the end. Hence, one must pay out rsi |xi | if state s

occurs, which is the same as receiving a payoff of rsi xi

Slide 6

• Wealth in state s from a portfolio x:

ws = Σ_{i=1}^{n} rsi xi .

• w = (w1 , . . . , wm )′ , i.e., w = Rx
• pi : price of asset i, p = (p1 , . . . , pn )
• Cost of acquiring x is p′ x.

3.1 Arbitrage
Slide 7
• Central problem: Determine pi
• Absence of arbitrage: no investor can get a guaranteed nonnegative

payoff out of a negative investment. In other words, any portfolio that pays

off nonnegative amounts in every state of nature, must have nonnegative

cost.

if Rx ≥ 0, then p′ x ≥ 0.
Slide 8

2
• Theorem: The absence of arbitrage condition holds if and only if there

exists a nonnegative vector q = (q1 , . . . , qm ), such that the price of each

asset i is given by

pi = Σ_{s=1}^{m} qs rsi .

• Applications to options pricing

4 Cones and extreme rays


4.1 Definitions
Slide 9
• A set C ⊂ ℜn is a cone if λx ∈ C for all λ ≥ 0 and all x ∈ C
• A polyhedron of the form P = {x ∈ ℜn | Ax ≥ 0} is called a polyhedral

cone

4.2 Applications
Slide 10
• P = { x ∈ ℜn | Ax ≥ b }, y ∈ P
• The recession cone at y:

RC = { d ∈ ℜn | y + λd ∈ P, ∀ λ ≥ 0 }

• It turns out that

RC = { d ∈ ℜn | Ad ≥ 0 }

• RC independent of y
Slide 11

4.3 Extreme rays


Slide 12
A nonzero element x of a polyhedral cone C ⊂ ℜn is called an extreme ray if there are n − 1 linearly independent constraints that are active at x.

4.4 Unbounded LPs


Slide 13
Theorem: Consider the problem of minimizing c′ x over a polyhedral cone C = {x ∈ ℜn | a′i x ≥ 0, i = 1, . . . , m} that has zero as an extreme point. The optimal cost is equal to −∞ if and only if some extreme ray d of C satisfies c′ d < 0.

Theorem: Consider the problem of minimizing c′ x subject to Ax ≥ b, and assume that the feasible set has at least one extreme point. The optimal cost is equal to −∞ if and only if some extreme ray d of the feasible set satisfies c′ d < 0. Slide 14

What happens when the simplex method detects an unbounded problem?

[Figure: (a) a polyhedral cone {x | a′1 x ≥ 0, a′2 x ≥ 0} with its extreme rays; (b) a polyhedron, a point y in it, and its recession cone spanned by the extreme rays w1 , w2 .]
5 Resolution Theorem
Slide 15
P = x ∈ ℜn | Ax ≥ b
� �

be a nonempty polyhedron with at least one extreme point. Let x1 , . . . , xk be


the extreme points, and let w 1 , . . . , w r be a complete set of extreme rays of P .
� k r k

� � � �
i j �
Q= λi x + θj w � λi ≥ 0, θj ≥ 0, λi = 1 .
i=1 j=1 i=1

Then, Q = P .

5.1 Example
Slide 16
x1 − x2 ≥ −2
x1 + x2 ≥ 1
x1 , x2 ≥ 0
Slide 17
• Extreme points: x1 = (0, 2), x2 = (0, 1), and x3 = (1, 0).
• Extreme rays w1 = (1, 1) and w2 = (1, 0).
• y = (2, 2) = (0, 1) + (1, 1) + (1, 0) = x2 + w1 + w2 .

5.2 Proof
Slide 18
• Q ⊂ P . Let x ∈ Q:

x = Σ_{i=1}^{k} λi xi + Σ_{j=1}^{r} θj wj

λi , θj ≥ 0, Σ_{i=1}^{k} λi = 1.

• y = Σ_{i=1}^{k} λi xi ∈ P and satisfies Ay ≥ b.

• Awj ≥ 0 for every j: z = Σ_{j=1}^{r} θj wj satisfies Az ≥ 0.
• x = y + z satisfies Ax ≥ b and belongs to P .
Slide 19
For the reverse, assume there is a z ∈ P such that z ∉ Q.

max Σ_{i=1}^{k} 0 · λi + Σ_{j=1}^{r} 0 · θj

s.t. Σ_{i=1}^{k} λi xi + Σ_{j=1}^{r} θj wj = z

Σ_{i=1}^{k} λi = 1

λi ≥ 0, i = 1, . . . , k,
θj ≥ 0, j = 1, . . . , r.

Is this feasible? Slide 20
• Dual
min p′ z + q
s.t. p′ xi + q ≥ 0, i = 1, . . . , k,
p′ wj ≥ 0, j = 1, . . . , r.
• This is unbounded. Why?
• There exists a feasible solution (p, q) whose cost p′ z + q < 0
• p′ z < p′ xi for all i and p′ wj ≥ 0 for all j.
Slide 21

min p′ x
s.t. Ax ≥ b.

• If the optimal cost is finite, there exists an extreme point xi which is

optimal. Since z is a feasible solution, we obtain p′ xi ≤ p′ z, which is a

contradiction.

• If the optimal cost is −∞, there exists an extreme ray wj such that

p′ w j < 0, which is again a contradiction

15.081J/6.251J Introduction to Mathematical
Programming

Lecture 11: Duality Theory IV


1 Outline
Slide 1
• Overview and objectives
• Weistrass Theorem
• Separating hyperplanes theorem
• Farkas lemma revisited
• Duality theorem revisited

2 Overview and objectives


Slide 2
• So far: Simplex −→ Duality −→ Farkas lemma
• Disadvantages: specialized to LP, relied on a particular algorithm

• Plan today: Separation (A Geometric property) −→ Farkas lemma −→

Duality

• Purely geometric, generalizes to general nonlinear problems, more fundamental

3 Closed sets
Slide 3
• A set S ⊂ ℜn is closed if x1 , x2 , . . . is a sequence of elements of S that

converges to some x ∈ ℜn , then x ∈ S.

• Every polyhedron is closed.

4 Weierstrass’ theorem
Slide 4
If f : ℜn → ℜ is a continuous function, and if S is a nonempty, closed, and
bounded subset of ℜn , then there exists some x∗ ∈ S such that f (x∗ ) ≤ f (x)
for all x ∈ S. Similarly, there exists some y ∗ ∈ S such that f (y ∗ ) ≥ f (x) for
all x ∈ S.
Note: Weierstrass’ theorem is not valid if the set S is not closed. Consider,
S = {x ∈ ℜ | x > 0}, f (x) = x

5 Separation
Slide 5
Theorem: Let S be a nonempty closed convex subset of ℜn and let x∗ ∈ ℜn , x∗ ∉ S. Then, there exists some vector c ∈ ℜn such that c′ x∗ < c′ x for all x ∈ S.

[Figure: a closed convex set S, a point x∗ ∉ S, the ball B around x∗ through w ∈ S, the closest point y ∈ S ∩ B, and the separating direction c = y − x∗ .]

5.1 Proof
Slide 6
• Fix w ∈ S
• B = { x | ‖x − x∗ ‖ ≤ ‖w − x∗ ‖ }
• D = S ∩ B
• D ≠ ∅, closed and bounded. Why?
• Consider min_{x∈D} ‖x − x∗ ‖
Slide 7
Slide 8
• By Weierstrass' theorem there exists some y ∈ D such that

‖y − x∗ ‖ ≤ ‖x − x∗ ‖, ∀ x ∈ D.

• ∀x ∈ S with x ∉ D: ‖x − x∗ ‖ > ‖w − x∗ ‖ ≥ ‖y − x∗ ‖.
• So y minimizes ‖x − x∗ ‖ over all x ∈ S.
• Let c = y − x∗
Slide 9
• x ∈ S. ∀λ satisfying 0 < λ ≤ 1, y + λ(x − y) ∈ S (S convex)
• ‖y − x∗ ‖^2 ≤ ‖y + λ(x − y) − x∗ ‖^2

= ‖y − x∗ ‖^2 + 2λ(y − x∗ )′ (x − y) + λ^2 ‖x − y‖^2

• 2λ(y − x∗ )′ (x − y) + λ^2 ‖x − y‖^2 ≥ 0.

• Divide by λ and let λ → 0: (y − x∗ )′ (x − y) ≥ 0, i.e.,

(y − x∗ )′ x ≥ (y − x∗ )′ y
= (y − x∗ )′ x∗ + (y − x∗ )′ (y − x∗ )
> (y − x∗ )′ x∗ .

• c = y − x∗ proves the theorem

6 Farkas’ lemma
Slide 10
Theorem: If Ax = b, x ≥ 0 is infeasible, then there exists a vector p such that
p′ A ≥ 0′ and p′ b < 0.
� � �
• S = y � there exists x such that y = Ax, x ≥ 0 b ∈ / S.

• S is convex; nonempty; closed;

S is the projection of {(x, y) | y = Ax, x ≥ 0} onto the y coordinates,

is itself a polyhedron and is therefore closed.

/ S: ∃p such that p′ b < p′ y for every y ∈ S.


• b ∈
• Since 0 ∈ S, we must have p′ b < 0.
• ∀Ai and ∀λ > 0, λAi ∈ S and p′ b < λp′ Ai
• Divide by λ and then take limit as λ tends to infinity: p′ Ai ≥ 0 ⇒ p′ A ≥
0′

7 Duality theorem
Slide 11
min c′ x max p′ b
s.t. Ax ≥ b s.t. p′ A = c ′
p≥0
and we assume that the primal has an optimal solution x∗ . We will show that
the dual problem also has a feasible solution with the same cost. Strong duality
follows then from weak duality. Slide 12

• I = {i | a′i x∗ = bi }
• We next show: if a′i d ≥ 0 for every i ∈ I, then c′ d ≥ 0

3
• a′i (x∗ + ǫd) ≥ ai x∗ = bi for all i ∈ I.
/ I, a′i x∗ > bi hence a′i (x∗ + ǫd) > bi .
• If i ∈
• x∗ + ǫd is feasible
Slide 13

• By optimality x∗ , c′ d ≥ 0
• By Farkas’ lemma �
c= pi a i .
i∈I

/ I, we define pi = 0, so p′ A = c′ .
• For i ∈
• � �
p′ b = pi b i = pi ai′ x∗ = c′ x∗ ,
i∈I i∈I

4
MIT OpenCourseWare
http://ocw.mit.edu

6.251J / 15.081J Introduction to Mathematical Programming


Fall 2009

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
15.081J/6.251J Introduction to Mathematical
Programming

Lecture 12: Sensitivity Analysis


1 Motivation
1.1 Questions
Slide 1
z = min c′ x
s.t. Ax = b

x ≥ 0

• How does z depend globally on c? on b?


• How does z change locally if either b, c, A change?
• How does z change if we add new constraints, introduce new variables?
• Importance: Insight about LO and practical relevance

2 Outline
Slide 2
1. Global sensitivity analysis
2. Local sensitivity analysis
(a) Changes in b
(b) Changes in c
(c) A new variable is added
(d) A new constraint is added
(e) Changes in A
3. Detailed example

3 Global sensitivity analysis


3.1 Dependence on c
Slide 3
G(c) = min c′ x
s.t. Ax = b
x≥0
i
G(c) = mini=1,...,N c′ x is a concave function of c

3.2 Dependence on b
Slide 4
Primal Dual
F (b) = min c′ x
F (b) = max p′ b
s.t. Ax = b
s.t. p′ A ≤ c′
x≥0
F (b) = maxi=1,...,N (pi )′ b is a convex function of b

1
( c + q d) ' ( x3)

( c + q d) ' ( x2
)

( c + q d) ' ( x1) ( c + q d) ' ( x4


)

x1 o p t i m a l
. x2 o p t i m a l
. x3 o p t i m a l
. x4 o p t i m a l q

f( q)

( p1) ' ( b* + q d)

( p3) ' ( b* + q d)
( p2) ' ( b* + q d)

q1 q2 q

4 Local sensitivity analysis


Slide 5
z = min c′ x
s.t. Ax = b
x≥0
What does it mean that a basis B is optimal?

1. Feasibility conditions: B −1 b ≥ 0
2. Optimality conditions: c′ − cB

B −1 A ≥ 0′
Slide 6
• Suppose that there is a change in either b or c for example
• How do we find whether B is still optimal?
• Need to check whether the feasibility and optimality conditions are satis­

fied

5 Local sensitivity analysis


5.1 Changes in b
Slide 7
bi becomes bi + Δ, i.e.
(P ) min c′ x (P ′ ) min c′ x
s.t. Ax = b → s.t. Ax = b + Δei

x≥0 x ≥ 0

• B optimal basis for (P )


• Is B optimal for (P ′ )?
Slide 8
Need to check:

1. Feasibility: B −1 (b + Δei ) ≥ 0
2. Optimality: c′ − cB

B −1 A ≥ 0′

Observations:
1. Changes in b affect feasibility
2. Optimality conditions are not affected
Slide 9
B −1 (b + Δei ) ≥ 0
βij = [B −1 ]ij
bj = [B −1 b]j
Thus,
(B −1 b)j + Δ(B −1 ei )j ≥ 0 ⇒ bj + Δβji ≥ 0 ⇒

3
   
bj bj
max − ≤ Δ ≤ min −
βji >0 βji βji <0 βji
Slide 10

Δ≤Δ≤Δ

Within this range


• Current basis B is optimal
• z = c′B B −1 (b + Δei ) = cB

B −1 b + Δpi
• What if Δ = Δ?
• What if Δ > Δ?
Current solution is infeasible, but satisfies optimality conditions → use

dual simplex method

5.2 Changes in c
Slide 11
cj → cj + Δ

Is current basis B optimal?

Need to check:

1. Feasibility: B −1 b ≥ 0, unaffected
2. Optimality: c′ − cB

B −1 A ≥ 0′ , affected

There are two cases:


• xj basic

• xj nonbasic

5.2.1 xj nonbasic
Slide 12
cB unaffected
(cj + Δ) − c′B B −1 Aj ≥ 0 ⇒ cj + Δ ≥ 0
Solution optimal if Δ ≥ −cj
What if Δ = −cj ?
What if Δ < −cj ?

4
5.2.2 xj basic
Slide 13

cB ← ĉB = cB + Δej

Then,
[c′ − ĉB

B −1 A]i ≥ 0 ⇒ ci − [cB + Δej ]′ B −1 Ai ≥ 0
[B −1 A]ji = aji
ci ci
ci − Δaji ≥ 0 ⇒ max ≤ Δ ≤ min
aji <0 aji aji >0 aji

What if Δ is outside this range? use primal simplex

5.3 A new variable is added


Slide 14
min c′ x min c′ x + cn+1 xn+1
s.t. Ax = b → s.t. Ax + An+1 xn+1 = b
x≥0 x≥0
In the new problem is xn+1 = 0 or xn+1 > 0? (i.e., is the new activity prof­
itable?) Slide 15
Current basis B. Is solution x = B −1 b, xn+1 = 0 optimal?

• Feasibility conditions are satisfied


• Optimality conditions:

cn+1 − c′B B −1 An+1 ≥ 0 ⇒ cn+1 − p′ An+1 ≥ 0?

• If yes, solution x = B −1 b, xn+1 = 0 optimal


• Otherwise, use primal simplex

5.4 A new constraint is added


Slide 16
′ min c′ x
min c x
s.t. Ax = b
s.t. Ax = b →
a′m+1 x = bm+1
x≥0
x≥0
If current solution feasible, it is optimal; otherwise, apply dual simplex

5
5.5 Changes in A
Slide 17
• Suppose aij ← aij + Δ
• Assume Aj does not belong in the basis
• Feasibility conditions: B −1 b ≥ 0, unaffected
• Optimality conditions: cl − c′B B −1 Al ≥ 0, l 6= j, unaffected
• Optimality condition: cj − p′ (Aj + Δei ) ≥ 0 ⇒ cj − Δpi ≥ 0

• What if Aj is basic? BT, Exer. 5.3

6 Example
6.1 A Furniture company
Slide 18
• A furniture company makes desks, tables, chairs
• The production requires wood, finishing labor, carpentry labor

Desk Table (ft) Chair Avail.


Profit 60 30 20 -
Wood (ft) 8 6 1 48
Finish hrs. 4 2 1.5 20
Carpentry hrs. 2 1.5 0.5 8

6.2 Formulation
Slide 19
Decision variables:
x1 = # desks, x2 = # tables, x3 = # chairs

max 60x1 + 30x2 + 20x3


s.t. 8x1 + 6x2 + x3 ≤ 48
4x1 + 2x2 + 1.5x3 ≤ 20
2x1 + 1.5x2 + 0.5x3 ≤8
x1 , x2 , x3 ≥0

6.3 Simplex tableaus


Slide 20
Initial tableau: s1 s2 s3 x1 x2 x3
0 0 0 0 -60 -30 -20
s1 = 48 1 8 6 1
s2 = 20 1 4 2 1.5
s2 = 8 1 2 1.5 0.5

6
Final tableau: s1 s2 s3 x1 x2 x3
280 0 10 10 0 5 0
s1 = 24 1 2 -8 0 -2 0
x3 = 8 0 2 -4 0 -2 1
x1 = 2 0 -0.5 1.5 1 1.25 0

6.4 Information in tableaus


Slide 21
• What is B?  
1 1 8
B= 0 1.5 4 
0 0.5 2

• What is B −1 ?  
1 2 −8
B −1 = 0 2 −4 
0 −0.5 1.5
Slide 22
• What is the optimal solution?
• What is the optimal solution value?
• Is it a bit surprising?
• What is the optimal dual solution?
• What is the shadow price of the wood constraint?
• What is the shadow price of the finishing hours constraint?
• What is the reduced cost for x2 ?

6.5 Shadow prices


Slide 23
Why the dual price of the finishing hours constraint is 10?

• Suppose that finishing hours become 21 (from 20).


• Currently only desks (x1 ) and chairs (x3 ) are produced
• Finishing and carpentry hours constraints are tight
• Does this change leaves current basis optimal?
Slide 24
New Previous
8x1 + x3 + s1 = 48 s1 = 26 24
New solution:
4x1 + 1.5x3 = 21 ⇒ x1 = 1.5 2
2x1 + 0.5x3 =8 x3 = 10 8
Solution change:
z ′ − z = (60 ∗ 1.5 + 20 ∗ 10) − (60 ∗ 2 + 20 ∗ 8) = 10
Slide 25

7
• Suppose you can hire 1h of finishing overtime at $7. Would you do it?
• Another check
 
1 2 −8
c′B B −1 = (0, −20, −60)  0 2 −4  =
0 −0.5 1.5

(0, −10, −10)

6.6 Reduced costs


Slide 26
• What does it mean that the reduced cost for x2 is 5?
• Suppose you are forced to produce x2 = 1 (1 table)
• How much will the profit decrease?

8x1 + x3 + s1 + 6·1 = 48 s1 = 26
4x1 + 1.5x3 + 2·1 = 20 ⇒ x1 = 0.75
2x1 + 0.5x3 + 1.5 · 1 = 8 x3 = 10
z ′ − z = (60 ∗ 0.75 + 20 ∗ 10) − (60 ∗ 2 + 20 ∗ 8 + 30 ∗ 1) = −35 + 30 = −5 Slide 27
Another way to calculate the same thing: If x2 = 1

Direct profit from table +30


Decrease wood by -6 −6 ∗ 0 = 0
Decrease finishing hours by -2 −2 ∗ 10 = −20
Decrease carpentry hours by -1.5 −1.5 ∗ 10 = −15
Total Effect −5

Suppose profit from tables increases from $30 to $34. Should it be produced?
At $35? At $36?

6.7 Cost ranges


Slide 28
Suppose profit from desks becomes 60 + Δ. For what values of Δ does current

basis remain optimal?

Optimality conditions:

cj − c′B B −1 Aj ≥ 0 ⇒
1 2 −8
" #

p = c′B B −1 = [0, −20, −(60 + Δ)] 0 2 −4

0 −0.5 1.5

= −[0, 10 − 0.5Δ, 10 + 1.5Δ]


Slide 29
s1 , x3 , x1 are basic
Reduced costs of non-basic variables

8
 
6
c2 = c2 − p′ A2 = −30 + [0, 10 − 0.5Δ, 10 + 1.5Δ]  2  = 5 + 1.25Δ
1.5
cs2 = 10 − 0.5Δ
cs3 = 10 + 1.5Δ
Current basis optimal:

5 + 1.25Δ ≥ 0 
10 − 0.5Δ ≥ 0 −4 ≤ Δ ≤ 20
10 + 1.5Δ ≥ 0

⇒ 56 ≤ c1 ≤ 80 solution remains optimal.


If c1 < 56, or c1 > 80 current basis is not optimal.
Suppose c1 = 100(Δ = 40) What would you do?

6.8 Rhs ranges


Slide 30
Suppose
 finishing hours
 change by Δ  becoming
 (20+ Δ) What happens?
48 1 2 −8 48
B −1  20 + Δ  =  0 2 −4   20 + Δ 
8 0 −0.5 1.5 8
 
24 + 2Δ

=  8 + 2Δ  ≥ 0

2 − 0.5Δ

⇒ −4 ≤ Δ ≤ 4 current basis optimal Slide 31


Note that even if current basis is optimal, optimal solution variables change:

s1 = 24 + 2Δ
x3 = 8 + 2Δ
x1 = 2 − 0.5Δ
z = 60(2 − 0.5Δ) + 20(8 + 2Δ) = 280 + 10Δ
Slide 32
Suppose
 Δ =
 10 then

s1 44
 x3  =  25  ← inf. (Use dual simplex)
x1 −3

6.9 New activity


Slide 33
Suppose the company has the opportunity to produce stools
Profit $15; requires 1 ft of wood, 1 finishing hour, 1 carpentry hour
Should the company produce stools?

max 60x1 +30x2 +20x3 +15x4

8x1 +6x2 +x3 +x4 +s1 = 48

4x1 +2x2 +1.5x3 +x4 +s2 = 20

2x1 +1.5x2 +0.5x3 +x4 +s3 = 8

xi ≥ 0

9
1
!
c4 −c′B B −1 A4 = −15 − (0, −10, −10) 1 =5≥0
1
Current basis still optimal. Do not produce stools

10
MIT OpenCourseWare
http://ocw.mit.edu

6.251J / 15.081J Introduction to Mathematical Programming


Fall 2009

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
15.081J/6.251J Introduction to Mathematical
Programming

Lecture 13: Robust Optimization


1 Papers
Slide 1
• B. and Sim, The Price of Robustness, Operations Research, 2003.
• B. and Sim, Robust Discrete optimization, Mathematical Programming,

2003.

2 Structure
Slide 2
• Motivation

• Data Uncertainty

• Robust Mixed Integer Optimization

• Robust 0-1 Optimization

• Robust Approximation Algorithms

• Robust Network Flows

• Experimental Results

• Summary and Conclusions

3 Motivation
Slide 3
• The classical paradigm in optimization is to develop a model that assumes

that the input data is precisely known and equal to some nominal values.

This approach, however, does not take into account the influence of data

uncertainties on the quality and feasibility of the model.

• Can we design solution approaches that are immune to data uncertainty,

that is they are robust?

Slide 4

• Ben-Tal and Nemirovski (2000):


In real-world applications of Linear Optimization (Net Lib li­
brary), one cannot ignore the possibility that a small uncer­
tainty in the data can make the usual optimal solution com­
pletely meaningless from a practical viewpoint.

1
3.1 Literature
Slide 5
• Ellipsoidal uncertainty; Robust convex optimization Ben-Tal and Nemirovski

(1997), El-Ghaoui et. al (1996)

• Flexible adjustment of conservativism


• Nonlinear convex models
• Not extendable to discrete optimization

4 Goal
Slide 6
Develop an approach to address data uncertainty for optimization problems
that:
• It allows to control the degree of conservatism of the solution;
• It is computationally tractable both practically and theoretically.

5 Data Uncertainty
Slide 7
minimize c′ x
subject to Ax ≤ b
l≤x≤u
xi ∈ Z, i = 1, . . . , k,
WLOG data uncertainty affects only A and c, but not the vector b. Slide 8

• (Uncertainty for matrix A): aij , j ∈ Ji is independent, symmetric

and bounded random variable (but with unknown distribution) ãij , j ∈ Ji

that takes values in [aij − âij , aij + âij ].

• (Uncertainty for cost vector c): cj , j ∈ J0 takes values in [cj , cj + dj ].

6 Robust MIP
Slide 9
• Consider an integer Γi ∈ [0, |Ji |], i = 0, 1, . . . , m.
• Γi adjusts the robustness of the proposed method against the level of

conservativeness of the solution.

• Speaking intuitively, it is unlikely that all of the aij , j ∈ Ji will change.

We want to be protected against all cases that up to Γi of the aij ’s are

allowed to change.

Slide 10
• Nature will be restricted in its behavior, in that only a subset of the

coefficients will change in order to adversely affect the solution.

2
• We will guarantee that if nature behaves like this then the robust solution

will be feasible deterministically. Even if more than Γi change, then the

robust solution will be feasible with very high probability.

6.1 Problem
( ) Slide 11
X

minimize c x+ max dj |xj |
{S0 | S0 ⊆J0 ,|S0 |≤Γ0 }
j∈S0
( )
X X
subject to aij xj + max ˆij |xj |
a ≤ bi , ∀i
{Si | Si ⊆Ji ,|Si |≤Γi }
j j∈Si
l ≤ x ≤
u
xi ∈ Z, ∀i = 1, . . . k.

6.2 Theorem 1
Slide 12
The robust problem can be reformulated has an equivalent MIP:
P
minimize c ′ x + z 0 Γ0 + p
X j∈J0X0j
subject to aij xj + zi Γi + pij ≤ bi ∀i
j j∈Ji
z0 + p0j ≥ dj yj ∀j ∈ J0
zi + pij ≥ âij yj ∀i �= 0, j ∈ Ji
pij , yj , zi ≥ 0 ∀i, j ∈ Ji
−yj ≤ xj ≤ yj ∀j
lj ≤ xj ≤ uj ∀j
xi ∈ Z i = 1, . . . , k.

6.3 Proof
Slide 13
Given a vector x∗ , we define:
( )
X
βi (x∗ ) = max âij |x∗j | .
{Si | Si ⊆Ji ,|Si |=Γi }
j∈Si

This equals to: X


βi (x∗ ) = max âij |xj∗ |zij
j∈Ji
X
s.t. zij ≤ Γi
j∈Ji
0 ≤ zij ≤ 1 ∀i, j ∈ Ji .
Slide 14
Dual: X
βi (x∗ ) = min pij + Γi zi
j∈Ji
s.t. zi + pij ≥ âij |x∗j | ∀j ∈ Ji
pij ≥ 0 ∀j ∈ Ji
zi ≥ 0 ∀i.

3
|Ji | Γi

5 5

10 8.3565

100 24.263

200 33.899

Table 1: Choice of Γi as a function of |Ji | so that the probability of constraint


violation is less than 1%.

6.4 Size
Slide 15
• Original Problem has n variables and m constraints
Pm
• Robust counterpart has 2n + m + l variables, where l = i=0 |Ji | is the

number of uncertain coefficients, and 2n + m + l constraints.

6.5 Probabilistic Guarantee


6.5.1 Theorem 2
Slide 16
Let x∗ be an optimal solution of robust MIP.
(a) If A is subject to the model of data uncertainty U:
!  
n   n  

1

X X n X n
Pr ãij x∗j > bi ≤ (1 − µ) +µ ,

2n  l l 
j l=⌊ν⌋ l=⌊ν⌋+1

n = |Ji |, ν = Γi2+n and µ = ν − ⌊ν⌋; bound is tight.


(b) As n → ∞
 
n   n    
1 Γi − 1
 X n X n
(1 − µ) +µ ∼1−Φ √ .
2n  l l  n
l=⌊ν⌋ l=⌊ν⌋+1
Slide 17
Slide 18

7 Experimental Results
7.1 Knapsack Problems
• Slide 19
X
maximize ci xi
i∈N
X
subject to wi xi ≤ b
i∈N
x ∈ {0, 1}n.

4
0
10
Approx bound
Bound 2

−1
10

−2
10

−3
10

−4
10
0 1 2 3 4 5 6 7 8 9 10
Γi

Γ Violation Probability Optimal Value Reduction


0 0.5 5592 0%
2.8 4.49 × 10−1 5585 0.13%
36.8 5.71 × 10−3 5506 1.54%
82.0 5.04 × 10−9 5408 3.29%
200 0 5283 5.50%

• w̃i are independently distributed and follow symmetric distributions in

[wi − δi , wi + δi ];

• c is not subject to data uncertainty.

7.1.1 Data
Slide 20
• |N | = 200, b = 4000,
• wi randomly chosen from {20, 21, . . . , 29}.
• ci randomly chosen from {16, 17, . . . , 77}.
• δi = 0.1wi .

5
7.1.2 Results
Slide 21

8 Robust 0-1 Optimization


Slide 22

• Nominal combinatorial optimization:

minimize c′ x
subject to x ∈ X ⊂ {0, 1}n .

• Robust Counterpart:
X
Z∗ = minimize c′ x + max dj x j
{S| S⊆J,|S|=Γ}
j∈S

subject to x ∈ X,

• WLOG d1 ≥ d2 ≥ . . . ≥ dn .

8.1 Remarks
Slide 23

• Examples: the shortest path, the minimum spanning tree, the minimum
assignment, the traveling salesman, the vehicle routing and matroid inter­
section problems.

• Other approaches to robustness are hard. Scenario based uncertainty:

minimize max(c′1 x, c′2 x)


subject to x ∈ X.

is NP-hard for the shortest path problem.

8.2 Approach
X Slide 24

Primal :Z ∗ = min c′ x + max dj xj uj


x∈X
j
s.t. 0 ≤ uj ≤ 1, ∀j
X
uj ≤ Γ
j
X
Dual :Z ∗ = min c′ x + min θΓ + yj
x∈X
j
s.t. yj + θ ≥ dj xj , ∀j
yj , θ ≥ 0

6
8.3 Algorithm A
Slide 25
• Solution: yj = max(dj xj − θ, 0)
• X
Z∗ = min θΓ + (cj xj + max(dj xj − θ, 0))
x∈X,θ≥0
j

• Since X ⊂ {0, 1}n,


max(dj xj − θ, 0) = max(dj − θ, 0) xj

• X
Z∗ = min θΓ + (cj + max(dj − θ, 0)) xj
x∈X,θ≥0
j
Slide 26
• d1 ≥ d2 ≥ . . . ≥ dn ≥ dn+1 = 0.
• For dl ≥ θ ≥ dl+1 ,
n l
X X
min θΓ + cj xj + (dj − θ)xj =
x∈X,dl ≥θ≥dl+1
j=1 j=1

n l
X X
dl Γ + min cj xj + (dj − dl )xj = Zl
x∈X
j=1 j=1

n l
X X
Z∗ = min dl Γ + min cj xj + (dj − dl )xj .
l=1,...,n+1 x∈X
j=1 j=1

8.4 Theorem 3
Slide 27
• Algorithm A correctly solves the robust 0-1 optimization problem.
• It requires at most |J| + 1 solutions of nominal problems. Thus, If the

nominal problem is polynomially time solvable, then the robust 0-1 coun­

terpart is also polynomially solvable.

• Robust minimum spanning tree, minimum assignment, minimum match­

ing, shortest path and matroid intersection, are polynomially solvable.

9 Experimental Results
9.1 Robust Sorting
X Slide 28
minimize ci xi
i∈N
X
subject to xi = k
i∈N
x ∈ {0, 1}n .

7
Γ ¯
Z(Γ) ¯
% change in Z(Γ)
σ(Γ) % change in σ(Γ)
0 8822 0 %
501.0 0.0 %
10 8827 0.056 %
493.1 -1.6 %
20 8923 1.145 %
471.9 -5.8 %
30 9059 2.686 %
454.3 -9.3 %
40 9627 9.125 %
396.3 -20.9 %
50 10049 13.91 %
371.6 -25.8 %
60 10146 15.00 %
365.7 -27.0 %
70 10355 17.38 %
352.9 -29.6 %
80 10619 20.37 %
342.5 -31.6 %
100 10619 20.37 %
340.1 -32.1 %

X
Z ∗ (Γ) = minimize c′ x + max dj x j
{S| S⊆J,|S|=Γ}
j∈S
X
subject to xi = k
i∈N
x ∈ {0, 1}n .

9.1.1 Data
Slide 29
• |N | = 200;
• k = 100;
• cj ∼ U [50, 200]; dj ∼ U [20, 200];
• For testing robustness, generate instances such that each cost component

independently deviates with probability ρ = 0.2 from the nominal value

cj to cj + dj .

9.1.2 Results
Slide 30
10 Robust Network Flows
Slide 31
• Nominal
X
min cij xij

(i,j)∈A

X X
s.t. xij − xji = bi ∀i ∈ N
{j:(i,j)∈A} {j:(j,i)∈A}

0 ≤ xij ≤ uij ∀(i, j) ∈ A.

• X set of feasible solutions flows.


• Robust X
Z ∗ = min c′ x + max dij xij
{S| S⊆A,|S|≤Γ}
(i,j)∈S
subject to x ∈ X.

8
(cost, capacity)
(cij, uij)
i j

) j’ ( 0,
,� �)
(d ij

i i’ j
(cij, uij) (0, θ/ dij)

10.1 Reformulation
Slide 32

Z ∗ = min Z(θ),
θ≥0
X

Z(θ) = Γθ + min c x+ pij
(i,j)∈A
subject to pij ≥ dij xij − θ ∀(i, j) ∈ A
pij ≥ 0 ∀(i, j) ∈ A
x ∈ X.
• Equivalently
 

X θ
Z(θ) = Γθ + min c x+ dij max xij − ,0
dij
(i,j)∈A

subject to x ∈ X.

10.2 Network Reformulation


Slide 33
Theorem: For fixed θ we can solve the robust problem as a network flow problem

10.3 Complexity
Slide 34
• Z(θ) is a convex function and for all θ1 , θ2 ≥ 0, we have

|Z(θ1 ) − Z(θ2 )| ≤ |A||θ1 − θ2 |.

ˆ ∈
X with robust

• For any fixed Γ ≤ |A| and every ǫ > 0, we can find a solution x
objective value X
Ẑ = c′ x̂ + max dij x̂ij
{S| S⊆A,|S|≤Γ}
(i,j)∈S

such that

Z ∗ ≤ Ẑ ≤ (1 + ǫ)Z ∗

by solving 2⌈log 2 (|A|θ/ǫ)⌉ + 3 network flow problems, where θ = max{uij dij :


(i, j) ∈ A}.

9
3000
Γ=0
Γ=3
Γ=6
Γ = 10
2500

2000

1500

1000

500

3 4 5 6 7 8 9
Distributions of path cost

11 Experimental Results
Slide 35
12 Conclusions
Slide 36
• Robust counterpart of a MIP remains a MIP, of comparable size.
• Approach permits flexibility of adjusting the level of conservatism in terms of
probabilistic bound of constraint violation

• For polynomial solvable 0-1 optimization problems with cost uncertainty, the

robust counterpart is polynomial solvable.


Slide 37
• Robust network flows are solvable as a series of nominal network flow problems.
• Robust optimization is tractable for stochastic optimization problems without
the curse of dimensionality

10
MIT OpenCourseWare
http://ocw.mit.edu

6.251J / 15.081J Introduction to Mathematical Programming


Fall 2009

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
15.081J/6.251J Introduction to Mathematical
Programming

Lecture 14: Large Scale Optimization, I


1 Outline
Slide 1
1. The idea of column generation
2. The cutting stock problem
3. Stochastic programming

2 Column Generation
Slide 2
• For x ∈ ℜn and n large consider the LOP:

min c′ x
s.t. Ax = b
x≥0

• Restricted problem �
min ci xi
i∈I

s.t. Ai xi = b
i∈I
x≥0

2.1 Two Key Ideas


Slide 3
• Generate columns Aj only as needed.
• Calculate mini ci efficiently without enumerating all columns.

3 The Cutting Stock Problem


Slide 4
• Company has a supply of large rolls of paper of width W .
• bi rolls of width wi , i = 1, . . . , m need to be produced.
• Example: w = 70 inches, can be cut in 3 rolls of width w1 = 17 and 1 roll

of width w2 = 15, waste:

70 − (3 × 17 + 1 × 15) = 4
Slide 5
• Given w1 , . . . , wm and W there are many cutting patterns: (3, 1) and (2, 2)

for example

3 × 17 + 1 × 15 ≤ 70
2 × 17 + 2 × 15 ≤ 70

1
• Pattern: (a1 , . . . , am ) integers:


ai wi ≤ W
i=1

3.1 Problem
Slide 6
• Given wi , bi , i = 1, . . . , m (bi : number of rolls of width wi demanded,

and W (width of large rolls):

• Find how to cut the large rolls in order to minimize the number of rolls

used.

3.2 Concrete Example


Slide 7
• What is the solution for W = 70, w1 = 21, w2 = 9, b1 = 20, b2 = 21?
• feasible patterns: (2, 3), (3, 0), (0, 7), (2, 0)
• Solution 1: (2, 3) : 7 rolls; (3, 0) : 2 rolls: 9 rolls total

• Solution 2: (0, 7) : 3, (3, 0) : 6, (2, 0) : 1 : 10 rolls total


Slide 8
• W = 70, w1 = 20, w2 = 11, b1 = 12, b2 = 17
• Feasible patterns: 10 , 20 , 30 , 01 , 11 , 21 , 02 , 12 , 22 , 03 , 13 ,

�� �� �� �� � � �� �� �� �� �� ��
�0� �1� �0� �0

4 , 4 , 5 , 6

• x1 , . . . , x15 = # of feasible patterns of the type 10 , . . . , 06 respectively


�� ��

min x1 �+ ·�· · + x15

� � � � � �
1 2 0 12
s.t. x1 + x2 + · · · + x15 =
0 0 6 17
x1 , . . . , x15 ≥ 0
Slide 9
� � � � � � � �
0 0 3 12
• Example: 2 +1 +4 = 7 rolls used
6 5 0 17
� � � � � � � �
0 0 3 12
4 + +4 = 9 rolls used
4 1 0 17
• Any ideas?

2
3.3 Formulation
Slide 10
Decision variables: xj = number of rolls cut by pattern j characterized by vector
Aj :
�n
min xj
j=1 
b1
n
Aj · xj =  ... 
�  
j=1
bm
xj ≥ 0 ( integer)
Slide 11
• Huge number of variables.
• Can we apply column generation, that is generate the patterns Aj on the

fly?

3.4 Algorithm
Slide 12
Idea: Generate feasible patterns as needed.
 W       
⌊ w1 ⌋ 0 0 0
 0   ⌊W ⌋   0
   0
1) Start with initial patterns:    w2
 0 , 0
, W , 
  ⌊ ⌋  
w3
 0
W
0 0 0 ⌊w4

Slide 13
2) Solve:
min x1 + · · · + xm

x1 A1 + · · · + xm Am = b

xi ≥ 0

Slide 14
3) Compute reduced costs

cj = 1 − p′ Aj for all patterns j


If cj ≥ 0 current set of patterns optimal
If cs < 0 ⇒ xs needs to enter basis
How are we going to compute reduced costs cj = 1 − p′ Aj for all j? (huge
number)

3
3.4.1 Key Idea
Slide 15
4) Solve
m

z ∗ = max p i ai
i=1
�m
s.t. wi ai ≤ W
i=1
ai ≥ 0, integer
This is the integer knapsack problem
Slide 16
• If z ∗ ≤ 1 ⇒ 1 − p′ Aj > 0 ∀j ⇒ current solution optimal
• If z ∗ > 1 ⇒ ∃ s: 1 − p′ As < 0 ⇒ Variable xs becomes basic, i.e., a new

pattern As will enter the basis.

• Perform min-ratio test and update the basis.

3.5 Dynamic Programming


Slide 17
F (u) = max p1 a1 + · · · + pm am
s.t. w1 a1 + · · · + wm am ≤ u
ai ≥ 0, integer

• For u ≤ wmin , F (u) = 0.


• For u ≥ wmin

F (u) = max {pi + F (u − wi )}

i=1,...,m

Why ?

3.6 Example
Slide 18
max 11x1 + 7x2 + 5x3 + x4
s.t. 6x1 + 4x2 + 3x3 + x4 ≤ 25
xi ≥ 0, xi integer

F (0) = 0

F (1) = 1

F (2) = 1 + F (1) = 2

Slide 19
F (3) = max(5 + F (0)∗ , 1 + F (2)) = 5

F (4) = max(7 + F (0)∗ , 5 + F (1), 1 + F (3)) = 7

F (5) = max(7 + F (1)∗ , 5 + F (2), 1 + F (4)) = 8

F (6) = max(11 + F (0)∗ , 7 + F (2), 5 + F (3), 1 + F (5)) = 11

F (7) = max(11 + F (1)∗ , 7 + F (2), 5 + F (3), 1 + F (4)) = 12

F (8) = max(11 + F (2), 7 + F (4)∗ , 5 + F (5), 1 + F (7)) = 14

F (9) = 11 + F (3) = 16

F (10) = 11 + F (4) = 18

F (u) = 11 + F (u − 6) = 16 u ≥ 11

4
⇒ F (25) = 11 + F (19) = 11 + 11 + F (13) = 11 + 11 + 11 + F (7) = 33 + 12 = 45
x∗ = (4, 0, 0, 1)

4 Stochastic Programming
4.1 Example
Slide 20
Wrenches
Pliers Cap.
Steel (lbs)
1.5
1.0 27,000
Molding machine (hrs)
1.0
1.0 21,000
Assembly machine (hrs)
0.3
0.5 9,000* Slide 21
Demand limit (tools/day)
15,000
16,000
Contribution to earnings
$130*
$100
($/1000 units)
max 130W + 100P
s.t. W ≤ 15

P ≤ 16

1.5W + P ≤ 27

W + P ≤ 21

0.3W + 0.5P ≤ 9

W, P ≥ 0

4.1.1 Random data


Slide 22
1

 8000
 with probability
• Assembly capacity is random: 2
 10, 000 with probability
 1
2
1

 160 with probability

• Contribution from wrenches: 2
 90 1
with probability

2

4.1.2 Decisions
Slide 23
• Need to decide steel capacity in the current quarter. Cost 58$/1000lbs.
• Soon after, uncertainty will be resolved.
• Next quarter, company will decide production quantities.

4.1.3 Formulation
Slide 24

5
State
Cap. W. contr.
Prob.
1
8,000 160
0.25
2
10,000 160
0.25
3
8,000 90
0.25
4
10,000 90
0.25
Decision Variables: S: steel capacity,

Pi , Wi : i = 1, . . . , 4 production plan under state i. Slide 25

max −58S + 0.25Z1 + 0.25Z2 + 0.25Z3 + 0.25Z4


s.t.

Ass. 1 0.3W1 + 0.5P1 ≤ 8

Mol. 1 W1 + P1 ≤ 21

Ste. 1 −S + 1.5W1 + P1 ≤ 0
W.d. 1 W1 ≤ 15

P.d. 1 P1 ≤ 16

Obj. 1 −Z1 + 160W1 + 100P1 = 0

Slide 26

Ass. 2 0.3W2 + 0.5P2 ≤ 8

Mol. 2 W2 + P2 ≤ 21

Ste. 2 −S + 1.5W2 + P2 ≤ 0
W.d. 2 W2 ≤ 15

P.d. 2 P2 ≤ 16

Obj. 2 −Z2 + 160W2 + 100P2 = 0

Slide 27

Ass. 3 0.3W3 + 0.5P3 ≤ 8

Mol. 3 W3 + P3 ≤ 21

Ste. 3 −S + 1.5W3 + P3 ≤ 0
W.d. 3 W3 ≤ 15

P.d. 3 P3 ≤ 16

Obj. 3 −Z3 + 160W3 + 100P3 = 0

Slide 28

Ass. 4 0.3W4 + 0.5P4 ≤ 8

Mol. 4 W4 + P4 ≤ 21

Ste. 4 −S + 1.5W4 + P4 ≤ 0
W.d. 4 W4 ≤ 15

P.d. 4 P4 ≤ 16

Obj. 4 −Z4 + 160W4 + 100P4 = 0

S, Wi , Pi ≥ 0

4.1.4 Solution
Slide 29

Solution: S = 27, 250lb.


Wi Pi
1
15,000 4,750
2
15,000 4,750
3
12,500 8,500
4
5,000 16,000

4.2 Two-stage problems


Slide 30
• Random scenarios indexed by w = 1, . . . , k. Scenario w has probability

αw .

• First stage decisions: x: Ax = b, x ≥ 0.


• Second stage decisions: yw : w = 1, . . . , k.
• Constraints:

Bw x + Dw yw = dw , yw ≥ 0.

4.2.1 Formulation
Slide 31
min c′ x + α1 f1′ y1 + ··· + αk fk′ yk
Ax =b
B1 x + D 1 y1 = d1
B2 x + D 2 y2 = d2 Slide 32
. . .. ..
.
. .
Bk x + D k yk = dk

x, y1 , y2 , . . . , yk ≥ 0.

Structure: x y1 y2 y3 y4
Objective

7
MIT OpenCourseWare
http://ocw.mit.edu

6.251J / 15.081J Introduction to Mathematical Programming


Fall 2009

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
15.081J/6.251J Introduction to Mathematical
Programming

Lecture 15: Large Scale Optimization, II


1 Outline
Slide 1
1. Dantzig-Wolfe decomposition
2. Key Idea
3. Bounds

2 Decomposition
Slide 2
min c′1 x1 + c′2 x2
s.t. D 1 x1 + D 2 x2 = b 0
F 1 x 1 = b1
F 2 x 2 = b2
x1 , x 2 ≥ 0
• Relation with stochastic programming?
• Firm’s problem

2.1 Reformulation
� � Slide 3
• Pi = xi ≥ 0 | F i xi = bi , i = 1, 2
• xji , j ∈ Ji extreme points of Pi
• w ki , k ∈ Ki , extreme rays of Pi .
• For all xi ∈ Pi � �
xi = λji xji + θik wki ,
j∈Ji k∈Ki

λji ≥ 0 and θik ≥0 �


λji = 1, i = 1, 2
j∈Ji
Slide 4
� � � �
min λj1 c′1 xj1 + θ1k c′1 w k1 + λj2 c′2 xj2 + θ2k c′2 w k2
j∈J1 k∈K1 j∈J2 k∈K2
� � �
s.t. λj1 D 1 xj1 + θ1k D 1 w k1 + λj2 D 2 xj2
j∈J1 k∈K1 j∈J2

+ θ2k D 2 w 2k = b0
k∈K2

λj1 =1
j∈J1

λj2 = 1
j∈J2

λji ≥ 0, θik ≥ 0, ∀ i, j, k.

1
Huge # variables, m0 + 2 constraints Slide 5
• A bfs is available with a basis matrix B
• p′ = c′B B −1 ; p = (q, r1 , r2 )
• Is B optimal?
• Check reduced costs

(c′1 − q ′ D 1 )xj1 − r1

(c′1 − q ′ D 1 )wk1
• Huge number of them

3 Key idea
Slide 6
Consider subproblem:

min (c′1 − q ′ D 1 )x1


s.t. x1 ∈ P1 ,
• If optimal cost of subproblem is −∞, an extreme ray w k1 is generated:

(c′1 − q ′ D 1 )w k1 < 0, i.e., reduced cost of θ1k is negative; Generate column

[D 1 wk1 , 0, 0]′

• If optimal cost is finite and smaller than r1 , then, an extreme point xj


1
is generated: (c′1 − q ′ D1 )xj1 < r1 , i.e., reduced cost of λj1 is negative;

Generate column [D1 xj1 ,1 , 0]′

• Otherwise, reduced costs are nonnegative


• Repear for subproblem:
min (c′2 − q ′ D2 )x2
s.t. x2 ∈ P2 ,

4 Remarks
Slide 7
• Economic interpretation
• Applicability of the method

min c′1 x1 + c′2 x2 + · · · + c′t xt


s.t. D1 x1 + D2 x2 + · · · + D t xt = b0
F i xi = bi , i = 1, 2, . . . , t
x1 , x2 , . . . , xt ≥ 0.

2
min c′ x
s.t. Dx = b0
F x = b
x ≥ 0,

4.1 Termination
Slide 8
• Finite termination
• Algorithm makes substantial progress in the beginning, but very slow later

on

• no faster than the revised simplex method applied to the original problem
• Storage with t subproblems
• Original: O (m0 + tm1 )2
� �

• Decomposition algorithm O (m0 + t)2 for the tableau of the master prob­

� �

lem, and t times O(m21 ) for subproblems.

• If t = 10 and if m0 = m1 is much larger than t, memory requirements for

decomposition algorithm are about 100 times smaller than revised simplex

method.

5 Example
Slide 9

min −4x1 − x2 − 6x3
s.t. 3x1 + 2x2 + 4x3 = 17
1 ≤ x1 ≤2
1 ≤ x2 ≤2
1 ≤ x3 ≤ 2.

• P = {x ∈ ℜ3 | 1 ≤ xi ≤ 2, i = 1, 2, 3}; eight extreme points;


• Master problem:
8

λj Dxj = 17,
j=1
8

λj = 1,
j=1

Slide 10
• x1 = (2, 2, 2) and x2 = (1, 1, 2); Dx1 = 18, Dx2 = 13

3
� � � �
18 13 0.2 −2.6
• B= ; B −1 =
1 1 −0.2 3.6
•  
2
= c′ x1 = − 4 − 1 − 6  2  = −22,
� �
cB(1)
2
 
1
= c′ x2 = − 4 − 1 − 6  1  = −17.
� �
cB(2)
2
� � � � � �
• p′ = q ′ r = c′B B −1 = − 22 − 17 B −1 = − 1 − 4 .
• � � �
c′ − q ′ D = − 4 − 1 − 6] − (−1) 3 2 4 = [−1 1 − 2],
optimal solution is x3 = (2, 1, 2) with optimal cost −5 ≤ r = −4
• Generate the column corresponding to λ3 .
Slide 11
x3
x2 = ( 1 ,1,2 ) ( 1 ,2 , 2 )

. .
A
B
x3 = ( 2 , 1,2 ) x1 = ( 2 , 2 , 2 )
x2
(1,1,1) ( 1 ,2 , 1)

( 2 , 1,1) ( 2 , 2 , 1)

x1

6 Starting the algorithm


Slide 12
m0

min yt

t=1

 
� � �
s.t.  λji Di xji + θik D i wki  + y = b0
i=1,2 j∈Ji k∈Ki

λj1 = 1
j∈J1

4

λj2 = 1
j∈J2

λji ≥ 0, θik ≥ 0, yt ≥ 0, ∀ i, j, k, t.

7 Bounds
Slide 13
• Optimal cost z ∗
• z cost of feasible solution obtained at some intermediate stage ofe decom­

position algorithm.

• ri be the value of the dual variable associated with the convexity constraint

for the ith subproblem

• zi optimal cost in the ith subproblem


• Then, �
z+ (zi − ri ) ≤ z ∗ ≤ z.
i

7.1 Proof
Slide 14
Dual of master problem

max q ′ b0 + r1 + r2
s.t. q ′ D 1 xj1 + r1 ≤ c′1 xj1 , ∀ j ∈ J1 ,
q ′
D 1 wk1 ≤ c′1 w k1 , ∀ k ∈ K1 ,
q ′ D 2 xj2 + r2 ≤ c′2 xj2 , ∀ j ∈ J2 ,
q ′
D 2 wk2 ≤ c′2 w k2 , ∀ k ∈ K2 .
Slide 15

• (q, r1 , r2 ) dual variables

q ′ b0 + r1 + r2 = z

• z1 is the optimal cost in the first subproblem:

min (c′1 xj1 − q ′ D 1 xj1 ) = z1 ,


j∈J1

min (c′1 wk1 − q ′ D1 wk1 ) ≥ 0.


k∈K1

• (q, z1 , z2 ) is a feasible solution to the dual of master problem

5
• By weak duality,

z ∗ ≥ q ′ b0 + z1 + z2
= q ′ b0 + r1 + r2 + (z1 − r1 ) + (z2 − r2 )
= z + (z1 − r1 ) + (z2 − r2 ),

7.2 Example
Slide 16
• (λ1 , λ2 ) = (0.8, 0.2)

• cB = (−22, −17), z = (−22, −17)′(0.8, 0.2) = −21


• r = −4; z1 = (−1, 1, −2)′(2, 1, 2) = −5.
• −21 ≥ z ∗ ≥ −21 + (−5) − (−4) = −22
• z ∗ = −21.5

6
MIT OpenCourseWare
http://ocw.mit.edu

6.251J / 15.081J Introduction to Mathematical Programming


Fall 2009

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
15.081J/6.251J Introduction to Mathematical
Programming

Lecture 16: Network Flows, I


1 Networks
Slide 1
• Electrical & Power Networks
• Road Networks
• Airline Routes
• Internet Backbone
• Printed Circuit Board
• Social Networks

2 Common Thrust
Slide 2
Move some entity (electricity, a consumer product, a person, a vehicle, a mes­
sage, . . . ) from one point to another in the underlying network, as efficiently as
possible.

1. Learn how to model application settings as network flow problems.


2. Study ways to solve the resulting models.

3 Shortest Path
3.1 Description
Slide 3
• Identify a shortest path from a given source node to a given sink node.
• Finding a path of minimum length.
• Finding a path taking minimum time.
• Finding a path of maximum reliability.

4 Maximum Flow
4.1 Description
Slide 4
• Determine the maximum flow that can be sent from a given source node

to a sink node in a capacitated network.

• Determining maximum steady-state flow of


– petroleum products in a pipeline network
– cars in a road network
– messages in a telecommunication network
– electricity in an electrical network

1
5 Min-Cost Flow
5.1 Description
Slide 5
• Determine a least cost shipment of a commodity through a network in

order to satisfy demands at certain nodes from available supplies at other

nodes. Arcs have capacities and cost associated with them

• Distribution of products
• Flow of items in a production line
• Routing of cars through street networks
• Routing of telephone calls

5.2 In LOP Form


Slide 6
• Network G = (N, A).
• Arc costs c : A → R.
• Arc capacities u : A → N .
• Node balances b : N → R.


min cij xij

(i,j)∈A

� �
s.t. xij − xji = bi for all i ∈ N
j:(i,j)∈A j:(j,i)∈A
xij ≤ uij for all (i, j) ∈ A
xij ≥ 0 for all (i, j) ∈ A

6 Outline
Slide 7
• Shortest path applications
• Maximum Flow applications
• Minimum cost flow applications

7 Shortest Path
7.1 Interword Spacing in LATEX
Slide 8
The spacing between words and characters is normally set

automatically by LATEX. Interword spacing within one line

is uniform. LATEX also attempts to keep the word spacing

for different lines as nearly the same as possible.

2
The spacing between words and characters is normally set auto­

matically by LATEX. Interword spacing within one line is uniform.

LATEX also attempts to keep the word spacing for different lines

as nearly the same as possible.

7.2 Interword Spacing in LATEX (2)


Slide 9
• The paragraph consists of n words, indexed by 1, 2, . . . , n.
• cij is the attractiveness of a line if it begins with i and ends with j − 1.
• (LATEX uses a formula to compute the value of each cij .)

For instance,

c12 = −10, 000 c13 = −1, 000


c14 = 100 c1,37 = −100, 000
...

7.3 Interword Spacing in LATEX (3)


Slide 10
• The problem of decomposing a paragraph into several lines of text to max­

imize total attractiveness can be formulated as a shortest path problem.

• Nodes? Arcs? Costs?

7.4 Project Management


Slide 11
• A project consists of a set of jobs and a set of precedence relations
• Given a set A of job pairs (i, j) indicating that job i cannot start before

job j is completed.

• ci duration of job i
• Find the least possible duration of the project

7.4.1 Formulation
Slide 12
• Introduce two artificial jobs s and t, of zero duration, that signify the

beginning and the completion of the project

• Add (s, i) and (i, t) to A


• pi time that job i begins
• (i, j) ∈ A: pj ≥ pi + ci
• Project duration: pt − ps

3
Slide 13


min pt − ps
s.t pj − pi ≥ ci , ∀ (i, j) ∈ A.

• Dual �
max ci fij

(i,j)∈A

� �
s.t. fji − fij = bi
{j|(j,i)∈A} {j|(i,j)∈A}

fij ≥ 0
Slide 14
• bs = −1, bt = 1, and bi = 0 for i �= s, t.
• Shortest path problem, where each precedence relation (i, j) ∈ A corre­
sponds to an arc with cost of −ci .
Slide 15
Activity
Immediate Predecessor
Time(ci )

s
0

A
S
14

B
S
3

C
A,B
5

D
A
7

E
C,D
10

t
E
0

Slide 16

0 14
S A D T
7
0 14
10
3 5
B C E

7.5 DNA Sequencing


Slide 17
• Given two sequences of letters, say

B = b1 · · · bp and D = d1 · · · dq

• How similar are the two sequences?

• What is the min cost of transforming B to D?

4
7.5.1 Transformation costs
Slide 18
• α = cost of inserting a letter in B
• β = cost of deleting a letter from B
• g(bi , dj ) = cost of mutating a letter bi into dj

7.5.2 Transformation steps


Slide 19
1. Add or delete letters from B so as to make |B ′ | = |D|.
2. Align B ′ and D
3. Mutate letters of B ′ so that B ′′ = D.

7.5.3 Algorithm
Slide 20
• f (b1 · · · bp , d1 · · · dq ): the min cost of transforming B into D by the three

steps above. We obtain this cost by a recursive way.

f (∅ · · · ∅, d1 · · · dj ) = jα, j = 1, ..., q

f (b1 · · · bi , ∅ · · · ∅) = iβ, i = 1, ..., p.

Slide 21
Substitution
B′ = b1 ··· bi
D= d1 ··· dj
f (b1 · · · bi , d1 · · · dj )
= f (b1 · · · bi−1 , d1 · · · dj−1 ) + g(bi , dj )
Slide 22
• Addition of dj

B′ = b1 ··· bi ··· ∅

D= d1 ··· ··· dj
f (b1 · · · bi , d1 · · · dj ) = f (b1 · · · bi , d1 · · · dj−1 ) + α.
• Deletion of bi :

f (b1 · · · bi , d1 · · · dj ) = f (b1 · · · bi−1 , d1 · · · dj ) + β

Slide 23
Recursion
f (b1 · · · bi , d1 · · · dj )
= min{f (b1 · · · bi−1 , d1 · · · dj−1 ) + g(bi , dj ),
f (b1 · · · bi , d1 · · · dj−1 ) + α,
f (b1 · · · bi−1 , d1 · · · dj ) + β}
Slide 24
The shortest path from 00 to 32

5
8 Maximum Flow
8.1 The tournament problem
Slide 25
• Each of n teams plays against every other team a total of k games.
• Each game ends in a win or a loss (no draws)

• xi : the number of wins of team i.


• X set of all possible outcome vectors (x1 , ..., xn )
• Given x = (x1 , ..., xn ) decide whether x ∈ X

8.1.1 Formulation
Slide 26
• Supply nodes T1 , ..., Tn represent teams with supply x1 , ..., xn
• Since total number of wins total number of games, we must have

xi = n(n − 1)k/2

• Demand nodes

G12 , ..., G1n , G23 , ..., G2n , ..., Gij , ..., Gn−1,n

denote games between Ti and Tj with demand k.

• Arcs: (Ti , Gij ), (Tj , Gij ). The flow from Ti to Gij represents the total

number of games between i and j won by i

• Transportation model feasible if and only if x ∈ X

6
8.2 Preemptive Scheduling
Slide 27
• m identical machines to process n jobs
• Job j must be processed for pj periods, j = 1, ..., n
• It can not start before period rj and must be competed before period dj
• We allow preemption, i.e., we can disrupt the processing of one job with
another
Slide 28
• Problem Find a schedule (which job is processed by which machine at
which period) such that all jobs are processed after their release times and
completed before their deadlines
• Cj : completion time of job j: We need to have

rj + pj ≤ Cj ≤ dj for all j

8.3 Formulation
Slide 29
• Rank all release times and deadlines in ascending order. The ordered
list of numbers divides the time horizon into a number of nonoverlapping
intervals.
• Tkl be the interval that starts in the period k and ends in period l. During
Tkl , we can process any job j that has been released (rj ≤ k) and its
deadline has not yet been reached (l ≤ dj ).

8.3.1 Example
Slide 30
• 4 jobs with release times 3, 1, 3, 5, and deadlines 5, 4, 7, 9.
• The ascending list of release times and deadlines is 1, 3, 4, 5, 7, 9.
• Five intervals: T13 , T34 , T45 , T57 , T79 .

8.3.2 Network
Slide 31
• Nodes: source s, sink t, a node corresponding to each job j, and a node
corresponding to each interval Tkl .
• Arcs: (s, j), with capacity pj . Flow represents the number of periods of
processing that job j receives.
• Arcs: (Tkl , t), with capacity m(l − k). Flow represents the total number
of machine-periods of processing during Tkl .
Slide 32
• Arcs: (j, Tkl ) if rj ≤ k ≤ l ≤ dj with capacity l − k. Flow represents the
number of periods that job j is processed during Tkl .

7
j
pj
l-k
s t

Tkl m( l - k)

9 Min-Cost Flow
9.1 Passenger Routing
Slide 33
• United Airlines has seven daily flights from BOS to SFO, every two hours,

starting at 7am.

• Capacities are 100, 100, 100, 150, 150, 150, and ∞.


• Passengers suffering from overbooking are diverted to later flights.
• Delayed passengers get $200 plus $20 for every hour of delay.
• Suppose that today the first six flighs have 110, 160, 103, 149, 175, and 140

confirmed reservations.

Determine the most economical passenger routing strategy!

8
MIT OpenCourseWare
http://ocw.mit.edu

6.251J / 15.081J Introduction to Mathematical Programming


Fall 2009

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
-15
2
2 4
2
5 3 1

4 4
10 1 5 7 10

3 6
6
3 6
5
-5

1
1

2 7

3 6

4 5

1 1

2 7 2 7

0
3 6 3 6

0
4 5 4 5
0

What is the flow in 1 1 What is the flow in 1 1


arc (4,3)? arc (5,3)?
-6 -6
2 7 3 2 7 3

1 3 6 0 1 3 6 0
-4 -4
2
0 0
4 5 4 5
2 0 3 2 0 3

2
What is the flow in 1 1 What is the flow in 1 1
arc (3,2)? arc (2,6)?
-6 -6
2 7 3 2 7 3
6
1 3 6 0 1 3 6 0
-4 -4
2 3 2 3
0 0
4 5 4 5
2 0 3 2 0 3

What is the flow in 1 1 What is the flow in 1 1


arc (7,1)? arc (1,2)? 3
-6 -6
2 7 3 2 7 3
6 4 6 4
1 3 6 0 1 3 6 0
-4 -4
2 3 2 3
0 0
4 5 4 5
2 0 3 2 0 3

Note: there are 1 1


two different ways 4 3
of calculating the -6
flow on (1,2), and 2 7 3
6 4
both ways give a
flow of 4. Is this a 1 3 6 0
coincidence? -4
2 3
0
4 5
2 0 3

3
3 1 3 1
4 4 4
2 2
1
7 7
5 2 5
3 1 2 3 2

3 3 6 6
4 4
5 4 5 4

2 1 1 1
3 2
2 2
7 7
6 7
3 1 3 0

6 6
5 4
5 3 5 2

1 1
2
2
7
7
3
6
4
5 2

4
0
1 Here is a spanning 1 There is a redundant constraint
5 -6 tree with arc costs. 5 -6 in the minimum cost flow
How can one choose problem.
2 7 node potentials so 2 7
3 -4 3 -4 One can set p1 arbitrarily. We
that reduced costs of
will let p1 = 0.
3 6 tree arcs are 0? 3 6
-2 1 -2 1
What is the node potential for 2?
4 5 4 5

0 0
1 1
5 -6 5 -6
-5 -5
-6
2 7 2 7
3 -4 3 -4
3 6 3 6
-2 1 -2 1
4 5 4 5
What is thenode potential for 7? What is the potential for node 3?

0 0
1 1
5 -6 5 -6
-5 -5
-6 -6
2 7 2 7
3 -4 3 -4
-2 -2
3 6 3 6 -1
-2 1 -2 1
4 5 4 5
What is the potential for node 6? What is the potential for node 4?

5
0 0
1 1
5 -6 5 -6
-5 -5
-6 -6
2 7 2 7
3 -4 3 -4
-2 -2
3 6 -1 3 6 -1
These are the node potentials
-2 1 -2 1 associated with this tree. They
4 5 4 5 do not depend on arc flows, nor
What is the potential for node 5? on costs of non-tree arcs.
-4 -4 -1

0
Node potentials 1
Flow on arcs 1
Original costs Reduced costs 4
3
-5
-6
2 7 2 7
7 4 2
6
-2
3 6 -1 3 6
-3 2 3 -3
4 5 4 5
-4 2 5
-1

Flow on arcs 1 1
4 4
3 3
2 7 2 7
4 1
6 3
3 6 3 6
2 3 0 2 0 3
4 5 4 5

6
1

2 7

3 6

4 5

7
1 1

2 7 2 7

3 6 3 6

4 5 4 5

2 7

3 6

4 5

8
9
MIT OpenCourseWare
http://ocw.mit.edu

6.251J / 15.081J Introduction to Mathematical Programming


Fall 2009

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
15.081J/6.251J Introduction to Mathematical
Programming

Lecture 18: The Ellipsoid method


1 Outline
Slide 1
• Efficient algorithms and computational complexity
• The key geometric result behind the ellipsoid method
• The ellipsoid method for the feasibility problem
• The ellipsoid method for optimization

2 Efficient algorithms
Slide 2
• The LO problem
min c′ x
s.t. Ax = b
x≥0

• A LO instance

min 2x + 3y

s.t. x + y ≤ 1
x , y ≥ 0

• A problem is a collection of instances

2.1 Size
Slide 3
• The size of an instance is the number of bits used to describe the instance,

according to a prespecified format

• A number r ≤ U

r = ak 2k + ak−1 2k−1 + · · ·
+ a1 21 + a0

is represented by (a0 , a1 , . . . , ak ) with k ≤ ⌊log 2 U ⌋

• Size of r is ⌊log 2 U ⌋ + 2
• Instance of LO: (c, A, b)
• Size is � �
(mn + m + n) ⌊log2 U ⌋ + 2

2.2 Running Time


Slide 4
Let A be an algorithm which solves the optimization problem Π.

If there exists a constant α > 0 such that A terminates its computation after at most

α f (I) elementary steps for each instance I, then A runs in O(f ) time.

Elementary operations are


• variable assignments • comparison of numbers
• random access to variables • arithmetic operations
• conditional jumps • ... Slide 5

1
A “brute force” algorithm for solving the min-cost flow problem:
Consider all spanning trees and pick the best tree solution among the feasible ones.
Suppose we had a computer to check 1015 trees in a second. It would need more than
109 years to find the best tree for a 25-node min-cost flow problem.
It would need 1059 years for a 50-node instance.
That’s not efficient! Slide 6
Ideally, we would like to call an algorithm “efficient” when it is sufficiently fast to be
usable in practice, but this is a rather vague and slippery notion.

The following notion has gained wide acceptance:


An algorithm is considered efficient if the number of steps it performs for
any input is bounded by a polynomial function of the input size.
Polynomials are, e.g., n, n3 , or 106 n8 .

2.3 The Tyranny of


Exponential Growth
Slide 7
100 n log n 10 n2 n3.5 2n n! nn−2
109 /sec 1.19 · 109 600, 000 3, 868 41 15 13
1010 /sec 1.08 · 1010 1, 897, 370 7, 468 45 16 13
Maximum input sizes solvable within one hour.

2.4 Punch line


Slide 8
The equation

efficient = polynomial

has been accepted as the best available way of tying the empirical

notion of a “practical algorithm” to a precisely formalized mathe­

matical concept.

2.5 Definition
Slide 9
An algorithm runs in polynomial time if its running time is O(|I|k ), where |I|
is the input size, and all numbers in intermediate computations can be stored
with O(|I|k ) bits.

3 The Ellipsoid method


Slide 10
• D is an n × n positive definite symmetric matrix
• A set E of vectors in ℜn of the form
E = E(z, D) = x ∈ ℜn | (x − z)′ D −1 (x − z) ≤ 1
� �

is called an ellipsoid with center z ∈ ℜn

2
3.1 The algorithm intuitively
Slide 11
• Problem: Decide whether a given polyhedron

P = x ∈ ℜn | Ax ≥ b
� �

is nonempty Slide 12

Et+1

P
xt 00
11
Et
11
00 xt+1
a′ x ≥ b

a′ x ≥ a′ xt

• Key property: We can find a new ellipsoid Et+1 that covers the half­
ellipsoid and whose volume is only a fraction of the volume of the previous
ellipsoid Et

3.2 Key Theorem


Slide 13
• E = E(z, D) be an ellipsoid in ℜn ; a nonzero n-vector.
• H = {x ∈ ℜn | a′ x ≥ a′ z}
1 Da
z = z+ √ ,
n + 1 a′ Da
� �
n2 2 Daa′ D
D = 2
D− .
n −1 n + 1 a′ Da

• The matrix D is symmetric and positive definite and thus E ′ = E(z, D) is an


ellipsoid
• E ∩ H ⊂ E

• Vol(E ′ ) < e−1/(2(n+1)) Vol(E)

3
x2

E'
E

x1

3.3 Illustration
Slide 14
3.4 Assumptions
Slide 15
• A polyhedron P is full-dimensional if it has positive volume
• The polyhedron P is bounded: there exists a ball E0 = E(x0 , r2 I), with

volume V , that contains P

• Either P is empty, or P has positive volume, i.e., Vol(P ) > v for some

v > 0

• E0 , v, V , are a priori known


• We can make our calculations in infinite precision; square roots can be

computed exactly in unit time

3.5 Input-Output
Slide 16
Input:
• A matrix A and a vector b that define the polyhedron P = {x ∈ ℜn |
a′i x ≥ bi , i = 1, . . . , m}
• A number v, such that either P is empty or Vol(P ) > v

4
• A ball E0 = E(x0 , r2 I) with volume at most V , such that P ⊂ E0
Output: A feasible point x∗ ∈ P if P is nonempty, or a statement that P is
empty

3.6 The algorithm


Slide 17
1. (Initialization)
Let t∗ = 2(n + 1) log(V /v) ; E0 = E(x0 , r 2 I); D 0 = r 2 I; t = 0.
� �

2. (Main iteration)
• If t = t∗ stop; P is empty.
• If xt ∈ P stop; P is nonempty.
/ P find a violated constraint, that is, find an i such that a′i xt < bi .
• If xt ∈
• Let Ht = {x ∈ ℜn | a′i x ≥ a′i xt }. Find an ellipsoid Et+1 containing Et ∩ Ht :

Et+1 = E(xt+1 , D t+1 ) with

1 D a
xt+1 = xt + � t i ,
n + 1 a′i D t ai
n2
� �
2 D t ai a′i D t
D t+1 = 2 Dt − .
n −1 n + 1 a′i D t ai

• t := t + 1.

3.7 Correctness
Slide 18
Theorem: Let P be a bounded polyhedron that is either empty or full-dimensional
and for which the prior information x0 , r, v, V is available. Then, the ellipsoid
method decides correctly whether P is nonempty or not, i.e., if xt∗ −1 ∈
/ P , then
P is empty

3.8 Proof
Slide 19
• If xt ∈ P for t < t∗ , then the algorithm correctly decides that P is

nonempty

• Suppose x0 , . . . , xt∗ −1 ∈
/ P . We will show that P is empty.
• We prove by induction on k that P ⊂ Ek for k = 0, 1, . . . , t∗ . Note

that P ⊂ E0 , by the assumptions of the algorithm, and this starts the

induction.

Slide 20

• Suppose P ⊂ Ek for some k < t∗ . Since xk ∈ / P , there exists a violated


inequality: a′ i(k) x ≥ bi(k) be a violated inequality, i.e., ai(k)

xk < bi(k) ,

where xk is the center of the ellipsoid Ek

5
• For any x ∈ P , we have
a′i(k) x ≥ bi(k) > a′i(k) xk

• Hence, P ⊂ Hk = x ∈ ℜn | a′i(k) x ≥ a′i(k) xk


� �

• Therefore, P ⊂ Ek ∩ Hk
Slide 21
By key geometric property, Ek ∩ Hk ⊂ Ek+1 ; hence P ⊂ Ek+1 and the induction is
complete

Vol(Et+1 )
< e−1/(2(n+1))
Vol(Et )
Vol(Et∗ ) ∗
< e−t /(2(n+1))
Vol(E0 )
V V
Vol(Et∗ ) < V e−⌈2(n+1) log v ⌉/(2(n+1)) ≤ V e− log v = v
If the ellipsoid method has not terminated after t∗ iterations, then Vol(P ) ≤ Vol(Et∗ ) ≤
v. This implies that P is empty

3.9 Binary Search


� � Slide 22
• P = x ∈ ℜ | x ≥ 0, x ≥ 1, x ≤ 2, x ≤ 3
• E0 = [0, 5], centered at x0 = 2.5
• Since x0 ∈
/ P , the algorithm chooses the violated inequality x ≤ 2 and

constructs E1 that contains the interval E0 ∩ {x | x ≤ 2.5} = [0, 2.5]

• The ellipsoid E1 is the interval [0, 2.5] itself


• Its center x1 = 1.25 belongs to P
• This is binary search

3.10 Boundedness of P
Slide 23
Let A be an m × n integer matrix and let b a vector in ℜn . Let U be the largest
absolute value of the entries in A and b.
Every extreme point of the polyhedron P = {x ∈ ℜn | Ax ≥ b} satisfies
−(nU )n ≤ xj ≤ (nU )n , j = 1, . . . , n
Slide 24
• All extreme points of P are contained in
PB = x ∈ P �
|xj | ≤ (nU )n , j = 1, . . . , n
� � �

2n
� �

• Since
� PB ⊆ 2n E 0,
� n(nU ) I , we can start the ellipsoid method with E0 =
E 0, n(nU ) I
• �n 2
V ol(E0 ) ≤ V = 2n(nU )n = (2n)n (nU )n

6
3.11 Full-dimensionality
Slide 25
Let P = {x ∈ ℜn | Ax ≥ b}. We assume that A and b have integer entries,
which are bounded in absolute value by U . Let
1 � �−(n+1)
ǫ= (n + 1)U .
2(n + 1)
Let
Pǫ = x ∈ ℜn | Ax ≥ b − ǫe ,
� �

where e = (1, 1, . . . , 1).


(a) If P is empty, then Pǫ is empty.
(b) If P� is nonempty, then� Pǫ is full-dimensional. Slide 26
Let P = x ∈ ℜn | Ax ≥ b be a full-dimensional bounded polyhedron, where
the entries of A and b are integer and have absolute value bounded by U . Then,
2
(n+1)
Vol(P ) > v = n−n (nU )−n

3.12 Complexity
Slide 27
• P = {x ∈ ℜn | Ax ≥ b}, where A, b have integer entries with magni­

tude bounded by some U and has full rank. If P is bounded and either

empty
� or full-dimensional,
� the ellipsoid method decides if P is empty in

O n log(V /v) iterations


2 2
(n+1)
• v = n−n (nU )−n V = (2n)n (nU )n
,
• Number of iterations O n4 log(nU )
� �
Slide 28
• If P is arbitrary, we first form PB , then perturb PB to form PB,ǫ and apply the

ellipsoid method to PB,ǫ

• Number of iterations is O n6 log(nU ) .


� �

• It has been shown that only O(n3 log U ) binary digits of precision are needed,

and the numbers computed during the algorithm have polynomially bounded

size

• The linear programming feasibility problem with integer data can be solved in

polynomial time

4 The ellipsoid method for optimization


Slide 29
min c′ x max b′ π
s.t. Ax ≥ b, s.t. A′ π = c
π ≥ 0.
By strong duality, both problems have optimal solutions if and only if the following
system of linear inequalities is feasible:
b′ p = c′ x, Ax ≥ b, A′ p = c, p ≥ 0.
LO with integer data can be solved in polynomial time.

7
4.1 Sliding objective
Slide 30
• �
We first run the ellipsoid method to find a feasible solution x0 ∈ P =
x ∈ ℜn | Ax ≥ b .

• We apply the ellipsoid method to decide whether the set

P ∩ x ∈ ℜn | c′ x < c′ x0
� �

is empty.
• If it is empty, then x0 is optimal. If it is nonempty, we find a new solution
x1 in P with objective function value strictly smaller than c′ x0 .
Slide 31

• More generally, every time a better feasible solution xt is found, we take

P ∩ {x ∈ ℜn | c′ x < c′ xt } as the new set of inequalities and reapply the

ellipsoid method.

- c

.xt+1 c' x < c ' xt+1

. xt c' x < c ' xt

4.2 Performance in practice


Slide 32
• Very slow convergence, close to the worst case
• Contrast with simplex method
• The ellipsoid method is a tool for classifying the complexity of linear

programming problems

8
MIT OpenCourseWare
http://ocw.mit.edu

6.251J / 15.081J Introduction to Mathematical Programming


Fall 2009

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
15.081J/6.251J Introduction to Mathematical
Programming

Lecture 19: Problems with exponentially


many constraints
1 Outline
Slide 1
• Problems with exponentially many constraints
• The separation problem
• Polynomial solvability
• Examples: MST, TSP, Probability
• Conclusions

2 Problems
2.1 Example
� Slide 2
min ci xi
i

ai xi ≥ |S|, for all subsets S of {1, . . . , n}
i∈S

• There are 2n constraints, but are described concisely in terms of the n

scalar parameters a1 , . . . , an

• Question: Suppose we apply the ellipsoid algorithm. Is it polynomial?


• In what?

2.2 The input


Slide 3
• Consider min c′ x s.t. x ∈ P
• P belongs to a family of polyhedra of special structure
• A typical polyhedron is described by specifying the dimension n and an

integer vector h of primary data, of dimension O(nk ), where k ≥ 1 is some

constant.

• In example, h = (a1 , . . . , an ) and k = 1


• U0 be the largest entry of h
Slide 4

• Given n and h, P is described as Ax ≥ b


• A has an arbitrary number of rows
• U largest entry in A and b. We assume

log U ≤ Cnℓ logℓ U0

1
3 The separation problem
Slide 5
Given a polyhedron P ⊂ ℜn and a vector x ∈ ℜn , the separation problem is
to:
• Either decide that x ∈ P , or
• Find a vector d such that d′ x < d′ y for all y ∈ P
What is the separation problem for

ai xi ≥ |S|, for all subsets S of {1, . . . , n}?
i∈S

4 Polynomial solvability
4.1 Theorem
Slide 6
If we can solve the separation problem (for a family of polyhedra) in time
polynomial in n and log U , then we can also solve linear optimization problems
in time polynomial in n and log U . If log U ≤ Cnℓ logℓ U0 , then it is also
polynomial in log U0

• Proof ?
• Converse is also true
• Separation and optimization are polynomially equivalent

4.2 Minimum Spanning


Tree (MST)
Slide 7
• How do telephone companies bill you?
• It used to be that rate/minute: Boston → LA proportional to distance in

MST

• Other applications: Telecommunications, Transportation (good lower bound

for TSP)

Slide 8
• Given a graph G = (V, E) undirected and Costs ce , e ∈ E.
• Find a tree of minimum cost spanning all the nodes.

1, if edge e is included in the tree
• Decision variables xe =
0, otherwise
Slide 9
• The tree should be connected. How can you model this requirement?

2
• Let S be a set of vertices. Then S and V \ S should be connected

i∈S
• Let δ(S) = {e = (i, j) ∈ E :
j ∈V \S
• Then, �
xe ≥ 1
e∈δ(S)

• What is the number of edges in a tree?



• Then, xe = n − 1
e∈E

4.2.1 Formulation
� Slide 10
IZMST = min ce xe

 e∈E


 xe ≥ 1 ∀ S ⊆ V, S �= ∅, V
 e∈δ(S)


H xe = n − 1
 e∈E


xe ∈ {0, 1}.
How can you solve the LP relaxation?

4.3 The Traveling Salesman


Problem
Slide 11
Given G = (V, E) an undirected graph. V = {1, . . . , n}, costs ce ∀ e ∈ E. Find
a tour that minimizes total length.

4.3.1 Formulation
� Slide 12
1, if edge e is included in the tour.
xe =
0, otherwise.

min ce xe

e∈E


s.t. xe ≥ 2, S⊆E
e∈δ(S)

xe = 2, i∈V
e∈δ(i)
xe ∈ {0, 1}

How can you solve the LP relaxation?

3
4.4 Probability Theory
Slide 13
• Events A1 , A2
• P (A1 ) = 0.5, P (A2 ) = 0.7, P (A1 ∩ A2 ) ≤ 0.1
• Are these beliefs consistent?
• General problem: Given n events Ai i ∈ N = {1, . . . , n}, beliefs
P(Ai ) ≤ pi , i ∈ N,
P(Ai ∩ Aj ) ≥ pij , i, j ∈ N, i < j.
• Given the numbers pi and pij , which are between 0 and 1, are these beliefs

consistent?

4.4.1 Formulation
� � � � �� Slide 14
x(S) = P ∩i∈S Ai ∩ ∩i∈S
/ Ai ,

x(S) ≤ pi , i ∈ N,
{S|i∈S}

x(S) ≥ pij , i, j ∈ N, i < j,
{S|i,j∈S}

x(S) = 1,
S
x(S) ≥ 0, ∀ S.
Slide 15
The previous LP is feasible if and only if there does not exist a vector (u, y, z) such
that � �
yij + ui + z ≥ 0, ∀ S,
i,j∈S,i<j i∈S
� �
pij yij + pi ui + z ≤ −1,
i,j∈N,i<j i∈N

yij ≤ 0, ui ≥ 0, i, j ∈ N, i < j.
Slide 16
Separation problem:
� �
z ∗ + min f (S) = ∗
yij + ui∗ ≥ 0?
S
i,j∈S,i<j i∈S

∗ ∗ ∗ ∗ ∗ ∗
Example: y12 = −2, y13 = −4, y14 = −4, y23 = −4, y24 = −1, y34 = −7,
∗ ∗ ∗ ∗ ∗
u1 = 9, u2 = 6, u3 = 4, u4 = 2, and z = 2 Slide 17
Slide 18
• The minimum cut corresponds to S0 = {3, 4} with value c(S0 ) = 21.
� �

• f (S0 ) = yij + u∗i
= −7 + 4 + 2 = −1
i,j∈S0 ,i<j i∈S0

∗ ∗
• f (S) + z ≥ f (S0 ) + z = −1 + 2 = 1 > 0, ∀S
• Given solution (y ∗ , u∗ , z ∗ ) is feasible

4
1, 2

2
1, 3 1
4 9

4 1, 4 2 6
s t
4 4
2,3 3
1
2
2,4 4

7
3,4

5 Conclusions
Slide 19
• Ellipsoid algorithm can characterize the complexity of solving LOPs with

an exponential number of constraints

• For practical purposes use dual simplex


• Ellipsoid method is an important theoretical development, not a practical

one

5

Lecture 20: The Affine Scaling Algorithm


1 Outline
Slide 1
• History
• Geometric intuition
• Algebraic development
• Affine Scaling
• Convergence
• Initialization
• Practical performance

2 History
Slide 2
• In 1984, Karmarkar at AT&T “invented” the interior point method
• In 1985, affine scaling was “invented” at IBM + AT&T, seeking an intuitive version of Karmarkar’s algorithm
• In early computational tests, A.S. far outperformed simplex and Karmarkar’s algorithm
• In 1989, it was realized that Dikin had invented A.S. in 1967

3 Geometric intuition
3.1 Notation
Slide 3
min c′ x
s.t. Ax = b
x ≥ 0
and its dual
max p′ b
s.t. p′ A ≤ c′

• P = {x | Ax = b, x ≥ 0}
• {x ∈ P | x > 0} the interior of P and its elements interior points

[Figure: geometric intuition — iterates x^0, x^1, x^2 in the interior of the feasible polyhedron, moving against the cost vector c.]

3.2 The idea


Slide 4
4 Algebraic development
4.1 Theorem
Slide 5
β ∈ (0, 1), y ∈ ℜ^n with y > 0, and
    S = { x ∈ ℜ^n : Σ_{i=1}^n (x_i − y_i)²/y_i² ≤ β² }.
Then, x > 0 for every x ∈ S


Proof
• Let x ∈ S; then for each i:
• (x_i − y_i)² ≤ β² y_i² < y_i²
• |x_i − y_i| < y_i; in particular y_i − x_i < y_i, and hence x_i > 0
Slide 6
x ∈ S is equivalent to ||Y^{−1}(x − y)|| ≤ β.
Replace the original LP by:
    min c′x
    s.t. Ax = b
         ||Y^{−1}(x − y)|| ≤ β.
With d = x − y:
    min c′d
    s.t. Ad = 0
         ||Y^{−1}d|| ≤ β

4.2 Solution
Slide 7
If the rows of A are linearly independent and c is not a linear combination of the rows of A, then
• the optimal solution is
    d* = −β Y²(c − A′p) / ||Y(c − A′p)||,   p = (AY²A′)^{−1} AY²c.
• x = y + d* ∈ P
• c′x = c′y − β||Y(c − A′p)|| < c′y

4.2.1 Proof
Slide 8
• AY²A′ is invertible; if not, there exists some z ≠ 0 such that z′AY²A′z = 0
• With w = YA′z: w′w = 0 ⇒ w = 0
• Hence A′z = 0, a contradiction
• Since c is not a linear combination of the rows of A, c − A′p ≠ 0 and d* is well defined
• d* is feasible:
    Y^{−1}d* = −β Y(c − A′p) / ||Y(c − A′p)||  ⇒  ||Y^{−1}d*|| = β
    Ad* = 0, since AY²(c − A′p) = 0
• For any feasible d:
    c′d = (c′ − p′A)d = (c′ − p′A)Y Y^{−1}d
        ≥ −||Y(c − A′p)|| · ||Y^{−1}d||
        ≥ −β ||Y(c − A′p)||.
Slide 9

For d* itself:
    c′d* = (c′ − p′A)d*
         = −β (c′ − p′A) Y²(c − A′p) / ||Y(c − A′p)||
         = −β (Y(c − A′p))′(Y(c − A′p)) / ||Y(c − A′p)||
         = −β ||Y(c − A′p)||.
• c′x = c′y + c′d* = c′y − β||Y(c − A′p)||
4.3 Interpretation
Slide 10
• Let y be a nondegenerate BFS with basis B
• A = [B N]
• Y = diag(y_1, . . . , y_m, 0, . . . , 0) and Y_0 = diag(y_1, . . . , y_m); then AY = [BY_0  0]

    p = (AY²A′)^{−1} AY²c
      = (B′)^{−1} Y_0^{−2} B^{−1} · B Y_0² c_B
      = (B′)^{−1} c_B

• The vectors p are dual estimates
• r = c − A′p becomes the vector of reduced costs:
    r = c − A′(B′)^{−1} c_B
• Under degeneracy?

4.4 Termination
Slide 11
Let y and p be primal and dual feasible solutions with
    c′y − b′p < ǫ,
and let y* and p* be optimal primal and dual solutions. Then,
    c′y* ≤ c′y < c′y* + ǫ,
    b′p* − ǫ < b′p ≤ b′p*

4.4.1 Proof
Slide 12
• c′y* ≤ c′y
• By weak duality, b′p ≤ c′y*
• Since c′y − b′p < ǫ,
    c′y < b′p + ǫ ≤ c′y* + ǫ
    b′p* = c′y* ≤ c′y < b′p + ǫ
5 Affine Scaling
5.1 Inputs
Slide 13
• (A, b, c);
• an initial primal feasible solution x0 > 0
• the optimality tolerance ǫ > 0
• the parameter β ∈ (0, 1)

5.2 The Algorithm


Slide 14
1. (Initialization) Start with some feasible x^0 > 0; let k = 0.
2. (Computation of dual estimates and reduced costs) Given some feasible x^k > 0, let
    X_k = diag(x_1^k, . . . , x_n^k),
    p^k = (A X_k² A′)^{−1} A X_k² c,
    r^k = c − A′p^k.
3. (Optimality check) Let e = (1, 1, . . . , 1). If r^k ≥ 0 and e′X_k r^k < ǫ, then stop; the current solution x^k is primal ǫ-optimal and p^k is dual ǫ-optimal.
4. (Unboundedness check) If −X_k² r^k ≥ 0 then stop; the optimal cost is −∞.
5. (Update of primal solution) Let
    x^{k+1} = x^k − β X_k² r^k / ||X_k r^k||.
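A compact numpy sketch of one iteration (Steps 2–5, omitting the unboundedness check); affine_scaling_step is a hypothetical helper, not part of any library, and assumes a strictly positive feasible x and full-row-rank A:

    import numpy as np

    def affine_scaling_step(A, c, x, beta=0.5, eps=1e-9):
        X2 = np.diag(x**2)                                  # X_k^2
        p = np.linalg.solve(A @ X2 @ A.T, A @ X2 @ c)       # dual estimates
        r = c - A.T @ p                                     # reduced costs
        if np.all(r >= 0) and x @ r < eps:                  # e'X_k r^k < eps
            return x, p, True                               # eps-optimal
        return x - beta * (X2 @ r) / np.linalg.norm(x * r), p, False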

5.3 Variants
Slide 15
• ||u||_∞ = max_i |u_i|,  γ(u) = max{u_i | u_i > 0}
• γ(u) ≤ ||u||_∞ ≤ ||u||
• Short-step method: the Euclidean-norm update of Step 5.
• Long-step variants:
    x^{k+1} = x^k − β X_k² r^k / ||X_k r^k||_∞
    x^{k+1} = x^k − β X_k² r^k / γ(X_k r^k)
6 Convergence
6.1 Assumptions
Slide 16
Assumptions A:
(a) The rows of the matrix A are linearly independent.
(b) The vector c is not a linear combination of the rows of A.
(c) There exists an optimal solution.
(d) There exists a positive feasible solution.
Assumptions B:
(a) Every BFS to the primal problem is nondegenerate.
(b) At every BFS to the primal problem, the reduced cost of every nonbasic
variable is nonzero.

6.2 Theorem
Slide 17
If we apply the affine scaling algorithm with ǫ = 0, the following hold:
(a) For the first long-step variant, under Assumptions A and B and if 0 < β < 1, x^k and p^k converge to the optimal primal and dual solutions.
(b) For the second long-step variant, under Assumption A and if 0 < β < 2/3, the sequences x^k and p^k converge to some primal and dual optimal solutions, respectively.

7 Initialization
Slide 18
    min c′x + M x_{n+1}
    s.t. Ax + (b − Ae) x_{n+1} = b
         x, x_{n+1} ≥ 0

8 Example
Slide 19
max x1 + 2x2
s.t. x1 + x2 ≤ 2
−x1 + x2 ≤ 1
x1 , x2 ≥0

9 Practical Performance
Slide 20
• Excellent practical performance, simple
• Major step: invert A X_k² A′
• Imitates the simplex method near the boundary

[Figure: the affine scaling iterates on the example above, approaching the optimal vertex of the feasible region in the (x1, x2)-plane.]

Lecture 21: Primal Barrier Interior Point Algorithm

1 Outline
Slide 1
1. Barrier Methods
2. The Central Path
3. Approximating the Central Path
4. The Primal Barrier Algorithm
5. Correctness and Complexity

2 Barrier methods
Slide 2
    min f(x)
    s.t. g_j(x) ≤ 0, j = 1, . . . , p
         h_i(x) = 0, i = 1, . . . , m

    S = { x | g_j(x) < 0, j = 1, . . . , p;  h_i(x) = 0, i = 1, . . . , m }

2.1 Strategy
Slide 3
• A barrier function G(x) is a continuous function with the property that it approaches ∞ as one of the g_j(x) approaches 0 from negative values.
• Examples:
    G(x) = −Σ_{j=1}^p log(−g_j(x)),    G(x) = −Σ_{j=1}^p 1/g_j(x)

Slide 4
• Consider a sequence {µ_k}: 0 < µ_{k+1} < µ_k and µ_k → 0.
• Consider the problem
    x^k = argmin_{x∈S} f(x) + µ_k G(x)
• Theorem: Every limit point of the sequence {x^k} generated by a barrier method is a global minimum of the original constrained problem.

[Figure: the central path — points x(10), x(1), x(0.1), x(0.01) running from the analytic center toward the optimal solution x* as µ decreases.]
2.2 Primal path-following IPMs for LO
Slide 5
    (P) min c′x            (D) max b′p
        s.t. Ax = b            s.t. A′p + s = c
             x ≥ 0                  s ≥ 0
Barrier problem:
    min B_µ(x) = c′x − µ Σ_{j=1}^n log x_j
    s.t. Ax = b
Minimizer: x(µ)

3 Central Path
Slide 6
• As µ varies, the minimizers x(µ) form the central path
• lim_{µ→0} x(µ) exists and is an optimal solution x* to the initial LP
• For µ = ∞, x(∞) is called the analytic center:
    min − Σ_{j=1}^n log x_j
    s.t. Ax = b
Slide 7

3.1 Example
Slide 8
min x2
s.t. x1 + x2 + x3 = 1
x1 , x2 , x3 ≥ 0

[Figure: the simplex P = {x ≥ 0 : x1 + x2 + x3 = 1} with analytic center (1/3, 1/3, 1/3), the optimal face Q with analytic center (1/2, 0, 1/2), and the central path joining them.]

• Q = {x | x = (x1, 0, x3), x1 + x3 = 1, x ≥ 0} is the set of optimal solutions to the original LP
• The analytic center of Q is (1/2, 0, 1/2)

    min x2 − µ log x1 − µ log x2 − µ log x3
    s.t. x1 + x2 + x3 = 1

Eliminating x3 = 1 − x1 − x2:
    min x2 − µ log x1 − µ log x2 − µ log(1 − x1 − x2).

    x1(µ) = (1 − x2(µ))/2
    x2(µ) = (1 + 3µ − √(9µ² + 2µ + 1))/2
    x3(µ) = (1 − x2(µ))/2

The analytic center: (1/3, 1/3, 1/3)
Slide 9
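The closed form above is easy to probe numerically; a small sketch showing x(µ) → (1/3, 1/3, 1/3) as µ → ∞ and x(µ) → (1/2, 0, 1/2) as µ → 0:

    import numpy as np

    def x_of_mu(mu):
        x2 = (1 + 3*mu - np.sqrt(9*mu**2 + 2*mu + 1)) / 2
        x1 = x3 = (1 - x2) / 2              # by symmetry of x1 and x3
        return x1, x2, x3

    for mu in [100.0, 1.0, 0.01, 1e-6]:
        print(mu, x_of_mu(mu))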

3.2 Solution of Central Path
Slide 10
• Barrier problem for the dual:
    max p′b + µ Σ_{j=1}^n log s_j
    s.t. A′p + s = c
• Solution (KKT):
    Ax(µ) = b,  x(µ) ≥ 0
    A′p(µ) + s(µ) = c,  s(µ) ≥ 0
    X(µ)S(µ)e = µe
Slide 11
• Theorem: If x*, p*, and s* satisfy the optimality conditions, then they are optimal solutions to the primal and dual barrier problems.
• Goal: Solve the barrier problem
    min B_µ(x) = c′x − µ Σ_{j=1}^n log x_j
    s.t. Ax = b

4 Approximating the central path
Slide 12
    ∂B_µ(x)/∂x_i = c_i − µ/x_i
    ∂²B_µ(x)/∂x_i² = µ/x_i²
    ∂²B_µ(x)/∂x_i∂x_j = 0,  i ≠ j

Given a vector x > 0:
Slide 13
    B_µ(x + d) ≈ B_µ(x) + Σ_{i=1}^n (∂B_µ(x)/∂x_i) d_i + (1/2) Σ_{i,j=1}^n (∂²B_µ(x)/∂x_i∂x_j) d_i d_j
               = B_µ(x) + (c′ − µe′X^{−1})d + (µ/2) d′X^{−2}d
with X = diag(x_1, . . . , x_n)
Slide 14
Approximating problem:
    min (c′ − µe′X^{−1})d + (µ/2) d′X^{−2}d
    s.t. Ad = 0
Solution (from Lagrange):
    c − µX^{−1}e + µX^{−2}d − A′p = 0
    Ad = 0
Slide 15

• A system of m + n linear equations in m + n unknowns (d_j, j = 1, . . . , n, and p_i, i = 1, . . . , m).
• Solution:
    d(µ) = ( I − X²A′(AX²A′)^{−1}A ) ( Xe − (1/µ) X²c )
    p(µ) = (AX²A′)^{−1} A (X²c − µXe)

4.1 The Newton connection
Slide 16
• d(µ) is the Newton direction; the process of calculating this direction is called a Newton step
• Starting with x, the new primal solution is x + d(µ)
• The corresponding dual solution becomes (p, s) = ( p(µ), c − A′p(µ) )
• We then decrease µ to µ̄ = αµ, 0 < α < 1

4.2 Geometric Interpretation
Slide 17
• Take one Newton step so that x would be close to x(µ)
• Measure of closeness:
    || (1/µ) XSe − e || ≤ β,
  where 0 < β < 1, X = diag(x_1, . . . , x_n), S = diag(s_1, . . . , s_n)
• As µ → 0, the complementarity slackness condition will be satisfied
Slide 18
5 The Primal Barrier Algorithm
Slide 19
Input
(a) (A, b, c); A has full row rank;
(b) x^0 > 0, s^0 > 0, p^0;
(c) optimality tolerance ǫ > 0;
(d) µ^0, and α, where 0 < α < 1.
Slide 20
1. (Initialization) Start with some primal and dual feasible x^0 > 0, s^0 > 0, p^0, and set k = 0.
2. (Optimality test) If (s^k)′x^k < ǫ stop; else go to Step 3.
3. Let
    X_k = diag(x_1^k, . . . , x_n^k),
    µ^{k+1} = α µ^k
Slide 21
4. (Computation of directions) Solve the linear system
    µ^{k+1} X_k^{−2} d − A′p = µ^{k+1} X_k^{−1} e − c
    Ad = 0
5. (Update of solutions) Let
    x^{k+1} = x^k + d,
    p^{k+1} = p,
    s^{k+1} = c − A′p.
6. Let k := k + 1 and go to Step 2.
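A minimal numpy sketch of the whole loop, assuming a strictly feasible starting point close enough to the central path (dense linear algebra, no safeguards); primal_barrier is a hypothetical name:

    import numpy as np

    def primal_barrier(A, b, c, x, mu=10.0, alpha=0.5, eps=1e-6, max_iter=200):
        m, n = A.shape
        for _ in range(max_iter):
            mu *= alpha
            # Newton system:  mu X^{-2} d - A'p = mu X^{-1} e - c,  A d = 0
            K = np.block([[mu * np.diag(1.0 / x**2), -A.T],
                          [A, np.zeros((m, m))]])
            rhs = np.concatenate([mu / x - c, np.zeros(m)])
            sol = np.linalg.solve(K, rhs)
            d, p = sol[:n], sol[n:]
            x = x + d
            s = c - A.T @ p
            if s @ x < eps:                  # duality gap test
                break
        return x, p, s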

6 Correctness
Slide 22
Theorem: Given α = 1 − (√β − β)/(√β + √n), β < 1, and (x^0, s^0, p^0) with x^0 > 0, s^0 > 0 and
    || (1/µ^0) X_0 S_0 e − e || ≤ β,
then after
    K = ( (√β + √n)/(√β − β) ) log( (s^0)′x^0 (1 + β) / (ǫ(1 − β)) )
iterations, (x^K, s^K, p^K) is found with
    (s^K)′x^K ≤ ǫ.
6
6.1 Proof
Slide 23
• Claim (by induction): || (1/µ^k) X_k S_k e − e || ≤ β
• For k = 0 we have assumed it
• Assume it holds for k; then
    || (1/µ^{k+1}) X_k S_k e − e || = || (1/(αµ^k)) X_k S_k e − e ||
        = || (1/α) ( (1/µ^k) X_k S_k e − e ) + ((1 − α)/α) e ||
        ≤ (1/α) || (1/µ^k) X_k S_k e − e || + ((1 − α)/α) ||e||
        ≤ β/α + ((1 − α)/α) √n
        = √β
• We next show that ||X_k^{−1} d|| ≤ √β < 1, where d = x^{k+1} − x^k.
• d solves
    µ^{k+1} X_k^{−2} d − A′p = µ^{k+1} X_k^{−1} e − c,
    Ad = 0
• Left-multiplying the first equation by d′ (and using Ad = 0, so d′A′p = 0):
    µ^{k+1} d′X_k^{−2} d = d′( µ^{k+1} X_k^{−1} e − c )

    ||X_k^{−1} d||² = d′X_k^{−2} d
        = ( X_k^{−1} e − (1/µ^{k+1}) c )′ d
        = ( X_k^{−1} e − (1/µ^{k+1}) (s^k + A′p^k) )′ d
        = ( X_k^{−1} e − (1/µ^{k+1}) s^k )′ d
        = − ( (1/µ^{k+1}) X_k S_k e − e )′ X_k^{−1} d
        ≤ || (1/µ^{k+1}) X_k S_k e − e || · ||X_k^{−1} d||
        ≤ √β ||X_k^{−1} d||,

hence ||X_k^{−1} d|| ≤ √β < 1.

• We next show that x^{k+1} and (p^{k+1}, s^{k+1}) are primal and dual feasible. Since Ad = 0, we have Ax^{k+1} = b and
    x^{k+1} = x^k + d = X_k (e + X_k^{−1} d) > 0,
because ||X_k^{−1} d|| < 1. Also A′p^{k+1} + s^{k+1} = c by construction, and
    s^{k+1} = c − A′p^{k+1} = µ^{k+1} X_k^{−1} (e − X_k^{−1} d) > 0,
because ||X_k^{−1} d|| < 1.
• Componentwise,
    x_j^{k+1} = x_j^k (1 + d_j/x_j^k),
    s_j^{k+1} = (µ^{k+1}/x_j^k) (1 − d_j/x_j^k).

Therefore,
    (1/µ^{k+1}) x_j^{k+1} s_j^{k+1} − 1 = (1 + d_j/x_j^k)(1 − d_j/x_j^k) − 1 = −(d_j/x_j^k)².
• Let D = diag(d_1, . . . , d_n) and ||u||_1 = Σ_i |u_i|; note that ||u|| ≤ ||u||_1. Then
    || (1/µ^{k+1}) X_{k+1} S_{k+1} e − e || = || X_k^{−2} D² e ||
        ≤ || X_k^{−2} D² e ||_1
        = e′ X_k^{−2} D² e
        = e′ D X_k^{−2} D e
        = d′ X_k^{−2} d
        = || X_k^{−1} d ||²
        ≤ (√β)² = β,
and hence the induction is complete.


• Since at every iteration || (1/µ^k) X_k S_k e − e || ≤ β,
    −β ≤ (1/µ^k) x_j^k s_j^k − 1 ≤ β,
so
    n µ^k (1 − β) ≤ (s^k)′x^k ≤ n µ^k (1 + β)

•   µ^k = α^k µ^0 = ( 1 − (√β − β)/(√β + √n) )^k µ^0 ≤ e^{−k(√β − β)/(√β + √n)} µ^0
• After
    ( (√β + √n)/(√β − β) ) log( µ^0 n (1 + β)/ǫ ) ≤ ( (√β + √n)/(√β − β) ) log( (s^0)′x^0 (1 + β)/(ǫ(1 − β)) ) = K
iterations, the primal barrier algorithm finds primal and dual solutions x^K, (p^K, s^K) that have duality gap (s^K)′x^K less than or equal to ǫ.

7 Complexity
Slide 24
• Work per iteration involves solving a linear system with m + n equations in m + n unknowns. Given that m ≤ n, the work per iteration is O(n³).
• ǫ^0 = (s^0)′x^0: initial duality gap. The algorithm needs
    O( √n log(ǫ^0/ǫ) )
iterations to reduce the duality gap from ǫ^0 to ǫ, with O(n³) arithmetic operations per iteration.

Lecture 22: Primal-Dual Barrier Interior Point Algorithm

1 Outline
Slide 1
1. The Barrier Problem
2. Solving Equations
3. The Primal-Dual Barrier Algorithm
4. Insight on Behavior
5. Computational Aspects
6. Conclusions

2 The Barrier Problem


Slide 2
Barrier problem:
    min B_µ(x) = c′x − µ Σ_{j=1}^n log x_j
    s.t. Ax = b
KKT:
    c − µ (1/x_1(µ), . . . , 1/x_n(µ))′ + A′p(µ) = 0
    Ax(µ) = b,  x(µ) ≥ 0

2.1 Optimality Conditions
Slide 3
Set s_j(µ) = µ / x_j(µ). Then
    Ax(µ) = b,  x(µ) ≥ 0
    A′p(µ) + s(µ) = c,  s(µ) ≥ 0
    s_j(µ) x_j(µ) = µ,  i.e.,  X(µ)S(µ)e = µe
where X(µ) = diag( x_1(µ), . . . , x_n(µ) ), S(µ) = diag( s_1(µ), . . . , s_n(µ) )
3 Solving Equations
Slide 4
    F(z) = [ Ax − b ;  A′p + s − c ;  XSe − µe ]
with z = (x, p, s) and r = 2n + m. Solve
    F(z*) = 0

3.1 Newton’s method
Slide 5
    F(z^k + d) ≈ F(z^k) + J(z^k) d
Here J(z^k) is the r × r Jacobian matrix whose (i, j)th element is ∂F_i(z)/∂z_j evaluated at z = z^k. Set
    F(z^k) + J(z^k) d = 0
and z^{k+1} = z^k + d (d is the Newton direction).
Slide 6
(x^k, p^k, s^k): current primal and dual feasible solution. The Newton direction d = (d_x^k, d_p^k, d_s^k) solves
    [ A    0    0   ] [ d_x^k ]     [ Ax^k − b          ]
    [ 0    A′   I   ] [ d_p^k ] = − [ A′p^k + s^k − c   ]
    [ S_k  0    X_k ] [ d_s^k ]     [ X_k S_k e − µ^k e ]

3.2 Step lengths
Slide 7
    x^{k+1} = x^k + β_P^k d_x^k
    p^{k+1} = p^k + β_D^k d_p^k
    s^{k+1} = s^k + β_D^k d_s^k
To preserve nonnegativity, take
    β_P^k = min{ 1, α min_{i : (d_x^k)_i < 0} ( −x_i^k / (d_x^k)_i ) },
    β_D^k = min{ 1, α min_{i : (d_s^k)_i < 0} ( −s_i^k / (d_s^k)_i ) },
with 0 < α < 1.
4 The Primal-Dual Barrier Algorithm
Slide 8
1. (Initialization) Start with x^0 > 0, s^0 > 0, p^0, and set k = 0.
2. (Optimality test) If (s^k)′x^k < ǫ stop; else go to Step 3.
3. (Computation of Newton directions) Let
    µ^k = (s^k)′x^k / n,
    X_k = diag(x_1^k, . . . , x_n^k),
    S_k = diag(s_1^k, . . . , s_n^k).
Solve the linear system
    [ A    0    0   ] [ d_x^k ]     [ Ax^k − b          ]
    [ 0    A′   I   ] [ d_p^k ] = − [ A′p^k + s^k − c   ]
    [ S_k  0    X_k ] [ d_s^k ]     [ X_k S_k e − µ^k e ]
Slide 9
4. (Find step lengths)
    β_P^k = min{ 1, α min_{i : (d_x^k)_i < 0} ( −x_i^k / (d_x^k)_i ) }
    β_D^k = min{ 1, α min_{i : (d_s^k)_i < 0} ( −s_i^k / (d_s^k)_i ) }
5. (Solution update)
    x^{k+1} = x^k + β_P^k d_x^k
    p^{k+1} = p^k + β_D^k d_p^k
    s^{k+1} = s^k + β_D^k d_s^k
6. Let k := k + 1 and go to Step 2.
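One full iteration (Steps 3–5), sketched with numpy; primal_dual_step is a hypothetical name, and the dense Jacobian solve stands in for the structured solves used in real implementations:

    import numpy as np

    def primal_dual_step(A, b, c, x, p, s, alpha=0.99):
        m, n = A.shape
        mu = (s @ x) / n
        J = np.block([[A, np.zeros((m, m)), np.zeros((m, n))],
                      [np.zeros((n, n)), A.T, np.eye(n)],
                      [np.diag(s), np.zeros((n, m)), np.diag(x)]])
        rhs = -np.concatenate([A @ x - b, A.T @ p + s - c, x * s - mu])
        d = np.linalg.solve(J, rhs)
        dx, dp, ds = d[:n], d[n:n+m], d[n+m:]
        # step lengths preserving x, s > 0
        bP = min(1.0, alpha * min((-x[i] / dx[i] for i in range(n) if dx[i] < 0),
                                  default=np.inf))
        bD = min(1.0, alpha * min((-s[i] / ds[i] for i in range(n) if ds[i] < 0),
                                  default=np.inf))
        return x + bP * dx, p + bD * dp, s + bD * ds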

5 Insight on behavior
Slide 10
• Affine scaling:
    d_affine = −X² ( I − A′(AX²A′)^{−1} A X² ) c
• Primal barrier:
    d_primal-barrier = ( I − X²A′(AX²A′)^{−1}A ) ( Xe − (1/µ) X²c )
• For µ = ∞:
    d_centering = ( I − X²A′(AX²A′)^{−1}A ) Xe
• Note that
    d_primal-barrier = d_centering + (1/µ) d_affine
• When µ is large, the centering direction dominates, i.e., in the beginning, the barrier algorithm takes steps towards the analytic center
• When µ is small, the affine scaling direction dominates, i.e., towards the end, the barrier algorithm behaves like the affine scaling algorithm

6 Computational aspects of IPMs
Slide 11
Simplex vs. interior point methods (IPMs)
• The simplex method tends to perform poorly on large, massively degenerate problems, whereas IPMs are much less affected.
• Key step in IPMs: solve
    ( A X_k² A′ ) d = f
• In implementations of IPMs, A X_k² A′ is usually written as
    A X_k² A′ = LL′,
where L is a square lower triangular matrix called the Cholesky factor
• Solve the system ( A X_k² A′ ) d = f by solving the triangular systems
    Ly = f,   L′d = y
• The construction of L requires O(n³) operations, but the actual computational effort is highly dependent on the sparsity (number of nonzero entries) of L
• Large scale implementations employ heuristics (reordering rows and columns of A) to improve the sparsity of L. If L is sparse, IPMs are stronger.
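In code, the factor-once, two-triangular-solves pattern looks like this (scipy sketch; normal_equation_solve is a hypothetical name, and the dense diag build ignores the sparsity issues discussed above):

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    def normal_equation_solve(A, x, f):
        M = A @ np.diag(x**2) @ A.T          # A X_k^2 A' = L L'
        return cho_solve(cho_factor(M), f)   # Ly = f, then L'd = y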

7 Conclusions
Slide 12
• IPMs represent the present and future of Optimization.
• Very successful in solving very large problems.
• Extend to general convex problems


Lecture 23: Semidefinite Optimization


1 Outline
Slide 1
1. Preliminaries
2. SDO
3. Duality
4. SDO Modeling Power
5. Barrier Algorithm for SDO

2 Preliminaries
Slide 2
• A symmetric matrix A is positive semidefinite (A ⪰ 0) if and only if
    u′Au ≥ 0 ∀ u ∈ R^n
• A ⪰ 0 if and only if all eigenvalues of A are nonnegative
• A • B = Σ_{i=1}^n Σ_{j=1}^n A_ij B_ij

2.1 The trace
Slide 3
• The trace of a matrix A is defined as
    trace(A) = Σ_{j=1}^n A_jj
• trace(AB) = trace(BA)
• A • B = trace(A′B)
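These identities are easy to confirm numerically; a quick numpy check (a sketch, not part of the lecture):

    import numpy as np

    A, B = np.random.randn(4, 4), np.random.randn(4, 4)
    assert np.isclose(np.trace(A @ B), np.trace(B @ A))
    assert np.isclose(np.sum(A * B), np.trace(A.T @ B))   # A . B = trace(A'B)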

3 SDO
Slide 4
• C: symmetric n × n matrix
• A_i, i = 1, . . . , m: symmetric n × n matrices
• b_i, i = 1, . . . , m: scalars
• Semidefinite optimization problem (SDO):
    (P): min C • X
         s.t. A_i • X = b_i,  i = 1, . . . , m
              X ⪰ 0
3.1 Example
Slide 5
n = 3 and m = 2:
    A_1 = [1 0 1; 0 3 7; 1 7 5],   A_2 = [0 2 8; 2 6 0; 8 0 4],   C = [1 2 3; 2 9 0; 3 0 7]
    b_1 = 11,  b_2 = 19
    X = [x11 x12 x13; x21 x22 x23; x31 x32 x33]
Slide 6
    (P): min x11 + 4x12 + 6x13 + 9x22 + 7x33
         s.t. x11 + 2x13 + 3x22 + 14x23 + 5x33 = 11
              4x12 + 16x13 + 6x22 + 4x33 = 19
              X = [x11 x12 x13; x21 x22 x23; x31 x32 x33] ⪰ 0
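For illustration, this example can be handed directly to a modeling tool; a sketch using cvxpy (assuming cvxpy and an SDP solver such as SCS are installed):

    import cvxpy as cp
    import numpy as np

    A1 = np.array([[1, 0, 1], [0, 3, 7], [1, 7, 5]])
    A2 = np.array([[0, 2, 8], [2, 6, 0], [8, 0, 4]])
    C  = np.array([[1, 2, 3], [2, 9, 0], [3, 0, 7]])

    X = cp.Variable((3, 3), PSD=True)          # X is symmetric PSD
    prob = cp.Problem(cp.Minimize(cp.trace(C @ X)),     # C . X = trace(CX)
                      [cp.trace(A1 @ X) == 11, cp.trace(A2 @ X) == 19])
    prob.solve()
    print(prob.value, X.value)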

3.2 LO as SDO
Slide 7
    LO: min c′x
        s.t. Ax = b, x ≥ 0
    A_i = diag(a_i1, a_i2, . . . , a_in),   C = diag(c_1, c_2, . . . , c_n)
Slide 8
    (P): min C • X
         s.t. A_i • X = b_i,  i = 1, . . . , m
              X_ij = 0,  i = 1, . . . , n, j = i + 1, . . . , n
              X ⪰ 0
    X = diag(x_1, x_2, . . . , x_n)
4 Duality
Slide 9
    (D): max Σ_{i=1}^m y_i b_i
         s.t. Σ_{i=1}^m y_i A_i + S = C
              S ⪰ 0
Equivalently,
    (D): max Σ_{i=1}^m y_i b_i
         s.t. C − Σ_{i=1}^m y_i A_i ⪰ 0
4.1 Example
Slide 10
    (D): max 11y_1 + 19y_2
         s.t. y_1 [1 0 1; 0 3 7; 1 7 5] + y_2 [0 2 8; 2 6 0; 8 0 4] + S = [1 2 3; 2 9 0; 3 0 7]
              S ⪰ 0
Equivalently,
    (D): max 11y_1 + 19y_2
         s.t. [ 1 − y_1,        2 − 2y_2,          3 − y_1 − 8y_2;
                2 − 2y_2,       9 − 3y_1 − 6y_2,   −7y_1;
                3 − y_1 − 8y_2, −7y_1,             7 − 5y_1 − 4y_2 ] ⪰ 0

4.2 Weak Duality
Slide 11
Theorem: Given a feasible solution X of (P) and a feasible solution (y, S) of (D),
    C • X − Σ_{i=1}^m y_i b_i = S • X ≥ 0.
If C • X − Σ_{i=1}^m y_i b_i = 0, then X and (y, S) are each optimal solutions to (P) and (D), and SX = 0.
4.3 Proof
Slide 12
• We must show that if S ⪰ 0 and X ⪰ 0, then S • X ≥ 0
• Let S = PDP′ and X = QEQ′, where P, Q are orthonormal matrices and D, E are nonnegative diagonal matrices
    S • X = trace(S′X) = trace(SX)
          = trace(PDP′QEQ′)
          = trace(DP′QEQ′P) = Σ_{j=1}^n D_jj (P′QEQ′P)_jj ≥ 0,
since D_jj ≥ 0 and the diagonal of P′QEQ′P must be nonnegative.
• Suppose that trace(SX) = 0. Then
    Σ_{j=1}^n D_jj (P′QEQ′P)_jj = 0
• Then, for each j = 1, . . . , n, D_jj = 0 or (P′QEQ′P)_jj = 0.
• The latter case implies that the jth row of P′QEQ′P is all zeros. Therefore, DP′QEQ′P = 0, and so SX = PDP′QEQ′ = 0.

4.4 Strong Duality
Slide 13
• (P) or (D) might not attain their respective optima
• There might be a duality gap, unless certain regularity conditions hold
Theorem:
• If there exist feasible solutions X̂ for (P) and (ŷ, Ŝ) for (D) such that X̂ ≻ 0, Ŝ ≻ 0,
• then both (P) and (D) attain their optimal values z_P* and z_D*,
• and z_P* = z_D*

5 SDO vs LO
Slide 14
• There may be a finite or infinite duality gap. The primal and/or dual may or may not attain their optima. Both problems will attain their common optimal value if both programs have feasible solutions in the interior of the semidefinite cone.
• There is no finite algorithm for solving SDO. There is a simplex algorithm, but it is not a finite algorithm. There is no direct analog of a “basic feasible solution” for SDO.
• Given rational data, the feasible region may have no rational solutions. The optimal solution may not have rational components or rational eigenvalues.

6 SDO Modeling Power

6.1 Quadratically Constrained Problems
Slide 15
    min (A_0 x + b_0)′(A_0 x + b_0) − c_0′x − d_0
    s.t. (A_i x + b_i)′(A_i x + b_i) − c_i′x − d_i ≤ 0,  i = 1, . . . , m

    (Ax + b)′(Ax + b) − c′x − d ≤ 0  ⇔  [ I, Ax + b; (Ax + b)′, c′x + d ] ⪰ 0
Slide 16
    min t
    s.t. (A_0 x + b_0)′(A_0 x + b_0) − c_0′x − d_0 − t ≤ 0
         (A_i x + b_i)′(A_i x + b_i) − c_i′x − d_i ≤ 0, ∀ i
Slide 17
    min t
    s.t. [ I, A_0 x + b_0; (A_0 x + b_0)′, c_0′x + d_0 + t ] ⪰ 0
         [ I, A_i x + b_i; (A_i x + b_i)′, c_i′x + d_i ] ⪰ 0, ∀ i

6.2 Eigenvalue Problems
Slide 18
• X: symmetric n × n matrix
• λ_max(X) = largest eigenvalue of X
• λ_1(X) ≥ λ_2(X) ≥ · · · ≥ λ_n(X): eigenvalues of X
• Theorem: λ_max(X) ≤ t ⇔ t·I − X ⪰ 0
• Σ_{i=1}^k λ_i(X) ≤ t  ⇔  there exist s and Z with
    t − k·s − trace(Z) ≥ 0,
    Z ⪰ 0,
    Z − X + sI ⪰ 0
• Recall trace(Z) = Σ_{i=1}^n Z_ii
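The first equivalence is easy to probe numerically; a small sketch (the matrix X below is an arbitrary example):

    import numpy as np

    X = np.array([[2.0, 1.0], [1.0, 3.0]])
    t = np.linalg.eigvalsh(X).max()                      # lambda_max(X)
    # t*I - X is positive semidefinite exactly when t >= lambda_max(X):
    print(np.all(np.linalg.eigvalsh(t * np.eye(2) - X) >= -1e-9))   # True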

6.3 Optimizing Structural Dynamics
Slide 19
• Select x_i, the cross-sectional area of structure i, i = 1, . . . , n
• M(x) = M_0 + Σ_i x_i M_i: mass matrix
• K(x) = K_0 + Σ_i x_i K_i: stiffness matrix
• Structure weight: w = w_0 + Σ_i x_i w_i
• Dynamics:
    M(x) d̈ + K(x) d = 0
Slide 20
• d(t): vector of displacements
• d_i(t) = Σ_{j=1}^n α_ij cos(ω_j t − φ_j)
• det(K(x) − M(x) ω²) = 0;  ω_1 ≤ ω_2 ≤ · · · ≤ ω_n
• Fundamental frequency: ω_1 = λ_min^{1/2}(M(x), K(x))
• We want to bound the fundamental frequency:
    ω_1 ≥ Ω  ⇐⇒  M(x) Ω² − K(x) ⪯ 0
• Minimize weight
Slide 21
Problem: Minimize weight subject to
    fundamental frequency ω_1 ≥ Ω
    limits on cross-sectional areas
Formulation:
    min w_0 + Σ_i x_i w_i
    s.t. M(x) Ω² − K(x) ⪯ 0
         l_i ≤ x_i ≤ u_i
6.4 Measurements with Noise
Slide 22
• x: ability of a random student on k tests
    E[x] = x̄,  E[(x − x̄)(x − x̄)′] = Σ
• y: score of a random student on k tests
• v: testing error on the k tests, independent of x
    E[v] = 0,  E[vv′] = D, diagonal (unknown)
• y = x + v;  E[y] = x̄,
    E[(y − x̄)(y − x̄)′] = Σ̂ = Σ + D
• Objective: estimate x̄ and Σ reliably
Slide 23
• Take samples of y from which we can estimate x̄ and Σ̂
• e′x: total ability on the tests
• e′y: total test score
• Reliability of the test:
    Var[e′x] / Var[e′y] = e′Σe / e′Σ̂e = 1 − e′De / e′Σ̂e
Slide 24
We can find a lower bound on the reliability of the test:
    min e′Σe
    s.t. Σ + D = Σ̂
         Σ, D ⪰ 0
         D diagonal
Equivalently,
    max e′De
    s.t. 0 ⪯ D ⪯ Σ̂
         D diagonal

6.5 Further Tricks
Slide 25
    A = [ B, C′; C, D ] ⪰ 0  ⇐⇒  D − CB^{−1}C′ ⪰ 0   (for B ≻ 0; the Schur complement)

    x′Ax + 2b′x + c ≥ 0, ∀ x  ⇐⇒  [ c, b′; b, A ] ⪰ 0
6.6 MAXCUT
Slide 26
• Given G = (N, E) undirected graph, weights w_ij ≥ 0 on edges (i, j) ∈ E
• Find a subset S ⊆ N such that Σ_{i∈S, j∈S̄} w_ij is maximized
• x_j = 1 for j ∈ S and x_j = −1 for j ∈ S̄
    MAXCUT: max (1/4) Σ_{i=1}^n Σ_{j=1}^n w_ij (1 − x_i x_j)
            s.t. x_j ∈ {−1, 1},  j = 1, . . . , n

6.6.1 Reformulation
Slide 27
• Let Y = xx′, i.e., Y_ij = x_i x_j
• Let W = [w_ij]
• Equivalent formulation:
    MAXCUT: max (1/4) ( Σ_{i=1}^n Σ_{j=1}^n w_ij − W • Y )
            s.t. x_j ∈ {−1, 1},  j = 1, . . . , n
                 Y_jj = 1,  j = 1, . . . , n
                 Y = xx′

6.6.2 Relaxation
Slide 28
• Y = xx′ ⪰ 0
• Relaxation:
    RELAX: max (1/4) ( Σ_{i=1}^n Σ_{j=1}^n w_ij − W • Y )
           s.t. Y_jj = 1,  j = 1, . . . , n
                Y ⪰ 0
Slide 29
    MAXCUT ≤ RELAX
• It turns out that:
    0.87856 RELAX ≤ MAXCUT ≤ RELAX
• The value of the SDO relaxation is guaranteed to be no more than about 12% higher than the value of the very difficult to solve problem MAXCUT
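The 0.87856 guarantee comes from the Goemans–Williamson rounding of this relaxation: factor Y and cut with a random hyperplane. A numpy sketch (random_hyperplane_cut is a hypothetical name; Y is assumed to come from an SDP solver):

    import numpy as np

    def random_hyperplane_cut(Y, rng=np.random.default_rng(0)):
        # Factor Y = V'V (clip tiny negative eigenvalues from numerical noise)
        w, Q = np.linalg.eigh(Y)
        V = np.sqrt(np.clip(w, 0, None))[:, None] * Q.T   # columns v_j, v_i'v_j = Y_ij
        r = rng.standard_normal(V.shape[0])               # random hyperplane normal
        return np.sign(r @ V)                             # x_j = sign(r'v_j) in {-1, +1}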

8
7 Barrier Algorithm for SDO
Slide 30
• X ⪰ 0 ⇔ λ_1(X) ≥ 0, . . . , λ_n(X) ≥ 0
• Natural barrier to repel X from the boundary (λ_1(X) > 0, . . . , λ_n(X) > 0):
    − Σ_{i=1}^n log(λ_i(X)) = − log( Π_{i=1}^n λ_i(X) ) = − log(det(X))
Slide 31
• Logarithmic barrier problem:
    min B_µ(X) = C • X − µ log(det(X))
    s.t. A_i • X = b_i,  i = 1, . . . , m,
         X ≻ 0
• Derivative: ∇B_µ(X) = C − µX^{−1}
• KKT:
    A_i • X = b_i,  i = 1, . . . , m,
    X ≻ 0,
    C − µX^{−1} = Σ_{i=1}^m y_i A_i.
• Since X is symmetric positive definite, X = LL′. With
    S = µX^{−1} = µ(L′)^{−1}L^{−1},
we have (1/µ) L′SL = I, and the conditions become
    A_i • X = b_i,  i = 1, . . . , m,
    X ≻ 0,  X = LL′,
    Σ_{i=1}^m y_i A_i + S = C,
    I − (1/µ) L′SL = 0
• Nonlinear equations: take a Newton step analogously to IPMs for LO.
• The barrier algorithm needs O( √n log(ǫ^0/ǫ) ) iterations to reduce the duality gap from ǫ^0 to ǫ
8 Conclusions
Slide 32
• SDO is a very powerful modeling tool
• SDO represents the present and future in continuous optimization
• Barrier Algorithm is very powerful
• Research software available


Lecture 24: Discrete Optimization


1 Outline
Slide 1
• Modeling with integer variables
• What is a good formulation?
• Theme: The Power of Formulations

2 Integer Programming
2.1 Mixed IP
Slide 2
    (MIP) max c′x + h′y
          s.t. Ax + By ≤ b
               x ∈ Z_+^n (x ≥ 0, x integer)
               y ∈ R_+^n (y ≥ 0)

2.2 Pure IP
Slide 3
    (IP) max c′x
         s.t. Ax ≤ b
              x ∈ Z_+^n
Important special case: binary IP
    (BIP) max c′x
          s.t. Ax ≤ b
               x ∈ {0, 1}^n

2.3 LP
Slide 4
    (LP) max c′y
         s.t. By ≤ b
              y ∈ R_+^n

3 Modeling with Binary Variables

3.1 Binary Choice
Slide 5
    x = 1 if the event occurs, 0 otherwise
Example 1: IP formulation of the knapsack problem
    n: projects, total budget b
    a_j: cost of project j
    c_j: value of project j
Slide 6
    x_j = 1 if project j is selected, 0 otherwise.

    max Σ_{j=1}^n c_j x_j
    s.t. Σ_{j=1}^n a_j x_j ≤ b
         x_j ∈ {0, 1}

3.2 Modeling relations
Slide 7
• At most one event occurs:
    Σ_j x_j ≤ 1
• Neither or both events occur:
    x_2 − x_1 = 0
• If one event occurs, then another occurs:
    0 ≤ x_2 ≤ x_1
• If x = 0, then y = 0; if x = 1, then y is unconstrained:
    0 ≤ y ≤ U x,  x ∈ {0, 1}

3.3 The assignment problem
Slide 8
    n people, m jobs
    c_ij: cost of assigning person j to job i
    x_ij = 1 if person j is assigned to job i, 0 otherwise

    min Σ_{i,j} c_ij x_ij
    s.t. Σ_{j=1}^n x_ij = 1,  ∀ i   (each job is assigned)
         Σ_{i=1}^m x_ij ≤ 1,  ∀ j   (each person can do at most one job)
         x_ij ∈ {0, 1}

3.4 Multiple optimal solutions
Slide 9
• Generate all optimal solutions to a BOP:
    max c′x
    s.t. Ax ≤ b
         x ∈ {0, 1}^n
• Generate the third best?
• Extensions to MIO?
3.5 Nonconvex functions
Slide 10
• How to model min c(x), where c(x) is piecewise linear but not convex?

4 What is a good formulation?

4.1 Facility Location
Slide 11
• Data
    N = {1 . . . n}: potential facility locations
    I = {1 . . . m}: set of clients
    c_j: cost of a facility placed at j
    h_ij: cost of satisfying client i from facility j
• Decision variables
    x_j = 1 if a facility is placed at location j, 0 otherwise
    y_ij = fraction of the demand of client i satisfied by facility j
Slide 12
    IZ_1 = min Σ_{j=1}^n c_j x_j + Σ_{i=1}^m Σ_{j=1}^n h_ij y_ij
    s.t. Σ_{j=1}^n y_ij = 1,  ∀ i
         y_ij ≤ x_j,  ∀ i, j
         x_j ∈ {0, 1},  0 ≤ y_ij ≤ 1.
Slide 13
Consider an alternative formulation:
    IZ_2 = min Σ_{j=1}^n c_j x_j + Σ_{i=1}^m Σ_{j=1}^n h_ij y_ij
    s.t. Σ_{j=1}^n y_ij = 1,  ∀ i
         Σ_{i=1}^m y_ij ≤ m · x_j,  ∀ j
         x_j ∈ {0, 1},  0 ≤ y_ij ≤ 1.

Are both valid?
Which one is preferable?

4.2 Observations
Slide 14
• IZ_1 = IZ_2, since both formulations define the same set of integer points.
•   P_1 = { (x, y) : Σ_{j=1}^n y_ij = 1,  y_ij ≤ x_j,  0 ≤ x_j ≤ 1,  0 ≤ y_ij ≤ 1 }
    P_2 = { (x, y) : Σ_{j=1}^n y_ij = 1,  Σ_{i=1}^m y_ij ≤ m · x_j,  0 ≤ x_j ≤ 1,  0 ≤ y_ij ≤ 1 }
Slide 15
• Let
    Z_1 = min{ c′x + h′y : (x, y) ∈ P_1 },   Z_2 = min{ c′x + h′y : (x, y) ∈ P_2 }
• Z_2 ≤ Z_1 ≤ IZ_1 = IZ_2  (since P_1 ⊆ P_2)

4.3 Implications
Slide 16
• Finding IZ_1 (= IZ_2) is difficult.
• Finding Z_1 or Z_2 means solving an LP. Since Z_1 is closer to IZ_1, several methods (branch and bound) work better (actually much better) with Formulation 1.
• Suppose that solving min{ c′x + h′y : (x, y) ∈ P_1 } yields an integral solution. Have we solved the facility location problem?
Slide 17
• Formulation 1 is better than Formulation 2, despite the fact that 1 has a larger number of constraints than 2.
• What, then, is the criterion?

4.4 Ideal Formulations
Slide 18
• Let P be an LP relaxation for a problem
• Let
    H = { (x, y) : x ∈ {0, 1}^n } ∩ P
• Consider the convex hull of H:
    CH(H) = { x : x = Σ_i λ_i x^i,  Σ_i λ_i = 1,  λ_i ≥ 0,  x^i ∈ H }
Slide 19
• The extreme points of CH(H) have {0, 1} coordinates.
• So, if we know CH(H) explicitly, then by solving min{ c′x + h′y : (x, y) ∈ CH(H) } we solve the problem.
• Message: the quality of a formulation is judged by its closeness to CH(H):
    CH(H) ⊆ P_1 ⊆ P_2
5 Minimum Spanning Tree (MST)
Slide 20
• How do telephone companies bill you?
• It used to be that the rate per minute from Boston to LA was proportional to the distance in the MST
• Other applications: telecommunications, transportation (good lower bound for the TSP)
Slide 21
• Given a graph G = (V, E) undirected and costs c_e, e ∈ E.
• Find a tree of minimum cost spanning all the nodes.
• Decision variables: x_e = 1 if edge e is included in the tree, 0 otherwise
Slide 22
• The tree should be connected. How can you model this requirement?
• Let S be a set of vertices. Then S and V \ S should be connected
• Let δ(S) = {e = (i, j) ∈ E : i ∈ S, j ∈ V \ S}
• Then,
    Σ_{e∈δ(S)} x_e ≥ 1
• What is the number of edges in a tree?
• Then,
    Σ_{e∈E} x_e = n − 1

5.1 Formulation
Slide 23
    IZ_MST = min Σ_{e∈E} c_e x_e
    s.t.  Σ_{e∈δ(S)} x_e ≥ 1,  ∀ S ⊆ V, S ≠ ∅, V
          Σ_{e∈E} x_e = n − 1
          x_e ∈ {0, 1}.
Is this a good formulation?
Slide 24
    P_cut = { x ∈ R^|E| : 0 ≤ x ≤ e,
              Σ_{e∈E} x_e = n − 1,
              Σ_{e∈δ(S)} x_e ≥ 1  ∀ S ⊆ V, S ≠ ∅, V }
Is P_cut the CH(H)?

5
5.2 What is CH(H)?
Slide 25
Let
    P_sub = { x ∈ R^|E| : Σ_{e∈E} x_e = n − 1,
              Σ_{e∈E(S)} x_e ≤ |S| − 1  ∀ S ⊆ V, S ≠ ∅, V }
    E(S) = { e = (i, j) : i ∈ S, j ∈ S }
Why is this a valid IP formulation?
Slide 26
• Theorem: P_sub = CH(H).
• ⇒ P_sub is the best possible formulation.
• MESSAGE: Good formulations can have an exponential number of constraints.

6 The Traveling Salesman Problem
Slide 27
Given G = (V, E) an undirected graph, V = {1, . . . , n}, costs c_e ∀ e ∈ E. Find a tour that minimizes total length.

6.1 Formulation I
Slide 28
    x_e = 1 if edge e is included in the tour, 0 otherwise.

    min Σ_{e∈E} c_e x_e
    s.t. Σ_{e∈δ(S)} x_e ≥ 2,  ∀ S ⊂ V, S ≠ ∅, V
         Σ_{e∈δ(i)} x_e = 2,  ∀ i ∈ V
         x_e ∈ {0, 1}

6.2 Formulation II
Slide 29
    min Σ_{e∈E} c_e x_e
    s.t. Σ_{e∈E(S)} x_e ≤ |S| − 1,  ∀ S ⊂ V, S ≠ ∅, V
         Σ_{e∈δ(i)} x_e = 2,  ∀ i ∈ V
         x_e ∈ {0, 1}
Slide 30

    P_cut^TSP = { x ∈ R^|E| : Σ_{e∈δ(S)} x_e ≥ 2,  Σ_{e∈δ(i)} x_e = 2,  0 ≤ x_e ≤ 1 }

    P_sub^TSP = { x ∈ R^|E| : Σ_{e∈δ(i)} x_e = 2,  Σ_{e∈E(S)} x_e ≤ |S| − 1,  0 ≤ x_e ≤ 1 }
Slide 31
• Theorem: P_cut^TSP = P_sub^TSP ⊇ CH(H), and the inclusion is strict in general
• Nobody knows CH(H) for the TSP

7 Minimum Matching
Slide 32
• Given G = (V, E) with costs c_e on e ∈ E. Find a perfect matching of minimum cost.
• Formulation:
    min Σ_{e∈E} c_e x_e
    s.t. Σ_{e∈δ(i)} x_e = 1,  ∀ i ∈ V
         x_e ∈ {0, 1}
• Is the LP relaxation CH(H)?
Slide 33
Let
    P_MAT = { x ∈ R^|E| : Σ_{e∈δ(i)} x_e = 1,
              Σ_{e∈δ(S)} x_e ≥ 1  for all S with |S| = 2k + 1, S ≠ ∅,
              x_e ≥ 0 }
Theorem: P_MAT = CH(H)

8 Observations
Slide 34
• For MST and matching there are efficient algorithms; CH(H) is known.
• For the TSP no efficient algorithm is known; TSP is an NP-hard problem, and CH(H) is not known.
• Conjecture: The convex hulls of problems that are polynomially solvable are explicitly known.
9 Summary
Slide 35
1. An IP formulation is better than another one if the polyhedron of its LP relaxation is closer to the convex hull of the IP.
2. A good formulation can have an exponential number of constraints.
3. Conjecture: Formulations characterize the complexity of problems. If a problem is solvable in polynomial time, then the convex hull of solutions is known.

Lecture 25: Exact Methods for Discrete Optimization

1 Outline
Slide 1
• Cutting plane methods
• Branch and bound methods

2 Cutting plane methods


Slide 2
    min c′x
    s.t. Ax = b
         x ≥ 0
         x integer
LP relaxation:
    min c′x
    s.t. Ax = b
         x ≥ 0.

2.1 Algorithm
Slide 3
• Solve the LP relaxation. Let x∗ be an optimal solution.
• If x∗ is integer stop; x∗ is an optimal solution to IP.
• If not, add to the LP relaxation a linear inequality constraint that all integer solutions satisfy but x* does not; go to Step 1.

2.2 Example
Slide 4
• Let x* be an optimal BFS to the LP relaxation with at least one fractional basic variable.
• N: set of indices of the nonbasic variables.
• Is this a valid cut?
    Σ_{j∈N} x_j ≥ 1.

2.3 The Gomory cutting plane algorithm
Slide 5
• Let x* be an optimal BFS and B an optimal basis:
    x_B + B^{−1} A_N x_N = B^{−1} b.
• Let a_ij = (B^{−1}A_j)_i and a_i0 = (B^{−1}b)_i. Then
    x_i + Σ_{j∈N} a_ij x_j = a_i0.
• Since x_j ≥ 0 for all j,
    x_i + Σ_{j∈N} ⌊a_ij⌋ x_j ≤ x_i + Σ_{j∈N} a_ij x_j = a_i0.
• Since the x_j are integer,
    x_i + Σ_{j∈N} ⌊a_ij⌋ x_j ≤ ⌊a_i0⌋.
• This is a valid cut.

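Generating the cut from a tableau row is a one-liner in code; a sketch with a hypothetical gomory_cut helper (the row coefficients come from the example in the next section):

    import math

    def gomory_cut(a_row, a_i0):
        """Cut  x_i + sum_j floor(a_ij) x_j <= floor(a_i0)  from row coefficients."""
        return {j: math.floor(a) for j, a in a_row.items()}, math.floor(a_i0)

    # Row  x2 + (1/10) x3 + (2/5) x4 = 5/2  gives the cut  x2 <= 2:
    print(gomory_cut({'x3': 0.1, 'x4': 0.4}, 2.5))   # ({'x3': 0, 'x4': 0}, 2)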
2.4 Example
Slide 6
    min x1 − 2x2
    s.t. −4x1 + 6x2 ≤ 9
         x1 + x2 ≤ 4
         x1, x2 ≥ 0, integer.
We transform the problem into standard form:
    min x1 − 2x2
    s.t. −4x1 + 6x2 + x3 = 9
         x1 + x2 + x4 = 4
         x1, . . . , x4 ≥ 0, integer.
LP relaxation: x^1 = (3/2, 5/2).
Slide 7
One row of the optimal tableau reads
    x2 + (1/10) x3 + (2/5) x4 = 5/2.
• Gomory cut:
    x2 ≤ 2.
• Add the constraint x2 + x5 = 2, x5 ≥ 0
• New optimum: x^2 = (3/4, 2).
• One of the equations in the optimal tableau is
    x1 − (1/4) x3 + (3/2) x5 = 3/4.
• New Gomory cut:
    x1 − x3 + x5 ≤ 0,
which in the original variables reads −3x1 + 5x2 ≤ 7.
• The new optimal solution is x^3 = (1, 2), which is integer.
Slide 8

[Figure: the feasible region in the (x1, x2)-plane with the iterates x^1, x^2, x^3 and the two Gomory cuts x2 ≤ 2 and −3x1 + 5x2 ≤ 7.]
3 Branch and bound
Slide 9
1. Branching: Select an active subproblem F_i.
2. Pruning: If the subproblem is infeasible, delete it.
3. Bounding: Otherwise, compute a lower bound b(F_i) for the subproblem.
4. Pruning: If b(F_i) ≥ U, the current best upper bound, delete the subproblem.
5. Partitioning: If b(F_i) < U, either obtain an optimal solution to the subproblem (stop), or break the corresponding problem into further subproblems, which are added to the list of active subproblems.

3.1 LP Based
Slide 10
• Compute the lower bound b(F) by solving the LP relaxation of the discrete optimization problem.
• From the LP solution x*, if there is a component x_i* which is fractional, we create two subproblems by adding either one of the constraints
    x_i ≤ ⌊x_i*⌋   or   x_i ≥ ⌈x_i*⌉.
Note that both constraints are violated by x*.
• If there is more than one fractional component, we use selection rules such as maximum infeasibility to determine the inequality to be added to the problem.
• Select the active subproblem using either depth-first or breadth-first search strategies.

3.2 Example
Slide 11
max 12x1 + 8x2 + 7x3 + 6x4
s.t. 8x1 + 6x2 + 5x3 + 4x4 ≤ 15
x1 , x2 , x3 , x4 are binary.

LP relaxation:
Slide 12
    max 12x1 + 8x2 + 7x3 + 6x4
    s.t. 8x1 + 6x2 + 5x3 + 4x4 ≤ 15
         x1 ≤ 1, x2 ≤ 1, x3 ≤ 1, x4 ≤ 1
         x1, x2, x3, x4 ≥ 0
LP solution: x1 = 1, x2 = 0, x3 = 0.6, x4 = 1; profit = 22.2

3.2.1 Branch and bound tree
Slide 13
    Root: value 22.2 at x = (1, 0, 0.6, 1); branch on x3.
      x3 = 0: value 22 at x = (1, 0.5, 0, 1).
      x3 = 1: value 22 at x = (1, 0, 1, 0.5); branch on x4.
        x4 = 0: value 21.66 at x = (1, 0.3, 1, 0).
        x4 = 1: value 22 at x = (0.75, 0, 1, 1); branch on x1.
          x1 = 0: value 21 at x = (0, 1, 1, 1), integral — incumbent.
          x1 = 1: infeasible.
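The whole search can be reproduced with a few lines of LP-based branch and bound; a sketch using scipy.optimize.linprog with maximum-infeasibility branching and depth-first search (branch_and_bound is a hypothetical helper; its tree order may differ from the slides, but it finds the same optimum, 21 at x = (0, 1, 1, 1)):

    import numpy as np
    from scipy.optimize import linprog

    c = np.array([12, 8, 7, 6]); a = np.array([8, 6, 5, 4]); b = 15

    def branch_and_bound(bounds, best=(-np.inf, None)):
        res = linprog(-c, A_ub=[a], b_ub=[b], bounds=bounds)   # maximize via -c
        if not res.success or -res.fun <= best[0]:
            return best                           # prune: infeasible or bound <= U
        x = res.x
        i = int(np.argmax(np.abs(x - np.round(x))))            # most fractional
        if abs(x[i] - round(x[i])) < 1e-6:
            return (-res.fun, np.round(x))        # integral: new incumbent
        lo, hi = list(bounds), list(bounds)
        lo[i] = (bounds[i][0], np.floor(x[i]))    # branch  x_i <= floor(x_i*)
        hi[i] = (np.ceil(x[i]), bounds[i][1])     # branch  x_i >= ceil(x_i*)
        best = branch_and_bound(lo, best)
        return branch_and_bound(hi, best)

    print(branch_and_bound([(0, 1)] * 4))         # (21.0, array([0., 1., 1., 1.]))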

3.3 Pigeonhole Problem
Slide 14
Slide 15
• There are n + 1 pigeons and n holes. We want to place the pigeons in the holes in such a way that no two pigeons go into the same hole.
Slide 16
• Let x_ij = 1 if pigeon i goes into hole j, 0 otherwise.
Slide 17
• Formulation 1:
    Σ_j x_ij = 1,  i = 1, . . . , n + 1
    x_ij + x_kj ≤ 1,  ∀ j, i ≠ k
4

• Formulation 2:
    Σ_j x_ij = 1,  i = 1, . . . , n + 1
    Σ_{i=1}^{n+1} x_ij ≤ 1,  ∀ j
Which formulation is better for the problem?
Slide 18


• The pigeonhole problem is infeasible.
• Formulation 1 has the feasible LP solution x_ij = 1/n for all i, j, and O(n³) constraints. Nearly complete enumeration is needed for LP-based BB, since the LP relaxation remains feasible after fixing many variables.
• Formulation 2: the LP relaxation is already infeasible, with O(n) constraints.
• Message: the formulation of the problem is important!

3.4 Preprocessing
Slide 19
• An effective way of improving integer programming formulations prior to and during branch-and-bound.
• Logical tests:
    – removal of empty (all zeros) rows and columns;
    – removal of rows dominated by multiples of other rows;
    – strengthening the bounds within rows by comparing individual variables and coefficients to the right-hand side;
    – additional strengthening may be possible for integral variables using rounding.
• Probing: temporarily setting a 0–1 variable to 0 or 1 and redoing the logical tests. This forces logical connections between variables. For example, if 5x + 4y + z ≤ 8, x, y, z ∈ {0, 1}, then setting x = 1 forces y = 0. This leads to the inequality x + y ≤ 1.

[Figure: a 7-node example in which the assignment solution decomposes into two subtours, on nodes {1, 2, 3, 4} and {5, 6, 7}, rather than a single tour — which is why the assignment value only bounds the tour length from below.]
4 Application

4.1 Directed TSP

4.1.1 Assignment Lower Bound
Slide 20
Given a directed graph G = (N, A) with n nodes, and a cost c_ij for every arc, find a tour (a directed cycle that visits all nodes) of minimum cost.
    min Σ_{i=1}^n Σ_{j=1}^n c_ij x_ij
    s.t. Σ_{i=1}^n x_ij = 1,  j = 1, . . . , n,
         Σ_{j=1}^n x_ij = 1,  i = 1, . . . , n,
         x_ij ∈ {0, 1}.

Slide 21
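The assignment relaxation can be solved exactly and fast; a scipy sketch (assignment_bound is a hypothetical helper, and the large diagonal cost is an assumed big-M that forbids self-assignment):

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def assignment_bound(cost, big=1e9):
        C = np.asarray(cost, dtype=float).copy()
        np.fill_diagonal(C, big)              # no city is assigned to itself
        rows, cols = linear_sum_assignment(C) # Hungarian-type algorithm
        return C[rows, cols].sum()            # <= length of the optimal tour

Because the assignment may split into subtours (as in the figure above), its value is a valid lower bound b(F) for branch and bound on the directed TSP.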

4.2 Improving BB
Slide 22
• Better LP solver
• Use problem structure to derive better branching strategy
• Better choice of lower bound b(F ) - better relaxation
• Better choice of upper bound U - heuristic to get good solution
• KEY: Start pruning the search tree as early as possible

MIT OpenCourseWare
http://ocw.mit.edu

6.251J / 15.081J Introduction to Mathematical Programming


Fall 2009

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
