Uploaded by Ammaar Saiyed

cvx-intro-notes

advertisement
M\cr NA
Manchester Numerical Analysis
MATH36061
Convex Optimization
Martin Lotz
School of Mathematics
The University of Manchester
Manchester, September 29, 2015
Outline
General information
What is optimization?
Course overview
Outline
General information
What is optimization?
Course overview
Organisation
The course website can be found under
http://www.maths.manchester.ac.uk/~mlotz/teaching/cvx/
1 / 24
Organisation
The course website can be found under
http://www.maths.manchester.ac.uk/~mlotz/teaching/cvx/
I
Tutorials start in week two, the tutorial class in week one will
consist of an introduction to CVX.
1 / 24
Organisation
The course website can be found under
http://www.maths.manchester.ac.uk/~mlotz/teaching/cvx/
I
Tutorials start in week two, the tutorial class in week one will
consist of an introduction to CVX.
I
Problem sets consist of two parts: problems to be worked on
at home, and problems to be discussed in class.
1 / 24
Organisation
The course website can be found under
http://www.maths.manchester.ac.uk/~mlotz/teaching/cvx/
I
Tutorials start in week two, the tutorial class in week one will
consist of an introduction to CVX.
I
Problem sets consist of two parts: problems to be worked on
at home, and problems to be discussed in class.
I
All the material presented in the lecture will be made available
online as the course progresses.
1 / 24
Organisation
The course website can be found under
http://www.maths.manchester.ac.uk/~mlotz/teaching/cvx/
I
Tutorials start in week two, the tutorial class in week one will
consist of an introduction to CVX.
I
Problem sets consist of two parts: problems to be worked on
at home, and problems to be discussed in class.
I
All the material presented in the lecture will be made available
online as the course progresses.
I
Lecture notes will appear on the website towards the end of
each week.
1 / 24
Organisation
The course website can be found under
http://www.maths.manchester.ac.uk/~mlotz/teaching/cvx/
I
Tutorials start in week two, the tutorial class in week one will
consist of an introduction to CVX.
I
Problem sets consist of two parts: problems to be worked on
at home, and problems to be discussed in class.
I
All the material presented in the lecture will be made available
online as the course progresses.
I
Lecture notes will appear on the website towards the end of
each week.
I
The material from the website is also available on Blackboard.
1 / 24
Organisation
The course website can be found under
http://www.maths.manchester.ac.uk/~mlotz/teaching/cvx/
I
Tutorials start in week two, the tutorial class in week one will
consist of an introduction to CVX.
I
Problem sets consist of two parts: problems to be worked on
at home, and problems to be discussed in class.
I
All the material presented in the lecture will be made available
online as the course progresses.
I
Lecture notes will appear on the website towards the end of
each week.
I
The material from the website is also available on Blackboard.
I
Contact: martin.lotz@manchester.ac.uk
1 / 24
Example classes
Example classes are there to deepen the understanding of the
course material.
2 / 24
Example classes
Example classes are there to deepen the understanding of the
course material.
I
Problems sheets should ideally be looked at before the
example class.
2 / 24
Example classes
Example classes are there to deepen the understanding of the
course material.
I
I
Problems sheets should ideally be looked at before the
example class.
In the example classes,
I
I
I
I
you will be given some time to work on the problems,
you should ask questions if some parts are not clear,
you will get feedback on your attempts at the problems,
we will go through a selection of problems and their solutions.
2 / 24
Important dates
Reserve these dates:
I
Tue, November 10: Midterm (coursework) test (50 minutes)
I
Thu, December 17: Revision class for final exam
I
Date of final exam will be announced when available!
3 / 24
MATLAB
4 / 24
MATLAB
I
Matlab, or an equivalent system, is an indispensable tool for
this course.
4 / 24
MATLAB
I
I
Matlab, or an equivalent system, is an indispensable tool for
this course.
Availability:
I
I
On campus computers
Student version available at low cost
4 / 24
MATLAB
I
I
Matlab, or an equivalent system, is an indispensable tool for
this course.
Availability:
I
I
I
On campus computers
Student version available at low cost
The course page contains links to Matlab documentation
4 / 24
MATLAB
I
I
Matlab, or an equivalent system, is an indispensable tool for
this course.
Availability:
I
I
On campus computers
Student version available at low cost
I
The course page contains links to Matlab documentation
I
Alternatives: Python, Julia
4 / 24
MATLAB
I
I
Matlab, or an equivalent system, is an indispensable tool for
this course.
Availability:
I
I
On campus computers
Student version available at low cost
I
The course page contains links to Matlab documentation
I
Alternatives: Python, Julia
I
An easy to parse introduction to Matlab is available here:
http://www.guettel.com/teaching/matlab/
4 / 24
CVX
5 / 24
CVX
I
CVX is a convex optimization package for MATLAB.
5 / 24
CVX
I
CVX is a convex optimization package for MATLAB.
I
CVX is freely available on
http://cvxr.com/cvx/
5 / 24
CVX
I
CVX is a convex optimization package for MATLAB.
I
CVX is freely available on
http://cvxr.com/cvx/
I
It is easy to install and the documentation provides some
examples to get started.
5 / 24
Outline
General information
What is optimization?
Course overview
Mathematical optimization
A mathematical optimization problem is a problem of the form
minimize
subject to
f (x)
g1 (x) ≤ 0
···
gm (x) ≤ 0
x∈Ω
where f, g1 , . . . , gm : Rd → R are real functions and Ω ⊆ Rd .
6 / 24
Mathematical optimization
A mathematical optimization problem is a problem of the form
minimize
subject to
f (x)
g1 (x) ≤ 0
···
gm (x) ≤ 0
x∈Ω
where f, g1 , . . . , gm : Rd → R are real functions and Ω ⊆ Rd .
I
f is the objective function, the gi and Ω the constraints.
6 / 24
Mathematical optimization
A mathematical optimization problem is a problem of the form
minimize
subject to
f (x)
g1 (x) ≤ 0
···
gm (x) ≤ 0
x∈Ω
where f, g1 , . . . , gm : Rd → R are real functions and Ω ⊆ Rd .
I
f is the objective function, the gi and Ω the constraints.
I
An x∗ such that f (x∗ ) ≤ f (x) for all other x satisfying the
constraints is called a solution of the problem.
6 / 24
Example: least squares
Assume a quantity Y depends linearly on predictors X1 , . . . , Xn :
Y = β0 + β1 X1 + · · · + βp Xp + ε,
(3.1)
where ε is some random error.
7 / 24
Example: least squares
Assume a quantity Y depends linearly on predictors X1 , . . . , Xn :
Y = β0 + β1 X1 + · · · + βp Xp + ε,
(3.1)
where ε is some random error.
I
Example: Y could sales, and X1 , . . . , Xp advertisement
budget in different media (TV, internet, radio, billboards,
newspapers).
7 / 24
Example: least squares
Assume a quantity Y depends linearly on predictors X1 , . . . , Xn :
Y = β0 + β1 X1 + · · · + βp Xp + ε,
(3.1)
where ε is some random error.
I
Example: Y could sales, and X1 , . . . , Xp advertisement
budget in different media (TV, internet, radio, billboards,
newspapers).
I
Goal: determine model parameters β0 , . . . , βp .
7 / 24
Example: least squares
Assume a quantity Y depends linearly on predictors X1 , . . . , Xn :
Y = β0 + β1 X1 + · · · + βp Xp + ε,
(3.1)
where ε is some random error.
I
Example: Y could sales, and X1 , . . . , Xp advertisement
budget in different media (TV, internet, radio, billboards,
newspapers).
I
Goal: determine model parameters β0 , . . . , βp .
I
What we know is data from n ≥ p observations or
experiments:
yi = β0 + β1 xi1 + · · · + βp xip + εi ,
1 ≤ i ≤ n.
7 / 24
Example: least squares
What we know is data from n ≥ p observations or experiments:
yi = β0 + β1 xi1 + · · · + βp xip + εi ,
1 ≤ i ≤ n.
8 / 24
Example: least squares
What we know is data from n ≥ p observations or experiments:
yi = β0 + β1 xi1 + · · · + βp xip + εi ,
1 ≤ i ≤ n.
Collect the data in matrices and vectors:


y1
1 x11 · · ·
 .. . .
 .. 
..
y =  .  , X = .
.
.
yn
1 xn1 · · ·


 
β0
ε1
β1 
 
 .. 

 , β =  ..  , ε =  . 
.
εn
xnp
βp
x1p


8 / 24
Example: least squares
What we know is data from n ≥ p observations or experiments:
yi = β0 + β1 xi1 + · · · + βp xip + εi ,
1 ≤ i ≤ n.
Collect the data in matrices and vectors:


y1
1 x11 · · ·
 .. . .
 .. 
..
y =  .  , X = .
.
.
yn
1 xn1 · · ·


 
β0
ε1
β1 
 
 .. 

 , β =  ..  , ε =  . 
.
εn
xnp
βp
x1p


The relationship can then be written as
y = Xβ + ε.
8 / 24
Example: least squares
Based on the relationship
y = Xβ + ε,
choose β that minimizes the squared 2-norm of the error,
minimize
ky − Xβk22
9 / 24
Example: least squares
Based on the relationship
y = Xβ + ε,
choose β that minimizes the squared 2-norm of the error,
minimize
I
ky − Xβk22
This is an optimization problem with a quadratic objective
function and no constraints.
9 / 24
Example: least squares
Based on the relationship
y = Xβ + ε,
choose β that minimizes the squared 2-norm of the error,
minimize
ky − Xβk22
I
This is an optimization problem with a quadratic objective
function and no constraints.
I
The problem has a closed form solution
β = (X > X)−1 X > y.
9 / 24
Example: least squares
Based on the relationship
y = Xβ + ε,
choose β that minimizes the squared 2-norm of the error,
minimize
ky − Xβk22
I
This is an optimization problem with a quadratic objective
function and no constraints.
I
The problem has a closed form solution
β = (X > X)−1 X > y.
I
There are more efficient methods available than using the
closed form solution.
9 / 24
Application: linear regression
Observation: the log of the basal metabolic rate (energy
expenditure) rate Y in mammals is related to log adult mass X as
Y = β0 + β1 X.
10 / 24
Application: linear regression
Observation: the log of the basal metabolic rate (energy
expenditure) rate Y in mammals is related to log adult mass X as
Y = β0 + β1 X.
I
Collect data from a database 573 mammal species and plot
the relationship.
10 / 24
Application: linear regression
Observation: the log of the basal metabolic rate (energy
expenditure) rate Y in mammals is related to log adult mass X as
Y = β0 + β1 X.
I
Collect data from a database 573 mammal species and plot
the relationship.
I
Assemble the vector y, matrix X, and solve the problem
minimize
ky − Xβk2
cvx begin
variable x (2);
m i n i m i z e ( norm ( y−X∗x , 2 ) )
cvx end
10 / 24
Application: linear regression
Y = β0 + β1 X.
12
log Basal metabolic rate
10
8
6
4
2
0
0
2
4
6
8
log Adult body mass
10
12
14
11 / 24
Application: linear regression
Y = β0 + β1 X.
Figure: Episode Size Matters from TV series Wonders of Life
11 / 24
Example: linear programming
Linear programming refers to problems of the form
maximize
c1 x1 + · · · cn xn
subject to
a11 x1 + · · · + a1n xn ≤ b1
···
am1 x1 + · · · + amn xn ≤ bm
x1 ≥ 0, . . . , xn ≥ 0.
12 / 24
Example: linear programming
Linear programming refers to problems of the form
maximize
c1 x1 + · · · cn xn
subject to
a11 x1 + · · · + a1n xn ≤ b1
···
am1 x1 + · · · + amn xn ≤ bm
x1 ≥ 0, . . . , xn ≥ 0.
I
Can be rewritten in standard form shown at the beginning!
12 / 24
Example: linear programming
Linear programming refers to problems of the form
maximize
c1 x1 + · · · cn xn
subject to
a11 x1 + · · · + a1n xn ≤ b1
···
am1 x1 + · · · + amn xn ≤ bm
x1 ≥ 0, . . . , xn ≥ 0.
I
Can be rewritten in standard form shown at the beginning!
I
Using matrices and vectors:
maximize
hc, xi (= c> x)
subject to
Ax ≤ b
(inequality componentwise)
x≥0
12 / 24
Linear programming: geometry
{x : ha, xi ≤ b}.
13 / 24
Linear programming: geometry
{x : ha1 , xi ≤ b1 , . . . , ham , xi ≤ bm } = {x : Ax ≤ b}.
Polyhedron
14 / 24
Linear programming: geometry
maximize
hc, xi
subject to
Ax ≤ b.
15 / 24
Linear programming: application
A cargo plane has two compartments with capacities C1 = 35 and
C2 = 40 tonnes and volumes V1 = 250 and V2 = 400 cubic metres.
16 / 24
Linear programming: application
A cargo plane has two compartments with capacities C1 = 35 and
C2 = 40 tonnes and volumes V1 = 250 and V2 = 400 cubic metres.
I
Three types of cargo:
Cargo 1
Cargo 2
Cargo 3
Volume
8
10
7
Weight
25
32
28
Profit (£/ tonne)
£300
£350
£270
16 / 24
Linear programming: application
A cargo plane has two compartments with capacities C1 = 35 and
C2 = 40 tonnes and volumes V1 = 250 and V2 = 400 cubic metres.
I
Three types of cargo:
Cargo 1
Cargo 2
Cargo 3
I
Volume
8
10
7
Weight
25
32
28
Profit (£/ tonne)
£300
£350
£270
How to distribute the cargo to maximize profit?
16 / 24
Linear programming: application
xij : amount of cargo i in compartment j
Objective function:
f (x) = 300 · (x11 + x12 ) + 350 · (x21 + x22 ) + 270 · (x31 + x32 ).
17 / 24
Linear programming: application
xij : amount of cargo i in compartment j
Objective function:
f (x) = 300 · (x11 + x12 ) + 350 · (x21 + x22 ) + 270 · (x31 + x32 ).
Constraints:
x11 + x12 ≤ 25
(total amount of cargo 1)
x21 + x22 ≤ 32
(total amount of cargo 2)
x31 + x32 ≤ 28
(total amount of cargo 3)
x11 + x21 + x31 ≤ 35
(weight constraint on comp. 1)
x12 + x22 + x32 ≤ 40
(weight constraint on comp. 2)
8x11 + 10x21 + 7x31 ≤ 250
8x12 + 10x22 + 7x32 ≤ 400
(volume constraint on comp. 1)
(volume constraint on comp. 2)
(x11 + x21 + x31 )/35 − (x12 + x22 + x32 )/40 = 0
(maintain balance of weight ratio)
xij ≥ 0
(cargo can’t have negative weight)
17 / 24
Linear programming: application
As complicated as it might look, it can be solved easily with CVX.
c = [300;300;350;350;270;270];
A = [1 , 1 , 0 , 0 , 0 , 0; . . .
0 , 0 , 1 , 1 , 0 , 0; . . .
0 , 0 , 0 , 0 , 1 , 1; . . .
1 , 0 , 1 , 0 , 1 , 0; . . .
0 , 1 , 0 , 1 , 0 , 1; . . .
8 , 0 , 10 , 0 , 7 , 0 ; . . .
0 , 8 , 0 , 10 , 0 , 7 ] ;
B = [ 1 / 3 5 , −1/40, 1 / 3 5 , −1/40, 1 / 3 5 , −1/40];
b = [25;32;28;35;40;250;400];
cvx begin
variable x (6);
maximize ( c ’∗ x ) ;
s u b j e c t to
A∗x <= b ;
B∗x == 0 ;
x >= 0 ;
cvx end
18 / 24
Linear programming: application
As complicated as it might look, it can be solved easily with CVX.
c = [300;300;350;350;270;270];
A = [1 , 1 , 0 , 0 , 0 , 0; . . .
0 , 0 , 1 , 1 , 0 , 0; . . .
0 , 0 , 0 , 0 , 1 , 1; . . .
1 , 0 , 1 , 0 , 1 , 0; . . .
0 , 1 , 0 , 1 , 0 , 1; . . .
8 , 0 , 10 , 0 , 7 , 0 ; . . .
0 , 8 , 0 , 10 , 0 , 7 ] ;
B = [ 1 / 3 5 , −1/40, 1 / 3 5 , −1/40, 1 / 3 5 , −1/40];
b = [25;32;28;35;40;250;400];
cvx begin
variable x (6);
maximize ( c ’∗ x ) ;
s u b j e c t to
A∗x <= b ;
B∗x == 0 ;
x >= 0 ;
cvx end
x11 = 6.7500, x12 = 7.7143, x21 = 0, x22 = 32, x31 = 28, x32 = 0.
18 / 24
Example: portfolio optimization
Problem: invest proportion xi of funds in asset i, 1 ≤ i ≤ n.
19 / 24
Example: portfolio optimization
Problem: invest proportion xi of funds in asset i, 1 ≤ i ≤ n.
I
If asset i has expected return µi , then the overall expected
return of the investment will be
x1 µ1 + · · · + xn µn = µ> x
19 / 24
Example: portfolio optimization
Problem: invest proportion xi of funds in asset i, 1 ≤ i ≤ n.
I
If asset i has expected return µi , then the overall expected
return of the investment will be
x1 µ1 + · · · + xn µn = µ> x
I
The risk of the investment is modelled using the estimated
covariance matrix Σ and the quadratic function x> Σx.
19 / 24
Example: portfolio optimization
Problem: invest proportion xi of funds in asset i, 1 ≤ i ≤ n.
I
If asset i has expected return µi , then the overall expected
return of the investment will be
x1 µ1 + · · · + xn µn = µ> x
I
The risk of the investment is modelled using the estimated
covariance matrix Σ and the quadratic function x> Σx.
I
Goal: choose ratios xi in order to minimize the risk given
some target expected return µ:
minimize
subject to
x> Σx
µ> x = µ
x ≥ 0.
19 / 24
Example: portfolio optimization
minimize
subject to
x> Σx
µ> x = µ
x ≥ 0.
20 / 24
Example: portfolio optimization
minimize
subject to
x> Σx
µ> x = µ
x ≥ 0.
I
This problem is a combination of the previous two: we
minimize a quadratic function and have linear constraints.
20 / 24
Example: portfolio optimization
minimize
subject to
x> Σx
µ> x = µ
x ≥ 0.
I
This problem is a combination of the previous two: we
minimize a quadratic function and have linear constraints.
I
If we remove the condition x ≥ 0 (corresponds to allowing
short selling), then the problem has a closed form solution. In
general, can be solved efficiently.
20 / 24
Example: total variation denoising
Assume a vector x comes from sampling a signal at equal time
intervals, and we observe a noisy version
x = x + ε,
1.5
1
0.5
0
−0.5
−1
−1.5
0
50
100
150
200
250
300
350
400
21 / 24
Example: total variation denoising
Assume a vector x comes from sampling a signal at equal time
intervals, and we observe a noisy version
x = x + ε,
1.5
1
0.5
0
−0.5
−1
−1.5
0
I
50
100
150
200
250
300
350
400
Goal: find good estimate x̂ for noiseless signal.
21 / 24
Example: total variation denoising
Solve optimization problem
minimize kx − xk2 + λ kxkTV
P
where kxkTV = n−1
i=1 |xi+1 − xi | is the total variation, or TV
norm of x, and λ is a regularisation parameter.
22 / 24
Example: total variation denoising
Solve optimization problem
minimize kx − xk2 + λ kxkTV
P
where kxkTV = n−1
i=1 |xi+1 − xi | is the total variation, or TV
norm of x, and λ is a regularisation parameter.
I
We use the notation x̂ = arg min kx − xk2 + λ kxkTV .
22 / 24
Example: total variation denoising
Solve optimization problem
minimize kx − xk2 + λ kxkTV
P
where kxkTV = n−1
i=1 |xi+1 − xi | is the total variation, or TV
norm of x, and λ is a regularisation parameter.
I
We use the notation x̂ = arg min kx − xk2 + λ kxkTV .
I
The objective function is nonsmooth: there are points where
it is not differentiable!
22 / 24
Example: total variation denoising
Solve optimization problem
minimize kx − xk2 + λ kxkTV
P
where kxkTV = n−1
i=1 |xi+1 − xi | is the total variation, or TV
norm of x, and λ is a regularisation parameter.
I
We use the notation x̂ = arg min kx − xk2 + λ kxkTV .
I
The objective function is nonsmooth: there are points where
it is not differentiable!
I
Optimization theory needs tools to do calculus with such
function: theory of subdifferentials, nonsmooth analysis.
22 / 24
Example: total variation denoising
Solve optimization problem
minimize kx − x̂k2 + λ kxkTV
P
where kxkTV = n−1
i=1 |xi+1 − xi | is the total variation, or TV
norm of x, and λ is a regularisation parameter.
Signal after denoising
1.5
1
0.5
0
−0.5
−1
−1.5
0
50
100
150
200
250
300
350
400
23 / 24
Outline
General information
What is optimization?
Course overview
Overview of the course
The course will try to balance three points of view:
I
Theory. Optimality conditions, duality theory, convex analysis,
nonsmooth analysis, geometry of linear, quadratic and
semidefinite programming.
I
Algorithms. Descent algorithms, interior-point methods,
projected gradient methods, Lagrangian
I
Applications. Logistics and scheduling, finance, signal
processing, convex regularization, semidefinite relaxations of
hard combinatorial problems, machine learning, compressive
sensing.
24 / 24
Download