Manchester Numerical Analysis

MATH36061 Convex Optimization
Martin Lotz
School of Mathematics, The University of Manchester
Manchester, September 29, 2015

Outline
- General information
- What is optimization?
- Course overview

Organisation
The course website can be found under
http://www.maths.manchester.ac.uk/~mlotz/teaching/cvx/
- Tutorials start in week two; the tutorial class in week one will consist of an introduction to CVX.
- Problem sets consist of two parts: problems to be worked on at home, and problems to be discussed in class.
- All the material presented in the lectures will be made available online as the course progresses.
- Lecture notes will appear on the website towards the end of each week.
- The material from the website is also available on Blackboard.
- Contact: martin.lotz@manchester.ac.uk

Example classes
Example classes are there to deepen the understanding of the course material.
- Problem sheets should ideally be looked at before the example class.
- In the example classes,
  - you will be given some time to work on the problems,
  - you should ask questions if some parts are not clear,
  - you will get feedback on your attempts at the problems,
  - we will go through a selection of problems and their solutions.
Important dates
Reserve these dates:
- Tue, November 10: Midterm (coursework) test (50 minutes)
- Thu, December 17: Revision class for the final exam
- The date of the final exam will be announced when available!

MATLAB
- Matlab, or an equivalent system, is an indispensable tool for this course.
- Availability:
  - on campus computers,
  - student version available at low cost.
- The course page contains links to Matlab documentation.
- Alternatives: Python, Julia.
- An easy-to-parse introduction to Matlab is available here: http://www.guettel.com/teaching/matlab/

CVX
- CVX is a convex optimization package for MATLAB.
- CVX is freely available at http://cvxr.com/cvx/
- It is easy to install, and the documentation provides some examples to get started.

Outline
- General information
- What is optimization?
- Course overview

Mathematical optimization
A mathematical optimization problem is a problem of the form

    minimize    f(x)
    subject to  g1(x) ≤ 0, …, gm(x) ≤ 0,
                x ∈ Ω,

where f, g1, …, gm : R^d → R are real functions and Ω ⊆ R^d.
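To make the standard form concrete, here is a small hypothetical instance (invented for illustration, not from the course), solved by brute force in Python, one of the alternative systems mentioned above. The unconstrained minimizer of the objective is infeasible, so the solution lies on the boundary of the feasible set.

```python
# Hypothetical toy instance of the standard form:
#   minimize   f(x) = (x - 3)^2
#   subject to g1(x) = x - 1 <= 0     (i.e. x <= 1)
# The unconstrained minimizer x = 3 violates the constraint,
# so the solution sits on the boundary: x* = 1.

def f(x):
    return (x - 3.0) ** 2

def g1(x):
    return x - 1.0

# crude grid search over Omega = [-5, 5], keeping only feasible points
candidates = [-5.0 + k * 0.001 for k in range(10001)]
feasible = [x for x in candidates if g1(x) <= 0]
x_star = min(feasible, key=f)

print(round(x_star, 3), round(f(x_star), 3))  # -> 1.0 4.0
```

A grid search is of course hopeless in high dimensions; the point of the course is the far more efficient methods that convexity makes possible.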
- f is the objective function; the gi and Ω are the constraints.
- An x* such that f(x*) ≤ f(x) for all other x satisfying the constraints is called a solution of the problem.

Example: least squares
Assume a quantity Y depends linearly on predictors X1, …, Xp:

    Y = β0 + β1 X1 + · · · + βp Xp + ε,

where ε is some random error.
- Example: Y could be sales, and X1, …, Xp the advertisement budgets in different media (TV, internet, radio, billboards, newspapers).
- Goal: determine the model parameters β0, …, βp.
- What we know is data from n ≥ p observations or experiments:

    yi = β0 + β1 xi1 + · · · + βp xip + εi,   1 ≤ i ≤ n.
Collect the data in matrices and vectors:

    y = (y1, …, yn)^T,   β = (β0, β1, …, βp)^T,   ε = (ε1, …, εn)^T,

and let X be the n × (p + 1) matrix whose i-th row is (1, xi1, …, xip). The relationship can then be written as

    y = Xβ + ε.

Based on this relationship, choose the β that minimizes the squared 2-norm of the error:

    minimize ‖y − Xβ‖₂²

- This is an optimization problem with a quadratic objective function and no constraints.
- The problem has the closed-form solution β = (X^T X)⁻¹ X^T y.
- There are more efficient methods available than using the closed-form solution.
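As an illustration, the closed-form solution can be sketched in pure Python (one of the alternatives mentioned above) for the special case of a single predictor, p = 1, where the normal equations reduce to two familiar formulas. The data below are made up.

```python
# Closed-form least squares for the special case p = 1 (one predictor),
# where beta = (X^T X)^{-1} X^T y reduces to
#   beta1 = cov(x, y) / var(x),   beta0 = mean(y) - beta1 * mean(x).

def least_squares_1d(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)            # n * var(x)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # n * cov(x, y)
    beta1 = sxy / sxx
    beta0 = my - beta1 * mx
    return beta0, beta1

# noiseless data on the line y = 2 + 0.5 x, so the fit should recover it
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 + 0.5 * x for x in xs]
beta0, beta1 = least_squares_1d(xs, ys)
print(beta0, beta1)  # -> 2.0 0.5
```

With noise ε added to ys, the recovered coefficients would only approximate (2, 0.5), which is exactly the situation the regression application below is in.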
Application: linear regression
Observation: the log of the basal metabolic rate (energy expenditure) Y in mammals is related to the log adult body mass X as

    Y = β0 + β1 X.

- Collect data from a database of 573 mammal species and plot the relationship.
- Assemble the vector y and the matrix X, and solve the problem

    minimize ‖y − Xβ‖₂

  In CVX:

    cvx_begin
        variable x(2);
        minimize( norm(y - X*x, 2) )
    cvx_end

Figure: log basal metabolic rate plotted against log adult body mass.

Figure: Episode "Size Matters" from the TV series Wonders of Life.

Example: linear programming
Linear programming refers to problems of the form

    maximize    c1 x1 + · · · + cn xn
    subject to  a11 x1 + · · · + a1n xn ≤ b1
                · · ·
                am1 x1 + · · · + amn xn ≤ bm
                x1 ≥ 0, …, xn ≥ 0.

- Can be rewritten in the standard form shown at the beginning!
- Using matrices and vectors:

    maximize    ⟨c, x⟩ (= c^T x)
    subject to  Ax ≤ b (inequality componentwise)
                x ≥ 0.

Linear programming: geometry
A single linear inequality defines a halfspace:

    {x : ⟨a, x⟩ ≤ b}.
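A halfspace membership test is just an inner product followed by a comparison; a minimal Python sketch with hypothetical data a = (1, 2), b = 4:

```python
# A halfspace {x : <a, x> <= b}: membership is one inner-product test.

def in_halfspace(a, x, b):
    return sum(ai * xi for ai, xi in zip(a, x)) <= b

a, b = [1.0, 2.0], 4.0
print(in_halfspace(a, [1.0, 1.0], b))  # 1 + 2 = 3 <= 4  -> True
print(in_halfspace(a, [4.0, 1.0], b))  # 4 + 2 = 6 > 4   -> False
```

The feasible set of a linear program is obtained by requiring membership in m such halfspaces at once, which is exactly the polyhedron described next.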
The intersection of finitely many halfspaces is a polyhedron:

    {x : ⟨a1, x⟩ ≤ b1, …, ⟨am, x⟩ ≤ bm} = {x : Ax ≤ b}.

Geometrically, a linear program maximizes a linear function over such a polyhedron:

    maximize ⟨c, x⟩ subject to Ax ≤ b.

Linear programming: application
A cargo plane has two compartments with weight capacities C1 = 35 and C2 = 40 tonnes and volumes V1 = 250 and V2 = 400 cubic metres.
- Three types of cargo:

               Volume (m³/tonne)   Available weight (tonnes)   Profit (£/tonne)
    Cargo 1           8                      25                     £300
    Cargo 2          10                      32                     £350
    Cargo 3           7                      28                     £270

- How to distribute the cargo to maximize profit?

Linear programming: application
Let xij be the amount of cargo i in compartment j.
Objective function:

    f(x) = 300(x11 + x12) + 350(x21 + x22) + 270(x31 + x32).

Constraints:

    x11 + x12 ≤ 25                  (total amount of cargo 1)
    x21 + x22 ≤ 32                  (total amount of cargo 2)
    x31 + x32 ≤ 28                  (total amount of cargo 3)
    x11 + x21 + x31 ≤ 35            (weight constraint on comp. 1)
    x12 + x22 + x32 ≤ 40            (weight constraint on comp. 2)
    8x11 + 10x21 + 7x31 ≤ 250       (volume constraint on comp. 1)
    8x12 + 10x22 + 7x32 ≤ 400       (volume constraint on comp. 2)
    (x11 + x21 + x31)/35 − (x12 + x22 + x32)/40 = 0   (maintain balance of weight ratio)
    xij ≥ 0                         (cargo can't have negative weight)

Linear programming: application
As complicated as it might look, it can be solved easily with CVX.

    c = [300; 300; 350; 350; 270; 270];
    A = [1, 1, 0, 0, 0, 0; ...
         0, 0, 1, 1, 0, 0; ...
         0, 0, 0, 0, 1, 1; ...
         1, 0, 1, 0, 1, 0; ...
         0, 1, 0, 1, 0, 1; ...
         8, 0, 10, 0, 7, 0; ...
         0, 8, 0, 10, 0, 7];
    B = [1/35, -1/40, 1/35, -1/40, 1/35, -1/40];
    b = [25; 32; 28; 35; 40; 250; 400];
    cvx_begin
        variable x(6);
        maximize( c'*x );
        subject to
            A*x <= b;
            B*x == 0;
            x >= 0;
    cvx_end

The solution:

    x11 = 6.7500, x12 = 7.7143, x21 = 0, x22 = 32, x31 = 28, x32 = 0.

Example: portfolio optimization
Problem: invest a proportion xi of funds in asset i, 1 ≤ i ≤ n.
- If asset i has expected return μi, then the overall expected return of the investment will be x1 μ1 + · · · + xn μn = μ^T x.
- The risk of the investment is modelled using the estimated covariance matrix Σ and the quadratic function x^T Σx.
- Goal: choose the proportions xi to minimize the risk given some target expected return μ:

    minimize    x^T Σx
    subject to  μ^T x = μ,
                x ≥ 0.

- This problem is a combination of the previous two: we minimize a quadratic function and have linear constraints.
- If we remove the condition x ≥ 0 (which corresponds to allowing short selling), then the problem has a closed-form solution. In general, it can be solved efficiently.

Example: total variation denoising
Assume a vector x comes from sampling a signal at equal time intervals, and we observe a noisy version y = x + ε.

Figure: the noisy sampled signal.

- Goal: find a good estimate x̂ for the noiseless signal.

Solve the optimization problem

    minimize ‖x − y‖₂ + λ‖x‖TV,

where ‖x‖TV = Σ_{i=1}^{n−1} |x_{i+1} − x_i| is the total variation, or TV norm, of x, and λ is a regularisation parameter.
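The TV norm is a one-line sum; a minimal Python sketch (with invented signals) showing that it is small for piecewise-constant signals and large for oscillating ones, which is why penalizing it removes noise while preserving jumps:

```python
# TV norm: sum_{i=1}^{n-1} |x_{i+1} - x_i| measures how much a signal
# oscillates; piecewise-constant signals have small TV norm.

def tv_norm(x):
    return sum(abs(x[i + 1] - x[i]) for i in range(len(x) - 1))

piecewise = [0, 0, 0, 1, 1, 1, 0, 0]   # two jumps of size 1
wiggly    = [0, 1, 0, 1, 0, 1, 0, 1]   # a jump at every step

print(tv_norm(piecewise), tv_norm(wiggly))  # -> 2 7
```

Minimizing ‖x − y‖₂ + λ‖x‖TV trades data fidelity against this oscillation measure; larger λ yields flatter, more piecewise-constant estimates.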
- We use the notation x̂ = arg min_x ‖x − y‖₂ + λ‖x‖TV, where y is the noisy observation.
- The objective function is nonsmooth: there are points where it is not differentiable!
- Optimization theory needs tools to do calculus with such functions: the theory of subdifferentials, nonsmooth analysis.

Figure: the signal after denoising.

Outline
- General information
- What is optimization?
- Course overview

Overview of the course
The course will try to balance three points of view:
- Theory. Optimality conditions, duality theory, convex analysis, nonsmooth analysis, the geometry of linear, quadratic and semidefinite programming.
- Algorithms. Descent algorithms, interior-point methods, projected gradient methods, Lagrangian methods.
- Applications. Logistics and scheduling, finance, signal processing, convex regularization, semidefinite relaxations of hard combinatorial problems, machine learning, compressive sensing.