IEOR 4004: Optimization Models and Methods Lecture 1: Introduction to linear programs Instructor: Garud Iyengar 1 Definition of a linear program Linear program (LP) is an optimization model where • Decision variables x1 , x2 , x3 , . . . , take real values. • Objective is to maximize or minimize a single linear function of the decision variables, e.g. 2x1 − 4x2 + 5x3 , x1 + x2 + x3 + x4 , etc. • Constraints are linear inequalities or equalities, e.g. x1 + x2 ≤ 80, x1 ≥ 4x1 + x3 , x1 + x2 + x3 + 4x4 = 10, etc. • Sign restrictions e.g. x1 ≥ 0, x2 ≤ 0. 2 Examples of linear programs Modeling in most cases reduces to pattern matching. So, we will work through several examples so you have examples of a wide variety of problems. 2.1 Transportation problem A paper company wants to ship truckloads of paper from warehouses to stores. • There are two warehouses with supplies of 100 and 200 units resp. • There are three stores with demands 50, 100, 150 units resp. • Per-unit transportation costs: W1 W2 S1 10 11 S2 12 12 S3 14 13 • Problem: minimize total transportation cost. W2 W1 S1 S2 1 S3 Formulation as an LP: Decision variables. Objective. xij = quantity of paper to transport from Warehouse i to store j. x11 , x12 , x13 , x21 , x22 , x23 . Minimize transportation cost 10x11 + 12x12 + 14x13 + 11x21 + 12x22 + 13x23 Constraints. • Sign Restrictions: xij ≥ 0 for all i, j. Cannot transport negative quantities of paper. • Supply constraints: a warehouse cannot supply more paper than its capacity. x11 + x12 + x13 ≤ 100 x21 + x22 + x23 ≤ 200 • Demand satisfaction: every store’s demand must be satisfied x11 + x21 ≥ 50 x12 + x22 ≥ 100 x13 + x23 ≥ 150 Linear Program. min 10x11 + 12x12 + 14x13 + 11x21 + 12x22 + 13x23 Subject to x11 + x12 + x13 ≤ 100 x21 + x22 + x23 ≤ 200 x11 + x21 ≥ 50 x12 + x22 ≥ 100 x13 + x23 ≥ 150 x≥0 (1) We call this a program. It is a linear program, because the objective is a linear function of the decision variables, and the constraints are linear inequalities (in the decision variables). 2.2 Investment We have $100 to invest. There are three investment vehicles. a. for every $1 invested now, we get $0.1 one year from now, and $1.3 three years from now. b. for every $1 invested now, we get $0.2 one year from now, and $1.1 two years from now. c. for every $1 invested a year from now, we get $1.5 three years from now. d. remaining money in money-market account, gets 2% per year. 2 Table 1: Cash flow Year Option a Option b Option c 0 (beginning of year 1) -1 -1 0 end of year 1 0.1 0.2 -1 end of year 2 0 1.1 0 end of year 3 1.3 0 1.5 Goal: to maximize value in three years. This is an example of multi-period decision problem or dynamic decision problem. Such problems arise when the decision maker makes decisions at more than one point in time. For example, in above problem, the investor needs to decide how much money to invest in each option now, in the beginning of year 1, and at the end of year 1. In such problems, decisions made during the current period influence decisions made during future periods. For example, in above problem, if money is invested now in option a or b, there will less money left to invest in option c an year from now. Decision variables • quantities invested in each option: xa , xb , xc ≥ 0 • amount invested in money market account at the end of the period: y0 , y1 , y2 , y3 ≥ 0 Objective max y3 Constraints The constraints in this problem as basically the budget constraints, i.e. inflow = outflow. • Time 0: xa + xb + y0 = 100 • Time 1: 0.1xa + 0.2xb + 1.02y0 = xc + y1 • Time 2: 1.1xb + 1.02y1 = y2 • Time 3: 1.3xa + 1.5xc + 1.02y2 = y3 Matrix formulation 2.3 maxx,y Subject to y3 xa + xb + y0 0.1xa + 0.2xb − xc + 1.02y0 − y1 1.1xb + 1.02y1 − y2 1.3xa + 1.5xc + 1.02y2 − y3 x, y = = = = ≥ 100 0 0 0 0 Feed blending problem This is a classical LP problem. In fact LP’s originated to solve a similar problem albeit for military personnel. Cattle feed can be mixed from oats, corn, alfalfa and peanut hulls. The following table shows the current cost per ton (in dollars) of each of these ingredients together with the percentages of recommended daily allowances for protein, fat and fiber that a serving of it fulfills. 3 Protein Fat Fiber Cost Oats 60 50 90 200 Corn 80 70 30 150 Alfalfa 30 40 60 100 Hulls 40 100 80 75 We want to find a minimum cost way to produce feed that satisfies at least 60% of the daily allowance for protein and fibers while not exceeding 60% of the daily allowance for fat. Let xo denote the fraction of oats in 1 unit of cattle feed. Similarly, define xc , xa and xh . Then, xo +xc +xa +xh = 1, and the nutrition constraints can be written as Protein: Fat: Fiber: 60xo + 80xc + 30xa + 40xh 50xo + 70xc + 40xa + 100xh 90xo + 30xc + 60xa + 80xh ≥ ≤ ≥ 60 60 60 The objective here is to minimize 200xo + 150xc + 100xa + 75xh . 2.4 Whisky blending problem A distiller wants to blend i = 1, . . . , I different single malt whiskies to produce j = 1, . . . , J blends. Suppose each whisky is characterized by K different properties. Let aik denote the value of k-th property for whisky i. Suppose these properties combine linearly when one blends whiskies, i.e. if we blend xi barrels of whisky i with xj barrels a x +a xj . of whisky j, the k-th property of the blend is given by ik xii +xjk j Show how to convert each of the following as a linear constraint using the decision variable: xij = number of barrels of whisky i in the blend j Note that this is a discrete variables. However, we will relax it to a non-negative continuous variables, and hope that the approximation will not cause an issue. • Property k for all blends must be between 45 and 48: Using the fact that properties combine linearly, this constraint can be formulated as PI i=1 aik xij 45 ≤ P ≤ 48 I i=1 xij or, equivalently I X aik xij ≥ 45 I X i=1 i=1 I X I X aik xij ≤ 48 i=1 ! xij ∀j = 1, . . . , J, ! xij ∀j = 1, . . . , J. i=1 • At least 1/3 of the total volume of the blends should come from single malts i = 3, . . . , 6. J X 6 X I J 1 XX xij ≥ xij 3 i=1 j=1 j=1 i=3 • No more than 5% of the volume should come from the single malt i = 1. J X x1j ≤ 0.05 j=1 I X J X i=1 j=1 4 xij 2.5 Metal working problem A metal shop has a demand dj , j = 1, . . . , J, for discs of size sj , j = 1, . . . , J. The discs have to be cut out of standard rectangular sheets using one of L possible cutting patterns. Cutting pattern ` produces aj` discs of type j and wastes w` fraction the sheet. Formulate an LP that computes the minimum cost combination of the cutting patterns that meets the demand. The natural decision variables for this problem are: x` = number of sheets cut using pattern ` Note that this is a discrete variables. As before, we will relax it to a non-negative continuous variables, and hope that the approximation will not cause an issue. The demand constraints are given by L X aj` x` ≥ dj , j = 1, . . . , L, `=1 and the objective is given by min L X w` x` `=1 2.6 Part assembly problem A factory can produce i = 1, . . . , I different parts, and these parts can be combined into j = 1, . . . , J different assemblies, where assembly j requires aji units of part i. Each unit of part i can be sold for pi and each unit of assembly j can be sold for price qj . Each part and assembly is manufactured using M different machines. Part i requires πim hours of machine m, and each assembly requires γjm hours of machine m. A total of hm hours of machine m are available. Formulate an LP to determine the combination of parts and assemblies that maximizes revenue. The natural set of decision variables for this problem are as follows: xi = number of units of part i that are sold directly zi = number of units of part i are used in assemblies yj = number of units of assembly j produced Again, all these variables are really discrete. But, we will relax them to be non-negative variables. PI PJ The objective is clearly max i=1 pi xi + j=1 qj yj . There are two kinds of constraints in this problem: (1) Machine time constraints: I X πim (xi + zi ) + i=1 J X γjm yj ≤ hm , m = 1, . . . , M. j=1 (2) Inflow = Outflow for all parts that go into assemblies. J X aji yj = zi , j=1 5 i = 1, . . . , I. 2.7 Nurse scheduling problem A hospital wants to schedule a weekly shift for its nurses. The demand of nurses on day j = 0, . . . , 6 is dj . Every nurse works five days in a row, and the gets two days off. Formulate an LP to compute the minimum number of nurses needed. One might immediately want to model this problem using a decision variable xi as the number of nurses working on day i. However, this choice will not allow us to model the constraint that each nurse works 5 days in a row and then gets a break. Instead, we will work with the decision variable xi = number of nurses who start their 5 day shift on day i. Thus, the number of nurses working on day 0 are: x0 + x6 + x5 + x4 + x3 , and therefore, the demand constraint for day 0 is given by x0 + x6 + x5 + x4 + x3 ≥ d0 . More generally, the nurses working on day j start their shift on day i = (j − k) mod 6, for k = 0, . . . , 6. Thus, the general constraint is given by 6 X x(j−k) mod 6 ≥ dj , j = 0, . . . , 6. k=0 Since each nurse starts the shift on a unique day, the total number of nurses is minx P6 xj , s.t. P6 x(j−k) mod 6 ≥ dj , j=0 k=0 P6 j=0 xj . Thus, the LP is given by j = 0, . . . , 6, x ≥ 0. An LP may not be the best model for this problem, since dj are likely to be small numbers. 2.8 Assignment problem Suppose n people are to be assigned to n jobs, wij measures the performance of person i for job j. Find the optimal assignment (maximize the total performance). P maximize ij wij xij Pn s.t. j=1 xij = 1 i = 1, . . . , n Pn i=1 xij = 1 j = 1, . . . , n xij ∈ {0, 1} i = 1, . . . , n, j = 1, . . . , n This is not a linear program – it is an integer program. But it turns out that one can relax the integrality constraints xij ∈ {0, 1} to the linear constraints 0 ≤ xij ≤ 1, or equivalently xij ∈ [0, 1] without affecting the optimal solution. We learn later how to recognize such problems. Suppose there are restrictions on who can be assigned to every given job. In such a case, how would you formulate the problem of finding an assignment that maximizes the number of matches? 6 2.9 Network or graph LPs A graph consists of a • set of node or vertices V = {1, . . . , n} labeled from 1 to n, • and a set of arcs E = {(m, n) : m, n ∈ V} ⊆ V × V, where the arc (m, n) is interpreted to go from node m to node n. Note that graph can may contain both arcs (m, n) and (n, m). A graph that contains the arc (n, m) whenever it contains the arc (m, n) is called an undirected graph. The arc (m, m) for some m ∈ V is called a loop or a self-loop. Flows Arc (m, n) can carry a flow xmn from node m to node n. The flow on any arc must be less than its capacity u, i.e. xmn ≤ umn , and arcs can have costs. Typically, the cost of sending a flow xmn on arc (m, n) is assumed to be cmn xmn . We will consider other kinds of cost later in this course. Flow conservation in nodes There is no storage at nodes, so whatever flow comes in must go out. Nodes may have external supply bn , n ∈ V that can take positive or negative value. Therefore, the flow conversation constraint at node n is given by Flow in = Flow out X X xmn = xnm bn + (m,n)∈E Flow LPs (n,m)∈E Flow LPs come is two varieties: (a) Max flow problems: Maximize the flow from a node s to the node t subject to arc capacities. This problem can be formulated as follows: max z x,z s.t. X xtj − (t,j)∈V X xjt = z, (j,t)∈V xsj − (s,j)∈V X X X xjs = −z, (j,s)∈V xij − (i,j)∈V X xji = 0, i ∈ V\{s, t} (i,t)∈V 0 ≤ x ≤ u. (b) Min cost problems: Route a demand d from a node s to the node t at the lowest cost subject to arc capacities. 7 This problem can be formulated as follows: X min cij xij x (i,j)∈V s.t. X xtj − (t,j)∈V X xjt = d, (j,t)∈V xsj − (s,j)∈V X X X xjs = −d, (j,s)∈V xij − (i,j)∈V X xji = 0, i ∈ V\{s, t} (i,t)∈V 0 ≤ x ≤ u. We study network flow LPs for several reasons: • They provide a nice way of visualizing an optimization problem • The optimal solution of network LPs is integral when the data is integral. • These LPs have a lot of structure, and can be solved much faster than general LPs, and can be approximately at almost a linear rate. 2.10 Inventory Routing Problem Consider a multi-period inventory problem. Here there is demand for a product over time. The demand must be met at each time period. In the following (simple) example we have four time periods and per-unit production costs and capacities are given as per the table below. Additionally, units product of the product at each level of completion can be stored as inventory, at a per-unit cost and also subject to capacities. 1 2 3 4 Prod. Cost (c) 50 4000 30 22 Prod. Capacity (u) 100 50 54 90 Inv. Cost (h) 150 240 45 Inv. Capacity (w) 170 10 44 Finally, demands are as follows. period demand 1 70 2 50 3 40 4 90 The problem is to establish a production and inventory plan that delivers all demands at minimum cost. The direct LP formulation for this problem is as follows. The decision variables are as follows: • production in period t: x1 , . . . , x4 • inventory at the end of period t: s1 , . . . , s3 The objective is clearly min P4 t=1 ct xt + P3 t=1 ht st . The constraints are as follows: • Production capacity constraints: 0 ≤ xt ≤ ut , t = 1, . . . , 4. 8 • Inventory capacity contraints: 0 ≤ st ≤ wt , t = 1, 2, 3. • Inventory dynamics x1 = d1 + s1 s1 + x2 = d2 + s2 s2 + x3 = d3 + s3 s3 + x4 = d4 This problem can be represented as a min-cost flow problem using the following network. In this figure, arcs (5, 1), (5, 2), (5, 3) and (5, 4) represent production; node 5 is an auxiliary node. The horizontal arcs represent inventory. Next to each arc we show its cost and capacity. Demands are shown in green and supplies (only one) in red. Next, we consider different variations. • Suppose that the production capacity is divided between two factories, each with the following capacities and costs in month 1, 2, 3, 4: 1 2 3 4 Factory 1 Prod. Cost Prod. 50 4000 30 22 Capacity 50 25 27 45 Factory 2 Prod. Cost Prod. 100 8000 60 44 Capacity 50 25 27 45 The problem is again to establish a production and inventory plan that delivers all demands at minimum cost, but now you also need to divide the production between the two factories. How would you modify your answer? We can create two dummy nodes 5 and 6 to represent two types of productions. However, we don’t know how the total supply (250) should be divided between the two. Therefore, we create another dummy node 7, with supply 250. Below, we show the resulting network. For illustration, cost and capacities for edges (5, 1) and (6, 1) are shown. Other edges from node 5 and 6 should also be labeled with cost and capacities, similarly. 9 • In the next variation, the factories charge a flat hourly rate: Factory 1 charges 2$ per hour, and is available for 100 hours; Factory 2 charges 6$ per hour, and is available for 700 hours. Producing one unit of product takes 10 hours. How would you formulate this as a minimum cost flow problem? You will need to convert all capacities and cost rates into the same units – either time or product units. (Note that if the production time were different for the different factories, inventory capacities could not be converted into time units). Let’s convert everything into product units, using the following formula: 10 hours of factory time is equivalent to 1 unit of product. Therefore, Factory 1 charges 20$ per unit of product, and its capacity is 10 units of products; Factory 2 charges 60$ per unit of product, and its capacity is 70 units of products. Using these conversions, repopulate the production cost and capacity tables for the two factories, and then you can use the same solution as in the previous bullet. • How would you modify your answer if the demand could be backlogged at a cost of $500 per product unit per month, but you still need to satisfy all the demands by end of the month 4. A simple solution is create an edge going from the node for every month m to m − 1 with cost 500$ and infinite capacity. (Why does this work?) 2.11 Data mining/Linear regression Suppose we have 2-d data points (1, 2), (3, 4), (4, 7). We want to find a linear relation of form y = ax + b that best explains the data. In general there may not exist any w, z such that yi = wxi + z for all data points. Then, we attempt to find the closest approximation. The general problem is: • We are given vectors x1 , x2 , . . . , xN all in Rn . (E.g., represents attributes like Age, Gender, Race, Marital status, Education level) • We are given scalar values y1 , y2 , . . . , yN . (E.g., salary level) 10 • We want to come up with an “approximate” linear model for the data. We want to find w ∈ Rn and z ∈ R so that yi ≈ wT xi + z, 1 ≤ i ≤ N Here decision variables are w, z. (E.g., with such a linear model, we can predict the salary level, given a new person’s attributes) Method 1. The traditional method: linear regression. (N ) X T 2 min (w xi + z − yi ) . w,z i=1 What kind of problem is this? What does it try to do? Method 2. Non-traditional method. min max |wT xi + z − yi |. w,z 1≤i≤N What kind of problem is this? What does it try to do? Actually, this is another linear program: let’s do it in two steps. min V Subject to, V ≥ |wT xi + z − yi |, 1 ≤ i ≤ N. (2) Here the variables are V , w (in Rn ) and z. But is this really linear? No: (2) is not a linear program. Let’s examine the absolute value function. Note that |x| = max{x, −x}. Thus, V ≥ |wT xi + z − yi | is the same as V ≥ max{wT xi + z − yi , −(wT xi + z − yi )}, and, this single constraint, in turn, equivalent to the pair of constraints V ≥ wT xi + z − yi , and V ≥ −(wT xi + z − yi ). The (2) can be reformulated as the following linear program min V Subject to V ≥ wT xi + z − yi , 1 ≤ i ≤ N, V ≥ −wT xi − z + yi , 1 ≤ i ≤ N. (3) When does this trick work? What if the problem was maxw,z min1≤i≤N |wT xi + z − yi |? 2.12 Piecewise linear concave/convex functions We call a set C ⊆ Rd a convex set if for any choice of x and y in C, the entire line segment [x, y] ⊆ C, i.e. xλ = θx + (1 − θ)y ∈ C, for all x, y ∈ C, and θ ∈ [0, 1]. x x λ y y Convex set Non-convex set We call a function f (x) : Rd → R a convex function provided any one of the following equivalent conditions hold: 11 (i) For all x, y ∈ Rd and θ ∈ [0, 1], the function value f (θx + (1 − θ)y) at the convex combination θx + (1 − θ)y of x and y satisfies: f (θx + (1 − θ)y) ≤ θf (x) + (1 − θ)f (y), the function value at the convex of combination of x and y is less than, or equal to, the convex combination of the function values f (x) and f (y). (ii) The set {(α, x) : f (x) ≤ α} is a convex set in Rd+1 . (iii) At each point x, there exists a vector ax such that for all y ∈ Rd f (y) ≥ f (x) + a> x (y − x) It immediately follows that for any set of point {xi : i ∈ S} f (y) ≥ max f (xi ) + a> xi (y − xi ) i∈S We call a function f concave, if the function −f is convex. We need a couple more facts: • {x : a> x ≤ b}, {x : a> x ≥ b} and {x : a> x = b} are all convex sets. • Intersection of convex sets is convex. Thus, the feasible set of a linear program is convex. • A non-negative combination of convex (concave) function is convex (concave). • If the function f is a convex, the lower level set {x : f (x) ≤ α} is convex for all α. Similarly, if the function f is concave, the upper level set {x : f (x) ≥ α} is a convex set for all α. • A convex optimization problem comes in the following two varieties: maxx g(x) concave function s.t. x ∈ C convex set minx f (x) convex function s.t. x ∈ C convex set We call a function f (x) a piecewise linear convex function if f (x) is of the form f (x) = maxi∈S {a> i x + bi }. A function f (x) is piecewise linear concave if it is of the form f (x) = mini∈S {a> x + b }. We call an optimization i i problem a piecewise linear convex optimization problem if • it is convex optimization problem and the objective, • and the constraints are all defined using a combination of piecewise linear functions. Theorem 1. A piecewise linear convex optimization problem can be reformulated as a linear program. Next, we discuss two examples to show this transformation is done. Suppose we currently hold y ∈ Rd shares of Pdd assets, and because of price impact the revenue that we get from trading x ∈ Rd shares is given by f (x) = i=1 min1≤k≤n {aik xi + bik }, i.e. the proceeds from the sales of each asset i is concave in xi – this is typically the case because of diminishing returns to scale. We want to generate K from the sales today, while ensure that the future value q > (y − x) is the maximum possible. Thus, the optimization problem is given by maxx q > (y − x), Pd s.t. f (x) = i=1 min1≤k≤n {aik xi + bik } ≥ K We transform the problem into a linear program as follows: 12 • Set each piecewise linear term equal to a new variable, and replace the piecewise linear function by this new variable in the objective and constraints. For this problem, we introduce vi = min {aik xi + bik } 1≤k≤n and replace the piecewise linear term in the constraint to write d X vi ≥ K i=1 • Relax the equality in the definition into either ≤ or ≥ in a manner that the constraint can be written as a collection of linear constraints, and add these constraints to the optimization problem. In this case, we relax vik ≤ min {aik xi + bik }. 1≤k≤n This contraint is equivalent to the following collection of n linear inequalities vik ≤ aik xi + bik , k = 1, . . . , n. Convince yourself that relaxing = to ≥ will NOT work here. Therefore, the linear program is given by maxx,v s.t. q > (y − x), Pd i=1 vi ≥ K, aik xi + bik − vi ≥ 0, i = 1, . . . , d, k = 1, . . . , n. Suppose the production cost for producing x units of a product i is ci (x) = max1≤k≤n {cik x + dik } and each unit can be sold for a price pi . Compute the profit maximizing production plan subject to the operational constraints Ax ≤ b. Thus, the optimization problem is given by maxx s.t. Pd p> x − i=1 ci (x), Ax ≥ b, x ≥ 0. First, we convince ourselves that this is a piecewise linear convex optimization problem: Pd • c(x) = i=1 ci (x) is a piecewise linear convex function. Therefore, the objective p> x − c(x) is a piecewise linear concave function. • The feasible set is the intersection of linear inequalities, and therefore, convex. Therefore, it follows that the production planning problem is piecewise linear convex. So, we know we can transform this problem into an LP. Here are the steps: • Introduce a new variable for each piecewise linear term. Set vi = max {cik x + dik }. 1≤k≤n The new objective function is p> x − Pd i=1 vi 13 • Relax the equality to an inequality that can be rewritten as a collection of linear inequalities. vi ≥ max {cik x + dik }, 1≤k≤n can be reformulated as vi ≥ cik x + dik , k = 1, . . . , n. Thus, the LP is given by maxx,v s.t. 3 p> x − 1> v, Ax ≥ b, vi ≥ cik x + dik , x ≥ 0. i = 1, . . . , d, k = 1, . . . , n, Linear programs and matrix notation Linear program (LP) : • Decision variables: x1 , x2 , x3 , . . . , • maximize or minimize an objective • Objective function is linear, e.g. 2x1 − 4x2 + 5x3 , x1 + x2 + x3 + x4 , etc. • Constraints are linear inequalities or equalities, e.g. x1 + x2 ≤ 80, x1 ≥ 4x1 + x3 , x1 + x2 + x3 + 4x4 = 10, etc. • Sign restrictions e.g. x1 ≥ 0, x2 ≤ 0. In matrix/shorthand notation we can write a linear program as: max or min cT x Subject to Ax = b Cx ≤ d Dx ≥ f xi ≥ 0, or xi ≤ 0 or xi unrestricted for every i Expressing linear programs using matrix notation is useful for using solvers like Matlab with Gurobi. Let’s consider the linear program for the transportation example: min 10x11 + 12x12 + 14x13 + 11x21 + 12x22 + 13x23 Subject to x11 + x12 + x13 ≤ 100 x21 + x22 + x23 ≤ 200 x11 + x21 ≥ 50 x12 + x22 ≥ 100 x13 + x23 ≥ 150 x≥0 14 In matrix notation, we can write this as: min cT x Subject to Ax ≤ b Dx ≥ f x≥0 where 10 12 14 ,A = 1 c= 11 0 12 13 1 0 1 0 0 1 1 0 0 , D = 0 1 1 0 0 1 0 0 0 1 1 0 0 0 1 0 0 50 100 0 , b = f = 100 200 1 150 Let’s solve it using Matlab+Gurobi. Matlab requires specifying [A; D] as a single matrix A and [b; f ] has a single vector b. 15