3 The Utility Maximization Problem

We have now discussed how to describe preferences in terms of utility functions and how to formulate simple budget sets. The rational choice assumption, that consumers pick the best affordable bundle, can then be described as an optimization problem. The problem is to find a bundle $(x_1^*, x_2^*)$ in the budget set, meaning that $p_1 x_1^* + p_2 x_2^* \le m$ and $x_1^* \ge 0$, $x_2^* \ge 0$, such that

$$u(x_1^*, x_2^*) \ge u(x_1, x_2) \quad \text{for all } (x_1, x_2) \text{ in the budget set.}$$

It is convenient to introduce some notation for this type of problem. Using rather standard conventions I will write this as

$$\max_{x_1, x_2} u(x_1, x_2) \quad \text{subj. to} \quad p_1 x_1 + p_2 x_2 \le m, \quad x_1 \ge 0, \; x_2 \ge 0,$$

which is an example of a problem of constrained optimization.

A common tendency of students is to skip the step where the problem is written down. This is a bad idea, because we will often study variants of optimization problems that differ in what seem to be small "details". Indeed, often the difficult step when thinking about a problem is to formulate the right optimization problem. For this reason I want you to:

1. Write out the "max" in front of the utility function (the maximand, or objective function). This clarifies that the consumer is supposed to solve an optimization problem.

2. Below the max, indicate what the choice variables are for the consumer ($x_1$ and $x_2$ in this example). This is to clarify the difference between the variables that are under the control of the decision maker and the variables that the decision maker has no control over, which are referred to as parameters. In the application above $p_1$, $p_2$ and $m$ are parameters.

3. Finally, make clear what the constraints of the problem are. A good habit is to write "subject to" or, more concisely, "s.t." and then list whatever constraints there are, as in the problem above.

3.1 Basic Math Review

Read this only if you feel that you don't quite know how to deal with simple optimization problems.

3.1.1 Derivatives

[Figure 1: A Linear Function]

Consider a linear function $y = h(x) = ax + b$ as the one depicted in Figure 1. Recall that the slope of a linear function is

$$\text{slope} = \frac{\text{change in } y}{\text{change in } x} = \frac{\overbrace{ax'' + b}^{h(x'')} - \overbrace{(ax' + b)}^{h(x')}}{x'' - x'} = a.$$

Now, let's think about a non-linear function $y = f(x)$, and do the same thing as with the linear function: think of the "slope" as the ratio of the change in the value of the function to the change in the value of the argument.

[Figure 2: A Non-Linear Function]

Note that

$$h(x) = \underbrace{f(x') - \left(\frac{f(x'') - f(x')}{x'' - x'}\right)x'}_{b} + \underbrace{\left(\frac{f(x'') - f(x')}{x'' - x'}\right)}_{a}x = f(x') + \left(\frac{f(x'') - f(x')}{x'' - x'}\right)(x - x')$$

is a linear function that assigns the same values as $f$ at $x'$ and $x''$ (to see this, just plug in $x = x'$ and $x = x''$ above and check that $h(x') = f(x')$ and $h(x'') = f(x'')$). Now imagine that you do the same thing over and over again, with $x'$ fixed and $x''$ moved closer and closer to $x'$ (from the right, say). Intuitively, it is rather clear that the slope of the linear function becomes closer and closer to what we intuitively think of as the slope of $f$ at the point $x'$: the tangent line. Indeed, this is exactly how the derivative of a function at a particular point is defined.
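To make the limiting process concrete before stating the formal definition, here is a minimal numerical sketch (my addition, not part of the notes): it evaluates secant slopes $\frac{f(x'') - f(x')}{x'' - x'}$ as $x''$ approaches $x'$ from the right. The function $f(x) = x^3$ and the point $x' = 2$ are example choices.

```python
# Secant slopes (f(x'') - f(x')) / (x'' - x') as x'' approaches x' from the right.
def f(x):
    return x ** 3

x_prime = 2.0
for h in [1.0, 0.1, 0.01, 0.001, 0.0001]:
    x_double_prime = x_prime + h
    secant_slope = (f(x_double_prime) - f(x_prime)) / (x_double_prime - x_prime)
    print(h, secant_slope)   # approaches 3 * x_prime**2 = 12 as h shrinks
```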
Definition 1. The derivative of $f$ at $x'$, denoted $\frac{df(x')}{dx}$, is given by

$$\frac{df(x')}{dx} = \lim_{x'' \to x'} \frac{f(x'') - f(x')}{x'' - x'}.$$

You may have seen this written (as in Varian)

$$\frac{df(x')}{dx} = \lim_{\Delta x \to 0} \frac{f(x' + \Delta x) - f(x')}{\Delta x},$$

which is the same thing, as you can see by letting $x'' = x' + \Delta x$.

[Figure 3: The Derivative of $f$ at $x'$]

Observe that it is geometrically obvious that the tangent line drawn in Figure 3 and the lines between $x'$ and $x''$ get really close to each other if you pick $x''$ very close to $x'$. For our purposes, what is interesting about the derivative is that the tangent line,

$$h(x) = ax + b = f(x') + \frac{df(x')}{dx}(x - x'),$$

is "close" to the original function $f(x)$ for $x$ "near" $x'$. Indeed, the tangent line is often called the (best) linear approximation of $f$ at $x'$. To make sense of this at an intuitive level we observe that:

1. It is a linear function of $x$, with constant $b = f(x') - \frac{df(x')}{dx}x'$ and slope $a = \frac{df(x')}{dx}$.

2. Plug in $x = x'$ and the value of the linear function equals $f(x')$. Thus it exactly coincides with the original function at $x'$.

3. $\frac{df(x')}{dx}$ is the slope of the original function at $x'$ and also the slope coefficient of the linear function (thus, the slope for all $x$).

It is intuitive from the picture that the linear approximation is a good approximation of $f(x)$ near $x'$. In particular (and this is what the optimization techniques rely on), it will be the case that the derivative is strictly positive when the function is strictly increasing and strictly negative when the function is strictly decreasing.

3.1.2 Optional Reading: How to Make this Rigorous?

An intuitive understanding will be sufficient for the purposes of this class, but you should not be fooled into thinking that the treatment above fully explains what a derivative is. Here, I'll only give the formal definition of a limit and discuss an example. If you want to get a better understanding, any reasonable book on calculus is a good source (the book by Sydsaeter and Hammond is excellent, for example). The hard part when talking about derivatives is to give a precise meaning to

$$\lim_{x \to x'} F(x) = A.$$

Intuitively, the statement means that the difference between $F(x)$ and $A$ can be made as small as we desire by letting the difference between $x$ and $x'$ be small enough. Formally, this may be stated as follows:

Definition 2. We say that $\lim_{x \to x'} F(x) = A$ (in words, $F(x)$ tends to $A$ as $x$ tends to $x'$) if for every number $\epsilon > 0$ there exists a number $\delta > 0$ such that

$$A - \epsilon < F(x) < A + \epsilon \quad \text{whenever} \quad x' - \delta < x < x' + \delta.$$

Since this definition makes the concept of a limit precise, we now also have a precise definition of what a derivative is. Most of you probably know some rules of differentiation for certain functions, and such rules can be established directly from this definition.

Example 3. Consider $f(x) = x^3$. Then $f(x'') - f(x') = (x'')^3 - (x')^3$. Let $x'' = x' + \Delta$ and rewrite this as

$$(x' + \Delta)^3 - (x')^3 = (x' + \Delta)(x' + \Delta)(x' + \Delta) - (x')^3 = \left((x')^2 + 2\Delta x' + \Delta^2\right)(x' + \Delta) - (x')^3$$
$$= (x')^3 + (x')^2\Delta + 2\Delta(x')^2 + 2\Delta^2 x' + \Delta^2 x' + \Delta^3 - (x')^3 = 3(x')^2\Delta + 3x'\Delta^2 + \Delta^3.$$

Hence

$$\frac{f(x'') - f(x')}{x'' - x'} = \frac{f(x' + \Delta) - f(x')}{\Delta} = \frac{3(x')^2\Delta + 3x'\Delta^2 + \Delta^3}{\Delta} = 3(x')^2 + 3x'\Delta + \Delta^2.$$

I now assert that as $\Delta \to 0$ (same as $x'' \to x'$) we have that

$$\lim_{\Delta \to 0} \frac{f(x' + \Delta) - f(x')}{\Delta} = \lim_{\Delta \to 0}\left(3(x')^2 + 3x'\Delta + \Delta^2\right) = 3(x')^2, \tag{1}$$

where I have just appealed to the intuitive notion of a limit (that is, $3x'\Delta + \Delta^2$ gets closer and closer to zero as $\Delta$ gets closer to zero).
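As a quick check on Example 3, here is a short sketch (my addition) that computes the same difference quotient and its limit symbolically, assuming the sympy library is available:

```python
import sympy as sp

x, d = sp.symbols("x Delta")
f = x ** 3

# Difference quotient (f(x + Delta) - f(x)) / Delta, then the limit as Delta -> 0.
quotient = (f.subs(x, x + d) - f) / d
print(sp.expand(quotient))       # Delta**2 + 3*Delta*x + 3*x**2, as in Example 3
print(sp.limit(quotient, d, 0))  # 3*x**2, matching equation (1)
```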
To see how this comes from the $\epsilon$-$\delta$ definition of a limit above, consider any $\epsilon > 0$ and let

$$\delta = \sqrt{\left(\frac{3x'}{2}\right)^2 + \epsilon} - \sqrt{\left(\frac{3x'}{2}\right)^2},$$

which is an indisputably positive number for any positive $\epsilon$. It is a good exercise for you to verify (some algebra cranking is involved) that

$$-\epsilon < 3x'\Delta + \Delta^2 < \epsilon \quad \text{whenever} \quad -\delta < \Delta < \delta,$$

which, since $\delta > 0$ for every $\epsilon > 0$, means that the limit is as asserted in (1) above.

3.1.3 Maxima

In order to say anything about the behavior of our fictitious rational consumer, we are interested in finding the consumption bundle that gives the highest value of the utility function of all bundles in the budget set. This is an example of an optimization problem, and there are simple calculus techniques for handling this that we can exploit.

Consider first the case where there are no constraints on $x$, so that $x$ can take on any value from negative infinity to positive infinity. Then a maximum is simply a point $x^*$ such that the value of the function at $x^*$ exceeds the value at $x$ for all $x$. That is,

$$f(x^*) \ge f(x) \quad \text{for all values of } x.$$

To find such a point we appeal to an absolutely crucial (and very simple) insight:

Claim. If $x^*$ is a maximum, then $\frac{df(x^*)}{dx} = 0$.

[Figure 4: The Basic First Order Condition]

I will not at any point test whether you understand WHY this is true, but it is actually very useful if you do. One way to convince yourself is to draw some graphs: a maximum must be at a place where the function "peaks", which is a point where the derivative is zero (given that the derivative is well defined at the maximum). One can also note that if you accept that the linear approximation at any point is close to the original function for "nearby" points, we may argue as follows. Since the linear approximation around the maximum is good for $x$ near $x^*$ we have that

$$f(x) \approx f(x^*) + \frac{df(x^*)}{dx}(x - x^*) \quad \Leftrightarrow \quad f(x) - f(x^*) \approx \frac{df(x^*)}{dx}(x - x^*).$$

Thus,

$$\frac{df(x^*)}{dx} > 0 \;\Rightarrow\; \text{increase } x \text{ a little bit to increase } f(x),$$
$$\frac{df(x^*)}{dx} < 0 \;\Rightarrow\; \text{decrease } x \text{ a little bit to increase } f(x).$$

3.1.4 Boundary Constraints

As you will see in the next lecture, the utility maximization problem for a consumer (and lots of other problems that you will see in this class) can be stated as a problem of the form

$$\max_{a \le x \le b} f(x),$$

where $a < b$. Sometimes we will use some "regularity conditions" on preferences or simply assume that the solution is interior, but in general we need to worry about "corner solutions" when we maximize a function over an interval. To understand why, simply observe that the logic we appealed to when we claimed that a maximum must be at a point where the derivative is zero assumes that you can move in any direction from the candidate for a solution. But:

1. At $x = a$ it is impossible to decrease $x$ further. Hence it is possible that we have an optimal solution at $a$ and that $\frac{df(a)}{dx} < 0$.

2. At $x = b$ it is impossible to increase $x$ further. Hence it is possible that we have an optimal solution at $b$ and that $\frac{df(b)}{dx} > 0$.

[Figure 5: Example with Optimum at the Upper Boundary]

For all points other than $a$ and $b$, the same logic as in the unconstrained case applies. We thus conclude that if we seek the maximum of $f(x)$ over an interval $[a, b]$, the only possible optimal solutions are points where the derivative is zero and the boundary points. Hence, we may use the following algorithm to solve the maximization problem:

Step 1. Differentiate $f(x)$ and look for all values of $x$ such that the derivative is zero. That is, $x^*$ is a "candidate solution" if $\frac{df(x^*)}{dx} = 0$. In most problems you will see, there will be no more than one such candidate solution.

Step 2. Evaluate the function you are maximizing at all candidate solutions. In the case with a single $x^*$ solving $\frac{df(x^*)}{dx} = 0$, that means that you compute $f(x^*)$, $f(a)$, and $f(b)$. Then just see which of these three numbers is the greatest; whichever $x$ in the set $\{a, b, x^*\}$ gives the largest value of the function is the solution to the problem.

Note that (and this you will encounter) in some cases there will be no solution to $\frac{df(x)}{dx} = 0$. In this case the only candidate solutions are the boundary points $a$ and $b$.
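A minimal sketch of this two-step algorithm (my addition, assuming sympy; the function $f(x) = x^3 - 3x$ on $[0, 3]$ is a made-up example whose maximum happens to be at the upper boundary):

```python
import sympy as sp

x = sp.symbols("x", real=True)
f = x**3 - 3*x
a, b = 0, 3

# Step 1: candidate solutions where f'(x) = 0, restricted to [a, b].
candidates = [c for c in sp.solve(sp.diff(f, x), x) if a <= c <= b]

# Step 2: evaluate f at the candidates and at both boundary points,
# then pick whichever point gives the largest value.
points = candidates + [a, b]
best = max(points, key=lambda p: f.subs(x, p))
print(best, f.subs(x, best))   # 3, 18: the optimum is at the upper boundary
```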
3.1.5 Partial Derivatives

Derivatives are hard. About the only thing about them that isn't hard is to extend the concept of a derivative for a function of a single variable to partial derivatives for functions of more than one variable. Let $f(x, y)$ be a given function of $x$ and $y$. The trivial step to realize is that if I set $y = y'$, one can define a new function $h$ of a single variable $x$ as

$$h(x) \equiv f(x, y').$$

Now the partial derivative of $f(x, y)$ with respect to $x$ at a point $(x', y')$ is simply the derivative of $h(x)$ at $x'$! We write $\frac{\partial f(x', y')}{\partial x}$ for the partial derivative, and what I just claimed is that

$$\frac{\partial f(x', y')}{\partial x} = \frac{dh(x')}{dx}.$$

In words, the partial derivative of a function with respect to a variable is just the univariate derivative of the function you get by setting the other variable(s) to some value (corresponding to where the derivative is evaluated). In practice, this means that when you compute the partial derivative with respect to $x$, you simply pretend that all variables other than $x$ are constants.

Example 4. Consider $f(x, y) = \sqrt{x}\,y$. Now, to find the partial derivative of $f$ with respect to $x$ at some point $(x', y')$ we treat $y$ as a constant and differentiate with respect to $x$ to get

$$\frac{\partial f(x, y)}{\partial x} = \left[\frac{d}{dx}\sqrt{x}\right]y = \frac{1}{2\sqrt{x}}\,y.$$

Now, say we want to know the partial with respect to $x$ at $(x', y') = (1, 1)$. Then we just plug into the expression above to get

$$\frac{\partial f(1, 1)}{\partial x} = \frac{1}{2\sqrt{1}}\,1 = \frac{1}{2}.$$

3.1.6 The Chain Rule

In examples with specific utility functions we will be able to get by without it, but to obtain a general expression we need to use the chain rule of differentiation to differentiate the objective of our maximization problem (with two variables):

$$h(x) = f(x, g(x)) \;\Rightarrow\; \frac{dh(x)}{dx} = \frac{\partial f(x, g(x))}{\partial x} + \frac{\partial f(x, g(x))}{\partial y}\frac{dg(x)}{dx}.$$

Intuition: $\frac{\partial f(x, g(x))}{\partial x}$ is the change in $f$ from a small change in $x$; $\frac{\partial f(x, g(x))}{\partial y}$ is the change in $f$ from a small change in $y$; and $\frac{dg(x)}{dx}$ is how big the change in $y$ is corresponding to a small change in $x$.
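A small numerical sanity check of the chain rule (my addition): it reuses $f(x, y) = \sqrt{x}\,y$ from Example 4 together with a made-up inner function $g(x) = x^2$, and compares the chain rule formula against a finite-difference approximation of $h(x) = f(x, g(x))$.

```python
import math

def f(x, y):
    return math.sqrt(x) * y

def g(x):
    return x ** 2

x0, eps = 2.0, 1e-6

# Chain rule: dh/dx = df/dx + df/dy * dg/dx, with y held at g(x0).
df_dx = g(x0) / (2 * math.sqrt(x0))   # partial of sqrt(x)*y in x, at y = g(x0)
df_dy = math.sqrt(x0)                 # partial of sqrt(x)*y in y
dg_dx = 2 * x0
chain_rule = df_dx + df_dy * dg_dx

# Finite-difference approximation of h(x) = f(x, g(x)).
finite_diff = (f(x0 + eps, g(x0 + eps)) - f(x0, g(x0))) / eps

print(chain_rule, finite_diff)        # both approximately 2.5 * x0**1.5 = 7.07
```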
3.2 Solving the Utility Maximization Problem

Optional Reading: You may want to look at the appendix to chapter 5 in Varian (pages 90-94 in the most recent edition). You should ignore the discussion of Lagrange multipliers unless you have been exposed to this method of optimization previously.

We seek to solve the consumer problem

$$\max_{x_1, x_2} u(x_1, x_2) \quad \text{subj. to} \quad p_1 x_1 + p_2 x_2 \le m, \quad x_1 \ge 0, \; x_2 \ge 0.$$

Given that we are willing to assume that preferences are monotonic (which we are), we first make the simple observation that we may replace the inequality $p_1 x_1 + p_2 x_2 \le m$ with $p_1 x_1 + p_2 x_2 = m$. To understand this, one just needs to draw a graph. Suppose we were able to find some optimal solution $x^* = (x_1^*, x_2^*)$ to the consumer problem such that $p_1 x_1^* + p_2 x_2^* < m$, and recall that preferences are monotonic. Then we observe that we can increase good 1 (or good 2, or both) a little bit without violating the budget constraint. But, by monotonicity, this increases the utility of the consumer, which means that $x^*$ wasn't optimal. Hence we conclude that no optimal solution can be in the interior of the budget set.

[Figure 6: Interior Bundles Can't be Optimal with Monotonic Preferences]

Thus, we can rule out all interior points in the budget set and solve

$$\max_{x_1, x_2} u(x_1, x_2) \quad \text{subj. to} \quad p_1 x_1 + p_2 x_2 = m, \quad x_1 \ge 0, \; x_2 \ge 0.$$

But since the constraint must hold with equality, we know that in any optimal solution it must be the case that

$$x_2 = \frac{m - p_1 x_1}{p_2}.$$

We can thus simply plug the constraint into the objective function and solve the simpler problem

$$\max_{0 \le x_1 \le \frac{m}{p_1}} u\left(x_1, \frac{m - p_1 x_1}{p_2}\right),$$

WHICH IS A MAXIMIZATION PROBLEM WITH A SINGLE VARIABLE (exactly of the form $\max_x f(x)$ s.t. $a \le x \le b$ which we had in our discussion on maxima). Ignoring for now the possibility of corner solutions, we can just differentiate this and set the derivative to zero to get the first order condition.

3.2.1 Optimality Conditions for the Consumer Choice Problem

Using the chain rule, differentiate the utility function with respect to the choice variable $x_1$ to get

$$\frac{\partial u\left(x_1^*, \frac{m - p_1 x_1^*}{p_2}\right)}{\partial x_1} + \frac{\partial u\left(x_1^*, \frac{m - p_1 x_1^*}{p_2}\right)}{\partial x_2}\left(-\frac{p_1}{p_2}\right) = 0.$$

Since the budget constraint is satisfied with equality, we can now substitute back $x_2^* = \frac{m - p_1 x_1^*}{p_2}$, which leads to the condition

$$\frac{\partial u(x_1^*, x_2^*)/\partial x_1}{\partial u(x_1^*, x_2^*)/\partial x_2} = \frac{p_1}{p_2}.$$

This condition, which we interpret as a tangency condition below, is a necessary condition for an interior optimum.
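Before interpreting the condition, here is a minimal numerical sketch of the substitution approach (my addition, assuming scipy is available; the Cobb-Douglas utility and the parameter values are example choices). It maximizes $u\left(x_1, \frac{m - p_1 x_1}{p_2}\right)$ directly over $0 \le x_1 \le m/p_1$:

```python
from scipy.optimize import minimize_scalar

p1, p2, m = 2.0, 1.0, 10.0
a = 0.3   # Cobb-Douglas exponent (example choice)

def neg_utility(x1):
    x2 = (m - p1 * x1) / p2               # budget constraint holds with equality
    return -(x1 ** a) * (x2 ** (1 - a))   # minimize the negative to maximize

res = minimize_scalar(neg_utility, bounds=(0, m / p1), method="bounded")
x1_star = res.x
x2_star = (m - p1 * x1_star) / p2
print(x1_star, x2_star)   # approx. 1.5 and 7.0
```

The numerical answer matches the closed-form Cobb-Douglas demands derived below: $x_1^* = a\,m/p_1 = 1.5$ and $x_2^* = (1-a)\,m/p_2 = 7.0$.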
3.3 Interpretation of the Optimality Condition

The optimality condition has a useful geometric interpretation as a tangency condition between the indifference curve and the budget line, and is often referred to as saying that

slope of indifference curve = MRS = slope of budget line.

This geometric interpretation is useful since it will allow us to go back and forth between pictures and math. The first thing to realize is that

$$-\frac{\partial u(x_1', x_2')/\partial x_1}{\partial u(x_1', x_2')/\partial x_2}$$

is the slope of the indifference curve at any point $(x_1', x_2')$ (remark about arguments and notational sloppiness). To see this, pick a point $(x_1', x_2')$ and look for all $(x_1, x_2)$ such that the consumer is indifferent between these points and $(x_1', x_2')$. That is,

$$u(x_1, x_2) = \underbrace{u(x_1', x_2')}_{\text{just a number, } k \text{ for short}} \equiv k.$$

In principle, this can be solved for $x_2$ as a function of $x_1$ (as when we solved examples in class and homework). This solution is some relation $x_2 = y(x_1)$ satisfying

$$u(x_1, y(x_1)) = k.$$

I'm ignoring some math issues that are a bit too advanced at this level: there is a question of whether $u(x_1, x_2) = k$ can be solved for $x_2$ as a function of $x_1$. However, you have seen in examples that it works, and you can trust me that it can be proved that it always works under some regularity conditions. Nevertheless, once you've taken on faith that we can solve for a function $y(x_1)$, we have that the condition above is an identity (it holds for all values of $x_1$). Hence, we can differentiate the identity with respect to $x_1$ to get

$$\frac{\partial u(x_1, y(x_1))}{\partial x_1} + \frac{\partial u(x_1, y(x_1))}{\partial x_2}\frac{dy(x_1)}{dx_1} = \frac{d}{dx_1}k = 0$$

$$\Leftrightarrow \underbrace{\frac{dy(x_1)}{dx_1}}_{\text{slope}} = -\frac{\partial u(x_1, y(x_1))/\partial x_1}{\partial u(x_1, y(x_1))/\partial x_2} = \text{MRS at } \Big(x_1, \underbrace{y(x_1)}_{=x_2}\Big).$$

Now $y(x_1)$ is constructed so that for each $x_1$ you plug in, you get the same utility level (it is an indifference curve). So the interpretation of the slope of $y(x_1)$ is that it tells you how much of good 2 you need to take away, if you increase the consumption of good 1 slightly, in order to keep the consumer indifferent. For discrete changes, the marginal rate of substitution will give you a slightly wrong answer to the question, but for sufficiently small changes the error will be negligible.

[Figure 7: The Marginal Rate of Substitution]

So the optimality condition can be interpreted as (after multiplying the optimality condition by $-1$)

$$\underbrace{\text{MRS}(x_1^*, x_2^*) = -\frac{\partial u(x_1^*, x_2^*)/\partial x_1}{\partial u(x_1^*, x_2^*)/\partial x_2}}_{\text{rate at which the consumer is willing to exchange goods}} = \underbrace{-\frac{p_1}{p_2}}_{\text{rate at which the market is willing to exchange goods}}.$$

3.3.1 Example: Cobb-Douglas Utility

Let $u(x_1, x_2) = x_1^a x_2^b$. The utility maximization problem is

$$\max_{x_1, x_2} x_1^a x_2^b \quad \text{subj. to} \quad p_1 x_1 + p_2 x_2 \le m.$$

The budget constraint must bind, so solve out $x_2$ from the constraint to get the problem

$$\max_{0 \le x_1 \le \frac{m}{p_1}} x_1^a\left(\frac{m - p_1 x_1}{p_2}\right)^b.$$

Assuming that the solution is not at the boundary points, the first order condition needs to be satisfied. That is,

$$a x_1^{a-1}\left(\frac{m - p_1 x_1}{p_2}\right)^b + x_1^a\, b\left(\frac{m - p_1 x_1}{p_2}\right)^{b-1}\left(-\frac{p_1}{p_2}\right) = 0.$$

At this point "only" some algebra remains. This can be done in a number of different ways, but here is one calculation. Multiply the first order condition with $\frac{1}{x_1^a}\left(\frac{p_2}{m - p_1 x_1}\right)^b > 0$ (positive for $0 < x_1 < m/p_1$) to get

$$\frac{a}{x_1} - \frac{b p_2}{m - p_1 x_1}\cdot\frac{p_1}{p_2} = 0.$$

Multiplying with $x_1(m - p_1 x_1) > 0$ then gives

$$a(m - p_1 x_1) - b p_1 x_1 = 0 \;\Leftrightarrow\; x_1 p_1(a + b) = am \;\Leftrightarrow\; x_1 = \frac{a}{a+b}\frac{m}{p_1}.$$

Now we have the candidate solution for good one. Plugging back into the budget constraint gives

$$m = p_1 x_1 + p_2 x_2 = p_1\frac{a}{a+b}\frac{m}{p_1} + p_2 x_2 = \frac{a}{a+b}m + p_2 x_2$$

$$\Rightarrow\; p_2 x_2 = m - \frac{a}{a+b}m = m\frac{a+b-a}{a+b} = \frac{b}{a+b}m \;\Leftrightarrow\; x_2 = \frac{b}{a+b}\frac{m}{p_2}.$$

Hence, the candidate solution is

$$x_1^* = \frac{a}{a+b}\frac{m}{p_1}, \qquad x_2^* = \frac{b}{a+b}\frac{m}{p_2}.$$

Now,

• We know that a solution must either satisfy the first order condition (the point $(x_1^*, x_2^*)$ is the only point on the budget line which does) or be at a boundary point.

• In this example, any bundle with either $x_1 = 0$ or $x_2 = 0$ gives utility $u(x_1, x_2) = 0$, whereas $u(x_1^*, x_2^*) = (x_1^*)^a(x_2^*)^b > 0$ since $x_1^* > 0$ and $x_2^* > 0$.

• Combining these two facts, we know that the bundle $(x_1^*, x_2^*) = \left(\frac{a}{a+b}\frac{m}{p_1}, \frac{b}{a+b}\frac{m}{p_2}\right)$ is indeed the solution to the consumer choice problem.

3.3.2 Final Remark About Ordinality and Monotone Transformations

Let

$$c = \frac{a}{a+b} \;\Rightarrow\; 1 - c = 1 - \frac{a}{a+b} = \frac{a+b-a}{a+b} = \frac{b}{a+b}.$$

Plug in $c$ instead of $a$ and $1-c$ instead of $b$ above and you immediately get

$$x_1^* = \frac{cm}{p_1} = \frac{a}{a+b}\frac{m}{p_1}, \qquad x_2^* = \frac{(1-c)m}{p_2} = \frac{b}{a+b}\frac{m}{p_2},$$

so the solution does not change. This is because $f(u) = u^{\frac{1}{a+b}}$ is an increasing function of $u$, and

$$\left(x_1^a x_2^b\right)^{\frac{1}{a+b}} = x_1^{\frac{a}{a+b}}x_2^{\frac{b}{a+b}} = x_1^c x_2^{1-c}$$

therefore is a monotone transformation. Hence there is no added flexibility in preferences from allowing $a + b \ne 1$ (and imposing $a + b = 1$ simplifies calculations). Thus, I will typically use the utility function

$$u(x_1, x_2) = x_1^a x_2^{1-a}.$$

Moreover, those of you who are comfortable with logarithms may note that

$$\ln\left(x_1^a x_2^{1-a}\right) = a\ln x_1 + (1-a)\ln x_2,$$

and you are free to use that transformation whenever you like. The FOC then immediately becomes

$$\frac{a}{x_1} + \frac{1-a}{\frac{m - p_1 x_1}{p_2}}\left(-\frac{p_1}{p_2}\right) = 0,$$

so this saves some work.
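Before turning to corner solutions, here is a short symbolic check of the Cobb-Douglas demand derived above (my addition, assuming sympy). It uses the logarithmic transformation just discussed, since the resulting first order condition is easy to solve:

```python
import sympy as sp

x1, p1, p2, m, a, b = sp.symbols("x1 p1 p2 m a b", positive=True)

# Log transform of Cobb-Douglas utility, with the budget constraint plugged in.
log_u = a * sp.log(x1) + b * sp.log((m - p1 * x1) / p2)

foc = sp.diff(log_u, x1)           # a/x1 - b*p1/(m - p1*x1)
sol = sp.solve(sp.Eq(foc, 0), x1)
print(sol)                         # [a*m/(p1*(a + b))], matching x1* above
```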
3.3.3 The Possibility of Corner Solutions

[Figure 8: Derivative May be Non-Zero if Solution at the Boundary]

Recall that if the highest value of the function is achieved at a boundary point, then the first order condition need not hold at the maximum. I'll skip the details, but it is straightforward to work out (and intuitive) that for our utility maximization problem:

1. If the solution is at the lower end of the boundary (with $x_1$ being the choice variable in the reduced problem you get after plugging in the budget constraint), then the slope of the indifference curve at the point $(x_1, x_2) = \left(0, \frac{m}{p_2}\right)$ is flatter than the budget line.

2. If the solution is at the upper end of the boundary, then the slope of the indifference curve at the point $(x_1, x_2) = \left(\frac{m}{p_1}, 0\right)$ is steeper than the budget line.

To see that this is a real possibility in a simple example, let $u(x_1, x_2) = \sqrt{x_1} + x_2$. Following the same steps as in the previous example, this results in a utility maximization problem that may be written as

$$\max_{0 \le x_1 \le \frac{m}{p_1}} \sqrt{x_1} + \frac{m - p_1 x_1}{p_2}.$$

If there is an interior solution, we thus have that the first order condition (if $f(x) = \sqrt{x}$, then $\frac{df(x)}{dx} = \frac{1}{2\sqrt{x}}$)

$$\frac{1}{2\sqrt{x_1}} - \frac{p_1}{p_2} = 0$$

must hold for some $0 \le x_1 \le \frac{m}{p_1}$. Solving this condition for $x_1$ we get

$$x_1^* = \left(\frac{p_2}{2p_1}\right)^2.$$

For simplicity, set $p_2 = 2$ and $p_1 = 1$, in which case $x_1^* = 1$. Then we can recover the consumption of good 2 through the budget constraint as

$$p_1 x_1^* + p_2 x_2^* = 1 + 2x_2^* = m \;\Rightarrow\; x_2^* = \frac{m-1}{2}.$$

The point of this is to notice that if $m < 1$, then our candidate solution for $x_2^*$ is negative, which doesn't make any sense at all. We thus conclude that the solution must be at a corner, and the solution is instead to use all income on good 1; that is, $(x_1^*, x_2^*) = (m, 0)$ solves the problem.

[Figure 9: A Corner Solution]

If you have a hard time seeing this, you should 1) plot $\sqrt{x_1} + \frac{m - x_1}{2}$, and/or 2) plug in $(x_1, x_2) = (m, 0)$ and note that $u(m, 0) = \sqrt{m}$, and 3) plug in $(x_1, x_2) = \left(0, \frac{m}{2}\right)$ and note that $u\left(0, \frac{m}{2}\right) = \frac{m}{2} < \sqrt{m}$.
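A numerical illustration of this corner solution (my addition, assuming scipy): with $p_1 = 1$, $p_2 = 2$ and $m = 0.5 < 1$, the interior candidate $x_1^* = 1$ lies outside the feasible interval, and a bounded maximization ends up at the upper boundary $x_1 = m$.

```python
from scipy.optimize import minimize_scalar
import math

p1, p2, m = 1.0, 2.0, 0.5    # m < 1, so the interior candidate x1 = 1 is infeasible

def neg_u(x1):
    x2 = (m - p1 * x1) / p2
    return -(math.sqrt(x1) + x2)

res = minimize_scalar(neg_u, bounds=(0, m / p1), method="bounded")
print(res.x)                  # approx. 0.5 = m/p1: all income spent on good 1
print(math.sqrt(m), m / p2)   # u(m, 0) = 0.707... beats u(0, m/2) = 0.25
```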