Theory Track Micro Analysis
David Pearce
New York University
Spring 2021

Introduction
The course begins by studying the behavior and welfare of individual consumers, and then the decisions of competitive profit-maximizing firms. Their behavior is aggregated to derive demand and supply curves in each market, allowing us to analyze perfect competition, first in “partial equilibrium”, one market at a time, and then in general equilibrium. Although we start with ideal conditions (no informational problems or impediments to competition), we gradually enrich the models to take account of risk, asymmetric information and market power. Game theory is used to study strategic situations: before deciding upon an action or course of actions, a player may need to think about what other players may do, and how they may react to what she does.

The course syllabus is posted on NYU Classes. It gives a more detailed account of the topics covered in this course. In addition, it contains information about: the grading scheme; the date of the midterm (during class time) and the time and location of the final exam; advice concerning the optional textbook; and contact information for our exceptional teaching assistant, Gian Luca Carniglia.

Consumer Theory
A consumer makes a myriad of economic decisions: how much to spend on transportation, food, entertainment, and so on. Abstractly, we think of her allocating the available funds among n possible goods or commodities. How big is n? That depends on how detailed we want to be. Transportation could be broken down into car, bus, subway, bicycle, airplane, and so on. But the item “car” could again be broken down into all kinds of categories or even brands and models.
An applied economist would choose how finely to subdivide spending categories, depending on the goals of the analysis. Think of listing all kinds of commodity, from 1 to n (the order isn’t important, as long as we stick to one).

Some standard mathematical notation will help us refer to different collections of commodities that might be purchased.

$\in$ : “is in the set”, “is an element of”
$\mathbb{R}$ : the set of real numbers
$\forall$ : for all, for every
$\exists$ : there exist(s)
$\mid$ : such that
$\mathbb{R}_+ = \{x \in \mathbb{R} \mid x \ge 0\}$ : the set of nonnegative real numbers
$\mathbb{R}^n$ : n-dimensional Euclidean space
$\mathbb{R}^n_+ = \{x \in \mathbb{R}^n \mid x_i \ge 0,\ i = 1, 2, \ldots, n\}$

A commodity bundle is a vector $x \in \mathbb{R}^n_+$.

Example: If there are three commodities, say apples, oranges and bananas, then the vector (2, 4, 0) refers to a bundle that has two apples, four oranges and no bananas. In this case a price vector would be a vector $p \in \mathbb{R}^3_{++} = \{x \in \mathbb{R}^3 \mid x_i > 0,\ i = 1, 2, 3\}$. If having more of a good is always desirable, its price has to be strictly positive, or else everyone would buy an infinite amount of it.

From all the bundles she can afford, a consumer chooses one she likes best (or at least as well as any other affordable bundle). She makes this choice according to her preferences, which we now study formally.

Preferences
If a consumer likes bundle x at least as well as another bundle y, we say “x is weakly preferred to y” and write $x \succsim y$. We usually assume that her preferences satisfy the following two properties:
Completeness: for every two bundles x and y, either $x \succsim y$ or $y \succsim x$ (or both!).
Transitivity: for every three bundles x, y and z, if $x \succsim y$ and $y \succsim z$ then $x \succsim z$.
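For a finite list of bundles, both properties can be checked by brute force. The Python sketch below is only an illustration: the bundles and the two candidate relations (`by_total`, `dominates`) are hypothetical. Ranking by total pieces of fruit is complete and transitive; ranking by “at least as much of every good” is transitive but not complete, because (2, 4, 0) and (0, 0, 5) cannot be compared.

```python
from itertools import product

def is_complete(bundles, weakly_prefers):
    """Completeness: for every pair x, y, x is weakly preferred to y or y to x (or both)."""
    return all(weakly_prefers(x, y) or weakly_prefers(y, x)
               for x, y in product(bundles, repeat=2))

def is_transitive(bundles, weakly_prefers):
    """Transitivity: whenever x is weakly preferred to y and y to z, x is weakly preferred to z."""
    return all(weakly_prefers(x, z)
               for x, y, z in product(bundles, repeat=3)
               if weakly_prefers(x, y) and weakly_prefers(y, z))

# Hypothetical (apples, oranges, bananas) bundles, for illustration only.
bundles = [(2, 4, 0), (1, 1, 1), (0, 0, 5), (3, 0, 0)]

def by_total(x, y):
    """Hypothetical relation: rank bundles by total pieces of fruit."""
    return sum(x) >= sum(y)

def dominates(x, y):
    """Hypothetical relation: x ranks above y only if it has at least as much of every good."""
    return all(xi >= yi for xi, yi in zip(x, y))

print(is_complete(bundles, by_total), is_transitive(bundles, by_total))    # True True
print(is_complete(bundles, dominates), is_transitive(bundles, dominates))  # False True
```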
If $x \succsim y$ and $y \succsim x$, we say “x is indifferent to y” (a lazy way of saying she is indifferent between the two) and we write $x \sim y$. As a matter of notation, $x \precsim y$ means the same as $y \succsim x$. If $x \succsim y$ but it is NOT the case that $y \succsim x$, we write $x \succ y$. Here we say “x is strictly preferred to y”.

For any bundle x, the indifference curve through x is the set $\sim(x) = \{y \in \mathbb{R}^n_+ \mid y \sim x\}$. Similarly we can define the upper contour set of x by $\succsim(x) = \{y \in \mathbb{R}^n_+ \mid y \succsim x\}$. For the lower contour set $\precsim(x)$, replace $\succsim$ by $\precsim$.

$\succsim$ satisfies local nonsatiation if for every x in $\mathbb{R}^n_+$ there exists a bundle y, arbitrarily close to x, such that $y \succ x$:
$$\forall x \in \mathbb{R}^n_+ \text{ and } \forall \varepsilon > 0,\ \exists y \in \mathbb{R}^n_+ \text{ such that } \|y - x\| < \varepsilon \text{ and } y \succ x$$
This assumption rules out “fat” indifference curves. A stronger assumption that implies local nonsatiation is strict monotonicity: $\succsim$ is strictly monotonic if for each $x, y \in \mathbb{R}^n_+$ with $x \ge y$ and $x \ne y$, we have $x \succ y$. Here $x \ge y$ means $x_i \ge y_i$, $i = 1, 2, \ldots, n$. So strict monotonicity just means more is always better.

Often we assume that preferences are continuous. The preference ordering $\succsim$ is continuous if for each x, the sets $\succsim(x)$ and $\precsim(x)$ are closed (contain all their boundary points). Recall that to call a function f continuous is basically to say that f(x) doesn’t “jump” as x moves. Similarly, continuity of a preference ordering concerns the lack of “jumpiness” of the preferences. This is easiest to see in an example.
Example: Lexicographic preference ordering in $\mathbb{R}^2_+$
The preference ordering $\succsim$ on $\mathbb{R}^2_+$ is called lexicographic if $\forall x, y \in \mathbb{R}^2_+$, $x \succsim y$ if and only if one of the following holds: (i) $x_1 > y_1$, or (ii) $x_1 = y_1$ and $x_2 \ge y_2$.

Suppose x, y and z are given as follows:
[Figure: x = (4, 2) lies above the horizontal segment joining y = (1, 1) and z = (7, 1).]

As we move along the line from y to z, we initially encounter points that are strictly worse than x (red), but we transition to points that are strictly better than x (green), without ever encountering a point that is indifferent to x. We jump from strictly worse to strictly better. This is like the value of a function starting at 1 and moving to 3, say, without ever taking on the value 2. This is possible only if the function is discontinuous. In terms of the definition above: the points $(4 + 1/k, 1)$, $k = 1, 2, \ldots$, all belong to $\succsim(x)$, but their limit $(4, 1)$ does not, so $\succsim(x)$ is not closed.

A person with lexicographic preferences cares about both the first and second commodities, but drastically less about the second than the first. Just as a dictionary ranks words according to the first letter in each, and breaks ties by looking at the second letters in each, these preferences use the second component of two bundles only to break a tie in the first component. So what does an indifference curve through a point x look like? It’s just the singleton set $\{x\}$.

Another property of preferences that is important in consumer theory is convexity. Recall that a subset S of $\mathbb{R}^n$ is convex if for every $x, y \in S$ and every $\lambda \in [0, 1]$, $\lambda x + (1 - \lambda) y \in S$. In other words, if two points are in the convex set S, so is the line segment between them. A preference ordering $\succsim$ is convex if for each bundle $x \in \mathbb{R}^n_+$, $\succsim(x)$ is a convex set. The preferences are strictly convex if for every $x \ne y$ in $\mathbb{R}^n_+$ with $y \succsim x$ and every $\lambda \in (0, 1)$, $\lambda x + (1 - \lambda) y \succ x$. Someone with convex preferences likes averages of two indifferent bundles at least as well as the extremes (and strictly better, if her preferences are strictly convex).
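Returning to the lexicographic example, the jump can be checked with a small Python sketch. It assumes the points in the figure are x = (4, 2), y = (1, 1) and z = (7, 1), read off the figure’s tick marks, and the helper names are hypothetical. Points $(4 + 1/k, 1)$ are all strictly preferred to x, yet their limit $(4, 1)$ is not even weakly preferred, and no point on the segment from y to z is indifferent to x.

```python
def lex_weakly_prefers(x, y):
    """Lexicographic ordering on R^2_+: compare good 1 first, break ties with good 2."""
    return x[0] > y[0] or (x[0] == y[0] and x[1] >= y[1])

def strictly_prefers(x, y):
    return lex_weakly_prefers(x, y) and not lex_weakly_prefers(y, x)

# Assumed coordinates, read from the figure: x = (4, 2), y = (1, 1), z = (7, 1).
x = (4.0, 2.0)

# A sequence in the upper contour set of x, approaching (4, 1) from the right.
approaching = [(4.0 + 1.0 / k, 1.0) for k in range(1, 6)]
print(all(strictly_prefers(q, x) for q in approaching))  # True

# Its limit is not in the upper contour set, so that set is not closed.
limit = (4.0, 1.0)
print(lex_weakly_prefers(limit, x))  # False

# Grid of points on the segment from y to z: none is indifferent to x.
segment = [(1.0 + 6.0 * i / 600, 1.0) for i in range(601)]
print(any(lex_weakly_prefers(q, x) and lex_weakly_prefers(x, q) for q in segment))  # False
```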
Utility
Preference orderings are a very general way of capturing a consumer’s tastes. But they are not always convenient to work with: you can’t manipulate them algebraically, or apply standard calculus tools to them. This raises the question: is it possible to find a function that assigns higher numbers to bundles that are more preferred, and then work with that function instead of with the underlying preferences?

If $\succsim$ is a preference on $\mathbb{R}^n_+$, a function $U : \mathbb{R}^n_+ \to \mathbb{R}$ is said to represent $\succsim$ if, $\forall x, y \in \mathbb{R}^n_+$, $U(x) \ge U(y)$ if and only if $x \succsim y$. Such a function U is called a utility function. It assigns higher numbers to things that the consumer prefers, or equivalently, things that “give her higher utility”.

Example: Consider someone who likes fruit, but is equally happy with apples or oranges. She doesn’t care if she has two apples and three oranges, that is the bundle (2, 3), or the bundle (3, 2). All she cares about is the number of apples plus the number of oranges. So if we define the function U by $U(x) = x_1 + x_2$, then U represents her preferences. Suppose you defined a new function V to be five times U, so $V(x) = 5(x_1 + x_2)$. Notice that V ranks all bundles in the same order as U does, so V also represents her preferences. In fact, for any strictly increasing function $g : \mathbb{R} \to \mathbb{R}$, the composite function $g \circ U$ (“g following U”, defined by $(g \circ U)(x) = g(U(x))$) also represents those preferences. We think of these utility functions as ordinal because we are just paying attention to the way they order the bundles; the numbering system is not meant to convey more information than that.

Can any preference ordering be represented by a utility function? The answer is yes, if the preferences are complete, transitive and continuous. (Remember that, but you don’t need to know the proof.)
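The ordinal point in the fruit example can be checked mechanically. In this Python sketch the helper `same_ranking` and the list of bundles are hypothetical; it tests whether two functions order every pair of bundles the same way. U, V = 5U and exp(U) agree, while $(U - 3)^2$, which is not increasing, scrambles the ranking.

```python
import math
from itertools import combinations

def U(x):
    # Fruit example: utility is the total number of apples and oranges.
    return x[0] + x[1]

def V(x):
    return 5 * U(x)

def W(x):
    # g(u) = exp(u) is strictly increasing, so W = g(U) should rank bundles like U.
    return math.exp(U(x))

def bad(x):
    # g(u) = (u - 3)^2 is NOT increasing, so it need not preserve the ranking.
    return (U(x) - 3) ** 2

bundles = [(2, 3), (3, 2), (0, 4), (1, 1), (5, 0), (0, 0)]

def same_ranking(f, g, bundles):
    """True if f and g agree on the weak ranking of every pair of bundles."""
    return all((f(x) >= f(y)) == (g(x) >= g(y)) and (f(y) >= f(x)) == (g(y) >= g(x))
               for x, y in combinations(bundles, 2))

print(same_ranking(U, V, bundles), same_ranking(U, W, bundles))  # True True
print(same_ranking(U, bad, bundles))                             # False
```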
It is interesting to see why there might be trouble if continuity is violated. Lexicographic preferences are a famous family of preferences that cannot be represented by any utility function. Why? Think of the vertical line at $x_1 = 5$, say, and the points x = (5, 2) and y = (5, 1). To represent the preferences with a function U, each of these two points must be given a different utility. Nothing to the right of the line can be given a utility in the range [U(y), U(x)], nor anything to the left of the line. So just to number things on that vertical line, we’ve used up a whole interval of points on the real number line. Every vertical line is going to need the same treatment: it will take up a whole interval on the real number line, and the intervals belonging to different vertical lines cannot overlap. You get the idea that this is going to be pretty crowded.

In fact, if you’ve taken real analysis (and if you haven’t, please do it one of these days), you know that there is an uncountable infinity of vertical lines, but any collection of non-overlapping intervals of positive length is at most countable (each one contains its own rational number). So there can’t be a distinct interval of real numbers reserved for each vertical line. Give this some thought, but again, you don’t need to know this argument.

But most preferences can be represented by some utility function U, and we will usually assume U is smooth (let’s say, twice continuously differentiable). Then the equation of an indifference curve is U(x) = c, where c is some constant. In two dimensions, this is $U(x_1, x_2) = c$, and totally differentiating both sides yields:
$$U_1\,dx_1 + U_2\,dx_2 = 0$$
where $U_i = \partial U / \partial x_i$; that is,
$$\frac{dx_2}{dx_1} = -\frac{U_1}{U_2}$$
This tells us the rate at which the second commodity must be increased, per (infinitesimal) unit of the first commodity taken away, to hold utility constant. In other words, this is the slope of the indifference curve. This is also called the marginal rate of substitution (2 for 1), abbreviated $MRS_{2 \text{ for } 1}$.
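The formula $dx_2/dx_1 = -U_1/U_2$ can be checked numerically. This Python sketch assumes a hypothetical utility $U(x_1, x_2) = x_1^{0.3} x_2^{0.7}$ (any smooth utility would do) and approximates the partial derivatives by central differences; moving along the computed slope leaves utility unchanged up to second-order terms.

```python
def U(x1, x2, alpha=0.3, beta=0.7):
    # Hypothetical smooth utility, used only for illustration.
    return x1 ** alpha * x2 ** beta

def partial(f, x1, x2, i, h=1e-6):
    """Central finite-difference approximation to the i-th partial derivative (i = 1 or 2)."""
    if i == 1:
        return (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)
    return (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)

x1, x2 = 2.0, 5.0
U1 = partial(U, x1, x2, 1)
U2 = partial(U, x1, x2, 2)
slope = -U1 / U2          # dx2/dx1 along the indifference curve through (x1, x2)
print(slope)              # close to -(0.3 * 5) / (0.7 * 2), about -1.0714

# Take a small step in x1 and adjust x2 by slope * dx1: utility barely moves.
dx1 = 1e-4
print(abs(U(x1 + dx1, x2 + slope * dx1) - U(x1, x2)))
```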
(Some books get the MRS notation backwards.) Economists call these first partials “marginal utilities”, so we can write
$$\frac{dx_2}{dx_1} = -\frac{MU_1}{MU_2}$$

Recall what the concavity of a function means: $f : S \to \mathbb{R}$, defined on a convex set S, is concave if for all distinct $x, y \in S$ and $\forall \lambda \in (0, 1)$,
$$f(\lambda x + (1 - \lambda) y) \ge \lambda f(x) + (1 - \lambda) f(y)$$
If we replace $\ge$ with $\le$ above, we have the definition of convexity of f. Replacing weak inequalities with strict inequalities in the respective definitions gives us strict concavity and strict convexity of f, respectively.

Preferences $\succsim$ might be represented by two functions U and V, one strictly concave and one strictly convex. (With a single good and “more is better” preferences on $\mathbb{R}_+$, for instance, both $\sqrt{x}$ and $x^2$ are representations.) A more suitable property for a utility function is quasiconcavity.

A function $f : S \to \mathbb{R}$ on a convex set S is quasiconcave if for all distinct $x, y \in S$ and $\forall \lambda \in (0, 1)$,
$$f(\lambda x + (1 - \lambda) y) \ge \min\{f(x), f(y)\}$$
Replacing $\ge$ with $>$ above gives us the definition of strict quasiconcavity.

Proposition: Suppose U represents $\succsim$ on $\mathbb{R}^n_+$. Then U is quasiconcave $\iff$ $\succsim(x)$ is convex $\forall x$.

We see that quasiconcavity is not so much about the shape of the function as it is about the shape of upper contour sets.

Optimization Review
Optimization usually refers to maximization or minimization, with or without constraints. A typical problem without constraints would be:
$$\max_{x \in \mathbb{R}} f(x)$$
Or the problem might include some parameters $a = (a_1, a_2, \ldots, a_k)$ that the person doing the maximizing is not allowed to choose (constants beyond her control, such as prices that a consumer might face). Then we would write:
$$\max_{x \in \mathbb{R}} f(x, a_1, a_2, \ldots, a_k)$$
We are interested in two things:
The solution $x^*(a)$, that is, the choice of x that maximizes f(x, a).
The maximized value $f(x^*(a), a)$.
The first tells us the best thing to do, depending on a, and the second tells us how well we do, when we choose the best thing.

Example:
$$\max_{x \in \mathbb{R}} -x^2 + 6ax$$
First order necessary condition (FOC): $-2x + 6a = 0$, so $x^* = 3a$.
The maximized value is $-(3a)^2 + 6a(3a) = -9a^2 + 18a^2 = 9a^2$.

There might be many choice variables:
$$\max_{x \in \mathbb{R}^n} f(x, a_1, a_2, \ldots, a_k)$$
FOC’s: $\frac{\partial f}{\partial x_i} = 0$, $i = 1, \ldots, n$.
The solution is a vector-valued function $x^*(a) = (x^*_1(a), \ldots, x^*_n(a))$ and the maximized value function is $f(x^*(a), a)$. Even when the FOC’s are satisfied, you might not have found a global maximizer: it might instead be a local maximizer, a local minimizer, a point of inflection, or a saddle point. (Necessity vs. sufficiency.)

Suppose there is a small increase in one parameter $a_i$. How much does the maximized value change? Even if we hold all the $x_1, \ldots, x_n$ fixed, there would be the direct effect $\partial f / \partial a_i$. But in general all the $x_j$’s may adjust optimally. So there are n + 1 changes to be taken into account. Fortunately, there is a beautiful result called the envelope theorem that simplifies things tremendously. Let $M(a) = f(x^*(a), a)$ be the maximized value function of the differentiable function f. Assume $x^*$ is a differentiable function of a.

Envelope Theorem
$$\frac{\partial M}{\partial a_i} = \left.\frac{\partial f}{\partial a_i}\right|_{x \text{ held constant}}$$
Proof.
$$\frac{\partial M}{\partial a_i} = \frac{\partial}{\partial a_i} f(x^*_1(a), \ldots, x^*_n(a), a) = \underbrace{\frac{\partial f}{\partial x_1}}_{=0 \text{ (FOC)}} \frac{\partial x^*_1}{\partial a_i} + \cdots + \underbrace{\frac{\partial f}{\partial x_n}}_{=0 \text{ (FOC)}} \frac{\partial x^*_n}{\partial a_i} + \left.\frac{\partial f}{\partial a_i}\right|_{x \text{ held constant}} = \left.\frac{\partial f}{\partial a_i}\right|_{x \text{ held constant}}$$

Optimization with Constraints
Often x cannot be chosen freely, but instead must satisfy some constraints. For example, a consumer can spend only the amount of money she has. As before, there may be parameters $a_1, \ldots, a_k$ whose values cannot be chosen.
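Returning briefly to the unconstrained example $\max_x -x^2 + 6ax$, a few lines of Python can confirm both the solution and the envelope theorem at a hypothetical parameter value a = 2. The grid search is a crude stand-in, used only as a check on the FOC; the value function M uses the FOC solution $x^* = 3a$. The envelope theorem predicts $dM/da = \partial f/\partial a = 6x^* = 18a$.

```python
def f(x, a):
    # Objective from the example: -x^2 + 6ax.
    return -x ** 2 + 6 * a * x

def argmax_grid(a, lo=-100.0, hi=100.0, n=200_001):
    """Crude grid search for the maximizer of f(., a) on [lo, hi] (step 0.001)."""
    step = (hi - lo) / (n - 1)
    return max((lo + k * step for k in range(n)), key=lambda x: f(x, a))

def M(a):
    # Maximized value function M(a) = f(x*(a), a), using the FOC solution x* = 3a.
    return f(3 * a, a)

a = 2.0
print(argmax_grid(a))   # about 6.0, matching x* = 3a
print(M(a))             # 36.0, matching 9a^2

# Envelope theorem: dM/da equals the partial of f with respect to a, x held at x*(a).
h = 1e-5
dM_da = (M(a + h) - M(a - h)) / (2 * h)
direct = 6 * (3 * a)    # df/da = 6x, evaluated at x = x*(a)
print(dM_da, direct)    # both about 36 (= 18a)
```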
A typical problem with constraints is
$$\max_{x \in \mathbb{R}^n} f(x, a) \quad \text{s.t. } g(x, a) = 0$$
The mathematician Lagrange developed the correct first order conditions for these problems. Form the expression
$$\mathcal{L}(x, a, \lambda) = f(x, a) - \lambda g(x, a)$$
In his honor, $\lambda$ is called a Lagrange multiplier and $\mathcal{L}$ is called “the Lagrangean”.

The first order conditions for an interior maximizer (or minimizer) of f, subject to g = 0, are just the FOC’s of the unconstrained problem
$$\min_{\lambda} \max_{x} \mathcal{L}(x, a, \lambda)$$
that is,
$$\frac{\partial \mathcal{L}}{\partial x_i} = 0,\ i = 1, \ldots, n, \quad \text{and} \quad \frac{\partial \mathcal{L}}{\partial \lambda} = 0$$
Equivalently,
$$\frac{\partial f}{\partial x_i} - \lambda \frac{\partial g}{\partial x_i} = 0,\ i = 1, \ldots, n, \quad \text{and} \quad g(x, a) = 0$$

You don’t need to know the proof, but we will use the result all the time, along with the associated constrained envelope theorem:

Theorem: Let M(a) be the maximized value function $M(a) = f(x^*(a), a)$ associated with the constrained maximization problem. Then
$$\frac{\partial M}{\partial a_i} = \left.\frac{\partial \mathcal{L}}{\partial a_i}\right|_{x \text{ held constant}} = \left.\frac{\partial f}{\partial a_i}\right|_{x \text{ held constant}} - \lambda \left.\frac{\partial g}{\partial a_i}\right|_{x \text{ held constant}}$$
These Lagrangean results are put to use in various ways in both consumer and producer theory.

Existence and Uniqueness of Solutions
A general way of expressing many maximization problems with or without constraints is to write
$$\max_{x \in S} f(x, a)$$
where S is a subset of $\mathbb{R}^n$. The choice set S takes into account any constraints that must be satisfied. Not all such problems have a solution.

Example 1: $\max_{x \in \mathbb{R}} x^2$ has no solution, because $\mathbb{R}$ is unbounded.

Example 2: $\max_{x \in S} x$ has no solution when $S = \{x \in \mathbb{R} \mid 3 < x < 5\}$, because S is not closed.

Example 3: Finally, suppose S = [0, 1], and $f : S \to \mathbb{R}$ is defined by
$$f(x) = \begin{cases} x & \text{if } x \in [0, 1) \\ 0 & \text{if } x = 1 \end{cases}$$
Then $\max_{x \in S} f(x)$ has no solution, because f is discontinuous.
Weierstrass, the same fellow who proved the Intermediate Value Theorem, proved that these are the only reasons for nonexistence.

Weierstrass’ Theorem
Let S be a nonempty, closed, bounded subset of $\mathbb{R}^n$ and $f : S \to \mathbb{R}$ be continuous. Then f achieves a maximum (and a minimum) on S. (Know the theorem; you don’t need to know the proof.)

Weierstrass’ Theorem says nothing about uniqueness of the x such that f(x) is maximized. Clearly $\max_{x \in [0,1]} 10$ has many maximizers, and $\max_{x \in [-1,1]} x^2$ has two.

Proposition: Suppose S is a convex subset of $\mathbb{R}^n$, and $f : S \to \mathbb{R}$ is strictly quasiconcave. Then there is at most one $x^*$ that maximizes f on S.

Proof. For contradiction, suppose that distinct $x, y \in S$ both maximize f on S. Choose any $\lambda \in (0, 1)$ and note that $\lambda x + (1 - \lambda) y \in S$, since S is convex. By strict quasiconcavity, $f(\lambda x + (1 - \lambda) y) > \min\{f(x), f(y)\}$, contradicting the maximality of f(x) and f(y).

The Consumer Problem
In its simplest form, the consumer’s problem is to allocate fixed income I across expenditures on the n available goods:
$$\max_{x \in \mathbb{R}^n_+} U(x) \quad \text{s.t. } p \cdot x - I = 0$$
Choosing from $\mathbb{R}^n_+$ makes sense (you can’t consume −5 units of toothpaste). In effect, there are n nonnegativity constraints, but we won’t create n extra Lagrange multipliers. Instead we keep in mind that the solution might be a corner solution (such as, eat no apples at all) rather than an interior solution satisfying the FOC’s.

The associated Lagrangean is:
$$\mathcal{L}(x, p, I, \lambda) = U(x) - \lambda(p \cdot x - I)$$
FOC’s:
$$MU_i - \lambda p_i = 0,\ i = 1, \ldots, n; \qquad p \cdot x = I$$
The first n equations say that
$$\lambda = \frac{MU_1}{p_1} = \cdots = \frac{MU_n}{p_n}$$
Marginal utilities per dollar are equalized!
If some good j were not consumed at all ($x_j = 0$), its marginal utility per dollar might be lower than that of the others, a good reason not to consume good j.

The solution $x^*$ to the utility maximization problem is called the Marshallian demand function, in honor of Alfred Marshall, who taught at Cambridge University more than a century ago. We will write $D(p, I) = (D_1(p, I), \ldots, D_n(p, I))$ for the Marshallian demand function. The maximized value function is called the indirect utility function, and is written
$$V(p, I) = U(D(p, I))$$

Where there’s a value function, there must be an envelope theorem (or several). Here the parameters are $a = (p, I)$, and the constrained envelope theorem tells us:
$$\frac{\partial V}{\partial p_i} = \left.\frac{\partial \mathcal{L}}{\partial p_i}\right|_{x = x^*(p, I)} = -\lambda x^*_i(p, I), \qquad \frac{\partial V}{\partial I} = \left.\frac{\partial \mathcal{L}}{\partial I}\right|_{x = x^*(p, I)} = \lambda$$
Taking the ratio of the two yields Roy’s Identity:
$$x^*_i(p, I) = -\frac{\partial V / \partial p_i}{\partial V / \partial I}$$
which tells us how to recover the Marshallian demands from the indirect utility function.

Example: Cobb-Douglas preferences
$$u(x_1, x_2) = x_1^{\alpha} x_2^{\beta}, \quad \alpha, \beta > 0$$
FOC’s:
$$\alpha x_1^{\alpha - 1} x_2^{\beta} - \lambda p_1 = 0$$
$$\beta x_1^{\alpha} x_2^{\beta - 1} - \lambda p_2 = 0$$
$$p_1 x_1 + p_2 x_2 = I$$
Dividing the first FOC by the second gives $\frac{\alpha x_2}{\beta x_1} = \frac{p_1}{p_2}$, so $p_2 x_2 = \frac{\beta}{\alpha} p_1 x_1$; substituting into the budget constraint gives $p_1 x_1 \frac{\alpha + \beta}{\alpha} = I$. Then the Marshallian demands are
$$D_1(p, I) = x_1(p, I) = \frac{\alpha}{\alpha + \beta}\frac{I}{p_1}, \qquad D_2(p, I) = x_2(p, I) = \frac{\beta}{\alpha + \beta}\frac{I}{p_2}$$
So the indirect utility function is
$$V(p, I) = D_1(p, I)^{\alpha} D_2(p, I)^{\beta} = \left(\frac{I}{\alpha + \beta}\right)^{\alpha + \beta}\left(\frac{\alpha}{p_1}\right)^{\alpha}\left(\frac{\beta}{p_2}\right)^{\beta}$$

Note that the relative size of α and β matters, but scaling both up or down leaves demands (but not indirect utility) unchanged. Observe also that the proportion of income spent on each good, that is $p_i x_i / I$, is constant: it only depends on α/β, and not on prices or income. Constant expenditure share is a special property of this family of preferences; there is no reason to expect this for most utility functions.
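A short Python sketch of the Cobb-Douglas example, with hypothetical values α = 1, β = 3, p = (2, 5) and I = 100. It checks that the demands exhaust the budget, that the expenditure share on good 1 equals α/(α + β), and that Roy’s Identity recovers $D_1$ when the derivatives of V are approximated by central differences.

```python
def demand(p1, p2, I, alpha, beta):
    """Marshallian demands for u = x1^alpha * x2^beta (formulas derived above)."""
    share1 = alpha / (alpha + beta)
    return share1 * I / p1, (1 - share1) * I / p2

def indirect_utility(p1, p2, I, alpha, beta):
    x1, x2 = demand(p1, p2, I, alpha, beta)
    return x1 ** alpha * x2 ** beta

alpha, beta = 1.0, 3.0          # hypothetical parameter values
p1, p2, I = 2.0, 5.0, 100.0     # hypothetical prices and income

x1, x2 = demand(p1, p2, I, alpha, beta)
print(x1, x2)                   # 12.5 15.0
print(p1 * x1 + p2 * x2)        # 100.0: the budget is exhausted
print(p1 * x1 / I)              # 0.25 = alpha / (alpha + beta)

# Roy's Identity: x1 = -(dV/dp1) / (dV/dI), with derivatives by central differences.
h = 1e-6
dV_dp1 = (indirect_utility(p1 + h, p2, I, alpha, beta)
          - indirect_utility(p1 - h, p2, I, alpha, beta)) / (2 * h)
dV_dI = (indirect_utility(p1, p2, I + h, alpha, beta)
         - indirect_utility(p1, p2, I - h, alpha, beta)) / (2 * h)
print(-dV_dp1 / dV_dI)          # about 12.5, the Marshallian demand for good 1
```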
Let’s stay with n = 2 for the moment and get a graphical interpretation of the first order conditions of the Marshallian consumer problem. The condition
$$\frac{MU_1}{p_1} = \frac{MU_2}{p_2}$$
can be rewritten
$$-\frac{MU_1}{MU_2} = -\frac{p_1}{p_2} \qquad (1)$$
The LHS is the slope of the indifference curve at the chosen bundle. And the RHS? Solve the budget constraint for $x_2$ as a function of $x_1$:
$$p_1 x_1 + p_2 x_2 = I \quad \Longrightarrow \quad x_2 = \frac{I}{p_2} - \frac{p_1}{p_2} x_1$$
So the RHS of (1) is the slope of the budget line.

Thus, at an interior solution, with both goods consumed, the FOC’s require tangency of the budget line to the indifference curve. If instead
$$\frac{MU_2}{p_2} < \frac{MU_1}{p_1},$$
so consuming good 2 is not a useful way to get utility, then we are at a corner solution (with $x_2 = 0$) and
$$\frac{MU_1}{MU_2} > \frac{p_1}{p_2}$$
The budget line is shallower than the indifference curve.

Notice that if I increases, the slope $-p_1/p_2$ of the budget line does not change. So an increase in income produces a parallel outward shift of the budget line. How would such an increase affect $x_1$ and $x_2$, the quantities demanded? We can’t say, in general. With n commodities, we say the ith good is a normal good if
$$\frac{\partial D_i}{\partial I} \ge 0$$
We say good i is an inferior good if
$$\frac{\partial D_i}{\partial I} < 0$$

Intuition might suggest that if you have more to spend, you will buy more of a good than before. This is often the case, hence the name normal good. But suppose, when you have relatively little money, you consume some cheap wine. If you get richer, probably you switch to buying a different wine that you like better. So the cheap wine is an inferior good that you phase out of your consumption bundle as your circumstances improve.

We have been changing income, while prices are held fixed. Let’s think now of changing $p_1$ and seeing how $x_1$ reacts.
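As a first, extreme look at how $x_1$ can react to $p_1$, take the fruit example $U(x) = x_1 + x_2$, where both marginal utilities equal 1. Comparing marginal utility per dollar, $1/p_1$ versus $1/p_2$, sends all income to one good: a corner solution. The Python sketch below uses hypothetical values I = 12 and $p_2 = 3$. When $p_1 = p_2$ every bundle on the budget line is optimal, and the even split returned there is an arbitrary choice made for this sketch.

```python
def perfect_substitutes_demand(p1, p2, I):
    """Demand for U(x) = x1 + x2: all income goes to the good with the higher
    marginal utility per dollar (1/p1 versus 1/p2), a corner solution.
    On a tie this sketch arbitrarily splits spending evenly."""
    if 1 / p1 > 1 / p2:
        return I / p1, 0.0
    if 1 / p1 < 1 / p2:
        return 0.0, I / p2
    return I / (2 * p1), I / (2 * p2)

I, p2 = 12.0, 3.0   # hypothetical income and price of good 2
for p1 in (1.0, 2.0, 3.0, 4.0, 6.0):
    print(p1, perfect_substitutes_demand(p1, p2, I))
# p1 < p2: x1 = I / p1 and x2 = 0; p1 > p2: x1 = 0. Demand for good 1 jumps at p1 = p2.
```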
When $p_1$ increases, the vertical intercept of the budget line, $I/p_2$, stays fixed, but the maximum amount of $x_1$ the consumer can afford, $I/p_1$, decreases. So the budget line gets steeper, rotating around the fixed vertical intercept. How do $x_1$ and $x_2$ react? By drawing some alternative graphs, we can see that $x_1$ might go up, or down. The same is true for $x_2$. We say that good i is a Giffen good if
$$\frac{\partial D_i}{\partial p_i} > 0$$
It is rare that Giffen goods are observed, but logic cannot rule them out.

Economists like to understand the effect of a change in price by dividing it into two parts, a substitution effect and an income effect. When $p_1$ increases, two things have changed. First, the budget line has become steeper: good 1 has become a worse deal than before, compared to good 2. This favors shifting consumption away from good 1, and toward good 2. But the consumer has also experienced a loss in welfare, or well-being: she can no longer get onto her original indifference curve. Imagine that when $p_1$ increases, we compensate her by giving her just enough extra income that she can attain the original utility level.

[Figure: axes x1 and x2, showing the original budget line, the budget line after p1 increases, the original indifference curve, the indifference curve after the price increase, and the bundles A, B and C.]

In the graph, A is the bundle originally demanded, C is the bundle demanded after $p_1$ increases, and B is the hypothetical bundle chosen if, magically, the price increase had been accompanied by an increase in income just enough to allow her to attain her original utility level.

The movement from A to B is called the substitution effect of the increase in $p_1$. The $x_1$ component of this is negative: as $p_1$ increased, the compensated consumer substituted away from that good.
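The points A, B and C can be computed for the Cobb-Douglas example in the special case α = β = 1, an assumption made here for round numbers. The original prices (1, 1), income 100 and new price $p_1 = 4$ are hypothetical. The compensating income is found by solving $V(p, I) = u$ for I, using the Cobb-Douglas indirect utility function derived earlier.

```python
def demand(p1, p2, I, alpha=1.0, beta=1.0):
    """Marshallian demand D(p, I) for u = x1^alpha * x2^beta."""
    s = alpha / (alpha + beta)
    return s * I / p1, (1 - s) * I / p2

def utility(x, alpha=1.0, beta=1.0):
    return x[0] ** alpha * x[1] ** beta

def compensating_income(p1, p2, u, alpha=1.0, beta=1.0):
    """Income needed to reach utility u at prices (p1, p2): solves V(p, I) = u for I
    using the Cobb-Douglas indirect utility function."""
    return (alpha + beta) * (u * (p1 / alpha) ** alpha * (p2 / beta) ** beta) ** (1 / (alpha + beta))

p1, p2, I = 1.0, 1.0, 100.0     # hypothetical original prices and income
p1_new = 4.0                    # hypothetical higher price of good 1

A = demand(p1, p2, I)           # bundle originally demanded
C = demand(p1_new, p2, I)       # bundle demanded after p1 rises
u_A = utility(A)                # original utility level
B = demand(p1_new, p2, compensating_income(p1_new, p2, u_A))  # compensated bundle

print(A, B, C)                                   # (50.0, 50.0) (25.0, 100.0) (12.5, 50.0)
print("substitution effect on x1:", B[0] - A[0])  # -25.0
print("income effect on x1:", C[0] - B[0])        # -12.5
```

In this special case the two effects on $x_2$ (up 50, then down 50) cancel exactly, which reflects the constant expenditure share of Cobb-Douglas preferences.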
Does the sign of this substitution effect depend on the shape of the preferences, or on n = 2? We will get definitive answers on this a bit later.

The budget lines through B and C are parallel: the movement from hypothetical B to actual consumption C corresponds to a loss in income. Therefore the move from B to C is called the income effect of the increase in $p_1$. In the graph, the $x_1$ component of this change is negative, meaning good 1 is a normal good (remember income decreased).

Had good 1 been inferior, the income effect would have been positive, and point C would have been to the right of B. If this income effect were sufficiently large, it could overwhelm the negative substitution effect, leaving final consumption C to the right of A. This would make good 1 a Giffen good. We see that a Giffen good is an extreme example of inferiority.

When $p_1$ increases, there is no reason to be sure which way $x_2$ should adjust. If iPhones get really expensive, maybe a consumer switches to a substitute, some other kind of smartphone. But if bacon gets expensive, a consumer might buy fewer eggs, because she likes to eat them together: she considers them complementary. Good j is said to be a gross substitute for good i if
$$\frac{\partial D_j}{\partial p_i} > 0$$
Good j is a gross complement to good i if
$$\frac{\partial D_j}{\partial p_i} < 0$$
This is an “old fashioned” definition, known to be unsatisfactory because it is possible (see the example in Nicholson and Snyder) to have
$$\frac{\partial D_j}{\partial p_i} > 0 \quad \text{but} \quad \frac{\partial D_i}{\partial p_j} < 0,$$
so j is a substitute for i but i is not a substitute for j.

By the way, all these derivatives of quantities with respect to prices look like natural measures of how price-sensitive demand is. But they are not unit-free!

Example: A family buys 15 pints of milk if p = 1 and 10 pints of milk if p = 2 (prices in dollars).
The price increase was just 1, and their consumption dropped by 5, so the sensitivity to price seems high. But if we measure price in cents, the price increase was 100, and consumption dropped by only 5, so the sensitivity to price sounds low.

Clearly we need a unit-free measure of how sensitive a variable y is to a variable x. For a discrete change in x, we can look at
$$\frac{\text{the percentage change in } y}{\text{the percentage change in } x}$$
that is,
$$\frac{100\,\Delta y / y}{100\,\Delta x / x} = \frac{x}{y}\frac{\Delta y}{\Delta x}$$
The analog of this for an infinitesimal change is the elasticity of y with respect to x:
$$\varepsilon_{y,x} = \frac{x}{y}\frac{\partial y}{\partial x}$$
In the milk example, measured from p = 1 and 15 pints, this gives $\frac{1}{15}\cdot\frac{-5}{1} = -\frac{1}{3}$ in dollars, and $\frac{100}{15}\cdot\frac{-5}{100} = -\frac{1}{3}$ in cents.

$$\varepsilon_{x_i,p_i} = \frac{p_i}{x_i}\frac{\partial x_i}{\partial p_i}$$
is sometimes called an “own price elasticity”, whereas
$$\varepsilon_{x_i,p_j} = \frac{p_j}{x_i}\frac{\partial x_i}{\partial p_j}$$
is a “cross price elasticity”. Except for Giffen goods, own price elasticities are negative, so talking about a large elasticity can be confusing (it usually means far below 0).

$x_i$ is:
elastic if $|\varepsilon_{x_i,p_i}| > 1$
unitary elastic if $|\varepsilon_{x_i,p_i}| = 1$
inelastic if $|\varepsilon_{x_i,p_i}| < 1$
perfectly inelastic if $|\varepsilon_{x_i,p_i}| = 0$

Consumer Welfare
Economists often use the Marshallian demand curve to get a dollar measure (called consumer surplus) of how much the consumer benefits from having access to a market at a particular price. Let’s take a microscopic view of the Marshallian curve, revealing the maximum amount the consumer would have been willing to pay for each unit.

[Figure: a step-shaped Marshallian demand curve whose successive steps, one per unit, are at heights p1, p2, ..., p7.]
Here’s an argument that’s not quite right: The demand curve tells us the consumer would pay at most p1 for the first unit, at most p2 for the next, and so on. So if the price is actually p5, for example, she pays 5p5 for the five units she buys, but would have been willing to pay p1 + p2 + p3 + p4 + p5 for them. Her surplus (from having access at p = p5) is (p1 − p5) + (p2 − p5) + (p3 − p5) + (p4 − p5), that is, the area under the demand curve above the price. This is called consumer surplus.

What’s wrong with this argument? She wouldn’t actually be willing to pay p1 + p2 + p3 + p4 + p5 to get five units. For example, her willingness to pay p5 for the fifth unit is conditional on her having paid p5 for the first four units as well! If in fact she is made to pay more for the first four units, she feels poorer, and may be unwilling to pay p5 for the last unit. Thus, Marshallian consumer surplus is an imperfect measure of maximal willingness to pay. In later lectures we will see when it is nonetheless a good approximate measure of welfare.

Cost Minimization and Hicksian Demand
Constrained utility maximization is a familiar problem: it’s just choosing what you like best, from what you can afford. We do it all the time. There’s a different constrained optimization problem that seems more artificial: attaining a target utility, say u, as cheaply as possible. As strange as it seems, it accomplishes a lot of things for us: a beautiful, general treatment of income and substitution effects; a satisfying definition of substitutes and complements; “Hicksian” demand curves that always slope down; and exact measures of consumer welfare.

$$\min_{x \in \mathbb{R}^n_+} p \cdot x \quad \text{s.t. } U(x) = u$$
$$\mathcal{L} = p \cdot x - \lambda(U(x) - u)$$
$$\frac{\partial \mathcal{L}}{\partial x_i} = 0 \implies p_i - \lambda MU_i = 0,\ i = 1, \ldots, n$$
$$\frac{1}{\lambda} = \frac{MU_1}{p_1} = \cdots = \frac{MU_n}{p_n}$$
This is the “equalize marginal utilities per dollar” rule, all over again.
The $x^*$ that solves the minimization problem is called the Hicksian demand (a function of prices and the target utility u) and is denoted h(p, u).

The minimized value is called the expenditure function and is written
$$e(p, u) = p \cdot h(p, u)$$
Let’s apply the constrained envelope theorem:
$$\frac{\partial e}{\partial p_i} = \left.\frac{\partial \mathcal{L}}{\partial p_i}\right|_{x = x^*(p, u)} = x^*_i(p, u)$$
Now what $x^*_i$ is this? It’s the Hicksian demand, so we have:

Shephard’s Lemma
$$\frac{\partial e}{\partial p_i} = h_i(p, u), \quad i = 1, \ldots, n$$

Clearly the function e is increasing in u and nondecreasing in each price. We can say more about its shape.

Proposition: e is concave as a function of p.

Proof. We need to show, for all $p', p'' \in \mathbb{R}^n_{++}$ and all $\lambda \in (0, 1)$, that
$$e(\lambda p' + (1 - \lambda) p'', u) \ge \lambda e(p', u) + (1 - \lambda) e(p'', u)$$
Letting $p''' = \lambda p' + (1 - \lambda) p''$ and letting $x'$, $x''$ and $x'''$ solve the minimization problem at prices $p'$, $p''$ and $p'''$ respectively, we need to show:
$$p''' \cdot x''' \ge \lambda p' \cdot x' + (1 - \lambda) p'' \cdot x'' \qquad (2)$$
Now
$$p''' \cdot x''' = \lambda p' \cdot x''' + (1 - \lambda) p'' \cdot x''' \ge \lambda p' \cdot x' + (1 - \lambda) p'' \cdot x''$$
The inequality holds because $x'''$ attains utility u, so it is a feasible choice at prices $p'$ and at prices $p''$; since $x'$ and $x''$ are the cheapest ways to attain u at those prices, $p' \cdot x''' \ge p' \cdot x'$ and $p'' \cdot x''' \ge p'' \cdot x''$. This establishes (2).

This helps us right away. Suppose e is twice continuously differentiable. Concavity of e in p implies, for each i, that
$$\frac{\partial^2 e}{\partial p_i^2} \le 0$$
Now
$$\frac{\partial h_i}{\partial p_i} = \frac{\partial}{\partial p_i}\left(\frac{\partial e}{\partial p_i}\right) \ \text{(from Shephard’s Lemma)} = \frac{\partial^2 e}{\partial p_i^2} \le 0 \ \text{(concavity of } e\text{)}$$
Thus, for each good i, the ith Hicksian demand curve is downward-sloping in $p_i$.

The function h is also called the compensated demand function: when $p_i$ increases, income is increased just enough to keep the consumer at the original utility. When we broke the response of Marshallian demand for good 1 (to an increase in $p_1$) into a substitution effect and an income effect, the substitution effect was the change in $x_1$ when the price increase is exactly compensated. That’s exactly what the function h tells us.
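For the special case $U(x) = x_1 x_2$ (Cobb-Douglas with α = β = 1, chosen here for illustration), cost minimization gives $p_1 x_1 = p_2 x_2$, so $h_1(p, u) = \sqrt{u p_2 / p_1}$ and $e(p, u) = 2\sqrt{u p_1 p_2}$. The Python sketch below, with hypothetical prices and target utility, checks Shephard’s Lemma by a central difference, spot-checks concavity of e at one pair of price vectors, and shows $h_1$ falling as $p_1$ rises.

```python
import math

def expenditure(p1, p2, u):
    """Expenditure function for U = x1 * x2: the cheapest bundle with x1 * x2 = u
    costs 2 * sqrt(u * p1 * p2)."""
    return 2 * math.sqrt(u * p1 * p2)

def hicksian_x1(p1, p2, u):
    """Hicksian demand for good 1: from p1 * x1 = p2 * x2 and x1 * x2 = u."""
    return math.sqrt(u * p2 / p1)

p1, p2, u = 2.0, 8.0, 25.0      # hypothetical prices and target utility

# Shephard's Lemma: de/dp1 should equal h1(p, u).
h = 1e-6
de_dp1 = (expenditure(p1 + h, p2, u) - expenditure(p1 - h, p2, u)) / (2 * h)
print(de_dp1, hicksian_x1(p1, p2, u))   # both about 10.0

# Concavity in p: e at a weighted average of two price vectors is at least
# the weighted average of e at those price vectors.
pa, pb, lam = (1.0, 9.0), (4.0, 1.0), 0.3
mix = tuple(lam * a + (1 - lam) * b for a, b in zip(pa, pb))
print(expenditure(*mix, u) >= lam * expenditure(*pa, u) + (1 - lam) * expenditure(*pb, u))  # True

# The Hicksian demand for good 1 falls as its own price rises.
print([hicksian_x1(q, p2, u) for q in (1.0, 2.0, 4.0, 8.0)])
```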
We just learned that \(h_i\) always (weakly) decreases when \(p_i\) increases, so the substitution effect is always (weakly) negative. This depends neither on the shape of the utility function nor on the number of goods.

Why does the same "equalize marginal utility per dollar" rule apply to both constrained utility maximization and constrained cost minimization? In both cases, you need to be producing utility efficiently. In the Hicksian problem, suppose you were buying strictly positive amounts of two goods, say \(x_i\) and \(x_j\), but good \(i\) had a higher marginal utility per dollar. Spend a dollar less on \(j\) and spend it on \(i\) instead. Utility goes up. Now you can cut back expenditures (on any good) until you are back down to the target utility, which is now costing you less. This contradicts the optimality of the original bundle.

[Figure: an indifference curve \(U(x) = u\) tangent to the budget line \(p_1 x_1 + p_2 x_2 = I\) at \(x^*\), with \(x_1\) and \(x_2\) on the axes.]

Someone could look at this graph and say: "This shows how \(x^*\) achieves the highest possible utility, given income \(I\)." Someone else might say: "This shows how \(x^*\) minimizes expenditure, subject to the utility target \(u\)." They would both be right: \(x^*\) solves both problems. In a sense they are "paired problems":
\[ u = V(p, I) \quad \text{and} \quad I = e(p, u) \]

This reminds me of a pair of "identities" that I call the "woodchuck identities", for reasons I will reveal in class.
\[ V(p, e(p, u)) = u \qquad e(p, V(p, I)) = I \]
These are not always true, but they hold under fairly weak conditions on the underlying utility function \(U\). Holding prices fixed, \(V\) maps money into utility, whereas \(e\) maps utility into money. You might wonder, then, whether the two functions are inverses. The woodchuck identities say that, indeed, they are.

Something amazing is going to happen.
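Let's build the Hicksian income compensation directly into the Marshallian demand function, so that when any price changes, income gets adjusted automatically to keep utility constant:
\[ D(p, e(p, u)) = h(p, u) \]
Because this is true at all prices, we can partially differentiate the \(i\)th component of both sides with respect to \(p_i\):
\[ \frac{\partial D_i}{\partial p_i} + \frac{\partial D_i}{\partial I}\frac{\partial e}{\partial p_i} = \frac{\partial h_i}{\partial p_i} \]
\[ \frac{\partial D_i}{\partial p_i} = \frac{\partial h_i}{\partial p_i} - \underbrace{\frac{\partial e}{\partial p_i}}_{=\, h_i \,=\, x_i \ \text{(Shephard's Lemma)}} \frac{\partial D_i}{\partial I} \]

Slutsky Equation
\[ \frac{\partial D_i}{\partial p_i} = \frac{\partial h_i}{\partial p_i} - x_i \frac{\partial D_i}{\partial I} \]

The Slutsky equation breaks the Marshallian own-price derivative into two parts. The first is the substitution effect \(\frac{\partial h_i}{\partial p_i}\), which we've shown to be \(\le 0\). The second is the income effect \(-x_i \frac{\partial D_i}{\partial I}\). If good \(i\) is normal, \(\frac{\partial D_i}{\partial I} \ge 0\) and the income effect is \(\le 0\). If instead good \(i\) is inferior, the two effects go in opposite directions.

It is instructive to derive the elasticity form of the Slutsky equation. Multiplying both sides by \(p_i / x_i\):
\[ \underbrace{\frac{p_i}{x_i}\frac{\partial D_i}{\partial p_i}}_{\varepsilon_{D_i, p_i}} = \underbrace{\frac{p_i}{x_i}\frac{\partial h_i}{\partial p_i}}_{\varepsilon_{h_i, p_i}} - \frac{p_i x_i}{I}\underbrace{\frac{I}{x_i}\frac{\partial D_i}{\partial I}}_{\varepsilon_{D_i, I}} \]
\[ \varepsilon_{D_i, p_i} = \varepsilon_{h_i, p_i} - \frac{p_i x_i}{I}\,\varepsilon_{D_i, I} \]
This shows that the Marshallian and Hicksian price elasticities differ by an income elasticity weighted by the income share of the good in question. Since the income share of most goods is tiny, often the two price elasticities are almost the same. This makes it unlikely that the income effect will overwhelm the substitution effect, even if the good is inferior.

How about the cross-price Slutsky equation?
\[ D_i(p, e(p, u)) = h_i(p, u) \]
\[ \frac{\partial D_i}{\partial p_j} + \frac{\partial D_i}{\partial I}\frac{\partial e}{\partial p_j} = \frac{\partial h_i}{\partial p_j} \]
\[ \frac{\partial D_i}{\partial p_j} = \frac{\partial h_i}{\partial p_j} - x_j \frac{\partial D_i}{\partial I} \]
Converting to elasticity form (multiplying by \(p_j / x_i\)):
\[ \underbrace{\frac{p_j}{x_i}\frac{\partial D_i}{\partial p_j}}_{\varepsilon_{D_i, p_j}} = \underbrace{\frac{p_j}{x_i}\frac{\partial h_i}{\partial p_j}}_{\varepsilon_{h_i, p_j}} - \frac{p_j x_j}{I}\underbrace{\frac{I}{x_i}\frac{\partial D_i}{\partial I}}_{\varepsilon_{D_i, I}} \]
\[ \varepsilon_{D_i, p_j} = \varepsilon_{h_i, p_j} - \frac{p_j x_j}{I}\,\varepsilon_{D_i, I} \]

Here's a better proof that the \(i\)th Hicksian demand is weakly downward-sloping.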
It doesn't require any smoothness of \(e\), or even that the Hicksian problem has a unique solution!

Proposition. If \(p_i\) increases (other prices constant), then \(h_i\) does not increase.

Proof. Consider any two price vectors \(p', p'' \in \mathbb{R}^n_{++}\) and let \(x', x''\) solve the respective expenditure minimization problems (for some \(u\)). By definition
\[ p' \cdot x' \le p' \cdot x'' \qquad (3) \]
\[ p'' \cdot x' \ge p'' \cdot x'' \qquad (4) \]
\[ -p'' \cdot x' \le -p'' \cdot x'' \qquad (5) \]
Add (3) and (5):
\[ (p' - p'') \cdot x' \le (p' - p'') \cdot x'' \qquad (6) \]
\[ (p' - p'') \cdot (x' - x'') \le 0 \qquad (7) \]
Now consider price vectors \(p', p''\) that differ only in the \(i\)th component:
\[ (p'_i - p''_i)(x'_i - x''_i) \le 0 \]
That is, \(x_i\) moves (weakly) in the opposite direction from \(p_i\).

How do the Marshallian and Hicksian functions \(D_i\) and \(h_i\) through the point \(x^*\) compare? If good \(i\) is normal, the income and substitution effects are both \(\le 0\), so they reinforce one another, and
\[ \frac{\partial h_i}{\partial p_i} \ge \frac{\partial D_i}{\partial p_i} \]
So \(D_i\) is more price-elastic than \(h_i\). But it doesn't "look" that way when we graph them both, because the independent variable, \(p_i\), is on the vertical axis!

[Figure: normal good \(i\). The Hicksian demand \(h_i(p_i, u_0)\) and the Marshallian demand \(D_i(p_i, I)\) pass through \(x^*\), with \(p_i\) on the vertical axis and \(x_i\) on the horizontal axis.]

Here, the other prices, which are arguments of both \(D_i\) and \(h_i\) and are held fixed, have been suppressed.

Compensating and Equivalent Variation

The expenditure function gives a clear way of deriving dollar measures of a consumer's loss in welfare when one or more prices rise. Suppose original prices are \(p^0 \in \mathbb{R}^n_{++}\), and new prices are \(p^1 \in \mathbb{R}^n_{++}\), where one or more goods have become more expensive.
The consumer's original utility was \(u_0\) and her income is \(I\), so
\[ e(p^0, u_0) = I \]
If we wanted her to have enough money to attain the original utility \(u_0\) at the new prices, that would be (by definition) \(e(p^1, u_0)\). Therefore the extra money she needs, in addition to her original income \(I\), is
\[ CV = e(p^1, u_0) - I \]
This is called the compensating variation associated with the increase in prices. Let \(u_1\) be the new, lower utility she gets at \(p^1\) if there is no compensation. Then
\[ e(p^1, u_1) = I = e(p^0, u_0) \]
Therefore \(CV\) is sometimes written as
\[ e(p^1, u_0) - e(p^1, u_1) \quad \text{or} \quad e(p^1, u_0) - e(p^0, u_0) \]

In the case where only one price \(p_i\) has changed, there is a striking graphical interpretation of \(CV\). First some notation: for \(x \in \mathbb{R}^n\) and \(y \in \mathbb{R}\), define
\[ (y, x_{-i}) = (x_1, \dots, x_{i-1}, y, x_{i+1}, \dots, x_n) \]
Second, recall from the Fundamental Theorem of Calculus (FTC) that if \(f\) is the first derivative of \(F\), then
\[ \int_a^b f(x)\,dx = F(b) - F(a) \]
Think of the expenditure function as \(F\) and its derivative \(h_i\) as \(f\) (Shephard's Lemma again). Hold utility at \(u_0\) and all prices fixed except \(p_i\). Remember that \(p_i\) is the independent variable, so the integral is the area to the left of the curve.

The area to the left of the Hicksian demand anchored at \(u_0\), between \(p_i^0\) and \(p_i^1\), is
\[ \begin{aligned}
\int_{p_i^0}^{p_i^1} h_i(p_i, p^0_{-i}, u_0)\,dp_i &= \int_{p_i^0}^{p_i^1} \frac{\partial e}{\partial p_i}(p_i, p^0_{-i}, u_0)\,dp_i && \text{(by Shephard's Lemma)} \\
&= \Big[\, e(p_i, p^0_{-i}, u_0) \,\Big]_{p_i^0}^{p_i^1} && \text{(FTC)} \\
&= e(p^1, u_0) - e(p^0, u_0) \\
&= CV
\end{aligned} \]

[Figure: Hicksian and Marshallian demand curves for good \(i\), with \(p_i\) on the vertical axis and \(x_i\) on the horizontal axis; prices \(p_i^0\) and \(p_i^1\) are marked.] The shaded area is the compensating variation associated with an increase in \(p_i\).

If good \(i\) is normal, the Marshallian demand will be more price-elastic than the Hicksian.
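The Slutsky equation can be checked numerically, along with the claim that a normal good's Marshallian demand is more price-elastic than its Hicksian demand. The sketch below assumes a hypothetical two-good Cobb-Douglas utility \(U = x_1^a x_2^{1-a}\), for which the Marshallian demand is \(D_1 = aI/p_1\). Derivatives are approximated by central finite differences. All function names and numbers are illustrative.

```python
def marshallian_1(p1, p2, income, a):
    # Hypothetical Cobb-Douglas consumer: spends the share a of income on good 1.
    return a * income / p1


def indirect_utility(p1, p2, income, a):
    return (a * income / p1) ** a * ((1 - a) * income / p2) ** (1 - a)


def hicksian_1(p1, p2, u, a):
    return u * (a * p2 / ((1 - a) * p1)) ** (1 - a)


def deriv(f, x, step=1e-6):
    # Central finite-difference approximation to f'(x).
    return (f(x + step) - f(x - step)) / (2 * step)


a, p1, p2, income = 0.25, 2.0, 3.0, 120.0
u = indirect_utility(p1, p2, income, a)  # utility actually attained, so h1 = D1 here
x1 = marshallian_1(p1, p2, income, a)

dD_dp1 = deriv(lambda price: marshallian_1(price, p2, income, a), p1)
substitution = deriv(lambda price: hicksian_1(price, p2, u, a), p1)
dD_dI = deriv(lambda m: marshallian_1(p1, p2, m, a), income)
income_effect = -x1 * dD_dI

# Slutsky: dD1/dp1 = dh1/dp1 - x1 * dD1/dI
print(dD_dp1, substitution + income_effect)  # both approximately -7.5

# Elasticity form: eps_D = eps_h - (budget share) * (income elasticity)
eps_D = dD_dp1 * p1 / x1
eps_h = substitution * p1 / x1
eps_I = dD_dI * income / x1
share = p1 * x1 / income
print(eps_D, eps_h, eps_h - share * eps_I)  # approximately -1.0, -0.75, -1.0
```

For Cobb-Douglas the income elasticity is 1 and the budget share is \(a\). The Marshallian own-price elasticity is therefore \(-1\), while the Hicksian one is only \(-(1-a)\).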
The loss in \(CS\) from the price increase is the area to the left of the Marshallian curve, between \(p_i^0\) and \(p_i^1\) (less than \(CV\)).

Economists like to ask another dollar-valued question about the welfare effect of a price increase from \(p^0\) to \(p^1\). What loss in income would be as bad, at prices \(p^0\), as facing prices \(p^1\) at original income \(I\)? Well, having \(e(p^0, u_1)\) in total would be as bad, since it leaves you with the same utility \(u_1\). Starting at \(I\), to get to \(e(p^0, u_1)\) you would need to lose
\[ EV = I - e(p^0, u_1) \]
the equivalent variation associated with the price increase. Because \(I = e(p^0, u_0) = e(p^1, u_1)\), \(EV\) can also be written
\[ e(p^1, u_1) - e(p^0, u_1) \]

Repeat the argument used to evaluate \(CV\) graphically, which was based on the area to the left of the Hicksian demand anchored at \(u_0\). It shows that \(EV\) is the area to the left of the Hicksian demand anchored at \(u_1\). If \(i\) is a normal good, Marshallian demand is more price-elastic than Hicksian.

[Figure: normal good \(i\), with \(p_i\) on the vertical axis and \(x_i\) on the horizontal axis. It shows the Hicksian demands \(h_i(p_i, p_{-i}, u_0)\) and \(h_i(p_i, p_{-i}, u_1)\), the Marshallian demand \(D_i(p_i, p_{-i}, I)\), and the bundles \(x\) (at \(p_i^0\)) and \(y\) (at \(p_i^1\)).]

Call the old and new Marshallian bundles \(x\) and \(y\). Point \(y\) gives utility \(u_1\), so the Hicksian demand curve through \(y\) is anchored at \(u_1\). The shaded area is \(EV\), and in this normal case \(EV \le\) loss of \(CS \le CV\).

Recall that the gross substitute definition, based on Marshallian demand cross partials, is unsatisfactory because of a lack of symmetry: sometimes good \(i\) is a gross substitute for good \(j\), but \(j\) is not a gross substitute for \(i\). We say goods \(i\) and \(j\) are net substitutes if
\[ \frac{\partial h_i}{\partial p_j} > 0 \]
Note that, using Shephard's Lemma and Young's Theorem,
\[ \frac{\partial h_i}{\partial p_j} = \frac{\partial}{\partial p_j}\frac{\partial e}{\partial p_i} = \frac{\partial}{\partial p_i}\frac{\partial e}{\partial p_j} = \frac{\partial h_j}{\partial p_i} \]
So the desired symmetry holds.
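The three dollar measures of the welfare loss from a price increase can be compared in a concrete case. The sketch assumes a hypothetical Cobb-Douglas consumer with \(a = 0.5\), income 100 and \(p_2 = 1\), facing a rise in \(p_1\) from 1 to 2. It does three things:
- computes \(CV\) and \(EV\) from the expenditure function;
- checks each against the area to the left of the corresponding Hicksian demand;
- computes the Marshallian \(CS\) loss by integrating \(D_1 = aI/p_1\).

The midpoint-rule integrator and all names are illustrative choices.

```python
import math


def expenditure(p1, p2, u, a):
    # e(p, u) for the hypothetical utility U = x1**a * x2**(1 - a)
    return u * (p1 / a) ** a * (p2 / (1 - a)) ** (1 - a)


def indirect_utility(p1, p2, income, a):
    return income * (a / p1) ** a * ((1 - a) / p2) ** (1 - a)


def hicksian_1(p1, p2, u, a):
    # Shephard's Lemma for Cobb-Douglas: h1 = de/dp1 = a * e / p1
    return a * expenditure(p1, p2, u, a) / p1


def integrate(f, lo, hi, n=100_000):
    # Midpoint rule
    width = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * width) for k in range(n)) * width


a, income, p2 = 0.5, 100.0, 1.0
p1_old, p1_new = 1.0, 2.0

u0 = indirect_utility(p1_old, p2, income, a)
u1 = indirect_utility(p1_new, p2, income, a)

cv = expenditure(p1_new, p2, u0, a) - income  # CV = e(p^1, u0) - I
ev = income - expenditure(p1_old, p2, u1, a)  # EV = I - e(p^0, u1)

# Areas to the left of the demand curves, between the old and new price of good 1
cv_area = integrate(lambda p: hicksian_1(p, p2, u0, a), p1_old, p1_new)
ev_area = integrate(lambda p: hicksian_1(p, p2, u1, a), p1_old, p1_new)
cs_loss = integrate(lambda p: a * income / p, p1_old, p1_new)  # Marshallian D1 = a*I/p1
cs_exact = a * income * math.log(p1_new / p1_old)  # closed form of the same area

print(round(cv, 2), round(ev, 2), round(cs_loss, 2))  # 41.42 29.29 34.66
print(abs(cv - cv_area) < 1e-6, abs(ev - ev_area) < 1e-6)  # True True
```

Good 1 is normal here, so the ordering \(EV \le\) loss of \(CS \le CV\) from the graph shows up as roughly \(29.29 \le 34.66 \le 41.42\).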
We say that goods \(i\) and \(j\) are net complements if
\[ \frac{\partial h_i}{\partial p_j} < 0 \]

Producer Theory

Whereas each consumer has her own preferences, we will simplify by assuming each firm seeks to maximize profits. Instead of introducing a preference ordering, then, we specify the firm's technology by describing its production function
\[ f : \mathbb{R}^n_+ \to \mathbb{R} \]
The interpretation is that the input vector \(x = (x_1, \dots, x_n)\) can be turned into \(f(x)\) units of output. Note: by assuming the range of \(f\) is \(\mathbb{R}\), not \(\mathbb{R}^k\) for some integer \(k > 1\), we are focusing on single-product firms. (But think of a sheep.)

For simplicity \(f\) is usually assumed differentiable. For any \(i\),
\[ \frac{\partial f}{\partial x_i} = MP_i \]
is called the \(i\)th marginal product, typically strictly positive. The law of diminishing returns asserts that if all other inputs are held fixed as the \(i\)th one is increased, eventually \(MP_i\) will decline and approach zero. The reasoning is that you can't do much with vast quantities of water, for example, if space, other raw materials and labor are fixed.

But returns to scale are an entirely different matter. Production function \(f\) is said to have
- constant returns to scale if \(f(tx) = t f(x)\) for all \(x\) and all \(t > 1\);
- increasing returns to scale if \(f(tx) > t f(x)\) for all \(x\) and all \(t > 1\);
- decreasing returns to scale if \(f(tx) < t f(x)\) for all \(x\) and all \(t > 1\).

An isoquant is the set of all input vectors that produce the same output level. It is analogous to an indifference curve. In two dimensions its equation is \(f(x) = \bar{y}\), and total differentiation gives the slope \(-\frac{MP_1}{MP_2}\).

One can denote output price by \(p \in \mathbb{R}_{++}\) and input prices by \(w \in \mathbb{R}^n_{++}\). In two dimensions one often sees costs \(= wL + rK\), where \(L\) and \(K\) are labor and capital, \(w\) is the wage rate and \(r\) is the cost of capital.
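The returns-to-scale definitions compare \(f(tx)\) with \(t f(x)\). The sketch below assumes a hypothetical Cobb-Douglas technology \(f(K, L) = K^\alpha L^\beta\), for which \(f(tx) = t^{\alpha + \beta} f(x)\), and makes that comparison at a few sample bundles and scale factors. A finite sample can only refute a property that must hold for all \(x\) and all \(t > 1\); it can never prove one. Treat the output as an illustration.

```python
import itertools


def cobb_douglas(alpha, beta):
    # Hypothetical technology f(K, L) = K**alpha * L**beta
    return lambda k, l: k ** alpha * l ** beta


def returns_to_scale(f, samples, scales, rel_tol=1e-9):
    """Compare f(t*x) with t*f(x) on sample bundles x and scale factors t > 1.

    A finite sample can refute a returns-to-scale property but cannot prove it.
    """
    signs = set()
    for (k, l), t in itertools.product(samples, scales):
        gap = f(t * k, t * l) - t * f(k, l)
        if abs(gap) <= rel_tol * t * f(k, l):
            signs.add(0)
        else:
            signs.add(1 if gap > 0 else -1)
    labels = {
        frozenset({0}): "constant",
        frozenset({1}): "increasing",
        frozenset({-1}): "decreasing",
    }
    return labels.get(frozenset(signs), "mixed")


samples = [(1.0, 1.0), (2.0, 5.0), (10.0, 0.5)]
scales = [1.5, 2.0, 10.0]
for alpha, beta in [(0.5, 0.5), (0.6, 0.6), (0.3, 0.4)]:
    print(alpha, beta, returns_to_scale(cobb_douglas(alpha, beta), samples, scales))
# 0.5 0.5 constant
# 0.6 0.6 increasing
# 0.3 0.4 decreasing
```

The exponent sum \(\alpha + \beta\) being equal to, above, or below 1 decides the classification for Cobb-Douglas. Diminishing marginal products, by contrast, only need each exponent to be below 1.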
An isocost line (or curve) is the set of input bundles that cost a particular amount, for example
\[ w_1 x_1 + w_2 x_2 = c \]

The Producer's Problem

Suppose you produce \(q\) units. You are not maximizing profits unless you are minimizing the cost of producing \(q\). Doing this is exactly analogous to minimizing the cost of producing \(u\) utils in consumer theory! So everything has to look the same: the Lagrangean, the FOCs, and the graphical interpretation (tangency between isocost and isoquant).
\[ \min_{x \in \mathbb{R}^n_+} w \cdot x \quad \text{s.t.} \quad f(x) = q \]
\[ \mathcal{L} = w \cdot x - \lambda\,(f(x) - q) \]
FOCs:
\[ \frac{1}{\lambda} = \frac{MP_1}{w_1} = \dots = \frac{MP_n}{w_n} \]
The solution \(x(w, q)\) is called the conditional input demand function. The associated minimized value function is
\[ C(w, q) = w \cdot x(w, q) \]

Naturally, the envelope theorem here again gives

Shephard's Lemma
\[ \frac{\partial C(w, q)}{\partial w_i} = x_i(w, q) \]

All of the math is identical to the cost minimization problem for the consumer. So this cost function is also concave in \(w\), and the two proofs that conditional input demand \(i\) is weakly downward-sloping in \(w_i\) are identical to those in consumer theory. You should know them.

A firm often makes some commitments regarding inputs that cannot be adjusted quickly. For example, it may rent capital for a month, or hire a manager on a one-year contract. So if input or output prices change, the firm may be stuck in the short run with a suboptimal vector of inputs.

Long Run

In the long run, the firm is assumed to be able to choose all inputs optimally. Write \(LC(q)\) for the long run cost of producing \(q\) units of output. Since it takes no input to produce no output,
\[ LC(0) = 0 \]
But there might be setup costs that cause a discontinuity in \(LC\) at the origin. For example, before a restaurant can serve any food at all, it needs to meet health codes and pass inspections.
So it might need to spend $20,000 before it is allowed to produce any \(q > 0\).
\[ LMC(q) = \frac{d\,LC(q)}{dq} \quad \text{is the long run marginal cost.} \]
\[ LAC(q) = \frac{LC(q)}{q} \quad \text{is the long run average cost.} \]

For any \(q > 0\),
\[ \frac{d\,LAC}{dq} = \frac{d}{dq}\left(\frac{LC}{q}\right) = \frac{q\,LMC - LC}{q^2} = \frac{LMC - LAC}{q} \]
This says the average is always moving toward the marginal. If a person taller than the average person in a group joins the group, she increases the average. But not by much, if it's a group with lots of members.

Case 1: No setup costs. Then \(LC(q) \to 0\) as \(q \to 0\). If \(LC\) is also differentiable, then
\[ \lim_{q \to 0} LAC = \lim_{q \to 0} \frac{LC}{q} = \lim_{q \to 0} LMC \qquad \text{(by L'Hôpital's Rule)} \]
Therefore \(LAC\) and \(LMC\) have the same vertical intercept. Lots of shapes are possible. A favorite among economists is "U-shaped cost curves", as shown in the graph.

[Figure: U-shaped \(LAC\) and \(LMC\) curves, with \(q\) on the horizontal axis.]

Here, \(LAC\) declines while \(LMC\) is below it, until it is cut by \(LMC\) as the latter rises. Thus the minimum of \(LAC\) occurs where \(LMC = LAC\). (This is true even if there are setup costs.)

L'Hôpital's Rule can further be pressed into service to show (in the absence of setup costs) that
\[ \lim_{q \to 0} \frac{d\,LMC}{dq} = 2 \lim_{q \to 0} \frac{d\,LAC}{dq} \]
This is a basic property of marginals and averages, having nothing to do with the fact that these happen to be cost curves. We will see the same relationship when we get to the firm's marginal revenue and demand curves.

It is easy to explain why \(LAC\) might initially decline. Adam Smith pointed out in 1776 in The Wealth of Nations that with many workers, specialization is possible, and profitable. This is an example of an "economy of scale". But what explains an eventual rise in \(LAC\)? The best story is that "span of control" issues become a problem as a firm gets really large.
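The marginal and average relationships above can be checked on a concrete cost function. The sketch assumes a hypothetical cubic \(LC(q) = q^3 - 6q^2 + 15q\). It has no setup cost and gives U-shaped curves \(LAC = q^2 - 6q + 15\) and \(LMC = 3q^2 - 12q + 15\). The grid and step sizes are illustrative.

```python
def lc(q):
    # Hypothetical long run cost with no setup cost: LC(0) = 0.
    return q ** 3 - 6 * q ** 2 + 15 * q


def lmc(q):
    return 3 * q ** 2 - 12 * q + 15  # d LC / dq


def lac(q):
    return lc(q) / q  # defined for q > 0


def slope(f, q, step=1e-6):
    # Central finite-difference approximation to f'(q).
    return (f(q + step) - f(q - step)) / (2 * step)


# 1. d LAC/dq = (LMC - LAC)/q at an arbitrary q > 0
q = 1.7
print(slope(lac, q), (lmc(q) - lac(q)) / q)  # both approximately -2.6

# 2. Common vertical intercept: LAC(q) approaches LMC(0) as q -> 0
print(lac(1e-8), lmc(0.0))  # both close to 15

# 3. LAC is minimized where LMC = LAC (grid search over q in (0, 10])
grid = [k / 1000 for k in range(1, 10_001)]
q_min = min(grid, key=lac)
print(q_min, lac(q_min), lmc(q_min))  # 3.0 6.0 6.0

# 4. Near q = 0, the slope of LMC is twice the slope of LAC
print(slope(lmc, 1e-4) / slope(lac, 1e-4))  # close to 2
```

Here both curves start at 15, and \(LAC\) bottoms out at \(q = 3\), where \(LAC = LMC = 6\). Near the origin \(LMC\) falls at rate 12 while \(LAC\) falls at rate 6.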
Case 2: Strictly positive setup costs. Now
\[ \lim_{q \to 0} LC > 0, \quad \text{therefore} \quad \lim_{q \to 0} LAC = \infty \]
The $20,000 in our restaurant example is divided by a vanishing \(q\), so \(LC/q\) explodes. Here a different graph becomes economists' favorite:

[Figure: \(LAC\) and \(LMC\) in the case of setup costs, with \(q\) on the horizontal axis.]

Sometimes it is useful to see how the \(LC\) curve is related to the marginal and average curves.

[Figure: upper panel shows \(LC\) against \(q\), with its point of inflection marked; lower panel shows \(LMC\) and \(LAC\) against \(q\).]

While \(LC\) is concave, \(LMC\) is falling, because \(\frac{d^2 LC}{dq^2} < 0\). After the point of inflection, \(LMC\) is rising. \(LAC\) is the rise over the run of the line from the origin to the \(LC\) curve. So \(LAC\) is minimized where the line from the origin is tangent to \(LC\).

Short Run

Let \(SFC\) (short run fixed costs) be costs that cannot be avoided in the short run, even by shutting down. Define short run variable costs by
\[ SVC = SC - SFC \]
Dividing both sides by \(q\) gives average costs:
\[ SAVC = SAC - SAFC \]
\[ SMC = \frac{d\,SC}{dq} \]
Because \(\frac{d\,SFC}{dq} = 0\), \(SMC\) is the marginal curve of both \(SC\) and \(SVC\). It therefore cuts the average curves \(SAC\) and \(SAVC\) at their respective minima. The graph shows a U-shaped \(SMC\) curve in a case without setup costs.

[Figure: upper panel shows \(SC\) and \(SVC\) against \(q\); lower panel shows \(SMC\), \(SAC\), \(SAVC\) and \(SAFC\) against \(q\).]

LC and SC Together

Suppose that, fixing input prices in the background, \(\bar{K}\) is the ideal amount of capital for producing some particular output level \(\bar{q}\). Then if \(K\) is fixed at \(\bar{K}\) in the short run (all other inputs variable),
\[ LC(\bar{q}) = SC(\bar{q}) \]
and probably \(LC(q) < SC(q)\) for \(q \neq \bar{q}\) (the weak inequality \(LC(q) \le SC(q)\) surely holds). The average curves \(LAC\) and \(SAC\) are related in exactly the same way. So \(LC\) and \(SC\) are tangent at \(\bar{q}\), and so are \(LAC\) and \(SAC\). Of course, tangency of \(LC\) and \(SC\) at \(\bar{q}\) means that \(LMC(\bar{q}) = SMC(\bar{q})\).
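Tangency of the long and short run cost curves can be seen in closed form. The sketch assumes a hypothetical technology \(f(K, L) = \sqrt{KL}\) with wage \(w = 4\) and rental rate \(r = 1\). Then \(LC(q) = 2q\sqrt{wr}\), so \(LAC = LMC\) is constant. With capital fixed at \(\bar{K}\), \(SC(q) = wq^2/\bar{K} + r\bar{K}\). The names `W`, `R` and `k_bar` and all numbers are illustrative.

```python
import math

W, R = 4.0, 1.0  # hypothetical wage and rental rate of capital


def long_run_cost(q):
    # For f(K, L) = sqrt(K * L): K = q*sqrt(W/R), L = q*sqrt(R/W), so LC = 2q*sqrt(W*R).
    return 2 * q * math.sqrt(W * R)


def short_run_cost(q, k_bar):
    # With K fixed at k_bar, producing q needs L = q**2 / k_bar; R * k_bar is the SFC.
    return W * q ** 2 / k_bar + R * k_bar


def short_run_mc(q, k_bar):
    return 2 * W * q / k_bar


q_bar = 10.0
k_bar = q_bar * math.sqrt(W / R)  # capital that is ideal for q_bar (20 here)

print(long_run_cost(q_bar), short_run_cost(q_bar, k_bar))  # 40.0 40.0
print(2 * math.sqrt(W * R), short_run_mc(q_bar, k_bar))  # LMC and SMC at q_bar: 4.0 4.0

# Away from q_bar, SC lies above LC; here SC - LC = W * (q - q_bar)**2 / k_bar.
gaps = [short_run_cost(q, k_bar) - long_run_cost(q) for q in (2.0, 5.0, 10.0, 15.0, 30.0)]
print(gaps)
```

This is the constant-\(LAC\) case discussed next. \(SC\) touches the straight line \(LC\) only at \(\bar{q}\), and \(SAC = wq/\bar{K} + r\bar{K}/q\) is minimized exactly at \(\bar{q}\).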
[Figure: upper panel shows \(LC\) and \(SC\), with \(SFC\) marked; lower panel shows \(LMC = LAC\), \(SAC\) and \(SMC\), with \(\bar{q}\) marked.]

Probably the simplest setting in which to see the relationship between long and short run curves is the case of constant \(LAC\) and \(LMC\). In the top graph, draw \(LC\) and ask where \(SC\) must lie. In the lower one, draw \(LMC = LAC\) and see where \(SAC\) and then \(SMC\) must be.

What about the more ambitious case where long run average costs are rising? Let's say that \(\bar{q}\) happens to be higher than the output \(q^m\) that minimizes \(SAC\). Draw \(LC\), then \(SC\), then \(LAC\) and \(LMC\), then \(SAC\) and finally \(SMC\).

[Figure: upper panel shows \(LC\) and \(SC\); lower panel shows \(LMC\), \(LAC\), \(SAC\) and \(SMC\), with \(q^m\) and \(\bar{q}\) marked.]

Suppose instead \(LAC\) and \(LMC\) are our favorite U-shaped curves. For each different fixed capital level (ideal for a different output level), the corresponding \(LAC\) and \(SAC\) must coincide at the respective \(q\) value.

[Figure: U-shaped \(LAC\) with short run curves including \(SAC^*\), \(SAC''\), \(SMC^*\) and \(SMC''\), and outputs \(q'\), \(q^*\) and \(q''\) marked.]

With capital fixed at \(K^*\), ideal for \(q^*\), \(LAC\) and \(SAC\) are minimized at \(q^*\), and \(SMC\) cuts them there as it rises. At \(q'\) (with capital fixed at the corresponding \(K'\)), \(SAC\) is not minimized, because it is tangent to the downward-sloping \(LAC\) curve there.

Firm Supply

Graphical analysis of a firm's supply curve depends heavily upon the properties of marginal and average cost curves. Fix input prices and regard supply as a function of the market price \(p \in \mathbb{R}_+\), which the firm cannot change. The firm wishes to maximize
\[ \text{profits} = \text{revenues} - \text{costs} \]
In the long run, it solves
\[ \max_{q \in \mathbb{R}_+} \; pq - LC(q) \qquad \text{FOC: } p = LMC \]
This condition must hold at any interior solution. But it might be better to produce 0, namely if \(p < \min LAC\). In that case (multiplying by \(q\)) revenue \(< LC\) at every \(q > 0\).

In the simplest case, there are no setup costs, and \(LMC\) and \(LAC\) are increasing.
[Figure: increasing \(LMC\) and \(LAC\) with the critical price \(p^c\) marked; a second panel shows the long run supply curve.]

Below \(p^c\), it is impossible to cover costs, so \(q = 0\). Above \(p^c\), the firm produces where \(p = LMC\). Thus, the long run supply curve "is" the \(LMC\) curve, in this case. Notice that for the supply curve, \(p\) is the independent variable, whereas for marginal cost, \(q\) is the independent variable.

Notice that integrating under the \(LMC\) curve from 0 to \(q\) gives us \(LC(q)\). Thus, profits are revenues minus costs, that is, \(pq\) minus the shaded area.

[Figure: the area under \(LMC\) from 0 to \(q\), shaded.]

Thus the graph shows us the firm's profits at any price.

Next, suppose that \(LMC\) and \(LAC\) are U-shaped. The supply curve no longer coincides with \(LMC\).

[Figure: U-shaped \(LMC\) and \(LAC\) with the supply curve, the critical price \(p^c\), and outputs \(q'\), \(q^c\) and \(q''\) marked.]

Now, supply will be 0 until price exceeds the critical value for entry, \(p^c\). For prices below \(p^c\), the firm cannot cover its costs and prefers not to produce. For prices slightly above \(p^c\), \(p = LMC\) is satisfied at two quantities. At the lesser of these two, \(q'\), we have \(p < LAC\), so losses would be incurred. But at \(q''\), \(p > LAC\) and the firm earns profits. So long run supply here will be given by the portion of \(LMC\) to the right of \(q^c\) (and by the points below \(p^c\) on the vertical axis).

[Figure: the supply curve alone, with \(p^c\), \(q^c\) and some \(q' > q^c\) marked.]

If someone shows you the supply curve, you can't deduce what \(LMC\) was for outputs below \(q^c\). So it seems to be impossible to integrate under \(LMC\) to find \(LC(q)\) just from the supply curve. BUT the firm is indifferent, at \(p^c\), between \(q = 0\) and \(q = q^c\). So you can infer that the revenue from \(q^c\) just covers the cost of producing those units, that is, \(LC(q^c) = p^c q^c\). Therefore, for any \(q' > q^c\),
\[ LC(q') = p^c q^c + \text{area under the supply curve from } q^c \text{ to } q' \]

What if there are setup costs?
Then it is no longer true that integrating under \(LMC\) gives \(LC(q)\): this misses the setup cost. But it is still true that for the critical entry price \(p^c\) and corresponding \(q^c\),
\[ LC(q^c) = p^c q^c \]
(inferred from the firm's indifference between \(q = 0\) and \(q^c\)).

[Figure: \(LMC\), \(LAC\) and the supply curve with setup costs, marking \(p^c\), \(q^c\), and \(LC(q')\) at some \(q' > q^c\).]

In the short run, what is the firm's objective? \(SFC\) are unavoidable in the short run, so the objective is
\[ \max_{q \in \mathbb{R}_+} \; pq - SVC(q) \]
Thus the FOC for an interior maximizer is \(p = SMC\), but if \(p < \min SAVC\), the firm prefers to set \(q = 0\). Notice that the firm may choose to produce even if it is making losses in the short run. For example, suppose revenues exceed variable costs by $3,000 but cannot cover the \(SFC\) of $5,000. Then it is better to produce and take a net loss of $2,000 than to choose \(q = 0\) and lose $5,000.

[Figure: \(SMC\), \(SAC\) and \(SAVC\), with price \(p'\) and output \(q'\) marked.]

Consider an example with no setup costs but positive fixed costs. At price \(p'\), the FOC suggests setting \(q = q'\). Is this better than \(q = 0\)? The firm makes losses at \(q'\), because \(p' < SAC\). But \(p' > SAVC\), so \(q'\) is better than 0. Let \(p^c = \min SAVC\) and let \(q^c\) be the output where this minimum is attained. In these graphs, one can calculate \(SVC(q)\) in two ways: as \(p^c q^c\) plus the integral under supply from \(q^c\) to \(q\), or as \(q \cdot SAVC(q)\).

Definition. The producer surplus, given price \(p\), is the dollar benefit to the producer from being able to produce and sell in the market. More precisely, it is profits at the optimal \(q\) minus profits at \(q = 0\).

In the long run, producer surplus coincides with profits, because \(LC(0) = 0\), so profits at \(q = 0\) are 0. But in the short run, does the firm try to maximize profits, or producer surplus? It amounts to the same thing, because they differ by a constant, \(SFC\).
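The supply rules can be illustrated with a hypothetical cubic cost \(q^3 - 6q^2 + 15q\). Its minimum average cost is 6, reached at \(q = 3\), so \(p^c = 6\) and \(q^c = 3\). The sketch first recovers \(LC(q')\) as \(p^c q^c\) plus the area under the supply curve, as in the text. It then reuses the same function as \(SVC\), adds a hypothetical \(SFC\) of 20, and shows a firm producing at a loss in the short run. All names and numbers are illustrative assumptions.

```python
import math


def cost(q):
    # Hypothetical U-shaped cost with no setup cost; average cost is minimized at q = 3.
    return q ** 3 - 6 * q ** 2 + 15 * q


def marginal_cost(q):
    return 3 * q ** 2 - 12 * q + 15


P_C, Q_C = 6.0, 3.0  # critical price (minimum average cost) and the output attaining it


def supply(p):
    """Price-taking supply: 0 below P_C, otherwise the rising branch of p = marginal_cost(q)."""
    if p < P_C:
        return 0.0
    return 2 + math.sqrt((p - 3) / 3)  # larger root of 3q^2 - 12q + 15 = p


def integrate(f, lo, hi, n=10_000):
    # Midpoint rule
    width = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * width) for k in range(n)) * width


# Long run: rebuild LC(q') from the supply curve (above q^c its height is marginal cost).
q_prime = 5.0
inferred = P_C * Q_C + integrate(marginal_cost, Q_C, q_prime)
print(inferred, cost(q_prime))  # both approximately 50

# Short run: reuse cost() as SVC and add a hypothetical fixed cost SFC = 20.
SFC = 20.0
p = 9.0
q = supply(p)  # p exceeds min SAVC = 6, so the firm produces
profit_if_produce = p * q - cost(q) - SFC
profit_if_shut = -SFC
print(round(q, 3), round(profit_if_produce, 2), profit_if_shut)  # 3.414 -10.34 -20.0
print(round(profit_if_produce - profit_if_shut, 2))  # producer surplus: 9.66
```

At \(p = 9\) the firm loses about 10.34 by producing but 20 by shutting down, mirroring the $3,000/$5,000 example. Its producer surplus is the difference between the two, about 9.66.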
In both the short run and long run analysis, we have been assuming that the firm takes the market price \(p\) as fixed. That is, the firm views the price as something it cannot noticeably affect, because it is of negligible size relative to the market. For some technologies, this makes sense; for some it doesn't. For example, suppose there are setup costs, and \(LC\) is linear for \(q > 0\). Then \(LMC\) is constant.

[Figure: \(LAC\) and \(LMC\) against \(q\) for this technology.]

What is the long run supply curve? Does the question make sense in a price-taking setting? Even without setup costs, the same problem arises if \(LMC\) keeps declining as \(q\) grows without bound. So "increasing returns to scale everywhere" can't be analyzed using "price-taking" methods.

One-Step Maximization

We have been focused on a two-step approach to the producer problem. First, solve the cost minimization problem to find the average and marginal cost curves. Then use them to understand the supply decision, as a function of market price. This will prove extremely useful in doing policy analysis. But think for a moment about tackling the firm's profit-maximizing problem in one step:
\[ \max_{x \in \mathbb{R}^n_+} \big(p f(x) - w \cdot x\big) \]
where \(p \in \mathbb{R}_{++}\) is the market price of output, \(w \in \mathbb{R}^n_{++}\) is the vector of input prices, \(x \in \mathbb{R}^n_+\) is the vector of input quantities, and \(f(x)\) is the quantity of output.

If output price rises, will the producer always produce more?

Proposition. Fix some vector of input prices \(w\), and suppose \(p' > p\). Let \(x\) and \(x'\) solve the profit maximization problem at \((p, w)\) and \((p', w)\) respectively. Then \(f(x') \ge f(x)\).

Proof.
\[ p f(x) - w \cdot x \ge p f(x') - w \cdot x' \qquad (8) \]
\[ p' f(x) - w \cdot x \le p' f(x') - w \cdot x' \qquad (9) \]
(because \(x\) is best at \(p\), and \(x'\) is best at \(p'\)). Multiply (8) by \(-1\) (changing the direction of the inequality) and add the result to (9):
\[ (p' - p) f(x) \le (p' - p) f(x') \]
therefore
\[ \big(f(x') - f(x)\big)(p' - p) \ge 0 \]
so \(f(x') \ge f(x)\), since \(p' - p > 0\).

This means that firms' supply curves are always (at least weakly) upward-sloping, no matter how complicated \(MC\) and \(AC\) might be. Are we talking about the long run or the short run here? The previous result (and the next one) apply in either case; the only difference is how many inputs the firm gets to vary.

What if, instead, one input price rises while all others, and \(p\), are held fixed?

Proposition. Suppose \(w'\) and \(w\) differ only in the \(i\)th component, where \(w'_i > w_i\). The price of output is fixed at \(p\). If \(x\) and \(x'\) solve the profit maximization problem at \((p, w)\) and \((p, w')\) respectively, then \(x'_i \le x_i\).

Proof. Because \(x\) is profit-maximizing at \((p, w)\),
\[ p f(x) - w \cdot x \ge p f(x') - w \cdot x' \qquad (10) \]
But \(x'\) is profit-maximizing at \((p, w')\), so
\[ p f(x) - w' \cdot x \le p f(x') - w' \cdot x' \qquad (11) \]
Multiplying (11) by \(-1\), thereby changing the direction of the inequality, and adding the result to (10):
\[ (w' - w) \cdot x \ge (w' - w) \cdot x' \]
therefore
\[ (w' - w) \cdot (x - x') \ge 0 \]
Hence, because all components of \(w' - w\) are 0 except the \(i\)th,
\[ (w'_i - w_i)(x_i - x'_i) \ge 0 \]
so \(x'_i \le x_i\). Thus, factor input demands are weakly downward-sloping in their own prices.

Market Demand

It's time to aggregate demand and supply curves to derive the market demand and market supply curves. In the case of market demand, this is quite straightforward. Consider the \(j\)th consumer's demand function \(D_i^j\) for the \(i\)th good. At each price, it specifies the quantity \(x_i^j\) demanded by \(j\), as a function of her income and all prices. When we draw her demand curve, we hold her income and all prices \(p_k\), \(k \neq i\), fixed.
Remember that here, \(p_i\) is the independent variable, and the quantity demanded is plotted on the horizontal axis. So the market demand for good \(i\) is the horizontal summation of all the individual demand curves of the \(m\) consumers:
\[ D_i(p_i) = \sum_{j=1}^{m} D_i^j(p_i) \]
Draw yourself some examples. Note carefully that the sum of linear demand curves, for example, is not linear if they have different intercepts.

[Figure: individual demands \(D_i^1\) and \(D_i^2\) and their horizontal sum \(D_i(p_i)\), with \(p_i\) on the vertical axis and \(x_i\) on the horizontal axis.]

Here the graph shows two consumers with linear demand curves and different vertical intercepts. The sum has a kink at the price where the second consumer switches between participation and nonparticipation.

How about aggregating consumer surplus? This is an extremely delicate issue. What does it mean to add up the welfare of different people? Is an extra thousand dollars to a rich person comparable in value to an extra thousand for a poor person struggling for survival? Most people would say no. Nonetheless, economists often simplify by just adding up individual consumer surpluses to get an aggregate measure. Gertrude Stein's most famous line is: "A rose is a rose is a rose." When one adds up different individuals' consumer surpluses, one is saying, without justification, "A dollar is a dollar is a dollar."

Is the sum of individual \(CS\) captured by the market demand curve? Or is information lost in the aggregation?

At price \(p_i^0\), consumer \(j\) enjoys consumer surplus
\[ CS_i^j(p^0) = \int_{p_i^0}^{\infty} D_i^j(p_i)\,dp_i \]
where we suppress the other arguments of demand, including other prices and income. So aggregate consumer surplus is
\[ \sum_{j=1}^{m} CS_i^j(p^0) = \sum_{j=1}^{m} \int_{p_i^0}^{\infty} D_i^j(p_i)\,dp_i = \int_{p_i^0}^{\infty} \sum_{j=1}^{m} D_i^j(p_i)\,dp_i = \int_{p_i^0}^{\infty} D_i(p_i)\,dp_i \]
the area to the left of market demand, above price.
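Horizontal summation and the additivity of consumer surplus can be checked with two hypothetical linear demands, \(x = \max(0, 10 - p)\) and \(x = \max(0, 6 - p)\). The market curve has a kink at \(p = 6\). At \(p^0 = 4\) the individual surpluses (18 and 2) add up to the area to the left of market demand (20). The integrator and all names are illustrative.

```python
def linear_demand(intercept):
    # Hypothetical individual demand x = intercept - p, truncated at zero.
    return lambda p: max(0.0, intercept - p)


consumers = [linear_demand(10.0), linear_demand(6.0)]


def market_demand(p):
    # Horizontal summation: add the quantities demanded at a common price.
    return sum(d(p) for d in consumers)


def integrate(f, lo, hi, n=100_000):
    # Midpoint rule
    width = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * width) for k in range(n)) * width


# Market demand becomes twice as responsive to price below p = 6, where consumer 2 starts buying.
print([market_demand(p) for p in (2.0, 5.0, 6.0, 8.0)])  # [12.0, 6.0, 4.0, 2.0]

# Consumer surplus at p0 = 4; every demand is zero above p = 10, so integrate up to 10.
p0, choke = 4.0, 10.0
individual = [integrate(d, p0, choke) for d in consumers]
aggregate = integrate(market_demand, p0, choke)
print([round(cs, 6) for cs in individual], round(aggregate, 6))  # [18.0, 2.0] 20.0
```

Because integration is linear, summing the areas and integrating the summed curve must agree. That is exactly the interchange of \(\sum\) and \(\int\) in the derivation above.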
Market Supply

Here it is useful once again to consider the long run and the short run separately. For the aggregation exercise, the simple case is the short run. Economists usually assume that it is a good approximation to hold input prices fixed as firms in the market move up or down their short run supply curves. That is, because some factors of production are fixed in the short run, firms (in response to an increase in \(p\)) won't increase their scale of production so much that factor input prices are bid up significantly. So all factor input prices are held fixed while output price is varied and firms respond accordingly.

In the short run, aggregate supply is just the horizontal sum of firm supply curves. If there are \(l\) firms active in the short run:
\[ S_i(p) = \sum_{k=1}^{l} S_i^k(p) \]

How about aggregate producer surplus? Again, we sum the surpluses of the individual firms. Each of these is the area to the left of the firm's supply curve, up to the market price \(p_i^0\):
\[ \sum_{k=1}^{l} PS_i^k(p^0) = \sum_{k=1}^{l} \int_0^{p_i^0} S_i^k(p_i)\,dp_i = \int_0^{p_i^0} \sum_{k=1}^{l} S_i^k(p_i)\,dp_i = \int_0^{p_i^0} S_i(p_i)\,dp_i \]
the area to the left of market supply, under price \(p_i^0\).

Is every firm's short run supply curve the same as any other's? Not necessarily. Even if all firms have access to the same technology, maybe they have made their choices of capital levels (or other inputs that take a long time to adjust) at different times, when different prices prevailed. Hence, they may have made different choices about input levels that are now, in the short run, fixed. As a result, their cost curves, and therefore their supply curves, may differ.
This means that as price increases from some low level, market quantity supplied increases for two reasons. Firms that are already active each produce more, and other firms (with higher minimum values of \(SAVC\)) jump into production.

Understanding long run market supply is more subtle. All factor inputs are variable, so one can think of the industry growing from relatively low outputs, at a low market price, to, say, twenty times the output at a substantially higher market price. At that price it will be using vastly greater quantities of some factor inputs. Does this bid up the prices of inputs? That depends on the size of this industry relative to the size of the input markets. Suppose the industry is one of hundreds that use water (or unskilled labor, or gasoline...) in some region. Then its changing demand for water amounts to fairly little as a proportion of total demand for water. In this case, changes in this industry have little effect on the price of water.

Definition. An industry is called a constant cost industry if factor input prices remain constant as the industry expands. If instead one or more input prices get bid up as the industry expands, it is called an increasing cost industry.

WARNING: "Constant cost industry" does not refer to the shape of each firm's marginal or average cost curves!!!

Example. Constant cost industry with identical firms and U-shaped cost curves. On the left, draw a typical firm's cost curves. On the right, draw the market supply curve, with a different scale on the horizontal axis (say, millions of units).

[Figure: left panel shows a typical firm's \(LMC\) and \(LAC\), with \(p^c\) and \(q^c\) marked; right panel shows the market supply curve against \(Q\).]

At the critical price \(p^c\), every potential firm (we have free entry) is indifferent between producing 0 and producing \(q^c\) (perhaps 100 units), where \(LAC\) is minimized.
In the aggregate, that means the industry is willing, at that price, to supply 100 units in total, or 200, 300, or \(100k\) for any natural number \(k\). If we plot all these \((100k, p^c)\) points on the right graph, with \(Q\) measured in millions of units, say, the dots are so close together that they look like a solid line. Thus, in a constant cost industry with identical firms, U-shaped cost curves yield a flat industry supply curve (ignoring the vertical section on the price axis below \(p^c\)).

Some restaurants in NYC have locations near the Hudson River, enjoying great views of the river and beautiful New Jersey. Others don’t. Does this mean those with views can earn profits that others can’t? Economists view these returns as rents associated with the favorable location. If a restaurant is renting the favorable site, competition from other restaurants who would like a great view (in order to be able to charge more for the same food and service) means that the owner of the site, not the restaurateur, captures those extra returns, literally as rent. Even if a restaurant owns one of the favorable sites, it is forgoing the rent it could collect from another restaurant, so that forgone rent is an opportunity cost of operating the business on the owned property. So economic profits are zero.

In an increasing cost industry, as \(Q\) increases, one or more factor input prices are bid up. This can change the LMC and LAC curves in complicated ways, which are difficult to derive graphically. The result, though, is that each firm’s value of min LAC rises as \(Q\) rises, so for cost to be covered at higher values of \(Q\), market price must be higher.

[Figure: long run supply curve in an increasing cost industry, with \(p^e\) and \(Q^e\) marked.]

Suppose market equilibrium occurs at some \((Q^e, p^e)\).
The graph would suggest that producer surplus in this market is
\[ p^e Q^e - (\text{total cost of production}) = p^e Q^e - (\text{area under LR supply from } 0 \text{ to } Q^e). \]
But with free entry, shouldn’t profits be zero? We’ll discuss the resolution of this puzzle.

Partial Equilibrium

Partial equilibrium refers to the study of one market in isolation. Drop the \(i\) subscript and consider market demand \(D(p)\) (holding all incomes and prices in other markets fixed) and market supply \(S(p)\) (holding all input prices fixed). The pair \((p, Q)\) is an equilibrium of this model if \(D(p) = S(p) = Q\). If price is such that demand exceeds supply, the excess demand drives price upward, toward equilibrium.

[Figure: supply \(S\) and demand \(D\) intersecting at the equilibrium \((Q^e, p^e)\).]

The government might intervene in the market, taxing or subsidizing consumption or production.

Definition. Social surplus is the sum of consumer surplus, producer surplus, and government net revenues (taxes minus government expenditures).

Notice that in equilibrium, exactly those units have been produced and consumed for which the social benefit (measured by the demand curve) exceeds the social cost (measured by the supply curve). Producing more units would involve costs that cannot be covered by consumers’ marginal benefits. Voltaire’s Dr. Pangloss would say: “Everything is for the best, in this best of all possible worlds.” He would be disappointed to learn that equilibria often fail to maximize social surplus in the presence of market power, asymmetric information or externalities. We will study each of these later in the course.

Suppose the government imposes a tax on consumers of $t for each unit of this good they buy.

[Figure: consumption tax; demand shifts down from \(D\) to \(D'\); the equilibrium moves from \((Q^e, p^e)\) to quantity \(Q'\), with market price \(p'\) and consumer price \(p' + t\).]

To find the new demand curve, choose any \(Q'\) and note that it now takes a market price \(t\) units lower to call forth that demand.
Consequently, the new demand curve \(D'\) is \(t\) units lower, at each \(Q'\), than the original. (The price to the consumer is the market price plus the per unit tax.) On the graph, \(p^e\) denotes the original equilibrium price, and \(p'\) the new market price. The consumer pays \(p' + t\) per unit.

The loss in social surplus associated with the tax, also called the excess burden of the tax, can be computed in two equivalent ways.

Method 1. Find the change in consumer surplus, the change in producer surplus and the change in net tax revenues, and add them up.

[Figure: the tax diagram again, with \(S\), \(D\), \(D'\), prices \(p^e\), \(p'\), \(p' + t\) and quantities \(Q'\), \(Q^e\).]

The loss in PS is the blue shaded area:
\[ (p^e - p')\,Q' + (\text{area below } p^e \text{ and above supply, from } Q' \text{ to } Q^e). \]
The loss in CS is the red shaded area:
\[ (p' + t - p^e)\,Q' + (\text{area above } p^e \text{ and below demand, from } Q' \text{ to } Q^e). \]
But the area \(tQ'\) (see the corresponding rectangle on the graph) is a gain in government revenues. Adding the three changes, the rectangles cancel, since \((p^e - p')Q' + (p' + t - p^e)Q' - tQ' = 0\), so the excess burden is the area between demand and supply from \(Q'\) to \(Q^e\).

Method 1 is reliable, but it is inefficient, in the sense that it unnecessarily keeps track of “transfers” of money from one party to another. For example, we know that the money consumers pay in taxes, which is counted as a loss of consumer surplus, is going to show up as increased government revenues, and these two things just cancel out in social surplus calculations. The same goes for the fall in market price, a loss to producers but a gain to consumers. Method 2 is a quicker way to answer, but takes some practice to do correctly.
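For a concrete check of Method 1, assume (purely for illustration) linear inverse demand \(A - BQ\) and linear inverse supply \(C + DQ\), with made-up numbers. The tax drives a wedge of \(t\) between the price consumers pay and the price producers receive, so \(Q'\) solves \(A - BQ' = C + DQ' + t\). The sketch adds up the changes in CS, PS and tax revenue and compares the total with the triangle between the curves.

```python
from dataclasses import dataclass


@dataclass
class LinearMarket:
    """Hypothetical linear market: inverse demand A - B*Q, inverse supply C + D*Q."""
    A: float
    B: float
    C: float
    D: float

    def quantity(self, wedge=0.0):
        """Quantity at which the demand price exceeds the supply price by `wedge`."""
        return (self.A - self.C - wedge) / (self.B + self.D)

    def consumer_surplus(self, q):
        # Area under demand, above the price consumers pay, from 0 to q.
        return 0.5 * self.B * q * q

    def producer_surplus(self, q):
        # Area above supply, below the price producers receive, from 0 to q.
        return 0.5 * self.D * q * q


def excess_burden_method1(m, t):
    """Method 1: loss = -(change in CS + change in PS + change in tax revenue)."""
    q_e = m.quantity()        # no-tax equilibrium Q^e
    q_t = m.quantity(t)       # after-tax quantity Q'
    d_cs = m.consumer_surplus(q_t) - m.consumer_surplus(q_e)
    d_ps = m.producer_surplus(q_t) - m.producer_surplus(q_e)
    d_rev = t * q_t
    return -(d_cs + d_ps + d_rev)


m = LinearMarket(A=100, B=1, C=10, D=1)   # made-up parameters
loss = excess_burden_method1(m, t=10)
triangle = 0.5 * 10 * (m.quantity() - m.quantity(10))  # Method 2 area
print(loss, triangle)  # 25.0 25.0
```

Here \(Q^e = 45\) and \(Q' = 40\); consumers and producers each lose 212.5, the government collects 400, and the net loss of 25 is exactly the triangle of base 5 and height \(t = 10\).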
Method 2. After determining what is consumed and produced, ignore price (it just determines how surplus is divided) and use the demand curve to measure the change in true benefit (from units consumed before but not after, or vice versa) and the supply curve to measure the change in social cost (of units produced before but not after the policy, or vice versa).

For example, in evaluating the consumption tax, notice that there are \(Q^e - Q'\) units no longer produced and no longer consumed after imposition of the tax. The loss in real consumer benefits, the area under demand from \(Q'\) to \(Q^e\), exceeds the reduction in the social cost of production (the area under supply from \(Q'\) to \(Q^e\)) by the area between the two curves, from \(Q'\) to \(Q^e\). That’s exactly what Method 1 produced at greater length.

What if producers, instead of consumers, were obliged to pay the tax of $t per unit? Note that in the case of the tax on consumers, at the new equilibrium \(Q'\), consumers are paying $t more (per unit) than producers are getting, so \(Q'\) is the quantity at which the original demand curve is above the (original) supply curve by \(t\). That must also be the case with the producer tax (consumers are paying $t more than producers are getting), so the equilibrium is again at \(Q'\). Consumers actually pay, and firms actually keep after tax, the same amounts in the two situations, so the changes in CS, PS and tax revenues are identical across these alternative tax schemes. Although the schemes have different legal incidences (who legally has to pay the tax), their economic incidences are identical. Economists say that part of the burden of the tax is shifted, by a market price adjustment, from one party (who legally pays the tax) to the other.

Tariffs are taxes levied on imports.
Suppose a “small open economy” (that is, a country that is too small to affect the world price of a good when it imports it) imposes a tariff of $t per unit on imports of a certain good. In the simplest case, there are no domestic producers.

[Figure: domestic demand \(D\) and horizontal world supply at \(p^e\), raised to \(p^e + t\) by the tariff; quantities \(Q'\) and \(Q^e\).]

Consumers now have to pay the world price plus the tariff, so they buy \(Q'\) units. What happens to domestic social surplus? Consumers lose \(tQ'\) plus the area under \(D\) above \(p^e\), from \(Q'\) to \(Q^e\). But \(tQ'\) is tariff revenue to the government, so the overall loss is just the shaded area on the graph.

What if there is a domestic industry producing this good? Label its supply curve \(S^{DOM}\).

[Figure: domestic supply \(S^{DOM}\), domestic demand \(D\), world supply at \(p^e\) and at \(p^e + t\); original and final imports marked; quantities \(Q'\) and \(Q^e\).]

Before the tariff, some of the domestic consumption \(Q^e\) is imported, the rest being produced domestically. With the tariff, less is imported (see “final imports”) and more is produced within the country. Method 2 says the loss in domestic social surplus is the sum of the red area (how much more the lost consumption units were worth to consumers than it originally cost to import them) and the blue area (how much more it costs to produce the extra domestic units than it did to import them).

Going back to a closed economy (no trade), let’s compare two ways of helping farmers.

Policy 1. Pay a subsidy of $t for each unit produced. This shifts the supply curve downward by \(t\).

[Figure: supply shifts down from \(S\) to \(S'\); the equilibrium moves from \((Q^e, p^e)\) to quantity \(Q'\) at price \(p'\); the shaded area is the loss in social surplus.]

Extra units \(Q' - Q^e\) are produced and consumed; consumers are helped and producers are helped, but the cost to the government exceeds those benefits. Method 1 says the loss in social surplus is the area below \(S\) and above \(D\), from \(Q^e\) to \(Q'\).
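The same kind of linear setup (a hypothetical assumption: inverse demand \(A - BQ\), inverse supply \(C + DQ\), made-up numbers) lets us run Method 1 on the production subsidy: add the consumer and producer gains, subtract the government’s outlay, and compare the result with the area below \(S\) and above \(D\) between \(Q^e\) and \(Q'\).

```python
def subsidy_loss(A, B, C, D, s):
    """Method 1 for a per-unit production subsidy s in a hypothetical linear market.

    Inverse demand is A - B*Q and (original) inverse supply is C + D*Q.
    Returns the loss in social surplus and the quantities before and after.
    """
    q_e = (A - C) / (B + D)          # original equilibrium Q^e
    q_s = (A - C + s) / (B + D)      # supply price now exceeds demand price by s
    d_cs = 0.5 * B * (q_s**2 - q_e**2)   # consumers' gain
    d_ps = 0.5 * D * (q_s**2 - q_e**2)   # producers' gain (measured from original S)
    cost = s * q_s                       # government outlay
    return cost - (d_cs + d_ps), q_e, q_s


loss, q_e, q_s = subsidy_loss(A=100, B=1, C=10, D=1, s=10)  # made-up numbers
triangle = 0.5 * 10 * (q_s - q_e)   # area below S, above D, from Q^e to Q'
print(loss, triangle)  # 25.0 25.0
```

Consumers and producers each gain 237.5 while the subsidy costs 500, so social surplus falls by 25, the triangle of base \(Q' - Q^e = 5\) and height \(t = 10\).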
Policy 2. The government enters the market, buying agricultural goods (say, eggs) to drive up their price. It eventually destroys them (rather than selling them and driving down the price).

[Figure: government demand \(G\) added to private demand \(D\) gives \(D + G\); price rises from \(p^e\) to \(p'\); quantities \(Q''\), \(Q^e\), \(Q'\).]

\(G\) represents government demand. The extra demand drives the price up to \(p'\), helping producers and hurting consumers, and costing the government \(p'(Q' - Q'')\). Method 2 says: units \(Q' - Q^e\) are newly produced, costing the blue area on the graph (the area under supply from \(Q^e\) to \(Q'\)). Units \(Q^e - Q''\) still get produced, but are no longer enjoyed, so the loss here is the area under \(D\) from \(Q''\) to \(Q^e\) (red area on the graph). This is a DISASTER compared to the subsidy.

Governments often employ price ceilings or price floors (think of rent control, minimum wages, wartime price controls and so on). These tend to cause excess supply or excess demand.

[Figure: rent control; short run supply \(S_{SR}\), long run supply \(S_{LR}\), demand \(D\), price ceiling \(\bar{p}\) below \(p^e\); quantities \(Q'\), \(Q^e\), \(Q''\).]

Suppose, for example, a rent control law is passed, putting a price ceiling \(\bar{p}\) on rents. If the short run supply curve is vertical (the stock of apartments is fixed right now), this doesn’t affect supply, but it increases quantity demanded. Excess demand is \(Q'' - Q^e\). In the long run, fewer apartments will be provided for rental, so there is even greater excess demand (\(Q'' - Q'\)). The effects on social surplus depend on how the good in short supply is rationed.

At best, in the long run, if the renters who value the apartments the most somehow get them, without any resources wasted on competing for them (by lineups or complicated application procedures), the loss of social surplus is the shaded area on the graph above. In the short run, landlords suffer and renters benefit.

Rationing by lineup is extremely inefficient. Let’s say New York City wants to celebrate an anniversary by hosting a rock concert at Madison Square Garden.
The performers are superstars, so the market price for a ticket would normally be $200. But so that people other than the rich have a chance to attend, the city decides to sell tickets at $120 each. At $200, demand would have equalled the capacity of 20,000 seats. At $120, there is huge excess demand. The city handles this by announcing: one ticket per customer; the box office opens at 9 AM, October 14. Clearly there will be a long line, and it won’t form at 9 AM. When will it start?

[Figure: supply \(S\) at the capacity of 20,000 seats and demand \(D\), with prices $200 and $120 marked.]

For simplicity, assume first that once the box office opens, it serves each customer instantaneously (so this line moves fast!). A rich guy might want a ticket, but he doesn’t want to stand in line for hours. So he pays someone with a lower opportunity cost of time to line up for him. If there’s a competitive market for lining up, it will have some equilibrium hourly wage rate \(w\). Each person who lines up can buy a ticket for $120 and sell it for $200, the market-clearing price of a ticket. He nets $80 for waiting, so for him to be indifferent, the lineup has to form \(80/w\) hours before 9 AM. The same people attend the concert as would have been the case had the city sold the tickets at $200. The loss of social surplus is the shaded area.

Clearly this area doesn’t depend on \(w\). If people are more averse to lining up, the line will form a bit later, say at 3 AM instead of 2 AM. And if rain is forecast, the line forms later. The lost social surplus is the same. Should the government reduce the wasted waiting time by allowing each person in the lineup a maximum of four tickets, instead of just one? No! The lineup will form four times as many hours before 9 AM, and the total wasted hours will be the same.

Because rationing purely by lineups is so destructive, wartime price controls are often accompanied by rationing coupons.
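The lineup arithmetic can be sketched in a few lines. Under the assumptions in the text (instant service at the box office, a competitive hourly wage \(w\) for standing in line, resale at the market-clearing price), each place in line is worth the resale profit, so the line forms when \(w\) times the waiting hours equals that profit. The function and parameter names are hypothetical.

```python
def lineup(w, seats=20_000, face=120, market=200, tickets_per_person=1):
    """Hours the line forms before opening, and the value of all waiting time.

    Assumes instant service, a competitive wage w per hour for waiting, and
    resale of every ticket at the market-clearing price.
    """
    profit_per_place = (market - face) * tickets_per_person  # resale gain per person
    hours = profit_per_place / w              # indifference: w * hours == gain
    people = seats / tickets_per_person       # places in line needed to buy all seats
    wasted = people * hours * w               # opportunity cost of all the waiting
    return hours, wasted


for w, k in [(10, 1), (20, 1), (10, 4)]:
    hours, wasted = lineup(w, tickets_per_person=k)
    print(f"w={w}, tickets={k}: line forms {hours:g} h early, waste ${wasted:,.0f}")
```

In every case the wasted time is worth $80 × 20,000 = $1.6 million: a higher \(w\) shortens the wait in proportion, and a four-ticket limit quadruples the wait while quartering the number of people in line.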
A rationing coupon might entitle you to two eggs, for example; you don’t need to be first in line in order to get your eggs.

General Equilibrium

So far we have been studying single markets in isolation. Now it’s time to put them all together and study equilibria of the whole system of interrelated markets. Such an equilibrium is called a general equilibrium or a competitive equilibrium of the economy. Ideally, one would study consumption and production together. But to keep the complexity under control, here we will limit attention to economies with no production. Everyone starts off owning something (called an initial endowment), and then trade is permitted. Such an economy is called a pure exchange economy. The insights gleaned from understanding these models carry over, under reasonable assumptions, to economies with production.

Definition. A pure exchange economy consists of: goods \(1, 2, \dots, n\); consumers \(1, 2, \dots, m\) with utility functions \(u^j(x_1^j, \dots, x_n^j)\); and initial endowments \(w^1, \dots, w^m \in \mathbb{R}^n_+\).

Here, \(x_i^j\) denotes the amount of good \(i\) consumed by individual \(j\), and \(w^j\) is individual \(j\)’s initial endowment of goods. This is a vector \((w_1^j, \dots, w_n^j)\), representing what individual \(j\) owns before trade. Let \(w\) denote the vector \((w^1, \dots, w^m) \in \mathbb{R}^{mn}_+\) and \(x\) denote the vector \((x^1, \dots, x^m) \in \mathbb{R}^{mn}_+\).

Notice that the vector \(\sum_{j=1}^m w^j\) tells us how much of each good is available in the entire economy. It is the stuff you would have if you collected everybody’s initial endowments of goods. If you were a social planner with dictatorial powers, you could then distribute those goods to the \(m\) people in many alternative ways. Of course, you couldn’t allocate more goods than are available in total.
We say a feasible allocation is a vector \(x = (x^1, \dots, x^m)\) such that
\[ \sum_{j=1}^m x^j = \sum_{j=1}^m w^j. \]
Some economists like to use a weak inequality \(\le\) in the definition of feasible allocation (you could distribute less than the total available). As long as consumers have “free disposal” (they can costlessly get rid of stuff), this makes no difference.

Each consumer is assumed to be a “price taker”: she acts as though she has no influence on any price. This is a reasonable approximation if \(m\) is very large. Facing prices \(p = (p_1, \dots, p_n)\) in this economy, consumer \(j\)’s problem is:
\[ \max_{x^j \in \mathbb{R}^n_+} u^j(x^j) \quad \text{subject to} \quad p \cdot x^j = p \cdot w^j \]
The constraint says that she can’t consume a bundle whose market value exceeds the market value of her initial endowment. If you like, think of her first selling the endowment; the proceeds, \(p \cdot w^j\), now play the role of her income in a standard Marshallian demand problem.

Often economists give examples with just two consumers, A and B. The interpretation is that there are, say, one million consumers with preferences and endowments like person A, and another million persons with preferences and endowments like person B (and that’s why A and B don’t think they can individually affect prices).

In the 19th century, Edgeworth proposed a device, later refined by Pareto and Bowley, for visualizing a two-person pure exchange economy. It is now called an Edgeworth-Bowley box. First, draw person A’s indifference curves as you normally would (if her utility function is strictly quasiconcave, these curves will be convex). Then draw B’s indifference curves, but upside down and backwards! His origin (what to him is \((0, 0)\)) is, on A’s graph, the point \((w_1^A + w_1^B, w_2^A + w_2^B)\).
[Figure: Edgeworth-Bowley box with origins \(O_A\) and \(O_B\) at opposite corners and the initial endowment \(w\).]

The length of the box is the amount of good 1 in the economy, while the height is the amount of good 2. Any particular point in the box is a feasible allocation, a specific way of dividing the goods between the two agents. One point in the box is the initial endowment, showing who started off owning what. If market prices \((p_1, p_2)\) arise, the ratio \(p_1/p_2\) determines the slope (namely \(-p_1/p_2\)) of the line along which consumers can trade away from \(w\).

What do we mean by an equilibrium of an exchange economy? It should be a price vector and a feasible allocation such that no one can trade, at market prices, to a bundle she strictly prefers.

Definition. The pair \((p, x) \in \mathbb{R}^n_+ \times \mathbb{R}^{mn}_+\) is a competitive equilibrium (or Walrasian equilibrium) if \(x\) is a feasible allocation and, for each \(j = 1, \dots, m\), \(x^j\) solves
\[ \max_{z \in \mathbb{R}^n_+} u^j(z) \quad \text{subject to} \quad p \cdot z = p \cdot w^j. \]

This says that \(x^j\) is the Marshallian demand for agent \(j\). Why don’t we also need to require that each market clears, that is, that supply equals demand? That is guaranteed already by the feasibility of the allocation \(x\): the sum of the demands equals the sum of the “supplies”, that is, the sum of initial endowments.

People won’t always agree about the best way to allocate scarce resources. But Pareto suggested that if everyone likes \(x\) at least as well as \(y\), and at least one person strictly prefers \(x\) to \(y\), then \(x\) is a better allocation than \(y\). In honor of Pareto, when this is the case we say \(x\) is Pareto-superior to \(y\), or \(x\) Pareto-dominates \(y\). If there is NO feasible \(y\) that Pareto-dominates a feasible allocation \(x\), we say \(x\) is Pareto efficient (or Pareto optimal). The set of all Pareto efficient points in an Edgeworth-Bowley box that all agents weakly prefer to the initial endowment is called the contract curve.
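To make the definition of competitive equilibrium concrete, here is a minimal sketch of a two-person, two-good exchange economy, assuming (for illustration only) Cobb-Douglas utilities \(u^j = (x_1^j)^{a_j}(x_2^j)^{1-a_j}\) and made-up endowments. With good 2 as numeraire (\(p_2 = 1\)), clearing the market for good 1 gives the closed form \(p_1 = \sum_j a_j w_2^j / \sum_j (1 - a_j) w_1^j\). The code checks feasibility, and checks that both marginal rates of substitution equal \(p_1/p_2\), which for smooth preferences at an interior allocation is the tangency condition for Pareto efficiency.

```python
def cobb_douglas_demand(a, endowment, p):
    """Marshallian demand for u = x1**a * x2**(1 - a), income = p . endowment."""
    income = p[0] * endowment[0] + p[1] * endowment[1]
    return (a * income / p[0], (1 - a) * income / p[1])


def competitive_equilibrium(alphas, endowments):
    """Equilibrium prices (good 2 is numeraire) and allocation for Cobb-Douglas agents."""
    num = sum(a * w[1] for a, w in zip(alphas, endowments))
    den = sum((1 - a) * w[0] for a, w in zip(alphas, endowments))
    p = (num / den, 1.0)  # this p1 clears the market for good 1
    allocation = [cobb_douglas_demand(a, w, p) for a, w in zip(alphas, endowments)]
    return p, allocation


def mrs(a, x):
    """Marginal rate of substitution (good 1 for good 2) for Cobb-Douglas utility."""
    return (a / (1 - a)) * (x[1] / x[0])


alphas = [0.25, 0.75]                   # hypothetical weights for A and B
endowments = [(4.0, 1.0), (2.0, 5.0)]   # hypothetical w^A and w^B
p, x = competitive_equilibrium(alphas, endowments)

totals = [sum(xj[i] for xj in x) for i in range(2)]
print("prices:", p)
print("allocation:", x)
print("totals:", totals)                                   # total endowment (6, 6)
print("MRS:", [mrs(a, xj) for a, xj in zip(alphas, x)])   # both equal p1/p2
```

Here \(p_1 = 8/7\). Scaling both prices by the same factor leaves these demands unchanged, since only the ratio \(p_1/p_2\) enters the budget line.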
[Figure: Edgeworth-Bowley box with origins \(O_A\) and \(O_B\).]

First Theorem of Classical Welfare Economics

Consider a pure exchange economy in which each utility function \(u^j\) is strictly increasing. If \((p, x)\) is a competitive equilibrium, then \(x\) is Pareto efficient.

Proof. Suppose, for contradiction, that there exists a feasible allocation \(y\) that Pareto-dominates \(x\). Then
\[ u^j(y^j) \ge u^j(x^j) \quad \forall j \tag{12} \]
\[ u^k(y^k) > u^k(x^k) \quad \text{for at least one } k \tag{13} \]
(13) implies \(y^k\) was not affordable (that’s why \(k\) didn’t choose \(y^k\) instead of \(x^k\)), that is,
\[ p \cdot y^k > p \cdot w^k \tag{14} \]
Now if we could also show
\[ p \cdot y^j \ge p \cdot w^j \quad \forall j \tag{15} \]
then adding (14) to all the \(j \ne k\) terms of (15):
\[ \sum_{j=1}^m p \cdot y^j > \sum_{j=1}^m p \cdot w^j, \quad \text{so} \quad p \cdot \sum_{j=1}^m y^j > p \cdot \sum_{j=1}^m w^j, \]
a contradiction, because feasibility of \(y\) means \(\sum_{j=1}^m y^j = \sum_{j=1}^m w^j\). So if we could show (15), we’d be finished. If, for some \(j\), \(p \cdot y^j < p \cdot w^j\), then \(j\) could have afforded to buy \(y^j\) plus \((\varepsilon, \dots, \varepsilon)\), if \(\varepsilon > 0\) is sufficiently small. By strict monotonicity, this bundle would give strictly higher utility than \(y^j\), and hence, by (12), than \(x^j\), contradicting the optimality of \(x^j\).

What an amazing theorem! We didn’t need convexity, differentiability, or even continuity. “If it’s a CE, it’s Pareto efficient.” Of course, giving everything to A and leaving B with absolutely nothing is also Pareto efficient, so Pareto efficiency is not the only thing to care about.

Is competitive equilibrium unique? No, there can be multiple equilibria even in rather simple economies, with only two people (or kinds of people) and two goods.

[Figure: Edgeworth-Bowley box with origins \(O_A\) and \(O_B\), initial endowment \(w\), and two equilibrium allocations \(x\) and \(y\).]

Here \(x\) and \(y\) are both CE starting from the same initial endowment \(w\).
Even in cases where economists would consider equilibrium unique (just one allocation x and one set of relative prices), that x is supported by lots of different absolute prices. For example, if (1, 2) is an equilibrium price vector, so are (2, 4) and (5, 10). All of these yield the same budget line, so they are equivalent. In other words, there in nothing in this model to tie down the price level. The Pareto criterion distinguishes the contract curve from the other (inefficient) allocations in the box. How should society choose among the points on the contract curve? This involves interpersonal comparisons of utility, challenging on conceptual grounds. Maybe someday neuroscience will help us with this. Remind me to say more about “social choice” after we’ve studied behavior under risk. David Pearce (NYU) Theory Track Micro Analysis Spring 2021 149 / 210 Monopoly Until this point, all economic agents have been assumed to take prices as given: each agent believes that she has no influence on market price. At the opposite extreme is a monopolist. As the only firm in the market, she can decide to sell a lot at a low price, or a little at a high price. The world is full of examples of a monopolist selling the same product at different prices to different groups. Known as price discrimination, it will be considered here a bit later. But we start with a simpler setting in which a monopolist chooses a single price, which applies to all buyers and all units of the good. David Pearce (NYU) Theory Track Micro Analysis Spring 2021 150 / 210 Let p be price in the monopolized market, and q be both firm quantity and market quantity. Since the monopolist is basically choosing a point on the market demand curve, it is equivalent to think of her choosing p or q (the one implies the other). 
She solves
\[ \text{choose } q \text{ to maximize profit} = \text{revenue} - \text{cost} \tag{16} \]
Equivalently,
\[ \max_{q \in \mathbb{R}_+} \big( q\,p(q) - C(q) \big) \tag{17} \]
The first-order condition for an interior maximizer of (16) is
\[ MR = MC \tag{18} \]
where \(MR\) denotes marginal revenue, the first derivative of revenue with respect to \(q\).

Differentiating (17), which “unpacks” revenue a bit, yields
\[ q\,\frac{dp}{dq} + p = MC \tag{19} \]
The good thing, for the firm, about producing an extra unit is that it sells it for the price \(p\). But there are two drawbacks: the extra cost \(MC\), and the amount by which the extra quantity lowers market price (that is, \(dp/dq\)) times the number of units \(q\) on which this price decline operates. Writing \(\varepsilon_{q,p} = \frac{dq}{dp}\frac{p}{q}\) for the price elasticity of demand, (19) can be expressed in elasticity terms:
\[ p\left(\frac{q}{p}\frac{dp}{dq} + 1\right) = p\left(1 + \frac{1}{\varepsilon_{q,p}}\right) = MC \tag{20} \]
or
\[ p = \left(\frac{1}{1 + \frac{1}{\varepsilon_{q,p}}}\right) MC \tag{21} \]
These are versions of the “inverse elasticity rule”.

How do things look, graphically? First notice that when we draw the demand curve, at any \(q\) we are plotting average revenue. When the monopolist chooses that \((q, p)\) pair, she sells the \(q\) units at \(p\) each, so her average revenue is \(pq/q = p\). So we can read average revenue off the demand curve. If demand is differentiable, average revenue and marginal revenue have the same vertical intercept, and their slopes at \(q = 0\) are in the ratio 1 : 2.

[Figure: demand \(D\) (average revenue) and marginal revenue \(MR\) against \(q\).]

If \(D\) happens to be linear, so is \(MR\), with twice the slope.

[Figure: monopoly with \(MC\), \(AC\), \(MR\) and \(D\); output \(q^M\) and price \(p^M\).]

The monopolist sets \(MR = MC\), as long as the price at that point exceeds \(AC\). Otherwise, she prefers not to produce. Profit per unit is \(p^M - AC(q^M)\), so profit is \(q^M\,(p^M - AC(q^M))\). Here, \(AC(q^M)\) is average cost evaluated at \(q^M\). It would be socially optimal to produce at \(q^*\), where \(MC\) cuts demand (marginal benefit = marginal cost). But by choosing \(q^M\) instead, the monopolist causes a deadweight loss equal to the area above \(MC\) and below demand, from \(q^M\) to \(q^*\).
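As an illustration of (18) and (21), assume (hypothetically) linear inverse demand \(p(q) = a - bq\) and constant marginal cost \(c\) with no fixed cost. Then \(MR = a - 2bq\), so \(q^M = (a - c)/(2b)\), while the efficient output is \(q^* = (a - c)/b\). The sketch computes these, confirms by a brute-force grid search that \(q^M\) maximizes profit, and checks the inverse elasticity rule at the monopoly point.

```python
def monopoly_linear(a, b, c):
    """Monopoly outcome for inverse demand p = a - b*q and constant marginal cost c."""
    q_m = (a - c) / (2 * b)                 # MR = a - 2*b*q equals MC = c
    p_m = a - b * q_m                       # price read off the demand curve
    q_star = (a - c) / b                    # efficient output: demand price equals MC
    dwl = 0.5 * (p_m - c) * (q_star - q_m)  # triangle above MC, below demand
    elasticity = (-1 / b) * p_m / q_m       # (dq/dp) * (p/q) at the monopoly point
    return q_m, p_m, q_star, dwl, elasticity


a, b, c = 100.0, 1.0, 20.0                  # made-up parameters
q_m, p_m, q_star, dwl, eps = monopoly_linear(a, b, c)

# Brute-force check: profit q*(a - b*q) - c*q on a grid of outputs from 0 to 100.
grid = [i / 100 for i in range(0, 10_001)]
q_best = max(grid, key=lambda q: q * (a - b * q) - c * q)

print(q_m, p_m, q_star, dwl, eps, q_best)
# Inverse elasticity rule (20): p * (1 + 1/eps) should equal MC (up to rounding).
print(p_m * (1 + 1 / eps))
```

With these numbers \(q^M = 40\), \(p^M = 60\), \(q^* = 80\), the elasticity at the monopoly point is \(-1.5\), and the deadweight loss is \((a - c)^2/(8b) = 800\).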
What can the government do to make monopoly less inefficient? One possibility is a per-unit subsidy to production. That will increase \(q\) and increase social surplus. But adding to already handsome monopoly profits is not going to be politically popular.

An alternative is regulation. Passing legislation requiring the monopolist to produce at \(q^*\) would remove the deadweight loss. But this assumes the government has accurate, detailed knowledge about the industry, so that it can choose the ideal \(q^*\). There’s a bigger problem with the regulatory solution, for some cost structures. Especially if there are large setup costs, \(AC\) may be above the demand curve at \(q^*\). Then if the firm is forced to produce \(q^*\), it makes losses, and it would prefer to shut down.

[Figure: monopoly with \(MC\), \(AC\), \(MR\) and \(D\), showing \(q^M\), \(p^M\) and the efficient output \(q^*\).]

The shaded area is the monopolist’s loss at \(q^*\). Another option is to impose \(q^*\) by regulation, and at the same time give a lump-sum subsidy equal to the firm’s losses. Public utilities (power companies, trains, ...) often have large setup costs (think of all the land and tracks needed for trains). For such a firm, the efficient solution may be to regulate output at \(q^*\), with correspondingly low prices, and then cover the resulting losses!

Different authors use the term natural monopoly in different ways. But the most recognized definition is that of the late William Baumol (of NYU), who said that natural monopoly arises when it is more expensive to produce with two or more firms than with a single firm. This will be the case if \(AC\) is everywhere decreasing (at least in the range where demand can cover costs).

Why would there be only one firm in a market in the first place? Natural monopoly is one possible explanation: one firm can serve the market much better than several. Another reason is patents.
The government, in order to reward research and development (R&D), sometimes grants a patent to a firm, giving it the exclusive rights to some product or process for many years. Even without a patent, I might have some special location, talent or formula that makes me unique. Whether I use this resource or supply it to someone else, it allows me (or the person to whom I rent the resource) to capture the market.

Firm A might be an original incumbent in the market, with lots of brand recognition and loyalty. If firm B tries to break into the market, firm A lowers its price, making it hard for B to win much market share without charging ruinously low prices. So B gives up and leaves. This behavior by A is called predatory pricing. Once B is gone, prices go back up.

Price Discrimination

The better a monopolist’s information about different consumers’ willingness to pay, the more effectively it can practice price discrimination. The extreme case of this is known as perfect or first degree price discrimination: the firm recognizes each consumer’s “reservation price” (maximum willingness to pay), and charges her this amount. In this situation, the firm extracts all the surplus in the market (consumers get no net benefits).

[Figure: first degree price discrimination, with \(MC\), \(AC\), demand \(D\) and output \(q^*\).]

The firm’s surplus is the shaded area below demand and above \(MC\). It produces the socially optimal amount.
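A minimal comparison, assuming (hypothetically) linear demand \(p = a - bq\) and a constant marginal cost \(c\) with no fixed cost (so \(AC = MC\), a simplification of the figure): under perfect price discrimination every unit with reservation price at least \(c\) is sold at its reservation price, so the firm produces \(q^*\) and keeps the whole triangle between demand and \(MC\).

```python
def compare_pricing(a, b, c):
    """Single-price monopoly vs perfect price discrimination for p = a - b*q, MC = c."""
    # Single price: MR = MC.
    q_m = (a - c) / (2 * b)
    p_m = a - b * q_m
    single = {
        "q": q_m,
        "profit": (p_m - c) * q_m,
        "consumer_surplus": 0.5 * (a - p_m) * q_m,
    }
    # Perfect (first degree) discrimination: each unit sold at its reservation price.
    q_star = (a - c) / b
    perfect = {
        "q": q_star,
        "profit": 0.5 * (a - c) * q_star,   # whole area below demand, above MC
        "consumer_surplus": 0.0,
    }
    return single, perfect


single, perfect = compare_pricing(100.0, 1.0, 20.0)  # made-up parameters
for name, r in [("single price", single), ("perfect PD", perfect)]:
    total = r["profit"] + r["consumer_surplus"]
    print(f"{name}: q={r['q']:g}, profit={r['profit']:g}, "
          f"CS={r['consumer_surplus']:g}, total={total:g}")
```

The discriminating monopolist’s output equals the efficient \(q^*\), and total surplus (3200 here) exceeds the single-price total (2400), even though consumers keep none of it.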
A chain store may price more modestly in poor neighborhoods. To me, this seems like a weaker, or cruder, version of first degree price discrimination. I would have called it second degree price discrimination. But I wasn’t born early enough to name these things, and for unknown reasons these phenomena are called third degree price discrimination. David Pearce (NYU) Theory Track Micro Analysis Spring 2021 160 / 210 What, then, is second degree price discrimination? It involves giving quantity discounts. Even if you can’t see that a consumer is poor, or elderly, and so on, you can say that buying individual bars of soup or rolls of paper towels will be more expensive per unit. A large family with high expenses may be willing to put up with the inconvenience of buying 48 rolls of paper towels at once. A special kind of second degree price discrimination is a two-part tariff. Here, tariff just refers to charges or fees, not to importation. It is extremely common for a public utility to bill customers this way. There is a fixed monthly charge for service, plus a charge proportional to the amount of electricity you use that month. This keeps marginal prices low, encouraging consumption, while the fixed charge helps cover the utility’s high setup costs. David Pearce (NYU) Theory Track Micro Analysis Spring 2021 161 / 210 How about charging much more for a plane ticket bought the day before the flight, the one purchased two weeks before? This is often given as an example of third degree price discrimination. But notice it doesn’t rely on airlines’ knowing what customer group a purchaser belongs to. The customers “self-select”, with vacationers planning ahead and business executives willing to pay more to be in the right place as a deal is developing, and flying in and out quickly without staying over a Saturday night. 
So it has something in common with second degree price discrimination: in each case (quantity discounts or time-dependent pricing), everyone is offered the same deals, and different market segments choose different deals. David Pearce (NYU) Theory Track Micro Analysis Spring 2021 162 / 210 Price discrimination depends upon the firm’s ability to prevent “arbitrage”: it doesn’t want people to buy huge quantities at a low price, and resell at higher prices to those who are wealthier and so on. So the student tickets to a New York Philharmonic concert are usually stamped “STUDENT”, and airline tickets are not transferable. It is infuriating to find that the person in the next seat to you on the plane paid $350 less than you did, because of the time of day she bought or what web site she used. But over all, price discrimination by airlines has something of a “Robin Hood” aspect: it takes from the rich and gives to the poor. You may object that yes, it creates more surplus that way, but the airlines grab a lot of that surplus. Interestingly, some empirical studies suggest that even consumer surplus (not just producer surplus) is increased by price discrimination. David Pearce (NYU) Theory Track Micro Analysis Spring 2021 163 / 210 Decisions under Risk Our choices affect what happens to us, but there is often an element of chance as well. Maybe I carry an umbrella, but it ends up not raining. A firm might need to invest in its capital stock before finding out how strong the demand for its product is. When we face risk, we are not choosing an outcome from a set of alternatives, but rather a lottery (from a set of lotteries). A particular lottery, call it L, might have n different possible outcomes x1 , ..., xn as “prizes”, which happen with probabilities p1 , ..., pn , respectively. 
If these \(n\) possibilities are mutually exclusive (at most one can actually happen) and exhaustive (there are no other possibilities, so one of them will happen), then \(\sum_{i=1}^{n} p_i = 1\), and we assume \(p_i \in [0, 1]\), \(i = 1, \dots, n\). Usually we assume these probabilities are “objective”. Roughly, everyone agrees they are “obvious”: if you flip a fair coin, it has probability 1/2 of coming up heads (H) and probability 1/2 of coming up tails (T).

We could write the lottery in compact notation as \(L = (x_1, \dots, x_n; p_1, \dots, p_n)\). The \(x_i\)’s could be monetary prizes, or different states of health (if you are choosing whether or not to get vaccinated against flu, for example). If the prizes are monetary, the actuarial value of \(L\) is the expected value, or mathematical expectation,
\[ \sum_{i=1}^{n} p_i x_i. \]
A lottery with just two outcomes is often called a bet: \(L_b = (w, l; p_1, p_2)\), where the person buying the lottery gets \(w\) with probability \(p_1\) and gets \(l < w\) if he loses. Here \(w\) and \(l\) are net gains, so a loss means \(l < 0\). A bet is called “fair” if \(p_1 w + p_2 l = 0\). It is better than fair if \(p_1 w + p_2 l > 0\), and less than fair (unfair) if \(p_1 w + p_2 l < 0\).

Suppose you have wealth $100,000, and someone offers you the following fair bet:
heads, you win (another) $100,000;
tails, you lose $100,000.
If you take the bet, you have an equal chance of ending up with $200,000 or with nothing. Most people will refuse this bet, even though it is (actuarially) fair. In some sense, the possible loss bothers them more than the possible gain. This suggests that people don’t maximize actuarial value (nor should they).

There is an even more dramatic, and entertaining, demonstration of this, pointed out by Nicolaus Bernoulli almost three hundred years ago. It goes by the name “St. Petersburg Paradox”. How much would you pay for the following lottery?
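The definitions of actuarial value and of a fair bet translate directly into a few lines of code. The sketch below uses exact fractions so that the "sums to 1" and "equals 0" checks are not spoiled by floating-point rounding. The function names are illustrative, not from the notes.

```python
from fractions import Fraction


def actuarial_value(prizes, probs):
    """Expected value sum_i p_i * x_i of the lottery (x_1..x_n; p_1..p_n)."""
    if len(prizes) != len(probs):
        raise ValueError("need one probability per prize")
    if sum(probs) != 1 or any(p < 0 or p > 1 for p in probs):
        raise ValueError("probabilities must lie in [0, 1] and sum to 1")
    return sum(p * x for x, p in zip(prizes, probs))


def classify_bet(w, l, p1):
    """Classify the bet (w, l; p1, 1 - p1), with w and l measured as net gains."""
    value = actuarial_value([w, l], [p1, 1 - p1])
    if value == 0:
        return "fair"
    return "better than fair" if value > 0 else "unfair"


half = Fraction(1, 2)
# The bet in the text: win $100,000 on heads, lose $100,000 on tails.
print(classify_bet(100_000, -100_000, half))          # fair
# Final wealth if the bet is taken: $200,000 or $0, with the same mean as before.
print(actuarial_value([200_000, 0], [half, half]))    # 100000
```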
A fair coin is flipped until it comes up H (after that, no more flips). If this happens on the 1st toss, you get $2. If it happens on the 2nd toss, you get $4. If it happens on the \(t\)th toss, you get \(2^t\) dollars.
The probability it comes up H on the 1st toss is \(\tfrac12\).
The probability it comes up H for the first time on the 2nd toss is \(\tfrac12 \times \tfrac12 = \tfrac14\).
The probability it comes up H for the first time on the \(t\)th toss is \(\tfrac{1}{2^t}\).
So the lottery’s actuarial value is
\[ 2 \cdot \tfrac12 + 4 \cdot \tfrac14 + 8 \cdot \tfrac18 + \cdots = 1 + 1 + 1 + \cdots = \infty. \]
The lottery has infinite actuarial value. But few people would pay even $1,000 for it. This contrast has a paradoxical feel to it.

Nicolaus Bernoulli’s cousin Daniel Bernoulli proposed a resolution. He said that increments of money become less important, or have lower marginal utility, the more money a person has. He suggested that a person maximizes his expected utility, not his expected actuarial value. If you maximize \(\sum_{i=1}^{\infty} p_i u(x_i)\) and \(u\) is sufficiently concave, then the sum is finite, so you would pay only a limited amount for the lottery after all.

People don’t care just about actuarial value, but about how that value is spread around. So you might object (to Daniel Bernoulli): maybe people don’t care just about the expected value of utility, but about how that utility is spread around. This question remained open for more than two hundred years. In the early 1940s, von Neumann and Morgenstern gave it an elegant answer as part of their development of game theory.

The von Neumann-Morgenstern (vN-M) Theorem gives conditions (axioms) under which an individual acts as though he is maximizing the expectation of some utility function.
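The contrast between infinite actuarial value and finite expected utility can be checked by truncating the series. The sketch below assumes a logarithmic utility of the prize alone, ignoring initial wealth for simplicity. This is one illustrative concave choice; Daniel Bernoulli himself suggested a logarithmic form. With u = ln, the series converges to 2 ln 2 = ln 4, so the certainty equivalent is $4.

```python
import math


def st_petersburg_sum(T, u=lambda x: x):
    """Partial sum over t = 1..T of p_t * u(x_t), where the first head on toss t
    has probability p_t = 2**-t and pays x_t = 2**t dollars."""
    return sum((0.5 ** t) * u(2 ** t) for t in range(1, T + 1))


# Actuarial value: every term contributes exactly 1, so partial sums grow without bound.
print([st_petersburg_sum(T) for T in (10, 20, 40)])   # [10.0, 20.0, 40.0]

# Expected utility with u = ln converges (to 2 ln 2 = ln 4).
eu = st_petersburg_sum(100, math.log)
certainty_equivalent = math.exp(eu)   # sure amount giving the same utility
print(round(certainty_equivalent, 6))  # 4.0
```

Truncating at 100 terms is harmless for the log case, because the neglected tail is smaller than 1e-25.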
That is, there exists \(u : \mathbb{R}_+ \to \mathbb{R}\) such that for any two lotteries \(L = (x_1, \dots, x_n; p_1, \dots, p_n)\) and \(L' = (x'_1, \dots, x'_m; p'_1, \dots, p'_m)\),
\[ L \succsim L' \iff \sum_{i=1}^{n} p_i u(x_i) \ge \sum_{j=1}^{m} p'_j u(x'_j). \]
In other words, a person who satisfies the (rather modest) vN-M axioms is an expected utility maximizer. We say \(u\) represents his preferences with the expected utility property.

Whether a consumer likes risk or not is a matter of taste. We say he is risk averse if he refuses all fair bets. More precisely, starting from any non-random wealth level, he always (at least) weakly prefers to reject any fair bet. One can show this is equivalent to having a concave utility function.

General increasing transformations do not preserve concavity. For example, \(u(x) = \sqrt{x}\) is strictly concave, but \(v(x) = (u(x))^4 = x^2\) is strictly convex. However, strictly increasing linear (affine) transformations such as \(w(x) = a u(x) + b\) with \(a > 0\) do preserve concavity or convexity. The \(u\) in the vN-M Theorem is unique only up to strictly increasing linear transformations.

In fact, most of the decisions taken by a consumer, a firm or a government involve risk. How toxic is a fruit I am buying? How well will buyers respond to a new ad campaign? Will the stock market crash this week? Will someone crash into my car on the way to work? As a result, expected utility theory (and its elaborations) plays a huge role in consumption analysis, the theory of the firm, finance, and so on. Let’s look at one simple application to the insurance industry.

Insurance

A driver with initial wealth \(w\) and strictly concave vN-M utility function \(u\) will have an accident, with damages of \(d\) dollars, with probability \(p\). An insurance company offers him insurance on the following terms: he can choose any \(x \in [0, d]\) and pay \(rx\) up front.
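The three utility functions above can be compared on the $100,000 fair bet from the earlier example. The sketch shows three things:
- the concave \(u = \sqrt{x}\) rejects the bet;
- the positive affine transform \(3u + 7\) (coefficients chosen arbitrarily for illustration) ranks the two options the same way;
- the increasing but non-affine \(v = u^4\) reverses the ranking.

This is why the vN-M \(u\) is pinned down only up to strictly increasing linear transformations.

```python
import math


def expected_utility(u, outcomes, probs):
    """Expected utility sum_i p_i * u(x_i) of the lottery (outcomes; probs)."""
    return sum(p * u(x) for x, p in zip(outcomes, probs))


wealth = 100_000.0
reject = ([wealth], [1.0])                # keep the sure wealth
accept = ([2 * wealth, 0.0], [0.5, 0.5])  # the fair bet, expressed as final wealth

candidates = {
    "u = sqrt(x)": math.sqrt,                       # strictly concave
    "w = 3u + 7": lambda x: 3 * math.sqrt(x) + 7,   # positive affine transform of u
    "v = u**4": lambda x: math.sqrt(x) ** 4,        # increasing, not affine: equals x**2
}

decisions = {}
for name, f in candidates.items():
    decisions[name] = expected_utility(f, *reject) > expected_utility(f, *accept)
    print(name, "rejects the bet" if decisions[name] else "accepts the bet")
```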
If he has an accident, he will receive a payment of \(x\). What value of \(x\) should he choose? The choice variable \(x\) is like a quantity of insurance to buy, and \(r\) is the price. If \(r\) is outrageously high, he might prefer the corner solution \(x = 0\) (go uninsured). But if he chooses an interior solution, it sets the first derivative of expected utility with respect to \(x\) equal to 0. Consumption with no accident is \(w - rx\), and consumption after an accident is \(w + (1 - r)x - d\), so
\[ \frac{\mathrm{d}}{\mathrm{d}x}\Big[(1 - p)\,u(w - rx) + p\,u(w + (1 - r)x - d)\Big] = 0, \]
\[ -r(1 - p)\,u'(w - rx) + (1 - r)p\,u'(w + (1 - r)x - d) = 0, \qquad (22) \]
\[ \frac{u'(w - rx)}{u'(w + (1 - r)x - d)} = \frac{(1 - r)p}{(1 - p)r}. \qquad (23) \]

Suppose, for example, that the insurance is fair, that is,
\[ \underbrace{rx}_{\text{payment if no accident}}(1 - p) = p\,\underbrace{(1 - r)x}_{\text{net payment if accident occurs}}, \]
that is, \(p = r\). Then the right-hand side of (23) is 1, so the left-hand side is 1 as well. Strict concavity of \(u\) implies the numerator can equal the denominator only if their arguments are equal: \(w - rx = w + (1 - r)x - d\), that is, \(x = d\). He has fully insured, and his consumption is unaffected by the accident. But if insurance is unfair (\(r > p\)), then the right-hand side is less than 1, so \(u'(w - rx) < u'(w + (1 - r)x - d)\). Since \(u'\) is decreasing, this means \(x < d\): he has less than fully insured, and consumption falls if there is an accident.

The simple analysis above leaves out two extremely important considerations: moral hazard and adverse selection. Both arise in situations of asymmetric information: people don’t all know the same things.

Moral hazard refers to the fact that after a contract is signed, one party (or more) may have an incentive to act in a way that damages one or more other parties. In our insurance example, if I’m fully insured against collision damages, I have a lot less reason to drive carefully in a parking lot than if I’m uninsured. Notice this means \(p\) depends on my behavior! The more insurance you sell me, the more careless I get, and the higher \(p\) is. How can an insurance company respond to moral hazard?
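The first-order condition (22) can be solved numerically to see the full-insurance and partial-insurance results. The sketch makes the following assumptions:
- \(u = \ln\), one strictly concave choice;
- \(0 < r < 1\) and \(w > d\), so consumption stays positive on \([0, d]\);
- the wealth, damage and probability values are hypothetical.

Because expected utility is strictly concave in \(x\), the left-hand side of (22) is strictly decreasing. Bisection on its sign therefore finds the optimum, and the corners are handled first.

```python
def optimal_coverage(w, d, p, r, tol=1e-10):
    """Coverage x in [0, d] maximizing (1-p)*ln(w - r*x) + p*ln(w + (1-r)*x - d).

    Assumes log utility, 0 < r < 1 and w > d. The derivative below (the
    left-hand side of (22) with u'(c) = 1/c) is strictly decreasing in x.
    """
    def marginal_eu(x):
        return -r * (1 - p) / (w - r * x) + (1 - r) * p / (w + (1 - r) * x - d)

    if marginal_eu(0.0) <= 0:   # even the first unit of cover lowers EU: stay uninsured
        return 0.0
    if marginal_eu(d) >= 0:     # still wants more at x = d: insure fully
        return d
    lo, hi = 0.0, d
    while hi - lo > tol:        # bisect on the sign of the derivative
        mid = 0.5 * (lo + hi)
        if marginal_eu(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


w, d, p = 100.0, 50.0, 0.1                 # hypothetical wealth, damage, accident probability
print(optimal_coverage(w, d, p, r=0.10))   # fair price r = p: full cover, x = d
print(optimal_coverage(w, d, p, r=0.15))   # unfair price r > p: partial cover, x < d
print(optimal_coverage(w, d, p, r=0.50))   # outrageous price: corner x = 0
```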
It can raise the price for higher amounts of coverage. And it can charge a lot for kinds of coverage that are likely to affect behavior a lot. The reason you don’t drive 100 mph on I-95 has little to do with being under-insured, and much to do with fear of death or loss of license. But if you have “full glass” coverage, you are much more likely to park on the street instead of in a guarded lot, and so on.

Moral hazard has major implications beyond the insurance industry. If you own a house, you look after it more carefully than if you are renting, because the property whose value you are affecting is yours. The same goes for rental cars, bikes and so on. If you are a wealthy landowner in a developing country, with workers cultivating your crops, why should they work as hard (irrigating, weeding, guarding against pests, ...) as if it were their land and crops? This is the motivation for sharecropping, where the tenant farmer gets part of what is produced.

Adverse selection refers to contracts attracting some kinds of participants more than others. In the insurance context, expensive and comprehensive health insurance is more attractive to someone with complex health problems than to someone who seldom seems to need a doctor. And regarding car accidents, some drivers are more prone to them than others. Maybe I’ve had no accidents, but many close calls, so I understand what the company doesn’t: I’m a poor risk. So I buy more insurance than a typical person would.

What do insurance companies do about this? First, they take it into account when they choose contract prices: if these were based on estimates of average incidence in the whole population, they would lose a lot of money. It is the “high-p” types who rush to buy.
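The adverse selection problem can be made concrete with the insurance model above. With \(u = \ln\), solving (22) for \(x\) gives \(x = \frac{(1-r)pw - r(1-p)(w-d)}{r(1-r)}\). Clipping that value to \([0, d]\) gives the optimal coverage, since the objective is concave (assuming \(0 < r < 1\), \(w > d\), and parameters like those below).

The sketch assumes a hypothetical population: half low risks and half high risks. The insurer charges everyone the population-average rate. The low risks buy nothing, the high risks buy full cover, and the insurer loses money on average.

```python
def log_utility_coverage(w, d, p, r):
    """Optimal coverage for u = ln: the stationary point of (22), clipped to [0, d].
    Clipping is valid because the objective is concave (assumes 0 < r < 1, w > d)."""
    x = ((1 - r) * p * w - r * (1 - p) * (w - d)) / (r * (1 - r))
    return min(max(x, 0.0), d)


def expected_profit_per_customer(w, d, types, r):
    """Insurer's expected profit per customer when every type faces the rate r.
    types is a list of (population share, accident probability) pairs."""
    profit = 0.0
    for share, p in types:
        x = log_utility_coverage(w, d, p, r)
        profit += share * (r - p) * x   # premium r*x minus expected payout p*x
    return profit


# Hypothetical population: half low risks (p = 0.05), half high risks (p = 0.25).
types = [(0.5, 0.05), (0.5, 0.25)]
pooled_rate = sum(share * p for share, p in types)   # about 0.15, the population average
for share, p in types:
    print("p =", p, "buys coverage", log_utility_coverage(100.0, 50.0, p, pooled_rate))
print("profit per customer:", expected_profit_per_customer(100.0, 50.0, types, pooled_rate))
```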
Secondly, they are very cautious when introducing a new kind of coverage, because again their population information can lead them badly astray. Finally, they may offer “menus of contracts”, some with high coverage at very high rates (to attract the bad risks) and others with lower coverage that attract lower-risk patrons.

Like moral hazard, adverse selection is a powerful consideration beyond insurance provision. If I’m buying a used car, I know a lot less about it than the seller does. The fact that it is for sale is already bad news: maybe it’s got problems and the seller wants to dump it. Akerlof wrote about “lemons” in 1970, referring not to fruit but to cars of low quality.

Huge oil companies bidding on rights to exploit an oil tract are not sure how much oil is there, or how accessible it is. Each gets to do some sample drilling. If my bid wins, it’s a kind of bad news: no one else’s samples were as encouraging as mine. This is known as “the winner’s curse”. On average the auction gives me an “adverse selection”, that is, a tract about which the others’ information is less favorable than mine.

Remind me to discuss the adverse selection on Bumble, related to a wisecrack of Groucho Marx.

Game Theory

Game theory is a way of studying formally the interaction of different persons, groups or organizations. This could be two countries negotiating a trade agreement, or five firms deciding how to price their competing products in a market, or ninety auction participants deciding how to bid. The two major branches of game theory approach these questions rather differently. Noncooperative game theory (which, despite its name, admits the possibility of cooperation) tries to describe a strategic situation in detail, with careful attention to timing and information.
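A small simulation shows the winner’s curse. The sketch assumes a hypothetical common-value setting: each bidder’s estimate is the tract’s true value plus independent normal noise, so every individual estimate is unbiased. The highest estimate wins. It measures how far the winning estimate overshoots the true value on average. This is not a model of equilibrium bidding, only of the selection effect.

```python
import random


def winners_curse_bias(n_bidders, noise_sd, trials, seed=0):
    """Average overshoot (winning estimate minus true value) when each bidder's
    estimate is the common true value plus independent normal noise and the
    highest estimate wins."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0.0
    for _ in range(trials):
        value = rng.uniform(50.0, 150.0)  # true value of the tract, same for everyone
        estimates = [value + rng.gauss(0.0, noise_sd) for _ in range(n_bidders)]
        total += max(estimates) - value
    return total / trials


bias_one = winners_curse_bias(1, 10.0, 20_000)
bias_five = winners_curse_bias(5, 10.0, 20_000)
print(round(bias_one, 2))   # near 0: a single estimate is unbiased
print(round(bias_five, 2))  # well above 0: winning is bad news about one's estimate
```

The more bidders there are, the more the winning estimate is selected for being optimistic. This is the sense in which winning carries bad news.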
Cooperative game theory takes a broader view, modelling with a softer focus and applying general principles to suggest what should happen. We will study noncooperative theory here.

Simultaneous Noncooperative Games

There are \(N\) players, who make choices simultaneously. Each player \(i\) has a set \(S_i\) of pure strategies. Think of a pure strategy as a deterministic, or non-random, action that player \(i\) might take. A strategy \(s_i \in S_i\) could be multidimensional: for a firm, it might be “stay in the market, invest in a new mode of production, and produce 8,000 units of output”. Let
\[ S = S_1 \times S_2 \times \cdots \times S_N = \{(x_1, \dots, x_N) \mid x_i \in S_i,\ i = 1, \dots, N\}. \]
An element \(x \in S\) is called a strategy profile; it is a vector specifying a particular pure strategy for each player. Each player \(i\) has a utility function \(u_i : S \to \mathbb{R}\) that associates with any pure strategy profile \(x \in S\) a utility \(u_i(x)\). These are von Neumann-Morgenstern utils: we assume that \(i\) seeks to maximize expected utility.

The normal form of a game is written
\[ G = (S_1, \dots, S_N; u_1, \dots, u_N). \]
This describes the available alternatives for each player, and the payoff consequences for each player of each strategy profile. If \(N = 2\) and there are not too many strategies, it may be easiest to describe \(G\) by drawing its (bi)matrix, with 1’s strategies corresponding to rows and 2’s strategies corresponding to columns. In the cell corresponding to the \(i\)th row and \(j\)th column, write the payoff pair \(u_1(a_i, b_j), u_2(a_i, b_j)\).

Example (game G; player 1 chooses rows, player 2 chooses columns):
\[ \begin{array}{c|cc} & b_1 & b_2 \\ \hline a_1 & 5,4 & 2,8 \\ a_2 & 0,0 & 1,9 \\ a_3 & 3,1 & 4,7 \end{array} \]
This represents a game with \(S_1 = \{a_1, a_2, a_3\}\) and \(S_2 = \{b_1, b_2\}\). The entry 4, 7 on the lower right says that players 1 and 2, respectively, get 4 and 7 if 1 chooses \(a_3\) and 2 chooses \(b_2\).
A player \(i\) might wish to choose randomly. For example, she could construct a spinner, and make the proportion of the circumference corresponding to a particular strategy \(s_i \in S_i\) equal to the probability with which she wishes to play \(s_i\). For example, maybe she wants to play \(s_i\) with probability 1/2, \(s'_i\) with probability 1/8, and \(s''_i\) with probability 3/8.
[Figure: a spinner divided into sectors labelled \(s_i\), \(s'_i\) and \(s''_i\).]

Definition. A mixed strategy \(m_i \in M_i\) for player \(i\) is a probability distribution over the set of pure strategies of \(i\). (This includes “degenerate” or “trivial” mixed strategies that put all their weight on one pure strategy.) Let \(M_i\) be the set of all of \(i\)’s mixed strategies, and \(M = M_1 \times \cdots \times M_N\).

Notice that, in the previous example, if player 1 thought 2 was playing the mixed strategy \((\tfrac14, \tfrac34)\), that is, playing the first column with probability 1/4, then 1’s expected utility from choosing the third row, for example, is \(\tfrac14(3) + \tfrac34(4) = \tfrac{15}{4}\). More generally, we can extend the utility functions \(u_i\) to the domain \(M = M_1 \times \cdots \times M_N\) (so \(u_i : M \to \mathbb{R}\)) by an expected utility calculation.

Now we’ll use our \((\theta, x_{-i})\) notation from earlier lectures.

Definition. \(x_i \in M_i\) is weakly dominated by \(y_i \in M_i\) if
\[ u_i(y_i, m_{-i}) \ge u_i(x_i, m_{-i}) \text{ for all } m \in M, \quad\text{and}\quad u_i(y_i, m'_{-i}) > u_i(x_i, m'_{-i}) \text{ for at least one } m' \in M. \]
If strict inequality holds everywhere above, we say \(x_i\) is strictly dominated by \(y_i\).

These are ways of saying that \(y_i\) is a better strategy than \(x_i\). Under strict dominance, \(y_i\) always does strictly better than \(x_i\) does, no matter what others do. That doesn’t mean that \(i\) should necessarily choose \(y_i\), just that \(y_i\) is a better choice than \(x_i\) (maybe some \(z_i\) does even better!).
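The expected-utility extension and the strict-dominance test can both be checked on player 1’s payoffs in game G. The dominance check compares pure rows against every pure column only. Because expected utility is linear in the opponent’s mixture, that is enough to cover all of 2’s mixed strategies. Dominance by a mixed \(y_i\) is not tested here. The dictionary layout is an illustrative choice.

```python
from fractions import Fraction

# Player 1's payoffs in game G: rows a1, a2, a3; entries for columns (b1, b2).
U1 = {"a1": (5, 2), "a2": (0, 1), "a3": (3, 4)}


def expected_payoff(row_payoffs, column_mix):
    """Player 1's expected utility from a pure row against 2's mixed strategy."""
    return sum(q * u for u, q in zip(row_payoffs, column_mix))


def strictly_dominates(y, x):
    """True if row y beats row x against every pure column; by linearity of
    expected utility, it then beats x against every mixed column strategy too."""
    return all(uy > ux for uy, ux in zip(U1[y], U1[x]))


mix = (Fraction(1, 4), Fraction(3, 4))   # 2 plays b1 with probability 1/4
print({row: str(expected_payoff(pay, mix)) for row, pay in U1.items()})
print(strictly_dominates("a3", "a2"), strictly_dominates("a3", "a1"))   # True False
```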
Contrast this with a dominant strategy, which involves a comparison with all strategies.

Definition. \(d_i \in M_i\) is a dominant strategy for \(i\) if for all \(m \in M\),
\[ u_i(d_i, m_{-i}) \ge u_i(x_i, m_{-i}) \quad \forall x_i \in M_i. \]
If player \(i\) has a dominant strategy, she can play it without worrying about what others are doing: \(d_i\) maximizes her expected utility against every profile of strategies for the others.

Example: Prisoner’s Dilemma

This is perhaps the most famous of all games (player 1 chooses rows, player 2 chooses columns):
\[ \begin{array}{c|cc} & C & D \\ \hline c & 10,10 & 0,11 \\ d & 11,0 & 1,1 \end{array} \]
Two criminals have been caught in the act of a minor crime. The prosecuting attorney knows they have committed something much more serious as well, but does not have admissible evidence to prove it. Either defendant can testify conclusively to the (serious) guilt of the other. If they both testify, they get 1 util each. If they keep quiet (c, for “cooperate with each other”, as opposed to d for “defect from cooperation”), they both get 10. If one testifies and the other doesn’t, the defector gets 11 and the cooperator gets 0.

It is easy to check that d is a (strictly) dominant strategy for 1, and D is a (strictly) dominant strategy for 2. So the dilemma is not any difficulty in figuring out what to do; it’s that the obvious result will be (d; D). Notice this is Pareto-dominated by (c; C).

In most games, it is not so clear what each player should do. In such situations, game theorists often look for a strategy profile which, if it were expected by all players, would not lead to any contradictions, that is, would not give any player a reason to switch to a different strategy.

Definition. \(m \in M\) is a Nash Equilibrium (NE) if for every \(i\),
\[ u_i(m) \ge u_i(x_i, m_{-i}) \quad \forall x_i \in M_i. \]
One can think of \(m\) as a plan (commonly understood) from which no individual can profitably deviate unilaterally.
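For small bimatrix games, the pure-strategy Nash equilibria can be found by checking every cell for a profitable unilateral deviation. For a pure profile, checking pure deviations is enough, since a mixed deviation cannot beat the best pure one. The sketch below applies this brute-force check to the Prisoner’s Dilemma. The function name and the dictionary layout are illustrative.

```python
from itertools import product


def pure_nash_equilibria(rows, cols, payoffs):
    """All pure-strategy Nash equilibria of a bimatrix game.
    payoffs[(r, c)] = (u1, u2) for row r of player 1 and column c of player 2."""
    equilibria = []
    for r, c in product(rows, cols):
        u1, u2 = payoffs[(r, c)]
        best_for_1 = all(payoffs[(r2, c)][0] <= u1 for r2 in rows)   # no better row
        best_for_2 = all(payoffs[(r, c2)][1] <= u2 for c2 in cols)   # no better column
        if best_for_1 and best_for_2:
            equilibria.append((r, c))
    return equilibria


prisoners_dilemma = {
    ("c", "C"): (10, 10), ("c", "D"): (0, 11),
    ("d", "C"): (11, 0), ("d", "D"): (1, 1),
}
print(pure_nash_equilibria(["c", "d"], ["C", "D"], prisoners_dilemma))  # [('d', 'D')]
```

The same function finds both pure equilibria of a coordination game and finds none in matching pennies.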
Example:
\[ \begin{array}{c|cc} & b_1 & b_2 \\ \hline a_1 & 3,3 & 0,0 \\ a_2 & 0,0 & 2,2 \end{array} \]
In this game, \((a_1, b_1)\) is an NE: if anyone deviated to a different pure strategy, she would get 0 instead of 3. Note: \((a_2, b_2)\) is also an NE, for exactly the same reason (a deviation would yield 0 instead of 2)! Of course there is a “bilateral deviation” (let’s both switch to our respective first strategies) that is profitable. But NE ignores that.

Consider the game “matching pennies”:
\[ \begin{array}{c|cc} & H & T \\ \hline h & 1,-1 & -1,1 \\ t & -1,1 & 1,-1 \end{array} \]
If they both choose heads, or both choose tails, 2 pays 1 one dollar. Otherwise, 1 pays 2 one dollar. Assume they are risk neutral. Notice there is no NE in pure strategies: from any cell in the matrix, someone wants to switch. BUT both of them mixing 50/50 is an NE! More formally, \((.5, .5; .5, .5)\) is an NE. Why? If you are equally likely to play heads or tails, then I don’t care which one I play (I’m equally likely to match you either way). Note that the only reason player \(i\) would ever put strictly positive weight on two pure strategies is that they give her the same expected utility.

Kuhn and Tucker are most widely known for their work in optimization (the Kuhn-Tucker conditions). But in the early 1950s, in Princeton’s math department, they did influential work on game theory. Tucker invented a more interesting game than matching pennies that helps us understand pure and mixed NE. It is called Battle of the Sexes (H chooses rows, W chooses columns; H’s payoff is listed first):
\[ \begin{array}{c|cc} & B & O \\ \hline b & 2,1 & 0,0 \\ o & 0,0 & 1,2 \end{array} \]
A stereotypical 1950s couple have left home without arranging whether they will meet that evening at the boxing match (the husband’s favored choice) or at the opera (preferred by the wife). There were no cell phones. What will they do? Both going to the boxing match is an NE; both going to the opera is an NE. Neither of these is symmetric, although the game is.
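The indifference principle just stated gives a recipe for a fully mixed equilibrium of a 2×2 game. Choose the column player’s probability so that the row player is indifferent between her rows, and vice versa. The sketch assumes integer payoffs and that a fully mixed equilibrium exists (nonzero denominators, probabilities strictly between 0 and 1). It returns exact fractions. It is shown on matching pennies and, for comparison, on Battle of the Sexes.

```python
from fractions import Fraction


def mixed_equilibrium_2x2(A, B):
    """Fully mixed equilibrium of a 2x2 game from the indifference conditions.
    A[i][j], B[i][j] are the (integer) payoffs of the row and column player at
    row i, column j. Returns (p, q) with p = Pr(row 0) and q = Pr(column 0)."""
    # q makes the row player indifferent: q*A00 + (1-q)*A01 = q*A10 + (1-q)*A11
    q = Fraction(A[1][1] - A[0][1], A[0][0] - A[0][1] - A[1][0] + A[1][1])
    # p makes the column player indifferent: p*B00 + (1-p)*B10 = p*B01 + (1-p)*B11
    p = Fraction(B[1][1] - B[1][0], B[0][0] - B[1][0] - B[0][1] + B[1][1])
    return p, q


# Matching pennies: rows h, t for player 1; columns H, T for player 2.
p, q = mixed_equilibrium_2x2([[1, -1], [-1, 1]], [[-1, 1], [1, -1]])
print(p, q)   # 1/2 1/2

# Battle of the Sexes: rows b, o for H; columns B, O for W.
p, q = mixed_equilibrium_2x2([[2, 0], [0, 1]], [[1, 0], [0, 2]])
print(p, q)   # 2/3 1/3
```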
What about mixed equilibria? Let \(p\) be the probability H plays b, and \(q\) be the probability that W plays B. Since H must be indifferent between b and o (if he is to mix),
\[ EU_{\text{boxing}} = 2q + 0(1 - q) = 0q + 1(1 - q) = EU_{\text{opera}} \implies q = \tfrac13. \]
Similarly for W,
\[ EU_{\text{boxing}} = 1p + 0(1 - p) = 0p + 2(1 - p) = EU_{\text{opera}} \implies p = \tfrac23. \]
Notice this is a symmetric equilibrium (each goes to his or her favorite activity two thirds of the time). But it is Pareto-dominated by each of the pure NE!

Now that we know the mixed NE, we could figure out the likelihood of landing in each of the four respective cells, and calculate each player’s EU that way. But it’s much simpler to compute H’s EU, for example, by choosing either of his rows and evaluating that (they both yield the same EU!). For example, playing the first row gives him \(2 \cdot \tfrac13 + 0 \cdot \tfrac23 = \tfrac23\), so his EU in this NE is 2/3 (and the same for W).

In larger games, finding all the NE can be difficult. Sometimes one can begin by eliminating a strictly dominated strategy for some player; once that row is crossed out, maybe some column becomes strictly dominated in the smaller game that is left. This is called “iterative dominance”. Consider
\[ \begin{array}{c|ccc} & b_1 & b_2 & b_3 \\ \hline a_1 & 5,4 & 1,6 & 10,0 \\ a_2 & 2,7 & 0,7 & 1,8 \\ a_3 & 1,6 & 5,4 & 10,0 \end{array} \]
Here, notice that \(a_2\) is strictly dominated by \(a_1\). If we cross out \(a_2\), because it can never get weight in any equilibrium, we are left with the \(2 \times 3\) game
\[ \begin{array}{c|ccc} & b_1 & b_2 & b_3 \\ \hline a_1 & 5,4 & 1,6 & 10,0 \\ a_3 & 1,6 & 5,4 & 10,0 \end{array} \]
in which \(b_3\) is strictly dominated (by both \(b_1\) and \(b_2\)). Cross \(b_3\) out to get the \(2 \times 2\) game
\[ \begin{array}{c|cc} & b_1 & b_2 \\ \hline a_1 & 5,4 & 1,6 \\ a_3 & 1,6 & 5,4 \end{array} \]
which can be solved by \((p, q)\) calculations.

Why should we expect an NE to occur whenever some game is played? (In general, I would say we shouldn’t be too sure that equilibrium will be attained.)
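The crossing-out procedure can be automated. The sketch repeatedly deletes any pure strategy that is strictly dominated by another remaining pure strategy, and stops when nothing more can be removed. It does not test dominance by mixed strategies, so in general it may stop earlier than full iterated strict dominance would. On the 3×3 example it reproduces the two steps described above.

```python
def iterated_strict_dominance(rows, cols, payoffs):
    """Repeatedly delete pure strategies strictly dominated by another remaining
    pure strategy. payoffs[(r, c)] = (u1, u2). Mixed dominators are not checked."""
    rows, cols = list(rows), list(cols)
    changed = True
    while changed:
        changed = False
        for x in rows:   # a row x dominated by some other remaining row y
            if any(all(payoffs[(y, c)][0] > payoffs[(x, c)][0] for c in cols)
                   for y in rows if y != x):
                rows.remove(x)
                changed = True
                break    # restart the scan after modifying the list
        for x in cols:   # a column x dominated by some other remaining column y
            if any(all(payoffs[(r, y)][1] > payoffs[(r, x)][1] for r in rows)
                   for y in cols if y != x):
                cols.remove(x)
                changed = True
                break
    return rows, cols


game = {
    ("a1", "b1"): (5, 4), ("a1", "b2"): (1, 6), ("a1", "b3"): (10, 0),
    ("a2", "b1"): (2, 7), ("a2", "b2"): (0, 7), ("a2", "b3"): (1, 8),
    ("a3", "b1"): (1, 6), ("a3", "b2"): (5, 4), ("a3", "b3"): (10, 0),
}
print(iterated_strict_dominance(["a1", "a2", "a3"], ["b1", "b2", "b3"], game))
# (['a1', 'a3'], ['b1', 'b2'])
```

In the remaining 2×2 game, the indifference conditions (\(5q + 1(1-q) = 1q + 5(1-q)\) and \(4p + 6(1-p) = 6p + 4(1-p)\)) give \(p = q = \tfrac12\).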
Here are some considerations:
1. Players may be able to deduce what will happen, for example by iterative strict dominance. Then we expect NE.
2. There may be a tradition, in a society, that a game is played in a certain way. That tradition needs to be an NE (otherwise, people would not follow it).
3. Even if there is no tradition, because the game is rarely ever played, something about the game may make some strategy profile “salient” (Thomas Schelling, 1960).

For example, the game G below has many NE, and we cannot prove that any particular profile will be played. But most of us would guess that \((a_1; b_1)\) is the likely outcome.
\[ \begin{array}{c|cccc} & b_1 & b_2 & b_3 & b_4 \\ \hline a_1 & 10,10 & 0,0 & 0,0 & 0,0 \\ a_2 & 0,0 & 1,1 & 0,0 & 0,0 \\ a_3 & 0,0 & 0,0 & 1,1 & 0,0 \\ a_4 & 0,0 & 0,0 & 0,0 & 1,1 \end{array} \]
How about a game like matching pennies? It has a unique NE. Can we be sure it will be played? Notice that if 2 plays 50/50, 1 doesn’t care what he does. So playing h for sure is as good as anything else. Why should he randomize? If he doesn’t, 2 doesn’t know that he isn’t! So she can’t take advantage of his non-randomization.

Linear Symmetric Cournot Duopoly

A famous economic example of a simultaneous game was studied by Cournot in 1838! His “linear symmetric Cournot duopoly” has two firms with the same constant marginal cost \(c \ge 0\) (and no setup costs) simultaneously choosing quantities \(q_1\) and \(q_2\) to put on the market. The inverse demand function is \(p = a - q_1 - q_2\). Firm \(i\) takes firm \(j\)’s output \(q_j\) to be fixed, and chooses \(q_i\) to maximize \(i\)’s profit \(q_i p - c q_i\), that is,
\[ q_i(a - q_i - q_j) - c q_i. \]
The first-order condition (FOC) is
\[ (a - q_i - q_j) - q_i - c = 0, \qquad (24) \]
which can be rewritten as
\[ q_i = p - c \quad (\text{so } q_i = q_j). \qquad (25) \]
We see that (24) and (25) imply \(q_1 = q_2 = \frac{a - c}{3}\): substituting \(q_i = q_j = q\) into (24) gives \(a - 3q - c = 0\).

Notice that we could regard \(a - c\) as the “competitive output” (where \(p = MC\)).
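The FOC (24) gives each firm’s best response \(q_i = (a - c - q_j)/2\). Iterating the two best responses simultaneously is one way to see the equilibrium \((a - c)/3\) numerically. The sketch uses hypothetical values \(a = 100\), \(c = 10\). The iteration is only a computational device; the notes do not claim that firms actually adjust this way. For the linear duopoly it converges because each round halves the distance to the fixed point.

```python
def cournot_best_response(q_other, a, c):
    """Firm i's profit-maximizing output given the rival's output, from the
    FOC (a - q_i - q_j) - q_i - c = 0, i.e. q_i = (a - c - q_j) / 2, floored at 0."""
    return max((a - c - q_other) / 2, 0.0)


def cournot_equilibrium(a, c, iterations=100):
    """Iterate simultaneous best responses from zero output; in the linear
    duopoly the deviation from (a - c) / 3 shrinks by half each round."""
    q1 = q2 = 0.0
    for _ in range(iterations):
        q1, q2 = cournot_best_response(q2, a, c), cournot_best_response(q1, a, c)
    return q1, q2


a, c = 100.0, 10.0   # hypothetical demand intercept and marginal cost
q1, q2 = cournot_equilibrium(a, c)
print(q1, q2, (a - c) / 3)
```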
[Figure: inverse demand \(p = a - Q\) plotted against total output \(Q\), with marginal cost \(c\); the demand line meets \(p = c\) at the competitive output \(Q = a - c\) and meets the \(Q\) axis at \(a\).]
In Cournot’s model, the pure strategy NE has market output \(\tfrac23(a - c)\). One can redo the analysis for \(n\) symmetric firms, to find
\[ q_i = \frac{a - c}{n + 1}, \quad\text{so}\quad Q = \frac{n}{n + 1}(a - c), \]
which approaches the competitive output as \(n \to \infty\).

Linear Bertrand Duopoly

In 1883, discussing Cournot’s work, Bertrand pointed out that in many cases, firms set prices rather than quantities. We agreed that for a monopolist, these amount to the same thing. Bertrand showed dramatically that this is no longer true for two or more firms. (As a practical example, think of gas stations on opposite corners. They post their prices for the day, and then consumers respond by deciding how much to buy from each station.)

Assume the firms’ products are perfect substitutes. So if \(p_i < p_j\), no one will buy from firm \(j\). (The market is split 50/50 if \(p_i = p_j\).) Assume as before that marginal costs are \(c_i = c_j = c \ge 0\). Look for an NE in pure strategies. Notice that \((c; c)\) is an NE: decreasing price from \(c\) results in losses, and increasing price from \(c\), unilaterally, leaves you with no sales.
The description of an extensive form game includes a specification of: the players 1, 2, ..., N the origin (initial node), where the game begins who “moves” at the origin, and what choices (branches, edges) are available to her, each of which leads to a further (successor) node such a successor may be an endpoint of the game (terminal node) or a choice node, at which some player has choices available to him (further branches), and so on. David Pearce (NYU) Theory Track Micro Analysis Spring 2021 199 / 210 Information sets may be used to limit what a player knows about the history of play when she is at a particular choice node (in chess, you always know everything that has happened so far, but in most card games, such as bridge, you don’t) each terminal node, if reached, gives each player some payoff, measured in vN-M utils. All of this was rather vague; we’ll make it clear with some examples. David Pearce (NYU) Theory Track Micro Analysis Spring 2021 200 / 210 Often it is useful to describe an extensive form game by drawing the game tree. These can grow in any direction; I usually draw them from left to right. 2 a 3, 1, 5 b 2, 7, 0 c 1, 6, 2 e 5, 6, 4 b u 1 b d Γ1 b 3 In game Γ1 , player 1 moves first, at the origin. If she chooses u, then 2 gets to choose a or b. If instead 1 chooses d, then 3 gets to choose between c and e. The resulting payoffs for the respective players are indicated beside the various terminal nodes. David Pearce (NYU) Theory Track Micro Analysis Spring 2021 201 / 210 Without further markings on the tree, we know that the players can see where they are in the tree, when they make choices. (Later we’ll see examples of the opposite.) What should 1 do? Choosing u means she’ll get 3 or 2. Choosing d means she’ll get 1 or 5. 
But she can figure out what 2 and 3 would do, if reached, by thinking about their payoffs. (The assumption is that the game’s structure is “common knowledge”: everyone knows it, everyone knows that everyone knows it, everyone...) If reached, 2 would play b, to get 7, whereas if reached, 3 would play e, to get 4. That means 1 gets 2 by choosing u, and 5 by choosing d. So she should choose d, and (5, 6, 4) results. This procedure is called backward induction. You go to the end of the tree and see what the last players would do, then see what the penultimate players would do, then the antepenultimate ones, and so on.

If the 4 at the end of the tree had been a 2, player 3 would have been indifferent between c and e, and 1 wouldn’t have had reason to predict 3’s choice with confidence. Players have a strong basis for making their decisions when any indifference by a player at some node in the tree is shared by all earlier players (they’re not sure what she’ll do, but it doesn’t concern them).

What if a player doesn’t know where he is in the tree when he needs to make a choice? We draw an information set around all the points he cannot distinguish.
[Game tree Γ2: player 1 chooses a, b or c. Choice c ends the game with payoffs (0, 15). Choice a leads to node x and choice b to node y; x and y form a single information set of player 2, who chooses u, m or d. At x: u gives (3, 4), m gives (4, 3), d gives (2, 2). At y: u gives (4, 3), m gives (3, 4), d gives (2, 2).]
In Γ2, if 2 is called upon to play, he knows 1 did not choose c, but he can’t see whether she chose a or b. Backward induction will not succeed here.
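Backward induction on a finite tree with perfect information is a short recursion. The sketch encodes Γ1 as nested tuples, an illustrative representation: a terminal node is a payoff vector, and a decision node is (player, {action: subtree}). It returns the induced payoffs and the choice made at every decision node, keyed by the path of actions leading to it. Ties are broken by taking the first best action listed. That choice only matters when a player is indifferent, which is exactly the case discussed above where 3’s payoff of 4 is replaced by 2.

```python
GAMMA_1 = (1, {
    "u": (2, {"a": (3, 1, 5), "b": (2, 7, 0)}),
    "d": (3, {"c": (1, 6, 2), "e": (5, 6, 4)}),
})


def is_decision(node):
    """Decision nodes are (player, {action: subtree}); anything else is a payoff vector."""
    return len(node) == 2 and isinstance(node[1], dict)


def backward_induction(node, path=()):
    """Return (payoff vector, plan) where plan maps each decision node's path
    to the action chosen there. Players are numbered from 1."""
    if not is_decision(node):
        return node, {}
    player, moves = node
    best_action, best_payoffs, plan = None, None, {}
    for action, child in moves.items():
        payoffs, sub_plan = backward_induction(child, path + (action,))
        plan.update(sub_plan)   # keep choices at every node, reached or not
        if best_payoffs is None or payoffs[player - 1] > best_payoffs[player - 1]:
            best_action, best_payoffs = action, payoffs
    plan[path] = best_action
    return best_payoffs, plan


payoffs, plan = backward_induction(GAMMA_1)
print(payoffs)   # (5, 6, 4)
print(plan)      # {('u',): 'b', ('d',): 'e', (): 'd'}
```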
Example: 2, 2 u 5, 5 1 b a 1, 6 l d Γ3 b 2 b b 1 r 6, 1 A pure strategy for 2 is just a choice (a or b), so S2 = {a, b}. David Pearce (NYU) Theory Track Micro Analysis Spring 2021 204 / 210 But a strategy for 1 is something like: (d, r). It says she’ll start by choosing d, and then, in the unlikely event that she is reached at the end of the game, she will play r. Another strategy is (d, l). The two remaining pure strategies for 1 may make you frown. They are (u, l) and (u, r). What sense can you make of the strategy: “I choose u, and if I’m reached later, I will choose r.”?? Choosing u already makes it impossible for her to be reached later. The instruction “r, if reached” seems entirely redundant. Nonetheless, this is how strategies are defined, and it actually turns out to be useful, as you will see. David Pearce (NYU) Theory Track Micro Analysis Spring 2021 205 / 210 A pure strategy profile (s1 , ..., sN ) says what everyone will do, everywhere (at every information set). If you look at each node in the tree, see what the player’s strategy says there, and draw an arrow on the appropriate branch, you can start at the origin and the arrows will lead you to a terminal node, with its vector payoffs. So with each profile s ∈ S is associated a profile u(s) = (u1 (s), ..., uN (s)) of v.N.-M. payoffs. Notice that, starting with an extensive form game Γ, you can find the normal form G = (S1 , ..., Sn ; u1 , ..., uN ) of the game! Remark: If we allow for random moves by “Nature” (who determines how likely it is to rain, or for an accident to occur, and so on, and whose behavior is non strategic and involves fixed probabilities that are common knowledge), things are a little more complicated. A pure strategy profile (for the N actual players, not including Nature) now induces a probability distribution over terminal nodes. But one can still compute expected payoffs for each player, and hence the normal form. 
Nash equilibrium encounters some embarrassment in extensive form games. A Nash equilibrium strategy for some player \(i\) may involve an implicit “threat” that seems lacking in any credibility, at an information set not reached in equilibrium.
[Game tree Γ4: player 1 chooses a, ending the game with payoffs (4, 10), or b, after which player 2 chooses c, giving (8, 5), or d, giving (0, 0).]
Its normal form G is
\[ \begin{array}{c|cc} & c & d \\ \hline a & 4,10 & 4,10 \\ b & 8,5 & 0,0 \end{array} \]
Backward induction makes it clear that 2 would choose c if reached, and therefore 1 will choose b. And indeed, that is an NE of the game. But look at the normal form: notice that (a; d) is also an NE! No one can gain by deviating.

The problem is that d is “a best response to a” (maximizes 2’s EU against strategy a) at the beginning of the game, but not a best response from 2’s decision node onward. Once 2 is reached, the only utility-maximizing thing for her to do is to choose c. Selten (1965) pointed out that once 2’s decision node is reached, what remains of Γ4 is a game in its own right, with just one active player, player 2. In that subgame (as he called it), c is the only NE. He argued that (a; d) is not a sensible NE, as it fails the NE test on that subgame. More generally, consider some history of play leading to a choice node x. If you can remove all of the tree following x (from the big game) without ripping any information sets, call x and all that follows it a subgame.

There are two ways of thinking about player \(i\) randomizing in an extensive form game. First, he could play a mixed strategy, which, as before, corresponds to the random choice of a pure strategy. This is like spinning a spinner before the game starts, and following the pure strategy it points to for the whole game. (If you have two information sets, this coordinates your randomization over the two sets.) This is a “global” randomization. Instead, you could randomize “locally”.
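The embarrassment in Γ4 can be reproduced mechanically. A brute-force check of the normal form finds both (a; d) and (b; c). Solving the subgame after b first (2 compares c and d directly), and then letting 1 choose, keeps only (b; c). Reading the subgame payoffs off the normal-form dictionary is a shortcut that works here only because Γ4 is so small.

```python
from itertools import product

# Normal form of Γ4: payoffs[(s1, s2)] = (u1, u2); 2's choice matters only after b.
G4 = {("a", "c"): (4, 10), ("a", "d"): (4, 10),
      ("b", "c"): (8, 5), ("b", "d"): (0, 0)}


def pure_nash(payoffs, S1, S2):
    """Pure profiles from which neither player gains by a unilateral deviation."""
    return [(s1, s2) for s1, s2 in product(S1, S2)
            if all(payoffs[(t, s2)][0] <= payoffs[(s1, s2)][0] for t in S1)
            and all(payoffs[(s1, t)][1] <= payoffs[(s1, s2)][1] for t in S2)]


# Backward induction: in the subgame after b, player 2 picks her better action;
# player 1 then chooses anticipating that choice.
choice_2 = max(["c", "d"], key=lambda s2: G4[("b", s2)][1])
choice_1 = max(["a", "b"], key=lambda s1: G4[(s1, choice_2)][0])

print(pure_nash(G4, ["a", "b"], ["c", "d"]))   # [('a', 'd'), ('b', 'c')]
print((choice_1, choice_2))                    # ('b', 'c')
```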
Think of waiting until you are reached at a particular information set, and then spinning a spinner designed just for that eventuality. This is what a behavior strategy for player \(i\) does: it specifies what probability distribution he would use at each of his information sets. These distributions are independent, not “correlated” (they employ completely separate spinners).

Harold Kuhn, who introduced the most commonly used formulation of the extensive form, showed that it doesn’t matter whether players use mixed strategies or behavior strategies, as long as the game has perfect recall (no one ever forgets anything he has done or known). Since a behavior strategy for \(i\) tells him what to do everywhere in Γ, it tells him what to do in any subgame.

Definition (Selten 1965). A behavior strategy profile \(b\) is a subgame perfect equilibrium (SPE) of a game Γ if, for every subgame \(\tau\) of Γ, \(b\) induces an NE on the subgame \(\tau\).
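Independent spinners at each information set induce a particular mixed strategy: each pure strategy gets the product of its local probabilities. The sketch computes that induced mixed strategy for a hypothetical behavior strategy of player 1 in Γ3. The probabilities are made up for illustration. This shows only the easy direction, from behavior to mixed; Kuhn’s result about the converse under perfect recall is not implemented here.

```python
from fractions import Fraction
from itertools import product


def behavior_to_mixed(local_mixes):
    """Mixed strategy induced by a behavior strategy with independent spinners.
    local_mixes is a list of dicts, one per information set, mapping actions to
    probabilities. Each pure strategy gets the product of its local probabilities."""
    mixed = {}
    for combo in product(*(list(m.items()) for m in local_mixes)):
        strategy = tuple(action for action, _ in combo)
        prob = Fraction(1)
        for _, p in combo:
            prob *= p
        mixed[strategy] = prob
    return mixed


# Hypothetical behavior strategy for player 1 in Γ3:
# d with probability 3/4 at the origin, r with probability 1/3 at her later node.
m = behavior_to_mixed([{"u": Fraction(1, 4), "d": Fraction(3, 4)},
                       {"l": Fraction(2, 3), "r": Fraction(1, 3)}])
print(m[("d", "r")], sum(m.values()))   # 1/4 1
```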