
Intermediate Micro Notes

Theory Track Micro Analysis
David Pearce
New York University
Spring 2021
Introduction
The course begins by studying the behavior and welfare of individual
consumers, and then the decisions of competitive profit-maximizing
firms.
Their behavior is aggregated to derive demand and supply curves in
each market, allowing us to analyze perfect competition, first in
“partial equilibrium”, one market at a time, and then in general
equilibrium.
Although we start with ideal conditions (no informational problems or
impediments to competition), we gradually enrich the models to take
account of risk, asymmetric information and market power.
Game theory is used to study strategic situations: before deciding
upon an action or course of actions, a player may need to think about
what other players may do, and how they may react to what she does.
The course syllabus is posted on NYU Classes. It gives a more detailed
account of the topics covered in this course. In addition, it contains
information about:
the grading scheme
the date of the midterm (during class time) and the time and
location of the final exam
advice concerning the optional text book
contact information for our exceptional teaching assistant Gian
Luca Carniglia.
Consumer Theory
A consumer makes a myriad of economic decisions: how much to spend
on transportation, food, entertainment, and so on.
Abstractly, we think of her allocating the available funds among n
possible goods or commodities. How big is n? That depends on how
detailed we want to be.
Transportation could be broken down into car, bus, subway, bicycle,
airplane, and so on. But the item “car” could again be broken down
into all kinds of categories or even brands and models.
An applied economist would choose how finely to subdivide spending
categories, depending on the goals of the analysis.
Think of listing all kinds of commodity, from 1 to n (the order isn’t
important, as long as we stick to one).
Some standard mathematical notation will help us refer to different
collections of commodities that might be purchased.
$\in$    “is in the set”, “is an element of”
$\mathbb{R}$    the set of real numbers
$\forall$    for all, for every
$\exists$    there exist(s)
$\mid$    such that
$\mathbb{R}_+ = \{x \in \mathbb{R} \mid x \ge 0\}$    the set of nonnegative real numbers
$\mathbb{R}^n$    n dimensional Euclidean space
$\mathbb{R}^n_+ = \{x \in \mathbb{R}^n \mid x_i \ge 0,\ i = 1, 2, ..., n\}$
A commodity bundle is a vector $x \in \mathbb{R}^n_+$.
Example
If there are three commodities, say apples, oranges and bananas, then
the vector (2, 4, 0) refers to a bundle that has two apples, four oranges
and no bananas.
In this case a price vector would be a vector
$p \in \mathbb{R}^3_{++} = \{x \in \mathbb{R}^3 \mid x_i > 0,\ i = 1, 2, 3\}$. If having more of a good is
always desirable, its price has to be strictly positive, or else everyone
would buy an infinite amount of it.
From all the bundles she can afford, a consumer chooses one she likes
best (or at least as well as any other affordable bundle). She makes
this choice according to her preferences, which we now study formally.
Preferences
If a consumer likes bundle x at least as well as another bundle y, we
say “x is weakly preferred to y” and write $x \succsim y$. We usually assume
that her preferences satisfy the following two properties:
Completeness: for every two bundles x and y, either $x \succsim y$ or
$y \succsim x$ (or both!).
Transitivity: for every three bundles x, y and z, if $x \succsim y$ and $y \succsim z$
then $x \succsim z$.
If $x \succsim y$ and $y \succsim x$, we say “x is indifferent to y” (a lazy way of saying
she is indifferent between the two) and we write $x \sim y$. As a matter of
notation, $x \precsim y$ means the same as $y \succsim x$.
If $x \succsim y$ but it is NOT the case that $y \succsim x$, we write $x \succ y$. Here we
say “x is strictly preferred to y”.
For any bundle x, the indifference curve through x is the set
$$\sim(x) = \{y \in \mathbb{R}^n_+ \mid y \sim x\}.$$
Similarly we can define the upper contour set of x by
$$\succsim(x) = \{y \in \mathbb{R}^n_+ \mid y \succsim x\}.$$
For the lower contour set, replace $\succsim$ by $\precsim$.
$\succsim$ satisfies local nonsatiation if for every x in $\mathbb{R}^n_+$ there exists y
arbitrarily close to x such that $y \succ x$:
$$\forall x \in \mathbb{R}^n_+ \text{ and } \forall \varepsilon > 0,\ \exists y \in \mathbb{R}^n_+ \text{ such that } \|y - x\| < \varepsilon \text{ and } y \succ x$$
This assumption rules out “fat” indifference curves.
A stronger assumption that implies local nonsatiation is strict
monotonicity: $\succsim$ is strictly monotonic if for each $x, y \in \mathbb{R}^n_+$ with
$x \ge y$ and $x \ne y$, we have $x \succ y$.
Here $x \ge y$ means $x_i \ge y_i$, i = 1, 2, ..., n. So strict monotonicity just
means more is always better.
Often we assume that preferences are continuous. The preference
ordering $\succsim$ is continuous if for each x, the sets $\succsim(x)$ and $\precsim(x)$ are
closed (contain all their boundary points).
Recall that to call a function f continuous is basically to say that f(x)
doesn’t “jump” as x moves. Similarly, continuity of a preference
ordering concerns the lack of “jumpiness” of the preferences. This is
easiest to see in an example.
Example: Lexicographic preference ordering in $\mathbb{R}^2_+$
The preference ordering $\succsim$ on $\mathbb{R}^2_+$ is called lexicographic if $\forall x, y \in \mathbb{R}^2_+$,
$x \succsim y$ if and only if one (or both) of the following holds:
(i) $x_1 > y_1$ or
(ii) $x_1 = y_1$ and $x_2 \ge y_2$
Suppose x, y and z are given as follows:
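[Figure: the points x, y and z plotted in the plane, with tick marks at 1, 4 and 7 on the horizontal axis and at 1 and 2 on the vertical axis; the segment from y to z is colored red, then green.]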
As we move along the line from y to z, we are initially encountering
points that are strictly worse than x (red), but we transition to points
that are strictly better than x (green), without ever encountering a
point that is indifferent to x. We jump from strictly worse to strictly
better.
This is like the value of a function starting at 1 and moving to 3, say,
without ever taking on the value 2. This is possible only if the function
is discontinuous.
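The jump can be checked numerically. The sketch below is only an illustration: the coordinates x = (4, 2), y = (1, 1) and z = (7, 1) are an assumption read off the tick marks of the figure (the text does not state them), and the helper names are hypothetical. It walks along the segment from y to z and compares each point with x under the lexicographic ordering.

```python
# Assumed coordinates, guessed from the figure's tick marks (not given in the text).
x = (4.0, 2.0)
y = (1.0, 1.0)
z = (7.0, 1.0)

def lex_weakly_prefers(a, b):
    """True if a is weakly preferred to b under lexicographic preferences on R^2_+."""
    return a[0] > b[0] or (a[0] == b[0] and a[1] >= b[1])

def compare(a, b):
    """Return 'better', 'worse' or 'indifferent' for a relative to b."""
    a_over_b = lex_weakly_prefers(a, b)
    b_over_a = lex_weakly_prefers(b, a)
    if a_over_b and b_over_a:
        return "indifferent"
    return "better" if a_over_b else "worse"

def point_on_segment(t):
    """The point (1 - t) * y + t * z for t in [0, 1]."""
    return tuple((1 - t) * yi + t * zi for yi, zi in zip(y, z))

# Walk from y (t = 0) to z (t = 1) and record how each point compares with x.
labels = [compare(point_on_segment(k / 600), x) for k in range(601)]
print(sorted(set(labels)))
```

Under these assumed coordinates the walk passes from "worse" to "better" without any point being indifferent to x, which is the jump described above.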
A person with lexicographic preferences cares about both the first and
second commodities, but drastically less about the second than the
first.
Just as a dictionary ranks words according to the first letter in each,
and breaks ties by looking at the second letters in each, these
preferences use the second component of two bundles only to break a
tie in the first component.
So what does an indifference curve through a point x look like?
It’s just the singleton set {x}.
Another property of preferences that is important in consumer theory
is convexity. Recall that a subset S of $\mathbb{R}^n$ is convex if for every
$x, y \in S$ and every $\lambda \in [0, 1]$, $\lambda x + (1 - \lambda)y \in S$. In other words, if two
points are in the convex set S, so is the line segment between them.
A preference ordering $\succsim$ is convex if for each bundle $x \in \mathbb{R}^n_+$, $\succsim(x)$ is a
convex set. The preferences are strictly convex if for every $x \ne y$ in
$\mathbb{R}^n_+$ with $y \succsim x$ and every $\lambda$ in (0, 1), $\lambda x + (1 - \lambda)y \succ x$.
Someone with convex preferences likes averages of two indifferent
bundles at least as well as the extremes (and strictly better, if her preferences are strictly convex).
Utility
Preference orderings are a very general way of capturing a consumer’s
tastes. But they are not always convenient to work with: you can’t
manipulate them algebraically, or apply standard calculus tools to
them.
This raises the question: is it possible to find a function that assigns
higher numbers to bundles that are more preferred, and then work
with that function instead of with the underlying preferences?
If $\succsim$ is a preference on $\mathbb{R}^n_+$, a function $U : \mathbb{R}^n_+ \to \mathbb{R}$ is said to
represent $\succsim$ if, $\forall x, y \in \mathbb{R}^n_+$,
$$U(x) \ge U(y) \text{ if and only if } x \succsim y.$$
Such a function U is called a utility function. It assigns higher numbers
to things that the consumer prefers, or equivalently, things that “give
her higher utility”.
Example
Consider for example someone who likes fruit, but is equally happy
with apples or oranges. She doesn’t care if she has two apples and
three oranges, that is the bundle (2,3), or the bundle (3, 2). All she
cares about is the number of apples plus the number of oranges.
So if we define the function U by U (x) = x1 + x2 , then U represents
her preferences.
Suppose you defined a new function V to be five times U , so
V (x) = 5(x1 + x2 ). Notice that V ranks all bundles in the same order
as U does, so V also represents her preferences.
In fact, for any strictly increasing function g, the composite function
g(U ), or g follows U , also represents those preferences.
We think of these utility functions as ordinal because we are just
paying attention to the way they order the bundles; the numbering
system is not meant to convey more information than that.
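As a quick check of the ordinal idea, the following sketch compares the rankings produced by U, by V = 5U, and by one more strictly increasing transform on a small grid of bundles. The choice g = exp and the helper name same_ranking are illustrative assumptions, not part of the notes.

```python
import math
from itertools import product

def U(x):
    """Utility from the example: apples plus oranges."""
    return x[0] + x[1]

def V(x):
    """Five times U, as in the example."""
    return 5 * U(x)

def W(x):
    """g(U) with g = exp, one (assumed) strictly increasing transform."""
    return math.exp(U(x))

def minus_U(x):
    """A strictly DEcreasing transform, which should reverse the ranking."""
    return -U(x)

# All bundles with 0 to 3 apples and 0 to 3 oranges.
bundles = list(product(range(4), repeat=2))

def same_ranking(f, g, points):
    """True if f and g order every pair of points the same way."""
    return all((f(a) >= f(b)) == (g(a) >= g(b)) for a in points for b in points)

print(same_ranking(U, V, bundles), same_ranking(U, W, bundles),
      same_ranking(U, minus_U, bundles))
```

The first two comparisons agree with U on every pair, while the decreasing transform does not, so only the ordering carries information.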
Can any preference ordering be represented by a utility function?
The answer is yes, if the preferences are complete, transitive and
continuous. (Remember that, but you don’t need to know the proof.)
It is interesting to see why there might be trouble if continuity is
violated. Lexicographic preferences are a famous family of preferences
that cannot be represented by any utility function. Why?
Think of the vertical line at x1 = 5, say, and the points x = (5, 2) and
y = (5, 1). To represent the preferences with a function U , each of
these two points must be given a different utility. Nothing to the right
of the line can be given a utility in the range [U (y), U (x)], nor anything
to the left of the line.
So just to number things on that vertical line, we’ve used up a whole
interval of points on the real number line. Every vertical line is going
to need the same treatment: it will take up a whole interval on the real
number line. You get the idea that this is going to be pretty crowded.
In fact, if you’ve taken real analysis (and if you haven’t, please do it
one of these days), you know that there is an uncountable infinity of
vertical lines, but the real line has room for only a (much smaller)
countable infinity of disjoint intervals, since each such interval must
contain a rational number of its own.
So there can’t be a distinct interval of real numbers reserved for each
vertical line. Give this some thought, but again, you don’t need to
know this argument.
But most preferences can be represented by some utility function U ,
and we will usually assume U is smooth (let’s say, twice continuously
differentiable). Then the equation of an indifference curve is $U(x) = c$,
where c is some constant. In two dimensions, this is $U(x_1, x_2) = c$ and
totally differentiating both sides yields:
$$U_1\,dx_1 + U_2\,dx_2 = 0$$
that is,
$$\frac{dx_2}{dx_1} = -\frac{U_1}{U_2}$$
This tells us the rate at which the second commodity must be
increased, per (infinitesimal) unit of first commodity taken away, to
hold utility constant. In other words, this is the slope of the
indifference curve.
This is also called the marginal rate of substitution (2 for 1),
abbreviated $MRS_{2 \text{ for } 1}$. (Some books get this notation backwards.)
Economists call these first partials “marginal utilities”, so we can write
$$\frac{dx_2}{dx_1} = -\frac{MU_1}{MU_2}$$
Recall what the concavity of a function means:
$f : S \to \mathbb{R}$, defined on a convex set S, is concave if for all distinct $x, y \in S$ and $\forall \lambda \in (0, 1)$,
$$f(\lambda x + (1 - \lambda)y) \ge \lambda f(x) + (1 - \lambda) f(y)$$
If we replace ≥ with ≤ above, we have the definition of convexity of f.
Replacing weak inequalities with strict inequalities in the respective
definitions gives us strict concavity and strict convexity of f,
respectively.
Preferences $\succsim$ might be represented by two functions U and V, one
strictly concave and one strictly convex. A more suitable property for a
utility function is quasiconcavity.
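To see how this can happen, here is a minimal one-good sketch. The two functions are an illustrative choice, not taken from the notes: with a single good and "more is better", both U(x) = √x and V(x) = x² represent the same preferences on the sample points, yet one is strictly concave and the other strictly convex.

```python
import math

def U(x):
    """Strictly concave representation (assumed example)."""
    return math.sqrt(x)

def V(x):
    """Strictly convex representation of the same ordering (assumed example)."""
    return x ** 2

points = [0.0, 0.5, 1.0, 2.0, 3.5]
pairs = [(a, b) for a in points for b in points if a != b]

def midpoint_gap(f, a, b, lam=0.5):
    """f(lam*a + (1-lam)*b) minus lam*f(a) + (1-lam)*f(b)."""
    return f(lam * a + (1 - lam) * b) - (lam * f(a) + (1 - lam) * f(b))

# Same ordering of the sample points under U and V.
same_order = all((U(a) >= U(b)) == (V(a) >= V(b)) for a, b in pairs)
# Positive gap for every pair: the concavity inequality holds strictly on the sample.
u_concave_on_sample = all(midpoint_gap(U, a, b) > 0 for a, b in pairs)
# Negative gap for every pair: the convexity inequality holds strictly on the sample.
v_convex_on_sample = all(midpoint_gap(V, a, b) < 0 for a, b in pairs)
# Both satisfy the quasiconcavity inequality f(midpoint) >= min{f(a), f(b)}.
both_quasiconcave_on_sample = all(
    f(0.5 * a + 0.5 * b) >= min(f(a), f(b)) for f in (U, V) for a, b in pairs
)
print(same_order, u_concave_on_sample, v_convex_on_sample, both_quasiconcave_on_sample)
```

Concavity is therefore a property of the particular representation, while the quasiconcavity inequality survives the change of representation in this example.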
A function f : S → R on a convex set S is quasiconcave if for all
distinct x, y ∈ S and ∀λ ∈ (0, 1),
f (λx + (1 − λ)y) ≥ min {f (x), f (y)}
Replacing ≥ with > above gives us the definition of strict
quasiconcavity.
Proposition
Suppose U represents $\succsim$ on $\mathbb{R}^n_+$. Then
$$U \text{ is quasiconcave} \iff \succsim(x) \text{ is convex } \forall x$$
We see that quasiconcavity is not so much about the shape of the
function as it is about the shape of upper contour sets.
Optimization Review
Optimization usually refers to maximization or minimization, with or
without constraints. A typical problem without constraints would be:
$$\max_{x \in \mathbb{R}} f(x)$$
Or the problem might include some parameters a = (a1 , a2 , ..., ak ) that
the person doing the maximizing is not allowed to choose (constants
beyond her control, such as prices that a consumer might face). Then
we would write:
$$\max_{x \in \mathbb{R}} f(x, a_1, a_2, ..., a_k)$$
We are interested in two things:
The solution x∗ (a), that is, the choice of x that maximizes
f (x, a).
The maximized value f (x∗ (a), a).
The first tells us the best thing to do, depending on a, and the second
tells us how well we do, when we choose the best thing.
Example
$$\max_{x \in \mathbb{R}} \; -x^2 + 6ax$$
First order necessary conditions (FOC):
$$-2x + 6a = 0$$
$$x^* = 3a$$
The maximized value is $-(3a)^2 + 6a(3a) = -9a^2 + 18a^2 = 9a^2$.
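The example can be verified on a grid. This is a small sketch with an arbitrarily chosen parameter value a = 2 and a hypothetical grid of candidate x values; it only confirms that no grid point beats x∗ = 3a.

```python
def f(x, a):
    """Objective from the example: -x^2 + 6ax."""
    return -x ** 2 + 6 * a * x

def x_star(a):
    """Solution of the first order condition -2x + 6a = 0."""
    return 3 * a

def maximized_value(a):
    """f evaluated at the solution."""
    return f(x_star(a), a)

a = 2.0  # arbitrary illustrative parameter value
grid = [k / 100 for k in range(-1000, 2001)]  # x from -10 to 20 in steps of 0.01
best_on_grid = max(f(x, a) for x in grid)

print(x_star(a), maximized_value(a), best_on_grid)
```

Because the objective is a downward-opening parabola in x, the first order condition here identifies the global maximizer, which is what the grid search reflects.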
There might be many choice variables:
$$\max_{x \in \mathbb{R}^n} f(x, a_1, a_2, ..., a_k)$$
F.O.C.’s
$$\frac{\partial f}{\partial x_i} = 0, \quad i = 1, ..., n$$
The solution is a vector-valued function
$$x^*(a) = (x_1^*(a), ..., x_n^*(a))$$
and the maximized value function is
$$f(x^*(a), a).$$
Even when the FOC’s are satisfied, you might not have found a global
maximizer: it might instead be a local maximizer, a local minimizer, a
point of inflection, or a saddle point. (Necessity vs. Sufficiency).
Suppose there is a one-unit increase in one parameter ai . How much
does the maximized value change?
Even if we hold all the $x_1, ..., x_n$ fixed, there would be the direct effect
$\frac{\partial f}{\partial a_i}$. But in general all the $x_j$’s may adjust optimally. So there are
n + 1 changes to be taken into account.
Fortunately, there is a beautiful result called the envelope theorem that
simplifies things tremendously.
Let
M (a) = f (x∗ (a), a)
be the maximized value function of the differentiable function f.
Assume x∗ is a differentiable function of a.
Envelope Theorem
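$$\frac{\partial M}{\partial a_i} = \left.\frac{\partial f}{\partial a_i}\right|_{x \text{ held constant}}$$
Proof.
$$\begin{aligned}
\frac{\partial M}{\partial a_i} &= \frac{\partial}{\partial a_i}\bigl(f(x_1(a), ..., x_n(a), a)\bigr) \\
&= \underbrace{\frac{\partial f}{\partial x_1}}_{=0 \text{ (FOC)}} \frac{\partial x_1}{\partial a_i} + ... + \underbrace{\frac{\partial f}{\partial x_n}}_{=0 \text{ (FOC)}} \frac{\partial x_n}{\partial a_i} + \left.\frac{\partial f}{\partial a_i}\right|_{x \text{ held constant}} \\
&= \left.\frac{\partial f}{\partial a_i}\right|_{x \text{ held constant}}
\end{aligned}$$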
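The envelope theorem can be checked on the earlier example f(x, a) = −x² + 6ax, where x∗(a) = 3a. The sketch below compares a finite-difference derivative of the maximized value M(a) with the partial derivative of f with respect to a, holding x fixed at x∗. The step size and the test value a = 2 are arbitrary choices made for illustration.

```python
def f(x, a):
    """Objective from the earlier example: -x^2 + 6ax."""
    return -x ** 2 + 6 * a * x

def x_star(a):
    """Maximizer from the first order condition."""
    return 3 * a

def M(a):
    """Maximized value function M(a) = f(x*(a), a)."""
    return f(x_star(a), a)

a = 2.0
h = 1e-5

# Total derivative of the maximized value: x is allowed to re-optimize.
dM_da = (M(a + h) - M(a - h)) / (2 * h)

# Partial derivative of f with respect to a, with x frozen at x*(a).
x_fixed = x_star(a)
df_da_x_fixed = (f(x_fixed, a + h) - f(x_fixed, a - h)) / (2 * h)

print(dM_da, df_da_x_fixed)
```

Both numbers should be close to 18a, so the indirect effect through x∗ contributes nothing at the margin.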
Optimization with Constraints
Often x cannot be chosen freely, but instead must satisfy some
constraints. For example, a consumer can spend only the amount of
money she has. As before, there may be parameters a1 , ..., ak whose
values cannot be chosen. A typical problem with constraints is
$$\max_{x \in \mathbb{R}^n} f(x, a) \quad \text{s.t. } g(x, a) = 0$$
The mathematician Lagrange developed the correct first order
conditions for these problems. Form the expression
L(x, a, λ) = f (x, a) − λg(x, a)
In his honor, λ is called a Lagrange multiplier and L is called “the
Lagrangean”.
The first order conditions for an interior maximizer (or minimizer) of
f , subject to g = 0, are just the FOC’s of the unconstrained problem
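$$\min_{\lambda} \max_{x} L(x, a, \lambda)$$
that is
$$\frac{\partial L}{\partial x_i} = 0, \quad i = 1, .., n, \quad \text{and} \quad \frac{\partial L}{\partial \lambda} = 0$$
Equivalently
$$\frac{\partial f}{\partial x_i} - \lambda \frac{\partial g}{\partial x_i} = 0, \quad i = 1, .., n, \quad \text{and} \quad g(x, a) = 0$$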
You don’t need to know the proof, but we will use the result all the
time, along with the associated constrained envelope theorem:
Theorem
Let M (a) be the maximized value function
M (a) = f (x∗ (a), a)
associated with the constrained maximization problem. Then
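$$\frac{\partial M}{\partial a_i} = \left.\frac{\partial L}{\partial a_i}\right|_{x \text{ held constant}} = \left.\frac{\partial f}{\partial a_i}\right|_{x \text{ held constant}} - \lambda \left.\frac{\partial g}{\partial a_i}\right|_{x \text{ held constant}}$$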
These Lagrangean results are put to use in various ways in both
consumer and producer theory.
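A small worked instance may help. The problem below is an assumption chosen for illustration (it does not appear in the notes): maximize f = x1·x2 subject to g = x1 + x2 − a = 0. The Lagrangean FOC's give x1 = x2 = λ = a/2, so M(a) = a²/4. The sketch checks the solution by brute force along the constraint and compares dM/da with ∂L/∂a = −λ ∂g/∂a = λ.

```python
def f(x1, x2):
    """Objective of the assumed example."""
    return x1 * x2

def g(x1, x2, a):
    """Constraint function; the constraint is g = 0."""
    return x1 + x2 - a

def solve(a):
    """Solution of the FOC's x2 - lam = 0, x1 - lam = 0, x1 + x2 = a."""
    x1 = x2 = lam = a / 2
    return x1, x2, lam

def M(a):
    """Maximized value of f on the constraint set."""
    x1, x2, _ = solve(a)
    return f(x1, x2)

a = 3.0
h = 1e-5
x1, x2, lam = solve(a)

# Brute-force check: search along the constraint x2 = a - x1.
best_on_line = max(f(t, a - t) for t in [k * a / 1000 for k in range(1001)])

# Envelope theorem: dM/da should equal dL/da with x and lambda held fixed.
dM_da = (M(a + h) - M(a - h)) / (2 * h)
dL_da = -lam * (-1.0)  # df/da = 0 here, and dg/da = -1

print(best_on_line, M(a), dM_da, dL_da)
```

In this example the multiplier λ itself measures how much the maximized value rises when the constraint is relaxed by a small amount.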
Existence and Uniqueness of Solutions
A general way of expressing many maximization problems with or
without constraints is to write
$$\max_{x \in S} f(x, a)$$
where S is a subset of $\mathbb{R}^n$. The choice set S takes into account any
constraints that must be satisfied.
Not all such problems have a solution.
Example 1
$$\max_{x \in \mathbb{R}} x^2$$
has no solution, because $\mathbb{R}$ is unbounded.
Example 2
$$\max_{x \in S} x$$
has no solution when $S = \{x \in \mathbb{R} \mid 3 < x < 5\}$, because S is not closed.
Example 3
Finally, suppose S = [0, 1], and $f : \mathbb{R} \to \mathbb{R}$ is defined by
$$f(x) = \begin{cases} x & \text{if } x \in [0, 1) \\ 0 & \text{if } x = 1 \end{cases}$$
Then
$$\max_{x \in S} f(x)$$
has no solution, because f is discontinuous.
Weierstrass proved that once these three obstacles are ruled out
(an unbounded choice set, a choice set that is not closed, a
discontinuous objective), a solution is guaranteed to exist.
Weierstrass’ Theorem
Let S be a nonempty, closed bounded subset of Rn and f : S → R be
continuous. Then f achieves a maximum (and minimum) on S.
(know the theorem; you don’t need to know the proof.)
Weierstrass’ Theorem says nothing about uniqueness of the x such that
f(x) is maximized. Clearly
$$\max_{x \in [0,1]} 10$$
has many maximizers, and
$$\max_{x \in [-1,1]} x^2$$
has two.
Proposition
Suppose S is a convex subset of Rn , and f : S → R is strictly
quasiconcave. Then there is at most one x∗ that maximizes f on S.
Proof.
For contradiction, suppose that distinct x, y ∈ S both maximize f on
S, so that f(x) = f(y). Choose any λ ∈ (0, 1) and note that λx + (1 − λ)y ∈ S.
By strict quasiconcavity,
$$f(\lambda x + (1 - \lambda)y) > \min\{f(x), f(y)\} = f(x) = f(y),$$
contradicting the maximality of f(x) and f(y).
The Consumer Problem
In its simplest form, the consumer’s problem is to allocate fixed income
I across expenditures on the n available goods:
$$\max_{x \in \mathbb{R}^n_+} U(x) \quad \text{s.t. } p \cdot x - I = 0$$
Choosing from $\mathbb{R}^n_+$ makes sense (you can’t consume −5 units of
toothpaste). In effect, there are n nonnegativity constraints, but we
won’t create n extra Lagrange multipliers.
Instead we keep in mind that the solution might be a corner solution
(such as, eat no apples at all) rather than an interior solution satisfying
the FOC’s.
The associated Lagrangean is:
L(x, p, I, λ) = U (x) − λ(p · x − I)
FOC’s:
$$MU_i - \lambda p_i = 0, \quad i = 1, ..., n$$
$$p \cdot x = I$$
The first n equations say that
$$\lambda = \frac{MU_1}{p_1} = ... = \frac{MU_n}{p_n}$$
Marginal utilities per dollar are equalized!
If some good j were not consumed at all (xj = 0), its marginal utility
per dollar might be lower than that of the others, a good reason not to
consume xj .
The solution x∗ to the utility maximization problem is called the
Marshallian demand function, in honor of Alfred Marshall, who
taught at Cambridge University more than a century ago.
We will write
D(p, I) = (D1 (p, I), ..., Dn (p, I))
for the Marshallian demand function.
The maximized value function is called the indirect utility function,
and is written
V (p, I) = U (D(p, I))
Where there’s a value function, there must be an envelope theorem (or
several). In this case the constrained envelope theorem tells us:
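$$\frac{\partial V}{\partial p_i} = \left.\frac{\partial L}{\partial p_i}\right|_{x = x^*(a)} = -\lambda x_i^*(a)$$
$$\frac{\partial V}{\partial I} = \left.\frac{\partial L}{\partial I}\right|_{x = x^*(a)} = \lambda$$
Taking the ratios on both sides yields
$$x_i^*(a) = -\frac{\partial V / \partial p_i}{\partial V / \partial I} \qquad \text{(Roy’s Identity)}$$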
which tells us how to recover the Marshallian demands from the
indirect utility function.
Example: Cobb Douglas preferences
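$$u(x_1, x_2) = x_1^{\alpha} x_2^{\beta}, \quad \alpha, \beta > 0$$
FOC’s:
$$\alpha x_1^{\alpha - 1} x_2^{\beta} - \lambda p_1 = 0$$
$$\beta x_1^{\alpha} x_2^{\beta - 1} - \lambda p_2 = 0$$
$$p_1 x_1 + p_2 x_2 = I$$
Then Marshallian demands are
$$D_1(p, I) = x_1(p, I) = \frac{\alpha}{\alpha + \beta} \frac{I}{p_1}$$
$$D_2(p, I) = x_2(p, I) = \frac{\beta}{\alpha + \beta} \frac{I}{p_2}$$
So the indirect utility function is
$$V(p, I) = D_1(p, I)^{\alpha} D_2(p, I)^{\beta} = \left(\frac{I}{\alpha + \beta}\right)^{\alpha + \beta} \left(\frac{\alpha}{p_1}\right)^{\alpha} \left(\frac{\beta}{p_2}\right)^{\beta}$$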
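The Cobb Douglas formulas can be checked numerically. In the sketch below the parameter values (α = 0.3, β = 0.7, prices 2 and 5, income 100) are arbitrary choices for illustration, and derivatives are approximated by central finite differences. It verifies that the demands exhaust the budget, that marginal utilities per dollar are equalized, and that Roy's Identity recovers the demand for good 1 from the indirect utility function.

```python
ALPHA, BETA = 0.3, 0.7       # assumed exponents
P1, P2, INCOME = 2.0, 5.0, 100.0  # assumed prices and income

def utility(x1, x2):
    return x1 ** ALPHA * x2 ** BETA

def demand(p1, p2, income):
    """Marshallian demands for Cobb Douglas preferences."""
    total = ALPHA + BETA
    return (ALPHA / total * income / p1, BETA / total * income / p2)

def indirect_utility(p1, p2, income):
    """V(p, I) = U(D(p, I))."""
    return utility(*demand(p1, p2, income))

x1, x2 = demand(P1, P2, INCOME)
spending = P1 * x1 + P2 * x2

# Marginal utilities per dollar at the chosen bundle.
mu1_per_dollar = ALPHA * x1 ** (ALPHA - 1) * x2 ** BETA / P1
mu2_per_dollar = BETA * x1 ** ALPHA * x2 ** (BETA - 1) / P2

# Roy's Identity: x1 = -(dV/dp1) / (dV/dI), via central differences.
h = 1e-6
dV_dp1 = (indirect_utility(P1 + h, P2, INCOME) - indirect_utility(P1 - h, P2, INCOME)) / (2 * h)
dV_dI = (indirect_utility(P1, P2, INCOME + h) - indirect_utility(P1, P2, INCOME - h)) / (2 * h)
x1_from_roy = -dV_dp1 / dV_dI

print(x1, x2, spending, mu1_per_dollar, mu2_per_dollar, x1_from_roy)
```

Doubling both exponents in this sketch leaves the demands unchanged but changes the value of indirect utility.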
Note that the relative size of α and β matters, but scaling both up or
down leaves demands (but not indirect utility) unchanged.
Observe also that the proportion of income spent on each good, that is
$\frac{p_i x_i}{I}$, is constant: it only depends on $\frac{\alpha}{\beta}$, and not on prices or income.
Constant expenditure share is a special property of this family of
preferences; there is no reason to expect this for most utility functions.
Let’s stay with n = 2 for the moment and get a graphical interpretation
of the first order conditions of the Marshallian consumer problem.
$$\frac{MU_1}{p_1} = \frac{MU_2}{p_2}$$
can be rewritten
$$-\frac{MU_1}{MU_2} = -\frac{p_1}{p_2} \qquad (1)$$
The LHS is the slope of the indifference curve at the chosen bundle.
And the RHS? Solve the budget constraint for $x_2$ as a function of $x_1$:
$$p_1 x_1 + p_2 x_2 = I$$
$$x_2 = \frac{I}{p_2} - \frac{p_1}{p_2} x_1$$
So the RHS of (1) is the slope of the budget line.
Thus, at an interior solution, with both goods consumed, the FOC’s
require tangency of the budget line to the indifference curve.
If instead
$$\frac{MU_2}{p_2} < \frac{MU_1}{p_1},$$
so consuming good 2 is not a useful way to get utility, then we are at a
corner solution and
$$\frac{MU_1}{MU_2} > \frac{p_1}{p_2}$$
The budget line is shallower than the indifference curve.
Notice that if I increases, the slope $-\frac{p_1}{p_2}$ of the budget line does not
change. So an increase in income produces a parallel shift of the
budget line, outward.
How would such an increase affect x1 and x2 , the quantities
demanded? We can’t say, in general.
With n commodities, we say the ith good is a normal good if
$$\frac{\partial D_i}{\partial I} \ge 0$$
We say good i is an inferior good if
$$\frac{\partial D_i}{\partial I} < 0.$$
Intuition might suggest that if you have more to spend, you will buy
more of a good than before. This is often the case, hence the name
normal good.
But suppose, when you have relatively little money, you consume some
cheap wine. If you get richer, probably you switch to buying a different
wine that you like better. So the cheap wine is an inferior good that
you phase out of your consumption bundle as your circumstances
improve.
We have been changing income, while prices are held fixed. Let’s think
now of changing p1 and seeing how x1 reacts.
When $p_1$ increases, the vertical intercept of the budget line, $\frac{I}{p_2}$, stays
fixed, but the maximum amount of $x_1$ the consumer can afford
decreases. So the budget line gets steeper, rotating around the fixed
vertical intercept. How do $x_1$ and $x_2$ react?
By drawing some alternative graphs, we can see that x1 might go up,
or down. The same is true for x2 .
We say that $x_i$ is a Giffen good if
$$\frac{\partial D_i}{\partial p_i} > 0$$
It is rare that Giffen goods are observed, but logic cannot rule them
out.
Economists like to understand the effect of a change in price by
dividing it into two parts, a substitution effect and an income effect.
When p1 increases, two things have changed. First, the budget line has
become steeper: good 1 has become a worse deal than before,
compared to good 2. This favors shifting consumption away from good
1, and toward good 2.
But the consumer has also experienced a loss in welfare, or well being:
she can no longer get onto her original indifference curve. Imagine that
when p1 increases, we compensate her by giving her just enough extra
income that she can attain the original utility level.
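[Figure: x1 on the horizontal axis, x2 on the vertical axis. Shown are the original budget line, the budget line after p1 increases, the original indifference curve, the indifference curve after the price increase, and the bundles A, B and C.]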
In the graph, A is the bundle originally demanded, C is the bundle
demanded after p1 increases, and B is the hypothetical bundle chosen
if, magically, the price increase had been accompanied by an increase in
the income just enough to allow her to attain her original utility level.
The movement from A to B is called the substitution effect of the
increase in p1. The x1 component of this is negative: as p1 increased,
the compensated consumer substituted away from that good. Does
that depend on the shape of the preferences or on n = 2?
We will get definitive answers on this a bit later.
The budget lines through B and C are parallel: the movement from
hypothetical B to actual consumption C corresponds to a loss in
income.
Therefore the move from B to C is called the income effect of the
increase in p1. In the graph, the x1 component of this change is
negative, meaning good 1 is a normal good (remember income
decreased).
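The decomposition can be computed for a concrete case. Everything numerical below is an assumption for illustration: Cobb Douglas preferences with α = β = 0.5, income 100, p2 = 1, and p1 rising from 1 to 4. The compensated income is found by bisection on the indirect utility, rather than by a closed form, so the sketch relies only on the Marshallian demand formulas.

```python
ALPHA, BETA = 0.5, 0.5        # assumed exponents
INCOME, P2 = 100.0, 1.0       # assumed income and price of good 2
P1_OLD, P1_NEW = 1.0, 4.0     # assumed price change for good 1

def utility(x1, x2):
    return x1 ** ALPHA * x2 ** BETA

def demand(p1, p2, income):
    """Marshallian demands for Cobb Douglas preferences."""
    total = ALPHA + BETA
    return (ALPHA / total * income / p1, BETA / total * income / p2)

def compensated_income(p1, p2, target_u):
    """Income at which the utility-maximizing bundle just reaches target_u (bisection)."""
    lo, hi = 0.0, 1.0
    while utility(*demand(p1, p2, hi)) < target_u:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if utility(*demand(p1, p2, mid)) < target_u:
            lo = mid
        else:
            hi = mid
    return hi

A = demand(P1_OLD, P2, INCOME)                      # original bundle
C = demand(P1_NEW, P2, INCOME)                      # bundle after the price increase
B = demand(P1_NEW, P2, compensated_income(P1_NEW, P2, utility(*A)))  # compensated bundle

substitution_effect = B[0] - A[0]   # change in x1 from A to B
income_effect = C[0] - B[0]         # change in x1 from B to C
total_effect = C[0] - A[0]

print(A, B, C, substitution_effect, income_effect, total_effect)
```

In this assumed example both components are negative, consistent with good 1 being a normal good.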
Had good 1 been inferior, the income effect would have been positive,
and point C would have been to the right of B.
If this income effect were sufficiently large, it could overwhelm the
negative substitution effect, leaving final consumption C to the right of
A. This would make good 1 a Giffen good. We see that a Giffen good
is an extreme example of inferiority.
When p1 increases, there is no reason to be sure which way x2 should
adjust. If iPhones get really expensive, maybe a consumer switches to
a substitute, some other kind of smartphone.
But if bacon gets expensive, a consumer might buy fewer eggs, because
she likes to eat them together: she considers them complementary.
Good j is said to be a gross substitute for good i if
$$\frac{\partial D_j}{\partial p_i} > 0$$
Good j is a gross complement to good i if
$$\frac{\partial D_j}{\partial p_i} < 0$$
This is an “old fashioned” definition, known to be unsatisfactory
because it is possible (see example in Nicholson and Snyder) to have
$$\frac{\partial D_j}{\partial p_i} > 0, \quad \text{but} \quad \frac{\partial D_i}{\partial p_j} < 0,$$
so j is a substitute for i but i is not a substitute for j.
By the way, all these derivatives of quantities with respect to prices
look like natural measures of how price-sensitive demand is. But they
are not unit-free!
Example
A family buys:
15 pints of milk if p = 1 and
10 pints of milk if p = 2.
The price increase was just 1, and their consumption dropped by 5, so
the sensitivity to price seems high.
But if we measure price in cents, the price increase was 100, and
consumption dropped by only 5, so the sensitivity to price sounds low.
Clearly we need a unit-free measure of how sensitive a variable y is, to
a variable x.
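For a discrete change in x, we can look at
$$\frac{\text{the percentage change in } y}{\text{the percentage change in } x}$$
that is,
$$\frac{100\,\Delta y / y}{100\,\Delta x / x} = \frac{x}{y} \frac{\Delta y}{\Delta x}$$
The analog of this for an infinitesimal change is the elasticity of y
with respect to x:
$$\varepsilon_{y,x} = \frac{x}{y} \frac{\partial y}{\partial x}$$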
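Applied to the milk example, the discrete formula gives the same answer whether price is measured in dollars or in cents, while the raw slope does not. The sketch below uses the initial point as the base for the percentage changes, which is one possible convention and an assumption here; the function name is hypothetical.

```python
def discrete_elasticity(x0, x1, y0, y1):
    """(percentage change in y) / (percentage change in x), with the initial point as base."""
    pct_change_y = 100 * (y1 - y0) / y0
    pct_change_x = 100 * (x1 - x0) / x0
    return pct_change_y / pct_change_x

# Milk example: 15 pints at p = 1, 10 pints at p = 2 (price in dollars).
slope_dollars = (10 - 15) / (2 - 1)
elasticity_dollars = discrete_elasticity(1, 2, 15, 10)

# Same data with price measured in cents.
slope_cents = (10 - 15) / (200 - 100)
elasticity_cents = discrete_elasticity(100, 200, 15, 10)

print(slope_dollars, slope_cents)
print(elasticity_dollars, elasticity_cents)
```

The slopes differ by a factor of 100, but the two elasticities coincide.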
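$$\varepsilon_{x_i, p_i} = \frac{p_i}{x_i} \frac{\partial x_i}{\partial p_i}$$
is sometimes called an “own price elasticity”, whereas
$$\varepsilon_{x_i, p_j} = \frac{p_j}{x_i} \frac{\partial x_i}{\partial p_j}$$
is a “cross price elasticity”.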
Except for Giffen goods, own price elasticities are negative, so talking
about a large elasticity can be confusing (it usually means far below 0).
$x_i$ is
elastic    if $|\varepsilon_{x_i, p_i}| > 1$
unitary elastic    if $|\varepsilon_{x_i, p_i}| = 1$
inelastic    if $|\varepsilon_{x_i, p_i}| < 1$
perfectly inelastic    if $|\varepsilon_{x_i, p_i}| = 0$
Consumer Welfare
Economists often use the Marshallian demand curve to get a dollar
measure (called consumer surplus) of how much the consumer
benefits from having access to a market at a particular price.
Let’s take a microscopic view of the Marshallian curve, revealing the
maximum amount the consumer would have been willing to pay for
each unit.
[Figure: magnified view of the Marshallian demand curve, with the price levels p1, ..., p7 marked.]
Here’s an argument that’s not quite right:
The demand curve tells us the consumer would pay at most p1
for the first unit, at most p2 for the next, and so on. So if the
price is actually p5 , for example, she pays 5p5 for the five units
she buys, but would have been willing to pay
p1 + p2 + p3 + p4 + p5
for it. Her surplus (from having access at p = p5 ) is
(p1 − p5 ) + (p2 − p5 ) + (p3 − p5 ) + (p4 − p5 ),
that is, the area under the demand curve above the
price. This is called consumer surplus.
What’s wrong with this argument?
She wouldn’t actually be willing to pay
p1 + p2 + p3 + p4 + p5
to get five units. For example, her willingness to pay p5 for the fifth
unit is conditional on her having paid p5 for the first four units as well!
If in fact she is made to pay more for the first four units, she feels
poorer, and may be unwilling to pay p5 for the last unit.
Thus, Marshallian consumer surplus is an imperfect measure of
maximal willingness to pay. In later lectures we will see when it is
nonetheless a good approximate measure of welfare.
Cost Minimization and Hicksian Demand
Constrained utility maximization is a familiar problem: it’s just
choosing what you like best, from what you can afford. We do it all the
time.
There’s a different constrained optimization problem that seems more
artificial: attaining a target utility, say u0, as cheaply as possible. As
strange as it seems, it accomplishes a lot of things for us:
a beautiful, general treatment of income and substitution effects.
a satisfying definition of substitutes and complements.
“Hicksian” demand curves that always slope down.
exact measures of consumer welfare.
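$$\min_{x \in \mathbb{R}^n_+} p \cdot x \quad \text{s.t. } U(x) = u$$
$$L = p \cdot x - \lambda (U(x) - u)$$
$$\frac{\partial L}{\partial x_i} = 0 \;\Rightarrow\; p_i - \lambda\, MU_i = 0, \quad i = 1, ..., n$$
$$\frac{1}{\lambda} = \frac{MU_1}{p_1} = ... = \frac{MU_n}{p_n}$$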
This is the “equalize marginal utilities per dollar” rule, all over again.
The x∗ that solves the minimization problem is called the Hicksian
demand (a function of prices and the target utility u) and is denoted
h(p, u)
The minimized value is called the expenditure function and is
written
e(p, u) = p · h(p, u)
Let’s apply the constrained envelope theorem:
$$\frac{\partial e}{\partial p_i} = \left.\frac{\partial L}{\partial p_i}\right|_{x = x^*(p, u)} = x_i^*(p, u)$$
Now what x∗i is this? It’s the Hicksian demand, so we have:
Shephard’s Lemma
$$\frac{\partial e}{\partial p_i} = h_i, \quad i = 1, ..., n$$
Clearly the function e is increasing in all its arguments. We can say
more about its shape.
Proposition
e is concave as a function of p.
Proof.
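We need to show, for all $p', p'' \in \mathbb{R}^n_{++}$ and all $\lambda \in (0, 1)$, that
$$e(\lambda p' + (1 - \lambda)p'', u) \ge \lambda e(p', u) + (1 - \lambda) e(p'', u)$$
Letting $p''' = \lambda p' + (1 - \lambda)p''$ and letting $x'$, $x''$ and $x'''$ solve the
minimization problem at prices $p'$, $p''$ and $p'''$ respectively, we need to
show:
$$p''' \cdot x''' \ge \lambda p' \cdot x' + (1 - \lambda) p'' \cdot x'' \qquad (2)$$
Now
$$p''' \cdot x''' = \lambda p' \cdot x''' + (1 - \lambda) p'' \cdot x''' \ge \lambda p' \cdot x' + (1 - \lambda) p'' \cdot x''$$
where the inequality holds because $x'''$ attains the target utility u, so at prices $p'$ it
cannot cost less than the minimizer $x'$, and at prices $p''$ it cannot cost less than $x''$.
This establishes (2).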
This helps us right away. Suppose e is twice continuously differentiable.
Concavity of e in p implies, for each i, that
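$$\frac{\partial^2 e}{\partial p_i^2} \le 0$$
Now
$$\begin{aligned}
\frac{\partial h_i}{\partial p_i} &= \frac{\partial}{\partial p_i}\left(\frac{\partial e}{\partial p_i}\right) && \text{(from Shephard’s Lemma)} \\
&= \frac{\partial^2 e}{\partial p_i^2} \\
&\le 0 && \text{(concavity of } e\text{)}
\end{aligned}$$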
Thus, for each good i, the ith Hicksian demand curve is
downward-sloping in pi .
The function h is also called the compensated demand function:
when pi increases, income is implicitly increased just enough to keep the
consumer at the original utility.
When we broke the response of Marshallian demand for good 1 (to an
increase in p1 ) into a substitution effect and an income effect, the
substitution effect was the change in x1 when the price increase is
exactly compensated. That’s exactly what the function h tells us.
We just learned that hi always (weakly) decreases when pi increases, so
the substitution effect is always (weakly) negative.
This doesn’t depend on the shape of the utility function or on the
number of goods.
Why does the same “equalize marginal utility per dollar” rule apply to
both constrained utility maximization and constrained cost
minimization?
In both cases, you need to be producing utility efficiently.
In the Hicksian problem, suppose you were buying strictly positive
amounts of two goods, say xi and xj , but xi had a higher marginal
utility per dollar. Spend a dollar less on j and spend it on i instead.
Utility goes up. Now you can cut back expenditures (on any good) until
you are back down to the target utility, which is now costing you less
(contradicting optimality of the original bundle).
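[Figure: the bundle $x^*$, lying on both the indifference curve $U(x) = u$ and the budget line $p_1 x_1 + p_2 x_2 = I$; $x_1$ on the horizontal axis, $x_2$ on the vertical axis.]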
Someone could look at this graph and say: “This shows how x∗
achieves the highest possible utility, given income I.”
Someone else might say: “This shows how x∗ minimizes expenditure,
subject to the utility target u.”
They would both be right: x∗ solves both problems. In a sense they
are “paired problems”:
$$u = V(p, I) \quad \text{and} \quad I = e(p, u)$$
This reminds me of a pair of “identities” that I call the “woodchuck
identities”, for reasons I will reveal in class.
V (p, e(p, u)) = u
e(p, V (p, I)) = I
These are not always true, but they hold under fairly weak conditions
on the underlying utility function U.
Holding prices fixed,
V maps money into utility,
whereas e maps utility into money.
You might wonder, then, if the two functions are inverses.
And the woodchuck identities say indeed, they are.
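The inverse relationship between V and e can be illustrated numerically. The sketch below again assumes a Cobb-Douglas utility U(x1, x2) = x1^a x2^(1−a), for which both functions have closed forms; the functional form, function names and numbers are illustrative assumptions.

```python
# Assumed utility: U(x1, x2) = x1**a * x2**(1 - a) (an illustrative choice).

def indirect_utility(p1, p2, income, a=0.5):
    """V(p, I): utility at the Marshallian bundle x1 = a*I/p1, x2 = (1-a)*I/p2."""
    return income * (a / p1) ** a * ((1 - a) / p2) ** (1 - a)


def expenditure(p1, p2, u, a=0.5):
    """e(p, u): the cheapest way to reach utility u at prices (p1, p2)."""
    return u * (p1 / a) ** a * (p2 / (1 - a)) ** (1 - a)


p1, p2 = 2.0, 3.0
# V(p, e(p, u)) should return the utility we started from.
u_back = indirect_utility(p1, p2, expenditure(p1, p2, 10.0))
# e(p, V(p, I)) should return the income we started from.
income_back = expenditure(p1, p2, indirect_utility(p1, p2, 120.0))
print(u_back, income_back)
```

Holding prices fixed, the round trip money → utility → money (and the reverse) lands back where it started, which is what the two identities assert.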
Something amazing is going to happen. Let’s build the Hicksian
income compensation directly into the Marshallian demand function,
so that when any price changes, income gets adjusted automatically to
keep utility constant:
D(p, e(p, u)) = h(p, u)
Because this is true at all prices, we can partially differentiate the ith
component of both sides w.r.t. pi :
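$$\frac{\partial D_i}{\partial p_i} + \frac{\partial e}{\partial p_i} \frac{\partial D_i}{\partial I} = \frac{\partial h_i}{\partial p_i}$$
$$\frac{\partial D_i}{\partial p_i} = \frac{\partial h_i}{\partial p_i} - \underbrace{\frac{\partial e}{\partial p_i}}_{\text{Shephard’s Lemma}} \frac{\partial D_i}{\partial I}$$
Slutsky Equation
$$\frac{\partial D_i}{\partial p_i} = \frac{\partial h_i}{\partial p_i} - x_i \frac{\partial D_i}{\partial I}$$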
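The Slutsky equation can be verified numerically for a specific utility function. The sketch below assumes a Cobb-Douglas utility U(x1, x2) = x1^a x2^(1−a) and approximates each partial derivative by a central finite difference; the functional form, function names and parameter values are illustrative assumptions.

```python
# Assumed utility: U(x1, x2) = x1**a * x2**(1 - a) (an illustrative choice).

def marshallian_1(p1, income, a=0.5):
    """Marshallian demand D_1(p, I) = a * I / p1."""
    return a * income / p1


def hicksian_1(p1, p2, u, a=0.5):
    """Hicksian demand h_1(p, u)."""
    return u * (a * p2 / ((1 - a) * p1)) ** (1 - a)


def indirect_utility(p1, p2, income, a=0.5):
    """V(p, I), used to anchor the Hicksian demand at the current utility."""
    return income * (a / p1) ** a * ((1 - a) / p2) ** (1 - a)


def slutsky_sides(p1, p2, income, a=0.5, step=1e-6):
    """Return (dD1/dp1, dh1/dp1 - x1 * dD1/dI) using central differences."""
    u = indirect_utility(p1, p2, income, a)
    x1 = marshallian_1(p1, income, a)
    dD_dp = (marshallian_1(p1 + step, income, a)
             - marshallian_1(p1 - step, income, a)) / (2 * step)
    dh_dp = (hicksian_1(p1 + step, p2, u, a)
             - hicksian_1(p1 - step, p2, u, a)) / (2 * step)
    dD_dI = (marshallian_1(p1, income + step, a)
             - marshallian_1(p1, income - step, a)) / (2 * step)
    return dD_dp, dh_dp - x1 * dD_dI


lhs, rhs = slutsky_sides(2.0, 3.0, 120.0)
print(lhs, rhs)
```

The two printed numbers should agree up to finite-difference error: the left side is the Marshallian own-price derivative, the right side is the substitution effect plus the income effect.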
The Slutsky equation breaks the Marshallian own-price derivative into
the substitution effect
$$\frac{\partial h_i}{\partial p_i}$$
which we’ve shown to be ≤ 0, and the income effect
$$-x_i \frac{\partial D_i}{\partial I}$$
If good i is normal, $\frac{\partial D_i}{\partial I} \ge 0$ and the income effect is ≤ 0.
If instead the good i is inferior, the two effects go in opposite directions.
It is instructive to derive the elasticity form of the Slutsky equation:
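$$\frac{\partial D_i}{\partial p_i} = \frac{\partial h_i}{\partial p_i} - x_i \frac{\partial D_i}{\partial I}$$
Multiplying through by $p_i / x_i$:
$$\underbrace{\frac{p_i}{x_i} \frac{\partial D_i}{\partial p_i}}_{\varepsilon_{D_i, p_i}} = \underbrace{\frac{p_i}{x_i} \frac{\partial h_i}{\partial p_i}}_{\varepsilon_{h_i, p_i}} - \frac{p_i x_i}{I} \underbrace{\frac{I}{x_i} \frac{\partial D_i}{\partial I}}_{\varepsilon_{D_i, I}}$$
$$\varepsilon_{D_i, p_i} = \varepsilon_{h_i, p_i} - \frac{p_i x_i}{I} \, \varepsilon_{D_i, I}$$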
This shows that the Marshallian and Hicksian price elasticities differ by
an income elasticity weighted by the income share of the good in
question.
Since the income share of most goods is tiny, often the two price
elasticities are almost the same. This makes it unlikely that the income
effect is going to overwhelm the substitution effect, even if the good is
inferior.
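A small worked example may make the magnitudes concrete. It assumes Cobb-Douglas preferences with exponents summing to one, so that Marshallian demand is $D_i = a_i I / p_i$; the functional form and the share value 0.02 are illustrative assumptions, not taken from the notes.

```latex
% Assumption: Cobb-Douglas utility with exponents summing to one,
% so that D_i = a_i I / p_i and good i takes the share a_i of income.
\begin{align*}
\varepsilon_{D_i,p_i} &= -1, \qquad \varepsilon_{D_i,I} = 1, \qquad \frac{p_i x_i}{I} = a_i \\
\varepsilon_{h_i,p_i} &= \varepsilon_{D_i,p_i} + \frac{p_i x_i}{I}\,\varepsilon_{D_i,I} = -1 + a_i \\
a_i = 0.02 \;&\Rightarrow\; \varepsilon_{h_i,p_i} = -0.98
\end{align*}
```

With a 2% income share, the Hicksian elasticity (−0.98) and the Marshallian elasticity (−1) are nearly indistinguishable in this example.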
How about the cross-price Slutsky equation?
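$$D_i(p, e(p, u)) = h_i(p, u)$$
$$\frac{\partial D_i}{\partial p_j} + \frac{\partial e}{\partial p_j} \frac{\partial D_i}{\partial I} = \frac{\partial h_i}{\partial p_j}$$
$$\frac{\partial D_i}{\partial p_j} = \frac{\partial h_i}{\partial p_j} - x_j \frac{\partial D_i}{\partial I}$$
Converting to elasticity form,
$$\underbrace{\frac{p_j}{x_i} \frac{\partial D_i}{\partial p_j}}_{\varepsilon_{D_i, p_j}} = \underbrace{\frac{p_j}{x_i} \frac{\partial h_i}{\partial p_j}}_{\varepsilon_{h_i, p_j}} - \frac{p_j x_j}{I} \underbrace{\frac{I}{x_i} \frac{\partial D_i}{\partial I}}_{\varepsilon_{D_i, I}}$$
$$\varepsilon_{D_i, p_j} = \varepsilon_{h_i, p_j} - \frac{p_j x_j}{I} \, \varepsilon_{D_i, I}$$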
Here’s a better proof that the ith Hicksian demand is weakly
downward-sloping. It doesn’t require any smoothness of e, or even that
the Hicksian problem has a unique solution!
Proposition
If pi increases (other prices constant), then hi does not increase.
Proof.
Consider any two price vectors $p', p'' \in \mathbb{R}^n_{++}$ and let $x'$, $x''$ solve the respective expenditure minimization problems (for some u). By definition
$$p' \cdot x' \le p' \cdot x'' \tag{3}$$
$$p'' \cdot x' \ge p'' \cdot x'' \tag{4}$$
$$-p'' \cdot x' \le -p'' \cdot x'' \tag{5}$$
Add (3) and (5):
$$(p' - p'') \cdot x' \le (p' - p'') \cdot x'' \tag{6}$$
$$(p' - p'') \cdot (x' - x'') \le 0 \tag{7}$$
Now consider price vectors $p'$, $p''$ that differ only in the ith component.
$$(p'_i - p''_i)(x'_i - x''_i) \le 0$$
That is, $x_i$ moves in (weakly) the opposite direction of $p_i$.
How do the Marshallian and Hicksian demand functions $D_i$ and $h_i$ through the point $x^*$ compare?
If good i is normal, the income and substitution effects are both ≤ 0, so they reinforce one another, and
$$\frac{\partial h_i}{\partial p_i} \ge \frac{\partial D_i}{\partial p_i}$$
So $D_i$ is more elastic than $h_i$.
But it doesn’t “look” that way when we graph them both, because the independent variable, $p_i$, is on the vertical axis!
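[Figure: for a normal good i, the Marshallian demand $D_i(p_i, I)$ and the Hicksian demand $h_i(p_i, u^0)$ drawn through the point $x^*$; price on the vertical axis, quantity on the horizontal axis.]
Here the other arguments of $D_i$ and $h_i$, which are held fixed, have been suppressed.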
Compensating and Equivalent Variation
The expenditure function gives a clear way of deriving dollar measures
of a consumer’s loss in welfare when one or more prices rise.
Suppose original prices are $p^0 \in \mathbb{R}^n_{++}$, and new prices are $p^1 \in \mathbb{R}^n_{++}$, where one or more goods have become more expensive. The consumer’s original utility was $u^0$ and her income is I, so
$$e(p^0, u^0) = I$$
If we wanted her to have enough money to attain the original utility $u^0$ at the new prices, that would be (by definition)
$$e(p^1, u^0)$$
Therefore the extra money she needs, in addition to her original income I, is
$$CV = e(p^1, u^0) - I$$
This is called the compensating variation associated with the increase in prices.
Notice that if $u^1$ is the new, lower utility she gets at $p^1$ when there is no compensation, then
$$e(p^1, u^1) = I = e(p^0, u^0)$$
Therefore CV is sometimes written as
$$e(p^1, u^0) - e(p^1, u^1) \quad \text{or} \quad e(p^1, u^0) - e(p^0, u^0)$$
In the case where only one price pi has changed, there is a striking
graphical interpretation of CV .
First some notation: for $x \in \mathbb{R}^n$ and $y \in \mathbb{R}$, define
$$(y, x_{-i}) = (x_1, \dots, x_{i-1}, y, x_{i+1}, \dots, x_n)$$
And secondly, recall from the Fundamental Theorem of Calculus (FTC) that if f is the first derivative of F, then
$$\int_a^b f(x)\,dx = F(b) - F(a)$$
Think of the expenditure function as F and its derivative $h_i$ as f (Shephard’s Lemma again). Hold utility at $u^0$ and all prices fixed except $p_i$. Remember $p_i$ is the independent variable, so the integral is the area to the left of the curve.
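The area to the left of the Hicksian demand anchored at $u^0$, between $p_i^0$ and $p_i^1$, is
$$\begin{aligned}
\int_{p_i^0}^{p_i^1} h_i(p_i, p_{-i}^0, u^0)\,dp_i &= \int_{p_i^0}^{p_i^1} \frac{\partial e}{\partial p_i}(p_i, p_{-i}^0, u^0)\,dp_i && \text{(by Shephard’s Lemma)} \\
&= e(p_i, p_{-i}^0, u^0) \Big|_{p_i^0}^{p_i^1} && \text{(FTC)} \\
&= e(p^1, u^0) - e(p^0, u^0) \\
&= CV
\end{aligned}$$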
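[Figure: Hicksian and Marshallian demand curves for good i, with $p_i$ on the vertical axis ($p_i^0$ and $p_i^1$ marked) and $x_i$ on the horizontal axis; the area to the left of the Hicksian curve between the two prices is shaded.]
The shaded area is the compensating variation associated with an increase in $p_i$.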
If good i is normal, the Marshallian demand will be more price-elastic
than the Hicksian. The loss in CS from the price increase is the area to
the left of the Marshallian curve, between p0i and p1i (less than CV ).
Economists like to ask another dollar-valued question about the welfare effect of a price increase from $p^0$ to $p^1$:
What loss in income would be as bad, at prices $p^0$, as facing prices $p^1$ at original income I?
Well, having $e(p^0, u^1)$ in total would be as bad, since it leaves you with the same utility $u^1$.
Starting at I, to get to $e(p^0, u^1)$ you would need to lose
$$EV = I - e(p^0, u^1),$$
the equivalent variation associated with the price increase.
Because
$$I = e(p^0, u^0) = e(p^1, u^1),$$
EV can also be written
$$e(p^1, u^1) - e(p^0, u^1)$$
Repeating the argument for graphically evaluating CV, based on the area to the left of the Hicksian demand anchored at $u^0$, we see that EV is the area to the left of the Hicksian demand anchored at $u^1$.
If i is a normal good, Marshallian demand is more price-elastic than Hicksian.
[Figure: Marshallian demand $D_i(p_i, p_{-i}, I)$ through points x and y, with the Hicksian demand $h_i(p_i, p_{-i}, u^0)$ through x and the Hicksian demand $h_i(p_i, p_{-i}, u^1)$ through y; $p_i$ on the vertical axis with $p_i^0$ and $p_i^1$ marked, $x_i$ on the horizontal axis.]
Call the old and new Marshallian bundles x and y. Point y gives utility $u^1$, so the Hicksian demand curve through y is anchored at $u^1$.
The shaded area is EV, and EV ≤ loss of CS ≤ CV in this normal case.
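The ordering EV ≤ loss of CS ≤ CV can be illustrated with numbers. The sketch below assumes a Cobb-Douglas utility U(x1, x2) = x1^a x2^(1−a), for which good 1 is normal and all three measures have closed forms; the functional form, function names, prices and income are illustrative assumptions.

```python
import math

# Assumed utility: U(x1, x2) = x1**a * x2**(1 - a) (an illustrative choice).

def indirect_utility(p1, p2, income, a=0.5):
    """V(p, I) for the assumed utility."""
    return income * (a / p1) ** a * ((1 - a) / p2) ** (1 - a)


def expenditure(p1, p2, u, a=0.5):
    """e(p, u) for the assumed utility."""
    return u * (p1 / a) ** a * (p2 / (1 - a)) ** (1 - a)


def welfare_measures(p1_old, p1_new, p2, income, a=0.5):
    """Return (EV, loss of CS, CV) for a rise in p1 from p1_old to p1_new."""
    u_old = indirect_utility(p1_old, p2, income, a)
    u_new = indirect_utility(p1_new, p2, income, a)
    cv = expenditure(p1_new, p2, u_old, a) - income
    ev = income - expenditure(p1_old, p2, u_new, a)
    # Marshallian demand is a*I/p1, so the area to its left between the two
    # prices is the integral of a*I/p1, which is a*I*ln(p1_new/p1_old).
    cs_loss = a * income * math.log(p1_new / p1_old)
    return ev, cs_loss, cv


ev, cs_loss, cv = welfare_measures(1.0, 2.0, 1.0, 100.0)
print(ev, cs_loss, cv)
```

For a doubling of p1 with I = 100 and a = 0.5, the three numbers come out in increasing order, consistent with the ranking stated for a normal good.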
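Recall that the gross substitute definition, based on Marshallian demand cross partials, is unsatisfactory because of a lack of symmetry: sometimes y is a gross substitute for x, but x is not a gross substitute for y.
We say goods i and j are net substitutes if
$$\frac{\partial h_i}{\partial p_j} > 0$$
Note that, using Shephard’s Lemma and Young’s Theorem,
$$\frac{\partial h_i}{\partial p_j} = \frac{\partial}{\partial p_j} \frac{\partial e}{\partial p_i} = \frac{\partial}{\partial p_i} \frac{\partial e}{\partial p_j} = \frac{\partial h_j}{\partial p_i}$$
So the desired symmetry holds.
We say that goods i and j are net complements if
$$\frac{\partial h_i}{\partial p_j} < 0$$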
Producer Theory
Whereas each consumer has her own preferences, we will simplify by
assuming each firm seeks to maximize profits. Instead of introducing a
preference ordering, then, we specify the firm’s technology by
describing its production function
$$f : \mathbb{R}^n_+ \to \mathbb{R}$$
The interpretation is that the input vector $x = (x_1, \dots, x_n)$ can be turned into f(x) units of output.
Note: By assuming the range of f is $\mathbb{R}$, not $\mathbb{R}^k$ for some integer k > 1, we are focusing on single-product firms. (But think of a sheep.)
For simplicity f is usually assumed differentiable. For any i,
$$\frac{\partial f}{\partial x_i} = MP_i$$
is called the ith marginal product, typically strictly positive.
The law of diminishing returns asserts that if all other inputs are held fixed as the ith one is increased, eventually $MP_i$ will decline and approach zero. The reasoning is that you can’t do much with vast quantities of water, for example, if space and other raw materials and labor are fixed.
But returns to scale are an entirely different matter.
Production function f is said to have
constant returns to scale if f (tx) = tf (x) ∀x, ∀t > 1.
increasing returns to scale if f (tx) > tf (x) ∀x, ∀t > 1.
decreasing returns to scale if f (tx) < tf (x) ∀x, ∀t > 1.
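The three definitions can be illustrated with a small check. The sketch below assumes a two-input Cobb-Douglas production function f(x1, x2) = x1^a x2^b and tests the scaling condition at a single input bundle and a single t > 1; the definitions above require the condition for all x and all t > 1, so this is only a spot check. The function names and numbers are illustrative assumptions.

```python
def cobb_douglas(x1, x2, a, b):
    """Assumed production function f(x1, x2) = x1**a * x2**b."""
    return x1 ** a * x2 ** b


def returns_to_scale(f, x, t=2.0, tol=1e-9):
    """Compare f(t*x) with t*f(x) at ONE bundle x and ONE scale factor t > 1.

    This is a spot check only: the definitions quantify over all x and t.
    """
    scaled = f(*[t * xi for xi in x])
    base = t * f(*x)
    if abs(scaled - base) <= tol * max(1.0, abs(base)):
        return "constant"
    return "increasing" if scaled > base else "decreasing"


bundle = (4.0, 9.0)
print(returns_to_scale(lambda x1, x2: cobb_douglas(x1, x2, 0.5, 0.5), bundle))
print(returns_to_scale(lambda x1, x2: cobb_douglas(x1, x2, 0.75, 0.75), bundle))
print(returns_to_scale(lambda x1, x2: cobb_douglas(x1, x2, 0.25, 0.25), bundle))
```

For this functional form, f(tx) = t^(a+b) f(x), so the classification depends only on whether a + b equals, exceeds, or falls short of 1.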
An isoquant is the set of all input vectors that produce the same
output level. It is analogous to an indifference curve.
In two dimensions its equation is
$$f(x) = \bar{y}$$
and total differentiation gives the slope
$$-\frac{MP_1}{MP_2}$$
One can denote output price by $p \in \mathbb{R}_{++}$ and input prices by $w \in \mathbb{R}^n_{++}$. In two dimensions one often sees costs = wL + rK, where L and K are labor and capital, w is the wage rate and r is the cost of capital.
An isocost line (or curve) is the set of input bundles that cost a particular amount, for example
$$w_1 x_1 + w_2 x_2 = c$$
The Producer’s Problem
Suppose you produce q units. You are not maximizing profits unless
you are minimizing the cost of producing q. Doing this is exactly
analogous to minimizing the cost of producing u utils in consumer
theory! So the Lagrangean has to look the same, the FOC’s must look
the same, the graphical interpretation must look the same (tangency
between isocost and isoquant) and so on.
$$\min_{x \in \mathbb{R}^n_+} w \cdot x \quad \text{s.t. } f(x) = q$$
$$L = w \cdot x - \lambda (f(x) - q)$$
FOC’s:
$$\frac{1}{\lambda} = \frac{MP_1}{w_1} = \dots = \frac{MP_n}{w_n}$$
The solution x(w, q) is called the conditional input demand
function. The associated minimized value function is
C(w, q) = w · x(w, q)
Naturally the envelope theorem here is again:
Shephard’s Lemma
$$\frac{\partial C(w, q)}{\partial w_i} = x_i(w, q)$$
Because all of the math is identical to the cost minimization problem
for the consumer, this cost function is also concave in w, and the two
proofs that input demand i is weakly downward sloping in wi , are
identical to those in consumer theory. You should know them.
A firm often makes some commitments regarding inputs that cannot be
exchanged quickly. For example, it may rent capital for a month, or
hire a manager on a one-year contract. So if input or output prices
change, the firm may be stuck in the short run, with a suboptimal
vector of inputs.
Long Run
In the long run, the firm is assumed to be able to choose all inputs
optimally. Write LC(q) for the long run cost of producing q units of
output.
Since it takes no input to produce no output,
$$LC(0) = 0$$
But there might be setup costs that cause a discontinuity in LC at the origin. For example, before a restaurant can serve any food at all, it needs to meet health codes and pass inspections. So it might need to spend $20,000 before it is allowed to produce any q > 0.
$$LMC(q) = \frac{d\,LC(q)}{dq}$$
is the long run marginal cost.
$$LAC(q) = \frac{LC(q)}{q}$$
is the long run average cost.
For any q > 0,
$$\frac{d\,LAC}{dq} = \frac{d}{dq}\left(\frac{LC}{q}\right) = \frac{q\,LMC - LC}{q^2} = \frac{LMC - LAC}{q}$$
This says the average is always moving toward the marginal.
If a person taller than the average person in a group joins the group,
she increases the average. But not by much, if it’s a group with lots of
members.
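The claim that the average moves toward the marginal can be seen in a concrete case. The sketch below uses a hypothetical long run cost function LC(q) = q³ − 4q² + 10q (no setup costs); the cost function and function names are illustrative assumptions, not from the notes.

```python
# Hypothetical long run cost function with no setup cost (an assumption).

def lc(q):
    return q ** 3 - 4 * q ** 2 + 10 * q


def lmc(q):
    """Long run marginal cost: the derivative of lc."""
    return 3 * q ** 2 - 8 * q + 10


def lac(q):
    """Long run average cost, defined for q > 0."""
    return lc(q) / q


def lac_slope(q):
    """d LAC / dq computed from the identity (LMC - LAC) / q."""
    return (lmc(q) - lac(q)) / q


for q in (1.0, 2.0, 3.0):
    print(q, lac(q), lmc(q), lac_slope(q))
```

In this example LAC equals q² − 4q + 10, whose derivative 2q − 4 matches the identity: LAC falls while LMC is below it, is flat where the two are equal, and rises once LMC is above it.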
Case 1: No setup costs.
If LC is differentiable, it approaches 0 as q → 0. Therefore:
$$\lim_{q \to 0} LAC = \lim_{q \to 0} \frac{LC}{q} = \lim_{q \to 0} LMC \quad \text{(by L’Hôpital’s Rule)}$$
Therefore LAC and LMC have the same vertical intercept. Lots of shapes are possible. A favorite one among economists is “U-shaped cost curves”, as shown in the graph.
[Figure: U-shaped LAC and LMC curves plotted against q.]
Here, LAC declines, while LM C is below it, until it is cut by LM C as
the latter rises. Thus the minimum of LAC occurs where
LM C = LAC. (This is true even if there are setup costs.)
L’Hôpital’s Rule can further be pressed into service to show (in the
absence of setup costs) that
$$\lim_{q \to 0} \frac{d\,LMC}{dq} = 2 \lim_{q \to 0} \frac{d\,LAC}{dq}$$
This is a basic property of marginals and averages, having nothing to
do with the fact that these happen to be cost curves. We will see the
same relationship when we get to the firm’s marginal revenue and
demand curves.
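The factor of 2 is easy to see in a special case. The sketch below assumes a polynomial cost function with no setup cost; the functional form and the coefficients $c_1, c_2, c_3$ are illustrative assumptions.

```latex
% Assumption: LC(q) = c_1 q + c_2 q^2 + c_3 q^3, so LC(0) = 0 (no setup cost).
\begin{align*}
LAC(q) &= c_1 + c_2 q + c_3 q^2, &
\frac{d\,LAC}{dq} &= c_2 + 2 c_3 q \;\to\; c_2 \quad (q \to 0) \\
LMC(q) &= c_1 + 2 c_2 q + 3 c_3 q^2, &
\frac{d\,LMC}{dq} &= 2 c_2 + 6 c_3 q \;\to\; 2 c_2 \quad (q \to 0)
\end{align*}
```

Both curves start at the same intercept $c_1$, and near the origin the marginal curve is twice as steep as the average curve.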
It is easy to explain why LAC might initially decline. Adam Smith
pointed out in 1776 in The Wealth of Nations that with many
workers, specialization is possible, and profitable. This is an example of
an “economy of scale”.
But what explains an eventual rise in LAC? The best story is that
“span of control” issues become a problem as the firm gets really large.
Case 2: Strictly positive setup costs.
Now
$$\lim_{q \to 0} LC > 0$$
therefore
$$\lim_{q \to 0} LAC = \infty$$
The $20,000 in our restaurant example is divided by a vanishing q, so the ratio LC/q explodes.
Here a different graph becomes economists’ favorite:
[Figure: LAC and LMC curves plotted against q for the case with setup costs.]
Sometimes it is useful to see how the LC curve is related to the
marginal and average curves.
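[Figure: top panel shows LC against q, with its point of inflection and the point where LAC is minimized marked; bottom panel shows the corresponding LMC and LAC curves against q.]
While LC is concave, LMC is falling, because $\frac{d^2 LC}{dq^2} < 0$.
After the point of inflection, LMC is rising.
LAC is the rise over run of the line from the origin to a point on LC, so LAC is minimized where the line from the origin is tangent to LC.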
Short Run
Let SF C (short run fixed costs) be costs that cannot be avoided in the
short run, even by shutting down. Define short run variable costs by
$$SVC = SC - SFC$$
Dividing both sides by q gives average costs:
$$SAVC = SAC - SAFC$$
$$SMC = \frac{d\,SC}{dq}$$
Notice that, because $\frac{d\,SFC}{dq} = 0$, SMC is the marginal curve of both SC and SVC and hence cuts the average curves SAC and SAVC at their respective minima.
The graph shows a U-shaped SM C curve in a case without setup costs.
[Figure: top panel shows SC and SVC against q; bottom panel shows SMC, SAC, SAVC and SAFC against q.]
LC and SC Together
Suppose that, fixing input prices in the background, $\bar{K}$ is the ideal amount of capital for producing some particular output level $\bar{q}$. Then if K is fixed at $\bar{K}$ in the short run (all other inputs variable),
$$LC(\bar{q}) = SC(\bar{q})$$
and probably
$$LC(q) < SC(q), \quad q \neq \bar{q}$$
(the weak inequality surely holds).
The average curves LAC and SAC are related in exactly the same way.
So LC and SC are tangent at q̄, and so are LAC and SAC.
Of course tangency of LC and SC at q̄ means that
LM C(q̄) = SM C(q̄).
Probably the simplest setting in which to see the relationship between long and short run curves is the case of constant LAC and LMC.
In the top graph, draw LC and ask where SC must lie. In the lower, draw LMC = LAC and see where SAC and then SMC must be.
[Figure: top panel shows LC and SC against q, with SFC marked on the vertical axis; bottom panel shows the horizontal line LMC = LAC together with SAC and SMC, with q̄ marked on the q axis.]
What about the more ambitious case where long run costs are rising? Let’s say that q̄ happens to be higher than the $q^m$ that minimizes SAC.
Draw LC, then SC, then LAC and LMC, then SAC and finally SMC.
[Figure: top panel shows LC and SC against q; bottom panel shows LMC, LAC, SAC and SMC, with $q^m$ and q̄ marked on the q axis.]
Suppose instead LAC and LM C are our favorite U-shaped curves. For
each different fixed capital level (ideal for a different output level), the
corresponding LAC and SAC must coincide at the respective q value.
[Figure: a U-shaped LAC curve together with short run SAC and SMC curves for different fixed capital levels; output levels q′, q* and q′′ are marked on the q axis.]
With capital fixed at $K^*$, ideal for $q^*$, LAC and SAC are both minimized at $q^*$ and SMC cuts them there, as it rises. At $q'$ (with capital fixed at the corresponding $K'$), SAC is not minimized, because it is tangent to the downward-sloping LAC curve there.
Firm Supply
Graphical analysis of a firm’s supply curve depends heavily upon the
properties of marginal and average cost curves. Fix input prices and
regard supply as a function of the market price p ∈ R+ , which the firm
cannot change.
The firm wishes to maximize
profits = revenues - costs
In the long run, it solves
$$\max_{q \in \mathbb{R}_+} \; pq - LC(q)$$
FOC:
$$p = LMC$$
This condition must hold at any interior solution. But it might be better to produce 0: this is the case if p < min LAC, so that (multiplying by q) revenue < LC at every q > 0.
In the simplest case, there are no setup costs, and LM C and LAC are
increasing.
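[Figure: increasing LMC and LAC curves, with the critical price $p^c$ marked on the price axis.]
Below $p^c$, it is impossible to cover costs, so q = 0. Above $p^c$, the firm produces where p = LMC. Thus, the LR supply curve “is” the LMC curve, in this case.
[Figure: the long run supply curve drawn together with LMC; p on the vertical axis, q on the horizontal axis.]
Notice that for the supply curve, p is the independent variable, whereas for marginal cost, q is the independent variable.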
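Notice that integrating from 0 to q under the LMC curve gives us LC(q). Thus, profits are revenues − costs = pq − (shaded area).
[Figure: the area under LMC from 0 to q is shaded; p on the vertical axis, q on the horizontal axis.]
Thus the graph shows us the firm’s profits at any price.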
Next, suppose that LM C and LAC are U-shaped. The supply curve
no longer coincides with LM C.
[Figure: U-shaped LMC and LAC curves with the supply curve; $p^c$ is marked on the price axis and q′, $q^c$, q′′ on the quantity axis.]
Now, supply will be 0 until price exceeds the critical value for entry $p^c$. For prices below $p^c$, the firm cannot cover its costs and prefers not to produce.
For prices slightly above $p^c$, p = LMC is satisfied at two quantities. Note that at the lesser of these two, $q'$, p < LAC, so losses would be incurred. But at $q''$, p > LAC and the firm earns profits.
So long run supply here will be given by the portion of LMC to the right of $q^c$ (and by the points below $p^c$ on the vertical axis).
[Figure: the supply curve alone, with $p^c$ marked on the price axis and $q^c$, q′ on the quantity axis.]
If someone shows you the supply curve, you can’t deduce what LMC was. So it seems to be impossible to integrate under LMC to find LC(q), just from the supply curve.
BUT because the firm is indifferent, at $p^c$, between q = 0 and q = $q^c$, you infer that the revenue from $q^c$ just covers the cost of producing those units, that is, $LC(q^c) = p^c q^c$.
Therefore, for any $q' > q^c$,
$$LC(q') = p^c q^c + \text{area under supply curve from } q^c \text{ to } q'$$
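This recovery of long run cost from the supply curve can be checked on an example. The sketch below uses a hypothetical cost function LC(q) = q³ − 4q² + 10q, for which LAC is minimized at q = 2 with value 6, so the critical price is 6 and the critical quantity is 2; for quantities above 2 the supply curve coincides with LMC. The cost function, names and numerical integration scheme are illustrative assumptions.

```python
# Hypothetical cost function with U-shaped LAC and no setup cost (an assumption).

def lc(q):
    return q ** 3 - 4 * q ** 2 + 10 * q


def lmc(q):
    """Marginal cost; for q >= Q_C this is also the inverse supply curve."""
    return 3 * q ** 2 - 8 * q + 10


Q_C = 2.0  # quantity minimizing LAC = q**2 - 4*q + 10
P_C = 6.0  # minimum LAC, the critical price for entry


def area_under_supply(q_lo, q_hi, n=10000):
    """Midpoint-rule integral of the supply price between two quantities."""
    width = (q_hi - q_lo) / n
    return sum(lmc(q_lo + (k + 0.5) * width) for k in range(n)) * width


def lc_from_supply(q):
    """Long run cost inferred from the supply curve alone, for q > Q_C."""
    return P_C * Q_C + area_under_supply(Q_C, q)


print(lc_from_supply(3.0), lc(3.0))
```

The two printed values should agree up to numerical integration error, even though the reconstruction never uses LMC below the critical quantity.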
What if there are setup costs? Then it is no longer true that integrating under LMC gives LC(q): this misses the setup cost.
But it is still true that for the critical entry price $p^c$ and corresponding $q^c$,
$$LC(q^c) = p^c q^c$$
(inferred from the firm’s indifference between q = 0 and $q^c$).
[Figure: top panel shows LMC and LAC with $p^c$ and $q^c$ marked; bottom panel shows the supply curve, with the area representing LC(q′) marked for an output q′.]
In the short run, what is the firm’s objective? SF C are unavoidable
in the short run, so the objective is
$$\max_{q \in \mathbb{R}_+} \; pq - SVC(q)$$
Thus the FOC for an interior maximizer is
$$p = SMC$$
but if p < min SAVC, the firm prefers to set q = 0.
Notice that the firm may choose to produce even if it is making losses in the short run.
For example, if revenues exceed variable costs by $3,000 but cannot
cover the SF C of $5,000, it is better to produce and take a net loss of
$2,000 than to choose q = 0 and lose $5,000.
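The arithmetic of this example can be written out as a small sketch. The revenue and variable cost figures below are hypothetical, chosen only so that revenue exceeds variable cost by $3,000; the function names are also illustrative.

```python
def short_run_profit(revenue, variable_cost, fixed_cost, produce):
    """Profit when producing, or the unavoidable loss of SFC when shut down."""
    if produce:
        return revenue - variable_cost - fixed_cost
    return -fixed_cost


def should_produce(revenue, variable_cost):
    """Short run rule: produce whenever revenue covers variable cost."""
    return revenue >= variable_cost


# Hypothetical figures: revenue exceeds variable cost by 3,000; SFC is 5,000.
revenue, variable_cost, fixed_cost = 10_000, 7_000, 5_000
profit_if_producing = short_run_profit(revenue, variable_cost, fixed_cost, True)
profit_if_shut_down = short_run_profit(revenue, variable_cost, fixed_cost, False)
print(profit_if_producing, profit_if_shut_down,
      should_produce(revenue, variable_cost))
```

Fixed costs appear in both branches, so they drop out of the comparison; only revenue versus variable cost matters for the decision.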
[Figure: SMC, SAC and SAVC curves, with the price p′ and the corresponding quantity q′ marked.]
Consider an example with no setup costs but positive fixed costs.
At price $p'$, the FOC suggests: consider setting q = $q'$.
Is this better than q = 0?
The firm makes losses at $q'$, because $p'$ < SAC. But $p'$ > SAVC, so $q'$ is better than 0.
In these graphs, one can calculate SVC(q) either by $p^c q^c$ plus the integral under supply from $q^c$ to q, or by q · SAVC; adding SFC then gives SC(q).
Definition
The producer surplus, given price p, is the dollar benefit to the
producer from being able to produce and sell in the market. More
precisely, it is profits at the optimal q minus profits at q = 0.
In the long run, producer surplus coincides with profits, because
LC(0) = 0 so profits at q = 0 are 0.
But in the short run, does the firm try to maximize profits, or producer
surplus?
It amounts to the same thing, because they differ by a constant, SF C.
In both the short run and long run analysis, we have been assuming
that the firm takes market price p as fixed (views it as something the firm
cannot noticeably affect, because the firm is of negligible size relative
to the market). For some technologies, this makes sense; for some it
doesn’t.
For example, suppose there are setup costs, and LC is linear. Then
LM C is constant.
[Figure: LAC and LMC plotted against q.]
What is the long run supply curve? Does the question make sense in a
price-taking setting? Even without setup costs, the same problem arises
if LM C is declining as q grows without bound. So “increasing returns
to scale everywhere” can’t be analyzed using “price-taking” methods.
One-Step Maximization
We have been focused on a two-step approach to the producer problem:
first, solve the cost minimization problem to find the average and
marginal cost curves, and then use them to understand the supply
decision, as a function of market price. This will prove extremely useful
in doing policy analysis.
But think for a moment about tackling the firm’s profit-maximizing
problem in one step:
$$\max_{x \in \mathbb{R}^n_+} \; p f(x) - w \cdot x$$
where $p \in \mathbb{R}_{++}$ is the market price of output,
$w \in \mathbb{R}^n_{++}$ is the vector of input prices,
$x \in \mathbb{R}^n_+$ is the vector of input quantities,
f(x) is the quantity of output.
If output price rises, will the producer always produce more?
Proposition
Fix some vector of input prices w, and suppose $p' > p$. Let $x$ and $x'$ solve the profit maximization problem at (p, w) and ($p'$, w) respectively. Then
$$f(x') \ge f(x)$$
Proof.
$$p f(x) - w \cdot x \ge p f(x') - w \cdot x' \tag{8}$$
$$p' f(x) - w \cdot x \le p' f(x') - w \cdot x' \tag{9}$$
(because $x$ is best at p, and $x'$ is best at $p'$).
Multiply (8) by −1 (changing the direction of the inequality) and add the result to (9):
$$(p' - p) f(x) \le (p' - p) f(x')$$
therefore
$$(f(x') - f(x))(p' - p) \ge 0$$
so $f(x') \ge f(x)$.
This means that firms’ supply curves are always (at least weakly)
upward-sloping, no matter how complicated M C and AC might be.
Are we talking about the long run or the short run here?
The previous result (and the next one) apply in either case (the only
difference is, how many inputs does the firm get to vary?)
What if, instead, one input price rises while all others, and p, are held
fixed?
Proposition
Suppose $w'$ and $w$ differ only in the ith component, where $w'_i > w_i$. Price of output is fixed at p. If $x$ and $x'$ solve the profit maximization problem at (p, $w$) and (p, $w'$) respectively, then $x'_i \le x_i$.
Proof.
Because $x$ is profit-maximizing at (p, $w$),
$$p f(x) - w \cdot x \ge p f(x') - w \cdot x' \tag{10}$$
But $x'$ is profit-maximizing at (p, $w'$), so
$$p f(x) - w' \cdot x \le p f(x') - w' \cdot x' \tag{11}$$
Multiplying (11) by −1, thereby changing the direction of the inequality, and adding the result to (10):
$$(w' - w) \cdot x \ge (w' - w) \cdot x'$$
therefore
$$(w' - w) \cdot (x - x') \ge 0$$
Hence, because all components of $w' - w$ are 0 except the ith,
$$(w'_i - w_i)(x_i - x'_i) \ge 0$$
so $x'_i \le x_i$.
Thus, factor input demands are weakly downward-sloping in their own
prices.
Market Demand
It’s time to aggregate demand and supply curves to derive the market
demand and market supply curves. In the case of market demand, this
is quite straightforward. At each price p, the jth consumer’s demand function $D_i^j$ for the ith good specifies the quantity $x_i^j$ demanded by j, as a function of her income and all prices. When we draw her demand curve, we hold her income and all prices $p_k$, $k \neq i$, fixed.
Remember that here, $p_i$ is the independent variable, and the quantity demanded is plotted on the horizontal axis. So the market demand for good i is the horizontal summation of all the individual demand curves of the m consumers:
$$D_i(p_i) = \sum_{j=1}^{m} D_i^j(p_i)$$
Draw yourself some examples. Note carefully that the sum of linear
demand curves, for example, is not linear, if they have different
intercepts.
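[Figure: individual linear demand curves $D_i^1$ and $D_i^2$ and their horizontal sum $D_i(p_i)$; $p_i$ on the vertical axis, $x_i$ on the horizontal axis.]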
Here the graph shows two consumers with linear demand curves and
different vertical intercepts. The sum has a kink at the price where the
second consumer switches between participation and nonparticipation.
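The kink can be reproduced with two hypothetical linear demand curves. In the sketch below, the intercepts (10 and 6), the common slope and the function names are illustrative assumptions; each consumer demands zero at prices above her own vertical intercept.

```python
def linear_demand(intercept, slope):
    """Demand q = max(0, (intercept - p) / slope); `intercept` is the choke price."""
    return lambda p: max(0.0, (intercept - p) / slope)


# Two hypothetical consumers with different vertical intercepts.
d1 = linear_demand(10.0, 1.0)
d2 = linear_demand(6.0, 1.0)


def market_demand(p):
    """Horizontal summation: add quantities demanded at the same price."""
    return d1(p) + d2(p)


for p in (8.0, 7.0, 6.0, 5.0, 4.0):
    print(p, market_demand(p))
```

Above a price of 6 only the first consumer buys, so a one-unit price cut adds one unit of market demand; below 6 both buy, so the same price cut adds two units. The change in slope at p = 6 is the kink.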
How about aggregating consumer surplus?
This is an extremely delicate issue. What does it mean to add up the
welfare of different people? Is an extra thousand dollars to a rich
person comparable in value to an extra thousand for a poor person
struggling for survival? Most people would say no.
Nonetheless, economists often simplify by just adding up individual
consumer surpluses to get an aggregate measure. Gertrude Stein’s
most famous line is: “A rose is a rose is a rose.” When one adds up
different individuals’ consumer surpluses, one is saying, without
justification, “A dollar is a dollar is a dollar”.
Is the sum of individual CS captured in the market demand curve? Or is
information lost in the aggregation?
At price $p_i^0$, consumer j enjoys consumer surplus
$$CS_i^j(p^0) = \int_{p_i^0}^{\infty} D_i^j(p_i)\,dp_i$$
where we suppress the other arguments of demand including other prices and income. So consumer surplus is
$$\begin{aligned}
\sum_{j=1}^{m} CS_i^j(p^0) &= \sum_{j=1}^{m} \int_{p_i^0}^{\infty} D_i^j(p_i)\,dp_i \\
&= \int_{p_i^0}^{\infty} \sum_{j=1}^{m} D_i^j(p_i)\,dp_i \\
&= \int_{p_i^0}^{\infty} D_i(p_i)\,dp_i
\end{aligned}$$
the area to the left of market demand, above price.
Market Supply
Here it is useful once again to consider the long run and the short run
separately. For the aggregation exercise, the simple case is the short
run. Economists usually assume that it is a good approximation to
hold input prices fixed as firms in the market move up or down their
short run supply curves. That is, because some factors of production
are fixed in the short run, firms (in response to an increase in p) won’t
increase their scale of production so much that factor input prices are
bid up significantly. So all factor input prices are held fixed while
output price is varied and firms respond accordingly.
In the short run, aggregate supply is just the horizontal sum of firm
supply curves. If there are l firms active in the short run:
$$S_i(p) = \sum_{k=1}^{l} S_i^k(p)$$
How about aggregate producer surplus?
Again, we sum the surpluses of the individual firms, each of which is the area to the left of the firm’s supply curve, up to the market price $p_i^0$:
$$\begin{aligned}
\sum_{k=1}^{l} PS_i^k(p^0) &= \sum_{k=1}^{l} \int_{0}^{p_i^0} S_i^k(p_i)\,dp_i \\
&= \int_{0}^{p_i^0} \sum_{k=1}^{l} S_i^k(p_i)\,dp_i \\
&= \int_{0}^{p_i^0} S_i(p_i)\,dp_i
\end{aligned}$$
the area to the left of market supply, under price $p_i^0$.
Is every firm’s short run supply curve the same as any other’s? Not
necessarily. Even if all firms have access to the same technology, maybe
they have made their choices of capital levels (or other inputs that take
a long time to adjust) at different times, when different prices
prevailed. Hence, they may have made different choices about input
levels that are now, in the short run, fixed. As a result, their cost
curves, and therefore their supply curves, may differ.
This means that as price increases from some low level, market
quantity supplied increases partly because firms that are already active
each produce more, and partly because other firms (with higher
minimum values of SAV C) jump into production.
Understanding long run market supply is more subtle. All factor inputs
are variable, so one can think of the industry growing from relatively
low outputs, at low market price, to, say, twenty times the output at a
substantially higher market price. At that price it will be using vastly
greater quantities of some factor inputs. Does this bid up prices of
inputs?
That depends on the size of this industry relative to the size of the
input markets. If the industry is one of hundreds that use water (or
unskilled labor, or gasoline...) in some region, its changing demand for
water amounts to fairly little as a proportion of total demand for
water. In this case, changes in this industry have little effect on the
price of water.
Definition
An industry is called a constant cost industry if factor input prices
remain constant as the industry expands. If instead, one or more input
prices get bid up as the industry expands, it is called an increasing
cost industry.
WARNING
“Constant cost industry” does not refer to the shape of each firm’s
marginal or average cost curves!!!
Example
Constant cost industry with identical firms and U-shaped cost curves.
On the left, draw a typical firm’s cost curves, and on the right, with a
different scale on the horizontal axis (say, millions of units), the market
supply curve.
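[Figure: left panel, a typical firm’s LMC and LAC curves plotted against q, with \(p^c\) and \(q^c\) marked; right panel, Market Supply plotted against Q.]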
At the critical price \(p^c\), every potential firm (we have free entry) is
indifferent between producing \(q^c\) (perhaps 100 units), where LAC is
minimized, or 0. In the aggregate, that means the industry is willing,
at that price, to supply 100 units in total, or 200, 300, or \(100k\) for any
natural number \(k\).
If we plot all these \((100k, p^c)\) points on the right graph, with Q being
millions of units, say, the dots are so close together that it just looks
like a solid line.
Thus, in a constant cost industry with identical firms, U-shaped cost
curves yield a flat industry supply curve (ignoring the vertical section
on the price axis below \(p^c\)).
Some restaurants in NYC have locations near the Hudson River,
enjoying great views of the river and beautiful New Jersey. Others
don’t. Does this mean those with views can earn profits that others
can’t?
Economists view these returns as rents associated with the favorable
location. If a restaurant is renting the favorable site, competition with
other restaurants who would like a great view (in order to be able to
charge more for the same food and service) means the owner of the
site, not the restaurateur, captures those extra returns, literally as rent.
Even if a restaurant owns one of the favorable sites, it is forgoing
the chance to rent it to another restaurant, so that forgone rent is an
opportunity cost of operating the business on the owned property. So
economic profits are zero.
In an increasing cost industry, as Q increases, one or more factor input
prices are being bid up. This can change the LMC and LAC curves in
complicated ways, which are difficult to derive graphically. The result,
though, is that each firm’s value of min LAC rises as Q rises, so for
cost to be covered at higher values of Q, market price must be higher.
[Figure: upward-sloping long run supply curve plotted against Q, with the equilibrium \((Q^e, p^e)\) marked.]
Suppose market equilibrium occurs at some \((Q^e, p^e)\). The graph would
suggest that producer surplus in this market is
\[p^e Q^e - (\text{total cost of production}) = p^e Q^e - (\text{area under LR supply, from } 0 \text{ to } Q^e)\]
But with free entry, shouldn’t profits be zero?
We’ll discuss the resolution of this puzzle.
Partial Equilibrium
Partial equilibrium refers to the study of one market in isolation. Drop
the i subscript and consider market demand D(p) (holding all incomes
and prices in other markets fixed) and market supply S(p) (holding all
input prices fixed).
The pair (p, Q) is an equilibrium of this model if D(p) = S(p) = Q. If
price is such that demand exceeds supply, the excess demand drives
price upward, moving toward equilibrium.
[Figure: supply S and demand D crossing at the equilibrium \((Q^e, p^e)\).]
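To make the equilibrium condition D(p) = S(p) = Q concrete, here is a minimal Python sketch. The linear demand and supply functions are assumptions chosen only so the numbers come out round; nothing in the notes fixes these forms.

```python
from fractions import Fraction as F

def demand(p):
    return 100 - 2 * p      # D(p), assumed for illustration

def supply(p):
    return 10 + p           # S(p), assumed for illustration

# Solve D(p) = S(p) by hand: 100 - 2p = 10 + p, so p = 90 / 3.
p_e = F(100 - 10, 2 + 1)
Q_e = demand(p_e)

def excess_demand(p):
    return demand(p) - supply(p)

print(p_e, Q_e, excess_demand(20), excess_demand(40))
```

Below the equilibrium price excess demand is positive, which is the force pushing price upward; above it, excess demand is negative.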
The government might intervene in the market, taxing or subsidizing
consumption or production.
Definition
Social surplus is the sum of consumer surplus, producer surplus, and
government net revenues (tax revenue minus government expenditures).
Notice that in equilibrium, exactly those units have been produced and
consumed for which the social benefit (measured by the demand curve)
exceeds the social cost (measured by the supply curve). Producing more
units would involve costs that cannot be covered by consumers’
marginal benefits.
Voltaire’s Dr. Pangloss would say: “Everything is for the best, in this
best of all possible worlds.” He would be disappointed to learn that
equilibria often fail to maximize social surplus, in the presence of
market power, asymmetric information or externalities. We will study
each of these later in the course.
Suppose the government imposes a tax on consumers of $t for each
unit of this good they buy.
[Figure: supply S, original demand D and shifted demand \(D'\); prices \(p' + t\), \(p^e\), \(p'\) and quantities \(Q'\), \(Q^e\) marked.]
To find the new demand curve, choose any quantity \(Q'\) and note that it now
takes a price t units lower to call forth that demand. Consequently, the
new demand curve \(D'\) is t units lower, at each \(Q'\), than the original.
(The price to the consumer is the market price plus the per unit tax.)
On the graph, \(p^e\) denotes the original equilibrium price, and \(p'\) the new
market price. The consumer pays \(p' + t\) per unit.
The loss in social surplus associated with the tax, also called the
excess burden of the tax, can be computed in two equivalent ways.
Method 1
Find the change in consumer surplus, the change in producer surplus
and change in net tax revenues, and add them up.
[Figure: the tax diagram again, with S, D and \(D'\), prices \(p' + t\), \(p^e\), \(p'\) and quantities \(Q'\), \(Q^e\) marked.]
The loss in PS is the blue shaded area:
\[(p^e - p')\,Q' + \text{area below } p^e \text{ above supply, from } Q' \text{ to } Q^e\]
The loss in CS is the red shaded area:
\[(p' + t - p^e)\,Q' + \text{area above } p^e \text{ below demand, from } Q' \text{ to } Q^e\]
But the area \(tQ'\) (see the corresponding rectangle on the graph) is a
gain in government revenues.
Method 1 is reliable, but it is inefficient, in the sense that it
unnecessarily keeps track of “transfers” of money from one party to
another.
For example, we know that the money consumers pay in taxes, which is
counted as a loss of consumer surplus, is going to show up as increased
government revenues, and these two things just cancel out in social
surplus calculations. The same goes for the fall in market price, a loss
to producers but a gain to consumers. Method 2 is a quicker way to get the
answer, but it takes some practice to do correctly.
Method 2
After determining what is consumed and produced, ignore price (it just
determines how surplus is divided) and use the demand curve to
measure the change in true benefit (from units consumed before but
not after, or vice versa) and the supply curve to measure the change in
social cost (of units produced before but not after the policy,
or vice versa).
For example, in evaluating the consumption tax, notice that there are
\(Q^e - Q'\) units no longer produced and no longer consumed, after
imposition of the tax. The loss in real consumer benefits, the area
under demand from \(Q'\) to \(Q^e\), exceeds the reduction in social cost of
production (the area under supply from \(Q'\) to \(Q^e\)) by the area between
the two curves, from \(Q'\) to \(Q^e\).
That’s exactly what Method 1 produced at greater length.
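The agreement between the two methods can be checked numerically. This is a sketch under assumed linear curves (inverse demand p = 100 − Q, inverse supply p = 10 + 2Q) and an assumed tax of 15 per unit; the variable names are illustrative, not the notes’ notation.

```python
from fractions import Fraction as F

a, b = F(100), F(1)    # inverse demand: p = a - b*Q (assumed numbers)
c, d = F(10), F(2)     # inverse supply: p = c + d*Q (assumed numbers)
t = F(15)              # per-unit tax on consumers

def benefit(Q):
    """Area under the demand curve from 0 to Q."""
    return a * Q - b * Q**2 / 2

def cost(Q):
    """Area under the supply curve from 0 to Q."""
    return c * Q + d * Q**2 / 2

Q_e = (a - c) / (b + d)        # original equilibrium quantity
p_e = a - b * Q_e
Q_t = (a - c - t) / (b + d)    # quantity where demand is above supply by t
p_t = c + d * Q_t              # new market price; consumers pay p_t + t

# Method 1: add up the changes in CS, PS and tax revenue.
d_cs = (benefit(Q_t) - (p_t + t) * Q_t) - (benefit(Q_e) - p_e * Q_e)
d_ps = (p_t * Q_t - cost(Q_t)) - (p_e * Q_e - cost(Q_e))
d_revenue = t * Q_t
loss_method1 = -(d_cs + d_ps + d_revenue)

# Method 2: ignore prices; compare benefit and cost of the units that vanish.
loss_method2 = (benefit(Q_e) - benefit(Q_t)) - (cost(Q_e) - cost(Q_t))

print(loss_method1, loss_method2)
```

With these numbers the transfer terms in Method 1 (the tax payments and the price change) cancel exactly, leaving the triangle between the curves.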
What if producers, instead of consumers, were obliged to pay the tax of
$t per unit?
Note that in the case of the tax on consumers, at the new equilibrium
\(Q'\), the consumers are paying $t more (per unit) than producers are
getting, so \(Q'\) is that quantity at which the original demand curve is
above (original) supply by t.
That must also be the case with the producer tax (consumers are
paying $t more than producers are getting) so the equilibrium is again
at \(Q'\). Everyone is actually paying (or, for firms, keeping after tax) the
same amount in the two situations, so changes in CS, PS and tax
revenues are identical across these alternative tax schemes.
Although the schemes have different legal incidences (who legally has
to pay the tax), their economic incidences are identical. Economists
say that part of the burden of the tax is shifted, by a market price
adjustment, from one party (who legally pays the tax) to the other.
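A small Python sketch of the equivalence, again with assumed linear curves and an assumed tax of 15. Only the posted market price differs between the two schemes; the quantity, what consumers pay and what producers keep do not.

```python
from fractions import Fraction as F

a, b = F(100), F(1)    # inverse demand: p = a - b*Q (assumed numbers)
c, d = F(10), F(2)     # inverse supply: p = c + d*Q (assumed numbers)
t = F(15)

def outcome(tax_on):
    """Return (Q, market price, price consumers pay, price producers keep)."""
    if tax_on == "consumers":
        # Demand shifts down by t: (a - t) - b*Q = c + d*Q
        Q = (a - t - c) / (b + d)
        market_price = c + d * Q
        return Q, market_price, market_price + t, market_price
    # Supply shifts up by t: a - b*Q = (c + t) + d*Q
    Q = (a - (c + t)) / (b + d)
    market_price = a - b * Q
    return Q, market_price, market_price, market_price - t

on_consumers = outcome("consumers")
on_producers = outcome("producers")
print(on_consumers, on_producers)
```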
Tariffs are taxes levied on imports. Suppose a “small open economy”
(that is, a country that is too small to affect the world price of a good
when it imports it) imposes a tariff of $t per unit on the importation of
a certain good. In the simplest case, there are no domestic producers.
[Figure: horizontal world supply at \(p^e\), raised to \(p^e + t\) by the tariff; domestic demand D; quantities \(Q'\) and \(Q^e\) marked on the Q axis.]
Consumers now have to pay the world price plus the tariff, so they buy
\(Q'\) units. What happens to domestic social surplus? Consumers lose
\[tQ' + \text{area under } D \text{ above } p^e, \text{ from } Q' \text{ to } Q^e\]
but \(tQ'\) is tariff revenue to the government, so the overall loss is just
the shaded area on the graph.
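What if there is a domestic industry producing this good? Label their
supply curve \(S^{DOM}\).
[Figure: domestic supply \(S^{DOM}\), domestic demand D, horizontal world supply at \(p^e\) and at \(p^e + t\); “original imports” and “final imports” marked; quantities \(Q'\) and \(Q^e\) on the Q axis.]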
Before the tariff, some of the domestic consumption Qe is imported,
the rest being produced domestically. With the tariff, less is imported
(see “final imports”) and more is produced within the country.
Method 2 says the loss in domestic social surplus is the sum of the red
area (how much more the lost consumption units mattered than it
originally cost to import them) and the blue area (how much more it
costs to produce the extra domestic units than it did to import them).
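Both tariff cases can be worked through with Method 2 in a few lines of Python. This is a sketch under assumptions: the linear demand and domestic supply curves, the world price of 40 and the tariff of 10 are invented for illustration, and `tariff_loss` is a hypothetical helper, not something from the notes.

```python
from fractions import Fraction as F

def tariff_loss(a, b, world_price, t, supply=None):
    """Loss in domestic social surplus from a per-unit tariff t (Method 2).

    Inverse demand is p = a - b*Q. `supply` is None (no domestic producers)
    or a pair (c, d) describing inverse domestic supply p = c + d*Q.
    Returns (red, blue): the consumption loss and the production loss.
    """
    def demanded(p):
        return (a - p) / b

    def benefit(Q):              # area under demand from 0 to Q
        return a * Q - b * Q**2 / 2

    q0, q1 = demanded(world_price), demanded(world_price + t)
    # Lost consumption: benefit forgone minus what those units cost to import.
    red = benefit(q0) - benefit(q1) - world_price * (q0 - q1)

    blue = F(0)
    if supply is not None:
        c, d = supply

        def supplied(p):
            return max(F(0), (p - c) / d)

        def cost(Q):             # area under domestic supply from 0 to Q
            return c * Q + d * Q**2 / 2

        s0, s1 = supplied(world_price), supplied(world_price + t)
        # Extra domestic output: its cost minus what importing it used to cost.
        blue = cost(s1) - cost(s0) - world_price * (s1 - s0)
    return red, blue

no_domestic = tariff_loss(F(100), F(1), F(40), F(10))
with_domestic = tariff_loss(F(100), F(1), F(40), F(10), supply=(F(10), F(2)))
print(no_domestic, with_domestic)
```

The tariff revenue never appears in the calculation, since it is a transfer from consumers to the government.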
Going back to a closed economy (no trade), let’s compare two ways of
helping farmers.
Policy 1
Pay a subsidy of $t for each unit produced. This shifts the supply
curve downward by t.
[Figure: supply S, subsidized supply \(S'\), demand D; prices \(p^e\), \(p'\) and quantities \(Q^e\), \(Q'\) marked; the shaded area is the loss in social surplus.]
Extra units \(Q' - Q^e\) are produced and consumed; consumers are
helped, producers are helped, but the loss of government revenues exceeds
those benefits. Method 1 says the loss in social surplus is the area
below S and above D, from \(Q^e\) to \(Q'\).
Policy 2
The government enters the market, buying agricultural goods (say,
eggs) to drive up their price. It eventually destroys them (rather than
selling them and driving down the price).
[Figure: supply S, private demand D and total demand D+G; prices \(p'\), \(p^e\) and quantities \(Q''\), \(Q^e\), \(Q'\) marked on the Q axis.]
G represents government demand. The extra demand drives the price
up to \(p'\), helping producers and hurting consumers, and costing the
government \(p'(Q' - Q'')\). Method 2 says: Units \(Q' - Q^e\) are newly
produced, costing the blue area on the graph. Units \(Q^e - Q''\) still get
produced, but are no longer enjoyed, so the loss here is the area under
D from \(Q''\) to \(Q^e\) (red area on the graph). This is a DISASTER
compared to the subsidy.
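To get a feel for the size of the gap between the two policies, here is a hedged numerical comparison in Python. It assumes linear curves and assumes both policies are calibrated so that producers end up receiving the same price of 80; those choices are illustrative only.

```python
from fractions import Fraction as F

a, b = F(100), F(1)    # inverse demand: p = a - b*Q (assumed numbers)
c, d = F(10), F(2)     # inverse supply: p = c + d*Q (assumed numbers)
target = F(80)         # price producers receive under either policy (assumed)

def benefit(Q):
    """Area under demand from 0 to Q."""
    return a * Q - b * Q**2 / 2

def cost(Q):
    """Area under supply from 0 to Q."""
    return c * Q + d * Q**2 / 2

Q_e = (a - c) / (b + d)        # no-policy equilibrium quantity
Q_prod = (target - c) / d      # output when producers receive `target`

# Policy 1: per-unit subsidy; all of Q_prod is consumed.
subsidy = target - (a - b * Q_prod)   # producer price minus consumer price
subsidy_loss = (cost(Q_prod) - cost(Q_e)) - (benefit(Q_prod) - benefit(Q_e))

# Policy 2: government buys and destroys; consumers buy only Q_cons.
Q_cons = (a - target) / b
government_purchases = Q_prod - Q_cons
purchase_loss = (cost(Q_prod) - cost(Q_e)) + (benefit(Q_e) - benefit(Q_cons))

print(subsidy, subsidy_loss, government_purchases, purchase_loss)
```

In this example the purchase program loses thirty times as much surplus as the subsidy: the extra units are produced in both cases, but under Policy 2 nobody consumes them, and consumption of existing units falls as well.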
Governments often employ price ceilings or price floors (think of rent
control, minimum wages, wartime price controls and so on). These
tend to cause excess supply or demand.
[Figure: vertical short run supply \(S_{SR}\), upward-sloping long run supply \(S_{LR}\), demand D; equilibrium price \(p^e\), price ceiling \(\bar{p}\), and quantities \(Q'\), \(Q^e\), \(Q''\) on the Q axis.]
Suppose, for example, a rent control law is passed, putting a price
ceiling \(\bar{p}\) on rents. If the short run supply curve is vertical (the stock of
apartments is fixed right now), this doesn’t affect supply, but it
increases quantity demanded. Excess demand is \(Q'' - Q^e\).
In the long run, fewer apartments will be provided for rental, so there
is even greater excess demand (\(Q'' - Q'\)). The effects on social surplus
depend on how the good in short supply is rationed.
At best, in the long run, if the renters who value the apartments the
most somehow get them, without any resources wasted on competing
for them (by lineups or complicated application procedures), the loss of
social surplus is the shaded area on the graph above. In the short run,
landlords suffer and renters benefit.
Rationing by lineup is extremely inefficient. Let’s say New York City
wants to celebrate an anniversary by hosting a rock concert at Madison
Square Garden. The performers are superstars, so the market price for
a ticket would normally be $200. But so that people other than the rich
have a chance to attend, the city decides to sell tickets at $120 each.
At $200, demand would have equalled the capacity of 20,000 seats. At
$120, there is huge excess demand. The city handles this by saying:
One ticket per customer, the box office opens at 9 AM, October 14.
Clearly there will be a long line, and it won’t form at 9 AM. When will
it start?
[Figure: vertical supply S at Q = 20,000 and demand D, with prices 200 and 120 marked.]
For simplicity, assume first that once the box office opens, it serves
each customer instantaneously (so this line moves fast!). A rich guy
might want to get a ticket, but he doesn’t want to stand in line for
hours. So he pays someone with a lower opportunity cost of time to
line up for him. If there’s a competitive market for lining up, it will
have some hourly equilibrium wage rate w.
Each person who lines up can buy a ticket for $120 and sell it for $200,
the market clearing price of a ticket. He nets $80 for waiting, so for
him to be indifferent, the lineup has to form \(80/w\) hours before 9 AM.
The same people attend the concert as would have been the case had
the city sold them at $200. Loss of social surplus is the shaded area.
Clearly this area doesn’t depend on w. If people are more averse to
lining up, the line will form a bit later, say at 3 AM instead of 2 AM.
And if rain is forecast, the line forms later. The lost social surplus is
the same.
Should the government reduce the wasted waiting time by allowing
each person in the lineup a maximum of four tickets, instead of just
one? No! The lineup will form four times as many hours before 9 AM,
and the total wasted hours will be the same.
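The lineup arithmetic is easy to check in Python. The ticket numbers come from the concert example; the wage values of 10 and 20 per hour are assumptions, and `lineup` is a hypothetical helper.

```python
def lineup(wage, tickets_per_person=1, seats=20_000, face=120, market=200):
    """Return (hours each person waits, total hours waited, value of time wasted)."""
    gain_per_person = (market - face) * tickets_per_person
    hours_each = gain_per_person / wage      # waiting must exhaust the gain
    people = seats / tickets_per_person
    total_hours = hours_each * people
    return hours_each, total_hours, total_hours * wage

one_ticket = lineup(wage=10)
higher_wage = lineup(wage=20)
four_tickets = lineup(wage=10, tickets_per_person=4)
print(one_ticket, higher_wage, four_tickets)
```

The value of wasted time is $80 × 20,000 whatever the wage, and a four-ticket limit quarters the number of people in line while quadrupling how long each one waits.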
Because rationing purely by lineups is so destructive, wartime price
controls are often accompanied by rationing coupons. A coupon
entitles you to two eggs, for example. You don’t need to be first in the
line in order to get your eggs.
General Equilibrium
So far we have been studying single markets in isolation. Now it’s time
to put them all together and study equilibria of the whole system of
interrelated markets. Such an equilibrium is called a general
equilibrium or a competitive equilibrium of the economy.
Ideally, one would study consumption and production together. But to
keep the complexity under control, here we will limit attention to
economies with no production. Everyone starts off owning something
(called an initial endowment) and then trade is permitted. Such an
economy is called a pure exchange economy. The insights gleaned from
understanding these models carry over, under reasonable assumptions,
to economies with production.
Definition
A pure exchange economy consists of:
goods 1, 2, ..., n
consumers 1, 2, ..., m with utility functions \(u^j(x^j_1, \dots, x^j_n)\)
initial endowments \(w^1, \dots, w^m \in \mathbb{R}^n_+\)
Here, \(x^j_i\) denotes the amount of good i consumed by individual j, and
\(w^j\) is individual j’s initial endowment of goods. This is a vector
\((w^j_1, \dots, w^j_n)\), representing what individual j owns before trade.
Let w denote the vector \((w^1, \dots, w^m) \in \mathbb{R}^{mn}_+\) and x denote the vector
\((x^1, \dots, x^m) \in \mathbb{R}^{mn}_+\).
Notice that the vector \(\sum_{j=1}^{m} w^j\) tells us how much of each good is
available in the entire economy. It is the stuff you would have if you
collected everybody’s initial endowments of goods. If you were a social
planner, with dictatorial powers, you could then distribute those goods
to the m people in many alternative ways.
Of course, you couldn’t allocate more goods than are available, in total.
We say a feasible allocation is a vector \(x = (x^1, \dots, x^m)\) such that
\[\sum_{j=1}^{m} x^j = \sum_{j=1}^{m} w^j\]
Some economists like to use a weak inequality ≤ in the definition of
feasible allocation (you could distribute less than the total available).
As long as consumers have “free disposal” (they can costlessly get rid
of stuff), this makes no difference.
Each consumer is assumed to be a “price taker”: she acts as though
she has no influence on any price. This is a reasonable approximation if
m is very large.
Facing prices \(p = (p_1, \dots, p_n)\) in this economy, consumer j’s problem is:
\[\max_{x^j \in \mathbb{R}^n_+} u^j(x^j) \quad \text{subject to} \quad p \cdot x^j = p \cdot w^j\]
The constraint says that she can’t consume a bundle whose market
price exceeds the market value of her initial endowment. If you like,
think of her first selling the endowment; the proceeds, \(p \cdot w^j\), now play
the role of her income in a standard Marshallian demand problem.
Often economists give examples with just two consumers, A and B.
The interpretation is that there are, say, one million consumers with
preferences and endowments like person A, and another million persons
with preferences and endowments like person B (and that’s why A and
B don’t think they can individually affect prices).
In the 19th century, Edgeworth proposed a device, later refined by
Pareto and Bowley, for visualizing a two-person pure exchange
economy. It is now called an Edgeworth-Bowley box.
First, draw person A’s indifference curves as you normally would (if her
utility function is strictly quasiconcave, these curves will be convex).
Then draw B’s indifference curves, but upside down and backwards!
His origin (what to him is (0, 0)) is on A’s graph the point
\((w_1^A + w_1^B,\; w_2^A + w_2^B)\).
[Figure: Edgeworth-Bowley box with origins \(O_A\) and \(O_B\) and the initial endowment w marked.]
The length of the box is the amount of good 1 in the economy, while
the height is the amount of good 2. Any particular point in the box is
a feasible allocation, a specific way of dividing the goods between the
two agents.
One point in the box is the initial endowment, showing who started off
owning what. If market prices \((p_1, p_2)\) arise, the ratio \(p_1/p_2\) determines
the slope of the line along which consumers can trade away from w.
What do we mean by an equilibrium of an exchange economy? It
should be a price vector and a feasible allocation such that no one can
trade, at market prices, to a bundle she strictly prefers.
Definition
The pair \((p, x) \in \mathbb{R}^n_+ \times \mathbb{R}^{mn}_+\) is a competitive equilibrium (or
Walrasian equilibrium) if x is a feasible allocation and for each
\(j = 1, \dots, m\), \(x^j\) solves
\[\max_{z \in \mathbb{R}^n_+} u^j(z) \quad \text{subject to} \quad p \cdot z = p \cdot w^j\]
This says that \(x^j\) is the Marshallian demand for agent j. Why don’t we
also need to require that each market clears, that is, that supply equals
demand? That is guaranteed already by the feasibility of the allocation
x: the sum of the demands equals the sum of the “supplies”, that is,
the sum of initial endowments.
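As an illustration of the definition, here is a minimal Python sketch that computes a competitive equilibrium for a two-person, two-good economy. Everything specific is an assumption made for the example: Cobb-Douglas utilities \(u^j = x_1^{\alpha_j} x_2^{1-\alpha_j}\) (whose Marshallian demands spend the share \(\alpha_j\) of income on good 1), the endowments, and the normalization \(p_2 = 1\).

```python
from fractions import Fraction as F

# Hypothetical economy: Cobb-Douglas weights on good 1, and endowments (w1, w2).
alpha = {"A": F(1, 2), "B": F(1, 2)}
w = {"A": (F(2), F(0)), "B": (F(0), F(1))}

def demand(j, p):
    """Marshallian demand of consumer j at prices p = (p1, p2)."""
    income = p[0] * w[j][0] + p[1] * w[j][1]     # value of the endowment
    return (alpha[j] * income / p[0], (1 - alpha[j]) * income / p[1])

def equilibrium_price_ratio():
    """p1/p2 that clears the market for good 1 (closed form for Cobb-Douglas)."""
    numerator = sum(alpha[j] * w[j][1] for j in w)
    denominator = sum((1 - alpha[j]) * w[j][0] for j in w)
    return numerator / denominator

p = (equilibrium_price_ratio(), F(1))            # normalize p2 = 1
x = {j: demand(j, p) for j in w}
total_x = tuple(sum(x[j][i] for j in w) for i in range(2))
total_w = tuple(sum(w[j][i] for j in w) for i in range(2))
print(p, x, total_x, total_w)
```

Only the market for good 1 is cleared explicitly; the check that total demand equals total endowment for good 2 as well comes for free, because each consumer spends exactly the value of her endowment. Multiplying both prices by the same positive number leaves every demand unchanged.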
People won’t always agree about the best way to allocate scarce
resources. But Pareto suggested that if everyone likes x at least as
well as y, and at least one person strictly prefers x to y, then x is a
better allocation than y. In honor of Pareto, when this is the case we
say x is Pareto-superior to y, or x Pareto-dominates y.
If there is NO feasible y that Pareto-dominates a feasible allocation x,
we say x is Pareto efficient (or Pareto optimal).
The set of all Pareto efficient points in an Edgeworth-Bowley box
weakly preferred by all agents to the initial endowment is called the
contract curve.
[Figure: Edgeworth-Bowley box with origins \(O_A\) and \(O_B\).]
First Theorem of Classical Welfare Economics
Consider a pure exchange economy in which each utility function uj is
strictly increasing. If (p, x) is a competitive equilibrium, then x is
Pareto efficient.
Proof.
Suppose, for contradiction, there exists a feasible allocation y that
Pareto-dominates x. Then
\[u^j(y^j) \ge u^j(x^j) \quad \forall j \tag{12}\]
\[u^k(y^k) > u^k(x^k) \quad \text{for at least one } k \tag{13}\]
(13) implies \(y^k\) was not affordable (that’s why k didn’t choose \(y^k\)
instead of \(x^k\)), that is,
\[p \cdot y^k > p \cdot w^k \tag{14}\]
Now if we could also show
\[p \cdot y^j \ge p \cdot w^j \quad \forall j \tag{15}\]
then adding (14) to all the \(j \ne k\) terms of (15):
\[\sum_{j=1}^{m} p \cdot y^j > \sum_{j=1}^{m} p \cdot w^j\]
So
\[p \cdot \sum_{j=1}^{m} y^j > p \cdot \sum_{j=1}^{m} w^j\]
So, since feasibility of y means \(\sum_{j=1}^{m} y^j = \sum_{j=1}^{m} w^j\),
\[p \cdot \sum_{j=1}^{m} w^j > p \cdot \sum_{j=1}^{m} w^j\]
a contradiction.
So if we could show (15), we’d be finished.
If, for some j, \(p \cdot y^j < p \cdot w^j\), j could have afforded to buy \(y^j\) plus
\((\varepsilon, \dots, \varepsilon)\), if \(\varepsilon > 0\) is sufficiently small. By strict monotonicity, this would
have strictly higher utility than \(y^j\), and hence (by (12)) than \(x^j\), contradicting the optimality of \(x^j\).
What an amazing theorem! We didn’t need convexity, differentiability,
or even continuity. “If it’s a CE, it’s Pareto efficient.” Of course, giving
everything to A and leaving B with absolutely nothing is also Pareto
efficient, so Pareto efficiency is not the only thing to care about.
Is competitive equilibrium unique? No, there can be multiple equilibria
even in rather simple economies, with only two people (or kinds of
people) and two goods.
[Figure: Edgeworth-Bowley box with origins \(O_A\) and \(O_B\), the initial endowment w, and two allocations x and y.]
Here x and y are both CE starting from the same initial endowment w.
In the box shown, the two equilibria have different prices and different
quantities. Even in cases where economists would consider the
equilibrium unique (just one allocation x and one set of relative prices),
that x is supported by lots of different absolute prices. For example, if
(1, 2) is an equilibrium price vector, so are (2, 4) and (5, 10). All of
these yield the same budget line, so they are equivalent. In other
words, there is nothing in this model to tie down the price level.
The Pareto criterion distinguishes the contract curve from the other
(inefficient) allocations in the box. How should society choose among
the points on the contract curve? This involves interpersonal
comparisons of utility, challenging on conceptual grounds. Maybe
someday neuroscience will help us with this. Remind me to say more
about “social choice” after we’ve studied behavior under risk.
Monopoly
Until this point, all economic agents have been assumed to take prices
as given: each agent believes that she has no influence on market price.
At the opposite extreme is a monopolist. As the only firm in the
market, she can decide to sell a lot at a low price, or a little at a high
price.
The world is full of examples of a monopolist selling the same product
at different prices to different groups. Known as price
discrimination, it will be considered here a bit later. But we start
with a simpler setting in which a monopolist chooses a single price,
which applies to all buyers and all units of the good.
Let p be price in the monopolized market, and q be both firm quantity
and market quantity. Since the monopolist is basically choosing a point
on the market demand curve, it is equivalent to think of her choosing p
or q (the one implies the other). She solves
\[\text{Choose } q \text{ to maximize profits} = \text{revenue} - \text{costs} \tag{16}\]
Equivalently
\[\max_{q \in \mathbb{R}_+} \big(q\,p(q) - C(q)\big) \tag{17}\]
FOC for interior maximizer: (16) yields
\[MR = MC \tag{18}\]
where MR denotes marginal revenue, the first derivative of revenue
with respect to q.
Differentiating (17), which “unpacks” revenue a bit, yields
\[q\,\frac{dp}{dq} + p = MC \tag{19}\]
The good thing, for the firm, about producing an extra unit, is that it
sells it for $p. But there are two drawbacks: the extra cost MC, and
the amount the extra quantity lowers market price (that is, \(\frac{dp}{dq}\)) times
the number of units q on which this price decline operates.
(19) can be expressed in elasticity terms:
\[p\left(\frac{q}{p}\,\frac{dp}{dq} + 1\right) = MC\]
\[p\left(1 + \frac{1}{\varepsilon_{q,p}}\right) = MC \tag{20}\]
or
\[p = \left(\frac{1}{1 + \frac{1}{\varepsilon_{q,p}}}\right) MC \tag{21}\]
These are versions of the “inverse elasticity rule”.
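The rule can be verified on a simple example. The sketch below assumes a linear inverse demand p = 10 − q and a constant marginal cost of 2; neither comes from the notes. It solves MR = MC directly and then checks that (20) and (21) hold at the solution.

```python
from fractions import Fraction as F

a, b = F(10), F(1)     # inverse demand: p = a - b*q (assumed numbers)
mc = F(2)              # constant marginal cost (assumed)

# Revenue is q*(a - b*q), so MR = a - 2*b*q; set MR = MC.
q_m = (a - mc) / (2 * b)
p_m = a - b * q_m

# Price elasticity of demand at the optimum: (dq/dp) * (p/q), with dq/dp = -1/b.
elasticity = (F(-1) / b) * p_m / q_m

lhs_of_20 = p_m * (1 + 1 / elasticity)     # should equal MC
price_from_21 = mc / (1 + 1 / elasticity)  # should equal p_m

print(q_m, p_m, elasticity, lhs_of_20, price_from_21)
```

Note that the elasticity at the monopoly optimum is below −1: with positive marginal cost, the monopolist always operates on the elastic part of the demand curve.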
How do things look, graphically?
First notice that when we draw the demand curve, at any q we are
plotting average revenue. When the monopolist chooses that (q, p)
pair, she sells the q units at p each, so her average revenue is \(pq/q = p\).
So we can read average revenue off the demand curve. Therefore, if
demand is differentiable, average revenue and marginal revenue have
the same vertical intercept, and slopes at \(q = 0\) in the ratio 1 : 2
(since \(MR(q) = p(q) + q\,p'(q)\)).
[Figure: demand D and marginal revenue MR plotted against q.]
If D happens to be linear, so is MR, and with twice the slope.
[Figure: MC, AC, MR and demand D plotted against q, with the monopoly quantity \(q^M\) and price \(p^M\) marked.]
The monopolist sets MR = MC, as long as price at that point exceeds
AC. Otherwise, she prefers not to produce.
Profits per unit are \(p^M - AC(q^M)\), so profits are \(q^M\left(p^M - AC(q^M)\right)\).
Here, \(AC(q^M)\) is average cost evaluated at \(q^M\).
It would be socially optimal to produce at \(q^*\), where MC cuts demand
(where marginal benefit = marginal cost). But by choosing \(q^M\) instead,
the monopolist causes a deadweight loss of the area above MC below
demand, from \(q^M\) to \(q^*\).
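A short Python sketch of the profit and deadweight loss calculations. The linear demand p = 10 − q and the cost function C(q) = 4 + 2q are assumptions picked for round numbers.

```python
from fractions import Fraction as F

a, b = F(10), F(1)         # inverse demand: p = a - b*q (assumed numbers)
fixed, mc = F(4), F(2)     # cost C(q) = fixed + mc*q (assumed numbers)

def average_cost(q):
    return (fixed + mc * q) / q

q_m = (a - mc) / (2 * b)   # MR = a - 2*b*q equals MC
p_m = a - b * q_m
produces = p_m > average_cost(q_m)          # otherwise she shuts down
profit = q_m * (p_m - average_cost(q_m))

q_star = (a - mc) / b      # where MC cuts demand
# Triangle above MC and below demand, from q_m to q_star.
deadweight_loss = (p_m - mc) * (q_star - q_m) / 2

print(q_m, p_m, profit, q_star, deadweight_loss)
```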
What can the government do to make monopoly less inefficient? One
possibility is a per-unit subsidy to production. That will increase q and
increase social surplus. But adding to already handsome monopoly
profits is not going to be politically popular.
An alternative is regulation. Passing legislation requiring the
monopolist to produce at \(q^*\) would remove the deadweight loss. But
this assumes the government has accurate, detailed knowledge about
the industry, so it can choose the ideal \(q^*\).
There’s a bigger problem with the regulatory solution, for some cost
structures. Especially if there are large setup costs, AC may be above
the demand curve at \(q^*\). Then if the firm is forced to produce \(q^*\), it
makes losses, and it would prefer to shut down.
[Figure: MC, AC, MR and demand D plotted against q, with \(p^M\), \(q^M\) and \(q^*\) marked; the losses at \(q^*\) are shaded.]
The shaded area shows the monopolist’s losses at \(q^*\). Another option is to
impose \(q^*\) by regulation, and at the same time give a lump-sum
subsidy equal to the firm’s losses.
Public utilities (power companies, trains, ...) often have large setup
costs (think of all the land and tracks needed for trains). For such a
firm, the efficient solution may be to regulate \(q^*\) and low prices, and
then cover the resulting losses!
Different authors use the term natural monopoly in different ways. But
the most recognized definition is that of the late William Baumol (of
NYU), who said that natural monopoly arises when it is more
expensive to produce with two or more firms, than with a single firm.
This will be the case if AC is everywhere decreasing (at least in the
range where demand can cover costs).
Why would there be only one firm in a market, in the first place?
Natural monopoly is one possible explanation: one firm can serve the
market much better than several. Another reason is patents. The
government, in order to reward research and development (R&D)
sometimes grants a patent to a firm, giving it the exclusive rights to
some product or process, for many years. Even without a patent, I
might have some special location, talent or formula, that makes me
unique. Whether I use this or supply it to someone else, it allows me
(or the person to whom I rent the resource) to capture the market.
Firm A might be an original incumbent in the market, with lots of
brand recognition and loyalty. If firm B tries to break into the market,
firm A lowers its price, making it hard for B to win much market share
without charging ruinously low prices. So B gives up and leaves. This
behavior by A is called predatory pricing. Once B is gone, prices go
back up.
Price Discrimination
The better a monopolist’s information about different consumers’
willingnesses to pay, the more effectively it can practice price
discrimination. The extreme case of this is known as perfect or first
degree price discrimination: the firm recognizes each consumer’s
“reservation price” (maximum willingness to pay), and charges her this
amount. In this situation, the firm extracts all the surplus in the
market (consumers get no net benefits).
[Figure: MC, AC and demand D plotted against q, with \(q^*\) marked.]
The firm’s surplus is the shaded area below demand and above MC. It
produces the socially optimal amount.
In practice, a firm never has such perfect ability to extract surplus.
But it may be able to divide the market into groups that have different
elasticities of demand (remember the inverse elasticity rule for
monopoly pricing).
For example, if the elderly have higher price elasticities of demand for
some product, the firm may offer them senior discounts. The same goes
for students. A chain store may price more modestly in poor
neighborhoods.
To me, this seems like a weaker, or cruder, version of first degree price
discrimination. I would have called it second degree price
discrimination. But I wasn’t born early enough to name these things,
and for unknown reasons these phenomena are called third degree
price discrimination.
What, then, is second degree price discrimination? It involves giving
quantity discounts. Even if you can’t see that a consumer is poor, or
elderly, and so on, you can say that buying individual bars of soap or
rolls of paper towels will be more expensive per unit. A large family
with high expenses may be willing to put up with the inconvenience of
buying 48 rolls of paper towels at once.
A special kind of second degree price discrimination is a two-part tariff.
Here, tariff just refers to charges or fees, not to importation. It is
extremely common for a public utility to bill customers this way. There
is a fixed monthly charge for service, plus a charge proportional to the
amount of electricity you use that month. This keeps marginal prices
low, encouraging consumption, while the fixed charge helps cover the
utility’s high setup costs.
How about charging much more for a plane ticket bought the day
before the flight than for one purchased two weeks before? This is often
given as an example of third degree price discrimination. But notice it
doesn’t rely on airlines’ knowing what customer group a purchaser
belongs to. The customers “self-select”, with vacationers planning
ahead and business executives willing to pay more to be in the right
place as a deal is developing, and flying in and out quickly without
staying over a Saturday night. So it has something in common with
second degree price discrimination: in each case (quantity discounts or
time-dependent pricing), everyone is offered the same deals, and
different market segments choose different deals.
Price discrimination depends upon the firm’s ability to prevent
“arbitrage”: it doesn’t want people to buy huge quantities at a low
price, and resell at higher prices to those who are wealthier and so on.
So the student tickets to a New York Philharmonic concert are usually
stamped “STUDENT”, and airline tickets are not transferable.
It is infuriating to find that the person in the next seat to you on the
plane paid $350 less than you did, because of the time of day she
bought or what web site she used. But overall, price discrimination by
airlines has something of a “Robin Hood” aspect: it takes from the rich
and gives to the poor. You may object that yes, it creates more surplus
that way, but the airlines grab a lot of that surplus. Interestingly, some
empirical studies suggest that even consumer surplus (not just
producer surplus) is increased by price discrimination.
Decisions under Risk
Our choices affect what happens to us, but there is often an element of
chance as well. Maybe I carry an umbrella, but it ends up not raining.
A firm might need to invest in its capital stock before finding out how
strong the demand for its product is.
When we face risk, we are not choosing an outcome from a set of
alternatives, but rather a lottery (from a set of lotteries). A particular
lottery, call it L, might have n different possible outcomes \(x_1, \dots, x_n\) as
“prizes”, which happen with probabilities \(p_1, \dots, p_n\), respectively. If
these n possibilities are mutually exclusive (at most one can actually
happen) and exhaustive (there are no other possibilities, so one of
these will happen), then \(\sum_{i=1}^{n} p_i = 1\), and we assume \(p_i \in [0, 1]\),
\(i = 1, \dots, n\).
Usually we assume these probabilities are “objective” (roughly,
everyone agrees they are “obvious” — if you flip a fair coin, it has
probability 1/2 of coming up heads (H) and probability 1/2 of coming
up tails (T )).
We could write the lottery in compact notation as
\(L = (x_1, \dots, x_n; p_1, \dots, p_n)\). The \(x_i\)’s could be monetary prizes, or
different states of health (if you are choosing whether or not to get
vaccinated against flu, for example).
If the prizes are monetary, the actuarial value of L is the expected
value or mathematical expectation
\[ \sum_{i=1}^{n} p_i x_i \]
A lottery with just two outcomes is often called a bet:
\[ L_b = (w, l; p_1, p_2) \]
where the person buying the lottery gets w with probability \(p_1\) and
gets l < w if he loses (with probability \(p_2\)).
A bet is called “fair” if \(p_1 w + p_2 l = 0\)
(better than fair if \(p_1 w + p_2 l > 0\),
and less than fair (unfair) if \(p_1 w + p_2 l < 0\)).
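As a quick numerical check of these definitions, here is a minimal Python sketch. The function names `actuarial_value` and `classify_bet` and the sample bets are made up for illustration; exact fractions are used so that the test for a fair bet is not disturbed by rounding.

```python
from fractions import Fraction

def actuarial_value(prizes, probs):
    """Expected monetary value of the lottery (x_1..x_n; p_1..p_n)."""
    if len(prizes) != len(probs):
        raise ValueError("need exactly one probability per prize")
    if any(p < 0 or p > 1 for p in probs) or sum(probs) != 1:
        raise ValueError("probabilities must lie in [0, 1] and sum to 1")
    return sum(p * x for p, x in zip(probs, prizes))

def classify_bet(w, l, p1):
    """Classify the bet (w, l; p1, 1 - p1) by the sign of its actuarial value."""
    value = actuarial_value([w, l], [p1, 1 - p1])
    if value == 0:
        return "fair"
    return "better than fair" if value > 0 else "unfair"

half = Fraction(1, 2)
print(classify_bet(10, -10, half))            # win 10 or lose 10 on a fair coin
print(classify_bet(10, -5, half))             # the loss is smaller than the gain
print(classify_bet(10, -10, Fraction(1, 3)))  # winning is less likely than losing
```

Passing floating-point probabilities would make the exact "sum to 1" check fragile, which is why the sketch assumes `Fraction` inputs.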
Suppose you have wealth $100,000, and someone offers you the fair bet:
heads, you win (another) $100,000.
tails, you lose $100,000.
If you take the bet, you have an equal chance of ending up with
$200,000 or nothing. Most people will refuse this bet, even though it’s
(actuarially) fair. In some sense, the possible loss bothers them more
than the possible gain.
This suggests that people don’t maximize actuarial value (nor should
they). There is an even more dramatic, and entertaining,
demonstration of this, pointed out by Nicolaus Bernoulli almost three
hundred years ago. It goes by the name “St. Petersburg Paradox”.
How much would you pay for the following lottery?
A fair coin is flipped until it comes up H (after that, no more flips).
If this happens on the 1st toss, you get $2.
If this happens on the 2nd toss, you get $4.
If this happens on the tth toss, you get \(2^t\) dollars.
The probability it comes up H on 1st toss is \(\frac{1}{2}\).
The probability it comes up H for the 1st time on 2nd toss is \(\frac{1}{2 \times 2} = \frac{1}{4}\).
The probability it comes up H for the 1st time on tth toss is \(\frac{1}{2^t}\).
So the lottery’s actuarial value is
\[ 2 \cdot \frac{1}{2} + 4 \cdot \frac{1}{4} + 8 \cdot \frac{1}{8} + \dots = 1 + 1 + 1 + \dots = \infty \]
The lottery has infinite actuarial value. But few people would pay
even $1,000 for it. This contrast has a paradoxical feel to it.
Nicolaus Bernoulli’s cousin Daniel Bernoulli proposed a resolution. He
said that increments of money become less important, or have lower
marginal utility, the more money a person has. He suggested that a
person maximizes his expected utility, not his expected actuarial
value.
If you maximize \(\sum_{i=1}^{\infty} p_i u(x_i)\) and u is sufficiently concave, then the
sum is finite, so you would pay only a limited amount for the lottery
after all.
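To see the contrast numerically, here is a small Python sketch of the St. Petersburg lottery cut off after a finite number of tosses. It assumes u(x) = ln(x) as one example of a "sufficiently concave" utility, applied to the prize alone (initial wealth is ignored); the function names are illustrative.

```python
import math

def truncated_actuarial_value(max_tosses):
    """Actuarial value when the game is stopped after max_tosses flips."""
    return sum(0.5 ** t * 2 ** t for t in range(1, max_tosses + 1))

def truncated_expected_log_utility(max_tosses):
    """Expected utility of the prize with u(x) = ln(x), same truncation."""
    return sum(0.5 ** t * t * math.log(2) for t in range(1, max_tosses + 1))

# The actuarial value grows by 1 with every extra toss allowed,
# while the expected log utility settles down quickly.
for n in (10, 20, 40):
    print(n, truncated_actuarial_value(n), truncated_expected_log_utility(n))

# Sure payment giving the same utility as the (almost untruncated) lottery.
certainty_equivalent = math.exp(truncated_expected_log_utility(200))
print(certainty_equivalent)
```

Since the sum of t/2^t over all t is 2, expected log utility converges to 2 ln 2 = ln 4, so under this assumed utility the lottery is worth about $4 to the player.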
People don’t care just about actuarial value, but how that value is
spread around. So you might object (to Daniel Bernoulli): maybe
people don’t care just about the expected value of utility, but how that
utility is spread around. This question remained open for more than
two hundred years, until the early 1940’s, when it was given an
elegant answer by von Neumann and Morgenstern, as part of their
development of game theory.
The von Neumann-Morgenstern (vN-M) Theorem gives conditions
(axioms) under which an individual acts as though he is maximizing
the expectation of some utility function. That is, there exists
\(u : \mathbb{R}_+ \to \mathbb{R}\) such that for any two lotteries \(L = (x_1, \dots, x_n; p_1, \dots, p_n)\)
and \(L' = (x'_1, \dots, x'_m; p'_1, \dots, p'_m)\),
\[ L \succsim L' \iff \sum_{i=1}^{n} p_i u(x_i) \ge \sum_{j=1}^{m} p'_j u(x'_j) \]
That is, a person who satisfies the (rather modest) vN-M axioms is an
expected utility maximizer. We say u represents his preferences with
the expected utility property.
Whether a consumer likes risk or not is a matter of taste. We say he is
risk averse if he refuses all fair bets. More precisely, starting from
some non-random wealth level, he always (at least) weakly prefers to
reject any fair bet. One can show this is equivalent to having a concave
utility function.
General increasing functions do not preserve concavity. For example,
\(u(x) = \sqrt{x}\) is strictly concave, but \(v(x) = (u(x))^4 = x^2\) is strictly
convex. However, strictly increasing linear transformations such as
\(w(x) = a u(x) + b\) (with a > 0) do preserve concavity or convexity. The u in the
vN-M Theorem is unique only up to strictly increasing linear
transformations.
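The following Python sketch illustrates both points on one fair bet. The wealth level 100, the stake 50 and the constants a = 3, b = 7 are arbitrary choices for illustration.

```python
import math

def expected_utility(u, outcomes, probs):
    return sum(p * u(x) for x, p in zip(outcomes, probs))

def accepts_fair_bet(u, wealth, stake):
    """True if a 50/50 bet of +stake / -stake is strictly preferred to keeping wealth."""
    bet = expected_utility(u, [wealth + stake, wealth - stake], [0.5, 0.5])
    return bet > u(wealth)

def u(x):
    return math.sqrt(x)      # strictly concave

def v(x):
    return u(x) ** 4         # equals x**2, strictly convex

def w(x):
    return 3 * u(x) + 7      # increasing linear transformation of u

for name, f in (("u", u), ("v", v), ("w", w)):
    print(name, accepts_fair_bet(f, wealth=100, stake=50))
```

The consumer with utility u rejects the bet, and so does the one with w, which represents the same preferences. The consumer with v accepts it: the increasing but nonlinear transformation has changed the attitude to risk.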
In fact, most of the decisions taken by a consumer, a firm or a
government involve risk. How toxic is a fruit I am buying? How well
will buyers respond to a new ad campaign? Will the stock market
crash this week? Will someone crash into my car on the way to work?
As a result, expected utility theory (and its elaborations) plays a huge
role in consumption analysis, theory of the firm, finance, and so on.
Let’s look at one simple application to the insurance industry.
Insurance
A driver with initial wealth w and strictly concave vN-M utility
function u will have an accident with damages $d with probability p.
An insurance company offers him insurance on the following terms: he
can choose any x ∈ [0, d] and pay rx upfront. If he has an accident, he
will receive a payment of x.
What value of x should he choose? The choice variable x is like a
quantity of insurance to buy, and r is the price. If r is outrageously
high, he might prefer the corner solution x = 0 (go uninsured). But if
he chooses an interior solution, it sets the first derivative of expected
utility w.r.t. x equal to 0:
\[ \frac{d}{dx} \Big[ (1 - p)\, u(\underbrace{w - rx}_{\text{consumption with no accident}}) + p\, u(\underbrace{w + (1 - r)x - d}_{\text{consumption after an accident}}) \Big] = 0 \]
\[ -r(1 - p)\, u'(w - rx) + (1 - r)p\, u'(w + (1 - r)x - d) = 0 \tag{22} \]
\[ \frac{(1 - r)p}{(1 - p)r} = \frac{u'(w - rx)}{u'(w + (1 - r)x - d)} \tag{23} \]
Suppose, for example, that the insurance is fair, that is,
\[ \underbrace{rx}_{\text{payment if no accident}} (1 - p) = p \underbrace{(1 - r)x}_{\text{net payment if accident occurs}} \]
that is, p = r.
Then the LHS of (23) is 1, so the RHS is 1 as well. Strict concavity of u implies
the numerator can equal the denominator only if their arguments are equal,
that is, he has fully insured; his consumption is unaffected by the
accident. But if insurance is unfair (r > p), then 1 > LHS = RHS, so he has less
than fully insured: consumption falls if there is an accident.
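Here is a minimal Python sketch of the driver's problem. It assumes u = ln (so u'(c) = 1/c), and the numbers w = 100, d = 50, p = 0.1 are purely illustrative. Because u is strictly concave, the derivative of expected utility is decreasing in x, so the first order condition can be solved by bisection.

```python
def optimal_coverage(w, d, p, r, tol=1e-10):
    """Coverage x in [0, d] maximizing expected utility when u = ln."""
    def foc(x):
        # Derivative of (1 - p) ln(w - r x) + p ln(w + (1 - r) x - d) w.r.t. x.
        return -r * (1 - p) / (w - r * x) + (1 - r) * p / (w + (1 - r) * x - d)

    if foc(0.0) <= 0:      # even the first unit of coverage is not worth its price
        return 0.0
    if foc(d) >= 0:        # still worth buying more at full coverage
        return d
    lo, hi = 0.0, d
    while hi - lo > tol:   # bisection on the decreasing function foc
        mid = (lo + hi) / 2
        if foc(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(optimal_coverage(100, 50, p=0.1, r=0.10))  # fair price: full insurance
print(optimal_coverage(100, 50, p=0.1, r=0.15))  # unfair price: partial insurance
print(optimal_coverage(100, 50, p=0.1, r=0.50))  # outrageous price: corner x = 0
```

With these assumed numbers, the fair price leads to x = d, the moderately unfair price to an interior x well below d, and the very high price to the corner solution.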
The simple analysis above leaves out two extremely important
considerations: moral hazard and adverse selection. Both arise in
situations of asymmetric information: people don’t all know the
same things.
Moral Hazard refers to the fact that after a contract is signed, one
party (or more) may have an incentive to act in a way that damages
one or more other parties. In our insurance example, if I’m fully
insured against collision damages, I have a lot less reason to drive
carefully in a parking lot than if I’m uninsured. Notice this means p
depends on my behavior! The more insurance you sell me, the more
careless I get, and the higher p is.
How can an insurance company respond to moral hazard? It can raise
the price for higher amounts of coverage. And it can charge a lot for
things that are likely to affect behavior a lot. The reason you don’t
drive 100 mph on I-95 has little to do with being under-insured, and
much to do with fear of death or loss of license. But if you have “full
glass” coverage, you are much more likely to park on the street instead
of in a guarded lot, and so on.
Moral hazard has major implications beyond the insurance industry. If
you own a house, you look after it more carefully (because the property
whose value you are affecting is yours) than if you are renting (the
same goes for rental cars, bikes and so on).
If you are a wealthy landowner in a developing country, with workers
cultivating your crops, why should they work as hard (irrigating,
weeding, guarding against pests,...) as if it were their land and crops?
This is the motivation for share cropping, where the tenant farmer
gets part of what is produced.
Adverse selection refers to contracts attracting some kind of
participants more than others. In the insurance context, expensive and
comprehensive health insurance is more attractive to someone with
complex health problems than to someone who seldom seems to need a doctor.
And regarding car accidents, some drivers are more prone to them than
others. Maybe I’ve had no accidents, but many close calls, so I
understand what the company doesn’t: I’m a poor risk. So I buy more
insurance than a typical person would.
What do insurance companies do about this? First, they take it into
account when they choose contract prices: if these were based on
estimates of average incidences in the whole population, they would lose
a lot of money. It is the “high-p” types who rush to buy. Secondly,
they are very cautious when introducing a new kind of coverage,
because again their population information can lead them badly astray.
Finally, they may offer “menus of contracts”, some with high coverage
at very high rates (to attract the bad risks) and others with lower
coverage that attract lower-risk patrons.
Like moral hazard, adverse selection is a powerful consideration beyond
insurance provision. If I’m buying a used car, I know a lot less about it
than the seller does. The fact that it is for sale is already bad news:
maybe it’s got problems and the seller wants to dump it. Akerlof wrote
about “lemons” in 1970, referring not to fruit but to cars of low quality.
Huge oil companies bidding on rights to exploit an oil tract are not sure
how much oil is there, and how accessible it is. Each gets to do some
sample drilling. If my bid wins, it’s a kind of bad news: no one else’s
samples were as encouraging as mine. This is known as “the winner’s
curse”. On average the auction gives me an “adverse selection”, that is,
a tract about which others have worse information than I do.
Remind me to discuss the adverse selection on Bumble, related to a
wisecrack of Groucho Marx.
Game Theory
Game theory is a way of studying formally the interaction of different
persons, groups or organizations. This could be two countries
negotiating a trade agreement, or five firms deciding how to price their
competing products in a market, or ninety auction participants
deciding how to bid.
The two major branches of game theory approach these questions
rather differently. Noncooperative game theory (which despite its
name, admits the possibility of cooperation) tries to describe a
strategic situation in detail, with careful attention to timing and
information. Cooperative game theory takes a broader view,
modelling with a softer focus and applying general principles to suggest
what should happen. We will study noncooperative theory here.
Simultaneous Noncooperative Games
There are N players, who make choices simultaneously. Each player i
has a set Si of pure strategies. Think of a pure strategy as a
deterministic, or non-random, action that player i might take. A
strategy si ∈ Si could be multidimensional: for a firm, it might be
“stay in the market, invest in a new mode of production, and produce
8, 000 units of output”.
Let
\[ S = S_1 \times S_2 \times \dots \times S_N = \{ (x_1, \dots, x_N) \mid x_i \in S_i,\ i = 1, \dots, N \} \]
An element x ∈ S is called a strategy profile; it is a vector specifying
a particular pure strategy for each player.
Each player i has a utility function ui : S → R that associates with
any pure strategy profile x ∈ S a utility ui (x). These are von
Neumann-Morgenstern utils: we assume that i seeks to maximize
expected utility.
The normal form of a game is written:
\[ G = (S_1, \dots, S_N; u_1, \dots, u_N) \]
This describes the available alternatives for each player, and the payoff
consequences for each player, of each strategy profile.
If N = 2 and if there are not too many strategies, it may be easiest to
describe G by drawing its (bi)matrix, with 1’s strategies corresponding
to rows, and 2’s strategies corresponding to columns.
In the cell corresponding to the ith row and jth column, write the
payoff pair \(u_1(a_i, b_j), u_2(a_i, b_j)\).
Example
                 2
            b1      b2
     a1    5,4     2,8
1    a2    0,0     1,9
     a3    3,1     4,7

            G
This represents a game with S1 = {a1 , a2 , a3 } and S2 = {b1 , b2 }.
The entry 4, 7 on the lower right says that players 1 and 2, respectively,
get 4 and 7 if 1 chooses a3 and 2 chooses b2 .
A player i might wish to choose randomly. For example, she could
construct a spinner, and make the proportion of the circumference
corresponding to a particular strategy si ∈ Si proportional to the
probability with which she wishes to play si .
For example, maybe she wants to play:
\(s_i\) with probability 1/2,
\(s'_i\) with probability 1/8,
and \(s''_i\) with probability 3/8.
[Figure: a spinner whose sectors are labeled \(s_i\), \(s'_i\) and \(s''_i\).]
Definition
A mixed strategy mi ∈ Mi for player i is a probability distribution over
the set of pure strategies of i. (This includes “degenerate” or “trivial”
mixed strategies that put all their weight on one pure strategy.)
Let Mi = the set of all i’s mixed strategies, and M = M1 × ... × MN .
Notice that, in the previous example, if player 1 thought 2 was playing
the mixed strategy \((\frac{1}{4}, \frac{3}{4})\), that is, will play the first column with
probability 1/4, then 1’s expected utility from choosing the third row,
for example, is \(\frac{1}{4}(3) + \frac{3}{4}(4) = \frac{15}{4}\).
More generally, we can extend the utility functions ui to the domain
M = M1 × ... × MN (so ui : M → R) by an expected utility calculation.
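The expected utility calculation can be sketched in a few lines of Python for the two-player example game G above (player 1's payoffs in the third row are 3 and 4). The matrix names `U1`, `U2` and the function `expected_payoff` are illustrative, and mixed strategies are assumed to be independent across players.

```python
from fractions import Fraction

# Payoffs from the example game G: rows a1, a2, a3 and columns b1, b2.
U1 = [[5, 2], [0, 1], [3, 4]]   # player 1
U2 = [[4, 8], [0, 9], [1, 7]]   # player 2

def expected_payoff(U, m1, m2):
    """Expected payoff under independent mixed strategies m1 (rows), m2 (columns)."""
    return sum(m1[i] * m2[j] * U[i][j]
               for i in range(len(m1)) for j in range(len(m2)))

m2 = [Fraction(1, 4), Fraction(3, 4)]   # player 2 plays b1 with probability 1/4
third_row = [0, 0, 1]                   # degenerate mixed strategy on a3
print(expected_payoff(U1, third_row, m2))
```

The same function evaluates any mixed profile, for either player, by passing `U2` in place of `U1`.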
Now we’ll use our (θ, x−i ) notation from earlier lectures.
Definition
\(x_i \in M_i\) is weakly dominated by \(y_i \in M_i\) if
\[ u_i(y_i, m_{-i}) \ge u_i(x_i, m_{-i}) \quad \text{for all } m \in M, \text{ and} \]
\[ u_i(y_i, m'_{-i}) > u_i(x_i, m'_{-i}) \quad \text{for at least one } m' \in M. \]
If strict inequality holds everywhere above, we say xi is strictly
dominated by yi .
These are ways of saying that yi is a better strategy than xi : under
strict dominance, it always does strictly better than xi does, no matter
what others do.
That doesn’t mean that i should necessarily choose yi , just that yi is a
better choice than xi (maybe some zi does even better!).
Contrast this to a dominant strategy, which involves a comparison with
all strategies.
Definition
\(d_i \in M_i\) is a dominant strategy for i if for all \(m \in M\),
\[ u_i(d_i, m_{-i}) \ge u_i(x_i, m_{-i}) \quad \forall x_i \in M_i \]
If player i has a dominant strategy, she can play it without worrying
about what others are doing. di maximizes her expected utility against
every profile of strategies for others.
Example: Prisoner’s Dilemma
This is perhaps the most famous of all games.
               2
           C        D
1    c   10,10     0,11
     d   11,0      1,1
Two criminals have been caught in the act of a minor crime. The
prosecuting attorney knows they have committed something much
more serious as well, but does not have admissible evidence to prove it.
Either defendant can testify to the (serious) guilt of the other
conclusively. If they both testify, they get 1 util each. If they keep
quiet (c, for “cooperate with each other”, as opposed to d for “defect
from cooperation”) they both get 10. If one testifies and the other
doesn’t, the defector gets 11 and the cooperator gets 0.
It is easy to check that d is a (strictly) dominant strategy for 1, and D
is a (strictly) dominant strategy for 2. So the dilemma is not any
difficulty in figuring out what to do; it’s that the obvious result will be
(d; D). Notice this is Pareto-dominated by (c; C).
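The "easy to check" step can be sketched in Python. Each player's payoffs are stored as own strategy first, then the opponent's strategy; the function name `strictly_dominates` is illustrative. Comparing against the opponent's pure strategies is enough here, because expected utility is linear in the opponent's mixing probabilities.

```python
def strictly_dominates(own_payoffs, better, worse):
    """True if `better` beats `worse` against every pure strategy of the opponent."""
    return all(own_payoffs[better][t] > own_payoffs[worse][t]
               for t in own_payoffs[better])

# Prisoner's Dilemma payoffs, indexed [own strategy][opponent's strategy].
u1 = {"c": {"C": 10, "D": 0}, "d": {"C": 11, "D": 1}}
u2 = {"C": {"c": 10, "d": 0}, "D": {"c": 11, "d": 1}}

print(strictly_dominates(u1, "d", "c"))   # d against c for player 1
print(strictly_dominates(u2, "D", "C"))   # D against C for player 2
```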
In most games, it is not so clear what each player should do. In such
situations, game theorists often look for a strategy profile which, if it
were expected by all players, would not lead to any contradictions, that
is, would not give any player a reason to switch to a different strategy.
Definition
\(m \in M\) is a Nash Equilibrium (NE) if for every i,
\[ u_i(m) \ge u_i(x_i, m_{-i}) \quad \forall x_i \in M_i. \]
One can think of m as a plan (commonly understood) from which no
individual can profitably deviate, unilaterally.
Example
              2
          b1      b2
1   a1   3,3     0,0
    a2   0,0     2,2
In this game, (a1 , b1 ) is an NE. If anyone deviated to a different pure
strategy, she would get 0 instead of 3.
Note: (a2 , b2 ) is also an NE, for exactly the same reason!
Of course there is a “bilateral deviation” (let’s both switch to our
respective first strategies) that is profitable. But NE ignores that.
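For small two-player games, pure strategy NE can be found by brute force: check every cell for a profitable unilateral deviation. A minimal Python sketch follows, with the illustrative name `pure_nash_equilibria`. It tests only pure deviations, which suffices because a mixed deviation can never do better than the best pure strategy in its support.

```python
def pure_nash_equilibria(u1, u2):
    """All cells (row, column) from which neither player gains by deviating alone."""
    rows, cols = len(u1), len(u1[0])
    equilibria = []
    for i in range(rows):
        for j in range(cols):
            row_ok = all(u1[i][j] >= u1[k][j] for k in range(rows))
            col_ok = all(u2[i][j] >= u2[i][k] for k in range(cols))
            if row_ok and col_ok:
                equilibria.append((i, j))
    return equilibria

# The 2 x 2 example above: rows a1, a2 and columns b1, b2 (indices start at 0).
u1 = [[3, 0], [0, 2]]
u2 = [[3, 0], [0, 2]]
print(pure_nash_equilibria(u1, u2))
```

The two cells reported correspond to (a1, b1) and (a2, b2). The bilateral deviation from the second to the first is invisible to this check, exactly as it is to the NE definition.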
Consider the game “matching pennies”.
              2
          H       T
1   h   1,-1    -1,1
    t   -1,1    1,-1
If they both choose heads, or both tails, 2 pays 1 one dollar.
Otherwise, 1 pays 2 one dollar. Assume they are risk neutral.
Notice there is no NE in pure strategies: from any cell in the matrix,
someone wants to switch. BUT both of them mixing 50/50 is an NE!
More formally (.5, .5; .5, .5) is an NE.
Why? If you are equally likely to play heads or tails, then I don’t care
which one I play (I’m equally likely to match you either way). Note
that the only reason player i would ever put strictly positive weight on
two pure strategies is that they give her the same expected utility.
Kuhn and Tucker are most widely known for their work in optimization
(the Kuhn Tucker conditions). But in the early 1950’s in Princeton’s
math department they did influential work on game theory. Tucker
invented a more interesting game than matching pennies, that helps us
understand pure and mixed NE. It is called Battle of the Sexes.
              W
          B       O
H   b    2,1     0,0
    o    0,0     1,2
A stereotypical 1950’s couple have left home without arranging whether
they will meet that evening at the boxing match (the husband’s favored
choice) or at the opera (preferred by wife). There were no cell phones.
What will they do? Both going to the boxing is an NE; both going to
the opera is an NE. Neither of these is symmetric, although the game
is.
What about mixed equilibria?
Let p be the probability H plays b, and q be the probability that W
plays B. Since H must be indifferent (if he is to mix) between b and o,
\[ EU_{\text{boxing}} = 2q + 0(1 - q) = 0q + 1(1 - q) = EU_{\text{opera}} \Rightarrow q = \tfrac{1}{3} \]
Similarly for W,
\[ EU_{\text{boxing}} = 1p + 0(1 - p) = 0p + 2(1 - p) = EU_{\text{opera}} \Rightarrow p = \tfrac{2}{3} \]
Notice this is a symmetric equilibrium (each goes to his or her favorite
activity two thirds of the time). But it is Pareto-dominated by each of
the pure NE! Now that we know the mixed NE, we could figure out the
likelihood of landing in each of the four respective cells, and calculate
each player’s EU that way. But it’s much simpler to compute H’s EU,
for example, by choosing either of his rows, and evaluating that (they
both yield the same EU!). For example, playing the first row gives
him \(2 \cdot \frac{1}{3} + 0 \cdot \frac{2}{3} = \frac{2}{3}\), so his EU in this NE is 2/3 (and same for W).
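The indifference calculations generalize to any 2 × 2 game with an interior mixed equilibrium. The Python sketch below (function name `interior_mixed_equilibrium` is illustrative, integer payoffs assumed) solves the two indifference equations with exact fractions. It does not handle games where the equations have no solution strictly between 0 and 1.

```python
from fractions import Fraction

def interior_mixed_equilibrium(u1, u2):
    """Return (p, q): p = probability of row 1, q = probability of column 1."""
    # q makes the row player indifferent between the two rows.
    q = Fraction(u1[1][1] - u1[0][1],
                 u1[0][0] - u1[0][1] - u1[1][0] + u1[1][1])
    # p makes the column player indifferent between the two columns.
    p = Fraction(u2[1][1] - u2[1][0],
                 u2[0][0] - u2[1][0] - u2[0][1] + u2[1][1])
    if not (0 < p < 1 and 0 < q < 1):
        raise ValueError("no interior mixed equilibrium")
    return p, q

# Battle of the Sexes: rows b, o for H and columns B, O for W.
u_H = [[2, 0], [0, 1]]
u_W = [[1, 0], [0, 2]]
p, q = interior_mixed_equilibrium(u_H, u_W)
eu_H = u_H[0][0] * q + u_H[0][1] * (1 - q)   # H's EU from his first row
eu_W = u_W[0][0] * p + u_W[1][0] * (1 - p)   # W's EU from her first column
print(p, q, eu_H, eu_W)
```

Applied to matching pennies, the same function returns the 50/50 mix for both players.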
In larger games, finding all the NE can be difficult. Sometimes one can
begin by eliminating a strictly dominated strategy for some player;
once that row is crossed out, maybe some column becomes strictly
dominated in the smaller game that is left. This is called “iterative
dominance”.
                  2
           b1     b2     b3
     a1   5,4    1,6    10,0
1    a2   2,7    0,7    1,8
     a3   1,6    5,4    10,0

Here, notice that a2 is strictly dominated by a1. If we cross out
a2, because it can never get weight in any equilibrium, we are
left with the 2 × 3 game below, in which b3 is strictly dominated (by
both b1 and b2).

           b1     b2     b3
1    a1   5,4    1,6    10,0
     a3   1,6    5,4    10,0

Cross b3 out to get the 2 × 2 game below, which can be solved by (p, q)
calculations.

           b1     b2
1    a1   5,4    1,6
     a3   1,6    5,4
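The elimination procedure can be sketched as a loop in Python. This simplified version (illustrative name `iterated_strict_dominance`) only looks for domination by pure strategies, which is all the example needs. A strategy can also be strictly dominated by a mixed strategy, and that case is not covered here.

```python
def iterated_strict_dominance(u1, u2):
    """Surviving row and column indices after repeated elimination."""
    rows = list(range(len(u1)))
    cols = list(range(len(u1[0])))
    changed = True
    while changed:
        changed = False
        for r in rows:
            # Is row r strictly worse than some other surviving row k?
            if any(all(u1[k][c] > u1[r][c] for c in cols) for k in rows if k != r):
                rows.remove(r)
                changed = True
                break
        for c in cols:
            # Is column c strictly worse than some other surviving column k?
            if any(all(u2[r][k] > u2[r][c] for r in rows) for k in cols if k != c):
                cols.remove(c)
                changed = True
                break
    return rows, cols

# The 3 x 3 example: rows a1, a2, a3 and columns b1, b2, b3 (indices from 0).
u1 = [[5, 1, 10], [2, 0, 1], [1, 5, 10]]
u2 = [[4, 6, 0], [7, 7, 8], [6, 4, 0]]
print(iterated_strict_dominance(u1, u2))
```

Rows a1, a3 and columns b1, b2 survive. Note that b3 is not dominated in the original game (it is player 2's best reply to a2); it becomes dominated only once a2 has been removed.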
Why should we expect an NE to occur whenever some game is played?
(In general, I would say we shouldn’t be too sure that equilibrium will
be attained.) Here are some considerations:
1. Players may be able to deduce what will happen, for example by
iterative strict dominance. Then we expect NE.
2. There may be a tradition, in a society, that a game is played in a
certain way. That tradition needs to be an NE (otherwise, people
would not follow it).
3. Even if there is no tradition, because the game is rarely
played, something about the game may make some strategy profile
“salient” (Thomas Schelling, 1960).
For example, the Game G below has many NE, and we cannot prove
that any particular profile will be played. But most of us would guess
that (a1 ; b1 ) is the likely outcome.
                         2
            b1       b2      b3      b4
     a1   10,10     0,0     0,0     0,0
1    a2    0,0      1,1     0,0     0,0
     a3    0,0      0,0     1,1     0,0
     a4    0,0      0,0     0,0     1,1

                         G
How about a game like matching pennies? It has a unique NE. Can we
be sure it will be played? Notice that if 2 plays 50/50, 1 doesn’t care
what he does. So playing h for sure is as good as anything else. Why
should he randomize? If he doesn’t, 2 doesn’t know that 1 isn’t! So
she can’t take advantage of his non-randomization.
Linear Symmetric Cournot Duopoly
A famous economic example of a simultaneous game was studied by
Cournot in 1838! His “linear symmetric Cournot duopoly” has two
firms with the same constant marginal cost c ≥ 0 (and no setup costs)
simultaneously choosing quantities q1 and q2 to put on the market.
The inverse demand function is p = a − q1 − q2 .
Firm i takes firm j’s output qj to be fixed, and chooses qi to maximize
i’s profit
\[ q_i p - c q_i, \quad \text{that is,} \quad q_i (a - q_i - q_j) - c q_i \]
FOC
\[ (a - q_i - q_j) - q_i - c = 0 \tag{24} \]
\[ q_i = p - c \quad (\text{so } q_i = q_j) \tag{25} \]
We see that equations (24) and (25) imply \(q_1 = q_2 = \frac{a - c}{3}\).
Notice that we could regard a − c as the “competitive output” (where
p = MC).
[Figure: the linear inverse demand curve, with axis labels a, c, a − c and Q.]
In Cournot’s model, the pure strategy NE has market output
\(\frac{2}{3}(a - c)\). One can redo the analysis for n symmetric firms, to find
\[ q_i = \frac{a - c}{n + 1} \quad \text{so} \quad Q = \frac{n}{n + 1}(a - c) \]
which approaches the competitive output as n → ∞.
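A short Python sketch can confirm these formulas. The values a = 10 and c = 1 are arbitrary, and the function names are illustrative. The best response comes from the first order condition with the other firms' total output held fixed.

```python
def best_response(others_output, a, c):
    """Output maximizing q * (a - q - others_output) - c * q over q >= 0."""
    return max(0.0, (a - c - others_output) / 2)

def symmetric_cournot(a, c, n=2):
    """Per-firm output, market output and price in the symmetric NE."""
    q = (a - c) / (n + 1)
    total = n * q
    return q, total, a - total

a, c = 10.0, 1.0   # illustrative demand intercept and marginal cost
for n in (2, 10, 100):
    q, total, price = symmetric_cournot(a, c, n)
    # In equilibrium each firm's output is a best response to the others' total.
    assert abs(best_response((n - 1) * q, a, c) - q) < 1e-12
    print(n, q, total, price)
```

As n grows, market output climbs toward a − c = 9 and the price falls toward marginal cost.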
Linear Bertrand Duopoly
In 1883, discussing Cournot’s work, Bertrand pointed out that in many
cases, firms set prices rather than quantities. We agreed that for a
monopolist, these amount to the same thing. Bertrand showed
dramatically that this is no longer true, for two or more firms. (As a
practical example, think of gas stations on opposite corners. They post
their prices for the day, and then consumers respond by deciding how
much to buy from each station.)
Assume the firms’ products are perfect substitutes. So if pi < pj , no one
will buy from firm j. (The market is split 50/50 if pi = pj .) Assume as
before that marginal costs ci = cj = c ≥ 0.
Look for an NE in pure strategies. Notice that (c; c) is an NE
(decreasing price from c results in losses, and increasing price from c,
unilaterally, leaves you with no sales).
If (x; x) is an NE, could x > c?
No, because 1 could undercut 2’s price by some tiny ε > 0, thereby
doubling his sales and almost doubling his profits.
If (x; y) is an NE with x > y > c, firm 1 has no sales, and should
undercut 2. If x > y = c, firm 2 should increase to a price above c (but below x), to
make profits. We see (c; c) is the only NE in pure strategies.
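The undercutting argument can be checked on a grid of prices in Python. The sketch assumes market demand a − p at the lowest price (the slides do not spell out the demand curve for this model), with a = 10 and c = 1 chosen for illustration. The function name `bertrand_profit` is hypothetical.

```python
from fractions import Fraction

def bertrand_profit(p_own, p_other, a, c):
    """Profit of a firm with price p_own when its rival charges p_other."""
    demand = max(a - p_own, 0)        # assumed linear market demand
    if p_own < p_other:
        share = 1                     # the cheaper firm takes the whole market
    elif p_own == p_other:
        share = Fraction(1, 2)        # equal prices split the market 50/50
    else:
        share = 0
    return (p_own - c) * demand * share

a, c = 10, 1
grid = [Fraction(k, 10) for k in range(0, 101)]   # prices 0, 0.1, ..., 10

# At (c; c) no unilateral deviation on the grid earns more than zero.
print(max(bertrand_profit(price, c, a, c) for price in grid))

# At (5; 5), shaving one cent off the price nearly doubles profit.
print(bertrand_profit(5, 5, a, c), bertrand_profit(5 - Fraction(1, 100), 5, a, c))
```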
Note that, just with a change in “strategic variable”, Bertrand has
turned Cournot’s story upside-down: two firms are enough for
achieving perfectly competitive results!
Extensive Form Games
Many games of interest are played over time, with participants learning
some of what has happened as they contemplate what to do next.
The description of an extensive form game includes a specification
of:
the players 1, 2, ..., N
the origin (initial node), where the game begins
who “moves” at the origin, and what choices (branches, edges) are
available to her, each of which leads to a further (successor) node
such a successor may be an endpoint of the game (terminal node)
or a choice node, at which some player has choices available to him
(further branches), and so on.
Information sets may be used to limit what a player knows
about the history of play when she is at a particular choice node
(in chess, you always know everything that has happened so far,
but in most card games, such as bridge, you don’t)
each terminal node, if reached, gives each player some payoff,
measured in vN-M utils.
All of this was rather vague; we’ll make it clear with some examples.
Often it is useful to describe an extensive form game by drawing the
game tree. These can grow in any direction; I usually draw them from
left to right.
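[Game tree Γ1: player 1 moves at the origin. After u, player 2 chooses a, giving payoffs (3, 1, 5), or b, giving (2, 7, 0). After d, player 3 chooses c, giving (1, 6, 2), or e, giving (5, 6, 4).]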
In game Γ1 , player 1 moves first, at the origin. If she chooses u, then 2
gets to choose a or b. If instead 1 chooses d, then 3 gets to choose
between c and e. The resulting payoffs for the respective players are
indicated beside the various terminal nodes.
Without further markings on the tree, we know that the players can
see where they are in the tree, when they make choices. (Later we’ll see
examples of the opposite.)
What should 1 do?
Choosing u means she’ll get 3 or 2. Choosing d means she’ll get 1 or 5.
But she can figure out what 2 and 3 would do, if reached, by thinking
about their payoffs (the assumption is that the game’s structure is
“common knowledge”: everyone knows it, everyone knows that
everyone knows it, everyone...). If reached, 2 would play b, to get 7,
whereas if reached, 3 would play e, to get 4. That means 1 gets 2 by
choosing u, and 5 by choosing d. So she should choose d, and (5, 6, 4)
results.
This procedure (of going to the end of the tree, seeing what the last
players would do, and then seeing what penultimate players would do,
then antepenultimate and so on) is called backward induction.
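Backward induction is naturally recursive, as the Python sketch below suggests. The nested-dictionary encoding of Γ1 and the function name are illustrative choices, and ties are broken in favor of the branch listed first, which is an arbitrary rule adopted only to keep the sketch short.

```python
def backward_induction(node):
    """Return (payoff vector, list of choices) selected by backward induction."""
    if "payoffs" in node:                  # terminal node
        return node["payoffs"], []
    mover = node["player"]                 # players are numbered from 1
    best = None
    for action, child in node["moves"].items():
        payoffs, path = backward_induction(child)
        # Strict comparison: a tie keeps the branch that was listed first.
        if best is None or payoffs[mover - 1] > best[0][mover - 1]:
            best = (payoffs, [action] + path)
    return best

gamma1 = {
    "player": 1,
    "moves": {
        "u": {"player": 2, "moves": {"a": {"payoffs": (3, 1, 5)},
                                     "b": {"payoffs": (2, 7, 0)}}},
        "d": {"player": 3, "moves": {"c": {"payoffs": (1, 6, 2)},
                                     "e": {"payoffs": (5, 6, 4)}}},
    },
}
print(backward_induction(gamma1))
```

The function applies only to games of perfect information, where every decision node can be solved on its own.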
If the 4 at the end of the tree had been a 2, player 3 would have been
indifferent between c and e, and 1 wouldn’t have had reason to predict
3’s choice with confidence. Players have a strong basis for making their
decisions when any indifference by a player at some node in the tree is
shared by all earlier players (they’re not sure what she’ll do, but it
doesn’t concern them).
What if a player doesn’t know where he is in the tree, when he needs
to make a choice? We draw an information set around all the points
he cannot distinguish.
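[Game tree Γ2: player 1 chooses a, b or c at the origin. Choice c ends the game with payoffs (0, 15). Choice a leads to node x and choice b to node y; x and y lie in a single information set of player 2, who chooses u, m or d. At x the payoffs are u: (3, 4), m: (4, 3), d: (2, 2); at y they are u: (4, 3), m: (3, 4), d: (2, 2).]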
In Γ2 , if 2 is called upon to play, he knows 1 did not choose c, but he
can’t see if she chose a or b. Backward induction will not succeed here.
Note: The number of choices at each node in the same information set
must be the same (three at x, then three at y, in Γ2 ).
Definition 1
Γ is a game of perfect information if every information set is a
singleton (that is, has exactly one element). [Γ1 is, Γ2 is not]
Definition 2
A pure strategy for player i in an extensive form game Γ specifies what
(non-random) choice she will make at each of her information sets.
Example:
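[Game tree Γ3: player 1 chooses u, ending the game with payoffs (2, 2), or d. After d, player 2 chooses a, ending the game with (5, 5), or b. After b, player 1 chooses l, giving (1, 6), or r, giving (6, 1).]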
A pure strategy for 2 is just a choice (a or b), so S2 = {a, b}.
But a strategy for 1 is something like: (d, r). It says she’ll start by
choosing d, and then, in the unlikely event that she is reached at the
end of the game, she will play r. Another strategy is (d, l).
The two remaining pure strategies for 1 may make you frown. They are
(u, l) and (u, r). What sense can you make of the strategy: “I choose u,
and if I’m reached later, I will choose r.”??
Choosing u already makes it impossible for her to be reached later.
The instruction “r, if reached” seems entirely redundant. Nonetheless,
this is how strategies are defined, and it actually turns out to be useful,
as you will see.
A pure strategy profile (s1 , ..., sN ) says what everyone will do,
everywhere (at every information set). If you look at each node in the
tree, see what the player’s strategy says there, and draw an arrow on
the appropriate branch, you can start at the origin and the arrows will
lead you to a terminal node, with its vector of payoffs. So with each
profile s ∈ S is associated a profile \(u(s) = (u_1(s), \dots, u_N(s))\) of vN-M
payoffs.
Notice that, starting with an extensive form game Γ, you can find the
normal form \(G = (S_1, \dots, S_N; u_1, \dots, u_N)\) of the game!
Remark: If we allow for random moves by “Nature” (who determines
how likely it is to rain, or for an accident to occur, and so on, and
whose behavior is non strategic and involves fixed probabilities that are
common knowledge), things are a little more complicated. A pure
strategy profile (for the N actual players, not including Nature) now
induces a probability distribution over terminal nodes. But one can
still compute expected payoffs for each player, and hence the normal
form.
Nash equilibrium encounters some embarrassment in extensive form
games. A Nash equilibrium strategy for some player i may involve an
implicit “threat” at an information set not reached in equilibrium,
that seems lacking in any credibility.
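[Game tree Γ4: player 1 chooses a, ending the game with payoffs (4, 10), or b. After b, player 2 chooses c, giving (8, 5), or d, giving (0, 0).]

Normal form G:
               2
           c        d
1    a   4,10     4,10
     b   8,5      0,0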
Backward induction makes it clear that 2 would choose c if reached,
and therefore 1 will choose b. And indeed, (b; c) is an NE of the game.
But let’s look at its normal form. Notice that (a; d) is also an NE!
Against d, player 1 gets 4 from a and 0 from b; against a, player 2
gets 10 whatever she chooses. No one can gain by deviating.
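The claim can be checked mechanically. This is a sketch that hard-codes the normal form G with the payoffs as read from the figure (rows a, b for player 1; columns c, d for player 2) and considers pure strategies only.

```python
# Normal form G, entries are (u1, u2); payoffs as read from the notes.
G = {("a", "c"): (4, 10), ("a", "d"): (4, 10),
     ("b", "c"): (8, 5),  ("b", "d"): (0, 0)}
S1, S2 = ["a", "b"], ["c", "d"]

def is_nash(s1, s2):
    """True if neither player can gain by a unilateral deviation."""
    u1, u2 = G[(s1, s2)]
    no_dev_1 = all(G[(t1, s2)][0] <= u1 for t1 in S1)
    no_dev_2 = all(G[(s1, t2)][1] <= u2 for t2 in S2)
    return no_dev_1 and no_dev_2

nash = [(s1, s2) for s1 in S1 for s2 in S2 if is_nash(s1, s2)]
print(nash)  # [('a', 'd'), ('b', 'c')]
```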
The problem is that d is “a best response to a” (maximizes 2’s EU
against strategy a) at the beginning of the game, but not a best
response from 2’s decision node onward. Once 2 is reached, the only
utility-maximizing thing for her to do is to choose c.
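Backward induction on Γ4 can be sketched as a short recursion. The tree encoding is hypothetical, the payoffs are as read from the figure, and the sketch assumes perfect information with each player moving at a single node (so the plan can be indexed by player).

```python
# Assumed encoding of Γ4: decision node = (player, {action: child});
# terminal node = payoff tuple (u1, u2).
gamma4 = (1, {"a": (4, 10),
              "b": (2, {"c": (8, 5), "d": (0, 0)})})

def backward_induction(node, plan):
    """Return the payoff vector reached under optimal play from `node`,
    recording each mover's choice in `plan` (player -> action)."""
    if not isinstance(node[1], dict):      # terminal node: nothing to choose
        return node
    player, children = node
    values = {act: backward_induction(child, plan)
              for act, child in children.items()}
    # The mover picks the action with the highest own payoff (ties: first found).
    best = max(values, key=lambda act: values[act][player - 1])
    plan[player] = best
    return values[best]

plan = {}
result = backward_induction(gamma4, plan)
print(result, plan)  # (8, 5) {2: 'c', 1: 'b'}
```

Player 2's choice is solved first, from her own node onward, which is exactly the test that d fails.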
Selten (1965) pointed out that once 2’s decision node is reached, what
remains of Γ4 is a game in its own right, with just one active player,
player 2. In that subgame, as he called it, c is the only NE. He argued
that (a; d) is not a sensible NE, as it fails the NE test on that subgame.
More generally, after some history of play leading to a choice node x, if
you can remove all of the tree following x (from the big game) without
ripping any information sets (that is, no information set has some
nodes inside the removed piece and some outside it), call x and all
that follows it a subgame.
There are two ways of thinking about player i randomizing in an
extensive form game. First, he could play a mixed strategy, which, as
before, corresponds to the random choice of a pure strategy. This is
like spinning a spinner before the game starts, and following the pure
strategy it points to, for the whole game. (If you have two information
sets, this coordinates your randomization over the two sets.) This is a
“global” randomization.
Instead you could randomize “locally”. Think of waiting until you are
reached at a particular information set, and then spinning a spinner
designed just for that eventuality. This is what a behavior strategy
for player i does: it specifies what probability distribution he would use
at each of his information sets. These distributions are independent,
not “correlated” (they employ completely separate spinners).
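To see the link between the two notions, here is a sketch that converts a behavior strategy into the mixed strategy it generates. The information set labels and probabilities are invented for illustration.

```python
from itertools import product

# Hypothetical player with two information sets and assumed probabilities.
behavior = {"first": {"u": 0.5, "d": 0.5},
            "later": {"l": 0.25, "r": 0.75}}

def induced_mixed(behavior):
    """Mixed strategy generated by independent local randomizations:
    the probability of a pure strategy is the product of the local probabilities."""
    sets = list(behavior)
    mixed = {}
    for actions in product(*(behavior[h] for h in sets)):
        prob = 1.0
        for h, a in zip(sets, actions):
            prob *= behavior[h][a]
        mixed[actions] = prob
    return mixed

mixed = induced_mixed(behavior)
print(mixed)
```

By contrast, a mixed strategy that puts probability 1/2 on (u, l) and 1/2 on (d, r) coordinates the two choices with a single spinner, so it is not a product of local distributions.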
Harold Kuhn, who introduced the most commonly used formulation of
the extensive form, showed that it doesn’t matter whether players use
mixed strategies or behavior strategies: either kind can generate the
same probability distributions over outcomes, as long as the game has
perfect recall (no one ever forgets anything he has done or known).
Since a behavior strategy for i tells him what to do everywhere in Γ, it
tells him what to do in any subgame.
Definition (Selten 1965)
A behavior strategy profile b is a subgame perfect equilibrium (SPE) of
a game Γ if, for every subgame τ of Γ, b induces an NE on the subgame
τ.
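For Γ4, the definition can be checked directly. This is a sketch restricted to pure profiles (degenerate behavior strategies), with payoffs as read from the figure. Γ4 has two subgames: the whole game, and the one starting at player 2's node.

```python
# Payoffs of Γ4 in normal form, entries are (u1, u2); read from the notes.
G = {("a", "c"): (4, 10), ("a", "d"): (4, 10),
     ("b", "c"): (8, 5),  ("b", "d"): (0, 0)}
S1, S2 = ["a", "b"], ["c", "d"]

# The only proper subgame starts at player 2's node (reached after b).
subgame = {"c": (8, 5), "d": (0, 0)}

def is_nash(s1, s2):
    """No unilateral deviation pays in the whole game."""
    u1, u2 = G[(s1, s2)]
    return (all(G[(t, s2)][0] <= u1 for t in S1)
            and all(G[(s1, t)][1] <= u2 for t in S2))

def is_nash_in_subgame(s2):
    """Player 2's action maximizes her payoff once her node is reached."""
    return all(subgame[t][1] <= subgame[s2][1] for t in S2)

spe = [(s1, s2) for s1 in S1 for s2 in S2
       if is_nash(s1, s2) and is_nash_in_subgame(s2)]
print(spe)  # [('b', 'c')]
```

The profile (a; d) passes the first test but fails the second, so only (b; c) survives.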