Econ 600: Mathematical
Economics
July/August 2006
Stephen Hutton
1
Why optimization?
• Almost all economics is about solving
constrained optimization problems. Most
economic models start by writing down an
objective function.
• Utility maximization, profit maximization, cost
minimization, etc.
• Static optimization: most common in
microeconomics
• Dynamic optimization: most common in
macroeconomics
2
My approach to course
• Focus on intuitive explanation of most important
concepts, rather than formal proofs.
• Motivate with relevant examples
• Practice problems and using tools in problem
sets
• Assumes some basic math background (people
with strong background might not find course
useful)
• For more details, see course notes, textbooks,
future courses
• Goal of course: introduction to these concepts
3
Order of material
• The course will skip around the notes a bit during the static portion; specifically, I’ll cover the first half of lecture 1, then give some definitions from lecture 3, then go back to lecture 1 and do the rest in order.
• Sorry! 
4
Why not basic optimization?
• Simplest method of unconstrained
optimization (set deriv = 0) often fails
• Might not identify the optima, or optima
might not exist
• Solution unbounded
• Function not always differentiable
• Function not always continuous
• Multiple local optima
5
Norms and Metrics
• It is useful to have some idea of “distance”
or “closeness” in vector space
• The most common measure is Euclidean
distance; this is sufficient for our purposes
(dealing with n-dimensional real numbers)
• General requirements of norm: anything
that satisfies conditions 1), 2), 3) (see
notes)
6
Continuity
• General intuitive sense of continuity (no gaps or jumps). Whenever
x is close to x’, f(x) is close to f(x’)
• Formal definitions:
A sequence of elements {xn} is said to converge to a point x in Rn if for every ε > 0 there is a number N such that for all n > N, ||xn − x|| < ε.
• A function f: Rn → Rm is continuous at a point x if for ALL sequences {xn} converging to x, the derived sequence of points in the target space {f(xn)} converges to the point f(x).
• A function is continuous if it is continuous at all points in its
domain.
• What does this mean in 2d? A sequence of points converging from below and a sequence converging from above must both give function values converging to f(x). The same holds in higher dimensions.
7
Continuity 2
• Why continuity? Needed to guarantee existence of
solution
• So typically assume continuity on functions to guarantee
(with other assumptions) that a solution to the problem
exists
• Sometimes continuity is too strong. To guarantee a
maximum, upper semi-continuity is enough. To
guarantee a minimum, lower semi-continuity
• Upper semi-continuity: for all xn → x, limn→∞ f(xn) ≤ f(x)
• Lower semi-continuity: for all xn → x, limn→∞ f(xn) ≥ f(x)
• Note that if both hold (so the limit equals f(x)), we have continuity.
• Note, figure 6 in notes is wrong
8
Open sets
(notes from lecture 3)
• For many set definitions and proofs we use the
concept of an open ball of arbitrarily small size.
• An open ball is a set of points (or vectors) within
a given distance from a particular point (or
vector). Formally:
Let ε be a small real number. Bε(x)={y| ||x-y||< ε}.
• A set of points S in Rn is open if for all points in S, there
exists an open ball that is entirely contained within S. Eg
(1,2) vs (1,2].
• Any union of open sets is open.
• Any finite intersection of open sets is open.
9
Interior, closed set
(notes in lecture 3)
• The interior of a set S is the largest open set
contained in S. Formally, Int(S) = ∪iSi where each Si is an open subset of S.
• If S is open, Int(S)=S
• A set is closed if every convergent sequence within the set converges to a point within the set. Formally, fix a set S. If for every convergent sequence {xm} of elements of S the limit limm→∞ xm lies in S, then S is closed.
• S is closed if and only if SC is open.
10
Boundary, bounded, compact
(notes in lecture 3)
• The boundary of a set S [denoted B(S)] is the set of points x such that for all ε>0, Bε(x)∩S is not empty and Bε(x)∩SC is not empty.
Ie every open ball around x contains points both in S and not in S.
• If S is closed, B(S) ⊆ S
• A set S is bounded if there is some finite number M such that the distance between any two points in the set is at most M.
• A set is compact if it is closed and bounded.
• These definitions correspond to their
commonsense interpretations.
11
Weierstrass’s Theorem
(notes in lecture 3)
• Gives us a sufficient condition to ensure that a
solution to a constrained optimization problem
exists. If the constraint set C is compact and the
function f is continuous, then there always
exists at least one solution to
max f(x) s.t. x is in C
• Formally: Let f: Rn → R be continuous. If C is a compact subset of Rn, then there exist x* in C, y* in C s.t. f(x*) ≥ f(x) ≥ f(y*) for all x in C.
12
Vector geometry
• Want to extend intuition about slope = 0 idea of optimum
to multiple dimensions. We need some vector tools to
do this
• Inner product: x·y=(x1y1+x2y2+…+xnyn)
• Euclidean norm and inner product related: ||x||2=x·x
• Two vectors are orthogonal (perpendicular) if x·y = 0.
• Inner product of two vectors v, w is v’w in matrix
notation.
• v’w > 0 then v, w form acute angle
• v’w < 0 then v, w form obtuse angle.
• v’w = 0 then v, w orthogonal.
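These sign rules are easy to check numerically. A minimal Python sketch (not part of the slides; the helper names `inner`, `norm`, and `angle_type` are my own):

```python
import math

def inner(v, w):
    """Inner product x.y = x1*y1 + x2*y2 + ... + xn*yn."""
    return sum(vi * wi for vi, wi in zip(v, w))

def norm(v):
    """Euclidean norm, using the identity ||x||^2 = x.x."""
    return math.sqrt(inner(v, v))

def angle_type(v, w):
    """Classify the angle between v and w by the sign of v'w."""
    p = inner(v, w)
    return "acute" if p > 0 else "obtuse" if p < 0 else "orthogonal"
```

Eg inner([1, 0], [0, 1]) = 0, so the two standard basis vectors of R2 are orthogonal.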
13
Linear functions
• A function f: V → W is linear if for any two real numbers a, b and any two elements v, v’ in V we have f(av + bv’) = af(v) + bf(v’)
• Note that the functions we usually call “linear” in R1 (f(x) = mx + b) are not generally linear; they are affine (linear only if b = 0).
• Every linear function defined on Rn can be represented
by an n-dimensional vector (f1,f2,…fn) with the feature
that f(x) = Σfixi
• Ie value of function at x is inner product of defining
vector with x.
• [Note, in every situation we can imagine dealing with,
functionals are also functions.]
14
Hyperplanes
• A hyperplane is the set of points given by
{x:f(x)=c} where f is a linear functional and c is
some real number.
• Eg1: For R2 a typical hyperplane is a straight
line.
• Eg2: For R3 a typical hyperplane is a plane.
• Think about a hyperplane as one of the level
sets of the linear functional f. As we vary c, we
change level sets.
• The defining vector of f(x) is orthogonal to the
hyperplane.
15
Separating Hyperplanes
• A half-space is the set of points on one side of a
hyperplane. Formally: HS(f) = {x : f(x) ≥ c} or
HS(f) = {x : f(x) ≤ c}.
• Consider any two disjoint sets: when can we
construct a hyperplane that separates the sets?
• Examples in notes.
• If C lies in a half-space defined by H and H
contains a point on the boundary of C, then H is
a supporting hyperplane of C.
16
Convex sets
• A set is convex if every convex combination of two points in the set is also in the set.
• No such thing as a concave set. Related but
different idea to convex/concave functions.
• Formally: a set C in Rn is convex if for all x, y in C and for all t in [0,1] we have tx + (1−t)y in C.
• Any convex set can be represented as
intersection of halfspaces defined by supporting
hyperplanes.
• Any halfspace is a convex set.
17
Separating Hyperplanes 2
• Separating hyperplane theorem: Suppose X, Y are
non-empty convex sets in Rn such that the interior of
Y∩X is empty and the interior of Y is not empty.
Then there exists a vector a in Rn which is the defining
vector of a separating hyperplane between X and Y.
Proof: in texts.
• Applications: general equilibrium theory, the second fundamental theorem of welfare economics: conditions under which a Pareto-optimal allocation can be supported as a price equilibrium. We need convex preferences to guarantee that there is a price ratio (a hyperplane) that can sustain an equilibrium.
18
Graphs
• The graph is what you normally see when
you plot a function.
• Formally: the graph of a function f from V to W is the set of ordered pairs
{(v, w) : v ∈ V, w = f(v)}
19
Derivatives
• We already know from basic calculus that a
necessary condition for x* to be an
unconstrained maximum of a function f is that its
derivative be zero (if the derivative exists) at x*.
• A derivative tells us something about the slope
of the graph of the function.
• We can also think about the derivative as telling
us the slope of the supporting hyperplane to the
graph of f at the point (x,f(x)).
(see notes)
20
Multidimensional derivatives
and gradients
• We can extend what we know about derivatives from
single-dimensional space to multi-dimensional space
directly.
• The gradient of f at x is just the n-dimensional (column) vector which lists all the partial derivatives if they exist:
∇f = (∂f/∂x1, ∂f/∂x2, …, ∂f/∂xn)′
• This n×1 matrix is also known as the Jacobian.
• The derivative of f is the transpose of the gradient.
• The gradient can be interpreted as a supporting
hyperplane of the graph of f.
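The gradient can be approximated numerically by central differences, one partial derivative per coordinate. A sketch (my own helper, not from the notes):

```python
def gradient(f, x, h=1e-6):
    """Central-difference approximation to the gradient of f at x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h          # shift coordinate i up
        xm[i] -= h          # shift coordinate i down
        g.append((f(xp) - f(xm)) / (2 * h))
    return g
```

Eg for f(x) = x1² + 3 x1 x2 the gradient at (1, 2) is (2·1 + 3·2, 3·1) = (8, 3).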
21
Second order derivatives
• We can think about the second derivative of
multidimensional functions directly as in the single
dimension case.
• The first derivative of the function f was an n×1 vector; the second derivative is an n×n matrix known as the Hessian:
∇²f(x) = [ ∂²f(x)/∂x1² … ∂²f(x)/∂x1∂xn ; … ; ∂²f(x)/∂xn∂x1 … ∂²f(x)/∂xn² ]
• If f is twice continuously differentiable (ie all elements of the Hessian exist and are continuous) then the Hessian matrix is symmetric (the second derivatives are the same irrespective of the order of differentiation).
22
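A finite-difference sketch of the Hessian, useful for checking the symmetry property numerically (my own helper, not from the notes):

```python
def hessian(f, x, h=1e-4):
    """Finite-difference approximation to the n x n Hessian of f at x."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                # evaluate f with coordinate i shifted by si*h and j by sj*h
                y = list(x)
                y[i] += si * h
                y[j] += sj * h
                return f(y)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H
```

Eg for f(x) = x1² x2 the Hessian at (1, 3) is [[6, 2], [2, 0]], symmetric as the theorem requires.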
Homogeneous functions
• Certain functions in Rn are particularly well-behaved and have useful
properties that we can exploit without having to prove them every
time.
• A function f: Rn → R is homogeneous of degree k if f(tx1, tx2, …, txn) = t^k f(x) for all t > 0. In practice we will deal with homogeneous functions of degree 0 and degree 1.
Eg: demand function is homog degree 0 in prices (in general
equilibrium) or in prices and wealth: double all prices and income
has no impact on demand.
• Homogeneous functions allow us to determine the entire behavior of
the function from only knowing about the behavior in a small ball
around the origin
Why? Because any point x’ can be written as a scalar multiple of some point x in that ball, so x’ = tx
• If k=1 we say that f is linearly homogeneous.
• Euler’s theorem: if f is h.o.d. k then
x · ∇f(x) = kf(x)
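Euler's theorem can be verified numerically for a particular homogeneous function. A sketch (the Cobb-Douglas example and the helper name are my own choices, not from the slides):

```python
def euler_check(f, x, k, h=1e-6):
    """Return |x . grad f(x) - k f(x)|, with the gradient by central differences."""
    lhs = 0.0
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        lhs += x[i] * (f(xp) - f(xm)) / (2 * h)
    return abs(lhs - k * f(x))

# Cobb-Douglas f(x, y) = x^0.3 * y^0.7 is homogeneous of degree 1
f = lambda x: x[0] ** 0.3 * x[1] ** 0.7
```

At any point, the discrepancy should be zero up to finite-difference error.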
23
Homogeneous functions 2
• A ray through x is the line running through x and the origin, extending forever in both directions.
Formally: a ray is the set {x’ in Rn | x’ = tx, for t in R}
• The gradient of a homogeneous function is essentially the same along any ray (linked by a scalar multiple). Ie the gradient at tx is linearly dependent with the gradient at x.
Thus level sets along any ray have the same slope.
Application: homogeneous utility functions rule out
income effects in demand. (At constant prices,
consumers demand goods in the same proportion as
income changes.)
24
Homothetic functions
• A function f: R+n → R+ is homothetic if f(x) = h(v(x)) where h: R+ → R+ is strictly increasing and v: R+n → R+ is h.o.d. k.
• Application: we often assume that preferences
are homothetic. This gives that indifference sets
are related by proportional expansion along
rays.
• This means that we can deduce the consumer’s
entire preference relation from a single
indifference set.
25
More properties of gradients
(secondary importance)
• Consider a continuously differentiable function, f: Rn → R.
The gradient of f (Df(x)) is a vector in Rn which points in
the direction of greatest increase of f moving from the
point x.
• Define a (very small) vector v s.t. Df(x)’v=0 (ie v is
orthogonal to the gradient). Then the vector v is moving
us away from x in a direction that adds zero to the value
of f(x). Thus, any points on the vector v are at the same
level of f(x). So we have a method of finding the level
sets of f(x) – by solving Df(x)’v=0. Also, v is tangent to
the level set of f(x).
• The direction of greatest increase of a function at a point
x is at right angles to the level set at x.
26
Upper contour sets
• The level sets of a function are the sets of points which yield the same value of the function. Formally, for f: Rn → R a level set is {x : f(x) = c}
Eg: indifference curves are level sets of
utility functions.
• The upper contour set is the set of points on or above the level set, ie the set {x : f(x) ≥ c}.
27
Concave functions
• For any two points, we can trace out the line of
points joining them through tx+(1-t)y, varying t
between 0 and 1. This is a convex
combination of x and y.
• A function is concave if for all x, y in Rn and all t in [0,1]:
f(tx + (1−t)y) ≥ tf(x) + (1−t)f(y)
ie the line joining any two points on the graph is (weakly) below the graph of the function between those two points
• A function is strictly concave if the inequality is
strict for all x,y.
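The defining inequality can be tested directly on a grid of t values. A sketch (the example functions and the helper name are my own, not from the slides):

```python
def chord_below_graph(f, x, y, ts):
    """Check f(tx + (1-t)y) >= t f(x) + (1-t) f(y) at each t (the concavity test)."""
    for t in ts:
        z = [t * xi + (1 - t) * yi for xi, yi in zip(x, y)]
        if f(z) < t * f(x) + (1 - t) * f(y) - 1e-12:
            return False
    return True

# f(x) = -(x1^2 + x2^2) is concave; f(x) = x1^2 + x2^2 is not
concave_f = lambda x: -(x[0] ** 2 + x[1] ** 2)
```

The convex counterpart x1² + x2² fails this test strictly between any two distinct points.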
28
Convex functions
• A function is convex if for all x, y in Rn and all t in [0,1]:
f(tx + (1−t)y) ≤ tf(x) + (1−t)f(y)
ie the line joining any two points on the graph is (weakly) above the graph of the function between the points.
• A function is strictly convex if the inequality is
strict for all x,y.
• A function f is convex if –f is concave.
• The upper contour set of a concave function is a convex set. The lower contour set of a convex function is a convex set.
29
Concavity, convexity and second
derivatives
• If f: R → R and f is C2, then f is concave iff f’’(x) ≤ 0 for all x. (And strictly concave for strict inequality.)
• If f: R → R and f is C2, then f is convex iff f’’(x) ≥ 0 for all x. (And strictly convex for strict inequality.)
30
Concave functions and gradients
• Any concave function lies (weakly) below the tangent hyperplane defined by its gradient (or below a supporting hyperplane defined by a supergradient if f is not C1).
• Any convex function lies (weakly) above the tangent hyperplane defined by its gradient (or above one defined by a subgradient if f is not C1).
• Graphically: function lies below/above line
tangent to graph at any point.
31
Negative and positive (semi-)
definite
• Consider any square symmetric matrix A.
• A is negative semi-definite if x’Ax≤0 for all
x.
If in addition x’Ax=0 implies that x=0, then A
is negative definite.
• A is positive semi-definite if x’Ax ≥ 0 for all x.
If in addition x’Ax = 0 implies that x = 0, then A is positive definite.
32
Principal minors and nsd/psd
• Let A be a square matrix. The k’th order leading
principal minor of A is the determinant of the
kxk matrix obtained by deleting the last n-k rows
and columns.
• An nxn square symmetric matrix is positive
definite if its n leading principal minors are
strictly positive.
• An nxn square symmetric matrix is negative definite if its n leading principal minors alternate in sign, with a11 < 0.
• [There are conditions for getting nsd/psd from
principal minors.]
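A sketch of the leading-principal-minor test (my own helpers, not from the notes; cofactor expansion is only sensible for small matrices):

```python
def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(n))

def leading_principal_minors(A):
    """The n leading principal minors |A_k|, k = 1..n."""
    return [det([row[:k] for row in A[:k]]) for k in range(1, len(A) + 1)]
```

Eg A = [[−2, 1], [1, −2]] has minors [−2, 3]: alternating sign starting negative, so A is negative definite.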
33
Reminder: determinant of a 3x3
matrix
• You won’t have to take the determinant of
a matrix bigger than 3x3 without a
computer, but for 3x3:
A = [a11 a12 a13; a21 a22 a23; a31 a32 a33]
|A| = a11a22a33 + a12a23a31 + a13a21a32 − a13a22a31 − a12a21a33 − a23a32a11
34
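The 3x3 expansion above translates directly into code. A sketch (`det3` is my own name, not from the notes):

```python
def det3(A):
    """Rule-of-Sarrus determinant of a 3x3 matrix (the six-term expansion)."""
    return (A[0][0] * A[1][1] * A[2][2]
            + A[0][1] * A[1][2] * A[2][0]
            + A[0][2] * A[1][0] * A[2][1]
            - A[0][2] * A[1][1] * A[2][0]
            - A[0][1] * A[1][0] * A[2][2]
            - A[1][2] * A[2][1] * A[0][0])
```

Eg the identity matrix has determinant 1, as expected.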
Concavity/convexity and nd/pd
• An easy way to identify whether a function is convex or concave is from the Hessian matrix.
• Suppose f: Rn → R is C2. Then:
• f is strictly concave if the Hessian matrix is negative definite for all x (sufficient but not necessary).
• f is concave iff the Hessian matrix is negative semi-definite for all x.
• f is strictly convex if the Hessian matrix is positive definite for all x (sufficient but not necessary).
• f is convex iff the Hessian matrix is positive semi-definite for all x.
35
Quasi-concavity
• A function is quasi-concave if
f(tx + (1−t)y) ≥ min{f(x), f(y)} for x, y in Rn, 0 ≤ t ≤ 1
• Alternatively: a function is quasi-concave if its upper
contour sets are convex sets.
• A function is strictly quasi-concave if in addition
f(tx + (1-t)y)=min{f(x),f(y)} for 0<t<1 implies that x=y
• All concave functions are quasi-concave (but not vice
versa).
• [Why quasi-concavity? Strictly quasi-concave functions
have a unique maximum.]
36
Quasi-convexity
• A function is quasi-convex if
f(tx + (1-t)y) ≤max{f(x),f(y)} for x,y in Rn, 0≤t≤1
• Alternatively: a function is quasi-convex if its lower contour sets are convex sets.
• A function is strictly quasi-convex if in addition
f(tx + (1-t)y)=max{f(x),f(y)} for 0<t<1 implies that
x=y
• All convex functions are quasi-convex (but not
vice versa).
• [Why quasi-convexity? Strictly quasi-convex functions have a unique minimum.]
37
Bordered Hessian
• The bordered Hessian matrix H is just the Hessian matrix bordered by the gradient (Jacobian) and its transpose:
H = [ 0, ∇f′ ; ∇f, ∇²f ]
ie the first row and first column are 0 followed by the partials ∂f/∂x1, …, ∂f/∂xn, and the remaining block is the Hessian ∇²f.
• If the leading principal minors of H from k=3 onwards alternate in sign with the first such lpm > 0, then f is quasi-concave. If they are all negative, then f is quasi-convex.
38
Concavity and monotonic
transformations
• (Not in the lecture notes, but useful for solving
some of the problem set problems).
• The sum of two concave functions is concave
(proof in PS2).
• Any monotonic transformation of a concave
function is quasiconcave (though not necessarily
concave). Formally, if h(x)=g(f(x)), where f(x) is
concave and g(x) is monotonic, then h(x) is
quasi-concave.
• Useful trick: the ln(x) function is a monotonic
transformation.
39
Unconstrained optimization
• If x* is a solution to the problem maxxf(x), x is in Rn, what
can we say about characteristics of x*?
• A point x is a global maximum of f if for all x’ in Rn, f(x) ≥ f(x’).
• A point x is a local maximum of f if there exists an open ball of positive radius around x, Bε(x), s.t. for all x’ in the ball, f(x) ≥ f(x’).
• If x is a global maximum then it is a local maximum (but
not necessarily vice versa).
• If f is C1, then if x is a local maximum of f, the gradient of f at x = 0. [Necessary but not sufficient.]
This is the direct extension of the single dimension case.
40
Unconstrained optimization 2
• If x is a local maximum of f, then there is an open ball
around x, Bε(x) s.t. f is concave on Bε(x).
• If x is a local minimum of f, then there is an open ball
around x, Bε(x) s.t. f is convex on Bε(x).
• Suppose f is C2. If x is a local maximum, then the
Hessian of f at x is negative semi-definite.
• Suppose f is C2. If x is a local minimum, then the
Hessian of f at x is positive semi-definite.
• To identify a global max, we either solve for all local
maxima and then compare them, or look for additional
features on f that guarantee that any local max are
global.
41
Unconstrained optimization 3
• If f: Rn → R is concave and C1, then Df(x)=0
implies that x is a global maximum of f. (And x
being a global maximum implies that the
gradient is zero.) This is both a necessary and
sufficient condition.
• In general, we only really look at maximization,
since all minimization problems can be turned
into maximization problems by looking at –f.
• x solves max f(x) if and only if x solves min −f(x).
42
Non-differentiable functions
(secondary importance)
• In economics, we rarely have to deal with nondifferentiable functions; normally we assume these away.
• The superdifferential of a concave function f at a point x* is the set of all supporting hyperplanes of the graph of f at the point (x*, f(x*)).
• A supergradient of a function f at a point x* is an
element of the superdiffential of f at x*.
• If x* is an unconstrained local maximum of a function
f:RnR, then the vector of n zeros must be an element
of the superdifferential of f at x*.
• [And equivalently subdifferential, subgradient, local
minimum for convex functions.]
43
Constrained optimization
• General form of constrained optimization
max f(x), x ∈ C
• Normally we write the constraint by writing out restrictions (eg x ≤ 1) rather than using set notation.
• Sometimes (for equality constraints) it is more convenient to solve
problems by substituting the constraint(s) into the objective function,
and so solving an unconstrained optimization problem.
• Most common restrictions: equality or inequality constraints.
max f(x) s.t. g(x) ≤ c
• Eg: Manager trying to induce worker to provide optimal effort (moral
hazard contract).
44
Constrained optimization 2
• No reason why can only have one restriction. Can have
any number of constraints, which may be of any form.
Most typically we use equality and inequality constraints;
these are easier to solve analytically than constraints
that x belong to some general set.
• These restrictions define the constraint set.
• Most general notation, while using only inequality constraints:
maxx f(x) s.t. G(x) ≥ 0
where G(x) is an m×1 vector of inequality constraints (m is the number of constraints).
• Eg: For the restrictions 3x1 + x2 ≤ 10, x1 ≥ 2, we have:
G(x) = (10 − 3x1 − x2, x1 − 2)′
45
Constrained optimization 3
• We will need limitations on the constraint set to guarantee existence of a solution (Weierstrass’ theorem).
• What can happen if constraint set not convex,
closed? (examples)
• Denoting constraint sets:
C = {x ∈ Rn | f(x) − c ≥ 0}
characterizes all values of x in Rn where f(x) ≥ c
46
General typology of constrained
maximization
• Unconstrained maximization. C is just the whole vector
space that x lies in (usually Rn). We know how to solve
these.
• Lagrange Maximization problems. Here the constraint
set is defined solely by equality constraints.
• Linear programming problems. Not covered in this
course.
• Kuhn-Tucker problems. These involve inequality
constraints. Sometimes we also allow equality
constraints, but we focus on inequality constraints. (Any
problem with equality constraints could be transformed
by substitution to deal only with inequality constraints.)
47
Lagrange problems
• Covered briefly here, mostly to compare and contrast
with Kuhn-Tucker.
• Canonical Lagrange problem is of form:
maxx f(x) s.t. G(x) = 0, x ∈ Rn, G: Rn → Rm
• Often we have a problem with inequality constraints, but
we can use economic logic to show that at our solution
the constraints will bind, and so we can solve the
problem as if we had equality constraints.
• Eg: Consumer utility maximization; if utility function is
increasing in all goods, then consumer will spend all
income. So budget constraint px≤w becomes px=w.
48
Lagrange problems 2
• Lagrange theorem: in the canonical Lagrange problem (CL) above, suppose that f and G are C1 and suppose that the n×m matrix DG(x*) has rank m. Then if x* solves CL, there exists a vector λ* in Rm such that Df(x*) + DG(x*)λ* = 0. Ie:
∇f(x*) + Σj=1…m λ*j ∇gj(x*) = 0
• This is just a general form of writing what we know from
solving Lagrange problems: we get n FOCs that all equal
zero at the solution.
• The rank m requirement is called “constraint qualification”; we will come back to this with Kuhn-Tucker. But this is a necessary (not sufficient) condition for the existence of Lagrange multipliers.
49
Basic example:
• max f(x1,x2) s.t. g1(x1,x2) = c1, g2(x1,x2)=c2
• L = f(x1,x2)+λ1(g1(x1,x2)-c1)+λ2(g2(x1,x2)-c2)
• FOCs:
x1: f1(x1,x2) + λ1g11(x1,x2) + λ2g21(x1,x2) =0
x2: f2(x1,x2) + λ1g12(x1,x2) + λ2g22(x1,x2) =0
• Plus constraints:
λ1: g1(x1,x2) – c1 = 0
λ2: g2(x1,x2) – c2 = 0
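As a concrete check of the method, here is a hypothetical one-constraint instance (the objective x1x2 and constraint x1 + x2 = 10 are my own choices, not from the slides), solved analytically in the comments and verified by substituting the constraint and searching a grid:

```python
# Hypothetical instance: max x1*x2 s.t. x1 + x2 = 10.
# L = x1*x2 + lam*(x1 + x2 - 10)
# FOCs: x2 + lam = 0, x1 + lam = 0  =>  x1 = x2
# The constraint then gives x1 = x2 = 5 (and lam = -5).
# Verify by substituting x2 = 10 - x1 and searching a grid on x1:
values = [(x1 * (10 - x1), x1) for x1 in [i / 100 for i in range(1001)]]
best_value, best_x1 = max(values)
```

The grid search picks out x1 = 5 with value 25, matching the FOC solution.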
50
Lagrange problems 3
• We can also view the FOCs from the theorem
as:
∇f(x*) = −Σj=1…m λ*j ∇gj(x*)
• Ie we can express the gradient of the objective
function as a linear combination of the gradients
of the constraint functions, where the weights
are determined by λ*. (see diagram in notes)
• Note that no claims are made about the sign of
λ* (but sign will be more important in KT).
51
Kuhn Tucker 1
• The most common form of constrained
optimization in economics takes the form:
maxx f(x) s.t. G(x) ≥ 0, x ≥ 0, x ∈ Rn, G: Rn → Rm
• (Note that we can include non-negativity
constraints inside the G(x) vector, or not.)
• Examples: utility maximization.
• Cost minimization
52
Kuhn Tucker 2
• Key problem with inequality constraints: solution to
problem might be on boundary of constraint, or might be
internal. (see diagram in notes)
• Main advance of KT: sets up necessary conditions for
optimum in situations where constraints bind, and for
situations where they don’t. Then compare between
these cases.
• Basic idea: if a constraint binds at a solution, then the value of the function must decrease as we move away from the constraint. So if the constraint x ≤ c binds, we can’t be at a maximum unless f’(x) ≥ 0 at that point. If the constraint x ≥ c binds, we can’t be at a maximum unless f’(x) ≤ 0 at that point. Otherwise, we could increase the value of the function without violating any of the constraints.
53
Kuhn-Tucker 3
• We say a weak inequality constraint is binding if the constraint
holds with equality.
• Unlike Lagrange problems, in KT problems constraints might bind at a solution, or they might not (if we have an interior solution). If a particular constraint does not bind, then its multiplier is zero; if the constraint does bind, then the multiplier is non-zero (and is >0 or <0 depending on our notational formulation of the problem).
• We can think of the multiplier on a constraint as being the shadow
value of relaxing that constraint.
• Main new thing to deal with: complementary slackness conditions.
Complementary slackness conditions are a way of saying that either
a) a particular constraint is binding (and so the respective multiplier
for that constraint is non-zero), which implies a condition on the
slope of the function at the constraint (it must be increasing towards
the constraint)
b) a constraint does not bind (so we must be in an internal solution,
with a FOC that equals zero).
54
Example 1
• Max f(x) s.t. 10 − x ≥ 0, x ≥ 0
L: f(x) + λ(10 − x)
FOCs
x: f’(x) − λ ≤ 0
λ: 10 − x ≥ 0
CS:
CS1: (f’(x)-λ)x=0
CS2: (10-x)λ=0
55
Example 1, contd
• Case 1, strict interior. x>0, x<10
From CS2, we have λ=0.
From CS1, we have f’(x) = 0. (ie unconstrained
optimum)
• Case 2, left boundary, x=0.
From CS2, we have λ=0.
From FOC1 (x) we need f’(x) ≤ 0.
• Case 3, right boundary, x = 10.
From CS1, we have f’(x) = λ, and we know λ ≥ 0 by construction, so we must have f’(x) ≥ 0.
• Thus, we can use the KT method to reject any
candidate cases that don’t have the right slope.
56
Solving KT problems
• Two methods, basically identical but slightly different in how they
handle non-negativity constraints.
• Method 1 (treat non-negativity constraints as different from other
conditions)
• Write the Lagrangean with a multiplier for each constraint other than the non-negativity constraints on choice variables. If we write the constraints in the Lagrangean as g(x) ≥ 0, we should add (not subtract) the multipliers in the Lagrangean and assume the multipliers λ ≥ 0; this will make the FOCs for x non-positive and the FOCs for the multipliers λ non-negative.
• Take FOCs for each choice variable and each multiplier.
• Take CS conditions from the FOC for each choice variable that has
a non-negativity constraint, and for each multiplier.
• Take cases for different possibilities of constraints binding; reject
infeasible cases, compare feasible cases.
57
Solving KT problems 2
• Second method: treat non-negativity constraints as the
same as any other constraint; functionally the same but
doesn’t take shortcuts.
• Write the Lagrangean with a multiplier for each
constraint. This will give us more multipliers than the
previous method.
• Take FOCs for each choice variable and each multiplier.
• Take CS conditions for each multiplier. This gives us the
same number of CS conditions as the previous method.
• Take cases for different possibilities of constraints
binding; reject infeasible cases, compare feasible cases.
58
Example 2, method 1
• Max x² s.t. x ≥ 0, x ≤ 2
L: x² + λ(2 − x)
FOCs:
x: 2x − λ ≤ 0
λ: (2 − x) ≥ 0
CS:
(2x − λ)x = 0
(2 − x)λ = 0
• Case 1, internal solution, x>0, λ=0: contradiction from
FOC1 rules this case out.
• Case 2, left boundary, x=0, λ=0. Consistent, but turns
out to be a minimum.
• Case 3, right boundary, λ > 0, x>0. CS2 implies x=2.
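The case analysis can be confirmed by brute force over the feasible set (a sketch, not part of the slides):

```python
# Brute-force check of Example 2: max x^2 on the constraint set 0 <= x <= 2.
f = lambda x: x * x
grid = [i / 1000 for i in range(2001)]   # feasible x in [0, 2]
x_star = max(grid, key=f)
lam = 2 * x_star                          # from the binding FOC 2x - lam = 0
```

The maximizer is x* = 2 with λ = 4 > 0, so the constraint x ≤ 2 binds (Case 3); x = 0 (Case 2) is the minimum, as the slide notes.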
59
Example 2, method 2
• Max x² s.t. x ≥ 0, x ≤ 2
L: x² + λ1(2 − x) + λ2x
FOCs:
x: 2x − λ1 + λ2 ≤ 0
λ1: (2 − x) ≥ 0
λ2: x ≥ 0
CS:
(2 − x)λ1 = 0
xλ2 = 0
• Case 1, internal solution, λ1=0, λ2=0: From FOC1, consistent only if
x=0 (ie actually case 2)
• Case 2, left boundary, λ1=0, λ2>0: From CS2, x=0. Consistent, but
turns out to be a minimum.
• Case 3, right boundary, λ1 > 0, λ2=0. CS1 implies x=2.
• Case 4, λ1 > 0, λ2>0: from CS1 and CS2, clearly contradictory
(0=x=2).
60
Sign issues
• There are multiple ways of setting up the KT Lagrangean
using different signs.
• One way is as above: in the Lagrangean, add the λg(x) terms, write the g(x) terms as ≥ 0, and assume the multipliers λi ≥ 0, which implies that the FOC terms are ≤ 0 for choice variables and ≥ 0 for multipliers. The lecture notes (mostly) use this method.
• Another way is to subtract the λg(x) terms in L, write the g(x) terms as ≤ 0, and assume λi ≥ 0, which implies the FOC terms are ≤ 0 for choice variables and ≥ 0 for multipliers.
SB uses this method.
SB uses this method.
• Whatever method you choose, be consistent.
61
Example 1, SB signing
• Max f(x) s.t. 10 − x ≥ 0, x ≥ 0
L: f(x) − λ(x − 10)
FOCs
x: f’(x) − λ ≤ 0
λ: −(x − 10) ≥ 0
CS:
CS1: (f’(x) − λ)x = 0
CS2: −(x − 10)λ = 0
62
Kuhn Tucker 4
• Formal treatment: start with the Lagrangian. When f: R+n → R and G: R+n → Rm, the Lagrangian of the KT problem is a new function L: R+n+m → R.
L(x, λ) = f(x) + λ′G(x)
• Important to note the domain limit on L; the multipliers are non-negative (and so is x).
(We could rewrite the problem restricting the multipliers to be negative by changing the + in the Lagrangian to −.)
(We could also rewrite the problem without the implicit
non-negativity constraints; in general KT problems not in
economic settings, we need not require x non-negative.)
63
Kuhn-Tucker 5
• As in the Lagrange method case, we can rewrite
the Lagrangian as:
L(x, λ) = f(x) + Σj=1…m λj gj(x)
decomposing G into its components.
• For any fixed point x*, define indices of G:
K = {i:gi(x*)=0} and M = {i:x*i>0}.
• Define:
H(x*) = ∂M GK(x*)
by differentiating the K components of G wrt the components j in M. This is an |M|×|K| matrix.
64
Kuhn Tucker Theorem
• Suppose that x* solves the canonical KT as a local
maximum and suppose that H(x*) has maximal rank
(Constraint Qualification). Then there exists λ*0 s.t.:
L( x*,  *)
x*  0
x *

L( x*,  *)
x*  0
x *
L( x*,  *)
x*  0

 *'
L( x*,  *)
  *' G ( x*)  0

(ie FOCs for choice vbles)
for i=1,..n; (ie CS conditions for
non-negativity constraints)
(ie FOCs for multipliers)
(ie CS conditions for multipliers)
65
KT theorem notes
• The constraint qualification (H(x*) has maximal
rank) is complex and is typically ignored. But
technically we need this to guarantee the
theorem, and that the solution method yields
actual necessary conditions
• These are necessary conditions for a solution.
Just because they are satisfied does not mean
we have solved the problem; we could have
multiple candidate solutions, or multiple
solutions, or no solution at all (if no x* exists).
66
KT and existence/uniqueness
• Suppose G(x) is concave and f(x) is strictly quasi-concave (or G(x) strictly concave and f(x) quasi-concave); then if x* solves KT, x* is unique. Furthermore, if {x : G(x) ≥ 0, x ≥ 0} is compact and non-empty and f(x) is continuous, then there exists an x* which solves KT.
• Proof: Existence from Weierstrass theorem. For uniqueness: suppose there are some x, x’ that both solve KT. Then f(x) = f(x’) and G(x) ≥ 0, G(x’) ≥ 0. Since G is concave, for t in [0,1] we have G(tx + (1−t)x’) ≥ tG(x) + (1−t)G(x’) ≥ 0. So tx + (1−t)x’ is feasible for KT. But f strictly quasi-concave implies f(tx + (1−t)x’) > min{f(x), f(x’)} = f(x). So we have a feasible x’’ = tx + (1−t)x’ which does better than x and x’, which contradicts x, x’ both being optimal solutions.
67
The constraint qualification
Consider the problem:
max x1 s.t. (1 − x1)³ − x2 ≥ 0, x1 ≥ 0, x2 ≥ 0.
(see picture in notes, (1,0) is soln)
At the solution, x2 ≥ 0 is a binding constraint.
Note that the gradient of the first constraint at (1,0) is
Dg(1,0) = (−3(1 − x1)², −1)′ = (0, −1)′ at the soln.
Together with the gradient (0, 1)′ of the binding constraint x2 ≥ 0, this gives an H* matrix of
[ 0 −1 ; 0 1 ]
which has a rank of 1.
• The gradient of f(1,0) is (1,0), which cannot be
expressed as a linear combination of (0,1) or (0,-1). So
no multipliers exist that satisfy the KT necessary
conditions.
68
Non-convex choice sets
• Sometimes we have non-convex choice
sets; typically these lead to multiple local
optima.
• In these cases, we can go ahead and
solve the problem separately in each case
and then compare. OR we can solve the
problem simultaneously.
69
Example: labour supply with
overtime
• Utility function U(c,l) = c^α·l^β
• Non-negativity constraint on consumption.
Time constraints l ≥ 0 and 24 – l ≥ 0 on
leisure (note l is leisure, not labour).
• Overtime means that the wage rate = w per
hour for the first 8 hours, 1.5w per hour for
extra hours. This means:
c ≤ w(24-l) for l ≥ 16
c ≤ 8w + 1.5w(16-l) for l ≤ 16.
70
Overtime 2
• The problem is that we have different functions for the
boundary of the constraint set depending on the level of
l. The actual problem we are solving has either the first
constraint OR the second constraint; if we tried solving
the problem by maximising U(x) s.t. both constraints for
all l then we would solve the wrong problem.
(see figures in notes)
• To solve the problem, note that the complement of the
constraint set is convex.
c  w(24-l)
for l  16
c  8w + 1.5w(16-l)
for l ≤ 16
• So consider the constraint set given by:
(c – w(24-l))(c-8w-1.5w(16-l))  0
(see figure in notes)
71
Overtime 3
• Then, without harm we could rewrite the problem as:
maxc,l cαlβ s.t.
c 0, l 0, 24-l 0
-(c – w(24-l))(c-8w-1.5w(16-l))  0
• Note that this is not identical to the original problem (it
omits the bottom left area), but we can clearly argue that
the difference is harmless, since the omitted area is
dominated by points in allowed area.
• Note that if x* solves max f(x), it also solves max g(f(x))
where g(.) is a monotonic increasing transformation.
• So let's max log(c^α·l^β) instead, s.t. the same constraints.
• This gives the Lagrangean:
L: αlog(c)+βlog(l)+μ(24-l)–λ(c-w(24-l))(c-(8w+1.5w(16-l)))
72
Overtime 4
• We can use economic and mathematical logic to
simplify the problem. First, note that since the
derivative of the log function is infinity at c=0 or
l=0, this clearly can’t be a solution, so μ=0 at any
optimum and we can ignore CS conditions on c
and l.
• So rewrite Lagrangean dropping μ term:
L: αlog(c)+βlog(l)–λ(c-w(24-l))(c-(8w+1.5w(16-l)))
• Now lets look at the FOCs.
73
Overtime 5
• FOCs:
c: α/c − λ[(c − 8w − (3w/2)(16−l)) + (c − w(24−l))] = 0
l: β/l − λ[w(c − 8w − (3w/2)(16−l)) + (3w/2)(c − w(24−l))] = 0
λ: −(c − 8w − (3w/2)(16−l))(c − w(24−l)) ≥ 0
• CS condition:
−λ(c − 8w − (3w/2)(16−l))(c − w(24−l)) = 0
noting that the equalities occur in the FOCs because we
argued that non-negativity constraints for c and l don’t
bind.
74
Overtime 6
• If l and c were such that the FOC for λ were strictly
negative, we must have λ=0 by CS, but this makes the
first two FOCs impossible to satisfy.
So (c-8w-1.5w(16-l))=0 and/or (c-w(24-l))=0
In other words, we can’t have an internal solution to the
problem (which is good, since these are clearly
dominated).
• Case 1: (c - w(24-l)) = 0 (no overtime worked)
From the first two FOCs we get αwl = βc, which with
c = 24w - wl gives us l = 24β/(α+β) and c = 24wα/(α+β).
• Case 2: (c - 8w - 1.5w(16-l)) = 0 (overtime)
From the first two FOCs we get 3αwl = 2βc, which we can
combine with c = 8w + 1.5w(16-l) to get an expression for
c in terms of parameters.
• The actual solution depends on the particular parameters of the
utility function (graphically it could be either).
75
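The two cases can be compared numerically. A rough sketch (not the lecture's analytic route) maximises U(c,l) along the kinked budget frontier; the parameter values alpha, beta, w are illustrative assumptions:

```python
# Grid-search cross-check of the labour-supply-with-overtime example.
alpha, beta, w = 0.5, 0.5, 1.0   # assumed illustrative parameters

def frontier_c(l):
    # best attainable consumption at leisure l (overtime pays 1.5w below l=16)
    return w * (24 - l) if l >= 16 else 8 * w + 1.5 * w * (16 - l)

def utility(l):
    return frontier_c(l) ** alpha * l ** beta

best_l = max((i / 1000 for i in range(1, 24000)), key=utility)  # crude grid
print(abs(best_l - 32 / 3) < 0.01)  # True: overtime case, 3*alpha*w*l = 2*beta*c
```

With these symmetric parameters the overtime case binds: solving 3αwl = 2βc with c = 8w + 1.5w(16-l) gives l = 32/3, which the grid search reproduces.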
The cost minimization problem
• Cost minimization problem: what is the cheapest
way to produce at least y output from x inputs at
input price vector w.
• C(y,w) = -max_x –w’x s.t. f(x) ≥ y, y ≥ 0, x ≥ 0.
• If f(x) is a concave function, then the set
{x : f(x) ≥ y} is a convex set (since this is an upper
contour set).
• To show that C(y,w) is convex in y:
Consider any two levels of output y, y’ and define
yt=ty+(1-t)y’ (ie convex combination).
76
Convexity of the cost function
• Let x be a solution to the cost minimization
problem for y, xt for yt, x’ for y’.
• Concavity of f(x) implies:
f(tx+(1-t)x’) ≥ tf(x) + (1-t)f(x’).
• Feasibility implies: f(x) ≥ y, f(x’) ≥ y’.
• Together these imply:
f(tx+(1-t)x’) ≥ tf(x) + (1-t)f(x’) ≥ ty + (1-t)y’ = yt
• So the convex combination tx+(1-t)x’ is
feasible for yt.
77
Convexity of the cost fn 2
• By definition:
C(y,w) = w’x
C(y’,w) = w’x’
C(yt,w) = w’xt
• But C(yt,w) = w’xt ≤ w’(tx+(1-t)x’) = tw’x + (1-t)w’x’
= tC(y,w) + (1-t)C(y’,w)
where the inequality comes since xt solves
the problem for yt.
• So C(.) is convex in y.
78
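The inequality just derived can be illustrated with a concrete technology. A minimal numeric sketch, assuming the one-input concave production function f(x) = x**0.5 with input price w = 1 (so cost minimisation gives C(y) = y² in closed form):

```python
# Check the convex-combination inequality for the cost function.
def cost(y):
    return y ** 2   # cheapest x with sqrt(x) >= y is x = y**2, at w = 1

y, y_prime, t = 1.0, 3.0, 0.25
y_t = t * y + (1 - t) * y_prime
lhs = cost(y_t)                              # C(yt, w)
rhs = t * cost(y) + (1 - t) * cost(y_prime)  # t*C(y,w) + (1-t)*C(y',w)
print(lhs <= rhs)  # True: the cost function is convex in y
```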
Implicit functions
(SB easier than lecture notes)
• So far we have been working only with functions in which
the endogenous variables are explicit functions of the
exogenous or independent variables.
Ie y = F(x1,x2,…xn)
• This is not always the case; frequently we have
economic situations with exogenous variables mixed in
with endogenous variables.
G(x1,x2,…xn,y)=0
• If for each x vector this equation determines a
corresponding value of y, then this equation defines an
implicit function of the exogenous variables x.
• Sometimes we can solve the equation to write y as an
explicit function of x, but sometimes this is not possible,
or it is easier to work with the implicit function.
79
Implicit functions 2
• 4x + 2y = 5 expresses y as an implicit function of
x. Here we can easily solve for the explicit
function.
• y² - 5xy + 4x² = 0 expresses y implicitly in terms of x.
Here we can also solve for the explicit
relationship using the quadratic formula [but it is
a correspondence, not a function]: y = 4x OR y = x.
• y⁵ - 5xy + 4x² = 0 cannot be solved into an explicit
function, but it still implicitly defines y in terms of x.
Eg: x=0 implies y=0; x=1 implies y=1.
80
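Even without a closed form, the implicit function can be evaluated numerically. A sketch (bracket chosen by inspection for this example) that recovers one branch of y(x) from y⁵ - 5xy + 4x² = 0 by bisection:

```python
# Numerically evaluate an implicit function that has no explicit solution.
def G(x, y):
    return y ** 5 - 5 * x * y + 4 * x ** 2

def implicit_y(x, lo, hi, tol=1e-10):
    # simple bisection: assumes G(x, lo) and G(x, hi) have opposite signs
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if G(x, lo) * G(x, mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

y_at_half = implicit_y(0.5, 0.0, 0.5)  # y(0.5) on the branch through (0, 0)
print(abs(G(0.5, y_at_half)) < 1e-8)   # True: it satisfies the equation
```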
Implicit functions 3
• Consider a profit-maximizing firm that uses a single input
x at a cost of w per unit to make a single output y using
technology y=f(x), and sells the output for p per unit.
Profit function: π(x)=pf(x)-wx
FOC: pf’(x)-w=0
• Think of p and w as exogenous variables. For each
choice of p and w, the firm will choose a value of x that
satisfies the FOC.
To study profit-maximising behaviour in general, we need
to work with this FOC defining x as an implicit function of
p and w.
• In particular, we will want to know how the choice of x
changes in response to changes in p and w.
81
Implicit functions 4
• An implicit function (or correspondence) of y in
terms of x does not always exist, even if we can
write an equation of the form G(x,y)=c
Eg: x2+y2=1. When x>1 there is no y that
satisfies this equation. So there is no implicit
function mapping x’s greater than 1 into y’s.
• We would like to have general conditions
telling us when an implicit function exists.
82
Implicit functions 5
• Consider the problem:
max_{x≥0} f(x;q)
s.t. G(x;q) ≥ 0
where q is some k-dimensional vector of exogenous real
numbers.
• Call a solution to this problem x(q), and the value the
solution attains V(q) = f(x(q);q).
• Note that x(q) may not be unique, but V(q) is still well-defined
(ie there may be multiple x’s that maximise the
function, but they all give the same value; otherwise
some wouldn’t solve the maximisation problem)
• Interesting question: how do V and x change with q?
• We have implicitly defined functions mapping q’s to V’s.
83
Implicit functions 6
• The problem above really describes a family of optimization
problems; each different value of the q vector yields a
different member of the family (ie a different optimization
problem).
• The FOCs from KT suggest that it will be useful to be able to
solve generally systems of equations of the form
T(z, q) = 0, z ∈ Rk
where z : Rp → Rk, T : Rk+p → Rk
(why? Because the FOCs constitute such a system.)
• Eg: finding the equation for a level set is to find z(q) such
that T(z(q),q) - c = 0. Here, z(q) is an implicit function.
• As noted previously, not all systems provide implicit functions.
Some give correspondences, or give situations where there is
no mapping x(q).
• The implicit function theorem tells us when it is possible to
find an implicit function from a system of equations.
84
Implicit function theorem
(for system of equations)
• Let T:Rk+pRk be C1. Suppose that
T(z*,q*)=0. If the kxk matrix formed by
stacking the k gradient vectors (wrt z) of
T1,T2,…Tk is invertible (or equivalently has
full rank or is non-singular), then there
exist k C1 functions each mapping Rp Rk
such that:
z1(q*)=z1*, z2(q*)=z2*, …. zk(q*)=zk* and
T(z(q),q) = 0 for all q in Bε(q*) for some
ε>0.
85
IFT example
• Consider the utility maximisation problem:
max_{x in Rn} U(x) s.t.
p’x ≤ I, U strictly quasiconcave, DU(x) > 0, dU(0)/dxi = ∞.
• We know a solution to this problem satisfies xi >
0 (because of dU(0)/dxi = ∞) and I - p’x = 0 (because
DU(x) > 0) and the FOCs:
dU/dx1 - λp1 = 0
…
dU/dxn - λpn = 0
I - p’x = 0
86
IFT example contd
• This system of equations maps from the
space R2n+2 (because x and p are nx1, λ
and I are scalars) to the space Rn+1 (the
number of equations).
• To apply the IFT, set z = (x, λ), q=(p,I)
Create a function T : R2n+2 → Rn+1 given by:
T(x, λ, p, I) = [ ∂U(x)/∂x1 − λp1, …, ∂U(x)/∂xn − λpn, I − p’x ]’
87
IFT example contd
• If this function T is C1 and if the (n+1)×(n+1) matrix
of derivatives of T (wrt x and λ) is invertible, then
by the IFT we know that there exist n+1 C1
functions:
x1(p,I), x2(p,I), …. xn(p,I), λ(p,I)
s.t. T(x1(p,I), x2(p,I), …. xn(p,I), λ(p,I)) = 0 for all
p,I in a neighborhood of a given price income
vector (p,I).
• Ie, the IFT gives us the existence of continuously
differentiable consumer demand functions.
88
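A concrete instance helps: with an assumed Cobb-Douglas example U(x) = a·ln(x1) + b·ln(x2) (not in the slides, but satisfying the stated assumptions), the well-known closed-form demands make every component of the FOC system T vanish:

```python
# Verify that closed-form Cobb-Douglas demands solve the FOC system T = 0.
a, b = 0.4, 0.6                # assumed utility parameters
p1, p2, I = 2.0, 5.0, 100.0    # assumed prices and income

x1 = a / (a + b) * I / p1      # demand for good 1
x2 = b / (a + b) * I / p2      # demand for good 2
lam = (a + b) / I              # marginal utility of income

T = (a / x1 - lam * p1,        # dU/dx1 - lam*p1
     b / x2 - lam * p2,        # dU/dx2 - lam*p2
     I - p1 * x1 - p2 * x2)    # I - p'x
print(all(abs(v) < 1e-9 for v in T))  # True
```

The IFT then says these demands vary continuously and differentiably in (p, I) near any such point.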
Theorem of the maximum
• Consider the family of Lagrangian
problems:
V(q) = max_x f(x;q) s.t. G(x;q) = 0
This can be generalized to KT by
restricting attention only to constraints that
are binding at a given solution.
• Define the function T : Rn+m+p → Rn+m by
T(x, λ; q) = [ ∇f(x;q) + λ’∇G(x;q) ; G(x;q) ]’
89
Theorem of the Maximum 2
• The FOCs for this problem at an optimum are
represented by T(x*,λ*;q)=0. We want to know
about defining the solutions to the problem, x*
and λ*, as functions of q.
• The IFT already tells when we can do this: if the
(n+m)x(n+m) matrix constructed by taking the
derivative of T wrt x and λ is invertible, then we
can find C1 functions x*(q) and λ*(q) s.t. T(x*(q’),
λ*(q’);q’)=0 for q’ in a neighborhood of q.
• Ie we need the matrix below to have full rank:
DT(x, λ; q) = [ ∇²x f(x;q) + λ’∇²x G(x;q)   ∇x G(x;q)’ ]
              [ ∇x G(x;q)                    0          ]
90
Theorem of the Maximum 3
• Suppose the Lagrange problem above
satisfies the conditions of the implicit
function theorem at x*(q*),q*. If f is C1 at
x*(q*),q*, then V(q) is C1 at q*.
• Thus, small changes in q around q* will
have small changes in V(q) around V(q*).
91
Envelope Theorem
• Applying the IFT to our FOCs means we know (under
conditions) that x(q) that solves our FOCs exists and is
C1, and that V(.) is C1.
• The envelope theorem tells us how V(q) changes in
response to changes in q.
• The basic answer from the ET is that all we need to do is
look at the direct partial derivative of the objective
function (or of the Lagrangian for constrained problems)
with respect to q.
• We do not need to reoptimise and pick out different x(q)
and λ(q), because the fact that we were at an optimum
means these partial derivs are already zero.
92
Envelope theorem 2
• Consider the problem:
maxxf(x;q) s.t. G(x;q)≥0, G:Rn→Rm.
where q is a p-dimensional vector of exogenous
variables.
Assume that, at a solution, the FOCs hold with equality
and that we can ignore the CS conditions.
(Or assume that we only include constraints that bind at
the solution in G() )
• Suppose that the problem is well behaved, so we have
that at a particular value q*, the solution x(q*), λ(q*) are
C1 and V(q*)=f(x(q*);q*) is C1.
(Note that we could get these from the IFT and the
Theorem of the Maximum)
93
Envelope theorem 3
• Suppose the problem above satisfies the
conditions of the IFT at x*(q*). If f is C1 at
x*(q*),q* then:
∇qV(q*) = ∂L(x(q*), λ(q*); q*)/∂q
= ∂f(x(q*); q*)/∂q + λ(q*)’·∂G(x(q*); q*)/∂q
ie the derivative of the value function V(q) is
equal to the partial derivative of the Lagrangean with respect to q
94
Envelope theorem 4
• So, to determine how the value function changes, we
merely need to look at how the objective function and
constraint functions change with q directly.
• We do not need to include the impact of changes in the
optimization variables x and λ, because we have already
optimized L(x,λ,q) with respect to these.
• So, for an unconstrained optimization problem, the effect
on V(.) is just the derivative of the objective function.
• For a constrained optimization problem, we also need to
add in the effect on the constraint. Changing q could
affect the constraint (relaxing or tightening it), which we
know has shadow value λ.
• Proof is in lecture notes.
95
Envelope theorem example
• Consider a problem of the form:
max_x f(x) s.t. q - g(x) ≥ 0
• Thus, as q gets bigger, the constraint is easier to satisfy.
What would we gain from a small increase in q, and thus
a slight relaxation of the constraint?
• The Lagrangian is L(x,λ;q) = f(x)+ λ(q-g(x))
• The partial deriv of the Lagrangian wrt q is λ. Thus,
dV(q)/dq = λ.
• A small increase in q increases the value by λ.
Thus, the lagrange multiplier is the shadow price. It
describes the price of relaxing the constraint.
• If the constraint does not bind, λ=0 and dV(q)/dq = 0.
96
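The shadow-price claim is easy to check numerically. A toy example (assumed here, not from the notes): max -(x-2)² s.t. q - x ≥ 0, evaluated at q = 1 where the constraint binds, so x*(q) = q and λ = f'(x*) = 2(2-q):

```python
# Check dV/dq = lambda for a binding constraint.
def V(q):
    x = min(q, 2.0)            # the unconstrained optimum is x = 2
    return -(x - 2.0) ** 2

q = 1.0
lam = 2.0 * (2.0 - q)          # from the FOC f'(x) - lam = 0
h = 1e-6
dV_dq = (V(q + h) - V(q - h)) / (2 * h)   # numerical derivative of V
print(abs(dV_dq - lam) < 1e-6)  # True: the multiplier is the shadow price
```

For q > 2 the constraint stops binding, λ = 0, and V is flat, matching the last bullet.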
Envelope theorem example 2
• We can use the envelope theorem to show
that in the consumer max problem, λ is the
marginal utility of income.
• Consider the cost min problem:
C(y,w) = -max_x -w’x
s.t. f(x) - y ≥ 0.
• Lagrangian is: L(x,λ;y,w) = -w’x + λ(f(x)-y)
Denote the optimal solution to be x(y,w).
• From the ET, we get:
∂C(y,w)/∂wi = −∂L(x, λ; y, w)/∂wi = xi(y,w)
97
ET example 2, contd
• This is known as Shephard’s lemma; the partial
derivative of the cost function with respect to wi
is just xi, the demand for factor i.
• Also note that:
∂²C(y,w)/∂wj∂wi = ∂xi(y,w)/∂wj = ∂xj(y,w)/∂wi = ∂²C(y,w)/∂wi∂wj
ie the change in demand for factor i with respect
to a small change in price of factor j is equal to
the change in demand for factor j in response to
a small change in the price of factor i.
98
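Shephard's lemma can be verified on a concrete technology. A quick sketch assuming the one-input production function f(x) = x**0.5, for which cost minimisation gives factor demand x(y,w) = y² and cost C(y,w) = w·y²:

```python
# Check Shephard's lemma: dC/dw equals the factor demand x(y, w).
y, w = 3.0, 2.0                # assumed output level and input price

def cost(y, w):
    return w * y ** 2          # C(y, w), with x(y, w) = y**2

h = 1e-6
dC_dw = (cost(y, w + h) - cost(y, w - h)) / (2 * h)
x_demand = y ** 2
print(abs(dC_dw - x_demand) < 1e-6)  # True
```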
Correspondences
• A correspondence is a transformation that
maps a vector space into collections of subsets
of another vector space.
• Eg: a correspondence F : Rn → R takes any n-dimensional
vector and gives as its output a
subset of R. If this subset has only one
element for every input vector, then the
correspondence is also a function.
• Examples of correspondences: solution to the
cost minimization problem, or the utility
maximization problem.
99
Correspondences 2
• A correspondence F is bounded if for all x and for all y in
F(x), the size of y is bounded. That is, ||y||≤M for some
finite M. For bounded correspondences we have the
following definitions.
• A correspondence F is convex-valued if for all x, F(x) is
a convex set. (All functions are convex-valued
correspondences).
• A correspondence F is upper hemi-continuous at a
point x if: for all sequences {xn} that converge to x, and
all sequences {yn} such that yn in F(xn) converge to y,
then y is in F(x).
• For bounded correspondences, if a correspondence is
uhc for all x, then its graph is a closed set.
100
Correspondences 3
• A correspondence F is lower hemicontinuous at a point x, if for all
sequences {xn} that converge to x and for
all y in F(x), there exists a sequence {yn}
s.t. yn is in F(xn) and the sequence
converges to y.
• See figure in notes.
101
Fixed point theorems
• A fixed point of a function f : Rn → Rn is a point x
such that x = f(x). A fixed point of a
correspondence F : Rn → Rn is a point x such
that x is an element of F(x).
• Solving a set of equations can be described as
finding a fixed point. (Suppose you are finding x
to solve f(x) = 0. Then you are looking for a
fixed point in the function g(x), where g(x) = x +
f(x), since for a fixed point x* in g, x* = g(x*) = x*
+ f(x*), so f(x*) =0.)
• Fixed points are crucial in proofs of existence of
equilibria in GE and in games.
102
Fixed point theorems
• If f:RR, then a fixed point of f is any
point where the graph of f crosses the 45
degree line (ie the line f(x)=x).
• A function can have many fixed points, a
unique fixed point, or none at all.
• When can we be sure that a function
possesses a fixed point? We use fixed
point theorems.
103
Brouwer fixed point theorem
• Suppose f:RnRn and for some convex,
compact set C (that is a subset of Rn) f
maps C into itself. (ie if x is in C, then f(x)
is in C). If f is continuous, then f
possesses a fixed point.
• Continuity
• Convexity of C
• Compactness of C
• C maps into itself.
104
Kakutani fixed point theorem
• Suppose F:RnRn is a convex-valued
correspondence, and for some convex
compact set C in Rn, F maps C into itself.
(ie if x is in C, then F(x) is a subset of C).
If F is upper hemicontinuous, then F
possesses a fixed point.
• These FPTs give existence. To get
uniqueness we need something else.
105
Contraction mappings
• Suppose f:RnRn such that
||f(x)=f(y)|| ≤ θ ||x-y|| for some θ < 1 and for all
x,y. Then f ix a contraction mapping.
• Let C[a,b] be the set of all continuous functions
f:[0,1]R with the “supnorm” metric ||f)|| = maxx
in [a,b] f(x). Suppose T:CC (that is, T takes a
continuous function, does something to it and
returns a new, possibly different continuous
function). If, for all f,g in C, ||Tf-Tg|| ≤ θ ||f-g|| for
some θ < 1, then T is a contraction mapping.
106
Contraction mapping theorem
• If f or T (as defined above) is a contraction
mapping, it possesses a unique fixed
point, x*.
107
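The theorem also suggests an algorithm: iterating a contraction from any starting point converges to the unique fixed point. A standard illustration, using f(x) = cos(x), which is a contraction on [0,1] since |f'(x)| ≤ sin(1) < 1 there:

```python
# Find the unique fixed point of a contraction by simple iteration.
import math

def iterate_to_fixed_point(f, x0, tol=1e-12, max_iter=10_000):
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

x_star = iterate_to_fixed_point(math.cos, 0.5)
print(abs(x_star - math.cos(x_star)) < 1e-10)  # True: x* = f(x*)
```

The same iteration idea underlies value-function iteration in dynamic programming, where T is the Bellman operator on a function space.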
Dynamic optimisation
• Up to now, we have looked at static optimisation
problems, where agents select variables to maximise a
single objective function.
• Many economic models, particularly in macroeconomics
(eg saving and investment behaviour), use dynamic
models, where agents make choices each period that
affect their potential choices in future periods, and often
have a “total” objective function that maximises the
(discounted) sum of payoffs in each period.
• Much of the material in the notes is focused on
differential and difference equations (lectures 1-4), but
we will attempt to spend more time on lectures 5-6,
which are the focus of most dynamic models.
108
Ordinary differential equations
• Differential equations are used to model situations which
treat time as a continuous variable (as opposed to in
discrete periods, where we use difference equations).
• An ordinary differential equation is an expression
which describes a relationship between a function of one
variable and its derivatives.
• Formally:
xm(t) = F[t, x(t), x1(t), x2(t), …, xm-1(t); θ]
where
x1(t) = dx(t)/dt, x2(t) = d²x(t)/dt², …, xm(t) = dmx(t)/dtm
θ ∈ Rp is a vector of parameters
F is a function Rm+1+p → R
109
Ordinary differential equations 2
• The solution is a function x(t) that, together with its
derivatives, satisfies this equation.
• This is an ordinary differential equation because x is a
function of one argument, t, only. If it was a function of
more than one variable, we would have a partial
differential equation, which we will not study here.
• A differential equation is linear if F is linear in x(t) and its
derivatives.
• A differential equation is autonomous if t does not
appear as an independent argument of F, but enters
through x only.
• The order of a differential equation is the order of the
highest derivative of x that appears in it (ie order m
above).
110
First order differential equation
• Any differential equation can be reduced to a
first-order differential equation system by
introducing additional variables.
• Consider x3(t) = ax2(t) + bx1(t) + cx(t)
Define y(t) = x1(t), z(t) = x2(t)
• Then: y1(t) = x2(t) = z(t), z1(t) = x3(t).
• So we have the system:
[ z1(t) ]   [ az(t) + by(t) + cx(t) ]
[ y1(t) ] = [ z(t) ]
[ x1(t) ]   [ y(t) ]
111
Particular and general solutions
• A particular solution to a differential equation is a
differentiable function x(t) that satisfies the equation for
some subinterval I0 of the domain of definition of t, I.
• The set of all solutions is called the general solution,
xg(t).
• To see that the solution to a differential equation is
generally not unique, consider:
x1(t) = 2x(t).
One solution is x(t) = e2t. But for any constant c, x(t) =
ce2t is also a solution.
• The non-uniqueness problem can be overcome by
augmenting the differential equation with a boundary
condition: x(t0) = x0.
112
Boundary value problems
• A boundary value problem is defined by a
differential equation
x1(t) = f[t,x(t)]
and a boundary condition
x(t0) = x0, where (x0,t0) is an element of X × I
• Under some conditions, every boundary value
problem has a unique solution.
• Fundamental Existence Uniqueness theorem:
Let F be C1 in some neighborhood of (x0,t0).
Then in some subinterval I0 of I containing t0
there is a unique solution to the boundary value
problem.
113
Boundary values problems 2
• If F is not C1 in some neighborhood of (x0,t0), the
solution may not be unique. Consider:
x1(t) = 3x(t)2/3
x,t in R
x(0)=0
Both x(t) = t3 and x(t) = 0 are solutions.
Note f(x) = 3x(t)2/3 is not differentiable at x=0.
• The solution may not exist globally.
Consider:
x1(t) = x(t)2 x,t in R
x(0) = 1
x(t) = 1/(1-t) is a solution, but it is only defined for t
in (-∞, 1)
114
Steady states and stability
• When using continuous time dynamic models, we are
often interested in the long-run properties of the
differential equation.
• In particular, we are interested in the properties of its
steady state (our equilibrium concept for dynamic
systems, where the system remains unchanged from
period to period), and whether or not the solution
eventually converges to the steady state (ie is the
equilibrium stable, will we return there after shocks).
• We can analyze the steady state without having to find
an explicit solution for the differential equation.
115
Steady states and stability 2
• Consider the autonomous differential equation:
x1(t) = f[x(t)]
f :X RR
• A steady state is a point x  X such that
f (x)  0
• Phase diagrams to illustrate this.
• Steady states may not exist, may not be unique,
may not be isolated.
• Stability: consider an equation that is initially at rest
at an equilibrium point x , and suppose that some
shock causes a deviation from x .
We want to know if the equation will return to the
steady state (or at least remain close to it), or if it will
get farther and farther away over time.
116
Steady states and stability 3
• Let x be an isolated (ie locally unique) steady
state of the autonomous differential equation
x1(t)=f[x(t)],
f :X RR
• We say that x̄ is stable if for any ε > 0, there
exists δ in (0, ε] such that:
||x(t0) − x̄|| < δ ⇒ ||x(t) − x̄|| < ε for all t ≥ t0
ie any solution x(t) that at some point enters a
ball of radius δ around x̄ remains within a ball of
(possibly larger) radius ε forever after.
117
Steady states and stability 4
• A steady state is asymptotically stable if
it is stable AND δ can be chosen in such a
way that any solution that satisfies
||x(t0) − x̄|| < δ
for some t0 will also satisfy lim_{t→∞} x(t) = x̄
• That is, any solution that gets sufficiently
close to x̄ not only remains nearby but
converges to x̄ as t → ∞.
118
Phase diagrams: arrows of motion
• The sign of x1(t) tells us about the direction that
x(t) is moving (see diagram).
• x1(t) > 0 implies that x(t) is increasing (arrows of
motion point right).
• x1(t) < 0 implies that x(t) is decreasing (arrows of
motion point left).
• Thus: x1 and x3 in diagram are locally
asymptotically stable; x2 is unstable.
• x1 in the second diagram (see notes) is globally
asymptotically stable.
119
Phase diagrams arrows of motion 2
• We can conclude that if, for all x in some neighborhood of
a steady state x̄:
• x(t) < x̄ implies x1(t) > 0 AND x(t) > x̄ implies x1(t) < 0,
then x̄ is asymptotically stable.
• x(t) < x̄ implies x1(t) < 0 and x(t) > x̄ implies x1(t) > 0,
then x̄ is unstable.
• Therefore, we can determine the stability property of a
steady state by checking the sign of the derivative of
f[x(t)] at x .
• x̄i is (locally) asymptotically stable if f’[x(t)]|x(t)=x̄i < 0
• x̄i is unstable if f’[x(t)]|x(t)=x̄i > 0
• If f’[x(t)]|x(t)=x̄i = 0, then we don’t know.
120
Grobman-Hartman theorem
• Let x̄ be a steady state of our standard autonomous
differential equation
x1(t) = f[x(t)], f : X ⊆ R → R, f ∈ C1
• We say that x̄ is a hyperbolic equilibrium if f’[x(t)]|x(t)=x̄ ≠ 0
• The previous analysis suggests we can study the
stability properties of a nonlinear differential equation by
linearizing it, as long as the equilibrium is hyperbolic.
• Theorem: If x̄ is a hyperbolic equilibrium of the
autonomous differential equation above, then there is a
neighborhood U of x̄ such that the equation is
topologically equivalent to the linear equation:
x1(t) = f’[x(t)]|x(t)=x̄ · [x(t) − x̄]
in U.
(Note that this is a first-order Taylor series approximation
of f around x̄.) (See notes.)
121
Grobman-Hartman theorem 2
• The theorem says that near a hyperbolic
equilibrium x , the non-linear differential equation
has the same qualitative structure as the
linearized differential equation.
• In particular, if x is (locally) asymptotically stable
for the linearized equation, it is locally
asymptotically stable for the non-linear equation,
and if it is unstable for the linearized equation,
then it is unstable for the non-linear equation.
122
Application: Solow Growth model
• Classic growth theory model. Also, start getting
used to notation. Capital letters used for aggregate
variables; lower case letters used for per worker
equivalents.
• Output Y is produced using capital K and labor L
according to a production function:
Y(t) = F[K(t),L(t)]
• F is C1, has constant returns to scale and positive
and diminishing marginal products wrt each input.
• By constant returns to scale, we have:
Y(t) = F[K(t),L(t)] = L(t)F[K(t)/L(t),1] = L(t)f[k(t)]
where k(t) = K(t)/L(t)
123
Solow model 2
• Then y(t) = Y(t)/L(t) = f[k(t)]
ie we can write output per unit of labor as a function of
capital per unit of labor.
• We also assume the Inada conditions hold:
f(0) = 0
lim_{k(t)→∞} f[k(t)] = ∞
f’(0) = ∞
lim_{k(t)→∞} f’[k(t)] = 0
• The economy is closed, so savings equal investment,
and a constant fraction of output s is saved. Assume no
capital depreciation. Thus,
Kt(t)=I(t)=sF[K(t),L(t)], where s is in [0,1]
and I denotes investment
• Assume that labor grows at a constant exogenous rate n
L(t) = L(0)ent = L0ent.
124
Solow model 3
• Then: K(t) = [K(t)/L(t)]·L(t) = k(t)L0e^(nt)
• And thus: K1(t) = k1(t)L0e^(nt) + k(t)nL0e^(nt) (by the
product rule)
• Combining the equations for capital evolution
(K1(t)) we get:
k1(t) = sf[k(t)] - nk(t)
(see graph).
• To find the steady state, set k1(t) = 0. Thus,
sf(k̄) = nk̄, so k̄/f(k̄) = s/n
The capital to output ratio K/Y = k̄/f(k̄) is constant,
and the capital stock and output grow at the rate of
growth of the population, n.
125
Solow model 4
• Stability properties:
k1(t) > 0 for k(t) < k̄ (since sf[k(t)] > nk(t))
k1(t) < 0 for k(t) > k̄ (since sf[k(t)] < nk(t))
(How do we know this? See graph. We
have from diminishing marginal product
that f’ is positive but decreasing, and that f’
is infinite at 0, so nk cuts sf from below.)
126
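The steady state and its stability can be computed directly once a functional form is assumed. A sketch using a Cobb-Douglas technology f(k) = k**alpha (consistent with the stated assumptions, though the slides keep f general), with illustrative parameter values:

```python
# Solow model: steady state of k'(t) = s*f(k) - n*k and its stability.
s, n, alpha = 0.2, 0.02, 0.3           # assumed illustrative parameters

k_bar = (s / n) ** (1 / (1 - alpha))   # from s * k**alpha = n * k

def k_dot(k):
    return s * k ** alpha - n * k      # k'(t)

h = 1e-6
slope = (k_dot(k_bar + h) - k_dot(k_bar - h)) / (2 * h)  # d(k')/dk at k_bar
print(abs(k_dot(k_bar)) < 1e-9)  # True: k_bar is a steady state
print(slope < 0)                 # True: so k_bar is asymptotically stable
```

The negative slope is exactly the f'-based stability condition from the phase-diagram slides: here it equals n(alpha - 1) < 0.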
Solving autonomous differential
equations
• Consider the following first-order autonomous
linear differential equation:
x1(t)=ax(t)+b
• All solutions can be written as:
xg(t) = xc(t) + xp(t)
general soln = complementary function + particular soln
• The complementary function xc(t) is the general
solution to the homogenous equation associated
with the equation above:
x1(t) = ax(t)
• xp(t) is any particular solution to the full
equation.
127
Solving autonomous differential
eqns 2
• We use the method of separation of variables to find the
complementary solution. Rewrite x1(t) = ax(t) as:
dx(t)/dt = ax(t)
• Then: dx(t) = ax(t)dt, and: dx(t)/x(t) = adt
• Integrating both sides, we get:
∫(1/x(t))dx(t) = ∫a dt
ln x(t) = at + c1
e^(ln x(t)) = e^(at)·e^(c1)
xc(t) = ce^(at), c = e^(c1)
(Note that the c in xc(t) is for the complementary solution,
not the constant c)
• So we have the complementary solution for the general
form of a first order differential equation, xc(t) = ce^(at)
128
Solving autonomous differential
eqns 3
• Now, let's verify that the general solution is the
complementary function plus any particular
solution; ie show that if xp(t) is a solution, then
ce^(at) + xp(t) is also a soln.
d/dt[ce^(at) + xp(t)] = ace^(at) + dxp(t)/dt
= ace^(at) + axp(t) + b
= a[ce^(at) + xp(t)] + b
• Thus, if we add ce^(at) to any solution, we get
another solution. We can get all solutions in that
way.
129
Solving autonomous differential
eqns 3
• Let xp(t) and xg(t) be two arbitrary solns. Then:
dxg(t)/dt = axg(t) + b
dxp(t)/dt = axp(t) + b
• Subtracting the second from the first, and defining
y(t) = xg(t) − xp(t), we get:
dy(t)/dt = ay(t)
with general solution:
y(t) = ce^(at)
• Thus, the difference between any two arbitrary
solutions has the form ce^(at), and thus we can get
all solutions by adding ce^(at) to xp(t).
130
Solving autonomous differential
eqns 3
• To solve the differential equation, we still need to
find xp(t). The simplest alternative is usually a
steady state solution:
x1(t) = 0 ⇒ ax(t) + b = 0
xp(t) = −b/a if a ≠ 0
• Thus,
xg(t) = ce^(at) − b/a if a ≠ 0, and
xg(t) = bt + c if a = 0.
(show this yourself)
131
Solving autonomous differential
eqns 4
• We can pin down the arbitrary constant by
means of a boundary condition. Suppose we
have x(0) = x0.
• Then, for a ≠ 0, x(0) = c − b/a, so
c = x0 + b/a, and the general soln is:
xg(t) = (x0 + b/a)e^(at) − b/a if a ≠ 0, and
xg(t) = bt + x0 if a = 0
132
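The closed form can be cross-checked against a direct numerical integration. A sketch with illustrative values of a, b and x0 assumed here:

```python
# Compare x(t) = (x0 + b/a)*e**(a*t) - b/a with forward-Euler integration
# of x'(t) = a*x(t) + b.
import math

a, b, x0, T = -0.5, 1.0, 3.0, 2.0      # assumed illustrative values

def closed_form(t):
    return (x0 + b / a) * math.exp(a * t) - b / a

steps = 200_000
dt = T / steps
x = x0
for _ in range(steps):
    x += (a * x + b) * dt   # Euler step: x(t+dt) ~ x(t) + x'(t)*dt

print(abs(x - closed_form(T)) < 1e-3)  # True: the two solutions agree
```

With a < 0 here, both solutions decay toward the steady state -b/a = 2, illustrating the stability discussion on the next slide.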
Solving autonomous differential
eqns 5
• What are the stability properties of this general
solution?
• If a > 0, then e^(at) → ∞ as t → ∞, and x(t) explodes
unless x0 = −b/a (in which case it stays constant
forever)
• If a < 0, then e^(at) → 0 as t → ∞, and x(t) is
asymptotically stable.
• Recall this solution is only for autonomous
equations, where t appears only through x.
What about non-autonomous differential
equations?
133
Solving non-autonomous
differential eqns
• Consider the first order non-autonomous
differential equation:
x1(t)=a(t)x(t)+b(t)
where the coefficients a and b are (potentially)
functions of time.
• Rewrite the equation as:
x1(t)-a(t)x(t)=b(t)
• Consider the function e^(−α(t)), with α(t) = ∫[0,t] a(s)ds
Multiply both sides by e^(−α(t)):
x1(t)e^(−α(t)) − a(t)x(t)e^(−α(t)) = b(t)e^(−α(t))
134
Solving non-autonomous
differential eqns 2
• Note that the LHS is the derivative of x(t)e^(−α(t)). So, we
can rewrite this as:
d/dt(x(t)e^(−α(t))) = b(t)e^(−α(t))
• The expression e^(−α(t)) that made the left hand side above
the exact derivative of a function is called an integrating
factor.
• There are two forms of the general solution to the
non-autonomous differential equation, obtained using different
methods: the backward solution (convenient if there is
a natural initial condition) and the forward solution
(convenient if there is a natural terminal condition).
135
The backward solution
• The backward solution is obtained by integrating
backward between 0 and s (where s is our current
period):
∫[0,s] d/dt[x(t)e^(−α(t))]dt = ∫[0,s] b(t)e^(−α(t))dt
x(t)e^(−α(t)) evaluated from 0 to s = ∫[0,s] b(t)e^(−α(t))dt
x(s)e^(−α(s)) − x(0) = ∫[0,s] b(t)e^(−α(t))dt
x(s) = x(0)e^(α(s)) + ∫[0,s] b(t)e^(α(s)−α(t))dt
This gives us the value of x(s) as a function of its initial
value x(0), and a weighted sum of past values of the
forcing term b(t).
Eg: with a natural initial condition: growth models, with
some initial level of capital.
136
The forward solution
• If we don’t have a logical initial condition, but still have a
natural terminal condition (normally an asymptotic
terminal condition) then we should use the forward
solution, obtained by integrating forward between s and
.  d

 ( t )
 ( t )

s

x(t )e
dt
x(t )e
dt  
s
b(t )e
dt

|   b(t )e  (t ) dt
 ( t ) 
s
lim x(t )e
s
 ( t )
 x ( s )e
 ( s )
t 
 (s)
x( s )  e
lim x(t )e
t 
 ( t )

  b(t )e  (t ) dt
s

  b(t )e ( s )  (t ) dt
s
provided that the limit exists (ie x(t) converges).
• Example: dynamic consumption optimisation (typical
terminal condition: net assets go to zero (or stay non-negative) as time goes to infinity)
137
Example: Intertemporal budget
constraint
• Suppose that an infinitely lived household has the
following budget constraint:
a'(t) = ra(t) + y(t) - c(t)
ie at time t, change in assets is interest income (at rate r)
from assets plus labor income minus consumption
• Let's find the forward solution (since we want a budget
constraint out to time infinity). Multiply both sides of the
budget constraint by e^{-rt} and rearrange:
[a'(t) - ra(t)]e^{-rt} = [y(t) - c(t)]e^{-rt}
• Note that the LHS is the derivative of a(t)e^{-rt}.
So, we can rewrite our equation for the general solution
of a non-autonomous differential equation as:
(d/dt)[a(t)e^{-rt}] = [y(t) - c(t)]e^{-rt}
138
Intertemportal budget constraint 2
• Suppose that a(0) is given; let's integrate forward
starting from t=0:
a(t)e^{-rt} |_0^∞ = ∫_0^∞ [y(t) - c(t)]e^{-rt} dt
lim_{t→∞} a(t)e^{-rt} - a(0) = ∫_0^∞ [y(t) - c(t)]e^{-rt} dt
• Imposing lim_{t→∞} a(t)e^{-rt} = 0 and rearranging:
∫_0^∞ c(t)e^{-rt} dt = a(0) + ∫_0^∞ y(t)e^{-rt} dt
ie lifetime consumption = initial assets + lifetime
income.
• Note: If the interest rate changed over time, we
would need to replace r by r(t), and the integrating
factor becomes e^{-∫_0^t r(s)ds}, so:
∫_0^∞ c(t)e^{-∫_0^t r(s)ds} dt = a(0) + ∫_0^∞ y(t)e^{-∫_0^t r(s)ds} dt
139
Linear systems
• Now we study systems of linear differential
equations. Recall, wlog, we can restrict
attention to first order-systems (since any higher
order system can be converted into a first order
system by defining new variables).
• General form: x'(t) = A x(t) + b
where x'(t), x(t) and b are n×1 and A is n×n.
• Start with uncoupled systems (the matrix A is
diagonal), ie:
A = diag(a_11, a_22, …, a_nn)
with a_11, …, a_nn on the main diagonal and zeros elsewhere.
140
Uncoupled systems
• Then, we can solve each of the n equations separately,
by the methods for solving individual equations.
• Eg: the first equation is x_1'(t) = a_11 x_1(t) + b_1
and its solution is x_1^g(t) = c e^{a_11 t} - b_1/a_11
• With an initial condition x_1(0) = x_10, the solution
becomes:
x_1(t) = (x_10 + b_1/a_11) e^{a_11 t} - b_1/a_11
• Thus, if we are given an initial condition for each
equation in the system, x_i(0) = x_i0, i = 1,2,…,n,
then the solution to the boundary value problem is:
x_i(t) = (x_i0 + b_i/a_ii) e^{a_ii t} - b_i/a_ii   (assuming a_ii ≠ 0)
141
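As a sanity check (illustrative numbers, not from the notes), the boxed solution should reproduce the initial condition and satisfy x_1'(t) = a_11 x_1(t) + b_1; the sketch below verifies the latter with a central finite difference.

```python
import math

a11, b1, x10 = -0.5, 2.0, 3.0   # hypothetical coefficients and initial condition

def x1(t):
    # x1(t) = (x10 + b1/a11) e^{a11 t} - b1/a11
    return (x10 + b1 / a11) * math.exp(a11 * t) - b1 / a11

def deriv(f, t, h=1e-6):
    # central finite-difference approximation of f'(t)
    return (f(t + h) - f(t - h)) / (2 * h)
```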
Diagonalizable linear systems
• If the matrix is not diagonal, we can use the algebraic
technique of diagonalizing a matrix to reduce any
diagonalizable linear system to an equivalent diagonal
system, which we can then solve as above (by solving
each individual equation individually).
• Theorem: if the matrix A has n linearly independent
eigenvectors v1, v2,…vn then the matrix V = (v1,v2…vn) is
invertible and V-1AV is a diagonal matrix with the
eigenvalues of A along its main diagonal
• If all eigenvalues of the matrix are distinct, then the n
eigenvectors are linearly independent, so A is
diagonalizable.
142
Diagonalizable linear systems 2
• If A is diagonalizable, we first derive xc(t) by
solving the homogeneous part of the system of
differential equations.
x'(t) = Ax(t)
V^{-1}x'(t) = V^{-1}Ax(t) = V^{-1}AV V^{-1}x(t)   (since VV^{-1} = I)
y'(t) = Λy(t)
where y(t) = V^{-1}x(t).
• y'(t) = Λy(t) is a diagonal system, which we
already know how to solve, so:
y_i(t) = c_i e^{λ_i t}
143
Diagonalizable linear systems 3
• To obtain the solution to the original system, we need to
convert back from y(t) to x(t):
x(t) = Vy(t) = (v_1, v_2, …, v_n)(c_1 e^{λ_1 t}, c_2 e^{λ_2 t}, …, c_n e^{λ_n t})' = Σ_{i=1}^n c_i v_i e^{λ_i t}
• To obtain x^p(t) we use the steady state:
x'(t) = 0 ↔ Ax(t) + b = 0
x^p(t) = -A^{-1}b   (if A nonsingular)
• Therefore:
x^g(t) = Σ_{i=1}^n c_i v_i e^{λ_i t} - A^{-1}b
where the constants c1,c2,…,cn can be found with a set
of n boundary conditions.
144
Eigenvectors and eigenvalues
• Consider an n×n matrix A.
• Its eigenvalues λ1,λ2,…,λn and eigenvectors
vi=(vi1,vi2,…vin) satisfy the equation:
(A - λ_i I_n) v_i = 0_{n×1}
for i=1,2,…,n
(Note this gives us ratios of values within an
eigenvector, we can pick one value to set =1 to
define a particular v)
• The eigenvalues are the roots of the following
equation: det(A-λiIn)=0
145
Example
• Consider the linear system x1(t)=Ax(t)
where
A = [1 1; 4 1]
• Its eigenvalues and eigenvectors satisfy:
(A - λI_2)v = 0_{2×1}
• Its eigenvalues are the roots of:
det(A - λI_2) = 0
• The matrix (A - λI_2) is:
A - λI_2 = [1-λ 1; 4 1-λ]
146
Example, 2
• So setting the determinant of this = 0 gives
0 = (1-λ)² - 4 = λ² - 2λ - 3 = (λ-3)(λ+1)
• So we have λ_1 = 3, λ_2 = -1.
• To find the eigenvector for λ_1 = 3 we solve:
[-2 1; 4 -2][v_11; v_12] = [0; 0]
• This gives v_12 = 2v_11. Setting v_11 = 1 gives:
v_1 = (1 2)'
• Similarly for λ_2 = -1, we get v_2 = (1 -2)'
147
Example, 3
• So we get the V matrix from concatenating eigenvectors:
V = [1 1; 2 -2]
• This gives the diagonal matrix:
V^{-1}AV = [3 0; 0 -1]
ie it has the eigenvalues along its diagonal.
• Under the transformation y(t) = V^{-1}x(t) we get:
y'(t) = Λy(t), ie [y_1'(t); y_2'(t)] = [3 0; 0 -1][y_1(t); y_2(t)]
which has the general solution:
[y_1(t); y_2(t)] = [c_1 e^{3t}; c_2 e^{-t}]
148
Example, 4
• And we get the solution to the original
system by converting back to x(t) from
y(t):
[x_1(t); x_2(t)] = Vy(t) = [1 1; 2 -2][c_1 e^{3t}; c_2 e^{-t}] = [c_1 e^{3t} + c_2 e^{-t}; 2c_1 e^{3t} - 2c_2 e^{-t}]
149
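The 2×2 computations in this example are easy to automate. The sketch below hand-codes the quadratic-formula eigenvalues and the "set one eigenvector entry to 1" trick from slide 145; it assumes real distinct eigenvalues and a nonzero upper-right entry.

```python
import math

def eig2x2(A):
    # eigenvalues via the quadratic formula; eigenvectors via (a - lam) v1 + b v2 = 0
    # assumes real distinct eigenvalues and A[0][1] != 0
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = math.sqrt(tr * tr - 4 * det)
    lams = [(tr + disc) / 2, (tr - disc) / 2]
    vecs = [(1.0, (lam - a) / b) for lam in lams]   # normalize the first entry to 1
    return lams, vecs

A = [[1.0, 1.0], [4.0, 1.0]]
lams, vecs = eig2x2(A)
```

Running this on the slide's matrix reproduces λ = 3, -1 with eigenvectors proportional to (1, 2)' and (1, -2)'.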
Stability
• Recall the general solution:
x^g(t) = Σ_{i=1}^n c_i v_i e^{λ_i t} - A^{-1}b
• Case a) All eigenvalues real
• If all eigenvalues are negative, e^{λ_i t} → 0 as t → ∞,
i = 1,2,…,n and the system is asymptotically
stable: it converges to the steady state whatever
the initial position.
• If some eigenvalues are positive, the
corresponding terms e^{λ_i t} → ∞ as t → ∞, and x_j(t) is
unstable unless c_i v_ij = 0.
• Case b) some eigenvalues complex: see notes.
Punchline: if eigenvalue is complex, stability will
be determined by real part of eigenvalue.
150
Stability 2
• Skipped ahead in notes to page 8, at eqn (11)
• If n=2, we can use phase diagrams to illustrate the
stability properties of the system.
• Suppose as before that we have a system of equations
x'(t) = Ax(t) + b
(where x'(t), x(t), b are 2×1 vectors and A is 2×2)
• WLOG, let's assume that b=0.
We can always "demean" our system by defining a new
variable y as the deviation of x from its steady state:
y(t) = x(t) + A^{-1}b
• Then, y'(t) = x'(t) = Ax(t) + b = A[y(t) - A^{-1}b] + b = Ay(t)
• When n=2, all information about the system’s stability
properties are summarized by the trace and the
determinant of the coefficient matrix A.
• The trace of a matrix is the sum of its diagonal elements
151
Stability 3
• The eigenvalues of A solve |A - λI_2| = 0
(a_11 - λ)(a_22 - λ) - a_12 a_21 = 0
λ² - (a_11 + a_22)λ + a_11 a_22 - a_12 a_21 = 0
λ² - tr(A)λ + det(A) = 0
• Thus, by the quadratic formula:
λ_{1,2} = [tr(A) ± √(tr(A)² - 4det(A))] / 2
• Also:
tr(A) = λ_1 + λ_2
det(A) = λ_1 λ_2
152
Stability: Nodes
• 4 different types of cases of steady state.
• Case 1: Nodes
• a) If det(A) > 0 and [tr(A)² - 4det(A)] > 0: λ1 and λ2
are real, distinct and have the same sign.
If tr(A)<0, then λ1, λ2 < 0: and node is stable.
If tr(A) > 0, then λ1, λ2 >0: node is unstable
• b) If [tr(A)² - 4det(A)] = 0: λ1 and λ2 are real and
repeated (so we can’t diagonalize the matrix).
If tr(A) < 0, then λ1 = λ2 < 0, and node is stable
If tr(A) > 0, then λ1 = λ2 > 0: unstable node
153
Stability: saddle points
• If det(A) < 0, λ1 and λ2 are real, distinct and have
opposite sign.
WLOG, assume λ1 > 0, λ2 < 0. Then, if c1v1j≠0,
xj(t) is unstable.
(Note we cannot have v11=v12=0, else V is not
invertible)
• If c1 = 0, the system’s behavior is determined
only by λ2, and it converges to the steady state
as t.
• Setting c_1 = 0 we obtain:
x_1(t) = c_2 v_21 e^{λ_2 t}
x_2(t) = c_2 v_22 e^{λ_2 t}
154
Stability: saddle points 2
• Then we get:
x_1(t) = (v_21/v_22) x_2(t)
which is the equation of the saddle path.
• For the graph in the notes, assume:
A = [1 0; 0 -2]
then: v_1 = (1 0)', v_2 = (0 1)'
• The system will converge to the steady state if and only
if it starts out from one of points where the constant
associated with the explosive root is equal to zero
(constant on x1 here). If we are off this saddle path, we
do not converge to the steady state.
155
Stability: spiral points
• If [tr(A)² - 4det(A)] < 0 and tr(A) ≠ 0: λ1 and λ2 are
complex conjugates.
These give spiral paths; clockwise if a21 < 0,
counter-clockwise if a21 > 0
Spirals can be stable or unstable depending on
the sign of the trace.
• If tr(A) <0, then eigenvalues have a negative real
part and the spirals converge to the steady state
• If tr(A) >0, then eigenvalues have positive real
part, and spirals diverge from steady state.
156
Stability: centers
• If tr(A) = 0 and det(A) > 0: eigenvalues are
pure imaginary numbers. The steady
state is stable but not asymptotically
stable.
• If the matrix A is non-diagonalizable, we
need to use other methods to find
solutions (beyond scope of this course).
157
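The node/saddle/spiral/center taxonomy of the last four slides can be collected into a small classifier keyed off tr(A) and det(A). This is an illustrative sketch; it assumes det(A) ≠ 0 and ignores degenerate borderline cases.

```python
def classify(A):
    """Classify the steady state of x'(t) = A x(t) for a 2x2 matrix A.
    Assumes det(A) != 0 (otherwise the steady state is degenerate)."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4 * det
    if det < 0:
        return "saddle point"          # real eigenvalues of opposite sign
    if disc >= 0:
        return "stable node" if tr < 0 else "unstable node"
    if tr == 0:
        return "center"                # purely imaginary eigenvalues
    return "stable spiral" if tr < 0 else "unstable spiral"
```

For instance, the matrix from the worked example, A = [1 1; 4 1], has det(A) = -3 < 0 and is classified as a saddle point, consistent with its eigenvalues 3 and -1.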
Nonlinear systems of differential
equations
• In general, it is not possible to find closed form
solutions for nonlinear differential equations.
However, we can obtain qualitative information
about the local behavior of the solution if n=2,
and study non-linear systems analytically by
looking at their linearizations.
• Consider the nonlinear differential equation
system:
x'(t) = F[x(t)]
where x'(t) is n×1 and F: X ⊆ R^n → R^n.
158
Phase diagrams
• Assuming n=2, we have:
x_1'(t) = f_1[x_1(t), x_2(t)]
x_2'(t) = f_2[x_1(t), x_2(t)]
• Set x_1'(t) = 0 and x_2'(t) = 0 to obtain the phase lines:
x_1'(t) = 0 implies f_1[x_1(t), x_2(t)] = 0
x_2'(t) = 0 implies f_2[x_1(t), x_2(t)] = 0
• Each of these equations describes a curve in (x1(t),x2(t))
space. To find the slope of each curve, look at the sign
of dfi[.]/dx1 for i=1,2
(Note: take both derivatives wrt x1 for the sign to equal
the sign of the slope, since x1 is the horizontal axis.
Alternatively, we could take derivs wrt x2, or a
combination, whichever is computationally easier – as
long as we interpret them correctly)
159
Phase diagrams: arrows of motion
• To determine behavior of x_1 off the phase lines, we
look at either of the derivatives:
∂x_1'(t)/∂x_1(t) = ∂f_1[x_1(t), x_2(t)]/∂x_1(t)   or   ∂x_1'(t)/∂x_2(t) = ∂f_1[x_1(t), x_2(t)]/∂x_2(t)
• Eg: suppose that
∂x_1'(t)/∂x_1(t) |_{x_1'(t)=0} = ∂f_1[x_1(t), x_2(t)]/∂x_1(t) > 0
• Then, starting from the x_1'(t) = 0 line, a small movement
to the right will increase x_1'(t), making it positive, so
to the right of the x_1'(t) = 0 line, x_1(t) is increasing.
A small movement to the left will decrease x_1'(t),
making it negative, so left of the phase line x_1(t) is
decreasing.
160
Phase diagrams: arrows of motion 2
• Now do the same for x_2(t). Suppose:
∂x_2'(t)/∂x_2(t) |_{x_2'(t)=0} = ∂f_2[x_1(t), x_2(t)]/∂x_2(t) < 0
• Then, starting from the x_2'(t) = 0 line, a small
movement upward (ie increasing x_2) will
decrease x_2'(t), making it negative. So above
the x_2'(t) = 0 line, x_2 is decreasing.
• Similarly, taking a small movement downward
(decreasing x_2) will increase x_2'(t), making it
positive. So below the phase line, x_2 is
increasing.
• Finally, combine all this.
161
Behavior of nonlinear systems
• From the phase diagrams we can obtain
valuable information about the behavior of the
system, but the phase diagram alone often does
not give us enough information about the
stability properties of the steady state.
• To obtain more precise information about the
behavior of the nonlinear system around the
steady state, under certain conditions we can
approximate the system by their linear
counterparts.
162
Behavior of nonlinear systems 2
• Take a multivariate first order Taylor series
approximation of F around the steady
state:
F[x(t)] = F(x̄) + {DF[x(t)] |_{x(t)=x̄}} · [x(t) - x̄] + R[x(t) - x̄]
where DF[x(t)] |_{x(t)=x̄} is the Jacobian of F evaluated at x̄:
DF[x(t)] |_{x(t)=x̄} = [∂f_1/∂x_1 ∂f_1/∂x_2 … ∂f_1/∂x_n; ∂f_2/∂x_1 ∂f_2/∂x_2 … ∂f_2/∂x_n; … ; ∂f_n/∂x_1 ∂f_n/∂x_2 … ∂f_n/∂x_n]
and the remainder R satisfies:
lim_{x(t)→x̄} ||R[x(t) - x̄]|| / ||x(t) - x̄|| = 0
163
Behavior of nonlinear systems 3
• In some neighborhood of the steady state, the linear
system:
x'(t) = {DF[x(t)] |_{x(t)=x̄}} · [x(t) - x̄]
can be expected to be a reasonable approximation of
our system of nonlinear equations under some
conditions.
• We need the steady state to be a hyperbolic equilibrium.
This is the case if the derivative of F evaluated at the
steady state, DF[x(t)] |_{x(t)=x̄},
has no eigenvalues with a zero real part.
164
Grobman-Hartman theorem
• We return to a version of the Grobman-Hartman
theorem for systems of equations.
• If x̄ is a hyperbolic equilibrium of the
autonomous differential equation system
x'(t) = F[x(t)] and F is C1, then there exists a
neighborhood U of x̄ such that the nonlinear
system is topologically equivalent to the linear
system
x'(t) = {DF[x(t)] |_{x(t)=x̄}} · [x(t) - x̄]
in U.
165
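A minimal sketch of the linearization recipe: approximate the Jacobian DF at a steady state by central finite differences and read stability off its trace and determinant. The two-equation system F below is made up for illustration; its steady state at the origin is hyperbolic.

```python
import math

def F(x):
    # hypothetical nonlinear system (not from the notes) with F(0, 0) = (0, 0)
    x1, x2 = x
    return (x2, math.sin(x1) - 0.5 * x2)

def jacobian(F, xbar, h=1e-6):
    # central finite-difference approximation of DF evaluated at xbar
    n = len(xbar)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(xbar), list(xbar)
        xp[j] += h
        xm[j] -= h
        Fp, Fm = F(xp), F(xm)
        for i in range(n):
            J[i][j] = (Fp[i] - Fm[i]) / (2 * h)
    return J

J = jacobian(F, (0.0, 0.0))          # linearize at the steady state (0, 0)
tr = J[0][0] + J[1][1]
det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
```

Here det < 0, so the eigenvalues are real with opposite signs: the origin is a hyperbolic saddle, and the Grobman-Hartman theorem applies.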
Discrete time
• Up to now, in the dynamic section of the course, we have
been dealing with continuous time. That is, each
“period” t has had a length of zero, so we describe how a
variable changes by looking at its derivative; thus
differential equations are a key tool for explaining how
variables evolve over time.
• An alternative, used in many situations, is discrete
time. Here, we have periods indexed by t, of some
positive finite length, and so to look at changes in
variables we compare xt to xt+1. Our key tools here are
difference equations, the discrete time counterpart to
differential equations.
• Many definitions and properties that hold for differential
equations also hold for difference equations, with slight
modifications.
166
Difference equations 1
• With differential equations, time is continuous, so t
can take any real value.
• With difference equations, t can only take integer
values that correspond to periods (eg a year, a
quarter, a day).
• An ordinary difference equation is an equation of
the form:
x_{t+m} = F[t, x_t, x_{t+1}, x_{t+2}, …, x_{t+m-1}; α]
where α ∈ A ⊆ R^p and F is a function R^{m+1+p} → R
• This is an ordinary difference equation because we
describe a relationship between a function of one
variable and its lags (ie x is a function only of t).
167
Difference equations 2
• A difference equation is linear if F is linear in xt,
xt+1,…,xt+m-1.
• A difference equation is autonomous if t does not
appear as an independent argument of F, but
enters only through x.
• The order of a difference equation is equal to the
difference between the highest and lowest time
subscripts appearing in the eqn. Thus, above
we have an order m difference equation.
168
Difference equations 3
• Solving the difference equation means finding a
function xt that satisfies the equation for any t in
some subinterval I0 of I.
• Since the time subscripts are just labels, we can
harmlessly shift all the time subscripts by the
same amount. Thus (ignoring α) the equation
above is equivalent to
xt=F[t,xt-1,xt-2,…,xt-m]
• Any difference equation can be reduced to a
system of first order difference equations by
defining additional variables, so we can assume
m=1.
169
Difference equations 4
• The solution to the difference equation x_t = f(t, x_{t-1}) in general is not unique.
• If we impose a boundary condition xt0 = x0, we
can recursively derive the solution as follows:
xt0+1 = f(t0+1,xt0),
xt0+2 = f(t0+2,xt0+1), …
• Different boundary conditions will generate
different trajectories of xt. As with differential
equations, a specific solution is a particular
solution, and the general solution is the set of all
particular solutions.
170
Steady states
• We are interested in analyzing the long-run
properties of a difference equation, and such
analysis does not necessarily require us to find
an explicit solution.
• For the autonomous difference equation
x_t = f(x_{t-1}), a steady state is a fixed point x̄,
ie a point such that x̄ = f(x̄)
• Note that a steady state may not exist.
• We can use phase diagrams to examine
difference equations. A fixed point is an
intersection of the phase line and the 45° line.
171
Stability
• As before, we are interested in stability and
asymptotic stability; what happens after a shock
that causes a small deviation from our steady
state?
• Recall: stability requires that any solution xt that
at some point enters a ball of radius δ around
the steady state will remain within a ball of
(possibly larger) radius ε.
• Asymptotic stability requires not only stability,
but also that we converge to the steady state as
t .
172
Stability 2
• When the phase line is above the 45° line:
x_t = f(x_{t-1}) > x_{t-1}, so x is increasing.
• When the phase line is below the 45° line:
x_t = f(x_{t-1}) < x_{t-1}, so x is decreasing.
• Thus, in the diagram in the notes, x̄_1 is unstable and x̄_2 is
asymptotically stable.
• We can determine the stability property of a
steady state by looking at f' at the steady state.
• If |f'(x̄)| < 1, then x̄ is (locally) asymptotically stable
• If |f'(x̄)| > 1, then x̄ is unstable.
• If |f'(x̄)| = 1, then x̄ is a non-hyperbolic eqbm, and
could be stable or unstable.
173
Grobman-Hartman
• As before, we can study stability
properties by linearizing the equation, as
long as we have a hyperbolic equilibrium.
• Theorem: If x is a hyperbolic equilibrium,
then there is a neighborhood U of x such
that the non-linear difference equation is
topologically equivalent to:
x_t - x̄ = [f'(x_{t-1}) |_{x_{t-1}=x̄}] · (x_{t-1} - x̄)
in U.
174
Solving autonomous difference
equations
• Consider the following first order autonomous linear
difference equation:
xt = axt-1 + b
• All solutions can be written as xtg=xtc+xtp
• To derive xtc we substitute recursively on the
homogeneous part of the equation:
x_t = a x_{t-1} = a(a x_{t-2}) = a^2 x_{t-2} = a^2(a x_{t-3}) = a^3 x_{t-3} = …
= a^t x_0
• Since the initial value of x, x0 remains undetermined
in the absence of a boundary condition, what this
says is that all solutions to xt=axt-1 must be of the
form:
x_t^c = c a^t,
where c is an arbitrary constant.
175
Solving autonomous difference
equations 2
• To find a particular solution, we use the steady
state: x_t = x_{t-1} = x̄ = a x̄ + b, so
x_t^p = b/(1-a)
if a ≠ 1.
• Therefore, if a ≠ 1: x_t^g = c a^t + b/(1-a)
• We can pin down the arbitrary constant c by
means of a boundary condition. Suppose we
have: x_0 = x̃
• Then, for a ≠ 1, we have: x̃ = c + b/(1-a), so
c = x̃ - b/(1-a)
176
Solving autonomous difference
equations 3
• This gives general solution (if a ≠ 1):
x_t = (x̃ - b/(1-a)) a^t + b/(1-a)
• If a = 1, no steady state exists, and iteration
from the boundary condition gives
x_t = bt + x̃
177
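The closed form can be checked against brute-force iteration (arbitrary test values; assumes a ≠ 1):

```python
def iterate(a, b, x0, T):
    # brute-force iteration of x_t = a x_{t-1} + b
    x = x0
    for _ in range(T):
        x = a * x + b
    return x

def closed_form(a, b, x0, T):
    # x_t = (x0 - b/(1-a)) a^t + b/(1-a), valid for a != 1
    ss = b / (1 - a)            # steady state
    return (x0 - ss) * a ** T + ss
```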
Stability properties
• What are the stability properties of x_t^g?
• If |a| < 1, then a^t → 0 as t → ∞, and x_t is asymptotically
stable (converges to the steady state regardless of initial
position)
• If |a| > 1, then |a^t| → ∞ as t → ∞, and x_t explodes unless
x_0 = b/(1-a)
• The sign of a determines whether x_t remains on one side
of the steady state, or whether it alternates sides of the
steady state.
• If a > 0, c a^t keeps the same sign for all t and the system
converges or diverges monotonically.
• If a < 0, a^t is positive or negative as t is even or odd, and
the system jumps from one side to the other each period.
• See diagrams in notes.
178
Solving non-autonomous linear
difference equations
• Consider the first order non-autonomous linear
difference equation xt=axt-1+bt
where the forcing term b is a function of time.
• The complementary function will be the same as
that for the first order autonomous linear
equation xt = axt-1+b
which is of the form xt=cat. So we only need to
derive a particular solution.
• There are two particular solutions frequently
used: the forward solution and the backward
solution.
179
Backward solution
• The backward solution is obtained by iterating
the equation backwards:
x_t = a x_{t-1} + b_t
= a(a x_{t-2} + b_{t-1}) + b_t = a^2 x_{t-2} + a b_{t-1} + b_t
= a^2(a x_{t-3} + b_{t-2}) + a b_{t-1} + b_t = a^3 x_{t-3} + a^2 b_{t-2} + a b_{t-1} + b_t
= …
= a^s x_{t-s} + Σ_{j=0}^{s-1} a^j b_{t-j}
• If there is no natural starting point for x_t, we can
express the solution as a function of all past
values of the forcing variables by letting s → ∞:
x_t = lim_{s→∞} a^s x_{t-s} + Σ_{j=0}^∞ a^j b_{t-j}
180
Backward solution 2
• For this equation to be well defined, we need to
assume that |a| < 1 and |b_t| < B for all t. Then:
x_t^B = Σ_{j=0}^∞ a^j b_{t-j}
as the limit term goes to zero.
• And the general solution is:
x_t^g = c^B a^t + Σ_{j=0}^∞ a^j b_{t-j}
(denoting the constant as c^B to reflect that once
we consider a boundary condition, the value of
c^B will depend on the particular solution chosen)
181
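A truncated version of the backward particular solution can be checked directly against the difference equation; the forcing sequence below is a made-up bounded example, and the truncation error is of order a^terms, negligible for |a| < 1.

```python
def backward_particular(a, b_fn, t, terms=200):
    # x_t^B = sum_{j=0}^inf a^j b_{t-j}, truncated at `terms`; needs |a| < 1, bounded b
    return sum(a ** j * b_fn(t - j) for j in range(terms))

a = 0.8
b = lambda t: 1.0 if t % 2 == 0 else 0.0   # hypothetical bounded forcing sequence
xB = {t: backward_particular(a, b, t) for t in (9, 10, 11)}
```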
Forward solution
• The forward solution is obtained by iterating the
difference equation forward.
• First, solve the equation for x_{t-1}:
x_{t-1} = (1/a)x_t - (b_t/a)
and then shift it one period forward:
x_t = (1/a)x_{t+1} - (b_{t+1}/a)
• Then, we have:
x_t = (1/a)[(1/a)x_{t+2} - (b_{t+2}/a)] - (b_{t+1}/a) = (1/a)^2 x_{t+2} - (1/a)^2 b_{t+2} - (1/a) b_{t+1}
= (1/a)^3 x_{t+3} - (1/a)^3 b_{t+3} - (1/a)^2 b_{t+2} - (1/a) b_{t+1}
= …
= (1/a)^s x_{t+s} - Σ_{j=0}^{s-1} (1/a)^{j+1} b_{t+j+1}
182
Forward solution 2
• If there is no natural end point for xt, we can
express the solution as a function of all future
values of the forcing variables by letting s →∞:
x_t = lim_{s→∞} (1/a)^s x_{t+s} - Σ_{j=0}^∞ (1/a)^{j+1} b_{t+j+1}
• For this to be well defined, assume |a| > 1, |b_t| < B
for all t. Then:
x_t^F = -(1/a) Σ_{j=0}^∞ (1/a)^j b_{t+j+1}
• And the general solution is:
x_t^g = c^F a^t - (1/a) Σ_{j=0}^∞ (1/a)^j b_{t+j+1}
183
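The forward particular solution admits the same kind of check as the backward one; here |a| > 1 and the sum runs over future forcing terms (again with a made-up bounded b).

```python
def forward_particular(a, b_fn, t, terms=200):
    # x_t^F = -sum_{j=0}^inf (1/a)^{j+1} b_{t+j+1}, truncated; needs |a| > 1, bounded b
    return -sum((1 / a) ** (j + 1) * b_fn(t + j + 1) for j in range(terms))

a = 1.25
b = lambda t: 0.3 * (-1) ** t   # hypothetical bounded forcing sequence
xF = {t: forward_particular(a, b, t) for t in (4, 5, 6)}
```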
Lag operators
• A common notation when dealing with discrete
time is to use lag operators. We can, for
example, use these to derive the backward and
forward solutions.
• A lag operator is defined by:
L^n x_t = x_{t-n} for n = …,-2,-1,0,1,2,…
• Thus, we can rewrite our difference equation x_t =
a x_{t-1} + b_t as
x_t = aL^1 x_t + b_t
or as
x_t = aL x_t + b_t
• Thus,
(1 - aL) x_t = b_t
x_t = [1/(1 - aL)] b_t
184
Lag operators 2
• If |a| < 1 we have:
1/(1 - aL) = 1 + aL + a^2 L^2 + …
ie we can use the sum to infinity formula, and so
x_t = [1/(1 - aL)] b_t is the backward solution.
• To derive the forward solution, rewrite x_t = [1/(1 - aL)] b_t as:
x_t = [-(1/a)L^{-1} / (1 - (1/a)L^{-1})] b_t
x_t = -(1/a) [1 / (1 - (1/a)L^{-1})] b_{t+1}
x_t = -(1/a) [1 + (1/a)L^{-1} + ((1/a)L^{-1})^2 + …] b_{t+1}
where the expansion again uses the sum to infinity formula, valid if |a| > 1.
185
Capital accumulation
• The standard capital accumulation equation is
given by:
K_t = (1-δ)K_{t-1} + I_{t-1}
where δ is the depreciation rate (and so is in (0,1)) and I
is investment.
• A particular solution can be derived using
backward iteration:
K_t = (1-δ)[(1-δ)K_{t-2} + I_{t-2}] + I_{t-1} = (1-δ)^2 K_{t-2} + (1-δ)I_{t-2} + I_{t-1}
= …
= (1-δ)^s K_{t-s} + Σ_{j=0}^{s-1} (1-δ)^j I_{t-j-1}
and typically an initial condition for K is given.
186
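A sketch verifying that backward iteration of the capital accumulation equation matches the summation formula term for term (the investment path is hypothetical):

```python
def capital_path(K0, delta, I_fn, T):
    # iterate K_t = (1 - delta) K_{t-1} + I_{t-1} forward from K_0
    K = K0
    for t in range(1, T + 1):
        K = (1 - delta) * K + I_fn(t - 1)
    return K

def backward_formula(K0, delta, I_fn, T):
    # K_T = (1-delta)^T K_0 + sum_{j=0}^{T-1} (1-delta)^j I_{T-j-1}
    return (1 - delta) ** T * K0 + sum(
        (1 - delta) ** j * I_fn(T - j - 1) for j in range(T))

I = lambda t: 1.0 + 0.1 * t   # hypothetical investment path
```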
Household budget constraint
• Suppose an infinitely lived household has the following
budget constraint:
at = (1+r)at-1 + yt – ct
ie assets at beginning of period t equal asset income
from last period asset holdings (at rate r) plus labor
income at beginning of period t less consumption
spending at the beginning of period t.
• A particular solution to the budget constraint can be
obtained by iterating forward:
a_{t-1} = [1/(1+r)] a_t + [1/(1+r)](c_t - y_t)
= [1/(1+r)]^2 a_{t+1} + [1/(1+r)]^2 (c_{t+1} - y_{t+1}) + [1/(1+r)](c_t - y_t)
= …
= [1/(1+r)]^{s+1} a_{t+s} + Σ_{j=0}^s [1/(1+r)]^{j+1} (c_{t+j} - y_{t+j})
187
Household budget constraint 2
• Multiplying both sides by (1+r) we get:
(1+r)a_{t-1} = [1/(1+r)]^s a_{t+s} + Σ_{j=0}^s [1/(1+r)]^j (c_{t+j} - y_{t+j})
• Letting s → ∞:
(1+r)a_{t-1} = lim_{s→∞} [1/(1+r)]^s a_{t+s} + Σ_{j=0}^∞ [1/(1+r)]^j (c_{t+j} - y_{t+j})
• Under a no Ponzi game condition the
household is not allowed to let its debt
grow indefinitely, so:
lim_{s→∞} [1/(1+r)]^s a_{t+s} ≥ 0
188
Household budget constraint 3
• Under weak assumptions on the utility
function (strictly increasing utility in at least
one good), the household will not want to
build up positive assets forever, so this
holds with equality.
• Then, we get the lifetime budget constraint:
Σ_{j=0}^∞ [1/(1+r)]^j c_{t+j} = (1+r)a_{t-1} + Σ_{j=0}^∞ [1/(1+r)]^j y_{t+j}
189
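Before taking the s → ∞ limit, the forward solution is an exact finite-horizon identity, which makes it easy to test on any simulated asset path (the consumption and income paths below are arbitrary):

```python
def asset_path(a0, r, c_fn, y_fn, T):
    # iterate the budget constraint a_t = (1+r) a_{t-1} + y_t - c_t for t = 1..T
    path = [a0]
    for t in range(1, T + 1):
        path.append((1 + r) * path[t - 1] + y_fn(t) - c_fn(t))
    return path

def forward_rhs(path, r, c_fn, y_fn, s):
    # RHS of (1+r) a_0 = (1/(1+r))^s a_{1+s} + sum_{j=0}^s (1/(1+r))^j (c_{1+j} - y_{1+j})
    q = 1 / (1 + r)
    return q ** s * path[1 + s] + sum(
        q ** j * (c_fn(1 + j) - y_fn(1 + j)) for j in range(s + 1))

r = 0.05
c = lambda t: 1.0 + 0.02 * t   # hypothetical consumption path
y = lambda t: 1.2              # hypothetical constant labor income
path = asset_path(10.0, r, c, y, 40)
```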
Linear systems of difference
equations
• Now we look at systems of first order
difference equations, of the form:
x_t = A x_{t-1} + b
where x_t and b are n×1 and A is n×n.
• First suppose that A is a diagonal matrix. Then,
we can solve each of the n equations separately.
• With a vector of initial conditions x_i0 = x̃_i
we get a solution (assuming a_ii ≠ 1):
x_it = (x̃_i - b_i/(1-a_ii)) a_ii^t + b_i/(1-a_ii)
190
Diagonalizable linear systems
• If the matrix A is diagonalizable, so that
V = (v_1, v_2, …, v_n) is invertible and V^{-1}AV is a
diagonal matrix with the eigenvalues of A along
its diagonal, then we can derive x_t^c by solving the
homogeneous part of the equation:
x_t = A x_{t-1}
V^{-1}x_t = V^{-1}A x_{t-1} = V^{-1}AV V^{-1}x_{t-1}
y_t = Λ y_{t-1}
where y_t = V^{-1}x_t. This gives an uncoupled system,
which we already know how to solve.
• Therefore, y_it = c_i λ_i^t
191
Diagonalizable linear systems 2
• To obtain the solution to the original system, we
need to revert from y_t back to x_t:
x_t = V y_t = (v_1, v_2, …, v_n)(c_1 λ_1^t, c_2 λ_2^t, …, c_n λ_n^t)' = Σ_{i=1}^n c_i v_i λ_i^t
• To get x_t^p, we use the steady state:
x_t = x_{t-1} ↔ x_t^p = A x_t^p + b
x_t^p = (I-A)^{-1} b   (if I-A is invertible)
• Thus, the general solution is:
x_t^g = Σ_{i=1}^n c_i v_i λ_i^t + (I-A)^{-1} b
where the constants c_i can be found with a set of
n boundary conditions.
192
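For a concrete 2×2 case, the steady state x_p = (I - A)^{-1} b can be computed directly and compared with brute-force iteration; the coefficient matrix below is a made-up example whose eigenvalues lie inside the unit circle, so iteration converges:

```python
def iterate_system(A, b, x0, T):
    # iterate x_t = A x_{t-1} + b for a 2x2 system
    x = list(x0)
    for _ in range(T):
        x = [A[0][0] * x[0] + A[0][1] * x[1] + b[0],
             A[1][0] * x[0] + A[1][1] * x[1] + b[1]]
    return x

def steady_state(A, b):
    # x_p = (I - A)^{-1} b, with the 2x2 inverse written out explicitly
    m = [[1 - A[0][0], -A[0][1]], [-A[1][0], 1 - A[1][1]]]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(m[1][1] * b[0] - m[0][1] * b[1]) / det,
            (-m[1][0] * b[0] + m[0][0] * b[1]) / det]

A = [[0.5, 0.1], [0.2, 0.3]]   # hypothetical A: both eigenvalues inside the unit circle
b = [1.0, 2.0]
```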
Stability
• Now, to analyze stability properties of this general
solution.
• Case a) All eigenvalues real.
If all eigenvalues are less than 1 in absolute value, λ_i^t → 0
as t → ∞, and the system is asymptotically stable.
If some eigenvalues are greater than one in absolute
value, the corresponding terms |λ_i^t| → ∞ as t → ∞, and x_jt is
unstable unless c_i v_ij = 0
• Case b) some eigenvalues complex.
See notes. Stability determined by the modulus of the
eigenvalue: r < 1 means system is asymptotically stable.
r > 1 means system is unstable.
• Non-diagonalizable systems: we need to use other
methods. Beyond this course.
193
Elements of nonlinear systems
• Consider the nonlinear difference equation
system: xt = F(xt-1)
• When we work with discrete time, there are not
continuous movements along the solution path,
but rather jumps. This can mean that we can
jump around the steady state, which is not the
same as instability.
• As before, to look at behavior near the steady
state, we look at the linearized form.
194
Grobman-Hartman
• A steady state is a hyperbolic equilibrium if the
derivative of F evaluated at the steady state,
DF(x_{t-1}) |_{x_{t-1}=x̄},
has no eigenvalues with modulus equal to one.
• Theorem: If the steady state is a hyperbolic
equilibrium of the autonomous difference
equation system, then there exists a
neighborhood U of x̄ such that the nonlinear
function is topologically equivalent to the linear
system:
x_t - x̄ = {DF(x_{t-1}) |_{x_{t-1}=x̄}} · (x_{t-1} - x̄)
in U.
195
Phase diagrams
• For n=2, we can use phase diagrams to
examine behavior around the steady state.
• If n=2, our system is:
x1t+1 = f1(x1t,x2t)
x2t+1 = f2(x1t,x2t)
• First, express the system in first differences (the
discrete counterpart to derivatives):
Δx1t = x1t+1 – x1t = f1(x1t,x2t) – x1t
Δx2t = x2t+1 – x2t = f2(x1t,x2t) – x2t
196
Phase diagrams 2
• Obtain the phase lines by setting Δx1t = 0 and
Δx2t = 0. This gives:
x1t = f1(x1t,x2t)
x2t = f2(x1t,x2t)
• To obtain arrows of motion, take derivatives of
the first differences: for x_1t, we look at either:
∂Δx_1t/∂x_1t = ∂f_1(x_1t, x_2t)/∂x_1t - 1   or   ∂Δx_1t/∂x_2t = ∂f_1(x_1t, x_2t)/∂x_2t
at a point in the Δx_1t = 0 phase line.
• Eg: suppose
∂Δx_1t/∂x_1t |_{Δx_1t=0} = ∂f_1(x_1t, x_2t)/∂x_1t - 1 > 0
197
Phase diagrams 3
• Then, starting from the Δx1t = 0 line,
moving slightly right will increase Δx1t
making it strictly positive. So, to the right
of the phase line, x1t is increasing (arrows
point right).
• Similarly, moving left of the Δx1t = 0 line,
will reduce Δx1t below zero, making it
strictly negative, so the arrow points left.
• Now do the same for x_2t. Suppose:
∂Δx_2t/∂x_2t |_{Δx_2t=0} = ∂f_2(x_1t, x_2t)/∂x_2t - 1 < 0
198
Phase diagrams 4
• Then, starting from the Δx2t = 0 line, a small
movement up will decrease Δx2t, making it
strictly negative. So, above the phase line x2 is
decreasing.
• Similarly, a small movement downward from the
Δx2t = 0 line will make Δx2t positive, making it
strictly positive. So, below the phase line, x2 is
increasing.
• Finally, combine the phase lines and arrows.
199