Econ 600: Mathematical
Economics
July/August 2006
Stephen Hutton
1
Why optimization?
• Almost all economics is about solving
constrained optimization problems. Most
economic models start by writing down an
objective function.
• Utility maximization, profit maximization, cost
minimization, etc.
• Static optimization: most common in
microeconomics
• Dynamic optimization: most common in
macroeconomics
2
My approach to course
• Focus on intuitive explanation of most important
concepts, rather than formal proofs.
• Motivate with relevant examples
• Practice problems and using tools in problem
sets
• Assumes some basic math background (people
with strong background might not find course
useful)
• For more details, see course notes, textbooks,
future courses
• Goal of course: introduction to these concepts
3
Order of material
• The course will skip around the notes a bit during the static portion; specifically, I’ll cover the first half of lecture 1, then give some definitions from lecture 3, then go back to lecture 1 and do the rest in order.
• Sorry! 
4
Why not basic optimization?
• Simplest method of unconstrained
optimization (set deriv = 0) often fails
• Might not identify the optima, or optima
might not exist
• Solution unbounded
• Function not always differentiable
• Function not always continuous
• Multiple local optima
5
Norms and Metrics
• It is useful to have some idea of “distance”
or “closeness” in vector space
• The most common measure is Euclidean
distance; this is sufficient for our purposes
(dealing with n-dimensional real numbers)
• General requirements of norm: anything
that satisfies conditions 1), 2), 3) (see
notes)
6
Continuity
• General intuitive sense of continuity (no gaps or jumps). Whenever
x is close to x’, f(x) is close to f(x’)
• Formal definitions:
A sequence of elements {xn} is said to converge to a point x in Rn if for every ε > 0 there is a number N such that for all n > N, ||xn − x|| < ε.
• A function f: Rn → Rm is continuous at a point x if for ALL sequences {xn} converging to x, the derived sequence of points in the target space {f(xn)} converges to the point f(x).
• A function is continuous if it is continuous at all points in its
domain.
• What does this mean in 2d? A sequence of points converging from below and a sequence converging from above must both give function values converging to f(x). The same holds in higher dimensions.
7
Continuity 2
• Why continuity? Needed to guarantee existence of
solution
• So typically assume continuity on functions to guarantee
(with other assumptions) that a solution to the problem
exists
• Sometimes continuity is too strong. To guarantee a
maximum, upper semi-continuity is enough. To
guarantee a minimum, lower semi-continuity
• Upper semi-continuity: for all xn → x, limn→∞ f(xn) ≤ f(x)
• Lower semi-continuity: for all xn → x, limn→∞ f(xn) ≥ f(x)
• Note that if both hold (so the limit equals f(x)), we have continuity.
• Note, figure 6 in notes is wrong
8
Open sets
(notes from lecture 3)
• For many set definitions and proofs we use the
concept of an open ball of arbitrarily small size.
• An open ball is a set of points (or vectors) within
a given distance from a particular point (or
vector). Formally:
Let ε be a small real number. Bε(x)={y| ||x-y||< ε}.
• A set of points S in Rn is open if for all points in S, there
exists an open ball that is entirely contained within S. Eg
(1,2) vs (1,2].
• Any union of open sets is open.
• Any finite intersection of open sets is open.
9
Interior, closed set
(notes in lecture 3)
• The interior of a set S is the largest open set
contained in S. Formally, Int(S) = ∪iSi where each Si is an open subset of S.
• If S is open, Int(S)=S
• A set is closed if every convergent sequence within the set converges to a point within the set. Formally, fix a set S. If for every convergent sequence {xm} of elements of S the limit limm→∞ xm lies in S, then S is closed.
• S is closed if and only if SC is open.
10
Boundary, bounded, compact
(notes in lecture 3)
• The boundary of a set S [denoted B(S)] is the set of points x such that for all ε>0, Bε(x)∩S is not empty and Bε(x)∩SC is not empty.
Ie every open ball around x contains points both in S and not in S.
• If S is closed, B(S) ⊆ S
• A set S is bounded if there is some finite number M such that the distance between any two points in the set is at most M.
• A set is compact if it is closed and bounded.
• These definitions correspond to their
commonsense interpretations.
11
Weierstrass’s Theorem
(notes in lecture 3)
• Gives us a sufficient condition to ensure that a
solution to a constrained optimization problem
exists. If the constraint set C is compact and the
function f is continuous, then there always
exists at least one solution to
max f(x) s.t. x is in C
• Formally: Let f: Rn → R be continuous. If C is a compact subset of Rn, then there exist x* in C, y* in C s.t. f(x*) ≥ f(x) ≥ f(y*) for all x in C.
12
Vector geometry
• Want to extend intuition about slope = 0 idea of optimum
to multiple dimensions. We need some vector tools to
do this
• Inner product: x·y=(x1y1+x2y2+…+xnyn)
• Euclidean norm and inner product related: ||x||2=x·x
• Two vectors are orthogonal (perpendicular) if x·y = 0.
• Inner product of two vectors v, w is v’w in matrix
notation.
• v’w > 0 then v, w form acute angle
• v’w < 0 then v, w form obtuse angle.
• v’w = 0 then v, w orthogonal.
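These sign rules are easy to check numerically. A minimal Python sketch (not part of the slides; the helper names `inner`, `norm`, and `angle_type` are my own):

```python
import math

def inner(v, w):
    """Inner product x.y = x1*y1 + x2*y2 + ... + xn*yn."""
    return sum(vi * wi for vi, wi in zip(v, w))

def norm(v):
    """Euclidean norm, using the identity ||x||^2 = x.x."""
    return math.sqrt(inner(v, v))

def angle_type(v, w):
    """Classify the angle between v and w by the sign of v'w."""
    p = inner(v, w)
    return "acute" if p > 0 else "obtuse" if p < 0 else "orthogonal"
```

Eg inner([1, 0], [0, 1]) = 0, so the two standard basis vectors of R2 are orthogonal.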
13
Linear functions
• A function f: V → W is linear if for any two real numbers a, b and any two elements v, v’ in V we have f(av + bv’) = af(v) + bf(v’)
• Note that the functions we usually call “linear” in R1 (f(x) = mx + b) are not generally linear; they are affine (linear only if b = 0).
• Every linear function defined on Rn can be represented
by an n-dimensional vector (f1,f2,…fn) with the feature
that f(x) = Σfixi
• Ie value of function at x is inner product of defining
vector with x.
• [Note, in every situation we can imagine dealing with,
functionals are also functions.]
14
Hyperplanes
• A hyperplane is the set of points given by
{x:f(x)=c} where f is a linear functional and c is
some real number.
• Eg1: For R2 a typical hyperplane is a straight
line.
• Eg2: For R3 a typical hyperplane is a plane.
• Think about a hyperplane as one of the level
sets of the linear functional f. As we vary c, we
change level sets.
• The defining vector of f(x) is orthogonal to the
hyperplane.
15
Separating Hyperplanes
• A half-space is the set of points on one side of a
hyperplane. Formally: HS(f) = {x : f(x) ≥ c} or
HS(f) = {x : f(x) ≤ c}.
• Consider any two disjoint sets: when can we
construct a hyperplane that separates the sets?
• Examples in notes.
• If C lies in a half-space defined by H and H
contains a point on the boundary of C, then H is
a supporting hyperplane of C.
16
Convex sets
• A set is convex if every convex combination of two points in the set is also in the set.
• No such thing as a concave set. Related but
different idea to convex/concave functions.
• Formally: a set C in Rn is convex if for all x, y in C and for all t in [0,1] we have tx + (1−t)y in C.
• Any convex set can be represented as
intersection of halfspaces defined by supporting
hyperplanes.
• Any halfspace is a convex set.
17
Separating Hyperplanes 2
• Separating hyperplane theorem: Suppose X, Y are
non-empty convex sets in Rn such that the interior of
Y∩X is empty and the interior of Y is not empty.
Then there exists a vector a in Rn which is the defining
vector of a separating hyperplane between X and Y.
Proof: in texts.
• Applications: general equilibrium theory, the second fundamental theorem of welfare economics: conditions under which a Pareto-optimal allocation can be supported as a price equilibrium. We need convex preferences to guarantee that there is a price ratio (a hyperplane) that can sustain an equilibrium.
18
Graphs
• The graph is what you normally see when
you plot a function.
• Formally: the graph of a function f from V to W is the set of ordered pairs
{(v, w) : v ∈ V, w = f(v)}
19
Derivatives
• We already know from basic calculus that a
necessary condition for x* to be an
unconstrained maximum of a function f is that its
derivative be zero (if the derivative exists) at x*.
• A derivative tells us something about the slope
of the graph of the function.
• We can also think about the derivative as telling
us the slope of the supporting hyperplane to the
graph of f at the point (x,f(x)).
(see notes)
20
Multidimensional derivatives
and gradients
• We can extend what we know about derivatives from
single-dimensional space to multi-dimensional space
directly.
• The gradient of f at x is just the n-dimensional (column) vector which lists all the partial derivatives if they exist:
∇f = (∂f/∂x1, ∂f/∂x2, …, ∂f/∂xn)′
• This n×1 matrix is also known as the Jacobian.
• The derivative of f is the transpose of the gradient.
• The gradient can be interpreted as a supporting
hyperplane of the graph of f.
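The gradient can be approximated numerically by central differences, one partial derivative per coordinate. A sketch (my own helper, not from the notes):

```python
def gradient(f, x, h=1e-6):
    """Central-difference approximation to the gradient of f at x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h          # shift coordinate i up
        xm[i] -= h          # shift coordinate i down
        g.append((f(xp) - f(xm)) / (2 * h))
    return g
```

Eg for f(x) = x1² + 3 x1 x2 the gradient at (1, 2) is (2·1 + 3·2, 3·1) = (8, 3).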
21
Second order derivatives
• We can think about the second derivative of
multidimensional functions directly as in the single
dimension case.
• The first derivative of the function f was an n×1 vector; the second derivative is an n×n matrix known as the Hessian:
∇²f(x) = [ ∂²f(x)/∂x1² … ∂²f(x)/∂x1∂xn ; … ; ∂²f(x)/∂xn∂x1 … ∂²f(x)/∂xn² ]
• If f is twice continuously differentiable (ie all elements of the Hessian exist and are continuous) then the Hessian matrix is symmetric (the second derivatives are the same irrespective of the order of differentiation).
22
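A finite-difference sketch of the Hessian, useful for checking the symmetry property numerically (my own helper, not from the notes):

```python
def hessian(f, x, h=1e-4):
    """Finite-difference approximation to the n x n Hessian of f at x."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                # evaluate f with coordinate i shifted by si*h and j by sj*h
                y = list(x)
                y[i] += si * h
                y[j] += sj * h
                return f(y)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H
```

Eg for f(x) = x1² x2 the Hessian at (1, 3) is [[6, 2], [2, 0]], symmetric as the theorem requires.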
Homogeneous functions
• Certain functions in Rn are particularly well-behaved and have useful
properties that we can exploit without having to prove them every
time.
• A function f: Rn → R is homogeneous of degree k if f(tx1, tx2, …, txn) = t^k f(x) for all t > 0. In practice we will deal with homogeneous functions of degree 0 and degree 1.
Eg: demand function is homog degree 0 in prices (in general
equilibrium) or in prices and wealth: double all prices and income
has no impact on demand.
• Homogeneous functions allow us to determine the entire behavior of
the function from only knowing about the behavior in a small ball
around the origin
Why? Because any point x’ can be written as a scalar multiple of some point x in that ball, so x’ = tx
• If k=1 we say that f is linearly homogeneous.
• Euler’s theorem: if f is h.o.d. k then
x · ∇f(x) = kf(x)
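Euler's theorem can be verified numerically for a particular homogeneous function. A sketch (the Cobb-Douglas example and the helper name are my own choices, not from the slides):

```python
def euler_check(f, x, k, h=1e-6):
    """Return |x . grad f(x) - k f(x)|, with the gradient by central differences."""
    lhs = 0.0
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        lhs += x[i] * (f(xp) - f(xm)) / (2 * h)
    return abs(lhs - k * f(x))

# Cobb-Douglas f(x, y) = x^0.3 * y^0.7 is homogeneous of degree 1
f = lambda x: x[0] ** 0.3 * x[1] ** 0.7
```

At any point, the discrepancy should be zero up to finite-difference error.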
23
Homogeneous functions 2
• A ray through x is the line running through x and the origin, extending forever in both directions.
Formally: a ray is the set {x’ in Rn | x’ = tx, for t in R}
• The gradient of a homogeneous function is essentially the same along any ray (linked by a scalar multiple). Ie the gradient at tx is linearly dependent with the gradient at x.
Thus level sets along any ray have the same slope.
Application: homogeneous utility functions rule out
income effects in demand. (At constant prices,
consumers demand goods in the same proportion as
income changes.)
24
Homothetic functions
• A function f: R+n → R+ is homothetic if f(x) = h(v(x)) where h: R+ → R+ is strictly increasing and v: R+n → R+ is h.o.d. k.
• Application: we often assume that preferences
are homothetic. This gives that indifference sets
are related by proportional expansion along
rays.
• This means that we can deduce the consumer’s
entire preference relation from a single
indifference set.
25
More properties of gradients
(secondary importance)
• Consider a continuously differentiable function, f: Rn → R.
The gradient of f (Df(x)) is a vector in Rn which points in
the direction of greatest increase of f moving from the
point x.
• Define a (very small) vector v s.t. Df(x)’v=0 (ie v is
orthogonal to the gradient). Then the vector v is moving
us away from x in a direction that adds zero to the value
of f(x). Thus, any points on the vector v are at the same
level of f(x). So we have a method of finding the level
sets of f(x) – by solving Df(x)’v=0. Also, v is tangent to
the level set of f(x).
• The direction of greatest increase of a function at a point
x is at right angles to the level set at x.
26
Upper contour sets
• The level sets of a function are the sets of points which yield the same value of the function. Formally, for f: Rn → R a level set is {x : f(x) = c}
Eg: indifference curves are level sets of
utility functions.
• The upper contour set is the set of points on or above the level set, ie the set {x : f(x) ≥ c}.
27
Concave functions
• For any two points, we can trace out the line of
points joining them through tx+(1-t)y, varying t
between 0 and 1. This is a convex
combination of x and y.
• A function is concave if for all x, y in Rn and all t in [0,1]:
f(tx + (1−t)y) ≥ tf(x) + (1−t)f(y)
ie the line joining any two points on the graph is (weakly) below the graph of the function between those two points
• A function is strictly concave if the inequality is
strict for all x,y.
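The defining inequality can be tested directly on a grid of t values. A sketch (the example functions and the helper name are my own, not from the slides):

```python
def chord_below_graph(f, x, y, ts):
    """Check f(tx + (1-t)y) >= t f(x) + (1-t) f(y) at each t (the concavity test)."""
    for t in ts:
        z = [t * xi + (1 - t) * yi for xi, yi in zip(x, y)]
        if f(z) < t * f(x) + (1 - t) * f(y) - 1e-12:
            return False
    return True

# f(x) = -(x1^2 + x2^2) is concave; f(x) = x1^2 + x2^2 is not
concave_f = lambda x: -(x[0] ** 2 + x[1] ** 2)
```

The convex counterpart x1² + x2² fails this test strictly between any two distinct points.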
28
Convex functions
• A function is convex if for all x, y in Rn and all t in [0,1]:
f(tx + (1−t)y) ≤ tf(x) + (1−t)f(y)
ie the line joining any two points on the graph is (weakly) above the graph of the function between the points.
• A function is strictly convex if the inequality is
strict for all x,y.
• A function f is convex if –f is concave.
• The upper contour set of a concave function is a convex set. The lower contour set of a convex function is a convex set.
29
Concavity, convexity and second
derivatives
• If f: R → R and f is C2, then f is concave iff f’’(x) ≤ 0 for all x. (And strictly concave for strict inequality.)
• If f: R → R and f is C2, then f is convex iff f’’(x) ≥ 0 for all x. (And strictly convex for strict inequality.)
30
Concave functions and gradients
• Any concave function lies (weakly) below the tangent hyperplane defined by its gradient (or below a supporting hyperplane defined by a supergradient if f is not C1).
• Any convex function lies (weakly) above the tangent hyperplane defined by its gradient (or above one defined by a subgradient if f is not C1).
• Graphically: function lies below/above line
tangent to graph at any point.
31
Negative and positive (semi-)
definite
• Consider any square symmetric matrix A.
• A is negative semi-definite if x’Ax≤0 for all
x.
If in addition x’Ax=0 implies that x=0, then A
is negative definite.
• A is positive semi-definite if x’Ax ≥ 0 for all x.
If in addition x’Ax = 0 implies that x = 0, then A is positive definite.
32
Principal minors and nsd/psd
• Let A be a square matrix. The k’th order leading
principal minor of A is the determinant of the
kxk matrix obtained by deleting the last n-k rows
and columns.
• An nxn square symmetric matrix is positive
definite if its n leading principal minors are
strictly positive.
• An nxn square symmetric matrix is negative definite if its n leading principal minors alternate in sign, with a11 < 0.
• [There are conditions for getting nsd/psd from
principal minors.]
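A sketch of the leading-principal-minor test (my own helpers, not from the notes; cofactor expansion is only sensible for small matrices):

```python
def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(n))

def leading_principal_minors(A):
    """The n leading principal minors |A_k|, k = 1..n."""
    return [det([row[:k] for row in A[:k]]) for k in range(1, len(A) + 1)]
```

Eg A = [[−2, 1], [1, −2]] has minors [−2, 3]: alternating sign starting negative, so A is negative definite.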
33
Reminder: determinant of a 3x3
matrix
• You won’t have to take the determinant of
a matrix bigger than 3x3 without a
computer, but for 3x3:
A = [a11 a12 a13; a21 a22 a23; a31 a32 a33]
|A| = a11a22a33 + a12a23a31 + a13a21a32 − a13a22a31 − a12a21a33 − a23a32a11
34
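The 3x3 expansion above translates directly into code. A sketch (`det3` is my own name, not from the notes):

```python
def det3(A):
    """Rule-of-Sarrus determinant of a 3x3 matrix (the six-term expansion)."""
    return (A[0][0] * A[1][1] * A[2][2]
            + A[0][1] * A[1][2] * A[2][0]
            + A[0][2] * A[1][0] * A[2][1]
            - A[0][2] * A[1][1] * A[2][0]
            - A[0][1] * A[1][0] * A[2][2]
            - A[1][2] * A[2][1] * A[0][0])
```

Eg the identity matrix has determinant 1, as expected.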
Concavity/convexity and nd/pd
• An easy way to identify whether a function is convex or concave is from the Hessian matrix.
• Suppose f: Rn → R is C2. Then:
• f is strictly concave if the Hessian matrix is negative definite for all x (sufficient but not necessary).
• f is concave iff the Hessian matrix is negative semi-definite for all x.
• f is strictly convex if the Hessian matrix is positive definite for all x (sufficient but not necessary).
• f is convex iff the Hessian matrix is positive semi-definite for all x.
35
Quasi-concavity
• A function is quasi-concave if
f(tx + (1−t)y) ≥ min{f(x), f(y)} for x, y in Rn, 0 ≤ t ≤ 1
• Alternatively: a function is quasi-concave if its upper
contour sets are convex sets.
• A function is strictly quasi-concave if in addition
f(tx + (1-t)y)=min{f(x),f(y)} for 0<t<1 implies that x=y
• All concave functions are quasi-concave (but not vice
versa).
• [Why quasi-concavity? Strictly quasi-concave functions
have a unique maximum.]
36
Quasi-convexity
• A function is quasi-convex if
f(tx + (1-t)y) ≤max{f(x),f(y)} for x,y in Rn, 0≤t≤1
• Alternatively: a function is quasi-convex if its lower contour sets are convex sets.
• A function is strictly quasi-convex if in addition
f(tx + (1-t)y)=max{f(x),f(y)} for 0<t<1 implies that
x=y
• All convex functions are quasi-convex (but not
vice versa).
• [Why quasi-convexity? Strictly quasi-convex functions have a unique minimum.]
37
Bordered Hessian
• The bordered Hessian matrix H is just the Hessian matrix bordered by the gradient (Jacobian) and its transpose:
H = [ 0, ∇f′ ; ∇f, ∇²f ]
ie the first row and first column are 0 followed by the partials ∂f/∂x1, …, ∂f/∂xn, and the remaining block is the Hessian ∇²f.
• If the leading principal minors of H from k=3 onwards alternate in sign with the first such lpm > 0, then f is quasi-concave. If they are all negative, then f is quasi-convex.
38
Concavity and monotonic
transformations
• (Not in the lecture notes, but useful for solving
some of the problem set problems).
• The sum of two concave functions is concave
(proof in PS2).
• Any monotonic transformation of a concave
function is quasiconcave (though not necessarily
concave). Formally, if h(x)=g(f(x)), where f(x) is
concave and g(x) is monotonic, then h(x) is
quasi-concave.
• Useful trick: the ln(x) function is a monotonic
transformation.
39
Unconstrained optimization
• If x* is a solution to the problem maxxf(x), x is in Rn, what
can we say about characteristics of x*?
• A point x is a global maximum of f if for all x’ in Rn, f(x) ≥ f(x’).
• A point x is a local maximum of f if there exists an open ball of positive radius around x, Bε(x), s.t. for all x’ in the ball, f(x) ≥ f(x’).
• If x is a global maximum then it is a local maximum (but
not necessarily vice versa).
• If f is C1, then if x is a local maximum of f, the gradient of f at x = 0. [Necessary but not sufficient.]
This is the direct extension of the single dimension case.
40
Unconstrained optimization 2
• If x is a local maximum of f, then there is an open ball
around x, Bε(x) s.t. f is concave on Bε(x).
• If x is a local minimum of f, then there is an open ball
around x, Bε(x) s.t. f is convex on Bε(x).
• Suppose f is C2. If x is a local maximum, then the
Hessian of f at x is negative semi-definite.
• Suppose f is C2. If x is a local minimum, then the
Hessian of f at x is positive semi-definite.
• To identify a global max, we either solve for all local
maxima and then compare them, or look for additional
features on f that guarantee that any local max are
global.
41
Unconstrained optimization 3
• If f: Rn → R is concave and C1, then Df(x)=0
implies that x is a global maximum of f. (And x
being a global maximum implies that the
gradient is zero.) This is both a necessary and
sufficient condition.
• In general, we only really look at maximization,
since all minimization problems can be turned
into maximization problems by looking at –f.
• x solves max f(x) if and only if x solves min −f(x).
42
Non-differentiable functions
(secondary importance)
• In economics, we rarely have to deal with nondifferentiable functions; normally we assume these away.
• The superdifferential of a concave function f at a point x* is the set of all supporting hyperplanes of the graph of f at the point (x*, f(x*)).
• A supergradient of a function f at a point x* is an
element of the superdiffential of f at x*.
• If x* is an unconstrained local maximum of a function
f:RnR, then the vector of n zeros must be an element
of the superdifferential of f at x*.
• [And equivalently subdifferential, subgradient, local
minimum for convex functions.]
43
Constrained optimization
• General form of constrained optimization
max f(x), x ∈ C
• Normally we write the constraint by writing out restrictions (eg x ≤ 1) rather than using set notation.
• Sometimes (for equality constraints) it is more convenient to solve
problems by substituting the constraint(s) into the objective function,
and so solving an unconstrained optimization problem.
• Most common restrictions: equality or inequality constraints.
max f(x) s.t. g(x) ≤ c
• Eg: Manager trying to induce worker to provide optimal effort (moral
hazard contract).
44
Constrained optimization 2
• No reason why can only have one restriction. Can have
any number of constraints, which may be of any form.
Most typically we use equality and inequality constraints;
these are easier to solve analytically than constraints
that x belong to some general set.
• These restrictions define the constraint set.
• Most general notation, while using only inequality constraints:
maxx f(x) s.t. G(x) ≥ 0
where G(x) is an m×1 vector of inequality constraints (m is the number of constraints).
• Eg: For the restrictions 3x1 + x2 ≤ 10, x1 ≥ 2, we have:
G(x) = (10 − 3x1 − x2, x1 − 2)′
45
Constrained optimization 3
• We will need limitations on the constraint set to guarantee existence of a solution (Weierstrass’ theorem).
• What can happen if constraint set not convex,
closed? (examples)
• Denoting constraint sets:
C = {x ∈ Rn | f(x) − c ≥ 0}
characterizes all values of x in Rn where f(x) ≥ c
46
General typology of constrained
maximization
• Unconstrained maximization. C is just the whole vector
space that x lies in (usually Rn). We know how to solve
these.
• Lagrange Maximization problems. Here the constraint
set is defined solely by equality constraints.
• Linear programming problems. Not covered in this
course.
• Kuhn-Tucker problems. These involve inequality
constraints. Sometimes we also allow equality
constraints, but we focus on inequality constraints. (Any
problem with equality constraints could be transformed
by substitution to deal only with inequality constraints.)
47
Lagrange problems
• Covered briefly here, mostly to compare and contrast
with Kuhn-Tucker.
• Canonical Lagrange problem is of form:
maxx f(x) s.t. G(x) = 0, x ∈ Rn, G: Rn → Rm
• Often we have a problem with inequality constraints, but
we can use economic logic to show that at our solution
the constraints will bind, and so we can solve the
problem as if we had equality constraints.
• Eg: Consumer utility maximization; if utility function is
increasing in all goods, then consumer will spend all
income. So budget constraint px≤w becomes px=w.
48
Lagrange problems 2
• Lagrange theorem: in the canonical Lagrange problem (CL) above, suppose that f and G are C1 and suppose that the n×m matrix DG(x*) has rank m. Then if x* solves CL, there exists a vector λ* in Rm such that Df(x*) + DG(x*)λ* = 0. Ie:
∇f(x*) + Σj=1…m λ*j ∇gj(x*) = 0
• This is just a general form of writing what we know from
solving Lagrange problems: we get n FOCs that all equal
zero at the solution.
• The rank m requirement is called “constraint qualification”; we will come back to this with Kuhn-Tucker. But this is a necessary (not sufficient) condition for the existence of Lagrange multipliers.
49
Basic example:
• max f(x1,x2) s.t. g1(x1,x2) = c1, g2(x1,x2)=c2
• L = f(x1,x2)+λ1(g1(x1,x2)-c1)+λ2(g2(x1,x2)-c2)
• FOCs:
x1: f1(x1,x2) + λ1g11(x1,x2) + λ2g21(x1,x2) =0
x2: f2(x1,x2) + λ1g12(x1,x2) + λ2g22(x1,x2) =0
• Plus constraints:
λ1: g1(x1,x2) – c1 = 0
λ2: g2(x1,x2) – c2 = 0
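As a concrete check of the method, here is a hypothetical one-constraint instance (the objective x1x2 and constraint x1 + x2 = 10 are my own choices, not from the slides), solved analytically in the comments and verified by substituting the constraint and searching a grid:

```python
# Hypothetical instance: max x1*x2 s.t. x1 + x2 = 10.
# L = x1*x2 + lam*(x1 + x2 - 10)
# FOCs: x2 + lam = 0, x1 + lam = 0  =>  x1 = x2
# The constraint then gives x1 = x2 = 5 (and lam = -5).
# Verify by substituting x2 = 10 - x1 and searching a grid on x1:
values = [(x1 * (10 - x1), x1) for x1 in [i / 100 for i in range(1001)]]
best_value, best_x1 = max(values)
```

The grid search picks out x1 = 5 with value 25, matching the FOC solution.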
50
Lagrange problems 3
• We can also view the FOCs from the theorem
as:
∇f(x*) = −Σj=1…m λ*j ∇gj(x*)
• Ie we can express the gradient of the objective
function as a linear combination of the gradients
of the constraint functions, where the weights
are determined by λ*. (see diagram in notes)
• Note that no claims are made about the sign of
λ* (but sign will be more important in KT).
51
Kuhn Tucker 1
• The most common form of constrained
optimization in economics takes the form:
maxx f(x) s.t. G(x) ≥ 0, x ≥ 0, x ∈ Rn, G: Rn → Rm
• (Note that we can include non-negativity
constraints inside the G(x) vector, or not.)
• Examples: utility maximization.
• Cost minimization
52
Kuhn Tucker 2
• Key problem with inequality constraints: solution to
problem might be on boundary of constraint, or might be
internal. (see diagram in notes)
• Main advance of KT: sets up necessary conditions for
optimum in situations where constraints bind, and for
situations where they don’t. Then compare between
these cases.
• Basic idea: if a constraint binds at a solution, then the value of the function must decrease as we move away from the constraint. So if the constraint x ≤ c binds, we can’t be at a maximum unless f’(x) ≥ 0 at that point. If the constraint x ≥ c binds, we can’t be at a maximum unless f’(x) ≤ 0 at that point. Otherwise, we could increase the value of the function without violating any of the constraints.
53
Kuhn-Tucker 3
• We say a weak inequality constraint is binding if the constraint
holds with equality.
• Unlike Lagrange problems, in KT problems constraints might bind at a solution, or they might not (if we have an interior solution). If a particular constraint does not bind, then its multiplier is zero; if the constraint does bind, then the multiplier is non-zero (and is >0 or <0 depending on our notational formulation of the problem).
• We can think of the multiplier on a constraint as being the shadow
value of relaxing that constraint.
• Main new thing to deal with: complementary slackness conditions.
Complementary slackness conditions are a way of saying that either
a) a particular constraint is binding (and so the respective multiplier
for that constraint is non-zero), which implies a condition on the
slope of the function at the constraint (it must be increasing towards
the constraint)
b) a constraint does not bind (so we must be in an internal solution,
with a FOC that equals zero).
54
Example 1
• Max f(x) s.t. 10 − x ≥ 0, x ≥ 0
L: f(x) + λ(10 − x)
FOCs
x: f’(x) − λ ≤ 0
λ: 10 − x ≥ 0
CS:
CS1: (f’(x)-λ)x=0
CS2: (10-x)λ=0
55
Example 1, contd
• Case 1, strict interior. x>0, x<10
From CS2, we have λ=0.
From CS1, we have f’(x) = 0. (ie unconstrained
optimum)
• Case 2, left boundary, x=0.
From CS2, we have λ=0.
From FOC1 (x) we need f’(x) ≤ 0.
• Case 3, right boundary, x = 10.
From CS1, we have f’(x) = λ, and we know λ ≥ 0 by construction, so we must have f’(x) ≥ 0.
• Thus, we can use the KT method to reject any
candidate cases that don’t have the right slope.
56
Solving KT problems
• Two methods, basically identical but slightly different in how they
handle non-negativity constraints.
• Method 1 (treat non-negativity constraints as different from other
conditions)
• Write the Lagrangean with a multiplier for each constraint other than the non-negativity constraints on choice variables. If we write the constraints in the Lagrangean as g(x) ≥ 0, we should add (not subtract) the multipliers in the Lagrangean and assume the multipliers λ ≥ 0; this will make the FOCs for x non-positive and the FOCs for the multipliers λ non-negative.
• Take FOCs for each choice variable and each multiplier.
• Take CS conditions from the FOC for each choice variable that has
a non-negativity constraint, and for each multiplier.
• Take cases for different possibilities of constraints binding; reject
infeasible cases, compare feasible cases.
57
Solving KT problems 2
• Second method: treat non-negativity constraints as the
same as any other constraint; functionally the same but
doesn’t take shortcuts.
• Write the Lagrangean with a multiplier for each
constraint. This will give us more multipliers than the
previous method.
• Take FOCs for each choice variable and each multiplier.
• Take CS conditions for each multiplier. This gives us the
same number of CS conditions as the previous method.
• Take cases for different possibilities of constraints
binding; reject infeasible cases, compare feasible cases.
58
Example 2, method 1
• Max x² s.t. x ≥ 0, x ≤ 2
L: x² + λ(2 − x)
FOCs:
x: 2x − λ ≤ 0
λ: (2 − x) ≥ 0
CS:
(2x − λ)x = 0
(2 − x)λ = 0
• Case 1, internal solution, x>0, λ=0: contradiction from
FOC1 rules this case out.
• Case 2, left boundary, x=0, λ=0. Consistent, but turns
out to be a minimum.
• Case 3, right boundary, λ > 0, x>0. CS2 implies x=2.
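The case analysis can be confirmed by brute force over the feasible set (a sketch, not part of the slides):

```python
# Brute-force check of Example 2: max x^2 on the constraint set 0 <= x <= 2.
f = lambda x: x * x
grid = [i / 1000 for i in range(2001)]   # feasible x in [0, 2]
x_star = max(grid, key=f)
lam = 2 * x_star                          # from the binding FOC 2x - lam = 0
```

The maximizer is x* = 2 with λ = 4 > 0, so the constraint x ≤ 2 binds (Case 3); x = 0 (Case 2) is the minimum, as the slide notes.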
59
Example 2, method 2
• Max x² s.t. x ≥ 0, x ≤ 2
L: x² + λ1(2 − x) + λ2x
FOCs:
x: 2x − λ1 + λ2 ≤ 0
λ1: (2 − x) ≥ 0
λ2: x ≥ 0
CS:
(2 − x)λ1 = 0
xλ2 = 0
• Case 1, internal solution, λ1=0, λ2=0: From FOC1, consistent only if
x=0 (ie actually case 2)
• Case 2, left boundary, λ1=0, λ2>0: From CS2, x=0. Consistent, but
turns out to be a minimum.
• Case 3, right boundary, λ1 > 0, λ2=0. CS1 implies x=2.
• Case 4, λ1 > 0, λ2>0: from CS1 and CS2, clearly contradictory
(0=x=2).
60
Sign issues
• There are multiple ways of setting up the KT Lagrangean
using different signs.
• One way is as above: in the Lagrangean, add the λg(x) terms, write the g(x) terms as ≥ 0, and assume the multipliers λi ≥ 0, which implies that the FOC terms are ≤ 0 for choice variables and ≥ 0 for multipliers. The lecture notes (mostly) use this method.
• Another way is to subtract the λg(x) terms in L, write the g(x) terms as ≤ 0, and assume λi ≥ 0, which implies the FOC terms are ≤ 0 for choice variables and ≥ 0 for multipliers.
SB uses this method.
SB uses this method.
• Whatever method you choose, be consistent.
61
Example 1, SB signing
• Max f(x) s.t. 10 − x ≥ 0, x ≥ 0
L: f(x) − λ(x − 10)
FOCs
x: f’(x) − λ ≤ 0
λ: −(x − 10) ≥ 0
CS:
CS1: (f’(x) − λ)x = 0
CS2: −(x − 10)λ = 0
62
Kuhn Tucker 4
• Formal treatment: start with the Lagrangian. When f: R+n → R and G: R+n → Rm, the Lagrangian of the KT problem is a new function L: R+n+m → R.
L(x, λ) = f(x) + λ′G(x)
• Important to note the domain limit on L; the multipliers are non-negative (and so is x).
(We could rewrite the problem restricting the multipliers to be negative by changing the + in the Lagrangian to −.)
(We could also rewrite the problem without the implicit
non-negativity constraints; in general KT problems not in
economic settings, we need not require x non-negative.)
63
Kuhn-Tucker 5
• As in the Lagrange method case, we can rewrite
the Lagrangian as:
L(x, λ) = f(x) + Σj=1…m λj gj(x)
decomposing G into its components.
• For any fixed point x*, define indices of G:
K = {i:gi(x*)=0} and M = {i:x*i>0}.
• Define:
H(x*) = ∂M GK(x*)
by differentiating the K components of G wrt the components j in M. This is an |M|×|K| matrix.
64
Kuhn Tucker Theorem
• Suppose that x* solves the canonical KT as a local
maximum and suppose that H(x*) has maximal rank
(Constraint Qualification). Then there exists λ*0 s.t.:
L( x*,  *)
x*  0
x *

L( x*,  *)
x*  0
x *
L( x*,  *)
x*  0

 *'
L( x*,  *)
  *' G ( x*)  0

(ie FOCs for choice vbles)
for i=1,..n; (ie CS conditions for
non-negativity constraints)
(ie FOCs for multipliers)
(ie CS conditions for multipliers)
65
KT theorem notes
• The constraint qualification (H(x*) has maximal
rank) is complex and is typically ignored. But
technically we need this to guarantee the
theorem, and that the solution method yields
actual necessary conditions
• These are necessary conditions for a solution.
Just because they are satisfied does not mean
we have solved the problem; we could have
multiple candidate solutions, or multiple
solutions, or no solution at all (if no x* exists).
66
KT and existence/uniqueness
• Suppose G(x) is concave and f(x) is strictly quasi-concave (or G(x) strictly concave and f(x) quasi-concave); then if x* solves KT, x* is unique. Furthermore, if {x : G(x) ≥ 0, x ≥ 0} is compact and non-empty and f(x) is continuous, then there exists an x* which solves KT.
• Proof: Existence from Weierstrass theorem. For uniqueness: suppose there are some x, x’ that both solve KT. Then f(x) = f(x’) and G(x) ≥ 0, G(x’) ≥ 0. Since G is concave, for t in [0,1] we have G(tx + (1−t)x’) ≥ tG(x) + (1−t)G(x’) ≥ 0. So tx + (1−t)x’ is feasible for KT. But f strictly quasi-concave implies f(tx + (1−t)x’) > min{f(x), f(x’)} = f(x). So we have a feasible x’’ = tx + (1−t)x’ which does better than x and x’, which contradicts x, x’ both being optimal solutions.
67
The constraint qualification
Consider the problem:
max x1 s.t. (1 − x1)³ − x2 ≥ 0, x1 ≥ 0, x2 ≥ 0.
(see picture in notes, (1,0) is soln)
At the solution, x2 ≥ 0 is a binding constraint.
Note that the gradient of the first constraint at (1,0) is
Dg(1,0) = (−3(1 − x1)², −1)′ = (0, −1)′ at the soln.
Together with the gradient (0, 1)′ of the binding constraint x2 ≥ 0, this gives an H* matrix of
[ 0 −1 ; 0 1 ]
which has a rank of 1.
• The gradient of f(1,0) is (1,0), which cannot be
expressed as a linear combination of (0,1) or (0,-1). So
no multipliers exist that satisfy the KT necessary
conditions.
68
Non-convex choice sets
• Sometimes we have non-convex choice
sets; typically these lead to multiple local
optima.
• In these cases, we can go ahead and
solve the problem separately in each case
and then compare. OR we can solve the
problem simultaneously.
69
Example: labour supply with
overtime
• Utility function U(c,l) = c^α·l^β
• Non-negativity constraint on consumption.
Time constraints l ≥ 0 and 24 – l ≥ 0 on
leisure (note l is leisure, not labour).
• Overtime means that the wage rate = w per
hour for the first 8 hours, 1.5w per hour for
extra hours. This means:
c ≤ w(24-l) for l ≥ 16
c ≤ 8w + 1.5w(16-l) for l ≤ 16.
70
Overtime 2
• The problem is that we have different functions for the
boundary of the constraint set depending on the level of
l. The actual problem we are solving has either the first
constraint OR the second constraint; if we tried solving
the problem by maximising U(x) s.t. both constraints for
all l then we would solve the wrong problem.
(see figures in notes)
• To solve the problem, note that the complement of the
constraint set is convex.
c  w(24-l)
for l  16
c  8w + 1.5w(16-l)
for l ≤ 16
• So consider the constraint set given by:
(c – w(24-l))(c-8w-1.5w(16-l))  0
(see figure in notes)
71
Overtime 3
• Then, without harm we could rewrite the problem as:
maxc,l cαlβ s.t.
c 0, l 0, 24-l 0
-(c – w(24-l))(c-8w-1.5w(16-l))  0
• Note that this is not identical to the original problem (it
omits the bottom left area), but we can clearly argue that
the difference is harmless, since the omitted area is
dominated by points in allowed area.
• Note that if x* solves max f(x), it also solves max g(f(x))
where g(.) is a monotonic increasing transformation.
• So let's max log(c^α·l^β) instead, s.t. the same constraints.
• This gives the Lagrangean:
L: αlog(c)+βlog(l)+μ(24-l)–λ(c-w(24-l))(c-(8w+1.5w(16-l)))
72
Overtime 4
• We can use economic and mathematical logic to
simplify the problem. First, note that since the
derivative of the log function is infinity at c=0 or
l=0, this clearly can’t be a solution, so μ=0 at any
optimum and we can ignore CS conditions on c
and l.
• So rewrite Lagrangean dropping μ term:
L: αlog(c)+βlog(l)–λ(c-w(24-l))(c-(8w+1.5w(16-l)))
• Now lets look at the FOCs.
73
Overtime 5
• FOCs:
c: α/c − λ[(c − 8w − (3w/2)(16−l)) + (c − w(24−l))] = 0
l: β/l − λ[w(c − 8w − (3w/2)(16−l)) + (3w/2)(c − w(24−l))] = 0
λ: −(c − 8w − (3w/2)(16−l))(c − w(24−l)) ≥ 0
• CS condition:
−λ(c − 8w − (3w/2)(16−l))(c − w(24−l)) = 0
noting that the equalities occur in the FOCs because we
argued that non-negativity constraints for c and l don’t
bind.
74
Overtime 6
• If l and c were such that the FOC for λ were strictly
negative, we must have λ=0 by CS, but this makes the
first two FOCs impossible to satisfy.
So (c-8w-1.5w(16-l))=0 and/or (c-w(24-l))=0
In other words, we can’t have an internal solution to the
problem (which is good, since these are clearly
dominated).
• Case 1: (c - w(24-l)) = 0 (no overtime worked)
From the first two FOCs we get αwl = βc, which with
c = 24w - wl gives us l = 24β/(α+β) and c = 24wα/(α+β).
• Case 2: (c - 8w - 1.5w(16-l)) = 0 (overtime)
From the first two FOCs we get 3αwl = 2βc, which we can
combine with c = 8w + 1.5w(16-l) to get an expression for
c in terms of parameters.
• The actual solution depends on the particular parameters of the
utility function (graphically it could be either).
75
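The two cases can be compared numerically. A rough sketch (not the lecture's analytic route) maximises U(c,l) along the kinked budget frontier; the parameter values alpha, beta, w are illustrative assumptions:

```python
# Grid-search cross-check of the labour-supply-with-overtime example.
alpha, beta, w = 0.5, 0.5, 1.0   # assumed illustrative parameters

def frontier_c(l):
    # best attainable consumption at leisure l (overtime pays 1.5w below l=16)
    return w * (24 - l) if l >= 16 else 8 * w + 1.5 * w * (16 - l)

def utility(l):
    return frontier_c(l) ** alpha * l ** beta

best_l = max((i / 1000 for i in range(1, 24000)), key=utility)  # crude grid
print(abs(best_l - 32 / 3) < 0.01)  # True: overtime case, 3*alpha*w*l = 2*beta*c
```

With these symmetric parameters the overtime case binds: solving 3αwl = 2βc with c = 8w + 1.5w(16-l) gives l = 32/3, which the grid search reproduces.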
The cost minimization problem
• Cost minimization problem: what is the cheapest
way to produce at least y output from x inputs at
input price vector w.
• C(y,w) = -max_x –w’x s.t. f(x) ≥ y, y ≥ 0, x ≥ 0.
• If f(x) is a concave function, then the set
{x : f(x) ≥ y} is a convex set (since this is an upper
contour set).
• To show that C(y,w) is convex in y:
Consider any two levels of output y, y’ and define
yt=ty+(1-t)y’ (ie convex combination).
76
Convexity of the cost function
• Let x be a solution to the cost minimization
problem for y, xt for yt, x’ for y’.
• Concavity of f(x) implies:
f(tx+(1-t)x’) ≥ tf(x) + (1-t)f(x’).
• Feasibility implies: f(x) ≥ y, f(x’) ≥ y’.
• Together these imply:
f(tx+(1-t)x’) ≥ tf(x) + (1-t)f(x’) ≥ ty + (1-t)y’ = yt
• So the convex combination tx+(1-t)x’ is
feasible for yt.
77
Convexity of the cost fn 2
• By definition:
C(y,w) = w’x
C(y’,w) = w’x’
C(yt,w) = w’xt
• But C(yt,w) = w’xt ≤ w’(tx+(1-t)x’) = tw’x + (1-t)w’x’
= tC(y,w) + (1-t)C(y’,w)
where the inequality comes since xt solves
the problem for yt.
• So C(.) is convex in y.
78
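The inequality just derived can be illustrated with a concrete technology. A minimal numeric sketch, assuming the one-input concave production function f(x) = x**0.5 with input price w = 1 (so cost minimisation gives C(y) = y² in closed form):

```python
# Check the convex-combination inequality for the cost function.
def cost(y):
    return y ** 2   # cheapest x with sqrt(x) >= y is x = y**2, at w = 1

y, y_prime, t = 1.0, 3.0, 0.25
y_t = t * y + (1 - t) * y_prime
lhs = cost(y_t)                              # C(yt, w)
rhs = t * cost(y) + (1 - t) * cost(y_prime)  # t*C(y,w) + (1-t)*C(y',w)
print(lhs <= rhs)  # True: the cost function is convex in y
```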
Implicit functions
(SB easier than lecture notes)
• So far we have been working only with functions in which
the endogenous variables are explicit functions of the
exogenous or independent variables.
Ie y = F(x1,x2,…xn)
• This is not always the case; frequently we have
economic situations with exogenous variables mixed in
with endogenous variables.
G(x1,x2,…xn,y)=0
• If for each x vector this equation determines a
corresponding value of y, then this equation defines an
implicit function of the exogenous variables x.
• Sometimes we can solve the equation to write y as an
explicit function of x, but sometimes this is not possible,
or it is easier to work with the implicit function.
79
Implicit functions 2
• 4x + 2y = 5 expresses y as an implicit function of
x. Here we can easily solve for the explicit
function.
• y² - 5xy + 4x² = 0 expresses y implicitly in terms of x.
Here we can also solve for the explicit
relationship using the quadratic formula [but it is
a correspondence, not a function]: y = 4x OR y = x.
• y⁵ - 5xy + 4x² = 0 cannot be solved into an explicit
function, but it still implicitly defines y in terms of x.
Eg: x=0 implies y=0; x=1 implies y=1.
80
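Even without a closed form, the implicit function can be evaluated numerically. A sketch (bracket chosen by inspection for this example) that recovers one branch of y(x) from y⁵ - 5xy + 4x² = 0 by bisection:

```python
# Numerically evaluate an implicit function that has no explicit solution.
def G(x, y):
    return y ** 5 - 5 * x * y + 4 * x ** 2

def implicit_y(x, lo, hi, tol=1e-10):
    # simple bisection: assumes G(x, lo) and G(x, hi) have opposite signs
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if G(x, lo) * G(x, mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

y_at_half = implicit_y(0.5, 0.0, 0.5)  # y(0.5) on the branch through (0, 0)
print(abs(G(0.5, y_at_half)) < 1e-8)   # True: it satisfies the equation
```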
Implicit functions 3
• Consider a profit-maximizing firm that uses a single input
x at a cost of w per unit to make a single output y using
technology y=f(x), and sells the output for p per unit.
Profit function: π(x)=pf(x)-wx
FOC: pf’(x)-w=0
• Think of p and w as exogenous variables. For each
choice of p and w, the firm will choose a value of x that
satisfies the FOC.
To study profit-maximising behaviour in general, we need
to work with this FOC defining x as an implicit function of
p and w.
• In particular, we will want to know how the choice of x
changes in response to changes in p and w.
81
Implicit functions 4
• An implicit function (or correspondence) of y in
terms of x does not always exist, even if we can
write an equation of the form G(x,y)=c
Eg: x2+y2=1. When x>1 there is no y that
satisfies this equation. So there is no implicit
function mapping x’s greater than 1 into y’s.
• We would like to have general conditions
telling us when an implicit function exists.
82
Implicit functions 5
• Consider the problem:
max_{x≥0} f(x;q)
s.t. G(x;q) ≥ 0
where q is some k-dimensional vector of exogenous real
numbers.
• Call a solution to this problem x(q), and the value the
solution attains V(q) = f(x(q);q).
• Note that x(q) may not be unique, but V(q) is still well-defined
(ie there may be multiple x’s that maximise the
function, but they all give the same value; otherwise
some wouldn’t solve the maximisation problem)
• Interesting question: how do V and x change with q?
• We have implicitly defined functions mapping q’s to V’s.
83
Implicit functions 6
• The problem above really describes a family of optimization
problems; each different value of the q vector yields a
different member of the family (ie a different optimization
problem).
• The FOCs from KT suggest that it will be useful to be able to
solve generally systems of equations of the form
T(z, q) = 0, z ∈ Rk
where z : Rp → Rk, T : Rk+p → Rk
(why? Because the FOCs constitute such a system.)
• Eg: finding the equation for a level set is to find z(q) such
that T(z(q),q) - c = 0. Here, z(q) is an implicit function.
• As noted previously, not all systems provide implicit functions.
Some give correspondences, or give situations where there is
no mapping x(q).
• The implicit function theorem tells us when it is possible to
find an implicit function from a system of equations.
84
Implicit function theorem
(for system of equations)
• Let T:Rk+pRk be C1. Suppose that
T(z*,q*)=0. If the kxk matrix formed by
stacking the k gradient vectors (wrt z) of
T1,T2,…Tk is invertible (or equivalently has
full rank or is non-singular), then there
exist k C1 functions each mapping Rp Rk
such that:
z1(q*)=z1*, z2(q*)=z2*, …. zk(q*)=zk* and
T(z(q),q) = 0 for all q in Bε(q*) for some
ε>0.
85
IFT example
• Consider the utility maximisation problem:
max_{x in Rn} U(x) s.t.
p’x ≤ I, U strictly quasiconcave, DU(x) > 0, dU(0)/dxi = ∞.
• We know a solution to this problem satisfies xi >
0 (because of dU(0)/dxi = ∞) and I - p’x = 0 (because
DU(x) > 0) and the FOCs:
dU/dx1 - λp1 = 0
…
dU/dxn - λpn = 0
I - p’x = 0
86
IFT example contd
• This system of equations maps from the
space R2n+2 (because x and p are nx1, λ
and I are scalars) to the space Rn+1 (the
number of equations).
• To apply the IFT, set z = (x, λ), q=(p,I)
Create a function T : R2n+2 → Rn+1 given by:
T(x, λ, p, I) = [ ∂U(x)/∂x1 − λp1, …, ∂U(x)/∂xn − λpn, I − p’x ]’
87
IFT example contd
• If this function T is C1 and if the (n+1)×(n+1) matrix
of derivatives of T (wrt x and λ) is invertible, then
by the IFT we know that there exist n+1 C1
functions:
x1(p,I), x2(p,I), …. xn(p,I), λ(p,I)
s.t. T(x1(p,I), x2(p,I), …. xn(p,I), λ(p,I)) = 0 for all
p,I in a neighborhood of a given price income
vector (p,I).
• Ie, the IFT gives us the existence of continuously
differentiable consumer demand functions.
88
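A concrete instance helps: with an assumed Cobb-Douglas example U(x) = a·ln(x1) + b·ln(x2) (not in the slides, but satisfying the stated assumptions), the well-known closed-form demands make every component of the FOC system T vanish:

```python
# Verify that closed-form Cobb-Douglas demands solve the FOC system T = 0.
a, b = 0.4, 0.6                # assumed utility parameters
p1, p2, I = 2.0, 5.0, 100.0    # assumed prices and income

x1 = a / (a + b) * I / p1      # demand for good 1
x2 = b / (a + b) * I / p2      # demand for good 2
lam = (a + b) / I              # marginal utility of income

T = (a / x1 - lam * p1,        # dU/dx1 - lam*p1
     b / x2 - lam * p2,        # dU/dx2 - lam*p2
     I - p1 * x1 - p2 * x2)    # I - p'x
print(all(abs(v) < 1e-9 for v in T))  # True
```

The IFT then says these demands vary continuously and differentiably in (p, I) near any such point.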
Theorem of the maximum
• Consider the family of Lagrangian
problems:
V(q) = max_x f(x;q) s.t. G(x;q) = 0
This can be generalized to KT by
restricting attention only to constraints that
are binding at a given solution.
• Define the function T : Rn+m+p → Rn+m by
T(x, λ; q) = [ ∇f(x;q) + λ’∇G(x;q) ; G(x;q) ]’
89
Theorem of the Maximum 2
• The FOCs for this problem at an optimum are
represented by T(x*,λ*;q)=0. We want to know
about defining the solutions to the problem, x*
and λ*, as functions of q.
• The IFT already tells when we can do this: if the
(n+m)x(n+m) matrix constructed by taking the
derivative of T wrt x and λ is invertible, then we
can find C1 functions x*(q) and λ*(q) s.t. T(x*(q’),
λ*(q’);q’)=0 for q’ in a neighborhood of q.
• Ie we need the matrix below to have full rank:
DT(x, λ; q) = [ ∇²x f(x;q) + λ’∇²x G(x;q)   ∇x G(x;q)’ ]
              [ ∇x G(x;q)                    0          ]
90
Theorem of the Maximum 3
• Suppose the Lagrange problem above
satisfies the conditions of the implicit
function theorem at x*(q*),q*. If f is C1 at
x*(q*),q*, then V(q) is C1 at q*.
• Thus, small changes in q around q* will
have small changes in V(q) around V(q*).
91
Envelope Theorem
• Applying the IFT to our FOCs means we know (under
conditions) that x(q) that solves our FOCs exists and is
C1, and that V(.) is C1.
• The envelope theorem tells us how V(q) changes in
response to changes in q.
• The basic answer from the ET is that all we need to do is
look at the direct partial derivative of the objective
function (or of the Lagrangian for constrained problems)
with respect to q.
• We do not need to reoptimise and pick out different x(q)
and λ(q), because the fact that we were at an optimum
means these partial derivs are already zero.
92
Envelope theorem 2
• Consider the problem:
maxxf(x;q) s.t. G(x;q)≥0, G:Rn→Rm.
where q is a p-dimensional vector of exogenous
variables.
Assume that, at a solution, the FOCs hold with equality
and that we can ignore the CS conditions.
(Or assume that we only include constraints that bind at
the solution in G() )
• Suppose that the problem is well behaved, so we have
that at a particular value q*, the solution x(q*), λ(q*) are
C1 and V(q*)=f(x(q*);q*) is C1.
(Note that we could get these from the IFT and the
Theorem of the Maximum)
93
Envelope theorem 3
• Suppose the problem above satisfies the
conditions of the IFT at x*(q*). If f is C1 at
x*(q*),q* then:
∇qV(q*) = ∂L(x(q*), λ(q*); q*)/∂q
= ∂f(x(q*); q*)/∂q + λ(q*)’·∂G(x(q*); q*)/∂q
ie the derivative of the value function V(q) is
equal to the partial derivative of the Lagrangean with respect to q
94
Envelope theorem 4
• So, to determine how the value function changes, we
merely need to look at how the objective function and
constraint functions change with q directly.
• We do not need to include the impact of changes in the
optimization variables x and λ, because we have already
optimized L(x,λ,q) with respect to these.
• So, for an unconstrained optimization problem, the effect
on V(.) is just the derivative of the objective function.
• For a constrained optimization problem, we also need to
add in the effect on the constraint. Changing q could
affect the constraint (relaxing or tightening it), which we
know has shadow value λ.
• Proof is in lecture notes.
95
Envelope theorem example
• Consider a problem of the form:
max_x f(x) s.t. q - g(x) ≥ 0
• Thus, as q gets bigger, the constraint is easier to satisfy.
What would we gain from a small increase in q, and thus
a slight relaxation of the constraint?
• The Lagrangian is L(x,λ;q) = f(x)+ λ(q-g(x))
• The partial deriv of the Lagrangian wrt q is λ. Thus,
dV(q)/dq = λ.
• A small increase in q increases the value by λ.
Thus, the lagrange multiplier is the shadow price. It
describes the price of relaxing the constraint.
• If the constraint does not bind, λ=0 and dV(q)/dq = 0.
96
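The shadow-price claim is easy to check numerically. A toy example (assumed here, not from the notes): max -(x-2)² s.t. q - x ≥ 0, evaluated at q = 1 where the constraint binds, so x*(q) = q and λ = f'(x*) = 2(2-q):

```python
# Check dV/dq = lambda for a binding constraint.
def V(q):
    x = min(q, 2.0)            # the unconstrained optimum is x = 2
    return -(x - 2.0) ** 2

q = 1.0
lam = 2.0 * (2.0 - q)          # from the FOC f'(x) - lam = 0
h = 1e-6
dV_dq = (V(q + h) - V(q - h)) / (2 * h)   # numerical derivative of V
print(abs(dV_dq - lam) < 1e-6)  # True: the multiplier is the shadow price
```

For q > 2 the constraint stops binding, λ = 0, and V is flat, matching the last bullet.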
Envelope theorem example 2
• We can use the envelope theorem to show
that in the consumer max problem, λ is the
marginal utility of income.
• Consider the cost min problem:
C(y,w) = -max_x -w’x
s.t. f(x) - y ≥ 0.
• Lagrangian is: L(x,λ;y,w) = -w’x + λ(f(x)-y)
Denote the optimal solution to be x(y,w).
• From the ET, we get:
∂C(y,w)/∂wi = −∂L(x, λ; y, w)/∂wi = xi(y,w)
97
ET example 2, contd
• This is known as Shephard’s lemma; the partial
derivative of the cost function with respect to wi
is just xi, the demand for factor i.
• Also note that:
∂²C(y,w)/∂wj∂wi = ∂xi(y,w)/∂wj = ∂xj(y,w)/∂wi = ∂²C(y,w)/∂wi∂wj
ie the change in demand for factor i with respect
to a small change in price of factor j is equal to
the change in demand for factor j in response to
a small change in the price of factor i.
98
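Shephard's lemma can be verified on a concrete technology. A quick sketch assuming the one-input production function f(x) = x**0.5, for which cost minimisation gives factor demand x(y,w) = y² and cost C(y,w) = w·y²:

```python
# Check Shephard's lemma: dC/dw equals the factor demand x(y, w).
y, w = 3.0, 2.0                # assumed output level and input price

def cost(y, w):
    return w * y ** 2          # C(y, w), with x(y, w) = y**2

h = 1e-6
dC_dw = (cost(y, w + h) - cost(y, w - h)) / (2 * h)
x_demand = y ** 2
print(abs(dC_dw - x_demand) < 1e-6)  # True
```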
Correspondences
• A correspondence is a transformation that
maps a vector space into collections of subsets
of another vector space.
• Eg: a correspondence F : Rn → R takes any n-dimensional
vector and gives as its output a
subset of R. If this subset has only one
element for every input vector, then the
correspondence is also a function.
• Examples of correspondences: solution to the
cost minimization problem, or the utility
maximization problem.
99
Correspondences 2
• A correspondence F is bounded if for all x and for all y in
F(x), the size of y is bounded. That is, ||y||≤M for some
finite M. For bounded correspondences we have the
following definitions.
• A correspondence F is convex-valued if for all x, F(x) is
a convex set. (All functions are convex-valued
correspondences).
• A correspondence F is upper hemi-continuous at a
point x if: for all sequences {xn} that converge to x, and
all sequences {yn} such that yn in F(xn) converge to y,
then y is in F(x).
• For bounded correspondences, if a correspondence is
uhc for all x, then its graph is a closed set.
100
Correspondences 3
• A correspondence F is lower hemicontinuous at a point x, if for all
sequences {xn} that converge to x and for
all y in F(x), there exists a sequence {yn}
s.t. yn is in F(xn) and the sequence
converges to y.
• See figure in notes.
101
Fixed point theorems
• A fixed point of a function f : Rn → Rn is a point x
such that x = f(x). A fixed point of a
correspondence F : Rn → Rn is a point x such
that x is an element of F(x).
• Solving a set of equations can be described as
finding a fixed point. (Suppose you are finding x
to solve f(x) = 0. Then you are looking for a
fixed point in the function g(x), where g(x) = x +
f(x), since for a fixed point x* in g, x* = g(x*) = x*
+ f(x*), so f(x*) =0.)
• Fixed points are crucial in proofs of existence of
equilibria in GE and in games.
102
Fixed point theorems
• If f:RR, then a fixed point of f is any
point where the graph of f crosses the 45
degree line (ie the line f(x)=x).
• A function can have many fixed points, a
unique fixed point, or none at all.
• When can we be sure that a function
possesses a fixed point? We use fixed
point theorems.
103
Brouwer fixed point theorem
• Suppose f:RnRn and for some convex,
compact set C (that is a subset of Rn) f
maps C into itself. (ie if x is in C, then f(x)
is in C). If f is continuous, then f
possesses a fixed point.
• Continuity
• Convexity of C
• Compactness of C
• C maps into itself.
104
Kakutani fixed point theorem
• Suppose F:RnRn is a convex-valued
correspondence, and for some convex
compact set C in Rn, F maps C into itself.
(ie if x is in C, then F(x) is a subset of C).
If F is upper hemicontinuous, then F
possesses a fixed point.
• These FPTs give existence. To get
uniqueness we need something else.
105
Contraction mappings
• Suppose f:RnRn such that
||f(x)=f(y)|| ≤ θ ||x-y|| for some θ < 1 and for all
x,y. Then f ix a contraction mapping.
• Let C[a,b] be the set of all continuous functions
f:[0,1]R with the “supnorm” metric ||f)|| = maxx
in [a,b] f(x). Suppose T:CC (that is, T takes a
continuous function, does something to it and
returns a new, possibly different continuous
function). If, for all f,g in C, ||Tf-Tg|| ≤ θ ||f-g|| for
some θ < 1, then T is a contraction mapping.
106
Contraction mapping theorem
• If f or T (as defined above) is a contraction
mapping, it possesses a unique fixed
point, x*.
107
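The theorem also suggests an algorithm: iterating a contraction from any starting point converges to the unique fixed point. A standard illustration, using f(x) = cos(x), which is a contraction on [0,1] since |f'(x)| ≤ sin(1) < 1 there:

```python
# Find the unique fixed point of a contraction by simple iteration.
import math

def iterate_to_fixed_point(f, x0, tol=1e-12, max_iter=10_000):
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

x_star = iterate_to_fixed_point(math.cos, 0.5)
print(abs(x_star - math.cos(x_star)) < 1e-10)  # True: x* = f(x*)
```

The same iteration idea underlies value-function iteration in dynamic programming, where T is the Bellman operator on a function space.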
Dynamic optimisation
• Up to now, we have looked at static optimisation
problems, where agents select variables to maximise a
single objective function.
• Many economic models, particularly in macroeconomics
(eg saving and investment behaviour), use dynamic
models, where agents make choices each period that
affect their potential choices in future periods, and often
have a “total” objective function that maximises the
(discounted) sum of payoffs in each period.
• Much of the material in the notes is focused on
differential and difference equations (lectures 1-4), but
we will attempt to spend more time on lectures 5-6,
which are the focus of most dynamic models.
108
Ordinary differential equations
• Differential equations are used to model situations which
treat time as a continuous variable (as opposed to in
discrete periods, where we use difference equations).
• An ordinary differential equation is an expression
which describes a relationship between a function of one
variable and its derivatives.
• Formally:
xm(t) = F[t, x(t), x1(t), x2(t), …, xm-1(t); θ]
where
x1(t) = dx(t)/dt, x2(t) = d²x(t)/dt², …, xm(t) = dmx(t)/dtm
θ ∈ Rp is a vector of parameters
F is a function Rm+1+p → R
109
Ordinary differential equations 2
• The solution is a function x(t) that, together with its
derivatives, satisfies this equation.
• This is an ordinary differential equation because x is a
function of one argument, t, only. If it was a function of
more than one variable, we would have a partial
differential equation, which we will not study here.
• A differential equation is linear if F is linear in x(t) and its
derivatives.
• A differential equation is autonomous if t does not
appear as an independent argument of F, but enters
through x only.
• The order of a differential equation is the order of the
highest derivative of x that appears in it (ie order m
above).
110
First order differential equation
• Any differential equation can be reduced to a
first-order differential equation system by
introducing additional variables.
• Consider x3(t) = ax2(t) + bx1(t) + cx(t)
Define y(t) = x1(t), z(t) = x2(t)
• Then: y1(t) = x2(t) = z(t), z1(t) = x3(t).
• So we have the system:
[ z1(t) ]   [ az(t) + by(t) + cx(t) ]
[ y1(t) ] = [ z(t) ]
[ x1(t) ]   [ y(t) ]
111
Particular and general solutions
• A particular solution to a differential equation is a
differentiable function x(t) that satisfies the equation for
some subinterval I0 of the domain of definition of t, I.
• The set of all solutions is called the general solution,
xg(t).
• To see that the solution to a differential equation is
generally not unique, consider:
x1(t) = 2x(t).
One solution is x(t) = e2t. But for any constant c, x(t) =
ce2t is also a solution.
• The non-uniqueness problem can be overcome by
augmenting the differential equation with a boundary
condition: x(t0) = x0.
112
Boundary value problems
• A boundary value problem is defined by a
differential equation
x1(t) = f[t,x(t)]
and a boundary condition
x(t0) = x0, where (x0,t0) is an element of X × I
• Under some conditions, every boundary value
problem has a unique solution.
• Fundamental Existence Uniqueness theorem:
Let F be C1 in some neighborhood of (x0,t0).
Then in some subinterval I0 of I containing t0
there is a unique solution to the boundary value
problem.
113
Boundary values problems 2
• If F is not C1 in some neighborhood of (x0,t0), the
solution may not be unique. Consider:
x1(t) = 3x(t)2/3
x,t in R
x(0)=0
Both x(t) = t3 and x(t) = 0 are solutions.
Note f(x) = 3x(t)2/3 is not differentiable at x=0.
• The solution may not exist globally.
Consider:
x1(t) = x(t)2 x,t in R
x(0) = 1
x(t) = 1/(1-t) is a solution, but it is only defined for t
in (-∞, 1)
114
Steady states and stability
• When using continuous time dynamic models, we are
often interested in the long-run properties of the
differential equation.
• In particular, we are interested in the properties of its
steady state (our equilibrium concept for dynamic
systems, where the system remains unchanged from
period to period), and whether or not the solution
eventually converges to the steady state (ie is the
equilibrium stable, will we return there after shocks).
• We can analyze the steady state without having to find
an explicit solution for the differential equation.
115
Steady states and stability 2
• Consider the autonomous differential equation:
x1(t) = f[x(t)]
f :X RR
• A steady state is a point x  X such that
f (x)  0
• Phase diagrams to illustrate this.
• Steady states may not exist, may not be unique,
may not be isolated.
• Stability: consider an equation that is initially at rest
at an equilibrium point x , and suppose that some
shock causes a deviation from x .
We want to know if the equation will return to the
steady state (or at least remain close to it), or if it will
get farther and farther away over time.
116
Steady states and stability 3
• Let x be an isolated (ie locally unique) steady
state of the autonomous differential equation
x1(t)=f[x(t)],
f :X RR
• We say that x̄ is stable if for any ε > 0, there
exists δ in (0, ε] such that:
||x(t0) − x̄|| < δ ⇒ ||x(t) − x̄|| < ε for all t ≥ t0
ie any solution x(t) that at some point enters a
ball of radius δ around x̄ remains within a ball of
(possibly larger) radius ε forever after.
117
Steady states and stability 4
• A steady state is asymptotically stable if
it is stable AND δ can be chosen in such a
way that any solution that satisfies
||x(t0) − x̄|| < δ
for some t0 will also satisfy lim_{t→∞} x(t) = x̄
• That is, any solution that gets sufficiently
close to x̄ not only remains nearby but
converges to x̄ as t → ∞.
118
Phase diagrams: arrows of motion
• The sign of x1(t) tells us about the direction that
x(t) is moving (see diagram).
• x1(t) > 0 implies that x(t) is increasing (arrows of
motion point right).
• x1(t) < 0 implies that x(t) is decreasing (arrows of
motion point left).
• Thus: x1 and x3 in diagram are locally
asymptotically stable; x2 is unstable.
• x1 in the second diagram (see notes) is globally
asymptotically stable.
119
Phase diagrams arrows of motion 2
• We can conclude that if, for all x in some neighborhood of
a steady state x̄:
• x(t) < x̄ implies x1(t) > 0 AND x(t) > x̄ implies x1(t) < 0,
then x̄ is asymptotically stable.
• x(t) < x̄ implies x1(t) < 0 and x(t) > x̄ implies x1(t) > 0,
then x̄ is unstable.
• Therefore, we can determine the stability property of a
steady state by checking the sign of the derivative of
f[x(t)] at x .
• x̄i is (locally) asymptotically stable if f’[x(t)]|x(t)=x̄i < 0
• x̄i is unstable if f’[x(t)]|x(t)=x̄i > 0
• If f’[x(t)]|x(t)=x̄i = 0, then we don’t know.
120
Grobman-Hartman theorem
• Let x̄ be a steady state of our standard autonomous
differential equation
x1(t) = f[x(t)], f : X ⊆ R → R, f ∈ C1
• We say that x̄ is a hyperbolic equilibrium if f’[x(t)]|x(t)=x̄ ≠ 0
• The previous analysis suggests we can study the
stability properties of a nonlinear differential equation by
linearizing it, as long as the equilibrium is hyperbolic.
• Theorem: If x̄ is a hyperbolic equilibrium of the
autonomous differential equation above, then there is a
neighborhood U of x̄ such that the equation is
topologically equivalent to the linear equation:
x1(t) = f’[x(t)]|x(t)=x̄ · [x(t) − x̄]
in U.
(Note that this is a first-order Taylor series approximation
of f around x̄.) (See notes.)
121
Grobman-Hartman theorem 2
• The theorem says that near a hyperbolic
equilibrium x , the non-linear differential equation
has the same qualitative structure as the
linearized differential equation.
• In particular, if x is (locally) asymptotically stable
for the linearized equation, it is locally
asymptotically stable for the non-linear equation,
and if it is unstable for the linearized equation,
then it is unstable for the non-linear equation.
122
Application: Solow Growth model
• Classic growth theory model. Also, start getting
used to notation. Capital letters used for aggregate
variables; lower case letters used for per worker
equivalents.
• Output Y is produced using capital K and labor L
according to a production function:
Y(t) = F[K(t),L(t)]
• F is C1, has constant returns to scale and positive
and diminishing marginal products wrt each input.
• By constant returns to scale, we have:
Y(t) = F[K(t),L(t)] = L(t)F[K(t)/L(t),1] = L(t)f[k(t)]
where k(t) = K(t)/L(t)
123
Solow model 2
• Then y(t) = Y(t)/L(t) = f[k(t)]
ie we can write output per unit of labor as a function of
capital per unit of labor.
• We also assume the Inada conditions hold:
f(0) = 0
lim_{k(t)→∞} f[k(t)] = ∞
f’(0) = ∞
lim_{k(t)→∞} f’[k(t)] = 0
• The economy is closed, so savings equal investment,
and a constant fraction of output s is saved. Assume no
capital depreciation. Thus,
Kt(t)=I(t)=sF[K(t),L(t)], where s is in [0,1]
and I denotes investment
• Assume that labor grows at a constant exogenous rate n
L(t) = L(0)ent = L0ent.
124
Solow model 3
• Then: K(t) = [K(t)/L(t)]·L(t) = k(t)L0e^(nt)
• And thus: K1(t) = k1(t)L0e^(nt) + k(t)nL0e^(nt) (by the
product rule)
• Combining the equations for capital evolution
(K1(t)) we get:
k1(t) = sf[k(t)] - nk(t)
(see graph).
• To find the steady state, set k1(t) = 0. Thus,
sf(k̄) = nk̄, so k̄/f(k̄) = s/n
The capital to output ratio K/Y = k̄/f(k̄) is constant,
and the capital stock and output grow at the rate of
growth of the population, n.
125
Solow model 4
• Stability properties:
k1(t) > 0 for k(t) < k̄ (since sf[k(t)] > nk(t))
k1(t) < 0 for k(t) > k̄ (since sf[k(t)] < nk(t))
(How do we know this? See graph. We
have from diminishing marginal product
that f’ is positive but decreasing, and that f’
is infinite at 0, so nk cuts sf from below.)
126
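The steady state and its stability can be computed directly once a functional form is assumed. A sketch using a Cobb-Douglas technology f(k) = k**alpha (consistent with the stated assumptions, though the slides keep f general), with illustrative parameter values:

```python
# Solow model: steady state of k'(t) = s*f(k) - n*k and its stability.
s, n, alpha = 0.2, 0.02, 0.3           # assumed illustrative parameters

k_bar = (s / n) ** (1 / (1 - alpha))   # from s * k**alpha = n * k

def k_dot(k):
    return s * k ** alpha - n * k      # k'(t)

h = 1e-6
slope = (k_dot(k_bar + h) - k_dot(k_bar - h)) / (2 * h)  # d(k')/dk at k_bar
print(abs(k_dot(k_bar)) < 1e-9)  # True: k_bar is a steady state
print(slope < 0)                 # True: so k_bar is asymptotically stable
```

The negative slope is exactly the f'-based stability condition from the phase-diagram slides: here it equals n(alpha - 1) < 0.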
Solving autonomous differential
equations
• Consider the following first-order autonomous
linear differential equation:
x1(t)=ax(t)+b
• All solutions can be written as:
xg(t) = xc(t) + xp(t)
general soln = complementary function + particular soln
• The complementary function xc(t) is the general
solution to the homogenous equation associated
with the equation above:
x1(t) = ax(t)
• xp(t) is any particular solution to the full
equation.
127
Solving autonomous differential
eqns 2
• We use the method of separation of variables to find the
complementary solution. Rewrite x1(t) = ax(t) as:
dx(t)/dt = ax(t)
• Then: dx(t) = ax(t)dt, and: dx(t)/x(t) = adt
• Integrating both sides, we get:
∫(1/x(t))dx(t) = ∫a dt
ln x(t) = at + c1
e^(ln x(t)) = e^(at)·e^(c1)
xc(t) = ce^(at), c = e^(c1)
(Note that the c in xc(t) is for the complementary solution,
not the constant c)
• So we have the complementary solution for the general
form of a first order differential equation, xc(t) = ce^(at)
128
Solving autonomous differential
eqns 3
• Now, let's verify that the general solution is the
complementary function plus any particular
solution; ie show that if xp(t) is a solution, then
ce^(at) + xp(t) is also a soln.
d/dt[ce^(at) + xp(t)] = ace^(at) + dxp(t)/dt
= ace^(at) + axp(t) + b
= a[ce^(at) + xp(t)] + b
• Thus, if we add ce^(at) to any solution, we get
another solution. We can get all solutions in that
way.
129
Solving autonomous differential
eqns 3
• Let xp(t) and xg(t) be two arbitrary solns. Then:
dxg(t)/dt = axg(t) + b
dxp(t)/dt = axp(t) + b
• Subtracting the second from the first, and defining
y(t) = xg(t) − xp(t), we get:
dy(t)/dt = ay(t)
with general solution:
y(t) = ce^(at)
• Thus, the difference between any two arbitrary
solutions has the form ce^(at), and thus we can get
all solutions by adding ce^(at) to xp(t).
130
Solving autonomous differential
eqns 3
• To solve the differential equation, we still need to
find xp(t). The simplest alternative is usually a
steady state solution:
x1(t) = 0 ⇒ ax(t) + b = 0
xp(t) = −b/a if a ≠ 0
• Thus,
xg(t) = ce^(at) − b/a if a ≠ 0, and
xg(t) = bt + c if a = 0.
(show this yourself)
131
Solving autonomous differential
eqns 4
• We can pin down the arbitrary constant by
means of a boundary condition. Suppose we
have x(0) = x0.
• Then, for a ≠ 0, x(0) = c − b/a, so
c = x0 + b/a, and the general soln is:
xg(t) = (x0 + b/a)e^(at) − b/a if a ≠ 0, and
xg(t) = bt + x0 if a = 0
132
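The closed form can be cross-checked against a direct numerical integration. A sketch with illustrative values of a, b and x0 assumed here:

```python
# Compare x(t) = (x0 + b/a)*e**(a*t) - b/a with forward-Euler integration
# of x'(t) = a*x(t) + b.
import math

a, b, x0, T = -0.5, 1.0, 3.0, 2.0      # assumed illustrative values

def closed_form(t):
    return (x0 + b / a) * math.exp(a * t) - b / a

steps = 200_000
dt = T / steps
x = x0
for _ in range(steps):
    x += (a * x + b) * dt   # Euler step: x(t+dt) ~ x(t) + x'(t)*dt

print(abs(x - closed_form(T)) < 1e-3)  # True: the two solutions agree
```

With a < 0 here, both solutions decay toward the steady state -b/a = 2, illustrating the stability discussion on the next slide.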
Solving autonomous differential
eqns 5
• What are the stability properties of this general
solution?
• If a > 0, then e^(at) → ∞ as t → ∞, and x(t) explodes
unless x0 = −b/a (in which case it stays constant
forever)
• If a < 0, then e^(at) → 0 as t → ∞, and x(t) is
asymptotically stable.
• Recall this solution is only for autonomous
equations, where t appears only through x.
What about non-autonomous differential
equations?
133
Solving non-autonomous
differential eqns
• Consider the first order non-autonomous
differential equation:
x1(t)=a(t)x(t)+b(t)
where the coefficients a and b are (potentially)
functions of time.
• Rewrite the equation as:
x1(t)-a(t)x(t)=b(t)
• Consider the function e^(−α(t)), with α(t) = ∫[0,t] a(s)ds
Multiply both sides by e^(−α(t)):
x1(t)e^(−α(t)) − a(t)x(t)e^(−α(t)) = b(t)e^(−α(t))
134
Solving non-autonomous
differential eqns 2
• Note that the LHS is the derivative of x(t)e^(−α(t)). So, we
can rewrite this as:
d/dt(x(t)e^(−α(t))) = b(t)e^(−α(t))
• The expression e^(−α(t)) that made the left hand side above
the exact derivative of a function is called an integrating
factor.
• There are two forms of the general solution to the
non-autonomous differential equation, obtained using different
methods: the backward solution (convenient if there is
a natural initial condition) and the forward solution
(convenient if there is a natural terminal condition).
135
The backward solution
• The backward solution is obtained by integrating
backward between 0 and s (where s is our current
period):
∫[0,s] d/dt[x(t)e^(−α(t))]dt = ∫[0,s] b(t)e^(−α(t))dt
x(t)e^(−α(t)) evaluated from 0 to s = ∫[0,s] b(t)e^(−α(t))dt
x(s)e^(−α(s)) − x(0) = ∫[0,s] b(t)e^(−α(t))dt
x(s) = x(0)e^(α(s)) + ∫[0,s] b(t)e^(α(s)−α(t))dt
This gives us the value of x(s) as a function of its initial
value x(0), and a weighted sum of past values of the
forcing term b(t).
Eg: with a natural initial condition: growth models, with
some initial level of capital.
136
The forward solution
• If we don’t have a logical initial condition, but still have a
natural terminal condition (normally an asymptotic
terminal condition) then we should use the forward
solution, obtained by integrating forward between s and
.  d

 ( t )
 ( t )

s

x(t )e
dt
x(t )e
dt  
s
b(t )e
dt

|   b(t )e  (t ) dt
 ( t ) 
s
lim x(t )e
s
 ( t )
 x ( s )e
 ( s )
t 
 (s)
x( s )  e
lim x(t )e
t 
 ( t )

  b(t )e  (t ) dt
s

  b(t )e ( s )  (t ) dt
s
provided that the limit exists (ie x(t) converges).
• Example: dynamic consumption optimisation (typical
terminal condition: net assets go to zero (or stay non-negative) as time goes to infinity)
137
Example: Intertemporal budget
constraint
• Suppose that an infinitely lived household has the
following budget constraint:
a'(t) = ra(t) + y(t) - c(t)
ie at time t, change in assets is interest income (at rate r)
from assets plus labor income minus consumption
• Let's find the forward solution (since we want a budget
constraint out to time infinity). Multiply both sides of the
budget constraint by e^{-rt} and rearrange:
[a'(t) - ra(t)]e^{-rt} = [y(t) - c(t)]e^{-rt}
• Note that the LHS is the derivative of a(t)e^{-rt}.
So, we can rewrite our equation for the general solution
of a non-autonomous differential equation as:
(d/dt)[a(t)e^{-rt}] = [y(t) - c(t)]e^{-rt}
138
Intertemportal budget constraint 2
• Suppose that a(0) is given; let's integrate forward
starting from t=0:
a(t)e^{-rt} |_0^∞ = ∫_0^∞ [y(t) - c(t)]e^{-rt} dt
lim_{t→∞} a(t)e^{-rt} - a(0) = ∫_0^∞ [y(t) - c(t)]e^{-rt} dt
• Imposing lim_{t→∞} a(t)e^{-rt} = 0 and rearranging:
∫_0^∞ c(t)e^{-rt} dt = a(0) + ∫_0^∞ y(t)e^{-rt} dt
ie lifetime consumption = initial assets + lifetime
income.
• Note: If the interest rate changed over time, we
would need to replace r by r(t), and the integrating
factor becomes e^{-∫_0^t r(s)ds}, so:
∫_0^∞ c(t)e^{-∫_0^t r(s)ds} dt = a(0) + ∫_0^∞ y(t)e^{-∫_0^t r(s)ds} dt
139
Linear systems
• Now we study systems of linear differential
equations. Recall, wlog, we can restrict
attention to first order-systems (since any higher
order system can be converted into a first order
system by defining new variables).
• General form: x'(t) = A x(t) + b
where x'(t), x(t) and b are n×1 and A is n×n.
• Start with uncoupled systems (the matrix A is
diagonal), ie:
A = diag(a_11, a_22, …, a_nn)
with a_11, …, a_nn on the main diagonal and zeros elsewhere.
140
Uncoupled systems
• Then, we can solve each of the n equations separately,
by the methods for solving individual equations.
• Eg: the first equation is x_1'(t) = a_11 x_1(t) + b_1
and its solution is x_1^g(t) = c e^{a_11 t} - b_1/a_11
• With an initial condition x_1(0) = x_10, the solution
becomes:
x_1(t) = (x_10 + b_1/a_11) e^{a_11 t} - b_1/a_11
• Thus, if we are given an initial condition for each
equation in the system, x_i(0) = x_i0, i = 1,2,…,n,
then the solution to the boundary value problem is:
x_i(t) = (x_i0 + b_i/a_ii) e^{a_ii t} - b_i/a_ii   (assuming a_ii ≠ 0)
141
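As a sanity check (illustrative numbers, not from the notes), the boxed solution should reproduce the initial condition and satisfy x_1'(t) = a_11 x_1(t) + b_1; the sketch below verifies the latter with a central finite difference.

```python
import math

a11, b1, x10 = -0.5, 2.0, 3.0   # hypothetical coefficients and initial condition

def x1(t):
    # x1(t) = (x10 + b1/a11) e^{a11 t} - b1/a11
    return (x10 + b1 / a11) * math.exp(a11 * t) - b1 / a11

def deriv(f, t, h=1e-6):
    # central finite-difference approximation of f'(t)
    return (f(t + h) - f(t - h)) / (2 * h)
```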
Diagonalizable linear systems
• If the matrix is not diagonal, we can use the algebraic
technique of diagonalizing a matrix to reduce any
diagonalizable linear system to an equivalent diagonal
system, which we can then solve as above (by solving
each individual equation individually).
• Theorem: if the matrix A has n linearly independent
eigenvectors v1, v2,…vn then the matrix V = (v1,v2…vn) is
invertible and V-1AV is a diagonal matrix with the
eigenvalues of A along its main diagonal
• If all eigenvalues of the matrix are distinct, then the n
eigenvectors are linearly independent, so A is
diagonalizable.
142
Diagonalizable linear systems 2
• If A is diagonalizable, we first derive xc(t) by
solving the homogeneous part of the system of
differential equations.
x'(t) = Ax(t)
V^{-1}x'(t) = V^{-1}Ax(t) = V^{-1}AV V^{-1}x(t)   (since VV^{-1} = I)
y'(t) = Λy(t)
where y(t) = V^{-1}x(t).
• y'(t) = Λy(t) is a diagonal system, which we
already know how to solve, so:
y_i(t) = c_i e^{λ_i t}
143
Diagonalizable linear systems 3
• To obtain the solution to the original system, we need to
convert back from y(t) to x(t):
x(t) = Vy(t) = (v_1, v_2, …, v_n)(c_1 e^{λ_1 t}, c_2 e^{λ_2 t}, …, c_n e^{λ_n t})' = Σ_{i=1}^n c_i v_i e^{λ_i t}
• To obtain x^p(t) we use the steady state:
x'(t) = 0 ↔ Ax(t) + b = 0
x^p(t) = -A^{-1}b   (if A nonsingular)
• Therefore:
x^g(t) = Σ_{i=1}^n c_i v_i e^{λ_i t} - A^{-1}b
where the constants c1,c2,…,cn can be found with a set
of n boundary conditions.
144
Eigenvectors and eigenvalues
• Consider an n×n matrix A.
• Its eigenvalues λ1,λ2,…,λn and eigenvectors
vi=(vi1,vi2,…vin) satisfy the equation:
(A - λ_i I_n) v_i = 0_{n×1}
for i=1,2,…,n
(Note this gives us ratios of values within an
eigenvector, we can pick one value to set =1 to
define a particular v)
• The eigenvalues are the roots of the following
equation: det(A-λiIn)=0
145
Example
• Consider the linear system x1(t)=Ax(t)
where
A = [1 1; 4 1]
• Its eigenvalues and eigenvectors satisfy:
(A - λI_2)v = 0_{2×1}
• Its eigenvalues are the roots of:
det(A - λI_2) = 0
• The matrix (A - λI_2) is:
A - λI_2 = [1-λ 1; 4 1-λ]
146
Example, 2
• So setting the determinant of this = 0 gives
0 = (1-λ)² - 4 = λ² - 2λ - 3 = (λ-3)(λ+1)
• So we have λ_1 = 3, λ_2 = -1.
• To find the eigenvector for λ_1 = 3 we solve:
[-2 1; 4 -2][v_11; v_12] = [0; 0]
• This gives v_12 = 2v_11. Setting v_11 = 1 gives:
v_1 = (1 2)'
• Similarly for λ_2 = -1, we get v_2 = (1 -2)'
147
Example, 3
• So we get the V matrix from concatenating eigenvectors:
V = [1 1; 2 -2]
• This gives the diagonal matrix:
V^{-1}AV = [3 0; 0 -1]
ie it has the eigenvalues along its diagonal.
• Under the transformation y(t) = V^{-1}x(t) we get:
y'(t) = Λy(t), ie [y_1'(t); y_2'(t)] = [3 0; 0 -1][y_1(t); y_2(t)]
which has the general solution:
[y_1(t); y_2(t)] = [c_1 e^{3t}; c_2 e^{-t}]
148
Example, 4
• And we get the solution to the original
system by converting back to x(t) from
y(t):
[x_1(t); x_2(t)] = Vy(t) = [1 1; 2 -2][c_1 e^{3t}; c_2 e^{-t}] = [c_1 e^{3t} + c_2 e^{-t}; 2c_1 e^{3t} - 2c_2 e^{-t}]
149
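The 2×2 computations in this example are easy to automate. The sketch below hand-codes the quadratic-formula eigenvalues and the "set one eigenvector entry to 1" trick from slide 145; it assumes real distinct eigenvalues and a nonzero upper-right entry.

```python
import math

def eig2x2(A):
    # eigenvalues via the quadratic formula; eigenvectors via (a - lam) v1 + b v2 = 0
    # assumes real distinct eigenvalues and A[0][1] != 0
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = math.sqrt(tr * tr - 4 * det)
    lams = [(tr + disc) / 2, (tr - disc) / 2]
    vecs = [(1.0, (lam - a) / b) for lam in lams]   # normalize the first entry to 1
    return lams, vecs

A = [[1.0, 1.0], [4.0, 1.0]]
lams, vecs = eig2x2(A)
```

Running this on the slide's matrix reproduces λ = 3, -1 with eigenvectors proportional to (1, 2)' and (1, -2)'.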
Stability
• Recall the general solution:
x^g(t) = Σ_{i=1}^n c_i v_i e^{λ_i t} - A^{-1}b
• Case a) All eigenvalues real
• If all eigenvalues are negative, e^{λ_i t} → 0 as t → ∞,
i = 1,2,…,n and the system is asymptotically
stable: it converges to the steady state whatever
the initial position.
• If some eigenvalues are positive, the
corresponding terms e^{λ_i t} → ∞ as t → ∞, and x_j(t) is
unstable unless c_i v_ij = 0.
• Case b) some eigenvalues complex: see notes.
Punchline: if eigenvalue is complex, stability will
be determined by real part of eigenvalue.
150
Stability 2
• Skipped ahead in notes to page 8, at eqn (11)
• If n=2, we can use phase diagrams to illustrate the
stability properties of the system.
• Suppose as before that we have a system of equations
x'(t) = Ax(t) + b
(where x'(t), x(t), b are 2×1 vectors and A is 2×2)
• WLOG, let's assume that b=0.
We can always "demean" our system by defining a new
variable y as the deviation of x from its steady state:
y(t) = x(t) + A^{-1}b
• Then, y'(t) = x'(t) = Ax(t) + b = A[y(t) - A^{-1}b] + b = Ay(t)
• When n=2, all information about the system’s stability
properties are summarized by the trace and the
determinant of the coefficient matrix A.
• The trace of a matrix is the sum of its diagonal elements
151
Stability 3
• The eigenvalues of A solve |A - λI_2| = 0
(a_11 - λ)(a_22 - λ) - a_12 a_21 = 0
λ² - (a_11 + a_22)λ + a_11 a_22 - a_12 a_21 = 0
λ² - tr(A)λ + det(A) = 0
• Thus, by the quadratic formula:
λ_{1,2} = [tr(A) ± √(tr(A)² - 4det(A))] / 2
• Also:
tr(A) = λ_1 + λ_2
det(A) = λ_1 λ_2
152
Stability: Nodes
• 4 different types of cases of steady state.
• Case 1: Nodes
• a) If det(A) > 0 and [tr(A)² - 4det(A)] > 0: λ1 and λ2
are real, distinct and have the same sign.
If tr(A)<0, then λ1, λ2 < 0: and node is stable.
If tr(A) > 0, then λ1, λ2 >0: node is unstable
• b) If [tr(A)² - 4det(A)] = 0: λ1 and λ2 are real and
repeated (so we can’t diagonalize the matrix).
If tr(A) < 0, then λ1 = λ2 < 0, and node is stable
If tr(A) > 0, then λ1 = λ2 > 0: unstable node
153
Stability: saddle points
• If det(A) < 0, λ1 and λ2 are real, distinct and have
opposite sign.
WLOG, assume λ1 > 0, λ2 < 0. Then, if c1v1j≠0,
xj(t) is unstable.
(Note we cannot have v11=v12=0, else V is not
invertible)
• If c1 = 0, the system’s behavior is determined
only by λ2, and it converges to the steady state
as t.
• Setting c_1 = 0 we obtain:
x_1(t) = c_2 v_21 e^{λ_2 t}
x_2(t) = c_2 v_22 e^{λ_2 t}
154
Stability: saddle points 2
• Then we get:
x_1(t) = (v_21/v_22) x_2(t)
which is the equation of the saddle path.
• For the graph in the notes, assume:
A = [1 0; 0 -2]
then: v_1 = (1 0)', v_2 = (0 1)'
• The system will converge to the steady state if and only
if it starts out from one of points where the constant
associated with the explosive root is equal to zero
(constant on x1 here). If we are off this saddle path, we
do not converge to the steady state.
155
Stability: spiral points
• If [tr(A)² - 4det(A)] < 0 and tr(A) ≠ 0: λ1 and λ2 are
complex conjugates.
These give spiral paths; clockwise if a21 < 0,
counter-clockwise if a21 > 0
Spirals can be stable or unstable depending on
the sign of the trace.
• If tr(A) <0, then eigenvalues have a negative real
part and the spirals converge to the steady state
• If tr(A) >0, then eigenvalues have positive real
part, and spirals diverge from steady state.
156
Stability: centers
• If tr(A) = 0 and det(A) > 0: eigenvalues are
pure imaginary numbers. The steady
state is stable but not asymptotically
stable.
• If the matrix A is non-diagonalizable, we
need to use other methods to find
solutions (beyond scope of this course).
157
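The node/saddle/spiral/center taxonomy of the last four slides can be collected into a small classifier keyed off tr(A) and det(A). This is an illustrative sketch; it assumes det(A) ≠ 0 and ignores degenerate borderline cases.

```python
def classify(A):
    """Classify the steady state of x'(t) = A x(t) for a 2x2 matrix A.
    Assumes det(A) != 0 (otherwise the steady state is degenerate)."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4 * det
    if det < 0:
        return "saddle point"          # real eigenvalues of opposite sign
    if disc >= 0:
        return "stable node" if tr < 0 else "unstable node"
    if tr == 0:
        return "center"                # purely imaginary eigenvalues
    return "stable spiral" if tr < 0 else "unstable spiral"
```

For instance, the matrix from the worked example, A = [1 1; 4 1], has det(A) = -3 < 0 and is classified as a saddle point, consistent with its eigenvalues 3 and -1.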
Nonlinear systems of differential
equations
• In general, it is not possible to find closed form
solutions for nonlinear differential equations.
However, we can obtain qualitative information
about the local behavior of the solution if n=2,
and study non-linear systems analytically by
looking at their linearizations.
• Consider the nonlinear differential equation
system:
x'(t) = F[x(t)]
where x'(t) is n×1 and F: X ⊆ R^n → R^n.
158
Phase diagrams
• Assuming n=2, we have:
x_1'(t) = f_1[x_1(t), x_2(t)]
x_2'(t) = f_2[x_1(t), x_2(t)]
• Set x_1'(t) = 0 and x_2'(t) = 0 to obtain the phase lines:
x_1'(t) = 0 implies f_1[x_1(t), x_2(t)] = 0
x_2'(t) = 0 implies f_2[x_1(t), x_2(t)] = 0
• Each of these equations describes a curve in (x1(t),x2(t))
space. To find the slope of each curve, look at the sign
of dfi[.]/dx1 for i=1,2
(Note: take both derivatives wrt x1 for the sign to equal
the sign of the slope, since x1 is the horizontal axis.
Alternatively, we could take derivs wrt x2, or a
combination, whichever is computationally easier – as
long as we interpret them correctly)
159
Phase diagrams: arrows of motion
• To determine behavior of x_1 off the phase lines, we
look at either of the derivatives:
∂x_1'(t)/∂x_1(t) = ∂f_1[x_1(t), x_2(t)]/∂x_1(t)   or   ∂x_1'(t)/∂x_2(t) = ∂f_1[x_1(t), x_2(t)]/∂x_2(t)
• Eg: suppose that
∂x_1'(t)/∂x_1(t) |_{x_1'(t)=0} = ∂f_1[x_1(t), x_2(t)]/∂x_1(t) > 0
• Then, starting from the x_1'(t) = 0 line, a small movement
to the right will increase x_1'(t), making it positive, so
to the right of the x_1'(t) = 0 line, x_1(t) is increasing.
A small movement to the left will decrease x_1'(t),
making it negative, so left of the phase line x_1(t) is
decreasing.
160
Phase diagrams: arrows of motion 2
• Now do the same for x_2(t). Suppose:
∂x_2'(t)/∂x_2(t) |_{x_2'(t)=0} = ∂f_2[x_1(t), x_2(t)]/∂x_2(t) < 0
• Then, starting from the x_2'(t) = 0 line, a small
movement upward (ie increasing x_2) will
decrease x_2'(t), making it negative. So above
the x_2'(t) = 0 line, x_2 is decreasing.
• Similarly, taking a small movement downward
(decreasing x_2) will increase x_2'(t), making it
positive. So below the phase line, x_2 is
increasing.
• Finally, combine all this.
161
Behavior of nonlinear systems
• From the phase diagrams we can obtain
valuable information about the behavior of the
system, but the phase diagram alone often does
not give us enough information about the
stability properties of the steady state.
• To obtain more precise information about the
behavior of the nonlinear system around the
steady state, under certain conditions we can
approximate the system by their linear
counterparts.
162
Behavior of nonlinear systems 2
• Take a multivariate first order Taylor series
approximation of F around the steady
state:
F[x(t)] = F(x̄) + {DF[x(t)] |_{x(t)=x̄}} · [x(t) - x̄] + R[x(t) - x̄]
where DF[x(t)] |_{x(t)=x̄} is the Jacobian of F evaluated at x̄:
DF[x(t)] |_{x(t)=x̄} = [∂f_1/∂x_1 ∂f_1/∂x_2 … ∂f_1/∂x_n; ∂f_2/∂x_1 ∂f_2/∂x_2 … ∂f_2/∂x_n; … ; ∂f_n/∂x_1 ∂f_n/∂x_2 … ∂f_n/∂x_n]
and the remainder R satisfies:
lim_{x(t)→x̄} ||R[x(t) - x̄]|| / ||x(t) - x̄|| = 0
163
Behavior of nonlinear systems 3
• In some neighborhood of the steady state, the linear
system:
x'(t) = {DF[x(t)] |_{x(t)=x̄}} · [x(t) - x̄]
can be expected to be a reasonable approximation of
our system of nonlinear equations under some
conditions.
• We need the steady state to be a hyperbolic equilibrium.
This is the case if the derivative of F evaluated at the
steady state, DF[x(t)] |_{x(t)=x̄},
has no eigenvalues with a zero real part.
164
Grobman-Hartman theorem
• We return to a version of the Grobman-Hartman
theorem for systems of equations.
• If x̄ is a hyperbolic equilibrium of the
autonomous differential equation system
x'(t) = F[x(t)] and F is C1, then there exists a
neighborhood U of x̄ such that the nonlinear
system is topologically equivalent to the linear
system
x'(t) = {DF[x(t)] |_{x(t)=x̄}} · [x(t) - x̄]
in U.
165
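A minimal sketch of the linearization recipe: approximate the Jacobian DF at a steady state by central finite differences and read stability off its trace and determinant. The two-equation system F below is made up for illustration; its steady state at the origin is hyperbolic.

```python
import math

def F(x):
    # hypothetical nonlinear system (not from the notes) with F(0, 0) = (0, 0)
    x1, x2 = x
    return (x2, math.sin(x1) - 0.5 * x2)

def jacobian(F, xbar, h=1e-6):
    # central finite-difference approximation of DF evaluated at xbar
    n = len(xbar)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(xbar), list(xbar)
        xp[j] += h
        xm[j] -= h
        Fp, Fm = F(xp), F(xm)
        for i in range(n):
            J[i][j] = (Fp[i] - Fm[i]) / (2 * h)
    return J

J = jacobian(F, (0.0, 0.0))          # linearize at the steady state (0, 0)
tr = J[0][0] + J[1][1]
det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
```

Here det < 0, so the eigenvalues are real with opposite signs: the origin is a hyperbolic saddle, and the Grobman-Hartman theorem applies.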
Discrete time
• Up to now, in the dynamic section of the course, we have
been dealing with continuous time. That is, each
“period” t has had a length of zero, so we describe how a
variable changes by looking at its derivative; thus
differential equations are a key tool for explaining how
variables evolve over time.
• An alternative, used in many situations, is discrete
time. Here, we have periods indexed by t, of some
positive finite length, and so to look at changes in
variables we compare xt to xt+1. Our key tools here are
difference equations, the discrete time counterpart to
differential equations.
• Many definitions and properties that hold for differential
equations also hold for difference equations, with slight
modifications.
166
Difference equations 1
• With differential equations, time is continuous, so t
can take any real value.
• With difference equations, t can only take integer
values that correspond to periods (eg a year, a
quarter, a day).
• An ordinary difference equation is an equation of
the form:
x_{t+m} = F[t, x_t, x_{t+1}, x_{t+2}, …, x_{t+m-1}; α]
where α ∈ A ⊆ R^p and F is a function R^{m+1+p} → R
• This is an ordinary difference equation because we
describe a relationship between a function of one
variable and its lags (ie x is a function only of t).
167
Difference equations 2
• A difference equation is linear if F is linear in xt,
xt+1,…,xt+m-1.
• A difference equation is autonomous if t does not
appear as an independent argument of F, but
enters only through x.
• The order of a difference equation is equal to the
difference between the highest and lowest time
subscripts appearing in the eqn. Thus, above
we have an order m difference equation.
168
Difference equations 3
• Solving the difference equation means finding a
function xt that satisfies the equation for any t in
some subinterval I0 of I.
• Since the time subscripts are just labels, we can
harmlessly shift all the time subscripts by the
same amount. Thus (ignoring α) the equation
above is equivalent to
xt=F[t,xt-1,xt-2,…,xt-m]
• Any difference equation can be reduced to a
system of first order difference equations by
defining additional variables, so we can assume
m=1.
169
Difference equations 4
• The solution to the difference equation x_t = f(t, x_{t-1}) in general is not unique.
• If we impose a boundary condition xt0 = x0, we
can recursively derive the solution as follows:
xt0+1 = f(t0+1,xt0),
xt0+2 = f(t0+2,xt0+1), …
• Different boundary conditions will generate
different trajectories of xt. As with differential
equations, a specific solution is a particular
solution, and the general solution is the set of all
particular solutions.
170
Steady states
• We are interested in analyzing the long-run
properties of a difference equation, and such
analysis does not necessarily require us to find
an explicit solution.
• For the autonomous difference equation
x_t = f(x_{t-1}), a steady state is a fixed point x̄,
ie a point such that x̄ = f(x̄)
• Note that a steady state may not exist.
• We can use phase diagrams to examine
difference equations. A fixed point is an
intersection of the phase line and the 45° line.
171
Stability
• As before, we are interested in stability and
asymptotic stability; what happens after a shock
that causes a small deviation from our steady
state?
• Recall: stability requires that any solution xt that
at some point enters a ball of radius δ around
the steady state will remain within a ball of
(possibly larger) radius ε.
• Asymptotic stability requires not only stability,
but also that we converge to the steady state as
t .
172
Stability 2
• When the phase line is above the 45° line:
x_t = f(x_{t-1}) > x_{t-1}, so x is increasing.
• When the phase line is below the 45° line:
x_t = f(x_{t-1}) < x_{t-1}, so x is decreasing.
• Thus, in the diagram in the notes, x̄_1 is unstable and x̄_2 is
asymptotically stable.
• We can determine the stability property of a
steady state by looking at f' at the steady state.
• If |f'(x̄)| < 1, then x̄ is (locally) asymptotically stable
• If |f'(x̄)| > 1, then x̄ is unstable.
• If |f'(x̄)| = 1, then x̄ is a non-hyperbolic eqbm, and
could be stable or unstable.
173
Grobman-Hartman
• As before, we can study stability
properties by linearizing the equation, as
long as we have a hyperbolic equilibrium.
• Theorem: If x is a hyperbolic equilibrium,
then there is a neighborhood U of x such
that the non-linear difference equation is
topologically equivalent to:
x_t - x̄ = [f'(x_{t-1}) |_{x_{t-1}=x̄}] · (x_{t-1} - x̄)
in U.
174
Solving autonomous difference
equations
• Consider the following first order autonomous linear
difference equation:
xt = axt-1 + b
• All solutions can be written as xtg=xtc+xtp
• To derive xtc we substitute recursively on the
homogeneous part of the equation:
x_t = a x_{t-1} = a(a x_{t-2}) = a^2 x_{t-2} = a^2(a x_{t-3}) = a^3 x_{t-3} = …
= a^t x_0
• Since the initial value of x, x0 remains undetermined
in the absence of a boundary condition, what this
says is that all solutions to xt=axt-1 must be of the
form:
x_t^c = c a^t,
where c is an arbitrary constant.
175
Solving autonomous difference
equations 2
• To find a particular solution, we use the steady
state: x_t = x_{t-1} = x̄ = a x̄ + b, so
x_t^p = b/(1-a)
if a ≠ 1.
• Therefore, if a ≠ 1: x_t^g = c a^t + b/(1-a)
• We can pin down the arbitrary constant c by
means of a boundary condition. Suppose we
have: x_0 = x̃
• Then, for a ≠ 1, we have: x̃ = c + b/(1-a), so
c = x̃ - b/(1-a)
176
Solving autonomous difference
equations 3
• This gives general solution (if a ≠ 1):
x_t = (x̃ - b/(1-a)) a^t + b/(1-a)
• If a = 1, no steady state exists, and iteration
from the boundary condition gives
x_t = bt + x̃
177
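The closed form can be checked against brute-force iteration (arbitrary test values; assumes a ≠ 1):

```python
def iterate(a, b, x0, T):
    # brute-force iteration of x_t = a x_{t-1} + b
    x = x0
    for _ in range(T):
        x = a * x + b
    return x

def closed_form(a, b, x0, T):
    # x_t = (x0 - b/(1-a)) a^t + b/(1-a), valid for a != 1
    ss = b / (1 - a)            # steady state
    return (x0 - ss) * a ** T + ss
```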
Stability properties
• What are the stability properties of x_t^g?
• If |a| < 1, then a^t → 0 as t → ∞, and x_t is asymptotically
stable (converges to the steady state regardless of initial
position)
• If |a| > 1, then |a^t| → ∞ as t → ∞, and x_t explodes unless
x_0 = b/(1-a)
• The sign of a determines whether x_t remains on one side
of the steady state, or whether it alternates sides of the
steady state.
• If a > 0, c a^t keeps the same sign for all t and the system
converges or diverges monotonically.
• If a < 0, a^t is positive or negative as t is even or odd, and
the system jumps from one side to the other each period.
• See diagrams in notes.
178
Solving non-autonomous linear
difference equations
• Consider the first order non-autonomous linear
difference equation xt=axt-1+bt
where the forcing term b is a function of time.
• The complementary function will be the same as
that for the first order autonomous linear
equation xt = axt-1+b
which is of the form xt=cat. So we only need to
derive a particular solution.
• There are two particular solutions frequently
used: the forward solution and the backward
solution.
179
Backward solution
• The backward solution is obtained by iterating
the equation backwards:
x_t = a x_{t-1} + b_t
= a(a x_{t-2} + b_{t-1}) + b_t = a^2 x_{t-2} + a b_{t-1} + b_t
= a^2(a x_{t-3} + b_{t-2}) + a b_{t-1} + b_t = a^3 x_{t-3} + a^2 b_{t-2} + a b_{t-1} + b_t
= …
= a^s x_{t-s} + Σ_{j=0}^{s-1} a^j b_{t-j}
• If there is no natural starting point for x_t, we can
express the solution as a function of all past
values of the forcing variables by letting s → ∞:
x_t = lim_{s→∞} a^s x_{t-s} + Σ_{j=0}^∞ a^j b_{t-j}
180
Backward solution 2
• For this equation to be well defined, we need to
assume that |a| < 1 and |b_t| < B for all t. Then:
x_t^B = Σ_{j=0}^∞ a^j b_{t-j}
as the limit term goes to zero.
• And the general solution is:
x_t^g = c^B a^t + Σ_{j=0}^∞ a^j b_{t-j}
(denoting the constant as c^B to reflect that once
we consider a boundary condition, the value of
c^B will depend on the particular solution chosen)
181
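A truncated version of the backward particular solution can be checked directly against the difference equation; the forcing sequence below is a made-up bounded example, and the truncation error is of order a^terms, negligible for |a| < 1.

```python
def backward_particular(a, b_fn, t, terms=200):
    # x_t^B = sum_{j=0}^inf a^j b_{t-j}, truncated at `terms`; needs |a| < 1, bounded b
    return sum(a ** j * b_fn(t - j) for j in range(terms))

a = 0.8
b = lambda t: 1.0 if t % 2 == 0 else 0.0   # hypothetical bounded forcing sequence
xB = {t: backward_particular(a, b, t) for t in (9, 10, 11)}
```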
Forward solution
• The forward solution is obtained by iterating the
difference equation forward.
• First, solve the equation for x_{t-1}:
x_{t-1} = (1/a)x_t - (b_t/a)
and then shift it one period forward:
x_t = (1/a)x_{t+1} - (b_{t+1}/a)
• Then, we have:
x_t = (1/a)[(1/a)x_{t+2} - (b_{t+2}/a)] - (b_{t+1}/a) = (1/a)^2 x_{t+2} - (1/a)^2 b_{t+2} - (1/a) b_{t+1}
= (1/a)^3 x_{t+3} - (1/a)^3 b_{t+3} - (1/a)^2 b_{t+2} - (1/a) b_{t+1}
= …
= (1/a)^s x_{t+s} - Σ_{j=0}^{s-1} (1/a)^{j+1} b_{t+j+1}
182
Forward solution 2
• If there is no natural end point for xt, we can
express the solution as a function of all future
values of the forcing variables by letting s →∞:
x_t = lim_{s→∞} (1/a)^s x_{t+s} - Σ_{j=0}^∞ (1/a)^{j+1} b_{t+j+1}
• For this to be well defined, assume |a| > 1, |b_t| < B
for all t. Then:
x_t^F = -(1/a) Σ_{j=0}^∞ (1/a)^j b_{t+j+1}
• And the general solution is:
x_t^g = c^F a^t - (1/a) Σ_{j=0}^∞ (1/a)^j b_{t+j+1}
183
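The forward particular solution admits the same kind of check as the backward one; here |a| > 1 and the sum runs over future forcing terms (again with a made-up bounded b).

```python
def forward_particular(a, b_fn, t, terms=200):
    # x_t^F = -sum_{j=0}^inf (1/a)^{j+1} b_{t+j+1}, truncated; needs |a| > 1, bounded b
    return -sum((1 / a) ** (j + 1) * b_fn(t + j + 1) for j in range(terms))

a = 1.25
b = lambda t: 0.3 * (-1) ** t   # hypothetical bounded forcing sequence
xF = {t: forward_particular(a, b, t) for t in (4, 5, 6)}
```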
Lag operators
• A common notation when dealing with discrete
time is to use lag operators. We can, for
example, use these to derive the backward and
forward solutions.
• A lag operator is defined by:
L^n x_t = x_{t-n} for n = …,-2,-1,0,1,2,…
• Thus, we can rewrite our difference equation x_t =
a x_{t-1} + b_t as
x_t = aL^1 x_t + b_t
or as
x_t = aL x_t + b_t
• Thus,
(1 - aL) x_t = b_t
x_t = [1/(1 - aL)] b_t
184
Lag operators 2
• If |a| < 1 we have:
1/(1 - aL) = 1 + aL + a^2 L^2 + …
ie we can use the sum to infinity formula, and so
x_t = [1/(1 - aL)] b_t is the backward solution.
• To derive the forward solution, rewrite x_t = [1/(1 - aL)] b_t as:
x_t = [-(1/a)L^{-1} / (1 - (1/a)L^{-1})] b_t
x_t = -(1/a) [1 / (1 - (1/a)L^{-1})] b_{t+1}
x_t = -(1/a) [1 + (1/a)L^{-1} + ((1/a)L^{-1})^2 + …] b_{t+1}
where the expansion again uses the sum to infinity formula, valid if |a| > 1.
185
Capital accumulation
• The standard capital accumulation equation is
given by:
K_t = (1-δ)K_{t-1} + I_{t-1}
where δ is the depreciation rate (and so is in (0,1)) and I
is investment.
• A particular solution can be derived using
backward iteration:
K_t = (1-δ)[(1-δ)K_{t-2} + I_{t-2}] + I_{t-1} = (1-δ)^2 K_{t-2} + (1-δ)I_{t-2} + I_{t-1}
= …
= (1-δ)^s K_{t-s} + Σ_{j=0}^{s-1} (1-δ)^j I_{t-j-1}
and typically an initial condition for K is given.
186
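A sketch verifying that backward iteration of the capital accumulation equation matches the summation formula term for term (the investment path is hypothetical):

```python
def capital_path(K0, delta, I_fn, T):
    # iterate K_t = (1 - delta) K_{t-1} + I_{t-1} forward from K_0
    K = K0
    for t in range(1, T + 1):
        K = (1 - delta) * K + I_fn(t - 1)
    return K

def backward_formula(K0, delta, I_fn, T):
    # K_T = (1-delta)^T K_0 + sum_{j=0}^{T-1} (1-delta)^j I_{T-j-1}
    return (1 - delta) ** T * K0 + sum(
        (1 - delta) ** j * I_fn(T - j - 1) for j in range(T))

I = lambda t: 1.0 + 0.1 * t   # hypothetical investment path
```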
Household budget constraint
• Suppose an infinitely lived household has the following
budget constraint:
at = (1+r)at-1 + yt – ct
ie assets at beginning of period t equal asset income
from last period asset holdings (at rate r) plus labor
income at beginning of period t less consumption
spending at the beginning of period t.
• A particular solution to the budget constraint can be
obtained by iterating forward:
a_{t-1} = [1/(1+r)] a_t + [1/(1+r)](c_t - y_t)
= [1/(1+r)]^2 a_{t+1} + [1/(1+r)]^2 (c_{t+1} - y_{t+1}) + [1/(1+r)](c_t - y_t)
= …
= [1/(1+r)]^{s+1} a_{t+s} + Σ_{j=0}^s [1/(1+r)]^{j+1} (c_{t+j} - y_{t+j})
187
Household budget constraint 2
• Multiplying both sides by (1+r) we get:
(1+r)a_{t-1} = [1/(1+r)]^s a_{t+s} + Σ_{j=0}^s [1/(1+r)]^j (c_{t+j} - y_{t+j})
• Letting s → ∞:
(1+r)a_{t-1} = lim_{s→∞} [1/(1+r)]^s a_{t+s} + Σ_{j=0}^∞ [1/(1+r)]^j (c_{t+j} - y_{t+j})
• Under a no Ponzi game condition the
household is not allowed to let its debt
grow indefinitely, so:
lim_{s→∞} [1/(1+r)]^s a_{t+s} ≥ 0
188
Household budget constraint 3
• Under weak assumptions on the utility
function (strictly increasing utility in at least
one good), the household will not want to
build up positive assets forever, so this
holds with equality.
• Then, we get the lifetime budget constraint:
Σ_{j=0}^∞ [1/(1+r)]^j c_{t+j} = (1+r)a_{t-1} + Σ_{j=0}^∞ [1/(1+r)]^j y_{t+j}
189
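Before taking the s → ∞ limit, the forward solution is an exact finite-horizon identity, which makes it easy to test on any simulated asset path (the consumption and income paths below are arbitrary):

```python
def asset_path(a0, r, c_fn, y_fn, T):
    # iterate the budget constraint a_t = (1+r) a_{t-1} + y_t - c_t for t = 1..T
    path = [a0]
    for t in range(1, T + 1):
        path.append((1 + r) * path[t - 1] + y_fn(t) - c_fn(t))
    return path

def forward_rhs(path, r, c_fn, y_fn, s):
    # RHS of (1+r) a_0 = (1/(1+r))^s a_{1+s} + sum_{j=0}^s (1/(1+r))^j (c_{1+j} - y_{1+j})
    q = 1 / (1 + r)
    return q ** s * path[1 + s] + sum(
        q ** j * (c_fn(1 + j) - y_fn(1 + j)) for j in range(s + 1))

r = 0.05
c = lambda t: 1.0 + 0.02 * t   # hypothetical consumption path
y = lambda t: 1.2              # hypothetical constant labor income
path = asset_path(10.0, r, c, y, 40)
```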
Linear systems of difference
equations
• Now we look at systems of first order
difference equations, of the form:
x_t = A x_{t-1} + b
where x_t and b are n×1 and A is n×n.
• First suppose that A is a diagonal matrix. Then,
we can solve each of the n equations separately.
• With a vector of initial conditions x_i0 = x̃_i
we get a solution (assuming a_ii ≠ 1):
x_it = (x̃_i - b_i/(1-a_ii)) a_ii^t + b_i/(1-a_ii)
190
Diagonalizable linear systems
• If the matrix A is diagonalizable, so that
V = (v_1, v_2, …, v_n) is invertible and V^{-1}AV is a
diagonal matrix with the eigenvalues of A along
its diagonal, then we can derive x_t^c by solving the
homogeneous part of the equation:
x_t = A x_{t-1}
V^{-1}x_t = V^{-1}A x_{t-1} = V^{-1}AV V^{-1}x_{t-1}
y_t = Λ y_{t-1}
where y_t = V^{-1}x_t. This gives an uncoupled system,
which we already know how to solve.
• Therefore, y_it = c_i λ_i^t
191
Diagonalizable linear systems 2
• To obtain the solution to the original system, we
need to revert from y_t back to x_t:
x_t = V y_t = (v_1, v_2, …, v_n)(c_1 λ_1^t, c_2 λ_2^t, …, c_n λ_n^t)' = Σ_{i=1}^n c_i v_i λ_i^t
• To get x_t^p, we use the steady state:
x_t = x_{t-1} ↔ x_t^p = A x_t^p + b
x_t^p = (I-A)^{-1} b   (if I-A is invertible)
• Thus, the general solution is:
x_t^g = Σ_{i=1}^n c_i v_i λ_i^t + (I-A)^{-1} b
where the constants c_i can be found with a set of
n boundary conditions.
192
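For a concrete 2×2 case, the steady state x_p = (I - A)^{-1} b can be computed directly and compared with brute-force iteration; the coefficient matrix below is a made-up example whose eigenvalues lie inside the unit circle, so iteration converges:

```python
def iterate_system(A, b, x0, T):
    # iterate x_t = A x_{t-1} + b for a 2x2 system
    x = list(x0)
    for _ in range(T):
        x = [A[0][0] * x[0] + A[0][1] * x[1] + b[0],
             A[1][0] * x[0] + A[1][1] * x[1] + b[1]]
    return x

def steady_state(A, b):
    # x_p = (I - A)^{-1} b, with the 2x2 inverse written out explicitly
    m = [[1 - A[0][0], -A[0][1]], [-A[1][0], 1 - A[1][1]]]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(m[1][1] * b[0] - m[0][1] * b[1]) / det,
            (-m[1][0] * b[0] + m[0][0] * b[1]) / det]

A = [[0.5, 0.1], [0.2, 0.3]]   # hypothetical A: both eigenvalues inside the unit circle
b = [1.0, 2.0]
```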
Stability
• Now, to analyze stability properties of this general
solution.
• Case a) All eigenvalues real.
If all eigenvalues are less than 1 in absolute value, λ_i^t → 0
as t → ∞, and the system is asymptotically stable.
If some eigenvalues are greater than one in absolute
value, the corresponding terms |λ_i^t| → ∞ as t → ∞, and x_jt is
unstable unless c_i v_ij = 0
• Case b) some eigenvalues complex.
See notes. Stability determined by the modulus of the
eigenvalue: r < 1 means system is asymptotically stable.
r > 1 means system is unstable.
• Non-diagonalizable systems: we need to use other
methods. Beyond this course.
193
Elements of nonlinear systems
• Consider the nonlinear difference equation
system: xt = F(xt-1)
• When we work with discrete time, there are not
continuous movements along the solution path,
but rather jumps. This can mean that we can
jump around the steady state, which is not the
same as instability.
• As before, to look at behavior near the steady
state, we look at the linearized form.
194
Grobman-Hartman
• A steady state is a hyperbolic equilibrium if the
derivative of F evaluated at the steady state,
DF(x_{t-1}) |_{x_{t-1}=x̄},
has no eigenvalues with modulus equal to one.
• Theorem: If the steady state is a hyperbolic
equilibrium of the autonomous difference
equation system, then there exists a
neighborhood U of x̄ such that the nonlinear
function is topologically equivalent to the linear
system:
x_t - x̄ = {DF(x_{t-1}) |_{x_{t-1}=x̄}} · (x_{t-1} - x̄)
in U.
195
Phase diagrams
• For n=2, we can use phase diagrams to
examine behavior around the steady state.
• If n=2, our system is:
x1t+1 = f1(x1t,x2t)
x2t+1 = f2(x1t,x2t)
• First, express the system in first differences (the
discrete counterpart to derivatives):
Δx1t = x1t+1 – x1t = f1(x1t,x2t) – x1t
Δx2t = x2t+1 – x2t = f2(x1t,x2t) – x2t
196
Phase diagrams 2
• Obtain the phase lines by setting Δx1t = 0 and
Δx2t = 0. This gives:
x1t = f1(x1t,x2t)
x2t = f2(x1t,x2t)
• To obtain arrows of motion, take derivatives of
the first differences: for x_1t, we look at either:
∂Δx_1t/∂x_1t = ∂f_1(x_1t, x_2t)/∂x_1t - 1   or   ∂Δx_1t/∂x_2t = ∂f_1(x_1t, x_2t)/∂x_2t
at a point in the Δx_1t = 0 phase line.
• Eg: suppose
∂Δx_1t/∂x_1t |_{Δx_1t=0} = ∂f_1(x_1t, x_2t)/∂x_1t - 1 > 0
197
Phase diagrams 3
• Then, starting from the Δx1t = 0 line,
moving slightly right will increase Δx1t
making it strictly positive. So, to the right
of the phase line, x1t is increasing (arrows
point right).
• Similarly, moving left of the Δx1t = 0 line,
will reduce Δx1t below zero, making it
strictly negative, so the arrow points left.
• Now do the same for x_2t. Suppose:
∂Δx_2t/∂x_2t |_{Δx_2t=0} = ∂f_2(x_1t, x_2t)/∂x_2t - 1 < 0
198
Phase diagrams 4
• Then, starting from the Δx2t = 0 line, a small
movement up will decrease Δx2t, making it
strictly negative. So, above the phase line x2 is
decreasing.
• Similarly, a small movement downward from the
Δx2t = 0 line will make Δx2t positive, making it
strictly positive. So, below the phase line, x2 is
increasing.
• Finally, combine the phase lines and arrows.
199