ARE211, Fall2012
CALCULUS1: TUE, OCT 2, 2012 PRINTED: OCTOBER 9, 2012 (LEC# 12)

Contents
3. Univariate and Multivariate Differentiation
3.1. Univariate calculus: the derivative
3.2. The fundamental idea: linear approximations to nonlinear functions
3.3. Univariate Calculus: the differential
3.4. Multivariate calculus: functions from $\mathbb{R}^n$ to $\mathbb{R}$
3.4.1. Partial Derivative
3.4.2. The Gradient
3.4.3. Crosspartial Derivative
3.4.4. The differential for functions from $\mathbb{R}^n$ to $\mathbb{R}$

3. Univariate and Multivariate Differentiation

3.1. Univariate calculus: the derivative

I assume you all know how to calculate the derivative of a single-variable function, i.e., given $f$, calculate $\frac{df(\cdot)}{dx}$, also denoted $f'(\cdot)$. It is important to know the difference between $f'(\cdot)$, which is a function, and $f'(x)$, which is a number: the function evaluated at a point. I’ll try to be careful to use this notation from now on: $g(\cdot)$ is a RULE; it represents a function.

Since $f'(\cdot)$ is a function, like any other, it may have a derivative; if it does, call it $f''(\cdot)$. Example: $f(x) = x^2$, so $f'(x) = 2x$ and $f''(x) = 2$.

There are lots of standard kinds of functions you have to be able to differentiate in your sleep. This is the equivalent of being able to spell: a brainless activity.

3.2. The fundamental idea: linear approximations to nonlinear functions

The most important thing to keep in mind about calculus is:

(Calculus Mantra) When you do calculus, you are, ALWAYS, approximating a small change in a differentiable function by evaluating THE (unique) appropriate LINEAR function at the magnitude of the change.

That is, for sufficiently small $dx$, $f(x + dx) - f(x) \approx f'(x)\,dx$. Here $df^{x}(\cdot) = f'(x)(\cdot)$ is the linear function that approximates the difference between $f(x + \cdot)$ and $f(x)$.

Draw picture of the comparison: concave example.
• graph of $f(\cdot)$ with points $x^*$ and $x^* + dx^*$.
• look at the difference in the value of the function at the two points.
• look at the graph of the linear function $df = f'(x^*)\,dx$.
• note that the difference between $f(x^* + dx^*) - f(x^*)$ and $f'(x^*)\,dx^*$ will be arbitrarily small (indeed, small relative to $dx^*$ itself) provided that $dx^*$ is sufficiently small.
• Explain the enormous practical importance of this.

[Figure 1. Linear approximations of the change in a nonlinear function. The figure plots $f$ against $x$ and the differential $df^{x^*} = f'(x^*)\,dx$ against $dx$, and marks the actual change in $f$, $f(x^* + dx^\dagger) - f(x^*)$, alongside the estimated change, $df^{x^*}(dx^\dagger) = f'(x^*)\,dx^\dagger$.]

When $dx$ is bigger, the approximation does less well. You could do better by throwing in the second derivative, i.e., an even better approximation is

$$f(x + dx) - f(x) \approx f'(x)\,dx + \tfrac{1}{2} f''(x)\,dx^2.$$

Note that in the above figure, $f''(x^*) < 0$. Note also that the linear approximation gives an overestimate of the change, so that adding the second term helps. This is the germ of a crucial theorem called Taylor’s theorem, which we’ll see more of later.

3.3. Univariate Calculus: the differential

You’ve probably seen the expression $dy = f'(x)\,dx$, known as the differential, before. What sort of object is this? What’s the relationship between the differential and the derivative?

Recall from linear algebra the relationship between linear functions from $\mathbb{R}^1$ to $\mathbb{R}^1$ and scalars: every such function is uniquely defined by a single scalar. E.g., I sometimes talked about the function $p$, when I really meant the linear function $f(\cdot)$ defined by $f(x) = px$. Well...

The differential is to the derivative as the linear function is to the scalar/vector that defines it.

The relationship between the differential and the derivative is a particular example of the relationship between any linear function and the scalar/vector/matrix that defines it.

Defn: the differential of $f$ at $x$ is the linear function defined by the scalar $f'(x)$.
In other words, the differential of $f$ at $x$ is the unique linear function whose coefficient (slope) is equal to the derivative of $f$ at $x$.

Restate: The most important thing to keep in mind about calculus is: when you do calculus on $f$ at $x$, you are, ALWAYS, approximating a small change in $f$, starting from $x$, by evaluating the differential of $f$ at $x$ at the magnitude of the change. This is the idea that underlies all of comparative statics.

That is, $dy = f'(x)\,dx$ is the value of the approximation to the change in $f$ when you shift from $x$ to $x + dx$.

Economic example: we have $\pi(p)$, i.e., profits are a function of prices (assuming optimal input choices at prices $p$). If the change in $p$ is small enough, the change in profits, $d\pi$, is approximately $\pi'(p)\,dp$.

It is the ability to make computations like this (we call them comparative statics) that distinguishes economists from art historians.

Example:
• Let $f(x) = x^3 + 2x^2$
• set $x^* = 1$, $dx^* = 0.1$
• then $f(x^*) = 3$, $f'(x^*) = 3(x^*)^2 + 4x^* = 7$, $f(x^* + dx^*) = 1.1^3 + 2 \times 1.21 = 3.751$.
• The true difference $f(x^* + dx^*) - f(x^*)$ is $0.751$.
• the estimated difference is $f'(x^*)\,dx^* = (3 \times 1^2 + 4 \times 1) \times dx^* = 0.70$, not bad.

3.4. Multivariate calculus: functions from $\mathbb{R}^n$ to $\mathbb{R}$

3.4.1. Partial Derivative. Cookbook approach: If $f: \mathbb{R}^n \to \mathbb{R}$, then $f_i(\cdot) \equiv \frac{\partial f(\cdot)}{\partial x_i}$ is the (usual) derivative of the single-variable function you get by treating all other variables as constant.

Example: to differentiate with respect to $x_1$, treat $x_2$ as a constant, writing $\alpha = x_2$ and $\beta = x_2^2$:

$$f(x) = x_1^2 + 2x_1 x_2 + x_2^2 = x_1^2 + 2x_1\alpha + \beta$$
$$f_1(x) = 2x_1 + 2\alpha + 0 = 2x_1 + 2x_2$$

A more graphical view of partial derivatives: take a cross-section of the graph of the function along the $i$’th axis, and look at the slope of the single-dimensional function you obtain in this way. This slope is the $i$’th partial derivative.

Similarly, you could take a diagonal cross-section, and get a different one-dimensional slope. Generally, you get what is called a directional derivative.
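The cross-section view can be sketched numerically. The following Python fragment is only an illustration: the helper name `slope_along`, the evaluation point $(1, 2)$, the step size `h`, and the choice to normalize the diagonal direction to unit length are all assumptions made here, not part of the lecture. It estimates the slope of the example function $f(x) = x_1^2 + 2x_1x_2 + x_2^2$ along each axis and along a diagonal, using central finite differences.

```python
import math

def f(x1, x2):
    # Example function from the partial-derivative section.
    return x1**2 + 2 * x1 * x2 + x2**2

def slope_along(func, x, direction, h=1e-6):
    """Central-difference slope of the cross-section of func through x
    taken along `direction` (hypothetical helper, for illustration)."""
    x1, x2 = x
    d1, d2 = direction
    forward = func(x1 + h * d1, x2 + h * d2)
    backward = func(x1 - h * d1, x2 - h * d2)
    return (forward - backward) / (2 * h)

x = (1.0, 2.0)  # assumed evaluation point

# Cross-sections along the axes give the partial derivatives.
f1 = slope_along(f, x, (1.0, 0.0))
f2 = slope_along(f, x, (0.0, 1.0))

# A diagonal cross-section gives a different one-dimensional slope.
u = (1 / math.sqrt(2), 1 / math.sqrt(2))  # unit-length direction (assumption)
diag = slope_along(f, x, u)

print("f1 estimate:", f1, "cookbook formula 2*x1 + 2*x2:", 2 * x[0] + 2 * x[1])
print("f2 estimate:", f2)
print("diagonal slope estimate:", diag)
```

The axis-aligned slopes should agree closely with the cookbook formula $2x_1 + 2x_2$, while the diagonal slope is a different number.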
Partial derivatives are just special kinds of directional derivatives: in particular, they tell you the slopes you obtain when you take cross-sections along the various axes. We’ll talk about directional derivatives much more in the next lecture.

3.4.2. The Gradient. Given a function $f: \mathbb{R}^n \to \mathbb{R}$, the gradient of $f$ is just the function from $\mathbb{R}^n$ to $\mathbb{R}^n$ that maps each $x \in \mathbb{R}^n$ to the vector of partial derivatives of $f$ at $x$. That is,

$$\nabla f(\cdot) = \begin{pmatrix} f_1(\cdot) \\ \vdots \\ f_i(\cdot) \\ \vdots \\ f_n(\cdot) \end{pmatrix}.$$

$\nabla f(\cdot)$ is for $f: \mathbb{R}^n \to \mathbb{R}$ the exact analog of what $f'(\cdot)$ is for $f: \mathbb{R} \to \mathbb{R}$. In fact the words “slope of $f$,” “gradient of $f$” and “derivative of $f$” are synonymous. Note the important distinction between the “gradient of $f$,” which is a function, written $\nabla f$ or $\nabla f(\cdot)$, and the “gradient of $f$ at $x$,” which is a vector.

3.4.3. Crosspartial Derivative. A cross-partial derivative is just an entry in the matrix which is the derivative of the derivative. That is, take a function $f: \mathbb{R}^n \to \mathbb{R}$. The $i$’th partial derivative of this function, $f_i(\cdot)$, is a function, like any other, and if the function is differentiable, it has derivatives: $f_{ij}(\cdot) \equiv \frac{\partial^2 f}{\partial x_i \partial x_j}(\cdot)$ is the $j$’th partial derivative of the function $f_i(\cdot) \equiv \frac{\partial f}{\partial x_i}(\cdot)$. Thus, the Hessian of $f$ just consists of the matrix of crosspartials of $f$.

3.4.4. The differential for functions from $\mathbb{R}^n$ to $\mathbb{R}$. Recall from earlier: the most important thing to keep in mind about calculus is that when you do calculus on $f$ at $x$, you are, ALWAYS, approximating a small change in $f$, starting from $x$, by evaluating the differential of $f$ at $x$ at the magnitude of the change.

We’ll look at a numerical example and a symbolic example, then look at the picture.

Numerical example:
• $f(x, y) = x^2 + y^2$;
• $\nabla f(x, y) = (2x, 2y)$;
• Consider $(dx, dy) = (0.1, 0.1)$ and set $(x^*, y^*) = (100, 100)$
• approximate $f(x^* + dx, y^* + dy) - f(x^*, y^*)$ by $\nabla f(x^*, y^*) \cdot (0.1, 0.1) = (200, 200) \cdot (0.1, 0.1) = 40$.
• Now evaluate $f$ at both $(x^*, y^*) = (100, 100)$ and $(x^* + dx, y^* + dy) = (100.1, 100.1)$, and compare: $f(x^*, y^*) = 20{,}000$; $f(x^* + dx, y^* + dy) = 20{,}040.02$; the actual difference is $40.02$.

Symbolic example:
• $y = f(\ell, k)$;
• $\nabla f(\ell, k) = (f_\ell(\ell, k), f_k(\ell, k))$.
• Fix $(\ell^*, k^*)$ and let $(\ell, k) = (\ell^*, k^*) + (d\ell, dk)$.
• Approximate the difference $f(\ell, k) - f(\ell^*, k^*)$ by the value of the differential, evaluated at $(d\ell, dk) = (\ell, k) - (\ell^*, k^*)$:

$$f(\ell, k) - f(\ell^*, k^*) \approx dy = f_\ell(\ell^*, k^*)\,d\ell + f_k(\ell^*, k^*)\,dk$$

Graphical example: In Fig. 2, we’re doing exactly the same thing for functions from $\mathbb{R}^n$ to $\mathbb{R}$ as we did in Fig. 1 for functions from $\mathbb{R}$ to $\mathbb{R}$. We approximate the difference $f(x^* + dx^\dagger) - f(x^*)$, where here $dx^\dagger$ and $x^*$ are both vectors, by the value of the differential, evaluated at the difference, $dx^\dagger$.

What’s the differential, in terms that we’ve used before? Recall that it is a linear function from $\mathbb{R}^n$ to $\mathbb{R}$; so it is characterized by a vector of coefficients. That is, $df^{x^*}(dx) = \nabla f(x^*) \cdot dx$. As Fig. 2 indicates, the differential here plays exactly the same role as it played for functions from $\mathbb{R}$ to $\mathbb{R}$.

In Fig. 2, the tangent plane lies below the function, at least in a neighborhood of $x^*$. Then, as we move from $x^*$ to $x^* + dx^\dagger$, the value of the function declines; but the height of the tangent plane declines even further. Applying the calculus mantra, we translate the tangent plane so that $x^*$ becomes the new origin, then approximate the change in $f$, i.e., $f(x^* + dx^\dagger) - f(x^*)$, by evaluating the height of this translated plane at the point $dx^\dagger$. Notice from the top panel that the absolute value of $df^{x^*}(dx^\dagger)$ over-estimates the absolute magnitude of the change in the function.

The differential is sometimes referred to as the total differential.
I’ll hopefully never use this term, because it is too close to something that’s different, i.e., the total derivative. The total derivative is related to (but not the same as) a directional derivative (i.e., you take the cross-section of the function in a special direction).

The total derivative is either a number, $\frac{df(x)}{dx}$, or possibly a non-linear function, $\frac{df(\cdot)}{dx}$.

The total differential is the name of a linear function, $df^{x^*}(dx) = \nabla f(x^*) \cdot dx$.

[Figure 2. Linear approximation of the change in a nonlinear multivariate function. Top panel: graph of $f$, with tangent plane at $x^*$. Middle panel: graph of $f$, rotated. Bottom panel: differential of $f$ at $x^*$, evaluated at $dx^\dagger$. The top two panels are drawn over axes $x_1$ and $x_2$ and mark $f(x^*)$, $f(x^* + dx^\dagger)$ and $f(x^*) + \nabla f(x^*) \cdot dx^\dagger$; the bottom panel is drawn over axes $dx_1$ and $dx_2$ and marks $df^{x^*}(dx^\dagger)$.]
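The numerical example of Section 3.4.4 can be checked with a few lines of Python. This is a minimal sketch, not a general-purpose routine: the function names `grad_f` and `differential` are illustrative choices, and the second-order correction uses the Hessian of $f(x, y) = x^2 + y^2$, which is $2I$ (worked out here, not stated in the notes).

```python
def f(x, y):
    # Function from the numerical example: f(x, y) = x^2 + y^2.
    return x**2 + y**2

def grad_f(x, y):
    # Gradient of f at (x, y): the vector of partial derivatives.
    return (2 * x, 2 * y)

def differential(point, step):
    """Value of the differential of f at `point`, evaluated at `step`:
    the dot product of the gradient with the change (dx, dy)."""
    gx, gy = grad_f(*point)
    dx, dy = step
    return gx * dx + gy * dy

point = (100.0, 100.0)  # (x*, y*)
step = (0.1, 0.1)       # (dx, dy)

estimated = differential(point, step)
actual = f(point[0] + step[0], point[1] + step[1]) - f(*point)

# Second-order correction: 0.5 * step' H step, with H = 2I for this f
# (an assumption derived here from f, not taken from the notes).
second_order = 0.5 * (2 * step[0] ** 2 + 2 * step[1] ** 2)

print("estimated change:", estimated)
print("actual change:   ", actual)
print("gap:             ", actual - estimated, "vs second-order term", second_order)
```

Up to floating-point rounding, the estimated change is 40 and the actual change is 40.02. Because this particular $f$ is quadratic, the gap between them is accounted for by the second-order term alone, which echoes the earlier remark about Taylor’s theorem.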