A chain rule for multivariate divided differences

Michael S. Floater∗

∗ Centre of Mathematics for Applications, Department of Informatics, University of Oslo, PO Box 1053, Blindern, 0316 Oslo, Norway. Email: michaelf@ifi.uio.no

Abstract. In this paper we derive a formula for divided differences of composite functions of several variables with respect to rectangular grids of points. Letting the points coalesce yields a chain rule for partial derivatives of multivariate functions.

Math Subject Classification: 41A05, 65D05, 26A06

Keywords: Divided difference, chain rule, calculus of several variables, Faa di Bruno formula.

1 Introduction

The product rule for divided differences generalizes the Leibniz rule of calculus. It goes back at least to Popoviciu [16, 17] and Steffensen [22], and has played a role in the development of spline theory [1]. More recently, motivated by the error analysis of parametric curve interpolation, a chain rule was derived for divided differences: a formula that expresses divided differences of a composite function $g \circ f$ in terms of divided differences of the two functions $f$ and $g$ separately. Somewhat curiously, there are (at least) two such formulas, both of which were found in [8]. In one formula, all divided differences of the inner function $f$ are over consecutive points, while in the other, all divided differences of the outer function $g$ are over consecutive points, which suggests referring to the formulas as the inner and outer chain rules respectively. The outer chain rule was found independently in [23].

Divided differences of composite functions appear in error formulas for parametric curve fitting based on polynomials and splines [15, 6, 7]. Compared with derivative chain rules such as Faa di Bruno's formula [5], an advantage of the divided difference chain rules is that they can be applied to functions with little or no smoothness.
Moreover, by letting the points in these difference formulas coalesce, they provide a new and simple way of deriving the Faa di Bruno formula [5, 11, 18, 19, 12, 13, 20]. The inner chain rule was later used in [9] to derive a rule for divided differences of inverses of functions. Surprisingly, the inverse rule can be interpreted in terms of partitions of convex polygons, and offers a new way of counting such partitions.

Given these various applications it seems worthwhile looking for a divided difference chain rule for multivariate functions. In this paper we show that at least the inner chain rule has a natural extension to multivariate functions, when divided differences are defined over rectangular grids of points. The rule can be interpreted in terms of polygonal paths that pass through points in the grid and are monotonic with respect to each variable. Since multivariate divided differences converge to normalized partial derivatives as the grid points coalesce, the rule yields a corresponding formula for partial derivatives of multivariate composite functions; cf. [4].

2 The univariate formula

We begin by recalling the inner chain rule of [8]. Let $[x_0, x_1, \ldots, x_n]f$ denote the usual divided difference of a real-valued function $f$ at the real values $x_0, \ldots, x_n$; see [3]. We define $[x_i]f = f(x_i)$, and for distinct $x_i$,
$$[x_0, \ldots, x_n]f = \frac{[x_1, \ldots, x_n]f - [x_0, \ldots, x_{n-1}]f}{x_n - x_0}.$$
We can allow any of the $x_i$ to be equal if $f$ has sufficiently many derivatives. In particular, if all the $x_i$ are equal to $x$ say, then $[x_0, x_1, \ldots, x_n]f = f^{(n)}(x)/n!$. Sometimes, in order to simplify expressions, we will use the shorter notation
$$[x; i, j]f := [x_i, x_{i+1}, \ldots, x_j]f, \qquad j \ge i. \tag{1}$$
It was shown in [8] that for $n \ge 1$ and $f$ and $g$ smooth enough,
$$[x; 0, n](g \circ f) = \sum_{k=1}^{n} \; \sum_{0 = i_0 < \cdots < i_k = n} [f_{i_0}, f_{i_1}, \ldots, f_{i_k}]g \prod_{j=1}^{k} [x; i_{j-1}, i_j]f, \tag{2}$$
where $f_i := f(x_i)$.
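Formula (2) lends itself to direct numerical verification. The following Python sketch is illustrative only: it implements the recursive definition of the divided difference (assuming distinct points), with the functions exp and sin and the points chosen arbitrarily, and checks the four-term case n = 3 (cf. (3)).

```python
import math

def divdiff(xs, ys):
    """Divided difference [x0, ..., xn]f from the values ys = f(xs),
    computed by the recursive definition (distinct points assumed)."""
    table = list(ys)
    for level in range(1, len(xs)):
        table = [(table[i + 1] - table[i]) / (xs[i + level] - xs[i])
                 for i in range(len(xs) - level)]
    return table[0]

# Arbitrary illustrative choices of f, g and points x0 < x1 < x2 < x3.
f, g = math.exp, math.sin
xs = [0.0, 0.4, 1.1, 1.7]
fs = [f(x) for x in xs]

def ddf(i, j):            # [x; i, j]f
    return divdiff(xs[i:j + 1], fs[i:j + 1])

def ddg(idx):             # [f_{i0}, ..., f_{ik}]g
    return divdiff([fs[i] for i in idx], [g(fs[i]) for i in idx])

# Left side: [x; 0, 3](g o f); right side: the sum over the four
# increasing sequences 0 < 3, 0 < 1 < 3, 0 < 2 < 3, 0 < 1 < 2 < 3.
lhs = divdiff(xs, [g(f(x)) for x in xs])
rhs = (ddg([0, 3]) * ddf(0, 3)
       + ddg([0, 1, 3]) * ddf(0, 1) * ddf(1, 3)
       + ddg([0, 2, 3]) * ddf(0, 2) * ddf(2, 3)
       + ddg([0, 1, 2, 3]) * ddf(0, 1) * ddf(1, 2) * ddf(2, 3))
assert abs(lhs - rhs) < 1e-10
```

The same helper verifies the coalescence property indirectly: applied to $n+1$ distinct points of a degree-$n$ polynomial, it returns the leading coefficient.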
The formula is a sum over all sequences of integers $(i_0, i_1, \ldots, i_k)$ satisfying the condition $0 = i_0 < i_1 < \cdots < i_k = n$, and the product term is formed by filling the gaps between $x_{i_{j-1}}$ and $x_{i_j}$. For example, the case $n = 3$ is
$$\begin{aligned}
[x; 0, 3](g \circ f) = {} & [f_0, f_3]g\,[x; 0, 3]f \\
& + [f_0, f_1, f_3]g\,[x; 0, 1]f\,[x; 1, 3]f + [f_0, f_2, f_3]g\,[x; 0, 2]f\,[x; 2, 3]f \\
& + [f_0, f_1, f_2, f_3]g\,[x; 0, 1]f\,[x; 1, 2]f\,[x; 2, 3]f.
\end{aligned} \tag{3}$$

3 Chain rule: inner function multivariate

We begin the generalization of (2) by considering the case in which the inner function $f$ is replaced by a multivariate function while the outer function $g$ remains univariate. Following the approach of [10, Sec. 6.6], if $f$ is a function of $p$ variables, $f = f(x^1, \ldots, x^p)$, we denote by
$$[x^1_0, x^1_1, \ldots, x^1_{n_1}; x^2_0, x^2_1, \ldots, x^2_{n_2}; \ldots; x^p_0, x^p_1, \ldots, x^p_{n_p}]f$$
the divided difference of $f$ with respect to the real values $x^r_k$, $r = 1, \ldots, p$, $0 \le k \le n_r$. This is the 'tensor-product' kind of divided difference, in the sense that it is the leading coefficient of the unique tensor-product polynomial that interpolates $f$ over the rectangular grid of points $(x^1_{k^1}, \ldots, x^p_{k^p})$, $0 \le k^r \le n_r$. If these points are not distinct, the interpolating polynomial is understood to be the Hermite one. The difficult but interesting issues of multivariate polynomial interpolation over other configurations of points, and associated notions of divided differences, have been addressed by several authors [1, 2, 14, 21].

In analogy with the univariate case, if $x^r_{k_r} \to x^r$ for all $r$, then
$$[x^1_0, x^1_1, \ldots, x^1_{n_1}; \ldots; x^p_0, x^p_1, \ldots, x^p_{n_p}]f \to \frac{D^{\mathbf{n}} f(\mathbf{x})}{\mathbf{n}!},$$
where $\mathbf{x} := (x^1, x^2, \ldots, x^p)$, $\mathbf{n} = (n_1, n_2, \ldots, n_p)$, $\mathbf{n}! := n_1! n_2! \cdots n_p!$, and
$$D^{\mathbf{n}} f(\mathbf{x}) := \left(\frac{\partial}{\partial x^1}\right)^{n_1} \left(\frac{\partial}{\partial x^2}\right)^{n_2} \cdots \left(\frac{\partial}{\partial x^p}\right)^{n_p} f(\mathbf{x}).$$

Using the vector notation $\mathbf{i} = (i^1, \ldots, i^p)$ and $\mathbf{j} = (j^1, \ldots, j^p)$, we can express the condition that $i^r \le j^r$ for all $r = 1, \ldots$
, p as $\mathbf{i} \le \mathbf{j}$, in which case, in analogy with (1), we will sometimes make use of the shorthand
$$[\mathbf{x}; \mathbf{i}, \mathbf{j}]f := [x^1_{i^1}, x^1_{i^1+1}, \ldots, x^1_{j^1}; \ldots; x^r_{i^r}, x^r_{i^r+1}, \ldots, x^r_{j^r}; \ldots; x^p_{i^p}, x^p_{i^p+1}, \ldots, x^p_{j^p}]f.$$

As in [8] we will need the product rule. In the multivariate case this is
$$[\mathbf{x}; \mathbf{0}, \mathbf{n}](fg) = \sum_{\mathbf{0} \le \mathbf{i} \le \mathbf{n}} [\mathbf{x}; \mathbf{0}, \mathbf{i}]f\,[\mathbf{x}; \mathbf{i}, \mathbf{n}]g, \tag{4}$$
which is a straightforward generalization of the univariate product rule found by Popoviciu [16, 17] and Steffensen [22]. The formula can be viewed as a sum over polygonal paths that connect the point $\mathbf{0}$ to the point $\mathbf{n}$ in the grid of points $\mathbf{i}$, $\mathbf{0} \le \mathbf{i} \le \mathbf{n}$. Each path has two line segments, one connecting $\mathbf{0}$ to $\mathbf{i}$, the other connecting $\mathbf{i}$ to $\mathbf{n}$, either of which can degenerate to a point. Figure 1(a) illustrates an example of a path in the bivariate case, $p = 2$. The two shaded boxes indicate the points involved in the two divided differences in the sum in (4): the bottom left box applies to $f$, while the top right box applies to $g$.

The rule extends in a simple way, by induction on $r$, to a product of $r$ functions $f_1, f_2, \ldots, f_r$:
$$[\mathbf{x}; \mathbf{0}, \mathbf{n}]\left(\prod_{j=1}^{r} f_j\right) = \sum_{\mathbf{0} = \mathbf{i}_0 \le \mathbf{i}_1 \le \cdots \le \mathbf{i}_r = \mathbf{n}} \; \prod_{j=1}^{r} [\mathbf{x}; \mathbf{i}_{j-1}, \mathbf{i}_j]f_j. \tag{5}$$
Like (4), this formula is a sum over non-decreasing polygonal paths: polygonal paths that connect the point $\mathbf{0}$ to the point $\mathbf{n}$ in such a way that the vertices form a non-decreasing sequence in each of the $p$ coordinates. The paths now have $r$ line segments (possibly degenerate). Figure 1(b) illustrates the bivariate case when there are four functions, $r = 4$.

We now derive a chain rule for divided differences when the inner function is multivariate. With the vector notation, the formula looks almost like the univariate one.

Theorem 1. If $f : \mathbb{R}^p \to \mathbb{R}$ and $g : \mathbb{R} \to \mathbb{R}$ are smooth enough, and $f_{\mathbf{i}} := f(\mathbf{x}_{\mathbf{i}})$, then
$$[\mathbf{x}; \mathbf{0}, \mathbf{n}](g \circ f) = \sum_{k=0}^{|\mathbf{n}|} \; \sum_{\mathbf{0} = \mathbf{i}_0 < \cdots < \mathbf{i}_k = \mathbf{n}} [f_{\mathbf{i}_0}, \ldots, f_{\mathbf{i}_k}]g \prod_{j=1}^{k} [\mathbf{x}; \mathbf{i}_{j-1}, \mathbf{i}_j]f. \tag{6}$$

Figure 1: The bivariate product rule: (a) two functions, (b) four functions.
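The bivariate case of the product rule (4) can be checked numerically. The sketch below is illustrative only: the functions f and g and the 2 × 2 grid are arbitrary choices, and the tensor-product divided difference is computed by differencing one variable at a time.

```python
import math

def divdiff(xs, ys):
    """Univariate divided difference (distinct points assumed)."""
    table = list(ys)
    for level in range(1, len(xs)):
        table = [(table[i + 1] - table[i]) / (xs[i + level] - xs[i])
                 for i in range(len(xs) - level)]
    return table[0]

def divdiff2(xs, ys, F):
    """Tensor-product divided difference [x_0..x_m; y_0..y_n]f for p = 2:
    difference each grid row in y, then the resulting column in x."""
    return divdiff(xs, [divdiff(ys, row) for row in F])

# Arbitrary illustrative grid (n = (1, 1)) and functions.
xs, ys = [0.0, 0.8], [0.3, 1.2]
f = lambda x, y: math.exp(x + 0.5 * y)
g = lambda x, y: 1.0 / (1.0 + x + y)

def dd(h, i0, i1, j0, j1):   # [x; (i0, j0), (i1, j1)]h on the sub-grid
    return divdiff2(xs[i0:i1 + 1], ys[j0:j1 + 1],
                    [[h(x, y) for y in ys[j0:j1 + 1]]
                     for x in xs[i0:i1 + 1]])

fg = lambda x, y: f(x, y) * g(x, y)
lhs = dd(fg, 0, 1, 0, 1)
# Right side of (4): sum over all grid points 0 <= i <= n = (1, 1).
rhs = sum(dd(f, 0, i, 0, j) * dd(g, i, 1, j, 1)
          for i in (0, 1) for j in (0, 1))
assert abs(lhs - rhs) < 1e-10
```

A single-point sub-grid reduces to a point evaluation, matching the degenerate path segments in the discussion above.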
Here, $\mathbf{i}_1 < \mathbf{i}_2$ means $\mathbf{i}_1 \le \mathbf{i}_2$ and $\mathbf{i}_1 \ne \mathbf{i}_2$, and $|\mathbf{n}| := n_1 + n_2 + \cdots + n_p$. Note that we allow $\mathbf{n} = \mathbf{0}$ in this formula, which is why we include the case $k = 0$ in the first sum, unlike in (2). If $\mathbf{n} = \mathbf{0}$ we interpret the condition
$$\mathbf{0} = \mathbf{i}_0 < \mathbf{i}_1 < \cdots < \mathbf{i}_k = \mathbf{n} \tag{7}$$
as having the unique solution $k = 0$ and $\mathbf{i}_0 = \mathbf{0}$, and with the convention that an empty product has the value 1, formula (6) reduces to the correct equation $(g \circ f)(\mathbf{x}_{\mathbf{0}}) = g(f(\mathbf{x}_{\mathbf{0}}))$. If on the other hand $|\mathbf{n}| \ge 1$, all solutions to (7) require $k \ge 1$. So for $|\mathbf{n}| \ge 1$ we could replace the first sum in (6) by $\sum_{k=1}^{|\mathbf{n}|}$. However, we will need to allow the case $\mathbf{n} = \mathbf{0}$ when we use (6) to prove Theorem 2.

Figure 2: Chain rule with f bivariate and n = (1, 1).

Proof. It is sufficient to prove (6) when $g$ is a polynomial of degree $\le |\mathbf{n}|$, because if we replace $g$ by the polynomial of degree $\le |\mathbf{n}|$ that interpolates it at the points $f_{\mathbf{i}}$, $\mathbf{0} \le \mathbf{i} \le \mathbf{n}$, the divided differences in (6) are unchanged. Therefore, by the linearity of (6) in $g$, it is also sufficient to prove it for any monomial $g(y) = y^r$, with $r \le |\mathbf{n}|$. In this case, by the divided difference rule for the product of $r$ functions,
$$[\mathbf{x}; \mathbf{0}, \mathbf{n}](g \circ f) = [\mathbf{x}; \mathbf{0}, \mathbf{n}](f^r) = \sum_{\mathbf{0} = \boldsymbol{\alpha}_0 \le \cdots \le \boldsymbol{\alpha}_r = \mathbf{n}} \; \prod_{q=1}^{r} [\mathbf{x}; \boldsymbol{\alpha}_{q-1}, \boldsymbol{\alpha}_q]f. \tag{8}$$
Now, by counting the multiplicities of the points $\boldsymbol{\alpha}_j$ we have
$$(\boldsymbol{\alpha}_0, \ldots, \boldsymbol{\alpha}_r) = (\underbrace{\mathbf{i}_0, \ldots, \mathbf{i}_0}_{1+\mu_0}, \ldots, \underbrace{\mathbf{i}_k, \ldots, \mathbf{i}_k}_{1+\mu_k})$$
for some $k$ with $0 \le k \le r$ and $\mathbf{0} = \mathbf{i}_0 < \mathbf{i}_1 < \cdots < \mathbf{i}_k = \mathbf{n}$ and $\mu_0 + \cdots + \mu_k = r - k$, with $\mu_j \ge 0$. The product on the right of (8) is then
$$\prod_{q=1}^{r} [\mathbf{x}; \boldsymbol{\alpha}_{q-1}, \boldsymbol{\alpha}_q]f = f(\mathbf{x}_{\mathbf{i}_0})^{\mu_0} f(\mathbf{x}_{\mathbf{i}_1})^{\mu_1} \cdots f(\mathbf{x}_{\mathbf{i}_k})^{\mu_k} \prod_{j=1}^{k} [\mathbf{x}; \mathbf{i}_{j-1}, \mathbf{i}_j]f,$$
and we obtain
$$[\mathbf{x}; \mathbf{0}, \mathbf{n}](g \circ f) = \sum_{k=0}^{r} \; \sum_{\mathbf{0} = \mathbf{i}_0 < \cdots < \mathbf{i}_k = \mathbf{n}} \; \sum_{\mu_0 + \cdots + \mu_k = r-k} f_{\mathbf{i}_0}^{\mu_0} \cdots f_{\mathbf{i}_k}^{\mu_k} \prod_{j=1}^{k} [\mathbf{x}; \mathbf{i}_{j-1}, \mathbf{i}_j]f. \tag{9}$$
Now, observe that
$$\sum_{\mu_0 + \cdots + \mu_k = r-k} f_{\mathbf{i}_0}^{\mu_0} \cdots f_{\mathbf{i}_k}^{\mu_k} = [f_{\mathbf{i}_0}, \ldots, f_{\mathbf{i}_k}]g,$$
the divided difference of the monomial $g(y) = y^r$ at the points $f_{\mathbf{i}_0}, \ldots, f_{\mathbf{i}_k}$. We substitute this identity into (9), and then, since $[f_{\mathbf{i}_0}, \ldots$
$, f_{\mathbf{i}_k}]g = 0$ if $k > r$, we can replace the upper limit $r$ in the first sum in (9) by $|\mathbf{n}|$. $\square$

Similar to the product rule (5), formula (6) can be thought of as a sum over non-decreasing polygonal paths of vertices that connect $\mathbf{0}$ to $\mathbf{n}$ in the grid of points $\mathbf{i}$, $\mathbf{0} \le \mathbf{i} \le \mathbf{n}$. The number of line segments in the paths now varies, and is given by $k$. As long as $\mathbf{n} \ne \mathbf{0}$, the shortest path has one line segment ($k = 1$), and the longest paths have $k = |\mathbf{n}|$ line segments. Unlike the product rule, none of the line segments are degenerate, due to the strict inequality $\mathbf{i}_{j-1} < \mathbf{i}_j$. Figure 1(b) can be used to illustrate the structure of the chain rule: it shows a path in (6) in the case $k = 4$. The shaded boxes indicate the divided differences of $f$ in the sum in (6).

Figure 3: Chain rule with f bivariate and n = (2, 1).

We write out a couple of examples of (6). In the bivariate case $p = 2$ with $\mathbf{n} = (1, 1)$ the formula has three terms. There is one term with $k = 1$, corresponding to the path $(\mathbf{i}_0, \mathbf{i}_1) = (00, 11)$ (the first row of Figure 2), and two terms with $k = 2$, corresponding to the paths $(\mathbf{i}_0, \mathbf{i}_1, \mathbf{i}_2) = (00, 10, 11)$ and $(\mathbf{i}_0, \mathbf{i}_1, \mathbf{i}_2) = (00, 01, 11)$ (the second row of Figure 2):
$$\begin{aligned}
[\mathbf{x}; 00, 11](g \circ f) = {} & [f_{00}, f_{11}]g\,[\mathbf{x}; 00, 11]f \\
& + [f_{00}, f_{10}, f_{11}]g\,[\mathbf{x}; 00, 10]f\,[\mathbf{x}; 10, 11]f \\
& + [f_{00}, f_{01}, f_{11}]g\,[\mathbf{x}; 00, 01]f\,[\mathbf{x}; 01, 11]f.
\end{aligned} \tag{10}$$
In the case $\mathbf{n} = (2, 1)$, the formula is a sum over eight terms, one for each of the paths shown in Figure 3.
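The three-term identity (10) can itself be confirmed numerically. A minimal Python sketch, with the inner function f, the outer function g and the grid all chosen arbitrarily for illustration, and with tensor-product divided differences computed by differencing one variable at a time:

```python
import math

def divdiff(xs, ys):
    """Univariate divided difference (distinct points assumed)."""
    table = list(ys)
    for level in range(1, len(xs)):
        table = [(table[i + 1] - table[i]) / (xs[i + level] - xs[i])
                 for i in range(len(xs) - level)]
    return table[0]

def divdiff2(xs, ys, F):
    """Tensor-product divided difference for p = 2."""
    return divdiff(xs, [divdiff(ys, row) for row in F])

# Arbitrary illustrative grid for n = (1, 1), with f: R^2 -> R, g univariate.
xs, ys = [0.0, 0.6], [0.2, 0.9]
f = lambda x, y: math.exp(x + 0.5 * y)
g = math.sin
fv = {(i, j): f(xs[i], ys[j]) for i in (0, 1) for j in (0, 1)}

def ddf(i0, i1, j0, j1):     # [x; (i0, j0), (i1, j1)]f
    return divdiff2(xs[i0:i1 + 1], ys[j0:j1 + 1],
                    [[fv[i, j] for j in range(j0, j1 + 1)]
                     for i in range(i0, i1 + 1)])

def ddg(*pts):               # [f_{i0}, ..., f_{ik}]g at grid indices pts
    return divdiff([fv[p] for p in pts], [g(fv[p]) for p in pts])

lhs = divdiff2(xs, ys, [[g(fv[i, j]) for j in (0, 1)] for i in (0, 1)])
rhs = (ddg((0, 0), (1, 1)) * ddf(0, 1, 0, 1)                    # path 00-11
       + ddg((0, 0), (1, 0), (1, 1))                            # path 00-10-11
         * ddf(0, 1, 0, 0) * ddf(1, 1, 0, 1)
       + ddg((0, 0), (0, 1), (1, 1))                            # path 00-01-11
         * ddf(0, 0, 0, 1) * ddf(0, 1, 1, 1))
assert abs(lhs - rhs) < 1e-10
```

Each of the three terms corresponds to one of the paths in Figure 2, with the segment endpoints supplying the index ranges of the divided differences of f.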
The three rows in the figure show, respectively, the unique path with $k = 1$ segment, the four paths with $k = 2$ segments, and the three paths with $k = 3$ segments, and the formula is
$$\begin{aligned}
[\mathbf{x}; 00, 21](g \circ f) = {} & [f_{00}, f_{21}]g\,[\mathbf{x}; 00, 21]f \\
& + [f_{00}, f_{10}, f_{21}]g\,[\mathbf{x}; 00, 10]f\,[\mathbf{x}; 10, 21]f \\
& + [f_{00}, f_{20}, f_{21}]g\,[\mathbf{x}; 00, 20]f\,[\mathbf{x}; 20, 21]f \\
& + [f_{00}, f_{01}, f_{21}]g\,[\mathbf{x}; 00, 01]f\,[\mathbf{x}; 01, 21]f \\
& + [f_{00}, f_{11}, f_{21}]g\,[\mathbf{x}; 00, 11]f\,[\mathbf{x}; 11, 21]f \\
& + [f_{00}, f_{10}, f_{20}, f_{21}]g\,[\mathbf{x}; 00, 10]f\,[\mathbf{x}; 10, 20]f\,[\mathbf{x}; 20, 21]f \\
& + [f_{00}, f_{10}, f_{11}, f_{21}]g\,[\mathbf{x}; 00, 10]f\,[\mathbf{x}; 10, 11]f\,[\mathbf{x}; 11, 21]f \\
& + [f_{00}, f_{01}, f_{11}, f_{21}]g\,[\mathbf{x}; 00, 01]f\,[\mathbf{x}; 01, 11]f\,[\mathbf{x}; 11, 21]f.
\end{aligned} \tag{11}$$

4 Chain rule: both functions multivariate

Now we treat the full multivariate case, building on Theorem 1. We consider a composition $g \circ \mathbf{f}$ in which $\mathbf{f}$ is a multivariate, vector-valued function $\mathbf{f} : \mathbb{R}^p \to \mathbb{R}^q$ and $g$ is a multivariate scalar-valued function $g : \mathbb{R}^q \to \mathbb{R}$.

Theorem 2. If $\mathbf{f} : \mathbb{R}^p \to \mathbb{R}^q$ and $g : \mathbb{R}^q \to \mathbb{R}$ are smooth enough and $\mathbf{f} = (f_1, \ldots, f_q)$ then, defining $f_{r,\mathbf{i}} := f_r(\mathbf{x}_{\mathbf{i}})$,
$$[\mathbf{x}; \mathbf{0}, \mathbf{n}](g \circ \mathbf{f}) = \sum_{k=0}^{|\mathbf{n}|} \; \sum_{\substack{\mathbf{0} = \mathbf{i}_0 < \cdots < \mathbf{i}_k = \mathbf{n} \\ 0 = j_0 \le \cdots \le j_q = k}} [f_{1,\mathbf{i}_{j_0}}, \ldots, f_{1,\mathbf{i}_{j_1}}; \ldots; f_{q,\mathbf{i}_{j_{q-1}}}, \ldots, f_{q,\mathbf{i}_{j_q}}]g \times \prod_{r=1}^{q} \prod_{j=j_{r-1}+1}^{j_r} [\mathbf{x}; \mathbf{i}_{j-1}, \mathbf{i}_j]f_r. \tag{12}$$

Proof. As in the proof of Theorem 1, it is sufficient to prove the formula when $g$ is a polynomial of the form $g(y_1, \ldots, y_q) = y_1^{r_1} \cdots y_q^{r_q}$, where $0 \le r_j \le n_j$. Therefore, it is also sufficient to prove the formula in the more general case that $g$ is the product of $q$ univariate functions $g_i : \mathbb{R} \to \mathbb{R}$, $i = 1, \ldots, q$,
$$g(y_1, \ldots, y_q) = g_1(y_1) \cdots g_q(y_q). \tag{13}$$
To show this, observe that with $g$ in this form, the composition $g \circ \mathbf{f}$ is given by
$$g(\mathbf{f}(\mathbf{x})) = g_1(f_1(\mathbf{x})) \cdots g_q(f_q(\mathbf{x})). \tag{14}$$
Then, due to the product rule (5), we have
$$[\mathbf{x}; \mathbf{0}, \mathbf{n}](g \circ \mathbf{f}) = \sum_{\mathbf{0} = \boldsymbol{\alpha}_0 \le \boldsymbol{\alpha}_1 \le \cdots \le \boldsymbol{\alpha}_q = \mathbf{n}} \; \prod_{r=1}^{q} [\mathbf{x}; \boldsymbol{\alpha}_{r-1}, \boldsymbol{\alpha}_r](g_r \circ f_r).$$
Next, since $g_r$ is univariate, we can expand the divided differences on the right using the chain rule of Theorem 1.
Note that these divided differences may have order 0, due to the possibility that $\boldsymbol{\alpha}_{r-1} = \boldsymbol{\alpha}_r$, and this is why we allowed $\mathbf{n} = \mathbf{0}$ (and consequently $k = 0$) in formula (6). With the shorthand
$$A_r(\mathbf{i}_0, \ldots, \mathbf{i}_k) := [f_{r,\mathbf{i}_0}, \ldots, f_{r,\mathbf{i}_k}]g_r \prod_{j=1}^{k} [\mathbf{x}; \mathbf{i}_{j-1}, \mathbf{i}_j]f_r, \tag{15}$$
for points $\mathbf{i}_0, \ldots, \mathbf{i}_k \in \mathbb{R}^p$ and $r = 1, \ldots, q$, we find
$$[\mathbf{x}; \mathbf{0}, \mathbf{n}](g \circ \mathbf{f}) = \sum_{\mathbf{0} = \boldsymbol{\alpha}_0 \le \boldsymbol{\alpha}_1 \le \cdots \le \boldsymbol{\alpha}_q = \mathbf{n}} \; \prod_{r=1}^{q} \; \sum_{k_r=0}^{|\boldsymbol{\alpha}_r - \boldsymbol{\alpha}_{r-1}|} \; \sum_{\boldsymbol{\alpha}_{r-1} = \mathbf{i}^r_0 < \cdots < \mathbf{i}^r_{k_r} = \boldsymbol{\alpha}_r} A_r(\mathbf{i}^r_0, \ldots, \mathbf{i}^r_{k_r}).$$
Next, observe that since the sequence $\mathbf{i}^r_0 < \cdots < \mathbf{i}^r_{k_r}$ in this expression connects $\boldsymbol{\alpha}_{r-1}$ to $\boldsymbol{\alpha}_r$, when we put all $q$ of these sequences together we obtain a single increasing sequence that connects $\mathbf{0}$ to $\mathbf{n}$. This means that we can rearrange the formula to be a single sum over paths connecting $\mathbf{0}$ to $\mathbf{n}$, and over all possible choices of $q$ sub-paths, with each sub-path contributing to one of the functions $g_r$:
$$[\mathbf{x}; \mathbf{0}, \mathbf{n}](g \circ \mathbf{f}) = \sum_{k=0}^{|\mathbf{n}|} \; \sum_{\substack{\mathbf{0} = \mathbf{i}_0 < \cdots < \mathbf{i}_k = \mathbf{n} \\ 0 = j_0 \le \cdots \le j_q = k}} \; \prod_{r=1}^{q} A_r(\mathbf{i}_{j_{r-1}}, \ldots, \mathbf{i}_{j_r}).$$
To complete the proof, we expand the term $A_r$ using (15), and hence obtain equation (12) for $g$ of the form (13), because with this choice of $g$,
$$\prod_{r=1}^{q} [f_{r,\mathbf{i}_{j_{r-1}}}, \ldots, f_{r,\mathbf{i}_{j_r}}]g_r = [f_{1,\mathbf{i}_{j_0}}, \ldots, f_{1,\mathbf{i}_{j_1}}; \ldots; f_{q,\mathbf{i}_{j_{q-1}}}, \ldots, f_{q,\mathbf{i}_{j_q}}]g. \qquad \square$$

We look at four examples. First, if $p = 1$ and $q = 2$, and setting $a = f_1$ and $b = f_2$, the composition has the form $g(\mathbf{f}(x)) = g(a(x), b(x))$, and the formula with $\mathbf{n} = n = 1$ is
$$[x; 0, 1](g \circ \mathbf{f}) = [a_0, a_1; b_1]g\,[x; 0, 1]a + [a_0; b_0, b_1]g\,[x; 0, 1]b, \tag{16}$$
and with $\mathbf{n} = n = 2$,
$$\begin{aligned}
[x; 0, 2](g \circ \mathbf{f}) = {} & [a_0, a_2; b_2]g\,[x; 0, 2]a + [a_0; b_0, b_2]g\,[x; 0, 2]b \\
& + [a_0, a_1, a_2; b_2]g\,[x; 0, 1]a\,[x; 1, 2]a + [a_0, a_1; b_1, b_2]g\,[x; 0, 1]a\,[x; 1, 2]b \\
& + [a_0; b_0, b_1, b_2]g\,[x; 0, 1]b\,[x; 1, 2]b.
\end{aligned} \tag{17}$$
If $p = 2$ and $q = 2$, with $a = f_1$ and $b = f_2$, the composition has the form $g(\mathbf{f}(\mathbf{x})) = g(a(\mathbf{x}), b(\mathbf{x}))$, with $\mathbf{x} \in \mathbb{R}^2$. Specializing to $\mathbf{n} = (1, 1)$ we then obtain a sum over the three paths of Figure 2.
But each path is now split into $q = 2$ parts, and there are $k + 1$ ways of splitting a path with $k$ segments. Thus the total number of terms is $2 \times 1 + 3 \times 2 = 8$:
$$\begin{aligned}
[\mathbf{x}; 00, 11](g \circ \mathbf{f}) = {} & [a_{00}, a_{11}; b_{11}]g\,[\mathbf{x}; 00, 11]a + [a_{00}; b_{00}, b_{11}]g\,[\mathbf{x}; 00, 11]b \\
& + [a_{00}, a_{10}, a_{11}; b_{11}]g\,[\mathbf{x}; 00, 10]a\,[\mathbf{x}; 10, 11]a \\
& + [a_{00}, a_{10}; b_{10}, b_{11}]g\,[\mathbf{x}; 00, 10]a\,[\mathbf{x}; 10, 11]b \\
& + [a_{00}; b_{00}, b_{10}, b_{11}]g\,[\mathbf{x}; 00, 10]b\,[\mathbf{x}; 10, 11]b \\
& + [a_{00}, a_{01}, a_{11}; b_{11}]g\,[\mathbf{x}; 00, 01]a\,[\mathbf{x}; 01, 11]a \\
& + [a_{00}, a_{01}; b_{01}, b_{11}]g\,[\mathbf{x}; 00, 01]a\,[\mathbf{x}; 01, 11]b \\
& + [a_{00}; b_{00}, b_{01}, b_{11}]g\,[\mathbf{x}; 00, 01]b\,[\mathbf{x}; 01, 11]b.
\end{aligned} \tag{18}$$
If instead $\mathbf{n} = (2, 1)$, the formula is a sum over the eight paths of Figure 3. We do not write it out, but the number of terms is $2 \times 1 + 3 \times 4 + 4 \times 3 = 26$.

5 Chain rule for derivatives

Letting the points $\mathbf{x}_{\mathbf{k}}$ in equation (12) converge to $\mathbf{x}$ immediately gives a multivariate chain rule for derivatives. The points $\mathbf{i}_r$ in the second sum in (12) are no longer specifically needed in the derivative version; the order of the partial derivative of $f_r$ depends only on the vector difference $\boldsymbol{\alpha}_r := \mathbf{i}_r - \mathbf{i}_{r-1}$. Thus, summing over the $\boldsymbol{\alpha}_r$ instead of the $\mathbf{i}_r$, the derivative formula is
$$\frac{D^{\mathbf{n}}(g \circ \mathbf{f})(\mathbf{x})}{\mathbf{n}!} = \sum_{k=0}^{|\mathbf{n}|} \; \sum_{\substack{\boldsymbol{\alpha}_1 + \cdots + \boldsymbol{\alpha}_k = \mathbf{n} \\ 0 = j_0 \le \cdots \le j_q = k}} \frac{D^{\mathbf{j}} g(\mathbf{f}(\mathbf{x}))}{\mathbf{j}!} \prod_{r=1}^{q} \prod_{j=j_{r-1}+1}^{j_r} \frac{D^{\boldsymbol{\alpha}_j} f_r(\mathbf{x})}{\boldsymbol{\alpha}_j!}, \tag{19}$$
where the second sum is over $\boldsymbol{\alpha}_r > \mathbf{0}$, and $\mathbf{j} := (j_1 - j_0, \ldots, j_q - j_{q-1})$. This equation contains, in general, repeated terms, unlike Faa di Bruno's formula and the multivariate formula in [4]. It could be reformulated to avoid repeats by summing over partitions of vector integers, using the same approach as in the conversion of equation (25) of [8] to equation (26) of [8]. Instead we simply give some examples, taking, for comparison, the same cases that we looked at earlier for divided differences.
Specializing (19) to the cases treated in (3), (10)–(11), (16)–(17), and (18), one obtains, after cancelling factorials and drawing together repeated terms, their respective derivative counterparts:
$$(g \circ f)''' = g' f''' + 3g'' f' f'' + g''' (f')^3,$$
$$D^{11}(g \circ f) = g' D^{11}f + g'' D^{10}f\,D^{01}f,$$
$$D^{21}(g \circ f) = g' D^{21}f + g'' \left(D^{20}f\,D^{01}f + 2 D^{10}f\,D^{11}f\right) + g''' (D^{10}f)^2 D^{01}f,$$
$$(g \circ \mathbf{f})' = D^{10}g\,a' + D^{01}g\,b',$$
$$(g \circ \mathbf{f})'' = D^{10}g\,a'' + D^{01}g\,b'' + D^{20}g\,(a')^2 + 2 D^{11}g\,a' b' + D^{02}g\,(b')^2,$$
and
$$D^{11}(g \circ \mathbf{f}) = D^{10}g\,D^{11}a + D^{01}g\,D^{11}b + D^{20}g\,D^{10}a\,D^{01}a + D^{11}g\,D^{10}a\,D^{01}b + D^{11}g\,D^{01}a\,D^{10}b + D^{02}g\,D^{01}b\,D^{10}b.$$

References

[1] C. de Boor, A Leibniz formula for multivariate divided differences, SIAM J. Numer. Anal. 41 (2003), 856–868.

[2] C. de Boor, A multivariate divided difference, in Approximation Theory VIII, Vol. 1: Approximation and Interpolation, C. K. Chui and L. L. Schumaker (eds.), World Scientific, Singapore, 1995, pp. 87–96.

[3] C. de Boor, Divided differences, Surveys in Approximation Theory 1 (2005), 46–69.

[4] G. M. Constantine and T. H. Savits, A multivariate Faa di Bruno formula with applications, Trans. Amer. Math. Soc. 348 (1996), 503–520.

[5] C. F. Faa di Bruno, Note sur une nouvelle formule de calcul différentiel, Quarterly J. Pure Appl. Math. 1 (1857), 359–360.

[6] M. S. Floater, Arc length estimation and the convergence of parametric polynomial interpolation, BIT 45 (2005), 679–694.

[7] M. S. Floater, Chordal cubic spline interpolation is fourth order accurate, IMA J. Numer. Anal. 26 (2006), 25–33.

[8] M. S. Floater and T. Lyche, Two chain rules for divided differences and Faa di Bruno's formula, Math. Comp. 76 (2007), 867–877.

[9] M. S. Floater and T. Lyche, Divided differences of inverse functions and partitions of a convex polygon, Math. Comp. 77 (2008), 2295–2308.

[10] E. Isaacson and H. B. Keller, Analysis of Numerical Methods, Wiley, 1966.

[11] W. P.
Johnson, The curious history of Faa di Bruno's formula, Amer. Math. Monthly 109 (2002), 217–234.

[12] C. Jordan, Calculus of Finite Differences, Chelsea, New York, 1947.

[13] D. Knuth, The Art of Computer Programming, Vol. I, Addison-Wesley, 1975.

[14] C. A. Micchelli, A constructive approach to Kergin interpolation in R^k: multivariate B-splines and Lagrange interpolation, Rocky Mountain J. Math. 10 (1979), 485–497.

[15] K. Mørken and K. Scherer, A general framework for high-accuracy parametric interpolation, Math. Comp. 66 (1997), 237–260.

[16] T. Popoviciu, Sur quelques propriétés des fonctions d'une ou de deux variables réelles, dissertation, presented at the Faculté des Sciences de Paris, published by Institutul de Arte Grafice "Ardealul" (Cluj, Romania), 1933.

[17] T. Popoviciu, Introduction à la théorie des différences divisées, Bull. Math. Soc. Roumaine Sciences 42 (1940), 65–78.

[18] J. Riordan, Derivatives of composite functions, Bull. Amer. Math. Soc. 52 (1946), 664–667.

[19] J. Riordan, An Introduction to Combinatorial Analysis, John Wiley, New York, 1958.

[20] S. Roman, The formula of Faa di Bruno, Amer. Math. Monthly 87 (1980), 805–809.

[21] T. Sauer, Polynomial interpolation in several variables: lattices, differences, and ideals, in Topics in Multivariate Approximation and Interpolation, K. Jetter et al. (eds.), Elsevier, 2006, pp. 191–230.

[22] J. F. Steffensen, Note on divided differences, Danske Vid. Selsk. Math.-Fys. Medd. 17 (1939), 1–12.

[23] X. Wang and H. Wang, On the divided difference form of Faà di Bruno's formula, J. Comp. Math. 24 (2006), 553–560.