EXAMPLES ON AN ALGORITHM FOR LEAST SQUARES DATA FITTING BY NONNEGATIVE DIFFERENCES

I.C. Demetriou¹, E.A. Lipitakis² and E.E. Vassiliou¹

Abstract-- A smooth function is measured at equally spaced abscissae and the measurements contain random errors. We address the problem of making the least sum of squares change to the data by requiring nonnegative differences of order r for the smoothed values. The problem is a strictly convex quadratic programming calculation, where each of the constraint functions depends on r+1 adjacent components of the smoothed values; the constraint coefficients are the binomial coefficients with alternating signs that arise in the expansion of (1-1)^r. We take account of this structure and describe a special active set method that is much faster than general quadratic programming algorithms. We present two examples that illustrate our approach and that, although they have a common development, follow different solution paths. The first of them starts from the point that satisfies all the constraints as equalities, and the second starts from the unconstrained minimum of the problem.

Index Terms-- data smoothing, divided difference, least squares fitting, r-convexity, quadratic programming

¹ Department of Economics, University of Athens, 8 Pesmazoglou Street, Athens 10559, Greece. E-mail: demetri@econ.uoa.gr (I.C. Demetriou) and evagvasil@econ.uoa.gr (E.E. Vassiliou)
² Athens University of Economics and Business, Department of Informatics, 76 Patission Street, Athens 104 34, Greece. E-mail: eal@aueb.gr

I. INTRODUCTION

A smooth function f(x) is measured at the abscissae x_i, i = 1, 2, ..., n, and the measurements (data) {φ_i ≈ f(x_i) : i = 1, 2, ..., n} contain random errors. If f is r-times differentiable with nonnegative r-th derivative (f is called r-convex by Karlin 1968), then the r-th order differences of the f(x_i)'s are also nonnegative. Therefore it seems appropriate to modify the data in order that their differences allow no sign changes. This condition when r = 1 (Robertson et al. 1988) or r = 2 (Demetriou & Powell 1991) is highly descriptive, because it implies monotonicity or convexity of the new values respectively. The case r = 3 may be illustrated by the following example. If the measurements show an inflection point and away from this point the underlying function seems to be concave on the one side and convex on the other, then it would be suitable to introduce the condition that the third order differences are nonnegative.

In the general case, r is a positive integer, much smaller than n, and Demetriou & Lipitakis (2005) address the problem of calculating numbers {y_i : i = 1, 2, ..., n} from the data that minimize the objective function

\[ \Phi(y_1, y_2, \dots, y_n) = \sum_{i=1}^{n} (y_i - \varphi_i)^2, \tag{1.1} \]

subject to the constraints that the divided differences of order r are nonnegative. In this paper we consider the case where the abscissae are equally spaced with uniform spacing h, so x_i = x_1 + (i-1)h, i = 1, 2, ..., n, and obtain some computational advantages. Indeed, now the constraints are

\[ \Delta_i^r y \ge 0, \quad i = 1, 2, \dots, n-r, \tag{1.2} \]

where Δ_i^r y is the r-th order difference (see, for example, Hildebrand 1956)

\[ \Delta_i^r y = \frac{1}{r!\,h^r} \sum_{j=i}^{i+r} (-1)^{r+i-j} \binom{r}{j-i}\, y_j. \tag{1.3} \]

For instance, the scaled form of the r-th difference in (1.3) when r = 3 is

\[ \Delta_i^3 y = -y_i + 3y_{i+1} - 3y_{i+2} + y_{i+3}, \]

and when r = 4 is

\[ \Delta_i^4 y = y_i - 4y_{i+1} + 6y_{i+2} - 4y_{i+3} + y_{i+4}. \]

Let y be the vector whose components are the numbers {y_i : i = 1, 2, ..., n}, let a_i be the i-th constraint normal with respect to y, and let the constraint functions be written in the form a_i^T y = Δ_i^r y, i = 1, 2, ..., n-r. Since the constraints on y are linear and consistent, and since the second derivative matrix of (1.1) with respect to y is twice the unit matrix, the problem of minimizing (1.1) subject to (1.2) is a strictly convex quadratic programming problem that has a unique solution. The Karush-Kuhn-Tucker optimality conditions characterize the solution, say it is y* (see, for example, Nocedal & Wright 1999), by stating that

\[ 2(y^* - \varphi) = \sum_{i \in A^*} \lambda_i^* a_i, \qquad \lambda_i^* \ge 0, \ i \in A^*, \]

where A* is a subset of the constraint indices {1, 2, ..., n-r} whose constraints are satisfied as equalities, Δ_i^r y* = 0 for i ∈ A*, and λ* is the (n-r)-vector of the Lagrange multipliers λ_i*.
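To make the constraint structure concrete, here is a minimal sketch, assuming NumPy, that builds the scaled difference coefficients of (1.3); the function name constraint_normals is ours, introduced only for illustration, and it is reused in the later sketches.

```python
import numpy as np
from math import comb

def constraint_normals(n, r):
    """(n - r) x n matrix whose i-th row carries the scaled r-th
    difference coefficients (-1)^(r-j) C(r, j), j = 0, ..., r, in
    columns i, ..., i + r; the factor 1/(r! h^r) of (1.3) is dropped,
    which does not affect the signs of the differences."""
    coeffs = np.array([(-1) ** (r - j) * comb(r, j) for j in range(r + 1)], float)
    D = np.zeros((n - r, n))
    for i in range(n - r):
        D[i, i:i + r + 1] = coeffs
    return D

# For r = 3 every row is (-1, 3, -3, 1), so D @ y >= 0 states that all
# third order differences of y are nonnegative.
D = constraint_normals(9, 3)
y = np.arange(9, dtype=float) ** 3      # y_i = i^3 is 3-convex
print(np.all(D @ y >= 0))               # True
```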
Two special quadratic programming methods are available for calculating the solution, namely a primal method by Cullinan (1990) and a dual method by Demetriou & Lipitakis (2005). However, we consider a method that takes account of the fact that the abscissae are equally spaced, and we note that this has indeed an impact on the efficiency of the calculation.

All these methods generate a sequence of subsets of the constraint indices {1, 2, ..., n-r}, where each subset, A say, has the property

\[ a_i^T y = 0, \quad i \in A. \tag{1.4} \]

The vector y is obtained by solving the equality constrained problem

\[ \text{minimize (1.1) subject to (1.4).} \tag{1.5} \]

Since the constraint normals are linearly independent, unique Lagrange multipliers λ_i, i ∈ A, are defined by the first order optimality condition (Fletcher 2001)

\[ 2(y - \varphi) = \sum_{i \in A} \lambda_i a_i. \tag{1.6} \]

In Section II we describe a suitable quadratic programming algorithm. We also show that the equality constrained problem (1.5) may be solved efficiently and stably by introducing a suitable transformation of the linear space of variables defined by (1.4). In view of the uniformly spaced data, the calculation of the Lagrange multipliers becomes very efficient by making use of some methods that have been developed by Demetriou & Lipitakis (2001). In Section III we present two examples that show certain features of this calculation.

II. BASIC ALGORITHMS

The quadratic programming algorithm we rely upon is a version of the algorithm of Demetriou and Powell (1991), which gives priority to the conditions on the Lagrange multipliers over the conditions on y. The algorithm identifies iteratively an active set of constraint indices A such that the estimate of y*, say it is y, solves (1.5), and it identifies unique Lagrange multipliers λ_i, i ∈ A, from the vector equation (1.6). Let A be the initial guess of A*. For example, A may be the empty set or {1, 2, ..., n-r}.
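The following sketch shows the overall exchange loop in NumPy, reusing constraint_normals from the sketch above. It is a simplification: it drops the most negative multiplier and adds the most violated constraint directly, rather than applying the multiplier-priority rule with the interpolated multipliers (1-θ)λ + θμ described next, and it solves (1.5) by a plain projection instead of the basis transformation of our implementation.

```python
import numpy as np

def fit_nonneg_differences(phi, r, initial=(), tol=1e-10):
    """Simplified primal active-set sketch for minimizing (1.1)
    subject to (1.2); 'initial' is the starting guess of A, given
    with 0-based constraint indices."""
    phi = np.asarray(phi, float)
    n = len(phi)
    D = constraint_normals(n, r)
    A = sorted(initial)
    while True:
        if A:
            Da = D[A]                         # active normals as rows
            # Solve (1.5): project phi onto {y : Da y = 0}.
            y = phi - Da.T @ np.linalg.solve(Da @ Da.T, Da @ phi)
            # Multipliers from (1.6): 2(y - phi) = Da^T lambda.
            lam = np.linalg.lstsq(Da.T, 2.0 * (y - phi), rcond=None)[0]
            if lam.min() < -tol:              # unacceptable sign found
                A.pop(int(np.argmin(lam)))    # drop that constraint
                continue
        else:
            y = phi.copy()                    # unconstrained minimum of (1.1)
        resid = D @ y
        j = int(np.argmin(resid))
        if resid[j] >= -tol:                  # conditions (1.2) hold: y = y*
            return y, A
        A.append(j)                           # add the most violated constraint
        A.sort()
```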
We begin by calculating the multipliers λ_i, i ∈ A, and if we find any negative multipliers, then we start removing indices from A, one at a time, until the Lagrange multipliers are all nonnegative. The stage when the multipliers are acceptable may be reached several times during the sequence of estimates of A*, but there is no cycling, because the corresponding values of (1.1) increase strictly monotonically. At this stage, if the conditions (1.2) are satisfied, then y = y* as required. Otherwise we pick the most violated constraint, a_j^T y < 0 say, and we add j to the active set, which strictly increases the value of (1.1). Since the normals a_i, i = 1, 2, ..., n-r, are linearly independent, any addition to the active set is a well-defined operation, but our method does guard against a nearly linearly dependent a_j. Further, having added j to A, we update y, record the current multipliers λ and calculate the new multipliers, μ say. If the new multipliers have acceptable signs, there is a branch to the part of the algorithm that is described in this paragraph; otherwise we proceed as follows.

Now both λ and μ are available; the components of λ are nonnegative, but one or more of the components of μ are negative. Also, the condition μ_j ≥ 0, which always holds in theory, is forced by the method in order that j remains in A. Specifically, if there is a negative multiplier, then we seek the greatest value of θ such that all the numbers (1-θ)λ_i + θμ_i, i ∈ A, are nonnegative, which implies 0 ≤ θ < 1. If q is the value of i that gives (1-θ)λ_q + θμ_q = 0, then the q-th element of the active set is deleted, A is replaced by A \ {q}, λ is replaced by (1-θ)λ + θμ, and y is updated by a method outlined below. The components of λ now all have acceptable signs, but they are not the true multipliers of the current active set. Therefore we calculate the new values of the components of μ, and if they are acceptable, the algorithm branches to the testing of the constraints (1.2) that has been described already. Otherwise the procedure of this paragraph is applied recursively, until the multipliers are accepted. Each constraint deletion reduces the value of (1.1), but never below the value it had at the last accepted active set.

The description of the quadratic programming algorithm is now complete. This algorithm terminates as a consequence of the observation that the number of different active sets of constraints that it generates is finite.

The implementation of these ideas is more elaborate than the description of the algorithm, because it employs a basis transformation of the problem (1.5) that makes our calculation very efficient.

A generalized elimination method can be used to minimize (1.1) subject to (1.4). We denote by Δ the (n-r) × n matrix whose rows are the constraint normals a_i^T, i = 1, 2, ..., n-r, and we call active all the constraints that satisfy (1.4). Also, let p = |A| be the number of elements of A. It holds (see Fletcher 2001) that any vector y that satisfies (1.4) lies in the linear space {ψ : a_i^T ψ = 0, i ∈ A}, whose dimension is n - p. Let U be an n × (n-p) matrix whose columns are orthogonal to the active constraint normals, that is a_i^T U = 0^T, i ∈ A. The purpose of the matrix U is that it has linearly independent columns u_1, u_2, ..., u_{n-p}, which are in this null space and therefore act as basis vectors for the null space. That is to say, the points y can be expressed as

\[ y = \sum_{i=1}^{n-p} u_i \theta_i, \tag{2.1} \]

where θ_1, θ_2, ..., θ_{n-p} are the components in each reduced coordinate direction. Thus (2.1) provides a way of eliminating the active constraints in terms of the vector of reduced variables θ, which has n - p elements.

Cullinan (1990) suggests a certain way of constructing the basis vectors u_1, u_2, ..., u_{n-p} of the matrix U. An alternative way for the case of equally spaced abscissae is presented in this paper.
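In code, the elimination (2.1) can be prototyped with any null-space basis; the SVD-based scipy.linalg.null_space below is only a generic stand-in for the structured basis that follows, and constraint_normals is again the illustrative helper from Section I.

```python
import numpy as np
from scipy.linalg import null_space

Da = constraint_normals(9, 3)[:2]   # suppose the first two constraints are active
U = null_space(Da)                  # 9 x 7 here, since p = 2 and n - p = 7
rng = np.random.default_rng(0)
theta = rng.standard_normal(U.shape[1])
y = U @ theta                       # an arbitrary point satisfying (1.4)
print(np.allclose(Da @ y, 0.0))     # True: the active constraints are eliminated
```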
To define our basis, let K = {1, 2, ..., n-r} \ A be the set of inactive constraints and let k = |K| = n - r - p be the number of elements of K. Further, let q be the integer such that r = 2q or r = 2q + 1. Finally, let

\[ Q = \{1, \dots, q\} \cup \{n-r+q+1, \dots, n\}, \qquad K' = \{k+q : k \in K\}, \qquad T = K' \cup Q, \]

so that |Q| = r. The vectors u_t, for t ∈ T, are then uniquely defined by

\[ (u_t)_j = \begin{cases} 1, & j = t, \\ 0, & j \in T,\ j \ne t, \end{cases} \tag{2.2} \]

and

\[ a_i^T u_t = 0, \quad i \in A. \tag{2.3} \]

Three important properties can be derived from this definition. First, because of (2.3), the vectors u_t satisfy the equality constraints (1.4). Second, if we let Σ_{t∈T} θ_t u_t = 0, then by taking the components j ∈ T of this equation and using (2.2), namely (u_t)_j = 1 for t = j and (u_t)_j = 0 for t ≠ j, t ∈ T, we obtain θ_j = 0 for all j ∈ T. So the vectors u_t, t ∈ T, are linearly independent. Third,

\[ |T| = |K'| + |Q| = (n-r-p) + r = n - p, \]

which is the dimension of the null space. As a consequence of these properties we conclude that the vectors u_t, for t ∈ T, defined by (2.2) and (2.3), form a basis for the (n-p)-dimensional subspace of the vectors y that satisfy (1.4).

The basis matrix U has a specific structure, a fact that has an impact on the efficiency of our calculation. Specifically, by permuting its rows so that the rows with indices in T come first, U obtains the form

\[ U = \begin{pmatrix} I_{n-p} \\ \bar{U} \end{pmatrix}, \tag{2.4} \]

where the identity matrix I_{n-p} consists of the diagonal blocks I_q, I_{n-r-p} and I_{r-q} that correspond to the index groups {1, ..., q}, K' and {n-r+q+1, ..., n} of T respectively, and where the p × (n-p) matrix Ū denotes the unknown elements of the basis vectors that need be computed. Each of the basis vectors u_t, t ∈ T, can be found by solving the p × p system

\[ \sum_{j \notin T} (a_i)_j (u_t)_j = (b_t)_i, \quad i \in A, \tag{2.5} \]

where the vector b_t is a multiple of column t of Δ after we delete the rows i ∈ K. Define the p × p matrix Δ̄ to be the coefficient matrix of the systems (2.5). We notice that Δ̄ can be derived from Δ if we delete the columns j ∈ T and the rows i ∈ K. Demetriou and Lipitakis (2001) proved that the resultant system is nonsingular, so it can be solved efficiently by an LU factorization, for each b_t, in order to specify the p unknowns. The advantage of the way that the basis vectors are constructed is exactly that we solve systems of order p instead of n - r, where p in practice is much smaller than n - r.

After we have calculated the basis matrix U, by substituting (2.1) into (1.1) we consider the reduced quadratic function

\[ \Psi(\theta) = \theta^T U^T U \theta - 2 \varphi^T U \theta. \]

The matrix U^T U is symmetric and positive definite, so a unique minimizer θ* exists, which can be derived by solving the linear system ∇Ψ(θ) = 0, or

\[ U^T U\, \theta = U^T \varphi. \tag{2.6} \]

The solution is achieved by computing the Cholesky factorization LL^T of U^T U. Then y(A) is obtained by substituting θ* into (2.1).

Once y(A) has been calculated for a given A, the Lagrange multipliers λ_i(A) associated with y(A) are given by the first order conditions (1.6). These multipliers will then be used to determine whether y(A) is the required solution. If, however, y(A) is not optimal, then we have an indication of which constraints to remove from the active set. Equation (1.6) represents an overdetermined system with n - |A| redundant equations, so |A| equations may be chosen in order to specify the |A| unknowns λ_i(A). All possible choices will give the same solution, provided the chosen system is nonsingular.

Demetriou and Lipitakis (2001) have shown that, in view of the uniformly spaced data, a specific choice of |A| from the n equations of (1.6) gives a positive definite subsystem of equations. Specifically, let i(1), i(2), ..., i(|A|) be the elements of A in ascending order and define the |A| × |A| matrix M by

\[ (M)_{cd} = (\Delta)_{i(d),\, i(c)+q}, \quad 1 \le c, d \le |A|, \]

and the |A|-vectors λ̄ and b by λ̄_d = λ_{i(d)} and b_c = 2(y_{i(c)+q} - φ_{i(c)+q}), 1 ≤ c, d ≤ |A|. Thus we derive the positive definite system of equations

\[ \sum_{d=1}^{|A|} \bar{\lambda}_d (M)_{cd} = b_c, \quad 1 \le c \le |A|, \tag{2.7} \]

which can be solved by efficient Toeplitz solvers (see, for example, Golub & Van Loan 1989).
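A compact sketch of the whole construction, under the 0-based translation of the index sets Q, K' and T reconstructed above and reusing constraint_normals: each column of U is fixed on T by (2.2), and its p unknown components solve a system (2.5) with the matrix Δ̄ (the LU factorization is left to numpy.linalg.solve here).

```python
import numpy as np

def structured_basis(n, r, A):
    """Basis matrix U of (2.4), built from the p x p systems (2.5);
    A holds 0-based active constraint indices."""
    D = constraint_normals(n, r)
    A = sorted(A)
    p = len(A)
    if p == 0:
        return np.eye(n)                    # no active constraints: U = I
    K = [i for i in range(n - r) if i not in A]
    q = r // 2                              # r = 2q or r = 2q + 1
    Q = list(range(q)) + list(range(n - r + q, n))
    T = sorted(set(Q) | {k + q for k in K}) # n - p fixed positions, as in (2.2)
    I = [j for j in range(n) if j not in T] # positions of the p unknowns
    Dbar = D[np.ix_(A, I)]                  # delete rows in K, columns in T
    X = np.linalg.solve(Dbar, -D[np.ix_(A, T)])   # all right-hand sides b_t at once
    U = np.zeros((n, n - p))
    U[T, np.arange(n - p)] = 1.0            # condition (2.2)
    U[np.ix_(I, np.arange(n - p))] = X      # condition (2.3)
    return U

# Iteration 1 of Example 1 below: all six constraints active, T = {1, 8, 9}.
U = structured_basis(9, 3, range(6))
print(np.allclose(constraint_normals(9, 3) @ U, 0.0))  # columns span the null space
```

Substituting this U into (2.6) and solving U^T U θ = U^T φ, for instance by a Cholesky factorization, then gives y(A) = Uθ.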
III. TWO EXAMPLES

In this section we illustrate the quadratic programming procedure that was described in Section II by applying it twice to a data set. Therefore we present two examples. The first example starts from an active set that includes all the constraints. The second example starts from an empty active set, so that the calculation starts from the unconstrained minimum of the problem. These two approaches can make important differences in the number of iterations required to reach the optimum but, unfortunately, due to the small number of variables that we use for our presentation, these differences will not be fully exposed.

The data are artificially created by choosing the 9 points x_i = ih, i = -4, -3, ..., 4, with h = 0.5, from f(x) = x³; then, by adding random errors to the values f(x_i), we obtain the data vector

\[ \varphi = (-8.232, -5.279, -1.652, 0.673, 0.159, 0.361, 3.106, 2.839, 7.832)^T. \]

We require the least squares approximation to φ that has nonnegative differences of order r = 3. It follows that the constraint coefficient matrix has the form

\[ \Delta = \begin{pmatrix} -1 & 3 & -3 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & -1 & 3 & -3 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 3 & -3 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 & 3 & -3 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & -1 & 3 & -3 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & -1 & 3 & -3 & 1 \end{pmatrix}. \]

Below we present the examples, iteration after iteration, and we explain the actions taken.

Example 1

Iteration 1. Initially we set A = {1, 2, 3, 4, 5, 6}; that is, all the constraints are taken to be active. Then, as we mentioned in Section II, K is the empty set, so we obtain T = Q = {1, 8, 9}. The known part of the basis matrix U is defined by (2.2) and (2.3), and the unknown part can be found by solving (2.5) with

\[ \bar{\Delta} = \begin{pmatrix} 3 & -3 & 1 & 0 & 0 & 0 \\ -1 & 3 & -3 & 1 & 0 & 0 \\ 0 & -1 & 3 & -3 & 1 & 0 \\ 0 & 0 & -1 & 3 & -3 & 1 \\ 0 & 0 & 0 & -1 & 3 & -3 \\ 0 & 0 & 0 & 0 & -1 & 3 \end{pmatrix}. \]

With these calculations U becomes

\[ U = \begin{pmatrix} 1 & 0 & 0 \\ 0.75 & 1 & -0.75 \\ 0.536 & 1.714 & -1.25 \\ 0.357 & 2.143 & -1.5 \\ 0.214 & 2.286 & -1.5 \\ 0.107 & 2.143 & -1.25 \\ 0.036 & 1.714 & -0.75 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \]

whose columns are the basis vectors u_1, u_8 and u_9. Further, we solve (2.6) and obtain θ = (-7.155, 4.716, 5.887)^T, and by substituting in (2.1) we derive y(A). Finally, we solve (2.7) and obtain the Lagrange multipliers

λ^T = (-2.155, -6.893, -11.297, -11.464, -7.913, -3.889)

associated with y(A). Then we remove from A the index i = 4, which corresponds to the most negative Lagrange multiplier, λ_4 = -11.464.

Iteration 2. Removing the fourth constraint from the active set yields A = {1, 2, 3, 5, 6} and T = {1, 5, 8, 9}, with

\[ \bar{\Delta} = \begin{pmatrix} 3 & -3 & 1 & 0 & 0 \\ -1 & 3 & -3 & 0 & 0 \\ 0 & -1 & 3 & 1 & 0 \\ 0 & 0 & 0 & 3 & -3 \\ 0 & 0 & 0 & -1 & 3 \end{pmatrix} \]

and basis vectors

u_1 = (1, 0.6, 0.3, 0.1, 0, 0, 0, 0, 0)^T,
u_5 = (0, 0.7, 1.1, 1.2, 1, 0.5, 0.167, 0, 0)^T,
u_8 = (0, -0.6, -0.8, -0.6, 0, 1, 1.333, 1, 0)^T,
u_9 = (0, 0.3, 0.4, 0.3, 0, -0.5, -0.5, 0, 1)^T.

Solving (2.6) gives θ = (-8.465, 0.838, 4.029, 7.458)^T and, by (2.1),

y(A) = (-8.465, -4.673, -1.858, -0.022, 0.838, 0.719, 1.783, 4.029, 7.458)^T,

while (2.7) gives the Lagrange multipliers

(λ_1, λ_2, λ_3, λ_5, λ_6) = (0.465, 0.182, -0.436, 0.135, -0.748).

We remove from A the index i = 6, which corresponds to the most negative Lagrange multiplier, λ_6 = -0.748.

Iteration 3. Removing the sixth constraint from the active set yields A = {1, 2, 3, 5} and T = {1, 5, 7, 8, 9}, with

\[ \bar{\Delta} = \begin{pmatrix} 3 & -3 & 1 & 0 \\ -1 & 3 & -3 & 0 \\ 0 & -1 & 3 & 1 \\ 0 & 0 & 0 & 3 \end{pmatrix} \]

and basis vectors

u_1 = (1, 0.6, 0.3, 0.1, 0, 0, 0, 0, 0)^T,
u_5 = (0, 0.8, 1.233, 1.3, 1, 0.333, 0, 0, 0)^T,
u_7 = (0, -0.6, -0.8, -0.6, 0, 1, 1, 0, 0)^T,
u_8 = (0, 0.2, 0.267, 0.2, 0, -0.333, 0, 1, 0)^T,
u_9 = (0, 0, 0, 0, 0, 0, 0, 0, 1)^T.

We obtain

y(A) = (-8.362, -4.757, -2.006, -0.108, 0.937, 1.13, 1.898, 3.243, 7.832)^T

and the Lagrange multipliers

(λ_1, λ_2, λ_3, λ_5) = (0.258, -0.271, -0.877, 0.805).

We remove from A the index i = 3, which corresponds to the most negative Lagrange multiplier, λ_3 = -0.877.

Iteration 4. Removing the third constraint from the active set yields A = {1, 2, 5} and T = {1, 4, 5, 7, 8, 9}, with

\[ \bar{\Delta} = \begin{pmatrix} 3 & -3 & 0 \\ -1 & 3 & 0 \\ 0 & 0 & 3 \end{pmatrix} \]

and basis vectors

u_1 = (1, 0.5, 0.167, 0, 0, 0, 0, 0, 0)^T,
u_4 = (0, 1, 1.333, 1, 0, 0, 0, 0, 0)^T,
u_5 = (0, -0.5, -0.5, 0, 1, 0.333, 0, 0, 0)^T,
u_7 = (0, 0, 0, 0, 0, 1, 1, 0, 0)^T,
u_8 = (0, 0, 0, 0, 0, -0.333, 0, 1, 0)^T,
u_9 = (0, 0, 0, 0, 0, 0, 0, 0, 1)^T.

This time the Lagrange multipliers are

(λ_1, λ_2, λ_5) = (0.767, 0.758, 0.594).

We see that all the Lagrange multipliers are positive, which identifies the point where the testing of the constraints (1.2) is entered. By substituting into (2.1) we obtain the vector

y(A) = (-8.616, -4.508, -1.666, -0.079, 0.242, 1.251, 2.215, 3.137, 7.832)^T.

We substitute into the constraint functions (1.3) and find that Δ_1^3 y = 0, Δ_2^3 y = 0, Δ_3^3 y = 1.953, Δ_4^3 y = -0.734, Δ_5^3 y = 0 and Δ_6^3 y = 3.817, so the fourth constraint is violated. Thus, we pick the index i = 4 of the most violated constraint (-0.734) and add i = 4 to A. Then one more iteration begins.

Iteration 5. Adding the fourth constraint to the active set yields A = {1, 2, 4, 5} and T = {1, 4, 7, 8, 9}, with

\[ \bar{\Delta} = \begin{pmatrix} 3 & -3 & 0 & 0 \\ -1 & 3 & 1 & 0 \\ 0 & 0 & 3 & -3 \\ 0 & 0 & -1 & 3 \end{pmatrix} \]

and basis vectors

u_1 = (1, 0.5, 0.167, 0, 0, 0, 0, 0, 0)^T,
u_4 = (0, 0.75, 1.083, 1, 0.5, 0.167, 0, 0, 0)^T,
u_7 = (0, -0.5, -0.5, 0, 1, 1.333, 1, 0, 0)^T,
u_8 = (0, 0.25, 0.25, 0, -0.5, -0.5, 0, 1, 0)^T,
u_9 = (0, 0, 0, 0, 0, 0, 0, 0, 1)^T.

The Lagrange multipliers are

(λ_1, λ_2, λ_4, λ_5) = (0.688, 0.638, 0.23, 0.76),

and we see that they all satisfy the inequalities λ_i ≥ 0, i ∈ A. We obtain the vector

y(A) = (-8.577, -4.566, -1.726, -0.055, 0.444, 1.156, 2.081, 3.22, 7.832)^T.

We find Δ_1^3 y = 0, Δ_2^3 y = 0, Δ_3^3 y = 1.386, Δ_4^3 y = 0, Δ_5^3 y = 0 and Δ_6^3 y = 3.261, so y(A) satisfies all the constraints. The calculation ends because the Karush-Kuhn-Tucker conditions are satisfied. Therefore, y(A) is the required optimum.

Alternatively, the calculation could start from the unconstrained minimum of (1.1). In this case 5 iterations are needed for termination. We shall see that each iteration adds a constraint to the active set.
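The multiplier vectors quoted in these iterations can be reproduced from (1.6) through the square subsystem behind (2.7); a minimal dense sketch, with 0-based indices, q = ⌊r/2⌋, and numpy.linalg.solve standing in for a Toeplitz solver:

```python
import numpy as np

def multipliers(y, phi, A, D, q):
    """Solve the |A| x |A| subsystem of (1.6) formed by the equations
    with indices i + q, i in A (0-based), i.e. the positive definite
    choice that leads to (2.7)."""
    A = sorted(A)
    rows = [i + q for i in A]
    M = D[A][:, rows].T                 # (M)_{cd} = Delta_{i(d), i(c)+q}
    return np.linalg.solve(M, 2.0 * (y - phi)[rows])

# E.g. with y and phi of Iteration 4 above and A = {1, 2, 5} (0-based [0, 1, 4]),
# multipliers(y, phi, [0, 1, 4], constraint_normals(9, 3), 1) gives
# approximately (0.767, 0.758, 0.594).
```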
Example 2

Iteration 1. Initially we set A = ∅; that is, all constraints are inactive. Then K = {1, 2, 3, 4, 5, 6} and T = {1, 2, ..., 9}. In this case the basis matrix U is the 9 × 9 identity matrix and the calculation starts from the unconstrained minimum of the problem. This means that y(A) = φ and that the Lagrange multipliers are all equal to zero. By substituting this vector into the constraint functions we find that Δ_1^3 y = -1.974, Δ_2^3 y = -1.538, Δ_3^3 y = 3.554, Δ_4^3 y = 1.83, Δ_5^3 y = -5.556 and Δ_6^3 y = 8.269. The fifth constraint is the most violated one, so i = 5 should be the first element to be included in the active set.

Iteration 2. Adding the fifth constraint to the active set yields A = {5}, T = {1, 2, 3, 4, 5, 7, 8, 9} and the 1 × 1 matrix Δ̄ = (3). The basis matrix differs from the identity matrix only in its sixth row; its columns are u_t = e_t for t = 1, 2, 3, 4, 9, where e_t denotes the t-th coordinate vector, together with

u_5 = e_5 + 0.333 e_6,  u_7 = e_7 + e_6,  u_8 = e_8 - 0.333 e_6.

Compute the vector

y(A) = (-8.232, -5.279, -1.652, 0.673, -0.118, 1.195, 2.272, 3.118, 7.832)^T.

Obviously, the single Lagrange multiplier satisfies the inequality λ_5 ≥ 0; indeed λ_5 = 0.557. We find Δ_1^3 y = -1.974, Δ_2^3 y = -1.816, Δ_3^3 y = 5.221, Δ_4^3 y = -2.339, Δ_5^3 y = 0 and Δ_6^3 y = 4.1, so some constraints are violated. We pick the index i = 4 of the most violated constraint (-2.339) and add i = 4 to A.

Iteration 3. Now A = {4, 5}, T = {1, 2, 3, 4, 7, 8, 9} and

\[ \bar{\Delta} = \begin{pmatrix} 3 & -3 \\ -1 & 3 \end{pmatrix}. \]

The basis vectors are u_t = e_t for t = 1, 2, 3, 9, together with

u_4 = e_4 + 0.5 e_5 + 0.167 e_6,  u_7 = e_7 + e_5 + 1.333 e_6,  u_8 = e_8 - 0.5 e_5 - 0.5 e_6.

Compute the vector

y(A) = (-8.232, -5.279, -1.652, 0.406, 0.483, 0.994, 1.939, 3.318, 7.832)^T.

Both the Lagrange multipliers are positive, (λ_4, λ_5) = (0.534, 0.956). We find Δ_1^3 y = -2.242, Δ_2^3 y = -0.413, Δ_3^3 y = 2.415, Δ_4^3 y = 0, Δ_5^3 y = 0 and Δ_6^3 y = 2.701, so two constraints are violated. We pick the index i = 1 of the most violated constraint (-2.242) and add i = 1 to A.

Iteration 4. Now A = {1, 4, 5}, T = {1, 3, 4, 7, 8, 9} and

\[ \bar{\Delta} = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 3 & -3 \\ 0 & -1 & 3 \end{pmatrix}. \]

The basis vectors are u_9 = e_9 together with

u_1 = e_1 + 0.333 e_2,  u_3 = e_3 + e_2,  u_4 = e_4 - 0.333 e_2 + 0.5 e_5 + 0.167 e_6,
u_7 = e_7 + e_5 + 1.333 e_6,  u_8 = e_8 - 0.5 e_5 - 0.5 e_6.

The Lagrange multipliers, (λ_1, λ_4, λ_5) = (0.226, 0.559, 0.974), are all positive. Compute the vector

y(A) = (-8.346, -4.94, -1.992, 0.506, 0.512, 0.984, 1.923, 3.328, 7.832)^T.

We find Δ_1^3 y = 0, Δ_2^3 y = -2.042, Δ_3^3 y = 2.958, Δ_4^3 y = 0, Δ_5^3 y = 0 and Δ_6^3 y = 2.633, so the second constraint is violated. We pick the index i = 2 of the most violated constraint (-2.042) and add i = 2 to A.

Iteration 5. Now A = {1, 2, 4, 5} and T = {1, 4, 7, 8, 9}. We see that the active set in this iteration is the same as the active set of Iteration 5 of Example 1. Thus, we have reached the required solution

y(A) = (-8.577, -4.566, -1.726, -0.055, 0.444, 1.156, 2.081, 3.22, 7.832)^T

as in Example 1. The presentation of Example 2 is complete.
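The two starting strategies can be replayed with the fit_nonneg_differences sketch of Section II (0-based indices, so the optimal active set {1, 2, 4, 5} appears as [0, 1, 3, 4]); with the simplified exchange rule of that sketch, both starts reach the same active set as the examples:

```python
import numpy as np

phi = np.array([-8.232, -5.279, -1.652, 0.673, 0.159,
                0.361, 3.106, 2.839, 7.832])

# Example 1: start with all six constraints active.
y1, A1 = fit_nonneg_differences(phi, r=3, initial=range(6))
# Example 2: start from the unconstrained minimum y = phi.
y2, A2 = fit_nonneg_differences(phi, r=3, initial=())
print(A1 == A2 == [0, 1, 3, 4])     # both reach the active set {1, 2, 4, 5}
```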
It is worth noticing the tremendous difference that the choice of the initial active set made to this calculation. When all the constraints are included in the initial active set, our algorithm proceeds by dropping constraints from the active set. When the calculation starts from the unconstrained minimum, our algorithm proceeds by adding constraints to the active set. Thus it gives more control over which constraint to add to the active set, which allows a stable procedure (Goldfarb & Idnani 1983). However, this cannot be realized for such a small data set as the one used in these examples. The authors intend to provide future work on this subject.

References

Cullinan, M.P. 1990 Data smoothing using non-negative divided differences and l2 approximation. IMA J. of Numerical Analysis, 10, 583-608.

Demetriou, I.C. and E.A. Lipitakis 2005 Least squares data fitting by nonnegative divided differences. Unpublished manuscript.

Demetriou, I.C. and E.A. Lipitakis 2001 Certain positive definite submatrices that arise from binomial coefficient matrices. Applied Numerical Mathematics, 36, 219-229.

Demetriou, I.C. and M.J.D. Powell 1991 The minimum sum of squares change to univariate data that gives convexity. IMA J. of Numerical Analysis, 11, 433-448.

Fletcher, R. 2001 Practical Methods of Optimization. J. Wiley & Sons, Chichester, UK.

Goldfarb, D. and A. Idnani 1983 A numerically stable dual method for solving strictly convex quadratic programs. Math. Programming, 27, 1-33.

Golub, G.H. and C.F. Van Loan 1989 Matrix Computations, 2nd ed. The Johns Hopkins University Press, Baltimore and London.

Hildebrand, F.B. 1956 Introduction to Numerical Analysis. McGraw-Hill, New York.

Karlin, S. 1968 Total Positivity, Volume 1. Stanford University Press, Stanford, California.

Nocedal, J. and S.J. Wright 1999 Numerical Optimization. Springer, New York.

Robertson, T., Wright, F.T. and R.L. Dykstra 1988 Order Restricted Statistical Inference. J. Wiley & Sons, Chichester, UK.