LIDS-P-2067

Inverse Passive Learning of an Input-Output-Map Through Update-Spline-Smoothing*

M. Heiss
Laboratory for Information and Decision Systems
Massachusetts Institute of Technology

September 5, 1991

Abstract

This paper presents a robust method of learning passively a one-dimensional input-output-map when receiving only indirect information about the correct input-output-map (e.g. only the sign of the deviation between the actual estimated output value and the correct output value is obtained). This information is obtained for only one input-output combination per updating cycle. The approach is to increment or decrement step by step the output values of the actually stored map and then to apply global or local cubic spline smoothing, in order to avoid "adaptation holes" at points which are never updated or less frequently updated than other points. The method works with noisy measurements as well as with slowly time-varying systems. Even discontinuous changes of the desired input-output relation do not result in instability. Problems of convergence and stability are treated and design rules are given.

*This work was supported by the Austrian "Fonds zur Förderung der wissenschaftlichen Forschung" under contract J0514TEC.

Address for correspondence until September 30, 1991: Dr. Michael Heiss, LIDS/M.I.T. 35-417, Cambridge, MA 02139-4307, U.S.A.
Address for correspondence after October 1, 1991: Dr. Michael Heiss, R. Waisenhorngasse 115/52, A-1235 Vienna, Austria, EUROPE. Tel.: 011 (431) 883174, FAX: 011 (431) 883174-90

1 Introduction

The term learning is used in a large variety of meanings. The work reported in this paper deals with the learning of an input-output-map having m inputs ξ = (ξ_1 ... ξ_m)^T and one output f(ξ). In other words, an m-dimensional hypersurface f(ξ) has to be learned. To specify the term learning more clearly in this context, a distinction should be made between passive and active learning. Passive learning is defined as learning in the sense that the learner plays no role in obtaining information about the unknown input-output-map, as opposed to active learning, where the learner has complete choice in the information received.

In the most widely studied form of passive learning of an input-output-map, learning simply means collecting examples, consisting of the input vectors ξ_i and the corresponding output values at those locations, i.e. the heights of the surface f(ξ_i) [1].
Most of the methods allow noisy measurements of f(ξ_i) [2, 3, 4, 5, 6, 7, 8, 9]. The problem solved in these papers is to reconstruct the hypersurface from the set of sparse data points. The related keywords for this kind of problem are regularization theory, approximation, estimation, smoothing, regression, and interpolation [10, 11, 12]. In particular for high dimensions m of the input vector ξ the approximation problem is hard to solve [13, 4, 7, 14, 15]. This effect is the so-called "curse of dimensionality" (Bellman 1961 [16]).

In applications where the input-output-map is doing the work of a controller, the inverse problem of how to choose the input vector ξ to get a certain output value f(ξ) has to be solved (Atkeson 1986-91 [17, 7, 6, 18, 19]) [20].

In reflection of a wide range of applications the following problem is posed: Let ξ = (ξ_M, u)^T be composed of an (m-1)-dimensional vector ξ_M, representing the given measured input variables, and u, the scalar control variable. Define F ⊂ R, X_M ⊂ R^(m-1), U ⊂ R as the application-specific sets of all possible values of f(ξ), ξ_M, and u, respectively. Let f(ξ_M, u) : X_M, U → F be "practically invertible" in the sense that for every combination of elements of F and X_M, denoted f_q and ξ_Mq (the index q stands for "query"), there exists a u_q ∈ U such that f(ξ_Mq, u_q) = f_q (see Fig. 1).

Figure 1: Nonlinear plant f = f(ξ_M, u) and input-output-map u_q = f_inv(ξ_Mq, f_q) working as controller: the desired f = f_q is obtained by looking up the control variable u_q in the correctly learned input-output-map.

Then we can define the inverse function of

$$ f = f(\xi_M, u) : X_M, U \to F \qquad (1) $$

as

$$ u = f_{inv}(\xi_M, f) : X_M, F \to U. \qquad (2) $$

It is assumed that a very basic relationship between the change of the control variable u and the plant output f is known, by knowing the constant

$$ c_{sign} = \operatorname{sign}\!\left(\frac{\partial f(\xi_M, u)}{\partial u}\right), \qquad \forall\, \xi_M \in X_M,\ u \in U. \qquad (3) $$

The issue is to learn the inverse function f_inv(ξ_M, f) with the following features:

1. the plant can be a black-box system with the inputs ξ_M, u and the output f, describable by the unknown nonlinear function f = f(ξ_M, u) and relation (3),
2. the function f(ξ_M, u) and therefore f_inv(ξ_M, f) may vary very slowly during the life of the plant (the algorithm must forget the old experience),
3. the answer for every query f_inv(ξ_Mq, f_q) must be found immediately by looking it up in an input-output-map, and no post-computations are allowed,
4. f_inv(ξ_Mq, f_q) is known to be smooth; thus every estimate of f_inv must be smooth, too,
5. the learner plays no role in obtaining information about the unknown input-output-map (passive learning),
6. robustness with respect to noisy measurements,
7. arbitrary accuracy of u,
8. easy realizability.

2 Project Motivation

For a better understanding of the posed problem, an academic example is given, only to demonstrate the context and not to be considered as a real problem: The Golden Toast Problem. Let us assume you want to make a golden toast every day in the morning. Eq. (1) is the toaster model. In this case the color of the toast, the desired output of the plant, f = f_gold, is constant. To make it easy, the other input variables ξ_M are reduced to only one scalar ξ_M = θ_bread ∈ [-40 °C, +30 °C], the bread temperature at the time of putting the toast into the toaster. Depending on the bread temperature θ_bread and the control knob position ckp you will produce a toast with the color

$$ f = f(\theta_{bread}, ckp) = f(\xi_M, u). \qquad (4) $$
If you own a toaster which is powerful enough to produce a golden toast even if you take the bread out of a deep freezer with -40 °C, the function f is practically invertible and you have to learn, according to eq. (2),

$$ ckp = f_{inv}(\theta_{bread}, f_{gold}). \qquad (5) $$

Because of the constant setting f = f_gold, the dimension of f_inv : R^2 → R can be reduced to f_inv : R → R by writing

$$ ckp = f_{inv}(\theta_{bread}). \qquad (6) $$

It is known that a higher ckp leads to a darker toast (higher f). Thus, c_sign of eq. (3) is +1. Every morning you can measure the bread temperature θ_bread. Then you look up in your mental input-output-map the necessary control knob position ckp to produce a golden toast. If the toast is too dark, you will decrement the control knob position for this specific temperature in your mental map. If the toast is too light, you will increment it. After a couple of days you will have no problems making your toast the way you like it. Doing this every day, even a slowly drifting toaster characteristic would not handicap you in producing perfect toasts.

As long as you do not have a sensor available which is able to measure the plant output f during the process, or the sensor is not fast enough to be used for an on-line feedback control, you have to learn the relationship (2),(6). With this relationship a fast on-line feed-forward control can be done, using the updating procedure as a slow feedback loop. In some applications the input-output-map is just used for a fast but rough inner control loop and additionally an accurate outer control loop is provided. Note that in most cases passive learning is required. No test series are allowed.

Similar problems can be found in steel production and robot control. Numerous applications can be found in automotive control:

* single-cylinder control for modern Diesel engines, compensating for the tolerances of the different cylinders over rotation speed and injection quantity (in this case z input-output-maps are necessary for z cylinders) (Heiss and Augesky 1988 [21, 22]),
* adaptive full-load fuel control to avoid too high injection quantities which would result in bad smoke emission values (Heiss 1989 [21, 23]),
* adaptation of the ignition angle for different fuel qualities (Kiencke 1987 [24]),
* control of knock frequency with forward-loop adaptation (Kiencke 1987 [24]),
* line pressure control for automatic transmission (Yamaguchi 1990 [25]),
* position control under friction (Bitzer 1989 [26], Silberbauer 1990 [27]),
* and many others [28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44].

3 Update-Smoothing or Query-Smoothing

The one-dimensional case (m = 1) is very important for industrial applications due to the possibility of cheap realization. Even this one-dimensional case has not been solved in general up to now [see the above mentioned application papers]. Therefore, in this first paper only the one-dimensional case of inverse passive learning of the function

$$ u = f_{inv}(x) \qquad \forall\, x \in X = [a, b] \qquad (7) $$

will be treated, where u and x are scalars (e.g. ckp and θ_bread). The function f_inv is known to be smooth. Then for every given accuracy Δu_max (see feature 7) a constant sampling interval Δx can be found (e.g. Δx ≤ Δu_max / max_{x∈X} |d f_inv/dx|) such that

$$ |f_{inv}(x) - f_{inv}(i\Delta x + c)| < \Delta u_{max} \qquad \forall\, x \in X, \qquad (8) $$

where i = argmin_j |x - jΔx - c| and c is a constant which makes i(x = a) = 1.
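As a small numerical illustration of how (8) fixes the size of the stored table, the following sketch uses purely hypothetical numbers (a slope bound of 2 control-knob units per °C over X = [-40, 30] and Δu_max = 1); the bound Δx ≤ Δu_max / max|df_inv/dx| is the first-order sufficient condition mentioned above.

```python
# Hypothetical numbers, only to illustrate the choice of the sampling interval (8).
a, b = -40.0, 30.0          # range X = [a, b] of the input x (e.g. bread temperature)
du_max = 1.0                # required accuracy of u (feature 7)
max_slope = 2.0             # assumed bound on |d f_inv / dx| over X

dx = du_max / max_slope     # sampling interval satisfying (8)
n = int((b - a) / dx) + 1   # number of stored points x_1 ... x_n
print(dx, n)                # -> 0.5, 141
```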
With x = (x_1, x_2, ..., x_n)^T, x_i = iΔx + c, and u = (u_1, u_2, ..., u_n)^T, u_i = f_inv(x_i), the problem is reduced to learning an input-output-mapping

$$ x \to u, \qquad (9) $$

where x is constant and known, and u is very slowly time-varying and unknown (Fig. 2). For simplicity u can be treated as being constant over time. (The algorithm, to be proposed in the following chapter, will work in the same way whether u is constant or very slowly time-varying. Most of the other known algorithms do not share this property.)

At the very beginning, instead of the unknown u an initial estimate û(0) is proposed. This initial estimate û(0) may be any vector with û_i ∈ U, ∀ i ∈ {1 ... n}, and might be a very bad estimate of u. When the system starts running, the first query at x_j is answered with û_j(0). The control variable is used as input to the plant. After completing the process the output f(0) of the plant is measured. If f(0) is not equal to the desired output f_d(0), then û_j is updated according to

$$ \hat u_j(1) = \hat u_j(0) + c_{sign}\, h(f_d(0) - f(0)). \qquad (10) $$

If nothing more than (3) is known about the system, then the most robust way of choosing the operator h is

$$ h(\Delta) = c_{incr}\, \operatorname{sign}(\Delta). \qquad (11) $$

Otherwise h(Δ) can also be a gain or a more complicated controller. c_incr is the constant increment or decrement. The simplest way of updating would be to update only û_j and to keep all the other û_i fixed:

$$ \hat u_i(1) = \hat u_i(0) \qquad \forall\, i \in \{1 \ldots n\},\ i \ne j. \qquad (12) $$

Figure 2: f_inv(x), represented by the step function or the vector u = (u_1 ... u_n)^T, has to be learned. û(0) is the initial estimate of u.

Continuing to change only the data points where information is obtained can cause problems when the number n of x-intervals is too high and/or it cannot be guaranteed that all û_i can be updated evenly. Some û_i may never be adapted and remain at their initial value û_i(0). Such non-adapted or less frequently adapted points û_i are so-called "adaptation holes" [45] and are unacceptable (see feature 4). Every application has to deal with this problem.

Holmes 1989 [38] uses only a very small number n of intervals. Consequently the accuracy of u is low and the step height between u_i and u_{i+1} is high, but every û_i is well adapted. Holmes utilizes hysteresis on all the interval limits in order to hide the high steps of û and to avoid an irritation of an outer-loop controller, which has the input-output-map in its control loop. This idea is applicable only for a very limited set of applications.

Schmidt and Schmitt 1988 [31] invented the "tent roof" adaptation. They do not only modify the value û_j but also its 2r neighbors (r ... tent roof radius), with weights that decrease with increasing distance from x_j:

$$ \hat u_{j+\nu}(k+1) = \hat u_{j+\nu}(k) + \Big(1 - \tfrac{|\nu|}{r+1}\Big)\, c_{sign}\, \operatorname{sign}(f_d - f(x_j(k))) \qquad \forall\, \nu \in \mathbb{Z},\ |\nu| \le r. \qquad (13) $$

The disadvantage of this method is that a non-adapted or less frequently adapted value û_i does not necessarily fit smoothly between its neighbors. The advantage of this method is that it can be extended easily to the multi-dimensional case.

The approach in this paper is to update û_j and then immediately smooth the entire vector û. In this case a smooth û can be guaranteed at any time (see feature 4). The smoothing can be accomplished by multiplying û with a constant smoothing matrix S (see section 5). This kind of smoothing is called update-smoothing because the smoothing is part of the updating procedure. An alternative approach would be to collect the unsmoothed updates and to smooth the data only if a query is posed (Atkeson 1991 [19]). The latter can be called query-smoothing. The proposed feature 3 requires the update-smoothing and prohibits the query-smoothing.
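The two point-wise update rules discussed above can be sketched as follows. This is only a minimal illustration, not the paper's implementation; the array name u_hat is an assumption, the plant feedback enters only through the sign of f_d - f as in (10), (11), and the linear tent-roof weight 1 - |ν|/(r+1) is the assumed form of the partly unreadable weight in (13).

```python
import numpy as np

def point_update(u_hat, j, fd_minus_f, c_sign=1.0, c_incr=1.0):
    """Eqs. (10)-(12): change only the queried point j by a fixed increment."""
    u_new = u_hat.copy()
    u_new[j] += c_sign * c_incr * np.sign(fd_minus_f)
    return u_new

def tent_roof_update(u_hat, j, fd_minus_f, r=2, c_sign=1.0, c_incr=1.0):
    """Eq. (13) (Schmidt/Schmitt): also move the 2r neighbors of j with
    linearly decreasing weights (assumed form 1 - |nu|/(r+1));
    (13) as printed uses only c_sign*sign(.), c_incr is kept here for symmetry."""
    u_new = u_hat.copy()
    step = c_sign * c_incr * np.sign(fd_minus_f)
    for nu in range(-r, r + 1):
        i = j + nu
        if 0 <= i < len(u_new):
            u_new[i] += (1.0 - abs(nu) / (r + 1)) * step
    return u_new
```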
4 The Update-Smoothing Algorithm

According to the previous section the following algorithm is proposed for inverse passive learning of an input-output-map through update-smoothing:

* find an appropriate sampling interval Δx (8)
* find the constant c_sign, representing the relationship (3)
* find an initial estimate û(0) which roughly estimates u
* set the desired plant output f_d to a constant value or define f_d = x
* choose the controller function h (e.g. (11))
* compute the smoothing matrix S (18)
* start the system (k = 0)
* repeat forever:
  - read query x_j
  - look up û_j(k) in the actual (k-th) input-output-map at x_j
  - use û_j(k) as input to the plant
  - wait for completion of the plant process
  - measure f(k)
  - set û_j(k+) = û_j(k) + c_sign h(f_d - f(k))   *** updating ***
  - set û(k+1) = S û(k+)   *** smoothing ***
  - k = k + 1

The algorithm satisfies all the required features 1 to 7. In particular the combination of feature 2 (slowly time-varying u), feature 3 (immediate answering of queries), and feature 4 (smooth û) is not provided by other published methods. Feature 8 (easy realizability) will be shown in the next section.

5 Global Spline Smoothing

A well known technique for smoothing the data vector û(k+) is to find

$$ \hat f_{inv}(x) = \arg\min_{g} \left[ \sum_{i=1}^{n} \big(\hat u_i(k^+) - g(x_i)\big)^2 + p \int \big(g''(x)\big)^2\, dx \right], \qquad (14) $$

optimizing g(x) over the Sobolev space W_2^2 of functions with g' absolutely continuous and g'' ∈ L_2 (Schoenberg 1964 [46], Reinsch 1967 [47], De Boor 1978 [48], Wahba 1975-1990 [3, 49], Craven and Wahba 1979 [50], Schumaker 1981 [51], Eubank 1984 [52, 53, 54], Wegman 1983 [5], Silverman 1985 [55]). p is a constant smoothing parameter trading off the smoothness of the curve f̂_inv(x) against its closeness to the values û_i(k+) (the basic underlying idea was described by Whittaker 1923 [56]). The solution is known to be a cubic spline approximation of û(k+) (Reinsch 1967 [47]). When p = 0, the solution is any interpolating function; the limit p → 0 of the solutions is the cubic spline interpolation; for p → ∞ the solution is the least squares line (Buja 1989 [13], Wahba 1989 [57], Wahba 1990 [49] p. 14).

With û_i(k+1) = f̂_inv(x_i) it can be shown that (14) is equivalent to solving

$$ \hat u(k+1) = \arg\min_{\bar u} \left[ \|\hat u(k^+) - \bar u\|^2 + p\, \bar u^T K \bar u \right], \qquad (15) $$

where

$$ K = D^T M^{-1} D, \qquad \dim D = (n-2) \times n, \quad \dim M = (n-2) \times (n-2), \qquad (16) $$

with D the second-difference matrix whose rows are (1, -2, 1) and M the symmetric tridiagonal weighting matrix of the cubic smoothing spline for equally spaced points, with 2 on the diagonal and 1 on the off-diagonals up to a constant factor (Buja 1989 [13]; the introduction of that paper, p. 453-467, gives a very good survey of linear smoothers, in particular cubic spline smoothers). Note that K is of rank n-2. With K = K^T and setting the derivative of the right-hand term of (15) with respect to ū to zero, the solution of (15) is

$$ \hat u(k+1) = S\, \hat u(k^+), \qquad (17) $$

where

$$ S = (I + pK)^{-1}. \qquad (18) $$

Note that K is a constant matrix and so is S. Thus, the smoothing procedure is a simple matrix multiplication (17), which is easily realizable.
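A minimal sketch of (16)-(18) in Python/NumPy follows. The exact scaling of the inner weighting matrix M is tied to the paper's normalization, which is not fully recoverable here; the sketch uses the standard cubic-smoothing-spline weighting for unit knot spacing (2/3 on the diagonal, 1/6 off the diagonal), so a deviation from the paper's K would only rescale the smoothing parameter p. The variable names and numbers are illustrative.

```python
import numpy as np

def smoothing_matrix(n, p):
    """S = (I + p*K)^-1 with K = D^T M^-1 D, cf. eqs. (16), (18).
    D: (n-2) x n second-difference matrix, rows (1, -2, 1).
    M: (n-2) x (n-2) tridiagonal spline weighting (assumed 2/3 diag, 1/6 off-diag)."""
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = (1.0, -2.0, 1.0)
    M = (2/3) * np.eye(n - 2) + (1/6) * (np.eye(n - 2, k=1) + np.eye(n - 2, k=-1))
    K = D.T @ np.linalg.solve(M, D)              # rank n-2
    return np.linalg.inv(np.eye(n) + p * K)

# one update-smoothing cycle of the algorithm in Section 4 (hypothetical values)
n, p, c_sign, c_incr = 100, 0.0083, 1.0, 1.0
S = smoothing_matrix(n, p)
u_hat = np.zeros(n)                              # initial estimate u_hat(0)
j, fd_minus_f = 42, +1.0                         # query index and measured deviation
u_plus = u_hat.copy()
u_plus[j] += c_sign * c_incr * np.sign(fd_minus_f)   # updating, eqs. (10), (11)
u_hat = S @ u_plus                                   # smoothing, eq. (17)
```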
Figure 3: Simulation of how u is approximated when starting (k = 0) with a completely different estimate û(0) and p = 0.0083. a) û(10 000) compared with u and û(0); b) û plotted over time.

Example. Fig. 3 shows how the algorithm works. u, with u_i = f_inv(x_i), is the sampled unknown function to be found. û(0) is an initial estimate of u, being very different from it. f_inv(x) is chosen to be discontinuous at x = 75 and not smooth at x = 40 and x = 60 to show the properties of the algorithm. The vector u is defined to be smooth if

$$ u \approx S u. \qquad (19) $$

The algorithm does not know that f_inv(x) is not a smooth function. Thus, the result of the learning process can only be a smooth approximation of f_inv(x). The reason for this unusual choice of u is to give an example showing that even if f_inv(x) is not as smooth as expected, the proposed algorithm does not become unstable. The inputs x_j (e.g. θ_bread) of the map were given randomly. The output û_j(k) of the map was used as input to the plant. If the output f of the plant was higher than the desired value f_d, then û_j(k) was decremented (û_j(k+) = û_j(k) - 1), otherwise it was incremented. Then the smoothing û(k+1) = S û(k+) was done. Every 100th k a new line û was plotted in Fig. 3b. Fig. 3a compares û(10 000) with u and shows that û(10 000) is a very good approximation of u as long as the requirement of a smooth f_inv is satisfied. The small limit cycle of û around u is caused by the increment constant c_incr, eq. (11), being equal to 1.

6 Convergence and Stability

The example has shown that the algorithm works "pretty well". Nevertheless, it would be of interest to give at least some ideas of convergence and stability in general. In this context convergence means that there exists a K < ∞ such that

$$ |\hat u_i(K) - u_i| < c_{limit\,cycle} \qquad \forall\, i \in \{1 \ldots n\}, \qquad (20) $$

where c_limit cycle is a small constant satisfying the needs of the application. Usually c_limit cycle is approximately c_incr (see eq. (11)). Stability means that eq. (20) holds for all k > K.

Referring to the algorithm, the two steps updating and smoothing are responsible for the change of û. The updating step always tries to improve the approximation of u as long as the controller h is chosen properly (10), (11). The critical step is the smoothing operation. The global spline smoothing (14),(15) is only able to smooth the function and has originally nothing to do with estimating u, except for the fact that u is known to be smooth (19). Thus, û will be smoothed even if û is exactly u. That means that stability is not obvious although the algorithm is convergent. Global spline smoothing can result in a loss of stability, but smoothing is necessary to guarantee feature 4. Of course, eq. (19) says that the forgetting of an already learned û = u through smoothing is very slow. In order to give a theoretical background to this problem, some subproblems are considered:

1. What happens if smoothing is applied infinitely often to û = u without updating in the meantime? (section 6.1)
2. How fast does the algorithm forget an already learned u? (section 6.2)
3. If sufficient information is obtained for some of the n points of u, to what values will the intermediate points converge? (section 6.3)

6.1 Infinitely many times of smoothing without updating

Without updating there is no longer a difference between k and k+ in eq. (17). û(k) can be calculated immediately by û(k) = S^k û(0). For k → ∞ we get û(∞) = S^∞ û(0). What is û(∞) in relation to û(0)?

Theorem 1: If û(0) is smoothed infinitely many times by û(k+1) = S û(k), where S is defined in eq. (18), then lim_{k→∞} û(k) is the least squares regression line of û(0). This means that

$$ S^{\infty} = H, \qquad (21) $$

where H is the hat matrix of the linear least squares regression.

Proof: Since S is symmetric it can be diagonalized, and the k-th power of S can be written as (Strang [58] p. 296, 264, a fascinating book about linear algebra)

$$ S^k = V \Lambda^k V^T = \lambda_1^k v_1 v_1^T + \lambda_2^k v_2 v_2^T + \lambda_3^k v_3 v_3^T + \ldots + \lambda_n^k v_n v_n^T, \qquad (22) $$

where {λ_i}_{i=1}^n are the eigenvalues of S sorted in decreasing order, {v_i}_{i=1}^n are the corresponding orthonormalized eigenvectors, Λ = diag{λ_i}, and V = (v_i)_{i=1}^n. It is known (e.g. [13] p. 463) that λ_1 = λ_2 = 1 and 0 < λ_i < 1 for i ≥ 3. Due to this fact, (22) can be written as

$$ S^k = (v_1 v_1^T + v_2 v_2^T) + \lambda_3^k v_3 v_3^T + \ldots + \lambda_n^k v_n v_n^T. \qquad (23) $$

The first term represents a projection onto the plane spanned by (v_1, v_2) ([58] p. 380). The eigenvectors of λ_1 = λ_2 = 1 are defined by S v = v, in other words the space of all v which are reproduced by spline smoothing. Since constants and straight lines are not changed by spline smoothing ([13] p. 461), the plane spanned by the vectors (1, x) is exactly the same plane as spanned by any other eigenvectors (v_1, v_2) of λ_1 = λ_2 = 1. The projection matrix projecting onto the plane (1, x) is known to be the least squares line hat matrix H ([58] p. 158-160, [13] p. 458). Thus, (23) becomes

$$ S^k = H + \lambda_3^k v_3 v_3^T + \ldots + \lambda_n^k v_n v_n^T. \qquad (24) $$

For k → ∞ all terms with 0 < λ_i < 1 vanish, leaving S^∞ = H. □
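Theorem 1 is easy to check numerically. The following sketch (sizes are hypothetical; smoothing_matrix is the assumed construction from the sketch after eq. (18)) raises S to a high power and compares it with the hat matrix H of the least squares line; the difference should be close to zero.

```python
import numpy as np

def smoothing_matrix(n, p):
    # as in the sketch after eq. (18): S = (I + p * D^T M^-1 D)^-1
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = (1.0, -2.0, 1.0)
    M = (2/3) * np.eye(n - 2) + (1/6) * (np.eye(n - 2, k=1) + np.eye(n - 2, k=-1))
    return np.linalg.inv(np.eye(n) + p * D.T @ np.linalg.solve(M, D))

n, p = 25, 1.0
S = smoothing_matrix(n, p)
x = np.arange(1, n + 1, dtype=float)
X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.solve(X.T @ X, X.T)        # least-squares-line hat matrix
Sk = np.linalg.matrix_power(S, 10**7)        # "infinitely many" smoothings
print(np.abs(Sk - H).max())                  # close to 0, as claimed by Theorem 1
```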
6.2 Forgetting dynamics

The last section has shown that even an already learned û will drift towards its regression line after many times of smoothing without updating (Fig. 4). The same problem occurs when updating takes place for a long time only in a small local region of û and not over the whole range of x_i-values. For example, updating the left endpoint of û does not directly influence the right end of û, in the sense that it makes almost no difference for the right end of û whether the left end is updated or not while the whole û is only smoothed. Step by step, the right end will become smoother and smoother until it is a straight line.

Figure 4: û(k) after k = 10^d times of smoothing, d = 1, 2, 3, 4 (p = 0.0083).

In order to derive restrictions for a minimum rate of updating every region, the dynamics of S^k u going to its regression line H u should be considered. Eq. (24) is the spectral form of the matrix S^k. The eigenvectors are orthonormal to each other. Demmler and Reinsch 1975 [59] (or [13] p. 463, 465) proved that for i ≥ 3 the number of sign changes in the i-th eigenvector of a cubic spline smoother is i - 1. The eigenvectors look like polynomials of degree i - 1. Any input vector û is split up into its orthogonal components by projecting û onto the eigenvectors v_i ([58] p. 148). This projection is very similar to the classic Fourier decomposition. Instead of decomposing û into its sin(iωx)-components, û is decomposed by (24) into its polynomial-like components v_i. The components with higher "frequencies" (high number of sign changes) have small coefficients λ_i. Thus, during k times of smoothing (λ_i^k) these components go to zero faster than lower-frequency components, which have eigenvalues λ_i closer to one. This is exactly what is supposed to happen when the input vector û is smoothed (see again Fig. 4). The last remaining non-linear component is the one having the largest eigenvalue less than one (i.e. λ_3) as coefficient. If the input vector u_par looks like a parabola and is equal to the eigenvector v_3, then the projections onto all other eigenvectors are zero. In this case û(k) can be written with (24) as

$$ \hat u(k) = S^k u_{par} = \lambda_3^k v_3 v_3^T u_{par} + 0 + \ldots + 0 = \lambda_3^k u_{par} = u_{par}\, e^{-t/\tau} \qquad (25) $$

with t = kT, where T is the sampling period, and with the time constant

$$ \tau = -T/\ln(\lambda_3), \quad \text{which for } \lambda_3 \to 1 \text{ can be estimated by } \tau \approx \frac{T}{1 - \lambda_3}. \qquad (26) $$
For low dimensions (n ≤ 10) of S the eigenvalue λ_3 can simply be computed with the help of a standard program package. For higher dimensions an estimation formula for λ_3(S) would be helpful. S depends on the smoothing parameter p, the number of points n, and the sampling interval Δx. The same dependencies hold for the eigenvalues. The eigenvalues are characteristic values for the smoothing behavior (24) and therefore they should not depend on n and Δx, but only on the smoothing parameter. Some authors use pn in eq. (14) instead of p, or divide the first term by n, which is equivalent [3, 52, 60]. In order to make the minimization truly dimensionless,

$$ p = p_m\, n\, (b-a)^3 \qquad (27) $$

must be used in eq. (14), where p_m is the independent smoothing parameter and b - a = n Δx (7),(8). p_m is usually a very small number (e.g. 10^-10).

Estimation 1: If p_m is less than 10^-4 and n > 4, then λ_3 can be estimated with an error of less than 10% by

$$ 1 - \lambda_3(S) \approx 751\, p_m. \qquad (28) $$

With (26), τ can be expressed as

$$ \tau \approx \frac{T}{751\, p_m}. \qquad (29) $$

In (28),(29) p_m can be replaced by p by inverting (27).

Example: p = 8.3·10^-3, n = 100, Δx = 1, b - a = 100 was used in Fig. 3 and Fig. 4. With (27), p_m = 8.3·10^-11 follows; (28) gives 1 - λ_3 ≈ 6.26·10^-8 and (29) gives τ ≈ 15 980 000 T.

Silverman 1984 shows in his paper [61] that spline smoothing can be seen as variable kernel smoothing when writing (17) as

$$ \hat u_i(k+1) = s_i\, \hat u(k^+), \qquad i \in \{1 \ldots n\}, \qquad (30) $$

where the kernel s_i is the i-th row of S. The kernel s_i is centered at i. The shape and therefore the bandwidth bw is almost constant over all i, except near the boundaries of the interval (Fig. 5).

Figure 5: Kernels s_i for i = 1, 10, 30, 50, 70, 90, 100 (n = 100, p = 1).

The bandwidth bw of the kernels can either be estimated [61, 62] or directly measured at one of the middle rows (e.g. s_{n/2}) of S. The bandwidth bw is a measure of how many points in the neighborhood of an updated point (x_j, u_j) are directly influenced by this update.

Conclusion: If t_max is the maximum updating time-period over all possible intervals [x_{j-bw}, x_{j+bw}] ⊂ X, then one should choose

$$ t_{max} \ll \tau \qquad (31) $$

in order to get feasible convergence and stability results.
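For moderate n, the eigenvalue λ_3 and the forgetting time constant τ of (26) can be computed directly, as the text suggests. The following sketch does this (smoothing_matrix as in the earlier sketch; all numbers hypothetical). Note that the constant 751 in Estimation 1 is tied to the paper's exact normalization in (16) and (27); with the assumed weighting matrix, the numerically computed 1 - λ_3 may differ from 751 p_m by a modest constant factor, so the comparison is only indicative.

```python
import numpy as np

def smoothing_matrix(n, p):
    # as in the sketch after eq. (18)
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = (1.0, -2.0, 1.0)
    M = (2/3) * np.eye(n - 2) + (1/6) * (np.eye(n - 2, k=1) + np.eye(n - 2, k=-1))
    return np.linalg.inv(np.eye(n) + p * D.T @ np.linalg.solve(M, D))

n, p, dx, T = 100, 0.0083, 1.0, 1.0            # grid size, smoothing parameter, spacing, period
lam = np.sort(np.linalg.eigvalsh(smoothing_matrix(n, p)))[::-1]
lam3 = lam[2]                                   # third-largest eigenvalue of S
tau = -T / np.log(lam3)                         # forgetting time constant, eq. (26)
p_m = p / (n * (n * dx) ** 3)                   # eq. (27) with b - a = n*dx
print(1 - lam3, tau, 751 * p_m)                 # 1 - lam3 should be of the order of 751*p_m
```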
6.3 Infinitely many times of smoothing with fixed points

In real applications it is usually impossible to guarantee that all points will be updated equally many times. What will happen if only some of the points are updated frequently while other points are never updated? The main motivation for this paper was to find an automatically interpolating algorithm that avoids adaptation holes. Does the algorithm really converge to an interpolation of the more frequently updated points?

Let l points of û with indices i ∈ I_fix = {i_1 ... i_l} be updated very frequently, or even better, be perfectly updated. To get an idea of what is going on, we can analyze the following model: assume a continuous perfect updating process of the l points u_c = (u_{i_1} ... u_{i_l})^T in such a way that we can see them as constants. Now, we smooth infinitely many times by minimizing the cost function of (15), but with the additional boundary condition u_c(k+1) = u_c(k).

With the Lagrange multipliers [63] written in the vector

$$ \mu = (\mu_1 \ldots \mu_n)^T, \qquad \mu_i = 0 \ \text{for } i \notin I_{fix}, \qquad (32) $$

the problem can be formulated as solving

$$ \hat u(k+1) = \arg\min_{\bar u} \left[ \|\hat u(k) - \bar u\|^2 + p\, \bar u^T K \bar u + \mu^T (\hat u(k) - \bar u) \right]. \qquad (33) $$

By setting the derivative with respect to ū to zero and introducing a rectangular "cut"-matrix of dimension l × n,

$$ C = \{c_{ji}\}, \qquad c_{ji} = \begin{cases} 1 & \text{for } i = i_j \\ 0 & \text{else} \end{cases} \qquad \forall\, i \in \mathbb{N},\ i \le n,\ j \in \mathbb{N},\ j \le l, \qquad (34) $$

cutting away all non-constant û_i (e.g. u_c = C û), the solution of (33) is

$$ \hat u(k+1) = S_{fix}\, \hat u(k), \qquad (35) $$

where

$$ S_{fix} = S \left[ I + C^T (C S C^T)^{-1} C (I - S) \right]. \qquad (36) $$

(For example, if n = 13 and there are l = 3 fixed points at x_3, x_7, and x_12, then I_fix = {i_1 = 3, i_2 = 7, i_3 = 12} and C is the 3 × 13 matrix with ones at the positions (1,3), (2,7), (3,12) and zeros elsewhere. The computation leading to (36) is given in Appendix A.)

Theorem 2: If û(0) is smoothed infinitely many times by û(k+1) = S_fix û(k), where S_fix is defined in eq. (36) and l ≥ 2, then lim_{k→∞} û(k) = u_unc is the univariate natural cubic (unc) spline interpolation of the fixed points {(x_{i_j}, u_{i_j})}_{j=1}^l. This means that

$$ S_{fix}^{\infty} = S_{unc}, \qquad (37) $$

where S_unc is the (n × n) hat matrix for the univariate natural cubic spline interpolation of the l fixed points. (The term natural comes from the natural boundary conditions: instead of integrating the square of the second derivative from x_1 to x_n, it is integrated from -∞ to +∞, which makes the function û(x) a straight line outside the interval [x_1, x_n] according to the minimization of this term (see eq. (38)). The same result is achieved by the boundary conditions û''(x_1) = û''(x_n) = û'''(x_1) = û'''(x_n) = 0.)

Proof: Every multiplication by S_fix minimizes

$$ N = \underbrace{\sum_{i=1}^{n} \big(\hat u_i(k) - \hat u_i(k+1)\big)^2}_{N_1} + \underbrace{p \int \big(\hat u''(k+1, x)\big)^2\, dx}_{N_2} \qquad (38) $$

over the subspace S_fix of the Sobolev space W_2^2 of functions û(k+1, x) with û' absolutely continuous, û'' ∈ L_2, and

$$ \hat u_i(k+1) = \hat u_i(k) \qquad \forall\, i \in I_{fix}. \qquad (39) $$

(Note that û here denotes a continuous function of x; the û_i are the values of û(x) at x = x_i composing the vector û = (û_1, û_2, ..., û_n)^T.) Denote the minimum of N in (38) by

$$ N(k+1) = N_1(k+1) + N_2(k+1) \qquad (40) $$

and N_2opt = min_{û ∈ S_fix} N_2. Then it is known that this minimum of N_2 is only reached by the univariate natural cubic spline interpolation u_unc, which is unique (e.g. Wahba 1989 [57]).

a) First we show that if the series û(k) is convergent, it converges to u_unc. Assume there exists a u* = S_fix u* ≠ u_unc. Set u* for û(k) and û(k+1) in (38); then it remains to minimize N = 0 + N_2, which is N_2opt. N_2opt is only reached by û(k+1) = u_unc, which is a contradiction.

b) Assume N_2(k) = N_2opt, in other words û(k) = u_unc; then no further optimization is possible. Choosing û(k+1) = û(k) makes N_1(k+1) minimal (= 0) and N_2(k+1) minimal (= N_2opt). Thus, starting with û(k) = u_unc, all û(k+j) will be equal to u_unc.

c) Assume N_2(k) ≠ N_2opt. Then N_2(k) > N_2opt. A trivial trial of minimizing N would be to choose û(k+1) = û(k), resulting in Ñ(k+1) = 0 + N_2(k+1) with N_2(k+1) = N_2(k). This cannot be the minimizing solution as a result of part a) of this proof: û(k+1) = û(k) is only possible if N_2(k) = N_2opt, which is a contradiction. Since we know the existence of a solution, there must be a solution with N(k+1) < Ñ(k+1) = N_2(k). With N_1(k+1) ≥ 0 and (40) it follows that

$$ N_2(k+1) < N_2(k). \qquad (41) $$

Thus, if k → ∞, N_2(k) → N_2opt [63, p. 297], and because of uniqueness û(k) → u_unc. □

In case of l = 0, Theorem 1 provides the solution (regression line). It is mentioned without proof that for l = 1 the solution converges to the regression line through the fixed point.
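The following sketch illustrates (36) and Theorem 2 numerically: it builds S_fix for a few fixed indices, iterates the smoothing until it settles, and compares the limit with SciPy's natural cubic spline through the fixed points. The fixed indices include both endpoints so that no extrapolation is involved; smoothing_matrix is the assumed construction from the earlier sketch, and all numbers are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def smoothing_matrix(n, p):
    # as in the sketch after eq. (18)
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = (1.0, -2.0, 1.0)
    M = (2/3) * np.eye(n - 2) + (1/6) * (np.eye(n - 2, k=1) + np.eye(n - 2, k=-1))
    return np.linalg.inv(np.eye(n) + p * D.T @ np.linalg.solve(M, D))

n, p = 41, 0.1
S = smoothing_matrix(n, p)
fixed = [0, 10, 20, 30, 40]                       # indices i_1 ... i_l (0-based)
C = np.zeros((len(fixed), n))
C[np.arange(len(fixed)), fixed] = 1.0             # "cut" matrix, eq. (34)
S_fix = S @ (np.eye(n) + C.T @ np.linalg.solve(C @ S @ C.T, C @ (np.eye(n) - S)))  # eq. (36)

x = np.arange(n, dtype=float)
u_hat = np.random.default_rng(0).uniform(-10.0, 10.0, n)   # arbitrary start vector
for _ in range(500000):                            # repeated smoothing, eq. (35)
    u_new = S_fix @ u_hat
    if np.abs(u_new - u_hat).max() < 1e-12:
        break
    u_hat = u_new

spline = CubicSpline(x[fixed], u_hat[fixed], bc_type='natural')
print(np.abs(u_hat - spline(x)).max())            # should be small (Theorem 2)
```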
Conclusion: The purpose of this section was to show that if some points are never updated but only smoothed through global spline smoothing, then the non-updated points will converge to the cubic spline interpolation of the frequently updated points (see also Fig. 9).

7 Local Spline Smoothing

The learning algorithm using the global spline smoothing works pretty well as long as updates are provided which are evenly distributed over the whole range of x_i-values. It does not matter if some of the points are never updated; they converge to the spline interpolation of the frequently updated points (section 6.3). The worst case for the global smoothing algorithm is to update only one point x_j for a very long period. This case can easily happen in practice (e.g. consider the learning of some car-engine characteristics during a cruise-control ride on a flat, straight freeway). Due to the global smoothing, in this case all the knowledge about u would be averaged, resulting in a straight regression line through the perfectly updated point at x_j.

Methods of turning the learning algorithm on and off are not recommended. In almost every application a counter-example can be constructed where the on-off switching of the algorithm causes problems (e.g. consider again the car on the freeway: the temperature will change during the ride and therefore u will change, too; learning is required all the time). Forgetting the older points in an averaging way can be useful in some applications. Nevertheless, in most cases global smoothing is not desired and local smoothing would be appreciated.

From the literature the "k-nearest neighbor estimators" (e.g. [54, 13, 6]) are known: every point is smoothed, but only the k nearest neighbors are used to compute each of them. Contrary to this approach, we need a smoother, called a local smoother, which smoothes only a local neighborhood around the newly updated point but uses as many points as necessary to compute the smoothed points. The problem is to fit the smoothed neighborhood in a smooth way into the unchanged part outside the neighborhood.

7.1 Moving window

The idea of local update-spline-smoothing is to define a window [x_{j-r}, x_{j+r}] around x_j, to update the point at x_j in the usual way, but to smooth only the neighboring points of x_j within the window (2r+1 points). In this case eq. (18) cannot be used to compute the window-smoothing matrix S_local. The regular spline smoother from eq. (18) would change all points inside the window without considering the required smoothness at the boundaries of the window. The smoothness is necessary in order to guarantee the smoothness of û(k) at all times (see feature 4 in section 1). Three methods of local smoothing are presented, all providing the smoothness at the boundaries.

7.1.1 All points outside the window are fixed

One way to provide the smoothness would be to calculate S_local using eq. (36) as S_local(x_j) = S_fix with I_fix = {all i from 1 to n without i ∈ {j+ν}_{ν=-r}^{+r}}. In other words, all points outside the window are chosen as fixed points. The disadvantage of this approach is that S_local(x_j) is dependent on x_j and therefore not constant any more, as it was in eq. (18).

7.1.2 Three fixed points on each side

As we are dealing with cubic splines, it is sufficient to consider for smoothness only the continuity of the zeroth, first, and second derivative at the boundary. Thus a better approach is to define only the next three points outside the window on both sides to be fixed points and to calculate S_local using again eq. (36).
Define

$$ w = 2r + 1 \qquad (42) $$

to be the number of points within the window. As shown in Fig. 6, we can take the w points of the window and the six points outside the window (3 on each side) and calculate the new points û_i(k+1) within the window. We need w + 6 input values and have w output values (it would not make sense to compute the 6 fixed points because they remain constant anyway). Thus, the matrix S_local must only have the dimension w × (w+6), which is much less than n × n. As the points are equally spaced, it makes no difference which local part of the vector û(k+) is smoothed. S_local is always the same constant matrix and not dependent on j.

A problem occurs if j is less than r + 4 or greater than n - r - 3. In this case the input values of the matrix multiplication (upper oval in Fig. 6) exceed the boundaries of the vector û(k+). The simplest method of solving this problem is to augment the data vector with u_1 on the left side and with u_n on the right side. Now, S_local is really a constant matrix.

Figure 6: Illustration of eq. (47): After û_j(k) is updated to û_j(k+), only the w points within the window are smoothed instead of smoothing the whole vector û. The next three points outside the window are used for the smoothing process but are considered as fixed points in order to fit the smoothed part smoothly into the unchanged part of the vector.

According to the augmentation method define

$$ \mathrm{augm}(i) = \begin{cases} 1 & \text{if } i < 1 \\ n & \text{if } i > n \\ i & \text{else.} \end{cases} \qquad (43) $$

Other methods of generating data outside the vector can also be considered. Further, let us define some sets of indices. The set of n indices of û is I = {i}_{i=1}^n. The set of w indices within the window around j is

$$ I_w = I_w(j) = \{\mathrm{augm}(i)\}_{i=j-r}^{j+r}, \qquad (44) $$

and the set of w+6 indices for the window including the 3 fixed points on each side is

$$ I_{w+6} = I_{w+6}(j) = \{\mathrm{augm}(i)\}_{i=j-r-3}^{j+r+3}. \qquad (45) $$

The set of indices outside the window is

$$ I_z = I_z(j) = I \ \text{without} \ I_w(j). \qquad (46) $$

The vector û(k+), as it was used until now, can also be written as û_I(k+), and using eqs. (44) to (46), parts of the vector can be denoted as well. Using this notation the local spline smoothing algorithm becomes

$$ \hat u_{I_w}(k+1) = \underbrace{S_{local}}_{w \times (w+6)}\ \underbrace{\hat u_{I_{w+6}}(k^+)}_{(w+6)\times 1}, \qquad \hat u_{I_z}(k+1) = \hat u_{I_z}(k^+), \qquad (47) $$

where S_local can be calculated from (36) and (34) as

$$ S_{local} = \text{4th to } (w+3)\text{rd row of } \left[ S \left( I + C^T (C S C^T)^{-1} C (I - S) \right) \right] \qquad (48) $$

with

$$ C = \begin{pmatrix} I_3 & 0_{3\times(w+3)} \\ 0_{3\times(w+3)} & I_3 \end{pmatrix}, \qquad \dim C = 6 \times (w+6), \qquad (49) $$

and with S as defined in eqs. (18),(16) but with dim(S) = (w+6) × (w+6). Not only does eq. (47) represent a local smoother with the mentioned advantages, it also requires much less computational effort (w(w+6) instead of n^2) than the global smoother of eq. (18).
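A sketch of (47)-(49): build the constant w × (w+6) matrix S_local once, then apply it to the augmented window around the updated index. Function and variable names are illustrative; smoothing_matrix is the assumed construction from the sketch after eq. (18), here of dimension (w+6) × (w+6), and indices are 0-based.

```python
import numpy as np

def smoothing_matrix(m, p):
    # as in the sketch after eq. (18), for m equally spaced points
    D = np.zeros((m - 2, m))
    for i in range(m - 2):
        D[i, i:i + 3] = (1.0, -2.0, 1.0)
    M = (2/3) * np.eye(m - 2) + (1/6) * (np.eye(m - 2, k=1) + np.eye(m - 2, k=-1))
    return np.linalg.inv(np.eye(m) + p * D.T @ np.linalg.solve(M, D))

def local_smoother(r, p):
    """S_local of eq. (48): rows 4 .. w+3 of the fixed-point smoother (36)
    for w+6 points, with the three outermost points on each side fixed (49)."""
    w = 2 * r + 1                                    # eq. (42)
    m = w + 6
    S = smoothing_matrix(m, p)
    C = np.zeros((6, m))
    C[np.arange(3), np.arange(3)] = 1.0              # first three points fixed
    C[np.arange(3, 6), np.arange(m - 3, m)] = 1.0    # last three points fixed
    S_fix = S @ (np.eye(m) + C.T @ np.linalg.solve(C @ S @ C.T, C @ (np.eye(m) - S)))
    return S_fix[3:3 + w, :]                         # keep only the w window rows

def local_update_smooth(u_hat, j, S_local, r):
    """One smoothing step of eq. (47) with index augmentation (43)."""
    n = len(u_hat)
    idx = np.clip(np.arange(j - r - 3, j + r + 4), 0, n - 1)   # augm(i)
    vals = S_local @ u_hat[idx]
    raw = np.arange(j - r, j + r + 1)                # window indices I_w
    inside = (raw >= 0) & (raw < n)                  # drop virtual boundary positions
    u_new = u_hat.copy()
    u_new[raw[inside]] = vals[inside]
    return u_new
```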
Then compose the w x w weighting diagonal matrix W = diag (,r-l -),( a,--) (- ),-- -,o( ),oa(r;-),cca(,-), .. ),;(--f, ) (51) 20 With dim S = w x w and S from (18) the smoothing process becomes i (k+ 1) = WSi , (k+) + (I-5W) 2(k) ) (52) fiz(k+1) =ilv(k +) gradually blending the two vectors Siu(k+) and fi(k) within the window. The vectors fi(k) and fi(k+) differ from each other only in one component: the updated component in the center of the window. The weight corresponding to this component is 1 as shown in (51), or 1 minus the weight is zero. Thus, the first line of (52) can also be written as ai~(k+l) = WS iW(k+)+(I-W)uZ(k~+) = = (I + W (S - I)) i(k+) or = iz.,(k+ l) Slocalz(k+) (53) with SlocL= I + W (S - I). a) (54) b) Figure 7: Smoothing matrices for p = 1, r = 5, w = 11 a) Slocal from eq.(48), dim: 17 x 11, b) S1octa from eq.(54), dim: 11 x 11. The matrix Socal (54) is surprisingly different from Slocal in eq.(48) (see Fig.7). The two matrices represent the two different methods. The fixed point method (Fig. 7a) considers the three points outside the window to provide the boundary smoothness. The roll-off method (Fig. 7b) considers only the points inside the window. In the case of updating only one point for a very long period, the window remains at the 21 same place and the points within the window converge again to their regression line 9 , causing a discontinuity at the boundary, of the window. The motivation for this chapter was to avoid the global smoothing. Global smoothing causes a slow forgetting of all points of fi in an averaging way, even if the points are far away from the actually updated point. Therefore we asked for local smoothing. It is obvious that a smoothing method cannot be recommended which is able to cause discontinuities like the roll-off method. 7.2 Example For testing purposes both smoothing methods, the global one based on eq. (17) and the local one based on eq. (47), are applied to learn the same goal vector u which is representing the input-output-map (eq. (9)). The vector u is chosen to be the same as in Fig. 3. The selections of which point is updated next are performed randomly. 10 000 updates are done. The same parameters p = 1 and ci,,, = 1 are used for both mIethods. According to eq. (47) the only difference is that local smoothing is done exclusively within a window of o = 11 points (r = 5). 50 40 30 20 10 .. -20 0 global 10 20 30 40 50 60 70 80 90 100 Figure 8: Different learning ability of global and local update-smoothing algorithm. - - - - goal vector u, ........... fi(10 000) using global update-spline-smoothing, fi(10 000) using local update-spline-smoothing. 9It is easy to show with (54) that Ai(Sioc,a) = 1 + wii(Ai(S) - 1) and both matrices have the same eigenvectors. With wai 0 0 is Ai(S 1 Oc,,) = 1 iff Ai(S) = 1. As shown for eq.(24) this makes SooCaI = H, where all matrices are of dimension w x w. 22 In this case (see Fig. 8), the behaviors of the global and the local methods are quite different. Both converge to a stable vector ft. The local one approximates u very well, except the unsmooth parts (as desired). The global one is improperly dimensioned. The smoothing operation is much stronger than the updating operation and therefore fi never reaches its goal u. This fact is notable as both methods are using the same parameters. The reason for this behavior is that in the global case all points are smoothed repeatedly with every update. 
This makes the smoothing effect per point approximately n times higher than in the local smoothing case.

Fig. 9 shows another test example. This time the global smoothing parameter is chosen as p_global = 0.01, the local one again as p_local = 0.1. Both methods behave similarly, as long as all regions are updated (the case that updates take place only in some regions of the vector û (e.g. between x = 60 and 90) is not shown in the figure). To make the test of Fig. 9 more interesting, only every fifth point (marked with "o" in Fig. 9) is updated and all other points are never updated. In addition to the results of the local and global update-smoothing method, the natural cubic spline interpolation of the updated points is plotted.

Figure 9: Learning of the input-output-map in case of sparsely updated points: updated points (o), goal vector u (dashed), û(10 000) using global update-spline-smoothing (dotted), û(10 000) using local update-spline-smoothing (solid), univariate natural cubic spline interpolation of the updated points o.

We can see that all the non-updated points converge to the natural cubic spline interpolation of the updated points (see Theorem 2). (Remember that the term convergence, as defined in eq. (20), tolerates a deviation of approximately ±c_incr, which is chosen to be 1 in these examples.) It is obvious that the goal vector u cannot be approximated very accurately by û in regions where the spline interpolation of the well-updated points is very different from u. Either you have to provide the algorithm with enough updated points or you face the fact that all other points are interpolated. The stable vector û(∞) is independent of the initial estimate û(0) in all these examples.

8 Realization

Experiments have shown that the proposed update-smoothing algorithm is in general robust against poor computational resolution. Using (1/256)·round(256·S_local) instead of S_local and round(û_i) instead of û_i, a similar performance is obtained as with full precision. If the computational resolution is very low, it can happen that one point is updated, but after the smoothing the ordinate of this point is exactly the same as it was before updating this point. In other words, some undesired stable conditions can occur. For the same reason the interpolation ability of the algorithm can be less than expected. Thus, it is recommended to choose c_incr larger by an order of magnitude than the computational resolution. A factor of 2 can be sufficient in uncritical applications.

The rest of the realization is simple. Eq. (48) shows how to calculate S_local (MATLAB is a very helpful tool for computing this matrix), which is a rectangular matrix of low dimension. Therefore, the computational effort for the matrix multiplication (smoothing operation) is low.
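A small sketch of the quantization mentioned above (illustrative only; the 1/256 step corresponds to an 8-bit fractional resolution of the matrix entries, and c_incr should stay well above the output resolution, as recommended above).

```python
import numpy as np

def quantize_smoother(S_local, steps=256):
    """Store the smoothing matrix with limited resolution: round(256*S)/256."""
    return np.round(steps * S_local) / steps

def quantized_smooth(S_q, window_values):
    """Apply the quantized matrix and round the result to integer resolution."""
    return np.round(S_q @ window_values)
```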
9 Conclusion

The issue is to track a slowly time-varying, nonlinear, but smooth characteristic curve during the lifetime of the system. The simplest method of learning this parameter-free characteristic curve is to store in an input-output-map, for every possible input value, the corresponding output value. In this case, no computations are necessary to answer a query concerning one of these input-output pairs. The corresponding value is obtained simply by looking it up in the map. Of course, this comfort of very cheap queries has to be paid for by some effort for the updating operation.

It is known that the unknown input-output-map is a smooth input-output relationship with the option of being very slowly time-varying. All new information about this relationship has to be used, but the smoothness of the relationship must not get lost. In order to avoid "adaptation holes" or any discontinuities, cubic spline smoothing is done after every update (so-called update-smoothing). The paper discussed two different smoothing methods: the global and the local spline smoothing. Global smoothing should not be used in applications where it cannot be guaranteed that all input regions are updated within some time period. Otherwise the smoothing task averages the already learned relationship, converging to the straight regression line of the data points. The time constant of this forgetting process was estimated. The better way is to use local smoothing, smoothing only a local neighborhood of the updated point. Even if we stay for thousands of updates at the same point, the relationship outside of the neighborhood window is not destroyed.

It was possible to show that the update-spline-smoothing algorithm has the interpolation property. The never or seldom updated points converge to the univariate natural cubic spline interpolation of the well-updated points. This requires the trivial restriction that at least as many points have to be frequently updated that the cubic spline interpolation of these points is sufficiently accurate.

There are still many open problems in this field. More concrete design rules would be appreciated for the one-dimensional problem, as treated in this paper. The learning of hysteresis functions and some application-oriented details have to be solved. Furthermore, the multi-dimensional update-smoothing should be solved. Different smoothing methods, the neural network approach (Tolle 1989 [66], Poggio 1990 [1], Atkeson 1989 [7], Livstone 1991 [67], Lippmann 1987 [68]), and other learning methods (e.g. Horowitz 1989 [69]) should be considered.

Appendix A

Solving eq. (33): Setting the derivative of the right-hand term of (33) with respect to ū to zero gives

$$ 0 = -2(\hat u(k) - \bar u) + 2pK\bar u - \mu. \qquad (a) $$

With (18), û(k+1) = ū = S(û(k) + μ/2). Since μ is nonzero only at the fixed points, it can be written as μ = C^T λ with λ = Cμ, so

$$ \hat u(k+1) = S\,\hat u(k) + \tfrac{1}{2} S C^T \lambda. \qquad (b) $$

The boundary condition C û(k+1) = C û(k) gives

$$ C S \hat u(k) + \tfrac{1}{2} C S C^T \lambda = C \hat u(k) \quad\Longrightarrow\quad \tfrac{1}{2}\lambda = (C S C^T)^{-1} C (I - S)\, \hat u(k). \qquad (c) $$

Inserting (c) into (b) yields

$$ \hat u(k+1) = S\,\hat u(k) + S C^T (C S C^T)^{-1} C (I - S)\, \hat u(k) = S\left[ I + C^T (C S C^T)^{-1} C (I - S) \right] \hat u(k), $$

which is eqs. (35),(36).

Acknowledgements

The author wishes to thank Prof. Sanjoy K. Mitter, Prof. Gilbert Strang, Prof. Christopher Atkeson, Prof. Walter Olbricht, Prof. John N. Tsitsiklis (all M.I.T.), Prof. Grace Wahba (University of Wisconsin at Madison), Prof. Jerome H. Friedman (Stanford University), Dr. Andreas Buja (Bellcore), and Dorothea Czedik-Eysenberg (Boston University) for their helpful discussions.

References

[1] T. Poggio and F. Girosi. Regularization algorithms for learning that are equivalent to multilayer networks. Science, 247:978-982, February 1990.

[2] E. Grosse. A catalog of algorithms for approximation. netlib@research.att.com, pages 1-27, 1989.

[3] G. Wahba. Smoothing noisy data with spline functions. Numerische Mathematik, 24:383-393, 1975.

[4] J. H. Friedman. Multivariate adaptive regression splines. Technical Report 102, Department of Statistics, Stanford University, November 1988.

[5] E. J. Wegman and I. W. Wright. Splines in statistics. Journal of the American Statistical Association, 78(382):351-365, 1983.

[6] Ch. G. Atkeson. Memory-based approaches to approximating continuous functions. Note, M.I.T., Artificial Intelligence Laboratory, 1990.

[7] Ch. G. Atkeson. Learning arm kinematics and dynamics. Annual Review of Neuroscience, 12:157-183, 1989.
[8] J. H. Friedman and W. Stuetzle. Projection pursuit regression. Journal of the American Statistical Association, 78(376):817-823, 1981.

[9] P. J. Green. Semi-parametric generalized linear models. In Lecture Notes in Statistics 32, Generalized Linear Models, pages 44-55, Berlin Heidelberg New York, 1985. Springer-Verlag.

[10] Ph. J. Davis. Interpolation and Approximation. Blaisdell Publishing Company, New York Toronto London, 1963.

[11] L. L. Schumaker. Fitting surfaces to scattered data. In G. G. Lorentz, C. K. Chui, and L. L. Schumaker, editors, Approximation Theory III, pages 203-268, New York, 1976. Academic Press.

[12] R. Franke. Scattered data interpolation: Tests of some methods. Mathematics of Computation, 38(157):181-200, 1982.

[13] A. Buja, T. Hastie, and R. Tibshirani. Linear smoothers and additive models. The Annals of Statistics, 17(2):453-555, 1989.

[14] J. Moody. Fast learning in multi-resolutional hierarchies. In D. S. Touretzky, editor, Advances in Neural Information Processing Systems I, pages 29-39. Morgan Kaufmann Publishers, San Mateo, CA, 1989.

[15] M. J. D. Powell. Radial basis functions for multivariable interpolation: a review. In J. C. Mason and M. G. Cox, editors, Algorithms for Approximation, pages 143-167, Oxford, 1987. Clarendon Press.

[16] R. E. Bellman. Adaptive Control Processes. Princeton University Press, Princeton, New Jersey, 1961.

[17] Ch. G. Atkeson and J. McIntyre. Robot trajectory learning through practice. In IEEE International Conference on Robotics and Automation, pages 1737-1742, 1986.

[18] Ch. G. Atkeson. Using locally weighted regression for robot learning. In IEEE International Conference on Robotics and Automation, Sacramento, CA, April 1991.

[19] Ch. G. Atkeson. Memory-based learning control. In American Control Conference (ACC 91), pages 2131-2136, Boston, 1991. IEEE.

[20] D.-Y. Yeung and G. A. Bekey. Using a context-sensitive learning network for robot arm control. In IEEE International Conference on Robotics and Automation, pages 1441-1447, May 1989.

[21] M. Heiss. Minimization of Dead Time in Electronic Diesel Control Systems (in German). PhD thesis, University of Technology, Vienna, Austria, August 1989.

[22] Ch. Augesky and M. Heiss. Einrichtung zum Steuern und Regeln einer Brennkraftmaschine eines Fahrzeuges. German Patent DE 3822582, July 1988.

[23] M. Heiss, Ch. Augesky, A. Seibt, and W. Bittinger. Einrichtung zum Steuern und Regeln einer Dieselbrennkraftmaschine. German Patent Application DE 3906083, February 1989.

[24] U. Kiencke and C. T. Cao. Regelverfahren in der elektronischen Motorsteuerung I, II. Automobilindustrie, pages 629-636 and 135-144, 6/1987 and 2/1988.

[25] H. Yamaguchi. Motor vehicle with line pressure control for automatic transmission. European Patent Application EP 0398057, 1990.

[26] R. Bitzer and F. Piwonka. Verfahren und Stellregler zur adaptiven Regelung eines reibungsbehafteten Antriebs. German Patent Application DE 3731983, 1989.

[27] M. Silberbauer. Digitaler Mengenservo. Technical Report 55, Voest Alpine Automotive, Vienna, Austria, March 1990.

[28] K. Abe. Learning control systems for controlling an automotive engine. US Patent 4733357, 1988.

[29] K. Abe. Kraftstoff-Luftverhältnis-Überwachungssystem für eine Kraftfahrzeugmaschine. German Patent Application DE 4001494, 1990.

[30] N. Tomisawa. Learning and control apparatus for electronically controlled internal combustion engine. US Patent 4763627, 1988.
[31] P. Schmidt and M. Schmitt. Verfahren und Einrichtung zur Steuerung/Regelung von Betriebsgrößen einer Brennkraftmaschine. German Patent Application DE 3603137, 1988.

[32] M. H. Furuyama. System zum Erfassen anormaler Betriebszustände eines Verbrennungsmotors. German Patent Application DE 3819016, 1988.

[33] M. H. Furuyama. Steuerungssystem für das Luft-Kraftstoff-Verhältnis für einen KFZ-Motor. German Patent Application DE 3928585, 1990.

[34] S. Miyama. Verfahren zur Steuerung des Zündzeitpunktes einer Brennkraftmaschine. German Patent Application DE 4018447, 1990.

[35] H. Ohishi. Verfahren und Vorrichtung zum Regeln des Luft-Kraftstoff-Verhältnisses des einer Brennkraftmaschine für ein Fahrzeug zugeführten Luft-Kraftstoff-Gemisches. German Patent DE 3726867, 1990.

[36] T. Hori, T. Atago, and M. Nagano. Method of controlling air-fuel ratio for use in internal combustion engine and apparatus of controlling the same. European Patent Application EP 0358062, 1989.

[37] K. Kameta, K. Morita, T. Kikuchi, and Y. Tanabe. Method and apparatus for controlling fuel supply to an internal combustion engine. European Patent Application EP 0339585, 1989.

[38] M. Holmes. Einstell-Regelsystem für einen Verbrennungsmotor und Verfahren zum Regeln eines Verbrennungsmotors. German Patent DE 3703496, 1989.

[39] E. Linder, M. Klenk, and W. Moder. Steuer-/Regelsystem für instationären Betrieb einer Brennkraftmaschine. German Patent Application DE 3731983, 1989.

[40] S. Nakinawa and T. Tomisawa. Control system for internal combustion engine with improved control characteristics at transition of engine conditions. European Patent Application EP 0314081, 1989.

[41] K. Walter. Kraftstoffregeleinrichtung für Brennkraftmaschinen mit Regelschaltung zum Erfassen der Eichung und Verfahren zum Betrieb von Brennkraftmaschinen. German Patent Application DE 2829958, 1979.

[42] S. Sasaki, Y. Mouri, and N. Sugiura. Engine control device. European Patent 0155663, 1989.

[43] M. Klenk. Lernendes Regelungsverfahren für eine Brennkraftmaschine und Vorrichtung hierfür. German Patent Application DE 3811262, 1989.

[44] M. Klenk. Lernendes Regelungsverfahren für eine Brennkraftmaschine und Vorrichtung hierfür. German Patent Application DE 3811263, 1989.

[45] M. Heiss. Adaption von Kennlinien in Echtzeit. Elektrotechnik und Informationstechnik e & i, 106(10):398-402, 1989.

[46] I. J. Schoenberg. Spline functions and the problem of graduation. Proc. Nat. Acad. Sci. U.S.A., 52:947-950, 1964.

[47] Ch. H. Reinsch. Smoothing by spline functions. Numerische Mathematik, 10:177-183, 1967.

[48] C. De Boor. A Practical Guide to Splines. Springer, New York, 1978.

[49] G. Wahba. Spline Models for Observational Data. CBMS-NSF #59. Society for Industrial and Applied Mathematics, Philadelphia, 1990.

[50] P. Craven and G. Wahba. Smoothing noisy data with spline functions. Numerische Mathematik, 31:377-403, 1979.

[51] L. L. Schumaker. Spline Functions: Basic Theory. Wiley, New York, 1981.

[52] R. L. Eubank. The hat matrix for smoothing splines. Statistics & Probability Letters, 2:9-14, 1984.

[53] R. L. Eubank. Diagnostics for smoothing splines. J. R. Statist. Soc. B, 47(2):332-341, 1985.

[54] R. L. Eubank. Spline Smoothing and Nonparametric Regression. Marcel Dekker, New York and Basel, 1988.

[55] B. W. Silverman. Some aspects of the spline smoothing approach to non-parametric regression curve fitting. J. R. Statist. Soc. B, 47(1):1-52, 1985.

[56] E. T. Whittaker. On a new method of graduation. Proc. Edinburgh Math. Soc., 41:63-75, 1923.
[57] G. Wahba. Spline functions. In Encyclopedia of Statistical Science, Supplement Volume, pages 148-159. John Wiley & Sons, New York, 1989.

[58] G. Strang. Linear Algebra and its Applications. Harcourt Brace Jovanovich, San Diego, 3rd edition, 1988.

[59] A. Demmler and C. Reinsch. Oscillation matrices with spline smoothing. Numerische Mathematik, 24:375-382, 1975.

[60] D. D. Cox. Asymptotics for M-type smoothing splines. The Annals of Statistics, 11(2):530-551, 1983.

[61] B. W. Silverman. Spline smoothing: The equivalent variable kernel method. The Annals of Statistics, 12(3):898-916, 1984.

[62] W. Härdle. Applied Nonparametric Regression. Cambridge University Press, Cambridge, New York, New Rochelle, Melbourne, Sydney, 1990.

[63] I. N. Bronstein and K. A. Semendjajew. Taschenbuch der Mathematik. Teubner, Leipzig, 20th edition, 1981.

[64] L. Paihua. Quelques Méthodes Numériques pour les Fonctions Spline à une et deux Variables (in French). PhD thesis, Grenoble, May 1978.

[65] F. Utreras. Cross-validation techniques for smoothing spline functions in one or two dimensions. In Lecture Notes in Mathematics 757, Smoothing Techniques for Curve Estimation, pages 196-232, Berlin Heidelberg New York, 1979. Springer-Verlag.

[66] H. Tolle, J. Militzer, and E. Ersü. Zur Leistungsfähigkeit lokal verallgemeinernder assoziativer Speicher und ihren Einsatzmöglichkeiten in lernenden Regelungen. Messen Steuern Regeln msr, 32(3):98-105, 1989. See also 9th IFAC World Congress, pp. 1039-1044.

[67] M. M. Livstone, J. A. Farrell, and W. L. Baker. A computationally efficient algorithm for training recurrent networks. IEEE Transactions on Neural Networks (submitted), 1991.

[68] R. P. Lippmann. An introduction to computing with neural nets. IEEE ASSP Magazine, pages 4-22, April 1987.

[69] R. Horowitz, W.-W. Kao, M. Boals, and N. Sadegh. Digital implementation of repetitive controllers for robotic manipulators. In IEEE International Conference on Robotics and Automation, volume III, pages 1497-1503, May 1989.

[70] J. G. Hayes and J. Halliday. The least-squares fitting of cubic spline surfaces to general data sets. J. Inst. Maths Applics, 14:89-103, 1974.

[71] W. Th. Miller III, F. H. Glanz, and L. G. Kraft III. Application of a general algorithm to the control of robotic manipulators. The International Journal of Robotics Research, 6(2):84-98, 1987.