ETNA Electronic Transactions on Numerical Analysis. Volume 17, pp. 168-180, 2004. Copyright 2004, Kent State University. ISSN 1068-9613.

MULTIDIMENSIONAL SMOOTHING USING HYPERBOLIC INTERPOLATORY WAVELETS

MARKUS HEGLAND*, OLE M. NIELSEN†, AND ZUOWEI SHEN‡

* Received August 27, 2002. Accepted for publication April 13, 2004. Recommended by Martin Gutknecht. The research presented here was partly funded by the Australian Advanced Computational Systems CRC, the Australian Partnership for Advanced Computations (APAC), the Danish Research Council, and the Academic Research Fund RP3981647. This research was partially done while the first author was visiting the Institute for Mathematical Sciences, National University of Singapore in 2003. The visit was supported by the Institute. Mathematical Sciences Institute, Australian National University, Canberra ACT 0200, Australia. E-mail: markus.hegland@anu.edu.au
† Urban Risk Research Group, Geoscience Australia, Canberra ACT, Australia.
‡ Department of Mathematics, National University of Singapore, Singapore.

Abstract. We propose the application of hyperbolic interpolatory wavelets for large-scale $d$-dimensional data fitting. In particular, we show how wavelets can be used as a highly efficient tool for multidimensional smoothing. The grid underlying these wavelets is a sparse grid. The hyperbolic interpolatory wavelet space of level $n$ uses $O(2^n n^{d-1})$ basis functions, and it is shown that under sufficient smoothness an approximation error of order $O(2^{-sn} n^{d-1})$ can be achieved. The implementation uses the fast wavelet transform and an efficient indexing method to access the wavelet coefficients. A practical example demonstrates the efficiency of the approach.

Key words. sparse grids, predictive modelling, wavelets, smoothing, data mining.

AMS subject classifications. 65C60, 65D10, 65T60.

1. Introduction. Predictive modelling aims to recover functional relations from observed data. More precisely, given a sequence of values of a response variable $y^{(1)}, \ldots, y^{(N)}$ and a sequence of values of a predictor variable (which is typically a vector) $x^{(1)}, \ldots, x^{(N)}$ with $x^{(i)} = (x_1^{(i)}, \ldots, x_d^{(i)})$, predictive modelling recovers a function $f$ such that

    $f(x_1^{(i)}, \ldots, x_d^{(i)}) \approx y^{(i)}.$

In the cases considered here the response and the components of the predictors are real numbers. In a smoothing approach to predictive modelling, the function is obtained by minimising a cost functional

(1.1)    $J_\alpha(f) = \sum_{i=1}^{N} \big( f(x^{(i)}) - y^{(i)} \big)^2 + \alpha\, (Lf, Lf),$

where $\alpha$ is the smoothing parameter, which can be found, e.g., by cross-validation [13], and the operator $L$ is densely defined in $L_2(\mathbb{R}^d)$ and typically is a differential operator.
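To make the smoothing formulation concrete, the following minimal sketch fits (1.1) over a generic finite basis; all names are illustrative and not from the paper. Expanding $f = \sum_\lambda c_\lambda \psi_\lambda$ turns (1.1) into a penalised least-squares problem whose minimiser solves $(B^T B + \alpha C)\,c = B^T y$ with $B_{i\lambda} = \psi_\lambda(x^{(i)})$ and $C_{\lambda\mu} = (L\psi_\lambda, L\psi_\mu)$; this is the system that appears as (3.1) in Section 3.1.

    # Minimal sketch of penalised least-squares smoothing for a generic basis.
    # B is the design matrix, C the penalty Gram matrix, y the responses.
    import numpy as np

    def fit_smoother(B, C, y, alpha):
        """Solve the normal equations (B^T B + alpha C) c = B^T y."""
        return np.linalg.solve(B.T @ B + alpha * C, B.T @ y)

    def cv_score(B, C, y, alpha, folds=5):
        """Plain k-fold cross-validation estimate of the prediction error."""
        idx = np.array_split(np.random.permutation(len(y)), folds)
        err = 0.0
        for test in idx:
            train = np.setdiff1d(np.arange(len(y)), test)
            c = fit_smoother(B[train], C, y[train], alpha)
            err += np.sum((B[test] @ c - y[test]) ** 2)
        return err / len(y)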
Finding a good function class which both allows the efficient approximation of the predictive function $f$ and admits algorithms which are scalable in the number $d$ of predictor variables $x_1, \ldots, x_d$ is a major challenge. Function classes which have been used in the past for this problem include radial basis functions [18], artificial neural nets, multivariate adaptive regression splines [13], and generalised additive models [13]. Here we present a new method for predictive modelling based on hyperbolic interpolatory wavelets. A related approach which applies sparse grids to classification is discussed in [9].

Two requirements guide the choice. First, the function space for $f$ should provide a good approximation of a large class of functions under reasonable smoothness assumptions. Second, the evaluation of the function should be fast and the function itself should be represented by a limited number of nonzero coefficients. As we will see, hyperbolic interpolatory wavelets satisfy both conditions.

If the function space for $f$ has a relatively low dimension, then nonlinear approximation theory provides a feasible technique for the determination of an approximation with few terms [5]. This $n$-term approximation proceeds by first determining all the coefficients of the expansion of $f$ in the space and then selecting the most important ones, e.g., by the threshold method [8]. This approach is very successful in the case of small $d$, e.g., in image processing where $d = 2$ (see [6]). For problems of higher dimension this approach is not feasible, as the dimension of the underlying function space becomes prohibitively large. In this case the $n$-term approximation technique needs to be preceded by a basis selection procedure which finds a smaller basis. This step makes the problem "seriously" nonlinear, and typically greedy algorithms are used [13].

Under sufficient smoothness conditions (which are related to the operator $L$), hyperbolic interpolatory wavelet spaces provide a good trade-off between function space complexity and approximation error. Hyperbolic orthogonal and biorthogonal wavelets were introduced in [7, 10, 11], where error analyses were also given. Further, hyperbolic biorthogonal wavelets have been shown to be effective for the solution of high-dimensional elliptic PDEs and integral equations [15, 11]. Here we consider the case of interpolatory wavelets where the interpolation points form a sparse grid [19]. At the same time as we developed our hyperbolic interpolatory wavelet approach, a related approach was developed which uses sparse grid approximations for the classification problem [9].

In Section 2 we introduce hyperbolic interpolatory wavelets and discuss their approximation properties. In Section 3 we discuss the implementation and give an application.

2. Approximation by hyperbolic interpolatory wavelets. Data mining applications require access to function values on the grid points for further processing. For interpolatory wavelets the function values on the grid are directly related to the wavelet coefficients and can easily be retrieved. This is in contrast to orthogonal and biorthogonal wavelets, for which more elaborate summations are required to retrieve the function values on the grid points. The approximation properties of hyperbolic biorthogonal wavelets have been studied in [7]. In the following we show that similar estimates can be given for interpolatory wavelets using a different approach.

In the next subsection some basic properties of one-dimensional interpolatory wavelets are reviewed. The reader familiar with the literature on this topic can skip this subsection and move straight to the following subsection, which introduces hyperbolic interpolatory wavelets.

2.1. One-dimensional interpolatory wavelets. Compactly supported interpolatory wavelets are defined by compactly supported, interpolatory refinable functions. A compactly supported function $\phi$ is refinable if there is a finitely supported sequence $a$ such that the function satisfies the refinement equation

    $\phi(x) = \sum_{k \in \mathbb{Z}} a(k)\, \phi(2x - k).$

The sequence $a$ is called the refinement mask for $\phi(x)$.
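Since an interpolatory $\phi$ satisfies $\phi(k) = \delta_{0,k}$ on the integers (see (2.1) below), the refinement equation determines its values on all dyadic rationals: knowing $\phi$ on $2^{-m}\mathbb{Z}$ gives $\phi$ on $2^{-m-1}\mathbb{Z}$. The following sketch (illustrative code, not from the paper; integer grid indices are used to avoid floating-point lookups) evaluates $\phi$ this way for an arbitrary interpolatory mask, here the hat-function mask of Example 2.1 below.

    # Evaluate an interpolatory refinable function phi on a dyadic grid from
    # its mask a, using phi(k) = delta_{0,k} and the refinement equation
    # phi(x) = sum_k a(k) phi(2x - k) to pass from the grid 2^-m Z to 2^-(m+1) Z.
    def dyadic_phi(mask, levels, support=10):
        """Return {n: phi(n * 2**-levels)} for |n * 2**-levels| <= support."""
        vals = {k: (1.0 if k == 0 else 0.0)            # phi(k) = delta_{0,k}
                for k in range(-support, support + 1)}
        for m in range(levels):
            scale = 2 ** m                             # current grid is 2**-m Z
            # x = n * 2**-(m+1), so 2x - k has index n - k*scale on 2**-m Z
            vals = {n: sum(a * vals.get(n - k * scale, 0.0)
                           for k, a in mask.items())
                    for n in range(-support * 2 * scale,
                                   support * 2 * scale + 1)}
        return vals

    hat_mask = {-1: 0.5, 0: 1.0, 1: 0.5}               # mask of Example 2.1
    phi = dyadic_phi(hat_mask, levels=3)               # phi[n] == hat(n / 8)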
A continuous refinable function $\phi$ is interpolatory if

(2.1)    $\phi(k) = \delta_{0,k} = \begin{cases} 1, & k = 0, \\ 0, & k \in \mathbb{Z} \setminus \{0\}. \end{cases}$

A simple example of an interpolatory refinable function is the hat function given in the following example.

EXAMPLE 2.1. Let

    $\phi(x) = \begin{cases} 1 + x, & -1 \le x \le 0, \\ 1 - x, & 0 < x \le 1, \\ 0, & \text{otherwise.} \end{cases}$

Then $\phi(x)$ is an interpolatory refinable function and its refinement mask is

    $a(k) = \begin{cases} 1, & k = 0, \\ 1/2, & k = \pm 1, \\ 0, & \text{otherwise.} \end{cases}$

Other examples are given in [3] in terms of their Fourier transforms $\hat\phi$ by

(2.2)    $\hat\phi(\xi) = \prod_{j=1}^{\infty} \tau\big(2^{-j}\xi\big), \qquad \xi \in \mathbb{R},$

and the symbols are

(2.3)    $\tau(\xi) = \Big( \cos^2 \tfrac{\xi}{2} \Big)^{N/2} \sum_{j=0}^{N/2-1} \binom{N/2 - 1 + j}{j} \Big( \sin^2 \tfrac{\xi}{2} \Big)^{j},$

where $N$ is an even number. More details on interpolatory refinable functions can be found in [14].

We will now recall the wavelet decomposition using interpolatory refinable functions. First define a shift-invariant subspace of $L_2(\mathbb{R})$ as

    $V_0 = \operatorname{span} \{ \phi(\cdot - k) : k \in \mathbb{Z} \}.$

The dilations of the space $V_0$ are $V_j = \{ f(2^j \cdot) : f \in V_0 \}$, $j \ge 0$. Since $\phi$ is refinable, $V_j \subset V_{j+1}$. For a continuous function $f$, the interpolation operator at the $j$th dyadic level is

    $I_j f = \sum_{k \in \mathbb{Z}} f(2^{-j} k)\, \phi(2^j \cdot - k).$

The interpolant $I_j f$ interpolates $f$ on the lattice $2^{-j}\mathbb{Z}$, i.e., $(I_j f)(2^{-j}k) = f(2^{-j}k)$ for all $k \in \mathbb{Z}$.

The order of the interpolation error is known for smooth functions [4]. The smoothness can be described using Sobolev spaces $H^s(\mathbb{R})$ [1]. One also requires that the refinable function satisfies a Strang-Fix condition of order $m$, which is given in terms of the Fourier transform as

    $\hat\phi(0) = 1, \qquad D^\nu \hat\phi(2\pi k) = 0 \quad \text{for } k \in \mathbb{Z} \setminus \{0\},\ \nu = 0, 1, \ldots, m-1.$

Now, if $f$ is a compactly supported function in the Sobolev space $H^s(\mathbb{R})$, $1/2 < s \le m$, with Sobolev semi-norm $|f|_{H^s}$, then the interpolation error satisfies

(2.4)    $\| f - I_j f \| \le C\, 2^{-js}\, |f|_{H^s}.$

As $V_j \subset V_{j+1}$, one can now introduce an algebraic complement $W_j$ such that $V_{j+1} = V_j \oplus W_j$. In particular, $W_j$ is a dilation of $W_0$, where $W_0$ is the shift-invariant subspace of $L_2(\mathbb{R})$ generated by the wavelet $\psi = \phi(2\,\cdot - 1)$ [8, 16], i.e.,

    $W_0 = \operatorname{span} \{ \psi(\cdot - k) : k \in \mathbb{Z} \}, \qquad W_j = \{ g(2^j \cdot) : g \in W_0 \},$

and one finds that the spaces satisfy the decomposition

(2.5)    $V_n = V_0 \oplus W_0 \oplus W_1 \oplus \cdots \oplus W_{n-1}.$

Since $\{\phi(\cdot - k) : k \in \mathbb{Z}\}$ is a basis for $V_0$ and $\{\psi(2^j \cdot - k) : k \in \mathbb{Z}\}$ is a basis for $W_j$, it follows that $V_n$ has the basis

(2.6)    $\{ \phi(\cdot - k) : k \in \mathbb{Z} \} \cup \{ \psi(2^j \cdot - k) : k \in \mathbb{Z},\ j = 0, \ldots, n-1 \}.$

In general, let $f$ be a compactly supported continuous function. Then the sequence $I_j f$ converges to $f$ uniformly as $j$ goes to infinity. In particular, as $n$ goes to infinity, $f$ has the expansion

    $f = I_0 f + \sum_{j=0}^{\infty} (I_{j+1} - I_j) f = \sum_{k} f(k)\, \phi(\cdot - k) + \sum_{j=0}^{\infty} \sum_{k} d_{j,k}\, \psi(2^j \cdot - k),$

where, for each fixed $j$, the sequence $d_j = (d_{j,k})_{k \in \mathbb{Z}}$ with

    $d_{j,k} = f\big(2^{-j-1}(2k+1)\big) - (I_j f)\big(2^{-j-1}(2k+1)\big)$

is the sequence of wavelet coefficients related to the space $W_j$.

The Besov space $B^s_q(L_p(\mathbb{R}))$ can be defined in many ways. One definition uses the wavelet coefficients: with $\phi$ satisfying the Strang-Fix condition of order $m$ and $0 < s < m$, a function $f$ belongs to $B^s_q(L_p(\mathbb{R}))$ if and only if a weighted $\ell_q$-sum of the level-wise coefficient norms $\|d_j\|_{\ell_p}$ is finite, and this quantity is equivalent to the Besov norm [8]. We use in this paper the space $B^s_\infty(L_\infty(\mathbb{R}))$. Its norm is equivalent to

    $\| f \|_{B} = \| I_0 f \|_{L_\infty} + \sup_{j \ge 0} 2^{js}\, \| d_j \|_{\ell_\infty}.$

It follows that the wavelet coefficients satisfy

    $\| d_j \|_{\ell_\infty} \le 2^{-js}\, \| f \|_{B}.$

As we consider compactly supported wavelets and functions, only $O(2^j)$ of the coefficients at level $j$ are nonzero. Thus, measured in the $\ell_2$-sense, the average over all (nonzero) coefficients at level $j$ is of order $O(2^{-j(s+1/2)})$. If $f$ is a compactly supported function in $C^1(\mathbb{R})$, then $f$ is in both the Sobolev space $H^1(\mathbb{R})$ and the Besov space $B^1_\infty(L_\infty(\mathbb{R}))$ (it is, in fact, in a slightly smaller Besov space). Further facts on Sobolev and Besov spaces can be found in [5] and [8].
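For the hat function, which satisfies the Strang-Fix condition with $m = 2$, the operator $I_j$ is just piecewise linear interpolation on the lattice $2^{-j}\mathbb{Z}$, and the second-order decay predicted by (2.4) can be checked numerically. A small illustrative sketch (the test function is arbitrary, the names are not from the paper):

    # Dyadic interpolation I_j for the hat function of Example 2.1: piecewise
    # linear interpolation at the points 2^-j * k, with an empirical error check.
    import numpy as np

    def interp_hat(f, j, x):
        """Evaluate (I_j f)(x) by linear interpolation on the grid 2^-j Z."""
        h = 2.0 ** (-j)
        k = np.floor(x / h)                 # left lattice neighbour index
        t = x / h - k                       # local coordinate in [0, 1)
        return (1 - t) * f(k * h) + t * f((k + 1) * h)

    f = lambda x: np.sin(2 * np.pi * x) * np.exp(-x ** 2)  # smooth test function
    x = np.linspace(0.0, 1.0, 10001)
    for j in range(2, 8):
        err = np.max(np.abs(f(x) - interp_hat(f, j, x)))
        print(f"level {j}: sup error {err:.2e}")           # ~ factor 4 per level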
2.2. Hyperbolic interpolatory wavelets. The hyperbolic wavelet basis [7] is a subset of the rectangular wavelet basis [5]. To describe both, set $\tilde W_0 = V_0$ and $\tilde W_j = W_{j-1}$ for $j \ge 1$, and write $\psi_{0,k} = \phi(\cdot - k)$ and $\psi_{j,k} = \psi(2^{j-1}\cdot - k)$ for $j \ge 1$, so that by (2.5) $V_n = \tilde W_0 \oplus \tilde W_1 \oplus \cdots \oplus \tilde W_n$. The rectangular basis of level $n$ in $d$ dimensions is

(2.7)    $\big\{ \psi_{j_1,k_1}(x_1) \cdots \psi_{j_d,k_d}(x_d) \;:\; 0 \le j_1, \ldots, j_d \le n,\ k_1, \ldots, k_d \in \mathbb{Z} \big\}.$

The hyperbolic interpolatory wavelet basis is obtained from (2.7) by removing all elements for which $j_1 + \cdots + j_d > n$, leaving

(2.8)    $\big\{ \psi_{j_1,k_1}(x_1) \cdots \psi_{j_d,k_d}(x_d) \;:\; j_1 + \cdots + j_d \le n,\ j_i \ge 0,\ k_i \in \mathbb{Z} \big\}.$

These basis functions span function spaces

(2.9)    $V^H_{n,d} = \bigoplus_{j_1 + \cdots + j_d \le n} \tilde W_{j_1} \otimes \cdots \otimes \tilde W_{j_d},$

which alternatively are defined recursively as $V^H_{n,d} = \sum_{j=0}^{n} V^H_{n-j,\,d-1} \otimes \tilde W_j$ with $V^H_{n,1} = V_n$, where the $W_j$ are the spaces of univariate wavelets defined via (2.5). Note that these function spaces are constructed using dilations and shifts of a single refinable function $\phi$.

With $\psi_{\mathbf{j},\mathbf{k}}(x_1, \ldots, x_d) = \psi_{j_1,k_1}(x_1) \cdots \psi_{j_d,k_d}(x_d)$ and $|\mathbf{j}|_1 = j_1 + \cdots + j_d$, one has $V^H_{n,d} = \operatorname{span}\{ \psi_{\mathbf{j},\mathbf{k}} : |\mathbf{j}|_1 \le n \}$. The basis functions form a Lagrange basis for the spaces $V^H_{n,d}$ with respect to their dyadic interpolation points, and thus one can introduce an interpolation operator $I^H_n$ onto $V^H_{n,d}$ by requiring

    $(I^H_n f)(x) = f(x) \quad \text{for all } x \in G^H_n,$

where $G^H_n$ denotes the set of interpolation points of the basis (2.8). This operator provides the interpolant on the grid $G^H_n$, known as a sparse grid in the literature [19, 17, 12]. Thus $I^H_n$ provides a direct way to compute the sparse grid interpolant, for which otherwise the combination formula can be used [11]. A bounded subset of $\mathbb{R}^d$ contains $O(2^n n^{d-1})$ sparse grid points, which is substantially less than the $O(2^{nd})$ regular grid points in the same subset. It can be seen that for compactly supported interpolatory wavelets the number of wavelets contributing to the function values for arguments in a bounded subset is likewise $O(2^n n^{d-1})$.

Next we estimate the interpolation error $f - I^H_n f$. For this we need the following lemma.

LEMMA 2.1. The cardinality of the set

    $A_{n,d} = \{ (j_1, \ldots, j_d) \in \mathbb{N}_0^d : j_1 + \cdots + j_d = n \}$

is $\binom{n+d-1}{d-1}$.

Proof. Let $N(n,d)$ denote the cardinality of $A_{n,d}$. If $j_d = 0$, the cardinality of the remaining set is $N(n, d-1)$, since there are only $d-1$ degrees of freedom. In the remaining cases, where $j_d \ge 1$, we can replace $j_d$ by $j_d - 1$ and the cardinality is $N(n-1, d)$. Hence we have the recursion $N(n,d) = N(n,d-1) + N(n-1,d)$. We use the fact that $N(n,1) = 1$ to start an induction, and the induction step over $n$ and $d$ shows that

    $N(n,d) = N(n,d-1) + N(n-1,d) = \binom{n+d-2}{d-2} + \binom{n+d-2}{d-1} = \binom{n+d-1}{d-1},$

based on the standard recursion formula for binomial coefficients.
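The counting argument of Lemma 2.1 is easy to check by machine. The sketch below (illustrative, not from the paper) codes the recursion from the proof, compares it with the closed form, and tallies the total number of hyperbolic basis functions meeting a bounded domain, which exhibits the $O(2^n n^{d-1})$ growth quoted above.

    from math import comb
    from functools import lru_cache

    @lru_cache(maxsize=None)
    def N(n, d):
        """Number of j in N_0^d with j_1 + ... + j_d = n (proof's recursion)."""
        if d == 1 or n == 0:
            return 1                       # base cases N(n,1) = N(0,d) = 1
        return N(n, d - 1) + N(n - 1, d)   # split on j_d = 0 versus j_d >= 1

    def hyperbolic_count(n, d):
        """Basis functions with |j|_1 <= n; roughly 2**l per block at level l."""
        return sum(N(l, d) * 2 ** l for l in range(n + 1))

    assert all(N(n, d) == comb(n + d - 1, d - 1)
               for n in range(8) for d in range(1, 6))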
We are now ready to state the following theorem.

THEOREM 2.2. Let $V^H_{n,d}$ be the hyperbolic interpolatory wavelet space as defined in equation (2.9). Suppose that the underlying refinable function $\phi$ satisfies the Strang-Fix condition of order $m$. Let $I^H_n$ be the interpolation operator from $C(\mathbb{R}^d)$ onto $V^H_{n,d}$. Then, for an arbitrary compactly supported function $f$ with finite mixed Besov norm $\|f\|_B$ of order $s$, $0 < s < m$ (the $d$-fold tensor-product counterpart of the norm of $B^s_\infty(L_\infty(\mathbb{R}))$ introduced above),

    $\| f - I^H_n f \|_\infty \le C \binom{n+d-1}{d-1} 2^{-sn}\, \| f \|_{B},$

where the constant $C$ is independent of $f$ and $n$.

Proof. We first discuss the case where $f$ is a compactly supported tensor product function, i.e.,

    $f(x) = f_1(x_1) \cdots f_d(x_d).$

Let each factor have the 1D expansion

    $f_i = \sum_{j \ge 0} \sum_{k} d^{(i)}_{j,k}\, \psi_{j,k}, \qquad i = 1, \ldots, d.$

Then, subtracting the wavelet expansion of $I^H_n f$ from the wavelet expansion of $f$, we get

    $f - I^H_n f = \sum_{j_1 + \cdots + j_d > n}\ \sum_{\mathbf{k}} d^{(1)}_{j_1,k_1} \cdots d^{(d)}_{j_d,k_d}\, \psi_{j_1,k_1}(x_1) \cdots \psi_{j_d,k_d}(x_d).$

Since at every point only a bounded number of wavelets per scale $\mathbf{j}$ are nonzero, and since $|d^{(i)}_{j,k}| \le 2^{-js} \|f_i\|_B$, we get, using Lemma 2.1,

    $\| f - I^H_n f \|_\infty \le C \sum_{\ell > n} \binom{\ell+d-1}{d-1} 2^{-s\ell} \prod_{i=1}^{d} \|f_i\|_B.$

Although the constant $C$ may vary throughout the proof, we use the generic name $C$ for quantities independent of $n$. We now let $\ell = n + i$ and observe that

    $\binom{n+i+d-1}{d-1} = \frac{(n+i+d-1)(n+i+d-2) \cdots (n+i+1)}{(n+d-1)(n+d-2) \cdots (n+1)} \binom{n+d-1}{d-1} \le (1+i)^{d-1} \binom{n+d-1}{d-1}.$

We use this to simplify the series:

    $\sum_{\ell > n} \binom{\ell+d-1}{d-1} 2^{-s\ell} \le \binom{n+d-1}{d-1}\, 2^{-sn} \sum_{i=0}^{\infty} (1+i)^{d-1}\, 2^{-si},$

which is convergent by the quotient rule for $s > 0$. The sum is a constant independent of $n$, so we have the bound

(2.10)    $\| f - I^H_n f \|_\infty \le C \binom{n+d-1}{d-1} 2^{-sn}\, \|f_1\|_B \cdots \|f_d\|_B.$

Thus we get convergence in $n$ for fixed $d$. While the bound grows exponentially in $d$, the factor $\binom{n+d-1}{d-1}$ grows very slowly with the grid size, so this dependence is close to constant.

For the general case, we first choose an interpolatory refinable function that satisfies the Strang-Fix condition of order $m$. Such a function can be constructed by choosing $N$ in (2.3) sufficiently large (see, e.g., [2] and [14]). Let $\Phi$ be the tensor product of $\phi$, i.e.,

    $\Phi(x) = \phi(x_1) \cdots \phi(x_d).$

Thus $\Phi$ is a $d$-variable interpolatory refinable function. Then we can define the $d$-dimensional tensor product interpolatory wavelets as (see, e.g., [8])

    $\Psi^{(e)}(x) = \prod_{i=1}^{d} \psi^{(e_i)}(x_i), \qquad e \in E,$

where $E$ is the set of all $d$-dimensional vectors with entries either $0$ or $1$, and $\psi^{(0)} = \phi$, $\psi^{(1)} = \psi$. For an arbitrary compactly supported function $f$ with finite mixed Besov norm, expanding in terms of this basis, one obtains

    $f = I_0 f + \sum_{j=0}^{\infty} (I_{j+1} - I_j) f,$

where $I_j$ now denotes the tensor product interpolation operator at the uniform level $j$. Note that this is a different decomposition than what is used elsewhere in this paper, because there are no mixed scales; it is chosen in order to apply the bound (2.10) for tensor product functions to each scale in the general proof. Therefore,

    $\| f - I^H_n f \|_\infty \le \| I_0 f - I^H_n I_0 f \|_\infty + \sum_{j=0}^{\infty} \| (I_{j+1} - I_j) f - I^H_n (I_{j+1} - I_j) f \|_\infty.$

Since $I_0 f$ and $(I_{j+1} - I_j) f$ are (sums of) tensor product functions, we use the bound given in (2.10) to obtain

    $\| I_0 f - I^H_n I_0 f \|_\infty \le C \binom{n+d-1}{d-1} 2^{-sn}\, |I_0 f|$

and

    $\| (I_{j+1} - I_j) f - I^H_n (I_{j+1} - I_j) f \|_\infty \le C \binom{n+d-1}{d-1} 2^{-sn}\, |(I_{j+1} - I_j) f|,$

where $|\cdot|$ denotes the tensor-product norm appearing on the right-hand side of (2.10). Since $f$ has finite mixed Besov norm, its norm is equivalent to

(2.11)    $\| f \|_{B} \sim |I_0 f| + \sum_{j=0}^{\infty} |(I_{j+1} - I_j) f|$

(see [8], Theorem 2.7). Combining the last three displays, we get

    $\| f - I^H_n f \|_\infty \le C \binom{n+d-1}{d-1} 2^{-sn} \Big( |I_0 f| + \sum_{j=0}^{\infty} |(I_{j+1} - I_j) f| \Big) \le C \binom{n+d-1}{d-1} 2^{-sn}\, \| f \|_{B}.$

The last inequality follows from (2.11).
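The sparse grid cardinality quoted before Lemma 2.1 can also be illustrated numerically. The following sketch (illustrative code; boundary conventions for sparse grids vary between papers, and here level 0 contributes the points {0, 1}) enumerates the distinct points of the level-$n$ sparse grid on $[0,1]^2$ and compares with the full grid.

    # Count distinct sparse grid points (dyadic points k * 2**-j, odd k per
    # direction, levels summing to at most n) versus the full regular grid.
    from itertools import product
    from fractions import Fraction

    def level_points(j):
        """Hierarchical 1D points added at level j (odd multiples of 2**-j)."""
        if j == 0:
            return [Fraction(0), Fraction(1)]
        return [Fraction(k, 2 ** j) for k in range(1, 2 ** j, 2)]

    def sparse_grid(n, d):
        pts = set()
        for j in product(range(n + 1), repeat=d):
            if sum(j) <= n:
                for x in product(*(level_points(ji) for ji in j)):
                    pts.add(x)
        return pts

    for n in range(1, 7):
        sg = len(sparse_grid(n, 2))
        full = (2 ** n + 1) ** 2
        print(f"n={n}: sparse {sg:6d}   full {full:8d}")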
REMARK 2.1. The accuracy of the compressed (orthogonal or biorthogonal) wavelet expansion has been discussed in [7, 10, 15], where it was called the hyperbolic wavelet basis. However, the proof there, which depends on the vanishing moments of the wavelet functions, cannot be applied to the interpolatory wavelet expansion used here, since interpolatory wavelets do not have the required vanishing moments. The choice of interpolatory wavelets is motivated by their practical advantages. In particular, the wavelet coefficients of hyperbolic interpolatory wavelets are closely related to the function values on a sparse grid and can be retrieved with little extra computation. These function values can then be used to determine many local properties of the functions, like slope, curvature, and interactions.

REMARK 2.2. The Besov spaces provide some information about the size of the wavelet coefficients $d_{\mathbf{j},\mathbf{k}}$. In fact, for $f$ in the mixed Besov space above one has

    $\| d_{\mathbf{j}} \|_{\ell_\infty} \le C\, 2^{-s|\mathbf{j}|_1}.$

As the set of levels $\mathbf{j}$ with $|\mathbf{j}|_1 = \ell$ contains $\binom{\ell+d-1}{d-1}$ elements, and for compactly supported wavelets and functions only $O(2^\ell \ell^{d-1})$ of the coefficients at level $\ell$ are non-zero, the average size of the coefficients at level $\ell$ is of order $O(2^{-\ell(s+1/2)})$.

3. Implementation and Application.

3.1. The smoothing problem in $V^H_{n,d}$. The hyperbolic interpolatory wavelets are now used to solve the following smoothing problem. Given the data set

    $(x^{(i)}, y^{(i)}), \qquad i = 1, \ldots, N, \qquad x^{(i)} \in \mathbb{R}^d,\ y^{(i)} \in \mathbb{R},$

we wish to minimise the functional

    $J_\alpha(f) = \sum_{i=1}^{N} \big( f(x^{(i)}) - y^{(i)} \big)^2 + \alpha\, \| Lf \|^2,$

where $N$ is the number of data points and $f$ is limited to $V^H_{n,d}$. Writing $f = \sum_\lambda c_\lambda \psi_\lambda$ in the basis (2.8), this leads to the following matrix problem for the vector $c$ of wavelet coefficients:

    $J_\alpha(c) = \| B c - y \|^2 + \alpha\, c^T C c,$

where

    $B_{i\lambda} = \psi_\lambda(x^{(i)}), \quad i = 1, \ldots, N, \qquad C_{\lambda\mu} = (L\psi_\lambda, L\psi_\mu).$

This problem has the normal equations

(3.1)    $(B^T B + \alpha C)\, c = B^T y.$

The computation of the matrices uses the wavelet basis functions $\psi_\lambda$. It turns out that the computations can also be done using products of shifted and dilated refinable functions $\phi$. For this one first observes that $\tilde W_j \subset V_j$, and thus each block $\tilde W_{j_1} \otimes \cdots \otimes \tilde W_{j_d}$ of $V^H_{n,d}$ is contained in the anisotropic full tensor product space $V_{j_1} \otimes \cdots \otimes V_{j_d}$. The bases of these spaces are just products of shifts and fixed dilations of the refinable function $\phi$. After computing the corresponding matrices with respect to these bases, the fast wavelet transform is used to determine the components relating to the wavelet spaces. This does not require the determination of the matrices for the "full" space $V_n \otimes \cdots \otimes V_n$, but only for a selection of much smaller spaces. In a sense this approach is akin to the combination technique of sparse grids, which also requires the solution of problems on smaller but regular spaces. In contrast to the combination method, however, the method proposed here yields the exact solution in the wavelet space and not only an approximation. Moreover, the matrices in the wavelet space can be computed very efficiently using the fast wavelet transform.
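The passage between grid values and interpolatory wavelet coefficients mentioned above is a lifting-style transform: for the hat function of Example 2.1 the detail coefficient $d_{j,k}$ is the hierarchical surplus, i.e., the value at a new midpoint minus the prediction $(I_j f)$ there. A minimal 1D sketch follows (illustrative code, not from the paper; periodic boundary treatment is assumed purely for brevity):

    # 1D interpolatory fast wavelet transform for the hat function: details are
    # hierarchical surpluses (midpoint value minus average of its neighbours).
    import numpy as np

    def forward_iwt(v):
        """Values v on a periodic dyadic grid (len 2^n) -> (coarse, details)."""
        details = []
        while len(v) > 1:
            even, odd = v[::2], v[1::2]
            pred = 0.5 * (even + np.roll(even, -1))   # I_j f at the midpoints
            details.append(odd - pred)                # surplus d_{j,k}
            v = even
        return v, details[::-1]                       # details coarse-to-fine

    def inverse_iwt(coarse, details):
        v = coarse
        for d in details:
            pred = 0.5 * (v + np.roll(v, -1))
            out = np.empty(2 * len(v))
            out[::2], out[1::2] = v, pred + d         # restore midpoint values
            v = out
        return v

    x = np.arange(16) / 16.0
    v = np.sin(2 * np.pi * x)
    c, ds = forward_iwt(v.copy())
    assert np.allclose(inverse_iwt(c, ds), v)         # coefficients <-> values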
3.2. Example: Predictive modelling of forest cover type. In this section we demonstrate how the method developed in the previous sections can be used in data mining for multidimensional predictive modelling. We take a (Bayesian) classification problem as our example. The data description and the problem statement are available at the web page http://kdd.ics.uci.edu/databases/covertype/covertype.html. The aim is to find a model for the forest cover type as a function of the following cartographic parameters:

    Variable         Description
    ELEVATION        Altitude above sea level
    ASPECT           Azimuth
    SLOPE            Inclination
    HORIZ_HYDRO      Horizontal distance to water
    VERTI_HYDRO      Vertical distance to water
    HORIZ_ROAD       Horizontal distance to roadways
    HILL_SHADE_9     Hill shade at 9am
    HILL_SHADE_12    Hill shade at noon
    HILL_SHADE_15    Hill shade at 3pm
    HORIZ_FIRE       Horizontal distance to fire points

All values are averaged over 30x30 meter cells and are observed on a regular grid with 30 meter spacing in both directions. The response variable takes one of seven values (1-7) corresponding to the different possible types of forest cover: Spruce fir, Lodgepole pine, Ponderosa pine, Cottonwood/Willow, Aspen, Douglas fir, and Krummholz, respectively. However, for this study we will use the data to train a predictive model predicting only the presence or absence of one type of forest cover at a time. In particular, we will determine predictors for the existence of Ponderosa pine; predictors for the other types can be obtained in similar ways. The response variable is thus

    $y^{(i)} = \begin{cases} 1, & \text{cover} = \text{Ponderosa pine}, \\ 0, & \text{otherwise}, \end{cases} \qquad i = 1, \ldots, N.$

The predictor is the vector of cartographic measurements. Each model found using the smoothing methodology of Section 3.1 is a continuous function $f$. The classifier is obtained by thresholding this function: the class is predicted to be Ponderosa pine if $f(x) \ge 0.5$ and not Ponderosa pine otherwise. On any test set the performance of the function $f$ is estimated by the misclassification rate. This is the number of times the classifier predicts Ponderosa pine when the test data is in fact something else, plus the number of times the classifier predicts something else when the actual data is indeed Ponderosa pine.

The smoothing method requires the choice of two parameters, the smoothing parameter $\alpha$ and the maximal level $n$. If $\alpha$ is chosen too small and $n$ too large, one gets overfitting and hence large misclassification rates. In order to determine these parameters the data is randomly separated into three distinct subsets of approximately equal size. The first part, the training set, is used to compute functions for a variety of different parameters $\alpha$ and $n$. The second part, the test set, is used to estimate the misclassification rate of the functions determined from the training set; the parameters are selected to make this error estimate minimal. Finally, the third part of the data is used to estimate the misclassification rate of the model with these "optimal" parameters. For an in-depth discussion of the properties of this method, see [13].
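The selection protocol just described is straightforward to code. A hedged sketch follows; fit_model and predict are placeholders for the smoother of Section 3.1 (any regression fitter with parameters $(\alpha, n)$ would do), and all names are illustrative.

    # Three-way split, grid search over (alpha, n) scored on the test part,
    # final error estimate on the held-out validation part.
    import numpy as np

    def misclassification_rate(f_values, y):
        return np.mean((f_values >= 0.5) != (y == 1))   # threshold classifier

    def three_way_split(X, y, rng):
        idx = rng.permutation(len(y))
        parts = np.array_split(idx, 3)
        return [(X[p], y[p]) for p in parts]            # train, test, validate

    def select_model(X, y, alphas, levels, fit_model, predict, seed=0):
        (Xtr, ytr), (Xte, yte), (Xva, yva) = three_way_split(
            X, y, np.random.default_rng(seed))
        best = min((misclassification_rate(
                        predict(fit_model(Xtr, ytr, a, n), Xte), yte), a, n)
                   for a in alphas for n in levels)
        _, a, n = best
        model = fit_model(Xtr, ytr, a, n)
        final_err = misclassification_rate(predict(model, Xva), yva)
        return model, (a, n), final_err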
The name 7 7 of the particular variable 0 together with the overall classification rate is provided as label 7 to each plot. It can be seen that “elevation” is the most important predictor variable with a classification rate of + followed by “distance to roads” with a rate of + and “distance to fire points” with a rate of + . The rates and the best values of I and are listed in Table 3.1 The important question at this point is how much the maximal rate of the 1D predictions can be improved by simultaneously taking more variables into account. This depends on the problem at hand and can be verified by progressively increasing the dimensionality including variables according to their importance as assessed in the 1D study. Table 3.2 shows the classification rates obtained from multivariate models. In this case was fixed to be but I was found using the test set as described above. It is seen that the best classification rate was increased from + in the 1D case to + in the six dimensional case. The third column provides a reference prediction using a generic surface where possible. The generic surface is obtained by using the spaces instead of the hyperbolic interpolatory wavelet spaces. It is seen that the predictive power of the compressed surface is almost as good as that of a full surface. In this particular example, the gain in classification rate from increased dimensionality is modest. Other examples, where a multivariate model has much more predictive power, the advantage of using the compressed system will be greater. 9 9 9 9 2 2 9 ETNA Kent State University etna@mcs.kent.edu Multidimensional smoothing using hyperbolic interpolatory wavelets ELEVATION: 0.80 ASPECT: 0.53 SLOPE: 0.53 HORIZ_HYDRO: 0.54 VERTI_HYDRO: 0.54 HORIZ_ROAD: 0.59 HILL_SHADE_9: 0.55 HILL_SHADE_12: 0.52 HILL_SHADE_15: 0.53 179 HORIZ_FIRE: 0.55 F IG . 3.1. One dimensional predictions of Ponderosa pine. y-axis: predictions used, label: name of predictor and classification rate. Variable ELEVATION ASPECT SLOPE HORIZ HYDRO VERTI HYDRO HORIZ ROAD HILL SHADE 9 HILL SHADE 12 HILL SHADE 15 HORIZ FIRE I 100 0.1 10 0.01 0.1 1 100 10000 10 0.1 9 1 4 2 2 2 5 2 0 2 1 > , x-axis: predictor Rate 0.80 0.53 0.53 0.54 0.54 0.59 0.55 0.52 0.52 0.55 TABLE 3.1 Results of the 1D predictions corresponding to Figure 3.1. The chosen value of and are given together with the classification rate for each of the ten variables. Dimensions 2 3 4 5 6 Compressed system Rate # terms 0.8216 37 0.8359 123 0.8487 368 0.8587 1032 0.8697 2768 Generic system Rate # terms 0.8260 81 0.8483 729 0.8631 6561 N/A 59049 N/A 531441 TABLE 3.2 The best multivariate predictions obtained from the compressed system and the generic system where possible. It is seen that the compressed system predicts almost as well as the generic system but it requires much less storage. The CPU time required to compute the generic system increased rapidly with increased dimensionality and it was not possible to compute the generic system for dimensions higher than 4 because of excessive storage requirements. ETNA Kent State University etna@mcs.kent.edu 180 M. Hegland, O. M. Nielsen, and Z. Shen 4. Conclusion. We have introduced hyperbolic interpolatory wavelets and demonstrated how they can be used for data mining applications, in particular predictive modelling based on smoothing. 
4. Conclusion. We have introduced hyperbolic interpolatory wavelets and demonstrated how they can be used for data mining applications, in particular predictive modelling based on smoothing. Compared to related approaches using (bi)orthogonal wavelets, the approach suggested here has the advantage for data mining that function values on the grid points are readily accessible; they can be used to determine properties of the functions such as derivatives and, possibly more importantly, they have an immediate meaning for the application, which is not the case for the wavelet coefficients of (bi)orthogonal wavelets. The method performs well for up to around 10 dimensions. We are now generalising this approach to higher dimensions. Further information and related software can be found at our web page http://datamining.anu.edu.au.

REFERENCES

[1] R. A. Adams, Sobolev Spaces, Academic Press, New York, 1975.
[2] I. Daubechies, Orthonormal bases of compactly supported wavelets, Comm. Pure Appl. Math., XLI (1988), pp. 909-996.
[3] I. Daubechies, Ten Lectures on Wavelets, SIAM, 1992.
[4] C. de Boor, K. Hoellig, and S. Riemenschneider, Box Splines, Springer-Verlag, New York, 1993.
[5] R. DeVore, Nonlinear approximation, Acta Numerica, 7 (1998), pp. 51-150.
[6] R. DeVore, B. Jawerth, and B. Lucier, Surface compression, Comput. Aided Geom. Design, 9 (1992), pp. 219-239.
[7] R. A. DeVore, S. V. Konyagin, and V. N. Temlyakov, Hyperbolic wavelet approximation, Constr. Approx., 14 (1998), pp. 1-26.
[8] D. Donoho, Interpolating wavelet transforms, 1992. Available at http://www-stat.stanford.edu/~donoho/Reports.
[9] J. Garcke, M. Griebel, and M. Thess, Data mining with sparse grids, Computing, 67 (2001), pp. 225-253.
[10] M. Griebel and S. Knapek, Optimized tensor-product approximation spaces, Constr. Approx., 16 (2000), pp. 525-540.
[11] M. Griebel and P. Oswald, Tensor product type subspace splittings and multilevel iterative methods for anisotropic problems, Adv. Comput. Math., 4 (1995), pp. 171-206.
[12] M. Griebel and G. Zumbusch, Adaptive sparse grids for hyperbolic conservation laws, in Proceedings of the 7th International Conference on Hyperbolic Problems, Theory, Numerics, Applications, Birkhaeuser, 1998.
[13] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Springer Series in Statistics, Springer-Verlag, 2001.
[14] J. Hui, S. D. Riemenschneider, and Z. Shen, Multivariate compactly supported fundamental refinable functions, duals and biorthogonal wavelets, Stud. Appl. Math., 102 (1999), pp. 173-204.
[15] S. Knapek and F. Koster, Integral operators on sparse grids, preprint.
[16] S. D. Riemenschneider and Z. W. Shen, Interpolatory wavelet packets, Appl. Comput. Harmon. Anal., 8 (2000), pp. 320-324.
[17] F. Sprengel, Interpolation and wavelets on sparse Gauss-Chebyshev grids, in Multivariate Approximation, Recent Trends and Results, W. Haussmann, K. Jetter, and M. Reimer, eds., vol. 101 of Mathematical Research, Akademie-Verlag, Berlin, 1997, pp. 269-286.
[18] G. Wahba, Spline Models for Observational Data, CBMS-NSF Regional Conf. Ser. in Appl. Math., vol. 59, SIAM, 1990.
[19] C. Zenger, Sparse grids, Technical Report, Technische Universitaet Muenchen, 1990.