ETNA
Electronic Transactions on Numerical Analysis.
Volume 17, pp. 168-180, 2004.
Copyright © 2004, Kent State University.
ISSN 1068-9613.
Kent State University
etna@mcs.kent.edu
MULTIDIMENSIONAL SMOOTHING USING HYPERBOLIC INTERPOLATORY
WAVELETS
MARKUS HEGLAND, OLE M. NIELSEN, AND ZUOWEI SHEN
Abstract. We propose the application of hyperbolic interpolatory wavelets for large-scale $d$-dimensional data fitting. In particular, we show how wavelets can be used as a highly efficient tool for multidimensional smoothing. The grid underlying these wavelets is a sparse grid. The hyperbolic interpolatory wavelet space of level $n$ uses $O(2^n n^{d-1})$ basis functions, and it is shown that under sufficient smoothness an approximation error of order $O\big(\binom{n+d-1}{d-1}\,2^{-sn}\big)$ can be achieved, where $s$ is the order of the Strang-Fix condition satisfied by the underlying refinable function. The implementation uses the fast wavelet transform and an efficient indexing method to access the wavelet coefficients. A practical example demonstrates the efficiency of the approach.
Key words. sparse grids, predictive modelling, wavelets, smoothing, data mining.
AMS subject classifications. 65C60, 65D10, 65T60.
1. Introduction. Predictive modelling aims to recover functional relations from observed data. More concisely, given a sequence of values of a response variable, $y^{(1)},\ldots,y^{(N)}$, and a sequence of values of a predictor variable (which is typically a vector), $x^{(1)},\ldots,x^{(N)}$, predictive modelling recovers a function $f$ such that
$$f(x^{(i)}) \approx y^{(i)}, \qquad i = 1,\ldots,N.$$
In the cases considered here the response and the components of the predictors are real numbers.
In a smoothing approach to predictive modelling, the function is obtained by minimising a cost functional
(1.1)
$$J_\alpha(f) = \sum_{i=1}^{N} \big(f(x^{(i)}) - y^{(i)}\big)^2 + \alpha\,(Lf, Lf),$$
where $\alpha$ is the smoothing parameter, which can be found, e.g., by cross-validation [13], and the operator $L$ is densely defined in $L_2(\mathbb{R}^d)$ and typically is a differential operator. Finding a good function class which both allows the efficient approximation of the predictive function $f$ and uses algorithms which are scalable in the number $d$ of predictor variables $x_1,\ldots,x_d$ is a major challenge. Function classes which have been used in the past for this problem include radial basis functions [18], artificial neural nets, multivariate adaptive regression splines [13], and generalised additive models [13]. Here we present a new method for predictive modelling based on hyperbolic interpolatory wavelets. A related approach which applies sparse grids to classification is discussed in [9].
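The cross-validation step for choosing $\alpha$ can be sketched in a few lines. The following is a minimal sketch, assuming synthetic data and plain ridge regression as a stand-in for the wavelet-based smoother; all names and numbers are illustrative only.

```python
# Sketch: choosing the smoothing parameter alpha in (1.1) by K-fold
# cross-validation. Ridge regression on a fixed basis stands in for the
# wavelet smoother; the data are synthetic.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))                      # predictors x^(i)
y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.standard_normal(200)   # responses y^(i)

def cv_error(X, y, alpha, folds=5):
    """Mean squared test error over the folds for a given alpha."""
    idx = rng.permutation(len(y))
    err = 0.0
    for fold in range(folds):
        test = idx[fold::folds]
        train = np.setdiff1d(idx, test)
        A, b = X[train], y[train]
        # normal equations (A^T A + alpha I) c = A^T b
        c = np.linalg.solve(A.T @ A + alpha * np.eye(X.shape[1]), A.T @ b)
        err += np.mean((X[test] @ c - y[test]) ** 2)
    return err / folds

alphas = [10.0 ** k for k in range(-4, 3)]
best = min(alphas, key=lambda a: cv_error(X, y, a))
print("alpha selected by cross-validation:", best)
```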
First, the function space for $f$ should provide a good approximation of a large class of functions under reasonable smoothness assumptions. Second, the evaluation of the function
* Received August 27, 2002. Accepted for publication April 13, 2004. Recommended by Martin Gutknecht. The research presented here was partly funded by the Australian Advanced Computational Systems CRC, the Australian Partnership for Advanced Computations (APAC), the Danish Research Council and the Academic Research Fund RP3981647. This research was partially done while the first author was visiting the Institute for Mathematical Sciences, National University of Singapore in 2003. The visit was supported by the Institute.
† Mathematical Sciences Institute, Australian National University, Canberra ACT 0200, Australia. E-mail: markus.hegland@anu.edu.au
‡ Urban Risk Research Group, Geoscience Australia, Canberra ACT, Australia.
§ Department of Mathematics, National University of Singapore, Singapore.
should be fast and the function itself should be represented by a limited number of nonzero coefficients. As we will see, hyperbolic interpolatory wavelets satisfy both conditions. If the function space for $f$ has a relatively low dimension, then nonlinear approximation theory provides a feasible technique for the determination of an approximation with few terms [5]. This $n$-term approximation proceeds by first determining all the coefficients of the expansion of $f$ in the space and then selecting the $n$ most important ones, e.g., by using the threshold method [8]. This approach is very successful in the case of small $d$, e.g., in the case of image processing where $d = 2$ (see [6]). For problems of higher dimension, this approach is not feasible as the dimension of the underlying function space becomes prohibitively large. In this case, the $n$-term approximation technique needs to be preceded by a basis selection procedure which finds a smaller basis. This step makes the problem “seriously” nonlinear and typically greedy algorithms are used [13].
Under sufficient smoothness conditions (which are related to the operator $L$), hyperbolic interpolatory wavelet spaces provide a good trade-off between function space complexity and approximation error. Hyperbolic orthogonal and bi-orthogonal wavelets were introduced in [7, 10, 11] where error analyses were also given. Further, hyperbolic bi-orthogonal wavelets have been shown to be effective for the solution of high-dimensional elliptic PDEs and IEs ([15, 11]). Here we consider the case of interpolatory wavelets where the interpolation points form a sparse grid [19]. At the same time as we developed our hyperbolic interpolatory wavelet approach, a related approach was developed which uses sparse grid approximations for the classification problem [9].
In Section 2 we introduce hyperbolic interpolatory wavelets and discuss their approximation properties. In Section 3 we discuss the implementation and give an application.
2. Approximation by hyperbolic interpolatory wavelets. Data mining applications
require access to function values on the grid points for further processing. For interpolatory
wavelets the function values on the grid are directly related to the wavelet coefficients and
can be easily retrieved. This is in contrast to orthogonal and biorthogonal wavelets for which
more elaborate summations are required to retrieve the function values on the grid points.
The approximation properties for hyperbolic biorthogonal wavelets have been studied in [7].
In the following we will show that similar estimates can be given for interpolatory wavelets
using a different approach.
In the next subsection some basic properties of one-dimensional interpolatory wavelets
are reviewed. The reader familiar with the literature on this topic can skip this subsection and
move straight to the following subsection which introduces hyperbolic interpolatory wavelets.
2.1. One dimensional interpolatory wavelets. Compactly supported interpolatory wavelets are defined by compactly supported, interpolatory refinable functions $\phi$. A compactly supported function is refinable if there is a finitely supported sequence $h$ such that the function satisfies the following equation:
$$\phi(x) = \sum_{k \in \mathbb{Z}} h(k)\,\phi(2x - k).$$
The sequence $h$ is called the refinement mask for $\phi$. A continuous refinable function is interpolatory if
(2.1)
$$\phi(k) = \delta_{0,k}, \qquad k \in \mathbb{Z}.$$
A simple example of an interpolatory refinable function is the hat function given in the following example.

EXAMPLE 2.1. Let
$$\phi(x) = \begin{cases} 1 + x, & -1 \le x \le 0, \\ 1 - x, & 0 \le x \le 1, \\ 0, & \text{otherwise.} \end{cases}$$
Then $\phi(x)$ is an interpolatory refinable function and its refinement mask is
$$h(k) = \begin{cases} 1, & k = 0, \\ 1/2, & k = -1, 1, \\ 0, & \text{otherwise.} \end{cases}$$
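As a quick sanity check, the refinement equation and the interpolation property (2.1) of Example 2.1 can be verified numerically; the sketch below assumes the refinement convention $\phi(x) = \sum_k h(k)\,\phi(2x-k)$ used above.

```python
# Numerical check of Example 2.1: the hat function is refinable with mask
# h(0) = 1, h(+-1) = 1/2, and interpolatory, phi(k) = delta_{0,k}.
import numpy as np

def hat(x):
    return np.maximum(0.0, 1.0 - np.abs(x))

h = {-1: 0.5, 0: 1.0, 1: 0.5}          # refinement mask

x = np.linspace(-2.0, 2.0, 1001)
lhs = hat(x)
rhs = sum(hk * hat(2.0 * x - k) for k, hk in h.items())
assert np.allclose(lhs, rhs)           # refinement equation holds
assert hat(np.array([-1, 0, 1])).tolist() == [0.0, 1.0, 0.0]  # phi(k) = delta_{0,k}
print("refinement equation and interpolation property verified")
```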
Other examples are given in [3] in terms of their Fourier transforms $\hat\phi$ by
(2.2)
$$\hat\phi(\xi) = \prod_{j=1}^{\infty} \tau(2^{-j}\xi), \qquad \xi \in \mathbb{R},$$
and the symbols are
(2.3)
$$\tau(\xi) = \cos^{N}(\xi/2) \sum_{j=0}^{N/2-1} \binom{N/2-1+j}{j} \sin^{2j}(\xi/2),$$
where $N$ is an even number. More details on interpolatory refinable functions can be found in [14].
We will now recall the wavelet decomposition using interpolatory refinable functions. First define a shift-invariant subspace $V_0$ of $L_2(\mathbb{R})$ as
$$V_0 = \overline{\operatorname{span}}\,\{\phi(\cdot - k) : k \in \mathbb{Z}\}.$$
The dilations of the space $V_0$ are $V_j = \{f(2^j\,\cdot\,) : f \in V_0\}$. Since $\phi$ is refinable, $V_j \subset V_{j+1}$. For $f \in C(\mathbb{R})$, the interpolation operator at the $j$th dyadic level is
$$I_j f = \sum_{k \in \mathbb{Z}} f(2^{-j}k)\, \phi(2^j\,\cdot - k).$$
The interpolation $I_j f$ interpolates $f$ on the lattice $2^{-j}\mathbb{Z}$, i.e.,
$$(I_j f)(2^{-j}k) = f(2^{-j}k), \qquad k \in \mathbb{Z}.$$
The order of the interpolation error is known for smooth functions [4]. The smoothness can be described using Sobolev spaces $W^s_\infty(\mathbb{R})$ [1]. One also requires that the refinable function satisfies a Strang-Fix condition of order $s$, which is given in terms of the Fourier transform as
$$\hat\phi(0) = 1, \qquad D^\alpha\hat\phi(2\pi k) = 0 \quad \text{for } k \in \mathbb{Z}\setminus\{0\},\ |\alpha| < s.$$
Now, if $f$ is a compactly supported function in the Sobolev space $W^s_\infty(\mathbb{R})$ with Sobolev semi-norm $|f|_{W^s_\infty}$, then the interpolation error $f - I_j f$ satisfies
(2.4)
$$\|f - I_j f\|_\infty \le C\, |f|_{W^s_\infty}\, 2^{-js}.$$
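The decay rate in (2.4) is easy to observe numerically. Below is a small sketch, assuming the hat function of Example 2.1 (Strang-Fix order $s = 2$) and a smooth test function supported in $[0, 1]$; the sup-norm error should roughly quarter with each level.

```python
# Sketch of the dyadic interpolation operator I_j f = sum_k f(k/2^j) phi(2^j x - k)
# for the hat function, and the error decay 2^{-2j} predicted by (2.4).
import numpy as np

def hat(x):
    return np.maximum(0.0, 1.0 - np.abs(x))

def interp(f, j, x):
    """Evaluate I_j f at points x, for f supported in [0, 1]."""
    ks = np.arange(0, 2**j + 1)
    return sum(f(k / 2**j) * hat(2**j * x - k) for k in ks)

f = lambda x: np.sin(np.pi * x) ** 2
x = np.linspace(0.0, 1.0, 2001)
for j in range(2, 7):
    err = np.max(np.abs(f(x) - interp(f, j, x)))
    print(f"j = {j}: sup-norm error = {err:.2e}")
```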
As $V_j \subset V_{j+1}$, one can now introduce an algebraic complement $W_j$ such that the spaces satisfy the decomposition
(2.5)
$$V_{j+1} = V_j \oplus W_j.$$
In particular, $W_0$ is the shift-invariant subspace of $L_2(\mathbb{R})$ generated by $\psi = \phi(2\,\cdot - 1)$ [8, 16], with
$$W_0 = \overline{\operatorname{span}}\,\{\psi(\cdot - k) : k \in \mathbb{Z}\},$$
and $W_j = \{f(2^j\,\cdot\,) : f \in W_0\}$ is a dilation of $W_0$. Since $\{\phi(2^j\,\cdot - k) : k \in \mathbb{Z}\}$ is a basis for $V_j$ and $\{\psi(2^j\,\cdot - k) : k \in \mathbb{Z}\}$ is a basis for $W_j$, it follows that $V_{j+1}$ has the basis
(2.6)
$$\{\phi(2^j\,\cdot - k) : k \in \mathbb{Z}\} \cup \{\psi(2^j\,\cdot - k) : k \in \mathbb{Z}\},$$
and one finds that
$$I_{j+1} f = I_j f + \sum_{k \in \mathbb{Z}} (f - I_j f)\big(2^{-j-1}(2k+1)\big)\, \psi(2^j\,\cdot - k).$$
In general, let $f$ be a compactly supported continuous function. Then the sequence $I_j f$ converges to $f$ in $L_\infty(\mathbb{R})$ as $j$ goes to infinity. In particular, as $j$ goes to infinity, $f$ has the expansion
$$f = \sum_{k} f(k)\,\phi(\cdot - k) + \sum_{j=0}^{\infty} \sum_{k} d_{j,k}\, \psi(2^j\,\cdot - k),$$
where for each fixed $j$ the sequence
$$d_{j,k} = (f - I_j f)\big(2^{-j-1}(2k+1)\big)$$
is the sequence of wavelet coefficients related to the space $W_j$.
The Besov space $B^s_q(L_p(\mathbb{R}))$ can be defined in many ways. One definition uses the wavelet coefficients. Let $\phi$ satisfy the Strang-Fix condition of order $r$, with $r > s$. Then $f$ is in $B^s_\infty(L_\infty(\mathbb{R}))$ if and only if
$$\sup_{j \ge 0}\, 2^{js} \sup_{k} |d_{j,k}| < \infty.$$
Furthermore, the Besov norm is equivalent to the above number. We use in this paper the space $B = B^s_\infty(L_\infty(\mathbb{R}))$. Its norm is
$$\|f\|_B = \sup_{k} |f(k)| + \sup_{j \ge 0}\, 2^{js} \sup_{k} |d_{j,k}|.$$
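For interpolatory wavelets the coefficients are just interpolation residuals at the new dyadic points, which makes them cheap to compute. A small sketch, again assuming the hat function ($s = 2$):

```python
# Sketch: interpolatory wavelet coefficients d_{j,k} = (f - I_j f)((2k+1)/2^{j+1})
# for the hat function; their maximal size should decay like 2^{-2j}.
import numpy as np

def hat(x):
    return np.maximum(0.0, 1.0 - np.abs(x))

def interp(f, j, x):
    ks = np.arange(0, 2**j + 1)
    return sum(f(k / 2**j) * hat(2**j * x - k) for k in ks)

f = lambda x: np.exp(-x) * np.sin(2 * np.pi * x)
for j in range(1, 8):
    mid = (2 * np.arange(0, 2**j) + 1) / 2.0 ** (j + 1)   # new points at level j
    d = f(mid) - interp(f, j, mid)
    print(f"level {j}: max |d_jk| = {np.max(np.abs(d)):.2e}")
```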
It follows that the wavelet coefficients of $f \in B$ satisfy
$$|d_{j,k}| \le C\, 2^{-js}.$$
As we consider compactly supported wavelets and functions, only $O(2^j)$ of the coefficients at level $j$ are nonzero; the average over all (nonzero) coefficients at level $j$ is then of order $O(2^{-j(s+1)})$. If $f$ is a compactly supported function in $C^s(\mathbb{R})$, then $f$ is in both the Sobolev space $W^s_\infty(\mathbb{R})$ and the Besov space $B^s_\infty(L_\infty(\mathbb{R}))$. Further facts on Sobolev and Besov spaces can be found in [5] and [8].
2.2. Hyperbolic interpolatory wavelets. The hyperbolic wavelet basis [7] is a subset of the rectangular wavelet basis [5]:
(2.7)
$$\big\{\psi_{\mathbf{j},\mathbf{k}} : \mathbf{j} = (j_1,\ldots,j_d),\ j_i \ge -1,\ \mathbf{k} = (k_1,\ldots,k_d) \in \mathbb{Z}^d\big\},$$
where
$$\psi_{\mathbf{j},\mathbf{k}}(x_1,\ldots,x_d) = \psi_{j_1,k_1}(x_1) \cdots \psi_{j_d,k_d}(x_d),$$
with $\psi_{j,k} = \psi(2^j\,\cdot - k)$ for $j \ge 0$, $\psi_{-1,k} = \phi(\cdot - k)$, and $\psi$ as above. The hyperbolic interpolatory wavelet basis is obtained from (2.7) by removing all elements for which $j_1 + \cdots + j_d > n$, leaving
(2.8)
$$\big\{\psi_{\mathbf{j},\mathbf{k}} : j_1 + \cdots + j_d \le n,\ \mathbf{k} \in \mathbb{Z}^d\big\}.$$
These basis functions span function spaces $V_n^H$, which alternatively are defined recursively as
(2.9)
$$V_n^H = V_{n-1}^H + \sum_{j_1 + \cdots + j_d = n} W_{j_1} \otimes \cdots \otimes W_{j_d},$$
where the $W_j$ are the spaces of univariate wavelets defined in (2.5), with the conventions $W_{-1} = V_0$ and $V_{-1}^H = V_0 \otimes \cdots \otimes V_0$. Note that these function spaces are constructed using dilations and shifts of a single refinable function $\phi$. The basis functions of (2.8) form a Lagrange basis for the spaces $V_n^H$, with the grid points
$$x_{\mathbf{j},\mathbf{k}} = \big(2^{-j_1-1}(2k_1+1), \ldots, 2^{-j_d-1}(2k_d+1)\big)$$
(where for $j_i = -1$ the $i$th component is $k_i$): each basis function takes the value one at its own grid point and vanishes at all other grid points of levels up to its own. Thus one can introduce an interpolation operator $I_n^H$ on $C(\mathbb{R}^d)$ recursively by
$$I_n^H f = I_{n-1}^H f + \sum_{j_1+\cdots+j_d = n}\ \sum_{\mathbf{k}} \big(f - I_{n-1}^H f\big)(x_{\mathbf{j},\mathbf{k}})\, \psi_{\mathbf{j},\mathbf{k}}.$$
This operator provides the interpolant on the grid $G_n^H = \{x_{\mathbf{j},\mathbf{k}} : j_1 + \cdots + j_d \le n\}$, known as a sparse grid in the literature [19, 17, 12]. Thus $I_n^H$ provides a direct way to compute the sparse grid interpolant, for which otherwise the combination formula can be used [11]. A bounded subset of $\mathbb{R}^d$ contains $O(2^n n^{d-1})$ sparse grid points, which is substantially less than the $O(2^{nd})$ regular grid points in the same subset. It can be seen that for compactly supported interpolatory wavelets the number of wavelets contributing to the function values for arguments in a bounded subset is $O(2^n n^{d-1})$.
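These counts are easy to reproduce. The sketch below compares, for the unit cube, a simplified count of hyperbolic indices satisfying $j_1 + \cdots + j_d \le n$ (with $2^{j_i}$ translates per direction) with the $2^{nd}$ points of the full tensor grid; the exact constants depend on boundary conventions, so the numbers are indicative only.

```python
# Count hyperbolic wavelet indices with |j|_1 <= n versus full-grid size 2^{nd}:
# the hyperbolic count grows like O(2^n n^{d-1}), the full grid like O(2^{nd}).
from itertools import product

def hyperbolic_count(n, d):
    """Simplified count: sum over j with j_1 + ... + j_d <= n of 2^{|j|_1}."""
    total = 0
    for j in product(range(n + 1), repeat=d):
        if sum(j) <= n:
            total += 2 ** sum(j)
    return total

n = 5
for d in (2, 4, 6):
    print(f"d = {d}: hyperbolic ~ {hyperbolic_count(n, d)}, full grid = {2**(n*d)}")
```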
Next we estimate the interpolation error $f - I_n^H f$. For this we need the following lemma:
LEMMA 2.1. The cardinality of the set
$$A_{n,d} = \{(j_1,\ldots,j_d) : j_i \ge 0,\ j_1 + \cdots + j_d = n\}$$
is
$$|A_{n,d}| = \binom{n+d-1}{d-1}.$$

Proof. Let $N(n,d)$ denote the cardinality of $A_{n,d}$. If $j_d = 0$, the cardinality of the corresponding subset is $N(n, d-1)$, since there are only $d-1$ degrees of freedom. In the remaining cases, where $j_d \ge 1$, we have $j_1 + \cdots + j_{d-1} + (j_d - 1) = n - 1$, and the cardinality of this subset is $N(n-1, d)$. Hence we have the recursion $N(n,d) = N(n, d-1) + N(n-1, d)$. We use the fact that $N(n,1) = 1$ to start an induction, and the induction step over $n$ and $d$ shows that
$$N(n,d) = N(n, d-1) + N(n-1, d) = \binom{n+d-2}{d-2} + \binom{n+d-2}{d-1} = \binom{n+d-1}{d-1},$$
based on the standard recursion formula for binomial coefficients.
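A brute-force check of the lemma for small parameters, as a quick sketch:

```python
# Verify Lemma 2.1: #{j >= 0 : j_1 + ... + j_d = n} equals binom(n+d-1, d-1).
from itertools import product
from math import comb

for n in range(6):
    for d in range(1, 5):
        card = sum(1 for j in product(range(n + 1), repeat=d) if sum(j) == n)
        assert card == comb(n + d - 1, d - 1)
print("Lemma 2.1 verified for n < 6, d < 5")
```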
We are now ready to state the following theorem:
THEOREM 2.2. Let $V_n^H$ be the hyperbolic interpolatory wavelet space as defined in equation (2.9). Suppose that the underlying refinable function $\phi$ satisfies the Strang-Fix condition of order $s$. Let $I_n^H$ be the interpolation operator from $C(\mathbb{R}^d)$ onto $V_n^H$. Then, for an arbitrary compactly supported function $f$ in $B(\mathbb{R}^d)$,
$$\big\|f - I_n^H f\big\|_\infty \le C\, \|f\|_B\, \binom{n+d-1}{d-1}\, 2^{-sn},$$
where $\|f\|_B$ is the Besov norm of $f$ in $B$. The constant $C$ is independent of $f$ and $n$.

Proof. We first discuss the case that $f$ is a compactly supported tensor product function, i.e.,
$$f(x) = f_1(x_1) \cdots f_d(x_d).$$
Let each factor $f_i$ have the 1D expansion
$$f_i = \sum_{j \ge -1} \sum_{k} d^{(i)}_{j,k}\, \psi_{j,k}, \qquad i = 1,\ldots,d,$$
where we set $\psi_{-1,k} = \phi(\cdot - k)$ and $d^{(i)}_{-1,k} = f_i(k)$.
Then, subtracting the wavelet expansion for $I_n^H f$ from the wavelet expansion for $f$, we get
$$f - I_n^H f = \sum_{j_1 + \cdots + j_d > n}\ \sum_{\mathbf{k}}\, d_{\mathbf{j},\mathbf{k}}\, \psi_{\mathbf{j},\mathbf{k}}, \qquad d_{\mathbf{j},\mathbf{k}} = d^{(1)}_{j_1,k_1} \cdots d^{(d)}_{j_d,k_d}.$$
Since $|d^{(i)}_{j,k}| \le \|f_i\|_B\, 2^{-js}$, we get
$$\big\|f - I_n^H f\big\|_\infty \le \sum_{j_1+\cdots+j_d > n} \Big\|\sum_{\mathbf{k}} d_{\mathbf{j},\mathbf{k}}\, \psi_{\mathbf{j},\mathbf{k}}\Big\|_\infty \le C\, \|f_1\|_B \cdots \|f_d\|_B \sum_{m > n} \binom{m+d-1}{d-1}\, 2^{-sm}.$$
Here we have used Lemma 2.1 to count the number of levels $\mathbf{j}$ with $j_1 + \cdots + j_d = m$. Although the constant '$C$' may vary throughout the proof, we use the generic name '$C$' for items independent of $n$. Rewriting the above, we get
$$\big\|f - I_n^H f\big\|_\infty \le C\, \|f\|_B\, \binom{n+d-1}{d-1}\, 2^{-sn} \sum_{\ell = 1}^{\infty} \frac{\binom{n+\ell+d-1}{d-1}}{\binom{n+d-1}{d-1}}\, 2^{-s\ell}.$$
We now let $m = n + \ell$ and observe that
$$\frac{\binom{n+\ell+d-1}{d-1}}{\binom{n+d-1}{d-1}} = \frac{(n+\ell+d-1)(n+\ell+d-2)\cdots(n+\ell+1)}{(n+d-1)(n+d-2)\cdots(n+1)} = \prod_{i=1}^{d-1} \frac{n+\ell+i}{n+i} \le (1+\ell)^{d-1}.$$
We use this to simplify the series to
$$\sum_{\ell=1}^{\infty} \frac{\binom{n+\ell+d-1}{d-1}}{\binom{n+d-1}{d-1}}\, 2^{-s\ell} \le \sum_{\ell=1}^{\infty} (1+\ell)^{d-1}\, 2^{-s\ell},$$
which is convergent by the quotient rule. This is a constant independent of $n$, so we have the bound
(2.10)
$$\big\|f - I_n^H f\big\|_\infty \le C\, \|f\|_B\, \binom{n+d-1}{d-1}\, 2^{-sn}.$$
Thus we get convergence in $n$ for fixed $d$. While the bound grows exponentially in $d$, the factor $\binom{n+d-1}{d-1}$ grows very slowly with the grid size, so this dependence is close to constant.
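To get a feeling for the size of the tail in the estimate above, one can compare it numerically with its leading term; a small sketch, assuming $s = 2$:

```python
# Compare the tail sum_{m > n} binom(m+d-1, d-1) 2^{-sm} from the proof with
# its leading term binom(n+d-1, d-1) 2^{-sn}; the ratio stays bounded in n.
from math import comb

def tail(n, d, s, terms=200):
    return sum(comb(m + d - 1, d - 1) * 2.0 ** (-s * m)
               for m in range(n + 1, n + terms))

s = 2
for d in (2, 4, 6):
    for n in (4, 8, 16):
        lead = comb(n + d - 1, d - 1) * 2.0 ** (-s * n)
        print(f"d = {d}, n = {n}: tail / leading term = {tail(n, d, s) / lead:.3f}")
```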
For the general case, we first choose an interpolatory refinable function $\phi$ that satisfies the Strang-Fix condition of order $s$. Such a function can be constructed by choosing $N$ in (2.3) sufficiently large (see, e.g., [2] and [14]). Let $\Phi$ be the tensor product of $\phi$, i.e.,
$$\Phi(x) = \phi(x_1) \cdots \phi(x_d).$$
Thus, $\Phi$ is a $d$-variable interpolatory refinable function. Then we can define the $d$-dimensional tensor product interpolatory wavelets as (see, e.g., [8])
$$\Psi^{e}(x) = \prod_{i=1}^{d} \psi^{e_i}(x_i), \qquad e \in E,$$
where $\psi^{0} = \phi$, $\psi^{1} = \psi$, and $E$ is the set of all $d$-dimensional vectors with entries either $0$ or $1$. For an arbitrary compactly supported function $f \in B$, expanding in terms of this basis, one obtains that
$$f = \sum_{\mathbf{k}} f(\mathbf{k})\,\Phi(\cdot - \mathbf{k}) + \sum_{j=0}^{\infty} \sum_{e \in E \setminus \{0\}} \sum_{\mathbf{k}} d^{(e)}_{j,\mathbf{k}}\, \Psi^{e}(2^j\,\cdot - \mathbf{k}).$$
Note that this is a different decomposition than what is used elsewhere in this paper because there are no mixed scales. This is done in order to use the bound obtained for the tensor product function (2.10) for each scale in the general proof.
Since $f$ is in $B(\mathbb{R}^d)$, its Besov norm is equivalent to
(2.11)
$$\|f\|_B \sim \sup_{\mathbf{k}} |f(\mathbf{k})| + \sup_{j \ge 0}\, 2^{js} \sup_{e,\mathbf{k}} \big|d^{(e)}_{j,\mathbf{k}}\big|$$
(see [8], Theorem 2.7). Since $\Phi$ and the $\Psi^{e}$ are tensor product functions, we use the bound given in (2.10) for each scale $j$ separately and sum the resulting estimates to obtain
$$\big\|f - I_n^H f\big\|_\infty \le C\, \binom{n+d-1}{d-1}\, 2^{-sn}\, \Big(\sup_{\mathbf{k}}|f(\mathbf{k})| + \sup_{j \ge 0} 2^{js}\sup_{e,\mathbf{k}} \big|d^{(e)}_{j,\mathbf{k}}\big|\Big) \le C\, \|f\|_B\, \binom{n+d-1}{d-1}\, 2^{-sn}.$$
The last inequality follows from (2.11).
REMARK 2.1. The accuracy of the compressed (orthogonal or biorthogonal) wavelet
expansion has been discussed in [7, 10, 15], where it was called the hyperbolic wavelet basis.
However, the proof there, which depends on the vanishing moments of the wavelet functions,
cannot be applied to the interpolatory wavelet expansion used here, since they do not have
the required vanishing moments.
The choice of interpolatory wavelets is motivated by their practical advantages. In particular, the wavelet coefficients of hyperbolic interpolatory wavelets are closely related to the
function values on a sparse grid which can be retrieved with little extra computation. These
function values can then be used to determine many local properties of the functions like
slope, curvature, and interactions.
REMARK 2.2. The Besov spaces provide some information about the size of the wavelet coefficients $d_{\mathbf{j},\mathbf{k}}$. In fact, in the case of $f \in B(\mathbb{R}^d)$, one has
$$\big|d_{\mathbf{j},\mathbf{k}}\big| \le C\, 2^{-s(j_1+\cdots+j_d)}.$$
As the index set at level $\mathbf{j}$ contains $O\big(2^{j_1+\cdots+j_d}\big)$ elements and, for compactly supported wavelets and functions, only $O\big(2^{j_1+\cdots+j_d}\big)$ of the coefficients are non-zero, the average of all coefficients at level $\mathbf{j}$ is of order
$$O\big(2^{-(s+1)(j_1+\cdots+j_d)}\big).$$
3. Implementation and Application.

3.1. The smoothing problem in $V_n^H$. The hyperbolic interpolatory wavelets are now used to solve the following smoothing problem. Given the data set $(x^{(i)}, y^{(i)})$, $i = 1,\ldots,N$, with $x^{(i)} \in \mathbb{R}^d$ and $y^{(i)} \in \mathbb{R}$, we wish to minimise the functional
$$J_\alpha(f) = \sum_{i=1}^{N} \big(f(x^{(i)}) - y^{(i)}\big)^2 + \alpha\, \|Lf\|^2,$$
where $N$ is the number of data points and $f$ is limited to $V_n^H$. This leads to the following matrix problem for the vector $c$ of wavelet coefficients:
$$J_\alpha(c) = \|Bc - y\|^2 + \alpha\, c^T M c,$$
where $B_{i\lambda} = \psi_\lambda(x^{(i)})$ for $i = 1,\ldots,N$, $M_{\lambda\mu} = (L\psi_\lambda, L\psi_\mu)$, and $\lambda, \mu$ range over the indices of the basis functions of $V_n^H$. This problem has the normal equations:
(3.1)
$$\big(B^T B + \alpha M\big)\, c = B^T y.$$
The computations of the matrices use the wavelet basis functions $\psi_{\mathbf{j},\mathbf{k}}$. It turns out that the computations could also be done using products of shifted and dilated refinable functions $\phi$. For this one first observes that
$$V_n^H = \sum_{j_1+\cdots+j_d = n} V_{j_1} \otimes \cdots \otimes V_{j_d},$$
and thus the basis functions of the spaces on the right-hand side are just products of shifts and fixed dilations of the refinable function $\phi$. After computing the matrices corresponding to $B$ and $M$ in this case, the fast wavelet transform is used to determine the components relating to the wavelet spaces. This does not require the determination of the matrices for the “full” space $V_n \otimes \cdots \otimes V_n$ but only for a selection of much smaller spaces. In a sense this approach is akin to the combination technique of sparse grids, which also requires the solution of problems on smaller but regular spaces. In contrast to the combination method, however, the method proposed here yields the exact solution in the wavelet space and not only an approximation. Moreover, the matrices in the wavelet space can be computed very efficiently using the fast wavelet transform.
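The following is a minimal one-dimensional sketch of this smoothing step, assuming the hat-function basis of Example 2.1 on a single level as a stand-in for the hyperbolic wavelet basis, and a simple second-difference penalty as a stand-in for $M$:

```python
# Sketch: penalized least squares min ||B c - y||^2 + alpha c^T M c, solved via
# the normal equations (3.1), (B^T B + alpha M) c = B^T y, in one dimension.
import numpy as np

rng = np.random.default_rng(1)
N, n = 300, 5                                     # data points, level
x = rng.uniform(0.0, 1.0, N)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(N)

def hat(t):
    return np.maximum(0.0, 1.0 - np.abs(t))

ks = np.arange(0, 2**n + 1)
B = hat(2**n * x[:, None] - ks[None, :])          # B[i, k] = phi_k(x^(i))

D = np.diff(np.eye(len(ks)), 2, axis=0)           # discrete second differences
M = D.T @ D                                       # curvature-type penalty matrix

alpha = 1e-3
c = np.linalg.solve(B.T @ B + alpha * M, B.T @ y) # normal equations (3.1)
print("fitted value at x = 0.25:", hat(2**n * 0.25 - ks) @ c)
```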
3.2. Example: Predictive modelling of forest cover type. In this section we demonstrate how the method developed in the previous sections can be used in data mining for multidimensional predictive modelling. We will take a (Bayesian) classification problem as our example. The data description and the problem statement are available at the web page http://kdd.ics.uci.edu/databases/covertype/covertype.html.
The aim is to find a model for the forest cover type as a function of the following
cartographic parameters:
Variable         Description
ELEVATION        Altitude above sea level
ASPECT           Azimuth
SLOPE            Inclination
HORIZ HYDRO      Horizontal distance to water
VERTI HYDRO      Vertical distance to water
HORIZ ROAD       Horizontal distance to roadways
HILL SHADE 9     Hill shade at 9am
HILL SHADE 12    Hill shade at noon
HILL SHADE 15    Hill shade at 3pm
HORIZ FIRE       Horizontal distance to fire points
All values are averaged over 30x30 meter cells and are observed on a regular grid with
30 meter spacing in both directions.
The response variable takes one of seven values (1-7) corresponding to the different possible types of forest cover. They are Spruce fir, Lodgepole pine, Ponderosa pine, Cottonwood/Willow, Aspen, Douglas fir, and Krummholz, respectively. However, for this study we will use the data to train a predictive model predicting only the presence or absence of one type of forest cover at a time. In particular, we will determine predictors for the existence of Ponderosa pine. Predictors for the other types can be obtained in similar ways. The response variable is thus
$$y^{(i)} = \begin{cases} 1, & \text{cover} = \text{Ponderosa pine}, \\ 0, & \text{otherwise}, \end{cases} \qquad i = 1,\ldots,N.$$
The predictor is the vector of cartographic measurements. Each model found using the smoothing methodology of Section 3.1 is a continuous function $f$. The classifier is obtained by thresholding this function: the class is predicted to be Ponderosa pine if $f(x) \ge 0.5$ and not Ponderosa pine otherwise. On any test set the performance of the function $f$ is estimated by the misclassification rate. This is the number of times the classifier predicts Ponderosa pine when in fact it is something else, plus the number of times that the classifier predicts something else but the actual data is indeed Ponderosa pine.
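A minimal sketch of this thresholding and error count (the data here are toy numbers):

```python
# Threshold the smoothed function at 0.5 and count both error types to get
# the misclassification rate described above.
import numpy as np

def misclassification_rate(f_values, labels, threshold=0.5):
    """labels: 1 = Ponderosa pine, 0 = other; f_values: smoother output."""
    predicted = (f_values >= threshold).astype(int)
    false_pos = np.sum((predicted == 1) & (labels == 0))
    false_neg = np.sum((predicted == 0) & (labels == 1))
    return (false_pos + false_neg) / len(labels)

f_values = np.array([0.9, 0.4, 0.7, 0.2, 0.55])
labels   = np.array([1,   0,   0,   0,   1])
print(misclassification_rate(f_values, labels))   # one error in five -> 0.2
```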
The smoothing method requires the choice of two parameters, the smoothing parameter $\alpha$ and the maximal level $n$. If $\alpha$ is chosen too small and $n$ too large, then one would get overfitting and large misclassification rates. In order to determine these parameters the data is randomly separated into three distinct subsets of approximately equal size. The first part, the training set, is used to compute functions for a variety of different parameters $\alpha$ and $n$. The second part, the test set, is used to estimate the misclassification rate for these functions which have been determined from the training set. The parameters are selected to make this error estimate minimal. Finally, the third part of the data is used to estimate the misclassification rate of the model with these "optimal" parameters. For an in-depth discussion of the properties of this method, see [13].
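The selection protocol can be sketched as follows, with a hypothetical `fit_and_error` standing in for the actual training and evaluation of the smoother:

```python
# Sketch of the three-way split: train on the first part, select (alpha, n) on
# the test part, and report the error of the chosen model on the third part.
import numpy as np

rng = np.random.default_rng(2)
idx = rng.permutation(900)                       # indices of all observations
train, test, valid = np.array_split(idx, 3)      # three roughly equal parts

def fit_and_error(fit_idx, eval_idx, alpha, n):
    """Hypothetical stand-in: fit the smoother on fit_idx with parameters
    (alpha, n) and return the misclassification rate on eval_idx."""
    return rng.uniform(0.4, 0.6)                 # placeholder value

grid = [(alpha, n) for alpha in (0.01, 0.1, 1.0, 10.0) for n in (1, 2, 3)]
best = min(grid, key=lambda p: fit_and_error(train, test, *p))
final = fit_and_error(train, valid, *best)
print("selected (alpha, n):", best, " held-out error estimate:", round(final, 3))
```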
The above problem is ten-dimensional at first glance. However, some variables are likely to be more important than others, so we have conducted an initial 1D study predicting $y$ as a function of each of the independent variables separately. The most important variables are then selected for predictions of higher dimension.
Figure 3.1 shows approximating functions $f$ which depend on one variable $x_i$ only. The y-axis corresponds to the function values $f(x_i)$ and the x-axis to the values of $x_i$. The name of the particular variable $x_i$, together with the overall classification rate, is provided as a label for each plot. It can be seen that "elevation" is the most important predictor variable with a classification rate of 0.80, followed by "distance to roads" with a rate of 0.59 and "distance to fire points" with a rate of 0.55. The rates and the best values of $\alpha$ and $n$ are listed in Table 3.1.
The important question at this point is how much the maximal rate of the 1D predictions can be improved by simultaneously taking more variables into account. This depends on the problem at hand and can be verified by progressively increasing the dimensionality, including variables according to their importance as assessed in the 1D study. Table 3.2 shows the classification rates obtained from multivariate models. In this case the level was fixed to $n = 3$ (so that the generic system uses $9^d$ terms, cf. Table 3.2) but $\alpha$ was found using the test set as described above. It is seen that the best classification rate was increased from 0.80 in the 1D case to 0.87 in the six-dimensional case. The third column provides a reference prediction using a generic surface where possible. The generic surface is obtained by using the spaces $V_n \otimes \cdots \otimes V_n$ instead of the hyperbolic interpolatory wavelet spaces. It is seen that the predictive power of the compressed surface is almost as good as that of a full surface. In this particular example, the gain in classification rate from increased dimensionality is modest. In other examples, where a multivariate model has much more predictive power, the advantage of using the compressed system will be greater.
[Figure 3.1 here: ten one-dimensional fits, one panel per predictor, with panel titles ELEVATION: 0.80, ASPECT: 0.53, SLOPE: 0.53, HORIZ_HYDRO: 0.54, VERTI_HYDRO: 0.54, HORIZ_ROAD: 0.59, HILL_SHADE_9: 0.55, HILL_SHADE_12: 0.52, HILL_SHADE_15: 0.53, HORIZ_FIRE: 0.55]
FIG. 3.1. One dimensional predictions of Ponderosa pine. y-axis: predictions $f(x_i)$, x-axis: predictor used, label: name of predictor and classification rate.
Variable         $\alpha$   $n$   Rate
ELEVATION        100        1     0.80
ASPECT           0.1        4     0.53
SLOPE            10         2     0.53
HORIZ HYDRO      0.01       2     0.54
VERTI HYDRO      0.1        2     0.54
HORIZ ROAD       1          5     0.59
HILL SHADE 9     100        2     0.55
HILL SHADE 12    10000      0     0.52
HILL SHADE 15    10         2     0.52
HORIZ FIRE       0.1        1     0.55

TABLE 3.1
Results of the 1D predictions corresponding to Figure 3.1. The chosen values of $\alpha$ and $n$ are given together with the classification rate for each of the ten variables.
              Compressed system        Generic system
Dimensions    Rate      # terms        Rate      # terms
2             0.8216    37             0.8260    81
3             0.8359    123            0.8483    729
4             0.8487    368            0.8631    6561
5             0.8587    1032           N/A       59049
6             0.8697    2768           N/A       531441

TABLE 3.2
The best multivariate predictions obtained from the compressed system and the generic system where possible. It is seen that the compressed system predicts almost as well as the generic system but requires much less storage. The CPU time required to compute the generic system increased rapidly with increased dimensionality, and it was not possible to compute the generic system for dimensions higher than 4 because of excessive storage requirements.
4. Conclusion. We have introduced hyperbolic interpolatory wavelets and demonstrated how they can be used for data mining applications, in particular predictive modelling based on smoothing. Compared to related approaches using (bi-)orthogonal wavelets, the approach suggested here has advantages for data mining: function values on the grid points are readily accessible and can be used to determine properties of the functions like derivatives and, possibly more importantly, have an immediate meaning for the application, which is not the case for the wavelet coefficients of (bi-)orthogonal wavelets. The method performs well for up to around 10 dimensions. We are now generalising this approach to higher dimensions. Further information and related software can be found at our web page http://datamining.anu.edu.au.
REFERENCES

[1] R. A. ADAMS, Sobolev Spaces, Academic Press, New York, 1975.
[2] I. DAUBECHIES, Orthonormal bases of compactly supported wavelets, Comm. Pure Appl. Math., XLI (1988), pp. 909-996.
[3] I. DAUBECHIES, Ten Lectures on Wavelets, SIAM, 1992.
[4] C. DE BOOR, K. HÖLLIG, AND S. RIEMENSCHNEIDER, Box Splines, Springer Verlag, New York, 1993.
[5] R. DEVORE, Nonlinear approximation, Acta Numerica, 7 (1998), pp. 51-150.
[6] R. DEVORE, B. JAWERTH, AND B. LUCIER, Surface compression, Comput. Aided Geom. Design, 9 (1992), pp. 219-239.
[7] R. A. DEVORE, S. V. KONYAGIN, AND V. N. TEMLYAKOV, Hyperbolic wavelet approximation, Constr. Approx., 14 (1998), pp. 1-26.
[8] D. DONOHO, Interpolating wavelet transforms. Available at the web page (1992): http://www-stat.stanford.edu/~donoho/Reports.
[9] J. GARCKE, M. GRIEBEL, AND M. THESS, Data mining with sparse grids, Computing, 67 (2001), pp. 225-253.
[10] M. GRIEBEL AND S. KNAPEK, Optimized tensor-product approximation spaces, Constr. Approx., 16 (2000), pp. 525-540.
[11] M. GRIEBEL AND P. OSWALD, Tensor product type subspace splittings and multilevel iterative methods for anisotropic problems, Adv. Comput. Math., 4 (1995), pp. 171-206.
[12] M. GRIEBEL AND G. ZUMBUSCH, Adaptive sparse grids for hyperbolic conservation laws, in Proceedings of the 7th International Conference on Hyperbolic Problems, Theory, Numerics, Applications, Birkhäuser, 1998.
[13] T. HASTIE, R. TIBSHIRANI, AND J. FRIEDMAN, The Elements of Statistical Learning, Springer Series in Statistics, Springer Verlag, 2001.
[14] J. HUI, S. D. RIEMENSCHNEIDER, AND Z. SHEN, Multivariate compactly supported fundamental refinable functions, duals and biorthogonal wavelets, Stud. Appl. Math., 102 (1999), pp. 173-204.
[15] S. KNAPEK AND F. KOSTER, Integral operators on sparse grids. Preprint.
[16] S. D. RIEMENSCHNEIDER AND Z. W. SHEN, Interpolatory wavelet packets, Appl. Comput. Harmon. Anal., 8 (2000), pp. 320-324.
[17] F. SPRENGEL, Interpolation and wavelets on sparse Gauss-Chebyshev grids, in Multivariate Approximation, Recent Trends and Results, W. Haussmann, K. Jetter, and M. Reimer, eds., vol. 101 of Mathematical Research, Akademie-Verlag, Berlin, 1997, pp. 269-286.
[18] G. WAHBA, Spline Models for Observational Data, CBMS-NSF Regional Conf. Ser. in Appl. Math., vol. 59, SIAM, 1990.
[19] C. ZENGER, Sparse grids, Technical Report, Technische Universität München, 1990.