Gaussian Process Modeling of Large Scale Terrain
Shrihari Vasudevan, Fabio Ramos, Eric Nettleton and Hugh Durrant-Whyte
Australian Centre for Field Robotics
The University of Sydney, NSW 2006, Australia
{s.vasudevan,f.ramos,e.nettleton,hugh}@acfr.usyd.edu.au
Abstract
Building a model of large scale terrain that can adequately handle uncertainty and incompleteness
in a statistically sound way is a challenging problem. This work proposes the use of Gaussian processes
as models of large scale terrain. The proposed model naturally provides a multi-resolution representation of space, incorporates and handles uncertainties aptly and copes with incompleteness of sensory
information. Gaussian process regression techniques are applied to estimate and interpolate (to fill gaps
in occluded areas) elevation information across the field. The estimates obtained are the best linear
unbiased estimates for the data under consideration. A single non-stationary (neural network) Gaussian
process is shown to be powerful enough to model large and complex terrain, effectively handling issues
relating to discontinuous data. A local approximation method based on a “moving window” methodology and implemented using KD-Trees is also proposed. This enables the approach to handle extremely
large datasets, thereby completely addressing its scalability issues. Experiments are performed on large
scale datasets taken from real mining applications. These datasets include sparse mine planning data,
which is representative of a GPS based survey, as well as dense laser scanner data taken at different
mine-sites. Further, extensive statistical performance evaluation and benchmarking of the technique have
been performed through cross validation experiments. These conclude that for dense and/or flat data,
the proposed approach performs very competitively with grid based approaches using standard interpolation techniques and triangulated irregular networks using triangle based interpolation techniques;
for sparse and/or complex data, however, it significantly outperforms them.
1 Introduction
Large scale terrain mapping is a difficult problem with wide-ranging applications in robotics, from space
exploration to mining and more. For autonomous robots to function in such high-value applications, an
efficient, flexible and high-fidelity representation of space is critical. The key challenges posed by this
problem are dealing with uncertainty and incompleteness, and handling unstructured terrain. Uncertainty
and incompleteness are virtually ubiquitous in robotics, as sensor capabilities are limited.
The problem is magnified in a field robotics scenario due to the significant scale of many domains.
State-of-the-art surface mapping methods employ representations based on triangular tessellations. These
representations, however, do not have a statistically sound way of incorporating and managing uncertainty.
The assumption of spatially independent data is a further limitation of approaches based on these representations. Although several interpolation techniques are known, the independence assumption can lead
to simplistic techniques (akin to simple averaging) that model the terrain inaccurately. Tessellation
methods also do not handle incomplete data where areas of the terrain may not be visible from the sensor
perspective. This paper proposes the use of Gaussian process models to represent large scale terrain. Gaussian process models provide a direct method of incorporating sensor uncertainty and using this in learning of
the terrain model. Estimation of data at unknown locations is treated as a regression problem in 3D space
and takes into account the correlated nature of the spatial data. This technique is also used to overcome
sensor limitations and occlusions by filling the gaps in sensor data with a best linear unbiased estimate. The
Gaussian process representation is a continuous domain, compact and non-parametric model of the terrain
and thus can readily be used to create terrain maps at any required resolution.
The main contribution of this paper is a novel approach for representing large scale terrain using Gaussian
processes. Specifically, this paper shows that a single non-stationary kernel (neural-network) Gaussian
process is successfully able to model large scale terrain data taking into account the local smoothness while
preserving spatial features in the terrain. A further contribution of this paper is a local approximation method
based on a “moving window” implemented using KD-Trees that incorporates the benefits of both stationary
and non-stationary kernels in the terrain estimation process. The use of the local approximation technique
addresses the issue of scalability of the Gaussian process model to large and complex terrain data sets.
The end-result is a multi-resolution representation of space that incorporates and manages uncertainty in a
statistically sound way and handles spatial correlation between data in an appropriate manner. Experiments
conducted on large scale datasets obtained from real mining applications, including a sparse mine planning
dataset that is representative of a GPS based survey as well as dense laser scanner based surveys, clearly
demonstrate the viability and effectiveness of the Gaussian process representation of large-scale terrain.
The paper also compares the performance of stationary and non-stationary kernels and details various finer
points of the Gaussian process approach to terrain modeling. Extensive cross-validation experiments have
been performed to statistically evaluate the method; they also compare the approach with several well known
representation and interpolation methods, serving as benchmarks. These experiments further support the
claims made and enable a greater understanding of how different techniques compare against each other, the
applicability of the different techniques and finally the Gaussian process modeling paradigm itself.
Section 2 broadly reviews recent work in the areas of terrain representations, interpolation and finally
Gaussian processes applied to the terrain modeling problem. The following section (3) presents the approach
in detail, from problem formulation to modeling and learning and finally application. The next two sections
(4 and 5) will discuss the kernels, the learning method used and the approximation methodology in more
detail; perspectives and comparisons of the proposed approach in relation to specific works in the state-of-the-art will also be drawn. These theoretical sections will be followed by a three part experimental section
(6) that will provide an extensive evaluation of the proposed approach. The paper will conclude with a
reiteration of the findings and contributions of this work.
2 Related Work
State-of-the-art terrain representations used in applications such as mining, space exploration and other field
robotics scenarios as well as in geospatial engineering are typically limited to elevation maps, triangulated
irregular networks (TIN’s), contour models and their variants or combinations ((Durrant-Whyte, 2001) and
(Moore et al., 1991)). Each of these has its own strengths and preferred application domains. The former
two are more popular in robotics. The latter represents the terrain as a succession of “isolines” of specific
elevation (from minimum to maximum); contour models are particularly suited to modeling hydrological phenomena
but otherwise offer no particular computational advantages in the context of this paper. None of these representations, in
its native form, handles spatially correlated data effectively.
Grid based methods represent space in terms of elevation data corresponding to each cell of a regularly
spaced grid structure. The outcome is a 2.5D representation of space, also known as an elevation map.
The main advantage of this representation is simplicity. Limitations include the inability to handle abrupt
changes, the dependence on grid size and the issue of scalability in large environments. In robotics, grid based
methods have been exemplified by numerous works such as (Krotkov and Hoffman, 1994), (Ye and Borenstein,
2003), (Lacroix et al., 2002) and more recently (Triebel et al., 2006). Both (Krotkov and Hoffman, 1994)
and (Ye and Borenstein, 2003) use laser range finders to construct elevation maps. The latter also proposes
a “certainty assisted spatial filter” that uses “certainty” information as well as spatial information (elevation
map) to filter out erroneous terrain pixels. The authors of (Lacroix et al., 2002) developed stereo-vision
based elevation maps and recognize that the main problem is uncertainty management. Addressing this
issue, they proposed a heuristic data fusion algorithm based on the Dempster Shafer theory. The authors of
(Triebel et al., 2006) propose an extension to standard elevation maps in order to handle multiple surfaces
and overhanging objects. The main weaknesses observed in grid based representations are the lack of a
statistically direct way of incorporating and managing uncertainty and the inability to appropriately handle
spatial correlation.
Triangulated Irregular Networks (TIN) usually sample a set of surface specific points that capture important aspects of the terrain surface to be modeled - bumps/peaks, troughs, breaks for example. The
representation typically takes the form of an irregular network of elevation points with each point linked to
its immediate neighbors. This set of points is represented as a triangulated surface. TIN’s are able to more
easily capture sudden elevation changes and are also more flexible and efficient than grid maps in relatively
flat areas. In robotics, TIN’s have been used extensively, see (Leal et al., 2001) and (Rekleitis et al., 2008).
The work (Leal et al., 2001), presents an approach to performing online 3D multi-resolution reconstruction
of unknown and unstructured environments to yield a stochastic representation of space. Recent work in
the space exploration domain presents a LIDAR based approach to terrain modeling using TIN’s (Rekleitis
et al., 2008). This work decimates coplanar triangles into a single triangle to provide a more compact representation. TIN’s may be efficient from a survey perspective as few points are hand-picked. However, (Triebel
et al., 2006) points out that for dense sensor data, while they are accurate and can easily be textured, they
have a huge memory requirement which grows linearly with the number of scans. The assumption of spatial
statistical independence may render the method ineffective at handling incomplete data, as the elemental
facet of the TIN (a planar triangle) may only coarsely approximate a complicated surface. This, however, depends on the
choice of the sensor and the data density obtained.
There are several different kinds of interpolation strategies for grid data structures. The choice of interpolation method can severely affect the accuracy of the model obtained. Kidner (Kidner
et al., 1999) reviewed and compared numerous interpolation methods for grid based representations and recommended the use of higher order (at least bi-cubic) polynomial interpolation for
grid data. Ye and Borenstein, in (Ye and Borenstein, 2003), used median filtering to fill in the
missing data in an elevation map. The Locus algorithm (Kweon and Kanade, 1992) was used in (Krotkov
and Hoffman, 1994). This interpolation method attempted to find an elevation estimate by computing the
intersection of the terrain with the vertical line at the point in the grid - this was done in image space rather
than Cartesian space.
Gaussian processes (Rasmussen and Williams, 2006) (GP’s) are powerful non-parametric learning techniques that can handle many of the problems described above. They can provide a scalable multi-resolution
model of large scale terrain. GP’s provide a continuous domain representation of the terrain data and hence
can be easily sampled at any desired resolution. They incorporate and handle uncertainty in a statistically
sound way and represent spatially correlated data in an appropriate manner. They model and use the
spatial correlation of the given data points to estimate the elevation values for other unknown points of
interest. In an estimation sense, GP’s provide a best linear unbiased estimate (Kitanidis, 1997) based on
the underlying stochastic model of spatial correlation between the data points. They basically implement
an interpolation methodology called Kriging (Matheron, 1963) which is a standard interpolation technique
used in geostatistics. GP’s thus handle both uncertainty and incompleteness effectively.
In the recent past, Gaussian processes have been applied in the context of terrain modeling, see (Lang
et al., 2007), (Plagemann et al., 2008a) and (Plagemann et al., 2008b). In all three cases, the GP’s are
implemented using a non-stationary equivalent of a stationary squared exponential covariance function (Paciorek and Schervish, 2004) and incorporate kernel adaptation techniques in order to adequately handle both
smooth surfaces as well as inherent (and characteristic) surface discontinuities. Whereas (Lang et al., 2007)
initializes the kernel matrices evaluated at each point with parameters learnt for the corresponding stationary
kernel and then iteratively adapts them to account for local structure and smoothness, (Plagemann et al.,
2008a) and (Plagemann et al., 2008b) introduce the idea of a “hyper-GP” (using an independent stationary
kernel GP) to predict the most probable length scale parameters to suit the local structure. These are used
to make predictions on a non-stationary GP model of the terrain. (Plagemann et al., 2008b) proposes to
model space as an ensemble of GP’s in order to reduce computational complexity.
While the context of this work is similar, the approach differs in proposing the use of non-stationary,
neural network kernels to model large scale discontinuous spatial data. The performance of GP’s using stationary (squared exponential) and non-stationary (neural network) kernels for modeling large scale terrain
data are compared. Experiments to better understand various aspects of the modeling process using GP’s are
also described. These demonstrate that using a suitable non-stationary kernel can directly result in modeling
both local structure and smoothness. A local approximation methodology that addresses scalability issues is
proposed which, when used with the proposed kernel, emulates the locally adaptive effect of the techniques
proposed in (Lang et al., 2007), (Plagemann et al., 2008a) and (Plagemann et al., 2008b). This approximation technique is based on a “moving window” methodology and implemented using an efficient hierarchical
representation (KD-Tree) of the data. Thus, the scalability issues relating to the application of GP’s to
large scale data sets are addressed. Further, extensive statistical evaluation and benchmarking experiments
are performed; they compare GP’s using stationary and non-stationary kernels, grid representations using
several well known interpolation methods and finally, TIN representations using triangle based interpolation
techniques. These experiments further strengthen the claims on the effectiveness of the approach. The contribution of this work is thus the proposition of a novel way of representing large scale terrain using Gaussian
processes. The representation naturally provides a multi-resolution model of the terrain, incorporates and
handles uncertainty effectively and appropriately handles spatially correlated data.
3 Approach
3.1 Problem Definition
The terrain modeling problem can be understood as follows - given a sensor that provides terrain data as a
set of points (x, y, z) in 3D space, the objective is to:
1. Develop a multi-resolution representation that incorporates sensor uncertainty in an appropriate manner.
2. Handle in a statistically consistent manner, sensing limitations including incomplete sensor information
and terrain occlusions.
Terrain data can be obtained using numerous sensors including 3D laser scanners, radar or GPS. 3D laser
scanners provide dense and accurate data whereas a GPS based survey typically comprises a relatively
sparse set of well chosen points of interest. The experiments reported in this work use data sets that
represent both of these kinds of information. The overall process of using Gaussian processes to model large
scale terrain is shown in Figure 1. The steps involved in learning the representation sought are shown in
Figure 2. The process of applying the GP representation to create elevation/surface maps at any arbitrary
resolution is described in Figure 3.
Typically, for dense and large data sets, not all data is required for learning the GP model. Such an
approach would not scale for reasons of computational complexity. Thus, a sampling step may need to
be included. This sampling may be uniform or random and could also use heuristic approaches such as
preferential sampling from areas of high gradient. The dataset is sampled into three subsets - training,
evaluation and testing. The training data is used to learn the GP model corresponding to the data. The
training and evaluation data together are stored in the KD-Tree (Preparata and Shamos, 1993) for later
use in the application or inference process. The KD-Tree provides for rapid data access in the inference
process. It is used in the local approximation method proposed in this work to address scalability issues
related to applying the approach to large scale data sets. The testing data is the subset of data over which
the GP model is evaluated. In this work, the mean squared error between the predicted and actual elevation,
computed over the testing subset, is the performance metric used.
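For concreteness, the split and the error metric just described can be sketched as follows (an illustrative Python fragment; the function names, the purely random split and the default subset sizes are our own choices and do not reproduce the sampling scheme used in the experiments):

```python
import numpy as np

def split_terrain_data(points, n_train=3000, n_test=10000, rng=None):
    """Randomly split an (N, 3) array of (x, y, z) points into training,
    evaluation and testing subsets. Purely illustrative: the paper also
    mentions uniform and gradient-weighted sampling schemes."""
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(points))
    train = points[idx[:n_train]]
    test = points[idx[n_train:n_train + n_test]]
    evaluation = points[idx[n_train + n_test:]]      # remainder
    return train, evaluation, test

def mean_squared_error(z_pred, z_true):
    """Performance metric used throughout this work."""
    return np.mean((np.asarray(z_pred) - np.asarray(z_true)) ** 2)
```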
3.2 Model Description
The use of Gaussian processes for modeling and representing terrain data is proposed. The steps below
are directly based on (Rasmussen and Williams, 2006). Gaussian processes provide a powerful framework
Figure 1: The overall process of learning and using Gaussian processes to model terrain: Data from any
sensor along with noise estimates are provided as the input to the process. The modeling step then learns
the appropriate Gaussian process model (hyper-parameters of the kernel) and the result of the modeling step
are the GP hyperparameters together with the training/evaluation data. These are then used to perform
GP regression to estimate the elevation data across any grid of points of the user's interest. The application
step produces a 2.5D elevation map of the terrain as well as an associated uncertainty for each point.
Figure 2: The training or modeling process: A sensor such as a laser scanner is used to provide 3D terrain
data. A sample of the given terrain data is used to learn a GP representation of the terrain. Different
sampling methods can be used. Training and evaluation data are stored in a KD-Tree for later use in
the application stage. Learning the GP model of the terrain amounts to performing a maximum marginal
likelihood estimation of the hyperparameters of the model. The outcomes of this process are the KD-Tree
representation of the training/evaluation data and the GP model of the terrain.
Figure 3: The testing or application process: Given a desired region of interest where elevation needs to
be estimated, applying the GP model learnt in the modeling stage amounts to sampling the continuous
representation at the desired resolution, in the area of interest. A local approximation technique based on
a “moving window” methodology and implemented using the KD-Tree obtained from the modeling stage
addresses scalability issues related to the method being applied for large scale data. It also enables a tradeoff
between smoothness and feature preservation in the terrain. The outcome of this process is an elevation map
with the corresponding uncertainty for each point in it.
for learning models of spatially correlated and uncertain data. Gaussian process regression provides a
robust means of estimation and interpolation of elevation information and can handle incomplete sensor
data effectively. GP’s are non-parametric approaches in that they do not specify an explicit functional
model between the input and output. They may be thought of as a Gaussian probability distribution in
function space and are characterized by a mean function m(x) and the covariance function k(x, x'), where
\[
m(x) = E[f(x)] \tag{1}
\]
\[
k(x, x') = E[(f(x) - m(x))(f(x') - m(x'))] \tag{2}
\]
such that the GP is written as
\[
f(x) \sim \mathcal{GP}(m(x), k(x, x')) \tag{3}
\]
The mean and covariance functions together specify a distribution over functions. In the context of the
problem at hand, each x ≡ (x, y) and f (x) ≡ z of the given data.
The covariance function models the relationship between the random variables corresponding to the
given data. Although not necessary, the mean function m(x) may be assumed to be zero by scaling the data
appropriately such that it has an empirical mean of zero. There are numerous covariance functions (kernels)
that can be used to model the spatial variation between the data points. The most popular kernel is the
squared-exponential kernel given as
\[
k(x, x') = \sigma_f^2 \exp\left(-\frac{1}{2}(x - x')^T \Sigma (x - x')\right) \tag{4}
\]
where k is the covariance function or kernel; Σ = diag(l_x, l_y)^{-2} is the length-scale matrix, a measure of
how quickly the modeled function changes in the directions x and y; σ_f^2 is the signal variance. The set of
parameters l_x, l_y, σ_f are referred to as the kernel hyperparameters.
The neural network kernel ((Neal, 1996), (Williams, 1998a) and (Williams, 1998b)) is the focus of this
work and is specified as
\[
k(x, x') = \sigma_f^2 \arcsin\!\left(\frac{\beta + 2 x^T \Sigma x'}{\sqrt{(1 + \beta + 2 x^T \Sigma x)(1 + \beta + 2 x'^T \Sigma x')}}\right) \tag{5}
\]
where β is a bias factor, Σ is the length-scale matrix as described before, and l_x, l_y, σ_f, β constitute the
kernel hyperparameters.
The NN kernel represents the covariance function of a neural network with a single hidden layer between
the input and output, infinitely many hidden nodes and using a Sigmoid as the transfer function (Williams,
1998a) for the hidden nodes. Hornik in (Hornik, 1993) showed that such neural networks are universal
approximators of data and Neal (Neal, 1996) observed that the functions produced by such a network would
tend to a Gaussian Process.
The main difference between the two kernels is that the squared-exponential kernel, being a function
of |x − x'|, is stationary (invariant to translation) whereas the neural network kernel is not. In practice,
the squared exponential function has a smoothing or averaging effect on the data. The neural network
covariance function proves to be much more effective than the squared exponential covariance function in
handling discontinuous (rapidly changing) data, as the experiments in this paper will show. This is the main
reason why it proves to be effective in modeling complex terrain data. A more detailed analysis of these
kernels and their properties will be given in Section 4.
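For readers who prefer code to equations, the two covariance functions of Equations 4 and 5 can be sketched as follows (a minimal Python/NumPy sketch; the function names, the vectorised form and the hyperparameter packing are our assumptions, not the original implementation):

```python
import numpy as np

def sqexp_kernel(X1, X2, lx, ly, sigma_f):
    """Squared exponential covariance (Equation 4) between two sets of 2D points."""
    S = np.diag([lx**-2.0, ly**-2.0])          # length-scale matrix Sigma
    d = X1[:, None, :] - X2[None, :, :]        # pairwise differences, shape (n1, n2, 2)
    quad = np.einsum('ijk,kl,ijl->ij', d, S, d)
    return sigma_f**2 * np.exp(-0.5 * quad)

def nn_kernel(X1, X2, lx, ly, sigma_f, beta):
    """Neural network covariance (Equation 5) between two sets of 2D points."""
    S = np.diag([lx**-2.0, ly**-2.0])
    num = beta + 2.0 * X1 @ S @ X2.T                          # beta + 2 x^T Sigma x'
    d1 = 1.0 + beta + 2.0 * np.sum((X1 @ S) * X1, axis=1)     # 1 + beta + 2 x^T Sigma x
    d2 = 1.0 + beta + 2.0 * np.sum((X2 @ S) * X2, axis=1)
    return sigma_f**2 * np.arcsin(num / np.sqrt(np.outer(d1, d2)))
```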
3.3 Learning the hyperparameters
Training the GP for a given data set is equivalent to choosing a kernel function and optimizing the hyperparameters for the chosen kernel. For the squared-exponential kernel, this amounts to determining the values
θ = {l_x, l_y, σ_f, σ_n} and, for the neural network kernel, the values θ = {l_x, l_y, σ_f, β, σ_n}, where σ_n^2 is
the noise variance of the data being modeled. This is performed by formulating the problem in a maximum
marginal likelihood estimation framework and subsequently solving a non-convex optimization problem.
Defining X = {x_i}_{i=1}^{n} = {(x_i, y_i)}_{i=1}^{n} and z = {f(x_i)}_{i=1}^{n} = {z_i}_{i=1}^{n} as the sets of training inputs and
outputs respectively of n instances, the log marginal likelihood of the training outputs (z) given the set of
locations (X) and the set of hyperparameters θ is given by
\[
\log p(z|X, \theta) = -\frac{1}{2} z^T K_z^{-1} z - \frac{1}{2} \log |K_z| - \frac{n}{2} \log(2\pi) \tag{6}
\]
where K_z = K_f + σ_n^2 I is the covariance matrix for all noisy targets z and K_f is the covariance matrix for
the noise-free targets, obtained by applying either Equation 4 or 5 to the training data. The log marginal
likelihood has three terms - the first describes the data fit, the second term penalizes model complexity and
the last term is simply a normalization coefficient. Thus, training the model will involve searching for the
set of hyperparameters that enables the best data fit while avoiding overly complex models. Occam’s razor
principle (Mackay, 2002) is thus in-built in the optimization and prevention of over-fitting is guaranteed by the
very formulation of the learning mechanism. Using this maximum marginal likelihood formulation, training
the GP model on a given set of data amounts to finding the optimal set of hyperparameters that maximize
the log marginal likelihood. This can be performed using standard off-the-shelf optimization approaches.
In this work, a combination of stochastic search (simulated annealing) and gradient based search (Quasi-Newton optimization with BFGS Hessian update (Nocedal and Wright, 2006)) was found to produce the best
results. Using a gradient based optimization approach has the advantage that convergence is achieved
much faster.
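A minimal sketch of this training objective is given below (Python/NumPy, assuming one of the kernel functions sketched in Section 3.2; the hyperparameter packing and the use of scipy's generic optimizer are our assumptions and merely stand in for the simulated annealing plus Quasi-Newton combination described above):

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(theta, X, z, kernel):
    """Negative of Equation 6, computed via a Cholesky factorisation.
    theta packs the kernel hyperparameters followed by the noise standard
    deviation sigma_n (an assumed parameterisation, not the paper's)."""
    *kernel_params, sigma_n = theta
    n = len(z)
    K_z = kernel(X, X, *kernel_params) + sigma_n**2 * np.eye(n)
    L = np.linalg.cholesky(K_z)                          # K_z = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, z))  # K_z^{-1} z
    return (0.5 * z @ alpha
            + np.sum(np.log(np.diag(L)))                 # = 0.5 * log|K_z|
            + 0.5 * n * np.log(2.0 * np.pi))

# Hypothetical usage (gradient-based refinement of an initial guess; the paper
# precedes this with a stochastic simulated-annealing search):
# theta0 = np.array([10.0, 10.0, 1.0, 0.1])              # lx, ly, sigma_f, sigma_n
# res = minimize(neg_log_marginal_likelihood, theta0,
#                args=(X_train, z_train, sqexp_kernel), method='BFGS')
```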
3.4 Applying the GP model
Applying the GP model amounts to using the learned GP model to estimate the elevation information across
a region of interest, characterized by a grid of points at a desired resolution. The 2.5D elevation map can
then be used directly or as a surface map for various applications. This is achieved by performing Gaussian
process regression at a set of query points, given the training/evaluation data sets and the GP kernel with
the learnt hyperparameters.
The observed sensor noise is assumed to be zero-mean, white and Gaussian, with unknown variance σ_n^2 (it is
learnt from the data). Thus, the covariance on the observations is given as
\[
\mathrm{cov}(z_p, z_q) = k(x_p, x_q) + \sigma_n^2 \delta_{pq} \tag{7}
\]
where δ_{pq} is the Kronecker delta: δ_{pq} = 1 iff p = q and 0 otherwise.
The joint distribution of any finite number of random variables of a GP is Gaussian. Thus, the joint
distribution of the training outputs z and test outputs f_* given this prior can be specified by
\[
\begin{bmatrix} z \\ f_* \end{bmatrix} \sim \mathcal{N}\!\left( 0,\ \begin{bmatrix} K(X, X) + \sigma_n^2 I & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix} \right) \tag{8}
\]
For n training points and n∗ test points, K(X, X∗ ) denotes the n × n∗ matrix of covariances evaluated at
all pairs of training and test points. The terms K(X, X), K(X∗ , X∗ ) and K(X∗ , X) can be defined likewise.
The function values (f_*) corresponding to the test locations (X_*), given the training inputs X, the training
outputs z and the covariance function (kernel), are given by
\[
\bar{f}_* = K(X_*, X)[K(X, X) + \sigma_n^2 I]^{-1} z. \tag{9}
\]
The function covariance is given by
\[
\mathrm{cov}(f_*) = K(X_*, X_*) - K(X_*, X)[K(X, X) + \sigma_n^2 I]^{-1} K(X, X_*). \tag{10}
\]
Denoting K(X, X) by K, K(X, X_*) by K_* and, for a single test point x_*, using k(x_*) = k_* to represent
the vector of covariances between x_* and the n training points, Equations 9 and 10 can be rewritten as
\[
\bar{f}_* = k_*^T (K + \sigma_n^2 I)^{-1} z, \tag{11}
\]
and variance
\[
V[f_*] = k(x_*, x_*) - k_*^T (K + \sigma_n^2 I)^{-1} k_*. \tag{12}
\]
Equations 11 and 12 provide the basis for the elevation estimation process. The GP estimates obtained
are a best linear unbiased estimate for the respective query points. Uncertainty is handled by incorporating
the sensor noise model in the training data. The representation produced is multi-resolution in that a
terrain model can be generated at any desired resolution using the GP regression equations presented above.
Thus, the GP terrain modeling approach is a probabilistic, multi-resolution method that handles spatially
correlated information.
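The application step can be sketched directly from Equations 11 and 12 (an illustrative Python/NumPy fragment using a Cholesky factorisation; the function name and the grid in the commented usage are assumptions):

```python
import numpy as np

def gp_predict(X_train, z_train, X_query, kernel, kernel_params, sigma_n):
    """GP regression (Equations 11 and 12): predicted elevation and variance
    at each query location, via a Cholesky factorisation of K + sigma_n^2 I."""
    Kn = kernel(X_train, X_train, *kernel_params) + sigma_n**2 * np.eye(len(z_train))
    K_star = kernel(X_train, X_query, *kernel_params)        # n x n* cross-covariances
    L = np.linalg.cholesky(Kn)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, z_train))
    mean = K_star.T @ alpha                                   # Equation 11
    v = np.linalg.solve(L, K_star)
    # Prior variances k(x*, x*); computing the full matrix only to take its
    # diagonal is wasteful for large grids, but keeps the sketch short.
    prior_var = np.diag(kernel(X_query, X_query, *kernel_params))
    var = prior_var - np.sum(v**2, axis=0)                    # Equation 12
    return mean, var

# Hypothetical usage: sample the continuous model on a regular 1 m grid over
# an assumed 100 m x 100 m region of interest.
# xs, ys = np.meshgrid(np.arange(0.0, 100.0), np.arange(0.0, 100.0))
# X_query = np.column_stack([xs.ravel(), ys.ravel()])
# elevation, variance = gp_predict(X_train, z_train, X_query, nn_kernel,
#                                  (lx, ly, sigma_f, beta), sigma_n)
```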
3.5 Fast local approximation using KD-Trees
During the inference process, the KD-Tree that initially stores the training and evaluation data is queried to
provide a predefined number of data points spatially closest to the point for which the elevation must be estimated.
The GP regression process then uses only these training exemplars to estimate the elevation at the point of
interest. This is described in the zoomed in view of Figure 3. Note that the GP model itself is learnt from the
set of all training data but is applied locally using the KD-Tree approximation. This approximation method
is akin to a “moving window” methodology and has multiple benefits that are vital to the scalability of the
GP method in a large scale application. The use of only the spatially closest neighbors for GP regression
keeps the computational complexity low and bounded. As described earlier, the squared exponential kernel
has a smoothing effect on the data whereas the neural-network kernel is much more effective in modeling
discontinuous terrain data. The number of nearest neighbor exemplars used controls the
tradeoff between smoothness and feature preservation. Finally, the proposed local approximation method
bounds (by selecting a fixed number of data points) the subset of data over which the neural network kernel
is applied and thus effectively uses it. A more detailed explanation of these aspects is provided in Section 5.
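A sketch of this moving-window inference is given below (illustrative Python using scipy's cKDTree and the gp_predict sketch from Section 3.4; the window size m = 100 follows the experiments, everything else is an assumption):

```python
import numpy as np
from scipy.spatial import cKDTree

def local_gp_predict(X_stored, z_stored, X_query, kernel, kernel_params,
                     sigma_n, m=100):
    """'Moving window' approximation: each query point is regressed only on
    its m spatially nearest stored (training + evaluation) points."""
    tree = cKDTree(X_stored)                   # built once over the (x, y) locations
    means = np.empty(len(X_query))
    variances = np.empty(len(X_query))
    for i, xq in enumerate(X_query):
        _, idx = tree.query(xq, k=m)           # indices of the m nearest neighbours
        mu, var = gp_predict(X_stored[idx], z_stored[idx], xq[None, :],
                             kernel, kernel_params, sigma_n)
        means[i], variances[i] = mu[0], var[0]
    return means, variances
```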
4 On the Choice of Kernel
(a) Squared Exponential (SQEXP) kernel behavior
(b) Neural Network (NN) kernel behavior
Figure 4: 4(a) depicts the SQEXP kernel (X = -3.0 : 0.1 : 3.0). Correlation between points is inversely
proportional to the Euclidean distance between them, hence, points further away from the point under
consideration are less correlated to it. 4(b) depicts the NN kernel for data (X = -3.0 : 0.1 : 3.0). The NN
kernel is a function of the dot product between the points until it saturates. Spatial correlation between two
points is dependent on the distance of the respective points from the coordinate origin and not their relative
distance as in SQEXP. Thus, the center/query point does not bear any special significance as in SQEXP.
The data will be such that the positive and negative correlation values together with the respective data will
produce the correct estimate.
4.1 Squared Exponential Kernel
The squared exponential (SQEXP) kernel (Equation 4) is a function of |x − x'| and is thus stationary and
isotropic meaning that it is invariant to all rigid motions. The SQEXP function is also infinitely differentiable
and thus tends to be smooth. The kernel is also called the Gaussian kernel. It functions similarly to a
Gaussian filter being applied to an image. The point at which the elevation has to be estimated is treated
as the central point/pixel. Training data at this point together with those at points around it determine
the elevation.
As a consequence of being a function of the Euclidean distance between the points and the fact that the
correlation is inversely proportional to the distance between them, the SQEXP kernel is prone to producing
smoothed (possibly overly so) models. Points that are far away from the central pixel contribute less to its
final estimate whereas points near the central pixel have maximum leverage. The correlation measures
of far off points, no matter how significant they may be, are diminished. This is in accordance with the
“bell-shaped” curve of the kernel. Figure 4(a) depicts the SQEXP kernel of a selection of data with itself.
4.2 Neural Network Kernel
The NN kernel treats data as vectors and is a function of the dot product of the data points. Hence, the NN
kernel is dependent on the coordinate origin and not the central point in a window of points, as explained
about SQEXP above. Spatial correlation is a function of the distance of the points from the data origin
until reaching a saturation point. This kernel behavior is exhibited in Figure 4(b). In the saturation region,
the correlation values of points are similar. It is this kernel behavior that enables the resulting GP model to adapt
to rapid or large variations in data. This work uses a local approximation technique (Sections 3.5 and 5)
to address scalability issues relating to large scale applications. The local approximation method also serves
to bound the region over which the kernel is applied (hence, enabling effective usage of the kernel) as even
two distant points lying in the saturation region of the kernel will be correlated. The effect of using the
NN kernel alongside the local approximation leads to a form of locally non-stationary GP regression for
large scale terrain modeling. The fact that correlation increases (until saturation) with distance in the NN
kernel, although apparently counterintuitive, may be thought of as a “noise-filtering” mechanism, wherein
a greater emphasis is placed on the neighborhood data of a point under consideration. Such a behavior is
not seen in the SQEXP kernel. In this work, it has been repeatedly observed from experiments in Section 6
that the NN kernel has been able to adapt to fast changing / large discontinuities in data. The NN kernel
has been described in detail in (Neal, 1996), (Williams, 1998a), (Williams, 1998b) and (Rasmussen and
Williams, 2006). The form used here is exactly equivalent to, and implementationally more convenient than,
the classical form of the kernel.
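The contrast described above can be reproduced numerically with the kernel sketches from Section 3.2 (an illustrative fragment; the chosen points and hyperparameter values are arbitrary and not taken from the paper):

```python
import numpy as np

# Correlation of a fixed point with points at increasing separation, using the
# kernel sketches from Section 3.2. Hyperparameter values are arbitrary.
x0 = np.array([[1.0, 1.0]])
seps = np.array([0.0, 1.0, 2.0, 4.0, 8.0])
others = x0 + np.column_stack([seps, np.zeros_like(seps)])

k_sq = sqexp_kernel(x0, others, lx=2.0, ly=2.0, sigma_f=1.0)[0]
k_nn = nn_kernel(x0, others, lx=2.0, ly=2.0, sigma_f=1.0, beta=1.0)[0]
print("separation:", seps)
print("SQEXP:", np.round(k_sq, 3))   # decays towards zero with separation
print("NN:   ", np.round(k_nn, 3))   # stays high, changing only slowly
```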
4.3 Discussion
Table 1: Comparison of Kernels

Squared Exponential (SQEXP)                                   | Neural Network (NN)
Stationary (translational invariance)                         | Non-stationary (not translationally invariant)
Isotropic (rotational invariance)                             | Non-isotropic (not rotationally invariant)
Negative values not allowed                                   | Negative values allowed
Function of Euclidean distance between points                 | Function of dot-product of the points until saturation
Origin independent                                            | Origin dependent
Correlation inversely proportional to Euclidean distance between points | Correlation is a function of the distance of the points from the origin until it saturates
A comparison between the SQEXP and NN kernels is detailed in Table 1. As noted in (Williams,
1998a), a direct analytical comparison of stationary and non-stationary kernels beyond their properties is not
straightforward. This paper compares them based on their properties and provides empirical results which
(a) GP modeling of data using SQEXP kernel
(b) GP modeling of data using NN kernel
Figure 5: Adaptive power of GP’s using stationary (SQEXP) and non-stationary (NN) kernels. For a set of
chosen points (in 1 dimension) that represent an abrupt change (the plus marks), a GP is learnt using both
the kernels and the GP based estimates are plotted (solid blue line) with the uncertainty being represented
by the gray area around the solid line. The GP using the NN kernel is able to respond to the step-like
jump in data almost spontaneously whereas the GP using the SQEXP smooths out the curve (i.e. is much
more approximate) in order to best fit the data. The GP-NN estimates are also much more certain than the
GP-SQEXP estimates. This adaptive power of the GP using the NN kernel is the main motivation of using
it in this work to model large scale terrain, which may potentially be very unstructured - there may
be large and rapid jumps in data and the GP/kernel will have to respond to them effectively.
demonstrate that the non-stationary kernel performs much better than the stationary one for a spatial
modeling task.
There is no clear way of specifying which kinds of kernel behavior are “correct” or “incorrect” for a spatial
modeling task. The only criterion of importance is the ability of the GP using the particular kernel to adapt to
the data at hand. A simple but telling example of the adaptive power of a GP using the non-stationary
NN kernel versus that of one using a stationary SQEXP kernel is depicted in Figure 5. Another interesting
aspect of a GP using the NN kernel in comparison with one using an SQEXP kernel is its superior ability to
respond to sparse data availability. This is due to the fact that the correlation in the NN kernel saturates.
Thus, even relatively far off points could be used to make useful predictions at the point of interest. In
the case of the SQEXP kernel however, the correlation decreases with distance towards zero. Thus, far off
data in sparse datasets become insignificant and are not useful for making inferences at the point of interest.
Such a scenario could occur in large voids due to occlusions or due to fundamental sensor limitations. These
aspects of the NN kernel taken together, make it a powerful method for use in complex terrain modeling
using GP’s.
While a GP based on the NN kernel is an effective model for terrain data as described and demonstrated
in this paper, other kernels may work better than the NN in specific situations. Indeed, the GP with
NN kernel is a general purpose universal approximator. Domain specific kernels (kernels tailored to specific
applications or datasets - for example a linear kernel in flat areas) could potentially perform better in specific
situations.
The essential issues with GP’s based on the SQEXP kernel for describing terrain data are the resulting
stationarity and constant length scales of the model. These underscore the issue that the SQEXP treats
different parts of the data in exactly the same way and to exactly the same extent. This makes GP’s based
on the standard SQEXP kernel inappropriate for the task of modelling unstructured large-scale terrain. The
works of Lang and Plagemann ((Lang et al., 2007), (Plagemann et al., 2008a) and (Plagemann et al., 2008b))
have attempted to address these issues by using the non-stationary equivalent of the SQEXP kernel ((Paciorek
and Schervish, 2004)), defining Gaussian kernels for each point that account for the local smoothness or by
using a hyper-GP to predict local length scales.
This work attempts to use an alternative non-stationary kernel that is inherently immune to these issues.
The Neural Network (NN) kernel is a function of the dot product of the points, thus, different parts of the
data will be treated differently even if a single set of length scales is used. Treating the data points as vectors
and using their dot product inherently captures the “local smoothness”. The interesting aspect is that a
single kernel choice with a single set of hyperparameters will suffice to model the data - this is demonstrated
in our experiments (Section 6). Thus, this work proposes an alternative methodology to address the core
issues in GP based terrain modeling and further, it is also tailored to meet the demands of modeling large
scale and unstructured terrain.
4.4 Learning methodology
Once the kernel is chosen, the next important step in the GP modeling process is to estimate the hyperparameters for a given dataset. Different approaches have been attempted including a maximum marginal
likelihood formulation (MMLE), generalized cross validation (GCV) and maximum a-posteriori estimation
(MAP) using Markov chain Monte Carlo (MCMC) techniques (see (Williams, 1998b) and (Rasmussen and
Williams, 2006)). The MMLE computes the hyperparameters that best explain the training data given the
chosen kernel. This requires the solution of a non-convex optimization problem and may possibly be driven
to locally optimal solutions. The MAP technique places a prior on the hyperparameters and computes the
posterior distribution of the test outputs once the training data is well explained. The state-of-the-art in
GP based terrain modeling typically used the latter approach. For large scale datasets, the GCV and MAP
approaches are too computationally expensive. In the work described in this paper, the learning procedure is
based on the MMLE method, using gradients and optimization techniques to learn the hyperparameters that best explain
the training data. Locally optimal solutions are avoided by combining a
stochastic optimization procedure with a gradient based optimization. The former enables the optimizer to
reach a reasonable solution and the latter focuses on fast convergence to the exact solution.
5 The local approximation strategy
As described earlier, Equations 11 and 12 can be used to estimate the elevation and its uncertainty at any
point, given a set of training data. The two equations involve the term (K + σ_n^2 I)^{-1}, which requires an
inversion of the covariance matrix of the training data. This is an operation of cubic complexity, O(n^3), where
n is the number of training data used.
In this work, the training data together with the evaluation data is stored in a KD-Tree. Typically, the
size of these data subsets is very large and thus using the entire set of data for elevation estimation is not
practically possible. Thus, approximations to the GP regression problem are required. In the literature,
several approximation techniques for Gaussian processes have been proposed to address this problem. Some
well known examples include the Nystrom approximation, the subset of regressors and Bayesian committee
machines. Extensive reviews of these techniques are provided in Chapter 8 of (Rasmussen and Williams,
2006) and (Candela et al., 2007). In this work, a local approximation strategy that is akin to a “moving
window” approximation is proposed. The proposed approximation methodology is inspired from work in
geostatistics where the “moving window” approach is well established ((Haas, 1990b), (Haas, 1990a) and
(Wackernagel, 2003)). Such an approximation has also been suggested in the machine learning community
by Williams in (Williams, 1998b). The work (Shen et al., 2006) proposed a different approximation method
using KD-Trees. They used the idea that GP regression is a weighted summation. For monotonic isotropic
kernels, they used KD-Trees to efficiently group training data points having similar weights with respect to a
given query point in order to approximate and speed-up the regression process. In comparison, the method
proposed in this work uses standard GP regression using any kernel, with a bounded neighborhood of points
obtained by applying a moving window on data.
The proposed local approximation strategy is based on the underlying idea that a point is most correlated
with its immediate neighborhood. Thus, during the inference process, the KD-Tree that initially stored the
training and evaluation datasets is queried to provide a predefined set of training data spatially closest to
the point for which the elevation must be estimated. The GP regression process then uses only these training
exemplars to estimate the elevation at the point of interest. Note that the GP model itself is learnt from the
set of all training data but is applied locally using the local approximation idea.
In order to have an idea of the reduction in complexity due to the proposed approximation, a simple
example is considered. A dataset of N (for example, 1 million) points may be split into, for instance, n_train
(say, 3000) training points, n_test (say, 100000) test points and the remaining n_eval (897000) evaluation
points. The training points are those over which the GP is learnt, the test points are those over which the
MSE is evaluated and the evaluation points together with the training points are used to make predictions at
the test points. The KD-Tree data structure would thus comprise of a total of ntrain + neval points (a total
of 900000 training and evaluation data). The complexity in the absence of the approximation strategy would
be O(ntrain + neval )3 for each of the ntest points. However, with the proposed local approximation strategy,
a predefined number m of spatially closest training and evaluation points is used to make a prediction at
any point of interest. Typically, m << ntrain + neval . For instance, a neighborhood size of 100 points
is used for experiments in this work. Thus the computational complexity would significantly decrease to
O(m3 ) for each of the ntest points to be evaluated. As the experiments in this work will demonstrate, the
results are still very good. Also note that the KD-Tree is just a convenient hierarchical data structure used
to implement the fast “moving window” based local approximation method. In theory, any appropriate data
structure could be used; the key idea is to use the local neighborhood only.
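The reduction in the worked example above can be checked with a line of arithmetic (illustrative only; constant factors and the KD-Tree query cost are ignored):

```python
# Operation-count comparison for the worked example above (illustrative only).
n_train, n_eval, m = 3_000, 897_000, 100
full_cost = (n_train + n_eval) ** 3      # ~7.3e17 per query without approximation
local_cost = m ** 3                      # 1e6 per query with the moving window
print(f"reduction factor per query point: {full_cost / local_cost:.1e}")  # ~7.3e11
```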
The squared exponential kernel has a smoothing effect on the data whereas the neural-network kernel
is much more effective in modeling discontinuous terrain data. The local approximation strategy defines
the neighborhood over which the interpolation takes place and hence can serve as a control parameter to
balance local smoothness with feature preservation in terrain modeling. The method of choosing the nearest
neighbors can also be used to the same effect. For instance, 100 nearest neighbors mutually separated by a
point will produce much smoother outputs as compared to the 100 immediate neighbors. In our experience,
the immediate neighborhood produced the best results. The local approximation method can thus be used
as a control or tradeoff mechanism between locally smooth outputs and feature preservation.
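The two selection strategies just mentioned can be sketched as follows (illustrative Python using scipy's cKDTree; the function name and the "every second neighbour" reading of "mutually separated by a point" are our assumptions):

```python
from scipy.spatial import cKDTree

def select_neighbours(tree, query_point, m=100, skip=False):
    """Two neighbour-selection choices discussed above.
    skip=False: the m immediate neighbours (best results in our experience).
    skip=True:  m neighbours 'mutually separated by a point', i.e. every second
    nearest neighbour, which gives a smoother interpolation."""
    k = 2 * m if skip else m
    _, idx = tree.query(query_point, k=k)
    return idx[::2] if skip else idx
```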
The local approximation strategy is also useful in the application of the neural network kernel (NN). The
NN kernel is capable of seamlessly adapting to rapidly and largely changing data. However, it also has the
property that depending on the location of the origin and that of the data points with respect to it, the
correlation increases with distance until it saturates. Particularly in the saturation region, the kernel is able
to respond to discontinuous data very effectively as the correlation values become similar. However, this
also implies that two distant points lying in the saturation region will be correlated. The proposed local
approximation method acts as a bounding mechanism for the applicability of the kernel. This is because the
NN only acts on the immediate neighborhood of the data, over which it is extremely effective in adapting. In
the case of the neural network kernel, this amounts to adding the benefits of a stationary kernel to the highly
adaptive power of a non-stationary kernel. Thus, the proposed local approximation methodology attempts
to emulate the locally adaptive GP effect exemplified by the works (Lang et al., 2007), (Plagemann et al.,
2008a) and (Plagemann et al., 2008b).
6 Experiments
6.1 Overview and Datasets
Experiments were performed on three large scale data sets representative of different sensory modalities.
The first data set comprises a scan from a RIEGL LMSZ420 laser scanner at the Mt. Tom Price mine in
Western Australia. The data set consists of over 1.8 million points (in 3D) spread over an area of 135 x 72
sq m, with a maximum elevation difference of about 13.8 m. The scan has both numerous voids of varying
sizes as well as noisy data, particularly around the origin of the scan. The data set is shown in Figure 6 and
is referred to as the “Tom Price” data set in the rest of this paper. The second data set comprises a mine
planning dataset, representative of a GPS based survey, from a large Kimberlite (diamond) mine.
This data set is a very hard one in that it contains very sparse information (4612 points in total; the average
spread of the data is about 33 m in x and y directions) spread over a very large geographical area (≈ 2.2 x
2.3 sq km), with a significant elevation difference of about 250 m. The data set is referred to as the “Kimberlite Mine”
data set in the rest of this paper and is shown in Figure 7. A third data set also consists of a laser scan but
over a much larger area than the first data set. This dataset is shown in Figure 8 and is referred to as the
“West Angelas” dataset for the rest of this paper. It comprises about 534,460 points spread over 1.865 x
0.511 sq km. The maximum elevation of this dataset is about 190 m. This dataset is both rich in features
(step like wall formations, voids of varying sizes) and large in scale; a large part of the dataset is
composed of only sparse data.
Figure 6: Laser scanner data set taken at Mt. Tom Price mine in Western Australia using a RIEGL
LMSZ420. The data set consists of over 1.8 million data points. The data set spans about 135 x 72 sq m.
The focus of the experiments is three-fold. The first part will demonstrate GP terrain modeling and
Figure 7: Dataset representative of a GPS based survey of a large Kimberlite (diamond) mine. The data set
comprised of just over 4600 points spanning about 2.2 x 2.3 sq km. The difference in elevation between the
base and the top was about 250 m. The data set is very sparse with the average spread being about 33 m
in x and y directions.
Figure 8: Laser scanner data set taken at West Angelas mine in Western Australia using a RIEGL LMSZ420.
The data set consists of about 534460 data points and spans about 1.865 x 0.511 sq km.
evaluate the non-stationary NN kernel against the stationary SQEXP kernel for this application. The second
part will detail results from extensive cross-validation experiments done using all three datasets that aim to
statistically evaluate the performance of the stationary and non-stationary GP and also compare them with
several well known methods of interpolating data, even using alternative representations. This part aims at
understanding how they compare against each other on the test datasets and which would be more effective
in what circumstances. The last part will detail experiments that help understand finer aspects of the GP
modeling process and explain some of the design decisions in this work - these include the choice of using
two optimization techniques or 100 nearest neighbors for instance.
In all experiments, the mean squared error (MSE) criterion was used as the performance metric (together
with visual inspection) to interpret the results. Each dataset was split into three parts - training, evaluation
and testing. The first was used for learning the GP and the last was used only for computing
the mean squared error (MSE); the first two parts together were used to estimate the elevations for the
test data. The mean squared error was computed as the mean of the squared differences between the GP
elevation prediction at the test points and the ground truth specified by the test data subset.
6.2 Stationary vs Non-stationary GP terrain modeling
In the first set of experiments, the approach as described in Section 3 was implemented and tested with
the aforementioned datasets. The objective was to understand whether GP’s could be successfully used for a
terrain modeling task, the benefits of using them, and whether the non-stationary (neural network) or the
stationary (squared exponential) kernel was more suited for terrain modeling.
Table 2: Kernel performance - Tom Price dataset (1806944 data points over 135 x 72 sq m, 3000 training
data, 10000 test points for MSE)

Kernel                        | Mean Squared Error (MSE) (sq m)
Squared Exponential (SQEXP)   | 0.0136
Neural Network (NN)           | 0.0137
Table 3: Kernel performance - Kimberlite Mine dataset (4612 points spread over 2.17 x 2.28 sq km)

Kernel                        | Number of training data | Mean Squared Error (MSE) (sq m)
Squared Exponential (SQEXP)   | 1000                    | 13.014 (over 3612 points)
Neural Network (NN)           | 1000                    | 8.870 (over 3612 points)
Squared Exponential (SQEXP)   | 4512                    | 4.238 (over 100 points)
Neural Network (NN)           | 4512                    | 3.810 (over 100 points)
Table 4: Kernel performance - West Angelas dataset (534460 data points over 1.865 x 0.511 sq km, 3000
training data, 106292 test points for MSE)

Kernel                        | Mean Squared Error (MSE) (sq m)
Squared Exponential (SQEXP)   | 0.590
Neural Network (NN)           | 0.019
The results of GP modeling of the three data sets are summarized in Tables 2, 3 and 4. The tests
compared the stationary SQEXP kernel and the non-stationary NN kernel for Gaussian process modeling
of large scale mine terrain data. Overall, the NN kernel outperformed the SQEXP and proved to be more
suited to the terrain modeling task. In the case of the Tom Price dataset, as the dataset is relatively flat,
both kernels performed relatively well. The Kimberlite Mine dataset being sparse, was better modeled by
the NN kernel. The West Angelas dataset was more challenging in that it had large voids, large elevation
and sparse data. The NN kernel clearly outperformed the SQEXP kernel. Further, not only was its MSE
lower than that of the SQEXP but the MSE values were comparable to the known sensor noise of the RIEGL
scanner. The final output for the Tom Price data set is depicted in Figures 9 and 10; the corresponding
confidence estimate is depicted in Figure 11. The output surface and confidence estimates for the Kimberlite
Mine data set are depicted in Figures 12 and 13 respectively. Figures 14 and 15 show the output
points and associated confidence for the West Angelas dataset.
Figure 9: Outcome of applying a single neural network based GP to the Tom Price data set. The gray
(darker) points constitute the given data set. The red points are the output points - the elevation data as
estimated by the GP over a test grid of points, spanning the dataset. The figure demonstrates the ability
of the GP to (1) reliably estimate the elevation data in known areas and (2) provide a good interpolated
estimate in areas with occlusions.
The experiments successfully demonstrated Gaussian process modeling of large scale terrain. Further,
the SQEXP and NN kernel were compared. The NN kernel proved to be much better suited to modeling
Figure 10: The surface map generated using the output elevation map obtained by applying a single neural
network GP to the Tom Price data set. Refer to Figure 9.
complex terrain as compared to the SQEXP kernel. The kernels performed very similarly on relatively flat
data. Thus, the approach can
1. Model large scale sensory data acquired at different degrees of sparseness.
2. Provide an unbiased estimator for the data at any desired resolution.
3. Handle sensor incompleteness by providing unbiased estimates of data that are unavailable (due
to occlusions for instance). Note that no assumptions on the structure of the data are made, a best
prediction is made using only the data at hand.
4. Handle sensor uncertainty by modeling the uncertainty in data in a statistically sound manner.
5. Handle spatial correlation between data.
The result is a compact, non-parametric, multi-resolution, probabilistic representation of the large scale
terrain that appropriately handles spatially correlated data as well as sensor incompleteness and uncertainty.
(a) Top-down view of Tom Price dataset
(b) Uncertainty (standard deviation in m) estimates of output elevation map (Figure 9)
Figure 11: 11(b) depicts the final uncertainty estimates of the output elevation map (see Figure 9) of the
Tom Price dataset as obtained using the proposed GP approach. The uncertainty map corresponds to a
top-down view of the dataset (Figure 6) as depicted in 11(a). Each point represents the uncertainty in the
estimated elevation of the corresponding point in the elevation map. The uncertainty is clearly higher in
those areas where there was no training data - typically in the outer fringe areas and in gaps. (min standard
deviation = 0.05 m, max standard deviation = 11.98 m)
Figure 12: Final output surface of the Kimberlite Mine data set obtained using a single neural network kernel
based GP. The surface map is obtained from the elevation map, in turn obtained by predicting the elevation
for each point across a test grid of points spanning the dataset. The figure demonstrates the ability of the
GP to (1) reliably estimate the elevation data in known areas (2) produce models that take into account the
local structure so as to preserve the characteristics of the terrain being modeled (3) be able to model sparse
and feature rich datasets.
(a) Top-down view of Kimberlite Mine
dataset
(b) Uncertainty (standard deviation in m) estimates of output elevation map (Figure 12)
Figure 13: 13(b) depicts the final uncertainty estimates of the output elevation map (see Figure 12) of the
Kimberlite Mine dataset as obtained using the proposed GP approach. The uncertainty map corresponds
to a top-down view of the dataset (Figure 7) as depicted in 13(a). Each point represents the uncertainty in
the estimated elevation of the corresponding point in the elevation map. The uncertainty is clearly higher in
those areas where there was no training data - typically in the outer fringe areas and in gaps. (min standard
deviation = 0.336 m, max standard deviation = 32.65 m)
Figure 14: Outcome of applying a single neural network based GP to the West Angelas data set. This
elevation map is obtained by predicting the elevation for each point across a test grid of points spanning
the dataset. The figure clearly demonstrates the ability of the GP to (1) reliably estimate the elevation
data in known areas (2) produce models that take into account the local structure so as to preserve the
characteristics of the terrain being modeled (3) be able to model sparse and feature rich datasets.
(a) Top-down view of West Angelas mine dataset
(b) Uncertainty (standard deviation in m) of output elevation map (Figure 14)
Figure 15: 15(b) depicts the final uncertainty estimates of the output elevation map (see Figure 14) of the
West Angelas mine dataset as obtained using the proposed GP approach. The uncertainty map corresponds
to a top-down view of the dataset (Figure 8) as depicted in 15(a). Each point represents the uncertainty in
the estimated elevation of the corresponding point in the elevation map. The uncertainty is clearly higher in
those areas where there was no training data - typically in the outer fringe areas and in gaps. (min standard
deviation = 0.01 m, max standard deviation = 39.25 m)
6.3 Statistical performance evaluation, comparison and analysis
The next set of experiments conducted was aimed at
1. Evaluating the GP modeling technique in a way that provides for statistically representative results.
2. Providing benchmarks to compare the proposed GP approach with grid/elevation maps using several
standard interpolation techniques as well as TINs using triangle based interpolation techniques.
3. Understanding the strengths and applicability of different techniques.
The method adopted to achieve these objectives was to perform cross validation experiments on each of
the three datasets and interpret the results obtained. GP models using the SQEXP and NN kernels are
compared with each other as well as with parametric, non-parametric and triangle based interpolation
techniques. A complete list and description of the techniques used in this experiment is included below.
1. GP modeling using SQEXP and NN kernels. Details on the approach and the kernels can be found in
Sections 3 and 4.
2. Parametric interpolation methods - These methods attempt to fit the given data to a polynomial surface
of the chosen degree. The polynomial coefficients, once found, can be used to interpolate or extrapolate
as required. The parametric interpolation techniques used here include linear (plane fitting), quadratic
and cubic (a minimal code sketch of the plane-fitting and bilinear interpolators follows this list).
(a) Linear (plane fitting) - A polynomial of degree 1 (plane in 2D) is fit to the given data. The
objective is to find the coefficients $\{a_i\}_{i=0}^{2}$ for the equation
$$ z = a_0 + a_1 x + a_2 y , $$
that best describes the given data by minimizing the residual error.
(b) Quadratic fitting - A polynomial of degree 2 is fit to the given data. The objective is to find the
coefficients $\{a_i\}_{i=0}^{5}$ for the equation
$$ z = a_0 + a_1 x + a_2 y + a_3 x^2 + a_4 xy + a_5 y^2 , $$
that best describes the given data by minimizing the residual error.
(c) Cubic fitting - A polynomial of degree 3 is fit to the given data. The objective is to find the
coefficients $\{a_i\}_{i=0}^{9}$ for the equation
$$ z = a_0 + a_1 x + a_2 y + a_3 x^2 + a_4 xy + a_5 y^2 + a_6 x^3 + a_7 x^2 y + a_8 xy^2 + a_9 y^3 , $$
that best describes the given data by minimizing the residual error.
3. Non-parametric interpolation methods - These techniques attempt to fit a smooth curve through the
given data and do not attempt to characterize all of it using a parametric form, as was the case
with the parametric methods. They include piecewise linear or cubic interpolation, biharmonic spline
interpolation as well as techniques such as nearest neighbor interpolation and mean-of-neighborhood
interpolation.
(a) Piecewise linear interpolation - In this method, the data is gridded (associated with a suitable
mesh structure) and for any point of interest, the 4 points of its corresponding grid cell are
determined. A bilinear interpolation (Kidner et al., 1999) of these 4 points yields the estimate at
the point of interest. This interpolation uses the form
$$ z = a_0 + a_1 x + a_2 y + a_3 xy , $$
where the coefficients are expressed in terms of the coordinates of the grid cell. It is thus truly
linear only when one variable/dimension is held fixed and the behavior is observed along the
other. It is also called linear spline interpolation wherein n + 1 points are connected by n splines
such that (a) the resultant spline would constitute a polygon if the begin and end points were
identical and (b) the splines would be continuous at each data point.
(b) Piecewise cubic interpolation - This method extends the piecewise linear method to cubic interpolation. It is also called cubic spline interpolation wherein n + 1 points are connected by n
splines such that (a) the splines are continuous at each data point and (b) the spline functions are
twice continuously differentiable, meaning that both the first and second derivatives of the spline
functions are continuous at the given data points. The spline function also minimizes the total
curvature and thus produces the smoothest curve possible. It is akin to constraining an elastic
strip to fit the given n + 1 points.
(c) Biharmonic spline interpolation - The biharmonic spline interpolation is based on (Sandwell,
1987). It places Green's functions at each data point and finds coefficients for a linear combination
of these functions to determine the surface. The coefficients are found by solving a linear system
of equations. Both the slope and the values of the given data are used to determine the surface.
The method is known to be relatively inefficient, possibly unstable but flexible (in the sense of the
number of model parameters) in relation to the cubic spline method. In one or two dimensions,
this is known to be equivalent to the cubic spline interpolation as it also minimizes the total
curvature.
(d) Nearest-neighbor interpolation - This method uses the elevation of the nearest training/evaluation data point as the estimate at the point of interest.
(e) Mean-of-neighborhood interpolation - This method uses the average elevation of the set of training/evaluation data as the estimate at the point of interest.
4. Triangle based interpolation - All techniques mentioned thus far are applied on the exact same neighborhood of training and evaluation data that is used for GP regression for any desired point. In essence,
the comparison thus far can be likened to one of comparing GPs with standard elevation maps using any of the aforementioned techniques, applied over exactly the same neighborhoods. Two other
interpolation techniques are also considered for comparison - triangle based linear and cubic interpolation (Watson, 1992). These methods work on datasets that have been triangulated and involve fitting
planar or polynomial (cubic) patches to each triangle. They are meant to compare the proposed GP
approach with a TIN representation of the dataset, using the triangle based interpolation techniques.
(a) Triangle based linear interpolation - In this method, a Delaunay triangulation is performed on
the data provided and then the coordinates of the triangle corresponding to the point of interest
(in Barycentric coordinates (Watson, 1992)) are subject to linear interpolation to estimate the
desired elevation.
(b) Triangle based cubic interpolation - In this method, a Delaunay triangulation is performed on the
data provided and then the coordinates of the triangle corresponding to the point of interest (in
Barycentric coordinates (Watson, 1992)) are subject to cubic interpolation to estimate the desired
elevation.
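To make the baselines above concrete, the following Python/NumPy sketch shows the two simplest of them: least-squares plane fitting (item 2a) and bilinear interpolation within a single grid cell (item 3a). It is illustrative only; the paper does not give its implementation, and the function names and toy data below are assumptions.

```python
import numpy as np

def fit_plane(x, y, z):
    # Least-squares fit of z = a0 + a1*x + a2*y (parametric linear interpolation).
    A = np.column_stack([np.ones_like(x), x, y])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coeffs  # [a0, a1, a2]

def bilinear(x0, x1, y0, y1, z00, z10, z01, z11, xq, yq):
    # Bilinear interpolation inside one grid cell: z = a0 + a1*x + a2*y + a3*x*y.
    tx = (xq - x0) / (x1 - x0)
    ty = (yq - y0) / (y1 - y0)
    return (z00 * (1 - tx) * (1 - ty) + z10 * tx * (1 - ty)
            + z01 * (1 - tx) * ty + z11 * tx * ty)

# Toy usage on synthetic terrain points (illustrative only).
rng = np.random.default_rng(0)
x, y = rng.uniform(0, 100, 200), rng.uniform(0, 100, 200)
z = 0.5 + 0.02 * x - 0.01 * y + rng.normal(0, 0.05, 200)
a0, a1, a2 = fit_plane(x, y, z)                               # roughly 0.5, 0.02, -0.01
z_plane = a0 + a1 * 50.0 + a2 * 50.0                          # plane estimate at (50, 50)
z_cell = bilinear(0, 1, 0, 1, 0.0, 1.0, 1.0, 2.0, 0.5, 0.5)   # 1.0
```

The quadratic and cubic parametric fits (items 2b and 2c) follow the same pattern, with additional polynomial columns added to the design matrix.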
Taking inspiration from (Kohavi, 1995), a 10-fold cross validation was conducted using each dataset.
Two kinds of cross validation techniques were considered - using uniform sampling to define the folds and
using patch sampling to define the folds. In the uniform sampling method, each fold had approximately
the same number of points and points were selected through a uniform sampling process - thus, the cross
validation was also stratified in this sense. This was performed for all three datasets. Further to this, the
patch sampling based cross validation was conducted for the two laser scanner datasets. In this case, the data
was gridded into 5 m (for the Tom Price dataset) or 10 m (for the West Angelas dataset) patches. Subsequently, a 10-fold
cross validation was performed by randomly selecting different sets of patches for testing and training in
each cycle. The uniform sampling method was intended to evaluate the approach as such, whereas the patch sampling
method was intended to test the robustness of the different techniques to occlusions
or voids in the data. Figures 16 and 17 depict the two kinds of cross validation performed.
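As a rough illustration of how the two fold-generation schemes differ, the sketch below builds uniform folds (a random, evenly sized split of individual points) and patch folds (whole square patches withheld together, mimicking occlusions or voids). The function names and the patch-size argument are illustrative assumptions, not the paper's code.

```python
import numpy as np

def uniform_folds(n_points, k=10, seed=0):
    # Uniformly sampled folds: shuffle point indices and split into k equal parts.
    idx = np.random.default_rng(seed).permutation(n_points)
    return np.array_split(idx, k)

def patch_folds(xy, patch_size, k=10, seed=0):
    # Patch-sampled folds: grid the (x, y) data into square patches and split the
    # patches (not individual points) into k groups.
    cells = np.floor(xy / patch_size).astype(int)               # patch cell per point
    _, patch_of_point = np.unique(cells, axis=0, return_inverse=True)
    patch_ids = np.random.default_rng(seed).permutation(patch_of_point.max() + 1)
    groups = np.array_split(patch_ids, k)
    return [np.flatnonzero(np.isin(patch_of_point, g)) for g in groups]

# Each fold is tested once while the remaining folds form the training set.
xy = np.random.default_rng(1).uniform(0, 500, size=(10000, 2))
folds = patch_folds(xy, patch_size=10.0)        # e.g. 10 m patches
test_idx = folds[0]
train_idx = np.setdiff1d(np.arange(len(xy)), test_idx)
```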
Figure 16: Cross validation using uniform sampling. The figure shows a top-down (X horizontal axis, Y
vertical axis) view of the West Angelas dataset (Figure 8). Points are uniformly sampled from the dataset
to produce 10 equally sized folds and cross validation is performed by testing each fold while learning from
the remaining folds. The figure depicts the dataset (blue points) and test points from three of the folds (red,
green and yellow points).
The results of the cross validation with uniform sampling experiments are summarized in Tables 5, 6
and 7. The tabulated results depict the mean and standard deviation of the MSE values obtained across the 10
folds of testing performed. The results of the cross validation with patch sampling experiments performed
on the Tom Price and West Angelas datasets are summarized in Tables 8 and 9 respectively. In each of these
tests, the GP representation, with its kriging interpolation technique, stationary or non-stationary kernel
and local approximation strategy, is compared with elevation maps using the various parametric and non-parametric
interpolation techniques applied with exactly the same local approximation method as the GP, and also with
TINs using triangle based interpolation techniques. The former two approaches thus operate on exactly
the same neighborhood of training/evaluation data during the estimation/interpolation process. In order
to compare the proposed GP modeling approach with a TIN representation, the local approximation is not
applied to the latter class of interpolation techniques.
(a) Patch sampling of West Angelas dataset
(b) Patch sampling of Tom Price dataset
Figure 17: Cross validation using patch sampling. The figures show a top-down (X horizontal axis, Y
vertical axis) view of the West Angelas (Figure 8) and Tom Price (Figure 6) datasets respectively. The
datasets are gridded into 5 m (for Tom Price) and 10 m (for West Angelas) patches. A 10-fold cross validation
is done by randomly selecting patches for training and testing in each cycle. The figures depict the datasets
(blue points) and test patches for three cycles of testing (red, green and yellow patches) in the two laser
scanner datasets.
Table 5: Tom Price dataset 10-fold cross validation with uniform sampling. Comparison of interpolation
techniques. Mean and standard deviation of the MSE (sq m) across the 10 folds.

                                      1000 test data per fold       10000 test data per fold
Interpolation Method                  Mean MSE    Std. Dev. MSE     Mean MSE    Std. Dev. MSE
GP Neural Network                     0.0107      0.0012            0.0114      0.0004
GP Squared Exponential                0.0107      0.0013            0.0113      0.0004
Nonparametric Linear                  0.0123      0.0047            0.0107      0.0013
Nonparametric Cubic                   0.0137      0.0053            0.0120      0.0017
Nonparametric Biharmonic              0.0157      0.0065            0.0143      0.0019
Nonparametric Mean-of-neighborhood    0.0143      0.0010            0.0146      0.0007
Nonparametric Nearest-neighbor        0.0167      0.0066            0.0149      0.0017
Parametric Linear                     0.0107      0.0013            0.0114      0.0005
Parametric Quadratic                  0.0110      0.0018            0.0104      0.0005
Parametric Cubic                      0.0103      0.0018            0.0103      0.0005
Triangulation Linear                  0.0123      0.0046            0.0107      0.0013
Triangulation Cubic                   0.0138      0.0053            0.0120      0.0017
Table 6: Kimberlite Mine dataset 10-fold cross validation with uniform sampling. Comparison of interpolation techniques.

Interpolation Method                  Mean MSE (sq m)    Std. Dev. MSE (sq m)
GP Neural Network                     3.9290             0.3764
GP Squared Exponential                5.3278             0.3129
Nonparametric Linear                  5.0788             0.6422
Nonparametric Cubic                   5.1125             0.6464
Nonparametric Biharmonic              5.5265             0.5801
Nonparametric Mean-of-neighborhood    132.5097           2.9112
Nonparametric Nearest-neighbor        20.4962            2.5858
Parametric Linear                     43.1529            2.2123
Parametric Quadratic                  13.6047            0.9047
Parametric Cubic                      10.2484            0.7282
Triangulation Linear                  5.0540             0.6370
Triangulation Cubic                   5.1091             0.6374
Table 7: West Angelas dataset 10-fold cross validation with uniform sampling. Comparison of interpolation
techniques. Mean and standard deviation of the MSE (sq m) across the 10 folds.

                                      1000 test data per fold       10000 test data per fold
Interpolation Method                  Mean MSE    Std. Dev. MSE     Mean MSE    Std. Dev. MSE
GP Neural Network                     0.0166      0.0071            0.0219      0.0064
GP Squared Exponential                0.4438      1.0289            0.7485      0.7980
Nonparametric Linear                  0.0159      0.0075            0.0155      0.0021
Nonparametric Cubic                   0.0182      0.0079            0.0161      0.0021
Nonparametric Biharmonic              0.0584      0.0328            0.1085      0.1933
Nonparametric Mean-of-neighborhood    0.9897      0.4411            0.9158      0.0766
Nonparametric Nearest-neighbor        0.1576      0.0271            0.1233      0.0048
Parametric Linear                     0.1019      0.0951            0.0927      0.0173
Parametric Quadratic                  0.0458      0.0130            0.0390      0.0059
Parametric Cubic                      0.0341      0.0109            0.0288      0.0038
Triangulation Linear                  0.0162      0.0074            0.0157      0.0022
Triangulation Cubic                   0.0185      0.0078            0.0166      0.0023
Table 8: Tom Price dataset 10-fold cross validation with patch sampling. Comparison of interpolation
techniques. Mean and standard deviation of the MSE (sq m) across the 10 folds.

                                      1000 test data per fold       10000 test data per fold
Interpolation Method                  Mean MSE    Std. Dev. MSE     Mean MSE    Std. Dev. MSE
GP Neural Network                     0.0104      0.0029            0.0100      0.0021
GP Squared Exponential                0.0103      0.0029            0.0099      0.0021
Nonparametric Linear                  0.0104      0.0047            0.0098      0.0040
Nonparametric Cubic                   0.0114      0.0054            0.0108      0.0045
Nonparametric Biharmonic              0.0114      0.0046            0.0124      0.0047
Nonparametric Mean-of-neighborhood    0.0143      0.0036            0.0142      0.0026
Nonparametric Nearest-neighbor        0.0139      0.0074            0.0132      0.0048
Parametric Linear                     0.0103      0.0029            0.0100      0.0021
Parametric Quadratic                  0.0099      0.0029            0.0091      0.0022
Parametric Cubic                      0.0097      0.0027            0.0605      0.1072
Triangulation Linear                  0.0104      0.0047            0.0098      0.0039
Triangulation Cubic                   0.0114      0.0054            0.0108      0.0045
Table 9: West Angelas dataset 10-fold cross validation with patch sampling. Comparison of interpolation
techniques. Mean and standard deviation of the MSE (sq m) across the 10 folds.

                                      1000 test data per fold       10000 test data per fold
Interpolation Method                  Mean MSE    Std. Dev. MSE     Mean MSE    Std. Dev. MSE
GP Neural Network                     0.0286      0.0285            0.0291      0.0188
GP Squared Exponential                0.7224      1.1482            1.4258      2.0810
Nonparametric Linear                  0.0210      0.0168            0.0236      0.0161
Nonparametric Cubic                   0.0229      0.0175            0.0243      0.0165
Nonparametric Biharmonic              0.0555      0.0476            0.0963      0.0795
Nonparametric Mean-of-neighborhood    1.6339      0.8907            1.7463      1.2831
Nonparametric Nearest-neighbor        0.1867      0.0905            0.2248      0.1557
Parametric Linear                     0.1400      0.0840            0.1553      0.1084
Parametric Quadratic                  0.0612      0.0350            0.0658      0.0454
Parametric Cubic                      0.0380      0.0196            0.0752      0.0880
Triangulation Linear                  0.0212      0.0169            0.0235      0.0160
Triangulation Cubic                   0.0232      0.0175            0.0245      0.0164
From the experimental results presented, many inferences on the applicability and usage of different
techniques could be drawn. The results obtained validate the findings from the previous set of experiments.
They show that the NN kernel is much more effective at modeling terrain data than the SQEXP kernel.
This is particularly true in relatively sparse and/or complex datasets such as the Kimberlite Mine or West
Angelas datasets as shown in Tables 6, 7 and 9.
For relatively flat terrain such as the Tom Price scan data, the techniques compared performed more
or less similarly. This is shown in Tables 5 and 8. From the GP modeling perspective, the choice of non-stationary or stationary kernel was less relevant in such a scenario. The high density of data in the scan was
likely another contributing factor.
For sparse datasets such as the Kimberlite Mine dataset, GP-NN easily outperformed all the other
interpolation methods. This is shown in Table 6. This is quite simply because most other techniques
imposed a priori models on the data, whereas the GP-NN was able to model and adapt to the data
at hand much more effectively. Further, GP-SQEXP was not as effective as the GP-NN in handling sparse or
complex data.
In the case of the West Angelas dataset, the data was complex in that it had a poorly defined structure
and large areas of the data were sparsely populated. This was due to large occlusions as the scan was a
bottom-up scan from one end of the mine pit. However, the dataset spanned over a large area and local
neighborhoods (small sections of data) were relatively flat. Thus, while the GP-NN clearly outperformed
the GP-SQEXP, it performed comparably with the piecewise linear/cubic and triangle based linear/cubic
methods and much better than the other techniques attempted. This is shown in Tables 7 and 9. Thus,
even in this case, the GP-NN proved to be a very competitive modeling option. GP-SQEXP was not able
to handle sparse data or the complex structure (with an elevation difference of 190 m) effectively and hence
performed poorly in relation to the aforementioned methods.
As evidenced by Tables 6, 7 and 9, the non-parametric and triangle based linear/cubic interpolation
techniques performed much better than the parametric interpolation techniques, which fared poorly overall.
This is expected and intuitive as the parametric models impose a priori models over the whole neighborhood
of data, whereas the non-parametric and triangle based techniques essentially split the data into simplexes
(triangular or rectangular patches) and then operate on only one such simplex for a point of interest.
Their focus is thus more towards smooth local data fitting. The parametric techniques only performed well
in the Tom Price dataset tests (Tables 5 and 8) due to the flat nature of the dataset. Among the non-parametric
techniques, the linear and cubic also performed better than the others. The nearest-neighbor
and mean-of-neighborhood methods were too simplistic to be able to handle any complex or sparse data.
The results show that the GP-NN is a very versatile and competitive modeling option for a range of
datasets, varying in complexity and sparseness. For dense datasets or relatively flat terrain (even locally
flat terrain), the GP-NN will perform as well as the standard grid based methods employing any of the
interpolation techniques compared or a TIN based representation employing triangle based linear or cubic
interpolation. For sparse datasets or very bumpy terrain, the GP-NN will significantly outperform every
other technique that has been tested in this experiment. When considering the other advantages of the GP
approach such as Bayesian model fitting, ability to handle uncertainty and spatial correlation in a statistically
sound manner and multi-resolution representation of data, the GP approach to terrain representation seems
to have a clear advantage over standard grid based or TIN based approaches.
Finally, note that, in essence, these experiments compared the GP/kriging interpolation technique with
various other standard interpolation methods. The GP modeling approach can use any of the standard interpolation techniques through suitably chosen or designed kernels. For instance, a GP with a linear kernel
performs the same interpolation as a linear interpolator, however, the GP also provides the corresponding
uncertainty estimates among other things. The GP is more than just an interpolation technique. It is a
Bayesian framework that brings together interpolation, linear least squares estimation, uncertainty representation, non-parametric continuous domain model fitting and the Occam’s Razor principle (thus preventing
the over-fitting of data). Individually, these are well known techniques; the GP provides a flexible (in the
sense of using different kernels) technique to do all of them together.
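To illustrate this point, the sketch below is a bare-bones GP regressor that returns both an elevation estimate and its standard deviation at each query point. The squared exponential kernel and its hyperparameter values are arbitrary placeholders chosen for illustration; this is not the paper's implementation, and the surveyed elevations are made up.

```python
import numpy as np

def sqexp(A, B, sigma_f=1.0, ell=10.0):
    # Squared exponential covariance between two sets of 2-D locations.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma_f ** 2 * np.exp(-0.5 * d2 / ell ** 2)

def gp_predict(X, z, Xq, noise=0.05):
    # GP regression: returns the elevation estimate and its standard deviation.
    K = sqexp(X, X) + noise ** 2 * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, z))
    Ks = sqexp(Xq, X)
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = sqexp(Xq, Xq).diagonal() - (v ** 2).sum(axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

# Toy example: four surveyed elevations (metres) and two query points.
X = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
z = np.array([100.0, 101.0, 99.5, 100.5])
mean, std = gp_predict(X, z, np.array([[5.0, 5.0], [50.0, 50.0]]))
# The query point far from all data receives a much larger std, i.e. higher uncertainty.
```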
Thus, this set of experiments statistically evaluated the proposed GP modeling approach, compared
stationary and non-stationary GPs for terrain modeling, and compared the GP modeling / kriging interpolation
approach to elevation maps using many standard interpolation techniques as well as TIN representations
using triangle based interpolation techniques.
6.4 Further analysis
The final set of experiments aimed at gaining a better understanding of some aspects of the
modeling process, the factors influencing it and some of the design decisions made in this work. These
experiments were mostly conducted on the Kimberlite Mine data set owing to its characteristic features
(gradients, roadways, faces etc.). Only small numbers of training data have been used in these experiments
in order to facilitate rapid computation. The mean squared error values are thus not near the
best-case values; however, the focus of these experiments was not on improving accuracy but rather on
observing trends and understanding dependencies.
Among the experiments performed, three different optimization strategies were attempted - stochastic
(simulated annealing), gradient based (Quasi-Newton optimization with a BFGS Hessian update (Nocedal and
Wright, 2006)) and a combination of the two (simulated annealing followed by Quasi-Newton optimization,
with the output of the first stage used as the starting point of the second). The results obtained are shown in Table
10. The objective of this experiment was to start from a random set of hyper-parameters and observe which
method converges to a better solution. As expected, the combined strategy performed
best. The gradient based optimization requires a good starting point or the optimizer may get stuck in local
minima. The stochastic optimization (simulated annealing) provides a better chance of recovering from local
minima as it allows for "jumps" in the parameter space during optimization. Thus, the combination first
localizes a good parameter neighborhood and then zeroes in on the best parameters for the given scenario.
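A minimal sketch of the combined strategy, using SciPy's simulated annealing variant followed by a Quasi-Newton (BFGS) refinement, is shown below. The objective here is a toy stand-in; in practice it would be the GP's negative log marginal likelihood as a function of the hyper-parameters, which is not reproduced here, and the bounds are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import dual_annealing, minimize

def objective(theta):
    # Toy stand-in for the negative log marginal likelihood of the GP; the true
    # objective would depend on the training data and the chosen kernel.
    return float(np.sum((theta - np.array([1.0, -2.0, 0.5])) ** 2))

bounds = [(-5.0, 5.0)] * 3   # search box for the hyper-parameters (illustrative)

# Stage 1: stochastic search (simulated annealing) to escape poor local minima.
coarse = dual_annealing(objective, bounds, maxiter=200, no_local_search=True, seed=0)

# Stage 2: gradient based refinement (Quasi-Newton with BFGS Hessian update),
# started from the result of the annealing stage.
refined = minimize(objective, coarse.x, method="BFGS")
print(refined.x)   # hyper-parameters after the combined two-stage optimization
```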
Table 10: Optimization strategy - GP-NN applied on the Kimberlite Mine data set (number of training data
= 1000; optimization is started from a random start point).

Optimization Method                             Mean Squared Error (sq m)
Stochastic (Simulated Annealing)                94.30
Gradient based (Quasi-Newton Optimization)      150.48
Combined                                        4.61
Three different sampling strategies have been attempted in this work - uniform sampling, random sampling and heuristic sampling (implemented here as preferential sampling around elevation gradients). All
experiments in this paper have been performed using the uniform sampling strategy as it performed best.
The three sampling strategies were specifically tested on the Kimberlite Mine data set owing to its significant
elevation gradient. The results obtained are shown in Table 11. Heuristic sampling did poorly in comparison
to the other sampling methods because the particular gradient based sampling approach
applied here resulted in the use of mostly gradient information and little or no face information. Increasing
the number of training data increases the number of face related data points and reduces the amount of
error. In the limit (number of training data = total number of data points), all three approaches would produce the
same result.
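The three strategies could be implemented roughly as in the sketch below. The gradient heuristic is only described at a high level in this work, so the gradient proxy used here (deviation of each point's elevation from a precomputed neighbourhood mean) is an assumption for illustration, as are the function names.

```python
import numpy as np

def uniform_sample(n_points, n_train):
    # Uniform sampling: evenly spaced indices across the whole dataset.
    return np.linspace(0, n_points - 1, n_train).astype(int)

def random_sample(n_points, n_train, seed=0):
    # Random sampling without replacement.
    return np.random.default_rng(seed).choice(n_points, n_train, replace=False)

def heuristic_sample(z, z_neighborhood_mean, n_train, seed=0):
    # Heuristic (gradient-preferential) sampling: points whose elevation deviates
    # most from their local neighbourhood mean are picked with higher probability.
    # z_neighborhood_mean would be precomputed, e.g. as the mean elevation of each
    # point's k nearest neighbours.
    grad_proxy = np.abs(z - z_neighborhood_mean) + 1e-9
    p = grad_proxy / grad_proxy.sum()
    return np.random.default_rng(seed).choice(len(z), n_train, replace=False, p=p)
```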
Table 11: Sampling strategy - GP-NN applied on the Kimberlite Mine data set. Mean squared error (sq m).

Method       Number(Train Data) = 1000    Number(Train Data) = 1500
Uniform      8.87                         5.87
Random       10.17                        7.89
Heuristic    32.55                        11.48
Table 12: GP-NN applied on the Kimberlite Mine data set (3000 training data, 1000 test data).

Number of Neighbors    MSE (sq m)
10                     4.5038
50                     4.2214
100                    4.1959
150                    4.1859
200                    4.1811
250                    4.1743
300                    4.1705
350                    4.1684
400                    4.1664
450                    4.1652
500                    4.1642
550                    4.1626
The number of nearest neighbors used for the local approximation step in the inference process is another
important factor influencing the GP modeling process. Too many nearest neighbors result in excessive
"smoothing" of the data whereas too few of them result in a relatively "spiky" output. Further, the
greater the number of neighbors considered, the more time it takes to estimate/interpolate the elevation
(an increase in computational complexity). The "correct" number of nearest neighbors was determined
by computing the MSE for different neighborhood sizes and interpreting the results. This experiment was
conducted on both the Kimberlite Mine and the West Angelas datasets. The results of this experiment
are shown in Tables 12 and 13 respectively. With the aim of (a) choosing a reasonably sized neighborhood,
(b) obtaining reasonably good MSE performance and time complexity and (c) if possible, using a single
parameter across all experiments and datasets, a neighborhood size of 100 was used in this work.
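The moving-window local approximation itself can be sketched as follows: a KD-tree returns the k nearest training points of each query, and GP regression is run on that neighbourhood only (k = 100 in this work). The predictor argument is assumed to be a function like the gp_predict sketch shown earlier; this is an illustration under those assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def local_gp_elevation(train_xy, train_z, query_xy, predictor, k=100):
    # Moving-window local approximation: each query point is estimated using only
    # its k nearest training points, found with a KD-tree.
    tree = cKDTree(train_xy)
    means, stds = [], []
    for q in np.atleast_2d(query_xy):
        _, idx = tree.query(q, k=k)          # indices of the k nearest neighbours
        m, s = predictor(train_xy[idx], train_z[idx], q[None, :])
        means.append(m[0])
        stds.append(s[0])
    return np.array(means), np.array(stds)

# Usage (with a GP predictor such as the gp_predict sketch above):
# mean, std = local_gp_elevation(train_xy, train_z, grid_xy, gp_predict, k=100)
```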
Table 13: GP-NN applied on the West Angelas mine data set (3000 training data, 10000 test data).

Number of Neighbors    MSE (sq m)
10                     0.082
50                     0.017
100                    0.020
150                    0.023
200                    0.026
250                    0.027
300                    0.028

The size of the training data is obviously an important factor influencing the modeling process. However,
this depends a lot on the data set under consideration. For the Tom Price laser scanner data, a mere 3000
of over 1.8 million data points were enough to produce centimeter level accuracy. This was due to the fact
that this particular data set comprised dense and accurate scanner data with relatively little change in
elevation. The same number of training data also produced very satisfactory results in the case of the West
Angelas dataset, which is both large and has a significant elevation gradient; this is attributed to the uniform
sampling of points across the whole dataset and the adaptive power of the NN kernel. The Kimberlite Mine data
set comprised very sparse data spread over a very large area. This data set also had a lot of "features"
such as roads, "crest-lines" and "toe-lines". Thus, when the amount of training data was reduced (say, to 1000 of
the 4612 points), the terrain was still broadly reconstructed but the finer details - faces, lines, roads etc. - were
smoothed out. Using nearly the entire data set for training and then reconstructing the elevation/surface
map at the desired resolution provided the best results.
This section of experiments attempted to better understand the finer points and design decisions of the GP
modeling approach proposed in this work. A combination of stochastic (simulated annealing) and gradient
based (Quasi-Newton) optimization was found to be the best option to obtain the GP hyperparameters.
For most datasets and particularly for dense and large ones, a uniform sampling procedure was a good
option. For feature rich datasets, a heuristic sampling might be attempted but care must be taken to ensure
that training data from the non-feature-rich areas are also obtained. The number of nearest neighbors
affected both the time complexity and the MSE performance. A tradeoff between these led to the empirical
choice of 100 nearest neighbors for the experiments in this work. Finally, the
number of training data depends on the computational resources available and on the richness of the dataset.
A dataset such as the Kimberlite Mine dataset needed more training data than the Tom Price dataset.
7 Conclusion
This work proposed the use of Gaussian processes for modeling large scale and complex terrain. Specifically,
it showed that a single neural network based Gaussian process (GP) was powerful enough to be able to
successfully model complex terrain data, taking into account the local structure and preserving much of
the spatial features in the terrain. This work also proposed a fast local approximation technique, based
on a “moving-window” methodology and implemented using KD-Trees, that was able to simultaneously
address the issues of scalability of this approach to very large datasets, effective application of the neural
network (NN) kernel and achieving a tradeoff between smoothness and feature preservation in the resulting
output. This work also compared stationary and non-stationary kernels and studied various detailed aspects
of the modeling process. Cross validation experiments were used to provide statistically representative
performance measures, compare the GP approaches with several other interpolation methods using both
grid/elevation maps as well as TIN’s and understand the strengths and applicability of different techniques.
These experiments demonstrated that for dense and/or relatively flat terrain, the GP-NN would perform as
well as the grid based methods using any of the standard interpolation techniques or TIN based methods
using triangle based interpolation techniques. However, for sparse and/or complex terrain data, the GP-NN
would significantly outperform alternative methods. Thus, the GP-NN proved to be a very versatile and
effective modeling option for terrain of a range of sparseness and complexity. Experiments conducted on
multiple real sensor data sets of varying sparseness, taken from a mining scenario, clearly validated the claims
made. The data sets included dense laser scanner data and sparse mine planning data that was representative
of a GPS based survey. Not only are the kernel choice and local approximation method unique to this work
on terrain modeling, but the approach as a whole is also novel in that it addresses the problem at a scale not
previously attempted with this particular technique.
The model obtained naturally provided a multi-resolution representation of large scale terrain, effectively
handled both uncertainty and incompleteness in a statistically sound way and provided a powerful basis to
handle correlated spatial data in an appropriate manner. The Neural Network kernel was found to be much
more effective than the squared exponential kernel in modeling terrain data. Further, the various properties
of the neural network kernel, including its adaptive power and its being a universal approximator, make it an
excellent choice for modeling data, particularly when no a priori information (e.g. the geometry of the data)
is available or when data is sparse, unstructured and complex. This has been consistently demonstrated
throughout this paper. Where domain information is available (e.g. a flat surface), a domain specific kernel
(e.g. a linear kernel) may perform well but the NN kernel would still be very competitive. Thus, this work also
suggests further research in the area of kernel modeling - particularly combinations of kernels and domain
specific kernels.
Acknowledgements
This work has been supported by the Rio Tinto Centre for Mine Automation and the ARC Centre of
Excellence programme, funded by the Australian Research Council (ARC) and the New South Wales State
Government. The authors acknowledge the support of Annette Pal, James Batchelor, Craig Denham, Joel
Cockman and Paul Craine of Rio Tinto.
References
Candela, J. Q., Rasmussen, C. E., and Williams, C. K. I. (2007). Approximation methods for Gaussian
process regression. Technical Report MSR-TR-2007-124, Microsoft Research.
Durrant-Whyte, H. (2001). A Critical Review of the State-of-the-Art in Autonomous Land Vehicle Systems
and Technology. Technical Report SAND2001-3685, Sandia National Laboratories, USA.
Haas, T. C. (1990a). Kriging and automated variogram modeling within a moving window. Atmospheric
Environment, 24A:1759–1769.
Haas, T. C. (1990b). Lognormal and moving window methods of estimating acid deposition. Journal of the
American Statistical Association, 85:950–963.
Hornik, K. (1993). Some new results on neural network approximation. Neural Networks, 6(8):1069–1072.
Kidner, D., Dorey, M., and Smith, D. (1999). What’s the point? Interpolation and extrapolation with a
regular grid DEM. In Fourth International Conference on GeoComputation, Fredericksburg, VA, USA.
Kitanidis, P. K. (1997). Introduction to Geostatistics: Applications in Hydrogeology. Cambridge University
Press.
Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In
Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI), volume 2,
pages 1137–1143.
Krotkov, E. and Hoffman, R. (1994). Terrain mapping for a walking planetary rover. IEEE Transactions
on Robotics and Automation, 10(6):728–739.
Kweon, I. S. and Kanade, T. (1992). High-Resolution Terrain Map from Multiple Sensor Data. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 14(2):278–292.
Lacroix, S., Mallet, A., Bonnafous, D., Bauzil, G., Fleury, S., Herrb, M., and Chatila, R. (2002). Autonomous
rover navigation on unknown terrains: Functions and Integration. International Journal of Robotics
Research (IJRR), 21(10-11):917–942.
Lang, T., Plagemann, C., and Burgard, W. (2007). Adaptive Non-Stationary Kernel Regression for Terrain
Modeling. In Robotics, Science and Systems.
Leal, J., Scheding, S., and Dissanayake, G. (2001). 3D Mapping: A Stochastic Approach. In Australian
Conference on Robotics and Automation.
Mackay, D. J. C. (2002). Information Theory, Inference & Learning Algorithms. Cambridge University
Press.
Matheron, G. (1963). Principles of Geostatistics. Economic Geology, 58:1246–1266.
Moore, I. D., Grayson, R. B., and Ladson, A. R. (1991). Digital terrain modelling: A review of hydrological,
geomorphological, and biological applications. Hydrological Processes, 5(1):3–30.
Neal, R. M. (1996). Bayesian Learning for Neural Networks. Lecture Notes in Statistics 118. Springer, New
York.
Nocedal, J. and Wright, S. (2006). Numerical Optimization. Springer.
Paciorek, C. J. and Schervish, M. J. (2004). Nonstationary Covariance Functions for Gaussian Process
Regression. In Thrun, S., Saul, L., and Schölkopf, B., editors, Advances in Neural Information Processing
Systems (NIPS) 16. MIT Press, Cambridge, MA.
Plagemann, C., Kersting, K., and Burgard, W. (2008a). Nonstationary Gaussian Process Regression using
Point Estimates of Local Smoothness. In European Conference on Machine Learning (ECML).
Plagemann, C., Mischke, S., Prentice, S., Kersting, K., Roy, N., and Burgard, W. (2008b). Learning
Predictive Terrain Models for Legged Robot Locomotion. In International Conference on Intelligent
Robots and Systems (IROS). To appear.
Preparata, F. P. and Shamos, M. I. (1993). Computational Geometry: An Introduction. Springer.
Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. MIT Press.
Rekleitis, I., Bedwani, J., Gingras, D., and Dupuis, E. (2008). Experimental Results for Over-the-Horizon
Planetary exploration using a LIDAR sensor. In Eleventh International Symposium on Experimental
Robotics.
Sandwell, D. T. (1987). Biharmonic spline interpolation of GEOS-3 and Seasat altimeter data. Geophysical
Research Letters, 14(2):139–142.
Shen, Y., Ng, A., and Seeger, M. (2006). Fast Gaussian process regression using KD-trees. In Weiss, Y.,
Schölkopf, B., and Platt, J., editors, Advances in Neural Information Processing Systems (NIPS) 18,
pages 1225–1232. MIT Press, Cambridge, MA.
Triebel, R., Pfaff, P., and Burgard, W. (2006). Multi-Level Surface Maps for Outdoor Terrain Mapping and
Loop Closing. In International Conference on Intelligent Robots and Systems (IROS), Beijing, China.
Wackernagel, H. (2003). Multivariate geostatistics: an introduction with applications. Springer.
Watson, D. E. (1992). Contouring: A Guide to the Analysis and Display of Spatial Data. Pergamon (Elsevier
Science, Inc.).
Williams, C. K. I. (1998a). Computation with infinite neural networks. Neural Computation, 10(5):1203–
1216.
Williams, C. K. I. (1998b). Prediction with Gaussian processes: From linear regression to linear prediction
and beyond. In Jordan, M. I., editor, Learning in Graphical Models, pages 599–622. Springer.
Ye, C. and Borenstein, J. (2003). A new terrain mapping method for mobile robots obstacle negotiation. In
UGV Technology Conference at the SPIE AeroSense Symposium, volume 4, Orlando, FL, USA.