Sparse Grid-Based Nonlinear Filtering Professor: Ming-Shyan Wang

Student: Jia-Sin
The problem of estimating the state of a nonlinear stochastic
plant is considered. Unlike classical approaches such as the
extended Kalman filter, which are based on the linearization
of the plant and the measurement model, we concentrate on
the nonlinear filter equations such as the Zakai equation. The
numerical approximation of the conditional probability density
function (pdf) using ordinary grids suffers from the “curse of
dimension” and is therefore not applicable in higher dimensions.
It is demonstrated that sparse grids are an appropriate tool to
represent the pdf and to solve the filtering equations numerically.
The basic algorithm is presented. Using some enhancements it is
shown that problems in higher dimensions can be solved with an
acceptable computational effort. As an example a six-dimensional,
highly nonlinear problem, which is solved in real-time using a
standard PC, is investigated.
We consider the dynamics of a nonlinear
stochastic system given by the solution of
a stochastic differential equation.
The system states $(X_t)$ are measured via a
nonlinear stochastic measurement equation.
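Written out in a generic form, the two model equations might read as follows. This is only a sketch under assumptions: the drift f, the diffusion g, and the measurement function h are the symbols used later in the text, W is a Brownian motion, and the measurement times $t_k$ and the measurement noise $v_k$ are notation introduced here purely for illustration.

```latex
\begin{align}
  dX_t    &= f(X_t)\,dt + g(X_t)\,dW_t, \\
  Y_{t_k} &= h(X_{t_k}) + v_k .
\end{align}
```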
The filtering problem can briefly be described as the
problem of obtaining a (sub)optimal estimator for the
state $X_t$ given past measurements. There are various
ways of approaching such a problem. The most
common method is to apply an extended Kalman
filter (see, e.g., [1]), a method that is suitable and highly
efficient for systems with modest nonlinearities.
Since the extended Kalman filter is based on
linearizations of the system equation,
divergence is possible. In addition, nonsymmetric
or multimodal distributions cannot be treated
since classical Gaussian theory is applied.
Another widely used method is the particle filter
(see, e.g., [1]) which makes use of a reasonable
amount of state samples and propagates them through
simulation of the system. The particles, which may
be viewed as a discrete distribution approximating
the conditional probability distribution, are assigned
normalized weights. An updated approximation is
generated by changing the weights with respect to
the measurements (e.g., using Bayes formula). The
quality of the approximation is heavily dependent
on the importance density design as well as on the
number of particles. As a rule of thumb, the necessary
number of particles grows exponentially with the
number of dimensions of the system (the “curse of dimension”).
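The weight update of a particle filter can be sketched in a few lines. The example below is a minimal illustration for a scalar state, assuming additive Gaussian process and measurement noise; the function names (`predict`, `update_weights`, `weighted_mean`) and all parameters are hypothetical and not taken from the cited references.

```python
import math
import random


def predict(particles, f, process_std, rng):
    """Propagate every particle through the (assumed) model x -> f(x) + noise."""
    return [f(x) + rng.gauss(0.0, process_std) for x in particles]


def update_weights(particles, weights, y, h, meas_std):
    """Reweight particles with Bayes' formula for a measurement y = h(x) + noise.

    Assumption: the measurement noise is Gaussian with standard deviation meas_std.
    """
    unnormalized = [
        w * math.exp(-0.5 * ((y - h(x)) / meas_std) ** 2)
        for x, w in zip(particles, weights)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def weighted_mean(particles, weights):
    """State estimate as the weighted average of the particles."""
    return sum(w * x for x, w in zip(particles, weights))


if __name__ == "__main__":
    rng = random.Random(0)
    n = 2000
    particles = [rng.gauss(0.0, 1.0) for _ in range(n)]  # samples of the prior
    weights = [1.0 / n] * n
    weights = update_weights(particles, weights, y=2.0, h=lambda x: x, meas_std=1.0)
    print(weighted_mean(particles, weights))
```

With a standard normal prior and a direct measurement y = 2 of unit variance, the exact posterior mean is 1, so the printed estimate should lie close to that value.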
Furthermore, it is well known (e.g., [3]) that only
in very special setups is it possible to calculate the
estimate in closed form by a finite-dimensional
system of equations. Most important is the linear
case with the Kalman filter, which just needs to
update d conditional expected values and the d(d+1)/2
distinct entries of the covariance matrix. Since the linearity implies a Gaussian
conditional distribution, the complete distribution
is specified by these parameters. Other so-called
finite-dimensional filters are the Beneš [4] and the
Daum filters [2], which are of utmost theoretical
interest but require artificial and restrictive conditions
on f and h.
To specify the whole conditional distribution of X
given the filtration $\mathcal{F}^Y$ generated by Y, it is typically
necessary to consider an infinite
number of parameters.
There exist various publications that describe and
compare several approaches to the
filtering problem. Li and Jilkov [3] as well as Daum
[5] provide a general overview, while Farina, et al.
[6] compare different methods with respect to a
tracking application.
The approach of this paper is based on the representation of
the conditional probability density function (pdf) by means of
sparse grids. Due to their huge computational effort, well-known
grid-based nonlinear filtering methods are typically
used for lower dimensions only. Challa and Bar-Shalom [7]
solve a tracking problem in two dimensions, and Musick, et al.
[8] compare a special finite difference
approach with a particle filter in four dimensions. Spencer and
Bergman [9] propose a finite element method applied in two
dimensions. The point-mass approach in [10] is also applied in
two dimensions. Zhang and Laneuville [11] use a special grid
approach and solve a tracking problem in four dimensions,
which, however, cannot be computed in real-time. We
demonstrate that by using a sparse grid representation of the
pdf, classical filtering formulas
can be treated numerically with reasonable effort even for
higher dimensions.
Sparse grids were first introduced by Zenger and
have been widely used, e.g., in the area of finance
mathematics [12].
The basic idea is to decompose the space of
piecewise multi-linear functions $\Phi : [0,1]^d \to \mathbb{R}$
into hierarchical subspaces and then consider only
those subspaces for which the contribution to
the interpolation of smooth functions is significant.
With multi-indices l and i, the multi-linear basis
functions are tensor products of one-dimensional hat functions.
In the sequel we call l the level and i the index of a
sparse grid point. The basis functions are centered at the grid
points $x_{l,i} = (i_1 \cdot h_{l_1}, \dots, i_d \cdot h_{l_d})^T$ with level-dependent
hierarchical grid width $h_{l_j} = 2^{-l_j}$. The
space of piecewise multi-linear functions of level l in
the interior of $[0,1]^d$ is then spanned by these basis functions.
Fig. 1. Hierarchical subspaces $W_l$, $|l|_\infty \le 3$, and sparse grid of level L = 5.
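For reference, the usual sparse grid construction defines the basis functions and spaces as sketched below. The notation (the one-dimensional hat function $\varphi$ and the nodal space $V_l$) is assumed here for illustration and may differ from the formulas of the original slides.

```latex
\begin{align*}
  \varphi(t)       &= \max(1 - |t|,\, 0), \\
  \varphi_{l,i}(x) &= \prod_{j=1}^{d}
                      \varphi\!\left(\frac{x_j - i_j h_{l_j}}{h_{l_j}}\right),
                      \qquad h_{l_j} = 2^{-l_j}, \\
  V_l &= \operatorname{span}\{\varphi_{l,i} : 1 \le i_j \le 2^{l_j} - 1,\ j = 1,\dots,d\}, \\
  W_l &= \operatorname{span}\{\varphi_{l,i} : 1 \le i_j \le 2^{l_j} - 1,\ i_j \text{ odd},\ j = 1,\dots,d\}.
\end{align*}
```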
In the hierarchical space $W_l$, which is spanned by the basis
functions with odd indices only,
the hierarchical surplus $u_{l,i}$ contains the difference
at $x_{l,i}$ between the coarser and the finer
interpolation. While the number of grid points
increases considerably with the level, it turns out
(see Zenger [12]) that the gain in interpolation
accuracy becomes comparatively small for smooth
functions. The idea of sparse grids is, instead of using
a full grid with $\bigoplus_{|k|_\infty \le L} W_k$, to only use the lower
level hierarchical subspaces and form
$\bigoplus_{|k|_1 \le L+d-1} W_k$ (a tetrahedron of subspaces is
composed instead of a cuboid). L is called the level
of the sparse grid. It should be noted that no grid
points are located at the boundary of the domain.
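The saving in grid points can be made concrete with a small enumeration. The sketch below assumes the standard construction (levels $l_j \ge 1$ with $|l|_1 \le L + d - 1$, odd indices $i_j$ between 1 and $2^{l_j} - 1$); the function names are hypothetical.

```python
from itertools import product


def sparse_grid_levels(d, L):
    """Yield all level multi-indices l (l_j >= 1) with |l|_1 <= L + d - 1."""
    for l in product(range(1, L + 1), repeat=d):
        if sum(l) <= L + d - 1:
            yield l


def sparse_grid_points(d, L):
    """Return the interior sparse grid points x_{l,i} in (0,1)^d as tuples."""
    points = []
    for l in sparse_grid_levels(d, L):
        # Hierarchical subspace W_l: odd indices 1, 3, ..., 2^{l_j} - 1 per direction.
        index_ranges = [range(1, 2 ** lj, 2) for lj in l]
        for i in product(*index_ranges):
            # Grid point coordinates i_j * h_{l_j} with h_{l_j} = 2^{-l_j}.
            points.append(tuple(ij * 2.0 ** (-lj) for ij, lj in zip(i, l)))
    return points


def full_grid_size(d, L):
    """Number of interior points of the full grid with width 2^{-L}."""
    return (2 ** L - 1) ** d


if __name__ == "__main__":
    for d, L in [(2, 3), (2, 5), (4, 5)]:
        print(d, L, len(sparse_grid_points(d, L)), full_grid_size(d, L))
```

For d = 2 and L = 3 the enumeration gives 17 sparse grid points compared with 49 interior points of the full grid, and the gap widens quickly with the dimension.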
Figure 1 depicts an example of a sparse grid of level L = 5.
We are considering sparse grids for the
discretization of the estimators’ pdfs. Let D2 be
the class of pdfs with
To handle typical classes of pdfs, some slight modifications to the concept of
sparse grids have to be applied. Firstly, in the classical definition of sparse grids,
the domain of functions u to be approximated is $[0,1]^d$; by a trivial rescaling
$\tilde{p}(x) := p(2ax - (a, \dots, a)^T)$, the domain can be extended to functions
with general compact support $C \subseteq [-a,a]^d$.
Secondly, since pdfs with infinite support (such
as Gaussian pdfs) shall also be handled, a
slightly different convergence concept is
introduced. Let $D \subset \{p : \mathbb{R}^d \to \mathbb{R}_+\}$ be a class of
pdfs. We call an approximation method for D
using (regular or sparse) grids converging of
order f(R,n) in probability if, for all $p \in D$ and $\varepsilon > 0$,
there exists a radius R > 0 of a sphere
$S(R) := \{x \in \mathbb{R}^d : \|x\|_\infty \le R\} = [-R,R]^d$ with
$\int_{S(R)} p(x)\,dx \ge 1 - \varepsilon$
such that the restrictions of the pdfs to S(R)
converge with the stated order,
$\|p|_{S(R)} - p_n|_{S(R)}\|_\infty = O(f(R,n))$. Herein, p is approximated
by $p_n$ using the grid which spans $[-R,R]^d$.
Observe that the basic grid size for the
approximation $p_n$ is $h_n = 2R \cdot 2^{-n}$ (n equals the
level L).
Let $(\Omega, \mathcal{F}, P)$ be a probability space endowed with
a right-continuous filtration $(\mathcal{F}_t)$, and let W and
V be d- and m-dimensional adapted Brownian motions, respectively.
The system state is defined by an adapted
stochastic process $X = (X_t)$, $X_t \in \mathbb{R}^d$. X is given
as the
strong solution of a nonlinear stochastic differential equation (3),
which represents the dynamics as well as the
stochastic properties of the system.
In contrast to the discrete measurement (see
(2)), the continuous measurement is modeled by
another stochastic process Y, again defined (up
to versions) as a strong solution of a
stochastic differential equation (4).
It is well known (see [14, ch. 8.6]) that a strong
solution exists if appropriate growth conditions
on the measurable functions f, g, and h
are satisfied.
By interpreting (3) as a system equation and
(4) as a measurement equation, the problem of
estimating the current state $X_t$ of the object by
using measurements $Y_s$, $s \le t$, can be seen as a filtering
problem: let $\mathcal{F}^Y_t$ be the filtration generated by Y. We
are considering the problem of finding an optimal
(in the $L^2$ sense) $\mathcal{F}^Y$-adapted estimate of X. It can
be easily seen that this problem is equivalent to
finding the conditional expectation $E(X_t \mid \mathcal{F}^Y_t)$.
We are assuming that the conditional pdf $p_t$,
which is a measurable function of (t,x), exists. The analysis
of the evolution of the conditional distribution is part of
the general filter theory. Under very mild assumptions,
filter formulas such as the Kushner-Stratonovich equation
(see [15, Theorem 3.30]) have been developed. The Kushner-Stratonovich
equation is equivalent to a stochastic partial
differential equation for the pdf if the solution of the
differential equation exists (see [14, Theorem 8.6]). For
details on the existence of the conditional pdf see also
[15, Theorem 7.11]. A thorough analysis of the conditions
and properties of the solution is contained in [16].
The right-hand side of this equation may be
seen as the sum of a propagation part, containing
a transport (or advection) term (the first line), a
dissipation (or diffusion) term (the second line), and
the innovation part (the third and fourth lines), which
handles measurements.
While the transport term shifts the pdf
according to the model, the diffusion term
widens the pdf in time, which inserts uncertainty
into the estimation. The measurement term will
in turn narrow the pdf due to the measurement
update. The discrete time case is similar. The
innovation part,
which can be seen as an abstract version of
Bayes’ rule, is simply replaced by the classical
Bayes’ rule (6), while the propagation is
performed via the partial differential equation (5).
Herein, $p_t^+$ denotes the conditional pdf of $X_t$
just after the measurement at time t has been
considered, and $p_t^-$ denotes the pdf just before
the measurement. As the measurement time
intervals tend to zero, the discrete time solution
converges towards the continuous time solution.
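In generic notation, the two steps of the discrete time filter can be sketched as follows. The formulas assume a state equation of the form $dX_t = f\,dt + g\,dW_t$ and a measurement likelihood written as $p(y_t \mid x)$; both are illustrative notation and not necessarily identical to the original equations (5) and (6).

```latex
\begin{align*}
  \frac{\partial p_t}{\partial t}
    &= -\sum_{i=1}^{d} \frac{\partial}{\partial x_i}\bigl(f_i\, p_t\bigr)
       + \frac{1}{2} \sum_{i,j=1}^{d}
         \frac{\partial^2}{\partial x_i\, \partial x_j}
         \bigl((g g^{T})_{ij}\, p_t\bigr), \\
  p_t^{+}(x)
    &= \frac{p(y_t \mid x)\, p_t^{-}(x)}
            {\int p(y_t \mid \xi)\, p_t^{-}(\xi)\, d\xi}.
\end{align*}
```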
In the sequel we consider the discrete time case.
Conventional approaches for solving partial
differential equations numerically suffer
from the curse of dimension. To achieve a
given order of approximation accuracy, the
number of grid points grows exponentially
with the dimension, at a rate that depends on
the smoothness r of the function f. The dimension
of the filtering problems considered in this paper
typically ranges from 5 to 10 and is therefore
out of reach for real-time applications with
regular grid methods. Even the use of adaptive
grids for only 4 dimensions could not be
implemented in real-time (see [11]). However,
the technique of sparse grids offers a possibility
to significantly lower the number of necessary
grid points from $O(N^d)$ to $O(N (\log N)^{d-1})$.
In Section II the notion of sparse grids was
introduced based on hierarchical spaces of
multi-linear basis functions. The presented
filtering algorithm is a hybrid approach, which
uses the nodal function values for the pdf at
each grid point instead of the approximation on
hierarchical spaces.
A. Propagation
Finite differences are used to discretize the
propagation equation (5). On the boundaries the pdf is set to
zero. The following scheme is proposed. A first-order forward
scheme for the time derivative
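A minimal sketch of such a propagation step is given below. It is deliberately simplified: it works on a uniform one-dimensional grid rather than a sparse grid, and the spatial stencils (first-order upwind differences for the transport term, central differences for the diffusion term) are assumptions made for illustration, since the spatial part of the scheme is not reproduced in the text. The function name `propagate_step` is hypothetical.

```python
def propagate_step(p, x, f, g, dt):
    """One explicit (forward Euler) time step of the 1-D propagation equation

        dp/dt = -d(f p)/dx + 0.5 * d^2(g^2 p)/dx^2

    on a uniform grid x, with the pdf set to zero on both boundaries.
    """
    n = len(p)
    dx = x[1] - x[0]
    flux = [f(xi) * pi for xi, pi in zip(x, p)]               # transport flux f * p
    diff = [0.5 * g(xi) ** 2 * pi for xi, pi in zip(x, p)]    # diffusion term 0.5 * g^2 * p
    new_p = [0.0] * n                                          # boundary values stay zero
    for k in range(1, n - 1):
        # Upwind difference: look against the direction of the drift.
        if f(x[k]) >= 0.0:
            advection = (flux[k] - flux[k - 1]) / dx
        else:
            advection = (flux[k + 1] - flux[k]) / dx
        # Central second difference for the diffusion term.
        diffusion = (diff[k + 1] - 2.0 * diff[k] + diff[k - 1]) / dx ** 2
        new_p[k] = p[k] + dt * (-advection + diffusion)
    return new_p
```

As with any explicit scheme, the time step has to be small relative to the grid width; in this sketch both `abs(f) * dt / dx` and `g**2 * dt / dx**2` should stay well below 1 to keep the update stable and non-negative.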
B. Measurement Update
The pdf is updated using measurements
according to Bayes’ rule (6). As the nodal values
of the pdf are stored at the sparse grid points,
the multiplication can simply be performed pointwise
at every sparse grid point x.
The denominator has to be evaluated
numerically. Interpolating the integrand in
$\bigoplus_{|k|_1 \le L+d-1} W_k$ would result in a quadrature rule
with negative weights for some of the grid
points, which could cause severe problems.
Therefore, as an alternative quadrature rule,
a weighting of every grid point by the approximate
volume of its nearest domain is used.
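The update can be sketched as a pointwise multiplication followed by a normalization with strictly positive quadrature weights. The example below is a one-dimensional illustration under assumptions: the measurement noise is taken to be Gaussian, and the "volume of the nearest domain" is approximated by half the distance to each neighboring grid point. The function names are hypothetical.

```python
import math


def nearest_cell_volumes(xs):
    """Approximate length of the region nearest to each point of a sorted 1-D grid."""
    n = len(xs)
    volumes = []
    for k in range(n):
        # At the ends, mirror the single available spacing (an assumption).
        left = xs[k] - xs[k - 1] if k > 0 else xs[1] - xs[0]
        right = xs[k + 1] - xs[k] if k < n - 1 else xs[-1] - xs[-2]
        volumes.append(0.5 * (left + right))
    return volumes


def bayes_update(xs, prior, y, h, meas_std):
    """Pointwise Bayes update of nodal pdf values for a measurement y = h(x) + noise.

    Assumption: Gaussian measurement noise with standard deviation meas_std.
    """
    likelihood = [math.exp(-0.5 * ((y - h(x)) / meas_std) ** 2) for x in xs]
    unnormalized = [l * p for l, p in zip(likelihood, prior)]
    volumes = nearest_cell_volumes(xs)
    # Denominator of Bayes' rule, evaluated with positive weights only.
    evidence = sum(u * v for u, v in zip(unnormalized, volumes))
    return [u / evidence for u in unnormalized]
```

Because all weights are positive, the normalized values remain non-negative and integrate to one under the same quadrature rule.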
C. Expected Values
Usually, only certain characteristics of the pdf are of
interest. For most cases it is required to extract the
expected value $\int x\, p_t(x)\, dx$.
Instead of approximating the integrand by a
piecewise constant or linear function as is done in
the normalization of Bayes’ rule, we use a situation-adapted
rule to achieve a more accurate
approximation. This is accomplished by interpolating
the pdf itself piecewise using Gaussian densities
in every coordinate direction.
Let $x_i = x_0$, $x_j = x_0 + \Delta x$, and $x_k = x_0 - \Delta x$ be
three neighboring sparse grid points (again $\Delta x$ is
the local grid width). The interpolation points
$(x_i, p_t(x_i))$, $(x_j, p_t(x_j))$, $(x_k, p_t(x_k))$ determine the parameters
of the interpolating Gaussian density.
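One way to obtain such a local Gaussian from three equidistant points is to fit a parabola to the logarithm of the pdf values. The sketch below assumes the Gaussian piece has the form $a \exp(-(x-\mu)^2 / (2\sigma^2))$ and that all three values are strictly positive; this parametrization and the function name `fit_gaussian` are assumptions, since the original formula is not reproduced in the text.

```python
import math


def fit_gaussian(x0, dx, p_minus, p_center, p_plus):
    """Fit a * exp(-(x - mu)^2 / (2 sigma^2)) through the three points
    (x0 - dx, p_minus), (x0, p_center), (x0 + dx, p_plus).

    Returns (a, mu, sigma), or None if the log-values are not concave,
    in which case no Gaussian passes through the points.
    Assumption: all three pdf values are strictly positive.
    """
    log_m, log_c, log_p = math.log(p_minus), math.log(p_center), math.log(p_plus)
    # Quadratic through the log-values: log p ~ c0 + c1 * t + c2 * t^2, t = x - x0.
    c2 = (log_p - 2.0 * log_c + log_m) / (2.0 * dx * dx)
    c1 = (log_p - log_m) / (2.0 * dx)
    if c2 >= 0.0:
        return None
    sigma2 = -1.0 / (2.0 * c2)
    mu = x0 + c1 * sigma2
    a = math.exp(log_c + 0.5 * c1 * c1 * sigma2)
    return a, mu, math.sqrt(sigma2)
```

Once the parameters are known, the contribution of the local piece to the expected value can be integrated in closed form instead of relying on a piecewise constant or linear approximation.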
The advection part forces the density to move
across the state domain with time. Large
density shifts would imply unnecessarily high
computational effort to discretize the whole
region in every time step. A significant
performance gain can be achieved by grid-tiling and grid-drifting techniques.
Tiling helps in dealing with pdfs that move and
widen over time due to drift and diffusion. Also, tiling
restricts the computational effort to a possibly small
subset of the state domain, which carries a probability
close to 1. In contrast to regular grids, it is not natural to
expand sparse grids just by adding a few rows of grid
points since this would contradict the hierarchical
structure of those grids. If we want to add or delete grid
points, we do so by adding or removing an entire sparse
grid. Initially, one tile covers the whole region of interest,
and in the further computation, only a few tiles are
typically used.
To represent a more general area necessary for the
filtering process for a moving and shape-changing
pdf, we cover the relevant region of the domain with
tiles, each containing a sparse grid. Since the sparse
grid does not contain boundary points, we have to add
connecting boundary layers between the sparse grid
tiles. Those boundary layers (“glue layers”) are sparse
grids themselves with a lower dimension. A
layer is added only if all its possible neighbors are
part of the tiling.
This process is driven by the d-dimensional tiles
only. A new tile is introduced if the probability on
a rectangular strip of quarter tile width close to a
border exceeds a certain threshold. The necessary
boundary layers to existing neighbors (starting from
$(d-1)$-dimensional boundaries down to 0-dimensional boundaries) are added afterwards. We
choose the threshold such that for a Gaussian
distribution with the same variances as the initial
one, a tile is added if the tail outside the tile carries
more than 0.01 probability.
A tile is removed if the integral of the pdf on this
tile is below a given threshold $\alpha$. This is done
in reverse order to the process of adding tiles. The lowest
dimensional associated boundary tiles are removed
first, and the d-dimensional tile itself is deleted last.
It is reasonable to use a threshold similar to the
adding threshold; we chose $\alpha = 0.01$. The check
whether tiles are still necessary or new tiles are
required is done in every time step. To save
computing time the check could be performed in
regular time intervals depending on the dynamics of
the system.
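The add and remove decisions can be summarized in a short bookkeeping routine. The sketch below is a simplified illustration: tiles are identified by integer index tuples, the probability masses are assumed to be computed elsewhere, and the lower-dimensional glue layers are omitted entirely. The function name `update_tiles` and its data layout are hypothetical.

```python
def update_tiles(tiles, tile_mass, strip_mass, add_threshold, remove_threshold=0.01):
    """Return the new set of d-dimensional tiles after one check.

    tiles:       set of integer index tuples identifying the current tiles
    tile_mass:   dict mapping a tile to the integral of the pdf on that tile
    strip_mass:  dict mapping (tile, axis, direction) to the probability on the
                 quarter-width strip next to that border (direction is +1 or -1)
    """
    # Remove tiles whose probability mass fell below the threshold alpha.
    kept = {t for t in tiles if tile_mass.get(t, 0.0) >= remove_threshold}
    added = set()
    for (tile, axis, direction), mass in strip_mass.items():
        # A new neighbor is introduced if the strip close to a border
        # of a remaining tile carries too much probability.
        if tile in kept and mass > add_threshold:
            neighbor = list(tile)
            neighbor[axis] += direction
            neighbor = tuple(neighbor)
            if neighbor not in kept:
                added.add(neighbor)
    return kept | added


if __name__ == "__main__":
    tiles = {(0, 0), (1, 0)}
    tile_mass = {(0, 0): 0.995, (1, 0): 0.005}
    strip_mass = {((0, 0), 1, +1): 0.02, ((0, 0), 0, -1): 0.001}
    print(sorted(update_tiles(tiles, tile_mass, strip_mass, add_threshold=0.01)))
```

In this usage example the nearly empty tile (1, 0) is dropped, while the strip at the upper border of tile (0, 0) triggers the new neighbor (0, 1).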
[1] Simon, D. Optimal State Estimation. Hoboken, NJ: Wiley, 2006.
[2] Daum, F. E. New exact nonlinear filters: Theory and
applications. Proceedings of SPIE, vol. 2235, 1994, pp. 636–649.
[3] Li, X. R. and Jilkov, V. P. A survey of maneuvering target
tracking–Part VI(a): Density-based exact nonlinear filtering.
Proceedings of the 2010 SPIE Conference on Signal and Data
Processing of Small Targets, Orlando, FL, Apr. 6–8, 2010.
[4] Beneš, V. E. Exact finite-dimensional filters for certain
diffusions with nonlinear drift. Stochastics, 5, 1–2 (1981), 65–92.
[5] Daum, F. Nonlinear filters: Beyond the Kalman filter. IEEE
Aerospace and Electronic Systems Magazine, 20, 8 (Aug. 2005), 57–
[6] Farina, A., Ristic, B., and Benvenuti, D. Tracking a ballistic
target: Comparison of several nonlinear filters. IEEE Transactions
on Aerospace and Electronic Systems, 38, 3 (July 2002), 854–866.