Sparse Grid-Based Nonlinear Filtering
IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS, VOL. 49, NO. 4, OCTOBER 2013
Professor: Ming-Shyan Wang
Student: Jia-Sin Hong (Control and Chip Group, Year 4, Class B), 4992C073

INTRODUCTION
SPARSE GRIDS
THE PROBLEM
THE BASIC IMPLEMENTATION
ENHANCEMENTS
REFERENCES

The problem of estimating the state of a nonlinear stochastic plant is considered. Unlike classical approaches such as the extended Kalman filter, which are based on the linearization of the plant and the measurement model, we concentrate on the nonlinear filter equations such as the Zakai equation. The numerical approximation of the conditional probability density function (pdf) using ordinary grids suffers from the "curse of dimension" and is therefore not applicable in higher dimensions. It is demonstrated that sparse grids are an appropriate tool to represent the pdf and to solve the filtering equations numerically. The basic algorithm is presented. Using some enhancements, it is shown that problems in higher dimensions can be solved with an acceptable computational effort. As an example, a six-dimensional, highly nonlinear problem, which is solved in real time on a standard PC, is investigated.

I. INTRODUCTION

We consider the dynamics of a nonlinear stochastic system given by the solution of a stochastic differential equation (1). The system states $(X_t)$ are measured via a nonlinear stochastic measurement equation (2). The filtering problem can briefly be described as the problem of obtaining an optimal or suboptimal estimator for the state $X_t$ given past measurements.

There are various ways of approaching such a problem. The most common method is to apply an extended Kalman filter (see, e.g., [1]), a method that is suitable and highly efficient for systems with modest nonlinearities. Since the extended Kalman filter is based on linearizations of the system equation, divergence is possible. In addition, nonsymmetric or multimodal distributions cannot be treated, since classical Gaussian theory is applied.
Another widely used method is the particle filter (see, e.g., [1]), which draws a reasonable number of state samples and propagates them by simulating the system. The particles, which may be viewed as a discrete distribution approximating the conditional probability distribution, are assigned normalized weights. An updated approximation is generated by changing the weights with respect to the measurements (e.g., using Bayes' formula). The quality of the approximation depends heavily on the design of the importance density as well as on the number of particles. As a rule of thumb, the necessary number of particles grows exponentially with the number of dimensions of the system (the "curse of dimension").

Furthermore, it is well known (e.g., [3]) that only in very special setups is it possible to calculate the estimate in closed form by a finite-dimensional system of equations. Most important is the linear case with the Kalman filter, which just needs to update $d$ conditional expected values and the $d(d+1)/2$ distinct entries of the conditional covariance matrix ($d$ variances and $d(d-1)/2$ covariances). Since linearity implies a Gaussian conditional distribution, the complete distribution is specified by these parameters. Other so-called finite-dimensional filters are the Beneš [4] and the Daum filters [2], which are of utmost theoretical interest but require artificial and restrictive conditions on $f$ and $h$. To specify the whole conditional distribution of $X$ given the filtration $\mathcal{F}^Y$ generated by $Y$, it is typically necessary to consider an infinite number of parameters.

Various publications describe and compare the approaches to the nonlinear filtering problem. Li and Jilkov [3] as well as Daum [5] provide a general overview, while Farina et al. [6] compare different methods with respect to a tracking application. The approach of this paper is based on the representation of the conditional probability density function (pdf) by means of sparse grids.
Due to their huge computational effort, well-known grid-based nonlinear filtering methods are typically used for lower dimensions only. Challa and Bar-Shalom [7] solve a tracking problem in two dimensions, and Musick et al. [8] compare a special finite difference approach with a particle filter in four dimensions. Spencer and Bergman [9] propose a finite element method applied in two dimensions. The point-mass approach in [10] is also applied in two dimensions. Zhang and Laneuville [11] use a special grid-adaptive approach and solve a tracking problem in four dimensions, which, however, cannot be computed in real time. We demonstrate that by using a sparse grid representation of the pdf, classical filtering formulas can be treated numerically with reasonable effort even for higher dimensions.

II. SPARSE GRIDS

Sparse grids were first introduced by Zenger and have been widely used, e.g., in the area of financial mathematics [12]. The basic idea is to decompose the space of piecewise multilinear functions $\Phi : [0,1]^d \to \mathbb{R}$ into hierarchical subspaces and then to consider only those subspaces whose contribution to the interpolation of smooth functions is significant. The multilinear basis functions are indexed by multi-indices $l$ and $i$. In the sequel we call $l$ the level and $i$ the index of a sparse grid point. The basis functions are centered at the grid points $x_{l,i} = (i_1 h_{l_1}, \ldots, i_d h_{l_d})^T$ with level-dependent hierarchical grid width $h_{l_j} = 2^{-l_j}$. The space of piecewise multilinear functions of level $l$ in the interior of $[0,1]^d$ is denoted by $W_l$.

Fig. 1. Hierarchical subspaces $W_l$, $|l|_\infty \le 3$, and sparse grid of level $L = 5$.

In the hierarchical representation, the hierarchical surplus $u_{l,i}$ contains the difference at $x_{l,i}$ between the coarser and the finer interpolation. While the number of grid points increases considerably with the level, it turns out (see Zenger [12]) that the gain in interpolation accuracy becomes comparatively small for smooth functions.
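The decay of the hierarchical surpluses can be illustrated in one dimension. The following is a minimal sketch, not the authors' implementation: it assumes the standard hat function on $[0,1]$, odd indices on each level, and a test function $\sin(\pi x)$ that vanishes on the boundary. The names `hat`, `surpluses`, and `interpolate` are hypothetical.

```python
import math

def hat(level, index, x):
    """1-D hat basis function centered at index * 2**-level (assumed standard form)."""
    h = 2.0 ** -level
    return max(0.0, 1.0 - abs(x / h - index))

def surpluses(f, max_level):
    """Hierarchical surpluses u[(level, index)] of f, assuming f(0) = f(1) = 0."""
    u = {}
    for level in range(1, max_level + 1):
        h = 2.0 ** -level
        for index in range(1, 2 ** level, 2):  # only odd indices are new on a level
            x = index * h
            # The coarser interpolant is linear between the two neighbours of x,
            # so the surplus is f(x) minus the mean of the neighbouring values.
            u[(level, index)] = f(x) - 0.5 * (f(x - h) + f(x + h))
    return u

def interpolate(u, x):
    """Evaluate the hierarchical interpolant at x."""
    return sum(c * hat(l, i, x) for (l, i), c in u.items())

f = lambda x: math.sin(math.pi * x)  # smooth test function (illustrative choice)
u = surpluses(f, 6)
# Largest surplus magnitude on each level 1..6.
max_per_level = [
    max(abs(c) for (l, _), c in u.items() if l == level) for level in range(1, 7)
]
print(max_per_level)
```

For this smooth function the largest surplus on a level shrinks by roughly a factor of four per level, which is why the higher-level subspaces contribute little and can be dropped.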
The idea of sparse grids is, instead of using a full grid with $\bigoplus_{|k|_\infty \le L} W_k$, to use only the lower-level hierarchical subspaces and form $\bigoplus_{|k|_1 \le L+d-1} W_k$ (a tetrahedron of subspaces is composed instead of a cuboid). $L$ is called the level of the sparse grid. It should be noted that no grid points are located on the boundary of the domain. Figure 1 depicts an example of a sparse grid of level 5.

We consider sparse grids for the discretization of the estimators' pdfs. Let $D_2$ be the class of pdfs with ... To handle typical classes of pdfs, some slight modifications to the concept of sparse grids have to be applied.

First, in the classical definition of sparse grids, the domain of the functions $u$ to be approximated is $[0,1]^d$; by a trivial rescaling $\tilde{p}(x) := p(2ax - (a, \ldots, a)^T)$, the domain can be extended to functions supported on general compact subsets $C \subseteq [-a,a]^d$.

Second, since pdfs with infinite support (such as Gaussian pdfs) shall also be handled, a slightly different convergence concept is introduced. Let $D \subset \{p : \mathbb{R}^d \to \mathbb{R}_+\}$ be a class of pdfs. We call an approximation method for $D$ using (regular or sparse) grids converging of order $f(R,n)$ in probability if for all $p \in D$ and $\varepsilon > 0$ there exists a radius $R > 0$ of a sphere $S(R) := \{x \in \mathbb{R}^d : \|x\|_\infty \le R\} = [-R,R]^d$ carrying a probability mass of at least $1 - \varepsilon$ such that the restrictions of the pdfs to $S(R)$ converge with the stated order, $\| p|_{S(R)} - p_n|_{S(R)} \|_\infty = O(f(R,n))$. Herein, $p$ is approximated by $p_n$ using the grid that spans $[-R,R]^d$. Observe that the basic grid size for the approximation $p_n$ is $h_n = 2R \cdot 2^{-n}$ ($n$ equals the level $L$).

III. THE PROBLEM

Let $(\Omega, \mathcal{F}, P)$ be a probability space endowed with a right-continuous filtration $(\mathcal{F}_t)$, and let $W$ and $V$ be $d$- and $m$-dimensional adapted Brownian motions. The system state is defined by an adapted stochastic process $X = (X_t)$, $X_t \in \mathbb{R}^d$. $X$ is given as the strong solution of a nonlinear stochastic differential equation (3), which represents the dynamics as well as the stochastic properties of the system.
In contrast to the discrete measurement (see (2)), the continuous measurement is modeled by another stochastic process $Y$, again defined (up to versions) as a strong solution of a stochastic differential equation (4). It is well known (see [14, ch. 8.6]) that a strong solution exists if appropriate growth conditions are satisfied by the measurable functions of the model ($q$ denotes any of $f$, $g$, ...).

By interpreting (3) as a system equation and (4) as a measurement equation, the problem of estimating the current state $X_t$ of the object using only the measurements $Y_s$, $s \le t$, can be seen as a filtering problem. Let $\mathcal{F}^Y_t$ be the filtration generated by $Y$. We consider the problem of finding an optimal (in the $L^2$ sense) $\mathcal{F}^Y$-adapted estimate of $X$. It can easily be seen that this problem is equivalent to finding the conditional expectation $E(X_t \mid \mathcal{F}^Y_t)$. We assume that the conditional pdf $p_t$, which is a measurable function of $(t,x)$, exists.

The analysis of the evolution of the conditional distribution is part of general filter theory. Under very mild assumptions, filter formulas such as the Kushner-Stratonovich equation (see [15, Theorem 3.30]) have been developed. The Kushner-Stratonovich equation is equivalent to a stochastic partial differential equation for the pdf if the solution of the differential equation exists (see [14, Theorem 8.6]). For details on the existence of the conditional pdf see also [15, Theorem 7.11]. A thorough analysis of the conditions and properties of the solution is contained in [16].

The right-hand side of this equation may be seen as the sum of a propagation part, containing a transport (or advection) term (the first line) and a dissipation (or diffusion) term (the second line), and the innovation part (the third and fourth lines), which handles measurements. While the transport term shifts the pdf according to the model, the diffusion term widens the pdf over time, which adds uncertainty to the estimate.
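The shifting and widening effect of the transport and diffusion terms can be reproduced in a small one-dimensional experiment. The following is an illustrative sketch only: it assumes a scalar model with drift `drift(x)` and constant noise intensity `sigma`, for which the propagation part is the Fokker-Planck equation $\partial_t p = -\partial_x(f p) + \tfrac{1}{2}\sigma^2 \partial_x^2 p$, and it uses a regular grid with central differences in space rather than a sparse grid. The function names and all numerical values are hypothetical.

```python
import numpy as np

def propagate(p, x, drift, sigma, dt):
    """One explicit Euler step of dp/dt = -d(drift*p)/dx + 0.5*sigma**2 * d2p/dx2."""
    dx = x[1] - x[0]                      # uniform grid spacing (assumed)
    flux = drift(x) * p                   # transport flux f(x) * p(x)
    new = p.copy()
    new[1:-1] += dt * (
        -(flux[2:] - flux[:-2]) / (2.0 * dx)                          # transport
        + 0.5 * sigma**2 * (p[2:] - 2.0 * p[1:-1] + p[:-2]) / dx**2   # diffusion
    )
    new[0] = new[-1] = 0.0                # pdf is set to zero on the boundary
    return new

def moments(p, x):
    """Mass, mean and variance of a gridded pdf (rectangle rule)."""
    dx = x[1] - x[0]
    mass = np.sum(p) * dx
    mean = np.sum(x * p) * dx / mass
    var = np.sum((x - mean) ** 2 * p) * dx / mass
    return mass, mean, var

x = np.linspace(-10.0, 10.0, 401)
p0 = np.exp(-0.5 * (x / 0.5) ** 2) / (0.5 * np.sqrt(2.0 * np.pi))  # N(0, 0.25)
p = p0.copy()
for _ in range(1000):                     # integrate up to t = 2
    p = propagate(p, x, lambda z: np.ones_like(z), sigma=0.5, dt=0.002)

mass0, mean0, var0 = moments(p0, x)
mass1, mean1, var1 = moments(p, x)
print(mean0, var0, "->", mean1, var1)
```

With unit drift the mean moves by about the elapsed time, while the variance grows by about `sigma**2 * t`. The explicit step is only stable for sufficiently small `dt` relative to `dx**2 / sigma**2`.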
The measurement term will in turn narrow the pdf due to the measurement update.

The discrete-time case is similar. The innovation part, which can be seen as an abstract version of Bayes' rule, is simply replaced by the classical Bayes' rule (6), so that the propagation is performed via the partial differential equation (5). Herein, $p^+_t$ denotes the conditional pdf of $X_t$ just after the measurement at time $t$ has been considered, and $p^-_t$ denotes the pdf just before the measurement. As the measurement time intervals tend to zero, the discrete-time solution converges towards the continuous-time solution. In the sequel we consider the discrete-time formulation.

IV. THE BASIC IMPLEMENTATION

Conventional approaches for solving partial differential equations numerically suffer from the curse of dimension. To achieve a given approximation accuracy for a function $f$ of smoothness $r$, the number of grid points grows exponentially with the dimension. The dimension of the filtering problems considered in this paper typically ranges from 5 to 10 and is therefore out of reach for real-time applications with regular grid methods. Even the use of adaptive grids for only 4 dimensions could not be implemented in real time (see [11]). However, the technique of sparse grids offers a possibility to significantly lower the number of necessary grid points from $O(N^d)$ to $O(N (\log N)^{d-1})$.

In Section II the notion of sparse grids was introduced based on hierarchical spaces of multilinear basis functions. The presented filtering algorithm is a hybrid approach, which uses the nodal function values of the pdf at each grid point instead of the approximation on hierarchical spaces.

A. Propagation

Finite differences are used to discretize the propagation equation (5). On the boundaries the pdf is set to zero. The proposed scheme uses a first-order forward scheme for the time derivative.

B. Measurement Update

The pdf is updated using measurements according to Bayes' rule (6).
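A Bayes update on nodal values can be sketched in one dimension as follows. This is an assumption-laden illustration rather than the paper's code: the nodes are a non-uniform 1-D stand-in for sparse grid points, the measurement model is assumed to be $y = h(x) + v$ with Gaussian noise of standard deviation `noise_std`, and the normalizing integral weights each node by the length of its nearest-neighbour cell. The names `cell_volumes` and `bayes_update` are hypothetical.

```python
import numpy as np

def cell_volumes(x):
    """Length of the nearest-neighbour cell around each node of a sorted 1-D grid."""
    mid = 0.5 * (x[1:] + x[:-1])
    # Outermost cells are mirrored so that they are symmetric around the end nodes.
    edges = np.concatenate(([2.0 * x[0] - mid[0]], mid, [2.0 * x[-1] - mid[-1]]))
    return np.diff(edges)

def bayes_update(p_prior, x, y, h, noise_std):
    """Multiply nodal prior values by the likelihood and renormalize."""
    likelihood = np.exp(-0.5 * ((y - h(x)) / noise_std) ** 2)  # Gaussian noise assumed
    unnormalized = p_prior * likelihood
    w = cell_volumes(x)
    return unnormalized / np.sum(w * unnormalized)  # all quadrature weights positive

# Non-uniform nodes: coarse everywhere, refined around the origin.
x = np.unique(np.concatenate((np.linspace(-4.0, 4.0, 17), np.linspace(-1.0, 1.0, 33))))
w = cell_volumes(x)
p_prior = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)           # prior N(0, 1)
p_post = bayes_update(p_prior, x, y=1.0, h=lambda z: z, noise_std=1.0)

mean_post = np.sum(w * x * p_post)
var_prior = np.sum(w * x**2 * p_prior) / np.sum(w * p_prior)
var_post = np.sum(w * (x - mean_post) ** 2 * p_post)
print(mean_post, var_prior, var_post)
```

For this linear-Gaussian test case the exact posterior is $N(0.5, 0.5)$, so the sketch can be checked against a known answer. Because every weight is a cell length, the rule cannot produce negative weights.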
As the nodal values of the pdf are stored at the sparse grid points, the multiplication can simply be performed for every sparse grid point $x$. The denominator has to be evaluated numerically. Interpolating the integrand in $\bigoplus_{|k|_1 \le L+d-1} W_k$ would result in a quadrature rule with negative weights for some of the grid points, which could cause severe problems. Therefore, an alternative quadrature rule is used, which weights every grid point by the approximate volume of its nearest domain.

C. Expected Values

Usually, only certain characteristics of the pdf are of interest. In most cases it is required to extract the expected value. Instead of approximating the integrand by a piecewise constant or linear function, as is done in the normalization of Bayes' rule, we use a situation-adapted rule to achieve a more accurate approximation. This is accomplished by interpolating the pdf itself piecewise, using Gaussian densities in every coordinate direction. Let $x_i = x_0$, $x_j = x_0 + \Delta x$, and $x_k = x_0 - \Delta x$ be three neighboring sparse grid points (again, $\Delta x$ is the local grid width). With the interpolation points $(x_i, p_t(x_i))$, $(x_j, p_t(x_j))$, $(x_k, p_t(x_k))$, the parameters of the Gaussian density are determined.

V. ENHANCEMENTS

The advection part forces the density to move across the state domain with time. Large density shifts would imply unnecessarily high computational effort to discretize the whole region in every time step. A significant performance gain can be achieved by grid-tiling and grid-drifting techniques.

Tiling helps in dealing with pdfs that move and widen over time due to drift and diffusion. Also, tiling restricts the computational effort to a possibly small subset of the state domain that carries a probability close to 1. In contrast to regular grids, it is not natural to expand sparse grids just by adding a few rows of grid points, since this would contradict the hierarchical structure of those grids. If we want to add or delete grid points, we do so by adding or removing an entire sparse grid.
Initially, one tile covers the whole region of interest, and in the further computation only a few tiles are typically used. To represent the more general area needed for filtering a moving and shape-changing pdf, we cover the relevant region of the domain with tiles, each containing a sparse grid. Since the sparse grid does not contain boundary points, we have to add connecting boundary layers between the sparse grid tiles. These boundary layers ("glue layers") are themselves sparse grids of lower dimension. A boundary layer is added only if all its possible neighbors are part of the tiling.

This process is driven by the $d$-dimensional tiles only. A new tile is introduced if the probability on a rectangular strip of quarter tile width close to a border exceeds a certain threshold. The necessary boundary layers to existing neighbors (starting from $(d-1)$-dimensional boundaries down to 0-dimensional boundaries) are added afterwards. We choose the threshold such that, for a Gaussian distribution with the same variances as the initial one, a tile is added if the tail outside the tile carries a probability of more than 0.01.

A tile is removed if the integral of the pdf on this tile is below a given threshold $\alpha$. This is done in the reverse order of adding tiles: the lowest-dimensional associated boundary tiles are removed first, and the $d$-dimensional tile itself is deleted last. It is reasonable to use a threshold similar to the adding threshold; we chose $\alpha = 0.01$. The check whether tiles are still necessary or new tiles are required is done in every time step. To save computing time, the check could instead be performed at regular time intervals depending on the dynamics of the system.

REFERENCES

[1] Simon, D. Optimal State Estimation. Hoboken, NJ: Wiley, 2006.
[2] Daum, F. E. New exact nonlinear filters: Theory and applications. Proceedings of SPIE, vol. 2235, 1994, pp. 636-649.
[3] Li, X. R. and Jilkov, V. P. A survey of maneuvering target tracking, Part VI(a): Density-based exact nonlinear filtering. Proceedings of the 2010 SPIE Conference on Signal and Data Processing of Small Targets, Orlando, FL, Apr. 6-8, 2010.
[4] Beneš, V. E. Exact finite-dimensional filters for certain diffusions with nonlinear drift. Stochastics, 5, 1-2 (1981), 65-92.
[5] Daum, F. Nonlinear filters: Beyond the Kalman filter. IEEE Aerospace and Electronic Systems Magazine, 20, 8 (Aug. 2005), 57-69.
[6] Farina, A., Ristic, B., and Benvenuti, D. Tracking a ballistic target: Comparison of several nonlinear filters. IEEE Transactions on Aerospace and Electronic Systems, 38, 3 (July 2002), 854-866.

Thanks for listening