This lesson focuses on the estimation process in level 1 fusion. In the previous lesson, we discussed association and correlation – in essence, how to sort observations into groups or “piles,” with each group representing observations of the same physical entity or event. We introduced the concept of sequential processing, in which observations are processed as they are received by a fusion system. The key question in association/correlation involved how to determine whether a new observation belonged to, or was evidence for, an existing known entity (e.g., a track). However, in the previous lesson, we did not actually address the question of how to fuse the new data with our existing knowledge. We turn now to that problem. We continue our introduction to level 1 fusion and focus on the problem of estimation: how to combine data from multiple sources to obtain the best estimate of the position, velocity, and characteristics of an observed entity. The student is reminded of two things. First, as in the previous lecture, we will present level 1 fusion (focused on estimation) as it pertains to tracking a physical target. However, the state estimation problem and techniques are very general and pertain to many problems in which we seek to determine an unknown state vector based on observations. Second, this lecture contains slides with mathematical expressions. Those who do not have a mathematical background should not be alarmed; in each instance we will describe the essence of the mathematical process. It is not necessary to “read” or understand the actual mathematical equations (they are included for the mathematically inclined). For reference, the figure above shows the Joint Directors of Laboratories (JDL) data fusion process model and highlights level 1 processing. As a reminder, level 1 processing involves combining data from multiple sensors or sources to determine the location, velocity, characteristics, and identity of an entity.
This lesson focuses on the estimation of the location, velocity, and characteristics of an entity. As with most data fusion processes, these functions occur naturally in humans and animals through the perception and processing of their physical senses. Humans and animals can catch balls, interact with a dynamic world, and predict the motion of moving objects. This is a learned behavior. Teaching a child to catch a ball involves starting nearby, tossing the ball into their outstretched hands, and continuing the process until they learn how to predict where the ball will be going and match their movements to where the ball will be, rather than where it currently is. A similar process occurs for animals. We seek to emulate this natural human and animal cognitive process using computational models. In particular, we utilize explicit mathematical models such as differential equations to model the motion of an object (or the evolution of an observed parameter), and solve those equations to predict the future state of the object. Subsequently, we compare observations of the object with our predicted observations and continually adjust our current estimate of the state vector (which is the basis for modeling the motion of an object or entity) to refine our knowledge. The models serve as an approximation to reality. Formally, we can state the estimation problem as follows: given a redundant set of “N” observations, “Z(t),” from one or more sensors, how can we find the value of a state vector, x(t), which provides a best fit to the observational data? The term redundant is used here in the mathematical sense – that is, we have more observations than the minimum number needed to provide an initial estimate of the state vector. There are a number of aspects to the problem, including the following. State vector – How do we define or select the parameters to represent the state of an object or entity?
For a moving object, the state vector typically involves a specification of the entity’s location and velocity at a given time. For an observed “entity” such as a mechanical system, the state vector might include parameters related to the internal operation of the mechanical system. The idea of a state vector is a set of parameters which, if known, would allow us to predict the future state of the entity or object. Observation model – How do we relate the parameters that can be observed to the state vector? For a moving object, we might observe range and angular information, but seek to know the object’s location and velocity with respect to the center of the earth. Observation noise – What information do we know, or what do we want to assume, about the noise processes involved in the observation process? Are the observations corrupted by noise in the transmission media, or by the inherent observing process within the sensor? Dynamic model – How does the entity or object move or change in time? How does a state vector, x(t0), change from an initial time, t0, to a later time, t1? The specification of these dynamics involves the selection or definition of an “equation of motion.” Definition of “best fit” – In trying to find a value of a state vector that “best fits” the observational data, what constitutes a definition of “best” fit? We saw some preliminary thoughts on this in the discussion of association and correlation. That is, we might select the state vector that minimizes the geometric distances between the predicted observations and the actual observations. Alternatively, we might select the value of the state vector which minimizes the sum of the squares of the distances between the predicted observations and the actual observations (this is the so-called method of least squares). Method of solution – What mathematical method should be used to solve this optimization problem? There are numerous methods that can be selected.
Fusion architecture – What fusion architecture should be used (e.g., a centralized architecture in which all observations are sent to a central process for treatment, a distributed architecture in which processing is performed at each sensor, etc.)? Approach to handling data – Finally, how do we process the individual sensor or source data? What transformations or representations should be applied prior to the actual estimation or fusion? The point here is not to show that there is a “blizzard” of design alternatives, but rather to raise awareness that there are choices in the formulation and approach to solving the estimation problem that should be addressed in a conscious manner. It is interesting to note that the problem of data fusion and parametric fusion is relatively old. In the late 1700s, Carl Friedrich Gauss invented the method of least squares to address the challenge of how to use multiple angular observations of asteroids to compute the best estimate of their orbital elements. Previously, astronomers had developed so-called initial orbit methods that would use a minimum set of observations to compute orbital elements. Gauss asked (and answered) the question of what could be done if there were more than a minimum set of observations. Interestingly, his description of the method included a discussion of many of the issues that we consider to be “modern” questions, such as observation noise and the definition of “best fit.” Near the same time, Legendre also independently identified the method of least squares and got into a feud with Gauss about who should claim credit for the invention. This figure refers back to the chart previously introduced that showed the level 1 processing steps of preprocessing, data alignment, bulk gating, data association and correlation, positional/kinematic and attribute estimation, and identity estimation. This lesson focuses on the highlighted step of position, kinematic, and attribute estimation.
Again, the student is reminded that this partitioning of the functions for level 1 processing is shown for clarity, rather than as a guide to implementation. An actual data fusion system might very well integrate and interleave all of these functions. This chart shows a “drill down” of the level 1 JDL data fusion processing model. Within level 1 processing, we identify the function of object/entity positional, kinematic, and attribute estimation. Categories of functions include: (i) system models, (ii) optimization criteria, (iii) optimization approach, and (iv) processing approach. System models – Within the system models, we need to first define the state vector or parameters to be estimated, select equations and models to represent the dynamics of the entity being observed (i.e., the equations of motion, maneuver models), define how to map from the observable quantities to the state vector, and address other implementation questions. Optimization criteria – As previously mentioned, the estimation problem is fundamentally an optimization problem – that is, how can we find a value of a state vector that optimizes the agreement between the predicted observations and the actual observations? There are a number of optimization criteria that could be used. For example, we could minimize the sum of the squares of the residuals (or differences) between the predicted and actual observations – this is the method of least squares. We could weight the observations or components of the state vector and perform a weighted least squares approach. There are other methods such as maximum likelihood methods and constrained Bayesian methods. Optimization approach – Having selected the system models and the optimization criteria, we need to select an approach to solving the problem. Choices include so-called direct methods and indirect methods. It is beyond the scope of this course to explore all of these. However, the student is referred to various optimization texts to explore these further.
Processing approach – Finally, we have previously indicated that there are basic approaches to be considered, such as whether to process the observations as they are received (e.g., a sequential processing approach), or to wait until all observations have been received and process them together (a batch approach). We also noted a method that does not use observations, but rather explores error relationships using a method called covariance error analysis. We have previously discussed some design options for multi-target, multi-sensor tracking. These options are summarized on this chart. In order to explore the idea of estimation, let’s consider the following situation. We observe a parameter, z, which varies as a function of time as illustrated in the graph above. We speculate that the parameter, z, could be modeled by a quadratic function, z(t) = a + b·t². That is, z is equal to some constant, a, plus another constant, b, times time squared. However, we don’t actually know what the constants a and b are. The values of a and b constitute the state vector. If we knew what a and b were, we could predict the value of z at any future time. The equation z(t) = a + b·t² is the observation equation. Since we assume that a and b are constants in time, the equation of motion is simply a(t) = a(t0) and b(t) = b(t0). The estimation problem is to find the values of a and b that predict observations that best fit the actual observations. For this example, we will use a batch approach, in which we process a number of observations after they have been received or observed. Thus, we observe a number of values of the parameter z at times t1, t2, etc. We proceed as follows:
• First, we “guess” at the values of a and b (say a0 and b0).
• Next, given this initial guess at a and b, we compute predicted observations at the same times as the actual observations. For example, zpredicted(t1) = a0 + b0·t1² and zpredicted(t2) = a0 + b0·t2², etc.
• We compare the predicted values at each time with the actual values: r(t1) = z(t1) – zpredicted(t1), r(t2) = z(t2) – zpredicted(t2), etc. These are called observation residuals.
• We compute the sum of the squares of the residuals, viz., sum = r(t1)² + r(t2)² + … + r(tn)².
• We systematically vary the values of a and b until this sum is a minimum.
One might ask why we minimize the sum of the squares, rather than, say, the sum of the absolute values of the residuals. It turns out that using the squares actually makes the solution easier to find than using the absolute values. Also, under some fairly safe conditions, the result is the same. We previously showed the concept of “batch” processing, in which we waited until all the observations had been collected, and introduced the method of least squares to find the value of a state vector, which, if known, would allow us to predict the state of a system (e.g., the location and velocity of an object) at any time in the future. There are numerous other estimation methods such as the weighted least squares method, the mean square error method, the maximum likelihood method, etc. The next question is what happens if we want to try to determine the values of the state vector “on the fly” as we receive new observations. This is called sequential processing. In the next few charts, we will introduce one such method – the Kalman filter. A history of the Kalman filter is available at: http://www.ieeecss.org/CSM/library/2010/june10/11-HistoricalPerspectives.pdf. There are many extensions of the Kalman filter, and extensive literature exists on the topic of sequential estimation. Let’s consider a simple case of recursive or sequential estimation. Suppose we are trying to measure the length of a table with a ruler. We know that each measurement that we make has some error due to the granularity of the ruler as a measurement instrument. So one way to get a good estimate of the length of the table is to make multiple measurements and simply take an average.
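Before continuing, the batch procedure for the quadratic example can be made concrete with a minimal sketch in Python. Rather than the trial-and-error search over a and b described above, it solves the 2x2 normal equations directly (a standard closed-form route to the same least-squares minimum). The function names and data are the author's own for illustration; the observations are hypothetical, generated noise-free from a = 2 and b = 0.5 so the fit can be checked by inspection.

```python
def fit_quadratic(times, observations):
    """Batch least-squares fit of the observation model z(t) = a + b*t**2.

    The state vector is (a, b); we solve the 2x2 normal equations directly
    instead of searching over candidate values.
    """
    u = [t * t for t in times]                 # basis function t^2
    n = len(times)
    su = sum(u)
    suu = sum(v * v for v in u)
    sz = sum(observations)
    szu = sum(z * v for z, v in zip(observations, u))
    det = n * suu - su * su
    a = (sz * suu - szu * su) / det
    b = (n * szu - su * sz) / det
    return a, b

def residuals(times, observations, a, b):
    """Observation residuals r(t) = actual minus predicted observation."""
    return [z - (a + b * t * t) for t, z in zip(times, observations)]

# Hypothetical observations generated from a = 2, b = 0.5 (noise-free for clarity)
times = [0.0, 1.0, 2.0, 3.0]
obs = [2.0, 2.5, 4.0, 6.5]
a, b = fit_quadratic(times, obs)
# a recovers 2.0 and b recovers 0.5, so every residual is zero
```

With noisy observations the same machinery applies; the residuals would simply remain nonzero at the minimum. We now return to the table-measurement example, where the natural estimate is an average of the readings.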
So if we make “k” measurements, the best estimate of the length of the table is simply the average, e.g., estimated length = the sum of the individual measurements, l1 + l2 + … + lk, divided by the total number of measurements, k. But what happens if we later make another measurement, but no longer have access to the original measurements? It can be shown that a way to incorporate the new measurement is to take the previous estimate of the length of the table, based on the previous k measurements, and add a correction based on a constant times the difference between the new measurement and the previous estimate. Thus, in words: new estimate equals previous estimate plus a constant times the difference between the new observation and the old estimate (based on the previous observations). The chart shows a general formulation. In the next two charts, we will show the equations for sequential estimation using the Kalman filter formulation. However, in preparation, this chart distills the essence of the estimation process in words. Again, we suppose that we are trying to estimate the value of a state vector, x(t), which “best fits” a set of observations, z(t), processing one observation at a time. So we start in-process and assume that we have a current estimate of the state vector at some time, and have just received a new observation at time ti+1. Predict forward in time – First, we must “move” the value of the state vector from the time of the previous observation to the new observation time. Specifically, we update the old state vector from the time of the last observation, ti, to the time of the current observation, ti+1.
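The table-measurement recursion described earlier on this chart can be sketched in a few lines of Python (the ruler readings are hypothetical, and the function name is the author's own):

```python
def update_estimate(old_estimate, k, new_measurement):
    """Fold measurement number k+1 into a running average of k measurements.

    The "constant" in the word equation is the gain 1/(k+1); with this
    choice the update is algebraically identical to re-averaging all k+1
    raw measurements, yet only the previous estimate needs to be stored.
    """
    gain = 1.0 / (k + 1)
    return old_estimate + gain * (new_measurement - old_estimate)

# Hypothetical ruler readings of the table length (meters)
readings = [2.00, 2.20, 1.80, 2.00]
estimate, k = readings[0], 1
for z in readings[1:]:
    estimate = update_estimate(estimate, k, z)
    k += 1
# estimate matches the plain average of all four readings, 2.00
```

Returning to the processing steps: the first step was to move the state vector from the time of the last observation, ti, to the current observation time, ti+1.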
This involves using the equations of motion, which describe how the state vector changes in time (e.g., the assumed trajectory of a moving object). We also update the estimate of the uncertainty of the state vector to time ti+1 (using an “equation of motion” for the error growth). Compute the Kalman gain – This is a factor used to change our estimate of the state vector and associated uncertainty based on several factors: our confidence in the accuracy of our current estimate of the state vector (how well do we think we know the answer?) and our confidence in the accuracy of the observations (are we receiving “good” observations or relatively “bad” observations?). Receive the new observation – Obtain a new observation at time ti+1 and perform whatever transformations and unit conversions are needed. Update the estimates of the state vector and its uncertainty – Update the estimate of the state vector at time ti+1 using the Kalman equation (which says, in essence, “new state vector equals old state vector plus a constant multiplied by the difference between the actual observation and the predicted observation”). Similarly, update the estimate of the uncertainty of the state vector. At this point we have a new estimate of the state vector at time ti+1 (i.e., x(ti+1)), along with an estimate of the uncertainty in the state vector, accounting for all of the observations to date, including the new observation at time ti+1. Graphically, the update process is shown above. At some time, tk-1, we have an estimate of the value of the state vector, x(tk-1), and an estimate of the uncertainty of the state vector, P(tk-1), at the time of the observation, z(tk-1). These are denoted by xk-1 and Pk-1. However, we have not yet updated these to include the effect of the observation z at time tk-1. This is indicated in the diagram by the (-) for xk-1 and Pk-1.
We solve the Kalman equations and obtain new estimates of the value of the state vector and its uncertainty at time tk-1, denoted by the (+) associated with x and P. A new observation becomes available at time tk. We “move” the state vector from time tk-1 to time tk, solving the equations of motion for the state vector and for the uncertainty. These are shown by the transition matrix, Φk-1, and the process noise, Qk-1, and we begin the process again, as we did at time tk-1. One note should be made about the quantity Q. This is an attempt to introduce noise explicitly into the state vector propagation, which may seem counterintuitive. The reason for this is an effect that is sometimes called “convergence.” The problem is that, as we process observational data, we tend to become overconfident about the accuracy of the state vector. Thus, as we process “n” observations, the uncertainty of the state vector tends to be reduced by a factor of the square root of n. As a result, in computing the Kalman gain, we tend to reduce the impact of new observations, since we “believe” our current estimate of x over the evidence provided by the new observations. In effect, we begin to mathematically ignore the new observations. The quantity Q, sometimes called process noise, adds uncertainty to the state vector as we propagate the state vector forward in time. Other techniques can be used, such as so-called “fading memory” filters, which systematically reduce the impact of older observations as we move forward in time. This chart shows the definitions of the key quantities and dynamic models. Specifically, x is a state vector that we seek to estimate and z represents an observation. The equation of motion specifies how the state vector moves in time (without considering the effect of new observations). This is given by x(t) = Φ(t, t0) x(t0), where the quantity Φ is called the transition matrix.
Thus, in the absence of any new information (via an observation), if we know the value of x at time t0, namely x(t0), then the predicted value of x at a later time, t, is given by the equation of motion above. The observation equation, z(t) = H x(t) + observation noise, allows us to predict the value of an observation (that is, what would be observed at time t, given our current estimate of the state vector, x, at the same time). This is a summary of the Kalman filter equations. For the student interested in the mathematical formulation, it is provided in the textbook. This diagram simply relates the Kalman filter equations to words which describe the processing steps. Notice several things.
• First, the only places in the Kalman filter equations that depend upon the type of sensor (or data source) are the variable z (the predicted and actual observation), the value of R (which is the observation noise or uncertainty), and the quantity H, which represents how we compute a predicted observation based on our current estimate of the state vector. Thus, we could integrate or fuse several different types of observations from different sensors.
• Second, the dimensions of the state vector and the observations are unrelated. So, we could have a six-dimensional state vector (e.g., three position coordinates and three velocity coordinates), but only a single observable quantity (e.g., range or even range-rate). There are so-called “observability” issues, in which a weak relationship between the observable quantities and the state vector can create difficulties in the estimation process.
• Third, measurements or observations from different types of sensors can be interleaved during the sequential estimation process.
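These points can be illustrated with a minimal constant-velocity Kalman filter, sketched below under simplifying assumptions of the author's choosing: a two-element state (position, velocity), position-only measurements (so H = [1, 0]), a scalar measurement variance r that may differ from observation to observation, and a scalar process noise q added to the covariance diagonal. All numbers are hypothetical.

```python
def kf_step(x, P, z, r, dt, q):
    """One predict/update cycle of a 1-D constant-velocity Kalman filter.

    State x = [position, velocity]; the measurement z is position only,
    so a two-dimensional state is estimated from a one-dimensional
    observable. r is the measurement variance of whichever sensor
    produced z -- the only sensor-specific quantity here (H is fixed
    at [1, 0]). q is process noise, guarding against overconfidence.
    """
    # Predict: x <- F x with F = [[1, dt], [0, 1]]; P <- F P F^T + Q
    x = [x[0] + dt * x[1], x[1]]
    p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + q
    p01 = P[0][1] + dt * P[1][1]
    p10 = P[1][0] + dt * P[1][1]
    p11 = P[1][1] + q
    # Kalman gain: innovation variance s = H P H^T + R = p00 + r
    s = p00 + r
    K = [p00 / s, p10 / s]
    # Update: new state = predicted state + gain * (actual - predicted observation)
    resid = z - x[0]
    x = [x[0] + K[0] * resid, x[1] + K[1] * resid]
    P = [[(1 - K[0]) * p00, (1 - K[0]) * p01],
         [p10 - K[1] * p00, p11 - K[1] * p01]]
    return x, P

# Track a hypothetical target moving at 2 units/s, with position reports
# alternating between an accurate sensor (r = 1) and a noisier one (r = 9)
x, P = [0.0, 0.0], [[100.0, 0.0], [0.0, 100.0]]
for k in range(1, 21):
    r = 1.0 if k % 2 else 9.0
    x, P = kf_step(x, P, 2.0 * k, r, dt=1.0, q=0.01)
# x[0] (position) approaches 40 and x[1] (velocity) approaches 2
```

Note that the state is two-dimensional while each observation is one-dimensional, and that r (and, in general, H) is the only sensor-specific input, so reports from different sensors can be interleaved in this single loop.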
That is, we could start with an estimate of the state vector at time t0 (x(t0)), process an observation from one type of sensor at time t1 (say, zsensor A(t1)) using the Kalman update equations, then process another observation from a different type of sensor at time t2 (say, zsensor B(t2)) using the same Kalman equations, but simply substituting the appropriate expressions for R and H. This latter point addresses the question of where the fusion occurs. It occurs as we sequentially process each observation and observation type, systematically updating our estimate of the state vector to account for these different observations. Some brief notes on Kalman filtering. First, mathematically, Kalman filtering is a specific case of Bayesian estimation. It relies on statistical assumptions about the observation noise and the propagation of the state vector, and it uses linear system models. There are extensions available for nonlinear models. The Kalman filter was originally developed, and is often used, because it is computationally efficient and performs well if the underlying assumptions are true (or close to being true). Some disadvantages of the Kalman filter include issues when the underlying noise characteristics of the observation processes are not known. Also, many practical applications involve nonlinear systems. Alternatives to the Kalman filter include the Extended Kalman Filter, which seeks to relax the linearity requirement by using a local linear approximation, and the particle filter, which seeks to relax the requirement for a priori knowledge of the noise statistics. A challenge in any estimation process involves maneuvers, or unpredictable changes in the state vector. These can occur either deliberately, such as when a pilot changes direction, or because of environmental conditions – think of a Frisbee, which is subject to unpredictable air currents.
So, a maneuver involves a prolonged acceleration, and in general, acceleration cannot be measured directly. That is, we can observe the effects of acceleration via the changed position or velocity of an object or entity, but we cannot directly observe the acceleration. Even if we have a model of acceleration, there is a high correlation between position, velocity, and acceleration. Hence, the estimation becomes challenging because of this “observability” effect. As a result, there are a number of techniques that have been used to try to address this problem. At one extreme, we can use a standard sequential estimation process and simply hope that the estimation will “catch up” to any acceleration; in this case, the new observations cause changes to the estimated state vector, and while we don’t explicitly account for acceleration, we hope that the new observations allow us to maintain the track. Other methods seek to explicitly develop one or more models and try to estimate the acceleration components, despite the high correlations and observability issues. Finally, there are ad hoc methods. One such method involves simultaneously using multiple models of acceleration and having each model process the data – sharing the estimation results among the competing models. This is shown conceptually on the next figure. This chart shows the concept of interacting multiple models (IMM), adapted from the paper “Comparison of Various Schema of Filter Adaptivity for the Tracking of Maneuvering Targets,” A. Jouan, E. Bosse, Marc-Alain Simard, and E. Shahbazian (Center of Mathematical Research, Montreal), Proceedings of the SPIE, 3373, Signal and Data Processing of Small Targets, 1998. There are numerous variations in estimation techniques, based on different levels of knowledge or assumptions about our understanding of the inherent models and error characteristics (e.g., of the sensor observations, growth of error in the state vector propagation, etc.).
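As a rough illustration of the multiple-model idea, the sketch below runs two scalar filters with different process-noise assumptions and fuses their estimates. To be clear, this is a simplification of the author's own and not a full IMM: there is no Markov mixing between models, only reweighting of each model by its innovation likelihood. All values are hypothetical.

```python
import math

def gaussian_likelihood(innovation, s):
    """Likelihood of an innovation under a zero-mean Gaussian with variance s."""
    return math.exp(-0.5 * innovation * innovation / s) / math.sqrt(2.0 * math.pi * s)

def mm_step(filters, weights, z, r):
    """One cycle of a simplified multiple-model estimator (scalar state).

    Each entry of `filters` is a dict {'x': estimate, 'p': variance,
    'q': process noise}; a small q models benign motion, a large q a
    maneuver. Weights are rescaled by each model's innovation likelihood
    and the fused estimate is the weighted mean of the model estimates.
    """
    new_weights = []
    for f, w in zip(filters, weights):
        f['p'] += f['q']                    # predict (constant state, growing variance)
        s = f['p'] + r                      # innovation variance
        innovation = z - f['x']
        gain = f['p'] / s
        f['x'] += gain * innovation         # scalar Kalman update
        f['p'] *= (1.0 - gain)
        new_weights.append(w * gaussian_likelihood(innovation, s))
    total = sum(new_weights)
    weights = [w / total for w in new_weights]
    fused = sum(w * f['x'] for w, f in zip(weights, filters))
    return weights, fused

# Two hypothetical competing models: benign motion (small q) vs. maneuver (large q)
filters = [{'x': 0.0, 'p': 10.0, 'q': 0.01},
           {'x': 0.0, 'p': 10.0, 'q': 5.0}]
weights = [0.5, 0.5]
for z in [1.0] * 5:                         # constant, noise-free measurements
    weights, fused = mm_step(filters, weights, z, r=1.0)
# fused moves toward the measured value of 1.0
```

A full IMM additionally mixes the model-conditioned estimates through a Markov model-transition matrix before each cycle, which is what allows it to respond quickly when a target switches between benign flight and maneuvering.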
The key is not which model is the most sophisticated, but rather which model is appropriate based on our understanding of the problem at hand. Not every problem is a nail!