This lesson focuses on the estimation process in level 1 fusion. In the previous lesson, we discussed
association and correlation – in essence how to sort observations into groups or “piles” with each group
representing observations of the same physical entity or event. We introduced the concept of
sequential processing, in which observations were processed as they were received by a fusion system.
The key question in association/correlation involved how to determine if a new observation belonged
to, or was evidence for an existing known entity (e.g., track). However, in the previous lesson, we did
not actually address the question of how to fuse the new data with our existing knowledge. We turn
now to that problem.
We continue our introduction to level 1 fusion and focus on the problem of estimation; how to combine
data from multiple sources to obtain the best estimate of the position, velocity, and characteristics of an
observed entity.
The student is reminded of two things:
First, as in the previous lecture we will present Level 1 fusion (focused on estimation) as it pertains to
tracking a physical target. However, the state estimation problem and techniques are very general and
pertain to many problems in which we seek to determine an unknown state vector based on observations.
Second, this lecture contains slides with mathematical expressions. Those who do not have a
mathematical background should not be alarmed; in each instance we will describe the essence of the
mathematical process. It is not necessary to “read” or understand the actual mathematical equations
(they are included for the mathematically inclined).
For reference, the figure above shows the Joint Directors of Laboratories (JDL) data fusion process
model and highlights level 1 processing. As a reminder, level 1 processing involves combining data from
multiple sensors or sources to determine the location, velocity, characteristics and identity of an entity.
This lesson focuses on the estimation of location, velocity and characteristics of an entity.
As with most data fusion processes, these occur naturally for humans and animals in their perception
and processing of their physical senses. Humans and animals can catch balls, interact with a dynamic
world, and predict the motion of moving objects. This is a learned behavior. Teaching a child to catch
a ball involves starting nearby, tossing the ball into their outstretched hands and continuing the process
until they learn how to predict where the ball will be going and match their movements to where the ball
will be, rather than where it currently is. A similar process occurs for animals.
We seek to emulate this natural human and animal cognition process using computational models. In
particular, we utilize explicit mathematical models such as differential equations to model the motion of
an object (or the evolution of an observed parameter), and solve those equations to predict the future
state of an object. Subsequently, we compare observations of the objects with our predicted
observations and continually adjust our current estimate of the state vector (which is the basis for
modeling the motion of an object or entity) to refine our knowledge. The models serve as an
approximation to reality.
Formally, we can state the estimation problem as follows:
Given a redundant set of “N” observations, “Z(t)” from one or more sensors, how can we find the value
of a state vector, x(t), which provides a best fit to the observational data? The term redundant is used
here in the mathematical sense – that is, we have more observations than the minimum number needed
to provide an initial estimate of the state vector.
There are a number of aspects to the problem, including:
State vector - How do we define or select the parameters to represent the state of an object or entity?
For a moving object, the state vector typically involves a specification of the entity’s location and
velocity at a given time. For an observed “entity” such as a mechanical system, the state vector might
include the parameters related to the internal operation of the mechanical system. The idea of a state
vector is a set of parameters, which if known, would allow us to predict the future state of the entity or
Observation model – How do we relate the parameters that can be observed to the state vector? Thus,
for a moving object, we might observe a range and angular information, but seek to know its location
and velocity with respect to the center of the earth.
Observation noise – What information do we know or want to assume about the noise processes
involved in the observation process? Are the observations corrupted by noise in the transmission
media, or by noise in the inherent observing process within the sensor?
Dynamic model – How does the entity or object move or change in time? What are the dynamics of
how a state vector, x(t0), changes from an initial time, t0, to a later time, t1? The specification of these
dynamics involves the selection or definition of an “equation of motion”.
Definition of “best fit” – In trying to find a value of a state vector that “best fits” the observational data,
what constitutes a definition of “best” fit? We saw some preliminary thoughts on this in the discussion
of association and correlation. That is, we might select the state vector that minimizes the geometric
distances between the predicted observations and the actual observations. Alternatively, we might
select the value of the state vector which minimizes the sum of the squares of the distances between
the predicted observations and the actual observations (this is the so-called method of least squares).
Method of solution - What mathematical method should be used to solve this optimization problem?
There are numerous methods that can be selected.
Fusion architecture – What fusion architecture should be used (e.g., a centralized architecture in which
all observations are sent to a central process for treatment, a distributed architecture in which
processing is performed at each sensor, etc.).
Approach to handling data – Finally, how do we process the individual sensor or source data? What
transformations or representations should be applied prior to the actual estimation or fusion?
The point here is not to show that there is a “blizzard” of design alternatives, but rather to raise
awareness that there are choices in the formulation and approach to solving the estimation problem
that should be addressed in a conscious manner.
It is interesting to note that the problem of data fusion and parametric fusion is relatively old. In the
late 1700s, Carl Friedrich Gauss invented the method of least squares to address the challenge of how
to use multiple angular observations of asteroids to compute the best estimate of their orbital elements.
Previously, astronomers had developed so-called initial orbit methods that would use a minimum set of
observations to compute orbital elements. Gauss asked (and answered) the question of what could be
done if there were more than a minimum set of observations. Interestingly, his description of the
method included a discussion of many of the issues that we consider to be “modern” questions such as
issues of observation noise and definition of “best fit”. Near the same time, Legendre also
independently identified the method of least squares and got into a feud with Gauss about who should
claim credit for the invention.
This figure refers back to the chart previously introduced that showed the level 1 processing steps of:
preprocessing, data alignment, bulk gating, data association and correlation, positional/kinematic and
attribute estimation, and identity estimation. This lesson focuses on the highlighted step of position,
kinematic and attribute estimation. Again, the student is reminded that this partitioning of the
functions for level 1 processing is shown for clarity, rather than as a guide to implementation. An
actual data fusion system might very well integrate and interleave all of these functions.
This chart shows a “drill down” of the level 1 JDL data fusion processing model. Within level 1
processing, we identify the function of object/entity positional, kinematic and attribute estimation.
Categories of functions include: i) system models, ii) optimization criteria, iii) optimization approach, and
iv) processing approach.
System models - Within the system models, we need to first define the state vector or parameters to be
estimated, select equations and models to represent the dynamics of the entity being observed (i.e., the
equations of motion, maneuver models), define how to map from the observable quantities to the state
vector, and other implementation questions.
Optimization criteria – As previously mentioned, the estimation problem is fundamentally an
optimization problem – that is, how can we find a value of a state vector that optimizes the predicted
observations compared with the actual observations. There are a number of optimization criteria that
could be used. For example, we could minimize the sum of the squares of the residuals (or differences)
between the predicted and actual observations – this is the method of least squares. We could weight
the observations or components of the state vector and perform a weighted least squares approach.
There are other methods such as maximum likelihood methods and constrained Bayesian methods.
Optimization approach – Having selected the system models and the optimization criteria, we need to
select an approach to solving the problem. Choices include so-called direct methods, and indirect
methods. It is beyond the scope of this course to explore all of these. However, the student is
referred to various optimization texts to explore these further.
Processing approach - Finally, we have previously indicated that there are basic approaches to be
considered such as whether to process the observations as they are received (e.g., a sequential
processing approach), or wait until all observations have been received and process them together (a
batch approach). Finally, we indicated a method that does not use observations, but rather explores
error relationships using a method called covariance error analysis.
We have previously discussed some design options for multi-target, multi-sensor tracking.
These options are summarized on this chart.
In order to explore the idea of estimation, let’s consider the following situation. We observe a
parameter, z, which varies as a function of time as illustrated in the graph above. We speculate that
the parameter, z, could be modeled by a quadratic function, z(t) = a + b t^2. That is, z is equal to some
constant, a, plus another constant, b, times time squared. However, we don’t actually know what the
constants a and b are. The values of a and b constitute the state vector. If we knew what a and b were,
we could predict the value of z at any future time. The equation z(t) = a + b t^2 is the observation
equation. Since we assume that a and b are constants in time, the equation of motion is simply a(t) =
a(t0) and b(t) = b(t0). The estimation problem is to find the values of a and b that predict observations
that best fit the actual observations.
For this example, we will use a batch approach, in which we process a number of observations after they
have been received or observed. Thus, we observe a number of values of the parameter z at times t1,
t2, etc. We proceed as follows:
First, we “guess” at the values of a and b (say a0 and b0).
Next, given this initial guess at a and b, we compute predicted observations at the same times as
the actual observations.
For example, z_predicted(t1) = a0 + b0 (t1)^2 and z_predicted(t2) = a0 + b0 (t2)^2, etc.
We compare the predicted values at each time with the actual values:
r(t1) = z(t1) - z_predicted(t1), r(t2) = z(t2) - z_predicted(t2), etc. These are called observation
residuals.
We compute the sum of the squares of the residuals, viz., sum = r(t1)^2 + r(t2)^2 + … + r(tn)^2.
We systematically vary the values of a and b until this sum is a minimum.
One might ask why we minimize the sum of the squares of the residuals rather than, say, the sum of
their absolute values. It turns out that using the squares actually makes the solution easier to find
than using the absolute values. Also, under some fairly safe conditions, the result is the same.
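The batch procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample data are made up, and instead of "systematically varying" a and b by trial, the sketch uses the closed-form solution that exists because the model z(t) = a + b t^2 is linear in a and b.

```python
def fit_quadratic(times, observations):
    """Least-squares estimate of (a, b) in the model z(t) = a + b * t**2."""
    n = len(times)
    u = [t * t for t in times]                      # regressor: time squared
    su = sum(u)
    suu = sum(v * v for v in u)
    sz = sum(observations)
    suz = sum(v * z for v, z in zip(u, observations))
    denom = n * suu - su * su
    if denom == 0:
        raise ValueError("need observations at two or more distinct values of t**2")
    b = (n * suz - su * sz) / denom                 # solve the normal equations
    a = (sz - b * su) / n
    return a, b


def sum_squared_residuals(a, b, times, observations):
    """Sum of squared residuals r(t) = z(t) - z_predicted(t)."""
    return sum((z - (a + b * t * t)) ** 2 for t, z in zip(times, observations))


# Hypothetical noisy observations (illustrative values only).
times = [0.0, 1.0, 2.0, 3.0, 4.0]
observations = [2.1, 2.4, 4.1, 6.4, 10.1]

a_hat, b_hat = fit_quadratic(times, observations)
print("a =", a_hat, "b =", b_hat)
print("sum of squares =", sum_squared_residuals(a_hat, b_hat, times, observations))
```

Any other choice of a and b gives a sum of squares at least as large as the one printed, which is what "best fit" means under the least squares criterion.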
We previously showed the concept of “batch” processing in which we waited until all the observations
have been collected and introduced the method of least squares to find the value of a state vector,
which, if known, would allow us to predict the state of a system (e.g., the location and velocity of an
object) at any time in the future. There are numerous other estimation methods such as the weighted
least squares method, the mean square error method, the maximum likelihood method, etc.
The next question is what happens if we want to try to determine the values of the state vector “on the
fly” as we receive new observations. This is called sequential processing. In the next few charts, we
will introduce one such method – the Kalman filter. A history of the Kalman filter is available at: There are many
extensions of the Kalman filter and extensive literature exists on the topic of sequential estimation.
Let’s consider a simple case of recursive or sequential estimation. Suppose we are trying to measure
the length of a table with a ruler. We know that each measurement that we make has some error due
to the granularity of the ruler as a measurement instrument. So one way to get a good estimate
of the length of the table is to make multiple measurements and simply take an average. So if we make
“k” measurements, the best estimate of the length of the table is to simply use the average, e.g.,
estimated length = the sum of each of the individual measurements, l1 + l2 + … + lk , divided by the total
number of measurements, k.
But what happens if later we make another measurement, but do not have access to the original
measurement? It can be shown, that a way to incorporate the new measurement is to take the
previous estimate of the length of the table, based on the previous k measurements and add a
correction based on a constant times the difference between the new measurement and the previous
estimate. Thus, in words,
New estimate equals previous estimate plus a constant times the difference between the new
observation and the old estimate (based on the previous observations). The chart shows a general
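A minimal sketch of this recursive update for the table example follows. For a plain average, the "constant" works out to 1/(k+1), where k is the number of measurements already included. The function name and the measurement values are hypothetical.

```python
def update_mean(previous_estimate, count, new_measurement):
    """Fold one new measurement into a running average.

    count is the number of measurements already reflected in previous_estimate.
    """
    gain = 1.0 / (count + 1)        # the "constant" for a simple average
    return previous_estimate + gain * (new_measurement - previous_estimate)


# Hypothetical table-length measurements in centimeters.
measurements = [152.3, 151.8, 152.6, 152.1]

estimate = measurements[0]
for count, value in enumerate(measurements[1:], start=1):
    estimate = update_mean(estimate, count, value)

batch_average = sum(measurements) / len(measurements)
print(estimate, batch_average)      # the two agree up to rounding error
```

Note that the update needs only the previous estimate and the count, not the original measurements.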
In the next two charts, we will show the equations for sequential estimation using the Kalman filter
formulation. However, in preparation, this chart distills the essence of the estimation process in
words. Again, we suppose that we are trying to estimate the value of a state vector, x(t) which “best
fits” a set of observations, z(t), processing one observation at a time. So we start in-process and
assume that we have a current estimate of the state vector at some time, t_i, and have just received a new
observation at time t_{i+1}.
- Predict forward in time – First, we must “move” the value of the state vector from the time of
  the previous observation to the new observation time. Specifically:
    - Propagate the old state vector from the time of the last observation, t_i, to the time of the
      current observation, t_{i+1}. This involves using the equations of motion which
      describe how the state vector changes in time (e.g., the assumed trajectory of a moving
      object).
    - Propagate the estimate of the uncertainty of the state vector to time t_{i+1} (using an
      “equation of motion” for the error growth).
- Compute the Kalman gain – This is a factor used to change our estimate of the state vector and
  associated uncertainty based on two factors: our confidence in the accuracy of our current
  estimate of the state vector (how well do we think we know the answer) and our confidence in
  the accuracy of the observations (are we receiving “good” observations or relatively “bad”
  observations?).
- Receive the new observation
    - Obtain a new observation at time t_{i+1} and perform whatever transformations and units
      conversions are needed.
- Update the estimates of the state vector and its uncertainty
    - Update the estimate of the state vector at time t_{i+1} using the Kalman equation (which
      says in essence, “new state vector equals old state vector plus a constant multiplied
      by the difference between the actual observation and the predicted observation”).
    - Similarly, update the estimate of the uncertainty of the state vector.
At this point we now have a new estimate of the state vector at time t_{i+1} (i.e., x(t_{i+1})), along with an
estimate of the uncertainty in the state vector, accounting for all of the observations to date, including
the new observation at time t_{i+1}.
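The cycle described above can be sketched for the simplest possible case, a single scalar state. This is an illustrative sketch, not the formulation on the charts: the function name, parameter names, default values, and measurement data are all assumptions made for the example.

```python
def kalman_step(x, p, z, phi=1.0, q=0.0, h=1.0, r=1.0):
    """One predict/gain/update cycle for a scalar state.

    x, p : previous state estimate and its variance (uncertainty)
    z    : new observation
    phi  : transition factor (how the state moves between observations)
    q    : process noise variance added during propagation
    h    : observation factor (predicted observation = h * state)
    r    : observation noise variance
    """
    # Predict forward in time: state and its uncertainty.
    x_pred = phi * x
    p_pred = phi * p * phi + q

    # Kalman gain: balances confidence in the estimate against confidence in the observation.
    k = p_pred * h / (h * p_pred * h + r)

    # Update: old state plus gain times (actual observation - predicted observation).
    x_new = x_pred + k * (z - h * x_pred)
    p_new = (1.0 - k * h) * p_pred
    return x_new, p_new, k


# Hypothetical example: a rough initial guess refined by four noisy measurements.
x, p = 150.0, 100.0
for z in [152.3, 151.8, 152.6, 152.1]:
    x, p, k = kalman_step(x, p, z, r=0.25)
    print(round(x, 3), round(p, 4), round(k, 4))
```

With each observation the variance p shrinks and the gain k falls, so later observations move the estimate less than earlier ones.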
Graphically, the update process is shown above. At some time, t_{k-1}, we have an estimate of the value of
the state vector, x(t_{k-1}), and an estimate of the uncertainty of the state vector, P(t_{k-1}), at the time of the
observation, z(t_{k-1}). These are denoted by x_{k-1} and P_{k-1}. However, we have not yet updated these to
include the effect of the observation, z, at time t_{k-1}. This is indicated in the diagram by the (-) for x_{k-1} and
P_{k-1}. We solve the Kalman equations and obtain new estimates of the value of the state vector and its
uncertainty at time t_{k-1}, denoted by the (+) associated with x and P.
A new observation becomes available at time t_k. We “move” the state vector from time t_{k-1} to time t_k by
solving the equations of motion for the state vector and for its uncertainty. These are represented by the
transition matrix, Φ_{k-1}, and the process noise, Q_{k-1}. We then begin the process again, as we did at time t_{k-1}.
One note should be made about the quantity Q. This is an attempt to introduce noise explicitly into the
state vector propagation. This seems counterintuitive. The reason for this is an effect that is
sometimes called “convergence”. The problem is that, as we process observational data, we tend to
become overconfident about the accuracy of the state vector. Thus, as we process “n” observations,
the uncertainty of the state vector tends to be reduced by a factor of the square root of n. As a result,
in computing the Kalman gain, we tend to reduce the impact of new observations, since we “believe”
our current estimate of x over the evidence provided by the new observations. In effect we begin to
mathematically ignore the new observations. The quantity, Q, sometimes called process noise, adds
uncertainty to the state vector as we propagate the state vector forward in time. Other techniques can
be used, such as so-called “fading memory” filters, which systematically reduce the impact of
older observations as we move forward in time.
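The effect of Q on the gain can be seen without any observations at all, because in the linear filter the gain depends only on the uncertainties. The following sketch assumes a scalar, constant state (transition factor 1, observation factor 1); the function name and the numerical values of p0, r and q are illustrative assumptions.

```python
def gain_history(p0, r, q, steps):
    """Kalman gains over successive observations for a scalar constant-state filter."""
    gains = []
    p = p0
    for _ in range(steps):
        p_pred = p + q                  # propagate uncertainty, adding process noise
        k = p_pred / (p_pred + r)       # Kalman gain
        p = (1.0 - k) * p_pred          # uncertainty after the update
        gains.append(k)
    return gains


no_process_noise = gain_history(p0=1.0, r=1.0, q=0.0, steps=50)
with_process_noise = gain_history(p0=1.0, r=1.0, q=0.01, steps=50)

print("gain after 50 observations, Q = 0:   ", no_process_noise[-1])
print("gain after 50 observations, Q = 0.01:", with_process_noise[-1])
```

Without process noise the gain keeps shrinking toward zero, so new observations are increasingly ignored. With even a small Q the gain levels off at a nonzero value and the filter stays responsive to new data.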
This chart shows the definition of the key quantities and dynamic models. Specifically, x, is a state
vector that we seek to estimate and z represents an observation. The equation of motion specifies
how the state vector moves in time (without considering the effect of new observations). This is
given by:
x(t) = Φ(t, t0) x(t0), where the quantity, Φ, is called the transition matrix.
Thus, in the absence of any new information (via an observation), if we know the value of x at time, t0,
namely, x(t0), then the predicted value of x at a later time, t, is given by the equation of motion above.
The observation equation, z(t) = H x(t) + observation noise, allows us to predict the value of an
observation (that is, what would be observed at time, t, given our current estimate of the state vector, x,
at the same time).
This is a summary of the Kalman filter equations. For the student interested in the mathematical
formulation, this is provided in the textbook.
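For reference, one common textbook form of the discrete Kalman filter equations is given below, using the symbols introduced in this lesson (x, P, Φ, Q, H, R, z) and the (-)/(+) convention for "before" and "after" the observation is included. The chart itself is not reproduced here, so its notation may differ in detail; the gain symbol K and the identity matrix I are notation assumed for this sketch.

```latex
\begin{align*}
\text{Predict:}\quad
  \hat{x}_k(-) &= \Phi_{k-1}\,\hat{x}_{k-1}(+) \\
  P_k(-)       &= \Phi_{k-1}\,P_{k-1}(+)\,\Phi_{k-1}^{T} + Q_{k-1} \\[4pt]
\text{Gain:}\quad
  K_k          &= P_k(-)\,H_k^{T}\left[H_k\,P_k(-)\,H_k^{T} + R_k\right]^{-1} \\[4pt]
\text{Update:}\quad
  \hat{x}_k(+) &= \hat{x}_k(-) + K_k\left[z_k - H_k\,\hat{x}_k(-)\right] \\
  P_k(+)       &= \left[I - K_k H_k\right]P_k(-)
\end{align*}
```

The term in brackets in the state update is the residual: the actual observation minus the predicted observation.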
This diagram simply relates the Kalman filter equations to words which describe the processing steps.
Notice several things.
First, the only quantities in the Kalman filter equations that depend upon the type of sensor (or data
source) are the observation z (both predicted and actual), the value of R (which is the
observation noise or uncertainty), and the quantity H which represents how we compute a
predicted observation based on our current estimate of the state vector. Thus, we could
integrate or fuse several different types of observations from different sensors.
Second, the dimensions of the state vector and of the observations are unrelated. So, we could
have a 6 dimensional state vector (e.g., three position coordinates and three velocity
coordinates), but only have a single observable quantity (e.g., range or even range-rate).
There are so-called “observability” issues in which a weak relationship between the observable
quantities and the state vector can create difficulties in the estimation process.
Third, measurements or observations from different types of sensors can be interleaved during
the sequential estimation process. That is, we could start with an estimate of the state vector
at time t0, namely x(t0), and process an observation from one type of sensor at time t1 (say, z_sensorA(t1))
using the Kalman update equations, then process another observation from a different type
of sensor at time t2 (say, z_sensorB(t2)) using the same Kalman equations, but simply using the
appropriate expressions for R and H.
This latter point addresses the question of where the fusion occurs. It occurs as we sequentially
process each observation and observation type, systematically updating our estimate of the state
vector, accounting for these different observations.
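The interleaving of different sensor types can be sketched as follows. This is a hedged illustration, not the method on the chart: it assumes a two-element state (position and velocity) with a constant-velocity equation of motion, a hypothetical sensor A that observes position, and a hypothetical sensor B that observes velocity. All names and numbers are made up for the example.

```python
def predict(x, P, dt, q):
    """Propagate a [position, velocity] state and its 2x2 covariance by dt.

    q is an assumed process noise variance added to the velocity term.
    """
    pos, vel = x
    x_pred = [pos + vel * dt, vel]
    # P_pred = Phi P Phi^T + Q, with Phi = [[1, dt], [0, 1]]
    p00, p01, p10, p11 = P[0][0], P[0][1], P[1][0], P[1][1]
    return x_pred, [
        [p00 + dt * (p10 + p01) + dt * dt * p11, p01 + dt * p11],
        [p10 + dt * p11, p11 + q],
    ]


def update(x, P, z, H, r):
    """Fold in one scalar observation z = H x + noise (noise variance r)."""
    residual = z - (H[0] * x[0] + H[1] * x[1])      # actual minus predicted observation
    pht = [P[0][0] * H[0] + P[0][1] * H[1],         # P H^T
           P[1][0] * H[0] + P[1][1] * H[1]]
    s = H[0] * pht[0] + H[1] * pht[1] + r           # H P H^T + R
    K = [pht[0] / s, pht[1] / s]                    # Kalman gain
    x_new = [x[0] + K[0] * residual, x[1] + K[1] * residual]
    hp = [H[0] * P[0][0] + H[1] * P[1][0],          # H P
          H[0] * P[0][1] + H[1] * P[1][1]]
    P_new = [[P[i][j] - K[i] * hp[j] for j in range(2)] for i in range(2)]
    return x_new, P_new


# Only H and R change from one sensor type to another (hypothetical values).
SENSORS = {
    "A": {"H": [1.0, 0.0], "r": 4.0},    # observes position
    "B": {"H": [0.0, 1.0], "r": 0.25},   # observes velocity
}

# (time, sensor, observed value): made-up interleaved observations.
observations = [(1.0, "A", 10.8), (2.0, "B", 1.1), (3.0, "A", 13.2), (4.0, "B", 0.9)]

x, P, t = [10.0, 0.0], [[25.0, 0.0], [0.0, 4.0]], 0.0
for t_obs, sensor, z in observations:
    x, P = predict(x, P, t_obs - t, q=0.01)
    x, P = update(x, P, z, SENSORS[sensor]["H"], SENSORS[sensor]["r"])
    t = t_obs
    print(t, sensor, [round(v, 3) for v in x])
```

The same predict and update functions serve both sensors, which is where the fusion takes place: each observation, whatever its type, refines the one shared state vector.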
Some brief notes on Kalman filtering. First, mathematically, Kalman filtering is a specific case of
Bayesian estimation. It relies on statistical assumptions about the observational noise and the
propagation of the state vector. It uses linear system models, although extensions are available for
nonlinear models. The Kalman filter was originally developed and is often used because it is
computationally efficient and performs well if the underlying assumptions are true (or close to being
true). Disadvantages of the Kalman filter include difficulties when the underlying noise characteristics
of the observation processes are not known. Also, many practical applications involve nonlinear
systems. Alternatives to the Kalman filter include the Extended Kalman Filter, which seeks to relax the
linearity requirement by using a local linear approximation, and the particle filter, which represents
the uncertainty in the state vector with a set of weighted samples and so seeks to relax the linearity
and Gaussian noise assumptions.
A challenge in any estimation process involves maneuvers or unpredictable changes in the state vector.
This can occur either deliberately, such as when a pilot changes direction, or because of environmental
conditions – think of a Frisbee which is subject to unpredictable air currents. A maneuver involves a
prolonged acceleration and, in general, acceleration cannot be measured directly. That is, we can
observe the effects of acceleration via the changed position or velocity of an object or entity, but we
cannot directly observe the acceleration. Even if we have a model of acceleration, there is a high
correlation between position, velocity and acceleration. Hence, the estimation becomes challenging
because of this “observability” effect. As a result, there are a number of techniques that have been
used to try to address this problem.
The simplest technique is to use a standard sequential estimation process and hope that the
estimation will “catch up” to any acceleration. In this case, the new observations cause changes to the
estimated state vector, and while we don’t explicitly account for acceleration, we hope that the new
observation can allow us to maintain a track. Other methods seek to explicitly develop one or more
models and try to estimate the acceleration components, despite the high correlations and observability
issues. Finally, there are ad hoc methods. One such method involves simultaneously using multiple
models of acceleration and having each model process the data – sharing the estimation results among
the competing models. This is shown conceptually on the next figure.
This chart shows the concept of interacting multiple models (IMM). Adapted from the paper
“Comparison of Various Schema of Filter Adaptivity for the Tracking of Maneuvering Targets”, Center of
Mathematical Research, Montreal by A. Jouan, E. Bosse, Marc-Alain Simard, and E. Shahbazian,
Proceedings of the SPIE, 3373, Signal and Data Processing of Small Targets, 1998
There are numerous variations in estimation techniques, based on different levels of knowledge or
assumptions about our understanding of the inherent models, error characteristics (e.g., of the sensor
observations, growth of error in the state vector propagation, etc.). The key is not which model is the
most sophisticated, but rather which model is appropriate based on our understanding of the problem
at hand. Not every problem is a nail!
