This lesson introduces the concept of Level 0 fusion in the JDL data fusion processing model. Level 0 in the JDL model is focused on processing individual sensor data prior to combining it with data from other sensors.
The objectives of this lesson are to: i) introduce the concept of level 0 processing, ii) identify categories of techniques for level 0 processing, and iii) extend the concept of traditional signal and image processing to "meta-data" generation.
Recall that level 0 in the JDL data fusion process model was not part of the original formulation of the
model. Instead, this “new level” of processing was introduced by Alan Steinberg, Chris Bowman and
Frank White in a 1999 paper entitled, “Revisions to the JDL Data Fusion Model”. Level 0 processing was
introduced for a number of reasons, including:
i) recognition of the rapid introduction of “smart sensors” with embedded processing capability that
allowed application of sophisticated image and signal processing, and
ii) recognition that some processing could be done on a single-sensor basis that would improve subsequent multi-sensor fusion. Examples of the latter include functions such as "pre-detection" fusion and single-sensor entity characterization and identification.
The following are some comments on level 0 fusion:
Level 0 processing concerns processing each source of data independently to obtain the most useful information possible; we want to "squeeze out" useful information from each sensor or source of information.
In general, input data to a fusion system may include scalar or vector data, signal data, image data, or textual information. Each of these classes of information may entail an entire range of potentially applicable techniques (e.g., signal processing, image processing, text-based processing).
It is beyond the scope of this lecture (and indeed this entire course) to address these subjects in depth; however, we will provide some thoughts on each of these data types in this lesson.
Finally, an emerging area is the generation of semantic meta-data for signals and images, a powerful new way to improve data fusion. This involves using pattern recognition and machine learning
techniques to process data and to provide semantic annotations. An example is automatic face
recognition. Another example is linguistic indexing of images (e.g., processing images to automatically
assign words that describe the content of the image). These concepts are described later in this lesson.
This chart shows some examples of the types of processing that could be performed on single-sensor
data. There are an enormous number of papers and books describing these methods, as well as commercial tools for performing meta-sensor processing. For example, the continually evolving commercial toolkit MATLAB (http://www.mathworks.com/products/matlab/) provides a signal processing toolbox, a
wavelet toolbox, a phased array system toolbox, and many other tools that can perform the functions
identified in the chart above.
The chart identifies four basic categories of meta-sensor processing. These include; i) parametric
modeling, ii) non-parametric modeling, iii) entity detection, and iv) sensor validation and calibration. It
should be noted that the techniques shown in the chart are only illustrative and not exhaustive. There
are many more types of methods involved in each category that could be applied to sensor data. For
example, the concept of image analysis (listed under the category of non-parametric analysis) is an
entire field of study with numerous texts, papers, and tools available. Let’s briefly discuss each of these
categories.
Parametric modeling methods are mathematical techniques for “operating on” data to produce new
representations. Parametric methods rely on assumptions about the underlying statistical properties of
the data – e.g., assumptions about the characteristics of the “noise” in the data. For example, one
might assume that the sensor is observing some type of time varying phenomena and that the
observations are not perfect, but rather are “corrupted” by inherent noise or random variations. This
noise and variations in the sensing process can be induced by the internal workings of the sensor,
variations in the observing media, and variations in the phenomena or entity being observed. Parametric
modeling methods seek to address questions such as: are there periodicities in the data? Is there a trend? Are there multiple time series or observed phenomena interleaved in the data? Methods of parametric modeling include techniques such as spectral analysis, discrete Fourier transforms, time-frequency domain processing methods such as wavelet transforms, auto-regression (AR) methods, and others. A readable introduction to time series analysis is provided by the article "Time series analysis with a past", written by O. Hammer in 2010
(http://nhm2.uio.no/norlex/past/TimeseriesPast.pdf).
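To make the idea concrete, the following is a minimal sketch (in Python, assuming NumPy and SciPy are available) of one parametric technique: fitting an autoregressive (AR) model to a time series by solving the Yule-Walker equations. It is an illustration only, not a recommendation for any particular sensor.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def fit_ar(x, order):
    """Fit an AR(order) model via the Yule-Walker equations.

    Returns the AR coefficients and an estimate of the driving-noise variance.
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    # Biased autocorrelation estimates r[0..order]
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    # Solve the symmetric Toeplitz system R a = r[1:]
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])
    noise_var = r[0] - np.dot(a, r[1:])
    return a, noise_var

# Example: a noisy oscillation, as a sensor might observe
t = np.arange(500)
x = np.sin(2 * np.pi * 0.05 * t) + 0.3 * np.random.randn(t.size)
coeffs, var = fit_ar(x, order=4)
print(coeffs, var)
```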
A second category of meta-sensor processing involves non-parametric analysis. Some of these
methods are similar to parametric modeling but make fewer assumptions about the underlying nature
of the data. Examples include neural networks, energy detection methods and trend analyses.
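As a hedged sketch of a non-parametric technique, the code below computes the Mann-Kendall S statistic, a distribution-free indicator of monotone trend that makes no assumption about the underlying noise model (the readings are invented for illustration).

```python
import numpy as np

def mann_kendall_s(x):
    """Mann-Kendall S statistic: positive values suggest an increasing trend,
    negative values a decreasing one, values near zero no monotone trend."""
    x = np.asarray(x, dtype=float)
    s = 0.0
    for i in range(len(x) - 1):
        s += np.sum(np.sign(x[i + 1:] - x[i]))
    return s

# Example: a bearing temperature slowly drifting upward
readings = np.array([70.1, 70.4, 70.2, 70.9, 71.3, 71.2, 71.8, 72.5])
print(mann_kendall_s(readings))   # clearly positive, suggesting wear
```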
A third type of meta-sensor processing, entity detection, seeks to determine whether the observed data are the
result of some observed entity, activity or event. For sensors involving an array of elements, for
example, we seek to process the data observed by each of the elements in the array to determine if the
observations are simply noise, or whether, when processed together, the data are indicative of some
entity or target. Moving target indicator (MTI) processing involves processing radar data to take
advantage of the fact that a moving target provides a response that can be contrasted against
background clutter. Hypothesis testing references an entire field of statistical methods. For target or
entity detection, for example, we hypothesize that a target or entity is the “cause” of the observed data
and seek to refute or confirm the hypothesis using statistical techniques. Another set
of techniques called pre-detection fusion involves processing data in a sensor array in which none of the
individual array elements or sensor contributors can, on its own, accurately declare that it has "seen" a target. See
for example, “Predetection fusion in a large sensor network with unknown target locations”, by R.
Georgescu, P. Willett, S. Marano and V. Matta, in the Journal of Advances in Information Fusion, vol. 7, no.
1, June 2012, pp. 61 – 77.
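A minimal sketch of hypothesis testing for detection is shown below: an energy detector that declares a target only when the measured energy exceeds a threshold set under the noise-only hypothesis. The Gaussian approximation and the chosen false-alarm probability are illustrative assumptions, not part of the referenced paper.

```python
import numpy as np
from scipy.stats import norm

def energy_detector(samples, noise_var, pfa=1e-3):
    """Declare a detection if the measured energy exceeds the noise-only threshold.

    Under H0 (noise only), the energy of n Gaussian samples has mean n*noise_var
    and variance 2*n*noise_var**2; here it is approximated as Gaussian.
    """
    n = len(samples)
    energy = float(np.sum(np.abs(samples) ** 2))
    mean_h0 = n * noise_var
    std_h0 = np.sqrt(2 * n) * noise_var
    threshold = mean_h0 + norm.ppf(1 - pfa) * std_h0
    return energy > threshold, energy, threshold

# Example: a weak, constant "target" return buried in unit-variance noise
samples = np.random.randn(256) + 0.3
print(energy_detector(samples, noise_var=1.0))
```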
Finally, meta-sensor processing can involve sensor validation and calibration. Experimental tests may
be conducted to characterize the performance of a sensor and allow development of mathematical
models of the performance as a function of the environmental conditions, nature of the entity being
observed, and so on. Other concepts involve calibration procedures for the sensor, validation of the sensor
performance, and establishment of characteristics such as the probability of detection or probability of
correct identification, and probability of false alarms.
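A simple illustration of calibration, assuming paired readings against a trusted reference are available, is a linear gain/offset fit, sketched below.

```python
import numpy as np

def calibrate_linear(raw, reference):
    """Fit reference ~ gain * raw + offset from paired calibration measurements."""
    gain, offset = np.polyfit(raw, reference, deg=1)
    return gain, offset

# Example: a temperature sensor that reads slightly high with a small scale error
raw = np.array([10.2, 20.5, 30.9, 41.2, 51.6])        # sensor readings
reference = np.array([10.0, 20.0, 30.0, 40.0, 50.0])  # trusted reference values
gain, offset = calibrate_linear(raw, reference)
corrected = gain * raw + offset
print(gain, offset, corrected)
```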
This chart illustrates the concept of level 0 processing by type of information or sensor type. As always
in these illustrations, we view sensors as if we are surveying or monitoring the entire world. Of course
this is only conceptual. The domain being monitored or observed depends on our specific application.
For condition-based maintenance, we might seek to monitor the performance of a mechanical system
such as an airplane. We would use sensors to measure temperature, pressure, vibration and chemical
composition of a lubricant. For this application, a human pilot, member of a flight crew or maintenance
person may provide observations about the condition and context of a machine’s operation. For a
medical application, we might seek to determine the health of an individual human using X-rays,
sonograms, temperature and biological tests. For this application, human observations might include
observations from a physician, nurse, or even the patient himself.
In any event, we may have access to data from sensors that produce a scalar or vector output (e.g., a thermometer that produces a single number – temperature), a sensor that produces a signal or time series output, an image sensor that produces one or more images, and finally the observations
of a human. These categories of sensor types are shown here. For each category of sensor or
information source, different types of level 0 processing may be applied.
Scalar/vector sensors - For sensors that output a scalar or vector quantity, data conditioning might be
applied involving unit conversions, transformations from one numerical scale to another, changes from
one coordinate reference frame to another, thresholding and other methods.
Signal sensors - For sensors that produce a signal or time series, numerous signal processing algorithms
can be applied. In the previous chart we showed examples of both parametric and nonparametric
methods that could be applied.
Imagery sensors - For sensors that produce images, the entire field of digital image processing and
semantic meta-data processing is applicable.
Human observations - Finally for human observations, we may utilize methods from natural language
processing to translate the human's "story" about what he or she has seen into extracted information about the observed entity's location, characteristics, and identity.
In the next few charts we will consider each of these types of sources.
Level 0 processing of scalar or vector data from a sensor or source includes techniques such as scaling
methods and transformations, coordinate transformations and rotations, smoothing, filtering, or averaging methods, thresholding, feature extraction, and alternate representations. The concept is
illustrated on the lower part of this chart. Original scalar or vector data, represented by y is input to a
transformation function. The output of this transformation is the original data, y, plus one or more
alternative representations indicated by the letter z. The output may also include statistical metrics
such as standard deviation, if the output sensor data are averaged over a period of time. In addition,
the output may include semantic representations (e.g., "engine overheating!").
Other types of transformations include scaling, change of representation units, coordinate
transformations and rotations. For example, a radar observation that reports range, azimuth and
elevation to an observed target may be translated into (x,y,z) Cartesian coordinates related to the
radar's location. Similarly, the location of an observed person might originally be reported with respect
to a building or reference point (e.g., the observed person is at latitude 40 degrees, 47 minutes and 35
seconds North and Longitude 77 degrees, 51 minutes and 37 seconds West). However, this might be
more useful translated to Military Grid coordinates.
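A minimal sketch of such a coordinate transformation is shown below (assuming azimuth is measured clockwise from North and elevation above the horizontal; a real radar's conventions should be checked).

```python
import numpy as np

def range_az_el_to_enu(range_m, az_deg, el_deg):
    """Convert a radar report (range, azimuth, elevation) to local
    East-North-Up Cartesian coordinates centered on the radar."""
    az = np.radians(az_deg)
    el = np.radians(el_deg)
    east = range_m * np.cos(el) * np.sin(az)
    north = range_m * np.cos(el) * np.cos(az)
    up = range_m * np.sin(el)
    return east, north, up

# Example: a target at 12 km range, 45 degrees azimuth, 3 degrees elevation
print(range_az_el_to_enu(12000.0, 45.0, 3.0))
```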
We may take the output of a sensor and perform smoothing or averaging over a period of time. A
blood pressure measurement, for example, might be more meaningful to a physician if we know
the average blood pressure of a patient over a period of an hour or day, rather than the moment the
patient walks into the physician’s office. Finally, we might seek to use a threshold and only report the
observation if it exceeds a certain value. The temperature of the engine of your automobile may not
be useful to know unless the engine is deemed to be overheating.
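The sketch below illustrates smoothing and thresholding on a short series of readings; the window length and alert threshold are arbitrary values chosen only for illustration.

```python
import numpy as np

def moving_average(x, window):
    """Smooth a scalar time series with a simple moving average."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

# Example: systolic blood-pressure readings taken over a day
readings = np.array([118, 121, 135, 142, 138, 125, 119, 151, 147, 144], dtype=float)
smoothed = moving_average(readings, window=3)

ALERT_LEVEL = 140.0                 # illustrative threshold only
alerts = smoothed > ALERT_LEVEL     # report only values that exceed the threshold
print(smoothed.round(1), alerts)
```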
Finally, feature extraction and other representations might be utilized. For frequency domain data,
we might seek to represent the sensor data as the location and magnitude of the peaks of the data (viz.,
the primary frequencies present in the data).
For sensors that provide an output signal or time series, there are two main categories of algorithms for
level 0 processing; these include parametric modeling methods and non-parametric modeling methods.
The concept is illustrated in the lower part of the chart above. A generally noisy signal is input to a
processing or transformation function that outputs a new representation of the signal along with
parameters that characterize the signal. For example, as illustrated above, a noisy time series is input to
our transformation process which applies a 512 point Fourier Transform. We see the output frequency
domain representation on the right. It is clear from the transformation that the original noisy time series is composed of two main frequencies plus additive noise.
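An example along these lines can be reproduced in a few lines of Python; the sample rate, tone frequencies, and noise level below are illustrative assumptions rather than the values behind the chart.

```python
import numpy as np

fs = 1024.0                        # assumed sample rate, Hz
t = np.arange(512) / fs            # 512 samples, as in the example
x = (np.sin(2 * np.pi * 50 * t)            # first tone at 50 Hz
     + 0.7 * np.sin(2 * np.pi * 120 * t)   # second tone at 120 Hz
     + np.random.normal(scale=1.0, size=t.size))  # additive noise

X = np.fft.rfft(x)                 # 512-point transform of the real signal
freqs = np.fft.rfftfreq(512, d=1 / fs)
magnitude = np.abs(X) / t.size

# The two largest peaks should sit near 50 Hz and 120 Hz
print(sorted(freqs[np.argsort(magnitude)[-2:]]))
```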
While many techniques are available to manipulate the input data, the specific techniques selected
should depend upon our understanding of the nature of the observing environment and the entities
being observed. If we believe that the phenomenon we are interested in is periodic in nature (e.g., a heartbeat), then spectral methods are applicable.
If we know the form of the signal being observed, we could utilize methods such as matched filter
processing. For example, if we are seeking to find an emitter (such as that associated with an aircraft's lost "black box"), we could generate the signal that we would expect to observe and then match that signal against the actual output of a sensor tuned to the correct frequency and see whether there is a match (subject to noise and environmental conditions). Alternatively, if we are seeking to observe a trend, such as
the evolution of the wear in a mechanical system, we might select trend analysis techniques.
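A minimal matched-filter sketch, assuming the expected waveform (the "template") is known in advance: cross-correlate it against the received data and look for a strong peak.

```python
import numpy as np

def matched_filter(received, template):
    """Cross-correlate a known template against received data.

    Returns the lag of the strongest match and the full correlation sequence.
    """
    corr = np.correlate(received, template, mode="full")
    lag = int(np.argmax(np.abs(corr))) - (len(template) - 1)
    return lag, corr

# Example: a short pulse hidden in noise starting at sample 300
template = np.sin(2 * np.pi * 0.1 * np.arange(50))
received = 0.5 * np.random.randn(1000)
received[300:350] += template
print(matched_filter(received, template)[0])   # should be close to 300
```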
An excellent overview of these methods and techniques is provided by the book, Signal Processing for
Intelligent Sensor Systems, by David Swanson, published by Marcel Dekker, 2000. A free online course
in digital signal processing is offered by Coursera. See https://www.coursera.org/course/dsp
The rapid evolution of computing capabilities, miniaturization of computers, and maturation of signal
processing techniques allows embedding sophisticated signal processing capabilities in devices such as
digital receivers, digital hearing aids, and noise cancelling earphones. Digital receivers and hearing
aids can perform embedded functions such as high and low frequency filtering, boosting of selected
frequency ranges to improve the match with the limitations of human hearing, and so-called “coloring”
of signals (e.g., boosting the bass of music to improve its sound).
Digital hearing aids provide a greatly improved experience for users over previous hearing aids that
simply amplified all sound, including background noise. Increasingly sophisticated digital hearing aids can match their processing to the specific needs of an individual user.
Active noise cancelation devices monitor ambient noise or background sounds and use an active
speaker to generate the same sound but with inverted phase. When the ambient noise combines with
the cancellation sound, the result is a quieting effect. Active noise cancelation is being introduced into
automobiles to reduce engine and road noise.
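The principle can be sketched in a few idealized lines: generate the anti-phase of the ambient sound and note that the sum cancels. Real systems must estimate the ambient signal with imperfect microphones and finite latency, so the cancellation is never this perfect.

```python
import numpy as np

t = np.linspace(0, 1, 8000)
ambient = 0.5 * np.sin(2 * np.pi * 100 * t)   # assumed low-frequency engine drone
anti_noise = -ambient                         # same waveform, inverted phase
residual = ambient + anti_noise               # ideally sums to silence
print(np.max(np.abs(residual)))               # ~0 in this idealized case
```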
If the source data for the fusion system is image data, then level 0 processing can be performed using
image processing techniques. As with signal processing, there are a very large number of image
processing techniques and tools available to implement those techniques. The commercial tool MATLAB, for example, has an extensive image processing toolbox
(http://www.mathworks.com/products/image/). In addition, Coursera offers a free online course on
digital image and video processing (https://www.coursera.org/course/digital).
In essence, an image is represented as a matrix of picture elements (pixels) that are manipulated via
mathematical techniques to produce a transformed image plus meta-data. Some types of methods are shown on this chart. We may seek a number of effects such as: deblurring an image, contrast
enhancement, segmentation (to isolate a portion of an image that has an object of interest), rotation
and scaling effects, and for selected objects in the image the ability to extract features such as height,
width and color, and ultimately object identification. Some examples of these types of effects are
provided by Mathworks (http://www.mathworks.com/help/images/examples/index.html#transforms).
Some examples of image processing are shown here including contrast enhancement, image sharpening,
deconvolution, and image segmentation to extract objects of interest.
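As a brief sketch of two of these operations, the snippet below uses scikit-image (assumed to be installed) to perform histogram-equalization contrast enhancement and a simple Otsu-threshold segmentation on a built-in sample image.

```python
from skimage import data, exposure, filters

image = data.camera()                        # sample grayscale image

# Contrast enhancement by histogram equalization
enhanced = exposure.equalize_hist(image)

# Segmentation: Otsu's threshold separates foreground from background
threshold = filters.threshold_otsu(image)
mask = image > threshold
print(threshold, mask.mean())                # threshold value and foreground fraction
```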
A rapidly emerging area of image processing involves automated face recognition. This entails steps
such as: i) segmenting an image to localize humans and human faces, ii) obtaining feature
measurements from a selected face and iii) using a pattern recognition/machine learning technique to
map the feature vector to a unique matching feature vector representing a known or previously
observed face. This latter step typically involves a pattern matching system such as a neural network
that has been “trained” by providing many examples of known faces. This chart shows a feature
extraction process developed by Dr. Wen-Sheng Chu of Carnegie Mellon University to support
improved face recognition.
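The final matching step (iii) can be sketched as a nearest-neighbor search over feature vectors, as below. The detection and feature-extraction stages are assumed to happen upstream, and this is not the CMU method shown in the chart.

```python
import numpy as np

def match_face(probe_features, gallery):
    """Match a probe feature vector against a gallery of known faces.

    `gallery` maps a person's name to a previously stored feature vector;
    the closest vector (Euclidean distance) is reported as the match.
    """
    best_name, best_dist = None, np.inf
    for name, features in gallery.items():
        dist = np.linalg.norm(np.asarray(probe_features) - np.asarray(features))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, best_dist

# Example with made-up 4-dimensional feature vectors
gallery = {"alice": [0.1, 0.9, 0.3, 0.4], "bob": [0.8, 0.2, 0.7, 0.1]}
print(match_face([0.15, 0.85, 0.35, 0.38], gallery))   # expected: alice
```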
Researchers from Facebook announced in a paper at the 2014 Conference on Computer Vision and Pattern Recognition (CVPR) that they had developed a nine-layer neural network, trained on 4 million facial images of about 4,000 people, and that their algorithm achieved a correct face recognition accuracy of over 97% (see "DeepFace: Closing the Gap to Human-Level Performance in Face Verification" by Y. Taigman, M. Yang, M. Ranzato and L. Wolf). While this is certainly impressive, such
processing may not be so effective in dynamic situations such as identifying an individual in a crowded
room or public place. The Face Detection Homepage (http://www.facedetection.com/), maintained by
Robert Frischholz, provides links to commercial and free face recognition software, access to sample
data sets, links to research organizations and projects, and access to online demonstrations of face
recognition. Researchers at Colorado State University also maintain a web site
(http://www.cs.colostate.edu/evalfacerec/index10.php) that supports the evaluation of face recognition
algorithms. Not surprisingly, MATLAB has a toolkit that supports face recognition
(http://www.mathworks.com/discovery/face-recognition.html).
We've seen that the concept of face recognition is rapidly emerging and being applied to the development of new apps such as Google Glass's NameTag app. This application allows a person
wearing Google Glass to select an observed stranger and scour social network media for pictures and
the associated name of the observed stranger. Other applications involve the use of face recognition on your own smart phone or netbook to allow it to recognize you and hence obviate the need to remember your own password!
The more general problem beyond face recognition is a form of computer vision – how to develop
algorithms to view images (or signals) and recognize the content of the image or signal. The problem
for images is also known as automatic linguistic indexing of pictures (ALIP) or semantic meta-data
labeling of images. We will now turn briefly to a system developed by Dr. James Wang of the
Pennsylvania State University that has shown great promise in ALIP.
A problem addressed by Dr. Wang is how to effectively perform a query on a very large collection of
image data to find images that could be described by a set of key words or find images that “look like”
an image that we have in hand. We will show the concept for satellite image data. However, Dr.
Wang has applied his evolving techniques for applications such as medical image data (viz., find me
examples of patient X-rays that “look like” the X-ray of my current patient), for detection of forgeries in
paintings, and many other areas. The student is referred to Dr. Wang’s web page for further
information on his current research (http://ist.psu.edu/directory/jzw11).
In the satellite imagery example above, we have a very large collection of image data and want to use
semantic categorization, combined with content-based information retrieval (CBIR) to find images of
interest.
The concept of such a query approach is shown here. A query patch or sample image (containing the
type of elements we’re interested in such as mountains, etc.) is processed and input to a large satellite
image database. Dr. Wang's processing approach allows rapid search and ranking of images that
“look like” the query patch. This would support queries such as “find me mountainous regions with
snow-caps”, or “find forests of a certain density”.
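A greatly simplified sketch of the ranking step follows: describe each image with a crude feature (here a normalized intensity histogram, an illustrative stand-in for Wang's much richer spatial and spectral features) and rank archive images by cosine similarity to the query patch.

```python
import numpy as np

def histogram_features(image, bins=32):
    """Very simple content descriptor: a normalized intensity histogram."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 255), density=True)
    return hist

def rank_by_similarity(query_patch, archive):
    """Rank archive images by cosine similarity to the query patch."""
    q = histogram_features(query_patch)
    scores = []
    for name, image in archive.items():
        f = histogram_features(image)
        sim = np.dot(q, f) / (np.linalg.norm(q) * np.linalg.norm(f) + 1e-12)
        scores.append((sim, name))
    return sorted(scores, reverse=True)

# Example with random stand-in "images"
archive = {f"scene_{i}": np.random.randint(0, 256, (64, 64)) for i in range(5)}
query = np.random.randint(0, 256, (32, 32))
print(rank_by_similarity(query, archive))
```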
Wang's system has a number of benefits. It has been shown to have a practical implementation on large-scale archives of images. It is flexible enough to adapt to various applications of satellite imagery in geography, military operations, agriculture, and the search for mineral deposits. The approach exploits
both spatial information such as size and shape as well as spectral characteristics in the image. It uses
a model to generate semantic categorization of images using a supervised learning approach, and also
handles untrained classes of data. Finally, it uses a scalable content-based information retrieval (CBIR)
system for efficient querying and browsing.
For completeness it should be noted that there are a number of other commercial and open source CBIR
systems. A partial list is available on Wikipedia (http://en.wikipedia.org/wiki/List_of_CBIR_engines).
More details on Wang’s CBIR system are provided in this chart for your reference. Notice that the
system utilizes methods such as hidden Markov models, support vector machine classifiers, and
integrated region matching. Some online demonstrations of Dr. Wang’s image processing tools are
available at the following website: http://wang.ist.psu.edu/IMAGE/. For those of you who are amateur
photographers, Dr. Wang has a tool called ACQUINE (Aesthetic Quality Inference Engine) that allows you
to present a sample photograph and have the tool rate the photo regarding its aesthetic qualities (viz.,
how good is it?).
The final type of input to consider for level 0 processing involves humans as observers, reporting their
observations and inferences via voice or text. We distinguish this from the concept of humans as
mobile sensor “platforms” in which they carry mobile devices and use the sensors on those devices to
collect sensor data. Thus, a person carrying a cell phone and using the associated camera to capture information about a crisis or event is different from a person using the cell phone to provide messages about what they see.
The concept of humans as observers is discussed by D. Hall, N. McNeese and J. Llinas in "H-Space: Humans as Observers", chapter 3 in Human-Centered Information Fusion by D. Hall and J. Jordan, Artech House,
2010. That chapter provides a discussion of models for how humans transform their sensory input into
texts or reports. Human reports can provide significant value to the observations of physical sensors by
providing contextual information and interpretation. For example, an image sensor can provide precise
information about the size, facial expressions, location, and attributes of an observed person, but
virtually no information about the person’s intent. Such information can be inferred by a human
observer. Similarly, the observations about the condition of a machine, such as an automobile, by
physical sensors may provide an excellent characterization of the machine’s state, but little or no
information about the context in which the machine is operating.
On the other hand, humans have many biases and sensory limitations such as the “blind spot”.
Moreover, humans tend to provide information expressed in “fuzzy” terms – e.g., I see a man “near” the
building. These limitations, biases and fuzzy terminology provide challenges to the interpretation and
processing of human reports.
Processing of human reports constitutes the final broad category of level 0 processing. The input is
usually provided as unformatted text, which must be processed using natural language processing
methods. Similar to signal processing, the field of natural language processing (NLP) has an extensive
history with numerous methods available. Coursera offers a free online course in NLP – see
https://www.coursera.org/course/nlp, and many texts exist on the subject. The Stanford Natural
Language Processing Group (http://nlp.stanford.edu/downloads/) makes available free downloadable
copies of many of their NLP tools to perform functions such as language parsing, tagging parts of speech,
recognizing named entities, categorization/classification of text, and toolboxes for modeling specific
topics. Apache OpenNLP is another open-source toolkit for performing natural language processing
(https://opennlp.apache.org/).
Types of techniques associated with processing textual information include the following (a minimal key-word extraction sketch in Python follows this list):
Key word extraction – identifying specific words of interest in a text stream
Name disambiguation – trying to determine whether Dave, Dave Hall, David Hall, and Dean Hall are the same person
Linking of semantics to parametric data – e.g., converting "near the bank building" to a latitude and longitude, or translating "tall man" to "height = 6 feet"
Syntactic processing – handling sentence and phrase structures
Thesaurus processing – use of a thesaurus to determine the equivalency of words or phrases
Semantic distance calculations – how close is a word or phrase to another word or phrase (e.g., does "nearby" equal "close to"?)
Providing links to other documents
Meta-data generation – translating text phrases into RDF triples or graph representations
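The first of these techniques can be sketched with nothing more than tokenization, a small assumed stop-word list, and frequency counting, as below.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "was", "at", "near"}

def extract_keywords(report, top_n=5):
    """Naive key-word extraction: tokenize, drop stop words, count frequency."""
    tokens = re.findall(r"[a-z']+", report.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)

print(extract_keywords("A tall man was seen near the bank building, "
                       "and the man entered the building at noon."))
```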
In general, NLP is a very difficult problem. Human language, especially English, is challenging for computers to interpret. Humans have extensive "real-world" knowledge and contextual perspectives
that allow interpretation of sentences and phrases. Consider the example phrase, “I saw Pike’s Peak
flying to New York”. This requires knowledge about what Pike’s Peak is, the impossibility of the
mountain peak actually flying and hence the interpretation that the speaker was flying on an airplane,
knowledge of what New York is, and where it is, etc.
Nevertheless, despite these challenges, progress is continually being made in NLP, which leads to
increasing viability of effectively processing and using human input via textual reports.
Recently at Penn State, Dr. Stephen Shafer has conducted research on the use of coherence networks
for processing text data (see “CohNet: Automatic theory generation from analyst text files using
coherence networks”, Proceedings of the 2014 Defense and Security SPIE Conference, Baltimore Md.
May 6 – 8, 2014). The concept is shown in this chart. Messages from an analyst or observer are first
processed via a natural language processing approach using finite state automata to produce Resource
Description Framework (RDF) triple representations. These RDF representations are a way of
expressing subject-predicate-object statements. For example, the expression "the sky has the color blue" in RDF notation has the triple: a subject, "the sky", a predicate, "has", and an object, "the color blue". A collection of these types of relations is often represented using directed graphs. Shafer has
developed an approach using coherence network calculations to identify important (or coherent)
relationships that occur in the evolving complex graph representation. These are prioritized and
presented to an analyst to suggest potential important relationships and links among text messages.
An example is provided in the next chart.
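Before turning to that chart, here is a minimal sketch of how such triples can be collected into a directed graph; the triples are illustrative and are not taken from Shafer's CohNet output.

```python
from collections import defaultdict

# A few subject-predicate-object triples, including the lecture's example
triples = [
    ("the sky", "has", "the color blue"),
    ("Lincoln", "issued", "the Emancipation Proclamation"),
    ("Lincoln", "delivered", "the Gettysburg Address"),
]

# Represent the triples as a directed graph: subject -> list of (predicate, object)
graph = defaultdict(list)
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))

for subject, edges in graph.items():
    for predicate, obj in edges:
        print(f"{subject} --{predicate}--> {obj}")
```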
This chart shows an example of the coherence network processing. Text information from Wikipedia
regarding Abraham Lincoln was processed to produce the output graph shown on the right hand side.
We use this example because of the background knowledge that most people have regarding the
historical figure Abraham Lincoln. One can easily see how relationships related to the Emancipation Proclamation, the Gettysburg Address, political activities, etc. are automatically generated using the coherence network processing. This certainly does not replace the kind of human reasoning that can be done related to text messages. However, it does provide the capability to develop some textual "features" that could be useful, especially if a large number of texts or reports need to be processed.
In determining what type of level 0 processing should be performed, it is vital to understand your selected application or problem. What sensors or sources of information are available? How do those sensors and sources perform in a real observing environment? What categories of data (e.g., scalar, vector, signal, or image) will be available? How do the sensor observations and sources connect to the ultimate inferences sought by the data fusion system? Given this understanding, one can then begin to choose level 0 algorithms and processes to best support the "downstream" processes. It is important to do the best job you can at every step along the way; one cannot "make up" for, or correct, a failure to properly process the original sensor and source data.