This lesson introduces the concept of Level 0 fusion in the JDL data fusion processing model. Level 0 in the JDL model is focused on processing individual sensor data prior to combining those data with data from other sensors. The objectives of this lesson are to: i) introduce the concept of level 0 processing, ii) identify categories of techniques for level 0 processing, and iii) extend the concept of traditional signal and image processing to "meta-data" generation.

Recall that level 0 in the JDL data fusion process model was not part of the original formulation of the model. Instead, this "new level" of processing was introduced by Alan Steinberg, Chris Bowman and Frank White in a 1999 paper entitled "Revisions to the JDL Data Fusion Model". Level 0 processing was introduced for a number of reasons, including: i) recognition of the rapid introduction of "smart sensors" with embedded processing capability that allows application of sophisticated image and signal processing, and ii) recognition that some processing could be done on a single-sensor basis that would improve subsequent multi-sensor fusion. Examples of the latter include functions such as "pre-detection" fusion and single-sensor entity characterization and identification.

The following are some comments on level 0 fusion:
- Level 0 processing concerns processing each source of data independently to obtain the most useful information possible; we want to "squeeze out" useful information from each sensor or source of information.
- In general, input data to a fusion system may include scalar or vector data, signal data, image data, or textual information. Each of these classes of information may entail an entire range of potentially applicable techniques (e.g., signal processing, image processing, text-based processing).
- It is beyond the scope of this lecture (and indeed this entire course) to address these subjects in depth; however, we will provide some thoughts on each of these data types in this lesson.
- Finally, an emerging new area is the generation of semantic meta-data for signals and images, a powerful new process to improve data fusion. This involves using pattern recognition and machine learning techniques to process data and provide semantic annotations. An example is automatic face recognition. Another example is linguistic indexing of images (e.g., processing images to automatically assign words that describe the content of the image). These concepts are described later in this lesson.

This chart shows some examples of the types of processing that could be performed on single-sensor data. There is an enormous number of papers and books describing these methods, as well as commercial tools for performing meta-sensor processing. For example, the continually evolving commercial toolkit MATLAB (http://www.mathworks.com/products/matlab/) provides a signal processing toolbox, a wavelet toolbox, a phased array system toolbox, and many other tools that can perform the functions identified in the chart above. The chart identifies four basic categories of meta-sensor processing: i) parametric modeling, ii) non-parametric modeling, iii) entity detection, and iv) sensor validation and calibration. It should be noted that the techniques shown in the chart are only illustrative, not exhaustive; there are many more types of methods in each category that could be applied to sensor data.
For example, the concept of image analysis (listed under the category of non-parametric analysis) is an entire field of study with numerous texts, papers, and tools available. Let's briefly discuss each of these categories.

Parametric modeling methods are mathematical techniques for "operating on" data to produce new representations. Parametric methods rely on assumptions about the underlying statistical properties of the data – e.g., assumptions about the characteristics of the "noise" in the data. For example, one might assume that the sensor is observing some type of time-varying phenomenon and that the observations are not perfect, but rather are "corrupted" by inherent noise or random variations. This noise and variation in the sensing process can be induced by the internal workings of the sensor, variations in the observing medium, and variations in the phenomenon or entity being observed. Parametric modeling methods seek to address questions such as: Are there periodicities in the data? Is there a trend? Are there multiple time series or observed phenomena interleaved in the data? Methods of parametric modeling include techniques such as spectral analysis, discrete Fourier transforms, time-frequency domain processing methods such as wavelet transforms, auto-regressive (AR) methods, and others. A readable introduction to time series analysis is provided by an article entitled "Time series analysis with a past", written by O. Hammer in 2010 (http://nhm2.uio.no/norlex/past/TimeseriesPast.pdf).

A second category of meta-sensor processing involves non-parametric analysis. Some of these methods are similar to parametric modeling but make fewer assumptions about the underlying nature of the data. Examples include neural networks, energy detection methods and trend analyses.

A third type of meta-sensor processing, entity detection, seeks to determine whether the observed data are the result of some observed entity, activity or event. For sensors involving an array of elements, for example, we seek to process the data observed by each of the elements in the array to determine whether the observations are simply noise, or whether, when processed together, the data are indicative of some entity or target. Moving target indicator (MTI) processing involves processing radar data to take advantage of the fact that a moving target provides a response that can be contrasted against background clutter. Hypothesis testing references an entire field of statistical methods; for target or entity detection, for example, we hypothesize that a target or entity is the "cause" of the observed data and seek to confirm or refute that hypothesis using statistical techniques. Another set of techniques, called pre-detection fusion, involves processing data in a sensor array in which none of the individual array elements or sensor contributors can by itself reliably declare that it has "seen" a target. See, for example, "Predetection fusion in a large sensor network with unknown target locations", by R. Georgescu, P. Willett, S. Marano and V. Matta, Journal of Advances in Information Fusion, vol. 7, no. 1, June 2012, pp. 61–77.

Finally, meta-sensor processing can involve sensor validation and calibration. Experimental tests may be conducted to characterize the performance of a sensor and allow development of mathematical models of the performance as a function of the environmental conditions, the nature of the entity being observed, and so on. Other concepts involve calibration procedures for the sensor, validation of sensor performance, and establishment of characteristics such as the probability of detection, the probability of correct identification, and the probability of false alarm.
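To make the entity detection idea concrete, the sketch below implements a classic energy detector as a statistical hypothesis test. It is a minimal illustration, not any particular fielded algorithm: it assumes zero-mean Gaussian noise with known variance, sets the decision threshold from a user-chosen false-alarm probability, and the sample sizes, tone frequency and amplitudes are arbitrary choices for demonstration.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)

def energy_detector(x, noise_var, p_fa=1e-3):
    """Decide 'signal present' vs 'noise only' from N samples.

    Under H0 (noise only, zero-mean Gaussian with variance noise_var),
    the statistic sum(x^2)/noise_var follows a chi-square distribution
    with N degrees of freedom, so the threshold can be set from a
    desired false-alarm probability p_fa.
    """
    n = len(x)
    statistic = np.sum(x**2) / noise_var
    threshold = chi2.ppf(1.0 - p_fa, df=n)
    return statistic > threshold, statistic, threshold

# Noise-only samples vs. samples containing a sinusoidal target return.
noise_var = 1.0
t = np.arange(256)
noise_only = rng.normal(0.0, np.sqrt(noise_var), size=t.size)
with_target = noise_only + 1.5 * np.sin(2 * np.pi * 0.05 * t)

print(energy_detector(noise_only, noise_var)[0])   # expected: False (noise only)
print(energy_detector(with_target, noise_var)[0])  # expected: True (target energy present)
```

The same detect-or-reject structure underlies MTI and pre-detection fusion schemes; what changes is the test statistic and how data from multiple array elements are combined before the decision.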
This chart illustrates the concept of level 0 processing by type of information or sensor type. As always in these illustrations, we view sensors as if we are surveying or monitoring the entire world. Of course, this is only conceptual; the domain being monitored or observed depends on our specific application. For condition-based maintenance, we might seek to monitor the performance of a mechanical system such as an airplane. We would use sensors to measure temperature, pressure, vibration and the chemical composition of a lubricant. For this application, a human pilot, member of a flight crew or maintenance person may provide observations about the condition and context of the machine's operation. For a medical application, we might seek to determine the health of an individual human using X-rays, sonograms, temperature readings and biological tests. For this application, human observations might include observations from a physician, a nurse, or even the patient himself. In any event, we may have access to data from sensors that produce a scalar or vector output (e.g., a thermometer that produces a single number, a temperature), sensors that produce a signal or time series output, image sensors that produce one or more images, and, finally, the observations of a human. These categories of sensor types are shown here. For each category of sensor or information source, different types of level 0 processing may be applied.
- Scalar/vector sensors – For sensors that output a scalar or vector quantity, data conditioning might be applied, involving unit conversions, transformations from one numerical scale to another, changes from one coordinate reference frame to another, thresholding and other methods.
- Signal sensors – For sensors that produce a signal or time series, numerous signal processing algorithms can be applied. In the previous chart we showed examples of both parametric and non-parametric methods that could be applied.
- Imagery sensors – For sensors that produce images, the entire field of digital image processing and semantic meta-data processing is applicable.
- Human observations – Finally, for human observations, we may utilize methods from natural language processing to translate the human's "story" about what he or she has seen into extracted information about the observed entity's location, characteristics and identity.

In the next few charts we will consider each of these types of sources. Level 0 processing of scalar or vector data from a sensor or source includes techniques such as scaling methods and transformations, coordinate transformations and rotations, smoothing, filtering or averaging methods, thresholding, and feature extraction and alternative representations. The concept is illustrated on the lower part of this chart. Original scalar or vector data, represented by y, are input to a transformation function. The output of this transformation is the original data, y, plus one or more alternative representations indicated by the letter z. The output may also include statistical metrics, such as the standard deviation if the sensor data are averaged over a period of time. In addition, the output may include semantic representations (e.g., "engine overheating!").
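As a small illustration of this y-to-z transformation, the sketch below conditions readings from a hypothetical engine temperature sensor: it converts units, averages the samples, computes a spread statistic, and attaches a semantic label when a threshold is exceeded. The function name, the threshold value and the sample readings are invented for illustration.

```python
import statistics

def condition_temperature(samples_f, overheat_threshold_c=110.0):
    """Level 0 conditioning for a scalar temperature sensor (hypothetical example).

    Converts raw Fahrenheit readings to Celsius, smooths them with a simple
    mean, reports a spread statistic, and attaches a semantic label when the
    smoothed value exceeds an assumed overheating threshold.
    """
    samples_c = [(f - 32.0) * 5.0 / 9.0 for f in samples_f]     # units conversion
    mean_c = statistics.mean(samples_c)                          # smoothing / averaging
    std_c = statistics.stdev(samples_c) if len(samples_c) > 1 else 0.0
    label = "engine overheating!" if mean_c > overheat_threshold_c else "nominal"
    return {"y": samples_c, "z_mean": mean_c, "z_std": std_c, "semantic": label}

print(condition_temperature([230.0, 233.5, 231.2]))
# -> conditioned readings plus the semantic label "engine overheating!"
```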
Other types of transformations include scaling, change of representation units, and coordinate transformations and rotations. For example, a radar observation that reports range, azimuth and elevation to an observed target may be translated into (x, y, z) Cartesian coordinates relative to the radar's location. Similarly, the location of an observed person might originally be reported with respect to a building or reference point (e.g., the observed person is at latitude 40 degrees, 47 minutes, 35 seconds North and longitude 77 degrees, 51 minutes, 37 seconds West); however, this might be more useful when translated into military grid coordinates. We may take the output of a sensor and perform smoothing or averaging over a period of time. A blood pressure measurement, for example, might be more meaningful to a physician if we know the patient's average blood pressure over a period of an hour or a day, rather than only the value at the moment the patient walks into the physician's office. We might also apply a threshold and only report an observation if it exceeds a certain value; the temperature of your automobile's engine may not be useful to know unless the engine is deemed to be overheating. Finally, feature extraction and other representations might be utilized. For frequency-domain data, we might represent the sensor data as the locations and magnitudes of the peaks in the data (viz., the primary frequencies present in the data).

For sensors that provide an output signal or time series, there are two main categories of algorithms for level 0 processing: parametric modeling methods and non-parametric modeling methods. The concept is illustrated in the lower part of the chart above. A generally noisy signal is input to a processing or transformation function that outputs a new representation of the signal along with parameters that characterize the signal. For example, as illustrated above, a noisy time series is input to our transformation process, which applies a 512-point Fourier transform. We see the output frequency-domain representation on the right. It is clear from the transformation that the original noisy time series is composed of two main frequencies and is corrupted by noise.

While many techniques are available to manipulate the input data, the specific techniques selected should depend upon our understanding of the nature of the observing environment and the entities being observed. If we believe that the phenomenon we are interested in is periodic in nature (e.g., a heartbeat), then spectral methods are applicable. If we know the form of the signal being observed, we could utilize methods such as matched filter processing. For example, if we are seeking to find an emitter (such as that associated with an aircraft's lost "black box"), we could generate the signal that we would expect to observe and then match that signal against the actual output of a sensor tuned to the correct frequency to see if there is a match (subject to noise and environmental conditions). Alternatively, if we are seeking to observe a trend, such as the evolution of wear in a mechanical system, we might select trend analysis techniques. An excellent overview of these methods and techniques is provided by the book Signal Processing for Intelligent Sensor Systems, by David Swanson, published by Marcel Dekker, 2000. A free online course in digital signal processing is offered by Coursera (see https://www.coursera.org/course/dsp).
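The chart's Fourier-transform illustration can be reproduced with a few lines of code. The sketch below is only illustrative: the sampling rate, the two tone frequencies and the noise level are arbitrary choices, but the 512-point transform clearly exposes the two underlying frequencies in the noisy time series, and the strongest spectral peaks can then be reported as extracted features.

```python
import numpy as np

# Synthetic noisy time series: two sinusoids (50 Hz and 120 Hz) plus Gaussian noise.
fs = 1024.0                      # sampling rate in Hz (assumed for illustration)
n = 512                          # transform length, as in the chart's example
t = np.arange(n) / fs
rng = np.random.default_rng(1)
signal = (np.sin(2 * np.pi * 50.0 * t)
          + 0.7 * np.sin(2 * np.pi * 120.0 * t)
          + rng.normal(0.0, 1.0, size=n))

# 512-point FFT: the frequency-domain representation exposes the two tones.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(n, d=1.0 / fs)

# Report the two strongest spectral peaks (a simple feature extraction step).
peak_idx = np.argsort(spectrum)[-2:]
for i in sorted(peak_idx):
    print(f"peak near {freqs[i]:.1f} Hz, magnitude {spectrum[i]:.1f}")
```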
The rapid evolution of computing capabilities, the miniaturization of computers, and the maturation of signal processing techniques allow embedding sophisticated signal processing capabilities in devices such as digital receivers, digital hearing aids, and noise-cancelling earphones. Digital receivers and hearing aids can perform embedded functions such as high- and low-frequency filtering, boosting of selected frequency ranges to improve the match with the limitations of human hearing, and so-called "coloring" of signals (e.g., boosting the bass of music to improve the sound of the music). Digital hearing aids provide a greatly improved experience for users over previous hearing aids, which simply amplified all sound, including background noise. Increasingly sophisticated digital hearing aids can match their processing to the specific needs of an individual user. Active noise cancellation devices monitor ambient noise or background sounds and use an active speaker to generate the same sound but with inverted phase; when the ambient noise combines with the cancellation sound, the result is a quieting effect. Active noise cancellation is being introduced into automobiles to reduce engine and road noise.

If the source data for the fusion system are image data, then level 0 processing can be performed using image processing techniques. As with signal processing, there are a very large number of image processing techniques and tools available to implement those techniques. The commercial tool MATLAB, for example, has an extensive image processing toolbox (http://www.mathworks.com/products/image/). In addition, Coursera offers a free online course on digital image and video processing (https://www.coursera.org/course/digital). In essence, an image is represented as a matrix of picture elements (pixels) that are manipulated via mathematical techniques to produce transformed images plus meta-data. Some types of methods are shown on this chart. We may seek a number of effects such as: deblurring an image, contrast enhancement, segmentation (to isolate a portion of an image that contains an object of interest), rotation and scaling, and, for selected objects in the image, the extraction of features such as height, width and color, and ultimately object identification. Some examples of these types of effects are provided by MathWorks (http://www.mathworks.com/help/images/examples/index.html#transforms). Some examples of image processing are shown here, including contrast enhancement, image sharpening, deconvolution, and image segmentation to extract objects of interest.

A rapidly emerging area of image processing involves automated face recognition. This entails steps such as: i) segmenting an image to localize humans and human faces, ii) obtaining feature measurements from a selected face, and iii) using a pattern recognition/machine learning technique to map the feature vector to a unique matching feature vector representing a known or previously observed face. This latter step typically involves a pattern matching system, such as a neural network, that has been "trained" by providing many examples of known faces. This chart shows a feature extraction process developed by Dr. Wen-Sheng Chu of Carnegie Mellon University to support improved face recognition.
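To illustrate the matching step (iii), here is a deliberately simplified sketch: instead of a trained neural network, it matches a query feature vector to the closest enrolled vector in a small gallery by Euclidean distance. The feature vectors, their dimensionality and the identity labels are invented for illustration; operational systems use much higher-dimensional learned features and a rejection threshold for unknown faces.

```python
import numpy as np

def match_face(query_features, gallery):
    """Nearest-neighbour matching of a face feature vector against a gallery.

    gallery: dict mapping an identity label to an enrolled feature vector.
    Returns the best-matching identity and its distance; the caller would
    normally apply a rejection threshold to handle unknown faces.
    """
    best_label, best_dist = None, np.inf
    for label, enrolled in gallery.items():
        dist = np.linalg.norm(query_features - enrolled)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist

# Hypothetical 4-dimensional feature vectors (real systems use hundreds of dimensions).
gallery = {
    "alice": np.array([0.1, 0.8, 0.3, 0.5]),
    "bob":   np.array([0.9, 0.2, 0.7, 0.1]),
}
query = np.array([0.12, 0.75, 0.35, 0.48])
print(match_face(query, gallery))   # expected: ('alice', small distance)
```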
Researchers from Facebook announced in a paper at the 2014 Conference on Computer Vision and Pattern Recognition (CVPR) that they had developed a special nine-layer neural network, trained on 4 million facial images of 4,000 people, and that their algorithm achieved an accuracy of over 97% in face verification (see "DeepFace: Closing the Gap to Human-Level Performance in Face Verification" by Y. Taigman, M. Yang, M. Ranzato and L. Wolf). While this is certainly impressive, such processing may not be as effective in dynamic situations, such as identifying an individual in a crowded room or public place. The Face Detection Homepage (http://www.facedetection.com/), maintained by Robert Frischholz, provides links to commercial and free face recognition software, access to sample data sets, links to research organizations and projects, and access to online demonstrations of face recognition. Researchers at Colorado State University also maintain a web site (http://www.cs.colostate.edu/evalfacerec/index10.php) that supports the evaluation of face recognition algorithms. Not surprisingly, MATLAB has a toolkit that supports face recognition (http://www.mathworks.com/discovery/face-recognition.html). We have seen that face recognition is emerging rapidly and being applied to the development of new apps such as Google Glass's NameTag app. This application allows a person wearing Google Glass to select an observed stranger and scour social network media for pictures, and the associated name, of that stranger. Other applications involve the use of face recognition on your own smartphones and netbooks to allow them to recognize you – and hence obviate the need to remember your own password!

The more general problem beyond face recognition is a form of computer vision: how to develop algorithms that view images (or signals) and recognize the content of the image or signal. The problem for images is also known as automatic linguistic indexing of pictures (ALIP) or semantic meta-data labeling of images. We now turn briefly to a system developed by Dr. James Wang of the Pennsylvania State University that has shown great promise in ALIP. A problem addressed by Dr. Wang is how to effectively perform a query on a very large collection of image data to find images that could be described by a set of keywords, or to find images that "look like" an image that we have in hand. We will show the concept for satellite image data. However, Dr. Wang has applied his evolving techniques to applications such as medical image data (viz., find examples of patient X-rays that "look like" the X-ray of my current patient), detection of forgeries in paintings, and many other areas. The student is referred to Dr. Wang's web page for further information on his current research (http://ist.psu.edu/directory/jzw11).

In the satellite imagery example above, we have a very large collection of image data and want to use semantic categorization, combined with content-based image retrieval (CBIR), to find images of interest. The concept of such a query approach is shown here. A query patch or sample image (containing the type of elements we are interested in, such as mountains) is processed and input to a large satellite image database. Dr. Wang's processing approach allows a rapid search and ranking of images that "look like" the query patch. This would support queries such as "find me mountainous regions with snow-caps" or "find forests of a certain density".
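The ranking step of such a query can be sketched in a few lines. The example below is not Dr. Wang's system; it simply assumes that each database image (and the query patch) has already been reduced to a fixed-length feature vector (e.g., spectral and texture statistics) and ranks database images by Euclidean distance to the query, smallest first. The feature dimensionality and data are synthetic.

```python
import numpy as np

def rank_images(query_features, database_features, top_k=3):
    """Rank database images by similarity to a query patch.

    Both the query and the database entries are assumed to already be
    fixed-length feature vectors; ranking is by Euclidean distance,
    smallest (most similar) first.
    """
    distances = np.linalg.norm(database_features - query_features, axis=1)
    order = np.argsort(distances)[:top_k]
    return list(zip(order.tolist(), distances[order].tolist()))

rng = np.random.default_rng(3)
database = rng.random((100, 16))               # 100 images, 16-dimensional features
query = database[42] + 0.01 * rng.random(16)   # a patch resembling image 42
print(rank_images(query, database))            # image 42 should rank first
```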
Wang's system has a number of benefits. It has been shown to have a practical implementation on large-scale archives of images. It is flexible enough to adapt to various applications of satellite imagery in geography, military applications, agriculture and the search for mineral deposits. The approach exploits spatial information, such as size and shape, as well as spectral characteristics of the image. It uses a model to generate semantic categorizations of images using a supervised learning approach, and it also handles untrained classes of data. Finally, it uses a scalable content-based image retrieval (CBIR) system for efficient querying and browsing. For completeness, it should be noted that there are a number of other commercial and open-source CBIR systems; a partial list is available on Wikipedia (http://en.wikipedia.org/wiki/List_of_CBIR_engines). More details on Wang's CBIR system are provided in this chart for your reference. Notice that the system utilizes methods such as hidden Markov models, support vector machine classifiers, and integrated region matching. Some online demonstrations of Dr. Wang's image processing tools are available at the following website: http://wang.ist.psu.edu/IMAGE/. For those of you who are amateur photographers, Dr. Wang has a tool called ACQUINE (Aesthetic Quality Inference Engine) that allows you to present a sample photograph and have the tool rate the photo regarding its aesthetic qualities (viz., how good is it?).

The final type of input to consider for level 0 processing involves humans as observers, reporting their observations and inferences via voice or text. We distinguish this from the concept of humans as mobile sensor "platforms", in which they carry mobile devices and use the sensors on those devices to collect sensor data. Thus, a person carrying a cell phone and using its camera to capture information about a crisis or event is different from a person using the cell phone to send messages about what they see. The concept of humans as observers is discussed by D. Hall, N. McNeese and J. Llinas, "H-Space: Humans as Observers", chapter 3 in Human-Centered Information Fusion, by D. Hall and J. Jordan, Artech House, 2010. That chapter provides a discussion of models for how humans transform their sensory input into texts or reports. Human reports can add significant value to the observations of physical sensors by providing contextual information and interpretation. For example, an image sensor can provide precise information about the size, facial expressions, location, and attributes of an observed person, but virtually no information about the person's intent; such information can be inferred by a human observer. Similarly, observations by physical sensors about the condition of a machine, such as an automobile, may provide an excellent characterization of the machine's state, but little or no information about the context in which the machine is operating. On the other hand, humans have many biases and sensory limitations, such as the "blind spot". Moreover, humans tend to provide information expressed in "fuzzy" terms – e.g., I see a man "near" the building. These limitations, biases and fuzzy terminology pose challenges for the interpretation and processing of human reports. Processing of human reports constitutes the final broad category of level 0 processing. The input is usually provided as unformatted text, which must be processed using natural language processing methods.
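One small illustration of such processing is converting a fuzzy spatial phrase in a human report into an approximate numeric constraint. The sketch below uses a made-up lookup table of distance ranges; in practice such ranges would have to come from context, doctrine, or learned models rather than fixed values.

```python
# Hypothetical mapping from fuzzy spatial terms in a human report to
# approximate distance intervals (in meters). The ranges are invented
# for illustration only.
FUZZY_DISTANCE_M = {
    "at": (0, 10),
    "near": (10, 100),
    "close to": (10, 100),
    "some distance from": (100, 1000),
    "far from": (1000, 10000),
}

def parametrize_report(report, reference_point):
    """Convert a fuzzy human observation into a numeric distance constraint."""
    text = f" {report.lower()} "
    for term, (low, high) in FUZZY_DISTANCE_M.items():
        if f" {term} " in text:                 # crude whole-word match
            return {"reference": reference_point,
                    "distance_m": (low, high),
                    "source_text": report}
    return {"reference": reference_point, "distance_m": None, "source_text": report}

print(parametrize_report("I see a man near the bank building", "bank building"))
# -> a (10, 100) m constraint around the reference point
```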
Similar to signal processing, the field of natural language processing (NLP) has an extensive history, with numerous methods available. Coursera offers a free online course in NLP (see https://www.coursera.org/course/nlp), and many texts exist on the subject. The Stanford Natural Language Processing Group (http://nlp.stanford.edu/downloads/) makes available free downloadable copies of many of its NLP tools, which perform functions such as language parsing, part-of-speech tagging, named entity recognition, categorization/classification of text, and topic modeling. Apache OpenNLP is another open-source toolkit for performing natural language processing (https://opennlp.apache.org/). Types of techniques associated with processing textual information include:
- Keyword extraction – identifying specific words of interest in a text stream
- Name disambiguation – trying to determine whether Dave, Dave Hall, David Hall, and Dean Hall are the same person
- Linking semantics to parametric data – e.g., converting "near the bank building" to a latitude and longitude, or translating "tall man" to "height = 6 feet"
- Syntactic processing – handling sentence and phrase structures
- Thesaurus processing – using a thesaurus to determine the equivalency of words or phrases
- Semantic distance calculations – determining how close one word or phrase is to another (e.g., does "nearby" equal "close to"?)
- Providing links to other documents
- Meta-data generation – e.g., translating text phrases into RDF triples or graph representations

In general, NLP is a very difficult problem. Human language, especially English, is challenging for computers to interpret. Humans have extensive "real-world" knowledge and contextual perspectives that allow interpretation of sentences and phrases. Consider the example phrase, "I saw Pike's Peak flying to New York". Interpreting it requires knowledge of what Pike's Peak is, the impossibility of the mountain actually flying (and hence the interpretation that the speaker was flying on an airplane), knowledge of what New York is and where it is, and so on. Nevertheless, despite these challenges, progress is continually being made in NLP, which makes it increasingly viable to effectively process and use human input provided via textual reports.

Recently at Penn State, Dr. Stephen Shafer has conducted research on the use of coherence networks for processing text data (see "CohNet: Automatic theory generation from analyst text files using coherence networks", Proceedings of the 2014 Defense and Security SPIE Conference, Baltimore, MD, May 6–8, 2014). The concept is shown in this chart. Messages from an analyst or observer are first processed via a natural language processing approach using finite state automata to produce Resource Description Framework (RDF) triple representations. RDF is a way of expressing subject–predicate–object statements. For example, the expression "the sky has the color blue" in RDF notation becomes a triple with the subject "the sky", the predicate "has" and the object "the color blue". A collection of these types of relations is often represented using directed graphs. Shafer has developed an approach using coherence network calculations to identify important (or coherent) relationships that occur in the evolving complex graph representation. These are prioritized and presented to an analyst to suggest potentially important relationships and links among text messages. An example is provided in the next chart.
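Before turning to that chart, the triple-and-graph representation just described can be sketched in a few lines. The triples below are invented for illustration (they are not CohNet output); the point is simply that a set of subject–predicate–object triples maps naturally onto a directed, labeled graph.

```python
from collections import defaultdict

# Illustrative RDF-style triples (subject, predicate, object) extracted from text.
triples = [
    ("the sky", "has", "the color blue"),
    ("Abraham Lincoln", "delivered", "the Gettysburg Address"),
    ("Abraham Lincoln", "issued", "the Emancipation Proclamation"),
]

# A collection of triples can be viewed as a directed graph:
# each subject points to its objects, with the predicate as the edge label.
graph = defaultdict(list)
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))

for subject, edges in graph.items():
    for predicate, obj in edges:
        print(f"{subject} --[{predicate}]--> {obj}")
```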
This chart shows an example of the coherence network processing. Text information from Wikipedia regarding Abraham Lincoln was processed to produce the output graph shown on the right-hand side. We use this example because of the background knowledge that most people have regarding the historical figure Abraham Lincoln. One can easily see how relationships related to the Emancipation Proclamation, the Gettysburg Address, political activities, etc., are automatically generated using the coherence network processing. This certainly does not replace the kind of human reasoning that can be applied to text messages; however, it does provide the capability to develop textual "features" that could be useful, especially if a large number of texts or reports need to be processed.

In determining what type of level 0 processing should be performed, it is vital to understand your selected application or problem. What sensors or sources of information are available? How do those sensors and sources perform in a real observing environment? What categories of data (e.g., scalar, vector, signal or image) will be available? How do the sensor observations and sources connect to the ultimate inferences sought by the data fusion system? Given this understanding, one can begin to choose level 0 algorithms and processes that best support the "downstream" processes. It is important to do the best job you can at every step of the way; one cannot "make up for" or correct a failure to perform processing on the original sensor and source data.