We turn now to the second level of processing in the Joint Directors of Laboratories (JDL) data fusion processing model. The objectives of this topic are to introduce the JDL level 2 processing concept, to survey some methods for approximate reasoning such as the Bayes method, the Dempster-Shafer method and others, and to describe some challenges and issues in automated reasoning. In this lesson, we will focus on report-level fusion, while the next lesson will introduce contextual reasoning methods such as rule-based systems and intelligent agents.

We refer again to the JDL processing model and put the level 2 process in perspective. We have previously indicated that the JDL model is neither sequential nor hierarchical. We don't necessarily perform level 1 fusion first, followed by level 2, and so on. However, the level 2 fusion processes tend to operate on the evolving products of level 1 fusion. Thus, in level 2 fusion, we are seeking to develop insight into an evolving situation that involves entities such as targets, events, and activities. Level 2 fusion processing seeks to understand the context of level-1 products, such as how entities, activities and events relate to each other and to the environment. The original specification of the JDL model identified the following types of sub-processes or functions for level-2 processing:

• Object aggregation – Object aggregation involves looking at multiple entities or objects and seeking to establish potential relationships such as geographical proximity, temporal relationships (e.g., this object or event always appears after that object or event), communications among entities and perhaps functional dependence (e.g., in order for this event to occur, the following functions need to be performed).

• Event/activity aggregation – Event and activity aggregation involves determining possible links between events and activities and a higher-level interpretation. For example, girls wearing white gowns, multiple picture-taking events, and a large dinner or banquet might indicate a wedding event.

• Contextual interpretation and fusion – Many events and activities are affected by (and must be interpreted in light of) the environment, weather, doctrine and socio-political considerations. Doctrine refers to the formal or informal rules by which an activity occurs.

• Multi-perspective assessment – Finally, for adversarial situations, we can use a red, blue and white perspective assessment. Red refers to our opponents (regardless of their political or organizational affiliation), blue refers to our own forces or players, and white refers to the environment that affects both red and blue. Thus, for military operations, the white view would look at factors such as terrain and how it affects the ability of ground-based vehicles to traverse from one point to another. This terrain affects both the red and the blue ground vehicles.

While we have indicated that JDL processing is not necessarily hierarchical, there is a kind of hierarchy of abstraction and associated processes. In this chart we show that reasoning and related data may proceed from very physical and specific levels to very general and abstract levels. So we may start with sensor data or human inputs to establish "what's out there" – the existence and observable features of an entity. At a more abstract level, we are interested in the identity, attributes and location of observed entities.
Proceeding to a higher level of abstraction, we might be interested in the behavior of entities and the relationship of one entity to others. At a still higher level of abstraction, we seek to understand the meaning of an evolving situation. And finally, we seek to develop hypotheses about the future implications of the current situation. Correspondingly, the methods for reasoning about this hierarchy of abstraction range from signal and image processing techniques (for determining the existence and features of an entity), to estimation techniques such as Kalman filters, cluster algorithms, neural networks, Bayesian belief nets and evidential reasoning, to more abstract reasoning techniques such as expert systems, frames, templating, scripts, case-based reasoning and intelligent agents.

Some examples of data fusion inferences for different application domains are provided in this chart. These include:

• Military tactical situation assessment – involves basic inferences such as the location and identification of low-level entities and objects. Higher-level inferences would seek to identify complex entities, relationships among entities and a contextual interpretation of the meaning of these entities. Types of reasoning involve estimation, spatial and temporal reasoning, establishing functional relationships, hierarchical reasoning and context-based analysis.

• Threat or consequence assessment – for military or emergency management applications, this involves many of the same types of reasoning as tactical situation assessment. However, in this case we are focused on reasoning about the future and the consequences of the current situation. Hence, reasoning methods are focused on the development of alternative hypotheses, prediction of the consequences of anticipated actions, the development of alternative scenarios, and cause-effect reasoning.

• Complex equipment diagnosis – The proliferation of sensors and computers embedded in complex mechanical systems such as automobiles and airplanes provides the opportunity to monitor the health and potential future maintenance needs of those systems. The basic inferences sought for this application include determining the basic state of the equipment by monitoring data such as temperature, pressure and vibration, locating and identifying fault conditions and precursors to fault conditions, and identification of abnormal operating conditions. Higher-level inferences involve establishment of cause-effect relationships, analyses of processes, development of recommendations for diagnostic tests and recommendations for maintenance actions. Types of reasoning include failure modes and effects analysis (FMEA) methods.

• Medical diagnosis – Medical diagnosis is a very complex domain that potentially involves integrating the results of sensor measurements (e.g., X-rays, sonograms, biological tests) with human observations by a paramedic, nurse or physician, and self-reports from a patient. We seek to combine this information to assess symptoms, locate injuries (both external and internal), determine abnormal conditions, and establish a diagnosis of the patient's condition. Types of reasoning include analysis of the relationships among symptoms, linking symptoms to potential causes, recommending diagnostic tests, and identification of diseases.

• Remote sensing – Finally, the chart shows inferences and reasoning associated with remote sensing.
This could be for understanding environmental damage and evolution, agricultural applications, or dealing with crisis events. Basic inferences may include location and identification of crops, vegetation, minerals or geographic features, the identification of features of objects or areas of interest, and the identification of unusual phenomena. Types of reasoning involve the determination of the relationships among geographical features, interpretation of data, spatial and temporal reasoning, and context-based reasoning.

In general, reasoning for level-2 and level-3 processing involves context-based reasoning and high-level inferences. The techniques are generally probabilistic and entail representation of uncertainty in data and in inferential relationships. Often, the reasoning involves working at a semantic (word-based) level. Examples of methods at the semantic level include rule-based methods, graphical representations, logical templates, cases, plan hierarchies, agents and others. In principle, we are trying to emulate the types of reasoning that involve human sensing and cognition. As shown in the chart, basic techniques and functions required include pattern matching, inference (reasoning) methods, search techniques and knowledge representation. These are basic functions in the realm of artificial intelligence, which seeks to develop methods to emulate functions ordinarily associated with humans and animals, e.g., vision (computer vision), purposeful motion (robotics), learning (machine learning), reasoning (expert systems), and speech and language understanding (natural language processing).

There are many challenges in developing computer methods to automate human reasoning or inferencing. Humans have general sensing, reasoning and context-based situation awareness capabilities. They tend to have continual access to multiple human senses, perform complex pattern recognition such as recognizing visual objects, can reason with words, and have extensive knowledge of "real-world" facts, relationships and interactions. All of these must be developed explicitly for computer systems. Also, humans tend to be able to perform rapid assessments and make quick decisions based on rules of thumb and heuristics. Finally, humans can readily understand the context of a situation, which affects the understanding of the sensory data and the decisions to be made. By contrast, computers lack real-world knowledge, are challenged by the difficulties of dealing with English or other human languages, and require explicit methods to represent knowledge and to reason about that knowledge. However, computers have some advantages over humans, including the ability to perform complex mathematical calculations and hence use physics-based models. They are unaffected by fatigue, emotions or bias. In addition, computers can process huge data sets and use machine learning techniques to obtain insights that would be difficult for a human.

There are numerous types of relationships that we may need to represent in order to perform situation awareness or assessment. Types of relationships include:

• Physical constituency – What are the components or elements of a system, event, activity or entity? We may use representation techniques such as block diagrams, specification trees, or physical models to represent physical constituency.

• Functional constituency – What are the functions required for an event or activity? This question seeks to determine what an event or activity involves, requires or provides.
Examples of representation methods include functional block diagrams, decomposition trees, interpretive structural modeling and other methods.

• Process constituency – What processes are involved in an unfolding situation? Does an entity or group of entities perform one or more processes? We may use mathematical functions, logical operations, rules or procedures to describe these processes.

• Sequential dependency – Is there a sequence of events, activities or processes that defines a situation? What must come first? What are the precursory steps required before something else can happen? Examples of representation techniques include PERT (Program Evaluation Review Technique) charts, schedules or Gantt charts, Petri nets, scripts and operational sequence diagrams.

• Temporal dependency – Related to sequential dependency is temporal dependency, in which events or activities need to be coordinated in time. These may be represented by event sequences, timelines, scripts or operational sequence diagrams.

• Finally, we may utilize techniques such as computer simulations, access to sample real-world data, physical models, or other means to represent and decompose a complex environment into understandable and predictable components.

It is beyond the scope of this course to address all of the current methods used for automated reasoning. We provide a brief list here of three major aspects of automated reasoning and the types of techniques used. The three main aspects include:

• Knowledge representation – We need to be able to represent facts, relationships, and interactions between entities and their environment. Common representation methods include rules, frames, scripts, semantic nets, parametric templates and analogical methods such as physical models.

• Uncertainty representation – Along with representing knowledge, we need to represent information about the uncertainty of this information. Examples of techniques include confidence factors, probability, Dempster-Shafer evidential intervals, fuzzy membership functions, and other methods.

• Reasoning methods and architectures – Given the ability to represent data, facts and uncertainty, we seek to reason with or process that information in order to allow automated generation of results such as new relationships and inferences. Methods include the following:

• Implicit reasoning methods such as neural nets and cluster algorithms operate directly on data without explicit guidance or semantic information.

• Explicit reasoning methods seek to incorporate semantic knowledge, such as rules or other human expert knowledge, to guide the reasoning. Pattern templates such as templating or case-based reasoning incorporate both parametric data and semantic knowledge.

• Process reasoning methods such as script interpreters or plan-based reasoning use information about processes – causal relationships, sequential requirements, interactions among functions or actors, etc. – to perform automated reasoning.

• Deductive methods include decision trees, Bayesian belief nets and Dempster-Shafer belief nets.

• Finally, hybrid architectures include intelligent software agents, blackboard systems and hybrid symbolic and numerical systems.

We will review some of these methods in this and the next lesson. In the previous lesson, we introduced pattern recognition methods such as cluster algorithms and neural networks.
We indicated that these methods could be applied to individual sensors or sources of information to result in a "declaration of identity" based on each sensor or source alone. For example, one might process data from a radar and use the radar cross section data as a function of time and aspect angle to classify the type of aircraft or object being observed. Similarly, one might use size and shape information from a visual or infrared sensor to make another declaration of target type or class.

We now turn to decision-level identity fusion. In this case, we want to fuse the identity or classification declarations from the individual sensors and sources to arrive at a consensus – a fused declaration of identity. This process represents a transition from level-1 identity fusion to level-2 fusion related to complex entities, activities, and events. The reasoning is performed at the semantic (report) level. The concept is illustrated in this flow chart. However, prior to introducing techniques such as voting methods, Bayesian inference, and the Dempster-Shafer method, a brief review of some concepts from probability and logic is necessary.

Classical statistical inference was first introduced in the early 1700s and became formalized into modern statistical inference through publications by Fisher in the 1920s and by Neyman and Pearson in the 1930s. We start by defining three types of probabilities.

i) Classical probability involves statements about a situation in which we know all possible outcomes of an "experiment". For example, suppose we flip a two-sided coin with one side having a "head" and one side having a "tail". We can ask, what is the probability that the coin will show a head when flipped once? If the coin is not biased, then the probability of seeing a head is simply ½ (that is, we could see either an H or a T). Suppose we flip the coin twice in succession. What is the probability of seeing two heads? In classical probability we consider all the possible combinations of outcomes; that is, (H,H), (H,T), (T,H), and (T,T). Out of four possible outcomes, showing two heads occurs once. We conclude that the probability of seeing (H,H) is ¼. This concept can be extended to games of chance involving cards, roulette and other forms of gambling. Indeed, the original development of probabilistic concepts arose from French mathematicians seeking to help nobility win at games of chance.

ii) Empirical probability – Suppose we cannot define all of the possible outcomes of an experiment and hence cannot enumerate the possible outcomes. Empirical probability seeks to determine probability based on observations of many experiments. For example, we seek to determine the probability of having a flat tire on our automobile versus having a transmission failure. There are many ways in which we could obtain a flat tire, and many ways in which a transmission could fail. To determine these probabilities, we observe how often these events occur "in nature" and use the relative number of occurrences (e.g., versus miles traveled) to estimate probability. Similar means are used to determine the likelihood of being affected by a disease or medical condition. Of course, these occurrences depend upon other conditions, but for the moment we ignore those dependencies.

iii) Finally, subjective probability involves a human assessment of the likelihood of the occurrence of an event or activity. We often use subjective probabilities to decide how to live our lives – should we take an umbrella today?
Will there be heavy traffic on I-270? Subjective probability is tempting to use, since we don't actually have to enumerate outcomes or collect data about underlying distributions of events or activities. Unfortunately, humans are notoriously poor at estimating probabilities. In particular, our estimate of the probability of an event A, P(A), and of the probability that A does not happen, P(Not-A), do not add up to one. In classical probability, the probability of event A, P(A), plus the probability of event Not-A, P(Not-A), should equal one. After all, an event either happens or it does not. However, when humans make these estimates (even experienced statisticians), they fail this condition.

Returning to statistical inference, we define a statistical hypothesis as a statement about a population which, based on information from a sample, one seeks to support or refute. For example, we hypothesize that a coin being used by a gambling house is biased. To test this hypothesis, we would observe the coin being flipped a number of times and seek to either refute the hypothesis (that is, determine that the evidence does not support a biased coin) or affirm the hypothesis (state that the coin is biased). A statistical test is a set of rules whereby a decision on the hypothesis H is reached. A measure of the test accuracy involves a probability statement regarding how accurate our statistical test is. Again using the biased coin hypothesis, the measure of test accuracy would specify, for example, how many times we would need to flip the coin to either affirm or refute the hypothesis.

Classical statistical inference proceeds as follows. We start with a hypothesis, say H0, and its alternate, H1. Using the biased coin example, we assume a hypothesis, H0, that the coin is fair or unbiased. The alternate hypothesis, H1, is that the coin is biased. The test logic proceeds as follows: First, assume that the null hypothesis H0 is true. We collect data – in this case observations of the coin flips. Using the language of statistical inference, this is known as "examining the consequences of H0 being true in the sampling distribution for the statistic". If the observations have a high probability of occurring, we can say that the "data do not contradict H0". Thus, we do not declare that the hypothesis is true; we can only say that the data collected do not tend to contradict the hypothesis being true. Conversely, if the observed data do not support the hypothesis being true, we cannot say that it is false, only that the data contradict the hypothesis. The level of significance of the test is the probability level that is considered too low to warrant support of H0. This may seem like an arcane and stuffy set of mathematical statements, but the student is reminded that such tests are used to determine things like whether a particular drug reduces the effects of a specific condition or disease.

How can statistical inference be used for target or entity classification? Let's consider an example of emitter identification, e.g., for situation assessment related to a Department of Defense problem. In particular, we have an electronic support measures (ESM) device on our aircraft and want to devise a means of knowing whether or not we are being illuminated by an emitter of possible hostile intent. Note that this example can be translated directly into other applications such as medical tests, environmental monitoring, or monitoring complex machines.
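Before developing the emitter example, the classical test logic just described can be made concrete with a short sketch applied to the biased-coin hypothesis. This is a minimal illustration and not part of the original lesson material; the sample size (100 flips), the observed count (62 heads), and the 0.05 significance level are assumed values chosen only for illustration.

    # Minimal sketch of a classical hypothesis test for the biased-coin example.
    # H0: the coin is fair (p = 0.5); H1: the coin is biased.
    from math import comb

    def two_sided_p_value(heads, flips, p_fair=0.5):
        """Probability, under H0, of an outcome at least as far from the mean as 'heads'."""
        mean = flips * p_fair
        # exact binomial probability of each possible head count under H0
        pmf = [comb(flips, k) * p_fair**k * (1 - p_fair)**(flips - k)
               for k in range(flips + 1)]
        return sum(p for k, p in enumerate(pmf) if abs(k - mean) >= abs(heads - mean))

    # Assumed observation: 62 heads in 100 flips, tested at an assumed 0.05 significance level.
    p = two_sided_p_value(heads=62, flips=100)
    if p < 0.05:
        print(f"p = {p:.4f}: the data contradict H0; declare the coin biased")
    else:
        print(f"p = {p:.4f}: the data do not contradict H0 (a fair coin)")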
In fact, such an application is very similar to the use of a "fuzz buster" to monitor the presence of a police speed trap. Consider two types of emitters, emitter type E1 and emitter type E2. Based on observing emission characteristics such as pulse repetition frequency (PRF) and frequency of emission (F), we want to determine whether we are being illuminated by an emitter of type E1 or an emitter of type E2. Why would this make a difference? Suppose emitter type E1 is a common weather radar, while emitter type E2 is associated with a ground-to-air missile defense unit. We want our electronic support measures (ESM) device to provide a warning if we are being illuminated by a radar associated with an anti-aircraft missile defense unit. How would statistical inference proceed?

Continuing this example, suppose that we have previously collected information about both types of emitters and have the empirical distributions for each emitter based on pulse repetition interval (PRI). We will only use PRI at the moment, because it is easier to show than both PRI and frequency. The graph on the left shows two curves, one for each emitter. Each curve gives the probability that an emitter of that type, E1 or E2, would exhibit a specific value of PRI. So in the figure, the cross-hatched area is the probability that an emitter of type E2 would be observed emitting with a pulse repetition interval between PRIN and PRIN+1. Note that the two curves overlap. Each emitter has a range of pulse repetition agility – they both exhibit pulse repetition intervals over a wide range – and these ranges of PRI overlap. If each emitter emitted only in a narrow range of values of PRI and these ranges were not overlapping, then the inference would be easy. The challenge comes when these ranges of PRI overlap, as shown on the lower right-hand side of the figure.

In order to make a decision about identity, we select a critical value of PRI (call it PRIc) and decide to use the following rule:

• If the observed value of PRI is greater than PRIc, we declare that we have seen an emitter of type E2.

• Conversely, if the observed value of PRI is less than PRIc, we declare that we have seen an emitter of type E1.

Of course, even when we use this rule, we can still be wrong in our declaration. As shown on the right-hand side of the figure, if the observed value of PRI is greater than PRIc, there is still a probability that we have NOT seen an emitter of type E2, but rather have seen an emitter of type E1. This is the cross-hatched area under the E1 distribution to the right of the vertical PRIc line. A similar argument holds for declaring that we have seen an emitter of type E1 if the observed value of PRI is less than PRIc. These errors in declaration are called type 1 and type 2 errors, respectively.

How then do we choose the value of PRIc? Unfortunately, the mathematics does not help us. This is a design choice. If we are concerned about the threat of being illuminated by an emitter of type E2, because it is associated with a threatening anti-aircraft missile unit, we may elect to set the value of PRIc very low (move it to the left). This would reduce the errors in which we mistakenly declare that we are being illuminated by an emitter of type E1 when in fact we are being illuminated by a radar of type E2. The problem is that we will get more false alarms of the other type and be constantly "on alert" when it is not necessary.
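The effect of sliding the critical value PRIc can be sketched numerically. In the sketch below, the two PRI distributions are assumed to be Gaussian with made-up means and spreads; in practice these curves would come from the empirical emitter data described above, so this is only an illustrative sketch of the trade-off, not the lesson's actual numbers.

    # Minimal sketch of the PRIc threshold trade-off between the two error types.
    from statistics import NormalDist

    e1 = NormalDist(mu=100.0, sigma=10.0)   # assumed PRI distribution for emitter E1 (weather radar)
    e2 = NormalDist(mu=130.0, sigma=10.0)   # assumed PRI distribution for emitter E2 (missile-defense radar)

    def error_rates(pri_c):
        """Return (P(declare E2 | actually E1), P(declare E1 | actually E2)) for threshold PRIc."""
        false_alarm = 1.0 - e1.cdf(pri_c)   # E1 emissions that fall above the threshold
        missed_e2 = e2.cdf(pri_c)           # E2 emissions that fall below the threshold
        return false_alarm, missed_e2

    # Moving the threshold to the left trades missed detections of E2 for more false alarms.
    for pri_c in (105.0, 115.0, 125.0):
        fa, miss = error_rates(pri_c)
        print(f"PRIc = {pri_c:5.1f}:  P(false alarm) = {fa:.3f}   P(missed E2) = {miss:.3f}")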
Under such conditions of frequent false alarms, a pilot might tend to ignore the alarms or even turn off the ESM device. These are challenging issues with no easy solutions. Think of alarm systems used in operating rooms by anesthesiologists, or medical tests for dire diseases. False alarms can have fatal consequences.

One method to reduce issues related to false alarms is the use of Bayesian inference. This is based on a theorem developed by an English clergyman, Thomas Bayes, and published in 1763, two years after his death. Bayesian inference is an extension of classical statistical inference in several ways. First, rather than considering only a null hypothesis, H0, and its alternate, H1, we consider a set of hypotheses, Hi, which are mutually exclusive (each one is separate from all the other hypotheses) and exhaustive. This collection of hypotheses represents every anticipated condition or "explanation" that could cause the observed data.

One form of Bayes' theorem is shown here. In words, the theorem says: "if E represents an event that has occurred, the probability that a hypothesis, Hi, is true, based on the observed event, is equal to the a priori probability of the hypothesis, P(Hi), multiplied by the probability that if the hypothesis were true we would have observed the event E, P(E|Hi), divided by the sum, over all the hypotheses, of each prior probability times the corresponding probability that the event would have been seen or caused by that hypothesis." In simpler terms, it is the probability that the hypothesis Hi is true (even before we have seen any evidence of it), times the probability that, if the hypothesis were true, it would have caused the event or evidence to be observed, normalized by the other ways in which the evidence could have been caused. So, Bayes' rule could also be written as P(H|E) = P(E|H)P(H)/P(E): the probability of the hypothesis H, given that we have observed E, is equal to the probability of H (prior to any evidence), times the probability that we would have seen E given that H is true, divided by the overall probability of observing E under all of the hypotheses.

Returning to our electronic support measures (ESM) example, we can use Bayes' rule to compute the probability that we have seen (or been illuminated by) an emitter of type E (either E1 or E2), given the observation of a value of PRI0 and frequency F0. We could extend this to many different types of emitters that could be observed by our ESM system. For each possible candidate emitter, we compute the joint probability that we have seen an emitter of type X given the observed values of PRI and frequency. In each case we can provide a priori information about the likelihood that the emitter would be observed or located in the area we are flying. For example, if our hypothetical pilot is flying near Cedar Rapids, Iowa, it is unlikely that there is an emitter associated with an anti-aircraft missile facility in the area. We can include information about the relative numbers of different types of emitters and estimate the probabilities of the evidence given each hypothesized emitter type. We could also use subjective probabilities if we do not have empirical probability data.

Before proceeding to discuss decision-level fusion, we introduce here the concept of a declaration matrix. The idea is shown conceptually here for a radar observing an aircraft. The radar output is an observed radar cross section (RCS) of the aircraft.
Using techniques previously described, such as signal processing, feature extraction, pattern classification and identity declaration, we could obtain an identity declaration matrix as shown in the figure. The rows of the matrix are the possible declarations made by the sensor. That is, the radar could declare that it has seen an object of type 1 (e.g., a commercial aircraft), OR it could declare that it has seen an object of type 2 (e.g., a weather balloon), etc. For each of these possible declarations, D1, D2, etc., the matrix provides an estimate of the probability that the declaration of object identity or type actually matches reality. The columns of the matrix are the actual types of objects or entities that could be observed. Each element of the matrix represents a probability, P(Di|Oj). This is the probability that the sensor would declare that it has seen an object of type i (viz., Di), given that in fact it has actually observed an object of type j (Oj). The diagonal of the matrix, containing the probabilities P(Di|Oi), gives the probabilities that, for each type of entity or object, the sensor will make the correct identification. All of the off-diagonal terms are the probabilities of false identity declarations. We note that, in general, these probabilities would change as a function of observing conditions, distance to the observed target, calibration of the sensor, and many other factors.

How could we determine these probabilities? The short answer is that we need to either: i) develop physics-based models of the sensor performance, ii) try to calibrate the sensor on a test range to observe these probability distributions, or iii) use subjective probabilities. In general this is a difficult problem. However, for the remaining part of our discussion, we will assume that we could obtain these values in some way. In principle, if we have a number of sensors observing an object or entity, we need to determine the probabilities associated with the declaration matrix for each sensor or sensor type.

In the previous example, we implied that the declaration matrix was square – that the number of columns must equal the number of rows. In fact, this is not necessarily true. For example, we might have a sensor that is unable to identify some entities. Instead of making a separate declaration for each of the objects O1, O2, …, ON, the sensor may only be able to identify objects 1 through M and group everything else into a single declaration we'll call "I don't know" (viz., an "I don't know" declaration). It is important to note here that, while it is acceptable to have a sensor declare "I don't know" as a declaration, we must assume that we can account for every possible type of entity that might be observed and give rise to an observation. Hence, a sensor can declare "I don't know", but we cannot create an "I don't know" entity or set of entities. We'll see later how this can be modified.

In order to fuse multiple sensor observations or declarations, we will use Bayesian inference. We can process the declaration matrix of each sensor using Bayes' rule. Suppose sensor A declares that it has "seen" an object of type 1 – viz., it makes a declaration D1. We can compute a series of probabilities: i) the probability that the sensor has actually seen object 1 given its declaration that it has seen object 1, P(O1|D1); ii) the probability that the sensor has actually seen object 2 given that it has declared it has seen an object of type 1, P(O2|D1); etc. This can be done for each sensor.
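As an illustration of this step and of the multi-sensor combination discussed next, here is a minimal sketch. The 3x3 declaration matrices, the prior probabilities, and the particular declarations are hypothetical values chosen only for illustration, and the sensor declarations are assumed to be independent (an assumption discussed further below).

    # Minimal sketch: Bayes' rule applied to sensor declaration matrices and then
    # fused across sensors under an independence assumption. All numbers are hypothetical.
    objects = ["O1", "O2", "O3"]
    priors = [0.5, 0.3, 0.2]                 # assumed P(Oj), e.g., reflecting the area of operation

    # Declaration matrices: row = declaration Di, column = true object Oj, entry = P(Di | Oj).
    sensor_a = [[0.80, 0.15, 0.10],
                [0.15, 0.75, 0.20],
                [0.05, 0.10, 0.70]]
    sensor_b = [[0.70, 0.20, 0.15],
                [0.20, 0.70, 0.15],
                [0.10, 0.10, 0.70]]

    def fuse(declarations):
        """declarations: list of (declaration_matrix, declared_row_index), one pair per sensor."""
        posterior = []
        for j, prior in enumerate(priors):
            likelihood = prior
            for matrix, d in declarations:
                likelihood *= matrix[d][j]   # P(Di | Oj) for this sensor's declaration
            posterior.append(likelihood)
        total = sum(posterior)               # normalization: probability of the joint declarations
        return [p / total for p in posterior]

    # Sensor A declares D1 and sensor B declares D2; compute the fused P(Oj | declarations).
    fused = fuse([(sensor_a, 0), (sensor_b, 1)])
    for name, p in zip(objects, fused):
        print(f"P({name} | declarations) = {p:.3f}")
    print("most likely object:", objects[fused.index(max(fused))])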
Given the multiple observations or declarations, Bayes' rule allows fusion of all of those declarations as shown in the equation above. Hence, for each type of object – object type O1, O2, etc. – we compute the joint probability that the collection of sensors has observed that type of object. Finally, based on these joint probabilities, we select which one is the most likely (viz., we choose the highest probability). The concept of using Bayes' rule is shown in this diagram for multiple sensors. Notice that the Bayes combination formula does not tell us what the "true" object is that has been jointly observed by the suite of sensors – it merely computes a joint probability for each type of object. Now you may see why we need to know all the possible types of objects (or entities) that could be observed. This is because we need to know all the possible causes of the sensor observations. Again, while it is acceptable for a sensor to declare "I don't know", it is not acceptable in Bayesian inference not to know all of the possible "causes" of the observations.

So, a brief summary about Bayesian inference. First, the good news. This approach allows incorporation of prior information, such as the likelihood that we would see or encounter a particular object, entity or situation. This is done via the use of the "prior" probabilities for each hypothesis, P(Hi). It also allows the use of subjective probabilities – although the issues regarding humans' ability to estimate these probabilities still hold. The Bayesian approach also allows iterative updates – we can "guess" at the prior probabilities, collect sensor information, update the probabilities of these objects, events or activities, and then use these updated probabilities as the new "priors" for yet further observations and updates. Finally, this approach has a kind of intuitive formulation that "makes sense".

Now for the bad news (actually not bad news – just limitations and constraints). While we can use the prior probabilities P(Hi) to provide improved information for the inference process, we may not always know what these are. In this case, we resort to what mathematicians call the "principle of indifference", a fancy way of saying we can set them all equal (viz., P(Hi) = 1/N, where N is the number of hypotheses). This is convenient, but may significantly misrepresent reality. The second issue is that we need to identify ALL of the hypotheses – the possible causes of our sensors observing "something" (an object, event, or activity). If there is an unknown cause that could induce an observation, this must be identified; otherwise our reasoning is flawed. For example, if a medical test for a disease could be positive if the disease is present, and also positive if the patient has taken another drug or an uncommon food item, we would need to know that. Otherwise, our conclusions would be biased. A third issue involves dependent evidence. The formulae that we have shown thus far have assumed that all of the sensor observations are independent – that one observation does not depend upon and is not influenced by another. There are ways of accounting for such conditional dependencies, but the formulae become complicated. Finally, our reasoning could produce unreasonable results if we do not consider evidential dependencies. The classic example is observing wet grass and concluding that it must have rained.
If we knew that there was an automatic sprinkler system that came on in the early morning, our reasoning would be flawed because we did not include these logical dependencies.

We turn now to an extension of Bayesian inference introduced by Arthur Dempster and Glenn Shafer. In 1966, Arthur Dempster developed a basic theory of how to represent the uncertainties of expert opinion using a concept of "upper" and "lower" probabilities. In 1976, Glenn Shafer refined and extended the concept to "upper probabilities" and "degrees of belief", and in 1988 George Klir and Tina Folger introduced the concepts of "degrees of belief" and "plausibility". We will use these latter concepts in our discussion of what is commonly termed "Dempster-Shafer" theory. The basic concept involves modeling how humans observe evidence and distribute a sense of belief to propositions about the possibilities in a domain. Conceptually, we define a quantity called a "measure of belief" that evidence supports a proposition A, e.g., m(A). We can also assign a measure of belief to a proposition A and its union with another hypothesis B, e.g., m(A∪B). A couple of comments are needed here. First, a measure of belief is not (quite) a probability – I'll explain momentarily. Second, a proposition may contain multiple, perhaps conflicting hypotheses. And third, the Dempster-Shafer theory becomes identical to Bayesian theory under some restricted assumptions.

In the Dempster-Shafer concept, we say that the probabilities for propositions are "induced" by the mass distribution according to the relationship shown above. That is, the probability of proposition A, P(A), is equal to the sum of all of the probability masses, m, assigned directly to proposition A and to propositions contained within A as subsets. If we assign probability masses ONLY to the fundamental hypotheses rather than to general propositions, then the relationships reduce to the Bayesian approach. A key element of the Dempster-Shafer approach is that measures of belief can be assigned to propositions that need not be mutually exclusive. This leads to the notion of an evidential interval {Support(A), Plausibility(A)}. We define an interval that includes a support measure – the extent to which the assigned belief supports the proposition A directly – and a plausibility measure – the extent to which the assigned belief supports the proposition A indirectly. The plausibility is a measure of the extent to which the belief does not directly refute the proposition A.

Let's turn to an example. Consider the classic probability problem of throwing a die. There are six observable faces that could be shown when a die is thrown; namely, "the number showing is 1", "the number showing is 2", etc. In classical probability and in Bayesian inference, we only assign evidence or probabilities to the fundamental hypotheses: "hypothesis 1 – the die will show a 1", "hypothesis 2 – the die will show a 2", etc. However, in the Dempster-Shafer approach we can assign measures of belief not only to these fundamental hypotheses, but also to propositions such as "the number showing on the die is even", "the number showing on the die is odd", and finally, "the number showing on the die is either a 1 or 2 or 3 or 4 or 5 or 6". This latter proposition is equivalent to saying "I don't know" – it is a measure of belief assigned to a general level of uncertainty. In the Dempster-Shafer approach, we introduce two special sets – the set Θ (theta) of the fundamental hypotheses, and the set 2^Θ of general propositions (the subsets of Θ).
The set Θ is called the "frame of discernment". The classic rules of probability show us how to compute the probabilities of combinations of hypotheses; the Bayesian rules of combination show us how to combine probabilities (including a priori information); and the Dempster-Shafer rules of combination show us how to combine probability masses for combinations of propositions. These are shown on the next two charts.

Here are the equations that define the support for a proposition A and the plausibility of proposition A. Support is the accumulation of the measures of belief for elements of the set Θ and the set 2^Θ that directly support the proposition A. The plausibility of A accounts for the evidence that does not directly refute A; it is one minus the sum of the evidence that directly refutes proposition A. In other words, the support for A is the degree to which the evidence supports A, either directly or indirectly, while the plausibility is the extent to which the evidence fails to refute the proposition A. The support for A and the plausibility of NOT-A must sum to a value of 1.

Consider the following two examples. i) First, suppose the support for A is 0 and the plausibility of A is 0. That means there is NO evidence that supports A (either directly or indirectly) and all of the evidence directly refutes A – hence the proposition must be false. ii) Second, suppose the support for A is 0.25 and the plausibility is 0.85. This means that 25 percent of the evidence supports A directly, and 15 percent of the evidence directly refutes proposition A. The difference between the plausibility and the support is called the evidential interval. In this case there is evidence that both supports AND refutes proposition A. Clearly, if the support for proposition A were 1 (indicating that 100 percent of the evidence supports proposition A) and the plausibility of A were 1 (meaning there is no evidence to refute A), then the proposition must be true. The evidential interval provides a measure of the uncertainty between support and refutation for a proposition.

Let's consider an example of a threat warning system (equivalent to the electronic support measures (ESM) system that we previously introduced). This example is adapted from an article by Thomas Greer. We suppose that the threat warning system (TWS) measures the pulse repetition frequency (PRF) or interval and the radio frequency (RF) of radars that might be illuminating our flying aircraft. We suppose that we have collected previous information or developed models, so we understand the relationship between the PRF and RF observations and the type of radar that emits or exhibits that PRF and RF. What we are especially concerned about are those radars that are associated with surface-to-air missile (SAM) units. In addition, we have some evidence and models that allow us to determine what mode of operation such a radar is in; for example, a target tracking (TTR) mode or an acquisition (ACQ) mode. For those who are unfamiliar with such a radar, I note that as an aircraft is flying towards a military unit having a surface-to-air missile weapon, the SAM radar might transition from a general surveillance mode, to a target tracking mode, to an acquisition mode, and finally to a mode in which the radar is guiding a missile to shoot down our aircraft! Hence, we are concerned not only about what type of radar is observing us, but also about what mode of operation it is in. This mode of operation would provide evidence of whether or not we are a potential target of a surface-to-air missile.
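Before working through this threat warning example, the support and plausibility definitions above can be illustrated with a short sketch based on the earlier die discussion. This is a minimal sketch; the mass values assigned to the propositions are made up purely for illustration.

    # Minimal sketch of support (belief) and plausibility for the die example.
    masses = {
        frozenset({1}): 0.2,                    # "the die shows a 1"
        frozenset({2, 4, 6}): 0.3,              # "the number showing is even"
        frozenset({1, 3, 5}): 0.2,              # "the number showing is odd"
        frozenset({1, 2, 3, 4, 5, 6}): 0.3,     # general uncertainty: "I don't know"
    }

    def support(A):
        """Sum of masses on subsets of A (evidence that directly supports A)."""
        return sum(m for s, m in masses.items() if s <= A)

    def plausibility(A):
        """Sum of masses on sets that intersect A (evidence that does not refute A)."""
        return sum(m for s, m in masses.items() if s & A)

    A = frozenset({1})                          # proposition: "the die shows a 1"
    print(f"evidential interval for A: [{support(A):.2f}, {plausibility(A):.2f}]")

With these assumed masses, the sketch reports the evidential interval [0.20, 0.70] for "the die shows a 1": 0.20 of the mass directly supports it, while 0.30 of the mass (the "even" proposition) directly refutes it.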
Returning to the threat warning example, we suppose that we have a special "Dempster-Shafer" sensor that observes RF and PRF and, based on some model, assigns probability masses (measures of belief) to what type of radar is being observed and what mode of operation it is in. Sample values are shown here. The term SAM-X means "a surface-to-air missile radar of type X". In the example above, the warning system assigns measures of belief to: i) having seen a surface-to-air missile radar of type X (SAM-X = 0.3), ii) having seen a surface-to-air missile radar of type X in the target tracking mode (SAM-X, TTR = 0.4), iii) having seen a surface-to-air missile radar of type X in the acquisition mode (SAM-X, ACQ = 0.2), and finally, iv) an assignment of general uncertainty with a value of 0.1. Given this, the chart uses the previous Dempster-Shafer relations to compute the evidential intervals for various propositions.

Suppose we had two Dempster-Shafer sensors. How would the evidence be combined? This chart shows a numerical example of two sources or sensors providing measures of belief about surface-to-air missile radars. The example indicates the resulting credibility or evidential intervals for various propositions. In turn, these can be combined using Dempster's rules of combination to produce fused or combined evidential intervals. These combined results are shown in the next slide. For this example, this chart shows the fused measures of belief and evidential intervals based on the combined evidence from the two Dempster-Shafer sensors. How did we get these results? To get the fused results, we use Dempster's rules of combination. These are summarized as follows:

• The product of the mass assignments to two consistent propositions contributes mass to the proposition contained within both of the originals (e.g., m1(a1)m2(a1) contributes to m(a1)).

• Multiplying the mass assignment to uncertainty by the mass assignment to any other proposition leads to a contribution to that proposition (e.g., m1(Θ)m2(a2) contributes to m(a2)).

• Multiplying uncertainty by uncertainty leads to a new assignment to uncertainty (e.g., m1(Θ)m2(Θ) contributes to m(Θ)).

• When inconsistency occurs between knowledge sources, the products of the masses assigned to the conflicting propositions are accumulated in a measure of inconsistency denoted k (e.g., m1(a1)m2(a2) contributes to k when a1 and a2 are inconsistent). The combined masses are then renormalized by dividing by (1 – k).

This chart, adapted from an example provided by Ed Waltz, shows a graphical flow of how multiple sources of information are combined using the Dempster-Shafer approach. For each source or sensor, we obtain the measures of belief for propositions and subsequently compute the evidential or credibility intervals. We compute the composite belief functions using Dempster's rules of combination and subsequently compute the joint or fused credibility intervals for the pooled evidence. In our previous discussion of the Bayesian approach, we noted that the approach only provided us with resulting probabilities for hypotheses, and that we still had to apply logic to select the actual hypothesis we believed to be "true". Similarly, the Dempster-Shafer approach only provides us with joint or fused credibility intervals for various propositions, but does not select which proposition should be accepted as being "true". This final selection must be based on logic that we provide. This chart is analogous to the chart shown for the Bayesian inference process. It shows the flow from sensors, to declarations of probability masses for propositions, to combination of these declarations using Dempster's combination rules, to a final decision using decision logic.
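To make the combination mechanics concrete, here is a minimal sketch of Dempster's rule for two sources. The small frame of discernment, the second sensor's masses, and the reinterpretation of the sample values above on that frame are assumptions made only for illustration; they are not the exact numbers from the charts.

    # Minimal sketch of Dempster's rule of combination for two sources.
    THETA = frozenset({"SAM-X-TTR", "SAM-X-ACQ", "OTHER"})   # assumed frame of discernment

    # Source 1 loosely follows the sample values above, reinterpreted on this small frame.
    m1 = {
        frozenset({"SAM-X-TTR", "SAM-X-ACQ"}): 0.3,   # "SAM-X" (mode unknown)
        frozenset({"SAM-X-TTR"}): 0.4,                # "SAM-X in target tracking mode"
        frozenset({"SAM-X-ACQ"}): 0.2,                # "SAM-X in acquisition mode"
        THETA: 0.1,                                   # general uncertainty
    }
    # Source 2 is entirely assumed for illustration.
    m2 = {
        frozenset({"SAM-X-TTR"}): 0.5,
        frozenset({"SAM-X-TTR", "SAM-X-ACQ"}): 0.3,
        THETA: 0.2,
    }

    def combine(m1, m2):
        fused, k = {}, 0.0                       # k accumulates the mass assigned to inconsistency
        for a, ma in m1.items():
            for b, mb in m2.items():
                inter = a & b
                if inter:
                    fused[inter] = fused.get(inter, 0.0) + ma * mb
                else:
                    k += ma * mb                 # conflicting pair of propositions
        return {s: m / (1.0 - k) for s, m in fused.items()}   # renormalize by (1 - k)

    def support(m, A):
        return sum(v for s, v in m.items() if s <= A)

    def plausibility(m, A):
        return sum(v for s, v in m.items() if s & A)

    fused = combine(m1, m2)
    ttr = frozenset({"SAM-X-TTR"})
    print(f"fused evidential interval for 'SAM-X in TTR mode': "
          f"[{support(fused, ttr):.2f}, {plausibility(fused, ttr):.2f}]")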
As with the Bayes method, we will highlight some "good news" and "bad news" about Dempster-Shafer inference. First, the good news: i) the method allows incorporation of prior information about hypotheses and propositions (just as the Bayes method did using "prior probabilities"); ii) it allows the use of subjective evidence; iii) it allows assignment of a general level of uncertainty – that is, it allows assignment of probability mass to the "I don't know" proposition; and finally, iv) it allows for iterative updates, similar to the Bayesian approach. Now for some issues – the "bad news": i) as with the Bayes method, it requires specification of the prior knowledge – that is, if we are going to use information about the prior likelihood of propositions, we must provide that information; ii) this is not a particularly intuitive method – while it is aimed at modeling how humans assign evidence, it is not as intuitive as the use of probabilities; iii) this method is computationally more complex and demanding than the Bayesian approach; iv) it can become complex for dependent evidence; and finally, v) it may still produce anomalous results if we do not take dependencies into account. This is similar to the issue with Bayesian inference. A final note involves the use of a general level of uncertainty. A strength of this method is that we can assign evidence to a general level of uncertainty. This would represent, for example, a way to express ignorance directly rather than forcing the available evidence to be distributed among specific hypotheses.

The final decision-level fusion technique to be introduced in this lesson is the use of voting. The idea is simple: each sensor or source of information provides a declaration about a hypothesis or proposition, and we simply use a democratic voting method to combine the data. In the simplest version, each source or sensor gets an equal vote. More sophisticated voting techniques can use weights to try to account for the performance or reliability of each source, or other methods. As with each of the previous methods introduced, we will need decision logic to determine what final result we will select as being "true". Thus, we might use logic such as "the majority rules" (select the answer that gets the most votes), a plurality rule, or other logic. Voting techniques have the advantage of being both intuitive and relatively simple to implement. This slide simply summarizes some of the fusion techniques for voting, weighted decision methods, and Bayesian decision processing.

Young researchers in automated reasoning and artificial intelligence wax enthusiastic about the power of computers and their potential to automate human-like reasoning processes (saying, in effect, "aren't computers wonderful!"). Later in their careers, these same researchers admit that it is a very difficult problem and believe they could make significant progress with increased computer speed and memory. Still later, these researchers realize the complexities of the problem and praise human reasoning (saying, in effect, "aren't humans wonderful!"). The message is that, while very sophisticated mathematical techniques can be developed for decision-level fusion, humans still possess some remarkable capabilities, especially in the use of contextual information to shape decisions.