Dempster-Shafer theory. Supporting Material: T0264P11_1

Dempster-Shafer theory
From Wikipedia, the free encyclopedia
Retrieved from "http://en.wikipedia.org/wiki/Dempster-Shafer_theory"
The Dempster-Shafer theory is a mathematical theory of evidence ([SH76]) based on
belief functions and plausible reasoning, which is used to combine separate pieces of
information (evidence) to calculate the probability of an event. The theory was developed
by Arthur P. Dempster and Glenn Shafer.
Contents
1 Consider two possible gambles
2 Formalism
3 Discussion
3.1 Support and plausibility
3.2 Combining probability sets
4 References
5 See also
Consider two possible gambles
The first gamble is that we bet on a head turning up when we toss a coin that is known to
be fair. Now consider the second gamble, in which we bet on the outcome of a fight
between the world's greatest boxer and the world's greatest wrestler. Assume we are
fairly ignorant about martial arts and would have great difficulty making a choice of who
to bet on.
Many people would feel more unsure about taking the second gamble, in which the
probabilities are unknown, rather than the first gamble, in which the probabilities are
easily seen to be one half for each outcome. Dempster-Shafer theory allows one to
consider the confidence one has in the probabilities assigned to the various outcomes.
Formalism
Let $X$ be the universal set, the set of all states under consideration. The power set,
$2^X$, is the set of all possible subsets of $X$, including the empty set, $\emptyset$.
For example, if $X = \{a, b\}$, then $2^X = \{\emptyset, \{a\}, \{b\}, X\}$.
The elements of the power set can be taken to represent propositions that one might be
interested in, by containing all and only the states in which this proposition is true.
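The theory assigns a mass $m(A) \in [0, 1]$ to each member $A$ of the power set.
By definition, the mass of the empty set is zero:
$$m(\emptyset) = 0.$$
The masses of the remaining members of the power set add up to a total of 1:
$$\sum_{A \subseteq X} m(A) = 1.$$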
The mass $m(A)$ of a given member of the power set, $A$, expresses the proportion of all
relevant and available evidence that supports the claim that the actual state belongs to
$A$ but to no particular subset of $A$. The value of $m(A)$ pertains only to the set $A$
and makes no additional claims about any subsets of $A$, each of which has, by
definition, its own mass.
From the mass assignments, the upper and lower bounds of a probability interval can be
defined. This interval contains the precise probability of a set of interest (in the classical
sense), and is bounded by two non-additive continuous measures called belief (or
support) and plausibility:
$$\mathrm{bel}(A) \le P(A) \le \mathrm{pl}(A).$$
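The belief $\mathrm{bel}(A)$ for a set $A$ is defined as the sum of all the masses of
(not necessarily proper) subsets of the set of interest:
$$\mathrm{bel}(A) = \sum_{B \mid B \subseteq A} m(B).$$
The plausibility $\mathrm{pl}(A)$ is the sum of all the masses of the sets $B$ that
intersect the set of interest $A$:
$$\mathrm{pl}(A) = \sum_{B \mid B \cap A \ne \emptyset} m(B).$$
The two measures are related to each other as follows:
$$\mathrm{pl}(A) = 1 - \mathrm{bel}(\overline{A}),$$
where $\overline{A}$ is the complement of $A$ in the universal set.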
It follows from the above that you need know but one of the three (mass, belief, or
plausibility) to deduce the other two, though you may need to know the values for many
sets in order to calculate one of the other values for a particular set.
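The claim that any one of the three determines the other two can be checked on a small case. The following is a minimal sketch, assuming a mass function is stored as a dict from `frozenset` to mass; the function names and the three-state frame are illustrative and do not come from the source. It recovers the masses from belief values alone by Moebius inversion, $m(A) = \sum_{B \subseteq A} (-1)^{|A \setminus B|}\,\mathrm{bel}(B)$, which needs the belief of every subset of $A$.

```python
from fractions import Fraction
from itertools import combinations


def subsets(s):
    """Yield every subset of s, including the empty set and s itself."""
    items = sorted(s)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def belief(mass, a):
    """Sum the masses of all (not necessarily proper) subsets of a."""
    return sum(v for b, v in mass.items() if b <= a)


def plausibility(mass, a):
    """Sum the masses of all sets that intersect a."""
    return sum(v for b, v in mass.items() if b & a)


def mass_from_belief(bel, frame):
    """Recover the non-zero masses from a belief function by Moebius inversion."""
    mass = {}
    for a in subsets(frame):
        total = sum((-1) ** len(a - b) * bel(b) for b in subsets(a))
        if total:
            mass[a] = total
    return mass


# Illustrative mass assignment on a hypothetical frame {a, b, c}.
frame = frozenset("abc")
m = {
    frozenset("a"): Fraction(1, 2),
    frozenset("ab"): Fraction(1, 4),
    frame: Fraction(1, 4),
}

recovered = mass_from_belief(lambda s: belief(m, s), frame)
print(recovered == m)  # True
```

Exact fractions are used so that the round trip compares equal without floating-point tolerance.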
The problem we now face is how to combine two independent sets of mass assignments.
The original combination rule, known as Dempster's rule of combination, is a
generalization of Bayes' rule. This rule strongly emphasises the agreement between
multiple sources and ignores all the conflicting evidence through a normalization factor.
Use of that rule has come under serious criticism when significant conflict in the
information is encountered.
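Specifically, the combination (called the joint mass) is calculated from the two sets of
masses $m_1$ and $m_2$ in the following manner:
$$m_{1,2}(\emptyset) = 0,$$
$$m_{1,2}(A) = (m_1 \oplus m_2)(A) = \frac{1}{1 - K} \sum_{B \cap C = A \ne \emptyset} m_1(B)\, m_2(C),$$
where:
$$K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C).$$
$K$ is a measure of the amount of conflict between the two mass sets. The normalization
factor, $1 - K$, has the effect of completely ignoring conflict: the mass that the
products $m_1(B)\, m_2(C)$ place on the null set is discarded, and the remaining masses
are rescaled to total 1. Consequently, this operation yields
counterintuitive results in the face of significant conflict in certain contexts.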
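The rule can be sketched in a few lines. This is a minimal illustration, assuming each mass function is a dict from `frozenset` to mass; the name `combine` and the input values are hypothetical and were chosen only to show a high-conflict case in which two sources almost entirely disagree.

```python
from fractions import Fraction


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Returns (joint_mass, K), where K is the total mass of the pairs
    whose intersection is empty (the conflict).
    """
    joint = {}
    conflict = 0
    for b, mb in m1.items():
        for c, mc in m2.items():
            a = b & c
            if a:
                joint[a] = joint.get(a, 0) + mb * mc
            else:
                conflict += mb * mc
    if conflict == 1:
        raise ValueError("total conflict: the combination is undefined")
    # Rescale the surviving masses by 1 / (1 - K) so they total 1.
    return {a: v / (1 - conflict) for a, v in joint.items()}, conflict


# Hypothetical inputs: each source is almost certain of a different
# hypothesis and gives only 1% of its mass to the shared hypothesis B.
A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
m1 = {A: Fraction(99, 100), B: Fraction(1, 100)}
m2 = {C: Fraction(99, 100), B: Fraction(1, 100)}

joint, K = combine(m1, m2)
print(K)      # 9999/10000
print(joint)  # {frozenset({'B'}): Fraction(1, 1)}
```

With these inputs nearly all of the product mass is conflict, and after normalization the hypothesis that both sources considered very unlikely receives the entire joint mass. This is the kind of result that the criticism of the rule refers to.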
Discussion
Dempster-Shafer theory is a generalization of the Bayesian theory of subjective
probability; whereas the latter requires probabilities for each question of interest, belief
functions base degrees of belief (or confidence, or trust) for one question on the
probabilities for a related question. These degrees of belief may or may not have the
mathematical properties of probabilities; how much they differ depends on how closely
the two questions are related ([SH02]). Put another way, it is a way of representing
epistemic plausibilities but it can yield answers which contradict those arrived at using
probability theory.
Often used as a method of sensor fusion, Dempster-Shafer theory is based on two ideas:
obtaining degrees of belief for one question from subjective probabilities for a related
question, and Dempster's rule ([DE68]) for combining such degrees of belief when they
are based on independent items of evidence. In essence, the degree of belief in a
proposition depends only upon the number of answers to the related questions that
contain the proposition, and the subjective probabilities of the answers to each of the
related questions.
In this formalism a degree of belief (also referred to as a mass) is represented as a belief
function rather than a Bayesian probability distribution. Probability values are assigned
to sets of possibilities rather than single events: their appeal rests on the fact they
naturally encode evidence in favor of propositions.
Dempster-Shafer theory assigns its masses to all of the subsets of the entities that
comprise a system. Suppose for example that a system has five members, that is to say
five independent states, exactly one of which is actual. If the original set is called S, then
the set of all subsets (the power set) is called $2^S$. Since you can express each possible
subset as a binary vector (describing whether any particular member is present or not by
writing a "1" or a "0" for that member's slot), it can be seen that there are $2^5 = 32$ subsets
possible, ranging from the empty subset (0, 0, 0, 0, 0) to the "everything" subset (1, 1, 1,
1, 1). The empty subset represents a contradiction, which is not true in any state, and is
thus assigned a mass of zero; the remaining masses are normalised so that their total is 1.
The "everything" subset is often labelled "unknown" as it represents the state where all
elements are present, in the sense that you cannot tell which is actual.
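The binary-vector encoding can be sketched as follows, using the bits of an integer as the vector. The state names `s1` to `s5` and the helper `subset_from_bits` are hypothetical and are introduced only for illustration.

```python
members = ["s1", "s2", "s3", "s4", "s5"]  # hypothetical state names


def subset_from_bits(bits, members):
    """Return the subset in which members[i] is present when bit i of bits is 1."""
    return frozenset(m for i, m in enumerate(members) if (bits >> i) & 1)


# Counting from 0 to 2**5 - 1 visits every binary vector exactly once.
all_subsets = [subset_from_bits(bits, members) for bits in range(2 ** len(members))]

print(len(all_subsets))         # 32
print(all_subsets[0])           # frozenset()
print(sorted(all_subsets[-1]))  # ['s1', 's2', 's3', 's4', 's5']
```

The first entry is the empty subset (0, 0, 0, 0, 0) and the last is the "everything" subset (1, 1, 1, 1, 1).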
Support and plausibility
Shafer's framework allows for belief about propositions to be represented as intervals,
bounded by two values, support and plausibility:
support ≤ plausibility.
Support for a hypothesis is constituted by the sum of the masses of all sets enclosed by it
(i.e. the sum of the masses of all subsets of the hypothesis). It is the amount of belief that
directly supports a given hypothesis at least in part, forming a lower bound. Plausibility is
1 minus the sum of the masses of all sets whose intersection with the hypothesis is empty
(equivalently, it is the sum of the masses of all sets whose intersection with the
hypothesis is not empty). It is an upper bound on the belief that the hypothesis could
possibly happen, i.e. it "could possibly happen" up to that value, because there is only so
much evidence that contradicts that hypothesis.
For example, suppose we have a support of 0.5 and a plausibility of 0.8 for a proposition,
say "the cat in the box is dead." This means that we have evidence that allows us to state
strongly that the proposition is true with a confidence of 0.5. However, the evidence
contrary to that hypothesis (i.e. "the cat is alive") only has a confidence of 0.2. The
remaining mass of 0.3 (the gap between the 0.5 supporting evidence on the one hand, and
the 0.2 contrary evidence on the other) is "indeterminate," meaning that the cat could
either be dead or alive. This interval represents the level of uncertainty based on the
evidence in your system.
Hypothesis                      Probability (mass)  Support  Plausibility
Null (neither alive nor dead)   0                   0        0
Alive                           0.2                 0.2      0.5
Dead                            0.5                 0.5      0.8
Either (alive or dead)          0.3                 1.0      1.0
The mass of the null hypothesis is zero by definition (it corresponds to "no solution"). The
mutually exclusive hypotheses "Alive" and "Dead" have masses of 0.2 and 0.5, respectively.
This could correspond to "Live/Dead Cat Detector" signals, which have respective
reliabilities of 0.2 and 0.5. Finally, the all-encompassing "Either" hypothesis (which
simply acknowledges there is a cat in the box) picks up the slack so that the sum of the
masses is 1. The support for the "Alive" and "Dead" hypotheses matches their
corresponding masses because they have no non-empty proper subsets; support for "Either"
consists of the sum of all three masses (Either, Alive, and Dead) because "Alive" and
"Dead" are each subsets of "Either". The "Alive" plausibility is m(Alive)+m(Either), since
"Either" is the only other set that intersects "Alive". Likewise, the "Dead" plausibility is
m(Dead)+m(Either). Finally, the "Either" plausibility sums m(Alive)+m(Dead)+m(Either).
The universal hypothesis ("Either") will always have 100% support and plausibility; it acts
as a checksum of sorts.
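The cat example can be reproduced with a short script. This is a sketch that takes the three masses from the example (0.2, 0.5 and 0.3) as input; the function and variable names are illustrative.

```python
from fractions import Fraction

ALIVE = frozenset({"alive"})
DEAD = frozenset({"dead"})
EITHER = ALIVE | DEAD

# Masses from the cat example; the null hypothesis has mass 0 and is omitted.
mass = {ALIVE: Fraction(2, 10), DEAD: Fraction(5, 10), EITHER: Fraction(3, 10)}


def support(mass, hypothesis):
    """Sum the masses of all sets enclosed by the hypothesis."""
    return sum(v for s, v in mass.items() if s <= hypothesis)


def plausibility(mass, hypothesis):
    """Sum the masses of all sets whose intersection with the hypothesis is not empty."""
    return sum(v for s, v in mass.items() if s & hypothesis)


for name, h in [("Alive", ALIVE), ("Dead", DEAD), ("Either", EITHER)]:
    print(name, float(support(mass, h)), float(plausibility(mass, h)))
# Alive 0.2 0.5
# Dead 0.5 0.8
# Either 1.0 1.0
```

For "Dead", the gap between plausibility and support is 0.3, which is the mass left on "Either".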
Here is a somewhat more elaborate example where the behaviour of support and
plausibility begins to emerge. Through a variety of detector modes, we are looking at a
faraway object that can only be coloured in one of three colours (red, white, and blue):
Hypothesis      Probability (mass)  Support  Plausibility
Null            0                   0        0
Red             0.35                0.35     0.56
White           0.25                0.25     0.45
Blue            0.15                0.15     0.34
Red or white    0.06                0.66     0.85
Red or blue     0.05                0.55     0.75
White or blue   0.04                0.44     0.65
Any             0.1                 1.0      1.0
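A few entries of the colour example can be worked by hand from the definitions of support and plausibility. The sums below take the mass of "Any" to be 0.1, the value that makes the seven masses total 1; $\mathrm{bel}$ denotes support and $\mathrm{pl}$ plausibility.

```latex
\begin{align*}
\mathrm{bel}(\text{Red or white})
  &= m(\text{Red}) + m(\text{White}) + m(\text{Red or white})
   = 0.35 + 0.25 + 0.06 = 0.66,\\
\mathrm{pl}(\text{Red})
  &= m(\text{Red}) + m(\text{Red or white}) + m(\text{Red or blue}) + m(\text{Any})
   = 0.35 + 0.06 + 0.05 + 0.10 = 0.56,\\
\mathrm{pl}(\text{Red or white})
  &= 1 - \mathrm{bel}(\text{Blue}) = 1 - 0.15 = 0.85.
\end{align*}
```

The last line uses the fact that "Blue" is the complement of "Red or white", so the plausibility of one is 1 minus the support of the other.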
Combining probability sets
Beliefs corresponding to independent pieces of information are combined using
Dempster's rule of combination, which is a generalisation of the special case of Bayes'
theorem where events are independent. (There is as yet no method of combining
non-independent pieces of information.) Note that the probability masses from propositions
that contradict each other can also be used to obtain a measure of how much conflict
there is in a system. This measure has been used as a criterion for clustering multiple
pieces of seemingly conflicting evidence around competing hypotheses.
In addition, one of the computational advantages of the Dempster-Shafer framework is
that priors and conditionals need not be specified, unlike Bayesian methods which often
use symmetry arguments to assign prior probabilities to random variables (e.g. assigning
0.5 to binary values for which no information is available about which is more likely).
However, any information contained in the missing priors and conditionals is not used in
the Dempster-Shafer framework unless it can be obtained indirectly, in which case it is
arguably available for calculation using Bayes' equations.
Dempster-Shafer theory allows one to specify a degree of ignorance in this situation
instead of being forced to supply prior probabilities which add to unity. This sort of
situation, and whether there is a real distinction between risk and ignorance, has been
extensively discussed by statisticians and economists. See, for example, the contrasting
views of Daniel Ellsberg and Howard Raiffa.
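The two gambles from the opening section make the contrast concrete. The following is a sketch under one modelling assumption: total ignorance about the fight is represented by placing all the mass on the whole set of outcomes. The symbols $H$, $T$ (heads, tails) and $B$, $W$ (boxer wins, wrestler wins) are introduced here for illustration.

```latex
\begin{align*}
\text{Coin: }  & m(\{H\}) = m(\{T\}) = \tfrac{1}{2}
  && \Rightarrow\ \mathrm{bel}(\{H\}) = \mathrm{pl}(\{H\}) = \tfrac{1}{2},\\
\text{Fight: } & m(\{B, W\}) = 1
  && \Rightarrow\ \mathrm{bel}(\{B\}) = 0,\quad \mathrm{pl}(\{B\}) = 1.
\end{align*}
```

For the coin the interval collapses to the single value 1/2, while for the fight it spans the whole range from 0 to 1. A single prior of 0.5 for each fighter would not register that difference.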
References



[DE68] Dempster, Arthur P.; A generalization of Bayesian inference, Journal of
the Royal Statistical Society, Series B, Vol. 30, pp. 205-247, 1968
[SH76] Shafer, Glenn; A Mathematical Theory of Evidence, Princeton University
Press, 1976
[SH02] Shafer, Glenn; Dempster-Shafer theory, 2002