Normalising the Consensus Operator for Belief Fusion ∗

Audun Jøsang, QUT, Brisbane, Australia
Simon Pope, DSTO, Adelaide, Australia
David McAnally, The Gap, Brisbane, Australia

∗ Appears in the proceedings of IPMU 2006.

Abstract
The consensus operator is used for cumulative belief fusion in subjective logic, and takes beliefs on binary frames of discernment as input. Coarsening to a binary state space is therefore needed in case the original frame is larger than binary. Coarsening causes information loss in general, and can lead to inconsistent results when applying the consensus operator. We present a method for normalising the output beliefs of the consensus operator in order to restore consistency in all cases.

Keywords: belief theory, belief fusion

1 Introduction
The consensus operator [1, 5, 6] operates on 3-dimensional belief representations called opinions [3], and provides a method for cumulative fusion of opinions from different sources. Its purpose is clearly different from, but still similar to, that of e.g. Dempster's rule and other belief fusion operators [11]. The consensus operator applies to belief distribution functions on binary frames of discernment, and coarsening is needed when the original frame is larger than binary. Fusing opinions derived from coarsened frames can lead to inconsistent results, which limits the applicability of the consensus operator. This paper describes a method for normalising the consensus operator, whenever necessary, in order to make it applicable in all cases.

2 Representing Opinions
Belief representation in classic belief theory [10] is based on an exhaustive set of mutually exclusive atomic states called the frame of discernment Θ. The power set $2^\Theta$ is the set of all sub-states of Θ. A bba (basic belief assignment, called basic probability assignment in [10]) is a belief distribution function $m_\Theta$ mapping $2^\Theta$ to [0, 1] such that

$$ \sum_{x \subseteq \Theta} m_\Theta(x) = 1, \quad \text{where } m_\Theta(\emptyset) = 0. \tag{1} $$

The bba distributes a total belief mass of 1 amongst the subsets of Θ such that the belief mass for each subset is positive or zero. Each subset x ⊆ Θ such that $m_\Theta(x) > 0$ is called a focal element of $m_\Theta$. In the case of total ignorance, $m_\Theta(\Theta) = 1$, and $m_\Theta$ is called a vacuous bba. In case all focal elements are atoms (i.e. one-element subsets of Θ), we speak of a Bayesian bba. A bba is dogmatic when $m_\Theta(\Theta) = 0$ [12]. Note that, trivially, every Bayesian bba is dogmatic. In addition, we define the Dirichlet bba and its cluster variant as follows.

Definition 1 (Dirichlet bba) A bba where the only possible focal elements are Θ and/or singletons of Θ is called a Dirichlet belief distribution function.

Definition 2 (Cluster Dirichlet bba) A bba where the only focal elements are Θ and/or mutually exclusive subsets of Θ (singletons or clusters of singletons) is called a cluster Dirichlet belief distribution function.

It can be noted that Bayesian bbas are a special case of Dirichlet bbas. The Dempster-Shafer theory [10] defines a belief function b(x). The probability transformation [2] (also known as the pignistic transformation [13, 14]) projects a bba onto a probability expectation value denoted by E(x). In addition, subjective logic [3] defines a disbelief function d(x), an uncertainty function u(x), and a base rate function a(x) (called relative atomicity in [3]), defined as follows:

$$ b(x) = \sum_{\emptyset \neq y \subseteq x} m_\Theta(y) \quad \forall\, x \subseteq \Theta, \tag{2} $$
$$ d(x) = \sum_{y \cap x = \emptyset} m_\Theta(y) \quad \forall\, x \subseteq \Theta, \tag{3} $$
$$ u(x) = \sum_{\substack{y \cap x \neq \emptyset \\ y \not\subseteq x}} m_\Theta(y) \quad \forall\, x \subseteq \Theta, \tag{4} $$
$$ a(x) = |x| / |\Theta| \quad \forall\, x \subseteq \Theta, \tag{5} $$
$$ E(x) = \sum_{y \subseteq \Theta} m_\Theta(y) \, \frac{|x \cap y|}{|y|} \quad \forall\, x \subseteq \Theta. \tag{6} $$

In case |Θ| > 2, coarsening is necessary in order to apply subjective logic operators. Choose x ⊆ Θ and let $\overline{x}$ be the complement of x in Θ; then $X = \{x, \overline{x}\}$ is a binary frame, and x is called the target of the coarsening. The coarsened bba on X can consist of the three belief masses $b_x = m_X(x)$, $d_x = m_X(\overline{x})$ and $u_x = m_X(X)$, called belief, disbelief and uncertainty respectively. Coarsened belief masses can be computed e.g. with simple, normal or Bayesian coarsening as defined in [3, 7, 9], or with smooth coarsening defined next.

Definition 3 (Smooth Coarsening) Let Θ be a frame of discernment and let b(x), d(x), u(x) and a(x) be the belief, disbelief, uncertainty and base rate functions of the target state x ⊆ Θ, with probability expectation value E(x). Let $X = \{x, \overline{x}\}$ be the binary frame of discernment, where $\overline{x}$ is the complement of x in Θ. Smooth coarsening of Θ into X produces the corresponding belief, disbelief, uncertainty and base rate functions $b_x$, $d_x$, $u_x$ and $a_x$ defined by:

for E(x) ≤ b(x) + a(x)u(x):
$$ b_x = \frac{E(x)\, b(x)}{b(x) + a(x)u(x)}, \quad d_x = 1 - \frac{E(x)\,(b(x) + u(x))}{b(x) + a(x)u(x)}, \quad u_x = \frac{E(x)\, u(x)}{b(x) + a(x)u(x)}, \quad a_x = a(x); $$

for E(x) > b(x) + a(x)u(x):
$$ b_x = 1 - \frac{(1 - E(x))\,(d(x) + u(x))}{1 - b(x) - a(x)u(x)}, \quad d_x = \frac{(1 - E(x))\, d(x)}{1 - b(x) - a(x)u(x)}, \quad u_x = \frac{(1 - E(x))\, u(x)}{1 - b(x) - a(x)u(x)}, \quad a_x = a(x). $$

In case the target element for the coarsening is a focal element of a (cluster) Dirichlet belief distribution function, E(x) = b(x) + a(x)u(x) holds, and the coarsening can be described as stable.

Definition 4 (Stable Coarsening) Let Θ be a frame of discernment, and let x ∈ Θ be the target element for the coarsening. Let the belief distribution function be a Dirichlet (cluster or not) bba such that the target element is also a focal element. Then the stable coarsening of Θ into X produces the belief, disbelief and uncertainty functions $b_x$, $d_x$ and $u_x$ defined by:

$$ b_x = b(x), \quad d_x = d(x), \quad u_x = u(x). $$

It can be shown that $b_x, d_x, u_x, a_x \in [0, 1]$, and that Eq.(7) and Eq.(8) hold:

$$ b_x + d_x + u_x = 1 \quad \text{(Additivity)} \tag{7} $$
$$ E(x) = b_x + a_x u_x \quad \text{(Expectation)} \tag{8} $$

It can be noticed that non-stable coarsening requires an adjustment of the belief, disbelief and uncertainty functions in general, and this can result in inconsistency when applying the consensus operator.
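The functions of Eqs.(2)-(6) and the smooth coarsening of Def.3 are mechanical to compute. Below is a minimal Python sketch, assuming a bba is represented as a dictionary from frozenset focal elements to `fractions.Fraction` masses (a representation chosen here for illustration); the names `belief_functions` and `smooth_coarsening` are hypothetical. The stable case E(x) = b(x) + a(x)u(x) is handled first, because both branches of Def.3 then return the inputs unchanged, and checking it first avoids a division by zero when b(x) + a(x)u(x) = 0.

```python
from fractions import Fraction

# Hypothetical representation: a bba is a dict {frozenset focal element: Fraction mass}.


def belief_functions(m, theta, x):
    """Return (b(x), d(x), u(x), a(x), E(x)) for target x, following Eqs.(2)-(6)."""
    zero = Fraction(0)
    b = sum((v for y, v in m.items() if y and y <= x), zero)                  # Eq.(2)
    d = sum((v for y, v in m.items() if not (y & x)), zero)                   # Eq.(3)
    u = sum((v for y, v in m.items() if (y & x) and not (y <= x)), zero)      # Eq.(4)
    a = Fraction(len(x), len(theta))                                          # Eq.(5)
    e = sum((v * Fraction(len(x & y), len(y)) for y, v in m.items() if y), zero)  # Eq.(6)
    return b, d, u, a, e


def smooth_coarsening(b, d, u, a, e):
    """Def.3: turn (b(x), d(x), u(x), a(x), E(x)) into an opinion (b_x, d_x, u_x, a_x)."""
    pivot = b + a * u
    if e == pivot:
        # Stable case: both branches of Def.3 give back the inputs unchanged.
        return b, d, u, a
    if e < pivot:
        # First branch; pivot > e >= 0, so the division is safe.
        return e * b / pivot, 1 - e * (b + u) / pivot, e * u / pivot, a
    # Second branch; pivot < e <= 1, so rest > 0.
    rest = 1 - pivot
    return 1 - (1 - e) * (d + u) / rest, (1 - e) * d / rest, (1 - e) * u / rest, a


if __name__ == "__main__":
    theta = frozenset(range(1, 7))
    m_even = {frozenset({2, 4, 6}): Fraction(1)}
    for k in sorted(theta):
        print(k, smooth_coarsening(*belief_functions(m_even, theta, frozenset({k}))))
```

For the bba m({2, 4, 6}) = 1 on Θ = {1, ..., 6}, target {2} yields the opinion (1/5, 0, 4/5, 1/6) and target {1} yields (0, 1, 0, 1/6); each result satisfies Eqs.(7) and (8). Exact fractions are used so that equality tests such as E(x) = b(x) + a(x)u(x) are not disturbed by floating-point rounding.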
The purpose of this paper is to present a method that rectifies the inconsistency which non-stable coarsening can cause when the consensus operator is applied. The method is explained in detail in the sections below.

The ordered quadruple $\omega_x = (b_x, d_x, u_x, a_x)$, called the opinion about x, is equivalent to a bba on a binary frame of discernment X, with an additional base rate parameter $a_x$ which can carry information about the relative size of x in Θ.

The opinion space can be mapped into the interior of an equal-sided triangle, where the relative distances towards the bottom right, bottom left and top corners represent the belief, disbelief and uncertainty functions respectively. For an arbitrary opinion $\omega_x = (b_x, d_x, u_x, a_x)$, the three parameters $b_x$, $d_x$ and $u_x$ determine the position of the opinion point in the triangle. The base line is the probability axis, and the base rate value can be indicated as a point on it. Fig.1 illustrates an example opinion about x with the value $\omega_x = (0.7, 0.1, 0.2, 0.5)$.

[Figure 1: Opinion triangle with example opinion ω_x = (0.7, 0.1, 0.2, 0.5). Corners: Belief, Disbelief, Uncertainty; the base line is the probability axis, marked with a_x and E(x), and the projector joins the opinion point to E(x).]

The projector going through the opinion point, parallel to the line that joins the uncertainty corner and the base rate point, determines the probability expectation value $E(x) = b_x + a_x u_x$. Although an opinion has 4 parameters, it only has 3 degrees of freedom, because the three components $b_x$, $d_x$ and $u_x$ are dependent through Eq.(7). The reason why a redundant parameter is kept in the opinion representation is that it allows for more compact expressions of opinion operators [3, 4, 7, 8]. The belief and uncertainty functions correspond to the traditional Bel(x) (Belief) and Pl(x) (Plausibility) pair of Shaferian belief theory through Bel(x) = b(x) and Pl(x) = b(x) + u(x). The disbelief function d(x) is the same as the doubt of x in Shafer's book. However, 'disbelief' seems to be a better term, because the case when it is certain that x is false is better described by 'total disbelief' than by 'total doubt'.

Various visualisations of opinions are possible to facilitate human interpretation. For this, see ∼josang/sl/demo/BV.html.

3 Deriving the Consensus Operator
The consensus operator is equivalent to Bayesian updating of the beta PDF (probability density function). Its derivation is based on a bijective mapping between beta PDFs and opinions.

3.1 Mapping Opinions to Beta PDFs
The beta family of distributions is a continuous family of distribution functions indexed by the two parameters α and β. The beta distribution beta(α, β) can be expressed using the gamma function Γ as:

$$ \mathrm{beta}(\alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, P^{\alpha - 1} (1 - P)^{\beta - 1}, \tag{9} $$

where 0 ≤ P ≤ 1, α > 0 and β > 0, with the restriction that P ≠ 0 if α < 1, and P ≠ 1 if β < 1. The probability expectation value of the beta distribution is given by E(P) = α/(α + β), where P is the random variable corresponding to the probability.

It can be observed that the beta PDF has two degrees of freedom, whereas opinions have three degrees of freedom as explained in Sec.2. In order to define a bijective mapping between opinions and beta PDFs, we augment the beta PDF expression with one additional parameter representing the prior, so that it also gets 3 degrees of freedom.

The α parameter represents the amount of evidence in favour of a given outcome or statement, and the β parameter represents the amount of evidence against the same outcome or statement. With a given state space, it is possible to express the a priori PDF using a base rate parameter in addition to the evidence parameters. The beta PDF parameters with the prior base rate a included can be defined as [3]:

$$ \alpha = r + 2a, \qquad \beta = s + 2(1 - a), \tag{10} $$

where a represents the a priori base rate, r represents the amount of positive evidence, and s represents the amount of negative evidence.
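The link between opinions and beta PDFs can be made concrete with a few helpers. The following is a minimal Python sketch, not the authors' code: it evaluates Eq.(9) with `math.gamma`, builds α and β from evidence via Eq.(10), and computes both the beta expectation α/(α + β) and the opinion expectation $E(x) = b_x + a_x u_x$. The function names are hypothetical, and `fractions.Fraction` is used where exact comparison matters.

```python
import math
from fractions import Fraction


def beta_pdf(p, alpha, beta):
    """Eq.(9): Gamma(alpha+beta) / (Gamma(alpha) Gamma(beta)) * p^(alpha-1) * (1-p)^(beta-1)."""
    if not 0.0 <= p <= 1.0 or alpha <= 0 or beta <= 0:
        raise ValueError("require 0 <= p <= 1, alpha > 0 and beta > 0")
    if (p == 0.0 and alpha < 1) or (p == 1.0 and beta < 1):
        raise ValueError("the density is unbounded at this end point")
    norm = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return norm * p ** (alpha - 1) * (1.0 - p) ** (beta - 1)


def augmented_parameters(r, s, a):
    """Eq.(10): alpha = r + 2a and beta = s + 2(1 - a)."""
    return r + 2 * a, s + 2 * (1 - a)


def beta_expectation(alpha, beta):
    """Probability expectation of beta(alpha, beta): alpha / (alpha + beta)."""
    return alpha / (alpha + beta)


def opinion_expectation(opinion):
    """E(x) = b_x + a_x * u_x, where the projector meets the probability axis."""
    b, _d, u, a = opinion
    return b + a * u


if __name__ == "__main__":
    alpha, beta = augmented_parameters(7, 1, Fraction(1, 2))
    print(alpha, beta, beta_expectation(alpha, beta))
    print(opinion_expectation((Fraction(7, 10), Fraction(1, 10), Fraction(2, 10), Fraction(1, 2))))
```

For instance, r = 7, s = 1 and a = 1/2 give α = 8 and β = 2 with expectation 4/5, which equals E(x) for the opinion (0.7, 0.1, 0.2, 0.5) of Fig.1, and with no evidence and a = 1/2 the parameters reduce to the uniform beta(1, 1).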
We define the augmented beta PDF, denoted ϕ(r, s, a), with 3 parameters as:

$$ \varphi(r, s, a) = \mathrm{beta}(\alpha, \beta), \quad \text{given Eq.(10).} \tag{11} $$

This augmented beta distribution function distinguishes between the a priori base rate a and the a posteriori observed evidence (r, s).

For example, when an urn contains unknown proportions of red and black balls, the likelihood of picking a red ball is not expected to be greater or less than that of picking a black ball, so the a priori probability of picking a red ball is a = 0.5, and the a priori augmented beta distribution is ϕ(0, 0, 1/2) = beta(1, 1), as illustrated in Fig.2.

[Figure 2: A priori ϕ(0, 0, 1/2), i.e. the probability density Beta(p | 1,1) plotted against probability p.]

Assume that an observer picks 8 balls, of which 7 turn out to be red and only one turns out to be black. The updated augmented beta distribution of the outcome of picking red balls is ϕ(7, 1, 1/2) = beta(8, 2), which is illustrated in Fig.3.

[Figure 3: Updated ϕ(7, 1, 1/2), i.e. the probability density Beta(p | 8,2) plotted against probability p.]

The expression for augmented beta PDFs has 3 degrees of freedom and allows a bijective mapping to opinions [3], defined as:

$$ \begin{cases} r = 2b_x / u_x \\ s = 2d_x / u_x \\ 1 = b_x + d_x + u_x \\ a = a_x \end{cases} \iff \begin{cases} b_x = r / (r + s + 2) \\ d_x = s / (r + s + 2) \\ u_x = 2 / (r + s + 2) \\ a_x = a \end{cases} \tag{12} $$

It can be noted that under this correspondence, the example opinion of Fig.1 and the beta distribution of Fig.3 are equivalent: with r = 7, s = 1 and a = 1/2, Eq.(12) gives $b_x = 7/10$, $d_x = 1/10$ and $u_x = 2/10$.

3.2 Bayesian Updating with Opinions
Bayesian updating of beta PDFs is simply vector addition of (r, s) pairs. For example, let a specific outcome x have the base rate $a_x$ prior to any observations, and assume that observer A's observations are $(r^A_x, s^A_x)$ and observer B's observations are $(r^B_x, s^B_x)$. The combined observations of both A and B would then simply be $(r^A_x + r^B_x, s^A_x + s^B_x)$. We use the symbol ⊕ to denote cumulative fusion of a posteriori evidence, so that the combination of the two augmented beta PDFs can be expressed as:

$$ \varphi(r^A_x + r^B_x,\, s^A_x + s^B_x,\, a) = \varphi(r^A_x, s^A_x, a) \oplus \varphi(r^B_x, s^B_x, a). \tag{13} $$

The consensus operator is defined by first mapping two opinions, $\omega^A_x$ and $\omega^B_x$, to augmented beta PDFs using the left side of Eq.(12), then combining the augmented beta PDFs using Eq.(13), and finally mapping the result back to an opinion using the right side of Eq.(12). The mapping from opinions to augmented beta PDFs gives:

$$ \omega^A_x \longmapsto \varphi\!\left(\frac{2b^A_x}{u^A_x}, \frac{2d^A_x}{u^A_x}, a_x\right), \qquad \omega^B_x \longmapsto \varphi\!\left(\frac{2b^B_x}{u^B_x}, \frac{2d^B_x}{u^B_x}, a_x\right). \tag{14} $$

The augmented beta PDFs can now be fused according to Eq.(13), and the result mapped back to an opinion again. This can be written as:

$$ \varphi\!\left(\frac{2b^A_x}{u^A_x} + \frac{2b^B_x}{u^B_x},\ \frac{2d^A_x}{u^A_x} + \frac{2d^B_x}{u^B_x},\ a_x\right) \longmapsto \omega^{A\diamond B}_x. \tag{15} $$

The symbol "⋄" denotes the fusion of two observers A and B into a single imaginary observer denoted A⋄B. All the necessary elements are now in place to define the consensus operator.

Definition 5 (Consensus Operator) Let $\omega^A_x = (b^A_x, d^A_x, u^A_x, a_x)$ and $\omega^B_x = (b^B_x, d^B_x, u^B_x, a_x)$ be opinions respectively held by agents A and B about the same state x, and let $\kappa = u^A_x + u^B_x - u^A_x u^B_x$. When $u^A_x, u^B_x \to 0$, the relative weight $\gamma^{A/B}$ between $\omega^A_x$ and $\omega^B_x$ is defined as $\gamma^{A/B} = u^B_x / u^A_x$. Let $\omega^{A\diamond B}_x = (b^{A\diamond B}_x, d^{A\diamond B}_x, u^{A\diamond B}_x, a^{A\diamond B}_x)$ be the opinion such that:

for κ ≠ 0:
$$ b^{A\diamond B}_x = \frac{b^A_x u^B_x + b^B_x u^A_x}{\kappa}, \quad d^{A\diamond B}_x = \frac{d^A_x u^B_x + d^B_x u^A_x}{\kappa}, \quad u^{A\diamond B}_x = \frac{u^A_x u^B_x}{\kappa}, \quad a^{A\diamond B}_x = a_x; $$

for κ = 0:
$$ b^{A\diamond B}_x = \frac{\gamma^{A/B} b^A_x + b^B_x}{\gamma^{A/B} + 1}, \quad d^{A\diamond B}_x = \frac{\gamma^{A/B} d^A_x + d^B_x}{\gamma^{A/B} + 1}, \quad u^{A\diamond B}_x = 0, \quad a^{A\diamond B}_x = a_x. $$

Then $\omega^{A\diamond B}_x$ is called the consensus opinion between $\omega^A_x$ and $\omega^B_x$, representing an imaginary agent [A⋄B]'s opinion about x, as if that agent represented both A and B. By using the symbol '⊕' to designate this operator, we define $\omega^{A\diamond B}_x \equiv \omega^A_x \oplus \omega^B_x$.
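The mapping of Eq.(12) and the consensus operator of Def.5 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes exact arithmetic with `fractions.Fraction`, opinions given as (b, d, u, a) tuples with equal base rates, and, for the dogmatic case κ = 0, a caller-supplied relative weight γ^{A/B} whose default of 1 (equal weight) is a hypothetical choice made here. The function names are illustrative.

```python
from fractions import Fraction


def opinion_to_evidence(opinion):
    """Left side of Eq.(12): (b, d, u, a) -> (r, s, a); only defined for u > 0."""
    b, d, u, a = opinion
    return 2 * b / u, 2 * d / u, a


def evidence_to_opinion(r, s, a):
    """Right side of Eq.(12): (r, s, a) -> (b, d, u, a)."""
    total = r + s + 2
    return r / total, s / total, 2 / total, a


def consensus(op_a, op_b, gamma=Fraction(1)):
    """Def.5 consensus; gamma stands in for the relative weight gamma^{A/B} when kappa == 0."""
    b_a, d_a, u_a, a = op_a
    b_b, d_b, u_b, a_b = op_b
    if a != a_b:
        raise ValueError("this sketch assumes both opinions share the same base rate")
    kappa = u_a + u_b - u_a * u_b
    if kappa != 0:
        return ((b_a * u_b + b_b * u_a) / kappa,
                (d_a * u_b + d_b * u_a) / kappa,
                (u_a * u_b) / kappa,
                a)
    # kappa == 0 only when both opinions are dogmatic (u_a = u_b = 0):
    # take the weighted average with relative weight gamma.
    return ((gamma * b_a + b_b) / (gamma + 1),
            (gamma * d_a + d_b) / (gamma + 1),
            Fraction(0),
            a)


if __name__ == "__main__":
    w = (Fraction(1, 5), Fraction(0), Fraction(4, 5), Fraction(1, 6))
    print(consensus(w, w))
    r, s, a = opinion_to_evidence(w)
    print(evidence_to_opinion(2 * r, 2 * s, a))
```

For non-dogmatic inputs, `consensus` agrees with mapping both opinions to evidence, adding the (r, s) pairs as in Eq.(13) and mapping back, which makes a convenient sanity check; for example, fusing (1/5, 0, 4/5, 1/6) with itself gives (1/3, 0, 2/3, 1/6) either way.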
The consensus operator is commutative, associative and non-idempotent. It is here assumed that the two input opinions have the same base rate, but disagreement over the base rate is also possible, see [5]. In case of dogmatic opinions, the associativity of the consensus operator does not emerge directly from Def.5, because relative weights between each pair of opinions are needed. This case is described in detail in [6].

4 Normalising the Consensus Operator
The consensus operator applies to belief distribution functions (bbas) on binary frames of discernment, and coarsening is needed in case the original frame of discernment is larger than binary. The consensus operator can then be applied separately to the individual elements of the original frame of discernment. However, inconsistency can sometimes occur because the coarsening process removes information. The normalisation then consists of normalising the results of the consensus operator over all the individual elements, as described in detail next.

Let Θ be a frame of discernment larger than binary, and let $X = \{x_1, x_2, \ldots, x_n\}$ be a set of exhaustive and mutually exclusive elements in Θ, i.e. $\Theta = \bigcup_{i=1}^{n} x_i$ and $x_i \cap x_j = \emptyset$ for all i ≠ j. In general it can be assumed that different observers, denoted by $A_k$, provide conflicting bbas on Θ. Let $X_1, X_2, \ldots, X_n$ be n different coarsenings, where $X_i = \{x_i, \overline{x_i}\}$ for each coarsening indexed by i. Let the coarsened opinions from each bba be denoted by $\omega^{A_k}_{x_i}$, where k indexes the bba. For any coarsening it can be shown that:

$$ \sum_{i=1}^{n} E(\omega^{A_k}_{x_i}) = 1 \quad \text{for all bbas } A_k. \tag{16} $$

Assume that the consensus operator has been applied to opinions on $x_1, x_2, \ldots, x_n$, which has produced a set of n fused opinions $\omega_{x_1}, \omega_{x_2}, \ldots, \omega_{x_n}$ with probability expectation values $E(\omega_{x_1}), E(\omega_{x_2}), \ldots, E(\omega_{x_n})$. Consistency would require that:

$$ \sum_{j=1}^{n} E(\omega_{x_j}) = 1, \tag{17} $$

but when the consensus operator is applied to opinions that do not result from a stable coarsening, this cannot be guaranteed. Consistency can be restored by normalisation. The normalised probability expectation values, denoted by $\tilde{E}(\omega_{x_j})$, can be computed as:

$$ \tilde{E}(\omega_{x_j}) = \frac{E(\omega_{x_j})}{\sum_{i=1}^{n} E(\omega_{x_i})} \quad \text{for all } x_j \in X. \tag{18} $$

The normalised opinions can be computed from the normalised probability expectation values by adjusting the b, d and u parameters. This can be done in the same way as the belief, disbelief and uncertainty functions b(x), d(x) and u(x) were mapped to opinion parameters in the smooth coarsening of Def.3. The normalised consensus operator can then be defined as follows.

Definition 6 (Normalisation Method) Let Θ be a frame of discernment where |Θ| ≥ 3, and for which there are multiple and possibly conflicting bbas that must be fused. Normalised fusion of bbas consists of the following steps.
1. Make binary coarsenings of the original bbas according to Def.3 to produce opinions on binary partitions of the original frame of discernment.
2. Apply the consensus operator according to Def.5 to fuse the opinions, producing a fused opinion for each binary partition.
3. Check for consistency according to Eq.(17).
   • In case of consistency, the fused opinions, denoted by $\omega_{x_j} = (b_{x_j}, d_{x_j}, u_{x_j}, a_{x_j})$, are valid and do not require normalisation. Exit the algorithm.
   • In case of inconsistency, proceed with step 4 below.
4. Normalise the expectation values of the fused opinions according to Eq.(18).
5. For each fused opinion, set the following values:
$$ E(x_j) = \tilde{E}(\omega_{x_j}), \quad b(x_j) = b_{x_j}, \quad d(x_j) = d_{x_j}, \quad u(x_j) = u_{x_j}, \tag{19} $$
and apply the smooth coarsening of Def.3, keeping the base rate $a(x_j) = a_{x_j}$. The resulting opinions, denoted by $\tilde{\omega}_{x_j} = (\tilde{b}_{x_j}, \tilde{d}_{x_j}, \tilde{u}_{x_j}, \tilde{a}_{x_j})$, are called normalised fused opinions.

In the next section we illustrate the effect of the normalisation by an example.

5 Example
We consider the case of a loaded dice, where Θ = {1, 2, 3, 4, 5, 6} is the set of possible outcomes.
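Before walking through the hints, steps 3 to 5 of Def.6 can be sketched in code. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: fused opinions are (b, d, u, a) tuples of `fractions.Fraction` keyed by element, and the smooth coarsening of Def.3 is repeated so the block is self-contained. The names `normalise` and `expectation` are hypothetical, and an all-zero set of expectations is rejected because Eq.(18) is then undefined.

```python
from fractions import Fraction


def smooth_coarsening(b, d, u, a, e):
    """Def.3 applied to (b(x), d(x), u(x), a(x), E(x)); repeated here for self-containment."""
    pivot = b + a * u
    if e == pivot:
        return b, d, u, a  # stable case: the opinion is left unchanged
    if e < pivot:
        return e * b / pivot, 1 - e * (b + u) / pivot, e * u / pivot, a
    rest = 1 - pivot
    return 1 - (1 - e) * (d + u) / rest, (1 - e) * d / rest, (1 - e) * u / rest, a


def expectation(opinion):
    """E = b + a*u, as in Eq.(8)."""
    b, _d, u, a = opinion
    return b + a * u


def normalise(fused):
    """Def.6 steps 3-5; `fused` maps each element x_j to its consensus opinion (b, d, u, a)."""
    total = sum(expectation(op) for op in fused.values())
    if total == 1:
        return dict(fused)  # step 3: Eq.(17) holds, no normalisation needed
    if total == 0:
        raise ValueError("all expectations are zero, so Eq.(18) is undefined")
    result = {}
    for xj, (b, d, u, a) in fused.items():
        e_tilde = expectation((b, d, u, a)) / total  # step 4, Eq.(18)
        result[xj] = smooth_coarsening(b, d, u, a, e_tilde)  # step 5, Eq.(19) and Def.3
    return result


if __name__ == "__main__":
    zero, one, sixth = Fraction(0), Fraction(1), Fraction(1, 6)
    fused = {k: (zero, one, zero, sixth) for k in (1, 3, 4, 5, 6)}
    fused[2] = (Fraction(1, 3), zero, Fraction(2, 3), sixth)
    for k, op in sorted(normalise(fused).items()):
        print(k, op)
```

The usage block feeds in the six fused opinions obtained for the loaded dice in this section; the expectations sum to 4/9, and normalisation maps the opinion on {2} to (1, 0, 0, 1/6) while the dogmatic opinions on the other outcomes are left unchanged.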
An informant has special knowledge about the loaded dice, and an observer is trying to predict the outcome of throwing the dice based on hints from the informant. First the informant provides hint A, which says that the dice will always produce an even number. The observer translates this into the belief mass $m^A(\{2, 4, 6\}) = 1$. Then the informant provides hint B, which says that the dice will always produce a prime number. The observer translates this into the belief mass $m^B(\{2, 3, 5\}) = 1$.

In order to apply the consensus operator to combine the two hints, the observer must select a set of binary partitions for the coarsenings. The most general approach in this example is to make 6 binary coarsenings of Θ, targeting {1}, {2}, {3}, {4}, {5} and {6} respectively. This produces the following 6 binary partitions of the frame of discernment:

$$ \begin{aligned} X_1 &= \{\{1\}, \{2, 3, 4, 5, 6\}\}, & X_2 &= \{\{2\}, \{1, 3, 4, 5, 6\}\}, & X_3 &= \{\{3\}, \{1, 2, 4, 5, 6\}\},\\ X_4 &= \{\{4\}, \{1, 2, 3, 5, 6\}\}, & X_5 &= \{\{5\}, \{1, 2, 3, 4, 6\}\}, & X_6 &= \{\{6\}, \{1, 2, 3, 4, 5\}\}. \end{aligned} $$

The corresponding coarsened opinions resulting from hint A, "The dice will produce an even number", are:

$$ \begin{aligned} \omega^A_{\{1\}} &= (0, 1, 0, \tfrac16), & \omega^A_{\{2\}} &= (\tfrac15, 0, \tfrac45, \tfrac16), & \omega^A_{\{3\}} &= (0, 1, 0, \tfrac16),\\ \omega^A_{\{4\}} &= (\tfrac15, 0, \tfrac45, \tfrac16), & \omega^A_{\{5\}} &= (0, 1, 0, \tfrac16), & \omega^A_{\{6\}} &= (\tfrac15, 0, \tfrac45, \tfrac16). \end{aligned} $$

The opinions resulting from hint B, "The dice will produce a prime number", are:

$$ \begin{aligned} \omega^B_{\{1\}} &= (0, 1, 0, \tfrac16), & \omega^B_{\{2\}} &= (\tfrac15, 0, \tfrac45, \tfrac16), & \omega^B_{\{3\}} &= (\tfrac15, 0, \tfrac45, \tfrac16),\\ \omega^B_{\{4\}} &= (0, 1, 0, \tfrac16), & \omega^B_{\{5\}} &= (\tfrac15, 0, \tfrac45, \tfrac16), & \omega^B_{\{6\}} &= (0, 1, 0, \tfrac16). \end{aligned} $$

It can be observed that Eq.(16) is satisfied for both hints, and that:

$$ \begin{aligned} E(\omega^A_{\{2\}}) &= E(\omega^A_{\{4\}}) = E(\omega^A_{\{6\}}) = \tfrac13, & E(\omega^A_{\{1\}}) &= E(\omega^A_{\{3\}}) = E(\omega^A_{\{5\}}) = 0,\\ E(\omega^B_{\{2\}}) &= E(\omega^B_{\{3\}}) = E(\omega^B_{\{5\}}) = \tfrac13, & E(\omega^B_{\{1\}}) &= E(\omega^B_{\{4\}}) = E(\omega^B_{\{6\}}) = 0. \end{aligned} $$

We now apply the consensus operator of Def.5 to the opinions of each possible outcome {1}, {2}, {3}, {4}, {5} and {6} in turn.
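For outcome {2}, the coarsening and fusion can be traced by hand. Under either hint the single focal element overlaps {2} without being contained in it, so Eqs.(2)-(6) give b(x) = d(x) = 0, u(x) = 1, a(x) = 1/6 and E(x) = 1/3. Since E(x) > b(x) + a(x)u(x) = 1/6, the second case of Def.3 applies, and the fusion then uses the κ ≠ 0 case of Def.5:

$$
\begin{aligned}
b_x &= 1 - \frac{(1 - \tfrac13)(0 + 1)}{1 - 0 - \tfrac16} = 1 - \frac{2/3}{5/6} = \tfrac15, \qquad d_x = \frac{(1 - \tfrac13)\cdot 0}{5/6} = 0, \qquad u_x = \frac{(1 - \tfrac13)\cdot 1}{5/6} = \tfrac45,\\
\kappa &= \tfrac45 + \tfrac45 - \tfrac45 \cdot \tfrac45 = \tfrac{24}{25},\\
\omega^A_{\{2\}} \oplus \omega^B_{\{2\}} &= \left( \frac{\tfrac15 \cdot \tfrac45 + \tfrac15 \cdot \tfrac45}{24/25},\ 0,\ \frac{\tfrac45 \cdot \tfrac45}{24/25},\ \tfrac16 \right) = \left( \tfrac13, 0, \tfrac23, \tfrac16 \right),\\
E(\omega^A_{\{2\}} \oplus \omega^B_{\{2\}}) &= \tfrac13 + \tfrac16 \cdot \tfrac23 = \tfrac49.
\end{aligned}
$$

For the other outcomes at least one input is the dogmatic opinion (0, 1, 0, 1/6), and Def.5 returns (0, 1, 0, 1/6) again, whose expectation is 0.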
The resulting fused opinions are:

$$ \begin{aligned} \omega^{A\diamond B}_{\{1\}} &= (0, 1, 0, \tfrac16), & \omega^{A\diamond B}_{\{2\}} &= (\tfrac13, 0, \tfrac23, \tfrac16), & \omega^{A\diamond B}_{\{3\}} &= (0, 1, 0, \tfrac16),\\ \omega^{A\diamond B}_{\{4\}} &= (0, 1, 0, \tfrac16), & \omega^{A\diamond B}_{\{5\}} &= (0, 1, 0, \tfrac16), & \omega^{A\diamond B}_{\{6\}} &= (0, 1, 0, \tfrac16). \end{aligned} $$

It can be seen that $E(\omega^{A\diamond B}_{\{2\}}) = 4/9 \approx 0.444$ and that

$$ E(\omega^{A\diamond B}_{\{1\}}) = E(\omega^{A\diamond B}_{\{3\}}) = E(\omega^{A\diamond B}_{\{4\}}) = E(\omega^{A\diamond B}_{\{5\}}) = E(\omega^{A\diamond B}_{\{6\}}) = 0, $$

so the expectation values sum to 4/9 and the consistency requirement expressed by Eq.(17) is not satisfied. By applying the normalisation method, the normalised expectation value and the normalised opinion become:

$$ \tilde{E}(\omega^{A\diamond B}_{\{2\}}) = 1, \qquad \tilde{\omega}^{A\diamond B}_{\{2\}} = (1, 0, 0, \tfrac16). $$

As a result, the observer concludes that the dice can only produce a "2", which of course was obvious in this example because "2" is the only number that is both even and prime.

6 Discussion and Conclusion
The consensus operator is used for cumulative belief fusion on binary state spaces. Thanks to the normalisation and coarsening described in this paper, it can also be used for cumulative fusion of bbas on larger state spaces. In case of coarsening of Dirichlet bbas, the consensus operator does not require any normalisation; normalisation might only be needed when one or more of the original bbas are non-Dirichlet, in order to produce consistent results.

Dempster's rule [10] also includes a normalisation, which serves to redistribute belief in case of conflicting input bbas. However, the purposes of Dempster's rule and the consensus operator are different. Without going into detail, the consensus operator is used for cumulative belief fusion, whereas Dempster's rule is used for conjunctive belief fusion. It is often hard to determine whether a practical situation should be modelled with cumulative or with conjunctive belief fusion, and this particular problem can be the source of considerable confusion. The fact that conjunctive belief fusion with Dempster's rule often represents a good approximation of cumulative belief fusion with the consensus operator, and vice versa, only adds to this confusion.
In case of low conflict between the two argument beliefs, Dempster's rule and the consensus operator produce approximately the same result, but as the level of conflict increases the difference becomes more and more pronounced. In case of strong conflict, Dempster's rule always produces correct results when the argument beliefs are interpreted as conjunctive constraints, meaning that the bbas are combined in line with the product of probabilities. The consensus operator handles strongly conflicting beliefs in line with the standard weighted average of probabilities, and thus interprets the belief arguments as cumulative evidence.

The example of the dice above could also be interpreted as a situation of conjunctive evidence, so Dempster's rule could be used as an alternative to the consensus operator. When applying Dempster's rule to this example there is no conflict, so it produces the same result as the consensus operator. Interestingly, because there is no conflict when applying Dempster's rule in this particular example, the normalisation of Dempster's rule does not come into play. In that sense, other operators such as Smets' non-normalised version of Dempster's rule [14] would also have produced the same result. To interpret the beliefs about the dice as cumulative evidence, the hints could be worded as A: "The dice produced an even number k out of k times" and B: "The dice produced a prime number k out of k times", where it is assumed that the dice has been thrown 2k times in total and k is arbitrarily large.
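The observation that Dempster's rule meets no conflict in the dice example can be checked with a short sketch. The following minimal Python version of Dempster's rule uses its standard textbook form (intersect every pair of focal elements, multiply their masses, and divide by 1 − K, where K is the mass that would fall on the empty set); it is an illustration, not code from the paper, and the function name `dempster` is hypothetical. It also returns K, so the effect of the normalisation step can be inspected.

```python
from fractions import Fraction
from itertools import product


def dempster(m1, m2):
    """Dempster's rule for two bbas given as {frozenset: mass}; returns (combined bba, conflict K)."""
    unnormalised = {}
    conflict = Fraction(0)
    for (y1, v1), (y2, v2) in product(m1.items(), m2.items()):
        meet = y1 & y2
        if meet:
            unnormalised[meet] = unnormalised.get(meet, Fraction(0)) + v1 * v2
        else:
            conflict += v1 * v2  # mass that would land on the empty set
    if conflict == 1:
        raise ValueError("total conflict: Dempster's rule is undefined")
    combined = {y: v / (1 - conflict) for y, v in unnormalised.items()}
    return combined, conflict


if __name__ == "__main__":
    m_even = {frozenset({2, 4, 6}): Fraction(1)}
    m_prime = {frozenset({2, 3, 5}): Fraction(1)}
    print(dempster(m_even, m_prime))
```

With $m^A(\{2, 4, 6\}) = 1$ and $m^B(\{2, 3, 5\}) = 1$, K = 0 and all mass goes to {2}, so dividing by 1 − K changes nothing; this is why the normalised rule and a non-normalised variant agree on this example.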