Subjective Logic
Draft, 23 November 2015

Audun Jøsang
University of Oslo
Web: http://folk.uio.no/josang/
Email: josang@mn.uio.no

Contents

1 Introduction
2 Elements of Subjective Opinions
   2.1 Motivation for the Opinion Representation
   2.2 Flexibility of Representation
   2.3 Domains and Hyperdomains
   2.4 Random Variables and Hypervariables
   2.5 Belief Mass Distribution and Uncertainty Mass
   2.6 Base Rate Distributions
   2.7 Probability Distributions
3 Opinion Representations
   3.1 Belief and Trust Relationships
   3.2 Opinion Classes
   3.3 Binomial Opinions
      3.3.1 Binomial Opinion Representation
      3.3.2 The Beta Binomial Model
      3.3.3 Mapping between a Binomial Opinion and a Beta PDF
   3.4 Multinomial Opinions
      3.4.1 The Multinomial Opinion Representation
      3.4.2 The Dirichlet Multinomial Model
      3.4.3 Visualising Dirichlet Probability Density Functions
      3.4.4 Coarsening Example: From Ternary to Binary
      3.4.5 Mapping between Multinomial Opinion and Dirichlet PDF
      3.4.6 Uncertainty-Maximisation of Multinomial Opinions
   3.5 Hyper Opinions
      3.5.1 The Hyper-Opinion Representation
      3.5.2 Projecting Hyper-Opinions to Multinomial Opinions
      3.5.3 The Dirichlet Model Applied to Hyperdomains
      3.5.4 Mapping between a Hyper-Opinion and a Dirichlet HPDF
      3.5.5 Hyper Dirichlet PDF
   3.6 Alternative Opinion Representations
      3.6.1 Probabilistic Notation of Opinions
      3.6.2 Qualitative Category Representation
4 Decision-Making Under Uncertainty
   4.1 Aspects of Belief and Uncertainty in Opinions
      4.1.1 Specificity
      4.1.2 Vagueness
      4.1.3 Dirichlet Visualisation of Opinion Vagueness
      4.1.4 Elemental Uncertainty
   4.2 Mass-Sum for Specificity, Vagueness and Uncertainty
      4.2.1 Elemental Mass-Sum
      4.2.2 Total Mass-Sum
   4.3 Utility and Normalisation
   4.4 Decision Criteria
   4.5 The Ellsberg Paradox
   4.6 Examples of Decision Under Vagueness and Uncertainty
      4.6.1 Decisions with Difference in Projected Probability
      4.6.2 Decisions with Difference in Specificity
      4.6.3 Decisions with Difference in Vagueness and Uncertainty
   4.7 Entropy in the Opinion Model
      4.7.1 Outcome Surprisal
      4.7.2 Opinion Entropy
   4.8 Conflict Between Opinions
5 Principles of Subjective Logic
   5.1 Related Frameworks for Uncertain Reasoning
      5.1.1 Comparison with Dempster-Shafer Belief Theory
      5.1.2 Comparison with Imprecise Probabilities
      5.1.3 Comparison with Fuzzy Logic
      5.1.4 Comparison with Kleene's Three-Valued Logic
   5.2 Subjective Logic as a Generalisation of Probabilistic Logic
   5.3 Overview of Subjective Logic Operators
6 Addition, Subtraction and Complement
   6.1 Addition
   6.2 Subtraction
   6.3 Complement
7 Binomial Multiplication and Division
   7.1 Binomial Multiplication and Comultiplication
      7.1.1 Binomial Multiplication
      7.1.2 Binomial Comultiplication
      7.1.3 Approximations of Product and Coproduct
   7.2 Reliability Analysis
      7.2.1 Simple Reliability Networks
      7.2.2 Reliability Analysis of Complex Systems
   7.3 Binomial Division and Codivision
      7.3.1 Binomial Division
      7.3.2 Binomial Codivision
   7.4 Correspondence with Probabilistic Logic
8 Multinomial Multiplication and Division
   8.1 Normal Multiplication
      8.1.1 Determining Uncertainty Mass
      8.1.2 Determining Belief Mass
      8.1.3 Product Base Rates
      8.1.4 Assembling the Multinomial Product Opinion
      8.1.5 Justification for Normal Multinomial Multiplication
   8.2 Proportional Multiplication
   8.3 Projected Multiplication
   8.4 Hypernomial Product
   8.5 Product of Dirichlet Probability Density Functions
   8.6 Example Multinomial Product Computation
      8.6.1 Multinomial Product Computation
      8.6.2 Hypernomial Product Computation
   8.7 Multinomial Division
      8.7.1 Averaging Proportional Division
      8.7.2 Selective Division
   8.8 Multinomial Opinion Projection
      8.8.1 Opinion Projection Method
      8.8.2 Example: Football Games
9 Conditional Deduction
   9.1 Introduction to Conditional Reasoning
   9.2 Probabilistic Conditional Inference
      9.2.1 Binomial Probabilistic Deduction and Abduction
      9.2.2 Multinomial Probabilistic Deduction and Abduction
   9.3 Notation for Subjective Conditional Inference
      9.3.1 Notation for Binomial Deduction and Abduction
      9.3.2 Notation for Multinomial Deduction and Abduction
   9.4 Binomial Deduction
      9.4.1 Bayesian Base Rate
      9.4.2 Free Base Rate Interval
      9.4.3 Method for Binomial Deduction
      9.4.4 Justification for the Binomial Deduction Operator
   9.5 Multinomial Deduction
      9.5.1 Constraints for Multinomial Deduction
      9.5.2 Bayesian Base Rate Distribution
      9.5.3 Free Base Rate Distribution Intervals
      9.5.4 Method for Multinomial Deduction
   9.6 Example: Multinomial Deduction for Match-Fixing
   9.7 Interpretation of Material Implication in Subjective Logic
      9.7.1 Truth Functional Material Implication
      9.7.2 Material Probabilistic Implication
      9.7.3 Relevance in Implication
      9.7.4 Subjective Interpretation of Material Implication
      9.7.5 Comparison with Subjective Logic Deduction
      9.7.6 How to Interpret Material Implication
10 Conditional Abduction
   10.1 Introduction to Abductive Reasoning
   10.2 Relevance and Irrelevance
   10.3 Inversion of Binomial Conditional Opinions
      10.3.1 Principles for Inverting Binomial Conditional Opinions
      10.3.2 Method for Inversion of Binomial Conditional Opinions
      10.3.3 Convergence of Repeated Inversions
   10.4 Binomial Abduction
   10.5 Illustrating the Base Rate Fallacy
   10.6 Inversion of Multinomial Conditional Opinions
      10.6.1 Principles of Multinomial Conditional Opinion Inversion
      10.6.2 Method for Multinomial Conditional Inversion
   10.7 Multinomial Abduction
   10.8 Example: Military Intelligence Analysis
      10.8.1 Example: Intelligence Analysis with Probability Calculus
      10.8.2 Example: Intelligence Analysis with Subjective Logic
11 Fusion of Subjective Opinions
   11.1 Interpretation of Fusion
      11.1.1 Correctness and Consistency Criteria for Fusion Models
      11.1.2 Classes of Fusion Situations
      11.1.3 Criteria for Fusion Operator Selection
   11.2 Belief Constraint Fusion
      11.2.1 Method of Constraint Fusion
      11.2.2 Frequentist Interpretation of Constraint Fusion
      11.2.3 Expressing Preferences with Subjective Opinions
      11.2.4 Example: Going to the Cinema, 1st Attempt
      11.2.5 Example: Going to the Cinema, 2nd Attempt
      11.2.6 Example: Not Going to the Cinema
   11.3 Cumulative Fusion
   11.4 Averaging Fusion
   11.5 Hybrid Cumulative-Averaging Fusion
   11.6 Consensus & Compromise Fusion
      11.6.1 Consensus Step
      11.6.2 Compromise Step
      11.6.3 Merging Consensus and Compromise Belief
12 Unfusion and Fission of Subjective Opinions
   12.1 Unfusion of Opinions
      12.1.1 Cumulative Unfusion
      12.1.2 Averaging Unfusion
      12.1.3 Example: Cumulative Unfusion of Binomial Opinions
   12.2 Fission of Opinions
      12.2.1 Cumulative Fission
      12.2.2 Fission of Average
      12.2.3 Example Fission of Opinion
13 Computational Trust
   13.1 The Notion of Trust
      13.1.1 Reliability Trust
      13.1.2 Decision Trust
      13.1.3 Reputation and Trust
   13.2 Trust Transitivity
      13.2.1 Motivating Example for Transitive Trust
      13.2.2 Referral Trust and Functional Trust
      13.2.3 Notation for Transitive Trust
      13.2.4 Compact Notation for Transitive Trust Paths
      13.2.5 Semantic Requirements for Trust Transitivity
   13.3 The Trust Discounting Operator
      13.3.1 Principle of Trust Discounting
      13.3.2 Trust Discounting with 2-Edge Paths
      13.3.3 Example: Trust Discounting of Restaurant Recommendation
      13.3.4 Trust Discounting for Multi-Edge Path
   13.4 Trust Fusion
   13.5 Trust Revision
      13.5.1 Motivation for Trust Revision
      13.5.2 Trust Revision Method
      13.5.3 Example: Conflicting Restaurant Recommendations
14 Trust Networks
   14.1 Graphs for Trust Networks
      14.1.1 Directed Series-Parallel Graphs
   14.2 Outbound-Inbound Node Set
      14.2.1 Parallel-Path Subnetworks
      14.2.2 Nesting Level
   14.3 Analysis of DSPG Trust Networks
      14.3.1 Algorithm for Analysis of DSPG
      14.3.2 Soundness Requirements for Trust Recommendations
   14.4 Analysing Complex Non-DSPG Trust Networks
      14.4.1 Synthesis of DSPG Trust Network
      14.4.2 Requirements for DSPG Synthesis
15 Bayesian Reputation Systems
   15.1 Computing Reputation Scores
      15.1.1 Binomial Reputation Scores
      15.1.2 Multinomial Reputation Scores
   15.2 Collecting and Aggregating Ratings
      15.2.1 Collecting Ratings
      15.2.2 Aggregating Ratings with Aging
      15.2.3 Reputation Score Convergence with Time Decay
   15.3 Base Rates for Ratings
      15.3.1 Individual Base Rates
      15.3.2 Total History Base Rate
      15.3.3 Sliding Time Window Base Rate
      15.3.4 High Longevity Factor Base Rate
      15.3.5 Dynamic Community Base Rates
   15.4 Reputation Representation
      15.4.1 Multinomial Probability Representation
      15.4.2 Point Estimate Representation
      15.4.3 Continuous Ratings
   15.5 Simple Scenario Simulation
   15.6 Combining Trust and Reputation
   15.7 Combining Trust and Reputation
16 Subjective Networks
   16.1 Bayesian Networks
      16.1.1 Example: Lung Cancer Situation
      16.1.2 Naïve Bayes Classifier
      16.1.3 Independence and Separation
   16.2 Subjective Bayesian Networks
      16.2.1 Subjective Predictive Reasoning
      16.2.2 Subjective Diagnostic Reasoning
      16.2.3 Subjective Intercausal Reasoning
   16.3 Subjective Combined Reasoning
   16.4 Subjective Networks
References

Chapter 1
Introduction

In standard logic, propositions are considered to be either true or false, and in probability calculus the argument probabilities are expressed in the range [0, 1]. However, a fundamental aspect of the human condition is that nobody can ever determine with absolute certainty whether a proposition about the world is true or false, or determine the probability of something with certainty. In addition, whenever the truth of a proposition is assessed, it is always done by an individual, and it can never be considered to represent a general and objective belief. This indicates that important aspects are missing in the way standard logic and probability calculus capture our perception of reality, and that these reasoning models are more designed for an idealised world than for the subjective world in which we are all living.

The expressiveness of arguments in a reasoning model depends on the richness of the syntax of those arguments. Opinions used in subjective logic offer significantly greater expressiveness than Boolean truth values or probabilities. This is achieved by explicitly including degrees of uncertainty, thereby allowing an analyst to specify “I don’t know” or “I’m indifferent” as input arguments.

Definitions of operators used in a specific reasoning model depend on the argument syntax. For example, in binary logic the AND, OR and XOR operators are defined by their respective truth tables, which traditionally have the status of axioms. Other operators, such as MP (Modus Ponens) and MT (Modus Tollens), are defined in a similar way.

The concept of probabilistic logic has multiple interpretations in the literature, see e.g. [72]. The general aim of a probabilistic logic is to combine the capacity of probability theory to handle likelihood with the capacity of binary logic to exploit structure to make inferences. The result is a richer and more expressive formalism than either probability calculus or deductive logic can offer alone. The various probabilistic logics have in common that they attempt to find a natural extension of traditional logic truth tables: the results they define are derived through probabilistic expressions instead.

In this book, probabilistic logic is interpreted as the direct extension of binary logic, in the sense that propositions get assigned continuous probabilities, rather than just Boolean truth values, and where formulas of probability calculus replace truth tables. In binary logic, operators are typically defined as axioms represented as truth tables. In probabilistic logic the corresponding operators are simply algebraic formulas that take probabilities as input arguments.
Assuming that Boolean TRUE in binary logic corresponds to probability p = 1, and that Boolean FALSE corresponds to probability p = 0, binary logic (BL) simply becomes an instance of probabilistic logic (PL), or equivalently, probabilistic logic becomes a generalisation of binary logic. More specifically, there is a direct correspondence between many binary logic operators and probabilistic logic operator formulas, as specified in Table 1.1.

Table 1.1 Correspondence between binary logic and probabilistic logic operators

   Binary Logic                  Probabilistic Logic
   AND:  x ∧ y                   Product:        p(x ∧ y) = p(x) p(y)
   OR:   x ∨ y                   Coproduct:      p(x ∨ y) = p(x) + p(y) − p(x) p(y)
   XOR:  x ≢ y                   Inequivalence:  p(x ≢ y) = p(x)(1 − p(y)) + (1 − p(x)) p(y)
   EQU:  x ≡ y                   Equivalence:    p(x ≡ y) = 1 − p(x ≢ y)
   IMP:  x → y                   Implication:    not closed in probabilistic logic
   MP:   {x → y, x} ⇒ y          Deduction:      p(y ‖ x) = p(x) p(y|x) + p(x̄) p(y|x̄)
   MT:   {x → y, ȳ} ⇒ x̄          Abduction:      p(x|y) = a(x) p(y|x) / (a(x) p(y|x) + a(x̄) p(y|x̄))
                                                 p(x|ȳ) = a(x) p(ȳ|x) / (a(x) p(ȳ|x) + a(x̄) p(ȳ|x̄))
                                                 p(x ‖̃ y) = p(y) p(x|y) + p(ȳ) p(x|ȳ)

The Material Implication operator IMP is traditionally defined in binary logic in terms of a truth table, but no corresponding probabilistic operator exists, because it would require degrees of uncertainty, which are not defined for probabilities. Material implication is in fact not closed in probabilistic logic. In subjective logic, however, it is possible to define an operator corresponding to material implication, because subjective logic includes degrees of uncertainty. This is explained in Section 9.7 on Material Implication.

The notation p(y ‖ x) means that the probability of y is derived as a function of the conditionals p(y|x) and p(y|x̄) as well as the parent p(x). The parameter a(x) represents the base rate of x. The symbol ‘≢’ represents inequivalence, i.e. that x and y have different truth values. MP (Modus Ponens) corresponds to, and is a special case of, probabilistic conditional deduction. MT (Modus Tollens) corresponds to, and is a special case of, probabilistic conditional abduction. The notation p(y ‖ x) for conditional deduction denotes the output probability of y conditionally deduced from the input conditionals p(y|x) and p(y|x̄) as well as the input argument p(x). Similarly, the notation p(x ‖̃ y) for conditional abduction denotes the output probability of x conditionally abduced from the input conditionals p(y|x) and p(y|x̄) as well as the evidence argument p(y).

For example, consider the probabilistic operators for MT in Table 1.1, in the case when (x → y) is TRUE and y is FALSE, which translates into p(y|x) = 1 and p(y) = 0. From the first equation it can be observed that p(x|y) ≠ 0 because p(y|x) = 1. From the second equation it can be observed that p(x|ȳ) = 0 because p(ȳ|x) = 1 − p(y|x) = 0. From the third equation it can finally be seen that p(x ‖̃ y) = 0 because p(y) = 0 and p(x|ȳ) = 0. From the probabilistic expressions we have thus abduced that p(x) = 0, equivalently p(x̄) = 1, which translates into x being FALSE, as MT dictates.

The power of probabilistic logic is the ability to derive logic conclusions without relying on axioms of logic, only on principles of probability calculus. Probabilistic logic was first defined by Nilsson [72] with the aim of combining the capability of deductive logic to exploit the structure and relationship of arguments and events, with the capacity of probability theory to express degrees of truth about those arguments and events.
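To make the correspondence in Table 1.1 concrete, the following minimal Python sketch (illustrative only; the function and parameter names are not from the book) evaluates the probabilistic-logic formulas, including the Modus Tollens example discussed above.

```python
# Minimal sketch (illustrative, not from the book) of the probabilistic-logic
# formulas in Table 1.1. All function and parameter names are made up.

def product(px, py):                    # AND: p(x AND y) = p(x)p(y)
    return px * py

def coproduct(px, py):                  # OR: p(x OR y) = p(x) + p(y) - p(x)p(y)
    return px + py - px * py

def inequivalence(px, py):              # XOR: p(x XOR y)
    return px * (1 - py) + (1 - px) * py

def deduction(px, py_x, py_notx):       # MP: p(y || x)
    return px * py_x + (1 - px) * py_notx

def abduction(py, py_x, py_notx, ax):   # MT: p(x ||~ y)
    # Invert the conditionals with the base rate a(x), as in Table 1.1 ...
    px_y    = ax * py_x / (ax * py_x + (1 - ax) * py_notx)
    px_noty = ax * (1 - py_x) / (ax * (1 - py_x) + (1 - ax) * (1 - py_notx))
    # ... and then deduce p(x) from the evidence probability p(y).
    return py * px_y + (1 - py) * px_noty

# Modus Tollens example from the text: p(y|x) = 1 and p(y) = 0 entail p(x) = 0.
print(abduction(py=0.0, py_x=1.0, py_notx=0.3, ax=0.5))   # -> 0.0
```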
Probabilistic logic can be used to build reasoning models of practical situations that are more general and more expressive than reasoning models based on binary logic. Logically speaking it is meaningless to define binary logic operators such as those of Table 1.1 in terms of truth tables, because these operators are simply special cases of the corresponding probabilistic operators. The truth values specified in truth tables can be directly computed with probabilistic logic operators, so defining them as axioms is redundant. To have separate independent definitions for the same concept, i.e. both as a truth table and as a probability calculus operator, is problematic because of the possibility of inconsistency between the definitions. In defence of truth tables one could say that it is pedagogically meaningful to use truth tables as a look-up tool for Boolean cases, because a simple look-up is faster than computing the result of an algebraic expression. However, the truth tables should then be defined in terms of their corresponding probabilistic logic operators, and not as separate axioms.

A serious limitation of probabilistic logic (and of binary logic alike) is that there is often uncertainty about the probabilities (and Boolean values) themselves. When the analyst is unable to estimate probabilities for the input arguments, then probabilistic logic is not an adequate formalism. It is for example impossible to express probabilistic input arguments that reflect expressions like “I don’t know”, because such expressions convey degrees of ignorance and uncertainty. An analyst who is unable to provide any reliable probability for a given input argument can be tempted or even forced to set a probability without any evidence to support it. This practice will generally lead to unreliable conclusions, often described as the problem of ‘garbage in, garbage out’. In case the analyst wants to express “I don’t know the truth values of variables X1 and X2” and needs to derive p(X1 ∧ X2), then probabilistic logic does not offer an adequate model.

The type of uncertainty that subjective opinions express is in the literature typically called second-order probability or second-order uncertainty, where traditional probability represents first-order uncertainty [29, 86]. More specifically, (second-order) uncertainty is expressed as a probability density function over first-order probabilities. Probability density functions must have an integral of 1 to respect the additivity axiom of probability theory. Apart from this requirement, a probability density function can take any shape, and can thereby represent many different forms of uncertainty.

The uncertainty expressed by subjective opinions represents second-order probability in the form of Dirichlet probability density functions. A Dirichlet PDF naturally reflects random sampling of statistical events. Uncertainty in the Dirichlet model reflects the lack of evidence, in the sense that the fewer observed samples there are, the less evidence there is, and the more uncertainty there is. In subjective logic, the term uncertainty is interpreted in this way, with reference to the lack of evidence which can be reflected by the Dirichlet model.

The additivity principle of classical probability requires that the probabilities of mutually disjoint elements in a state space add up to 1. This requirement makes it necessary to estimate a probability for every state, even though there might not be a basis for it.
In other words, it prevents us from explicitly expressing ignorance about the possible states, outcomes or statements. If somebody wants to express ignorance about the state x as “I don’t know”, this would be impossible with a simple scalar probability. A probability P(x) = 0.5 would for example mean that x and x̄ are considered equally likely, which in fact is quite informative, and very different from ignorance. Alternatively, a uniform probability density function would more closely express the situation of ignorance about the possible outcome.

Arguments in subjective logic are called subjective opinions, or opinions for short. An opinion can contain degrees of uncertainty in the sense of uncertainty about probability. The uncertainty of an opinion can be interpreted as ignorance about the truth of the relevant states, or as second-order probability about first-order probabilities.

The subjective opinion model extends the traditional belief function model in the sense that opinions take base rates into account, whereas belief functions ignore base rates. An essential characteristic of subjective logic is thus to include base rates, which also makes it possible to define a bijective mapping between subjective opinions and Dirichlet probability density functions. Subjective opinions generalise belief functions, precisely because subjective opinions include base rates, and in that sense have a richer expressiveness than belief functions.

Belief theory has its origin in a model for upper and lower probabilities proposed by Dempster in 1960. Shafer later proposed a model for expressing belief functions [81]. The main idea behind belief theory is to abandon the additivity principle of probability theory, i.e. that the sum of probabilities on all pairwise disjoint states must add up to one. Instead, belief theory gives observers the ability to assign so-called belief mass to the powerset of the state space. The main advantage of this approach is that ignorance, i.e. the lack of evidence about the truth of the states, can be explicitly expressed, e.g. by assigning belief mass to the whole state space. Shafer’s book [81] describes various aspects of belief theory, but the two main elements are 1) a flexible way of expressing beliefs, and 2) a conjunctive method for combining belief functions, commonly known as Dempster’s Rule, which in subjective logic is called the belief constraint fusion operator.

The definition of new operators for subjective opinions is normally quite simple, and consists of adding the new dimension of uncertainty to traditional probabilistic operators. Currently, a relatively large set of practical subjective logic operators exists. This provides a flexible framework for reasoning in a large variety of situations where input arguments can be incomplete or affected by uncertainty. Subjective opinions are equivalent to Dirichlet and Beta probability density functions. Through this equivalence, subjective logic provides a calculus for reasoning with probability density functions.

The aim of this book is to provide a general introduction to subjective logic. Different but equivalent representations of subjective opinions are presented together with their interpretation. This allows uncertain probabilities to be seen from different angles, and allows an analyst to define models according to the formalism that they are most familiar with, and that most naturally represents a specific real-world situation.
Subjective logic contains the same set of basic operators known from binary logic and classical probability calculus, but also contains some non-traditional operators which are specific to subjective logic.

The advantage of subjective logic over traditional probability calculus and probabilistic logic is that lack of evidence can be explicitly expressed, so that real-world situations can be modelled and analysed more realistically than is otherwise possible with purely probabilistic models. The analyst’s partial ignorance and lack of information can be taken explicitly into account during the analysis, and explicitly expressed in the conclusion. When used for decision support, subjective logic allows decision makers to be better informed about uncertainties affecting the assessment of specific situations and future outcomes.

Chapter 2
Elements of Subjective Opinions

2.1 Motivation for the Opinion Representation

Explicit expression of uncertainty is the main motivation for subjective logic and for using opinions as input arguments to reasoning models. Uncertainty comes in many flavours, and a good taxonomy is described in [85]. In subjective logic, uncertainty relates to probabilities. For example, let the probability of a future event x be estimated as p(x) = 0.5. In case this probability estimate represents the perceived likelihood of obtaining heads when flipping a fair coin, then it would be natural to represent it as an opinion with zero uncertainty. In case the probability estimate represents the perceived likelihood that there is life on a planet in a specific solar system, then it would be natural to represent it as an opinion with considerable uncertainty. The probability estimate of an event is thus separated from the certainty/uncertainty of that probability.

With this explicit representation of uncertainty, subjective logic can be applied for analysing situations where events have more or less certain probabilities, i.e. where the analyst is more or less ignorant about the probabilities of possible events. This is done by including the degree of uncertainty about probabilities as an explicit parameter in the input arguments. This uncertainty is then taken into account during the analysis and explicitly represented in the output conclusion. In other words, subjective logic allows uncertainty to propagate through the analysis all the way to the output conclusions.

For decision makers it can make a big difference whether probabilities are certain or uncertain. For example, it is risky to make important decisions based on highly uncertain probabilities. Decision makers can instead request additional evidence in order to obtain more conclusive beliefs that are less affected by uncertainty.

2.2 Flexibility of Representation

There can be multiple equivalent syntactic representations of subjective opinions. The traditional opinion expression is a composite function consisting of belief masses, uncertainty mass and base rates, which are described separately below. An opinion applies to a variable which takes its values from a domain (i.e. from a state space). An opinion defines a sub-additive belief mass distribution over the variable, meaning that the sum of belief masses can be less than one. Opinions can have an attribute that identifies the belief owner. An important property of opinions is that they are equivalent to Beta or Dirichlet probability density functions (PDFs) under a specific mapping.
This equivalence is based on simple assumptions about the correspondence between evidence and belief mass distributions. More specifically, an infinite amount of evidence leaves no room for uncertainty, and produces an additive belief mass distribution (i.e. the sum is equal to one). A finite amount of evidence gives room for uncertainty and produces a sub-additive belief mass distribution (i.e. the sum is less than one). In practical situations the amount of evidence is always finite, so that practical opinions should always have sub-additive belief mass that is complemented by some uncertainty.

The basic features of subjective opinions are defined in the sections below.

2.3 Domains and Hyperdomains

In subjective logic a domain is a state space consisting of a set of values, which can also be called elements. Domains can be binary (with exactly two values) or n-ary (with n values) where n > 2. The values of the domain can e.g. be observable or hidden states, events, hypotheses or propositions, just like in traditional Bayesian modelling. The different values of a domain are assumed to be exclusive and exhaustive, which means that the world can only be in one state at any moment in time, and that all possible states are included in the domain. The simplest domain is binary, e.g. denoted X = {x, x̄} where x̄ is the complement of x, as illustrated in Figure 2.1.

Fig. 2.1 Binary domain

Binary domains are typically used when modelling situations that have only two alternatives, such as when modelling a switch that can be either on or off. When more than two alternatives are possible, the model requires a domain larger than binary. An n-ary domain specifies three or more different exhaustive and exclusive values, where the example quaternary domain Y = {y1, y2, y3, y4} is illustrated in Figure 2.2.

Fig. 2.2 Example quaternary domain

Domains are typically specified to reflect realistic situations for the purpose of being practically analysed in some way. The values of an n-ary domain are singletons, i.e. they are considered to represent a single possible state or event. It is possible to combine singletons into composite values, as explained below.

Assume a ternary domain X = {x1, x2, x3}. The hyperdomain of X is the reduced powerset denoted R(X), as illustrated in Figure 2.3, where the solid circles denoted x1, x2 and x3 represent singleton values, and the dotted oval shapes denoted (x1 ∪ x2), (x1 ∪ x3) and (x2 ∪ x3) represent composite values.

Fig. 2.3 Example hyperdomain

Definition 2.1 (Hyperdomain). Let X be a domain and let P(X) denote the powerset of X. The powerset contains all subsets of X, including the empty set ∅ and the domain X itself. The hyperdomain, denoted R(X), is the reduced powerset of X, i.e. the powerset excluding the empty set ∅ and the domain X. The hyperdomain is expressed as:

   Hyperdomain:   R(X) = P(X) \ {X, ∅}.   (2.1)

A composite value x ∈ R(X) \ X is the union of a set of singleton values from X. The interpretation of a composite value being TRUE is that one and only one of the constituent singletons is TRUE, and that it is unspecified which singleton is TRUE in particular.
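As a concrete illustration of Definition 2.1, the following small Python sketch (illustrative only; the function name and the chosen ordering are assumptions, not the book's) builds a hyperdomain as the powerset minus the empty set and the domain itself.

```python
# Illustrative sketch of Definition 2.1: R(X) = P(X) \ {X, emptyset}.
from itertools import combinations

def hyperdomain(domain):
    """Return the hyperdomain R(X) as a list of frozensets, grouped by cardinality."""
    values = sorted(domain)
    reduced_powerset = []
    for size in range(1, len(values)):           # exclude |x| = 0 and |x| = k
        for subset in combinations(values, size):
            reduced_powerset.append(frozenset(subset))
    return reduced_powerset

X = {"x1", "x2", "x3"}
RX = hyperdomain(X)
print(len(RX))    # 6, i.e. 2**3 - 2 elements for a ternary domain
print(RX)         # the singletons x1, x2, x3 and the composites {x1,x2}, {x1,x3}, {x2,x3}
```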
Singletons represent real possible states in a situation to be analysed. A composite value, on the other hand, does not reflect a specific state in the real world, because otherwise we would have to assume that the world can be in multiple different states at the same time, which contradicts the assumption behind the original domain. Composites are only used as a synthetic artifact for allowing belief mass to express that one of multiple singletons is true, but not which singleton in particular is true.

The property that all proper subsets of X are elements of R(X), but not {X} nor {∅}, is in line with the hyper-Dirichlet model [31]. The cardinality of the hyperdomain is κ = |R(X)| = 2^k − 2.

Indexes can be used to identify specific values in a hyperdomain, and a natural question is how these elements should be indexed. One simple indexing method is to index each composite element as a function of the singleton elements that it contains, as illustrated in Figure 2.3. While this is a very explicit indexing method, it can be complex to use in mathematical expressions. A more compact indexing method is to use continuous indexing, where indexes in the range [1, k] identify singleton values in X, and indexes in the range [k + 1, κ] identify composites. The values contained in the hyperdomain R(X) are thus the singletons of X with index in the range [1, k], as well as the composites with index in the range [k + 1, κ]. This type of indexing is illustrated in Figure 2.4, which is equivalent to the indexing method illustrated in Figure 2.3.

Fig. 2.4 Example of continuous indexing of composite elements in hyperdomain

Let us explain the continuous indexing method below. Assume X to be a domain of cardinality k, and consider how to index the elements of the hyperdomain R(X) of cardinality κ. It is practical to define the first k elements of R(X) as having the same index as the corresponding singletons of X. The remaining elements of R(X) can be indexed in a simple and intuitive way. The elements of R(X) can be grouped in cardinality classes according to the number of singletons from X that they contain. Let j denote the number of singletons in the elements of a specific cardinality class, and call it ‘cardinality class j’. By definition then, all elements belonging to cardinality class j have cardinality j. The actual number of elements belonging to each cardinality class is determined by the Choose Function C(k, j), which determines the number of ways that j out of the k singletons can be chosen. The Choose Function, equivalent to the binomial coefficient, is defined as:

   C(k, j) = k! / ((k − j)! j!) .   (2.2)

Within a given hyperdomain, each element can be indexed according to the order of the lowest-indexed singletons from X that it contains. As an example, assume a quaternary domain X of cardinality k = 4, similar to the domain illustrated in Figure 2.2. Let us consider the specific composite value xm = {x1, x2, x4} ∈ R(X). The fact that xm contains 3 singletons identifies it as an element of cardinality class 3. The first two singletons x1 and x2 have the lowest indexes that are possible to select, but the third singleton x4 has the second-lowest index that is possible to select. This particular element must therefore be assigned the second relative index in cardinality class 3. However, its absolute index depends on the number of elements in the inferior cardinality classes. Table 2.1 specifies the number of elements of cardinality classes 1 to 3, as determined by Eq.(2.2).

Table 2.1 Number of elements per cardinality class

   Cardinality class:                               1   2   3
   Number of elements in each cardinality class:    4   6   4

In this example, cardinality class 1 has 4 elements and cardinality class 2 has 6 elements, which together makes 10 elements. Because xm represents the 2nd relative index in cardinality class 3, its absolute index is 10 + 2 = 12. The solution is thus m = 12, so that x12 = {x1, x2, x4}. To complete the example, Table 2.2 specifies the index and cardinality class of all the elements of R(X) according to this scheme.

Table 2.2 Index and cardinality class of elements of R(X) in case |X| = 4 (each indexed element is listed with the singletons it contains)

   Cardinality class 1:   x1 = {x1},  x2 = {x2},  x3 = {x3},  x4 = {x4}
   Cardinality class 2:   x5 = {x1, x2},  x6 = {x1, x3},  x7 = {x1, x4},  x8 = {x2, x3},  x9 = {x2, x4},  x10 = {x3, x4}
   Cardinality class 3:   x11 = {x1, x2, x3},  x12 = {x1, x2, x4},  x13 = {x1, x3, x4},  x14 = {x2, x3, x4}

Elements of cardinality class 1 are the original singletons from X. The domain X = {x1, x2, x3, x4} does not figure as an element of R(X) in Table 2.2, because excluding X is precisely what makes R(X) a reduced powerset and a hyperdomain.

An element of R(X) that contains multiple singletons is called a composite element, because it represents the combination of multiple singletons. In other words, when an element is a non-singleton, or equivalently is not an element of cardinality class 1, then it is a composite element in R(X). This is formally defined below.

Definition 2.2 (Composite Set). Let X be a domain of cardinality k, where R(X) is its hyperdomain of cardinality κ. Every proper subset x ⊂ X of cardinality |x| ≥ 2 is a composite element. The set of composite elements is the composite set, denoted C(X) and defined as:

   Composite set:   C(X) = {x ⊂ X where |x| ≥ 2} .   (2.3)

It is straightforward to prove the following equality:

   C(X) = R(X) \ X .   (2.4)

The cardinality of the composite set C(X) is expressed as:

   |C(X)| = κ − k .   (2.5)

Section 4.1.2 describes the degree of vagueness in an opinion as a function of the belief mass assigned to composite elements, i.e. to elements in C(X).
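The continuous indexing scheme can be reproduced mechanically. The following Python sketch (illustrative only, not the book's algorithm) enumerates the cardinality classes in order and recovers both the index of x12 = {x1, x2, x4} and the content of Table 2.2.

```python
# Illustrative sketch (not the book's algorithm) of the continuous indexing scheme:
# singletons keep indexes 1..k, composites get indexes k+1..kappa, grouped by
# cardinality class and ordered by their lowest-indexed singletons.
from itertools import combinations

def continuous_index(k):
    """Map element index -> tuple of singleton indexes, for a domain {x1, ..., xk}."""
    index = {}
    i = 1
    for card in range(1, k):                      # cardinality classes 1 .. k-1
        for subset in combinations(range(1, k + 1), card):
            index[i] = subset
            i += 1
    return index

idx = continuous_index(4)
print(idx[12])    # -> (1, 2, 4), i.e. x12 = {x1, x2, x4}, as in the example above
print(len(idx))   # -> 14 elements, i.e. kappa = 2**4 - 2, consistent with Table 2.2
```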
2.4 Random Variables and Hypervariables

Let X denote a binary or an n-ary domain of cardinality k. Then we can define X to be a random variable which takes its values from X. For example, if X is a ternary domain, then ‘X = x3’ means that the random variable X has value x3, which is typically interpreted in the sense that x3 is TRUE. Note our convention that domains are denoted with blackboard letters such as X, Y or Z, and that variables are denoted with italic capital letters such as X, Y or Z.

Let X be a ternary domain and consider X’s hyperdomain denoted R(X). The concept of hyperdomain calls for the possibility of assigning values of the hyperdomain to a variable. For example, it must be possible for a variable to take the composite value {x1, x3} ∈ R(X). This means that the real TRUE value is either x1 or x3, but that it is unspecified which value in particular it is. Variables that take their values from a hyperdomain are naturally called hypervariables, as defined below.

Definition 2.3 (Hypervariable). Let X be a domain with corresponding hyperdomain R(X). A variable X that takes its values from R(X) is a hypervariable.

A hypervariable X can be constrained to a random variable by restricting it to only take values from the domain X. For simplicity of notation we use the same notation for a random variable and the corresponding hypervariable, so that e.g. X can denote both a random variable and a hypervariable.
When either meaning can be assumed, we simply use the term variable.

Now let X be a variable which can take its values from the ternary domain WEATHER = {rainy, sunny, overcast}, which contains the three possible weather types specified as {rainy}, {sunny} and {overcast}. The hyperdomain denoted R(WEATHER) contains the singletons of the original WEATHER domain, as well as all possible composites such as {rainy, overcast}. Remember that values in a domain are exclusive, meaning that it is assumed that only one value is TRUE at any one time. In case a composite value is considered TRUE, it must be interpreted in the sense that in reality only one of the contained singleton values is TRUE, but that it is unknown which value in particular it is. So when a variable takes a composite value such as X = {rainy, sunny}, it means that the actual weather is either rainy or sunny, but not both at the same time. If the analyst wants to include the realistic possibility that there can be rain and sunshine simultaneously, then the domain would need to be extended with a corresponding singleton value such as {rainy&sunny}. It is thus a question of interpretation how the analyst wants to separate between different types of weather, and thereby define the relevant domain.

2.5 Belief Mass Distribution and Uncertainty Mass

Subjective opinions are based on belief mass distributions over a domain X or over a hyperdomain R(X). In the case of multinomial opinions the belief mass distribution is restricted to the domain X. In the case of hyper-opinions the belief mass distribution applies to the hyperdomain R(X). Belief mass assigned to a singleton value xi ∈ X expresses support for xi being TRUE. Belief mass assigned to a composite value xj ∈ R(X) expresses support for one of the singleton values contained in xj being TRUE, but says nothing about which of them in particular is TRUE.

Belief mass distributions are sub-additive, meaning that the sum of belief masses can be less than one. The sub-additivity of belief mass distributions is complemented by the uncertainty mass denoted uX. In general, the belief mass distribution bX assigns belief masses to possible values of the variable X ∈ R(X) as a function of the support for those values. The uncertainty mass uX represents the lack of support for the variable X to have any specific value. As explained in Chapter 1, uncertainty mass can also be interpreted as second-order probability uncertainty, where probability represents first-order uncertainty. The sub-additivity of the belief mass distribution and the complement property of the uncertainty mass are expressed by Eq.(2.6) and Eq.(2.7) below.

Definition 2.4 (Belief Mass Distribution). Let X be a domain with corresponding hyperdomain R(X), and let X be a variable over those domains. A belief mass distribution denoted bX assigns belief mass to possible values of the variable X. In the case of a random variable X ∈ X the belief mass distribution applies to the domain X, and in the case of a hypervariable X ∈ R(X) the belief mass distribution applies to the hyperdomain R(X). This is formally defined as follows.

   Belief mass distribution over the domain X:   bX : X → [0, 1],
   with the multinomial additivity requirement:   uX + ∑_{x∈X} bX(x) = 1.   (2.6)

   Belief mass distribution over the hyperdomain R(X):   bX : R(X) → [0, 1],
   with the hypernomial additivity requirement:   uX + ∑_{x∈R(X)} bX(x) = 1.   (2.7)
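The following minimal sketch (illustrative only; the data-structure choices are assumptions, not the book's notation) shows the additivity requirement of Eq.(2.7) for a hyper-opinion: the uncertainty mass is whatever mass is left unassigned by a sub-additive belief mass distribution.

```python
# Illustrative sketch of Eq. (2.7): u_X complements a sub-additive belief mass
# distribution b_X over the hyperdomain R(X). Representation choices are made up.

def uncertainty_mass(belief_mass):
    """Return u_X = 1 - sum of belief masses, for a sub-additive distribution."""
    total = sum(belief_mass.values())
    if total > 1.0:
        raise ValueError("belief mass distribution must be sub-additive")
    return 1.0 - total

# Ternary domain X = {x1, x2, x3}: belief mass on two singletons and one composite.
b_X = {
    frozenset({"x1"}): 0.5,
    frozenset({"x2"}): 0.25,
    frozenset({"x1", "x3"}): 0.125,   # composite: "x1 or x3, unspecified which"
}
u_X = uncertainty_mass(b_X)
print(u_X)    # -> 0.125, so that u_X + sum of belief masses = 1
```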
There exists a direct correspondence between the bba (basic belief assignment) distribution m used for representing belief masses in traditional belief theory [81], and the bX and uX functions used in subjective logic. The correspondence is defined such that m(X) = uX and m(x) = bX(x), ∀x ∈ R(X). Here m(x) denotes the belief mass assigned to x. The difference is thus that subjective logic considers uncertainty mass to be different in nature from belief mass, where uncertainty mass exists a priori and belief mass is created a posteriori as a function of collected evidence, according to the Beta and Dirichlet models. One advantage of representing opinions with separate belief and uncertainty mass is that it makes the opinion model equivalent to the Beta and Dirichlet models [31, 53], as described in Chapter 3 below.

While belief theory offers a very general model for expressing beliefs, it is limited by not having any direct correspondence to classical models of statistics, except through a default pignistic probability based on default base rates. Because of its purely theoretical characteristics, default pignistic probability is not suitable for modelling realistic non-default situations. This is because traditional belief theory does not specify base rates. Without base rates, however, belief theory does not provide an adequate model for pignistic probability. In subjective logic the notion of projected probability uses base rates and is consistent with the Dirichlet model, and thereby with traditional statistical theory.

2.6 Base Rate Distributions

The concept of base rates is central in the theory of probability. Base rates are for example needed for default reasoning, for abductive reasoning and for Bayesian updating. This section describes the concept of base rate distributions over variables, and shows how they can be used for probability projections.

Given a domain X of cardinality k, the default base rate of each singleton in the domain is 1/k, and the default base rate of a subset consisting of n singletons is n/k. In other words, the default base rate of a composite value is equal to the number of singletons in the composite value relative to the cardinality of the whole domain. The default base rate is sometimes called ‘relative atomicity’. For each composite value there exist default relative base rates with respect to every other fully or partly overlapping value x ⊂ X. Remember that a value x ⊂ X is also a value x ∈ R(X).

However, in practical situations it is often possible and useful to apply base rates that are different from the default base rates. For example, when considering the base rate of a particular infectious disease in a specific population, the domain can be defined as a set of two values, {‘infected’, ‘not infected’}, with respect to that disease. Assuming that an unknown person enters a medical clinic, the physician would a priori be ignorant about whether that person is infected or not, before having assessed any evidence. This ignorance should intuitively be expressed as uncertainty mass. The probability projection of a vacuous opinion using a default base rate of 0.5 would dictate an a priori probability of 0.5 that the person has the disease. However, the base rate of a disease is normally much lower than 0.5, and can typically be determined from relevant statistics for a given population.
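Before turning to how such base rates are obtained in practice, a small numeric sketch may help. It assumes the standard subjective-logic projection of a binomial opinion, P(x) = b(x) + a(x)·u, which is only introduced formally in Chapter 3; the numbers are made up.

```python
# Illustrative sketch of the medical example above, assuming the standard
# subjective-logic projection P(x) = b(x) + a(x)*u (formally defined in Chapter 3).

def projected_probability(belief, uncertainty, base_rate):
    return belief + base_rate * uncertainty

# A vacuous opinion about 'infected' (no evidence at all): b = 0, u = 1.
print(projected_probability(0.0, 1.0, 0.5))      # default base rate    -> 0.5
print(projected_probability(0.0, 1.0, 0.001))    # realistic base rate  -> 0.001
```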
Typically, data is collected from hospitals, clinics and other sources where people diagnosed with a specific disease are treated. The amount of data that is required to calculate a reliable base rate for the disease can be determined by guidelines, statistical analysis, and expert opinion about whether the data truly reflects the actual number of infections, which is itself a subjective assessment. After the guidelines, analysis and opinion are all satisfied, the base rate can be determined from the data, and can then be used together with medical tests to provide a better indication of the likelihood that specific patients have contracted the disease [32].

Integrating base rates with belief mass distributions enables a better and more intuitive interpretation of opinions, facilitates probability projections from opinions, and provides a basis for conditional reasoning. When using base rates for probability projections, the contributing belief mass assigned to values in the domain and the contributing uncertainty mass are weighted as a function of the base rate distribution. Base rates are expressed as a base rate distribution denoted aX, so that aX(x) represents the base rate of the element x ∈ X. The base rate distribution is formally defined below.

Definition 2.5 (Base Rate Distribution). Let X be a domain, and let X be a random variable in X. The base rate distribution aX assigns base rate probability to possible values of X ∈ X, and is an additive probability distribution, formally expressed as:

   Base rate distribution:   aX : X → [0, 1],
   with the additivity requirement:   ∑_{x∈X} aX(x) = 1.   (2.8)

The base rate distribution is normally assumed to be common (i.e. not subjective), because it is based on general background information. So although different analysts can have different opinions about the same variable, they normally share the same base rate distribution over the domain of a particular situation. However, it is obvious that two different observers can also assign different base rate distributions to the same random variable in case they do not share the same background information. Base rates can thus be partially objective and partially subjective. This flexibility allows two different analysts to assign different belief masses as well as different base rates to the same variable, which naturally reflects different views, analyses and interpretations of the same situation by different observers.

Events that can be repeated many times are typically frequentist in nature, meaning that base rates for such events can typically be derived from statistical observations. For events that can only happen once, the analyst must often extract base rates from subjective intuition, or from analysing the nature of the phenomenon at hand and any other relevant evidence. However, in many cases this can lead to considerable vagueness about base rates, and when nothing else is known, it is possible to use the default base rate distribution for a random variable. More specifically, when there are k singletons in the domain, the default base rate of each singleton is 1/k.

The difference between the concepts of subjective and frequentist probabilities is that the former can be defined as subjective betting odds and the latter as the relative frequency of empirically observed data, where the subjective probability normally converges toward the frequentist probability when empirical data becomes available [12].
The concepts of subjective and empirical base rates can be interpreted in a similar manner: they also converge and merge into a single base rate when empirical data about the population in question is available.

The usefulness of base rate distributions is that they make it possible to derive projected probability distributions from opinions. Projection from opinion space to probability space removes uncertainty mass and base rates to produce a probability distribution over a domain. The projected probability distribution depends partially on belief mass and partially on uncertainty mass, where the contribution from uncertainty is weighted as a function of the base rate. It can be useful to project probability for composite values in a hyperdomain, and for that purpose it is necessary to first compute base rates for such values. The computation of base rates for elements in a hyperdomain is defined below.

Definition 2.6 (Base Rate for Elements in a Hyperdomain). Let X be a domain with corresponding hyperdomain R(X), and let X be a variable over those domains. Assume the base rate distribution a_X over the domain X according to Def.2.5. The base rate a_X(x) for a composite element x ∈ R(X) can be computed as follows:

Base rate over composite elements:  a_X(x_i) = ∑_{x_j∈X, x_j⊆x_i} a_X(x_j) ,  ∀x_i ∈ R(X) .   (2.9)
⊔ ⊓

Eq.(2.9) says that the base rate on a composite element x_i ∈ R(X) is the sum of the base rates on the singletons x_j contained in x_i. Note that this is not a base rate distribution over R(X), because base rates on singletons would be counted multiple times, so it would be super-additive.

Belief masses can be assigned to elements in the hyperdomain that are fully or partially overlapping subsets of the domain. In order to take such belief masses into account for probability projections it is necessary to also derive relative base rates for these elements as a function of their degree of overlap with each other. This is defined below.

Definition 2.7 (Relative Base Rates). Assume the domain X of cardinality k, and the corresponding hyperdomain R(X). Let X be a hypervariable over R(X). Assume that a base rate distribution a_X is defined over X according to Def.2.6. Then the base rate of an element x_i relative to an element x_j is expressed by the relative base rate a_X(x_i/x_j) defined below.

a_X(x_i/x_j) = a_X(x_i ∩ x_j) / a_X(x_j) ,  ∀ x_i, x_j ∈ R(X), where a_X(x_j) ≠ 0 .   (2.10)
⊔ ⊓

In the case when a_X(x_j) = 0, then a_X(x_i/x_j) = 0. From a syntactic point of view, base rates are simply probabilities. From a semantic point of view, base rates are non-informative prior probabilities estimated as a function of general background information for a class of variables. The term 'non-informative' is used to express that no specific evidence is available for determining the probability of a specific event other than the general background information for that class of events. Base rates make it possible to define a bijective mapping between opinions and Dirichlet probability density functions, and are used for probability projections. The base rate concepts defined in this chapter are used for various computations with opinions, as described in the next chapters.

2.7 Probability Distributions

A probability distribution assigns a probability to each value of a random variable. In case it distributes probability over a single random variable, it is a univariate probability distribution.
A probability distribution can also be multivariate, in which case it distributes joint probability over two or more random variables taking on various combinations of values. With a probability distribution denoted p_X, the probability p_X(x) represents the probability of the value x ∈ X. Probability distributions are formally defined below.

Definition 2.8 (Probability Distribution). Let X be a domain with corresponding hyperdomain R(X), and let X denote a variable in X or in R(X). The standard probability distribution p_X assigns probability to possible values of X ∈ X. The hyper-probability distribution p^H_X assigns probability to possible values of X ∈ R(X). These distributions are formally defined below:

Probability distribution:  p_X : X → [0, 1], with the additivity requirement: ∑_{x∈X} p_X(x) = 1 .   (2.11)

Hyper-probability distribution:  p^H_X : R(X) → [0, 1], with the additivity requirement: ∑_{x∈R(X)} p^H_X(x) = 1 .   (2.12)
⊔ ⊓

The hyper-probability distribution is not meaningful in the traditional sense, because hyper-probability is not restricted to exclusive values in the domain X. The traditional assumption behind frequentist or subjective probability is that it is additive over values x ∈ X that in turn represent exclusive real events. Probability distributed over the hyperdomain R(X) is still additive, but the values x ∈ R(X) no longer represent exclusive real events, because they can be composite and (partially) overlapping. However, a hyper-probability distribution can be projected onto a traditional probability distribution according to Eq.(2.13), which uses the concept of relative base rates from Eq.(2.10).

p_X(x) = ∑_{x_j∈R(X)} a_X(x/x_j) p^H_X(x_j) ,  ∀x ∈ X.   (2.13)

Hyper-probability distributions are used when describing the Dirichlet model over hyperdomains in Section 3.5.3.

Chapter 3
Opinion Representations

Subjective opinions express beliefs about the truth of propositions under degrees of uncertainty, and can indicate ownership of an opinion whenever required. This chapter presents the various representations and notations for subjective opinions.

3.1 Belief and Trust Relationships

In general the notation ω^A_X is used to denote opinions in subjective logic, where e.g. the subscript X indicates the target variable or proposition to which the opinion applies, and e.g. the superscript A indicates the subject agent who holds the opinion, i.e. the belief owner. Superscripts can be omitted when it is implicit or irrelevant who the belief owner agent is.

The principle that a subject agent A has an opinion about a target variable X means that there is a directed belief relationship from A to X, formally denoted [A, X]. Similarly, the principle that an agent A trusts an entity E means that there is a directed trust relationship from A to E, formally denoted [A, E]. These relationships can be considered as directed edges in a graph. This convention is summarised in Table 3.1. See also Table 13.1 on p.244 which in addition includes the concept of referral-trust relationship.

Table 3.1 Notation for belief and trust relationships
Relationship type | Formal notation | Graph edge notation | Interpretation
Belief | [A, X] | A −→ X | Agent A has an opinion about variable X
Trust  | [A, E] | A −→ E | Agent A has a trust opinion about entity E

To believe and to trust are very similar concepts, the main difference being that trust assumes dependence and risk, which belief does not necessarily assume.
So by 19 20 3 Opinion Representations abstracting away the dependence and risk aspects of trust relationships, subjective logic uses the same formal representation for both belief opinions and trust opinions. Trust opinions are described in detail in Chapter 13. 3.2 Opinion Classes Opinions apply to variables that take their values from domains. A domain is a state space which consist of values that are assumed to be exhaustive and mutually disjoint. Different opinion owners are assumed to have a common semantic interpretation of the elements in the same domain, whether they represent states, events, hypotheses or propositions. The opinion owner (subject) and the variable (object) are attributes of an opinion. The opinion itself is a composite function ωXA = (bbX , uX , a X ) consisting of the belief mass distribution b X , the uncertainty mass uX and the base rate distribution a X . A few specific classes of opinions have been defined. In case the domain X is binary, so is the variable X, and the opinion is binomial. In case the domain is larger than binary and the variable is a random variable X ∈ X, then the opinion is multinomial. In case the domain is larger than binary and the variable is a hypervariable X ∈ R(X)), then the opinion is hypernomial. These are the 3 main opinion classes. Opinions can also be classified according to levels of uncertainty and belief mass assignment. In case uX = 1 the opinion is vacuous, in case 0 < uX < 1 the opinion is relatively uncertain, and in case uX = 0 the opinion is dogmatic. When a single value is considered TRUE by assigning belief mass 1 to that value, the opinion is absolute. By considering the 3 main opinion classes depending on the domain, and the 4 subclasses depending on uncertainty mass and belief mass assignment, we get 12 different opinion classes as listed in Table 3.2. These are further described in the next section. The 12 entries in Table 3.2 also mention the equivalent probability representation of opinions, e.g. as Beta PDF, Dirichlet PDF or as a probability distribution over the variable X. This equivalence is explained in more detail below. The intuition behind using the term ‘dogmatic’ is that a totally certain opinion (i.e. where u = 0) about a real-world proposition can be seen as an extreme opinion. From a philosophical viewpoint, no one can ever be totally certain about anything in this world. So when the formalism allows explicit expression of uncertainty, as opinions do, it is extreme, and even unrealistic, to express a dogmatic opinion. The rationale for this interpretation is that a dogmatic opinion has an equivalent Dirichlet probability density function in the form of a Dirac delta function which is infinitely high and infinitesimally thin. It would require an infinite amount of evidence to produce a Dirichlet PDF equal to a Dirac delta function, which in practice is impossible, and thereby can only be considered in case of idealistic assumptions. This does not mean that traditional probabilities should be interpreted as dogmatic, because the probability model does not include uncertainty in the way opinions do. Instead it can implicitly be assumed that there is some uncertainty associated with 3.2 Opinion Classes 21 Table 3.2 Opinion classes and their equivalent probabilistic or logic representations Class: Domain: Variable: Binomial Multinomial Hyper X = {x, x}, |X| = 2 X, |X| > 2 R(X), |X| > 2 Binomial variable X = x Random variable X ∈ X Hypervariable X ∈ R(X) Vacuous Vacuous (uX = 1) binomial opinion Proba. 
equiv: Uniform Beta PDF on p(x) Vacuous multinomial opinion Prior PDF on p X Vacuous hyper-opinion Prior PDF on p X Uncertain (0 < uX < 1) Proba. equiv: Uncertain binomial opinion Beta PDF on p(x) Uncertain multinomial opinion Dirichlet PDF on p X Uncertain hyper-opinion Dirichlet HPDF on p H X Dogmatic (uX = 0) Proba. equiv: Dogmatic binomial opinion Probability on x Absolute (bbX (x) = 1) Logic equiv: Absolute binomial opinion Boolean TRUE/FALSE Dogmatic Dogmatic multinomial opinion hyper-opinion Proba. distribution over X Proba.distrib. over R(X) Absolute multinomial opinion TRUE element in X Absolute hyper-opinion TRUE element in R(X) every probability estimate, but that it is invisible, because uncertainty is not included in the model. One advantage of subjective logic is precisely that it allows explicit expression of uncertainty. A vacuous opinion represents belief about random variable in case the observer or analyst has no specific evidence about the possible values of a random variable expect for the base rate distribution which represents general background information. It is thus always possible for an analyst to produce more or less certain opinions that genuinely represents an analyst’s belief, so analysts must never invent beliefs. In case they are ignorant they can simply produce a vacuous or highly uncertain opinion. The same can not be said when using probabilities, where analysts sometimes have to ‘pull probabilities out of thin air’ e.g. in case a specific input probability parameter to a model is needed in order to execute an analysis with the model. Each opinion class from Table 3.2 has an equivalence mapping to a type of Dirichlet or a Beta PDF (probability density function) under a specific mapping. This mapping then gives subjective opinions a firm basis in the domain of classical probability and statistics theory. The different opinions classes are described in more detail in the following sections. 22 3 Opinion Representations 3.3 Binomial Opinions 3.3.1 Binomial Opinion Representation A binary domain consists of only two values, and the variable is typically fixed to one of the two values. Formally, let a binary domain be specified as X = {x, x}, then a binomial random variable X ∈ X can be fixed to X = x. Opinions on a binomial variable are called binomial opinions, and a special notation is used for their mathematical representation. Note that a general n-ary domain X can be considered binary when seen as a binary partition consisting of a proper subset x ⊂ X and its complement x, so that the corresponding multinomial random variable becomes a binomial random variable under the same partition. Definition 3.1 (Binomial Opinion). Let X = {x, x} be a binary domain, and let X be a binomial random variable in X. A binomial opinion about the truth of state x is the ordered quadruplet ωx = (bx , dx , ux , ax ), where the additivity requirement: bx + dx + ux = 1, (3.1) is satisfied, and where the respective parameters are defined as: bx : belief mass in support of x being TRUE (i.e. X = x), dx : disbelief mass in support of x being FALSE (i.e. X = x ), ux : uncertainty mass, i.e. the amount of uncommitted belief/disbelief mass, ax : base rate, i.e. a priori probability of x without any committed belief mass. ⊔ ⊓ The characteristics of various binomial opinion classes are listed below. 
A binomial opinion: where bx = 1 is an absolute opinion equivalent to Boolean TRUE, where dx = 1 is an absolute opinion equivalent to Boolean FALSE, where ux = 0 is a dogmatic opinion ω x , and a traditional probability, where 0 < ux < 1 is an opinion with some uncertainty, and ◦ where ux = 1 is a vacuous opinion ω x , i.e. with zero belief mass. The projected probability of a binomial opinion on proposition x is defined by Eq.(3.2) below. Projected probability of binomial opinions: Px = bx + ax ux (3.2) Binomial opinions have variance expressed as: Variance of binomial opinions: Varx = Px (1 − Px )ux , W + ux (3.3) where W denotes non-informative prior weight, which must be set to W = 2. The opinion variance is derived from the variance of the Beta PDF, as defined by Eq.(3.10) below. 3.3 Binomial Opinions 23 Barycentric coordinate systems can be used to visualise opinions. In a barycentric coordinate system the location of a point is specified as the center of mass, or barycenter, of masses placed at its vertices [69]. A barycentric coordinate system with n axes is represented on a simplex with n vertices which has dimensionality (n− 1). A triangle is a 2D simplex which has 3 vertices and is thus a barycentric system with 3 axes. A binomial opinion can be visualised as a point in a barycentric coordinate system of 3 axes represented by a 2D simplex which is in fact an equal sided triangle, as illustrated in Figure 3.1. Here, the belief, disbelief and uncertainty axes go perpendicularly from each edge towards the respective opposite vertices denoted x, x and uncertainty. The base rate ax is a point on the base line, and the projected probability Px is determined by projecting the opinion point to the base line in parallel with the base rate director. The binomial opinion ωx = (0.40, 0.20, 0.40, 0.90) with probability projection Px = 0.76 is shown as an example. u vertex (uncertainty) bx Zx dx uX x vertex (disbelief) x vertex Px ax (belief) Fig. 3.1 Barycentric triangle visualisation of binomial opinion In case the opinion point is located at the left or right vertex of the triangle, i.e. with dx = 1 or bx = 1 (and ux = 0), then the opinion is equivalent to Boolean TRUE or FALSE, in which case subjective logic becomes equivalent to binary logic. In case the opinion point is located on the baseline of the triangle, i.e. with ux = 0, then the opinion is equivalent to a traditional probability, in which case subjective logic becomes equivalent to probability calculus, or more specifically to probabilistic logic. In case the opinion point is located at one of the three vertices in the triangle, i.e. with b = 1, d = 1 or u = 1, the reasoning with such opinions becomes a form of three-valued logic that is comparable with Kleene logic [24]. However, the threevalued arguments of Kleene logic do not contain base rates, so that probability projections can not be derived from Kleene logic arguments. See Section 5.1.4 for a more detailed explanation. 24 3 Opinion Representations 3.3.2 The Beta Binomial Model A binomial opinion is equivalent to a Beta PDF (probability density function) under a specific bijective mapping. In general a probability density function denoted PDF(p(x)) is defined as: PDF(p(x)) : [0, 1] → R≥0 , where Z 1 PDF(p(x)) dp(x) = 1 . (3.4) 0 R≥0 is the set of positive real numbers including 0, which can also be denoted as [0, ∞>. The variable of the PDF is thus the continuous probability p(x) ∈ [0, 1], and the image of the PDF is the density PDF(p(x)) ∈ R≥0 . 
When considering the probability function p(x) : X → [0, 1], the image of p(x) becomes the variable of PDF(p(x)). In this way the functions p(x) and PDF(p(x)) are chained functions. When there is uncertainty about the probability p(x), the density expresses where along the continuous interval [0, 1] the probability p(x) is likely to be. At positions on the p-axis where the density is high, the corresponding probability is relatively likely (2nd-order probability), and where the density is low the corresponding probability is relatively unlikely (2nd-order probability). A 'certain probability' means that there is high probability density at a specific position on the p-axis. A totally 'uncertain probability' means that any probability is equally likely (2nd-order probability), so the density is spread out uniformly over the whole interval [0, 1]. The traditional 1st-order interpretation of probability as likelihood of events is thus complemented by probability density, which can be interpreted as 2nd-order probability. This interpretation is the basis for mapping (high) probability density in Beta PDFs into (high) belief mass in opinions, through the bijective mapping described in Section 3.3.3. As a consequence, flat probability density in Beta PDFs is mapped into uncertainty mass in opinions.

The Beta PDF is a specific type of probability density function denoted Beta(α, β), with variable p(x) and the two strength parameters α and β. The Beta PDF is defined below.

Definition 3.2 (Beta Probability Density Function). Assume the binary domain X = {x, x} and the random variable X ∈ X. Let α represent the evidence about X = x, and let β represent the evidence about X = x. Let p denote the continuous probability function p : X → [0, 1] where p(x) + p(x) = 1. With p(x) as variable, the Beta probability density function Beta(α, β) is the function expressed as:

Beta(α, β) : [0, 1] → R≥0 , where α > 0, β > 0 ,   (3.5)

Beta(α, β) = (Γ(α+β) / (Γ(α)Γ(β))) p(x)^(α−1) (1 − p(x))^(β−1) ,   (3.6)

with the restrictions that p(x) ≠ 0 if α < 1, and p(x) ≠ 1 if β < 1.
⊔ ⊓

It can be shown that the additivity requirement ∫₀¹ Beta(α, β) dp(x) = 1 holds, which in fact is a general property of the Beta PDF.

Let r_x denote the number of observations of x, and let s_x denote the number of observations of x. The α and β parameters can be expressed as a function of the observations (r_x, s_x) in addition to the base rate a_x:

α = r_x + a_x W ,
β = s_x + (1 − a_x)W .   (3.7)

This leads to the evidence notation of the Beta PDF, denoted Betae(r_x, s_x, a_x), which is expressed as:

Betae(r_x, s_x, a_x) = (Γ(r_x + s_x + W) / (Γ(r_x + a_x W) Γ(s_x + (1 − a_x)W))) p(x)^(r_x + a_x W − 1) (1 − p(x))^(s_x + (1 − a_x)W − 1) ,   (3.8)

where (r_x + a_x W) > 0 and (s_x + (1 − a_x)W) > 0, with the restrictions: p(x) ≠ 0 if (r_x + a_x W) < 1, and p(x) ≠ 1 if (s_x + (1 − a_x)W) < 1.

The non-informative prior weight denoted W is normally set to W = 2, which ensures that the non-informative prior (i.e. when r_x = s_x = 0) Beta PDF with default base rate a_x = 0.5 is the uniform PDF. The expected probability E_x as a function of the Beta PDF parameters is defined by Eq.(3.9):

E_x = α / (α + β) = (r_x + a_x W) / (r_x + s_x + W) .   (3.9)

The variance Var_x of the Beta PDF is defined by Eq.(3.10):

Var_x = αβ / ((α + β)² (α + β + 1))
      = (r_x + a_x W)(s_x + (1 − a_x)W) / ((r_x + s_x + W)² (r_x + s_x + W + 1))
      = (b_x + a_x u_x)(d_x + (1 − a_x)u_x) u_x / (W + u_x)
      = P_x (1 − P_x) u_x / (W + u_x) .   (3.10)

The latter two equality expressions for the variance in Eq.(3.10) emerge from the mapping of Definition 3.3 below.
The variance of the Beta PDF measures how far the probability density is spread out over the interval [0, 1]. A variance of zero indicates that the probability density is concentrated in one point, which only happens for infinite α and/or β (or infinite rx and/or sx ). The uniform Beta PDF, which occurs when α = β = 1 (e.g. when rx = sx = 0, ax = 1/2 and W = 2), gives Varx = 1/12. 26 3 Opinion Representations The Beta PDF is important for subjective logic because it is possible to define a bijective mapping between the projected probability of a binomial opinion and the expected probability of a Beta PDF. This is described next. 3.3.3 Mapping between a Binomial Opinion and a Beta PDF The bijective mapping between a binomial opinion and a Beta PDF emerges from the intuitive requirement that Px = Ex , i.e. that the projected probability of a binomial opinions must be equal to the expected probability of a Beta PDF. This can be generalised to a mapping between multinomial opinions and Dirichlet PDFs, as well as between hyper-opinions and hyper-Dirichlet PDFs. The detailed description for determining the mapping is described in Section 3.4.5. The mapping from the parameters of a binomial opinion ωx = (bx , dx , ux , ax ) to the parameters of Betae (rx , sx , ax ) is defined below. Definition 3.3 (Mapping: Binomial Opinion ↔ Beta PDF). Let ωx = (bx , dx , ux , ax ) be a binomial opinion, and let p(x) be a probability distribution, both over the same binomial random variable X where it is assumed that X = x. Let Betae (rx , sx , ax ) be a Beta PDF over the probability variable p(x) defined as a function of rx , sx and ax according to Eq.(3.8). The opinion ωx and the Beta probability density function Betae (rx , sx , ax ) are equivalent through the following mapping: bx = dx = ux = rx W +rx +sx sx W +rx +sx W W +rx +sx For u 6= 0: rx = bx W ux dx W ⇔ sx = ux 1 = bx + dx + ux For ux = 0: rx = bx · ∞ sx = dx · ∞ 1 = bx + dx (3.11) ⊔ ⊓ A generalisation of this mapping is provided in Def.3.6 below. The default noninformative prior weight W is set to W = 2 because then it produces a uniform Beta PDF in the case of default base rate ax = 1/2. It can be seen from Eq.(3.11) that the vacuous binomial opinion ωx = (0, 0, 1, 21 ) corresponds to the uniform PDF Beta(1, 1). The example Beta((3.8, 1.2)) = Betae (2.0, 1.0, 0.9) is illustrated in Figure 3.2. Through the equivalence defined by Eq.(3.11) this Beta PDF is equivalent to the example opinion ωx = (0.4, 0.2, 0.4, 0.9) from Figure 3.1. In the example of Figure 3.2 where α = 3.8 and β = 1.2 the expected probability is Ex = 3.8/5.0 = 0.76 which is indicated with the vertical line. This expected probability is of course equal to the projected probability of Figure 3.1 because the Beta 3.4 Multinomial Opinions 27 3 Probability density Beta(p(x)) 2.5 2 1.5 1 0.5 0 0 0.2 0.4 0.6 0.8 1 Probability p(x) Fig. 3.2 Probability density function Beta((3.8, 1.2)) ≡ ωx = (0.4, 0.2, 0.4, 0.9) PDF is equivalent to the opinion through Eq.(3.11). The equivalence between binomial opinions and Beta PDFs is very powerful because subjective logic operators then can be applied to density functions and vice versa, and also because binomial opinions can be determined through statistical observations. Multinomial opinions described next are a generalisation of binomial opinions in the same way as Dirichlet PDFs are a generalisation of Beta PDFs. 
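The mapping and the projections can be checked numerically. The following Python sketch is illustrative only (the function names are assumptions, not part of the formalism); it reproduces the worked example ωx = (0.4, 0.2, 0.4, 0.9) ↔ Betae(2.0, 1.0, 0.9) ↔ Beta(3.8, 1.2) from Figures 3.1 and 3.2.

```python
# Minimal sketch of the binomial opinion <-> Beta mapping of Eq.(3.11),
# using the worked example from Figures 3.1 and 3.2.
W = 2  # non-informative prior weight

def opinion_to_evidence(b, d, u):
    """Map (b, d, u) to Beta evidence (r, s); assumes u > 0 (Eq.(3.11))."""
    return W * b / u, W * d / u

def evidence_to_beta(r, s, a):
    """Map evidence and base rate to Beta strength parameters (Eq.(3.7))."""
    return r + a * W, s + (1 - a) * W

b, d, u, a = 0.4, 0.2, 0.4, 0.9          # the example opinion
r, s = opinion_to_evidence(b, d, u)       # -> (2.0, 1.0)
alpha, beta = evidence_to_beta(r, s, a)   # -> (3.8, 1.2)
P = b + a * u                             # projected probability, Eq.(3.2) -> 0.76
E = alpha / (alpha + beta)                # expected probability, Eq.(3.9)  -> 0.76
assert abs(P - E) < 1e-12
```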
3.4 Multinomial Opinions 3.4.1 The Multinomial Opinion Representation Multinomial opinions represent the natural generalisation of binomial opinions. Multinomial opinions can be used to model situations where a random variable X ∈ X can take multiple values. Definition 3.4 (Multinomial Opinion). Let X be a domain larger than binary, i.e. so that k = |X| > 2. Let X be a random variable in X. A multinomial opinion over the random variable X is the ordered triplet ωX = (bbX , uX , a X ) where: b X : is a belief mass distribution over X, uX : is the uncertainty mass, i.e. the amount of uncommitted belief mass, a X : is a base rate distribution over X , where the multinomial additivity requirement of Eq.(2.6) is satisfied. ⊔ ⊓ 28 3 Opinion Representations In case uX = 1 then ωX is a vacuous multinomial opinion, in case uX = 0 then ωX is a dogmatic multinomial opinion, and in case 0 < uX < 1 then ωX is an uncertain multinomial opinion. In the special case where for some X = x all belief mass is assigned to a single value as b X (x) = 1 then ωX is an absolute opinion, i.e. it is absolutely certain that a specific value x ∈ X is TRUE. In case of multinomial opinions, the belief mass distribution b X and the base rate distribution a X both have k parameters each. The uncertainty mass uX is a simple scalar. A multinomial opinion thus contains (2k + 1) parameters. However, given the belief and uncertainty mass additivity of Eq.(2.6) and the base rate additivity of Eq.(2.8), multinomial opinions only have (2k − 1) degrees of freedom. The probability projection of multinomial opinions is relatively simple to calculate, compared to general opinions, because no belief mass applies to overlapping values in the domain X. The expression for projected probability of multinomial opinions is therefore a special case of the general expression of Def.3.28. The probability projection of multinomial opinions is defined by Eq.(3.12) below. Projected probability distribution: PX (x) = b X (x) + a X (x) uX , ∀ x ∈ X . (3.12) Multinomial opinions have variance expressed as: Variance of multinomial opinions: VarX (x) = PX (x)(1 − PX (x))uX W + uX (3.13) where W denotes non-informative prior weight, which must be set to W = 2. The multinomial opinion variance is derived from the variance of the Dirichlet PDF, as defined by Eq.(3.18) below. The only multinomial opinions that can be easily visualised are trinomial, in which case it can be presented as a point inside a tetrahedron which is a barycentric coordinate system of 4 axes, as shown in Figure 3.3. The tetrahedron is a 3D simplex. In Figure 3.3, the vertical elevation of the opinion point inside the tetrahedron represents the uncertainty mass. The distances from each of the three triangular side planes to the opinion point represents the respective belief masses. The base rate distribution a X is indicated as a point on the base plane. The line that joins the tetrahedron summit and the base rate distribution point represents the director. The projected probability distribution point is geometrically determined by drawing a projection from the opinion point parallel to the director onto the base plane. Assume the ternary domain X = {x1 , x2 , x3 } and the corresponding random variable X. Figure 3.3 shows a tetrahedron with the example multinomial opinion ωX with belief mass distribution b X = {0.20, 0.20, 0.20}, uncertainty mass uX = 0.40 and base rate distribution a X = {0.750, 0.125, 0.125}. Only the uncertainty axis is shown in Figure 3.3. 
The belief axes for x_1, x_2 and x_3 are not shown due to the difficulty of 3D visualisation on the 2D plane of the figure.

Fig. 3.3 Barycentric tetrahedron visualisation of a trinomial opinion

The triangle and tetrahedron belong to the simplex family of geometrical shapes. Multinomial opinions on domains of cardinality k can in general be represented as a point in a simplex of dimension k. For example, binomial opinions can be represented inside a triangle, which is a 2D simplex, and trinomial opinions can be represented inside a tetrahedron, which is a 3D simplex.

By applying Eq.(3.12) to the example of Figure 3.3 the projected probability distribution is P_X = {0.50, 0.25, 0.25}. It can be noted that the probability projection of multinomial opinions expressed by Eq.(3.12) is a generalisation of the probability projection of binomial opinions expressed by Eq.(3.2).

3.4.2 The Dirichlet Multinomial Model

A multinomial opinion is equivalent to a Dirichlet PDF over X according to a specific bijective mapping described in Section 3.4.5. For self-containment, we briefly outline the Dirichlet multinomial model below, and refer to [30] for more details.

Multinomial probability density over a domain of cardinality k is described by the k-dimensional Dirichlet PDF, where the special case of probability density over a binary domain (i.e. where k = 2) is the Beta PDF described in Section 3.3.2 above.

Assume the domain X of cardinality k and the random variable X ∈ X with probability distribution p_X. The Dirichlet PDF can be used to represent probability density over p_X. Because of the additivity requirement ∑_{x∈X} p_X(x) = 1, the Dirichlet density function has only k − 1 degrees of freedom. This means that the knowledge of k − 1 probability variables and their densities determines the last probability variable and its density.

The Dirichlet PDF takes as variable the k-dimensional probability distribution p_X. The strength parameters for the k possible outcomes are represented as k positive real numbers α_X(x), each corresponding to one of the possible outcomes x ∈ X. When considering that the probability distribution p_X consists of k separate probability functions p_X(x) : X → [0, 1], the image of the k probability functions p_X(x) becomes the k-component variable of Dir(α_X). In this way the functions p_X(x) and Dir(α_X) are chained functions.

Definition 3.5 (Dirichlet Probability Density Function). Let X be a domain consisting of k mutually disjoint values. Let α_X represent the strength vector over the values of X, and let p_X denote the k-component probability distribution over X. With p_X as a k-dimensional variable, the Dirichlet PDF denoted Dir(α_X) is expressed as:

Dir(α_X) = (Γ(∑_{x∈X} α_X(x)) / ∏_{x∈X} Γ(α_X(x))) ∏_{x∈X} p_X(x)^(α_X(x)−1) , where α_X(x) ≥ 0 ,   (3.14)

with the restriction that p_X(x) ≠ 0 if α_X(x) < 1.
⊔ ⊓

The strength vector α_X represents the a priori evidence as well as the observation evidence. The non-informative prior weight is expressed as a constant W, and this weight is distributed over all the possible outcomes as a function of the base rate. As mentioned already, it is normally assumed that W = 2. The singleton values in a domain of cardinality k can have base rates different from the default value 1/k, meaning that it is possible to define an arbitrary additive base rate distribution a_X over the domain X.
The total strength α_X(x) for each value x ∈ X can then be expressed as:

α_X(x) = r_X(x) + a_X(x)W , where r_X(x) ≥ 0 , ∀x ∈ X .   (3.15)

This leads to the evidence representation of the Dirichlet probability density function, denoted Dir^e_X(r_X, a_X), expressed in terms of the evidence vector r_X, where r_X(x) is the evidence for outcome x ∈ X. In addition, the base rate distribution a_X and the non-informative prior weight W are parameters in the expression for the evidence Dirichlet PDF:

Dir^e_X(r_X, a_X) = (Γ(∑_{x∈X} (r_X(x) + a_X(x)W)) / ∏_{x∈X} Γ(r_X(x) + a_X(x)W)) ∏_{x∈X} p_X(x)^(r_X(x) + a_X(x)W − 1) ,   (3.16)

where (r_X(x) + a_X(x)W) ≥ 0, with the restriction that p_X(x) ≠ 0 if (r_X(x) + a_X(x)W) < 1.

The notation of Eq.(3.16) is useful, because it allows the determination of probability densities over variables where each value can have an arbitrary base rate. Given the Dirichlet PDF of Eq.(3.16), the expected probability distribution over X can now be written as:

E_X(x) = α_X(x) / ∑_{x_j∈X} α_X(x_j) = (r_X(x) + a_X(x)W) / (W + ∑_{x_j∈X} r_X(x_j)) , ∀x ∈ X ,   (3.17)

which represents a generalisation of the expected probability of the Beta PDF expressed by Eq.(3.9). The variance Var_X(x) of the Dirichlet PDF is defined by Eq.(3.18):

Var_X(x) = α_X(x)(∑_{x_j∈X} α_X(x_j) − α_X(x)) / ((∑_{x_j∈X} α_X(x_j))² (∑_{x_j∈X} α_X(x_j) + 1))
         = (r_X(x) + a_X(x)W)(R_X + W − r_X(x) − a_X(x)W) / ((R_X + W)² (R_X + W + 1))
         = (b_X(x) + a_X(x)u_X)(1 − b_X(x) − a_X(x)u_X) u_X / (W + u_X)
         = P_X(x)(1 − P_X(x)) u_X / (W + u_X) ,  where R_X = ∑_{x_j∈X} r_X(x_j) .   (3.18)

The latter two equality expressions for the variance in Eq.(3.18) emerge from the mapping of Definition 3.6 below. The variance of the Dirichlet PDF measures how far the probability density is spread out over the interval [0, 1] for each dimension x. A variance of zero for some value x indicates that the probability density is concentrated in one point, which only happens in case the corresponding parameter α_X(x) is infinite (or the corresponding parameter r_X(x) is infinite).

It is normally assumed that the a priori probability density in case of a binary domain X = {x, x} is uniform. This requires that α_X(x) = α_X(x) = 1, which in turn dictates that W = 2. Assuming an a priori uniform probability density over a domain larger than binary would require a non-informative prior weight W > 2. In fact, W should always be equal to the cardinality of the domain for which a uniform probability density is assumed. Selecting W > 2 would result in new observation evidence having relatively less influence over the Dirichlet PDF and the projected probability distribution. Note that it would be unnatural to require a uniform probability density over arbitrarily large domains, because it would make the PDF insensitive to new observation evidence. For example, requiring a uniform a priori PDF over a domain of cardinality 100 would force W = 100. In case an event of interest has been observed 100 times, and no other event has been observed, the projected probability of the event of interest will still only be about 1/2, which would be highly counter-intuitive. In contrast, when a uniform PDF is assumed in the binary case, and the positive outcome has been observed 100 times, and the negative outcome has not been observed, then the projected probability of the positive outcome is close to 1, as intuition would dictate.
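The effect of the prior weight W on Eq.(3.17) can be checked numerically. The following Python sketch is illustrative only (the function name is an assumption, not part of the formalism); it reproduces the two cases just discussed.

```python
# Minimal sketch of Eq.(3.17): expected probability from Dirichlet evidence.
def expected_probability(r, a, x, W=2):
    """r: evidence dict, a: base rate dict, x: value of interest."""
    total = sum(r.values())
    return (r[x] + a[x] * W) / (W + total)

# Binary domain, uniform prior (W = 2): 100 positive, 0 negative observations.
print(expected_probability({'x': 100, 'not_x': 0}, {'x': 0.5, 'not_x': 0.5}, 'x'))
# -> 0.990..., close to 1, as intuition dictates.

# Forcing a uniform prior over a domain of cardinality 100 requires W = 100.
r = {i: 0 for i in range(100)}; r[0] = 100
a = {i: 1 / 100 for i in range(100)}
print(expected_probability(r, a, 0, W=100))
# -> 0.505, still only about 1/2 despite 100 observations of the same value.
```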
3.4.3 Visualising Dirichlet Probability Density Functions Visualising Dirichlet probability density functions is challenging because it is a density function over k − 1 dimensions, where k is the domain cardinality. For this reason, Dirichlet PDFs over ternary domains are the largest that can be practically visualised. Let us consider the example of an urn containing balls of the three different markings: x1 , x2 and x3 , meaning that the urn can contain multiple balls marked x1 , x2 or x3 respectively. This situation can be modelled by a domain X = {x1 , x2 , x3 } of cardinality k = 3. Let us first assume that no other information than the cardinality is available, meaning that the number and relative proportion of balls marked x1 , x2 and x3 are unknown, and that the default base rate for any of the markings is aX (x) = 1/k = 1/3. Initially, before any balls are drawn we have r X (x1 ) = r X (x2 ) = r X (x3 ) = 0. Then Eq.(3.17) dictates that the a priori projected probability of picking a ball of any specific marking is the default base rate probability a X (x) = 1/3. The non-informative a priori Dirichlet PDF is illustrated in Figure 3.4.a. Density Dir(αX ) Density Dir(αX ) 20 20 15 15 10 10 5 5 0 0 1 p X (x2 ) 0 1 0 0 1 0 1 p X (x3 ) . p X (x2 ) 0 1 p X (x1 ) (a) Non-informative prior Dirichlet PDF p X (x3 ) . 0 1 p X (x1 ) (b) A posteriori Dirichlet PDF Fig. 3.4 Prior and posterior Dirichlet PDFs Let us now assume that an observer has picked (with return) 6 balls market x1 , 1 ball marked x2 and 1 ball marked x3 , i.e. r (x1 ) = 6, r (x2 ) = 1, r (x3 ) = 1, then the 3.4 Multinomial Opinions 33 a posteriori projected probability of picking a ball marked x1 can be computed as EX (x1 ) = 23 . The a posteriori Dirichlet PDF is illustrated in Figure 3.4.b. 3.4.4 Coarsening Example: From Ternary to Binary We reuse the example of Section 3.4.3 with the urn containing balls marked x1 , x2 and x3 , but this time we consider a binary partition of the markings into x1 and x1 = {x2 , x3 }. The base rate of picking x1 is set to the relative atomicity of x1 , expressed as a X (x1 ) = 31 . Similarly, the base rate of picking x1 is a X (x1 ) = a (x2 ) + a (x3 ) = 23 . Let us again assume that an observer has picked (with return) 6 balls marked x1 , and 2 balls marked x1 , i.e. marked x2 or x3 . This translates into the observation vector r X (x1 ) = 6, r (x1 ) = 2. Since the domain has been reduced to binary, the Dirichlet density function is reduced to a Beta PDF which is simple to visualise. The a priori and a posteriori density functions are illustrated in Figure 3.5. 6 5 5 4 4 Density 7 6 Density 7 3 3 2 2 1 1 0 0 0 . 0.2 0.4 p(x1 ) 0.6 0.8 (a) Non-informative a priori Beta PDF 1 0 . 0.2 0.4 p(x1 ) 0.6 0.8 1 (b) After 6 balls marked x1 , and 2 balls marked x2 or x3 Fig. 3.5 Prior and posterior Beta PDF Computing the a posteriori projected probability of picking ball marked x1 with Eq.(3.17) produces EX (x1 ) = 32 , which is the same as before the coarsening, as illustrated in Section 3.4.3. This shows that the coarsening does not influence the projected probability of specific events. 3.4.5 Mapping between Multinomial Opinion and Dirichlet PDF The Dirichlet model translates observation evidence directly into a PDF over a kcomponent probability variable. The representation of the observation evidence, to- 34 3 Opinion Representations gether with the base rate, can be used to determine subjective opinions. 
In other words it is possible to define a bijective mapping between Dirichlet PDFs and multinomial opinions. Let X be a random variable in domain X of cardinality k. Assume the multinomial opinion ωX = (bbX , uX , a X ), the probability distribution p X over X ∈ X, and DireX (rr X , a X ) over p X . The bijective mapping between ωX and DireX (rr X , a X ) is based on the requirement for equality between the projected probability distribution PX derived from ωX , and expected probability distribution EX derived from DireX (rr X , a X ). This requirement is expressed as: PX = EX (3.19) m b X (x) + a X (x) uX = r X (x) +W aX (x) W + ∑ r X (x j ) ∀x ∈ X , (3.20) x j ∈X We also require that each belief mass b X (x) be an increasing function of of the evidence r X (x), and that uX be a decreasing function of ∑ r X (x). In other words, x∈X the more evidence in favour of a particular outcome x, the greater the belief mass on that outcome. Furthermore, the more total evidence available, the less uncertain the opinion. These requirements are expressed as: ∑ r X (x) −→ ∞ x∈X ⇔ ∑ r X (x) −→ ∞ ∑ b X (x) −→ 1 (3.21) ⇔ (3.22) x∈X uX −→ 0 x∈X As already mentioned, the non-informative prior weight is set to W = 2. These intuitive requirements together with Eq.(3.20) provide the basis for the following bijective mapping: Definition 3.6 (Mapping: Multinomial Opinion ↔ Dirichlet PDF). Let ωX = (bbX , uX , a X ) be a multinomial opinion, and let DireX (rr X , a X ) be a Dirichlet PDF, both over the same variable X ∈ X. The multinomial opinions ωX and DireX (rr X , a X ) are equivalent through the following mapping: 3.4 Multinomial Opinions ∀x ∈ X b (x) = X uX = r X (x) W + ∑ r X (xi ) xi ∈X W W + ∑ r X (xi ) xi ∈X 35 For uX 6= 0: W b X (x) r X (x) = uX ⇔ 1 = uX + ∑ b X (xi ) xi ∈X For uX = 0: = b (x) ∞ r (x) X X 1 = b (x ) ∑ X i xi ∈X (3.23) ⊔ ⊓ The equivalence mapping of Eq.(3.23) is a generalisation of the binomial mapping from Eq.(3.11). The interpretation of Beta and Dirichlet PDFs is well established in the statistics literature so that the mapping of Definition 3.6 creates a direct mathematical and interpretation equivalence between Dirichlet PDFs and opinions when both are expressed over the same domain X. This equivalence is very powerful. One the one hand, statistical tools and methods such as collecting statistical observation evidence can now be applied to opinions. On the other hand, the operators of subjective logic such as conditional deduction and abduction can be applied to statistical models in terms of Dirichlet PDFs. 3.4.6 Uncertainty-Maximisation of Multinomial Opinions Given a specific multinomial opinion ωX , with its projected probability distribution PX , it is often useful to know the theoretical maximum uncertainty which still preserves the same projected probability distribution. The corresponding uncertaintybX = (b maximised opinion is denoted ω b X , ubX , a X ). Obviously, the base rate distribution a X is not affected by uncertainty-maximisation. The maximum theoretical uncertainty ubX is determined by converting as much belief mass as possible into uncertainty mass, while preserving consistent projected probabilities. This process is illustrated in in Figure 3.6. The line defined by the equations PX = b X (xi ) + a X (xi )uX , i = 1, . . . k, (3.24) which by definition is parallel to the base rate director line and which joins PX bX in Figure 3.6, defines possible opinions ωX for which projected probability and ω bX is an uncertaintydistribution is constant. 
As the illustration shows, an opinion ω bX is maximised opinion when Eq.(3.24) is satisfied and at least one belief mass of ω zero, since the corresponding point would lie on a side of the simplex. In general, not all belief masses can be zero simultaneously except for vacuous opinions. The example of Figure 3.6 indicates the case when b X (x1 ) = 0. bX should satisfy the following requireThe components of the opinion point ω ments: PX (xi0 ) ubX = , for some i0 ∈ {1, . . . , k}, and (3.25) a X (xi0 ) 36 3 Opinion Representations uX PX (x1) = bX (x1) + aX (x1) uX PX (x2) = bX (x2) + aX (x2) uX vertex ZX PX (x3) = bX (x3) + aX (x3) uX ZX x3 PX x2 aX x1 bX of multinomial opinion ωX Fig. 3.6 Uncertainty-maximized opinion ω PX (xi ) ≥ a X (xi )uX , for every i ∈ {1, . . . , k. (3.26) The requirement of Eq.(3.26) ensures that all the belief masses determined according to Eq.(3.12) are non-negative. These requirements lead to the theoretical uncertainty maximum : PX (xi ) ubX = min (3.27) i a X (xi ) 3.5 Hyper Opinions The concept of hyper-opinions which is described below, also creates an equivalence to Dirichlet PDFs over hyperdomains as well as to hyper-Dirichlet PDFs over n-ary domains. The hyper-Dirichlet model was defined in 2010 [31] but has so far received relatively little attention in the literature. 3.5.1 The Hyper-Opinion Representation A hyper-opinion is the natural generalisation of a multinomial opinion. In case of a domain X with hyperdomain R(X) it is possible to obtain evidence for a composite value x ∈ R(X) which would translate into assigning belief mass to the same composite value. Definition 3.7 (Hyper-Opinion). Let X be a domain of cardinality k > 2, with corresponding hyperdomain R(X). Let X be a hypervariable in R(X). A hyper-opinion on the hypervariable X is the ordered triplet ωX = (bbX , uX , a X ) where: 3.5 Hyper Opinions 37 bX : is a belief mass distribution over R(X), uX : is the uncertainty mass, i.e. the amount of uncommitted belief mass, aX : is a base rate distribution over X, where the hypernomial additivity of Eq.(2.7) is satisfied. ⊔ ⊓ A subjective opinion ωXA denotes the target variable X as a subscript, and denotes the opinion owner A as a superscript. Explicitly expressing subjective ownership of opinions makes is possible to express that different agents have different opinions on the same variable. The belief mass distribution bX over R(X) has (2k − 2) parameters, whereas the base rate distribution aX over X only has k parameters. The uncertainty mass uX ∈ [0, 1] is a simple scalar. A general opinion thus contains (2k + k − 1) parameters. However, given that Eq.(2.7) and Eq.(2.8) remove one degree of freedom each, hyper-opinions over a domain of cardinality k only have (2k + k − 3) degrees of freedom. By using the concept of relative base rates from Eq.(2.10), the projected probability distribution PX of hyper-opinions can be expressed as: PX (x) = ∑ a X (x/x j ) b X (x j ) + a X (x) uX , ∀x ∈ X. (3.28) x j ∈R(X) For x ∈ X it can be shown that the projected probability distribution PX satisfies the probability additivity principle: ∑ PX (x) = 1 . (3.29) x∈X However, for probabilities over X ∈ R(X), the sum of projected probabilities is in general super-additive, formally expressed as: ∑ PX (x) ≥ 1 . 
(3.30) x∈R(X) The super-additivity results from the fact that projected probability of partially overlapping composite elements x j ∈ R(X) are partially based on the same projected probability on their constituent singleton elements xi ∈ X so that probabilities are counted multiple times. 3.5.2 Projecting Hyper-Opinions to Multinomial Opinions Given a hyper-opinion it can be useful to project it onto a multinomial opinion. The procedure goes as follows. If b ′X is a belief mass distribution defined by the sum in Eq.(3.28), i.e.: b ′X (x) = ∑ x′ ∈R(X) a X (x/x′ ) b X (x′ ) , (3.31) 38 3 Opinion Representations then it is easy to check that b ′X : X → [0, 1], and that b ′X together with uX satisfies the additivity property in Eq.(2.6), i.e. ωX′ = (bb′X , uX , a X ) is a multinomial opinion. From Eq.(3.28) and Eq.(3.31) we obtain P(ωX ) = P(ωX′ ). This means that every hyper opinion can be approximated with a multinomial opinion which has the same projected probability distribution as the initial hyper-opinion. 3.5.3 The Dirichlet Model Applied to Hyperdomains The traditional Dirichlet model applies naturally to a multinomial domain X of cardinality k, and there is a simple bijective mapping between multinomial opinions and Dirichlet PDFs. Since opinions also apply to a hyperdomain R(X) of cardinality κ = 2k − 2, the question then is whether the Dirichlet model can also be applied to hyperdomains. This would be valuable for interpreting hyper-opinions in terms of traditional statistical theory. The apparent obstacle for this would be that two composite elements xi , x j ∈ R(X) can be overlapping (i.e. non-exclusive) so that xi ∩ x j 6= 0, / which is contrary to the assumption in the traditional Dirichlet model. However, there is a solution, as described below. The approach that we follow is to artificially assume that hyperdomain R(X) is exclusive, i.e. to artificially assume that for every pair of elements xi , x j ∈ R(X) it holds that xi ∩ x j = 0. / In this way, the Dirichlet model can be applied to the artificially exclusive hyperdomain R(X). This Dirichlet model is then based on the κ -dimensional hyper-probability distribution p H X from Eq.(2.12), where X ∈ R(X) is a hypervariable. The input is now a sequence of strength parameters of the κ possible elements x ∈ R(X) represented as κ positive real numbers αX (xi ), i = 1 . . . κ , each corresponding to one of the possible values x ∈ R(X). Definition 3.8 (Dirichlet HPDF). Let X be a domain consisting of k mutually disjoint elements, where the corresponding hyperdomain R(X) has cardinality κ = (2k − 2). Let αX represent the strength vector over the κ elements x ∈ R(X). The hyper-probability distribution p H X and the strength vector αX are both κ dimensional. The Dirichlet hyper-probability density function over p H X , called Dirichlet HPDF for short, is denoted DirH X (αX ), and is expressed as: Γ DirH X (αX ) = ∑ αX (x) x∈R(X) ! ∏ Γ(αX (x)) x∈R(X) ∏ (αX (x)−1) pH , where αX (x) ≥ 0 , (3.32) X (x) x∈R(X) with the restrictions that: pH X (x) 6= 0 if αX (x) < 1. ⊔ ⊓ The strength vector αX represents the a priori as well as the observation evidence, now assumed applicable to values x ∈ R(X). 3.5 Hyper Opinions 39 Since the elements of R(X) can contain multiple singletons from X, an element of R(X) has a base rate equal to the sum of base rates of the singletons it contains as expressed by Eq.(2.9). 
The strength αX (x) for each element x ∈ R(X) can then be expressed as: r X (x) ≥ 0 αX (x) = r X (x) + a X (x)W , where a X (x) = ∑ a (x j ) ∀x ∈ R(X) x j ⊆x x j ∈X W =2 (3.33) The Dirichlet HPDF over a set of κ possible states xi ∈ R(X) can thus be expressed as a function of the observation evidence r X and the base rate distribution a X (x), where x ∈ R(X). The superscript ‘eH’ in the notation DireH X indicates that it is expressed as a function of the evidence parameter vector r X (not of the strength parameter vector αX ), and that it is a Dirichlet HPDF (not PDF). Γ DireH X (rr X , a X ) = ∑ (rr X (x)+aaX (x)W ) x∈R(X) ! Γ(rr X (x)+aaX (x)W ) ∏ x∈R(X) (rr X (x)+aaX (x)W −1) , ∏ pH X (x) x∈R(X) (3.34) where (rr X (x) + a X (x)W ) ≥ 0, with the restrictions that pH X (x) 6= 0 if (rr X (x) + a X (x)W ) < 1. The expression of Eq.(3.34) determines probability density over hyper-probability distributions pH X where each value x ∈ R(X) has a base rate according to Eq.(2.9). Because an element x j ∈ R(X) can be composites the expected probability of any element x ∈ X is not only a function of the direct probability density on x, but also of the probability density of all other elements x j ∈ R(X) that contain x. More formally, the expected probability of x ∈ X results from the probability density of each x j ∈ R(X) where x ∩ x j 6= 0. / Given the Dirichlet HPDF of Eq.(3.34), the expected probability of any of the k values x ∈ X can now be written as: ∑ EX (x) = a X (x/x j )rr (x j ) +W a X (x) x j ∈R(X) W+ ∑ r (x j ) ∀x ∈ X . (3.35) x j ∈R(X) The expected probability distribution of a Dirichlet HPDF expressed by Eq.(3.35) is a generalisation of the expected probability distribution of a Dirichlet PDF expressed by Eq.(3.17). 40 3 Opinion Representations 3.5.4 Mapping between a Hyper-Opinion and a Dirichlet HPDF A hyper-opinion is equivalent to a Dirichlet HPDF according the mapping defined below. This mapping is simply is an extension of the mapping between a multinomial opinion and a traditional Dirichlet PDF as described in Eq.(3.23). Definition 3.9 (Mapping: Hyper-Opinion ↔ Dirichlet HPDF). Let X be a domain consisting of k mutually disjoint elements, where the corresponding hyperdomain R(X) has cardinality κ = (2k − 2), and let X be a hypervariable in R(X). Let ωX be a hyper-opinion on X, and let DireH X (rr X , a X ) be a Dirichlet HPDF over the ωX and the Dirichlet HPDF hyper-probability distribution p H . The hyper-opinion X r DireH (r , a ) are equivalent through the following mapping: X X X ∀x ∈ R(X) b X (x) = uX = r X (x) W + ∑ r X (xi ) xi ∈R(X) W W + ∑ r X (xi ) xi ∈R(X) For uX 6= 0: r (x) = W bX (x) uX X ⇔ 1 = uX +∑ b X (xi ) xi ∈R(X) For uX = 0: r X (x) = b X (x) ∞ 1 = ∑ b X (xi ) xi ∈R(X) (3.36) ⊔ ⊓ A Dirichlet HPDF is based on applying the Dirichlet model to values of the hyperdomain R(X) that in fact are partially overlapping values in the corresponding domain X. A Dirichlet HPDF applied to the R(X) can be projected to a PDF applied to X, but this projected PDF is not a Dirichlet PDF in general. Only a few degenerate cases become Dirichlet PDFs through this projection, such as the non-informative prior Dirichlet where r X is the zero vector which corresponds to a vacuous opinion with u = 1, or the case where evidence only relates to singleton values x ∈ X. The advantage of the Dirichlet HPDF is to provide an interpretation and equivalent representation of hyper-opinions. 
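As a concrete illustration of the mapping in Eq.(3.36) and of the projection in Eq.(3.28), the following Python sketch maps hyper-evidence to a hyper-opinion and projects it onto the singletons. It is a sketch only: the frozenset encoding of composite values and the function names are assumptions, and the example counts are those of scenario A in Table 3.3 below.

```python
# Illustrative sketch: hyper-evidence -> hyper-opinion (Eq.(3.36)) and
# projection onto singletons via relative base rates (Eq.(2.10), Eq.(3.28)).
W = 2

def hyper_opinion(r):
    """r: evidence over composite values (frozenset -> count)."""
    total = sum(r.values())
    b = {xs: rx / (W + total) for xs, rx in r.items()}
    u = W / (W + total)
    return b, u

def project(b, u, a):
    """Projected probability over singletons; a: base rates over singletons."""
    P = {}
    for x in a:
        overlap = sum(bx * a[x] / sum(a[y] for y in xs)  # a(x/x_j) = a(x)/a(x_j)
                      for xs, bx in b.items() if x in xs)
        P[x] = overlap + a[x] * u
    return P

# Scenario A from Table 3.3: 20 observations of x3, 80 of {x1, x2}, default base rates.
r = {frozenset({'x3'}): 20, frozenset({'x1', 'x2'}): 80}
a = {'x1': 1/3, 'x2': 1/3, 'x3': 1/3}
b, u = hyper_opinion(r)
print(project(b, u, a))   # approx {'x1': 0.399, 'x2': 0.399, 'x3': 0.203}
```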
It would not be meaningful to try to visualise the Dirichlet HPDF over the hyperprobability distribution pH X itself because it would fail to visualise the important fact that probability is assigned to overlapping values x ∈ X. This aspect would make it extremely difficult to see the probability on a specific value x ∈ X, because in general the probability is a function of multiple probabilities on overlapping hyper-values. A visualisation of probability density should therefore be done over the probability distribution p X , where probability on specific values x ∈ X can be seen or interpreted directly. 3.5 Hyper Opinions 41 3.5.5 Hyper Dirichlet PDF The Dirichlet HPDF (hyper probability density function) described in Section 3.5.3 above applies to hyperdomain R(X), and is not suitable for representing probability over the corresponding domain X. What is needed is a PDF (probability density function) that somehow represents the parameters of the Dirichlet HPDF over the domain X. The PDF that does exactly that can be obtained by integrating the evidence parameters for the Dirichlet HPDF to produce evidence parameters for a PDF over the probability variable p X . In other words, the evidence on singleton values of the random variable must be computed as a function of the evidence on composite values of the hypervariable. A method for this task has been defined by Hankin [31], where the resulting PDF is a Hyper-Dirichlet PDF which is a generalisation of the classical Dirichlet PDF. In addition to the factors consisting of the probability product of the probability variables, it requires a normalisation factor B(rr X , a X ) that can be computed numerically. Hankin also provides a software package for producing visualisations of Hyper-Dirichlet PDFs over ternary domains. The Hyper-Dirichlet PDF is denoted HDireX (rr X , a X ). Its mathematical expression is given by Eq.(3.37) below. ! HDireX (rr X , a X ) = B(rr X , a X )−1 k κ i=1 j=(k+1) ∏ p X (xi )(rrX (xi )+aaX (xi )W −1) ∏ p X (x j )r X (x j ) (3.37) = B(rr X , a X ) −1 k ∏ p X (xi ) (aaX (xi )W −1) i=1 κ ∏ p X (x j ) j=1 ! r X (x j ) (3.38) where B(rr X , a X ) = κ R p X (x)≥0 k ∏ p X (xi i=1 κ ∏ p X (x j j=(k+1) )(rr X (xi )+aaX (xi )W −1) )r X (x j ) ! d(ppX (x1 ). . ., p X (xκ )). ∑ p X (x j )≤1 j=(k+1) (3.39) A Hyper-Dirichlet PDF produces probability density over a probability distribution p X where X ∈ X. Readers might therefore be surprised to see that Eq.(3.37) contains probability terms p X (x j ) where x j are composite values in R(X). However, the probability of a composite value is in fact the sum of set of probabilities p X (xi ) where xi ∈ X, as expressed by Eq.(3.40). This ensures that a Hyper-Dirichlet PDF is really a PDF over a traditional probability distribution p X . 42 3 Opinion Representations The expression for the Hyper-Dirichlet PDF in Eq.(3.37) strictly separates between the singleton value terms and the composite value terms. To this end the k singleton state values x ∈ R(X) (i.e. elements x ∈ X) are denoted xi , i ∈ [1, k], and the (κ − k) composite state values x ∈ R(X) are denoted x j , j ∈ [(k + 1), κ ]. The notation is more compact in Eq.(3.38), where the index j covers the whole range [1, κ ]. The simplification results from interpreting the term p X (x j ) according to Eq.(3.40), so that for j = i ≤ k, we automatically have p X (x j ) = p X (xi ). 
p X (x j ) = ∑ p X (xi ), for j ∈ [1, κ ] (3.40) xi ⊆x j The normalisation factor B(rr X , a X ) is unfortunately not given by a closed expression, so that numerical computation is needed to determine its value for each set of parameters r X and a X . The ability to represent statistical observations in terms of Hyper-Dirichlet PDFs is useful because a PDF on p X is intuitively meaningful, in contrast to a PDF on p H X. We will here consider the example of a genetic engineering process, where eggs of 3 different mutations are being produced. The mutations are denoted by x1 , x2 and x3 respectively, so that the domain can be defined as X = {x1 , x2 x3 }. The specific mutation of each egg can not be controlled by the process, so a sensor is being used to determine the mutation of each egg. Let us assume that the sensor is not always able to determine the mutation exactly, and that it sometimes can only exclude one out of the three possibilities. What is observed by the sensors is therefore elements of the reduced powerset R(X). We consider two separate scenarios of 100 observations. In scenario A, mutation x3 has been observed 20 times, and mutation x1 or x2 (i.e. the element {x1 , x2 }) has been observed 80 times. In scenario B, mutation x2 has been observed 20 times, the mutations x1 or x3 (i.e. the element {x1 , x3 }) have been observed 40 times, and the mutations x2 or x3 (i.e. the element {x2 , x3 }) have also been observed 40 times. Table 3.3 summarises the two scenarios. The base rate is set to the default value 1/3 for each mutation. Table 3.3 Number of observations per mutation category Scenario A Scenario B Mutation: x1 x2 x3 {x1 , x2 } {x1 , x3 } {x2 , x3 } x1 x2 x3 {x1 , x2 } {x1 , x3 } {x2 , x3 } Counts: 0 0 20 80 0 0 0 20 0 0 40 40 Because the domain X is ternary it is possible to visualise the corresponding Hyper-Dirichlet PDFs, as shown in Figure 3.7 Readers who are familiar with the typical shapes of Dirichlet PDFs will immediately notice that the plots of Figure 3.7 are clearly not Dirichlet. The Hyper-Dirichlet model [31] represents a generalisation of the classic Dirichlet model and provides a nice interpretation of hyper-opinions that can be useful for better understanding the nature hyper-opinions. 3.6 Alternative Opinion Representations 43 Density Density 20 20 15 15 10 10 5 5 0 0 1 p(x2) 1 p(x2) 0 0 1 p(x3) 0 0 0 1 p(x1) (a) Hyper-Dirichlet PDF of scenario A 1 p(x3) 0 1 p(x1) (b) Hyper-Dirichlet PDF of scenario B Fig. 3.7 Example Hyper-Dirichlet probability density functions An interesting aspect of hyper opinions is that they can express vagueness in the sense that evidence can support multiple elements in the domain simultaneously. Vague belief is defined in Section 4.1.2. 3.6 Alternative Opinion Representations The previous sections have presented two equivalent opinion representations which are the belief representation of opinions, typically denoted ωX as well as the evidence representation of opinions in the form of Dirichlet PDFs, typically denoted DireX (rr X , a X ). Other representations can be defined, where Section 3.6.1 and Section 3.6.2 below describe two simple representations that can be useful in specific applications. An additional representation for binomial opinions is for example defined in CertainLogic [79]. Binomial opinions as defined in CertainLogic are equivalent to traditional binomial belief opinions. 
3.6.1 Probabilistic Notation of Opinions Most people are familiar with the concept of probability, and are able to intuitively interpret probabilities quite well. The classical probability representation is used in all areas of science, so people are primarily interested in probability distributions when analysing models of situations that include possible and uncertain events. It can therefore be seen as a disadvantage that the traditional opinion representation described in the previous sections does not explicitly express projected probability. 44 3 Opinion Representations Although the projected probability distribution of an opinion can easily be derived with Eq.(3.28), the lack of explicit representation of projected probability might still represent a mental barrier for direct intuitive interpretation of opinions. In order to overcome this barrier, an alternative representation of opinions could therefore be designed to consist of explicit projected probability distributions, together with the degree of uncertainty and base rate distributions. This representation is called the probabilistic opinion notation which is formally defined below. Definition 3.10 (Probabilistic Opinion Notation). Assume domain X with random variable X and let ωX = (bbX , uX , a X ) be a binomial or multinomial opinion on X. Let PX be the corresponding projected probability distribution over X defined according to Eq.(3.12). The probabilistic notation for multinomial opinions is given below. Probabilistic opinion: Constraints: πX = (PX , uX , a X ) a X (x) uX ≤ PX (x) ≤ (aaX (x) uX + 1 − uX ), ∑ PX (x) = 1, (3.41) ∀x ∈ X x∈X ⊔ ⊓ The uncertainty mass uX is the same for both the belief notation and the probabilistic notation of opinions. The base rate distribution a X is also the same for both notations. The equivalence between the two notations is simply based on the expression for the projected probability distribution as a function of the belief distribution in Eq.(3.12). This leads to the bijective mapping defined below. Definition 3.11 (Mapping: Belief Opinion – Probabilistic Opinion). Let ωX = (bbX , uX , a X ) be a multinomial belief opinion, and let πX = (PX , uX , a X ) be a multinomial probabilistic opinion, both over the same variable X ∈ X. The multinomial opinions ωX and πX are equivalent through the following mapping: b X (x) = PX (x)−aaX (x) uX ⇔ PX (x) = b X (x)+aaX (x) uX , ∀x ∈ X (3.42) ⊔ ⊓ In case uX = 0, then PX is a traditional discrete probability distribution without uncertainty. In case uX = 1, then PX = a X , and no evidence has been received, so the probability distribution PX is totally uncertain. Assume a binary domain X with cardinality k. Then both the base rate distribution a X as well as the projected probability distribution PX have k − 1 degrees of freedom due to the additivity property of Eq.(2.8) and Eq.(3.29). With the addition of the independent uncertainty parameter uX , the probabilistic notation of opinions has 2k − 1 degrees of freedom, as do the belief notation and the evidence notation of multinomial opinions. 3.6 Alternative Opinion Representations 45 In case of a binary domain X = {x, x} a special notation for binomial opinions can be used. Eq.(3.43) shows the probabilistic notation of binomial opinions which has three parameters and also has three degrees of freedom. Px is the projected probability of x ux is the uncertainty mass πx = (Px , ux , ax ) , where ax is the base rate of x (3.43) under the constraint: ax ux ≤ Px ≤ 1. 
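As a concrete illustration of the bijective mapping of Definition 3.11, the following sketch (function and variable names are mine, not from the text) converts a multinomial belief opinion to the probabilistic notation and back via Eq.(3.42).

import numpy as np

def belief_to_probabilistic(b, u, a):
    """Map a multinomial belief opinion (b, u, a) to the probabilistic
    notation (P, u, a) of Definition 3.10, using P(x) = b(x) + a(x)*u."""
    b, a = np.asarray(b, float), np.asarray(a, float)
    return b + a * u, u, a

def probabilistic_to_belief(P, u, a):
    """Inverse mapping of Eq.(3.42): b(x) = P(x) - a(x)*u."""
    P, a = np.asarray(P, float), np.asarray(a, float)
    b = P - a * u
    assert np.all(b >= -1e-12), "violates the constraint a(x)*u <= P(x)"
    return b, u, a

# Example: ternary opinion with uncertainty mass 0.2 and uniform base rates
b, u, a = [0.5, 0.2, 0.1], 0.2, [1/3, 1/3, 1/3]
P, _, _ = belief_to_probabilistic(b, u, a)        # P = [0.5667, 0.2667, 0.1667]
b_back, _, _ = probabilistic_to_belief(P, u, a)   # recovers the original belief masses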
The main limitation of the probabilistic opinion notation is that it does not cover hyper-opinions; it only covers binomial and multinomial opinions. However, in cases where only binomial or multinomial opinions are required, this limitation might not be a problem. The second disadvantage of the probabilistic opinion notation is that the algebraic expressions for operators often become unnecessarily complex. It turns out that the belief notation of opinions, as specified in Definitions 3.1, 3.4 and 3.7, offers the simplest algebraic representation of opinion operators. For this reason we do not use the probabilistic notation for opinion operators here; we only use the belief notation.

3.6.2 Qualitative Category Representation

Human language provides various terms that are commonly used to express different types of likelihood and uncertainty. It is possible to express binomial opinions in terms of qualitative verbal categories, which can be specified according to the needs of a particular application. An example set of qualitative categories is provided in Table 3.4. These qualitative verbal categories can be mapped to areas in the opinion triangle, as illustrated in Figure 3.8. The mapping must be defined for combinations of ranges of expected probability and uncertainty. As a result, the mapping between a specific qualitative category from Table 3.4 and a specific geometric area in the opinion triangle depends on the base rate. Without specifying the exact underlying ranges, the visualisation of Figure 3.8 indicates the ranges approximately. The edge ranges are deliberately made narrow in order to have categories for near-dogmatic and vacuous beliefs, as well as beliefs that express projected probability near absolute 0 or 1.

The number of likelihood categories and certainty categories, as well as the exact ranges for each, must be determined according to the needs of each application, and the qualitative categories defined here must be seen as an example. Real-world categories would likely be similar to those found in Sherman Kent's Words of Estimative Probability [57]; be based on the Admiralty Scale as used within the UK National Intelligence Model1; or be based on empirical results obtained from psychological experimentation.

Table 3.4 Qualitative Categories

Fig. 3.8 Mapping qualitative categories to ranges of belief as a function of the base rate: (a) qualitative categories with a = 1/3, (b) qualitative categories with a = 2/3

Figure 3.8 illustrates the category-to-opinion mappings for the case of base rate a = 1/3 and the case of base rate a = 2/3. The mapping is determined by the overlap between category area and triangle region: whenever a qualitative category area overlaps, partly or completely, with the opinion triangle, that qualitative category is a possible mapping. Note that the qualitative category areas overlap with different regions of the triangle depending on the base rate. For example, it can be seen that the category 7D: ‘Unlikely and Very Uncertain’ is possible in case a = 1/3, but not in case a = 2/3.
1 http://www.policereform.gov.uk/implementation/natintellmodel.html 3.6 Alternative Opinion Representations 47 This is because the projected probability of a state x is defined as Px = bx + ax ux , so that when ax , ux −→ 1, then Px −→ 1, meaning that the likelihood category ‘Unlikely’ would be impossible. Mapping from qualitative categories to subjective opinions is also straightforward. Geometrically, the process involves mapping the qualitative adjectives to the corresponding center of the portion of the grid cell contained within the opinion triangle (see Figure 3.8). Naturally, some mappings will always be impossible for a given base rate, but these are logically inconsistent and should be excluded from selection. Although a specific qualitative category maps to different geometric areas in the opinion triangle depending on the base rate, it will always correspond to the same range of beta PDFs. It is simple to visualize ranges of binomial opinions with the opinion triangle, but it would not be easy to visualize ranges of Beta PDFs. The mapping between binomial opinions and beta PDFs thereby provides a very powerful way of describing Beta PDFs in terms of qualitative categories, and vice versa. Chapter 4 Decision-Making Under Uncertainty Decision-making is the process of identifying and choosing between alternative options based on the beliefs about the different options and their associated utility gains or losses. The decision-maker can be the analyst of the situation, or the decision maker can act on advice produced by an analyst. In the following we do not distinguish between the decision-maker and the analyst, and use the term ’analyst’ to cover both. Opinions can form the basis of decisions, and it is important to understand how various aspects of an opinion should (rationally) determine the optimal decision. For this purpose it is necessary to introduce new concepts that are described next. Section 4.4 provides a summary of decision criteria based on opinions. 4.1 Aspects of Belief and Uncertainty in Opinions The previous chapter on the different categories of opinions only distinguishes between belief mass and uncertainty mass. This section dissects these two types into more granular types called specificity, vagueness, and uncertainty that each have their elemental and total variants. 4.1.1 Specificity Belief mass that only supports a specific element is called specific belief mass because it is specific to a single element and discriminates between elements, i.e. it is non-vague and non-uncertain. Note that we also interpret belief mass on a composite element (and its subsets) to be specific for that composite element, because it discriminates between that composite element and any other element which is not a subset of that element. 49 50 4 Decision-Making Under Uncertainty Definition 4.1 (Elemental Specificity). Let X be a domain with hyperdomain R(X) and variable X. Given an opinion ωX , the elemental specificity of element x ∈ R(X) S is the function ḃbX : R(X) → [0, 1] expressed as: S Elemental specific belief mass: ḃbX (x) = ∑b X (x j ) . (4.1) x j ⊆x ⊔ ⊓ It is useful to express elemental specificity of composite elements in order to assist decision making in situations like the Ellsberg paradox described in Section 4.5. The total specific belief mass denoted bSX is simply the sum of all belief masses assigned to singletons, defined as: Definition 4.2 (Total Specificity). Let X be a domain with variable X, and let ΩX be the set of opinions on X. 
Total specificity of an opinion ωX is the function bSX : ΩX → [0, 1] expressed as: Total specific belief mass: bSX = ∑b X (xi ) . (4.2) xi ∈X ⊔ ⊓ Total specificity represents the complement of the sum of total vagueness and uncertainty, as described below. 4.1.2 Vagueness Recall from Section 2.3 that that the composite set denoted by C (X) is the set of all composite elements from the hyperdomain. Belief mass assigned to a composite element expresses cognitive vagueness because this type of belief mass supports the truth of multiple singletons in X simultaneously, i.e. it does not discriminate between the singletons in the composite element. In case of binary domains there can be no vague belief because there are no composite elements. In case of hyperdomains there are always composite elements, and every singleton x ∈ X is member of multiple composite elements. The elemental vagueness of a singleton x ∈ R(X) is defined as the weighted sum of belief masses on the composite elements of which x is a member, where the weights are determined by the base rate distribution. The total amount of vague belief mass is simply the sum of belief masses on all composite elements in the hyperdomain. The formal definitions for these concepts are provided next. Let X be a domain where R(X) denotes its hyperdomain. Let C (X) be the composite set of X according to Eq.(2.3). Let x ∈ R(X) denote an element in hyperdomain R(X) and let x j ∈ C (X) denote a composite element in C (X). 4.1 Aspects of Belief and Uncertainty in Opinions 51 Definition 4.3 (Elemental Vagueness). Let X be a domain with hyperdomain R(X) and variable X. Given an opinion ωX , the elemental vagueness of element x ∈ R(X) V is the function ḃbX : R(X) → [0, 1] expressed as: V Elemental vague belief mass: ḃbX (x) = ∑ a X (x/x j ) b X (x j ) . (4.3) x j ∈C (X) x j 6⊆x ⊔ ⊓ Note that Eq.(4.3) not only defines vagueness of singletons x ∈ X, but also defines vagueness of composite elements x ∈ C (X), i.e. of all elements x ∈ R(X). Obviously in case x is a composite element, then the belief mass b X (x) does not contribute to elemental vagueness of x, although b X (x) represents vague belief mass for the whole opinion. The total vague belief mass in an opinion ωX is defined as the sum of belief masses on composite elements x j ∈ C (X), formally defined as: Definition 4.4 (Total Vagueness). Let X be a domain with variable X, and let ΩX be the set of opinions on X. The total vagueness of an opinion ωX is the function bV X : Ω X → [0, 1] expressed as: Total vague belief mass: bV X = ∑ b X (x j ) . (4.4) x j ∈C (X) ⊔ ⊓ An opinion ωX is dogmatic and vague when bV X = 1, and is partially vague when 0 < bV < 1. An opinion has mono-vagueness when only a single composite eleX ment has (vague) belief mass assigned to it. Correspondingly an opinion has plurivagueness when several composite elements have (vague) belief mass assigned to them. It is important to understand the difference between uncertainty and vagueness in subjective logic. Uncertainty reflects lack of evidence, whereas vagueness results from evidence that fails to discriminate between specific singletons. A vacuous (totally uncertain) opinion, by definition, does not contain any vagueness. Hyper opinions can contain vagueness, whereas multinomial and binomial opinions never contain vagueness. The ability to express vagueness is thus the main aspect that makes hyper-opinions different from multinomial opinions. 
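To make Definitions 4.1 to 4.4 concrete, here is a small Python sketch (all names are mine) that computes elemental and total specificity and vagueness for a hyper-opinion whose elements are given as sets of singleton indices; the relative base rate a_X(x/x_j) is computed as in Eq.(2.10).

def elemental_specificity(x, belief, members):
    """Eq.(4.1): sum of the belief masses on all elements contained in x."""
    return sum(b for el, b in belief.items() if members[el] <= members[x])

def elemental_vagueness(x, belief, members, base_rate):
    """Eq.(4.3): base-rate-weighted belief mass on composites not contained in x,
    weighted by the relative base rate a_X(x/x_j) of Eq.(2.10)."""
    total = 0.0
    for el, b in belief.items():
        if len(members[el]) > 1 and not members[el] <= members[x]:
            a_overlap = sum(base_rate[s] for s in members[el] & members[x])
            a_el = sum(base_rate[s] for s in members[el])
            total += (a_overlap / a_el) * b
    return total

def total_specificity(belief, members):
    """Eq.(4.2): belief mass assigned to singletons."""
    return sum(b for el, b in belief.items() if len(members[el]) == 1)

def total_vagueness(belief, members):
    """Eq.(4.4): belief mass assigned to composite elements."""
    return sum(b for el, b in belief.items() if len(members[el]) > 1)

# Mono-vague ternary opinion that reappears in Eq.(4.6): b({x2, x3}) = 0.8, u = 0.2
members = {'x1': {1}, 'x2': {2}, 'x3': {3}, 'x23': {2, 3}}
belief = {'x23': 0.8}
base_rate = {1: 1/3, 2: 1/3, 3: 1/3}
print(elemental_vagueness('x2', belief, members, base_rate))   # 0.4, as in Eq.(4.7)
print(total_vagueness(belief, members))                        # 0.8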
When assuming that collected evidence never decays, then uncertainty can only decrease over time because accumulated evidence is never lost. As the natural complement, specificity and vagueness can only increase. At the extreme, a dogmatic opinion where bV X = 1 expresses dogmatic vagueness. A dogmatic opinion where bSX = 1 expresses dogmatic specificity, which is equivalent to a traditional probability distribution over a random variable. When assuming that evidence decays e.g. as a function of time, then uncertainty can increase over time because uncertainty decay is equivalent to the loss of evidence. Vagueness decreases in case new evidence is specific, i.e. when the new 52 4 Decision-Making Under Uncertainty evidence supports singletons, and old vague evidence decays. Vagueness increases in case new evidence is vague, i.e. when the new evidence supports composite elements, and the old specific evidence decays. 4.1.3 Dirichlet Visualisation of Opinion Vagueness The total vagueness of a trinomial opinion can not easily be visualised as such on the opinion tetrahedron. However, it can be visualised in the form of a hyper-Dirichlet PDF. Let us for example consider the ternary domain X with corresponding hyperdomain R(X) illustrated in Figure 4.1. R (:) x4 x1 x5 x3 x2 x6 Fig. 4.1 Hyperdomain for the example of vague belief mass The singletons and composite elements of R(X) are listed below. X = {x1 , x2 , x3 } Domain: x4 = {x1 , x2 } Hyperdomain: R(X) = {x1 , x2 , x3 , x4 , x5 , x6 } where x = {x1 , x3 } 5 Composite set: C (X) = {x4 , x5 , x6 } x6 = {x2 , x3 } (4.5) Let us further assume a hyper-opinion ωX with belief mass distribution and base rate distribution specified in Eq.(4.6) below. Belief mass distribution b X (x6 ) = 0.8, uX = 0.2. Base rate distribution a X (x1 ) = 0.33, a X (x2 ) = 0.33, a X (x3 ) = 0.33. (4.6) Note that this opinion has mono-vagueness because the vague belief mass is assigned to only one composite elements. The projected probability distribution on X computed with Eq.(3.28) and the vague belief mass computed with Eq.(4.3) are given in Eq.(4.7) below. 4.1 Aspects of Belief and Uncertainty in Opinions Projected probability distribution PX (x1 ) = 0.066, PX (x2 ) = 0.467, PX (x3 ) = 0.467. 53 Vague V belief mass ḃbX (x1 ) = 0.0, V ḃbX (x2 ) = 0.4, V ḃbX (x3 ) = 0.4. (4.7) The hyper-Dirichlet PDF for this vague opinion is illustrated in Figure 4.2. Note how the probability density is spread out along the edge between the x2 and x3 vertices, which precisely indicates that the opinion expresses vagueness between x2 and x3 . Vague belief of this kind can be useful for an analyst in the sense that it can exclude specific elements from being plausible, which in this case is x1 . 10 8 6 4 2 0 1 p(x2) 0 0 1 p(x3) 0 1 p(x1) Fig. 4.2 Hyper-Dirichlet PDF with vague belief In case of multinomial and hypernomial opinions larger than trinomial it is challenging to design visualisations. A possible solution in case visualisation is required for opinions over large domains, is to use partial visualisation over specific values of the domain that are of interest to the analyst. 4.1.4 Elemental Uncertainty When an opinion contains uncertainty, the simplest interpretation is to consider that the whole uncertainty mass is shared between all the elements of the (hyper) domain. However, as indicated by the expressions for projected probability of e.g. 
Eq.(3.28), the uncertainty mass can be interpreted as being implicitly assigned to (hyper)elements of the variable, as a function of the base rate distribution over the variable. This interpretation is captured by the definition of elemental uncertainty mass. 54 4 Decision-Making Under Uncertainty Definition 4.5 (Elemental Uncertainty). Let X be a domain where R(X) denotes its hyperdomain. Given an opinion ωX , the elemental uncertainty mass of an element x ∈ R(X) is computed with the function u̇uX : R(X) → [0, 1] defined as: Elemental uncertainty mass: u̇uX (x) = a X (x) uX . (4.8) ⊔ ⊓ Note that the the above definition uses the notation u̇uX in the sense of a distribution of uncertainty mass over elements, which is different from total uncertainty mass as a single scalar denoted uX . 4.2 Mass-Sum for Specificity, Vagueness and Uncertainty The elemental specificity, vagueness and uncertainty concepts defined in the previous section are representative for each element by pulling belief and uncertainty mass proportionally across the belief masses and the uncertainty of the opinion. The concatenation of elemental specificity, vagueness and uncertainty is then called elemental mass-sum, and similarly for total mass-sum. The additivity properties of elemental and total belief and uncertainty mass are described next. 4.2.1 Elemental Mass-Sum The sum of elemental specificity, vagueness, and uncertainty of an element is equal to the element’s projected probability, expressed as: S V ḃbX (x) + ḃbX (x) + u̇uX (x) = PX (x). (4.9) Eq.(4.9) shows that the projected probability can be split into three parts which are i) elemental specificity, ii) elemental vagueness, and iii) elemental uncertainty. The composition of these three parts, called elemental mass-sum, denoted ΞEX (x). The symbol ‘Ξ’ is the Greek letter ‘Xi’. The concept of elemental mass-sum is defined next. Definition 4.6 (Elemental Mass-Sum). Let X be a domain with hyperdomain R(X), and assume that the opinion ωX is specified. Consider an element x ∈ R(X) S V with its elemental specificity ḃbX (x), elemental vagueness ḃbX (x) and elemental uncertainty u̇uX (x). The elemental mass-sum for element x is the triplet denoted ΞEX (x) expressed as: S V Elemental mass-sum: ΞEX (x) = ḃbX (x), ḃbX (x), u̇uX (x) . (4.10) 4.2 Mass-Sum for Specificity, Vagueness and Uncertainty 55 ⊓ ⊔ Given an opinion ωX , each element x ∈ R(X) has an associated elemental masssum ΞEX (x) which is a function of the opinion ωX . The term mass-sum means that the triplet of specificity, vagueness and uncertainty has the additivity property of Eq.(4.9). In order to visualise an elemental mass-sum, consider the ternary domain X = {x1 , x2 , x3 } and hyper domain R(X) illustrated in Figure 4.3 where the belief masses and uncertainty mass of opinion ωX are indicated on the diagram. R (:) 0.2 x4 0.2 x1 0.1 x5 0.3 x3 0 x2 0.1 x6 0.1 Fig. 4.3 Hyperdomain with belief masses Formally, the opinion ωX is specified in Table 4.1. The table also includes the elemental mass-sum elements in terms of elemental specificity, vagueness and uncertainty. The table also shows the projected probability for every element x ∈ R(X). Table 4.1 Opinion with elemental specificity, vagueness, uncertainty, and projected probability. 
Element x   Belief mass b_X(x) /   Base rate   Elemental           Elemental         Elemental           Projected
            uncertainty u_X        a_X(x)      specificity ḃ^S_X   vagueness ḃ^V_X   uncertainty u̇_X     probability P_X(x)
x1              0.10                 0.20         0.10                0.16               0.04                0.30
x2              0.10                 0.30         0.10                0.16               0.06                0.32
x3              0.00                 0.50         0.00                0.28               0.10                0.38
x4              0.20                 0.50         0.40                0.12               0.10                0.62
x5              0.30                 0.70         0.40                0.14               0.14                0.68
x6              0.10                 0.80         0.20                0.34               0.16                0.70
X           u_X = 0.20

The elemental mass-sums from opinion ω_X listed in Table 4.1 are visualised as a mass-sum diagram in Figure 4.4. Mass-sum diagrams are useful for assisting decision-making because the degrees of specificity, vagueness and uncertainty can be clearly understood.

Fig. 4.4 Elemental mass-sum diagram for ω_X, showing elemental specificity, vagueness and uncertainty of each element along the probability axis

Visualisation with the mass-sum diagram makes it much easier to appreciate the nature of the belief in each element as a function of the opinion. Since hyper-opinions cannot easily be visualised on simplexes like triangles or tetrahedrons, a mass-sum diagram like the one in Figure 4.4 offers a nice alternative that scales to larger domains. In Figure 4.4 it can be seen that x_3 has the greatest projected probability among the singletons, expressed as P_X(x_3) = 0.38. However, the elemental mass-sum of x_3 is void of specificity, so its projected probability is based solely on vagueness and uncertainty. These aspects are important to consider for decision-making, as explained below.

4.2.2 Total Mass-Sum

The belief mass of an opinion as a whole can be decomposed into total specificity, which provides distinctive support for singletons, and total vagueness, which provides vague support for singletons. These two belief masses are complementary to the uncertainty mass. For any opinion ω_X it can be verified that Eq.(4.11) holds:

b^S_X + b^V_X + u_X = 1.    (4.11)

Eq.(4.11) shows that the belief and uncertainty mass can be split into the three parts of total specificity, total vagueness and uncertainty. The composition of these three parts is called the total mass-sum, denoted Ξ^T_X, and is defined below.

Definition 4.7 (Total Mass-Sum). Let X be a domain with hyperdomain R(X), and assume that the opinion ω_X is specified. The total specificity b^S_X, total vagueness b^V_X and uncertainty u_X can be combined as a triplet, which is then called the total mass-sum, denoted Ξ^T_X, expressed as:

Total mass-sum: Ξ^T_X = (b^S_X, b^V_X, u_X).    (4.12)
⊓⊔

The total mass-sum of opinion ω_X from Figure 4.3 and Table 4.1 is illustrated in Figure 4.5.

Fig. 4.5 Visualising the total mass-sum from ω_X: total specific belief mass, total vague belief mass and uncertainty mass along the probability axis

4.3 Utility and Normalisation

Assume a random variable X with an associated projected probability distribution P_X. Utility is typically associated with the outcomes of a random variable, in the sense that for each outcome x there is an associated utility λ_X(x), expressed on some scale such as monetary value, which can be positive or negative. Given utility λ_X(x) in case of outcome x, the elemental expected utility for x is:

Elemental expected utility: L_X(x) = P_X(x) λ_X(x).    (4.13)

The expected utility for the variable X is then:

Expected utility: L_X = \sum_{x \in X} P_X(x) λ_X(x).    (4.14)
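As a small worked sketch of Eq.(4.13) and Eq.(4.14), using the projected probabilities of the singletons from Table 4.1 together with a hypothetical utility vector of my own choosing, expected utility is simply the probability-weighted sum of the outcome utilities:

import numpy as np

P = np.array([0.30, 0.32, 0.38])            # projected probabilities of x1, x2, x3 (Table 4.1)
utility = np.array([100.0, -50.0, 20.0])    # hypothetical utilities, e.g. monetary gains/losses

elemental = P * utility      # Eq.(4.13): L_X(x) = P_X(x) * lambda_X(x)  ->  [30.0, -16.0, 7.6]
expected = elemental.sum()   # Eq.(4.14): expected utility L_X = 21.6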
In classical utility theory, decisions are based on the expected utility of the possible options. It is also possible to eliminate the notion of utility by integrating it into the probabilities of the various options [7], which produces a utility-normalised probability vector. This approach greatly simplifies decision-making models, because every option can be represented as a simple probability.

Normalisation is useful when comparing options over variables from different domains, where the different variables have different associated probability distributions and utility vectors. The normalisation factor must be appropriate for all variables, so that the utility-normalised probability vectors fall within a common range. Note that in case of negative utility for a specific outcome, the utility-normalised probability for that outcome is also negative. In that sense, utility-normalised probability represents synthetic probability, not realistic probability.

Given a set of variables with associated probability distributions and utility vectors, let λ+ denote the greatest absolute utility among all utilities in all vectors. Thus, if the utility of greatest magnitude is negative, then λ+ takes its positive (absolute) value. The utility-normalised probability vector P^N_X is defined below.

Definition 4.8 (Utility-Normalised Probability Vector). Assume a random variable X with an associated projected probability distribution P_X and a utility vector λ_X that together produce the expected utility L_X. Let λ+ denote the greatest absolute utility from λ_X and from the other relevant utility vectors that will be considered when comparing different options. The utility-normalised probability vector produced by P_X, λ_X and λ+ is expressed as:

P^N_X(x) = L_X(x) / λ+ = λ_X(x) P_X(x) / λ+, ∀x ∈ X.    (4.15)
⊓⊔

Note that the utility-normalised probability vector P^N_X does not represent a probability distribution, and in general does not satisfy the additivity requirement of a probability distribution. The vector P^N_X represents relative probability, to be used in comparisons with other vectors of relative probability for the purpose of choosing between different options.

Similarly to the notion of utility-normalised probability, it is possible to define utility-normalised elemental specificity, vagueness and uncertainty.

Definition 4.9 (Utility-Normalised Elemental Measures). Assume a random variable X with an associated projected probability distribution P_X. Let ḃ^S_X(x) denote the elemental specificity of x, let ḃ^V_X(x) denote the elemental vagueness of x, and let u̇_X(x) denote the elemental uncertainty of x. Assume the utility vector λ_X, as well as λ+, the greatest absolute utility from λ_X and from the other relevant utility vectors that will be considered when comparing different options. The utility-normalised elemental specificity, vagueness and uncertainty are expressed as:

Utility-normalised elemental specificity:  ḃ^NS_X(x) = λ_X(x) ḃ^S_X(x) / λ+, ∀x ∈ X.    (4.16)

Utility-normalised elemental vagueness:    ḃ^NV_X(x) = λ_X(x) ḃ^V_X(x) / λ+, ∀x ∈ X.    (4.17)

Utility-normalised elemental uncertainty:  u̇^N_X(x) = λ_X(x) u̇_X(x) / λ+, ∀x ∈ X.    (4.18)
⊓⊔

Similarly to the additivity property of elemental specificity, vagueness and uncertainty in Eq.(4.9), we also have additivity of utility-normalised elemental specificity, vagueness and uncertainty, as expressed in Eq.(4.19):

ḃ^NS_X(x) + ḃ^NV_X(x) + u̇^N_X(x) = P^N_X(x).
(4.19) Having defined utility-normalised probability, it is possible to directly compare options without involving utilities, because utilities are integrated in the utilitynormalised probabilities. Similarly to the mass-sum for elemental specificity, vagueness and uncertainty of Eq.(4.10), it is possible to also describe a corresponding utility-normalised elemental mass-sum, as defined below. Definition 4.10 (Utility-Normalised Elemental Mass-Sum). Let X be a domain with hyperdomain R(X), and assume that the opinion ωX is specified. Also assume that a utility vector λ X is specified. Consider an element x ∈ R(X) with its elemental NS NV utility specificity ḃbX (x), elemental utility vagueness ḃbX (x) and elemental utility N uncertainty u̇uX (x). The utility-normalised elemental mass-sum for x is the triplet denoted ΞNE X (x) expressed as: NS NV bX (x), ḃbX (x), u̇uN Utility-normalised elemental mass-sum: ΞNE X (x) = ḃ X (x) . (4.20) ⊔ ⊓ Note that utility-normalised elemental specificity, vagueness and uncertainty do not represent realistic measures, and must be considered as purely synthetic. As an example of applying utility-normalised probability, consider two urns named X and Y that both contain 100 red and black balls, and you are asked to draw a ball at random from one of the urns. The possible outcomes are named x1 = ‘Red’ and x2 = ‘Black’ for urn X, and are similarly named y1 = ‘Red’ and y2 = ‘Black’ for urn Y. For urn X you are told that it contains 70 red balls, 10 black balls, and 20 balls that are either red or black. The corresponding opinion ωX is expressed as: bX (x1 ) = 7/10, aX (x1 ) = 1/2, Opinion ωX = bX (x2 ) = 1/10, aX (x2 ) = 1/2, (4.21) uX = 2/10. For urn Y you are told that it contains 40 red balls, 20 black balls, and 40 balls that are either red or black. The corresponding hyper opinion ωY is expressed as: 60 4 Decision-Making Under Uncertainty aY (y1 ) = 1/2, aY (y2 ) = 1/2, bY (y1 ) = 4/10, b Opinion ωY = Y (y2 ) = 2/10, uY = 4/10 (4.22) Imagine that you must selected one ball at random, either from urn X or Y, and you are asked to make a choice about which urn to draw it from in a single betting game. With option X, you receive $1000 if you draw ‘Black’ from urn X (i.e. if you draw x2 ). With option Y, you receive $500 if a you draw ‘Black’ from urn Y (i.e. if you draw y2 ). You receive nothing if you draw ‘Red’ in either option. Table 4.2 summarises the options in this game. Table 4.2 Betting options in situation involving utilities Option X, draw from urn X: Option Y, draw from urn Y: Red 0 0 Black $1000 $500 The elemental mass-sums for drawing ‘Black’ are different for options X and Y. However, the utility-normalised elemental mass-sums are equal, as illustrated in Figure 4.6. The normalisation factor used in this example is λ + = 1000, since $1000 is the greatest absolute utility. ; EX ( x2 ) : PX ( x2 ) ; ( y2 ) : E Y PY ( y2 ) ; NE X ( x2 ) : PXN ( x2 ) ;YNE ( y2 ) : PYN ( y2 ) Elemental mass-sums Utility-normalised elemental mass-sums P 0.0 0.1 Legend: 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 Elemental specificity Elemental uncertainty Fig. 4.6 Diagram for mass-sums and for utility-normalised mass-sums for options X and Y Note that utility-normalised probability is equal for options X and Y, expressed N as PN X (x2 ) = PY (y2 ). Considering utility-normalised probability alone is thus insufficient for determining which option is the best. 
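A short numerical sketch (names are mine) reproduces the comparison behind Figure 4.6, using Eq.(3.28) for projected probability, Eq.(4.8) for elemental uncertainty, and Definition 4.9 for the normalisation with λ+ = 1000:

# Binomial opinions on drawing 'Black' from urns X and Y, as in Eq.(4.21) and Eq.(4.22)
b_x2, u_X, a_x2 = 0.1, 0.2, 0.5          # urn X: belief, uncertainty, base rate for Black
b_y2, u_Y, a_y2 = 0.2, 0.4, 0.5          # urn Y
lam_X, lam_Y, lam_plus = 1000.0, 500.0, 1000.0   # utilities and greatest absolute utility

def mass_sum(b, u, a):
    # (elemental specificity, vagueness, uncertainty); binomial opinions carry no vagueness
    return (b, 0.0, a * u)

def normalise(triplet, lam):
    # utility-normalised elemental measures of Definition 4.9
    return tuple(m * lam / lam_plus for m in triplet)

ms_X, ms_Y = mass_sum(b_x2, u_X, a_x2), mass_sum(b_y2, u_Y, a_y2)
print(ms_X, sum(ms_X))            # (0.1, 0.0, 0.1), projected probability 0.2
print(ms_Y, sum(ms_Y))            # (0.2, 0.0, 0.2), projected probability 0.4
print(normalise(ms_X, lam_X))     # (0.1, 0.0, 0.1)
print(normalise(ms_Y, lam_Y))     # (0.1, 0.0, 0.1)  -> equal after normalisation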
The decision in this case must instead be based on the elemental specificity, which is greatest for option Y, expressed as ḃ^S_Y(y_2) > ḃ^S_X(x_2). In this situation it would not be meaningful to consider utility-normalised elemental specificity, which here is equal for both options. This is explained in detail in Section 4.4.

In the case of equal utilities for all options, normalisation is not needed; equivalently, it can simply be observed that the utility-normalised elemental measures are equal to the corresponding non-normalised ones, as expressed in Eq.(4.23):

When all options have equal utility:
  Projected probability:    P^N_X   = P_X
  Elemental specificity:    ḃ^NS_X  = ḃ^S_X
  Elemental vagueness:      ḃ^NV_X  = ḃ^V_X
  Elemental uncertainty:    u̇^N_X   = u̇_X
  Elemental mass-sum:       Ξ^NE_X  = Ξ^E_X    (4.23)

In the examples below, the utilities for all options are equal, so for convenience the diagrams show simple elemental mass-sums, which are equal to the corresponding utility-normalised elemental mass-sums.

4.4 Decision Criteria

It is possible to specify a set of criteria for making choices with opinions. The criteria follow the indicated order of priority.

1. The option with the highest utility-normalised probability is the best choice.
2. Given equal utility-normalised probability among all options, the option with the greatest elemental specificity is the best choice.
3. Given equal utility-normalised probability as well as equal elemental specificity among all options, the option with the least elemental uncertainty (and thereby the greatest elemental vagueness, whenever relevant) is the best choice.

The above criteria predict the choice of the majority of participants in the Ellsberg experiment described below, as well as the intuitive best choice in additional examples that combine various degrees of specificity, vagueness and uncertainty. The procedure for making decisions according to these criteria is illustrated in Figure 4.7 below.

Fig. 4.7 Decision-making process: compare utility-normalised probabilities, then elemental specificities, then elemental uncertainties, selecting an option as soon as one comparison is decisive

The steps of the decision-making process in Figure 4.7 are described in more detail below. We assume uniform utilities for all options; in the case of different utilities, the expected utilities can be computed by simple multiplication.

(a) The decision-maker must have the opinions for the relevant options to be compared. The elemental specificities, vaguenesses and uncertainties must be computed, and the utility-normalised probabilities must be represented in a normalised form as in Figure 4.6.
(b) Compare the utility-normalised probabilities of all relevant options.
(c) In case one option has the greatest utility-normalised probability, then that option is the best choice.
(d) Assuming that the relevant options have equal utility-normalised probabilities, compare the elemental specificities.
(e) In case one option has the greatest elemental specificity, then that option is the best choice.
(f) Assuming that all relevant options have equal utility-normalised probability as well as equal elemental specificity, compare the elemental uncertainties.
(g) In case one option has least elemental uncertainty (i.e. greatest elemental vagueness), then that option is the best choice. (h) Assuming that the relevant options are such that they have equal utility-normalised probability, equal elemental specificity and uncertainty, then it is challenging to make a decision. However, the composition of each elemental vagueness might be different, or the base rates might be different. In addition, it might be meaningful to consider differences between utility-normalised specificity, vagueness and uncertainty. There are thus multiple aspects that could be considered for specifying more detailed decision criteria, in addition to those specified above. 4.5 The Ellsberg Paradox 63 The next sections describe examples where the decision criteria defined above can be seen in action. 4.5 The Ellsberg Paradox This and the next sections serve as motivating examples for defining decision criteria in Section 4.4 which then defines how specificity, vagueness and uncertainty of opinions should be used for rational decision-making. The Ellsberg paradox[20] results from an experiment that shows how traditional probability theory is unable to explain typical human decision behaviour. Because traditional probability does not express degrees of vagueness and uncertainty it can not explain the results of the experiment. However, when representing the situation with opinions that do express degrees of vagueness and uncertainty the results of the experiment become perfectly rational. In the Ellsberg experiment you are shown an urn with 90 balls in it, and you are told that 30 balls are red and that the remaining 60 balls are either black or yellow. One ball is going to be selected at random and you are asked to make a choice in two separate betting games. Fig.4.8 shows the situation of the Ellsberg paradox represented in the form of a hyperdomain with corresponding belief mass distribution. R (:) 0 x4 0 x1 1/3 x5 0 x3 0 x2 0 x6 2/3 Fig. 4.8 Hyperdomain and belief mass distribution in the Ellsberg paradox The domain X and its hyper opinion are then expressed as: x1 : Red, x : Black, 2 x3 : Yellow, Hyperdomain R(X) = x4 : Red or Yellow, x5 : Red or Black, x6 : Black or Yellow, (4.24) 64 4 Decision-Making Under Uncertainty bX (x1 ) = 1/3, bX (x2 ) = 0, bX (x3 ) = 0, Hyper opinion ωX = bX (x4 ) = 0, bX (x5 ) = 0, b(x6 ) = 2/3, uX =0 aX (x1 ) = 1/3, aX (x2 ) = 1/3, aX (x3 ) = 1/3, aX (x4 ) = 2/3, aX (x5 ) = 2/3, aX (x6 ) = 2/3, (4.25) A quick look at ωX reveals that it contains some specific belief mass, some vague belief mass and no uncertainty mass, so it is a dogmatic and partially vague opinion. In betting game 1 you must choose between option 1A and 1B. With option 1A you receive $100 if a ‘Red’ ball is drawn, and you receive nothing of either a ‘Black’ or ‘Yellow’ ball is drawn. With option 1B you receive $100 if a black ball is drawn and you receive nothing if either a ‘Red’ or ‘Yellow’ ball is drawn. Table 4.3 summarises the options in game 1. Table 4.3 Game 1: Pair of betting options Option 1A: Option 1B: Red $100 0 Black 0 $100 Yellow 0 0 Make a note of your choice from betting game 1, and then proceed to betting game 2 where you are asked to choose between two new options based on the same random draw of a single ball from the same urn. With option 2A you receive $100 if either a ‘Red’ or ‘Yellow’ ball is drawn, and you receive noting of a ‘Black’ ball is drawn. 
With option 2B you receive $100 if either a ‘Black’ or ‘Yellow’ ball is drawn, and you receive nothing if a ‘Red’ ball is drawn. Table 4.4 summarises the options in game 2. Table 4.4 Game 2: Pair of betting options Option 2A: Option 2B: Red $100 0 Black 0 $100 Yellow $100 $100 Would you choose option 2A or 2B? Ellsberg reports that, when presented with these pairs of choices, most people select options 1A and 2B. Adopting the approach of expected utility theory this reveals a clear inconsistency in probability assessments. On this interpretation, when a person chooses option 1A over option 1B, he or she is revealing a higher subjective probability assessment of picking a ‘Red’ ball than a ‘Black’ ball. However, when the same person prefers option 2B over option 2A, he or she reveals that his or her subjective probability assessment of picking a ‘Black’ or ‘Yellow’ ball is higher than a ‘Red’ or ‘Yellow’ ball, which implies that picking a ‘Black’ ball has a higher probability assessment than a ‘Red’ ball. This seems 4.5 The Ellsberg Paradox 65 to contradict the probability assessment of game 1, which therefore represents a paradox. When representing the vagueness of the opinions the choices 1A and 2B of the majority become perfectly rational, as explained next. The utilities for options 1A and 1B are equal ($100) so their utility probabilities are equal to their projected probabilities that are used for decision modelling below. Projected probabilities are computed with Eq.(3.28) which for convenience is repeated below: ∑ PX (x) = a X (x/x j ) b X (x j ) + a X (x) uX . (4.26) x j ∈R(X) Relative base rates are computed with Eq.(2.10) which for convenience is repeated below: aX (x ∩ x j ) . (4.27) a X (x/x j ) = a X (x j ) The projected probabilities of x1 and x2 in game 1 are then: Option 1A: PX (x1 ) = a X (x1 /x1 ) b X (x1 ) = 1 · 31 = 1 3 Option 1B: PX (x2 ) = a X (x2 /x6 ) b X (x2 ) = 12 · 23 = 1 3 (4.28) Note that PX (x1 ) = PX (x2 ), which makes the the options equal from a purely 1st order probability point of view. However they are affected by different vague belief V mass as shown below. Elemental vague belief mass of x, denoted ḃbX (x), is computed with Eq.(4.3) which for convenience is repeated below. V ḃbX (x) = ∑ a X (x/x j ) b X (x j ) (4.29) x j ∈C (X) x j 6⊆x The elemental vague belief masses of x1 and x2 in game 1 are then: V Option 1A: ḃbX (x1 ) = 0 (4.30) Option 1B: V ḃbX (x2 ) 1 2 = a X (x2 /x6 ) b X (x6 ) = · 2 3 = 1 3 Given the absence of uncertainty, the additivity property of Eq.(4.9) allows us to S S compute the elemental specificities as ḃbX (x1 ) = 1/3 and ḃbX (x2 ) = 0. The elemental mass-sum diagram of the options in Ellsberg betting game 1 is illustrated in Figure 4.9. The difference between options 1A (x1 ) and 1B (x2 ) emerges with their different specific and vague belief masses. People clearly prefer choice 1A because it only has specificity and no vagueness, whereas choice 1B is affected by vagueness. We now turn to betting game 2 where the projected probabilities of x5 and x6 are: 66 4 Decision-Making Under Uncertainty Opt.1A, ; EX ( x1 ) : PX ( x1 ) Opt.1B, ; ( x2 ) : PX ( x2 ) E X P 0 1 9 2 9 Legend: 3 9 4 9 5 9 7 9 6 9 8 9 1 Elemental specificity Elemental vagueness Fig. 4.9 Elemental mass-sum diagram for game 1 in the Ellsberg paradox. 
Option 2A: PX (x5 ) = a X (x1 /x1 ) b X (x1 ) + a X (x3 /x6 ) b X (x6 ) = 1 · 13 + 12 · 32 = 2 3 Option 2B: PX (x6 ) = a X (x2 /x6 ) b X (x6 ) + a X (x3 /x6 ) b X (x6 ) = 12 · 23 + 21 · 23 = 23 (4.31) Note that PX (x5 ) = PX (x6 ), which makes the the options equal from a 1st order probability point of view. However they have different vague belief masses as shown below. Vague belief mass is computed with Eq.(4.3). The elemental vagueness of x5 and x6 in game 2 are: V Option 2A: ḃbX (x5 ) = aX (x3 /x6 ) bX (x6 ) = 12 · 23 = 1 3 (4.32) V Option 2B: ḃbX (x6 ) = 0 Given the absence of uncertainty, the additivity property of Eq.(4.9) allows us to S S compute the elemental specificities as ḃbX (x5 ) = 1/3 and ḃbX (x6 ) = 2/3. The elemental mass-sum diagram of the options in Ellsberg betting game 2 is illustrated in Figure 4.10. Opt.2A, ; EX ( x5 ) : PX ( x5 ) Opt.2B, ; EX ( x6 ) : PX ( x6 ) P 0 Legend: 1 9 2 9 3 9 4 9 5 9 6 9 7 9 Elemental specificity Elemental vagueness Fig. 4.10 Elemental mass-sum diagram for game 2 in the Ellsberg paradox. 8 9 1 4.6 Examples of Decision Under Vagueness and Uncertainty 67 The difference between options 2A and 2B emerges with their different elemental vagueness and specificity. People clearly prefer choice 2B (x6 ) because it has no vagueness, whereas choice 2A (x5 ) is affected by its vagueness of 1/3. We have shown that preferring option 1A over option 1B, and that preferring option 2B over option 2A is perfectly rational, and therefore does not represent a paradox within the opinion model. Other models of uncertain probabilities are also able to explain the Ellsberg paradox, such as e.g. Choquet capacities(Choquet 1953[9], Chateauneuf 1991[8]). However, the Ellsberg paradox only involves vagueness, not uncertainty. In fact, the Ellsberg paradox is too simplistic for teasing out the whole specter of specificity, vagueness and uncertainty of opinions. The next section presents examples where all aspects are taken into account. 4.6 Examples of Decision Under Vagueness and Uncertainty The three examples presented in this section involve specificity, vagueness and uncertainty. Different situations of varying degrees of specificity, vagueness and uncertainty can be clearly separated and compared when represented as subjective opinions. As far as we are aware of, no other model of uncertain reasoning is able to distinguish and correctly rank the described situations in the same way. Each example consist of a game where you are presented with two urns denoted X and Y , both with 90 balls, and you are asked to pick a random ball from one of the two urns, with the chance of winning $100 if you pick a yellow ball. 4.6.1 Decisions with Difference in Projected Probability In game 1 you receive the following information. For urn X you are told that it contains 90 balls that are either red, black or yellow. The corresponding hyper opinion ωX is expressed as: b X (x1 ) = 0, a X (x1 ) = 1/3, b X (x2 ) = 0, a X (x2 ) = 1/3, b X (x3 ) = 0, a X (x3 ) = 1/3, Hyper opinion 2 about X ωX = (4.33) b X (x4 ) = 0, a X (x4 ) = /3, 2 in game 1 b (x ) = 0, a (x ) = / 3 , X X 5 5 b X (x6 ) = 0, a X (x6 ) = 2/3, uX = 1. For urn Y you are told that 50 balls are red, 20 balls are black, and 20 balls are yellow. 
The corresponding hyper opinion ωY is expressed as: 68 4 Decision-Making Under Uncertainty bY (y1 ) = 4/9, bY (y2 ) = 3/9, bY (y3 ) = 2/9, Hyper opinion about Y ωY = bY (y4 ) = 0, bY (y5 ) = 0, in game 1 bY (y6 ) = 0, uY =0 aY (y1 ) = 1/3, aY (y2 ) = 1/3, aY (y3 ) = 1/3, aY (y4 ) = 2/3, aY (y5 ) = 2/3, aY (y6 ) = 2/3, (4.34) You must selected one ball at random, either from urn X or Y , and you are asked to make a choice about which urn to draw it from in a single betting game. You receive $100 if a ‘Yellow’ ball is drawn, and you receive nothing of either a ‘Red’ or ‘Black’ ball is drawn. Table 4.5 summarises the options in this game. Table 4.5 Game 1: Betting options in situation of different projected probabilities Option 1X, draw ball from urn X: Option 1Y, draw ball from urn Y: Red 0 0 Black 0 0 Yellow $100 $100 Without having conducted any experiment, when presented with this pair of choices, it seems obvious to select option 1X. The intuitive reason is that option 1X has the greatest projected probability for picking a ‘Yellow’ ball. Eq.(4.35) gives the computed results for projected probability. Projected probability is computed with Eq.(3.28). The projected probabilities of x3 and y3 in options 1X and 1Y are: Option 1X: PX (x3 ) = a X (x3 )uX = 13 · 1 = 1 3 (4.35) Option 1Y: PY (y3 ) = bY (y3 ) = 2 9 The elemental mass-sum diagram of ΞEX (x3 ) and ΞYE (y3 ) of options 1X and 1Y are visualised in Figure 4.11. Note that the utility for picking a red ball is equal for both E NE options, so that ΞEX (x3 ) = ΞNE X (x3 ) and ΞY (y3 ) = ΞY (y3 ), i.e. the utility-normalised and the non-normalised mass-sums are equal. So while Figure 4.11 shows elemental mass-sums, the corresponding utility-normalised elemental mass-sums are identical. It can be seen that PX (x3 ) > PY (y3 ) which indicates that the rational choice is option 1X. Note that option 1X has elemental uncertainty of 1/3 in contrast to option 1Y which has no uncertainty. In case of highly risk-averse participants, option 1Y might be preferable, but this should still be considered ‘irrational’. Game 1 shows that when options have different projected probability, the option with the greatest projected probability is to be preferred. 4.6 Examples of Decision Under Vagueness and Uncertainty Opt.1X, ; EX ( x3 ) : 69 PX ( x3 ) Opt.1Y, ; ( y3 ) : E Y PY ( y3 ) P 0 Legend: 1 9 2 9 3 9 4 9 5 9 6 9 7 9 8 9 1 Elemental specificity Elemental uncertainty Fig. 4.11 Visualising elemental mass-sum for options 1X and 1Y. 4.6.2 Decisions with Difference in Specificity In game 2 you receive the following information. For urn X you are told that 30 balls are red, and that 60 balls are either black or yellow. The corresponding hyper opinion ωX is expressed as: b X (x1 ) = 1/3, a X (x1 ) = 1/3, b X (x2 ) = 0, a X (x2 ) = 1/3, Hyper opinion a X (x3 ) = 1/3, b X (x3 ) = 0, about X a X (x4 ) = 2/3, ωX = (4.36) b X (x4 ) = 0, b X (x5 ) = 0, 2/3, in game 2 a (x ) = X 5 b X (x6 ) = 2/3, a X (x6 ) = 2/3, uX =0 For urn Y you are told that 10 balls are red, 10 balls are black, 10 balls are yellow, and that the remaining 60 balls are either red, black or yellow. 
The corresponding hyper opinion ωY is expressed as: bY (y1 ) = 1/9, aY (y1 ) = 1/3, bY (y2 ) = 1/9, aY (y2 ) = 1/3, bY (y3 ) = 1/9, aY (y3 ) = 1/3, Hyper opinion about Y aY (y4 ) = 2/3, ωY = (4.37) bY (y4 ) = 0, bY (y5 ) = 0, in game 2 aY (y5 ) = 2/3, bY (y6 ) = 0, aY (y6 ) = 2/3, 2 uY = /3 One ball is going to be selected at random either from urn X or Y , and you are asked to make a choice about which urn to draw it from in a single betting game. You receive $100 if a ‘Yellow’ ball is drawn, and you receive nothing of either a ‘Red’ or ‘Black’ ball is drawn. Table 4.6 summarises the options in this game. Without having conducted any experiment, when presented with this pair of choices, it appears obvious to select option 2Y. The intuitive reason us that option 2Y includes some specific belief mass on favour of ‘Yellow’ whereas with option 70 4 Decision-Making Under Uncertainty Table 4.6 Game 2: Betting options in vague and uncertain situation Option 2X, draw ball from urn X: Option 2Y, draw ball from urn Y: Red 0 0 Black 0 0 Yellow $100 $100 2X there is none. Below are the expressions for projected probability, elemental specificity, vague belief mass and elemental uncertainty mass. Projected probabilities are computed with Eq.(3.28), relative base rates are computed with Eq.(2.10), and elemental uncertainty with Eq.(4.3). The projected probabilities of x3 and y3 in options 2X and 2Y are: Option 2X: PX (x3 ) = a X (x3 /x6 ) b X (x6 ) + a X (x3 )uX = 12 · 94 + 13 · 39 = 1 3 (4.38) = 13 · 1 Option 2Y: PY (y3 ) = aY (y3 ) uY = 1 3 Note that PX (x3 ) = PY (y3 ) which makes options 2X and 2Y equal from a purely 1st order probability point of view. However they have different elemental specificity, elemental vagueness, and elemental uncertainty as shown below. Elemental specificities of x3 and y3 are: S Option 2X: ḃbX (x3 ) = 0 (4.39) S Option 2Y: ḃbY (y3 ) = bY (y3 ) = 1 9 Elemental vaguenesses of x3 and y3 are: V Option 2X: ḃbX (x3 ) = a X (x3 /x6 ) b X (x6 ) = 12 · 23 = 1 3 (4.40) V Option 2Y: ḃbY (y3 ) = 0 Elemental uncertainty masses of x3 and y3 are: Option 2X: u̇uX (x3 ) = 0 (4.41) Option 2Y: u̇uY (y3 ) = aY (y3 )uY = 13 · 69 = 2 9 Note that the additivity property of Eq.(4.9) holds for x3 and y3 . The elemental mass-sum diagram of ΞEX (x3 ) and ΞYE (y3 ) of options 2X and 2Y are visualised in Figure 4.12. The difference between options 2X and 2Y emerges with their difference in elemental specificity, where the option 2Y has the greatest specificity. It also means that option 2Y has the least sum of elemental vagueness and uncertainty, and therefore is the preferable option. Game 2 shows that when projected probabilities are equal, but the elemental specificities are different, then the option with the greatest specificity is the best 4.6 Examples of Decision Under Vagueness and Uncertainty Opt.2X, ; EX ( x3 ) : PX ( x3 ) Opt.2Y, ; ( y3 ) : PY ( y3 ) E Y 0 Legend: 71 P 1 9 2 9 3 9 4 9 5 9 6 9 7 9 8 9 1 Elemental specificity Elemental vagueness Elemental uncertainty Fig. 4.12 Visualising elemental mass-sum for options 2X and 2Y. choice. Option 2Y is therefore the rational preferred choice because it clearly has the greatest elemental specificity among the two. 4.6.3 Decisions with Difference in Vagueness and Uncertainty In game 3 you receive the following information. For urn X you are told that 20 balls are red, that 40 balls are either black or yellow, and that the remaining 30 balls are either red, black or yellow. 
For urn Y you are only told that the 90 balls in the urn are either red, black or yellow. The corresponding hyper opinions are expressed as: b X (x1 ) = 2/9, a X (x1 ) = 1/3, b X (x2 ) = 0, a X (x2 ) = 1/3, b X (x3 ) = 0, a X (x3 ) = 1/3, Hyper opinion a X (x4 ) = 2/3, about X ωX = (4.42) b X (x4 ) = 0, b X (x5 ) = 0, 2 a (x ) = / 3 , in game 3 X 5 b X (x6 ) = 4/9, a X (x6 ) = 2/3, uX = 3/9 bY (y1 ) = 0, aY (y1 ) = 1/3, bY (y2 ) = 0, aY (y2 ) = 1/3, bY (y3 ) = 0, aY (y3 ) = 1/3, Hyper opinion 2 about Y ωY = (4.43) bY (y4 ) = 0, aY (y4 ) = /3, bY (y5 ) = 0, aY (y5 ) = 2/3, in game 3 bY (y6 ) = 0, aY (y6 ) = 2/3, uY =1 One ball is going to be selected at random either from urn X or from urn Y , and you are asked to make a choice about which urn to draw it from in a single betting 72 4 Decision-Making Under Uncertainty game. You receive $100 if a ‘Yellow’ ball is drawn, and you receive nothing of either a ‘Red’ or ‘Black’ ball is drawn. Table 4.7 summarises the options in this game. Table 4.7 Game 3: Betting options in vague and uncertain situation Red 0 0 Option 3X, draw ball from urn X: Option 3Y, draw ball from urn Y: Black 0 0 Yellow $100 $100 Without having conducted any experiment, when presented with this pair of choices, it seems obvious to select option 2X. The intuitive reason us that option 3X is affected by less elemental uncertainty than option 3Y. Below are the expressions for projected probability and elemental specificity. Projected probabilities are computed with Eq.(3.28), relative base rates are computed with Eq.(2.10), and elemental uncertainty with Eq.(4.3). The projected probabilities of x3 and y3 in options 3X and 3Y are: Option 3X: PX (x3 ) = a X (x3 /x6 ) b X (x6 ) + a X (x3 )uX = 12 · 94 + 13 · 39 = 1 3 (4.44) = 13 · 1 Option 3Y: PY (y3 ) = aY (y3 ) uY = 1 3 S Option 2X: ḃbX (x3 ) = 0 (4.45) Option 2Y: S ḃbY (y3 ) =0 Note that PX (x3 ) = PY (y3 ) which makes options 3X and 3Y equal from a purely 1st order probability point of view. In addition we have equal elemental specificity S S expressed by ḃbX (x3 ) = ḃbY (y3 ). However they have different elemental vagueness and elemental uncertainty as shown below. Elemental vagueness of x3 and y3 are: V Option 3X: ḃbX (x3 ) = a X (x3 /x6 ) b X (x6 ) = 21 · 49 = 2 9 (4.46) Option 3Y: V ḃbY (y3 ) =0 Elemental uncertainty of x3 and y3 are: Option 3X: u̇uX (x3 ) = a X (x3 )uX = 13 · 39 = 1 9 Option 3Y: u̇uY (y3 ) = aY (y3 )uY = 31 · 1 = 1 3 (4.47) Note that the additivity property of Eq.(4.9) holds for x3 and y3 . The elemental mass-sum diagram of ΞEX (x3 ) and ΞYE (y3 ) of options 3X and 3Y are visualised in Figure 4.13. What is interesting in game 3 is that elemental vagueness and elemental uncertainty for x3 and y3 respectively are different. Vagueness is preferable over uncertainty, because vagueness is based on evidence, whereas uncertainty reflects lack 4.7 Entropy in the Opinion Model 73 Opt.3X, ; EX ( x3 ) : PX ( x3 ) Opt.3Y, ; ( y3 ) : PY ( y3 ) E Y P 0 Legend: 1 9 2 9 3 9 4 9 5 9 6 9 7 9 8 9 1 Elemental vagueness Elemental uncertainty Fig. 4.13 Visualising elemental mass-sum for options 3X and 3Y. of evidence. The option with the least uncertainty, and thereby the option with the greatest vagueness is therefore preferable. Game 3 shows that when projected probabilities are equal, and the elemental specificities are also equal (zero in this case), but the elemental uncertainty and vagueness are different, then the option with the least elemental uncertainty is the best choice. 
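The three decision criteria of Section 4.4 can be sketched as a simple comparison procedure (a hypothetical helper, not taken from the text); applied to the elemental mass-sums of games 1 to 3 it selects options 1X, 2Y and 3X respectively:

from fractions import Fraction as F

def better_option(ms_a, ms_b, names):
    """Apply the criteria of Section 4.4 to two elemental mass-sums
    (specificity, vagueness, uncertainty), assuming equal utilities, so that
    utility-normalised probability equals projected probability (Eq.(4.23))."""
    pa, pb = sum(ms_a), sum(ms_b)
    if pa != pb:                      # criterion 1: greatest utility-normalised probability
        return names[0] if pa > pb else names[1]
    if ms_a[0] != ms_b[0]:            # criterion 2: greatest elemental specificity
        return names[0] if ms_a[0] > ms_b[0] else names[1]
    if ms_a[2] != ms_b[2]:            # criterion 3: least elemental uncertainty
        return names[0] if ms_a[2] < ms_b[2] else names[1]
    return "difficult decision"

# Elemental mass-sums of the 'Yellow' outcome in games 1-3 (exact fractions)
print(better_option((F(0), F(0), F(1, 3)), (F(2, 9), F(0), F(0)),  ("1X", "1Y")))  # 1X
print(better_option((F(0), F(1, 3), F(0)), (F(1, 9), F(0), F(2, 9)), ("2X", "2Y")))  # 2Y
print(better_option((F(0), F(2, 9), F(1, 9)), (F(0), F(0), F(1, 3)), ("3X", "3Y")))  # 3X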
Option 3X is therefore the rational preferred choice because it clearly has the the least elemental uncertainty among the two. 4.7 Entropy in the Opinion Model Information theory [82] provides a formalism for modeling and measuring 1st order uncertainty about guessing the outcome of random events that are governed by probability distributions. The amount of information associated with a random variable is called entropy, where high entropy indicates that it is difficult to predict outcomes, and low entropy indicates easy predictions. The amount of information associated with a given outcome is called surprisal, where high surprisal indicates an a priori unlikely outcome, and low surprisal indicates an a priori likely outcome. People tend to be risk-averse [20], so they prefer to make decisions under low entropy and low surprisal. For example, most people prefer the option of receiving $1,000 over the option of an all-or-nothing coin flip for $2,000. The expected utility is $1,000 in both options, but the former option exposes the participant to 0 bits surprisal (i.e. no surprisal), and the latter option exposes him or her to 1 bit surprisal. Given that the expected utility otherwise is equal (as in the example above), people prefer betting with the lowest possible exposure to surprisal. Belief and uncertainty are intimately linked with regard to information theory in the opinion model. In the sections below, we introduce standard notions of information surprisal and entropy from classical information theory, before extending these notions to opinions. A more detailed discussion and treatment can be found, e.g., in [66]. 74 4 Decision-Making Under Uncertainty 4.7.1 Outcome Surprisal Surprisal, aka. self-information, is a measure of the information content associated with the outcome of random variable under a given probability distribution. The measuring unit of surprisal can be bits, nats, or hartleys, depending on the base of the logarithm used in its calculation. When logarithm base 2 is used, the unit is bits, which is also used below. Definition 4.11 (Surprisal). The surprisal (or self-information) of an outcome x of a discrete random variable X with probability distribution p X is expressed as: IX (x) = − log2 (ppX (x)) (4.48) ⊔ ⊓ Surprisal measures the degree to which an outcome is surprising. An outcome is more surprising the less likely it is to happen. When the base of the logarithm is 2, as in Eq.(4.48), the surprisal is measured in bits. The more surprising an outcome is, the more informative it is, and the more bits it has. For example, when considering a fair coin, the probability is 0.5 for both ‘heads’ and ‘tail’, so each time the coin lands with ‘heads’ or ‘tail’, the observed amount of information is I(tossing fair coin) = − log2 (0.5) = log2 (2) = 1 bit of information. When considering a fair dice, the probability is 1/6 for each face, so each time the dice produces one of its six faces, the observed amount of information is I(throwing fair dice) = − log2 (1/6) = log2 (6) = 2.585 bits. In case of an unfair dice where the probability of ‘six’ is only 1/16 (as opposed to 1/6 for a fair dice), throwing a ‘six’ amounts to I(‘six’) = − log (1/16) = log (16) = 2 2 4 bits surprise. In information theory, surprisal of an outcome is completely determined by the probability that it happens. Opinion outcome surprisal defined below. Definition 4.12 (Opinion Outcome Surprisal). Assume a (hyper) opinion ωX where the variable X takes its values from the hyperdomain R(X). 
Given that the projected probability of outcome x is PX (x), the opinion surprisal of outcome x is: Opinion Outcome Surprisal: IPX (x) = − log2 (PX (x)) (4.49) ⊔ ⊓ In the opinion model, surprisal of an outcome can be partially specific, vague or uncertain, in any proportion. These concepts are defined below. Definition 4.13 (Specificity, Vagueness and Uncertainty Surprisal). Assume a (hyper) opinion ωX where the variable X takes its values from the hyperdomain R(X). Given that the projected probability of outcome x is PX (x), the specificity, vagueness and uncertainty surprisals of outcome x are expressed as: 4.7 Entropy in the Opinion Model 75 S Specificity Surprisal: ISX (x) = ḃbX (x)IPX (x) PX (x) Vagueness Surprisal: IV X (x) = ḃbX (x)IPX (x) PX (x) (4.51) Uncertainty Surprisal: IU X (x) = u̇uX (x)IPX (x) PX (x) (4.52) (4.50) V ⊔ ⊓ Note that opinion surprisal of an outcome consists of the sum of specificity, vagueness and uncertainty surprisal, expressed as: U P ISX (x) + IV X (x) + IX (x) = IX (x). (4.53) The decision criteria described in Section 4.4 are expressed in terms of projected probability consisting of elemental specificity, vagueness and uncertainty. Given that opinion outcome surprisal is a function of the same concepts, the same decision criteria can equivalently be articulated in terms of outcome surprisal consisting of specificity, vagueness, and uncertainty surprisal. However, using projected probability has obvious advantages when including utility in the decision process, because their product then produces the expected utility directly. 4.7.2 Opinion Entropy Information entropy can be interpreted as expected surprisal, and is the sum over products of surprisal and probability of outcomes, as defined below. Definition 4.14 (Entropy). The entropy, denoted H(X), of a random variable X that takes its values from a domain X, is the expected surprisal expressed as: H(X) = ∑ p X (x) IX (x) = − ∑ p X (x) log2 (ppX (x)) x∈X (4.54) x∈X ⊔ ⊓ Entropy measures the expected information carried with a random variable. In information theory, entropy of a random variable is decided by the probability (1st order uncertainty) of its outcome in one test. The more evenly the outcome probabilities of a random variable are distributed, the more entropy the random variable has. If one outcome is absolutely certain, then the variable has zero entropy. The opinion entropy of a (hyper) variable X with an associated opinion ωX is simply the entropy computed over the projected probability distribution PX , similarly to Eq.(4.54). 76 4 Decision-Making Under Uncertainty Definition 4.15 (Opinion Entropy). Assume a (hyper) opinion ωX where the variable X takes its values from the hyperdomain R(X). The opinion entropy, denoted HP (ωX ), is the expected surprisal expressed as: HP (ωX ) = − ∑ PX (x) log2 (PX (x)) (4.55) x∈X ⊔ ⊓ Opinion entropy is insensitive to change in the uncertainty mass of an opinion as long as the projected probability distribution PX remains the same. Proposition 4.1. Let ωXA and ωXB be two opinions, such that uAX > uBX and PAX = PBX , then HP (ωXA ) = HP (ωXB ). Proof. The proposition’s validity follows from the fact that HP is determined by the ⊔ projected probability distributions, which are equal for ωXA and ωXB . ⊓ In order to account for difference in uncertainty, as well as in vagueness, it is necessary to introduce specificity entropy, vagueness entropy and uncertainty enS tropy. 
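Before defining these entropy measures, the surprisal quantities introduced above can be checked numerically. The following sketch is illustrative only; it transcribes Definitions 4.11–4.13 and Eq.(4.53), and reuses the elemental masses of x3 from game 3 in Section 4.4.

```python
import math

def surprisal(p):                 # Definition 4.11, measured in bits
    return -math.log2(p)

# Classical examples from the text above
print(surprisal(0.5))    # fair coin:            1.0 bit
print(surprisal(1/6))    # fair die:             ~2.585 bits
print(surprisal(1/16))   # unfair die, 'six':    4.0 bits

# Opinion outcome surprisal and its decomposition (Definitions 4.12 and 4.13),
# using the elemental masses of x3 in game 3: specificity 0, vagueness 2/9,
# uncertainty 1/9, projected probability 1/3.
P, spec, vag, unc = 1/3, 0.0, 2/9, 1/9
I_P = surprisal(P)
I_S = (spec / P) * I_P
I_V = (vag  / P) * I_P
I_U = (unc  / P) * I_P
assert abs((I_S + I_V + I_U) - I_P) < 1e-12   # additivity, Eq.(4.53)
print(I_P, I_S, I_V, I_U)    # ~1.585  0.0  ~1.057  ~0.528
```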
These entropy concepts can be computed based on elemental specificity ḃbX , V elemental vagueness ḃbX , and elemental uncertainty u̇uX , as defined in Section 4.1. Definition 4.16 (Specificity Entropy). Assume a (hyper) opinion ωX where the variable X takes its values from the hyperdomain R(X). The specificity entropy, denoted HS (ωX ), is the expected surprisal from elemental specificity, expressed as: HS (ωX ) = − ∑ ḃbX (x) log2 (PX (x)) S (4.56) x∈X ⊔ ⊓ Definition 4.17 (Vagueness Entropy). Assume a (hyper) opinion ωX where the variable X takes its values from the hyperdomain R(X). The vagueness entropy, denoted HV (ωX ), is the expected surprisal from elemental vagueness, expressed as: HV (ωX ) = − ∑ ḃbX (x) log2 (PX (x)) V (4.57) x∈X ⊔ ⊓ Definition 4.18 (Uncertainty Entropy). Assume a (hyper) opinion ωX where the variable X takes its values from the hyperdomain R(X). The uncertainty entropy, denoted HU (ωX ), is the expected surprisal expressed as: HU (ωX ) = − ∑ u̇uX (x) log2 (PX (x)) (4.58) x∈X ⊔ ⊓ 4.8 Conflict Between Opinions 77 Note the additivity property of the above defined entropy concepts. HS (ωX ) + HV (ωX ) + HU (ωX ) = HP (ωX ). (4.59) Thus, for a given opinion entropy, there is a continuum of sums of specificity, vagueness and uncertainty entropy. The structure of the sum reflects the type of evidence on which the entropy is based. In case of an urn of 100 balls where you only known that the balls can be red or black, the entropy for you is 2 bits with regard to the variable of picking a red or a black ball. This entropy consists solely of 2 bits uncertainty entropy. In another case, where you learn that there are exactly 50 red balls and 50 black balls, the entropy for you is still 2 bits, however in this case this entropy consists solely of 2 bits specificity entropy. In the opinion model, entropy consist of the three different types of entropy as shown in Eq.(4.59), which thereby gives a more informative expression of entropy than classical information entropy. When predicting the outcome of a variable with a given entropy, specificity entropy is preferable over vagueness entropy, which is turn is preferable over uncertainty entropy. This is in line with the decision criteria defined in Section 4.4. In case of two variables with equal entropy containing the exact same sum of specificity, vagueness and uncertainty entropy, the two variables might still have different structure of vagueness entropy, and thereby be different in nature. However, this topic is outside the scope of the current presentation. The cross entropy of an opinion measures the difference between the projected probability distribution and the base rate distribution. Definition 4.19 (Base-Rate to Projected-Probability Cross Entropy). The baserate to projected-probability cross entropy of a discrete random variable X that takes its values from domain X, denoted H BP (ωX ), is the base-rate expected projected probability expressed as: HBP (ωX ) = − ∑ aX (x) log2 (PX (x)) (4.60) x∈X ⊔ ⊓ For a given entropy, the cross entropy is maximum when the projected probability and base rate have equal distributions. 4.8 Conflict Between Opinions A fundamental assumption behind subjective logic is that different agents can have different opinions about the same variable. This also reflects the subjective reality of how we perceive the world we live in. For decision making however, having different opinions about the same thing can be problematic because it makes it difficult to agree on the best course of action. 
When it can be assumed that ground truth exists (without being directly observable), the fact that agents have different opinions can be interpreted as an indicator that one or multiple agents are wrong. In such situations it can be meaningful to apply strategies to revise opinions, such as the trust revision described in Section 13.5. The degree of conflict, abbreviated DC, is a measure of the difference between opinions, and can be used in strategies for dealing with situations of difference between opinions about the same target.

Let B and C be two agents that have their respective opinions ω_X^B and ω_X^C about the same variable X. The most basic measure of conflict between the two opinions ω_X^B and ω_X^C is the projected distance, denoted PD, expressed by Eq.(4.61).

Projected Distance: PD(ω_X^B, ω_X^C) = (1/2) ∑_{x∈X} |P_X^B(x) − P_X^C(x)|        (4.61)

The property that PD ∈ [0, 1] should be explained. Obviously PD ≥ 0. Furthermore, given that ∑ P_X^B(x) + ∑ P_X^C(x) = 2, independently of the cardinality of X, it can be observed that PD ≤ 1. The case PD = 0 occurs when the two opinions have equal projected probability distributions, in which case the opinions are non-conflicting (even though they might be different). The maximum value PD = 1 occurs e.g. in case of two absolute binomial opinions with opposite projected probabilities. An equivalent representation of the projected distance is given in Eq.(4.62).

Equivalent Projected Distance: PD(ω_X^B, ω_X^C) = max_{x∈X} |P_X^B(x) − P_X^C(x)|        (4.62)

That Eq.(4.61) and Eq.(4.62) are equivalent becomes evident from the fact that the greatest difference |P_X^B(x_i) − P_X^C(x_i)| must be balanced by an equal amount of projected probability difference for other indexes x_j, x_k, . . ., due to the additivity of ∑ P_X^B and ∑ P_X^C, so that the sum of differences is double the maximum difference.

A large PD does not necessarily indicate conflict, because the potential conflict is diffused in case one (or both) opinions have high uncertainty. The more uncertain one or both opinions are, the more tolerance for a large PD should be given. Tolerance for large PD in case of high uncertainty reflects the fact that uncertain opinions carry little weight in the fusion process. A natural measure of the common certainty between two opinions ω_X^B and ω_X^C is their conjunctive certainty, denoted CC:

Conjunctive Certainty: CC(ω_X^B, ω_X^C) = (1 − u_X^B)(1 − u_X^C)        (4.63)

It can be seen that CC ∈ [0, 1], where CC = 0 means that one or both opinions are vacuous, and CC = 1 means that both opinions are dogmatic, i.e. have zero uncertainty mass. The degree of conflict (DC) is simply defined as the product of PD and CC.

Definition 4.20 (Degree of Conflict). Assume two agents B and C with their respective opinions ω_X^B and ω_X^C about the same variable X. DC(ω_X^B, ω_X^C) denotes the degree of conflict between ω_X^B and ω_X^C, which is expressed as:

Degree of Conflict: DC(ω_X^B, ω_X^C) = PD(ω_X^B, ω_X^C) · CC(ω_X^B, ω_X^C)        (4.64) ⊔⊓

As an example we consider the two binomial opinions ω_X^B1 = (0.05, 0.15, 0.80, 0.90) and ω_X^C1 = (0.68, 0.22, 0.10, 0.90). Figure 4.14 shows a screenshot of the visualisation demonstration applet of subjective logic, showing the two example opinions ω_X^B1 and ω_X^C1 as points in the opinion triangle on the left, with their equivalent PDFs on the right. In this case we get DC(ω_X^B1, ω_X^C1) = 0, meaning that there is no conflict.
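A minimal numerical sketch of this example is given below, assuming binomial opinions written as (b, d, u, a) tuples with projected probability P = b + a·u; the function names are illustrative and not part of the formal model.

```python
def projected(b, d, u, a):
    """Projected probability of a binomial opinion."""
    return b + a * u

def degree_of_conflict(opB, opC):
    """Degree of conflict between two binomial opinions, following Eq.(4.61)-(4.64)."""
    PB, PC = projected(*opB), projected(*opC)
    # Projected distance: for a binary domain the sum over x and not-x,
    # divided by 2, reduces to |PB - PC|.
    PD = abs(PB - PC)
    CC = (1 - opB[2]) * (1 - opC[2])   # conjunctive certainty, Eq.(4.63)
    return PD * CC                     # Eq.(4.64)

# Example above: different belief masses, but equal projected probability 0.77
omega_B1 = (0.05, 0.15, 0.80, 0.90)
omega_C1 = (0.68, 0.22, 0.10, 0.90)
print(round(degree_of_conflict(omega_B1, omega_C1), 6))   # 0.0, i.e. no conflict
```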
u 10 9 [B,X] 8 7 6 5 [C,X] 4 3 [C,X] d Opinion on [B,X] belief 0.05 disbelief 0.15 uncertainty 0.80 base rate 0.90 probability 0.77 P a Opinion on [C,X] belief 0.68 disbelief 0.22 uncertainty 0.10 base rate 0.90 probability 0.77 b 2 1 0 [B,X] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 Beta Probability density Fig. 4.14 Example of opinions ωXB1 and ωXC1 where DC(ωXB1 , ωXC1 ) = 0.0. The reason why DC(ωXB1 , ωXC1 ) = 0 is because PD(ωXB1 , ωXC1 ) = 0. In terms of Definition 4.20 there is thus no conflict between these opinions, although their belief masses are quite different. The next example shows two binomial opinions ωXB2 = (0.05, 0.15, 0.80, 0.10) and ωXC2 = (0.68, 0.22, 0.10, 0.10) that have the same belief masses as in the previous example, but with different base rate. In this example there is some conflict between the opinions, which demonstrates that the degree of conflict is influenced simply by changing the base rate. The degree of conflict can be computed according to Eq.(4.64) as: 80 4 Decision-Making Under Uncertainty DC(ωXB2 , ωXC2 ) = PD(ωXB2 , ωXC2 ) · CC(ωXB2 , ωXC2 ) (4.65) = 0.56 · (1.00 − 0.80)(1.00 − 0.10) = 0.10. Figure 4.15 shows a screenshot of the visualisation of the binomial opinions ωXB2 and ωXC2 . The opinions are shown as points in the opinion triangle on the left, with their equivalent PDFs on the right. u 10 9 [B,X] 8 7 6 5 4 3 [C,X] d aP Opinion on [B,X] 0.05 belief disbelief 0.15 uncertainty 0.80 base rate 0.10 probability 0.13 P Opinion on [C,X] 0.68 belief disbelief 0.22 uncertainty 0.10 base rate 0.10 probability 0.69 b [C,X] [B,X] 2 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 Beta Probability density Fig. 4.15 Example of opinions ωXB2 and ωXC2 where DC(ωXB2 , ωXC2 ) = 0.1 Although the conflict might seem high due to the very different projected probabilities, the fact that ωXB2 is highly uncertain diffuses the potential conflict, so that the degree of conflict only becomes DC(ωXB2 , ωXC2 ) = 0.10. The notion of degree of conflict as described here only provides a relatively course measure of conflict between two opinions. For example, two opinions with very different PDFs can have zero conflict, as shown in Figure 4.14. Because opinions are multi-dimensional a more complete expression for conflict would necessarily require multiple parameters. However, this would partially defeat the purpose of having a simple measure of conflict between opinions. The degree of conflict expressed by Definition 4.20 provides a simple way of assessing conflict between two opinions, which is is useful e.g. for trust revision. Chapter 5 Principles of Subjective Logic This chapter compares subjective logic with other relevant reasoning frameworks, and gives an overview of general principles of subjective logic. 5.1 Related Frameworks for Uncertain Reasoning 5.1.1 Comparison with Dempster-Shafer Belief Theory Dempster-Shafer Belief Theory (DST), also known as evidence theory, has its origin in a model for upper and lower probabilities proposed by Dempster in 1960. Based on Dempster’s model, Shafer later proposed a model for expressing beliefs [81]. The main idea behind DST is to abandon the additivity principle of probability theory, i.e. that the sum of probabilities on all pairwise exclusive possibilities must add up to one. DST uses the term ‘frame of discernment’, or ‘frame’ for short, to denote the set of exclusive possible states, which is equivalent to a domain in subjective logic. 
Belief theory gives observers the ability to assign so-called belief mass to any subset of the frame including the whole frame itself. The advantage of this approach is that uncertainty about the probabilities, i.e. the lack of evidence to support any specific probability, can be explicitly expressed by assigning belief mass to the whole frame or to arbitrary subsets of the frame. Shafer’s book [81] describes many aspects of belief theory, where the two main elements are 1) a flexible way of expressing beliefs, and 2) a method for combining beliefs, commonly known as Dempster’s rule. The way DST expresses beliefs is highly expressive, and extends the notion of probability. By using beliefs it is possible to provide the argument “I don’t know” as input to a reasoning model, which is not possible with probabilities. This capability has made DST quite popular among researchers and practitioners. 81 82 5 Principles of Subjective Logic The opinion representation in subjective logic is based on the representation of belief functions in DST. The difference between subjective opinions and DST belief functions is that opinions include base rates, while DST belief functions do not. Consider a domain X with its hyperdomain R(X)and powerset P(X). Recall that X ∈ P(X). Let x denote a specific value of the hyperdomain R(X) or of the powerset P(X). In DST, belief mass on value x is denoted m (x). The equality between the belief masses of DST and the belief masses and uncertainty mass of subjective opinions is given by Eq.(5.1). m (x) = b X (x) ∀x ∈ R(X) (5.1) m (X) = uX Syntactically, the belief/uncertainty representations of DST and subjective logic are thus equivalent. Their interpretation however is different. In subjective logic there can be no belief mass assigned to the domain X itself. This interpretation corresponds to the Dirichlet model, where only observations of values of X are counted as evidence. The domain X can not be observed, so it can not be counted as evidence. The main application area of DTS as described in the literature has been applications of belief fusion, where Dempster’s rule is the classical operator [81]. There has been considerable controversy around assessing the adequacy of operators for belief fusion, especially related to Dempster’s rule. The traditional interpretation of Dempster’s rule is that it fuses separate argument beliefs from independent sources into a single belief. There are well known examples where Dempster’s rule produces counter-intuitive and clearly wrong results when interpreted in this way, especially in case of strong conflict between the input argument beliefs [92], but also in case of harmony between the input argument beliefs [15]. Motivated by this observation, numerous authors have proposed alternative methods for fusing beliefs [11, 14, 17, 38, 41, 62, 71, 84, 91]. These operators are not only formally different, they also model very different situations, but the authors often do not specify the type of situations they model. This confusion can be seen as the tragedy of belief theory for two reasons. Firstly, instead of advancing belief theory, researchers have been trapped in the search for a solution to the same problem for 30 years. Secondly, this controversy has given belief theory a bad taste despite its obvious advantages for representing ignorance and uncertainty. 
The fact that different situations require different operators and modeling assumptions has often been ignored in the belief theory literature, and has therefore been a significant source of confusion for many years [48]. The equivalent operator for Dempster’s rule in subjective logic is the constraint fusion operator described in Section 11.2. We prove in Section 11.2.2 that the constraint fusion operator (end thereby also Dempster’s rule) models situations of frequentist stochastic constraints, which through the correspondence between frequentist and subjective probabilities also applies to general constraints. 5.1 Related Frameworks for Uncertain Reasoning 83 5.1.2 Comparison with Imprecise Probabilities The Imprecise Dirichlet Model (IDM) for multinomial data is described by Walley [88] as a method for determining upper and lower probabilities. The model is based on setting the minimum and maximum base rates in the Beta or Dirichlet PDF for each possible value in the domain. The expected probability resulting from assigning the maximum base rate (i.e. equal to one) to the probability of a value in the domain produces the upper probability, and the expected probability resulting from assigning a zero base rate to a value in the domain produces the lower probability. The upper and lower probabilities are interpreted as the upper and lower bounds for the relative frequency of the outcome. While this is an interesting interpretation of the Dirichlet PDF, it can not be taken literally, as shown below. Let r X represent the evidence for the Dirichlet PDF, and let the non-informative prior weight be denoted by W = 2. According to the Imprecise Dirichlet Model (IDM) [88] the upper and lower probabilities for a value x ∈ X are defined as: IDM Upper probability: E(x) = r X (x) +W , ∀x ∈ X W + ∑ki=1 r X (xi ) (5.2) IDM Lower probability: E(x) = r X (x) , ∀x ∈ X W + ∑ki=1 r X (xi ) (5.3) It can easily be shown that the IDM Upper and IDM Lower values can not be literally interpreted as upper and lower bounds for for the probability. For example, assume a bag contains 9 red marbles and 1 black marble, meaning that the relative frequencies of red and black marbles are p(red) = 0.9 and p(black) = 0.1. The a priori weight is set to W = 2. Assume further that an observer picks one marble which turns out to be black. According to Eq.(5.3) the lower probability is then E(black) = 31 . It would be incorrect to literally interpret this value as the lower bound for the probability because it obviously is greater than the actual relative frequency of black balls. In other words, if E(black) > p(black) then E(black) can impossibly be the lower bound. This case shows that the upper and lower probabilities defined by the IDM should be interpreted as a rough probability interval, because it must allow for the possibility that actual probabilities (relative frequencies) can be outside the range. 5.1.3 Comparison with Fuzzy Logic The domains for variables in fuzzy logic consist of terms/categories that are vague in nature and that have partially overlapping semantics. For example, in case the variable is ‘Height of a person’ then possible values can be ‘short’, ‘average’ or ‘tall’. The fuzzy aspect is that for a specific height it can be uncertain whether the person should be considered small, average or tall. A person measuring 182cm might be 84 5 Principles of Subjective Logic considered to be somewhat average and somewhat tall. 
In fuzzy logic this is expressed by fuzzy membership functions, whereby a person could be considered to be 0.5 average and 0.5 tall, depending on the circumstances. Note that in fuzzy logic, the height of a person can be measured in an exact and crisp way, whereas variable domains consist of terms/categories that are fuzzy/vague in nature. In subjective logic on the other hand the domains consist of terms/categories that are considered crisp in nature, whereas subjective opinions contain belief mass and uncertainty mass that express uncertainty and vagueness. This difference between fuzzy logic and subjective logic is illustrated in Figure 5.1. Fuzzy logic 250 cm Tall Domain of fuzzy categories Average 200 cm Fuzzy membership functions 150 cm 100 cm Short Crisp measures 50 cm 0 cm Subjective logic Domain of crisp categories Friendly aircraft Enemy aircraft Civilian aircraft Subjective opinions Z Uncertain measures Fig. 5.1 Difference between fuzzy membership functions and subjective opinions. Both fuzzy logic and subjective logic both handle aspects of uncertainty and vagueness, but they use quite different principles. A natural idea would be to combine these two reasoning frameworks. It is then a question of how this can be done, and whether it would produce a more flexible and powerful reasoning than either fuzzy logic or subjective logic can provide in isolation. Without going deeper into this topic we can simply mention the possibility of combining fuzzy logic and subjective logic, e.g. by expressing fuzzy membership functions in terms of opinions, as described in [54]. The advantage of this approach is that it is possible to express uncertainty about the membership functions. If for example the height of a person is only known with imprecision, then this can naturally be reflected by expressing the fuzzy membership function as an uncertain subjective opinion. 5.1 Related Frameworks for Uncertain Reasoning 85 5.1.4 Comparison with Kleene’s Three-Valued Logic In Kleene’s 3-valued logic [24] propositions can be assigned one of 3 truth-values specified as TRUE, FALSE and UNKNOWN. The two first truth values are interpreted as the traditional TRUE and FALSE in binary logic. The UNKNOWN value can be thought of as neither TRUE nor FALSE. In Kleene logic it is assumed that when the truth value of a particular proposition is UNKNOWN, then it might secretly have the value TRUE or FALSE at any moment in time, but the actual truth value is not available to the analyst. The logical AND and OR operators in Kleene’s 3-valued logic are specified in Tables 5.1 (a) and (b) below. Table 5.1 Truth tables for Kleene’s 3-valued AND and OR operators x∧y x F U T F F F F y U F U U T F U T (a) Truth table for AND x∨y x F U T F F U T y U U U T T T T T (a) Truth table for OR There are obvious problems with Kleene’s logic, as explained below. According to truth table 5.1 (a), the truth value of the conjunction (x ∧ y) is specified to be UNDEFINED when the truth values of x and y are both defined as UNDEFINED. However, in case of an infinitely large number of variables x, y, . . . z that are all UNDEFINED, Kleene’s logic would still dictate the truth of the serial conjunction (x ∧ y · · · ∧ z) to be UNDEFINED. This result is inconsistent with the intuitive conclusion where the correct value should be FALSE. A simple example illustrates why this is so. Assume the case of flipping a fair coin multiple times where each flip is a separate variable. 
An observer's best guess about whether the first outcome will be heads might be expressed as "I don't know", which in 3-valued logic would be expressed as UNDEFINED, but the observer's guess about whether the first n outcomes will all be heads, when n is arbitrarily large or infinite, should intuitively be expressed as FALSE, because the likelihood that an infinite series of outcomes will only produce heads becomes infinitesimally small.

In subjective logic this paradox is easily solved when multiplying a series of vacuous opinions. The product of an arbitrarily long series of vacuous binomial opinions would still be vacuous, but the projected probability would be close to zero. This result is illustrated with an example below. Figure 5.2 shows a screenshot of the online demonstrator for subjective logic operators. The example illustrates the case of multiplying the two vacuous binomial opinions ωx = (0, 0, 1, 1/2) and ωy = (0, 0, 1, 1/2).

[Fig. 5.2 Example multiplication of two vacuous opinions: ωx = ωy = (0.00, 0.00, 1.00, 0.50), each with projected probability 0.50, and product opinion ωx∧y = (0.00, 0.00, 1.00, 0.25) with projected probability 0.25.]

The method of multiplying two binomial opinions is described in Section 7.1 below, but this trivial example can be directly understood from Figure 5.2. The product opinion ωx∧y = (0, 0, 1, 1/4) is still vacuous, but the projected product probability is Px∧y = 1/4. In case the product has n factors that are all the same vacuous opinion, then the product has a projected probability P(x∧y∧...∧z) = (1/2)^n, which quickly converges towards zero, as would be expected.

At first glance Kleene's 3-valued logic might seem to represent a special case of subjective logic. However, as the example above illustrates, applying the truth tables of Kleene's logic to practical situations leads to counter-intuitive results. The corresponding results of subjective logic correspond well with intuition.

5.2 Subjective Logic as a Generalisation of Probabilistic Logic

We define probabilistic logic (PL) as the set of operators defined in Table 1.1 applied to probabilities. PL operators generalise the traditional binary logic (BL) operators AND, OR, XOR, MP etc., in the sense that when the probability arguments are 0 or 1 (equivalent to Boolean FALSE or TRUE) the PL operators correctly populate the traditional truth tables of the corresponding BL operators. It means that PL operators are homomorphic to the truth tables of BL in case probability arguments are 0 or 1, and are generalisations in other cases.

Similarly, subjective logic generalises PL operators in the sense that when opinion arguments are dogmatic (equivalent to probabilities), then they produce dogmatic opinions equivalent to the probabilities produced by the corresponding PL operators. It means that subjective logic operators are homomorphic to PL operators in case opinion arguments are dogmatic, and are generalisations in other cases. In case of absolute opinion arguments (equivalent to Boolean TRUE or FALSE), SL operators are homomorphic to BL truth tables. The generalisations and homomorphisms are illustrated in Figure 5.3.

[Fig. 5.3 Generalisations and homomorphisms between SL, PL and BL: binary logic (Booleans, truth tables) is generalised by probabilistic logic (probabilities, PL operators), which in turn is generalised by subjective logic (opinions, SL operators). PL is homomorphic to BL in case of probabilities 0 or 1, SL is homomorphic to PL in case of dogmatic multinomial opinions, and SL is homomorphic to BL in case of absolute binomial opinions.]

A homomorphism from an algebra denoted (domain A, set of operators) to an algebra denoted (domain B, set of operators) exists when their respective sets of operators, e.g. denoted (+_A, ×_A, . . .) and (+_B, ×_B, . . .), satisfy the following properties under the mapping F from variables x, y, · · · ∈ A to variables F(x), F(y), · · · ∈ B:

Homomorphism:  F(x +_A y) = F(x) +_B F(y),
               F(x ×_A y) = F(x) ×_B F(y).        (5.4)

Given a homomorphism, we say e.g. that operator +_A is homomorphic to +_B. For example, multiplication of binomial opinions (or probabilities) is homomorphic to binary logic AND. An isomorphism between an algebra denoted (domain A, operators) and an algebra denoted (domain B, operators) exists when, in addition to Eq.(5.4), the mapping F is bijective, so that the following holds:

Isomorphism:  F^(−1)(F(x) +_B F(y)) = x +_A y,
              F^(−1)(F(x) ×_B F(y)) = x ×_A y.        (5.5)

Given an isomorphism, we say e.g. that operators +_A and +_B are isomorphic. For example, multiplication with integers and multiplication with Roman numerals are isomorphic, where obviously multiplication with integers is the simplest. In case two values are represented as Roman numerals and we need to compute their product, the simplest is to first map the Roman numerals to integers, do the multiplication, and finally map the product back to a Roman numeral. Subjective logic isomorphisms, illustrated in Figure 5.4, allow effective usage of operators from both the Dirichlet and the belief models.

Different expressions that traditionally are equivalent in binary logic do not necessarily have equal opinions. Take for example distributivity of AND over OR:

x ∧ (y ∨ z) ⇔ (x ∧ y) ∨ (x ∧ z).        (5.6)

This equivalence only holds in binary logic, not in subjective logic. The corresponding opinions are in general different, as expressed by Eq.(5.7).

ωx∧(y∨z) ≠ ω(x∧y)∨(x∧z)        (5.7)

This is no surprise, as the corresponding PL operator for multiplication is also non-distributive over comultiplication, as expressed by Eq.(5.8).

p(x) · (p(y) ⊔ p(z)) ≠ (p(x) · p(y)) ⊔ (p(x) · p(z))        (5.8)

The symbol ⊔ denotes the coproduct of independent probabilities, defined as:

p(x) ⊔ p(y) = p(x) + p(y) − p(x) · p(y).        (5.9)

The coproduct of probabilities generalises binary logic OR. This means that Eq.(5.9) generates the traditional truth table for binary logic OR when the input probability arguments are either 0 (FALSE) or 1 (TRUE). Multiplication is distributive over addition in subjective logic, as expressed by:

ωx∧(y∪z) = ω(x∧y)∪(x∧z).        (5.10)

De Morgan's laws are also satisfied in subjective logic, as e.g. expressed by:

De Morgan 1: ω¬(x∧y) = ω¬x∨¬y
De Morgan 2: ω¬(x∨y) = ω¬x∧¬y.        (5.11)

Note also that Definition 6.3 of complement gives the following equalities:

ω¬(x∧y) = ¬ωx∧y ,    ω¬(x∨y) = ¬ωx∨y .        (5.12)

Subjective logic provides a rich set of operators where input and output arguments are in the form of subjective opinions. Opinions can be applied to domains of any cardinality, but some subjective logic operators are only defined for binomial opinions over binary domains. Opinion operators can be described for the belief notation (i.e. traditional opinion notation), for the evidence notation (i.e.
as Dirichlet PDFs), or for the probabilistic notation (as defined in Section 3.6.1). The belief notation of SL operators normally produces the simplest and most compact expressions, but it can be practical to use other notations in specific cases. Subjective logic operators involving multiplication and division produce product opinions with correct projected probability (distribution), but possibly with ap- 5.2 Subjective Logic as a Generalisation of Probabilistic Logic 89 proximate variance when seen as Beta/Dirichlet PDF. All other operators produce opinions with projected probability and variance that are analytically correct. Table 5.2 provides the equivalent values and interpretation in belief notation, evidence notation, and probabilistic notation as well as in binary logic and traditional probability representation for a selection of binomial opinions. Table 5.2 Examples in the three equivalent notations of binomial opinion, and their interpretations. Belief notation (b, d, u, a) Evidence notation (r, s, a) Probabilistic Interpretations as binomial opinion, Beta PDF notation and probability (P, u, a) (1, 0, 0, a) (∞, 0, a) (1, 0, a) Absolute positive binomial opinion (Boolean TRUE), Dirac delta function, probability p = 1 (0, 1, 0, a) (0, ∞, a) (0, 0, a) Absolute negative binomial opinion (Boolean FALSE), Dirac delta function, probability p = 0 ( 21 , 12 , 0, a) (∞, ∞, a) ( 21 , 0, a) Dogmatic binomial opinion denoted ω , Dirac delta function, probability p = 12 ( 41 , 14 , 21 , 12 ) (1, 1, 12 ) ( 21 , 12 , 12 ) Uncertain binomial opinion, symmetric Beta PDF of 1 positive and 1 negative observation, probability p = 21 (0, 0, 1, a) (0, 0, a) (a, 1, a) Vacuous binomial opinion denoted ω , prior Beta PDF with base rate a, probability p = a (0, 0, 1, 12 ) (0, 0, 12 ) ( 21 , 1, 12 ) Vacuous binomial opinion denoted ω , uniform Beta PDF, probability p = 12 ◦ ◦ It can be seen that some measures correspond to Booleans and probabilities, whereas other measures correspond to probability density distributions. This richness of expression represents the advantage of subjective logic over other probabilistic logic frameworks. Online visualisations of subjective opinions and density functions can be accessed at http://folk.uio.no/josang/sl/. Subjective logic allows highly efficient computation of mathematically complex models. This is possible by approximating the analytical function expressions whenever needed. While it is relatively simple to analytically multiply two Beta distributions in the form of a joint distribution, anything more complex than that quickly becomes intractable. When combining two Beta distributions with some operator/connective, the analytical result is not always a Beta distribution and can involve hypergeometric series. In such cases, subjective logic always approximates the result as an opinion that is equivalent to a Beta distribution. 90 5 Principles of Subjective Logic 5.3 Overview of Subjective Logic Operators Table 5.3 lists the main subjective logic operators. Table 5.3 Correspondence between SL operators, binary logic / set operators and SL notation. 
SL operator (page) Symbol BL / set operator Addition (p.95) + Union ∪ ωx∪y = ωx + ωy Subtraction (p.97) − Difference \ ωx\y = ωx − ωy Complement (p.98) ¬ NOT (Negation) x ωx = ¬ωx Multiplication (p.102) · AND (Conjunction) ∧ ωx∧y = ωx · ωy Comultiplication (p.103) ⊔ OR (Disjunction) ∨ ωx∨y = ωx ⊔ ωy Division (p.110) / UN-AND (Unconjunction) Codivision (p.112) UN-OR (Undisjunction) ωx∧e y = ωx /ωy Multinomial product (p.117) e ⊔ e ∧ · Cartesian product × e ωy ωx∨e y = ωx ⊔ ωX×Y = ωX · ωY Multinomial division (p.127) / Cartesian quotient / ωXY /Y = ωXY /ωY Deduction (p.135) ⊚ MP k ωY kX = ωX ⊚ ωY |X Abduction/Inversion (p.173) MT k̃ Constraint fusion (p.207) e ⊚ ⊙ n.a. & e ωY |X ωX k̃Y = ωY ⊚ Cumulative Fusion (p.216) ⊕ n.a. ⋄ ωXA⋄B = ωXA ⊕ ωXB Averaging fusion (p.218) ⊕ n.a. ⋄ ωXA⋄B = ωXA ⊕ ωXB CC n.a. ♥ B CC ω ωXA♥B = ωXA X Cumulative unfusion (p.228) ⊖ n.a. ωXAe⋄B = ωXA ⊖ ωXB Averaging unfusion (p.229) ⊖ n.a. e ⋄ Cumulative fission (p.231) Discounting (p.247) ⊗ CC-fusion (p.224) Symbol SL notation e ∨ ωXA&B = ωXA ⊙ ωXB ωXAe⋄B = ωXA ⊖ ωXB n.a. e ⋄ ▽ ωX▽C = ωXC Trust transitivity : ωX [A;B] = ωBA ⊗ ωXB 5.3 Overview of Subjective Logic Operators 91 Most of the operators in Table 5.3 correspond to well-known operators from binary logic and probability calculus, while others are specific to subjective logic. The correspondence between subjective logic operators and traditional operators means that they are related through homomorphisms. The homomorphisms of Figure 5.3 can be illustrated with concrete examples. Assume two independent binomial opinions ωx and ωy . Let P(ωx ) denote the projected probability of ωx which then is the probability of x. Similarly, the expressions P(ωy ) and P(ωx∧y ) represent the probabilities of y and (x ∧ y) respectively. The homomorphism from SL to PL illustrated in Figure 5.3 means for example that: In case of dogmatic opinions: P(ωx∧y ) = P(ωx ) · P(ωy ) (5.13) The homomorphism of Eq.(5.13) is of course also valid in case ωx and ωy are absolute opinions. Assume now two absolute binomial opinions ωx and ωy , and let B(ωx ) denote the Boolean value of x. Similarly, the expressions B(ωy ) and B(ωx∧y ) represent the Boolean values of y and (x ∧ y) respectively. The homomorphism from SL to BL illustrated in Figure 5.3 means for example that: In case of absolute opinions: B(ωx∧y ) = B(ωx ) ∧ B(ωy ) (5.14) In the special case of absolute binomial opinions and with the homomorphisms of Eq.(5.14), the distributivity between product and coproduct of opinions holds, in contrast to the general case of Eq.(5.7). This leads to the equality of Eq.(5.15). In case of absolute opinions: B(ωx∧(y∨z) ) = B(ω(x∧y)∨(x∧z) ) (5.15) Recall from Eq.(3.43) the probabilistic notation of binomial opinions: Px : probability of x Probabilistic notation: πx = (Px , ux , ax ), where: ux : uncertainty (5.16) ax : base rate of x Binary logic AND corresponds to multiplication of opinions [46]. For example, the pair of probabilistic binomial opinions on the elements x ∈ X and y ∈ Y: πx = (1, 0, ax ) TRUE with respective corresponding Booleans: (5.17) πy = (0, 0, ay ) FALSE Their product is: expressed with numerical values: which corresponds to: πx∧y = πx · πy (0, 0, ax ay ) = (1, 0, ax ) · (0, 0, ay ) (5.18) FALSE = TRUE ∧ FALSE It is interesting to note that subjective logic represents a calculus for Dirichlet distributions when opinions are equivalent to Dirichlet distributions. 
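The homomorphisms of Eq.(5.13) and Eq.(5.14) above can be checked numerically. The sketch below assumes the binomial product operator that is only defined later, in Definition 7.1 of Section 7.1, and writes opinions as (b, d, u, a) tuples; the function names are illustrative.

```python
def product(x, y):
    """Binomial multiplication, as given later in Definition 7.1 (Section 7.1)."""
    bx, dx, ux, ax = x
    by, dy, uy, ay = y
    k = 1 - ax * ay
    b = bx * by + ((1 - ax) * ay * bx * uy + ax * (1 - ay) * ux * by) / k
    d = dx + dy - dx * dy
    u = ux * uy + ((1 - ay) * bx * uy + (1 - ax) * ux * by) / k
    return (b, d, u, ax * ay)

def P(op):
    """Projected probability of a binomial opinion."""
    b, d, u, a = op
    return b + a * u

# Dogmatic opinions (u = 0): the product projects to the probability product, Eq.(5.13)
x, y = (0.7, 0.3, 0.0, 0.5), (0.4, 0.6, 0.0, 0.5)
assert abs(P(product(x, y)) - P(x) * P(y)) < 1e-12     # both equal 0.28

# Absolute opinions (Booleans): TRUE AND FALSE = FALSE, cf. Eq.(5.14) and Eq.(5.18)
t, f = (1.0, 0.0, 0.0, 0.5), (0.0, 1.0, 0.0, 0.5)
print(product(t, f))   # (0.0, 1.0, 0.0, 0.25): projected probability 0, i.e. FALSE
```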
Analytical 92 5 Principles of Subjective Logic manipulations of Dirichlet distributions is complex but can be done for simple operators, such as multiplication in which case it is called a joint distribution. However, this analytical method will quickly become unmanageable when applied to the more complex operators of Table 5.3 such as conditional deduction and abduction. Subjective logic therefore has the advantage of providing advanced operators for Dirichlet distributions for which no practical analytical solutions exist. It should be noted that the simplicity of some subjective logic operators comes at the cost of allowing those operators to be approximations of the analytically correct operators. This is discussed in more detail in Section 7.1. Subjective opinions can have multiple equivalent representations, as described in Section 3.6. It naturally follows that each subjective logic operator can be expressed for the various opinion representations. Since the different representations of opinions are equivalent, the different expressions of the same operator are isomorphic to each other, as illustrated in Figure 5.4 SL Evidence notation SL Belief notation x Belief opinions Z X x Belief operators Isomorphic e x Evidence opinions DirX x Evidence operators SL Probabilistic notation x Probabilistic opinions S X x Probabilistic operators Fig. 5.4 Isomorphisms between the SL operators for different opinion representations The fact that subjective opinions can be expressed based on the belief notation, the evidence notation or the probabilistic notation also means that subjective logic operators can be expressed in these different notations. Since the different representations of a subjective opinion are equivalent, the operators are isomorphic, as illustrated in Figure 5.4. Throughout this book opinions and operators are generally expressed in the belief notation because it gives the simplest and most compact expressions. CertainLogic [79] provides alternative expressions for a few operators based on a special notation, but only for binomial opinions. The operator expressions used in CertainLogic are significantly more complex than the the equivalent operator expressions based on the belief notation. 5.3 Overview of Subjective Logic Operators 93 Subjective logic is directly connected to traditional reasoning frameworks through homomorphic correspondence with probabilistic logic and binary logic, and offers powerful operators that are derived from the isomorphic correspondence between the respective native algebras of belief opinions and Dirichlet PDFs. The next chapters describe the operators mentioned in Table 5.3. Online demonstrations of subjective logic operators can be accessed at http://folk.uio.no/josang/sl/. Chapter 6 Addition, Subtraction and Complement 6.1 Addition Addition of opinions in subjective logic is a binary operator that takes opinions about two mutually exclusive values (i.e. two disjoint subsets of the same domain) as arguments, and outputs an opinion about the union of the values [65]. Consider for example the domain X = {x1 , x2 , x3 } illustrated in Figure 6.1, with the assumed union of x1 and x2 . x1 x1x2 : x2 x3 Fig. 6.1 Union of values, corresponding to addition of opinions Assume that the binomial opinions ωx1 and ωx2 apply to x1 and x2 respectively. The addition of ωx1 and ωx2 then consists of computing the opinion on x1 ∪ x2 as a function of the two former opinions. The operator for addition first described in [65] is defined below. Definition 6.1 (Addition). 
Assume a domain X where x1 and x1 are two singleton elements, or alternatively two disjoint subsets, i.e. x1 ∩ x2 = 0. / We also require that the two elements x1 and x2 together do not represent a complete partition of X which would mean that x1 ∪ x2 ⊂ X. Assume the binomial opinions ωx1 = (bx1 , dx1 , ux1 , ax1 ) and ωx2 = (bx2 , dx2 , ux2 , ax2 ) that respectively apply to x1 and x2 . The opinion about x1 ∪ x2 as a function of the opinions about x1 and x2 is defined as: 95 96 6 Addition, Subtraction and Complement Opinion sum ω(x1 ∪x2 ) : b(x1 ∪x2 ) = bx1 + bx2 , ax1 (dx1 −bx2 )+ax2 (dx2 −bx1 ) , d(x1 ∪x2 ) = ax1 +ax2 (6.1) ax1 ux1 +ax2 ux2 u(x1 ∪x2 ) = ax1 +ax2 , a (x1 ∪x2 ) = ax1 + ax2 . By using the symbol ‘+’ to denote the addition operator for opinions, addition ⊔ ⊓ can be denoted as ω(x1 ∪x2 ) = ωx1 + ωx2 . It can be verified that the addition operator preserves the addition of projected probabilities, as expressed by Eq.(6.2). Addition of projected probabilities: P(x1 ∪x2 ) = Px1 + Px2 . (6.2) Figure 6.2 shows a screenshot of the online demonstrator for subjective logic operators. The example illustrates addition of the two binomial opinions ωx1 = (0.20, 0.40, 0.40, 0.25) and ωx2 = (0.10, 0.50, 0.40, 0.50). u u u = PLUS d b aP Opinion about x1 belief 0.20 disbelief 0.40 uncertainty 0.40 base rate 0.25 probability 0.30 b d P a Opinion about x2 belief 0.10 disbelief 0.50 uncertainty 0.40 base rate 0.50 probability 0.30 b d P a Opinion about x1 x2 belief 0.30 disbelief 0.30 uncertainty 0.40 base rate 0.75 probability 0.60 Fig. 6.2 Example addition of two binomial opinions The sum is simply ω(x1 ∪x2 ) = (0.30, 0.30, 0.40, 0.75), and it can be verified that P(x1 ∪x2 ) = 0.30 + 0.30 = 0.60. Opinion addition generates confusing belief mass b(x1 ∪x2 ) from the specific belief masses bx1 and bx2 . Opinion addition might therefore not be as useful as one might intuitively assume. Also, opinion addition does not apply to the case when X = (x1 ∪ x1 ), because the resulting belief mass would be totally vague, so the opinion should be considered vacuous. 6.2 Subtraction 97 The cumulative fusion operator is related to the addition operator, but these two operators have different interpretations and purposes, so they should not be confused. The cumulative fusion operator is based on addition of evidence in the evidence space, whereas the addition operator is based on addition of belief mass is the belief space. Cumulative fusion does not necessarily lead to increased vagueness. In case of multinomial opinions cumulative fusion does not produce vagueness at all. The cumulative fusion operator is described in Chapter 11. 6.2 Subtraction The inverse operation to opinion addition is opinion subtraction [65]. Since addition of opinions yields the opinion about x1 ∪ x2 from the opinions about disjoint subsets of the domain, then the difference between the opinions about x1 ∪ x2 and x2 (i.e. the opinion about (x1 ∪ x2 )\x2 ) can only be defined if x2 ⊆ (x1 ∪ x2 ) where x2 and (x1 ∪ x2 ) are subsets of the domain X, i.e. the system must be in the state (x1 ∪ x2 ) whenever it is in the state x2 . The operator for subtraction first described in [65] is defined below. Definition 6.2 (Subtraction). Let (x1 ∪ x2 ) and x2 be subsets of the same domain X where (x1 ∪ x2 ) ∩ x2 = x2 . The opinion about (x1 ∪ x2 )\x2 as a function of the opinions about (x1 ∪ x2 ) and x2 is expressed below. 
ω((x1 ∪x2 )\x2 ) : b = b(x1 ∪x2 ) − bx2 , ((x1 ∪x2 )\x2 ) a(x ∪x ) (d(x ∪x ) +bx2 )−ax2 (1+bx2 −b(x ∪x ) −ux2 ) 1 2 1 2 1 2 , a(x ∪x ) −ax2 d((x1 ∪x2 )\x2 ) = 1 2 a(x ∪x ) u(x ∪x ) −ax2 ux2 2 u((x1 ∪x2 )\x2 ) = 1 a2 1 −a , x2 (x1 ∪x2 ) a((x1 ∪x2 )\x2 ) = a(x1 ∪x2 ) − ax2 . (6.3) Since both u((x1 ∪x2 )\x2 ) and d((x1 ∪x2 )\x2 ) should be non-negative the following constraints apply. u((x1 ∪x2 )\x2 ) ≥ 0 ⇒ ax2 ux2 ≤ a(x1 ∪x2 ) u(x1 ∪x2 ) , d((x1 ∪x2 )\x2 ) ≥ 0 ⇒ a(x1 ∪x2 ) (d(x1 ∪x2 ) + bx2 ) ≥ ax2 (1+bx2 −b(x1 ∪x2 ) − ux2 ). (6.4) By using the symbol ‘−’ to denote the subtraction operator for opinions, subtraction can be denoted as ω((x1 ∪x2 )\x2 ) = ω(x1 ∪x2 ) − ωx2 . ⊔ ⊓ Given the structure of the example domain X in Figure 6.1, it is obvious that ω((x1 ∪x2 )\x2 ) = ωx1 . 98 6 Addition, Subtraction and Complement The subtraction operator produces reduced vagueness, and removes vagueness completely if ((x1 ∪ x2 )\x2 ) is a singleton. Subtraction of opinions is consistent with subtraction of probabilities, as expressed by Eq.(6.5). Subtraction of projected probabilities: Px1 = P(x1 ∪x2 ) − Px2 . (6.5) Figure 6.3 shows a screenshot of the online demonstrator for subjective logic operators. The example illustrates subtraction of the binomial opinion ω(x1 ∪x2 = (0.70, 0.10, 0.20, 0.75) by the binomial opinion ωx2 = (0.50, 0.30, 0.20, 0.25). u u u = MINUS d b a P Opinion about x1 x2 belief 0.70 disbelief 0.10 uncertainty 0.20 base rate 0.75 probability 0.85 b d a P Opinion about x2 belief 0.50 disbelief 0.30 uncertainty 0.20 base rate 0.25 probability 0.55 b d P a Opinion about x1 belief 0.20 disbelief 0.60 uncertainty 0.20 base rate 0.50 probability 0.30 Fig. 6.3 Example subtraction between two binomial opinions The difference is simply ωx1 = (0.20, 0.60, 0.20, 0.50), and it can be verified that P((x1 ∪x2 )\x2 ) = Px1 = 0.85 − 0.55 = 0.30. 6.3 Complement A binomial opinion focuses on a single element x in a binary domain X = {x, x}. The complement of this opinion opinion is simply the opinion on the complement element x. This is illustrated in Figure 6.4. Definition 6.3 (Complement). Assume the binary domain X = {x x} where ωx = (bx , dx ux , ax ) is a binomial opinion on x. Its complement is the binomial opinion ωx expressed as: bx = dx dx = bx Complement opinion ωx : (6.6) ux = ux ax = 1 − ax 6.3 Complement 99 Z xA Z xA : x x Fig. 6.4 Complement of binomial opinion The complement operator denoted ‘¬’ is an unary operator. Applying the complement operator to a binomial opinion is written: ¬ωx = ωx (6.7) ⊔ ⊓ The complement operator corresponds to binary logic NOT, and to complement of probabilities. For projected probabilities it can be verified that: P(¬ωx ) = 1 − P(ωx ). (6.8) Figure 6.5 shows a screenshot of the online demonstrator for binomial subjective logic operators. The example shown in the figure illustrates complement of the binomial opinion ωx = (0.50, 0.10, 0.40, 0.25). u u = COMPLEMENT d a P b Opinion about x 0.50 belief disbelief 0.10 uncertainty 0.40 base rate 0.25 probability 0.60 b d P a Opinion about NOT x 0.10 belief disbelief 0.50 uncertainty 0.40 base rate 0.75 probability 0.40 Fig. 6.5 Example complement of binomial opinion The complement opinion is simply ωx = (0.10, 0.50, 0.40, 0.75), and it can be verified that Px = 1 − Px = 1.00 − 0.60 = 0.40. 
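A compact numerical sketch of the three operators of this chapter is given below. It simply transcribes Definitions 6.1–6.3 and reproduces the worked examples above; opinions are written as (b, d, u, a) tuples and the function names are illustrative only.

```python
def add(x1, x2):
    """Opinion addition, Definition 6.1."""
    b1, d1, u1, a1 = x1
    b2, d2, u2, a2 = x2
    a = a1 + a2
    return (b1 + b2,
            (a1 * (d1 - b2) + a2 * (d2 - b1)) / a,
            (a1 * u1 + a2 * u2) / a,
            a)

def subtract(x12, x2):
    """Opinion subtraction, Definition 6.2: the opinion on (x1 union x2) minus x2."""
    b12, d12, u12, a12 = x12
    b2, d2, u2, a2 = x2
    a = a12 - a2
    return (b12 - b2,
            (a12 * (d12 + b2) - a2 * (1 + b2 - b12 - u2)) / a,
            (a12 * u12 - a2 * u2) / a,
            a)

def complement(x):
    """Opinion complement, Definition 6.3."""
    b, d, u, a = x
    return (d, b, u, 1 - a)

# Worked examples from this chapter:
print(add((0.20, 0.40, 0.40, 0.25), (0.10, 0.50, 0.40, 0.50)))
# ≈ (0.30, 0.30, 0.40, 0.75)
print(subtract((0.70, 0.10, 0.20, 0.75), (0.50, 0.30, 0.20, 0.25)))
# ≈ (0.20, 0.60, 0.20, 0.50)
print(complement((0.50, 0.10, 0.40, 0.25)))
# = (0.10, 0.50, 0.40, 0.75)
```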
Chapter 7 Binomial Multiplication and Division This chapter describes the subjective logic operators that correspond to binary logic AND and OR, as well as their inverse operators which in binary logic could be called UN-AND and UN-OR. We will here describe multiplication and comultiplication [46]1 . Special limit cases are described in [46]. 7.1 Binomial Multiplication and Comultiplication Binomial multiplication and comultiplication in subjective logic take binomial opinions about two elements from distinct binary domains as input arguments and produce a binomial opinion as result. The product and coproduct result opinions relate to subsets of the Cartesian product of the two binary domains. The Cartesian product of the two binary domains X = {x, x} and Y = {y, y} produces the quaternary set X×Y = {(x y), (x y), (x y), (x y)} which is illustrated in Figure 7.1 below. : ; x y x u :u; ( x y) ( x y) ( x y) ( x y) = y Fig. 7.1 Cartesian product of two binary domains 1 Note that the definitions of multiplication and comultiplication defined here are different from those defined in [37] which should not be used. 101 102 7 Binomial Multiplication and Division It is possible to compute joint Beta PDFs and Dirichlet PDFs although closed expressions for general case might be intractable. When assuming that such joint PDFs exist, one would expect multiplication to be equivalent. However, in general, products of opinions in subjective logic represent approximations of the analytically correct products of Beta PDFs and Dirichlet PDFs. In this regard, multiplication of binomial opinions produces the best approximation of joint Beta PDFs. The same can be said for coproducts, quotients and co-quotients. There is hardly any work on computing these results for Beta PDFs in the literature, so subjective logic currently offers the most practical operators for computing coproducts, quotients and co-quotients of Beta PDFs. 7.1.1 Binomial Multiplication Let ωx and ωy be opinions about x and y respectively held by the same observer. Then the product opinion ωx∧y is the observer’s opinion about the conjunction x ∧ y = {(x y)} that is represented by the area inside the dotted line in Figure 7.1. The coproduct opinion ωx∨y is the opinion about the disjunction x ∨ y = {(x y), (x y), (x y)} that is represented by the area inside the dashed line in Figure 7.1. Obviously X×Y is not binary, and coarsening is required in order to determine the product and coproduct opinions as binomial opinions. Definition 7.1 (Binomial Multiplication). Let X = {x, x} and Y = {y, y} be two separate domains, and let ωx = (bx , dx , ux , ax ) and ωx = (by , dy , uy , ay ) be independent binomial opinions on x and y respectively. Given opinions about independent propositions, x and y, the binomial opinion ωx∧y on the conjunction (x ∧ y) is: Product ωx∧y : (1−ax )ay bx uy +ax (1−ay )ux by bx∧y = bx by + , 1−ax ay dx∧y = dx + dy − dx dy , (1−ay )bx uy +(1−ax )ux by ux∧y = ux uy + , 1−ax ay ax∧y = ax ay . (7.1) By using the symbol ‘·’ to denote this operator, multiplication of opinions can be written as ωx∧y = ωx · ωy . ⊔ ⊓ Figure 7.2 shows a screenshot of the online demonstrator for subjective logic operators. The example illustrates multiplication of the two binomial opinions specified as ωx = (0.75, 0.15, 0.10, 0.50) and ωy = (0.10, 0.00, 0.90, 0.20). The product is ω(x∧y) = (0.15, 0.15, 0.70, 0.10), and it can be verified that Eq.(7.2) holds for the product projected probability. 
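This verification can be reproduced with the following sketch, which transcribes Definition 7.1 with opinions as (b, d, u, a) tuples; the function names are illustrative only. The second part revisits the repeated product of vacuous opinions discussed in Section 5.1.4.

```python
def multiply(x, y):
    """Binomial multiplication, Definition 7.1."""
    bx, dx, ux, ax = x
    by, dy, uy, ay = y
    k = 1 - ax * ay
    b = bx * by + ((1 - ax) * ay * bx * uy + ax * (1 - ay) * ux * by) / k
    d = dx + dy - dx * dy
    u = ux * uy + ((1 - ay) * bx * uy + (1 - ax) * ux * by) / k
    return (b, d, u, ax * ay)

def P(op):
    """Projected probability of a binomial opinion."""
    b, d, u, a = op
    return b + a * u

# The example above: omega_x = (0.75, 0.15, 0.10, 0.50), omega_y = (0.10, 0.00, 0.90, 0.20)
prod = multiply((0.75, 0.15, 0.10, 0.50), (0.10, 0.00, 0.90, 0.20))
print([round(v, 2) for v in prod], round(P(prod), 2))
# ≈ [0.15, 0.15, 0.70, 0.10] with projected probability 0.22 = 0.80 * 0.28

# Repeated products of vacuous opinions (cf. Section 5.1.4): the product stays
# vacuous while the projected probability converges towards zero as (1/2)^n.
w = (0.0, 0.0, 1.0, 0.5)
acc = w
for _ in range(9):
    acc = multiply(acc, w)
print(round(acc[2], 2), round(P(acc), 4))   # uncertainty 1.0, probability ~ (1/2)^10
```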
7.1 Binomial Multiplication and Comultiplication u 103 u u = AND d b a P Opinion about x belief 0.75 disbelief 0.15 uncertainty 0.10 base rate 0.50 probability 0.80 b d a P Opinion about y 0.10 belief disbelief 0.00 uncertainty 0.90 base rate 0.20 probability 0.28 b d a P Opinion about x AND y belief 0.15 disbelief 0.15 uncertainty 0.70 base rate 0.10 probability 0.22 Fig. 7.2 Example multiplication of two binomial opinions P(x∧y) = Px · Py (7.2) = 0.80 · 0.28 = 0.22. Notice that ωx has relatively low uncertainty whereas ωy has relatively high uncertainty. An interesting property of the multiplication operator, which can be seen in Figure 7.2, is that the product opinion has an uncertainty mass on a level between the uncertainty masses of the factor opinions. 7.1.2 Binomial Comultiplication Comultiplication of binomial opinions is defined next. Definition 7.2 (Binomial Comultiplication). Let X = {x, x} and Y = {y, y} be two separate domains, and let ωx = (bx , dx , ux , ax ) and ωx = (by , dy , uy , ay ) be independent binomial opinions on x and y respectively. The binomial opinion ωx∨y on the disjunction x ∨ y is: bx∨y = bx + by − bx by , ax (1−ay )dx uy +(1−ax )ay ux dy , dx∨y = dx dy + ax +ay −ax ay (7.3) Coproduct ωx∨y : ay dx uy +ax ux dy ux∨y = ux uy + ax +ay −ax ay , ax∨y = ax + ay − ax ay . 104 7 Binomial Multiplication and Division By using the symbol ‘⊔’ to denote this operator, comultiplication of opinions can ⊔ ⊓ be written as ωx∨y = ωx ⊔ ωy . Figure 7.3 shows a screenshot of the online demonstrator for subjective logic operators. The example illustrates comultiplication of the two binomial opinions ωx = (0.75, 0.15, 0.10, 0.50) and ωy = (0.35, 0.00, 0.65, 0.20). u u u = OR d b a P Opinion about x belief 0.75 disbelief 0.15 uncertainty 0.10 base rate 0.50 probability 0.80 b d a b d P Opinion about y belief 0.35 disbelief 0.00 uncertainty 0.65 base rate 0.20 probability 0.48 a P Opinion about x OR y belief 0.84 disbelief 0.06 uncertainty 0.10 base rate 0.60 probability 0.90 Fig. 7.3 Example comultiplication of two binomial opinions The coproduct is ω(x∨y) = (0.84, 0.06, 0.10, 0.60), and it can be verified that Eq.(7.4) holds for the coproduct projected probability. P(x∨y) = Px ⊔ Py = Px + Py − (Px · Py ) (7.4) = 0.80 + 0.48 − (0.80 · 0.48) = 0.90. Notice that ωx has relatively low uncertainty whereas ωy has relatively high uncertainty. Similarly to the case of multiplication above, it can be seen in Figure 7.3 that the coproduct opinion has an uncertainty mass on a level between the uncertainty masses of the factor opinions. 7.1.3 Approximations of Product and Coproduct The expressions for product in Definition 7.1, and for coproduct in Definition 7.2, might appear ad hoc. However, there is a clear rationale behind their design. 7.1 Binomial Multiplication and Comultiplication 105 The rationale is that the product and coproduct beliefs and disbeliefs must be at least as large as the raw products of belief and disbelief from the factor opinions, anything else would be irrational. If the product disbelif is dx∧y = dx + dy − dx dy , and the coproduct belief is bx∨y = bx + by − bx by , then this requirement is satisfied. The product and coproduct beliefs and disbeliefs could of course be larger than that, but the larger they are, the smaller the uncertainty. The operators are so designed that the maximum uncertainty is preserved in the product and coproduct, which occurs when the disbelief of the product, and the belief of the coproduct are exactly as they are defined. 
The operators for multiplication and comultiplication are thus conservative, in the sense that they preserve the maximum uncertainty possible. The variance of the product and coproduct can easily be computed through Eq.(3.10), where the variance is a function of the uncertainty, as expressed in Eq.(7.5). Varx∧y = Px∧y (1−Px∧y )ux∧y W +ux∧y Coproduct variance: Varx∨y = Px∨y (1−Px∨y )ux∨y W +ux∨y Product variance: (7.5) where W denotes the non-informative prior weight, which is normally set to W = 2. The non-informative prior weight is discussed in Section 3.3.2. From Eq.(7.5) it be seen that when the uncertainty is zero, the variance is also zero, which corresponds to a dogmatic opinion. The case where Px∧y = 1/2 and ux∧y = 1 is a vacuous opinion which corresponds to the uniform Beta PDF, with variance 1/12. The question is how well the level of uncertainty corresponds with the analytically ‘correct’ level of uncertainty, in the sense that the variance of the product and coproduct follows the analytically correct variance of the product and coproduct as closely as possible. Multiplication and comultiplication represent a self-dual system represented by b ↔ d, u ↔ u, a ↔ 1 − a, and ∧ ↔ ∨, that is, for example, the expressions for bx∧y and dx∨y are dual to each other, and one determines the other by the correspondence, and similarly for the other expressions. This is equivalent to the observation that the opinions satisfy de Morgan’s Laws, i.e. ωx∧y = ωx∨y and ωx∨y = ωx∧y . However it should be noted that multiplication and comultiplication are not distributive over each other, i.e. for example that: ωx∧(y∨z) 6= ω(x∧y)∨(x∧z) (7.6) This is to be expected because if x, y and z are independent, then x ∧ y and x ∧ z are not generally independent in probability calculus so that distributivity does not hold. In fact distributivity of conjunction over disjunction and vice versa only holds in binary logic. Multiplication and comultiplication produce very good approximations of the analytically correct products and coproducts when the arguments are Beta probability density functions [46]. The difference between the subjective logic product and the 106 7 Binomial Multiplication and Division analytically correct product of Beta density functions is best illustrated with the example of multiplying two equal vacuous binomial opinions ω = (0, 0, 1, 21 ), that are equivalent to the uniform Beta probability density function Beta(1, 1). Theorem 7.1. Let X = {x, x} and Y = {y, y} be two binary domains, and let X ∈ X and Y ∈ Y be independent binary random variables with identical uniform probability density functions, which for example can be described as Beta(1, 1) and Beta(p(y) | 1, 1). Then the probability density function PDF (p(Z = (x∧y))) for the product random variable Z = X ·Y is given by: PDF (p(Z = (x∧y))) = − ln p(Z = (x ∧ y)), for 0 < p(Z) < 1. (7.7) The proof is given in [46]. This result applies to the case of the independent propositions x and y, where the joint variable Z takes values from the Cartesian product domain Z = {(x ∧ y), (x ∧ y), (x ∧ y), (x ∧ y)}. Specifically, this means that when the probabilities of x and y have uniform distributions, then the probability of the conjunction x ∧ y has the probability density function PDF (p(Z = (x∧y))) with projected probability P(Z=(x∧y)) = 41 . This can be contrastedwith the a priori non-informative probability density function Dir pQ | ( 21 , 21 , 12 , 21 ) over the quaternary domain Q = {q1 , q2 , q3 , q4 }. 
The corresponding a priori probability density function for the probability of q1 is Beta(p(q1) | (1/2, 3/2)), which can be directly derived from Dir(pQ | (1/2, 1/2, 1/2, 1/2)). Interestingly we get equal projected probabilities: P(q1) = P(Z = (x∧y)) = 1/4. The difference between Beta(p(q1) | (1/2, 3/2)) and PDF(p(Z = (x∧y))) = −ln p(Z = (x∧y)) is illustrated in Figure 7.4 below.

Fig. 7.4 Comparison between Beta(p(q1) | (1/2, 3/2)) and PDF(p(Z = (x∧y))) = −ln p(Z = (x∧y))

The analytically correct product of two uniform distributions is represented by PDF(p(Z = (x∧y))) = −ln p, whereas the product produced by the multiplication operator is Beta(p(q1) | (1/2, 3/2)), which illustrates that multiplication and comultiplication in subjective logic produce approximate results. More specifically, it can be shown that the projected probability is always exact, and that the variance is approximate. The quality of the variance approximation is analysed in [46], and is very good in general. The discrepancies grow with the amount of uncertainty in the arguments, so Figure 7.4 illustrates the worst case.

The advantage of the multiplication and comultiplication operators of subjective logic is their simplicity, which means that analytical expressions that normally are complex, and sometimes intractable, can be analysed efficiently. The analytical result of products and coproducts of Beta distributions will in general involve the Gauss hypergeometric function [76]. The analysis of anything but the most basic models based on such functions would quickly become unmanageable.

7.2 Reliability Analysis

The modern use of the word reliability originates from the U.S. military in the 1940s, where it meant that a product would operate as expected for a specified period of time. Reliability analysis of systems is now a mature discipline based on well-established techniques. This section describes how subjective logic can be applied to system reliability analysis.

7.2.1 Simple Reliability Networks

For the purpose of reliability analysis, a system can be considered as consisting of components that are represented as edges in a graph. This network of components is not intended to reflect the physical architecture of a system, but only to represent the reliability dependencies of the system. The connection between components in this way represents a semantic relationship which is similar to the trust relationships described in Chapter 13.

A serial connection of two components reflects the property that both components must function correctly for the whole system to function correctly. In binary logic this dependency relationship is computed with the AND connective. In probabilistic logic it is computed with the product operator. In subjective logic it is computed with binomial multiplication according to Definition 7.1.

A parallel connection of two components reflects the property that at least one of the components must function correctly for the whole system to function correctly. In binary logic this dependency relationship is computed with the OR connective. In probabilistic logic it is computed with the coproduct operator. In subjective logic it is computed with binomial comultiplication according to Definition 7.2.

Figure 7.5.a illustrates a system S which consists of the components w, x, y and z.
From a reliability point of view, assume that these components can be considered connected as a reliability network consisting of serial and parallel connections, as illustrated in Figure 7.5.b.

Fig. 7.5 System components with series-parallel dependencies: a) System components, b) Reliability dependence relationships

Figure 7.5.b expresses that the correct function of system S requires that both w and z must function correctly, and in addition that either x or y must function correctly. This reliability network can be formally expressed as:

S = w ∧ (x ∨ y) ∧ z.    (7.8)

The advantage of using subjective logic for reliability analysis is that component reliability can be expressed with degrees of uncertainty. To compute the reliability of system S, the reliability of each component can be expressed as the binomial opinions ωw, ωx, ωy and ωz. By applying binomial multiplication and comultiplication the reliability of system S can be computed as:

ωS = ω(w∧(x∨y)∧z) = ωw · (ωx ⊔ ωy) · ωz    (7.9)

As an example, Table 7.1 specifies opinions for the reliabilities of components w, x, y and z, as well as the resulting reliability opinion of system S.

Table 7.1 Example reliability analysis of system S

                              Component reliabilities        System reliability
Opinion parameter             ωw     ωx     ωy     ωz        ω(w∧(x∨y)∧z)
Belief mass            b      0.90   0.50   0.00   0.00      0.43
Disbelief mass         d      0.10   0.00   0.00   0.00      0.10
Uncertainty mass       u      0.00   0.50   1.00   1.00      0.47
Base rate              a      0.90   0.80   0.80   0.90      0.78
Projected probability  P      0.90   0.90   0.80   0.90      0.80

It can be verified that the following holds, as expected.

P(w∧(x∨y)∧z) = Pw · (Px ⊔ Py) · Pz = Pw · Pz · (Px + Py − Px · Py)
             = 0.90 · 0.90 · (0.90 + 0.80 − 0.90 · 0.80) = 0.80.    (7.10)

Thanks to the bijective mapping between belief opinions and evidence opinions in the form of Beta PDFs, as defined by Eq.(3.11), it is possible to conduct reliability analysis where the reliability of individual components is expressed in terms of Beta PDFs.

7.2.2 Reliability Analysis of Complex Systems

System reliability networks can be more complex than that illustrated in the previous section. Consider for example the 5-component system S shown in Figure 7.6.a, and assume that its reliability can be modelled in the form of the reliability network in Figure 7.6.b.

Fig. 7.6 System components with complex dependencies: a) System components, b) Reliability dependence relationships

As Figure 7.6.b illustrates, this reliability network can not be broken down into a group of series and parallel edges. This complicates the problem of determining the network's reliability. If the system could be broken down to series/parallel configurations, it would be a relatively simple matter to determine the mathematical or analytical formula that describes the network's reliability. A good description of possible approaches to analysing complex reliability systems is presented in [70] (p.161). Some of the methods for analytically obtaining the reliability of a complex system are:

• Decomposition method. The decomposition method applies the law of total probability. It involves choosing a key edge and then calculating the reliability of the network twice: once as if the key edge failed, and once as if the key edge succeeded.
These two probabilities are then combined to obtain the reliability of the system, since at any given time the key edge will fail or operate. • Event space method. The event space method applies the mutually exclusive events axiom. All mutually exclusive events are determined, and those which result in network success are considered. The reliability of the network is simply the probability of the union of all mutually exclusive events that yield a network success. Similarly, the unreliability is the probability of the union of all mutually exclusive events that yield a network failure. • Path-tracing method. This method considers every path from a starting point to the ending point. Since network success involves having at least one path available from one end of the Reliability Block Diagram (RBD) to the other, as long as at least one path from the beginning to the end of the path is available, the network has not failed. One could consider the RBD to be a plumbing schematic. If an edge in the network fails, the water can no longer flow through it. As long as there is at least one path for the water to flow from the start to the end of the network, the network is successful. This method involves identifying all of the paths the water could take and calculating the reliability of the path based on the edges that lie along that path. The reliability of the network is simply the probability of the union of these paths. In order to maintain consistency of the analysis, starting and ending points for the network must be defined, which in case of trust network is trivial to define. We refer to [70] for a more detailed description of the above mentioned methods. Reliability networks of this kind can also be analysed with subjective logic. 7.3 Binomial Division and Codivision Division and codivision naturally represent the inverse operations of multiplication and comultiplication. These operations are well defined for probabilities. The corresponding operations for binary logic can be defined as UN-AND and UN-OR respectively. 7.3.1 Binomial Division The inverse operation to binomial multiplication is binomial division. The quotient of opinions about propositions x and y represents the opinion about a proposition z which is independent of y such that ωx = ωy∧z . This requires that: 7.3 Binomial Division and Codivision 111 ax < ay , dx ≥ dy , bx ≥ u ≥ x ax (1−ay )(1−dx )by (1−ax )ay (1−dy ) (1−ay )(1−dx )uy (1−ax )(1−dy ) . (7.11) , Definition 7.3 (Binomial Division). Let X = {x, x} and Y = {y, y} be domains, and let ωx = (bx , dx , ux , ax ) and ωy = (by , dy , uy , ay ) be binomial opinions on x and y satisfying Eq.(7.11). The division of ωx by ωy produces the quotient opinion ωx∧e y = (bx∧e y , dx∧e y , ux∧e y , ax∧e y ) defined by: Quotient ωx∧e y : bx∧e y = dx∧e y = ux∧e y = ax∧e y = ay (bx +ax ux ) (ay −ax )(by +ay uy ) x (1−dx ) − (aya−a , x )(1−dy ) dx −dy 1−dy , (7.12) ay (1−dx ) (ay −ax )(1−dy ) − ay (bx +ax ux ) (ay −ax )(by +ay uy ) , ax ay , By using the symbol ‘/’ to denote this operator, division of opinions can be written as ωx∧e y = ωx /ωy . ⊓ ⊔ Figure 7.7 shows a screenshot of the online demonstrator for subjective logic operators. The example of Figure 7.7 illustrates division of binomial opinion ω(x∧y) = (0.10, 0.80, 0.10, 0.20) by the binomial opinion ωy = (0.40, 0.00, 0.60, 0.50). 
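The division expressions of Eq.(7.12) can be checked numerically. The Python sketch below (a hypothetical helper, continuing the (b, d, u, a) tuple representation used for the earlier examples) implements Definition 7.3 as stated, assumes the conditions of Eq.(7.11) hold, and reproduces the quotient of this example.

def divide(wx, wy):
    # Binomial division (Eq. 7.12): quotient opinion on z such that wx = wy AND z.
    # Requires the conditions of Eq. (7.11), in particular ax < ay and dx >= dy.
    bx, dx, ux, ax = wx
    by, dy, uy, ay = wy
    b = ay*(bx + ax*ux) / ((ay - ax)*(by + ay*uy)) - ax*(1 - dx) / ((ay - ax)*(1 - dy))
    d = (dx - dy) / (1 - dy)
    u = ay*(1 - dx) / ((ay - ax)*(1 - dy)) - ay*(bx + ax*ux) / ((ay - ax)*(by + ay*uy))
    return (b, d, u, ax/ay)

w_xy = (0.10, 0.80, 0.10, 0.20)   # opinion about x AND y
w_y  = (0.40, 0.00, 0.60, 0.50)   # opinion about y
print(divide(w_xy, w_y))          # ~(0.15, 0.80, 0.05, 0.40), with P = 0.12/0.70 ~ 0.17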
Fig. 7.7 Example division of a binomial opinion by another binomial opinion

The quotient is ωx = ω((x∧y) ∧̃ y) = (0.15, 0.80, 0.05, 0.40), and it can be verified that Eq.(7.13) holds for the quotient projected probability.

Px = P((x∧y) ∧̃ y) = P(x∧y) / Py = 0.12/0.70 = 0.17.    (7.13)

Although probability division is a traditional operation used in probabilistic models and analysis, the corresponding binary logic operator UN-AND is rarely used, and was only introduced in 2004 [46].

7.3.2 Binomial Codivision

The inverse operation to comultiplication is codivision. The co-quotient of opinions about propositions x and y represents the opinion about a proposition z which is independent of y such that ωx = ωy∨z. This requires that:

ax > ay,    bx ≥ by,
dx ≥ (1−ax) ay (1−bx) dy / ( ax (1−ay)(1−by) ),
ux ≥ ay (1−bx) uy / ( ax (1−by) ).    (7.14)

Definition 7.4 (Binomial Codivision). Let X = {x, x̄} and Y = {y, ȳ} be domains, and let ωx = (bx, dx, ux, ax) and ωy = (by, dy, uy, ay) be binomial opinions on x and y satisfying Eq.(7.14). The codivision of opinion ωx by opinion ωy produces the co-quotient opinion ωx∨̃y = (bx∨̃y, dx∨̃y, ux∨̃y, ax∨̃y) defined by:

Co-quotient ωx∨̃y:
bx∨̃y = (bx − by) / (1 − by),
dx∨̃y = (1−ay)(dx + (1−ax)ux) / ( (ax−ay)(dy + (1−ay)uy) )  −  (1−ax)(1−bx) / ( (ax−ay)(1−by) ),
ux∨̃y = (1−ay)(1−bx) / ( (ax−ay)(1−by) )  −  (1−ay)(dx + (1−ax)ux) / ( (ax−ay)(dy + (1−ay)uy) ),
ax∨̃y = (ax − ay) / (1 − ay).    (7.15)

By using the symbol '⊔̃' to denote this operator, codivision of opinions can be written as ωx∨̃y = ωx ⊔̃ ωy. ⊓⊔

Figure 7.8 shows a screenshot of the online demonstrator for subjective logic operators. The example of Figure 7.8 illustrates codivision of the binomial opinion ω(x∨y) = (0.05, 0.55, 0.40, 0.75) by the binomial opinion ωy = (0.00, 0.80, 0.20, 0.50).

Fig. 7.8 Example codivision of a binomial opinion by another binomial opinion

The co-quotient is ωx = (0.05, 0.49, 0.46, 0.50), and it can be verified that Eq.(7.16) holds for the co-quotient projected probability.

Px = P((x∨y) ∨̃ y) = P(x∨y) ⊔̃ Py = (P(x∨y) − Py) / (1 − Py) = (0.35 − 0.10)/(1 − 0.10) = 0.28.    (7.16)

Although probability codivision is a traditional operation used in probabilistic models and analysis, the corresponding binary logic operator UN-OR is rarely used, and was only introduced in 2004 [46].

7.4 Correspondence with Probabilistic Logic

Multiplication, comultiplication, division and codivision of dogmatic opinions are equivalent to the corresponding probability calculus operators in Table 7.2, where e.g. p(x) denotes the probability of variable value x.

Table 7.2 Probability calculus operators corresponding to opinion operators.
Operator name       Result type     Probability calculus operator
Multiplication      Product         p(x ∧ y) = p(x)p(y)
Comultiplication    Coproduct       p(x ∨ y) = p(x) + p(y) − p(x)p(y)
Division            Quotient        p(x ∧̃ y) = p(x)/p(y)
Codivision          Co-quotient     p(x ∨̃ y) = (p(x) − p(y))/(1 − p(y))

In the case of absolute opinions, i.e. when the argument opinions have either b = 1 (absolute belief) or d = 1 (absolute disbelief), these opinions can be interpreted as Boolean TRUE or FALSE, so the multiplication and comultiplication operators are homomorphic with the binary logic operators AND and OR, as illustrated in Figure 5.3.

Chapter 8 Multinomial Multiplication and Division

Multinomial (and hypernomial) multiplication is different from binomial multiplication in that the product opinion on the whole product domain is considered, instead of just on one element of the product domain. Figure 8.1 below illustrates the general situation with two domains X and Y that form the Cartesian product X×Y. The product of two opinions ωX and ωY produces belief masses on singleton elements of X×Y as well as on the row and column subsets of X×Y.

Fig. 8.1 Cartesian product of two domains

In order to produce an opinion with only belief mass on each singleton element of X×Y as well as uncertainty mass on X×Y, some of the belief mass on the row and column subsets of X×Y must be redistributed to the singleton elements in such a way that the projected probability of each singleton element in X×Y equals the product of projected probabilities of pairs of singleton values from X and Y respectively.

Evaluating the multinomial product of two separate multinomial opinions involves the Cartesian product of the respective domains to which the opinions apply. Let ωX and ωY be two independent multinomial opinions that apply to X and Y:

X = {x1, x2, ..., xk} with cardinality k,
Y = {y1, y2, ..., yl} with cardinality l.    (8.1)

The Cartesian product X×Y with cardinality kl is expressed as the matrix:

         (x1 y1), (x1 y2), ..., (x1 yl)
         (x2 y1), (x2 y2), ..., (x2 yl)
X×Y =       .        .     ...     .                 (8.2)
         (xk y1), (xk y2), ..., (xk yl)

Consider the random variable XY which takes its values from the Cartesian product X×Y. We then turn to the multinomial product of multinomial opinions. The raw terms produced by ωX · ωY can be separated into four groups.

1. The first group of terms consists of raw product belief masses on singletons of X×Y:

                     bX(x1)bY(y1), bX(x1)bY(y2), ..., bX(x1)bY(yl)
                     bX(x2)bY(y1), bX(x2)bY(y2), ..., bX(x2)bY(yl)
b^Singletons_XY =         .             .        ...       .          (8.3)
                     bX(xk)bY(y1), bX(xk)bY(y2), ..., bX(xk)bY(yl)

2. The second group of terms consists of belief masses on rows of X×Y:

b^Rows_XY = ( bX(x1)uY, bX(x2)uY, ..., bX(xk)uY )    (8.4)

3. The third group consists of belief masses on columns of X×Y:

b^Columns_XY = ( uX bY(y1), uX bY(y2), ..., uX bY(yl) )    (8.5)

4. The last term is simply the uncertainty mass on the whole product domain:

u^Domain_XY = uX uY    (8.6)

The challenge is how to interpret the various types of product belief masses. In case of hypernomial products the product belief masses are directly interpreted as part of the hypernomial belief mass distribution, as explained in Section 8.4.
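The four groups of raw terms are simple to generate. The following Python sketch (hypothetical names; a multinomial opinion is represented here as a belief-mass dict together with an uncertainty mass) computes the raw singleton, row, column and domain masses of Eq.(8.3) to Eq.(8.6).

def raw_product_terms(bX, uX, bY, uY):
    # Raw terms of wX * wY before any redistribution (Eqs. 8.3-8.6).
    singletons = {(x, y): bX[x]*bY[y] for x in bX for y in bY}   # Eq. (8.3)
    rows       = {x: bX[x]*uY for x in bX}                       # Eq. (8.4): mass on row {x} x Y
    columns    = {y: uX*bY[y] for y in bY}                       # Eq. (8.5): mass on column X x {y}
    u_domain   = uX*uY                                           # Eq. (8.6): mass on the whole domain
    return singletons, rows, columns, u_domain

bX, uX = {'x1': 0.6, 'x2': 0.3}, 0.1
bY, uY = {'y1': 0.7, 'y2': 0.2}, 0.1
print(raw_product_terms(bX, uX, bY, uY))
# All raw masses sum to 1, but the row and column masses sit on overlapping subsets of X x Y.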
In case of multinomial products some of the belief mass on the row and column subsets of X×Y must be redistributed to the singleton elements in such a way that the projected probability of each singleton element equals the product of projected probabilities of pairs of singleton values from X and Y respectively. There are (at least) 3 approaches of multinomial opinion products that produce consistent projected probability products, namely normal multiplication described in Section 8.1, 8.1 Normal Multiplication 117 proportional multiplication described in Section 8.2, and projected multiplication described in Section 8.3. Whatever method is used, the projected probability distribution PXY of the product is always the same: PXY (x y) = PX (x)PY (y) (8.7) The product variance depends on the uncertainty, which in general is different for each method. Based on the Dirichlet PDF of the computed products, the product variance can easily be computed through Eq.(3.18), as expressed in Eq.(8.8). Multinomial product variance: VarXY (x y) = PXY (x y)(1 − PXY (x y))uXY W + uXY (8.8) where W denotes the non-informative prior weight, which is must set to W = 2. The non-informative prior weight is discussed in Section 3.3.2. It can for example be seen that when the uncertainty is zero, the variance is also zero, which is the case for dogmatic multinomial opinions. In the general case, the product variance of Eq.(8.8) is an approximation of the analytically correct variance, which typically is complex, and for which there is no closed expression. 8.1 Normal Multiplication The singleton terms of Eq.(8.3) and the uncertainty mass on the whole domain of Eq.(8.6) are unproblematic because they conform with the multinomial opinion representation of having belief mass only on singletons and on the whole domain. In contrast, the set of terms on rows of Eq.(8.4) and columns of Eq.(8.5) apply to overlapping subsets which is not compatible with the required format for multinomial opinions, and therefore needs to be reassigned. Some belief mass from those terms can be reassigned to belief mass on singletons, and some to uncertainty mass on the whole domain. 8.1.1 Determining Uncertainty Mass Consider the belief mass from Eq.(8.4) and Eq.(8.5) as potential uncertainty masses, expressed as: Potential uncertainty mass from rows: uRows XY = ∑x∈X b Rows XY (x) (8.9) Potential uncertainty mass from columns: uColumns = ∑y∈Y b Columns (y) XY XY 118 8 Multinomial Multiplication and Division The sum of the uncertainty masses from Eq.(8.6) and Eq.(8.9) represents the maximum possible uncertainty mass uMax XY expressed as: Rows Columns uMax + uDomain XY = uXY + uXY XY (8.10) The minimum possible uncertainty mass uMin XY is simply: Domain uMin XY = uXY (8.11) The projected probability of each singleton in the product domain can easily be computed as the product of the projected probabilities of each pair of value of X and Y according to Eq.(8.12). PX (x)PY (y) = (bbX (x) + a X (x)uX )(bbY (y) + aY (y)uY ) (8.12) We also require that the projected probability distribution over the product variable can be computed as a function of the product opinion according to Eq.(8.13). 
PXY (x y) = bXY (x y) + aX (x)aaY (y)uXY (8.13) Obviously the quantities of Eq.(8.12) and Eq.(8.13) are equal, so we can write: PX (x)PY (y) = PXY (x y) ⇔ (bbX (x) + a X (x)uX )(bbY (y) + aY (y)uY ) = b XY (x y) + a X (x)aaY (y)uXY ⇔ uXY = (8.14) (bbX (x)+aaX (x)uX )(bbY (y)+aaY (y)uY )−bbXY (x y) a X (x)aaY (y) The task now is to determine uXY and the belief distribution b XY of the multinomial product opinion ωXY . There is at least one product value (xi y j ) for which the following equation can be satisfied: Singletons b XY (xi y j ) = b XY (xi y j ) (8.15) Based on Eq.(8.14) and Eq.(8.15) there is thus at least one product value (xi y j ) for which the following equation can be satisfied: (i, j) uXY = Singletons (bbX (xi )+aaX (xi )uX )(bbY (y j )+aaY (y j )uY )−bbXY (xi y j ) a X (xi )aaY (y j ) (8.16) Singletons = (i, j) PX (xi )PY (yi )−bbXY a X (xi )aaY (y j ) Max where uMin XY ≤ uXY ≤ uXY . (xi y j ) 8.1 Normal Multiplication 119 In order to determine the uncertainty mass for the product opinion, each product value (xi y j ) ∈ X×Y must be visited in turn to find the smallest uncertainty mass (i, j) uXY that satisfies Eq.(8.16). (i, j) The product uncertainty can now be determined as the smallest uXY from Eq.(8.16), expressed as: uXY = (i, j) min uXY (x y)∈X×Y (8.17) 8.1.2 Determining Belief Mass Having determined the uncertainty mass uXY according to Eq.(8.17), the expression for the product projected probability of Eq.(8.12) can be used to compute the belief mass on each element in the product domain, as expressed by Eq.(8.18). b XY (x y) = PX (x)PY (y) − a X (x)aaY (y)uXY (8.18) = (bbX (x) + a X (x)uX )(bbY (y) + aY (y)uY ) − a X (x)aaY (y)uXY It can be shown that the additivity property of Eq.(8.19) is preserved. uXY + ∑ b XY (x y) = 1 (8.19) (x y)∈X×Y From Eq.(8.18) it follows directly that the product operator is commutative. It can also be shown that the product operator is associative. 8.1.3 Product Base Rates The computation of product base rates is straightforward according to Eq.(8.20) below. a X (x1 )aaY (y1 ), a X (x1 )aaY (y2 ), . . . a X (x1 )aaY (yl ) a X (x2 )aaY (y1 ), a X (x2 )aaY (y2 ), . . . a X (x2 )aaY (yl ) a XY = (8.20) . . ... . a X (xk )aaY (y1 ), a X (xk )aaY (y2 ), . . . a X (xk )aaY (yl ) 120 8 Multinomial Multiplication and Division 8.1.4 Assembling the Multinomial Product Opinion Having computed the belief mass distribution b XY , the uncertainty mass uXY and the base rate distribution a XY according to the above stepwise procedure, the multinomial product opinion is complete as expressed by: ωXY = (bbXY , uXY , a XY ). (8.21) Although not directly obvious, the normal multinomial product opinion method described here is a generalisation of the binomial product described in Section 7.1.1. Because of the relative simplicity of the binomial product it can be described as a closed expression. For the normal multinomial product however, the stepwise procedure described here is needed. 8.1.5 Justification for Normal Multinomial Multiplication The method to determine the product uncertainty in Eq.(8.17) might appear ad hoc. However, there is a clear rationale behind this method. The rationale is that the product belief masses must be at least as large as the raw product belief masses of Eq.(8.3), anything else would be irrational, so we define this as a requirement. 
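This requirement, together with the uncertainty minimisation of Eq.(8.17), is all that is needed to implement the operator. The Python sketch below (hypothetical names, reusing the dict representation from the sketch above) computes the candidate uncertainties of Eq.(8.16), takes their minimum according to Eq.(8.17), and assigns belief masses according to Eq.(8.18); with the opinions of the example in Section 8.6 it reproduces the normal product of Table 8.3.

def normal_product(bX, uX, aX, bY, uY, aY):
    # Normal multinomial multiplication (Section 8.1), a sketch of the stepwise procedure.
    PX = {x: bX[x] + aX[x]*uX for x in bX}   # projected probabilities of the factors
    PY = {y: bY[y] + aY[y]*uY for y in bY}
    # Candidate uncertainty per singleton (Eq. 8.16); the product uncertainty is the smallest (Eq. 8.17).
    u_XY = min((PX[x]*PY[y] - bX[x]*bY[y]) / (aX[x]*aY[y]) for x in bX for y in bY)
    a_XY = {(x, y): aX[x]*aY[y] for x in bX for y in bY}                      # Eq. (8.20)
    b_XY = {(x, y): PX[x]*PY[y] - a_XY[(x, y)]*u_XY for x in bX for y in bY}  # Eq. (8.18)
    return b_XY, u_XY, a_XY

# Gender and mutation opinions of Table 8.2:
bX, uX, aX = {'M': 0.6, 'F': 0.3}, 0.1, {'M': 0.5, 'F': 0.5}
bY, uY, aY = {'S': 0.7, 'T': 0.2}, 0.1, {'S': 0.5, 'T': 0.5}
print(normal_product(bX, uX, aX, bY, uY, aY))
# u = 0.11 and belief masses 0.460, 0.135, 0.235, 0.060, as in the normal product of Table 8.3.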
Remember that the larger the product uncertainty, the smaller the product belief masses, so if the product uncertainty is too large, then the requirement might not be satisfied for all product belief masses. The largest uncertainty which satisfies the requirement is defined as the product uncertainty. The method for determining the product uncertainty in Section 8.1.1 follows exactly this principle.

The operator for normal multinomial multiplication is thus conservative, in the sense that it preserves the maximum uncertainty possible.

8.2 Proportional Multiplication

Given the product projected probability distribution PXY computed with Eq.(8.7), the question is how to compute an appropriate uncertainty level. The proportional method uses uX and uY together with the maximum theoretical uncertainty levels ûX and ûY, and defines the uncertainty level uXY based on the assumption that uXY is the proportional average of uX and uY, as expressed by Eq.(8.22). This produces the uncertainty uXY.

uXY / ûXY = (uX + uY) / (ûX + ûY)    (8.22)

⇔  uXY = ûXY (uX + uY) / (ûX + ûY)    (8.23)

The computation of uncertainty-maximised opinions ω̂X with uncertainty ûX is described in Section 3.4.6. The convention for marginal cases of division by zero is that the whole fraction is equal to zero, as e.g. expressed by:

IF (uX + uY = 0) ∧ (ûX + ûY = 0) THEN (uX + uY)/(ûX + ûY) = 0.    (8.24)

Eq.(8.24) is sound in all cases, because we always have (uX + uY) ≤ (ûX + ûY). Of course, the uncertainty sum (uX + uY) is strictly limited by the maximum possible uncertainty sum (ûX + ûY). This property ensures that uXY ∈ [0, 1] in Eq.(8.23).

Having computed the uncertainty level uXY in Eq.(8.23), the belief masses are computed according to:

bXY(x y) = PXY(x y) − aXY(x y) uXY,  for each (x y) ∈ X×Y.    (8.25)

This completes the computation of the proportional product opinion ωXY, expressed as:

ωXY = ωX · ωY    (8.26)

Proportional multiplication as described here produces slightly less uncertainty than normal multiplication described in Section 8.1, and can therefore be considered as slightly more aggressive. The precise nature of their similarity and difference remains to be analysed.

8.3 Projected Multiplication

A projected multinomial product ω′XY can be computed by first computing a hypernomial opinion with belief mass distribution consisting of b^Singletons_XY from Eq.(8.3), b^Rows_XY from Eq.(8.4) and b^Columns_XY from Eq.(8.5). The uncertainty is u^Domain_XY from Eq.(8.6), and the base rate distribution is aXY from Eq.(8.20). Then this hypernomial opinion is projected to a multinomial opinion according to Eq.(3.31) on p.37. The result is the multinomial product opinion ω′XY.

In general the projected multinomial product opinion ω′XY has less uncertainty than the normal product opinion ωXY described in Section 8.1 above, although both have the same projected probability distribution. In case one or both of the factor opinions ωX and ωY contain significant uncertainty, it is desirable to let this be reflected in the product opinion. The normal multinomial product ωXY described above in Section 8.1 is therefore the preferred method to be used.

8.4 Hypernomial Product

Evaluating the hypernomial product of two separate multinomial or hypernomial (or even binomial) opinions involves the Cartesian product of the respective domains to which the factor opinions apply.
Assume the two domains X of cardinality k and Y of cardinality l, as well as their hyperdomains R(X) of cardinality κ = (2k − 2) and R(Y) of cardinality λ = (2l − 2). The Cartesian product X×Y with cardinality kl is expressed as the matrix: (x1 y1 ), (x1 y2 ), · · · (x1 yl ) (x2 y1 ), (x2 y2 ), · · · (x2 yl ) . ··· . X×Y = (8.27) . . . ··· . (xk y1 ), (xk y2 ), · · · (xk yl ) The hyperdomain of X×Y is denoted R(X×Y). Let ωX and ωY be two independent hypernomial opinions that apply to the separate domains. The task is to compute the hypernomial product opinion ωXY . Table 8.1 summarises characteristics of the opinions and domains involved in a hypernomial product. Table 8.1 Hypernomial product elements Dom. Cardi. Hyperdom. Hypercardi. Var. Val. Bel. mass dist. ♯ bel. masses Factor ωX X k R(X) κ X x bX κ Factor ωY Y l R(Y) λ Y y bY λ kl R(X×Y) (2kl − 2) XY xy bXY (κλ + κ + λ ) Product ωXY X×Y The expression (κλ + κ + λ ) represents the number of belief masses of the hypernomial product. This number emerges as follows: The opinion factor ωX ’s belief mass distribution b X can have κ belief masses, and the opinion factor ωY ’s belief mass distribution bY can have λ belief masses, so their product produces κλ belief masses. In addition, the product between b X and uY produces κ belief masses, and the product between bY and uX produces λ belief masses. Note that (κλ + κ + λ ) ∝ 2k+l . The expression (2kl −2) represents the number of hypervalues in R(X×Y). Note that (2kl − 2) ∝ 2kl . 8.5 Product of Dirichlet Probability Density Functions 123 Because 2kl ≫ 2k+l with growing k and l, the number (2kl − 2) of possible values in R(X×Y) is in general far superior to the number (κλ + κ + λ ) of belief masses of the hypernomial product. A hypernomial opinion product is thus highly constrained with regard to the set of hypervalues that can receive belief mass. We now turn to the computation of the hypernomial opinion product. The terms produced by ωX · ωY can be separated into four groups. 1. The first group of terms consists of belief masses on hypervalues of R(X×Y): b X (x1 )bbY (y1 ), b X (x1 )bbY (y2 ), . . . b X (x1 )bbY (yλ ) b X (x2 )bbY (y1 ), b X (x2 )bbY (y2 ), . . . b X (x2 )bbY (yλ ) HValues (8.28) b XY = . . ... . b X (xκ )bbY (y1 ), b X (xκ )bbY (y2 ), . . . b X (xκ )bbY (yλ ) 2. The second group of terms consists of belief masses on hyperrows of R(X×Y): b HRows = b X (x1 )uY , b X (x2 )uY , . . . b X (xκ )uY XY (8.29) 3. The third group consists of belief masses on hypercolumns of R(X×Y): b HColumns = uX bY (y1 ), uX bY (y2 ), . . . uX bY (yλ ) XY (8.30) 4. The last term is simply the belief mass on the whole product domain: uHDomain = uX uY XY (8.31) The set of κλ belief masses of b HValues , the κ belief masses of b HRows and the λ XY XY Hyper HColumns belief masses of b XY together form the belief mass distribution b XY of ωXY : Hyper b XY = (bbHValues , b HRows , b HColumns ) XY XY XY (8.32) The uncertainty mass is simply uHDomain . Finally the base rate distribution a XY XY is the same as that of multinomial products in Eq.(8.20). The hypernomial product opinion is then defined as: Hyper Hyper ωXY = (bbXY , uHDomain , aXY ). XY (8.33) If needed the hypernomial product opinion ωXY can be projected to a multinomial opinion according to Eq.(3.31) on p.37. The result is then a multinomial ′ which has the same projected probability distribution as that product opinion ωXY Hyper of ωXY . 
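As a sketch of the assembly just described (hypothetical names; hypervalues are represented simply as keys of the belief-mass dicts, and the whole factor domains by the markers 'X' and 'Y'), the four groups of terms of Eq.(8.28) to Eq.(8.31) can be generated directly, since no redistribution of belief mass is needed.

def hyper_product(bX, uX, bY, uY):
    # Hypernomial product (Section 8.4): the raw terms are used directly as product belief masses.
    b = {(x, y): bX[x]*bY[y] for x in bX for y in bY}     # kappa*lambda terms, Eq. (8.28)
    b.update({(x, 'Y'): bX[x]*uY for x in bX})            # kappa hyperrow terms, Eq. (8.29)
    b.update({('X', y): uX*bY[y] for y in bY})            # lambda hypercolumn terms, Eq. (8.30)
    u = uX*uY                                             # uncertainty on the whole product domain, Eq. (8.31)
    # Together with the base rate distribution of Eq. (8.20) this gives the product opinion of Eq. (8.33).
    return b, u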
8.5 Product of Dirichlet Probability Density Functions

Multinomial opinion multiplication can be leveraged to compute products of Dirichlet PDFs (Probability Density Functions) described in Section 3.4.2.

Assume domains X and Y. The variables X and Y take their values from X and Y respectively. In the Dirichlet model the analyst observes occurrences of values x ∈ X and values y ∈ Y, and represents these observations as evidence vectors rX and rY, so that e.g. rX(x) represents the number of observed occurrences of the value x. In addition the analyst must specify the base rate distribution aX over X as well as the base rate distribution aY over Y. These parameters define the Dirichlet PDFs on X and Y. Let Dir^e_X and Dir^e_Y denote the evidence Dirichlet PDFs on variables X and Y respectively, according to Eq.(3.16). Their product can be denoted as:

Dir^e_XY = Dir^e_X · Dir^e_Y    (8.34)

The procedure for computing Dir^e_XY according to Eq.(8.34) is described next, and is also illustrated in Figure 8.2.

1. Specify the evidence parameters (rX, aX) and (rY, aY) of the factor Dirichlet PDFs.
2. Derive opinions ωX and ωY from the Dirichlet PDFs according to the mapping of Eq.(3.23).
3. Compute ωXY = ωX · ωY as described in Section 8.1 in the case of a multinomial product.
4. Derive the product Dirichlet PDF Dir^e_XY from the multinomial product opinion ωXY according to the mapping of Eq.(3.23).

Figure 8.2 depicts the procedure just described.

Fig. 8.2 Procedure for computing the product of Dirichlet PDFs

In general the product of two Dirichlet PDFs can be computed as a Dirichlet PDF. If needed, a Dirichlet HPDF can be projected onto a Dirichlet PDF. This is done by first mapping the Dirichlet HPDF to a hyper-opinion according to Eq.(3.36). Then the hyper-opinion is projected onto a multinomial opinion according to Eq.(3.31). Finally the multinomial opinion can be used in the multiplication operation.

The above described method for computing a product Dirichlet PDF from two Dirichlet PDFs is very simple and requires very little computation. Although not equivalent, this result is related to Dirichlet convolution.

8.6 Example Multinomial Product Computation

We consider the scenario where a GE (Genetic Engineering) process can produce Male (M) or Female (F) fertilised eggs. In addition, each fertilised egg can have genetic mutation S or T independently of its gender. This constitutes two binary domains representing gender, Gen = {M, F}, and mutation, Mut = {S, T}, or alternatively the quaternary product domain Gen×Mut = {(M S), (M T), (F S), (F T)}.

Sensor A observes whether each egg has gender M or F, and Sensor B observes whether the egg has mutation S or T. Sensor A and Sensor B thus observe different and orthogonal aspects, so that opinions derived from their observations can be combined with multiplication. This is illustrated in Figure 8.3.

Fig. 8.3 Multiplication of opinions on orthogonal aspects of GE eggs (Sensor A produces the opinion about gender, Sensor B the opinion about mutation, and multiplication produces the product opinion about the egg)

The result of opinion multiplication in this case can be considered as an opinion based on observation from a single sensor that simultaneously detects both aspects at the same time. Assume that 20 eggs have been produced and that the two aspects have been observed for each egg.
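The opinions listed in Table 8.2 below follow from the observation counts through the evidence mapping used in step 2 of the procedure in Section 8.5. The Python sketch below (hypothetical names) assumes that mapping has the form b(x) = r(x)/(W + Σr) and u = W/(W + Σr) with prior weight W = 2, which reproduces the numbers in the table; the exact mapping is the one given by Eq.(3.23).

def opinion_from_counts(r, a, W=2.0):
    # Map observation counts r and base rates a to a multinomial opinion (assumed form of Eq. 3.23).
    total = W + sum(r.values())
    b = {x: r[x]/total for x in r}       # belief masses
    u = W/total                          # uncertainty mass
    P = {x: b[x] + a[x]*u for x in r}    # projected probabilities
    return b, u, P

print(opinion_from_counts({'M': 12, 'F': 6}, {'M': 0.5, 'F': 0.5}))
# b(M) = 0.60, b(F) = 0.30, u = 0.10, P(M) = 0.65, P(F) = 0.35
print(opinion_from_counts({'S': 14, 'T': 4}, {'S': 0.5, 'T': 0.5}))
# b(S) = 0.70, b(T) = 0.20, u = 0.10, P(S) = 0.75, P(T) = 0.25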
Table 8.2 summarises the observations and the resulting opinions. Table 8.2 Observations of egg gender and mutation Observations Opinions Base Rates Probabilities Gender opinion: r(M) = 12 r(F) = 6 b(M) = 0.60 b(F) = 0.30 uGen = 0.10 a(M) = 0.50 a(F) = 0.50 P(M) = 0.65 P(F) = 0.35 Mutation opinion: r(S) = 14 r(Ξ) = 4 b(S) = 0.70 b(Ξ) = 0.20 uMut = 0.10 a(S) = 0.50 a(Ξ) = 0.50 P(S) = 0.75 P(Ξ) = 0.25 The Cartesian product domain and the projected probabilities are expressed as: 126 8 Multinomial Multiplication and Division Gen×Mut = (M S), (F S), (M T) , (F T) P(Gen×Mut) = 0.49, 0.26, 0.16 . (8.35) 0.09 The next section describes and compares the normal multinomial product and the projected multinomial product of this example. Then Section 8.6.2 describes the hypernomial product of this example. 8.6.1 Multinomial Product Computation Below is presented the result of multinomial multiplication according to the normal product method described in Section 8.1 as well as the projected product method described in Section 8.3. Table 8.3 shows the results applying the method of normal multinomial product of Section 8.1, as well as the method or projected multinomial product. Also shown are the Dirichlet PDF parameters that are obtained with the mapping of Eq.(3.23). Table 8.3 therefore accounts for both multinomial opinion product as well as Dirichlet PDF product. Table 8.3 Multinomial products of egg gender and mutation Opinions Base Rates Probabilities Eq. Observations b(M S) b(M T) b(F S) b(F T) uGen×Mut = 0.460 = 0.135 = 0.235 = 0.060 = 0.110 a(M S) a(M T) a(F S) a(F T) = 1/4 = 1/4 = 1/4 = 1/4 P(M S) P(M T) P(F S) P(F T) = 0.49 = 0.16 = 0.26 = 0.09 r(M S) r(M T) r(F S) r(F T) ∑r = 8.36 = 2.45 = 4.27 = 1.09 = 16.18 b(M S) Proportional b(M T) product: b(F S) b(F T) uGen×Mut = 0.473 = 0.148 = 0.248 = 0.073 = 0.058 a(M S) a(M T) a(F S) a(F T) = 1/4 = 1/4 = 1/4 = 1/4 P(M S) P(M T) P(F S) P(F T) = 0.49 = 0.16 = 0.26 = 0.09 r(M S) r(M T) r(F S) r(F T) ∑r = 16.21 = 5.07 = 8.50 = 2.50 = 32.28 b(M S) b(M T) b(F S) b(F T) uGen×Mut = 0.485 = 0.160 = 0.260 = 0.085 = 0.010 a(M S) a(M T) a(F S) a(F T) = 1/4 = 1/4 = 1/4 = 1/4 P(M S) P(M T) P(F S) P(F T) = 0.49 = 0.16 = 0.26 = 0.09 r(M S) r(M T) r(F S) r(F T) ∑r = 97 = 32 = 52 = 17 = 198 Normal product: Projected product: The normal product preserves the most uncertainty, the proportional product preserves about 50% less uncertainty, and the projected product hardly preserves any uncertainty at all. The normal product thereby represent the most conservative approach, and should normally be used for multinomial product computation in general situations. The analytically correct product of Dirichlet PDFs is computationally complex, and the methods described here represent approximations. Simula- 8.7 Multinomial Division 127 tions of the binomial product in Section 7.1.3 show that the normal product is a very good approximation of the analytically correct product. 8.6.2 Hypernomial Product Computation Computation of hypernomial products does not require any synthesis of uncertainty mass or projection and is therefore much simpler than the computation of multinomial products. We continue the example of observing egg gender and mutation. Based on the observation parameters of Table 8.2 the hypernomial product can be computed according to the method described in Section 8.4. In case of hypernomial products it is necessary to consider the product hyperdomain R(Gender×Mutation) (including the product domain (Gen Mut)) of Eq.(8.36). 
(M S), (M T), (M Mut) (F T), (F Mut) R(Gen×Mut) = (F S), (8.36) (Gen S), (Gen T), (Gen×Mut) Table 8.4 shows the hypernomial product ωGen×Mut including the hypernomial product DireH Gen×Mut . Table 8.4 Hypernomial product of egg gender and mutation Opinions Hypernomial product: b(M S) b(M T) b(F S) b(F T) b(M Mut) b(F Mut) b(Gen S) b(Gen T) uGen×Mut Base Rates = 0.42 = 0.12 = 0.21 = 0.06 = 0.07 = 0.07 = 0.06 = 0.03 = 0.01 a(M S) a(M T) a(F S) a(F T) = 0.25 = 0.25 = 0.25 = 0.25 Probabilities P(M S) P(M S) P(F S) P(F S) = 0.49 = 0.16 = 0.26 = 0.09 8.7 Multinomial Division Multinomial division is the inverse operation of multinomial multiplication described in Section 8.1. Similarly to how multinomial multiplication applies to the Cartesian product of two domains, multinomial division applies to the Cartesian quotient of a product domain by one of its factor domains. 128 8 Multinomial Multiplication and Division Consider the Cartesian product domain X × Y, and the factor domain Y. The Cartesian quotient resulting from dividing product domain X×Y by factor domainm Y produces the quotient domain X, as illustrated in Figure 8.4. :u; ; : } (x1 yl) y1 x1 (x2 y1) (x2 y2) } (x2 yl) y2 x2 . . . . . . (xk y1) (xk y2) . . . } / (xk yl) yl = ... (x1 y2) ... (x1 y1) xk Fig. 8.4 Cartesian quotient of a Cartesian product domain divided by one of its factors Assume a multinomial opinion ωXY = (bbXY , uXY , a XY ) on the product domain XY with the following belief mass distribution and base rate distribution. b XY (x1 y1 ), b X (x1 y2 ), . . . , b X (x1 yl ) b XY (x2 y1 ), b X (x2 y2 ), . . . , b X (x2 yl ) . ..., . b XY = . (8.37) . . . . . , . b XY (xk y1 ), b X (xk y2 ), . . . , b X (xk yl ) a XY (x1 y1 ), a X (x1 y2 ), . . . , a X (x1 yl ) a XY (x2 y1 ), a X (x2 y2 ), . . . , a X (x2 yl ) . ..., . a XY = . (8.38) . . . . . , . a XY (xk y1 ), a X (xk y2 ), . . . , a X (xk yl ) Assume also the multinomial opinion ωY = (bbY , uY , aY ) on domain Y. We want to determine ωX by multinomial division according to. ωX = ωXY /ωY (8.39) It can already be mentioned that there is no general solution to this seemingly simple equation. A general solution would have to satisfy the unrealistic requirement of Eq.(8.40). 8.7 Multinomial Division 129 PXY (xi y1 ) PXY (xi y2 ) PXY (xi yl ) = = ... , for all xi ∈ X. PY (y1 ) PY (y2 ) PY (yl ) (8.40) However, there is only one single product opinion ωXY for which Eq.(8.40) holds, making it impossible to satisfy that requirement in general. Instead, the realistic situation is expressed by Eq.(8.41). Unrealistic: PX (xi ) = PXY (xi yl ) PXY (xi y1 ) PXY (xi y2 ) 6= 6= . . . , for all xi ∈ X. PY (y1 ) PY (y2 ) PY (yl ) (8.41) For the base rate distributions a X , aY and a XY , we assume that the requirement does hold, as expressed by Eq.(8.42). Realistic: PX (xi ) 6= a XY (xi yl ) a XY (xi y1 ) a XY (xi y2 ) = = ... , for all xi ∈ X. aY (y1 ) aY (y2 ) aY (yl ) (8.42) Because there is no general analytical solution to multinomial division, we will instead describe two partial solutions below. The method for the first possible solution is averaging proportional division, and the method for the second possible solution is selective division. Assumed: a X (xi ) = 8.7.1 Averaging Proportional Division The method for averaging proportional division, denoted ‘/’, is synthetic by nature, in the sense that it produces a solution where there is no real analytical solution. 
The produced solution can not be described as an approximation, because there is not even a correct solution to approximate. The main principle of averaging proportional division is simply to compute the average of the different projected probabilities from Eq.(8.41). However, potential problems with zero divisors and non-additivity must be addressed, by defining one limit rule and one condition. Limit rule: ( IF (PXY (xi y j ) = PY (y j ) = 0) for some (i, j), P (xi y j ) THEN XY PY (y j ) = 0. (8.43) Condition: IF (PXY (xi y j ) > 0) ∧ (PY (y j ) = 0) for some (i, j), THEN no division is possible. (8.44) Thus, by applying the limit rule of Eq.(8.43) and respecting the condition of Eq.(8.44), a preliminary quotient projected probability distribution can be computed according to: 130 8 Multinomial Multiplication and Division PPre X (xi ) = l PXY (xi y j ) ∑ PY (y j ) j=1 ! 1 l (8.45) Note the cardinality l = |Y| used for producing the preliminary quotient probability distribution. It is possible that the probability distribution of PPre X is non-additive (i.e. the sum is not equal to 1). Additivity can be obtained through simple normalisation through the normalisation factor νXAve expressed as: k νXAve = ∑ PPre X (xi ) (8.46) i=1 The average quotient projected probability distribution PAve can then be comX puted, as expressed in Eq.(8.47). PAve X (xi ) = PPre X (xi ) . νXAve (8.47) The average quotient projected probability distribution is then: PAve X = {PX (xi ), for i = 1, . . . , k} One disadvantage of PAve X (8.48) is obviously that, in general, PXY = 6 PAve X ·PY . However, it can be noted that the following equality holds: Ave PAve X = (PX · PY )/PY (8.49) Given the average quotient projected probability distribution PAve X , the question is how to compute an appropriate level of uncertainty, based on the uncertainties uXY and uY . One simple heuristic method is to take the maximium theoretical uncertainty levels ubXY and ubY , and define the uncertainty level uY based on the assumption that uXY is the proportional average of uX and uY , as expressed by Eq.(8.50). This produces a preliminary uncertainty uPre X ⇔ ⇔ uXY ubXY uPre X ubX +b uY = = uPre = X uPre uPre uY X + uY X = + ubX + ubY ubX + ubY ubY + ubY uY uXY − ubXY ubX + ubY uXY (b uX + ubY ) − uY ubXY (8.50) (8.51) (8.52) bX with uncertainty ubX is The computation of uncertainty-maximised opinions ω described in Section 3.4.6. The convention for marginal cases of division by zero is that the whole fraction is equal to zero, as e.g. expressed by: 8.7 Multinomial Division 131 IF (uXY = 0) ∧ (b uXY = 0) THEN uXY = 0. ubXY (8.53) Eq.(8.53) is sound in all cases, because we always have uXY ≤ ubXY . Of course, the uncertainty uXY is strictly limited by the maximum possible uncertainty ubXY . The uncertainty uPre X of Eq.(8.52) could theoretically take values greater than 1 or less than 0. Therefore, in order to normalise the situation it is necessary to complete the computation by constraining its range. (uPre IF X > 1) THEN uX = 1 Range constraint: ELSEIF (uPre X < 0) THEN uX = 0 ELSE uX = uPre X (8.54) Having computed the uncertainty level uX , the belief masses are computed according to: b X (x) = PX (x) − a X (x)uX , for each x ∈ X (8.55) This completes the computation of the averaging proportional quotient opinion expressed as: ωXAve = ωXY /ωY (8.56) The averaging aspect of this operator stems from the averaging of the projected probabilities in Eq.(8.45). 
The proportional aspect comes from the computation of uncertainty in Eq.(8.52) proportionally to the uncertainty of the argument opinions. 8.7.2 Selective Division The method for selective division, denoted ‘’, assumes that one of the values of Y has been observed, and is thereby considered to be absolutely TRUE. Let the specific observed value be y j , then the belief mass distribution bY and projected probability distriution PY are expressed by: y j is TRUE ⇒ bY (y j ) = 1, and PY (y j ) = 1, bY (yt ) = 0, and PY (yt ) = 0 for all other values yt 6= y j (8.57) The quotient projected probability distribution is expressed as: PX (xi ) = PXY (xi y j ) = PXY (xi y j ), for i = 1, . . . , k PY (y j ) (8.58) 132 8 Multinomial Multiplication and Division The uncertainty can be determined in the same way as for average division according to Eq.(8.52), but since it is assumed that uY = 0 (because some y j is TRUE), then a simplified version of Eq.(8.52) is expressed as: uSel X = uXY ubX ubXY (8.59) It can be verified that uSel X ∈ [0, 1], so that no constraining is needed. Having computed the uncertainty level uSel X , the quotient belief mass distribution is computed as: Sel b Sel X (x) = PX (x) − a X (x)uX , for each x ∈ X (8.60) This completes the computation of the selective quotient opinion expressed as: ωXSel = ωXY ωY (8.61) The selective aspect is that one of the divisor elements is considered TRUE, which makes it possible to select one specific quotient term of candidate projected probability terms, thereby avoiding complications resulting from unequal terms. 8.8 Multinomial Opinion Projection A multinomial product opinion is a joint opinion in case the factor opinions are independent. Let ωX and ωY be independent multinomial opinions on respective domains X and Y, and let ωXY be the multinomial product opinion on the Cartesian product domain X×Y. There are cases where it is necessary to project the product opinion to produce a projected opinion on a factor domain. The interpretation of this operation is that a projected opinion represents the specific factor in the product. 8.8.1 Opinion Projection Method Let X be a domain of cardinality k = |X|, and let Y be a domain of cardinality l = |Y|, with X and Y as their respective variables. Let ωX and ωY be two opinions, and let ωXY be the product opinion on domain X × Y. The product opinion ωXY can be projected onto X or Y , so both projections are described below. Let PX and PY be (assumed) projected probability distributions over X and Y respectively, then the product projected probability distribution is expressed as: 8.8 Multinomial Opinion Projection Product: PXY 133 PXY (x1 y1 ), PXY (x1 y2 ), . . . PXY (x2 y1 ), PXY (x2 y2 ), . . . = . ... . PXY (xk y1 ), PXY (xk y2 ), . . . PXY (x1 yl ) PXY (x2 yl ) . PXY (xk yl ) PX (x1 )PY (y1 ), PX (x1 )PY (y2 ), . . . PX (x2 )PY (y1 ), PX (x2 )PY (y2 ), . . . Implicitly equal to: = . ... . PX (xk )PY (y1 ), PX (xk )PY (y2 ), . . . (8.62) PX (x1 )PY (yl ) PX (x2 )PY (yl ) (8.63) . PX (xk )PY (yl ) The projection onto X and Y produces the projected probability distributions PX and PY expressed as: l P̌X (xi ) = ∑ PXY (xi y j ) j=1 (8.64) k P̌Y (y j ) = ∑ PXY (xi y j ) i=1 Similarly, the projected base rates a X and aY are computed as: l ǎaX (xi ) = ∑ a XY (xi y j ) j=1 (8.65) k ǎaY (y j ) = ∑ a XY (xi y j ) i=1 The question is how to determine an appropriate level of uncertainty. For this purpose, the same principle as for averaging proportional division described in Section 8.7.1 is used. 
In addition, the uncertainties of the two projected opinions ω̌X and ω̌Y are assumed to be equal relative to their respective maxima, i.e. ǔX/ûX = ǔY/ûY. This leads to the set of constraints on the left-hand side of Eq.(8.66), which produces the uncertainties on the right-hand side of Eq.(8.66).

Constraints:  uXY/ûXY = (ǔX + ǔY)/(ûX + ûY)  and  ǔX/ûX = ǔY/ûY
⇔  ǔX = ûX uXY/ûXY  and  ǔY = ûY uXY/ûXY    (8.66)

Having computed the projected uncertainties ǔX and ǔY, the respective projected belief mass distributions b̌X and b̌Y can be computed as:

b̌X(xi) = P̌X(xi) − ǎX(xi) ǔX
b̌Y(yj) = P̌Y(yj) − ǎY(yj) ǔY    (8.67)

This completes the projection onto X with opinion ω̌X = (b̌X, ǔX, ǎX), as well as the projection onto Y with opinion ω̌Y = (b̌Y, ǔY, ǎY).

8.8.2 Example: Football Games

Consider e.g. two football games, where in game X the teams x1 and x2 play against each other, and in game Y the teams y1 and y2 play against each other. The rules dictate that the games can not end in a draw, so if the score is level at the end of a game, a penalty shootout follows until one team wins. Assume that a betting analyst has the joint opinion ωXY about the four possible outcomes of the two games before they begin. Assume further that game Y is cancelled for some reason (e.g. the bus carrying team y2 had an accident). The betting analyst then wants to project the joint opinion ωXY onto the domain X to produce the projected opinion ωX. This is possible since all product values (x y) ∈ X×Y have assigned belief masses.

The Cartesian product domain, the belief mass distribution, and the base rate distribution are expressed as:

X×Y = { (x1 y1), (x1 y2), (x2 y1), (x2 y2) },

bXY(x1 y1) = 0.4,   bXY(x1 y2) = 0.2,
bXY(x2 y1) = 0.1,   bXY(x2 y2) = 0.1,     uXY = 0.2,

aXY(x1 y1) = 0.25,  aXY(x1 y2) = 0.25,
aXY(x2 y1) = 0.25,  aXY(x2 y2) = 0.25.    (8.68)

The product projected probability distribution can be computed as:

PXY(x1 y1) = 0.45,  PXY(x1 y2) = 0.25,
PXY(x2 y1) = 0.15,  PXY(x2 y2) = 0.15.    (8.69)

Table 8.5 summarises the projected opinions ω̌X and ω̌Y.

Table 8.5 Projected opinions ω̌X and ω̌Y

                   Opinions                            Base Rates               Probabilities
Projection on X:   b̌X(x1) = 0.600, b̌X(x2) = 0.200     ǎX(x1) = ǎX(x2) = 1/2    P̌X(x1) = 0.70, P̌X(x2) = 0.30
                   ǔX = 0.200
Projection on Y:   b̌Y(y1) = 0.467, b̌Y(y2) = 0.267     ǎY(y1) = ǎY(y2) = 1/2    P̌Y(y1) = 0.60, P̌Y(y2) = 0.40
                   ǔY = 0.266

Chapter 9 Conditional Deduction

9.1 Introduction to Conditional Reasoning

Both binary logic and probability calculus have mechanisms for conditional reasoning, where deduction and abduction are the fundamental operators. Conditional deduction and abduction are both discussed in this introduction, but this chapter only describes in detail the deduction operator for subjective logic. The abduction operator for subjective logic is described in detail in Chapter 10.

In binary logic, Modus Ponens (MP), which represents deduction, and Modus Tollens (MT), which represents abduction, are the classical operators that are used in any field of logic that requires conditional inference. In probability calculus, conditional probabilities together with base rates are used for analysing deductive and abductive reasoning models. Subjective logic extends the traditional probabilistic approach by allowing subjective opinions to be used as input arguments, so that deduced or abduced conclusions reflect the underlying uncertainty of the situation to be analysed.
Our knowledge of the real world tells us that certain variables and states are related in some way. For example, the state of rainy weather and the state of carrying an umbrella are often related, so it is meaningful to express this relationship as a conditional proposition "If it rains, Bob carries an umbrella", which is of the form "IF x THEN y". Here x denotes the antecedent proposition (aka. evidence) and y the consequent proposition (aka. hypothesis) in the conditional. The format of this binary logic conditional is thus always "IF <antecedent> THEN <consequent>".

Causal conditionals are typically considered in reasoning models, where the parent is assumed to cause the child. Consider a reasoning model about carrying an umbrella depending on rainy weather. There is obviously a causal conditional between rain and carrying an umbrella, so the rain variable becomes parent, and the umbrella variable becomes child, as illustrated in Figure 9.1.

Fig. 9.1 Principles of deduction and abduction (parent node X with causal conditionals towards child node Y; deduction produces the child opinion, abduction produces the parent opinion)

A conditional is a complex proposition consisting of the antecedent and consequent sub-propositions that practically or hypothetically represent states in the real world. The conditional proposition does not represent a state in the same way, rather it represents a relationship between states of the world. The purpose of conditionals is primarily to reason from knowledge about the antecedent proposition to infer knowledge about the consequent proposition, which commonly is called deductive reasoning. In addition, conditionals can also be used to reason from knowledge about the consequent proposition to infer knowledge about the antecedent proposition, which commonly is called abductive reasoning. Abductive reasoning involves inversion of conditionals, which makes abduction more complex than deduction.

In case of a causal conditional it is assumed that a parent variable dynamically influences a child variable in space and time. For example, consider the case of a binary parent proposition "It rains", and a binary child proposition "Bob carries an umbrella", which both can be evaluated to TRUE or FALSE. Initially, assume that the parent and child propositions are both FALSE. Subsequently, assume that the parent proposition becomes TRUE, and that this makes the child proposition also become TRUE. If this dynamic conditional relationship holds in general, then it can be seen as a TRUE conditional.

Note that in case of a TRUE causal conditional, forcing the child proposition to become TRUE normally does not influence the parent proposition in any way. The above scenario of rain and umbrellas clearly demonstrates this principle, because carrying an umbrella obviously does not bring rain. However, in case of a TRUE causal conditional, simply knowing that the child is TRUE can nevertheless indicate the truth value of the parent, because seeing Bob carrying an umbrella can plausibly indicate that it rains. Hence, in case of a TRUE causal conditional, to force or to know the child proposition to be TRUE can have very different effects on the parent proposition. However, to force or to know the parent proposition to be TRUE has the same effect on the child proposition.

A derivative conditional is the opposite of a causal conditional.
This means that even in case of a TRUE derivative conditional, forcing the parent proposition to become TRUE does not necessarily make the child proposition become TRUE as well. For example, the conditional "IF Bob carries an umbrella THEN it must be raining" is a derivative conditional, because forcing Bob to carry an umbrella does not cause rain.

Conditionals can also be non-causal. For example, in case two separate lamps are connected to the same electric switch, then observing one of the lamps being lit gives an indication of the other lamp being lit too, so there is clearly a conditional relationship between them. However, neither lamp actually causes the other to light up, rather it is the flipping of the switch which causes both lamps to light up at the same time.

Conditionals are logically directed from the antecedent to the consequent proposition. The idea is that an analyst with knowledge of the truth value of a conditional and its antecedent proposition can infer knowledge about the consequent proposition. However, in case the analyst needs to infer knowledge about the antecedent proposition, then the conditional can not be used directly. What the analyst needs is the opposite conditional, where the propositions have swapped places.

For example, assume that Alice knows that Bob usually carries an umbrella when it rains. Then she has an initial causal conditional with which she can determine with some accuracy whether he carries an umbrella, simply by observing that it rains. However, if she wants to determine whether it rains by observing that Bob picks up his umbrella before going out, then in theory the causal conditional is not directly applicable. Alice might intuitively infer that it probably rains if he picks up an umbrella, but in reality she then practices derivative reasoning, because she applies the inverse of a causal conditional, as well as abductive reasoning, because she implicitly inverts the initial conditional (whether it is causal or not). Inversion of conditionals is more complex than intuition indicates, so it is important to use sound methods to make sure that it is done correctly and consistently.

The degree of truth, or equivalently the validity of conditionals, can be expressed in different ways, e.g. as Boolean TRUE or FALSE, as probabilities, or as subjective opinions. In the sections below, the methods for binomial and multinomial conditional deduction are explained and described, first in case of probabilistic conditionals, and subsequently in case of opinion conditionals. Then, in Chapter 10, the methods for binomial and multinomial conditional abduction are explained and described.

9.2 Probabilistic Conditional Inference

With the aim of giving the reader a gentle introduction to the principles of deduction and abduction, this section provides a brief overview of how it is handled in traditional probability calculus. The binomial case is described first, followed by the general multinomial case.

9.2.1 Binomial Probabilistic Deduction and Abduction

The notation y‖x, introduced in [55], denotes that the truth or probability of proposition y is derived as a function of the probability of the parent x together with the conditionals p(y|x) and p(y|x̄). The expression p(y‖x) thus represents a derived value, whereas the expression p(y|x) represents an input argument. This notational convention will be used when describing probabilistic and subjective conditional reasoning below.
The deductive and abductive reasoning situations are illustrated in Figure 9.1, where X denotes the parent variable and Y denotes the child variable in the reasoning model. Conditionals are expressed as p(<consequent> | <antecedent>), i.e. with the consequent variable first and the antecedent variable last. Formally, a conditional probability is defined as follows:

\[ \text{Conditional probability:} \quad p(y|x) = \frac{p(x \wedge y)}{p(x)}. \qquad (9.1) \]

Parent-child reasoning models are typically assumed to be causal, where parent nodes have a causal influence over child nodes. In this situation conditionals are typically expressed in the same direction as the reasoning, i.e. with the parent as antecedent and the child as consequent. Forward conditional inference, called deduction, is when the analyst has evidence about the parent variable, and the child variable is the target of the reasoning.

Assume that the values x and x̄ are relevant to the value y (and ȳ) according to the conditionals y|x and y|x̄. Here, x and x̄ are parent values and y is a child value of the conditionals. Let p(x), p(y|x) and p(y|x̄) be probability assessments of x, y|x and y|x̄ respectively. The law of total probability says that the probability of a child value y is the sum of the conditional probabilities weighted by the probabilities of the parent values:

\[ \text{Law of total probability:} \quad p(y) = p(x)\,p(y|x) + p(\overline{x})\,p(y|\overline{x}). \qquad (9.2) \]

The conditionally deduced probability p(y‖x) can then be computed from the law of total probability as:

\[ \text{Deduced probability:} \quad p(y\|x) = p(x)\,p(y|x) + p(\overline{x})\,p(y|\overline{x}) = p(x)\,p(y|x) + \big(1 - p(x)\big)\,p(y|\overline{x}). \qquad (9.3) \]

In case the analyst knows that x is true, i.e. p(x) = 1, it follows immediately from Eq.(9.3) that p(y‖x) = p(y|x). Conversely, in case the analyst knows that x is false, i.e. p(x) = 0, it follows immediately from Eq.(9.3) that p(y‖x) = p(y|x̄).

Reverse conditional inference, called abduction, is when the analyst has evidence about the child variable, and the parent variable is the target of the reasoning. In this case the available conditionals are directed opposite to the reasoning, i.e. opposite to what the analyst needs. Assume that the values x and x̄ are relevant to y according to the conditionals y|x and y|x̄, where x and x̄ are parent values, and y and ȳ are child values. Let p(y), p(y|x) and p(y|x̄) be probability assessments of y, y|x and y|x̄ respectively. The required conditionals can be correctly derived by inverting the available conditionals using Bayes theorem:

\[ \text{Bayes theorem:} \quad p(x|y) = \frac{p(x)\,p(y|x)}{p(y)}. \qquad (9.4) \]

Bayes theorem is simply derived from the definition of conditional probability of Eq.(9.1). By expressing both conditional probabilities p(y|x) and p(x|y) and eliminating p(x ∧ y), Bayes theorem emerges:

\[ \left.
\begin{aligned}
p(y|x) &= \frac{p(x \wedge y)}{p(x)}\\
p(x|y) &= \frac{p(x \wedge y)}{p(y)}
\end{aligned}
\right\}
\;\Rightarrow\;
p(x|y) = \frac{p(x)\,p(y|x)}{p(y)}. \qquad (9.5) \]

With Bayes theorem, the inverted conditional p(x|y) can be computed from the conditionals p(y|x) and p(y|x̄). However, the simple expression of Bayes theorem hides some subtleties related to base rates, as explained below. Conditionals are assumed to represent general dependence relationships between statements, so the terms p(x) and p(y) on the right-hand side of Eq.(9.4) must also represent general probabilities, and not, for example, specific observations. A general probability must be interpreted as a base rate, as explained in Section 2.6. The term p(x) on the right-hand side of Eq.(9.4) therefore expresses the base rate of x.
Similarly, the term p(y) on the right-hand side of Eq.(9.4) expresses the base rate of y, which depends on the base rate of x, and which can be computed from the base rate of x using the law of total probability of Eq.(9.2). In order to avoid confusion between the base rate of x and the probability of x, the term a(x) will denote the base rate of x in the following. Similarly, the term a(y) will denote the base rate of y.

By applying the law of total probability of Eq.(9.2), the term p(y) in Eq.(9.5) can be expressed as a function of the base rate a(x) and its complement a(x̄) = 1 − a(x). As a result, the inverted positive and negative conditionals are:

\[ \text{Inverted conditionals:} \quad
\begin{cases}
p(x|y) = \dfrac{a(x)\,p(y|x)}{a(x)\,p(y|x) + a(\overline{x})\,p(y|\overline{x})},\\[8pt]
p(x|\overline{y}) = \dfrac{a(x)\,p(\overline{y}|x)}{a(x)\,p(\overline{y}|x) + a(\overline{x})\,p(\overline{y}|\overline{x})}.
\end{cases} \qquad (9.6) \]

The conditionally abduced probability p(x ̃‖ y) can then be computed as:

\[ \text{Abduced probability:} \quad
p(x\,\tilde{\|}\,y) = p(y)\,p(x|y) + p(\overline{y})\,p(x|\overline{y})
= p(y)\,\frac{a(x)\,p(y|x)}{a(x)\,p(y|x) + a(\overline{x})\,p(y|\overline{x})}
+ p(\overline{y})\,\frac{a(x)\,p(\overline{y}|x)}{a(x)\,p(\overline{y}|x) + a(\overline{x})\,p(\overline{y}|\overline{x})}. \qquad (9.7) \]

It can be noted that Eq.(9.7) is simply the application of conditional deduction according to Eq.(9.3), where the conditionals are determined according to Eq.(9.6).

The terms used in Eq.(9.3) and Eq.(9.7) are interpreted as follows:

p(y|x) : the conditional probability of y given that x is TRUE
p(y|x̄) : the conditional probability of y given that x is FALSE
p(x|y) : the conditional probability of x given that y is TRUE
p(x|ȳ) : the conditional probability of x given that y is FALSE
p(y) : the probability of y
p(ȳ) : the probability of the complement of y (= 1 − p(y))
a(x) : the base rate of x
p(y‖x) : the deduced probability of y as a function of evidence on x
p(x ̃‖ y) : the abduced probability of x as a function of evidence on y

The binomial expressions for probabilistic deduction of Eq.(9.3) and probabilistic abduction of Eq.(9.7) can be generalised to multinomial expressions, as explained below.
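Before moving to the multinomial case, the binomial formulas above can be made concrete with a minimal Python sketch. The sketch is not part of the book; the function names and the rain/umbrella example numbers are hypothetical. It implements deduction according to Eq.(9.3), conditional inversion according to Eq.(9.6), and abduction according to Eq.(9.7).

```python
# Illustrative sketch (hypothetical names and numbers) of binomial probabilistic
# deduction, Eq.(9.3), inversion, Eq.(9.6), and abduction, Eq.(9.7).

def deduce(p_x, p_y_given_x, p_y_given_notx):
    """Deduced probability p(y||x), Eq.(9.3)."""
    return p_x * p_y_given_x + (1 - p_x) * p_y_given_notx

def invert(a_x, p_y_given_x, p_y_given_notx):
    """Inverted conditionals (p(x|y), p(x|not-y)) from the base rate a(x), Eq.(9.6)."""
    p_x_given_y = a_x * p_y_given_x / (
        a_x * p_y_given_x + (1 - a_x) * p_y_given_notx)
    p_x_given_noty = a_x * (1 - p_y_given_x) / (
        a_x * (1 - p_y_given_x) + (1 - a_x) * (1 - p_y_given_notx))
    return p_x_given_y, p_x_given_noty

def abduce(p_y, a_x, p_y_given_x, p_y_given_notx):
    """Abduced probability p(x ~|| y), Eq.(9.7): deduction with inverted conditionals."""
    p_x_given_y, p_x_given_noty = invert(a_x, p_y_given_x, p_y_given_notx)
    return p_y * p_x_given_y + (1 - p_y) * p_x_given_noty

if __name__ == "__main__":
    # Hypothetical rain/umbrella numbers, for illustration only.
    p_rain, a_rain = 0.3, 0.2
    p_umb_if_rain, p_umb_if_dry = 0.9, 0.1
    print(deduce(p_rain, p_umb_if_rain, p_umb_if_dry))      # deduced p(umbrella||rain)
    print(abduce(0.8, a_rain, p_umb_if_rain, p_umb_if_dry)) # abduced p(rain ~|| umbrella)
```

Note how the abduction function simply reuses deduction after the conditionals have been inverted with the base rate, mirroring the observation that Eq.(9.7) is Eq.(9.3) applied to the inverted conditionals of Eq.(9.6).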
9.2.2 Multinomial Probabilistic Deduction and Abduction

Let X = {x_i | i = 1…k} be the parent domain with variable X, and let Y = {y_j | j = 1…l} be the child domain with variable Y. The deductive conditional relationship between X and Y is expressed as the set of conditionals p(Y|X), consisting of k specific conditionals p(Y|x_i), each having the l dimensions of Y. This is illustrated in Figure 9.2. A specific conditional probability distribution p(Y|x_i) relates the value x_i to the values of variable Y, and consists of l conditional probabilities:

\[ p(Y|x_i) = \{p(y_j|x_i),\; j = 1 \dots l\}, \quad \text{where } \sum_{j=1}^{l} p(y_j|x_i) = 1. \qquad (9.8) \]

[Figure 9.2 Multinomial conditionals between parent X and child Y: each parent value x_i points to the child variable Y = {y_1, …, y_l} through its conditional distribution p(Y|x_i).]

The term p(Y|X) denotes the set of probability distributions of the form of Eq.(9.8), which can be expressed as:

\[ p(Y|X) = \{p(Y|x_i),\; i = 1 \dots k\}. \qquad (9.9) \]

The generalised law of total probability for multinomial probability distributions is expressed as:

\[ \text{General law of total probability:} \quad p(y) = \sum_{i=1}^{k} p(x_i)\,p(y|x_i). \qquad (9.10) \]

The probabilistic expression for multinomial conditional deduction from X to Y is directly derived from the law of total probability of Eq.(9.10). The deduced probability distribution over Y is denoted p(Y‖X), where the deduced probability p(y_j‖X) of each value y_j is:

\[ \text{Deduced probability:} \quad p(y_j\|X) = \sum_{i=1}^{k} p(x_i)\,p(y_j|x_i). \qquad (9.11) \]

The deduced probability distribution over Y can be expressed as:

\[ p(Y\|X) = \{p(y_j\|X),\; j = 1 \dots l\}, \quad \text{where } \sum_{j=1}^{l} p(y_j\|X) = 1. \qquad (9.12) \]

Note that in case the exact value X = x_i is known, i.e. p(x_i) = 1, it follows immediately from Eq.(9.11) that the deduced probability distribution becomes p(Y‖X) = p(Y|x_i).

Moving on to abduction, it is necessary to first compute the inverted conditionals. The multinomial probabilistic expression for inverting conditionals, according to the generalised Bayes theorem, is:

\[ \text{Generalised Bayes theorem:} \quad p(x_i|y_j) = \frac{a(x_i)\,p(y_j|x_i)}{\sum_{t=1}^{k} a(x_t)\,p(y_j|x_t)}, \qquad (9.13) \]

where a(x_i) represents the base rate of x_i. By substituting the conditionals of Eq.(9.11) with the inverted multinomial conditionals of Eq.(9.13), the general expression for probabilistic abduction emerges:

\[ \text{Abduced probability:} \quad p(x_i\,\tilde{\|}\,Y) = \sum_{j=1}^{l} p(y_j) \left( \frac{a(x_i)\,p(y_j|x_i)}{\sum_{t=1}^{k} a(x_t)\,p(y_j|x_t)} \right). \qquad (9.14) \]

The abduced probability distribution can be expressed as:

\[ p(X\,\tilde{\|}\,Y) = \{p(x_i\,\tilde{\|}\,Y),\; i = 1 \dots k\}, \quad \text{where } \sum_{i=1}^{k} p(x_i\,\tilde{\|}\,Y) = 1. \qquad (9.15) \]

The terms used in the above formalism for multinomial conditional inference are interpreted as follows:

p(Y|X) : set of conditional probability distributions on Y
p(Y|x_i) : specific conditional probability distribution on Y
p(X|Y) : set of conditional probability distributions on X
p(X|y_j) : specific conditional probability distribution on X
p(X) : probability distribution over X
p(x_i) : probability of the specific value x_i
p(Y) : probability distribution over Y
p(y_j) : probability of the specific value y_j
a(X) : base rate distribution over X
a(Y) : base rate distribution over Y
p(Y‖X) : deduced probability distribution over Y
p(y_j‖X) : deduced probability of the specific value y_j
p(X ̃‖ Y) : abduced probability distribution over X
p(x_i ̃‖ Y) : abduced probability of the specific value x_i

The above formalism is illustrated with a numerical example in Section 10.8.1.

9.3 Notation for Subjective Conditional Inference

This section introduces the notation used for conditional deduction and abduction in subjective logic. The notation is similar to the corresponding notation for probabilistic deduction and abduction. The detailed mathematical description of the operators is provided in subsequent sections.

9.3.1 Notation for Binomial Deduction and Abduction

Let X = {x, x̄} and Y = {y, ȳ} be two binary domains with respective variables X and Y, where there is a degree of relevance between X and Y. Let ω_x = (b_x, d_x, u_x, a_x), ω_{y|x} = (b_{y|x}, d_{y|x}, u_{y|x}, a_{y|x}) and ω_{y|x̄} = (b_{y|x̄}, d_{y|x̄}, u_{y|x̄}, a_{y|x̄}) be an agent's respective opinions about x being true, about y being true given that x is true, and about y being true given that x is false. Conditional deduction is computed with the deduction operator denoted '⊚', so that binomial deduction is expressed as:

\[ \text{Binomial opinion deduction:} \quad \omega_{y\|x} = \omega_x \circledcirc (\omega_{y|x}, \omega_{y|\overline{x}}). \qquad (9.16) \]

Conditional abduction is computed with the abduction operator denoted '⊚̃', so that binomial abduction is expressed as:

\[ \text{Binomial opinion abduction:} \quad \omega_{x\tilde{\|}y} = \omega_y \,\widetilde{\circledcirc}\, (\omega_{y|x}, \omega_{y|\overline{x}}, a_x) = \omega_y \circledcirc (\omega_{x|y}, \omega_{x|\overline{y}}) = \omega_{x\|y}. \qquad (9.17) \]

The conditionally abduced opinion ω_{x ̃‖ y} expresses the belief about x being true as a function of the beliefs about y and the two sub-conditionals y|x and y|x̄, as well as the base rate a_x.
In order to compute Eq.(9.17) it is necessary to invert the conditional opinions ω_{y|x} and ω_{y|x̄} to obtain the conditional opinions ω_{x|y} and ω_{x|ȳ}, so that the final part of the abduction computation can be based on simple deduction according to Eq.(9.16). Inversion of binomial opinion conditionals is described in Section 10.3. The notation for binomial opinion inversion is:

\[ \text{Binomial opinion inversion:} \quad \{\omega_{x|y}, \omega_{x|\overline{y}}\} = \widetilde{\circledcirc}\,\big(\{\omega_{y|x}, \omega_{y|\overline{x}}\}, a_x\big). \qquad (9.18) \]

9.3.2 Notation for Multinomial Deduction and Abduction

Let domain X have cardinality k = |X| and domain Y have cardinality l = |Y|, where variable X plays the role of parent and variable Y the role of child. Assume a set of conditional opinions of the form ω_{Y|x_i}, i = 1…k, i.e. one conditional opinion for each value x_i of the parent variable. Each of these conditionals must be interpreted as the subjective opinion on Y given that x_i is TRUE. The subscript on each conditional opinion ω_{Y|x_i} thus specifies not only the child variable Y it applies to, but also the value x_i of the parent variable it is conditioned on.

By extending the notation for binomial conditional deduction to the case of multinomial opinions, the general expression for multinomial conditional deduction is written as:

\[ \text{Multinomial opinion deduction:} \quad \omega_{Y\|X} = \omega_X \circledcirc \omega_{Y|X}, \qquad (9.19) \]

where the symbol '⊚' denotes the conditional deduction operator for subjective opinions, and where ω_{Y|X} is the set of k = |X| different opinions conditioned on each x_i ∈ X respectively.

The structure of deductive reasoning is illustrated in Figure 9.3. The conditionals are expressed on the child variable, which is also the target variable of the deductive reasoning. In the example where the weather is the parent variable and carrying an umbrella is the child variable, the evidence is about the weather, and the conclusion is an opinion about carrying an umbrella.

[Figure 9.3 Structure of conditionals for deduction: the opinion ω_X on the parent variable X = {x_1, …, x_k} is combined with the conditionals ω_{Y|X} = {ω_{Y|x_1}, …, ω_{Y|x_k}} on the child variable Y = {y_1, …, y_l} to produce the deduced opinion ω_{Y‖X}.]

In case of abduction, the goal is to reason from the child variable Y to the parent variable X. The multinomial expression for subjective logic conditional abduction is written as:

\[ \text{Multinomial opinion abduction:} \quad \omega_{X\tilde{\|}Y} = \omega_Y \,\widetilde{\circledcirc}\, (\omega_{Y|X}, \boldsymbol{a}_X) = \omega_Y \circledcirc \omega_{X|Y} = \omega_{X\|Y}, \qquad (9.20) \]

where the operator symbol '⊚̃' denotes the general conditional abduction operator for subjective opinions, and where ω_{Y|X} is the set of k = |X| different multinomial opinions conditioned on each x_i ∈ X respectively. The base rate distribution over X is denoted a_X.

In order to compute the abduced opinion according to Eq.(9.20) it is necessary to invert the set of conditional opinions ω_{Y|X}, which produces the set of conditional opinions ω_{X|Y}, so that the final part of the abduction computation can be based on multinomial deduction according to Eq.(9.19). Inversion of multinomial opinion conditionals is described in Section 10.3. The notation for multinomial opinion inversion is:

\[ \text{Multinomial opinion inversion:} \quad \omega_{X|Y} = \widetilde{\circledcirc}\,(\omega_{Y|X}, \boldsymbol{a}_X). \qquad (9.21) \]

The structure of abductive reasoning is illustrated in Figure 9.4. The fact that both the evidence opinion and the set of conditional opinions are expressed on the child variable makes this reasoning situation complex.
The parent is now the target variable, so this is a situation of reasoning from child to parent. In the example where the weather is the causal parent variable and carrying an umbrella is the consequent child variable, the evidence is about carrying an umbrella, and the conclusion is an opinion about its possible cause, the weather.

[Figure 9.4 Structure of conditionals for abduction: the evidence opinion ω_Y on the child variable Y = {y_1, …, y_l} is combined with the conditionals ω_{Y|X} = {ω_{Y|x_1}, …, ω_{Y|x_k}} to produce the abduced opinion ω_{X ̃‖ Y} on the parent variable X = {x_1, …, x_k}.]

Note that the conditionals do not necessarily have to be causal. However, analysts typically find it easier to express conditionals in the causal direction, which is the reason why it is normally assumed that parent variables represent causes of child variables.

The sections above presented the concept of conditionals as well as deductive and abductive reasoning. It is now time to get down to the mathematical details. The next sections describe binomial and multinomial deduction in subjective logic. The binomial case can be described in the form of closed expressions, whereas the multinomial case requires a series of steps that need to be implemented as an algorithm. For that reason, the cases of binomial and multinomial deduction are presented separately.

9.4 Binomial Deduction

Conditional deduction with binomial opinions has previously been described in [55]. However, that description did not include the base rate consistency requirement of Eq.(9.24), which has been added to the description of binomial deduction in Definition 9.1 below.

9.4.1 Bayesian Base Rate

In general, the base rate of x and the conditionals on y put constraints on the base rate of y. The strictest base rate consistency requirement is to derive a specific base rate, as described here. The expression for the base rate a_y in Eq.(9.26) is derived from the base rate consistency requirement:

\[ \text{Binomial base rate consistency requirement:} \quad a_y = a_x \mathrm{P}_{y|x} + a_{\overline{x}} \mathrm{P}_{y|\overline{x}}. \qquad (9.22) \]

Assuming that ω_{y|x} and ω_{y|x̄} are not both vacuous, i.e. that u_{y|x} + u_{y|x̄} < 2, the simple expression for the Bayesian base rate a_y can be derived as follows:

\[ a_y = a_x \mathrm{P}_{y|x} + a_{\overline{x}} \mathrm{P}_{y|\overline{x}} = a_x (b_{y|x} + a_y u_{y|x}) + a_{\overline{x}} (b_{y|\overline{x}} + a_y u_{y|\overline{x}})
\;\;\Leftrightarrow\;\;
a_y = \frac{a_x b_{y|x} + a_{\overline{x}} b_{y|\overline{x}}}{1 - a_x u_{y|x} - a_{\overline{x}} u_{y|\overline{x}}}. \qquad (9.23) \]

With the Bayesian base rate of Eq.(9.23), it is guaranteed that the projected probabilities of binomial conditional opinions do not change after multiple inversions. In case ω_{y|x} and ω_{y|x̄} are both vacuous, i.e. when u_{y|x} = u_{y|x̄} = 1, there is no constraint on the base rate a_y, as in the case of the free base rate interval described in Section 9.4.2.

Figure 9.5 shows a screenshot of binomial deduction involving a Bayesian base rate, which is equal to the deduced projected probability given a vacuous antecedent ω°_x, according to the requirement of Eq.(9.22).

[Figure 9.5 Screenshot of deduction with vacuous antecedent ω°_x = (0.00, 0.00, 1.00, 0.80), conditionals ω_{y|x} = (0.40, 0.50, 0.10) and ω_{y|x̄} = (0.00, 0.40, 0.60), giving the deduced opinion ω_y = (0.32, 0.48, 0.20) with corresponding Bayesian base rate a_y = 0.40.]

To see intuitively why the Bayesian base rate is necessary, consider a pair of dogmatic conditional opinions ω_{y|x} and ω_{y|x̄}, where both projected probabilities are P_{y|x} = P_{y|x̄} = 1.
In this trivial case we always have p(y) = 1, independently of the probability p(x). It would then be totally inconsistent to have, e.g., the base rate a_y = 0.5 when we always have p(y) = 1. The base rate must reflect reality, so the only consistent base rate in this case is a_y = 1, which emerges directly from Eq.(9.23).

9.4.2 Free Base Rate Interval

There may be situations where it is not necessary to impose the strict requirement of a specific base rate for a_y, but where it is sufficient to constrain a_y to an interval. In general, the more dogmatic the conditional opinions, the stricter the constraints on base rates. This section describes the relatively lax requirement of an interval from which a free base rate can be chosen; Section 9.4.1 above describes the stricter requirement of a single Bayesian base rate.

The idea of a free base rate interval is that the base rate a_y in Eq.(9.26) must take its value from the interval defined by a lower base rate limit a_y^- and an upper base rate limit a_y^+ in order to be consistent with the conditionals. In case of dogmatic conditionals, the free base rate interval reduces to a single base rate, which in fact is the Bayesian base rate. The upper and lower limits for free base rates are the projected probabilities of the consequent opinion resulting from first assuming a vacuous antecedent opinion ω°_x, and then hypothetically setting the maximum (= 1) and minimum (= 0) base rate for the consequent variable y:

\[ \text{Upper base rate:} \quad a_y^{+} = \max_{a_y \in [0,1]} \big[ a_x \mathrm{P}_{y|x} + a_{\overline{x}} \mathrm{P}_{y|\overline{x}} \big] = \max_{a_y \in [0,1]} \big[ a_x (b_{y|x} + a_y u_{y|x}) + a_{\overline{x}} (b_{y|\overline{x}} + a_y u_{y|\overline{x}}) \big], \]
\[ \text{Lower base rate:} \quad a_y^{-} = \min_{a_y \in [0,1]} \big[ a_x \mathrm{P}_{y|x} + a_{\overline{x}} \mathrm{P}_{y|\overline{x}} \big] = \min_{a_y \in [0,1]} \big[ a_x (b_{y|x} + a_y u_{y|x}) + a_{\overline{x}} (b_{y|\overline{x}} + a_y u_{y|\overline{x}}) \big]. \qquad (9.24) \]

The free base rate interval for a_y is then expressed as [a_y^-, a_y^+], meaning that the base rate a_y must lie within that interval in order to be consistent with the pair of conditionals ω_{y|x} and ω_{y|x̄}. Note that the free base rate interval [a_y^-, a_y^+] is also a function of the base rate a_x.

Figure 9.6 shows a screenshot of binomial deduction indicating the upper base rate a_y^+ = 0.52, which is equal to the deduced projected probability given a vacuous antecedent ω°_x and a hypothetical base rate a_y = 1, according to Eq.(9.24).

[Figure 9.6 Screenshot of deduction with vacuous antecedent ω°_x = (0.00, 0.00, 1.00, 0.80), conditionals ω_{y|x} = (0.40, 0.50, 0.10) and ω_{y|x̄} = (0.00, 0.40, 0.60), and hypothetical base rate a_y = 1, indicating the upper free base rate a_y^+ = 0.52.]

Figure 9.7 shows a screenshot of binomial deduction indicating the lower base rate a_y^- = 0.32, which is equal to the deduced projected probability given a vacuous antecedent ω°_x and a hypothetical zero base rate a_y = 0, according to Eq.(9.24).

[Figure 9.7 Screenshot of deduction with the same vacuous antecedent and conditionals, and hypothetical base rate a_y = 0, indicating the lower free base rate a_y^- = 0.32.]

In case of dogmatic conditionals, the free base rate interval collapses to the single value expressed by Eq.(9.23):

\[ a_y = a_x \mathrm{P}_{y|x} + a_{\overline{x}} \mathrm{P}_{y|\overline{x}} = a_x b_{y|x} + a_{\overline{x}} b_{y|\overline{x}}. \qquad (9.25) \]

In case both ω_{y|x} and ω_{y|x̄} are vacuous opinions, i.e. when u_{y|x} = u_{y|x̄} = 1, the free base rate interval for a_y is [0, 1], meaning that there is no consistency constraint on the base rate a_y.

9.4.3 Method for Binomial Deduction

Binomial opinion deduction is a generalisation of the probabilistic conditional deduction expressed in Eq.(9.3). It is assumed that the base rate a_y is consistent with the base rate a_x and the pair of conditionals ω_{y|x} and ω_{y|x̄}, as described in Section 9.4.1, or alternatively as described in Section 9.4.2.
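Before stating the deduction operator itself, the two base rate consistency computations above can be summarised in a short Python sketch. The sketch is not from the book and uses hypothetical function names; the example numbers are the conditionals and base rate shown in Figures 9.5–9.7, for which it reproduces a_y = 0.40 and the interval [0.32, 0.52].

```python
# Sketch (hypothetical names) of the Bayesian base rate of Eq.(9.23) and the
# free base rate interval of Eq.(9.24). Opinions are (belief, disbelief, uncertainty).

def bayesian_base_rate(a_x, w_y_given_x, w_y_given_notx):
    b1, _, u1 = w_y_given_x
    b0, _, u0 = w_y_given_notx
    assert u1 + u0 < 2, "both conditionals vacuous: a_y is unconstrained"
    return (a_x * b1 + (1 - a_x) * b0) / (1 - a_x * u1 - (1 - a_x) * u0)

def free_base_rate_interval(a_x, w_y_given_x, w_y_given_notx):
    b1, _, u1 = w_y_given_x
    b0, _, u0 = w_y_given_notx
    lower = a_x * b1 + (1 - a_x) * b0                  # a_y = 0 in Eq.(9.24)
    upper = a_x * (b1 + u1) + (1 - a_x) * (b0 + u0)    # a_y = 1 in Eq.(9.24)
    return lower, upper

if __name__ == "__main__":
    # Conditionals and base rate of x taken from Figures 9.5-9.7.
    w1, w0, a_x = (0.40, 0.50, 0.10), (0.00, 0.40, 0.60), 0.80
    print(bayesian_base_rate(a_x, w1, w0))       # 0.40
    print(free_base_rate_interval(a_x, w1, w0))  # (0.32, 0.52)
```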
Definition 9.1 (Conditional Deduction with Binomial Opinions). Let X = {x, x̄} and Y = {y, ȳ} be two binary domains where there is a degree of relevance of variable X ∈ X to variable Y ∈ Y. Let the opinions ω_x = (b_x, d_x, u_x, a_x), ω_{y|x} = (b_{y|x}, d_{y|x}, u_{y|x}, a_y) and ω_{y|x̄} = (b_{y|x̄}, d_{y|x̄}, u_{y|x̄}, a_y) be an agent's respective opinions about x being true, about y being true given that x is true, and about y being true given that x is false. The deduced opinion ω_{y‖x} = (b_{y‖x}, d_{y‖x}, u_{y‖x}, a_y) is computed as:

\[ \omega_{y\|x}:
\begin{cases}
b_{y\|x} = b_y^{\mathrm{I}} - a_y K\\
d_{y\|x} = d_y^{\mathrm{I}} - (1-a_y) K\\
u_{y\|x} = u_y^{\mathrm{I}} + K\\
a_y = \dfrac{a_x b_{y|x} + a_{\overline{x}} b_{y|\overline{x}}}{1 - a_x u_{y|x} - a_{\overline{x}} u_{y|\overline{x}}} & \text{for } u_{y|x} + u_{y|\overline{x}} < 2,\\
a_y = \text{arbitrary value} & \text{for } u_{y|x} = u_{y|\overline{x}} = 1,
\end{cases} \qquad (9.26) \]

where

\[
\begin{cases}
b_y^{\mathrm{I}} = b_x b_{y|x} + d_x b_{y|\overline{x}} + u_x \big(b_{y|x} a_x + b_{y|\overline{x}} (1-a_x)\big),\\
d_y^{\mathrm{I}} = b_x d_{y|x} + d_x d_{y|\overline{x}} + u_x \big(d_{y|x} a_x + d_{y|\overline{x}} (1-a_x)\big),\\
u_y^{\mathrm{I}} = b_x u_{y|x} + d_x u_{y|\overline{x}} + u_x \big(u_{y|x} a_x + u_{y|\overline{x}} (1-a_x)\big),
\end{cases} \qquad (9.27)
\]

and where K is determined according to the following selection criteria:

Case I: \( \big((b_{y|x} > b_{y|\overline{x}}) \wedge (d_{y|x} > d_{y|\overline{x}})\big) \vee \big((b_{y|x} \le b_{y|\overline{x}}) \wedge (d_{y|x} \le d_{y|\overline{x}})\big) \)
\[ \Longrightarrow\; K = 0. \qquad (9.28) \]

Case II.A.1: \( \big((b_{y|x} > b_{y|\overline{x}}) \wedge (d_{y|x} \le d_{y|\overline{x}})\big) \wedge \big(\mathrm{P}(\omega_{y\|\overset{\circ}{x}}) \le b_{y|\overline{x}} + a_y (1 - b_{y|\overline{x}} - d_{y|x})\big) \wedge \big(\mathrm{P}(\omega_x) \le a_x\big) \)
\[ \Longrightarrow\; K = \frac{a_x u_x \big(b_y^{\mathrm{I}} - b_{y|\overline{x}}\big)}{(b_x + a_x u_x)\, a_y}. \qquad (9.29) \]

Case II.A.2: [same conditions as Case II.A.1, except P(ω_x) > a_x]
\[ \Longrightarrow\; K = \frac{a_x u_x \big(d_y^{\mathrm{I}} - d_{y|x}\big)\big(b_{y|x} - b_{y|\overline{x}}\big)}{\big(d_x + (1-a_x) u_x\big)\, a_y \big(d_{y|\overline{x}} - d_{y|x}\big)}. \qquad (9.30) \]

Case II.B.1: \( \big((b_{y|x} > b_{y|\overline{x}}) \wedge (d_{y|x} \le d_{y|\overline{x}})\big) \wedge \big(\mathrm{P}(\omega_{y\|\overset{\circ}{x}}) > b_{y|\overline{x}} + a_y (1 - b_{y|\overline{x}} - d_{y|x})\big) \wedge \big(\mathrm{P}(\omega_x) \le a_x\big) \)
\[ \Longrightarrow\; K = \frac{(1-a_x) u_x \big(b_y^{\mathrm{I}} - b_{y|\overline{x}}\big)\big(d_{y|\overline{x}} - d_{y|x}\big)}{(b_x + a_x u_x)(1-a_y)\big(b_{y|x} - b_{y|\overline{x}}\big)}. \qquad (9.31) \]

Case II.B.2: [same conditions as Case II.B.1, except P(ω_x) > a_x]
\[ \Longrightarrow\; K = \frac{(1-a_x) u_x \big(d_y^{\mathrm{I}} - d_{y|x}\big)}{\big(d_x + (1-a_x) u_x\big)(1-a_y)}. \qquad (9.32) \]

Case III.A.1: \( \big((b_{y|x} \le b_{y|\overline{x}}) \wedge (d_{y|x} > d_{y|\overline{x}})\big) \wedge \big(\mathrm{P}(\omega_{y\|\overset{\circ}{x}}) \le b_{y|x} + a_y (1 - b_{y|x} - d_{y|\overline{x}})\big) \wedge \big(\mathrm{P}(\omega_x) \le a_x\big) \)
\[ \Longrightarrow\; K = \frac{(1-a_x) u_x \big(d_y^{\mathrm{I}} - d_{y|\overline{x}}\big)\big(b_{y|\overline{x}} - b_{y|x}\big)}{(b_x + a_x u_x)\, a_y \big(d_{y|x} - d_{y|\overline{x}}\big)}. \qquad (9.33) \]

Case III.A.2: [same conditions as Case III.A.1, except P(ω_x) > a_x]
\[ \Longrightarrow\; K = \frac{(1-a_x) u_x \big(b_y^{\mathrm{I}} - b_{y|x}\big)}{\big(d_x + (1-a_x) u_x\big)\, a_y}. \qquad (9.34) \]

Case III.B.1: \( \big((b_{y|x} \le b_{y|\overline{x}}) \wedge (d_{y|x} > d_{y|\overline{x}})\big) \wedge \big(\mathrm{P}(\omega_{y\|\overset{\circ}{x}}) > b_{y|x} + a_y (1 - b_{y|x} - d_{y|\overline{x}})\big) \wedge \big(\mathrm{P}(\omega_x) \le a_x\big) \)
\[ \Longrightarrow\; K = \frac{a_x u_x \big(d_y^{\mathrm{I}} - d_{y|\overline{x}}\big)}{(b_x + a_x u_x)(1-a_y)}. \qquad (9.35) \]

Case III.B.2: [same conditions as Case III.B.1, except P(ω_x) > a_x]
\[ \Longrightarrow\; K = \frac{a_x u_x \big(b_y^{\mathrm{I}} - b_{y|x}\big)\big(d_{y|x} - d_{y|\overline{x}}\big)}{\big(d_x + (1-a_x) u_x\big)(1-a_y)\big(b_{y|\overline{x}} - b_{y|x}\big)}. \qquad (9.36) \]

where

\[ \mathrm{P}(\omega_{y\|\overset{\circ}{x}}) = b_{y|x} a_x + b_{y|\overline{x}} (1-a_x) + a_y \big(u_{y|x} a_x + u_{y|\overline{x}} (1-a_x)\big), \qquad \mathrm{P}(\omega_x) = b_x + a_x u_x. \qquad (9.37) \]
□

The computed ω_{y‖x} is the conditionally deduced opinion derived from ω_x, ω_{y|x} and ω_{y|x̄}. It expresses the belief in y being true as a function of the beliefs in x and the two sub-conditionals y|x and y|x̄.
The conditional deduction operator is a ternary operator, and by using the function symbol '⊚' to designate it, we define ω_{y‖x} = ω_x ⊚ (ω_{y|x}, ω_{y|x̄}).

Figure 9.8 shows a screenshot of binomial deduction involving the Bayesian base rate a_y = 0.40. The deduced opinion ω_{y‖x} = (0.07, 0.42, 0.51, 0.40) lies within the sub-triangle defined by ω_{y|x}, ω_{y|x̄} and ω_{y‖x°} = (0.32, 0.48, 0.20, 0.40). In this case, the sub-triangle is reduced to a line between ω_{y|x} and ω_{y|x̄}, because ω_{y‖x°} is situated on that line.

[Figure 9.8 Screenshot of deduction involving the Bayesian base rate a_y = 0.40: antecedent opinion ω_x = (0.10, 0.80, 0.10, 0.80), conditionals ω_{y|x} = (0.40, 0.50, 0.10) and ω_{y|x̄} = (0.00, 0.40, 0.60), deduced opinion ω_{y‖x} = (0.07, 0.42, 0.51, 0.40) with projected probability 0.28.]

Note that in case x is known to be true, i.e. ω_x = (1, 0, 0, a) is an absolute positive opinion, then obviously ω_{y‖x} = ω_{y|x}. Similarly, in case x is known to be false, i.e. ω_x = (0, 1, 0, a) is an absolute negative opinion, then obviously ω_{y‖x} = ω_{y|x̄}.

9.4.4 Justification for the Binomial Deduction Operator

While not particularly complex, the expressions for binomial conditional deduction have many cases, which can be difficult to understand and interpret. A more direct and intuitive justification can be found in their geometrical interpretation. The image space of the child opinion is a sub-triangle where the two sub-conditionals ω_{y|x} and ω_{y|x̄} form the two bottom vertices. The third vertex of the sub-triangle is the child opinion resulting from a vacuous parent opinion ω°_x. This particular child opinion, denoted ω_{y‖x°}, is determined by the base rates of x and y as well as the horizontal distance between the sub-conditionals. The parent opinion then determines the actual position of the child opinion within that sub-triangle. For example, when the parent is believed to be TRUE, i.e. ω_x = (1, 0, 0, a_x), the child opinion is ω_{y‖x} = ω_{y|x}; when the parent is believed to be FALSE, i.e. ω_x = (0, 1, 0, a_x), the child opinion is ω_{y‖x} = ω_{y|x̄}; and when the parent opinion is vacuous, i.e. ω°_x = (0, 0, 1, a_x), the child opinion is ω_{y‖x} = ω_{y‖x°}. For all other opinion values of the parent, the child opinion is determined by linear mapping from a point in the parent triangle to a point in the child sub-triangle according to Definition 9.1. It can be noted that when ω_{y|x} = ω_{y|x̄}, the child sub-triangle is reduced to a point, so that necessarily ω_{y‖x} = ω_{y|x} = ω_{y|x̄} = ω_{y‖x°} in this case. This would mean that there is no relevance relationship between parent and child.

Assume the sub-triangle defined by the conditional opinions inside the child opinion triangle. Deduction then consists of linearly projecting the antecedent opinion onto this sub-triangle. Figure 9.9 illustrates the principle of projecting a parent opinion to the child opinion in the sub-triangle, with the following arguments and deduced opinion:

\[ \text{Arguments:}
\begin{cases}
\omega_{y|x} = (0.55,\, 0.30,\, 0.15,\, 0.38)\\
\omega_{y|\overline{x}} = (0.10,\, 0.75,\, 0.15,\, 0.38)\\
\omega_x = (0.00,\, 0.40,\, 0.60,\, 0.50)
\end{cases}
\;\Rightarrow\;
\text{Deduced opinion:}\;\; \omega_{y\|x} = (0.15,\, 0.48,\, 0.37,\, 0.38). \qquad (9.38) \]

[Figure 9.9 Projection from the parent opinion triangle to the child opinion sub-triangle: the opinion ω_x in the parent triangle is mapped to the deduced opinion ω_{y‖x} in the sub-triangle of the child triangle spanned by ω_{y|x}, ω_{y|x̄} and ω_{y‖x°}.]
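Before describing the geometric construction, the numbers in Eq.(9.38) can be checked directly against Definition 9.1. The following Python sketch is not from the book and uses hypothetical names; it only codes the single branch of Definition 9.1 that these particular arguments select (Case II.B.1), with the other cases following the same pattern.

```python
# Sketch: Eq.(9.27), the Case II.B.1 expression Eq.(9.31), and Eq.(9.26),
# applied to the arguments of Eq.(9.38). Only this branch is implemented.

def deduce_binomial_case_IIB1(wx, wyx, wynx):
    bx, dx, ux, ax = wx
    b1, d1, u1, ay = wyx      # omega_{y|x}
    b0, d0, u0, _ = wynx      # omega_{y|not-x}

    # Intermediate opinion of Eq.(9.27)
    bI = bx * b1 + dx * b0 + ux * (b1 * ax + b0 * (1 - ax))
    dI = bx * d1 + dx * d0 + ux * (d1 * ax + d0 * (1 - ax))
    uI = bx * u1 + dx * u0 + ux * (u1 * ax + u0 * (1 - ax))

    # Case II.B.1 expression for K, Eq.(9.31)
    K = ((1 - ax) * ux * (bI - b0) * (d0 - d1)) / (
        (bx + ax * ux) * (1 - ay) * (b1 - b0))

    # Deduced opinion, Eq.(9.26)
    return (bI - ay * K, dI - (1 - ay) * K, uI + K, ay)

print(deduce_binomial_case_IIB1((0.00, 0.40, 0.60, 0.50),
                                (0.55, 0.30, 0.15, 0.38),
                                (0.10, 0.75, 0.15, 0.38)))
# approx (0.15, 0.48, 0.37, 0.38) after rounding, matching Eq.(9.38)
```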
The deduced opinion can be obtained geometrically by mapping the position of the parent opinion ω_x in the parent triangle onto the position that, relatively seen, is the same in the sub-triangle (shaded area) of the child triangle. Additional parameters seen in Figure 9.9 are:

\[
\begin{cases}
\text{Projected probability of } x: & \mathrm{P}_x = 0.30,\\
\text{Projected deduced probability:} & \mathrm{P}_{y\|x} = 0.29,\\
\text{Vacuous-deduced opinion:} & \omega_{y\|\overset{\circ}{x}} = (0.19,\, 0.30,\, 0.51,\, 0.38).
\end{cases} \qquad (9.39)
\]

In the general case, the child image sub-triangle is not equal-sided as in the example above. By setting the base rate of x different from 0.5, and by defining sub-conditionals with different uncertainty, the child image sub-triangle becomes skewed, and it is even possible that the uncertainty of ω_{y‖x°} is less than that of ω_{y|x} or ω_{y|x̄}.

9.5 Multinomial Deduction

9.5.1 Constraints for Multinomial Deduction

Conditional deduction with multinomial opinions has previously been described in [39]. However, the description below has improved clarity and also includes a description of consistency intervals for base rate distributions. Multinomial opinion deduction is a generalisation of the probabilistic conditional deduction expressed in Eq.(9.11).

Let X = {x_i | i = 1, …, k} and Y = {y_j | j = 1, …, l} be random variables, where X is the evidence variable and Y is the target variable in the given domains. Assume an opinion ω_X = (b_X, u_X, a_X) on X and a set of conditional opinions ω_{Y|x_i} on Y, one for each x_i, i = 1, …, k. The conditional opinion ω_{Y|x_i} is a subjective opinion on Y given that X takes the value x_i. Formally, each conditional opinion ω_{Y|x_i}, i = 1, …, k, is a tuple:

\[ \omega_{Y|x_i} = (\boldsymbol{b}_{Y|x_i},\, u_{Y|x_i},\, \boldsymbol{a}_Y), \qquad (9.40) \]

where b_{Y|x_i} : Y → [0, 1] is a belief mass distribution and u_{Y|x_i} ∈ [0, 1] is an uncertainty mass, such that Eq.(2.6) holds, and the base rate distribution a_Y : Y → [0, 1] is a prior probability distribution over Y. We denote by ω_{Y|X} the set of all conditional opinions on Y given the values of X:

\[ \omega_{Y|X} = \{\omega_{Y|x_i} \mid i = 1, \dots, k\}. \qquad (9.41) \]

Motivated by the above analysis, we want to deduce a subjective opinion on the target variable Y:

\[ \omega_{Y\|X} = (\boldsymbol{b}_{Y\|X},\, u_{Y\|X},\, \boldsymbol{a}_Y), \qquad (9.42) \]

where b_{Y‖X} : Y → [0, 1] is a belief mass distribution and u_{Y‖X} ∈ [0, 1] is an uncertainty mass, such that Eq.(2.6) holds. Note that the base rate distribution a_Y is the same for all of the conditional opinions in ω_{Y|X}, and that the deduced opinion has the same base rate distribution as well. We could equally well take a separate base rate distribution a_{Y|x_i} for each i = 1, …, k, and require that a_Y is determined from the given base rate distributions by Eq.(9.11), or not put any requirements on the relation between the base rates at all. By keeping the same base rate in the given conditional opinions and in the deduced opinion, we want to point out that the method of deducing an opinion described below is independent of the choice of the base rate distributions and their interconnections.

The definition of Bayesian deduction for subjective opinions should be compatible with the definition of Bayesian deduction for probability distributions described in Section 9.2. This requirement leads to the following conclusion: the projected probability of the deduced opinion ω_{Y‖X} should satisfy the probabilistic deduction relation given in Eq.(9.11), i.e.:

\[ \mathrm{P}_{Y\|X}(y_j) = \sum_{i=1}^{k} \mathrm{P}_X(x_i)\, \mathrm{P}_{Y|x_i}(y_j), \qquad (9.43) \]

for j = 1, …, l,
where Eq.(3.12) provides each factor on the right-hand side of Eq.(9.43). On the other hand, from Eq.(3.12) we have:

\[ \mathrm{P}_{Y\|X}(y_j) = \boldsymbol{b}_{Y\|X}(y_j) + \boldsymbol{a}_Y(y_j)\, u_{Y\|X}. \qquad (9.44) \]

Eq.(9.43) and Eq.(9.44) together determine l linear equations with the beliefs b_{Y‖X}(y_j), j = 1, …, l, and the uncertainty u_{Y‖X} as variables. We obtain one more equation over the same variables from the additivity property for the beliefs and uncertainty of the subjective opinion ω_{Y‖X}, by Eq.(2.6):

\[ u_{Y\|X} + \sum_{j=1}^{l} \boldsymbol{b}_{Y\|X}(y_j) = 1. \qquad (9.45) \]

This means that we have a system of l + 1 equations with l + 1 variables, which might seem to fully determine the deduced opinion ω_{Y‖X}. However, the projected probabilities on the left-hand side of the equations in Eq.(9.44) also add up to 1, which makes the system dependent. Hence, the system has an infinite number of solutions, which means that there are infinitely many subjective opinions on Y with base rate a_Y whose projected probability distributions satisfy Eq.(9.43). This corresponds to the geometrical representations given in Figure 9.9 and Figure 9.10: once we have an opinion point ω_Y as a solution, every other point in Ω_Y lying on the line through ω_Y parallel to the director line will also be a solution. The question is which one of these points is the most appropriate to represent the opinion on Y deduced from the given input opinions.

The above observation suggests that we need to choose and fix one belief mass (or the uncertainty mass) of the deduced opinion, and determine the remaining values from the projected probability relations in Eq.(9.44). Since, in general, we have no reason to distinguish among belief mass values, the obvious candidate is the uncertainty mass. In what follows, we provide a method for determining the most suitable uncertainty mass value for the deduced opinion corresponding to the given input opinions, i.e. a method for fully determining the deduced opinion.

The method of obtaining the deduced opinion ω_{Y‖X} from the opinions in the set ω_{Y|X} and an opinion ω_X, i.e. the method of determining a suitable uncertainty mass value of ω_{Y‖X} for the given input, is inspired by a geometric analysis of the input opinions and how they are related. The idea is that the conditional opinions in ω_{Y|X} are input arguments to the deduction operator which maps ω_X into ω_{Y‖X}. Multinomial deduction is denoted by:

\[ \omega_{Y\|X} = \omega_X \circledcirc \omega_{Y|X}. \qquad (9.46) \]

The deduction operator '⊚' maps the whole opinion space Ω_X of X into a subspace of Ω_Y, which we call the deduction sub-space. The following intuitive constraints are taken into consideration when defining the deduction operator.

Constraint 1. The vertices lying on the base of the opinion space Ω_X map into the opinion points determined by ω_{Y|X}. This means that the conditional opinions on Y in the set ω_{Y|X} = {ω_{Y|x_i} | i = 1, …, k} correspond to the absolute opinions on X, namely:

\[ \omega_{Y|x_i} = \omega_X^{i} \circledcirc \omega_{Y|X}, \qquad (9.47) \]

for i = 1, …, k, where ω_X^i = (b_X^i, u_X^i, a_X) is the absolute opinion on X such that b_X^i(x_i) = 1 (consequently b_X^i(x_j) = 0 for j ≠ i, and u_X^i = 0), and a_X is the same as in ω_X.

Constraint 2. The apex of the opinion space Ω_X maps into the apex of the deduction sub-space.
The apex of Ω_X corresponds to the vacuous opinion on X, given as:

\[ \overset{\circ}{\omega}_X = (\boldsymbol{b}_0,\, 1,\, \boldsymbol{a}_X), \qquad (9.48) \]

where b_0 is the zero belief mass distribution with b_0(x_i) = 0 for every i = 1, …, k. Let us denote by ω_{Y‖X°} the opinion on Y that corresponds to the apex of the deduction sub-space, i.e. the opinion deduced from the vacuous opinion ω°_X. Then we obtain the following constraint on the operator ⊚:

\[ \omega_{Y\|\overset{\circ}{X}} = \overset{\circ}{\omega}_X \circledcirc \omega_{Y|X}. \qquad (9.49) \]

According to Eq.(9.47) and Eq.(9.49), the vertices of the domain opinion space Ω_X map into the opinion points ω_{Y|x_i}, i = 1, …, k, and ω_{Y‖X°} under the deduction operator ⊚. We want the deduction operator to be defined in such a way that the deduction sub-space is the convex closure of these points. In that way, the deduction sub-space, and the deduction itself, is fully determined by the given conditional opinions in ω_{Y|X} and the given base rate distribution a_X.

Constraint 3. The image of an arbitrary opinion point ω_X on the evidence variable X is obtained by linear projection of the parameters of ω_X inside the deduction sub-space, and represents an opinion ω_{Y‖X} on the target variable Y.

A visualisation of the above for trinomial opinions, where the opinion spaces are tetrahedrons, is given in Figure 9.10. The deduction sub-space is shown as a shaded tetrahedron inside the opinion tetrahedron of Y, on the right-hand side of Figure 9.10.

Based on the above assumptions, the deduced opinion ω_{Y‖X} results from first constructing the deduction sub-space, and then projecting the opinion ω_X onto it. The deduction sub-space is bounded by the k points ω_{Y|x_i}, i = 1, …, k, and the point ω_{Y‖X°} that corresponds to the vacuous opinion. While the former are given, the latter needs to be computed, as described in Step 1 below.

[Figure 9.10 Projecting an antecedent opinion into the consequent deduction sub-space: the opinion ω_X in the opinion tetrahedron of X = {x_1, x_2, x_3} is mapped to the deduced opinion ω_{Y‖X} in the sub-space of the opinion tetrahedron of Y = {y_1, y_2, y_3} spanned by ω_{Y|x_1}, ω_{Y|x_2}, ω_{Y|x_3} and ω_{Y‖X°}.]

The opinion ω_X is thus linearly projected onto this sub-space, which means that its uncertainty mass u_{Y‖X} is determined as a linear transformation of the parameters of ω_X, with the belief masses determined accordingly.

9.5.2 Bayesian Base Rate Distribution

Similarly to the Bayesian base rate for binomial conditionals described in Section 9.4.1, there is a Bayesian base rate distribution for multinomial conditional opinions. This is necessary if it is required that the set of conditional opinions between nodes X and Y can be inverted multiple times while preserving their projected probabilities.

Assume a parent node X of cardinality k and a child node Y of cardinality l, with an associated set of conditional opinions ω_{Y|X}. Let ω°_X denote the vacuous opinion on X, and let P_{Y‖X°} denote the deduced projected probability distribution computed with the vacuous opinion ω°_X.
The constraint on the base rate distribution a_Y is:

\[ \text{Multinomial base rate consistency requirement:} \quad \boldsymbol{a}_Y = \mathbf{P}_{Y\|\overset{\circ}{X}}. \qquad (9.50) \]

Assuming that the conditional opinions in ω_{Y|X} are not all vacuous, formally expressed as ∑_{x∈X} u_{Y|x} < k, the simple expression for a_Y(y) can be derived from Eq.(9.50) as follows:

\[ \boldsymbol{a}_Y(y) = \sum_{x\in\mathbb{X}} \boldsymbol{a}_X(x)\, \mathrm{P}_{Y|x}(y) = \sum_{x\in\mathbb{X}} \boldsymbol{a}_X(x) \big(\boldsymbol{b}_{Y|x}(y) + \boldsymbol{a}_Y(y)\, u_{Y|x}\big)
\;\;\Leftrightarrow\;\;
\boldsymbol{a}_Y(y) = \frac{\sum_{x\in\mathbb{X}} \boldsymbol{a}_X(x)\, \boldsymbol{b}_{Y|x}(y)}{1 - \sum_{x\in\mathbb{X}} \boldsymbol{a}_X(x)\, u_{Y|x}}. \qquad (9.51) \]

By using the Bayesian base rate distribution a_Y of Eq.(9.51), it is guaranteed that the multinomial opinion conditionals keep the same projected probability distributions after multiple inversions. Note that Eq.(9.51) is a generalisation of the binomial case in Eq.(9.23). In case all conditionals in the set ω_{Y|X} are vacuous, i.e. when ∑_{x∈X} u_{Y|x} = k, there is no constraint on the base rate distribution a_Y.

9.5.3 Free Base Rate Distribution Intervals

Similarly to the free base rate interval for binomial conditionals described in Section 9.4.2, there is a set of free base rate intervals for the base rate distribution of multinomial conditional opinions. More specifically, the base rate distribution a_Y of the deduced opinion ω_{Y‖X} in Eq.(9.46) must be constrained by intervals defined by a lower base rate limit a_Y^-(y) and an upper base rate limit a_Y^+(y) for each value y ∈ Y in order to be consistent with the set of conditionals ω_{Y|X}. In case of dogmatic conditionals, the set of free base rate intervals reduces to a Bayesian base rate distribution. The upper and lower limits for consistent base rates are the projected probabilities of the consequent opinions resulting from first assuming a vacuous antecedent opinion ω°_X, and then hypothetically setting the maximum (= 1) and minimum (= 0) base rate for each value y ∈ Y.

Assume a parent node X of cardinality k and a child node Y of cardinality l, with an associated set of conditional opinions ω_{Y|X}. Let ω°_X, where u_X = 1, denote the vacuous opinion on X, and let P_{Y‖X°} denote the deduced projected probability distribution computed with the vacuous opinion ω°_X as antecedent. The free base rate distribution a_Y is then constrained by the upper and lower limits:

\[ \text{Upper base rate:} \quad \boldsymbol{a}_Y^{+}(y) = \max_{\boldsymbol{a}_Y(y)\in[0,1]} \big[ \mathbf{P}_{Y\|\overset{\circ}{X}}(y) \big] = \max_{\boldsymbol{a}_Y(y)\in[0,1]} \Big[ \sum_{x\in\mathbb{X}} \boldsymbol{a}_X(x) \big(\boldsymbol{b}_{Y|x}(y) + \boldsymbol{a}_Y(y)\, u_{Y|x}\big) \Big], \qquad (9.52) \]

\[ \text{Lower base rate:} \quad \boldsymbol{a}_Y^{-}(y) = \min_{\boldsymbol{a}_Y(y)\in[0,1]} \big[ \mathbf{P}_{Y\|\overset{\circ}{X}}(y) \big] = \min_{\boldsymbol{a}_Y(y)\in[0,1]} \Big[ \sum_{x\in\mathbb{X}} \boldsymbol{a}_X(x) \big(\boldsymbol{b}_{Y|x}(y) + \boldsymbol{a}_Y(y)\, u_{Y|x}\big) \Big]. \qquad (9.53) \]

The free base rate interval for a_Y(y) is then expressed as [a_Y^-(y), a_Y^+(y)], meaning that the base rate a_Y(y) must lie within that interval in order to be consistent with the set of conditionals ω_{Y|X}. Note that the free base rate interval [a_Y^-(y), a_Y^+(y)] is also a function of the base rate distribution a_X.

In case of dogmatic conditionals ω_{Y|X}, the set of free base rate intervals collapses to a base rate distribution, where each base rate a_Y(y) is expressed by Eq.(9.54):

\[ \boldsymbol{a}_Y(y) = \sum_{x\in\mathbb{X}} \boldsymbol{a}_X(x)\, \mathrm{P}_{Y|x}(y) = \sum_{x\in\mathbb{X}} \boldsymbol{a}_X(x)\, \boldsymbol{b}_{Y|x}(y). \qquad (9.54) \]

In case the set of conditional opinions ω_{Y|X} contains only vacuous opinions, i.e. when u_{Y|x} = 1 for all x ∈ X, every free interval is [0, 1], meaning that there is no constraint on the base rate distribution a_Y other than the additivity requirement ∑ a_Y(y) = 1.
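The two base rate consistency notions for multinomial conditionals can be sketched as follows. The Python code below is not from the book; function names are hypothetical, and each conditional opinion is represented as a pair (belief mass list, uncertainty mass). The example numbers are taken from Table 9.1 of the match-fixing example in Section 9.6 below.

```python
# Sketch of the Bayesian base rate distribution, Eq.(9.51), and the free base
# rate intervals, Eq.(9.52)-(9.53), for multinomial conditional opinions.

def bayesian_base_rate_distribution(a_X, cond_Y_given_X):
    """Eq.(9.51); cond_Y_given_X[i] = (belief_list, uncertainty) for parent value x_i."""
    l = len(cond_Y_given_X[0][0])
    denom = 1.0 - sum(ax * u for ax, (_, u) in zip(a_X, cond_Y_given_X))
    return [sum(ax * b[j] for ax, (b, _) in zip(a_X, cond_Y_given_X)) / denom
            for j in range(l)]

def free_base_rate_intervals(a_X, cond_Y_given_X):
    """Eq.(9.52)-(9.53): lower limit with a_Y(y) = 0, upper limit with a_Y(y) = 1."""
    l = len(cond_Y_given_X[0][0])
    lower = [sum(ax * b[j] for ax, (b, _) in zip(a_X, cond_Y_given_X)) for j in range(l)]
    upper = [sum(ax * (b[j] + u) for ax, (b, u) in zip(a_X, cond_Y_given_X)) for j in range(l)]
    return list(zip(lower, upper))

if __name__ == "__main__":
    # Numbers from Table 9.1 of the match-fixing example in Section 9.6.
    a_X = [0.1, 0.1, 0.8]
    cond = [([0.00, 0.70, 0.10], 0.20),
            ([0.70, 0.00, 0.10], 0.20),
            ([0.10, 0.10, 0.20], 0.60)]
    print(bayesian_base_rate_distribution(a_X, cond))  # [0.3125, 0.3125, 0.375]
    print(free_base_rate_intervals(a_X, cond))
```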
9.5.4 Method for Multinomial Deduction

Deduction with multinomial conditional opinions can be described in three steps. The first step consists of determining the Bayesian base rate distribution (or the set of base rate intervals) for a_Y. The second step consists of determining the image sub-simplex in the deduction space of Y. The third step consists of linearly mapping the opinion on X onto the sub-simplex of Y to produce the deduced opinion ω_{Y‖X}.

Step 1: Compute the Bayesian base rate distribution a_Y according to Eq.(9.55), as described in Section 9.5.2:

\[ \boldsymbol{a}_Y(y) = \frac{\sum_{x\in\mathbb{X}} \boldsymbol{a}_X(x)\, \boldsymbol{b}_{Y|x}(y)}{1 - \sum_{x\in\mathbb{X}} \boldsymbol{a}_X(x)\, u_{Y|x}}. \qquad (9.55) \]

Alternatively, if the analyst wants to specify the base rate distribution more freely, a set of base rate intervals can be computed as described in Section 9.5.3.

Step 2: The belief mass assigned to each value y_j of Y in any deduced opinion ω_{Y‖X} should be at least as large as the minimum of the corresponding belief masses in the given conditionals, i.e.:

\[ \boldsymbol{b}_{Y\|X}(y_j) \ge \min_i \big[\boldsymbol{b}_{Y|x_i}(y_j)\big], \quad \forall j = 1, \dots, l. \qquad (9.56) \]

This is intuitively clear: if we assign a belief mass to a value y_j of Y for every possible value of X, then the belief mass we would assign to y_j without knowing the value of X should be at least as large as the minimum of the belief masses assigned to y_j conditional on the values of X.

Eq.(9.56) holds for the belief masses of every deduced opinion, and in particular for the belief masses of the opinion ω_{Y‖X°}. In determining ω_{Y‖X°} we need to consider the constraint on the projected probability distribution given in Eq.(9.43), and keep track of the condition in Eq.(9.56), while maximising the uncertainty. The fact that all deduced opinions should satisfy Eq.(9.56) has the following geometrical interpretation in tetrahedron opinion spaces: all deduced opinion points must lie inside the auxiliary deduction sub-space of Ω_Y, i.e. the sub-space bounded by the planes b_Y(y_j) = min_i[b_{Y|x_i}(y_j)] (parallel to the sides of Ω_Y).

Applying Eq.(9.43) to the vacuous opinion ω°_X on X, we obtain the following equation for the projected probability distribution of ω_{Y‖X°}:

\[ \mathbf{P}_{Y\|\overset{\circ}{X}}(y_j) = \sum_{i=1}^{k} \boldsymbol{a}_X(x_i)\, \mathrm{P}_{Y|x_i}(y_j). \qquad (9.57) \]

On the other hand, for the projected probability of ω_{Y‖X°} = (b_{Y‖X°}, u_{Y‖X°}, a_Y), according to the definition of projected probability given in Eq.(3.12), we have:

\[ \mathbf{P}_{Y\|\overset{\circ}{X}}(y_j) = \boldsymbol{b}_{Y\|\overset{\circ}{X}}(y_j) + \boldsymbol{a}_Y(y_j)\, u_{Y\|\overset{\circ}{X}}. \qquad (9.58) \]

Thus, we need to find the point ω_{Y‖X°} = (b_{Y‖X°}, u_{Y‖X°}, a_Y) with the greatest possible uncertainty satisfying the requirements in Eq.(9.58) and Eq.(9.56), where P_{Y‖X°}(y_j) is determined by Eq.(9.57). From Eq.(9.58) and Eq.(9.56) we have:

\[ u_{Y\|\overset{\circ}{X}} \;\le\; \frac{\mathbf{P}_{Y\|\overset{\circ}{X}}(y_j) - \min_i\big[\boldsymbol{b}_{Y|x_i}(y_j)\big]}{\boldsymbol{a}_Y(y_j)}, \qquad (9.59) \]

for every j = 1, …, l. For simplicity, let us denote the right-hand side of Eq.(9.59) by u_j. Hence we have:

\[ u_{Y\|\overset{\circ}{X}} \le u_j, \qquad (9.60) \]

for every j = 1, …, l. Now, the greatest u_{Y‖X°} for which Eq.(9.60) holds is obviously determined as:

\[ u_{Y\|\overset{\circ}{X}} = \min_j\, [u_j]. \qquad (9.61) \]

From Eq.(9.58) and Eq.(9.56) it follows that this value is non-negative. It is also less than or equal to 1 since, if we assume the opposite, it follows that u_j > 1 for every j = 1, …, l, which leads to P_{Y‖X°}(y_j) > min_i[b_{Y|x_i}(y_j)] + a_Y(y_j) for every j = 1, …, l. Summing over j in the last inequality leads to a contradiction, since both the projected probabilities and the base rates of Y sum up to 1. Hence, u_{Y‖X°} as determined by Eq.(9.61) is a well-defined uncertainty value.
It is obviously the greatest value satisfying Eq.(9.60), and hence also the initial requirements. Having determined u_{Y‖X°}, we determine the corresponding belief masses b_{Y‖X°}(y_j), j = 1, …, l, from Eq.(9.58), and have thereby determined the opinion point ω_{Y‖X°}. In the geometrical representation of opinions in tetrahedrons, determining the opinion point ω_{Y‖X°} is equivalent to identifying the intersection between the surface of the auxiliary deduction sub-space and the projector line (the line parallel to the director) passing through the point on the base corresponding to the projected probability determined by Eq.(9.57).

Step 3: The vertices of the opinion simplex of X map into the vertices of the deduction sub-space. This leads to the following linear expression for the uncertainty u_{Y‖X} of an opinion ω_{Y‖X} on Y, deduced from an opinion ω_X = (b_X, u_X, a_X) on X:

\[ u_{Y\|X} = u_X\, u_{Y\|\overset{\circ}{X}} + \sum_{i=1}^{k} u_{Y|x_i}\, \boldsymbol{b}_X(x_i). \qquad (9.62) \]

This expression is the unique linear transformation of the beliefs and uncertainty of an opinion on X that maps the beliefs and uncertainties of the opinions ω_X^i, i = 1, …, k, and ω°_X into the uncertainty masses of ω_{Y|x_i}, i = 1, …, k, and ω_{Y‖X°}, correspondingly. From the equivalent form of Eq.(9.62):

\[ u_{Y\|X} = u_{Y\|\overset{\circ}{X}} - \sum_{i=1}^{k} \big(u_{Y\|\overset{\circ}{X}} - u_{Y|x_i}\big)\, \boldsymbol{b}_X(x_i), \qquad (9.63) \]

it can be seen that the uncertainty of an opinion ω_{Y‖X} deduced from an arbitrary opinion ω_X is obtained by decreasing the maximum uncertainty of the deduction, u_{Y‖X°}, by the belief-weighted average of the 'uncertainty distances' between the conditional opinions and the maximum uncertainty.

Having deduced the uncertainty u_{Y‖X}, the belief mass distribution b_{Y‖X} = {b_{Y‖X}(y_j), j = 1, …, l} of the deduced opinion is determined by rearranging Eq.(9.44) into the following form:

\[ \boldsymbol{b}_{Y\|X}(y_j) = \mathrm{P}_{Y\|X}(y_j) - \boldsymbol{a}_Y(y_j)\, u_{Y\|X}. \qquad (9.64) \]

The deduced multinomial opinion is then:

\[ \omega_{Y\|X} = (\boldsymbol{b}_{Y\|X},\, u_{Y\|X},\, \boldsymbol{a}_Y). \qquad (9.65) \]

This marks the end of the three-step deduction procedure for multinomial opinions. Note that in case the analyst knows the exact value of variable X = x_i, i.e. b_X(x_i) = 1 so that ω_X is an absolute opinion, then obviously ω_{Y‖X} = ω_{Y|x_i}.

The above procedure can also be applied when some of the given opinions are hyper opinions. In that case, we first determine the corresponding projections of the hyper opinions into multinomial opinions, in the way described in Section 3.5.2, and then deduce an opinion from the projections. The resulting deduced opinion is then multinomial.
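The three-step procedure lends itself directly to implementation. The following Python sketch is not part of the book and uses hypothetical function and variable names; it follows Step 1 to Step 3 as described above, and it reproduces the numbers of the match-fixing example presented in Section 9.6 below.

```python
# Sketch of the three-step multinomial deduction procedure: Eq.(9.55),
# Eq.(9.57)-(9.61) and Eq.(9.62)-(9.65). Each conditional opinion is a pair
# (belief_mass_list, uncertainty_mass).

def deduce_multinomial(b_X, u_X, a_X, cond):
    k, l = len(b_X), len(cond[0][0])

    # Step 1: Bayesian base rate distribution a_Y, Eq.(9.55).
    denom = 1.0 - sum(a_X[i] * cond[i][1] for i in range(k))
    a_Y = [sum(a_X[i] * cond[i][0][j] for i in range(k)) / denom for j in range(l)]

    # Projected probabilities of the conditionals, Eq.(3.12).
    P_cond = [[cond[i][0][j] + a_Y[j] * cond[i][1] for j in range(l)] for i in range(k)]

    # Step 2: apex uncertainty of the deduction sub-simplex, Eq.(9.57)-(9.61).
    P_vac = [sum(a_X[i] * P_cond[i][j] for i in range(k)) for j in range(l)]
    u_apex = min((P_vac[j] - min(cond[i][0][j] for i in range(k))) / a_Y[j]
                 for j in range(l))

    # Step 3: linear projection of omega_X, Eq.(9.62) and Eq.(9.64).
    u_ded = u_X * u_apex + sum(cond[i][1] * b_X[i] for i in range(k))
    P_X = [b_X[i] + a_X[i] * u_X for i in range(k)]
    P_ded = [sum(P_X[i] * P_cond[i][j] for i in range(k)) for j in range(l)]  # Eq.(9.43)
    b_ded = [P_ded[j] - a_Y[j] * u_ded for j in range(l)]
    return b_ded, u_ded, a_Y

if __name__ == "__main__":
    # Match-fixing example of Section 9.6 (Table 9.1).
    b_X, u_X, a_X = [0.50, 0.10, 0.10], 0.30, [0.1, 0.1, 0.8]
    cond = [([0.00, 0.70, 0.10], 0.20),
            ([0.70, 0.00, 0.10], 0.20),
            ([0.10, 0.10, 0.20], 0.60)]
    b, u, a = deduce_multinomial(b_X, u_X, a_X, cond)
    print(b)  # approx [0.105, 0.385, 0.110]
    print(u)  # approx 0.40
    print(a)  # [0.3125, 0.3125, 0.375]
```

The intermediate value u_apex computed in Step 2 corresponds to the sub-simplex apex uncertainty u_{Y‖X°} = 0.7333 quoted in the example below.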
9.6 Example: Multinomial Deduction for Match-Fixing

This example is about a football match to be played between Team 1 and Team 2. A gambler who plans to bet on the match suspects that match-fixing is taking place, whereby one of the teams has been paid to lose, or at least not to win, the match. The gambler has an opinion about the outcome of the match in case Team 1 has been paid to lose, in case Team 2 has been paid to lose, and in case no team has been paid to lose. The gambler also has an opinion about whether Team 1, Team 2, or none of the teams has been paid to lose.

Let X = {x_1, x_2, x_3} denote the variable representing which team has been paid, as expressed by Eq.(9.66):

\[ \text{Domain for match-fixing:} \quad \mathbb{X} = \begin{cases} x_1: \text{Team 1 has been paid to lose},\\ x_2: \text{Team 2 has been paid to lose},\\ x_3: \text{No match-fixing}. \end{cases} \qquad (9.66) \]

Let Y = {y_1, y_2, y_3} denote the variable representing which team wins the match, as expressed by Eq.(9.67):

\[ \text{Domain for winning the match:} \quad \mathbb{Y} = \begin{cases} y_1: \text{Team 1 wins the match},\\ y_2: \text{Team 2 wins the match},\\ y_3: \text{The match ends in a draw}. \end{cases} \qquad (9.67) \]

The opinions are given in Table 9.1.

Table 9.1 Opinion ω_X (match-fixing) and conditional opinions ω_{Y|X} (winning the match)

  Opinion on X:
    b_X(x_1) = 0.50, b_X(x_2) = 0.10, b_X(x_3) = 0.10, u_X = 0.30
    a_X(x_1) = 0.1, a_X(x_2) = 0.1, a_X(x_3) = 0.8

  Conditional opinions ω_{Y|X}, with belief masses over (y_1, y_2, y_3):
    b_{Y|x_1} = (0.00, 0.70, 0.10), u_{Y|x_1} = 0.20
    b_{Y|x_2} = (0.70, 0.00, 0.10), u_{Y|x_2} = 0.20
    b_{Y|x_3} = (0.10, 0.10, 0.20), u_{Y|x_3} = 0.60

The first step is to apply Eq.(9.55) to compute the Bayesian base rate distribution a_Y, which produces:

\[ \text{Bayesian base rate distribution:} \quad \boldsymbol{a}_Y = \begin{cases} \boldsymbol{a}_Y(y_1) = 0.3125,\\ \boldsymbol{a}_Y(y_2) = 0.3125,\\ \boldsymbol{a}_Y(y_3) = 0.3750. \end{cases} \qquad (9.68) \]

The second step is to apply Eq.(9.59) and Eq.(9.61) to compute the sub-simplex apex uncertainty, which produces u_{Y‖X°} = 0.7333.

The third step is to apply Eq.(9.62) and Eq.(9.64) to compute the deduced opinion about which team will win the match, which produces:

\[ \text{Deduced opinion } \omega_{Y\|X} = \begin{cases}
\boldsymbol{b}_{Y\|X}(y_1) = 0.105, & \boldsymbol{a}_Y(y_1) = 0.3125,\\
\boldsymbol{b}_{Y\|X}(y_2) = 0.385, & \boldsymbol{a}_Y(y_2) = 0.3125,\\
\boldsymbol{b}_{Y\|X}(y_3) = 0.110, & \boldsymbol{a}_Y(y_3) = 0.3750,\\
u_{Y\|X} = 0.4.
\end{cases} \qquad (9.69) \]

So, based on the opinion about match-fixing as well as the conditional opinions about the chances of winning, it appears that Team 2 will win, with projected probability P_{Y‖X}(y_2) = 0.385 + (0.3125 · 0.4) = 0.51.

9.7 Interpretation of Material Implication in Subjective Logic

Material implication is traditionally denoted (x → y), where x represents the antecedent and y the consequent of the logical relationship between the propositions x and y. Material implication is a truth-functional connective, meaning that it is defined by its truth table, Table 9.2.

Table 9.2 Traditional truth table for material implication
  x | y | x → y
  F | F |   T
  F | T |   T
  T | F |   F
  T | T |   T

While truth-functional connectives normally have a relatively clear interpretation in natural language, this is not the case for material implication. The implication (x → y) could for example be expressed as: "If x is true, then y is true". However, this does not say anything about the case when x is false, which is problematic for the interpretation of the corresponding entries in the truth table. In this section we show that material implication is not closed under Boolean truth values, and that it in fact produces uncertainty in the form of a vacuous opinion. When seen in this light, it becomes clear that the traditional definition of material implication is based on an over-simplistic and misleading interpretation of complete uncertainty as binary logic TRUE. We redefine material implication with subjective logic to preserve the uncertainty that it unavoidably produces in specific cases. We then compare the new definition of material implication with conditional deduction, and show that they reflect the same mathematical equation rearranged in different forms.

9.7.1 Truth-Functional Material Implication

By definition, logical propositions in binary logic can only be evaluated to TRUE or FALSE. A logical proposition can be composed of sub-propositions that are combined with logical connectives. For example, the conjunctive connective ∧ can be used to combine propositions x and y into the conjunctive proposition (x ∧ y). In this case the proposition (x ∧ y) is a complex proposition because it is composed of sub-propositions.
The ∧ connective has the natural language interpretation: "x and y are both TRUE". A logical proposition is said to be truth-functional when its truth depends on the truth of its sub-propositions alone [5]. Traditionally, it is required that the complex proposition has a defined truth value for all possible combinations of truth values of the sub-propositions, in order for the truth function to be completely defined. As an example of a simple truth-functional connective, the conjunction of two propositions is defined by the truth table shown in Table 9.3.

Table 9.3 Truth table for conjunction (x ∧ y)
  x | y | x ∧ y
  F | F |   F
  F | T |   F
  T | F |   F
  T | T |   T

Logical AND reflects the natural intuitive understanding of the linguistic term 'and', and is at the same time extremely useful for specifying computer program logic.

Material implication, also known as truth-functional implication, is a conditional proposition usually denoted (x → y), where x is the antecedent and y is the consequent. A conditional proposition is a complex proposition consisting of the two sub-propositions x and y connected with the material implication connective '→'. The natural language interpretation of material implication is: "If x is TRUE, then y is TRUE", or simply "x implies y". However, the natural language interpretation does not say anything about the case when x is FALSE. Nevertheless, material implication is defined both in case x is TRUE and in case x is FALSE. The natural language interpretation thus only covers half the definition of material implication, and this interpretation vacuum is the source of the confusion around material implication. Defining material implication as truth-functional means that its truth values are determined as a function of the truth values of x and y alone, as shown in Table 9.4.

Table 9.4 Basic cases in the truth table for material implication
  Case 1: x = F, y = F, x → y = T
  Case 2: x = F, y = T, x → y = T
  Case 3: x = T, y = F, x → y = F
  Case 4: x = T, y = T, x → y = T

The truth table of Table 9.4 happens to be equal to the truth table of (x̄ ∨ y), which is the reason why the traditional definition of truth-functional material implication leads to the equivalence (x → y) ⇔ (x̄ ∨ y). However, treating conditionals as truth-functional in this fashion leads to well-known inconsistencies. Truth-functional material implication should therefore not be considered a binary logic operator at all. The natural language interpretation assumes that there is a relevance connection between x and y, which does not emerge from Truth Table 9.4. The relevance property which is intuitively, but mistakenly, assumed by (x → y) can be expressed as: "The truth value of x is relevant to the truth value of y". For example, connecting a false antecedent proposition with an arbitrary consequent proposition gives a true implication according to material implication, but is obviously counter-intuitive when expressed in normal language, such as: "If 2 is odd, then 2 is even". However, the inverse proposition "If 2 is even, then 2 is odd" is false according to Truth Table 9.4. Furthermore, connecting an arbitrary antecedent with a true consequent proposition gives a true implication according to material implication, although the antecedent and consequent might have no relevance to each other; an example expressed in normal language is: "If it rains, then 2 is even". The problem is that it takes more than a truth table to determine whether a proposition x is relevant to another proposition y.
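The mechanical nature of the truth-functional definition can be illustrated with a tiny Python sketch (not from the book): evaluating (x → y) as (¬x ∨ y) assigns TRUE to every row with a false antecedent, regardless of whether x has any relevance to y.

```python
# Illustration: the truth-functional definition (x -> y) := (not x) or y
# is TRUE for every false antecedent, independently of any relevance of x to y.
from itertools import product

for x, y in product([False, True], repeat=2):
    implies = (not x) or y
    note = "antecedent false: TRUE by definition, relevance unknown" if not x else ""
    print(f"x={x!s:5} y={y!s:5} (x -> y)={implies!s:5} {note}")
```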
In natural language, the term 'relevance' assumes that when the truth value of the antecedent varies, so does that of the consequent. Correlation of truth values between antecedent and consequent is thus a necessary element of relevance. Material implication defined in terms of Truth Table 9.4 does not express any relevance between the propositions, and therefore does not reflect the meaning of the natural language concept of implication. Truth Table 9.4 gives a case-by-case static view of truth values, which is insufficient for deriving any relevance relationships. There is thus a difference between e.g. conjunctive propositions and conditional propositions, in that conjunctive propositions are intuitively truth-functional, whereas conditional propositions are not. Treating a conditional proposition as truth-functional is problematic because its truth cannot be determined in binary logic terms solely as a function of the truth of its components.

This section explains that the truth function of material implication is not closed under binary logic, and that its truth value can in fact be uncertain, which is something that cannot be expressed with binary logic. We show that material implication can be redefined to preserve uncertainty, and compare this to probabilistic and subjective logic conditional deduction, where uncertainty can be expressed either in the form of probability density functions or as subjective opinions. Subjective logic allows degrees of uncertainty to be explicitly expressed, and is therefore suitable for expressing the uncertain truth function of material implication.

9.7.2 Material Probabilistic Implication

By material probabilistic implication we mean that the probability value of the conditional p(y|x) shall be determined as a function of other probability variables. This corresponds directly to propositional logic material implication, where the truth value of the conditional is determined as a function of the antecedent and consequent truth values according to the truth table. According to Eq.(9.3), binomial probabilistic deduction is expressed as:

\[ p(y\|x) = p(x)\,p(y|x) + p(\overline{x})\,p(y|\overline{x}). \qquad (9.70) \]

The difference between probabilistic conditional deduction and probabilistic material implication is a question of rearranging Eq.(9.70) so that p(y|x) is expressed as:

\[ p(y|x) = \frac{p(y\|x) - p(\overline{x})\,p(y|\overline{x})}{p(x)}. \qquad (9.71) \]

Below, Eq.(9.71) and Eq.(9.70) are used to determine the value of the conditional (y|x).

• Cases 1 & 2: p(x) = 0

The case p(x) = 0 immediately appears problematic in Eq.(9.71). It is therefore necessary to consider Eq.(9.70). It can be seen that the term involving p(y|x) disappears from Eq.(9.70) when p(x) = 0. As a result, p(y|x) can take any value in the range [0, 1], so p(y|x) must be expressed as a probability density function. Without any prior information, the density function must be considered to be uniform, which has a specific interpretation in subjective logic, as will be explained below.

A realistic example is the pair of propositions x: "The switch is on" and y: "The light is on". Recall that x is FALSE (i.e. "The switch is off") in the cases under consideration here. Let us first consider the situation corresponding to Case 1 in Table 9.4, where y‖x is FALSE (i.e. "The light is off with the given switch position, which happens to be off"), which would be the case when y|x̄ is FALSE. In this situation it is perfectly possible that y|x is FALSE too (i.e.
“The light is off whenever the switch is on”). It is for example possible that the switch in question is not connected to the lamp in question, or that the bulb is blown. Let us now consider the situation corresponding to Case 2 in Table 9.4, where y‖x is TRUE (i.e. “The light is on with the given switch position, which happens to be off”), which would be the case when y|x̄ is TRUE. In this situation it is also perfectly possible that y|x is FALSE (i.e. “The light is off whenever the switch is on”). It is for example possible that the electric cable connections have been inverted, so that the light is on when the switch is off, and vice versa. These examples are in direct contradiction with Cases 1 & 2 of Table 9.4, which dictate that the corresponding implication (x → y) should be TRUE in both cases. The observation of this contradiction proves that the traditional definition of material implication is inconsistent with standard probability calculus.

• Cases 3 & 4: p(x) = 1

Necessarily p(x̄) = 0, so that Eq.(9.71) is transformed into p(y|x) = p(y‖x). Thus when x is TRUE (i.e. p(x) = 1), then necessarily (x → y) has the same truth value as y. This does not necessarily mean that the truth value of x is relevant to the truth value of y. In fact it could be either relevant or irrelevant. For example, consider the antecedent proposition x: “It rains” combined with the consequent y: “I wear an umbrella”; then it is plausibly relevant. Combined with the consequent y: “I wear glasses”, it is plausibly irrelevant. It can be assumed that x and y are TRUE in this example, so that the implication is TRUE. The unclear level of relevance can also be observed in examples where the consequent y is FALSE, so that the implication (x → y) becomes FALSE. The level of relevance between the antecedent and the consequent is thus independent of the truth value of the implication (x → y) alone. The criteria for relevance are described in more detail below.

9.7.3 Relevance in Implication

A meaningful conditional relationship between x and y requires that the antecedent x is relevant to the consequent y, or in other words that the consequent depends on the antecedent, as explicitly expressed in relevance logics [19]. Conditionals that are based on the dependence between consequent and antecedent are considered to be universally valid (and not truth functional), and are called logical conditionals [16]. Deduction with logical conditionals reflects human intuitive conditional reasoning, and does not lead to any of the paradoxes of material implication. Material implication, which is purely truth functional, ignores any relevance connection between the antecedent x and the consequent y, and defines the truth value of the conditional as a function of the truth values of the antecedent and consequent alone. It is possible to express the relevance between the antecedent and the consequent as a function of the conditionals. According to Definition 10.1 and Eq.(10.4), the relevance, denoted Ψ(y|x), is expressed as:

Ψ(y|x) = |p(y|x) − p(y|x̄)| .     (9.72)

Obviously, Ψ(y|x) ∈ [0, 1]. The case Ψ(y|x) = 0 expresses total irrelevance/independence, and the case Ψ(y|x) = 1 expresses total relevance/dependence of y on x. Relevance cannot be derived from the traditional truth-functional definition of material implication, because the truth value of (x̄ → y) is missing from the truth table.
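The relevance measure of Eq.(9.72) is simple to compute once the two conditionals are known. The following small Python sketch (names are illustrative) implements it:

```python
# Illustrative sketch of Eq.(9.72): binomial relevance from the two conditionals.

def relevance(p_y_given_x: float, p_y_given_not_x: float) -> float:
    """Psi(y|x) = |p(y|x) - p(y|not-x)|, a value in [0, 1]."""
    return abs(p_y_given_x - p_y_given_not_x)

print(relevance(1.0, 0.0))   # 1.0: total relevance (x determines y)
print(relevance(0.7, 0.7))   # 0.0: total irrelevance (x tells us nothing about y)
```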
In order to rectify this, an augmented truth table that includes (x̄ → y) is given in Table 9.5 below.

Table 9.5 Truth and relevance table for material implication

 Case     | x | y | x̄ → y | x → y | Relevance
 Case 1a: | F | F |   F   |  Any  |   Any
 Case 1b: | F | F |   T   |  n.d. |   n.d.
 Case 2a: | F | T |   F   |  n.d. |   n.d.
 Case 2b: | F | T |   T   |  Any  |   Any
 Case 3a: | T | F |   F   |   F   |   None
 Case 3b: | T | F |   T   |   F   |   Total
 Case 4a: | T | T |   F   |   T   |   Total
 Case 4b: | T | T |   T   |   T   |   None

From Table 9.5 it can be seen that the truth table entries in cases 1 and 2 are either ambiguous (‘Any’) or not defined (‘n.d.’). The term ‘Any’ is used to indicate that any truth or probability for (x → y) is possible in cases 1a and 2b, not just Boolean TRUE or FALSE. Only in cases 3 and 4 is the truth table clear about the truth value of (x → y). The same applies to the relevance between x and y, where any relevance value is possible in cases 1a and 2b, and only cases 3 and 4 define the relevance crisply as either no relevance or total relevance. Total relevance can be interpreted in the sense x ⇔ y or x ⇔ ȳ, i.e. that x and y are either equivalent or inequivalent. Our analysis shows that the natural conditional relationship between two propositions cannot be meaningfully described with a simple binary truth table, because values other than Boolean TRUE and FALSE are possible. The immediate conclusion is that material implication is not closed under a binary truth-value space. Not even by assigning probabilities to (x → y) in Table 9.5 can material implication be made meaningful. Below we show that subjective logic, which can express uncertainty, is suitable for defining material implication.

9.7.4 Subjective Interpretation of Material Implication

The discussion in Section 9.7.3 above concluded that any probability is possible in cases 1 and 2 of the truth table of material implication. The uniform probability density function expressed by Beta(1, 1), which is equivalent to the vacuous opinion ω = (0, 0, 1, 1/2), is therefore a meaningful and sound representation of the term ‘Any’ in Table 9.5. Similarly to three-valued logics such as Kleene logic [24], it is possible to define three-valued truth as {TRUE, FALSE, UNCERTAIN}, abbreviated as {T, F, U}, where the truth value UNCERTAIN represents ω = (0, 0, 1, 1/2). Given that material implication is not closed in the binary truth-value space, an augmented truth table can be defined that reflects the ternary value space of (x → y) as a function of the binary truth values of x and y, as shown in Table 9.6.

Table 9.6 Augmented truth table for material implication

 Case    | x | y | x → y | Opinion
 Case 1: | F | F |   U   | ω(x→y) = (0, 0, 1, 1/2)
 Case 2: | F | T |   U   | ω(x→y) = (0, 0, 1, 1/2)
 Case 3: | T | F |   F   | ω(x→y) = (0, 1, 0, ay)
 Case 4: | T | T |   T   | ω(x→y) = (1, 0, 0, ay)

Table 9.6 defines material implication as truth functional in the sense that it is determined as a function of the binary truth values of x and y. Specifying the truth value UNCERTAIN (vacuous opinion) in the column for (x → y) is a necessary consequence of the analysis in Section 9.7.2, but it means that the truth table is no longer closed under binary truth values. It can be argued that if values other than binary logic TRUE and FALSE are allowed for (x → y), then it would be natural to also allow the same for x and y. This is indeed possible, and can be expressed in terms of subjective logic as described next.
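The ternary truth assignment of Table 9.6 can be written as a small function. The sketch below is illustrative only; it returns the opinion tuple (b, d, u, a) assigned to (x → y), using the vacuous opinion with base rate 1/2 for the UNCERTAIN cases:

```python
# Illustrative sketch of Table 9.6: the ternary truth value of (x -> y) as an
# opinion tuple (b, d, u, a). The UNCERTAIN cases use the vacuous opinion with
# base rate 1/2; a_y is a free parameter for the TRUE/FALSE cases.

def implication_opinion(x: bool, y: bool, a_y: float = 0.5):
    if not x:                            # Cases 1 and 2: antecedent FALSE -> UNCERTAIN
        return (0.0, 0.0, 1.0, 0.5)
    if y:                                # Case 4: x TRUE, y TRUE -> TRUE
        return (1.0, 0.0, 0.0, a_y)
    return (0.0, 1.0, 0.0, a_y)          # Case 3: x TRUE, y FALSE -> FALSE
```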
9.7.5 Comparison with Subjective Logic Deduction

With the mathematical detail omitted, the notation for binomial conditional deduction in subjective logic is:

ωy‖x = ωx ⊚ (ωy|x , ωy|x̄)     (9.73)

where the terms are interpreted as follows:

ωy|x : opinion about y given that x is TRUE
ωy|x̄ : opinion about y given that x is FALSE
ωx : opinion about the antecedent x
ωy‖x : opinion about the consequent y given x

Figure 9.11 shows a screenshot of conditional deduction from the subjective logic demonstrator, with input opinions ωx = (0.00, 1.00, 0.00, 0.50), ωy|x = (0.00, 0.00, 1.00, 0.75) and ωy|x̄ = (1.00, 0.00, 0.00, 0.75), and deduced output ωy‖x = (1.00, 0.00, 0.00, 0.75).

Fig. 9.11 Screenshot from subjective logic demonstrator of deduction

The input variables in the example of Figure 9.11 are binomial opinions, which can be mapped to Beta distributions according to Definition 3.3. The leftmost triangle represents the opinion on x, and the rightmost triangle that on y. The middle triangle represents the conditional opinions y|x and y|x̄. The particular example illustrates Case 2b of Table 9.5 and Case 2 of Table 9.6. Eq.(9.73) corresponds directly to the probabilistic version in Eq.(9.70). Both expressions take three input variables; the only difference is that the input variables in Eq.(9.70) are scalars, whereas those of Eq.(9.73) are 3-dimensional. Given that the base rates of ωy|x and ωy|x̄ are equal, Eq.(9.73) takes eight scalar input parameters.

9.7.6 How to Interpret Material Implication

We have shown that material implication is inconsistent with traditional probabilistic logic. This is nothing new; e.g. Nute and Cross (2002) pointed out that “There can be little doubt that neither material implication nor any other truth function can be used by itself to provide an adequate representation of the logical and semantic properties of English conditionals” [73]. In this section we have presented a redefinition of material implication as a probabilistic material implication. The difference between probabilistic material implication and conditional deduction is a question of rearranging equations, as in the transition from Eq.(9.3) to Eq.(9.71). The analysis of material implication has shown that it is impossible to determine the conditional p(y|x) or the corresponding implication (x → y) as a truth function, because the required conditional p(y|x̄) and the corresponding implication (x̄ → y) are missing. Material implication produces an uncertain conclusion precisely because it attempts to determine the conditional relationship without the necessary evidence. Probabilistic conditional relationships are routinely determined from statistical data to be used as input to e.g. Bayesian networks. It is when the conditionals are known, and expressed for example in Boolean, probabilistic or subjective logic form, that they are applicable for deriving conclusions about propositions of interest. The idea of material implication is to turn the concept of deduction on its head and try to determine the conditional from the antecedent argument and the consequent conclusion. We have shown that in the cases where the antecedent is FALSE, the truth value of the material implication should be ‘uncertain’, not TRUE.
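As a small numerical illustration of this point, the sketch below (illustrative only) shows that when p(x) = 0, Eq.(9.70) is satisfied regardless of the value chosen for p(y|x), so the conditional, and hence the implication, is left completely undetermined:

```python
# Sketch: with p(x) = 0, the deduced probability of Eq.(9.70) does not depend on
# p(y|x) at all, so any value of p(y|x) in [0, 1] is consistent with the evidence.

p_x = 0.0          # the antecedent is FALSE
p_y_notx = 0.3     # some arbitrary value for p(y | not-x)

for p_y_x in (0.0, 0.25, 0.5, 0.75, 1.0):
    p_y = p_x * p_y_x + (1 - p_x) * p_y_notx     # Eq.(9.70)
    print(p_y_x, p_y)                            # p(y||x) stays 0.3 in every case
```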
We have extended the truth table of material implication to make it correct and consistent with conditional deduction. By doing so, material implication can no longer be considered a connective of propositional logic, because its truth table is not closed under binary logic values. A more general reasoning calculus such as subjective logic is needed to allow a consistent definition of material implication, because it allows the truth table to express the required uncertainty in the form of vacuous opinions.

Chapter 10 Conditional Abduction

10.1 Introduction to Abductive Reasoning

Abduction is to reason in the opposite direction of the available conditionals. Since conditionals are typically causal, the abduction process typically consists of reasoning from an observed fact/event to determine (the likelihood of) possible causes of that fact/event. This might appear to be a rather complex reasoning process, which it often is. However, we constantly do intuitive abductive reasoning without much effort, but often make mistakes because of typical human reasoning fallacies, such as ignoring base rates. Simple examples of abductive reasoning are when we try to find causes for something. Assume for example that I follow the principle of locking the front door when leaving my house for work every morning. Then one evening when I return home, I find the door unlocked. Abductive reasoning is then the reasoning process which tries to discover possible causes for why the door is unlocked, such as the possibility that a burglar picked the lock and robbed the house, or that I forgot to lock the door when I left in the morning. Another typical example of abductive reasoning is when medical doctors diagnose diseases through tests. A pharmaceutical company that develops a medical test for a specific disease must determine the quality of the test by applying it to a number of persons who certainly do have the disease, which we denote AS (Affected Subjects), as well as to a number of persons who certainly do not have the disease, which we denote US (Unaffected Subjects). The respective numbers of TP (True Positives), TN (True Negatives), FP (False Positives) and FN (False Negatives) can then be observed. Note that AS = TP + FN, and that US = TN + FP. The quality of the test is then described in terms of its sensitivity, aka TPR (True Positive Rate), and its specificity, aka TNR (True Negative Rate), expressed as follows:

Sensitivity: TPR = TP / (TP + FN) = TP / AS
Specificity: TNR = TN / (TN + FP) = TN / US     (10.1)

Sensitivity quantifies the test’s ability to avoid false negatives, and specificity quantifies the test’s ability to avoid false positives. The greater the sensitivity TPR and the specificity TNR, the better the quality of the test. It turns out that the quality aspects of the test can be expressed in terms of the conditionals:

p(‘positive test’ | ‘Affected subject’) = TPR
p(‘positive test’ | ‘Unaffected subject’) = 1 − TNR     (10.2)

The conditionals of Eq.(10.2) are causal, because the presence or absence of the disease causes the test to be positive or negative. The problem with these conditionals is that the medical doctor cannot apply them directly to make the diagnosis. What is needed is the pair of opposite conditionals, so that from a positive or negative test the medical doctor can assess the likelihood that the patient is affected or not affected by the disease. The process of inverting the conditionals of Eq.(10.2) and making a diagnosis in this situation is precisely abductive reasoning.
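The quantities in Eq.(10.1) and Eq.(10.2) are straightforward to compute from counted test outcomes. The sketch below uses hypothetical counts, chosen only for illustration:

```python
# Sketch of Eqs.(10.1)-(10.2) with hypothetical counts from a test validation study.

TP, FN = 95, 5        # affected subjects:   AS = TP + FN
TN, FP = 90, 10       # unaffected subjects: US = TN + FP

TPR = TP / (TP + FN)  # sensitivity
TNR = TN / (TN + FP)  # specificity

p_pos_given_affected   = TPR          # p('positive test' | 'affected subject')
p_pos_given_unaffected = 1 - TNR      # p('positive test' | 'unaffected subject')
print(TPR, TNR, p_pos_given_affected, p_pos_given_unaffected)
```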
Experiments show that humans are quite bad at intuitive abductive reasoning. For example, the base rate fallacy [6, 58] in medicine consists of making the erroneous assumption that p(y|x) = p(x|y). While this reasoning error often produces relatively good approximations of correct diagnostic probabilities, it can lead to a completely wrong result and a wrong diagnosis in case the base rate of the disease in the population is very low and the reliability of the test is not perfect. Medical tests are of course not only for diseases, but for any medical condition. An extreme example of the base rate fallacy is to conclude that a male person is pregnant just because he tests positive in a pregnancy test. Obviously, the base rate of male pregnancy is zero, and assuming that no test is absolutely perfect, it would be correct to conclude that the male person is not pregnant and to assume that the positive test is merely a false positive. In legal reasoning the base rate fallacy is called the prosecutor’s fallacy [80], which consists of assigning too high a base rate (prior probability) to finding a true match of e.g. fingerprints or DNA. For example, if a specific fingerprint is found on the murder weapon at a crime scene, and a search is done through a database containing millions of samples, then the base rate of a true match is extremely low, so it would be unsafe to interpret a match in the database directly as a true match and a proof of guilt. Instead, it is more likely to be a false match, i.e. the person is probably not the person who left the fingerprint, and is therefore not guilty, even if the fingerprint matches. In order to correctly assess the fingerprint match as proof, the prosecutors must also consider the quality (sensitivity and specificity) of the matching procedure, as well as the base rate of a true match given demographic and other circumstantial parameters. The correct reasoning that takes base rates into account can easily be formalized mathematically, and is often needed in order to avoid errors of intuition in medical diagnostics, legal argumentation and other situations of abductive reasoning. Aspects of abductive reasoning are also mentioned in connection with conditional deduction, described in Chapter 9. We therefore recommend readers to look at the introduction to conditional reasoning in Section 9.1, the description of probabilistic conditional inference in Section 9.2, as well as the subjective logic notation for conditional inference in Section 9.3. In this section we describe the principle of abductive reasoning and its expression in the framework of subjective logic. Before providing the details of binomial and multinomial abduction in subjective logic, the next section first introduces the concept of relevance, which is necessary for inverting binomial and multinomial conditionals [56].

10.2 Relevance and Irrelevance

The concept of relevance expresses that the values of two separate variables influence each other dynamically in time. Relevance between a variable X and another variable Y then expresses the likelihood that observing a change in the values of X leads to a change in the observed values of Y. This concept is formally defined below.

Definition 10.1 (Probabilistic Relevance). Given two variables X = {xi | i = 1, . . . , k} and Y = {yj | j = 1, . . . , l}, and a set of conditional probability distributions p(Y|xi), i = 1, . . . , k,
the relevance of the variable X to a value yj is defined as:

Ψ(yj|X) = max_{xi∈X} [ p(yj|xi) ] − min_{xi∈X} [ p(yj|xi) ] .     (10.3)   ⊔⊓

The relevance expresses the diagnostic power of the conditionals, i.e. the degree to which, according to the conditionals, the possible values of the random variable X influence the truth of the value yj (Y taking the value yj). Obviously, Ψ(yj|X) ∈ [0, 1] for every j = 1, . . . , l. Ψ(yj|X) = 1 expresses total relevance (determination), and Ψ(yj|X) = 0 expresses total irrelevance of X to yj. In the case of binary probabilistic conditionals p(y|x) and p(y|x̄), the expression for relevance simplifies to:

Ψ(y|x) = |p(y|x) − p(y|x̄)| .     (10.4)

The concept of relevance can be extended to conditional subjective opinions, simply by projecting the conditional opinions to their corresponding projected probability functions and applying Eq.(10.3).

Definition 10.2 (Opinion Relevance). Assume a set of conditional opinions ωY|X = {ωY|x1 , . . . , ωY|xk }, where each conditional opinion ωY|xi has a corresponding projected probability distribution PY|xi. The relevance of X to each yj can be expressed as:

Ψ(yj|X) = max_{xi∈X} [ PY|xi(yj) ] − min_{xi∈X} [ PY|xi(yj) ] .     (10.5)   ⊔⊓

It is useful to also define irrelevance, Ψ̄(yj|X), as the complement of relevance:

Ψ̄(yj|X) = 1 − Ψ(yj|X) .     (10.6)

The irrelevance Ψ̄(yj|X) expresses the lack of diagnostic power of X over yj, which gives rise to the uncertainty of the conditional opinions. Note that in the case of binary variables X and Y, where X takes its values from X = {x, x̄} and Y takes its values from Y = {y, ȳ}, the above equations give the same relevance and irrelevance values for the two values of Y. For simplicity, we can denote the relevance of X to y (and of X to ȳ) by Ψ(y|x) in this case.

10.3 Inversion of Binomial Conditional Opinions

10.3.1 Principles for Inverting Binomial Conditional Opinions

Binomial abduction requires the inversion of binomial conditional opinions. This section describes the mathematical expressions necessary for computing the required inverted conditionals. Assume that the available conditionals are ωy|x and ωy|x̄, which are expressed in the opposite direction to that needed for applying the operator for deduction in Eq.(10.7):

ωx‖y = ωy ⊚ (ωx|y , ωx|ȳ)     (10.7)

Binomial abduction simply consists of first inverting the pair of available conditionals (ωy|x , ωy|x̄) to produce the opposite pair of conditionals (ωx|y , ωx|ȳ), and subsequently using these as input to the binomial deduction described in Section 9.4. Figure 10.1 illustrates the principle of conditional inversion in the simple case of the conditionals ωy|x = (0.80, 0.20, 0.00, 0.50) and ωy|x̄ = (0.20, 0.80, 0.00, 0.50), with ax = 0.50. The inversion produces the pair of conditionals ωx|y = (0.72, 0.12, 0.16, 0.50) and ωx|ȳ = (0.12, 0.72, 0.16, 0.50), which are computed with the method described below. The upper half of Figure 10.1 illustrates how the conditionals define the shaded subtriangle within the Y-triangle, which represents the image area for possible deduced opinions ωy‖x.

Fig. 10.1 Inversion of binomial conditional opinions
The lower half of Figure 10.1 illustrates how the inverted conditionals define the shaded subtriangle within the X-triangle, which represents the image area for possible abduced opinions ωx‖y. Note that, in general, inversion of opinions produces increased uncertainty, as can be seen from the difference in uncertainty level between the shaded subtriangle in the upper half and the shaded subtriangle in the lower half of Figure 10.1. Inversion of conditional opinions must be compatible with inversion of probabilistic conditionals. We therefore need the projected probabilities of the available conditionals ωy|x and ωy|x̄:

Py|x = by|x + ay uy|x
Py|x̄ = by|x̄ + ay uy|x̄     (10.8)

Then compute the projected probabilities of the inverted conditionals ωx|y and ωx|ȳ, using the results of Eq.(10.8) and the base rate ax:

Px|y = ax Py|x / ( ax Py|x + ax̄ Py|x̄ )
Px|ȳ = ax (1 − Py|x) / ( ax (1 − Py|x) + ax̄ (1 − Py|x̄) )     (10.9)

Synthesise a pair of dogmatic conditional opinions from the projected probabilities of Eq.(10.9):

ω̇x|y = (Px|y , Px̄|y , 0, ax)
ω̇x|ȳ = (Px|ȳ , Px̄|ȳ , 0, ax)     (10.10)

where Px̄|y = (1 − Px|y) and Px̄|ȳ = (1 − Px|ȳ). The projected probabilities of the dogmatic conditionals of Eq.(10.10) and of the inverted conditional opinions ωx|y and ωx|ȳ are equal by definition. However, the inverted conditional opinions ωx|y and ωx|ȳ do in general contain uncertainty, in contrast to the dogmatic opinions of Eq.(10.10), which contain no uncertainty. The inverted conditional opinions ωx|y and ωx|ȳ can be derived from the dogmatic opinions of Eq.(10.10) by determining their appropriate uncertainty level. This amount of uncertainty is a function of the following elements:

• the theoretical maximum uncertainty values ûx|y and ûx|ȳ for ωx|y and ωx|ȳ respectively,
• the weighted proportional uncertainty uʷy|X based on the uncertainties uy|x and uy|x̄,
• the irrelevances Ψ̄(y|X) and Ψ̄(ȳ|X).

Having determined the appropriate uncertainty for the two conditionals, the corresponding belief masses emerge directly.

10.3.2 Method for Inversion of Binomial Conditional Opinions

The inversion of binomial conditional opinions is summarised in the 4 steps below.

Step 1: Theoretical maximum uncertainties ûx|y and ûx|ȳ.

Figure 10.2 illustrates how the belief mass is set to zero to determine the theoretical uncertainty-maximised conditional ω̂x|y. The theoretical maximum uncertainties ûx|y for ωx|y and ûx|ȳ for ωx|ȳ are determined by setting either the belief or the disbelief mass to zero, according to the simple IF-THEN-ELSE algorithms below.

Computation of ûx|y:
IF Px|y < ax THEN ûx|y = Px|y / ax
ELSE ûx|y = (1 − Px|y) / (1 − ax)     (10.11)

Fig. 10.2 Dogmatic conditional ω̇x|y and corresponding uncertainty-maximised conditional ω̂x|y

Computation of ûx|ȳ:
IF Px|ȳ < ax THEN ûx|ȳ = Px|ȳ / ax
ELSE ûx|ȳ = (1 − Px|ȳ) / (1 − ax)     (10.12)

Step 2: Weighted proportional uncertainty uʷy|X.

We need the sum of conditional uncertainty uΣy|X, computed as:

uΣy|X = uy|x + uy|x̄     (10.13)

The proportional uncertainty weights wy|x and wy|x̄ are computed as:

wy|x = uy|x / uΣy|X for uΣy|X > 0,   wy|x = 0 for uΣy|X = 0     (10.14)
wy|x̄ = uy|x̄ / uΣy|X for uΣy|X > 0,   wy|x̄ = 0 for uΣy|X = 0     (10.15)

We also need the theoretical maximum uncertainties ûy|x and ûy|x̄.
The theoretical maximum uncertainties ûy|x and ûy|x̄ are determined by setting either the belief or the disbelief mass to zero, according to the simple IF-THEN-ELSE algorithms below.

Computation of ûy|x:
IF Py|x < ay THEN ûy|x = Py|x / ay
ELSE ûy|x = (1 − Py|x) / (1 − ay)     (10.16)

Computation of ûy|x̄:
IF Py|x̄ < ay THEN ûy|x̄ = Py|x̄ / ay
ELSE ûy|x̄ = (1 − Py|x̄) / (1 − ay)     (10.17)

The weighted proportional uncertainty components uʷy|x and uʷy|x̄ are computed as:

uʷy|x = wy|x uy|x / ûy|x for ûy|x > 0,   uʷy|x = 0 for ûy|x = 0     (10.18)
uʷy|x̄ = wy|x̄ uy|x̄ / ûy|x̄ for ûy|x̄ > 0,   uʷy|x̄ = 0 for ûy|x̄ = 0     (10.19)

The weighted proportional uncertainty uʷy|X can then be computed as:

uʷy|X = uʷy|x + uʷy|x̄ .     (10.20)

Step 3: Relative uncertainties ũx|y and ũx|ȳ.

The relative uncertainties ũx|y and ũx|ȳ are computed as:

ũx|y = ũx|ȳ = uʷy|X ⊔ Ψ̄(y|X) = uʷy|X + Ψ̄(y|X) − uʷy|X Ψ̄(y|X)     (10.21)

The interpretation of Eq.(10.21) is that the relative uncertainty ũx|y is an increasing function of the weighted proportional uncertainty uʷy|X, because uncertainty in one reasoning direction must be reflected by uncertainty in the opposite reasoning direction. A practical example is when Alice is totally uncertain about whether Bob carries an umbrella in sunny or rainy weather. Then it is natural that observing whether Bob carries an umbrella tells Alice nothing about the weather. Similarly, the relative uncertainty ũx|y is an increasing function of the irrelevance Ψ̄(y|X), because if the original conditionals ωy|x and ωy|x̄ reflect total irrelevance of the parent X to the child Y, then there is no basis for deriving belief about the inverted conditionals ωx|y and ωx|ȳ, so they must be uncertainty-maximised. A practical example is when Alice knows that Bob always carries an umbrella, both in rain and sun. Then observing Bob carrying an umbrella tells her nothing about the weather. The relative uncertainty ũx|y is thus high in case the weighted proportional uncertainty uʷy|X is high, or the irrelevance Ψ̄(y|X) is high, or both are high at the same time. The correct mathematical model for this principle is to compute the relative uncertainty ũx|y as the disjunctive combination of the weighted proportional uncertainty uʷy|X and the irrelevance Ψ̄(y|X), denoted by the coproduct operator ⊔ in Eq.(10.21). Note that in the binomial case we have ũx|y = ũx|ȳ.

Step 4: Inverted opinions ωx|y and ωx|ȳ.

Having computed ûx|y and the relative uncertainty ũx|y, the correct uncertainty level can be computed, and the remaining opinion parameters b and d emerge directly, to produce the inverted opinion:

ωx|y :  ux|y = ûx|y ũx|y
        bx|y = Px|y − ax ux|y     (10.22)
        dx|y = 1 − bx|y − ux|y

so that the inverted conditional opinion can be expressed as ωx|y = (bx|y , dx|y , ux|y , ax). Similarly, having computed ûx|ȳ and the relative uncertainty ũx|ȳ, the corresponding uncertainty level and the remaining parameters b and d produce the other inverted opinion:

ωx|ȳ :  ux|ȳ = ûx|ȳ ũx|ȳ
        bx|ȳ = Px|ȳ − ax ux|ȳ     (10.23)
        dx|ȳ = 1 − bx|ȳ − ux|ȳ

so that the inverted conditional opinion is ωx|ȳ = (bx|ȳ , dx|ȳ , ux|ȳ , ax). This marks the end of the 4-step procedure for inverting binomial conditionals. The process of inverting conditional opinions can be defined as an operator.

Definition 10.3 (Inversion of Binomial Conditionals). Let {ωy|x , ωy|x̄} be a pair of binomial conditional opinions, and let ax be the base rate of x.
The pair of conditional opinions {ωx|y , ωx|ȳ}, derived through the 4-step procedure described above, are the inverted binomial conditional opinions of the former pair. The symbol ‘⊚̃’ denotes the operator for conditional inversion, so that inversion of a pair of binomial conditional opinions can be expressed as:

{ωx|y , ωx|ȳ} = ⊚̃ ({ωy|x , ωy|x̄}, ax)     (10.24)   ⊔⊓

The process of applying inverted binomial conditionals for binomial conditional deduction is the same as binomial abduction. The difference between inversion and abduction is thus that abduction takes the evidence argument ωy, whereas inversion does not. Eq.(10.25) compares the two operations, which both use the same operator symbol:

Inversion:  {ωx|y , ωx|ȳ} = ⊚̃ ({ωy|x , ωy|x̄}, ax)
Abduction:  ωx‖̃y = ωy ⊚̃ ({ωy|x , ωy|x̄}, ax)     (10.25)

10.3.3 Convergence of Repeated Inversions

An interesting question is what happens when conditionals are repeatedly inverted. In the case of probabilistic logic, which is uncertainty-agnostic, the inverted conditionals always remain the same after repeated inversion. This can be formally expressed as:

{p(y|x), p(y|x̄)} = ⊚̃ ({p(x|y), p(x|ȳ)}, a(y))
                 = ⊚̃ (⊚̃ ({p(y|x), p(y|x̄)}, a(x)), a(y))     (10.26)

In the case of opinions with Bayesian base rates, the projected probabilities of the conditional opinions also remain the same. However, repeated inversion of conditional opinions increases uncertainty in general. The increasing uncertainty is of course limited by the theoretical maximum uncertainty for each conditional. In general, the uncertainty of conditional opinions converges towards the theoretical maximum as inversions are repeated infinitely many times. Figure 10.3 illustrates the process of repeated inversion of conditionals, based on the same example as in Figure 10.1, where the initial conditionals are ωy|x = (0.80, 0.20, 0.00, 0.50) and ωy|x̄ = (0.20, 0.80, 0.00, 0.50), and where the equal base rates are ax = ay = 0.50.

Fig. 10.3 Convergence of repeated inversion of pairs of binomial conditionals

Table 10.1 lists a selection of the computed opinions ωy|x and ωx|y. The set consists of the convergence conditional opinion, the first 8 inverted conditional opinions, and the initial conditional opinion, in that order. The uncertainty increase is relatively large in the first few inversions, and rapidly becomes smaller. The pair of convergence conditional opinions is ωy|x̄ = ωx|ȳ = (0.00, 0.60, 0.40, 0.50). The inverted opinions were computed with an office spreadsheet, which started rounding off results from index 6. The final convergence conditional was not computed with the spreadsheet, but was simply determined as the opinion with the theoretical maximum uncertainty.

Table 10.1 Series of inverted conditional opinions

 Index            | Opinion       | Belief   | Disbelief | Uncertainty | Base rate
 ∞ (convergence)  | ωy|x = ωx|y   | 0.6      | 0.0       | 0.4         | 0.5
 ···              | ···           | ···      | ···       | ···         | ···
 8                | ωy|x          | 0.603358 | 0.003359  | 0.393282    | 0.5
 7                | ωx|y          | 0.605599 | 0.005599  | 0.388803    | 0.5
 6                | ωy|x          | 0.609331 | 0.009331  | 0.381338    | 0.5
 5                | ωx|y          | 0.615552 | 0.015552  | 0.368896    | 0.5
 4                | ωy|x          | 0.62592  | 0.02592   | 0.34816     | 0.5
 3                | ωx|y          | 0.6432   | 0.0432    | 0.3136      | 0.5
 2                | ωy|x          | 0.672    | 0.072     | 0.256       | 0.5
 1                | ωx|y          | 0.72     | 0.12      | 0.16        | 0.5
 0 (initial)      | ωy|x          | 0.8      | 0.2       | 0.0         | 0.5
The above example is rather simple, with its perfectly symmetric conditionals and base rates of 1/2. However, the same pattern of convergence with increasing uncertainty occurs for arbitrary conditionals and base rates. In general, the two pairs of convergence conditionals are not equal. The equality in our example above is only due to the symmetric conditionals and base rates.

10.4 Binomial Abduction

Binomial abduction with the conditionals ωy|x and ωy|x̄ consists of first producing the inverted conditionals ωx|y and ωx|ȳ, as described in Section 10.3, and subsequently applying them in binomial conditional deduction. This is summarised below.

Definition 10.4 (Binomial Abduction). Assume the binary domains X = {x, x̄} and Y = {y, ȳ}, where the pair of conditionals (ωy|x , ωy|x̄) is available to the analyst. Assume further that the analyst has the opinion ωy about y (as well as the complement ωȳ about ȳ), and wants to determine an opinion about x. Binomial abduction is to compute the opinion about x in this situation, which consists of the following two-step process:

1. Invert the pair of available conditionals (ωy|x , ωy|x̄) to produce the inverted pair of conditionals (ωx|y , ωx|ȳ), as described in Section 10.3.
2. Apply the pair of inverted conditionals (ωx|y , ωx|ȳ) together with the opinion ωy (as well as its complement ωȳ) to compute binomial deduction, as denoted in Eq.(10.27) and described in Section 9.4.   ⊔⊓

Binomial abduction produces the opinion ωx‖̃y about the value x, expressed as:

ωx‖̃y = ωy ⊚̃ (ωy|x , ωy|x̄ , ax)
      = ωy ⊚ (ωx|y , ωx|ȳ)
      = ωx‖y     (10.27)

Note that the operator symbol for abduction is the same as for binomial conditional inversion. The difference in usage is that in the case of inversion the notation is {ωx|y , ωx|ȳ} = ⊚̃ ({ωy|x , ωy|x̄}, ax), i.e. there is no argument opinion ωy. For abduction, the argument is needed. Figure 10.4 shows a screenshot of binomial abduction, involving the Bayesian base rate ay = 0.33, with the conditionals ωy|x = (0.40, 0.00, 0.60, 0.33) and ωy|x̄ = (0.20, 0.60, 0.20, 0.33), the base rate ax = 0.20, and the evidence opinion ωy = (0.80, 0.10, 0.10, 0.33).

Fig. 10.4 Screenshot of abduction, involving Bayesian base rate ay = 0.33

The abduced opinion ωx‖̃y = (0.17, 0.08, 0.75, 0.20) contains considerable uncertainty, which is partially due to the following uncertainty factors that appear in the computation:

Irrelevance: Ψ̄(y|X) = 0.67
Weighted proportional uncertainty: uʷy|X = 0.81
Apex point uncertainty in the X-subtriangle: ux‖y◦ = 0.95
Argument uncertainty: uY = 0.10     (10.28)

Notice that the difference between deduction and abduction simply depends on which conditionals are available to the analyst. In causal situations it is normally easier to estimate causal conditionals than the opposite derivative conditionals. Assuming that there is a causal conditional relationship from x to y, the analyst therefore typically has the pair of conditionals (ωy|x , ωy|x̄) available, so that computing an opinion about x requires abduction.

10.5 Illustrating the Base Rate Fallacy

The base rate fallacy is briefly discussed in Section 10.1. This section provides simple visualisations of how the base rate fallacy can materialise, and how it can be avoided.
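To make the 4-step inversion procedure of Section 10.3.2 concrete, the following Python sketch implements it for binomial conditional opinions represented as (b, d, u, a) tuples. It is an illustrative reconstruction, not a reference implementation, and it does not include the opinion-deduction operator of Section 9.4 needed for full abduction. With the conditionals of Figure 10.1 it reproduces the inverted conditional ωx|y = (0.72, 0.12, 0.16, 0.50) up to floating-point noise:

```python
# Illustrative reconstruction of the 4-step binomial conditional inversion
# (Section 10.3.2). Opinions are (b, d, u, a) tuples; the deduction operator of
# Section 9.4 is not included, so this only covers the inversion part of abduction.

def projected(op):
    b, d, u, a = op
    return b + a * u

def max_uncertainty(p, a):
    # Theoretical maximum uncertainty for projected probability p and base rate a
    # (the IF-THEN-ELSE pattern of Eqs. 10.11, 10.12, 10.16, 10.17).
    return p / a if p < a else (1 - p) / (1 - a)

def invert_binomial(op_y_x, op_y_notx, a_x):
    """Return (omega_x|y, omega_x|not-y) from (omega_y|x, omega_y|not-x) and a_x."""
    a_y = op_y_x[3]
    p_y_x, p_y_notx = projected(op_y_x), projected(op_y_notx)

    # Projected probabilities of the inverted conditionals (Eq. 10.9).
    p_x_y    = a_x * p_y_x / (a_x * p_y_x + (1 - a_x) * p_y_notx)
    p_x_noty = a_x * (1 - p_y_x) / (a_x * (1 - p_y_x) + (1 - a_x) * (1 - p_y_notx))

    # Step 1: theoretical maximum uncertainties of the inverted conditionals.
    hat_x_y, hat_x_noty = max_uncertainty(p_x_y, a_x), max_uncertainty(p_x_noty, a_x)

    # Step 2: weighted proportional uncertainty of the original conditionals.
    u1, u2 = op_y_x[2], op_y_notx[2]
    u_sum = u1 + u2
    w1 = u1 / u_sum if u_sum > 0 else 0.0
    w2 = u2 / u_sum if u_sum > 0 else 0.0
    hat1, hat2 = max_uncertainty(p_y_x, a_y), max_uncertainty(p_y_notx, a_y)
    u_w = (w1 * u1 / hat1 if hat1 > 0 else 0.0) + (w2 * u2 / hat2 if hat2 > 0 else 0.0)

    # Step 3: relative uncertainty = coproduct of u_w and the irrelevance (Eq. 10.21).
    irrelevance = 1 - abs(p_y_x - p_y_notx)
    u_rel = u_w + irrelevance - u_w * irrelevance

    # Step 4: assemble the inverted opinions (Eqs. 10.22 and 10.23).
    def assemble(p, u_hat):
        u = u_hat * u_rel
        b = p - a_x * u
        return (b, 1 - b - u, u, a_x)

    return assemble(p_x_y, hat_x_y), assemble(p_x_noty, hat_x_noty)

# Example of Figure 10.1: gives approximately (0.72, 0.12, 0.16, 0.5) for omega_x|y.
print(invert_binomial((0.80, 0.20, 0.00, 0.50), (0.20, 0.80, 0.00, 0.50), a_x=0.50))
```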
Assume medical tests for diseases A and B, where the sensitivity and specificity of both tests are equal, as expressed by the following conditional opinions:

Sensitivity (‘Positive test on affected’):   ωy|x = (by|x = 0.90, dy|x = 0.05, uy|x = 0.05)
Specificity (‘Positive test on unaffected’): ωy|x̄ = (by|x̄ = 0.05, dy|x̄ = 0.90, uy|x̄ = 0.05)     (10.29)

In the first situation, assume that the base rate for disease A in the population is ax = 0.5; then the Bayesian base rate of the test result becomes ay = 0.5 too. Assume further that a patient tests positive for disease A. The abduced opinion about the patient having disease A is illustrated in Figure 10.5, where the test evidence is the dogmatic opinion ωy = (1.00, 0.00, 0.00, 0.50) and the abduced result is ωx = (0.89, 0.04, 0.07, 0.50). The projected probability of having disease A is P(x) = 0.93.

Fig. 10.5 Screenshot of medical test of disease A with base rate ax = 0.50

In the second situation, assume that the base rate for disease B in the population is ax = 0.01; then the Bayesian base rate of the test result becomes ay = 0.06. Assume further that a patient tests positive for disease B. The abduced opinion about the patient having disease B is illustrated in Figure 10.6, where the abduced result is ωx = (0.14, 0.53, 0.33, 0.01). The projected probability of having disease B is only P(x) = 0.15, and the uncertainty is considerable.

Fig. 10.6 Screenshot of medical test of disease B with base rate ax = 0.01

The conclusion to be drawn from the examples of Figure 10.5 and Figure 10.6 is that the medical practitioner must consider the base rate of the disease that the patient is tested for. Note that tests A and B both have the same quality, as expressed by their equal sensitivity and specificity. Despite having equal quality, the diagnostic conclusions to be drawn from a positive test A and a positive test B are radically different. It is thus not enough to simply consider the quality of tests when making a diagnosis; the practitioner must also take into account the base rates of the diseases and other medical conditions being tested for. The quality of a test can be expressed in terms of the relevance of the disease to the test results. Tests A and B in the example above have relatively high, but not perfect, relevance. In the next example, assume a test C with low quality (high irrelevance), as expressed by the conditionals below:

Sensitivity (‘Positive test on affected’):   ωy|x = (by|x = 0.90, dy|x = 0.05, uy|x = 0.05)
Specificity (‘Positive test on unaffected’): ωy|x̄ = (by|x̄ = 0.70, dy|x̄ = 0.25, uy|x̄ = 0.05)     (10.30)

In this case, assume that the base rate for disease C in the population is ax = 0.50; then the Bayesian base rate of the test result becomes ay = 0.84.
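The Bayesian base rates ay = 0.50, 0.06 and 0.84 quoted for tests A, B and C can be checked numerically under the assumption (cf. Section 9.5.2) that ay is the marginal ax·P(y|x) + (1 − ax)·P(y|x̄), solved as a fixed point because the projected probabilities themselves depend on ay. The following is an illustrative sketch of that assumption, not the book's own derivation:

```python
# Sketch only: checking the quoted Bayesian base rates for tests A, B and C under
# the assumption that a_y = a_x*P(y|x) + (1-a_x)*P(y|not-x), solved as a fixed
# point because P(y|x) = b + a_y*u itself depends on a_y.

def bayesian_base_rate(cond_pos, cond_neg, a_x, iterations=100):
    # cond_pos, cond_neg: (belief, disbelief, uncertainty) of omega_y|x, omega_y|not-x
    a_y = 0.5
    for _ in range(iterations):
        p_y_x    = cond_pos[0] + a_y * cond_pos[2]
        p_y_notx = cond_neg[0] + a_y * cond_neg[2]
        a_y = a_x * p_y_x + (1 - a_x) * p_y_notx
    return a_y

print(round(bayesian_base_rate((0.90, 0.05, 0.05), (0.05, 0.90, 0.05), 0.50), 2))  # 0.5  (test A)
print(round(bayesian_base_rate((0.90, 0.05, 0.05), (0.05, 0.90, 0.05), 0.01), 2))  # 0.06 (test B)
print(round(bayesian_base_rate((0.90, 0.05, 0.05), (0.70, 0.25, 0.05), 0.50), 2))  # 0.84 (test C)
```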
Fig. 10.7 Screenshot of medical test C with poor quality, where the abduced opinion is ωx = (0.20, 0.08, 0.72, 0.50) with projected probability P(x) = 0.56

Note the dramatic increase in uncertainty, which is mainly due to the high irrelevance Ψ̄(y|X) = 0.80. The closer together the conditional opinion points are positioned in the opinion triangle (in the opinion simplex in general), the higher the irrelevance. In the extreme case when all conditional opinion points are in the exact same position, the irrelevance is total, meaning that the variable Y is independent of the variable X.

10.6 Inversion of Multinomial Conditional Opinions

10.6.1 Principles of Multinomial Conditional Opinion Inversion

Abduction in subjective logic requires inversion of conditional opinions of the form ωY|xi into conditional opinions of the form ωX|yj, analogously to the inversion of conditional probability distributions in Bayesian inference (cf. Eq.(9.13)). This section describes the principles of inversion. Figure 10.8 illustrates the principle of inversion of multinomial conditional opinions. The initial conditionals project the X-simplex (the opinion space of X) onto a sub-simplex within the Y-simplex, as shown in the top part of Figure 10.8, which is the basis for deduction as described in the previous section. The goal of the inversion is to derive conditionals that define a projection from the opinion space of Y to a sub-space of the opinion space of X (as shown in the bottom part of Figure 10.8), which in turn can support deduction from Y to X. Then an opinion on X can be deduced from an evidence opinion on Y, which completes the abduction process.

Fig. 10.8 Inversion of multinomial conditional opinions

In case the conditionals are expressed as hyper-opinions, it is required that they be projected to multinomial opinion arguments that only provide belief support for singleton elements. Eq.(3.31) describes the method for projecting hyper-opinions onto multinomial opinions. Now, for the inversion of conditional opinions, assume two random variables X and Y with respective cardinalities k = |X| and l = |Y|, with a set of multinomial conditionals ωY|X and the base rate distribution aX on X:

Inversion arguments:  ωY|X = {ωY|x1 , . . . , ωY|xk },  aX .     (10.31)

Since we want the projected probabilities of the inverted conditional opinions to behave in the same way as the inverted conditional probability distributions over variables in Bayesian inference, described in Section 9.2, the projected probabilities of each inverted conditional opinion ωX|yj are determined according to the following equation, which is analogous to Eq.(9.13):

PX|yj (xi) = aX(xi) PY|xi (yj) / ∑t=1..k aX(xt) PY|xt (yj) ,   for i = 1, . . . , k.     (10.32)

The simplest opinions ωX|yj , j = 1, . . . , l, to satisfy Eq.(10.32) are the dogmatic opinions defined in the following way:
ω̇X|yj :  bX|yj (xi) = PX|yj (xi), i = 1, . . . , k
          uX|yj = 0     (10.33)
          aX|yj = aX .

However, the proper inverted conditional opinions ωX|Y do in general contain uncertainty, in contrast to the dogmatic opinions of Eq.(10.33). The amount of uncertainty to be assigned to the inverted conditional opinions depends on the following factors:

• the maximum possible uncertainty values ûX|yj of the opinions ωX|yj satisfying Eq.(10.32), j = 1, . . . , l,
• the weighted proportional uncertainty uʷY|X of the uncertainties uY|xi, and
• the irrelevance values Ψ̄(yj|X), for j = 1, . . . , l.

10.6.2 Method for Multinomial Conditional Inversion

The inversion procedure for multinomial conditional opinions is concisely formalised in the 4 steps below. The crux is to determine the appropriate uncertainty for each inverted conditional; the corresponding belief masses then emerge directly.

Step 1: Maximum theoretical uncertainties ûX|yj.

First we identify the maximum theoretical uncertainties ûX|yj of the inverted conditionals, by converting as much belief mass as possible into uncertainty mass, while preserving projected probabilities consistent with Eq.(10.32). This process is illustrated in Figure 10.9.

Fig. 10.9 Dogmatic opinion ω̇X|yj and corresponding uncertainty-maximised opinion ω̂X|yj

The line defined by the equations

PX|yj (xi) = bX|yj (xi) + aX(xi) uX|yj ,   i = 1, . . . , k,     (10.34)

which by definition is parallel to the base rate director line and which joins ω̇X|yj and ω̂X|yj in Figure 10.9, defines the possible opinions ωX|yj for which the projected probability is consistent with Eq.(10.32). As the illustration shows, an opinion ω̂X|yj is uncertainty-maximised when Eq.(10.34) is satisfied and at least one belief mass of ω̂X|yj is zero, since the corresponding point then lies on a side of the simplex. In general, not all belief masses can be zero simultaneously, except for vacuous opinions. The example of Figure 10.9 indicates the case where bX|yj (x1) = 0. The components of the opinion point ω̂X|yj should satisfy the following requirements:

ûX|yj = PX|yj (xi0) / aX(xi0) , for some i0 ∈ {1, . . . , k}, and     (10.35)
PX|yj (xi) ≥ aX(xi) uX|yj , for every i ∈ {1, . . . , k}.     (10.36)

The requirement of Eq.(10.36) ensures that all the belief masses determined according to Eq.(3.12) are non-negative. These requirements lead to the theoretical uncertainty maximum:

ûX|yj = min_i [ PX|yj (xi) / aX(xi) ] = min_i [ PY|xi (yj) / ∑t=1..k aX(xt) PY|xt (yj) ]     (10.37)

Step 2: Weighted proportional uncertainty uʷY|X.

We need the sum of conditional uncertainty uΣY|X, computed as:

uΣY|X = ∑x uY|x     (10.38)

The proportional uncertainty weights wY|x are computed as:

wY|x = uY|x / uΣY|X for uΣY|X > 0,   wY|x = 0 for uΣY|X = 0     (10.39)

We also need the maximum theoretical uncertainty ûY|xi of each conditional ωY|xi. The maximum theoretical uncertainty must satisfy the following requirements:

ûY|xi = PY|xi (yj0) / aY(yj0) , for some j0 ∈ {1, . . . , l}, and     (10.40)
PY|xi (yj) ≥ aY(yj) uY|xi , for every j ∈ {1, . . . , l}.     (10.41)

The requirement of Eq.(10.41) ensures that all belief masses determined according to Eq.(3.12) are non-negative.
These requirements lead to the theoretical uncertainty maximum:

ûY|xi = min_j [ PY|xi (yj) / aY(yj) ]     (10.42)

The weighted proportional uncertainty components uʷY|x are computed as:

uʷY|x = wY|x uY|x / ûY|x for ûY|x > 0,   uʷY|x = 0 for ûY|x = 0     (10.43)

The weighted proportional uncertainty uʷY|X can then be computed as:

uʷY|X = ∑i=1..k uʷY|xi .     (10.44)

Step 3: Relative uncertainties ũX|yj.

The relative uncertainty, denoted ũX|yj, is computed as the coproduct of the weighted proportional uncertainty uʷY|X and the irrelevance Ψ̄(yj|X):

ũX|yj = uʷY|X ⊔ Ψ̄(yj|X) = uʷY|X + Ψ̄(yj|X) − uʷY|X Ψ̄(yj|X) .     (10.45)

The irrelevance Ψ̄(yj|X) of the variable X to the particular value yj of Y is obviously a factor for determining the uncertainty uX|yj. For example, if the original conditionals ωY|X reflect total irrelevance of the variable X to the value yj of Y, then there is no basis for deriving belief about the inverted conditionals ωX|yj, and the latter must have maximal uncertainty. This is assured by Eq.(10.45) when the irrelevance Ψ̄(yj|X) = 1. A practical example is when we know that the climate has continuously been changing for millions of years, for various reasons. Then observing that the climate is currently changing, in itself, says nothing about specific causes of the current change. The weighted proportional uncertainty uʷY|X must be taken into account because the uncertainty in one reasoning direction must be reflected by the uncertainty in the opposite reasoning direction. A practical example is when it is uncertain whether a specific factor has had any significant influence on the climate in the past. Then, knowing that the climate did change significantly at some point in time, in itself, says nothing about the presence of the specific factor at that point in time. According to the way it is defined by Eq.(10.44), uʷY|X represents the proportional expected uncertainty of Y given X, which represents a general uncertainty level for the deductive reasoning, and which must be reflected in the inverted conditionals as well. The justification for Eq.(10.45) is that the relative uncertainty ũX|yj should be an increasing function of both the weighted proportional uncertainty uʷY|X and the irrelevance Ψ̄(yj|X). In addition, all three values should lie in the interval [0, 1]. The disjunctive combination (coproduct) of the weighted proportional uncertainty uʷY|X and the irrelevance Ψ̄(yj|X) is an adequate choice because it has the following properties:

• when one of the two operands equals 0, the result equals the other operand,
• when one of the two operands equals 1, the result equals 1 (equals that operand).

Step 4: Inverted conditionals ωX|Y.

The uncertainty of each inverted conditional, denoted uX|yj, is computed by multiplying the theoretical maximum uncertainty ûX|yj with the relative uncertainty ũX|yj, as expressed by Eq.(10.46):

uX|yj = ûX|yj ũX|yj = ûX|yj ( uʷY|X + Ψ̄(yj|X) − uʷY|X Ψ̄(yj|X) ) .     (10.46)

The uncertainty uX|yj is in the range [0, ûX|yj], because the relative uncertainty ũX|yj is in the range [0, 1]. Finally, given the uncertainties uX|yj, the inverted conditional opinions are simply determined as:

ωX|yj = (bX|yj , uX|yj , aX) ,  where  bX|yj (xi) = PX|yj (xi) − uX|yj aX(xi) , for i = 1, . . . , k.     (10.47)

Eq.(10.47) thus determines the set ωX|Y of inverted conditional opinions.
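The 4-step multinomial inversion can be sketched compactly in Python. The code below is an illustrative reconstruction of Eqs.(10.32)–(10.47), with conditionals represented as (belief list, uncertainty) pairs; applied to the conditionals of Table 10.5 in the worked example of Section 10.8.2, it reproduces the inverted opinions of Table 10.6 up to rounding of the tabulated inputs:

```python
# Illustrative reconstruction of the 4-step multinomial conditional inversion
# (Eqs. 10.32-10.47). Conditionals are (belief list over Y, uncertainty) pairs;
# base rates are lists. Applied to the conditionals of Table 10.5 it reproduces
# the inverted opinions of Table 10.6 (Section 10.8.2) up to rounding.

def invert_multinomial(cond_Y_X, a_X, a_Y):
    k, l = len(a_X), len(a_Y)
    # Projected probabilities P_{Y|xi}(yj) = b(yj) + a_Y(yj) * u.
    P = [[b[j] + a_Y[j] * u for j in range(l)] for (b, u) in cond_Y_X]

    # Step 2 (Eqs. 10.38-10.44): weighted proportional uncertainty of Y given X.
    u_sum = sum(u for (_, u) in cond_Y_X)
    u_w = 0.0
    for i, (b, u) in enumerate(cond_Y_X):
        w = u / u_sum if u_sum > 0 else 0.0
        u_hat_i = min(P[i][j] / a_Y[j] for j in range(l))     # Eq.(10.42)
        u_w += w * u / u_hat_i if u_hat_i > 0 else 0.0

    inverted = []
    for j in range(l):
        # Eq.(10.32): projected probabilities of the inverted conditional.
        norm = sum(a_X[i] * P[i][j] for i in range(k))
        P_X_yj = [a_X[i] * P[i][j] / norm for i in range(k)]

        # Step 1 (Eq. 10.37): theoretical maximum uncertainty of the inverted opinion.
        u_hat = min(P_X_yj[i] / a_X[i] for i in range(k))

        # Step 3 (Eq. 10.45): coproduct of u_w and the irrelevance of X to yj.
        psi = max(P[i][j] for i in range(k)) - min(P[i][j] for i in range(k))
        u_rel = u_w + (1 - psi) - u_w * (1 - psi)

        # Step 4 (Eqs. 10.46-10.47): uncertainty and belief masses.
        u_X_yj = u_hat * u_rel
        b_X_yj = [P_X_yj[i] - u_X_yj * a_X[i] for i in range(k)]
        inverted.append((b_X_yj, u_X_yj))
    return inverted

# Conditionals of Table 10.5 (troop movement given military plan):
cond = [([0.25, 0.04, 0.00], 0.71), ([0.00, 0.50, 0.50], 0.00), ([0.00, 0.25, 0.75], 0.00)]
for bel, unc in invert_multinomial(cond, a_X=[0.70, 0.20, 0.10], a_Y=[0.35, 0.30, 0.35]):
    print([round(x, 2) for x in bel], round(unc, 2))
```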
This marks the end of the 4-step procedure for inverting multinomial conditional opinions, which can be defined as an operator.

Definition 10.5 (Inversion of Multinomial Conditionals). Let ωY|X be a set of multinomial conditional opinions, and let aX be the base rate distribution over X. The set of conditional opinions ωX|Y, derived through the 4-step procedure described above, is the inverted set of conditional opinions of the former set. The symbol ‘⊚̃’ denotes the operator for conditional inversion, so that inversion of a set of conditional opinions can be expressed as:

ωX|Y = ⊚̃ (ωY|X , aX)     (10.48)   ⊔⊓

The difference between inversion and abduction is thus that abduction takes the evidence argument ωY, whereas inversion does not. Eq.(10.49) compares the two operations, which both use the same operator symbol:

Inversion:  ωX|Y = ⊚̃ (ωY|X , aX)
Abduction:  ωX‖̃Y = ωY ⊚̃ (ωY|X , aX)     (10.49)

Conditional abduction according to Eq.(9.20), with the original set of multinomial conditionals ωY|X, is thereby reduced to multinomial conditional deduction according to Eq.(9.19), with the set of inverted conditionals ωX|Y. As explained in Section 10.3.3, repeated inversion of conditionals with Bayesian base rates preserves the projected probabilities, but in general produces increased uncertainty. The increase in uncertainty is of course limited by the theoretical maximum uncertainty for each conditional. Repeated inversion will in the end make the uncertainty converge towards the theoretical maximum.

10.7 Multinomial Abduction

Multinomial abduction is a two-step process, where the first step consists of inverting a set of multinomial conditional opinions as described in Section 10.6, and the second step uses the inverted conditionals as arguments for multinomial deduction. The symbol ⊚̃ denotes the conditional abduction operator for subjective opinions, and ωY|X denotes the set of all the k = |X| different conditional opinions over Y, i.e. ωY|xi , i = 1, . . . , k. ωX‖̃Y denotes the opinion on X derived by the operation of abduction that we define below.

Definition 10.6 (Multinomial Abduction). Let X = {xi | i = 1, . . . , k} and Y = {yj | j = 1, . . . , l} be random variables, where Y is now the evidence variable and X is the target variable. Assume we have an opinion ωY, a set of conditional opinions of the form ωY|xi, one for each i = 1, . . . , k, and a base rate distribution aX on X. The conditional opinion ωY|xi expresses a subjective opinion on Y given that X takes the value xi. Formally, a conditional opinion ωY|xi, i ∈ [1, k], is a tuple:

ωY|xi = (bY|xi , uY|xi , aY) ,     (10.50)

where bY|xi : Y → [0, 1] is a belief mass function, uY|xi is an uncertainty mass, and aY : Y → [0, 1] is a base rate function expressing the prior probabilities over Y. (Note that the base rate function aY is the same for all of the conditional opinions ωY|xi, i = 1, . . . , k.) Given the above, assume that the analyst wants to derive a subjective opinion on X. Multinomial abduction is to compute the opinion about X in this situation, which consists of the following two-step process:

1. Invert the set of available conditionals ωY|X to produce the set of conditionals ωX|Y, as described in Section 10.6.
2. Apply the set of inverted conditionals ωX|Y together with the argument opinion ωY to compute the abduced opinion ωX‖̃Y, as described in Section 9.5.4 on multinomial deduction.
⊔⊓

Multinomial abduction produces the opinion ωX‖̃Y about the variable X, expressed as:

ωX‖̃Y = ωY ⊚̃ (ωY|X , aX)
      = ωY ⊚ ωX|Y .     (10.51)   ⊔⊓

Note that the operator symbol for abduction is the same as for conditional inversion. The difference in usage is that in the case of inversion the notation is ωX|Y = ⊚̃ (ωY|X , aX), i.e. there is no argument opinion ωY. For abduction, the argument is needed. Notice that the difference between deduction and abduction simply depends on which conditionals are available to the analyst. In causal situations it is normally easier to estimate causal conditionals than the opposite derivative conditionals. Assuming that there is a causal conditional relationship from X to Y, the analyst therefore typically has the set of conditionals ωY|X available, so that computing an opinion about X requires abduction.

10.8 Example: Military Intelligence Analysis

10.8.1 Example: Intelligence Analysis with Probability Calculus

Two countries A and B are in conflict, and the intelligence analysts of country B want to find out whether country A intends to use military aggression. The analysts of country B consider the following possible alternatives regarding country A’s plans:

x1: No military aggression from country A
x2: Minor military operations by country A
x3: Full invasion of country B by country A     (10.52)

The way the analysts will determine the most likely plan of country A is by trying to observe movements of troops in country A. For this, they have spies placed inside country A. The analysts of country B consider the following possible movements of troops:

y1: No movement of country A’s troops
y2: Minor movements of country A’s troops
y3: Full mobilisation of all country A’s troops     (10.53)

The analysts have defined a set of conditional probabilities of troop movements as a function of military plans, as specified in Table 10.2.

Table 10.2 Conditional probabilities p(Y|X): troop movement yj given military plan xi

 Probability vector | y1: No movemt.  | y2: Minor movemt. | y3: Full mob.
 p(Y|x1):           | p(y1|x1) = 0.50 | p(y2|x1) = 0.25   | p(y3|x1) = 0.25
 p(Y|x2):           | p(y1|x2) = 0.00 | p(y2|x2) = 0.50   | p(y3|x2) = 0.50
 p(Y|x3):           | p(y1|x3) = 0.00 | p(y2|x3) = 0.25   | p(y3|x3) = 0.75

The rationale behind the conditionals is as follows. In case country A has no plans of military aggression (x1), then there is little logistic reason for troop movements. However, even without plans of military aggression against country B, it is possible that country A expects military aggression from country B, forcing troop movements by country A. In case country A prepares for minor military operations against country B (x2), then necessarily some troop movements are required. In case country A prepares for full invasion of country B (x3), then significant troop movements are required. Assume that, based on observations by the spies of country B, the analysts assess the likelihoods of actual troop movements to be:

p(y1) = 0.20 ,  p(y2) = 0.60 ,  p(y3) = 0.20 .     (10.54)

The analysts are faced with an abductive reasoning situation, and must first derive the inverted conditionals p(X|Y). Assume that the analysts estimate the base rates (prior probabilities) of military plans to be:

a(x1) = 0.70 ,  a(x2) = 0.20 ,  a(x3) = 0.10 .     (10.55)

The expression of Eq.(9.13) can now be used to derive the required inverted conditionals, which are given in Table 10.3 below.
Table 10.3 Conditional probabilities p(X|Y): military plan xi given troop movement yj

 Military plan        | p(X|y1): No movemt. | p(X|y2): Minor movemt. | p(X|y3): Full mob.
 x1: No aggression    | p(x1|y1) = 1.00     | p(x1|y2) = 0.58        | p(x1|y3) = 0.50
 x2: Minor operations | p(x2|y1) = 0.00     | p(x2|y2) = 0.34        | p(x2|y3) = 0.29
 x3: Invasion         | p(x3|y1) = 0.00     | p(x3|y2) = 0.08        | p(x3|y3) = 0.21

The expression of Eq.(9.11) can now be used to derive the probabilities of the military plans of country A, resulting in:

p(x1‖̃Y) = 0.65 ,  p(x2‖̃Y) = 0.26 ,  p(x3‖̃Y) = 0.09 .     (10.56)

Based on the results of Eq.(10.56), it seems most likely that country A does not plan any military aggression against country B. However, these results hide uncertainty, and can thereby give a misleading estimate of country A’s plans. Analysing the same example with subjective logic in Section 10.8.2 gives a more nuanced picture, by explicitly showing the amount of uncertainty affecting the results.

10.8.2 Example: Intelligence Analysis with Subjective Logic

In this example we revisit the intelligence analysis situation of Section 10.8.1, but now with the conditionals and evidence represented as subjective opinions. When analysed with subjective logic, the conditionals are affected by uncertainty but still have the same projected probability distributions as before. For the purpose of the example we assign the maximum possible amount of uncertainty to the set of dogmatic opinions that correspond to the probabilistic conditionals of the example in Section 10.8.1. These dogmatic opinions are specified in Table 10.4. To recall, the base rates over the three possible military plans in X, already specified in Eq.(10.55), are repeated in Eq.(10.57) below.

Military plan base rates:  aX(x1) = 0.70,  aX(x2) = 0.20,  aX(x3) = 0.10     (10.57)

Table 10.4 Dogmatic conditional opinions ω̇Y|X: troop movement yj given military plan xi

 Opinions ω̇Y|X | y1: No movemt.   | y2: Minor movemt. | y3: Full mob.    | Uncertainty (Any)
 ω̇Y|x1:        | bY|x1(y1) = 0.50 | bY|x1(y2) = 0.25  | bY|x1(y3) = 0.25 | uY|x1 = 0.00
 ω̇Y|x2:        | bY|x2(y1) = 0.00 | bY|x2(y2) = 0.50  | bY|x2(y3) = 0.50 | uY|x2 = 0.00
 ω̇Y|x3:        | bY|x3(y1) = 0.00 | bY|x3(y2) = 0.25  | bY|x3(y3) = 0.75 | uY|x3 = 0.00

The opinion conditionals affected by uncertainty, specified in Table 10.5, are obtained by uncertainty-maximisation of the dogmatic opinion conditionals of Table 10.4. The uncertainty-maximisation depends on the base rates in Eq.(10.58) over the three possible troop movements in Y, which are derived as Bayesian base rates as described in Section 9.5.2.

Troop movement base rates:  aY(y1) = 0.35,  aY(y2) = 0.30,  aY(y3) = 0.35     (10.58)

With the base rates over Y, the uncertainty ûY|xi of each uncertainty-maximised conditional opinion about troop movement in Table 10.5 is obtained according to Eq.(10.59), which is equivalent to Eq.(10.37) as described in Section 10.6:

ûY|xi = min_j [ PY|xi (yj) / aY(yj) ]     (10.59)

The belief masses of the uncertainty-maximised opinions are then computed according to Eq.(10.60):

bY|xi (yj) = PY|xi (yj) − aY(yj) ûY|xi     (10.60)

The uncertainty-maximised conditional opinions are given in Table 10.5.

Table 10.5 Uncertain conditional opinions ωY|X: troop movement yj given military plan xi

 Opinions ωY|X | y1: No movemt.   | y2: Minor movemt. | y3: Full mob.    | Uncertainty (Any)
The uncertainty-maximised conditional opinions are given in Table 10.5.

Table 10.5 Uncertain conditional opinions ω_{Y|X}: troop movement y_j given military plan x_i

    Opinions     y1: No movemt.        y2: Minor movemt.     y3: Full mob.         Uncertainty
    ω_{Y|x1}:    b_{Y|x1}(y1) = 0.25   b_{Y|x1}(y2) = 0.04   b_{Y|x1}(y3) = 0.00   u_{Y|x1} = 0.71
    ω_{Y|x2}:    b_{Y|x2}(y1) = 0.00   b_{Y|x2}(y2) = 0.50   b_{Y|x2}(y3) = 0.50   u_{Y|x2} = 0.00
    ω_{Y|x3}:    b_{Y|x3}(y1) = 0.00   b_{Y|x3}(y2) = 0.25   b_{Y|x3}(y3) = 0.75   u_{Y|x3} = 0.00

The opinion about troop movements also needs to be uncertainty-maximised, in accordance with Eq.(10.59) and Eq.(10.60). The uncertainty-maximised opinion is expressed by Eq.(10.61).

\[
\omega_Y : \quad
\begin{array}{ll}
b_Y(y_1) = 0.00, & a_Y(y_1) = 0.35\\
b_Y(y_2) = 0.43, & a_Y(y_2) = 0.30\\
b_Y(y_3) = 0.00, & a_Y(y_3) = 0.35\\
u_Y = 0.57 &
\end{array}
\qquad (10.61)
\]

First, the opinion conditionals must be inverted by taking into account the base rates of military plans expressed in Eq.(10.55). The inversion process produces the inverted conditionals expressed in Table 10.6.

Table 10.6 Conditional opinions ω_{X|Y}: military plan x_i given troop movement y_j

    Military plan          ω_{X|y1}: No movemt.   ω_{X|y2}: Minor movemt.   ω_{X|y3}: Full mob.
    x1: No aggression      b_{X|y1}(x1) = 1.00    b_{X|y2}(x1) = 0.00       b_{X|y3}(x1) = 0.00
    x2: Minor ops.         b_{X|y1}(x2) = 0.00    b_{X|y2}(x2) = 0.17       b_{X|y3}(x2) = 0.14
    x3: Invasion           b_{X|y1}(x3) = 0.00    b_{X|y2}(x3) = 0.00       b_{X|y3}(x3) = 0.14
    X: Any (uncertainty)   u_{X|y1} = 0.00        u_{X|y2} = 0.83           u_{X|y3} = 0.72

Then the likelihoods of country A's plans can be computed as the opinion:

\[
\omega_{X \widetilde{\|} Y} : \quad
\begin{array}{lll}
b_{X \widetilde{\|} Y}(x_1) = 0.00, & a_X(x_1) = 0.70, & P_{X \widetilde{\|} Y}(x_1) = 0.65\\
b_{X \widetilde{\|} Y}(x_2) = 0.07, & a_X(x_2) = 0.20, & P_{X \widetilde{\|} Y}(x_2) = 0.26\\
b_{X \widetilde{\|} Y}(x_3) = 0.00, & a_X(x_3) = 0.10, & P_{X \widetilde{\|} Y}(x_3) = 0.09\\
u_{X \widetilde{\|} Y} = 0.93 & &
\end{array}
\qquad (10.62)
\]

These results can be compared with those of Eq.(10.56), which were derived with probabilities only, and which are equal to the projected probability distribution given in the rightmost column of Eq.(10.62). An important observation is that, although x3 (full invasion) seems to be country A's least likely plan in probabilistic terms, as expressed by $P_{X \widetilde{\|} Y}(x_3) = 0.09$, there is considerable uncertainty, as expressed by $u_{X \widetilde{\|} Y} = 0.93$. In fact, the probability $P_{X \widetilde{\|} Y}(x_1) = 0.65$ of the most likely plan x1 has no belief support at all, and is based purely on uncertainty, which would be worrisome in a real situation. A likelihood expressed as a scalar probability can thus hide important aspects of the analysis, which only come to light when uncertainty is explicitly expressed, as in the example above.

Chapter 11
Fusion of Subjective Opinions

Belief fusion is a central concept in subjective logic. It allows evidence and opinions held by source agents about the same domain of interest to be merged, in order to provide an opinion about the domain that represents the combination of the source agents.

11.1 Interpretation of Fusion

In many situations there will be multiple sources of evidence about a domain of interest, where there can be significant differences between the opinions. It is often useful to combine the evidence from different sources in order to produce an opinion that better reflects the set of different opinions, or that is closer to the ground truth than each opinion in isolation. Belief fusion is precisely the merging of multiple opinions in order to produce a single opinion that is more correct (according to some criterion) than each opinion in isolation. The principle of opinion fusion is illustrated in Figure 11.1.
Belief fusion consists of merging the separate sources/agents A and B into a single source/agent, e.g. denoted (A ⋄ B), and of mathematically combining their opinions into a single opinion which then represents the opinion of the merged source/agent.

Fusion situations vary significantly. For this reason it is necessary to apply different fusion operators when modelling different fusion situations. However, it can be challenging to identify the correct fusion operator for a specific situation. In general, a fusion operator is unsatisfactory if it produces adequate results in most instances of a situation, but not in all of them. There should be no exceptional input arguments that the fusion operator can not handle. A fusion operator should produce adequate results in all possible cases of the situation to be modelled.

Fig. 11.1 Fusion process principle (agents A and B each express an opinion about the variable X, whose values represent the states of the domain; the fusion process combines the two opinions into the opinion of the merged agent (A and B))

As an analogy, consider the situation of predicting the strength of a steel chain, where the classical model is that of the weakest link, meaning that the chain is only as strong as the weakest of all its links. A different situation is e.g. to determine the strength of a relay swimming team, for which an adequate model could be the average strength of each swimmer on the team. Applying the weakest-link model to assess the overall strength of the relay swimming team is an approximation that might give good predictions in most instances of high-level swimming championships. However, it obviously is a poor model and would produce unreliable predictions in general. Similarly, applying the average-strength model for assessing the overall strength of the chain represents an approximation that would produce satisfactory strength predictions in most instances of high-quality steel chains. However, it is obviously a very poor model which would be unreliable in general and which could be fatal if life depended on it.

These examples illustrate that it is insufficient to simply use a few numerical examples to test whether the weakest-link principle is an adequate model for predicting the strength of relay swimming teams. Similarly, it is insufficient to simply use a few numerical examples to test whether the average principle is adequate for modeling steel chains. Without a clear understanding of the situations to be modelled, the analyst does not have a basis for selecting the most appropriate model. The selection of the appropriate model might be obvious for the simple examples above, but it can be challenging to say whether a fusion operator is adequate for a specific fusion situation [52].

The conclusion to be drawn from this discussion is that the analyst must first understand the dynamics of the situation at hand in order to find the most correct model for analyzing it. The next section describes the problem of determining the correctness of a specific model for belief fusion.

11.1.1 Correctness and Consistency Criteria for Fusion Models

We argue that meaningful and reliable belief fusion depends on the fusion operator's ability to produce correct results for the practical or hypothetical situation that is being analyzed. This calls for a definition of what it means for results to be correct.

Definition 11.1 (Correctness of Results).
In general the correctness of results produced by a model is the degree to which the results represent the true state of the real situation that is being modeled. ⊔ ⊓ To clarify this definition it is useful to distinguish between three types of truth: 1) ground truth, 2) consensus truth and 3) subjective truth, as described below, where ground truth is the strongest and subjective truth is the weakest form of truth. • Ground truth about a situation is the objectively observable state of the situation. • Consensus truth about a situation is the state that is identified to be the actual state by a commonly shared opinion about the situation, or the state that is identified to be the actual state according to commonly accepted norms or standards. • Subjective truth about a situation is the state identified to be actual state by the analyst’s own opinion about the situation. The term ‘true state’ can also be used in the sense that the state is satisfactory or preferred. For example when a group of people wants to select a movie to watch at the cinema together it would seem strange to say that one specific movie is more true than another. However, in this case the term truth can interpreted in the sense that one specific movie (the true state) can be considered to be the most satisfactory for all the group members to watch together. Fusion models produce output results when used to analyze fusion situations. Three different types of result correctness emerge from the three different types of truth, where objective correctness is the strongest, and subjective correctness is the weakest. • Objective result correctness is the degree to which the result represents the ground truth of a situation. • Consensus result correctness is the degree to which the result represents the consensus truth of a situation. • Subjective result correctness is the degree to which the result represents the subjective truth of a situation. Depending on whether ground truth, consensus truth or subjective truth is available, the strongest form of correctness should be required for assessing the results. For example assume a weather forecast model with all its various input parameters and their complex relationships. Weather forecasts can be compared with the actual weather when the time of the forecast arrives a day or two later, so that it is reasonable to require objective correctness when assessing weather forecasting models. 202 11 Fusion of Subjective Opinions The case of predicting global warming might seem similar to that of forecasting the weather, because models for global warming are also based on many different input parameters with complex relationships. Although predicting global warming to occur over the next 100 years can be objectively verified or refuted, the time scale makes it impossible to require objective correctness in the short term. Instead, practical assessment of model correctness must be based on consensus among experts. So with no ground truth as bench mark it is only possible to require consensus correctness in the short term. An paradoxical observation is that in 100 years (e.g. after year 2100) when ground truth about global warming predicted for the next 100 years finally becomes available there will probably no longer be any interest in assessing the correctness of the models used to make the predictions, and the individuals who designed the models will be long gone. Designers of global warming models will thus never be confronted with the ground truth about their models and predictions. 
Despite the lack of objective basis, consensus correctness as a criterion is often used for selecting specific models and for determining whether or not the results they produce shall be used in planning and decision making. In situations where ground truth can not be observed and consensus truth is impossible to obtain, only subjective criteria for truth can be used. Models for which subjective correctness criteria can be used are e.g. models for making personal decisions about which career path to follow or which partner to live with. In theory such decision are made based on multiple forms of evidence which must be fused to form an opinion. People normally do not use formal models for analyzing such situations, and instead use their intuition. Models assessed under subjective correctness criteria are often only used for practical decision making by an individual a small number of times during a lifetime, so not even statistical evidence can be obtained. However there are expert systems for e.g. career path choice and partner matching, in which case it is possible to determine statistically whether a particular model predicts ‘good’ career choices and ‘happy’ unions in the long term. With regard to Definition 11.1, it is necessary to examine the case when it has only been observed once or, a small number of times, and as whether it is representative of the true state of a situation. Although a model produces correct results in some instances, there might be other instances where the results are clearly wrong, in which case the model can not be considered to be correct in general. In situations when only instances with correct results have been observed, the analyst might erroneously think that the model is correct in general. For example, assume a rather naı̈ve analyst who misinterprets the situation of adding apples from two baskets, and erroneously thinks that the product rule of integer multiplication is an appropriate model. Assume that the analyst tries a specific example with two apples in each basket, and computes the sum with the product rule, which gives 4 apples. When observing a real example of two baskets of two apples each, it turns out that adding them together also produces 4 apples. This result could mistakenly be interpreted as a confirmation that the product rule is a correct model, simply because the computed result is the same as the ground truth in this particular instance. It is of course wrong to conclude that a model is correct just because it produces results that (perhaps by coincidence) correspond to the ground 11.1 Interpretation of Fusion 203 truth in a single instance. In order for a model to be correct, it is natural to require that results produced by it are generally correct, and not just by coincidence in specific instances of a situation. In order to distinguish between coincidentally correct results and generally correct results, it is necessary to also consider consistency, which leads to the following definition. Definition 11.2 (Model Correctness). A model is correct for a specific situation when it consistently produces correct results in all instances of the situation. ⊔ ⊓ On a high level of abstraction, a correct reasoning model according to Definition 11.2 must faithfully reflect the (class of) situations that are being modeled. A precise way of expressing this principle is that for a given a class of situations, there is one correct model. 
Note that is possible to articulate three types of model correctness according to the three types of result correctness. • Objective model correctness for a specific class of situations is the model’s ability to consistently produce objectively correct results for all possible situations in the class. • Consensus model correctness for a specific class of situations is the model’s ability to consistently produces consensus correct results for all possible situations in the class. • Subjective model correctness for a specific class of situations is the model’s ability to consistently produces subjectively correct results for all possible situations in the class. Depending on whether ground truth, consensus truth or subjective truth is available, the strongest form of model correctness should be required for practical analysis. Observing result correctness in one instance is not sufficient to conclude that a model is correct. It can be theoretically impossible to verify that all possible results are consistently correct, so proving that a model is correct in general can be challenging. On the other hand, if a single false result is observed it can be concluded that the model is incorrect for the situation. In such cases it might be meaningful to indicate the range of validity of the model which limits the range of input arguments or possibly the range of output results. The next two sections described interpretations of fusion operators defined in subjective logic, and selection criteria than analysts can use when deciding which fusion operator to use for a specific situation. 11.1.2 Classes of Fusion Situations Situations of belief fusion involve belief arguments from multiple sources that must be fused in some way to produce a single belief argument. More specifically, the situation is characterized by a frame consisting of two or more statements, and a set of different belief arguments about these statements. It is assumed that each belief argument supports one or several statements. The purpose of belief fusion is 204 11 Fusion of Subjective Opinions to produce a new belief that identifies the most ‘correct’ statement(s) in the frame. The meaning of most correct statement can also be that it is the most acceptable or most preferred statement. Different beliefs can be fused in various ways, each having an impact on how the specific situation in evidence fusion is modeled. It is often challenging to determine the correct or the most appropriate fusion operator for a specific situation. One way of addressing this challenge is to categorize these specific situations according to their typical characteristics, which would then allow for determining which fusion operators are more adequate to each category. Four distinct classes as well as one hybrid class of fusion situations are described below. • Belief Constraint Fusion is when it is assumed that (i) each belief argument can dictate which states of the frame are the most correct, and (ii) there is no room for compromise in case two argument opinions are totally conflicting, i.e. the fusion result is not defined in that case. In some situations this property is desirable. An example is when two persons try to agree on seeing a movie at the cinema. If their preferences include some common movies they can decide to see one of them. Yet, if their preferences do not have any movies in common then there is no solution, so the rational consequence is that they will not watch any movie together. Constraint fusion is described in Section 11.2. 
• Cumulative Belief Fusion is when it is assumed that it is possible to collect an increasing amount of independent evidence by including more and more arguments, and that certainty about the most correct state increases with the amount of evidence accumulated. A typical case depicting this type of fusion is when one makes statistical observations about possible outcomes, i.e. the more observations the stronger the analyst’s belief about the likelihood of each outcome. For example, a mobile network operator could observe the location of a subscriber over time, which will produce increasing certainty about the most frequent locations of that subscriber. However, the result would not necessarily be suitable for indicating the exact location of the subscriber at a specific time. Cumulative fusion is described in Section 11.3. • Averaging Belief Fusion is when dependence between arguments is assumed. In other words, including more arguments does not mean that more evidence is supporting the conclusion. An example of this type of situation is when a jury tries to reach a verdict after having observed the court proceedings. Because the evidence is limited to what was presented to the court, the certainty about the verdict does not increase by having more jury members expressing their beliefs, since they were all exposed to the same evidence. Averaging fusion is described in Section 11.4. • Hybrid Cumulative / Averaging Fusion can be applied when observations can be considered partially dependent. The operator for hybrid cumulativeaveraging fusion [49] is partially based on cumulative fusion and partially on averaging fusion. It is described in Section 11.5. • Consensus & Compromise Fusion (CC-fusion) is when no single belief argument alone can dictate that specific states of the frame are the most correct. In this fusion class the analyst naturally wants to preserve shared beliefs from each 11.1 Interpretation of Fusion 205 argument, and in addition transform conflicting beliefs into new shared beliefs on union subsets. In this way consensus belief is preserved when it exists and compromise belief is formed when necessary. In case of totally conflicting beliefs on a binary frame, then the resulting fused belief is totally uncertain. An example is when analysing evidence about the Kennedy murder case, where the analyst collects statements from two witnesses. Assuming that both witnesses claim to know with some certainty that Oswald killed Kennedy, the consensus & compromise fusion would say the same, because there is a consensus. However, when assuming that witness 1 claims to know with certainty that Oswald killed Kennedy, and that witness 2 claims to know with certainty that Oswald did not kill Kennedy, then consensus & compromise fusion would return the result that it is totally uncertain whether Oswald killed Kennedy, because uncertainty is the best compromise in case of totally conflicting beliefs. CC-fusion is described in Section 11.6. The subtle differences between the fusion situations above illustrate the challenge of modeling them correctly. For instance, consider the task of determining the location of a mobile phone subscriber at a specific point in time by collecting location evidence from base station, in which case it seems natural to use constraining belief fusion. If two adjacent base stations detect the subscriber, then the belief constraint operator can be used to locate the subscriber within the overlapping region of the respective radio cells. 
However, if two base stations far apart detect the subscriber at the same time, then the result of constraining belief fusion is not defined, so there is no conclusion. With additional assumptions, it would still be reasonable to think that the subscriber is probably located in one of the two cells, but not which one in particular, and that the case needs further investigation because the inconsistent signals might be caused by an error in the system.

11.1.3 Criteria for Fusion Operator Selection

While defining classes of fusion situations helps in scoping the solution space, there is still the issue of determining which class a specific situation belongs to. The approach we propose for this classification problem is to specify a set of assumptions about a fusion situation, where each assumption can be judged to be either valid or invalid for the situation. In other words, we decompose the classification problem so that it becomes a matter of deciding whether specific assumptions apply to the situation. The set of assumptions below can be used to determine which class a situation belongs to. In order to select the correct fusion model the analyst must consider the set of assumptions about the fusion situation to be analysed and judge which assumptions are applicable. The most adequate fusion model is identified as a function of the set of assumptions that applies to the situation to be analyzed. The selection procedure is illustrated in Figure 11.2.

Fig. 11.2 Procedure for selecting the most adequate fusion operator (a decision tree over questions (b), (d) and (f) below, leading to the constraint, cumulative, averaging, or consensus & compromise fusion operator)

The steps of the selection procedure in Figure 11.2 are described in more detail below.

(a) The analyst first needs a good understanding of the situation to be analysed and modelled with a fusion operator. This includes being able to make the binary choices of (b), (d) and (f) below.
(b) Does it make sense to fuse two totally conflicting opinion arguments?
(c) In case no compromise can be imagined between two totally conflicting arguments, then it is probably adequate to apply the belief constraint fusion operator. This fusion operator is not defined in case of totally conflicting arguments, which reflects the assumption that there is no compromise for totally conflicting arguments.
(d) Should two equal opinion arguments produce a fused opinion which is equal to the arguments?
(e) In case it is not assumed that two equal arguments should produce an equal fusion result, then it is probably adequate to apply the cumulative fusion operator. It means that equal arguments are considered as independent support for specific values of the variable, so that two equal arguments produce stronger support than a single argument alone. In other words the operator should be non-idempotent, as is the case for cumulative fusion. This operator can also handle totally conflicting opinions if necessary.
(f) Should a vacuous opinion have any influence on the fusion result?
(g) In case it is assumed that a vacuous opinion has an influence on the fused result, then it is probably adequate to apply the averaging fusion operator. It means that there can be no neutral fusion argument. This can be meaningful when e.g. applying fusion to make a survey of opinions. The averaging fusion operator would suit this purpose e.g. because a situation where many agents express vacuous opinions would be reflected in the result. This operator can also handle totally conflicting opinions if necessary. (h) In case it is assumed that two equal arguments produce an equal fusion result, and that a vacuous opinion argument has no influence on the fusion result, then it is probably adequate to apply the consensus and compromise fusion (CCfusion) operator. This operator can also handle totally conflicting opinions if necessary. The fusion operators mentioned in Figure 11.2 are described in the next sections. 11.2 Belief Constraint Fusion Situations where agents with different preferences try to agree on a single choice occur frequently. This must not be confused with fusion of evidence from different agents to determine the most likely correct hypothesis or actual event. Multi-agent preference combination assumes that each agent has already made up her mind, and is about determining the most acceptable decision or choice for the group of agents. Preferences for a state variable can be expressed in the form of subjective opinions. The constraint fusion operator of subjective logic can be applied as a method for merging preferences of multiple agents into a single preference for the whole group. This model is expressive and flexible, and produces perfectly intuitive results. Preference can be represented as belief and indifference can be represented as uncertainty/uncommitted belief. Positive and negative preferences are considered as symmetric concepts, so they can be represented in the same way and combined using the same operator. A totally uncertain opinion has no influence and thereby represents the neutral element. 11.2.1 Method of Constraint Fusion The belief constraint fusion operator described here is an extension of Dempster’s rule which in Dempster-Shafer belief theory is often presented as a method for fusing evidence from different sources [81] in order to identify the most likely hypothesis from the frame (domain). Many authors have however demonstrated that 208 11 Fusion of Subjective Opinions Dempster’s rule is not an appropriate operator for evidence fusion [92], and that it is better suited as a method for combining constraints [50, 48]. Definition 11.3 (The Constraint Fusion Operator). Assume the domain X and its hyperdomain R(X), and assume the hypervariable X which takes its values from R(X). Let agent A hold opinion ωXA and agent B hold opinion ωXB . The superscripts A and B are attributes that identify the respective belief sources or belief owners. These two opinions can be mathematically merged using the belief constraint fusion operator denoted as ‘⊙’ which can be expressed as: (A&B) Belief Constraint Fusion: ωX = ωXA ⊙ ωXB . (11.1) Belief source combination denoted with ‘&’ thus corresponds to opinion fusion with ‘⊙’. Below is the algebraic expression of the belief constraint fusion operator for subjective opinions. 
\[
\omega_X^{(A \& B)} : \quad
\left\{
\begin{array}{ll}
b_X^{(A \& B)}(x) = \dfrac{\mathrm{Har}(x)}{1-\mathrm{Con}}\,, & \forall x \in \mathcal{R}(\mathbb{X}),\\[2ex]
u_X^{(A \& B)} = \dfrac{u_X^A\, u_X^B}{1-\mathrm{Con}}\,, &\\[2ex]
a_X^{(A \& B)}(x) = \dfrac{a_X^A(x)\,(1-u_X^A) + a_X^B(x)\,(1-u_X^B)}{2 - u_X^A - u_X^B}\,, & \forall x \in \mathcal{R}(\mathbb{X}),\; x \neq \emptyset
\end{array}
\right.
\qquad (11.2)
\]

The term Har(x) represents the relative harmony between constraints (in terms of overlapping belief mass) on x. The term Con represents the relative conflict between constraints (in terms of non-overlapping belief mass) between ω_X^A and ω_X^B. These parameters are defined below:

\[
\mathrm{Har}(x) = b_X^A(x)\,u_X^B + b_X^B(x)\,u_X^A + \sum_{(x_A \cap x_B) = x} b_X^A(x_A)\, b_X^B(x_B) \qquad (11.3)
\]

\[
\mathrm{Con} = \sum_{(x_A \cap x_B) = \emptyset} b_X^A(x_A)\, b_X^B(x_B) \qquad (11.4)
\]
⊔ ⊓

The divisor (1 − Con) in Eq.(11.2) normalizes the derived belief mass; it ensures belief mass and uncertainty mass additivity. The use of the constraint fusion operator is mathematically possible only if ω_X^A and ω_X^B are not totally conflicting, i.e. if Con ≠ 1.

The constraint fusion operator is commutative and non-idempotent. Associativity is preserved when the base rate is equal for all agents. The base rates of the two arguments are normally assumed to be equal, expressed by a_X^A = a_X^B, but different base rates can be used in case of base rate disagreement between agents A and B. Associativity in case of different base rates requires that all preference opinions be combined in a single operation, which would require a generalisation of Definition 11.3 to multiple agents, i.e. to multiple input arguments, which is relatively trivial.

A totally indifferent opinion acts as the neutral element for constraint fusion, formally expressed as:

\[
\text{IF } (\omega_X^A \text{ is totally indifferent, i.e. with } u_X^A = 1) \;\text{ THEN }\; (\omega_X^A \odot \omega_X^B = \omega_X^B)\,. \qquad (11.5)
\]

Having a neutral element in the form of the totally indifferent opinion is very useful when modelling situations of preference combination. The flexibility of subjective logic makes it simple to express positive and negative preferences within the same framework, as well as indifference/uncertainty. Because preference can be expressed over arbitrary subsets of the domain, this is in fact a multi-polar model for expressing and combining preferences. Even in the case of non-overlapping focal elements the belief constraint fusion operator produces meaningful results, namely that the preferences are incompatible. The examples in Sections 11.2.3–11.2.6 demonstrate the usefulness of this property.
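Definition 11.3 can be sketched directly for hyper-opinions by representing belief mass as a mapping from focal elements (sets of singletons) to mass. The snippet below is an illustrative sketch of Eqs.(11.2)–(11.4), not the book's reference implementation; the names are chosen for readability.

```python
# Illustrative sketch of belief constraint fusion (Definition 11.3).
# bA, bB map frozenset focal elements to belief mass; uA, uB are uncertainty masses.
def constraint_fuse(bA, uA, bB, uB):
    con = sum(mA * mB for xA, mA in bA.items() for xB, mB in bB.items()
              if not (xA & xB))                                    # Eq.(11.4)
    if con == 1:
        raise ValueError("totally conflicting arguments: fusion undefined")
    fused = {}
    for xA, mA in bA.items():                                      # harmony products
        for xB, mB in bB.items():
            inter = xA & xB
            if inter:
                fused[inter] = fused.get(inter, 0.0) + mA * mB
    for x, m in bA.items():                                        # overlap with uncertainty
        fused[x] = fused.get(x, 0.0) + m * uB
    for x, m in bB.items():
        fused[x] = fused.get(x, 0.0) + m * uA
    fused = {x: m / (1 - con) for x, m in fused.items()}           # Eq.(11.2)
    return fused, (uA * uB) / (1 - con)

# Neutral-element check, Eq.(11.5): a totally indifferent opinion has no influence.
b = {frozenset({'x1'}): 0.6, frozenset({'x2', 'x3'}): 0.2}
print(constraint_fuse({}, 1.0, b, 0.2))   # returns b and u = 0.2 unchanged
```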
11.2.2 Frequentist Interpretation of Constraint Fusion

The behaviour and purpose of the constraint fusion operator can be challenging to understand. However, it has a very clear frequentist interpretation, which is described next.

Assume a domain X with its hyperdomain R(X) and powerset P(X). Recall from Eq.(2.1) that P(X) = R(X) ∪ {X, ∅}. Let x denote a specific value of the hyperdomain R(X) or of the powerset P(X). Let X be a hypervariable in X, and let χ be a random variable which takes its values from the powerset P(X). We consider a repetitive process denoted U which generates unconstrained instances of the variable χ, which in turn get constrained through serially arranged stages A and B to produce a constrained output value of the variable. To be unconstrained means that χ = X. To constrain χ by e.g. x_A is to produce a new χ so that χ := χ ∩ x_A. This means that χ is modified to take a value in P(X) with smaller or equal cardinality, i.e. the new constrained χ typically has a value consisting of fewer singleton elements.

Assume the opinions ω_X^A = (b_X^A, u_X^A, a_X^A) and ω_X^B = (b_X^B, u_X^B, a_X^B). Let p_χ^A be a probability distribution over χ where p_χ^A(x) = b_X^A(x) and p_χ^A(X) = u_X^A. Similarly, let p_χ^B be a probability distribution over χ based on ω_X^B.

The serial constraining configuration is determined by the probability distributions p_χ^A and p_χ^B in the following way. At stage A a specific element x_A ∈ P(X) is selected with probability p_χ^A(x_A). At stage B a specific element x_B ∈ P(X) is selected with probability p_χ^B(x_B). The unconstrained variable χ produced at stage U is first constrained at stage A by computing χ := (χ ∩ x_A) = x_A, which in turn is constrained at stage B to produce χ := (x_A ∩ x_B) = x_C. The final constrained values x_C = (x_A ∩ x_B) are collected at stage C. This is illustrated in Figure 11.3.

Fig. 11.3 Frequentist interpretation of constraint fusion (stage U produces the unconstrained χ = X; stage A constrains it by x_A selected according to p_χ^A; stage B constrains it by x_B selected according to p_χ^B; the constrained values x_C = (x_A ∩ x_B) are collected at stage C)

Assume that a series of unconstrained variable instances χ are generated at source U, and that the resulting constrained values are collected at stage C. If e.g. for a specific instance i of the process in Figure 11.3 the constraints are such that (x_i^A ∩ x_i^B) = x_i^C ≠ ∅, then the non-empty value x_i^C is collected. If for another instance j the constraints are such that (x_j^A ∩ x_j^B) = x_j^C = ∅, then the collected value is ∅. If for yet another instance k the constraints are x_k^A = x_k^B = X, so that (x_k^A ∩ x_k^B) = X, then the collected value is X. Relative to the total number of collected values (including X and ∅), the relative proportions of each type of collected value can be expressed by a relative frequency probability distribution p_χ^C.

Let n denote the total number of collected values, and let T denote the Boolean truth function such that T(TRUE) = 1 and T(FALSE) = 0. Then the convergence value of p_χ^C when n goes to infinity is expressed as:

\[
p_\chi^C(x) = \lim_{n \to \infty} \frac{\sum_{i=1}^{n} \mathrm{T}(x_i^C = x)}{n - \sum_{i=1}^{n} \mathrm{T}(x_i^C = \emptyset)} \qquad \forall x \in \mathcal{P}(\mathbb{X}). \qquad (11.6)
\]

The stochastic constraint opinion ω_X^C is derived from the probability distribution p_χ^C according to Definition 11.4.

Definition 11.4 (Stochastic Constraint Opinion).
Given the convergent constrained relative frequency distribution p_χ^C of Eq.(11.6), the stochastic constraint opinion is expressed as:

\[
\omega_X^C : \quad
\left\{
\begin{array}{ll}
b_X^C(x) = p_\chi^C(x), & \text{for } x \in \mathcal{R}(\mathbb{X})\\[1ex]
u_X^C = p_\chi^C(\mathbb{X}) &\\[1ex]
a_X^C = a_X &
\end{array}
\right.
\qquad (11.7)
\]
⊔ ⊓

The stochastic constraint opinion ω_X^C is the same as the opinion ω_X^{(A&B)} produced by the belief constraint fusion operator of Definition 11.3, so the following theorem can be stated.

Theorem 11.1 (Equivalence Between Stochastic and Belief Constraint Fusion).
Stochastic constraint fusion of Definition 11.4 is equivalent to belief constraint fusion of Definition 11.3. This can be expressed as:

\[
\omega_X^{(A \& B)} \equiv \omega_X^C. \qquad (11.8)
\]

Proof.
The stepwise transformation of b_X^C(x) into b_X^{(A&B)}(x), as well as of u_X^C into u_X^{(A&B)}, demonstrates the equivalence.
Transformation b_X^C → b_X^{(A&B)}:

\[
\begin{array}{lll}
1: \quad b_X^C(x) \;=\; p_\chi^C(x) & & \forall x \in \mathcal{R}(\mathbb{X})\\[2ex]
2: \quad\phantom{b_X^C(x)} \;=\; \lim\limits_{n\to\infty} \dfrac{\sum_{i=1}^{n} \mathrm{T}\big((x_i^A \cap x_i^B) = x\big)}{n - \sum_{i=1}^{n} \mathrm{T}\big((x_i^A \cap x_i^B) = \emptyset\big)} & & \forall x \in \mathcal{R}(\mathbb{X}),\; x_i^A, x_i^B \in \mathcal{P}(\mathbb{X})\\[3ex]
3: \quad\phantom{b_X^C(x)} \;=\; \dfrac{n \sum_{(x_A \cap x_B) = x} p_\chi^A(x_A)\, p_\chi^B(x_B)}{n\big(1 - \sum_{(x_A \cap x_B) = \emptyset} p_\chi^A(x_A)\, p_\chi^B(x_B)\big)} & & \forall x \in \mathcal{R}(\mathbb{X}),\; x_A, x_B \in \mathcal{P}(\mathbb{X})\\[3ex]
4: \quad\phantom{b_X^C(x)} \;=\; \dfrac{\sum_{(x_A \cap x_B) = x} p_\chi^A(x_A)\, p_\chi^B(x_B)}{1 - \sum_{(x_A \cap x_B) = \emptyset} p_\chi^A(x_A)\, p_\chi^B(x_B)} & & \forall x \in \mathcal{R}(\mathbb{X}),\; x_A, x_B \in \mathcal{P}(\mathbb{X})\\[3ex]
5: \quad\phantom{b_X^C(x)} \;=\; \dfrac{b_X^A(x)\,u_X^B + b_X^B(x)\,u_X^A + \sum_{(x_A \cap x_B) = x} b_X^A(x_A)\, b_X^B(x_B)}{1 - \sum_{(x_A \cap x_B) = \emptyset} b_X^A(x_A)\, b_X^B(x_B)} & & \forall x \in \mathcal{R}(\mathbb{X}),\; x_A, x_B \in \mathcal{R}(\mathbb{X})\\[3ex]
6: \quad\phantom{b_X^C(x)} \;=\; \dfrac{\mathrm{Har}(x)}{1-\mathrm{Con}} \;=\; b_X^{(A \& B)}(x) & & \forall x \in \mathcal{R}(\mathbb{X}).
\end{array}
\qquad (11.9)
\]

The crucial point is the transformation from step 2 to step 3. The validity of this transformation is evident because at every instance i the following probability equalities hold:

\[
\begin{array}{l}
p\big((x_i^A \cap x_i^B) = x\big) = \sum_{(x_i^A \cap x_i^B) = x} p_\chi^A(x_i^A)\, p_\chi^B(x_i^B)\\[2ex]
p\big((x_i^A \cap x_i^B) = \emptyset\big) = \sum_{(x_i^A \cap x_i^B) = \emptyset} p_\chi^A(x_i^A)\, p_\chi^B(x_i^B).
\end{array}
\qquad (11.10)
\]

This means that the probabilities of the two Boolean truth functions in step 2 above are given by Eq.(11.10). The transformation from step 2 to step 3 above simply consists of rewriting these probabilities.

Simplified transformation u_X^C → u_X^{(A&B)}:

\[
\begin{array}{lll}
1: \quad u_X^C \;=\; p_\chi^C(\mathbb{X}) & &\\[2ex]
2: \quad\phantom{u_X^C} \;=\; \lim\limits_{n\to\infty} \dfrac{\sum_{i=1}^{n} \mathrm{T}\big((x_i^A \cap x_i^B) = \mathbb{X}\big)}{n - \sum_{i=1}^{n} \mathrm{T}\big((x_i^A \cap x_i^B) = \emptyset\big)} & & x_i^A, x_i^B \in \mathcal{P}(\mathbb{X})\\[3ex]
3: \quad\phantom{u_X^C} \;=\; \dfrac{p_\chi^A(\mathbb{X})\, p_\chi^B(\mathbb{X})}{1 - \sum_{(x_A \cap x_B) = \emptyset} p_\chi^A(x_A)\, p_\chi^B(x_B)} & & x_A, x_B \in \mathcal{P}(\mathbb{X})\\[3ex]
4: \quad\phantom{u_X^C} \;=\; \dfrac{u_X^A\, u_X^B}{1 - \sum_{(x_A \cap x_B) = \emptyset} b_X^A(x_A)\, b_X^B(x_B)} & & x_A, x_B \in \mathcal{R}(\mathbb{X})\\[3ex]
5: \quad\phantom{u_X^C} \;=\; \dfrac{u_X^A\, u_X^B}{1-\mathrm{Con}} \;=\; u_X^{(A \& B)}.
\end{array}
\qquad (11.11)
\]
⊔ ⊓

While Definition 11.4 is based on long-term frequentist situations, the results can be extended to the combination of subjective opinions in the same way that frequentist probability calculus can be extended to subjective probability and non-frequentist situations. According to de Finetti [13] a frequentist probability is no more objective than a subjective (non-frequentist) probability, because even if observations are objective, their translation into probabilities is always subjective. de Finetti [12] provides further justification for this view by explaining that subjective knowledge of a system will often carry more weight when estimating probabilities of future events than purely objective observations of the past. The only case where probability estimates can be based purely on frequentist information is in abstract examples from textbooks. Frequentist information is thus just another form of evidence used to estimate probabilities. Because a subjective opinion is simply a probability distribution over a hyperdomain, de Finetti's view can obviously be extended to subjective opinions. Based on this argumentation there is not only a mathematical equivalence, but also an interpretational equivalence between stochastic constraint fusion and belief constraint fusion.
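The frequentist interpretation also suggests a simple simulation check of Theorem 11.1. The sketch below samples the two constraint stages of Figure 11.3, discards empty intersections as in Eq.(11.6), and compares the resulting relative frequencies with the analytic operator; it assumes the illustrative constraint_fuse() sketch from Section 11.2.1 and is itself only an illustration.

```python
import random

# Monte Carlo check of Theorem 11.1 for a small ternary domain.
DOMAIN = frozenset({'x1', 'x2', 'x3'})

def sample_focal(b, u):
    """Draw a focal element according to p_chi: belief mass on focal sets, u on X."""
    r, acc = random.random(), 0.0
    for x, m in b.items():
        acc += m
        if r < acc:
            return x
    return DOMAIN                        # the remaining mass u selects the whole domain

def stochastic_constraint(bA, uA, bB, uB, n=200_000):
    counts, kept = {}, 0
    for _ in range(n):
        xc = sample_focal(bA, uA) & sample_focal(bB, uB)
        if xc:                           # empty intersections are discarded, Eq.(11.6)
            counts[xc] = counts.get(xc, 0) + 1
            kept += 1
    return {x: c / kept for x, c in counts.items()}

bA = {frozenset({'x1'}): 0.6, frozenset({'x2', 'x3'}): 0.2}    # uA = 0.2
bB = {frozenset({'x1', 'x2'}): 0.5, frozenset({'x3'}): 0.3}    # uB = 0.2
print(stochastic_constraint(bA, 0.2, bB, 0.2))   # frequencies; mass on DOMAIN ~ u
print(constraint_fuse(bA, 0.2, bB, 0.2))         # analytic result for comparison
```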
11.2.3 Expressing Preferences with Subjective Opinions

Preferences can be expressed e.g. as soft or hard constraints, qualitative or quantitative, ordered or partially ordered, etc. It is possible to specify a mapping between qualitative verbal tags and subjective opinions which enables easy solicitation of preferences [75]. Table 11.1 describes examples of how preferences can be expressed.

Table 11.1 Example preferences and corresponding subjective opinions

    Example & Type                                   Opinion Expression
    "Ingredient x is mandatory"                      Binary domain X = {x, x̄}
      Hard positive                                  Binomial opinion ω_x : (1, 0, 0, 1/2)
    "Ingredient x is totally out of the question"    Binary domain X = {x, x̄}
      Hard negative                                  Binomial opinion ω_x : (0, 1, 0, 1/2)
    "My preference rating for x is 3 out of 10"      Binary domain X = {x, x̄}
      Quantitative                                   Binomial opinion ω_x : (0.3, 0.7, 0.0, 1/2)
    "I prefer x or y, but z is also acceptable"      Ternary domain Θ = {x, y, z}
      Qualitative                                    Trinomial opinion ω_Θ : (b(x, y) = 0.6, b(z) = 0.3, u = 0.1, a(x, y, z) = 1/3)
    "I like x, but I like y even more"               Two binary domains X = {x, x̄} and Y = {y, ȳ}
      Positive rank                                  Binomial opinions ω_x : (0.6, 0.3, 0.1, 1/2), ω_y : (0.7, 0.2, 0.1, 1/2)
    "I don't like x, and I dislike y even more"      Two binary domains X = {x, x̄} and Y = {y, ȳ}
      Negative rank                                  Binomial opinions ω_x : (0.3, 0.6, 0.1, 1/2), ω_y : (0.2, 0.7, 0.1, 1/2)
    "I'm indifferent about x, y and z"               Ternary domain Θ = {x, y, z}
      Neutral                                        Trinomial opinion ω_Θ : (u_Θ = 1.0, a(x, y, z) = 1/3)
    "I'm indifferent but most people prefer x"       Ternary domain Θ = {x, y, z}
      Neutral with bias                              Trinomial opinion ω_Θ : (u_Θ = 1.0, a(x) = 0.6, a(y) = 0.2, a(z) = 0.2)

All the preference types of Table 11.1 can be interpreted in terms of subjective opinions, and further combined by considering them as constraints expressed by different agents. The examples that comprise two binary domains could also have been modelled with a quaternary product domain and a corresponding 4-nomial product opinion. In fact, product opinions over product domains could be a method of simultaneously considering preferences over multiple variables, and this will be the topic of future research. Default base rates are specified in all but the last example, which indicates total indifference but with a bias that expresses the average preference in the population. Base rates are useful in many situations, such as for default reasoning. Base rates only have an influence in case of significant indifference or uncertainty.

11.2.4 Example: Going to the Cinema, 1st Attempt

Assume three friends, Alice, Bob and Clark, who want to see a film together at the cinema one evening, and that the only films showing are Black Dust (BD), Grey Matter (GM) and White Powder (WP), represented as the ternary domain Θ = {BD, GM, WP}. Assume that the friends express their preferences in the form of the opinions of Table 11.2.

Table 11.2 Combination of film preferences

                    Preferences of:                Results of preference combinations:
                    Alice     Bob       Clark      (Alice & Bob)    (Alice & Bob & Clark)
                    ω_Θ^A     ω_Θ^B     ω_Θ^C      ω_Θ^{A&B}        ω_Θ^{A&B&C}
    b(BD)        =  0.99      0.00      0.00       0.00             0.00
    b(GM)        =  0.01      0.01      0.00       1.00             1.00
    b(WP)        =  0.00      0.99      0.00       0.00             0.00
    b(GM ∪ WP)   =  0.00      0.00      1.00       0.00             0.00

Alice and Bob have strong and conflicting preferences. Clark only does not want to watch Black Dust, and is indifferent about the two other films; he is not sure whether he wants to come along, so Table 11.2 shows the results of applying the preference combination operator, first without him and then with him included in the party. By applying belief constraint fusion, Alice and Bob conclude that the only film they are both interested in seeing is Grey Matter. Including Clark in the party does not change that result, because he is indifferent to Grey Matter and White Powder anyway; he just does not want to watch the film Black Dust.
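The numbers in Table 11.2 can be reproduced with the illustrative constraint_fuse() sketch from Section 11.2.1 by first fusing Alice's and Bob's opinions and then adding Clark's hyper-opinion on {GM, WP}; again, this is only a sketch and not the book's reference implementation.

```python
# Reproducing Table 11.2 with the illustrative constraint_fuse() sketch above.
BD, GM, WP = frozenset({'BD'}), frozenset({'GM'}), frozenset({'WP'})

alice = ({BD: 0.99, GM: 0.01}, 0.0)
bob   = ({WP: 0.99, GM: 0.01}, 0.0)
clark = ({frozenset({'GM', 'WP'}): 1.00}, 0.0)

ab  = constraint_fuse(*alice, *bob)     # -> all belief mass on GM
abc = constraint_fuse(*ab, *clark)      # -> still all belief mass on GM
print(ab, abc)
```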
The belief mass values of Alice and Bob in the above example are in fact equal to those of Zadeh's example [92], which was used to demonstrate the unsuitability of Dempster's rule for fusing beliefs because it produces counter-intuitive results. Zadeh's example describes a medical case where two medical doctors express their opinions about possible diagnoses, which typically should have been modelled with the averaging fusion operator [41], not with Dempster's rule. In order to select the appropriate operator it is crucial to fully understand the nature of the situation to be modelled. The failure to understand that Dempster's rule does not represent an operator for cumulative or averaging belief fusion, combined with the unavailability of the general cumulative and averaging belief fusion operators for many years (1976 [81] – 2010 [41]), has often led to inappropriate applications of Dempster's rule to cases of belief fusion [48]. However, when specifying the same numerical values as in [92] in a case of preference combination such as the example above, the constraint fusion operator (which is a simple extension of Dempster's rule) is very suitable and produces perfectly intuitive results.

11.2.5 Example: Going to the Cinema, 2nd Attempt

In this example Alice and Bob soften their strong preferences by expressing some indifference in the form of u = 0.01, as specified by Table 11.3. Clark has the same opinion as in the previous example, and is still not sure whether he wants to come along, so Table 11.3 shows the results without and with his preference included.

Table 11.3 Combination of film preferences with some indifference and with non-default base rates

                      Preferences of:                Results of preference combinations:
                      Alice     Bob       Clark      (Alice & Bob)    (Alice & Bob & Clark)
                      ω_Θ^A     ω_Θ^B     ω_Θ^C      ω_Θ^{A&B}        ω_Θ^{A&B&C}
    b(BD)          =  0.98      0.00      0.00       0.490            0.000
    b(GM)          =  0.01      0.01      0.00       0.015            0.029
    b(WP)          =  0.00      0.98      0.00       0.490            0.961
    b(GM ∪ WP)     =  0.00      0.00      1.00       0.000            0.010
    u              =  0.01      0.01      0.00       0.005            0.000
    a(BD)          =  0.6       0.6       0.6        0.6              0.6
    a(GM) = a(WP)  =  0.2       0.2       0.2        0.2              0.2

The effect of adding some indifference is that Alice and Bob should pick either Black Dust or White Powder, because in both cases one of them actually prefers the film and the other finds it acceptable. Neither Alice nor Bob prefers Grey Matter; they only find it acceptable, so it turns out not to be a good choice for either of them. When taking into consideration the base rates a(BD) = 0.6 and a(WP) = 0.2, the expected preference levels according to Eq.(3.28) are such that:

    P^{A&B}(BD) > P^{A&B}(WP).                               (11.12)

More precisely, the expected preference levels according to Eq.(3.28) are:

    P^{A&B}(BD) = 0.493,   P^{A&B}(WP) = 0.491.              (11.13)

Because of the higher base rate, Black Dust also has a higher expected preference than White Powder, so the rational choice would be to watch Black Dust. However, when including Clark, who does not want to watch Black Dust, the base rates no longer dictate the result. In this case Eq.(3.28) produces P^{A&B&C}(WP) = 0.966, so the obvious choice is to watch White Powder.
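The preference levels in Eq.(11.13) follow from the projected probability P(x) = b(x) + a(x)·u of Eq.(3.28), which takes this simple form here because the fused opinion (Alice & Bob) has no belief mass on composite sets. A minimal check, with illustrative names:

```python
# Projected preference of the fused opinion (Alice & Bob) in Table 11.3,
# using P(x) = b(x) + a(x) * u as in Eq.(3.28).
belief = {'BD': 0.490, 'GM': 0.015, 'WP': 0.490}
u, base_rate = 0.005, {'BD': 0.6, 'GM': 0.2, 'WP': 0.2}

projected = {x: belief[x] + base_rate[x] * u for x in belief}
print(projected)   # ~ {'BD': 0.493, 'GM': 0.016, 'WP': 0.491}, cf. Eq.(11.13)
```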
11.2.6 Example: Not Going to the Cinema

Assume now that Alice and Bob express totally conflicting preferences, as specified in Table 11.4, i.e. Alice expresses a hard preference for Black Dust and Bob expresses a hard preference for White Powder. Clark still has the same preference as before, i.e. he does not want to watch Black Dust and is indifferent about the two other films.

Table 11.4 Combination of film preferences with hard and conflicting preferences

                    Preferences of:                Results of preference combinations:
                    Alice     Bob       Clark      (Alice & Bob)    (Alice & Bob & Clark)
                    ω_Θ^A     ω_Θ^B     ω_Θ^C      ω_Θ^{A&B}        ω_Θ^{A&B&C}
    b(BD)        =  1.00      0.00      0.00       Undefined        Undefined
    b(GM)        =  0.00      0.00      0.00       Undefined        Undefined
    b(WP)        =  0.00      1.00      0.00       Undefined        Undefined
    b(GM ∪ WP)   =  0.00      0.00      1.00       Undefined        Undefined

In this case the belief constraint fusion operator can not be applied, because Eq.(11.2) produces a division by zero. The conclusion is that the friends will not go to the cinema to see a film together. The test for detecting this situation is Con = 1 in Eq.(11.4). It makes no difference to include Clark in the party, because a conflict can not be resolved by including additional preferences. However, it would have been possible for Bob and Clark to watch White Powder together without Alice.

11.3 Cumulative Fusion

The cumulative fusion operator for belief opinions is equivalent to simply adding up the evidence parameters of evidence opinions. The cumulative fusion operator for belief opinions is then obtained through the bijective mapping between belief opinions and evidence opinions, as described by Definition 3.9.

Assume a domain X and its hyperdomain R(X), and assume a process from which the variable X takes its values in X. Consider two agents A and B who observe the outcomes of the process over two separate time periods. Their observations can be vague, meaning that sometimes they observe an outcome which can be one of multiple possible singletons in X, but they are unable to identify the observed outcome uniquely. For example, assume that persons A and B are observing coloured balls being picked from an urn, where the balls can have one of four colours: black, white, red or green. Assume further that the observers A and B are colour blind, which means that sometimes they are unable to see the difference between red and green balls, although they can always tell when a ball is black or white. As a result their observations can be vague, meaning that sometimes they perceive a specific ball to be either red or green, but are unable to identify the ball's colour precisely. This corresponds to the situation where X is a hypervariable which takes its values from R(X).

The symbol '⋄' denotes the fusion of two observers A and B into a single imaginary observer denoted (A ⋄ B).

Definition 11.5 (The Cumulative Fusion Operator).
Let ω_X^A and ω_X^B be opinions respectively held by agents A and B over the same (hyper)variable X on domain X. Let ω_X^{(A⋄B)} be the opinion such that:

Case I: For u_X^A ≠ 0 ∨ u_X^B ≠ 0:

\[
\left\{
\begin{array}{l}
b_X^{(A \diamond B)}(x) = \dfrac{b_X^A(x)\,u_X^B + b_X^B(x)\,u_X^A}{u_X^A + u_X^B - u_X^A u_X^B}\\[2ex]
u_X^{(A \diamond B)} = \dfrac{u_X^A\, u_X^B}{u_X^A + u_X^B - u_X^A u_X^B}
\end{array}
\right.
\qquad (11.14)
\]

Case II: For u_X^A = 0 ∧ u_X^B = 0:

\[
\left\{
\begin{array}{l}
b_X^{(A \diamond B)}(x) = \gamma_X^A\, b_X^A(x) + \gamma_X^B\, b_X^B(x)\\[1ex]
u_X^{(A \diamond B)} = 0
\end{array}
\right.
\quad \text{where} \quad
\left\{
\begin{array}{l}
\gamma_X^A = \lim\limits_{u_X^A \to 0,\; u_X^B \to 0} \dfrac{u_X^B}{u_X^A + u_X^B}\\[2ex]
\gamma_X^B = \lim\limits_{u_X^A \to 0,\; u_X^B \to 0} \dfrac{u_X^A}{u_X^A + u_X^B}
\end{array}
\right.
\qquad (11.15)
\]

Then ω_X^{(A⋄B)} is called the cumulatively fused opinion of ω_X^A and ω_X^B, representing the combination of the independent opinions of sources A and B. By using the symbol '⊕' to designate this belief operator, we define ω_X^{(A⋄B)} ≡ ω_X^A ⊕ ω_X^B. ⊔ ⊓

It can be verified that the cumulative fusion operator is commutative, associative and non-idempotent. In Case II of Definition 11.5, the associativity depends on the preservation of the relative weights of intermediate results, which requires the additional weight parameter γ. In this case, the cumulative fusion operator is equivalent to the weighted average of probabilities.
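Case I of Definition 11.5 can be sketched directly at the belief level. The snippet below is an illustrative sketch (not the book's reference implementation) for non-dogmatic arguments; the dogmatic limit of Case II is omitted.

```python
# Illustrative sketch of cumulative fusion (Definition 11.5, Case I only).
def cumulative_fuse(bA, uA, bB, uB):
    """Fuse two independent opinions with uA != 0 or uB != 0, Eq.(11.14)."""
    if uA == 0 and uB == 0:
        raise NotImplementedError("dogmatic arguments need the Case II limit")
    denom = uA + uB - uA * uB
    keys = set(bA) | set(bB)
    b = {x: (bA.get(x, 0.0) * uB + bB.get(x, 0.0) * uA) / denom for x in keys}
    return b, (uA * uB) / denom

# Two independent observers of the same binary process:
print(cumulative_fuse({'x': 0.6}, 0.4, {'x': 0.2}, 0.8))
# ~ ({'x': 0.64}, 0.36) -- accumulated evidence gives less uncertainty than either argument
```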
The cumulative fusion operator is equivalent to updating prior Dirichlet PDFs by adding new evidence to produce posterior Dirichlet PDFs. Deriving the cumulative belief fusion operator is based on the bijective mapping between belief opinions and evidence opinions. The mapping is expressed in Definition 3.9.

Theorem 11.2. The cumulative fusion operator of Definition 11.5 is equivalent to simple addition of the evidence parameters of evidence opinions, as expressed in Eq.(3.34).

Proof.
The cumulative belief fusion operator of Definition 11.5 is derived by mapping the argument belief opinions to evidence opinions through the bijective mapping of Definition 3.9. Cumulative fusion of evidence opinions simply consists of evidence parameter addition. The fused evidence opinion is then mapped back to a belief opinion through the bijective mapping of Definition 3.9. This explanation is in essence the proof of Theorem 11.2. A more detailed explanation is provided below.

Let the two observers' respective hyper opinions be expressed as ω_X^A and ω_X^B. The corresponding evidence opinions Dir_X^{eH}(r_X^A, a_X) and Dir_X^{eH}(r_X^B, a_X) contain the respective evidence parameters r_X^A and r_X^B.

The cumulative fusion of these two bodies of evidence simply consists of vector addition of Dir_X^{eH}(r_X^A, a_X) and Dir_X^{eH}(r_X^B, a_X), expressed as:

\[
\mathrm{Dir}^{eH}_X\big(r_X^{(A \diamond B)}, a_X\big) = \mathrm{Dir}^{eH}_X(r_X^A, a_X) \oplus \mathrm{Dir}^{eH}_X(r_X^B, a_X) = \mathrm{Dir}^{eH}_X\big((r_X^A + r_X^B),\, a_X\big)\,. \qquad (11.16)
\]

More specifically, for each value x ∈ R(X) the accumulated observation evidence r_X^{(A⋄B)} is computed as:

\[
r_X^{(A \diamond B)}(x) = r_X^A(x) + r_X^B(x)\,. \qquad (11.17)
\]

The cumulatively fused belief opinion ω_X^{(A⋄B)} of Definition 11.5 results from mapping the fused evidence opinion of Eq.(11.16) back to a belief opinion by applying the bijective mapping of Definition 3.9. ⊓ ⊔

Notice that the expression for the cumulative fusion operator in Definition 11.5 is independent of the non-informative prior weight W. That means that the choice of non-informative prior weight in fact only influences the mapping between evidence opinions and belief opinions, not the cumulative fusion operator itself. The cumulative fusion operator represents a generalisation of the consensus operator [38, 37]. The binomial cumulative fusion operator emerges directly from Definition 11.5 by assuming a binary domain and binomial argument opinions.

11.4 Averaging Fusion

The averaging fusion operator for belief opinions is equivalent to averaging the evidence parameters of the evidence opinions. The averaging fusion operator for belief opinions is then obtained through the bijective mapping between belief opinions and evidence opinions, as expressed by Definition 3.9.

Assume a domain X and its hyperdomain R(X). Assume a process X where the produced outcomes are values from X. Agents A and B observe the same outcomes of the same process over the same time period. Even though A and B witness the same process, their perceptions might be different, e.g. because their cognitive capabilities are different. For example, consider a situation where persons A and B are observing coloured balls being picked from an urn, where the balls can have one of four colours: black, white, red or green.
Assume further that observer B is colour blind, which means that sometimes he has trouble distinguishing between red and green balls, although he can always tell when a ball is black or white. Observer A has perfect colour vision, and can normally tell the correct colour when a ball is picked. As a result, when a red ball has been picked, observer A normally identifies it as red, but observer B might identify it as green. This corresponds to a case where two observers have conflicting opinions about the same variable, although their observations and opinions are totally dependent. Consider that a priori it is unknown that one of the observers is colour blind, so that their opinions are considered equally reliable. The averaging fusion operator provides an adequate model for this fusion situation. When it is assumed that some observers are unreliable, their opinions might be discounted as a function of the analyst's trust in the observers. This principle is described in Chapter 13.

The symbol '⋄' denotes the principle of averaging fusion, where two observers A and B are merged into a single imaginary observer denoted (A⋄B).

Definition 11.6 (The Averaging Fusion Operator).
Let ω_X^A and ω_X^B be opinions respectively held by agents A and B over the same (hyper)variable X on domain X. Let ω_X^{(A⋄B)} be the opinion such that:

Case I: For u_X^A ≠ 0 ∨ u_X^B ≠ 0:

\[
\left\{
\begin{array}{l}
b_X^{(A \diamond B)}(x) = \dfrac{b_X^A(x)\,u_X^B + b_X^B(x)\,u_X^A}{u_X^A + u_X^B}\\[2ex]
u_X^{(A \diamond B)} = \dfrac{2\,u_X^A\, u_X^B}{u_X^A + u_X^B}
\end{array}
\right.
\qquad (11.18)
\]

Case II: For u_X^A = 0 ∧ u_X^B = 0:

\[
\left\{
\begin{array}{l}
b_X^{(A \diamond B)}(x) = \gamma_X^A\, b_X^A(x) + \gamma_X^B\, b_X^B(x)\\[1ex]
u_X^{(A \diamond B)} = 0
\end{array}
\right.
\quad \text{where} \quad
\left\{
\begin{array}{l}
\gamma_X^A = \lim\limits_{u_X^A \to 0,\; u_X^B \to 0} \dfrac{u_X^B}{u_X^A + u_X^B}\\[2ex]
\gamma_X^B = \lim\limits_{u_X^A \to 0,\; u_X^B \to 0} \dfrac{u_X^A}{u_X^A + u_X^B}
\end{array}
\right.
\qquad (11.19)
\]

Then ω_X^{(A⋄B)} is called the averaged opinion of ω_X^A and ω_X^B, representing the combination of the dependent opinions of A and B. By using the symbol '$\underline{\oplus}$' to designate this belief operator, we define ω_X^{(A⋄B)} ≡ ω_X^A $\underline{\oplus}$ ω_X^B. ⊔ ⊓

It can be verified that the averaging fusion rule is commutative and idempotent, but not associative. The averaging fusion operator is equivalent to updating prior Dirichlet PDFs by computing the average of prior evidence and new evidence to produce posterior Dirichlet PDFs. Deriving the averaging belief fusion operator is based on the bijective mapping between the belief and evidence notations described in Definition 3.9.
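Analogously to the cumulative sketch above, Case I of Definition 11.6 can be sketched directly; only the denominator and the uncertainty expression differ. Again this is an illustrative sketch, not the book's reference implementation.

```python
# Illustrative sketch of averaging fusion (Definition 11.6, Case I only).
def averaging_fuse(bA, uA, bB, uB):
    """Fuse two dependent opinions with uA != 0 or uB != 0, Eq.(11.18)."""
    if uA == 0 and uB == 0:
        raise NotImplementedError("dogmatic arguments need the Case II limit")
    denom = uA + uB
    keys = set(bA) | set(bB)
    b = {x: (bA.get(x, 0.0) * uB + bB.get(x, 0.0) * uA) / denom for x in keys}
    return b, (2 * uA * uB) / denom

# Idempotence: fusing an opinion with itself returns the same opinion.
print(averaging_fuse({'x': 0.6}, 0.4, {'x': 0.6}, 0.4))   # ~ ({'x': 0.6}, 0.4)
```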
Theorem 11.3. The averaging fusion operator of Definition 11.6 is equivalent to simple averaging of the evidence parameters of evidence opinions, as expressed in Eq.(3.34).

Proof.
The averaging fusion operator for belief opinions of Definition 11.6 is derived by mapping the argument belief opinions to evidence opinions through the bijective mapping of Definition 3.9. Averaging fusion of evidence opinions simply consists of computing the average of the evidence parameters. The fused evidence opinion is then mapped back to a belief opinion through the bijective mapping of Definition 3.9. This explanation is in essence the proof of Theorem 11.3. A more detailed explanation is provided below.

Let the two observers' respective belief opinions be expressed as ω_X^A and ω_X^B. The corresponding evidence opinions Dir_X^{eH}(r_X^A, a_X) and Dir_X^{eH}(r_X^B, a_X) contain the respective evidence parameters r_X^A and r_X^B. The averaging fusion of these two bodies of evidence simply consists of vector averaging of Dir_X^{eH}(r_X^A, a_X) and Dir_X^{eH}(r_X^B, a_X), expressed as:

\[
\mathrm{Dir}^{eH}_X\big(r_X^{(A \diamond B)}, a_X\big) = \mathrm{Dir}^{eH}_X(r_X^A, a_X) \,\underline{\oplus}\, \mathrm{Dir}^{eH}_X(r_X^B, a_X) = \mathrm{Dir}^{eH}_X\big((r_X^A + r_X^B)/2,\, a_X\big)\,. \qquad (11.20)
\]

More specifically, for each value x ∈ R(X) the averaged observation evidence r_X^{(A⋄B)} is computed as:

\[
r_X^{(A \diamond B)}(x) = \frac{r_X^A(x) + r_X^B(x)}{2}\,. \qquad (11.21)
\]

The averaging-fused belief opinion ω_X^{(A⋄B)} of Definition 11.6 results from mapping the fused evidence opinion of Eq.(11.20) back to a belief opinion by applying the bijective mapping of Definition 3.9. ⊓ ⊔

The averaging fusion rule represents a generalisation of the consensus rule for dependent opinions defined in [45].

11.5 Hybrid Cumulative-Averaging Fusion

A hybrid fusion operator can be designed for situations where the arguments are partially dependent. In such situations, neither cumulative fusion nor averaging fusion would be fully adequate. Instead a hybrid fusion operator can be used to model the partial dependence between the arguments and compute a fused opinion.

Let two agents A and B receive evidence e.g. by observing the same process during two partially overlapping periods. If it is known exactly which events were observed by both, one of the agents could simply dismiss these observations, so that their opinions would be independent. However, it may not always be possible to determine which observations are the same. Instead, it may be possible to determine the degree of dependence between their evidence. The idea is that cumulative fusion can be applied to the independent part, and averaging fusion to the dependent part.

Let the opinions of A and B be represented as evidence opinions Dir_X^{eH}(r_X^A, a_X) and Dir_X^{eH}(r_X^B, a_X) with the respective evidence parameters r_X^A and r_X^B. Assume that r_X^A can be split into two parts, where r_X^{Ai(B)} is totally independent of B's evidence, and where r_X^{Ad(B)} is totally dependent on B's evidence. Similarly, r_X^B can also be split into a dependent and an independent part, as expressed by Eq.(11.22).

\[
\text{Partially dependent evidence:} \quad
\left\{
\begin{array}{l}
r_X^A = r_X^{A\mathrm{i}(B)} + r_X^{A\mathrm{d}(B)}\\[1ex]
r_X^B = r_X^{B\mathrm{i}(A)} + r_X^{B\mathrm{d}(A)}
\end{array}
\right.
\qquad (11.22)
\]

Figure 11.4 illustrates the situation of partially dependent evidence. Assuming that the fraction of overlapping observations is known, the dependent and the independent parts of their observation evidence can be estimated, so that an operator for combined cumulative and averaging fusion can be defined [45, 49].

Fig. 11.4 Partially dependent evidence (A's evidence consists of an independent part r_X^{Ai(B)} and a part r_X^{Ad(B)} that overlaps with B's evidence; B's evidence similarly consists of r_X^{Bi(A)} and r_X^{Bd(A)})

Let A and B have evidence parameters r_X^A and r_X^B that are partially dependent, where δ_X^{(A/B)} represents the relative dependence of A's evidence on B's evidence, and δ_X^{(B/A)} denotes the relative dependence of B's evidence on A's evidence. As shown in Figure 11.4, the two degrees of dependence are not necessarily equal, e.g. when one of the observers has collected a larger body of evidence than the other observer. The dependent and independent evidence can be defined as functions of the two dependence factors.
$$r_X^A: \begin{cases} r_X^{Ad(B)}(x) = r_X^A(x)\,\delta_X^{(A/B)} \\[2pt] r_X^{Ai(B)}(x) = r_X^A(x)\,(1-\delta_X^{(A/B)}) \end{cases} \qquad r_X^B: \begin{cases} r_X^{Bd(A)}(x) = r_X^B(x)\,\delta_X^{(B/A)} \\[2pt] r_X^{Bi(A)}(x) = r_X^B(x)\,(1-\delta_X^{(B/A)}) \end{cases} \qquad (11.23)$$

The fusion of partially dependent evidence opinions can then be defined as a function of their respective dependent and independent parts.

Definition 11.7 (Fusion of Partially Dependent Evidence Opinions). Let $r_X^A$ and $r_X^B$ be the evidence parameters in the evidence opinions respectively held by the agents A and B regarding the variable X. The symbol '$\widetilde{\oplus}$' denotes fusion between partially dependent opinions. As before, '$\oplus$' and '$\underline{\oplus}$' are the respective operators for cumulative and averaging fusion. Partially dependent fusion between A and B can then be written as:

$$r_X^{A\widetilde{\diamond}B} = r_X^A\ \widetilde{\oplus}\ r_X^B = \big(r_X^{Ad(B)}\ \underline{\oplus}\ r_X^{Bd(A)}\big) \oplus r_X^{Ai(B)} \oplus r_X^{Bi(A)} \qquad (11.24)$$

⊓⊔

The equivalent expression for fusion of partially dependent belief opinions can be obtained from Eq.(11.24) by applying the bijective mapping of Definition 3.9. The reciprocal dependence factors are as before denoted by $\delta^{(A/B)}$ and $\delta^{(B/A)}$.

Definition 11.8 (Fusion of Partially Dependent Belief Opinions). Let A and B have the partially dependent opinions $\omega_X^A$ and $\omega_X^B$ respectively, about the same variable X, and let their dependent and independent parts be expressed according to Eq.(11.25) below.

$$\omega_X^{Ai(B)}: \begin{cases} b_X^{Ai(B)}(x) = \dfrac{b_X^A(x)\,(1-\delta_X^{(A/B)})}{(1-\delta_X^{(A/B)})\left(\sum b_X^A\right) + u_X^A}, & \forall x \in \mathcal{R}(\mathbb{X})\\[8pt] u_X^{Ai(B)} = \dfrac{u_X^A}{(1-\delta_X^{(A/B)})\left(\sum b_X^A\right) + u_X^A} \end{cases} \qquad \omega_X^{Ad(B)}: \begin{cases} b_X^{Ad(B)}(x) = \dfrac{b_X^A(x)\,\delta_X^{(A/B)}}{\delta_X^{(A/B)}\left(\sum b_X^A\right) + u_X^A}, & \forall x \in \mathcal{R}(\mathbb{X})\\[8pt] u_X^{Ad(B)} = \dfrac{u_X^A}{\delta_X^{(A/B)}\left(\sum b_X^A\right) + u_X^A} \end{cases} \qquad (11.25)$$

$$\omega_X^{Bi(A)}: \begin{cases} b_X^{Bi(A)}(x) = \dfrac{b_X^B(x)\,(1-\delta_X^{(B/A)})}{(1-\delta_X^{(B/A)})\left(\sum b_X^B\right) + u_X^B}, & \forall x \in \mathcal{R}(\mathbb{X})\\[8pt] u_X^{Bi(A)} = \dfrac{u_X^B}{(1-\delta_X^{(B/A)})\left(\sum b_X^B\right) + u_X^B} \end{cases} \qquad \omega_X^{Bd(A)}: \begin{cases} b_X^{Bd(A)}(x) = \dfrac{b_X^B(x)\,\delta_X^{(B/A)}}{\delta_X^{(B/A)}\left(\sum b_X^B\right) + u_X^B}, & \forall x \in \mathcal{R}(\mathbb{X})\\[8pt] u_X^{Bd(A)} = \dfrac{u_X^B}{\delta_X^{(B/A)}\left(\sum b_X^B\right) + u_X^B} \end{cases}$$

Having specified the separate dependent and independent parts of two partially dependent opinions, the fusion operator for partially dependent opinions can be expressed. The symbol '$\widetilde{\oplus}$' denotes fusion between partially dependent opinions. As usual, '$\oplus$' and '$\underline{\oplus}$' denote the operators for independent and dependent opinions respectively.

$$\omega_X^{A\widetilde{\diamond}B} = \omega_X^A\ \widetilde{\oplus}\ \omega_X^B = \big(\omega_X^{Ad(B)}\ \underline{\oplus}\ \omega_X^{Bd(A)}\big) \oplus \omega_X^{Ai(B)} \oplus \omega_X^{Bi(A)} \qquad (11.26)$$

⊓⊔

Theorem 11.4. The fusion operator for partially dependent belief opinions described in Definition 11.8 is equivalent to the fusion operator for partially dependent evidence opinions of Definition 11.7.

Proof. To show the equivalence it is enough to map the partially dependent belief opinion arguments to evidence opinions according to the bijective mapping of Definition 3.9, then perform the fusion according to Definition 11.7, and finally map the result back to a belief opinion according to the same bijective mapping of Definition 3.9. The expressions of Definition 11.8 then emerge directly. ⊓⊔

It is also easy to prove that for any opinion $\omega_X^A$ with a dependence factor $\delta_X^{(A/B)}$ relative to another opinion $\omega_X^B$, the following equality holds:

$$\omega_X^A = \omega_X^{Ai(B)} \oplus \omega_X^{Ad(B)} \qquad (11.27)$$
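Assuming that the two dependence factors are known, the hybrid operator is easiest to state on the evidence parameters, where cumulative fusion is addition and averaging fusion is the mean. The sketch below follows Eq.(11.23) and Eq.(11.24) directly; the function name, the dictionary representation and the example numbers are illustrative only, and the result would still have to be mapped back to a belief opinion via Definition 3.9.

```python
def hybrid_fusion_evidence(rA, rB, dAB, dBA):
    """Hybrid cumulative-averaging fusion of partially dependent evidence,
    computed directly on the evidence parameters (Eq.(11.23)-(11.24))."""
    fused = {}
    for x in rA:
        rA_dep, rA_ind = rA[x] * dAB, rA[x] * (1 - dAB)   # split A's evidence, Eq.(11.23)
        rB_dep, rB_ind = rB[x] * dBA, rB[x] * (1 - dBA)   # split B's evidence, Eq.(11.23)
        dep = (rA_dep + rB_dep) / 2        # averaging fusion of the dependent parts
        fused[x] = dep + rA_ind + rB_ind   # cumulative fusion with the independent parts
    return fused

rA = {"x1": 8.0, "x2": 2.0}
rB = {"x1": 3.0, "x2": 1.0}
# A's evidence is 25% dependent on B's, B's is 50% dependent on A's
# (the two factors differ because the bodies of evidence have different sizes).
print(hybrid_fusion_evidence(rA, rB, dAB=0.25, dBA=0.5))
```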
11.6 Consensus & Compromise Fusion

CC-fusion (Consensus & Compromise fusion) is a fusion model specifically designed to satisfy the requirements of being idempotent, of having a neutral element, and of letting conflicting beliefs produce compromise beliefs. This shows that it is possible to design fusion models to fit particular requirements.

Assume two opinions $\omega_X^A$ and $\omega_X^B$ over the variable X, which takes its values from the hyperdomain $\mathcal{R}(\mathbb{X})$. The superscripts A and B are attributes that identify the respective belief sources or belief owners. These two opinions can be mathematically merged using the CC-fusion operator, denoted '$\oplus^{\mathrm{CC}}$', which can be expressed as:

$$\text{Consensus \& Compromise Fusion:} \quad \omega_X^{A\heartsuit B} = \omega_X^A \oplus^{\mathrm{CC}} \omega_X^B \qquad (11.28)$$

Belief source combination denoted with '♥' thus corresponds to opinion fusion with '$\oplus^{\mathrm{CC}}$'. The CC-operator is formally described next. It is a two-step operator where the consensus step comes first, followed by the compromise step.

11.6.1 Consensus Step

The consensus step simply consists of determining the shared belief mass of the two arguments, which is stored as the belief vector $b_X^{\mathrm{cons}}$ expressed by Eq.(11.29):

$$b_X^{\mathrm{cons}}(x) = \min\big[\,b_X^A(x),\ b_X^B(x)\,\big]. \qquad (11.29)$$

The sum of consensus belief, denoted $b_X^{\mathrm{cons}}$, is expressed as:

$$b_X^{\mathrm{cons}} = \sum_{x \in \mathcal{R}(\mathbb{X})} b_X^{\mathrm{cons}}(x). \qquad (11.30)$$

The residue belief masses of the arguments are:

$$b_X^{\mathrm{resA}}(x) = b_X^A(x) - b_X^{\mathrm{cons}}(x), \qquad b_X^{\mathrm{resB}}(x) = b_X^B(x) - b_X^{\mathrm{cons}}(x). \qquad (11.31)$$

11.6.2 Compromise Step

The compromise step redistributes conflicting residue belief mass to produce compromise belief mass, stored in $b_X^{\mathrm{comp}}$ expressed by Eq.(11.32):

$$\begin{aligned} b_X^{\mathrm{comp}}(x_i) =\ & b_X^{\mathrm{resA}}(x_i)\,u_X^B + b_X^{\mathrm{resB}}(x_i)\,u_X^A \\ &+ \sum_{\{y \cap z\} = x_i} a_X(y/z)\,a_X(z/y)\,b_X^{\mathrm{resA}}(y)\,b_X^{\mathrm{resB}}(z) \\ &+ \sum_{\substack{\{y \cup z\} = x_i \\ \{y \cap z\} \neq \emptyset}} \big(1 - a_X(y/z)\,a_X(z/y)\big)\,b_X^{\mathrm{resA}}(y)\,b_X^{\mathrm{resB}}(z) \\ &+ \sum_{\substack{\{y \cup z\} = x_i \\ \{y \cap z\} = \emptyset}} b_X^{\mathrm{resA}}(y)\,b_X^{\mathrm{resB}}(z), \qquad \text{where } x_i \in \mathcal{P}(\mathbb{X}). \end{aligned} \qquad (11.32)$$

The preliminary uncertainty $u_X^{\mathrm{pre}}$ is computed as:

$$u_X^{\mathrm{pre}} = u_X^A\,u_X^B. \qquad (11.33)$$

The sum of compromise belief, denoted $b_X^{\mathrm{comp}}$, is:

$$b_X^{\mathrm{comp}} = \sum_{x \in \mathcal{P}(\mathbb{X})} b_X^{\mathrm{comp}}(x). \qquad (11.34)$$

In general $b_X^{\mathrm{cons}} + b_X^{\mathrm{comp}} + u_X^{\mathrm{pre}} < 1$, so normalisation of $b_X^{\mathrm{comp}}$ is required. The normalisation factor, denoted $f_{\mathrm{norm}}$, is:

$$f_{\mathrm{norm}} = \frac{1 - (b_X^{\mathrm{cons}} + u_X^{\mathrm{pre}})}{b_X^{\mathrm{comp}}}. \qquad (11.35)$$

Because belief mass on the whole domain $\mathbb{X}$ is equivalent to uncertainty, the fused uncertainty is:

$$u_X^{A\heartsuit B} = u_X^{\mathrm{pre}} + f_{\mathrm{norm}}\, b_X^{\mathrm{comp}}(\mathbb{X}). \qquad (11.36)$$

After computing the fused uncertainty, the compromise belief mass on $\mathbb{X}$ must be set to zero, i.e.

$$b_X^{\mathrm{comp}}(\mathbb{X}) = 0. \qquad (11.37)$$

11.6.3 Merging Consensus and Compromise Belief

After normalisation, the resulting CC-fused belief is:

$$b_X^{A\heartsuit B}(x) = b_X^{\mathrm{cons}}(x) + f_{\mathrm{norm}}\, b_X^{\mathrm{comp}}(x), \qquad \forall x \in \mathcal{R}(\mathbb{X}). \qquad (11.38)$$

The CC-operator is commutative, idempotent and semi-associative, with the vacuous opinion as neutral element. Semi-associativity means that three or more arguments must first be combined together in the consensus step, and then together again in the compromise step, before normalisation.
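The following sketch shows the two steps of CC-fusion for the restricted case where both argument opinions assign belief mass to singleton values only; under that assumption the intersection terms of Eq.(11.32) vanish and all conflicting residue mass compromises on unions. The representation (dictionaries keyed by frozensets) and the function name are choices made for this illustration, not part of the formalism.

```python
from itertools import product

def cc_fusion(bA, uA, bB, uB):
    """Consensus & Compromise fusion (Section 11.6), sketched for arguments whose
    belief mass sits on singleton values only."""
    values = set(bA)
    cons = {x: min(bA[x], bB[x]) for x in values}                     # consensus step, Eq.(11.29)
    resA = {x: bA[x] - cons[x] for x in values}                       # residues, Eq.(11.31)
    resB = {x: bB[x] - cons[x] for x in values}
    comp = {frozenset([x]): resA[x] * uB + resB[x] * uA for x in values}
    for y, z in product(values, repeat=2):                            # compromise step, Eq.(11.32)
        if y != z and resA[y] * resB[z] > 0:
            key = frozenset([y, z])                                   # conflicting singletons compromise on their union
            comp[key] = comp.get(key, 0.0) + resA[y] * resB[z]
    u_pre = uA * uB                                                   # Eq.(11.33)
    b_comp = sum(comp.values())
    f_norm = (1 - sum(cons.values()) - u_pre) / b_comp if b_comp else 0.0   # Eq.(11.35)
    u_fused = u_pre + f_norm * comp.pop(frozenset(values), 0.0)       # belief on the whole domain becomes uncertainty
    fused = {k: f_norm * mass for k, mass in comp.items()}            # Eq.(11.38)
    for x in values:
        fused[frozenset([x])] += cons[x]
    return fused, u_fused

# Two strongly conflicting opinions on a binary domain: most of the conflict
# turns into compromise mass on {x, y}, which is uncertainty on a binary domain.
print(cc_fusion({"x": 0.9, "y": 0.0}, 0.1, {"x": 0.0, "y": 0.9}, 0.1))
```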
Chapter 12
Unfusion and Fission of Subjective Opinions

Given belief fusion as a principle for merging evidence about a domain of interest, it is natural to think of its opposite. However, it is not immediately clear what the opposite of belief fusion might be. From a purely linguistic and semantic point of view, fission naturally appears to be the opposite of fusion. As a consequence we define belief fission for subjective opinions below. In addition, we also define unfusion for subjective opinions. The two concepts are related but still clearly different. Their interpretations are explained in the respective sections below.

12.1 Unfusion of Opinions

The principle of unfusion [40] is the opposite of fusion, namely to eliminate the contribution of a specific belief from an already fused belief, with the purpose of deriving the remaining belief. This chapter describes cumulative unfusion as well as averaging unfusion of opinions. These operators can for example be applied to remove the contribution of a given real or hypothetical evidence source, in order to determine what the result of the analysis would have been in the absence of that evidence source. Figure 12.1 illustrates the principle of unfusion.

Fig. 12.1 Unfusion operator principle: from the fused opinion $\omega_X^{A\diamond B}$ and the contributing opinion $\omega_X^B$, the belief contributed by B is removed in order to recover $\omega_X^A$.

There are situations where it is useful to separate a fused belief into its contributing belief components, and this process is called belief unfusion. It requires the already fused belief and one of its contributing belief components as input, and produces the remaining contributing belief component as output. Unfusion is basically the opposite of fusion, and the formal expressions for unfusion can be derived by rearranging the expressions for fusion. This is described in the following sections.

Fission of beliefs is related to unfusion of beliefs but is different, and is treated separately. Fission simply means that a belief is split into several parts without specifying any of its contributing factors. A belief can for example be split into two equal contributing beliefs.

The principle of belief unfusion is the opposite of belief fusion. This section describes the unfusion operators corresponding to the cumulative and averaging fusion operators described in the previous chapter.

12.1.1 Cumulative Unfusion

Assume a domain $\mathbb{X}$ of cardinality k with hyperdomain $\mathcal{R}(\mathbb{X})$ and associated variable X. Assume two observers A and B who have observed the outcomes of a process over two separate time periods. Assume that the observers' beliefs have been cumulatively fused into $\omega_X^{A\diamond B} = \omega_X^C = (b_X^C, u_X^C, a_X)$, and assume that entity B's contributing opinion $\omega_X^B = (b_X^B, u_X^B, a_X)$ is known.

The cumulative unfusion of these two bodies of evidence is denoted $\omega_X^A = \omega_X^{C\ominus B} = \omega_X^C \ominus \omega_X^B$, which represents entity A's contributing opinion. The mathematical expressions for cumulative unfusion are given below.

Definition 12.1 (The Cumulative Unfusion Operator). Let $\omega_X^C = \omega_X^{A\diamond B}$ be the cumulatively fused opinion of $\omega_X^B$ and the unknown opinion $\omega_X^A$ over the variable X. Let $\omega_X^A = \omega_X^{C\ominus B}$ be the opinion such that:

Case I: For $u_X^C \neq 0 \vee u_X^B \neq 0$:

$$b_X^A(x) = b_X^{C\ominus B}(x) = \frac{b_X^C(x)\,u_X^B - b_X^B(x)\,u_X^C}{u_X^B - u_X^C + u_X^B u_X^C}, \qquad u_X^A = u_X^{C\ominus B} = \frac{u_X^B\,u_X^C}{u_X^B - u_X^C + u_X^B u_X^C} \qquad (12.1)$$

Case II: For $u_X^C = 0 \wedge u_X^B = 0$:

$$b_X^A(x) = b_X^{C\ominus B}(x) = \gamma^B\, b_X^C(x) - \gamma^C\, b_X^B(x), \qquad u_X^A = u_X^{C\ominus B} = 0 \qquad (12.2)$$

$$\text{where} \quad \gamma^B = \lim_{\substack{u_X^C \to 0\\ u_X^B \to 0}} \frac{u_X^B}{u_X^B - u_X^C + u_X^B u_X^C}, \qquad \gamma^C = \lim_{\substack{u_X^C \to 0\\ u_X^B \to 0}} \frac{u_X^C}{u_X^B - u_X^C + u_X^B u_X^C}$$

Then $\omega_X^{C\ominus B}$ is called the cumulatively unfused opinion of $\omega_X^C$ and $\omega_X^B$, representing the result of eliminating the opinion of B from that of C. By using the symbol '⊖' to designate this belief operator, we define:

$$\text{Cumulative unfusion:} \quad \omega_X^{C\ominus B} \equiv \omega_X^C \ominus \omega_X^B. \qquad (12.3)$$

⊓⊔

Cumulative unfusion is the inverse of cumulative fusion. Its derivation and proof are based on rearranging the mathematical expressions of Definition 11.5. It can be verified that cumulative unfusion is non-commutative, non-associative and non-idempotent. In Case II of Definition 12.1, the unfusion operator is equivalent to weighted subtraction of probabilities.

12.1.2 Averaging Unfusion

Assume a domain $\mathbb{X}$ of cardinality k with corresponding hyperdomain $\mathcal{R}(\mathbb{X})$ and associated variable X.
Assume two observers A and B who have observed the same outcomes of a process over the same time period. Assume that the observers' beliefs have been fused with averaging fusion into $\omega_X^C = \omega_X^{A\diamond B} = (b_X^C, u_X^C, a_X)$, and assume that entity B's contributing opinion $\omega_X^B = (b_X^B, u_X^B, a_X)$ is known.

The averaging unfusion of these two bodies of evidence is denoted $\omega_X^A = \omega_X^{C\,\underline{\ominus}\,B} = \omega_X^C\ \underline{\ominus}\ \omega_X^B$, which represents entity A's contributing opinion. The mathematical expressions for averaging unfusion are given below.

Definition 12.2 (Averaging Unfusion Operator). Let $\omega_X^C = \omega_X^{A\diamond B}$ be the fused average opinion of $\omega_X^B$ and the unknown opinion $\omega_X^A$ over the variable X. Let $\omega_X^A = \omega_X^{C\,\underline{\ominus}\,B}$ be the opinion such that:

Case I: For $u_X^C \neq 0 \vee u_X^B \neq 0$:

$$b_X^A(x) = b_X^{C\,\underline{\ominus}\,B}(x) = \frac{2\,b_X^C(x)\,u_X^B - b_X^B(x)\,u_X^C}{2u_X^B - u_X^C}, \qquad u_X^A = u_X^{C\,\underline{\ominus}\,B} = \frac{u_X^B\,u_X^C}{2u_X^B - u_X^C} \qquad (12.4)$$

Case II: For $u_X^C = 0 \wedge u_X^B = 0$:

$$b_X^A(x) = b_X^{C\,\underline{\ominus}\,B}(x) = \gamma^B\, b_X^C(x) - \gamma^C\, b_X^B(x), \qquad u_X^A = u_X^{C\,\underline{\ominus}\,B} = 0 \qquad (12.5)$$

$$\text{where} \quad \gamma^B = \lim_{\substack{u_X^C \to 0\\ u_X^B \to 0}} \frac{2u_X^B}{2u_X^B - u_X^C}, \qquad \gamma^C = \lim_{\substack{u_X^C \to 0\\ u_X^B \to 0}} \frac{u_X^C}{2u_X^B - u_X^C}$$

Then $\omega_X^{C\,\underline{\ominus}\,B}$ is called the average unfused opinion of $\omega_X^C$ and $\omega_X^B$, representing the result of eliminating the opinion of B from that of C. By using the symbol '$\underline{\ominus}$' to designate this belief operator, we define:

$$\text{Averaging unfusion:} \quad \omega_X^{C\,\underline{\ominus}\,B} \equiv \omega_X^C\ \underline{\ominus}\ \omega_X^B. \qquad (12.6)$$

⊓⊔

Averaging unfusion is the inverse of averaging fusion. Its derivation and proof are based on rearranging the mathematical expressions of Definition 11.6. It can be verified that the averaging unfusion operator is idempotent, non-commutative and non-associative.

12.1.3 Example: Cumulative Unfusion of Binomial Opinions

Assume that A has an unknown binomial opinion about x. Let B's opinion and the cumulatively fused opinion of A's and B's opinions be specified as:

$$\omega_x^{A\diamond B} = (0.90,\ 0.05,\ 0.05,\ \tfrac{1}{2}) \qquad \text{and} \qquad \omega_x^B = (0.70,\ 0.10,\ 0.20,\ \tfrac{1}{2}).$$

The cumulative unfusion operator can be used to derive A's opinion. By applying the argument opinions to Eq.(12.1), the contributing opinion from A is derived as:

$$\omega_x^A \approx (0.91,\ 0.03,\ 0.06,\ \tfrac{1}{2}).$$
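As a quick check of the example above, here is Case I of Definition 12.1 written out for binomial opinions; the function name and tuple ordering are illustrative choices only.

```python
def cumulative_unfusion_binomial(bC, dC, uC, bB, dB, uB):
    """Cumulative unfusion (Case I of Definition 12.1) for binomial opinions:
    recover A's contribution from the fused opinion C and B's contributing opinion."""
    denom = uB - uC + uB * uC          # common denominator of Eq.(12.1)
    bA = (bC * uB - bB * uC) / denom
    dA = (dC * uB - dB * uC) / denom
    uA = uB * uC / denom
    return bA, dA, uA

# The example of Section 12.1.3: C = (0.90, 0.05, 0.05), B = (0.70, 0.10, 0.20).
print(cumulative_unfusion_binomial(0.90, 0.05, 0.05, 0.70, 0.10, 0.20))
# -> (0.90625, 0.03125, 0.0625), i.e. roughly (0.91, 0.03, 0.06)
```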
12.2 Fission of Opinions

Assuming that an opinion can be considered as the actual or virtual result of fusion, there are situations where it is useful to split it into two separate opinions, and this process is called opinion fission. The fission operator, which requires an opinion and a fission parameter as input arguments, produces two separate opinions as output. Fission is basically the opposite operation of fusion. The mathematical formulation of fission is described in the following sections.

12.2.1 Cumulative Fission

The principle of opinion fission is the opposite of opinion fusion. This section describes the fission operator corresponding to the cumulative fusion operator described in the previous chapter. There are in general infinitely many ways to split an opinion. The principle followed here is to use an auxiliary fission parameter φ to determine how the argument opinion shall be split. As such, opinion fission is a binary operator, i.e. it takes two input arguments, namely the fission parameter and the opinion to be split.

Assume a domain $\mathbb{X}$ and its hyperdomain $\mathcal{R}(\mathbb{X})$, with associated variable X. Assume that the opinion $\omega_X^C = (b_X, u_X, a_X)$ over X is held by a real or imaginary entity C. The fission of $\omega_X^C$ consists of splitting $\omega_X^C$ into two opinions $\omega_X^{C_1}$ and $\omega_X^{C_2}$ assigned to the (real or imaginary) agents $C_1$ and $C_2$, so that $\omega_X^C = \omega_X^{C_1} \oplus \omega_X^{C_2}$. The parameter φ determines the relative proportion of belief mass that each new opinion gets. Fission of $\omega_X^C$ produces two opinions denoted $\omega_X^{C_1} = \phi \odot \omega_X^C$ and $\omega_X^{C_2} = \overline{\phi} \odot \omega_X^C$.

The mathematical expressions for cumulative fission are constructed as follows. First the argument opinion $\omega_X^C = (b_X^C, u_X^C, a_X)$ is mapped to the Dirichlet HPDF $\mathrm{Dir}^{eH}_X(r_X^C, a_X)$ according to the mapping of Definition 3.9. Then the parameters of this Dirichlet HPDF are linearly split into two parts $\mathrm{Dir}^{eH}_X(r_X^{C_1}, a_X)$ and $\mathrm{Dir}^{eH}_X(r_X^{C_2}, a_X)$ as a function of the fission parameter φ. These steps produce:

$$\mathrm{Dir}^{eH}_X(r_X^{C_1}, a_X): \begin{cases} r_X^{C_1} = \dfrac{\phi\, W\, b_X}{u_X} \\[4pt] a_X^{C_1} = a_X \end{cases} \qquad (12.7) \qquad\qquad \mathrm{Dir}^{eH}_X(r_X^{C_2}, a_X): \begin{cases} r_X^{C_2} = \dfrac{(1-\phi)\, W\, b_X}{u_X} \\[4pt] a_X^{C_2} = a_X \end{cases} \qquad (12.8)$$

where W denotes the non-informative prior weight. The reverse mapping of these evidence parameters into two separate opinions according to Definition 3.9 produces the expressions of Definition 12.3 below. As would be expected, the base rate is not affected by fission.

Definition 12.3 (Cumulative Fission Operator). Let $\omega_X^C$ be an opinion over the variable X. The cumulative fission of $\omega_X^C$ based on the fission parameter φ, where 0 < φ < 1, produces the two opinions $\omega_X^{C_1}$ and $\omega_X^{C_2}$ defined by:

$$\omega_X^{C_1}: \begin{cases} b_X^{C_1} = \dfrac{\phi\, b_X}{u_X + \phi \sum_{i=1}^{k} b_X(x_i)} \\[6pt] u_X^{C_1} = \dfrac{u_X}{u_X + \phi \sum_{i=1}^{k} b_X(x_i)} \\[6pt] a_X^{C_1} = a_X \end{cases} \qquad (12.9) \qquad\qquad \omega_X^{C_2}: \begin{cases} b_X^{C_2} = \dfrac{(1-\phi)\, b_X}{u_X + (1-\phi) \sum_{i=1}^{k} b_X(x_i)} \\[6pt] u_X^{C_2} = \dfrac{u_X}{u_X + (1-\phi) \sum_{i=1}^{k} b_X(x_i)} \\[6pt] a_X^{C_2} = a_X \end{cases} \qquad (12.10)$$

By using the symbol '$\odot$' to designate this operator, we define:

$$\omega_X^{C_1} = \phi \odot \omega_X^C \qquad (12.11)$$
$$\omega_X^{C_2} = \overline{\phi} \odot \omega_X^C, \qquad \text{where } \overline{\phi} = 1 - \phi. \qquad (12.12)$$

In case [C : X] represents a trust edge where X represents a target entity, it can also be assumed that the entity X is being split, which leads to the same mathematical expressions as Eq.(12.9) and Eq.(12.10), but with the following notation:

$$\omega_{X_1}^{C} = \phi \odot \omega_X^C = \omega_X^{C_1} \qquad (12.13)$$
$$\omega_{X_2}^{C} = \overline{\phi} \odot \omega_X^C = \omega_X^{C_2} \qquad (12.14)$$

⊓⊔

It can be verified that $\omega_X^{C_1} \oplus \omega_X^{C_2} = \omega_X^C$, as expected. In case φ = 0 or φ = 1, one of the resulting opinions is vacuous, and the other is equal to the argument opinion.

12.2.2 Fission of Average

Assume a domain $\mathbb{X}$ and the corresponding variable X. Then assume that the opinion $\omega_X^A = (b, u, a)$ over X is held by a real or imaginary entity A. Average fission of $\omega_X^A$ consists of splitting $\omega_X^A$ into two opinions $\omega_X^{A_1}$ and $\omega_X^{A_2}$ assigned to the (real or imaginary) agents $A_1$ and $A_2$, so that $\omega_X^A = \omega_X^{A_1}\ \underline{\oplus}\ \omega_X^{A_2}$. It turns out that averaging fission of an opinion trivially produces two opinions that are equal to the argument opinion, because averaging fusion of two equal opinions necessarily produces that same opinion. It would be pointless to define this operator formally, because it is trivial and because it does not provide a useful model for any interesting practical situation.

12.2.3 Example: Fission of Opinion

Consider a ternary domain $\mathbb{X}$ with corresponding variable X and a hyper-opinion $\omega_X^C$. An analyst wants to split the opinion based on the fission parameter φ = 0.75. Table 12.1 shows the argument opinion as well as the result of the fission operation.

Table 12.1 Example cumulative opinion fission with φ = 0.75

  Parameter                          Argument ω_X^C   Fission result ω_X^{C1}   Fission result ω_X^{C2}
  belief mass of x1:    b(x1)        0.20             0.194                     0.154
  belief mass of x2:    b(x2)        0.30             0.290                     0.230
  belief mass of x3:    b(x3)        0.40             0.387                     0.308
  uncertainty mass:     u_X          0.10             0.129                     0.308
  base rate of x1:      a(x1)        0.10             0.10                      0.10
  base rate of x2:      a(x2)        0.20             0.20                      0.20
  base rate of x3:      a(x3)        0.70             0.70                      0.70
  projected prob. of x1: P(x1)       0.21             0.207                     0.185
  projected prob. of x2: P(x2)       0.32             0.316                     0.292
  projected prob. of x3: P(x3)       0.47             0.477                     0.523

It can be seen that the derived opinion $\omega_X^{C_1}$ contains significantly less uncertainty than $\omega_X^{C_2}$, which means that $\omega_X^{C_1}$ represents the larger evidence base. This is due to the fission parameter φ = 0.75, which dictates the relative proportion of evidence between $\omega_X^{C_1}$ and $\omega_X^{C_2}$ to be 3 : 1.
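The numbers in Table 12.1 can be reproduced directly from Eq.(12.9) and Eq.(12.10); the following sketch does exactly that, with a dictionary-based representation and function name chosen only for this illustration.

```python
def cumulative_fission(b, u, phi):
    """Cumulative fission (Eq.(12.9)-(12.10)): split an opinion into two parts
    carrying a fraction phi and (1 - phi) of the underlying evidence."""
    def part(w):                       # w is the evidence fraction kept by this part
        s = u + w * sum(b.values())
        return {x: w * bx / s for x, bx in b.items()}, u / s
    return part(phi), part(1 - phi)

# The example of Table 12.1: b = (0.20, 0.30, 0.40), u = 0.10, phi = 0.75.
(b1, u1), (b2, u2) = cumulative_fission({"x1": 0.20, "x2": 0.30, "x3": 0.40}, 0.10, 0.75)
print(b1, u1)   # x1~0.194, x2~0.290, x3~0.387, u~0.129  (cf. Table 12.1)
print(b2, u2)   # x1~0.154, x2~0.231, x3~0.308, u~0.308
```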
Chapter 13
Computational Trust

Subjective logic was originally developed for the purpose of reasoning about trust in information security, such as when analysing trust structures of a PKI (Public-Key Infrastructure). Subjective logic and its application to this type of computational trust was first proposed by Jøsang in 1997 [35]. The idea of computational trust was originally proposed by Marsh in 1994 [64].

The concept of trust is to a large extent a mental and psychological phenomenon which does not correspond to a physical process that can be objectively observed and analysed. For this reason, formal trust models have no natural benchmark against which they can be compared and validated. There is thus no single correct formalism for computational trust, so any formal trust model to a certain extent becomes ad hoc. Using subjective logic as a basis for computational trust is therefore just one of several possible approaches.

Computational trust with subjective logic has been a thriving research topic since the first publication in 1997, with subsequent contributions by many different authors. It has the advantage of being intuitively sound and relatively simple, which is important when making practical implementations. The main subjective logic operators used for computational trust are fusion and trust discounting. Fusion operators are described in Chapter 11 above. The operator for trust discounting is described in Section 13.3 below. Trust discounting is the operator for deriving trust or belief from transitive trust paths. Before diving into the mathematical details of trust discounting, the next section introduces the concept of trust from a more philosophical perspective.

13.1 The Notion of Trust

Trust can be considered to be a particular kind of belief. In that sense trust can be modelled as an opinion that can be used as an input argument, or produced as an output result, in reasoning models based on subjective logic. We use the term trust opinion to denote trust represented as a subjective opinion.

Trust is a directional relationship between two parties that can be called trustor and trustee. One must assume the trustor to be a 'thinking entity' in some form, meaning that it has the ability to make assessments and decisions based on received information and past experience. The trustee can be anything from a person, organisation or physical entity, to abstract notions such as information, propositions or a cryptographic key [34].

A trust relationship has a scope, meaning that it applies to a specific purpose or domain of action, such as 'being authentic' in the case of an agent's trust in a cryptographic key, or 'providing reliable information' in the case of a person's trust in the correctness of an entry in Wikipedia¹. Mutual trust is when both parties trust each other with the same scope, but this is obviously only possible when both parties are cognitive entities capable of doing some form of reliability, risk and policy assessment.
Trust influences the trustor’s attitudes and actions, but can also have effects on the trustee and other elements in the environment, for example, by stimulating reciprocal trust [21]. The literature uses the term trust with a variety of different meanings which can be confusing[67]. Two main interpretations are to view trust as the perceived reliability of something or somebody, called reliability trust, and to view trust as a decision to enter into a situation of dependence on something or somebody, called decision trust. These two different interpretations of trust are explained in the following sections. It can already be mentioned that the notion of trust opinions in subjective logic assumes trust to have the meaning of perceived reliability. 13.1.1 Reliability Trust As the name suggest, reliability trust can be interpreted as the estimated reliability of something or somebody independently of any actual commitment or decision, and the definition by Gambetta (1988) [27] provides an example of how this interpretation can be articulated: Definition 13.1 (Reliability Trust). Trust is the subjective probability by which an individual, A, expects that another individual, B, performs a given action on which its welfare depends. ⊔ ⊓ In Definition 13.1 trust is interpreted as the trustor’s probability assessment of the trustee’s reliability, in the context of the trustor’s potential dependence on the trustee. Instead of using probability, the trustor can express the trustee’s reliability as a subjective opinion, which thereby becomes a trust opinion. Assume that an agent A holds a certain belief about an arbitrary variable X, which then represents a belief relationship formally expressed as [A, X], and that agent A also has a level of trust in entity E, which then represents a trust relationship formally expressed as [A, E]. A crucial semantic difference between holding a belief 1 http://www.wikipedia.org/ 13.1 The Notion of Trust 237 about a variable X and having a level of trust in an entity E is that the trust relationship [A, E] assumes that trustor A potentially or actually is in a situation of dependence on E, whereas the belief relationship [A, X] makes no assumption about dependence. By dependence is meant that the welfare of agent A depends on the performance of E, but which which A can not accurately predict or control. This uncertainty on the objectives of A means that in case E does not perform as assumed by A, then A would suffer some damage. In general, the uncertainty on the objectives of an agent is defined as risk [33]. The dependence aspect of a trust opinion thus creates risk which is a function of the potential damage resulting from the possible failure of entity E to meet its trust expectations. Trust opinions are binomial opinions because they apply to binary variables that naturally can take two values. A general trust domain can be denoted T = {t, t}, so that a binary random trust variable T can be assumed to take one of these two variables with the general meaning: Trust domain T : t : “The action is performed as expected.” t : “The action is not performed as expected.” Assume that an entity E is trusted to perform a specific action. A binomial trust opinion about E can thus be denoted ω tE . However, in order to have more direct expressions for trust opinions we normally use the notation ωE with the same meaning. This convention is expressed in Eq.(13.1). 
Equivalent opinion notations for trust in E: ωE ≡ ω tE (13.1) For example, when bank B provides credit to E it puts itself in a situation of dependence on E and becomes exposed to risk in case E is unable to repay its debt. The bank can use a trust opinion to express trust in E with regard to E’s creditworthiness. Trust in target E can be represented as belief in a variable TE which takes its values from the binary domain TE = {tE ,t E } where the values have the following meanings: tE : “Entity E will pay its debt.” Trust domain TE : t E : “Entity E will default on its debt.” A binomial opinion about the value tE is then a trust opinion about E with regard to E’s creditworthiness. Bank B’s binomial trust opinion about entity E can thus be denoted ωtBE . However, for ease of expression we use the simplified notation ωEB according to notation equivalence of Eq.(13.1). Trust opinions fit nicely into the reasoning framework of subjective logic, either as input arguments or as output results. Applying subjective logic to reasoning with 238 13 Computational Trust trust opinions represents computational trust which is a powerful way of taking subjective aspects of belief reasoning into account. Some subjective logic operators are essential for computational trust, in particular trust discounting described in Section 13.3, and (trust) fusion described in Chapter 11. Trust discounting is used for deriving opinions from transitive trust paths, and trust fusion is used for merging multiple trust paths. In combination, trust discounting and trust fusion form the main the building blocks for trust networks, as described in Section 14. 13.1.2 Decision Trust Trust can be interpreted with a more complex meaning than than that of reliability trust and Gambetta’s definition. For example, Falcone & Castelfranchi (2001) [22] note that having high (reliability) trust in a person is not necessarily sufficient for deciding to enter into a situation of dependence on that person. In [22] they write: “For example it is possible that the value of the damage per se (in case of failure) is too high to choose a given decision branch, and this independently either from the probability of the failure (even if it is very low) or from the possible payoff (even if it is very high). In other words, that danger might seem to the agent an intolerable risk.” [22] To illustrate the difference between reliability trust and decision trust with a practical example, consider the situation of a fire drill where participants are asked to abseil from the third floor window of a house using a rope that looks old and that appears to be in a state of severe deterioration. In this situation, the participants would assess the probability that the rope will hold them while abseiling. Let R denote the rope, and assume the binary trust domain TR = {tR ,t R } where the values have the following meanings: Trust domain TR : tR : “The rope will hold me while I’m abseiling down.” t R : “The rope will rupture if I try to abseil down.” Person A’s reliability trust in the rope can then be expressed as a binomial opinion denoted ω tAR , but we normally use ωRA as equivalent notation according to Eq.(13.1). If person A thinks that the rope will rupture she would express this in the form of a binomial opinion ωRA with a disbelief parameter dRA close to 1.0, as a representation for distrust in the rope R, and would most likely refuse to use it for abseiling. The fire drill situation is illustrated on the left side of Figure 13.1. 
Imagine now that the same person is trapped in a real fire, and that the only escape is to abseil from the third floor window using the same old rope. It is assumed that the trust opinion ωRA is the same as before. However, in this situation it is likely that person A would decide to trust the rope for abseiling down, even if she thinks it is possible that it could rupture. The trust decision has thus changed even though the reliability trust opinion is unchanged. This paradox is easily explained by the fact 13.1 The Notion of Trust 239 Would you trust this rope ... in a fire drill? No, I would not! in a real fire? Yes, I would! Fig. 13.1 Same reliability trust, but different decision trust that we are here talking about two different types of trust, namely reliability trust and decision trust. The change in trust decision is perfectly rational because the likelihood of injury or death while abseiling is assessed against the likelihood of smoke suffocation and death by fire. Although the reliability trust in the rope is the same in both situations, the decision trust changes as a function of the comparatively different utility values associated with the different courses of action in the two situations. The following definition captures the concept of decision trust. Definition 13.2 (Decision Trust). Trust is the commitment to depend on something or somebody in a given situation with a feeling of relative security, even though negative consequences are possible (inspired by [67]). ⊔ ⊓ In Definition 13.2, trust is primarily interpreted as the willingness to actually rely on a given object, and specifically includes the notions of dependence on the trustee, and its reliability and risk. In addition, Definition 13.2 implicitly also covers situational elements such as utility (of possible outcomes), environmental factors (law enforcement, contracts, security mechanisms etc.) and risk attitude (risk taking, risk averse, etc.). Both reliability trust and decision trust reflect a positive belief about something on which the trustor depends for its welfare. Reliability trust is most naturally measured as a probability or opinion about reliability, whereas decision trust is most naturally measured in terms of a binary decision. While most trust and reputation models assume reliability trust, decision trust can also modelled. Systems based on decision trust models should be considered as decision support tools. The difficulty of capturing the notion of trust in formal models in a meaningful way has led some economists to reject it as a computational concept. The strongest expression for this view has been given by Williamson (1993) [89] who argues that the notion of trust should be avoided when modelling economic interactions, because it adds nothing new, and that well studied notions such as reliability, utility and risk are adequate and sufficient for that purpose. Personal trust is the only type of trust that can be meaningful for describing interactions, according to Williamson. He argues that personal trust applies to emotional and personal interactions such as love 240 13 Computational Trust relationships where mutual performance is not always monitored and where failures are forgiven rather than sanctioned. In that sense, traditional computational models would be inadequate e.g. because of insufficient data and inadequate sanctioning, but also because it would be detrimental to the relationships if the involved parties were to take a computational approach. 
Non-computation models for trust can be meaningful for studying such relationships according to Williamson, but developing such models should be done within the domains of sociology and psychology, rather than in economy. In the light of Williamson’s view on modelling trust it becomes important to judge the purpose and merit of computational trust itself. Can computational trust add anything new and valuable to the Internet technology and economy? The answer, in our opinion, is definitely yes. The value of computational trust lies in the architectures and mechanisms for collecting trust relevant information, for efficient, reliable and secure processing, for distribution of derived trust and reputation scores, and for taking this information into account when navigating the Internet and making decisions about online activities and transactions. Economic models for risk taking and decision making are abstract and do not address how to build trust networks and reputation systems. Computational trust specifically addresses how to build such systems, and can be combined with economic modeling whenever relevant and useful. 13.1.3 Reputation and Trust The concept of reputation is closely linked to that of trustworthiness, but it is evident that there is a clear and important difference. For the purpose of this study, we will define reputation according to Merriam-Webster’s online dictionary [68]. Definition 13.3 (Reputation). The overall quality or character as seen or judged by people in general. ⊔ ⊓ This definition corresponds well with the view of social network researchers [26, 63] that reputation is a quantity derived from the underlying social network which is globally visible to all members of the network. The difference between trust and reputation can be illustrated by the following perfectly normal and plausible statements: 1. “I trust you because of your good reputation.” 2. “I trust you despite your bad reputation.” Assuming that the two sentences relate to the same trust scope, statement a) reflects that the relying party is aware of the trustee’s reputation, and bases his trust on that. Statement b) reflects that the relying party has some private knowledge about the trustee, e.g. through direct experience or intimate relationship, and that these 13.2 Trust Transitivity 241 factors overrule any (negative) reputation that a person might have. This observation reflects that trust ultimately is a personal and subjective phenomenon that that is based on various factors or evidence, and that some of those carry more weight than others. Personal experience typically carries more weight than second hand trust referrals or reputation, but in the absence of personal experience, trust often has to be based on referrals from others. Reputation can be considered as a collective measure of trustworthiness (in the sense of reliability) based on the referrals or ratings from members in a community. An individual’s subjective trust can be derived from a combination of received referrals and personal experience. Reputation can relate to a group or to an individual. A group’s reputation can for example be modelled as the average of all its members’ individual reputations, or as the average of how the group is perceived as a whole by external parties. Tadelis’ (2001) [87] study shows that an individual belonging to to a given group will inherit an a priori reputation based on that group’s reputation. 
If the group is reputable all its individual members will a priori be perceived as reputable and vice versa. Reputation systems are automated systems for generating reputation scores about products or services. Reputation systems are based on receiving feedback and ratings from users about their satisfaction with products or services that they have had direct experience with, and uses the ratings and the feedback to derive reputation scores. Reputation systems are widely used on e-commerce platforms, social networks and in web 2.0 applications in general. Evidence opinions where the number of observations are explicitly represented are well suited as the basis for computation in reputation systems. Feedback can be represented as an observation, and can be merged using the cumulative fusion operator described in Section 11.3. This type of reputation computation engines is called a Bayesian reputation system because it is based on Bayesian statistics through Beta and Dirichlet PDFs. Chapter 15 below describes the principle and building blocks of Bayesian reputation systems. 13.2 Trust Transitivity The formalism for computational trust described in the following sections assumes that trust is interpreted as reliability trust according to Definition 13.1. Based on the assumption that reliability trust is a form of belief, degrees of trust can be expressed as trust opinions. 242 13 Computational Trust 13.2.1 Motivating Example for Transitive Trust We constantly make choices and decisions based on trust. As a motivating example, let us assume that Alice has trouble with her car, so she needs to get it fixed by a car mechanic. Assume further that Alice has recently moved to town and therefore has no experience with having her car serviced in that town. Bob who is one of her colleagues at work has lived in the town for many years. When Alice’s car broke down, Bob gave her a lift with his car. Alice noticed that Bob’s car was well maintained, so she intuitively trusted Bob in matters of car maintenance. Bob tells her that he usually gets his car serviced by a car mechanic named Eric, and that based on direct experience Eric seems to be a very skilled car mechanic. As a result, Bob has direct trust in Eric. Bob gives her the advice to get her car fixed at Eric’s garage she. Based on her trust in Bob in matters of car maintenance, and on Bob’s advice, Alice develops trust in Eric too. Alice has derived indirect trust in Eric, because it is not based in direct experience. Still, it is genuine trust which helps Alice to make a decision about where to get her car fixed. This example represents trust transitivity in the sense that Alice trusts Bob who trusts Eric, so that Alice also trusts Eric. This assumes that Bob actually tells Alice that he trusts Eric, which is called a recommendation. This is illustrated in Figure 13.2, where the indexes indicate the order in which the trust relationships and recommendations are formed. 4 derived functional trust Alice referral trust 2 Eric Bob functional trust 1 3 recommendation Fig. 13.2 Transitive trust principle Trust is only conditionally transitive [10]. For example, assume that Alice would trust Bob to look after her child, and that Bob trusts Eric to fix his car, then this does not imply that Alice trusts Eric for looking after her child. However, when certain semantic requirements are satisfied [47], trust can be transitive, and a trust system can be used to derive trust. 
For example, every trust edge along a transitive chain must share the same trust scope. In the last example, trust transitivity collapses because the scopes of Alice’s and Bob’s trust are different. These semantic requirements are described in Section 13.2.5 below. Based on the situation of Figure 13.2, let us assume that Alice needs to have her car serviced, so she asks Bob for his advice about where to find a good car mechanic 13.2 Trust Transitivity 243 in town. Bob is thus trusted by Alice to know about a good car mechanic, and to tell his honest opinion about that. Bob in turn trusts Eric to be a good car mechanic. The examples above assume some sort of absolute trust between the agents in the transitive chain. In reality trust is never absolute, and many researchers have proposed to express trust as discrete verbal statements, as probabilities or other continuous measures. When applying computation to such trust measures, intuition dictates that trust should be weakened or diluted through transitivity. Revisiting the above example, this means that Alice’s derived trust in the car mechanic Eric through the recommender Bob can be at most as strong or confident as Bob’s trust in Eric. How trust strength and confidence should be formally represented depends on the particular formalism used. It could be argued that negative trust in a transitive chain can have the paradoxical effect of strengthening the derived trust. Take for example the case where Alice distrusts Bob, and Bob distrusts Eric. In this situation, it might be reasonable for Alice to derive positive trust in Eric, since she thinks “Bob is trying to trick me, I will not rely on him”. In some situations it is valid to assume that the enemy of my enemy is my friend. If Alice would apply this principle, and Bob would recommends distrust in Eric, then Bob’s negative advice could count as a pro-Eric argument from Alice’s perspective. The question of how transitivity of distrust should be interpreted can quickly become very complex because it can involve multiple levels of deception. Models based on this type of reasoning have received minimal attention in the trust and reputation systems literature, and it might be argued that the study of such models belongs to the intelligence analysis discipline, rather than online trust management. However, the fundamental issues and problems are the same in both disciplines. The safe and conservative approach is to assume that distrust in a node which forms part of a transitive trust path should contribute to increased uncertainty in the opinion about the target entity or variable. This is also the approach taken by the typical trust discounting operator described in Section 13.3. 13.2.2 Referral Trust and Functional Trust With reference to the previous example, it is important to separate between trust in the ability to recommend a good car mechanic which represents referral trust, and trust in actually being a good car mechanic which represents functional trust. The scope of the trust is nevertheless the same, namely to be a good car mechanic. Assuming that Bob has demonstrated to Alice that he is knowledgeable in matters relating to car maintenance, Alice’s referral trust in Bob for the purpose of recommending a good car mechanic can be considered to be direct. Assuming that Eric on several occasions has proven to Bob that he is a good mechanic, Bob’s functional trust in Eric can also be considered to be direct. Thanks to Bob’s advice, Alice also trusts Eric to actually be a good mechanic. 
However, this functional trust must be 244 13 Computational Trust considered to be indirect, because Alice has not directly observed or experienced Eric’s skills in servicing and repairing cars. The concept of referral trust represents a third type of belief/trust relationships that comes in addition to belief relationships and functional trust relationships. The list of all three types of belief/trust relationships is given in Table 13.1, which extends Table 3.1 on p.19. Table 13.1 Notation for belief, functional trust and referral trust relationships Nr. Relationship type Formal notation Graph edge notation Interpretation 1a Belief [A, X] A −→ X A has an opinion on variable X 2a Functional trust [A, E] A −→ E A has a functional trust opinion in entity E 2b Functional trust [A,tE ] A −→ tE A has a functional trust opinion in value tE 3a Referral trust [A; B] A 99K B A has a referral trust opinion in entity B 3b Referral trust [A;tB ] A 99K tB A has a referral trust opinion in value tB Note that types 2a and 2b in Table 13.1 simply give two equivalent notations for functional trust relationships, and that types 3a and 3b give two equivalent notations for referral trust relationships. 13.2.3 Notation for Transitive Trust Table 13.1 specifies the notation for representing simple belief and trust relationships, where each relationship represents an edge. Transitive trust paths are formed by combining relationship edges with the transitivity symbol ‘:’. For example the transitive trust path of Figure 13.2 can be formally express as: Notation for transitive trust in Figure 13.2 : [A, E] = [A; B] : [B, E]. (13.2) The referral trust edge from A to B is thus denoted [A; B], and the functional trust edge from B to E is denoted [B, E]. The serial/transitive connection of the two trust edges produces the derived functional trust edge [A, E]. The mathematics for computing derived trust opinions resulting from transitive trust networks such as in Figure 13.2 and expressed by Eq.(13.2) is given by the trust discounting operator described in Section 13.3 below. Let us slightly extend the example, wherein Bob does not actually know any car mechanics himself, but he trusts Claire, whom he believes knows a good car mechanic. As it happens, Claire is happy to give a recommendation about Eric to Bob, which Bob passes on to Alice. As a result of transitivity, Alice is able to derive 13.2 Trust Transitivity 245 trust in Eric, as illustrated in Figure 13.3. This clearly illustrates that referral trust in this example is about the ability to recommend somebody who can fix cars, and that functional trust in this example is about the ability to actually fix cars. “To be skilled at fixing cars” represents the scope of the trust relationships in this example. Alice 2 rec. 1 referral trust Bob 2 rec. Claire 1 referral trust Eric 1 functional trust derived functional trust 3 Fig. 13.3 Trust derived through transitivity Formal notation of the graph of Figure 13.3 is given in Eq.(13.3). Transitive trust path notation: [A, E] = [A; B] : [B;C] : [C, E] (13.3) The functional trust for the example of Figures 13.3 can be represented by the binomial opinion ωEC which then expresses C’s level of functional trust in entity E. The equivalent notation ωtCE makes explicit the belief in the trustworthiness of E expressed as the variable value tE , where high belief mass assigned to tE means that C has high trust in E. 
Similarly, the opinion ωBA expresses A’s referral trust in B, with equivalent notation ωtAB , where the statement tB e.g. is interpreted as tB : “Entity B can provide good advice about car mechanics”. 13.2.4 Compact Notation for Transitive Trust Paths A transitive trust path consists of a set of trust edges representing chained trust relationships. It can be noted that two adjacent trust edges repeat the name of an entity connecting the two edges together. For example, the entity name B appears twice in the notation of the trust path [A; B] : [B;C]. This notation can be simplified by omitting the repetition of entities, and by representing consecutive edges as a single multi-edge. By using this principle, the compact notation for the transitive trust path of Figure 13.3 emerges, as given in Eq.(13.4), which also shows the corresponding equivalent full notation. Compact notation for trust paths: [A; B;C, E] ≡ [A; B] : [B;C] : [C, E]. (13.4) 246 13 Computational Trust The advantage of the compact notation is precisely that it is more compact than the full notation, resulting in simpler expressions. We sometimes use the compact notation in our description of trust networks below. 13.2.5 Semantic Requirements for Trust Transitivity The concept of referral trust might seem subtle. The interpretation of referral trust is that Alice trusts Bob to recommend somebody (who can recommend somebody etc.) who can recommend a good car mechanic. At the same time, referral trust always assumes the existence of functional trust or belief at the end of the transitive path, which in this example is about being a good car mechanic. The ‘referral’ variant of a trust can be considered to be recursive, so that any transitive trust chain, with arbitrary length, can be expressed. This principle is captured by the following criterion. Definition 13.4 (Functional Trust Derivation Criterion). Derivation of functional trust through referral trust, requires that the last trust edge represents functional trust, and all previous trust arcs represents referral trust. ⊔ ⊓ In practical situations, a trust scope can be characterised by being general or specific. For example, knowing how to change wheels on a car is more specific than to be a good car mechanic, where the former scope is a subset of the latter. Whenever the functional trust scope is equal to, or a subset of the referral trust scopes, it is possible to form transitive paths. This can be expressed with the following consistency criterion. Definition 13.5 (Trust Scope Consistency Criterion). A valid transitive trust path requires that the trust scope of the functional/last edge in the path be a subset of all previous referral trust edges in the path. ⊔ ⊓ Trivially, every trust edge can have the same trust scope. Transitive trust propagation is thus possible with two variants (i.e. functional and referral) of a single trust scope. A transitive trust path stops at the first functional trust edge encountered. It is, of course, possible for a principal to have both functional and referral trust in another principal, but that should be expressed as two separate trust edges. The existence of both a functional and a referral trust edge, e.g. from Claire to Eric, should be interpreted as Claire having trust in Eric not only to be a good car mechanic, but also to recommend other car mechanics. 
13.3 The Trust Discounting Operator

13.3.1 Principle of Trust Discounting

The general idea behind trust discounting is to express degrees of trust in an information source, and then to discount the information provided by that source as a function of the trust in the source. We represent both the trust and the provided information in the form of subjective opinions, and then define an appropriate operation on these opinions to find the trust-discounted opinion.

Let agent A denote the relying party and agent B denote an information source. Assume that agent B provides information to agent A about the state of a variable X, expressed as a subjective opinion on X. Assume further that agent A has an opinion on the trustworthiness of B with regard to providing information about X, i.e. the trust scope is to provide information about X. Based on the combination of A's trust in B as well as B's opinion about X given as advice to A, it is possible for A to derive an opinion about X. This process is illustrated in Figure 13.4.

Fig. 13.4 Trust discounting of opinions: from the arguments 'A trusts B' ($\omega_B^A$) and 'B believes X' ($\omega_X^B$), the result 'A believes X' ($\omega_X^{[A;B]}$) is derived.

Several trust discounting operators for subjective logic are described in the literature [49, 51]. The general representation of trust discounting is through conditionals [51], while special cases can be expressed with specific trust discounting operators. Here we use the specific case of uncertainty-favouring trust discounting, which lets the uncertainty in A's derived opinion about X increase as a function of the projected distrust in the recommender B. The uncertainty-favouring trust discounting operator is described below.

13.3.2 Trust Discounting with 2-Edge Paths

Agent A's referral trust in B can be formally expressed as a binomial opinion on the domain $\mathbb{T}_B = \{t_B, \overline{t}_B\}$, where the values $t_B$ and $\overline{t}_B$ denote trusted and distrusted respectively. We denote this opinion by $\omega_B^A = (b_B^A, d_B^A, u_B^A, a_B^A)$, which according to Eq.(13.1) is equivalent to the notation $\omega_{t_B}^A = (b_{t_B}^A, d_{t_B}^A, u_{t_B}^A, a_{t_B}^A)$; for practical reasons we use the simplified notation here. The values $b_B^A$, $d_B^A$ and $u_B^A$ represent the degrees to which A trusts, does not trust, or is uncertain about the trustworthiness of B in the current situation, while $a_B^A$ is the base rate probability that A would assign to the trustworthiness of B a priori, before receiving the advice.

Definition 13.6 (Trust Discounting for 2-Edge Path). Assume agents A and B, where A has referral trust in B for a trust scope related to domain $\mathbb{X}$. Let X denote a variable on domain $\mathbb{X}$, and let $\omega_X^B = (b_X^B, u_X^B, a_X^B)$ be agent B's general opinion on X, as recommended by B to A. Also assume that agent A's referral trust in B with respect to recommending belief about X is expressed as $\omega_B^A$. The notation for trust discounting is given by:

$$\omega_X^{[A;B]} = \omega_B^A \otimes \omega_X^B. \qquad (13.5)$$

The trust discounting operator uses agent A's referral trust opinion about agent B, denoted $\omega_B^A$, to discount B's opinion about the variable X, denoted $\omega_X^B$, in order to produce A's derived opinion about X, denoted $\omega_X^{[A;B]}$. The parameters of the derived opinion $\omega_X^{[A;B]}$ are defined in the following way:

$$\omega_X^{[A;B]}: \begin{cases} b_X^{[A;B]}(x) = \mathrm{P}_B^A\, b_X^B(x) \\[4pt] u_X^{[A;B]} = 1 - \mathrm{P}_B^A \displaystyle\sum_{x \in \mathcal{R}(\mathbb{X})} b_X^B(x) \\[4pt] a_X^{[A;B]}(x) = a_X^B(x) \end{cases} \qquad (13.6)$$

⊓⊔

The effect of the trust discounting operator is illustrated in Figure 13.5 with the following example.
Let $\omega_B^A = (0.20,\ 0.40,\ 0.40,\ 0.75)$ be A's trust opinion on B, with projected probability $\mathrm{P}_B^A = 0.50$. Let further $\omega_X^B = (0.45,\ 0.35,\ 0.20,\ 0.25)$ be B's binomial opinion about the binary variable X, i.e. with projected probability $\mathrm{P}_{x_1}^B = 0.50$. By using Eq.(13.6) we can compute A's derived opinion about X as $\omega_X^{[A;B]} = (0.225,\ 0.175,\ 0.600,\ 0.250)$, which has projected probability $\mathrm{P}_{x_1}^{[A;B]} = 0.375$. The opinion values are summarised in Eq.(13.7).

Fig. 13.5 Uncertainty-favouring trust discounting: barycentric triangles showing A's referral trust in B, B's opinion about X, and A's derived opinion about X, which is constrained to a shaded sub-triangle whose size is given by $\mathrm{P}_B^A$.

$$\omega_B^A: \begin{cases} b_B^A = 0.20\\ d_B^A = 0.40\\ u_B^A = 0.40\\ a_B^A = 0.75\\ \mathrm{P}_B^A = 0.50 \end{cases} \qquad \omega_X^B: \begin{cases} b_{x_1}^B = 0.45\\ d_{x_1}^B = 0.35\\ u_X^B = 0.20\\ a_{x_1}^B = 0.25\\ \mathrm{P}_{x_1}^B = 0.50 \end{cases} \qquad \omega_X^{[A;B]}: \begin{cases} b_{x_1}^{[A;B]} = 0.225\\ d_{x_1}^{[A;B]} = 0.175\\ u_X^{[A;B]} = 0.600\\ a_{x_1}^{[A;B]} = 0.250\\ \mathrm{P}_{x_1}^{[A;B]} = 0.375 \end{cases} \qquad (13.7)$$

The trust-discounted opinion $\omega_X^{[A;B]}$ typically gets increased uncertainty compared to the original opinion recommended by B, where the increase in uncertainty is dictated by the projected probability of the referral trust opinion $\omega_B^A$. The principle is that the smaller the projected probability $\mathrm{P}_B^A$, the greater the uncertainty of the derived opinion $\omega_X^{[A;B]}$.

Figure 13.5 illustrates the general behaviour of the uncertainty-favouring trust discounting operator, where the derived opinion is constrained to the shaded sub-triangle at the top of the right-most triangle. The size of the shaded sub-triangle corresponds to the projected probability of trust in the trust opinion. The effect of this is that the barycentric representation of $\omega_X^B$ is shrunk proportionally to $\mathrm{P}_B^A$ to become a barycentric opinion representation inside the shaded sub-triangle.

Some special cases are worth mentioning. In case the projected trust probability equals one, which means complete trust, the relying party accepts the recommended opinion 'as is', meaning that the derived opinion is equal to the recommended opinion. In case the projected trust probability equals zero, which means complete distrust, the recommended opinion is reduced to a vacuous opinion, meaning that the recommended opinion is completely discarded.

It can be mentioned that the trust discounting operator described above is a special case of the general trust discounting operator for deriving opinions from arbitrarily long trust paths, as expressed by Definition 13.7 below.
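A minimal sketch of the uncertainty-favouring trust discounting operator of Eq.(13.6), restricted to binomial opinions represented as (belief, disbelief, uncertainty, base rate) tuples, reproduces the numbers of Eq.(13.7); the function name and tuple representation are illustrative choices only.

```python
def trust_discount(trust, recommendation):
    """Uncertainty-favouring trust discounting (Eq.(13.6)) of a binomial
    recommendation; opinions are (belief, disbelief, uncertainty, base_rate) tuples."""
    bT, dT, uT, aT = trust
    bR, dR, uR, aR = recommendation
    p_trust = bT + aT * uT                       # projected probability of the trust opinion
    b = p_trust * bR
    d = p_trust * dR
    u = 1 - p_trust * (bR + dR)                  # the remaining mass becomes uncertainty
    return (b, d, u, aR)

# The numerical example of Eq.(13.7):
omega_AB = (0.20, 0.40, 0.40, 0.75)              # A's referral trust in B, P = 0.50
omega_BX = (0.45, 0.35, 0.20, 0.25)              # B's recommended opinion on X
print(trust_discount(omega_AB, omega_BX))        # -> (0.225, 0.175, 0.60, 0.25)
```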
13.3.3 Example: Trust Discounting of a Restaurant Recommendation

The following example illustrates how trust discounting is applied intuitively in real situations. Let us assume that Alice is on holiday in a town in a foreign country, and is looking for a restaurant where the locals go, because she would like to avoid places overrun by tourists. Alice's impression is that it is hard to find a good restaurant, and she guesses that only about 20% of the restaurants could be characterised as good. She would like to find the best one. While walking around town she meets a local called Bob, who tells her that the restaurant Xylo is the favourite place of the locals.

We assume that Bob is a stranger to Alice, so that a priori her trust in Bob is affected by high uncertainty. However, it is enough for Alice to assume that locals in general give good advice, which translates into a high base rate for her trust in the advice from locals. Even if her trust in Bob is vacuous, a high base rate will result in a strong projected probability of trust. Assuming that Bob gives a very positive recommendation about Xylo, Alice will derive a positive opinion about the restaurant based on Bob's advice.

This example situation can be translated into numbers. Figure 13.6 shows a screenshot of the online demonstrator for subjective logic trust discounting, with arguments that are plausible for the example of the restaurant recommendation.

Fig. 13.6 Example trust discounting in the restaurant recommendation example: the vacuous trust opinion on [A;B] = (0.00, 0.00, 1.00, 0.90) with projected probability 0.90, combined with the recommendation [B,X] = (0.95, 0.00, 0.05, 0.20) with projected probability 0.96, produces the derived opinion [A;B,X] = (0.85, 0.00, 0.15, 0.20) with projected probability 0.88.

As a result of the recommendation Alice becomes quite convinced that Xylo is the right place to go, and intends to have dinner at Xylo in the evening. However, if Alice receives a second and contradictory advice, her trust in Xylo could drop dramatically, so she might change her mind. This scenario is described in an extended version of this example in Section 13.5.3 below.

The next section describes how transitive trust paths longer than two edges can be analysed.

13.3.4 Trust Discounting for Multi-Edge Paths

Definition 13.6 describes how trust discounting is performed for a trust path consisting of two adjacent edges, but does not say how it should be computed for longer trust paths. The method for computing transitive trust in the case where three or more edges are chained is described below. This type of multi-edge transitive trust path can be split into a referral trust part and a final belief part. Consider the following trust network represented as a graph:

$$\text{Multi-edge trust graph:} \quad (A_1 \longrightarrow X) = (A_1 \dashrightarrow A_2 \dashrightarrow \ldots A_n \longrightarrow X) \qquad (13.8)$$

The derived functional belief edge $[A_1, X]$ of Eq.(13.8) is formally expressed as:

$$\begin{aligned} \text{Full formal notation:} \quad & [A_1, X] = [A_1; A_2] : [A_2; A_3] : \ldots [A_n, X] \\ \text{Compact trust path notation:} \quad & [A_1, X] = [A_1; A_2; \ldots A_n, X] \\ \text{Separate referral and functional parts:} \quad & [A_1, X] = [A_1; \ldots A_n] : [A_n, X] \\ \text{Short notation with separation:} \quad & [A_1, X] = [A_1; A_n] : [A_n, X] \end{aligned} \qquad (13.9)$$

Let each edge from the full formal notation have an assigned opinion, where the inter-agent trust opinions are denoted $\omega_{A_{i+1}}^{A_i}$, and the final opinion is denoted $\omega_X^{A_n}$. All the inter-agent opinions represent referral trust, whereas the final opinion represents functional belief. The projected probability of the referral trust part $[A_1; A_n]$ is computed as:

$$\text{Referral trust projected probability:} \quad \mathrm{P}_{A_n}^{A_1} = \prod_{i=1}^{n-1} \mathrm{P}_{A_{i+1}}^{A_i} \qquad (13.10)$$

Trust discounting over an arbitrarily long referral trust path is defined as a function of the referral trust projected probability of Eq.(13.10).

Definition 13.7 (Trust Discounting for Multi-Edge Paths). Assume a transitive trust path consisting of chained trust edges between n agents denoted $A_1 \ldots A_n$, followed by a final belief edge between the last agent $A_n$ and the target node X, where the goal is to derive a belief edge between the first agent $A_1$ and the target X.
The next section describes how transitive trust paths longer than two edges can be analysed.

13.3.4 Trust Discounting for Multi-Edge Paths

The trust discounting described in Definition 13.6 applies to a trust path consisting of two adjacent edges, but does not say how trust should be computed for longer trust paths. The method for computing transitive trust in case three or more edges are chained is described below. This type of multi-edge transitive trust path can be split into the referral trust part and the final belief part. Consider the following trust network represented as a graph.

$$
\text{Multi-edge trust graph:}\quad (A_1 \longrightarrow X) \;=\; (A_1 \dashrightarrow A_2 \dashrightarrow \dots A_n \longrightarrow X) \qquad (13.8)
$$

The derived functional belief edge $[A_1, X]$ of Eq.(13.8) is formally expressed as:

$$
\begin{array}{ll}
\text{Full formal notation:} & [A_1, X] = [A_1; A_2] : [A_2; A_3] : \dots [A_n, X]\\
\text{Compact trust path notation:} & [A_1, X] = [A_1; A_2; \dots A_n, X]\\
\text{Separate referral and functional parts:} & [A_1, X] = [A_1; \dots A_n] : [A_n, X]\\
\text{Short notation with separation:} & [A_1, X] = [A_1; A_n] : [A_n, X]
\end{array}
\qquad (13.9)
$$

Let each edge from the full formal notation have an assigned opinion, where the inter-agent trust opinions are denoted $\omega_{A_{i+1}}^{A_i}$, and the final opinion is denoted $\omega_X^{A_n}$. All the inter-agent opinions represent referral trust, whereas the final opinion represents functional belief. The projected probability of the referral trust part $[A_1; A_n]$ is computed as:

$$
\text{Referral trust projected probability:}\quad P_{A_n}^{A_1} = \prod_{i=1}^{n-1} P_{A_{i+1}}^{A_i} \qquad (13.10)
$$

Trust discounting with an arbitrarily long referral trust path is defined as a function of the referral trust projected probability of Eq.(13.10).

Definition 13.7 (Trust Discounting for Multi-Edge Paths). Assume a transitive trust path consisting of chained trust edges between n agents denoted $A_1 \dots A_n$, followed by a final belief edge between the last agent $A_n$ and the target node X, where the goal is to derive a belief edge between the first agent $A_1$ and the target X. The parameters of the derived opinion $\omega_X^{A_1}$ are defined in the following way:

$$
\omega_X^{A_1}:
\begin{cases}
\boldsymbol{b}_X^{A_1}(x) = P_{A_n}^{A_1}\, \boldsymbol{b}_X^{A_n}(x)\\[4pt]
u_X^{A_1} = 1 - P_{A_n}^{A_1} \displaystyle\sum_{x\in\mathbb{R}(X)} \boldsymbol{b}_X^{A_n}(x)\\[4pt]
\boldsymbol{a}_X^{A_1}(x) = \boldsymbol{a}_X^{A_n}(x)
\end{cases}
\qquad (13.11)
$$
⊔⊓

The principle of multi-edge trust discounting is to take the projected probability of the referral trust part, expressed by Eq.(13.10), as a measure of the trust network reliability. This reliability measure is then used to discount the recommended functional opinion $\omega_X^{A_n}$ of the final belief edge $[A_n, X]$.

Note that the transitive referral trust computation described here is similar to the computation of serial reliability of components in a system, as described in Section 7.2. The difference is that in case of serial reliability analysis the subjective-logic multiplication operator described in Chapter 7 is used, whereas for transitive trust networks the simple product of projected probabilities is used. Using only the probability product is sufficient for trust networks, because only the projected probability is needed for the trust-discounting operator.

In the case where every referral trust opinion has projected probability $P_{A_{i+1}}^{A_i} = 1$, the product referral trust projected probability also equals 1, so that the derived opinion $\omega_X^{A_1}$ is equal to the recommended opinion $\omega_X^{A_n}$. In case any of the referral trust relationships has projected probability $P_{A_{i+1}}^{A_i} = 0$, the product referral trust projected probability is also zero, so the derived opinion $\omega_X^{A_1}$ becomes vacuous.

Note that the operator for deriving opinions from arbitrarily long trust paths described here is a generalisation of the trust-discounting operator for combining only two edges, described by Definition 13.6 above.

As an example, consider the transitive trust network expressed as:

$$
[A, X] = [A; B] : [B; C] : [C; D] : [D, X] = [A; B; C; D, X] = [A; B; C; D] : [D, X] = [A; D] : [D, X]. \qquad (13.12)
$$

Table 13.2 provides example opinions for the trust path of Eq.(13.12), and shows the result of the trust-discounting computation. Although each referral trust edge has relatively high projected probability, their product quickly drops to a relatively low value, as expressed by $P_D^A = 0.44$. The trust-discounted opinion $\omega_X^{[A;D]}$ therefore becomes highly uncertain.

Table 13.2 Example trust discounting in a multi-edge path (columns: argument opinions, product of referral projected probabilities, derived opinion)

Parameter          | ω_B^A | ω_C^B | ω_D^C | ω_X^D | Product P_D^A | Derived ω_X^{[A;D]}
belief b           | 0.20  | 0.20  | 0.20  | 0.80  |               | 0.35
disbelief d        | 0.10  | 0.10  | 0.10  | 0.20  |               | 0.09
uncertainty u      | 0.70  | 0.70  | 0.70  | 0.00  |               | 0.56
base rate a        | 0.80  | 0.80  | 0.80  | 0.10  |               | 0.10
projected prob. P  | 0.76  | 0.76  | 0.76  | 0.80  | 0.44          | 0.41

This result reflects the intuition that a long path of indirect recommendations quickly becomes meaningless.
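The multi-edge computation of Definition 13.7 and Table 13.2 can be sketched as follows: the referral projected probabilities are multiplied along the path (Eq.(13.10)), and the product is used to discount the final functional opinion. The function and variable names are illustrative only.

# Sketch of multi-edge trust discounting in the spirit of Definition 13.7.
# Opinions are tuples (belief, disbelief, uncertainty, base_rate).

def projected(op):
    b, d, u, a = op
    return b + a * u

def multi_edge_discount(referral_opinions, final_opinion):
    p_path = 1.0
    for op in referral_opinions:          # Eq.(13.10): product of projected probabilities
        p_path *= projected(op)
    b, d, u, a = final_opinion
    return p_path, (p_path * b, p_path * d, 1.0 - p_path * (b + d), a)

# Argument opinions from Table 13.2.
referrals = [(0.20, 0.10, 0.70, 0.80)] * 3    # edges A->B, B->C, C->D, each with P = 0.76
final = (0.80, 0.20, 0.00, 0.10)              # D's functional opinion about X

p_path, derived = multi_edge_discount(referrals, final)
print(round(p_path, 2))                        # -> 0.44
print([round(v, 2) for v in derived])          # -> [0.35, 0.09, 0.56, 0.1]
print(round(projected(derived), 2))            # -> 0.41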
It is worth asking whether a trust path longer than a few edges can be practical. In everyday contexts we rarely rely on trust paths longer than a couple of edges. For example, few people would put much faith in advice delivered like this: "A colleague of mine told me that his sister has a friend who knows a good car mechanic, so therefore you should take your car to his garage". It is natural to become suspicious about the veracity of information or advice in situations with a high degree of separation between the analyst agent A and the original source of information about the target X, because we often see how information gets distorted when passed from person to person through many hops. This is because we humans are quite unreliable agents when it comes to truthful representation and forwarding of the information we receive. Computer systems, however, are able to propagate information correctly through multiple nodes with relatively good reliability. The application of trust transitivity through long chains is therefore better suited for computer networks than for human networks. MANETs (mobile ad hoc networks) and sensor networks represent a type of computer network where multiple nodes depend on each other for service provision. A typical characteristic of such networks is the uncertain reliability of each node, as well as the lack of control of one node over other nodes. Transitive trust computation with subjective logic is therefore highly relevant for MANETs and similar networks, even in the case of long trust paths.

13.4 Trust Fusion

It is common to collect recommendations from several sources in order to be better informed, e.g. when making decisions. This can be called trust fusion, meaning that the derived trust opinions resulting from multiple trust paths are fused.

Let us continue the example of Alice who needs to have her car serviced, where she has received a recommendation from Bob to use the car mechanic Eric. This time we assume that Alice has doubts about Bob's advice, so she would like to get a second opinion. She therefore asks her other colleague Claire for her opinion about Eric. The trust graph which includes both recommendations is illustrated in Figure 13.7.

Fig. 13.7 Example of trust fusion (A holds referral trust in B and C, who both hold functional trust in E; the derived functional trust edge connects A directly to E)

The formal notation of the graph of Figure 13.7 is shown in Eq.(13.13). This trust network also involves trust transitivity, which is computed with trust discounting.

$$
\begin{array}{ll}
\text{Trust fusion formal notation:} & [A, E] = ([A; B] : [B, E]) \diamond ([A; C] : [C, E])\\
\text{Compact notation:} & [A, E] = [A; B, E] \diamond [A; C, E]
\end{array}
\qquad (13.13)
$$

The computation of trust fusion involves both trust discounting and belief fusion, because what is actually fused is a pair of opinions computed with trust discounting. The trust target E in Figure 13.7 can be expressed as a binary variable X = {"E is reliable", "E is unreliable"}, so that A in fact derives a (trust) opinion about the variable X.

In general it is assumed that agent A receives opinions about target X from two sources B and C, and that A has referral trust opinions in both B and C, as illustrated in Figure 13.8.

Fig. 13.8 Fusion of trust-discounted opinions (arguments: A trusts B and C, while B and C believe X; result: A believes X with $\omega_X^{[A;B]\diamond[A;C]} = (\omega_B^A \otimes \omega_X^B) \oplus (\omega_C^A \otimes \omega_X^C)$)

The symbol $\diamond$ denotes fusion between the two trust paths [A; B, X] and [A; C, X] (in compact notation). The choice of this symbol was motivated by the resemblance between the diamond shape and the graph of a typical trust-fusion network, such as the one on the left-hand side of Figure 13.8. The expression for A's derived opinion about X as a function of trust fusion is given by Eq.(13.14).

$$
\text{Trust fusion computation:}\quad \omega_X^{[A;B]\diamond[A;C]} = (\omega_B^A \otimes \omega_X^B) \oplus (\omega_C^A \otimes \omega_X^C) \qquad (13.14)
$$

The operator for fusing trust paths must be selected from the set of fusion operators described in Chapter 11, according to the selection criteria described in Figure 11.2 on p.206. As an example, Eq.(13.14) shows trust fusion using the cumulative fusion operator '⊕'.
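The two steps of Eq.(13.14), trust discounting of each path followed by fusion, can be sketched as below. The cumulative-fusion expression used here is the standard one for two non-dogmatic binomial opinions with equal base rates; it is meant as a stand-in for Definition 11.5, so treat the exact formula as an assumption if it differs from the book's formulation. The values are those of the car-mechanic example, summarised in Table 13.3 below.

# Sketch of the trust-fusion computation of Eq.(13.14): discount each path, then fuse.
# Opinions are tuples (belief, disbelief, uncertainty, base_rate).

def projected(op):
    b, d, u, a = op
    return b + a * u

def discount(trust, rec):
    p = projected(trust)
    b, d, u, a = rec
    return (p * b, p * d, 1.0 - p * (b + d), a)

def cumulative_fuse(op1, op2):
    """Standard cumulative fusion of two non-dogmatic opinions with equal base rates."""
    b1, d1, u1, a1 = op1
    b2, d2, u2, a2 = op2
    assert abs(a1 - a2) < 1e-9, "this sketch assumes equal base rates"
    k = u1 + u2 - u1 * u2
    return ((b1 * u2 + b2 * u1) / k, (d1 * u2 + d2 * u1) / k, (u1 * u2) / k, a1)

# Argument opinions of the car-mechanic example (cf. Table 13.3 below).
trust_A_B, opinion_B_E = (0.40, 0.10, 0.50, 0.60), (0.90, 0.00, 0.10, 0.40)
trust_A_C, opinion_C_E = (0.50, 0.00, 0.50, 0.50), (0.80, 0.10, 0.10, 0.40)

path_AB = discount(trust_A_B, opinion_B_E)     # -> (0.630, 0.000, 0.370, 0.40)
path_AC = discount(trust_A_C, opinion_C_E)     # -> (0.600, 0.075, 0.325, 0.40)
fused = cumulative_fuse(path_AB, path_AC)
print([round(v, 3) for v in fused])            # -> [0.743, 0.048, 0.209, 0.4]
print(round(projected(fused), 3))              # -> 0.826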
Table 13.3 provides a numerical example showing the result of cumulative trust fusion in the example of Alice and the car mechanic of Figure 13.7. The trust opinions derived from each path are first computed with the trust-discounting operator of Definition 13.6 (which is a special case of Definition 13.7). Then the two derived trust opinions are fused with the cumulative fusion operator of Definition 11.5.

Table 13.3 Example trust fusion for the car-mechanic situation of Figure 13.7 (columns: argument opinions, intermediate trust-discounted opinions, derived fused opinion)

Parameter          | ω_B^A | ω_E^B | ω_C^A | ω_E^C | ω_E^{[A;B]} | ω_E^{[A;C]} | ω_E^{[A;B]⋄[A;C]}
belief b           | 0.40  | 0.90  | 0.50  | 0.80  | 0.630       | 0.600       | 0.743
disbelief d        | 0.10  | 0.00  | 0.00  | 0.10  | 0.000       | 0.075       | 0.048
uncertainty u      | 0.50  | 0.10  | 0.50  | 0.10  | 0.370       | 0.325       | 0.209
base rate a        | 0.60  | 0.40  | 0.50  | 0.40  | 0.400       | 0.400       | 0.400
projected prob. P  | 0.70  | 0.94  | 0.75  | 0.84  | 0.778       | 0.730       | 0.826

In this example, both Bob and Claire offer relatively strong recommendations about Eric, so that Alice's first derived trust in Eric is strengthened by asking Claire for a second opinion. Figure 13.9 shows a screenshot of the online demonstrator for subjective logic trust networks, based on the input arguments of the example in Table 13.3.

Fig. 13.9 Example trust fusion (screenshot of the online demonstrator, with opinion triangles placed on the edges of the trust network of Figure 13.7)

Figure 13.9 shows the same trust network as that of Figure 13.7, but where the opinion triangles are placed on top of each edge. The input arguments are represented as the four opinion triangles at the top of the figure, and the derived trust opinion is represented as the opinion triangle at the bottom of the figure. This trust-fusion example uses a combination of trust discounting and fusion. By combining fusion and trust discounting, complex trust networks can be modelled and analysed, as described in the following sections.

13.5 Trust Revision

13.5.1 Motivation for Trust Revision

A complicating element in case of trust fusion is when multiple sources provide highly conflicting advice, which might indicate that one or both sources are unreliable. In this case a strategy is needed for dealing with the conflict. The chosen strategy must be suitable for the specific situation.

A simplistic strategy for fusing conflicting opinions would be to consider the trust opinions as static, and not to revise trust at all. With this strategy the relying party only needs to determine the most suitable fusion operator for the situation to be analysed. For example, if averaging fusion is considered suitable, then a simple model would be to derive A's opinion about X according to the principle of trust fusion of Eq.(13.13) as follows:

$$
\text{Simple averaging fusion:}\quad \omega_X^{[A;B]\diamond[A;C]} = (\omega_B^A \otimes \omega_X^B) \oplus (\omega_C^A \otimes \omega_X^C) \qquad (13.15)
$$

However, there are several situations where simple fusion might be considered inadequate, and where it would be natural to revise one or both trust opinions. One such situation is when the respective opinions provided by B and C are highly conflicting in terms of their projected probability distributions on X. Note that in some situations such conflict might be natural, for example in case of short samples of random processes where a specific type of event might be observed in clusters. Another situation where highly conflicting beliefs might occur naturally is when the observed system can change characteristics over time, so that the observed projected probability distributions refer to different time periods.
However, if the sources B and C have observed the same situation or event, possibly at the same time, but still produce different opinions, then it is likely that one or both sources are unreliable, so that trust revision should be considered. Another situation that calls for trust revision is when the relying party A learns that the ground truth about X is radically different from the recommended opinions. Then the analyst has good reasons to distrust a source that recommends a very different opinion.

Since high conflict indicates that one or several sources might be unreliable, the strategy should aim at reducing the influence of unreliable sources in order to derive the most reliable belief in the target. A reduction of the influence of unreliable sources typically involves some form of trust revision, i.e. the analyst's trust in one or several advisors is reduced as a function of their degree of conflict. This section describes a strategy for how this should be done.

In conclusion, there are situations where revision of trust opinions is warranted. The method for doing this is described in the next section.

13.5.2 Trust Revision Method

Trust revision is based on the degree of conflict between the opinions derived through trust discounting in two different paths. The rationale is that conflict indicates that one or both advisors are unreliable, so that the referral trust in the advisors should be revised as a function of the degree of conflict.

Recall from Section 4.8 and Definition 4.20 that the degree of conflict (DC) is the product of the projected probability distance (PD) and the conjunctive certainty (CC), expressed as:

$$
\text{Degree of conflict:}\quad DC = PD \cdot CC \qquad (13.16)
$$

When applied to the two trust-discounted opinions $\omega_X^{[A;B]}$ and $\omega_X^{[A;C]}$, the projected probability distance PD is expressed as:

$$
PD\big(\omega_X^{[A;B]}, \omega_X^{[A;C]}\big) = \frac{\sum_{x\in\mathbb{X}} \big|\mathbf{P}_X^{[A;B]}(x) - \mathbf{P}_X^{[A;C]}(x)\big|}{2} \qquad (13.17)
$$

Similarly, when applied to the two trust-discounted opinions $\omega_X^{[A;B]}$ and $\omega_X^{[A;C]}$, the conjunctive certainty CC is expressed as:

$$
CC\big(\omega_X^{[A;B]}, \omega_X^{[A;C]}\big) = \big(1 - u_X^{[A;B]}\big)\big(1 - u_X^{[A;C]}\big) \qquad (13.18)
$$

The degree of conflict between the two trust-discounted opinions is then:

$$
DC\big(\omega_X^{[A;B]}, \omega_X^{[A;C]}\big) = PD\big(\omega_X^{[A;B]}, \omega_X^{[A;C]}\big) \cdot CC\big(\omega_X^{[A;B]}, \omega_X^{[A;C]}\big). \qquad (13.19)
$$

Knowing the degree of conflict is only one of the factors for determining the magnitude of revision for the referral trust opinions $\omega_B^A$ and $\omega_C^A$. It is natural to let the magnitude of trust revision also be determined by the relative degree of uncertainty in the referral trust opinions, so that the most uncertain opinion gets revised the most. The rationale is that if the analyst has uncertain referral trust in another agent, then the level of trust could easily change.

The uncertainty differential (UD) is a measure of the relative uncertainty between two referral trust opinions. There is one UD for each opinion relative to the other.

$$
\text{Uncertainty differentials:}\quad
\begin{cases}
UD(\omega_B^A \mid \omega_C^A) = \dfrac{u_B^A}{u_B^A + u_C^A}\\[8pt]
UD(\omega_C^A \mid \omega_B^A) = \dfrac{u_C^A}{u_B^A + u_C^A}
\end{cases}
\qquad (13.20)
$$

It can be seen that $UD \in [0, 1]$, where $UD = 0.5$ means that both referral trust opinions have equal uncertainty and therefore should get an equal share of the revision. The case $UD = 1$ means that the first referral trust opinion is infinitely more uncertain than the other and therefore should get all the revision. The case $UD = 0$ means that the other referral trust opinion is infinitely more uncertain than the first, so that the other opinion should get all the revision.
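The conflict measures of Eqs.(13.16) to (13.20) translate directly into a few lines of code; the sketch below assumes binomial opinions on a binary domain, with illustrative function names and example values.

# Conflict measures of Eqs.(13.16)-(13.20) for two binomial opinions on a binary domain.
# Opinions are tuples (belief, disbelief, uncertainty, base_rate).

def projected(op):
    b, d, u, a = op
    return b + a * u

def degree_of_conflict(op1, op2):
    """DC = PD * CC, Eq.(13.19)."""
    p1, p2 = projected(op1), projected(op2)
    pd = (abs(p1 - p2) + abs((1 - p1) - (1 - p2))) / 2   # Eq.(13.17) over both values of X
    cc = (1 - op1[2]) * (1 - op2[2])                     # Eq.(13.18)
    return pd * cc

def uncertainty_differentials(trust_b, trust_c):
    """Eq.(13.20): relative shares of revision for the two referral trust opinions."""
    u_b, u_c = trust_b[2], trust_c[2]
    return u_b / (u_b + u_c), u_c / (u_b + u_c)

# Illustrative values: two strongly conflicting and fairly certain opinions.
print(degree_of_conflict((0.8, 0.1, 0.1, 0.5), (0.1, 0.8, 0.1, 0.5)))   # -> 0.567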
In summary, the UD factors dictate the relative share of trust revision for each referral trust opinion. The magnitude of trust revision is determined by the revision factor RF, which is the product of DC and UD.

$$
\text{Revision factors:}\quad
\begin{cases}
RF(\omega_B^A) = UD(\omega_B^A \mid \omega_C^A) \cdot DC\big(\omega_X^{[A;B]}, \omega_X^{[A;C]}\big)\\[4pt]
RF(\omega_C^A) = UD(\omega_C^A \mid \omega_B^A) \cdot DC\big(\omega_X^{[A;B]}, \omega_X^{[A;C]}\big)
\end{cases}
\qquad (13.21)
$$

Trust revision consists of modifying the referral trust opinions by increasing distrust mass at the cost of trust mass and uncertainty mass. The idea is that sources found to be unreliable should be distrusted more. A source found to be completely unreliable should be absolutely distrusted. In terms of the opinion triangle, trust revision consists of moving the opinion point towards the distrust vertex, as shown in Figure 13.10. Given the argument referral trust opinion $\omega_B^A = (b_B^A, d_B^A, u_B^A, a_B^A)$, the revised referral trust opinion, denoted $\widetilde{\omega}_B^A$, is expressed as:

$$
\text{Revised opinion } \widetilde{\omega}_B^A:
\begin{cases}
\widetilde{b}_B^A = b_B^A - b_B^A \cdot RF(\omega_B^A)\\[2pt]
\widetilde{d}_B^A = d_B^A + (1 - d_B^A) \cdot RF(\omega_B^A)\\[2pt]
\widetilde{u}_B^A = u_B^A - u_B^A \cdot RF(\omega_B^A)\\[2pt]
\widetilde{a}_B^A = a_B^A
\end{cases}
\qquad (13.22)
$$

Similarly, given the argument referral trust opinion $\omega_C^A = (b_C^A, d_C^A, u_C^A, a_C^A)$, the revised referral trust opinion, denoted $\widetilde{\omega}_C^A$, is expressed as:

$$
\text{Revised opinion } \widetilde{\omega}_C^A:
\begin{cases}
\widetilde{b}_C^A = b_C^A - b_C^A \cdot RF(\omega_C^A)\\[2pt]
\widetilde{d}_C^A = d_C^A + (1 - d_C^A) \cdot RF(\omega_C^A)\\[2pt]
\widetilde{u}_C^A = u_C^A - u_C^A \cdot RF(\omega_C^A)\\[2pt]
\widetilde{a}_C^A = a_C^A
\end{cases}
\qquad (13.23)
$$

Figure 13.10 illustrates the effect of trust revision on $\omega_B^A$, which consists of making the referral trust opinion more distrusting.

Fig. 13.10 Revision of referral trust opinion $\omega_B^A$ (in the opinion triangle, the revised opinion point $\widetilde{\omega}_B^A$ lies closer to the distrust vertex than $\omega_B^A$)

After trust revision has been applied to produce $\widetilde{\omega}_B^A$ and $\widetilde{\omega}_C^A$, trust fusion according to Eq.(13.15) can be repeated, with reduced conflict. Trust-revised averaging fusion is given by the expression in Eq.(13.24) below.

$$
\text{Revised trust fusion:}\quad \widetilde{\omega}_X^{[A;B]\diamond[A;C]} = (\widetilde{\omega}_B^A \otimes \omega_X^B) \oplus (\widetilde{\omega}_C^A \otimes \omega_X^C) \qquad (13.24)
$$

Trust revision offers a strategy to handle situations where potentially unreliable sources give conflicting advice or recommendations, presumably because one or both of them give advice that is wrong or significantly different from the ground truth. Based on the degree of conflict, and also on the prior uncertainty in the referral trust opinions, trust revision determines the degree to which the referral trust opinions should be considered unreliable, and therefore be revised in order to have less influence on the derived fused belief. This process leads to more conservative results which take into account that the information sources might be unreliable.

13.5.3 Example: Conflicting Restaurant Recommendations

We continue the example from Section 13.3.3 about the recommended restaurant Xylo, where this time we assume that Alice stays in a hostel with another traveller named Claire, who tells her that she already tried the recommended restaurant, and that she was very disappointed because the food was very bad and there were no locals there.

Let us assume that Alice has spoken with Claire a few times, and judges her to be an experienced traveller, so she intuitively gets a relatively high trust in Claire. Now Alice has received a second piece of advice about the restaurant Xylo. This gives her a basis and a reason to revise her initial trust in Bob, which potentially could translate into distrusting Bob, and which could trigger a change in her initial belief about the restaurant Xylo.
This example situation can be translated into numbers, where $\omega_B^A$, $\omega_X^B$, $\omega_C^A$ and $\omega_X^C$ are the argument opinions. Let us first show the result of trust fusion without trust revision, where the derived opinion $\omega_X^{[A;B]\diamond[A;C]}$ is computed with simple trust fusion as:

$$
\text{Simple trust fusion:}\quad \omega_X^{[A;B]\diamond[A;C]} = (\omega_B^A \otimes \omega_X^B) \oplus (\omega_C^A \otimes \omega_X^C) \qquad (13.25)
$$

The argument opinions as well as the derived opinion are shown in Table 13.4. The application of trust fusion to the situation where Alice receives advice from both Bob and Claire produces a derived opinion with projected probability $P_X^A = 0.465$, which indicates that the chances of Xylo being a good or a bad restaurant are about even. This result seems counter-intuitive.

Table 13.4 Simple trust fusion of conflicting advice about the restaurant (columns: argument opinions, intermediate trust-discounted opinions, derived fused opinion)

Parameter          | ω_B^A | ω_X^B | ω_C^A | ω_X^C | ω_X^{[A;B]} | ω_X^{[A;C]} | ω_X^{[A;B]⋄[A;C]}
belief b           | 0.00  | 0.95  | 0.90  | 0.10  | 0.855       | 0.099       | 0.452
disbelief d        | 0.00  | 0.00  | 0.00  | 0.80  | 0.000       | 0.792       | 0.482
uncertainty u      | 1.00  | 0.05  | 0.10  | 0.10  | 0.145       | 0.109       | 0.066
base rate a        | 0.90  | 0.20  | 0.90  | 0.20  | 0.200       | 0.200       | 0.200
projected prob. P  | 0.90  | 0.96  | 0.99  | 0.12  | 0.884       | 0.121       | 0.465

Although the projected probabilities $P_B^A = 0.90$ and $P_C^A = 0.99$ are relatively close, Alice's referral trust belief $b_C^A = 0.90$ in Claire is much stronger than her referral trust belief $b_B^A = 0.00$ in Bob. It would therefore seem natural to let Claire's advice carry significantly more weight, but in the case of simple trust fusion as expressed in Table 13.4 it does not.

From an intuitive perspective, Alice's natural reaction in this situation would be to revise her referral trust in Bob, because his advice conflicts with that of Claire, whom she trusts with more certainty, i.e. with more belief mass. Alice would typically start to distrust Bob because his advice is apparently unreliable. As a result of this trust revision, Claire's advice would carry the most weight.

The application of the trust revision method described in Section 13.5.2 produces the following intermediate values:

$$
\text{Conflict:}\quad
\begin{cases}
\text{Projected distance:} & PD\big(\omega_X^{[A;B]}, \omega_X^{[A;C]}\big) = 0.763\\
\text{Conjunctive certainty:} & CC\big(\omega_X^{[A;B]}, \omega_X^{[A;C]}\big) = 0.762\\
\text{Degree of conflict:} & DC\big(\omega_X^{[A;B]}, \omega_X^{[A;C]}\big) = 0.581
\end{cases}
\qquad (13.26)
$$

$$
\text{Revision:}\quad
\begin{cases}
\text{Uncertainty differential for } B: & UD(\omega_B^A \mid \omega_C^A) = 0.909\\
\text{Uncertainty differential for } C: & UD(\omega_C^A \mid \omega_B^A) = 0.091\\
\text{Revision factor for } B: & RF(\omega_B^A) = 0.529\\
\text{Revision factor for } C: & RF(\omega_C^A) = 0.053
\end{cases}
\qquad (13.27)
$$

These intermediate parameters of Eq.(13.26) and Eq.(13.27) determine the trust-revised referral trust opinions $\widetilde{\omega}_B^A$ and $\widetilde{\omega}_C^A$ that are specified in Table 13.5. The table also shows the result of applying trust fusion based on the revised referral trust opinions.

Table 13.5 Trust revision of conflicting advice about the restaurant (columns: revised and original argument opinions, intermediate trust-discounted opinions, derived fused opinion)

Parameter          | ω̃_B^A | ω_X^B | ω̃_C^A | ω_X^C | ω̃_X^{[A;B]} | ω̃_X^{[A;C]} | ω̃_X^{[A;B]⋄[A;C]}
belief b           | 0.00   | 0.95  | 0.85   | 0.10  | 0.403        | 0.094        | 0.180
disbelief d        | 0.53   | 0.00  | 0.05   | 0.80  | 0.000        | 0.750        | 0.679
uncertainty u      | 0.47   | 0.05  | 0.10   | 0.10  | 0.597        | 0.156        | 0.141
base rate a        | 0.90   | 0.20  | 0.90   | 0.20  | 0.200        | 0.200        | 0.200
projected prob. P  | 0.42   | 0.96  | 0.94   | 0.12  | 0.522        | 0.125        | 0.208

It can be seen that Alice's revised referral trust in Bob has been reduced significantly, whereas her referral trust in Claire has been kept more or less unchanged. This reflects the intuitive reaction that we would have in a similar situation.
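The whole revision pipeline of this example can be verified end to end with a short script: discount both paths, compute the conflict and revision factors of Eqs.(13.26) and (13.27), revise the referral trust opinions with Eqs.(13.22) and (13.23), and fuse again. The fusion expression is the standard cumulative one for non-dogmatic opinions with equal base rates, used here as a stand-in for Definition 11.5; all function names are illustrative.

# End-to-end sketch of the conflicting-recommendation example of Tables 13.4 and 13.5.
# Opinions are tuples (belief, disbelief, uncertainty, base_rate).

def projected(op):
    b, d, u, a = op
    return b + a * u

def discount(trust, rec):
    p = projected(trust)
    b, d, u, a = rec
    return (p * b, p * d, 1.0 - p * (b + d), a)

def fuse(op1, op2):
    b1, d1, u1, a1 = op1
    b2, d2, u2, a2 = op2
    k = u1 + u2 - u1 * u2
    return ((b1 * u2 + b2 * u1) / k, (d1 * u2 + d2 * u1) / k, (u1 * u2) / k, a1)

def revise(trust, rf):
    """Eqs.(13.22)/(13.23): shift belief and uncertainty mass towards distrust."""
    b, d, u, a = trust
    return (b - b * rf, d + (1 - d) * rf, u - u * rf, a)

# Argument opinions from Table 13.4.
tAB, oBX = (0.00, 0.00, 1.00, 0.90), (0.95, 0.00, 0.05, 0.20)
tAC, oCX = (0.90, 0.00, 0.10, 0.90), (0.10, 0.80, 0.10, 0.20)

pathB, pathC = discount(tAB, oBX), discount(tAC, oCX)
p1, p2 = projected(pathB), projected(pathC)
pd = (abs(p1 - p2) + abs((1 - p1) - (1 - p2))) / 2        # Eq.(13.17) -> approx. 0.763
cc = (1 - pathB[2]) * (1 - pathC[2])                      # Eq.(13.18) -> approx. 0.762
dc = pd * cc                                              # Eq.(13.19) -> approx. 0.581
udB = tAB[2] / (tAB[2] + tAC[2])                          # Eq.(13.20) -> approx. 0.909
udC = tAC[2] / (tAB[2] + tAC[2])                          #             -> approx. 0.091
rfB, rfC = udB * dc, udC * dc                             # Eq.(13.21) -> approx. 0.529, 0.053

result = fuse(discount(revise(tAB, rfB), oBX), discount(revise(tAC, rfC), oCX))
print([round(v, 2) for v in result], round(projected(result), 2))
# -> [0.18, 0.68, 0.14, 0.2] 0.21, matching the derived opinion in Table 13.5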
Trust revision must be considered to be an ad hoc method, because there is no parallel in physical processes that can be objectively observed and analysed. Especially the expression for the revision factor RF is affected by the design choice of mirroring intuitive human judgment. There might be different design choices for the revision factor that better reflect human intuition, and that also can be shown to produce sound results under specific criteria. We invite the reader to reflect on these issues, and maybe come up with an alternative and improved design for the revision factor. Chapter 14 Trust Networks Trust networks represent chained trust and belief relationships between an analyst agent and a target entity or variable. Simple trust networks have already been described in Chapter 13 on computational trust, and included where the computation of transitive trust paths, and the computation of simple trust fusion networks. In case of more complex trust networks it is necessary to take an algorithmic approach to modeling and analysis. This chapter describes how to deal with trust networks that are more complex than those described in Chapter ??. The operators for fusion, trust discounting and trust revision can be applied to trust and belief relationships represented as a DSPG (Directed Series-Parallel Graphs). The next section therefore gives a brief introduction to such graphs. 14.1 Graphs for Trust Networks 14.1.1 Directed Series-Parallel Graphs Series-parallel graphs, called SP-graphs for short, represent a specific type of graphs that have a pair of distinguished vertices called source and sink. The following definition of SP-graphs is taken from [18]. Definition 14.1 (Series-Parallel Graph). A graph is an SP-graph, if it may be turned into a single edge connecting a source node s and a sink node t by a sequence of the following operations: (i) Replacement of a pair of edges incident to a vertex of degree 2 other than the source or sink with a single edge. (ii) Replacement of a pair of parallel edges with a single edge that connects their common endpoint vertices. ⊔ ⊓ 263 264 14 Trust Networks For example, Figure 14.1 illustrates how the SP-graph to the left can be stepwise transformed using the operations of Definition 14.1. SP-graph Transform 1 Transform 2 Transform 3 Transform 4 s s s s s t t t t t Fig. 14.1 Procedure for transforming and SP-graph into a single edge. Transform 1 results from applying procedure (i) 4 times. Transform 2 results from applying procedure (ii) twice. Transform 3 results from applying procedure (i) twice. Transform 4, which is a single edge, results from applying procedure (ii) once. The fact that the graph transforms into a single edge in this way proves that it is a SP-graph. Trust networks are represented as directed graphs. We therefore assume that an SP-graph representing a trust network is directed from the source to the sink, in which case it is called a directed series-parallel graph or DSPG for short [25]. Definition 14.2 (Directed Series-Parallel Graph). A graph is a DSPG (Directed Series-Parallel Graph) iff it is a SP-graph according to Definition 14.1 and it only consists of directed edges that form paths without loops from source to the sink. ⊓ ⊔ In the context of trust networks the source node of an DSPG is the analyst agent, aka. relying party, which is typically represent by the label A. 
In general, the sink node of a DSPG is the target variable which is typically represented by the label X, but which can also be a trusted entity represented e.g. by the label E. 14.2 Outbound-Inbound Node Set A DSPG (Directed Series Parallel Network) can consist of multiple subnetworks that themselves are sub-DSPGs. A parallel-path subnetwork (PPS) is a subnetwork that consists parallel paths in a DSPG. A node can be part of one or multiple edges. In general the degree of a node represents the number of edges that a node is part of. Since a DSPG is directed it is possible to distinguish between inbound degree (ID) and outbound degree (OD) for 14.2 Outbound-Inbound Node Set 265 a node. The inbound degree of a node is the number if inbound edges to that node. Similarly, the outbound degree of a node is the number of outbound edges from that node. Consider for example the following referral trust network: A 99K B 99K C Node B has degree 2, denoted as Deg(B) = 2, because it is connected to the two edges [A; B] and [B;C]. At the same time, node B has inbound degree ID(B) = 1 because its only inbound edge is [A; B], and has outbound degree OD(B) = 1 because its only outbound edge is [B;C]. Node A has ID(A) = 0 and OD(A) = 1. Obviously, for any node V in a DSPG, its degree is represented as Deg(V ) = ID(V ) + OD(V ). An ordered pair of nodes (Vs ,Vt ) in a DSPG is said to be connected if the second node Vt can be reached by departing from the first node Vs . From the simple example of a 3-node network above it can easily be seen that (A,C) is a connected pair of nodes. Consider a node in a DSPG. We define the outbound node set of the node as the set of edges that can be traversed after departing from that node. Similarly, the inbound node set of a node in a DSPG is the set of edges that can be traversed before reaching the node. Definition 14.3 (OINS: Outbound-Inbound Node Set). Consider an ordered pair of nodes (Vs ,Vt ) in a DSPG. We define the outbound-inbound node set (OINS) of the ordered pair to be the intersection of the outbound set of the first node Vs and the inbound set of the second node Vt . ⊔ ⊓ Some simple properties of an OINS can be stated. Theorem 14.1. A pair of of nodes (Vs ,Vt ) in a DSPG are connected iff their OINS (Outbound-Inbound Node Set) is non-empty. Proof. If OINS 6= 0, / then the OINS contains at least one edge that can be traversed after departing from the first node Vs and that can be traversed before reaching the second node Vt , which means that it is possible to reach the second node Vt by departing from the first node Vs , so the nodes are connected. If OINS = 0, / then the OINS contains no path connecting the two nodes, which means that they are not connected. ⊓ ⊔ 14.2.1 Parallel-Path Subnetworks A DSPG can in general consist of multiple subnetworks that themselves are DSPGs that can contain parallel paths. We are interested in identifying subnetworks within a DSPG that contain parallel paths. A parallel-path subnetwork (PPS) in a DSPG is the set of multiple paths between a pair of connected nodes, as defined next. 266 14 Trust Networks Definition 14.4 (parallel-path subnetwork). Select an ordered pair (Vs ,Vt ) of connected nodes in a DSPG. The subnetwork consisting of the pair’s OINS is a parallelpath subnetwork (PPS) iff both the outbound degree of the first node Vs in OINS satisfies OD(Vs ) ≥ 2, and the inbound degree of the second node Vt in the OINS satisfies ID(Vt ) ≥ 2. The node Vs is called the source of the PPS, and Vt is called the sink of the PPS. 
⊔ ⊓ Consider for example the OINS of the node pair (C, J) in Figure 14.2. Within that particular OINS we have OD(C) = 2 and ID(J) = 3 which satisfies the requirements of Definition 14.4, so that the OINS is a PPS (parallel-path subnetwork). F D A B C I J E Legend: X G Referral trust H Functional trust / belief Fig. 14.2 DSPG with 5 PPSs (parallel-path subnetworks). It can also be verified that the respective OINSs of the node pairs (E, J), (C, F), (F, X) and (C, X) are also PPSs, which means that the DSPG of Figure 14.2 contains 5 PPSs in total. However, the sub-network between the connected node pair (E, X) is not a PPS because ID(X) = 1 within that OINS (outbound-inbound node set), which does not satisfy the requirements of Definition 14.4. 14.2.2 Nesting Level The concept of nesting level is important for analysing trust networks represented as a DSPG. In general the nesting level of an edge reflects how many PPSs it is part of in the DSPG. Each edge has a specific nesting level equal or greater than 0. For example, a trust network consisting of a single trust path has trust edges with nesting level 0, because the edges are not part of any subnetwork of parallel paths. The nesting level of an edge in a DSPG is defined next. Definition 14.5 (Nesting Level). Assume a DSPG consisting of multiple nodes connected via directed edges. The nesting level of an edge in the DSPG is equal to the number of PPSs (parallel-path subnetworks) that the edge is a part of. 14.3 Analysis of DSPG Trust Networks 267 Let e.g. [Vm ;Vn ] be an edge in a DSPG. The nesting level of the edge [Vm ;Vn ] is formally denoted NL([Vm ;Vn ]). The nesting level of an edge is an integer that can be zero or greater. ⊔ ⊓ In Figure 14.3 the nesting level of edges is indicated by the numbered diamonds on the edges. F 2 2 2 2 D I 2 2 A B 0 0 C 2 G 2 E Legend: 1 J 3 3 # X 2 H Nesting level 3 Referral trust Functional trust / belief Fig. 14.3 Nesting levels of edges in a DSPG. It can be seen that the edge [A; B] is not part of any PPS, so that NL([A; B]) = 0. It can also be seen that the edge [H; J] is part of 3 PPSs belonging to the node pairs (E, J), (C, F) and (C, X), so that NL([H; J]) = 3. The nesting level determines the order of computation of trust in a DSPG trust network, as described next. 14.3 Analysis of DSPG Trust Networks We assumes that the trust network to be analysed is represented in the form of a DSPG. It can be verified that the trust network of Figure 14.4 represents a DSPG (Directed Series-Parallel Graph) according to Definition 14.2. Analyst A B C D Legend: E F I G J H Referral trust Functional trust / belief Fig. 14.4 Trust network in the form of a DSPG. Target X 268 14 Trust Networks It can also be seen that the DSPG of Figure 14.4 consists of 3 PPSs (parallel-path subnetworks), represented by the source-sink pairs (D, J), (A, J) and (A, X). The compact formal expression for the trust network of Figure 14.4 is given in Eq.(14.1). [A, X] = [A; B; E; I, X] ⋄ (([A;C; F; J] ⋄ ([A; D] : ([D; G; J] ⋄ [D; H; J]))) : [J, X]). (14.1) Next we describe a simple algorithm for analysing and deriving belief/trust from a DSPG trust network like that of Figure 14.4. 14.3.1 Algorithm for Analysis of DSPG The procedure for computing derived trust in a trust network represented as a DSPG is explained below in the form of a flowchart algorithm. The procedure can e.g. be applied for computing the trust opinion ωXA in Figure 14.4. The procedure corresponds well to the operations of graph simplification of Definition 14.1. 
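Before the individual flowchart steps are listed, the graph bookkeeping from Section 14.2 that they rely on, i.e. inbound and outbound degrees, reachability, and the OINS test of Theorem 14.1, can be sketched from an edge list as follows. The node names and the example edges are hypothetical, not taken from the book's figures.

# Sketch of the graph bookkeeping of Section 14.2 for a DSPG given as an edge list.
# The node names and edges are hypothetical example values.

from collections import defaultdict

edges = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C"), ("C", "X")]

out_edges = defaultdict(list)
in_deg, out_deg = defaultdict(int), defaultdict(int)
for s, t in edges:
    out_edges[s].append(t)
    out_deg[s] += 1
    in_deg[t] += 1

def reachable(start):
    """All nodes reachable from 'start' by following directed edges."""
    seen, stack = set(), [start]
    while stack:
        for nxt in out_edges[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def oins(v_s, v_t):
    """Edges that can be traversed after departing v_s and before reaching v_t
    (Definition 14.3); by Theorem 14.1 this set is non-empty iff (v_s, v_t) is connected."""
    after_s = reachable(v_s) | {v_s}
    return [(s, t) for (s, t) in edges
            if s in after_s and (t == v_t or v_t in reachable(t))]

print(out_deg["A"], in_deg["C"])   # -> 2 2, so (A, C) spans a parallel-path subnetwork
print(oins("A", "C"))              # -> the four edges lying between A and C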
For the purpose of the computation principles defined here, agents and target are called nodes. Each step in the flowchart algorithm in Figure 14.5 is described below. (a) Prepare for analysing the trust network. This includes representing the trust network as a set of directed edges with pairs of nodes. Verify that the trust network is indeed a DSPG. (b) Identify each PPS (Parallel Path Subnetwork) with its pair of source and target nodes (Vs ,Vt ). Determine the nesting level of every edge as a function of the number of PPSs it is part of. (c) Select the PPS where all edges have the highest nesting level, and proceed to (d). In case no PPS remains proceed to (g). (d) For the selected PPS, compute 2-edge or multi-edge trust discounting of every path between Vs and Vt , where the node Vt is considered as a target node of the analysis. As a result, every path is transformed into an edge. (e) For the selected PPS, compute trust fusion of all edges. As a result, the PPS is transformed into a single edge. (f) Determine the nesting level of the edge that now replaces the selected PPS. (g) When no PPS exists, the trust network might still consist of a series of edges. In that case compute 2-edge or multi-edge trust discounting. In case the resulting network consists of a single edge, nothing needs to be done. (h) The trust network has now been transformed into a single edge between the analyst and the final target, and the computation is completed. A parser that implements the computational algorithm of Figure 14.5 is able to analyse the graph of e.g. Figure 14.4 and derive the opinion ωXA . 14.3 Analysis of DSPG Trust Networks (a) (b) 269 Prepare for computational trust analysis of DSPG Determine nesting levels of all edges, and find every PPS (c) Select PPS with highest nesting level PPS found no PPS found Compute trust (d) discounting for each path in PPS Compute trust discounting for last path in DSPG (g) Compute trust fusion (e) of edges that replace paths in PPS Completed DSPG trust computation (h) (f) Determine nesting level of edge that replaces PPS Fig. 14.5 Flowchart algorithm for computational trust analysis of DSPG network 14.3.2 Soundness Requirements for Trust Recommendations It is important that a recommendation is always passed in its original form from the recommender to the relying party, and not in the form of secondary derived trust. The reason for this is explained below. Figure 14.6 shows an example of how not to provide recommendations. In Figure 14.6 the trust and recommendation arrows are indexed according to the order in which they are formed whereas the initial trust relationships have no index. In the scenario of Figure14.6 D passes his recommendation about X to B and C (index 2) so that B and C are able to derive their opinions about X (index 3). Now B and C pass their derived opinions about X to A (index 4) so that she can derive her opinion about X (index 5). As a result A perceives the topology to be ([A; B, X] ⋄ [A;C, X]). Note that we use the compact notation presented in Section 13.2.4. The problem with the scenario of Figure 14.6 is that A ignores the presence of D so that A in fact derives a hidden topology that is different from the perceived topol- 270 14 Trust Networks B 4 2 A 1 3 D 1 1 C 1 X 1 2 4 3 incorrectly derived belief 4 Legend: referral trust recommendation belief / functional trust Fig. 14.6 Incorrect recommendation ogy, which in both in fact are different from the real topology. The three different topologies are given in Table 14.1. 
Table 14.1 Inconsistent trust network topologies Perceived topology: Hidden topology: Real topology: ([A; B, X] ⋄ [A;C, X]) ([A; B; D, X] ⋄ [A;C; D, X]) ([A; B; D] ⋄ [A;C; D]) : [D, X] The reason for this inconsistency is that B’s belief relationship [B, X] is derived from [B; D, X], and C’s belief relationship [C, X] is derived from [C; D, X]. So when [B;D] B recommends ωXB he implicitly recommends ωX , and when C recommends ωXC [C;D] she implicitly recommends ωX , but A ignores the influence of D in the received recommendations [36]. It can easily be seen that neither the perceived nor the hidden topology is equal to the real topology, which shows that this way of passing recommendations can produces inconsistent results. The sound way of sending the recommendations is to let B and C pass the recommended opinions they receive from D ‘as is’ without modification, as well as their respective trust opinions in D. This principle is certainly possible to follow, but it also requires that A is convinced that B and C have not altered the recommendations from D, which precisely is part of A’s referral trust in B and C. It is thus necessary that A receives all the trust recommendations unaltered and as expressed by the original recommending party. An example of a correct way of passing recommendations is indicated in Figure 14.7 14.3 Analysis of DSPG Trust Networks 2 A 271 B 2 1 1 D 1 C 1 X 1 2 Legend: referral trust recommendation belief / functional trust correctly derived belief 3 Fig. 14.7 Correct recommendation In the scenario of Figure 14.7 the perceived topology is equal to the real topology which can be expressed as: [A, X] = ([A; B; D] ⋄ [A;C; D]) : [D, X] (14.2) The lesson to be learned from the scenarios of Figure 14.6 and Figure 14.7 is that there is a crucial difference between recommending trust/belief in an entity or variable resulting from your own experience with that entity or domain, and recommending trust/belief in an entity or variable which has been derived as a result of recommendations received from others. The morale is that analysts should be aware of the difference between direct trust/belief and derived trust/belief. Figure 14.6 illustrated how problems can occur when derived belief is recommended, so the rule is to only recommend direct belief [36]. However, it is not always possible to follow this principle, but simply being aware of the potential inconsistencies is useful when assessing the results of an analysis, or when considering mitigation strategies against inconsistencies. If B and C were unreliable they might for example try to change the recommended trust measures. Not only that, any party that is able to intercept the recommendations from B, C, or D to A might want to alter the trust values, and A needs to receive evidence of the authenticity and integrity of the recommendations. Cryptographic security mechanisms can typically be used to solve this problem. 272 14 Trust Networks 14.4 Analysing Complex Non-DSPG Trust Networks An analyst might be confronted with a trust network that appears more complex than a DSPG. It is desirable not to put any restrictions on the possible trust network topology that can be analysed, except that it should not be cyclic. This means that the set of possible trust paths from the analyst agent A to the target X can contain paths that are inconsistent with a DSPG. The question then arises how such a trust network should be analysed. A complex non-DSPG trust network is one that is not a SP-graph according to Definition 14.1. 
In many cases it can be challenging to recognize which referral trust components are in series and which are in parallel in a complex network. Figure 14.8 illustrates a good example of such a complex trust network. The trust network only consists of the edges from A to E and from A to F. The trust network thus consists of referral trust edges only. The last edges from E and F to the target X represent functional belief relationships which could also be considered to be functional trust relationships according to Figure 13.2. However, for the purpose of trust network analysis described below, only the referral trust network from A to E and F is relevant. B Analyst A E X C D Target F Legend: Referral trust Functional belief / trust Fig. 14.8 Non-DSPG trust network. Simply by looking at Figure 14.8 it is obvious that this trust network can not be broken down into groups of series-parallel paths, which complicates the problem of computing trust from the network. In case of a DSPG which can be split into series-parallel configurations, it is simple to determine the mathematical or analytical formula that describes the network’s derived trust. However, for a complex non-DSPG network, trust computation requires more involved methods. Transitive trust networks can be digitally represented and stored in the form of a list of directed trust edges with additional attributes such as trust scope σ , time of collection, trust variant (referral or functional) and trust opinion. Based on the list of edges, an automated parser can establish valid DSPGs between two nodes depending on the need. The trust edges of the non-DSPG trust network of Figure 14.8 can for example be listed as in Table 14.4. 14.4 Analysing Complex Non-DSPG Trust Networks 273 Table 14.2 Trust edges of Figure 14.8 Source Vs Target VT Scope Variant Opinion A B σ referral ωBA A C σ referral ωCA A D σ referral ωDA B E σ referral ωEB B F σ referral ωFB D F σ referral ωFD E X σ functional ωXE F X σ functional ωXF There can be multiple approaches to analysing non-DSPG trust networks. This task might appear similar to the case of reliability analysis of complex systems, described in Section 7.2. However, the case of complex trust networks is fundamentally different. The main differences are that trust networks involve possible deception which system reliability networks do not, and that trust networks involve fusion, which is not an ingredient of system reliability analysis. The principles for analysing complex reliability networks can therefore not be applied to complex trust networks. A different approach is needed for analysing non-DSPG trusts networks. A non-DSPG trust network must be simplified in order to remove paths that prevent consistent trust computation. This process produces a DSPG trust network which can easily be analysed. The optimal derived DSPG trust network is the one that produces the highest certainty of the derived opinion. Note that the goal is to maximise certainty in the derived opinion, and not e.g. to deriving the opinion with the highest projected probability of some value of the variable X. There is a trade-off between the time it takes to find the optimal DSPG, and how close to the optimal DSPG a simplified graph can can be. Below we describe an exhaustive method that is guaranteed to find the optimal DSPG, and a heuristic method that will find a DSPG close to, or equal to the optimal DSPG. 
• Exhaustive Discovery of Optimal DSPG Trust Network
The exhaustive method of finding the optimal DSPG trust network consists of determining all possible DSPGs, then deriving the trust value for each one of them, and finally selecting the DSPG and the corresponding canonical expression that produces the trust value with the highest confidence level. The computational complexity of this method is $Comp = l\, m\, (2^n - 1)$, where n is the number of possible paths, m is the average number of paths in the DSPGs, and l is the average number of arcs in the paths.

• Heuristic Discovery of Near-Optimal DSPG Trust Network
The heuristic method of finding a near-optimal DSPG trust network consists of constructing the graph by including new paths one by one in decreasing order of confidence. Each new path that would turn the graph into a non-DSPG and break canonicity is excluded. This method only requires the computation of the trust value for a single DSPG and canonical expression, with computational complexity $Comp = l\, m$, where m is the average number of paths in the DSPG, and l is the average number of arcs in the paths.

The heuristic method will produce a DSPG trust network whose derived opinion/trust has a certainty level equal or close to that of the optimal DSPG trust network. The reason why this method is not guaranteed to produce the optimal DSPG is that it could exclude multiple trust paths with relatively low certainty levels because of conflict with a single path with a higher certainty level that was previously included. It is possible that the low-certainty paths together could provide higher certainty than the previous high-certainty path alone. In such cases it would have been optimal to exclude the single high-certainty path and instead include the low-certainty paths. Only the exhaustive method described above is guaranteed to find the optimal DSPG in such cases.

The next section describes a heuristic method for transforming a non-DSPG trust network into a DSPG trust network. It can be mentioned that an alternative approach to constructing efficient networks from a potentially large and complex network has been described in [61], where it is called discovery of small worlds. However, we do not follow that approach here.

14.4.1 Synthesis of DSPG Trust Network

Below we describe an algorithm which is able to simplify a non-DSPG trust network like the one in Figure 14.8 in order to synthesise a DSPG trust network. Simplification of a non-DSPG trust network is a two-step process. First, the non-DSPG trust network is analysed to identify all possible trust paths from the analyst A to the target X. Secondly, a new DSPG trust network is synthesised from scratch by only including certain trust paths from the non-DSPG trust network, in a way that preserves the DSPG property of the synthesised trust network. The synthesised graph between the source analyst A and the target node X is then a DSPG trust network.

A DSPG can be constructed by sequences of serial and parallel compositions that are defined as follows [25]:

Definition 14.6 (Directed Series and Parallel Composition).
• A directed series composition consists of replacing an edge [A;C] with two edges [A; B] and [B;C], where B is a new node.
• A directed parallel composition consists of replacing an edge [A;C] with two edges [A;C]1 and [A;C]2.

The principle of directed series and parallel composition is illustrated in Figure 14.9.
A A B C A C C A C a) Serial composition b) Parallel composition Fig. 14.9 Principles of directed series and parallel composition. Figure 14.10 shows a flowchart algorithm for synthesising a DSPG trust network from a non-DSPG trust network according to the heuristic method. Each step in the flowchart algorithm of Figure 14.10 is described below. (a) Prepare for simplification of non-DSPG trust network. This includes representing the non-DSPG trust network as a set of directed edges with pairs of nodes. Set a threshold pT for the lowest relevant reliability of trust paths. Create an empty DSPG trust network to be synthesised. (b) Identify each trust path from the analyst A to the target node X. For each path, compute the product of the projected probabilities of referral trust edges. The last functional belief/trust edge to the target X is not included in the product. (c) Create a ranked list of paths according to the products computed in the previous step, i.e. where the path with the highest product has index 1. Set pointer to 0. (d) Increment the pointer to select the next path from the ranked list of paths. Exit to termination if there is no path left, or if the product is smaller than the threshold pT . Continue in case there is a path with product greater or equal to the threshold reliability pT . (e) Check if the selected trust path can be added and integrated into the DSPG trust network. Use the criteria described in Section 14.4.2. (f) Add the selected trust path in case it fits into the DSPG. Existing trust edges are not duplicated, only new trust edges are added to the DSPG. (g) Drop the selected trust path in case it does not fit into the DSPG. (h) The synthesised DSPG can be analysed according to the algorithm described in Section 14.3. 276 14 Trust Networks (a) (b) Prepare non-DSPG trust network for synthesis of DSPG trust network Trace all trust paths in non-DSPG from analyst A to target X Rank all paths based on (c) product of referral trust projected probabilities (d) Select next path from ranked list of paths no path found path found (e) (g) Drop path no Can path be integrated into DSPG ? Completed synthesis (h) of DSPG trust network yes (f) Integrate path into DSPG trust network Fig. 14.10 Flowchart algorithm for synthesising a DSPG from a complex non-DSPG 14.4.2 Requirements for DSPG Synthesis Ideally, all the possible paths discovered by the algorithm of Figure 14.10 should be taken into account when deriving the opinion/trust value. A general directed graph will often contain loops and dependencies. This can be avoided by excluding certain paths, but this can also cause information loss. Specific selection criteria are needed in order to find the optimal subset of paths to include. With n possible paths, there are (2n − 1) different combinations for constructing graphs, of which not all necessarily are DSPGs. The algorithm of Figure 14.10 aims at synthesising a DSPG trust network with the least information loss relative to the original non-DSPG trust network. Figure 14.11 illustrates an simple non-DSPG trust graph, where it is assumed that A is the source analyst and X is the target. The two heuristic rules used to discard paths are 1) when a path is inconsistent with a DSPG, and 2) when the product of projected probabilities drops below a predefined threshold. 
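The path-tracing and ranking part of the synthesis algorithm, steps (b) to (d) of Figure 14.10, can be sketched as follows. The edge layout mirrors the simple network of Figure 14.11 (compare the three paths of Eq.(14.3)), while the opinion values and the threshold are hypothetical; the DSPG-integration check of step (e) is only indicated, since it requires the nesting-level tests of Definition 14.7.

# Sketch of steps (b)-(d) of the DSPG-synthesis algorithm of Figure 14.10:
# trace all loop-free paths from analyst A to target X, rank them by the product
# of the referral-trust projected probabilities, and keep those above a threshold.
# The opinion values and the threshold p_T are hypothetical example values.

def projected(op):
    b, d, u, a = op
    return b + a * u

edges = {
    ("A", "B"): (0.7, 0.1, 0.2, 0.5), ("A", "D"): (0.5, 0.1, 0.4, 0.5),
    ("B", "C"): (0.8, 0.0, 0.2, 0.5), ("B", "D"): (0.6, 0.2, 0.2, 0.5),
    ("D", "C"): (0.9, 0.0, 0.1, 0.5), ("C", "X"): (0.8, 0.1, 0.1, 0.5),
}

def all_paths(src, dst):
    succ = {}
    for s, t in edges:
        succ.setdefault(s, []).append(t)
    stack = [(src, [src])]
    while stack:
        node, path = stack.pop()
        if node == dst:
            yield path
            continue
        for nxt in succ.get(node, []):
            if nxt not in path:                       # no loops
                stack.append((nxt, path + [nxt]))

def referral_reliability(path):
    """Product of projected probabilities over the referral edges only;
    the final functional edge to the target is not included (step (b))."""
    prob = 1.0
    for s, t in zip(path[:-2], path[1:-1]):
        prob *= projected(edges[(s, t)])
    return prob

p_T = 0.3                                              # reliability threshold from step (a)
for path in sorted(all_paths("A", "X"), key=referral_reliability, reverse=True):
    rel = referral_reliability(path)
    if rel < p_T:
        continue                                       # step (d): drop unreliable paths
    # Step (e) would go here: include the path only if it can be integrated into the
    # DSPG under construction according to the requirements of Definition 14.7.
    print(path, round(rel, 3))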
In the algorithm of Figure 14.10 on p.276 it can be noted that step (d) enforces the rule that the product projected probability of the referral trust edges in the graph 14.4 Analysing Complex Non-DSPG Trust Networks 277 B A D X C Legend: Referral trust Functional belief / trust Fig. 14.11 Simple non-DSPG trust network is greater than or equal to threshold pT . A low product value indicates low certainty in the trust path. By removing paths with low certainty, the number of paths to consider is reduced while the information loss can be kept to an insignificant level. The subsequent step (e) checks that the path can be included consistently with the DSPG. In the non-DSPG trust network of Figure 14.11 there are 3 possible paths between A and X, as expressed by Eq.(14.3) below. φ1 = ([A; B] : [B;C] : [C, X]) φ2 = ([A; D] : [D;C] : [C, X]) φ3 = ([A; B] : [B; D] : [D;C] : [C, X]) (14.3) The 3 paths can generate the following 7 potential combinations/graphs. γ1 = φ 1 γ2 = φ 2 γ3 = φ 3 γ4 = φ 1 ⋄ φ 2 γ5 = φ 1 ⋄ φ 3 γ6 = φ 2 ⋄ φ 3 γ7 = φ 1 ⋄ φ 2 ⋄ φ 3 (14.4) The expression γ7 contains all possible paths between A and X. The problem with γ7 is that it is a non-DSPG so that it can not be represented in the form of a canonical expression, i.e. where each edge only appears once. In this example, one path must must be removed from the graph in order to have a canonical expression. The expressions γ4 , γ5 and γ6 can be canonicalised, and the expressions γ1 , γ2 and γ3 are already canonical, which means that all the expressions except γ7 can be used as a basis for constructing a DSPG and for deriving A’s opinion/trust in X. The following requirements must be satisfied for preserving a DSPG when including new sub-paths. The source and target nodes refer to the source and target nodes of the new sub-path that is to be added to the existing graph by bifurcation. Definition 14.7 (Requirements for Including New Sub-Paths in DSPG). 1. The target node must be reachable from the source node in the existing graph. 2. The source and the target nodes must have equal nesting levels in the existing graph. 3. The nesting level of the source and target nodes must be equal to, or less than the nesting level of all intermediate nodes in the existing graph. 278 14 Trust Networks ⊔ ⊓ These principles are illustrated with examples below. Figure 14.12, Figure 14.13 and Figure 14.14 illustrate how new paths can be included in a way that preserves graph canonicity. In the figures, the nesting levels of nodes and edges are indicated as an integer. A bifurcation is when a node has two or more incoming or outgoing edges, and is indicated by brackets in the shaded node boxes. The opening bracket ‘(’ increments the nesting level by 1, and the closing bracket ‘)’ decrements the nesting level by 1. A sub-path is a section of a path without bifurcations. The equal sign ‘=’ means that the node is part of a sub-path, in which case the nesting level of the edge on the side of the ‘=’ symbol is equal to the nesting level of the node. Each time a new path is added to the old graph, some sub-path sections may already exist in the old graph which does not require any additions, whereas other sub-path sections that do not already exist, must be added by bifurcations to the old graph. • Illustrating DSPG Synthesis Requirement 1. Requirement 1 from Definition 14.7 is illustrated in Figure 14.12. 
The new edge [B;C] is rejected because C is not reachable from B in the existing graph, whereas the new edge [A; D] can be included because D is reachable from A in the existing graph. B =1= A 0( 1 1 1 C =1= D )0= 0 X =0 1 Legend: Existing referral trust edge Existing functional belief/trust edge Potential new referral trust edge Included Rejected # Nesting level Fig. 14.12 Visualising the requirement that the target must be reachable from the source. The edge can be included under the same nesting level as the sub-paths ([A; B] : [B; D]) and ([A;C] : [C; D]) in this example. The existing and new updated graphs of Figure 14.12 are expressed below. Note that the brackets around sub-paths, e.g. ([A; B] : [B; D]), are not reflected in Figure 14.12 because they do not represent nesting, but simply grouping of arcs belonging to the same sub-path. 14.4 Analysing Complex Non-DSPG Trust Networks 279 Existing graph: ((([A; B] : [B; D]) ⋄ ([A;C] : [C; D])) : [D; X]) (14.5) Updated graph: ((([A; B] : [B; D]) ⋄ ([A;C] : [C; D]) ⋄ [A; D]) : [D; X]) • Illustrating DSPG Synthesis Requirement 2. Requirement 2 from Definition 14.7 is illustrated in Figure 14.13. The new edge [B; D] is rejected because B and D have different nesting levels, whereas the new edge [A; D] is included because A and D have equal nesting levels. Node A does in fact have nesting levels 1 and 2 simultaneously because two separate bifurcations with different nesting levels start from A. 2 2 A 0(1( 2 B =2= 2 C )1= 1 D =1= 1 X )0 Legend: Existing referral trust edge Existing functional belief/trust edge Potential new referral trust edge Included Rejected # Nesting level Fig. 14.13 Visualising the requirement that the source and target must have equal nesting levels. Including the new edge produces an additional nesting level being created, which also causes the nesting levels of the sub-paths [A; B] : [B;C] and [A;C] to increment. The existing and updated graphs of Figure 14.13 can then be expressed as: Existing graph: (((([A; B] : [B;C]) ⋄ [A;C]) : [C; D] : [D, X]) ⋄ [A, X]) Updated graph: ((((([A; B] : [B;C]) ⋄ [A;C]) : [C; D]) ⋄ [D, X]) ⋄ [A, X]) (14.6) • Illustrating DSPG Synthesis Requirement 3. Requirement 3 from Definition 14.7 is illustrated in Figure 14.14. The new edge [B; D] is rejected because the node C has a nesting level that is inferior to that of B and D, whereas the new edge [A, X] is included because the nesting level of C is equal to that of A and X. Including the new edge produces an additional nesting level being created, which also causes the nesting levels of the existing sub-paths to increment. The 280 14 Trust Networks B =1= A 0( 1 D =1= 1 1 C )0( 1 1 1 X )0 Legend: Existing referral trust edge Existing functional trust edge Potential new (referral or functional) belief/trust edge Included Rejected # Nesting level Fig. 14.14 Visualising the requirement that nesting level of intermediate nodes must be equal to or greater than that of source and target existing and new graphs of Figure 14.14 can then be expressed as: Existing graph: ((([A; B] : [B;C]) ⋄ [A;C]) : (([C; D] : [D, X]) ⋄ [C, X])) Updated graph: (((([A; B] : [B;C]) ⋄ [A;C]) : (([C; D] : [D, X]) ⋄ [C, X])) ⋄ [A, X]) (14.7) Chapter 15 Bayesian Reputation Systems Reputation systems are used to collect and analyse feedback about the performance and quality of products, service and service entities, that for short can be called service objects. 
The received feedback can be used to derive reputation scores, which in turn can be published to potential future users. The feedback can also be used internally by the service provider in order to improve the quality of the service objects. Figure 15.1 illustrates how a reputation system typically is integrated in online service provision. The figure indicates the cyclic sequence of steps including request and provision of services, in addition to the exchange and processing of feedback ratings and reputation scores. Reputation systems are normally integrated with the service provision function, so that the steps related to reputation are linked to the steps of service provision.

Feedback from service users is highly valuable to service providers, but there is typically no obvious incentive for service users to provide ratings. In order to increase the amount of feedback it is common that the service provider explicitly requests feedback from the service users after a service has been provided and consumed.

From the user's point of view, it is assumed that reputation scores can help predict the future performance of service objects and thereby reduce users' uncertainty when deciding whether to rely on those service objects [77]. The idea is that transactions with reputable service objects or service providers are likely to result in more favourable outcomes than transactions with disreputable service objects and service providers.

Reputation scores are not only useful for service-consumer decision making. Reputation scores can also be used internally by a service provider in order to tune and configure the service provision system, and in general to increase quality and performance.

Reputation systems are typically centralised, meaning that ratings are centrally aggregated, as illustrated in Figure 15.1. Distributed reputation systems have been proposed, and could e.g. be implemented in conjunction with peer-to-peer (P2P) networks [3], where a user must discover and request private reputation ratings from other users in the P2P network. We will not discuss distributed reputation systems here.

Fig. 15.1 Integration of reputation systems in service architectures (the cycle of service request, service provision, feedback rating, aggregation of feedback and computation of scores, reputation dissemination, and improvement of the quality of service objects)

Two fundamental elements of reputation systems are:

1. A collection network that allows the reputation system to receive and aggregate feedback ratings about service objects from users, as well as quality indicators from other sources.
2. A reputation-score computation engine used by the reputation system to derive reputation scores for each participant, based on received ratings, and possibly also on other information.

Many different reputation systems, including reputation-score computation methods, have been proposed in the literature, and we do not intend to provide a complete survey or comparison here. We refer to [23, 43, 60, 78, 83] as general literature on reputation systems. The most common approach to computing the reputation score in commercial systems is probably to use some form of weighted mean. Computation methods based on weights distributed around the median rating (i.e.
the middle rating in a ranked list of ratings) [1, 2, 28] can provide more stable reputation scores than scores based on the simple mean. It is also possible to compute scores bases on fuzzy logic [4], for example. User-trust and time-related factors can be incorporated into any of the above mentioned reputation computation methods, This chapter focuses on the reputation score computation methods, and in particular on Bayesian computational methods. Binomial and multinomial Bayesian reputation systems have been proposed and studied e.g. in [42, 44, 54, 90]. The purpose of this chapter is to concisely describe basic features of Bayesian reputation systems. 15.1 Computing Reputation Scores 283 15.1 Computing Reputation Scores Binomial reputation systems allow ratings to be expressed with two values, as either positive (e.g. Good) or negative (e.g. Bad). Multinomial reputation systems allow the possibility of providing ratings in different discrete levels such as e.g. mediocre - bad - average - good - excellent. 15.1.1 Binomial Reputation Scores. Binomial Bayesian reputation systems apply to the binary state space {Bad, Good} which reflect a corresponding performance of a service object. The evidence notation of the Beta PDF of Eq.(3.8) takes the two parameters r and s that represent the number of received positive and negative ratings respectively. Binomial reputation is computed by statistical updating of the Beta PDF. More specifically, the a posteriori (i.e. the updated) reputation is continuously computed by combining the a priori (i.e. previous) reputation with every new rating. It is the expected probability of Eq.(3.9) that is used to represent the reputation score. The Beta PDF itself only provides the underlying statistical foundation, and is otherwise not used in the reputation system. Before receiving any ratings, the a priori distribution is the Beta PDF with r = 0 and s = 0, which with the default base rate a = 0.5 produces a uniform a priori distribution. Then after observing r ”Good” and s ”Bad” outcomes, the a posteriori Beta PDF gives a reputation score S that can be computed with the expression for expected probability of Eq.(3.9), which in terms of reputation score is repeated below. S = (r +Wa)/(r + s +W ). (15.1) This score should be interpreted as the probability that the next experience with the service object will be ”Good”. Recall that W denotes the non-informative prior weight, where W = 2 is normally used. 15.1.2 Multinomial Reputation Scores. Multinomial Bayesian reputation systems allow ratings to be provided over k different levels which can be considered as a set of k disjoint elements. Let this set be denoted as Λ = {L1 , . . . Lk }, and assume that ratings are provided as votes on the elements of Λ . This leads to a Dirichlet PDF (probability density function) over the k-component random probability variable p (Li ), i = 1 . . . k with sample space [0, 1]k , subject to the simple additivity requirement ∑ki=1 p (Li ) = 1. The evidence representation of the Dirichlet PDF is given in Eq.(3.16). The Dirichlet PDF itself 284 15 Bayesian Reputation Systems only provides the underlying statistical foundation, and is otherwise not used in the reputation system. The Dirichlet PDF with prior captures a sequence of observations of the k possible outcomes with k positive real rating parameters r (Li ), i = 1 . . . k, each corresponding to one of the possible levels. 
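Before moving on to the details of the multinomial representation, the binomial score of Eq.(15.1) can be illustrated with a minimal Python sketch; the function name and the example rating counts are illustrative choices, not part of the formal model.

```python
def binomial_reputation_score(r, s, a=0.5, W=2):
    """Binomial reputation score of Eq.(15.1): S = (r + W*a) / (r + s + W).

    r, s : number of positive ('Good') and negative ('Bad') ratings
    a    : base rate for a positive outcome (default 0.5)
    W    : non-informative prior weight (normally W = 2)
    """
    return (r + W * a) / (r + s + W)

# Example: 8 positive and 2 negative ratings with the default base rate
print(binomial_reputation_score(8, 2))  # 0.75, the expected probability of 'Good'
```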
In order to have a compact notation we define a vector p = {pp(Li ) | 1 ≤ i ≤ k} to denote the k-component probability variable, and a vector r = {ri | 1 ≤ i ≤ k} to denote the k-component rating variable. In order to distinguish between the a priori default base rate, and the a posteriori ratings, the Dirichlet distribution must be expressed with prior information represented as a base rate distribution a over the rating levels L. Similarly to the binomial case, the multinomial reputation score S is the distribution of expected probabilities of the k random probability variables, which can be computed with the expression for expected probability distribution of Eq.(3.17), which in terms of score distribution is expressed as: S (Li ) = E(pp(Li ) | r , a ) = r (Li ) +W a (Li ) . W + ∑ki=1 r (Li ) (15.2) The non-informative prior weight W will normally be set to W = 2 when a uniform distribution over binary state spaces is assumed. Selecting a larger value for W would result in new observations having less influence. over the Dirichlet distribution, and can in fact represent specific a priori information provided by a domain expert or by another reputation system. 15.2 Collecting and Aggregating Ratings Before computing reputation scores, the ratings must be collected and aggregated in some way. This includes taking time decay into account, for example. 15.2.1 Collecting Ratings Assume k different discrete rating levels L. This translates into having a domain of cardinality k. For binomial reputation systems k = 2 and the rating levels are ”Bad” and ”Good”. For multinomial reputation system k > 2 and any corresponding set of suitable rating levels can be used. Let the rating level be indexed by i. The aggregate ratings for a particular service object y are stored as a cumulative vector, expressed as: Ry (Li ) | i = 1 . . . k) . R y = (R (15.3) 15.2 Collecting and Aggregating Ratings 285 The simplest way of updating a rating vector as a result of a new rating is by adding the newly received rating vector r to the previously stored vector R . The case when old ratings are aged is described in Sec.15.2.2. Each new discrete rating of service object y by an agent A takes the form of a trivial vector r Ay where only one element has value 1, and all other vector elements have value 0. The index i of the vector element with value 1 refers to the specific rating level. 15.2.2 Aggregating Ratings with Aging Ratings are typically aggregated by simple addition of the components (vector addition). However, service objects may change their quality over time, so it is desirable to give relatively greater weight to more recent ratings. This principle is called time decay, which can be taken into by introducing a longevity factor λ ∈ [0, 1] for ratings, which controls the rapidity with which old ratings are aged and discounted as a function of time. With λ = 0, ratings are completely forgotten after a single time period. With λ = 1, ratings are never forgotten. Let new ratings be collected in discrete time periods. Let the sum of the ratings of a particular service object y in period t be denoted by the vector r y,t . More specifically, it is the sum of all ratings r Ay of service object y by rating agents A during that period, expressed by: r y,t = ∑ r xy (15.4) A∈My,t where My,t is the set of all rating agents who rated service object y during period t. Let the total accumulated ratings (with aging) of service object y after the time period t be denoted by Ry,t . 
Then the new accumulated rating after time period t + 1 can be expressed as: Ry,(t+1) = λ · Ry,t + r y,(t+1) , where 0 ≤ λ ≤ 1 . (15.5) Eq.(15.5) represents a recursive updating algorithm that can be executed once every period for all service objects, or alternatively in a discrete fashion for each service object for example after each new rating. Assuming that new ratings are received between time t and time t + n periods, then the updated rating vector can be computed as: R y,(t+n) = λ n · R y,t + r y,(t+n) , 0 ≤ λ ≤ 1. (15.6) 286 15 Bayesian Reputation Systems 15.2.3 Reputation Score Convergence with Time Decay The recursive algorithm of Eq.(15.5) makes it possible to compute convergence values for the rating vectors, as well as for reputation scores. Assuming that a particular service object receives the same ratings every period, then Eq.(15.5) defines a geometric series. We use the well known result of geometric series: ∞ 1 ∑ λ j = 1−λ for − 1 < λ < 1 . (15.7) j=0 Let e y represent a constant rating vector of service object y for each period. The Total accumulated rating vector after an infinite number of periods is then expressed as: R y,∞ = ey , where 0 ≤ λ < 1 . 1−λ (15.8) Eq.(15.8) shows that the longevity factor λ determines the convergence values for the accumulated rating vector according to Eq.(15.5). In general it will be impossible for components of the accumulated rating vector to reach infinity, which makes it impossible for the score vector components to cover the whole range [0, 1]. However, service objects that provide maximum quality services over a long time period wold naturally expect to get the highest possible reputation score. An intuitive interpretation of this expectation is that each long standing service object should have its own individual base rate which is determined as a function of the service object’s total history, or at least a large part of it. This approach is used in the next section to include individual base rates. 15.3 Base Rates for Ratings The cold-start problem in reputation systems is when a service object has not received any ratings, or too few ratings to produce a reliable reputation score. This problem can be solved by basing reputation scores on base rates, which can be individual or community based. 15.3.1 Individual Base Rates A base rate normally expresses the average in a population or domain. Here we will compute individual base rates from a ‘population’ consisting of individual performances over a series of time periods. The individual base rate for service object y at time t will be denoted as a y,t . It will be based on individual evidence vectors denoted as Q y,t . 15.3 Base Rates for Ratings 287 Let a denote the community base rate as usual. Then the individual base rate for service object y at time t can be computed similarly to Eq.(??) as: a y,t (Li ) = Q y,t (Li ) +W a (Li ) W + ∑ki=1 Q y,t (Li ) . (15.9) Reputation scores can be computed as normal with Eq.(15.2), except that the community base rate a is replaced with the individual base rate a y,t of Eq.(15.9). It can be noted that the individual base rate a y,t is partly a function of the community base rate a , which thereby constitutes a two-level base rate model. The components of the reputation score vector computed with Eq.(15.2) based on the individual base rate of Eq.(15.9) can theoretically be arbitrarily close to 0 or 1 with any longevity factor and any community base rate. 
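As a concrete illustration of the machinery introduced so far, the following minimal Python sketch combines the aging step of Eq.(15.5) with the expected probability distribution used in Eq.(15.2) and Eq.(15.9); the function names, rating levels and longevity factor are illustrative assumptions, and the different choices of base-rate evidence vector Q are discussed next.

```python
def age_and_add(R, r_new, lam):
    """One aging step, Eq.(15.5): R_(t+1) = lam * R_t + r_(t+1)."""
    return [lam * R_i + r_i for R_i, r_i in zip(R, r_new)]

def expected_distribution(evidence, base_rate, W=2):
    """Expected probability distribution of the form used in Eq.(15.2) and Eq.(15.9):
    (e_i + W*a_i) / (W + sum(e))."""
    total = sum(evidence)
    return [(e_i + W * a_i) / (W + total) for e_i, a_i in zip(evidence, base_rate)]

# Example: 5 rating levels, uniform community base rate a, longevity factor 0.9
a, lam = [0.2] * 5, 0.9
R = [0.0] * 5
for _ in range(20):                          # 20 periods, one 'Good' (L4) rating per period
    R = age_and_add(R, [0, 0, 0, 1, 0], lam)
print(expected_distribution(R, a))           # reputation score distribution, Eq.(15.2)
# With a constant rating vector, R converges towards e / (1 - lam), cf. Eq.(15.8).
```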
The simplest alternative to consider is to let the individual base rate for each service object be a function of the service object’s total history. A second similar alternative is to let the individual base rate be computed as a function of an service object’s performance over a very long sliding time window. A third alternative is to define an additional high longevity factor for base rates that is much closer to 1 than the common longevity factor λ . The formalisms for these three alternatives are briefly described below. 15.3.2 Total History Base Rate. The total evidence vector Q y,t for service object y used to compute the individual base rate at time period t is expressed as: t Q y,t = ∑ R y, j (15.10) j=1 15.3.3 Sliding Time Window Base Rate. The evidence vector Q y,t for computing an service object y’s individual base rate at time period t is expressed as: t Q y,t = ∑ R y, j where Window Size = (t − u) . (15.11) j=u The Window Size would normally be a constant, but could also be dynamic. In case e.g. u = 1 the Window Size would be increasing and be equal to t, which also would make this alternative equivalent to the total history alternative described above. 288 15 Bayesian Reputation Systems 15.3.4 High Longevity Factor Base Rate. Let λ denote the normal longevity factor. A high longevity factor λH can be defined where λH > λ . The evidence vector Q y,t for computing an service object y’s individual base rate at time period t is computed as: Q y,t = λH · Q y,(t−1) + r y,t , where λ < λH ≤ 1 . (15.12) In case λH = 1 this alternative would be equivalent to the total history alternative described above. The high longevity factor makes ratings age much slower than the regular longevity factor. 15.3.5 Dynamic Community Base Rates Bootstrapping a reputation system to a stable and conservative state is important. In the framework described above, the base rate distribution a will define initial default reputation for all service objects. The base rate can for example be evenly distributed, or biased towards either a negative or a positive reputation. This must be defined by those who set up the reputation system in a specific market or community. Service objects will come and go during the lifetime of a market, and it is important to be able to assign a reasonable base rate reputation to new service objects. In the simplest case, this can be the same as the initial default reputation used during during bootstrap. However, it is possible to track the average reputation score of the whole community of service objects, and this can be used to set the base rate for new service objects, either directly or with a certain additional bias. Not only new service objects, but also existing service objects with a standing track record can get the dynamic base rate. After all, a dynamic community base rate reflects the whole community, and should therefore be applied to all service object members of that community. The aggregate reputation vector for the whole community at time t can be computed as: R M,t = ∑ R y,t (15.13) y j ∈M This vector then needs to be normalised to a base rate vector as follows: Definition 15.1 (Community Base Rate). Let R M,t be an aggregate reputation vector for a whole community, and let S M,t be the corresponding multinomial probability reputation vector which can be computed with Eq.(15.2). The community base rate as a function of existing reputations at time t + 1 is then simply expressed as the community score at time t: a M,(t+1) = S M,t . 
(15.14)

The base rate vector of Eq.(15.14) can be given to every new service object that joins the community. In addition, the community base rate vector can be used for every service object every time its reputation score is computed. In this way, the base rate will dynamically reflect the quality of the market at any one time. If desirable, the base rate for new service objects can be biased in either a negative or a positive direction in order to make it harder or easier to enter the market.
When base rates are a function of the community reputation, the convergence values with constant ratings can no longer be derived with Eq.(15.8); the scores will instead converge towards the average score from all the ratings.

15.4 Reputation Representation

Reputation can be represented in different forms. We will here illustrate reputation as multinomial probability scores, and as point estimates. Each form is described in turn below.

15.4.1 Multinomial Probability Representation.

The most natural form is to define the reputation score as a function of the expected probability of each rating level. The expected probability for each rating level can be computed with Eq.(15.2). Let R∗_y represent the service object's aggregate ratings. Then the vector S_y defined by:

S_y(L_i) = (R∗_y(L_i) + W·a(L_i)) / (W + ∑_{j=1}^{k} R∗_y(L_j)),  for i = 1...k,   (15.15)

is the corresponding multinomial probability reputation score. As already stated, W = 2 is the value of choice, but a larger value for the constant W can be chosen if a reduced influence of new evidence over the base rate is required.
The reputation score S can be interpreted as a multinomial probability measure that indicates how a particular service object is expected to perform in future transactions. It can easily be verified that

∑_{i=1}^{k} S(L_i) = 1.   (15.16)

The multinomial reputation score can for example be visualised as columns, which clearly indicates whether ratings are polarised. Assume for example the 5 levels:

Discrete rating levels: L1: Mediocre, L2: Bad, L3: Average, L4: Good, L5: Excellent.   (15.17)

We assume a default base rate distribution. Before any ratings have been received, the multinomial probability reputation score will be equal to 1/5 for all levels. Let us assume that 10 ratings are received. In the first case, 10 average ratings are received, which translates into the multinomial probability reputation score of Fig.15.2.a. In the second case, 5 mediocre and 5 excellent ratings are received, which translates into the multinomial probability reputation score of Fig.15.2.b.

Fig. 15.2 Illustrating the score difference resulting from average and polarised ratings: (a) 10 average ratings; (b) 5 mediocre and 5 excellent ratings

With a binomial reputation system, the difference between these two rating scenarios would not have been visible.

15.4.2 Point Estimate Representation.

While informative, the multinomial probability representation can require considerable space to be displayed on a computer screen. A more compact form is to express the reputation score as a single value in some predefined interval. This can be done by assigning a point value ν to each rating level i, and computing the normalised weighted point estimate score σ. Assume e.g. k different rating levels with point values evenly distributed in the range [0,1], so that ν(L_i) = (i−1)/(k−1). The point estimate reputation score is then:

σ = ∑_{i=1}^{k} ν(L_i)·S(L_i).   (15.18)
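The effect described for Fig.15.2 can be reproduced with a short Python sketch of Eq.(15.15) and Eq.(15.18); the function names are illustrative.

```python
def score(R, a, W=2):
    """Multinomial probability reputation score, Eq.(15.15)."""
    total = sum(R)
    return [(R_i + W * a_i) / (W + total) for R_i, a_i in zip(R, a)]

def point_estimate(S):
    """Point estimate of Eq.(15.18), with nu(L_i) = (i - 1) / (k - 1)."""
    k = len(S)
    return sum(j / (k - 1) * S_j for j, S_j in enumerate(S))

a = [0.2] * 5                      # default base rate over the 5 levels of Eq.(15.17)
average   = [0, 0, 10, 0, 0]       # 10 'Average' ratings (Fig.15.2.a)
polarised = [5, 0, 0, 0, 5]        # 5 'Mediocre' + 5 'Excellent' ratings (Fig.15.2.b)
for R in (average, polarised):
    S = score(R, a)
    print([round(s, 3) for s in S], round(point_estimate(S), 2))
# Both scenarios give point estimate 0.50, although the score vectors differ.
```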
However, this point estimate removes information, so that for example the difference between the average ratings and the polarised ratings of Fig.15.2.a and Fig.15.2.b is no longer visible. The point estimates of the reputation scores of Fig.15.2.a and Fig.15.2.b are both 0.5, although the ratings are in fact quite different. A point estimate in the range [0,1] can be mapped to any range, such as 1-5 stars, a percentage or a probability.

15.4.3 Continuous Ratings

It is common that the rating and score of service objects are measured on a continuous scale, such as time, throughput or relative ranking, to name a few examples. Even when it is natural to provide discrete ratings, it may be difficult to express that something is strictly good or average, so that combinations of discrete ratings, such as 'average-to-good', would better reflect the rater's opinion. Such ratings can then be considered continuous. To handle this, it is possible to use a fuzzy membership function to convert a continuous rating into a binomial or multinomial rating. For example, with five rating levels the sliding window function can be illustrated as in Fig.15.3. The continuous q-value determines the r-values for that level.

Fig. 15.3 Fuzzy triangular membership functions

15.5 Simple Scenario Simulation

A simple scenario can be used to illustrate the performance of a multinomial reputation system that uses some of the features described above. Let us assume that service objects y and z receive the following ratings over 70 rounds or time periods.

Table 15.1 Sequence of ratings
Sequence        Service object y                  Service object z
Periods 1-10    10 × L1 ratings in each period    1 × L1 rating in each period
Periods 11-20   10 × L2 ratings in each period    1 × L2 rating in each period
Periods 21-30   10 × L3 ratings in each period    1 × L3 rating in each period
Periods 31-40   10 × L4 ratings in each period    1 × L4 rating in each period
Periods 41-70   30 × L5 ratings in each period    3 × L5 ratings in each period

The longevity of ratings is set to λ = 0.9, and the individual base rate is computed with the high longevity approach described in Sec.15.3.4, with the high longevity factor for the base rate set to λH = 0.999. For simplicity, the community base rate is assumed to be fixed during the 70 rounds in this example, expressed by a(L1) = a(L2) = a(L3) = a(L4) = a(L5) = 0.2. Fig.15.4 illustrates the evolution of the scores of service objects y and z during the period.

Fig. 15.4 Score evolution for service objects y and z: (a) scores for service object y; (b) scores for service object z

The scores for both service objects start with the community base rate, and then vary as a function of the received ratings. Both service objects have an initial point estimate of 0.5. The scores for service object z in Fig.15.4.b are similar in trend to, but less articulated than, those of service object y in Fig.15.4.a, because service object z receives equal but less frequent ratings. The final score of service object z is visibly lower than 1 because the relatively low number of ratings is insufficient for driving the individual base rate very close to 1. Thanks to the community base rate, all new service objects in a community will have a meaningful initial score.
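A rough Python sketch of the simulated scenario for service object y is given below, assuming the stated parameters (λ = 0.9, λH = 0.999, uniform community base rate) and the equations of Sections 15.2 and 15.3; it is not the exact code behind Fig.15.4, so only the general shape of the printed trajectory should be expected to match.

```python
def age_and_add(v, r, lam):
    # Aged accumulation, Eq.(15.5)/(15.12)
    return [lam * v_i + r_i for v_i, r_i in zip(v, r)]

def expected(evidence, base, W=2):
    # Expected probability distribution, Eq.(15.2)/(15.9)
    total = sum(evidence)
    return [(e + W * b) / (W + total) for e, b in zip(evidence, base)]

def point_estimate(S):
    # Point estimate, Eq.(15.18)
    k = len(S)
    return sum(j / (k - 1) * s for j, s in enumerate(S))

a_comm, lam, lam_H = [0.2] * 5, 0.9, 0.999
# Per-period ratings for service object y (Table 15.1): 10 x L1..L4, then 30 x L5
schedule = [(10, 0)] * 10 + [(10, 1)] * 10 + [(10, 2)] * 10 + [(10, 3)] * 10 + [(30, 4)] * 30
R, Q = [0.0] * 5, [0.0] * 5
for t, (count, level) in enumerate(schedule, start=1):
    rating = [0.0] * 5
    rating[level] = count
    R = age_and_add(R, rating, lam)      # aged ratings
    Q = age_and_add(Q, rating, lam_H)    # high-longevity base-rate evidence
    a_ind = expected(Q, a_comm)          # individual base rate, Eq.(15.9)
    if t % 10 == 0:
        print(t, round(point_estimate(expected(R, a_ind)), 3))
```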
In case of rating scarcity, a service object's score will initially be determined by the community base rate, with the individual base rate dominating as soon as some ratings have been received.

15.6 Combining Trust and Reputation

Multinomial aggregate ratings can be used to derive binomial trust in the form of an opinion. This is done by first converting the multinomial ratings to binomial ratings according to Eq.(15.19) below, and then applying the mapping of Definition 3.3.
Let the multinomial reputation model have k rating levels L_i, i = 1,...k, where R(L_i) represents the ratings on each level L_i, and let σ represent the point estimate reputation score from Eq.(15.18). Let the binomial reputation model have positive and negative ratings r and s respectively. The derived binomial rating parameters (r, s) are given by:

r = σ · ∑_{i=1}^{k} R_y(L_i)
s = ∑_{i=1}^{k} R_y(L_i) − r   (15.19)

With the equivalence mapping of Definition 3.3 it is possible to analyse trust networks based on both trust relationships and reputation scores, as described next.

15.7 Combining Trust and Reputation

The multinomial Bayesian reputation systems described above use the same representation as the multinomial evidence opinions described in Eq.(3.16), which can be mapped to multinomial opinions according to Definition 3.6. Furthermore, the projection from multinomial ratings to binomial ratings of Eq.(15.19), combined with the binomial mapping of Definition 3.3, makes it possible to represent reputation scores as binomial opinions, which in turn can be applied in computational trust models as described in Chapter 13.
Fig.15.5 illustrates a scenario involving a reputation system that publishes reputation scores about agents in a network. We assume that agent A needs to derive a measure of trust in agent F, and that only agent B has knowledge about F. Assume furthermore that agent A has no direct trust in B, but that A trusts the Reputation System RS, and that RS has published a reputation score about B.

Fig. 15.5 Combining trust and reputation

Agent A has trust ω^A_RS in the Reputation System (arrow 1), and agent B has reputation score R^RS_B (arrow 2). The binomial reputation score R^RS_B corresponds to a Beta PDF denoted Beta^e(r_x, s_x, a_x) as expressed in Eq.(3.8), which can be mapped to a binomial opinion according to the mapping of Definition 3.3. The binomial opinion derived from the reputation score of B produced by the Reputation System RS is then denoted ω^RS_B.
Agent A can then derive a measure of trust in B (arrow 3) based on A's trust in RS and the reputation opinion ω^RS_B. Agent B trusts F (arrow 4) with opinion ω^B_F, where it is assumed that B recommends this trust to A, so that A can derive a measure of trust in F (arrow 5). The trust path is expressed as:

[A, F] = [A; RS] : [RS; B] : [B, F]   (15.20)

The trust edges [A; RS] and [RS; B] represent referral trust, whereas the trust edge [B, F] represents functional trust. In the notation of subjective logic, A's derived trust in F can be expressed as:

ω^A_F = ω^A_RS ⊗ ω^RS_B ⊗ ω^B_F   (15.21)

The computation of ω^A_F is done according to the method for multi-node trust transitivity described in Section 13.3.4. The compatibility between Bayesian reputation systems and subjective logic provides a flexible framework for analysing trust networks consisting of both reputation scores and private trust values.
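To close the chapter, the projection of Eq.(15.19) from multinomial ratings to binomial (r, s) evidence can be sketched as follows; the helper recomputes the point estimate σ of Eq.(15.18) internally, and the function name is an illustrative choice.

```python
def multinomial_to_binomial(R, a, W=2):
    """Project aggregate multinomial ratings R onto binomial (r, s) evidence, Eq.(15.19)."""
    k, total = len(R), sum(R)
    S = [(R_i + W * a_i) / (W + total) for R_i, a_i in zip(R, a)]   # Eq.(15.15)
    sigma = sum(j / (k - 1) * S_j for j, S_j in enumerate(S))       # Eq.(15.18)
    r = sigma * total
    return r, total - r

# Example: polarised ratings map to roughly 5 positive and 5 negative observations
print(multinomial_to_binomial([5, 0, 0, 0, 5], [0.2] * 5))  # approximately (5.0, 5.0)
```

The resulting (r, s) pair can then be mapped to a binomial opinion with Definition 3.3 and used in trust expressions such as Eq.(15.21).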
Chapter 16 Subjective Networks This chapter focuses on representation and reasoning in conditional inference networks, combined with trust networks, thereby introducing subjective networks as graph-based structures of variables combined with conditional opinions. Subjective networks generalize Bayesian network modelling and analysis from being based on probability calculus, to being based on subjective logic. A Bayesian network [74] is a compact representation of a joint probability distribution of random variables in the form of DAG (directed acyclic graph) and a set of conditional probability distributions associated with each node. The goal of inference in Bayesian networks is to derive a conditional probability distribution of any set of (target) variables in the network, given that the values of any other set of (evidence) variables have been observed. Bayesian networks reasoning algorithms provide a way to propagate the probabilistic information through the graph, from the evidence to the target. One serious limitation of traditional Bayesian network reasoning is that all the input conditional probability distributions in the network must be assigned precise values in order for the inference algorithms to work, and for the model to be analysed. This is problematic in situations where probabilities can not be reliably elicited and one needs to do inference with uncertain or incomplete probabilistic information, inferring the most accurate conclusions possible. Subjective opinions can express uncertain probabilistic information of any kind (minor or major imprecision, and even total ignorance), by varying the uncertainty mass between 0 and 1. A straightforward generalization of Bayesian networks in subjective logic retains the network structure and replaces conditional probability distributions with conditional subjective opinions at every node of the network. We call this a subjective Bayesian network and consider the reasoning in it as a generalization of classical Bayesian reasoning, where the goal is to obtain a subjective opinion on the target given the evidence that can be an instantiation of values, but also a subjective opinion itself. 295 296 16 Subjective Networks General inference in subjective Bayesian networks can be challenging, since subjective logic inference requires the consideration of uncertainty, which changes the notion of conditional independence in Bayesian networks. At the time of writing, subjective Bayesian networks – the topic of this chapter – is a relatively new field that is still not thoroughly developed. This chapter is therefore meant as a brief introduction into this new field, which appears to be a very fertile field of research and development. The capacity of subjective logic for reasoning in the presence of uncertainty, combined with the power of Bayesian networks for modelling conditional knowledge structures, is a very potent combination. A whole new book will be required to cover this field properly. The next section gives a brief overview of Bayesian networks. For a thorough introduction into the field, see e.g. [59]. After the introduction, we describes some properties and aspects resulting from the generalisation to subjective Bayesian networks, and how it can be applied. 16.1 Bayesian Networks Bayesian networks represent a powerful framework for modelling and analysing practical situation, where the analyst needs to make probabilistic inference about a set of variables with unknown values. 
Initially proposed by Pearl in 1988 [74], Bayesian network tools are currently being used in important applications in many areas like medical diagnostics, risk management, marketing, military planning, etc. When events and states are related in time and space, they are conditionally dependent. For example, the state of carrying an umbrella is typically influenced by the state of rain. These relationships can be expressed in the form of graphs, consisting of nodes connected with directed edges. To be practical, the graphs must be acyclic to prevent loops, so that the graph is a DAG (directed acyclic graph), to be precise. The nodes are variables that represent possible states or events. The directed edges represent the (causal) relationships between the nodes. Associated with the Bayesian network graph, are various (conditional) probability distributions, that formally specify selected local (conditional) relationships between nodes. Missing probability distributions for specific target nodes can be derived through various algorithms that take as input arguments the existing known probability distributions and the structure of the Bayesian network graph. Consider a Bayesian network containing K nodes, X I to X K , and a joint probability distribution over all the variables. A particular probability in the joint distribution is represented by p(X I = xI , X II = xII , . . . , X K = xK ), which can be expressed more concisely as p(x1 , x2 , . . . , xn ). The chain rule of conditional probability reasoning expresses the joint probability in terms factorisation of conditional probabilities as: 16.1 Bayesian Networks 297 p(xI , xII , . . . , xK ) = p(xI ) · p(xII |xI ) · p(xIII |(xI , xII ))·, . . . , ·p(xK |(xI , . . . , x(K−1) )) = ∏i p(xi |(xI , . . . , x(i−1) )) (16.1) The application of Eq.(16.1) togeter with Bayes rule, a set of independence properties, as well as various computation algorithms, provide the basis for analysing complex Bayesian networks. Figure 16.1 illustrate the most common reasoning categories supported by Bayesian networks [59]. Direction of reasoning PREDICTIVE DIAGNOSTIC X II XI Query Y ZI Direction of reasoning Evidence XI X II Y Query Z II ZI Z II Evidence INTERCAUSAL XI X II Query Y ZI COMBINED Evidence XI Query Z II Direction of reasoning ZI X II Y Evidence Z II Direction of reasoning Fig. 16.1 Categories of Bayesian reasoning The next sections provide examples of reasoning according to these categories. 16.1.1 Example: Lung Cancer Situation A classical example used to illustrate Bayesian networks is the case of lung cancer, that on the one hand can have various causes, and that on the other hand can cause observable effects [59]. In the example, it is assumed that breathing polluted air, denoted P, and cigarette smoking, denoted S, are the most relevant causes. The estimated likelihood of getting lung cancer is specified as a table with conditional 298 16 Subjective Networks probabilities corresponding to all possible combinations of causes, i.e. without any of the causes, as a result of cause S alone, cause P alone, or both causes simultaneously. In addition, assuming that a person has lung cancer (or not), the estimated likelihood of positive cancer detection on an X-ray image, denoted X, and the estimated likelihood of shortness of breath (medical term: ‘dyspnoea’, denoted D, are also specified as probability tables. Figure 16.2 illustrates this particular Bayesian network. 
The nodes are Pollution (P), Smoking (S), Cancer (C), X-Ray (X) and Dyspnoea (D), with base rates a(P) = 0.30, a(S) = 0.40, a(C) = 0.01, and the following conditional probability tables:

p(C | P, S):  P=F,S=F: 0.001;  P=F,S=T: 0.020;  P=T,S=F: 0.002;  P=T,S=T: 0.030
p(D | C):     C=F: 0.020;  C=T: 0.650
p(X | C):     C=F: 0.010;  C=T: 0.900

Fig. 16.2 Simple Bayesian network for the lung cancer situation

Once the graph has been drawn and populated with conditional probabilities, the Bayesian network becomes a basis for reasoning. In particular, when the value of one or several variables has been observed (or guessed), we can make inferences from the new information. This process is sometimes called probability propagation or belief updating, and consists of applying Bayes theorem and other laws of probability, with the observed evidence and the probability tables as input parameters, to determine the probability distribution of specific variables of interest.
For example, assume a person who consults his GP because of shortness of breath. From this evidence alone, the GP can estimate the likelihood that the person suffers from lung cancer. The computation applies Bayes theorem of Eq.(9.6), and requires a prior (base rate) probability of lung cancer in the population (expressed as a(x) in Eq.(9.6)). Assume that the prior probability of lung cancer is a(C) = 0.01. The probability of lung cancer given dyspnoea is then:

p(C|D) = a(C)·p(D|C) / (a(C)·p(D|C) + a(C̄)·p(D|C̄))
       = (0.01·0.65) / ((0.01·0.65) + (0.99·0.02)) = 0.25.   (16.2)

However, assuming that the evidence is not very conclusive, and that many other conditions can also cause shortness of breath, the GP can decide to get an X-ray image of the person's lungs in order to have firmer evidence. Based on indications of lung cancer found on the X-ray image, combined with the evidence of dyspnoea, the GP can update her belief in the likelihood of lung cancer by re-applying Bayes theorem. The expression for the probability of lung cancer in Eq.(16.3) is conditioned on the joint variables (D, X).

p(C|D, X) = a(C)·p(D, X|C) / (a(C)·p(D, X|C) + a(C̄)·p(D, X|C̄))   (16.3)

However, there is no available probability table for p(D, X|C), so Eq.(16.3) cannot be computed directly. Of course, medical authorities could establish a specific probability table for p(D, X|C), but because there can be many different indicators for a given diagnosis, it is typically impractical to produce ready-made probability tables for every possible combination of indicators. Instead, an approximation of Eq.(16.3) is the so-called naïve Bayes classifier, where multiple conditional probability tables based on separate variables are combined as if they were one single probability table based on joint variables. This simplification is correct to the extent that the separate variables are independent, which in many cases is a reasonable assumption. Eq.(16.4) gives the result of the naïve Bayes classifier in the example of diagnosing lung cancer based on dyspnoea and X-ray.

p(C|D, X) ≈ a(C)·p(D|C)·p(X|C) / (a(C)·p(D|C)·p(X|C) + a(C̄)·p(D|C̄)·p(X|C̄))
          ≈ (0.01·0.65·0.90) / ((0.01·0.65·0.90) + (0.99·0.02·0.01)) = 0.97   (16.4)

It can be seen that, by including the X-ray as evidence, the derived likelihood of cancer increases significantly.
The Bayesian network of Figure 16.2 can also be used for making predictions. Assume for example that, according to statistics, the base rate of the population exposed to significant pollution is a(P) = 0.30, and the base rate of the population that are smokers is a(S) = 0.40. Assuming independence, the combined base rates of smokers and exposure to pollution are given in Table 16.1.
Table 16.1 Base rates of people exposed to pollution and being smokers
Pollution  Smoker  Probability
F          F       0.42
F          T       0.28
T          F       0.18
T          T       0.12

From the statistics of Table 16.1 and the conditional probability table in Figure 16.2, the base rate of lung cancer in the population can be computed with the deduction operator of Eq.(9.11) to produce p(C) = 0.01.
The Bayesian network can also be used to formulate policy targets for public health. Assume that the health authorities want to reduce the base rate of lung cancer to p(C) = 0.005. They would then have various options, where one option could be to reduce the base rate of exposure to pollution to a(P) = 0.10, and the base rate of smokers to a(S) = 0.20. According to the Bayesian network, that would give a base rate of lung cancer of p(C) = 0.005.

16.1.2 Naïve Bayes Classifier

Figure 16.3 illustrates a generalised version of the Bayesian network of the above example. In this general case, we assume that there is a central variable Y of cardinality l, with a set of K different parent variables X, and a set of M different child variables Z.

Fig. 16.3 General Bayesian network for an intermediate variable with sets of parents and children

A specific parent variable is denoted X^I, where a specific value with index i is denoted x^I_i. We write X^I = x^I_i to denote the case when variable X^I takes the specific value x^I_i. The notation x^I denotes a specific value for variable X^I without explicitly specifying its index. Similarly, z^I denotes a specific value of variable Z^I without explicitly specifying its index, and y denotes a specific value of variable Y without explicitly specifying its index. Assume that Y takes its values from domain Y of cardinality l = |Y|.
Eq.(16.5) is the general Bayes classifier, which gives the expression for the probability distribution over Y given evidence variables with specific values Z^I = z^I, Z^II = z^II, ..., Z^M = z^M. The prior probability distribution, called the base rate distribution in subjective logic, is denoted a, so that e.g. a(y_j) denotes the prior probability of value y_j ∈ Y.

p(y | z^I,...,z^M) = a(y)·p(z^I,...,z^M | y) / ∑_{j=1}^{l} a(y_j)·p(z^I,...,z^M | y_j)   (16.5)

If there is no available multivariate probability table for conditional probability distributions over Y based on joint variables, and only probability tables for conditional probability distributions based on single variables are available, then it is impossible to apply Eq.(16.5) directly. In practice, this is often the case. However, if it can be assumed that the variables Z^I, Z^II, ..., Z^M are reasonably independent, it is possible to apply the naïve Bayes classifier of Eq.(16.6).

p(y | z^I,...,z^M) ≈ a(y)·∏_i p(z^i | y) / ∑_{j=1}^{l} [a(y_j)·∏_i p(z^i | y_j)]   (16.6)

Eq.(16.6) is the fundamental naïve Bayes classifier that is used in a wide range of fields, such as spam filtering, natural language text classification, medical diagnostics, customer profiling and marketing, to name just a few.

16.1.3 Independence and Separation

A Bayesian network graph is assumed to represent the significant dependencies of all relevant variables of the situation to be analysed. This is expressed by the Markov property, which states that there are no direct dependencies in the system being modelled which are not already explicitly shown via edges.
In the above example of lung cancer, there is no way for smoking to influence dyspnoea except by way of causing cancer. Bayesian networks which have the Markov property are often called Independencemaps (or, I-maps for short), since every independence implicitly indicated by the lack of an edge is real in the situation. If every arc in a Bayesian network correspond to a direct dependence in the situation, then the network is said to be a Dependence-map (or, D-map for short). A Bayesian network which is both an I-map and a D-map is called a perfect map. Bayesian networks with the Markov property, are I-maps by definition, and explicitly express conditional independencies between probability distributions of variables in the causal chain. Consider a causal chain of three nodes, X, Y and Z, as shown in Figure 16.4.a. In the above example of lung cancer, one such causal chain is: ‘smoking’ −→ ‘cancer’ −→ ‘dyspnoea’ 302 16 Subjective Networks X X Y Y Z Y (a) Causal chain X Z (b) Common Cause Z (c) Common effect Fig. 16.4 Different topologies of causality Causal chains give rise to conditional independence, which for Figure 16.4.a is reflected by: p(Z|X,Y ) = p(Z|Y ) (16.7) Eq.(16.7) expresses the property of Figure 16.4.a where the probability of Z given Y is exactly the same as the probability of Z, given both X and Y , because knowing that X has occurred is irrelevant to our beliefs about Z if we already know that Y has occurred. With reference to Figure 16.4.a, the probability of dyspnoea (Z) depends directly only on the condition of lung cancer (Y ). If we only know that someone is a smoker (X), then that would increase the likelihood of both the person having lung cancer (Y ) and suffering from shortness of breath (Z). However, if we already know that the person has lung cancer (Y ), then it is assumed that also knowing that he is smoking (X) is irrelevant to the probability of dyspnoea (Z). Expressed concisely, dyspnoea is conditionally independent of smoking given lung cancer. Consider now Figure 16.4.b, where two variables Y and Z have the common cause X. With reference to the same example, lung cancer (X) is the common cause of both symptoms which are dyspnoea (Y ), and positive indication on X-ray (Z). Common causes have the same conditional independence properties as causal chains, as expressed by Eq.(16.8) p(Z|X,Y ) = p(Z|Y ) (16.8) If there is no direct evidence about cancer, then knowing that one symptom is present increases the likelihood of lung cancer, which in turn increase the likelihood of the other symptom. However, Eq.(16.8) expresses the property of Figure 16.4.b which says: if we already know that the person suffers from lung cancer, it is assumed that an additional positive observation of dyspnoea does not change the likelihood of finding a positive indication on the X-ray image. Finally, consider Figure 16.4.c, where two variables X and Y have the common effect Z. Common effect situations have the opposite conditional independence structure to that of causal chains and common causes, as expressed by Eq.(16.9). p(X|Y, Z) 6= p(X|Z) (16.9) 16.2 Subjective Bayesian Networks 303 Thus, if the effect Z (e.g., lung cancer) is observed, and we know that one of the causes is absent (e.g., the patient does not smoke Y ), then that evidence increases the likelihood of presence of the other cause (e.g., that he lives in a polluted area X). 
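The numeric results of the lung cancer example, Eq.(16.2) and Eq.(16.4), can be reproduced with a small Python sketch of the naïve Bayes classifier of Eq.(16.6), specialised to a binary class; the function and variable names are illustrative.

```python
def naive_bayes_binary(prior, likelihoods):
    """Naive Bayes posterior for a binary class, in the spirit of Eq.(16.4)/(16.6).

    prior       : base rate a(C) of the class being true
    likelihoods : list of (p(e|C), p(e|not C)) pairs, one per observed evidence variable
    """
    pos, neg = prior, 1 - prior
    for p_true, p_false in likelihoods:
        pos *= p_true
        neg *= p_false
    return pos / (pos + neg)

# Conditional probabilities from Figure 16.2:
# p(D|C) = 0.65, p(D|not C) = 0.02, p(X|C) = 0.90, p(X|not C) = 0.01
print(round(naive_bayes_binary(0.01, [(0.65, 0.02)]), 2))                # 0.25, Eq.(16.2)
print(round(naive_bayes_binary(0.01, [(0.65, 0.02), (0.90, 0.01)]), 2))  # 0.97, Eq.(16.4)
```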
16.2 Subjective Bayesian Networks The goal of subjective Bayesian network modelling and analysis is to generalise traditional Bayesian network modelling and analysis by including the uncertainty dimension. This is certainly possible, but involves some additional complexity to handle the uncertainty dimension. On the other hand, the advantage is that subjective Bayesian networks explicitly express the inherent uncertainty of realistic situation in the formal modelling, thereby allowing the analysis and the results to better reflect the situation as seen by the analysts. In other words, the inherent uncertainty of situations can no longer be ‘hidden under the carpet’, which is good news for policy and decision makers. Consider a subjective Bayesian network containing a set of K nodes/variables, denoted X I to X K , and a joint opinion over all the variables. The joint opinion is expressed by ωX I ,...,X K . Similarly to the chain rule for conditional probabilities of Eq.(16.1), the chain rule of subjective conditional reasoning expresses the joint opinion in terms iterative deduction of conditional opinions as: ωX I ,...,X K = ((.(ωX I ⊚ ωX II | X I ) ⊚ ωX III | (X I ,X II ) )⊚, . . . ) ⊚ ωX K | (X I ,...,X (K−1) ) ⊚ ⊚ (16.10) = ∏ωX i | (X I ,...,X (i−1) ) , where ∏ denotes chained deduction. i Eq.(16.10) allows the concept of Bayesian networks to be generalised to subjective Bayesian networks. In subjective Bayesian networks, (conditional) opinions and base rate distributions replace the (conditional) probability tables and priors used in traditional Bayesian networks. Based on the operators of subjective logic such as multiplication, division, deduction and abduction/inversion, models of subjective Bayesian networks can be nicely expressed and analysed. In the sections below, the four reasoning categories of fig:Bayes-reasoningmodels are described within the framework of subjective logic. Then in Section 16.4, aspects of conditional independence for subjective Bayesian networks, as well as the combination of trust networks, are discussed. 304 16 Subjective Networks 16.2.1 Subjective Predictive Reasoning The predictive reasoning category was described during the presentation of traditional Bayesian networks above. Predictive reasoning is simply the application of deduction, as described in Section 9.5 above. b denote a set of K domains, and let Xb denote the corresponding joint variLet X able, where both are expressed as: b = {XI , XII ,. . . , XK } Set of domains: X Joint variable: (16.11) Xb = {X I , X II ,. . . , X K } A specific domain Y of cardinality l, with variable Y , represents a consequent variable of interest to the analyst. Figure 16.5 illustrates the general situation of Bayesian predictive modelling, involving the mentioned variables. X I X II ..... X K ZY | X Joint cause variables X (Evidence) Consequent variable Y (Query) Y Fig. 16.5 Situation of Bayesian prediction Assume that there exists a joint conditional opinion ωY |Xb which for every combi- nation of values in Xb specifies an opinion on Y . Assume also that there is an opinion on ωXb . 
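To give a feel for the objects that these operators manipulate, the sketch below defines a minimal multinomial opinion and its projected probability distribution P(x) = b(x) + a(x)·u. It only shows the data structure and the probability projection, not the deduction operator ⊚ of Eq.(16.12), which also propagates uncertainty and is described in Chapter 9; the class and field names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class Opinion:
    """Minimal multinomial opinion: belief mass b, uncertainty mass u, base rate a."""
    belief: Dict[str, float]     # b(x); the belief masses plus u sum to 1
    uncertainty: float           # u
    base_rate: Dict[str, float]  # a(x); sums to 1

    def projected_probability(self) -> Dict[str, float]:
        """P(x) = b(x) + a(x) * u."""
        return {x: self.belief[x] + self.base_rate[x] * self.uncertainty
                for x in self.belief}

# A fairly uncertain opinion about a binary variable
w = Opinion(belief={'true': 0.3, 'false': 0.1}, uncertainty=0.6,
            base_rate={'true': 0.5, 'false': 0.5})
print(w.projected_probability())   # {'true': 0.6, 'false': 0.4}
```

A vacuous opinion has u = 1 and no belief mass, in which case the projected probabilities reduce to the base rates; this is the sense in which subjective Bayesian networks can express anything from total ignorance to fully dogmatic conditionals.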
Subjective Bayesian predictive reasoning is expressed as: ωY kXb = ωXb ⊚ ωY |Xb (16.12) In case the variables in Xb can be assumed independent, the opinion on the joint variable Xb can be generated by normal multinomial multiplication described in Chapter 8, and is expressed by: K ωXb = ∏ ωX i (16.13) i=1 There are two optional operators for multinomial multiplication for computing Eq.(16.13), namely normal multiplication described in Section 8.1, and proportional multiplication described in Section 8.2. At the time of writing, the relative performance of each multiplication operator has not been fully investigated, so no specific advice vcan be given on the choice of operator. 16.2 Subjective Bayesian Networks 305 16.2.2 Subjective Diagnostic Reasoning This section briefly describes how the diagnostic reasoning category can be handled with subjective logic. b denote a set of L domains, and let Yb denote the respective set of variables, Let Y where both are expressed as. b = {YI , YII ,. . . , YL } Set of domains: Y (16.14) Yb = {Y I , Y II ,. . . , Y L } Joint variable: A specific class domain X of cardinality k, with variable X, represents the set of classes of interest to the analyst. The classes can e.g. be a set of medical diagnoses, or types of email messages such as ‘spam’ or ‘ham’. Figure 16.6 illustrates the situation of Bayesian classifiers, where states of varaible X causes states of the joint variable Yb . Class variable X (Query) X ZY | X ZY I YI II ZY |X ..... Y II L |X YL Observation variables (Evidence) Fig. 16.6 Situation of Bayes classifier Eq.(16.15) expresses the general subjective Bayes classifier which gives the expression for the opinion on X given the joint opinion on the evidence variables Yb . It is based on the operator for multinomial opinion inversion of Definition 10.48. e ωb , aX ) ωX|Yb = ⊚( Y |X (16.15) In practical situations, the joint conditional opinion ωYb|X is typically not available, then it would be impossible to apply Eq.(16.15). It is typically more practical to obtain conditional opinions for single Y -variables. In case it can reasonably assumed that the Y -variables are independent, the naı̈ve Bayes classifier for subjective logic, expressed in Eq.(16.16), can be used. ! e ωX|Yb ≈ ⊚ L ∏ ωY j |X , a X j=1 (16.16) 306 16 Subjective Networks Eq.(16.16) expresses the general naı̈ve Bayes classifier for subjective logic. The product of conditional opinions ω(Y j |X) can be computed by normal multinomial or proportional multiplication described in Chapter 8. At the time of writing, no software implementations or applications exist for this classifier. When implemented, it will be interesting to see how it performs in fields, such as spam filtering, natural language text classification, medical diagnostics, customer profiling and cuber incident classification. 16.2.3 Subjective Intercausal Reasoning Situations of intercausal reasoning occur frequently. With reference to the lung cancer example, it could for example be that a non-smoker person has been diagnosed with lung cancer. Then it would be possible to determine the likelihood that the person has been exposed to pollution. The derived probability of exposure to pollution is then typically high, which then would be seen as the cause of cancer. Alternatively, if the person is a smoker, then the derived probability of exposure to pollution is typically low, which would then be seen not to be the cause of cancer. 
b denote a set of K domains, and let Xb denote the respective set of variables, Let X where both are expressed as. b = {XI , XII ,. . . , XK } Set of domains: X Joint variable: (16.17) Xb = {X I , X II ,. . . , X K } b denote a set of L domains, and let Yb denote the respective set of variables, Let Y where both are expressed as. b = {YI , YII ,. . . , YL } Set of domains: Y Joint variable: (16.18) Yb = {Y I , Y II ,. . . , Y L } b denote a specific consequent domain with variable Z. b Let Z Figure 16.7 illustrates the general situation of intercausal reasoning, where the two sets of variables Xb and Yb are causes of the consequent variable Z. It is assumed that there is evidence on the set of variables Yb as well as on Z, and that the query b targets the set of variables X. Intercausal reasoning takes place in two steps: 1) Abduction, and 2) Division. More specifically, it is assumed that the analyst has an opinion ωZ about the consequent variable Z, and that there exists a joint conditional opinion ωZ|(X, b Yb ) . With multinomial abduction, it is possible to compute the opinion ω(X, b Yb )kZ expressed as: e ω bb , a bb ) ω(X, b Yb )kZ = ωZ ⊚( (X,Y )|Z (X,Y ) (16.19) 16.3 Subjective Combined Reasoning 307 Joint cause variables X (Query) XI ..... Joint cause variables Y (Evidence) XK YI ..... YL Z Z |Y Z Z | X Z Consequent variable Z (Evidence) Fig. 16.7 Intercausal reasoning b Yb ), Having computed the abduced opinion on the joint set of joint variables (X, we can proceed to the second step. Assume that the analyst has an opinion ωYb on the set of joint variables Yb , then it is possible to derive an opinion ωXb about the set of joint variable Xb thorugh multinomial division, as expressed by: ωXb = ω(X, b Yb ) k Z /ωYb (16.20) Multinomial division is described in Chapter 8. There are two optional division operators. In case the evidence on Yb is an absolute (product) opinion, then selective division described in Section 8.7.2 should be used. In case the evidence on Yb is a partially uncertain (product) opinion, then proportional division described in Section 8.7.1 should be used. 16.3 Subjective Combined Reasoning The last reasoning category to be described is the so-called combined category, because it combines predictive and diagnostic reasoning. b denote a set of K domains, and let Xb denote the respective set of variables, Let X where both are expressed as. b = {XI , XII ,. . . , XK } Set of domains: X Joint variable: Xb = {X I , X II ,. . . , X K } (16.21) b denote a set of L domains, and let Yb denote the respective set of variables, Let Y where both are expressed as. 308 16 Subjective Networks b = {ZI , ZII ,. . . , ZM } Set of domains: Z Joint variable: (16.22) Zb = {Z I , Z II ,. . . , Z M } Let Y be an intermediate consequent domain with variable Y . Figure 16.8 illustrates the general situation of combined reasoning, where the two sets of variables Xb and Zb represent the evidence, and variable Y represents the query variable. X I X ..... II X K Joint cause variables X (Evidence) ZY | X Intermediate consequent variable Y (Query) Y Z Z |Y ZZ I ZI Z II II ZZ |Y ..... M |Y ZM Concequent variables (Evidence) Fig. 16.8 Combined reasoning The situation of Figure 16.8 can be handled by first computing the inverted condtional opinion ωX|Y b , and subsequently by deriving a naı̈ve Bayes classifier for the variable Y based on both ωX|Y b and ωZ|Y b . ! ! 
e ωY |(Yb,X) b ≈⊚ M ωX|Y b · ∏ ωZ j |Y , aY (16.23) j=1 In the example of lung cancer, the GP can thus use all the evidence consisting of air pollution, smoking, X-ray and dyspnoea to compute an opinion about whether the person suffers from lung cancer. 16.4 Subjective Networks The independence properties of Bayesian networks described in Section 16.1.3 are not obvious in case of subjective logic. This is because the criterion of ‘knowing the probability distribution’, e.g of an intermediate variable in a causal chain, is not necessarily satisfied with a subjective opinion on the variable. Consider for example a vacuous opinion on node Y in Figure 16.4.a. It would be an exaggeration to say 16.4 Subjective Networks 309 that the ‘probability distribution is known’ in that case. It is also possible that different analysts have different opinions about the same variables. Traditional Bayesian networks are not designed to handle such situations. Subjective logic opens up possible ways of handling such situations. Figure 16.9 illustrated how trust networks and subjective Bayesian networks can be integrated. Z Subjective Trust Network Z A Z ZCA B C B X X Subjective Bayesian Network A B Z XA:B A D Agents D ZYC ZYD Y ZY( A:C ) ¡ ( A:D ) Variables Z Z |( X ,Y ) Z Z Z ||( X ,Y ) Fig. 16.9 Subjective Networks, consisting of trust networks and Bayesian networks The investigation of theoretical models and practical methods for Bayesian network modelling based on subjective logic, combined with trust networks, opens up a highly fertile field of research in AI and machine learning. 310 16 Subjective Networks References 1. Ahmad Abdel-Hafez, Yue Xu, and Audun Jøsang. A Normal-Distribution Based Rating Aggregation Method for Generating Product Reputations. Web Intelligence, 13(1):43–51, 2015. 2. Ahmad Abdel-Hafez, Yue Xu, and Audun Jøsang. An Accurate Rating Aggregation Method for Generating Item Reputation. In Proceedings of the International Conference on Data Science and Advanced Analytics (DSAA 2015), Paris, October 2015. IEEE. 3. K. Aberer and Z. Despotovic. Managing trust in a peer-2-peer information system. In Henrique Paques, Ling Liu, and David Grossman, editors, Proceedings of the Tenth International Conference on Information and Knowledge Management (CIKM01), pages 10–317. ACM Press, 2001. 4. K.K. Bharadwaj and M.Y.H. Al-Shamri. Fuzzy Computational Models for Trust and Reputation Systems. Electronic Commerce Research and Applications, 8(1):37–47, 2009. 5. F.M. Brown. Boolean Reasoning: The Logic of Boolean Equations. 1st edition, Kluwer Academic Publishers,. 2nd edition, Dover Publications, 2003. 6. W. Casscells, A. Schoenberger, and T.B. Graboys. Interpretation by physicians of clinical laboratory results. New England Journal of Medicine, 299(18):999–1001, 1978. 7. E. Castagnoli and M. LiCalzi. Expected Utility Without Utility. Theory and Decision, 41(3):281–301, November 1996. 8. A. Chateauneuf. On the use of capacities in modelling uncertainty aversion and risk aversion. Journal of Mathematical Economics, 20:343–369, 1991. 9. G. Choquet. Theory of capacities. Annales de l’Institut Fourier, 5:131–295, 1953. 10. B. Christianson and W. S. Harbison. Why Isn’t Trust Transitive? In Proceedings of the Security Protocols International Workshop. University of Cambridge, 1996. 11. M. Daniel. Associativity in Combination of Belief Functions. In Proceedings of 5th Workshop on Uncertainty Processing. - Praha, Edicni oddeleni VSE 2000, pages 41–54. Springer, 2000. 12. Bruno de Finetti. 
The true subjective probability problem. In Carl-Axel Staël von Holstein, editor, The concept of probability in psychological experiments, pages 15–23, Dordrecht, Holland, 1974. D.Reidel Publishing Company. 13. Bruno de Finetti. The value of studying subjective evaluations of probability. In Carl-Axel Staël von Holstein, editor, The concept of probability in psychological experiments, pages 1–14, Dordrecht, Holland, 1974. D.Reidel Publishing Company. 14. Sebastien Destercke and Didier Dubois. Idempotent merging of belief functions: Extending the minimum rule of possibility theory. In Thierry Denœx, editor, Workshop on the Theory on Belief Functions (WTBF 2010), Brest, April 2010. 15. Jean Dezert, Pei Wang, and Albena Tchamova. On the Validity of Dempster Shafer Theory. In Proceedings of the 15th International Conference on Information Fusion (FUSION 2012), Singapore, July 2012. 16. M.R. Diaz. Topics in the Logic of Relevance. Philosophia Verlag, München, 1981. 17. D. Dubois and H. Prade. Representation and combination of uncertainty with belief functions and possibility measures. Comput. Intell., 4:244–264, 1988. 18. R. J. Duffin. Topology of Series-Parallel Networks. Journal of Mathematical Analysis and Applications, 10(2):303–313, 1965. 19. J.K. Dunn and G. Restall. Relevance Logic. In D. Gabbay and F. Guenthner, editors, Handbook of Philosophicla Logic, 2nd Edition, volume 6, pages 1–128. Kluwer, 2002. 20. Daniel Ellsberg. Risk, ambiguity, and the Savage axioms. Quarterly Journal of Ecomonics, 75:643–669, 1961. 21. R. Falcone and C. Castelfranchi. How trust enhances and spread trust. In Proceedings of the 4th Int. Workshop on Deception Fraud and Trust in Agent Societies, in the 5th International Conference on Autonomous Agents (AGENTS’01), May 2001. 22. R. Falcone and C. Castelfranchi. Social Trust: A Cognitive Approach. In C. Castelfranchi and Y.H. Tan, editors, Trust and Deception in Virtual Societies, pages 55–99. Kluwer, 2001. 23. Randy Farmer and Bryce Glass. Building Web Reputation Systems. O’Reilly Media / Yahoo Press, March 2010. References 311 24. M. Fitting. Kleene’s three-valued logics and their children. Fundamenta Informaticae, 20:113–131, 1994. 25. P. Flocchini and F.L. Luccio. Routing in Series Parallel Networks. Theory of Computing Systems, 36(2):137–157, 2003. 26. L.C. Freeman. Centrality on Social Networks. Social Networks, 1:215–239, 1979. 27. D. Gambetta. Can We Trust Trust? In D. Gambetta, editor, Trust: Making and Breaking Cooperative Relations, pages 213–238. Basil Blackwell. Oxford, 1990. 28. F. Garcin, B. Faltings, and R. Jurca. Aggregating Reputation Feedback. In Proceedings of the First International Conference on Reputation: Theory and Technology, pages 62–74. Italian National Research Council, 2009. 29. P. Gärdenfors and N.-E. Sahlin. Unreliable probabilities, risk taking, and decision making. Synthese, 53:361–386, 1982. 30. A. Gelman et al. Bayesian Data Analysis, 2nd ed. Chapman and Hall/CRC, Florida, USA, 2004. 31. Robin K.S. Hankin. A generalization of the dirichlet distribution. Journal of Statistical Software, 33(11):1–18, February 2010. 32. U. Hoffrage, S. Lindsey, R. Hertwig, and G. Gigerenzer. Communicating statistical information. Science, 290(5500):2261–2262, December 2000. 33. ISO. ISO 31000:2009 - Risk management Principles and guidelines. International Organization for Standardization, 2009. 34. A. Jøsang. The right type of trust for distributed systems. In C. Meadows, editor, Proc. of the 1996 New Security Paradigms Workshop. ACM, 1996. 35. A. 
35. A. Jøsang. Artificial reasoning with subjective logic. In Abhaya Nayak and Maurice Pagnucco, editors, Proceedings of the 2nd Australian Workshop on Commonsense Reasoning, Perth, December 1997. Australian Computer Society.
36. A. Jøsang. An Algebra for Assessing Trust in Certification Chains. In J. Kochmar, editor, Proceedings of the Network and Distributed Systems Security Symposium (NDSS'99). The Internet Society, 1999.
37. A. Jøsang. A Logic for Uncertain Probabilities. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 9(3):279–311, June 2001.
38. A. Jøsang. The Consensus Operator for Combining Beliefs. Artificial Intelligence, 142(1–2):157–170, October 2002.
39. A. Jøsang. Conditional Reasoning with Subjective Logic. Journal of Multiple-Valued Logic and Soft Computing, 15(1):5–38, 2008.
40. A. Jøsang. Cumulative and Averaging Unfusion of Beliefs. In The Proceedings of the International Conference on Information Processing and Management of Uncertainty (IPMU 2008), Malaga, June 2008.
41. A. Jøsang, J. Diaz, and M. Rifqi. Cumulative and Averaging Fusion of Beliefs. Information Fusion, 11(2):192–200, 2010. doi:10.1016/j.inffus.2009.05.005.
42. A. Jøsang and R. Ismail. The Beta Reputation System. In Proceedings of the 15th Bled Electronic Commerce Conference, June 2002.
43. A. Jøsang, R. Ismail, and C. Boyd. A Survey of Trust and Reputation Systems for Online Service Provision. Decision Support Systems, 43(2):618–644, 2007.
44. A. Jøsang and J. Haller. Dirichlet Reputation Systems. In The Proceedings of the International Conference on Availability, Reliability and Security (ARES 2007), Vienna, Austria, April 2007.
45. A. Jøsang and S.J. Knapskog. A Metric for Trusted Systems (full paper). In Proceedings of the 21st National Information Systems Security Conference. NSA, October 1998.
46. A. Jøsang and D. McAnally. Multiplication and Comultiplication of Beliefs. International Journal of Approximate Reasoning, 38(1):19–51, 2004.
47. A. Jøsang and S. Pope. Semantic Constraints for Trust Transitivity. In S. Hartmann and M. Stumptner, editors, Proceedings of the Asia-Pacific Conference of Conceptual Modelling (APCCM) (Volume 43 of Conferences in Research and Practice in Information Technology), Newcastle, Australia, February 2005.
48. A. Jøsang and S. Pope. Dempster's Rule as Seen by Little Colored Balls. Computational Intelligence, 28(4), November 2012.
49. A. Jøsang, S. Pope, and S. Marsh. Exploring Different Types of Trust Propagation. In Proceedings of the 4th International Conference on Trust Management (iTrust), Pisa, May 2006.
50. Audun Jøsang. Multi-Agent Preference Combination using Subjective Logic. In International Workshop on Preferences and Soft Constraints (Soft'11), Perugia, Italy, 2011.
51. Audun Jøsang, Tanja Ažderska, and Stephen Marsh. Trust Transitivity and Conditional Belief Reasoning. In Proceedings of the 6th IFIP International Conference on Trust Management (IFIPTM 2012), Surat, India, May 2012.
52. Audun Jøsang, Paulo C.G. Costa, and Erik Blasch. Determining Model Correctness for Situations of Belief Fusion. In Proceedings of the 16th International Conference on Information Fusion (FUSION 2013), Istanbul, July 2013.
53. Audun Jøsang and Robin Hankin. Interpretation and Fusion of Hyper Opinions in Subjective Logic. In Proceedings of the 15th International Conference on Information Fusion (FUSION 2012), Singapore, July 2012.
54. Audun Jøsang, Xixi Luo, and Xiaowu Chen. Continuous Ratings in Discrete Bayesian Reputation Systems. In The Proceedings of the Joint iTrust and PST Conferences on Privacy, Trust Management and Security (IFIPTM 2008), Trondheim, June 2008.
55. Audun Jøsang, Simon Pope, and Milan Daniel. Conditional deduction under uncertainty. In Proceedings of the 8th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU 2005), 2005.
56. Audun Jøsang and Francesco Sambo. Inverting Conditional Opinions in Subjective Logic. In Proceedings of the 20th International Conference on Soft Computing (MENDEL 2014), Brno, 2014.
57. Sherman Kent. Words of Estimative Probability. In Donald P. Steury, editor, Sherman Kent and the Board of National Estimates: Collected Essays. CIA, Center for the Study of Intelligence, 1994.
58. Jonathan Koehler. The Base Rate Fallacy Reconsidered: Descriptive, Normative and Methodological Challenges. Behavioral and Brain Sciences, 19, 1996.
59. Kevin B. Korb and Ann E. Nicholson. Bayesian Artificial Intelligence, Second Edition. CRC Press, Inc., Boca Raton, FL, USA, 2nd edition, 2010.
60. Robert E. Kraut and Paul Resnick. Building Successful Online Communities: Evidence-Based Social Design. MIT Press, Cambridge, MA, 2012.
61. V. Latora and M. Marchiori. Economic small-world behavior in weighted networks. The European Physical Journal B, 32:249–263, 2003.
62. E. Lefevre, O. Colot, and P. Vannoorenberghe. Belief Functions Combination and Conflict Management. Information Fusion, 3(2):149–162, June 2002.
63. P.V. Marsden and N. Lin, editors. Social Structure and Network Analysis. Sage Publications, Beverly Hills, 1982.
64. Stephen Marsh. Formalising Trust as a Computational Concept. PhD thesis, University of Stirling, 1994.
65. D. McAnally and A. Jøsang. Addition and Subtraction of Beliefs. In Proceedings of Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU 2004), Perugia, July 2004.
66. Robert J. McEliece. The Theory of Information and Coding. Cambridge University Press, New York, NY, USA, 2nd edition, 2001.
67. D.H. McKnight and N.L. Chervany. The Meanings of Trust. Technical Report MISRC Working Paper Series 96-04, University of Minnesota, Management Information Systems Research Center, 1996.
68. Merriam-Webster. Merriam-Webster Online. Available from http://www.m-w.com/, accessed October 2015.
69. August Ferdinand Möbius. Der barycentrische Calcul. Leipzig, 1827. Re-published by Georg Olms Verlag, Hildesheim, New York, 1976.
70. Mohammad Modarres, Mark P. Kaminskiy, and Vasiliy Krivtsov. Reliability Engineering and Risk Analysis: A Practical Guide, Second Edition. CRC Press, 2002.
71. Catherine K. Murphy. Combining belief functions when evidence conflicts. Decision Support Systems, 29:1–9, 2000.
72. N.J. Nilsson. Probabilistic logic. Artificial Intelligence, 28(1):71–87, 1986.
73. Donald Nute and Charles B. Cross. Conditional Logic. In Dov M. Gabbay and Franz Guenthner, editors, Handbook of Philosophical Logic, 2nd Edition. Kluwer, 2002.
74. Judea Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann Publishers, 1988.
75. Simon Pope and Audun Jøsang. Analysis of Competing Hypotheses using Subjective Logic. In Proceedings of the 10th International Command and Control Research and Technology Symposium (ICCRTS). United States Department of Defense Command and Control Research Program (DoDCCRP), 2005.
76. A.P. Prudnikov, Yu.A. Brychkov, and O.I. Marichev. Integrals and Series (translated from Russian), volume 1–3. Gordon and Breach Science Publishers, Amsterdam, New York, 1986.
77. W. Quattrociocchi, M. Paolucci, and R. Conte. Dealing with Uncertainty: Simulating Reputation in an Ideal Marketplace. In Proceedings of the 2008 Trust Workshop, at the 7th Int. Joint Conference on Autonomous Agents & Multiagent Systems (AAMAS), 2008.
78. P. Resnick, R. Zeckhauser, R. Friedman, and K. Kuwabara. Reputation Systems. Communications of the ACM, 43(12):45–48, December 2000.
79. Sebastian Ries, Sheikh Mahbub Habib, Max Mühlhäuser, and Vijay Varadharajan. CertainLogic: A logic for modeling trust and uncertainty. Technical Report TUD-CS-2011-0104, Technische Universität Darmstadt, Darmstadt, Germany, April 2011.
80. B. Robertson and G.A. Vignaux. Interpreting Evidence: Evaluating Forensic Evidence in the Courtroom. John Wiley & Sons, Chichester, 1995.
81. G. Shafer. A Mathematical Theory of Evidence. Princeton University Press, 1976.
82. C.E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423, 623–656, July and October 1948.
83. C. Shapiro. Consumer Information, Product Quality, and Seller Reputation. The Bell Journal of Economics, 13(1):20–35, 1982.
84. P. Smets. The Combination of Evidence in the Transferable Belief Model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(5):447–458, 1990.
85. M. Smithson. Ignorance and Uncertainty: Emerging Paradigms. Springer, 1988.
86. David Sundgren and Alexander Karlsson. Uncertainty levels of second-order probability. Polibits, 48:5–11, 2013.
87. S. Tadelis. Firm Reputation with Hidden Information. Economic Theory, 21(2):635–651, 2003.
88. P. Walley. Inferences from Multinomial Data: Learning about a Bag of Marbles. Journal of the Royal Statistical Society, Series B, 58(1):3–57, 1996.
89. O.E. Williamson. Calculativeness, Trust and Economic Organization. Journal of Law and Economics, 36:453–486, April 1993.
90. A. Whitby, A. Jøsang, and J. Indulska. Filtering Out Unfair Ratings in Bayesian Reputation Systems. The Icfai Journal of Management Research, 4(2):48–64, 2005.
91. R. Yager. On the Dempster-Shafer framework and new combination rules. Information Sciences, 41:93–137, 1987.
92. L.A. Zadeh. Review of Shafer's A Mathematical Theory of Evidence. AI Magazine, 5:81–83, 1984.