Subjective Logic
Draft, 23 November 2015
Audun Jøsang
University of Oslo
Web: http://folk.uio.no/josang/
Email: josang@mn.uio.no
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Elements of Subjective Opinions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1 Motivation for the Opinion Representation . . . 7
2.2 Flexibility of Representation . . . 8
2.3 Domains and Hyperdomains . . . 8
2.4 Random Variables and Hypervariables . . . 12
2.5 Belief Mass Distribution and Uncertainty Mass . . . 13
2.6 Base Rate Distributions . . . 14
2.7 Probability Distributions . . . 17
3 Opinion Representations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.1 Belief and Trust Relationships . . . 19
3.2 Opinion Classes . . . 20
3.3 Binomial Opinions . . . 22
3.3.1 Binomial Opinion Representation . . . 22
3.3.2 The Beta Binomial Model . . . 24
3.3.3 Mapping between a Binomial Opinion and a Beta PDF . . . 26
3.4 Multinomial Opinions . . . 27
3.4.1 The Multinomial Opinion Representation . . . 27
3.4.2 The Dirichlet Multinomial Model . . . 29
3.4.3 Visualising Dirichlet Probability Density Functions . . . 32
3.4.4 Coarsening Example: From Ternary to Binary . . . 33
3.4.5 Mapping between Multinomial Opinion and Dirichlet PDF . . . 33
3.4.6 Uncertainty-Maximisation of Multinomial Opinions . . . 35
3.5 Hyper Opinions . . . 36
3.5.1 The Hyper-Opinion Representation . . . 36
3.5.2 Projecting Hyper-Opinions to Multinomial Opinions . . . 37
3.5.3 The Dirichlet Model Applied to Hyperdomains . . . 38
3.5.4 Mapping between a Hyper-Opinion and a Dirichlet HPDF . . . 40
3.5.5 Hyper Dirichlet PDF . . . 41
3.6 Alternative Opinion Representations . . . 43
3.6.1 Probabilistic Notation of Opinions . . . 43
3.6.2 Qualitative Category Representation . . . 45
4 Decision-Making Under Uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.1 Aspects of Belief and Uncertainty in Opinions . . . 49
4.1.1 Specificity . . . 49
4.1.2 Vagueness . . . 50
4.1.3 Dirichlet Visualisation of Opinion Vagueness . . . 52
4.1.4 Elemental Uncertainty . . . 53
4.2 Mass-Sum for Specificity, Vagueness and Uncertainty . . . 54
4.2.1 Elemental Mass-Sum . . . 54
4.2.2 Total Mass-Sum . . . 56
4.3 Utility and Normalisation . . . 57
4.4 Decision Criteria . . . 61
4.5 The Ellsberg Paradox . . . 63
4.6 Examples of Decision Under Vagueness and Uncertainty . . . 67
4.6.1 Decisions with Difference in Projected Probability . . . 67
4.6.2 Decisions with Difference in Specificity . . . 69
4.6.3 Decisions with Difference in Vagueness and Uncertainty . . . 71
4.7 Entropy in the Opinion Model . . . 73
4.7.1 Outcome Surprisal . . . 74
4.7.2 Opinion Entropy . . . 75
4.8 Conflict Between Opinions . . . 77
5 Principles of Subjective Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.1 Related Frameworks for Uncertain Reasoning . . . 81
5.1.1 Comparison with Dempster-Shafer Belief Theory . . . 81
5.1.2 Comparison with Imprecise Probabilities . . . 83
5.1.3 Comparison with Fuzzy Logic . . . 83
5.1.4 Comparison with Kleene’s Three-Valued Logic . . . 85
5.2 Subjective Logic as a Generalisation of Probabilistic Logic . . . 86
5.3 Overview of Subjective Logic Operators . . . 90
6 Addition, Subtraction and Complement . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.1 Addition . . . 95
6.2 Subtraction . . . 97
6.3 Complement . . . 98
7 Binomial Multiplication and Division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
7.1 Binomial Multiplication and Comultiplication . . . . . . . . . . . . . . . . . . 101
7.1.1 Binomial Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
7.1.2 Binomial Comultiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
7.1.3 Approximations of Product and Coproduct . . . . . . . . . . . . . . . 104
7.2 Reliability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
7.2.1 Simple Reliability Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
7.2.2 Reliability Analysis of Complex Systems . . . . . . . . . . . . . . . . 109
7.3 Binomial Division and Codivision . . . . . . . . . . . . . . . . . . . . . . . . 110
7.3.1 Binomial Division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
7.3.2 Binomial Codivision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
7.4 Correspondence with Probabilistic Logic . . . . . . . . . . . . . . . . . . . . . . . 113
8 Multinomial Multiplication and Division . . . . . . . . . . . . . . . . . . . . . . . . . . 115
8.1 Normal Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
8.1.1 Determining Uncertainty Mass . . . . . . . . . . . . . . . . . . . . . . . . . 117
8.1.2 Determining Belief Mass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
8.1.3 Product Base Rates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
8.1.4 Assembling the Multinomial Product Opinion . . . . . . . . . . . . 120
8.1.5 Justification for Normal Multinomial Multiplication . . . . . . . 120
8.2 Proportional Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
8.3 Projected Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
8.4 Hypernomial Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
8.5 Product of Dirichlet Probability Density Functions . . . . . . . . . . . . . . . 123
8.6 Example Multinomial Product Computation . . . . . . . . . . . . . . . . . . . . 125
8.6.1 Multinomial Product Computation . . . . . . . . . . . . . . . . . . . . . . 126
8.6.2 Hypernomial Product Computation . . . . . . . . . . . . . . . . . . . . . 127
8.7 Multinomial Division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
8.7.1 Averaging Proportional Division . . . . . . . . . . . . . . . . . . . . . . . 129
8.7.2 Selective Division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
8.8 Multinomial Opinion Projection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
8.8.1 Opinion Projection Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
8.8.2 Example: Football Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
9 Conditional Deduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
9.1 Introduction to Conditional Reasoning . . . . . . . . . . . . . . . . . . . . . . . . . 135
9.2 Probabilistic Conditional Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
9.2.1 Binomial Probabilistic Deduction and Abduction . . . . . . . . . . 138
9.2.2 Multinomial Probabilistic Deduction and Abduction . . . . . . . 140
9.3 Notation for Subjective Conditional Inference . . . . . . . . . . . . . . . . . . . 143
9.3.1 Notation for Binomial Deduction and Abduction . . . . . . . . . . 143
9.3.2 Notation for Multinomial Deduction and Abduction . . . . . . . 144
9.4 Binomial Deduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
9.4.1 Bayesian Base Rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
9.4.2 Free Base Rate Interval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
9.4.3 Method for Binomial Deduction . . . . . . . . . . . . . . . . . . . . . . . . 148
9.4.4 Justification for the Binomial Deduction Operator . . . . . . . . . 152
9.5 Multinomial Deduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
9.5.1 Constraints for Multinomial Deduction . . . . . . . . . . . . . . . . . . 154
9.5.2 Bayesian Base Rate Distribution . . . . . . . . . . . . . . . . . . . . . . . 157
9.5.3 Free Base Rate Distribution Intervals . . . . . . . . . . . . . . . . . . . . 158
9.5.4 Method for Multinomial Deduction . . . . . . . . . . . . . . . . . . . . . 159
9.6 Example: Multinomial Deduction for Match-Fixing . . . . . . . . . . . . . . 162
9.7 Interpretation of Material Implication in Subjective Logic . . . . . . . . . 163
9.7.1 Truth Functional Material Implication . . . . . . . . . . . . . . . . . . . 164
9.7.2 Material Probabilistic Implication . . . . . . . . . . . . . . . . . . . . . . 166
9.7.3 Relevance in Implication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
9.7.4 Subjective Interpretation of Material Implication . . . . . . . . . . 168
9.7.5 Comparison with Subjective Logic Deduction . . . . . . . . . . . . . 169
9.7.6 How to Interpret Material Implication . . . . . . . . . . . . . . . . . . . 170
10 Conditional Abduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
10.1 Introduction to Abductive Reasoning . . . . . . . . . . . . . . . . . . . . . . . . . . 173
10.2 Relevance and Irrelevance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
10.3 Inversion of Binomial Conditional Opinions . . . . . . . . . . . . . . . . . . . . 176
10.3.1 Principles for Inverting Binomial Conditional Opinions . . . . 176
10.3.2 Method for Inversion of Binomial Conditional Opinions . . . . 178
10.3.3 Convergence of Repeated Inversions . . . . . . . . . . . . . . . . . . . . 182
10.4 Binomial Abduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
10.5 Illustrating the Base Rate Fallacy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
10.6 Inversion of Multinomial Conditional Opinions . . . . . . . . . . . . . . . . . 188
10.6.1 Principles of Multinomial Conditional Opinion Inversion . . . 188
10.6.2 Method for Multinomial Conditional Inversion . . . . . . . . . . . 189
10.7 Multinomial Abduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
10.8 Example: Military Intelligence Analysis . . . . . . . . . . . . . . . . . . . . . . . 195
10.8.1 Example: Intelligence Analysis with Probability Calculus . . 195
10.8.2 Example: Intelligence Analysis with Subjective Logic . . . . . 196
11 Fusion of Subjective Opinions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
11.1 Interpretation of Fusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
11.1.1 Correctness and Consistency Criteria for Fusion Models . . . 201
11.1.2 Classes of Fusion Situations . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
11.1.3 Criteria for Fusion Operator Selection . . . . . . . . . . . . . . . . . . . 205
11.2 Belief Constraint Fusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
11.2.1 Method of Constraint Fusion . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
11.2.2 Frequentist Interpretation of Constraint Fusion . . . . . . . . . . . . 209
11.2.3 Expressing Preferences with Subjective Opinions . . . . . . . . . 213
11.2.4 Example: Going to the Cinema, 1st Attempt . . . . . . . . . . . . . . 214
11.2.5 Example: Going to the Cinema, 2nd Attempt . . . . . . . . . . . . . 215
11.2.6 Example: Not Going to the Cinema . . . . . . . . . . . . . . . . . . . . . 215
11.3 Cumulative Fusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
11.4 Averaging Fusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
11.5 Hybrid Cumulative-Averaging Fusion . . . . . . . . . . . . . . . . . . . . . . . . . 220
11.6 Consensus & Compromise Fusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
11.6.1 Consensus Step . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
11.6.2 Compromise Step . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
11.6.3 Merging Consensus and Compromise Belief . . . . . . . . . . . . . 225
12 Unfusion and Fission of Subjective Opinions . . . . . . . . . . . . . . . . . . . . . . 227
12.1 Unfusion of Opinions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
12.1.1 Cumulative Unfusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
12.1.2 Averaging Unfusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
12.1.3 Example: Cumulative Unfusion of Binomial Opinions . . . . . 230
12.2 Fission of Opinions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
12.2.1 Cumulative Fission . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
12.2.2 Fission of Average . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
12.2.3 Example Fission of Opinion . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
13 Computational Trust . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
13.1 The Notion of Trust . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
13.1.1 Reliability Trust . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
13.1.2 Decision Trust . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
13.1.3 Reputation and Trust . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
13.2 Trust Transitivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
13.2.1 Motivating Example for Transitive Trust . . . . . . . . . . . . . . . . . 242
13.2.2 Referral Trust and Functional Trust . . . . . . . . . . . . . . . . . . . . . 243
13.2.3 Notation for Transitive Trust . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
13.2.4 Compact Notation for Transitive Trust Paths . . . . . . . . . . . . . 245
13.2.5 Semantic Requirements for Trust Transitivity . . . . . . . . . . . . . 246
13.3 The Trust Discounting Operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
13.3.1 Principle of Trust Discounting . . . . . . . . . . . . . . . . . . . . . . . . . 247
13.3.2 Trust Discounting with 2-Edge Paths . . . . . . . . . . . . . . . . . . . . 247
13.3.3 Example: Trust Discounting of Restaurant Recommendation 250
13.3.4 Trust Discounting for Multi-Edge Path . . . . . . . . . . . . . . . . . . 251
13.4 Trust Fusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
13.5 Trust Revision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
13.5.1 Motivation for Trust Revision . . . . . . . . . . . . . . . . . . . . . . . . . . 256
13.5.2 Trust Revision Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
13.5.3 Example: Conflicting Restaurant Recommendations . . . . . . . 260
14 Trust Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
14.1 Graphs for Trust Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
14.1.1 Directed Series-Parallel Graphs . . . . . . . . . . . . . . . . . . . . . . . . 263
14.2 Outbound-Inbound Node Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
14.2.1 Parallel-Path Subnetworks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
14.2.2 Nesting Level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
14.3 Analysis of DSPG Trust Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
14.3.1 Algorithm for Analysis of DSPG . . . . . . . . . . . . . . . . . . . . . . . 268
14.3.2 Soundness Requirements for Trust Recommendations . . . . . 269
14.4 Analysing Complex Non-DSPG Trust Networks . . . . . . . . . . . . . . . . . 272
14.4.1 Synthesis of DSPG Trust Network . . . . . . . . . . . . . . . . . . . . . . 274
14.4.2 Requirements for DSPG Synthesis . . . . . . . . . . . . . . . . . . . . . . 276
15 Bayesian Reputation Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
15.1 Computing Reputation Scores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
15.1.1 Binomial Reputation Scores. . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
15.1.2 Multinomial Reputation Scores. . . . . . . . . . . . . . . . . . . . . . . . . 283
15.2 Collecting and Aggregating Ratings . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
15.2.1 Collecting Ratings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
15.2.2 Aggregating Ratings with Aging . . . . . . . . . . . . . . . . . . . . . . . 285
15.2.3 Reputation Score Convergence with Time Decay . . . . . . . . . . 286
15.3 Base Rates for Ratings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
15.3.1 Individual Base Rates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
15.3.2 Total History Base Rate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
15.3.3 Sliding Time Window Base Rate. . . . . . . . . . . . . . . . . . . . . . . . 287
15.3.4 High Longevity Factor Base Rate. . . . . . . . . . . . . . . . . . . . . . . 288
15.3.5 Dynamic Community Base Rates . . . . . . . . . . . . . . . . . . . . . . . 288
15.4 Reputation Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
15.4.1 Multinomial Probability Representation. . . . . . . . . . . . . . . . . . 289
15.4.2 Point Estimate Representation. . . . . . . . . . . . . . . . . . . . . . . . . . 290
15.4.3 Continuous Ratings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
15.5 Simple Scenario Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
15.6 Combining Trust and Reputation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
15.7 Combining Trust and Reputation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
16 Subjective Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
16.1 Bayesian Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
16.1.1 Example: Lung Cancer Situation . . . . . . . . . . . . . . . . . . . . . . . 297
16.1.2 Naïve Bayes Classifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
16.1.3 Independence and Separation . . . . . . . . . . . . . . . . . . . . . . . . . . 301
16.2 Subjective Bayesian Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
16.2.1 Subjective Predictive Reasoning . . . . . . . . . . . . . . . . . . . . . . . . 304
16.2.2 Subjective Diagnostic Reasoning . . . . . . . . . . . . . . . . . . . . . . . 305
16.2.3 Subjective Intercausal Reasoning . . . . . . . . . . . . . . . . . . . . . . . 306
16.3 Subjective Combined Reasoning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
16.4 Subjective Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
Chapter 1
Introduction
In standard logic, propositions are considered to be either true or false, and in probability calculus the argument probabilities are expressed in the range [0, 1]. However,
a fundamental aspect of the human condition is that nobody can ever determine with
absolute certainty whether a proposition about the world is true or false, or determine the probability of something with certainty. In addition, whenever the truth
of a proposition is assessed, it is always done by an individual, and it can never
be considered to represent a general and objective belief. This indicates that important aspects are missing in the way standard logic and probability calculus capture
our perception of reality, and that these reasoning models are more designed for an
idealised world than for the subjective world in which we are all living.
The expressiveness of arguments in a reasoning model depends on the richness in
the syntax of those arguments. Opinions used in subjective logic offer significantly
greater expressiveness than Boolean truth values or probabilities. This is achieved
by explicitly including degrees of uncertainty, thereby allowing an analyst to specify
“I don’t know” or “I’m indifferent” as an input argument. Definitions of operators used
in a specific reasoning model depend on the argument syntax. For example, in binary
logic the AND, OR and XOR operators are defined by their respective truth tables
which traditionally have the status of being axioms. Other operators, such as MP
(Modus Ponens), MT (Modus Tollens) and other logical operators are defined in a
similar way.
The concept of probabilistic logic has multiple interpretations in the literature,
see e.g. [72]. The general aim of a probabilistic logic is to combine the capacity of
probability theory to handle likelihood with the capacity of binary logic to exploit
structure to make inferences. The result is a richer and more expressive formalism
that neither probability calculus nor deductive logic can offer alone. The various
probabilistic logics have in common that they attempt to find a natural extension of
traditional logic truth tables: the results they define are derived through probabilistic
expressions instead.
In this book, probabilistic logic is interpreted as the direct extension of binary
logic, in the sense that propositions get assigned continuous probabilities, rather
than just Boolean truth values, and where formulas of probability calculus replace
truth tables.
In binary logic, operators are typically defined as axioms represented as truth
tables. In probabilistic logic the corresponding operators are simply algebraic formulas that take probabilities as input arguments. Assuming that Boolean TRUE in
binary logic corresponds to probability p = 1, and that Boolean FALSE corresponds
to probability p = 0, binary logic (BL) simply becomes an instance of probabilistic
logic (PL), or equivalently, probabilistic logic becomes a generalisation of binary
logic. More specifically there is a direct correspondence between many binary logic
operators and probabilistic logic operator formulas, as specified in Table 1.1.
Table 1.1 Correspondence between binary logic and probabilistic logic operators

Binary Logic                Probabilistic Logic
AND:  x ∧ y                 Product:         p(x ∧ y) = p(x)p(y)
OR:   x ∨ y                 Coproduct:       p(x ∨ y) = p(x) + p(y) − p(x)p(y)
XOR:  x ≢ y                 Inequivalence:   p(x ≢ y) = p(x)(1 − p(y)) + (1 − p(x))p(y)
EQU:  x ≡ y                 Equivalence:     p(x ≡ y) = 1 − p(x ≢ y)
IMP:  x → y                 Implication:     not closed in probabilistic logic
MP:   {x → y, x} ⇒ y        Deduction:       p(y‖x) = p(x)p(y|x) + p(x̄)p(y|x̄)
MT:   {x → y, ȳ} ⇒ x̄        Abduction:       p(x|y) = a(x)p(y|x) / (a(x)p(y|x) + a(x̄)p(y|x̄))
                                             p(x|ȳ) = a(x)p(ȳ|x) / (a(x)p(ȳ|x) + a(x̄)p(ȳ|x̄))
                                             p(x‖̃y) = p(y)p(x|y) + p(ȳ)p(x|ȳ)
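As a concrete illustration, the probabilistic-logic column of Table 1.1 can be written directly as code. The following is a minimal Python sketch; the function names are illustrative only (not part of the book's notation), and the abduction formulas assume non-zero denominators:

def product(px, py):                      # p(x AND y) = p(x)p(y)
    return px * py

def coproduct(px, py):                    # p(x OR y) = p(x) + p(y) - p(x)p(y)
    return px + py - px * py

def inequivalence(px, py):                # p(x XOR y)
    return px * (1 - py) + (1 - px) * py

def equivalence(px, py):                  # p(x EQU y) = 1 - p(x XOR y)
    return 1 - inequivalence(px, py)

def deduction(px, py_x, py_notx):         # p(y || x); generalises MP
    return px * py_x + (1 - px) * py_notx

def abduction(py, py_x, py_notx, ax):     # p(x ||~ y); generalises MT
    anotx = 1 - ax
    px_y = ax * py_x / (ax * py_x + anotx * py_notx)
    px_noty = ax * (1 - py_x) / (ax * (1 - py_x) + anotx * (1 - py_notx))
    return py * px_y + (1 - py) * px_noty

# The Boolean corner cases reproduce the binary-logic truth tables:
assert product(1, 1) == 1 and coproduct(1, 0) == 1 and inequivalence(1, 1) == 0

Binary logic then appears as the special case where all input probabilities are 0 or 1, exactly as stated above.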
The Material Implication operator IMP is traditionally defined in binary logic in
terms of a truth table, but no corresponding probabilistic operator exists because it
would require degrees of uncertainty, which are not defined for probabilities. Material
implication is in fact not closed in probabilistic logic. In subjective logic however
it is possible to define an operator corresponding to material implication because
subjective logic includes degrees of uncertainty. This is explained in Section 9.7 on
Material Implication.
The notation p(y‖x) means that the probability of y is derived as a function of
the conditionals p(y|x) and p(y|x̄) as well as the parent p(x). The parameter a(x)
represents the base rate of x. The symbol ‘≢’ represents inequivalence, i.e. that x
and y have different truth values.
MP (Modus Ponens) corresponds to – and is a special case of – probabilistic
conditional deduction. MT (Modus Tollens) corresponds to – and is a special case of
– probabilistic conditional abduction. The notation p(y‖x) for conditional deduction
denotes the output probability of y conditionally deduced from the input conditionals
p(y|x) and p(y|x̄) as well as the input argument p(x). Similarly, the notation p(x‖̃y)
for conditional abduction denotes the output probability of x conditionally abduced
from the input conditionals p(y|x) and p(y|x̄) as well as the evidence argument p(y).
For example, consider the probabilistic operator for MT in Table 1.1, in the case
when (x → y) is TRUE and y is FALSE, which translates into p(y|x) = 1 and p(y) = 0.
Then it can be observed from the first equation that p(x|y) ≠ 0 because p(y|x) = 1.
From the second equation it can be observed that p(x|ȳ) = 0 because p(ȳ|x) = 1 −
p(y|x) = 0. From the third equation it can finally be seen that p(x‖̃y) = 0 because
p(y) = 0 and p(x|ȳ) = 0. From the probabilistic expressions we have thus abduced
that p(x) = 0, equivalently p(x̄) = 1, which translates into x being FALSE, as MT dictates.
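This MT argument can be checked numerically. The sketch below assumes a base rate a(x) = 0.8 and a conditional p(y|x̄) = 0.4; both are arbitrary, since the conclusion does not depend on them:

a_x = 0.8                                  # assumed base rate of x
p_y_x, p_y_notx = 1.0, 0.4                 # p(y|x) = 1 since (x -> y) is TRUE; p(y|x̄) assumed
p_y = 0.0                                  # y is FALSE
p_x_y = a_x * p_y_x / (a_x * p_y_x + (1 - a_x) * p_y_notx)                          # first equation, > 0
p_x_noty = a_x * (1 - p_y_x) / (a_x * (1 - p_y_x) + (1 - a_x) * (1 - p_y_notx))     # second equation, = 0
p_x = p_y * p_x_y + (1 - p_y) * p_x_noty                                            # third equation, = 0
print(round(p_x_y, 3), p_x_noty, p_x)      # 0.909 0.0 0.0  ->  x is FALSE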
The power of probabilistic logic is the ability to derive logic conclusions without
relying on axioms of logic, only on principles of probability calculus.
Probabilistic logic was first defined by Nilsson [72] with the aim of combining
the capability of deductive logic to exploit the structure and relationship of arguments and events, with the capacity of probability theory to express degrees of truth
about those arguments and events. Probabilistic logic can be used to build reasoning models of practical situations that are more general and more expressive than
reasoning models based on binary logic.
Logically speaking it is meaningless to define binary logic operators such as
those of Table 1.1 in terms of truth tables because these operators are simply special
cases of corresponding probabilistic operators. The truth values specified in truth
tables can be directly computed with probabilistic logic operators, so defining them
as axioms is redundant. To have separate independent definitions for the same concept, i.e. both as a truth table and as a probability calculus operator, is problematic
because of the possibility of inconsistency between definitions. In the defence of
truth tables one could say that it is pedagogically meaningful to use truth tables as
a look-up tool for Boolean cases because a simple look-up is faster than computing
the result of an algebraic expression. However, the truth tables should be defined
in terms of their corresponding probabilistic logic operators, and not as separate
axioms.
A serious limitation of probabilistic logic (and binary logic alike), is that there
is often uncertainty about the probabilities (and Boolean values) themselves. When
the analyst is unable to estimate probability for input arguments then probabilistic
logic is not an adequate formalism. It is for example impossible to express probabilistic input arguments that reflect expressions like “I don’t know” because they
express degrees of ignorance and uncertainty. An analyst who is unable to provide
any reliable probability for a given input argument can be tempted or even forced to
set a probability without any evidence to support it. This practice will generally lead
to unreliable conclusions, often described as the problem of ‘garbage in - garbage
out’. In case the analyst wants to express “I don’t know the truth values of variables
X1 and X2 ” and needs to derive p(X1 ∧ X2 ), then probabilistic logic does not offer
an adequate model.
The type of uncertainty that subjective opinions express is in the literature typically called second-order probability or second-order uncertainty, where traditional
probability represents first-order uncertainty [29, 86]. More specifically, (second-order) uncertainty is expressed as a probability density function over first-order
probabilities. Probability density functions must have an integral of 1 to respect
the additivity axiom of probability theory. Apart from this requirement, a probability density function can take any shape, and thereby represent many different
forms of uncertainty. The uncertainty expressed by subjective opinions represents
second-order probability in the form of Dirichlet probability density functions. A
Dirichlet PDF naturally reflects random sampling of statistical events. Uncertainty
in the Dirichlet model reflects the lack of evidence, in the sense that the fewer observed samples there are, the less evidence there is, and the more uncertainty there
is. In subjective logic, the term uncertainty is interpreted in this way, with reference
to the lack of evidence which can be reflected by the Dirichlet model.
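As a rough numeric illustration of this evidence interpretation, the sketch below assumes the Beta mapping introduced later in the book (Section 3.3), where uncertainty mass is u = W/(r + s + W) for r positive observations, s negative observations and a default non-informative prior weight W = 2:

W = 2                               # assumed default non-informative prior weight
for r, s in [(0, 0), (1, 1), (4, 4), (49, 49)]:
    u = W / (r + s + W)             # uncertainty mass implied by the amount of evidence
    print(r, s, round(u, 2))        # 1.0, 0.5, 0.2, 0.02: more evidence gives less uncertainty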
The additivity principle of classical probability requires that the probabilities of
mutually disjoint elements in a state space add up to 1. This requirement makes it
necessary to estimate a probability for every state, even though there might not be
a basis for it. In other words, it prevents us from explicitly expressing ignorance
about the possible states, outcomes or statements. If somebody wants to express
ignorance about the state x as “I don’t know” this would be impossible with a simple
scalar probability. A probability P(x) = 0.5 would for example mean that x and x̄ are
equally likely, which in fact is quite informative, and very different from ignorance.
Alternatively, a uniform probability density function would more closely express
the situation of ignorance about the possible outcome.
Arguments in subjective logic are called subjective opinions, or opinions for
short. An opinion can contain degrees of uncertainty in the sense of uncertainty
about probability. The uncertainty of an opinion can be interpreted as ignorance
about the truth of the relevant states, or as second order probability about first order
probabilities.
The subjective opinion model extends the traditional belief function model in
the sense that opinions take base rates into account, whereas belief functions ignore
base rates. An essential characteristic of subjective logic is thus to include base
rates, which also makes it possible to define a bijective mapping between subjective
opinions and Dirichlet probability density functions. Subjective opinions generalise
belief functions, precisely because subjective opinions include base rates, and in that
sense have a richer expressiveness than belief functions.
Belief theory has its origin in a model for upper and lower probabilities proposed
by Dempster in 1960. Shafer later proposed a model for expressing belief functions
[81]. The main idea behind belief theory is to abandon the additivity principle of
probability theory, i.e. that the sum of probabilities on all pairwise disjoint states
must add up to one. Instead belief theory gives observers the ability to assign so-called belief mass to the powerset of the state space. The main advantage of this approach is that ignorance, i.e. the lack of evidence about the truth of the states, can be
explicitly expressed e.g. by assigning belief mass to the whole state space. Shafer’s
book [81] describes various aspects of belief theory, but the two main elements are
1) a flexible way of expressing beliefs, and 2) a conjunctive method for combining
belief functions, commonly known as Dempster’s Rule, which in subjective logic is
called the belief constraint fusion operator.
The definition of new operators for subjective opinions is normally quite simple,
and consists of adding the new dimension of uncertainty to traditional probabilistic operators. Currently, a relatively large set of practical subjective logic operators
exists. This provides a flexible framework for reasoning in a large variety of situations where input arguments can be incomplete or affected by uncertainty. Subjective opinions are equivalent to Dirichlet and Beta probability density functions.
Through this equivalence subjective logic provides a calculus for reasoning with
probability density functions.
The aim of this book is to provide a general introduction to subjective logic. Different but equivalent representations of subjective opinions are presented together
with their interpretation. This allows uncertain probabilities to be seen from different angles, and allows an analyst to define models according to the formalisms that
they are most familiar with, and that most naturally represents a specific real world
situation. Subjective logic contains the same set of basic operators known from binary logic and classical probability calculus, but also contains some nontraditional
operators which are specific to subjective logic.
The advantage of subjective logic over traditional probability calculus and probabilistic logic is that lack of evidence can be explicitly expressed so that real world
situations can be modeled and analysed more realistically than is otherwise possible with purely probabilistic models. The analyst’s partial ignorance and lack of
information can be taken explicitly into account during the analysis, and explicitly
expressed in the conclusion. When used for decision support, subjective logic allows
decision makers to be better informed about uncertainties affecting the assessment
of specific situations and future outcomes.
Chapter 2
Elements of Subjective Opinions
2.1 Motivation for the Opinion Representation
Explicit expression of uncertainty is the main motivation for subjective logic and
for using opinions as input arguments to reasoning models. Uncertainty comes in
many flavours, and a good taxonomy is described in [85]. In subjective logic,
uncertainty relates to probabilities. For example, let the probability of a future event
x be estimated as p(x) = 0.5. In case this probability estimate represents the perceived likelihood of obtaining heads when flipping a fair coin, then it would be
natural to represent it as an opinion with zero uncertainty. In case the probability
estimate represents the perceived likelihood that there is life on a planet in a specific
solar system, then it would be natural to represent it as an opinion with considerable
uncertainty. The probability estimate of an event is thus separated from the certainty/uncertainty of the probability. With this explicit representation of uncertainty,
subjective logic can be applied for analysing situations where events have more or
less certain probabilities, i.e. where the analyst is ignorant about the probability of
possible events. This is done by including the degree of uncertainty about probabilities as an explicit parameter in the input arguments. This uncertainty is then taken
into account during the analysis and explicitly represented in the output conclusion.
In other words, subjective logic allows uncertainty to propagate through the analysis
all the way to the output conclusions.
For decision makers it can make a big difference whether probabilities are certain or uncertain. For example, it is risky to make important decisions based on highly
uncertain probabilities. Decision makers can instead request additional evidence in
order to get more conclusive beliefs less affected by uncertainty.
2.2 Flexibility of Representation
There can be multiple equivalent syntactic representations of subjective opinions.
The traditional opinion expression is a composite function consisting of belief
masses, uncertainty mass and base rates which are described separately below. An
opinion applies to a variable which takes its values from a domain (i.e. from a state
space). An opinion defines a sub-additive belief mass distribution over the variable,
meaning that the sum of belief masses can be less than one. Opinions can have an
attribute that identifies the belief owner.
An important property of opinions is that they are equivalent to Beta or Dirichlet
probability density functions (PDF) under a specific mapping. This equivalence is
based on simple assumptions about the correspondence between evidence and belief
mass distributions. More specifically, an infinite amount of evidence leaves no room
for uncertainty, and produces an additive belief mass distribution (i.e. the sum is
equal to one). A finite amount of evidence gives room for uncertainty and produces
a subadditive belief mass distribution (i.e. the sum is less than one). In practical
situations the amount of evidence is always finite, so that practical opinions should
always have subadditive belief mass that is complemented by some uncertainty. The
basic features of subjective opinions are defined in the sections below.
2.3 Domains and Hyperdomains
In subjective logic a domain is a state space consisting of a set of values which can
also be called elements. Domains can be binary (with exactly two values) or n-ary
(with n values) where n > 2.
The values of the domain can e.g. be observable or hidden states, events, hypotheses or propositions, just like in traditional Bayesian modeling. The different
values of a domain are assumed to be exclusive and exhaustive, which means that
the world can only be in one state at any moment in time, and that all possible states
are included in the domain.
The simplest domain is binary, e.g. denoted X = {x, x̄}, where x̄ is the complement of x, as illustrated in Figure 2.1.
Fig. 2.1 Binary domain
Binary domains are typically used when modeling situations that have only two
alternatives, such as when modeling a switch that can be either on or off.
When more than two alternatives are possible the model requires a domain larger
than binary. An n-ary domain specifies three or more different exhaustive and exclusive values, where the example quaternary domain Y = {y1 , y2 , y3 , y4 } is illustrated
in Figure 2.2.
Fig. 2.2 Example quaternary domain
Domains are typically specified to reflect realistic situations for the purpose of
being practically analysed in some way. The values of an n-ary domain are singletons, i.e. they are considered to represent a single possible state or event. It is
possible to combine singletons into composite values, as explained below.
Assume a ternary domain X = {x1 , x2 , x3 }. The hyperdomain of X is the reduced
powerset denoted R(X) as illustrated in Figure 2.3, where the solid circles denoted
x1 , x2 and x3 represent singleton values, and the dotted oval shapes denoted (x1 ∪x2 ),
(x1 ∪ x3 ) and (x2 ∪ x3 ) represent composite values.
Fig. 2.3 Example hyperdomain
Definition 2.1 (Hyperdomain). Let X be a domain and let P(X) denote the powerset of X. The powerset contains all subsets of X, including the empty set ∅ and
the domain X itself. The hyperdomain denoted R(X) is the reduced powerset of X,
i.e. the powerset excluding the empty set ∅ and the domain X. The hyperdomain is
expressed as:

Hyperdomain: R(X) = P(X) \ {X, ∅}.    (2.1)
A composite value x ∈ R(X) \ X is the union of a set of singleton values from X.
The interpretation of a composite value being TRUE is that one and only one of the
constituent singletons is TRUE, and that it is unspecified which singleton is TRUE
in particular.
Singletons represent real possible states in a situation to be analysed. A composite value on the other hand does not reflect a specific state in the real world,
because otherwise we would have to assume that the world can be in multiple different states at the same time, which contradicts the assumption behind the original
domain. Composites are only used as a synthetic artifact for allowing belief mass to
express that one of multiple singletons is true, but not which singleton in particular
is true.
The property that all proper subsets of X are elements of R(X), but not the domain X itself nor the empty set ∅, is in line with the hyper-Dirichlet model [31]. The cardinality of the hyperdomain is κ = |R(X)| = 2^k − 2, where k = |X|. Indexes can be used to identify specific values in a
hyperdomain, and a natural question is how these elements should be indexed.
One simple indexing method is to index each composite element as a function of
the singleton elements that it contains, as illustrated in Figure 2.3. While this is a
very explicit indexing method, it can be complex to use in mathematical expressions.
A more compact indexing method is to use continuous indexing, where indexes
in the range [1, k] identify singleton values in X, and indexes in the range [k + 1, κ ]
identify composites. The values contained in the hyperdomain R(X) are thus the
singletons of X with index in the range [1, k], as well as the composites with index
in the range [k + 1, κ ]. The type of indexes following this method is illustrated in
Figure 2.4, which is equivalent to the indexing method illustrated in Figure 2.3.
Fig. 2.4 Example of continuous indexing of composite elements in hyperdomain
Let us explain the continuous indexing method below. Assume X to be a domain
of cardinality k, and then consider how to index the elements of the hyperdomain
R(X) of cardinality κ . It is practical to define the first k elements of R(X) as having
the same index as the corresponding singletons of X. The remaining elements of
R(X) can be indexed in a simple and intuitive way.
The elements of R(X) can be grouped in cardinality classes according to the
number of singletons from X that they contain. Let j denote the number of singletons in the elements of a specific cardinality class, then call it ‘cardinality class j’.
By definition then, all elements belonging to cardinality class j have cardinality j.
The actual number of elements belonging to each cardinality class is determined
by the Choose Function C(k, j), which determines the number of ways that j out
of k singletons can be chosen. The Choose Function, equivalent to the binomial
coefficient, is defined as:
C(k, j) = k! / ((k − j)! j!)    (2.2)
Within a given hyperdomain each element can be indexed according to the order
of the lowest indexed singletons from X that it contains. As an example, Figure 2.2
above illustrates domain X of cardinality k = 4. Let us consider the specific composite value xm = {x1 , x2 , x4 } ∈ R(X).
The fact that xm contains 3 singletons identifies it as an element of cardinality
class 3. The two first singletons x1 and x2 have the lowest indexes that are possible
to select, but the third singleton x4 has the second lowest index that is possible
to select. This particular element must therefore be assigned the second relative
index in cardinality class 3. However, its absolute index depends on the number
of elements in the inferior cardinality classes. Table 2.1 specifies the number of
elements of cardinality classes 1 to 3, as determined by Eq.(2.2).
Table 2.1 Number of elements per cardinality class

Cardinality class:                              1   2   3
Number of elements in each cardinality class:   4   6   4
In this example, cardinality class 1 has 4 elements, and cardinality class 2 has 6
elements, which together make 10 elements. Because xm represents the 2nd relative
index in cardinality class 3, its absolute index is 10 + 2 = 12. The solution is thus
m = 12, so that x12 = {x1, x2, x4}. To complete the example, Table 2.2 specifies
the index and cardinality class of all the elements of R(X) according to this scheme.
Table 2.2 Index and cardinality class of elements of R(X) in case |X| = 4.

Element index   Singletons contained   Cardinality class
1               {x1}                   1
2               {x2}                   1
3               {x3}                   1
4               {x4}                   1
5               {x1, x2}               2
6               {x1, x3}               2
7               {x1, x4}               2
8               {x2, x3}               2
9               {x2, x4}               2
10              {x3, x4}               2
11              {x1, x2, x3}           3
12              {x1, x2, x4}           3
13              {x1, x3, x4}           3
14              {x2, x3, x4}           3
Elements of cardinality class 1 are the original singletons from X. The domain
X = {x1 , x2 , x3 , x4 } does not figure as an element of R(X) in Table 2.2 because
excluding X is precisely what makes R(X) a reduced powerset and a hyperdomain.
An element of R(X) that contains multiple singletons is called a composite element
because it represents the combination of multiple singletons. In other words, when
an element is a non-singleton, or equivalently is not an element in cardinality class
1, then it is a composite element in R(X). This is formally defined below.
Definition 2.2 (Composite Set). Let X be a domain of cardinality k, where R(X)
is its hyperdomain of cardinality κ. Every proper subset x ⊂ X of cardinality
|x| ≥ 2 is a composite element. The set of composite elements is the
composite set, denoted C(X) and defined as:

Composite set: C(X) = {x ⊂ X where |x| ≥ 2}.    (2.3)
It is straightforward to prove the following equality:

C(X) = R(X) \ X.    (2.4)

The cardinality of the composite set C(X) is expressed as:

|C(X)| = κ − k.    (2.5)
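A minimal Python sketch of the hyperdomain construction, the continuous indexing scheme and the cardinality relations above (the element names are purely illustrative):

from itertools import combinations

def hyperdomain(singletons):
    # R(X): the singletons first (cardinality class 1), then the composites,
    # grouped by cardinality class and ordered by lowest-indexed singletons
    k = len(singletons)
    elements = [frozenset([s]) for s in singletons]
    for j in range(2, k):                          # classes 2 .. k-1; X itself is excluded
        elements += [frozenset(c) for c in combinations(singletons, j)]
    return elements

X = ['x1', 'x2', 'x3', 'x4']
R = hyperdomain(X)
k, kappa = len(X), len(R)
assert kappa == 2**k - 2                                  # cardinality of the hyperdomain
assert len([x for x in R if len(x) >= 2]) == kappa - k    # |C(X)| = kappa - k, Eq.(2.5)
print(R.index(frozenset({'x1', 'x2', 'x4'})) + 1)         # -> 12, matching Table 2.2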
Section 4.1.2 describes the degree of vagueness in an opinion as a function of the
belief mass assigned to composite elements, i.e. to elements in C (X).
2.4 Random Variables and Hypervariables
Let X denote a binary or an n-ary domain of cardinality k. Then we can define X to
be a random variable which takes its values from X. For example, if X is a ternary
domain, then ‘X = x3 ’ means that the random variable X has value x3 , which is
typically interpreted in the sense that x3 is TRUE. Note our convention that domains
are denoted with blackboard letters such as 𝕏, 𝕐 or ℤ, and that variables are denoted
with italic capital letters such as X, Y or Z.
Let X be a ternary domain and consider X’s hyperdomain denoted R(X). The
concept of hyperdomain calls for the possibility of assigning values of the hyperdomain to a variable. For example, it must be possible for a variable to take the
composite value {x1 , x3 } ∈ R(X). This means that the real TRUE value is either x1
or x3, but that it is unspecified which value in particular it is. Variables that take their
values from a hyperdomain are naturally called hypervariables, as defined below.
Definition 2.3 (Hypervariable). Let X be a domain with corresponding hyperdomain R(X). A variable X that takes its values from R(X) is a hypervariable.
A hypervariable X can be constrained to a random variable by restricting it to
only take values from the domain X. For simplicity of notation we use the same
notation for a random variable and the corresponding hypervariable, so that e.g. X
can denote both a random variable and a hypervariable. When either meaning can
be assumed we simply use the term variable.
Let now X be a variable which can take its values from the ternary domain
WEATHER = {rainy, sunny, overcast} that contains the three possible weather
types specified as {rainy}, {sunny} and {overcast}. The hyperdomain denoted
R(WEATHER) contains the singletons of the original WEATHER domain, as well
as all possible composites such as {rainy, overcast}. Remember that values in a domain are exclusive, meaning that it is assumed that only one value is TRUE at any
one time. In case a composite value is considered TRUE, it must be interpreted in
the sense that in reality only one of the contained singleton values is TRUE, but that
it is unknown which value in particular it is.
So when a variable takes a composite value such as X = {rainy, sunny} it means
that the actual weather is either rainy or sunny, but not both at the same time. If the
analyst wants to include the realistic possibility that there can be rain and sunshine
simultaneously, then the domain would need to be extended with a corresponding
singleton value such as {rainy&sunny}. It is thus a question of interpretation how
the analyst wants to separate between different types of weather, and thereby define
the relevant domain.
2.5 Belief Mass Distribution and Uncertainty Mass
Subjective opinions are based on belief mass distributions over a domain X or over
a hyperdomain R(X). In case of multinomial opinions the belief mass distribution
is restricted to the domain X. In case of hyper-opinions the belief mass distribution
applies to the hyperdomain R(X). Belief mass assigned to a singleton value xi ∈ X
expresses support for xi being TRUE. Belief mass assigned to a composite value
x j ∈ R(X) expresses support for one of the singleton values contained in x j being
TRUE, but says nothing about which of them in particular is TRUE.
Belief mass distributions are subadditive, meaning that the sum of belief
masses can be less than one. The sub-additivity of belief mass distributions is complemented by uncertainty mass denoted uX . In general, the belief mass distribution
b X assigns belief masses to possible values of the variable X ∈ R(X) as a function of
the support for those values. The uncertainty mass uX represents lack of support for
the variable X to have any specific value. As explained in Ch.1 Introduction, uncertainty mass can also be interpreted as second-order probability uncertainty, where
probability represents first-order uncertainty. The sub-additivity of the belief mass
distribution and the complement property of the uncertainty mass are expressed by
Eq.(2.6) and Eq.(2.7) below.
Definition 2.4 (Belief Mass Distribution). Let X be a domain with corresponding
hyperdomain R(X), and let X be a variable over those domains. A belief mass
distribution denoted b X assigns belief mass to possible values of the variable X.
In case of a random variable X ∈ X the belief mass distribution applies to domain
X, and in case of a hypervariable X ∈ R(X) the belief mass distribution applies to
hyperdomain R(X). This is formally defined as:
Belief mass distribution over domain X is defined as: bX : X → [0, 1],
with the multinomial additivity requirement: uX + ∑x∈X bX(x) = 1.    (2.6)

Belief mass distribution over hyperdomain R(X) is defined as: bX : R(X) → [0, 1],
with the hypernomial additivity requirement: uX + ∑x∈R(X) bX(x) = 1.    (2.7)
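A small numeric sketch of Definition 2.4, with assumed belief and uncertainty masses over a hypothetical ternary domain {x1, x2, x3}:

# Multinomial case, Eq.(2.6): belief mass on singletons only
b = {'x1': 0.6, 'x2': 0.1, 'x3': 0.1}
u = 0.2
assert abs(u + sum(b.values()) - 1.0) < 1e-9

# Hypernomial case, Eq.(2.7): belief mass may also be assigned to composite values
b_hyper = {('x1',): 0.5, ('x2',): 0.1, ('x1', 'x3'): 0.2}
u_hyper = 0.2
assert abs(u_hyper + sum(b_hyper.values()) - 1.0) < 1e-9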
There exists a direct correspondence between the bba (basic belief assignment)
distribution m used for representing belief masses in traditional belief theory [81],
and the b X and uX functions used in subjective logic. The correspondence is defined
such that m (X) = uX and m (x) = b X (x), ∀x ∈ R(X). Here m (x) denotes belief mass
assigned to x. The difference is thus that subjective logic considers uncertainty mass
to be different in nature from belief mass, where uncertainty mass exists a priori
and belief mass is created a posteriori as a function of collected evidence according to the Beta and Dirichlet models. One advantage of representing opinions with
separate belief and uncertainty mass is that it makes the opinion model equivalent
to the Beta and Dirichlet models [31, 53] as described in Ch.3 below. While belief theory offers a very general model for expressing beliefs, it is limited by not
having any direct correspondence to classical models of statistics, except through
a default pignistic probability based on default base rates. Because of its purely
theoretical characteristics, default pignistic probability is not suitable for modeling
realistic non-default situations. This is because traditional belief theory does not
specify base rates. Without base rates however, belief theory does not provide an
adequate model for pignistic probability. In subjective logic the notion of projected
probability uses base rates and is consistent with the Dirichlet model, and thereby
with traditional statistical theory.
2.6 Base Rate Distributions
The concept of base rates is central in the theory of probability. Base rates are for
example needed for default reasoning, for abductive reasoning and for Bayesian
updating. This section describes the concept of base rate distribution over variables,
and shows how it can be used for probability projections.
Given a domain X of cardinality k, the default base rate of each singleton in
the domain is 1/k, and the default base rate of a subset consisting of n singletons is
n/k. In other words, the default base rate of a composite value is equal to the number
of singletons in the composite value relative to the cardinality of the whole domain.
Default base rate is sometimes called ‘relative atomicity’.
For each composite value there exist default relative base rates with respect to
every other fully or partly overlapping value x ⊂ X. Remember that a value x ⊂ X
is also a value x ∈ R(X).
However, in practical situations it would be possible and useful to apply base
rates that are different from the default base rates. For example, when considering
the base rate of a particular infectious disease in a specific population, the domain
can be defined as a set of two values: {‘infected’, ‘not infected’} with respect to a
particular disease. Assuming that an unknown person enters a medical clinic, the
physician would a priori be ignorant about whether that person is infected or not
before having assessed any evidence. This ignorance should intuitively be expressed
as uncertainty mass. The probability projection of a vacuous opinion using a default
base rate of 0.5 would dictate the a priori probability of 0.5 that the person has the
disease. However, the base rate of a disease is normally much lower than 0.5 , and
can typically be determined given relevant statistics from a given population.
Typically, data is collected from hospitals, clinics and other sources where people
diagnosed with a specific disease are treated. The amount of data that is required
to calculate a reliable base rate of the disease can be determined by guidelines,
statistical analysis, and expert opinion about the data that it is truly reflective of
the actual number of infections – which is itself a subjective assessment. After the
guidelines, analysis and opinion are all satisfied, the base rate will be determined
from the data, and can then be used with medical tests to provide a better indication
of the likelihood of specific patients having contracted the disease [32].
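As a rough numeric illustration (all figures below are assumed for the example, not real statistics), the posterior probability of infection given a positive test depends strongly on whether the default base rate 0.5 or a realistic low base rate is used:

p_pos_inf, p_pos_notinf = 0.99, 0.05          # assumed test sensitivity and false-positive rate
for a_inf in (0.5, 0.01):                     # default base rate versus an assumed realistic base rate
    p_inf_pos = a_inf * p_pos_inf / (a_inf * p_pos_inf + (1 - a_inf) * p_pos_notinf)
    print(a_inf, round(p_inf_pos, 3))         # 0.5 -> 0.952,  0.01 -> 0.167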
Integrating base rates with belief mass distributions enables a better and more
intuitive interpretation of opinions, facilitates probability projections from opinions,
and provides a basis for conditional reasoning. When using base rates for probability
projections, the contributing belief mass assigned to values in the domain and the
contributing uncertainty mass is a function of the base rate distribution.
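A minimal sketch of such a projection, assuming the standard subjective-logic projection PX(x) = bX(x) + aX(x)·uX that is formally introduced in Chapter 3 (all numbers are assumed for illustration):

b = {'x1': 0.6, 'x2': 0.1, 'x3': 0.1}         # belief mass distribution
u = 0.2                                        # uncertainty mass
a = {'x1': 0.5, 'x2': 0.25, 'x3': 0.25}        # base rate distribution
P = {x: b[x] + a[x] * u for x in b}            # uncertainty contributes in proportion to the base rate
print(P)                                       # {'x1': 0.7, 'x2': 0.15, 'x3': 0.15}
assert abs(sum(P.values()) - 1.0) < 1e-9       # the projection is an additive probability distribution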
Base rates are expressed as a base rate distribution denoted aX, so that aX(x) represents the base rate of the element x ∈ X. The base rate distribution is formally defined below.

Definition 2.5 (Base Rate Distribution). Let X be a domain, and let X be a random variable in X. The base rate distribution aX assigns base rate probability to possible values of X ∈ X, and is an additive probability distribution, formally expressed as:

    Base rate distribution:  aX : X → [0, 1],

with the additivity requirement:

    ∑_{x∈X} aX(x) = 1 .        (2.8)
□
The base rate distribution is normally assumed to be common (i.e. not subjective) because it is based on general background information. So although different analysts can have different opinions about the same variable, they normally share the same base rate distribution over the domain of a particular situation. However, it is obvious that two different observers can also assign different base rate distributions to the same random variable in case they do not share the same background information. Base rates can thus be partially objective and partially subjective.
This flexibility allows two different analysts to assign different belief masses as well as different base rates to the same variable. This naturally reflects different views, analyses and interpretations of the same situation by different observers.
Events that can be repeated many times are typically frequentist in nature, meaning that base rates for such events typically can be derived from statistical observations. For events that can only happen once, the analyst must often extract base rates
from subjective intuition or from analyzing the nature of the phenomenon at hand
and any other relevant evidence. However, in many cases this can lead to considerable vagueness about base rates, and when nothing else is known, it is possible to
use the default base rate distribution for a random variable. More specifically, when
there are k singletons in the domain, the default base rate of each singleton is 1/k.
The difference between the concepts of subjective and frequentist probabilities
is that the former can be defined as subjective betting odds – and the latter as the
relative frequency of empirically observed data, where the subjective probability
normally converges toward the frequentist probability in the case where empirical
data is available [12]. The concepts of subjective and empirical base rates can be
interpreted in a similar manner where they also converge and merge into a single
base rate when empirical data about the population in question is available.
The usefulness of base rate distributions is that they make it possible to derive projected probability distributions from opinions. Projection from opinion space to probability space removes uncertainty, using the base rate, to produce a probability distribution over a domain. The projected probability distribution depends partially on belief mass and partially on uncertainty mass, where the contribution from uncertainty is weighted as a function of the base rate. It can be useful to project probability
for composite values in a hyperdomain, and for that purpose it is necessary to first
compute base rate for such values. The computation of base rate for elements in a
hyperdomain is defined below.
Definition 2.6 (Base Rate for Elements in a Hyperdomain). Let X be a domain
with corresponding hyperdomain R(X), and let X be a variable over those domains.
Assume the base rate distribution a X over the domain X according to Def.2.5. The
base rate a X (x) for a composite element x ∈ R(X) can be computed as follows:
    Base rate over composite values:  aX(xi) = ∑_{xj∈X, xj⊆xi} aX(xj) ,  ∀xi ∈ R(X) .        (2.9)
□
Eq.(2.9) says that the base rate on a composite element xi ∈ R(X) is the sum of base rates on the singletons xj contained in xi. Note that this is not a base rate distribution, because base rates on singletons would be counted multiple times, and so the base rates over R(X) would be super-additive.
Belief masses can be assigned to elements in the hyperdomain, i.e. to fully or partially overlapping subsets of the domain. In order to take such belief masses into account for probability projections it is necessary to also derive relative base
rates for these elements as a function of the degree of overlap with each other. This
is defined below.
Definition 2.7 (Relative Base Rates). Assume the domain X of cardinality k, and the corresponding hyperdomain R(X). Let X be a hypervariable over R(X). Assume that a base rate distribution aX is defined over X according to Def.2.6. Then the base rate of an element xi relative to an element xj is expressed as the relative base rate aX(xi/xj) defined below.

    aX(xi/xj) = aX(xi ∩ xj) / aX(xj) ,  ∀ xi, xj ∈ R(X),  where aX(xj) ≠ 0 .        (2.10)
□
In the case when aX(xj) = 0, then aX(xi/xj) = 0.
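To make Eq.(2.9) and Eq.(2.10) concrete, the following minimal Python sketch computes base rates of composite values and relative base rates. The representation (singletons as strings, composite values as frozensets) and the function names are illustrative assumptions only, not part of the formalism.

```python
def base_rate_of(value, a):
    """Base rate of a (possibly composite) value per Eq.(2.9):
    the sum of the base rates of the singletons it contains."""
    return sum(a[s] for s in value)

def relative_base_rate(xi, xj, a):
    """Relative base rate aX(xi/xj) per Eq.(2.10), with the
    convention aX(xi/xj) = 0 when aX(xj) = 0."""
    a_xj = base_rate_of(xj, a)
    if a_xj == 0:
        return 0.0
    return base_rate_of(xi & xj, a) / a_xj

# Example: ternary domain {x1, x2, x3} with default base rates 1/3.
a = {'x1': 1/3, 'x2': 1/3, 'x3': 1/3}
x12 = frozenset({'x1', 'x2'})
x23 = frozenset({'x2', 'x3'})
print(base_rate_of(x12, a))                            # 2/3
print(relative_base_rate(frozenset({'x1'}), x12, a))   # 0.5
print(relative_base_rate(x12, x23, a))                 # 0.5
```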
From a syntactic point of view, base rates are simply probabilities. From a semantic point of view, base rates are non-informative prior probabilities estimated
as a function of general background information for a class of variables. The term
‘non-informative’ is used to express that no specific evidence is available for determining the probability of a specific event other than the general background information for that class of events. Base rates make it possible to define a bijective
mapping between opinions and Dirichlet probability density functions, and are used
for probability projections. The base rate concepts defined in this chapter are used
for various computations with opinions, as described in the next chapters.
2.7 Probability Distributions
A probability distribution assigns a probability to each value of a random variable. In case it distributes probability over a single random variable, it is a univariate probability distribution. A probability distribution can also be multivariate, in which case it distributes joint probability over two or more random variables taking on various combinations of values.
With a probability distribution denoted pX, the probability pX(x) represents the probability of the value x ∈ X. The probability distribution is formally defined below.
Definition 2.8 (Probability Distribution). Let X be a domain with corresponding hyperdomain R(X), and let X denote a variable in X or in R(X). The standard probability distribution pX assigns probability to possible values of X ∈ X. The hyper-probability distribution p^H_X assigns probability to possible values of X ∈ R(X). These distributions are formally defined below:

    Probability distribution:  pX : X → [0, 1],
    with the additivity requirement:  ∑_{x∈X} pX(x) = 1 .        (2.11)

    Hyper-probability distribution:  p^H_X : R(X) → [0, 1],
    with the additivity requirement:  ∑_{x∈R(X)} p^H_X(x) = 1 .        (2.12)
□
The hyper-probability distribution is not meaningful in the traditional sense, because hyper-probability is not restricted to exclusive values in the domain X. The
traditional assumption behind frequentist or subjective probability is that it is additive over values x ∈ X that in turn represent exclusive real events. Probability
distributed over the hyperdomain R(X) is still additive, but the values x ∈ R(X) no
longer represent exclusive real events, because they can be composite and (partially)
overlapping.
However, a hyper-probability distribution can be projected onto a traditional probability distribution according to Eq.(2.13), which uses the concept of relative base rates from Eq.(2.10).

    pX(x) = ∑_{xj∈R(X)} aX(x/xj) p^H_X(xj) ,  ∀x ∈ X.        (2.13)
Hyper-probability distributions are used when describing the Dirichlet model
over hyperdomains in Section 3.5.3.
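As an illustration of Eq.(2.13), the following hedged sketch projects a hyper-probability distribution onto a standard probability distribution; it reuses the hypothetical relative_base_rate helper and base rate dictionary a from the sketch in Section 2.6.

```python
def project_hyper_probability(p_hyper, a):
    """Project a hyper-probability distribution over R(X) onto a
    probability distribution over X, per Eq.(2.13)."""
    singletons = sorted(a.keys())
    return {x: sum(relative_base_rate(frozenset({x}), xj, a) * ph
                   for xj, ph in p_hyper.items())
            for x in singletons}

# Invented example over X = {x1, x2, x3} with default base rates:
p_hyper = {frozenset({'x3'}): 0.2, frozenset({'x1', 'x2'}): 0.8}
print(project_hyper_probability(p_hyper, a))
# {'x1': 0.4, 'x2': 0.4, 'x3': 0.2}
```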
Chapter 3
Opinion Representations
Subjective opinions express beliefs about the truth of propositions under degrees
of uncertainty, and can indicate ownership of an opinion whenever required. This
chapter presents the various representations and notations for subjective opinions.
3.1 Belief and Trust Relationships
In general the notation ω_X^A is used to denote opinions in subjective logic, where e.g. the subscript X indicates the target variable or proposition to which the opinion applies, and e.g. the superscript A indicates the subject agent who holds the opinion, i.e. the belief owner. Superscripts can be omitted when it is implicit or irrelevant who the belief-owner agent is.
The principle that a subject agent A has an opinion about a target variable X
means that there is a directed belief relationship from A to X, formally denoted
[A, X]. Similarly, the principle that an agent A trusts an entity E means that there is a
directed trust relationship from A to E, formally denoted [A, E]. These relationships
can be considered as directed edges in a graph. This convention is summarised in
Table 3.1. See also Table 13.1 on p.244 which in addition includes the concept of
referral-trust relationship.
Table 3.1 Notation for belief and trust relationships

  Relationship type   Formal notation   Graph edge notation   Interpretation
  Belief              [A, X]            A −→ X                Agent A has an opinion about variable X
  Trust               [A, E]            A −→ E                Agent A has a trust opinion about entity E
To believe and to trust are very similar concepts, the main difference being that
trust assumes dependence and risk, which belief does not necessarily assume. So by
abstracting away the dependence and risk aspects of trust relationships, subjective
logic uses the same formal representation for both belief opinions and trust opinions.
Trust opinions are described in detail in Chapter 13.
3.2 Opinion Classes
Opinions apply to variables that take their values from domains. A domain is a state space which consists of values that are assumed to be exhaustive and mutually disjoint. Different opinion owners are assumed to have a common semantic interpretation of the elements in the same domain, whether they represent states, events, hypotheses or propositions. The opinion owner (subject) and the variable (object) are attributes of an opinion. The opinion itself is a composite function ω_X^A = (bX, uX, aX) consisting of the belief mass distribution bX, the uncertainty mass uX and the base rate distribution aX.
A few specific classes of opinions have been defined. In case the domain X is binary, so is the variable X, and the opinion is binomial. In case the domain is larger than binary and the variable is a random variable X ∈ X, then the opinion is multinomial. In case the domain is larger than binary and the variable is a hypervariable X ∈ R(X), then the opinion is hypernomial. These are the 3 main opinion classes.
Opinions can also be classified according to levels of uncertainty and belief mass
assignment. In case uX = 1 the opinion is vacuous, in case 0 < uX < 1 the opinion
is relatively uncertain, and in case uX = 0 the opinion is dogmatic. When a single
value is considered TRUE by assigning belief mass 1 to that value, the opinion is
absolute. By considering the 3 main opinion classes depending on the domain, and
the 4 subclasses depending on uncertainty mass and belief mass assignment, we get
12 different opinion classes as listed in Table 3.2. These are further described in the
next section.
The 12 entries in Table 3.2 also mention the equivalent probability representation
of opinions, e.g. as Beta PDF, Dirichlet PDF or as a probability distribution over the
variable X. This equivalence is explained in more detail below.
The intuition behind using the term ‘dogmatic’ is that a totally certain opinion
(i.e. where u = 0) about a real-world proposition can be seen as an extreme opinion.
From a philosophical viewpoint, no one can ever be totally certain about anything
in this world. So when the formalism allows explicit expression of uncertainty, as
opinions do, it is extreme, and even unrealistic, to express a dogmatic opinion. The
rationale for this interpretation is that a dogmatic opinion has an equivalent Dirichlet
probability density function in the form of a Dirac delta function which is infinitely
high and infinitesimally thin. It would require an infinite amount of evidence to
produce a Dirichlet PDF equal to a Dirac delta function, which in practice is impossible, and thereby can only be considered in case of idealistic assumptions. This
does not mean that traditional probabilities should be interpreted as dogmatic, because the probability model does not include uncertainty in the way opinions do.
Instead it can implicitly be assumed that there is some uncertainty associated with
Table 3.2 Opinion classes and their equivalent probabilistic or logic representations

  Class:      Binomial                      Multinomial                  Hyper
  Domain:     X = {x, x̄}, |X| = 2           X, |X| > 2                   R(X), |X| > 2
  Variable:   Binomial variable X = x       Random variable X ∈ X        Hypervariable X ∈ R(X)

  Vacuous (uX = 1):
    Vacuous binomial opinion        (proba. equiv.: uniform Beta PDF on p(x))
    Vacuous multinomial opinion     (proba. equiv.: prior PDF on pX)
    Vacuous hyper-opinion           (proba. equiv.: prior PDF on pX)

  Uncertain (0 < uX < 1):
    Uncertain binomial opinion      (proba. equiv.: Beta PDF on p(x))
    Uncertain multinomial opinion   (proba. equiv.: Dirichlet PDF on pX)
    Uncertain hyper-opinion         (proba. equiv.: Dirichlet HPDF on p^H_X)

  Dogmatic (uX = 0):
    Dogmatic binomial opinion       (proba. equiv.: probability on x)
    Dogmatic multinomial opinion    (proba. equiv.: probability distribution over X)
    Dogmatic hyper-opinion          (proba. equiv.: probability distribution over R(X))

  Absolute (bX(x) = 1):
    Absolute binomial opinion       (logic equiv.: Boolean TRUE/FALSE)
    Absolute multinomial opinion    (logic equiv.: TRUE element in X)
    Absolute hyper-opinion          (logic equiv.: TRUE element in R(X))
every probability estimate, but that it is invisible, because uncertainty is not included
in the model. One advantage of subjective logic is precisely that it allows explicit
expression of uncertainty.
A vacuous opinion represents belief about a random variable in case the observer or analyst has no specific evidence about the possible values of the random variable except for the base rate distribution, which represents general background information. It is thus always possible for an analyst to produce more or less certain opinions that genuinely represent the analyst's belief, so analysts never need to invent beliefs. In case they are ignorant they can simply produce a vacuous or highly uncertain opinion. The same cannot be said when using probabilities, where analysts sometimes have to 'pull probabilities out of thin air', e.g. in case a specific input probability parameter to a model is needed in order to execute an analysis with the model.
Each opinion class from Table 3.2 is equivalent to a type of Dirichlet or Beta PDF (probability density function) under a specific mapping. This mapping gives subjective opinions a firm basis in the domain of classical probability and statistics theory. The different opinion classes are described in more detail in the following sections.
3.3 Binomial Opinions
3.3.1 Binomial Opinion Representation
A binary domain consists of only two values, and the variable is typically fixed to
one of the two values. Formally, let a binary domain be specified as X = {x, x},
then a binomial random variable X ∈ X can be fixed to X = x. Opinions on a binomial variable are called binomial opinions, and a special notation is used for their
mathematical representation. Note that a general n-ary domain X can be considered
binary when seen as a binary partition consisting of a proper subset x ⊂ X and its
complement x, so that the corresponding multinomial random variable becomes a
binomial random variable under the same partition.
Definition 3.1 (Binomial Opinion). Let X = {x, x} be a binary domain, and let X
be a binomial random variable in X. A binomial opinion about the truth of state x is
the ordered quadruplet ωx = (bx , dx , ux , ax ), where the additivity requirement:
bx + dx + ux = 1,
(3.1)
is satisfied, and where the respective parameters are defined as:
bx : belief mass in support of x being TRUE (i.e. X = x),
dx : disbelief mass in support of x being FALSE (i.e. X = x ),
ux : uncertainty mass, i.e. the amount of uncommitted belief/disbelief mass,
ax : base rate, i.e. a priori probability of x without any committed belief mass.
□
The characteristics of various binomial opinion classes are listed below. A binomial opinion:

  where bx = 1         is an absolute opinion, equivalent to Boolean TRUE,
  where dx = 1         is an absolute opinion, equivalent to Boolean FALSE,
  where ux = 0         is a dogmatic opinion, i.e. a traditional probability,
  where 0 < ux < 1     is an opinion with some uncertainty, and
  where ux = 1         is a vacuous opinion, i.e. with zero belief mass.
The projected probability of a binomial opinion on proposition x is defined by
Eq.(3.2) below.
    Projected probability of binomial opinions:  Px = bx + ax ux        (3.2)

Binomial opinions have variance expressed as:

    Variance of binomial opinions:  Varx = Px(1 − Px)ux / (W + ux) ,        (3.3)

where W denotes the non-informative prior weight, which must be set to W = 2. The opinion variance is derived from the variance of the Beta PDF, as defined by Eq.(3.10) below.
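The following small sketch evaluates Eq.(3.2) and Eq.(3.3) for the example opinion ωx = (0.40, 0.20, 0.40, 0.90) that is also used in Figure 3.1 below; the function name is illustrative only.

```python
def binomial_projection(b, d, u, a, W=2):
    """Projected probability (Eq.(3.2)) and variance (Eq.(3.3))
    of a binomial opinion (b, d, u, a). The disbelief d is part of
    the opinion but does not enter these two formulas directly."""
    P = b + a * u
    var = P * (1 - P) * u / (W + u)
    return P, var

# Example opinion (0.40, 0.20, 0.40, 0.90):
print(binomial_projection(0.40, 0.20, 0.40, 0.90))
# (0.76, 0.0304)
```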
Barycentric coordinate systems can be used to visualise opinions. In a barycentric coordinate system the location of a point is specified as the center of mass, or
barycenter, of masses placed at its vertices [69]. A barycentric coordinate system
with n axes is represented on a simplex with n vertices which has dimensionality
(n − 1). A triangle is a 2D simplex which has 3 vertices and is thus a barycentric system with 3 axes. A binomial opinion can be visualised as a point in a barycentric coordinate system of 3 axes represented by a 2D simplex, which is in fact an equilateral
triangle, as illustrated in Figure 3.1. Here, the belief, disbelief and uncertainty axes
go perpendicularly from each edge towards the respective opposite vertices denoted
x, x and uncertainty. The base rate ax is a point on the base line, and the projected
probability Px is determined by projecting the opinion point to the base line in parallel with the base rate director. The binomial opinion ωx = (0.40, 0.20, 0.40, 0.90)
with probability projection Px = 0.76 is shown as an example.
[Figure: triangle with belief (x vertex), disbelief (x̄ vertex) and uncertainty (u vertex) axes, the opinion point ωx, the base rate ax on the base line, and the projected probability Px.]
Fig. 3.1 Barycentric triangle visualisation of binomial opinion
In case the opinion point is located at the left or right vertex of the triangle, i.e.
with dx = 1 or bx = 1 (and ux = 0), then the opinion is equivalent to Boolean TRUE
or FALSE, in which case subjective logic becomes equivalent to binary logic. In
case the opinion point is located on the baseline of the triangle, i.e. with ux = 0,
then the opinion is equivalent to a traditional probability, in which case subjective
logic becomes equivalent to probability calculus, or more specifically to probabilistic logic.
In case the opinion point is located at one of the three vertices in the triangle, i.e. with b = 1, d = 1 or u = 1, the reasoning with such opinions becomes a form of three-valued logic that is comparable with Kleene logic [24]. However, the three-valued arguments of Kleene logic do not contain base rates, so that probability projections can not be derived from Kleene logic arguments. See Section 5.1.4 for a more detailed explanation.
3.3.2 The Beta Binomial Model
A binomial opinion is equivalent to a Beta PDF (probability density function) under a specific bijective mapping. In general a probability density function denoted
PDF(p(x)) is defined as:
    PDF(p(x)) : [0, 1] → R≥0 ,  where  ∫₀¹ PDF(p(x)) dp(x) = 1 .        (3.4)
R≥0 is the set of positive real numbers including 0, which can also be denoted
as [0, ∞>. The variable of the PDF is thus the continuous probability p(x) ∈ [0, 1],
and the image of the PDF is the density PDF(p(x)) ∈ R≥0 . When considering the
probability function p(x) : X → [0, 1], then the image of p(x) becomes the variable
of PDF(p(x)). In this way the functions p(x) and PDF(p(x)) are chained functions.
When there is uncertainty about the probability p(x), then the density expresses
where along the continuous interval [0, 1] the probability p(x) is likely to be. At
positions on the p-axis where the density is high, the corresponding probability is
relatively likely (2nd order probability), and where the density is low the corresponding probability is relatively unlikely (2nd order probability). A 'certain probability' means that there is high probability density at a specific position on the p-axis.
A totally ‘uncertain probability’ means that any probability is equally likely (2nd
order probability), so the density is spread out uniformly over the whole interval
[0, 1]. The traditional 1st order interpretation of probability as likelihood of events,
is thus complemented by probability density which can be interpreted as 2nd order
probability. This interpretation is the basis for mapping (high) probability density in Beta PDFs into (high) belief mass in opinions, through the bijective mapping described in Section 3.3.3. As a consequence, flat probability density in Beta PDFs is mapped into uncertainty in opinions.
The Beta PDF is a specific type of probability density function denoted as
Beta(α , β ) with variable p(x), and the two strength parameters α and β . The Beta
PDF is defined below.
Definition 3.2 (Beta Probability Density Function). Assume the binary domain
X = {x, x} and the random variable X ∈ X. Let α represent the evidence about
X = x, and let β represent the evidence about X = x . Let p denote the continuous
probability function p : X → [0, 1] where p(x) + p(x) = 1. With p(x) as variable the
Beta probability density function Beta(α , β ) is the function expressed as:
    Beta(α, β) : [0, 1] → R≥0 ,  where        (3.5)

    Beta(α, β) = [ Γ(α + β) / (Γ(α)Γ(β)) ] p(x)^(α−1) (1 − p(x))^(β−1) ,  α > 0, β > 0 ,        (3.6)

with the restrictions that: p(x) ≠ 0 if α < 1, and p(x) ≠ 1 if β < 1. □

It can be shown that the additivity requirement ∫₀¹ Beta(α, β) dp(x) = 1 holds, which in fact is a general property of the Beta PDF.
Let rx denote the number of observations of x, and let sx denote the number of
observations of x. The α and β parameters can be expressed as a function of the
observations (rx , sx ) in addition to the base rate ax .
    α = rx + ax W ,
    β = sx + (1 − ax)W .        (3.7)

This leads to the evidence notation of the Beta PDF, denoted Betae(rx, sx, ax), which is expressed as:

    Betae(rx, sx, ax) = [ Γ(rx + sx + W) / (Γ(rx + ax W) Γ(sx + (1 − ax)W)) ] p(x)^(rx + ax W − 1) (1 − p(x))^(sx + (1 − ax)W − 1) ,        (3.8)

    where (rx + ax W) > 0 and (sx + (1 − ax)W) > 0,
    with the restrictions:  p(x) ≠ 0 if (rx + ax W) < 1,  p(x) ≠ 1 if (sx + (1 − ax)W) < 1.
The non-informative prior weight denoted as W is normally set to W = 2 which
ensures that the non-informative prior (i.e. when rx = sx = 0) Beta PDF with default
base rate ax = 0.5 is the uniform PDF.
The expected probability Ex as a function of the Beta PDF parameters is defined
by Eq.(3.9):

    Ex = α / (α + β) = (rx + ax W) / (rx + sx + W) .        (3.9)
The variance Varx of the Beta PDF is defined by Eq.(3.10).

    Varx = αβ / [ (α + β)² (α + β + 1) ]
         = (rx + ax W)(sx + (1 − ax)W) / [ (rx + sx + W)² (rx + sx + W + 1) ]
         = (bx + ax ux)(dx + (1 − ax)ux) ux / (W + ux)
         = Px (1 − Px) ux / (W + ux)        (3.10)
The latter two equality expressions for the variance in Eq.(3.10) emerge from the
mapping of Definition 3.3 below.
The variance of the Beta PDF measures how far the probability density is spread
out over the interval [0, 1]. A variance of zero indicates that the probability density
is concentrated in one point, which only happens for infinite α and/or β (or infinite
rx and/or sx ). The uniform Beta PDF, which occurs when α = β = 1 (e.g. when
rx = sx = 0, ax = 1/2 and W = 2), gives Varx = 1/12.
The Beta PDF is important for subjective logic because it is possible to define a
bijective mapping between the projected probability of a binomial opinion and the
expected probability of a Beta PDF. This is described next.
3.3.3 Mapping between a Binomial Opinion and a Beta PDF
The bijective mapping between a binomial opinion and a Beta PDF emerges from
the intuitive requirement that Px = Ex, i.e. that the projected probability of a binomial opinion must be equal to the expected probability of a Beta PDF. This can
be generalised to a mapping between multinomial opinions and Dirichlet PDFs, as
well as between hyper-opinions and hyper-Dirichlet PDFs. The detailed description
for determining the mapping is described in Section 3.4.5.
The mapping from the parameters of a binomial opinion ωx = (bx , dx , ux , ax ) to
the parameters of Betae (rx , sx , ax ) is defined below.
Definition 3.3 (Mapping: Binomial Opinion ↔ Beta PDF).
Let ωx = (bx , dx , ux , ax ) be a binomial opinion, and let p(x) be a probability distribution, both over the same binomial random variable X where it is assumed that
X = x. Let Betae (rx , sx , ax ) be a Beta PDF over the probability variable p(x) defined
as a function of rx , sx and ax according to Eq.(3.8). The opinion ωx and the Beta
probability density function Betae (rx , sx , ax ) are equivalent through the following
mapping:

    bx = rx / (W + rx + sx)
    dx = sx / (W + rx + sx)
    ux = W / (W + rx + sx)

  ⇔   For ux ≠ 0:  rx = bx W / ux ,  sx = dx W / ux ,  1 = bx + dx + ux ;
      For ux = 0:  rx = bx · ∞ ,     sx = dx · ∞ ,     1 = bx + dx .        (3.11)
□
A generalisation of this mapping is provided in Def.3.6 below. The default non-informative prior weight W is set to W = 2 because it then produces a uniform Beta PDF in the case of the default base rate ax = 1/2. It can be seen from Eq.(3.11) that the vacuous binomial opinion ωx = (0, 0, 1, 1/2) corresponds to the uniform PDF Beta(1, 1).
The example Beta(3.8, 1.2) = Betae(2.0, 1.0, 0.9) is illustrated in Figure 3.2. Through the equivalence defined by Eq.(3.11) this Beta PDF is equivalent to the example opinion ωx = (0.4, 0.2, 0.4, 0.9) from Figure 3.1.
In the example of Figure 3.2 where α = 3.8 and β = 1.2 the expected probability
is Ex = 3.8/5.0 = 0.76 which is indicated with the vertical line. This expected probability is of course equal to the projected probability of Figure 3.1 because the Beta
[Figure: plot of the probability density Beta(p(x)) over the probability p(x) ∈ [0, 1].]
Fig. 3.2 Probability density function Beta(3.8, 1.2) ≡ ωx = (0.4, 0.2, 0.4, 0.9)
PDF is equivalent to the opinion through Eq.(3.11). The equivalence between binomial opinions and Beta PDFs is very powerful because subjective logic operators
then can be applied to density functions and vice versa, and also because binomial
opinions can be determined through statistical observations. Multinomial opinions
described next are a generalisation of binomial opinions in the same way as Dirichlet
PDFs are a generalisation of Beta PDFs.
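The following hedged sketch illustrates the mapping of Eq.(3.11) together with Eq.(3.7) and Eq.(3.9), using the example opinion ωx = (0.4, 0.2, 0.4, 0.9) ≡ Beta(3.8, 1.2) from above; the function names are illustrative assumptions.

```python
def opinion_to_beta(b, d, u, a, W=2):
    """Map a binomial opinion to Beta evidence (r, s) and strength
    (alpha, beta) parameters, per Eq.(3.11) and Eq.(3.7).
    A dogmatic opinion (u = 0) corresponds to infinite evidence."""
    if u == 0:
        return float('inf'), float('inf'), float('inf'), float('inf')
    r = b * W / u
    s = d * W / u
    return r, s, r + a * W, s + (1 - a) * W

def beta_to_opinion(r, s, a, W=2):
    """Map Beta evidence parameters (r, s) and base rate a to a
    binomial opinion (b, d, u, a), per Eq.(3.11)."""
    total = W + r + s
    return r / total, s / total, W / total, a

# The example of Figures 3.1 and 3.2:
r, s, alpha, beta = opinion_to_beta(0.4, 0.2, 0.4, 0.9)
print(r, s, alpha, beta)                # 2.0 1.0 3.8 1.2
print(alpha / (alpha + beta))           # expected probability Ex = 0.76
print(beta_to_opinion(2.0, 1.0, 0.9))   # (0.4, 0.2, 0.4, 0.9)
```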
3.4 Multinomial Opinions
3.4.1 The Multinomial Opinion Representation
Multinomial opinions represent the natural generalisation of binomial opinions.
Multinomial opinions can be used to model situations where a random variable
X ∈ X can take multiple values.
Definition 3.4 (Multinomial Opinion). Let X be a domain larger than binary, i.e.
so that k = |X| > 2. Let X be a random variable in X. A multinomial opinion over
the random variable X is the ordered triplet ωX = (bbX , uX , a X ) where:
b X : is a belief mass distribution over X,
uX : is the uncertainty mass, i.e. the amount of uncommitted belief mass,
a X : is a base rate distribution over X ,
where the multinomial additivity requirement of Eq.(2.6) is satisfied.
□
In case uX = 1 then ωX is a vacuous multinomial opinion, in case uX = 0 then ωX
is a dogmatic multinomial opinion, and in case 0 < uX < 1 then ωX is an uncertain
multinomial opinion. In the special case where for some X = x all belief mass is
assigned to a single value as b X (x) = 1 then ωX is an absolute opinion, i.e. it is
absolutely certain that a specific value x ∈ X is TRUE.
In case of multinomial opinions, the belief mass distribution b X and the base rate
distribution a X both have k parameters each. The uncertainty mass uX is a simple
scalar. A multinomial opinion thus contains (2k + 1) parameters. However, given
the belief and uncertainty mass additivity of Eq.(2.6) and the base rate additivity of
Eq.(2.8), multinomial opinions only have (2k − 1) degrees of freedom.
The probability projection of multinomial opinions is relatively simple to calculate, compared to general opinions, because no belief mass applies to overlapping values in the domain X. The expression for the projected probability of multinomial opinions is therefore a special case of the general expression of Eq.(3.28). The probability projection of multinomial opinions is defined by Eq.(3.12) below.
    Projected probability distribution:  PX(x) = bX(x) + aX(x) uX ,  ∀ x ∈ X .        (3.12)

Multinomial opinions have variance expressed as:

    Variance of multinomial opinions:  VarX(x) = PX(x)(1 − PX(x)) uX / (W + uX) ,        (3.13)
where W denotes non-informative prior weight, which must be set to W = 2. The
multinomial opinion variance is derived from the variance of the Dirichlet PDF, as
defined by Eq.(3.18) below.
The only multinomial opinions that can be easily visualised are trinomial, in which case the opinion can be presented as a point inside a tetrahedron, which is a barycentric coordinate system of 4 axes, as shown in Figure 3.3. The tetrahedron is a 3D simplex.
In Figure 3.3, the vertical elevation of the opinion point inside the tetrahedron
represents the uncertainty mass. The distances from each of the three triangular side
planes to the opinion point represents the respective belief masses. The base rate
distribution a X is indicated as a point on the base plane. The line that joins the
tetrahedron summit and the base rate distribution point represents the director. The
projected probability distribution point is geometrically determined by drawing a
projection from the opinion point parallel to the director onto the base plane.
Assume the ternary domain X = {x1 , x2 , x3 } and the corresponding random variable X. Figure 3.3 shows a tetrahedron with the example multinomial opinion ωX
with belief mass distribution b X = {0.20, 0.20, 0.20}, uncertainty mass uX = 0.40
and base rate distribution a X = {0.750, 0.125, 0.125}. Only the uncertainty axis is
shown in Figure 3.3. The belief axes for x1 , x2 and x3 are not shown due to the
difficulty of 3D visualisation on the 2D plane of the figure.
[Figure: tetrahedron with the uncertainty vertex at the top, the opinion point ωX, the projected probability PX and the base rate point aX on the base plane with vertices x1, x2, x3.]
Fig. 3.3 Barycentric tetrahedron visualisation of trinomial opinion
The triangle and tetrahedron belong to the simplex family of geometrical shapes.
Multinomial opinions on domains of cardinality k can in general be represented
as a point in a simplex of dimension k. For example, binomial opinions can be
represented inside a triangle which is a 2D simplex, and trinomial opinions can be
represented inside a tetrahedron which is a 3D simplex.
By applying Eq.(3.12) to the example of Figure 3.3 the projected probability
distribution is PX = {0.50, 0.25, 0.25}.
It can be noted that the probability projection of multinomial opinions expressed
by Eq.(3.12) is a generalisation of the probability projection of binomial opinions
expressed by Eq.(3.2).
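A minimal sketch of the projection in Eq.(3.12), reproducing the example of Figure 3.3; dictionaries keyed by value names are an illustrative representation.

```python
def project_multinomial(b, u, a):
    """Projected probability distribution of a multinomial opinion,
    per Eq.(3.12): PX(x) = bX(x) + aX(x) * uX."""
    return {x: b[x] + a[x] * u for x in b}

# The trinomial example of Figure 3.3:
b = {'x1': 0.20, 'x2': 0.20, 'x3': 0.20}
a = {'x1': 0.750, 'x2': 0.125, 'x3': 0.125}
print(project_multinomial(b, 0.40, a))
# {'x1': 0.5, 'x2': 0.25, 'x3': 0.25}
```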
3.4.2 The Dirichlet Multinomial Model
A multinomial opinion is equivalent to a Dirichlet PDF over X according to a specific bijective mapping described in Section 3.4.5. For self-containment, we briefly
outline the Dirichlet multinomial model below, and refer to [30] for more details.
Multinomial probability density over a domain of cardinality k is described by
the k-dimensional Dirichlet PDF, where the special case of a probability density
over a binary domain (i.e. where k = 2) is the Beta PDF described in Section 3.3.2
above.
Assume the domain X of cardinality k and the random variable X ∈ X with probability distribution p X . The Dirichlet PDF can be used to represent probability density
over p X .
Because of the additivity requirement ∑x∈X p (x) = 1 the Dirichlet density function has only k − 1 degrees of freedom. This means that the knowledge of k − 1
probability variables and their density determines the last probability variable and
its density.
The Dirichlet PDF takes as variable the k-dimensional probability distribution
p X . The strength parameters for the k possible outcomes are represented as k positive real numbers αX (x), each corresponding to one of the possible outcomes x ∈ X.
When considering that the probability distribution pX consists of k separate probability functions pX(x) : X → [0, 1], then the image of the k probability functions pX(x)
becomes the k-component variable of Dir(αX ). In this way the functions p X (x) and
Dir(αX ) are chained functions.
Definition 3.5 (Dirichlet Probability Density Function). Let X be a domain consisting of k mutually disjoint values. Let αX represent the strength vector over the
values of X, and let p X denote the k-component probability distribution over X. With
p X as a k-dimensional variable, the Dirichlet PDF denoted Dir(αX ) is expressed as:
    Dir(αX) = [ Γ(∑_{x∈X} αX(x)) / ∏_{x∈X} Γ(αX(x)) ] ∏_{x∈X} pX(x)^(αX(x)−1) ,  where αX(x) ≥ 0 ,        (3.14)

with the restrictions that: pX(x) ≠ 0 if αX(x) < 1. □
The strength vector αX represents the a priori as well as the observation evidence. The non-informative prior weight is expressed as a constant W , and this
weight is distributed over all the possible outcomes as a function of the base rate.
As mentioned already it is normally assumed that W = 2.
The singleton values in a domain of cardinality k can have a base rate different from the default value 1/k, meaning that it is possible to define an arbitrary additive base rate distribution aX over the domain X. The total strength αX(x) for each value x ∈ X can then be expressed as:

    αX(x) = rX(x) + aX(x)W ,  where rX(x) ≥ 0 ,  ∀x ∈ X .        (3.15)
This leads to the evidence representation of the Dirichlet probability density
function denoted DireX (rr X , a X ) expressed in terms of the evidence vector r X where
r X (x) is the evidence for outcome x ∈ X. In addition, the base rate distribution a X
and the non-informative prior weight W are parameters in the expression for the
evidence Dirichlet PDF.
    DireX(rX, aX) = [ Γ(∑_{x∈X}(rX(x) + aX(x)W)) / ∏_{x∈X} Γ(rX(x) + aX(x)W) ] ∏_{x∈X} pX(x)^(rX(x)+aX(x)W−1) ,        (3.16)

    where (rX(x) + aX(x)W) ≥ 0,
    with the restrictions that pX(x) ≠ 0 if (rX(x) + aX(x)W) < 1.
The notation of Eq.(3.16) is useful, because it allows the determination of the
probability densities over variables where each value can have an arbitrary base rate.
Given the Dirichlet PDF of Eq.(3.16), the expected probability distribution over X
can now be written as:
    EX(x) = αX(x) / ∑_{xj∈X} αX(xj) = (rX(x) + aX(x)W) / (W + ∑_{xj∈X} rX(xj)) ,  ∀x ∈ X ,        (3.17)

which represents a generalisation of the expected probability of the Beta PDF expressed by Eq.(3.9).
The variance VarX(x) of the Dirichlet PDF is defined by Eq.(3.18).

    VarX(x) = αX(x)(∑_{xj∈X} αX(xj) − αX(x)) / [ (∑_{xj∈X} αX(xj))² (∑_{xj∈X} αX(xj) + 1) ]

            = (rX(x) + aX(x)W)(RX + W − rX(x) − aX(x)W) / [ (RX + W)² (RX + W + 1) ]

            = (bX(x) + aX(x)uX)(1 − bX(x) − aX(x)uX) uX / (W + uX)

            = PX(x)(1 − PX(x)) uX / (W + uX) ,    where RX = ∑_{xj∈X} rX(xj) .        (3.18)
The latter two equality expressions for the variance in Eq.(3.18) emerge from the
mapping of Definition 3.6 below.
The variance of the Dirichlet PDF measures how far the probability density is
spread out over the interval [0, 1] for each dimension x. A variance of zero for some
value x indicates that the probability density is concentrated in one point, which only
happens in case the corresponding parameter αX (x) is infinite (or the corresponding
parameter r X (x) is infinite).
It is normally assumed that the a priori probability density in case of a binary
domain X = {x, x} is uniform. This requires that αX (x) = αX (x) = 1, which in turn
dictates that W = 2. Assuming an a priori uniform probability density over a domain
larger than binary would require a non-informative prior weight W > 2. In fact, W
is always equal to the cardinality of the domain for which a uniform probability
density is assumed.
Selecting W > 2 would result in new observation evidence having relatively less
influence over the Dirichlet PDF and the projected probability distribution. Note that
it would be unnatural to require a uniform probability density over arbitrarily large
domains because it would make the PDF insensitive to new observation evidence. For example, requiring a uniform a priori PDF over a domain of cardinality 100
would force W = 100. In case an event of interest has been observed 100 times, and
no other event has been observed, the projected probability of the event of interest
will still only be about 1/2, which would be highly counter-intuitive. In contrast,
when a uniform PDF is assumed in the binary case, and the positive outcome has
been observed 100 times, and the negative outcome has not been observed, then the
projected probability of the positive outcome is close to 1, as intuition would dictate.
3.4.3 Visualising Dirichlet Probability Density Functions
Visualising Dirichlet probability density functions is challenging because a Dirichlet PDF is a density function over k − 1 dimensions, where k is the domain cardinality. For this reason, Dirichlet PDFs over ternary domains are the largest that can be practically visualised.
Let us consider the example of an urn containing balls of the three different
markings: x1 , x2 and x3 , meaning that the urn can contain multiple balls marked x1 ,
x2 or x3 respectively. This situation can be modelled by a domain X = {x1 , x2 , x3 } of
cardinality k = 3. Let us first assume that no other information than the cardinality
is available, meaning that the number and relative proportion of balls marked x1 ,
x2 and x3 are unknown, and that the default base rate for any of the markings is
aX (x) = 1/k = 1/3. Initially, before any balls are drawn we have r X (x1 ) = r X (x2 ) =
r X (x3 ) = 0. Then Eq.(3.17) dictates that the a priori projected probability of picking
a ball of any specific marking is the default base rate probability a X (x) = 1/3. The
non-informative a priori Dirichlet PDF is illustrated in Figure 3.4.a.
[Figure: two 3D plots of the density Dir(αX) over pX(x1), pX(x2), pX(x3): (a) non-informative prior Dirichlet PDF, (b) a posteriori Dirichlet PDF.]
Fig. 3.4 Prior and posterior Dirichlet PDFs
Let us now assume that an observer has picked (with return) 6 balls marked x1, 1 ball marked x2 and 1 ball marked x3, i.e. rX(x1) = 6, rX(x2) = 1, rX(x3) = 1, then the
a posteriori projected probability of picking a ball marked x1 can be computed as EX(x1) = 2/3. The a posteriori Dirichlet PDF is illustrated in Figure 3.4.b.
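The a posteriori projected probability above can be reproduced with a small sketch of Eq.(3.17); the representation and function name are illustrative only.

```python
def dirichlet_expectation(r, a, W=2):
    """Expected probability distribution of an evidence Dirichlet PDF,
    per Eq.(3.17)."""
    total = W + sum(r.values())
    return {x: (r[x] + a[x] * W) / total for x in r}

# Urn example: evidence (6, 1, 1) with default base rates 1/3.
r = {'x1': 6, 'x2': 1, 'x3': 1}
a = {'x1': 1/3, 'x2': 1/3, 'x3': 1/3}
print(dirichlet_expectation(r, a))
# {'x1': 0.666..., 'x2': 0.166..., 'x3': 0.166...}
```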
3.4.4 Coarsening Example: From Ternary to Binary
We reuse the example of Section 3.4.3 with the urn containing balls marked x1, x2 and x3, but this time we consider a binary partition of the markings into x1 and x̄1 = {x2, x3}. The base rate of picking x1 is set to the relative atomicity of x1, expressed as aX(x1) = 1/3. Similarly, the base rate of picking x̄1 is aX(x̄1) = aX(x2) + aX(x3) = 2/3.
Let us again assume that an observer has picked (with return) 6 balls marked x1, and 2 balls marked x̄1, i.e. marked x2 or x3. This translates into the observation vector rX(x1) = 6, rX(x̄1) = 2.
Since the domain has been reduced to binary, the Dirichlet density function is
reduced to a Beta PDF which is simple to visualise. The a priori and a posteriori
density functions are illustrated in Figure 3.5.
[Figure: two plots of probability density over p(x1) ∈ [0, 1]: (a) non-informative a priori Beta PDF, (b) a posteriori Beta PDF after 6 balls marked x1 and 2 balls marked x2 or x3.]
Fig. 3.5 Prior and posterior Beta PDF
Computing the a posteriori projected probability of picking a ball marked x1 with Eq.(3.17) produces EX(x1) = 2/3, which is the same as before the coarsening, as illustrated in Section 3.4.3. This shows that the coarsening does not influence the projected probability of specific events.
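Reusing the hypothetical dirichlet_expectation sketch from Section 3.4.3, the coarsened example can be checked directly:

```python
# Coarsened binary partition of Section 3.4.4: x1 versus its complement.
r_coarse = {'x1': 6, 'complement': 2}
a_coarse = {'x1': 1/3, 'complement': 2/3}
print(dirichlet_expectation(r_coarse, a_coarse))
# {'x1': 0.666..., 'complement': 0.333...}  -- E(x1) unchanged by coarsening
```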
3.4.5 Mapping between Multinomial Opinion and Dirichlet PDF
The Dirichlet model translates observation evidence directly into a PDF over a k-component probability variable. The representation of the observation evidence, together with the base rate, can be used to determine subjective opinions. In other
words it is possible to define a bijective mapping between Dirichlet PDFs and multinomial opinions.
Let X be a random variable in domain X of cardinality k. Assume the multinomial opinion ωX = (bX, uX, aX), the probability distribution pX over X ∈ X, and DireX(rX, aX) over pX.
The bijective mapping between ωX and DireX(rX, aX) is based on the requirement of equality between the projected probability distribution PX derived from ωX, and the expected probability distribution EX derived from DireX(rX, aX). This requirement is expressed as:

    PX = EX        (3.19)
    ⇕
    bX(x) + aX(x) uX = (rX(x) + W aX(x)) / (W + ∑_{xj∈X} rX(xj)) ,  ∀x ∈ X .        (3.20)
We also require that each belief mass bX(x) be an increasing function of the evidence rX(x), and that uX be a decreasing function of ∑_{x∈X} rX(x). In other words, the more evidence in favour of a particular outcome x, the greater the belief mass on that outcome. Furthermore, the more total evidence available, the less uncertain the opinion. These requirements are expressed as:

    ∑_{x∈X} rX(x) → ∞   ⇔   ∑_{x∈X} bX(x) → 1 ,        (3.21)

    ∑_{x∈X} rX(x) → ∞   ⇔   uX → 0 .        (3.22)
As already mentioned, the non-informative prior weight is set to W = 2. These
intuitive requirements together with Eq.(3.20) provide the basis for the following
bijective mapping:
Definition 3.6 (Mapping: Multinomial Opinion ↔ Dirichlet PDF). Let ωX = (bX, uX, aX) be a multinomial opinion, and let DireX(rX, aX) be a Dirichlet PDF, both over the same variable X ∈ X. The multinomial opinion ωX and the Dirichlet PDF DireX(rX, aX) are equivalent through the following mapping, ∀x ∈ X:

    bX(x) = rX(x) / (W + ∑_{xi∈X} rX(xi))
    uX    = W / (W + ∑_{xi∈X} rX(xi))

  ⇔   For uX ≠ 0:  rX(x) = W bX(x) / uX ,  1 = uX + ∑_{xi∈X} bX(xi) ;
      For uX = 0:  rX(x) = bX(x) · ∞ ,     1 = ∑_{xi∈X} bX(xi) .        (3.23)
□
The equivalence mapping of Eq.(3.23) is a generalisation of the binomial mapping from Eq.(3.11). The interpretation of Beta and Dirichlet PDFs is well established in the statistics literature so that the mapping of Definition 3.6 creates a direct
mathematical and interpretation equivalence between Dirichlet PDFs and opinions
when both are expressed over the same domain X.
This equivalence is very powerful. On the one hand, statistical tools and methods such as collecting statistical observation evidence can now be applied to opinions. On the other hand, the operators of subjective logic such as conditional deduction and abduction can be applied to statistical models in terms of Dirichlet PDFs.
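A hedged sketch of the bijective mapping in Eq.(3.23), in both directions, using the urn evidence from Section 3.4.3; the function names and dictionary representation are assumptions for illustration.

```python
def dirichlet_to_opinion(r, a, W=2):
    """Map Dirichlet evidence parameters to a multinomial opinion
    (bX, uX, aX), per Eq.(3.23)."""
    total = W + sum(r.values())
    return {x: r[x] / total for x in r}, W / total, a

def opinion_to_dirichlet(b, u, W=2):
    """Inverse mapping of Eq.(3.23), for non-dogmatic opinions (uX != 0)."""
    if u == 0:
        raise ValueError("dogmatic opinion corresponds to infinite evidence")
    return {x: W * b[x] / u for x in b}

# Urn example of Section 3.4.3: evidence (6, 1, 1), uniform base rates.
b, u, a = dirichlet_to_opinion({'x1': 6, 'x2': 1, 'x3': 1},
                               {'x1': 1/3, 'x2': 1/3, 'x3': 1/3})
print(b, u)                        # {'x1': 0.6, 'x2': 0.1, 'x3': 0.1} 0.2
print(opinion_to_dirichlet(b, u))  # {'x1': 6.0, 'x2': 1.0, 'x3': 1.0}
```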
3.4.6 Uncertainty-Maximisation of Multinomial Opinions
Given a specific multinomial opinion ωX, with its projected probability distribution PX, it is often useful to know the theoretical maximum uncertainty which still preserves the same projected probability distribution. The corresponding uncertainty-maximised opinion is denoted ω̂X = (b̂X, ûX, aX). Obviously, the base rate distribution aX is not affected by uncertainty-maximisation.
The maximum theoretical uncertainty ûX is determined by converting as much belief mass as possible into uncertainty mass, while preserving consistent projected probabilities. This process is illustrated in Figure 3.6.
The line defined by the equations

    PX(xi) = bX(xi) + aX(xi) uX ,  i = 1, . . . , k,        (3.24)

which by definition is parallel to the base rate director line and which joins PX and ω̂X in Figure 3.6, defines possible opinions ωX for which the projected probability distribution is constant. As the illustration shows, an opinion ω̂X is an uncertainty-maximised opinion when Eq.(3.24) is satisfied and at least one belief mass of ω̂X is zero, since the corresponding point would lie on a side of the simplex. In general, not all belief masses can be zero simultaneously except for vacuous opinions. The example of Figure 3.6 indicates the case when bX(x1) = 0.
The components of the opinion point ω̂X should satisfy the following requirements:

    ûX = PX(xi0) / aX(xi0) ,  for some i0 ∈ {1, . . . , k}, and        (3.25)
[Figure: tetrahedron showing the opinion ωX, its uncertainty-maximised counterpart ω̂X, the projected probability PX and the base rate point aX, with PX(xi) = bX(xi) + aX(xi)uX for i = 1, 2, 3.]
Fig. 3.6 Uncertainty-maximised opinion ω̂X of multinomial opinion ωX
    PX(xi) ≥ aX(xi) ûX ,  for every i ∈ {1, . . . , k}.        (3.26)

The requirement of Eq.(3.26) ensures that all the belief masses determined according to Eq.(3.12) are non-negative. These requirements lead to the theoretical uncertainty maximum:

    ûX = min_i [ PX(xi) / aX(xi) ] .        (3.27)
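The following sketch illustrates uncertainty-maximisation according to Eq.(3.12) and Eq.(3.27), using the trinomial example of Figure 3.3; it is an illustrative sketch under the assumed dictionary representation, and printed values are approximate (subject to floating-point rounding).

```python
def uncertainty_maximise(b, u, a):
    """Uncertainty-maximised opinion with the same projected probability
    distribution, per Eq.(3.12) and Eq.(3.25)-(3.27)."""
    P = {x: b[x] + a[x] * u for x in b}    # projected probabilities, Eq.(3.12)
    u_max = min(P[x] / a[x] for x in b)    # theoretical maximum, Eq.(3.27)
    b_max = {x: P[x] - a[x] * u_max for x in b}
    return b_max, u_max

# The trinomial example of Figure 3.3:
b = {'x1': 0.20, 'x2': 0.20, 'x3': 0.20}
a = {'x1': 0.750, 'x2': 0.125, 'x3': 0.125}
print(uncertainty_maximise(b, 0.40, a))
# approximately ({'x1': 0.0, 'x2': 0.167, 'x3': 0.167}, 0.667)
```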
3.5 Hyper Opinions
The concept of hyper-opinions, which is described below, also creates an equivalence to Dirichlet PDFs over hyperdomains as well as to hyper-Dirichlet PDFs over n-ary domains. The hyper-Dirichlet model was defined in 2010 [31] but has so far received relatively little attention in the literature.
3.5.1 The Hyper-Opinion Representation
A hyper-opinion is the natural generalisation of a multinomial opinion. In case of
a domain X with hyperdomain R(X) it is possible to obtain evidence for a composite value x ∈ R(X) which would translate into assigning belief mass to the same
composite value.
Definition 3.7 (Hyper-Opinion). Let X be a domain of cardinality k > 2, with corresponding hyperdomain R(X). Let X be a hypervariable in R(X). A hyper-opinion
on the hypervariable X is the ordered triplet ωX = (bX, uX, aX) where:
bX : is a belief mass distribution over R(X),
uX : is the uncertainty mass, i.e. the amount of uncommitted belief mass,
aX : is a base rate distribution over X,
where the hypernomial additivity of Eq.(2.7) is satisfied.
□
A subjective opinion ω_X^A denotes the target variable X as a subscript, and denotes the opinion owner A as a superscript. Explicitly expressing subjective ownership of opinions makes it possible to express that different agents have different opinions on the same variable.
The belief mass distribution bX over R(X) has (2^k − 2) parameters, whereas the base rate distribution aX over X only has k parameters. The uncertainty mass uX ∈ [0, 1] is a simple scalar. A general opinion thus contains (2^k + k − 1) parameters. However, given that Eq.(2.7) and Eq.(2.8) remove one degree of freedom each, hyper-opinions over a domain of cardinality k only have (2^k + k − 3) degrees of freedom.
By using the concept of relative base rates from Eq.(2.10), the projected probability distribution PX of hyper-opinions can be expressed as:
    PX(x) = ∑_{xj∈R(X)} aX(x/xj) bX(xj) + aX(x) uX ,  ∀x ∈ X.        (3.28)

For x ∈ X it can be shown that the projected probability distribution PX satisfies the probability additivity principle:

    ∑_{x∈X} PX(x) = 1 .        (3.29)

However, for probabilities over X ∈ R(X), the sum of projected probabilities is in general super-additive, formally expressed as:

    ∑_{x∈R(X)} PX(x) ≥ 1 .        (3.30)
The super-additivity results from the fact that the projected probabilities of partially overlapping composite elements xj ∈ R(X) are partially based on the same projected probability on their constituent singleton elements xi ∈ X, so that probabilities are counted multiple times.
3.5.2 Projecting Hyper-Opinions to Multinomial Opinions
Given a hyper-opinion it can be useful to project it onto a multinomial opinion. The
procedure goes as follows.
If b′X is a belief mass distribution defined by the sum in Eq.(3.28), i.e.:

    b′X(x) = ∑_{x′∈R(X)} aX(x/x′) bX(x′) ,        (3.31)
then it is easy to check that b′X : X → [0, 1], and that b′X together with uX satisfies the additivity property in Eq.(2.6), i.e. ω′X = (b′X, uX, aX) is a multinomial opinion. From Eq.(3.28) and Eq.(3.31) we obtain P(ωX) = P(ω′X). This means that every
hyper opinion can be approximated with a multinomial opinion which has the same
projected probability distribution as the initial hyper-opinion.
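A minimal sketch of the projection in Eq.(3.31), reusing the hypothetical relative_base_rate helper from Section 2.6; the hyper-opinion used as input is an invented example.

```python
def hyper_to_multinomial(b_hyper, u, a):
    """Project a hyper-opinion onto a multinomial opinion per Eq.(3.31):
    each composite belief mass is shared among its singletons in
    proportion to their relative base rates."""
    return ({x: sum(relative_base_rate(frozenset({x}), xj, a) * mass
                    for xj, mass in b_hyper.items())
             for x in sorted(a)},
            u, a)

# Invented hyper-opinion over X = {x1, x2, x3} with default base rates:
a = {'x1': 1/3, 'x2': 1/3, 'x3': 1/3}
b_hyper = {frozenset({'x1', 'x2'}): 0.6, frozenset({'x3'}): 0.1}
b, u, _ = hyper_to_multinomial(b_hyper, 0.3, a)
print(b, u)   # {'x1': 0.3, 'x2': 0.3, 'x3': 0.1} 0.3
```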
3.5.3 The Dirichlet Model Applied to Hyperdomains
The traditional Dirichlet model applies naturally to a multinomial domain X of cardinality k, and there is a simple bijective mapping between multinomial opinions and Dirichlet PDFs. Since opinions also apply to a hyperdomain R(X) of cardinality κ = 2^k − 2, the question then is whether the Dirichlet model can also be applied to hyperdomains. This would be valuable for interpreting hyper-opinions in terms of traditional statistical theory. The apparent obstacle for this would be that two composite elements xi, xj ∈ R(X) can be overlapping (i.e. non-exclusive) so that xi ∩ xj ≠ ∅, which is contrary to the assumption in the traditional Dirichlet model. However, there is a solution, as described below.
The approach that we follow is to artificially assume that the hyperdomain R(X) is exclusive, i.e. to artificially assume that for every pair of elements xi, xj ∈ R(X) it holds that xi ∩ xj = ∅. In this way, the Dirichlet model can be applied to the artificially exclusive hyperdomain R(X). This Dirichlet model is then based on the κ-dimensional hyper-probability distribution p^H_X from Eq.(2.12), where X ∈ R(X) is a hypervariable.
The input is now a sequence of strength parameters of the κ possible elements x ∈
R(X) represented as κ positive real numbers αX (xi ), i = 1 . . . κ , each corresponding
to one of the possible values x ∈ R(X).
Definition 3.8 (Dirichlet HPDF). Let X be a domain consisting of k mutually disjoint elements, where the corresponding hyperdomain R(X) has cardinality κ = (2^k − 2). Let αX represent the strength vector over the κ elements x ∈ R(X). The hyper-probability distribution p^H_X and the strength vector αX are both κ-dimensional. The Dirichlet hyper-probability density function over p^H_X, called Dirichlet HPDF for short, is denoted Dir^H_X(αX), and is expressed as:

    Dir^H_X(αX) = [ Γ(∑_{x∈R(X)} αX(x)) / ∏_{x∈R(X)} Γ(αX(x)) ] ∏_{x∈R(X)} p^H_X(x)^(αX(x)−1) ,  where αX(x) ≥ 0 ,        (3.32)

with the restrictions that: p^H_X(x) ≠ 0 if αX(x) < 1. □
The strength vector αX represents the a priori as well as the observation evidence,
now assumed applicable to values x ∈ R(X).
Since the elements of R(X) can contain multiple singletons from X, an element
of R(X) has a base rate equal to the sum of base rates of the singletons it contains
as expressed by Eq.(2.9). The strength αX (x) for each element x ∈ R(X) can then
be expressed as:

    αX(x) = rX(x) + aX(x)W ,  ∀x ∈ R(X) ,  where  rX(x) ≥ 0 ,  aX(x) = ∑_{xj∈X, xj⊆x} aX(xj) ,  and  W = 2 .        (3.33)
The Dirichlet HPDF over a set of κ possible states xi ∈ R(X) can thus be expressed as a function of the observation evidence rX and the base rate distribution aX(x), where x ∈ R(X). The superscript 'eH' in the notation Dir^eH_X indicates that it is expressed as a function of the evidence parameter vector rX (not of the strength parameter vector αX), and that it is a Dirichlet HPDF (not PDF).

    Dir^eH_X(rX, aX) = [ Γ(∑_{x∈R(X)}(rX(x) + aX(x)W)) / ∏_{x∈R(X)} Γ(rX(x) + aX(x)W) ] ∏_{x∈R(X)} p^H_X(x)^(rX(x)+aX(x)W−1) ,        (3.34)

    where (rX(x) + aX(x)W) ≥ 0,
    with the restrictions that p^H_X(x) ≠ 0 if (rX(x) + aX(x)W) < 1.
The expression of Eq.(3.34) determines probability density over hyper-probability distributions p^H_X where each value x ∈ R(X) has a base rate according to Eq.(2.9).
Because an element xj ∈ R(X) can be composite, the expected probability of any element x ∈ X is not only a function of the direct probability density on x, but also of the probability density of all other elements xj ∈ R(X) that contain x. More formally, the expected probability of x ∈ X results from the probability density of each xj ∈ R(X) where x ∩ xj ≠ ∅.
Given the Dirichlet HPDF of Eq.(3.34), the expected probability of any of the k
values x ∈ X can now be written as:
    EX(x) = ( ∑_{xj∈R(X)} aX(x/xj) rX(xj) + W aX(x) ) / ( W + ∑_{xj∈R(X)} rX(xj) ) ,  ∀x ∈ X .        (3.35)
The expected probability distribution of a Dirichlet HPDF expressed by Eq.(3.35)
is a generalisation of the expected probability distribution of a Dirichlet PDF expressed by Eq.(3.17).
3.5.4 Mapping between a Hyper-Opinion and a Dirichlet HPDF
A hyper-opinion is equivalent to a Dirichlet HPDF according to the mapping defined below. This mapping is simply an extension of the mapping between a multinomial opinion and a traditional Dirichlet PDF as described in Eq.(3.23).
Definition 3.9 (Mapping: Hyper-Opinion ↔ Dirichlet HPDF). Let X be a domain consisting of k mutually disjoint elements, where the corresponding hyperdomain R(X) has cardinality κ = (2^k − 2), and let X be a hypervariable in R(X). Let ωX be a hyper-opinion on X, and let Dir^eH_X(rX, aX) be a Dirichlet HPDF over the hyper-probability distribution p^H_X. The hyper-opinion ωX and the Dirichlet HPDF Dir^eH_X(rX, aX) are equivalent through the following mapping, ∀x ∈ R(X):

    bX(x) = rX(x) / (W + ∑_{xi∈R(X)} rX(xi))
    uX    = W / (W + ∑_{xi∈R(X)} rX(xi))

  ⇔   For uX ≠ 0:  rX(x) = W bX(x) / uX ,  1 = uX + ∑_{xi∈R(X)} bX(xi) ;
      For uX = 0:  rX(x) = bX(x) · ∞ ,     1 = ∑_{xi∈R(X)} bX(xi) .        (3.36)
□
A Dirichlet HPDF is based on applying the Dirichlet model to values of the
hyperdomain R(X) that in fact are partially overlapping values in the corresponding
domain X. A Dirichlet HPDF applied to R(X) can be projected to a PDF applied
to X, but this projected PDF is not a Dirichlet PDF in general. Only a few degenerate
cases become Dirichlet PDFs through this projection, such as the non-informative
prior Dirichlet where r X is the zero vector which corresponds to a vacuous opinion
with u = 1, or the case where evidence only relates to singleton values x ∈ X.
The advantage of the Dirichlet HPDF is to provide an interpretation and equivalent representation of hyper-opinions.
It would not be meaningful to try to visualise the Dirichlet HPDF over the hyper-probability distribution p^H_X itself, because it would fail to visualise the important fact
that probability is assigned to overlapping values x ∈ X. This aspect would make it
extremely difficult to see the probability on a specific value x ∈ X, because in general
the probability is a function of multiple probabilities on overlapping hyper-values.
A visualisation of probability density should therefore be done over the probability
distribution p X , where probability on specific values x ∈ X can be seen or interpreted
directly.
3.5.5 Hyper Dirichlet PDF
The Dirichlet HPDF (hyper probability density function) described in Section 3.5.3
above applies to hyperdomain R(X), and is not suitable for representing probability
over the corresponding domain X. What is needed is a PDF (probability density
function) that somehow represents the parameters of the Dirichlet HPDF over the
domain X.
The PDF that does exactly that can be obtained by integrating the evidence parameters for the Dirichlet HPDF to produce evidence parameters for a PDF over
the probability variable p X . In other words, the evidence on singleton values of the
random variable must be computed as a function of the evidence on composite values of the hypervariable. A method for this task has been defined by Hankin [31],
where the resulting PDF is a Hyper-Dirichlet PDF which is a generalisation of the
classical Dirichlet PDF. In addition to the factors consisting of the probability product of the probability variables, it requires a normalisation factor B(rr X , a X ) that can
be computed numerically. Hankin also provides a software package for producing
visualisations of Hyper-Dirichlet PDFs over ternary domains.
The Hyper-Dirichlet PDF is denoted HDireX(rX, aX). Its mathematical expression is given by Eq.(3.37) below.

    HDireX(rX, aX) = B(rX, aX)^(−1) ( ∏_{i=1}^{k} pX(xi)^(rX(xi)+aX(xi)W−1) ) ( ∏_{j=k+1}^{κ} pX(xj)^(rX(xj)) )        (3.37)

                   = B(rX, aX)^(−1) ( ∏_{i=1}^{k} pX(xi)^(aX(xi)W−1) ) ( ∏_{j=1}^{κ} pX(xj)^(rX(xj)) )        (3.38)

where

    B(rX, aX) = ∫ ( ∏_{i=1}^{k} pX(xi)^(rX(xi)+aX(xi)W−1) ∏_{j=k+1}^{κ} pX(xj)^(rX(xj)) ) d(pX(x1), . . . , pX(xκ)) ,        (3.39)

with the integral taken over the region where pX(x) ≥ 0 and ∑_{j=k+1}^{κ} pX(xj) ≤ 1.
A Hyper-Dirichlet PDF produces probability density over a probability distribution p_X of the variable X ∈ X. Readers might therefore be surprised to see that Eq.(3.37) contains probability terms p_X(x_j) where the x_j are composite values in R(X). However, the probability of a composite value is in fact the sum of the probabilities p_X(x_i) of the singletons x_i ∈ X that it contains, as expressed by Eq.(3.40). This ensures that a Hyper-Dirichlet PDF is really a PDF over a traditional probability distribution p_X.
The expression for the Hyper-Dirichlet PDF in Eq.(3.37) strictly separates between the singleton value terms and the composite value terms. To this end the k
singleton state values x ∈ R(X) (i.e. elements x ∈ X) are denoted xi , i ∈ [1, k], and
the (κ − k) composite state values x ∈ R(X) are denoted x j , j ∈ [(k + 1), κ ].
The notation is more compact in Eq.(3.38), where the index j covers the whole
range [1, κ ]. The simplification results from interpreting the term p X (x j ) according
to Eq.(3.40), so that for j = i ≤ k, we automatically have p X (x j ) = p X (xi ).
\[
p_X(x_j) = \sum_{x_i \subseteq x_j} p_X(x_i), \quad \text{for } j \in [1, \kappa]. \qquad (3.40)
\]
The normalisation factor B(r_X, a_X) is unfortunately not given by a closed-form expression, so numerical computation is needed to determine its value for each set of parameters r_X and a_X.
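Since no closed-form expression exists, one crude way to approximate B(r_X, a_X) is plain Monte Carlo integration over the probability simplex. The sketch below is not from Hankin's package; it is a minimal Python illustration assuming the non-informative prior weight W = 2, hypothetical evidence parameters, and uniform sampling on the simplex (a sharply peaked integrand would call for importance sampling instead).

```python
import numpy as np
from math import factorial

def hyper_dirichlet_norm_const(r, a, W=2.0, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the normalisation factor B(r_X, a_X) of Eq.(3.39).

    r : dict mapping elements of R(X) (singletons as strings, composites as
        tuples of singletons) to their evidence parameters r_X(x).
    a : dict mapping each singleton x in X to its base rate a_X(x).
    """
    rng = np.random.default_rng(seed)
    singletons = list(a.keys())
    k = len(singletons)

    # Uniform samples on the probability simplex over the singletons.
    p = rng.dirichlet(np.ones(k), size=n_samples)
    p_of = {x: p[:, i] for i, x in enumerate(singletons)}

    # Integrand of Eq.(3.39): the probability of a composite element is the sum
    # of its singleton probabilities, as expressed by Eq.(3.40).
    integrand = np.ones(n_samples)
    for x, rx in r.items():
        if isinstance(x, tuple):                       # composite element
            integrand *= sum(p_of[s] for s in x) ** rx
    for x in singletons:                               # singleton factors
        integrand *= p_of[x] ** (r.get(x, 0.0) + a[x] * W - 1.0)

    # Uniform simplex sampling has density (k-1)!, so divide it out to recover
    # the plain integral over the (k-1)-dimensional region of Eq.(3.39).
    return integrand.mean() / factorial(k - 1)

# Hypothetical ternary example: some evidence on x3 and on the composite {x1, x2}.
a = {'x1': 1/3, 'x2': 1/3, 'x3': 1/3}
r = {'x3': 5.0, ('x1', 'x2'): 8.0}
print(hyper_dirichlet_norm_const(r, a))
```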
The ability to represent statistical observations in terms of Hyper-Dirichlet PDFs is useful because a PDF on p_X is intuitively meaningful, in contrast to a PDF on p^H_X.
We will here consider the example of a genetic engineering process, where eggs of
3 different mutations are being produced. The mutations are denoted by x1 , x2 and
x3 respectively, so that the domain can be defined as X = {x1, x2, x3}. The specific mutation of each egg cannot be controlled by the process, so a sensor is used to determine the mutation of each egg. Let us assume that the sensor is not always able to determine the mutation exactly, and that it sometimes can only exclude one out of the three possibilities. What the sensor observes are therefore elements
of the reduced powerset R(X). We consider two separate scenarios of 100 observations. In scenario A, mutation x3 has been observed 20 times, and mutation x1 or x2
(i.e. the element {x1 , x2 }) has been observed 80 times. In scenario B, mutation x2
has been observed 20 times, the mutations x1 or x3 (i.e. the element {x1 , x3 }) have
been observed 40 times, and the mutations x2 or x3 (i.e. the element {x2 , x3 }) have
also been observed 40 times. Table 3.3 summarises the two scenarios. The base rate
is set to the default value 1/3 for each mutation.
Table 3.3 Number of observations per mutation category

                Scenario A                                   Scenario B
Mutation:   x1   x2   x3   {x1,x2}  {x1,x3}  {x2,x3}     x1   x2   x3   {x1,x2}  {x1,x3}  {x2,x3}
Counts:      0    0   20        80        0        0      0   20    0         0       40       40
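To make the link between these observation counts and hyper-opinions concrete, the short sketch below (illustrative only, assuming the default non-informative prior weight W = 2) applies the evidence-to-belief direction of the mapping in Eq.(3.36) to the two scenarios.

```python
def counts_to_hyper_opinion(r, W=2.0):
    """Map evidence counts r_X over R(X) to hyper-opinion belief masses and
    uncertainty mass via the left-hand side of Eq.(3.36)."""
    total = W + sum(r.values())
    b = {x: rx / total for x, rx in r.items()}   # b_X(x) = r_X(x) / (W + sum r_X)
    u = W / total                                # u_X   = W / (W + sum r_X)
    return b, u

# Scenario A: 20 observations of x3 and 80 of {x1, x2}.
b_A, u_A = counts_to_hyper_opinion({'x3': 20, ('x1', 'x2'): 80})
# Scenario B: 20 observations of x2, 40 of {x1, x3} and 40 of {x2, x3}.
b_B, u_B = counts_to_hyper_opinion({'x2': 20, ('x1', 'x3'): 40, ('x2', 'x3'): 40})

print(b_A, u_A)   # b_A[x3] ≈ 0.196, b_A[{x1,x2}] ≈ 0.784, u_A ≈ 0.020
print(b_B, u_B)   # b_B[x2] ≈ 0.196, b_B[{x1,x3}] = b_B[{x2,x3}] ≈ 0.392, u_B ≈ 0.020
```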
Because the domain X is ternary it is possible to visualise the corresponding Hyper-Dirichlet PDFs, as shown in Figure 3.7.
Readers who are familiar with the typical shapes of Dirichlet PDFs will immediately notice that the plots of Figure 3.7 are clearly not Dirichlet. The Hyper-Dirichlet model [31] represents a generalisation of the classic Dirichlet model and provides a nice interpretation of hyper-opinions that can be useful for better understanding the nature of hyper-opinions.
[Figure: two surface plots of probability density over the simplex, with axes p(x1), p(x2), p(x3). (a) Hyper-Dirichlet PDF of scenario A. (b) Hyper-Dirichlet PDF of scenario B.]
Fig. 3.7 Example Hyper-Dirichlet probability density functions
An interesting aspect of hyper opinions is that they can express vagueness in the
sense that evidence can support multiple elements in the domain simultaneously.
Vague belief is defined in Section 4.1.2.
3.6 Alternative Opinion Representations
The previous sections have presented two equivalent opinion representations: the belief representation of opinions, typically denoted ωX, and the evidence representation of opinions in the form of Dirichlet PDFs, typically denoted Dir^e_X(r_X, a_X). Other representations can be defined, and Section 3.6.1 and Section 3.6.2 below describe two simple representations that can be useful in specific applications.
An additional representation for binomial opinions is for example defined in CertainLogic [79]. Binomial opinions as defined in CertainLogic are equivalent to traditional binomial belief opinions.
3.6.1 Probabilistic Notation of Opinions
Most people are familiar with the concept of probability, and are able to intuitively
interpret probabilities quite well. The classical probability representation is used in
all areas of science, so people are primarily interested in probability distributions
when analysing models of situations that include possible and uncertain events.
It can therefore be seen as a disadvantage that the traditional opinion representation described in the previous sections does not explicitly express projected probability.
Although the projected probability distribution of an opinion can easily be derived with Eq.(3.28), the lack of explicit representation of projected probability
might still represent a mental barrier for direct intuitive interpretation of opinions.
In order to overcome this barrier, an alternative representation of opinions could
therefore be designed to consist of explicit projected probability distributions, together with the degree of uncertainty and base rate distributions. This representation
is called the probabilistic opinion notation which is formally defined below.
Definition 3.10 (Probabilistic Opinion Notation). Assume domain X with random
variable X and let ωX = (bbX , uX , a X ) be a binomial or multinomial opinion on X. Let
PX be the corresponding projected probability distribution over X defined according
to Eq.(3.12). The probabilistic notation for multinomial opinions is given below.
\[
\begin{array}{ll}
\text{Probabilistic opinion:} & \pi_X = (P_X,\, u_X,\, a_X)\\[1.5ex]
\text{Constraints:} &
\left\{\begin{array}{l}
a_X(x)\, u_X \;\le\; P_X(x) \;\le\; \big(a_X(x)\, u_X + 1 - u_X\big), \quad \forall x \in X\\[1ex]
\sum_{x \in X} P_X(x) = 1
\end{array}\right.
\end{array}
\qquad (3.41)
\]

⊔⊓
The uncertainty mass uX is the same for both the belief notation and the probabilistic notation of opinions. The base rate distribution a X is also the same for both
notations. The equivalence between the two notations is simply based on the expression for the projected probability distribution as a function of the belief distribution
in Eq.(3.12). This leads to the bijective mapping defined below.
Definition 3.11 (Mapping: Belief Opinion – Probabilistic Opinion).
Let ωX = (bbX , uX , a X ) be a multinomial belief opinion, and let πX = (PX , uX , a X )
be a multinomial probabilistic opinion, both over the same variable X ∈ X. The
multinomial opinions ωX and πX are equivalent through the following mapping:
\[
b_X(x) = P_X(x) - a_X(x)\, u_X
\quad\Leftrightarrow\quad
P_X(x) = b_X(x) + a_X(x)\, u_X,
\qquad \forall x \in X.
\qquad (3.42)
\]

⊔⊓
In case uX = 0, then PX is a traditional discrete probability distribution without
uncertainty. In case uX = 1, then PX = a X , and no evidence has been received, so
the probability distribution PX is totally uncertain.
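As a small illustration (a sketch, not part of the formal definitions; the variable names are chosen for readability only), the two directions of Eq.(3.42) can be written directly:

```python
def belief_to_probabilistic(b, u, a):
    """Belief opinion (b_X, u_X, a_X) -> probabilistic opinion (P_X, u_X, a_X), Eq.(3.42)."""
    return {x: b[x] + a[x] * u for x in b}, u, a

def probabilistic_to_belief(P, u, a):
    """Probabilistic opinion (P_X, u_X, a_X) -> belief opinion (b_X, u_X, a_X), Eq.(3.42)."""
    return {x: P[x] - a[x] * u for x in P}, u, a

# Example: a trinomial opinion with uncertainty mass 0.4 and uniform base rates.
b = {'x1': 0.1, 'x2': 0.3, 'x3': 0.2}
u, a = 0.4, {'x1': 1/3, 'x2': 1/3, 'x3': 1/3}
P, _, _ = belief_to_probabilistic(b, u, a)
b_back, _, _ = probabilistic_to_belief(P, u, a)
assert all(abs(b_back[x] - b[x]) < 1e-12 for x in b)   # the mapping is bijective
```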
Assume a domain X with cardinality k. Then both the base rate distribution a X as well as the projected probability distribution PX have k − 1 degrees of
freedom due to the additivity property of Eq.(2.8) and Eq.(3.29). With the addition
of the independent uncertainty parameter uX , the probabilistic notation of opinions
has 2k − 1 degrees of freedom, as do the belief notation and the evidence notation
of multinomial opinions.
In the case of a binary domain X = {x, x̄}, a special notation for binomial opinions can be used. Eq.(3.43) shows the probabilistic notation of binomial opinions, which has three parameters and three degrees of freedom.

\[
\pi_x = (P_x,\, u_x,\, a_x), \quad \text{where}\;
\left\{\begin{array}{l}
P_x \text{ is the projected probability of } x,\\
u_x \text{ is the uncertainty mass},\\
a_x \text{ is the base rate of } x,
\end{array}\right.
\qquad (3.43)
\]

under the constraint $a_x u_x \le P_x \le 1$.
The main limitation of the probabilistic opinion notation is that it does not cover hyper-opinions; it only covers binomial and multinomial opinions. However, in case only binomial or multinomial opinion representation is required, this limitation might not be a problem.
The second disadvantage of the probabilistic opinion notation is that the algebraic expressions for operators often become unnecessarily complex. It turns out that the belief notation of opinions, as specified in Definitions 3.1, 3.4 and 3.7, offers the simplest algebraic representation of opinion operators. For this reason, we do not use the probabilistic notation for opinion operators here; we only use the belief notation.
3.6.2 Qualitative Category Representation
Human language provides various terms that are commonly used to express various
types of likelihood and uncertainty. It is possible to express binomial opinions in
terms of qualitative verbal categories which can be specified according to the need
of a particular application. An example set of qualitative categories is provided in
Table 3.4.
These qualitative verbal categories can be mapped to areas in the opinion triangle
as illustrated in Figure 3.8. The mapping must be defined for combinations of ranges
of expected probabilities and uncertainty. As a result, the mapping between a specific qualitative category from Table 3.4 and specific geometric area in the opinion
triangle depends on the base rate. Without specifying the exact underlying ranges,
the visualization of Figure 3.8 indicates the ranges approximately. The edge ranges
are deliberately made narrow in order to have categories for near dogmatic and vacuous beliefs, as well as beliefs that express projected probability near absolute 0 or
1. The number of likelihood categories, and certainty categories, as well as the exact
ranges for each, must be determined according to the need of each application, and
the qualitative categories defined here must be seen as an example. Real-world categories would likely be similar to those found in Sherman Kent's Words of Estimative Probability [57], be based on the Admiralty Scale as used within the UK National Intelligence Model¹, or be based on empirical results obtained from psychological experimentation.
Š†‹‡ˆŒƒŽ†ƒ‡
ŒƒŽ†ƒ‡
‚‘†’“”…ˆŒƒŽ†ƒ‡
•“”–†ˆ”€‚„…ˆ†—†
‚‘†’“”…ˆ˜Ž†ƒ‡
˜Ž†ƒ‡
Š†‹‡ˆ˜†ƒ‡
€‚ƒ„…†ƒ‡
3 Opinion Representations
€‚ƒ„…†ƒ‡ˆ‰‚…
46
#
$
%
&
'
(
)
*
+
€‚ƒ€„……†
‡ˆ‰‚Š…‹€‚Œ
‡‚‹‰ˆ€ ‰!"‡ˆ‰‚Š…‹€‚Œ
•‚‘ ƒ†…†ƒ‡ˆŒ–†‹…”Ž
,
!"
#"
$"
%"
&"
'"
("
)"
*"
Š†‹‡ˆŒ–†‹…”Ž
-
!+
#+
$+
%+
&+
'+
(+
)+
*+
Œ–†‹…”Ž
‡
!•
#•
$•
%•
&•
'•
(•
)•
*•
ƒŽ,“…ƒ‡ˆŒ–†‹…”Ž
.
!-
#-
$-
%-
&-
'-
(-
)-
*-
•‚‘ ƒ†…†ƒ‡ˆ•†‹…”Ž
/
!
#
$
%
&
'
(
)
*
Table 3.4 Qualitative Categories
ligence Model1 ; or could be based on empirical results obtained from psychological
experimentation.
[Figure: two opinion triangles overlaid with the grid of qualitative category cells 1A–9E, showing which cells overlap the triangle. (a) Qualitative categories with a = 1/3. (b) Qualitative categories with a = 2/3.]
Fig. 3.8 Mapping qualitative categories to ranges of belief as a function of the base rate
Figure 3.8 illustrates category-opinion mappings in the case of base rate a = 1/3,
and the case of base rate a = 2/3. The mapping is determined by the overlap between
category area and triangle region. Whenever a qualitative category area overlaps,
partly or completely, with the opinion triangle, that qualitative category is a possible
mapping.
Note that the qualitative category areas overlap with different regions on the triangle depending on the base rate. For example, it can be seen that the category 7D:
‘Unlikely and Very Uncertain’ is possible in case a = 1/3, but not in case a = 2/3.
¹ http://www.policereform.gov.uk/implementation/natintellmodel.html
This is because the projected probability of a state x is defined as Px = bx + ax ux ,
so that when ax , ux −→ 1, then Px −→ 1, meaning that the likelihood category ‘Unlikely’ would be impossible.
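The reasoning above can be checked mechanically. The sketch below is not from the book; the category ranges are invented for illustration, since Table 3.4 does not fix them here. It tests whether a cell, given as a projected-probability range and an uncertainty range, overlaps the set of valid binomial opinions for a given base rate, using the constraint a·u ≤ P ≤ a·u + 1 − u from Eq.(3.41).

```python
def cell_is_possible(p_range, u_range, a):
    """Check whether a qualitative category cell (a projected-probability range
    and an uncertainty range) overlaps the set of valid binomial opinions for
    base rate a.  A pair (P, u) is valid iff a*u <= P <= a*u + 1 - u."""
    p_lo, p_hi = p_range
    u_lo, u_hi = u_range
    steps = 50                                   # coarse grid over the u-range
    for i in range(steps + 1):
        u = u_lo + (u_hi - u_lo) * i / steps
        lo, hi = a * u, a * u + 1 - u            # admissible P-interval at this u
        if max(p_lo, lo) <= min(p_hi, hi):       # the intervals intersect
            return True
    return False

# Hypothetical ranges for a category like 'Unlikely and Very Uncertain':
p_range = (0.10, 0.30)   # assumed 'Unlikely' projected-probability range
u_range = (0.80, 0.95)   # assumed 'Very Uncertain' uncertainty range
print(cell_is_possible(p_range, u_range, a=1/3))   # True:  the cell overlaps the triangle
print(cell_is_possible(p_range, u_range, a=2/3))   # False: P cannot be that low with such high u
```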
Mapping from qualitative categories to subjective opinions is also straightforward. Geometrically, the process involves mapping the qualitative adjectives to
the corresponding center of the portion of the grid cell contained within the opinion
triangle (see Figure 3.8). Naturally, some mappings will always be impossible for
a given base rate, but these are logically inconsistent and should be excluded from
selection.
Although a specific qualitative category maps to different geometric areas in the
opinion triangle depending on the base rate, it will always correspond to the same
range of beta PDFs. It is simple to visualize ranges of binomial opinions with the
opinion triangle, but it would not be easy to visualize ranges of Beta PDFs. The
mapping between binomial opinions and beta PDFs thereby provides a very powerful way of describing Beta PDFs in terms of qualitative categories, and vice versa.
Chapter 4
Decision-Making Under Uncertainty
Decision-making is the process of identifying and choosing between alternative options based on the beliefs about the different options and their associated utility
gains or losses. The decision-maker can be the analyst of the situation, or the decision maker can act on advice produced by an analyst. In the following we do not
distinguish between the decision-maker and the analyst, and use the term ’analyst’
to cover both.
Opinions can form the basis of decisions, and it is important to understand how
various aspects of an opinion should (rationally) determine the optimal decision.
For this purpose it is necessary to introduce new concepts that are described next.
Section 4.4 provides a summary of decision criteria based on opinions.
4.1 Aspects of Belief and Uncertainty in Opinions
The previous chapter on the different categories of opinions only distinguishes between belief mass and uncertainty mass. This section dissects these two types into
more granular types called specificity, vagueness, and uncertainty that each have
their elemental and total variants.
4.1.1 Specificity
Belief mass that only supports a specific element is called specific belief mass because it is specific to a single element and discriminates between elements, i.e. it
is non-vague and non-uncertain. Note that we also interpret belief mass on a composite element (and its subsets) to be specific for that composite element, because it
discriminates between that composite element and any other element which is not a
subset of that element.
Definition 4.1 (Elemental Specificity). Let X be a domain with hyperdomain R(X) and variable X. Given an opinion ωX, the elemental specificity of element x ∈ R(X) is the function $\dot{b}^{\mathrm{S}}_X : R(X) \to [0,1]$ expressed as:

\[
\text{Elemental specific belief mass:}\quad
\dot{b}^{\mathrm{S}}_X(x) = \sum_{x_j \subseteq x} b_X(x_j). \qquad (4.1)
\]

⊔⊓
It is useful to express elemental specificity of composite elements in order to assist decision making in situations like the Ellsberg paradox described in Section 4.5.
The total specific belief mass denoted bSX is simply the sum of all belief masses
assigned to singletons, defined as:
Definition 4.2 (Total Specificity). Let X be a domain with variable X, and let ΩX be the set of opinions on X. The total specificity of an opinion ωX is the function $b^{\mathrm{S}}_X : \Omega_X \to [0,1]$ expressed as:

\[
\text{Total specific belief mass:}\quad
b^{\mathrm{S}}_X = \sum_{x_i \in X} b_X(x_i). \qquad (4.2)
\]

⊔⊓
Total specificity represents the complement of the sum of total vagueness and
uncertainty, as described below.
4.1.2 Vagueness
Recall from Section 2.3 that the composite set denoted by C (X) is the set of
all composite elements from the hyperdomain. Belief mass assigned to a composite element expresses cognitive vagueness because this type of belief mass supports
the truth of multiple singletons in X simultaneously, i.e. it does not discriminate
between the singletons in the composite element. In case of binary domains there
can be no vague belief because there are no composite elements. In case of hyperdomains there are always composite elements, and every singleton x ∈ X is member
of multiple composite elements. The elemental vagueness of a singleton x ∈ R(X)
is defined as the weighted sum of belief masses on the composite elements of which
x is a member, where the weights are determined by the base rate distribution. The
total amount of vague belief mass is simply the sum of belief masses on all composite elements in the hyperdomain. The formal definitions for these concepts are
provided next.
Let X be a domain where R(X) denotes its hyperdomain. Let C (X) be the composite set of X according to Eq.(2.3). Let x ∈ R(X) denote an element in hyperdomain R(X) and let x j ∈ C (X) denote a composite element in C (X).
Definition 4.3 (Elemental Vagueness). Let X be a domain with hyperdomain R(X) and variable X. Given an opinion ωX, the elemental vagueness of element x ∈ R(X) is the function $\dot{b}^{\mathrm{V}}_X : R(X) \to [0,1]$ expressed as:

\[
\text{Elemental vague belief mass:}\quad
\dot{b}^{\mathrm{V}}_X(x) = \sum_{\substack{x_j \in C(X)\\ x_j \not\subseteq x}} a_X(x/x_j)\, b_X(x_j). \qquad (4.3)
\]

⊔⊓
Note that Eq.(4.3) not only defines vagueness of singletons x ∈ X, but also defines
vagueness of composite elements x ∈ C (X), i.e. of all elements x ∈ R(X).
Obviously in case x is a composite element, then the belief mass b X (x) does not
contribute to elemental vagueness of x, although b X (x) represents vague belief mass
for the whole opinion. The total vague belief mass in an opinion ωX is defined as
the sum of belief masses on composite elements x j ∈ C (X), formally defined as:
Definition 4.4 (Total Vagueness). Let X be a domain with variable X, and let ΩX be the set of opinions on X. The total vagueness of an opinion ωX is the function $b^{\mathrm{V}}_X : \Omega_X \to [0,1]$ expressed as:

\[
\text{Total vague belief mass:}\quad
b^{\mathrm{V}}_X = \sum_{x_j \in C(X)} b_X(x_j). \qquad (4.4)
\]

⊔⊓
An opinion ωX is dogmatic and vague when $b^{\mathrm{V}}_X = 1$, and is partially vague when $0 < b^{\mathrm{V}}_X < 1$. An opinion has mono-vagueness when only a single composite element has (vague) belief mass assigned to it. Correspondingly, an opinion has pluri-vagueness when several composite elements have (vague) belief mass assigned to them.
It is important to understand the difference between uncertainty and vagueness
in subjective logic. Uncertainty reflects lack of evidence, whereas vagueness results
from evidence that fails to discriminate between specific singletons. A vacuous (totally uncertain) opinion, by definition, does not contain any vagueness. Hyper opinions can contain vagueness, whereas multinomial and binomial opinions never contain vagueness. The ability to express vagueness is thus the main aspect that makes
hyper-opinions different from multinomial opinions.
When assuming that collected evidence never decays, uncertainty can only decrease over time because accumulated evidence is never lost. As the natural complement, specificity and vagueness can only increase. At the extreme, a dogmatic opinion where $b^{\mathrm{V}}_X = 1$ expresses dogmatic vagueness. A dogmatic opinion where $b^{\mathrm{S}}_X = 1$ expresses dogmatic specificity, which is equivalent to a traditional probability distribution over a random variable.
When assuming that evidence decays, e.g. as a function of time, uncertainty can increase over time because evidence decay is equivalent to loss of evidence. Vagueness decreases in case new evidence is specific, i.e. when the new
evidence supports singletons, and old vague evidence decays. Vagueness increases
in case new evidence is vague, i.e. when the new evidence supports composite elements, and the old specific evidence decays.
4.1.3 Dirichlet Visualisation of Opinion Vagueness
The total vagueness of a trinomial opinion cannot easily be visualised as such on the
opinion tetrahedron. However, it can be visualised in the form of a hyper-Dirichlet
PDF. Let us for example consider the ternary domain X with corresponding hyperdomain R(X) illustrated in Figure 4.1.
[Figure: the hyperdomain R(X) over the ternary domain {x1, x2, x3}, with composite elements x4 = {x1, x2}, x5 = {x1, x3} and x6 = {x2, x3}.]
Fig. 4.1 Hyperdomain for the example of vague belief mass
The singletons and composite elements of R(X) are listed below.

\[
\begin{array}{ll}
\text{Domain:} & X = \{x_1, x_2, x_3\}\\[0.5ex]
\text{Hyperdomain:} & R(X) = \{x_1, x_2, x_3, x_4, x_5, x_6\}\\[0.5ex]
\text{Composite set:} & C(X) = \{x_4, x_5, x_6\}
\end{array}
\quad\text{where}\quad
\begin{array}{l}
x_4 = \{x_1, x_2\}\\
x_5 = \{x_1, x_3\}\\
x_6 = \{x_2, x_3\}
\end{array}
\qquad (4.5)
\]
Let us further assume a hyper-opinion ωX with belief mass distribution and base
rate distribution specified in Eq.(4.6) below.
\[
\text{Belief mass distribution:}\;
\left\{\begin{array}{l}
b_X(x_6) = 0.8,\\
u_X = 0.2,
\end{array}\right.
\qquad
\text{Base rate distribution:}\;
\left\{\begin{array}{l}
a_X(x_1) = 0.33,\\
a_X(x_2) = 0.33,\\
a_X(x_3) = 0.33.
\end{array}\right.
\qquad (4.6)
\]
Note that this opinion has mono-vagueness because the vague belief mass is assigned to only one composite element.
The projected probability distribution on X computed with Eq.(3.28) and the
vague belief mass computed with Eq.(4.3) are given in Eq.(4.7) below.
\[
\text{Projected probability distribution:}\;
\left\{\begin{array}{l}
P_X(x_1) = 0.066,\\
P_X(x_2) = 0.467,\\
P_X(x_3) = 0.467,
\end{array}\right.
\qquad
\text{Vague belief mass:}\;
\left\{\begin{array}{l}
\dot{b}^{\mathrm{V}}_X(x_1) = 0.0,\\
\dot{b}^{\mathrm{V}}_X(x_2) = 0.4,\\
\dot{b}^{\mathrm{V}}_X(x_3) = 0.4.
\end{array}\right.
\qquad (4.7)
\]
The hyper-Dirichlet PDF for this vague opinion is illustrated in Figure 4.2. Note
how the probability density is spread out along the edge between the x2 and x3
vertices, which precisely indicates that the opinion expresses vagueness between x2
and x3 . Vague belief of this kind can be useful for an analyst in the sense that it can
exclude specific elements from being plausible, which in this case is x1 .
[Figure: surface plot of probability density over the simplex with axes p(x1), p(x2), p(x3); the density is concentrated along the edge between the x2 and x3 vertices.]
Fig. 4.2 Hyper-Dirichlet PDF with vague belief
In the case of multinomial and hypernomial opinions larger than trinomial, it is challenging to design visualisations. A possible solution, in case visualisation is required for opinions over large domains, is to use partial visualisation over specific values of the domain that are of interest to the analyst.
4.1.4 Elemental Uncertainty
When an opinion contains uncertainty, the simplest interpretation is to consider that
the whole uncertainty mass is shared between all the elements of the (hyper) domain. However, as indicated by the expressions for projected probability of e.g.
Eq.(3.28), the uncertainty mass can be interpreted as being implicitly assigned to
(hyper)elements of the variable, as a function of the base rate distribution over the
variable. This interpretation is captured by the definition of elemental uncertainty
mass.
Definition 4.5 (Elemental Uncertainty). Let X be a domain where R(X) denotes its hyperdomain. Given an opinion ωX, the elemental uncertainty mass of an element x ∈ R(X) is computed with the function $\dot{u}_X : R(X) \to [0,1]$ defined as:

\[
\text{Elemental uncertainty mass:}\quad
\dot{u}_X(x) = a_X(x)\, u_X. \qquad (4.8)
\]

⊔⊓
Note that the above definition uses the notation $\dot{u}_X$ in the sense of a distribution of uncertainty mass over elements, which is different from the total uncertainty mass as a single scalar denoted $u_X$.
4.2 Mass-Sum for Specificity, Vagueness and Uncertainty
The elemental specificity, vagueness and uncertainty defined in the previous section characterise each element by apportioning belief and uncertainty mass proportionally across the belief masses and the uncertainty of the opinion. The combination of elemental specificity, vagueness and uncertainty is called the elemental mass-sum, and similarly for the total mass-sum.
The additivity properties of elemental and total belief and uncertainty mass are
described next.
4.2.1 Elemental Mass-Sum
The sum of the elemental specificity, vagueness and uncertainty of an element is equal to the element's projected probability, expressed as:

\[
\dot{b}^{\mathrm{S}}_X(x) + \dot{b}^{\mathrm{V}}_X(x) + \dot{u}_X(x) = P_X(x). \qquad (4.9)
\]
Eq.(4.9) shows that the projected probability can be split into three parts which
are i) elemental specificity, ii) elemental vagueness, and iii) elemental uncertainty.
The composition of these three parts is called the elemental mass-sum, denoted $\Xi^{\mathrm{E}}_X(x)$.
The symbol ‘Ξ’ is the Greek letter ‘Xi’. The concept of elemental mass-sum is
defined next.
Definition 4.6 (Elemental Mass-Sum). Let X be a domain with hyperdomain R(X), and assume that the opinion ωX is specified. Consider an element x ∈ R(X) with its elemental specificity $\dot{b}^{\mathrm{S}}_X(x)$, elemental vagueness $\dot{b}^{\mathrm{V}}_X(x)$ and elemental uncertainty $\dot{u}_X(x)$. The elemental mass-sum for element x is the triplet denoted $\Xi^{\mathrm{E}}_X(x)$ expressed as:

\[
\text{Elemental mass-sum:}\quad
\Xi^{\mathrm{E}}_X(x) = \big(\dot{b}^{\mathrm{S}}_X(x),\; \dot{b}^{\mathrm{V}}_X(x),\; \dot{u}_X(x)\big). \qquad (4.10)
\]

⊔⊓
Given an opinion ωX, each element x ∈ R(X) has an associated elemental mass-sum $\Xi^{\mathrm{E}}_X(x)$, which is a function of the opinion ωX. The term mass-sum means that
the triplet of specificity, vagueness and uncertainty has the additivity property of
Eq.(4.9).
In order to visualise an elemental mass-sum, consider the ternary domain X =
{x1 , x2 , x3 } and hyper domain R(X) illustrated in Figure 4.3 where the belief masses
and uncertainty mass of opinion ωX are indicated on the diagram.
[Figure: the hyperdomain R(X) of the ternary domain, annotated with the belief masses of ωX: b(x1) = 0.1, b(x2) = 0.1, b(x3) = 0, b(x4) = 0.2, b(x5) = 0.3, b(x6) = 0.1, and uncertainty mass u = 0.2.]
Fig. 4.3 Hyperdomain with belief masses
Formally, the opinion ωX is specified in Table 4.1. The table also includes the elemental mass-sums in terms of elemental specificity, vagueness and uncertainty, as well as the projected probability for every element x ∈ R(X).
Table 4.1 Opinion with elemental specificity, vagueness, uncertainty, and projected probability.

Element   Belief mass /   Base rate   Elemental      Elemental    Elemental      Projected
x         uncertainty     a_X(x)      specificity    vagueness    uncertainty    probability
          b_X(x) / u_X                ḃS_X(x)        ḃV_X(x)      u̇_X(x)         P_X(x)
x1        0.10            0.20        0.10           0.16         0.04           0.30
x2        0.10            0.30        0.10           0.16         0.06           0.32
x3        0.00            0.50        0.00           0.28         0.10           0.38
x4        0.20            0.50        0.40           0.12         0.10           0.62
x5        0.30            0.70        0.40           0.14         0.14           0.68
x6        0.10            0.80        0.20           0.34         0.16           0.70
X         u_X = 0.20
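The entries of Table 4.1 can be reproduced directly from the definitions above. The following sketch is illustrative only (the element and function names are ours); it computes elemental specificity (Eq.(4.1)), elemental vagueness (Eq.(4.3)), elemental uncertainty (Eq.(4.8)) and their sum, the projected probability of Eq.(4.9), for the opinion of Figure 4.3.

```python
# Hyper-opinion from Figure 4.3 / Table 4.1.
members = {'x1': {'x1'}, 'x2': {'x2'}, 'x3': {'x3'},
           'x4': {'x1', 'x2'}, 'x5': {'x1', 'x3'}, 'x6': {'x2', 'x3'}}
b = {'x1': 0.10, 'x2': 0.10, 'x3': 0.00, 'x4': 0.20, 'x5': 0.30, 'x6': 0.10}
u = 0.20
a = {'x1': 0.20, 'x2': 0.30, 'x3': 0.50}
a_full = {x: sum(a[s] for s in members[x]) for x in members}     # base rates on R(X)
composites = [x for x in members if len(members[x]) > 1]         # C(X)

def rel_base_rate(x, xj):
    """Relative base rate a_X(x / x_j) = a_X(x ∩ x_j) / a_X(x_j), Eq.(2.10)."""
    return sum(a[s] for s in members[x] & members[xj]) / a_full[xj]

def elemental_specificity(x):                                    # Eq.(4.1)
    return sum(b[xj] for xj in members if members[xj] <= members[x])

def elemental_vagueness(x):                                      # Eq.(4.3)
    return sum(rel_base_rate(x, xj) * b[xj]
               for xj in composites if not members[xj] <= members[x])

def elemental_uncertainty(x):                                    # Eq.(4.8)
    return a_full[x] * u

for x in sorted(members):
    s, v, un = elemental_specificity(x), elemental_vagueness(x), elemental_uncertainty(x)
    P = s + v + un                                               # additivity of Eq.(4.9)
    print(f"{x}: specificity={s:.2f} vagueness={v:.2f} uncertainty={un:.2f} P={P:.2f}")
```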
The elemental mass-sums from opinion ωX listed in Table 4.1 are visualised
as a mass-sum diagram in Figure 4.4. Mass-sum diagrams are useful for assisting
decision making because the degree of specificity, vagueness and uncertainty can be
clearly understood.
[Figure: horizontal bar diagram showing, for each element x1–x6, the elemental mass-sum ΞE_X(x) decomposed into elemental specificity, elemental vagueness and elemental uncertainty, with each bar ending at the projected probability PX(x); scale 0.0–1.0.]
Fig. 4.4 Elemental mass-sum diagram for ωX.
Visualisation with the mass-sum diagram makes it much easier to appreciate the
nature of beliefs in each element as a function of the opinion. Since hyper opinions
cannot easily be visualised on simplexes like triangles or tetrahedrons, a mass-sum diagram like the one in Figure 4.4 offers a nice alternative that scales to larger
domains.
In Figure 4.4 it can be seen that x3 has the greatest projected probability among
the singletons, expressed as PX (x3 ) = 0.38. However, the elemental mass-sum of
x3 is void of specificity, so its projected probability is solely based on vagueness
and uncertainty. These aspects are important to consider for decision making, as
explained below.
4.2.2 Total Mass-Sum
The belief mass of an opinion as a whole can be decomposed into total specificity, which provides distinctive support for singletons, and total vagueness, which provides vague support for singletons. These two belief masses are complementary to the uncertainty mass. For any opinion ωX it can be verified that Eq.(4.11) holds.

\[
b^{\mathrm{S}}_X + b^{\mathrm{V}}_X + u_X = 1. \qquad (4.11)
\]
Eq.(4.11) shows that the belief and uncertainty mass can be split into the three
parts of total specificity, total vagueness and uncertainty. The composition of these
three parts is called total mass-sum, denoted ΞTX , and is defined below.
Definition 4.7 (Total Mass-Sum). Let X be a domain with hyperdomain R(X), and assume that the opinion ωX is specified. The total specificity $b^{\mathrm{S}}_X$, total vagueness $b^{\mathrm{V}}_X$ and uncertainty $u_X$ can be combined as a triplet, which is then called the total mass-sum, denoted $\Xi^{\mathrm{T}}_X$, expressed as:

\[
\text{Total mass-sum:}\quad
\Xi^{\mathrm{T}}_X = \big(b^{\mathrm{S}}_X,\; b^{\mathrm{V}}_X,\; u_X\big). \qquad (4.12)
\]

⊔⊓
The total mass-sum of opinion ωX from Figure 4.3 and Table 4.1 is illustrated in
Figure 4.5.
[Figure: a single horizontal bar showing the total mass-sum ΞT_X decomposed into total specific belief mass, total vague belief mass and uncertainty mass, on a scale 0.0–1.0.]
Fig. 4.5 Visualising the total mass-sum from ωX.
4.3 Utility and Normalisation
Assume a random variable X with an associated projected probability distribution
PX . Utility is typically associated with outcomes of a random variable, in the sense
that for each outcome x there is an associated utility λ X (x) expressed on some scale
such as monetary value, which can be positive or negative. Given utility λ X (x) in
case of outcome x, then the elemental expected utility for x is:
\[
\text{Elemental expected utility:}\quad L_X(x) = P_X(x)\,\lambda_X(x). \qquad (4.13)
\]

Expected utility for the variable X is then:

\[
\text{Expected utility:}\quad L_X = \sum_{x \in X} P_X(x)\,\lambda_X(x). \qquad (4.14)
\]
In classical utility theory, decisions are based on expected utility for the possible options. It is also possible to eliminate the notion of utility by integrating it into the
probabilities for the various options [7], which produces a utility-normalised probability vector. This approach greatly simplifies decision making models, because
every option can be represented as a simple probability.
Normalisation is useful when comparing options of variables from different domains, where the different variables have different associated probability distributions and utility vectors. The normalisation factor must be appropriate for all variables, so that the utility-normalised probability vectors are within a given range.
Note that in case of negative utility for a specific outcome, the utility-normalised
probability for that outcome is also negative. In that sense, utility-normalised probability represents synthetic probability, and not realistic probability.
Given a set of variables, with associated probability distributions and utility vectors, let λ + denote the greatest absolute utility of all utilities in all vectors. Thus, if
the greatest absolute utility is negative, then λ + takes its positive (absolute) value.
The utility-normalised probability vector $P^{\mathrm{N}}_X$ is defined below.
Definition 4.8 (Utility-Normalised Probability Vector). Assume a random variable X with an associated projected probability distribution PX and a utility vector λX that together produce the expected utility LX. Let λ⁺ denote the greatest absolute utility from λX and from other relevant utility vectors that will be considered for comparing different options. The utility-normalised probability vector produced by PX, λX and λ⁺ is expressed as:

\[
P^{\mathrm{N}}_X(x) = \frac{L_X(x)}{\lambda^{+}} = \frac{\lambda_X(x)\, P_X(x)}{\lambda^{+}}, \quad \forall x \in X. \qquad (4.15)
\]

⊔⊓
Note that the utility-normalised probability vector $P^{\mathrm{N}}_X$ does not represent a probability distribution, and in general does not satisfy the additivity requirement of a probability distribution. The vector $P^{\mathrm{N}}_X$ represents relative probability, to be used in comparisons with other vectors of relative probability for the purpose of choosing between different options.
Similarly to the notion of utility-normalised probability, it is possible to define
utility-normalised elemental specificity, vagueness and uncertainty.
Definition 4.9 (Utility-Normalised Elemental Measures). Assume a random variable X with an associated projected probability distribution PX. Let $\dot{b}^{\mathrm{S}}_X(x)$ denote the elemental specificity of x, let $\dot{b}^{\mathrm{V}}_X(x)$ denote the elemental vagueness of x, and let $\dot{u}_X(x)$ denote the elemental uncertainty of x. Assume the utility vector λX, as well as λ⁺, the greatest absolute utility from λX and from other relevant utility vectors that will be considered for comparing different options. The utility-normalised elemental specificity, vagueness and uncertainty are expressed as:

\[
\text{Utility-normalised elemental specificity:}\quad
\dot{b}^{\mathrm{NS}}_X(x) = \frac{\lambda_X(x)\, \dot{b}^{\mathrm{S}}_X(x)}{\lambda^{+}}, \quad \forall x \in X. \qquad (4.16)
\]
\[
\text{Utility-normalised elemental vagueness:}\quad
\dot{b}^{\mathrm{NV}}_X(x) = \frac{\lambda_X(x)\, \dot{b}^{\mathrm{V}}_X(x)}{\lambda^{+}}, \quad \forall x \in X. \qquad (4.17)
\]
\[
\text{Utility-normalised elemental uncertainty:}\quad
\dot{u}^{\mathrm{N}}_X(x) = \frac{\lambda_X(x)\, \dot{u}_X(x)}{\lambda^{+}}, \quad \forall x \in X. \qquad (4.18)
\]

⊔⊓
Similarly to the additivity property of elemental specificity, vagueness and uncertainty in Eq.(4.9), we also have additivity of the utility-normalised elemental specificity, vagueness and uncertainty, as expressed in Eq.(4.19).

\[
\dot{b}^{\mathrm{NS}}_X(x) + \dot{b}^{\mathrm{NV}}_X(x) + \dot{u}^{\mathrm{N}}_X(x) = P^{\mathrm{N}}_X(x). \qquad (4.19)
\]
Having defined utility-normalised probability, it is possible to directly compare
options without involving utilities, because the utilities are integrated in the utility-normalised probabilities.
Similarly to the mass-sum for elemental specificity, vagueness and uncertainty
of Eq.(4.10), it is possible to also describe a corresponding utility-normalised elemental mass-sum, as defined below.
Definition 4.10 (Utility-Normalised Elemental Mass-Sum). Let X be a domain with hyperdomain R(X), and assume that the opinion ωX is specified. Also assume that a utility vector λX is specified. Consider an element x ∈ R(X) with its utility-normalised elemental specificity $\dot{b}^{\mathrm{NS}}_X(x)$, utility-normalised elemental vagueness $\dot{b}^{\mathrm{NV}}_X(x)$ and utility-normalised elemental uncertainty $\dot{u}^{\mathrm{N}}_X(x)$. The utility-normalised elemental mass-sum for x is the triplet denoted $\Xi^{\mathrm{NE}}_X(x)$ expressed as:

\[
\text{Utility-normalised elemental mass-sum:}\quad
\Xi^{\mathrm{NE}}_X(x) = \big(\dot{b}^{\mathrm{NS}}_X(x),\; \dot{b}^{\mathrm{NV}}_X(x),\; \dot{u}^{\mathrm{N}}_X(x)\big). \qquad (4.20)
\]

⊔⊓
Note that utility-normalised elemental specificity, vagueness and uncertainty do
not represent realistic measures, and must be considered as purely synthetic.
As an example of applying utility-normalised probability, consider two urns
named X and Y that both contain 100 red and black balls, and you are asked to draw
a ball at random from one of the urns. The possible outcomes are named x1 = ‘Red’
and x2 = ‘Black’ for urn X, and are similarly named y1 = ‘Red’ and y2 = ‘Black’
for urn Y.
For urn X you are told that it contains 70 red balls, 10 black balls, and 20 balls
that are either red or black. The corresponding opinion ωX is expressed as:

\[
\text{Opinion } \omega_X = \left[\begin{array}{ll}
b_X(x_1) = 7/10, & a_X(x_1) = 1/2,\\
b_X(x_2) = 1/10, & a_X(x_2) = 1/2,\\
u_X = 2/10. &
\end{array}\right] \qquad (4.21)
\]
For urn Y you are told that it contains 40 red balls, 20 black balls, and 40 balls
that are either red or black. The corresponding hyper opinion ωY is expressed as:
\[
\text{Opinion } \omega_Y = \left[\begin{array}{ll}
b_Y(y_1) = 4/10, & a_Y(y_1) = 1/2,\\
b_Y(y_2) = 2/10, & a_Y(y_2) = 1/2,\\
u_Y = 4/10. &
\end{array}\right] \qquad (4.22)
\]
Imagine that you must select one ball at random, either from urn X or Y, and you are asked to make a choice about which urn to draw it from in a single betting game. With option X, you receive $1000 if you draw 'Black' from urn X (i.e. if you draw x2). With option Y, you receive $500 if you draw 'Black' from urn Y (i.e. if you draw y2). You receive nothing if you draw 'Red' in either option. Table 4.2 summarises the options in this game.
Table 4.2 Betting options in situation involving utilities

                                 Red    Black
Option X, draw from urn X:       0      $1000
Option Y, draw from urn Y:       0      $500
The elemental mass-sums for drawing ‘Black’ are different for options X and
Y. However, the utility-normalised elemental mass-sums are equal, as illustrated in
Figure 4.6. The normalisation factor used in this example is λ + = 1000, since $1000
is the greatest absolute utility.
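The numbers behind Figure 4.6 can be reproduced as follows (a sketch under the stated urn opinions; the function and variable names are ours). For a binary domain there is no vagueness, so each mass-sum consists of elemental specificity and elemental uncertainty only, and Eqs.(4.15)–(4.18) reduce to a simple scaling by λ_X(x)/λ⁺.

```python
lam_plus = 1000.0                                  # greatest absolute utility

def elemental_measures(b_black, u, a_black, utility):
    spec = b_black                                 # specificity of a singleton: its own belief mass
    unc  = a_black * u                             # elemental uncertainty, Eq.(4.8)
    P    = spec + unc                              # projected probability (no vagueness here)
    norm = lambda v: utility * v / lam_plus        # utility normalisation, Eqs.(4.15)-(4.18)
    return {'P': P, 'spec': spec, 'unc': unc,
            'P_N': norm(P), 'spec_N': norm(spec), 'unc_N': norm(unc)}

option_X = elemental_measures(b_black=0.1, u=0.2, a_black=0.5, utility=1000.0)
option_Y = elemental_measures(b_black=0.2, u=0.4, a_black=0.5, utility=500.0)

print(option_X)   # P=0.2, spec=0.1, unc=0.1; P_N=0.2, spec_N=0.1, unc_N=0.1
print(option_Y)   # P=0.4, spec=0.2, unc=0.2; P_N=0.2, spec_N=0.1, unc_N=0.1
```

The output illustrates the point made in the text: the utility-normalised quantities coincide for the two options, while the non-normalised elemental specificity is greater for option Y.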
[Figure: bar diagrams comparing the elemental mass-sums ΞE_X(x2) and ΞE_Y(y2) (left panel, 'Elemental mass-sums') with the utility-normalised elemental mass-sums ΞNE_X(x2) and ΞNE_Y(y2) (right panel, 'Utility-normalised elemental mass-sums'), decomposed into elemental specificity and elemental uncertainty.]
Fig. 4.6 Diagram for mass-sums and for utility-normalised mass-sums for options X and Y
Note that the utility-normalised probability is equal for options X and Y, expressed as $P^{\mathrm{N}}_X(x_2) = P^{\mathrm{N}}_Y(y_2)$. Considering utility-normalised probability alone is thus insufficient for determining which option is the best. The decision in this case must be based on the elemental specificity, which is greatest for option Y, expressed as $\dot{b}^{\mathrm{S}}_Y(y_2) > \dot{b}^{\mathrm{S}}_X(x_2)$. In this situation it would not be meaningful to consider utility-normalised elemental specificity, which here is equal for both options. This is explained in detail in Section 4.4.
In case of equal utilities for all options, then normalisation is not needed, or it
can simply be observed that utility-normalised elemental mass-sums are equal to the
corresponding non-normalised elemental mass-sums, as expressed below.

\[
\text{When all options have equal utility:}\quad
\left\{\begin{array}{ll}
\text{Projected probability:} & P^{\mathrm{N}}_X = P_X\\
\text{Elemental specificity:} & \dot{b}^{\mathrm{NS}}_X = \dot{b}^{\mathrm{S}}_X\\
\text{Elemental vagueness:} & \dot{b}^{\mathrm{NV}}_X = \dot{b}^{\mathrm{V}}_X\\
\text{Elemental uncertainty:} & \dot{u}^{\mathrm{N}}_X = \dot{u}_X\\
\text{Elemental mass-sum:} & \Xi^{\mathrm{NE}}_X = \Xi^{\mathrm{E}}_X
\end{array}\right.
\qquad (4.23)
\]
In the examples below, utilities for all options are equal, so for convenience, the
diagrams show simple elemental mass-sums, which would be equal to the corresponding utility-normalised elemental mass-sums.
4.4 Decision Criteria
It is possible to specify a set of criteria for making choices with opinions. The criteria follow the indicated order of priority.
1. The option with the highest utility-normalised probability is the best choice.
2. Given equal utility-normalised probability among all options, the option with
the greatest elemental specificity is the best choice.
3. Given equal utility-normalised probability as well as equal elemental specificity
among all options, the option with the least elemental uncertainty (and thereby
the greatest elemental vagueness, whenever relevant) is the best option.
The above criteria predict the choice of the majority of participants in the Ellsberg experiment described below, as well as the intuitive best choice in additional
examples that combine various degrees of specificity, vagueness and uncertainty.
The procedure for making decisions according to these criteria is illustrated in
Figure 4.7 below.
The steps of the decision-making process in Figure 4.7 are described in more detail below. We assume uniform utilities for all the options. In case of different utilities, the expected utilities can be computed by a simple product.
(a) The decision-maker must have the opinions for the relevant options to be compared. The elemental specificities, vaguenesses and uncertainties must be computed, and the utility-normalised probabilities must be represented in a normalised form like in Figure 4.6.
[Figure: flowchart of the decision-making process. (a) Consider elemental mass-sums for all decision options → (b) compare utility-normalised probabilities; if different, (c) select the option with the greatest utility-normalised probability; if equal, (d) compare elemental specificities; if different, (e) select the option with the greatest elemental specificity; if equal, (f) compare elemental uncertainties; if different, (g) select the option with the least elemental uncertainty; if equal, (h) difficult decision.]
Fig. 4.7 Decision-making process
(b) Compare the utility-normalised probabilities of all relevant options.
(c) In case one option has greatest utility-normalised probability, then that option
is the best choice.
(d) Assuming that the relevant options have equal utility-normalised probabilities,
then compare the elemental specificities.
(e) In case one option has greatest elemental specificity, then that option is the best
choice.
(f) Assuming that all relevant options have equal utility-normalised probability as
well as equal elemental specificity, then compare the elemental uncertainties.
(g) In case one option has least elemental uncertainty (i.e. greatest elemental vagueness), then that option is the best choice.
(h) Assuming that the relevant options are such that they have equal utility-normalised
probability, equal elemental specificity and uncertainty, then it is challenging to
make a decision. However, the composition of each elemental vagueness might
be different, or the base rates might be different. In addition, it might be meaningful to consider differences between utility-normalised specificity, vagueness
and uncertainty. There are thus multiple aspects that could be considered for
specifying more detailed decision criteria, in addition to those specified above.
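The priority of the three criteria can also be expressed compactly in code. The following sketch is illustrative; the input format is an assumption of ours, not a prescribed data structure. It walks through steps (b)–(h) for options that are already described by their utility-normalised probability, elemental specificity and elemental uncertainty.

```python
def choose(options, tol=1e-9):
    """options: dict name -> {'P_N': ..., 'spec': ..., 'unc': ...}.
    Applies the criteria in order: greatest P_N, then greatest specificity,
    then least uncertainty; returns None if all criteria are tied."""
    for key, prefer_max in (('P_N', True), ('spec', True), ('unc', False)):
        values = {name: o[key] for name, o in options.items()}
        best = max(values.values()) if prefer_max else min(values.values())
        winners = [n for n, v in values.items() if abs(v - best) < tol]
        if len(winners) == 1:
            return winners[0]
        options = {n: options[n] for n in winners}   # tie: fall through to next criterion
    return None                                      # a 'difficult decision'

# Ellsberg game 1 (equal utilities, so P_N equals the projected probability):
print(choose({'1A': {'P_N': 1/3, 'spec': 1/3, 'unc': 0.0},
              '1B': {'P_N': 1/3, 'spec': 0.0, 'unc': 0.0}}))   # -> '1A'
```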
The next sections describe examples where the decision criteria defined above
can be seen in action.
4.5 The Ellsberg Paradox
This and the next sections serve as motivating examples for defining decision criteria in Section 4.4 which then defines how specificity, vagueness and uncertainty of
opinions should be used for rational decision-making.
The Ellsberg paradox [20] results from an experiment that shows how traditional probability theory is unable to explain typical human decision behaviour. Because traditional probability does not express degrees of vagueness and uncertainty, it cannot explain the results of the experiment. However, when representing the situation
with opinions that do express degrees of vagueness and uncertainty the results of
the experiment become perfectly rational.
In the Ellsberg experiment you are shown an urn with 90 balls in it, and you
are told that 30 balls are red and that the remaining 60 balls are either black or
yellow. One ball is going to be selected at random and you are asked to make a
choice in two separate betting games. Fig.4.8 shows the situation of the Ellsberg
paradox represented in the form of a hyperdomain with corresponding belief mass
distribution.
[Figure: the hyperdomain R(X) for the Ellsberg urn, annotated with the belief mass distribution: b(x1) = 1/3 on 'Red', b(x6) = 2/3 on 'Black or Yellow', and zero on all other elements.]
Fig. 4.8 Hyperdomain and belief mass distribution in the Ellsberg paradox
The domain X and its hyper-opinion are then expressed as:

\[
\text{Hyperdomain } R(X) = \left\{\begin{array}{ll}
x_1: \text{Red}, & x_4: \text{Red or Black},\\
x_2: \text{Black}, & x_5: \text{Red or Yellow},\\
x_3: \text{Yellow}, & x_6: \text{Black or Yellow}.
\end{array}\right\} \qquad (4.24)
\]

\[
\text{Hyper-opinion } \omega_X = \left[\begin{array}{ll}
b_X(x_1) = 1/3, & a_X(x_1) = 1/3,\\
b_X(x_2) = 0, & a_X(x_2) = 1/3,\\
b_X(x_3) = 0, & a_X(x_3) = 1/3,\\
b_X(x_4) = 0, & a_X(x_4) = 2/3,\\
b_X(x_5) = 0, & a_X(x_5) = 2/3,\\
b_X(x_6) = 2/3, & a_X(x_6) = 2/3,\\
u_X = 0. &
\end{array}\right] \qquad (4.25)
\]
A quick look at ωX reveals that it contains some specific belief mass, some vague
belief mass and no uncertainty mass, so it is a dogmatic and partially vague opinion.
In betting game 1 you must choose between options 1A and 1B. With option 1A you receive $100 if a 'Red' ball is drawn, and you receive nothing if either a 'Black' or 'Yellow' ball is drawn. With option 1B you receive $100 if a 'Black' ball is drawn, and you receive nothing if either a 'Red' or 'Yellow' ball is drawn. Table 4.3
summarises the options in game 1.
Table 4.3 Game 1: Pair of betting options

                Red     Black   Yellow
Option 1A:      $100    0       0
Option 1B:      0       $100    0
Make a note of your choice from betting game 1, and then proceed to betting
game 2 where you are asked to choose between two new options based on the same
random draw of a single ball from the same urn. With option 2A you receive $100
if either a ‘Red’ or ‘Yellow’ ball is drawn, and you receive nothing if a ‘Black’ ball
is drawn. With option 2B you receive $100 if either a ‘Black’ or ‘Yellow’ ball is
drawn, and you receive nothing if a ‘Red’ ball is drawn. Table 4.4 summarises the
options in game 2.
Table 4.4 Game 2: Pair of betting options

                Red     Black   Yellow
Option 2A:      $100    0       $100
Option 2B:      0       $100    $100
Would you choose option 2A or 2B?
Ellsberg reports that, when presented with these pairs of choices, most people
select options 1A and 2B. Adopting the approach of expected utility theory this
reveals a clear inconsistency in probability assessments. On this interpretation, when
a person chooses option 1A over option 1B, he or she is revealing a higher subjective
probability assessment of picking a ‘Red’ ball than a ‘Black’ ball.
However, when the same person prefers option 2B over option 2A, he or she
reveals that his or her subjective probability assessment of picking a ‘Black’ or
‘Yellow’ ball is higher than a ‘Red’ or ‘Yellow’ ball, which implies that picking
a ‘Black’ ball has a higher probability assessment than a ‘Red’ ball. This seems
to contradict the probability assessment of game 1, which therefore represents a
paradox.
When representing the vagueness of the opinions the choices 1A and 2B of the
majority become perfectly rational, as explained next.
The utilities for options 1A and 1B are equal ($100), so their utility-normalised probabilities are equal to their projected probabilities, which are used for the decision modelling below. Projected probabilities are computed with Eq.(3.28), which for convenience is repeated below:

\[
P_X(x) = \sum_{x_j \in R(X)} a_X(x/x_j)\, b_X(x_j) \;+\; a_X(x)\, u_X. \qquad (4.26)
\]

Relative base rates are computed with Eq.(2.10), which for convenience is repeated below:

\[
a_X(x/x_j) = \frac{a_X(x \cap x_j)}{a_X(x_j)}. \qquad (4.27)
\]
The projected probabilities of x1 and x2 in game 1 are then:

\[
\begin{array}{ll}
\text{Option 1A:} & P_X(x_1) = a_X(x_1/x_1)\, b_X(x_1) = 1 \cdot \tfrac{1}{3} = \tfrac{1}{3}\\[1ex]
\text{Option 1B:} & P_X(x_2) = a_X(x_2/x_6)\, b_X(x_6) = \tfrac{1}{2} \cdot \tfrac{2}{3} = \tfrac{1}{3}
\end{array}
\qquad (4.28)
\]
Note that PX(x1) = PX(x2), which makes the options equal from a purely first-order probability point of view. However, they are affected by different vague belief mass, as shown below. The elemental vague belief mass of x, denoted $\dot{b}^{\mathrm{V}}_X(x)$, is computed with Eq.(4.3), which for convenience is repeated below.

\[
\dot{b}^{\mathrm{V}}_X(x) = \sum_{\substack{x_j \in C(X)\\ x_j \not\subseteq x}} a_X(x/x_j)\, b_X(x_j). \qquad (4.29)
\]
The elemental vague belief masses of x1 and x2 in game 1 are then:

\[
\begin{array}{ll}
\text{Option 1A:} & \dot{b}^{\mathrm{V}}_X(x_1) = 0\\[1ex]
\text{Option 1B:} & \dot{b}^{\mathrm{V}}_X(x_2) = a_X(x_2/x_6)\, b_X(x_6) = \tfrac{1}{2} \cdot \tfrac{2}{3} = \tfrac{1}{3}
\end{array}
\qquad (4.30)
\]
Given the absence of uncertainty, the additivity property of Eq.(4.9) allows us to compute the elemental specificities as $\dot{b}^{\mathrm{S}}_X(x_1) = 1/3$ and $\dot{b}^{\mathrm{S}}_X(x_2) = 0$.
The elemental mass-sum diagram of the options in Ellsberg betting game 1 is
illustrated in Figure 4.9.
The difference between options 1A (x1 ) and 1B (x2 ) emerges with their different
specific and vague belief masses. People clearly prefer choice 1A because it only
has specificity and no vagueness, whereas choice 1B is affected by vagueness.
[Figure: bar diagram of the elemental mass-sums ΞE_X(x1) and ΞE_X(x2) for options 1A and 1B, decomposed into elemental specificity and elemental vagueness, on a scale 0 to 1 in ninths.]
Fig. 4.9 Elemental mass-sum diagram for game 1 in the Ellsberg paradox.

We now turn to betting game 2, where the projected probabilities of x5 and x6 are:
\[
\begin{array}{ll}
\text{Option 2A:} & P_X(x_5) = a_X(x_1/x_1)\, b_X(x_1) + a_X(x_3/x_6)\, b_X(x_6) = 1 \cdot \tfrac{1}{3} + \tfrac{1}{2} \cdot \tfrac{2}{3} = \tfrac{2}{3}\\[1ex]
\text{Option 2B:} & P_X(x_6) = a_X(x_2/x_6)\, b_X(x_6) + a_X(x_3/x_6)\, b_X(x_6) = \tfrac{1}{2} \cdot \tfrac{2}{3} + \tfrac{1}{2} \cdot \tfrac{2}{3} = \tfrac{2}{3}
\end{array}
\qquad (4.31)
\]

Note that PX(x5) = PX(x6), which makes the options equal from a first-order probability point of view. However, they have different vague belief masses, as shown below. Vague belief mass is computed with Eq.(4.3).
The elemental vaguenesses of x5 and x6 in game 2 are:

\[
\begin{array}{ll}
\text{Option 2A:} & \dot{b}^{\mathrm{V}}_X(x_5) = a_X(x_3/x_6)\, b_X(x_6) = \tfrac{1}{2} \cdot \tfrac{2}{3} = \tfrac{1}{3}\\[1ex]
\text{Option 2B:} & \dot{b}^{\mathrm{V}}_X(x_6) = 0
\end{array}
\qquad (4.32)
\]
Given the absence of uncertainty, the additivity property of Eq.(4.9) allows us to compute the elemental specificities as $\dot{b}^{\mathrm{S}}_X(x_5) = 1/3$ and $\dot{b}^{\mathrm{S}}_X(x_6) = 2/3$.
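The arithmetic of Eqs.(4.28)–(4.32) can be verified with a few lines of code (a sketch only; exact fractions are used to avoid rounding, and x4 = {x1,x2}, x5 = {x1,x3}, x6 = {x2,x3} as above):

```python
from fractions import Fraction as F

members = {'x1': {'x1'}, 'x2': {'x2'}, 'x3': {'x3'},
           'x4': {'x1', 'x2'}, 'x5': {'x1', 'x3'}, 'x6': {'x2', 'x3'}}
b = {'x1': F(1, 3), 'x6': F(2, 3)}                      # all other belief masses are zero
a = {'x1': F(1, 3), 'x2': F(1, 3), 'x3': F(1, 3)}

def base_rate(x):                                       # a_X(x) by base rate additivity
    return sum(a[s] for s in members[x])

def rel(x, xj):                                         # relative base rate, Eq.(2.10)
    return sum(a[s] for s in members[x] & members[xj]) / base_rate(xj)

def projected(x):                                       # Eq.(4.26) with u_X = 0
    return sum(rel(x, xj) * bj for xj, bj in b.items())

def vagueness(x):                                       # Eq.(4.29)
    return sum(rel(x, xj) * bj for xj, bj in b.items()
               if len(members[xj]) > 1 and not members[xj] <= members[x])

print(projected('x1'), projected('x2'))   # 1/3, 1/3  (game 1: equal probabilities)
print(vagueness('x1'), vagueness('x2'))   # 0, 1/3    (1A has no vagueness, 1B has)
print(projected('x5'), projected('x6'))   # 2/3, 2/3  (game 2: equal probabilities)
print(vagueness('x5'), vagueness('x6'))   # 1/3, 0    (2B has no vagueness, 2A has)
```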
The elemental mass-sum diagram of the options in Ellsberg betting game 2 is
illustrated in Figure 4.10.
[Figure: bar diagram of the elemental mass-sums ΞE_X(x5) and ΞE_X(x6) for options 2A and 2B, decomposed into elemental specificity and elemental vagueness, on a scale 0 to 1 in ninths.]
Fig. 4.10 Elemental mass-sum diagram for game 2 in the Ellsberg paradox.
The difference between options 2A and 2B emerges with their different elemental
vagueness and specificity. People clearly prefer choice 2B (x6 ) because it has no
vagueness, whereas choice 2A (x5 ) is affected by its vagueness of 1/3.
We have shown that preferring option 1A over option 1B, and option 2B over option 2A, is perfectly rational, and therefore does not represent a paradox within the opinion model.
Other models of uncertain probabilities are also able to explain the Ellsberg paradox, such as Choquet capacities (Choquet 1953 [9], Chateauneuf 1991 [8]). However, the Ellsberg paradox only involves vagueness, not uncertainty. In fact, the Ellsberg paradox is too simplistic for teasing out the whole spectrum of specificity, vagueness and uncertainty of opinions. The next section presents examples where all aspects are taken into account.
4.6 Examples of Decision Under Vagueness and Uncertainty
The three examples presented in this section involve specificity, vagueness and uncertainty. Different situations of varying degrees of specificity, vagueness and uncertainty can be clearly separated and compared when represented as subjective
opinions. As far as we are aware, no other model of uncertain reasoning is able
to distinguish and correctly rank the described situations in the same way.
Each example consists of a game where you are presented with two urns denoted
X and Y , both with 90 balls, and you are asked to pick a random ball from one of
the two urns, with the chance of winning $100 if you pick a yellow ball.
4.6.1 Decisions with Difference in Projected Probability
In game 1 you receive the following information. For urn X you are told that it contains 90 balls that are either red, black or yellow. The corresponding hyper opinion
ωX is expressed as:

\[
\text{Hyper-opinion about } X \text{ in game 1:}\quad
\omega_X = \left[\begin{array}{ll}
b_X(x_1) = 0, & a_X(x_1) = 1/3,\\
b_X(x_2) = 0, & a_X(x_2) = 1/3,\\
b_X(x_3) = 0, & a_X(x_3) = 1/3,\\
b_X(x_4) = 0, & a_X(x_4) = 2/3,\\
b_X(x_5) = 0, & a_X(x_5) = 2/3,\\
b_X(x_6) = 0, & a_X(x_6) = 2/3,\\
u_X = 1. &
\end{array}\right] \qquad (4.33)
\]
For urn Y you are told that 50 balls are red, 20 balls are black, and 20 balls are yellow. The corresponding hyper-opinion ωY is expressed as:

\[
\text{Hyper-opinion about } Y \text{ in game 1:}\quad
\omega_Y = \left[\begin{array}{ll}
b_Y(y_1) = 4/9, & a_Y(y_1) = 1/3,\\
b_Y(y_2) = 3/9, & a_Y(y_2) = 1/3,\\
b_Y(y_3) = 2/9, & a_Y(y_3) = 1/3,\\
b_Y(y_4) = 0, & a_Y(y_4) = 2/3,\\
b_Y(y_5) = 0, & a_Y(y_5) = 2/3,\\
b_Y(y_6) = 0, & a_Y(y_6) = 2/3,\\
u_Y = 0. &
\end{array}\right] \qquad (4.34)
\]
You must select one ball at random, either from urn X or Y, and you are asked to make a choice about which urn to draw it from in a single betting game. You receive $100 if a 'Yellow' ball is drawn, and you receive nothing if either a 'Red' or 'Black' ball is drawn. Table 4.5 summarises the options in this game.
Table 4.5 Game 1: Betting options in situation of different projected probabilities

                                        Red   Black   Yellow
Option 1X, draw ball from urn X:        0     0       $100
Option 1Y, draw ball from urn Y:        0     0       $100
Without having conducted any experiment, when presented with this pair of
choices, it seems obvious to select option 1X. The intuitive reason is that option 1X
has the greatest projected probability for picking a ‘Yellow’ ball. Eq.(4.35) gives the
computed results for projected probability. Projected probability is computed with
Eq.(3.28).
The projected probabilities of x3 and y3 in options 1X and 1Y are:
\[
\begin{array}{ll}
\text{Option 1X:} & P_X(x_3) = a_X(x_3)\, u_X = \tfrac{1}{3} \cdot 1 = \tfrac{1}{3}\\[1ex]
\text{Option 1Y:} & P_Y(y_3) = b_Y(y_3) = \tfrac{2}{9}
\end{array}
\qquad (4.35)
\]
The elemental mass-sum diagrams of $\Xi^{\mathrm{E}}_X(x_3)$ and $\Xi^{\mathrm{E}}_Y(y_3)$ for options 1X and 1Y are visualised in Figure 4.11. Note that the utility for picking a yellow ball is equal for both options, so that $\Xi^{\mathrm{E}}_X(x_3) = \Xi^{\mathrm{NE}}_X(x_3)$ and $\Xi^{\mathrm{E}}_Y(y_3) = \Xi^{\mathrm{NE}}_Y(y_3)$, i.e. the utility-normalised and the non-normalised mass-sums are equal. So while Figure 4.11 shows elemental mass-sums, the corresponding utility-normalised elemental mass-sums are identical.
It can be seen that PX (x3 ) > PY (y3 ) which indicates that the rational choice is
option 1X. Note that option 1X has elemental uncertainty of 1/3 in contrast to option
1Y which has no uncertainty. In case of highly risk-averse participants, option 1Y
might be preferable, but this should still be considered ‘irrational’.
Game 1 shows that when options have different projected probability, the option
with the greatest projected probability is to be preferred.
[Figure: bar diagram of the elemental mass-sums ΞE_X(x3) and ΞE_Y(y3) for options 1X and 1Y, decomposed into elemental specificity and elemental uncertainty, on a scale 0 to 1 in ninths.]
Fig. 4.11 Visualising elemental mass-sum for options 1X and 1Y.
4.6.2 Decisions with Difference in Specificity
In game 2 you receive the following information. For urn X you are told that 30
balls are red, and that 60 balls are either black or yellow. The corresponding hyper
opinion ωX is expressed as:

\[
\text{Hyper-opinion about } X \text{ in game 2:}\quad
\omega_X = \left[\begin{array}{ll}
b_X(x_1) = 1/3, & a_X(x_1) = 1/3,\\
b_X(x_2) = 0, & a_X(x_2) = 1/3,\\
b_X(x_3) = 0, & a_X(x_3) = 1/3,\\
b_X(x_4) = 0, & a_X(x_4) = 2/3,\\
b_X(x_5) = 0, & a_X(x_5) = 2/3,\\
b_X(x_6) = 2/3, & a_X(x_6) = 2/3,\\
u_X = 0. &
\end{array}\right] \qquad (4.36)
\]
For urn Y you are told that 10 balls are red, 10 balls are black, 10 balls are yellow,
and that the remaining 60 balls are either red, black or yellow. The corresponding
hyper-opinion ωY is expressed as:

\[
\text{Hyper-opinion about } Y \text{ in game 2:}\quad
\omega_Y = \left[\begin{array}{ll}
b_Y(y_1) = 1/9, & a_Y(y_1) = 1/3,\\
b_Y(y_2) = 1/9, & a_Y(y_2) = 1/3,\\
b_Y(y_3) = 1/9, & a_Y(y_3) = 1/3,\\
b_Y(y_4) = 0, & a_Y(y_4) = 2/3,\\
b_Y(y_5) = 0, & a_Y(y_5) = 2/3,\\
b_Y(y_6) = 0, & a_Y(y_6) = 2/3,\\
u_Y = 2/3. &
\end{array}\right] \qquad (4.37)
\]
One ball is going to be selected at random either from urn X or Y , and you are
asked to make a choice about which urn to draw it from in a single betting game.
You receive $100 if a ‘Yellow’ ball is drawn, and you receive nothing if either a
‘Red’ or ‘Black’ ball is drawn. Table 4.6 summarises the options in this game.
Without having conducted any experiment, when presented with this pair of
choices, it appears obvious to select option 2Y. The intuitive reason is that option 2Y includes some specific belief mass in favour of 'Yellow', whereas with option 2X there is none. Below are the expressions for projected probability, elemental specificity, vague belief mass and elemental uncertainty mass.

Table 4.6 Game 2: Betting options in vague and uncertain situation

                                        Red   Black   Yellow
Option 2X, draw ball from urn X:        0     0       $100
Option 2Y, draw ball from urn Y:        0     0       $100
Projected probabilities are computed with Eq.(3.28), relative base rates with Eq.(2.10), elemental vagueness with Eq.(4.3), and elemental uncertainty with Eq.(4.8).
The projected probabilities of x3 and y3 in options 2X and 2Y are:

\[
\begin{array}{ll}
\text{Option 2X:} & P_X(x_3) = a_X(x_3/x_6)\, b_X(x_6) + a_X(x_3)\, u_X = \tfrac{1}{2} \cdot \tfrac{2}{3} + \tfrac{1}{3} \cdot 0 = \tfrac{1}{3}\\[1ex]
\text{Option 2Y:} & P_Y(y_3) = b_Y(y_3) + a_Y(y_3)\, u_Y = \tfrac{1}{9} + \tfrac{1}{3} \cdot \tfrac{2}{3} = \tfrac{1}{3}
\end{array}
\qquad (4.38)
\]
Note that PX(x3) = PY(y3), which makes options 2X and 2Y equal from a purely first-order probability point of view. However, they have different elemental specificity, elemental vagueness, and elemental uncertainty, as shown below.
The elemental specificities of x3 and y3 are:

\[
\begin{array}{ll}
\text{Option 2X:} & \dot{b}^{\mathrm{S}}_X(x_3) = 0\\[1ex]
\text{Option 2Y:} & \dot{b}^{\mathrm{S}}_Y(y_3) = b_Y(y_3) = \tfrac{1}{9}
\end{array}
\qquad (4.39)
\]
The elemental vaguenesses of x3 and y3 are:

\[
\begin{array}{ll}
\text{Option 2X:} & \dot{b}^{\mathrm{V}}_X(x_3) = a_X(x_3/x_6)\, b_X(x_6) = \tfrac{1}{2} \cdot \tfrac{2}{3} = \tfrac{1}{3}\\[1ex]
\text{Option 2Y:} & \dot{b}^{\mathrm{V}}_Y(y_3) = 0
\end{array}
\qquad (4.40)
\]
The elemental uncertainty masses of x3 and y3 are:

\[
\begin{array}{ll}
\text{Option 2X:} & \dot{u}_X(x_3) = 0\\[1ex]
\text{Option 2Y:} & \dot{u}_Y(y_3) = a_Y(y_3)\, u_Y = \tfrac{1}{3} \cdot \tfrac{2}{3} = \tfrac{2}{9}
\end{array}
\qquad (4.41)
\]
Note that the additivity property of Eq.(4.9) holds for x3 and y3 .
The elemental mass-sum diagrams of $\Xi^{\mathrm{E}}_X(x_3)$ and $\Xi^{\mathrm{E}}_Y(y_3)$ for options 2X and 2Y are visualised in Figure 4.12.
The difference between options 2X and 2Y emerges with their difference in elemental specificity, where the option 2Y has the greatest specificity. It also means that
option 2Y has the least sum of elemental vagueness and uncertainty, and therefore
is the preferable option.
[Figure: bar diagram of the elemental mass-sums ΞE_X(x3) and ΞE_Y(y3) for options 2X and 2Y, decomposed into elemental specificity, elemental vagueness and elemental uncertainty, on a scale 0 to 1 in ninths.]
Fig. 4.12 Visualising elemental mass-sum for options 2X and 2Y.

Game 2 shows that when projected probabilities are equal, but the elemental specificities are different, then the option with the greatest specificity is the best choice. Option 2Y is therefore the rationally preferred choice because it clearly has the greatest elemental specificity among the two.
4.6.3 Decisions with Difference in Vagueness and Uncertainty
In game 3 you receive the following information. For urn X you are told that 20
balls are red, that 40 balls are either black or yellow, and that the remaining 30 balls
are either red, black or yellow.
For urn Y you are only told that the 90 balls in the urn are either red, black or
yellow.
The corresponding hyper opinions are expressed as:


Hyper opinion ωX about X in game 3:

bX (x1) = 2/9,  aX (x1) = 1/3,
bX (x2) = 0,    aX (x2) = 1/3,
bX (x3) = 0,    aX (x3) = 1/3,
bX (x4) = 0,    aX (x4) = 2/3,
bX (x5) = 0,    aX (x5) = 2/3,
bX (x6) = 4/9,  aX (x6) = 2/3,
uX = 3/9                                                                    (4.42)


Hyper opinion ωY about Y in game 3:

bY (y1) = 0,  aY (y1) = 1/3,
bY (y2) = 0,  aY (y2) = 1/3,
bY (y3) = 0,  aY (y3) = 1/3,
bY (y4) = 0,  aY (y4) = 2/3,
bY (y5) = 0,  aY (y5) = 2/3,
bY (y6) = 0,  aY (y6) = 2/3,
uY = 1                                                                      (4.43)
One ball is going to be selected at random either from urn X or from urn Y, and
you are asked to make a choice about which urn to draw it from in a single betting
game. You receive $100 if a 'Yellow' ball is drawn, and you receive nothing if either
a 'Red' or a 'Black' ball is drawn. Table 4.7 summarises the options in this game.
Table 4.7 Game 3: Betting options in vague and uncertain situation

                                     Red    Black   Yellow
Option 3X, draw ball from urn X:      0       0      $100
Option 3Y, draw ball from urn Y:      0       0      $100
Without having conducted any experiment, when presented with this pair of
choices, it seems obvious to select option 3X. The intuitive reason is that option
3X is affected by less elemental uncertainty than option 3Y. Below are the expressions for projected probability and elemental specificity.
Projected probabilities are computed with Eq.(3.28), relative base rates are computed with Eq.(2.10), and elemental uncertainty with Eq.(4.3).
The projected probabilities of x3 and y3 in options 3X and 3Y are:
Option 3X: PX (x3) = aX (x3/x6) bX (x6) + aX (x3) uX = 1/2 · 4/9 + 1/3 · 3/9 = 1/3
Option 3Y: PY (y3) = aY (y3) uY = 1/3 · 1 = 1/3                                  (4.44)
Option 3X: ḃSX (x3) = 0
Option 3Y: ḃSY (y3) = 0                                                          (4.45)
Note that PX (x3) = PY (y3), which makes options 3X and 3Y equal from a purely
1st order probability point of view. In addition we have equal elemental specificity,
expressed by ḃSX (x3) = ḃSY (y3). However, they have different elemental vagueness
and elemental uncertainty, as shown below. Elemental vaguenesses of x3 and y3 are:
Option 3X: ḃVX (x3) = aX (x3/x6) bX (x6) = 1/2 · 4/9 = 2/9
Option 3Y: ḃVY (y3) = 0                                                          (4.46)
Elemental uncertainty of x3 and y3 are:
Option 3X: u̇X (x3) = aX (x3) uX = 1/3 · 3/9 = 1/9
Option 3Y: u̇Y (y3) = aY (y3) uY = 1/3 · 1 = 1/3                                  (4.47)
Note that the additivity property of Eq.(4.9) holds for x3 and y3 .
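To make the arithmetic above easy to check, the following minimal Python sketch (illustrative only; the helper names are hypothetical, not from the book's software) recomputes the elemental decomposition of x3 and y3 for options 3X and 3Y. It assumes the relative base rate aX (x3/x6) = aX (x3)/aX (x6) from Eq.(2.10) and the additivity of Eq.(4.9):

```python
from fractions import Fraction as F

def decompose(b_singleton, vague_pairs, a_singleton, u):
    """Elemental decomposition of one singleton value x.

    b_singleton : belief mass assigned directly to x (specificity)
    vague_pairs : list of (relative base rate a(x/xj), belief mass b(xj))
                  for composite values xj containing x (vagueness)
    a_singleton : base rate a(x)
    u           : uncertainty mass of the opinion
    """
    specificity = b_singleton
    vagueness = sum(a_rel * b_comp for a_rel, b_comp in vague_pairs)
    uncertainty = a_singleton * u
    projected = specificity + vagueness + uncertainty   # additivity, Eq.(4.9)
    return specificity, vagueness, uncertainty, projected

# Option 3X: bX(x3)=0, bX(x6)=4/9, aX(x3)=1/3, aX(x6)=2/3, uX=3/9
aX_rel = F(1, 3) / F(2, 3)                               # aX(x3/x6) = 1/2
print(decompose(F(0), [(aX_rel, F(4, 9))], F(1, 3), F(3, 9)))
# -> specificity 0, vagueness 2/9, uncertainty 1/9, projected 1/3

# Option 3Y: bY(y3)=0, no composite belief mass, aY(y3)=1/3, uY=1
print(decompose(F(0), [], F(1, 3), F(1)))
# -> specificity 0, vagueness 0, uncertainty 1/3, projected 1/3
```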
The elemental mass-sum diagrams Ξ^E_X (x3) and Ξ^E_Y (y3) of options 3X and 3Y are visualised in Figure 4.13.
What is interesting in game 3 is that the elemental vagueness and elemental uncertainty for x3 and y3 respectively are different. Vagueness is preferable over uncertainty, because vagueness is based on evidence, whereas uncertainty reflects lack of evidence.
Fig. 4.13 Visualising elemental mass-sum for options 3X and 3Y.
The option with the least uncertainty, and thereby the greatest vagueness, is therefore the preferable one.
Game 3 shows that when projected probabilities are equal, and the elemental
specificities are also equal (zero in this case), but the elemental uncertainty and
vagueness are different, then the option with the least elemental uncertainty is the
best choice. Option 3X is therefore the rationally preferred choice, because it clearly
has the least elemental uncertainty of the two.
4.7 Entropy in the Opinion Model
Information theory [82] provides a formalism for modeling and measuring 1st order
uncertainty about guessing the outcome of random events that are governed by probability distributions. The amount of information associated with a random variable
is called entropy, where high entropy indicates that it is difficult to predict outcomes,
and low entropy indicates easy predictions. The amount of information associated
with a given outcome is called surprisal, where high surprisal indicates an a priori
unlikely outcome, and low surprisal indicates an a priori likely outcome. People
tend to be risk-averse [20], so they prefer to make decisions under low entropy and
low surprisal. For example, most people prefer the option of receiving $1,000 over
the option of an all-or-nothing coin flip for $2,000. The expected utility is $1,000 in
both options, but the former option exposes the participant to 0 bits surprisal (i.e. no
surprisal), and the latter option exposes him or her to 1 bit surprisal. Given that the
expected utility otherwise is equal (as in the example above), people prefer betting
with the lowest possible exposure to surprisal.
Belief and uncertainty are intimately linked with regard to information theory in
the opinion model. In the sections below, we introduce standard notions of information surprisal and entropy from classical information theory, before extending these
notions to opinions. A more detailed discussion and treatment can be found, e.g., in
[66].
4.7.1 Outcome Surprisal
Surprisal, a.k.a. self-information, is a measure of the information content associated
with the outcome of a random variable under a given probability distribution. The
measuring unit of surprisal can be bits, nats, or hartleys, depending on the base of
the logarithm used in its calculation. When logarithm base 2 is used, the unit is bits,
which is also used below.
Definition 4.11 (Surprisal). The surprisal (or self-information) of an outcome x of
a discrete random variable X with probability distribution p X is expressed as:
IX (x) = − log2 (pX (x))                                                    (4.48)
⊔⊓
Surprisal measures the degree to which an outcome is surprising. An outcome is
more surprising the less likely it is to happen. When the base of the logarithm is 2,
as in Eq.(4.48), the surprisal is measured in bits. The more surprising an outcome
is, the more informative it is, and the more bits it has.
For example, when considering a fair coin, the probability is 0.5 for both ‘heads’
and ‘tail’, so each time the coin lands with ‘heads’ or ‘tail’, the observed amount of
information is I(tossing fair coin) = − log2 (0.5) = log2 (2) = 1 bit of information.
When considering a fair dice, the probability is 1/6 for each face, so each time
the dice produces one of its six faces, the observed amount of information is
I(throwing fair dice) = − log2 (1/6) = log2 (6) = 2.585 bits.
In case of an unfair dice where the probability of 'six' is only 1/16 (as opposed to
1/6 for a fair dice), throwing a 'six' amounts to I('six') = − log2 (1/16) = log2 (16) =
4 bits of surprisal.
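These values are easy to reproduce; the following short Python sketch (illustrative only) computes Eq.(4.48) for the coin, the fair dice and the unfair dice:

```python
import math

def surprisal(p, base=2):
    """Surprisal (self-information) of an outcome with probability p, Eq.(4.48)."""
    return -math.log(p, base)

print(surprisal(0.5))       # fair coin:           1.0 bit
print(surprisal(1 / 6))     # fair dice:           2.585 bits
print(surprisal(1 / 16))    # unfair dice, 'six':  4.0 bits
```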
In information theory, the surprisal of an outcome is completely determined by the
probability that it happens. Opinion outcome surprisal is defined below.
Definition 4.12 (Opinion Outcome Surprisal). Assume a (hyper) opinion ωX where
the variable X takes its values from the hyperdomain R(X). Given that the projected
probability of outcome x is PX (x), the opinion surprisal of outcome x is:
Opinion Outcome Surprisal: IPX (x) = − log2 (PX (x))                        (4.49)
⊔⊓
In the opinion model, surprisal of an outcome can be partially specific, vague or
uncertain, in any proportion. These concepts are defined below.
Definition 4.13 (Specificity, Vagueness and Uncertainty Surprisal). Assume a
(hyper) opinion ωX where the variable X takes its values from the hyperdomain
R(X). Given that the projected probability of outcome x is PX (x), the specificity,
vagueness and uncertainty surprisals of outcome x are expressed as:
Specificity Surprisal:  ISX (x) = ( ḃSX (x) / PX (x) ) · IPX (x)            (4.50)

Vagueness Surprisal:    IVX (x) = ( ḃVX (x) / PX (x) ) · IPX (x)            (4.51)

Uncertainty Surprisal:  IUX (x) = ( u̇X (x) / PX (x) ) · IPX (x)             (4.52)
⊔⊓
Note that opinion surprisal of an outcome consists of the sum of specificity,
vagueness and uncertainty surprisal, expressed as:
ISX (x) + IVX (x) + IUX (x) = IPX (x).                                      (4.53)
The decision criteria described in Section 4.4 are expressed in terms of projected
probability consisting of elemental specificity, vagueness and uncertainty. Given that
opinion outcome surprisal is a function of the same concepts, the same decision
criteria can equivalently be articulated in terms of outcome surprisal consisting of
specificity, vagueness, and uncertainty surprisal. However, using projected probability has obvious advantages when including utility in the decision process, because
their product then produces the expected utility directly.
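As a small illustration (not taken from the book), the specificity, vagueness and uncertainty surprisals of x3 in option 3X of game 3 can be computed directly from Eqs.(4.49)-(4.52), using the elemental values PX (x3) = 1/3, ḃSX = 0, ḃVX = 2/9 and u̇X = 1/9 derived in Section 4.6.3:

```python
import math

P, spec, vague, unc = 1/3, 0.0, 2/9, 1/9   # option 3X, value x3

I_P = -math.log2(P)                # opinion outcome surprisal, Eq.(4.49)
I_S = spec  / P * I_P              # specificity surprisal,     Eq.(4.50)
I_V = vague / P * I_P              # vagueness surprisal,       Eq.(4.51)
I_U = unc   / P * I_P              # uncertainty surprisal,     Eq.(4.52)

print(I_P, I_S + I_V + I_U)        # both ~1.585 bits, confirming Eq.(4.53)
```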
4.7.2 Opinion Entropy
Information entropy can be interpreted as expected surprisal, and is the sum over
products of surprisal and probability of outcomes, as defined below.
Definition 4.14 (Entropy). The entropy, denoted H(X), of a random variable X that
takes its values from a domain X, is the expected surprisal expressed as:
H(X) = ∑_{x∈X} pX (x) IX (x) = − ∑_{x∈X} pX (x) log2 (pX (x))              (4.54)
⊔⊓
Entropy measures the expected information carried by a random variable. In
information theory, the entropy of a random variable is determined by the probability (1st
order uncertainty) of its outcome in one test. The more evenly the outcome probabilities of a random variable are distributed, the more entropy the random variable
has. If one outcome is absolutely certain, then the variable has zero entropy.
The opinion entropy of a (hyper) variable X with an associated opinion ωX is simply the entropy computed over the projected probability distribution PX , similarly
to Eq.(4.54).
Definition 4.15 (Opinion Entropy). Assume a (hyper) opinion ωX where the variable X takes its values from the hyperdomain R(X). The opinion entropy, denoted
HP (ωX ), is the expected surprisal expressed as:
HP (ωX) = − ∑_{x∈X} PX (x) log2 (PX (x))                                   (4.55)
⊔⊓
Opinion entropy is insensitive to change in the uncertainty mass of an opinion as
long as the projected probability distribution PX remains the same.
Proposition 4.1. Let ωXA and ωXB be two opinions such that uXA > uXB and PXA = PXB; then HP (ωXA) = HP (ωXB).
Proof. The proposition's validity follows from the fact that HP is determined by the
projected probability distributions, which are equal for ωXA and ωXB. ⊔⊓
In order to account for differences in uncertainty, as well as in vagueness, it is
necessary to introduce specificity entropy, vagueness entropy and uncertainty entropy.
These entropy concepts can be computed from the elemental specificity ḃSX,
the elemental vagueness ḃVX, and the elemental uncertainty u̇X, as defined in Section 4.1.
Definition 4.16 (Specificity Entropy). Assume a (hyper) opinion ωX where the
variable X takes its values from the hyperdomain R(X). The specificity entropy,
denoted HS (ωX ), is the expected surprisal from elemental specificity, expressed as:
HS (ωX) = − ∑_{x∈X} ḃSX (x) log2 (PX (x))                                  (4.56)
⊔⊓
Definition 4.17 (Vagueness Entropy). Assume a (hyper) opinion ωX where the
variable X takes its values from the hyperdomain R(X). The vagueness entropy,
denoted HV (ωX ), is the expected surprisal from elemental vagueness, expressed as:
HV (ωX) = − ∑_{x∈X} ḃVX (x) log2 (PX (x))                                  (4.57)
⊔⊓
Definition 4.18 (Uncertainty Entropy). Assume a (hyper) opinion ωX where the
variable X takes its values from the hyperdomain R(X). The uncertainty entropy,
denoted HU (ωX), is the expected surprisal from elemental uncertainty, expressed as:
HU (ωX) = − ∑_{x∈X} u̇X (x) log2 (PX (x))                                   (4.58)
⊔⊓
Note the additivity property of the above defined entropy concepts.
HS (ωX ) + HV (ωX ) + HU (ωX ) = HP (ωX ).
(4.59)
Thus, for a given opinion entropy, there is a continuum of sums of specificity,
vagueness and uncertainty entropy. The structure of the sum reflects the type of
evidence on which the entropy is based. In case of an urn of 100 balls where you
only know that the balls can be red or black, the entropy for you is 1 bit with
regard to the variable of picking a red or a black ball. This entropy consists solely
of 1 bit of uncertainty entropy. In another case, where you learn that there are exactly
50 red balls and 50 black balls, the entropy for you is still 1 bit, however in this
case the entropy consists solely of 1 bit of specificity entropy. In the opinion model,
entropy thus consists of the three different types of entropy shown in Eq.(4.59), which
gives a more informative expression of entropy than classical information
entropy.
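A minimal sketch of the two urn cases (illustrative only, not the book's code), computing the entropy components of Eqs.(4.55)-(4.58) for a binary variable with base rates 1/2. For the urn with unknown proportions the opinion is vacuous, so all elemental mass is uncertainty; for the known 50/50 urn the opinion is dogmatic, so all elemental mass is specificity:

```python
import math

def entropy_components(P, spec, vague, unc):
    """HP, HS, HV, HU per Eqs.(4.55)-(4.58), given per-value elemental masses."""
    H_P = -sum(p * math.log2(p) for p in P)
    H_S = -sum(s * math.log2(p) for s, p in zip(spec, P))
    H_V = -sum(v * math.log2(p) for v, p in zip(vague, P))
    H_U = -sum(u * math.log2(p) for u, p in zip(unc, P))
    return H_P, H_S, H_V, H_U

# Urn with unknown colour proportions: vacuous opinion, u = 1,
# elemental uncertainty per value = a(x) * u = 0.5
print(entropy_components([0.5, 0.5], [0, 0], [0, 0], [0.5, 0.5]))
# -> HP = 1 bit, all of it uncertainty entropy (HU = 1, HS = HV = 0)

# Urn known to hold 50 red and 50 black balls: dogmatic opinion
print(entropy_components([0.5, 0.5], [0.5, 0.5], [0, 0], [0, 0]))
# -> HP = 1 bit, all of it specificity entropy (HS = 1, HV = HU = 0)
```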
When predicting the outcome of a variable with a given entropy, specificity entropy is preferable over vagueness entropy, which in turn is preferable over uncertainty entropy. This is in line with the decision criteria defined in Section 4.4.
In case of two variables with equal entropy containing the exact same sum of
specificity, vagueness and uncertainty entropy, the two variables might still have
different structure of vagueness entropy, and thereby be different in nature. However, this topic is outside the scope of the current presentation.
The cross entropy of an opinion measures the difference between the projected
probability distribution and the base rate distribution.
Definition 4.19 (Base-Rate to Projected-Probability Cross Entropy). The base-rate to projected-probability cross entropy of a discrete random variable X that takes
its values from domain X, denoted HBP (ωX), is the expected surprisal of the projected
probabilities under the base rate distribution, expressed as:
HBP (ωX) = − ∑_{x∈X} aX (x) log2 (PX (x))                                   (4.60)
⊔⊓
For a given base rate distribution, the cross entropy is minimal when the projected probability distribution and the base rate distribution are equal.
4.8 Conflict Between Opinions
A fundamental assumption behind subjective logic is that different agents can have
different opinions about the same variable. This also reflects the subjective reality
of how we perceive the world we live in.
For decision making however, having different opinions about the same thing can
be problematic because it makes it difficult to agree on the best course of action.
When it can be assumed that a ground truth exists (without being directly observable), the fact that agents have different opinions can be interpreted as an indication
that one or more agents are wrong. In such situations it can be meaningful to apply strategies to revise opinions, such as the trust revision described in Section 13.5.
The degree of conflict, abbreviated DC, is a measure of the difference between
opinions, and can be used in strategies for dealing with situations of difference between opinions about the same target.
Let B and C be two agents that have their respective opinions ωXB and ωXC about
the same variable X.
The most basic measure of conflict between the two opinions ωXB and ωXC is the
projected distance, denoted PD, expressed by Eq.(4.61).
Projected Distance:  PD(ωXB, ωXC) = ( ∑_{x∈X} |PXB (x) − PXC (x)| ) / 2     (4.61)
The property that PD ∈ [0, 1] can be explained as follows. Obviously PD ≥ 0. Furthermore, given that ∑ PXB (x) + ∑ PXC (x) = 2, independently of the cardinality of X, it can
be observed that PD ≤ 1. The case PD = 0 occurs when the two opinions have equal
projected probability distributions, in which case the opinions are non-conflicting
(even though they might be different). The maximum value PD = 1 occurs e.g. in
the case of two absolute binomial opinions with opposite projected probabilities.
An equivalent representation of the projected distance is given in Eq.(4.62).

Equivalent Projected Distance:  PD(ωXB, ωXC) = max_{x∈X} |PXB (x) − PXC (x)|     (4.62)
That Eq.(4.61) and Eq.(4.62) are equivalent becomes evident from the fact
that the greatest difference |PXB (xi) − PXC (xi)| must be balanced by an equal amount
of projected probability difference over the other values xj, xk, . . . , due to the additivity
of ∑ PXB and ∑ PXC, so that the sum of differences is double the maximum
difference.
A large PD does not necessarily indicate conflict, because the potential conflict
is diffused in case one (or both) opinions have high uncertainty. The more uncertain
one or both opinions are, the more tolerance for a large PD should be given.
Tolerance for large PD in case of high uncertainty reflects the fact that uncertain
opinions carry little weight in the fusion process.
A natural measure of the common certainty between two opinions ωXB and ωXC is
their conjunctive certainty denoted by CC:
Conjunctive Certainty: CC(ωXB , ωXC ) = (1 − uBX )(1 − uCX )
(4.63)
It can be seen that CC ∈ [0, 1] where CC = 0 means that one or both opinions
are vacuous, and CC = 1 means that both opinions are dogmatic, i.e. have zero
uncertainty mass.
The degree of conflict (DC) is simply defined as the product of PD and CC.
Definition 4.20 (Degree of Conflict). Assume two agents B and C with their respective opinions ωXB and ωXC about the same variable X.
DC(ωXB , ωXC ) denotes the degree of conflict between ωXB and ωXC , which is expressed as:
Degree of Conflict: DC(ωXB , ωXC ) = PD(ωXB , ωXC ) · CC(ωXB , ωXC )
(4.64)
⊔
⊓
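For binomial opinions the degree of conflict is straightforward to compute. The following Python sketch (hypothetical helper names, illustrative only) applies Eqs.(4.61), (4.63) and (4.64) to the two example pairs discussed next:

```python
def projected(b, d, u, a):
    """Projected probability of a binomial opinion (b, d, u, a)."""
    return b + a * u

def degree_of_conflict(op_b, op_c):
    """DC = PD * CC for two binomial opinions, per Eqs.(4.61), (4.63), (4.64)."""
    pb, pc = projected(*op_b), projected(*op_c)
    pd = (abs(pb - pc) + abs((1 - pb) - (1 - pc))) / 2   # projected distance
    cc = (1 - op_b[2]) * (1 - op_c[2])                   # conjunctive certainty
    return pd * cc

print(degree_of_conflict((0.05, 0.15, 0.80, 0.90), (0.68, 0.22, 0.10, 0.90)))  # ~0.0
print(degree_of_conflict((0.05, 0.15, 0.80, 0.10), (0.68, 0.22, 0.10, 0.10)))  # ~0.10
```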
As an example we consider the two binomial opinions ωXB1 = (0.05, 0.15, 0.80, 0.90)
and ωXC1 = (0.68, 0.22, 0.10, 0.90). Figure 4.14 shows a screenshot of the visualisation demonstration applet of subjective logic, showing the two example opinions ωXB1
and ωXC1 as points in the opinion triangle on the left, with their equivalent PDFs on
the right. In this case we get DC(ωXB1, ωXC1) = 0, meaning that there is no conflict.
Fig. 4.14 Example of opinions ωXB1 and ωXC1 where DC(ωXB1, ωXC1) = 0.0.
The reason why DC(ωXB1, ωXC1) = 0 is that PD(ωXB1, ωXC1) = 0. In terms of
Definition 4.20 there is thus no conflict between these opinions, although their belief
masses are quite different.
The next example shows two binomial opinions ωXB2 = (0.05, 0.15, 0.80, 0.10)
and ωXC2 = (0.68, 0.22, 0.10, 0.10) that have the same belief masses as in the previous example, but with a different base rate. In this example there is some conflict
between the opinions, which demonstrates that the degree of conflict can be influenced
simply by changing the base rate.
The degree of conflict can be computed according to Eq.(4.64) as:
DC(ωXB2 , ωXC2 ) = PD(ωXB2 , ωXC2 ) · CC(ωXB2 , ωXC2 )
(4.65)
= 0.56 · (1.00 − 0.80)(1.00 − 0.10) = 0.10.
Figure 4.15 shows a screenshot of the visualisation of the binomial opinions ωXB2
and ωXC2 . The opinions are shown as points in the opinion triangle on the left, with
their equivalent PDFs on the right.
Fig. 4.15 Example of opinions ωXB2 and ωXC2 where DC(ωXB2, ωXC2) = 0.1
Although the conflict might seem high due to the very different projected probabilities, the fact that ωXB2 is highly uncertain diffuses the potential conflict, so that
the degree of conflict only becomes DC(ωXB2 , ωXC2 ) = 0.10.
The notion of degree of conflict as described here only provides a relatively
coarse measure of conflict between two opinions. For example, two opinions with
very different PDFs can have zero conflict, as shown in Figure 4.14.
Because opinions are multi-dimensional a more complete expression for conflict
would necessarily require multiple parameters. However, this would partially defeat
the purpose of having a simple measure of conflict between opinions.
The degree of conflict expressed by Definition 4.20 provides a simple way of
assessing conflict between two opinions, which is useful e.g. for trust revision.
Chapter 5
Principles of Subjective Logic
This chapter compares subjective logic with other relevant reasoning frameworks,
and gives an overview of general principles of subjective logic.
5.1 Related Frameworks for Uncertain Reasoning
5.1.1 Comparison with Dempster-Shafer Belief Theory
Dempster-Shafer Belief Theory (DST), also known as evidence theory, has its origin
in a model for upper and lower probabilities proposed by Dempster in 1960. Based
on Dempster’s model, Shafer later proposed a model for expressing beliefs [81]. The
main idea behind DST is to abandon the additivity principle of probability theory,
i.e. that the sum of probabilities on all pairwise exclusive possibilities must add up
to one. DST uses the term ‘frame of discernment’, or ‘frame’ for short, to denote the
set of exclusive possible states, which is equivalent to a domain in subjective logic.
Belief theory gives observers the ability to assign so-called belief mass to any subset
of the frame including the whole frame itself. The advantage of this approach is that
uncertainty about the probabilities, i.e. the lack of evidence to support any specific
probability, can be explicitly expressed by assigning belief mass to the whole frame
or to arbitrary subsets of the frame.
Shafer’s book [81] describes many aspects of belief theory, where the two main
elements are 1) a flexible way of expressing beliefs, and 2) a method for combining
beliefs, commonly known as Dempster’s rule.
The way DST expresses beliefs is highly expressive, and extends the notion of
probability. By using beliefs it is possible to provide the argument “I don’t know” as
input to a reasoning model, which is not possible with probabilities. This capability
has made DST quite popular among researchers and practitioners.
The opinion representation in subjective logic is based on the representation of
belief functions in DST. The difference between subjective opinions and DST belief
functions is that opinions include base rates, while DST belief functions do not.
Consider a domain X with its hyperdomain R(X) and powerset P(X). Recall
that X ∈ P(X). Let x denote a specific value of the hyperdomain R(X) or of the
powerset P(X).
In DST, belief mass on value x is denoted m (x). The equality between the belief
masses of DST and the belief masses and uncertainty mass of subjective opinions is
given by Eq.(5.1).

m(x) = bX (x),  ∀x ∈ R(X)
m(X) = uX                                                                   (5.1)
Syntactically, the belief/uncertainty representations of DST and subjective logic
are thus equivalent. Their interpretation however is different. In subjective logic
there can be no belief mass assigned to the domain X itself. This interpretation
corresponds to the Dirichlet model, where only observations of values of X are
counted as evidence. The domain X can not be observed, so it can not be counted as
evidence.
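As a small illustration (not part of the book), the correspondence of Eq.(5.1) can be expressed as a conversion from a subjective opinion to a DST basic belief assignment; the helper name and data layout below are hypothetical:

```python
def opinion_to_dst_bba(belief_mass, uncertainty, domain):
    """Map a subjective opinion to a DST basic belief assignment, per Eq.(5.1):
    m(x) = b_X(x) for every value x in the hyperdomain, and m(X) = u_X."""
    bba = dict(belief_mass)                 # belief masses on (composite) values
    bba[frozenset(domain)] = uncertainty    # uncertainty mass goes on the whole frame
    return bba

domain = {"red", "black", "yellow"}
belief = {frozenset({"red"}): 2/9, frozenset({"black", "yellow"}): 4/9}
print(opinion_to_dst_bba(belief, 3/9, domain))
```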
The main application area of DST as described in the literature has been belief fusion, where Dempster's rule is the classical operator [81]. There
has been considerable controversy around assessing the adequacy of operators for
belief fusion, especially related to Dempster’s rule. The traditional interpretation of
Dempster’s rule is that it fuses separate argument beliefs from independent sources
into a single belief. There are well known examples where Dempster’s rule produces
counter-intuitive and clearly wrong results when interpreted in this way, especially
in case of strong conflict between the input argument beliefs [92], but also in case
of harmony between the input argument beliefs [15].
Motivated by this observation, numerous authors have proposed alternative methods for fusing beliefs [11, 14, 17, 38, 41, 62, 71, 84, 91]. These operators are not
only formally different, they also model very different situations, but the authors
often do not specify the type of situations they model. This confusion can be seen as
the tragedy of belief theory for two reasons. Firstly, instead of advancing belief theory, researchers have been trapped in the search for a solution to the same problem
for 30 years. Secondly, this controversy has given belief theory a bad taste despite
its obvious advantages for representing ignorance and uncertainty.
The fact that different situations require different operators and modeling assumptions has often been ignored in the belief theory literature, and has therefore
been a significant source of confusion for many years [48].
The equivalent operator to Dempster's rule in subjective logic is the constraint
fusion operator described in Section 11.2. We prove in Section 11.2.2 that the constraint fusion operator (and thereby also Dempster's rule) models situations of frequentist stochastic constraints, which through the correspondence between frequentist and subjective probabilities also applies to general constraints.
5.1.2 Comparison with Imprecise Probabilities
The Imprecise Dirichlet Model (IDM) for multinomial data is described by Walley
[88] as a method for determining upper and lower probabilities. The model is based
on setting the minimum and maximum base rates in the Beta or Dirichlet PDF for
each possible value in the domain. The expected probability resulting from assigning the maximum base rate (i.e. equal to one) to the probability of a value in the
domain produces the upper probability, and the expected probability resulting from
assigning a zero base rate to a value in the domain produces the lower probability.
The upper and lower probabilities are interpreted as the upper and lower bounds
for the relative frequency of the outcome. While this is an interesting interpretation
of the Dirichlet PDF, it can not be taken literally, as shown below.
Let r X represent the evidence for the Dirichlet PDF, and let the non-informative
prior weight be denoted by W = 2. According to the Imprecise Dirichlet Model
(IDM) [88] the upper and lower probabilities for a value x ∈ X are defined as:
IDM Upper probability:  E(x) = ( rX (x) + W ) / ( W + ∑_{i=1}^{k} rX (xi) ),  ∀x ∈ X     (5.2)

IDM Lower probability:  E(x) = rX (x) / ( W + ∑_{i=1}^{k} rX (xi) ),  ∀x ∈ X             (5.3)
It can easily be shown that the IDM upper and lower values can not be
literally interpreted as upper and lower bounds for the probability. For example,
assume a bag contains 9 red marbles and 1 black marble, meaning that the relative
frequencies of red and black marbles are p(red) = 0.9 and p(black) = 0.1. The a
priori weight is set to W = 2. Assume further that an observer picks one marble
which turns out to be black. According to Eq.(5.3) the lower probability is then
E(black) = 1/3. It would be incorrect to literally interpret this value as the lower
bound for the probability, because it obviously is greater than the actual relative
frequency of black marbles. In other words, if E(black) > p(black) then E(black) cannot
possibly be the lower bound.
This case shows that the upper and lower probabilities defined by the IDM should
be interpreted as a rough probability interval, because it must allow for the possibility that actual probabilities (relative frequencies) can be outside the range.
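The marble example is easy to reproduce; here is a small Python sketch of Eqs.(5.2) and (5.3) (illustrative only), with one observed black marble and W = 2:

```python
def idm_bounds(r, x, W=2.0):
    """IDM upper and lower probabilities for value x, per Eqs.(5.2) and (5.3).
    r is a dict of evidence counts r_X(x_i)."""
    total = W + sum(r.values())
    upper = (r.get(x, 0.0) + W) / total
    lower = r.get(x, 0.0) / total
    return upper, lower

print(idm_bounds({"black": 1, "red": 0}, "black"))   # -> (1.0, 0.333...)
```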
5.1.3 Comparison with Fuzzy Logic
The domains for variables in fuzzy logic consist of terms/categories that are vague in
nature and that have partially overlapping semantics. For example, in case the variable is ‘Height of a person’ then possible values can be ‘short’, ‘average’ or ‘tall’.
The fuzzy aspect is that for a specific height it can be uncertain whether the person
should be considered short, average or tall. A person measuring 182 cm might be
considered to be somewhat average and somewhat tall. In fuzzy logic this is expressed by fuzzy membership functions, whereby a person could be considered to be
0.5 average and 0.5 tall, depending on the circumstances. Note that in fuzzy logic,
the height of a person can be measured in an exact and crisp way, whereas variable
domains consist of terms/categories that are fuzzy/vague in nature.
In subjective logic on the other hand the domains consist of terms/categories that
are considered crisp in nature, whereas subjective opinions contain belief mass and
uncertainty mass that express uncertainty and vagueness. This difference between
fuzzy logic and subjective logic is illustrated in Figure 5.1.
Fig. 5.1 Difference between fuzzy membership functions and subjective opinions.
Fuzzy logic and subjective logic both handle aspects of uncertainty and
vagueness, but they use quite different principles. A natural idea would be to combine these two reasoning frameworks. It is then a question of how this can be done,
and whether it would produce more flexible and powerful reasoning than either
fuzzy logic or subjective logic can provide in isolation.
Without going deeper into this topic we can simply mention the possibility of
combining fuzzy logic and subjective logic, e.g. by expressing fuzzy membership
functions in terms of opinions, as described in [54]. The advantage of this approach
is that it is possible to express uncertainty about the membership functions. If for example the height of a person is only known with imprecision, then this can naturally
be reflected by expressing the fuzzy membership function as an uncertain subjective
opinion.
5.1.4 Comparison with Kleene’s Three-Valued Logic
In Kleene’s 3-valued logic [24] propositions can be assigned one of 3 truth-values
specified as TRUE, FALSE and UNKNOWN. The two first truth values are interpreted as the traditional TRUE and FALSE in binary logic. The UNKNOWN value
can be thought of as neither TRUE nor FALSE. In Kleene logic it is assumed that
when the truth value of a particular proposition is UNKNOWN, then it might secretly have the value TRUE or FALSE at any moment in time, but the actual truth
value is not available to the analyst.
The logical AND and OR operators in Kleene’s 3-valued logic are specified in
Tables 5.1 (a) and (b) below.
Table 5.1 Truth tables for Kleene's 3-valued AND and OR operators

(a) Truth table for AND
x∧y     y=F   y=U   y=T
x=F      F     F     F
x=U      F     U     U
x=T      F     U     T

(b) Truth table for OR
x∨y     y=F   y=U   y=T
x=F      F     U     T
x=U      U     U     T
x=T      T     T     T
There are obvious problems with Kleene’s logic, as explained below.
According to truth table 5.1 (a), the truth value of the conjunction (x ∧ y) is specified to be UNKNOWN when the truth values of x and y are both UNKNOWN. However, in case of an infinitely large number of variables x, y, . . . , z that are
all UNKNOWN, Kleene's logic would still dictate the truth value of the serial conjunction (x ∧ y ∧ · · · ∧ z) to be UNKNOWN. This result is inconsistent with the intuitive
conclusion that the correct value should be FALSE. A simple example illustrates
why this is so.
Assume the case of flipping a fair coin multiple times, where each flip is a separate
variable. An observer's best guess about whether the first outcome will be heads
might be expressed as "I don't know", which in 3-valued logic would be expressed
as UNKNOWN. But the observer's guess about whether the first n outcomes will
all be heads, when n is arbitrarily large or infinite, should intuitively be expressed as
FALSE, because the likelihood that an infinite series of outcomes will only produce
heads becomes infinitesimally small.
In subjective logic this paradox is easily resolved when multiplying a series of vacuous
opinions. The product of an arbitrarily long series of vacuous binomial opinions
would still be vacuous, but the projected probability would be close to zero. This
result is illustrated with an example below.
Figure 5.2 shows a screenshot of the online demonstrator for subjective logic
operators. The example illustrates the case of multiplying the two vacuous binomial
opinions ωx = (0, 0, 1, 1/2) and ωy = (0, 0, 1, 1/2).
Fig. 5.2 Example multiplication of two vacuous opinions
The method of multiplying two binomial opinions is described in Section 7.1
below, but this trivial example can be directly understood from Figure 5.2.
The product opinion ωx∧y = (0, 0, 1, 1/4) is still vacuous, but the projected product
probability is Px∧y = 1/4. In case the product has n factors that are all the same vacuous opinion, then the product has a projected probability P(x∧y∧...∧z) = (1/2)^n, which
quickly converges towards zero, as would be expected.
At first glance Kleene’s 3-valued logic might seem to represent a special case
of subjective logic. However, as the example above illustrates, applying the truth
tables of Kleene’s logic to practical situations leads to counter-intuitive results. The
corresponding results by subjective logic correspond well with intuition.
5.2 Subjective Logic as a Generalisation of Probabilistic Logic
We define probabilistic logic (PL) as the set of operators defined in Table 1.1 applied
to probabilities. PL operators generalise the traditional binary logic (BL) operators
AND, OR, XOR, MP etc., in the sense that when the probability arguments are 0 or
1 (equivalent to Boolean FALSE or TRUE) the PL operators correctly populate the
traditional truth tables of the corresponding BL operators. It means that PL operators
are homomorphic to the truth tables of BL in case probability arguments are 0 or 1,
and are generalisations in other cases.
Similarly, subjective logic generalises PL operators in the sense that when opinion arguments are dogmatic (equivalent to probabilities) then they produce dogmatic
opinions equivalent to probabilities produced by the corresponding PL operators.
It means that subjective logic operators are homomorphic to PL operators in case
opinion arguments are dogmatic, and are generalisations in other cases.
In case of absolute opinion arguments (equivalent to Boolean TRUE or FALSE),
then SL operators are homomorphic to BL truth tables. The generalisations and
homomorphisms are illustrated in Figure 5.3.
Fig. 5.3 Generalisations and homomorphisms between SL, PL and BL
A homomorphism from an algebra denoted (domain A, set of operators) to an
algebra denoted (domain B, set of operators) exists when their respective sets of operators,
e.g. denoted (+_A, ×_A, . . . ) and (+_B, ×_B, . . . ), satisfy the following properties
under the mapping F from variables x, y, · · · ∈ A to variables F(x), F(y), · · · ∈ B:

Homomorphism:  F(x +_A y) = F(x) +_B F(y)
               F(x ×_A y) = F(x) ×_B F(y)                                   (5.4)

Given a homomorphism, we say e.g. that operator +_A is homomorphic to +_B. For
example, multiplication of binomial opinions (or probabilities) is homomorphic to
binary logic AND.
An isomorphism between an algebra denoted (domain A, operators) and an algebra denoted (domain B, operators) exists when, in addition to Eq.(5.4), the mapping
F is bijective, so that the following holds:

Isomorphism:  F⁻¹( F(x) +_B F(y) ) = x +_A y
              F⁻¹( F(x) ×_B F(y) ) = x ×_A y                                (5.5)

Given an isomorphism, we say e.g. that operators +_A and +_B are isomorphic. For
example, multiplication with integers and multiplication with Roman numerals are
isomorphic, where obviously multiplication with integers is the simplest. In case two
values are represented in Roman numerals and we need to compute their product,
the simplest is to first map the Roman numerals to integers, do the multiplication,
and finally map the product back to a Roman numeral.
Subjective logic isomorphisms, illustrated in Figure 5.4, allow effective usage of
operators from both the Dirichlet and the belief models.
Different expressions that traditionally are equivalent in binary logic do not necessarily have equal opinions. Take for example distributivity of AND over OR:
x ∧ (y ∨ z) ⇔ (x ∧ y) ∨ (x ∧ z).
(5.6)
This equivalence only holds for binary logic, not in subjective logic. The corresponding opinions are in general different, as expressed by Eq.(5.7).
ωx∧(y∨z) ≠ ω(x∧y)∨(x∧z)                                                     (5.7)
This is no surprise, as the corresponding PL operator for multiplication is also
non-distributive on comultiplication as expressed by Eq.(5.8).
p(x) · (p(y) ⊔ p(z)) ≠ (p(x) · p(y)) ⊔ (p(x) · p(z))                        (5.8)
The symbol ⊔ denotes the coproduct of independent probabilities, defined as:

p(x) ⊔ p(y) = p(x) + p(y) − p(x) · p(y).                                    (5.9)
Coproduct of probabilities generalises binary logic OR. This means that Eq.(5.9)
generates the traditional truth table for binary logic OR when input probability arguments are either 0 (FALSE) or 1 (TRUE).
Multiplication is distributive over addition in subjective logic, as expressed by:
ωx∧(y∪z) = ω(x∧y)∪(x∧z) .
(5.10)
De Morgan’s laws are also satisfied in subjective logic as e.g. expressed by:
De Morgan 1: ω¬(x∧y) = ω¬x∨¬y
De Morgan 2: ω¬(x∨y) = ω¬x∧¬y.                                              (5.11)
Note also that Definition 6.3 of complement gives the following equalities:
ω¬(x∧y) = ¬ωx∧y,     ω¬(x∨y) = ¬ωx∨y.                                       (5.12)
Subjective logic provides a rich set of operators where input and output arguments are in the form of subjective opinions. Opinions can be applied to domains
of any cardinality, but some subjective logic operators are only defined for binomial
opinions over binary domains. Opinion operators can be described for the belief notation (i.e. traditional opinion notation), for the evidence notation (i.e. as Dirichlet
PDFs), or for the probabilistic notation (as defined in Section 3.6.1). The belief notation of SL operators normally produces the simplest and most compact expressions,
but it can be practical to use other notations in specific cases.
Subjective logic operators involving multiplication and division produce product opinions with correct projected probability (distribution), but possibly with approximate variance when seen as Beta/Dirichlet PDF. All other operators produce
opinions with projected probability and variance that are analytically correct.
Table 5.2 provides the equivalent values and interpretation in belief notation, evidence notation, and probabilistic notation as well as in binary logic and traditional
probability representation for a selection of binomial opinions.
Table 5.2 Examples in the three equivalent notations of binomial opinion, and their interpretations.

Belief notation (b, d, u, a) | Evidence notation (r, s, a) | Probabilistic notation (P, u, a) | Interpretation as binomial opinion, Beta PDF and probability
(1, 0, 0, a)         | (∞, 0, a)   | (1, 0, a)       | Absolute positive binomial opinion (Boolean TRUE), Dirac delta function, probability p = 1
(0, 1, 0, a)         | (0, ∞, a)   | (0, 0, a)       | Absolute negative binomial opinion (Boolean FALSE), Dirac delta function, probability p = 0
(1/2, 1/2, 0, a)     | (∞, ∞, a)   | (1/2, 0, a)     | Dogmatic binomial opinion, Dirac delta function, probability p = 1/2
(1/4, 1/4, 1/2, 1/2) | (1, 1, 1/2) | (1/2, 1/2, 1/2) | Uncertain binomial opinion, symmetric Beta PDF of 1 positive and 1 negative observation, probability p = 1/2
(0, 0, 1, a)         | (0, 0, a)   | (a, 1, a)       | Vacuous binomial opinion, prior Beta PDF with base rate a, probability p = a
(0, 0, 1, 1/2)       | (0, 0, 1/2) | (1/2, 1, 1/2)   | Vacuous binomial opinion, uniform Beta PDF, probability p = 1/2
It can be seen that some measures correspond to Booleans and probabilities,
whereas other measures correspond to probability density distributions. This richness of expression represents the advantage of subjective logic over other probabilistic logic frameworks. Online visualisations of subjective opinions and density
functions can be accessed at http://folk.uio.no/josang/sl/.
Subjective logic allows highly efficient computation of mathematically complex models. This is possible by approximating the analytical function expressions
whenever needed. While it is relatively simple to analytically multiply two Beta
distributions in the form of a joint distribution, anything more complex than that
quickly becomes intractable. When combining two Beta distributions with some
operator/connective, the analytical result is not always a Beta distribution and can
involve hypergeometric series. In such cases, subjective logic always approximates
the result as an opinion that is equivalent to a Beta distribution.
5.3 Overview of Subjective Logic Operators
Table 5.3 lists the main subjective logic operators.
Table 5.3 Correspondence between SL operators, binary logic / set operators and SL notation.

SL operator (page) | Symbol | BL / set operator | Symbol | SL notation
Addition (p.95) | + | Union | ∪ | ωx∪y = ωx + ωy
Subtraction (p.97) | − | Difference | \ | ωx\y = ωx − ωy
Complement (p.98) | ¬ | NOT (Negation) | x̄ | ωx̄ = ¬ωx
Multiplication (p.102) | · | AND (Conjunction) | ∧ | ωx∧y = ωx · ωy
Comultiplication (p.103) | ⊔ | OR (Disjunction) | ∨ | ωx∨y = ωx ⊔ ωy
Division (p.110) | / | UN-AND (Unconjunction) | ∧̃ | ωx∧̃y = ωx / ωy
Codivision (p.112) | ⊔̃ | UN-OR (Undisjunction) | ∨̃ | ωx∨̃y = ωx ⊔̃ ωy
Multinomial product (p.117) | · | Cartesian product | × | ωX×Y = ωX · ωY
Multinomial division (p.127) | / | Cartesian quotient | / | ωXY/Y = ωXY / ωY
Deduction (p.135) | ⊚ | MP | ‖ | ωY‖X = ωX ⊚ ωY|X
Abduction/Inversion (p.173) | ⊚̃ | MT | ‖̃ | ωX‖̃Y = ωY ⊚̃ ωY|X
Constraint fusion (p.207) | ⊙ | n.a. | & | ωX^(A&B) = ωX^A ⊙ ωX^B
Cumulative fusion (p.216) | ⊕ | n.a. | ⋄ | ωX^(A⋄B) = ωX^A ⊕ ωX^B
Averaging fusion (p.218) | ⊕̲ | n.a. | ⋄̲ | ωX^(A⋄̲B) = ωX^A ⊕̲ ωX^B
CC-fusion (p.224) | CC | n.a. | ♥ | ωX^(A♥B) = ωX^A CC ωX^B
Cumulative unfusion (p.228) | ⊖ | n.a. | ⋄̃ | ωX^(A⋄̃B) = ωX^A ⊖ ωX^B
Averaging unfusion (p.229) | ⊖̲ | n.a. | ⋄̲̃ | ωX^(A⋄̲̃B) = ωX^A ⊖̲ ωX^B
Cumulative fission (p.231) | | n.a. | ▽ | ωX▽C = ωXC
Discounting (p.247) | ⊗ | Trust transitivity | ; | ωX^[A;B] = ωB^A ⊗ ωX^B
Most of the operators in Table 5.3 correspond to well-known operators from
binary logic and probability calculus, while others are specific to subjective logic.
The correspondence between subjective logic operators and traditional operators
means that they are related through homomorphisms. The homomorphisms of Figure 5.3 can be illustrated with concrete examples.
Assume two independent binomial opinions ωx and ωy . Let P(ωx ) denote the
projected probability of ωx which then is the probability of x. Similarly, the expressions P(ωy ) and P(ωx∧y ) represent the probabilities of y and (x ∧ y) respectively. The
homomorphism from SL to PL illustrated in Figure 5.3 means for example that:
In case of dogmatic opinions: P(ωx∧y ) = P(ωx ) · P(ωy )
(5.13)
The homomorphism of Eq.(5.13) is of course also valid in case ωx and ωy are
absolute opinions.
Assume now two absolute binomial opinions ωx and ωy , and let B(ωx ) denote
the Boolean value of x. Similarly, the expressions B(ωy ) and B(ωx∧y ) represent the
Boolean values of y and (x ∧ y) respectively. The homomorphism from SL to BL
illustrated in Figure 5.3 means for example that:
In case of absolute opinions: B(ωx∧y ) = B(ωx ) ∧ B(ωy )
(5.14)
In the special case of absolute binomial opinions and with the homomorphisms
of Eq.(5.14), the distributivity between product and coproduct of opinions holds, in
contrast to the general case of Eq.(5.7). This leads to the equality of Eq.(5.15).
In case of absolute opinions: B(ωx∧(y∨z) ) = B(ω(x∧y)∨(x∧z) )
(5.15)
Recall from Eq.(3.43) the probabilistic notation of binomial opinions:

Probabilistic notation: πx = (Px, ux, ax), where Px is the probability of x, ux the uncertainty, and ax the base rate of x.     (5.16)
Binary logic AND corresponds to multiplication of opinions [46]. For example,
the pair of probabilistic binomial opinions on the elements x ∈ X and y ∈ Y:
πx = (1, 0, ax)   and   πy = (0, 0, ay),   with respective corresponding Booleans TRUE and FALSE.     (5.17)

Their product is πx∧y = πx · πy, expressed with numerical values as

(0, 0, ax ay) = (1, 0, ax) · (0, 0, ay),                                     (5.18)

which corresponds to FALSE = TRUE ∧ FALSE.
It is interesting to note that subjective logic represents a calculus for Dirichlet distributions when opinions are equivalent to Dirichlet distributions. Analytical
manipulation of Dirichlet distributions is complex, but can be done for simple operators such as multiplication, in which case the result is called a joint distribution. However, this analytical method quickly becomes unmanageable when applied to the
more complex operators of Table 5.3 such as conditional deduction and abduction.
Subjective logic therefore has the advantage of providing advanced operators for
Dirichlet distributions for which no practical analytical solutions exist. It should be
noted that the simplicity of some subjective logic operators comes at the cost of
allowing those operators to be approximations of the analytically correct operators.
This is discussed in more detail in Section 7.1.
Subjective opinions can have multiple equivalent representations, as described in
Section 3.6. It naturally follows that each subjective logic operator can be expressed
for the various opinion representations. Since the different representations of opinions are equivalent, the different expressions of the same operator are isomorphic to
each other, as illustrated in Figure 5.4.
Fig. 5.4 Isomorphisms between the SL operators for different opinion representations
The fact that subjective opinions can be expressed based on the belief notation,
the evidence notation or the probabilistic notation also means that subjective logic
operators can be expressed in these different notations. Since the different representations of a subjective opinion are equivalent, the operators are isomorphic, as
illustrated in Figure 5.4. Throughout this book opinions and operators are generally expressed in the belief notation because it gives the simplest and most compact
expressions.
CertainLogic [79] provides alternative expressions for a few operators based on
a special notation, but only for binomial opinions. The operator expressions used
in CertainLogic are significantly more complex than the equivalent operator
expressions based on the belief notation.
Subjective logic is directly connected to traditional reasoning frameworks through
homomorphic correspondence with probabilistic logic and binary logic, and offers
powerful operators that are derived from the isomorphic correspondence between
the respective native algebras of belief opinions and Dirichlet PDFs.
The next chapters describe the operators mentioned in Table 5.3. Online demonstrations of subjective logic operators can be accessed at
http://folk.uio.no/josang/sl/.
Chapter 6
Addition, Subtraction and Complement
6.1 Addition
Addition of opinions in subjective logic is a binary operator that takes opinions
about two mutually exclusive values (i.e. two disjoint subsets of the same domain)
as arguments, and outputs an opinion about the union of the values [65]. Consider
for example the domain X = {x1 , x2 , x3 } illustrated in Figure 6.1, with the assumed
union of x1 and x2 .
Fig. 6.1 Union of values, corresponding to addition of opinions
Assume that the binomial opinions ωx1 and ωx2 apply to x1 and x2 respectively.
The addition of ωx1 and ωx2 then consists of computing the opinion on x1 ∪ x2 as a
function of the two former opinions. The operator for addition first described in [65]
is defined below.
Definition 6.1 (Addition). Assume a domain X where x1 and x2 are two singleton
elements, or alternatively two disjoint subsets, i.e. x1 ∩ x2 = ∅. We also require that
the two elements x1 and x2 together do not represent a complete partition of X,
which means that x1 ∪ x2 ⊂ X.
Assume the binomial opinions ωx1 = (bx1 , dx1 , ux1 , ax1 ) and ωx2 = (bx2 , dx2 , ux2 , ax2 )
that respectively apply to x1 and x2 . The opinion about x1 ∪ x2 as a function of the
opinions about x1 and x2 is defined as:
Opinion sum ω(x1∪x2):

b(x1∪x2) = bx1 + bx2,
d(x1∪x2) = ( ax1 (dx1 − bx2) + ax2 (dx2 − bx1) ) / (ax1 + ax2),
u(x1∪x2) = ( ax1 ux1 + ax2 ux2 ) / (ax1 + ax2),
a(x1∪x2) = ax1 + ax2.                                                        (6.1)

By using the symbol '+' to denote the addition operator for opinions, addition
can be denoted as ω(x1∪x2) = ωx1 + ωx2. ⊔⊓
It can be verified that the addition operator preserves the addition of projected
probabilities, as expressed by Eq.(6.2).
Addition of projected probabilities: P(x1 ∪x2 ) = Px1 + Px2 .
(6.2)
Figure 6.2 shows a screenshot of the online demonstrator for subjective logic
operators. The example illustrates addition of the two binomial opinions ωx1 =
(0.20, 0.40, 0.40, 0.25) and ωx2 = (0.10, 0.50, 0.40, 0.50).
Fig. 6.2 Example addition of two binomial opinions
The sum is simply ω(x1 ∪x2 ) = (0.30, 0.30, 0.40, 0.75), and it can be verified that
P(x1 ∪x2 ) = 0.30 + 0.30 = 0.60.
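A minimal Python sketch of Definition 6.1 (illustrative only, not the official demonstrator) reproduces this example:

```python
def add_opinions(op1, op2):
    """Opinion addition per Eq.(6.1); each opinion is a tuple (b, d, u, a)."""
    b1, d1, u1, a1 = op1
    b2, d2, u2, a2 = op2
    b = b1 + b2
    d = (a1 * (d1 - b2) + a2 * (d2 - b1)) / (a1 + a2)
    u = (a1 * u1 + a2 * u2) / (a1 + a2)
    a = a1 + a2
    return b, d, u, a

result = add_opinions((0.20, 0.40, 0.40, 0.25), (0.10, 0.50, 0.40, 0.50))
print(tuple(round(v, 2) for v in result))
# -> matches the sum (0.30, 0.30, 0.40, 0.75), projected probability 0.30 + 0.75*0.40 = 0.60
```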
Opinion addition generates vague belief mass b(x1∪x2) from the specific belief
masses bx1 and bx2. Opinion addition might therefore not be as useful as one might
intuitively assume. Also, opinion addition does not apply to the case when X =
(x1 ∪ x2), because the resulting belief mass would be totally vague, so the opinion
should be considered vacuous.
The cumulative fusion operator is related to the addition operator, but these two
operators have different interpretations and purposes, so they should not be confused. The cumulative fusion operator is based on addition of evidence in the evidence space, whereas the addition operator is based on addition of belief mass in the
belief space. Cumulative fusion does not necessarily lead to increased vagueness. In
case of multinomial opinions cumulative fusion does not produce vagueness at all.
The cumulative fusion operator is described in Chapter 11.
6.2 Subtraction
The inverse operation to opinion addition is opinion subtraction [65]. Since addition
of opinions yields the opinion about x1 ∪ x2 from the opinions about disjoint subsets
of the domain, then the difference between the opinions about x1 ∪ x2 and x2 (i.e.
the opinion about (x1 ∪ x2 )\x2 ) can only be defined if x2 ⊆ (x1 ∪ x2 ) where x2 and
(x1 ∪ x2 ) are subsets of the domain X, i.e. the system must be in the state (x1 ∪ x2 )
whenever it is in the state x2 . The operator for subtraction first described in [65] is
defined below.
Definition 6.2 (Subtraction). Let (x1 ∪ x2 ) and x2 be subsets of the same domain
X where (x1 ∪ x2 ) ∩ x2 = x2 . The opinion about (x1 ∪ x2 )\x2 as a function of the
opinions about (x1 ∪ x2 ) and x2 is expressed below.
ω((x1∪x2)\x2):

b((x1∪x2)\x2) = b(x1∪x2) − bx2,
d((x1∪x2)\x2) = ( a(x1∪x2) (d(x1∪x2) + bx2) − ax2 (1 + bx2 − b(x1∪x2) − ux2) ) / (a(x1∪x2) − ax2),
u((x1∪x2)\x2) = ( a(x1∪x2) u(x1∪x2) − ax2 ux2 ) / (a(x1∪x2) − ax2),
a((x1∪x2)\x2) = a(x1∪x2) − ax2.                                              (6.3)
Since both u((x1 ∪x2 )\x2 ) and d((x1 ∪x2 )\x2 ) should be non-negative the following constraints apply.

u((x1∪x2)\x2) ≥ 0  ⇒  ax2 ux2 ≤ a(x1∪x2) u(x1∪x2),
d((x1∪x2)\x2) ≥ 0  ⇒  a(x1∪x2) (d(x1∪x2) + bx2) ≥ ax2 (1 + bx2 − b(x1∪x2) − ux2).     (6.4)
By using the symbol ‘−’ to denote the subtraction operator for opinions, subtraction can be denoted as ω((x1 ∪x2 )\x2 ) = ω(x1 ∪x2 ) − ωx2 .
⊔
⊓
Given the structure of the example domain X in Figure 6.1, it is obvious that
ω((x1 ∪x2 )\x2 ) = ωx1 .
The subtraction operator produces reduced vagueness, and removes vagueness
completely if ((x1 ∪ x2 )\x2 ) is a singleton. Subtraction of opinions is consistent
with subtraction of probabilities, as expressed by Eq.(6.5).
Subtraction of projected probabilities: Px1 = P(x1 ∪x2 ) − Px2 .
(6.5)
Figure 6.3 shows a screenshot of the online demonstrator for subjective logic
operators. The example illustrates subtraction of the binomial opinion ω(x1∪x2) =
(0.70, 0.10, 0.20, 0.75) by the binomial opinion ωx2 = (0.50, 0.30, 0.20, 0.25).
Fig. 6.3 Example subtraction between two binomial opinions
The difference is simply ωx1 = (0.20, 0.60, 0.20, 0.50), and it can be verified that
P((x1 ∪x2 )\x2 ) = Px1 = 0.85 − 0.55 = 0.30.
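The corresponding sketch for Definition 6.2 (again illustrative only) confirms the difference opinion:

```python
def subtract_opinions(op12, op2):
    """Opinion subtraction per Eq.(6.3); op12 is the opinion on (x1 ∪ x2)."""
    b12, d12, u12, a12 = op12
    b2, d2, u2, a2 = op2
    b = b12 - b2
    d = (a12 * (d12 + b2) - a2 * (1 + b2 - b12 - u2)) / (a12 - a2)
    u = (a12 * u12 - a2 * u2) / (a12 - a2)
    a = a12 - a2
    return b, d, u, a

result = subtract_opinions((0.70, 0.10, 0.20, 0.75), (0.50, 0.30, 0.20, 0.25))
print(tuple(round(v, 2) for v in result))
# -> matches the difference (0.20, 0.60, 0.20, 0.50), projected probability 0.85 - 0.55 = 0.30
```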
6.3 Complement
A binomial opinion focuses on a single element x in a binary domain X = {x, x̄}.
The complement of this opinion is simply the opinion on the complement
element x̄. This is illustrated in Figure 6.4.
Definition 6.3 (Complement). Assume the binary domain X = {x, x̄}, where ωx =
(bx, dx, ux, ax) is a binomial opinion on x. Its complement is the binomial opinion ωx̄
expressed as:

Complement opinion ωx̄:
bx̄ = dx
dx̄ = bx
ux̄ = ux
ax̄ = 1 − ax                                                                 (6.6)
Fig. 6.4 Complement of binomial opinion
The complement operator, denoted '¬', is a unary operator. Applying the complement operator to a binomial opinion is written:

¬ωx = ωx̄                                                                   (6.7)
⊔⊓
The complement operator corresponds to binary logic NOT, and to complement
of probabilities. For projected probabilities it can be verified that:
P(¬ωx ) = 1 − P(ωx ).
(6.8)
Figure 6.5 shows a screenshot of the online demonstrator for binomial subjective logic operators. The example shown in the figure illustrates complement of the
binomial opinion ωx = (0.50, 0.10, 0.40, 0.25).
Fig. 6.5 Example complement of binomial opinion
The complement opinion is simply ωx̄ = (0.10, 0.50, 0.40, 0.75), and it can be
verified that Px̄ = 1 − Px = 1.00 − 0.60 = 0.40.
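Definition 6.3 is trivially small; for completeness, a one-function sketch (illustrative only):

```python
def complement(op):
    """Complement of a binomial opinion per Eq.(6.6): (b, d, u, a) -> (d, b, u, 1 - a)."""
    b, d, u, a = op
    return d, b, u, 1 - a

print(complement((0.50, 0.10, 0.40, 0.25)))   # -> (0.10, 0.50, 0.40, 0.75)
```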
Chapter 7
Binomial Multiplication and Division
This chapter describes the subjective logic operators that correspond to binary logic
AND and OR, as well as their inverse operators which in binary logic could be called
UN-AND and UN-OR. We will here describe multiplication and comultiplication
[46]1 . Special limit cases are described in [46].
7.1 Binomial Multiplication and Comultiplication
Binomial multiplication and comultiplication in subjective logic take binomial opinions about two elements from distinct binary domains as input arguments and produce a binomial opinion as result. The product and coproduct result opinions relate
to subsets of the Cartesian product of the two binary domains. The Cartesian product of the two binary domains X = {x, x̄} and Y = {y, ȳ} produces the quaternary
set X×Y = {(x y), (x ȳ), (x̄ y), (x̄ ȳ)}, which is illustrated in Figure 7.1 below.
Fig. 7.1 Cartesian product of two binary domains
1 Note that the definitions of multiplication and comultiplication defined here are different from those defined in [37], which should not be used.
It is possible to compute joint Beta PDFs and Dirichlet PDFs, although closed
expressions for the general case might be intractable. When assuming that such joint
PDFs exist, one would expect multiplication to be equivalent. However, in general,
products of opinions in subjective logic represent approximations of the analytically
correct products of Beta PDFs and Dirichlet PDFs. In this regard, multiplication of
binomial opinions produces the best approximation of joint Beta PDFs.
The same can be said for coproducts, quotients and co-quotients. There is hardly
any work on computing these results for Beta PDFs in the literature, so subjective
logic currently offers the most practical operators for computing coproducts, quotients and co-quotients of Beta PDFs.
7.1.1 Binomial Multiplication
Let ωx and ωy be opinions about x and y respectively held by the same observer. Then the product opinion ωx∧y is the observer’s opinion about the conjunction x ∧ y = {(x y)} that is represented by the area inside the dotted line
in Figure 7.1. The coproduct opinion ωx∨y is the opinion about the disjunction
x ∨ y = {(x y), (x ȳ), (x̄ y)} that is represented by the area inside the dashed line
in Figure 7.1. Obviously X×Y is not binary, and coarsening is required in order to
determine the product and coproduct opinions as binomial opinions.
Definition 7.1 (Binomial Multiplication). Let X = {x, x̄} and Y = {y, ȳ} be two separate domains, and let ωx = (bx, dx, ux, ax) and ωy = (by, dy, uy, ay) be independent binomial opinions on x and y respectively. Given opinions about independent propositions x and y, the binomial opinion ωx∧y on the conjunction (x ∧ y) is:
Product ωx∧y:
    bx∧y = bx by + ((1−ax) ay bx uy + ax (1−ay) ux by) / (1 − ax ay),
    dx∧y = dx + dy − dx dy,
    ux∧y = ux uy + ((1−ay) bx uy + (1−ax) ux by) / (1 − ax ay),
    ax∧y = ax ay.                                                    (7.1)
By using the symbol '·' to denote this operator, multiplication of opinions can be written as ωx∧y = ωx · ωy.    □
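A direct transcription of Eq.(7.1) into code is useful for checking examples such as the one in Figure 7.2 below. The sketch is illustrative only; opinions are plain (b, d, u, a) tuples and the function name is an arbitrary choice of this presentation.

```python
# Minimal sketch of binomial multiplication (Definition 7.1, Eq. 7.1).
def multiply(wx, wy):
    bx, dx, ux, ax = wx
    by, dy, uy, ay = wy
    b = bx * by + ((1 - ax) * ay * bx * uy + ax * (1 - ay) * ux * by) / (1 - ax * ay)
    d = dx + dy - dx * dy
    u = ux * uy + ((1 - ay) * bx * uy + (1 - ax) * ux * by) / (1 - ax * ay)
    return (b, d, u, ax * ay)

print(multiply((0.75, 0.15, 0.10, 0.50), (0.10, 0.00, 0.90, 0.20)))
# approximately (0.15, 0.15, 0.70, 0.10), as in Figure 7.2
```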
Figure 7.2 shows a screenshot of the online demonstrator for subjective logic
operators. The example illustrates multiplication of the two binomial opinions specified as ωx = (0.75, 0.15, 0.10, 0.50) and ωy = (0.10, 0.00, 0.90, 0.20).
The product is ω(x∧y) = (0.15, 0.15, 0.70, 0.10), and it can be verified that
Eq.(7.2) holds for the product projected probability.
Fig. 7.2 Example multiplication of two binomial opinions (demonstrator screenshot): ωx = (0.75, 0.15, 0.10, 0.50) with probability 0.80, ωy = (0.10, 0.00, 0.90, 0.20) with probability 0.28, and the product opinion about x AND y, (0.15, 0.15, 0.70, 0.10) with probability 0.22.
    P(x∧y) = Px · Py = 0.80 · 0.28 = 0.22.                           (7.2)
Notice that ωx has relatively low uncertainty whereas ωy has relatively high uncertainty. An interesting property of the multiplication operator, which can be seen
in Figure 7.2, is that the product opinion has an uncertainty mass on a level between
the uncertainty masses of the factor opinions.
7.1.2 Binomial Comultiplication
Comultiplication of binomial opinions is defined next.
Definition 7.2 (Binomial Comultiplication). Let X = {x, x̄} and Y = {y, ȳ} be two separate domains, and let ωx = (bx, dx, ux, ax) and ωy = (by, dy, uy, ay) be independent binomial opinions on x and y respectively. The binomial opinion ωx∨y on the disjunction x ∨ y is:
Coproduct ωx∨y:
    bx∨y = bx + by − bx by,
    dx∨y = dx dy + (ax (1−ay) dx uy + (1−ax) ay ux dy) / (ax + ay − ax ay),
    ux∨y = ux uy + (ay dx uy + ax ux dy) / (ax + ay − ax ay),
    ax∨y = ax + ay − ax ay.                                          (7.3)
By using the symbol '⊔' to denote this operator, comultiplication of opinions can be written as ωx∨y = ωx ⊔ ωy.    □
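The corresponding sketch for Eq.(7.3), using the same illustrative tuple convention as the multiplication sketch above:

```python
# Minimal sketch of binomial comultiplication (Definition 7.2, Eq. 7.3).
def comultiply(wx, wy):
    bx, dx, ux, ax = wx
    by, dy, uy, ay = wy
    s = ax + ay - ax * ay
    b = bx + by - bx * by
    d = dx * dy + (ax * (1 - ay) * dx * uy + (1 - ax) * ay * ux * dy) / s
    u = ux * uy + (ay * dx * uy + ax * ux * dy) / s
    return (b, d, u, s)

print(comultiply((0.75, 0.15, 0.10, 0.50), (0.35, 0.00, 0.65, 0.20)))
# approximately (0.84, 0.06, 0.10, 0.60), as in Figure 7.3
```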
Figure 7.3 shows a screenshot of the online demonstrator for subjective logic
operators. The example illustrates comultiplication of the two binomial opinions
ωx = (0.75, 0.15, 0.10, 0.50) and ωy = (0.35, 0.00, 0.65, 0.20).
Fig. 7.3 Example comultiplication of two binomial opinions (demonstrator screenshot): ωx = (0.75, 0.15, 0.10, 0.50) with probability 0.80, ωy = (0.35, 0.00, 0.65, 0.20) with probability 0.48, and the coproduct opinion about x OR y, (0.84, 0.06, 0.10, 0.60) with probability 0.90.
The coproduct is ω(x∨y) = (0.84, 0.06, 0.10, 0.60), and it can be verified that
Eq.(7.4) holds for the coproduct projected probability.
    P(x∨y) = Px ⊔ Py = Px + Py − (Px · Py)
           = 0.80 + 0.48 − (0.80 · 0.48) = 0.90.                     (7.4)
Notice that ωx has relatively low uncertainty whereas ωy has relatively high uncertainty. Similarly to the case of multiplication above, it can be seen in Figure 7.3
that the coproduct opinion has an uncertainty mass on a level between the uncertainty masses of the factor opinions.
7.1.3 Approximations of Product and Coproduct
The expressions for product in Definition 7.1, and for coproduct in Definition 7.2,
might appear ad hoc. However, there is a clear rationale behind their design.
The rationale is that the product and coproduct beliefs and disbeliefs must be at least as large as the raw products of belief and disbelief from the factor opinions; anything else would be irrational. If the product disbelief is dx∧y = dx + dy − dx dy, and the coproduct belief is bx∨y = bx + by − bx by, then this requirement is satisfied. The
product and coproduct beliefs and disbeliefs could of course be larger than that, but
the larger they are, the smaller the uncertainty. The operators are so designed that
the maximum uncertainty is preserved in the product and coproduct, which occurs
when the disbelief of the product, and the belief of the coproduct are exactly as they
are defined.
The operators for multiplication and comultiplication are thus conservative, in
the sense that they preserve the maximum uncertainty possible. The variance of
the product and coproduct can easily be computed through Eq.(3.10), where the
variance is a function of the uncertainty, as expressed in Eq.(7.5).
Product variance:    Varx∧y = Px∧y (1 − Px∧y) ux∧y / (W + ux∧y)
Coproduct variance:  Varx∨y = Px∨y (1 − Px∨y) ux∨y / (W + ux∨y)      (7.5)
where W denotes the non-informative prior weight, which is normally set to W = 2.
The non-informative prior weight is discussed in Section 3.3.2.
From Eq.(7.5) it can be seen that when the uncertainty is zero, the variance is also zero, which corresponds to a dogmatic opinion. The case where Px∧y = 1/2 and
ux∧y = 1 is a vacuous opinion which corresponds to the uniform Beta PDF, with
variance 1/12.
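As a small numerical check of Eq.(7.5), the following sketch (my own, assuming the usual prior weight W = 2) reproduces these two boundary cases:

```python
# Variance of a binomial opinion from its projected probability and uncertainty (Eq. 7.5).
def opinion_variance(P, u, W=2.0):
    return P * (1 - P) * u / (W + u)

print(opinion_variance(0.5, 1.0))   # 1/12, the variance of the uniform Beta PDF (vacuous opinion)
print(opinion_variance(0.22, 0.0))  # 0.0 for a dogmatic opinion
```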
The question is how well the level of uncertainty corresponds with the analytically ‘correct’ level of uncertainty, in the sense that the variance of the product and
coproduct follows the analytically correct variance of the product and coproduct as
closely as possible.
Multiplication and comultiplication represent a self-dual system represented by
b ↔ d, u ↔ u, a ↔ 1 − a, and ∧ ↔ ∨, that is, for example, the expressions for bx∧y
and dx∨y are dual to each other, and one determines the other by the correspondence,
and similarly for the other expressions. This is equivalent to the observation that the
opinions satisfy de Morgan’s Laws, i.e. ωx∧y = ωx∨y and ωx∨y = ωx∧y . However it
should be noted that multiplication and comultiplication are not distributive over
each other, i.e. for example that:
    ωx∧(y∨z) ≠ ω(x∧y)∨(x∧z)                                          (7.6)
This is to be expected because if x, y and z are independent, then x ∧ y and x ∧ z
are not generally independent in probability calculus so that distributivity does not
hold. In fact distributivity of conjunction over disjunction and vice versa only holds
in binary logic.
Multiplication and comultiplication produce very good approximations of the analytically correct products and coproducts when the arguments are Beta probability
density functions [46]. The difference between the subjective logic product and the
analytically correct product of Beta density functions is best illustrated with the example of multiplying two equal vacuous binomial opinions ω = (0, 0, 1, 1/2), which are equivalent to the uniform Beta probability density function Beta(1, 1).
Theorem 7.1. Let X = {x, x̄} and Y = {y, ȳ} be two binary domains, and let X ∈ X and Y ∈ Y be independent binary random variables with identical uniform probability density functions, which for example can be described as Beta(p(x) | 1, 1) and Beta(p(y) | 1, 1). Then the probability density function PDF(p(Z = (x∧y))) for the product random variable Z = X·Y is given by:

    PDF(p(Z = (x∧y))) = − ln p(Z = (x∧y)),   for 0 < p(Z) < 1.       (7.7)
The proof is given in [46]. This result applies to the case of the independent propositions x and y, where the joint variable Z takes values from the Cartesian product domain Z = {(x∧y), (x∧ȳ), (x̄∧y), (x̄∧ȳ)}. Specifically, this means that when the probabilities of x and y have uniform distributions, then the probability of the conjunction x∧y has the probability density function PDF(p(Z = (x∧y))) with projected probability P(Z = (x∧y)) = 1/4.
This can be contrasted with the a priori non-informative probability density function Dir(pQ | (1/2, 1/2, 1/2, 1/2)) over the quaternary domain Q = {q1, q2, q3, q4}. The corresponding a priori probability density function for the probability of q1 is Beta(p(q1) | (1/2, 3/2)), which can be directly derived from Dir(pQ | (1/2, 1/2, 1/2, 1/2)). Interestingly we get equal projected probabilities: P(q1) = P(Z = (x∧y)) = 1/4.
The difference between Beta(p(q1) | (1/2, 3/2)) and PDF(p(Z = (x∧y))) = − ln p(Z = (x∧y)) is illustrated in Figure 7.4 below.
Fig. 7.4 Comparison between Beta(p(q1) | (1/2, 3/2)) and PDF(p(Z = (x∧y))) = − ln p(Z = (x∧y)), plotted as densities f over p ∈ (0, 1).
The analytically correct product of two uniform distributions is represented by PDF(p(Z = (x∧y))) = − ln p, whereas the product produced by the multiplication operator is Beta(p(q1) | (1/2, 3/2)), which illustrates that multiplication and comultiplication in subjective logic produce approximate results. More specifically, it can
be shown that the projected probability is always exact, and that the variance is approximate. The quality of the variance approximation is analysed in [46], and is
very good in general. The discrepancies grow with the amount of uncertainty in the
arguments, so Figure 7.4 illustrates the worst case.
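Theorem 7.1 can also be checked by simulation. The following sketch (my own check, using only Python's standard library) samples the product of two independent uniformly distributed probabilities and compares the empirical density with −ln p in a few bins.

```python
# Monte Carlo check of Theorem 7.1: the density of z = p(x)*p(y), with
# p(x) and p(y) independent and uniform on [0, 1], is -ln(z).
import math, random

random.seed(1)
N = 200_000
samples = [random.random() * random.random() for _ in range(N)]
for lo, hi in [(0.1, 0.2), (0.4, 0.5), (0.8, 0.9)]:
    empirical = sum(lo < z <= hi for z in samples) / (N * (hi - lo))
    print(f"({lo}, {hi}]: empirical {empirical:.2f} vs -ln(midpoint) {-math.log((lo + hi) / 2):.2f}")
```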
The advantage of the multiplication and comultiplication operators of subjective logic is their simplicity, which means that analytical expressions that normally
are complex, and sometimes intractable, can be analysed efficiently. The analytical
result of products and coproducts of Beta distributions will in general involve the
Gauss hypergeometric function [76]. The analysis of anything but the most basic
models based on such functions would quickly become unmanageable.
7.2 Reliability Analysis
The modern use of the word reliability originates from the U.S. military in the 1940s, where it meant that a product would operate as expected for a specified period of time. Reliability analysis of systems is now a mature discipline based on well-established techniques. This section describes how subjective logic can be applied to system reliability analysis.
7.2.1 Simple Reliability Networks
For the purpose of reliability analysis, a system can be considered as consisting of
components that are represented as edges in a graph. This network of components
is not intended to reflect the physical architecture of a system, but only to represent
the reliability dependencies of the system. The connection between components in
this way represents a semantic relationship which is similar to trust relationships
described in Chapter 13.
A serial connection of two components reflects the property that both components must function correctly for the whole system to function correctly. In binary
logic this dependency relationship is computed with the AND connective. In probabilistic logic it is computed with the product operator. In subjective logic it is computed with binomial multiplication according to Definition 7.1.
A parallel connection of two components reflects the property that at least one of
the components must function correctly for the whole system to function correctly.
In binary logic this dependency relationship is computed with the OR connective.
In probabilistic logic it is computed with the coproduct operator. In subjective logic
it is computed with binomial comultiplication according to Definition 7.2.
Figure 7.5.a illustrates a system S which consists of the components w, x, y and
z. From a reliability point of view assume that these components can be considered
connected as a reliability network consisting of serial and parallel connections, as
illustrated in Figure 7.5.b.
Fig. 7.5 System components with series-parallel dependencies: (a) system components w, x, y and z; (b) reliability dependence relationships, with x and y in parallel, in series with w and z.
Figure 7.5.b expresses that the correct function of system S requires that both
w and z must function correctly, and in addition that either x or y must function
correctly.
This reliability network can be formally expressed as:
S = w ∧ (x ∨ y) ∧ z.
(7.8)
The advantage of using subjective logic for reliability analysis is that component
reliability can be expressed with degrees of uncertainty. To compute the reliability of
system S, the reliability of each component can be expressed as the binomial opinions ωw , ωx , ωy and ωz . By applying binomial multiplication and comultiplication
the reliability of system S can be computed as:
    ωS = ω(w∧(x∨y)∧z) = ωw · (ωx ⊔ ωy) · ωz                          (7.9)
As an example, Table 7.1 specifies opinions for the reliabilities of components
w, x, y and z, as well as the resulting reliability opinion of system S.
Table 7.1 Example reliability analysis of system S

                                    Component reliabilities              System reliability
Opinion parameter            ωw       ωx       ωy       ωz              ω(w∧(x∨y)∧z)
Belief mass            b     0.90     0.50     0.00     0.00            0.43
Disbelief mass         d     0.10     0.00     0.00     0.00            0.10
Uncertainty mass       u     0.00     0.50     1.00     1.00            0.47
Base rate              a     0.90     0.80     0.80     0.90            0.78
Projected probability  P     0.90     0.90     0.80     0.90            0.80

It can be verified that the following holds, as expected.
    P(w∧(x∨y)∧z) = Pw · (Px ⊔ Py) · Pz
                 = Pw · Pz · (Px + Py − Px · Py)
                 = 0.90 · 0.90 · (0.90 + 0.80 − 0.90 · 0.80)
                 ≈ 0.80.                                             (7.10)
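The whole computation of Eq.(7.9) can be reproduced in a few lines, assuming the illustrative multiply() and comultiply() sketches from Sections 7.1.1 and 7.1.2 are in scope (they are sketches of this presentation, not library functions):

```python
# Reliability of S = w AND (x OR y) AND z (Eq. 7.9), with the component
# opinions of Table 7.1; multiply() and comultiply() are the earlier sketches.
w_w = (0.90, 0.10, 0.00, 0.90)
w_x = (0.50, 0.00, 0.50, 0.80)
w_y = (0.00, 0.00, 1.00, 0.80)
w_z = (0.00, 0.00, 1.00, 0.90)

w_S = multiply(multiply(w_w, comultiply(w_x, w_y)), w_z)
print(tuple(round(v, 2) for v in w_S))
# roughly (0.42, 0.10, 0.48, 0.78), with projected probability about 0.79-0.80; compare Table 7.1
```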
Thanks to the bijective mapping between belief opinions and evidence opinions in the form of Beta PDFs, as defined by Eq.(3.11), it is possible to conduct reliability analysis where the reliability of individual components is expressed in terms of Beta PDFs.
7.2.2 Reliability Analysis of Complex Systems
System reliability networks can be more complex than that illustrated in the previous
section. Consider for example the 5-component system S shown in Figure 7.6.a, and
assume that its reliability can be modelled in the form of the reliability network in
Figure 7.6.b.
Fig. 7.6 System components with complex dependencies: (a) system components v, w, x, y and z; (b) reliability dependence relationships that cannot be reduced to a pure series-parallel structure.
As Figure 7.6.b illustrates, this reliability network cannot be broken down into a group of series and parallel edges. This complicates the problem of determining the network's reliability. If the system could be broken down into series/parallel configurations, it would be a relatively simple matter to determine the mathematical or analytical formula that describes the network's reliability. A good description of possible approaches to analysing complex reliability systems is presented in [70] (p.161). Some of the methods for analytically obtaining the reliability of a complex system are:
• Decomposition method. The decomposition method applies the law of total
probability. It involves choosing a key edge and then calculating the reliability
of the network twice: once as if the key edge failed, and once as if the key edge
110
7 Binomial Multiplication and Division
succeeded. These two probabilities are then combined to obtain the reliability
of the system, since at any given time the key edge will fail or operate.
• Event space method. The event space method applies the mutually exclusive
events axiom. All mutually exclusive events are determined, and those which
result in network success are considered. The reliability of the network is simply
the probability of the union of all mutually exclusive events that yield a network
success. Similarly, the unreliability is the probability of the union of all mutually
exclusive events that yield a network failure.
• Path-tracing method. This method considers every path from a starting point to
the ending point. Since network success involves having at least one path available from one end of the Reliability Block Diagram (RBD) to the other, as long
as at least one path from the beginning to the end of the path is available, the network has not failed. One could consider the RBD to be a plumbing schematic.
If an edge in the network fails, the water can no longer flow through it. As long
as there is at least one path for the water to flow from the start to the end of the
network, the network is successful. This method involves identifying all of the
paths the water could take and calculating the reliability of the path based on
the edges that lie along that path. The reliability of the network is simply the
probability of the union of these paths. In order to maintain consistency of the analysis, starting and ending points for the network must be defined, which in the case of a trust network are trivial to define.
We refer to [70] for a more detailed description of the above mentioned methods.
Reliability networks of this kind can also be analysed with subjective logic.
7.3 Binomial Division and Codivision
Division and codivision naturally represent the inverse operations of multiplication
and comultiplication. These operations are well defined for probabilities. The corresponding operations for binary logic can be defined as UN-AND and UN-OR
respectively.
7.3.1 Binomial Division
The inverse operation to binomial multiplication is binomial division. The quotient
of opinions about propositions x and y represents the opinion about a proposition z
which is independent of y such that ωx = ωy∧z . This requires that:

    ax < ay,
    dx ≥ dy,
    bx ≥ ax (1−ay)(1−dx) by / ((1−ax) ay (1−dy)),
    ux ≥ (1−ay)(1−dx) uy / ((1−ax)(1−dy)).                           (7.11)
Definition 7.3 (Binomial Division). Let X = {x, x̄} and Y = {y, ȳ} be domains, and let ωx = (bx, dx, ux, ax) and ωy = (by, dy, uy, ay) be binomial opinions on x and y satisfying Eq.(7.11). The division of ωx by ωy produces the quotient opinion ωx∧̃y = (bx∧̃y, dx∧̃y, ux∧̃y, ax∧̃y) defined by:
Quotient ωx∧̃y:
    bx∧̃y = ay (bx + ax ux) / ((ay − ax)(by + ay uy)) − ax (1 − dx) / ((ay − ax)(1 − dy)),
    dx∧̃y = (dx − dy) / (1 − dy),
    ux∧̃y = ay (1 − dx) / ((ay − ax)(1 − dy)) − ay (bx + ax ux) / ((ay − ax)(by + ay uy)),
    ax∧̃y = ax / ay.                                                  (7.12)
By using the symbol '/' to denote this operator, division of opinions can be written as ωx∧̃y = ωx / ωy.    □
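Division can be transcribed in the same illustrative style as the earlier sketches; the caller is assumed to have verified the conditions of Eq.(7.11) first.

```python
# Minimal sketch of binomial division (Definition 7.3, Eq. 7.12).
def divide(wx, wy):
    bx, dx, ux, ax = wx
    by, dy, uy, ay = wy
    b = ay * (bx + ax * ux) / ((ay - ax) * (by + ay * uy)) \
        - ax * (1 - dx) / ((ay - ax) * (1 - dy))
    d = (dx - dy) / (1 - dy)
    u = ay * (1 - dx) / ((ay - ax) * (1 - dy)) \
        - ay * (bx + ax * ux) / ((ay - ax) * (by + ay * uy))
    return (b, d, u, ax / ay)

print(divide((0.10, 0.80, 0.10, 0.20), (0.40, 0.00, 0.60, 0.50)))
# approximately (0.15, 0.80, 0.05, 0.40), as in Figure 7.7
```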
Figure 7.7 shows a screenshot of the online demonstrator for subjective logic operators. The example of Figure 7.7 illustrates division of binomial opinion ω(x∧y) =
(0.10, 0.80, 0.10, 0.20) by the binomial opinion ωy = (0.40, 0.00, 0.60, 0.50).
Fig. 7.7 Example division of a binomial opinion by another binomial opinion (demonstrator screenshot): the opinion about x AND y, (0.10, 0.80, 0.10, 0.20) with probability 0.12, divided by the opinion about y, (0.40, 0.00, 0.60, 0.50) with probability 0.70, giving the opinion about x, (0.15, 0.80, 0.05, 0.40) with probability 0.17.
The quotient is ωx = ω((x∧y)∧̃y) = (0.15, 0.80, 0.05, 0.40), and it can be verified that Eq.(7.13) holds for the quotient projected probability.

    Px = P((x∧y)∧̃y) = P(x∧y) / Py = 0.12 / 0.70 ≈ 0.17.              (7.13)
Although probability division is a traditional operation used in probabilistic models and analysis, the corresponding binary logic operator UN-AND is rarely used,
and was only introduced in 2004 [46].
7.3.2 Binomial Codivision
The inverse operation to comultiplication is codivision. The co-quotient of opinions
about propositions x and y represents the opinion about a proposition z which is
independent of y such that ωx = ωy∨z . This requires that

    ax > ay,
    bx ≥ by,
    dx ≥ (1−ax) ay (1−bx) dy / (ax (1−ay)(1−by)),
    ux ≥ ay (1−bx) uy / (ax (1−by)).                                 (7.14)
Definition 7.4 (Binomial Codivision). Let X = {x, x̄} and Y = {y, ȳ} be domains, and let ωx = (bx, dx, ux, ax) and ωy = (by, dy, uy, ay) be binomial opinions on x and y satisfying Eq.(7.14). The codivision of opinion ωx by opinion ωy produces the co-quotient opinion ωx∨̃y = (bx∨̃y, dx∨̃y, ux∨̃y, ax∨̃y) defined by:
Co-quotient ωx∨̃y:
    bx∨̃y = (bx − by) / (1 − by),
    dx∨̃y = (1−ay)(dx + (1−ax) ux) / ((ax − ay)(dy + (1−ay) uy)) − (1−ax)(1−bx) / ((ax − ay)(1−by)),
    ux∨̃y = (1−ay)(1−bx) / ((ax − ay)(1−by)) − (1−ay)(dx + (1−ax) ux) / ((ax − ay)(dy + (1−ay) uy)),
    ax∨̃y = (ax − ay) / (1 − ay).                                     (7.15)
By using the symbol '⊔̃' to denote this operator, codivision of opinions can be written as ωx∨̃y = ωx ⊔̃ ωy.    □
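A matching sketch for Eq.(7.15), again with the illustrative (b, d, u, a) tuple convention; the conditions of Eq.(7.14) are assumed to hold for the arguments.

```python
# Minimal sketch of binomial codivision (Definition 7.4, Eq. 7.15).
def codivide(wx, wy):
    bx, dx, ux, ax = wx
    by, dy, uy, ay = wy
    b = (bx - by) / (1 - by)
    d = (1 - ay) * (dx + (1 - ax) * ux) / ((ax - ay) * (dy + (1 - ay) * uy)) \
        - (1 - ax) * (1 - bx) / ((ax - ay) * (1 - by))
    u = (1 - ay) * (1 - bx) / ((ax - ay) * (1 - by)) \
        - (1 - ay) * (dx + (1 - ax) * ux) / ((ax - ay) * (dy + (1 - ay) * uy))
    return (b, d, u, (ax - ay) / (1 - ay))

print(codivide((0.05, 0.55, 0.40, 0.75), (0.00, 0.80, 0.20, 0.50)))
# approximately (0.05, 0.49, 0.46, 0.50), as in Figure 7.8
```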
Figure 7.8 shows a screenshot of the online demonstrator for subjective logic
operators. The example of Figure 7.8 illustrates codivision of binomial opinion
ω(x∨y) = (0.05, 0.55, 0.40, 0.75) by binomial opinion ωy = (0.00, 0.80, 0.20, 0.50).
Fig. 7.8 Example codivision of a binomial opinion by another binomial opinion (demonstrator screenshot): the opinion about x OR y, (0.05, 0.55, 0.40, 0.75) with probability 0.35, codivided by the opinion about y, (0.00, 0.80, 0.20, 0.50) with probability 0.10, giving the opinion about x, (0.05, 0.49, 0.46, 0.50) with probability 0.28.
The co-quotient is ωx = (0.05, 0.49, 0.46, 0.50), and it can be verified that
Eq.(7.16) holds for the co-quotient projected probability.
    Px = P((x∨y)∨̃y) = P(x∨y) ⊔̃ Py = (P(x∨y) − Py) / (1 − Py)
       = (0.35 − 0.10) / (1 − 0.10) ≈ 0.28.                          (7.16)
Although probability codivision is a traditional operation used in probabilistic
models and analysis, the corresponding binary logic operator UN-OR is rarely used,
and was only introduced in 2004 [46].
7.4 Correspondence with Probabilistic Logic
Multiplication, comultiplication, division and codivision of dogmatic opinions are
equivalent to the corresponding probability calculus operators in Table 7.2, where
e.g. p(x) denotes the probability of variable value x.
Table 7.2 Probability calculus operators corresponding to opinion operators.

Operator name       Result type     Probability calculus operator
Multiplication      Product         p(x ∧ y) = p(x) p(y)
Comultiplication    Coproduct       p(x ∨ y) = p(x) + p(y) − p(x) p(y)
Division            Quotient        p(x ∧̃ y) = p(x) / p(y)
Codivision          Co-quotient     p(x ∨̃ y) = (p(x) − p(y)) / (1 − p(y))
In the case of absolute opinions, i.e. when the argument opinions have either
b = 1 (absolute belief) or d = 1 (absolute disbelief), then these opinions can be interpreted as Boolean TRUE or FALSE, so the multiplication and comultiplication
operators are homomorphic with the binary logic operators AND and OR, as illustrated in Figure 5.3.
Chapter 8
Multinomial Multiplication and Division
Multinomial (and hypernomial) multiplication is different from binomial multiplication in that the product opinion on the whole product domain is considered, instead
of just on one element of the product domain. Figure 8.1 below illustrates the general situation with two domains X and Y that form the Cartesian product X×Y. The
product of two opinions ωX and ωY produces belief masses on singleton elements
of X×Y as well as on the row and column subsets of X×Y
Fig. 8.1 Cartesian product of two domains X = {x1, ..., xk} and Y = {y1, ..., yl}, producing the kl singleton pairs (xi yj) together with the row and column subsets of X×Y.
In order to produce an opinion with only belief mass on each singleton element
of X×Y as well as uncertainty mass on X×Y, some of the belief mass on the
row and column subsets of X×Y must be redistributed to the singleton elements in
such a way that the projected probability of each singleton element in X×Y equals
the product of projected probabilities of pairs of singleton values from X and Y
respectively.
Evaluating the multinomial product of two separate multinomial opinions involves the Cartesian product of the respective domains to which the opinions apply.
Let ωX and ωY be two independent multinomial opinions that apply to X and Y:
    X = {x1, x2, ..., xk} with cardinality k,
    Y = {y1, y2, ..., yl} with cardinality l.                        (8.1)
The Cartesian product X×Y with cardinality kl is expressed as the matrix:

    X×Y = [ (x1 y1), (x1 y2), ··· (x1 yl);
            (x2 y1), (x2 y2), ··· (x2 yl);
              ···
            (xk y1), (xk y2), ··· (xk yl) ]                          (8.2)
Consider the random variable XY which takes its values from the Cartesian product X×Y. We then turn to the multinomial product of multinomial opinions. The raw
terms produced by ωX · ωY can be separated into four groups.
1. The first group of terms consists of raw product belief masses on singletons of X×Y:

    b^Singletons_XY = [ bX(x1)bY(y1), bX(x1)bY(y2), ... bX(x1)bY(yl);
                        bX(x2)bY(y1), bX(x2)bY(y2), ... bX(x2)bY(yl);
                          ···
                        bX(xk)bY(y1), bX(xk)bY(y2), ... bX(xk)bY(yl) ]        (8.3)
2. The second group of terms consists of belief masses on rows of X×Y:

    b^Rows_XY = ( bX(x1)uY, bX(x2)uY, ..., bX(xk)uY )                (8.4)

3. The third group consists of belief masses on columns of X×Y:

    b^Columns_XY = ( uX bY(y1), uX bY(y2), ..., uX bY(yl) )          (8.5)
4. The last term is simply the uncertainty mass on the whole product domain:

    u^Domain_XY = uX uY                                              (8.6)
The challenge is how to interpret the various types of product belief masses. In
case of hypernomial products the product belief masses are directly interpreted as
part of the hypernomial belief mass distribution, as explained in Section 8.4.
In case of multinomial products some of the belief mass on the row and column
subsets of X×Y must be redistributed to the singleton elements in such a way that
the projected probability of each singleton element equals the product of projected
probabilities of pairs of singleton values from X and Y respectively. There are (at
least) 3 approaches of multinomial opinion products that produce consistent projected probability products, namely normal multiplication described in Section 8.1,
proportional multiplication described in Section 8.2, and projected multiplication
described in Section 8.3.
Whatever method is used, the projected probability distribution PXY of the product is always the same:
PXY (x y) = PX (x)PY (y)
(8.7)
The product variance depends on the uncertainty, which in general is different
for each method. Based on the Dirichlet PDF of the computed products, the product
variance can easily be computed through Eq.(3.18), as expressed in Eq.(8.8).
Multinomial product variance:  VarXY(x y) = PXY(x y)(1 − PXY(x y)) uXY / (W + uXY)        (8.8)
where W denotes the non-informative prior weight, which must be set to W = 2. The non-informative prior weight is discussed in Section 3.3.2.
It can for example be seen that when the uncertainty is zero, the variance is also
zero, which is the case for dogmatic multinomial opinions. In the general case, the
product variance of Eq.(8.8) is an approximation of the analytically correct variance,
which typically is complex, and for which there is no closed expression.
8.1 Normal Multiplication
The singleton terms of Eq.(8.3) and the uncertainty mass on the whole domain of
Eq.(8.6) are unproblematic because they conform with the multinomial opinion representation of having belief mass only on singletons and on the whole domain. In
contrast, the set of terms on rows of Eq.(8.4) and columns of Eq.(8.5) apply to overlapping subsets which is not compatible with the required format for multinomial
opinions, and therefore needs to be reassigned. Some belief mass from those terms
can be reassigned to belief mass on singletons, and some to uncertainty mass on the
whole domain.
8.1.1 Determining Uncertainty Mass
Consider the belief mass from Eq.(8.4) and Eq.(8.5) as potential uncertainty masses,
expressed as:
Potential uncertainty mass from rows:     u^Rows_XY = Σ_{x∈X} b^Rows_XY(x)
Potential uncertainty mass from columns:  u^Columns_XY = Σ_{y∈Y} b^Columns_XY(y)        (8.9)
The sum of the uncertainty masses from Eq.(8.6) and Eq.(8.9) represents the maximum possible uncertainty mass u^Max_XY expressed as:

    u^Max_XY = u^Rows_XY + u^Columns_XY + u^Domain_XY                (8.10)
(8.10)
The minimum possible uncertainty mass uMin
XY is simply:
Domain
uMin
XY = uXY
(8.11)
The projected probability of each singleton in the product domain can easily be
computed as the product of the projected probabilities of each pair of value of X and
Y according to Eq.(8.12).
PX (x)PY (y) = (bbX (x) + a X (x)uX )(bbY (y) + aY (y)uY )
(8.12)
We also require that the projected probability distribution over the product variable can be computed as a function of the product opinion according to Eq.(8.13).
    PXY(x y) = bXY(x y) + aX(x) aY(y) uXY                            (8.13)
Obviously the quantities of Eq.(8.12) and Eq.(8.13) are equal, so we can write:

    PX(x) PY(y) = PXY(x y)
    ⇔  (bX(x) + aX(x) uX)(bY(y) + aY(y) uY) = bXY(x y) + aX(x) aY(y) uXY
    ⇔  uXY = ((bX(x) + aX(x) uX)(bY(y) + aY(y) uY) − bXY(x y)) / (aX(x) aY(y))        (8.14)
The task now is to determine uXY and the belief mass distribution bXY of the multinomial product opinion ωXY. There is at least one product value (xi yj) for which the following equation can be satisfied:

    bXY(xi yj) = b^Singletons_XY(xi yj)                              (8.15)
(8.15)
Based on Eq.(8.14) and Eq.(8.15) there is thus at least one product value (xi y j )
for which the following equation can be satisfied:
(i, j)
uXY =
Singletons
(bbX (xi )+aaX (xi )uX )(bbY (y j )+aaY (y j )uY )−bbXY
(xi y j )
a X (xi )aaY (y j )
(8.16)
Singletons
=
(i, j)
PX (xi )PY (yi )−bbXY
a X (xi )aaY (y j )
Max
where uMin
XY ≤ uXY ≤ uXY .
(xi y j )
In order to determine the uncertainty mass for the product opinion, each product value (xi yj) ∈ X×Y must be visited in turn to find the smallest uncertainty mass u^(i,j)_XY that satisfies Eq.(8.16).
The product uncertainty can now be determined as the smallest u^(i,j)_XY from Eq.(8.16), expressed as:

    uXY = min over (xi yj) ∈ X×Y of u^(i,j)_XY                       (8.17)
8.1.2 Determining Belief Mass
Having determined the uncertainty mass uXY according to Eq.(8.17), the expression
for the product projected probability of Eq.(8.12) can be used to compute the belief
mass on each element in the product domain, as expressed by Eq.(8.18).
    bXY(x y) = PX(x) PY(y) − aX(x) aY(y) uXY
             = (bX(x) + aX(x) uX)(bY(y) + aY(y) uY) − aX(x) aY(y) uXY        (8.18)
It can be shown that the additivity property of Eq.(8.19) is preserved.
    uXY + Σ_{(x y)∈X×Y} bXY(x y) = 1                                 (8.19)
From Eq.(8.18) it follows directly that the product operator is commutative. It
can also be shown that the product operator is associative.
8.1.3 Product Base Rates
The computation of product base rates is straightforward according to Eq.(8.20) below.

    aXY = [ aX(x1)aY(y1), aX(x1)aY(y2), ... aX(x1)aY(yl);
            aX(x2)aY(y1), aX(x2)aY(y2), ... aX(x2)aY(yl);
              ···
            aX(xk)aY(y1), aX(xk)aY(y2), ... aX(xk)aY(yl) ]           (8.20)
8.1.4 Assembling the Multinomial Product Opinion
Having computed the belief mass distribution bXY, the uncertainty mass uXY and the base rate distribution aXY according to the above stepwise procedure, the multinomial product opinion is complete, as expressed by:

    ωXY = (bXY, uXY, aXY).                                           (8.21)
Although not directly obvious, the normal multinomial product opinion method
described here is a generalisation of the binomial product described in Section 7.1.1.
Because of the relative simplicity of the binomial product it can be described as a
closed expression. For the normal multinomial product however, the stepwise procedure described here is needed.
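The stepwise procedure of Sections 8.1.1-8.1.4 can nevertheless be written quite compactly. The sketch below is illustrative only: it represents a multinomial opinion as a dict of belief masses, an uncertainty mass and base rates, and the usage line anticipates the egg example of Section 8.6.

```python
# Sketch of the normal multinomial product (Eqs. 8.16-8.18 and 8.20).
def normal_product(wX, wY):
    bX, uX, aX = wX["b"], wX["u"], wX["a"]
    bY, uY, aY = wY["b"], wY["u"], wY["a"]
    PX = {x: bX[x] + aX[x] * uX for x in bX}          # projected probabilities
    PY = {y: bY[y] + aY[y] * uY for y in bY}
    # Smallest candidate uncertainty over all singletons, Eqs. (8.16)-(8.17).
    u = min((PX[x] * PY[y] - bX[x] * bY[y]) / (aX[x] * aY[y]) for x in bX for y in bY)
    b = {(x, y): PX[x] * PY[y] - aX[x] * aY[y] * u for x in bX for y in bY}   # Eq. (8.18)
    a = {(x, y): aX[x] * aY[y] for x in bX for y in bY}                       # Eq. (8.20)
    return {"b": b, "u": u, "a": a}

gender = {"b": {"M": 0.60, "F": 0.30}, "u": 0.10, "a": {"M": 0.5, "F": 0.5}}
mutation = {"b": {"S": 0.70, "T": 0.20}, "u": 0.10, "a": {"S": 0.5, "T": 0.5}}
print(round(normal_product(gender, mutation)["u"], 3))   # 0.11, cf. Table 8.3
```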
8.1.5 Justification for Normal Multinomial Multiplication
The method to determine the product uncertainty in Eq.(8.17) might appear ad hoc.
However, there is a clear rationale behind this method.
The rationale is that the product belief masses must be at least as large as the
raw product belief masses of Eq.(8.3), anything else would be irrational, so we define this as a requirement. Remember that the larger the product uncertainty, the
smaller the product belief masses, so if the product uncertainty is too large, then
the requirement might not be satisfied for all product belief masses. The largest uncertainty which satisfies the requirement is defined as the product uncertainty. The
method for determining the product uncertainty in Section 8.1.1 follows exactly this
principle
The operator for normal multinomial multiplication is thus conservative, in the
sense that it preserves the maximum uncertainty possible.
8.2 Proportional Multiplication
Given the product projected probability distribution PXY computed with Eq.(8.7), the question is how to compute an appropriate uncertainty level. The proportional method uses uX and uY together with the maximum theoretical uncertainty levels ûX and ûY, and defines the uncertainty level uXY based on the assumption that uXY is the proportional average of uX and uY, as expressed by Eq.(8.22). This produces the uncertainty uXY.
    uXY / ûXY = (uX + uY) / (ûX + ûY)                                (8.22)

    ⇔  uXY = ûXY (uX + uY) / (ûX + ûY)                               (8.23)
The computation of uncertainty-maximised opinions ω̂X with uncertainty ûX is described in Section 3.4.6. The convention for marginal cases of division by zero is that the whole fraction is equal to zero, as e.g. expressed by:

    IF (uX + uY = 0) ∧ (ûX + ûY = 0) THEN (uX + uY)/(ûX + ûY) = 0.   (8.24)
Eq.(8.24) is sound in all cases, because we always have (uX + uY) ≤ (ûX + ûY). Of course, the uncertainty sum (uX + uY) is strictly limited by the maximum possible uncertainty sum (ûX + ûY). This property ensures that uXY ∈ [0, 1] in Eq.(8.23).
Having computed the uncertainty level uXY of Eq.(8.23), the belief masses are computed according to:

    bXY(x y) = PXY(x y) − aXY(x y) uXY,  for each (x y) ∈ X×Y.       (8.25)
This completes the computation of the proportional product opinion ωXY , expressed as:
ωXY = ωX · ωY
(8.26)
Proportional multiplication as described here produces slightly less uncertainty
than normal multiplication described in Section 8.1, and can therefore be considered
as slightly more aggressive. The precise nature of their similarity and difference
remains to be analysed.
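A corresponding sketch of the proportional method is given below. It assumes, based on Section 3.4.6, that the maximum theoretical uncertainty of an opinion with projected probabilities P and base rates a is û = min(1, min_x P(x)/a(x)); that helper is my reading of the uncertainty-maximisation described there, not a quotation of it.

```python
# Sketch of proportional multiplication (Eqs. 8.22-8.25).
def projected(w):
    return {x: w["b"][x] + w["a"][x] * w["u"] for x in w["b"]}

def u_hat(P, a):
    # Maximum theoretical uncertainty (uncertainty-maximisation, Section 3.4.6).
    return min(1.0, min(P[x] / a[x] for x in P))

def proportional_product(wX, wY):
    PX, PY = projected(wX), projected(wY)
    PXY = {(x, y): PX[x] * PY[y] for x in PX for y in PY}               # Eq. (8.7)
    aXY = {(x, y): wX["a"][x] * wY["a"][y] for x in PX for y in PY}     # Eq. (8.20)
    u = u_hat(PXY, aXY) * (wX["u"] + wY["u"]) / (u_hat(PX, wX["a"]) + u_hat(PY, wY["a"]))  # Eq. (8.23)
    b = {xy: PXY[xy] - aXY[xy] * u for xy in PXY}                       # Eq. (8.25)
    return {"b": b, "u": u, "a": aXY}

gender = {"b": {"M": 0.60, "F": 0.30}, "u": 0.10, "a": {"M": 0.5, "F": 0.5}}
mutation = {"b": {"S": 0.70, "T": 0.20}, "u": 0.10, "a": {"S": 0.5, "T": 0.5}}
print(round(proportional_product(gender, mutation)["u"], 3))  # about 0.058, cf. Table 8.3
```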
8.3 Projected Multiplication
A projected multinomial product ω′XY can be computed by first computing a hypernomial opinion with belief mass distribution consisting of b^Singletons_XY from Eq.(8.3), b^Rows_XY from Eq.(8.4) and b^Columns_XY from Eq.(8.5). The uncertainty is u^Domain_XY from Eq.(8.6), and the base rate distribution is aXY from Eq.(8.20).
Then this hypernomial opinion is projected to a multinomial opinion according to Eq.(3.31) on p.37. The result is the multinomial product opinion ω′XY.
In general the projected multinomial product opinion ω′XY has less uncertainty than the normal product opinion ωXY described in Section 8.1 above, although both have the same projected probability distribution.
In case one or both of the factor opinions ωX and ωY contain significant uncertainty, it is desirable to let this be reflected in the product opinion. The normal
multinomial product ωXY described above in Section 8.1 is therefore the preferred
method to be used.
8.4 Hypernomial Product
Evaluating the hypernomial product of two separate multinomial or hypernomial (or
even binomial) opinions involves the Cartesian product of the respective domains to
which the factor opinions apply. Assume the two domains X of cardinality k and Y
of cardinality l, as well as their hyperdomains R(X) of cardinality κ = (2^k − 2) and R(Y) of cardinality λ = (2^l − 2).
The Cartesian product X×Y with cardinality kl is expressed as the matrix:

    X×Y = [ (x1 y1), (x1 y2), ··· (x1 yl);
            (x2 y1), (x2 y2), ··· (x2 yl);
              ···
            (xk y1), (xk y2), ··· (xk yl) ]                          (8.27)
The hyperdomain of X×Y is denoted R(X×Y). Let ωX and ωY be two independent hypernomial opinions that apply to the separate domains. The task is to
compute the hypernomial product opinion ωXY . Table 8.1 summarises characteristics of the opinions and domains involved in a hypernomial product.
Table 8.1 Hypernomial product elements

               Dom.   Cardi.   Hyperdom.   Hypercardi.    Var.   Val.   Bel. mass dist.   # bel. masses
Factor ωX      X      k        R(X)        κ              X      x      bX                κ
Factor ωY      Y      l        R(Y)        λ              Y      y      bY                λ
Product ωXY    X×Y    kl       R(X×Y)      (2^kl − 2)     XY     xy     bXY               (κλ + κ + λ)
The expression (κλ + κ + λ ) represents the number of belief masses of the hypernomial product. This number emerges as follows: The opinion factor ωX ’s belief mass distribution b X can have κ belief masses, and the opinion factor ωY ’s
belief mass distribution bY can have λ belief masses, so their product produces
κλ belief masses. In addition, the product between b X and uY produces κ belief
masses, and the product between bY and uX produces λ belief masses. Note that
(κλ + κ + λ) ∝ 2^(k+l).
The expression (2^kl − 2) represents the number of hypervalues in R(X×Y). Note that (2^kl − 2) ∝ 2^kl.
Because 2^kl ≫ 2^(k+l) with growing k and l, the number (2^kl − 2) of possible values in R(X×Y) is in general far greater than the number (κλ + κ + λ) of belief masses of the hypernomial product. A hypernomial opinion product is thus highly constrained with regard to the set of hypervalues that can receive belief mass.
We now turn to the computation of the hypernomial opinion product. The terms
produced by ωX · ωY can be separated into four groups.
1. The first group of terms consists of belief masses on hypervalues of R(X×Y):

    b^HValues_XY = [ bX(x1)bY(y1), bX(x1)bY(y2), ... bX(x1)bY(yλ);
                     bX(x2)bY(y1), bX(x2)bY(y2), ... bX(x2)bY(yλ);
                       ···
                     bX(xκ)bY(y1), bX(xκ)bY(y2), ... bX(xκ)bY(yλ) ]        (8.28)
2. The second group of terms consists of belief masses on hyperrows of R(X×Y):

    b^HRows_XY = ( bX(x1)uY, bX(x2)uY, ..., bX(xκ)uY )               (8.29)
3. The third group consists of belief masses on hypercolumns of R(X×Y):

    b^HColumns_XY = ( uX bY(y1), uX bY(y2), ..., uX bY(yλ) )         (8.30)
4. The last term is simply the belief mass on the whole product domain:

    u^HDomain_XY = uX uY                                             (8.31)
The set of κλ belief masses of b^HValues_XY, the κ belief masses of b^HRows_XY and the λ belief masses of b^HColumns_XY together form the belief mass distribution b^Hyper_XY of ω^Hyper_XY:

    b^Hyper_XY = (b^HValues_XY, b^HRows_XY, b^HColumns_XY)           (8.32)
(8.32)
The uncertainty mass is simply uHDomain
. Finally the base rate distribution a XY
XY
is the same as that of multinomial products in Eq.(8.20). The hypernomial product
opinion is then defined as:
Hyper
Hyper
ωXY
= (bbXY , uHDomain
, aXY ).
XY
(8.33)
If needed, the hypernomial product opinion ω^Hyper_XY can be projected to a multinomial opinion according to Eq.(3.31) on p.37. The result is then a multinomial product opinion ω′XY which has the same projected probability distribution as that of ω^Hyper_XY.
8.5 Product of Dirichlet Probability Density Functions
Multinomial opinion multiplication can be leveraged to compute products of Dirichlet PDFs (Probability Density Functions) described in Section 3.4.2.
Assume domains X and Y. The variables X and Y take their values from X and Y
respectively. In the Dirichlet model the analyst observes occurrences of values x ∈ X
and values y ∈ Y, and represents these observations as evidence vectors r X and r Y ,
so that e.g. r X (x) represents the number of observed occurrences of the value x. In
addition the analyst must specify the base rate distributions a X over X as well as the
base rate distribution aY over Y. These parameters define the Dirichlet PDFs on X
and Y .
Let e.g. Dir^e_X and Dir^e_Y denote the evidence Dirichlet PDFs on variables X and Y respectively, according to Eq.(3.16). Their product can be denoted as:

    Dir^e_XY = Dir^e_X · Dir^e_Y                                     (8.34)

The procedure for computing Dir^e_XY according to Eq.(8.34) is described next, and is also illustrated in Figure 8.2.
1. Specify the evidence parameters (rX, aX) and (rY, aY) of the factor Dirichlet PDFs.
2. Derive opinions ωX and ωY from the Dirichlet PDFs according to the mapping of Eq.(3.23).
3. Compute ωXY = ωX · ωY as described in Section 8.1 in case of a multinomial product.
4. Derive the product Dirichlet PDF Dir^e_XY from the multinomial product opinion ωXY according to the mapping of Eq.(3.23).
Figure 8.2 depicts the procedure just described.
Fig. 8.2 Procedure for computing the product of Dirichlet PDFs: Dir^e_X and Dir^e_Y are mapped to opinions ωX and ωY, which are multiplied to give ωXY, which is in turn mapped back to Dir^e_XY.
In general the product of two Dirichlet PDFs can be computed as a Dirichlet PDF.
If needed a Dirichlet HPDF can be projected onto a Dirichlet PDF. This is done by
first mapping the Dirichlet HPDF to a hyper-opinion according to Eq.(3.36). Then
the hyper-opinion is projected onto a multinomial opinion according to Eq.(3.31).
Finally the multinomial opinion can be used in the multiplication operation.
The above described method for computing a product Dirichlet PDF from two
Dirichlet PDFs is very simple and requires very little computation. Although not
equivalent this result is related to Dirichlet convolution.
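A sketch of the four-step procedure, assuming the standard evidence-opinion mapping b(x) = r(x)/(W + Σr), u = W/(W + Σr) with W = 2 (Eq.(3.23) gives the authoritative form) and the normal_product() sketch from Section 8.1:

```python
# Sketch of the Dirichlet-PDF product procedure of Section 8.5.
W = 2.0

def dirichlet_to_opinion(r, a):
    total = sum(r.values())
    return {"b": {x: r[x] / (W + total) for x in r}, "u": W / (W + total), "a": a}

def opinion_to_dirichlet(w):
    return {x: W * w["b"][x] / w["u"] for x in w["b"]}   # undefined for dogmatic opinions (u = 0)

wX = dirichlet_to_opinion({"M": 12, "F": 6}, {"M": 0.5, "F": 0.5})   # Table 8.2
wY = dirichlet_to_opinion({"S": 14, "T": 4}, {"S": 0.5, "T": 0.5})
wXY = normal_product(wX, wY)                                          # step 3
print({k: round(v, 2) for k, v in opinion_to_dirichlet(wXY).items()})
# roughly r(M S)=8.36, r(M T)=2.45, r(F S)=4.27, r(F T)=1.09, cf. Table 8.3
```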
8.6 Example Multinomial Product Computation
We consider the scenario where a GE (Genetic Engineering) process can produce
Male (M) or Female (F) fertilised eggs. In addition, each fertilised egg can have
genetic mutation S or T independently of its gender. This constitutes two binary
domains representing gender: Gen = {M, F} and mutation Mut = {S, T}, or alternatively the quaternary product domain Gen×Mut = {(M S), (M T), (F S), (F T)}.
Sensor A observes whether each egg has gender M or F, and Sensor B observes
whether the egg has mutation S or T.
Sensors A and B thus observe different and orthogonal aspects, so that opinions derived from their observations can be combined with multiplication. This is illustrated in Figure 8.3.
Fig. 8.3 Multiplication of opinions on orthogonal aspects of GE eggs: Sensor A's opinion about gender and Sensor B's opinion about mutation are multiplied into a product opinion about each egg.
The result of opinion multiplication in this case can be considered as an opinion based on observations from a single sensor that detects both aspects simultaneously.
Assume that 20 eggs have been produced and that the two aspects have been observed for each egg. Table 8.2 summarises the observations and the resulting opinions.
Table 8.2 Observations of egg gender and mutation

              Observations    Opinions        Base Rates       Probabilities
Gender        r(M) = 12       b(M) = 0.60     a(M) = 0.50      P(M) = 0.65
opinion:      r(F) = 6        b(F) = 0.30     a(F) = 0.50      P(F) = 0.35
                              uGen = 0.10

Mutation      r(S) = 14       b(S) = 0.70     a(S) = 0.50      P(S) = 0.75
opinion:      r(T) = 4        b(T) = 0.20     a(T) = 0.50      P(T) = 0.25
                              uMut = 0.10
The Cartesian product domain and the projected probabilities are expressed as:
    Gen×Mut = [ (M S), (M T);        P(Gen×Mut) = [ 0.49, 0.16;
                (F S), (F T) ],                     0.26, 0.09 ].    (8.35)
The next section describes and compares the normal multinomial product and
the projected multinomial product of this example. Then Section 8.6.2 describes the
hypernomial product of this example.
8.6.1 Multinomial Product Computation
Below we present the result of multinomial multiplication according to the normal product method described in Section 8.1 as well as the projected product method described in Section 8.3.
Table 8.3 shows the results of applying the method of normal multinomial product of Section 8.1, as well as the methods of proportional and projected multinomial product. Also shown are the Dirichlet PDF parameters that are obtained with the mapping of Eq.(3.23). Table 8.3 therefore accounts for both the multinomial opinion products and the Dirichlet PDF products.
Table 8.3 Multinomial products of egg gender and mutation

                       Opinions             Base Rates       Probabilities     Eq. Observations
Normal product:        b(M S) = 0.460       a(M S) = 1/4     P(M S) = 0.49     r(M S) = 8.36
                       b(M T) = 0.135       a(M T) = 1/4     P(M T) = 0.16     r(M T) = 2.45
                       b(F S) = 0.235       a(F S) = 1/4     P(F S) = 0.26     r(F S) = 4.27
                       b(F T) = 0.060       a(F T) = 1/4     P(F T) = 0.09     r(F T) = 1.09
                       uGen×Mut = 0.110                                        Σr = 16.18

Proportional product:  b(M S) = 0.473       a(M S) = 1/4     P(M S) = 0.49     r(M S) = 16.21
                       b(M T) = 0.148       a(M T) = 1/4     P(M T) = 0.16     r(M T) = 5.07
                       b(F S) = 0.248       a(F S) = 1/4     P(F S) = 0.26     r(F S) = 8.50
                       b(F T) = 0.073       a(F T) = 1/4     P(F T) = 0.09     r(F T) = 2.50
                       uGen×Mut = 0.058                                        Σr = 32.28

Projected product:     b(M S) = 0.485       a(M S) = 1/4     P(M S) = 0.49     r(M S) = 97
                       b(M T) = 0.160       a(M T) = 1/4     P(M T) = 0.16     r(M T) = 32
                       b(F S) = 0.260       a(F S) = 1/4     P(F S) = 0.26     r(F S) = 52
                       b(F T) = 0.085       a(F T) = 1/4     P(F T) = 0.09     r(F T) = 17
                       uGen×Mut = 0.010                                        Σr = 198
The normal product preserves the most uncertainty, the proportional product preserves about 50% less uncertainty, and the projected product hardly preserves any uncertainty at all. The normal product thereby represents the most conservative approach, and should normally be used for multinomial product computation in general situations. The analytically correct product of Dirichlet PDFs is computationally complex, and the methods described here represent approximations. Simulations of the binomial product in Section 7.1.3 show that the normal product is a very good approximation of the analytically correct product.
8.6.2 Hypernomial Product Computation
Computation of hypernomial products does not require any synthesis of uncertainty
mass or projection and is therefore much simpler than the computation of multinomial products. We continue the example of observing egg gender and mutation.
Based on the observation parameters of Table 8.2 the hypernomial product can
be computed according to the method described in Section 8.4.
In case of hypernomial products it is necessary to consider the product hyperdomain R(Gen×Mut) (including the product domain (Gen Mut)) of Eq.(8.36).

    R(Gen×Mut) = [ (M S),   (M T),   (M Mut);
                   (F S),   (F T),   (F Mut);
                   (Gen S), (Gen T), (Gen×Mut) ]                     (8.36)
Table 8.4 shows the hypernomial product ωGen×Mut including the hypernomial product Dir^eH_Gen×Mut.
Table 8.4 Hypernomial product of egg gender and mutation

                        Opinions              Base Rates        Probabilities
Hypernomial product:    b(M S) = 0.42         a(M S) = 0.25     P(M S) = 0.49
                        b(M T) = 0.12         a(M T) = 0.25     P(M T) = 0.16
                        b(F S) = 0.21         a(F S) = 0.25     P(F S) = 0.26
                        b(F T) = 0.06         a(F T) = 0.25     P(F T) = 0.09
                        b(M Mut) = 0.06
                        b(F Mut) = 0.03
                        b(Gen S) = 0.07
                        b(Gen T) = 0.02
                        uGen×Mut = 0.01
8.7 Multinomial Division
Multinomial division is the inverse operation of multinomial multiplication described in Section 8.1. Similarly to how multinomial multiplication applies to the
Cartesian product of two domains, multinomial division applies to the Cartesian
quotient of a product domain by one of its factor domains.
Consider the Cartesian product domain X×Y, and the factor domain Y. The Cartesian quotient resulting from dividing the product domain X×Y by the factor domain Y produces the quotient domain X, as illustrated in Figure 8.4.
Fig. 8.4 Cartesian quotient of a Cartesian product domain divided by one of its factors: X×Y divided by Y produces X.
Assume a multinomial opinion ωXY = (bXY, uXY, aXY) on the product domain X×Y with the following belief mass distribution and base rate distribution.

    bXY = [ bXY(x1 y1), bXY(x1 y2), ..., bXY(x1 yl);
            bXY(x2 y1), bXY(x2 y2), ..., bXY(x2 yl);
              ···
            bXY(xk y1), bXY(xk y2), ..., bXY(xk yl) ]                (8.37)

    aXY = [ aXY(x1 y1), aXY(x1 y2), ..., aXY(x1 yl);
            aXY(x2 y1), aXY(x2 y2), ..., aXY(x2 yl);
              ···
            aXY(xk y1), aXY(xk y2), ..., aXY(xk yl) ]                (8.38)
Assume also the multinomial opinion ωY = (bY, uY, aY) on domain Y. We want to determine ωX by multinomial division according to:

    ωX = ωXY / ωY                                                    (8.39)
It can already be mentioned that there is no general solution to this seemingly
simple equation. A general solution would have to satisfy the unrealistic requirement of Eq.(8.40).
Unrealistic:  PX(xi) = PXY(xi y1)/PY(y1) = PXY(xi y2)/PY(y2) = ... = PXY(xi yl)/PY(yl),  for all xi ∈ X.        (8.40)

However, there is only one single product opinion ωXY for which Eq.(8.40) holds, making it impossible to satisfy that requirement in general. Instead, the realistic situation is expressed by Eq.(8.41).

Realistic:  PX(xi) ≠ PXY(xi y1)/PY(y1) ≠ PXY(xi y2)/PY(y2) ≠ ... ≠ PXY(xi yl)/PY(yl),  for all xi ∈ X.        (8.41)

For the base rate distributions aX, aY and aXY, we assume that the requirement does hold, as expressed by Eq.(8.42).

Assumed:  aX(xi) = aXY(xi y1)/aY(y1) = aXY(xi y2)/aY(y2) = ... = aXY(xi yl)/aY(yl),  for all xi ∈ X.        (8.42)

Because there is no general analytical solution to multinomial division, we will instead describe two partial solutions below. The method for the first possible solution is averaging proportional division, and the method for the second possible solution is selective division.
8.7.1 Averaging Proportional Division
The method for averaging proportional division, denoted ‘/’, is synthetic by nature,
in the sense that it produces a solution where there is no real analytical solution.
The produced solution can not be described as an approximation, because there is
not even a correct solution to approximate.
The main principle of averaging proportional division is simply to compute the
average of the different projected probabilities from Eq.(8.41). However, potential
problems with zero divisors and non-additivity must be addressed, by defining one
limit rule and one condition.
Limit rule:   IF (PXY(xi yj) = PY(yj) = 0) for some (i, j),
              THEN PXY(xi yj)/PY(yj) = 0.                            (8.43)

Condition:    IF (PXY(xi yj) > 0) ∧ (PY(yj) = 0) for some (i, j),
              THEN no division is possible.                          (8.44)
Thus, by applying the limit rule of Eq.(8.43) and respecting the condition of
Eq.(8.44), a preliminary quotient projected probability distribution can be computed
according to:
    P^Pre_X(xi) = (1/l) Σ_{j=1..l} PXY(xi yj) / PY(yj)               (8.45)
Note the cardinality l = |Y| used for producing the preliminary quotient probability distribution. It is possible that the probability distribution P^Pre_X is non-additive (i.e. the sum is not equal to 1). Additivity can be obtained through simple normalisation with the normalisation factor ν^Ave_X expressed as:

    ν^Ave_X = Σ_{i=1..k} P^Pre_X(xi)                                 (8.46)
The average quotient projected probability distribution P^Ave_X can then be computed, as expressed in Eq.(8.47).

    P^Ave_X(xi) = P^Pre_X(xi) / ν^Ave_X.                             (8.47)
The average quotient projected probability distribution is then:

    P^Ave_X = {P^Ave_X(xi), for i = 1, ..., k}                       (8.48)

One disadvantage of P^Ave_X is obviously that, in general, PXY ≠ P^Ave_X · PY. However, it can be noted that the following equality holds:

    P^Ave_X = (P^Ave_X · PY) / PY                                    (8.49)
Given the average quotient projected probability distribution P^Ave_X, the question is how to compute an appropriate level of uncertainty, based on the uncertainties uXY and uY. One simple heuristic method is to take the maximum theoretical uncertainty levels ûXY and ûY, and define the uncertainty level uX based on the assumption that uXY is the proportional average of uX and uY, as expressed by Eq.(8.50). This produces a preliminary uncertainty u^Pre_X.

    uXY / ûXY = (u^Pre_X + uY) / (ûX + ûY) = u^Pre_X / (ûX + ûY) + uY / (ûX + ûY)        (8.50)

    ⇔  u^Pre_X / (ûX + ûY) = uXY / ûXY − uY / (ûX + ûY)              (8.51)

    ⇔  u^Pre_X = uXY (ûX + ûY) / ûXY − uY                            (8.52)
The computation of uncertainty-maximised opinions ω
described in Section 3.4.6. The convention for marginal cases of division by zero is
that the whole fraction is equal to zero, as e.g. expressed by:
8.7 Multinomial Division
131
IF (uXY = 0) ∧ (b
uXY = 0) THEN
uXY
= 0.
ubXY
(8.53)
Eq.(8.53) is sound in all cases, because we always have uXY ≤ ubXY . Of course,
the uncertainty uXY is strictly limited by the maximum possible uncertainty ubXY .
The uncertainty uPre
X of Eq.(8.52) could theoretically take values greater than 1 or
less than 0. Therefore, in order to normalise the situation it is necessary to complete
the computation by constraining its range.

Range constraint:   IF (u^Pre_X > 1) THEN uX = 1
                    ELSEIF (u^Pre_X < 0) THEN uX = 0
                    ELSE uX = u^Pre_X                                (8.54)
Having computed the uncertainty level uX , the belief masses are computed according to:
    bX(x) = PX(x) − aX(x) uX,  for each x ∈ X                        (8.55)
This completes the computation of the averaging proportional quotient opinion, expressed as:

    ω^Ave_X = ωXY / ωY                                               (8.56)
The averaging aspect of this operator stems from the averaging of the projected
probabilities in Eq.(8.45). The proportional aspect comes from the computation of
uncertainty in Eq.(8.52) proportionally to the uncertainty of the argument opinions.
8.7.2 Selective Division
The method for selective division, denoted ‘’, assumes that one of the values of Y
has been observed, and is thereby considered to be absolutely TRUE. Let the specific
observed value be yj; then the belief mass distribution bY and projected probability distribution PY are expressed by:

    yj is TRUE  ⇒  bY(yj) = 1, and PY(yj) = 1,
                   bY(yt) = 0, and PY(yt) = 0 for all other values yt ≠ yj.        (8.57)
The quotient projected probability distribution is expressed as:
    PX(xi) = PXY(xi yj) / PY(yj) = PXY(xi yj),  for i = 1, ..., k    (8.58)
The uncertainty can be determined in the same way as for average division according to Eq.(8.52), but since it is assumed that uY = 0 (because some yj is TRUE), a simplified version of Eq.(8.52) is expressed as:

    u^Sel_X = uXY ûX / ûXY                                           (8.59)
It can be verified that u^Sel_X ∈ [0, 1], so that no constraining is needed.
Having computed the uncertainty level u^Sel_X, the quotient belief mass distribution is computed as:

    b^Sel_X(x) = PX(x) − aX(x) u^Sel_X,  for each x ∈ X              (8.60)
This completes the computation of the selective quotient opinion expressed as:
ωXSel = ωXY ωY
(8.61)
The selective aspect is that one of the divisor elements is considered TRUE,
which makes it possible to select one specific quotient term of candidate projected
probability terms, thereby avoiding complications resulting from unequal terms.
8.8 Multinomial Opinion Projection
A multinomial product opinion is a joint opinion in case the factor opinions are
independent. Let ωX and ωY be independent multinomial opinions on respective
domains X and Y, and let ωXY be the multinomial product opinion on the Cartesian
product domain X×Y. There are cases where it is necessary to project the product
opinion to produce a projected opinion on a factor domain. The interpretation of this
operation is that a projected opinion represents the specific factor in the product.
8.8.1 Opinion Projection Method
Let X be a domain of cardinality k = |X|, and let Y be a domain of cardinality
l = |Y|, with X and Y as their respective variables. Let ωX and ωY be two opinions,
and let ωXY be the product opinion on domain X × Y.
The product opinion ωXY can be projected onto X or Y , so both projections are
described below.
Let PX and PY be (assumed) projected probability distributions over X and Y
respectively, then the product projected probability distribution is expressed as:
Product:             PXY = [ PXY(x1 y1), PXY(x1 y2), ... PXY(x1 yl);
                             PXY(x2 y1), PXY(x2 y2), ... PXY(x2 yl);
                               ···
                             PXY(xk y1), PXY(xk y2), ... PXY(xk yl) ]        (8.62)

Implicitly equal to:     = [ PX(x1)PY(y1), PX(x1)PY(y2), ... PX(x1)PY(yl);
                             PX(x2)PY(y1), PX(x2)PY(y2), ... PX(x2)PY(yl);
                               ···
                             PX(xk)PY(y1), PX(xk)PY(y2), ... PX(xk)PY(yl) ]  (8.63)
The projection onto X and Y produces the projected probability distributions P̌X and P̌Y expressed as:

    P̌X(xi) = Σ_{j=1..l} PXY(xi yj)
    P̌Y(yj) = Σ_{i=1..k} PXY(xi yj)                                   (8.64)
Similarly, the projected base rate distributions ǎX and ǎY are computed as:

  ǎX(xi) = Σ_{j=1}^{l} aXY(xi yj)
  ǎY(yj) = Σ_{i=1}^{k} aXY(xi yj)    (8.65)
The question is how to determine an appropriate level of uncertainty. For this purpose, the same principle as for averaging proportional division, described in Section 8.7.1, is used. In addition, it is assumed that the projected opinions ω̌X and ω̌Y have equal relative uncertainty, where the relative uncertainty is measured against the maximum uncertainties ûX, ûY and ûXY of the respective opinions. This leads to the constraints on the left-hand side of Eq.(8.66), which produce the uncertainties on the right-hand side of Eq.(8.66).

  ǔX / ûX = uXY / ûXY        ǔX = ûX uXY / ûXY
                        ⇔                          (8.66)
  ǔY / ûY = uXY / ûXY        ǔY = ûY uXY / ûXY
Having computed the projected uncertainties ǔX and ǔY, the respective projected belief mass distributions b̌X and b̌Y can be computed as:

  b̌X(xi) = P̌X(xi) − ǎX(xi) ǔX
  b̌Y(yj) = P̌Y(yj) − ǎY(yj) ǔY    (8.67)

This completes the projection onto X with opinion ω̌X = (b̌X, ǔX, ǎX), as well as the projection onto Y with opinion ω̌Y = (b̌Y, ǔY, ǎY).
8.8.2 Example: Football Games
Consider e.g. two football games, where in game X the teams x1 and x2 play against each other, and in game Y the teams y1 and y2 play against each other. The rules dictate that a game cannot end in a draw, so if the score is level at the end of a game, a penalty shootout follows until one team wins.
Assume that a betting analyst has the joint opinion ωXY about the four possible outcomes of the two games before they begin. Assume further that game Y is cancelled for some reason (e.g. the bus carrying team y2 had an accident). The betting analyst then wants to project the joint opinion ωXY onto the domain X to produce the projected opinion ω̌X. This is possible because belief masses are assigned to all product values (x y) ∈ X×Y.
The Cartesian product domain, the belief mass distribution, and the base rate distribution are expressed as:

  X×Y = { (x1 y1), (x1 y2), (x2 y1), (x2 y2) },

  bXY:  bXY(x1 y1) = 0.4,   bXY(x1 y2) = 0.2,
        bXY(x2 y1) = 0.1,   bXY(x2 y2) = 0.1,    uXY = 0.2

  aXY:  aXY(x1 y1) = 0.25,  aXY(x1 y2) = 0.25,
        aXY(x2 y1) = 0.25,  aXY(x2 y2) = 0.25    (8.68)
The product projected probability distribution can be computed as:

  PXY(x1 y1) = 0.45,  PXY(x1 y2) = 0.25,
  PXY(x2 y1) = 0.15,  PXY(x2 y2) = 0.15    (8.69)
Table 8.5 summarises the projected opinions ω̌X and ω̌Y.

Table 8.5 Projected opinions ω̌X and ω̌Y

                     Opinions                           Base Rates       Probabilities
  Projection on X:   b̌X(x1) = 0.600   b̌X(x2) = 0.200   ǎX(x1) = 1/2     P̌X(x1) = 0.70
                     ǔX = 0.200                         ǎX(x2) = 1/2     P̌X(x2) = 0.30
  Projection on Y:   b̌Y(y1) = 0.467   b̌Y(y2) = 0.267   ǎY(y1) = 1/2     P̌Y(y1) = 0.60
                     ǔY = 0.266                         ǎY(y2) = 1/2     P̌Y(y2) = 0.40
Chapter 9
Conditional Deduction
9.1 Introduction to Conditional Reasoning
Both binary logic and probability calculus have mechanisms for conditional reasoning, where deduction and abduction are the fundamental operators. Conditional
deduction and abduction are both discussed in this introduction, but this chapter
only describes in detail the deduction operator for subjective logic. The abduction
operator for subjective logic is described in detail in Chapter 10.
In binary logic, Modus Ponens (MP) which represents deduction, and Modus
Tollens (MT) which represents abduction, are the classical operators that are used
in any field of logic that requires conditional inference. In probability calculus, conditional probabilities together with base rates are used for analysing deductive and
abductive reasoning models. Subjective logic extends the traditional probabilistic
approach by allowing subjective opinions to be used as input arguments, so that deduced or abduced conclusions reflect the underlying uncertainty of the situation to
be analysed.
Our knowledge of the real world tells us that certain variables and states are related in some way. For example, the state of rainy weather and the state of carrying an umbrella are often related, so it is meaningful to express this relationship as a conditional proposition “If it rains, Bob carries an umbrella”, which is of the form “IF x THEN y”. Here x denotes the antecedent proposition (aka the evidence) and y the consequent proposition (aka the hypothesis) in the conditional. The format of this binary logic conditional is thus always “IF <antecedent> THEN <consequent>”.
Causal conditionals are typically considered in reasoning models, where the parent is assumed to cause the child.
Consider a reasoning model about carrying an umbrella depending on rainy
weather. There is obviously a causal conditional between rain and carrying an umbrella, so the rain variable becomes parent, and the umbrella variable becomes child,
as illustrated in Figure 9.1.
A conditional is a complex proposition consisting of the antecedent and consequent sub-propositions that practically or hypothetically represent states in the real world.

Fig. 9.1 Principles of deduction and abduction

The conditional proposition does not represent a state in the same way, rather
it represents a relationship between states of the world. The purpose of conditionals is primarily to reason from knowledge about the antecedent proposition to infer knowledge about the consequent proposition, which commonly is called deductive reasoning. In addition, conditionals can also be used to reason from knowledge
about the consequent proposition to infer knowledge about the antecedent proposition which commonly is called abductive reasoning. Abductive reasoning involves
inversion of conditionals, which makes abduction more complex than deduction.
In case of a causal conditional it is assumed that a parent variable dynamically
influences a child variable in space and time. For example consider the case of a
binary parent proposition “It rains”, and a binary child proposition “Bob carries an
umbrella”, which can both be evaluated to TRUE or FALSE. Initially, assume that the parent and child propositions are both FALSE. Subsequently, assume that the parent proposition becomes TRUE, and that this makes the child proposition also become TRUE.
If this dynamic conditional relationship holds in general, then it can be seen as a
TRUE conditional.
Note that in case of a TRUE causal conditional, forcing the child proposition to
become TRUE normally does not influence the parent proposition in any way. The
above scenario of rain and umbrellas clearly demonstrates this principle, because
carrying an umbrella obviously does not bring rain. However, in case of a TRUE
causal conditional, simply knowing that the child is TRUE can nevertheless indicate
the truth value of the parent, because seeing Bob carrying an umbrella can plausibly
indicate that it rains. Hence, in case of a TRUE causal conditional, to force or to
know the child proposition to be TRUE can have very different effects on the parent
proposition. However, to force or to know the parent proposition to be TRUE have
equal effect on the child proposition.
A derivative conditional is the opposite of a causal conditional. This means that
even in case of a TRUE derivative conditional, then forcing the parent proposition
to become TRUE does not necessarily make the child proposition become TRUE
as well. For example, the conditional “IF Bob carries an umbrella THEN it must be
raining” is a derivative conditional because forcing Bob to carry an umbrella does
not cause rain.
Conditionals can also be non-causal. For example, in case two separate lamps are connected to the same electric switch, then observing one of the lamps being lit gives an indication of the other lamp being lit too, so there is clearly a conditional relationship between them. However, neither lamp actually causes the other to light
up, rather it is the flipping of the switch which causes both lamps to light up at the
same time.
Conditionals are logically directed from the antecedent to the consequent proposition. The idea is that an analyst with knowledge of the truth value of a conditional
and its antecedent proposition can infer knowledge about the consequent proposition. However, in case the analyst needs to infer knowledge about the antecedent
proposition, then the conditional can not be used directly. What the analyst needs is
the opposite conditional where the propositions have swapped places. For example
assume that Alice knows that Bob usually carries an umbrella when it rains. Then
she has an initial causal conditional with which she can determine with some accuracy whether he carries an umbrella simply by observing that it rains. However, if
she wants to determine whether it rains by observing that Bob picks up his umbrella
before going out, then in theory the causal conditional is not directly applicable.
Alice might intuitively infer that it probably rains if he picks up an umbrella, but
in reality she then practices derivative reasoning because she applies the inverse of
a causal conditional, as well as abductive reasoning because she implicitly inverts
the initial conditional (whether it is causal or not). Inversion of conditionals is more
complex than what intuition indicates, so it is important to use sound methods to
make sure that it is done correctly and consistently.
The degree of truth, or equivalently the validity, of conditionals can be expressed in different ways, e.g. as Boolean TRUE or FALSE, as probabilities, or as subjective opinions. In the sections below, the methods for binomial and multinomial
conditional deduction are explained and described, first in case of probabilistic conditionals, and subsequently in case of opinion conditionals. Then, in Chapter 10,
the methods for binomial and multinomial conditional abduction are explained and
described.
9.2 Probabilistic Conditional Inference
With the aim of giving the reader a gentle introduction to the principles of deduction and abduction, this section provides a brief overview of how it is handled in
traditional probability calculus. The binomial case is described first, followed by
the general multinomial case.
9.2.1 Binomial Probabilistic Deduction and Abduction
The notation y‖x, introduced in [55], denotes that the truth or probability of proposition y is derived as a function of the probability of the parent x together with the conditionals p(y|x) and p(y|x̄). The expression p(y‖x) thus represents a derived value, whereas the expression p(y|x) represents an input argument. This notational convention will be used when describing probabilistic and subjective conditional reasoning below.
The deductive and abductive reasoning situations are illustrated in Figure 9.1
where X denotes the parent variable and Y denotes the child variable in the reasoning
model.
Conditionals are expressed as p(< consequent > | < antecedent >), i.e. with the
consequent variable first, and the antecedent variable last. Formally, a conditional
probability is defined as follows:
  Conditional probability: p(y|x) = p(x ∧ y) / p(x).    (9.1)
Parent-child reasoning models are typically assumed to be causal, where parent
nodes have a causal influence over child nodes.
In this situation conditionals are typically expressed in the same direction as the
reasoning, i.e. with parent as antecedent, and child as consequent.
Forward conditional inference, called deduction, is when the analyst has evidence
about the parent variable, and the child variable is the target of reasoning.
Assume that the elements x and x̄ are relevant to the element y (and ȳ) according to the conditional statements y|x and y|x̄. Here, x and x̄ are parents and y is the child of the conditionals. Let p(x), p(y|x) and p(y|x̄) be probability assessments of x, y|x and y|x̄ respectively. The law of total probability says that the probability of a child value y is the sum of the conditional probabilities conditioned on every parent value, weighted by the parent probabilities, expressed as:

  Law of total probability: p(y) = p(x)p(y|x) + p(x̄)p(y|x̄).    (9.2)
The conditionally deduced probability p(y‖x) can then be computed from the law of total probability as:

  Deduced probability: p(y‖x) = p(x)p(y|x) + p(x̄)p(y|x̄)
                              = p(x)p(y|x) + (1 − p(x))p(y|x̄).    (9.3)
In case the analyst knows exactly that x is true, i.e. p(x) = 1, then from Eq.(9.3) it can immediately be seen that p(y‖x) = p(y|x). Conversely, in case the analyst knows exactly that x is false, i.e. p(x) = 0, then from Eq.(9.3) it can immediately be seen that p(y‖x) = p(y|x̄).
Reverse conditional inference, called abduction, is when the analyst has evidence
about the child variable, and the parent variable is the target of the reasoning. In this
case the available conditionals are directed in the opposite direction to the reasoning,
and opposite to what the analyst needs.
Assume that the states x and x̄ are relevant to y according to the conditional statements y|x and y|x̄, where x and x̄ are parent values, and y and ȳ are child values. Let p(y), p(y|x) and p(y|x̄) be probability assessments of y, y|x and y|x̄ respectively. The required conditionals can be correctly derived by inverting the available conditionals using Bayes theorem.
  Bayes theorem: p(x|y) = p(x)p(y|x) / p(y).    (9.4)
Bayes theorem is simply derived from the definition of conditional probability of Eq.(9.1). By expressing the conditional probabilities p(y|x) and p(x|y), Bayes theorem emerges:

  p(y|x) = p(x ∧ y) / p(x)
                               ⇒  p(x|y) = p(x)p(y|x) / p(y).    (9.5)
  p(x|y) = p(x ∧ y) / p(y)
With Bayes theorem the inverted conditional p(x|y) can be computed from the
conditionals p(y|x) and p(y|x). However, the simple expression of Bayes theorem
hides some subtleties related to base rates as explained below.
Conditionals are assumed to represent general dependence relationships between
statements, so the terms p(x) and p(y) on the right hand side of Eq.(9.4) must also
represent general probabilities, and not for example observations. A general probability must be interpreted as a base rate, as explained in Section 2.6. The term p(x)
on the right hand side of Eq.(9.4) therefore expresses the base rate of x. Similarly,
the term p(y) on the right hand side of Eq.(9.4) expresses the base rate of y, which
conditionally depends on the base rate of x, and which can be computed from the
base rate of x using the law of total probability of Eq.(9.2).
In order to avoid confusion between the base rate of x and the probability of x,
the term a(x) will denote the base rate of x in the following. Similarly, the term a(y)
will denote the base rate of y.
By applying the law of total probability of Eq.(9.2), the term p(y) in Eq.(9.5) can be expressed as a function of the base rate a(x) and its complement a(x̄) = 1 − a(x). As a result, the inverted positive and negative conditionals are:

  Inverted conditionals:
    p(x|y) = a(x)p(y|x) / (a(x)p(y|x) + a(x̄)p(y|x̄)),
    p(x|ȳ) = a(x)p(ȳ|x) / (a(x)p(ȳ|x) + a(x̄)p(ȳ|x̄)).    (9.6)
The conditionally abduced probability p(x‖̃y) can then be computed as:

  Abduced probability: p(x‖̃y) = p(y)p(x|y) + p(ȳ)p(x|ȳ)
    = p(y) · a(x)p(y|x) / (a(x)p(y|x) + a(x̄)p(y|x̄))
    + p(ȳ) · a(x)p(ȳ|x) / (a(x)p(ȳ|x) + a(x̄)p(ȳ|x̄))    (9.7)
It can be noted that Eq.(9.7) is simply the application of conditional deduction according to Eq.(9.3), where the conditionals are determined according to Eq.(9.6). The terms used in Eq.(9.3) and Eq.(9.7) are interpreted as follows:
  p(y|x)  : the conditional probability of y given that x is TRUE
  p(y|x̄)  : the conditional probability of y given that x is FALSE
  p(x|y)  : the conditional probability of x given that y is TRUE
  p(x|ȳ)  : the conditional probability of x given that y is FALSE
  p(y)    : the probability of y
  p(ȳ)    : the probability of the complement of y (= 1 − p(y))
  a(x)    : the base rate of x
  p(y‖x)  : the deduced probability of y as a function of evidence on x
  p(x‖̃y)  : the abduced probability of x as a function of evidence on y
The binomial expressions for probabilistic deduction of Eq.(9.3) and probabilistic abduction of Eq.(9.7) can be generalised to multinomial expressions as explained
below.
9.2.2 Multinomial Probabilistic Deduction and Abduction
Let X = {xi |i = 1 . . . k} be the parent domain with variable X, and let Y = {y j | j =
1 . . . l} be the child domain with variable Y . The deductive conditional relationship
between X and Y is then expressed as the set of conditionals p(Y |X) with k specific conditionals p(Y |xi ), each having the l dimensions of Y . This is illustrated in
Figure 9.2.
A specific conditional probability distribution p (Y |xi ) relates the value xi to the
values of variable Y . The probability distribution p (Y |xi ) consists of l conditional
probabilities expressed as:
  p(Y|xi) = { p(yj|xi), for j = 1 . . . l },  where Σ_{j=1}^{l} p(yj|xi) = 1.    (9.8)
Fig. 9.2 Multinomial conditionals between parent X and child Y
The term p(Y |X) denotes the set of probability distributions of the form of
Eq.(9.8), which can be expressed as:
  p(Y|X) = { p(Y|xi), for i = 1 . . . k }.    (9.9)
The generalised law of total probability for multinomial probability distributions
is expressed as:
  General law of total probability: p(y) = Σ_{i=1}^{k} p(xi) p(y|xi).    (9.10)
The probabilistic expression for multinomial conditional deduction from X to Y is directly derived from the law of total probability of Eq.(9.10). The deduced probability distribution over Y is denoted p(Y‖X), where the deduced probability p(yj‖X) of each value yj is:

  Deduced probability: p(yj‖X) = Σ_{i=1}^{k} p(xi) p(yj|xi).    (9.11)
The deduced probability distribution on Y can be expressed as:

  p(Y‖X) = { p(yj‖X) for j = 1 . . . l },  where Σ_{j=1}^{l} p(yj‖X) = 1.    (9.12)
Note that in case the exact variable value X = xi is known, i.e. p(xi) = 1, then from Eq.(9.11) it can immediately be seen that the deduced probability distribution becomes p(Y‖X) = p(Y|xi).
Moving over to abduction, it is necessary to first compute the inverted conditionals. The multinomial probabilistic expression for inverting conditionals, according to the generalised Bayes theorem, is:

  Generalised Bayes theorem: p(xi|yj) = a(xi)p(yj|xi) / Σ_{t=1}^{k} a(xt)p(yj|xt)    (9.13)
where a(xi ) represents the base rate of xi .
By substituting the conditionals of Eq.(9.11) with the inverted multinomial conditionals from Eq.(9.13), the general expression for probabilistic abduction emerges:

  Abduced probability: p(xi‖̃Y) = Σ_{j=1}^{l} p(yj) ( a(xi)p(yj|xi) / Σ_{t=1}^{k} a(xt)p(yj|xt) ).    (9.14)
The set of abduced probability distributions can be expressed as:

  p(X‖̃Y) = { p(xi‖̃Y) for i = 1 . . . k },  where Σ_{i=1}^{k} p(xi‖̃Y) = 1.    (9.15)
The terms used in the above described formalism for multinomial conditional
inference are interpreted as follows:
  p(Y|X)   : set of conditional probability distributions on Y
  p(Y|xi)  : specific conditional probability distribution on Y
  p(X|Y)   : set of conditional probability distributions on X
  p(X|yj)  : specific conditional probability distribution on X
  p(X)     : probability distribution over X
  p(xi)    : probability of the specific value xi
  p(Y)     : probability distribution over Y
  p(yj)    : probability of the specific value yj
  a(X)     : base rate distribution over X
  a(Y)     : base rate distribution over Y
  p(Y‖X)   : deduced probability distribution over Y
  p(yj‖X)  : deduced probability of the specific value yj
  p(X‖̃Y)   : abduced probability distribution over X
  p(xi‖̃Y)  : abduced probability of the specific value xi
The above described formalism is illustrated by a numerical example in Section 10.8.1.
9.3 Notation for Subjective Conditional Inference
This section simply introduces the notation used for conditional deduction and abduction in subjective logic. The notation is similar to the corresponding notation for
probabilistic deduction and abduction. The detailed mathematical description of the
operators is provided in subsequent sections below.
9.3.1 Notation for Binomial Deduction and Abduction
Let X = {x, x̄} and Y = {y, ȳ} be two binary domains with respective variables X and Y, where there is a degree of relevance between X and Y. Let ωx = (bx, dx, ux, ax), ωy|x = (by|x, dy|x, uy|x, ay|x) and ωy|x̄ = (by|x̄, dy|x̄, uy|x̄, ay|x̄) be an agent's respective opinions about x being true, about y being true given that x is true, and finally about y being true given that x is false.
Conditional deduction is computed with the deduction operator denoted ‘⊚’, so that binomial deduction is expressed as:

  Binomial opinion deduction: ωy‖x = ωx ⊚ (ωy|x, ωy|x̄).    (9.16)
Conditional abduction is computed with the abduction operator denoted ‘⊚̃’, so that binomial abduction is expressed as:

  Binomial opinion abduction: ωx‖̃y = ωy ⊚̃ (ωy|x, ωy|x̄, ax)
                                   = ωy ⊚ (ωx|y, ωx|ȳ) = ωx‖y.    (9.17)
The conditionally abduced opinion ωx‖̃y expresses the belief about x being true as a function of the beliefs about y and the two sub-conditionals y|x and y|x̄, as well as the base rate ax.
In order to compute Eq.(9.17) it is necessary to invert the conditional opinions ωy|x and ωy|x̄ in order to obtain the conditional opinions ωx|y and ωx|ȳ, so that the final part of the abduction computation can be based on simple deduction according to Eq.(9.16). Inversion of binomial opinion conditionals is described in Section 10.3 below. The notation for binomial opinion inversion is given below:

  Binomial opinion inversion: {ωx|y, ωx|ȳ} = ⊚̃({ωy|x, ωy|x̄}, ax)    (9.18)
9.3.2 Notation for Multinomial Deduction and Abduction
Let domain X have cardinality k = |X| and domain Y have cardinality l = |Y|, where
variable X plays the role of parent, and variable Y the role of child.
Assume the set of conditional opinions of the form ωY |xi , where i = 1 . . . k. There
is thus one conditional opinion for each element xi of the parent variable. Each of
these conditionals must be interpreted as the subjective opinion on Y given that xi is
TRUE. The subscript notation on each conditional opinion ωY |xi specifies not only
the child variable Y it applies to, but also the element xi of the parent variable it is
conditioned on.
By extending the notation for binomial conditional deduction to the case of multinomial opinions, the general expression for multinomial conditional deduction is
written as:
  Multinomial opinion deduction: ωY‖X = ωX ⊚ ωY|X    (9.19)
where the symbol ‘⊚’ denotes the conditional deduction operator for subjective
opinions, and where ωY |X is a set of k = |X| different opinions conditioned on each
xi ∈ X respectively.
The structure of deductive reasoning is illustrated in Figure 9.3. The conditionals are expressed on the child variable, which is also the target variable for the
deductive reasoning. In the example of weather being the parent variable, and carrying an umbrella being the child variable, the evidence is about the weather, and
the conclusion is an opinion about carrying an umbrella.
Fig. 9.3 Structure of conditionals for deduction
In case of abduction, the goal is to reason from the child variable Y to the parent
variable X. The multinomial expression for subjective logic conditional abduction
is written as:
  Multinomial opinion abduction: ωX‖̃Y = ωY ⊚̃ (ωY|X, aX)
                                       = ωY ⊚ ωX|Y = ωX‖Y.    (9.20)
where the operator symbol ‘⊚̃’ denotes the general conditional abduction operator for subjective opinions, and where ωY|X is a set of k = |X| different multinomial opinions conditioned on each xi ∈ X respectively. The base rate distribution over X is denoted by aX.
In order to compute the abduced opinion according to Eq.(9.20) it is necessary to
invert the set of conditional opinions ωY |X which produces the set of conditional
opinions ωX|Y so that the final part of the abduction computation can be based
on multinomial deduction according to Eq.(9.19). Inversion of multinomial opinion conditionals is described in Section 10.3 below. The notation for multinomial
opinion inversion is given below:
  Multinomial opinion inversion: ωX|Y = ⊚̃(ωY|X, aX)    (9.21)
The structure of abductive reasoning is illustrated in Figure 9.4. The fact that
both the evidence opinion as well as the set of conditional opinions are expressed
on the child variable makes this reasoning situation complex. The parent is now
the target variable, so it is a situation of reasoning from child to parent. In the example of weather being the causal parent variable, and carrying an umbrella being
the consequent child variable, the evidence is about carrying an umbrella, and the
conclusion is an opinion about its possible cause, the weather.
Fig. 9.4 Structure of conditionals for abduction
Note that the conditionals do not necessarily have to be causal. However, for
analysts it is typically easier to express conditionals in the causal direction, which
is the reason why it is normally assumed that parent variables represent causes of
child variables.
The above sections presented the concepts of conditionals as well as of deductive and abductive reasoning. It is now time to get down to the mathematical details. The next sections describe binomial and multinomial deduction in subjective logic.
The binomial case can be described in the form of closed expressions, whereas
the multinomial case requires a series of steps that need to be implemented as an
algorithm. For that reason, the cases of binomial and multinomial deduction are
presented separately.
9.4 Binomial Deduction
Conditional deduction with binomial opinions has previously been described in [55].
However, that description did not include the base rate consistency requirement of
Eq.(9.24) which has been added to the description of binomial deduction of Definition 9.1 below.
9.4.1 Bayesian Base Rate
In general, the base rate of x and the conditionals on y put constraints on the base
rate of y. The strictest base rate consistency requirement is to derive a specific base
rate, as described here. The expression for the base rate ay in Eq.(9.26) is derived
from the base rate consistency requirement:
  Binomial base rate consistency requirement: ay = ax Py|x + ax̄ Py|x̄    (9.22)
Assuming that ωy|x and ωy|x̄ are not both vacuous, i.e. that uy|x + uy|x̄ < 2, the simple expression for the Bayesian base rate ay can be derived as follows:

  ay = ax Py|x + ax̄ Py|x̄ = ax (by|x + ay uy|x) + ax̄ (by|x̄ + ay uy|x̄)
    ⇔  ay = (ax by|x + ax̄ by|x̄) / (1 − ax uy|x − ax̄ uy|x̄)    (9.23)
With the Bayesian base rate of Eq.(9.23), it is guaranteed that the projected probabilities of binomial conditional opinions do not change after multiple inversions.
In case ωy|x and ωy|x̄ are both vacuous, i.e. when uy|x = uy|x̄ = 1, then there is no constraint on the base rate ay, as in the case of the free base rate interval described in Section 9.4.2.
Figure 9.5 shows a screenshot of binomial deduction, involving a Bayesian base rate, which is equal to the deduced projected probability given a vacuous antecedent opinion ω̊x, according to the requirement of Eq.(9.22).
Fig. 9.5 Screenshot of deduction with vacuous antecedent ω̊x and corresponding Bayesian base rate ay = 0.40. The arguments are ωx = (0.00, 0.00, 1.00, 0.80), ωy|x = (0.40, 0.50, 0.10, 0.40) and ωy|x̄ = (0.00, 0.40, 0.60, 0.40), and the deduced opinion is ωy‖x = (0.32, 0.48, 0.20, 0.40).
To intuitively see why the Bayesian base rate is necessary, consider the case of a pair of dogmatic conditional opinions ωy|x and ωy|x̄ where both projected probabilities are Py|x = Py|x̄ = 1. In this trivial case we always have p(y) = 1 independently of the probability p(x). It would then be totally inconsistent to e.g. have base rate ay = 0.5 when we always have p(y) = 1. The base rate must reflect reality, so the only consistent base rate in this case is ay = 1, which emerges directly from Eq.(9.23).
9.4.2 Free Base Rate Interval
There may be situations where it is not necessary to have the strict requirement
of a specific base rate for ay , but instead allow an interval for ay . In general, the
more dogmatic the conditional opinions the stricter the constraints on base rates.
This section describes the relatively lax requirement of an interval for choosing a
free base rate. Section 9.4.1 above describes the more strict requirement of a single
Bayesian base rate.
The idea of a free base rate interval is that the base rate ay in Eq.(9.26) must take its value from the interval defined by a lower base rate limit ay^− and an upper base rate limit ay^+ in order to be consistent with the conditionals. In case of dogmatic conditionals, the free base rate interval is reduced to a single base rate, which in
fact is the Bayesian base rate. The upper and lower limits for free base rates are the projected probabilities of the consequent opinions resulting from first assuming a vacuous antecedent opinion ω̊x, and then hypothetically setting maximum (= 1) and minimum (= 0) base rates for the consequent variable y. The upper and lower base rate limits are:

  Upper base rate: ay^+ = max_{ay∈[0,1]} [ax Py|x + ax̄ Py|x̄]
                        = max_{ay∈[0,1]} [ax (by|x + ay uy|x) + ax̄ (by|x̄ + ay uy|x̄)]

  Lower base rate: ay^− = min_{ay∈[0,1]} [ax Py|x + ax̄ Py|x̄]
                        = min_{ay∈[0,1]} [ax (by|x + ay uy|x) + ax̄ (by|x̄ + ay uy|x̄)]    (9.24)
The free base rate interval for ay is then expressed as [ay^−, ay^+], meaning that the base rate ay must be within that interval in order to be consistent with the pair of conditionals ωy|x and ωy|x̄. Note that the free base rate interval [ay^−, ay^+] is also a function of the base rate ax.
Figure 9.6 shows a screenshot of binomial deduction, indicating the upper base rate ay^+ = 0.52, which is equal to the deduced projected probability given a vacuous antecedent ω̊x and a hypothetical base rate ay = 1, according to Eq.(9.24).
Figure 9.7 shows a screenshot of binomial deduction, indicating the lower base rate ay^− = 0.32, which is equal to the deduced projected probability given a vacuous antecedent ω̊x and a hypothetical zero base rate ay = 0, according to Eq.(9.24).
In case of dogmatic conditionals, the free base rate interval collapses to a single value, namely the Bayesian base rate of Eq.(9.23), which then simplifies to:

  ay = ax Py|x + ax̄ Py|x̄ = ax by|x + ax̄ by|x̄    (9.25)
In case both ωy|x and ωy|x̄ are vacuous opinions, i.e. when uy|x = uy|x̄ = 1, then the free base rate interval for ay is [0, 1], meaning that there is no consistency constraint on the base rate ay.
Fig. 9.6 Screenshot of deduction with vacuous antecedent ω̊x, indicating upper free base rate ay^+ = 0.52

Fig. 9.7 Screenshot of deduction with vacuous antecedent ω̊x, indicating lower free base rate ay^− = 0.32

9.4.3 Method for Binomial Deduction

Binomial opinion deduction is a generalisation of probabilistic conditional deduction expressed in Eq.(9.3). It is assumed that the base rate ay is consistent with the base rate ax and the pair of conditionals ωy|x and ωy|x̄, as described in Section 9.4.1, or alternatively as described in Section 9.4.2.
Definition 9.1 (Conditional Deduction with Binomial Opinions).
Let X = {x, x̄} and Y = {y, ȳ} be two binary domains where there is a degree of relevance of variable X ∈ X to variable Y ∈ Y. Let the opinions ωx = (bx, dx, ux, ax), ωy|x = (by|x, dy|x, uy|x, ay) and ωy|x̄ = (by|x̄, dy|x̄, uy|x̄, ay) be an agent's respective opinions about x being true, about y being true given that x is true, and about y being true given that x is false. The deduced opinion ωy‖x = (by‖x, dy‖x, uy‖x, ay) is computed
as:

  ωy‖x :
    by‖x = by^I − ay K
    dy‖x = dy^I − (1 − ay) K
    uy‖x = uy^I + K
    ay   = (ax by|x + ax̄ by|x̄) / (1 − ax uy|x − ax̄ uy|x̄)   for uy|x + uy|x̄ < 2
    ay   = arbitrary value in [0, 1]                         for uy|x = uy|x̄ = 1
                                                                             (9.26)
  where
    by^I = bx by|x + dx by|x̄ + ux (by|x ax + by|x̄ (1 − ax))
    dy^I = bx dy|x + dx dy|x̄ + ux (dy|x ax + dy|x̄ (1 − ax))
    uy^I = bx uy|x + dx uy|x̄ + ux (uy|x ax + uy|x̄ (1 − ax))
                                                                             (9.27)

and where K is determined according to the following selection criteria:

  Case I:      ((by|x > by|x̄) ∧ (dy|x > dy|x̄)) ∨ ((by|x ≤ by|x̄) ∧ (dy|x ≤ dy|x̄))
               ⇒ K = 0                                                       (9.28)

  Case II.A.1: ((by|x > by|x̄) ∧ (dy|x ≤ dy|x̄))
               ∧ (P(ωy‖x̊) ≤ (by|x̄ + ay (1 − by|x̄ − dy|x))) ∧ (P(ωx) ≤ ax)
               ⇒ K = ax ux (by^I − by|x̄) / ((bx + ax ux) ay)                 (9.29)

  Case II.A.2: ((by|x > by|x̄) ∧ (dy|x ≤ dy|x̄))
               ∧ (P(ωy‖x̊) ≤ (by|x̄ + ay (1 − by|x̄ − dy|x))) ∧ (P(ωx) > ax)
               ⇒ K = ax ux (dy^I − dy|x)(by|x − by|x̄)
                     / ((dx + (1 − ax) ux) ay (dy|x̄ − dy|x))                 (9.30)

  Case II.B.1: ((by|x > by|x̄) ∧ (dy|x ≤ dy|x̄))
               ∧ (P(ωy‖x̊) > (by|x̄ + ay (1 − by|x̄ − dy|x))) ∧ (P(ωx) ≤ ax)
               ⇒ K = (1 − ax) ux (by^I − by|x̄)(dy|x̄ − dy|x)
                     / ((bx + ax ux)(1 − ay)(by|x − by|x̄))                   (9.31)

  Case II.B.2: ((by|x > by|x̄) ∧ (dy|x ≤ dy|x̄))
               ∧ (P(ωy‖x̊) > (by|x̄ + ay (1 − by|x̄ − dy|x))) ∧ (P(ωx) > ax)
               ⇒ K = (1 − ax) ux (dy^I − dy|x) / ((dx + (1 − ax) ux)(1 − ay)) (9.32)

  Case III.A.1: ((by|x ≤ by|x̄) ∧ (dy|x > dy|x̄))
               ∧ (P(ωy‖x̊) ≤ (by|x + ay (1 − by|x − dy|x̄))) ∧ (P(ωx) ≤ ax)
               ⇒ K = (1 − ax) ux (dy^I − dy|x̄)(by|x̄ − by|x)
                     / ((bx + ax ux) ay (dy|x − dy|x̄))                       (9.33)

  Case III.A.2: ((by|x ≤ by|x̄) ∧ (dy|x > dy|x̄))
               ∧ (P(ωy‖x̊) ≤ (by|x + ay (1 − by|x − dy|x̄))) ∧ (P(ωx) > ax)
               ⇒ K = (1 − ax) ux (by^I − by|x) / ((dx + (1 − ax) ux) ay)     (9.34)

  Case III.B.1: ((by|x ≤ by|x̄) ∧ (dy|x > dy|x̄))
               ∧ (P(ωy‖x̊) > (by|x + ay (1 − by|x − dy|x̄))) ∧ (P(ωx) ≤ ax)
               ⇒ K = ax ux (dy^I − dy|x̄) / ((bx + ax ux)(1 − ay))            (9.35)

  Case III.B.2: ((by|x ≤ by|x̄) ∧ (dy|x > dy|x̄))
               ∧ (P(ωy‖x̊) > (by|x + ay (1 − by|x − dy|x̄))) ∧ (P(ωx) > ax)
               ⇒ K = ax ux (by^I − by|x)(dy|x − dy|x̄)
                     / ((dx + (1 − ax) ux)(1 − ay)(by|x̄ − by|x))             (9.36)

  where
    P(ωy‖x̊) = by|x ax + by|x̄ (1 − ax) + ay (uy|x ax + uy|x̄ (1 − ax))
    P(ωx)   = bx + ax ux                                                     (9.37)
⊓⊔
The computed ωy‖x is the conditionally deduced opinion derived from ωx, ωy|x and ωy|x̄. It expresses the belief in y being true as a function of the beliefs in x and the two sub-conditionals y|x and y|x̄. The conditional deduction operator is a ternary operator, and by using the function symbol ‘⊚’ to designate this operator, we define ωy‖x = ωx ⊚ (ωy|x, ωy|x̄).
Figure 9.8 shows a screenshot of binomial deduction, involving the Bayesian base rate ay = 0.40. The deduced opinion ωy‖x = (0.07, 0.42, 0.51, 0.40) lies within the sub-triangle defined by ωy|x, ωy|x̄ and ωy‖x̊ = (0.32, 0.48, 0.20, 0.40). In this case, the sub-triangle is reduced to a line between ωy|x and ωy|x̄, because ωy‖x̊ is situated on that line.
Fig. 9.8 Screenshot of deduction involving the Bayesian base rate ay = 0.40. The arguments are ωx = (0.10, 0.80, 0.10, 0.80), ωy|x = (0.40, 0.50, 0.10, 0.40) and ωy|x̄ = (0.00, 0.40, 0.60, 0.40), and the deduced opinion is ωy‖x = (0.07, 0.42, 0.51, 0.40).
Note that in case x is known to be true, i.e. ωx = (1, 0, 0, a) is an absolute positive opinion, then obviously ωy‖x = ωy|x. Similarly, in case x is known to be false, i.e. ωx = (0, 1, 0, a) is an absolute negative opinion, then obviously ωy‖x = ωy|x̄.
9.4.4 Justification for the Binomial Deduction Operator
While not particularly complex, the expressions for binomial conditional inference have many cases which can be difficult to understand and interpret. A more direct and intuitive justification can be found in its geometrical interpretation.
The image space of the child opinion is a sub-triangle where the two sub-conditionals ωy|x and ωy|x̄ form the two bottom vertices. The third vertex of the sub-triangle is the child opinion resulting from a vacuous parent opinion ω̊x. This particular child opinion, denoted ωy‖x̊, is determined by the base rates of x and y as well as the horizontal distance between the sub-conditionals. The parent opinion then determines the actual position of the child within that sub-triangle.
For example, when the parent is believed to be TRUE, i.e. ωx = (1, 0, 0, ax), the child opinion is ωy‖x = ωy|x; when the parent is believed to be FALSE, i.e. ωx = (0, 1, 0, ax), the child opinion is ωy‖x = ωy|x̄; and when the parent opinion is vacuous, i.e. ω̊x = (0, 0, 1, ax), the child opinion is ωy‖x = ωy‖x̊. For all other opinion values of the parent, the child opinion is determined by linear mapping from a point in the parent triangle to a point in the child sub-triangle according to Definition 9.1.
It can be noted that when ωy|x = ωy|x̄, the child sub-triangle is reduced to a point, so that necessarily ωy‖x = ωy|x = ωy|x̄ = ωy‖x̊ in this case. This would mean that there is no relevance relationship between parent and child.
Assume the sub-triangle defined by the conditional opinions inside the child opinion triangle. Deduction then consists of linearly projecting the antecedent opinion onto the sub-triangle. Figure 9.9 illustrates the principle of projecting a parent opinion to the child opinion in the sub-triangle, with the vertices:
  Arguments:         ωy|x  = (0.55, 0.30, 0.15, 0.38)
                     ωy|x̄  = (0.10, 0.75, 0.15, 0.38)
                     ωx    = (0.00, 0.40, 0.60, 0.50)

  Deduced opinion:   ωy‖x  = (0.15, 0.48, 0.37, 0.38)    (9.38)

Fig. 9.9 Projection from parent opinion triangle to the child opinion sub-triangle
The deduced opinion can be obtained geometrically by mapping the position of the parent opinion ωx in the parent triangle onto the position in the sub-triangle (shaded area) of the child triangle that, relatively seen, is the same.
Additional parameters seen in Figure 9.9 are:

  Projected probability of x:       Px = 0.30
  Projected deduced probability:    Py‖x = 0.29
  Vacuous-deduced opinion:          ωy‖x̊ = (0.19, 0.30, 0.51, 0.38)    (9.39)
In the general case, the child image sub-triangle is not equal-sided as in the example above. By setting the base rate of x different from 0.5, and by defining sub-conditionals with different uncertainty, the child image sub-triangle will be skewed, and it is even possible that the uncertainty of ωy‖x̊ is less than that of ωy|x or ωy|x̄.
9.5 Multinomial Deduction
9.5.1 Constraints for Multinomial Deduction
Conditional deduction with multinomial opinions has previously been described in
[39]. However, the description below has improved clarity and also includes a description of consistency intervals for base rate distributions.
Multinomial opinion deduction is a generalisation of probabilistic conditional
deduction as expressed in Eq.(9.11).
Let X = {xi |i = 1, . . . , k} and Y = {y j | j = 1, . . . , l} be random variables, where
X is the evidence variable, and Y is the target variable in given domains.
Assume an opinion ωX = (bbX , uX , a X ) on X and a set of conditional opinions
ωY |xi on Y , one for each xi , i = 1, . . . , k. The conditional opinion ωY |xi is a subjective
opinion on Y given that X takes the value xi . Formally, each conditional opinion
ωY |xi , i = 1, . . . , k, is a tuple:
ωY |xi = (bbY |xi , uY |xi , aY ),
(9.40)
where bY |xi : Y → [0, 1] is a belief mass distribution and uY |xi ∈ [0, 1] is an uncertainty
mass, such that Eq.(2.6) holds, and the base rate distribution aY : Y → [0, 1] is a prior
probability distribution of Y . We denote by ωY |X the set of all conditional opinions
on Y given the values of X:
ωY |X = {ωY |xi |i = 1, . . . , k}.
(9.41)
Motivated by the above analysis, we want to deduce a subjective opinion on the
target variable Y :
ωY kX = (bbY kX , uY kX , aY ) ,
(9.42)
where bY kX : Y → [0, 1] is a belief mass distribution and uY kX ∈ [0, 1] is an uncertainty mass, such that Eq.(2.6) holds.
Note that the base rate distribution aY is the same for all of the conditional opinions in ωY|X, and we use the same base rate distribution for the deduced opinion as well. We could equally well take a separate base rate distribution aY|xi for each i = 1, . . . , k, and require that aY is determined from the given base rate distributions by Eq.(9.11); or not put any requirements on the relation between the base rates at all. By keeping the same base rate in the given conditional opinions and the deduced one, we want to point out that the method of deducing an opinion described below is independent of the choice of the base rate distributions and their interconnections.
The definition of Bayesian deduction for subjective opinions should be compatible with the definition of Bayesian deduction for probability distributions described
in Section 9.2. This requirement leads to the following conclusion: The projected
probability of the deduced opinion ωY kX should satisfy the probabilistic deduction
relation given in Eq.(9.11), i.e. :
  PY‖X(yj) = Σ_{i=1}^{k} PX(xi) PY|xi(yj),    (9.43)
for j = 1, . . . , l, where Eq.(3.12) provides each factor on the right-hand side of
Eq.(9.43).
On the other hand, from Eq.(3.12), we have:
PY kX (y j ) = bY kX (y j ) + aY (y j )uY kX .
(9.44)
Eq.(9.43) and Eq.(9.44) together determine l linear equations with the beliefs bY‖X(yj), j = 1, . . . , l, and the uncertainty uY‖X as variables. We obtain one more equation over the same variables from the additivity property for the beliefs and uncertainty of the subjective opinion ωY‖X, by Eq.(2.6):

  uY‖X + Σ_{j=1}^{l} bY‖X(yj) = 1.    (9.45)
This means that we have a system of l + 1 equations with l + 1 variables, which
might seem to fully determine the deduced opinion ωY kX . However, the projected
probabilities on the left-hand side of the equations in Eq.(9.44) also add up to 1,
which makes this system dependent. Hence, the system has an infinite number of
solutions, which means that there are infinitely many subjective opinions on Y with
a base rate aY , the projected probability distribution of which satisfies Eq.(9.43).
This is in correspondence with the geometrical representation given in Figure 9.9
and Figure 9.10, namely: Once we have an opinion point ωY as a solution, then every
other point in ΩY , lying on the line through ωY parallel to the director line, will also
be a solution. The question is which one of these points is the most appropriate to
represent the deduced opinion on Y from the given input opinions.
The above observation suggests that we need to somehow choose and fix one
belief mass (or the uncertainty mass) of the deduced opinion, and determine the rest
of the values from the projected probability relations in Eq.(9.44). Since, in general, we do not have a reason to distinguish among belief mass values, the obvious
candidate for this is the uncertainty mass. In what follows, we provide a method for
determining the most suitable uncertainty mass value for the deduced opinion corresponding to the given input opinions, i.e. we provide a method for fully determining
the deduced opinion.
The method of obtaining the deduced opinion ωY kX from the opinions in the set
ωY |X and an opinion ωX , i.e. the method of determining a suitable uncertainty mass
value of the opinion ωY kX for the given input, is inspired by the geometric analysis of
the input opinions and how they are related. The idea is that the conditional opinions
in ωY |X are input arguments to the deduction operator which maps ωX into ωY kX .
Multinomial deduction is denoted by:
ωY kX = ωX ⊚ ωY |X
.
(9.46)
The deduction operator ‘⊚’ maps the whole opinion space of X, ΩX , into a subspace of ΩY , which we call a deduction sub-space. The following intuitive constraints are taken into consideration in providing the definition of the deduction
operator:
Constraint 1. The vertices lying on the base of the opinion space ΩX map correspondingly into the opinion points determined by ωY |X . This means that the conditional opinions on Y in the set ωY |X = {ωY |xi |i = 1, . . . , k}, correspond to the absolute
opinions on X, namely:
ωY |xi = ωXi ⊚ ωY |X ,
(9.47)
for i = 1, . . . , k, where ωXi = (biX, uiX, aX) is the absolute opinion on X such that biX(xi) = 1 (consequently biX(xj) = 0 for j ≠ i, and uiX = 0), and aX is the same as in ωX.
Constraint 2. The apex of the opinion space ΩX maps into the apex of the deduction sub-space. The apex of ΩX corresponds to the vacuous opinion on X given as:

  ω̊X = (b0, 1, aX),    (9.48)

where b0 is the zero-belief mass distribution that has b0(xi) = 0, for every i = 1, . . . , k. Let us denote by ω̊Y‖X the opinion on Y that corresponds to the apex of the deduction sub-space, i.e. the opinion deduced from the vacuous opinion ω̊X. Then we obtain the following constraint on the operator ⊚:

  ω̊Y‖X = ω̊X ⊚ ωY|X.    (9.49)
Now, according to Eq.(9.47) and Eq.(9.49), the vertices of the domain opinion space ΩX map into the opinion points ωY|xi, i = 1, . . . , k, and ω̊Y‖X under the deduction operator ⊚. We want the deduction operator to be defined in such a way that the deduction sub-space is the ‘convex closure’ of these points. In that way, the deduction sub-space, and the deduction itself, will be fully determined by the given conditional opinions in ωY|X, and the given base rate distribution aX.
Constraint 3. The image of an arbitrary opinion point ωX on the evidence variable
X is obtained by linear projection of the parameters of ωX inside the deduction subspace, and represents an opinion ωY kX on the target variable Y .
A visualisation of the above in the case of trinomial opinions where the opinion
spaces are tetrahedrons, is given in Figure 9.10.
The deduction sub-space is shown as a shaded tetrahedron inside the opinion
tetrahedron of Y , on the right-hand side of Figure 9.10.
Based on the above assumptions, the deduced opinion ωY‖X results from first constructing the deduction sub-space, and then projecting the opinion ωX onto it. The deduction sub-space is bounded by the k points ωY|xi, i = 1, . . . , k, and the point that corresponds to the vacuous opinion, ω̊Y‖X. While the former are given, the latter needs to be computed, as described in Step 1 below.
Fig. 9.10 Projecting an antecedent opinion into a consequent deduction sub-space
The opinion ωX is then linearly projected onto this sub-space, which means that
its uncertainty mass uY kX is determined as a linear transformation of the parameters
of ωX , with the belief masses determined accordingly.
9.5.2 Bayesian Base Rate Distribution
Similarly to the Bayesian base rate for binomial conditionals as described in Section 9.4.1, there is a Bayesian base rate distribution for multinomial conditional
opinions. This is necessary if it is required that the set of conditional opinions between nodes X and Y can be inverted multiple times, while preserving their projected
probabilities.
Assume parent node X of cardinality k and child node Y of cardinality l, with an associated set of conditional opinions ωY|X. Let ω̊X denote the vacuous opinion on X.
Let P̊Y‖X denote the deduced projected probability distribution computed with the vacuous opinion ω̊X. The constraint on the base rate distribution aY is:

  Multinomial base rate consistency requirement: aY = P̊Y‖X    (9.50)
Assuming that the conditional opinions ωY|X are not all vacuous, formally expressed as Σ_{x∈X} uY|x < k, the simple expression for aY(y) can be derived from Eq.(9.50) as follows:

  aY(y) = Σ_{x∈X} aX(x) PY|x(y) = Σ_{x∈X} aX(x) (bY|x(y) + aY(y) uY|x)
    ⇔  aY(y) = Σ_{x∈X} aX(x) bY|x(y) / (1 − Σ_{x∈X} aX(x) uY|x)    (9.51)
By using the Bayesian base rate distribution aY of Eq.(9.51), it is guaranteed that the multinomial opinion conditionals get the same projected probability distributions after multiple inversions. Note that Eq.(9.51) is a generalisation of the binomial case in Eq.(9.23).
In case all conditionals in the set ωY|X are vacuous, i.e. when Σ_{x∈X} uY|x = k, then there is no constraint on the base rate distribution aY.
9.5.3 Free Base Rate Distribution Intervals
Similarly to the free base rate interval for binomial conditionals as described in
Section 9.4.2, there is a set of free base rate intervals for the base rate distribution
of multinomial conditional opinions.
More specifically, the base rate distribution aY of the deduced opinion ωY kX in
Eq.(9.46) must be constrained by intervals defined by a lower base rate limit aY− (y)
and an upper base rate limit aY+ (y) for each element y ∈ Y in order to be consistent
with the set of conditionals ωY |X . In case of dogmatic conditionals, the set of free
base rate intervals is reduced to a Bayesian base rate distribution. The upper and
lower limits for consistent base rates are the projected probabilities of the consequent opinions resulting from first assuming a vacuous antecedent opinion ω̊X, and then hypothetically setting maximum (= 1) and minimum (= 0) base rates for the elements y ∈ Y.
Assume parent node X of cardinality k and child node Y of cardinality l, with an associated set of conditional opinions ωY|X. Let ω̊X, where ůX = 1, denote the vacuous opinion on X. Let P̊Y‖X denote the deduced projected probability distribution computed with the vacuous opinion ω̊X as antecedent. The free base rate distribution aY still has the constraint of the upper and lower limits:
  Upper base rate: aY^+(y) = max_{aY(y)∈[0,1]} [P̊Y‖X(y)]
                           = max_{aY(y)∈[0,1]} [Σ_{x∈X} aX(x) PY|x(y)]
                           = max_{aY(y)∈[0,1]} [Σ_{x∈X} aX(x) (bY|x(y) + aY(y) uY|x)]    (9.52)

  Lower base rate: aY^−(y) = min_{aY(y)∈[0,1]} [P̊Y‖X(y)]
                           = min_{aY(y)∈[0,1]} [Σ_{x∈X} aX(x) PY|x(y)]
                           = min_{aY(y)∈[0,1]} [Σ_{x∈X} aX(x) (bY|x(y) + aY(y) uY|x)]    (9.53)
The free base rate interval for aY (y) is then expressed as [aY− (y), aY+ (y)], meaning
that the base rate aY (y) must be within that interval in order to be consistent with the
set of conditionals ωY |X . Note that the free base rate interval [aY− (y), aY+ (y)] is also a
function of the base rate distribution a X . In case of dogmatic conditionals ωY |X the
set of free base rate intervals for aY collapses to a base rate distribution, where each
base rate aY (y) is expressed by Eq.(9.54).
  aY(y) = Σ_{x∈X} aX(x) PY|x(y) = Σ_{x∈X} aX(x) bY|x(y)    (9.54)
In case the set of conditional opinions ωY |X contains only vacuous opinions, i.e.
when uY |x = 1 for all x ∈ X, then every free interval is [0, 1], meaning that there is
no constraint on the base rate distribution aY other than the additivity requirement
∑ aY (y) = 1.
9.5.4 Method for Multinomial Deduction
Deduction for multinomial conditional opinions can be described in 3 steps. The first
step consists of determining the Bayesian base rate distribution (or set of base rate
intervals) for aY . The second step consists of determining the image sub-simplex of
the deduction space Y . The third step consists of linear mapping of the opinion on
X onto the sub-simplex of Y to produce the deduced opinion ωY kX .
Step 1: Compute the Bayesian base rate distribution aY according to Eq.(9.55), as
described in Section 9.5.2.
  aY(y) = Σ_{x∈X} aX(x) bY|x(y) / (1 − Σ_{x∈X} aX(x) uY|x)    (9.55)
Alternatively, if the analyst wants to specify the base rate distribution more freely,
a set of base rate intervals can be computed as described in Section 9.5.3.
Step 2: The belief mass assigned to each value y j of Y in any deduced opinion ωY kX
should be at least as large as the minimum of the corresponding belief masses in the
given conditionals, i.e. :
  bY‖X(yj) ≥ min_i [bY|xi(yj)],  ∀ j = 1, . . . , l.    (9.56)
This is intuitively clear: If we assign a belief mass to a value y j of Y , for every
possible value of X, then the belief mass we would assign to y j without knowing the
value of X should be at least as large as the minimum of the assigned belief masses
to y j conditional on the values of X.
Eq.(9.56) holds for the belief masses of every deduced opinion, and in particular for the belief masses of the opinion ω̊Y‖X. In determining ω̊Y‖X we need to consider the constraint on the projected probability distribution given in Eq.(9.43), and keep track of the condition in Eq.(9.56), while maximising the uncertainty.
The fact that all deduced opinions should satisfy Eq.(9.56) has the following
geometrical interpretation in tetrahedron opinion spaces: All the deduced opinion
points must be inside the auxiliary deduction sub-space of ΩY , that is the sub-space
bounded with the planes bY (y j ) = mini [bbY |xi (y j )] (parallel to the sides of ΩY ).
Applying Eq.(9.43) to the vacuous opinion on X, ω̊X, we obtain the following equation for the projected probability distribution of ω̊Y‖X:

  P̊Y‖X(yj) = Σ_{i=1}^{k} aX(xi) PY|xi(yj).    (9.57)
On the other hand, for the projected probability of ω̊Y‖X, according to the definition of projected probability given in Eq.(3.12), we have the following:

  P̊Y‖X(yj) = b̊Y‖X(yj) + aY(yj) ůY‖X.    (9.58)

Thus, we need to find the point ω̊Y‖X = (b̊Y‖X, ůY‖X, aY) with the greatest possible uncertainty satisfying the requirements in Eq.(9.58) and Eq.(9.56), where P̊Y‖X(yj) is determined by Eq.(9.57). From Eq.(9.58) and Eq.(9.56) we have the following:
  ůY‖X ≤ ( P̊Y‖X(yj) − min_i [bY|xi(yj)] ) / aY(yj),    (9.59)

for every j = 1, . . . , l. For simplicity, let us denote the right-hand side of Eq.(9.59) by uj. Hence we have:
  ůY‖X ≤ uj,    (9.60)

for every j = 1, . . . , l. Now, the greatest ůY‖X for which Eq.(9.60) holds is obviously determined as:

  ůY‖X = min_j [uj].    (9.61)
Namely, from Eq.(9.58) and Eq.(9.56) it follows that this value is non-negative. It is also less than or equal to 1 since, if we assume the opposite, it follows that uj > 1 for every j = 1, . . . , l, which leads to P̊Y‖X(yj) > min_i [bY|xi(yj)] + aY(yj) for every j = 1, . . . , l. Summing over j in the last inequality leads to a contradiction, since both the projected probabilities and the base rates of Y sum up to 1. Hence ůY‖X determined by Eq.(9.61) is a well-defined uncertainty value. It is obviously the greatest value satisfying Eq.(9.60), hence also the initial requirements.
Having determined ůY‖X, we determine the corresponding belief masses b̊Y‖X(yj), j = 1, . . . , l, from Eq.(9.58), and hence the opinion point ω̊Y‖X.
In the geometrical representation of opinions in tetrahedrons, determining the opinion point ω̊Y‖X is equivalent to identifying the intersection between the surface of the auxiliary deduction sub-space and the projector line (the line parallel to the director) passing through the point on the base that corresponds to the projected probability distribution determined by Eq.(9.57).
Step 3: The vertices of the opinion simplex of X map into the vertices of the deduction sub-space. This leads to the following linear expression for the uncertainty uY‖X of an opinion ωY‖X on Y, deduced from an opinion ωX = (bX, uX, aX) on X:

  uY‖X = uX ůY‖X + Σ_{i=1}^{k} uY|xi bX(xi).    (9.62)

We obtain this expression as the unique linear transformation of the beliefs and uncertainty of an opinion on X that maps the opinions ωXi, i = 1, . . . , k, and ω̊X into the uncertainty masses of ωY|xi, i = 1, . . . , k, and ω̊Y‖X, correspondingly.
From the equivalent form of Eq.(9.62):
k
uY kX = u
◦
Y kX
− ∑ (u
i=1
◦
Y kX
− uY |xi )bbX (xi ) ,
(9.63)
The uncertainty of a deduced opinion ωY kX from an arbitrary opinion ωX is obtained
when the maximum uncertainty of the deduction, u ◦ , is decreased by the weighted
Y kX
average of the beliefs by the ‘uncertainty distance’ of the conditional opinions to the
maximum uncertainty.
Having deduced the uncertainty uY kX , the belief mass distribution, bY kX =
{bbY kX (y j ), j = 1, . . . , l}, of the deduced opinion is determined by rearranging
Eq.(9.44), to the following form.
bY kX (y j ) = PY kX (y j ) − aY (y j )uY kX .
(9.64)
The deduced multinomial opinion is then:
ωY kX = (bbY kX , uY kX , aY ).
(9.65)
162
9 Conditional Deduction
This marks the end of the 3-step deduction procedure for multinomial opinions.
Note that in case the analyst knows the exact value of variable X = xi , i.e.
b X (xi ) = 1 so that ωX is an absolute opinion, then obviously ωY kX = ωY |xi .
The above described procedure can be applied also in the case when some of the
given opinions are hyper opinions. In that case, we first determine the corresponding
projections of the hyper opinions into multinomial opinions, in the way described
in Section 3.5.2, and then deduce an opinion from the projections. The resulting
deduced opinion is then multinomial.
9.6 Example: Multinomial Deduction for Match-Fixing
This example is about a football match to be played between Team 1 and Team 2.
A gambler who plans to bet on the match suspects that match-fixing is taking place,
whereby one of the teams has been paid to loose, or at least not to win the match.
The gambler has an opinion about the outcome of the match in case Team 1 has
been paid to loose, in case Team 2 has been paid to loose, and in case no team has
been paid to loose. The gambler also has an opinion about whether Team 1, Team
2, or none of the teams has been paid to loose.
Let X = {x1 , x2 , x3 } denote the variable representing which team has been paid,
as expressed by Eq.(9.66).

 x1 : Team 1 has been paid to loose
Domain for match fixing: X = x2 : Team 2 has been paid to loose

x3 : No match-fixing
(9.66)
Let Y = {y1 , y2 , y3 } denote the variable representing which team wins the match,
as expressed by Eq.(9.67).

 y1 : Team 1 wins the match
Domain for winning the match: Y = y2 : Team 2 wins the match
(9.67)

y3 : The match ends in a draw
The opinions are given in Table 9.1
Table 9.1 Opinion ωX (match-fixing), and conditional opinions ωY |X (winning the match).
Opinion on X
b X (x1 )
b X (x2 )
b X (x3 )
uX
= 0.50
= 0.10
= 0.10
= 0.30
a X (x1 ) = 0.1
a X (x2 ) = 0.1
a X (x1 ) = 0.8
bY |x1
bY |x2
bY |x3
Conditional opinions ωY |X
y1
y2
y3
= {0.00, 0.70, 0.10} uY |x1 = 0.20
= {0.70, 0.00, 0.10} uY |x2 = 0.20
= {0.10, 0.10, 0.20} uY |x3 = 0.60
9.7 Interpretation of Material Implication in Subjective Logic
163
The first step is to apply Eq.(9.55) to compute the Bayesian base rate distribution
aY , which produces:

 aY (y1 ) = 0.3125
Bayesian base rate distribution aY = aY (y2 ) = 0.3125
(9.68)

aY (y3 ) = 0.3750
The second step is to apply Eq.(9.59) and Eq.(9.61) to compute the sub-simplex
apex uncertainty which produces u ◦ = 0.7333.
Y kX
The third step is to apply Eq.(9.62) and Eq.(9.64) to compute the deduced opinion
about which team will win the match, which produces:
Deduced opinion ωY kX

bY kX (y1 ) = 0.105,
 bY kX (y2 ) = 0.385,
=
 bY kX (y3 ) = 0.110,
uY kX
= 0.4

aY kX (y1 ) = 0.3125,
aY kX (y2 ) = 0.3125, 
 (9.69)
aY kX (y3 ) = 0.3750, 
So, based on the opinion about match-fixing, as well as conditional opinions
about the chances of winning, it appears that Team 2 will win, with projected probability PY kX (y2 ) = 0.385 + (0.3125 · 0.4) = 0.51.
9.7 Interpretation of Material Implication in Subjective Logic
Material implication is traditionally denoted as (x → y), where x represents the antecedent and y the consequent of the logical relationship between the propositions
x and y. Material implication is a truth functional connective, meaning that it is
defined by its Truth Table 9.2.
Table 9.2 Traditional truth table for material implication
x
F
F
T
T
y
F
T
F
T
x→y
T
T
F
T
While truth functional connectives normally have a relatively clear interpretation
in normal language, this is not the case for material implication. The implication
(x → y) could for example be expressed as: “If x is true, then y is true”. However,
this does not say anything about the case when x is false, which is problematic for
the interpretation of the corresponding entries in the truth table. In this section we
show that material implication is not closed under Boolean truth values, and that it in
fact produces uncertainty in the form of a vacuous opinion. When seen in this light,
164
9 Conditional Deduction
it becomes clear that the traditional definition of material implication is based on
the over-simplistic and misleading interpretation of complete uncertainty as binary
logic TRUE. We redefine material implication with subjective logic to preserve the
uncertainty that it unavoidably produces in specific cases. We then compare the new
definition of material implication with conditional deduction, and show that they
reflect the same mathematical equation rearranged in different forms.
9.7.1 Truth Functional Material Implication
By definition, logical propositions in binary logic can only be evaluated to TRUE or
FALSE. A logical proposition can be composed of sub-propositions that are combined with logical connectives. For example, the conjunctive connective ∧ can be
used to combine propositions x and y into the conjunctive proposition (x ∧ y). In this
case the proposition (x ∧ y) is a complex proposition because it is composed of subpropositions. The ∧ connective has its natural language interpretation expressed as:
“x and y are both TRUE”. A logical proposition is said to be truth functional when
its truth depends on the truth of its sub-propositions alone [5]. Traditionally, it is
required that the complex proposition has a defined truth value for all the possible
combinations of truth values of the sub-propositions, in order for the truth function
to be completely defined.
As an example of a simple truth functional connective, the conjunction of two
propositions is defined as the truth table shown in Table 9.3 below.
Table 9.3 Truth table for conjunction (x ∧ y)
x
F
F
T
T
y
F
T
F
T
x∧y
F
F
F
T
Logical AND reflects the natural intuitive understanding of the linguistic term
‘and’, and is at the same time extremely useful for specifying computer program
logic.
Material implication, also known as truth functional implication, is a conditional
proposition usually denoted as (x → y), where x is the antecedent, y is the consequent. A conditional proposition is a complex proposition consisting of the two
sub-propositions x and y connected with the material implication connective ‘→’.
The natural language interpretation of material implication is: “If x is TRUE, then
y is TRUE”, or simply as “x implies y”. However, it can be noted that the natural language interpretation does not say anything about the case when x is FALSE.
Nevertheless, material implication is defined both in case x is TRUE and x FALSE.
The natural language interpretation thus only covers half the definition of material
9.7 Interpretation of Material Implication in Subjective Logic
165
implication, and this interpretation vacuum is the source of the confusion around
material implication.
Defining material implication as truth-functional means that its truth values are
determined as a function of the truth values of x and y alone, as shown in Table 9.4.
Table 9.4 Basic cases in truth table for material implication
Case 1:
Case 2:
Case 3:
Case 4:
x
F
F
T
T
y
F
T
F
T
x→y
T
T
F
T
The truth table of Table 9.4 happens to be equal to the truth table of (x ∨ y), which
is the reason why the traditional definition of truth functional material implication
leads to the equivalence (x → y) ⇔ (x ∨ y).
However, treating conditionals as truth functional in this fashion leads to well
known inconsistencies. Truth functional material implication should therefore not
be considered to be a binary logic operator at all. The natural language interpretation assumes that there is a relevance connection between x and y which does not
emerge from Truth Table 9.4. The relevance property which intuitively, but mistakenly, is assumed by (x → y) can be expressed as: “The truth value of x is relevant for
the truth value of y”. For example, connecting a false antecedent proposition with
an arbitrary consequent proposition gives a true implication according to material
implication, but is obviously counter-intuitive when expressed in normal language
such as: “If 2 is odd, then 2 is even”. However, the inverse proposition: “If 2 is even,
then 2 is odd” is false according to Truth Table 9.4.
Furthermore, connecting an arbitrary antecedent with a true consequent proposition is true according to material implication although the antecedent and consequent might not have any relevance to each other. An example expressed in normal
language is e.g. “If it rains, then 2 is even”.
The problem is that it takes more than a truth table to determine whether a proposition x is relevant for another proposition y. In natural language, the term ‘relevance’ assumes that when the truth value of the antecedent varies, so does that of
the consequent. Correlation of truth variables between antecedent and consequent
is thus a necessary element for relevance. Material implication defined in terms of
Truth Table 9.4 does not express any relevance between the propositions, and therefore does not reflect the meaning of the natural language concept of implication.
Truth Table 9.4 gives a case-by-case static view of truth values which is insufficient
to derive any relevance relationships.
There is thus a difference between e.g. conjunctive propositions and conditional
propositions in that conjunctive propositions are intuitively truth functional, whereas
conditional propositions are not. Treating a conditional proposition as truth functional is problematic because its truth cannot be determined in binary logic terms
solely as a function of the truth of its components.
166
9 Conditional Deduction
This section explains that the truth function of material implication is not closed
under binary logic, and that its truth value in fact can be uncertain, which is something that cannot be expressed with binary logic. We show that material implication
can be redefined to preserve uncertainty, and compare this to probabilistic and subjective logic conditional deduction where uncertainty can be expressed either in the
form of probability density functions or as subjective opinions. Subjective logic allows degrees of uncertainty to be explicitly expressed and is therefore suitable for
expressing the uncertain truth function of material implication.
9.7.2 Material Probabilistic Implication
By material probabilistic implication we mean that the probability value of the conditional p(y|x) shall be determined as a function of other probability variables. This
then corresponds directly with propositional logic material implication where the
truth value of the conditional is determined as a function of the antecedent and the
consequent truth values according to the truth table.
According to Eq.(9.3) on p.138 binomial probabilistic deduction is expressed as:
p(ykx) = p(x)p(y|x) + p(x)p(y|x).
(9.70)
The difference between probabilistic conditional deduction and probabilistic material implication is a question of rearranging Eq.(9.70) so p(y|x) is expressed as:
p(y|x) =
p(ykx) − p(x)p(y|x)
p(x)
(9.71)
Below we will use Eq.(9.71) and Eq.(9.70) to determine the value of the conditional (y|x).
• Cases 1 & 2: p(x) = 0
The case p(x) = 0 in Eq.(9.71) immediately appears as problematic. It is therefore necessary to consider Eq.(9.70).
It can be seen that the term involving p(y|x) disappears from Eq.(9.70) when
p(x) = 0. As a result p(y|x) can take any value in the range [0, 1], so p(y|x) must
be expressed as a probability density function. Without any prior information,
the density function must be considered to be uniform, which in subjective logic
has a specific interpretation as will be explained below.
A realistic example could for example be when considering the propositions
x:“The switch is on” and y:“The light is on”. Recall that x is FALSE (i.e. “The
switch is off”) in the cases under consideration here.
Let us first consider the situation corresponding to Case 1 in Table 2 where ykx
is FALSE (i.e. “The light is off with the given switch position, which happens to
be off”), which would be the case when y|x is FALSE. In this situation it is perfectly possible that y|x is FALSE too (i.e. “The light is off whenever the switch
9.7 Interpretation of Material Implication in Subjective Logic
167
is on”). It is for example possible that the switch in question is not connected
to the lamp in question, or that the bulb is blown.
Let us now consider the situation corresponding to Case 2 in Table 2 where ykx
is TRUE (i.e. “The light is on with the given switch position, which happens to
be off”), which would be the case when y|x is TRUE. In this situation it is also
perfectly possible that y|x is FALSE (i.e. “The light is off whenever the switch
is on”). It is for example possible that the electric cable connections have been
inverted, so that the light is on when the switch is off, and vice versa.
These examples are in direct contradiction with Cases 1 & 2 of Table 2 which
dictates that the corresponding implication (x → y) should be TRUE in both
cases. The observation of this contradiction proves that the traditional definition
of material implication is inconsistent with standard probability calculus.
• Cases 3 & 4: p(x) = 1
Necessarily p(x) = 0, so that Eq.(9.71) is transformed into p(y|x) = p(ykx).
Thus when x is TRUE (i.e. p(x) = 1) then necessarily (x → y) will have the
same truth value as y. This does not necessarily mean that the truth value of x
is relevant to the truth value of y. In fact it could be either relevant or irrelevant.
For example consider the antecedent proposition x:“It rains” combined with the
consequent y:“I wear an umbrella”, then it is plausibly relevant, but combined
with the consequent y:“I wear glasses”, then it is plausibly irrelevant. It can be
assumed that x and y are TRUE in this example so that the implication is TRUE.
The unclear level of relevance can also be observed in examples where the consequent y is FALSE so that the implication (x → y) becomes FALSE. The level
of relevance between the antecedent and the consequent is thus independent of
the truth value of the implication (x → y) alone. The criteria for relevance will
be described in more detail below.
9.7.3 Relevance in Implication
A meaningful conditional relationship between x and y requires that the antecedent
x is relevant to the consequent y, or in other words that the consequent depends on
the antecedent, as explicitly expressed in relevance logics [19]. Conditionals that are
based on the dependence between consequent and antecedent are considered to be
universally valid (and not truth functional), and are called logical conditionals [16].
Deduction with logical conditionals reflects human intuitive conditional reasoning,
and do not lead to any of the paradoxes of material implication.
Material implication, which is purely truth functional, ignores any relevance connection between antecedent x and the consequent y, and defines the truth value of
the conditional as a function of the truth values of the antecedent and consequent
alone.
It is possible to express the relevance between the antecedent and the consequent
as a function of the conditionals. According to Definition 10.1 and Eq.(10.4) the
relevance denoted Ψ(y|x) is expressed as:
168
9 Conditional Deduction
Ψ(y|x) = |p(y|x) − p(y|x)| .
(9.72)
Obviously, Ψ(y|x) ∈ [0, 1]. The case Ψ(y|x) = 0 expresses total irrelevance/independence,
and the case Ψ(y|x) = 1 expresses total relevance/dependence of x on y.
Relevance cannot be derived from the traditional truth functional definition of
material implication because the truth value of (x → y) is missing from the truth
table. In order to rectify this, an augmented truth table that includes (x → y) is given
below.
Table 9.5 Truth and relevance table for material implication
Case 1a:
Case 1b:
Case 2a:
Case 2b:
Case 3a:
Case 3b:
Case 4a:
Case 4b:
x
F
F
F
F
T
T
T
T
y x → y x → y Relevance
F F
Any
Any
F T
n.d.
n.d.
T F
n.d.
n.d.
T T
Any
Any
F F
F
None
F T
F
Total
T F
T
Total
T T
T
None
From Table 9.5 it can be seen that the truth table entries in the cases 1 and 2 are
either ambiguous (‘Any’) or not defined (‘n.d’). The term ‘Any’ is used to indicate
that any truth or probability for (x → y) is possible in the cases 1a and 2b, not just
Boolean TRUE or FALSE. Only in the cases 3 and 4 is the truth table clear about
the truth value of (x → y). The same applies to the relevance between x and y where
any relevance value is possible in the cases 1a and 2b, and only the cases 3 and 4
define the relevance crisply as either no relevance or total relevance. Total relevance
can be interpreted in the sense x ⇔ y or x ⇔ y, i.e. that x and y are either equivalent
or inequivalent.
Our analysis shows that the natural conditional relationship between two propositions can not be meaningfully be described with a simple binary truth table because
other values than Boolean TRUE and FALSE are possible. The immediate conclusion is that material implication is not closed under a binary truth value space. Not
even by assigning probabilities to (x → y) in Table 9.5 can material implication be
made meaningful. Below we show that subjective logic which can express uncertainty is suitable for defining material implication.
9.7.4 Subjective Interpretation of Material Implication
The discussion in Section 9.7.3 above concluded that any probability is possible in
the cases 1 and 2 of the truth table of material implication. The uniform probability
density function expressed by Beta(1, 1) which is equivalent to the vacuous opinion
ω = (0, 0, 1, 12 ) is therefore a meaningful and sound representation of the term ‘Any’
9.7 Interpretation of Material Implication in Subjective Logic
169
in Table 9.5. Similarly to three-valued logics such as Kleene-logic [24], it is possible to define three-valued truth as {TRUE, FALSE, UNCERTAIN}, abbreviated as
{T, F, U}, where the truth value UNCERTAIN represents ω = (0, 0, 1, 21 ).
Given that material implication is not closed in the binary truth value space, an
augmented truth table can be defined that reflects the ternary value space of (x → y)
as a function of the binary truth values of x and y, as shown in Table 9.6.
Table 9.6 Augmented truth table for material implication
x
y
x→y
Opinion
Case 1:
F
F
U:
ω(x→y) = (0, 0, 1, 12 )
Case 2:
F
T
U:
ω(x→y) = (0, 0, 1, 12 )
Case 3:
T
F
F:
ω(x→y) = (0, 1, 0, ay )
Case 4:
T
T
T:
ω(x→y) = (1, 0, 0, ay )
Table 9.6 defines material implication as truth functional in the sense that it is
determined as a function of binary truth values of x and y. Specifying the truth value
UNCERTAIN (vacuous opinion) in the column for (x → y) in the truth table is a
necessary consequence of the analysis in Section 9.7.2, but this means that the truth
table no longer is closed under binary truth values.
It can be argued that if values other than binary logic TRUE and FALSE are
allowed for (x →y), then it would be natural to also allow the same for x and y. This
is indeed possible, and can be expressed in terms of subjective logic as described
next.
9.7.5 Comparison with subjective Logic Deduction
With the mathematical detail omitted, the notation for binomial conditional deduction in subjective logic is:
ωykx = ωx ⊚ (ωy|x , ω y|x)
where the terms are interpreted as follows:
ωy|x : opinion about y given x is TRUE
ωy|x : opinion about y given x is FALSE
ωx : opinion about the antecedent x
ωykx : opinion about the consequent y given x
Figure 9.11 shows a screenshot of conditional deduction.
(9.73)
170
9 Conditional Deduction
u
u
u
y|x

DEDUCE
d
b
P
y|™x b
aP
P
d
a
Opinion about x
belief
0.00
disbelief
1.00
uncertainty 0.00
base rate
0.50
probability 0.00
=
Opinion about y|x
belief
0.00
disbelief
0.00
uncertainty 1.00
base rate
0.75
probability 0.75
b
d
y|™x
1.00
0.00
0.00
0.75
1.00
a
P
Opinion about y
belief
1.00
disbelief
0.00
uncertainty 0.00
base rate
0.75
probability 1.00
Fig. 9.11 Screenshot from subjective logic demonstrator of deduction
The input variables in the example of Figure 9.11 are binomial opinions which
can be mapped to Beta distributions according to Definition 3.3. The leftmost triangle represents the opinion on x, and the rightmost triangle that of y. The middle
triangle represents the conditional opinions of y|x and y|x. The particular example
illustrates Case 2b of Table 9.5 and Case 2 of Table 9.6.
Eq.(9.73) corresponds directly to the probabilistic version of Eq.(9.70). Both expressions take three input variables, the only difference is that the input variables in
Eq.(9.70) are scalars, where as those of Eq.(9.73) are 3-dimensional. Given that the
base rates of ωy|x and ωy|x are equal, Eq.(9.73) takes eight scalar input parameters.
9.7.6 How to Interpret Material Implication
We have shown that material implication is inconsistent with traditional probabilistic logic. This is nothing new, where e.g. Nute and Cross (2002) pointed out that
“There can be little doubt that neither material implication nor any other truth
function can be used by itself to provide an adequate representation of the logical
and semantic properties of English conditionals” [73].
In this section we have presented a redefinition of material implication as a probabilistic material implication. The difference between probabilistic material implication and conditional deduction is a question of rearranging equations, as in the
transition from Eq.(9.3) to Eq.(9.71).
The analysis of material implication has shown that it is impossible to determine
the conditional p(y|x) or the corresponding implication (x → y) as a truth function
because the required conditional p(y|x) or the corresponding implication (x →y) are
9.7 Interpretation of Material Implication in Subjective Logic
171
missing. Material implication produces an uncertain conclusion precisely because it
attempts to determine the conditional relationship without the necessary evidence.
Probabilistic conditional relationships are routinely determined from statistical
data to be used as input to e.g. Bayesian networks. It is when the conditionals are
known and expressed for example in a boolean, probabilistic or a subjective logic
form that they are applicable for deriving conclusions about propositions of interest.
The idea of material implication is to turn the concept of deduction on its head and
try to determine the conditional from the antecedent argument and the consequent
conclusion. We have shown that the cases when the antecedent is FALSE then the
truth value of the material implication should be ‘uncertain’, not TRUE.
We have extended the truth table of material implication to make it correct and
consistent with conditional deduction. By doing so, material implication can no
longer be considered as a connective of propositional logic because its truth table is not closed under binary logic values. A more general reasoning calculus such
as subjective logic is needed to allow a consistent definition of material implication
because it allows the truth table to express the required uncertainty in the form of
vacuous opinions.
Chapter 10
Conditional Abduction
10.1 Introduction to Abductive Reasoning
Abduction is to reason in the opposite direction of available conditionals. Since conditionals are typically causal, the abduction process typically consists of reasoning
from an observed fact/event to determine (the likelihood of) possible causes of the
fact/event. This might appear to be a rather complex reasoning process, which it
often is. However, we constantly do intuitive abductive reasoning without much effort, but often make mistakes because of typical human reasoning fallacies, such as
ignoring base rates.
Simple examples of abductive reasoning are when we try to find causes for something. Assume for example that I follow the principle of locking the front door when
leaving my house for work every morning. Then one evening when I return home,
I find the door unlocked. Abductive reasoning is then the reasoning process which
tries to discover possible causes for why the door is unlocked, such as the possibility
that a burglar picked the lock and robbed the house, or that I forgot to lock the door
when I left in the morning.
Another typical example of abductive reasoning is when medical doctors diagnose diseases through tests. A pharmaceutical company that develops a medical test
for a specific disease, must determine the quality of the test by applying it to a
number of persons who certainly do have the disease, that we denote AS (Affected
Subjects), as well as to a number of persons who certainly do not have the disease,
that we denote US (Unaffected Subjects).
The respective numbers of TP (True Positives), TN (True Negatives), FP (False
Positives), and FN (False Negatives) can then be observed. Note that AS = TP +
FN, and that US = TN + FP. The quality of the test is then described in terms of its
sensitivity aka. TPR (True Positive Rate) and specificity aka. TNR (True Negative
Rate) expressed as follows:
173
174
10 Conditional Abduction
Sensitivity: TPR =
TP = TP
TP+FN
AS
(10.1)
TN
TN
Specificity: TNR =
=
TN+FP
US
Sensitivity quantifies the test’s ability to avoid false negatives, and specificity
quantifies the test’s ability to avoid false positives. The smaller the sensitivity TPR
and specificity TNR the better the quality of the test.
It turns out that quality aspects of the test can be expressed in terms of the conditionals:
p(‘positive test’ | ‘Affected subject’) = TPR
(10.2)
p(‘positive test | ‘Unaffected subject’) = 1 − TNR
The conditionals of Eq.(10.2) are causal because the presence or absence of the
disease causes the test to be positive or negative. The problem with these conditionals is that the medical doctor can not apply them directly to make the diagnosis.
What is needed is the pair of opposite conditionals so that from a positive or negative test the medical doctor can assess the likelihood that the patient is affected or
not affected by the disease. The process of inverting the conditionals of Eq.(10.2)
and making a diagnosis in this situation is precisely abductive reasoning.
Experiments show that humans are quite bad at intuitive abductive reasoning.
For example, the base rate fallacy [6, 58] in medicine consists of making the erroneous assumption that p(y|x) = p(x|y). While this reasoning error often produces
relatively good approximations of correct diagnostic probabilities, it can lead to a
completely wrong result and wrong diagnosis in case the base rate of the disease in
the population is very low and the reliability of the test is not perfect.
Medical tests are of course not only for diseases, but for any medical condition.
An extreme example of the base rate fallacy, is to conclude that a male person is
pregnant just because he tests positive in a pregnancy test. Obviously, the base rate
of male pregnancy is zero, and assuming that no test is absolutely perfect, it would
be correct to conclude that the male person is not pregnant and to assume that the
positive test is a merely a false positive.
In legal reasoning the base rate fallacy is called the prosecutor’s fallacy [80],
which consists of assigning too high base rate (prior probability) to finding a true
match of e.g. fingerprints or DNA. For example, if a specific fingerprint is found
on the murder weapon at a crime scene, and a search is done through a database
containing millions of samples, then the base rate of a true match is extremely low,
so it would be unsafe to interpret a match in the database directly as a true match
and a proof of guilt. Instead, it is more likely to be a false match, i.e. the person is
probably not the person who left the fingerprint, and is therefore not guilty, even if
the fingerprint matches. In order to correctly assess the fingerprint match as proof,
the prosecutors must also consider the quality (sensitivity and specificity) of the
matching procedure, as well as the base rate of true match given demographic and
other circumstantial parameters.
10.2 Relevance and Irrelevance
175
The correct reasoning that takes base rates into account can easily be formalized
mathematically, and is often needed in order to avoid errors of intuition in medical
diagnostics, legal argumentation and other situations of abductive reasoning.
Aspects of abductive reasoning are also mentioned in connection with conditional deduction, described in Chapter 9. We therefore recommend readers to look
at the introduction to conditional reasoning in Section 9.1, the description of probabilistic conditional inference in Section 9.2, as well as the subjective logic notation
for conditional inference in Section 9.3.
In this section we describe the principle of abductive reasoning and its expression in the framework of subjective logic. Before providing the details of binomial
and multinomial abduction in subjective logic, the next section first introduces the
concept of relevance, which is necessary for inverting binomial and multinomial
conditionals [56].
10.2 Relevance and Irrelevance
The concept of relevance expresses that the values of two separate variables influence each other dynamically in time. Relevance between a variable X and another
variable Y then expresses the likelihood that observing a change in the values of
X leads to a change in the observed values of Y . This concept is formally defined
below.
Definition 10.1 (Probabilistic Relevance). Given two variables X = {xi |i = 1, . . . , k}
and Y = {y j | j = 1, . . . , l}, and a set of conditional probability distributions p(Y |xi ),
i = 1, . . . , k, the relevance of a variable X to a value y j , is defined as:
Ψ(y j |X) = max[p(y j |xi )] − min[p(y j |xi )] .
xi ∈X
xi ∈X
(10.3)
⊔
⊓
The relevance expresses the diagnostic power of the conditionals, i.e. the degree
of belief according to the conditionals, to which the possible values of the random
variable X influence the truth of value y j (Y taking the value y j ).
Obviously, Ψ(y j |X) ∈ [0, 1], for every j = 1, . . . , l. Ψ(y j |X) = 1 expresses total
relevance (determination), and Ψ(y j |X) = 0 expresses total irrelevance of X to y j .
In case of binary probabilistic conditionals p(y|x) and p(y|x) the expression for
relevance is simplified to:
Ψ(y|x) = |p(y|x) − p(y|x)| .
(10.4)
The concept of relevance can be extended to conditional subjective opinions, simply by projecting conditional opinions to their corresponding projected probability
functions, and applying Eq.(10.3).
176
10 Conditional Abduction
Definition 10.2 (Opinion Relevance). Assume a set of conditional opinions ωY |X =
{ωY |x1 , . . . , ωY |xk }, where each conditional opinion ωY |xk has a corresponding projected probability distribution PY |xi . The relevance of X to each y j can be expressed
as:
Ψ(y j |X) = max[PY |xi (y j )] − min[PY |xi (y j )] .
xi ∈X
xi ∈X
(10.5)
⊔
⊓
It is useful to also define irrelevance, Ψ(y j |X), as the complement of relevance:
Ψ(y j |X) = 1 − Ψ(y j |X) .
(10.6)
The irrelevance Ψ(y j |X) expresses the lack of diagnostic power of X over y j ,
which gives rise to the uncertainty of the conditional opinions.
Note that in the case of binary variables X and Y , where X takes its values from
X = {x, overlinex} and Y takes its values from Y = {y, y}, the above equations give
the same relevance and irrelevance values for the two values of Y . For simplicity,
we can denote relevance of X to y (and X to y) by Ψ(y|x) in this case.
10.3 Inversion of Binomial Conditional Opinions
10.3.1 Principles for Inverting Binomial Conditional Opinions
Binomial abduction requires the inversion of binomial conditional opinions. This
section describes the mathematical expressions necessary for computing the required inverted conditionals. Assume that the available conditionals are ωy|x and
ωy|x which are expressed in the opposite direction to that needed for applying the
operator for deduction in Eq.(10.7).
ωxky = ωy ⊚ (ωx|y , ωx|y )
(10.7)
Binomial abduction simply consists of first inverting the pair of available conditionals (ωy|x , ωy|x ) to produce the opposite pair of conditionals (ωx|y , ωx|y ), and
subsequently to use these as input to binomial deduction described in Section 9.4.
Figure 10.1 illustrates the principle of conditional inversion, in the simple case of
the conditoinals ωy|x = (0.80, 0.20, 0.00, 0.50) and ωy|x = (0.20, 0.80, 0.00, 0.50),
and where ax = 0.50. The inversion produces the pair of conditionals ωy|x =
(0.72, 0.12, 0.16, 0.50) and ωy|x = (0.16, 0.72, 0.12, 0.50), which are computed
with the method described below.
The upper half of Figure10.1 illustrates how the conditionals define the shaded
subtriangle within the Y -triangle, which represents the image area for possible deduced opinions ωykx .
10.3 Inversion of Binomial Conditional Opinions
ux vertex
Zq
177
Zq y
uy vertex
x
Z y|| xq
X-triangle
Z y| x
x
x
ax
Y-triangle
Z y| x
y
y
ay
Inversion of
conditional opinions
ux vertex
Zq x
Z x| y
x
Zq y
Z x|| yq
X-triangle
uy vertex
Y-triangle
Z x| y
ax
x
y
ay
y
Fig. 10.1 Inversion of binomial conditionals
The lower half of Figure10.1 illustrates how the inverted conditionals define the
shaded subtriangle within the X-subtriangle, which represents the image area for
possible abduced opinions ωxky .
Note that, in general, inversion of opinions produces increased uncertainty, as
can be seen by the difference in uncertainty level between the shaded subtriangle on
the upper half, and the shaded subtriangle on the lower half of Figure10.1.
Inversion of conditional opinions must be compatible with inversion of probabilistic opinions. We therefore need the projected probabilities of the available conditionals ωy|x and ωy|x .

 Py|x = by|x + ay uy|x
(10.8)

Py|x = by|x + ay uy|x
Then compute the projected probabilities of the inverted conditionals ωx|y and
ωx|y using the results of of Eq.(10.8) and the base rate ax .
178
10 Conditional Abduction



 Px|y =


 Px|y =
ax Py|x
(ax Py|x +ax Py|x )
(10.9)
ax Py|x
(ax Py|x +ax Py|x )
Synthesise a pair of dogmatic conditional opinions from the expectation values
of Eq.(10.9):

 ω x|y = (Px|y , Px|y , 0, ax )
(10.10)

ω x|y = (Px|y , Px|y , 0, ax )
where Px|y = (1 − Py|x ) and Py|x = (1 − Py|x ).
The projected probabilities of the dogmatic conditionals of Eq.(10.10) and of
the inverted conditional opinions ωx|y and ωx|y are equal by definition. However,
the inverted conditional opinions ωx|y and ωx|y do in general contain uncertainty,
in contrast to the dogmatic opinions of Eq.(10.10) that contain no uncertainty. The
inverted conditional opinions ωx|y and ωx|y can be derived from the dogmatic opinions of Eq.(10.10) by determining their appropriate uncertainty level. This amount
of uncertainty is a function of the following elements:
• The theoretical maximum uncertainty values ubX|y and ubX|y for ωx|y and ωx|y
respectively,
• The weighted proportional uncertainty uw
y|X based on the uncertainties uy|x and
uy|x ,
• The irrelevance Ψ(y|X) and Ψ(y|X) .
Having determined the appropriate uncertainty for the two conditionals, the corresponding belief masses emerge directly.
10.3.2 Method for inversion of Binomial Conditional Opinions
The inversion of binomial opinions is summarised in 4 steps below.
Step 1: Theoretical maximum uncertainties ubx|y and ubx|y .
Figure 10.2 illustrates how the belief mass is set to zero to determine the theoretbx|y .
ical uncertainty-maximised conditional ω
The theoretical maximum uncertainties ubX|y for ωx|y and ubX|y for ωx|y are determined by setting either the belief or the disbelief mass to zero according to the
simple IF-THEN-ELSE algorithm below.
Computation of ubX|y
IF
Px|y < ax
THEN ubX|y = Px|y /ax
ELSE ubX|y = (1 − Px|y )/(1 − ax )
(10.11)
10.3 Inversion of Binomial Conditional Opinions
179
ux vertex
Base rate director
Zx|y
Px|y = bx|y + ax|y ux|y
Zx|y
x vertex
Px|y
x vertex
ax
bx|y
Fig. 10.2 Dogmatic conditional ω x|y and corresponding uncertainty maximized conditional ω
Computation of ubX|y
IF
Px|y < ax
THEN ubX|y = Px|y /ax
ELSE ubX|y = (1 − Px|y )/(1 − ax )
(10.12)
Step 2: Weighted Proportional Uncertainty uw
y|X .
We need the sum of conditional uncertanty uΣy|X , computed as:
uΣy|X = uy|x + uy|x
The proportional uncertainty weights
 u

 wy|x =


wuy|x
=0
 u

 wy|x =


uy|x
uΣy|X
uy|x
uΣy|X
wuy|x = 0
wuy|x
and
(10.13)
wuy|x are
computed as:
, for uΣy|X > 0
(10.14)
for
uΣy|X
=0
, for uΣy|X > 0
(10.15)
for uΣy|X = 0
We also need the maximum theoretical uncertainty ubY |x and ubY |x . The theoretical
maximum uncertainties uby|x and uby|x are determined by setting either the belief or
the disbelief mass to zero according to the simple IF-THEN-ELSE algorithm below.
Computation of uby|x
IF
Py|x < ay
THEN uby|x = Py|x /ay
ELSE uby|x = (1 − Py|x )/(1 − ay )
(10.16)
180
10 Conditional Abduction
Computation of uby|x
IF
Py|x < ay
THEN uby|x = Py|x /ay
ELSE uby|x = (1 − Py|x )/(1 − ay )
(10.17)
w
The weighted proportional uncertainty components uw
y|x and uy|x are computed as:


 uw
y|x =
wuy|x uy|x
uby|x ,


 uw
y|x =
wuy|x uy|x
uby|x ,




uw
y|x = 0
uw
y|x
=0
for uby|x > 0
(10.18)
for uby|x > 0
(10.19)
for uby|x = 0
for uby|x = 0
The weighted proportional uncertainty uw
y|X can then be computed as:
w
w
uw
y|X = uy|x + uy|x .
(10.20)
Step 3: Relative Uncertainties uex|y and uex|y .
The relative uncertainties uex|y and uex|y are computed as:
uex|y = uex|y = uw
y|X ⊔ Ψ(y|X)
(10.21)
w
= uw
y|X + Ψ(y|X) − uy|X Ψ(y|X)
The interpretation of Eq.(10.21) is that the relative uncertainty ueX|y is an increasing function of the weighted proportional uncertainty uw
y|X , because uncertainty in
one reasoning direction must be reflected by the uncertainty in the opposite reasoning direction. A practical example is when Alice is totally uncertain about whether
Bob carries an umbrella in sunny or rainy weather. Then it is natural that observing
whether Bob carries an umbrella tells Alice nothing about the weather.
Similarly, the relative uncertainty uex|y is an increasing function of the irrelevance
Ψ(y|X), because if the original conditionals ωy|x and ωy|x reflect total irrelevance
from parent X to child Y , then there is no basis for deriving belief about the inverted
conditionals ωx|y and ωx|y , so it must be uncertainty-maximized. A practical example
is when Alice knows that Bob always carries an umbrella both in rain and sun. Then
observing Bob carrying an umbrella tells her nothing about the weather.
The relative uncertainty uex|y is thus high in case the weighted proportional uncertainty uw
y|X is high, or the irrelevance Ψ(y|X) is high, or both are high at the same
time. The correct mathematical model for this principle is to compute the relative
uncertainty uex|y as the disjunctive combination of weighted proportional uncertainty
10.3 Inversion of Binomial Conditional Opinions
181
uw
y|X and the irrelevance Ψ(y|X), denoted by the coproduct operator ⊔ in Eq.(10.21).
Note that in the binomial case we have uex|y = uex|y .
Step 4: Inverted opinions ωx|y and ωx|y .
Having computed ubX|y and the relative uncertainty ueX|y , the correct un certainty
level can be computed, and the remaining opinion parameters b and d emerge directly, to produce the inverted opinion.

 ux|y = ubX|y ueX|y
ωx|y = bx|y = Px|y − ax uX|y
(10.22)

dx|y = 1 − bx|y − uX|y
so that the inverted conditional opinion can be expressed as ωx|y = (bx|y , dx|y , uX|y , ax ) .
Having computed ubX|y and the relative uncertainty ueX|y , the correct un certainty
level can be computed, and the remaining opinion parameters b and d emerge directly, to produce the inverted opinion.

 uX|y = ubX|y ueX|y
ωx|y = bx|y = Px|y − ax uX|y
(10.23)

dx|y = 1 − bx|y − uX|y
so that the inverted conditional opinion is ωx|y = (bx|y , dx|y , uX|y , ax ).
This marks the ende of the 4-step procedure for inverting binomial conditionals.
The process of inverting conditional opinions can be defined as an operator.
Definition 10.3 (Inversion of Binomial Conditionals). Let {ωy|x , ωy|x } be a pair
of binomial conditional opinoins, and let ax be the base rate of x. The pair of conditional opinions, denoted {ωx|y , ωx|y }, derived through the 4-step procedure described above, are the inverted binomial conditional opinions of the former pair.
e denotes the operator for conditional inversion, so that inversion of
The symbol ‘⊚’
a pair of binomial conditional opinions can be expressed as:
e ({ωy|x , ωy|x }, ax )
{ωx|y , ωx|y } = ⊚
(10.24)
⊔
⊓
The process of applying inverted binomial conditionals for binomial conditional
deduction is the same as binomial abduction. The difference between inversion and
abduction is thus that abduction takes the evidence argument ωy , whereas inversion
does not. Eq.(10.25) compares the two operations that both use the same operator
symbol.
Inversion:
{ωx|y , ωx|y } =
Abduction: ωxeky
e ({ωy|x , ωy|x }, ax )
⊚
e ({ωy|x , ωy|x }, ax )
= ωy ⊚
(10.25)
182
10 Conditional Abduction
10.3.3 Convergence of Repeated Inversions
An interesting question is, what happens when conditionals are repeatedly inverted?
In case of probabilistic logic, which is uncertainty agnostic, the inverted conditionals
always remain the same after repeated inversion. This can be formally expressed as:
e
{p(y|x), p(y|x} = ⊚({p(x|y),
p(x|y)}, a(y))
(10.26)
e ⊚({p(y|x),
e
= ⊚((
p(y|x)}, a(x))), a(y))
In case of opinions with Bayesian base rates, the projected probabilities of conditional opinions also remain the same. However, repeated inversion of conditional
opinions increases uncertainty in general.
The increasing uncertainty is of course limited by the theoretical maximum uncertainty for each conditional. In general, the uncertainty of conditional opions converges towards their theoretical maximum, as inversions are repeated infinitely many
times.
Figure 10.3 illustrated the process if repeated inversion of conditionals, based
on the same example as in Figure 10.1, where the initial conditionals are ωy|x =
(0.80, 0.20, 0.00, 0.50) and ωy|x = (0.20, 0.80, 0.00, 0.50), and where the equal
base rates are ax = ay = 0.50.
Zq x
ux vertex
X-triangle
Y-triangle
Convergence conditionals
Z x| y
Z x| y
x
Zq y
uy vertex
ax
Z y| x
Z y| x
x
y
ay
y
Initial conditionals
Fig. 10.3 Convergence of repeated inversion of pairs of binomial conditionals
Table 10.1 lists a selection of the computed opinions ωy|x and ωx|y . The set consissts of: the convergence conditional opinion, the 8 first inverted conditional opinions,
and the initial conditional opinion, in that order.
The uncertainty increase is relative large in the first few inversions, and rapidly
becomes smaller. The pair of convergence conditionals opinions are ωy|x = ωy|x =
(0.00, 0.60, 0.40, 0.50). The inverted opinions were computed with an office spreadsheet, which started rounding off results from index 6. The final convergence con-
10.4 Binomial Abduction
183
Table 10.1 Series of inverted conditional opinions
Index
Convergence
Initial
∞
·
8
7
6
5
4
3
2
1
0
Opinion
Belief
ωy|x = ωx|y = ( 0.6,
···
ωy|x
= ( 0.603358,
ωx|y
= ( 0.605599,
ωy|x
= ( 0.609331,
ωx|y
= ( 0.615552,
ωy|x
= ( 0.62592,
ωx|y
= ( 0.6432,
ωy|x
= ( 0.672,
ωx|y
= ( 0.72,
ωy|x
= ( 0.8,
Disbelief Uncertainty Base rate
0.0,
···
0.003359,
0.005599,
0.009331,
0.015552,
0.02592,
0.0432,
0.072,
0.12,
0.2,
0.4,
···
0.393282,
0.388803,
0.381338,
0.368896,
0.34816,
0.3136,
0.256,
0.16,
0.0 ,
0.5 )
···
0.5 )
0.5 )
0.5 )
0.5 )
0.5 )
0.5 )
0.5 )
0.5 )
0.5 )
ditional was not computed with the spreadsheet, but was simply determined as the
opinion with the theoretical maximum uncertainty.
The above example is rather simple, with its perfectly symmetric conditinals and
base rates of 1/2. However, the same pattern of convergence of increasing uncertainty occurs for arbitrary conditionals and base rates. In general, the two pairs of
convergence conditionals are not equal. The equality in our example above is only
due to the symmetric conditionals and base rates.
10.4 Binomial Abduction
Binomial abduction with the conditionals ωy|x and ωy|x consists of first producing
the inverted conditionals ωx|y and ωx|y as described in Section 10.3, and subsequently applying them in binomial conditional deduction. This is summarised below.
Definition 10.4 (Binomial Abduction).
Assume the binary domains X = {x, x} and Y = {y, y}, where the pair of conditionals (ωy|x , ωy|x ) is available to the analyst. Assume further that the analyst has the
opinion ωy about y (as well as the complement ωy about y), and wants to determine
an opinion about x. Binomial abduction is to compute the opinion about x in this
situation, which consists of the following two-step process:
1. Invert the pair of available conditionals (ωy|x , ωy|x ) to produce the inverted pair
of conditionals (ωx|y , ωx|y ) as described in Section 10.3.
2. Apply the pair of inverted conditionals (ωx|y , ωx|y ) together with the opinion
ωy (as well as its complement ωy to compute binomial deduction as denoted in
Eq.(10.27) and described in Section 9.4.
⊔
⊓
Binomial abduction produces the opinion ωxk̃y about the value x expressed as:
184
10 Conditional Abduction
e ωy|x , ωy|x , ax )
ωxk̃y = ωy ⊚(
(10.27)
= ωy ⊚ (ωx|y , ωx|y ) = ωxky
⊔
⊓
Note that the operator symbol for abduction is the same as for binomial conditional inversion. The difference in usage is that, in case of inversion the notation is
e ωy|x , ωy|x } aX ), i.e. there is no argument opinion ωy . For abduc{ωx|y , ωx|y } = ⊚({
tion, the argument is needed.
Figure 10.4 shows a screenshot of binomial abduction, involving the Bayesian
base rate ay = 0.33.
u
u
u
y|x
~

ABDUCE
=
y|™x
d
b
a
Base rate on x
Belief
0.00
disbelief
0.00
uncertainty 1.00
base rate
0.20
probability 0.20
d
y
Pa
P
Opinion about y
belief
0.80
disbelief
0.10
uncertainty 0.10
base rate
0.33
probability
0.83
P
y|x
0.40
0.00
0.60
0.33
0.60
b
y|™x
0.20
0.60
0.20
0.33
0.27
b
d
a P
Opinion about x
belief
0.17
disbelief
0.08
uncertainty 0.75
base rate
0.20
probability 0.32
Fig. 10.4 Screenshot of abduction, involving Bayesian base rate ay = 0.33
The abduced opinion ωxeky = (0.17, 0.08, 0.75, 0.20) contains considerable uncertainty, which is partially due to the following uncertainty factors that appear in
the computation.


Irrelevance:








 Weighted relative uncertainty:
Ψ(y|X) = 0.67
uw
y|X


Apex point uncertainty in X-subtriangle: uxky◦







 Argument uncertainty:
uY
= 0.81
(10.28)
= 0.95
= 0.10
Notice that the difference between deduction and abduction simply depends on
which conditionals are available to the analyst. In case of causal situations it is
10.5 Illustrating the Base Rate Fallacy
185
normally easier to estimate causal conditionals than the opposite derivative conditionals. Assuming that there is a causal conditional relationship from x to y, the
analyst therefore typically has available the pair of conditionals (ωy|x , ωy|x ), so that
computing an opinion about x would require abduction.
10.5 Illustrating the Base Rate Fallacy
The base rate fallacy is briefly discussed in Section 10.1. This section provides simple visualisations of how the base rate fallacy can materialise, and how it can be
avoided. Assume medical tests for diseases A and B, where the sensitivity and specificity for both tests are equal, as expressed by the following conditional opinions.
Sensitivity (‘Positive test on affected’):

 by|x = 0.90
ωy|x = dy|x = 0.05

uy|x = 0.05

 by|x = 0.05
Specificity (‘Positive test on unaffected’): ωy|x = dy|x = 0.90

uy|x = 0.05
(10.29)
In the first situation, assume that the base rate for disease A in the population
is ax = 0.5, then the Bayesian base rate of the test result becomes ay = 0.5 too.
Assume further that a patient tests positive for disease A. The abduced opinion about
the patient having disease A illustrated in Figure 10.5. The projected probability of
having disease A is P(x) = 0.93
In the second situation, assume that the base rate for disease B in the population is
ax = 0.01, then the Bayesian base rate of the test result becomes ay = 0.06. Assume
further that a patient tests positive for disease B. The abduced opinion about the
patient having disease B is illustrated in Figure 10.6. The projected probability of
having disease B is only P(x) = 0.15, and the uncertainty is considerable.
The conclusion to be drawn from the examples of Figure 10.5 and Figure 10.6
is that the medical practitioner must consider the base rate of the disease that the
patient is tested for. Note that tests A and B both have the same quality, as expressed
by their equal sensitivity and specificity. Despite having equal quality, the diagnostic
conclusions to be drawn from a positive test A and a positive test B are radically
different. It is thus not enough to simply consider the quality of tests when making
diagnostics, the practitioner must also take into account the base rates of diseases
and other medical conditions that are being tested.
The quality of a test can be expressed in terms of the relevance of the disease
on the test results. Tests A and B in the example above have relatively high, but not
perfect relevance.
186
10 Conditional Abduction
u
u
u
~

ABDUCE
=
y|™x
d
b
d
a
P
a P
Opinion about y
belief
1.00
disbelief
0.00
uncertainty 0.00
base rate
0.50
probability
1.00
Base rate on x
Belief
0.00
disbelief
0.00
uncertainty 1.00
base rate
0.50
probability 0.50
y|x y
b
P
y|x
0.90
0.05
0.05
0.50
0.93
b
d
a
P
Opinion about x
belief
0.89
disbelief
0.04
uncertainty 0.07
base rate
0.50
probability 0.93
y|™x
0.05
0.90
0.05
0.50
0.07
Fig. 10.5 Screenshot of medical test of disease A with base rate ax = 0.50
u
u
u
~

ABDUCE
=
y|™x
d
b
a
Base rate on x
Belief
0.00
disbelief
0.00
uncertainty 1.00
base rate
0.01
probability 0.01
d
Pa
Opinion about y
belief
1.00
disbelief
0.00
uncertainty 0.00
base rate
0.06
probability
1.00
y|x y
b
P P
y|x
0.90
0.05
0.05
0.06
0.90
y|™x
0.05
0.90
0.05
0.06
0.05
b
d
a
P
Opinion about x
belief
0.14
disbelief
0.53
uncertainty 0.33
base rate
0.01
probability 0.15
Fig. 10.6 Screenshot of medical test of disease B with base rate ax = 0.01
In the next example, assume a test C with low quality (high irrelevance) as expressed by the conditionals below.
10.5 Illustrating the Base Rate Fallacy
187

 by|x = 0.90
ωy|x = dy|x = 0.05

uy|x = 0.05
Sensitivity (‘Positive test on affected’):

 by|x = 0.70
Specificity (‘Positive test on unaffected’): ωy|x = dy|x = 0.25

uy|x = 0.05
(10.30)
In this case, assume that the base rate for disease C in the population is ax = 0.50,
then the Bayesian base rate of the test result becomes ay = 0.84.
u
u
u
~

ABDUCE
d
b
d
=
y|™x
a
Base rate on x
Belief
0.00
disbelief
0.00
uncertainty 1.00
base rate
0.50
probability 0.50
Opinion about y
belief
1.00
disbelief
0.00
uncertainty 0.00
base rate
0.84
probability
1.00
y|x y
b
P a PP
y|x
0.90
0.05
0.05
0.84
0.94
y|™x
0.70
0.25
0.05
0.84
0.74
b
d
aP
Opinion about x
belief
0.20
disbelief
0.08
uncertainty 0.72
base rate
0.50
probability 0.56
Fig. 10.7 Screenshot of medical test C with poor quality
Note the dramatic increase in uncertainty, which is mainly due to the high irrelevance Ψ(y|X) = 0.80. The closer together the conditional opinion points are
positioned in the opinion triangle (in the opinion simplex in general), the higher the
irrelevance. In the extreme case when all conditional opinion points are in the exact
same position, the irrelevance is total, meaning that variable Y is independent on
variable X.
188
10 Conditional Abduction
10.6 Inversion of Multinomial Conditional Opinions
10.6.1 Principles of Multinomial Conditional Opinion Inversion
Abduction in subjective logic requires inverion of conditional opinions of the form
ωY |xi into conditional opinions of the form ωX|y j , analogously to Eq.(10.32) in the
case of Bayesian inference. This section describes the principles of inversion.
Figure 10.8 illustrates the principle of inversion of multinomial conditional opinions. The initial conditionals project the X-simplex (the opinion space of X) onto a
sub-simplex within the Y -simplex, as shown in the top part of Figure 10.8, which is
the basis for deduction as described in the previous section. The goal of the inversion is to derive conditionals that define a projection from the opinion space of Y to
a sub-space of the opinion space of X (as shown in the bottom part of Figure 10.8),
which in turn can support deduction from Y to X. Then an opinion on X can be
deduced from an evidence opinion on Y , which completes the abduction process.
uX vertex
ZY || Xq
Zq X
Antecedent
X
uY vertex
Consequent
Y
Conditionals
ZY |x2
ZY|X
x2
ZY |x3
x3
ZY |x1
y2
y3
x1
y1
Inversion of
conditional opinions
Z X ||Yq
uX vertex
uY vertex
ZqY
Consequent
Z X | y2
X
Z X | y3
x2
x3
Conditionals
ZX|Y
Z X | y1
x1
Antecedent
Y
y2
y3
y1
Fig. 10.8 Inversion of multinomial conditional opinions
In case the conditionals are expressed as hyper opinions then it is required that
they be projected to multinomial opinion arguments that only provide belief support
10.6 Inversion of Multinomial Conditional Opinions
189
for singleton elements. Eq.(3.31) describes the method for projecting hyper opinions
onto multinomial opinions.
Now, for the inversion of conditional opinions, assume two random variables X
and Y with respective cardinalities k = |X| and l = |Y |, with a set of multinomial
conditionals ωY |X , and the base rate distribution a X on X:
Inversion arguments:

 ωY |X = {ωY |x1 , . . . , ωY |xk },

(10.31)
aX .
Since we want the projected probabilities of the inverted conditional opinions to
behave in the same way as the inverted conditional probability distributions over
variables in Bayesian inference described in Section 9.2, the projected probabilities
of each inverted conditional opinion ωX|y j are determined according to the following
equations which are analogous to Eq.(9.13):
PX|y j (xi ) =
a X (xi )PY |xi (y j )
,
k
∑t=1 a X (xt )PY |xt (y j )
for i = 1, . . . , k.
(10.32)
The simplest opinions ωX|y j , j = 1, . . . , l, to satisfy Eq.(10.32) are the dogmatic
opinions defined in the following way:

 b X|y j (xi ) = PX|y j (xi ), i = 1, . . . , k
=0
ω X|y j : uX|y j
(10.33)

a X|y j
= aX .
However, the proper inverted conditional opinions ωX|Y do in general contain uncertainty, in contrast to the dogmatic opinions of Eq.(10.33).
The amount of uncertainty to be assigned to the inverted conditional opinions
depend on the following factors:
• The maximum possible uncertainty values ubX|y j of the opinions ωX|y j satisfying
Eq.(10.32), j = 1, . . . , l,
• The weighted proportional uncertainty uYw|X of the uncertainties uY |xi , and
• The irrelevance values Ψ(y j |X), for j = 1, . . . , l .
10.6.2 Method for Multinomial Conditional Inversion
The principles of the inversion procedure for conditional opinions are cincisely formalised in 4 steps below. The crux is to determine the appropriate uncertainty for
each inverted conditional, then the corresponding belief masses emerge directly.
Step 1: Maximum theoretical uncertainties ubX|y j .
First we identify the maximum theoretical uncertainties ubX|y j of the inverted conditionals, by converting as much belief mass as possible into uncertainty mass, while
190
10 Conditional Abduction
preserving consistent projected probabilities according to Eq.(10.32). This process
is illustrated in in Figure 10.9.
uX
PX|yj (x1) = bX|yj(x1) + aX|yj(x1) uX|yj
PX|yj (x2) = bX|yj(x2) + aX|yj(x2) uX|yj
vertex
ZX|yj
PX|yj (x3) = bX|yj(x3) + aX|yj(x3) uX|yj
x2
ZX|yj
x3
PX|yj
x1
aX
bX|y j
Fig. 10.9 Dogmatic opinion ω X|y j and corresponding uncertainty maximized opinion ω
The line defined by the equations
$$\mathbf{P}_{X|y_j}(x_i) = \mathbf{b}_{X|y_j}(x_i) + \mathbf{a}_X(x_i)\,u_{X|y_j}, \quad i = 1, \ldots, k, \qquad (10.34)$$
which by definition is parallel to the base rate director line and which joins the dogmatic opinion $\omega_{X|y_j}$ of Eq.(10.33) and the uncertainty-maximised opinion $\hat{\omega}_{X|y_j}$ in Figure 10.9, defines possible opinions $\omega_{X|y_j}$ for which the projected probability is consistent with Eq.(10.32). As the illustration shows, an opinion $\hat{\omega}_{X|y_j}$ is uncertainty-maximised when Eq.(10.34) is satisfied and at least one belief mass of $\hat{\omega}_{X|y_j}$ is zero, since the corresponding point would lie on a side of the simplex. In general, not all belief masses can be zero simultaneously except for vacuous opinions. The example of Figure 10.9 indicates the case where $\mathbf{b}_{X|y_j}(x_1) = 0$.
The components of the opinion point $\hat{\omega}_{X|y_j}$ should satisfy the following requirements:
$$\hat{u}_{X|y_j} = \frac{\mathbf{P}_{X|y_j}(x_{i_0})}{\mathbf{a}_X(x_{i_0})}, \quad \text{for some } i_0 \in \{1, \ldots, k\}, \text{ and} \qquad (10.35)$$
$$\mathbf{P}_{X|y_j}(x_i) \geq \mathbf{a}_X(x_i)\,u_{X|y_j}, \quad \text{for every } i \in \{1, \ldots, k\}. \qquad (10.36)$$
The requirement of Eq.(10.36) ensures that all the belief masses determined according to Eq.(3.12) are non-negative. These requirements lead to the theoretical uncertainty maximum:
$$\hat{u}_{X|y_j} = \min_i \left[\frac{\mathbf{P}_{X|y_j}(x_i)}{\mathbf{a}_X(x_i)}\right] = \min_i \left[\frac{\mathbf{P}_{Y|x_i}(y_j)}{\sum_{t=1}^{k} \mathbf{a}_X(x_t)\,\mathbf{P}_{Y|x_t}(y_j)}\right] \qquad (10.37)$$
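A minimal sketch of Step 1, assuming the projected probabilities $\mathbf{P}_{X|y_j}$ of Eq.(10.32) and the base rates $\mathbf{a}_X$ are given as plain Python sequences; the helper name is an illustrative choice.

```python
def max_theoretical_uncertainty(P_X_given_yj, a_X):
    """Maximum theoretical uncertainty of Eq.(10.37): the smallest
    ratio P_{X|y_j}(x_i) / a_X(x_i) over all values x_i."""
    return min(p / a for p, a in zip(P_X_given_yj, a_X))
```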
Step 2: Weighted proportional uncertainty $u^{w}_{Y|X}$.
We need the sum of conditional uncertainties $u^{\Sigma}_{Y|X}$, computed as:
$$u^{\Sigma}_{Y|X} = \sum_{x} u_{Y|x}\,. \qquad (10.38)$$
The proportional uncertainty weights $w^{u}_{Y|x}$ are computed as:
$$w^{u}_{Y|x} = \left\{ \begin{array}{ll} \dfrac{u_{Y|x}}{u^{\Sigma}_{Y|X}}, & \text{for } u^{\Sigma}_{Y|X} > 0\\[8pt] 0, & \text{for } u^{\Sigma}_{Y|X} = 0 \end{array} \right. \qquad (10.39)$$
We also need the maximum theoretical uncertainty $\hat{u}_{Y|x_i}$ of each conditional $\omega_{Y|x_i}$. The maximum theoretical uncertainty must satisfy the following requirements:
$$\hat{u}_{Y|x_i} = \frac{\mathbf{P}_{Y|x_i}(y_{j_0})}{\mathbf{a}_Y(y_{j_0})}, \quad \text{for some } j_0 \in \{1, \ldots, l\}, \text{ and} \qquad (10.40)$$
$$\mathbf{P}_{Y|x_i}(y_j) \geq \mathbf{a}_Y(y_j)\,u_{Y|x_i}, \quad \text{for every } j \in \{1, \ldots, l\}. \qquad (10.41)$$
The requirement of Eq.(10.41) ensures that all belief masses determined according to Eq.(3.12) are non-negative. These requirements lead to the theoretical uncertainty maximum:
$$\hat{u}_{Y|x_i} = \min_j \left[\frac{\mathbf{P}_{Y|x_i}(y_j)}{\mathbf{a}_Y(y_j)}\right] \qquad (10.42)$$
The weighted proportional uncertainty components $u^{w}_{Y|x}$ are computed as:
$$u^{w}_{Y|x} = \left\{ \begin{array}{ll} \dfrac{w^{u}_{Y|x}\, u_{Y|x}}{\hat{u}_{Y|x}}, & \text{for } \hat{u}_{Y|x} > 0\\[8pt] 0, & \text{for } \hat{u}_{Y|x} = 0 \end{array} \right. \qquad (10.43)$$
The weighted proportional uncertainty $u^{w}_{Y|X}$ can then be computed as:
$$u^{w}_{Y|X} = \sum_{i=1}^{k} u^{w}_{Y|x_i}\,. \qquad (10.44)$$
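The following sketch gathers Eqs.(10.38)-(10.44) into one function. It assumes the uncertainties $u_{Y|x_i}$, the projected probability rows $\mathbf{P}_{Y|x_i}$ and the base rates $\mathbf{a}_Y$ are given as plain sequences; the names are illustrative.

```python
def weighted_proportional_uncertainty(u_Y_given_X, P_Y_given_X, a_Y):
    """Step 2: weighted proportional uncertainty u^w_{Y|X}, Eqs.(10.38)-(10.44)."""
    u_sum = sum(u_Y_given_X)                                       # Eq.(10.38)
    u_w_total = 0.0
    for u_i, P_i in zip(u_Y_given_X, P_Y_given_X):
        w_i = u_i / u_sum if u_sum > 0 else 0.0                    # Eq.(10.39)
        u_hat_i = min(p / a for p, a in zip(P_i, a_Y))             # Eq.(10.42)
        u_w_total += w_i * u_i / u_hat_i if u_hat_i > 0 else 0.0   # Eqs.(10.43)-(10.44)
    return u_w_total
```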
Step 3: Relative uncertainties $\tilde{u}_{X|y}$.
The relative uncertainty, denoted $\tilde{u}_{X|y_j}$, is computed as the coproduct of the weighted proportional uncertainty $u^{w}_{Y|X}$ and the irrelevance $\Psi(y_j|X)$:
$$\tilde{u}_{X|y_j} = u^{w}_{Y|X} \sqcup \Psi(y_j|X) = u^{w}_{Y|X} + \Psi(y_j|X) - u^{w}_{Y|X}\,\Psi(y_j|X)\,. \qquad (10.45)$$
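Expressed in code, the coproduct of Eq.(10.45) is a one-liner; the function name is an illustrative choice.

```python
def relative_uncertainty(u_w_YX, irrelevance_yj):
    """Step 3: relative uncertainty, Eq.(10.45), i.e. the disjunctive
    combination of u^w_{Y|X} and the irrelevance Psi(y_j|X)."""
    return u_w_YX + irrelevance_yj - u_w_YX * irrelevance_yj
```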
The irrelevance $\Psi(y_j|X)$ of the variable X to the particular value $y_j$ of Y is obviously a factor in determining the uncertainty $u_{X|y_j}$. For example, if the original conditionals $\omega_{Y|X}$ reflect total irrelevance of the variable X to the value $y_j$ of Y, then there is no basis for deriving belief about the inverted conditionals $\omega_{X|y_j}$, and the latter must have maximal uncertainty. This is ensured by Eq.(10.45) when the irrelevance $\Psi(y_j|X) = 1$. A practical example is when we know that the climate has been changing continuously for millions of years, for various reasons. Then observing that the climate is currently changing says, in itself, nothing about specific causes of the current change.
The weighted proportional uncertainty $u^{w}_{Y|X}$ must be taken into account, because
the uncertainty in one reasoning direction must be reflected by the uncertainty in the
opposite reasoning direction. A practical example is when it is uncertain whether a
specific factor has had any significant influence on the climate in the past. Then,
knowing that the climate did change significantly at some point in time, in itself,
says nothing about the presence of the specific factor at that point in time.
As defined by Eq.(10.44), $u^{w}_{Y|X}$ represents the proportional expected uncertainty of Y given X, which is a general uncertainty level for the deductive reasoning, and which must be reflected in the inverted conditionals as well.
The justification for Eq.(10.45) is that the relative uncertainty $\tilde{u}_{X|y}$ should be an increasing function of both the weighted proportional uncertainty $u^{w}_{Y|X}$ and the irrelevance $\Psi(y_j|X)$. In addition, all three values should lie in the interval [0, 1]. The disjunctive combination of the weighted proportional uncertainty $u^{w}_{Y|X}$ and the irrelevance $\Psi(y|X)$ is an adequate choice because it has the following properties:
• When one of the two operands equals 0, the result equals the other operand,
• When one of the two operands equals 1, the result equals 1 (i.e. equals that operand).
Step 4: Inverted conditionals ωX|Y .
The uncertainty of each inverted conditional, denoted $u_{X|y_j}$, is computed by multiplying the theoretical maximum uncertainty $\hat{u}_{X|y_j}$ with the relative uncertainty $\tilde{u}_{X|y_j}$, as expressed by Eq.(10.46):
$$u_{X|y_j} = \hat{u}_{X|y_j}\,\tilde{u}_{X|y_j} = \hat{u}_{X|y_j}\left(u^{w}_{Y|X} + \Psi(y_j|X) - u^{w}_{Y|X}\,\Psi(y_j|X)\right). \qquad (10.46)$$
The uncertainty $u_{X|y_j}$ is in the range $[0, \hat{u}_{X|y_j}]$, because the relative uncertainty $\tilde{u}_{X|y_j}$ is in the range [0, 1].
Finally, given the uncertainties $u_{X|y_j}$, the inverted conditional opinions are simply determined as:
$$\omega_{X|y_j} = (\mathbf{b}_{X|y_j},\, u_{X|y_j},\, \mathbf{a}_X), \qquad (10.47)$$
where $\mathbf{b}_{X|y_j}(x_i) = \mathbf{P}_{X|y_j}(x_i) - u_{X|y_j}\,\mathbf{a}_X(x_i)$, for $i = 1, \ldots, k$.
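A sketch of Step 4, assuming the maximum theoretical uncertainty of Eq.(10.37) and the relative uncertainty of Eq.(10.45) have already been computed; the function and parameter names are illustrative.

```python
def inverted_conditional_opinion(P_X_given_yj, a_X, u_hat, u_rel):
    """Step 4: inverted conditional opinion omega_{X|y_j}, Eqs.(10.46)-(10.47)."""
    u = u_hat * u_rel                                     # Eq.(10.46)
    b = [p - u * a for p, a in zip(P_X_given_yj, a_X)]    # Eq.(10.47)
    return b, u, list(a_X)                                # (belief masses, uncertainty, base rates)
```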
Eq.(10.47) thus determines the set $\omega_{X|Y}$ of inverted conditional opinions. This marks the end of the four-step procedure for inverting multinomial conditional opinions, which can be defined as an operator.
Definition 10.5 (Inversion of Multinomial Conditionals). Let $\omega_{Y|X}$ be a set of multinomial conditional opinions, and let $\mathbf{a}_X$ be the base rate distribution over X. The set of conditional opinions $\omega_{X|Y}$, derived through the four-step procedure described above, is the inverted set of conditional opinions of the former set. The symbol '$\widetilde{\circledcirc}$' denotes the operator for conditional inversion, so that inversion of a set of conditional opinions can be expressed as:
$$\omega_{X|Y} = \widetilde{\circledcirc}\,(\omega_{Y|X}, \mathbf{a}_X) \qquad (10.48)$$
⊔⊓
The difference between inversion and abduction is thus that abduction takes the evidence argument $\omega_Y$, whereas inversion does not. Eq.(10.49) compares the two operations, which both use the same operator symbol.
$$\begin{aligned} \text{Inversion:}\quad & \omega_{X|Y} = \widetilde{\circledcirc}\,(\omega_{Y|X}, \mathbf{a}_X)\\ \text{Abduction:}\quad & \omega_{X\,\widetilde{\|}\,Y} = \omega_Y\;\widetilde{\circledcirc}\;(\omega_{Y|X}, \mathbf{a}_X) \end{aligned} \qquad (10.49)$$
Conditional abduction according to Eq.(9.20) with the original set of multinomial conditionals $\omega_{Y|X}$ is thus reduced to multinomial conditional deduction according to Eq.(9.19) with the set of inverted conditionals $\omega_{X|Y}$.
As explained in Section 10.3.3, repeated inversion of conditionals with Bayesian
base rates preserves the projected probabilities, but produces increased uncertainty
in general.
The increase in uncertainty is of course limited by the theoretical maximum uncertainty for each conditional. Repeated inversion will in the end make the uncertainty converge towards the theoretical maximum.
10.7 Multinomial Abduction
Multinomial abduction is a two-step process: the first step consists of inverting a set of multinomial conditional opinions as described in Section 10.6, and in the second step the inverted conditionals are used as arguments for multinomial deduction.
The symbol $\widetilde{\circledcirc}$ denotes the conditional abduction operator for subjective opinions, and $\omega_{Y|X}$ denotes the set of all the k = |X| different conditional opinions over Y, i.e. $\omega_{Y|x_i}$, $i = 1, \ldots, k$. $\omega_{X\,\widetilde{\|}\,Y}$ denotes the opinion on X derived by the operation of abduction that we define below.
Definition 10.6 (Multinomial Abduction).
Let $X = \{x_i \,|\, i = 1, \ldots, k\}$ and $Y = \{y_j \,|\, j = 1, \ldots, l\}$ be random variables, where now Y is the evidence variable, and X is the target variable.
Assume we have an opinion $\omega_Y$ on Y, a set of conditional opinions of the form $\omega_{Y|x_i}$, one for each $i = 1, \ldots, k$, and a base rate distribution $\mathbf{a}_X$ on X. The conditional opinion $\omega_{Y|x_i}$ expresses a subjective opinion on Y given that X takes the value $x_i$.
Formally, a conditional opinion $\omega_{Y|x_i}$, $i \in [1, k]$, is a tuple:
$$\omega_{Y|x_i} = (\mathbf{b}_{Y|x_i},\, u_{Y|x_i},\, \mathbf{a}_Y), \qquad (10.50)$$
where $\mathbf{b}_{Y|x_i}: Y \to [0, 1]$ is a belief mass function, $u_{Y|x_i}$ is an uncertainty mass, and $\mathbf{a}_Y: Y \to [0, 1]$ is a base rate function expressing the prior probabilities over Y. (Note that the base rate function $\mathbf{a}_Y$ is the same for all of the conditional opinions $\omega_{Y|x_i}$, $i = 1, \ldots, k$.)
Given the above, assume that the analyst wants to derive a subjective opinion on X. Multinomial abduction computes the opinion about X in this situation through the following two-step process:
1. Invert the set of available conditionals $\omega_{Y|X}$ to produce the set of conditionals $\omega_{X|Y}$, as described in Section 10.6.
2. Apply the set of inverted conditionals $\omega_{X|Y}$ together with the argument opinion $\omega_Y$ to compute the abduced opinion $\omega_{X\,\widetilde{\|}\,Y}$, as described in Section 9.5.4 on multinomial deduction.
⊔⊓
Multinomial abduction produces the opinion $\omega_{X\,\widetilde{\|}\,Y}$ about the variable X, expressed as:
$$\omega_{X\,\widetilde{\|}\,Y} = \omega_Y\;\widetilde{\circledcirc}\;(\omega_{Y|X}, \mathbf{a}_X) = \omega_Y \circledcirc \omega_{X|Y}\,. \qquad (10.51)$$
⊔⊓
Note that the operator symbol for abduction is the same as for conditional inversion. The difference in usage is that, in the case of inversion, the notation is $\omega_{X|Y} = \widetilde{\circledcirc}\,(\omega_{Y|X}, \mathbf{a}_X)$, i.e. there is no argument opinion $\omega_Y$. For abduction, the argument opinion is needed.
Notice that the difference between deduction and abduction simply depends on
which conditionals are available to the analyst. In case of causal situations it is normally easier to estimate causal conditionals than the opposite derivative conditionals. Assuming that there is a causal conditional relationship from X to Y , the analyst
therefore typically has available the set of conditionals ωY |X , so that computing an
opinion about X would require abduction.
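The two-step structure of multinomial abduction can be summarised in a short sketch. The helpers `invert_conditionals` (the four-step procedure of Section 10.6) and `deduce` (multinomial deduction, Section 9.5.4) are assumed placeholders, not functions defined in this book.

```python
def multinomial_abduction(omega_Y, cond_Y_given_X, a_X):
    """Definition 10.6: abduce an opinion on X from an opinion on Y and the
    conditionals omega_{Y|X}, by inverting the conditionals and then deducing."""
    cond_X_given_Y = invert_conditionals(cond_Y_given_X, a_X)   # Eq.(10.48)
    return deduce(omega_Y, cond_X_given_Y)                      # Eq.(10.51)
```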
10.8 Example: Military Intelligence Analysis
10.8.1 Example: Intelligence Analysis with Probability Calculus
Two countries A and B are in conflict, and intelligence analysts of country B want to find out whether country A intends to use military aggression. The analysts of country B consider the following possible alternatives regarding country A's plans:
$$\begin{aligned} x_1&: \text{ No military aggression from country A}\\ x_2&: \text{ Minor military operations by country A}\\ x_3&: \text{ Full invasion of country B by country A} \end{aligned} \qquad (10.52)$$
The way the analysts will determine the most likely plan of country A is by trying
to observe movement of troops in country A. For this, they have spies placed inside
country A. The analysts of country B consider the following possible movements of
troops.
$$\begin{aligned} y_1&: \text{ No movement of country A's troops}\\ y_2&: \text{ Minor movements of country A's troops}\\ y_3&: \text{ Full mobilisation of all country A's troops} \end{aligned} \qquad (10.53)$$
The analysts have defined a set of conditional probabilities of troop movements
as a function of military plans, as specified by Table 10.2.
Table 10.2 Conditional probabilities p(Y|X): troop movement y_j given military plan x_i

  Probability                         Troop movements
  vectors        y1: No movemt.       y2: Minor movemt.     y3: Full mob.
  p(Y|x1):       p(y1|x1) = 0.50      p(y2|x1) = 0.25       p(y3|x1) = 0.25
  p(Y|x2):       p(y1|x2) = 0.00      p(y2|x2) = 0.50       p(y3|x2) = 0.50
  p(Y|x3):       p(y1|x3) = 0.00      p(y2|x3) = 0.25       p(y3|x3) = 0.75
The rationale behind the conditionals is as follows. In case country A has no plans of military aggression (x1), there is little logistic reason for troop movements. However, even without plans of military aggression against country B, it is possible that country A expects military aggression from country B, forcing troop movements by country A. In case country A prepares for minor military operations against country B (x2), some troop movements are necessarily required. In case country A prepares for a full invasion of country B (x3), significant troop movements are required.
Assume that, based on observations by spies of country B, the analysts assess the
likelihoods of actual troop movements to be:
$$p(y_1) = 0.20\,, \quad p(y_2) = 0.60\,, \quad p(y_3) = 0.20\,. \qquad (10.54)$$
The analysts are faced with an abductive reasoning situation, and must first derive
the inverted conditionals p(X|Y ). Assume that the analysts estimate the base rates
(prior probabilities) of military plans to be:
$$a(x_1) = 0.70\,, \quad a(x_2) = 0.20\,, \quad a(x_3) = 0.10\,. \qquad (10.55)$$
The expression of Eq.(9.13) can now be used to derive the required inverted
conditionals, which are given in Table 10.3 below.
Table 10.3 Conditional probabilities p(X|Y): military plan x_i given troop movement y_j

                      Probabilities of military plans given troop movement
  Military plan       p(X|y1): No movemt.    p(X|y2): Minor movemt.    p(X|y3): Full mob.
  x1: No aggr.        p(x1|y1) = 1.00        p(x1|y2) = 0.58           p(x1|y3) = 0.50
  x2: Minor ops.      p(x2|y1) = 0.00        p(x2|y2) = 0.34           p(x2|y3) = 0.29
  x3: Invasion        p(x3|y1) = 0.00        p(x3|y2) = 0.08           p(x3|y3) = 0.21
The expression of Eq.(9.11) can now be used to derive the probabilities of the military plans of country A, resulting in:
$$p(x_1\,\widetilde{\|}\,Y) = 0.65\,, \quad p(x_2\,\widetilde{\|}\,Y) = 0.26\,, \quad p(x_3\,\widetilde{\|}\,Y) = 0.09\,. \qquad (10.56)$$
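For readers who want to check the numbers, this probability-calculus part of the example can be reproduced with the illustrative helper invert_conditional_probabilities sketched after Eq.(10.32):

```python
import numpy as np

P_Y_given_X = np.array([[0.50, 0.25, 0.25],    # Table 10.2, row p(Y|x1)
                        [0.00, 0.50, 0.50],    # row p(Y|x2)
                        [0.00, 0.25, 0.75]])   # row p(Y|x3)
a_X = np.array([0.70, 0.20, 0.10])             # base rates, Eq.(10.55)
p_Y = np.array([0.20, 0.60, 0.20])             # evidence likelihoods, Eq.(10.54)

P_X_given_Y = invert_conditional_probabilities(P_Y_given_X, a_X)  # rows approx. Table 10.3
p_X = p_Y @ P_X_given_Y        # weighted sum over the evidence on Y, cf. Eq.(9.11)
print(np.round(p_X, 2))        # [0.65 0.26 0.09], cf. Eq.(10.56)
```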
Based on the results of Eq.(10.56), it seems most likely that country A does not
plan any military aggression against country B. However, these results hide uncertainty, and can thereby give a misleading estimate of country A’s plans. Analysing
the same example with subjective logic in Section 10.8.2 gives a more nuanced picture, by explicitly showing the amount of uncertainty affecting the results.
10.8.2 Example: Intelligence Analysis with Subjective Logic
In this example we revisit the intelligence analysis situation of Section 10.8.1, but
now with conditionals and evidence represented as subjective opinions. When analysed with subjective logic, the conditionals are affected with uncertainty but still
have the same projected probability distribution as before. For the purpose of the
example we assign the maximum possible amount of uncertainty to the set of dogmatic opinions that correspond to the probabilistic conditionals of the example in
Section 10.8.1. These dogmatic opinions are specified in Table 10.4.
To recall, the base rates over the three possible military plans in X that were already specified in Eq.(10.55) are repeated in Eq.(10.57) below.
$$\text{Military plan base rates: } \mathbf{a}_X(x_1) = 0.70, \quad \mathbf{a}_X(x_2) = 0.20, \quad \mathbf{a}_X(x_3) = 0.10 \qquad (10.57)$$
Table 10.4 Dogmatic conditional opinions ω_{Y|X}: troop movement y_j given military plan x_i

  Opinions                            Troop movements                                Uncertainty
  ω_{Y|X}       y1: No movemt.        y2: Minor movemt.      y3: Full mob.           Any
  ω_{Y|x1}:     b_{Y|x1}(y1) = 0.50   b_{Y|x1}(y2) = 0.25    b_{Y|x1}(y3) = 0.25     u_{Y|x1} = 0.00
  ω_{Y|x2}:     b_{Y|x2}(y1) = 0.00   b_{Y|x2}(y2) = 0.50    b_{Y|x2}(y3) = 0.50     u_{Y|x2} = 0.00
  ω_{Y|x3}:     b_{Y|x3}(y1) = 0.00   b_{Y|x3}(y2) = 0.25    b_{Y|x3}(y3) = 0.75     u_{Y|x3} = 0.00
The opinion conditionals affected with uncertainty specified in Table 10.5 are
obtained by uncertainty maximization of the dogmatic opinion conditionals of Table 10.4. The uncertainty maximization depends on the base rates in Eq.(10.58) over
the three possible troop movements in Y , which are derived as Bayesian base rates
as described in Section 9.5.2.
$$\text{Troop movement base rates: } \mathbf{a}_Y(y_1) = 0.35, \quad \mathbf{a}_Y(y_2) = 0.30, \quad \mathbf{a}_Y(y_3) = 0.35 \qquad (10.58)$$
With the base rates over Y, the uncertainties $\hat{u}_{Y|x_i}$ of the uncertainty-maximised conditional opinions about troop movement in Table 10.5 are obtained according to Eq.(10.59), which is equivalent to Eq.(10.37) as described in Section 10.6.
$$\hat{u}_{Y|x_i} = \min_j \left[\frac{\mathbf{P}_{Y|x_i}(y_j)}{\mathbf{a}_Y(y_j)}\right] \qquad (10.59)$$
The belief masses of the uncertainty-maximised opinions are then computed according to Eq.(10.60).
$$\mathbf{b}_{Y|x_i}(y_j) = \mathbf{P}_{Y|x_i}(y_j) - \mathbf{a}_Y(y_j)\,\hat{u}_{Y|x_i} \qquad (10.60)$$
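The uncertainty maximisation of Eqs.(10.59)-(10.60) can be sketched as follows; the example call reproduces the first row of Table 10.5 under the base rates of Eq.(10.58). The function name is an illustrative choice.

```python
def uncertainty_maximise(P, a):
    """Keep the projected probabilities P fixed while raising the uncertainty
    to its theoretical maximum, Eqs.(10.59)-(10.60)."""
    u_hat = min(p / ai for p, ai in zip(P, a))       # Eq.(10.59)
    b = [p - ai * u_hat for p, ai in zip(P, a)]      # Eq.(10.60)
    return b, u_hat

b, u = uncertainty_maximise([0.50, 0.25, 0.25], [0.35, 0.30, 0.35])
# b approx. [0.25, 0.04, 0.00] and u approx. 0.71, i.e. omega_{Y|x1} in Table 10.5
```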
The uncertainty-maximised conditional opinions are given in Table 10.5.
Table 10.5 Uncertain conditional opinions ω_{Y|X}: troop movement y_j given military plan x_i

  Opinions                            Troop movements                                Uncertainty
  ω_{Y|X}       y1: No movemt.        y2: Minor movemt.      y3: Full mob.           Any
  ω_{Y|x1}:     b_{Y|x1}(y1) = 0.25   b_{Y|x1}(y2) = 0.04    b_{Y|x1}(y3) = 0.00     u_{Y|x1} = 0.71
  ω_{Y|x2}:     b_{Y|x2}(y1) = 0.00   b_{Y|x2}(y2) = 0.50    b_{Y|x2}(y3) = 0.50     u_{Y|x2} = 0.00
  ω_{Y|x3}:     b_{Y|x3}(y1) = 0.00   b_{Y|x3}(y2) = 0.25    b_{Y|x3}(y3) = 0.75     u_{Y|x3} = 0.00
The opinion about troop movements also needs to be uncertainty maximised,
in accordance with Eq.(10.59) and Eq.(10.60), where the uncertainty maximised
opinion is expressed by Eq.(10.61).
$$\omega_Y = \left\{ \begin{array}{ll} \mathbf{b}_Y(y_1) = 0.00, & \mathbf{a}_Y(y_1) = 0.35\\ \mathbf{b}_Y(y_2) = 0.43, & \mathbf{a}_Y(y_2) = 0.30\\ \mathbf{b}_Y(y_3) = 0.00, & \mathbf{a}_Y(y_3) = 0.35\\ u_Y = 0.57 & \end{array} \right. \qquad (10.61)$$
First, the opinion conditionals must be inverted by taking into account the base rates of military plans expressed in Eq.(10.55). The inversion process produces the inverted conditionals expressed in Table 10.6.

Table 10.6 Conditional opinions ω_{X|Y}: military plan x_i given troop movement y_j

                         Opinions of military plans given troop movement
  Military plan          ω_{X|y1}: No movemt.    ω_{X|y2}: Minor movemt.    ω_{X|y3}: Full mob.
  x1: No aggression      b_{X|y1}(x1) = 1.00     b_{X|y2}(x1) = 0.00        b_{X|y3}(x1) = 0.00
  x2: Minor ops.         b_{X|y1}(x2) = 0.00     b_{X|y2}(x2) = 0.17        b_{X|y3}(x2) = 0.14
  x3: Invasion           b_{X|y1}(x3) = 0.00     b_{X|y2}(x3) = 0.00        b_{X|y3}(x3) = 0.14
  X: Any                 u_{X|y1} = 0.00         u_{X|y2} = 0.83            u_{X|y3} = 0.72

Then the likelihoods of country A's plans can be computed as the opinion:
$$\omega_{X\,\widetilde{\|}\,Y} = \left\{ \begin{array}{lll} \mathbf{b}_{X\,\widetilde{\|}\,Y}(x_1) = 0.00, & \mathbf{a}_X(x_1) = 0.70, & \mathbf{P}_{X\,\widetilde{\|}\,Y}(x_1) = 0.65\\ \mathbf{b}_{X\,\widetilde{\|}\,Y}(x_2) = 0.07, & \mathbf{a}_X(x_2) = 0.20, & \mathbf{P}_{X\,\widetilde{\|}\,Y}(x_2) = 0.26\\ \mathbf{b}_{X\,\widetilde{\|}\,Y}(x_3) = 0.00, & \mathbf{a}_X(x_3) = 0.10, & \mathbf{P}_{X\,\widetilde{\|}\,Y}(x_3) = 0.09\\ u_{X\,\widetilde{\|}\,Y} = 0.93 & & \end{array} \right. \qquad (10.62)$$
These results can be compared with those of Eq.(10.56) which were derived with
probabilities only, and which are equal to the probability distribution given in the
rightmost column of Eq.(10.62).
An important observation to make is that, although x3 (full invasion) seems to be country A's least likely plan in probabilistic terms, as expressed by $\mathbf{P}_{X\,\widetilde{\|}\,Y}(x_3) = 0.09$, there is considerable uncertainty, as expressed by $u_{X\,\widetilde{\|}\,Y} = 0.93$. In fact, the probability $\mathbf{P}_X(x_1) = 0.65$ of the most likely plan x1 has no belief support at all, and is based only on uncertainty, which would be worrisome in a real situation. A likelihood expressed as a scalar probability can thus hide important aspects of the analysis, which only come to light when uncertainty is explicitly expressed, as done in the example above.
Chapter 11
Fusion of Subjective Opinions
Belief fusion is a central concept in subjective logic. It allows evidence and opinions
held by source agents about the same domain of interest to be merged in order to
provide an opinion about the domain representing the combination of the source
agents.
11.1 Interpretation of Fusion
In many situations there will be multiple sources of evidence about a domain of
interest, where there can be significant differences between the opinions. It is often
useful to combine the evidence from different sources in order to produce an opinion
that better reflects the set of different opinions or that is closer to the ground truth
than each opinion in isolation. Belief fusion is precisely to merge multiple opinions
in order to produce a single opinion that is more correct (according to some criteria) than each opinion in isolation. The principle of opinion fusion is illustrated in
Figure 11.1.
Belief fusion consists of merging the separate sources/agents A and B into a single source/agent that can e.g. be denoted (A ⋄ B), and of mathematically combining their opinions into a single opinion which then represents the opinion of the merged source/agent.
Fusion situations vary significantly. For this reason it is necessary to apply different fusion operators when modelling different fusion situations. However, it can be challenging to identify the correct fusion operator for a specific situation. In general, a given fusion operator is unsatisfactory when it produces adequate results in most instances of a situation, but not in all instances of the situation. There should be no exceptional input arguments that the fusion operator cannot handle. A fusion operator should produce adequate results in all possible cases of the situation to be modelled.
Fig. 11.1 Fusion process principle
As an analogy, consider the situation of predicting the strength of a steel chain,
where the classical model is that of the weakest link, meaning that the chain is only
as strong as the weakest of all its links.
A different situation is e.g. to determine the strength of a relay swimming team,
for which an adequate model could be the average strength of each swimmer on the
team.
Applying the weakest link model to assess the overall strength of the relay swimming team is an approximation that might give good predictions in most instances
of high level swimming championships. However, it obviously is a poor model and
would produce unreliable predictions in general.
Similarly, applying the average strength model for assessing the overall strength
of the chain represents an approximation that would produce satisfactory strength
predictions in most instances of high quality steel chains. However, it is obviously
a very poor model which would be unreliable in general and which could be fatal if
life depended on it.
These examples illustrate that it is insufficient to simply use a few numerical examples to test whether the weakest link principle is an adequate model for predicting
the strength of relay swimming teams. Similarly it is insufficient to simply use a few
numerical examples to test whether the average principle is adequate for modeling
steel chains. Without a clear understanding of the situations to be modelled the analyst does not have a basis for selecting the most appropriate model. The selection
of the appropriate models might be obvious for the simple examples above, but it
can be challenging to say whether a fusion operator is adequate for a specific fusion
situation [52].
The conclusion to be drawn from this discussion is that the analyst must first
understand the dynamics of the situation at hand in order to find the most correct
model for analyzing it. The next section describes the problem of determining the
correctness of a specific model for belief fusion.
11.1.1 Correctness and Consistency Criteria for Fusion Models
We argue that meaningful and reliable belief fusion depends on the fusion operator’s
ability to produce correct results for the practical or hypothetical situation that is
being analyzed. This calls for a definition of what it means for results to be correct.
Definition 11.1 (Correctness of Results). In general the correctness of results produced by a model is the degree to which the results represent the true state of the
real situation that is being modeled.
⊔⊓
To clarify this definition it is useful to distinguish between three types of truth: 1)
ground truth, 2) consensus truth and 3) subjective truth, as described below, where
ground truth is the strongest and subjective truth is the weakest form of truth.
• Ground truth about a situation is the objectively observable state of the situation.
• Consensus truth about a situation is the state that is identified to be the actual state by a commonly shared opinion about the situation, or the state that
is identified to be the actual state according to commonly accepted norms or
standards.
• Subjective truth about a situation is the state identified to be the actual state by the analyst's own opinion about the situation.
The term ‘true state’ can also be used in the sense that the state is satisfactory
or preferred. For example when a group of people wants to select a movie to watch
at the cinema together it would seem strange to say that one specific movie is more
true than another. However, in this case the term truth can be interpreted in the sense
that one specific movie (the true state) can be considered to be the most satisfactory
for all the group members to watch together.
Fusion models produce output results when used to analyze fusion situations.
Three different types of result correctness emerge from the three different types of
truth, where objective correctness is the strongest, and subjective correctness is the
weakest.
• Objective result correctness is the degree to which the result represents the
ground truth of a situation.
• Consensus result correctness is the degree to which the result represents the
consensus truth of a situation.
• Subjective result correctness is the degree to which the result represents the
subjective truth of a situation.
Depending on whether ground truth, consensus truth or subjective truth is available, the strongest form of correctness should be required for assessing the results.
For example assume a weather forecast model with all its various input parameters
and their complex relationships. Weather forecasts can be compared with the actual
weather when the time of the forecast arrives a day or two later, so that it is reasonable to require objective correctness when assessing weather forecasting models.
The case of predicting global warming might seem similar to that of forecasting
the weather, because models for global warming are also based on many different
input parameters with complex relationships. Although predicting global warming
to occur over the next 100 years can be objectively verified or refuted, the time scale
makes it impossible to require objective correctness in the short term. Instead, practical assessment of model correctness must be based on consensus among experts.
So, with no ground truth as a benchmark, it is only possible to require consensus correctness in the short term. A paradoxical observation is that in 100 years (e.g. after year 2100), when the ground truth about the global warming predicted for the next 100 years finally becomes available, there will probably no longer be any interest in assessing the correctness of the models used to make the predictions, and the individuals who designed the models will be long gone. Designers of global warming models will thus never be confronted with the ground truth about their models and predictions.
Despite the lack of objective basis, consensus correctness as a criterion is often used
for selecting specific models and for determining whether or not the results they
produce shall be used in planning and decision making.
In situations where ground truth cannot be observed and consensus truth is impossible to obtain, only subjective criteria for truth can be used. Models for which subjective correctness criteria can be used are e.g. models for making personal decisions about which career path to follow or which partner to live with. In theory, such decisions are made based on multiple forms of evidence which must be fused to
criteria are often only used for practical decision making by an individual a small
number of times during a lifetime, so not even statistical evidence can be obtained.
However there are expert systems for e.g. career path choice and partner matching,
in which case it is possible to determine statistically whether a particular model
predicts ‘good’ career choices and ‘happy’ unions in the long term.
With regard to Definition 11.1, it is necessary to examine the case where correct results have only been observed once, or a small number of times, and to ask whether this is representative of the true state of the situation. Although a model produces correct results in some instances, there might be other instances where the results are clearly wrong, in which case the model cannot be considered to be correct in general. In situations where only instances with correct results have been observed, the analyst might erroneously think that the model is correct in general.
For example, assume a rather naïve analyst who misinterprets the situation of
adding apples from two baskets, and erroneously thinks that the product rule of integer multiplication is an appropriate model. Assume that the analyst tries a specific
example with two apples in each basket, and computes the sum with the product
rule, which gives 4 apples. When observing a real example of two baskets of two
apples each, it turns out that adding them together also produces 4 apples. This result
could mistakenly be interpreted as a confirmation that the product rule is a correct
model, simply because the computed result is the same as the ground truth in this
particular instance. It is of course wrong to conclude that a model is correct just
because it produces results that (perhaps by coincidence) correspond to the ground
truth in a single instance. In order for a model to be correct, it is natural to require
that results produced by it are generally correct, and not just by coincidence in specific instances of a situation. In order to distinguish between coincidentally correct
results and generally correct results, it is necessary to also consider consistency,
which leads to the following definition.
Definition 11.2 (Model Correctness). A model is correct for a specific situation
when it consistently produces correct results in all instances of the situation.
⊔⊓
On a high level of abstraction, a correct reasoning model according to Definition 11.2 must faithfully reflect the (class of) situations that are being modeled. A precise way of expressing this principle is that for a given class of situations, there is one correct model. Note that it is possible to articulate three types of model correctness according to the three types of result correctness.
• Objective model correctness for a specific class of situations is the model's ability to consistently produce objectively correct results for all possible situations in the class.
• Consensus model correctness for a specific class of situations is the model's ability to consistently produce consensus correct results for all possible situations in the class.
• Subjective model correctness for a specific class of situations is the model's ability to consistently produce subjectively correct results for all possible situations in the class.
Depending on whether ground truth, consensus truth or subjective truth is available, the strongest form of model correctness should be required for practical analysis. Observing result correctness in one instance is not sufficient to conclude that a
model is correct. It can be theoretically impossible to verify that all possible results
are consistently correct, so proving that a model is correct in general can be challenging. On the other hand, if a single false result is observed it can be concluded
that the model is incorrect for the situation. In such cases it might be meaningful to
indicate the range of validity of the model which limits the range of input arguments
or possibly the range of output results.
The next two sections describe interpretations of the fusion operators defined in subjective logic, and selection criteria that analysts can use when deciding which fusion operator to use for a specific situation.
11.1.2 Classes of Fusion Situations
Situations of belief fusion involve belief arguments from multiple sources that must
be fused in some way to produce a single belief argument. More specifically, the
situation is characterized by a frame consisting of two or more statements, and a
set of different belief arguments about these statements. It is assumed that each
belief argument supports one or several statements. The purpose of belief fusion is
to produce a new belief that identifies the most ‘correct’ statement(s) in the frame.
The meaning of most correct statement can also be that it is the most acceptable or
most preferred statement.
Different beliefs can be fused in various ways, each having an impact on how the
specific situation in evidence fusion is modeled. It is often challenging to determine
the correct or the most appropriate fusion operator for a specific situation. One way
of addressing this challenge is to categorize these specific situations according to
their typical characteristics, which would then allow for determining which fusion
operators are more adequate to each category. Four distinct classes as well as one
hybrid class of fusion situations are described below.
• Belief Constraint Fusion is when it is assumed that (i) each belief argument
can dictate which states of the frame are the most correct, and (ii) there is no
room for compromise in case two argument opinions are totally conflicting, i.e.
the fusion result is not defined in that case. In some situations this property is
desirable. An example is when two persons try to agree on seeing a movie at
the cinema. If their preferences include some common movies they can decide
to see one of them. Yet, if their preferences do not have any movies in common
then there is no solution, so the rational consequence is that they will not watch
any movie together. Constraint fusion is described in Section 11.2.
• Cumulative Belief Fusion is when it is assumed that it is possible to collect
an increasing amount of independent evidence by including more and more
arguments, and that certainty about the most correct state increases with the
amount of evidence accumulated. A typical case depicting this type of fusion
is when one makes statistical observations about possible outcomes, i.e. the
more observations the stronger the analyst’s belief about the likelihood of each
outcome. For example, a mobile network operator could observe the location of
a subscriber over time, which will produce increasing certainty about the most
frequent locations of that subscriber. However, the result would not necessarily
be suitable for indicating the exact location of the subscriber at a specific time.
Cumulative fusion is described in Section 11.3.
• Averaging Belief Fusion is when dependence between arguments is assumed.
In other words, including more arguments does not mean that more evidence is
supporting the conclusion. An example of this type of situation is when a jury
tries to reach a verdict after having observed the court proceedings. Because the
evidence is limited to what was presented to the court, the certainty about the
verdict does not increase by having more jury members expressing their beliefs,
since they were all exposed to the same evidence. Averaging fusion is described
in Section 11.4.
• Hybrid Cumulative/Averaging Fusion can be applied when observations can be considered partially dependent. The operator for hybrid cumulative-averaging fusion [49] is partially based on cumulative fusion and partially on
averaging fusion. It is described in Section 11.5.
• Consensus & Compromise Fusion (CC-fusion) is when no single belief argument alone can dictate that specific states of the frame are the most correct. In
this fusion class the analyst naturally wants to preserve shared beliefs from each
argument, and in addition transform conflicting beliefs into new shared beliefs
on union subsets. In this way consensus belief is preserved when it exists and
compromise belief is formed when necessary. In case of totally conflicting beliefs on a binary frame, then the resulting fused belief is totally uncertain. An
example is when analysing evidence about the Kennedy murder case, where the
analyst collects statements from two witnesses. Assuming that both witnesses
claim to know with some certainty that Oswald killed Kennedy, the consensus
& compromise fusion would say the same, because there is a consensus. However, when assuming that witness 1 claims to know with certainty that Oswald
killed Kennedy, and that witness 2 claims to know with certainty that Oswald
did not kill Kennedy, then consensus & compromise fusion would return the
result that it is totally uncertain whether Oswald killed Kennedy, because uncertainty is the best compromise in case of totally conflicting beliefs. CC-fusion
is described in Section 11.6.
The subtle differences between the fusion situations above illustrate the challenge
of modeling them correctly. For instance, consider the task of determining the location of a mobile phone subscriber at a specific point in time by collecting location
evidence from base stations, in which case it seems natural to use constraining belief
fusion. If two adjacent base stations detect the subscriber, then the belief constraint
operator can be used to locate the subscriber within the overlapping region of the
respective radio cells. However, if two base stations far apart detect the subscriber
at the same time, then the result of constraining belief fusion is not defined so there
is no conclusion. With additional assumptions, it would still be reasonable to think
that the subscriber is probably located in one of the two cells, but not which one
in particular, and that the case needs further investigation because the inconsistent
signals might be caused by en error in the system.
11.1.3 Criteria for Fusion Operator Selection
While defining classes of fusion situations helps in scoping the solution space, there
is still the issue of determining which class a specific situation belongs to. The approach we propose for this classification problem is to specify a set of assumptions
about a fusion situation, where each assumption can be judged to be either valid or
invalid for the situation. In other words, we decompose the classification problem so
it now becomes a matter of defining whether specific assumptions apply to the situation. The set of assumptions below can be used to determine which class a situation
belongs to.
In order to select the correct fusion model the analyst must consider the set of
assumptions about the fusion situation to be analysed and judge which assumptions
are applicable. The most adequate fusion model is identified as a function of the set
of assumptions that applies to the situation to be analyzed. The selection procedure
is illustrated in Figure 11.2.
Fig. 11.2 Procedure for selecting the most adequate fusion operator
The steps of the selection procedure in Figure 11.2 are described in more detail
below.
(a) The analyst first needs a good understanding of the situation to be analysed and
modelled with a fusion operator. This includes being able to make the binary
choices of (b), (d) and (f) below.
(b) Does it make sense to fuse two totally conflicting opinion arguments?
(c) In case no compromise can be imagined between two totally conflicting arguments, then it is probably adequate to apply the belief constraint fusion operator. This fusion operator is not defined in case of totally conflicting arguments,
which reflects the assumption that there is no compromise for totally conflicting
arguments.
(d) Should two equal opinion arguments produce a fused opinion which is equal to
the arguments?
(e) In case it is not assumed that two equal arguments should produce an equal
fusion result, then it is probably adequate to apply the cumulative fusion operator. It means that equal arguments are considered as independent support for
specific values of the variable, so that two equal arguments produce stronger
support than a single argument alone. In other words the operator should be
non-idempotent, as is the case for cumulative fusion. This operator can also
handle totally conflicting opinions if necessary.
(f) Should a vacuous opinion have any influence on the fusion result?
(g) In case it is assumed that a vacuous opinion has an influence on the fused result,
then it is probably adequate to apply the averaging fusion operator. It means
that there can be no neutral fusion argument. This can be meaningful when e.g.
applying fusion to make a survey of opinions. The averaging fusion operator
would suit this purpose e.g. because a situation where many agents express
vacuous opinions would be reflected in the result. This operator can also handle
totally conflicting opinions if necessary.
(h) In case it is assumed that two equal arguments produce an equal fusion result,
and that a vacuous opinion argument has no influence on the fusion result, then
it is probably adequate to apply the consensus and compromise fusion (CC-fusion) operator. This operator can also handle totally conflicting opinions if
necessary.
The fusion operators mentioned in Figure 11.2 are described in the next sections.
11.2 Belief Constraint Fusion
Situations where agents with different preferences try to agree on a single choice
occur frequently. This must not be confused with fusion of evidence from different
agents to determine the most likely correct hypothesis or actual event. Multi-agent
preference combination assumes that each agent has already made up her mind, and
is about determining the most acceptable decision or choice for the group of agents.
Preferences for a state variable can be expressed in the form of subjective opinions. The constraint fusion operator of subjective logic can be applied as a method
for merging preferences of multiple agents into a single preference for the whole
group. This model is expressive and flexible, and produces perfectly intuitive results. Preference can be represented as belief and indifference can be represented as
uncertainty/uncommitted belief. Positive and negative preferences are considered
as symmetric concepts, so they can be represented in the same way and combined
using the same operator. A totally uncertain opinion has no influence and thereby
represents the neutral element.
11.2.1 Method of Constraint Fusion
The belief constraint fusion operator described here is an extension of Dempster’s
rule which in Dempster-Shafer belief theory is often presented as a method for fusing evidence from different sources [81] in order to identify the most likely hypothesis from the frame (domain). Many authors have however demonstrated that
Dempster’s rule is not an appropriate operator for evidence fusion [92], and that it
is better suited as a method for combining constraints [50, 48].
Definition 11.3 (The Constraint Fusion Operator). Assume the domain X and its
hyperdomain R(X), and assume the hypervariable X which takes its values from
R(X). Let agent A hold opinion ωXA and agent B hold opinion ωXB . The superscripts
A and B are attributes that identify the respective belief sources or belief owners.
These two opinions can be mathematically merged using the belief constraint fusion
operator denoted '⊙', which can be expressed as:
$$\text{Belief Constraint Fusion: } \omega_X^{(A \& B)} = \omega_X^{A} \odot \omega_X^{B}\,. \qquad (11.1)$$
Belief source combination denoted with ‘&’ thus corresponds to opinion fusion
with ‘⊙’. Below is the algebraic expression of the belief constraint fusion operator
for subjective opinions.
$$\omega_X^{(A \& B)}: \left\{ \begin{array}{ll} \mathbf{b}_X^{(A \& B)}(x) = \dfrac{\mathrm{Har}(x)}{(1 - \mathrm{Con})}, & \forall x \in \mathcal{R}(\mathbb{X}),\\[10pt] u_X^{(A \& B)} = \dfrac{u_X^{A}\, u_X^{B}}{(1 - \mathrm{Con})}, &\\[10pt] \mathbf{a}_X^{(A \& B)}(x) = \dfrac{\mathbf{a}_X^{A}(x)(1 - u_X^{A}) + \mathbf{a}_X^{B}(x)(1 - u_X^{B})}{2 - u_X^{A} - u_X^{B}}, & \forall x \in \mathcal{R}(\mathbb{X}),\ x \neq \emptyset \end{array} \right. \qquad (11.2)$$
The term Har(x) represents the relative harmony between the constraints (in terms of overlapping belief mass) on x. The term Con represents the relative conflict (in terms of non-overlapping belief mass) between $\omega_X^{A}$ and $\omega_X^{B}$. These parameters are defined below:
$$\mathrm{Har}(x) = \mathbf{b}_X^{A}(x)\, u_X^{B} + \mathbf{b}_X^{B}(x)\, u_X^{A} + \sum_{(x^{A} \cap\, x^{B}) = x} \mathbf{b}_X^{A}(x^{A})\, \mathbf{b}_X^{B}(x^{B}) \qquad (11.3)$$
$$\mathrm{Con} = \sum_{(x^{A} \cap\, x^{B}) = \emptyset} \mathbf{b}_X^{A}(x^{A})\, \mathbf{b}_X^{B}(x^{B}) \qquad (11.4)$$
⊔⊓
The divisor (1 − Con) in Eq.(11.2) normalizes the derived belief mass; it ensures
belief mass and uncertainty mass additivity. The use of the constraint fusion operator
is mathematically possible only if $\omega_X^{A}$ and $\omega_X^{B}$ are not totally conflicting, i.e. if Con ≠ 1.
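The operator of Definition 11.3 can be sketched in Python for opinions whose focal elements are represented as frozensets of singletons. The representation and function name are illustrative choices; the sketch applies Eqs.(11.2)-(11.4) directly and assumes that not both arguments are vacuous.

```python
def constraint_fusion(op_A, op_B):
    """A sketch of belief constraint fusion (Definition 11.3).

    Each opinion is a tuple (b, u, a): `b` maps frozensets of singletons
    (focal elements) to belief mass, `u` is the uncertainty mass, and
    `a` maps singletons to base rates.
    """
    bA, uA, aA = op_A
    bB, uB, aB = op_B

    # Relative conflict Con, Eq.(11.4): non-overlapping belief mass.
    con = sum(mA * mB
              for xA, mA in bA.items()
              for xB, mB in bB.items()
              if not (xA & xB))
    if con >= 1.0:
        raise ValueError("Totally conflicting arguments: fusion is undefined.")

    # Candidate focal elements: original ones plus all non-empty intersections.
    focals = set(bA) | set(bB) | {xA & xB for xA in bA for xB in bB if xA & xB}

    # Relative harmony Har(x), Eq.(11.3), and fused belief mass, Eq.(11.2).
    b_out = {}
    for x in focals:
        har = (bA.get(x, 0.0) * uB + bB.get(x, 0.0) * uA
               + sum(mA * mB
                     for xA, mA in bA.items()
                     for xB, mB in bB.items()
                     if (xA & xB) == x))
        b_out[x] = har / (1.0 - con)

    u_out = uA * uB / (1.0 - con)
    # Fused base rates, Eq.(11.2) (assumes uA and uB are not both equal to 1).
    a_out = {x: (aA[x] * (1 - uA) + aB[x] * (1 - uB)) / (2 - uA - uB)
             for x in aA}
    return b_out, u_out, a_out
```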
The constraint fusion operator is commutative and non-idempotent. Associativity is preserved when the base rate is equal for all agents. Associativity in case of
different base rates requires that all preference opinions be combined in a single
operation which would require a generalization of Eq.(11.2) for multiple agents, i.e.
for multiple input arguments, which is relatively trivial.
The base rates of the two arguments are normally assumed to be equal, expressed by $\mathbf{a}_X^{A} = \mathbf{a}_X^{B}$, but different base rates can be used in case of base rate disagreement between agents A and B.
A totally indifferent opinion acts as the neutral element for constraint fusion,
formally expressed as:
$$\text{IF } (\omega_X^{A} \text{ is totally indifferent, i.e. with } u_X^{A} = 1) \text{ THEN } (\omega_X^{A} \odot \omega_X^{B} = \omega_X^{B})\,. \qquad (11.5)$$
Having a neutral element in the form of the totally indifferent opinion is very
useful when modelling situations of preference combination.
The flexibility of subjective logic makes it simple to express positive and negative preferences within the same framework, as well as indifference/uncertainty.
Because preference can be expressed over arbitrary subsets of the domain, this is
in fact a multi-polar model for expressing and combining preferences. Even in the
case of non-overlapping focal elements the belief constraint fusion operator produces meaningful results, namely that the preferences are incompatible. Examples
in Sections 11.2.3–11.2.6 demonstrate the usefulness of this property.
11.2.2 Frequentist Interpretation of Constraint Fusion
The behaviour and purpose of the constraint fusion operator can be challenging to
understand. However, it has a very clear frequentist interpretation which is described
next.
Assume a domain $\mathbb{X}$ with its hyperdomain $\mathcal{R}(\mathbb{X})$ and powerset $\mathcal{P}(\mathbb{X})$. Recall from Eq.(2.1) that $\mathcal{P}(\mathbb{X}) = \mathcal{R}(\mathbb{X}) \cup \{\mathbb{X}, \emptyset\}$. Let x denote a specific value of the hyperdomain $\mathcal{R}(\mathbb{X})$ or of the powerset $\mathcal{P}(\mathbb{X})$.
Let X be a hypervariable in $\mathbb{X}$, and let $\chi$ be a random variable which takes its values from the powerset $\mathcal{P}(\mathbb{X})$. We consider a repetitive process denoted U which generates unconstrained instances of the variable $\chi$, which in turn get constrained through serially arranged stages A and B to produce a constrained output value of the variable. To be unconstrained means that $\chi = \mathbb{X}$. To constrain $\chi$ by e.g. $x^{A}$ is to produce a new $\chi$ so that $\chi := \chi \cap x^{A}$. This means that $\chi$ is modified to take a value in $\mathcal{P}(\mathbb{X})$ with smaller or equal cardinality, i.e. that the new constrained $\chi$ typically has a value consisting of fewer singleton elements.
Assume the opinions $\omega_X^{A} = (\mathbf{b}_X^{A}, u_X^{A}, \mathbf{a}_X^{A})$ and $\omega_X^{B} = (\mathbf{b}_X^{B}, u_X^{B}, \mathbf{a}_X^{B})$. Let $\mathbf{p}_\chi^{A}$ be a probability distribution over $\chi$ where $\mathbf{p}_\chi^{A}(x) = \mathbf{b}_X^{A}(x)$ and $\mathbf{p}_\chi^{A}(\mathbb{X}) = u_X^{A}$. Similarly, let $\mathbf{p}_\chi^{B}$ be a probability distribution over $\chi$ based on $\omega_X^{B}$.
The serial constraining configuration is determined by the probability distributions $\mathbf{p}_\chi^{A}$ and $\mathbf{p}_\chi^{B}$ in the following way. At stage A a specific element $x^{A} \in \mathcal{P}(\mathbb{X})$ is selected with probability $\mathbf{p}_\chi^{A}(x^{A})$. At stage B a specific element $x^{B} \in \mathcal{P}(\mathbb{X})$ is selected with probability $\mathbf{p}_\chi^{B}(x^{B})$. The unconstrained variable $\chi$ produced at stage U is first constrained at stage A by computing $\chi := (\chi \cap x^{A}) = x^{A}$, which in turn is constrained at stage B to produce $\chi := (x^{A} \cap x^{B}) = x^{C}$. The final constrained values $x^{C} = (x^{A} \cap x^{B})$ are collected at stage C. This is illustrated in Figure 11.3.
Fig. 11.3 Frequentist interpretation of constraint fusion
Assume that a series of unconstrained variable instances $\chi$ are generated at source U, and that the resulting constrained values are collected at stage C.
If e.g. for a specific instance i of the process in Figure 11.3 the constraints are such that $(x_i^{A} \cap x_i^{B}) = x_i^{C} \neq \emptyset$, then the non-empty value $x_i^{C}$ is collected. If for another instance j the constraints are such that $(x_j^{A} \cap x_j^{B}) = x_j^{C} = \emptyset$, then the collected value is $\emptyset$. If for yet another instance k the constraints are $x_k^{A} = x_k^{B} = \mathbb{X}$, so that $(x_k^{A} \cap x_k^{B}) = \mathbb{X}$, then the collected value is $\mathbb{X}$.
Relative to the total number of collected values (including $\mathbb{X}$ and $\emptyset$), the relative proportions of each type of collected value can be expressed by a relative frequency probability distribution $\mathbf{p}_\chi^{C}$.
Let n denote the total number of collected values, and let T denote the Boolean truth function such that T(TRUE) = 1 and T(FALSE) = 0. Then the convergence values of $\mathbf{p}_\chi^{C}$ when n goes to infinity are expressed as:
$$\mathbf{p}_\chi^{C}(x) = \lim_{n \to \infty} \left[\frac{\sum_{i=1}^{n} \mathrm{T}(x_i^{C} = x)}{n - \sum_{i=1}^{n} \mathrm{T}(x_i^{C} = \emptyset)}\right] \quad \forall x \in \mathcal{P}(\mathbb{X}). \qquad (11.6)$$
The stochastic constraint opinion $\omega_X^{C}$ is derived from the probability distribution $\mathbf{p}_\chi^{C}$ according to Definition 11.4.
Definition 11.4 (Stochastic Constraint Opinion).
Given the convergent constrained relative frequency distribution $\mathbf{p}_\chi^{C}$ of Eq.(11.6), the stochastic constraint opinion is expressed as:
$$\omega_X^{C}: \left\{ \begin{array}{ll} \mathbf{b}_X^{C}(x) = \mathbf{p}_\chi^{C}(x), & \text{for } x \in \mathcal{R}(\mathbb{X})\\[2pt] u_X^{C} = \mathbf{p}_\chi^{C}(\mathbb{X}) &\\[2pt] \mathbf{a}_X^{C} = \mathbf{a}_X & \end{array} \right. \qquad (11.7)$$
⊔⊓
The stochastic constraint opinion $\omega_X^{C}$ is the same as the opinion $\omega_X^{(A \& B)}$ produced by the belief constraint fusion operator of Definition 11.3, so the following theorem can be stated.
Theorem 11.1 (Equivalence Between Stochastic and Belief Constraint Fusion).
Stochastic constraint fusion of Definition 11.4 is equivalent to the belief constraint fusion of Definition 11.3. This can be expressed as:
$$\omega_X^{(A \& B)} \equiv \omega_X^{C}\,. \qquad (11.8)$$
Proof.
The stepwise transformation of $\mathbf{b}_X^{C}(x)$ into $\mathbf{b}_X^{(A \& B)}(x)$, as well as of $u_X^{C}$ into $u_X^{(A \& B)}$, demonstrates the equivalence.
Transformation $\mathbf{b}_X^{C} \longrightarrow \mathbf{b}_X^{(A \& B)}$:
$$\begin{aligned}
1:\ \mathbf{b}_X^{C}(x) &= \mathbf{p}_X^{C}(x) && \forall x \in \mathcal{R}(\mathbb{X})\\[4pt]
2:\ &= \lim_{n \to \infty} \left[\frac{\sum_{i=1}^{n} \mathrm{T}((x_i^{A} \cap x_i^{B}) = x)}{n - \sum_{i=1}^{n} \mathrm{T}((x_i^{A} \cap x_i^{B}) = \emptyset)}\right] && \forall x \in \mathcal{R}(\mathbb{X}),\ x_i^{A}, x_i^{B} \in \mathcal{P}(\mathbb{X})\\[4pt]
3:\ &= \frac{n \sum_{(x^{A} \cap x^{B}) = x} \mathbf{p}_\chi^{A}(x^{A})\, \mathbf{p}_\chi^{B}(x^{B})}{n \left(1 - \sum_{(x^{A} \cap x^{B}) = \emptyset} \mathbf{p}_\chi^{A}(x^{A})\, \mathbf{p}_\chi^{B}(x^{B})\right)} && \forall x \in \mathcal{R}(\mathbb{X}),\ x^{A}, x^{B} \in \mathcal{P}(\mathbb{X})\\[4pt]
4:\ &= \frac{\sum_{(x^{A} \cap x^{B}) = x} \mathbf{p}_\chi^{A}(x^{A})\, \mathbf{p}_\chi^{B}(x^{B})}{1 - \sum_{(x^{A} \cap x^{B}) = \emptyset} \mathbf{p}_\chi^{A}(x^{A})\, \mathbf{p}_\chi^{B}(x^{B})} && \forall x \in \mathcal{R}(\mathbb{X}),\ x^{A}, x^{B} \in \mathcal{P}(\mathbb{X})\\[4pt]
5:\ &= \frac{\mathbf{b}_X^{A}(x)\, u_X^{B} + \mathbf{b}_X^{B}(x)\, u_X^{A} + \sum_{(x^{A} \cap x^{B}) = x} \mathbf{b}_X^{A}(x^{A})\, \mathbf{b}_X^{B}(x^{B})}{1 - \sum_{(x^{A} \cap x^{B}) = \emptyset} \mathbf{b}_X^{A}(x^{A})\, \mathbf{b}_X^{B}(x^{B})} && \forall x \in \mathcal{R}(\mathbb{X}),\ x^{A}, x^{B} \in \mathcal{R}(\mathbb{X})\\[4pt]
6:\ &= \frac{\mathrm{Har}(x)}{(1 - \mathrm{Con})} = \mathbf{b}_X^{(A \& B)}(x) && \forall x \in \mathcal{R}(\mathbb{X}).
\end{aligned} \qquad (11.9)$$
The crucial point is the transformation from step 2 to step 3. The validity of this
transformation is evident because at every instance i the following probability
equalities hold:
$$\begin{aligned} p((x_i^{A} \cap x_i^{B}) = x) &= \sum_{(x_i^{A} \cap\, x_i^{B}) = x} \mathbf{p}_\chi^{A}(x_i^{A})\, \mathbf{p}_\chi^{B}(x_i^{B})\\[4pt] p((x_i^{A} \cap x_i^{B}) = \emptyset) &= \sum_{(x_i^{A} \cap\, x_i^{B}) = \emptyset} \mathbf{p}_\chi^{A}(x_i^{A})\, \mathbf{p}_\chi^{B}(x_i^{B}). \end{aligned} \qquad (11.10)$$
This means that the probabilities of the two Boolean truth functions in step 2
above are given by Eq.(11.10). The transformation from step 2 to step 3 above simply consists of rewriting these probabilities.
(A&B)
Simplified transformation uCX −→ uX
:
1 : uCX = pCX (X)
2:
= lim
3:
=
n→∞
1−
∑ni=1 T((xiA ∩xiB )=X)
n − ∑ni=1 T((xiA ∩xiB )=0)
/
xiA , xiB ∈ P(X)
p Aχ (X)ppBχ (X)
∑ p Aχ (xA )ppBχ (xB )
xA , xB ∈ P(X)
uAX uBX
b AX (xA )bbBX (xB )
xA , xB ∈ R(X)
(11.11)
(xA ∩xB )=0/
4:
=
1−
∑
(xA ∩xB )=0/
5:
=
uAX uBX
(1−Con) ,
(A&B)
= uX
.
⊔
⊓
While Definition 11.4 is based on long term frequentist situations, the results
can be extended to the combination of subjective opinions in the same way that
frequentist probability calculus can be extended to subjective probability and non-frequentist situations. According to de Finetti [13] a frequentist probability is no
more objective than a subjective (non-frequentist) probability, because even if observations are objective, their translation into probabilities is always subjective. de
Finetti [12] provides further justification for this view by explaining that subjective
knowledge of a system often will carry more weight when estimating probabilities
of future events than purely objective observations of the past. The only case where
probability estimates can be purely based on frequentist information is in abstract
examples from text books. Frequentist information is thus just another form of evidence used to estimate probabilities. Because a subjective opinion is simply a probability distribution over a hyperdomain, de Finetti’s view can obviously be extended
to subjective opinions. Based on this argumentation there is not only a mathematical
equivalence, but also an interpretational equivalence between stochastic constraint
fusion and belief constraint fusion.
11.2.3 Expressing Preferences with Subjective Opinions
Preferences can be expressed e.g. as soft or hard constraints, qualitative or quantitative, ordered or partially ordered etc. It is possible to specify a mapping between qualitative verbal tags and subjective opinions which enables easy solicitation of preferences [75]. Table 11.1 describes examples of how preferences can be
expressed.
Table 11.1 Example preferences and corresponding subjective opinions

  Example & Type                                    Opinion Expression
  "Ingredient x is mandatory"                       Binary domain X = {x, x̄}
  Hard positive                                     Binomial opinion ω_x : (1, 0, 0, 1/2)
  "Ingredient x is totally out of the question"     Binary domain X = {x, x̄}
  Hard negative                                     Binomial opinion ω_x : (0, 1, 0, 1/2)
  "My preference rating for x is 3 out of 10"       Binary domain X = {x, x̄}
  Quantitative                                      Binomial opinion ω_x : (0.3, 0.7, 0.0, 1/2)
  "I prefer x or y, but z is also acceptable"       Ternary domain Θ = {x, y, z}
  Qualitative                                       Trinomial opinion ω_Θ : (b(x, y) = 0.6, b(z) = 0.3,
                                                    u = 0.1, a(x, y, z) = 1/3)
  "I like x, but I like y even more"                Two binary domains X = {x, x̄} and Y = {y, ȳ}
  Positive rank                                     Binomial opinions ω_x : (0.6, 0.3, 0.1, 1/2),
                                                    ω_y : (0.7, 0.2, 0.1, 1/2)
  "I don't like x, and I dislike y even more"       Two binary domains X = {x, x̄} and Y = {y, ȳ}
  Negative rank                                     Binomial opinions ω_x : (0.3, 0.6, 0.1, 1/2),
                                                    ω_y : (0.2, 0.7, 0.1, 1/2)
  "I'm indifferent about x, y and z"                Ternary domain Θ = {x, y, z}
  Neutral                                           Trinomial opinion ω_Θ : (u_Θ = 1.0, a(x, y, z) = 1/3)
  "I'm indifferent but most people prefer x"        Ternary domain Θ = {x, y, z}
  Neutral with bias                                 Trinomial opinion ω_Θ : (u_Θ = 1.0, a(x) = 0.6,
                                                    a(y) = 0.2, a(z) = 0.2)
All the preference types of Table 11.1 can be interpreted in terms of subjective
opinions, and further combined by considering them as constraints expressed by
different agents. The examples that comprise two binary domains could also have
been modelled with a quaternary product domain with a corresponding 4-nomial
product opinion. In fact product opinions over product domains could be a method
of simultaneously considering preferences over multiple variables, and this will be
the topic of future research.
Default base rates are specified in all but the last example which indicates total
indifference but with a bias which expresses the average preference in the population. Base rates are useful in many situations, such as for default reasoning. Base
rates only have an influence in case of significant indifference or uncertainty.
11.2.4 Example: Going to the Cinema, 1st Attempt
Assume three friends, Alice, Bob and Clark, who want to see a film together at
the cinema one evening, and that the only films showing are Black Dust (BD),
Grey Matter (GM) and White Powder (W P), represented as the ternary domain
Θ = {BD, GM, W P}. Assume that the friends express their preferences in the form
of the opinions of Table 11.2.
Table 11.2 Combination of film preferences

                      Preferences of:                Results of preference combinations:
                      Alice    Bob      Clark        (Alice & Bob)    (Alice & Bob & Clark)
                      ω_Θ^A    ω_Θ^B    ω_Θ^C        ω_Θ^{A&B}        ω_Θ^{A&B&C}
  b(BD)           =   0.99     0.00     0.00         0.00             0.00
  b(GM)           =   0.01     0.01     0.00         1.00             1.00
  b(WP)           =   0.00     0.99     0.00         0.00             0.00
  b(GM ∪ WP)      =   0.00     0.00     1.00         0.00             0.00
Alice and Bob have strong and conflicting preferences. Clark, who does not want to watch Black Dust but is indifferent about the two other films, is not sure whether he wants to come along, so Table 11.2 shows the results of applying the preference combination operator, first without him, and then including him in the party.
By applying belief constraint fusion, Alice and Bob conclude that the only film they are both interested in seeing is Grey Matter. Including Clark in the party does not change that result because he is indifferent to Grey Matter and White Powder anyway; he just does not want to watch the film Black Dust.
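The fused (Alice & Bob) column of Table 11.2 can be reproduced with the constraint-fusion sketch of Section 11.2.1; default base rates of 1/3 per film are assumed here, since the table does not specify them.

```python
BD, GM, WP = "BD", "GM", "WP"
a = {BD: 1/3, GM: 1/3, WP: 1/3}                       # assumed default base rates
alice = ({frozenset({BD}): 0.99, frozenset({GM}): 0.01}, 0.0, a)
bob   = ({frozenset({GM}): 0.01, frozenset({WP}): 0.99}, 0.0, a)

b, u, _ = constraint_fusion(alice, bob)               # sketch from Section 11.2.1
# b[frozenset({GM})] is approx. 1.0: Grey Matter is the only mutually acceptable film.
```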
The belief mass values of Alice and Bob in the above example are in fact equal to those of Zadeh’s example [92], which was used to demonstrate the unsuitability of Dempster’s rule for fusing beliefs because it produces counter-intuitive results. Zadeh’s example describes a medical case where two medical doctors express their opinions about possible diagnoses, a situation which should typically be modelled with the averaging fusion operator [41], not with Dempster’s rule. In order to select the appropriate operator it is crucial to fully understand the nature of the situation to be modelled. The failure to understand that Dempster’s rule is not an operator for cumulative or averaging belief fusion, combined with the unavailability of the general cumulative and averaging belief fusion operators for many years (1976 [81] to 2010 [41]), has often led to inappropriate applications of Dempster’s rule to cases of belief fusion [48]. However, when the same numerical values as in [92] are specified in a case of preference combination such as the example above, the constraint fusion operator (which is a simple extension of Dempster’s rule) is very suitable and produces perfectly intuitive results.
11.2.5 Example: Going to the Cinema, 2nd Attempt
In this example Alice and Bob soften their strong preferences by expressing some
indifference in the form of u = 0.01, as specified by Table 11.3. Clark has the same
opinion as in the previous example, and is still not sure whether he wants to come
along, so Table 11.3 shows the results without and with his preference included.
Table 11.3 Combination of film preferences with some indifference and with non-default base rates

                  Preferences of:           Results of preference combinations:
                  Alice   Bob    Clark      (Alice & Bob)   (Alice & Bob & Clark)
                  ωΘ^A    ωΘ^B   ωΘ^C       ωΘ^{A&B}        ωΘ^{A&B&C}
b(BD)          =  0.98    0.00   0.00       0.490           0.000
b(GM)          =  0.01    0.01   0.00       0.015           0.029
b(WP)          =  0.00    0.98   0.00       0.490           0.961
b(GM ∪ WP)     =  0.00    0.00   1.00       0.000           0.010
u              =  0.01    0.01   0.00       0.005           0.000
a(BD)          =  0.6     0.6    0.6        0.6             0.6
a(GM) = a(WP)  =  0.2     0.2    0.2        0.2             0.2
The effect of adding some indifference is that Alice and Bob should pick either Black Dust or White Powder, because in each case one of them actually prefers the film and the other finds it acceptable. Neither Alice nor Bob prefers Grey Matter; they only find it acceptable, so it turns out not to be a good choice for either of them. When taking into consideration that the base rates are a(BD) = 0.6 and a(WP) = 0.2, the expected preference levels according to Eq.(3.28) are such that:

    P^{A&B}(BD) > P^{A&B}(WP) .                                          (11.12)

More precisely, the expected preference levels according to Eq.(3.28) are:

    P^{A&B}(BD) = 0.493 ,    P^{A&B}(WP) = 0.491 .                       (11.13)

Because of its higher base rate, Black Dust has a slightly higher expected preference than White Powder, so the rational choice would be to watch Black Dust.

However, when Clark, who does not want to watch Black Dust, is included, the base rates no longer dictate the result. In this case Eq.(3.28) produces P^{A&B&C}(WP) = 0.966, so the obvious choice is to watch White Powder.
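The expected preference levels above can be recomputed directly from the fused opinion in Table 11.3 with the projected probability of Eq.(3.28), where belief mass on a composite value is shared among its singletons according to their relative base rates. The following sketch uses illustrative names and the values copied from the table.

```python
# Projected probability in the style of Eq.(3.28):
# P(x) = sum_i a(x|x_i)·b(x_i) + a(x)·u, with relative base rate a(x|x_i) = a(x)/a(x_i)
# whenever the singleton x is contained in the (possibly composite) value x_i.

def projected_probability(belief, u, base_rate, x):
    a = lambda s: sum(base_rate[v] for v in s)
    p = base_rate[x] * u
    for x_i, b in belief.items():
        if x in x_i:
            p += b * base_rate[x] / a(x_i)
    return p

# Fused opinion ω_Θ^{A&B} from Table 11.3.
belief_AB = {frozenset({"BD"}): 0.490, frozenset({"GM"}): 0.015,
             frozenset({"WP"}): 0.490, frozenset({"GM", "WP"}): 0.000}
u_AB = 0.005
a = {"BD": 0.6, "GM": 0.2, "WP": 0.2}

print(projected_probability(belief_AB, u_AB, a, "BD"))   # 0.493
print(projected_probability(belief_AB, u_AB, a, "WP"))   # 0.491
```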
11.2.6 Example: Not Going to the Cinema
Assume now that Alice and Bob express totally conflicting preferences as specified in Table 11.4, i.e. Alice expresses a hard preference for Black Dust and Bob expresses a hard preference for White Powder. Clark still has the same preference as before, i.e. he does not want to watch Black Dust and is indifferent about the two other films.
Table 11.4 Combination of film preferences with hard and conflicting preferences

              Preferences of:           Results of preference combinations:
              Alice   Bob    Clark      (Alice & Bob)   (Alice & Bob & Clark)
              ωΘ^A    ωΘ^B   ωΘ^C       ωΘ^{A&B}        ωΘ^{A&B&C}
b(BD)      =  1.00    0.00   0.00       Undefined       Undefined
b(GM)      =  0.00    0.00   0.00       Undefined       Undefined
b(WP)      =  0.00    1.00   0.00       Undefined       Undefined
b(GM ∪ WP) =  0.00    0.00   1.00       Undefined       Undefined
In this case the belief constraint fusion operator cannot be applied, because Eq.(11.2) produces a division by zero. The conclusion is that the friends will not go to the cinema to see a film together. This situation is detected by the test Con = 1 in Eq.(11.4). It makes no difference to include Clark in the party, because a total conflict cannot be resolved by including additional preferences. However, it would still have been possible for Bob and Clark to watch White Powder together without Alice.
11.3 Cumulative Fusion
Cumulative fusion of belief opinions is equivalent to simply adding up the evidence parameters of the corresponding evidence opinions. The cumulative fusion operator for belief opinions is then obtained through the bijective mapping between belief opinions and evidence opinions, as described by Definition 3.9.
Assume a domain X and its hyperdomain R(X), and a process whose outcomes are represented by the variable X taking values from X. Consider two agents A and B who observe the outcomes of the process over two separate time periods. Their observations can be vague, meaning that they sometimes observe an outcome which can be one of multiple possible singletons in X, but are unable to identify the observed outcome uniquely.

For example, assume that persons A and B are observing coloured balls being picked from an urn, where the balls can have one of four colours: black, white, red or green. Assume further that the observers A and B are colour blind, which means that they are sometimes unable to see the difference between red and green balls, although they can always tell when a ball is black or white. As a result their observations can be vague, meaning that they sometimes perceive a specific ball to be either red or green, but are unable to identify the ball’s colour precisely. This corresponds to the situation where X is a hypervariable which takes its values from R(X).
The symbol ‘⋄’ denotes the fusion of two observers A and B into a single imaginary observer denoted as (A ⋄ B).
Definition 11.5 (The Cumulative Fusion Operator).
Let ω_X^A and ω_X^B be opinions respectively held by agents A and B over the same (hyper)variable X on domain X. Let ω_X^{(A⋄B)} be the opinion such that:

Case I: For u_X^A ≠ 0 ∨ u_X^B ≠ 0:

    b_X^{(A⋄B)}(x) = ( b_X^A(x)·u_X^B + b_X^B(x)·u_X^A ) / ( u_X^A + u_X^B − u_X^A·u_X^B )
                                                                               (11.14)
    u_X^{(A⋄B)} = ( u_X^A·u_X^B ) / ( u_X^A + u_X^B − u_X^A·u_X^B )

Case II: For u_X^A = 0 ∧ u_X^B = 0:

    b_X^{(A⋄B)}(x) = γ_X^A·b_X^A(x) + γ_X^B·b_X^B(x)
    u_X^{(A⋄B)} = 0

    where

    γ_X^A = lim_{u_X^A → 0, u_X^B → 0}  u_X^B / ( u_X^A + u_X^B )
                                                                               (11.15)
    γ_X^B = lim_{u_X^A → 0, u_X^B → 0}  u_X^A / ( u_X^A + u_X^B )

Then ω_X^{(A⋄B)} is called the cumulatively fused opinion of ω_X^A and ω_X^B, representing the combination of the independent opinions of sources A and B. By using the symbol ‘⊕’ to designate this belief operator, we define ω_X^{(A⋄B)} ≡ ω_X^A ⊕ ω_X^B. □
It can be verified that the cumulative fusion operator is commutative, associative and non-idempotent. In Case II of Definition 11.5, the associativity depends on
the preservation of relative weights of intermediate results, which requires the additional weight parameter γ . In this case, the cumulative fusion operator is equivalent
to the weighted average of probabilities.
The cumulative fusion operator is equivalent to updating prior Dirichlet PDFs by
adding new evidence to produce posterior Dirichlet PDFs. Deriving the cumulative
belief fusion operator is based on the bijective mapping between belief opinions and
evidence opinions. The mapping is expressed in Definition 3.9.
Theorem 11.2. The cumulative fusion operator of Definition 11.5 is equivalent to simple addition of the evidence parameters of evidence opinions, as expressed in Eq.(3.34).
Proof. The cumulative belief fusion operator of Definition 11.5 is derived by mapping the argument belief opinions to evidence opinions through the bijective mapping of Definition 3.9. Cumulative fusion of evidence opinions simply consists of evidence parameter addition. The fused evidence opinion is then mapped back to a belief opinion through the bijective mapping of Definition 3.9. This explanation is in essence the proof of Theorem 11.2. A more detailed explanation is provided below.

Let the two observers’ respective hyper opinions be expressed as ω_X^A and ω_X^B. The corresponding evidence opinions Dir_X^{eH}(r_X^A, a_X) and Dir_X^{eH}(r_X^B, a_X) contain the respective evidence parameters r_X^A and r_X^B.

The cumulative fusion of these two bodies of evidence simply consists of vector addition of Dir_X^{eH}(r_X^A, a_X) and Dir_X^{eH}(r_X^B, a_X), expressed as:

    Dir_X^{eH}(r_X^{(A⋄B)}, a_X) = Dir_X^{eH}(r_X^A, a_X) ⊕ Dir_X^{eH}(r_X^B, a_X)
                                 = Dir_X^{eH}((r_X^A + r_X^B), a_X) .          (11.16)

More specifically, for each value x ∈ R(X) the accumulated observation evidence r_X^{(A⋄B)} is computed as:

    r_X^{(A⋄B)}(x) = r_X^A(x) + r_X^B(x) .                                     (11.17)

The cumulatively fused belief opinion ω_X^{A⋄B} of Definition 11.5 results from mapping the fused evidence opinion of Eq.(11.16) back to a belief opinion by applying the bijective mapping of Definition 3.9. □
Notice that the expression for the cumulative fusion operator in Definition 11.5
is independent of the non-informative prior weight W . That means that the choice of
non-informative prior weight in fact only influences the mapping between evidence
opinions and belief opinions, not the cumulative fusion operator itself.
The cumulative fusion operator represents a generalisation of the consensus operator [38, 37]. The binomial cumulative fusion operator emerges directly from Definition 11.5 by assuming a binary domain and binomial argument opinions.
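As a concrete illustration, the sketch below computes Case I of Definition 11.5 directly, and checks it against evidence-parameter addition under a Definition 3.9 style mapping (r = W·b/u and back, with the non-informative prior weight W set to 2 purely for illustration; as noted above, the fusion result itself does not depend on W). The opinion values are arbitrary examples.

```python
# Sketch of cumulative belief fusion (Case I of Definition 11.5), checked against
# addition of evidence parameters under a Definition 3.9 style mapping.

W = 2.0   # illustrative non-informative prior weight

def cumulative_fuse(b_a, u_a, b_b, u_b):
    denom = u_a + u_b - u_a * u_b
    fused_b = {x: (b_a[x] * u_b + b_b[x] * u_a) / denom for x in b_a}
    return fused_b, (u_a * u_b) / denom

def to_evidence(b, u):
    return {x: W * bx / u for x, bx in b.items()}

def to_opinion(r):
    s = W + sum(r.values())
    return {x: rx / s for x, rx in r.items()}, W / s

b_a, u_a = {"x1": 0.6, "x2": 0.2}, 0.2
b_b, u_b = {"x1": 0.1, "x2": 0.5}, 0.4

direct = cumulative_fuse(b_a, u_a, b_b, u_b)
r_sum = {x: to_evidence(b_a, u_a)[x] + to_evidence(b_b, u_b)[x] for x in b_a}
assert all(abs(direct[0][x] - to_opinion(r_sum)[0][x]) < 1e-9 for x in b_a)
```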
11.4 Averaging Fusion
Averaging fusion of belief opinions is equivalent to averaging the evidence parameters of the corresponding evidence opinions. The averaging fusion operator for belief opinions is then obtained through the bijective mapping between belief opinions and evidence opinions, as expressed by Definition 3.9.

Assume a domain X and its hyperdomain R(X), and a process whose outcomes are values of the variable X. Agents A and B observe the same outcomes of the same process over the same time period.
Even though A and B witness the same process, their perceptions might be different, e.g. because their cognitive capabilities are different. For example, consider a situation where persons A and B are observing coloured balls being picked from an urn, where the balls can have one of four colours: black, white, red or green. Assume further that observer B is colour blind, which means that he sometimes has trouble distinguishing between red and green balls, although he can always tell when a ball is black or white. Observer A has perfect colour vision and can normally always tell the correct colour of a picked ball. As a result, when a red ball has been picked, observer A normally identifies it as red, but observer B might identify it as green. This corresponds to a case where two observers have conflicting opinions about the same variable, although their observations and opinions are totally dependent. Assume that it is a priori unknown that one of the observers is colour blind, so that their opinions are considered equally reliable. The averaging fusion operator provides an adequate model for this fusion situation.
When it is assumed that some observers are unreliable, their opinions might be
discounted as a function of the analyst’s trust in the observers. This principle is
described in Chapter 13.
The symbol ‘⋄̲’ denotes the principle of averaging fusion, where two observers A and B are merged into a single imaginary observer denoted (A⋄̲B).

Definition 11.6 (The Averaging Fusion Operator).
Let ω_X^A and ω_X^B be opinions respectively held by agents A and B over the same (hyper)variable X on domain X. Let ω_X^{(A⋄̲B)} be the opinion such that:

Case I: For u_X^A ≠ 0 ∨ u_X^B ≠ 0:

    b_X^{(A⋄̲B)}(x) = ( b_X^A(x)·u_X^B + b_X^B(x)·u_X^A ) / ( u_X^A + u_X^B )
                                                                               (11.18)
    u_X^{(A⋄̲B)} = 2·u_X^A·u_X^B / ( u_X^A + u_X^B )

Case II: For u_X^A = 0 ∧ u_X^B = 0:

    b_X^{(A⋄̲B)}(x) = γ_X^A·b_X^A(x) + γ_X^B·b_X^B(x)
    u_X^{(A⋄̲B)} = 0

    where

    γ_X^A = lim_{u_X^A → 0, u_X^B → 0}  u_X^B / ( u_X^A + u_X^B )
                                                                               (11.19)
    γ_X^B = lim_{u_X^A → 0, u_X^B → 0}  u_X^A / ( u_X^A + u_X^B )

Then ω_X^{(A⋄̲B)} is called the averaged opinion of ω_X^A and ω_X^B, representing the combination of the dependent opinions of A and B. By using the symbol ‘⊕̲’ to designate this belief operator, we define ω_X^{(A⋄̲B)} ≡ ω_X^A ⊕̲ ω_X^B. □
It can be verified that the averaging fusion rule is commutative and idempotent,
but not associative.
The averaging fusion operator is equivalent to updating prior Dirichlet PDFs by
computing the average of prior evidence and new evidence to produce posterior
Dirichlet PDFs. Deriving the averaging belief fusion operator is based on the bijective mapping between the belief and evidence notations described in Definition 3.9.
Theorem 11.3. The averaging fusion operator of Definition 11.6 is equivalent to
simple averaging of the evidence parameters of evidence opinions as expressed in
Eq.(3.34).
Proof. The averaging fusion operator for belief opinions of Definition 11.6 is derived by mapping the argument belief opinions to evidence opinions through the bijective mapping of Definition 3.9. Averaging fusion of evidence opinions simply consists of computing the average of the evidence parameters. The fused evidence opinion is then mapped back to a belief opinion through the bijective mapping of Definition 3.9. This explanation is in essence the proof of Theorem 11.3. A more detailed explanation is provided below.
Let the two observers’ respective belief opinions be expressed as ω_X^A and ω_X^B. The corresponding evidence opinions Dir_X^{eH}(r_X^A, a_X) and Dir_X^{eH}(r_X^B, a_X) contain the respective evidence parameters r_X^A and r_X^B.

The averaging fusion of these two bodies of evidence simply consists of vector averaging of Dir_X^{eH}(r_X^A, a_X) and Dir_X^{eH}(r_X^B, a_X), expressed as:

    Dir_X^{eH}(r_X^{(A⋄̲B)}, a_X) = Dir_X^{eH}(r_X^A, a_X) ⊕̲ Dir_X^{eH}(r_X^B, a_X)
                                  = Dir_X^{eH}((r_X^A + r_X^B)/2, a_X) .        (11.20)

More specifically, for each value x ∈ R(X) the averaged observation evidence r_X^{(A⋄̲B)} is computed as:

    r_X^{(A⋄̲B)}(x) = ( r_X^A(x) + r_X^B(x) ) / 2 .                             (11.21)

The averaging fused belief opinion ω_X^{A⋄̲B} of Definition 11.6 results from mapping the fused evidence opinion of Eq.(11.20) back to a belief opinion by applying the bijective mapping of Definition 3.9. □
The averaging fusion operator represents a generalisation of the consensus rule for dependent opinions defined in [45].
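In the same illustrative style as for cumulative fusion, the sketch below computes Case I of Definition 11.6 and checks it against averaging of evidence parameters under a Definition 3.9 style mapping (again with W = 2 assumed only for the illustration).

```python
# Sketch of averaging belief fusion (Case I of Definition 11.6), checked against
# averaging of evidence parameters under a Definition 3.9 style mapping.

W = 2.0   # illustrative non-informative prior weight

def averaging_fuse(b_a, u_a, b_b, u_b):
    denom = u_a + u_b
    fused_b = {x: (b_a[x] * u_b + b_b[x] * u_a) / denom for x in b_a}
    return fused_b, 2.0 * u_a * u_b / denom

def to_evidence(b, u):
    return {x: W * bx / u for x, bx in b.items()}

def to_opinion(r):
    s = W + sum(r.values())
    return {x: rx / s for x, rx in r.items()}, W / s

b_a, u_a = {"x1": 0.6, "x2": 0.2}, 0.2
b_b, u_b = {"x1": 0.1, "x2": 0.5}, 0.4

direct = averaging_fuse(b_a, u_a, b_b, u_b)
r_avg = {x: (to_evidence(b_a, u_a)[x] + to_evidence(b_b, u_b)[x]) / 2 for x in b_a}
assert all(abs(direct[0][x] - to_opinion(r_avg)[0][x]) < 1e-9 for x in b_a)
```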
11.5 Hybrid Cumulative-Averaging Fusion
A hybrid fusion operator can be designed for situations where the arguments are
partially dependent. In such situations, neither cumulative fusion nor averaging fusion would be fully adequate. Instead a hybrid fusion operator can be used to model
the partial dependence between the arguments and compute a fused opinion.
Let two agents A and B receive evidence e.g. by observing the same process
during two partially overlapping periods. If it is known exactly which events were
observed by both, one of the agents could simply dismiss these observations, so
their opinions would be independent. However, it may not always be possible to
determine which observations are the same.
Instead, it may be possible to determine the degree of dependence between their
evidence. The idea is that cumulative fusion can be applied to the independent part,
and averaging fusion to the dependent part.
Let the opinions of A and B be represented as evidence opinions Dir_X^{eH}(r_X^A, a_X) and Dir_X^{eH}(r_X^B, a_X) with the respective evidence parameters r_X^A and r_X^B.

Assume that r_X^A can be split into two parts, where r_X^{Ai(B)} is totally independent of B’s evidence, and where r_X^{Ad(B)} is totally dependent on B’s evidence. Similarly, r_X^B can also be split into a dependent and an independent part, as expressed by Eq.(11.22).

    Partially dependent evidence:   r_X^A = r_X^{Ai(B)} + r_X^{Ad(B)}
                                    r_X^B = r_X^{Bi(A)} + r_X^{Bd(A)}          (11.22)
Figure 11.4 illustrates the situation of partially dependent evidence. Assuming
that the fraction of overlapping observations is known, the dependent and the independent parts of their observation evidence can be estimated, so that an operator for
combined cumulative and averaging fusion can be defined [45, 49].
Fig. 11.4 Partially dependent evidence: A’s evidence consists of the parts r_X^{Ai(B)} and r_X^{Ad(B)}, and B’s evidence consists of the parts r_X^{Bd(A)} and r_X^{Bi(A)}, where the dependent parts overlap.
Let A and B have evidence parameters r_X^A and r_X^B that are partially dependent, where δ_X^{(A/B)} represents the relative dependence of A’s evidence on B’s evidence, and δ_X^{(B/A)} denotes the relative dependence of B’s evidence on A’s evidence. As shown in Figure 11.4, the two degrees of dependence are not necessarily equal, e.g. when one of the observers has collected a larger body of evidence than the other observer. The dependent and independent evidence can be defined as a function of the two dependence factors:

    r_X^A :   r_X^{Ad(B)}(x) = r_X^A(x)·δ_X^{(A/B)}
              r_X^{Ai(B)}(x) = r_X^A(x)·(1 − δ_X^{(A/B)})
                                                                               (11.23)
    r_X^B :   r_X^{Bd(A)}(x) = r_X^B(x)·δ_X^{(B/A)}
              r_X^{Bi(A)}(x) = r_X^B(x)·(1 − δ_X^{(B/A)})
The fusion of partially dependent evidence opinions can then be defined as a
function of their respective dependent and independent parts.
Definition 11.7 (Fusion of Partially Dependent Evidence Opinions).
Let r_X^A and r_X^B be the evidence parameters in the evidence opinions respectively held by the agents A and B regarding variable X. The symbol ‘⊕̃’ denotes fusion between partially dependent opinions. As before, ‘⊕’ and ‘⊕̲’ are the respective operators for cumulative and averaging fusion. Partially dependent fusion between A and B can then be written as:

    r_X^{A⋄̃B} = r_X^A ⊕̃ r_X^B
              = ( r_X^{Ad(B)} ⊕̲ r_X^{Bd(A)} ) ⊕ r_X^{Ai(B)} ⊕ r_X^{Bi(A)}      (11.24)

□
The equivalent expression for fusion of partially dependent belief opinions can be obtained by using Eq.(11.24) and by applying the bijective mapping of Definition 3.9. The reciprocal dependence factors are as before denoted by δ_X^{(A/B)} and δ_X^{(B/A)}.

Definition 11.8 (Fusion of Partially Dependent Belief Opinions). Let A and B have the partially dependent opinions ω_X^A and ω_X^B respectively, about the same variable X, and let their dependent and independent parts be expressed according to Eq.(11.25) below.

    ω_X^{Ai(B)} :   b_X^{Ai(B)}(x) = b_X^A(x)·(1 − δ_X^{(A/B)}) / ( (1 − δ_X^{(A/B)})·(∑ b_X^A) + u_X^A ) ,   ∀x ∈ R(X)
                    u_X^{Ai(B)}    = u_X^A / ( (1 − δ_X^{(A/B)})·(∑ b_X^A) + u_X^A )

    ω_X^{Ad(B)} :   b_X^{Ad(B)}(x) = b_X^A(x)·δ_X^{(A/B)} / ( δ_X^{(A/B)}·(∑ b_X^A) + u_X^A ) ,   ∀x ∈ R(X)
                    u_X^{Ad(B)}    = u_X^A / ( δ_X^{(A/B)}·(∑ b_X^A) + u_X^A )
                                                                               (11.25)
    ω_X^{Bi(A)} :   b_X^{Bi(A)}(x) = b_X^B(x)·(1 − δ_X^{(B/A)}) / ( (1 − δ_X^{(B/A)})·(∑ b_X^B) + u_X^B ) ,   ∀x ∈ R(X)
                    u_X^{Bi(A)}    = u_X^B / ( (1 − δ_X^{(B/A)})·(∑ b_X^B) + u_X^B )

    ω_X^{Bd(A)} :   b_X^{Bd(A)}(x) = b_X^B(x)·δ_X^{(B/A)} / ( δ_X^{(B/A)}·(∑ b_X^B) + u_X^B ) ,   ∀x ∈ R(X)
                    u_X^{Bd(A)}    = u_X^B / ( δ_X^{(B/A)}·(∑ b_X^B) + u_X^B )
Having specified the separate dependent and independent parts of two partially dependent opinions, the fusion operator for partially dependent opinions can be expressed. The symbol ‘⊕̃’ denotes fusion between partially dependent opinions. As usual, ‘⊕’ and ‘⊕̲’ denote the operators for the independent and the dependent parts respectively.

    ω_X^{A⋄̃B} = ω_X^A ⊕̃ ω_X^B
              = ( ω_X^{Ad(B)} ⊕̲ ω_X^{Bd(A)} ) ⊕ ω_X^{Ai(B)} ⊕ ω_X^{Bi(A)}      (11.26)

□
Theorem 11.4. The fusion operator for partially dependent belief opinions described in Definition 11.8 is equivalent to the fusion operator for partially dependent evidence opinions of Definition 11.7.

Proof. To show the equivalence it is enough to map the partially dependent belief opinion arguments to evidence opinions according to the bijective mapping of Definition 3.9, then perform the fusion according to Definition 11.7, and finally map the result back to a belief opinion according to the same bijective mapping of Definition 3.9. The expressions of Definition 11.8 then emerge directly. □
It is also easy to prove that for any opinion ω_X^A with a dependence factor δ_X^{(A/B)} on another opinion ω_X^B the following equality holds:

    ω_X^A = ω_X^{Ai(B)} ⊕ ω_X^{Ad(B)}                                          (11.27)
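At the evidence level, Eq.(11.23) and Eq.(11.24) translate into a very small computation: the dependent fractions of the two evidence vectors are averaged and the independent fractions are added. The following sketch illustrates this with arbitrary dependence factors and evidence values.

```python
# Sketch of hybrid cumulative-averaging fusion of partially dependent evidence,
# following Eq.(11.23) (splitting) and Eq.(11.24) (averaging the dependent parts,
# adding the independent parts).  All numbers are illustrative.

def hybrid_fuse_evidence(r_a, r_b, delta_ab, delta_ba):
    """delta_ab: fraction of A's evidence that depends on B's, and vice versa."""
    fused = {}
    for x in r_a:
        dep_a, indep_a = r_a[x] * delta_ab, r_a[x] * (1 - delta_ab)
        dep_b, indep_b = r_b[x] * delta_ba, r_b[x] * (1 - delta_ba)
        fused[x] = (dep_a + dep_b) / 2 + indep_a + indep_b
    return fused

r_A = {"x1": 8.0, "x2": 2.0}
r_B = {"x1": 3.0, "x2": 3.0}
print(hybrid_fuse_evidence(r_A, r_B, delta_ab=0.25, delta_ba=0.50))
```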
11.6 Consensus & Compromise Fusion
CC-fusion (Consensus & Compromise) is a fusion model specifically designed to
satisfy the requirements of being idempotent, having a neutral element, and where
conflicting beliefs result in compromise beliefs. This shows that it is possible to
design fusion models to fit particular requirements.
Assume two opinions ω_X^A and ω_X^B over the variable X which takes its values from the hyperdomain R(X). The superscripts A and B are attributes that identify the respective belief sources or belief owners. These two opinions can be mathematically merged using the CC-fusion operator, denoted ‘⊕_CC’, which can be expressed as:

    Consensus & Compromise Fusion:   ω_X^{A♥B} = ω_X^A ⊕_CC ω_X^B .            (11.28)

Belief source combination denoted with ‘♥’ thus corresponds to opinion fusion with ‘⊕_CC’. The CC-operator is formally described next. It is a two-step operator where the consensus step comes first, and then the compromise step.
11.6.1 Consensus Step

The consensus step simply consists of determining the belief mass shared by the two arguments, which is stored as the belief vector b_X^{cons} expressed by Eq.(11.29):

    b_X^{cons}(x) = min( b_X^A(x), b_X^B(x) ) .                                (11.29)

The sum of consensus belief mass, denoted b_X^{cons}, is expressed as:

    b_X^{cons} = ∑_{x ∈ R(X)} b_X^{cons}(x) .                                  (11.30)

The residue belief masses of the arguments are:

    b_X^{resA}(x) = b_X^A(x) − b_X^{cons}(x)
    b_X^{resB}(x) = b_X^B(x) − b_X^{cons}(x)                                   (11.31)
11.6.2 Compromise Step

The compromise step redistributes conflicting residue belief mass to produce compromise belief mass, stored in b_X^{comp} as expressed by Eq.(11.32):

    b_X^{comp}(x_i) = b^{resA}(x_i)·u_X^B + b^{resB}(x_i)·u_X^A
                    + ∑_{y∩z = x_i} a_X(y/z)·a_X(z/y)·b^{resA}(y)·b^{resB}(z)
                    + ∑_{y∪z = x_i, y∩z ≠ ∅} (1 − a_X(y/z)·a_X(z/y))·b^{resA}(y)·b^{resB}(z)
                    + ∑_{y∪z = x_i, y∩z = ∅} b^{resA}(y)·b^{resB}(z) ,   where x_i ∈ P(X) .   (11.32)

The preliminary uncertainty u_X^{pre} is computed as:

    u_X^{pre} = u_X^A·u_X^B .                                                  (11.33)

The sum of compromise belief mass, denoted b_X^{comp}, is:

    b_X^{comp} = ∑_{x ∈ P(X)} b_X^{comp}(x) .                                  (11.34)

In general b_X^{cons} + b_X^{comp} + u_X^{pre} < 1, so normalisation of b_X^{comp} is required. The normalisation factor, denoted f_norm, is:

    f_norm = ( 1 − (b_X^{cons} + u_X^{pre}) ) / b_X^{comp} .                   (11.35)

Because belief mass on the whole domain X is uncertainty mass, the fused uncertainty is:

    u_X^{A♥B} = u_X^{pre} + f_norm·b_X^{comp}(X) .                             (11.36)

After computing the fused uncertainty, the compromise belief mass on X must be set to zero, i.e.

    b_X^{comp}(X) = 0 .                                                        (11.37)
11.6.3 Merging Consensus and Compromise Belief

After normalisation the resulting CC-fused belief mass distribution is:

    b_X^{A♥B}(x) = b_X^{cons}(x) + f_norm·b_X^{comp}(x) ,   ∀x ∈ R(X) .        (11.38)

The CC-operator is commutative, idempotent and semi-associative, with the vacuous opinion as neutral element. Semi-associativity means that three or more arguments must first be combined together in the consensus step, and then together again in the compromise step before normalisation.
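The two steps of the CC-operator are easy to follow in code. The sketch below implements Eqs.(11.29)-(11.38) for two hyper-opinions whose belief mass is indexed by subsets of the domain; the relative base rate a(y/z) = a(y ∩ z)/a(z) follows the notation used above. It is only an illustration of the equations, not a reference implementation, and the example opinions are arbitrary.

```python
from itertools import product

def cc_fuse(b1, u1, b2, u2, base_rate, domain):
    """CC-fusion of two hyper-opinions; belief dicts are keyed by frozensets."""
    a = lambda s: sum(base_rate[v] for v in s)
    rel = lambda y, z: a(y & z) / a(z)                      # relative base rate a(y/z)

    values = set(b1) | set(b2)
    # Consensus step, Eq.(11.29)-(11.31)
    cons = {x: min(b1.get(x, 0.0), b2.get(x, 0.0)) for x in values}
    res1 = {x: b1.get(x, 0.0) - cons[x] for x in values}
    res2 = {x: b2.get(x, 0.0) - cons[x] for x in values}

    # Compromise step, Eq.(11.32)-(11.33)
    comp = {x: res1[x] * u2 + res2[x] * u1 for x in values}
    for (y, wy), (z, wz) in product(res1.items(), res2.items()):
        w = wy * wz
        if w == 0.0:
            continue
        inter, union = y & z, y | z
        if inter:
            comp[inter] = comp.get(inter, 0.0) + rel(y, z) * rel(z, y) * w
            comp[union] = comp.get(union, 0.0) + (1 - rel(y, z) * rel(z, y)) * w
        else:
            comp[union] = comp.get(union, 0.0) + w
    u_pre = u1 * u2

    # Normalisation and merging, Eq.(11.34)-(11.38)
    f_norm = (1 - (sum(cons.values()) + u_pre)) / sum(comp.values())
    u = u_pre + f_norm * comp.pop(domain, 0.0)              # mass on X becomes uncertainty
    keys = set(cons) | set(comp)
    fused = {x: cons.get(x, 0.0) + f_norm * comp.get(x, 0.0) for x in keys}
    return fused, u

domain = frozenset({"x1", "x2", "x3"})
a = {"x1": 1/3, "x2": 1/3, "x3": 1/3}
b_A = {frozenset({"x1"}): 0.8}          # conflicting beliefs ...
b_B = {frozenset({"x2"}): 0.8}
print(cc_fuse(b_A, 0.2, b_B, 0.2, a, domain))   # ... become compromise belief on {x1, x2}
```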
Chapter 12
Unfusion and Fission of Subjective Opinions
Given belief fusion as a principle for merging evidence about a domain of interest,
it is natural to think of its opposite. However, it is not immediately clear what the
opposite of belief fusion might be. From a purely linguistic and semantic point of
view, fission naturally appears to be the opposite of fusion. As a consequence we
define belief fission for subjective opinions below. In addition, we also define unfusion for subjective opinions. The two concepts are related but still clearly different.
Their interpretations are explained in the respective sections below.
12.1 Unfusion of Opinions
The principle of unfusion [40] is the opposite of fusion, namely to eliminate the
contribution of a specific belief from an already fused belief, with the purpose of
deriving the remaining belief. This chapter describes cumulative unfusion as well as averaging unfusion of opinions. These operators can for example be applied to remove the contribution of a given real or hypothetical evidence source, in order to determine what the analysis result would have been in the absence of that evidence source.
Figure 12.1 illustrates the principle of unfusion: the contribution of the argument opinion ω_X^B is removed from the fused opinion ω_X^{A⋄B} in order to recover the remaining contributing opinion.

Fig. 12.1 Unfusion operator principle
There are situations where it is useful to separate a fused belief into its contributing belief components, and this process is called belief unfusion. It requires the already fused belief and one of its contributing belief components as input, and produces the remaining contributing belief component as output. Unfusion is basically the opposite of fusion, and the formal expressions for unfusion can be derived by rearranging the expressions for fusion. This will be described in the following sections.

Fission of beliefs is related to unfusion of beliefs but is different, and is discussed in Section 12.2. Fission simply means that a belief is split into several parts without specifying any of its contributing factors. A belief can for example be split into two equal contributing beliefs.

This section describes the unfusion operators corresponding to the cumulative and averaging fusion operators described in Chapter 11.
12.1.1 Cumulative Unfusion
Assume a domain X of cardinality k with hyperdomain R(X) and associated variable X. Assume two observers A and B who have observed the outcomes of a process over two separate time periods. Assume that the observers’ beliefs have been cumulatively fused into ω_X^{A⋄B} = ω_X^C = (b_X^C, u_X^C, a_X), and that entity B’s contributing opinion ω_X^B = (b_X^B, u_X^B, a_X) is known.

The cumulative unfusion of these two bodies of evidence is denoted ω_X^{(C⋄̃B)} = ω_X^A = ω_X^C ⊖ ω_X^B, and represents entity A’s contributing opinion. The mathematical expressions for cumulative unfusion are given below.
Definition 12.1 (The Cumulative Unfusion Operator). Let ω_X^C = ω_X^{A⋄B} be the cumulatively fused opinion of ω_X^B and the unknown opinion ω_X^A over the variable X. Let ω_X^A = ω_X^{(C⋄̃B)} be the opinion such that:

Case I: For u_X^C ≠ 0 ∨ u_X^B ≠ 0:

    b_X^A(x) = b_X^{(C⋄̃B)}(x) = ( b_X^C(x)·u_X^B − b_X^B(x)·u_X^C ) / ( u_X^B − u_X^C + u_X^B·u_X^C )
                                                                               (12.1)
    u_X^A = u_X^{(C⋄̃B)} = ( u_X^B·u_X^C ) / ( u_X^B − u_X^C + u_X^B·u_X^C )

Case II: For u_X^C = 0 ∧ u_X^B = 0:

    b_X^A(x) = b_X^{(C⋄̃B)}(x) = γ^B·b_X^C(x) − γ^C·b_X^B(x)
    u_X^A = u_X^{(C⋄̃B)} = 0                                                   (12.2)

    where

    γ^B = lim_{u_X^C → 0, u_X^B → 0}  u_X^B / ( u_X^B − u_X^C + u_X^B·u_X^C )
    γ^C = lim_{u_X^C → 0, u_X^B → 0}  u_X^C / ( u_X^B − u_X^C + u_X^B·u_X^C )

Then ω_X^{(C⋄̃B)} is called the cumulatively unfused opinion of ω_X^C and ω_X^B, representing the result of eliminating the opinion of B from that of C. By using the symbol ‘⊖’ to designate this belief operator, we define:

    Cumulative unfusion:   ω_X^{(C⋄̃B)} ≡ ω_X^C ⊖ ω_X^B .                      (12.3)

□
Cumulative unfusion is the inverse of cumulative fusion. Its proof and derivation are based on rearranging the mathematical expressions of Definition 11.5.

It can be verified that cumulative unfusion is non-commutative, non-associative and non-idempotent. In Case II of Definition 12.1, the unfusion operator is equivalent to the weighted subtraction of probabilities.
12.1.2 Averaging Unfusion
Assume a domain X of cardinality k with corresponding hyperdomain R(X) and associated variable X. Assume two observers A and B who have observed the same outcomes of a process over the same time period. Assume that the observers’ beliefs have been fused by averaging into ω_X^C = ω_X^{A⋄̲B} = (b_X^C, u_X^C, a_X^C), and that entity B’s contributing opinion ω_X^B = (b_X^B, u_X^B, a_X^B) is known.

The averaging unfusion of these two bodies of evidence is denoted ω_X^A = ω_X^{(C⋄̃B)} = ω_X^C ⊖̲ ω_X^B, and represents entity A’s contributing opinion. The mathematical expressions for averaging unfusion are given below.
Definition 12.2 (Averaging Unfusion Operator). Let ω_X^C = ω_X^{A⋄̲B} be the fused average opinion of ω_X^B and the unknown opinion ω_X^A over the variable X. Let ω_X^A = ω_X^{(C⋄̃B)} be the opinion such that:

Case I: For u_X^C ≠ 0 ∨ u_X^B ≠ 0:

    b_X^A(x) = b_X^{(C⋄̃B)}(x) = ( 2·b_X^C(x)·u_X^B − b_X^B(x)·u_X^C ) / ( 2·u_X^B − u_X^C )
                                                                               (12.4)
    u_X^A = u_X^{(C⋄̃B)} = ( u_X^B·u_X^C ) / ( 2·u_X^B − u_X^C )

Case II: For u_X^C = 0 ∧ u_X^B = 0:

    b_X^A(x) = b_X^{(C⋄̃B)}(x) = γ^B·b_X^C(x) − γ^C·b_X^B(x)
    u_X^A = u_X^{(C⋄̃B)} = 0                                                   (12.5)

    where

    γ^B = lim_{u_X^C → 0, u_X^B → 0}  2·u_X^B / ( 2·u_X^B − u_X^C )
    γ^C = lim_{u_X^C → 0, u_X^B → 0}  u_X^C / ( 2·u_X^B − u_X^C )

Then ω_X^{(C⋄̃B)} is called the average unfused opinion of ω_X^C and ω_X^B, representing the result of eliminating the opinion of B from that of C. By using the symbol ‘⊖̲’ to designate this belief operator, we define:

    Averaging unfusion:   ω_X^{(C⋄̃B)} ≡ ω_X^C ⊖̲ ω_X^B .                       (12.6)

□
Averaging unfusion is the inverse of averaging fusion. Its proof and derivation are based on rearranging the mathematical expressions of Definition 11.6.

It can be verified that the averaging unfusion operator is idempotent, non-commutative and non-associative.
12.1.3 Example: Cumulative Unfusion of Binomial Opinions
Assume that A has an unknown binomial opinion about x. Let B’s opinion and the cumulatively fused opinion of A’s and B’s opinions be specified as:

    ω_x^{A⋄B} = (0.90, 0.05, 0.05, 1/2)   and   ω_x^B = (0.70, 0.10, 0.20, 1/2) .

The cumulative unfusion operator can be used to derive A’s opinion. By applying the argument opinions to Eq.(12.1), the contributing opinion from A is derived as:

    ω_x^A = (0.91, 0.03, 0.06, 1/2) .
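The derivation of A’s opinion can be checked with a few lines of code. The sketch below applies Case I of Definition 12.1 componentwise to the binomial opinions above (belief, disbelief, uncertainty, base rate); the function name is illustrative.

```python
# Sketch of cumulative unfusion for binomial opinions (Case I of Definition 12.1).

def cumulative_unfuse(op_c, op_b):
    """Remove the contribution of op_b from the cumulatively fused opinion op_c."""
    (b_c, d_c, u_c, a), (b_b, d_b, u_b, _) = op_c, op_b
    denom = u_b - u_c + u_b * u_c
    return ((b_c * u_b - b_b * u_c) / denom,
            (d_c * u_b - d_b * u_c) / denom,
            (u_b * u_c) / denom,
            a)

fused = (0.90, 0.05, 0.05, 0.5)      # the fused opinion about x
omega_b = (0.70, 0.10, 0.20, 0.5)    # B's contributing opinion
print(cumulative_unfuse(fused, omega_b))   # ≈ (0.906, 0.031, 0.063, 0.5)
```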
12.2 Fission of Opinions
Assuming that an opinion can be considered as the actual or virtual result of fusion,
there are situations where it is useful to split it into two separate opinions, and this
process is called opinion fission. This operator, which requires an opinion and a
fission parameter as input arguments, will produce two separate opinions as output.
Fission is basically the opposite operation to fusion. The mathematical formulation
of fission will be described in the following sections.
12.2.1 Cumulative Fission
The principle of opinion fission is the opposite operation to opinion fusion. This section describes the fission operator corresponding to the cumulative fusion operator of Section 11.3.

There are in general an infinite number of ways to split an opinion. The principle followed here is to require an auxiliary fission parameter φ to determine how the argument opinion shall be split. As such, opinion fission is a binary operator, i.e. it takes two input arguments, namely the fission parameter and the opinion to be split.

Assume a domain X and its hyperdomain R(X), with associated variable X. Assume that the opinion ω_X^C = (b_X, u_X, a_X) over X is held by a real or imaginary entity C.

The fission of ω_X^C consists of splitting ω_X^C into two opinions ω_X^{C_1} and ω_X^{C_2}, assigned to the (real or imaginary) agents C_1 and C_2, so that ω_X^C = ω_X^{C_1} ⊕ ω_X^{C_2}. The parameter φ determines the relative proportion of belief mass that each new opinion gets: fission of ω_X^C produces the opinion ω_X^{C_1}, which carries the fraction φ of C’s evidence, and the opinion ω_X^{C_2}, which carries the fraction (1 − φ). The mathematical expressions for cumulative fission are constructed as follows.
First we map the argument opinion ω_X^C = (b_X^C, u_X^C, a_X) to the Dirichlet HPDF Dir_X^{eH}(r_X^C, a_X) according to the mapping of Definition 3.9. Then the parameters of this Dirichlet HPDF are linearly split into two parts Dir_X^{eH}(r_X^{C_1}, a_X) and Dir_X^{eH}(r_X^{C_2}, a_X) as a function of the fission parameter φ. These steps produce:

    Dir_X^{eH}(r_X^{C_1}, a_X) :   r_X^{C_1} = φ·W·b / u ,         a_X^{C_1} = a        (12.7)

    Dir_X^{eH}(r_X^{C_2}, a_X) :   r_X^{C_2} = (1 − φ)·W·b / u ,   a_X^{C_2} = a        (12.8)

where W denotes the non-informative prior weight.

The reverse mapping of these evidence parameters into two separate opinions according to Definition 3.9 produces the expressions of Definition 12.3 below. As would be expected, the base rate is not affected by fission.
Definition 12.3 (Cumulative Fission Operator).
Let ω_X^C be an opinion over the variable X. The cumulative fission of ω_X^C based on the fission parameter φ, where 0 < φ < 1, produces two opinions ω_X^{C_1} and ω_X^{C_2} defined by:

    ω_X^{C_1} :   b_X^{C_1} = φ·b / ( u + φ·∑_{i=1}^{k} b(x_i) )
                  u_X^{C_1} = u / ( u + φ·∑_{i=1}^{k} b(x_i) )                        (12.9)
                  a_X^{C_1} = a

    ω_X^{C_2} :   b_X^{C_2} = (1 − φ)·b / ( u + (1 − φ)·∑_{i=1}^{k} b(x_i) )
                  u_X^{C_2} = u / ( u + (1 − φ)·∑_{i=1}^{k} b(x_i) )                  (12.10)
                  a_X^{C_2} = a

The two fission products of Eq.(12.9) and Eq.(12.10) are denoted:

    ω_X^{C_1} = the φ-fraction of ω_X^C                                              (12.11)
    ω_X^{C_2} = the (1 − φ)-fraction of ω_X^C                                        (12.12)

In case [C : X] represents a trust edge where X represents a target entity, it can also be assumed that the entity X is being split, which leads to the same mathematical expressions as Eq.(12.9) and Eq.(12.10), but with the following notation:

    ω_{X_1}^C = the φ-fraction of ω_X^C = ω_X^{C_1}                                  (12.13)
    ω_{X_2}^C = the (1 − φ)-fraction of ω_X^C = ω_X^{C_2}                            (12.14)

□
It can be verified that ωXC1 ⊕ ωXC2 = ωXC , as expected. In case φ = 0 or φ = 1 one of
the resulting opinions will be vacuous, and the other equal to the argument opinion.
12.2.2 Fission of Average
Assume a domain X and the corresponding variable X. Then assume that the opinion
ωXA = (bb, u, a ) over X is held by a real or imaginary entity A.
Average fission of ω_X^A consists of splitting ω_X^A into two opinions ω_X^{A_1} and ω_X^{A_2} assigned to the (real or imaginary) agents A_1 and A_2, so that ω_X^A = ω_X^{A_1} ⊕̲ ω_X^{A_2}.
It turns out that averaging fission of an opinion trivially produces two opinions
that are equal to the argument opinion. This is because the average fusion of two
equal opinions necessarily produces the same opinion. It would be meaningless to
define this operator formally because it is trivial, and because it does not provide a
useful model for any interesting practical situation.
12.2.3 Example Fission of Opinion
Consider a ternary domain X with corresponding variable X and a hyper-opinion
ωX . An analyst wants to split the opinion based on the fission parameter φ = 0.75.
Table 12.1 shows the argument opinion as well as the result of the fission operation.
Table 12.1 Example cumulative opinion fission with φ = 0.75

Parameters:                          Argument opinion:   Fission result:
                                     ω_X^C               ω_X^{C_1}   ω_X^{C_2}
belief mass of x1:      b(x1)        0.20                0.194       0.154
belief mass of x2:      b(x2)        0.30                0.290       0.230
belief mass of x3:      b(x3)        0.40                0.387       0.308
uncertainty mass:       uX           0.10                0.129       0.308
base rate of x1:        a(x1)        0.10                0.10        0.10
base rate of x2:        a(x2)        0.20                0.20        0.20
base rate of x3:        a(x3)        0.70                0.70        0.70
projected prob. of x1:  P(x1)        0.21                0.207       0.185
projected prob. of x2:  P(x2)        0.32                0.316       0.292
projected prob. of x3:  P(x3)        0.47                0.477       0.523
It can be seen that the derived opinion ω_X^{C_1} contains significantly less uncertainty than ω_X^{C_2}, which means that ω_X^{C_1} represents the larger evidence base. This is due to the fission parameter φ = 0.75, which dictates the relative proportion of evidence between ω_X^{C_1} and ω_X^{C_2} to be 3 : 1.
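The fission result in Table 12.1 can be reproduced with the short sketch below, which applies Eq.(12.9) and Eq.(12.10) directly; the names are illustrative.

```python
# Sketch of cumulative opinion fission (Definition 12.3): the argument opinion is
# split into a phi-fraction and a (1-phi)-fraction of its evidence.

def fission(belief, u, base_rate, phi):
    total_b = sum(belief.values())
    def part(weight):                                   # Eq.(12.9) / Eq.(12.10)
        denom = u + weight * total_b
        return ({x: weight * b / denom for x, b in belief.items()},
                u / denom,
                dict(base_rate))                        # base rates are unchanged
    return part(phi), part(1.0 - phi)

belief = {"x1": 0.20, "x2": 0.30, "x3": 0.40}
u, a = 0.10, {"x1": 0.10, "x2": 0.20, "x3": 0.70}

(b1, u1, a1), (b2, u2, a2) = fission(belief, u, a, phi=0.75)
print(b1, u1)   # ≈ the first fission column of Table 12.1
print(b2, u2)   # ≈ the second fission column of Table 12.1
```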
Chapter 13
Computational Trust
Subjective logic was originally developed for the purpose of reasoning about trust in
information security, such as when analysing trust structures of a PKI (Public-Key
Infrastructure). Subjective logic and its application to this type of computational
trust was first proposed by Jøsang in 1997 [35]. The idea of computational trust was
originally proposed by Marsh in 1994 [64].
The concept of trust is to a large extent a mental and psychological phenomenon
which does not correspond to a physical process that can be objectively observed
and analysed. For this reason, formal trust models do not have any natural benchmark against which they can be compared and validated. There is thus no single
correct formalism for computational trust, so that any formal trust model to a certain extent becomes ad hoc. Using subjective logic as a basis for computational trust
is therefore just one of several possible approaches.
Computational trust with subjective logic has been a thriving research topic since
the first publication in 1997, with subsequent contributions by many different authors. It has the advantage of being intuitively sound and relatively simple, which is
important when making practical implementations.
The main subjective logic operators used for computational trust are fusion and
trust discounting. Fusion operators are described in Chapter 11 above. The operator
for trust discounting is described in Section 13.3 below. Trust discounting is the
operator for deriving trust or belief from transitive trust paths. Before diving into
the mathematical details of trust discounting the next section introduces the concept
of trust from a more philosophical perspective.
13.1 The Notion of Trust
Trust can be considered to be a particular kind of belief. In that sense trust can be
modeled as an opinion that can be used as input arguments, or that can be the output
results, in reasoning models based on subjective logic. We use the term trust opinion
to denote trust represented as a subjective opinion.
Trust is a directional relationship between two parties that can be called trustor
and trustee. One must assume the trustor to be a ‘thinking entity’ in some form
meaning that it has the ability to make assessments and decisions based on received
information and past experience. The trustee can be anything from a person, organisation or physical entity, to abstract notions such as information, propositions or a
cryptographic key [34].
A trust relationship has a scope, meaning that it applies to a specific purpose or domain of action, such as ‘being authentic’ in the case of an agent’s trust in a cryptographic key, or ‘providing reliable information’ in the case of a person’s trust in the correctness of an entry in Wikipedia (http://www.wikipedia.org/). Mutual trust is when both parties trust each
other with the same scope, but this is obviously only possible when both parties are
cognitive entities capable of doing some form of reliability, risk and policy assessment. Trust influences the trustor’s attitudes and actions, but can also have effects
on the trustee and other elements in the environment, for example, by stimulating
reciprocal trust [21]. The literature uses the term trust with a variety of different meanings, which can be confusing [67].
Two main interpretations are to view trust as the perceived reliability of something or somebody, called reliability trust, and to view trust as a decision to enter into
a situation of dependence on something or somebody, called decision trust. These
two different interpretations of trust are explained in the following sections. It can
already be mentioned that the notion of trust opinions in subjective logic assumes
trust to have the meaning of perceived reliability.
13.1.1 Reliability Trust
As the name suggest, reliability trust can be interpreted as the estimated reliability
of something or somebody independently of any actual commitment or decision,
and the definition by Gambetta (1988) [27] provides an example of how this interpretation can be articulated:
Definition 13.1 (Reliability Trust). Trust is the subjective probability by which an
individual, A, expects that another individual, B, performs a given action on which
its welfare depends.
⊔
⊓
In Definition 13.1 trust is interpreted as the trustor’s probability assessment of
the trustee’s reliability, in the context of the trustor’s potential dependence on the
trustee. Instead of using probability, the trustor can express the trustee’s reliability
as a subjective opinion, which thereby becomes a trust opinion.
Assume that an agent A holds a certain belief about an arbitrary variable X, which
then represents a belief relationship formally expressed as [A, X], and that agent A
also has a level of trust in entity E, which then represents a trust relationship formally expressed as [A, E]. A crucial semantic difference between holding a belief
about a variable X and having a level of trust in an entity E is that the trust relationship [A, E] assumes that trustor A potentially or actually is in a situation of
dependence on E, whereas the belief relationship [A, X] makes no assumption about
dependence.
By dependence is meant that the welfare of agent A depends on the performance of E, which A cannot accurately predict or control. This uncertainty about A’s objectives means that if E does not perform as assumed by A, then A will suffer some damage. In general, uncertainty about the objectives of an agent is defined as risk [33].
The dependence aspect of a trust opinion thus creates risk which is a function of
the potential damage resulting from the possible failure of entity E to meet its trust
expectations.
Trust opinions are binomial opinions because they apply to binary variables that naturally can take two values. A general trust domain can be denoted T = {t, t̄}, so that a binary random trust variable T can be assumed to take one of these two values with the general meaning:

    Trust domain T:   t : “The action is performed as expected.”
                      t̄ : “The action is not performed as expected.”

Assume that an entity E is trusted to perform a specific action. A binomial trust opinion about E can thus be denoted ω_{t_E}. However, in order to have more direct expressions for trust opinions we normally use the notation ω_E with the same meaning. This convention is expressed in Eq.(13.1).

    Equivalent opinion notations for trust in E:   ω_E ≡ ω_{t_E}               (13.1)
For example, when bank B provides credit to E it puts itself in a situation of
dependence on E and becomes exposed to risk in case E is unable to repay its
debt. The bank can use a trust opinion to express trust in E with regard to E’s
creditworthiness.
Trust in target E can be represented as belief in a variable T_E which takes its values from the binary domain T_E = {t_E, t̄_E}, where the values have the following meanings:

    Trust domain T_E:   t_E : “Entity E will pay its debt.”
                        t̄_E : “Entity E will default on its debt.”

A binomial opinion about the value t_E is then a trust opinion about E with regard to E’s creditworthiness. Bank B’s binomial trust opinion about entity E can thus be denoted ω_{t_E}^B. However, for ease of expression we use the simplified notation ω_E^B according to the notation equivalence of Eq.(13.1).
Trust opinions fit nicely into the reasoning framework of subjective logic, either as input arguments or as output results. Applying subjective logic to reasoning with trust opinions represents computational trust, which is a powerful way of taking subjective aspects of belief reasoning into account. Some subjective logic operators are essential for computational trust, in particular trust discounting, described in Section 13.3, and (trust) fusion, described in Chapter 11. Trust discounting is used for deriving opinions from transitive trust paths, and trust fusion is used for merging multiple trust paths. In combination, trust discounting and trust fusion form the main building blocks for trust networks, as described in Chapter 14.
13.1.2 Decision Trust
Trust can be interpreted with a more complex meaning than that of reliability
trust and Gambetta’s definition. For example, Falcone & Castelfranchi (2001) [22]
note that having high (reliability) trust in a person is not necessarily sufficient for
deciding to enter into a situation of dependence on that person. In [22] they write:
“For example it is possible that the value of the damage per se (in case of failure) is too
high to choose a given decision branch, and this independently either from the probability
of the failure (even if it is very low) or from the possible payoff (even if it is very high). In
other words, that danger might seem to the agent an intolerable risk.” [22]
To illustrate the difference between reliability trust and decision trust with a practical example, consider the situation of a fire drill where participants are asked to
abseil from the third floor window of a house using a rope that looks old and that
appears to be in a state of severe deterioration. In this situation, the participants
would assess the probability that the rope will hold them while abseiling.
Let R denote the rope, and assume the binary trust domain T_R = {t_R, t̄_R} where the values have the following meanings:

    Trust domain T_R:   t_R : “The rope will hold me while I’m abseiling down.”
                        t̄_R : “The rope will rupture if I try to abseil down.”

Person A’s reliability trust in the rope can then be expressed as a binomial opinion denoted ω_{t_R}^A, but we normally use ω_R^A as equivalent notation according to Eq.(13.1).
If person A thinks that the rope will rupture she would express this in the form of
a binomial opinion ωRA with a disbelief parameter dRA close to 1.0, as a representation
for distrust in the rope R, and would most likely refuse to use it for abseiling. The
fire drill situation is illustrated on the left side of Figure 13.1.
Imagine now that the same person is trapped in a real fire, and that the only escape
is to abseil from the third floor window using the same old rope. It is assumed that
the trust opinion ωRA is the same as before. However, in this situation it is likely that
person A would decide to trust the rope for abseiling down, even if she thinks it is
possible that it could rupture. The trust decision has thus changed even though the
reliability trust opinion is unchanged. This paradox is easily explained by the fact
Fig. 13.1 Same reliability trust, but different decision trust (“Would you trust this rope … in a fire drill?” “No, I would not!”; “… in a real fire?” “Yes, I would!”)
that we are here talking about two different types of trust, namely reliability trust
and decision trust.
The change in trust decision is perfectly rational because the likelihood of injury
or death while abseiling is assessed against the likelihood of smoke suffocation and
death by fire. Although the reliability trust in the rope is the same in both situations,
the decision trust changes as a function of the comparatively different utility values
associated with the different courses of action in the two situations. The following
definition captures the concept of decision trust.
Definition 13.2 (Decision Trust). Trust is the commitment to depend on something
or somebody in a given situation with a feeling of relative security, even though
negative consequences are possible (inspired by [67]).
⊔
⊓
In Definition 13.2, trust is primarily interpreted as the willingness to actually
rely on a given object, and specifically includes the notions of dependence on the
trustee, and its reliability and risk. In addition, Definition 13.2 implicitly also covers
situational elements such as utility (of possible outcomes), environmental factors
(law enforcement, contracts, security mechanisms etc.) and risk attitude (risk taking,
risk averse, etc.).
Both reliability trust and decision trust reflect a positive belief about something
on which the trustor depends for its welfare. Reliability trust is most naturally measured as a probability or opinion about reliability, whereas decision trust is most
naturally measured in terms of a binary decision. While most trust and reputation models assume reliability trust, decision trust can also be modelled. Systems based on decision trust models should be considered as decision support tools.
The difficulty of capturing the notion of trust in formal models in a meaningful
way has led some economists to reject it as a computational concept. The strongest
expression for this view has been given by Williamson (1993) [89] who argues that
the notion of trust should be avoided when modelling economic interactions, because it adds nothing new, and that well studied notions such as reliability, utility
and risk are adequate and sufficient for that purpose. Personal trust is the only type of
trust that can be meaningful for describing interactions, according to Williamson. He
argues that personal trust applies to emotional and personal interactions such as love
relationships where mutual performance is not always monitored and where failures
are forgiven rather than sanctioned. In that sense, traditional computational models
would be inadequate e.g. because of insufficient data and inadequate sanctioning,
but also because it would be detrimental to the relationships if the involved parties
were to take a computational approach. Non-computation models for trust can be
meaningful for studying such relationships according to Williamson, but developing such models should be done within the domains of sociology and psychology,
rather than in economy.
In the light of Williamson’s view on modelling trust it becomes important to
judge the purpose and merit of computational trust itself. Can computational trust
add anything new and valuable to the Internet technology and economy? The answer, in our opinion, is definitely yes. The value of computational trust lies in the
architectures and mechanisms for collecting trust relevant information, for efficient,
reliable and secure processing, for distribution of derived trust and reputation scores,
and for taking this information into account when navigating the Internet and making decisions about online activities and transactions. Economic models for risk
taking and decision making are abstract and do not address how to build trust networks and reputation systems. Computational trust specifically addresses how to
build such systems, and can be combined with economic modeling whenever relevant and useful.
13.1.3 Reputation and Trust
The concept of reputation is closely linked to that of trustworthiness, but it is evident
that there is a clear and important difference. For the purpose of this study, we will
define reputation according to Merriam-Webster’s online dictionary [68].
Definition 13.3 (Reputation). The overall quality or character as seen or judged by
people in general.
⊔
⊓
This definition corresponds well with the view of social network researchers
[26, 63] that reputation is a quantity derived from the underlying social network
which is globally visible to all members of the network. The difference between
trust and reputation can be illustrated by the following perfectly normal and plausible statements:
1. “I trust you because of your good reputation.”
2. “I trust you despite your bad reputation.”
Assuming that the two sentences relate to the same trust scope, statement 1 reflects that the relying party is aware of the trustee’s reputation, and bases his trust on that. Statement 2 reflects that the relying party has some private knowledge about the trustee, e.g. through direct experience or an intimate relationship, and that these factors overrule any (negative) reputation that a person might have. This observation reflects that trust ultimately is a personal and subjective phenomenon that is
based on various factors or evidence, and that some of those carry more weight than
others. Personal experience typically carries more weight than second hand trust referrals or reputation, but in the absence of personal experience, trust often has to be
based on referrals from others.
Reputation can be considered as a collective measure of trustworthiness (in the
sense of reliability) based on the referrals or ratings from members in a community. An individual’s subjective trust can be derived from a combination of received
referrals and personal experience.
Reputation can relate to a group or to an individual. A group’s reputation can for
example be modelled as the average of all its members’ individual reputations, or
as the average of how the group is perceived as a whole by external parties. Tadelis’
(2001) [87] study shows that an individual belonging to to a given group will inherit
an a priori reputation based on that group’s reputation. If the group is reputable all
its individual members will a priori be perceived as reputable and vice versa.
Reputation systems are automated systems for generating reputation scores about
products or services. Reputation systems are based on receiving feedback and ratings from users about their satisfaction with products or services that they have had
direct experience with, and use the ratings and the feedback to derive reputation
scores.
Reputation systems are widely used on e-commerce platforms, social networks
and in web 2.0 applications in general.
Evidence opinions, where the number of observations is explicitly represented, are well suited as the basis for computation in reputation systems. Feedback can be represented as an observation, and can be merged using the cumulative fusion operator described in Section 11.3. This type of reputation computation engine is called a Bayesian reputation system, because it is based on Bayesian statistics through Beta and Dirichlet PDFs. Chapter 15 below describes the principles and building blocks of Bayesian reputation systems.
13.2 Trust Transitivity
The formalism for computational trust described in the following sections assumes
that trust is interpreted as reliability trust according to Definition 13.1. Based on the
assumption that reliability trust is a form of belief, degrees of trust can be expressed
as trust opinions.
13.2.1 Motivating Example for Transitive Trust
We constantly make choices and decisions based on trust. As a motivating example,
let us assume that Alice has trouble with her car, so she needs to get it fixed by a
car mechanic. Assume further that Alice has recently moved to town and therefore
has no experience with having her car serviced in that town. Bob who is one of
her colleagues at work has lived in the town for many years. When Alice’s car
broke down, Bob gave her a lift with his car. Alice noticed that Bob’s car was well
maintained, so she intuitively trusted Bob in matters of car maintenance. Bob tells
her that he usually gets his car serviced by a car mechanic named Eric, and that based
on direct experience Eric seems to be a very skilled car mechanic. As a result, Bob
has direct trust in Eric. Bob advises her to get her car fixed at Eric’s garage. Based on her trust in Bob in matters of car maintenance, and on Bob’s advice, Alice develops trust in Eric too. Alice has derived indirect trust in Eric, because it is not based on direct experience. Still, it is genuine trust which helps Alice to make a
decision about where to get her car fixed.
This example represents trust transitivity in the sense that Alice trusts Bob who
trusts Eric, so that Alice also trusts Eric. This assumes that Bob actually tells Alice
that he trusts Eric, which is called a recommendation. This is illustrated in Figure 13.2, where the indexes indicate the order in which the trust relationships and
recommendations are formed.
Fig. 13.2 Transitive trust principle: (1) Bob’s direct functional trust in Eric, (2) Alice’s referral trust in Bob, (3) Bob’s recommendation to Alice, (4) Alice’s derived functional trust in Eric
Trust is only conditionally transitive [10]. For example, assume that Alice would
trust Bob to look after her child, and that Bob trusts Eric to fix his car, then this does
not imply that Alice trusts Eric for looking after her child. However, when certain semantic requirements are satisfied [47], trust can be transitive, and a trust system can
be used to derive trust. For example, every trust edge along a transitive chain must
share the same trust scope. In the last example, trust transitivity collapses because
the scopes of Alice’s and Bob’s trust are different. These semantic requirements are
described in Section 13.2.5 below.
Based on the situation of Figure 13.2, let us assume that Alice needs to have her
car serviced, so she asks Bob for his advice about where to find a good car mechanic
in town. Bob is thus trusted by Alice to know about a good car mechanic, and to tell
his honest opinion about that. Bob in turn trusts Eric to be a good car mechanic.
The examples above assume some sort of absolute trust between the agents in
the transitive chain. In reality trust is never absolute, and many researchers have proposed to express trust as discrete verbal statements, as probabilities or other continuous measures. When applying computation to such trust measures, intuition dictates
that trust should be weakened or diluted through transitivity. Revisiting the above
example, this means that Alice’s derived trust in the car mechanic Eric through the
recommender Bob can be at most as strong or confident as Bob’s trust in Eric. How
trust strength and confidence should be formally represented depends on the particular formalism used.
It could be argued that negative trust in a transitive chain can have the paradoxical effect of strengthening the derived trust. Take for example the case where Alice
distrusts Bob, and Bob distrusts Eric. In this situation, it might be reasonable for
Alice to derive positive trust in Eric, since she thinks “Bob is trying to trick me, I
will not rely on him”. In some situations it is valid to assume that the enemy of my
enemy is my friend. If Alice applies this principle, and Bob recommends
distrust in Eric, then Bob’s negative advice could count as a pro-Eric argument from
Alice’s perspective. The question of how transitivity of distrust should be interpreted
can quickly become very complex because it can involve multiple levels of deception. Models based on this type of reasoning have received minimal attention in
the trust and reputation systems literature, and it might be argued that the study of
such models belongs to the intelligence analysis discipline, rather than online trust
management. However, the fundamental issues and problems are the same in both
disciplines.
The safe and conservative approach is to assume that distrust in a node which
forms part of a transitive trust path should contribute to increased uncertainty in the
opinion about the target entity or variable. This is also the approach taken by the
typical trust discounting operator described in Section 13.3.
13.2.2 Referral Trust and Functional Trust
With reference to the previous example, it is important to separate between trust
in the ability to recommend a good car mechanic which represents referral trust,
and trust in actually being a good car mechanic which represents functional trust.
The scope of the trust is nevertheless the same, namely to be a good car mechanic.
Assuming that Bob has demonstrated to Alice that he is knowledgeable in matters
relating to car maintenance, Alice’s referral trust in Bob for the purpose of recommending a good car mechanic can be considered to be direct. Assuming that Eric on
several occasions has proven to Bob that he is a good mechanic, Bob’s functional
trust in Eric can also be considered to be direct. Thanks to Bob’s advice, Alice also
trusts Eric to actually be a good mechanic. However, this functional trust must be
considered to be indirect, because Alice has not directly observed or experienced
Eric’s skills in servicing and repairing cars.
The concept of referral trust represents a third type of belief/trust relationship,
which comes in addition to belief relationships and functional trust relationships. The
list of all three types of belief/trust relationships is given in Table 13.1, which extends Table 3.1 on p.19.
Table 13.1 Notation for belief, functional trust and referral trust relationships

  Nr.  Relationship type   Formal notation   Graph edge notation   Interpretation
  1a   Belief              [A, X]            A −→ X                A has an opinion on variable X
  2a   Functional trust    [A, E]            A −→ E                A has a functional trust opinion in entity E
  2b   Functional trust    [A, t_E]          A −→ t_E              A has a functional trust opinion in value t_E
  3a   Referral trust      [A; B]            A 99K B               A has a referral trust opinion in entity B
  3b   Referral trust      [A; t_B]          A 99K t_B             A has a referral trust opinion in value t_B
Note that types 2a and 2b in Table 13.1 simply give two equivalent notations for
functional trust relationships, and that types 3a and 3b give two equivalent notations
for referral trust relationships.
13.2.3 Notation for Transitive Trust
Table 13.1 specifies the notation for representing simple belief and trust relationships, where each relationship represents an edge. Transitive trust paths are formed
by combining relationship edges with the transitivity symbol ‘:’. For example, the
transitive trust path of Figure 13.2 can be formally expressed as:

Notation for transitive trust in Figure 13.2:   [A, E] = [A; B] : [B, E].   (13.2)
The referral trust edge from A to B is thus denoted [A; B], and the functional trust
edge from B to E is denoted [B, E]. The serial/transitive connection of the two trust
edges produces the derived functional trust edge [A, E].
The mathematics for computing derived trust opinions resulting from transitive
trust networks such as in Figure 13.2 and expressed by Eq.(13.2) is given by the
trust discounting operator described in Section 13.3 below.
Let us slightly extend the example, wherein Bob does not actually know any
car mechanics himself, but he trusts Claire, whom he believes knows a good car
mechanic. As it happens, Claire is happy to give a recommendation about Eric to
Bob, which Bob passes on to Alice. As a result of transitivity, Alice is able to derive
trust in Eric, as illustrated in Figure 13.3. This clearly illustrates that referral trust
in this example is about the ability to recommend somebody who can fix cars, and
that functional trust in this example is about the ability to actually fix cars. “To be
skilled at fixing cars” represents the scope of the trust relationships in this example.
Fig. 13.3 Trust derived through transitivity (diagram: (1) Alice’s referral trust in Bob, Bob’s referral trust in Claire and Claire’s functional trust in Eric, (2) the recommendations passed from Claire to Bob and from Bob to Alice, (3) Alice’s derived functional trust in Eric).
Formal notation of the graph of Figure 13.3 is given in Eq.(13.3).
Transitive trust path notation: [A, E] = [A; B] : [B;C] : [C, E]
(13.3)
The functional trust for the example of Figure 13.3 can be represented by the
binomial opinion ω^C_E, which expresses C’s level of functional trust in entity E.
The equivalent notation ω^C_{t_E} makes explicit the belief in the trustworthiness of E
expressed as the variable value t_E, where high belief mass assigned to t_E means that
C has high trust in E.
Similarly, the opinion ω^A_B expresses A’s referral trust in B, with the equivalent notation ω^A_{t_B}, where the statement t_B is interpreted e.g. as t_B: “Entity B can provide good
advice about car mechanics”.
13.2.4 Compact Notation for Transitive Trust Paths
A transitive trust path consists of a set of trust edges representing chained trust
relationships. It can be noted that two adjacent trust edges repeat the name of an
entity connecting the two edges together. For example, the entity name B appears
twice in the notation of the trust path [A; B] : [B;C].
This notation can be simplified by omitting the repetition of entities, and by
representing consecutive edges as a single multi-edge. By using this principle, the
compact notation for the transitive trust path of Figure 13.3 emerges, as given in
Eq.(13.4), which also shows the corresponding equivalent full notation.
Compact notation for trust paths: [A; B;C, E] ≡ [A; B] : [B;C] : [C, E].
(13.4)
The advantage of the compact notation is precisely that it is more compact than
the full notation, resulting in simpler expressions. We sometimes use the compact
notation in our description of trust networks below.
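As a small illustration of this convention, the following Python sketch (our own; the edge representation and the function name are not from this book) collapses a chained list of trust edges into the compact bracket notation.

```python
def compact_path(edges):
    """Collapse a chained transitive trust path into compact notation (Section 13.2.4).

    `edges` is a list of (source, target, kind) tuples, with kind in
    {'referral', 'functional'}, e.g. the path of Figure 13.3.
    """
    text = edges[0][0]
    previous_target = edges[0][0]
    for source, target, kind in edges:
        if source != previous_target:
            raise ValueError("edges do not form a chained path")
        # ';' separates referral hops, ',' marks the final functional/belief edge
        text += ('; ' if kind == 'referral' else ', ') + target
        previous_target = target
    return '[' + text + ']'

print(compact_path([('A', 'B', 'referral'),
                    ('B', 'C', 'referral'),
                    ('C', 'E', 'functional')]))   # prints: [A; B; C, E]
```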
13.2.5 Semantic Requirements for Trust Transitivity
The concept of referral trust might seem subtle. The interpretation of referral trust
is that Alice trusts Bob to recommend somebody (who can recommend somebody
etc.) who is a good car mechanic. At the same time, referral trust
always assumes the existence of functional trust or belief at the end of the transitive
path, which in this example is about being a good car mechanic.
The ‘referral’ variant of trust can be considered to be recursive, so that a transitive trust chain of arbitrary length can be expressed. This principle is captured
by the following criterion.
Definition 13.4 (Functional Trust Derivation Criterion). Derivation of functional
trust through referral trust requires that the last trust edge represents functional trust,
and that all previous trust edges represent referral trust. ⊔⊓
In practical situations, a trust scope can be characterised as general or specific. For example, knowing how to change wheels on a car is more specific than being
a good car mechanic, where the former scope is a subset of the latter. Whenever
the functional trust scope is equal to, or a subset of, the referral trust scopes, it is possible to form transitive paths. This can be expressed with the following consistency
criterion.
Definition 13.5 (Trust Scope Consistency Criterion). A valid transitive trust path
requires that the trust scope of the functional/last edge in the path be a subset of the
scopes of all previous referral trust edges in the path. ⊔⊓
Trivially, every trust edge can have the same trust scope. Transitive trust propagation is thus possible with two variants (i.e. functional and referral) of a single trust
scope.
A transitive trust path stops at the first functional trust edge encountered. It is, of
course, possible for a principal to have both functional and referral trust in another
principal, but that should be expressed as two separate trust edges. The existence
of both a functional and a referral trust edge, e.g. from Claire to Eric, should be
interpreted as Claire having trust in Eric not only to be a good car mechanic, but
also to recommend other car mechanics.
13.3 The Trust Discounting Operator
13.3.1 Principle of Trust Discounting
The general idea behind trust discounting is to express degrees of trust in an information source and then to discount information provided by that source as a function
of the trust in the source. We represent both the trust and the provided information
in the form of subjective opinions, and then define an appropriate operation on these
opinions to find the trust discounted opinion.
Let agent A denote the relying party and agent B denote an information source.
Assume that agent B provides information to agent A about the state of a variable X
expressed as a subjective opinion on X. Assume further that agent A has an opinion
on the trustworthiness of B with regard to providing information about X, i.e. the
trust scope is to provide information about X. Based on the combination of A’s trust
in B as well as B’s opinion about X given as an advice to A, it is possible for A to
derive an opinion about X. This process is illustrated in Figure 13.4.
Fig. 13.4 Trust discounting of opinions (arguments: A trusts B with opinion ω^A_B, and B believes X with opinion ω^B_X; result: A believes X with derived opinion ω^{[A;B]}_X = ω^A_B ⊗ ω^B_X).
Several trust discounting operators for subjective logic are described in the literature [49, 51]. The general representation of trust discounting is through conditionals
[51], while special cases can be expressed with specific trust discounting operators.
Here we use the specific case of uncertainty-favouring trust discounting,
which lets the uncertainty in A’s derived opinion about X increase as a function of the projected distrust in the recommender B. The uncertainty-favouring trust
discounting operator is described below.
13.3.2 Trust Discounting with 2-Edge Paths
Agent A’s referral trust in B can be formally expressed as a binomial opinion on
domain 𝕋_B = {t_B, t̄_B}, where the values t_B and t̄_B denote ‘trusted’ and ‘distrusted’ respectively. We denote this opinion by ω^A_B = (b^A_B, d^A_B, u^A_B, a^A_B).² The values b^A_B, d^A_B
and u^A_B represent the degrees to which A trusts, does not trust, or is uncertain about
the trustworthiness of B in the current situation, while a^A_B is the base rate probability
that A would assign to the trustworthiness of B a priori, before receiving the advice.

² According to Eq.(13.1), an equivalent notation for this opinion is ω^A_{t_B} = (b^A_{t_B}, d^A_{t_B}, u^A_{t_B}, a^A_{t_B}). For practical reasons we use the simplified notation here.
Definition 13.6 (Trust Discounting for 2-Edge Path). Assume agents A and B,
where A has referral trust in B for a trust scope related to domain 𝕏. Let X denote
a variable on domain 𝕏, and let ω^B_X = (b^B_X, u^B_X, a^B_X) be agent B’s general opinion on
X, as recommended by B to A. Also assume that agent A’s referral trust in B with
respect to recommending belief about X is expressed as ω^A_B. The notation for trust
discounting is given by:

\[
\omega^{[A;B]}_X = \omega^A_B \otimes \omega^B_X. \qquad (13.5)
\]

The trust discounting operator combines agent A’s referral trust opinion about
agent B, denoted by ω^A_B, to discount B’s opinion about variable X, denoted ω^B_X, to
produce A’s derived opinion about X, denoted ω^{[A;B]}_X. The parameters of the derived
opinion ω^{[A;B]}_X are defined in the following way:

\[
\omega^{[A;B]}_X :
\begin{cases}
\, b^{[A;B]}_X(x) = \mathrm{P}^A_B\, b^B_X(x)\\[4pt]
\, u^{[A;B]}_X = 1 - \mathrm{P}^A_B \displaystyle\sum_{x\in\mathcal{R}(X)} b^B_X(x)\\[4pt]
\, a^{[A;B]}_X(x) = a^B_X(x)
\end{cases}
\qquad (13.6)
\]
⊔⊓
The effect of the trust discounting operator is illustrated in Figure 13.5 with the
following example. Let ω^A_B = (0.20, 0.40, 0.40, 0.75) be A’s trust opinion on B,
with projected probability P^A_B = 0.50. Let further ω^B_X = (0.45, 0.35, 0.20, 0.25) be
B’s binomial opinion about the binary variable X, i.e. with projected probability
P^B_{x_1} = 0.50.
By using Eq.(13.6) we can compute A’s derived opinion about X as ω^{[A;B]}_X =
(0.225, 0.175, 0.600, 0.250), which has projected probability P^{[A;B]}_{x_1} = 0.375. The
opinion values are summarised in Eq.(13.7).
Fig. 13.5 Uncertainty-favouring trust discounting (barycentric triangle plots of A’s referral trust in B, B’s opinion about X, and A’s derived opinion about X).
\[
\omega^A_B:\begin{cases}
b^A_B = 0.20\\
d^A_B = 0.40\\
u^A_B = 0.40\\
a^A_B = 0.75\\
\mathrm{P}^A_B = 0.50
\end{cases}
\qquad
\omega^B_X:\begin{cases}
b^B_{x_1} = 0.45\\
d^B_{x_1} = 0.35\\
u^B_X = 0.20\\
a^B_{x_1} = 0.25\\
\mathrm{P}^B_{x_1} = 0.50
\end{cases}
\qquad
\omega^{[A;B]}_X:\begin{cases}
b^{[A;B]}_{x_1} = 0.225\\
d^{[A;B]}_{x_1} = 0.175\\
u^{[A;B]}_X = 0.600\\
a^{[A;B]}_{x_1} = 0.250\\
\mathrm{P}^{[A;B]}_{x_1} = 0.375
\end{cases}
\qquad (13.7)
\]
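To make the 2-edge computation concrete, the following is a minimal Python sketch of the uncertainty-favouring trust discounting operator of Definition 13.6 for binomial opinions. The tuple representation and the function name `discount` are our own illustration, not notation from this book; the example values are those of Eq.(13.7).

```python
from typing import NamedTuple

class Opinion(NamedTuple):
    """Binomial opinion (belief, disbelief, uncertainty, base rate)."""
    b: float
    d: float
    u: float
    a: float

    @property
    def P(self) -> float:
        """Projected probability P = b + a*u."""
        return self.b + self.a * self.u

def discount(trust: Opinion, advice: Opinion) -> Opinion:
    """Uncertainty-favouring trust discounting (Definition 13.6, binomial case).

    The recommended belief and disbelief masses are scaled by the projected
    probability of the referral trust opinion; the remainder becomes uncertainty.
    """
    p = trust.P
    return Opinion(b=p * advice.b,
                   d=p * advice.d,
                   u=1.0 - p * (advice.b + advice.d),
                   a=advice.a)

# Example values from Eq.(13.7):
trust_A_B = Opinion(0.20, 0.40, 0.40, 0.75)   # A's referral trust in B, P = 0.50
advice_B_X = Opinion(0.45, 0.35, 0.20, 0.25)  # B's opinion about X, P = 0.50

derived = discount(trust_A_B, advice_B_X)
print(derived)                # approx. Opinion(b=0.225, d=0.175, u=0.600, a=0.250)
print(round(derived.P, 3))    # 0.375
```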
The trust-discounted opinion ω^{[A;B]}_X typically has increased uncertainty compared to the original opinion recommended by B, where the uncertainty increase is
dictated by the projected probability of the referral trust opinion ω^A_B. The principle
is that the smaller the projected probability P^A_B, the greater the uncertainty of the
derived opinion ω^{[A;B]}_X.
Figure 13.5 illustrates the general behaviour of the uncertainty-favouring trust
discounting operator, where the derived opinion is constrained to the shaded subtriangle at the top of the right-most triangle. The size of the shaded sub-triangle
corresponds to the projected probability of trust in the trust opinion. The effect of
this is that the barycentric representation of ωXB is shrunk proportionally to PAB to
become a barycentric opinion representation inside the shaded sub-triangle.
Some special cases are worth mentioning. In case the projected trust probability
equals one, which means complete trust, the relying party accepts the recommended
opinion ‘as is’, meaning that the derived opinion is equal to the recommended opinion. In case the projected trust probability equals zero, which means complete distrust, the recommended opinion is reduced to a vacuous opinion, meaning that the
recommended opinion is completely discarded.
It can be mentioned that the trust discounting operator described above is a special case of the general trust discounting operator for deriving opinions from arbitrarily long trust paths, as expressed by Definition 13.7 below.
13.3.3 Example: Trust Discounting of Restaurant Recommendation
The following example illustrates how trust discounting is applied intuitively in real
situations.
Let us assume that Alice is on holiday in a town in a foreign country, and is
looking for a restaurant where the locals go, because she would like to avoid places
overrun by tourists. Alice’s impression is that it is hard to find a good restaurant,
and guesses that only about 20% of the restaurants could be characterised as good.
She would like to find the best one. While walking around town she meets a local
called Bob who tells her that restaurant Xylo is the favourite place for locals.
We assume that Bob is a stranger to Alice, so that a priori her trust in Bob is
affected by high uncertainty. However, it is enough for Alice to assume that locals
in general give good advice, which translates into a high base rate for her trust in
the advice from locals. Even if her trust in Bob is vacuous, a high base rate will
result in a strong projected probability of trust. Assuming that Bob gives a very
positive recommendation about Xylo, Alice will derive a positive opinion about the
restaurant based on Bob’s advice.
This example situation can be translated into numbers. Figure 13.6 shows a
screenshot of the online demonstrator for subjective logic trust discounting, with
arguments that are plausible for the example of the restaurant recommendation.
Fig. 13.6 Example trust discounting in the restaurant recommendation example (screenshot of the online demonstrator), with the following argument and derived opinion values:

  Opinion on [A;B]:    belief 0.00, disbelief 0.00, uncertainty 1.00, base rate 0.90, probability 0.90
  Opinion on [B,X]:    belief 0.95, disbelief 0.00, uncertainty 0.05, base rate 0.20, probability 0.96
  Opinion on [A;B,X]:  belief 0.85, disbelief 0.00, uncertainty 0.15, base rate 0.20, probability 0.88
As a result of the recommendation Alice becomes quite convinced that Xylo is
the right place to go, and intends to have dinner at Xylo in the evening.
However, if Alice receives contradictory advice from a second source, her trust in Xylo
could drop dramatically, so she might change her mind. This scenario is described
in an extended version of this example in Section 13.5.3 below.
The next section describes how transitive trust paths longer than two edges can be
analysed.
13.3.4 Trust Discounting for Multi-Edge Path
The trust discounting of Definition 13.6 describes how trust discounting is
performed for a trust path consisting of two adjacent edges, but does not say how
it should be computed for longer trust paths. The method for computing transitive
trust when three or more edges are chained is described below.
This type of multi-edge transitive trust path can be split into the referral trust
part, and the final belief part.
Consider the following trust network represented as a graph.
Multi-edge trust graph: (A_1 −→ X) = (A_1 99K A_2 99K ... 99K A_n −→ X)    (13.8)

The derived functional belief edge [A_1, X] of Eq.(13.8) is formally expressed as:

  Full formal notation:              [A_1, X] = [A_1; A_2] : [A_2; A_3] : ... : [A_n, X]
  Compact trust path notation:       [A_1, X] = [A_1; A_2; ... ; A_n, X]
  Separate referral and functional:  [A_1, X] = [A_1; ... ; A_n] : [A_n, X]          (13.9)
  Short notation with separation:    [A_1, X] = [A_1; A_n] : [A_n, X]
Let each edge from the full formal notation have an assigned opinion, where
the inter-agent trust opinions are denoted ω^{A_i}_{A_{i+1}}, and the final opinion is denoted
ω^{A_n}_X. All the inter-agent opinions represent referral trust, whereas the final opinion
represents functional belief.
The projected probability of the referral trust part [A_1; A_n] is computed as:

\[
\text{Referral Trust Projected Probability:}\quad \mathrm{P}^{A_1}_{A_n} = \prod_{i=1}^{n-1} \mathrm{P}^{A_i}_{A_{i+1}} \qquad (13.10)
\]
Trust discounting with an arbitrarily long referral trust path is defined as a function of the referral trust projected probability of Eq.(13.10).
Definition 13.7 (Trust Discounting for Multi-Edge Paths).
Assume a transitive trust path consisting of chained trust edges between n agents
denoted A_1 ... A_n, followed by a final belief edge between the last agent A_n and the
target node X, where the goal is to derive a belief edge between the first agent A_1 and
the target X. The parameters of the derived opinion ω^{A_1}_X are defined in the following
way:

\[
\omega^{A_1}_X :
\begin{cases}
\, b^{A_1}_X(x) = \mathrm{P}^{A_1}_{A_n}\, b^{A_n}_X(x)\\[4pt]
\, u^{A_1}_X = 1 - \mathrm{P}^{A_1}_{A_n} \displaystyle\sum_{x\in\mathcal{R}(X)} b^{A_n}_X(x)\\[4pt]
\, a^{A_1}_X(x) = a^{A_n}_X(x)
\end{cases}
\qquad (13.11)
\]
⊔⊓
The principle of multi-edge trust discounting is to take the projected probability
of the referral trust part, expressed by Eq.(13.10), as a measure of the trust network
reliability. This reliability measure is then used to discount the recommended functional opinion ω^{A_n}_X of the final belief edge [A_n, X].
Note that the transitive referral trust computation described here is similar to the computation of serial reliability of components in a system, as described in Section 7.2. The
difference is that in the case of serial reliability analysis the subjective logic multiplication operator described in Chapter 7 is used, whereas for transitive trust networks
the simple product of projected probabilities is used. Using only the probability product is sufficient for trust networks, because only the projected probability is needed
for the trust discounting operator.
In the case where every referral trust opinion has projected probability P^{A_i}_{A_{i+1}} = 1,
the product referral trust projected probability also equals 1, so that the derived opinion ω^{A_1}_X is equal to the recommended opinion ω^{A_n}_X. In case any of the
referral trust relationships has projected probability P^{A_i}_{A_{i+1}} = 0, then the product referral trust projected probability is also zero, so the derived opinion ω^{A_1}_X becomes
vacuous.
Note that the operator for deriving opinions from arbitrarily long trust paths described here is a generalisation of the trust discounting operator for combining only
two edges, described by Definition 13.6 above.
As an example consider the transitive trust network expressed as:
[A, X] = [A; B] : [B;C] : [C; D] : [D, X]
= [A; B;C; D, X]
(13.12)
= [A; B;C; D] : [D, X] = [A; D] : [D, X].
Table 13.2 provides example opinions for the trust path of Eq.(13.12), and shows
the result of the trust discounting computation.
Although each referral trust edge has relatively high projected probability, their
product quickly drops to a relatively low value, as expressed by P^A_D = 0.44. The
trust-discounted opinion ω^{[A;D]}_X therefore becomes highly uncertain.
Table 13.2 Example trust discounting in multi-edge path

  Parameters           Argument opinions                      Product   Derived
                       ω^A_B   ω^B_C   ω^C_D   ω^D_X         P^A_D     ω^{[A;D]}_X
  belief:          b   0.20    0.20    0.20    0.80                     0.35
  disbelief:       d   0.10    0.10    0.10    0.20                     0.09
  uncertainty:     u   0.70    0.70    0.70    0.00                     0.56
  base rate:       a   0.80    0.80    0.80    0.10                     0.10
  projected prob.: P   0.76    0.76    0.76    0.80           0.44      0.41
This result reflects the intuition that a long path of indirect recommendations
quickly becomes meaningless.
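Under the same assumptions as the earlier sketch, multi-edge discounting per Definition 13.7 only adds the product of the referral trust projected probabilities of Eq.(13.10). The following self-contained Python sketch reproduces the derived opinion of Table 13.2; the plain-tuple representation and the function names are again our own illustration.

```python
import math

def projected(op):
    """Projected probability P = b + a*u of a binomial opinion (b, d, u, a)."""
    b, d, u, a = op
    return b + a * u

def discount_path(referral_opinions, final_opinion):
    """Multi-edge trust discounting (Definition 13.7, binomial case).

    The referral part is reduced to the product of its projected probabilities
    (Eq. 13.10), which then discounts the final functional opinion (Eq. 13.11).
    """
    p = math.prod(projected(op) for op in referral_opinions)
    b, d, u, a = final_opinion
    return (p * b, p * d, 1.0 - p * (b + d), a)

# Example values from Table 13.2: three identical referral opinions and D's opinion on X.
referrals = [(0.20, 0.10, 0.70, 0.80)] * 3        # each has P = 0.76
omega_D_X = (0.80, 0.20, 0.00, 0.10)

derived = discount_path(referrals, omega_D_X)
print(tuple(round(v, 2) for v in derived))        # (0.35, 0.09, 0.56, 0.1)
print(round(projected(derived), 2))               # 0.41
```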
It is worth asking the question whether a trust path longer than a few edges can
be practical. In everyday contexts we rarely rely on trust paths longer than a couple
of edges. For example, few people would put much faith in advice delivered like
this: “A colleague of mine told me that his sister has a friend who knows a good car
mechanic, so therefore you should take your car to his garage”.
It is natural to become suspicious about the veracity of information or advice in
situations with a high degree of separation between the analyst agent A and the original source of information about the target X, because we often see how information
gets distorted when passed from person to person through many hops. This is because we humans are quite unreliable agents regarding truthful representation and
forwarding of information that we receive.
However, computer systems are able to correctly propagate information through
multiple nodes with relatively good reliability. The application of trust transitivity
through long chains is therefore better suited for computer networks than human
networks.
MANETs (Mobile Ad-Hoc Networks) and sensor networks represent a type of
computer networks where multiple nodes depend on each other for service provision. A typical characteristic of such networks is the uncertain reliability of each
node, as well as the lack of control of one node over other nodes. Transitive trust
computation with subjective logic therefore is highly relevant for MANETs and
similar networks, even in case of long trust paths.
13.4 Trust Fusion
It is common to collect recommendations from several sources in order to be better
informed, e.g. when making decisions. This can be called trust fusion, meaning that
the derived trust opinions resulting from multiple trust paths are fused.
Let us continue the example of Alice who needs to have her car serviced, where
she has received a recommendation from Bob to use the car mechanic Eric. This
time we assume that Alice has doubts about Bob’s advice so she would like to
get a second opinion. She therefore asks her other colleague Claire for her opinion
about Eric. The trust graph which includes both recommendations is illustrated in
Figure 13.7.
Fig. 13.7 Example of trust fusion (diagram: A has referral trust in B and in C (1), B and C have functional trust in E (1), B and C pass recommendations to A (2), and A derives functional trust in E (3)).
Formal notation of the graph of Figure 13.7 is shown in Eq.(13.13). This trust
network also involves trust transitivity which is computed with trust discounting.
Trust fusion formal notation:  [A, E] = ([A; B] : [B, E]) ⋄ ([A; C] : [C, E])    (13.13)
Compact notation:              [A, E] = [A; B, E] ⋄ [A; C, E]

The computation of trust fusion involves both trust discounting and belief fusion, because what is actually fused is a pair of opinions computed with trust discounting. The trust target E in Figure 13.7 can be expressed as a binary variable
X = {“E is reliable”, “E is unreliable”}, so that A in fact derives a (trust) opinion about the variable X.
In general it is assumed that agent A receives opinions about target X from two
sources B and C, and that A has referral trust opinions in both B and C, as illustrated
in Figure 13.8.
Fig. 13.8 Fusion of trust-discounted opinions (arguments: A trusts B and C with opinions ω^A_B and ω^A_C, while B and C believe X with opinions ω^B_X and ω^C_X; result: A believes X with derived opinion ω^{[A;B]⋄[A;C]}_X = (ω^A_B ⊗ ω^B_X) ⊕ (ω^A_C ⊗ ω^C_X)).
The symbol ⋄ denotes fusion between the two trust paths [A; B, X] and [A; C, X]
(in compact notation). The choice of this symbol was motivated by the resemblance
between the diamond shape and the graph of a typical trust fusion network, such as
on the left side of Figure 13.8. The expression for A’s derived opinion about X
as a function of trust fusion is given by Eq.(13.14).

\[
\text{Trust fusion computation:}\quad \omega^{[A;B]\diamond[A;C]}_X = (\omega^A_B \otimes \omega^B_X) \oplus (\omega^A_C \otimes \omega^C_X) \qquad (13.14)
\]
The operator for fusing trust paths must be selected from the set of fusion operators described in Chapter 11 according to the selection criteria described in Figure 11.2 on p.206. As an example Eq.(13.14) shows trust fusion using the cumulative
fusion operator ‘⊕’.
Table 13.3 provides a numerical example showing the result of cumulative trust
fusion in the example of Alice and the car mechanic of Figure 13.7. The trust opinions derived from each path are first computed with the trust discounting operator
of Definition 13.6 (which is a special case of Definition 13.7). Then the two derived
trust opinions are fused with the cumulative fusion operator of Definition 11.5.
Table 13.3 Example trust fusion for the car mechanic situation of Figure 13.7

  Parameters           Argument opinions               Intermediate                Derived
                       ω^A_B  ω^B_E  ω^A_C  ω^C_E      ω^{[A;B]}_E  ω^{[A;C]}_E    ω^{[A;B]⋄[A;C]}_E
  belief:          b   0.40   0.90   0.50   0.80       0.630        0.600          0.743
  disbelief:       d   0.10   0.00   0.00   0.10       0.000        0.075          0.048
  uncertainty:     u   0.50   0.10   0.50   0.10       0.370        0.325          0.209
  base rate:       a   0.60   0.40   0.50   0.40       0.400        0.400          0.400
  projected prob.: P   0.70   0.94   0.75   0.84       0.778        0.730          0.826
In this example, both Bob and Claire offer relatively strong recommendations
about Eric, so that Alice’s first derived trust in Eric is strengthened by asking Claire
for a second opinion.
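The following self-contained Python sketch illustrates Eq.(13.14) and approximately reproduces the values of Table 13.3. The cumulative fusion function is a sketch of the binomial case of Definition 11.5, restricted to non-dogmatic arguments with equal base rates; the tuple representation and function names are our own illustration, not notation from this book.

```python
def projected(op):
    """Projected probability P = b + a*u of a binomial opinion (b, d, u, a)."""
    b, d, u, a = op
    return b + a * u

def discount(trust, advice):
    """Uncertainty-favouring trust discounting (Definition 13.6), binomial case."""
    p = projected(trust)
    b, d, u, a = advice
    return (p * b, p * d, 1.0 - p * (b + d), a)

def cumulative_fuse(op1, op2):
    """Cumulative belief fusion of two binomial opinions (sketch of Definition 11.5),
    assuming the non-dogmatic case u1 + u2 > u1*u2 and equal base rates."""
    b1, d1, u1, a1 = op1
    b2, d2, u2, a2 = op2
    k = u1 + u2 - u1 * u2
    return ((b1 * u2 + b2 * u1) / k,
            (d1 * u2 + d2 * u1) / k,
            (u1 * u2) / k,
            a1)

# Argument opinions from Table 13.3 (Alice, Bob, Claire and Eric in Figure 13.7).
omega_A_B, omega_B_E = (0.40, 0.10, 0.50, 0.60), (0.90, 0.00, 0.10, 0.40)
omega_A_C, omega_C_E = (0.50, 0.00, 0.50, 0.50), (0.80, 0.10, 0.10, 0.40)

path_AB = discount(omega_A_B, omega_B_E)   # approx. (0.630, 0.000, 0.370, 0.400)
path_AC = discount(omega_A_C, omega_C_E)   # approx. (0.600, 0.075, 0.325, 0.400)
fused = cumulative_fuse(path_AB, path_AC)  # approx. (0.74, 0.05, 0.21, 0.40), cf. Table 13.3
print(round(projected(fused), 3))          # approx. 0.826
```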
Figure 13.9 shows a screenshot of the online demonstrator for subjective logic
trust networks, based on the input arguments in the example of Table 13.3.
Fig. 13.9 Example trust fusion (screenshot of the online demonstrator, with opinion triangles for the four argument opinions and the derived trust opinion).
Figure 13.9 shows the same trust network as that of Figure 13.7, but where the
opinion triangles are placed on top of each edge. The input arguments are represented as the 4 opinion triangles at the top of the figure, and the derived trust opinion
is represented as the opinion triangle at the bottom of the figure.
This trust fusion example uses a combination of trust discounting and fusion. By
combining fusion and trust discounting, complex trust networks can be modeled and
analysed, as described in the following sections.
13.5 Trust Revision
13.5.1 Motivation for Trust Revision
A complicating element in case of trust fusion is when multiple sources provide
highly conflicting advice, which might indicate that one or both sources are unreliable. In this case a strategy is needed for dealing with the conflict. The chosen
strategy must be suitable for the specific situation.
13.5 Trust Revision
257
A simplistic strategy for fusing conflicting opinions would be to consider the
trust opinions as static, and not to revise trust at all. With this strategy the relying
party only needs to determine the most suitable fusion operator for the situation to
be analysed. For example, if averaging fusion is considered suitable, then a simple
model would be to derive A’s opinion about X according to the principle of trust
fusion of Eq.(13.13) as follows:
\[
\text{Simple averaging fusion:}\quad \omega^{[A;B]\diamond[A;C]}_X = \big(\omega^A_B \otimes \omega^B_X\big)\,\underline{\oplus}\,\big(\omega^A_C \otimes \omega^C_X\big) \qquad (13.15)
\]
However, there are several situations where simple fusion might be considered
inadequate, and where it would be natural to revise one or both trust opinions.
One such situation is when the respective opinions provided by B and C are
highly conflicting in terms of their projected probability distributions on X.
Note here that in some situations this might be natural, such as in case of short
samples of random processes where a specific type of events might be observed in
clusters. Another situation where highly conflicting beliefs might occur naturally is
when the observed system can change characteristics over time and the observed
projected probability distributions refer to different time periods.
However, if the sources B and C have observed the same situation or event, possibly at the same time, but still produce different opinions, then it is likely that one or
both sources are unreliable, so that trust revision should be considered.
Another situation that calls for a trust revision is when the relying party A learns
that the ground truth about X is radically different from the recommended opinions. Then the analyst has good reasons to distrust a source that recommends a very
different opinion.
Since high conflict indicates that one or several sources might be unreliable, the
strategy should aim at reducing the influence of unreliable sources in order to derive
the most reliable belief in the target.
A reduction of the influence of unreliable sources typically involves some form
of trust revision, i.e. that the analyst’s trust in one or several advisors is reduced as
a function of their degree of conflict. This section describes a strategy for how this
should be done.
In conclusion, there are situations where revision of trust opinions is warranted.
The method for doing this is described in the next section.
13.5.2 Trust Revision Method
Trust revision is based on the degree of conflict between the opinions derived
through trust discounting in two different paths. The rationale is that conflict indicates that one or both advisors are unreliable, so that the referral trust in the advisors
should be revised as a function of the degree of conflict.
Recall from Section 4.8 and Definition 4.20 that DC (degree of conflict) is the
product of PD (projected probability distance) and CC (conjunctive certainty) expressed as:
Degree of conflict: DC = PD · CC    (13.16)

When applied to the two trust-discounted opinions ω^{[A;B]}_X and ω^{[A;C]}_X, the projected
probability distance PD is expressed as:

\[
\mathrm{PD}\big(\omega^{[A;B]}_X, \omega^{[A;C]}_X\big) = \frac{\displaystyle\sum_{x\in\mathbb{X}} \big|\mathrm{P}^{[A;B]}_X(x) - \mathrm{P}^{[A;C]}_X(x)\big|}{2} \qquad (13.17)
\]

Similarly, when applied to the two trust-discounted opinions ω^{[A;B]}_X and ω^{[A;C]}_X, the
conjunctive certainty CC is expressed as:

\[
\mathrm{CC}\big(\omega^{[A;B]}_X, \omega^{[A;C]}_X\big) = \big(1 - u^{[A;B]}_X\big)\big(1 - u^{[A;C]}_X\big) \qquad (13.18)
\]

The degree of conflict between the two trust-discounted opinions is then:

\[
\mathrm{DC}\big(\omega^{[A;B]}_X, \omega^{[A;C]}_X\big) = \mathrm{PD}\big(\omega^{[A;B]}_X, \omega^{[A;C]}_X\big) \cdot \mathrm{CC}\big(\omega^{[A;B]}_X, \omega^{[A;C]}_X\big). \qquad (13.19)
\]
Knowing the degree of conflict is only one of the factors for determining the
magnitude of revision for the referral trust opinions ω^A_B and ω^A_C. It is natural to let the
magnitude of trust revision also be determined by the relative degree of uncertainty
in the referral trust opinions, so that the most uncertain opinion gets revised the
most. The rationale is that if the analyst has uncertain referral trust in another agent,
then the level of trust could easily change.
The uncertainty differential (UD) is a measure of the relative uncertainty between
two referral trust opinions. There is one UD for each opinion relative to the other.

\[
\text{Uncertainty Differentials:}\quad
\begin{cases}
\, \mathrm{UD}(\omega^A_B \,|\, \omega^A_C) = \dfrac{u^A_B}{u^A_B + u^A_C}\\[10pt]
\, \mathrm{UD}(\omega^A_C \,|\, \omega^A_B) = \dfrac{u^A_C}{u^A_B + u^A_C}
\end{cases}
\qquad (13.20)
\]
It can be seen that UD ∈ [0, 1], where UD = 0.5 means that both referral trust
opinions have equal uncertainty and therefore should get their equal share of revision. The case where UD = 1 means that the first referral trust opinion is infinitely
more uncertain than the other and therefore should get all the revision. The case
where UD = 0 means that the other referral trust opinion is infinitely more uncertain than the first, so that the other opinion should get all the revision.
In summary the UD factors dictate the relative share of trust revision for each
referral trust opinion. The magnitude of trust revision is determined by the revision
factor RF, which is the product of DC and UD.
\[
\text{Revision Factors:}\quad
\begin{cases}
\, \mathrm{RF}(\omega^A_B) = \mathrm{UD}(\omega^A_B \,|\, \omega^A_C) \cdot \mathrm{DC}\big(\omega^{[A;B]}_X, \omega^{[A;C]}_X\big)\\[6pt]
\, \mathrm{RF}(\omega^A_C) = \mathrm{UD}(\omega^A_C \,|\, \omega^A_B) \cdot \mathrm{DC}\big(\omega^{[A;B]}_X, \omega^{[A;C]}_X\big)
\end{cases}
\qquad (13.21)
\]
Trust revision consists of modifying the referral trust opinions by increasing distrust mass at the cost of trust mass and uncertainty mass. The idea is that sources
found to be unreliable should be distrusted more. A source found to be completely
unreliable should be absolutely distrusted.
In terms of the opinion triangle, trust revision consists of moving the opinion
point towards the t-vertex, as shown in Figure 13.10. Given the argument referral
eBA is
trust opinion ωBA = (bAB , dBA , uAB , aAB ), the revised referral trust opinion denoted ω
expressed as:
 A
b̃B = bAB − bAB · RF(ωBA )








d˜BA = dBA + (1 − dBA ) · RF(ωBA )
A
eB :
(13.22)
Revised opinion ω

A = uA − uA · RF(ω A )

ũ

B
B
B
B





 A
ãB = aAB
Similarly, given the argument referral trust opinion ω^A_C = (b^A_C, d^A_C, u^A_C, a^A_C), the
revised referral trust opinion, denoted ω̃^A_C, is expressed as:

\[
\text{Revised opinion } \widetilde{\omega}^A_C :
\begin{cases}
\, \tilde{b}^A_C = b^A_C - b^A_C \cdot \mathrm{RF}(\omega^A_C)\\[4pt]
\, \tilde{d}^A_C = d^A_C + (1 - d^A_C) \cdot \mathrm{RF}(\omega^A_C)\\[4pt]
\, \tilde{u}^A_C = u^A_C - u^A_C \cdot \mathrm{RF}(\omega^A_C)\\[4pt]
\, \tilde{a}^A_C = a^A_C
\end{cases}
\qquad (13.23)
\]
Figure 13.10 illustrates the effect of trust revision on ω^A_B, which consists of making the referral trust opinion more distrusting.
After trust revision has been applied to produce ω̃^A_B and ω̃^A_C, trust fusion according to Eq.(13.15) can be repeated, with reduced conflict. Trust-revised averaging
fusion is given by the expression in Eq.(13.24) below.

\[
\text{Revised trust fusion:}\quad \widetilde{\omega}^{[A;B]\diamond[A;C]}_X = \big(\widetilde{\omega}^A_B \otimes \omega^B_X\big)\,\underline{\oplus}\,\big(\widetilde{\omega}^A_C \otimes \omega^C_X\big) \qquad (13.24)
\]
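The revision step of Eqs.(13.16)–(13.23) can be sketched in a few lines of Python. The code below is our own illustration (the tuple representation and function names are not from this book); it uses the argument opinions of the restaurant example in Section 13.5.3 below and approximately reproduces the revision factors of Eq.(13.27) and the revised trust opinion ω̃^A_B of Table 13.5.

```python
def projected(op):
    """Projected probability P = b + a*u of a binomial opinion (b, d, u, a)."""
    b, d, u, a = op
    return b + a * u

def discount(trust, advice):
    """Uncertainty-favouring trust discounting (Definition 13.6), binomial case."""
    p = projected(trust)
    b, d, u, a = advice
    return (p * b, p * d, 1.0 - p * (b + d), a)

def revision_factors(trust_B, trust_C, disc_B, disc_C):
    """Revision factors RF for two referral trust opinions (Eqs. 13.16-13.21)."""
    pd = abs(projected(disc_B) - projected(disc_C))        # Eq. 13.17, binary case
    cc = (1.0 - disc_B[2]) * (1.0 - disc_C[2])             # Eq. 13.18
    dc = pd * cc                                           # Eq. 13.19
    ud_B = trust_B[2] / (trust_B[2] + trust_C[2])          # Eq. 13.20
    ud_C = trust_C[2] / (trust_B[2] + trust_C[2])
    return ud_B * dc, ud_C * dc                            # Eq. 13.21

def revise(trust, rf):
    """Revised referral trust opinion (Eqs. 13.22/13.23): shift mass to distrust."""
    b, d, u, a = trust
    return (b - b * rf, d + (1.0 - d) * rf, u - u * rf, a)

# Argument opinions of the restaurant example (see Section 13.5.3 below).
omega_A_B, omega_B_X = (0.00, 0.00, 1.00, 0.90), (0.95, 0.00, 0.05, 0.20)
omega_A_C, omega_C_X = (0.90, 0.00, 0.10, 0.90), (0.10, 0.80, 0.10, 0.20)

disc_B = discount(omega_A_B, omega_B_X)    # approx. (0.855, 0.000, 0.145, 0.200)
disc_C = discount(omega_A_C, omega_C_X)    # approx. (0.099, 0.792, 0.109, 0.200)
rf_B, rf_C = revision_factors(omega_A_B, omega_A_C, disc_B, disc_C)
print(round(rf_B, 3), round(rf_C, 3))      # approx. 0.529 0.053
print(revise(omega_A_B, rf_B))             # approx. (0.00, 0.53, 0.47, 0.90), cf. Table 13.5
```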
Trust revision offers a strategy to handle situations where potentially unreliable
sources give conflicting advice or recommendations, presumably because one or
both of them give advice that is wrong or significantly different from the ground
truth. Based on the degree of conflict, and also on the prior uncertainty in the referral trust opinions, trust revision determines the degree by which the referral trust
opinions should be considered unreliable, and therefore be revised in order to have
less influence on the derived fused belief. This process leads to more conservative
results, which take into account that the information sources might be unreliable.

Fig. 13.10 Revision of referral trust opinion ω^A_B (opinion triangle with the u-vertex (uncertainty) at the top and the t̄-vertex (distrust) and t-vertex (trust) at the base; revision moves the opinion point from ω^A_B towards the distrust vertex, producing ω̃^A_B).
13.5.3 Example: Conflicting Restaurant Recommendations
We continue the example from Section 13.3.3 about the recommended restaurant
Xylo, where this time we assume that Alice stays in a hostel with another traveler
named Claire who tells her that she already tried the recommended restaurant, and
that she was very disappointed because the food was very bad, and that there were
no locals there.
Let us assume that Alice has spoken with Claire a few times, and judges her to
be an experienced traveler, so she intuitively gets a relatively high trust in Claire.
Now Alice has received a second piece of advice about the restaurant Xylo. This gives her a
basis and a reason to revise her initial trust in Bob, which potentially could translate
into distrusting Bob, and which could trigger a change in her initial belief about the
restaurant Xylo.
This example situation can be translated into numbers, where ω^A_B, ω^B_X, ω^A_C and
ω^C_X are the argument opinions.
Let us first show the result of trust fusion without trust revision, where the derived
opinion ω^{[A;B]⋄[A;C]}_X is computed with simple trust fusion as:

\[
\text{Simple trust fusion:}\quad \omega^{[A;B]\diamond[A;C]}_X = (\omega^A_B \otimes \omega^B_X) \oplus (\omega^A_C \otimes \omega^C_X) \qquad (13.25)
\]
The argument opinions as well as the derived opinion are shown in Table 13.4.
The application of trust fusion to the situation where Alice receives advice from
both Bob and Claire produces a derived opinion with projected probability P^A_X =
0.465, which indicates that the chances of Xylo being a good or a bad restaurant are
about even. This result seems counter-intuitive.
Table 13.4 Simple trust fusion of conflicting advice about restaurant

  Parameters           Argument opinions               Intermediate                Derived
                       ω^A_B  ω^B_X  ω^A_C  ω^C_X      ω^{[A;B]}_X  ω^{[A;C]}_X    ω^{[A;B]⋄[A;C]}_X
  belief:          b   0.00   0.95   0.90   0.10       0.855        0.099          0.452
  disbelief:       d   0.00   0.00   0.00   0.80       0.000        0.792          0.482
  uncertainty:     u   1.00   0.05   0.10   0.10       0.145        0.109          0.066
  base rate:       a   0.90   0.20   0.90   0.20       0.200        0.200          0.200
  projected prob.: P   0.90   0.96   0.99   0.12       0.884        0.121          0.465
Although the projected probabilities P^A_B = 0.90 and P^A_C = 0.99 are relatively close, Alice’s referral trust belief b^A_C = 0.90
in Claire is much stronger than her referral trust belief b^A_B = 0.00 in Bob. It would
therefore seem natural to let Claire’s advice carry significantly more weight, but in
the case of simple trust fusion as expressed in Table 13.4 it does not.
From an intuitive perspective, Alice’s natural reaction in this situation would be
to revise her referral trust in Bob, because his advice conflicts with that of Claire,
whom she trusts with more certainty, i.e. with more belief mass. Alice would typically start to distrust Bob because apparently his advice is unreliable. As a result of
this trust revision, Claire’s advice would carry the most weight.
The application of the trust revision method described in Section 13.5.2 produces
the following intermediate values:

\[
\text{Conflict:}\quad
\begin{cases}
\, \text{Projected distance:} & \mathrm{PD}\big(\omega^{[A;B]}_X, \omega^{[A;C]}_X\big) = 0.763\\[4pt]
\, \text{Conjunctive certainty:} & \mathrm{CC}\big(\omega^{[A;B]}_X, \omega^{[A;C]}_X\big) = 0.762\\[4pt]
\, \text{Degree of conflict:} & \mathrm{DC}\big(\omega^{[A;B]}_X, \omega^{[A;C]}_X\big) = 0.581
\end{cases}
\qquad (13.26)
\]

\[
\text{Revision:}\quad
\begin{cases}
\, \text{Uncertainty differential for } B: & \mathrm{UD}(\omega^A_B \,|\, \omega^A_C) = 0.909\\[4pt]
\, \text{Uncertainty differential for } C: & \mathrm{UD}(\omega^A_C \,|\, \omega^A_B) = 0.091\\[4pt]
\, \text{Revision factor for } B: & \mathrm{RF}(\omega^A_B) = 0.529\\[4pt]
\, \text{Revision factor for } C: & \mathrm{RF}(\omega^A_C) = 0.053
\end{cases}
\qquad (13.27)
\]

These intermediate parameters of Eq.(13.26) and Eq.(13.27) determine the trust-revised referral trust opinions ω̃^A_B and ω̃^A_C that are specified in Table 13.5. The table
also shows the result of applying trust fusion based on the revised referral trust
opinions.
Table 13.5 Trust revision of conflicting advice about restaurant

  Parameters           Argument opinions               Intermediate                Derived
                       ω̃^A_B  ω^B_X  ω̃^A_C  ω^C_X     ω̃^{[A;B]}_X  ω̃^{[A;C]}_X  ω̃^{[A;B]⋄[A;C]}_X
  belief:          b   0.00   0.95   0.85   0.10       0.403        0.094          0.180
  disbelief:       d   0.53   0.00   0.05   0.80       0.000        0.750          0.679
  uncertainty:     u   0.47   0.05   0.10   0.10       0.597        0.156          0.141
  base rate:       a   0.90   0.20   0.90   0.20       0.200        0.200          0.200
  projected prob.: P   0.42   0.96   0.94   0.12       0.522        0.125          0.208
It can be seen that Alice’s revised referral trust in Bob has been reduced significantly, whereas her referral trust in Claire has been kept more or less unchanged.
This reflects the intuitive reaction that we would have in a similar situation.
Trust revision must be considered to be an ad hoc method, because there is no
parallel in physical processes that can be objectively observed and analysed. Especially the expression for the revision factor RF is affected by the design choice of
mirroring intuitive human judgment. There might be different design choices for
the revision factor that better reflect human intuition, and that also can be shown to
produce sound results under specific criteria. We invite the reader to reflect on these
issues, and maybe come up with an alternative and improved design for the revision
factor.
Chapter 14
Trust Networks
Trust networks represent chained trust and belief relationships between an analyst
agent and a target entity or variable. Simple trust networks have already been described in Chapter 13 on computational trust, which included the computation
of transitive trust paths and the computation of simple trust fusion networks.
In case of more complex trust networks it is necessary to take an algorithmic
approach to modelling and analysis. This chapter describes how to deal with trust
networks that are more complex than those described in Chapter 13.
The operators for fusion, trust discounting and trust revision can be applied
to trust and belief relationships represented as a DSPG (Directed Series-Parallel
Graph). The next section therefore gives a brief introduction to such graphs.
14.1 Graphs for Trust Networks
14.1.1 Directed Series-Parallel Graphs
Series-parallel graphs, called SP-graphs for short, represent a specific type of graphs
that have a pair of distinguished vertices called source and sink.
The following definition of SP-graphs is taken from [18].
Definition 14.1 (Series-Parallel Graph). A graph is an SP-graph, if it may be
turned into a single edge connecting a source node s and a sink node t by a
sequence of the following operations:
(i) Replacement of a pair of edges incident to a vertex of degree 2 other than the
source or sink with a single edge.
(ii) Replacement of a pair of parallel edges with a single edge that connects their
common endpoint vertices.
⊔⊓
For example, Figure 14.1 illustrates how the SP-graph to the left can be stepwise
transformed using the operations of Definition 14.1.
Fig. 14.1 Procedure for transforming an SP-graph into a single edge (five panels, from the original SP-graph through Transforms 1–4, each with source s and sink t).
Transform 1 results from applying procedure (i) 4 times. Transform 2 results
from applying procedure (ii) twice. Transform 3 results from applying procedure
(i) twice. Transform 4, which is a single edge, results from applying procedure (ii)
once. The fact that the graph transforms into a single edge in this way proves that it
is an SP-graph.
Trust networks are represented as directed graphs. We therefore assume that an
SP-graph representing a trust network is directed from the source to the sink, in
which case it is called a directed series-parallel graph or DSPG for short [25].
Definition 14.2 (Directed Series-Parallel Graph). A graph is a DSPG (Directed
Series-Parallel Graph) iff it is an SP-graph according to Definition 14.1 and it only
consists of directed edges that form paths without loops from the source to the sink. ⊔⊓
In the context of trust networks the source node of a DSPG is the analyst agent,
aka the relying party, which is typically represented by the label A. In general, the sink
node of a DSPG is the target variable, which is typically represented by the label X,
but which can also be a trusted entity represented e.g. by the label E.
14.2 Outbound-Inbound Node Set
A DSPG (Directed Series-Parallel Graph) can consist of multiple subnetworks
that are themselves sub-DSPGs. A parallel-path subnetwork (PPS) is a subnetwork
that consists of parallel paths in a DSPG.
A node can be part of one or multiple edges. In general the degree of a node
represents the number of edges that a node is part of. Since a DSPG is directed it is
possible to distinguish between the inbound degree (ID) and the outbound degree (OD)
of a node. The inbound degree of a node is the number of inbound edges to that node.
Similarly, the outbound degree of a node is the number of outbound edges from that
node. Consider for example the following referral trust network:
A 99K B 99K C
Node B has degree 2, denoted as Deg(B) = 2, because it is connected to the
two edges [A; B] and [B;C]. At the same time, node B has inbound degree ID(B) = 1
because its only inbound edge is [A; B], and has outbound degree OD(B) = 1 because
its only outbound edge is [B;C]. Node A has ID(A) = 0 and OD(A) = 1. Obviously,
for any node V in a DSPG, its degree is represented as Deg(V ) = ID(V ) + OD(V ).
An ordered pair of nodes (Vs ,Vt ) in a DSPG is said to be connected if the second
node Vt can be reached by departing from the first node Vs . From the simple example
of a 3-node network above it can easily be seen that (A,C) is a connected pair of
nodes.
Consider a node in a DSPG. We define the outbound node set of the node as
the set of edges that can be traversed after departing from that node. Similarly, the
inbound node set of a node in a DSPG is the set of edges that can be traversed before
reaching the node.
Definition 14.3 (OINS: Outbound-Inbound Node Set). Consider an ordered pair
of nodes (Vs, Vt) in a DSPG. We define the outbound-inbound node set (OINS) of
the ordered pair to be the intersection of the outbound set of the first node Vs and the
inbound set of the second node Vt. ⊔⊓
Some simple properties of an OINS can be stated.
Theorem 14.1. A pair of nodes (Vs, Vt) in a DSPG are connected iff their OINS
(Outbound-Inbound Node Set) is non-empty.
Proof. If OINS ≠ ∅, then the OINS contains at least one edge that can be traversed
after departing from the first node Vs and that can be traversed before reaching the
second node Vt, which means that it is possible to reach the second node Vt by
departing from the first node Vs, so the nodes are connected. If OINS = ∅, then the
OINS contains no path connecting the two nodes, which means that they are not
connected. ⊔⊓
14.2.1 Parallel-Path Subnetworks
A DSPG can in general consist of multiple subnetworks that themselves are DSPGs
that can contain parallel paths. We are interested in identifying subnetworks within
a DSPG that contain parallel paths. A parallel-path subnetwork (PPS) in a DSPG is
the set of multiple paths between a pair of connected nodes, as defined next.
Definition 14.4 (Parallel-Path Subnetwork). Select an ordered pair (Vs, Vt) of connected nodes in a DSPG. The subnetwork consisting of the pair’s OINS is a parallel-path
subnetwork (PPS) iff both the outbound degree of the first node Vs in the OINS
satisfies OD(Vs) ≥ 2, and the inbound degree of the second node Vt in the OINS
satisfies ID(Vt) ≥ 2.
The node Vs is called the source of the PPS, and Vt is called the sink of the PPS. ⊔⊓
Consider for example the OINS of the node pair (C, J) in Figure 14.2. Within that
particular OINS we have OD(C) = 2 and ID(J) = 3 which satisfies the requirements
of Definition 14.4, so that the OINS is a PPS (parallel-path subnetwork).
Fig. 14.2 DSPG with 5 PPSs (parallel-path subnetworks), with nodes A, B, C, D, E, F, G, H, I, J and target X; referral trust edges connect the agents, and a final functional trust/belief edge leads to X.
It can also be verified that the respective OINSs of the node pairs (E, J), (C, F),
(F, X) and (C, X) are also PPSs, which means that the DSPG of Figure 14.2 contains
5 PPSs in total.
However, the sub-network between the connected node pair (E, X) is not a PPS
because ID(X) = 1 within that OINS (outbound-inbound node set), which does not
satisfy the requirements of Definition 14.4.
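As an illustration of Definitions 14.3 and 14.4, the following Python sketch computes the OINS of an ordered node pair as the intersection of forward-reachable and backward-reachable edges, and tests the PPS condition. The function names and the small diamond-shaped example graph are our own assumptions, not taken from the figures in this chapter.

```python
from collections import defaultdict

def reachable_edges(edges, start):
    """Edges that can be traversed after departing from `start` (outbound set)."""
    out = defaultdict(list)
    for s, t in edges:
        out[s].append((s, t))
    found, stack = set(), [start]
    while stack:
        node = stack.pop()
        for edge in out[node]:
            if edge not in found:
                found.add(edge)
                stack.append(edge[1])
    return found

def oins(edges, vs, vt):
    """Outbound-Inbound Node Set of (vs, vt): intersection of vs's outbound edge set
    and vt's inbound edge set (Definition 14.3)."""
    outbound = reachable_edges(edges, vs)
    # The inbound set is obtained by reversing the graph and reversing back the result.
    inbound = {(b, a) for (a, b) in reachable_edges([(t, s) for s, t in edges], vt)}
    return outbound & inbound

def is_pps(edges, vs, vt):
    """Definition 14.4: the OINS of (vs, vt) forms a PPS iff, within the OINS,
    OD(vs) >= 2 and ID(vt) >= 2."""
    sub = oins(edges, vs, vt)
    return (sum(1 for s, _ in sub if s == vs) >= 2 and
            sum(1 for _, t in sub if t == vt) >= 2)

# Hypothetical diamond-shaped DSPG: two parallel paths from A to D, then an edge to X.
graph = [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D'), ('D', 'X')]
print(bool(oins(graph, 'A', 'X')))   # True  -- (A, X) is a connected pair (Theorem 14.1)
print(is_pps(graph, 'A', 'D'))       # True  -- two parallel paths between A and D
print(is_pps(graph, 'A', 'X'))       # False -- ID(X) = 1 within the OINS
```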
14.2.2 Nesting Level
The concept of nesting level is important for analysing trust networks represented
as a DSPG. In general the nesting level of an edge reflects how many PPSs it is
part of in the DSPG. Each edge has a specific nesting level equal to or greater than 0.
For example, a trust network consisting of a single trust path has trust edges with
nesting level 0, because the edges are not part of any subnetwork of parallel paths.
The nesting level of an edge in a DSPG is defined next.
Definition 14.5 (Nesting Level). Assume a DSPG consisting of multiple nodes connected via directed edges. The nesting level of an edge in the DSPG is equal to the
number of PPSs (parallel-path subnetworks) that the edge is a part of.
Let e.g. [Vm; Vn] be an edge in a DSPG. The nesting level of the edge [Vm; Vn] is
formally denoted NL([Vm; Vn]). The nesting level of an edge is an integer that can be
zero or greater. ⊔⊓
In Figure 14.3 the nesting level of edges is indicated by the numbered diamonds
on the edges.
Fig. 14.3 Nesting levels of edges in a DSPG (the graph of Figure 14.2, with a numbered diamond on each edge indicating its nesting level, ranging from 0 to 3).
Fig. 14.3 Nesting levels of edges in a DSPG.
It can be seen that the edge [A; B] is not part of any PPS, so that NL([A; B]) = 0.
It can also be seen that the edge [H; J] is part of 3 PPSs belonging to the node pairs
(E, J), (C, F) and (C, X), so that NL([H; J]) = 3.
The nesting level determines the order of computation of trust in a DSPG trust
network, as described next.
14.3 Analysis of DSPG Trust Networks
We assume that the trust network to be analysed is represented in the form of a
DSPG. It can be verified that the trust network of Figure 14.4 represents a DSPG
(Directed Series-Parallel Graph) according to Definition 14.2.
Fig. 14.4 Trust network in the form of a DSPG, from the analyst A to the target X via the intermediate nodes B, C, D, E, F, G, H, I and J (referral trust edges followed by a final functional trust/belief edge).
It can also be seen that the DSPG of Figure 14.4 consists of 3 PPSs (parallel-path
subnetworks), represented by the source-sink pairs (D, J), (A, J) and (A, X).
The compact formal expression for the trust network of Figure 14.4 is given in
Eq.(14.1).
[A, X] = [A; B; E; I, X] ⋄ (([A;C; F; J] ⋄ ([A; D] : ([D; G; J] ⋄ [D; H; J]))) : [J, X]).
(14.1)
Next we describe a simple algorithm for analysing and deriving belief/trust from
a DSPG trust network like that of Figure 14.4.
14.3.1 Algorithm for Analysis of DSPG
The procedure for computing derived trust in a trust network represented as a DSPG
is explained below in the form of a flowchart algorithm. The procedure can e.g. be
applied for computing the trust opinion ω^A_X in Figure 14.4. The procedure corresponds well to the operations of graph simplification of Definition 14.1. For the
purpose of the computation principles defined here, the agents and the target are called
nodes.
Each step in the flowchart algorithm in Figure 14.5 is described below.
(a) Prepare for analysing the trust network. This includes representing the trust
network as a set of directed edges with pairs of nodes. Verify that the trust
network is indeed a DSPG.
(b) Identify each PPS (Parallel Path Subnetwork) with its pair of source and target
nodes (Vs ,Vt ). Determine the nesting level of every edge as a function of the
number of PPSs it is part of.
(c) Select the PPS where all edges have the highest nesting level, and proceed to
(d). In case no PPS remains proceed to (g).
(d) For the selected PPS, compute 2-edge or multi-edge trust discounting of every
path between Vs and Vt , where the node Vt is considered as a target node of the
analysis. As a result, every path is transformed into an edge.
(e) For the selected PPS, compute trust fusion of all edges. As a result, the PPS is
transformed into a single edge.
(f) Determine the nesting level of the edge that now replaces the selected PPS.
(g) When no PPS exists, the trust network might still consist of a series of edges. In
that case compute 2-edge or multi-edge trust discounting. In case the resulting
network consists of a single edge, nothing needs to be done.
(h) The trust network has now been transformed into a single edge between the
analyst and the final target, and the computation is completed.
A parser that implements the computational algorithm of Figure 14.5 is able to
analyse a graph such as that of Figure 14.4 and derive the opinion ω^A_X.
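A full parser is beyond the scope of a short example, but the core of steps (c)–(g) can be sketched as a recursive evaluation of a series-parallel trust expression such as Eq.(14.1): series connections are handled with (multi-edge) trust discounting and parallel connections with fusion. The expression representation, the function names and the choice of cumulative fusion in the Python sketch below are our own illustrative assumptions; the example reuses the argument opinions of Table 13.3.

```python
def projected(op):
    b, d, u, a = op
    return b + a * u

def discount_path(referrals, final):
    """Multi-edge trust discounting (Definition 13.7), binomial case."""
    p = 1.0
    for op in referrals:
        p *= projected(op)
    b, d, u, a = final
    return (p * b, p * d, 1.0 - p * (b + d), a)

def cumulative_fuse(op1, op2):
    """Cumulative fusion (Definition 11.5), non-dogmatic case, equal base rates."""
    b1, d1, u1, a1 = op1
    b2, d2, u2, a2 = op2
    k = u1 + u2 - u1 * u2
    return ((b1 * u2 + b2 * u1) / k, (d1 * u2 + d2 * u1) / k, (u1 * u2) / k, a1)

def evaluate(expr):
    """Evaluate a nested series-parallel trust expression.

    An expression is either a leaf opinion tuple (b, d, u, a),
    ('series', [referral_exprs..., final_expr]) or ('parallel', [exprs...]).
    """
    if isinstance(expr, tuple) and expr and expr[0] == 'series':
        parts = [evaluate(e) for e in expr[1]]
        return discount_path(parts[:-1], parts[-1])
    if isinstance(expr, tuple) and expr and expr[0] == 'parallel':
        parts = [evaluate(e) for e in expr[1]]
        fused = parts[0]
        for op in parts[1:]:
            fused = cumulative_fuse(fused, op)
        return fused
    return expr  # leaf opinion

# The opinions of Table 13.3, arranged as the parallel composition of two 2-edge paths.
w_AB, w_BE = (0.40, 0.10, 0.50, 0.60), (0.90, 0.00, 0.10, 0.40)
w_AC, w_CE = (0.50, 0.00, 0.50, 0.50), (0.80, 0.10, 0.10, 0.40)
network = ('parallel', [('series', [w_AB, w_BE]), ('series', [w_AC, w_CE])])
print(tuple(round(v, 2) for v in evaluate(network)))   # approx. (0.74, 0.05, 0.21, 0.4)
```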
Fig. 14.5 Flowchart algorithm for computational trust analysis of a DSPG network (steps (a)–(h) as described above).
14.3.2 Soundness Requirements for Trust Recommendations
It is important that a recommendation is always passed in its original form from the
recommender to the relying party, and not in the form of secondary derived trust.
The reason for this is explained below.
Figure 14.6 shows an example of how not to provide recommendations.
In Figure 14.6 the trust and recommendation arrows are indexed according to
the order in which they are formed whereas the initial trust relationships have no
index. In the scenario of Figure 14.6, D passes his recommendation about X to B and
C (index 2) so that B and C are able to derive their opinions about X (index 3). Now
B and C pass their derived opinions about X to A (index 4) so that she can derive her
opinion about X (index 5).
As a result A perceives the topology to be ([A; B, X] ⋄ [A;C, X]). Note that we use
the compact notation presented in Section 13.2.4.
The problem with the scenario of Figure 14.6 is that A ignores the presence of D,
so that A in fact derives a hidden topology that is different from the perceived topology, and both are in fact different from the real topology. The three different
topologies are given in Table 14.1.
Fig. 14.6 Incorrect recommendation (D recommends his functional trust in X to B and C, who derive opinions about X and pass these derived opinions on to A; A then derives an incorrect belief about X).
Table 14.1 Inconsistent trust network topologies

  Perceived topology:  ([A; B, X] ⋄ [A; C, X])
  Hidden topology:     ([A; B; D, X] ⋄ [A; C; D, X])
  Real topology:       ([A; B; D] ⋄ [A; C; D]) : [D, X]
The reason for this inconsistency is that B’s belief relationship [B, X] is derived
from [B; D, X], and C’s belief relationship [C, X] is derived from [C; D, X]. So when
B recommends ω^B_X he implicitly recommends ω^{[B;D]}_X, and when C recommends ω^C_X
she implicitly recommends ω^{[C;D]}_X, but A ignores the influence of D in the received
recommendations [36]. It can easily be seen that neither the perceived nor the hidden topology is equal to the real topology, which shows that this way of passing
recommendations can produce inconsistent results.
The sound way of sending the recommendations is to let B and C pass the recommended opinions they receive from D ‘as is’ without modification, as well as their
respective trust opinions in D. This principle is certainly possible to follow, but it
also requires that A is convinced that B and C have not altered the recommendations
from D, which precisely is part of A’s referral trust in B and C.
It is thus necessary that A receives all the trust recommendations unaltered and
as expressed by the original recommending party. An example of a correct way of
passing recommendations is indicated in Figure 14.7.
Fig. 14.7 Correct recommendation (B and C pass D’s original recommendation about X on to A unaltered, together with their referral trust in D, so that A derives a correct belief about X).
In the scenario of Figure 14.7 the perceived topology is equal to the real topology
which can be expressed as:
[A, X] = ([A; B; D] ⋄ [A;C; D]) : [D, X]
(14.2)
The lesson to be learned from the scenarios of Figure 14.6 and Figure 14.7 is
that there is a crucial difference between recommending trust/belief in an entity or
variable resulting from your own experience with that entity or domain, and recommending trust/belief in an entity or variable which has been derived as a result of
recommendations received from others.
The moral is that analysts should be aware of the difference between direct
trust/belief and derived trust/belief. Figure 14.6 illustrated how problems can occur
when derived belief is recommended, so the rule is to only recommend direct belief
[36]. However, it is not always possible to follow this principle, but simply being
aware of the potential inconsistencies is useful when assessing the results of an
analysis, or when considering mitigation strategies against inconsistencies.
If B and C were unreliable they might, for example, try to change the recommended trust measures. Moreover, any party that is able to intercept the recommendations from B, C, or D to A might want to alter the trust values, so A needs to receive evidence of the authenticity and integrity of the recommendations. Cryptographic security mechanisms can typically be used to solve this problem.
14.4 Analysing Complex Non-DSPG Trust Networks
An analyst might be confronted with a trust network that appears more complex
than a DSPG. It is desirable not to put any restrictions on the possible trust network
topology that can be analysed, except that it should not be cyclic. This means that
the set of possible trust paths from the analyst agent A to the target X can contain
paths that are inconsistent with a DSPG. The question then arises how such a trust
network should be analysed.
A complex non-DSPG trust network is one that is not an SP-graph according to Definition 14.1. In many cases it can be challenging to recognise which referral trust components are in series and which are in parallel in a complex network. Figure 14.8 illustrates a good example of such a complex trust network. The trust network consists only of the edges from A to E and from A to F, and thus of referral trust edges only.
The last edges from E and F to the target X represent functional belief relationships which could also be considered to be functional trust relationships according
to Figure 13.2. However, for the purpose of trust network analysis described below,
only the referral trust network from A to E and F is relevant.
Fig. 14.8 Non-DSPG trust network (analyst A, target X; legend: referral trust; functional belief/trust)
Simply by looking at Figure 14.8 it is obvious that this trust network cannot be broken down into groups of series-parallel paths, which complicates the problem of computing trust from the network. In the case of a DSPG, which can be split into series-parallel configurations, it is simple to determine the analytical formula that describes the network’s derived trust. However, for a complex non-DSPG network, trust computation requires more involved methods.
Transitive trust networks can be digitally represented and stored in the form of a list of directed trust edges with additional attributes such as trust scope σ, time of collection, trust variant (referral or functional) and trust opinion. Based on the list of edges, an automated parser can establish valid DSPGs between two nodes depending on the need. The trust edges of the non-DSPG trust network of Figure 14.8 can for example be listed as in Table 14.2; a minimal digital representation of the same edge list is sketched after the table.
Table 14.2 Trust edges of Figure 14.8

Source V_S   Target V_T   Scope   Variant      Opinion
A            B            σ       referral     ω_B^A
A            C            σ       referral     ω_C^A
A            D            σ       referral     ω_D^A
B            E            σ       referral     ω_E^B
B            F            σ       referral     ω_F^B
D            F            σ       referral     ω_F^D
E            X            σ       functional   ω_X^E
F            X            σ       functional   ω_X^F
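To illustrate such a digital representation, the following sketch stores the edges of Table 14.2 as a plain Python list of records. The class and field names are illustrative only, and the opinions are kept as labels rather than as full opinion objects.

```python
from dataclasses import dataclass

@dataclass
class TrustEdge:
    source: str    # source node V_S
    target: str    # target node V_T
    scope: str     # trust scope sigma
    variant: str   # 'referral' or 'functional'
    opinion: str   # label of the associated trust opinion

# Edge list corresponding to Table 14.2
edges = [
    TrustEdge('A', 'B', 'sigma', 'referral',   'omega_B^A'),
    TrustEdge('A', 'C', 'sigma', 'referral',   'omega_C^A'),
    TrustEdge('A', 'D', 'sigma', 'referral',   'omega_D^A'),
    TrustEdge('B', 'E', 'sigma', 'referral',   'omega_E^B'),
    TrustEdge('B', 'F', 'sigma', 'referral',   'omega_F^B'),
    TrustEdge('D', 'F', 'sigma', 'referral',   'omega_F^D'),
    TrustEdge('E', 'X', 'sigma', 'functional', 'omega_X^E'),
    TrustEdge('F', 'X', 'sigma', 'functional', 'omega_X^F'),
]
```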
There can be multiple approaches to analysing non-DSPG trust networks. This task might appear similar to the case of reliability analysis of complex systems, described in Section 7.2. However, the case of complex trust networks is fundamentally different. The main differences are that trust networks involve possible deception, which system reliability networks do not, and that trust networks involve fusion, which is not an ingredient of system reliability analysis. The principles for analysing complex reliability networks can therefore not be applied to complex trust networks; a different approach is needed for analysing non-DSPG trust networks.
A non-DSPG trust network must be simplified in order to remove paths that prevent consistent trust computation. This process produces a DSPG trust network which can easily be analysed.
The optimal derived DSPG trust network is the one that produces the highest certainty of the derived opinion. Note that the goal is to maximise certainty in the derived opinion, and not e.g. to derive the opinion with the highest projected probability of some value of the variable X. There is a trade-off between the time it takes to find the optimal DSPG, and how close to the optimal DSPG a simplified graph can be. Below we describe an exhaustive method that is guaranteed to find the optimal DSPG, and a heuristic method that will find a DSPG close to, or equal to, the optimal DSPG.
• Exhaustive Discovery of Optimal DSPG Trust Network
The exhaustive method of finding the optimal DSPG trust network consists of
determining all possible DSPGs, then deriving the trust value for each one of
them, and finally selecting the DSPG and the corresponding canonical expression that produces the trust value with the highest confidence level. The computational complexity of this method is Comp = l·m·(2^n − 1), where n is the number
of possible paths, m is the average number of paths in the DSPGs, and l is the
average number of arcs in the paths.
• Heuristic Discovery of Near-Optimal DSPG Trust Network
The heuristic method of finding a near-optimal DSPG trust network consists of
constructing the graph by including new paths one by one in decreasing order
of confidence. Each new path that would turn the graph into a non-DSPG and
break canonicity is excluded. This method only requires the computation of the
trust value for a single DSPG and canonical expression, with computational
complexity Comp = l·m, where m is the average number of paths in the DSPGs, and
l is the average number of arcs in the paths.
The heuristic method will produce a DSPG trust network that produces a derived opinion/trust with a certainty level equal or close to that of the optimal DSPG trust network. The reason why this method is not guaranteed to produce the optimal DSPG is that it could exclude multiple trust paths with relatively low certainty levels because of conflict with a single, previously included path with a higher certainty level. It is possible that the low-certainty paths together could provide higher certainty than the single high-certainty path alone. In such cases it would have been optimal to exclude the single high-certainty path, and instead include the low-certainty paths. However, only the exhaustive method described above is guaranteed to find the optimal DSPG in such cases.
The next section describes a heuristic method for transforming a non-DSPG trust
network into a DSPG trust network.
It can be mentioned that an alternative approach to constructing efficient networks from a potentially large and complex network has been described in [61], where it is called discovery of small worlds. However, we do not follow that approach here.
14.4.1 Synthesis of DSPG Trust Network
Below we describe an algorithm which is able to simplify a non-DSPG trust network like the one in Figure 14.8 in order to synthesise a DSPG trust network.
Simplification of a non-DSPG trust network is a two-step process. First, the non-DSPG trust network is analysed to identify all possible trust paths from the analyst A to the target X. Secondly, a new DSPG trust network is synthesised from scratch by only including certain trust paths from the non-DSPG trust network, in a way that preserves the DSPG property of the synthesised trust network. The synthesised graph between the source analyst A and the target node X is then a DSPG trust network.
A DSPG can be constructed by sequences of serial and parallel compositions that
are defined as follows [25]:
Definition 14.6 (Directed Series and Parallel Composition).
• A directed series composition consists of replacing an edge [A;C] with two
edges [A; B] and [B;C] where B is a new node.
• A directed parallel composition consists of replacing an edge [A;C] with two
edges [A;C]1 and [A;C]2 .
The principle of directed series and parallel composition is illustrated in Figure 14.9.
Fig. 14.9 Principles of directed series and parallel composition: a) serial composition, b) parallel composition
Figure 14.10 shows a flowchart algorithm for synthesising a DSPG trust network
from a non-DSPG trust network according to the heuristic method.
Each step in the flowchart algorithm of Figure 14.10 is described below; a minimal code sketch of the overall loop follows the step list.
(a) Prepare for simplification of non-DSPG trust network. This includes representing the non-DSPG trust network as a set of directed edges with pairs of nodes.
Set a threshold pT for the lowest relevant reliability of trust paths. Create an
empty DSPG trust network to be synthesised.
(b) Identify each trust path from the analyst A to the target node X. For each path,
compute the product of the projected probabilities of referral trust edges. The
last functional belief/trust edge to the target X is not included in the product.
(c) Create a ranked list of paths according to the products computed in the previous
step, i.e. where the path with the highest product has index 1. Set pointer to 0.
(d) Increment the pointer to select the next path from the ranked list of paths. Exit to
termination if there is no path left, or if the product is smaller than the threshold
pT . Continue in case there is a path with product greater or equal to the threshold
reliability pT .
(e) Check if the selected trust path can be added and integrated into the DSPG trust
network. Use the criteria described in Section 14.4.2.
(f) Add the selected trust path in case it fits into the DSPG. Existing trust edges are
not duplicated, only new trust edges are added to the DSPG.
(g) Drop the selected trust path in case it does not fit into the DSPG.
(h) The synthesised DSPG can be analysed according to the algorithm described in
Section 14.3.
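The following minimal Python sketch follows steps (a)–(h) under the assumption that each trust edge is represented as a tuple (source, target, variant, projected probability), and that the DSPG membership test of Section 14.4.2 is supplied by the caller as a predicate; all names are illustrative only.

```python
from math import prod

def all_paths(edges, source, target):
    """Enumerate all acyclic paths from source to target by depth-first search."""
    paths = []
    def dfs(node, path, visited):
        if node == target:
            paths.append(list(path))
            return
        for edge in edges:
            s, t = edge[0], edge[1]
            if s == node and t not in visited:
                dfs(t, path + [edge], visited | {t})
    dfs(source, [], {source})
    return paths

def path_product(path):
    """Product of projected probabilities of the referral edges in a path;
    the final functional edge towards the target is not included (step (b))."""
    return prod(p for (_, _, variant, p) in path if variant == 'referral')

def synthesise_dspg(edges, analyst, target, p_threshold, fits_into_dspg):
    """Heuristic DSPG synthesis following the flowchart of Figure 14.10."""
    dspg = []                                            # (a) empty DSPG to be synthesised
    ranked = sorted(all_paths(edges, analyst, target),   # (b) trace all trust paths
                    key=path_product, reverse=True)      # (c) rank by path product
    for path in ranked:                                  # (d) select next path in ranked order
        if path_product(path) < p_threshold:
            break                                        # remaining paths fall below the threshold
        if fits_into_dspg(dspg, path):                   # (e) criteria of Section 14.4.2
            for edge in path:                            # (f) add only new edges
                if edge not in dspg:
                    dspg.append(edge)
        # (g) paths that do not fit are simply dropped
    return dspg                                          # (h) ready for DSPG analysis (Section 14.3)
```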
Fig. 14.10 Flowchart algorithm for synthesising a DSPG from a complex non-DSPG trust network
14.4.2 Requirements for DSPG Synthesis
Ideally, all the possible paths discovered by the algorithm of Figure 14.10 should be taken into account when deriving the opinion/trust value. A general directed graph will often contain loops and dependencies. These can be avoided by excluding certain paths, but this can also cause information loss. Specific selection criteria are needed in order to find the optimal subset of paths to include. With n possible paths there are (2^n − 1) different combinations for constructing graphs, of which not all are necessarily DSPGs. The algorithm of Figure 14.10 aims at synthesising a DSPG trust network with the least information loss relative to the original non-DSPG trust network.
Figure 14.11 illustrates a simple non-DSPG trust graph, where it is assumed that A is the source analyst and X is the target. The two heuristic rules used to discard paths are 1) when a path is inconsistent with a DSPG, and 2) when the product of projected probabilities drops below a predefined threshold.
In the algorithm of Figure 14.10 it can be noted that step (d) enforces the rule that the product of the projected probabilities of the referral trust edges in the graph
Fig. 14.11 Simple non-DSPG trust network (legend: referral trust; functional belief/trust)
is greater than or equal to threshold pT . A low product value indicates low certainty
in the trust path. By removing paths with low certainty, the number of paths to
consider is reduced while the information loss can be kept to an insignificant level.
The subsequent step (e) checks that the path can be included consistently with the
DSPG.
In the non-DSPG trust network of Figure 14.11 there are 3 possible paths between A and X, as expressed by Eq.(14.3) below.
φ1 = ([A; B] : [B;C] : [C, X])
φ2 = ([A; D] : [D;C] : [C, X])
φ3 = ([A; B] : [B; D] : [D;C] : [C, X])
(14.3)
The 3 paths can generate the following 7 potential combinations/graphs.
γ1 = φ1
γ2 = φ2
γ3 = φ3
γ4 = φ1 ⋄ φ2
γ5 = φ1 ⋄ φ3
γ6 = φ2 ⋄ φ3
γ7 = φ1 ⋄ φ2 ⋄ φ3      (14.4)
The expression γ7 contains all possible paths between A and X. The problem with γ7 is that it is a non-DSPG, so that it cannot be represented in the form of a canonical expression, i.e. where each edge only appears once. In this example, one path must be removed from the graph in order to have a canonical expression. The expressions γ4, γ5 and γ6 can be canonicalised, and the expressions γ1, γ2 and γ3 are already canonical, which means that all the expressions except γ7 can be used as a basis for constructing a DSPG and for deriving A’s opinion/trust in X.
The following requirements must be satisfied for preserving a DSPG when including new sub-paths. The source and target nodes refer to the source and target
nodes of the new sub-path that is to be added to the existing graph by bifurcation.
Definition 14.7 (Requirements for Including New Sub-Paths in DSPG).
1. The target node must be reachable from the source node in the existing graph.
2. The source and the target nodes must have equal nesting levels in the existing
graph.
3. The nesting level of the source and target nodes must be equal to, or less than
the nesting level of all intermediate nodes in the existing graph.
⊓⊔
These principles are illustrated with examples below. Figure 14.12, Figure 14.13
and Figure 14.14 illustrate how new paths can be included in a way that preserves
graph canonicity. In the figures, the nesting levels of nodes and edges are indicated
as an integer. A bifurcation is when a node has two or more incoming or outgoing
edges, and is indicated by brackets in the shaded node boxes. The opening bracket
‘(’ increments the nesting level by 1, and the closing bracket ‘)’ decrements the
nesting level by 1. A sub-path is a section of a path without bifurcations. The equal
sign ‘=’ means that the node is part of a sub-path, in which case the nesting level
of the edge on the side of the ‘=’ symbol is equal to the nesting level of the node.
Each time a new path is added to the old graph, some sub-path sections may already exist in the old graph and do not require any additions, whereas other sub-path sections that do not already exist must be added by bifurcations to the old graph.
• Illustrating DSPG Synthesis Requirement 1.
Requirement 1 from Definition 14.7 is illustrated in Figure 14.12. The new
edge [B;C] is rejected because C is not reachable from B in the existing graph,
whereas the new edge [A; D] can be included because D is reachable from A in
the existing graph.
Fig. 14.12 Visualising the requirement that the target must be reachable from the source (legend: existing referral trust edge; existing functional belief/trust edge; potential new referral trust edge; included; rejected; # nesting level)
The edge can be included under the same nesting level as the sub-paths ([A; B] :
[B; D]) and ([A;C] : [C; D]) in this example. The existing and new updated graphs
of Figure 14.12 are expressed below. Note that the brackets around sub-paths,
e.g. ([A; B] : [B; D]), are not reflected in Figure 14.12 because they do not represent nesting, but simply grouping of arcs belonging to the same sub-path.
Existing graph: ((([A; B] : [B; D]) ⋄ ([A;C] : [C; D])) : [D; X])
Updated graph:  ((([A; B] : [B; D]) ⋄ ([A;C] : [C; D]) ⋄ [A; D]) : [D; X])      (14.5)
• Illustrating DSPG Synthesis Requirement 2.
Requirement 2 from Definition 14.7 is illustrated in Figure 14.13. The new edge
[B; D] is rejected because B and D have different nesting levels, whereas the
new edge [A; D] is included because A and D have equal nesting levels. Node
A does in fact have nesting levels 1 and 2 simultaneously because two separate
bifurcations with different nesting levels start from A.
Fig. 14.13 Visualising the requirement that the source and target must have equal nesting levels
Including the new edge causes an additional nesting level to be created, which also causes the nesting levels of the sub-paths [A; B] : [B;C] and [A;C] to increment. The existing and updated graphs of Figure 14.13 can then be expressed as:
Existing graph: (((([A; B] : [B;C]) ⋄ [A;C]) : [C; D] : [D, X]) ⋄ [A, X])
Updated graph:  (((((([A; B] : [B;C]) ⋄ [A;C]) : [C; D]) ⋄ [A; D]) : [D, X]) ⋄ [A, X])      (14.6)
• Illustrating DSPG Synthesis Requirement 3.
Requirement 3 from Definition 14.7 is illustrated in Figure 14.14. The new edge [B; D] is rejected because the node C has a nesting level that is lower than that of B and D, whereas the new edge [A, X] is included because the nesting level of C is equal to that of A and X.
Including the new edge causes an additional nesting level to be created, which also causes the nesting levels of the existing sub-paths to increment. The
Fig. 14.14 Visualising the requirement that the nesting level of intermediate nodes must be equal to or greater than that of the source and target
existing and new graphs of Figure 14.14 can then be expressed as:
Existing graph: ((([A; B] : [B;C]) ⋄ [A;C]) : (([C; D] : [D, X]) ⋄ [C, X]))
Updated graph: (((([A; B] : [B;C]) ⋄ [A;C]) : (([C; D] : [D, X]) ⋄ [C, X])) ⋄ [A, X])
(14.7)
Chapter 15
Bayesian Reputation Systems
Reputation systems are used to collect and analyse feedback about the performance and quality of products, services and service entities, which for short can be called service objects. The received feedback can be used to derive reputation scores, which in turn can be published to potential future users. The feedback can also be used internally by the service provider in order to improve the quality of service objects.
Figure 15.1 illustrates how a reputation system typically is integrated in online service provision. The figure indicates the cyclic sequence of steps including request and provision of services, in addition to the exchange and processing of feedback ratings and reputation scores. Reputation systems are normally integrated with the service provision function, so that steps related to reputation are linked to the steps of service provision. Feedback from service users is highly valuable to service providers, but there is typically no obvious incentive for service users to provide ratings. In order to increase the amount of feedback it is common that the service provider explicitly requests feedback from the service users after a service has been provided and consumed.
From the user’s point of view, it is assumed that reputation scores can help predict the future performance of service objects, and thereby reduce users’ uncertainty when deciding whether to rely on those service objects [77]. The idea is that
transactions with reputable service objects or service providers are likely to result in
more favourable outcomes than transactions with disreputable service objects and
service providers.
Reputation scores are not only useful for service consumer decision making.
Reputation scores can also be used internally by a service provider in order to tune
and configure the service provision system, and in general to increase quality and
performance.
Reputation systems are typically centralised, meaning that ratings are centrally
aggregated, as illustrated in Figure 15.1. Distributed reputation systems have been
proposed, and could e.g. be implemented in conjunction with Peer-to-Peer (P2P)
networks [3], where a user must discover and request private reputation ratings from
other users in the P2P-network. We will not discuss distributed reputation systems
here.
Fig. 15.1 Integration of reputation systems in service architectures (legend: external interaction; internal processing)
Two fundamental elements of reputation systems are:
1. A collection network that allows the reputation system to receive and aggregate feedback ratings about service objects from users, as well as quality indicators from other sources.
2. A reputation score computation engine used by the reputation system to derive
reputation scores for each participant, based on received ratings, and possibly
also on other information.
Many different reputation systems, including reputation score computation methods, have been proposed in the literature, and we do not intend to provide a complete
survey or comparison here. We refer to [23, 43, 60, 78, 83] as general literature on
reputation systems.
The most common approach to computing the reputation score in commercial systems is probably to use some form of weighted mean. Computation methods based on weights distributed around the median rating (i.e. the middle rating in a ranked list of ratings) [1, 2, 28] can provide more stable reputation scores than scores based on the simple mean. It is also possible to compute scores based on fuzzy logic [4], for example. User-trust and time-related factors can be incorporated into any of the above-mentioned reputation computation methods.
This chapter focuses on the reputation score computation methods, and in particular on Bayesian computational methods. Binomial and multinomial Bayesian
reputation systems have been proposed and studied e.g. in [42, 44, 54, 90]. The purpose of this chapter is to concisely describe basic features of Bayesian reputation
systems.
15.1 Computing Reputation Scores
Binomial reputation systems allow ratings to be expressed with two values, as either
positive (e.g. Good) or negative (e.g. Bad). Multinomial reputation systems allow
the possibility of providing ratings in different discrete levels such as e.g. mediocre
- bad - average - good - excellent.
15.1.1 Binomial Reputation Scores.
Binomial Bayesian reputation systems apply to the binary state space {Bad, Good}, which reflects the corresponding performance of a service object. The evidence notation of the Beta PDF of Eq.(3.8) takes the two parameters r and s that represent the number of received positive and negative ratings respectively.
Binomial reputation is computed by statistical updating of the Beta PDF. More
specifically, the a posteriori (i.e. the updated) reputation is continuously computed
by combining the a priori (i.e. previous) reputation with every new rating. It is the
expected probability of Eq.(3.9) that is used to represent the reputation score. The
Beta PDF itself only provides the underlying statistical foundation, and is otherwise
not used in the reputation system.
Before receiving any ratings, the a priori distribution is the Beta PDF with r = 0
and s = 0, which with the default base rate a = 0.5 produces a uniform a priori
distribution. Then after observing r ”Good” and s ”Bad” outcomes, the a posteriori Beta PDF gives a reputation score S that can be computed with the expression
for expected probability of Eq.(3.9), which in terms of reputation score is repeated
below.
S = (r +Wa)/(r + s +W ).
(15.1)
This score should be interpreted as the probability that the next experience with
the service object will be ”Good”. Recall that W denotes the non-informative prior
weight, where W = 2 is normally used.
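As a small numeric illustration (assuming the default base rate a = 0.5 and prior weight W = 2; the function name is illustrative), the score of Eq.(15.1) can be computed as follows:

```python
def binomial_score(r, s, a=0.5, W=2):
    """Binomial reputation score of Eq.(15.1): expected probability of 'Good'."""
    return (r + W * a) / (r + s + W)

print(binomial_score(0, 0))   # 0.5  -> no ratings yet, the score equals the base rate
print(binomial_score(8, 2))   # 0.75 -> after 8 'Good' and 2 'Bad' ratings
```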
15.1.2 Multinomial Reputation Scores.
Multinomial Bayesian reputation systems allow ratings to be provided over k different levels which can be considered as a set of k disjoint elements. Let this set
be denoted as Λ = {L1 , . . . Lk }, and assume that ratings are provided as votes on
the elements of Λ . This leads to a Dirichlet PDF (probability density function) over
the k-component random probability variable p (Li ), i = 1 . . . k with sample space
[0, 1]k , subject to the simple additivity requirement ∑ki=1 p (Li ) = 1. The evidence
representation of the Dirichlet PDF is given in Eq.(3.16). The Dirichlet PDF itself
284
15 Bayesian Reputation Systems
only provides the underlying statistical foundation, and is otherwise not used in the
reputation system.
The Dirichlet PDF with prior captures a sequence of observations of the k possible outcomes with k positive real rating parameters r (Li ), i = 1 . . . k, each corresponding to one of the possible levels. In order to have a compact notation we define
a vector p = {p(Li) | 1 ≤ i ≤ k} to denote the k-component probability variable, and
a vector r = {ri | 1 ≤ i ≤ k} to denote the k-component rating variable.
In order to distinguish between the a priori default base rate, and the a posteriori
ratings, the Dirichlet distribution must be expressed with prior information represented as a base rate distribution a over the rating levels L.
Similarly to the binomial case, the multinomial reputation score S is the distribution of expected probabilities of the k random probability variables, which can
be computed with the expression for expected probability distribution of Eq.(3.17),
which in terms of score distribution is expressed as:
S(Li) = E(p(Li) | r, a) = (r(Li) + W·a(Li)) / (W + ∑_{i=1}^{k} r(Li)).      (15.2)
The non-informative prior weight W will normally be set to W = 2 when a uniform distribution over binary state spaces is assumed. Selecting a larger value for W would result in new observations having less influence over the Dirichlet distribution, and can in fact represent specific a priori information provided by a domain expert or by another reputation system.
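A corresponding sketch for the multinomial case (illustrative names, with ratings and base rates given as lists over the k rating levels, and W = 2):

```python
def multinomial_scores(r, a, W=2):
    """Multinomial reputation scores of Eq.(15.2), one score per rating level."""
    total = sum(r)
    return [(r_i + W * a_i) / (W + total) for r_i, a_i in zip(r, a)]

# Five levels with uniform base rate 0.2 and ratings r = (0, 2, 5, 2, 1)
print(multinomial_scores([0, 2, 5, 2, 1], [0.2] * 5))   # components sum to 1
```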
15.2 Collecting and Aggregating Ratings
Before computing reputation scores, the ratings must be collected and aggregated in
some way. This includes taking time decay into account, for example.
15.2.1 Collecting Ratings
Assume k different discrete rating levels L. This translates into having a domain of cardinality k. For binomial reputation systems k = 2 and the rating levels are "Bad" and "Good". For multinomial reputation systems k > 2 and any corresponding set of suitable rating levels can be used. Let the rating level be indexed by i. The aggregate ratings for a particular service object y are stored as a cumulative vector, expressed as:

R_y = (R_y(Li) | i = 1 . . . k).      (15.3)
The simplest way of updating a rating vector as a result of a new rating is by
adding the newly received rating vector r to the previously stored vector R . The
case when old ratings are aged is described in Sec.15.2.2.
Each new discrete rating of service object y by an agent A takes the form of a trivial vector r_y^A where only one element has value 1, and all other vector elements
have value 0. The index i of the vector element with value 1 refers to the specific
rating level.
15.2.2 Aggregating Ratings with Aging
Ratings are typically aggregated by simple addition of the components (vector addition). However, service objects may change their quality over time, so it is desirable to give relatively greater weight to more recent ratings. This principle is called time decay, which can be taken into account by introducing a longevity factor λ ∈ [0, 1] for ratings, which controls the rapidity with which old ratings are aged and discounted as a function of time. With λ = 0, ratings are completely forgotten after a single time period. With λ = 1, ratings are never forgotten.
Let new ratings be collected in discrete time periods. Let the sum of the ratings of a particular service object y in period t be denoted by the vector r_y,t. More specifically, it is the sum of all ratings r_y^A of service object y by rating agents A during that period, expressed by:

r_y,t = ∑_{A ∈ M_y,t} r_y^A      (15.4)
where My,t is the set of all rating agents who rated service object y during period t.
Let the total accumulated ratings (with aging) of service object y after the time
period t be denoted by Ry,t . Then the new accumulated rating after time period t + 1
can be expressed as:
Ry,(t+1) = λ · Ry,t + r y,(t+1) , where 0 ≤ λ ≤ 1 .
(15.5)
Eq.(15.5) represents a recursive updating algorithm that can be executed once
every period for all service objects, or alternatively in a discrete fashion for each
service object for example after each new rating. Assuming that new ratings are
received between time t and time t + n periods, then the updated rating vector can
be computed as:
R_y,(t+n) = λ^n · R_y,t + r_y,(t+n), where 0 ≤ λ ≤ 1.
(15.6)
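A minimal sketch of the aging update of Eq.(15.5), assuming rating vectors are plain Python lists and λ = 0.9 (illustrative names only); it also illustrates the convergence behaviour discussed in Section 15.2.3:

```python
def age_and_update(R_prev, r_new, lam=0.9):
    """One period of Eq.(15.5): age the accumulated ratings and add the new ones."""
    return [lam * R_i + r_i for R_i, r_i in zip(R_prev, r_new)]

# A constant rating vector every period converges towards e / (1 - lambda)
R = [0.0] * 5
for _ in range(100):
    R = age_and_update(R, [0, 0, 1, 0, 0])
print(R)   # the third component approaches 1 / (1 - 0.9) = 10
```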
15.2.3 Reputation Score Convergence with Time Decay
The recursive algorithm of Eq.(15.5) makes it possible to compute convergence values for the rating vectors, as well as for reputation scores. Assuming that a particular
service object receives the same ratings every period, then Eq.(15.5) defines a geometric series. We use the well known result of geometric series:
∑_{j=0}^{∞} λ^j = 1/(1 − λ), for −1 < λ < 1.      (15.7)
Let e_y represent a constant rating vector of service object y for each period. The total accumulated rating vector after an infinite number of periods is then expressed as:

R_y,∞ = e_y / (1 − λ), where 0 ≤ λ < 1.      (15.8)
Eq.(15.8) shows that the longevity factor λ determines the convergence values for the accumulated rating vector according to Eq.(15.5). In general it will be impossible for components of the accumulated rating vector to reach infinity, which makes it impossible for the score vector components to cover the whole range [0, 1]. However, service objects that provide maximum quality services over a long time period would naturally expect to get the highest possible reputation score. An intuitive interpretation of this expectation is that each long-standing service object should have its own individual base rate which is determined as a function of the service object’s total history, or at least a large part of it. This approach is used in the next section to include individual base rates.
15.3 Base Rates for Ratings
The cold-start problem in reputation systems is when a service object has not received any ratings, or too few ratings to produce a reliable reputation score. This
problem can be solved by basing reputation scores on base rates, which can be individual or community based.
15.3.1 Individual Base Rates
A base rate normally expresses the average in a population or domain. Here we will
compute individual base rates from a ‘population’ consisting of individual performances over a series of time periods. The individual base rate for service object y at
time t will be denoted as a y,t . It will be based on individual evidence vectors denoted
as Q y,t .
Let a denote the community base rate as usual. Then the individual base rate for service object y at time t can be computed similarly to Eq.(15.2) as:

a_y,t(Li) = (Q_y,t(Li) + W·a(Li)) / (W + ∑_{i=1}^{k} Q_y,t(Li)).      (15.9)
Reputation scores can be computed as normal with Eq.(15.2), except that the
community base rate a is replaced with the individual base rate a y,t of Eq.(15.9). It
can be noted that the individual base rate a y,t is partly a function of the community
base rate a , which thereby constitutes a two-level base rate model.
The components of the reputation score vector computed with Eq.(15.2) based
on the individual base rate of Eq.(15.9) can theoretically be arbitrarily close to 0 or
1 with any longevity factor and any community base rate.
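A small sketch of this two-level base rate model (illustrative names, W = 2): the individual base rate of Eq.(15.9) is computed from an evidence vector Q and the community base rate a, and is then used in place of a when computing the score of Eq.(15.2).

```python
def individual_base_rate(Q, a, W=2):
    """Individual base rate of Eq.(15.9) from evidence vector Q and community base rate a."""
    total = sum(Q)
    return [(Q_i + W * a_i) / (W + total) for Q_i, a_i in zip(Q, a)]

a_community = [0.2] * 5
Q = [1, 2, 10, 20, 40]                       # long-term evidence for a service object
a_y = individual_base_rate(Q, a_community)   # used in place of a_community in Eq.(15.2)
print(a_y)
```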
The simplest alternative to consider is to let the individual base rate for each
service object be a function of the service object’s total history. A second similar
alternative is to let the individual base rate be computed as a function of a service object’s performance over a very long sliding time window. A third alternative is
to define an additional high longevity factor for base rates that is much closer to 1
than the common longevity factor λ . The formalisms for these three alternatives are
briefly described below.
15.3.2 Total History Base Rate.
The total evidence vector Q y,t for service object y used to compute the individual
base rate at time period t is expressed as:
Q_y,t = ∑_{j=1}^{t} R_y,j      (15.10)
15.3.3 Sliding Time Window Base Rate.
The evidence vector Q_y,t for computing a service object y’s individual base rate at time period t is expressed as:

Q_y,t = ∑_{j=u}^{t} R_y,j, where Window Size = (t − u).      (15.11)
The Window Size would normally be a constant, but could also be dynamic.
In case e.g. u = 1 the Window Size would be increasing and be equal to t, which
also would make this alternative equivalent to the total history alternative described
above.
15.3.4 High Longevity Factor Base Rate.
Let λ denote the normal longevity factor. A high longevity factor λH can be defined where λH > λ. The evidence vector Q_y,t for computing a service object y’s individual base rate at time period t is computed as:
Q y,t = λH · Q y,(t−1) + r y,t , where λ < λH ≤ 1 .
(15.12)
In case λH = 1 this alternative would be equivalent to the total history alternative
described above. The high longevity factor makes ratings age much slower than the
regular longevity factor.
15.3.5 Dynamic Community Base Rates
Bootstrapping a reputation system to a stable and conservative state is important.
In the framework described above, the base rate distribution a will define initial
default reputation for all service objects. The base rate can for example be evenly
distributed, or biased towards either a negative or a positive reputation. This must be
defined by those who set up the reputation system in a specific market or community.
Service objects will come and go during the lifetime of a market, and it is important to be able to assign a reasonable base rate reputation to new service objects. In
the simplest case, this can be the same as the initial default reputation used during bootstrap.
However, it is possible to track the average reputation score of the whole community of service objects, and this can be used to set the base rate for new service
objects, either directly or with a certain additional bias.
Not only new service objects, but also existing service objects with a standing
track record can get the dynamic base rate. After all, a dynamic community base
rate reflects the whole community, and should therefore be applied to all service
object members of that community. The aggregate reputation vector for the whole
community at time t can be computed as:
R_M,t = ∑_{y_j ∈ M} R_y,t      (15.13)
This vector then needs to be normalised to a base rate vector as follows:
Definition 15.1 (Community Base Rate). Let R M,t be an aggregate reputation vector for a whole community, and let S M,t be the corresponding multinomial probability reputation vector which can be computed with Eq.(15.2). The community base
rate as a function of existing reputations at time t + 1 is then simply expressed as
the community score at time t:
a M,(t+1) = S M,t .
(15.14)
The base rate vector of Eq.(15.14) can be given to every new service object that
joins the community. In addition, the community base rate vector can be used for
every service object every time their reputation score is computed. In this way, the
base rate will dynamically reflect the quality of the market at any one time.
If desirable, the base rate for new service objects can be biased in either negative
or positive direction in order to make it harder or easier to enter the market.
When base rates are a function of the community reputation, the convergence values for constant ratings can no longer be computed with Eq.(15.8); the scores will instead converge towards the average score of all the ratings.
15.4 Reputation Representation
Reputation can be represented in different forms. We will here illustrate reputation
as multinomial probability scores, and as point estimates. Each form will be described in turn below.
15.4.1 Multinomial Probability Representation.
The most natural representation is to define the reputation score as a function of the expected
probability of each score level. The expected probability for each rating level can be
computed with Eq.(15.2).
Let R represent the service object’s aggregate ratings. Then the vector S defined
by:
S_y : S_y(Li) = (R*_y(Li) + W·a(Li)) / (W + ∑_{j=1}^{k} R*_y(Lj)), for i = 1 . . . k      (15.15)
is the corresponding multinomial probability reputation score. As already stated,
W = 2 is the value of choice, but a larger value for the constant W can be chosen if
a reduced influence of new evidence over the base rate is required.
The reputation score S can be interpreted like a multinomial probability measure
as an indication of how a particular service object is expected to perform in future
transactions. It can easily be verified that
∑_{i=1}^{k} S(Li) = 1.      (15.16)
The multinomial reputation score can for example be visualised as columns,
which would clearly indicate if ratings are polarised. Assume for example 5 levels:

Discrete rating levels: { L1: Mediocre, L2: Bad, L3: Average, L4: Good, L5: Excellent }      (15.17)
We assume a default base rate distribution. Before any ratings have been received,
the multinomial probability reputation score will be equal to 1/5 for all levels. Let us
assume that 10 ratings are received. In the first case, 10 average ratings are received,
which translates into the multinomial probability reputation score of Fig.15.2.a. In
the second case, 5 mediocre and 5 excellent ratings are received, which translates
into the multinomial probability reputation score of Fig.15.2.b.
Fig. 15.2 Illustrating score difference resulting from average and polarised ratings: (a) 10 average ratings, (b) 5 mediocre and 5 excellent ratings
With a binomial reputation system, the difference between these two rating scenarios would not have been visible.
15.4.2 Point Estimate Representation.
While informative, the multinomial probability representation can require considerable space to be displayed on a computer screen. A more compact form can be
to express the reputation score as a single value in some predefined interval. This
can be done by assigning a point value ν to each rating level i, and computing the
normalised weighted point estimate score σ .
Assume e.g. k different rating levels with point values evenly distributed in the range [0,1], so that ν(Li) = (i − 1)/(k − 1). The point estimate reputation score is then:
σ = ∑_{i=1}^{k} ν(Li)·S(Li).      (15.18)
However, this point estimate removes information, so that for example the difference between the average ratings and the polarised ratings of Fig.15.2.a and
Fig.15.2.b is no longer visible. The point estimates of the reputation scores of
Fig.15.2.a and Fig.15.2.b are both 0.5, although the ratings in fact are quite different. A point estimate in the range [0,1] can be mapped to any range, such as 1-5
stars, a percentage or a probability.
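The following sketch (illustrative names, assuming five levels, W = 2 and a uniform base rate) reproduces the two score distributions of Fig.15.2 and shows that both collapse to the same point estimate:

```python
def scores(r, a, W=2):
    """Multinomial reputation scores of Eq.(15.15)."""
    return [(r_i + W * a_i) / (W + sum(r)) for r_i, a_i in zip(r, a)]

def point_estimate(S):
    """Point estimate reputation score of Eq.(15.18) with nu(Li) = (i-1)/(k-1)."""
    k = len(S)
    return sum((i / (k - 1)) * S_i for i, S_i in enumerate(S))

a = [0.2] * 5
average   = scores([0, 0, 10, 0, 0], a)   # 10 average ratings (Fig.15.2.a)
polarised = scores([5, 0, 0, 0, 5], a)    # 5 mediocre and 5 excellent ratings (Fig.15.2.b)
print(point_estimate(average), point_estimate(polarised))   # both are approximately 0.5
```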
15.4.3 Continuous Ratings
It is common that the rating and score of service objects are measured on a continuous scale, such as time, throughput or relative ranking, to name a few examples.
Even when it is natural to provide discrete ratings, it may be difficult to express that
something is strictly good or average, so that combinations of discrete ratings, such
as ‘average-to-good’ would better reflect the rater’s opinion. Such ratings can then
be considered continuous. To handle this, it is possible to use a fuzzy membership
function to convert a continuous rating into a binomial or multinomial rating. For
example with five rating levels the sliding window function can be illustrated as in
Fig.15.3. The continuous q-value determines the r-values for that level.
Fig. 15.3 Fuzzy triangular membership functions
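As an illustration only (the exact membership functions of Fig.15.3 may differ), a continuous rating q ∈ [0,1] can be converted into fractional ratings on five levels with triangular membership functions centred on evenly spaced level points:

```python
def fuzzy_ratings(q, k=5):
    """Convert a continuous rating q in [0,1] into a k-component rating vector
    using triangular membership functions centred on evenly spaced level points."""
    width = 1.0 / (k - 1)
    centres = [i * width for i in range(k)]
    return [max(0.0, 1.0 - abs(q - c) / width) for c in centres]

print(fuzzy_ratings(0.6))   # approximately [0, 0, 0.6, 0.4, 0]: between 'average' and 'good'
```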
15.5 Simple Scenario Simulation
A simple scenario can be used to illustrate the performance of a multinomial reputation system that uses some of the features described above. Let us assume that
service objects y and z receive the following ratings over 70 rounds or time periods.
Table 15.1 Sequence of ratings
Sequence         Service object y                     Service object z
Periods 1-10     10 × L1 ratings in each period       1 × L1 rating in each period
Periods 11-20    10 × L2 ratings in each period       1 × L2 rating in each period
Periods 21-30    10 × L3 ratings in each period       1 × L3 rating in each period
Periods 31-40    10 × L4 ratings in each period       1 × L4 rating in each period
Periods 41-70    30 × L5 ratings in each period       3 × L5 ratings in each period
The longevity of ratings is set to λ = 0.9 and the individual base rate is computed with the high longevity approach described in Sec.15.3.4 with high longevity
factor for the base rate set to λH = 0.999. For simplicity in this example the
community base rate is assumed to be fixed during the 70 rounds, expressed by
a(L1 ) = a(L2 ) = a(L3 ) = a(L4 ) = a(L5 ) = 0.2. Fig.15.4 illustrates the evolution of
the scores of service objects y and z during the period.
Fig. 15.4 Score evolution for service objects y and z: (a) scores for service object y, (b) scores for service object z (axes: score per rating level L1-L5 and point estimate, over periods 1-71)
The scores for both service objects start with the community base rate, and then
vary as a function of the received ratings. Both service objects have an initial point
estimate of 0.5.
The scores for service object z in Fig.15.4.b are similar in trend but less articulated than that of service object y in Fig.15.4.a, because service object z receives
equal but less frequent ratings. The final score of service object z is visibly lower
than 1 because the relatively low number of ratings is insufficient for driving the
individual base rate very close to 1. Thanks to the community base rate, all new
service objects in a community will have a meaningful initial score. In case of rating scarcity, an service object’s score will initially be determined by the community
base rate, with the individual base rate dominating as soon as some ratings have
been received,
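A minimal sketch of this simulation for service object y (illustrative code only, combining the aging update of Eq.(15.5), the high-longevity evidence of Eq.(15.12), the individual base rate of Eq.(15.9) and the score of Eq.(15.2), with W = 2, λ = 0.9 and λH = 0.999 as stated above):

```python
W, lam, lam_H = 2, 0.9, 0.999
a_community = [0.2] * 5

def ratings_y(t):
    """Rating vector received by service object y in period t (Table 15.1)."""
    if t <= 10: return [10, 0, 0, 0, 0]
    if t <= 20: return [0, 10, 0, 0, 0]
    if t <= 30: return [0, 0, 10, 0, 0]
    if t <= 40: return [0, 0, 0, 10, 0]
    return [0, 0, 0, 0, 30]

R = [0.0] * 5   # aged rating vector, Eq.(15.5)
Q = [0.0] * 5   # slowly aged evidence for the individual base rate, Eq.(15.12)
for t in range(1, 71):
    r = ratings_y(t)
    R = [lam * R_i + r_i for R_i, r_i in zip(R, r)]
    Q = [lam_H * Q_i + r_i for Q_i, r_i in zip(Q, r)]
    a_y = [(Q_i + W * a_i) / (W + sum(Q)) for Q_i, a_i in zip(Q, a_community)]
    S = [(R_i + W * a_i) / (W + sum(R)) for R_i, a_i in zip(R, a_y)]
print(S)   # after 70 periods the score mass is concentrated on level L5
```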
15.6 Combining Trust and Reputation
Multinomial aggregate ratings can be used to derive binomial trust in the form of an opinion. This is done by first converting the multinomial ratings to binomial ratings according to Eq.(15.19) below, and then applying the mapping of Definition 3.3.
Let the multinomial reputation model have k rating levels Li ; i = 1, . . . k, where
R (Li ) represents the ratings on each level Li , and let σ represent the point estimate
reputation score from Eq.(15.18). Let the binomial reputation model have positive
and negative ratings r and s respectively. The derived converted binomial rating
parameters (r, s) are given by:

r = σ · ∑_{i=1}^{k} R_y(Li)
s = ∑_{i=1}^{k} R_y(Li) − r      (15.19)
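A small sketch of this conversion (illustrative names; σ is the point estimate of Eq.(15.18)):

```python
def to_binomial(R, sigma):
    """Convert multinomial ratings R into binomial (r, s) according to Eq.(15.19)."""
    total = sum(R)
    r = sigma * total
    return r, total - r

# 10 ratings in total with an example point estimate sigma = 0.85
print(to_binomial([0, 0, 1, 4, 5], 0.85))   # (8.5, 1.5)
```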
With the equivalence mapping of Definition 3.3 it is possible to analyse trust
networks based on both trust relationships and reputation scores, as described next.
15.7 Combining Trust and Reputation
The multinomial Bayesian reputation systems described above use the same representation as the multinomial evidence opinions described in Eq.(3.16), which can be mapped to multinomial opinions according to Definition 3.6. Furthermore, the projection from multinomial ratings to binomial ratings of Eq.(15.19), combined with the binomial mapping of Definition 3.3, makes it possible to represent reputation scores as binomial opinions, which in turn can be applied in computational trust models as described in Chapter 13.
Fig.15.5 illustrates a scenario involving a reputation system that publishes reputation scores about agents in a network. We assume that agent A needs to derive a
measure of trust in agent F, and that only agent B has knowledge about F. Assume
furthermore that agent A has no direct trust in B, but that A trusts the Reputation
System RS and that RS has published a reputation score about B.
Fig. 15.5 Combining trust and reputation (Reputation System RS; users and services A, B, C, D, E, F; numbered arrows 1-5)
Agent A has trust ω_RS^A in the Reputation System (arrow 1), and agent B has reputation score R_B^RS (arrow 2). The binomial reputation score R_B^RS corresponds to a Beta PDF denoted Beta^e(r_x, s_x, a_x) as expressed in Eq.(3.8), which can be mapped to a binomial opinion according to the mapping of Definition 3.3. The binomial opinion derived from the reputation score of B produced by the Reputation System RS is then denoted ω_B^RS.
Agent A can then derive a measure of trust in B (arrow 3) based on A’s trust in RS, and the reputation opinion ω_B^RS. Agent B trusts F (arrow 4) with opinion ω_F^B, where it is assumed that B recommends this trust to A, so that A can derive a measure
where it is assumed that B recommends this trust to A, so that A can derive a measure
of trust in F (arrow 5). The trust path is expressed as:
[A, F] = [A; RS] : [RS; B] : [B, F]
(15.20)
The trust edges [A; RS] and [RS; B] represent referral trust, whereas the trust edge
[B, F] represents functional trust.
In the notation of subjective logic, A’s derived trust in F can be expressed as:
ω_F^A = ω_RS^A ⊗ ω_B^RS ⊗ ω_F^B      (15.21)

The computation of ω_F^A is done according to the method for multi-node trust
The computation of ωFA is done according to the method for multi-node trust
transitivity described in Section 13.3.4.
The compatibility between Bayesian reputation systems and subjective logic provides a flexible framework for analysing trust networks consisting of both reputation
scores and private trust values.
Chapter 16
Subjective Networks
This chapter focuses on representation and reasoning in conditional inference networks, combined with trust networks, thereby introducing subjective networks as
graph-based structures of variables combined with conditional opinions. Subjective
networks generalize Bayesian network modelling and analysis from being based on
probability calculus, to being based on subjective logic.
A Bayesian network [74] is a compact representation of a joint probability distribution of random variables in the form of a DAG (directed acyclic graph) and a set of conditional probability distributions associated with each node.
The goal of inference in Bayesian networks is to derive a conditional probability
distribution of any set of (target) variables in the network, given that the values of
any other set of (evidence) variables have been observed. Bayesian network reasoning algorithms provide a way to propagate the probabilistic information through
the graph, from the evidence to the target.
One serious limitation of traditional Bayesian network reasoning is that all the
input conditional probability distributions in the network must be assigned precise
values in order for the inference algorithms to work, and for the model to be analysed. This is problematic in situations where probabilities cannot be reliably elicited
and one needs to do inference with uncertain or incomplete probabilistic information, inferring the most accurate conclusions possible.
Subjective opinions can express uncertain probabilistic information of any kind
(minor or major imprecision, and even total ignorance), by varying the uncertainty
mass between 0 and 1.
A straightforward generalization of Bayesian networks in subjective logic retains
the network structure and replaces conditional probability distributions with conditional subjective opinions at every node of the network. We call this a subjective
Bayesian network and consider the reasoning in it as a generalization of classical
Bayesian reasoning, where the goal is to obtain a subjective opinion on the target
given the evidence that can be an instantiation of values, but also a subjective opinion itself.
General inference in subjective Bayesian networks can be challenging, since subjective logic inference requires the consideration of uncertainty, which changes the
notion of conditional independence in Bayesian networks.
At the time of writing, subjective Bayesian networks – the topic of this chapter – are a relatively new field that is still not thoroughly developed. This chapter is
therefore meant as a brief introduction into this new field, which appears to be a
very fertile field of research and development. The capacity of subjective logic for
reasoning in the presence of uncertainty, combined with the power of Bayesian networks for modelling conditional knowledge structures, is a very potent combination.
A whole new book will be required to cover this field properly.
The next section gives a brief overview of Bayesian networks. For a thorough
introduction into the field, see e.g. [59]. After the introduction, we describe some
properties and aspects resulting from the generalisation to subjective Bayesian networks, and how it can be applied.
16.1 Bayesian Networks
Bayesian networks represent a powerful framework for modelling and analysing practical situations where the analyst needs to make probabilistic inference about
a set of variables with unknown values. Initially proposed by Pearl in 1988 [74],
Bayesian network tools are currently being used in important applications in many
areas like medical diagnostics, risk management, marketing, military planning, etc.
When events and states are related in time and space, they are conditionally dependent. For example, the state of carrying an umbrella is typically influenced by the
state of rain. These relationships can be expressed in the form of graphs, consisting
of nodes connected with directed edges. To be practical, the graphs must be acyclic
to prevent loops, so that the graph is a DAG (directed acyclic graph), to be precise.
The nodes are variables that represent possible states or events. The directed edges
represent the (causal) relationships between the nodes.
Associated with the Bayesian network graph are various (conditional) probability distributions that formally specify selected local (conditional) relationships
between nodes. Missing probability distributions for specific target nodes can be
derived through various algorithms that take as input arguments the existing known
probability distributions and the structure of the Bayesian network graph.
Consider a Bayesian network containing K nodes, X I to X K , and a joint probability distribution over all the variables. A particular probability in the joint distribution
is represented by p(X I = xI, X II = xII, . . . , X K = xK), which can be expressed more concisely as p(xI, xII, . . . , xK).
The chain rule of conditional probability reasoning expresses the joint probability in terms of a factorisation of conditional probabilities as:

p(xI, xII, . . . , xK) = p(xI) · p(xII | xI) · p(xIII | xI, xII) · . . . · p(xK | xI, . . . , x(K−1))
                      = ∏_i p(xi | xI, . . . , x(i−1))      (16.1)
The application of Eq.(16.1) together with Bayes’ rule, a set of independence properties, as well as various computation algorithms, provides the basis for analysing
complex Bayesian networks.
Figure 16.1 illustrates the most common reasoning categories supported by Bayesian
networks [59].
Fig. 16.1 Categories of Bayesian reasoning (panels: predictive, diagnostic, intercausal and combined reasoning, each showing the direction of reasoning between evidence and query nodes)
The next sections provide examples of reasoning according to these categories.
16.1.1 Example: Lung Cancer Situation
A classical example used to illustrate Bayesian networks is the case of lung cancer, that on the one hand can have various causes, and that on the other hand can
cause observable effects [59]. In the example, it is assumed that breathing polluted
air, denoted P, and cigarette smoking, denoted S, are the most relevant causes. The
estimated likelihood of getting lung cancer is specified as a table with conditional
298
16 Subjective Networks
probabilities corresponding to all possible combinations of causes, i.e. without any
of the causes, as a result of cause S alone, cause P alone, or both causes simultaneously. In addition, assuming that a person has lung cancer (or not), the estimated
likelihood of positive cancer detection on an X-ray image, denoted X, and the estimated likelihood of shortness of breath (medical term: ‘dyspnoea’), denoted D, are
also specified as probability tables. Figure 16.2 illustrates this particular Bayesian
network.
Fig. 16.2 Simple Bayesian network for the lung cancer situation. Nodes: Pollution (P), Smoking (S), Cancer (C), Dyspnoea (D), X-Ray (X). Base rates: a(P) = 0.30, a(S) = 0.40, a(C) = 0.01. Conditional probability tables:

P S   p(C | P, S)
F F   0.001
F T   0.020
T F   0.002
T T   0.030

C   p(D | C)   p(X | C)
F   0.020      0.010
T   0.650      0.900
Once the graph has been drawn and populated with conditional probabilities, the
Bayesian network becomes a basis for reasoning. In particular, when the value of
one or several variables has been observed (or guessed), we can make inferences
from the new information. This process is sometimes called probability propagation
or belief updating, and consists of applying Bayes theorem and other laws of probability, with the observed evidence and the probability tables as input parameters, to
determine the probability distribution of specific variables of interest.
For example, assume a person who consults his GP because of shortness of
breath. From this evidence alone, the GP can estimate the likelihood that the person suffers from lung cancer. The computation applies Bayes theorem of Eq.(9.6),
and would require a prior (base rate) probability of lung cancer in the population
(expressed as a(x) in Eq.(9.6)). Assume that the prior probability for lung cancer is
a(C) = 0.01. The probability of lung cancer given dyspnoea is then:
p(C|D) = a(C)·p(D|C) / (a(C)·p(D|C) + a(C̄)·p(D|C̄)) = (0.01 · 0.65) / ((0.01 · 0.65) + (0.99 · 0.02)) = 0.25.      (16.2)
However, assuming that the evidence is not very conclusive, and that many other
conditions can also cause shortness of breath, the GP can decide to get an X-ray image of the person’s lungs in order to have more firm evidence. Based on indications
of lung cancer found on the X-ray image, combined with the evidence of dyspnoea,
the GP can update her belief in the likelihood of lung cancer by re-applying Bayes
theorem. The expression for the probability of lung cancer in Eq.(16.3) is conditioned on the joint variables (D,X).
p(C|D, X) = a(C)·p(D, X|C) / (a(C)·p(D, X|C) + a(C̄)·p(D, X|C̄))      (16.3)
However, there is no available probability table for p(D, X|C), so Eq.(16.3) cannot be computed directly. Of course, medical authorities could establish a specific
probability table for p(D, X|C), but because there can be many different indicators
for a given diagnosis, it is typically impractical to produce ready-made probability
tables for every possible combination of indicators.
Instead, an approximation of Eq.(16.3) is the so-called naïve Bayes classifier, where multiple conditional probability tables based on separate variables can be combined as if they were one single probability table based on joint variables. This simplification is correct to the extent that the separate variables are independent, which in many cases is a reasonable assumption. Eq.(16.4) gives the result of the naïve Bayes classifier in the example of diagnosing lung cancer based on dyspnoea and X-ray.
p(C|D, X) ≈ a(C)·p(D|C)·p(X|C) / (a(C)·p(D|C)·p(X|C) + a(C̄)·p(D|C̄)·p(X|C̄))
          ≈ (0.01 · 0.65 · 0.90) / ((0.01 · 0.65 · 0.90) + (0.99 · 0.02 · 0.01)) = 0.97      (16.4)
It can be seen that, by including X-ray as evidence, the derived likelihood of
cancer increases significantly.
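The two diagnostic computations above can be checked with a few lines of Python, using the numbers from Figure 16.2:

```python
a_C = 0.01                       # base rate of lung cancer
p_D = {True: 0.65, False: 0.02}  # p(D | C) and p(D | not C)
p_X = {True: 0.90, False: 0.01}  # p(X | C) and p(X | not C)

# Eq.(16.2): diagnosis from dyspnoea alone
p_C_D = a_C * p_D[True] / (a_C * p_D[True] + (1 - a_C) * p_D[False])
print(round(p_C_D, 2))           # 0.25

# Eq.(16.4): naive Bayes classifier with both dyspnoea and X-ray as evidence
num = a_C * p_D[True] * p_X[True]
den = num + (1 - a_C) * p_D[False] * p_X[False]
print(round(num / den, 2))       # 0.97
```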
The Bayesian network of Figure 16.2 can also be used for making predictions.
Assume for example that, according to statistics, the base rate of the population
exposed to significant pollution is a(P) = 0.30, and the base rate of the population
that are smokers is a(S) = 0.40. Assuming independence, the combined base rates
of smokers and exposure to pollution are given in Table 16.1.
Table 16.1 Base rates of people exposed to pollution and being smokers
Pollution   Smoker   Probability
F           F        0.42
F           T        0.28
T           F        0.18
T           T        0.12
From the statistics of Table 16.1 and the conditional probability table in Figure 16.2, the base rate of lung cancer in the population can be computed with the deduction operator of Eq.(9.11) to produce p(C) = 0.01.
The Bayesian network can also be used to formulate policy targets for public health. Assume that the health authorities want to reduce the base rate of lung cancer to p(C) = 0.005. They would then have various options, one of which could be to reduce the base rate of exposure to pollution to a(P) = 0.10 and the base rate of smokers to a(S) = 0.20. According to the Bayesian network, that would give a base rate of lung cancer of p(C) = 0.005.
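The following Python sketch illustrates this kind of prediction. The joint base rates are computed exactly as in Table 16.1, but since the conditional probability table of Figure 16.2 is not reproduced in this section, the values of p(C|P,S) used below are purely hypothetical, so the printed cancer rates are only illustrative:

from itertools import product

def joint_base_rates(a_P, a_S):
    """Joint base rates over (Pollution, Smoker), assuming independence as in Table 16.1."""
    return {(p, s): (a_P if p else 1 - a_P) * (a_S if s else 1 - a_S)
            for p, s in product([False, True], repeat=2)}

def cancer_base_rate(a_P, a_S, p_C_given_PS):
    """Marginal base rate of cancer, obtained by weighting p(C|P,S) with the joint base rates."""
    rates = joint_base_rates(a_P, a_S)
    return sum(rates[ps] * p_C_given_PS[ps] for ps in rates)

# Hypothetical conditional probabilities p(C=true | P, S); the real values would
# come from the conditional probability table of Figure 16.2.
p_C_given_PS = {(False, False): 0.001, (False, True): 0.01,
                (True, False): 0.02,  (True, True): 0.05}

print(joint_base_rates(0.30, 0.40))                   # reproduces Table 16.1
print(cancer_base_rate(0.30, 0.40, p_C_given_PS))     # current scenario
print(cancer_base_rate(0.10, 0.20, p_C_given_PS))     # policy-target scenario with lower base rates
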
16.1.2 Naïve Bayes Classifier
Figure 16.3 illustrates a generalised version of the Bayesian network of the above example. In this general case, we assume that there is a central variable Y of cardinality l, a set of K different parent variables X^I, ..., X^K, and a set of M different child variables Z^I, ..., Z^M.
Fig. 16.3 General Bayesian network for intermediate variable with sets of parents and children: parent variables X^I, ..., X^K point to the intermediate variable Y, which in turn points to the child variables Z^I, ..., Z^M
A specific parent variable is denoted X^I, and a specific value with index i is denoted x_i^I. We write X^I = x_i^I to denote the case when variable X^I takes the specific value x_i^I. The notation x^I denotes a specific value of variable X^I without explicitly specifying its index. Similarly, z^I denotes a specific value of variable Z^I without explicitly specifying its index, and y denotes a specific value of variable Y without explicitly specifying its index. Assume that Y takes its values from the domain 𝕐 of cardinality l = |𝕐|.
Eq.(16.5) is the general Bayes classifier, which gives the expression for the probability distribution over Y given evidence variables with specific values Z^I = z^I, Z^II = z^II, ..., Z^M = z^M. The prior probability distribution, called the base rate distribution in subjective logic, is denoted a, so that e.g. a(y_j) denotes the prior probability of value y_j ∈ 𝕐.
p(y\,|\,z^I,\ldots,z^M) = \frac{a(y)\,p(z^I,\ldots,z^M\,|\,y)}{\sum_{j=1}^{l} a(y_j)\,p(z^I,\ldots,z^M\,|\,y_j)} \qquad (16.5)
If there is no available multivariate probability table for conditional probability distributions over Y based on joint variables, and only probability tables for conditional probability distributions based on single variables are available, then Eq.(16.5) cannot be applied directly. In practice, this is often the case.
However, if it can be assumed that the variables Z^I, Z^II, ..., Z^M are reasonably independent, it is possible to apply the naïve Bayes classifier of Eq.(16.6).
p(y\,|\,z^I,\ldots,z^M) \approx \frac{a(y)\,\prod_i p(z^i\,|\,y)}{\sum_{j=1}^{l}\big[\,a(y_j)\,\prod_i p(z^i\,|\,y_j)\big]} \qquad (16.6)
Eq.(16.6) is the fundamental naïve Bayes classifier that is used in a wide range of fields, such as spam filtering, natural language text classification, medical diagnostics, and customer profiling and marketing, to name just a few.
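As an illustration, the sketch below (not part of the original text) implements Eq.(16.6) for an arbitrary number of classes and evidence variables; the spam-filtering numbers are invented toy values:

def naive_bayes(base_rate, likelihoods, evidence):
    """
    Naive Bayes classifier of Eq.(16.6).
    base_rate:   dict mapping class y to a(y)
    likelihoods: dict mapping (variable, value, y) to p(value | y)
    evidence:    dict mapping variable to its observed value
    Returns the posterior probability distribution over the classes.
    """
    scores = {}
    for y, a_y in base_rate.items():
        score = a_y
        for var, val in evidence.items():
            score *= likelihoods[(var, val, y)]
        scores[y] = score
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()}

# Invented toy example: classifying an email as spam or ham from two indicators.
base_rate = {'spam': 0.2, 'ham': 0.8}
likelihoods = {
    ('contains_link', True, 'spam'): 0.7, ('contains_link', True, 'ham'): 0.1,
    ('known_sender', False, 'spam'): 0.9, ('known_sender', False, 'ham'): 0.3,
}
print(naive_bayes(base_rate, likelihoods,
                  {'contains_link': True, 'known_sender': False}))

The same function applies unchanged to the lung-cancer example above by supplying the likelihoods of dyspnoea and the X-ray indication as the two evidence variables.
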
16.1.3 Independence and Separation
A Bayesian network graph is assumed to represent the significant dependencies of all relevant variables of the situation to be analysed. This is expressed by the Markov property, which states that there are no direct dependencies in the system being modelled which are not already explicitly shown via edges. In the above example of lung cancer, there is no way for smoking to influence dyspnoea except by way of causing cancer.
Bayesian networks which have the Markov property are often called Independence-maps (or I-maps for short), since every independence implicitly indicated by the lack of an edge is real in the situation.
If every arc in a Bayesian network corresponds to a direct dependence in the situation, then the network is said to be a Dependence-map (or D-map for short). A Bayesian network which is both an I-map and a D-map is called a perfect map.
Bayesian networks with the Markov property are I-maps by definition, and explicitly express conditional independencies between probability distributions of variables in the causal chain. Consider a causal chain of three nodes, X, Y and Z, as shown in Figure 16.4.a.
In the above example of lung cancer, one such causal chain is:
‘smoking’ −→ ‘cancer’ −→ ‘dyspnoea’
Fig. 16.4 Different topologies of causality: (a) causal chain X → Y → Z; (b) common cause X of Y and Z; (c) common effect Z of X and Y
Causal chains give rise to conditional independence, which for Figure 16.4.a is reflected by:

p(Z|X,Y) = p(Z|Y) \qquad (16.7)
Eq.(16.7) expresses the property of Figure 16.4.a that the probability of Z given Y is exactly the same as the probability of Z given both X and Y, because knowing that X has occurred is irrelevant to our beliefs about Z if we already know that Y has occurred.
With reference to Figure 16.4.a, the probability of dyspnoea (Z) depends directly
only on the condition of lung cancer (Y ). If we only know that someone is a smoker
(X), then that would increase the likelihood of both the person having lung cancer
(Y ) and suffering from shortness of breath (Z). However, if we already know that the
person has lung cancer (Y ), then it is assumed that also knowing that he is smoking
(X) is irrelevant to the probability of dyspnoea (Z). Expressed concisely, dyspnoea
is conditionally independent of smoking given lung cancer.
Consider now Figure 16.4.b, where two variables Y and Z have the common cause X. With reference to the same example, lung cancer (X) is the common cause of both symptoms, which are dyspnoea (Y) and a positive indication on the X-ray (Z). Common causes give rise to the same kind of conditional independence as causal chains, here conditioned on the common cause X, as expressed by Eq.(16.8):

p(Z|X,Y) = p(Z|X) \qquad (16.8)
If there is no direct evidence about cancer, then knowing that one symptom is present increases the likelihood of lung cancer, which in turn increases the likelihood of the other symptom. However, Eq.(16.8) expresses the property of Figure 16.4.b which says: if we already know that the person suffers from lung cancer, it is assumed that an additional positive observation of dyspnoea does not change the likelihood of finding a positive indication on the X-ray image.
Finally, consider Figure 16.4.c, where two variables X and Y have the common effect Z. Common effect situations have the opposite conditional independence
structure to that of causal chains and common causes, as expressed by Eq.(16.9).
p(X|Y,Z) \neq p(X|Z) \qquad (16.9)
Thus, if the effect Z is observed (e.g., lung cancer), and we know that one of the causes is absent (e.g., the patient does not smoke, Y), then that evidence increases the likelihood of the presence of the other cause (e.g., that he lives in a polluted area, X).
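The 'explaining away' behaviour of Eq.(16.9) can be checked numerically. The following Python sketch uses purely hypothetical conditional probabilities (the table of Figure 16.2 is not reproduced here) for a common-effect structure with causes P (pollution) and S (smoking) and effect C (cancer); conditioning additionally on the evidence that the person is a non-smoker raises the derived probability of pollution exposure:

from itertools import product

# Hypothetical numbers (not from the book) for the pollution/smoking/cancer example.
a_P, a_S = 0.30, 0.40                                  # base rates of pollution and smoking
p_C = {(False, False): 0.001, (False, True): 0.01,     # hypothetical p(C=true | P, S)
       (True, False): 0.02,  (True, True): 0.05}

def joint(p, s, c):
    """Joint probability p(P=p, S=s, C=c) for the common-effect structure of Fig. 16.4.c."""
    prior = (a_P if p else 1 - a_P) * (a_S if s else 1 - a_S)
    return prior * (p_C[(p, s)] if c else 1 - p_C[(p, s)])

def prob_P_given(given):
    """p(P=true | given), where 'given' fixes some of the variables 'S' and 'C'."""
    states = list(product([False, True], repeat=3))
    def matches(p, s, c):
        return all({'P': p, 'S': s, 'C': c}[k] == v for k, v in given.items())
    num = sum(joint(p, s, c) for p, s, c in states if p and matches(p, s, c))
    den = sum(joint(p, s, c) for p, s, c in states if matches(p, s, c))
    return num / den

print(prob_P_given({'C': True}))               # p(pollution | cancer), approx. 0.75 here
print(prob_P_given({'C': True, 'S': False}))   # p(pollution | cancer, non-smoker), higher: approx. 0.90
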
16.2 Subjective Bayesian Networks
The goal of subjective Bayesian network modelling and analysis is to generalise
traditional Bayesian network modelling and analysis by including the uncertainty
dimension. This is certainly possible, but involves some additional complexity to
handle the uncertainty dimension. On the other hand, the advantage is that subjective
Bayesian networks explicitly express the inherent uncertainty of realistic situations in
the formal modelling, thereby allowing the analysis and the results to better reflect
the situation as seen by the analysts. In other words, the inherent uncertainty of
situations can no longer be ‘hidden under the carpet’, which is good news for policy
and decision makers.
Consider a subjective Bayesian network containing a set of K nodes/variables, denoted X^I to X^K, and a joint opinion over all the variables. The joint opinion is expressed by ω_{X^I, ..., X^K}.
Similarly to the chain rule for conditional probabilities of Eq.(16.1), the chain rule of subjective conditional reasoning expresses the joint opinion in terms of iterative deduction of conditional opinions as:

\omega_{X^I,\ldots,X^K} = \big((\ldots(\omega_{X^I} \circledcirc \omega_{X^{II}|X^I}) \circledcirc \omega_{X^{III}|(X^I,X^{II})}) \circledcirc \ldots\big) \circledcirc \omega_{X^K|(X^I,\ldots,X^{K-1})} = \overset{\circledcirc}{\prod_i} \omega_{X^i|(X^I,\ldots,X^{i-1})} \qquad (16.10)

where \overset{\circledcirc}{\prod} denotes chained deduction.
Eq.(16.10) allows the concept of Bayesian networks to be generalised to subjective Bayesian networks. In subjective Bayesian networks, (conditional) opinions and
base rate distributions replace the (conditional) probability tables and priors used in
traditional Bayesian networks. Based on the operators of subjective logic such as
multiplication, division, deduction and abduction/inversion, models of subjective
Bayesian networks can be nicely expressed and analysed.
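As a minimal illustration of the data structure involved, the sketch below (not from the original text) represents a multinomial opinion by its belief mass distribution, uncertainty mass and base rate distribution, together with the projected probability distribution P_X(x) = b_X(x) + a_X(x)·u_X; the subjective-logic operators themselves (deduction, multiplication, inversion) are defined in earlier chapters and are not re-implemented here:

from dataclasses import dataclass
from typing import Dict

@dataclass
class MultinomialOpinion:
    """
    Minimal container for a multinomial opinion omega_X = (b_X, u_X, a_X):
    belief mass distribution, uncertainty mass and base rate distribution.
    The subjective-logic operators (deduction, multiplication, inversion) from
    the earlier chapters are assumed and not re-implemented here.
    """
    belief: Dict[str, float]        # b_X(x); together with u_X it sums to 1
    uncertainty: float              # u_X
    base_rate: Dict[str, float]     # a_X(x); sums to 1

    def projected_probability(self) -> Dict[str, float]:
        """P_X(x) = b_X(x) + a_X(x) * u_X."""
        return {x: self.belief[x] + self.base_rate[x] * self.uncertainty
                for x in self.belief}

# Example: a partly uncertain opinion about lung cancer with base rate a(C) = 0.01.
omega_C = MultinomialOpinion(belief={'cancer': 0.10, 'no_cancer': 0.60},
                             uncertainty=0.30,
                             base_rate={'cancer': 0.01, 'no_cancer': 0.99})
print(omega_C.projected_probability())    # {'cancer': 0.103, 'no_cancer': 0.897}
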
In the sections below, the four Bayesian reasoning categories introduced earlier are described within the framework of subjective logic. Then, in Section 16.4, aspects of conditional independence for subjective Bayesian networks, as well as the combination with trust networks, are discussed.
16.2.1 Subjective Predictive Reasoning
The predictive reasoning category was described during the presentation of traditional Bayesian networks above. Predictive reasoning is simply the application of
deduction, as described in Section 9.5 above.
Let 𝕏̂ denote a set of K domains, and let X̂ denote the corresponding joint variable, where both are expressed as:

Set of domains:   \widehat{\mathbb{X}} = \{\mathbb{X}^{I}, \mathbb{X}^{II}, \ldots, \mathbb{X}^{K}\}
Joint variable:   \widehat{X} = \{X^{I}, X^{II}, \ldots, X^{K}\} \qquad (16.11)
A specific domain 𝕐 of cardinality l, with variable Y, represents a consequent variable of interest to the analyst. Figure 16.5 illustrates the general situation of Bayesian predictive modelling, involving the mentioned variables.
Fig. 16.5 Situation of Bayesian prediction: the joint cause variables X^I, ..., X^K (evidence) are connected through the conditional opinion ω_{Y|X̂} to the consequent variable Y (query)
Assume that there exists a joint conditional opinion ω_{Y|X̂} which for every combination of values in X̂ specifies an opinion on Y. Assume also that there is an opinion ω_{X̂}. Subjective Bayesian predictive reasoning is then expressed as:

\omega_{Y \| \widehat{X}} = \omega_{\widehat{X}} \circledcirc \omega_{Y|\widehat{X}} \qquad (16.12)
In case the variables in X̂ can be assumed independent, the opinion on the joint variable X̂ can be generated by normal multinomial multiplication, described in Chapter 8, and is expressed by:

\omega_{\widehat{X}} = \prod_{i=1}^{K} \omega_{X^i} \qquad (16.13)
There are two alternative operators for multinomial multiplication for computing Eq.(16.13), namely normal multiplication, described in Section 8.1, and proportional multiplication, described in Section 8.2. At the time of writing, the relative performance of each multiplication operator has not been fully investigated, so no specific advice can be given on the choice of operator.
16.2.2 Subjective Diagnostic Reasoning
This section briefly describes how the diagnostic reasoning category can be handled
with subjective logic.
Let 𝕐̂ denote a set of L domains, and let Ŷ denote the corresponding joint variable, where both are expressed as:

Set of domains:   \widehat{\mathbb{Y}} = \{\mathbb{Y}^{I}, \mathbb{Y}^{II}, \ldots, \mathbb{Y}^{L}\}
Joint variable:   \widehat{Y} = \{Y^{I}, Y^{II}, \ldots, Y^{L}\} \qquad (16.14)
A specific class domain 𝕏 of cardinality k, with variable X, represents the set of classes of interest to the analyst. The classes can e.g. be a set of medical diagnoses, or types of email messages such as 'spam' and 'ham'.
Figure 16.6 illustrates the situation of Bayesian classifiers, where states of the variable X cause states of the joint variable Ŷ.
Fig. 16.6 Situation of Bayes classifier: the class variable X (query) is connected through the conditional opinions ω_{Y^I|X}, ..., ω_{Y^L|X} to the observation variables Y^I, ..., Y^L (evidence)
Eq.(16.15) expresses the general subjective Bayes classifier, which gives the expression for the opinion on X given the joint opinion on the evidence variables Ŷ. It is based on the operator for multinomial opinion inversion of Definition 10.48.

\omega_{X|\widehat{Y}} = \widetilde{\circledcirc}\big(\omega_{\widehat{Y}|X},\; a_X\big) \qquad (16.15)
In practical situations, the joint conditional opinion ω_{Ŷ|X} is typically not available, so Eq.(16.15) cannot be applied directly. It is typically more practical to obtain conditional opinions for single Y-variables. In case it can reasonably be assumed that the Y-variables are independent, the naïve Bayes classifier for subjective logic, expressed in Eq.(16.16), can be used.

\omega_{X|\widehat{Y}} \approx \widetilde{\circledcirc}\Big(\prod_{j=1}^{L} \omega_{Y^j|X},\; a_X\Big) \qquad (16.16)
Eq.(16.16) expresses the general naïve Bayes classifier for subjective logic. The product of conditional opinions ω_{Y^j|X} can be computed by normal multinomial or proportional multiplication, described in Chapter 8.
At the time of writing, no software implementations or applications exist for this classifier. When implemented, it will be interesting to see how it performs in fields such as spam filtering, natural language text classification, medical diagnostics, customer profiling and cyber incident classification.
16.2.3 Subjective Intercausal Reasoning
Situations of intercausal reasoning occur frequently. With reference to the lung cancer example, it could for example be that a non-smoker has been diagnosed with lung cancer. It is then possible to determine the likelihood that the person has been exposed to pollution. The derived probability of exposure to pollution is then typically high, and pollution would then be seen as the cause of the cancer.
Alternatively, if the person is a smoker, then the derived probability of exposure to pollution is typically low, and pollution would then not be seen as the cause of the cancer.
Let 𝕏̂ denote a set of K domains, and let X̂ denote the corresponding joint variable, where both are expressed as:

Set of domains:   \widehat{\mathbb{X}} = \{\mathbb{X}^{I}, \mathbb{X}^{II}, \ldots, \mathbb{X}^{K}\}
Joint variable:   \widehat{X} = \{X^{I}, X^{II}, \ldots, X^{K}\} \qquad (16.17)
Let 𝕐̂ denote a set of L domains, and let Ŷ denote the corresponding joint variable, where both are expressed as:

Set of domains:   \widehat{\mathbb{Y}} = \{\mathbb{Y}^{I}, \mathbb{Y}^{II}, \ldots, \mathbb{Y}^{L}\}
Joint variable:   \widehat{Y} = \{Y^{I}, Y^{II}, \ldots, Y^{L}\} \qquad (16.18)
Let ℤ denote a specific consequent domain with variable Z.
Figure 16.7 illustrates the general situation of intercausal reasoning, where the two sets of variables X̂ and Ŷ are causes of the consequent variable Z. It is assumed that there is evidence on the set of variables Ŷ as well as on Z, and that the query targets the set of variables X̂.
Intercausal reasoning takes place in two steps: 1) Abduction, and 2) Division.
More specifically, it is assumed that the analyst has an opinion ω_Z about the consequent variable Z, and that there exists a joint conditional opinion ω_{Z|(X̂,Ŷ)}. With multinomial abduction, it is possible to compute the opinion ω_{(X̂,Ŷ)‖Z}, expressed as:

\omega_{(\widehat{X},\widehat{Y}) \| Z} = \omega_Z \;\widetilde{\circledcirc}\; \big(\omega_{(\widehat{X},\widehat{Y})|Z},\; a_{(\widehat{X},\widehat{Y})}\big) \qquad (16.19)
Fig. 16.7 Intercausal reasoning: the joint cause variables X^I, ..., X^K (query) and Y^I, ..., Y^L (evidence) are connected through the conditional opinions ω_{Z|X̂} and ω_{Z|Ŷ} to the consequent variable Z (evidence)
Having computed the abduced opinion on the joint set of variables (X̂, Ŷ), we can proceed to the second step. Assume that the analyst has an opinion ω_{Ŷ} on the joint variable Ŷ; it is then possible to derive an opinion ω_{X̂} about the joint variable X̂ through multinomial division, as expressed by:

\omega_{\widehat{X}} = \omega_{(\widehat{X},\widehat{Y}) \| Z} \,/\, \omega_{\widehat{Y}} \qquad (16.20)
Multinomial division is described in Chapter 8. There are two alternative division operators. In case the evidence on Ŷ is an absolute (product) opinion, then selective division, described in Section 8.7.2, should be used. In case the evidence on Ŷ is a partially uncertain (product) opinion, then proportional division, described in Section 8.7.1, should be used.
16.3 Subjective Combined Reasoning
The last reasoning category to be described is the so-called combined category, because it combines predictive and diagnostic reasoning.
Let 𝕏̂ denote a set of K domains, and let X̂ denote the corresponding joint variable, where both are expressed as:

Set of domains:   \widehat{\mathbb{X}} = \{\mathbb{X}^{I}, \mathbb{X}^{II}, \ldots, \mathbb{X}^{K}\}
Joint variable:   \widehat{X} = \{X^{I}, X^{II}, \ldots, X^{K}\} \qquad (16.21)
Let ℤ̂ denote a set of M domains, and let Ẑ denote the corresponding joint variable, where both are expressed as:

Set of domains:   \widehat{\mathbb{Z}} = \{\mathbb{Z}^{I}, \mathbb{Z}^{II}, \ldots, \mathbb{Z}^{M}\}
Joint variable:   \widehat{Z} = \{Z^{I}, Z^{II}, \ldots, Z^{M}\} \qquad (16.22)
Let 𝕐 be an intermediate consequent domain with variable Y.
Figure 16.8 illustrates the general situation of combined reasoning, where the two sets of variables X̂ and Ẑ represent the evidence, and the variable Y represents the query variable.
Fig. 16.8 Combined reasoning: the joint cause variables X^I, ..., X^K (evidence) are connected through the conditional opinion ω_{Y|X̂} to the intermediate consequent variable Y (query), which is connected through the conditional opinions ω_{Z^I|Y}, ..., ω_{Z^M|Y} to the consequent variables Z^I, ..., Z^M (evidence)
The situation of Figure 16.8 can be handled by first computing the inverted conditional opinion ω_{X̂|Y}, and subsequently by deriving a naïve Bayes classifier for the variable Y based on both ω_{X̂|Y} and ω_{Ẑ|Y}:

\omega_{Y|(\widehat{X},\widehat{Z})} \approx \widetilde{\circledcirc}\Big(\omega_{\widehat{X}|Y} \cdot \prod_{j=1}^{M} \omega_{Z^j|Y},\; a_Y\Big) \qquad (16.23)
In the example of lung cancer, the GP can thus use all the evidence consisting of
air pollution, smoking, X-ray and dyspnoea to compute an opinion about whether
the person suffers from lung cancer.
16.4 Subjective Networks
The independence properties of Bayesian networks described in Section 16.1.3 are not obvious in the case of subjective logic. This is because the criterion of 'knowing the probability distribution', e.g. of an intermediate variable in a causal chain, is not necessarily satisfied by a subjective opinion on the variable. Consider for example a vacuous opinion on node Y in Figure 16.4.a. It would be an exaggeration to say
that the ‘probability distribution is known’ in that case. It is also possible that different analysts have different opinions about the same variables. Traditional Bayesian
networks are not designed to handle such situations.
Subjective logic opens up possible ways of handling such situations. Figure 16.9 illustrates how trust networks and subjective Bayesian networks can be integrated.
Fig. 16.9 Subjective networks, consisting of trust networks and Bayesian networks: agents A, B, C and D are connected by trust opinions in the subjective trust network part, while the variables X, Y and Z, with opinions such as ω^{A:B}_X, ω^C_Y, ω^D_Y and the conditional opinion ω_{Z|(X,Y)}, form the subjective Bayesian network part
The investigation of theoretical models and practical methods for Bayesian network modelling based on subjective logic, combined with trust networks, opens up
a highly fertile field of research in AI and machine learning.