Integration of a Bayesian Net Solver With the
KBCW Comlink System and a Network Intrusion
Diagnosis System
by
Erwin Tam
Submitted to the Department of Electrical Engineering and
Computer Science
in partial fulfillment of the requirements for the degree of
Master of Engineering in Electrical Engineering and Computer
Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
September 1999
© Erwin Tam, MCMXCIX. All rights reserved.
The author hereby grants to MIT permission to reproduce and
distribute publicly paper and electronic copies of this thesis
document in whole or in part.
Author ........................
Department of Electrical Engineering and Computer Science
August 25, 1999
Certified by ........................
Howard E. Shrobe
Professor
Thesis Supervisor
Accepted by ........................
Arthur C. Smith
Chairman, Department Committee on Graduate Students
Integration of a Bayesian Net Solver With the KBCW
Comlink System and a Network Intrusion Diagnosis System
by
Erwin Tam
Submitted to the Department of Electrical Engineering and Computer Science
on August 25, 1999, in partial fulfillment of the
requirements for the degree of
Master of Engineering in Electrical Engineering and Computer Science
Abstract
This thesis demonstrates the practical application of probabilistic
methods, namely Bayesian nets, to a variety of problems. Uncertainty comes into play
when dealing with real-world applications. Rather than trying to solve these problems
exactly, a probabilistic approach is more feasible and can provide useful results. For
this project, a Bayes net solver was integrated with a network intrusion detector, with
resource stealing as the primary attack of focus. A Bayes net solver was also integrated
into the Knowledge Based Collaboration Webs (KBCW) Comlink system. The results
show that such an integration of different ideas can be both useful and productive,
providing functionality that neither could accomplish alone. The project is a successful
demonstration of the potential benefits that Bayesian nets can provide.
Thesis Supervisor: Howard E. Shrobe
Title: Professor
Acknowledgments
I would like to thank Professor Howard E. Shrobe for his endless patience and support.
Without him, there could have been no thesis. His understanding and consideration
helped me get through tough times both academically and personally. I can honestly
say that at least half of the credit for finishing this thesis should be attributed to
him. My eternal thanks and gratitude go out to him.
I would also like to thank the other members of the KBCW group in the AI Lab
who helped me with my daily questions, always with a friendly ear. Getting work
done and trying to find research ideas was never easy but it was so much better being
able to draw upon the creative ideas and suggestions by everyone in the lab.
Lastly I would like to thank my family and close friends who have kept me sane
enough emotionally to maintain the focus needed to finally finish this thesis. I needed
some help in overcoming this hurdle and for that, I am thankful. Times can be tough,
life can seem gloomy, and the last thing on one's mind is work. Without the love and
support of those around you, things would be nearly impossible to cope with alone.
My love goes out to them. Thanks and God Bless.
Contents

1 Introduction
  1.1 Uncertainty
  1.2 Probabilistic Models
  1.3 Applications and Scope
  1.4 Objectives
    1.4.1 MBT for Network Intrusion
    1.4.2 KBCW Comlink System
  1.5 Roadmap

2 Bayesian Nets
  2.1 History
  2.2 Description
    2.2.1 Simple Example
    2.2.2 Independence Assumptions
    2.2.3 Consistent Probabilities
    2.2.4 Exact Solutions
    2.2.5 Approximate Solutions
  2.3 Advantages
    2.3.1 Computation
    2.3.2 Structure
    2.3.3 Human reasoning
  2.4 Disadvantages
    2.4.1 Scaling
    2.4.2 Probability values
    2.4.3 Conflicting model

3 Model Based Troubleshooting
  3.1 Introduction
  3.2 Basic Task
  3.3 Alternate Approaches
    3.3.1 Diagnostics
    3.3.2 Fault Dictionaries and Diagnostics
    3.3.3 Rule based Systems
    3.3.4 When not to use the model-based approach
  3.4 Three Fundamental Tasks
    3.4.1 Hypothesis Generation
    3.4.2 Hypothesis Testing
    3.4.3 Hypothesis Discrimination
  3.5 Conclusion

4 Design
  4.1 Problem Statement
  4.2 Process modeling
  4.3 Input/Output Black Box modeling
  4.4 Fault modeling
  4.5 Probabilistic Integration

5 Implementation
  5.1 Linear Process
  5.2 Branch/Fan process
  5.3 Branch and Join

6 Conclusion
  6.1 System critique
  6.2 System limitations
    6.2.1 Lack of correctness detection
    6.2.2 Lack of probabilistic links between components
    6.2.3 Lack of descriptive model states
  6.3 Future work
  6.4 Lessons learned

7 KBCW Comlink System
  7.1 Description
  7.2 Integrating a Bayes Net Solver
  7.3 Implementation
    7.3.1 Example
  7.4 Conclusion
    7.4.1 System critique
    7.4.2 Lessons learned
  7.5 Future work
    7.5.1 Sensitivity Analysis
    7.5.2 Multiple Viewpoints

8 Summary

A delay-simulator code

B comlink-ideal code
List of Figures

2-1 A simple causal graph network.
2-2 Fully specified Bayesian network.
4-1 A simple sample component.
4-2 A second example of a component module.
4-3 Completely specified component with probabilistic model included.
5-1 Linear process model.
5-2 Linear process a priori probabilistic model.
5-3 Linear process post-evidential probabilistic model.
5-4 Branch/Fan process model.
5-5 Branch/Fan process probabilistic model.
5-6 Branch and Join model.
5-7 Branch and Join probabilistic model.
7-1 The Burger Problem before any evidence.
7-2 Cheap cream cheese developed.
7-3 Germany goes bankrupt and liquidates pickled cabbage to the world.
7-4 McDonald's survey shows customers value variety the most.
7-5 Medical results say that you need to eat meat.
7-6 Estimated cost of pickled cabbage quadruple burger too high.
7-7 New McDonald's survey also shows that customers love the cheeses.
7-8 Final result putting together all the evidence.
Chapter 1
Introduction
1.1
Uncertainty
Uncertainty is a central motivating force behind many artificial intelligence topics.
How can we make intelligent decisions in the face of limited data? One might say
that learning itself is merely a process of reducing uncertainty in our world. How
can we ever be entirely certain of something? Given that we know a small limited
amount of information, what conclusions or hypotheses can we reason about from
that? How likely are these conclusions to be true? There are many situations where
tough decisions need to be made. They usually involve a subjective process whereby
one person's experience and intuition guide them in the decision making. Ideally,
we'd like to have a more formal process, especially when confronted with a lot of data
or varying opinions from many people. How much do we really believe something to
be true and what effect does our level of belief have on the conclusions we come up
with? To be able to answer this, we need some model of uncertainty that allows a
systematic procedure for chaining together information and coming up with a viable
hypothesis.
Understanding uncertainty and how the human brain deals with it is a very important question. Right now researchers have no idea how to model the way the
human senses work or the way the brain works. We know that the senses send signals
containing information to the brain, but how this data is processed is still a mystery.
Like all things, data is often incomplete, yet the human mind is capable of piecing
things together to form a complete picture. Is it possible that by learning more about
this mysterious process, we can garner practical knowledge that can be readily applied to other fields and problems? The answer appears to be a resounding "yes".
There are so many applications that have to deal with limited data yet still need to
generate some form of useful representation or hypothesis from that. Coming up
with a model that can handle such conditions has such wide ranging applicability.
This is an area of research that is starting to expand. As computer technology gets
faster and better, we should also come up with smarter and better ways to utilize
such computing power. Even if we eventually reach the raw computing power of the
human brain, if we cannot make efficient usage of it, the outcome will be severely
limited.
1.2
Probabilistic Models
One of the most useful models for dealing with uncertainty is the probabilistic model.
Probability theory is a well understood and mature field. When dealing with unknown
quantities and attempting to combine and reason with them, probability theory serves
as an obvious choice to serve this purpose. Probability theory has had a resurgence in
the past decade and is the current most popular tool for analyzing uncertainty. This
was due mainly to the increased computing power and newer models and algorithms
for probability theory. There are other formalisms for reasoning about uncertainty
such as the Dempster-Shafer theory of evidence[10] or the large body of work based
on Zadeh's fuzzy set theory[8]. However, the model of choice that will be used is the
probabilistic model.
Analyzing uncertainty using formal probability theory provides us with a structured and accurate method to handle evidence and partial information. It allows us to
merge together sources of information with imperfect reliability to generate hypotheses, each with their own relative likelihood or probability of being true. Probability
theory is a well understood idea that is proven to be a correct way for dealing with
uncertainty. This is why we decided to use the probabilistic model.
1.3
Applications and Scope
There are three main areas where uncertainty and probability theory can be applied.
They are: diagnosis, data interpretation, and decision making. Diagnosis is a broad
area that covers medical decision making and model based troubleshooting for systems. It is widely used in medicine for making decisions about patient treatment,
interpreting test results, and helping patients understand the rationale behind alternative treatments. This fast growing area is known as medical informatics. Model
based troubleshooting is another form of diagnosis. Here the problem deals with
some system that can fail for a number of reasons. The objective is to diagnose the
condition and reason about possible faults given observations about how the system
is behaving or misbehaving as is usually the case. This form of diagnosis began in
manufacturing where faults in circuits needed to be diagnosed to find out where the
problem lay. Since there is a limited supply of experts who can properly troubleshoot
a given system, automated troubleshooting is a viable and very desirable goal to attain. The range of applications this applies to is immense given the vast amounts of
complex machinery and technology available which often break down.
Data interpretation is widely used by the military when they have to interpret
intelligence data and determine the likelihood of different hypotheses. This is critical
in evaluating the capabilities of the enemy in order to make smarter, more strategic
decisions. Gathering data is often a very costly process both in financial resources
and human resources. As such one would like to find the optimal ratio of value of
information to cost of information. The available data is wide-ranging and often incomplete.
The military strategist needs to be able to piece together all of the information in
order to get an idea what the data is suggesting. Having a systematic approach to
data interpretation would take a lot of the guess work out of this process. In turn
it can help to minimize the effects of human error. This is a common problem as
demonstrated by the lack of military tact sometimes displayed by governments, ours
included.
The final application area deals with collaborative decision making. Oftentimes
when a large group of people is trying to come up with a decision, they need to decide
between various alternatives. There is no systematic process of debating. Moreover,
there is no structured quantitative way of weighing arguments, including how they
might support or deny another statement. Without a systematic way to piece together
arguments and other data, the decision making process can become very disorganized
and non-optimal. The "might makes right" approach whereby the top CEO or head
of the group makes the final decision isn't ideal. None of the ideas or arguments of
the others are taken into account. If it happens that someone has a really good idea
but is very low on the corporate ladder scale, their idea will get squashed by someone
higher up even though it is a superior idea. This is the type of problem we are trying
to solve when using probabilistic methods in collaborative decision making.
1.4
Objectives
The goal of this thesis is to demonstrate the feasibility of applying probabilistic
methods towards solving two applications. The first one deals with model-based
troubleshooting of network intrusion detection. The second deals with collaborative decision making, namely integrating probabilistic functionality into the KBCW
Comlink system. I will talk more about each of these two projects in the following
sections. The overall objective is to integrate probability into different domains to
show that probability allows for very useful functionality. These two domains are just
a small sample of the many wide ranging applications that probabilistic methods can
be incorporated into. The success of these two projects will demonstrate that this is
an important emerging idea whose potential is not yet fully realized.
1.4.1
MBT for Network Intrusion
The first task/application to be dealt with is model based troubleshooting for network intrusion. Network intrusion is a big problem especially with today's type of
distributed systems. Hackers can attack the system in many ways. One difficult problem is resource stealing. This is a hard problem because it is passive. It is difficult to
know if resources are being stolen since nothing really goes wrong. The only visible
sign of a problem is that processes that use the same resource may run a bit slower.
Since the loads on resources vary throughout the day and are not constant, it is difficult to monitor the system and determine definitively whether there is indeed a problem. What we have
done is to model a network of processes and resources and allow timing information
to propagate throughout the system. That is, we model the inputs and outputs of
processes and how they flow between components in the system while making note
of arrival and departure times of the input/output. Together with a prior knowledge
of the range of times it usually takes for a process to complete, we will be able to
monitor and troubleshoot if there seems to be a resource stealing problem. Thus we
will know if the network has been compromised and, more specifically, which
particular resource is likely to be compromised. This is a self-monitoring probabilistic system that is a good starting point for a network traffic monitoring/troubleshooting system. The bulk of the thesis will be focused on this problem.
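The timing check just described can be sketched in a few lines of Python. The process names, normal ranges, and scoring rule below are hypothetical placeholders used purely for illustration; they are not the thesis's actual model:

```python
# Hedged sketch: compare an observed process latency against its
# known-normal completion range and report how suspicious it looks.
# Process names and ranges are invented for illustration.
NORMAL_RANGE = {"db-query": (0.2, 1.5), "render": (0.5, 3.0)}  # seconds

def suspicion(process, observed_latency):
    """Return a 0..1 score for how far past its normal ceiling a run took."""
    lo, hi = NORMAL_RANGE[process]
    if observed_latency <= hi:
        return 0.0  # within the usual range: no evidence of stealing
    # Crude score: fractional overshoot past the normal ceiling, capped at 1.
    return min(1.0, (observed_latency - hi) / hi)

print(suspicion("db-query", 1.0))  # 0.0: in range
print(suspicion("db-query", 3.0))  # 1.0: twice the normal ceiling
```

In the thesis's actual design, scores like these would feed a Bayes net as evidence rather than being thresholded directly.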
1.4.2
KBCW Comlink System
The second task/application deals with the KBCW Comlink System. The Knowledge Based Collaboration Webs Comlink System allows people from all over the net
to be able to communicate in a collaborative forum. There they can voice their
opinions/arguments about topics to be debated upon. Currently the system allows
arguments to be linked together. That is, there is a graphical structure to the way in
which a person's argument can either agree/disagree or support/deny another statement given by another person. This system allows people from different locations to
be able to come to a group decision based on the arguments of each person. However,
one thing lacking currently is a way in which to quantify things. That is, how much
does one statement/argument support another? Do certain claims/hypotheses seem
more likely to be true than others? What we did was to basically integrate a probabilistic system into the Comlink system to provide this functionality. This would
give Comlink the ability to more precisely quantify how likely certain hypotheses are.
In a collaborative decision making process, this would allow the system to be able to
come up with a relative likelihood for all of the competing ideas. It would provide
a systematic way to distinguish between them to see which is more likely to succeed
given all of the arguments given by the group.
1.5
Roadmap
A roadmap of the direction this thesis takes will be given. First the probabilistic
model of choice, Bayesian nets will be described and discussed briefly to provide
the reader with a good basic idea of what Bayesian nets are and what they can do.
Next a quick summary of the field of model based troubleshooting will be given to
allow the reader to obtain a good idea what the issue/problems are in the area of
troubleshooting. This section is included for completeness and is in no way meant to
replace other superior sources on the topic. Next the problem of the network intrusion
troubleshooter will be discussed with an explanation of how things are modeled and
what functionality is available. Several walk through examples of the system will be
given to demonstrate what it can do. A critique of the system and its limitations
will then be discussed. Issues such as future work and important lessons learned
from designing/implementing such a system will be discussed. Next I will talk about
the second project dealing with the integration of a Bayesian net solver with the
KBCW Comlink system. Issues such as design, implementation, critique, limitations,
and future work will be discussed. Finally a summary of the relative success of this
project will be given. Issues about what went wrong and what went well will be
talked about.
Chapter 2
Bayesian Nets
2.1
History
A Bayesian network is a compact, expressive representation of uncertain relationships
among parameters in a domain[3]. It is based on the probability theorem named
after the Reverend Thomas Bayes.
Bayes' Theorem is a very powerful tool for determining
how to use evidence contained in data to determine the likelihood of hypotheses. It
can be shown to be the only coherent way to pass from specific evidence to general
hypotheses. In other words, it is a method for combining evidence which is provably
correct and well understood formally.
Thus it is of great value when combining
evidence and choosing between competing hypotheses.
Bayes theorem allows one to update the probability of a hypothesis given new
information or evidence. Bayes' Theorem is:
P(H|E) = P(E|H)P(H) / P(E)
Here, P(H) is the prior probability of a hypothesis H. P(H|E) is the posterior probability of a hypothesis, given that we observed evidence E. This allows us to constantly
update the probability of a hypothesis in the face of new evidence. Since this is an
example where new information changes our degree of certainty or belief in an event,
it may also be considered a type of learning.
Reducing uncertainty about a topic
is essentially equivalent to learning more about the topic. In Bayes' original paper,
published only after his death, he wanted to figure out how to go from the usual deduction of the probability of a specific result given a general hypothesis, P(E|H), and
turn it around, showing how to pass from a specific evidential result to the probability
of the general case, P(H|E). Bayes was a nonconformist minister and he developed
his theory as a formal means of arguing for the existence of God. The role of evidence
in this argument is taken by the occurrence of miracles and other manifestations of
God's good works. The two hypotheses he was comparing are the affirmation and
denial of God. Bayes wanted to prove formally that it was more likely that God
existed rather than the contrary.
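As a concrete numeric sketch of the update rule (the numbers here are invented for illustration), P(E) can be expanded by the rule of total probability over H and its negation:

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """P(H|E) = P(E|H)P(H) / P(E), with P(E) expanded by total probability."""
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
    return p_e_given_h * prior_h / p_e

# A hypothesis with prior 0.3; the evidence is three times likelier under H.
print(posterior(0.3, 0.6, 0.2))  # 0.5625: observing E raises our belief in H
```

Repeating the call with the posterior as the new prior is exactly the "constant updating in the face of new evidence" described above.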
Probability theory, including Bayes' theorem, is the oldest and best understood
theory for representing and reasoning about situations with uncertainty involved.
However, early experimental efforts in artificial intelligence at applying probability theory were unsuccessful and disappointing. The main complaint against probability theory
was that those who worried about the numbers were missing the
main point: structure was the key, not the numeric details. At that time,
the only way to use probability theory in solving these problems was through the
joint probability distribution, also known as the JPD. For domains described by a set
of discrete parameters, the size of the JPD and the complexity of reasoning with it
directly can both be exponential in the number of parameters[3]. The method was
provably correct but infeasible to use. There was no efficient model that made it simpler or easier to understand or reason about quickly. Moreover, it was computationally
infeasible to use the probabilistic method reasonably. Calculations simply
took an exorbitant amount of time, even for domains that contained only a modest
number of parameters.
An approach to simplify things was the naive Bayes' model. This model assumes
that the probability distribution for each observable parameter depends only on the
root cause and not on the other parameters. This simplification allowed tractable
reasoning and computation. However the model was too extreme and oversimplified.
It did not provide the desired results. Thus given the computing power and probability
models at that time, probabilistic methods were not a feasible way to deal with
uncertainty problems.
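A minimal sketch of the naive Bayes computation, with invented numbers: each observation contributes an independent likelihood factor given the root cause, and the scores are normalized across causes.

```python
# Naive Bayes assumption: observables depend only on the root cause,
# so the posterior factors into a simple product. Numbers are invented.
def naive_bayes_posterior(priors, likelihoods, observations):
    """priors: {cause: P(cause)}; likelihoods: {cause: {obs: P(obs|cause)}}."""
    scores = {}
    for cause, prior in priors.items():
        p = prior
        for obs in observations:
            p *= likelihoods[cause][obs]  # independence given the cause
        scores[cause] = p
    total = sum(scores.values())          # normalize over competing causes
    return {cause: p / total for cause, p in scores.items()}

post = naive_bayes_posterior(
    {"flu": 0.1, "cold": 0.9},
    {"flu": {"fever": 0.9, "cough": 0.8},
     "cold": {"fever": 0.2, "cough": 0.7}},
    ["fever", "cough"])
```

The tractability is clear from the single loop over observations; the oversimplification is equally clear, since no interaction between observables can ever be expressed.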
However, about 5-10 years ago, there was renewed interest in probability, especially decision theory. This was the result of dramatic new developments in computational probability and decision theory which directly addressed the
perceived shortcomings of probability theory. The key idea was the discovery that
a relationship could be established between a well-defined notion of conditional independence in probability theory and the absence of arcs in a directed acyclic graph
(DAG). This relationship made it possible to express much of the structural information in a domain independently of the detailed numeric information, in a way that
both simplifies knowledge acquisition and reduces the computational complexity of
reasoning. The resulting graphical models have come to be known as Bayesian networks. This directly addressed the two major shortcomings of probability theory.
First that it was structure that was important and not just the numeric details in the
domain. Second that it was intractable to reason about and compute with probability models. The current work is in developing better algorithms to speed up belief
updating in Bayesian nets. This is currently a very active field and has helped to
grow the importance of Bayesian nets in the Al community as well as in industry.
The Bayesian network formalism is the single development most responsible for
progress in building practical systems capable of handling uncertain information[6].
The first book on Bayesian networks was published by Pearl in 1988[9]. The reader
is referred to this text as it is regarded as the authority in Bayesian nets. A Bayesian
network is a directed acyclic graph that represents a probability distribution. Nodes
represent random variables, and arcs represent probabilistic correlation between the
variables. The presence or absence of paths between variables indicates probabilistic independence.
Quantitative probability information is specified in the form of
conditional probability tables. For each node, the table specifies the probability of
each possible state of the node given each possible combination of states of its parent nodes. The tables for root nodes just contain unconditional probabilities. This
formalism is very intuitive to reason with. It has been claimed that the human mind
works in very similar ways when reasoning about uncertainty. Since this model is
theorized to be similar to how we think, it is easy and intuitive to use.
The important feature of Bayesian networks is the fact that they provide a method
for decomposing a probability distribution into a set of local distributions. The independence semantics associated with the network topology specifies how to combine
these local distributions to obtain the complete joint-probability distribution over all
the random variables represented by the nodes in the network. The only probability
values that need to be specified for each node are the conditional probabilities of the
node being in each of its possible states given all combinations of its parents'
states. The structure of Bayesian networks allows three important features.
First, naively specifying a joint probability distribution with a table requires
a number of values exponential in the number of variables. However, if the graph is
sparse, the number of values needed is drastically reduced. As the network gets larger,
the savings become very substantial. Second, there are efficient inference algorithms
which work by transmitting information between the local distributions rather than
working with the full joint distribution. In essence, the algorithms divide the graph
into several smaller pieces, solve each piece, and combine the results to
get the final result. Much quicker computation can come about by using such optimizing strategies. Third, the separation of the qualitative structure of the domain
and variables with the quantitative specification of the relative strengths of influence
between variables is extremely beneficial. This breaks the problem of modeling a
domain into two distinct stages. The process makes the knowledge engineering task
much easier and more tractable. The first step is in coming up with the qualitative
structure of the graph to see how the variables in the domain influence each other.
The second step is in coming up with the numbers and quantifying the strengths of
these relationships. Once both steps are complete, we are left with an intuitive,
easy-to-use graphical model that completely specifies the joint probability
distribution.
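The first of these savings can be made concrete with a quick parameter count. In this sketch (the topology is invented), each binary node stores one probability per configuration of its parents:

```python
# Compare the parameter count of a full joint table over n binary
# variables against the factored Bayesian-network form.
def jpd_size(n):
    """Free parameters in a full joint table over n binary variables."""
    return 2 ** n - 1

def bn_size(parent_counts):
    """Parameters in the factored form: one per parent configuration,
    for each binary node (parent_counts[i] = number of parents of node i)."""
    return sum(2 ** k for k in parent_counts)

# An invented sparse 10-node network where no node has more than 2 parents:
parents = [0, 1, 1, 2, 2, 1, 2, 1, 2, 2]
print(jpd_size(10))      # 1023 values for the full table
print(bn_size(parents))  # 29 values for the factored network
```

Even at 10 variables the factored form needs roughly 3% of the full table, and the gap widens exponentially as nodes are added while the maximum in-degree stays bounded.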
2.2
2.2.1
Description
Simple Example
Now that we have discussed in theory what a Bayesian net is and why it is
so useful, we will go through a simple example to demonstrate how a Bayesian net
works. The best way to learn about Bayesian nets is to see an example of one. Note
that the example discussed below is borrowed from Charniak's paper[1] as it is a good
simple example of a Bayesian net. The following will briefly summarize some of the
issues and ideas discussed in his paper.
Bayesian nets are best at modeling situations where causality plays a role but
where our understanding of what is actually going on is incomplete. Therefore we
need to describe things probabilistically.
Suppose that when I go home, I will only take out my keys to open the door if
my family is out. Otherwise if I know that they're in I would just ring the doorbell.
In my house there is a light that my family turns on whenever they leave the house.
However, the light is also turned on if my family is expecting guests. This doesn't
happen all the time though. I also have a dog at home and he is outside in the yard
sometimes. Whenever nobody is at home, the dog is put in the backyard. The same
thing also happens if the dog has bowel-problems. Nobody at home wants the dog
in the house if that is the case. If my dog is outside in the backyard, he will bark
from time to time. There is a chance that I will hear him bark when I get home.
This is the situation when I get home. Since I am lazy and don't want to expend
the effort of taking out my keys if I don't have to, I'd like to be able to figure out
if my family is home or not. The two observations I can make are to see if the light
is on and if I hear the dog barking or not. I can't see if the dog is actually outside
in the backyard since it is in the back of the house. I could walk back there and
check though that would defeat the purpose of conserving my energy. So based on
my two observations, what conclusion can I draw about whether or not my family is
home? This situation is depicted in figure 2-1.

Figure 2-1: A simple causal graph network.

The nodes in the graph signify random
variables which can be thought of as states of affairs. Each of the variables can have
a multitude of possible values. In our case, they are binary, either true or false. In
the more general case, each node or random variable can be N-ary, having N discrete
states. Bayesian nets also extend to non-discrete or continuous states.
This is an interesting area of research, though for most practical everyday cases it
isn't as good a model as discrete states.
The directed arcs in the graph signify causal relationships between variables. For
example if my family is out, then it has a causal effect on the light being turned on.
Similarly if my dog has bowel problems, then that directly affects the likelihood of
the dog being put out in the backyard. The important thing to note is that the causal
connections are not absolute. If the dog is out in the backyard, that doesn't mean that
I will definitely hear him bark. Sometimes he might be sleeping in the backyard or is
just being a good quiet dog and isn't making a ruckus. In any case, this is the first
stage of using Bayesian nets to model a real world problem with uncertainty. Note
that no numbers have been specified yet, though much information can already be
obtained from the qualitative structure of the model. The arcs in the Bayesian network
specify the independence assumptions that must hold between the random variables.
Nodes that are not connected by arcs are conditionally independent of each other.
For example, suppose that I have observed that my dog is out in the backyard. I
went around back and checked and the dog was there. Now that I know for sure that
the dog is out in the backyard, it makes no difference whether he has bowel problems
or the family is out when determining if I will hear him bark or not. These variables
are conditionally independent.
The next step involves specifying the probability distribution of the Bayesian
network. In order to do this, one must first give the prior probabilities of all root
nodes, nodes with no predecessors and the conditional probabilities of all the other
nodes given all possible combinations of their direct predecessors or parent nodes.
Figure 2-2 shows a completely specified Bayesian network. Now that the model is
completed, we can deal with evidence. For example, let's say that I observe the light
to be on and I don't hear the dog barking. These nodes are then set to be in a specific
state with definite probability. I can calculate the conditional probability of
family-out given these pieces of evidence. This is known as evaluating the Bayesian
network given the evidence. As more evidence comes in, the belief in each node
changes. It is important to note that it isn't the prior probabilities of the nodes
that are changing; what changes is the conditional probability of each node given the
accumulated evidence. In this case, belief is defined as the conditional probability
given the evidence.
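To make the evaluation step concrete, the following sketch computes the belief in family-out by brute-force enumeration of the joint distribution, using the numbers from Figure 2-2. The short names fo, bp, lo, do, and hb abbreviate family-out, bowel-problem, light-on, dog-out, and hear-bark; the code is purely illustrative and is not part of the systems described later in this thesis.

```python
from itertools import product

# Conditional probability tables taken from Figure 2-2.
# fo = family-out, bp = bowel-problem, lo = light-on,
# do = dog-out, hb = hear-bark; every variable is True/False.
P_DO = {(True, True): 0.99, (True, False): 0.90,
        (False, True): 0.97, (False, False): 0.30}

def joint(fo, bp, lo, do, hb):
    """P(fo, bp, lo, do, hb), factored along the arcs of the graph."""
    p = 0.15 if fo else 0.85                                       # P(fo)
    p *= 0.01 if bp else 0.99                                      # P(bp)
    p *= (0.60 if fo else 0.05) if lo else (0.40 if fo else 0.95)  # P(lo | fo)
    p *= P_DO[fo, bp] if do else 1 - P_DO[fo, bp]                  # P(do | fo, bp)
    p *= (0.70 if do else 0.01) if hb else (0.30 if do else 0.99)  # P(hb | do)
    return p

def belief_family_out(lo, hb):
    """P(fo | lo, hb): sum the joint over the unobserved variables."""
    totals = {}
    for fo in (True, False):
        totals[fo] = sum(joint(fo, bp, lo, do, hb)
                         for bp, do in product((True, False), repeat=2))
    return totals[True] / (totals[True] + totals[False])

# Light on, no barking heard: the evidence raises the prior
# belief P(fo) = 0.15 to roughly 0.50.
print(round(belief_family_out(lo=True, hb=False), 3))
```

Note how the two observations pull in opposite directions: the light being on supports family-out, while the absence of barking weakens it.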
2.2.2 Independence Assumptions
One important feature of Bayesian nets is the implied independence assumptions.
This feature saves a lot on computation for sparse graphs. One objection to the use
of probability theory is that the complete specification of a probability distribution
requires an exponential number of values. If there are n binary random variables,
the complete distribution is specified by 2^n - 1 joint probabilities. Thus the complete
distribution for figure 2-2 would require 31 values, yet we only needed to specify 10.
If we doubled the size of our example by grafting a copy of the graph onto the existing
graph, the number of values needed to specify the joint probability distribution would
be 2^10 - 1, which is 1023, but we would only need to give it 21 with the Bayesian net
formalism. The savings we get as opposed to the brute force probabilistic method
P(fo) = .15                 P(bp) = .01
P(lo | fo) = .6             P(lo | !fo) = .05
P(do | fo, bp) = .99        P(do | fo, !bp) = .90
P(do | !fo, bp) = .97       P(do | !fo, !bp) = .3
P(hb | do) = .7             P(hb | !do) = .01

Figure 2-2: Fully specified Bayesian network.
comes from the independence assumptions implied in the graph. See Charniak[1] or
Pearl[9] for a mathematically precise definition of dependence and independence in
Bayesian networks.
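The parameter counts above are easy to verify: each binary node needs one number per combination of its parents' values, i.e. 2^k numbers for a node with k parents. The following small sketch reads the node names and parent lists off the graph of Figure 2-1 (the dictionary encoding is an assumption made for illustration):

```python
# Parents of each node in the Figure 2-1 graph.
parents = {
    "family-out": [], "bowel-problem": [],
    "light-on": ["family-out"],
    "dog-out": ["family-out", "bowel-problem"],
    "hear-bark": ["dog-out"],
}

# A binary node with k parents needs 2**k conditional probabilities.
bayes_net_params = sum(2 ** len(ps) for ps in parents.values())

# The full joint over n binary variables needs 2**n - 1 numbers.
full_joint_params = 2 ** len(parents) - 1

print(bayes_net_params, full_joint_params)   # 10 versus 31
```

The gap widens rapidly as nodes are added, which is exactly the savings discussed above.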
2.2.3 Consistent Probabilities
One problem with a naive probabilistic scheme is inconsistent probabilities: the
individual conditional probabilities may each seem legitimate, yet when combined
they can yield probabilities which are not consistent, i.e. probabilities greater
than 1. There is no such problem with a Bayesian network. Bayesian networks provide
consistent probabilities and are provably equivalent to a full joint probability
distribution. The numbers specified by the Bayesian network formalism define a single
unique joint distribution. Furthermore, if the numbers for each local distribution are
consistent, then the global distribution is consistent. A short proof of this claim is
found in Charniak[1] or Pearl[9].
2.2.4 Exact Solutions
The basic computation on a belief network is the computation of every node's belief
given the evidence that has been observed so far. This updating process is also known
as belief propagation. One of the biggest constraints on the use of Bayesian networks is
that in general, this computation is NP-hard[2]. The exponential time limitation shows
up in many real world Bayesian net models: many real-world problems we would like to
solve take an unacceptable amount of time to evaluate. Since solving an arbitrary
Bayesian network exactly is NP-hard, it is very unlikely that a single algorithm will
work equally well for all cases. The algorithms for solving a Bayesian network employ
one of two strategies. The first is to factor the joint probability distribution based
on the independencies in the graph. The second method is to partition the graph
into several smaller parts and solve each separate part individually before combining
the results together to get the final answer. It can be shown that these two methods
are identical to each other. Some algorithms are very good for solving certain classes
of graphs while they are terrible for other types. One solution might be to have a
library of Bayesian network solver algorithms and be able to identify which algorithm
would work best given the particular problem. This would be a good way around the
NP-hard issue. This idea hasn't become widespread, most likely due to the cost of
implementing several algorithms instead of just one.
The exact algorithms for solving Bayesian networks work well on a restricted
class of networks, which can be solved efficiently in time linear in the number of
nodes. The class is that of singly connected networks. A singly connected network,
also known as a polytree, is one in which the underlying undirected graph has no more
than one path between any two nodes. There are techniques that can also transform
a multiply connected network into a singly connected one. There are a few ways to
do this, but the most common are variations on a technique called clustering. In
clustering, one combines nodes until the resulting graph is singly connected. There
are well understood techniques for producing the necessary local probabilities for the
clustered network. Once the network has been converted into a singly connected one,
the previous algorithms can be applied.
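Whether a network is singly connected is easy to test: a network is a polytree exactly when its underlying undirected graph contains no cycle, i.e. no pair of nodes has two distinct paths between them. The following sketch uses union-find for the check; the node names follow Figure 2-1, and the code is an illustrative check rather than one of the solution algorithms discussed above.

```python
def is_singly_connected(nodes, arcs):
    """True iff the underlying undirected graph of (nodes, arcs) has no
    cycle: union-find detects a second path between any pair of nodes."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]   # path halving
            n = parent[n]
        return n

    for a, b in arcs:
        ra, rb = find(a), find(b)
        if ra == rb:            # a and b already connected: extra path found
            return False
        parent[ra] = rb
    return True

# Figure 2-1 arcs: fo -> do, bp -> do, fo -> lo, do -> hb (a polytree).
nodes = ["fo", "bp", "lo", "do", "hb"]
arcs = [("fo", "do"), ("bp", "do"), ("fo", "lo"), ("do", "hb")]
print(is_singly_connected(nodes, arcs))                     # True

# An extra arc bp -> lo would create the undirected cycle fo-do-bp-lo-fo.
print(is_singly_connected(nodes, arcs + [("bp", "lo")]))    # False
```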
2.2.5 Approximate Solutions
There are times when the Bayesian network is so large that no exact algorithm
is capable of solving it in an acceptable amount of time. As is the case when trying to
solve NP-hard problems, one can opt for an approximate answer: an answer
that is not exact but whose error depends on how many iterations of the algorithm
are made. There are many approximation algorithms
available, but the common approach they take is sampling. The basic approach is to
start at the root nodes and choose a state for each based on its prior probabilities.
Next assume that those nodes are in those states and progress on to the children.
Again choose a state for the child node to be in based on its conditional probabilities
with the assumption that the parent nodes are in the states specified by the previous
iteration. Once you've gone through the entire network, record the value of the
node that you are trying to query the probability of given the evidence. Repeat
this operation several times, and the distribution you record should approach
the exact distribution you would have obtained by solving the network exactly.
The more iterations you
do the closer your solution will be. However, there are a couple of problems. The
first is that sometimes the solution takes a while to converge, i.e. it'll take a lot
of iterations before the answer you get approaches the exact answer. Secondly, and
this is related to the first problem, depending on where you start, you might get
stuck at a local maximum/minimum. The solution you get could be quite different from
the actual answer, yet your answer wouldn't change for a while through several
iterations simply because you are at a local maximum/minimum. Regardless, since an
exact solution is NP-hard, an approximation algorithm that works well for all kinds
of Bayesian networks is a more realistic hope. With the ever growing level of
computing power, this might be the most feasible approach towards solving Bayesian
networks.
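The sampling scheme just described can be sketched as follows for the family-out network of Figure 2-2. This particular variant simply discards samples that disagree with the evidence (rejection sampling), which is one straightforward way to handle observed nodes; it is illustrative only, and it converges slowly when the evidence itself is unlikely.

```python
import random

# Same network as Figure 2-2; P_DO maps (fo, bp) to P(do | fo, bp).
P_DO = {(True, True): 0.99, (True, False): 0.90,
        (False, True): 0.97, (False, False): 0.30}

def sample_net(rng):
    """Draw one full assignment: roots first, then their children."""
    fo = rng.random() < 0.15
    bp = rng.random() < 0.01
    lo = rng.random() < (0.60 if fo else 0.05)
    do = rng.random() < P_DO[fo, bp]
    hb = rng.random() < (0.70 if do else 0.01)
    return fo, lo, hb

def estimate_family_out(n_samples=200_000, seed=0):
    """Estimate P(fo | lo=True, hb=False): samples that disagree with
    the evidence are simply thrown away (rejection sampling)."""
    rng = random.Random(seed)
    hits = kept = 0
    for _ in range(n_samples):
        fo, lo, hb = sample_net(rng)
        if lo and not hb:          # keep only samples matching the evidence
            kept += 1
            hits += fo
    return hits / kept

# More samples give an estimate closer to the exact answer.
print(round(estimate_family_out(), 2))
```

Only about 7% of the samples match this evidence, so most of the work is wasted; this is exactly the slow-convergence problem described above.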
2.3 Advantages

2.3.1 Computation
There are a few advantages of Bayesian networks that make them very attractive
for solving many AI problems. The first is computation. Given the independence
assumptions in the formalism, computation can be sped up greatly compared to
computing the joint probability distribution the brute force way. Sparse singly
connected networks can be solved very efficiently, even with a large number of
variables and parameters.
2.3.2 Structure
The focus of Bayesian networks is more on structure than on numbers. Reliance on
raw numbers was one of the key arguments against using probability theory in
decision making and other uncertainty problems. With a Bayesian network one can see
visually how ideas are linked together. A good model that is accepted by all the
parties involved can be created first. This modeling step is the more intuitive one,
and it is easier to come to a group agreement on what the proper model of the
problem should be.
2.3.3 Human reasoning
Bayes nets are theorized to be similar to how humans think. The graph model, where
new ideas can be linked in quite easily, is a good formalism for human thinking. One
of the reasons for the success of Bayesian networks is that they are easy to reason
with. Working directly with the numbers of a joint probability distribution is very
hard to reason about, whereas the Bayesian formalism has proven intuitive. How
probability propagates as the result of evidence can be depicted visually, lending
more credence to the results generated.
2.4 Disadvantages

2.4.1 Scaling
There are still problems with using Bayesian networks. They are not perfect and do
not work equally well for all situations. The general problem is still NP-hard to
solve exactly, which does not scale well: larger models with more variables are more
likely to be too difficult to solve in a reasonable amount of time. NP-hard problems
are very unlikely to ever be solved efficiently, a question that has occupied
algorithm theorists for many years.
2.4.2 Probability values
Another problem is that one still needs to come up with subjective values for the
conditional probabilities. Even though the structure can be decided upon independently, probability values still need to be given to fully specify the model. Now the
question of where these numbers come from arises. People will very likely argue over
the exact values. One fear is that by simply adjusting the numbers, one can arrive
at whatever result one desires.
2.4.3 Conflicting model
Model generation is still subjective. There can be conflicting opinions about causality.
Not everyone will come up with the same exact Bayesian network to model a particular
problem. The issue then becomes which model is more correct. This is a difficult
problem to address as there is no formal way to quantify how correct a particular
model is. Once again this comes down to subjective viewpoints and the fear becomes
that a not-so-correct model is chosen instead of a more accurate one.
Chapter 3
Model Based Troubleshooting
We will now briefly go over the important points and issues that arise when discussing
model based troubleshooting. This chapter is just a summary of the excellent article
written by Davis and Hamscher[4] in chapter 8 of the 11th International Joint Conference on AI. It is by no means meant to be a substitute for that. For a more complete
description of model based reasoning and its current state, consult the aforementioned
reference. This chapter is included for completeness to provide the reader with a fair
understanding of the ideas involved in model based reasoning and troubleshooting.
3.1 Introduction
An oft-occurring problem that plagues all of us from time to time is that something
stops working. We would like to know why it stopped working and to figure out
how to fix it. A good first step is to understand how it was supposed to work in the
first place. That is the main idea behind model based reasoning. The rest of this
chapter will discuss the nature of the troubleshooting task, exploring what is given
and what the desired result is. Models of the structure and behavior of the
system in question are very useful for diagnosis and reasoning. Most of this chapter
will talk about how to use these representations to do model based diagnosis. The
basic procedure is to examine the interaction between prediction and observation:
we predict what should happen and observe what actually happens. Thus when there is
a contradiction between the two, we attempt to solve the problem. This is broken
down into three fundamental subproblems: generating hypotheses by reasoning from
the symptoms to components that may possibly be at fault, testing each hypothesis
to see if it is consistent with all the observations, and discriminating among all the
valid hypotheses to see which one is the most likely. What we will find is that there
are well known methods for model based diagnosis once a tractable model for the
problem has been given. However, the harder problem to solve is to figure out how
to come up with a good model. This is an open research topic and presents many
problems.
3.2 Basic Task
As stated earlier, the basic paradigm of model based reasoning is to analyze the
interaction between observation and prediction. Typically there is a physical device or
software that operates in an expected normal manner. However when the observation
or how the device/system is actually operating differs from that which is predicted,
there is a discrepancy. One fundamental assumption is that if the model is assumed
to be correct, then a discrepancy must mean that there is a defect somewhere in
the system. The type of faults and location of the faults that occur are clues that
provide some information about where the defect in the system might occur. This
raises some issues though since the assumption might not always be true. In any
case, given a model, the basic task is to determine which of the components in the
system could have failed in such a way as to account for the discrepancies observed.
The model contains information about the structure and correct behavior of the
components in the system. This information is used to reason with. This approach
to troubleshooting has been called a multitude of names such as model based and
reasoning from first principles. This is because the method is based on a few basic
principles about causality and "deep reasoning".
3.3 Alternate Approaches
There are several other approaches to troubleshooting besides the model based
method. They each have their own strengths and weaknesses. They will be discussed
below.
3.3.1 Diagnostics
Diagnostics involve running test programs on devices/systems after they have been
manufactured/created to ensure that the system is capable of doing everything it is
supposed to do. The problem is that this approach is not diagnosis but verification.
The tests confirm that the system behaves in expected ways; there is no misbehavior
to diagnose since none has come up yet. Model based diagnosis, on the other hand, is
diagnostic because it is symptom directed. Whenever a fault occurs, the observed
symptom is analyzed and used to work backwards to the underlying components that
might be faulty. It is more efficient to work backwards from faults that have
already occurred than to try to enumerate all the possible faults in advance.
3.3.2 Fault Dictionaries and Diagnostics
Similar to diagnostics is the idea of fault dictionaries. Here the fault dictionary is
built by using simulation and a list of the kind of faults anticipated. Once a test
has been simulated, the resulting symptoms/faults are recorded. The list is then
inverted so that one can go backwards from symptoms to faults to find the reason for
failure. This is not broad enough since the only possible symptoms it can recognize
are those that come from the prespecified faults at the time of the fault dictionary's
creation. If a new fault occurs which the designers had not anticipated, the dictionary
becomes useless and is unable to correctly diagnose the problem. Using fault models
like this is useful if the library of faults is very broad since there is a high degree of
specificity to the diagnosis. However it is difficult to be certain that the fault models
are comprehensive enough.
3.3.3 Rule based Systems
Rule based systems are built upon the knowledge of experts who know the potential
problems that may arise and what the symptoms may be. The problem is that it may
take a while before there is enough expert knowledge to be able to efficiently diagnose
problems. This is important in systems today since the design cycle is so short. There
is no time to be an expert on a system because by the time you are proficient and
knowledgeable about it, the product becomes obsolete. This approach is also very
device dependent. The knowledge and diagnostic methods used are only applicable
to that particular device or system. In contrast, the model based approach is strongly
device independent. It reasons from first principles and just needs to know
the basic structure/behavior of the system and its components. This information is
often supplied by the description used to build the device in the first place. The
model based approach is also more methodical and comprehensive. It is less likely to
miss something as opposed to the rule based approach which relies on a subjective
expert's knowledge. Finally rule based systems offer very little help in thinking about
or representing structure and behavior. It does not use that for diagnosis and does
not lead us to think in such terms. This makes the diagnosis harder to understand
or follow for those who are not experts about the system.
3.3.4 When not to use the model-based approach
The model based approach does offer significant advantages compared to other
troubleshooting approaches. However, it is not the best approach to use in all
situations. If the system to be modeled is too complex, the model based approach is
unsuccessful: there are too many unknown variables that just aren't modeled, and it
would be too complicated to include all of the information needed to correctly
predict and understand the behavior of the system. Conversely, if the system is very
simple, the model based approach isn't optimal. For simple systems, we can model
the behavior completely and exhaustively. The faults considered are well known and
can be enumerated beforehand reliably. Thus a fault model approach such as a fault
dictionary would be the optimal approach here.
3.4 Three Fundamental Tasks
The task of model based diagnosis can be broken down into three fundamental tasks.
Once a fault or discrepancy is observed, a set of hypotheses must be generated to try
to explain what went wrong. Each of these hypotheses must be tested to see if it is
consistent with the discrepancies observed. Finally, all of the consistent hypotheses
must be discriminated between to find the best, most likely answer.
3.4.1 Hypothesis Generation
A hypothesis generator should typically have three desired qualities. A good generator
should be complete: it should be able to produce all plausible hypotheses. It should be
non-redundant: only unique hypotheses should be generated. It should be informed:
only a small fraction of the hypotheses generated should be proven incorrect after
the testing process. We assume that the device/system in question is modeled as
a collection of several interacting components each with inputs and outputs. We
also postulate that there is a stream of data that flows through the system from one
component to another with each component processing the inputted data in some way.
The first simplification is that we only need to consider components that are upstream
of the discrepancy as suspects for faultiness. Another idea is that not every input
to a component influences the output; there is thus no need to follow irrelevant
inputs upstream, for the same reason that we do not follow components downstream.
If there is more than one discrepancy, we can generate a set of suspect components
for each and intersect them. This may further reduce the number of suspect
components to test.
Hypothesis generation thus becomes a process of following the paths backwards from
the discrepancies.
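As an illustration, hypothesis generation over a toy component graph can be sketched as a backwards traversal from each discrepancy, with the suspect sets intersected when there are several discrepancies. The graph and the component names here are invented for the example.

```python
# Toy component graph: edges point downstream (direction of data flow).
# A discrepancy at a component implicates it and everything upstream.
DOWNSTREAM = {
    "A": ["C"], "B": ["C"], "C": ["D", "E"], "D": [], "E": [],
}

def upstream_suspects(graph, discrepancy):
    """All components that could account for a discrepancy: the component
    where it was observed plus everything feeding into it, transitively."""
    # Invert the downstream edges once.
    feeds = {c: set() for c in graph}
    for src, dsts in graph.items():
        for dst in dsts:
            feeds[dst].add(src)
    # Walk backwards from the discrepancy.
    suspects, frontier = set(), [discrepancy]
    while frontier:
        c = frontier.pop()
        if c not in suspects:
            suspects.add(c)
            frontier.extend(feeds[c])
    return suspects

# Two discrepancies: intersect the suspect sets to narrow the search.
common = upstream_suspects(DOWNSTREAM, "D") & upstream_suspects(DOWNSTREAM, "E")
print(sorted(common))   # ['A', 'B', 'C'] — the shared upstream components
```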
3.4.2 Hypothesis Testing
The second fundamental task of model based diagnosis is to test each of the potential
hypotheses generated and see if it can account for all the observations made. One
simple method is to enumerate all the ways each component in the device can malfunction, then simulate the behavior of the entire device on the original set of inputs
under the assumption that the suspect component is malfunctioning in the specified
way. If the simulated results match the observed results, then that hypothesis is
consistent with the observations and is retained, else it is discarded. The problem
with this is that one must have a complete description of the way in which every
single component can misbehave, otherwise the simulation is not accurate. A more
advanced technique is to use constraint suspension. The basic idea behind this technique is to model the behavior of each component as a set of constraints, and test
suspects by determining whether it is consistent to believe that only the suspect is
malfunctioning. Thus given the known inputs and observed outputs, is it consistent
to believe that all components other than the suspect are working properly? The
traditional method of handling inconsistencies in a constraint network is to find a
value to retract. In the hypothesis testing case, though, we want to consider which
constraint rather than which value to retract in order to remove the inconsistency.
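A toy sketch of constraint suspension: each component's behavior is written as a constraint, and a suspect survives if some value of the internal line makes all the other constraints consistent with the observed inputs and outputs. The device, component names, and small search domain are invented for the example.

```python
# Toy device: one internal line m feeds two downstream components.
#   x -> [double] -> m -> [add_three] -> y1
#                    m -> [negate]    -> y2
def holds(x, m, y1, y2, suspended):
    """True if every component constraint except the suspended one holds."""
    checks = {
        "double":    m == 2 * x,
        "add_three": y1 == m + 3,
        "negate":    y2 == -m,
    }
    return all(ok for name, ok in checks.items() if name != suspended)

def viable_suspects(x, y1, y2):
    """A suspect survives if some value of the internal line m satisfies
    all the other constraints (constraint suspension). The toy domain
    for m is simply enumerated."""
    components = ("double", "add_three", "negate")
    return [s for s in components
            if any(holds(x, m, y1, y2, s) for m in range(-100, 101))]

# With x=5 a healthy device gives m=10, y1=13, y2=-10.
# Observing y1=20 while y2=-10 exonerates 'double' and 'negate':
# no single value of m can explain both outputs unless 'add_three'
# is the component allowed to misbehave.
print(viable_suspects(x=5, y1=20, y2=-10))   # ['add_three']
```

Note that the correct output y2 is what discriminates: it pins the internal line m to 10, contradicting any hypothesis other than a fault in add_three.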
3.4.3 Hypothesis Discrimination
Now that we have a set of hypotheses that all satisfy the observed discrepancies, we
must have a method to choose or discriminate among them. There are a couple of
approaches and each will be discussed briefly in the following sections. The first
method involves variations on probing while the second involves testing.
Probing
Probing involves running the system again with the same inputs, but this time
gathering data that was not present before by probing values within the system itself.
With this new data, not all of the hypotheses will remain consistent and some will have
to be discarded.
Using Structure and Behavior
Probing at random locations is not optimal. A smarter approach would
be to use the structure and behavior of the system to choose locations where the
information probed would be more discriminatory towards the possible hypotheses.
By choosing locations which are upstream of the discrepancy, we can improve our
chances of finding out more useful information which can further discriminate among
the hypotheses.
Using Failure Probabilities
When probing for locations which are more informative than others, it may be the
case that there are several locations which are equally informative. It would be easy,
though more costly, to probe all of these locations, but if we can only probe once
or a small number of times, we'd want to pick the best one. With the use of failure
probabilities, we know which components are more likely to fail, so it is better to
probe those equally informative locations which are near the more failure-prone
components. This would further improve the chances of finding useful definitive
information for hypothesis discrimination.
Testing
The second basic technique for hypothesis discrimination is testing. Here we select
new inputs and once again observe the outputs. The set of possible hypotheses thus
must also satisfy the observations given these new inputs and observed outputs. This
can be repeated as many times as allowable to continually trim down the set of
possible hypotheses.
Cost Considerations
One consideration when choosing between the various techniques described is cost.
Not all techniques are equal cost-wise. For example, using an optimal probe is more
accurate, but it might be very costly to find the optimal probe. It might have been
cheaper to use non-optimal probes multiple times. Similarly, testing is a good
approach for hypothesis discrimination. However, it might be too costly or even
impossible to run a new set of inputs on the system. This is a real world constraint
that must be taken into account when designing a model based troubleshooting system.
3.5 Conclusion
In summary, model based troubleshooting is based on the interaction between
observation and prediction. It is symptom directed and reasons backwards from first
principles given a good model of the system. Model based troubleshooting is device
and domain independent; the ideas can be extended equally to other unrelated fields.
Three fundamental tasks comprise the process: hypothesis generation, hypothesis
testing, and hypothesis discrimination. There are many well understood techniques
for reasoning about a model to diagnose a fault. The harder problem is coming up
with a good model. There is an inherent tradeoff between completeness and
complexity: a complete model takes into account every single minor detail of the
system, yet such a large model can often be too complex and might contain much
information that is not useful for troubleshooting. These are the problems that many
researchers are currently striving to find solutions to.
Chapter 4
Design
We have discussed the viability of using Bayesian nets in a variety of AI applications
such as diagnosis, data interpretation, and troubleshooting. Bayesian nets are a
powerful tool that can be integrated into many existing systems to provide additional
functionality which can prove to be extremely useful. Model based troubleshooting
has also been discussed. This is a very practical area with broad applications.
This chapter examines the fusion of these two powerful ideas and the results.
4.1 Problem Statement
Why is network intrusion a problem? As computer networks get larger and larger,
the level of coordination needed to organize such a structure increases dramatically.
Computer networks have grown at a rapid pace with no signs of slowing down.
Unfortunately, as computer networks have grown, so too has the art of computer
hacking. Keeping a network secure from outside intruders is very important. In
sensitive applications, such as those involving company trade secrets or military
knowledge, it is of the utmost priority to keep such information secure. The current
trend is toward a large network of distributed systems sharing a pool of common
resources. A distributed system is inherently harder to protect against hackers than
a single supercomputer mainframe. There are more areas to attack, either blatantly
or discreetly.
In order to design a system that is resistant to such attacks, one would like to be
able to assume a framework of absolute trust requiring a provably impenetrable and
incorruptible trusted computing base[7]. This is not a reasonable or realistic task to
accomplish, so the question becomes: how do we perform computations in the face
of unreliable resources? How can we model such a system effectively? The problem
becomes very complex as a result of the dynamic nature of networks, distributed
computations, the lack of monitoring on all desired inputs/outputs from processes, etc.
Additional complexities arise from the fact that not all hacks are obvious. There are
some that are more passive in nature, e.g. resource stealing. There are also different
levels of "hackedness". For example, if a hacker sniffs a password for a user of a server
but that user doesn't have any root access, the hacker is very limited in what he/she
can do to harm the server. However, if a hacker was able to gain root access, that
server has been totally compromised and can no longer be trusted at all. Additionally,
different computations can have varying levels of sensitivity. For example if a user
wanted to send a file to be printed out to a printer and the file was just a scanned
image of his dog, that is a very low sensitivity process. However, if a military general
was sending an email to his captains to give them orders about what targets to strike,
that would be an extremely sensitive process. That must be taken into account when
assessing the risk of doing such a computation on a resource that is not totally trusted.
As we can see, this is a very difficult problem that must be addressed nonetheless.
There are many security issues to attempt to solve but in this project, the focus will
be on resource stealing. Resource stealing is difficult to detect since it is passive;
nothing definitively wrong occurs. Some processes might take longer to compute but
the time to compute is hardly constant. It depends on the level of network traffic,
the system load, the amount of resources required for the process, the priority of the
computation in the resource's queue, etc.. Thus we can never be sure if the system
has been compromised in such a way. Normal troubleshooting methods are ineffective
at dealing with this problem since there hasn't really been a definite fault. However
we would still like to get some information about the relative likelihood that some
system resource has been compromised. Probabilistic methods are the obvious tool
of choice here. The following sections will describe the model used to describe this
situation of a general network system.
4.2 Process modeling
Model based troubleshooting is a good approach to use for solving this problem. One
of the benefits of model based reasoning is the fact that no specific fault models
need to be specified. We only need to model what a component is supposed to do,
how it is supposed to work. A property of model based reasoning is: something is
malfunctioning if it's not doing what it's supposed to do, no matter what else it may
be doing. Thus it isn't required to prespecify how the component might fail since
a fault is defined as any behavior that doesn't match expectations. This is a very
desirable property in the network intrusion problem. We are unsure of exactly how
the system components can be compromised or even what the particular behavior will
be if it is indeed hacked. There are a lot of unknowns in that respect, which is why
it is ideal to use model based reasoning. All these details are swept underneath the
carpet as they are not necessary. Valuable information can still be garnered through
this process.
The model we will be using is as follows. The computations are modeled as component nodes. A given computation can take input from another computation or can
have its output linked to another computation. In our model, a computation is an
abstract term describing anything that takes in input information, processes it using
some prespecified resources, and outputs the result. Thus all of these computation
nodes are linked together in a network with each component containing information
about which resource it uses. Note that resources can be shared among components.
Resources are also modeled as nodes and contain information about which components execute on them.
Similar to the GDE/Sherlock circuit fault troubleshooter, an assumption-based
truth maintenance system will be used to maintain consistency in the model. Thus
troubleshooting a fault will be a matter of deciding which constraint to retract through
the process of constraint suspension. In our system, we use Joshua, a knowledge based
reasoning system built in Symbolics Common Lisp. This will provide the framework
and infrastructure to allow us to have truth maintenance in the system in order to
detect if there is an inconsistency or not given the inputs and outputs of the system.
Joshua is an extensible software product for building and delivering expert system
applications. It is a very compact system, implemented in the Symbolics Genera
environment. It has a statement oriented Lisp-like syntax. Joshua is at its core a
rule-based inference language. It has five major components: predications, database,
rules, protocol of inference, and truth maintenance system. Our application will draw
on a few of these components.
4.3 Input/Output Black Box modeling
Each component is modeled as a black box node with input and output ports. This is
a very abstract view of the component. We do not specify how it does the computation
nor what particular faults it might have. All we specify is what type of inputs it takes
in, which resources it uses to compute with, and what outputs it has. The component
can be thought of as a factory. It waits for its raw materials, the inputs, to come in. When
it has enough of the resources to start one of its processes, it begins. When the process
is done, it outputs the result as the product. Now note that it is possible that there are
multiple inputs and outputs related in any arbitrary way with regards to which inputs
are needed to create the corresponding outputs. For our problem domain of resource
stealing, it is necessary to model the computation times needed for that component.
We do this by specifying a range of time units that the component needs to complete
the computation when it is operating normally. For example, component A has a
normal computation time range of [1,5]. This means that at best, the computation
takes 1 unit of time. At worst it takes 5 units of time to complete during the normal
operating state. We can thus specify the arrival and departure times of the inputs and
outputs. These can be exact times or ranges depending on how things are specified.
All of the components are modeled as such. The data pathways are then completed so
[Figure: component FOO on resource WILSON, with inputs A and B, outputs C and D, and timing models [3,5] NORMAL and [7,10] HACKED.]
Figure 4-1: A simple sample component.
that the outputs of components go into the inputs of other components as the model
would dictate. Thus we have now modeled the dataflow between the components
as well as the timing information for computation. An important point to note is
that nothing is being said about the correctness of the information passed. Indeed
it is virtually an assumption that all values that get passed are correct and do not
affect the computation time of components. That is, even if a component receives
an erroneous input, it will still take the same amount of time to process that input
as compared to a correct expected input. We assume this because the problem we
are trying to tackle is resource stealing where we assume that the system hasn't been
hacked into blatantly, i.e. all the processes produce the same correct values, only the
computation time is affected.
In figure 4-1, the component is named FOO. It has two inputs, A and B and two
outputs C and D. Inputs A and B combine and get processed to produce outputs C
and D. Thus process FOO would have to wait until both inputs A and B are there
before it can start computing. FOO executes on resource WILSON. FOO has two
possible states, a NORMAL and a HACKED state. In the NORMAL state, FOO
takes time [3,5] once both inputs A and B are present to produce outputs C and D.
If FOO is in the hacked state, it takes time [7,10] to produce C and D.
In figure 4-2, the component is named BAR. It has two inputs A and B and two
outputs C and D. Here inputs A and B are independent of each other and do not
interact at all to produce the outputs. Input A is used to produce output X while
input B is used to produce output Y. The two inputs do not need to wait till the
[Figure: component BAR on resource ATHENA, with two independent processes: A to X ([1,5] NORMAL, [3,6] HACKED) and B to Y ([2,7] NORMAL, [5,10] HACKED).]
Figure 4-2: A second example of a component module.
other one gets there before they can start processing since they are independent of
each other. The first process of input A to output X takes time [1,5] in the NORMAL
state and time [3,6] in the HACKED state. The second process of input B to output
Y takes time [2,7] in the NORMAL state and time [5,10] in the HACKED state.
Component BAR operates on resource ATHENA.
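The two component types above can be captured in a few lines of code. The following Python sketch is illustrative only (the actual system is built in Joshua on Symbolics Common Lisp); the dictionary encoding and the helper name `output_interval` are inventions for this sketch, not the thesis's representation.

```python
# Illustrative encoding of the component model. Each process maps a set of
# inputs to a set of outputs, with a time interval [lo, hi] for every
# behavioral state of the component.

FOO = {
    "resource": "WILSON",
    "processes": [
        # FOO must wait for both A and B before producing C and D.
        {"inputs": ["A", "B"], "outputs": ["C", "D"],
         "times": {"NORMAL": (3, 5), "HACKED": (7, 10)}},
    ],
}

BAR = {
    "resource": "ATHENA",
    "processes": [
        # BAR's two processes are independent: A alone yields X, B alone yields Y.
        {"inputs": ["A"], "outputs": ["X"],
         "times": {"NORMAL": (1, 5), "HACKED": (3, 6)}},
        {"inputs": ["B"], "outputs": ["Y"],
         "times": {"NORMAL": (2, 7), "HACKED": (5, 10)}},
    ],
}

def output_interval(process, state, arrivals):
    """Outputs appear after the last needed input, plus the state's delay."""
    start = max(arrivals[i] for i in process["inputs"])
    lo, hi = process["times"][state]
    return (start + lo, start + hi)

print(output_interval(FOO["processes"][0], "NORMAL", {"A": 10, "B": 15}))  # (18, 20)
```

Note how BAR's second process depends only on B's arrival, matching the independence described above.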
These two figures are indicative of the types of components that will be present
in the process models. A more complex example would have many more such components linked together in more complicated ways.
4.4 Fault modeling
Similar to the GDE[11] and Sherlock[5] circuit fault troubleshooters, each component
and resource module has several fault states. Instead of having states that describe
exact types of faults that can occur, only the behavior of the system in each fault
state is given. This abstracts away many of the specific details about how a particular
module could have been compromised. For example, exhaustively enumerating the
ways in which a server could be hacked is intractable: possibilities include user
accounts being hacked into, root access being compromised, printer resources being
compromised, and so on. Each module has the obvious NORMAL operating state. There
can be several other states of operation, such as HACKED, SLOW, and FAST. It is
also a good idea to include an OTHER state, covering the miscellaneous conditions
that we don't take into account; think of it as
a leak probability. Our model of the system is necessarily incomplete and simplified.
As completeness increases, so too does complexity. To keep things tractable, we
use a more simplified view of the system but must allow for an OTHER state for
completeness. The Sherlock system[5] contains a lot of similar ideas that our system
borrows from. The interested reader is encouraged to take a look at the reference. Our
approach towards modeling is similar to the Sherlock circuit fault troubleshooter. The
main idea from the perspective of diagnosis is to identify consistent modes of behavior,
correct or faulty. Thus we are assuming that if a resource is hacked, even though we
don't know the details or specifics of what exactly happened, the behavior of it,
namely the computation time will be consistent. Therefore, we can group all of it up
into the timing range information for the HACKED state. Similar arguments go for
the other states as well. For our application, we allow only the component modules to
have different consistent states of behavior. The resources are in a separate set from
the components. The reason for this is because the focus is on the component level.
Resources can be thought of as the root nodes in this graph model. They are base
resources that do not depend on other modules for operation. Thus instead of having
a conditional probability dependence on other modules, they will just have a prior
probability distribution over their modes of operation. In some sense, they still
embody the idea of multiple behavior states, but it is realized in a different fashion:
resource modules have prior probabilities of being in the NORMAL, HACKED, OTHER,
and any other states.
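To make this split concrete, the following sketch shows how resource priors and component conditional tables might be encoded. All numeric values here are hypothetical placeholders, not values from the thesis, and the dictionary layout is an invention for illustration.

```python
# Hypothetical priors for resource modules (root nodes): a distribution
# over their modes of operation, including a catch-all OTHER "leak" state.
resource_priors = {
    "WILSON": {"NORMAL": 0.85, "HACKED": 0.10, "OTHER": 0.05},
    "ATHENA": {"NORMAL": 0.75, "HACKED": 0.20, "OTHER": 0.05},
}

# Components, by contrast, carry a conditional table: the distribution
# over their own states given the state of the resource they execute on.
component_cpt = {
    "FOO": {  # FOO runs on WILSON; rows are indexed by WILSON's state
        "NORMAL": {"NORMAL": 0.90, "HACKED": 0.08, "OTHER": 0.02},
        "HACKED": {"NORMAL": 0.20, "HACKED": 0.70, "OTHER": 0.10},
        "OTHER":  {"NORMAL": 0.50, "HACKED": 0.30, "OTHER": 0.20},
    },
}

# Every prior and every conditional row must sum to 1.
for dist in [*resource_priors.values(), *component_cpt["FOO"].values()]:
    assert abs(sum(dist.values()) - 1.0) < 1e-9
```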
4.5 Probabilistic Integration
Now that we have a good framework for modeling the system with the truth maintenance system providing the infrastructure for the detection of inconsistencies, we
turn towards the integration of probabilistic methods. The system right now is capable of checking for inconsistencies in the model given a system description listing all
the fully specified components and resources along with the input/output dataflows.
The timing information specifies time ranges in which the inputs/outputs must be
arriving/departing. We can give the system specific times of certain inputs and outputs
and it can detect if these numbers are consistent with the system description. Recall
in the discussion on model based reasoning that there are three fundamental tasks
towards troubleshooting once a model for the system has been created. They are
hypothesis generation, hypothesis testing, and hypothesis discrimination. For our
system, if a fault is observed and there is a contradiction with the expected values,
Joshua will signal an exception and will then call functions to handle the inconsistency. Our system employs a strategy similar to the GDE/Sherlock troubleshooter
in that the hypothesis generation and testing stages occur at the same time. Recall
that each component and resource module has several states of operation. We start
off with everything in the NORMAL state as an assumption. When a discrepancy
is observed, we use constraint suspension to solve the inconsistency. Thus we would
pick a model of a component to retract. In our case, we would need to retract the
assumption that a component is in the NORMAL state since that assumption isn't
consistent with the observed values.
Our system currently is very similar to the GDE/Sherlock projects. As such we
could use similar methods to deal with hypothesis generation and testing and hypothesis discrimination. However, we would like to integrate probabilistic methods
to make this process smarter. Right now we know how the components interact with
each other and what resources they run on. We also know that each component contains a set of complete modes of behavior to describe the different types of ways it can
behave regardless of the cause. Hypothesis generation becomes a matter of choosing
models to retract as a form of constraint suspension. The Joshua infrastructure allows us to easily test if a certain hypothesis is successful at explaining the observed
values. However, there is no way to distinguish or discriminate among the various
hypotheses. There is nothing guiding us in determining which component's model we
should retract first to see if it solves the problem. An idea used in other model based
troubleshooters is to allow a failure probability for each component. Thus one can
choose which model to retract based on which component is more likely to be not
in the normal operating state. Our problem domain is a little more complicated in
that there are underlying resources which can be shared among components. Thus
if the resource is hacked, that should increase the likelihoods for failures to appear
among all of the components that execute on it. Thus we will also include conditional
probability information in our model.
Specifically, we will allow the resources to have a causal effect on the components that execute on them. This makes sense as it is likely for a component to not
be operating in the normal state if the resource is hacked. Thus we will allow for
conditional dependence from the components to the resources. Probabilistically, this
would involve generating a Bayesian network with the resources as the root nodes
which have a causal effect on the component nodes. We will thus need to specify
the conditional probabilities of the components being in each state given all possible
combination of the parent node states. In our case, each node can have at most one
parent node since components are assumed to only execute on one resource. We use
IDEAL to provide this probabilistic functionality, making sure that we do the proper
bookkeeping to maintain consistency between the IDEAL Bayesian structure and our
Joshua based structure. One feature we added to IDEAL was the idea of evidence
nodes for each component. Basically this just allows for negative evidence similar to
the ideas in Struss and Dressler[11]. IDEAL does not allow the user to have negative evidence for a node, only positive evidence. Negative evidence, in effect, allows
us to state that we know for sure that a node is NOT in a certain state. We get
around the IDEAL limitations by the following method. For each component node
with N different states, we will create N *evidence* nodes, one node for each state
of the component node, and have causal links from the component node to each of
them. The probabilities will be set such that whenever any of the *evidence* nodes
are given positive or negative evidence, that will force the component node to either
be in that state, or definitely not be in that state. Figure 4-3 shows a completely
specified component along with the probabilistic structure created using IDEAL. We
require such a feature because when we are retracting component models, we are in
essence giving negative evidence for that component being in that particular state.
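The effect of these *evidence* nodes can be sketched with ordinary Bayesian conditioning. This Python fragment is a stand-in for the IDEAL mechanism (the function and the numbers are hypothetical): observing an evidence node false zeroes out the corresponding state and renormalizes the rest, which is exactly the negative evidence needed when retracting a model.

```python
# Sketch of the *evidence*-node trick. For a component with states s1..sN
# we add one boolean evidence node per state, with
# P(evidence_i = true | component = s_i) = 1 and 0 otherwise. Observing
# evidence_i = false then forces "the component is NOT in state s_i".

def posterior(prior, observed_state, value):
    """Condition a discrete prior on evidence node `observed_state` = value."""
    # Likelihood of the evidence node's value given each component state:
    like = {s: (1.0 if (s == observed_state) == value else 0.0) for s in prior}
    unnorm = {s: prior[s] * like[s] for s in prior}
    z = sum(unnorm.values())
    return {s: p / z for s, p in unnorm.items()}

prior = {"NORMAL": 0.8, "SLOW": 0.15, "FAST": 0.05}
# Negative evidence: retracting NORMAL means observing evidence-NORMAL = false.
# NORMAL is ruled out; SLOW and FAST renormalize to 0.75 and 0.25.
print(posterior(prior, "NORMAL", False))
# Positive evidence on SLOW pins the component to that single state.
print(posterior(prior, "SLOW", True))
```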
Having integrated a Bayesian network into our system, the question still remains
[Figure: component FOO on resource BOX-1, with timing models [2,7] NORMAL and [1,5] FAST, the BOX-1 resource node (NORMAL/HACKED), and attached *evidence* nodes such as FOO-NORMAL.]
Figure 4-3: Completely specified component with probabilistic model included.
how best to use the information provided by it. The probabilities can guide us in
our selection of hypotheses to test. It can order the hypotheses by likelihood and
test them accordingly. We can stop whenever we do find a solution that is consistent
with the values. This method is the hillclimbing approach where at each step, we
choose the best, most probable item to retract and then test it. When we've come
to a solution state, we are done. The benefits of this method are speed since we
are not doing an exhaustive search on the entire space of possibilities for component
state models. However, we are not guaranteed that we will find the *best* answer,
meaning the most likely hypothesis which satisfies the given values. A hillclimbing
approach suffers from this problem, also known as the local maxima problem. One
other problem is that with the hillclimbing approach, we choose the best step given
to us at each iteration. However, we could potentially run into a dead end, running
out of options without yet having resolved the inconsistencies in the model. In this
case, the hillclimbing approach would also
need to be able to backtrack to retract the prior step and choose the second best
option available at that time instead.
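A minimal sketch of this best-first search with backtracking, in Python (the real system drives Joshua's truth maintenance; the `consistent` predicate stands in for the constraint check, and all names, states, and probabilities are hypothetical):

```python
# Best-first hypothesis generation with chronological backtracking.

def best_first(components, consistent):
    """components: {name: [(state, probability), ...]}.
    Returns the first full assignment accepted by `consistent`, trying the
    most probable state of each component first and backtracking from dead
    ends, or None if no assignment is consistent."""
    names = list(components)

    def extend(i, assignment):
        if i == len(names):
            return assignment if consistent(assignment) else None
        for state, _p in sorted(components[names[i]], key=lambda sp: -sp[1]):
            result = extend(i + 1, {**assignment, names[i]: state})
            if result is not None:
                return result      # stop at the first consistent solution
        return None                # dead end: backtrack and try next state

    return extend(0, {})

components = {"FOO": [("NORMAL", 0.8), ("FAST", 0.2)],
              "BAR": [("NORMAL", 0.7), ("SLOW", 0.3)]}
# Suppose the observations are only consistent when FOO is FAST:
print(best_first(components, lambda a: a["FOO"] == "FAST"))
# → {'FOO': 'FAST', 'BAR': 'NORMAL'}
```

This greedy ordering finds a consistent answer quickly but, as noted above, not necessarily the most probable one.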
Another method towards using the probability information is to simply do an
exhaustive search on all of the possible hypotheses that do satisfy the constraints on
the model given the inputted values. Then one can rank each of these solutions by how
likely they are using the probability information in the model. The problem is that
this is an exhaustive search. For large problem spaces, this could take an intractable
amount of time to calculate. Oftentimes we don't need the best answer, just an
answer that is reasonably likely to be true. This method, however, does guarantee the
best solution, though the tradeoff to attain it is computation speed.
For our system, we have decided to use the probability information in the following manner. We intend to merge the ideas of an exhaustive search and a
hillclimbing approach. The approach we use is as follows. We begin with a
hillclimbing best first search where we choose the most likely component to not be in
the normal operating state. We retract the normal state and assume another state
and test to see if it is consistent with the given outputs. If it is, we are done. If not,
then we continue and select the next most likely component to not be in the normal
state. We continue this best-first search until we've come to a solution that is
consistent with the given inputs. Now that we have an answer that is consistent, we
conduct an exhaustive search on the rest of the possible combinations of states, generate them, test them, and collect all the configurations that yield consistent results.
Prior to conducting the diagnosis, we generate an exhaustive list of all the possible
combinations of states of the components in the model and enumerate them all. This
allows us to keep track of which configurations we have tested already and provides
bookkeeping functionality in the exhaustive search. After the first "best" solution is
found, the system searches through all of the possible configurations, tests them, and
collects the configurations which are consistent with the inputs. Now we have a set
of all the "good" combinations which are solutions to the problem given the inputs,
and a set of the "nogood" states.
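The exhaustive phase described above can be sketched in a few lines; the state space and the consistency predicate below are illustrative stand-ins for the model's actual components and the Joshua constraint check.

```python
import itertools

# Enumerate every combination of component states, test each, and
# partition the configurations into "good" (consistent) and "nogood" sets.

def enumerate_configurations(state_space, consistent):
    names = list(state_space)
    goods, nogoods = [], []
    for combo in itertools.product(*(state_space[n] for n in names)):
        config = dict(zip(names, combo))
        (goods if consistent(config) else nogoods).append(config)
    return goods, nogoods

space = {"FOO": ["NORMAL", "FAST"], "BAR": ["NORMAL", "SLOW"]}
goods, nogoods = enumerate_configurations(space, lambda c: c["BAR"] == "SLOW")
print(len(goods), len(nogoods))  # 2 2
```

The pre-generated enumeration doubles as the bookkeeping structure the text mentions: each configuration is visited exactly once.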
The set of "nogoods" is information that the system has learned from the diagnosis
of the fault. To encode this knowledge gained, we save this information on the IDEAL
side of things by adding all of the nogood configurations into the Bayesian network
as the system goes along testing all the hypotheses. The information gained from
the "goods" is also encoded into the Bayesian network in a similar manner. For
each solution, we construct a Bayesian net node which is the logical AND of all
the nodes corresponding to the assertions in the solution. Note that due to time
constraints, this part of the implementation has not yet been completed. However,
the general procedure mirrors the situation with the "nogoods" configurations. Once
all the "nogoods" and "good" information has been incorporated into the Bayesian
network, we can compute one final solution to the entire net and we will thus acquire
a probability for each solution conditioned by the complete set of "nogoods" which
takes the position of "evidence" in our system. It is partial information that can affect
the relative likelihoods of the probability of success for the solution nodes. This is
probably the most accurate estimate of how likely each possible solution is. We can
thus choose the absolute "best" solution for the given problem.
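The final ranking step amounts to renormalizing probability mass over the consistent configurations. A sketch follows; the independent-marginal joint is purely an illustrative stand-in for the real IDEAL network inference, and the numbers are hypothetical.

```python
import math

# Conditioning on the set of "nogoods" renormalizes the probability mass
# over the "good" (consistent) configurations only.

def rank_solutions(goods, marginals):
    def joint(config):
        return math.prod(marginals[n][s] for n, s in config.items())
    z = sum(joint(g) for g in goods)          # mass on consistent configs only
    return sorted(((joint(g) / z, g) for g in goods), key=lambda t: -t[0])

marginals = {"FOO": {"NORMAL": 0.8, "FAST": 0.2},
             "BAR": {"NORMAL": 0.7, "SLOW": 0.3}}
goods = [{"FOO": "FAST", "BAR": "NORMAL"}, {"FOO": "FAST", "BAR": "SLOW"}]
best_p, best_cfg = rank_solutions(goods, marginals)[0]
print(best_cfg)  # {'FOO': 'FAST', 'BAR': 'NORMAL'}
```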
Note that if computation time is a limitation, we do not necessarily need to go
through the exhaustive search. Recall that we do this only after we have found a
solution using a best first search hillclimbing approach. It would be interesting to
see how close this answer is to the absolute best answer we get from an exhaustive
search. The hope is that the initial "best" solution is relatively close to the absolute
best solution. Depending on the application, this might suffice. Thus we allow for
the system to be able to trade off speed vs. accuracy as determined by the user.
Chapter 5
Implementation
Now that we've discussed the design behind the system, the natural question to ask
is "What can it do and what does it know?" The best way to demonstrate the
capabilities of the system is to walk through a few case examples and show how the
system reacts to different faults. We will analyze the system's decision on how
to resolve each inconsistency, whether it makes sense, and what effect it
has on the probabilities in the model. We will start off with two simple academic
examples showing some basic structures that will be common to many types of more
complex networks. The last example will be a slightly more complex example which
is more "real world" in terms of applicability. Through these examples and walk
throughs, the reader will hopefully get an intuitive feel for what the system is capable
of doing. It will be demonstrated that the system provides useful, beneficial results.
5.1 Linear Process
The first example is a simple linear process. Here we have a very simple network
consisting of two components each executing on its own resource. Figure 5-1 depicts
the model generated showing the dataflow structure and the timing information.
The probabilistic information is shown in figure 5-2. Note that it is more likely for
component FOO to be in the NORMAL state than for BAR. The "evidence"
nodes which allow for negative evidence are also depicted here for completeness. For
[Figure: a two-component chain, FOO feeding BAR; timing models include [2,7] NORMAL and [1,5] FAST for FOO, with NORMAL and SLOW models and intervals [5,10], [7,15], [10,20], and [15,20] on the remaining paths.]
Figure 5-1: Linear process model.
brevity, they will be omitted in later diagrams.
We will now walk through a few examples to show how the system deals with
different possible faults. The examples are taken directly from the command line
input one would give to specify observed input/output values to the system.
* Normal case with no faults.
(run-case 'test-1 '((a foo 10) (b foo 15)) '((x bar 30)))
This is the command line input to tell the system to run a certain model with
given inputs and outputs. The first argument to the function run-case is the
name of the model, called 'test-1 in our case. The next argument is a listing of
the input bindings. In this case it states that A of FOO is observed to be at
time 10 and B of FOO is observed to be at time 15. These are the arrival times for
the inputs. The third and last argument states that X of BAR is observed at
time 30. In this case, the output time of 30 at X of BAR is consistent with the
predictions of the model. Given the inputs of 10 and 15 of A of FOO and B of
FOO respectively, X of BAR should have an expected time interval of [22,32].
30 fits into that interval, so no problem is signaled.
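The predicted [22,32] window can be reproduced with simple interval arithmetic, reading FOO's NORMAL range [2,7] and BAR's NORMAL range [5,10] off Figure 5-1. This Python sketch is a hand check of the arithmetic, not the system's actual code:

```python
# Checking the prediction for test-1 by hand. FOO (NORMAL) takes [2,7]
# once both inputs have arrived; BAR (NORMAL) then takes [5,10] more.

def add_intervals(a, b):
    return (a[0] + b[0], a[1] + b[1])

a_foo, b_foo = 10, 15
start = max(a_foo, b_foo)                        # FOO waits for both inputs
c_foo = add_intervals((start, start), (2, 7))    # expected C of FOO: [17, 22]
x_bar = add_intervals(c_foo, (5, 10))            # expected X of BAR: [22, 32]

observed = 30
print(x_bar, x_bar[0] <= observed <= x_bar[1])   # (22, 32) True
```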
[Figure: prior probability tables with NORMAL/other splits such as 83/17, 26/74, 17/83, 20/80, and 10/90 for the component, resource, and evidence nodes. P(FOO = NORMAL) = 0.83; P(BAR = NORMAL) = 0.74.]
Figure 5-2: Linear process a priori probabilistic model.
* Fault in output X of BAR.
(run-case 'test-1 '((a foo 10) (b foo 15)) '((x bar 34)))
In this case the output X of BAR is observed to be 34 which is outside of the
expected time interval of [22,32]. This time value is slower than expected. The
system changes the model of BAR from NORMAL to SLOW which solves the
contradiction. Also the probability of the resource BOX-2 which BAR operates
on, of being hacked increases from 0.20 to 0.69. This makes sense and is to
be expected since BAR is acting outside of normal expectations and whenever
it does, it is likely for the resource it is acting upon to be hacked. Figure 5-3
illustrates the probability models after the evidence has been propagated.
* Fault in output C of FOO.
(run-case 'test-1 '((a foo 10) (b foo 15)) '((c foo 16)))
In this example, the output of FOO at C is observed to be at time 16. This is
faster than the expected time interval of [17,22]. The system decides to change
the model of FOO from NORMAL to FAST which makes it consistent with the
given inputs and outputs. The probability of the resource which FOO executes
[Figure: probability tables after evidence propagation. P(BOX-2 = HACKED) = 0.69 after evidence, compared with P(BOX-2 = HACKED) = 0.20 before evidence.]
Figure 5-3: Linear process post-evidential probabilistic model.
on, BOX-1, being hacked increases from 0.10 to 0.47. This makes sense since
something is not normal with FOO and it is most likely to be the resource it is
using as the underlying fault.
* No possible solutions.
(run-case 'test-1 '((a foo 10) (b foo 15)) '((x bar 43)))
This case has a slow output for X of BAR which yields no possible solution
for this model. There is no combination of selected models for the components
that will yield a non-contradictory state. There is no solution possible and this
is an example of a problem that the system cannot cope with properly yet. It
doesn't know when to stop searching for solutions when none exists.
5.2 Branch/Fan process
The second case illustrates a branch or fan structure. This is where one component's
output flows into the inputs of two other components. Figure 5-4 depicts this case.
One of the desired goals of the system is for it to be able to tell when it is a shared
Figure 5-4: Branch/Fan process model.
resource that is more likely to be the source of discrepancies rather than a shared
component from whence its inputs came. Figure 5-5 shows a simplified form of the
Bayesian network created for this case. For simplicity, the "evidence" nodes are not
shown. The important prior probability information to know is: P(RESOURCE-1 = HACKED) = 0.10 and P(RESOURCE-2 = HACKED) = 0.20.
We will now walk through a few examples to show how the system deals with
different possible faults. The examples are taken directly from the command line
input one would give to specify observed input/output values to the system.
Figure 5-5: Branch/Fan process probabilistic model.
* Normal case with no faults.
(run-case 'test-2 '((a foo 10)) '((x bar 25) (y baz 25)))
The inputs state that A of FOO arrives at time 10. We observe that the outputs
X of BAR occurs at time 25 while Y of BAZ occurs at time 25. Given the input
A of FOO arriving at 10, the expected time range for X of BAR is [17,27] and
the expected time range for Y of BAZ is [17,27]. The values of 25 for both
BAR and BAZ are consistent with the predicted ranges, so no contradiction is
detected.
* Slow fault on BAZ.
(run-case 'test-2 '((a foo 10)) '((x bar 25) (y baz 30)))
In this example the output at Y of BAZ is outside of the expected range thus a
contradiction is detected. The system decides to change the selected model of
BAZ from NORMAL to REALLY-SLOW, which solves the contradiction. Additionally, the probability that the resource which BAZ operates on, RESOURCE-2, is hacked increases from 0.20 to 0.69.
* Slow fault on BAR.
(run-case 'test-2 '((a foo 10)) '((x bar 30) (y baz 25)))
In this example, a fault similar to that of the previous example occurs, except here it
is BAR that gets the slow fault. Analogously, the system decides to change the
selected model of BAR from NORMAL to SLOW which solves the contradiction.
Once again, the probability of RESOURCE-2 being hacked increases from 0.20
to 0.69.
* Slow fault on both BAZ and BAR.
(run-case 'test-2 '((a foo 10)) '((x bar 30) (y baz 30)))
In this example, both X of BAR and Y of BAZ are observed to be slower than
expected. This creates two inconsistencies in the model. The system decides to
change the selected model of BAZ from NORMAL to REALLY-SLOW and the
selected model of BAR from NORMAL to SLOW. This solves the contradiction.
More importantly, the probability of RESOURCE-2 being hacked increases from
0.20 to 0.95. This makes sense and shows that there is a lot of information
favoring the fact that RESOURCE-2 is hacked since both of the components
that execute on it faulted in this example.
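The jumps from 0.20 to 0.69 (one slow component) and from 0.20 to 0.95 (both) are just Bayes' rule with conditionally independent observations. The thesis does not list the conditional probability table entries, so the two likelihoods in this sketch are hypothetical values chosen for illustration; they happen to reproduce the reported posteriors, which shows the mechanism at work.

```python
# Bayes' rule for a resource's HACKED probability after observing that
# k of its components are outside their NORMAL timing range. The two
# likelihood parameters are assumptions, not values from the thesis.

def p_hacked(prior, k, p_fault_hacked=0.8, p_fault_normal=0.09):
    num = prior * p_fault_hacked ** k
    den = num + (1 - prior) * p_fault_normal ** k
    return num / den

print(round(p_hacked(0.20, 1), 2))  # one slow component   -> 0.69
print(round(p_hacked(0.20, 2), 2))  # both components slow -> 0.95
```

Each additional faulted component multiplies the odds of HACKED by the same likelihood ratio, which is why the second observation pushes the posterior so much higher.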
* No Possible Solution.
(run-case 'test-2 '((a foo 10)) '((x bar 30) (y baz 15)))
In this example, once again the observed values provide no possible solutions.
This example contains a slow fault on X of BAR and a fast fault on Y of BAZ.
No combination of selected model states can answer these observed values. Once
again, the system is not equipped to handle examples with no possible solutions
as it loops forever searching exhaustively but doing redundant work.
* Fast fault on FOO.
(run-case 'test-2 '((a foo 10)) '((x bar 25) (y baz 25) (b foo 11)))
In this example, B of FOO was given a fast fault. The time observed there was
11 while the expected time range for that location is [12,17]. This was done to
see how the system would respond to such a fault. The result was at first unexpected, but interesting nonetheless. The system decided to change the selected
model of FOO from NORMAL to FAST as expected. However, before it did
that, it also set BAR to the SLOW model and BAZ to the REALLY-SLOW
model. Ideally FOO should just have been changed to the FAST state and
nothing more. However, the system did this because given the numbers in the
model, it is more probable that FOO is in the NORMAL state as compared to
BAR and BAZ. So BAR and BAZ's states are changed first to the slower ones
to see if that solves the contradiction. This is not quite what one would ideally
want but it is consistent with how the code works. The resulting configuration is consistent with the observed inputs/outputs though not necessarily the
best or most probable answer. The probability of RESOURCE-1 being hacked
increases from 0.10 to 0.47. The code works this way because of the manner
in which we implemented it. We chose to use a hillclimbing approach towards
hypothesis generation. We choose what is the most likely thing at that time
and place. It might not end up giving us the best answer but it will give us a
good consistent answer.
5.3 Branch and Join
This last case is an example of a branch and join structure. It uses real-world types
of components and is known as the web-server/trader example. Figure 5-6 shows the
model/structure inputted. Figure 5-7 shows the corresponding probabilistic model
with the prior probabilities for the resources shown. Recall that the first number is
the probability of being hacked while the second is the probability of being in the
normal state. Various faults and how the system handles them will be shown.
This case attempts to model a simple network that consists of these following
components: web-server, dollar-monitor, yen-monitor, bond-trader, and currency-trader. The resources in the model are: wallst-server, jpmorgan-net, bonds-r-us,
and trader-joe. The story for this example goes like this. Every day, traders of all
types get information updates about prices on various things to trade on. When the
prices are favorable to them, they will act upon them, buying or selling things. We
imagine that there is a bond-trader working in a company called Bonds-R-Us. He
sits at his computer every day working on the Bonds-R-Us network. It is a relatively
small company so the network isn't as fast or as secure as some of the other larger
trading firms. Similarly, there is a currency-trader called Joe. He works at home on
his computer analyzing the price differences between the dollar and the yen. When
prices are favorable, he will do a prescribed buy/sell trade. He only has his PC to
work on and it is not secure at all since it is running Windows. His computer often
gets hacked into. Now all traders need to get information from certain places that
are reliable and provide such service. Oftentimes it is a large financial company that
offers such services to the public. In our example, JP Morgan is the company that
will provide such a service to the people. Every day they monitor the prices on various
commodities such as the price of the dollar in the US market and the price of the yen
in the Japanese market. The bond-trader needs to know the relative strength of the
US dollar to know if it is good or bad to buy/sell US bonds. He makes the decision on
his computer based on the information updates he gets from JP Morgan of the current
value of the dollar. Similarly, our currency-trader Joe, working from his home PC gets
up to date information about the dollar and the yen prices to know what the relative
pricing is between them. Given this information, Joe will make a decision whether or
not to buy more US dollars or Japanese yen, or other trading strategies. Now these
services that JP Morgan provide do not run by themselves. They too need to get
information from a central web server from Wall St. which keeps track of basically
everything financial in the entire world. It is a most powerful supercomputing web
server indeed. Whatever information the JP Morgan computers/networks need, they
get it from this centralized financial web-server.
This sets up the basic idea behind this example which will demonstrate a branch
and join structure in the context of a real world network. We will now walk through
a few examples to show how the system deals with different possible faults. The
examples are taken directly from the command line input one would give to specify
observed input/output values to the system.
Figure 5-6: Branch and Join model.

Figure 5-7: Branch and Join probabilistic model.

* Normal case with no faults.
(run-case 'test-3 '((query1 web-server 10) (query2 web-server 15)) '((decision bond-trader 25) (decision currency-trader 28)))
This is the normal case with no faults. The DECISION of BOND-TRADER
occurs at time 25 and the DECISION of CURRENCY-TRADER occurs at time 28. The
predicted ranges for them are [20,31] and [27,39], respectively. Thus both values
are consistent with the predicted values and no contradiction is detected.
* Slow fault on bond-trader.
(run-case 'test-3 '((query1 web-server 10) (query2 web-server 15)) '((decision bond-trader 32) (decision currency-trader 28)))
The output DECISION of BOND-TRADER occurs at a time not within the
expected time interval. The system decides to change the selected model of
BOND-TRADER from NORMAL to SLOW, which resolves the contradiction. Additionally, the probability that the resource BONDS-R-US is hacked increases
from 0.20 to 0.32. In the real world, computations take varying amounts of time depending on the time of day, so it is plausible for the bond-trader's computer to be
a little slow. However, there is also the possibility that the bonds-r-us network has
been tampered with and some resources have been stolen by hackers.
* Fast fault on bond-trader.
(run-case 'test-3 '((query1 web-server 10) (query2 web-server 15)) '((decision bond-trader 18) (decision currency-trader 28)))
In this example, bond-trader is given a fast fault: the observed time for
the output DECISION of BOND-TRADER is 18, which is inconsistent with the expected interval of [20,31]. The system changes the selected
model of bond-trader from NORMAL to FAST, which resolves the contradiction.
The probability that BONDS-R-US is hacked increases from 0.20 to 0.31. Once
again, the bond-trader sometimes sees input values that are deemed no-brainers, i.e. values which obviously dictate a certain course of action, so
computation in these cases can finish faster than at other times. However,
there is also the remote chance that the system has been
tampered with and the bonds-r-us network has been hacked into.
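The consistency check these examples exercise can be sketched outside the Lisp system. The following is an illustrative Python sketch only; the actual implementation runs in Lisp on Joshua and IDEAL, and the function and model names here are stand-ins:

```python
# Illustrative sketch of the timing-consistency check in the examples above.
# The real system is built in Lisp on Joshua/IDEAL; names are hypothetical.

def select_model(expected_interval, observed_time):
    """Return the component model consistent with the observed output time."""
    lo, hi = expected_interval
    if lo <= observed_time <= hi:
        return "NORMAL"          # observation fits the prediction
    return "SLOW" if observed_time > hi else "FAST"

# Normal case: DECISION of BOND-TRADER at time 25, predicted range [20, 31].
print(select_model((20, 31), 25))   # NORMAL -> no contradiction
# Slow fault: observed at time 32, outside [20, 31].
print(select_model((20, 31), 32))   # SLOW
# Fast fault: observed at time 18.
print(select_model((20, 31), 18))   # FAST
```

When the selected model changes away from NORMAL, the system additionally raises the probability that the component's underlying resource has been hacked, as the 0.20 to 0.32 and 0.20 to 0.31 updates above show.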
Chapter 6
Conclusion
6.1 System critique
We will now discuss in more detail what the system can and cannot do. We will
also attempt to answer the question "How useful is it in real world scenarios?" What
does the system allow us to do and what does it know? Once the user has input
and fully specified the model of the network, there is much that the system knows.
First, the system knows the timing information and how data flows
between the components.
Given an arbitrary input, the application can calculate
what the expected time intervals for all of the component outputs are. Given output
information, the system can immediately decide if it is consistent with the expected
values. The system knows what time intervals each output should expect and can
simply test to see if a given output value fits in the predicted time interval. If it
doesn't, the system knows how to systematically search through all the components
to find the combination of component models that satisfies the given observed values.
The system also knows how to approach this search intelligently through the use
of conditional probabilities given a priori. Additionally, once the system has found
a solution, the probabilities will have been updated to reflect information garnered
from the troubleshooting process. Namely, resources that are likely to have been
hacked into to cause the inconsistencies will have an increased probability of being
hacked. This is a very useful feature for self monitoring adaptive systems whereby
if the probability of being hacked increases past a certain threshold, predetermined
methods of dealing with this problem can be applied. Thus the system knows much
about the network given the simple model it has. It is important to note, though, that
the system can only deal with problems that operate at the same level of detail as the user-provided
model; i.e. if a problem occurs at a lower level such that the structures used
do not correctly model it, the system will be incapable of dealing with
it. This is more a limitation of the model based approach than of
our system. The model based approach naturally must use one set of structures to
model a system. The ideal approach would be to allow for multiple representations of
varying detail to model the system. The problem of course is that such an approach
is typically not feasible to use. It takes too long to come up with one good model for
a system, let alone a multitude of them. The level of complexity is high and
a lot of practical implementation questions come into play. What constitutes a
higher level of detail when modeling something? There are a lot of borderline issues
that are highly subjective. The problem is that most problems do not break up into
easily identifiable levels.
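The prior-guided search described above, which tries combinations of component models in order of decreasing prior probability until one is consistent with the observations, can be sketched as follows. This is a hypothetical Python sketch, not the Joshua implementation; the components, model names, and priors are illustrative:

```python
# Sketch of a prior-guided search over component model assignments.
# Hypothetical components and priors, for illustration only.
import math
from itertools import product

# Prior probability of each behavioral model for each component.
MODELS = {
    "bond-trader":     {"NORMAL": 0.80, "SLOW": 0.12, "FAST": 0.08},
    "currency-trader": {"NORMAL": 0.80, "SLOW": 0.12, "FAST": 0.08},
}

def candidate_assignments():
    """All model assignments, ordered by decreasing joint prior."""
    names = list(MODELS)
    scored = []
    for combo in product(*(MODELS[n].items() for n in names)):
        assignment = {n: m for n, (m, _) in zip(names, combo)}
        prior = math.prod(p for _, p in combo)   # components independent a priori
        scored.append((prior, assignment))
    scored.sort(key=lambda pa: -pa[0])
    return scored

def diagnose(is_consistent):
    """Most probable model assignment consistent with the observations."""
    for prior, assignment in candidate_assignments():
        if is_consistent(assignment):
            return assignment
    return None

# Suppose the observed times rule out NORMAL for bond-trader but a SLOW
# model explains them:
print(diagnose(lambda a: a["bond-trader"] == "SLOW"))
# -> {'bond-trader': 'SLOW', 'currency-trader': 'NORMAL'}
```

The ordering is what makes the search "intelligent": single-fault explanations with high-prior models surface before unlikely multi-fault combinations.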
In any case, the system does serve a useful purpose. It has its limitations, which we
will explore a little more in the next section. However, it still succeeds at solving the
problem it was intended to and is expandable enough to incorporate more complex
ideas, algorithms, and functionality in the future. All in all, it is a good foundation
on which to build a self-monitoring, resource-stealing troubleshooter: a system with a fair
amount of knowledge and capability and a lot of room for future growth.
6.2 System limitations
Although the system provides a lot of useful functionality and can effectively model
and troubleshoot the problem of resource stealing in network intrusion, it nevertheless has some limitations. We will discuss a few of them
and state how difficult it would be to address these shortcomings.
6.2.1 Lack of correctness detection
There is a lack of correctness detection. Nothing is said about whether the value
outputted by a component is correct or not. This is mainly because we are trying
to solve the problem of resource stealing. We had stated by assumption that all the
inputs as well as the outputs of all the components are correct. All that can possibly
cause an inconsistency with the system is the timing of the computations. The model
does not include correctness detection, nor is it necessary given the problem
domain we are attempting to solve. However if we wanted to extend this system
towards other types of network attacks, correctness detection would be mandatory
to ensure that all processes are behaving properly with respect to computation and
timing.
6.2.2 Lack of probabilistic links between components
There is no propagation of probabilities between components. Right now the only
probabilistic links are between a component and the resource that it uses. The only
way for probabilities to flow between components is indirectly through a shared common resource. For our model and the problem that we're addressing, this is fine. The
reason goes back to the initial assumption that all values produced by components
are correct, though they may not take a "normal" amount of time to produce.
6.2.3 Lack of descriptive model states
Right now we have only included very simplistic security models. There is a lot of
simplification as we are only focusing on the problem of resource stealing. If we were
to extend the system to handle other types of network security issues, we would need
more specific models that store even more information. These models would have to
draw on the knowledge base of network security. One feature of model based reasoning that will help us extend this functionality in the future is that it
uses hierarchical structures. The reasoning is device and domain
independent. All one has to do is to change the device/system specifications and
the troubleshooter can diagnose faults by reasoning from first principles yet again.
Hierarchically structured models would allow us to have varying levels of detail depending on how specific the problem is. Thus each time we "zoom" in on the problem,
examining details in greater depth, we can use an appropriate model that correctly
captures all of the necessary information, no more, no less than what is required for
the diagnosis. Such a feature of model based troubleshooting makes it very scalable
and extensible.
6.3 Future work
This system for diagnosing resource stealing network intrusion problems is part of the
bigger picture of the Active Trust Management (ATM) for Autonomous Adaptive Survivable Systems (AASS's) [7]. The project attempts to build survivable systems in an
imperfect environment in which any resource may have been compromised to an unknown extent. The project's claim is that such survivable systems can be constructed
by restructuring the ways in which systems organize and perform computations. Our
project that deals with monitoring and trouble shooting resource stealing network
intrusion attacks is a part of the active trust management system needed to constantly monitor and update the relative trust of the resources given a limited amount
of information. There are five main tenets of the Active Trust Management project,
listed below; they are taken directly from the project proposal submitted
to ARPA [7].
1. Such systems will estimate to what degree and for what purposes a computer
(or other computational resource) may be trusted, as this influences decisions
about what tasks should be assigned to them, what contingencies should be
provided for, and how much effort to spend watching over them.
2. Making this estimate will in turn depend on having a model of the possible
ways in which a computational resource may be compromised.
3. This in turn will depend on having in place a system for long term monitoring
and analysis of the computational infrastructure which can detect patterns of
activity indicative of successful attacks leading to compromise. Such a system
will be capable of assimilating information from a variety of sources including
both self-checking observation points within the application itself and intrusion
detection systems.
4. The application systems will be capable of self-monitoring and diagnosis and
capable of adaptation to best achieve their purposes with the available infrastructure.
5. This, in turn, depends on the ability of the application, monitoring, and control
system to engage in rational decision making about what resources they should
use in order to achieve the best ratio of expected benefit to risk.
Our system for trouble shooting resource stealing fits well into this bigger picture.
One of the biggest problems in network security is knowing when a resource has been
compromised.
There are different levels of compromise possible for resources and
not all of them are easy to detect. Thus rather than trying futilely to develop an
environment where all the resources are guaranteed to be secured and can be trusted
totally, we instead deal with managing the level of trust among the resources and then
use risk assessment to determine if one should run a certain computation on a certain
resource. Our system can integrate into this nicely and can provide good practical
value to the overall project once more complex models including more descriptive and
realistic component/resource behavior states are used. There however is still much
to be done on this large and ambitious Active Trust Management for Autonomous
Adaptive Survivable Systems project.
6.4 Lessons learned
There were many lessons learned from designing, building and testing such a system.
There were many steps in the process, from creating the infrastructure using Joshua
for consistency detection, to modeling the calculation of arrival/departure times, to
integrating IDEAL into it, and finally to deciding how we wanted to use the
probabilistic ideas. Oftentimes it was unclear what the *more* correct thing
to do was. One problem was the difficulty in integrating several different applications
together. Time is always a limiting factor and I never had the time to fully understand
how Joshua or IDEAL works.
All you can do is to learn enough for what your
application needs. The problem grows when something goes wrong and you have to
learn more about how Joshua and IDEAL actually work in order to find out why things
aren't working. Another problem was in the knowledge acquisition part. Since I am
not an expert in network security and the issues in the field, at best I could only come
up with very simplistic, perhaps unrealistic models for components and resources.
Having a better grasp of these concepts and details might have given us a better idea
of how best to design/model the system from the start. As such we were constrained
to design the system to the best of our knowledge. Another important lesson is in the
difficulty of research. Oftentimes it is unclear how best you should approach things
since nothing similar has been attempted before. This usually requires intuition that
comes only from extensive background knowledge of the problem area. The usual solution is to consult the experts in the area to gain their knowledgeable insight
on the problem. However, real world scheduling problems come into play here when
it is nearly impossible to find times when all parties are able to meet up to talk about
the topic. It's an unavoidable real world problem that must always be taken into
account.
Chapter 7
KBCW Comlink System
There are three general types of problems for which probabilistic methods can be used.
We've already discussed a model based troubleshooting application. The next application involves the Knowledge Based Collaboration Webs (KBCW) Comlink System.
The scope of problems it can be used to deal with includes data interpretation and
collaborative decision making, with the main focus on decision making.
7.1 Description
Collaboration is the task of people working together towards a common goal. There
is a pool of shared knowledge and understanding of the problem domain. There are
limitations on what can be accomplished based on factors such as the skill/knowledge
level of the people involved, workload, goals, and ability to work together in a communal fashion.
The Knowledge Based Collaboration Webs (KBCW) project strives to find new
and better ways of setting up and supporting collaboration. It is a broad, large
scale project that seeks to integrate various separate lines of research in the MIT
Artificial Intelligence Laboratory to support the unified collaboration goal. Currently
the direction is to allow for computer mediation of the problem using natural language
and other forms of interaction between people.
The Comlink system in particular incorporates email and online web-based discussion forums in order to achieve the collaboration ideal. What the system attempts
to do is to allow a structured, automated, computer mediated forum whereby people all over the world can debate on topics ranging from politics to technology to
basically anything they'd want. It allows a systematic way to have discussions, arguments, and debates about various topics. Someone can pose a statement or question
to the forum initially. Then other people can read what the statement is and come
up with arguments/evidence that either support it or deny it. In turn, there can
be recursive support/denial of each of these statements. Comlink keeps track of all
these arguments and how all the statements link with each other. This allows for a
very structured and formal method of debate or argumentation. One of the useful
capabilities of the system is for collaborative decision making whereby a large group
of people spread out all over the world can debate on what is the best course of
action to take for a particular problem domain. Examples could include engineers
from around the world trying to come up with a new design for a computer CPU
architecture or McDonald's employees around the world trying to brainstorm a new
idea for a burger/sandwich.
One good feature of this system is that it can allow for anonymity. Statements
need not specify the origin of the source. What this allows is for a more level playing field among the participants such that everyone has an equal say in the argument/decision making process. One big fault of a collaborative decision making venture is that not all parties involved are equal status/position wise. For example, in a
company meeting, the CEO of the company would have a much bigger say in decisions as opposed to a secretary. Whatever he/she says would have larger weight to it.
This is the case even if he/she is totally wrong and has an incorrect view on things.
Conversely, even if the lowly secretary has a brilliant idea, it does not carry equal
weight despite its superiority. This problem has shown up before, sometimes with catastrophic results. An example of this is the Challenger space
shuttle disaster. A memo from one of the engineers stated the potential for disaster
if the shuttle were to be launched in its current state. It explicitly stated the problem
that could arise in the O-rings. However, the higher ranking management dismissed
the idea even though they were obviously not the experts in the case. In this case,
management pulled rank on the engineer and, as a result, the shuttle exploded. Afterwards the finger pointing began. The crux of the matter is this: the disaster could
have been averted had that one engineer's recommendation been more seriously
followed up on.
Comlink provides a more formalized way to bring about issues and deal with them
in a more efficient manner. Email, online discussion forums, and memos alone by
themselves are not effective when there is a large group of people with varying opinions
and knowledge. What is needed is a good mediator that can keep track of everything
and provide an intuitive yet informative interface to present the information gathered
thus far on the topic. An online collaborative forum like Comlink provides a more
systematic approach towards decision making and is more resistant to the problem
of office politics and people with less expertise pulling rank on those that are more
qualified knowledge and skill-wise.
7.2 Integrating a Bayes Net Solver
Although the Comlink system provides very useful functionality now, it does have
some limitations. The main criticism is that there is no way to distinguish between a
"good" argument and a "weak" one. Someone can make a wild claim whose truth is questionable. Someone else can make a reasonable claim and have a
lot of evidence in its support. However there is no way to distinguish quantitatively
the difference between these two claims to know which statement is more likely to be
true. This is the motivating force behind the integration of probability into Comlink.
In particular, we are once again using a Bayesian network solver called IDEAL to
accomplish this task. IDEAL, an acronym for Influence Diagram Evaluation and
Analysis in Lisp, is a test bed for work in influence diagrams and Bayesian networks.
It contains various inference algorithms for belief networks and evaluation algorithms
for influence diagrams. It also contains facilities for creating and editing influence diagrams and belief networks. IDEAL was created in the Rockwell International Science
Center and is available free for non-commercial research ventures. It is the Bayesian
Network toolkit that we will be using as it provides us with all the functionality that
we will need.
By integrating IDEAL with the Comlink system, what we can now accomplish is a
quantification of the relative strengths and weaknesses of arguments. If a statement
gets a lot of positive support, the belief in that statement will obviously increase.
Similarly if there is a lot of negative denial type of evidence against it, the belief will
decrease. We can thus systematically attach probabilities to all the arguments and
be able to quantitatively compare competing theories or viewpoints.
This is a big part of collaborative decision making. Oftentimes the group is attempting
to come up with a unanimous decision about how to approach a problem. There will
be several candidate hypotheses and the group must argue for and against each of
these hypotheses. Without any quantification, it is very subjective which hypothesis
has the most support or the better argument for it. By adding probability to a model
that is already structurally similar to a Bayesian network, we can achieve this desired
goal. One important thing to point out is that coming up with the numbers is still
subjective. How can I say that a statement I make has a .75 probability of being true?
Humans have been shown to be not very good at estimating probabilities. The lack of
accurate estimations of prior and conditional probabilities on statements could cause
inaccurate results. There is no easy answer to this problem but a technique called
sensitivity analysis attempts to address some of the shortcomings of this probabilistic
approach.
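The idea behind sensitivity analysis can be illustrated with a small sketch: sweep an elicited prior across a plausible range and check whether the conclusion survives. The two-node network and all numbers below are purely illustrative assumptions, not part of the Comlink/IDEAL implementation:

```python
# Minimal sensitivity-analysis sketch: sweep an uncertain prior and check
# whether the posterior conclusion is robust to the exact value chosen.
# Two-node net: hypothesis H with prior p, observed evidence E with
# P(E|H) = 0.9 and P(E|not H) = 0.3 (illustrative numbers).

def posterior(p, like_t=0.9, like_f=0.3):
    """P(H | E) by Bayes' rule."""
    return like_t * p / (like_t * p + like_f * (1 - p))

# If the elicited prior lies anywhere between 0.30 and 0.70, does the
# conclusion "H is more likely true than false" still hold?
robust = all(posterior(p / 100) > 0.5 for p in range(30, 71))
print(robust)  # True: here the conclusion is insensitive to the exact prior
```

When a conclusion flips within the plausible range of an elicited number, that number deserves more careful estimation; when it does not, the human's rough guess is good enough.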
7.3 Implementation
Now that the design of the system has been described, a couple of obvious questions
arise. What can it do? What does it know? What the system can do now is
the following. On the Comlink side, you can enter statements into the discussion
and they will be incorporated into the existing discussion structure. Visually,
a statement is a node that is being added to a graph. There are different types of
statements you can add to the discussion though the exact details are not important
to our design. If the statement is in response to either support or deny some other
already present statement, Comlink allows you to specify this and an arc is drawn
between the two nodes to specify a causal link between the two. The user can then
input either the prior probability of the statement if it has no parents, i.e.
does not support or deny any other statement, or a conditional probability given
all the possible combinations of states of its predecessors/parents. This is entirely
analogous to specifying the probability values in a Bayesian network. Key points to
note are that the nodes are binary valued, i.e. they are either true or false. In the
implementation, when the Comlink graphical structure is created, a mirror IDEAL
copy is also created along with the appropriate functions to link between the two
versions. Once all of the statements, links, and probability values are specified, the
IDEAL side of things can use one of its inference algorithms and propagate the belief
values through the Bayesian network. If any evidence should be discovered, it can be
entered into the Bayesian network and the beliefs will be repropagated. Evidence in
this case could be knowing for sure that a statement is absolutely true or absolutely
false. The end result is that all of the information can be processed and thus there
will be a quantification of the relative likelihood of competing hypotheses. Thus given
all the evidence and support/deny statements, we can pick the best option out of all
the candidate hypotheses.
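The flow just described, mirroring the statement graph as a binary belief network, propagating beliefs, then entering evidence and repropagating, can be sketched with a toy exact-inference routine. This is not IDEAL's actual interface; the two-statement network and its numbers are invented for illustration:

```python
# Toy version of the Comlink/IDEAL flow: binary statement nodes, tables
# conditioned on at most one parent, exact inference by enumeration.
# Names and numbers are illustrative, not IDEAL's real API.
from itertools import product

# node -> (parent or None, {parent_state: P(node=True | parent_state)})
# For a root node the table holds a single prior under the key None.
NET = {
    "claim":   (None,    {None: 0.5}),
    "support": ("claim", {True: 0.9, False: 0.4}),
}

def joint(assign):
    """P(full assignment) = product of the local conditionals."""
    p = 1.0
    for node, (parent, table) in NET.items():
        key = assign[parent] if parent else None
        pt = table[key]
        p *= pt if assign[node] else 1 - pt
    return p

def belief(node, evidence):
    """P(node=True | evidence) by enumerating all assignments."""
    names = list(NET)
    num = den = 0.0
    for values in product([True, False], repeat=len(names)):
        assign = dict(zip(names, values))
        if any(assign[n] != v for n, v in evidence.items()):
            continue   # inconsistent with the entered evidence
        p = joint(assign)
        den += p
        if assign[node]:
            num += p
    return num / den

print(belief("claim", {}))                  # prior belief: 0.5
print(belief("claim", {"support": True}))   # belief rises once support is observed
```

IDEAL uses far more efficient propagation algorithms than brute-force enumeration, but the input/output behavior sketched here, specify tables, propagate, enter evidence, repropagate, is the same shape as the Comlink integration.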
What does the system know? The system knows how to combine together a
collection of statements given how they support/deny each other qualitatively and
given user supplied probability values. It knows how to propagate the numbers and
combine evidence in a coherent manner. That is just a byproduct of using Bayesian
networks.
7.3.1 Example
Here I will use a simple example to illustrate how the system works and what it
can tell us. The problem domain will be McDonald's burgers. The top executives
at McDonald's are a little concerned that sales have been steadily decreasing in the
past few years. There has been more competition from other fast food places which
have been taking away more and more profit each year. McDonald's wants to do
something about this and the plan is to unveil a new out of this world burger, the
burger that will end all burgers at the next "Y2K Concept Burger Convention".
They hope to generate quite a stir in the fast food industry and thereby send waves
of repercussions throughout the market. There are a few constraints though. Issues
such as cost, appearance, target audience, etc. come into play. Basically they want
to make a burger that isn't too wild but just different enough to set it apart from
all else. They also don't want the burger to cost too much money to make or be too
difficult for their employees to assemble or *cook* depending on who you ask. The
top McDonald's executives from around the world decide to use Comlink as the forum
for this collaborative decision making process. The following will be an abridged form
of their actual discussion. I will have a statement template with several fields which
include: statement number/name, node numbers it supports and/or denies, a listing
of the conditional/prior probabilities, and the actual statement itself. I will start off
at the top with the candidate burgers. All statements are binary valued, that is they
are either true or false. I will denote probabilities as X/Y where X is the probability
of being FALSE and Y is the probability of being TRUE. Note that this is redundant
as X + Y = 1. If the statement does not support or deny anything, then it has no
predecessors and will be a root node. If a statement has a parent node, then its
probability will depend on all the possible combinations of states of the parent nodes.
In our particular case, every node has at most one parent node which is either in the
true (T) or false (F) state. I will denote these conditional probabilities as X/Y|T or
X/Y|F, where the probabilities are conditioned on the parent node being in the T or F state, respectively. Probability values will be on a scale of 0-100, with 100
denoting absolute certainty.
Top level statements:
Number: 1
Name: Cream cheese soy burger
Supports: None
Denies: None
Probabilities: 50/50
Statement: A cream cheese soy burger is the best new burger that
McDonald's can offer to its customers.
Number: 2
Name: Pickled cabbage quadruple cheeseburger
Supports: None
Denies: None
Probabilities: 50/50
Statement: A pickled cabbage quadruple 1/4 pounder cheeseburger is the
best new burger that McDonald's can offer to its customers.
Number: 3
Name: Bacon three-cheese sauteed onion double cheeseburger, or the
EVERYTHING burger
Supports: None
Denies: None
Probabilities: 50/50
Statement: A bacon three-cheese sauteed onion double cheeseburger is the
best new burger that McDonald's can offer to its customers.
Supporting or denying statements:
Number: 4
Name: Vegetarian
Supports: 1
Denies: None
Probabilities: 20/80|T, 40/60|F
Statement: People today are more concerned about health and will be
more willing to eat a soy burger.
Number: 5
Name: Expensive cheese
Supports: None
Denies: 1
Probabilities: 90/10|T, 20/80|F
Statement: Cream cheese is expensive and would cost too much to use in
burgers for a reasonable cost.
Number: 6
Name: Meat lovers
Supports: None
Denies: 1
Probabilities: 95/05|T, 35/65|F
Statement: The main bulk of McDonald's customers are meat people who
aren't interested in health. Otherwise why would they even be eating
fast food? Therefore they would want meat and not some veggie
meat-wannabe burger.
Number: 7
Name: Value
Supports: 2
Denies: None
Probabilities: 05/95|T, 50/50|F
Statement: McDonald's customers like value, they want to feel that they
are getting their money's worth. The more meat patties in the burger,
the greater the value.
Number: 8
Name: Expensive cabbage
Supports: None
Denies: 2
Probabilities: 70/30|T, 40/60|F
Statement: Cabbage isn't cheap and its cost would be too high to make
the burger successful financially.
Number: 9
Name: Too big
Supports: None
Denies: 2
Probabilities: 65/35|T, 20/80|F
Statement: Four meat patties is too big for a person with an average-sized
mouth to eat. They'd have to dislocate their jaws to eat it, so it's
not likely to be popular with the customers.
Number: 10
Name: Good variety
Supports: 3
Denies: None
Probabilities: 05/95|T, 50/50|F
Statement: Customers love variety when it comes to their burgers. They
love getting a burger with all the works. Therefore this burger will be
very popular with our customers.
Number: 11
Name: Good cheese
Supports: 3
Denies: None
Probabilities: 30/70|T, 50/50|F
Statement: Offering three types of cheese offers a different type of
taste not normally associated with burgers. Plus, the types of cheese
used will be common cheeses so it won't be too costly to implement.
Number: 12
Name: Boring
Supports: None
Denies: 3
Probabilities: 50/50|T, 40/60|F
Statement: The burger has a lot of stuff in it but it is all normal
boring things that other burgers in the past have offered before in some
shape or form. The customers will just think it is more of the same
thing and not really be excited about it.
Number: 13
Name: Cheap cream cheese
Supports: None
Denies: 5
Probabilities: 80/20|T, 50/50|F
Statement: A new science breakthrough has just occurred which allows for
very cheap imitation cream cheese. It tastes virtually the same and can
be manufactured for half the cost of the real thing.
Number: 14
Name: Germany bankrupt
Supports: None
Denies: 8
Probabilities: 95/05|T, 50/50|F
Statement: Germany is about to go bankrupt and thus is trying to
liquidate all assets. In particular, they have a huge surplus of
pickled cabbage which they will desperately try to unload on the
American market. Thus pickled cabbage can be purchased for very cheap
prices.
These were all of the statements submitted by various McDonald's executives.
What the system will do now is to construct a model based on these statements both
on the Comlink and IDEAL side. The IDEAL model will generate a Bayesian net.
Together with the conditional and prior probabilities along with the causal links as
denoted by the support/deny information, we are able to generate a fully specified
Bayesian model based on the given information. Figure 7-1 depicts graphically how
the model looks. Note that the values next to each node are the beliefs
of that node, which are separate from the node's specified distribution. Figure 7-1
shows the model before any specific evidence has been discovered. Recall that a
probability listed as X/Y gives the probability of that node being false as X and
true as Y. Thus prior to any evidence, the three root nodes still have their prior
probabilities, initialized to 50/50, stating that all three candidates are equally likely to
succeed.
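As a sanity check, the belief values in the figure follow from marginalizing each node's table over its parent. For instance, the Vegetarian node (statement 4, probabilities 20/80|T and 40/60|F, with a 50/50 parent):

```python
# Marginal belief of the Vegetarian node (statement 4): P(T|parent=T)=0.80,
# P(T|parent=F)=0.60, and the parent burger's prior is 50/50.
p_parent = 0.50
p_veg = 0.80 * p_parent + 0.60 * (1 - p_parent)
print(round((1 - p_veg) * 100), "/", round(p_veg * 100))   # 30 / 70
```

Every other leaf's belief is obtained the same way, so the initial diagram is fully determined by the statement templates above.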
Things get a little more interesting when evidence is entered into the diagram.
Seven examples of evidence in this case will be demonstrated. The reader can see how
the probabilities will propagate as a result. Note that the three candidate burgers form
disjoint trees in the diagram. They are not probabilistically linked in this particular
example. Therefore evidence in one subtree will only propagate within that subtree
and will not affect the results in the other two subtrees.
Example #1: Insider sources at the chem lab where they are rumored to be
developing a cheap cream cheese which tastes virtually the same as the real thing
have confirmed the rumors. The cheap cream cheese does exist; it tastes just like
normal cream cheese and will be sold to the public immediately at a very cheap price.
Figure 7-2 depicts the result of this evidence on the cream cheese soy burger sub-tree.
Note that as a result, the probability of the burger being a success has increased from
50% to 64%.

Figure 7-1: The Burger Problem before any evidence.

Recall that since this part of the tree is disjoint from the rest, any
evidence declared within here does not propagate towards the other non-connected
nodes.
Example #2:
Insider sources in Germany have confirmed that Germany is indeed
going to declare bankruptcy and is attempting to liquidate all of its pickled cabbage.
As such they will definitely be able to sell it on the open market for a very cheap
price. Figure 7-3 depicts the result of this evidence on the pickled quadruple burger
subtree. Note that the probability of success for the burger has increased from 50%
to 61% as a result of this information obtained by our sources.
Example #3:
McDonald's has decided to conduct a survey to see just how im-
portant variety is to their customers. They conduct a worldwide survey asking their
customers to rank qualities in order of importance. The result was that variety was
voted the most important quality when it came to burger preference. Customers
were tired of boring burgers and liked having different flavors meld together into a
complex, satisfying taste. Figure 7-4 depicts the result of this evidence on the
everything burger subtree. The probability of this burger's success has increased
from 50% to 66%.
Figure 7-2: Cheap cream cheese developed.

Figure 7-3: Germany goes bankrupt and liquidates pickled cabbage to the world.

Figure 7-4: McDonald's survey shows customers value variety the most.

Example #4: A late-breaking medical result has just been announced. Researchers
have found that people who don't eat enough meat suffer increased chances of heart
failure. This shocking news creates a ripple effect, and people are now frantically in
a meat frenzy, cutting back on many vegetarian types of food. People are declaring,
"I want my meat!" The health craze suddenly does a 180 on the issue of meat.
Figure 7-5 depicts the result of this evidence on the cream cheese soy burger subtree.
The probability of success of the burger has dropped from 50% to a shockingly low 7%.

Figure 7-5: Medical results say that you need to eat meat.
Example #5:
The bean counters at McDonald's have come up with some cost
figures for the concept pickled cabbage quadruple cheeseburger. The prototype burger
does offer a lot of food, but the projected cost to produce such a burger is immense.
A single burger alone would have to cost consumers $10 for McDonald's to make a
decent profit on it. This totally shoots down the whole value argument for the burger.
Figure 7-6 depicts the result of this evidence on the pickled cabbage quadruple burger
subtree. The probability of success of the burger has dropped from 50% to a lowly 9%.

Figure 7-6: Estimated cost of pickled cabbage quadruple burger too high.
Example #6: This next example has McDonald's once again conducting a survey
to see what factors are important to its customers. To generate more customer
feedback, each booth handing out surveys also offers food samples. The food sample
being used is the same three-cheese combination that is in the prototype everything
burger, and the survey also includes customer feedback on the cheeses. Well, the
surveys are in and everyone LOVES the three-cheese combo. As before, customers
also cite variety as the most important quality when choosing among burgers.
Figure 7-7 depicts the result of this evidence on the everything burger subtree. The
probability of success has increased from 50% to 73%.
Figure 7-7: New McDonald's survey also shows that customers love the cheeses.

Example #7: This final example, depicted in Figure 7-8, contains all of the
evidence from the previous six examples. This is the information that the McDonald's
executives have acquired from insider sources and customer surveys. The final result
gives the cream cheese soy burger a 22% chance of success, the pickled cabbage
quadruple cheeseburger a 14% chance, and the everything burger a 73% chance.
Given the results that the system outputs, the McDonald's executives decide to go
with the everything burger as their new flagship burger. The burger is wildly
successful and heralded by many as the next big thing, a dramatic quantum leap in
burger technology, forever known as one of the all-time greats if not THE best
burger of all time.

Figure 7-8: Final result putting together all the evidence.
This is just one simple but realistic (debatable) situation that can occur where
the Comlink system together with IDEAL can prove to be most useful. In more
complex situations, the value would be even greater, as it is harder to discern or
discriminate among a large number of candidate hypotheses, each with its own
complex interaction of supports and denies. A systematic formal approach would be
the only tractable way to tackle such a problem.
7.4 Conclusion

7.4.1 System critique
We have now seen a description of the system and walked through an example. A
logical question to ask is: What can it do and is it useful at all? The "What can it do"
question has been answered, so now we deal with the more important question of how
useful it is. Barring the big problem of how the probability numbers are generated,
the system is successful at solving the problem intended. The problem was to give
a systematic mathematical probabilistic functionality to the Comlink system. That
requirement has been met. We can now link together any number of statements which
are part of a decision making process and propagate the values through the model.
This is an extremely useful feature in any decision making procedure where one has to
choose among competing viewpoints/alternatives. Formalism and structure have been
introduced into a domain that has traditionally been very subjective and disorganized.
This provides a significantly more efficient method in the problem domain. Greater
efficiency in any process is always a desirable goal.
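One way to picture how a support/deny link between two statements could feed the Bayes net is as entries in a conditional probability table. The helper below is a hedged sketch with assumed numbers; it is not the Comlink or IDEAL API, and the 0.8 link strength is an invented default.

```python
# Hypothetical rendering of a "supports"/"denies" link as a CPT: a support
# link makes the statement more likely when its supporter holds; a deny
# link does the opposite. The strength value is an assumption.

def link_cpt(kind, strength=0.8):
    """P(statement | linked statement is True/False) for one link."""
    if kind == "supports":
        return {True: strength, False: 1.0 - strength}
    if kind == "denies":
        return {True: 1.0 - strength, False: strength}
    raise ValueError(f"unknown link kind: {kind}")

def propagate(p_parent, cpt):
    """Marginal belief in the statement given belief in its parent."""
    return p_parent * cpt[True] + (1 - p_parent) * cpt[False]

print(round(propagate(0.9, link_cpt("supports")), 2))  # belief pushed up: 0.74
print(round(propagate(0.9, link_cpt("denies")), 2))    # belief pushed down: 0.26
```

Chaining `propagate` along the links is the one-parent special case of the propagation the full net solver performs.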
7.4.2 Lessons learned
What did I learn from designing/building this system? There are many issues and
thoughts that came up while working on this portion of the project. For one thing,
I've learned that when trying to integrate one system into another, it really helps if the
two systems have comparable structure to begin with. This was the situation in our
case as the Comlink structure was very similar to the IDEAL structure. Extending
the Comlink side to be able to accept probability values was not very difficult. The
integration then became more of a bookkeeping process to keep track of the Comlink
model and the IDEAL model of the problem. Another lesson I learned is that when
extending a large body of existing code, things are okay if you can treat the existing
code as a black box. However, when it behaves contrary to what you expect, then it is
an arduous task to dig through that code to try to understand why it didn't do what
you expected it to do. Good documentation would always help in this case. Such is
one of the many follies of large software projects. What went well was how relatively
trouble-free it was to integrate IDEAL into the Comlink system. This was mainly due
to the structural similarities between the two models. What didn't work out so well
was coming up with good examples. There are existing Comlink structures debating
certain issues, but one would have to go back and input all of that information plus
the probability values again. As I am not an expert in those fields, it is difficult for
me to assess what an accurate probability value should be for statements and how
well something supports/denies another. To come up with a non-trivial model is not
a simple task. Coming up with meaningful numbers/probabilities takes time and a
more expert understanding of the problem domain and the many issues that arise.
Another problem was the lack of specificity in the problem domain. There is a
vast range of problems to which this system could be applied. Comlink is capable
of handling online collaborative decision making as well as data interpretation. The
focus of the project became unclear at times as the problem domain seemed very
large. Knowing exactly what type of problem to tackle would definitely have made
things easier. However this was more a result of trying to create a general system
that could handle a multitude of problems.
7.5 Future work
Although the system we have does the intended job, there are still some limitations
to it. We will discuss a couple of these issues that are open research topics suitable
for future work on the project to extend its functionality.
7.5.1 Sensitivity Analysis
One of the chief complaints against the system as it now stands is a common problem
with many probabilistic approaches: subjective values for the probabilities still
need to be supplied. The fear is that you can get whatever result you want from the same
structure for the model by playing around with the numbers. As such, one technique
that attempts to deal with this complaint is sensitivity analysis. The basic idea is
to see how varying the probabilities at different nodes has an effect on the top-level
probabilities used to differentiate between competing hypotheses. We can see how
sensitive the high level nodes are to variations in probability of other nodes further
down the tree. What one often finds is that the final probability usually doesn't
change that much just by varying the probability of another node within a certain
reasonable range. It is difficult to say definitively that something is true
with probability x. However, it is much more feasible to state a range of probabilities.
For example, I am not exactly sure how probable it is that eating greasy food causes
heart diseases. I can say that it is between .5 and .75 and be more certain of that
assessment rather than picking a single value. Sensitivity analysis thus attempts to
allow for ranges of probabilities and see how each node can affect the final result.
Some nodes will obviously have more of an impact on the final value. The most
influential nodes are called critical nodes.
Identification of critical nodes is very
important as it allows you to know which issue/node is the most important thing
to debate in order to determine the final outcome. It tells you where best to
focus your resources. For example, in the realm of intelligence data
interpretation, it might be critical to know whether a foreign nation is stockpiling fighter
jets and bombers in order to assess the likelihood of an air strike by that country. The
issue of how much air power is being accumulated is critical and information about
that should be acquired by all means necessary. In the collaborative decision making
problem, critical nodes could be the important basic issues of the problem which have
the most bearing on the issue at hand. Identifying critical nodes can allow the group
to argue about the more important, higher-impact issues rather than peripheral topics that
don't have much influence on the final decision.
Thus sensitivity analysis and the identification of critical nodes are very useful
techniques which we would like to incorporate into our system in the future.
It addresses the main criticism of inaccurate subjective evaluations of probabilities.
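A minimal one-way sensitivity sweep can be sketched as follows. The two-node net and its numbers are invented for illustration; a real implementation would sweep each parameter of the IDEAL model in turn and rank nodes by the swing they induce.

```python
# One-way sensitivity analysis on a toy two-node net: sweep one parameter
# (here the root prior) across a range and measure how far the top-level
# belief moves. Nodes producing a wide swing are candidate "critical" nodes.

def top_belief(p_root, p_given_root=0.8, p_given_not_root=0.3):
    return p_root * p_given_root + (1 - p_root) * p_given_not_root

def swing(parameter_values, model):
    """Return (low, high) of the output as one parameter varies."""
    outputs = [model(v) for v in parameter_values]
    return min(outputs), max(outputs)

# The 0.5-0.75 interval from the greasy-food example in the text.
low, high = swing([i / 100 for i in range(50, 76)], top_belief)
print(round(high - low, 3))  # total swing of the top-level belief: 0.125
```

With illustrative numbers the top-level belief only moves by about 0.125 across the whole interval, matching the observation that final probabilities are often insensitive to a single node varied within a reasonable range.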
7.5.2 Multiple Viewpoints
One problem that arises comes from the generation of the model for the problem. One
person can say that a certain statement, call it A, supports another statement, call it
B. However, somebody else could argue that statement A more accurately supports
statement C while denying statement D. Thus even before you get into arguing about
the specific numbers for the probabilities, there is disagreement over how the domain
should be modeled. The problem then becomes this: two people might model the
problem differently and arrange the statements along with the support and deny
links in different ways, coming up with their own model or viewpoint. What do we
do with all of these multiple viewpoints on the same problem? Which one is *more*
correct? It is difficult to say which model is better or closer to the real problem.
So assuming that we can't tell for sure which viewpoint is better, is there any way
we can use all of these multiple viewpoints? Is there any way for us to integrate
them with each other or to merge them so you can get some useful information out
of them? Can we tell how the differing viewpoints agree/disagree and to what degree
on certain topics? This is a very interesting question and is an open research topic.
There isn't much current active work in this area. But as Bayesian networks become
more widespread, this issue becomes of greater importance and will eventually need
to be dealt with.
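One simple, hedged way to use several viewpoints rather than picking a winner is to weight each viewpoint's answer by the credence given to it, a crude form of Bayesian model averaging. The two viewpoint functions below are stand-ins for two differently structured nets, with invented numbers.

```python
# Two hypothetical viewpoints on the same statement B: one models "A supports
# B", the other models A as (indirectly) counting against B. Each returns a
# posterior for B given whether A was observed true.

def viewpoint_one(a_observed):
    return 0.8 if a_observed else 0.5   # A supports B

def viewpoint_two(a_observed):
    return 0.3 if a_observed else 0.5   # A counts against B

def averaged_belief(a_observed, weight_one=0.5):
    """Mix the viewpoints' posteriors, weighted by credence in viewpoint one."""
    return (weight_one * viewpoint_one(a_observed)
            + (1 - weight_one) * viewpoint_two(a_observed))

print(round(averaged_belief(True), 2))       # equal credence: 0.55
print(round(averaged_belief(True, 0.9), 2))  # mostly trust viewpoint one: 0.75
```

The gap between the mixed posterior and each viewpoint's own answer also gives a rough measure of how much, and on which statements, the viewpoints disagree.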
Chapter 8
Summary
We have demonstrated successfully how Bayesian networks can be used and integrated with existing applications and provide useful, productive results. The scope
of problems this can be applied to is very large and we are only at the tip of the iceberg right now in terms of applications using Bayesian networks. As computers get
even more powerful, Bayesian networks will play an even larger role in more problem
domains than ever thought possible. The two broad but important fields of
model-based troubleshooting and collaborative decision making have much greater potential
and functionality when the capabilities of probabilistic methods are incorporated into
them. It is thus feasible and highly desirable to see just what else can benefit from
them. The usage of Bayesian nets started a little over 10 years ago and it is showing
no signs of slowing down. It will thus be very interesting to observe its development
over the course of the next few years.
Appendix A
delay-simulator code
This code can be found on the server named WILSON in the MIT Artificial Intelligence
Laboratory in the KBCW group. The file is located on that machine at:
w:>hes>delay-simulator.lisp
;;; -*- Mode: joshua; Syntax: joshua; Package: user -*-
(in-package :ju)
;;; structural model
(define-predicate component (name) (ltms:ltms-predicate-model))
(define-predicate dataflow (producer-port producer consumer-port consumer) (ltms:ltms-predicate-model))
(define-predicate port (direction component port-name ) (ltms:ltms-predicate-model))
;;; underlying infrastructure model
(define-predicate resource (name) (ltms:ltms-predicate-model))
;;; dynamic assertions about values
(define-predicate potentially (predicate) (ltms:ltms-predicate-model))
(define-predicate earliest-arrival-time (port component time) (ltms:ltms-predicate-model))
(define-predicate latest-arrival-time (port component time) (ltms:ltms-predicate-model))
(define-predicate earliest-production-time (port component time) (ltms:ltms-predicate-model))
(define-predicate latest-production-time (port component time) (ltms:ltms-predicate-model))
;;; static and dynamic assertions about "model"s
(define-predicate possible-model (component model) (ltms:ltms-predicate-model))
(define-predicate selected-model (component model) (ltms:ltms-predicate-model))
(define-predicate executes-on
(component resource) (ltms:ltms-predicate-model))
;;; Probabilities (apriori and conditional)
(define-predicate a-priori-probability-of (resource model probability) (ltms:ltms-predicate-model))
(define-predicate conditional-probability (component model resource model conditional-probability)
(ltms:ltms-predicate-model))
(define-predicate-method (notice-truth-value-change potentially :after) (old-truth-value)
  (declare (ignore old-truth-value))
  (let* ((predication (second (predication-statement self)))
         (type (predication-predicate predication)))
    (with-statement-destructured (port component time) predication
      (declare (ignore time))
      (let ((best-time nil)
            (best-guy nil)
            (old-best-time nil)
            (old-best-guy nil))
        (block find-current
          (ask `[,type ,port ,component ?his-time]
               #'(lambda (justification)
                   (setq old-best-time ?his-time old-best-guy (ask-database-predication justification))
                   (return-from find-current (values)))))
(when (or (null old-best-guy)
(Not (eql :premise (destructure-justification (current-justification old-best-guy)))))
(setq best-time old-best-time)
(ask `[potentially [,type ,port ,component ?new-time]]
#'(lambda (justification)
(when (or (null best-time)
(case type
(earliest-arrival-time (> ?new-time best-time))
(latest-arrival-time (< ?new-time best-time))
(earliest-production-time (> ?new-time best-time))
(latest-production-time (< ?new-time best-time))))
(setq best-time ?new-time best-guy (ask-database-predication justification)))))
(when (or (null old-best-time)
(not (= old-best-time best-time)))
(when old-best-guy
(unjustify old-best-guy))
(when best-time
(tell `[,type ,port ,component ,best-time]
      :justification `(best-time-finder (,best-guy))))))))))
(defmacro defensemble (name &key components dataflows resources resource-mappings model-mappings)
  `(defun ,name ()
     (clear)
     ,@(loop for (component) in components collect `(tell [component ,component]))
     ,@(loop for (resource) in resources collect `(tell [resource ,resource]))
     ,@(loop for (component . plist) in components
             for models = (getf plist :models)
             for inputs = (getf plist :inputs)
             for outputs = (getf plist :outputs)
             append (loop for output in outputs collect `(tell [port output ,component ,output]))
             append (loop for input in inputs collect `(tell [port input ,component ,input]))
             append (loop for model in models collect `(tell [possible-model ,component ,model])))
     ,@(loop for (resource . models) in resources
             append (loop for (model prior-probability) in models
                          collect `(tell [possible-model ,resource ,model])
                          when prior-probability
                          collect `(tell [a-priori-probability-of ,resource ,model ,prior-probability])))
     ,@(loop for (out-port from-component in-port to-component) in dataflows
             collect `(tell [dataflow ,out-port ,from-component ,in-port ,to-component]))
     ,@(loop for (component resource) in resource-mappings
             collect `(tell [executes-on ,component ,resource]))
     ,@(loop for (component component-model resource resource-model probability) in model-mappings
             collect `(tell [conditional-probability ,component ,component-model ,resource ,resource-model
                             ,probability]))))
(defmacro defmodel ((component model-name) &body rules)
  `(progn
     ,@(loop for (inputs outputs min max) in rules
             for clause-number from 0
             for clause-name = (make-name component model-name)
             append (build-forward-propagators inputs outputs component model-name clause-number min max clause-name)
             append (build-backward-propagators inputs outputs component model-name clause-number min max clause-name))))
(defun build-forward-propagators (inputs outputs component model-name clause-number min max rule-name)
  (setq rule-name (make-name rule-name 'forward))
  (let ((min-rule-name (intern (format nil "~a-~a-F-min-~d" component model-name clause-number)))
        (max-rule-name (intern (format nil "~a-~a-F-max-~d" component model-name clause-number))))
    (list (loop for input in inputs
                for counter from 0
                for logic-value = `(logic-variable-maker ,(intern (format nil "?input-~d" counter)))
                for support-lv = `(logic-variable-maker ,(intern (format nil "?support-~d" counter)))
                collect logic-value into lvs
                collect support-lv into support-lvs
                append `((predication-maker '(earliest-arrival-time ,input ,component ,logic-value))
                         :support ,support-lv)
                into clauses
                finally (let ((model-support `(logic-variable-maker ,(intern "?model-support"))))
                          (return `(defrule ,min-rule-name (:forward)
                                     if (predication-maker
                                          '(and (predication-maker '(selected-model ,component ,model-name))
                                                :support ,model-support
                                                ,@clauses))
                                     then (compute-forward-min-delay (list ,@lvs) ,min ',outputs ',component
                                                                     (list ,model-support ,@support-lvs)
                                                                     ',rule-name)))))
          (loop for input in inputs
                for counter from 0
                for logic-value = `(logic-variable-maker ,(intern (format nil "?input-~d" counter)))
                for support-lv = `(logic-variable-maker ,(intern (format nil "?support-~d" counter)))
                collect logic-value into lvs
                collect support-lv into support-lvs
                append `((predication-maker '(latest-arrival-time ,input ,component ,logic-value))
                         :support ,support-lv)
                into clauses
                finally (let ((model-support `(logic-variable-maker ,(intern "?model-support"))))
                          (return `(defrule ,max-rule-name (:forward)
                                     if (predication-maker
                                          '(and (predication-maker '(selected-model ,component ,model-name))
                                                :support ,model-support
                                                ,@clauses))
                                     then (compute-forward-max-delay (list ,@lvs) ,max ',outputs ',component
                                                                     (list ,model-support ,@support-lvs)
                                                                     ',rule-name))))))))
(defun compute-forward-min-delay (input-times delay output-names component-name support rule-name)
  (let* ((max-of-input-times (loop for input-time in input-times maximize input-time))
         (output-time (+ max-of-input-times delay)))
    (loop for output-name in output-names doing
          (tell `[potentially [earliest-production-time ,output-name ,component-name ,output-time]]
                :justification `(,rule-name ,support)))))

(defun compute-forward-max-delay (input-times delay output-names component-name support rule-name)
  (let* ((max-of-input-times (loop for input-time in input-times maximize input-time))
         (output-time (+ max-of-input-times delay)))
    (loop for output-name in output-names doing
          (tell `[potentially [latest-production-time ,output-name ,component-name ,output-time]]
                :justification `(,rule-name ,support)))))
(defun build-backward-propagators (inputs outputs component model-name clause-number min max rule-name)
  (setq rule-name (make-name rule-name 'backwards))
  (loop for output in outputs
        for output-counter from 0
        append (loop for input in inputs
                     for input-counter from 0
                     for min-rule-name = (intern (format nil "~a-~a-B-min-~d-~d-~d"
                                                         component model-name clause-number input-counter output-counter))
                     for max-rule-name = (intern (format nil "~a-~a-B-max-~d-~d-~d"
                                                         component model-name clause-number input-counter output-counter))
                     collect
                     (loop for other-input in (remove input inputs)
                           for counter from 0
                           for logic-value = `(logic-variable-maker ,(intern (format nil "?input-~d" counter)))
                           for support-lv = `(logic-variable-maker ,(intern (format nil "?support-~d" counter)))
                           collect logic-value into lvs
                           collect support-lv into support-lvs
                           append `((predication-maker
                                      '(earliest-arrival-time ,other-input ,component ,logic-value))
                                    :support ,support-lv)
                           into clauses
                           finally (let ((model-support `(logic-variable-maker ,(intern "?model-support")))
                                         (output-lv `(logic-variable-maker ,(intern "?output-time")))
                                         (output-support `(logic-variable-maker ,(intern "?output-support"))))
                                     (return
                                       `(defrule ,min-rule-name (:forward)
                                          if (predication-maker
                                               '(and
                                                 (predication-maker '(selected-model ,component ,model-name))
                                                 :support ,model-support
                                                 (predication-maker '(earliest-production-time ,output ,component ,output-lv))
                                                 :support ,output-support
                                                 ,@clauses))
                                          then (compute-backward-min-delay ,output-lv (list ,@lvs) ,max ',input ',component
                                                                           (list ,model-support ,output-support
                                                                                 ,@support-lvs)
                                                                           ',rule-name)))))
                     collect
                     (let ((model-support `(logic-variable-maker ,(intern "?model-support")))
                           (support-lv `(logic-variable-maker ,(intern "?output-support")))
                           (output-lv `(logic-variable-maker ,(intern "?output"))))
                       `(defrule ,max-rule-name (:forward)
                          if (predication-maker
                               '(and (predication-maker '(selected-model ,component ,model-name)) :support ,model-support
                                     (predication-maker '(latest-production-time ,output ,component ,output-lv))
                                     :support ,support-lv))
                          then (compute-backward-max-delay ,output-lv ,min ',input ',component
                                                           (list ,model-support ,support-lv)
                                                           ',rule-name))))))
(defun compute-backward-min-delay (output-time other-input-times delay input-name component-name support rule-name)
  (let* ((max-of-other-input-times (loop for input-time in other-input-times maximize input-time))
         (input-constraint (- output-time delay)))
    (when (> input-constraint max-of-other-input-times)
      (tell `[potentially [earliest-arrival-time ,input-name ,component-name ,input-constraint]]
            :justification `(,rule-name ,support)))))

(defun compute-backward-max-delay (output-time delay input-name component-name support rule-name)
  (let ((constraint (- output-time delay)))
    (tell `[potentially [latest-arrival-time ,input-name ,component-name ,constraint]]
          :justification `(,rule-name ,support))))
;;; Basic rules for detecting conflicts and propagating dataflows
(defrule time-inconsistency-1 (:forward)
  if [and [earliest-arrival-time ?input ?component ?early]
          [latest-arrival-time ?input ?component ?late]
          (> ?early ?late)]
  then [ltms:contradiction])
(defrule time-inconsistency-2 (:forward)
  if [and [earliest-production-time ?output ?component ?early]
          [latest-production-time ?output ?component ?late]
          (> ?early ?late)]
  then [ltms:contradiction])

(defrule dataflow-1 (:forward)
  if [and [dataflow ?producer-port ?producer ?consumer-port ?consumer]
          [earliest-arrival-time ?consumer-port ?consumer ?time]]
  then [earliest-production-time ?producer-port ?producer ?time])

(defrule dataflow-2 (:forward)
  if [and [dataflow ?producer-port ?producer ?consumer-port ?consumer]
          [latest-arrival-time ?consumer-port ?consumer ?time]]
  then [latest-production-time ?producer-port ?producer ?time])

(defrule dataflow-3 (:forward)
  if [and [dataflow ?producer-port ?producer ?consumer-port ?consumer]
          [earliest-production-time ?producer-port ?producer ?time]]
  then [earliest-arrival-time ?consumer-port ?consumer ?time])

(defrule dataflow-4 (:forward)
  if [and [dataflow ?producer-port ?producer ?consumer-port ?consumer]
          [latest-production-time ?producer-port ?producer ?time]]
  then [latest-arrival-time ?consumer-port ?consumer ?time])
;;; building the equivalent ideal model
;;; predicates to map over:
;;;   component
;;;   resource
;;;   possible-model
;;;   a-priori-probability-of
;;;   conditional-probability
;;;   set-up-evidence-nodes
;;; the dynamic part:
;;;   selected-model
(defun build-ideal-model ()
  (let ((ideal-diagram nil))
    (setq ideal-diagram (build-nodes ideal-diagram 'component))
    (setq ideal-diagram (build-nodes ideal-diagram 'resource))
    (do-priors ideal-diagram)
    (do-conditional-probabilities ideal-diagram)
    ideal-diagram))

(defun build-nodes (ideal-diagram node-type)
  (ask `[,node-type ?node]
       #'(lambda (stuff)
           (declare (ignore stuff))
           (let ((states nil))
(ask [possible-model ?node ?model-name]
#'(lambda (stuff)
(declare (ignore stuff))
(push ?model-name states)))
(multiple-value-bind (new-diagram parent-node)
(ideal:add-node ideal-diagram
:name ?node
:type :chance
:relation-type :prob
:state-labels states)
(setq ideal-diagram new-diagram)
(when (eql node-type 'component)
(loop for state in states
for new-name = (make-name ?node state)
do (multiple-value-bind (new-diagram child-node)
(ideal:add-node ideal-diagram
:name new-name
:type :chance
:relation-type :prob
:state-labels '(:true :false))
(setq ideal-diagram new-diagram)
;; Add an arc from the parent node to it
(ideal:add-arcs child-node (list parent-node))
;; Set the conditional probabilities on the child evidence node
(ideal:for-all-cond-cases (parent-case parent-node)
(let ((correct-corresponding-case
(eql (ideal:state-in parent-case)
(ideal:get-state-label parent-node state))))
(ideal:for-all-cond-cases (child-case child-node)
(let ((true-case (eql (ideal:state-in child-case)
(ideal:get-state-label child-node :true))))
(if correct-corresponding-case
(setf (ideal:prob-of child-case parent-case)
(if true-case 1 0))
(setf (ideal:prob-of child-case parent-case)
(if true-case 0 1))))))))))))))
ideal-diagram)
(defun do-priors (ideal-diagram)
(ask [a-priori-probability-of ?resource ?model ?probability]
#'(lambda (stuff)
(declare (ignore stuff))
(let* ((ideal-node (ideal:find-node ?resource ideal-diagram))
(state-label (ideal:get-state-label ideal-node ?model)))
(ideal:for-all-cond-cases (case ideal-node)
(when (eql (ideal:state-in case) state-label)
(setf (ideal:prob-of case nil) ?probability)))))))
(defun do-conditional-probabilities (ideal-diagram)
  ;; first add arcs
  (ask [component ?component]
       #'(lambda (stuff)
           (declare (ignore stuff))
           (let ((component-node (ideal:find-node ?component ideal-diagram)))
             (ask [executes-on ?component ?resource]
                  #'(lambda (stuff)
                      (declare (ignore stuff))
                      (let ((resource-node (ideal:find-node ?resource ideal-diagram)))
                        (ideal:add-arcs component-node (list resource-node))))))))
  ;; now build conditional probabilities
(ask [conditional-probability ?component ?c-model ?resource ?r-model ?probability]
#'(lambda (stuff)
(declare (ignore stuff))
(let* ((resource-node (ideal:find-node ?resource ideal-diagram))
(resource-label (ideal:get-state-label resource-node ?r-model))
(component-node (ideal:find-node ?component ideal-diagram))
(component-label (ideal:get-state-label component-node ?c-model)))
(ideal:for-all-cond-cases (c-case component-node)
(when (eql (ideal:state-in c-case) component-label)
(ideal:for-all-cond-cases (r-case resource-node)
(when (eql (ideal:state-in r-case) resource-label)
(setf (ideal:prob-of c-case r-case) ?probability))))))))
(defun make-name (&rest names)
  (intern (format nil "~{~a~^-~}" names)))
(defun make-component-state-invalid (component state diagram)
  (let* ((node-name (make-name component state))
         (node (ideal:find-node node-name diagram)))
    ;; set ideal evidence
    node))
(defstruct (retraction-entry)
clause probability ideal-node component model)
;;; The other thing to do is build a probability model equivalent of the
;;; no-good in which the disallowed set of states conjunctively imply a binary
;;; node whose false state has evidence. This will have the effect that when
;;; one of these states becomes more likely the others will become less
;;; likely. So the true-true-...-true case justifies the true case of the
;;; ideal "contradiction-node" and the other cases justify the false case
;;; equally?
(defun pick-best-guy-to-retract (condition ideal-diagram)
  (let ((assumptions (tms-contradiction-non-premises condition)))
    (let ((retraction-entries
            (loop for clause in assumptions
                  collect (multiple-value-bind (mnemonic assumption) (destructure-justification clause)
                            (declare (ignore mnemonic))
                            (with-statement-destructured (component model) assumption
                              (let* ((node-name (make-name component model))
                                     (ideal-node (ideal:find-node node-name ideal-diagram))
                                     (true-state (ideal:get-state-label ideal-node :true))
                                     (probability nil))
                                (ideal:for-all-cond-cases (node-case ideal-node)
                                  (when (eql (ideal:state-in node-case) true-state)
                                    (setq probability (ideal:belief-of node-case))))
                                (make-retraction-entry :clause clause :probability probability
                                                       :ideal-node ideal-node
                                                       :component component :model model)))))))
      (loop with best = (first retraction-entries)
            for entry in (rest retraction-entries)
            when (< (retraction-entry-probability entry) (retraction-entry-probability best))
            do (setq best entry)
            finally (return best)))))
(defun find-best-other-state (best ideal-diagram)
  (let* ((ideal-node (ideal:find-node (retraction-entry-component best) ideal-diagram))
         (ideal-state (ideal:get-state-label ideal-node (retraction-entry-model best)))
         (component (retraction-entry-component best))
         (best-other-case nil)
         (best-other-probability nil))
    (ideal:for-all-cond-cases (node-case ideal-node)
      (when (and (not (eql ideal-state (ideal:state-in node-case)))
                 (or (null best-other-case)
                     (> (ideal:belief-of node-case) best-other-probability)))
        (setq best-other-probability (ideal:belief-of node-case))
        (setq best-other-case (ideal::label-name (ideal:state-in node-case)))))
    (values `[selected-model ,component ,best-other-case])))
(defun update-ideal-for-unretraction (component model ideal-diagram jensen-tree)
(let ((ideal-node (ideal:find-node (make-name component model) ideal-diagram)))
(when (eql (ideal:node-state ideal-node) (ideal:get-state-label ideal-node :false))
(format t "~%Reconsidering model ~a of component ~a" model component)
(ideal:remove-evidence (list ideal-node))
(ideal:jensen-infer jensen-tree ideal-diagram))))
(defun update-ideal-for-retraction (best ideal-diagram jensen-tree)
;; provide negative evidence for the guy we just removed and resolve the diagram
(let ((ideal-node (retraction-entry-ideal-node best)))
(setf (ideal:node-state ideal-node)
(ideal:get-state-label ideal-node :false))
(ideal:jensen-infer jensen-tree ideal-diagram)))
(defun run-case (ensemble-name input-timings output-timings)
  ;; first set up with each model being the normal one
  (funcall ensemble-name)
  (let* ((ideal-diagram (build-ideal-model))
         (ideal-diagram-jensen-tree (ideal:create-jensen-join-tree ideal-diagram)))
    (ideal:jensen-infer ideal-diagram-jensen-tree ideal-diagram)
    (with-atomic-action
      (loop for (node component time) in input-timings
            doing (tell '[potentially [earliest-arrival-time ,node ,component ,time]]
                        :justification :premise)
                  (tell '[latest-arrival-time ,node ,component ,time]
                        :justification :premise))
      (loop for (node component time) in output-timings
            doing (tell '[potentially [earliest-production-time ,node ,component ,time]]
                        :justification :premise)
                  (tell '[latest-production-time ,node ,component ,time]
                        :justification :premise)))
    (let ((guys-signalled-to-assert nil))
      (condition-bind
          ((ltms:ltms-contradiction
             #'(lambda (condition)
                 (let ((best (pick-best-guy-to-retract condition ideal-diagram)))
                   (update-ideal-for-retraction best ideal-diagram ideal-diagram-jensen-tree)
                   (push (find-best-other-state best ideal-diagram) guys-signalled-to-assert)
                   (let ((clause-to-retract (retraction-entry-clause best)))
                     (multiple-value-bind (mnemonic consequent)
                         (destructure-justification clause-to-retract)
                       (declare (ignore mnemonic))
                       (with-statement-destructured (component model) consequent
                         (format t "~%Retracting Model ~a of component ~a" model component)))
                     (sys:proceed condition :unjustify-subset (list clause-to-retract)))))))
        (loop for first-time = t then nil
              for guys-to-assert = nil then guys-signalled-to-assert
              until (and (not first-time) (null guys-to-assert))
              doing
          (setq guys-signalled-to-assert nil)
          (cond
            (first-time
             (ask [component ?component]
                  #'(lambda (stuff)
                      (declare (ignore stuff))
                      (tell [selected-model ?component normal]
                            :justification :assumption))))
            (guys-to-assert
             (loop for guy-to-assert in guys-to-assert doing
               (with-statement-destructured (component model) guy-to-assert
                 (format t "~%Choosing Model ~a for ~a" model component)
                 (update-ideal-for-unretraction component model ideal-diagram ideal-diagram-jensen-tree)
                 (tell guy-to-assert :justification :assumption))))))))
    ideal-diagram))
;;; Examples of system descriptions
#|
Example #1 Simple hacked resource
(defensemble test-1
:components ((foo :models (normal fast) :inputs (a b) :outputs (c d))
(bar :models (normal slow) :inputs (a b) :outputs (x y)))
:dataflows ((c foo a bar)
(d foo b bar))
:resources ((box-1 (normal .9) (hacked .1)) (box-2 (normal .8) (hacked .2)))
:resource-mappings ((foo box-1) (bar box-2))
:model-mappings ((foo normal box-1 normal .90)
(foo fast box-1 normal .10)
(foo normal box-1 hacked .20)
(foo fast box-1 hacked .80)
(bar normal box-2 normal .90)
(bar slow box-2 normal .10)
(bar normal box-2 hacked .10)
(bar slow box-2 hacked .90)))
(defmodel (foo normal)
((a b) (c d) 2 7))
(defmodel (foo fast)
((a b) (c d) 1 5))
(defmodel (bar normal)
((a) (x) 5 10)
((b) (y) 7 15))
(defmodel (bar slow)
((a) (x) 10 20)
((b) (y) 15 20))
;;; Various test cases with different inputs/outputs to test the system
;; normal case with no faults
;; P(box-1 = hacked) = 0.10
;; P(box-2 = hacked) = 0.20
(run-case 'test-1
'((a foo 10) (b foo 15))
'((x bar 30)))
;; fault in the output of bar, slower than expected
;; the model of bar is changed to slow which solves the contradiction
;; Also the probability of box-2 being hacked, which bar runs on, increases
;; from 0.20 to 0.69, which makes sense and is to be expected
(run-case 'test-1
'((a foo 10) (b foo 15))
'((x bar 34)))
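The reported jump from 0.20 to 0.69 can be checked by hand with Bayes' rule, using the model-mapping numbers above: P(bar = slow | box-2 hacked) = .9 and P(bar = slow | box-2 normal) = .1. A sketch of that arithmetic (Python here only for illustration; the thesis computes this with IDEAL's Jensen propagation):

```python
# Hand-check of the reported posterior, using the model-mapping numbers.
def posterior_hacked(prior, p_obs_given_hacked, p_obs_given_normal):
    # Bayes' rule for a binary hacked/normal resource.
    numerator = p_obs_given_hacked * prior
    return numerator / (numerator + p_obs_given_normal * (1.0 - prior))

print(round(posterior_hacked(0.2, 0.9, 0.1), 2))  # -> 0.69
# The same formula reproduces the 0.10 -> 0.47 jump for box-1 in the next case:
print(round(posterior_hacked(0.1, 0.8, 0.1), 2))  # -> 0.47
```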
;; The fault is in the output of foo, where it comes earlier than expected.
;; The system correctly changes the model of foo to fast, which makes it
;; consistent with the given inputs/outputs.  The probability of box-1 being
;; hacked increases from 0.10 to 0.47.
(run-case 'test-1
'((a foo 10) (b foo 15))
'((c foo 16)))
;; This is a case with no possible solutions, i.e. no combination of selected
;; models for the components will yield a non-contradictory state.
;; Right now the system does not cope with this situation; i.e., it doesn't
;; know when to stop.  There appears to be a problem with the setting of
;; evidence on the Ideal side of things.  Perhaps the evidence should be reset
;; on each iteration of picking a component model to retract.  The problem
;; arises when the code tries to retract a retraction, which on the Ideal side
;; is equivalent to setting evidence for both the True and the False case,
;; which is impossible.
(run-case 'test-1
'((a foo 10) (b foo 15))
'((x bar 43)))
|#
#|
Example #2 Branch
(defensemble test-2
:components ((foo :models (normal fast) :inputs (a) :outputs (b))
(bar :models (normal slow) :inputs (a) :outputs (x))
(baz :models (normal really-slow) :inputs (a) :outputs (y)))
:dataflows ((b foo a bar)
(b foo a baz))
:resources ((resource-1 (normal .9) (hacked .1))
(resource-2 (normal .8) (hacked .2)))
:resource-mappings ((foo resource-1) (bar resource-2) (baz resource-2))
:model-mappings ((foo normal resource-1 normal .90)
(foo fast resource-1 normal .10)
(foo normal resource-1 hacked .20)
(foo fast resource-1 hacked .80)
(bar normal resource-2 normal .90)
(bar slow resource-2 normal .10)
(bar normal resource-2 hacked .10)
(bar slow resource-2 hacked .90)
(baz normal resource-2 normal .90)
(baz really-slow resource-2 normal .10)
(baz normal resource-2 hacked .10)
(baz really-slow resource-2 hacked .90)))
(defmodel (foo normal)
((a) (b) 2 7))
(defmodel (foo fast)
((a) (b) 1 5))
(defmodel (bar normal)
((a) (x) 5 10))
(defmodel (bar slow)
((a) (x) 7 15))
(defmodel (baz normal)
((a) (y) 5 10))
(defmodel (baz really-slow)
((a) (y) 10 17))
;;; Various test cases with different inputs/outputs to test the system
;; normal case with no faults
;;P(resource-1 = hacked) = 0.10
;;P(resource-2 = hacked) = 0.20
(run-case 'test-2
'((a foo 10))
'((x bar 25) (y baz 25)))
;; fault on output of baz, value is higher than tolerable
;; the selected model of baz is changed from normal to really-slow
;; which solves the contradiction.
;; The probability that resource-2 is hacked increases from .2 to .69
(run-case 'test-2
'((a foo 10))
'((x bar 25) (y baz 30)))
;; same as previous but with bar having the fault instead
(run-case 'test-2
          '((a foo 10))
          '((x bar 30) (y baz 25)))
;; both bar and baz have a fault on the outputs.  The result is that both
;; bar and baz have been changed to their *slower* models.
;; The probability that resource-2 is hacked increases from .2 to .95.
;; This makes sense and is to be expected.
(run-case 'test-2
'((a foo 10))
'((x bar 30) (y baz 30)))
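The .2 to .95 figure can likewise be verified by hand: given the state of resource-2, the observations of bar and baz are conditionally independent, so their likelihoods multiply. A sketch of that computation (illustrative Python, not the IDEAL code):

```python
# Hand-check of the .2 -> .95 jump for resource-2 with two faulty components.
def posterior_hacked(prior, likelihoods_hacked, likelihoods_normal):
    # Given the resource state, the per-component observations are
    # conditionally independent, so their likelihoods multiply.
    p_hacked = prior
    p_normal = 1.0 - prior
    for lh, ln in zip(likelihoods_hacked, likelihoods_normal):
        p_hacked *= lh
        p_normal *= ln
    return p_hacked / (p_hacked + p_normal)

# P(resource-2 = hacked | bar = slow, baz = really-slow); each observation
# has likelihood .9 under hacked and .1 under normal:
print(round(posterior_hacked(0.2, [0.9, 0.9], [0.1, 0.1]), 2))  # -> 0.95
```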
;; This case has bar with a slow fault output and baz with a fast fault output.
;; Once again no solution exists and the system doesn't know how to deal with
;; this case.  An interesting note is that the error you get is that the
;; resource-2 node has invalid values; i.e., somehow the evidence given to it
;; causes the belief of the node to be 0.0 0.0, which is an impossible case.
(run-case 'test-2
'((a foo 10))
'((x bar 30) (y baz 15)))
;; Gave a value such that foo must be in the fast case to see what would
;; happen.  Interestingly enough, bar and baz were both set to the *slower*
;; states.  This is because, with the numbers in the model, it is more
;; probable that foo is in the normal state compared to bar and baz.  So bar
;; and baz's states are changed first to see if that solves the contradiction.
;; Not quite what one would ideally want, but it is consistent with what the
;; code is supposed to do.  This is a configuration consistent with the given
;; inputs/outputs, though not necessarily the best or most probable answer.
;; The probability that resource-1 has been hacked increases from .1 to .47.
(run-case 'test-2
'((a foo 10))
'((x bar 25) (y baz 25) (b foo 11)))
|#
#|
Example #3 Branch and Join
(defensemble test-3
:components ((web-server :models (normal peak off-peak) :inputs (query1 query2) :outputs (answer1 answer2))
             (dollar-monitor :models (normal slow) :inputs (update) :outputs (result))
             (yen-monitor :models (normal slow really-slow) :inputs (update) :outputs (result))
             (bond-trader :models (normal fast slow) :inputs (price) :outputs (decision))
             (currency-trader :models (normal fast slow) :inputs (price1 price2) :outputs (decision)))
:dataflows ((answer1 web-server update dollar-monitor)
            (answer2 web-server update yen-monitor)
            (result dollar-monitor price bond-trader)
            (result dollar-monitor price1 currency-trader)
            (result yen-monitor price2 currency-trader))
:resources ((wallst-server (normal .9) (hacked .1))
(jpmorgan-net (normal .85) (hacked .15))
(bonds-r-us (normal .8) (hacked .2))
(trader-joe (normal .7) (hacked .3)))
:resource-mappings ((web-server wallst-server)
(dollar-monitor jpmorgan-net)
(yen-monitor jpmorgan-net)
(bond-trader bonds-r-us)
(currency-trader trader-joe))
:model-mappings ((web-server normal wallst-server normal .6)
(web-server peak wallst-server normal .1)
(web-server off-peak wallst-server normal .3)
(web-server normal wallst-server hacked .15)
(web-server peak wallst-server hacked .8)
(web-server off-peak wallst-server hacked .05)
(dollar-monitor normal jpmorgan-net normal .80)
(dollar-monitor slow jpmorgan-net normal .20)
(dollar-monitor normal jpmorgan-net hacked .30)
(dollar-monitor slow jpmorgan-net hacked .70)
(yen-monitor normal jpmorgan-net normal .6)
(yen-monitor slow jpmorgan-net normal .25)
(yen-monitor really-slow jpmorgan-net normal .15)
(yen-monitor normal jpmorgan-net hacked .05)
(yen-monitor slow jpmorgan-net hacked .45)
(yen-monitor really-slow jpmorgan-net hacked .50)
(bond-trader normal bonds-r-us normal .5)
(bond-trader fast bonds-r-us normal .25)
(bond-trader slow bonds-r-us normal .25)
(bond-trader normal bonds-r-us hacked .05)
(bond-trader fast bonds-r-us hacked .45)
(bond-trader slow bonds-r-us hacked .50)
(currency-trader normal trader-joe normal .5)
(currency-trader fast trader-joe normal .25)
(currency-trader slow trader-joe normal .25)
(currency-trader normal trader-joe hacked .05)
(currency-trader fast trader-joe hacked .45)
(currency-trader slow trader-joe hacked .50)))
(defmodel (web-server normal)
  ((query1) (answer1) 4 8)
  ((query2) (answer2) 5 10))
(defmodel (web-server peak)
  ((query1) (answer1) 7 11)
  ((query2) (answer2) 8 13))
(defmodel (web-server off-peak)
  ((query1) (answer1) 1 5)
  ((query2) (answer2) 2 7))
(defmodel (dollar-monitor normal)
((update) (result) 3 6))
(defmodel (dollar-monitor slow)
((update) (result) 6 10))
(defmodel (yen-monitor normal)
((update) (result) 4 7))
(defmodel (yen-monitor slow)
((update) (result) 7 10))
(defmodel (yen-monitor really-slow)
((update) (result) 10 15))
(defmodel (bond-trader normal)
((price) (decision) 3 7))
(defmodel (bond-trader fast)
((price) (decision) 1 3))
(defmodel (bond-trader slow)
((price) (decision) 6 10))
(defmodel (currency-trader normal)
  ((price1 price2) (decision) 3 7))
(defmodel (currency-trader fast)
  ((price1 price2) (decision) 1 3))
(defmodel (currency-trader slow)
  ((price1 price2) (decision) 6 10))
;;; Various test cases with different inputs/outputs to test the system
;; normal case with no faults
(run-case 'test-3
          '((query1 web-server 10) (query2 web-server 15))
          '((decision bond-trader 25) (decision currency-trader 28)))
;; slow fault on the output of bond-trader.
;; bond-trader is correctly changed from the normal to the slow state.
;; The probability of the resource bonds-r-us being hacked increased from
;; .20 to .32.
(run-case 'test-3
          '((query1 web-server 10) (query2 web-server 15))
          '((decision bond-trader 32) (decision currency-trader 28)))
;; bond-trader was given a fast fault and it sets the bond-trader
;; component to the fast model, which is correct.  The probability of the
;; resource bonds-r-us being hacked increases from .20 to .31.
(run-case 'test-3
          '((query1 web-server 10) (query2 web-server 15))
          '((decision bond-trader 18) (decision currency-trader 28)))
;; I was trying to give both bond-trader and currency-trader slow faults in
;; the hopes of seeing that the resource jpmorgan-net has an increased
;; probability of being hacked.  Unfortunately, this example causes the system
;; to choke because there is no backtracking implemented yet.  Thus it
;; attempts to do a hill-climbing approach, which leads to no possible
;; solution, and it can't handle things afterwards.
(run-case 'test-3
          '((query1 web-server 10) (query2 web-server 15))
          '((decision bond-trader 35) (decision currency-trader 45)))
;; A fix to these problems would definitely have to include a way to be able
;; to backtrack and keep track of what combinations of assumptions you have
;; made/unjustified, so you can redo them if it doesn't solve the problem.
;; Perhaps a new strategy of picking which component to unjustify and change
;; models is required.  This is definitely necessary if we are thinking of
;; handling multiple iterations of retractions.  A simple fix might be to
;; reset the evidence at each iteration to test consistency, then set the
;; evidence given the things that are known to be false.  Thus if a retraction
;; has been retracted, it will hopefully not be in the Joshua database of
;; true/false things, and in essence a form of backtracking is created.  It
;; must be noted though that this is still only a hack and not a true form of
;; backtracking.
|#
Appendix B
comlink-ideal code
This code can be found on the server named WILSON in the MIT Artificial Intelligence
Laboratory in the KBCW group. The file is located on that machine at:
w:>comlink>v-5>kbcw>code>ideal.lisp
(defclass comlink-to-ideal-mapping
()
((forward-map :initform (make-hash-table) :accessor forward-map)
(backward-map :initform (make-hash-table) :accessor backward-map)
(ideal-diagram :accessor ideal-diagram :initform nil)
(issue-node :initform nil :initarg :issue :accessor issue)
(root-node-map :initform (make-hash-table) :accessor root-node-map)))
(defmethod intern-comlink-node ((the-map comlink-to-ideal-mapping) node)
(with-slots (forward-map backward-map root-node-map ideal-diagram) the-map
(let* ((the-ideal-node (gethash node forward-map)))
(cond ((not (null the-ideal-node))
(values the-ideal-node :old))
(t (multiple-value-bind (new-diagram new-node)
(etypecase node
(compound-document-record
(ideal:add-node ideal-diagram
:name (gensym "NODE")
:state-labels '(:false :true)
:type :chance
:relation-type :prob
:noisy-or nil))
(basic-document-record
(when (and (null (follow-link node :supports :backward))
(null (follow-link node :denies :backward)))
(setf (gethash node root-node-map) nil))
(ideal:add-node ideal-diagram
:name (gensym "NODE")
:state-labels '(:false :true)
:noisy-or t
:noisy-or-subtype :binary)))
(setq the-ideal-node new-node ideal-diagram new-diagram)
(setf (gethash node forward-map) the-ideal-node
(gethash the-ideal-node backward-map) node)
(values the-ideal-node :new)))))))
(defun build-an-ideal-structure (issue-node)
(let* ((the-map (make-instance 'comlink-to-ideal-mapping))
(forward-map (forward-map the-map)))
(with-db-transaction
;; Pass 1:
;; First traverse the graph and get every node interned.
;; We only queue up document children if interning them tells us this is
;; the first time we're seeing it.
(labels ((do-a-node (node)
(multiple-value-bind (ideal-node old-or-new?) (intern-comlink-node the-map node)
(declare (ignore ideal-node))
(when (eql old-or-new? :new)
(typecase node
;; it's important that this goes first because
;; compound-document-record is a sub-class of basic-document-record
(compound-document-record
(loop for child in (component-document-records node)
doing (do-a-node child)))
(basic-document-record
(loop for child in (follow-link node :supports :backward)
doing (do-a-node child))
(loop for child in (follow-link node :denies :backward)
doing (do-a-node child)))))))
(loop for node in (follow-link issue-node :is-a-hypothesis-about :backward)
do (do-a-node node)))
;; Pass 2:
;; Now for each link between nodes build the ideal arcs.  Since we
;; know all the nodes are in the hash table, we can iterate over
;; that rather than walking the graph.
(loop for comlink-node being the hash-keys of forward-map using (hash-value ideal-node)
do (typecase comlink-node
;; see above about ordering
(compound-document-record
(ideal:add-arcs ideal-node
(loop for child in (component-document-records comlink-node)
for ideal-child = (intern-comlink-node the-map child)
collect ideal-child)))
(basic-document-record
(let ((ideal-children nil))
(loop for child in (follow-link comlink-node :supports :backward)
doing (push (intern-comlink-node the-map child) ideal-children))
(loop for child in (follow-link comlink-node :denies :backward)
doing (push (intern-comlink-node the-map child) ideal-children))
(ideal:add-arcs ideal-node ideal-children)))))
;; Pass 3:
;; set up the probabilities
(loop for comlink-node being the hash-keys of forward-map using (hash-value ideal-node)
do (typecase comlink-node
;; see above about ordering
(compound-document-record (set-up-and-node comlink-node ideal-node the-map))
(basic-document-record
(if (null (ideal:node-predecessors ideal-node))
(set-up-root-node comlink-node ideal-node the-map)
(set-up-noisy-or-node comlink-node ideal-node the-map))))))
the-map))
(defun set-up-and-node (comlink-node ideal-node the-map)
  (declare (ignore comlink-node the-map))
  (ideal:for-all-cond-cases (cond-case (ideal:node-predecessors ideal-node))
    (let ((all-true (loop for remaining-cases on cond-case
                          for next-node = (ideal:node-in remaining-cases)
                          for next-state = (ideal:state-in remaining-cases)
                          always (eql next-state (ideal:get-state-label next-node :true)))))
      (ideal:for-all-cond-cases (node-case ideal-node)
        (let ((true-node-case (eql (ideal:state-in node-case)
                                   (ideal:get-state-label ideal-node :true))))
          (if all-true
              (setf (ideal:prob-of node-case cond-case)
                    (if true-node-case 1 0))
              (setf (ideal:prob-of node-case cond-case)
                    (if true-node-case 0 1)))))))
  ideal-node)
(defun set-up-noisy-or-node (comlink-node ideal-node the-map)
  (loop for link in (get-document-backward-links comlink-node (kbcw::supports-link))
        for source = (%document-link-source link)
        for ideal-predecessor = (intern-comlink-node the-map source)
        for predecessor-false-label = (ideal:get-state-label ideal-predecessor :false)
        for probability = (%document-link-certainty-factor link)
        for inhibition-value = (- 1 (/ (float probability) 100.0))
        do (ideal:for-all-cond-cases (case ideal-predecessor)
             ;; Notice that since this uses a node, not a list of nodes,
             ;; each cond-case will be a list consisting of a single pair.
             (when (eql (ideal:state-in case) predecessor-false-label)
               (setf (ideal:inhibitor-prob-of ideal-node case) inhibition-value))))
  (ideal:compile-noisy-or-distribution ideal-node))
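The inhibitor arithmetic above follows the standard binary noisy-OR model: a supports-link with certainty factor p (a percentage) contributes an inhibition probability q = 1 - p/100, and the node comes out false only if every true parent is independently inhibited. A sketch of the resulting distribution (illustrative Python; in the thesis IDEAL compiles this itself):

```python
# Sketch of the binary noisy-OR distribution the inhibitor values feed into:
# the node is false only if every true parent is independently inhibited.
def noisy_or_true(parent_states, inhibitions):
    p_false = 1.0
    for is_true, q in zip(parent_states, inhibitions):
        if is_true:
            p_false *= q
    return 1.0 - p_false

# Two supporting links with certainty factors 80% and 50%:
inhibitions = [1 - 80 / 100.0, 1 - 50 / 100.0]  # q = 0.2 and 0.5
print(round(noisy_or_true([True, True], inhibitions), 2))  # -> 0.9
```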
(defun set-up-root-node (comlink-node ideal-node the-map)
  ;; This is for the root of the comlink evidence chain.
  ;; Create another node linked to this one by a conditional probability
  ;; representing our intended belief in the node.  If we provide evidence
  ;; for this node then it will have the desired effect.
  (with-slots (ideal-diagram root-node-map) the-map
    (loop for inhibition-value in '(.1 .2 .3)
          for new-node = (multiple-value-bind (new-diagram new-node)
                             (ideal:add-node ideal-diagram
                                             :name (gensym "NODE")
                                             :state-labels '(:false :true)
                                             :type :chance
                                             :relation-type :prob
                                             :noisy-or nil)
                           (setq ideal-diagram new-diagram)
                           new-node)
          for new-guys-false-label = (ideal:get-state-label new-node :false)
          do (ideal:add-arcs ideal-node (list new-node))
             (push (cons inhibition-value new-node) (gethash comlink-node root-node-map))
             (ideal:for-all-cond-cases (case new-node)
               ;; Notice that since this uses a node, not a list of nodes,
               ;; each cond-case will be a list consisting of a single pair.
               (when (eql (ideal:state-in case) new-guys-false-label)
                 (setf (ideal:inhibitor-prob-of ideal-node case) inhibition-value))))))
(defun decide-on-evidence (map)
  (with-slots (root-node-map) map
    (loop for comlink-document being the hash-keys of root-node-map using (hash-value choices)
          for prob = (progn (print comlink-document)
                            (scl:accept '(dw:member-sequence (.3 .2 .1))))
          for node = (cdr (assoc prob choices :test #'=))
          do (loop for (prob . node) in choices
                   do (setf (ideal:node-state node) (ideal:get-state-label node :false)))
             (setf (ideal:node-state node) (ideal:get-state-label node :true)))))
Bibliography
[1] Eugene Charniak. Bayesian networks without tears. AI Magazine, 1991.
[2] G. F. Cooper. Probabilistic inference using belief networks is NP-hard. Medical Computer Science Group, Stanford University, 1987.
[3] Bruce D'Ambrosio. Inference in Bayesian networks. AI Magazine, 20(2):21-35, 1999.
[4] Randall Davis and Walter Hamscher. Model-based reasoning: Troubleshooting. In Howard E. Shrobe, editor, Exploring Artificial Intelligence, chapter 8, pages 297-346. Morgan Kaufmann Publishers, Inc., San Mateo, California, 1988.
[5] Johan de Kleer and Brian C. Williams. Diagnosis with behavioral modes. Knowledge Representation, pages 1324-1330, 19.
[6] Peter Haddawy. An overview of some recent developments in Bayesian problem-solving techniques. AI Magazine, 20(2):11-19, 1999.
[7] Howard Shrobe, Jon Doyle, and Peter Szolovits. Active trust management (ATM) for autonomous adaptive survivable systems (AASS's). Proposal for ARPA BAA 99-10, 1999.
[8] M. P. Wellman, M. H. Eckman, C. Fleming, et al. Automated critiquing of medical decision trees. Medical Decision Making, 9:272-84, 1989.
[9] Judea Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann Publishers, Inc., 1988.
[10] G. Shafer. A Mathematical Theory of Evidence. Princeton University Press, 1976.
[11] Peter Struss and Oskar Dressler. Physical negation - integrating fault models into the general diagnostic engine. Knowledge Representation, pages 1318-1323, 19.