Integration of a Bayesian Net Solver With the KBCW Comlink System and a Network Intrusion Diagnosis System

by Erwin Tam

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY, September 1999.

© Erwin Tam, MCMXCIX. All rights reserved. The author hereby grants to MIT permission to reproduce and distribute publicly paper and electronic copies of this thesis document in whole or in part.

Author: Department of Electrical Engineering and Computer Science, August 25, 1999

Certified by: Howard E. Shrobe, Professor, Thesis Supervisor

Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Students

Submitted to the Department of Electrical Engineering and Computer Science on August 25, 1999, in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science

Abstract

This thesis demonstrates the practical aspects of applying probabilistic methods, namely Bayesian nets, to a variety of applications. Uncertainty comes into play when dealing with real-world applications. Rather than trying to solve these problems exactly, a probabilistic approach is more feasible and can still provide useful results. For this project, a Bayes net solver was integrated with a network intrusion detector, with resource stealing as the primary attack focused upon. A Bayes net solver was also integrated into the Knowledge Based Collaboration Webs (KBCW) Comlink system.
The results show that such an integration between different ideas can be both useful and productive, providing functionality that neither alone could accomplish. The project is a successful demonstration of the potential benefits that Bayesian nets can provide.

Thesis Supervisor: Howard E. Shrobe
Title: Professor

Acknowledgments

I would like to thank Professor Howard E. Shrobe for his endless patience and support. Without him, there could have been no thesis. His understanding and consideration helped me get through tough times both academically and personally. I can honestly say that at least half of the credit for finishing this thesis should be attributed to him. My eternal thanks and gratitude go out to him.

I would also like to thank the other members of the KBCW group in the AI Lab who helped me with my daily questions, always with a friendly ear. Getting work done and trying to find research ideas was never easy, but it was so much better being able to draw upon the creative ideas and suggestions of everyone in the lab.

Lastly I would like to thank my family and close friends who have kept me sane enough emotionally to maintain the focus needed to finally finish this thesis. I needed some help in overcoming this hurdle and for that, I am thankful. Times can be tough, life can seem gloomy, and the last thing on one's mind is work. Without the love and support of those around you, things would be nearly impossible to cope with alone. My love goes out to them. Thanks and God Bless.

Contents

1 Introduction
  1.1 Uncertainty
  1.2 Probabilistic Models
  1.3 Applications and Scope
  1.4 Objectives
    1.4.1 MBT for Network Intrusion
    1.4.2 KBCW Comlink System
  1.5 Roadmap
2 Bayesian Nets
  2.1 History
  2.2 Description
    2.2.1 Simple Example
    2.2.2 Independence Assumptions
    2.2.3 Consistent Probabilities
    2.2.4 Exact Solutions
    2.2.5 Approximate Solutions
  2.3 Advantages
    2.3.1 Computation
    2.3.2 Structure
    2.3.3 Human reasoning
  2.4 Disadvantages
    2.4.1 Scaling
    2.4.2 Probability values
    2.4.3 Conflicting model
3 Model Based Troubleshooting
  3.1 Introduction
  3.2 Basic Task
  3.3 Alternate Approaches
    3.3.1 Diagnostics
    3.3.2 Fault Dictionaries and Diagnostics
    3.3.3 Rule based Systems
    3.3.4 When not to use the model-based approach
  3.4 Three Fundamental Tasks
    3.4.1 Hypothesis Generation
    3.4.2 Hypothesis Testing
    3.4.3 Hypothesis Discrimination
  3.5 Conclusion
4 Design
  4.1 Problem Statement
  4.2 Process modeling
  4.3 Input/Output Black Box modeling
  4.4 Fault modeling
  4.5 Probabilistic Integration
5 Implementation
  5.1 Linear Process
  5.2 Branch/Fan process
  5.3 Branch and Join
6 Conclusion
  6.1 System critique
  6.2 System limitations
    6.2.1 Lack of correctness detection
    6.2.2 Lack of probabilistic links between components
    6.2.3 Lack of descriptive model states
  6.3 Future work
  6.4 Lessons learned
7 KBCW Comlink System
  7.1 Description
  7.2 Integrating a Bayes Net Solver
  7.3 Implementation
    7.3.1 Example
  7.4 Conclusion
    7.4.1 System critique
    7.4.2 Lessons learned
  7.5 Future work
    7.5.1 Sensitivity Analysis
    7.5.2 Multiple Viewpoints
8 Summary
A delay-simulator code
B comlink-ideal code

List of Figures

4-1 A simple sample component.
4-2 A second example of a component module.
4-3 Completely specified component with probabilistic model included.
5-1 Linear process model.
5-2 Linear process apriori probabilistic model.
5-3 Linear process post evidential probabilistic model.
5-4 Branch/Fan process model.
5-5 Branch/Fan process probabilistic model.
5-6 Branch and Join model.
5-7 Branch and Join probabilistic model.
7-1 The Burger Problem before any evidence.
7-2 Cheap cream cheese developed.
7-3 Germany goes bankrupt and liquidates pickled cabbage to the world.
7-4 McDonald's survey shows customers value variety the most.
7-5 Medical results say that you need to eat meat.
7-6 Estimated cost of pickled cabbage quadruple burger too high.
7-7 New McDonald's survey also shows that customers love the cheeses.
7-8 Final result putting together all the evidence.
2-1 A simple causal graph network.
2-2 Fully specified Bayesian network.

Chapter 1

Introduction

1.1 Uncertainty

Uncertainty is a central motivating force behind many artificial intelligence topics. How can we make intelligent decisions in the face of limited data? One might say that learning itself is merely a process of reducing uncertainty in our world. How can we ever be entirely certain of something? Given that we know only a small, limited amount of information, what conclusions or hypotheses can we reason about from it? How likely are these conclusions to be true? There are many situations where tough decisions need to be made. They usually involve a subjective process whereby one person's experience and intuition guide the decision making. Ideally, we'd like to have a more formal process, especially when confronted with a lot of data or varying opinions from many people.
How much do we really believe something to be true, and what effect does our level of belief have on the conclusions we come up with? To be able to answer this, we need some model of uncertainty that allows a systematic procedure for chaining together information and coming up with a viable hypothesis.

Understanding uncertainty and how the human brain deals with it is a very important question. Right now researchers have no idea how to model the way the human senses work or the way the brain works. We know that the senses send signals containing information to the brain, but how this data is processed is still a mystery. Like all things, data is often incomplete, yet the human mind is capable of piecing things together to form a complete picture. Is it possible that by learning more about this mysterious process, we can garner practical knowledge that can be readily applied to other fields and problems? The answer appears to be a resounding "yes". There are many applications that have to deal with limited data yet still need to generate some form of useful representation or hypothesis from it. Coming up with a model that can handle such conditions has wide-ranging applicability.

This is an area of research that is starting to expand. As computer technology gets faster and better, we should also come up with smarter and better ways to utilize such computing power. Even if we eventually reach the raw computing power of the human brain, if we cannot make efficient use of it, the outcome will be severely limited.

1.2 Probabilistic Models

One of the most useful models for dealing with uncertainty is the probabilistic model. Probability theory is a well understood and mature field. When dealing with unknown quantities and attempting to combine and reason with them, probability theory is an obvious choice for the purpose. Probability theory has had a resurgence in the past decade and is currently the most popular tool for analyzing uncertainty.
This was due mainly to increased computing power and newer models and algorithms for probability theory. There are other formalisms for reasoning about uncertainty, such as the Dempster-Shafer theory of evidence[10] or the large body of work based on Zadeh's fuzzy set theory[8]. However, the model of choice that will be used here is the probabilistic model.

Analyzing uncertainty using formal probability theory provides us with a structured and accurate method to handle evidence and partial information. It allows us to merge together sources of information with imperfect reliability to generate hypotheses, each with its own relative likelihood or probability of being true. Probability theory is a well understood idea that is proven to be a correct way of dealing with uncertainty. This is why we decided to use the probabilistic model.

1.3 Applications and Scope

There are three main areas where uncertainty and probability theory can be applied: diagnosis, data interpretation, and decision making.

Diagnosis is a broad area that covers medical decision making and model based troubleshooting for systems. It is widely used in medicine for making decisions about patient treatment, interpreting test results, and helping patients understand the rationale behind alternative treatments. This fast-growing area is known as medical informatics. Model based troubleshooting is another form of diagnosis. Here the problem deals with some system that can fail for a number of reasons. The objective is to diagnose the condition and reason about possible faults given observations about how the system is behaving, or misbehaving as is usually the case. This form of diagnosis began in manufacturing, where faults in circuits needed to be diagnosed to find out where the problem lay. Since there is a limited supply of experts who can properly troubleshoot a given system, automated troubleshooting is a viable and very desirable goal to attain.
The range of applications this applies to is immense, given the vast amounts of complex machinery and technology available, which often break down.

Data interpretation is widely used by the military when they have to interpret intelligence data and determine the likelihood of different hypotheses. This is critical in evaluating the capabilities of the enemy in order to make smarter, more strategic decisions. Gathering data is often a very costly process in both financial and human resources. As such, one would like to find the optimal ratio of value of information to cost of information. There is a wide range of often incomplete data, and the military strategist needs to be able to piece together all of the information in order to get an idea of what the data is suggesting. Having a systematic approach to data interpretation would take a lot of the guesswork out of this process. In turn it can help to minimize the effects of human error. This is a common problem, as demonstrated by the lack of military tact sometimes displayed by governments, ours included.

The final application area deals with collaborative decision making. Oftentimes when a large group of people is trying to come up with a decision, they need to decide between various alternatives. There is no systematic process of debating. Moreover, there is no structured quantitative way of weighing arguments, including how they might support or deny another statement. Without a systematic way to piece together arguments and other data, the decision making process can become very disorganized and non-optimal. The "might makes right" approach, whereby the top CEO or head of the group makes the final decision, isn't ideal. None of the ideas or arguments of the others are taken into account. If it happens that someone has a really good idea but is very low on the corporate ladder, their idea will get squashed by someone higher up even though it is a superior idea.
This is the type of problem we are trying to solve when using probabilistic methods in collaborative decision making.

1.4 Objectives

The goal of this thesis is to demonstrate the feasibility of applying probabilistic methods towards solving two applications. The first one deals with model based troubleshooting of network intrusion detection. The second deals with collaborative decision making, namely integrating probabilistic functionality into the KBCW Comlink system. I will talk more about each of these two projects in the following sections. The overall objective is to integrate probability into different domains to show that probability allows for very useful functionality. These two domains are just a small sample of the many wide-ranging applications that probabilistic methods can be incorporated in. The success of these two projects will demonstrate that this is an important emerging idea whose potential is not yet fully realized.

1.4.1 MBT for Network Intrusion

The first task/application to be dealt with is model based troubleshooting for network intrusion. Network intrusion is a big problem, especially with today's distributed systems. Hackers can attack the system in many ways. One difficult problem is resource stealing. This is a hard problem because it is passive: it is difficult to know if resources are being stolen since nothing really goes wrong. The only visible sign of a problem is that processes that use the same resource may run a bit slower. Since the loads on resources vary throughout the day and are not constant, it is difficult to monitor and determine definitively whether there is indeed a problem. What we have done is to model a network of processes and resources and allow timing information to propagate throughout the system. That is, we model the inputs and outputs of processes and how they flow between components in the system while making note of arrival and departure times of the input/output.
Together with prior knowledge of the range of times it usually takes for a process to complete, we will be able to monitor and troubleshoot whether there seems to be a resource stealing problem. Thus we will know if the network has been compromised and, more specifically, which particular resource is likely to be compromised. This is a self-monitoring probabilistic system that is a good beginning point for a network traffic monitoring/troubleshooting system. The bulk of the thesis will be focused on this problem.

1.4.2 KBCW Comlink System

The second task/application deals with the KBCW Comlink System. The Knowledge Based Collaboration Webs Comlink System allows people from all over the net to communicate in a collaborative forum. There they can voice their opinions/arguments about topics to be debated upon. Currently the system allows arguments to be linked together. That is, there is a graphical structure to the way in which a person's argument can either agree/disagree or support/deny another statement given by another person. This system allows people from different locations to come to a group decision based on the arguments of each person. However, one thing currently lacking is a way in which to quantify things. That is, how much does one statement/argument support another? Do certain claims/hypotheses seem more likely to be true than others? What we did was to integrate a probabilistic system into the Comlink system to provide this functionality. This would give Comlink the ability to more precisely quantify how likely certain hypotheses are. In a collaborative decision making process, this would allow the system to come up with a relative likelihood for all of the competing ideas. It would provide a systematic way to distinguish between them to see which is more likely to succeed given all of the arguments given by the group.
1.5 Roadmap

A roadmap of the direction this thesis takes will now be given. First the probabilistic model of choice, Bayesian nets, will be described and discussed briefly to provide the reader with a good basic idea of what Bayesian nets are and what they can do. Next a quick summary of the field of model based troubleshooting will be given to allow the reader to obtain a good idea of what the issues and problems are in the area of troubleshooting. This section is included for completeness and is in no way meant to replace other superior sources on the topic. Next the problem of the network intrusion troubleshooter will be discussed, with an explanation of how things are modeled and what functionality is available. Several walk-through examples of the system will be given to demonstrate what it can do. A critique of the system and its limitations will then be discussed, along with issues such as future work and important lessons learned from designing and implementing such a system. Next I will talk about the second project, dealing with the integration of a Bayesian net solver with the KBCW Comlink system. Issues such as design, implementation, critique, limitations, and future work will be discussed. Finally a summary of the relative success of this project will be given, talking about what went wrong and what went well.

Chapter 2

Bayesian Nets

2.1 History

A Bayesian network is a compact, expressive representation of uncertain relationships among parameters in a domain[3]. Bayesian networks are based on the probability theorem named after the Reverend Bayes. Bayes' Theorem is a very powerful tool for determining how to use evidence contained in data to determine the likelihood of hypotheses. It can be shown to be the only coherent way to pass from specific evidence to general hypotheses. In other words, it is a method for combining evidence which is provably correct and well understood formally. Thus it is of great value when combining evidence and choosing between competing hypotheses.
Bayes' theorem allows one to update the probability of a hypothesis given new information or evidence. Bayes' Theorem is:

P(H|E) = P(E|H) P(H) / P(E)

Here, P(H) is the prior probability of a hypothesis H. P(H|E) is the posterior probability of the hypothesis, given that we observed evidence E. This allows us to constantly update the probability of a hypothesis in the face of new evidence. Since this is an example where new information changes our degree of certainty or belief in an event, it may also be considered a type of learning. Reducing uncertainty about a topic is essentially equivalent to learning more about the topic.

In Bayes' original paper, published only after his death, he wanted to figure out how to take the usual deduction of the probability of a specific result given a general hypothesis, P(E|H), and turn it around, showing how to pass from a specific evidential result to the probability of the general case, P(H|E). Bayes was a nonconformist minister, and he developed his theory as a formal means of arguing for the existence of God. The role of evidence in this argument is taken by the occurrence of miracles and other manifestations of God's good works. The two hypotheses he was comparing are the affirmation and denial of God. Bayes wanted to prove formally that it was more likely that God existed than the contrary.

Probability theory, including Bayes' theorem, is the oldest and best understood theory for representing and reasoning about situations involving uncertainty. However, early artificial intelligence efforts at applying probability theory were unsuccessful and disappointing. The main quip against probability theory was the complaint that those who were worried about numbers were missing the main point: that structure was the key, not the numeric details. At that time, the only way to use probability theory in solving these problems was through the joint probability distribution, also known as the JPD.
For domains described by a set of discrete parameters, the size of the JPD and the complexity of reasoning with it directly can both be exponential in the number of parameters[3]. The method was provably correct but infeasible to use. There was no efficient model that made it simpler or easier to understand or reason about quickly. Moreover, it was nearly impossible, computationally, to use the probabilistic method reasonably. Calculations simply took an exorbitant amount of time, even for domains that contained only a small but reasonable number of parameters.

One approach to simplify things was the naive Bayes model. This model assumes that the probability distribution for each observable parameter depends only on the root cause and not on the other parameters. This simplification allowed tractable reasoning and computation. However, the model was too extreme and oversimplified; it did not provide the desired results. Thus, given the computing power and probability models at that time, probabilistic methods were not a feasible way to deal with uncertainty problems.
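Before turning to the developments that changed this, Bayes' theorem itself can be made concrete with a small numeric sketch. The numbers below are hypothetical, chosen purely to illustrate the mechanics of updating a prior with evidence:

```python
# Hypothetical illustration of Bayes' theorem: P(H|E) = P(E|H) P(H) / P(E).
# Suppose hypothesis H has a prior of 0.01, and evidence E is observed
# with probability 0.95 when H holds and 0.05 when it does not.
p_h = 0.01                   # prior P(H)
p_e_given_h = 0.95           # likelihood P(E|H)
p_e_given_not_h = 0.05       # likelihood P(E|~H)

# Total probability of the evidence: P(E) = P(E|H)P(H) + P(E|~H)P(~H).
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Posterior: the updated belief in H after observing E.
p_h_given_e = p_e_given_h * p_h / p_e
print(round(p_h_given_e, 3))  # prints 0.161
```

A single observation moves the 1% prior up to about 16%; repeated applications of this same update rule are what allow chains of evidence to be combined.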
First that it was structure that was important and not just the numeric details in the domain. Second that it was intractable to reason about and compute with probability models. The current work is in developing better algorithms to speed up belief updating in Bayesian nets. This is currently a very active field and has helped to grow the importance of Bayesian nets in the Al community as well as in industry. The Bayesian network formalism is the single development most responsible for progress in building practical systems capable of handling uncertain information[6]. The first book on Bayesian networks was published by Pearl in 1988[9]. The reader is referred to this text as it is regarded as the authority in Bayesian nets. A Bayesian network is a directed acyclic graph that represents a probability distribution. Nodes represent random variables, and arcs represent probabilistic correlation between the variables. The types of path and lack thereof between variables indicate probabilistic independence. Quantitative probability information is specified in the form of conditional probability tables. For each node, the table specifies the probability of each possible state of the node given each possible combination of states of its parent nodes. The tables for root nodes just contain unconditional probabilities. This formalism is very intuitive to reason with. It has been claimed that the human mind 16 works in very similar ways when reasoning about uncertainty. Since this model is theorized to be similar to how we think, it is easy and intuitive to use. The important feature of Bayesian networks is the fact that they provide a method for decomposing a probability distribution into a set of local distributions. The independence semantics associated with the network topology specifies how to combine these local distributions to obtain the complete joint-probability distribution over all the random variables represented by the nodes in the network. 
works in very similar ways when reasoning about uncertainty. Since this model is theorized to be similar to how we think, it is easy and intuitive to use.

The important feature of Bayesian networks is the fact that they provide a method for decomposing a probability distribution into a set of local distributions. The independence semantics associated with the network topology specifies how to combine these local distributions to obtain the complete joint probability distribution over all the random variables represented by the nodes in the network. The only probability values that need to be specified for each node are the conditional probabilities of the node being in each of its possible states given all combinations of its parents and their respective states.

The structure of Bayesian networks allows for three important features. First, naively specifying a joint probability distribution with a table requires a number of values exponential in the number of variables; if the graph is sparse, however, the number of values needed is drastically reduced, and as the network gets larger the savings become very substantial. Second, there are efficient inference algorithms which work by transmitting information between the local distributions rather than working with the full joint distribution. In essence, the algorithms divide the graph into several smaller pieces, solve each smaller piece, and combine the results to get the final answer. Much quicker computation can come about by using such optimizing strategies. Third, the separation of the qualitative structure of the domain and variables from the quantitative specification of the relative strengths of influence between variables is extremely beneficial. This breaks the problem of modeling a domain into two distinct stages, which makes the knowledge engineering task much easier and more tractable. The first step is coming up with the qualitative structure of the graph to see how the variables in the domain influence each other. The second step is coming up with the numbers and quantifying the strengths of these relationships. Once both steps are complete, we are left with a model that completely specifies the joint probability distribution in an intuitive and easy to use graphical form.

2.2 Description

2.2.1 Simple Example

Now that we have discussed what a Bayesian net is theoretically and why it is so useful, we will go through a simple example to demonstrate how a Bayesian net works.
The best way to learn about Bayesian nets is to see an example of one. Note that the example discussed below is borrowed from Charniak's paper[1], as it is a good simple example of a Bayesian net. The following will briefly summarize some of the issues and ideas discussed in his paper.

Bayesian nets are best at modeling situations where causality plays a role but where our understanding of what is actually going on is incomplete, so we need to describe things probabilistically. Suppose that when I go home, I will only take out my keys to open the door if my family is out. Otherwise, if I know that they're in, I would just ring the doorbell. In my house there is a light that my family turns on whenever they leave the house. However, the light is also turned on if my family is expecting guests, though this doesn't happen all the time. I also have a dog at home, and he is outside in the yard sometimes. Whenever nobody is at home, the dog is put in the backyard. The same thing also happens if the dog has bowel problems; nobody at home wants the dog in the house if that is the case. If my dog is outside in the backyard, he will bark from time to time, and there is a chance that I will hear him bark when I get home.

This is the situation when I get home. Since I am lazy and don't want to expend the effort of taking out my keys if I don't have to, I'd like to be able to figure out if my family is home or not. The two observations I can make are whether the light is on and whether I hear the dog barking. I can't see if the dog is actually outside in the backyard since it is in the back of the house. I could walk back there and check, though that would defeat the purpose of conserving my energy. So based on my two observations, what conclusion can I draw about whether or not my family is home? This situation is depicted in figure 2-1. The nodes in the graph signify random variables, which can be thought of as states of affairs.
Each of the variables can have a multitude of possible values. In our case, they are binary, either true or false. In the more general case, each node or random variable can be N-ary, having N discrete states. Bayesian nets also extend to non-discrete, or continuous, states. This is an interesting area of research, though for most practical everyday cases it isn't as good a model as discrete states.

Figure 2-1: A simple causal graph network.

The directed arcs in the graph signify causal relationships between variables. For example, if my family is out, then it has a causal effect on the light being turned on. Similarly, if my dog has bowel problems, then that directly affects the likelihood of the dog being put out in the backyard. The important thing to note is that the causal connections are not absolute. If the dog is out in the backyard, that doesn't mean that I will definitely hear him bark. Sometimes he might be sleeping in the backyard or is just being a good quiet dog and isn't making a ruckus. In any case, this is the first stage of using Bayesian nets to model a real world problem with uncertainty. Note that no numbers have been quantified yet, though much information can be obtained through the qualitative structure of the model.

The arcs in the Bayesian network specify the independence assumptions that must hold between the random variables. Nodes that are not connected by arcs are conditionally independent of each other. For example, suppose that I have observed that my dog is out in the backyard. I went around back and checked, and the dog was there. Now that I know for sure that the dog is out in the backyard, it makes no difference whether he has bowel problems or the family is out when determining if I will hear him bark or not. These variables are conditionally independent. The next step involves specifying the probability distribution of the Bayesian network.
In order to do this, one must first give the prior probabilities of all root nodes (nodes with no predecessors) and the conditional probabilities of all the other nodes given all possible combinations of their direct predecessors, or parent nodes. Figure 2-2 shows a completely specified Bayesian network. Now that the model is complete, we can deal with evidence. For example, let's say that I observe the light to be on and I don't hear the dog barking. These nodes are then set to be in a specific state with definite probability. I can calculate the conditional probability of family-out given these pieces of evidence. This is known as evaluating the Bayesian network given the evidence. As more evidence comes in, the belief in each node changes. It is important to note that it isn't the probabilities of the nodes themselves that are changing; what is changing is the conditional probability of the nodes given the emerging evidence. In this case, belief is defined as the conditional probability given the evidence.

2.2.2 Independence Assumptions

One important feature of Bayesian nets is the implied independence assumptions. This feature saves a lot of computation for sparse graphs. One objection to the use of probability theory is that the complete specification of a probability distribution requires an exponential number of values: if there are n binary random variables, the complete distribution is specified by 2^n - 1 joint probabilities. Thus the complete distribution for figure 2-2 would require 31 values, yet we only needed to specify 10. If we doubled the size of our example by grafting a copy of the graph onto the existing graph, the number of values needed to specify the joint probability distribution would be 2^10 - 1 = 1023, but we would only need to give 21 with the Bayesian net formalism.
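As a quick check of the counts above, each node contributes 2^(number of parents) conditional probabilities. A minimal sketch, with the node set and parent lists taken from figure 2-2:

```python
# Parent lists for the family-out network of figure 2-2.
parents = {"fo": [], "bp": [], "lo": ["fo"], "do": ["fo", "bp"], "hb": ["do"]}

# Each node needs one conditional probability per combination of its parents.
bayes_net_count = sum(2 ** len(p) for p in parents.values())

# The full joint over n binary variables needs 2^n - 1 numbers.
full_joint_count = 2 ** len(parents) - 1

print(bayes_net_count, full_joint_count)  # -> 10 31
```

Doubling the graph to ten nodes pushes the full joint to 2^10 - 1 = 1023 values, while the per-node counts grow only linearly.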
The savings we get, as opposed to the brute-force probabilistic method, come from the independence assumptions implied in the graph.

Figure 2-2: Fully specified Bayesian network. P(fo) = .15, P(bp) = .01; P(do | fo bp) = .99, P(do | fo !bp) = .90, P(do | !fo bp) = .97, P(do | !fo !bp) = .3; P(lo | fo) = .6, P(lo | !fo) = .05; P(hb | do) = .7, P(hb | !do) = .01.

See Charniak[1] or Pearl[9] for a mathematically precise definition of dependence and independence in Bayesian networks.

2.2.3 Consistent Probabilities

One problem with a naive probabilistic scheme is inconsistent probabilities: the individual conditional probabilities may each seem legitimate, but when you combine them, you can get probabilities which are not consistent, i.e. probabilities greater than 1. There is no such problem with a Bayesian network. Bayesian networks provide consistent probabilities and are provably equivalent to a full joint probability distribution. The numbers specified by the Bayesian network formalism define a single unique joint distribution. Furthermore, if the numbers for each local distribution are consistent, then the global distribution is consistent. A short proof of this claim is found in Charniak[1] or Pearl[9].

2.2.4 Exact Solutions

The basic computation on a belief network is the computation of every node's belief given the evidence that has been observed so far. This updating process is also known as belief propagation. One of the biggest constraints on the use of Bayesian networks is that, in general, this computation is NP-hard[2]. The exponential time limitation often shows up in real-world Bayesian net models. This is a real issue, since many real-world problems we would like to solve take an unacceptable amount of time to evaluate. Finding a general algorithm that can solve any Bayesian network exactly is NP-hard, which means it is very unlikely that a single Bayesian net algorithm will work equally well for all cases.
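For a network this small, the belief computation can be carried out by brute-force enumeration over the joint distribution, factored along the arcs of the graph. A minimal sketch using the figure 2-2 numbers (fo = family out, bp = bowel problem, lo = light on, do = dog out, hb = hear bark):

```python
from itertools import product

# Conditional probability tables from figure 2-2.
P_fo = 0.15
P_bp = 0.01
P_lo = {True: 0.60, False: 0.05}                  # P(lo | fo)
P_do = {(True, True): 0.99, (True, False): 0.90,  # P(do | fo, bp)
        (False, True): 0.97, (False, False): 0.30}
P_hb = {True: 0.70, False: 0.01}                  # P(hb | do)

def joint(fo, bp, lo, do, hb):
    """Joint probability, factored as the product of the local CPT entries."""
    p = (P_fo if fo else 1 - P_fo) * (P_bp if bp else 1 - P_bp)
    p *= P_lo[fo] if lo else 1 - P_lo[fo]
    p *= P_do[(fo, bp)] if do else 1 - P_do[(fo, bp)]
    p *= P_hb[do] if hb else 1 - P_hb[do]
    return p

def belief_family_out(lo, hb):
    """P(fo | lo, hb): sum the joint over the unobserved variables."""
    num = den = 0.0
    for fo, bp, do in product([True, False], repeat=3):
        p = joint(fo, bp, lo, do, hb)
        den += p
        if fo:
            num += p
    return num / den

# Light on, no barking heard: the two pieces of evidence nearly cancel.
print(belief_family_out(lo=True, hb=False))  # roughly 0.50
```

Enumeration like this is exponential in the number of unobserved variables, which is exactly why the factoring and partitioning strategies discussed below matter for larger networks.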
The algorithms for solving a Bayesian network employ one of two strategies. The first is to factor the joint probability distribution based on the independencies in the graph. The second is to partition the graph into several smaller parts and solve each part individually before combining the results to get the final answer. It can be shown that these two methods are identical to each other. Some algorithms are very good at solving certain classes of graphs while being terrible for other types. One solution might be to have a library of Bayesian network solver algorithms and be able to identify which algorithm would work best for the particular problem; this would be a good way around the NP-hard issue. This idea hasn't become widespread, most likely due to the cost of implementing several algorithms instead of just one. The exact algorithms work well on a restricted class of networks that can be solved efficiently, in time linear in the number of nodes. The class is that of singly connected networks. A singly connected network, also known as a polytree, is one in which the underlying undirected graph has no more than one path between any two nodes. There are techniques that can transform a multiply connected network into a singly connected one. There are a few ways to do this, but the most common are variations on a technique called clustering, in which one combines nodes until the resulting graph is singly connected. There are well-understood techniques for producing the necessary local probabilities for the clustered network. Once the network has been converted into a singly connected one, the previous algorithms can be applied.

2.2.5 Approximate Solutions

There are times when the Bayesian network is just too large for any exact algorithm to solve it in an acceptable amount of time.
As is often the case with NP-hard problems, one can opt for an approximate answer: an answer that is not exact but is guaranteed to be within a certain amount of error, depending on how many iterations of the algorithm are made. There are many approximation algorithms available, but the common approach they take is sampling. The basic approach is to start at the root nodes and choose a value for each node's state based on its probabilities. Next, assume that those nodes are in those particular states and progress on to the children. Again choose a state for each child node based on its conditional probabilities, under the assumption that the parent nodes are in the states chosen in the previous step. Once you've gone through the entire network, record the value of the node whose probability you are trying to query given the evidence. Repeat this operation several times, and the distribution that you record should approach the exact distribution you would have obtained by solving the network exactly. The more iterations you do, the closer your solution will be. However, there are a couple of problems. The first is that sometimes the solution takes a while to converge, i.e. it will take a lot of iterations before the answer you get approaches the exact answer. Secondly, and this is related to the first problem, depending on where you start, you might get stuck at a local maximum or minimum: the solution you get could be quite different from the actual answer, yet your answer wouldn't change through several iterations simply because you are at a local extremum. Regardless, since an exact solution is NP-hard, there is a greater possibility that there exists an approximation algorithm which works well for all kinds of Bayesian networks. With the ever-growing level of computing power, this might be the most feasible approach towards solving Bayesian networks.
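The root-first sampling procedure described above can be sketched as logic (rejection) sampling on the family-out network. This is a minimal illustration, not a production algorithm; the CPT numbers are those of figure 2-2, and the sample count and seed are arbitrary choices:

```python
import random

# CPTs from figure 2-2 (fo = family out, bp = bowel problem, lo = light on,
# do = dog out, hb = hear bark).
P_fo, P_bp = 0.15, 0.01
P_lo = {True: 0.60, False: 0.05}
P_do = {(True, True): 0.99, (True, False): 0.90,
        (False, True): 0.97, (False, False): 0.30}
P_hb = {True: 0.70, False: 0.01}

def sample_net(rng):
    """Forward-sample the network: roots first, parents before children."""
    fo = rng.random() < P_fo
    bp = rng.random() < P_bp
    lo = rng.random() < P_lo[fo]
    do = rng.random() < P_do[(fo, bp)]
    hb = rng.random() < P_hb[do]
    return fo, lo, hb

def estimate_family_out(lo_obs, hb_obs, n=50_000, seed=0):
    """Rejection sampling: keep only the samples that agree with the
    evidence, then report the fraction of kept samples with fo = True."""
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(n):
        fo, lo, hb = sample_net(rng)
        if lo == lo_obs and hb == hb_obs:
            kept += 1
            hits += fo
    return hits / kept

print(estimate_family_out(True, False))  # should approach roughly 0.50
```

Note the convergence problem mentioned above shows up here directly: most samples disagree with the evidence and are thrown away, so only a few percent of the iterations actually contribute to the estimate.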
2.3 Advantages

2.3.1 Computation

There are a few advantages of Bayesian networks that make them very attractive for solving a lot of AI problems. The first is computation. Given the independence assumptions in the formalism, computation can be sped up greatly compared to computing the joint probability distribution the brute-force way. Sparse, singly connected networks can be solved very efficiently, even if there are a large number of variables and parameters.

2.3.2 Structure

The focus of Bayesian networks is more on structure than on numbers. An emphasis on raw numbers was one of the key arguments against using probability theory in decision making and other uncertainty problems. Now one can see visually how ideas are linked together. A good model that is accepted by all the parties involved can be created first. This process is the more intuitive step, and it is easier to come to a group agreement on what the proper model of the problem should be.

2.3.3 Human reasoning

Bayes nets are theorized to be similar to how humans think. The graph model, where new ideas can be linked in quite easily, is a good formalism for human thinking. One of the reasons for the success of Bayesian networks is that they are easy to reason with. Simply using numbers with the joint probability distribution was shown to be very intractable to reason about, whereas the Bayesian formalism has proven very intuitive. How the probability propagates as the result of evidence can be depicted visually, lending more credence to the results generated.

2.4 Disadvantages

2.4.1 Scaling

There are still problems with using Bayesian networks. They are not perfect and do not work equally well for all situations. The general problem is still NP-hard to solve exactly. This definitely does not scale well: larger models with more variables are more likely to be difficult to solve in a reasonable amount of time.
NP-hard problems are very unlikely ever to be solved efficiently, as this is a problem that has plagued algorithm theorists for many years.

2.4.2 Probability values

Another problem is that one still needs to come up with subjective values for the conditional probabilities. Even though the structure can be decided upon independently, probability values still need to be given to fully specify the model. The question then arises of where these numbers come from. People will very likely argue over the exact values, and one fear is that by just changing the numbers, one can arrive at whatever result one desires.

2.4.3 Conflicting models

Model generation is still subjective. There can be conflicting opinions about causality, and not everyone will come up with the same exact Bayesian network to model a particular problem. The issue then becomes which model is more correct. This is a difficult problem to address, as there is no formal way to quantify how correct a particular model is. Once again this comes down to subjective viewpoints, and the fear is that a not-so-correct model may be chosen instead of a more accurate one.

Chapter 3 Model Based Troubleshooting

We will now briefly go over the important points and issues that arise when discussing model based troubleshooting. This chapter is just a summary of the excellent article written by Davis and Hamscher[4] in chapter 8 of the 11th International Joint Conference on AI; it is by no means meant to be a substitute for it. For a more complete description of model based reasoning and its current state, consult the aforementioned reference. This chapter is included for completeness, to provide the reader with a fair understanding of the ideas involved in model based reasoning and troubleshooting.

3.1 Introduction

An oft-occurring problem that plagues all of us from time to time is that something stops working. We would like to know why it stopped working and to figure out how to fix it.
A good first step is to understand how it was supposed to work in the first place. That is the main idea behind model based reasoning. The rest of this chapter will discuss the nature of the troubleshooting task, exploring what is given and what the desired result is. Models of the structure and behavior of the system in question are very useful for diagnosis and reasoning, and most of this chapter will talk about how to use these representations to do model based diagnosis. The basic procedure is to examine the interaction between prediction and observation: we predict what should happen and observe what actually happens. When there is a contradiction between the two, we attempt to solve the problem. This is broken down into three fundamental subproblems: generating hypotheses by reasoning from the symptoms to components that may possibly be at fault, testing each hypothesis to see if it is consistent with all the observations, and discriminating among all the valid hypotheses to see which one is the most likely. What we will find is that there are well-known methods for model based diagnosis once a tractable model for the problem has been given. However, the harder problem is to figure out how to come up with a good model. This is an open research topic and presents many problems.

3.2 Basic Task

As stated earlier, the basic paradigm of model based reasoning is to analyze the interaction between observation and prediction. Typically there is a physical device or piece of software that operates in an expected, normal manner. When the observation, that is, how the device or system is actually operating, differs from what is predicted, there is a discrepancy. One fundamental assumption is that if the model is assumed to be correct, then a discrepancy must mean that there is a defect somewhere in the system.
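The predict-and-compare step above can be sketched in a few lines. The two-output device and its component functions here are purely hypothetical, chosen only to show the shape of the loop:

```python
def predict(model, inputs):
    """Run the model of correct behavior on the observed inputs."""
    return {name: fn(inputs) for name, fn in model.items()}

def discrepancies(model, inputs, observed):
    """Compare predictions with observations; any mismatch signals a
    defect somewhere in the system, assuming the model itself is correct."""
    predicted = predict(model, inputs)
    return [name for name in predicted if predicted[name] != observed[name]]

# Hypothetical two-output device: out1 should be a+b, out2 should be b*c.
model = {"out1": lambda i: i["a"] + i["b"], "out2": lambda i: i["b"] * i["c"]}
obs = {"out1": 5, "out2": 99}  # out2 disagrees with the prediction of 12
print(discrepancies(model, {"a": 2, "b": 3, "c": 4}, obs))  # -> ['out2']
```

Everything that follows (generation, testing, discrimination) starts from a discrepancy list like this one.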
The types and locations of the faults that occur are clues that provide some information about where the defect in the system might lie. This raises some issues, though, since the assumption might not always be true. In any case, given a model, the basic task is to determine which of the components in the system could have failed in such a way as to account for the discrepancies observed. The model contains information about the structure and correct behavior of the components in the system, and this information is used to reason with. This approach to troubleshooting has been called a multitude of names, such as model based reasoning and reasoning from first principles, because the method is based on a few basic principles about causality and "deep reasoning".

3.3 Alternate Approaches

There are several other approaches to troubleshooting besides the model based method. They each have their own strengths and weaknesses, discussed below.

3.3.1 Diagnostics

Diagnostics involves running test programs on devices or systems after they have been manufactured to ensure that the system is capable of doing everything it is supposed to do. The problem is that this approach is not diagnosis but verification: the tests make sure that the system behaves in expected ways. There is no misbehavior to diagnose, since none has come up yet. Model based diagnosis, on the other hand, is diagnostic because it is symptom directed. Whenever a fault occurs, the observed symptom is analyzed and used to work backwards to the underlying components that might be faulty. It is more efficient to work backwards from faults that have already occurred than to try to enumerate all the possible faults.

3.3.2 Fault Dictionaries and Diagnostics

Similar to diagnostics is the idea of fault dictionaries. Here the fault dictionary is built by using simulation and a list of the kinds of faults anticipated.
Once a test has been simulated, the resulting symptoms and faults are recorded. The list is then inverted so that one can go backwards from symptoms to faults to find the reason for failure. This is not broad enough, since the only symptoms it can recognize are those that come from the faults prespecified at the time of the fault dictionary's creation. If a new fault occurs which the designers had not anticipated, the dictionary becomes useless and is unable to correctly diagnose the problem. Using fault models like this is useful if the library of faults is very broad, since there is a high degree of specificity to the diagnosis. However, it is difficult to be certain that the fault models are comprehensive enough.

3.3.3 Rule-based Systems

Rule-based systems are built upon the knowledge of experts who know the potential problems that may arise and what the symptoms may be. The problem is that it may take a while before there is enough expert knowledge to be able to efficiently diagnose problems. This matters for systems today since the design cycle is so short: there is no time to become an expert on a system, because by the time you are proficient and knowledgeable about it, the product is obsolete. This approach is also very device dependent; the knowledge and diagnostic methods used are only applicable to that particular device or system. In contrast, the model based approach is strongly device independent. It reasons from first principles and just needs to know the basic structure and behavior of the system and its components. This information is often supplied by the description used to build the device in the first place. The model based approach is also more methodical and comprehensive; it is less likely to miss something than the rule-based approach, which relies on a subjective expert's knowledge. Finally, rule-based systems offer very little help in thinking about or representing structure and behavior.
A rule-based system does not use structure and behavior for diagnosis and does not lead us to think in such terms, which makes the diagnosis harder to understand or follow for those who are not experts on the system.

3.3.4 When not to use the model-based approach

The model based approach does offer significant advantages compared to other troubleshooting approaches. However, it is not the best approach to use in all situations. If the system to be modeled is too complex, the model based approach is unsuccessful: there are too many unknown variables that just aren't modeled, and it would be too complicated to include all of the information needed to correctly predict and understand the behavior of the system. Conversely, if the system is very simple, the model based approach isn't optimal either. For simple systems, we can model the behavior completely and exhaustively; the faults considered are well known and can be enumerated beforehand reliably. Thus a fault model approach, such as a fault dictionary, would be the optimal approach here.

3.4 Three Fundamental Tasks

The task of model based diagnosis can be broken down into three fundamental tasks. Once a fault or discrepancy is observed, a set of hypotheses must be generated to try to explain what went wrong. Each of these hypotheses must be tested to see if it is consistent with the discrepancies observed. Finally, all of the consistent hypotheses must be discriminated among to find the best, most likely answer.

3.4.1 Hypothesis Generation

A hypothesis generator should typically have three desired qualities. A good generator should be complete: it should be able to produce all plausible hypotheses. It should be non-redundant: only unique hypotheses should be generated. It should be informed: only a small fraction of the hypotheses generated should be proven incorrect by the testing process. We assume that the device or system in question is modeled as a collection of several interacting components, each with inputs and outputs.
We also postulate that there is a stream of data that flows through the system from one component to another, with each component processing its input data in some way. The first simplification is that we only need to consider components upstream of the discrepancy as suspects for faultiness. Another observation is that not every input to a component influences every output; there is thus no need to follow irrelevant inputs upstream, for the same reason we do not follow components downstream. If there is more than one discrepancy, we can generate a set of suspect components for each and intersect them, which may further reduce the number of suspect components to test. Hypothesis generation thus becomes a process of following the paths backwards from the discrepancies.

3.4.2 Hypothesis Testing

The second fundamental task of model based diagnosis is to test each of the potential hypotheses generated and see if it can account for all the observations made. One simple method is to enumerate all the ways each component in the device can malfunction, then simulate the behavior of the entire device on the original set of inputs under the assumption that the suspect component is malfunctioning in the specified way. If the simulated results match the observed results, then that hypothesis is consistent with the observations and is retained; otherwise it is discarded. The problem with this is that one must have a complete description of the ways in which every single component can misbehave, otherwise the simulation is not accurate. A more advanced technique is constraint suspension. The basic idea behind this technique is to model the behavior of each component as a set of constraints, and to test each suspect by determining whether it is consistent to believe that only the suspect is malfunctioning. That is, given the known inputs and observed outputs, is it consistent to believe that all components other than the suspect are working properly?
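A highly simplified sketch of the constraint-suspension idea, under the (unrealistic) assumption that every internal value has already been probed, so each component's constraint can be checked directly. The adder-chain components C1-C3 and their values are hypothetical:

```python
def consistent_without(component, constraints, values):
    """Suspend `component`'s constraint and check whether the remaining
    constraints are all satisfied by the observed/probed values."""
    return all(check(values) for name, check in constraints.items()
               if name != component)

# Hypothetical chain: C1 computes x = a+b, C2 computes y = c+d, C3 computes z = x+y.
constraints = {
    "C1": lambda v: v["x"] == v["a"] + v["b"],
    "C2": lambda v: v["y"] == v["c"] + v["d"],
    "C3": lambda v: v["z"] == v["x"] + v["y"],
}

# Inputs a..d, probed internals x and y, and the faulty output z = 12
# (the model predicts z = 10).
observed = {"a": 1, "b": 2, "c": 3, "d": 4, "x": 3, "y": 7, "z": 12}

# A component survives as a suspect if suspending it removes the inconsistency.
suspects = [c for c in constraints if consistent_without(c, constraints, observed)]
print(suspects)  # -> ['C3']
```

Only suspending C3 makes the remaining constraints consistent, so C3 is the lone surviving hypothesis; in a real system the internal values would not all be known, and a truth maintenance system would track which assumptions each deduction rests on.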
The traditional method of handling inconsistencies in a constraint network is to find a value to retract. In the hypothesis testing case, though, we want to consider which constraint, rather than which value, to retract in order to remove the inconsistency.

3.4.3 Hypothesis Discrimination

Now that we have a set of hypotheses that all satisfy the observed discrepancies, we must have a method to choose or discriminate among them. There are a couple of approaches, and each will be discussed briefly in the following sections. The first method involves variations on probing, while the second involves testing.

Probing

Probing involves running the system again with the same inputs, but this time gathering data that was not present before by probing values within the system itself. Not all of the hypotheses will be consistent with this new data, and some will have to be discarded.

Using Structure and Behavior

Just probing at random locations is not optimal. A smarter approach is to use the structure and behavior of the system to choose locations where the information probed would be more discriminatory among the possible hypotheses. By choosing locations upstream of the discrepancy, we can improve our chances of finding more useful information which can further discriminate among the hypotheses.

Using Failure Probabilities

When probing for locations which are more informative than others, it may be the case that there are several locations which are equally informative. It would be easy, though more costly, to just probe all of these locations, but if we can only probe once or a small number of times, we want to pick the best one. With the use of failure probabilities, we know which components are more likely to fail; thus it is better to probe those equally informative locations which are near the more failure-prone components. This further improves the chances of finding useful, definitive information for hypothesis discrimination.
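The failure-probability tie-breaking rule might be sketched as follows. The probe points, component names, and prior values here are all hypothetical, chosen only to illustrate the ranking:

```python
def best_probe(equally_informative, failure_prior, nearby):
    """Among equally informative probe points, prefer the one whose
    nearby components have the highest prior probability of failure."""
    def score(point):
        return sum(failure_prior[c] for c in nearby[point])
    return max(equally_informative, key=score)

# Hypothetical priors: the network card fails far more often than the CPU.
failure_prior = {"cpu": 0.01, "net-card": 0.20, "disk": 0.05}

# Which components each candidate probe point would implicate.
nearby = {"p1": ["cpu"], "p2": ["net-card"], "p3": ["disk"]}

print(best_probe(["p1", "p2", "p3"], failure_prior, nearby))  # -> p2
```

Probing p2 first is most likely to implicate or exonerate the component that was most probably at fault to begin with.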
Testing

The second basic technique for hypothesis discrimination is testing. Here we select new inputs and once again observe the outputs. The set of possible hypotheses must then also be consistent with the observations given these new inputs and outputs. This can be done multiple times, if allowable, to continually trim down the set of possible hypotheses.

Cost Considerations

One consideration when choosing between the various techniques described is cost; not all techniques are equal cost-wise. For example, using an optimal probe is more accurate, but it might be very costly to find the optimal probe; it might have been cheaper to use non-optimal probes multiple times. Similarly, testing is a good approach for hypothesis discrimination, but it might be too costly or even impossible to run a new set of inputs on the system. This is a real-world constraint that must be taken into account when designing a model based troubleshooting system.

3.5 Conclusion

In summary, model based troubleshooting is based on the interaction between observation and prediction. It is symptom directed and reasons backwards from first principles given a good model of the system. Model based troubleshooting is device and domain independent; the ideas can be equally extended towards other, unrelated fields. There are three fundamental tasks that comprise the process: hypothesis generation, hypothesis testing, and hypothesis discrimination. There are many well-understood techniques for reasoning about a model to diagnose the fault. However, the harder problem is in coming up with a good model. There is an inherent tradeoff between completeness and complexity: a good model needs to capture everything about the system, taking into account every minor detail, yet such a large model can often be too complex and might contain too much information which is not useful for troubleshooting.
These are the problems that many researchers are currently striving to find solutions to.

Chapter 4 Design

We have discussed the viability of using Bayesian nets in a variety of AI applications such as diagnosis, data interpretation, and troubleshooting. Bayesian nets are a powerful tool that can be integrated into many existing systems to provide additional functionality which can prove to be extremely useful. Model based troubleshooting has also been discussed; this is a very practical area that has broad applications. We will now examine the fusion of these two powerful ideas and the results.

4.1 Problem Statement

Why is network intrusion a problem? As computer networks get larger and larger, the level of coordination needed to organize such a structure increases dramatically. Computer networks have grown substantially at a rapid pace, with no signs of slowing down. Unfortunately, as computer networks have grown, so too has the art of computer hacking. Keeping a network secure from outside intruders is very important. In sensitive applications, such as those involving company trade secrets or military knowledge, it is of the utmost priority to make sure such information is kept secure. The current trend is to have a large network of distributed systems sharing a pool of common resources. A distributed system is inherently harder to protect against hackers than one single supercomputer mainframe: there are more areas to attack, either blatantly or discreetly. In order to design a system that is resistant to such attacks, one would like to be able to assume a framework of absolute trust requiring a provably impenetrable and incorruptible trusted computing base[7]. This is not a reasonable or realistic task to accomplish, so the question becomes: how do we perform computations in the face of unreliable resources? How can we model such a system effectively?
The problem becomes very complex as a result of the dynamic nature of networks, distributed computations, the lack of monitoring on all desired inputs/outputs from processes, etc. Additional complexities arise from the fact that not all hacks are obvious; some are more passive in nature, e.g. resource stealing. There are also different levels of "hackedness". For example, if a hacker sniffs the password of a user of a server, but that user doesn't have any root access, the hacker is very limited in what he or she can do to harm the server. However, if a hacker is able to gain root access, that server has been totally compromised and can no longer be trusted. Additionally, different computations can have varying levels of sensitivity. For example, if a user sends a file to be printed and the file is just a scanned image of his dog, that is a very low sensitivity process. However, if a military general is sending an email to his captains giving them orders about what targets to strike, that is an extremely sensitive process. This must be taken into account when assessing the risk of doing such a computation on a resource that is not totally trusted. As we can see, this is a very difficult problem that must nonetheless be addressed. There are many security issues one could attempt to solve, but in this project the focus will be on resource stealing. Resource stealing is difficult to detect since it is passive: nothing outright wrong occurs. Some processes might take longer to compute, but computation time is hardly constant; it depends on the level of network traffic, the system load, the amount of resources required for the process, the priority of the computation in the resource's queue, etc. Thus we can never be sure if the system has been compromised in such a way. Normal troubleshooting methods are ineffective at dealing with this problem, since there hasn't really been a definite fault.
However, we would still like to get some information about the relative likelihood that some system resource has been compromised. Probabilistic methods are the obvious tool of choice here. The following sections describe the model used to represent this situation for a general network system.

4.2 Process modeling

Model based troubleshooting is a good approach to use for solving this problem. One of the benefits of model based reasoning is the fact that no specific fault models need to be specified. We only need to model what a component is supposed to do and how it is supposed to work. A property of model based reasoning is: something is malfunctioning if it's not doing what it's supposed to do, no matter what else it may be doing. Thus it isn't required to prespecify how the component might fail, since a fault is defined as any behavior that doesn't match expectations. This is a very desirable property in the network intrusion problem. We are unsure of exactly how the system components can be compromised, or even what the particular behavior will be if one is indeed hacked. There are a lot of unknowns in that respect, which is why it is ideal to use model based reasoning. All these details are swept under the carpet, as they are not necessary; valuable information can still be garnered through this process. The model we will be using is as follows. The computations are modeled as component nodes. A given computation can take input from another computation or can have its output linked to another computation. In our model, a computation is an abstract term describing anything that takes in input information, processes it using some prespecified resources, and outputs the result. Thus all of these computation nodes are linked together in a network, with each component containing information about which resource it uses. Note that resources can be shared among components.
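A minimal sketch of this process/resource model. The computation names here are hypothetical, and the resource name WILSON simply echoes the style of the examples later in this chapter:

```python
from collections import defaultdict

# Hypothetical process model: each computation names its input computations
# and the resource it executes on. Resources can be shared.
computations = {
    "parse":  {"inputs": [],                  "resource": "WILSON"},
    "filter": {"inputs": ["parse"],           "resource": "ATHENA"},
    "report": {"inputs": ["filter", "parse"], "resource": "WILSON"},
}

# Invert the model to get the resource nodes: which computations run on each.
on_resource = defaultdict(list)
for name, spec in computations.items():
    on_resource[spec["resource"]].append(name)

print(dict(on_resource))  # -> {'WILSON': ['parse', 'report'], 'ATHENA': ['filter']}
```

The shared resource is what makes diagnosis interesting: a discrepancy in both "parse" and "report" but not "filter" points suspicion at WILSON rather than at the computations themselves.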
Resources are also modeled as nodes and contain information about which components execute on them. Similar to the GDE/Sherlock circuit fault troubleshooter, an assumption-based truth maintenance system will be used to maintain consistency in the model. Troubleshooting a fault will thus be a matter of deciding which constraint to retract through the process of constraint suspension. In our system, we use Joshua, a knowledge based reasoning system built in Symbolics Common Lisp. This provides the framework and infrastructure for truth maintenance in the system, in order to detect whether there is an inconsistency given the inputs and outputs of the system. Joshua is an extensible software product for building and delivering expert system applications. It is a very compact system, implemented in the Symbolics Genera environment, with a statement-oriented, Lisp-like syntax. Joshua is at its core a rule-based inference language. It has five major components: predications, database, rules, protocol of inference, and truth maintenance system. Our application will draw on a few of these components.

4.3 Input/Output Black Box modeling

Each component is modeled as a black box node with input and output ports. This is a very abstract view of the component. We do not specify how it does the computation, nor what particular faults it might have. All we specify is what type of inputs it takes in, which resources it uses to compute with, and what outputs it has. The component can be thought of as a factory. It waits for its resources, the inputs, to come in. When it has enough of the resources to start one of its processes, it begins; when the process is done, it outputs the result as its product. Note that it is possible for there to be multiple inputs and outputs, related in any arbitrary way with regard to which inputs are needed to create the corresponding outputs.
For our problem domain of resource stealing, it is necessary to model the computation times needed for each component. We do this by specifying a range of time units that the component needs to complete the computation when it is operating normally. For example, component A has a normal computation time range of [1,5]. This means that at best the computation takes 1 unit of time, and at worst it takes 5 units of time to complete in the normal operating state. We can thus specify the arrival and departure times of the inputs and outputs. These can be exact times or ranges depending on how things are specified.

All of the components are modeled as such. The data pathways are then completed so that the outputs of components go into the inputs of other components as the model would dictate. Thus we have now modeled the dataflow between the components as well as the timing information for computation.

Figure 4-1: A simple sample component.

An important point to note is that nothing is being said about the correctness of the information passed. Indeed, it is effectively an assumption that all values that get passed are correct and do not affect the computation time of components. That is, even if a component receives an erroneous input, it will still take the same amount of time to process that input as it would a correct, expected input. We assume this because the problem we are trying to tackle is resource stealing, where we assume that the system hasn't been hacked into blatantly, i.e. all the processes produce the same correct values; only the computation time is affected.

In figure 4-1, the component is named FOO. It has two inputs, A and B, and two outputs, C and D. Inputs A and B combine and get processed to produce outputs C and D. Thus process FOO has to wait until both inputs A and B are there before it can start computing. FOO executes on resource WILSON.
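The timing prediction just described is simple interval arithmetic: a component starts once all of its required inputs have arrived, and its output appears within the start time plus its delay range. A minimal sketch (Python with hypothetical function names; the real system does this inside Joshua's constraint network), using the component A example with a normal range of [1,5] and invented arrival times:

```python
def output_interval(input_times, delay_range):
    """A component starts once all required inputs have arrived, then takes
    between delay_range[0] and delay_range[1] time units to produce output."""
    start = max(input_times)
    lo, hi = delay_range
    return (start + lo, start + hi)

def consistent(observed_time, interval):
    """An observed output time is consistent if it falls inside the predicted interval."""
    lo, hi = interval
    return lo <= observed_time <= hi

# Component A from the text: normal computation range [1,5];
# inputs assumed to arrive at times 10 and 15 (hypothetical arrivals)
predicted = output_interval([10, 15], (1, 5))  # -> (16, 20)
```

An output observed at 18 is consistent with this prediction; one at 25 would signal a discrepancy.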
FOO has two possible states, a NORMAL and a HACKED state. In the NORMAL state, FOO takes time [3,5] once both inputs A and B are present to produce outputs C and D. If FOO is in the HACKED state, it takes time [7,10] to produce C and D.

Figure 4-2: A second example of a component module.

In figure 4-2, the component is named BAR. It has two inputs, A and B, and two outputs, X and Y. Here inputs A and B are independent of each other and do not interact at all to produce the outputs. Input A is used to produce output X while input B is used to produce output Y. The two inputs do not need to wait for each other before processing can start, since they are independent. The first process, from input A to output X, takes time [1,5] in the NORMAL state and time [3,6] in the HACKED state. The second process, from input B to output Y, takes time [2,7] in the NORMAL state and time [5,10] in the HACKED state. Component BAR operates on resource ATHENA.

These two figures are indicative of the types of components that will be present in the process models. A more complex example would have many more such components linked together in more complicated ways.

4.4 Fault modeling

Similar to the GDE[11] and Sherlock[5] circuit fault troubleshooters, each component and resource module has several fault states. Instead of having states that describe exact types of faults that can occur, only the behavior of the system in each fault state is given. This abstracts away a lot of the specific details about how a particular module could have been compromised. For example, exhaustively enumerating the ways in which a server could be hacked is intractable. Possibilities include user accounts being hacked into, root access being compromised, printer resources being compromised, etc. Each module has the obvious NORMAL operating state. There can be several other states of operation.
Possibilities include HACKED, SLOW, FAST, etc. One good idea is to also include an OTHER state to cover the miscellaneous conditions that we don't take into account; think of it as a leak probability. Our model of the system is necessarily incomplete and simplified. As completeness increases, so too does complexity. To keep things tractable, we use a more simplified view of the system but must allow for an OTHER state for completeness.

The Sherlock system[5] contains many similar ideas that our system borrows from; the interested reader is encouraged to consult the reference. Our approach towards modeling is similar to the Sherlock circuit fault troubleshooter. The main idea from the perspective of diagnosis is to identify consistent modes of behavior, correct or faulty. Thus we are assuming that if a resource is hacked, even though we don't know the details or specifics of what exactly happened, its behavior, namely the computation time, will be consistent. Therefore, we can group all of it into the timing range information for the HACKED state. Similar arguments go for the other states as well.

For our application, we allow only the component modules to have different consistent states of behavior. The resources are in a separate set from the components, because the focus is on the component level. Resources can be thought of as the root nodes in this graph model. They are base resources that do not depend on other modules for operation. Thus instead of having a conditional probability dependence on other modules, they will just have a prior probability distribution over their modes of operation. In some sense they still embody the multiple states of behavior idea, but it is executed in a different fashion. Resource modules will have prior probabilities of being in the NORMAL, HACKED, OTHER, etc. states.
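A resource's prior can thus be represented as a plain distribution over its modes of operation, with OTHER absorbing the leak probability. A sketch with invented numbers (the real priors live in the IDEAL Bayesian network; the resource names are taken from the earlier figures):

```python
# Hypothetical prior distributions over operating modes for two resources.
# All numbers are invented for illustration only.
resource_priors = {
    "WILSON": {"NORMAL": 0.85, "HACKED": 0.10, "OTHER": 0.05},
    "ATHENA": {"NORMAL": 0.75, "HACKED": 0.20, "OTHER": 0.05},
}

def check_distribution(dist, tol=1e-9):
    """Every mode distribution must sum to one; OTHER is the leak state
    covering conditions the model does not explicitly enumerate."""
    return abs(sum(dist.values()) - 1.0) < tol
```

Keeping the leak state explicit means the model never claims impossible certainty about states it did not anticipate.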
4.5 Probabilistic Integration

Now that we have a good framework for modeling the system, with the truth maintenance system providing the infrastructure for the detection of inconsistencies, we turn towards the integration of probabilistic methods. The system right now is capable of checking for inconsistencies in the model given a system description listing all the fully specified components and resources along with the input/output dataflows. The timing information specifies time ranges within which the inputs/outputs must be arriving/departing. We can give the system specific times for certain inputs and outputs and it can detect whether these numbers are consistent with the system description.

Recall from the discussion on model based reasoning that there are three fundamental tasks in troubleshooting once a model for the system has been created: hypothesis generation, hypothesis testing, and hypothesis discrimination. For our system, if a fault is observed and there is a contradiction with the expected values, Joshua will signal an exception and will then call functions to handle the inconsistency. Our system employs a strategy similar to the GDE/Sherlock troubleshooters in that the hypothesis generation and testing stages occur at the same time. Recall that each component and resource module has several states of operation. We start off with everything in the NORMAL state as an assumption. When a discrepancy is observed, we use constraint suspension to resolve the inconsistency: we pick a model of a component to retract. In our case, we would need to retract the assumption that a component is in the NORMAL state, since that assumption isn't consistent with the observed values. Our system currently is very similar to the GDE/Sherlock projects. As such, we could use similar methods for hypothesis generation and testing and hypothesis discrimination. However, we would like to integrate probabilistic methods to make this process smarter.
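Constraint suspension, as used here, amounts to swapping one component's assumed behavioral model for an alternative and re-testing. A tiny sketch (Python with hypothetical names; in the actual system this is mediated by Joshua's truth maintenance rather than a dictionary update):

```python
# Every module starts with NORMAL assumed; on a discrepancy we retract
# NORMAL for some component and try one of its alternative models instead.
def suspend(assumptions, component, alternative):
    """Return a new assumption set with `component` moved off its current model."""
    new = dict(assumptions)          # leave the original assumption set intact
    new[component] = alternative
    return new

assumptions = {"FOO": "NORMAL", "BAR": "NORMAL"}
candidate = suspend(assumptions, "BAR", "SLOW")  # hypothesis: BAR is SLOW
```

The candidate assumption set is then checked against the observed input/output times; if it is still contradictory, another retraction is tried.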
Right now we know how the components interact with each other and what resources they run on. We also know that each component contains a complete set of modes of behavior to describe the different ways it can behave, regardless of the cause. Hypothesis generation becomes a matter of choosing models to retract as a form of constraint suspension. The Joshua infrastructure allows us to easily test whether a certain hypothesis succeeds at explaining the observed values. However, there is no way to distinguish or discriminate among the various hypotheses. There is nothing guiding us in determining which component's model we should retract first to see if it solves the problem. An idea used in other model based troubleshooters is to give each component a failure probability; one can then choose which model to retract based on which component is more likely to be out of the normal operating state. Our problem domain is a little more complicated in that there are underlying resources which can be shared among components. If a resource is hacked, that should increase the likelihood of failures appearing among all of the components that execute on it. Thus we will also include conditional probability information in our model. Specifically, we will allow the resources to have a causal effect on the components that execute on them. This makes sense, as components are more likely to be out of the normal state if their resource is hacked. We will therefore allow the components to be conditionally dependent on the resources. Probabilistically, this involves generating a Bayesian network with the resources as root nodes which have a causal effect on the component nodes. We will thus need to specify the conditional probabilities of each component being in each state given every possible combination of its parent node states.
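The resulting two-layer structure (resource roots, component children) is small enough that exact inference can be done by direct enumeration. The sketch below is Python with invented CPT numbers; the thesis uses IDEAL for this, and the function name is hypothetical:

```python
# Hypothetical two-layer net: one resource root, components conditioned on it.
prior = {"NORMAL": 0.8, "HACKED": 0.2}            # P(resource state), invented
# P(component state | resource state); same invented CPT for every component
cpt = {"NORMAL": {"NORMAL": 0.9, "SLOW": 0.1},
       "HACKED": {"NORMAL": 0.3, "SLOW": 0.7}}

def posterior_hacked(observed_component_states):
    """P(resource = HACKED | observed component states), by enumeration
    over the resource's states (components are independent given the resource)."""
    weight = {}
    for r in prior:
        w = prior[r]
        for s in observed_component_states:
            w *= cpt[r][s]
        weight[r] = w
    return weight["HACKED"] / sum(weight.values())
```

With these numbers, each additional component observed out of its NORMAL state pushes the shared resource's posterior probability of being hacked higher, which is exactly the discrimination signal the diagnosis needs.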
In our case, each node can have at most one parent node, since components are assumed to execute on only one resource. We use IDEAL to provide this probabilistic functionality, making sure that we do the proper bookkeeping to maintain consistency between the IDEAL Bayesian structure and our Joshua based structure.

One feature we added on top of IDEAL was the idea of evidence nodes for each component. Basically this allows for negative evidence, similar to the ideas in Struss and Dressler[11]. IDEAL does not allow the user to give negative evidence for a node, only positive evidence. Negative evidence, in effect, allows us to state that we know for sure that a node is NOT in a certain state. We get around this IDEAL limitation by the following method. For each component node with N different states, we create N *evidence* nodes, one for each state of the component node, and add causal links from the component node to each of them. The probabilities are set such that whenever any of the *evidence* nodes is given positive or negative evidence, that forces the component node either to be in that state, or definitely not to be in that state. Figure 4-3 shows a completely specified component along with the probabilistic structure created using IDEAL. We require such a feature because when we retract a component model, we are in essence giving negative evidence for that component being in that particular state.

Figure 4-3: Completely specified component with probabilistic model included.

Having integrated a Bayesian network into our system, the question still remains how best to use the information provided by it. The probabilities can guide us in our selection of hypotheses to test. We can order the hypotheses by likelihood and test them accordingly, stopping whenever we find a solution that is consistent with the values.
This method is the hillclimbing approach: at each step, we choose the best, most probable item to retract and then test it. When we've come to a solution state, we are done. The benefit of this method is speed, since we are not doing an exhaustive search over the entire space of possible component state models. However, we are not guaranteed to find the *best* answer, meaning the most likely hypothesis which satisfies the given values. A hillclimbing approach suffers from this problem, also known as the local maxima problem. Another problem is that with the hillclimbing approach, we choose the best step available at each iteration; we could potentially run into a dead end, hitting the end of the road and running out of options without yet having resolved the inconsistencies in the model. In this case, the hillclimbing approach would also need to be able to backtrack, retracting the prior step and choosing the second best option available at that time instead.

Another way to use the probability information is simply to do an exhaustive search over all of the possible hypotheses that satisfy the constraints on the model given the inputted values. One can then rank each of these solutions by likelihood using the probability information in the model. The problem is that this is an exhaustive search: for large problem spaces, it could take an intractable amount of time. Oftentimes we don't need the best answer, just an answer that is reasonably likely to be true. This method does guarantee the best solution, though the tradeoff to attain it is computation speed.

For our system, we have decided to use the probability information in the following manner. We intend to merge the ideas of an exhaustive search with a hillclimbing approach. The approach we use is as follows.
We begin with a hillclimbing best first search where we choose the component most likely not to be in the normal operating state. We retract the normal state, assume another state, and test to see if it is consistent with the given outputs. If it is, we are done. If not, then we continue on and select the next most likely component to be out of the normal state. We continue this best first search until we've come to a solution that is consistent with the given inputs. Now that we have an answer that is consistent, we conduct an exhaustive search on the rest of the possible combinations of states: generate them, test them, and collect all the configurations that yield consistent results.

Prior to conducting the diagnosis, we generate an exhaustive list of all the possible combinations of states of the components in the model and enumerate them all. This allows us to keep track of which configurations we have tested already and provides bookkeeping functionality in the exhaustive search. After the first "best" solution is found, the system searches through all of the possible configurations, tests them, and collects the configurations which are consistent with the inputs. Now we have a set of all the "good" combinations which are solutions to the problem given the inputs, and a set of the "nogood" states.

The set of "nogoods" is information that the system has learned from the diagnosis of the fault. To encode this knowledge, we save this information on the IDEAL side by adding all of the nogood configurations into the Bayesian network as the system goes along testing all the hypotheses. The information gained from the "goods" is encoded into the Bayesian network in a similar manner. For each solution, we construct a Bayesian net node which is the logical AND of all the nodes corresponding to the assertions in the solution. Note that due to time constraints, this part of the implementation has not yet been completed.
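The two-phase procedure above, best first until a consistent configuration is found and then exhaustive collection of "goods" and "nogoods", can be sketched as follows. This is Python with hypothetical names; the real system interleaves the search with Joshua's truth maintenance rather than calling a standalone consistency predicate:

```python
from itertools import product

def diagnose(components, is_consistent, score):
    """components: {name: [possible states]}.  is_consistent(config) -> bool.
    score(config) -> prior probability of that assignment of states.
    Phase 1 (best first): try configurations in order of decreasing score
    until one is consistent.  Phase 2 (exhaustive): test the remaining
    configurations, splitting all of them into "goods" and "nogoods"."""
    names = sorted(components)
    configs = sorted(product(*(components[n] for n in names)),
                     key=score, reverse=True)
    first, goods, nogoods = None, [], []
    for cfg in configs:
        if is_consistent(cfg):
            if first is None:
                first = cfg        # the hillclimbing answer
            goods.append(cfg)      # every consistent configuration
        else:
            nogoods.append(cfg)
    return first, goods, nogoods
```

A caller that only needs a quick answer can stop once `first` is set; continuing the loop yields the complete good/nogood partition for encoding back into the Bayesian network.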
However, the general procedure mirrors the situation with the "nogood" configurations. Once all the "nogood" and "good" information has been incorporated into the Bayesian network, we can compute one final solution over the entire net and thus acquire a probability for each solution conditioned on the complete set of "nogoods", which takes the position of "evidence" in our system. It is partial information that can affect the relative likelihoods of success for the solution nodes. This is probably the most accurate estimate of how likely each possible solution is. We can thus choose the absolute "best" solution for the given problem.

Note that if computation time is a limitation, we do not necessarily need to go through the exhaustive search. Recall that we do this only after we have found a solution using a best first search hillclimbing approach. It would be interesting to see how close that answer is to the absolute best answer we get from an exhaustive search. The hope is that the initial "best" solution is relatively close to the absolute best solution. Depending on the application, this might suffice. Thus we allow the system to trade off speed vs. accuracy as determined by the user.

Chapter 5

Implementation

Now that we've discussed the design behind the system, the natural question to ask is "What can it do and what does it know?" The best way to demonstrate the capabilities of the system is by walking through a few case examples and showing how the system reacts to different faults. We will analyze what the system decides to do to resolve each inconsistency, whether it makes sense, and what effect it has on the probabilities in the model. We will start off with two simple academic examples showing some basic structures that will be common to many types of more complex networks. The last example will be slightly more complex and more "real world" in terms of applicability.
Through these examples and walkthroughs, the reader will hopefully get an intuitive feel for what the system is capable of doing. It will be demonstrated that the system provides useful, beneficial results.

5.1 Linear Process

The first example is a simple linear process. Here we have a very simple network consisting of two components, each executing on its own resource. Figure 5-1 depicts the model generated, showing the dataflow structure and the timing information. The probabilistic information is shown in figure 5-2. Note that it is more likely for component FOO to be in the NORMAL state than BAR. The "evidence" nodes which allow for negative evidence are also depicted here for completeness. For brevity, they will be omitted in later diagrams.

Figure 5-1: Linear process model.

We will now walk through a few examples to show how the system deals with different possible faults. The examples are taken directly from the command line input one would give to specify observed input/output values to the system.

* Normal case with no faults.

(run-case 'test-1 '((a foo 10) (b foo 15)) '((x bar 30)))

This is the command line input telling the system to run a certain model with given inputs and outputs. The first argument to the function run-case is the name of the model, 'test-1 in our case. The next argument is a listing of the input bindings. In this case it states that A of FOO is observed at time 10 and B of FOO is observed at time 15; these are the arrival times for the inputs. The third and last argument states that X of BAR is observed at time 30. In this case, the output time of 30 at X of BAR is consistent with the predictions of the model. Given the inputs of 10 and 15 for A of FOO and B of FOO respectively, X of BAR should have an expected time interval of [22,32]. 30 fits into that interval, so no problem is signaled.
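The [22,32] prediction can be reproduced by chaining the delay ranges along the linear path. Reading off the NORMAL delays implied by the text (FOO [2,7], since C of FOO is expected in [17,22] given inputs at 10 and 15, and BAR [5,10]), a quick sketch:

```python
def expected_interval(input_times, stages):
    """Chain delay ranges along a linear path: start at the latest input
    arrival, then add each stage's (lo, hi) delay in sequence."""
    lo = hi = max(input_times)
    for dlo, dhi in stages:
        lo, hi = lo + dlo, hi + dhi
    return (lo, hi)

# A of FOO at 10, B of FOO at 15; FOO NORMAL [2,7] then BAR NORMAL [5,10]
x_of_bar = expected_interval([10, 15], [(2, 7), (5, 10)])  # -> (22, 32)
```

An observation of 30 for X of BAR falls inside (22, 32), matching the no-fault outcome above.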
Figure 5-2: Linear process a priori probabilistic model. P(FOO = NORMAL) = 0.83, P(BAR = NORMAL) = 0.74.

* Fault in output X of BAR.

(run-case 'test-1 '((a foo 10) (b foo 15)) '((x bar 34)))

In this case the output X of BAR is observed at time 34, which is outside of the expected time interval of [22,32]. This time value is slower than expected. The system changes the model of BAR from NORMAL to SLOW, which resolves the contradiction. Also, the probability that the resource BOX-2, which BAR operates on, is hacked increases from 0.20 to 0.69. This makes sense and is to be expected: BAR is acting outside of normal expectations, and whenever it does, it is likely that the resource it is executing on is hacked. Figure 5-3 illustrates the probability models after the evidence has been propagated.

Figure 5-3: Linear process post evidential probabilistic model. P(BOX-2 = HACKED) = 0.20 before evidence, 0.69 after evidence.

* Fault in output C of FOO.

(run-case 'test-1 '((a foo 10) (b foo 15)) '((c foo 16)))

In this example, the output of FOO at C is observed at time 16. This is faster than the expected time interval of [17,22]. The system decides to change the model of FOO from NORMAL to FAST, which makes it consistent with the given inputs and outputs. The probability that the resource FOO executes on, BOX-1, is hacked increases from 0.10 to 0.47. This makes sense: something is not normal with FOO, and the most likely underlying fault is in the resource it is using.

* No possible solutions.

(run-case 'test-1 '((a foo 10) (b foo 15)) '((x bar 43)))

This case has a slow output for X of BAR which yields no possible solution for this model. There is no combination of selected models for the components that will yield a non-contradictory state.
There is no solution possible, and this is an example of a problem that the system cannot yet cope with properly: it doesn't know when to stop searching for solutions when none exists.

5.2 Branch/Fan process

The second case illustrates a branch or fan structure, where one component's output flows into the inputs of two other components. Figure 5-4 depicts this case. One of the desired goals of the system is to be able to tell when a shared resource is more likely to be the source of discrepancies rather than a shared component from which the inputs came.

Figure 5-4: Branch/Fan process model.

Figure 5-5 shows a simplified form of the Bayesian network created for this case. For simplicity, the "evidence" nodes are not shown. The important prior probability information to know is: P(RESOURCE-1 = HACKED) = 0.10 and P(RESOURCE-2 = HACKED) = 0.20.

Figure 5-5: Branch/Fan process probabilistic model.

We will now walk through a few examples to show how the system deals with different possible faults. The examples are taken directly from the command line input one would give to specify observed input/output values to the system.

* Normal case with no faults.

(run-case 'test-2 '((a foo 10)) '((x bar 25) (y baz 25)))

The inputs state that A of FOO arrives at time 10. We observe that the output X of BAR occurs at time 25 while Y of BAZ occurs at time 25. Given the input A of FOO arriving at 10, the expected time range for X of BAR is [17,27] and the expected time range for Y of BAZ is [17,27]. The value of 25 for both BAR and BAZ is consistent with the predicted ranges, so no contradiction is detected.

* Slow fault on BAZ.

(run-case 'test-2 '((a foo 10)) '((x bar 25) (y baz 30)))

In this example the output at Y of BAZ is outside of the expected range, so a contradiction is detected. The system decides to change the selected model of BAZ from NORMAL to REALLY-SLOW, which resolves the contradiction.
Additionally, the probability that the resource BAZ operates on, RESOURCE-2, is hacked increases from 0.20 to 0.69.

* Slow fault on BAR.

(run-case 'test-2 '((a foo 10)) '((x bar 30) (y baz 25)))

In this example, a fault similar to the previous one occurs, except here it is BAR that gets the slow fault. Analogously, the system decides to change the selected model of BAR from NORMAL to SLOW, which resolves the contradiction. Once again, the probability of RESOURCE-2 being hacked increases from 0.20 to 0.69.

* Slow fault on both BAZ and BAR.

(run-case 'test-2 '((a foo 10)) '((x bar 30) (y baz 30)))

In this example, both X of BAR and Y of BAZ are observed to be slower than expected. This creates two inconsistencies in the model. The system decides to change the selected model of BAZ from NORMAL to REALLY-SLOW and the selected model of BAR from NORMAL to SLOW. This resolves the contradiction. More importantly, the probability of RESOURCE-2 being hacked increases from 0.20 to 0.95. This makes sense: there is a lot of evidence favoring RESOURCE-2 being hacked, since both of the components that execute on it faulted in this example.

* No possible solution.

(run-case 'test-2 '((a foo 10)) '((x bar 30) (y baz 15)))

In this example, once again the observed values admit no possible solution. This example contains a slow fault on X of BAR and a fast fault on Y of BAZ. No combination of selected model states can explain these observed values. Once again, the system is not equipped to handle examples with no possible solutions; it loops forever, searching exhaustively but doing redundant work.

* Fast fault on FOO.

(run-case 'test-2 '((a foo 10)) '((x bar 25) (y baz 25) (b foo 11)))

In this example, B of FOO was given a fast fault. The time observed there was 11, while the expected time range for that location is [12,17]. This was done to see how the system would respond to such a fault.
The result was at first unexpected, but interesting nonetheless. The system decided to change the selected model of FOO from NORMAL to FAST, as expected. However, before it did that, it also set BAR to the SLOW model and BAZ to the REALLY-SLOW model. Ideally, FOO should just have been changed to the FAST state and nothing more. The system did this because, given the numbers in the model, it is more probable that FOO is in the NORMAL state as compared to BAR and BAZ. So BAR's and BAZ's states are changed first to the slower ones to see if that resolves the contradiction. This is not quite what one would ideally want, but it is consistent with how the code works. The resulting configuration is consistent with the observed inputs/outputs, though not necessarily the best or most probable answer. The probability of RESOURCE-1 being hacked increases from 0.10 to 0.47. The code works this way because of the manner in which we implemented it. We chose to use a hillclimbing approach towards hypothesis generation: we choose whatever is most likely at that time and place. It might not end up giving us the best answer, but it will give us a good, consistent answer.

5.3 Branch and Join

This last case is an example of a branch and join structure. It uses real world types of components and is known as the web-server/trader example. Figure 5-6 shows the model/structure inputted. Figure 5-7 shows the corresponding probabilistic model with the prior probabilities for the resources shown. Recall that the first number is the probability of being hacked while the second is the probability of being in the normal state. Various faults and how the system handles them will be shown. This case attempts to model a simple network that consists of the following components: web-server, dollar-monitor, yen-monitor, bond-trader, and currency-trader. The resources in the model are: wallst-server, jpmorgan-net, bonds-r-us, and trader-joe. The story for this example goes like this.
Every day, traders of all types get information updates about prices on various things to trade on. When the prices are favorable to them, they will act upon it, buying or selling things. We imagine that there is a bond-trader working at a company called Bonds-R-Us. He sits at his computer every day working on the Bonds-R-Us network. It is a relatively small company, so the network isn't as fast or as secure as those of some of the larger trading firms. Similarly, there is a currency-trader called Joe. He works at home on his computer analyzing the price differences between the dollar and the yen. When prices are favorable, he will do a prescribed buy/sell trade. He only has his PC to work on and it is not secure at all since it is running Windows. His computer often gets hacked into.

Now all traders need to get information from certain places that are reliable and provide such service. Oftentimes it is a large financial company that offers such services to the public. In our example, JP Morgan is the company that provides such a service. Every day they monitor the prices of various commodities, such as the price of the dollar in the US market and the price of the yen in the Japanese market. The bond-trader needs to know the relative strength of the US dollar to know whether it is good or bad to buy/sell US bonds. He makes the decision on his computer based on the information updates he gets from JP Morgan on the current value of the dollar. Similarly, our currency-trader Joe, working from his home PC, gets up to date information about the dollar and yen prices to know the relative pricing between them. Given this information, Joe will decide whether or not to buy more US dollars or Japanese yen, or pursue other trading strategies. Now these services that JP Morgan provides do not run by themselves. They too need to get information from a central web server on Wall St. which keeps track of basically everything financial in the entire world.
It is a most powerful supercomputing web server indeed. Whatever information the JP Morgan computers/networks need, they get from this centralized financial web-server. This sets up the basic idea behind this example, which will demonstrate a branch and join structure in the context of a real world network.

Figure 5-6: Branch and Join model.

Figure 5-7: Branch and Join probabilistic model.

We will now walk through a few examples to show how the system deals with different possible faults. The examples are taken directly from the command line input one would give to specify observed input/output values to the system.

* Normal case with no faults.

(run-case 'test-3 '((query1 web-server 10) (query2 web-server 15)) '((decision bond-trader 25) (decision currency-trader 28)))

This is the normal case with no faults. The DECISION of BOND-TRADER occurs at time 25 and the DECISION of CURRENCY-TRADER occurs at time 28. The predicted ranges for them are [20,31] and [27,39] respectively. Thus both values are consistent with the predicted values and no contradiction is detected.

* Slow fault on bond-trader.

(run-case 'test-3 '((query1 web-server 10) (query2 web-server 15)) '((decision bond-trader 32) (decision currency-trader 28)))

The output DECISION of BOND-TRADER occurs at a time not within the expected time interval. The system decides to change the selected model of BOND-TRADER from NORMAL to SLOW. This resolves the contradiction. Additionally, the probability that the resource BONDS-R-US is hacked increases from 0.20 to 0.32. In the real world, computations take varying times depending on the time of day, so it is plausible for the bond-trader's computer to simply be a little slow. However, there is the possibility that the bonds-r-us network has been tampered with and some resources have been stolen by hackers.

* Fast fault on bond-trader.
(run-case 'test-3 '((query1 web-server 10) (query2 web-server 15)) '((decision bond-trader 18) (decision currency-trader 28)))

In this example, bond-trader is given a fast fault: the observed time for the output DECISION of BOND-TRADER is 18, which is inconsistent with the expected interval of [20,31]. The system changes the selected model of bond-trader from NORMAL to FAST, which resolves the contradiction. The probability that BONDS-R-US is hacked increases from 0.20 to 0.31. Once again, sometimes the bond-trader sees input values which are deemed no-brainers, i.e. values which obviously dictate a certain course of action; computation in such cases can be faster than at other times. However, there is also the remote chance that, once again, the system has been tampered with and the bonds-r-us network has been hacked into.

Chapter 6

Conclusion

6.1 System critique

We will now discuss in more detail what the system can and cannot do. We will also attempt to answer the question "How useful is it in real world scenarios?" What does the system allow us to do and what does it know? Once the user has inputted and fully specified the model of the network, there is much that the system knows. First, the system knows the timing information and how the data flows between the components. Given an arbitrary input, the application can calculate the expected time intervals for all of the component outputs. Given output information, the system can immediately decide if it is consistent with the expected values. The system knows what time interval each output should fall in and can simply test whether a given output value fits in the predicted interval. If it doesn't, the system knows how to systematically search through all the components to find the combination of component models that satisfies the given observed values.
The system also knows how to approach this search intelligently through the use of conditional probabilities given a priori. Additionally, once the system has found a solution, the probabilities will have been updated to reflect information garnered from the troubleshooting process. Namely, resources that are likely to have been hacked into to cause the inconsistencies will have an increased probability of being hacked. This is a very useful feature for self-monitoring adaptive systems: if the probability of being hacked increases past a certain threshold, predetermined methods of dealing with the problem can be applied. Thus the system knows much about the network given the simple model it has. It is important to note, though, that it can only deal with problems which operate on the same level of detail as the user-provided model; if a problem occurs at a lower level, such that the structures used do not correctly model it, the system will be incapable of dealing with it. This is more a limitation of the model-based approach than of our system. The model-based approach naturally must use one set of structures to model a system. The ideal approach would be to allow for multiple representations of varying detail to model the system. The problem, of course, is that such an approach is typically not feasible: it takes too long to come up with one good model for a system, let alone a multitude of them. The level of complexity is high and a lot of practical implementation questions come into play. What constitutes a higher level of detail when modeling something? There are a lot of borderline issues that are highly subjective, and most problems do not break up into easily identifiable levels. In any case, the system does serve a useful purpose. It has its limitations, which we will explore a little more in the next section.
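The probability-guided search and the threshold idea can be illustrated with a small sketch. Everything below — the candidate list, the prior numbers, the 0.30 threshold — is a hypothetical example under assumed values, not data from the actual system.

```python
# Hypothetical sketch: order candidate fault explanations by prior
# probability, commit to the most likely one that resolves the
# contradiction, and raise an alarm when a resource's hacked
# probability crosses a policy threshold.

candidates = [
    # (component, alternative model, assumed prior probability of that fault)
    ("bond-trader", "SLOW", 0.15),
    ("bond-trader", "FAST", 0.10),
    ("currency-trader", "SLOW", 0.05),
]

def explains_contradiction(component, model):
    # Stand-in for the real consistency check against observed timings.
    return (component, model) == ("bond-trader", "SLOW")

# Try the most probable explanation first.
for component, model, prior in sorted(candidates, key=lambda c: -c[2]):
    if explains_contradiction(component, model):
        print(f"selected: {component} -> {model}")
        break

hacked_probability = 0.32   # posterior after troubleshooting (from the example)
THRESHOLD = 0.30            # assumed policy threshold
if hacked_probability > THRESHOLD:
    print("alert: resource BONDS-R-US may be compromised")
```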
However, it still succeeds at solving the problem it was intended to solve, and it is expandable enough to incorporate more complex ideas, algorithms, and functionality in the future. All in all, it is a good foundation on which to build a self-monitoring resource-stealing troubleshooter: a system with a fair amount of knowledge and capability, and a lot of room for future growth.

6.2 System limitations

Although the system provides a lot of useful functionality and can effectively model and troubleshoot the problem of resource stealing in network intrusion, there are nevertheless some limitations to the system. We will discuss a few of them and state how difficult it would be to address these shortcomings.

6.2.1 Lack of correctness detection

There is a lack of correctness detection: nothing is said about whether the value output by a component is correct or not. This is mainly because we are trying to solve the problem of resource stealing. We stated by assumption that all the inputs as well as the outputs of all the components are correct; all that can possibly cause an inconsistency in the system is the timing of the computations. The model does not include correctness detection, nor is it necessary given the problem domain we are attempting to solve. However, if we wanted to extend this system toward other types of network attacks, correctness detection would be mandatory to ensure that all processes are behaving properly with respect to both computation and timing.

6.2.2 Lack of probabilistic links between components

There is no propagation of probabilities between components. Right now the only probabilistic links are between a component and the resource that it uses; the only way for probabilities to flow between components is indirectly through a shared common resource. For our model and the problem that we're addressing, this is fine.
The reason goes back to the initial assumption that all values produced by components are correct, though they may not take a "normal" amount of time to produce.

6.2.3 Lack of descriptive model states

Right now we have included only very simplistic security models. There is a lot of simplification, as we are focusing only on the problem of resource stealing. If we were to extend the system to handle other types of network security issues, we would need more specific models that store more information. These models would have to draw on the knowledge base of network security. One feature of model-based reasoning that will help us in extending this functionality in the future is its use of hierarchical structures. The reasoning is device and domain independent: all one has to do is change the device/system specifications, and the troubleshooter can diagnose faults by reasoning from first principles once again. Hierarchically structured models would allow us to have varying levels of detail depending on how specific the problem is. Thus each time we "zoom" in on the problem, examining details in greater depth, we can use an appropriate model that correctly captures all of the necessary information, no more and no less than what is required for the diagnosis. Such a feature makes model-based troubleshooting very scalable and extensible.

6.3 Future work

This system for diagnosing resource-stealing network intrusion problems is part of the bigger picture of Active Trust Management (ATM) for Autonomous Adaptive Survivable Systems (AASS's)[7]. The project attempts to build survivable systems in an imperfect environment in which any resource may have been compromised to an unknown extent. The project's claim is that such survivable systems can be constructed by restructuring the ways in which systems organize and perform computations.
Our project, which deals with monitoring and troubleshooting resource-stealing network intrusion attacks, is a part of the active trust management system needed to constantly monitor and update the relative trust of the resources given a limited amount of information. There are five main tenets of the Active Trust Management project, listed below. This is taken directly from the project proposal submitted to ARPA[7].

1. Such systems will estimate to what degree and for what purposes a computer (or other computational resource) may be trusted, as this influences decisions about what tasks should be assigned to them, what contingencies should be provided for, and how much effort to spend watching over them.

2. Making this estimate will in turn depend on having a model of the possible ways in which a computational resource may be compromised.

3. This in turn will depend on having in place a system for long term monitoring and analysis of the computational infrastructure which can detect patterns of activity indicative of successful attacks leading to compromise. Such a system will be capable of assimilating information from a variety of sources including both self-checking observation points within the application itself and intrusion detection systems.

4. The application systems will be capable of self-monitoring and diagnosis and capable of adaptation to best achieve their purposes with the available infrastructure.

5. This, in turn, depends on the ability of the application, monitoring, and control systems to engage in rational decision making about what resources they should use in order to achieve the best ratio of expected benefit to risk.

Our system for troubleshooting resource stealing fits well into this bigger picture. One of the biggest problems in network security is knowing when a resource has been compromised. There are different levels of compromise possible for resources, and not all of them are easy to detect.
Thus rather than trying futilely to develop an environment where all the resources are guaranteed to be secure and can be trusted totally, we instead deal with managing the level of trust among the resources and then use risk assessment to determine whether one should run a certain computation on a certain resource. Our system can integrate into this nicely and can provide good practical value to the overall project once more complex models, including more descriptive and realistic component/resource behavior states, are used. There is, however, still much to be done on this large and ambitious Active Trust Management for Autonomous Adaptive Survivable Systems project.

6.4 Lessons learned

There were many lessons learned from designing, building, and testing such a system. There were many steps in the process, from creating the infrastructure using Joshua for consistency detection, to modeling the calculation of arrival/departure times, to integrating IDEAL into it, and finally to deciding how we want to use the probabilistic ideas. Oftentimes it was unclear what was the *more* correct thing to do. One problem was the difficulty of integrating several different applications together. Time is always a limiting factor, and I never had the time to fully understand how Joshua or IDEAL works. All you can do is learn enough for what your application needs. The problem grows when something goes wrong and you have to learn more about how Joshua and IDEAL actually work in order to find out why things aren't working. Another problem was in the knowledge acquisition part. Since I am not an expert in network security and the issues in the field, at best I can only come up with very simplistic, perhaps unrealistic models for components and resources. Having a better grasp of these concepts and details might have given us a better idea of how best to design/model the system from the start. As it was, we were constrained to design the system to the best of our knowledge.
Another important lesson is in the difficulty of research. Oftentimes it is unclear how best to approach things, since nothing similar has been attempted before. This usually requires intuition that comes only from extensive background knowledge of the problem area. The usual solution is to consult the experts in the area to gain their knowledgeable insight on the problem. However, real-world scheduling problems come into play here: it is nearly impossible to find times when all parties are able to meet to talk about the topic. It's an unavoidable real-world problem that must always be taken into account.

Chapter 7 KBCW Comlink System

There are three general types of problems for which probabilistic methods can be used. We've already discussed a model-based troubleshooting application. The next application involves the Knowledge Based Collaboration Webs (KBCW) Comlink System. The scope of problems this can be used to deal with includes data interpretation and collaborative decision making, with the main focus on decision making.

7.1 Description

Collaboration is the task of people working together toward a common goal. There is a pool of shared knowledge and understanding of the problem domain. There are limitations on what can be accomplished based on factors such as the skill/knowledge level of the people involved, workload, goals, and ability to work together in a communal fashion. The Knowledge Based Collaboration Webs (KBCW) project strives to find new and better ways of setting up and supporting collaboration. It is a broad, large-scale project that seeks to integrate various separate lines of research in the MIT Artificial Intelligence Laboratory to support the unified collaboration goal. Currently the direction is to allow for computer mediation of the problem using natural language and other forms of interaction between people.
The Comlink system in particular incorporates email and online web-based discussion forums in order to achieve the collaboration ideal. What the system attempts to do is provide a structured, automated, computer-mediated forum whereby people all over the world can debate topics ranging from politics to technology to basically anything they'd want. It allows a systematic way to have discussions, arguments, and debates about various topics. Someone can pose a statement or question to the forum initially. Then other people can read the statement and come up with arguments/evidence that either support it or deny it. In turn, there can be recursive support/denial of each of these statements. Comlink keeps track of all these arguments and how all the statements link with each other. This allows for a very structured and formal method of debate or argumentation. One of the useful capabilities of the system is collaborative decision making, whereby a large group of people spread out all over the world can debate on the best course of action to take for a particular problem domain. Examples could include engineers from around the world trying to come up with a new design for a computer CPU architecture, or McDonald's employees around the world trying to brainstorm a new idea for a burger/sandwich. One good feature of this system is that it can allow for anonymity: statements need not specify their source. This allows for a more level playing field among the participants, such that everyone has an equal say in the argument/decision-making process. One big fault of a collaborative decision-making venture is that not all parties involved are of equal status or position. For example, in a company meeting, the CEO of the company would have a much bigger say in decisions than a secretary; whatever he or she says would carry larger weight. This is the case even if he or she is totally wrong and has an incorrect view of things.
Conversely, even if the lowly secretary has a brilliant idea, it does not carry equal weight despite the superiority of the idea. This problem has shown up before, sometimes with catastrophic results. An example is the Challenger space shuttle disaster. A memo from one of the engineers stated the potential for disaster if the shuttle were launched in its current state; it explicitly stated the problem that could arise in the O-rings. However, the higher-ranking management dismissed the idea even though they were obviously not the experts in the case. Management pulled rank on the engineer and, as a result, the shuttle exploded. Afterwards the finger pointing began. The crux of the matter is this: the disaster could have been averted had that one engineer's recommendation been more seriously followed up on. Comlink provides a more formalized way to bring up issues and deal with them in a more efficient manner. Email, online discussion forums, and memos by themselves are not effective when there is a large group of people with varying opinions and knowledge. What is needed is a good mediator that can keep track of everything and provide an intuitive yet informative interface to present the information gathered thus far on the topic. An online collaborative forum like Comlink provides a more systematic approach toward decision making and is more resistant to the problems of office politics and people with less expertise pulling rank on those that are more qualified knowledge- and skill-wise.

7.2 Integrating a Bayes Net Solver

Although the Comlink system provides very useful functionality now, it does have some limitations. The main criticism is that there is no way to distinguish between a "good" argument and a "weak" one. Someone can make a wild claim of questionable truth; someone else can make a reasonable claim and have a lot of evidence in its support.
However, there is no way to distinguish quantitatively between these two claims to know which statement is more likely to be true. This is the motivating force behind the integration of probability into Comlink. In particular, we are once again using a Bayesian network solver called IDEAL to accomplish this task. IDEAL, an acronym for Influence Diagram Evaluation and Analysis in Lisp, is a test bed for work in influence diagrams and Bayesian networks. It contains various inference algorithms for belief networks and evaluation algorithms for influence diagrams. It also contains facilities for creating and editing influence diagrams and belief networks. IDEAL was created at the Rockwell International Science Center and is available free for non-commercial research ventures. It is the Bayesian network toolkit that we will be using, as it provides all the functionality that we will need. By integrating IDEAL with the Comlink system, what we can now accomplish is a quantification of the relative strengths and weaknesses of arguments. If a statement gets a lot of positive support, the belief in that statement will obviously increase. Similarly, if there is a lot of negative, denial-type evidence against it, the belief will decrease. Thus we can systematically add probability to all the arguments and will be able to quantitatively compare competing theories or viewpoints. This is a big part of collaborative decision making. Oftentimes the group is attempting to come up with a unanimous decision about how to approach a problem. There will be several candidate hypotheses, and the group must argue for and against each of them. Without any quantification, it is very subjective which hypothesis has the most support or the better argument for it. By adding probability to a model that is already structurally similar to a Bayesian network, we can achieve this desired goal.
One important thing to point out is that coming up with the numbers is still subjective. How can I say that a statement I make has a .75 probability of being true? Humans have been shown to be not very good at estimating probabilities, and the lack of accurate estimates of the prior and conditional probabilities on statements could cause inaccurate results. There is no easy answer to this problem, but a technique called sensitivity analysis attempts to address some of the shortcomings of this probabilistic approach.

7.3 Implementation

Now that the design of the system has been described, a couple of obvious questions arise. What can it do? What does it know? What the system can do now is the following. On the Comlink side, you can enter statements into the discussion and they will be incorporated into the existing discussion structure. Visually, a statement is a node that is added to a graph. There are different types of statements you can add to the discussion, though the exact details are not important to our design. If the statement either supports or denies some other already-present statement, Comlink allows you to specify this, and an arc is drawn between the two nodes to specify a causal link between the two. The user can then input either the prior probability of the statement, if it has no parents (i.e. doesn't support or deny any other statements), or a conditional probability given all the possible combinations of states of its predecessors/parents. This is entirely analogous to specifying the probability values in a Bayesian network. A key point to note is that the nodes are binary valued, i.e. they are either true or false. In the implementation, when the Comlink graphical structure is created, a mirror IDEAL copy is also created, along with the appropriate functions to link between the two versions.
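The Comlink-to-IDEAL mirroring just described can be sketched in miniature. This Python sketch is only illustrative (the real implementation is in Lisp, and the class and field names here are invented for the example); it shows the essential bookkeeping: each statement becomes a binary node, a support/deny arc makes one statement the parent of another, and probabilities are given either as a prior (for a root) or conditioned on the parent's state.

```python
# Hypothetical sketch of the Comlink statement graph mirrored as a
# binary Bayes net. Names and structure are assumptions for
# illustration; probabilities are taken from the example in the text
# (X/Y = P(false)/P(true), e.g. "vegetarian" is 20/80|T, 40/60|F).

from dataclasses import dataclass, field

@dataclass
class Statement:
    name: str
    parent: "Statement | None" = None   # statement this one supports/denies
    # P(self is true | parent state); a root node uses only "prior".
    p_true: dict = field(default_factory=dict)

burger = Statement("cream cheese soy burger", p_true={"prior": 0.50})
veggie = Statement("vegetarian", parent=burger,
                   p_true={True: 0.80, False: 0.60})

# The mirror bookkeeping: every Comlink node maps to a Bayes-net node.
mirror = {s.name: s for s in (burger, veggie)}
print(mirror["vegetarian"].parent.name)  # -> cream cheese soy burger
```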
Once all of the statements, links, and probability values are specified, the IDEAL side can use one of its inference algorithms to propagate the belief values through the Bayesian network. If any evidence should be discovered, it can be entered into the Bayesian network and the beliefs will be repropagated. Evidence in this case could be knowing for sure that a statement is absolutely true or absolutely false. The end result is that all of the information can be processed, yielding a quantification of the relative likelihood of competing hypotheses. Thus, given all the evidence and support/deny statements, we can pick the best option out of all the candidate hypotheses. What does the system know? The system knows how to combine a collection of statements given how they support/deny each other qualitatively and given user-supplied probability values. It knows how to propagate the numbers and combine evidence in a coherent manner. That is just a byproduct of using Bayesian networks.

7.3.1 Example

Here I will use a simple example to illustrate how the system works and what it can tell us. The problem domain will be McDonald's burgers. The top executives at McDonald's are a little concerned that sales have been steadily decreasing in the past few years. There has been more competition from other fast food places, which have been taking away more and more profit each year. McDonald's wants to do something about this, and the plan is to unveil a new, out-of-this-world burger, the burger that will end all burgers, at the next "Y2K Concept Burger Convention". They hope to generate quite a stir in the fast food industry and thereby send waves of repercussions throughout the market. There are a few constraints, though. Issues such as cost, appearance, target audience, etc., come into play. Basically they want to make a burger that isn't too wild but is just different enough to set it apart from all else.
They also don't want the burger to cost too much money to make or be too difficult for their employees to assemble, or *cook*, depending on who you ask. The top McDonald's executives from around the world decide to use Comlink as the forum for this collaborative decision-making process. The following is an abridged form of their actual discussion. I will use a statement template with several fields: statement number/name, the node numbers it supports and/or denies, a listing of the conditional/prior probabilities, and the actual statement itself. I will start off at the top with the candidate burgers. All statements are binary valued, that is, they are either true or false. I will denote probabilities as X/Y, where X is the probability of being FALSE and Y is the probability of being TRUE. Note that this is redundant, as X + Y = 1. If a statement does not support or deny anything, then it has no predecessors and will be a root node. If a statement has a parent node, then its probability will depend on all the possible combinations of states of the parent nodes. In our particular case, every node has at most one parent node, which is either in the true (T) or false (F) state. I will denote these conditional probabilities as X/Y|T or X/Y|F, where the probabilities are conditioned on the parent node being in the T or F state. Probability values are on a scale of 0-100, with 100 denoting absolute certainty.

Top level statements:

Number: 1
Name: Cream cheese soy burger
Supports: None
Denies: None
Probabilities: 50/50
Statement: A cream cheese soy burger is the best new burger that McDonald's can offer to its customers.

Number: 2
Name: Pickled cabbage quadruple cheeseburger
Supports: None
Denies: None
Probabilities: 50/50
Statement: A pickled cabbage quadruple 1/4 pounder cheeseburger is the best new burger that McDonald's can offer to its customers.
Number: 3
Name: Bacon three-cheese sauteed onion double cheeseburger, or the EVERYTHING burger
Supports: None
Denies: None
Probabilities: 50/50
Statement: A bacon three-cheese sauteed onion double cheeseburger is the best new burger that McDonald's can offer to its customers.

Supporting or denying statements:

Number: 4
Name: Vegetarian
Supports: 1
Denies: None
Probabilities: 20/80|T, 40/60|F
Statement: People today are more concerned about health and will be more willing to eat a soy burger.

Number: 5
Name: Expensive cheese
Supports: None
Denies: 1
Probabilities: 90/10|T, 20/80|F
Statement: Cream cheese is expensive and would cost too much to use in burgers for a reasonable cost.

Number: 6
Name: Meat lovers
Supports: None
Denies: 1
Probabilities: 95/05|T, 35/65|F
Statement: The main bulk of McDonald's customers are meat people who aren't interested in health. Otherwise why would they even be eating fast food? Therefore they would want meat and not some veggie meat-wannabe burger.

Number: 7
Name: Value
Supports: 2
Denies: None
Probabilities: 05/95|T, 50/50|F
Statement: McDonald's customers like value; they want to feel that they are getting their money's worth. The more meat patties in the burger, the greater the value.

Number: 8
Name: Expensive cabbage
Supports: None
Denies: 2
Probabilities: 70/30|T, 40/60|F
Statement: Cabbage isn't cheap, and its cost would be too high to make the burger successful financially.

Number: 9
Name: Too big
Supports: None
Denies: 2
Probabilities: 65/35|T, 20/80|F
Statement: Four meat patties is too big for a person with an average-sized mouth to eat. They'd have to dislocate their jaws to eat it, so it's not likely to be popular with the customers.

Number: 10
Name: Good variety
Supports: 3
Denies: None
Probabilities: 05/95|T, 50/50|F
Statement: Customers love variety when it comes to their burgers. They love getting a burger with all the works. Therefore this burger will be very popular with our customers.
Number: 11
Name: Good cheese
Supports: 3
Denies: None
Probabilities: 30/70|T, 50/50|F
Statement: Offering three types of cheese offers a new, different type of taste not normally associated with burgers. Plus, the types of cheese used will be common cheeses, so it won't be too costly to implement.

Number: 12
Name: Boring
Supports: None
Denies: 3
Probabilities: 50/50|T, 40/60|F
Statement: The burger has a lot of stuff in it, but it is all normal, boring things that other burgers in the past have offered before in some shape or form. The customers will just think it is more of the same thing and not really be excited about it.

Number: 13
Name: Cheap cream cheese
Supports: None
Denies: 5
Probabilities: 80/20|T, 50/50|F
Statement: A new science breakthrough has just occurred which allows for very cheap imitation cream cheese. It tastes virtually the same and can be manufactured for half the cost of the real thing.

Number: 14
Name: Germany bankrupt
Supports: None
Denies: 8
Probabilities: 95/05|T, 50/50|F
Statement: Germany is about to go bankrupt and thus is trying to liquidate all assets. In particular, they have a huge surplus of pickled cabbage which they will desperately try to unload on the American market. Thus pickled cabbage can be purchased for very cheap prices.

These were all of the statements submitted by various McDonald's executives. What the system will do now is construct a model based on these statements on both the Comlink and IDEAL sides. The IDEAL model will generate a Bayesian net. Together with the conditional and prior probabilities, along with the causal links as denoted by the support/deny information, we are able to generate a fully specified Bayesian model based on the given information. Figure 7-1 depicts graphically how the model looks. Note that the values shown next to each node are the beliefs of each node, which are separate from the node's distribution.
Figure 7-1: The Burger Problem before any evidence.

Figure 7-1 shows the model before any specific evidence has been discovered. Recall that a probability listed as X/Y gives the probability of that node being false as X and true as Y. Thus, prior to any evidence, the three root nodes still have the same prior probabilities, initialized to 50/50, stating that all three candidates are equally likely to succeed. Things get a little more interesting when evidence is entered into the diagram. Seven examples of evidence will be demonstrated, and the reader can see how the probabilities propagate as a result. Note that the three candidate burgers form disjoint trees in the diagram; they are not probabilistically linked in this particular example. Therefore evidence in one subtree will only propagate within that subtree and will not affect the results in the other two subtrees.

Example #1: Insider sources at the chem lab where they are rumored to be developing a cheap cream cheese which tastes virtually the same as the real thing have confirmed the rumors. The cheap cream cheese does exist; it tastes just like normal cream cheese and will be sold to the public immediately at a very cheap price. Figure 7-2 depicts the result of this evidence on the cream cheese soy burger subtree. Note that as a result, the probability of the burger being a success has increased from 50% to 64%. Recall that since this part of the tree is disjoint from the rest, any evidence declared within it does not propagate to the other, non-connected nodes.
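The propagation in Example #1 can be reproduced by brute-force enumeration over the chain of statements 1, 5, and 13 (burger, expensive cheese, cheap cream cheese). The sketch below is not IDEAL (which is a Lisp toolkit); it simply enumerates the joint distribution implied by the CPTs listed above and conditions on the evidence that the cheap cream cheese exists.

```python
# Reproduce Example #1 by enumeration over the cream-cheese chain.
# CPTs from the statement list (X/Y = P(false)/P(true)):
from itertools import product

p_burger = {True: 0.50, False: 0.50}    # statement 1: prior 50/50
p_expensive = {True: 0.10, False: 0.80} # P(5=T | 1): 90/10|T, 20/80|F
p_cheap = {True: 0.20, False: 0.50}     # P(13=T | 5): 80/20|T, 50/50|F

def joint(b, e, c):
    """P(burger=b, expensive-cheese=e, cheap-cream-cheese=c)."""
    pb = p_burger[b]
    pe = p_expensive[b] if e else 1 - p_expensive[b]
    pc = p_cheap[e] if c else 1 - p_cheap[e]
    return pb * pe * pc

# Condition on the evidence: cheap cream cheese (statement 13) = TRUE.
num = sum(joint(True, e, True) for e in (True, False))
den = sum(joint(b, e, True) for b, e in product((True, False), repeat=2))
print(round(num / den, 2))  # -> 0.64, matching Figure 7-2's 36/64
```

Note that statements 4 and 6 (vegetarian, meat lovers) can be omitted here: with no evidence on them, they marginalize out and do not change the burger's posterior.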
Example #2: Insider sources in Germany have confirmed that Germany is indeed going to declare bankruptcy and is attempting to liquidate all of its pickled cabbage. As such, they will definitely be able to sell it on the open market for a very cheap price. Figure 7-3 depicts the result of this evidence on the pickled cabbage quadruple burger subtree. Note that the probability of success for the burger has increased from 50% to 61% as a result of this information obtained by our sources.

Example #3: McDonald's has decided to conduct a survey to see just how important variety is to their customers. They conduct a worldwide survey asking their customers to rank qualities in order of importance. The result was that variety was voted the most important quality when it came to burger preference. Customers were tired of boring burgers and liked having different flavors melding together, providing a complex and satisfyingly delicious taste. Figure 7-4 depicts the result of this evidence on the everything burger. The probability of this burger's success has increased from 50% to 66%.

Figure 7-2: Cheap cream cheese developed.
Figure 7-3: Germany goes bankrupt and liquidates pickled cabbage to the world.
Figure 7-4: McDonald's survey shows customers value variety the most.
Figure 7-5: Medical results say that you need to eat meat.

Example #4: A late-breaking medical health experimental result has just occurred. Researchers have found that people who don't eat enough meat can suffer increased chances of heart failure. This shocking news creates a ripple effect, and people are frantically in a meat frenzy now, cutting down on a lot of vegetarian types of food. People are declaring, "I want my meat!".
The health craze suddenly does a 180 on the issue of meat. Figure 7-5 depicts the result of this evidence on the cream cheese soy burger subtree. The probability of success of the burger has dropped from 50% to a shockingly low 7%.

Example #5: The bean counters at McDonald's have come up with some cost figures for the concept pickled cabbage quadruple cheeseburger. The prototype burger does offer a lot of food, but the projected cost to produce such a burger is immense. A single burger alone would have to cost the consumers $10 for McDonald's to make a decent profit on it. This totally shoots down the whole value argument for the burger.

Figure 7-6: Estimated cost of pickled cabbage quadruple burger too high.

Figure 7-6 depicts the result of this evidence on the pickled cabbage quadruple burger. The probability of success of the burger has dropped from 50% to a lowly 9%.

Example #6: This next example has McDonald's once again conducting a survey to see what factors are important to the customers. To generate more customer feedback, each booth that gives out the surveys to be filled out also has food samples. The food sample they are using is the same three-cheese combination that is in the prototype everything burger. The survey also includes customer feedback on the cheeses. Well, the surveys are in and everyone LOVES the three-cheese combo. As before, they also cite variety as the most important quality when choosing among burgers. Figure 7-7 depicts the result of this evidence on the everything burger subtree. The probability of success has increased from 50% to 73%.

Example #7: This final example, as depicted by Figure 7-8, combines all of the evidence from the previous six examples. This is the information that the McDonald's executives have acquired from insider sources and customer surveys. The final result has the cream cheese soy burger with a 22% chance of success.
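Two of the updates above can be checked by hand with Bayes' rule, since each involves only direct children of a root hypothesis. This sketch is an independent check of the numbers, not IDEAL's actual algorithm, and assumes (as the disjoint-tree structure implies) that the children are conditionally independent given the parent.

```python
# Check Examples #4 and #6 by direct Bayes updates on a root node.
# Example #4: meat-lovers (statement 6) observed TRUE against the
# cream cheese soy burger; CPT: P(6=T|1=T)=0.05, P(6=T|1=F)=0.65.
p4 = (0.5 * 0.05) / (0.5 * 0.05 + 0.5 * 0.65)
print(round(p4, 2))  # -> 0.07, matching the drop to 7%

# Example #6: good-variety (10) and good-cheese (11) both observed TRUE
# against the everything burger; P(10=T|3)=0.95/0.50, P(11=T|3)=0.70/0.50.
# Conditionally independent children multiply in the likelihood.
p6 = (0.5 * 0.95 * 0.70) / (0.5 * 0.95 * 0.70 + 0.5 * 0.50 * 0.50)
print(round(p6, 2))  # -> 0.73, matching the rise to 73%
```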
Figure 7-7: New McDonald's survey also shows that customers love the cheeses.

Figure 7-8: Final result putting together all the evidence.

The pickled cabbage quadruple cheeseburger has a 14% chance of success. The everything burger has a 73% chance of success. Given the results that the system outputs, the McDonald's executives decide to go with the everything burger as their new flagship burger. The burger is wildly successful and heralded by many as the next big thing, a dramatic quantum leap in burger technology, forever known as one of the all-time greats if not THE best burger of all time.

This is just one simple but realistic (debatable) situation where the Comlink system together with IDEAL can prove to be most useful. In more complex situations, the value would be even greater, as it is harder to discern or discriminate among a large number of candidate hypotheses, each with its own complex interaction of supports and denies. A systematic formal approach would be the only tractable way to tackle such a problem.

7.4 Conclusion

7.4.1 System critique

We have now seen a description of the system and walked through an example. A logical question to ask is: what can it do, and is it useful at all? The "what can it do" question has been answered, so now we deal with the more important question of how useful it is. Barring the big problem of how the probability numbers are generated, the system succeeds at solving the intended problem. The problem was to give a systematic, mathematical, probabilistic functionality to the Comlink system. That requirement has been met.
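The style of updating used throughout the walkthrough can be made concrete with a small, self-contained sketch. This is not the IDEAL solver; the network structure, node names, and numbers below are invented for illustration, mirroring a top-level "success" hypothesis supported by two lower-level statements, with the posterior computed by brute-force enumeration:

```python
# Minimal sketch of evidence propagation in a tiny belief network.
# NOT the IDEAL solver; structure and numbers are invented to mirror
# the burger walkthrough: a top-level hypothesis with two supports.

from itertools import product

P_VARIETY = 0.5          # prior: customers value variety
P_CHEESE = 0.5           # prior: customers like the cheese combo

# P(success | variety, cheese): support links raise success probability.
P_SUCCESS = {(True, True): 0.9, (True, False): 0.6,
             (False, True): 0.6, (False, False): 0.2}

def p_success(evidence=None):
    """P(success | evidence) by brute-force enumeration over the leaves."""
    evidence = evidence or {}
    num = den = 0.0
    for variety, cheese in product([True, False], repeat=2):
        # skip assignments inconsistent with the observed evidence
        if any(k in evidence and evidence[k] != v
               for k, v in [("variety", variety), ("cheese", cheese)]):
            continue
        w = ((P_VARIETY if variety else 1 - P_VARIETY) *
             (P_CHEESE if cheese else 1 - P_CHEESE))
        num += w * P_SUCCESS[(variety, cheese)]
        den += w
    return num / den

print(p_success())                                    # no evidence
print(p_success({"variety": True}))                   # survey confirms variety
print(p_success({"variety": True, "cheese": True}))   # both confirmed
```

As in the examples, each piece of confirming evidence raises the top-level probability; enumeration is exponential in the number of leaves, which is why a real solver like IDEAL uses join-tree propagation instead.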
We can now link together any number of statements which are part of a decision-making process and propagate the values through the model. This is an extremely useful feature in any decision-making procedure where one has to choose among competing viewpoints or alternatives. Formalism and structure have been introduced into a domain that has traditionally been very subjective and disorganized. This provides a significantly more efficient method in the problem domain. Greater efficiency in any process is always a desirable goal.

7.4.2 Lessons learned

What did I learn from designing and building this system? There are many issues and thoughts that came up while working on this portion of the project. For one thing, I've learned that when trying to integrate one system into another, it really helps if the two systems have comparable structure to begin with. This was the situation in our case, as the Comlink structure was very similar to the IDEAL structure. Extending the Comlink side to be able to accept probability values was not very difficult. The integration then became more of a bookkeeping process to keep track of the Comlink model and the IDEAL model of the problem. Another lesson I learned is that when extending a large body of existing code, things are fine as long as you can treat the existing code as a black box. However, when it behaves contrary to what you expect, it is an arduous task to dig through that code to try to understand why it didn't do what you expected it to do. Good documentation would always help in this case. Such is one of the many follies of large software projects. What went well was how relatively trouble-free it was to integrate IDEAL into the Comlink system. This was mainly due to the structural similarities between the two models. What didn't work out so well was coming up with good examples.
There are existing Comlink structures debating certain issues, but one would have to go back and input all of that information plus the probability values again. As I am not an expert in those fields, it is difficult for me to assess what an accurate probability value should be for statements and how well one statement supports or denies another. Coming up with a non-trivial model is not a simple task. Coming up with meaningful numbers and probabilities takes time and a more expert understanding of the problem domain and the many issues that arise. Another problem was the lack of specificity in the problem domain. There is a vast number of problems to which this system could be applied. Comlink is capable of handling online collaborative decision making as well as data interpretation. The focus of the project became unclear at times, as the problem domain seemed very large. Knowing exactly what type of problem to tackle would definitely have made things easier. However, this was more a result of trying to create a general system that could handle a multitude of problems.

7.5 Future work

Although the system we have does the intended job, there are still some limitations to it. We will discuss a couple of these issues, which are open research topics suitable for future work on the project to extend its functionality.

7.5.1 Sensitivity Analysis

One of the chief complaints against the system as it now stands is a common problem with many probabilistic approaches: subjective values for probabilities still need to be assigned. The fear is that you can get whatever result you want from the same model structure by playing around with the numbers. One technique that attempts to deal with this complaint is sensitivity analysis. The basic idea is to see how varying the probabilities of different nodes affects the top-level probabilities used to differentiate between competing hypotheses.
We can see how sensitive the high-level nodes are to variations in the probability of other nodes further down the tree. What one often finds is that the final probability usually doesn't change much when the probability of another node is varied within a reasonable range. It is difficult to say definitively that something is likely to be true with probability x. However, it is much more feasible to state a range of probabilities. For example, I am not exactly sure how probable it is that eating greasy food causes heart disease. I can say that it is between .5 and .75 and be more certain of that assessment than of any single value. Sensitivity analysis thus attempts to allow for ranges of probabilities and to see how each node can affect the final result. Some nodes will obviously have more of an impact on the final value than others. The most influential nodes are called critical nodes. Identification of critical nodes is very important, as it tells you which issue/node is the most important one to debate in order to determine the final outcome. It lets you know where best to focus your resources. For example, in the realm of intelligence data interpretation, it might be critical to know whether a foreign nation is stockpiling fighter jets and bombers in order to assess the likelihood of an air strike by that country. The issue of how much air power is being accumulated is critical, and information about it should be acquired by all means necessary. In the collaborative decision-making problem, critical nodes could be the basic issues that have the most bearing on the matter at hand. Identifying critical nodes can allow the group to argue about the important, impactful issues rather than peripheral topics that don't have much influence on the final decision.
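The one-way sweep described above can be sketched in a few lines. The network, node names, and probability ranges below are invented for illustration (they are not from the thesis's models): vary each leaf prior over an interval, measure how far the top-level probability moves, and call the node with the biggest swing the critical node.

```python
# Sketch of one-way sensitivity analysis on a tiny invented network.
# The leaf whose prior induces the largest swing in the top-level
# probability is the "critical node".

from itertools import product

# P(success | variety, cheese) -- deliberately asymmetric so one
# leaf matters more than the other.
P_SUCCESS = {(True, True): 0.9, (True, False): 0.7,
             (False, True): 0.5, (False, False): 0.2}

def p_success(p_variety, p_cheese):
    """Top-level P(success) by enumeration over the two leaf priors."""
    total = 0.0
    for variety, cheese in product([True, False], repeat=2):
        w = ((p_variety if variety else 1 - p_variety) *
             (p_cheese if cheese else 1 - p_cheese))
        total += w * P_SUCCESS[(variety, cheese)]
    return total

def swing(node, lo=0.25, hi=0.75, base=0.5):
    """How far P(success) moves as one prior sweeps [lo, hi]."""
    if node == "variety":
        vals = [p_success(p, base) for p in (lo, hi)]
    else:
        vals = [p_success(base, p) for p in (lo, hi)]
    return abs(vals[1] - vals[0])

swings = {n: swing(n) for n in ("variety", "cheese")}
critical = max(swings, key=swings.get)
print(swings, "critical node:", critical)
```

In this toy model the "variety" node moves the top-level value almost twice as much as "cheese" over the same prior range, so debate (and data-gathering) effort is best spent there, which is exactly the resource-allocation point made above.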
Thus sensitivity analysis and the identification of critical nodes are very useful techniques which we would like to incorporate into our system in the future if possible. They address the main criticism of inaccurate, subjective evaluations of probabilities.

7.5.2 Multiple Viewpoints

Another problem arises from the generation of the model for the problem. One person can say that a certain statement, call it A, supports another statement, call it B. However, somebody else could argue that statement A more accurately supports statement C while denying statement D. Thus, even before you get into arguing about the specific numbers for the probabilities, there is disagreement over how the domain should be modeled. The problem then becomes this: two people might model the problem differently and arrange the statements along with the support and deny links in different ways, each coming up with their own model or viewpoint. What do we do with all of these multiple viewpoints on the same problem? Which one is *more* correct? It is difficult to say which model is better or closer to the real problem. So, assuming that we can't tell for sure which viewpoint is better, is there any way we can use all of these multiple viewpoints? Is there any way for us to integrate or merge them so that we can get some useful information out of them? Can we tell how, and to what degree, the differing viewpoints agree or disagree on certain topics? This is a very interesting question and an open research topic. There isn't much current active work in this area, but as Bayesian networks become more widespread, this issue grows in importance and will eventually need to be dealt with.

Chapter 8

Summary

We have demonstrated successfully how Bayesian networks can be used and integrated with existing applications to provide useful, productive results.
The scope of problems this can be applied to is very large, and we are only at the tip of the iceberg right now in terms of applications using Bayesian networks. As computers get even more powerful, Bayesian networks will play an even larger role in more problem domains than ever thought possible. The two broad but important fields of model-based troubleshooting and collaborative decision making have much greater potential and functionality when the capabilities of probabilistic methods are incorporated into them. It is thus feasible and highly desirable to see what else can benefit from them. The usage of Bayesian nets started a little over 10 years ago, and it is showing no signs of slowing down. It will thus be very interesting to observe their development over the course of the next few years.

Appendix A

delay-simulator code

This code can be found on the server named WILSON in the MIT Artificial Intelligence Laboratory in the KBCW group. The file is located on that machine at: w:>hes>delay-simulator.lisp

;;; -*- Mode: joshua; Syntax: joshua; Package: user -*-

(in-package :ju)

;;; structural model
(define-predicate component (name) (ltms:ltms-predicate-model))
(define-predicate dataflow (producer-port producer consumer-port consumer) (ltms:ltms-predicate-model))
(define-predicate port (direction component port-name) (ltms:ltms-predicate-model))

;;; underlying infrastructure model
(define-predicate resource (name) (ltms:ltms-predicate-model))

;;; dynamic assertions about values
(define-predicate potentially (predicate) (ltms:ltms-predicate-model))
(define-predicate earliest-arrival-time (port component time) (ltms:ltms-predicate-model))
(define-predicate latest-arrival-time (port component time) (ltms:ltms-predicate-model))
(define-predicate earliest-production-time (port component time) (ltms:ltms-predicate-model))
(define-predicate latest-production-time (port component time) (ltms:ltms-predicate-model))

;;; static and dynamic assertions about "mode"s
(define-predicate possible-model (component model) (ltms:ltms-predicate-model))
(define-predicate selected-model (component model) (ltms:ltms-predicate-model))
(define-predicate executes-on (component resource) (ltms:ltms-predicate-model))

;;; Probabilities (a priori and conditional)
(define-predicate a-priori-probability-of (resource model probability) (ltms:ltms-predicate-model))
(define-predicate conditional-probability (component component-model resource resource-model conditional-probability) (ltms:ltms-predicate-model))

(define-predicate-method (notice-truth-value-change potentially :after) (old-truth-value)
  (declare (ignore old-truth-value))
  (let* ((predication (second (predication-statement self)))
         (type (predication-predicate predication)))
    (with-statement-destructured (port component time) predication
      (declare (ignore time))
      (let ((best-time nil) (best-guy nil)
            (old-best-time nil) (old-best-guy nil))
        (block find-current
          (ask `[,type ,port ,component ?his-time]
               #'(lambda (justification)
                   (setq old-best-time ?his-time
                         old-best-guy (ask-database-predication justification))
                   (return-from find-current (values)))))
        (when (or (null old-best-guy)
                  (not (eql :premise (destructure-justification (current-justification old-best-guy)))))
          (setq best-time old-best-time)
          (ask `[potentially [,type ,port ,component ?new-time]]
               #'(lambda (justification)
                   (when (or (null best-time)
                             (case type
                               (earliest-arrival-time (> ?new-time best-time))
                               (latest-arrival-time (< ?new-time best-time))
                               (earliest-production-time (> ?new-time best-time))
                               (latest-production-time (< ?new-time best-time))))
                     (setq best-time ?new-time
                           best-guy (ask-database-predication justification)))))
          (when (or (null old-best-time) (not (= old-best-time best-time)))
            (when old-best-guy (unjustify old-best-guy))
            (when best-time
              (tell `[,type ,port ,component ,best-time]
                    :justification `(best-time-finder (,best-guy))))))))))

(defmacro defensemble (name &key components dataflows resources resource-mappings model-mappings)
  `(defun ,name ()
     (clear)
     ,@(loop for (component) in components
             collect `(tell [component ,component]))
     ,@(loop for (resource) in resources
             collect `(tell [resource ,resource]))
     ,@(loop for (component . plist) in components
             for models = (getf plist :models)
             for inputs = (getf plist :inputs)
             for outputs = (getf plist :outputs)
             append (loop for output in outputs
                          collect `(tell [port output ,component ,output]))
             append (loop for input in inputs
                          collect `(tell [port input ,component ,input]))
             append (loop for model in models
                          collect `(tell [possible-model ,component ,model])))
     ,@(loop for (resource . models) in resources
             append (loop for (model prior-probability) in models
                          collect `(tell [possible-model ,resource ,model])
                          when prior-probability
                          collect `(tell [a-priori-probability-of ,resource ,model ,prior-probability])))
     ,@(loop for (out-port from-component in-port to-component) in dataflows
             collect `(tell [dataflow ,out-port ,from-component ,in-port ,to-component]))
     ,@(loop for (component resource) in resource-mappings
             collect `(tell [executes-on ,component ,resource]))
     ,@(loop for (component component-model resource resource-model probability) in model-mappings
             collect `(tell [conditional-probability ,component ,component-model ,resource ,resource-model ,probability]))))

(defmacro defmodel ((component model-name) &body rules)
  `(progn
     ,@(loop for (inputs outputs min max) in rules
             for clause-number from 0
             for clause-name = (make-name component model-name)
             append (build-forward-propagators inputs outputs component model-name clause-number min max clause-name)
             append (build-backward-propagators inputs outputs component model-name clause-number min max clause-name))))

(defun build-forward-propagators (inputs outputs component model-name clause-number min max rule-name)
  (setq rule-name (make-name rule-name 'forward))
  (let ((min-rule-name (intern (format nil "~a-~a-F-min-~d" component model-name clause-number)))
        (max-rule-name (intern (format nil "~a-~a-F-max-~d" component model-name
                                        clause-number))))
    (list
     (loop for input in inputs
           for counter from 0
           for logic-value = `(logic-variable-maker ,(intern (format nil "?input-~d" counter)))
           for support-lv = `(logic-variable-maker ,(intern (format nil "?support-~d" counter)))
           collect logic-value into lvs
           collect support-lv into support-lvs
           append `((predication-maker '(earliest-arrival-time ,input ,component ,logic-value)) :support ,support-lv) into clauses
           finally (let ((model-support `(logic-variable-maker ,(intern "?model-support"))))
                     (return `(defrule ,min-rule-name (:forward)
                                If (predication-maker
                                     '(and (predication-maker '(selected-model ,component ,model-name)) :support ,model-support
                                           ,@clauses))
                                then (compute-forward-min-delay (list ,@lvs) ,min ',outputs ',component
                                                                (list ,model-support ,@support-lvs) ',rule-name)))))
     (loop for input in inputs
           for counter from 0
           for logic-value = `(logic-variable-maker ,(intern (format nil "?input-~d" counter)))
           for support-lv = `(logic-variable-maker ,(intern (format nil "?support-~d" counter)))
           collect logic-value into lvs
           collect support-lv into support-lvs
           append `((predication-maker '(latest-arrival-time ,input ,component ,logic-value)) :support ,support-lv) into clauses
           finally (let ((model-support `(logic-variable-maker ,(intern "?model-support"))))
                     (return `(defrule ,max-rule-name (:forward)
                                If (predication-maker
                                     '(and (predication-maker '(selected-model ,component ,model-name)) :support ,model-support
                                           ,@clauses))
                                then (compute-forward-max-delay (list ,@lvs) ,max ',outputs ',component
                                                                (list ,model-support ,@support-lvs) ',rule-name))))))))

(defun compute-forward-min-delay (input-times delay output-names component-name support rule-name)
  (let* ((max-of-input-times (loop for input-time in input-times maximize input-time))
         (output-time (+ max-of-input-times delay)))
    (loop for output-name in output-names
          doing (tell `[potentially [earliest-production-time ,output-name ,component-name ,output-time]]
                      :justification `(,rule-name ,support)))))

(defun
 compute-forward-max-delay (input-times delay output-names component-name support rule-name)
  (let* ((max-of-input-times (loop for input-time in input-times maximize input-time))
         (output-time (+ max-of-input-times delay)))
    (loop for output-name in output-names
          doing (tell `[potentially [latest-production-time ,output-name ,component-name ,output-time]]
                      :justification `(,rule-name ,support)))))

(defun build-backward-propagators (inputs outputs component model-name clause-number min max rule-name)
  (setq rule-name (make-name rule-name 'backwards))
  (loop for output in outputs
        for output-counter from 0
        append (loop for input in inputs
                     for input-counter from 0
                     for min-rule-name = (intern (format nil "~a-~a-B-min-~d-~d-~d" component model-name clause-number input-counter output-counter))
                     for max-rule-name = (intern (format nil "~a-~a-B-max-~d-~d-~d" component model-name clause-number input-counter output-counter))
                     collect (loop for other-input in (remove input inputs)
                                   for counter from 0
                                   for logic-value = `(logic-variable-maker ,(intern (format nil "?input-~d" counter)))
                                   for support-lv = `(logic-variable-maker ,(intern (format nil "?support-~d" counter)))
                                   collect logic-value into lvs
                                   collect support-lv into support-lvs
                                   append `((predication-maker '(earliest-arrival-time ,other-input ,component ,logic-value)) :support ,support-lv) into clauses
                                   finally (let ((model-support `(logic-variable-maker ,(intern "?model-support")))
                                                 (output-lv `(logic-variable-maker ,(intern "?output-time")))
                                                 (output-support `(logic-variable-maker ,(intern "?output-support"))))
                                             (return `(defrule ,min-rule-name (:forward)
                                                        If (predication-maker
                                                             '(and (predication-maker '(selected-model ,component ,model-name)) :support ,model-support
                                                                   (predication-maker '(earliest-production-time ,output ,component ,output-lv)) :support ,output-support
                                                                   ,@clauses))
                                                        then (compute-backward-min-delay ,output-lv (list ,@lvs) ,max ',input ',component
                                                                                         (list ,model-support ,output-support ,@support-lvs) ',rule-name)))))
                     collect (let ((model-support `(logic-variable-maker ,(intern "?model-support")))
                                   (support-lv `(logic-variable-maker ,(intern "?output-support")))
                                   (output-lv `(logic-variable-maker ,(intern "?output"))))
                               `(defrule ,max-rule-name (:forward)
                                  If (predication-maker
                                       '(and (predication-maker '(selected-model ,component ,model-name)) :support ,model-support
                                             (predication-maker '(latest-production-time ,output ,component ,output-lv)) :support ,support-lv))
                                  then (compute-backward-max-delay ,output-lv ,min ',input ',component
                                                                   (list ,model-support ,support-lv) ',rule-name))))))

(defun compute-backward-min-delay (output-time other-input-times delay input-name component-name support rule-name)
  (let* ((max-of-other-input-times (loop for input-time in other-input-times maximize input-time))
         (input-constraint (- output-time delay)))
    (when (> input-constraint max-of-other-input-times)
      (tell `[potentially [earliest-arrival-time ,input-name ,component-name ,input-constraint]]
            :justification `(,rule-name ,support)))))

(defun compute-backward-max-delay (output-time delay input-name component-name support rule-name)
  (let* ((constraint (- output-time delay)))
    (tell `[potentially [latest-arrival-time ,input-name ,component-name ,constraint]]
          :justification `(,rule-name ,support))))

;;; Basic rules for detecting conflicts and propagating dataflows

(defrule time-inconsistency-1 (:forward)
  If [and [earliest-arrival-time ?input ?component ?early]
          [latest-arrival-time ?input ?component ?late]
          (> ?early ?late)]
  then [ltms:contradiction])

(defrule time-inconsistency-2 (:forward)
  If [and [earliest-production-time ?output ?component ?early]
          [latest-production-time ?output ?component ?late]
          (> ?early ?late)]
  then [ltms:contradiction])

(defrule dataflow-1 (:forward)
  If [and [dataflow ?producer-port ?producer ?consumer-port ?consumer]
          [earliest-arrival-time ?consumer-port ?consumer ?time]]
  then [earliest-production-time ?producer-port ?producer ?time])

(defrule dataflow-2 (:forward)
  If [and [dataflow
           ?producer-port ?producer ?consumer-port ?consumer]
          [latest-arrival-time ?consumer-port ?consumer ?time]]
  then [latest-production-time ?producer-port ?producer ?time])

(defrule dataflow-3 (:forward)
  If [and [dataflow ?producer-port ?producer ?consumer-port ?consumer]
          [earliest-production-time ?producer-port ?producer ?time]]
  then [earliest-arrival-time ?consumer-port ?consumer ?time])

(defrule dataflow-4 (:forward)
  If [and [dataflow ?producer-port ?producer ?consumer-port ?consumer]
          [latest-production-time ?producer-port ?producer ?time]]
  then [latest-arrival-time ?consumer-port ?consumer ?time])

;;; building the equivalent ideal model
;;; predicates to map over: component resource possible-model
;;;   a-priori-probability-of conditional-probability set-up-evidence-nodes
;;; the dynamic part: selected-model

(defun build-ideal-model ()
  (let ((ideal-diagram nil))
    (setq ideal-diagram (build-nodes ideal-diagram 'component))
    (setq ideal-diagram (build-nodes ideal-diagram 'resource))
    (do-priors ideal-diagram)
    (do-conditional-probabilities ideal-diagram)
    ideal-diagram))

(defun build-nodes (ideal-diagram node-type)
  (ask `[,node-type ?node]
       #'(lambda (stuff)
           (declare (ignore stuff))
           (let ((states nil))
             (ask [possible-model ?node ?model-name]
                  #'(lambda (stuff)
                      (declare (ignore stuff))
                      (push ?model-name states)))
             (multiple-value-bind (new-diagram parent-node)
                 (ideal:add-node ideal-diagram :name ?node :type :chance
                                 :relation-type :prob :state-labels states)
               (setq ideal-diagram new-diagram)
               (when (eql node-type 'component)
                 (loop for state in states
                       for new-name = (make-name ?node state)
                       do (multiple-value-bind (new-diagram child-node)
                              (ideal:add-node ideal-diagram :name new-name :type :chance
                                              :relation-type :prob :state-labels '(:true :false))
                            (setq ideal-diagram new-diagram)
                            ;; Add an arc from the parent node to it
                            (ideal:add-arcs child-node (list parent-node))
                            ;; Set the conditional probabilities on the child evidence node
                            (ideal:for-all-cond-cases (parent-case parent-node)
                              (let
                                  ((correct-corresponding-case
                                     (eql (ideal:state-in parent-case)
                                          (ideal:get-state-label parent-node state))))
                                (ideal:for-all-cond-cases (child-case child-node)
                                  (let ((true-case (eql (ideal:state-in child-case)
                                                        (ideal:get-state-label child-node :true))))
                                    (if correct-corresponding-case
                                        (setf (ideal:prob-of child-case parent-case) (if true-case 1 0))
                                        (setf (ideal:prob-of child-case parent-case) (if true-case 0 1))))))))))))))
  ideal-diagram)

(defun do-priors (ideal-diagram)
  (ask [a-priori-probability-of ?resource ?model ?probability]
       #'(lambda (stuff)
           (declare (ignore stuff))
           (let* ((ideal-node (ideal:find-node ?resource ideal-diagram))
                  (state-label (ideal:get-state-label ideal-node ?model)))
             (ideal:for-all-cond-cases (case ideal-node)
               (when (eql (ideal:state-in case) state-label)
                 (setf (ideal:prob-of case nil) ?probability)))))))

(defun do-conditional-probabilities (ideal-diagram)
  ;; first add arcs
  (ask [component ?component]
       #'(lambda (stuff)
           (declare (ignore stuff))
           (let ((component-node (ideal:find-node ?component ideal-diagram)))
             (ask [executes-on ?component ?resource]
                  #'(lambda (stuff)
                      (declare (ignore stuff))
                      (let ((resource-node (ideal:find-node ?resource ideal-diagram)))
                        (ideal:add-arcs component-node (list resource-node))))))))
  ;; now build conditional probabilities
  (ask [conditional-probability ?component ?c-model ?resource ?r-model ?probability]
       #'(lambda (stuff)
           (declare (ignore stuff))
           (let* ((resource-node (ideal:find-node ?resource ideal-diagram))
                  (resource-label (ideal:get-state-label resource-node ?r-model))
                  (component-node (ideal:find-node ?component ideal-diagram))
                  (component-label (ideal:get-state-label component-node ?c-model)))
             (ideal:for-all-cond-cases (c-case component-node)
               (when (eql (ideal:state-in c-case) component-label)
                 (ideal:for-all-cond-cases (r-case resource-node)
                   (when (eql (ideal:state-in r-case) resource-label)
                     (setf (ideal:prob-of c-case r-case) ?probability)))))))))

(defun make-name (&rest names)
  (intern (format nil "~{~a~^-~}" names)))
(defun make-component-state-invalid (component state diagram)
  (let* ((node-name (make-name component state))
         (node (ideal:find-node node-name diagram)))
    ;; set ideal evidence
    node))

(defstruct (retraction-entry)
  clause probability ideal-node component model)

;;; The other thing to do is build a probability model equivalent of the no-good:
;;; the disallowed set of states conjunctively imply a binary node (the ideal
;;; "contradiction-node") whose false state has evidence.  This will have the
;;; effect that when one of these states becomes more likely the others will
;;; become less likely.  So the true-true-...-true case justifies the true case
;;; and the other cases justify the false case equally?

(defun pick-best-guy-to-retract (condition ideal-diagram)
  (let ((assumptions (tms-contradiction-non-premises condition)))
    (let ((retraction-entries
            (loop for clause in assumptions
                  collect (multiple-value-bind (mnemonic assumption) (destructure-justification clause)
                            (declare (ignore mnemonic))
                            (with-statement-destructured (component model) assumption
                              (let* ((node-name (make-name component model))
                                     (ideal-node (ideal:find-node node-name ideal-diagram))
                                     (true-state (ideal:get-state-label ideal-node :true))
                                     (probability nil))
                                (ideal:for-all-cond-cases (node-case ideal-node)
                                  (when (eql (ideal:state-in node-case) true-state)
                                    (setq probability (ideal:belief-of node-case))))
                                (make-retraction-entry :clause clause
                                                       :probability probability
                                                       :ideal-node ideal-node
                                                       :component component
                                                       :model model)))))))
      (loop with best = (first retraction-entries)
            for entry in (rest retraction-entries)
            when (< (retraction-entry-probability entry) (retraction-entry-probability best))
            do (setq best entry)
            finally (return best)))))

(defun find-best-other-state (best ideal-diagram)
  (let* ((ideal-node (ideal:find-node (retraction-entry-component best) ideal-diagram))
         (ideal-state (ideal:get-state-label ideal-node (retraction-entry-model best)))
         (component (retraction-entry-component best))
         (best-other-case nil)
         (best-other-probability nil))
    (ideal:for-all-cond-cases (node-case ideal-node)
      (when (and (not (eql ideal-state (ideal:state-in node-case)))
                 (or (null best-other-case)
                     (> (ideal:belief-of node-case) best-other-probability)))
        (setq best-other-probability (ideal:belief-of node-case))
        (setq best-other-case (ideal::label-name (ideal:state-in node-case)))))
    (values `[selected-model ,component ,best-other-case])))

(defun update-ideal-for-unretraction (component model ideal-diagram jensen-tree)
  (let ((ideal-node (ideal:find-node (make-name component model) ideal-diagram)))
    (when (eql (ideal:node-state ideal-node) (ideal:get-state-label ideal-node :false))
      (format t "~%Reconsidering model ~a of component ~a" model component)
      (ideal:remove-evidence (list ideal-node))
      (ideal:jensen-infer jensen-tree ideal-diagram))))

(defun update-ideal-for-retraction (best ideal-diagram jensen-tree)
  ;; provide negative evidence for the guy we just removed and resolve the diagram
  (let ((ideal-node (retraction-entry-ideal-node best)))
    (setf (ideal:node-state ideal-node) (ideal:get-state-label ideal-node :false))
    (ideal:jensen-infer jensen-tree ideal-diagram)))

(defun run-case (ensemble-name input-timings output-timings)
  ;; first set up with each model being the normal one
  (funcall ensemble-name)
  (let* ((ideal-diagram (build-ideal-model))
         (ideal-diagram-jensen-tree (ideal:create-jensen-join-tree ideal-diagram)))
    (ideal:jensen-infer ideal-diagram-jensen-tree ideal-diagram)
    (with-atomic-action
      (loop for (node component time) in input-timings
            doing (tell `[potentially [earliest-arrival-time ,node ,component ,time]] :justification :premise)
                  (tell `[latest-arrival-time ,node ,component ,time] :justification :premise))
      (loop for (node component time) in output-timings
            doing (tell `[potentially [earliest-production-time ,node ,component ,time]] :justification :premise)
                  (tell `[latest-production-time ,node ,component ,time] :justification :premise)))
    (let ((guys-signalled-to-assert
            nil))
      (condition-bind ((ltms:ltms-contradiction
                         #'(lambda (condition)
                             (let ((best (pick-best-guy-to-retract condition ideal-diagram)))
                               (update-ideal-for-retraction best ideal-diagram ideal-diagram-jensen-tree)
                               (push (find-best-other-state best ideal-diagram) guys-signalled-to-assert)
                               (let ((clause-to-retract (retraction-entry-clause best)))
                                 (multiple-value-bind (mnemonic consequent) (destructure-justification clause-to-retract)
                                   (declare (ignore mnemonic))
                                   (with-statement-destructured (component model) consequent
                                     (format t "~%Retracting Model ~a of component ~a" model component)))
                                 (sys:proceed condition :unjustify-subset (list clause-to-retract)))))))
        (loop for first-time = t then nil
              for guys-to-assert = nil then guys-signalled-to-assert
              until (and (not first-time) (null guys-to-assert))
              doing (setq guys-signalled-to-assert nil)
                    (cond (first-time
                           (ask [component ?component]
                                #'(lambda (stuff)
                                    (declare (ignore stuff))
                                    (tell [selected-model ?component normal] :justification :assumption))))
                          (guys-to-assert
                           (loop for guy-to-assert in guys-to-assert
                                 doing (with-statement-destructured (component model) guy-to-assert
                                         (format t "~%Choosing Model ~a for ~a" model component)
                                         (update-ideal-for-unretraction component model ideal-diagram ideal-diagram-jensen-tree)
                                         (tell guy-to-assert :justification :assumption))))))))
    ideal-diagram))

;;; Examples of system descriptions

#|
Example #1: Simple hacked resource
|#

(defensemble test-1
  :components ((foo :models (normal fast) :inputs (a b) :outputs (c d))
               (bar :models (normal slow) :inputs (a b) :outputs (x y)))
  :dataflows ((c foo a bar) (d foo b bar))
  :resources ((box-1 (normal .9) (hacked .1))
              (box-2 (normal .8) (hacked .2)))
  :resource-mappings ((foo box-1) (bar box-2))
  :model-mappings ((foo normal box-1 normal .90)
                   (foo fast box-1 normal .10)
                   (foo normal box-1 hacked .20)
                   (foo fast box-1 hacked .80)
                   (bar normal box-2 normal .90)
                   (bar slow box-2 normal .10)
                   (bar normal box-2 hacked .10)
                   (bar slow box-2 hacked .90)))

(defmodel (foo normal) ((a b) (c d) 2 7))
(defmodel (foo fast) ((a b) (c d) 1 5))
(defmodel (bar normal) ((a) (x) 5 10) ((b) (y) 7 15))
(defmodel (bar slow) ((a) (x) 10 20) ((b) (y) 15 20))

;; Various test cases with different inputs/outputs to test the system

;; normal case with no faults
;; P(box-1 = hacked) = 0.10
;; P(box-2 = hacked) = 0.20
(run-case 'test-1 '((a foo 10) (b foo 15)) '((x bar 30)))

;; fault in the output of bar, slower than expected
;; the model of bar is changed to slow, which solves the contradiction.
;; Also the probability of box-2 (which bar runs on) being hacked increases
;; from 0.20 to 0.69, which makes sense and is to be expected
(run-case 'test-1 '((a foo 10) (b foo 15)) '((x bar 34)))

;; fault is in the output of foo, where it comes earlier than is expected.
;; The system correctly changes the model of foo to fast, which makes it
;; consistent with the given inputs/outputs.  Probability of box-1 being
;; hacked increases from 0.10 to 0.47
(run-case 'test-1 '((a foo 10) (b foo 15)) '((c foo 16)))

;; this is a case with no possible solutions, i.e. no combination of selected
;; models for the components will yield a non-contradictory state.
;; Right now the system does not cope with this situation yet, i.e. it doesn't
;; know when to stop.  There appears to be a problem with the setting of
;; evidence on the ideal side of things.  Perhaps the evidence should be reset
;; on each iteration of picking a component model to retract.  The problem
;; arises when the code tries to retract a retraction, which on the Ideal side
;; is equivalent to setting evidence both for the True and the False case,
;; which is impossible.
(run-case 'test-1 '((a foo 10) (b foo 15)) '((x bar 43)))

;;; Example #2: Branch

(defensemble test-2
  :components ((foo :models (normal fast) :inputs (a) :outputs (b))
               (bar :models (normal slow) :inputs (a) :outputs (x))
               (baz :models (normal really-slow) :inputs (a) :outputs (y)))
  :dataflows ((b foo a bar) (b foo a baz))
  :resources ((resource-1 (normal .9) (hacked .1))
              (resource-2 (normal .8) (hacked .2)))
  :resource-mappings ((foo resource-1) (bar resource-2) (baz resource-2))
  :model-mappings ((foo normal resource-1 normal .90)
                   (foo fast resource-1 normal .10)
                   (foo normal resource-1 hacked .20)
                   (foo fast resource-1 hacked .80)
                   (bar normal resource-2 normal .90)
                   (bar slow resource-2 normal .10)
                   (bar normal resource-2 hacked .10)
                   (bar slow resource-2 hacked .90)
                   (baz normal resource-2 normal .90)
                   (baz really-slow resource-2 normal .10)
                   (baz normal resource-2 hacked .10)
                   (baz really-slow resource-2 hacked .90)))

(defmodel (foo normal) ((a) (b) 2 7))
(defmodel (foo fast) ((a) (b) 1 5))
(defmodel (bar normal) ((a) (x) 5 10))
(defmodel (bar slow) ((a) (x) 7 15))
(defmodel (baz normal) ((a) (y) 5 10))
(defmodel (baz really-slow) ((a) (y) 10 17))

;;; Various test cases with different inputs/outputs to test the system.

;; Normal case with no faults.
;; P(resource-1 = hacked) = 0.10
;; P(resource-2 = hacked) = 0.20
(run-case 'test-2 '((a foo 10)) '((x bar 25) (y baz 25)))

;; Fault on the output of baz: the value is higher than tolerable.  The
;; selected model of baz is changed from normal to really-slow, which solves
;; the contradiction.  The probability that resource-2 is hacked increases
;; from .2 to .69.
(run-case 'test-2 '((a foo 10)) '((x bar 25) (y baz 30)))

;; Same as the previous case, but with bar having the fault instead.
(run-case 'test-2 '((a foo 10)) '((x bar 30) (y baz 25)))

;; Both bar and baz have a fault on their outputs.  The result is that both
;; bar and baz have been changed to their *slower* models.  The probability
;; that resource-2 is hacked increases from .2 to .95.  This makes sense and
;; is to be expected.
(run-case 'test-2 '((a foo 10)) '((x bar 30) (y baz 30)))

;; This case has bar with a slow fault output and baz with a fast fault
;; output.  Once again no solution exists for this case, and the system
;; doesn't know how to deal with it.  An interesting note is that the error
;; you get is that the resource-2 node has invalid values, i.e. somehow the
;; evidence given to it causes the belief of the node to be 0.0 0.0, which is
;; an impossible case.
(run-case 'test-2 '((a foo 10)) '((x bar 30) (y baz 15)))

;; Gave foo's output a value such that foo must be in the fast state, to see
;; what would happen.  Interestingly enough, bar and baz were both first set
;; to their *slower* states.  This is because, with the numbers in the model,
;; it is more probable that foo is in the normal state compared to bar and
;; baz, so bar's and baz's states are changed first to see if that solves the
;; contradiction.  Not quite what one would ideally want, but it is
;; consistent with what the code is supposed to do.  This is a configuration
;; consistent with the given inputs/outputs, though not necessarily the best
;; or most probable answer.  The probability that resource-1 has been hacked
;; increases from .1 to .47.
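The .2 → .95 jump for resource-2 when both bar and baz misbehave (quoted in the two-fault test case earlier) also follows from Bayes' rule, because the two component observations are conditionally independent given the resource's state, so their likelihoods multiply. Again an illustrative Python sketch, not thesis code; `posterior_hacked` is our own name.

```python
def posterior_hacked(prior, likes_hacked, likes_normal):
    """P(resource = hacked | observations).  Likelihoods of conditionally
    independent observations multiply before normalizing."""
    num = prior
    den = 1 - prior
    for lh, ln in zip(likes_hacked, likes_normal):
        num *= lh
        den *= ln
    return num / (num + den)

# bar observed slow and baz observed really-slow, both on resource-2
# (prior .20); each observation has probability .90 given hacked, .10
# given normal, per the model-mappings of test-2.
print(round(posterior_hacked(0.20, [0.90, 0.90], [0.10, 0.10]), 2))  # 0.95
```

With only one of the two observations, the same function gives the single-fault posterior of 0.69 quoted earlier.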
(run-case 'test-2 '((a foo 10)) '((x bar 25) (y baz 25) (b foo 11)))

;;; Example #3: Branch and Join

(defensemble test-3
  :components ((web-server :models (normal peak off-peak)
                           :inputs (query1 query2) :outputs (answer1 answer2))
               (dollar-monitor :models (normal slow)
                               :inputs (update) :outputs (result))
               (yen-monitor :models (normal slow really-slow)
                            :inputs (update) :outputs (result))
               (bond-trader :models (normal fast slow)
                            :inputs (price) :outputs (decision))
               (currency-trader :models (normal fast slow)
                                :inputs (price1 price2) :outputs (decision)))
  :dataflows ((answer1 web-server update dollar-monitor)
              (answer2 web-server update yen-monitor)
              (result dollar-monitor price bond-trader)
              (result dollar-monitor price1 currency-trader)
              (result yen-monitor price2 currency-trader))
  :resources ((wallst-server (normal .9) (hacked .1))
              (jpmorgan-net (normal .85) (hacked .15))
              (bonds-r-us (normal .8) (hacked .2))
              (trader-joe (normal .7) (hacked .3)))
  :resource-mappings ((web-server wallst-server)
                      (dollar-monitor jpmorgan-net)
                      (yen-monitor jpmorgan-net)
                      (bond-trader bonds-r-us)
                      (currency-trader trader-joe))
  :model-mappings ((web-server normal wallst-server normal .6)
                   (web-server peak wallst-server normal .1)
                   (web-server off-peak wallst-server normal .3)
                   (web-server normal wallst-server hacked .15)
                   (web-server peak wallst-server hacked .8)
                   (web-server off-peak wallst-server hacked .05)
                   (dollar-monitor normal jpmorgan-net normal .80)
                   (dollar-monitor slow jpmorgan-net normal .20)
                   (dollar-monitor normal jpmorgan-net hacked .30)
                   (dollar-monitor slow jpmorgan-net hacked .70)
                   (yen-monitor normal jpmorgan-net normal .6)
                   (yen-monitor slow jpmorgan-net normal .25)
                   (yen-monitor really-slow jpmorgan-net normal .15)
                   (yen-monitor normal jpmorgan-net hacked .05)
                   (yen-monitor slow jpmorgan-net hacked .45)
                   (yen-monitor really-slow jpmorgan-net hacked .50)
                   (bond-trader normal bonds-r-us normal .5)
                   (bond-trader fast bonds-r-us normal .25)
                   (bond-trader slow bonds-r-us normal .25)
                   (bond-trader
normal bonds-r-us hacked .05)
                   (bond-trader fast bonds-r-us hacked .45)
                   (bond-trader slow bonds-r-us hacked .50)
                   (currency-trader normal trader-joe normal .5)
                   (currency-trader fast trader-joe normal .25)
                   (currency-trader slow trader-joe normal .25)
                   (currency-trader normal trader-joe hacked .05)
                   (currency-trader fast trader-joe hacked .45)
                   (currency-trader slow trader-joe hacked .50)))

(defmodel (web-server normal)
  ((query1) (answer1) 4 8)
  ((query2) (answer2) 5 10))
(defmodel (web-server peak)
  ((query1) (answer1) 7 11)
  ((query2) (answer2) 8 13))
(defmodel (web-server off-peak)
  ((query1) (answer1) 1 5)
  ((query2) (answer2) 2 7))
(defmodel (dollar-monitor normal) ((update) (result) 3 6))
(defmodel (dollar-monitor slow) ((update) (result) 6 10))
(defmodel (yen-monitor normal) ((update) (result) 4 7))
(defmodel (yen-monitor slow) ((update) (result) 7 10))
(defmodel (yen-monitor really-slow) ((update) (result) 10 15))
(defmodel (bond-trader normal) ((price) (decision) 3 7))
(defmodel (bond-trader fast) ((price) (decision) 1 3))
(defmodel (bond-trader slow) ((price) (decision) 6 10))
(defmodel (currency-trader normal) ((price1 price2) (decision) 3 7))
(defmodel (currency-trader fast) ((price1 price2) (decision) 1 3))
(defmodel (currency-trader slow) ((price1 price2) (decision) 6 10))

;;; Various test cases with different inputs/outputs to test the system.

;; Normal case with no faults.
(run-case 'test-3 '((query1 web-server 10) (query2 web-server 15))
          '((decision bond-trader 25) (decision currency-trader 28)))

;; Slow fault on the output of bond-trader.  bond-trader is correctly changed
;; from the normal to the slow state.  The probability that the resource
;; bonds-r-us is hacked increased from .20 to .32.
(run-case 'test-3 '((query1 web-server 10) (query2 web-server 15))
          '((decision bond-trader 32) (decision currency-trader 28)))

;; bond-trader was given a fast fault, and the system sets the bond-trader
;; component to the fast model, which is correct.
;; The probability of the resource bonds-r-us being hacked increases from
;; .20 to .31.
(run-case 'test-3 '((query1 web-server 10) (query2 web-server 15))
          '((decision bond-trader 18) (decision currency-trader 28)))

;; I was trying to give both bond-trader and currency-trader slow faults in
;; the hopes of seeing that the resource jpmorgan-net has an increased
;; probability of being hacked.  Unfortunately, this example causes the
;; system to choke because there is no backtracking implemented yet.  Thus it
;; attempts to do a hill-climbing approach which leads to no possible
;; solution, and it can't handle things afterwards.
(run-case 'test-3 '((query1 web-server 10) (query2 web-server 15))
          '((decision bond-trader 35) (decision currency-trader 45)))

;; A fix to these problems would definitely have to include a way to
;; backtrack and keep track of what combinations of assumptions you have
;; made/unjustified so you can redo them if it doesn't solve the problem.
;; Perhaps a new strategy of picking which component to unjustify and change
;; models is required.  This is definitely necessary if we are thinking of
;; handling multiple iterations of retractions.  A simple fix might be to
;; reset the evidence at each iteration to test consistency, then set the
;; evidence given the things that are known to be false.  Thus if a
;; retraction has been retracted, it will hopefully not be in the Joshua
;; database of true/false things, and in essence a form of backtracking is
;; created.  It must be noted, though, that this is still only a hack and not
;; a true form of backtracking.

Appendix B

comlink-ideal code

This code can be found on the server named WILSON in the MIT Artificial Intelligence Laboratory in the KBCW group.
The file is located on that machine at:

w:>comlink>v-5>kbcw>code>ideal.lisp

(defclass comlink-to-ideal-mapping ()
  ((forward-map :initform (make-hash-table) :accessor forward-map)
   (backward-map :initform (make-hash-table) :accessor backward-map)
   (ideal-diagram :accessor ideal-diagram :initform nil)
   (issue-node :initform nil :initarg :issue :accessor issue)
   (root-node-map :initform (make-hash-table) :accessor root-node-map)))

(defmethod intern-comlink-node ((the-map comlink-to-ideal-mapping) node)
  (with-slots (forward-map backward-map root-node-map ideal-diagram) the-map
    (let* ((the-ideal-node (gethash node forward-map)))
      (cond ((not (null the-ideal-node)) (values the-ideal-node :old))
            (t
             (multiple-value-bind (new-diagram new-node)
                 (etypecase node
                   (compound-document-record
                    (ideal:add-node ideal-diagram
                                    :name (gensym "NODE")
                                    :state-labels '(:false :true)
                                    :type :chance
                                    :relation-type :prob
                                    :noisy-or nil))
                   (basic-document-record
                    (when (and (null (follow-link node :supports :backward))
                               (null (follow-link node :denies :backward)))
                      (setf (gethash node root-node-map) nil))
                    (ideal:add-node ideal-diagram
                                    :name (gensym "NODE")
                                    :state-labels '(:false :true)
                                    :noisy-or t
                                    :noisy-or-subtype :binary)))
               (setq the-ideal-node new-node
                     ideal-diagram new-diagram)
               (setf (gethash node forward-map) the-ideal-node
                     (gethash the-ideal-node backward-map) node)
               (values the-ideal-node :new)))))))

(defun build-an-ideal-structure (issue-node)
  (let* ((the-map (make-instance 'comlink-to-ideal-mapping))
         (forward-map (forward-map the-map)))
    (with-db-transaction
      ;; Pass 1: First traverse the graph and get every node interned.  We
      ;; only queue up document children if interning them tells us this is
      ;; the first time we're seeing it.
      (labels ((do-a-node (node)
                 (multiple-value-bind (ideal-node old-or-new?)
                     (intern-comlink-node the-map node)
                   (declare (ignore ideal-node))
                   (when (eql old-or-new?
:new)
                     (typecase node
                       ;; It's important that this goes first because
                       ;; compound-document-record is a sub-class of
                       ;; basic-document-record.
                       (compound-document-record
                        (loop for child in (component-document-records node)
                              doing (do-a-node child)))
                       (basic-document-record
                        (loop for child in (follow-link node :supports :backward)
                              doing (do-a-node child))
                        (loop for child in (follow-link node :denies :backward)
                              doing (do-a-node child))))))))
        (loop for node in (follow-link issue-node :is-a-hypothesis-about :backward)
              do (do-a-node node)))
      ;; Pass 2: Now for each link between nodes build the IDEAL arcs.  Since
      ;; we know all the nodes are in the hash table, we can iterate over
      ;; that rather than walking the graph.
      (loop for comlink-node being the hash-keys of forward-map using (hash-value ideal-node)
            do (typecase comlink-node
                 ;; See above about ordering.
                 (compound-document-record
                  (ideal:add-arcs ideal-node
                                  (loop for child in (component-document-records comlink-node)
                                        for ideal-child = (intern-comlink-node the-map child)
                                        collect ideal-child)))
                 (basic-document-record
                  (let ((ideal-children nil))
                    (loop for child in (follow-link comlink-node :supports :backward)
                          doing (push (intern-comlink-node the-map child) ideal-children))
                    (loop for child in (follow-link comlink-node :denies :backward)
                          doing (push (intern-comlink-node the-map child) ideal-children))
                    (ideal:add-arcs ideal-node ideal-children)))))
      ;; Pass 3: Set up the probabilities.
      (loop for comlink-node being the hash-keys of forward-map using (hash-value ideal-node)
            do (typecase comlink-node
                 ;; See above about ordering.
                 (compound-document-record
                  (set-up-and-node comlink-node ideal-node the-map))
                 (basic-document-record
                  (if (null (ideal:node-predecessors ideal-node))
                      (set-up-root-node comlink-node ideal-node the-map)
                      (set-up-noisy-or-node comlink-node ideal-node the-map)))))
      the-map)))

(defun set-up-and-node (comlink-node ideal-node the-map)
  (declare (ignore comlink-node the-map))
  (ideal:for-all-cond-cases (cond-case (ideal:node-predecessors ideal-node))
    (let ((all-true
(loop for remaining-cases on cond-case
                 for next-node = (ideal:node-in remaining-cases)
                 for next-state = (ideal:state-in remaining-cases)
                 always (eql next-state (ideal:get-state-label next-node :true)))))
      (ideal:for-all-cond-cases (node-case ideal-node)
        (let ((true-node-case (eql (ideal:state-in node-case)
                                   (ideal:get-state-label ideal-node :true))))
          (if all-true
              (setf (ideal:prob-of node-case cond-case) (if true-node-case 1 0))
              (setf (ideal:prob-of node-case cond-case) (if true-node-case 0 1)))))))
  ideal-node)

(defun set-up-noisy-or-node (comlink-node ideal-node the-map)
  (loop for link in (get-document-backward-links comlink-node (kbcw::supports-link))
        for source = (Xdocument-link-source link)
        for ideal-predecessor = (intern-comlink-node the-map source)
        for predecessor-false-label = (ideal:get-state-label ideal-predecessor :false)
        for probability = (Xdocument-link-certainty-factor link)
        for inhibition-value = (- 1 (/ (float probability) 100.0))
        do (ideal:for-all-cond-cases (case ideal-predecessor)
             ;; Notice that since this uses a node, not a list of nodes, each
             ;; cond-case will be a list consisting of a single pair.
             (when (eql (ideal:state-in case) predecessor-false-label)
               (setf (ideal:inhibitor-prob-of ideal-node case) inhibition-value))))
  (ideal:compile-noisy-or-distribution ideal-node))

(defun set-up-root-node (comlink-node ideal-node the-map)
  ;; This is for the root of the comlink evidence chain.  Create another
  ;; node linked to this one by a conditional probability representing our
  ;; intended belief in the node.  If we provide evidence for this node,
  ;; then it will have the desired effect.
  (with-slots (ideal-diagram root-node-map) the-map
    (loop for inhibition-value in '(.1 .2 .3)
          for new-node = (multiple-value-bind (new-diagram new-node)
                             (ideal:add-node ideal-diagram
                                             :name (gensym "NODE")
                                             :state-labels '(:false :true)
                                             :type :chance
                                             :relation-type :prob
                                             :noisy-or nil)
                           (setq ideal-diagram new-diagram)
                           new-node)
          for new-guys-false-label = (ideal:get-state-label new-node :false)
          do (ideal:add-arcs ideal-node (list new-node))
             (push (cons inhibition-value new-node)
                   (gethash comlink-node root-node-map))
             (ideal:for-all-cond-cases (case new-node)
               ;; Notice that since this uses a node, not a list of nodes,
               ;; each cond-case will be a list consisting of a single pair.
               (when (eql (ideal:state-in case) new-guys-false-label)
                 (setf (ideal:inhibitor-prob-of ideal-node case)
                       inhibition-value))))))

(defun decide-on-evidence (map)
  (with-slots (root-node-map) map
    (loop for comlink-document being the hash-keys of root-node-map using (hash-value choices)
          for prob = (progn (print comlink-document)
                            (scl:accept '(dw:member-sequence (.3 .2 .1))))
          for node = (cdr (assoc prob choices :test #'=))
          do (loop for (prob . node) in choices
                   do (setf (ideal:node-state node)
                            (ideal:get-state-label node :false)))
             (setf (ideal:node-state node)
                   (ideal:get-state-label node :true)))))

Bibliography

[1] Eugene Charniak. Bayesian networks without tears. AI Magazine, 1991.

[2] G. F. Cooper. Probabilistic inference using belief networks is NP-hard. Medical Computer Science Group, Stanford University, 1987.

[3] Bruce D'Ambrosio. Inference in Bayesian networks. AI Magazine, 20(2):21-35, 1999.

[4] Randall Davis and Walter Hamscher. Model-based reasoning: Troubleshooting. In Howard E. Shrobe, editor, Exploring Artificial Intelligence, chapter 8, pages 297-346. Morgan Kaufmann Publishers, Inc., San Mateo, California, 1988.

[5] Johan de Kleer and Brian C. Williams. Diagnosis with behavioral modes. Knowledge Representation, pages 1324-1330, 1989.

[6] Peter Haddawy.
An overview of some recent developments in Bayesian problem-solving techniques. AI Magazine, 20(2):11-19, 1999.

[7] Howard Shrobe, Jon Doyle, and Peter Szolovits. Active trust management (ATM) for autonomous adaptive survivable systems (AASS's). Proposal for ARPA BAA 99-10, 1999.

[8] M. P. Wellman, M. H. Eckman, C. Fleming, et al. Automated critiquing of medical decision trees. Medical Decision Making, 9:272-284, 1989.

[9] Judea Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann Publishers, Inc., 1988.

[10] G. Shafer. A Mathematical Theory of Evidence. Princeton University Press, 1976.

[11] Peter Struss and Oskar Dressler. Physical negation - integrating fault models into the general diagnostic engine. Knowledge Representation, pages 1318-1323, 1989.