ARTIFICIAL INTELLIGENCE [INTELLIGENT AGENTS PARADIGM]
LEARNING AGENTS

Professor Janis Grundspenkis
Riga Technical University
Faculty of Computer Science and Information Technology
Institute of Applied Computer Systems
Department of Systems Theory and Design
E-mail: Janis.Grundspenkis@rtu.lv

LEARNING FROM OBSERVATIONS
Learning in intelligent agents is essential for dealing with unknown environments (it compensates for the designer's incomplete knowledge of the agent's environment). The idea behind learning is that percepts should be used not only for acting, but also for improving the agent's ability to act in the future. Learning takes place as a result of the interaction between the agent and the world, and from the agent's observation of its own decision-making process.

LEARNING AGENT (1)
Four conceptual components:
• LEARNING ELEMENT is responsible for making improvements to the performance element.
• PERFORMANCE ELEMENT is responsible for selecting external actions.
• CRITIC tells the learning element how well the agent is doing with respect to a fixed performance standard.
• PROBLEM GENERATOR is responsible for suggesting actions that will lead to new and informative experiences.

LEARNING AGENT (2)
[Figure: the general learning agent. The critic compares the percepts received through the sensors against the performance standard and gives feedback to the learning element; the learning element makes changes to the knowledge used by the performance element and sets learning goals for the problem generator; the performance element chooses the actions carried out by the effectors in the environment.]

THE DESIGN OF THE LEARNING ELEMENT
Four major issues:
• Which components of the performance element are to be improved.
• What representation is used for those components.
• What feedback is available.
• What prior knowledge is available.

COMPONENTS OF THE PERFORMANCE ELEMENT (1)
1. A direct mapping from conditions on the current state to actions.
2. A means to infer relevant properties of the world from the percept sequence.
3. Information about the way the world evolves.

COMPONENTS OF THE PERFORMANCE ELEMENT (2)
4. Utility information indicating the desirability of world states.
5. Action-value information indicating the desirability of particular actions in particular states.
6. Goals that describe classes of states whose achievement maximizes the agent's utility.

COMPONENTS OF THE PERFORMANCE ELEMENT (3)
Each of the components can be learned, given the appropriate feedback. For example, if the agent does an action and then perceives the resulting state of the environment, this information can be used to learn a description of the results of actions (the third component). If the critic can use the performance standard to deduce utility values from the percepts, then the agent can learn a useful representation of its utility function (the fourth component).

COMPONENTS OF THE PERFORMANCE ELEMENT (4)
Each of the six components of the performance element can be described mathematically as a function. Learning any particular component of the performance element can therefore be seen as learning an accurate representation of a function. The difficulty of learning depends on the chosen representation. Functions can be represented by logical sentences, belief networks, neural networks, etc.

COMPONENTS OF THE PERFORMANCE ELEMENT (5)
Learning takes many forms, depending on the nature of the performance element, the available feedback, and the available knowledge. For example, supervised learning is any situation in which both the inputs and outputs of a component can be perceived. If the agent receives some evaluation of its actions but is not told the correct action while learning the condition-action component, this is called reinforcement learning.
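The components above are architectural, but it may help to see how they could fit together in code. Below is a minimal Python sketch of the learning-agent loop, assuming the four components are supplied as objects with illustrative method names (evaluate, improve, suggest, select_action); none of these names come from the lecture.

# Minimal sketch of the learning-agent architecture described above.
# The four components are passed in as objects; their classes and method
# names are illustrative assumptions, not part of the lecture material.
class LearningAgent:
    def __init__(self, performance_element, learning_element, critic, problem_generator):
        self.performance_element = performance_element   # selects external actions
        self.learning_element = learning_element         # improves the performance element
        self.critic = critic                             # judges behaviour against the performance standard
        self.problem_generator = problem_generator       # proposes new, informative actions

    def step(self, percept):
        # The critic turns the percept into feedback about how well the agent is doing.
        feedback = self.critic.evaluate(percept)
        # The learning element uses that feedback to change the performance element's knowledge.
        self.learning_element.improve(self.performance_element, feedback)
        # Either try an exploratory action suggested by the problem generator,
        # or let the performance element select the action.
        action = self.problem_generator.suggest(percept)
        return action if action is not None else self.performance_element.select_action(percept)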
INDUCTIVE LEARNING
Learning a function (constructing a description of a function) from a set of input/output examples is called inductive learning. An example is a pair (x, f(x)), where x is the input and f(x) is the output of the function applied to x. The task of induction is this: given a collection of examples of f, return a function h that approximates f. The function h is called a hypothesis.

EXAMPLES AND DIFFERENT HYPOTHESES
[Figure: panels a)-e) show the same set of example points (x, f(x)) fitted by several different candidate hypotheses h.]
The true function f is unknown, so there are many choices for h.

REFLEX LEARNING AGENT
EXAMPLES are (percept, action) pairs.

procedure REFLEX-LEARNING-ELEMENT(percept, action)
  inputs: percept, feedback percept
          action, feedback action
  EXAMPLES ← EXAMPLES ∪ {(percept, action)}

function REFLEX-PERFORMANCE-ELEMENT(percept) returns an action
  if (percept, action) is in EXAMPLES then return action
  else
    h ← INDUCE(EXAMPLES)   (the learning algorithm produces a hypothesis h,
                            which the agent uses to choose the action)
    return h(percept)

LEARNING DECISION TREES
A decision tree takes as input an object or situation described by a set of properties, and outputs a YES/NO decision. Decision trees are an effective method for learning deterministic Boolean functions. Each internal node in the tree corresponds to a test of the value of one of the properties (attributes). Each leaf node in the tree specifies the Boolean value to be returned if that leaf is reached.

EXAMPLE OF THE DECISION TREE LEARNING (1)
The problem: whether to wait for a table in a restaurant. The aim is to learn a definition for the goal predicate WILL WAIT, where the definition is expressed as a decision tree. The examples are described by the following attributes:
1. Alternate: whether there is a suitable alternative restaurant nearby.
2. Bar: whether the restaurant has a comfortable bar area to wait in.

EXAMPLE OF THE DECISION TREE LEARNING (2)
3. Friday/Saturday: true on Fridays and Saturdays.
4. Hungry: whether we are hungry.
5. Patrons: how many people are in the restaurant (values are NONE, SOME, and FULL).
6. Price: the restaurant's price range ($, $$, $$$).
7. Raining: whether it is raining outside.
8. Reservation: whether we made a reservation.
9. Type: the kind of restaurant (French, Italian, Thai, or Burger).
10. WaitEstimate: the wait estimated by the host (0-10, 10-30, 30-60, or >60 minutes).

A DECISION TREE FOR DECIDING WHETHER TO WAIT FOR A TABLE
Patrons?
  None → No
  Some → Yes
  Full → WaitEstimate?
    >60 → No
    30-60 → Alternate?
      No → Reservation?
        No → Bar? (No → No, Yes → Yes)
        Yes → Yes
      Yes → Fri/Sat? (No → No, Yes → Yes)
    10-30 → Hungry?
      No → Yes
      Yes → Alternate?
        No → Yes
        Yes → Raining? (No → No, Yes → Yes)
    0-10 → Yes

REPRESENTATION OF THE DECISION TREE
The tree can be expressed as a conjunction of individual implications corresponding to the paths through the tree ending in YES nodes.
Example: the path for a restaurant full of patrons, with an estimated wait of 10-30 minutes when the agent is not hungry, is expressed by the logical sentence
∀r Patrons(r, FULL) ∧ WaitEstimate(r, 10-30) ∧ ¬Hungry(r) ⇒ WillWait(r)
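The tree above can also be read as ordinary code. The sketch below is a direct Python transcription of the decision tree, assuming each example is a dictionary of attribute values; the function name will_wait and the attribute keys are illustrative, not part of the lecture.

# Direct transcription of the decision tree into nested conditionals.
# The argument r is a dictionary of attribute values.
def will_wait(r):
    if r["Patrons"] == "None":
        return False
    if r["Patrons"] == "Some":
        return True
    # Patrons == "Full": test the wait estimate next
    if r["WaitEstimate"] == ">60":
        return False
    if r["WaitEstimate"] == "30-60":
        if not r["Alternate"]:
            return r["Reservation"] or r["Bar"]
        return r["FriSat"]
    if r["WaitEstimate"] == "10-30":
        if not r["Hungry"]:
            return True
        return (not r["Alternate"]) or r["Raining"]
    return True   # WaitEstimate == "0-10"

# The logical sentence from the slide corresponds to one path through this code:
# Patrons=Full, WaitEstimate=10-30, Hungry=False  ->  WillWait = True
example = {"Patrons": "Full", "WaitEstimate": "10-30", "Hungry": False,
           "Alternate": True, "Bar": False, "FriSat": False,
           "Reservation": True, "Raining": False}
print(will_wait(example))   # True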
EXPRESSIVENESS OF DECISION TREES (1)
Decision trees cannot represent tests that refer to two or more different objects, because they are implicitly limited to talking about a single object. The decision tree language is essentially propositional, with each attribute test being a proposition. Decision trees are fully expressive within the class of propositional languages, that is, any Boolean function can be written as a decision tree.

EXPRESSIVENESS OF DECISION TREES (2)
Decision trees are good for some kinds of functions, and bad for others. If we have n Boolean attributes, there are 2^(2^n) different Boolean functions. For example, with just six Boolean attributes there are about 2·10^19 different functions to choose from. How can a consistent hypothesis be found in such a large space?

INDUCING DECISION TREES FROM EXAMPLES (1)
An example is described by the values of the attributes and the value of the goal predicate. The value of the goal predicate is called the classification of the example. If the goal predicate is true for some example, it is a positive example; otherwise it is a negative example.

INDUCING DECISION TREES FROM EXAMPLES (2)
EXAMPLES (goal predicate: WillWait)

Example  Alt  Bar  Fri  Hun  Pat   Price  Rain  Res  Type     Est    WillWait
X1       Yes  No   No   Yes  Some  $$$    No    Yes  French   0-10   Yes
X2       Yes  No   No   Yes  Full  $      No    No   Thai     30-60  No
X3       No   Yes  No   No   Some  $      No    No   Burger   0-10   Yes
X4       Yes  No   Yes  Yes  Full  $      No    No   Thai     10-30  Yes
X5       Yes  No   Yes  No   Full  $$$    No    Yes  French   >60    No
X6       No   Yes  No   Yes  Some  $$     Yes   Yes  Italian  0-10   Yes
X7       No   Yes  No   No   None  $      Yes   No   Burger   0-10   No
X8       No   No   No   Yes  Some  $$     Yes   Yes  Thai     0-10   Yes
X9       No   Yes  Yes  No   Full  $      Yes   No   Burger   >60    No
X10      Yes  Yes  Yes  Yes  Full  $$$    No    Yes  Italian  10-30  No
X11      No   No   No   No   None  $      No    No   Thai     0-10   No
X12      Yes  Yes  Yes  Yes  Full  $      No    No   Burger   30-60  Yes

CONSTRUCTION OF THE DECISION TREE (1)
Simple solution: construct a decision tree that has one path to a leaf for each example, where the path tests each attribute in turn and follows the value for the example, and the leaf has the classification of the example. This is a simple way to find a decision tree that agrees with the training set of examples. When given an example with the same description again, the decision tree will come up with the right classification.

CONSTRUCTION OF THE DECISION TREE (2)
The problem with such a trivial tree is that it just memorizes the observations. It does not extract any pattern from the examples, so we cannot expect it to extrapolate to examples it has not seen. Extracting a pattern means being able to describe a large number of cases in a concise way. We should therefore try to find a concise decision tree.

CONSTRUCTION OF THE DECISION TREE (3)
This is an example of a general principle of inductive learning called Ockham's razor: the most likely hypothesis is the simplest one that is consistent with all observations (examples). A simple hypothesis that is consistent with the observations is more likely to be correct than a complex one.

FINDING THE SMALLEST DECISION TREE (1)
The basic idea: test the most important attribute first. The most important attribute is the one that makes the most difference to the classification of an example. This way, we hope to get the correct classification with a small number of tests, meaning that all paths in the tree will be short and the tree as a whole will be small.

FINDING THE SMALLEST DECISION TREE (2)
Splitting the examples on Type? leaves every branch with a mixture of positive and negative examples:
+: X1, X3, X4, X6, X8, X12
-: X2, X5, X7, X9, X10, X11
Type?
  French:  +: X1           -: X5
  Italian: +: X6           -: X10
  Thai:    +: X4, X8       -: X2, X11
  Burger:  +: X3, X12      -: X7, X9

FINDING THE SMALLEST DECISION TREE (3)
Splitting on Patrons? gives a definite answer for two of the three branches:
+: X1, X3, X4, X6, X8, X12
-: X2, X5, X7, X9, X10, X11
Patrons?
  None: -: X7, X11
  Some: +: X1, X3, X6, X8
  Full: +: X4, X12         -: X2, X5, X9, X10

FINDING THE SMALLEST DECISION TREE (4)
+: X1, X3, X4, X6, X8, X12
-: X2, X5, X7, X9, X10, X11
Patrons?
  None: -: X7, X11  →  No
  Some: +: X1, X3, X6, X8  →  Yes
  Full: +: X4, X12   -: X2, X5, X9, X10  →  Hungry?
    Yes: +: X4, X12   -: X2, X10
    No:  -: X5, X9
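The splits above can be reproduced mechanically. The Python sketch below encodes only the Type and Patrons attributes of the twelve examples and partitions the example names by an attribute's values, keeping positive and negative examples apart; the attribute whose groups come closest to being all-positive or all-negative is the more important one. Names such as split_on are illustrative.

# Reproduce the splits shown in FINDING THE SMALLEST DECISION TREE (2)-(3).
# Only the Type and Patrons attributes and the classification are encoded here;
# the full example table has ten attributes.
EXAMPLES = {
    # name: (Type, Patrons, WillWait)
    "X1":  ("French",  "Some", True),  "X2":  ("Thai",    "Full", False),
    "X3":  ("Burger",  "Some", True),  "X4":  ("Thai",    "Full", True),
    "X5":  ("French",  "Full", False), "X6":  ("Italian", "Some", True),
    "X7":  ("Burger",  "None", False), "X8":  ("Thai",    "Some", True),
    "X9":  ("Burger",  "Full", False), "X10": ("Italian", "Full", False),
    "X11": ("Thai",    "None", False), "X12": ("Burger",  "Full", True),
}

def split_on(attribute_index):
    """Group example names by attribute value, keeping +/- classifications apart."""
    groups = {}
    for name, row in EXAMPLES.items():
        value, will_wait = row[attribute_index], row[-1]
        groups.setdefault(value, {"+": [], "-": []})["+" if will_wait else "-"].append(name)
    return groups

print(split_on(0))   # split on Type:    every value still mixes + and - examples
print(split_on(1))   # split on Patrons: None and Some are pure, only Full remains mixed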
FINDING THE SMALLEST DECISION TREE (5)
Consider all possible attributes and find the most important one. After the first attribute test splits up the examples, each outcome is a new decision tree learning problem in itself, with fewer examples and one fewer attribute.

FINDING THE SMALLEST DECISION TREE (6)
FOUR CASES
1. If there are some positive and some negative examples, then choose the best attribute to split them.
2. If all the remaining examples are positive (or all negative), answer YES or NO respectively.
3. If there are no examples left (no such example has been observed), return a default value calculated from the majority classification at the node's parent.

FINDING THE SMALLEST DECISION TREE (7)
4. If there are no attributes left, but both positive and negative examples remain, these examples have exactly the same description but different classifications. This happens when some of the data are incorrect (there is noise in the data). It also happens when the attributes do not give enough information to fully describe the situation, or when the domain is truly nondeterministic. One simple way out of the problem is to use a majority vote if no more attributes can be used.

FINDING THE SMALLEST DECISION TREE (8)
The decision tree induced from the twelve examples:
+: X1, X3, X4, X6, X8, X12
-: X2, X5, X7, X9, X10, X11
Patrons?
  None → No   (-: X7, X11)
  Some → Yes  (+: X1, X3, X6, X8)
  Full → Hungry?   (+: X4, X12   -: X2, X5, X9, X10)
    No → No   (-: X5, X9)
    Yes → Type?   (+: X4, X12   -: X2, X10)
      French → Yes
      Italian → No   (-: X10)
      Thai → Fri/Sat?   (+: X4   -: X2)
        No → No   (-: X2)
        Yes → Yes  (+: X4)
      Burger → Yes  (+: X12)

FINDING THE SMALLEST DECISION TREE (9)
CONCLUSIONS
The learning algorithm looks only at the examples, not at the correct function, and in fact its hypothesis (the last decision tree shown) not only agrees with all the examples, but is considerably simpler than the original tree. The learning algorithm has no reason to include tests for RAINING and RESERVATION, because it can classify all the examples without them. If we were to gather more examples, we might induce a tree more similar to the original.

ASSESSING THE PERFORMANCE OF THE LEARNING ALGORITHM (1)
A learning algorithm is good if it produces hypotheses that do a good job of predicting the classification of unseen examples. A prediction is good if it turns out to be true, so we can assess the quality of a hypothesis by checking its predictions against the correct classification once we know it. This is done on the test set.

ASSESSING THE PERFORMANCE OF THE LEARNING ALGORITHM (2)
THE METHODOLOGY
1. Collect a large set of examples.
2. Divide it into two distinct sets: the training set and the test set.
3. Use the learning algorithm with the training set as examples to generate a hypothesis.
4. Measure the percentage of examples in the test set that are correctly classified by the hypothesis.
5. Repeat steps 1 to 4 for different sizes of training sets and different randomly selected training sets of each size.

ASSESSING THE PERFORMANCE OF THE LEARNING ALGORITHM (3)
The key idea of the methodology is to keep the training and test data separate. The result of applying the methodology is a set of data that can be processed to give the average prediction quality as a function of the size of the training set.

ASSESSING THE PERFORMANCE OF THE LEARNING ALGORITHM (4)
[Figure: learning curve: percentage of the test set classified correctly, plotted against the training set size (0 to 100 examples).]
As the training set grows, the prediction quality increases.
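As a rough illustration (not the lecture's code), the sketch below puts the four cases and the assessment methodology together in Python. Examples are dictionaries with a Boolean "classification" field; the "best attribute" criterion, which the lecture describes only informally, is approximated here by a simple purity count; and learning_curve_point averages test-set accuracy over several random training/test splits of one size, as in steps 2-5 of the methodology. All function names are illustrative assumptions.

import random

def majority(examples):
    # Majority classification; ties are resolved as True.
    positives = sum(1 for e in examples if e["classification"])
    return positives >= len(examples) - positives

def purity(attribute, examples):
    # Count how many examples land in a value group that is already all-Yes or all-No.
    groups = {}
    for e in examples:
        groups.setdefault(e[attribute], []).append(e["classification"])
    return sum(len(g) for g in groups.values() if len(set(g)) == 1)

def learn_tree(examples, attributes, default=False):
    if not examples:                                    # case 3: no examples left -> parent's majority
        return default
    classes = {e["classification"] for e in examples}
    if classes == {True} or classes == {False}:         # case 2: all positive or all negative
        return classes.pop()
    if not attributes:                                  # case 4: same description, mixed classes
        return majority(examples)                       #          -> majority vote
    best = max(attributes, key=lambda a: purity(a, examples))   # case 1: split on the best attribute
    rest = [a for a in attributes if a != best]
    branches = {}
    for value in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == value]
        branches[value] = learn_tree(subset, rest, majority(examples))
    return (best, branches)

def classify(tree, example):
    # Assumes every attribute value seen at test time also appeared during training.
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree

def learning_curve_point(examples, attributes, training_size, trials=20):
    """Steps 2-5 of the methodology: average test-set accuracy for one training-set size.
    training_size must be smaller than len(examples) so the test set is non-empty."""
    correct = 0.0
    for _ in range(trials):
        shuffled = random.sample(examples, len(examples))
        train, test = shuffled[:training_size], shuffled[training_size:]
        h = learn_tree(train, attributes)
        correct += sum(classify(h, e) == e["classification"] for e in test) / len(test)
    return correct / trials

Calling learning_curve_point for a range of training-set sizes and plotting the results yields a learning curve like the one sketched above.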