Artificial Intelligence, CS 165A
Tuesday, November 20, 2007
Knowledge Representation (Ch. 10) and Uncertainty (Ch. 13)

Notes
• HW #4 due by noon tomorrow
• Reminder: Final exam December 14, 4-7pm
  – Review in class on Dec. 6th

Review: Situation Calculus – actions, events
• "Situation Calculus" is a way of describing change over time in first-order logic
  – Fluents: functions or predicates that can vary over time have an extra argument, Si (the situation argument)
      Predicate(args, Si)
      E.g., the location of an agent, aliveness, changing properties, ...
  – The Result function is used to represent change from one situation to another resulting from an action (or action sequence)
      Result(GoForward, Si) = Sj
      "Sj is the situation that results from the action GoForward applied to situation Si"
      Result() indicates the relationship between situations

Review: Situation Calculus
[Figure: the world in different "situations" and the relationships between situations, linked by Result()]

Review: Examples
• How would you interpret the following sentences in First-Order Logic using situation calculus?
  ∀x, s  Studying(x, s) ⇒ Failed(x, Result(TakeTest, s))
    "If you're studying and then you take the test, you will fail." (or) "Studying a subject implies that you will fail the test for that subject."
  ∀x, s  TurnedOn(x, s) ∧ LightSwitch(x) ⇒ TurnedOff(x, Result(FlipSwitch, s))
    "If you flip the light switch when it is turned on, it will then be turned off."

There are other ways to deal with time
• Event calculus
  – Based on points in time rather than situations
  – Designed to allow reasoning over periods of time
    Can represent actions with duration, overlapping actions, etc.
• Generalized events
  – Parts of a general "space-time chunk"
• Processes
  – Not just discrete events
• Intervals
  – Moments and durations of time
• Objects with state fluents
  – Not just events; objects can also have time-varying properties

Event calculus relations
• Initiates(e, f, t)
  – Event e at time t causes fluent f to become true
• Terminates(e, f, t)
  – Event e at time t causes fluent f to no longer be true
• Happens(e, t)
  – Event e happens at time t
• Clipped(f, t1, t2)
  – f is terminated by some event sometime between t1 and t2
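To make these relations concrete, here is a minimal propositional sketch in Python. The TurnOn/TurnOff events, the LightOn fluent, and the holds_at() query (the standard event-calculus HoldsAt, which the slide does not list) are illustrative assumptions, not part of the lecture, and the time argument of Initiates/Terminates is dropped for brevity.

```python
# Minimal propositional sketch of the event-calculus relations above.
# TurnOn/TurnOff, LightOn, and holds_at() are illustrative assumptions.

# Happens(e, t): which events occur at which time points
happens = {("TurnOn", 1), ("TurnOff", 5)}

# Initiates(e, f) / Terminates(e, f): which events start or end which fluents
# (the slide versions also carry a time argument; dropped here for simplicity)
initiates = {("TurnOn", "LightOn")}
terminates = {("TurnOff", "LightOn")}

def clipped(f, t1, t2):
    """Clipped(f, t1, t2): f is terminated by some event between t1 and t2."""
    return any((e, f) in terminates and t1 < t <= t2 for (e, t) in happens)

def holds_at(f, t):
    """HoldsAt(f, t): some event initiated f at or before t,
    and f has not been clipped since (the usual event-calculus query)."""
    return any((e, f) in initiates and te <= t and not clipped(f, te, t)
               for (e, te) in happens)

print(holds_at("LightOn", 3))   # True: TurnOn happened at t=1, not yet clipped
print(holds_at("LightOn", 6))   # False: TurnOff at t=5 clipped the fluent
```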
Generalized events
• An ontology of time that allows for reasoning about various temporal events, subevents, durations, processes, intervals, etc.
[Figure: a "space-time chunk", with space (e.g., Australia) on one axis and time on the other]

Time interval predicates
Ex: After(ReignOf(ElizabethII), ReignOf(GeorgeVI))
    Overlap(Fifties, ReignOf(Elvis))
    Start(Fifties) = Start(AD1950)
    Meet(Fifties, Sixties)

Objects with state fluents
Ex: President(USA)

Knowledge representation
• Chapter 10 covers many topics in knowledge representation, many of which are important to real, sophisticated AI reasoning systems
  – We're only scratching the surface of this topic
  – Best covered in depth in an advanced AI course and in the context of particular AI problems
  – Read through the Internet shopping world example in 10.5
• Now we move on to probabilistic reasoning, a different way of representing and manipulating knowledge
  – Chapters 13 and 14

Quick Review of Probability
From here on we will assume that you know this...

Probability notation and notes
• Probabilities of propositions
  – P(A), P(the sun is shining)
• Probabilities of random variables
  – P(X = x1), P(Y = y1), P(x1 < X < x2)
• P(A) usually means P(A = True) (A is a proposition, not a variable)
  – This is a probability value
  – Technically, P(A) is a probability function
• P(X = x1)
  – This is a probability value (P(X) is a probability function)
• P(X)
  – This is a probability function or a probability density function
• Technically, if X is a variable, we should not write P(X) = 0.5
  – But rather P(X = x1) = 0.5

Discrete and continuous probabilities
• Discrete: the probability function P(X, Y) is described by an M×N matrix of probabilities
  – Possible values of each: P(X=x1, Y=y1) = p1
  – Σi Σj P(X=xi, Y=yj) = 1
  – P(X, Y, Z) is an M×N×P matrix
• Continuous: the probability density function (pdf) P(X, Y) is described by a 2D function
  – P(x1 < X < x2, y1 < Y < y2) = p1
  – ∫∫ P(X, Y) dX dY = 1

Discrete probability distribution
[Figure: a discrete distribution p(X) over X = 1, ..., 12, with Σi p(X = xi) = 1]

Continuous probability distribution
[Figure: a continuous density p(X) over X = 1, ..., 12, with ∫ p(X) dX = 1]

Continuous probability distribution
[Figure: the same density; the area under the curve over an interval (e.g., ∫ p(X) dX from 6 to 8) is a probability, but the probability of any single value is zero: P(X = 5) = 0, and in general P(X = x1) = 0]
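As a quick illustration of the discrete/continuous distinction above, the sketch below checks that a discrete joint table sums to 1 and that, for a continuous density, probability mass lives on intervals rather than points. The particular table, the uniform density, and the helper names (pdf, prob_interval) are made up for this example, not taken from the slides.

```python
# Small sketch of the discrete vs. continuous distinction.
import numpy as np

# Discrete: P(X, Y) as an MxN matrix; all entries sum to 1.
P_XY = np.array([[0.2, 0.1],
                 [0.1, 0.2],
                 [0.1, 0.3]])          # 3 values of X, 2 values of Y
assert np.isclose(P_XY.sum(), 1.0)     # sum_i sum_j P(X=xi, Y=yj) = 1

# Continuous: a pdf assigns probability to intervals, not points.
# Illustrative example: X uniform on [0, 12], so p(x) = 1/12 there.
def pdf(x):
    return 1.0 / 12.0 if 0.0 <= x <= 12.0 else 0.0

def prob_interval(a, b, n=100_000):
    """P(a < X < b) by a simple midpoint Riemann sum over the pdf."""
    if b <= a:
        return 0.0
    dx = (b - a) / n
    xs = a + dx * (np.arange(n) + 0.5)      # midpoints of n sub-intervals
    return float(sum(pdf(x) for x in xs) * dx)

print(prob_interval(0, 12))   # ~1.0  : the density integrates to 1
print(prob_interval(4, 6))    # ~0.167: P(4 < X < 6)
print(prob_interval(5, 5))    # 0.0   : P(X = 5) is zero for a continuous X
```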
Three Axioms of Probability
1. The probability of every event must be nonnegative
   – For any event A, P(A) ≥ 0
2. Valid propositions have probability 1
   – P(True) = 1
   – P(A ∨ ¬A) = 1
3. For disjoint events A1, A2, ...
   – P(A1 ∨ A2 ∨ ...) = P(A1) + P(A2) + ...
• From these axioms, all other properties of probabilities can be derived
  – E.g., derive P(A) + P(¬A) = 1

Some consequences of the axioms
• Unsatisfiable propositions have probability 0
  – P(False) = 0
  – P(A ∧ ¬A) = 0
• For any two events A and B
  – P(A ∨ B) = P(A) + P(B) – P(A ∧ B)
• For the complement Ac of event A
  – P(Ac) = 1 – P(A)
• For any event A
  – 0 ≤ P(A) ≤ 1
• For independent events A and B
  – P(A ∧ B) = P(A) P(B)

Venn Diagram
[Figure: Venn diagram of events A and B inside the universe of outcomes (True)]
Visualize: P(True), P(False), P(A), P(B), P(¬A), P(¬B), P(A ∧ B), P(A ∨ B), P(A ∧ ¬B), ...

Joint Probabilities
• A complete probability model is a single joint probability distribution over all propositions/variables in the domain
  – P(X1, X2, ..., Xi, ...)
• A particular instance of the world has the probability
  – P(X1=x1 ∧ X2=x2 ∧ ... ∧ Xi=xi ∧ ...) = p
• Rather than stating knowledge as
  – Raining ⇒ WetGrass
• We can state it as a joint distribution
  – P(Raining, WetGrass) = 0.15
  – P(Raining, ¬WetGrass) = 0.01
  – P(¬Raining, WetGrass) = 0.04
  – P(¬Raining, ¬WetGrass) = 0.8

                 WetGrass   ¬WetGrass
    Raining        0.15       0.01
    ¬Raining       0.04       0.80

Conditional Probability
• Unconditional, or Prior, Probability
  – Probabilities associated with a proposition or variable, prior to any evidence
  – E.g., P(WetGrass), P(Raining)
• Conditional, or Posterior, Probability
  – Probabilities after evidence is gathered
  – P(A | B) – "the probability of A given that we know B"
  – After (posterior to) procuring the evidence
  – E.g., P(WetGrass | Raining)

  P(X | Y) = P(X, Y) / P(Y),   or equivalently   P(X | Y) P(Y) = P(X, Y)
  (assumes P(Y) is nonzero)

The chain rule
  P(X, Y) = P(X | Y) P(Y)
• By the chain rule:
  P(X, Y, Z) = P(X | Y, Z) P(Y, Z)
             = P(X | Y, Z) P(Y | Z) P(Z)
  or, equivalently,
  P(X, Y, Z) = P(X) P(Y | X) P(Z | X, Y)
• Notes:
  – Precedence: '|' is lowest
  – E.g., P(X | Y, Z) means P( X | (Y, Z) ), not P( (X | Y), Z )

Joint probability distribution
From P(X, Y), we can always calculate: P(X), P(Y), P(X|Y), P(Y|X), P(X=x1), P(Y=y2), P(X|Y=y1), P(Y|X=x1), P(X=x1|Y), etc.

  P(X,Y)     x1     x2     x3
    y1      0.2    0.1    0.1
    y2      0.1    0.2    0.3

Worked example: P(X=x1, Y=y2) = ?   P(X=x1) = ?   P(Y=y2) = ?   P(X|Y=y1) = ?   P(X=x1|Y) = ?

  P(Y):  y1 = 0.4,  y2 = 0.6
  P(X):  x1 = 0.3,  x2 = 0.3,  x3 = 0.4

  P(X|Y)     x1      x2      x3
    y1      0.5     0.25    0.25
    y2      0.167   0.333   0.5

  P(Y|X)     x1      x2      x3
    y1      0.667   0.333   0.25
    y2      0.333   0.667   0.75

Probability Distributions
                Continuous variables          Discrete variables
  P(X)          function of one variable      M-vector
  P(X=x)        scalar*                       scalar
  P(X,Y)        function of two variables     M×N matrix
  P(X|Y)        function of two variables     M×N matrix
  P(X|Y=y)      function of one variable      M-vector
  P(X=x|Y)      function of one variable      N-vector
  P(X=x|Y=y)    scalar*                       scalar
  * actually zero; should be P(x1 < X < x2)

Bayes' Rule
• Since P(X, Y) = P(X | Y) P(Y) and P(X, Y) = P(Y | X) P(X)
• Then P(X | Y) P(Y) = P(Y | X) P(X), so

  P(X | Y) = P(Y | X) P(X) / P(Y)      (Bayes' Rule)

Bayes' Rule
• Similarly, P(X) conditioned on two variables:
  P(X | Y, Z) = P(Y | X, Z) P(X | Z) / P(Y | Z)
  P(X | Y, Z) = P(Z | X, Y) P(X | Y) / P(Z | Y)
• Or N variables:
  P(X1 | X2, X3, ..., XN) = P(X2 | X1, X3, ..., XN) P(X1 | X3, ..., XN) / P(X2 | X3, ..., XN)

Bayes' Rule
• This simple equation is very useful in practice
  – Usually framed in terms of hypotheses (H) and data (D): which of the hypotheses is best supported by the data?

  P(Hi | D) = P(D | Hi) P(Hi) / P(D)

  where P(Hi | D) is the posterior probability (diagnostic knowledge), P(D | Hi) is the likelihood (causal knowledge), P(Hi) is the prior probability, and P(D) is a normalizing constant. Equivalently,

  P(Hi | D) = k P(D | Hi) P(Hi)
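The joint-table and Bayes'-rule slides above can be checked numerically. The sketch below uses the same P(X, Y) table as the worked example; the variable names (joint, p_x, and so on) are mine, not the lecture's.

```python
# Check the joint-distribution example and Bayes' rule numerically.
import numpy as np

# Rows are y1, y2; columns are x1, x2, x3 (same numbers as the slide).
joint = np.array([[0.2, 0.1, 0.1],
                  [0.1, 0.2, 0.3]])

p_y = joint.sum(axis=1)             # marginal P(Y)  -> [0.4, 0.6]
p_x = joint.sum(axis=0)             # marginal P(X)  -> [0.3, 0.3, 0.4]

p_x_given_y = joint / p_y[:, None]  # P(X | Y): each row sums to 1
p_y_given_x = joint / p_x[None, :]  # P(Y | X): each column sums to 1

print(p_x_given_y[0])               # P(X | Y=y1) = [0.5, 0.25, 0.25]
print(p_y_given_x[:, 0])            # P(Y | X=x1) = [0.667, 0.333]

# Bayes' rule check: P(X|Y) P(Y) equals P(Y|X) P(X); both are P(X, Y).
lhs = p_x_given_y * p_y[:, None]
rhs = p_y_given_x * p_x[None, :]
assert np.allclose(lhs, rhs) and np.allclose(lhs, joint)
```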
Bayes' rule example: Medical diagnosis
• Meningitis causes a stiff neck 50% of the time
• A patient comes in with a stiff neck – what is the probability that he has meningitis?
• Need to know two things:
  – The prior probability of a patient having meningitis (1/50,000)
  – The prior probability of a patient having a stiff neck (1/20)
• P(M | S) = P(S | M) P(M) / P(S)
• P(M | S) = (0.5)(0.00002) / (0.05) = 0.0002

Example (cont.)
• Suppose that we also know about whiplash
  – P(W) = 1/1000
  – P(S | W) = 0.8
• What is the relative likelihood of whiplash and meningitis?
  – P(W | S) / P(M | S)
  P(W | S) = P(S | W) P(W) / P(S) = (0.8)(0.001) / 0.05 = 0.016
  So the relative likelihood of whiplash vs. meningitis is 0.016 / 0.0002 = 80

A useful Bayes rule example
A test for a new, deadly strain of anthrax (that has no symptoms) is known to be 99.9% accurate. The chances of having this strain are one in a million. Should you get tested?
• What are the random variables?
  A – you have anthrax (Boolean)
  T – you test positive for anthrax (Boolean)
  Notation: instead of P(A=True) and P(A=False), we will write P(A) and P(¬A)
• What do we want to compute?  P(A | T)
• What else do we need to know or assume?
  Priors: P(A), P(¬A)
  Given: P(T | A), P(¬T | A), P(T | ¬A), P(¬T | ¬A)
• Possibilities: A ∧ T,  A ∧ ¬T,  ¬A ∧ T,  ¬A ∧ ¬T

Example (cont.)
We know:
  Given: P(T | A) = 0.999, P(¬T | A) = 0.001, P(T | ¬A) = 0.001, P(¬T | ¬A) = 0.999
  Prior knowledge: P(A) = 10⁻⁶, P(¬A) = 1 – 10⁻⁶
Want to know P(A | T):
  P(A | T) = P(T | A) P(A) / P(T)
Calculate P(T) by marginalization:
  P(T) = P(T | A) P(A) + P(T | ¬A) P(¬A) = (0.999)(10⁻⁶) + (0.001)(1 – 10⁻⁶) ≈ 0.001
So P(A | T) = (0.999)(10⁻⁶) / 0.001 ≈ 0.001
Therefore P(¬A | T) ≈ 0.999
What if you work at a Post Office?

[Figure: the whole population ("All people") split into people with anthrax and people without anthrax, with test outcomes shown as "Good T" vs. "Bad T" (0.1%)]
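The anthrax numbers above can be reproduced in a few lines. The probabilities are the ones given on the slide; the variable names are mine.

```python
# Numerical check of the anthrax test example above.
p_a = 1e-6                  # P(A): prior probability of having anthrax
p_not_a = 1.0 - p_a         # P(¬A)
p_t_given_a = 0.999         # P(T | A): probability of a positive test if infected
p_t_given_not_a = 0.001     # P(T | ¬A): false-positive rate

# Marginalization: P(T) = P(T|A) P(A) + P(T|¬A) P(¬A)
p_t = p_t_given_a * p_a + p_t_given_not_a * p_not_a

# Bayes' rule: P(A | T) = P(T | A) P(A) / P(T)
p_a_given_t = p_t_given_a * p_a / p_t

print(f"P(T)    ~ {p_t:.6f}")            # ~0.001
print(f"P(A|T)  ~ {p_a_given_t:.6f}")    # ~0.001: a positive test is almost certainly a false alarm
print(f"P(¬A|T) ~ {1 - p_a_given_t:.6f}")  # ~0.999
```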