Artificial Intelligence CS 165A Knowledge Representation (Ch 10) Uncertainty (Ch 13)

Tuesday, November 20, 2007
Notes
• HW #4 due by noon tomorrow
• Reminder: Final exam December 14, 4-7pm
– Review in class on Dec. 6th
Review
Situation Calculus – actions, events
• “Situation Calculus” is a way of describing change over
time in first-order logic
– Fluents: Functions or predicates that can vary over time have an
extra argument, Si (the situation argument)
  ◦ Predicate(args, Si)
  ◦ Location of an agent, aliveness, changing properties, ...
– The Result function is used to represent change from one situation
to another resulting from an action (or action sequence)
  ◦ Result(GoForward, Si) = Sj
  ◦ “Sj is the situation that results from the action GoForward applied to situation Si”
  ◦ Result() indicates the relationship between situations
Review
Situation Calculus
Represents the world in different “situations” and the relationship between situations
Review
Examples
• How would you interpret the following sentences in first-order logic using situation calculus?
– ∀x, s Studying(x, s) ⇒ Failed(x, Result(TakeTest, s))
If you’re studying and then you take the test, you will fail.
(or) Studying a subject implies that you will fail the test for that subject.
– ∀x, s TurnedOn(x, s) ∧ LightSwitch(x) ⇒ TurnedOff(x, Result(FlipSwitch, s))
If you flip the light switch when it is turned on, it will then be turned off.
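The light-switch sentence can be made concrete with a few lines of code. This is a minimal sketch under assumptions that are not in the slides: a situation is encoded as the tuple of actions performed since an initial situation S0, there is a single light switch (so the x argument is dropped), the switch starts on, and FlipSwitch toggles it while any other action leaves it unchanged.

```python
from typing import Tuple

# Assumed encoding: a situation is the history of actions taken since S0.
Situation = Tuple[str, ...]
S0: Situation = ()

def result(action: str, s: Situation) -> Situation:
    """Result(action, s): the situation reached by doing `action` in s."""
    return s + (action,)

def turned_on(s: Situation, initially_on: bool = True) -> bool:
    """Fluent TurnedOn(switch, s), computed by replaying the history.

    Assumed rule: FlipSwitch toggles the switch; other actions change nothing.
    """
    on = initially_on
    for action in s:
        if action == "FlipSwitch":
            on = not on
    return on

def turned_off(s: Situation, initially_on: bool = True) -> bool:
    """Fluent TurnedOff(switch, s)."""
    return not turned_on(s, initially_on)

s1 = result("FlipSwitch", S0)
print(turned_on(S0), turned_off(s1))  # → True True
```

Because the situation argument is explicit, the same fluent can have different truth values in S0 and in Result(FlipSwitch, S0) without any contradiction.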
There are other ways to deal with time
• Event calculus
– Based on points in time rather than situations
– Designed to allow reasoning over periods of time
  ◦ Can represent actions with duration, overlapping actions, etc.
• Generalized events
– Parts of a general “space-time chunk”
• Processes
– Not just discrete events
• Intervals
– Moments and durations of time
• Objects with state fluents
– Not just events, but objects can also have time properties
Event calculus relations
• Initiates(e, f, t)
– Event e at time t causes fluent f to become true
• Terminates(e, f, t)
– Event e at time t causes fluent f to no longer be true
• Happens(e, t)
– Event e happens at time t
• Clipped(f, t1, t2)
– f is terminated by some event sometime between t1 and t2
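To see how these relations fit together, here is a minimal Python sketch over integer time points. The narrative (a light turned on at time 1, off at 5, on again at 8) and all names are hypothetical. The rule used for holds_at (a fluent holds at t if some earlier event initiated it and it has not been clipped since) is the standard event-calculus axiom rather than something stated on this slide, and Clipped here counts terminating events with t1 ≤ t < t2, which is one common convention.

```python
# Hypothetical narrative: Happens(e, t) facts as (event, time) pairs.
happens = {("TurnOn", 1), ("TurnOff", 5), ("TurnOn", 8)}
# Initiates(e, f, t) and Terminates(e, f, t), assumed here not to depend on t.
initiates = {("TurnOn", "LightOn")}
terminates = {("TurnOff", "LightOn")}

def clipped(f: str, t1: int, t2: int) -> bool:
    """Clipped(f, t1, t2): some event at time t with t1 <= t < t2 terminates f."""
    return any(t1 <= t < t2 and (e, f) in terminates for e, t in happens)

def holds_at(f: str, t: int) -> bool:
    """f holds at t if an event before t initiated it and f was not clipped since."""
    return any(t1 < t and (e, f) in initiates and not clipped(f, t1, t)
               for e, t1 in happens)

print([t for t in range(11) if holds_at("LightOn", t)])  # → [2, 3, 4, 5, 9, 10]
```

Under these conventions an event's effect shows up strictly after the event's time, which is why LightOn still holds at t = 5 and not yet at t = 8.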
Generalized events
• An ontology of time that allows for reasoning about
various temporal events, subevents, durations, processes,
intervals, etc.
[Figure: a space-time chunk (Australia) plotted against time]
Time interval predicates
Ex:
After(ReignOf(ElizabethII), ReignOf(GeorgeVI))
Overlap(Fifties, ReignOf(Elvis))
Start(Fifties) = Start(AD1950)
Meet(Fifties, Sixties)
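These predicates reduce to comparisons of interval endpoints. A minimal sketch, assuming intervals are (start, end) pairs of years. The endpoint definitions of Meet, Before, After and During are the usual ones; Overlap is simplified to "the two intervals share a stretch of positive length"; the reign of George VI is approximated to whole years (1936 to 1952).

```python
from typing import NamedTuple

class Interval(NamedTuple):
    start: float
    end: float

def meet(i: Interval, j: Interval) -> bool:
    """Meet(i, j): i ends exactly when j starts."""
    return i.end == j.start

def before(i: Interval, j: Interval) -> bool:
    """Before(i, j): i ends before j starts."""
    return i.end < j.start

def after(j: Interval, i: Interval) -> bool:
    """After(j, i) is defined as Before(i, j)."""
    return before(i, j)

def during(i: Interval, j: Interval) -> bool:
    """During(i, j): i lies within j."""
    return j.start <= i.start and i.end <= j.end

def overlap(i: Interval, j: Interval) -> bool:
    """Simplified Overlap(i, j): the intervals share a stretch of positive length."""
    return max(i.start, j.start) < min(i.end, j.end)

fifties = Interval(1950, 1960)
sixties = Interval(1960, 1970)
ad1950 = Interval(1950, 1951)
reign_of_george_vi = Interval(1936, 1952)  # approximated to whole years

print(fifties.start == ad1950.start)          # Start(Fifties) = Start(AD1950) → True
print(meet(fifties, sixties))                 # → True
print(overlap(fifties, reign_of_george_vi))   # → True
print(after(sixties, reign_of_george_vi))     # → True
print(during(Interval(1955, 1956), fifties))  # → True
```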
Objects with state fluents
[Figure: the object President(USA), whose value changes over time]
Knowledge representation
• Chapter 10 covers many topics in knowledge
representation, many of which are important to real,
sophisticated AI reasoning systems
– We’re only scratching the surface of this topic
– Best covered in depth in an advanced AI course and in the context of
particular AI problems
– Read through the Internet shopping world example in Section 10.5
• Now we move on to probabilistic reasoning, a different
way of representing and manipulating knowledge
– Chapters 13 and 14
Quick Review of Probability
From here on we will assume that you know this…
Probability notation and notes
• Probabilities of propositions
– P(A), P(the sun is shining)
• Probabilities of random variables
– P(X = x1), P(Y = y1), P(x1 < X < x2)
• P(A) usually means P(A = True)
(A is a proposition, not a variable)
– This is a probability value
– Technically, P(A) is a probability function
• P(X = x1)
– This is a probability value (P(X) is a probability function)
• P(X)
– This is a probability function or a probability density function
• Technically, if X is a variable, we should not write P(X) = 0.5
– But rather P(X = x1) = 0.5
Discrete and continuous probabilities
• Discrete: Probability function P(X, Y) is described by an
MxN matrix of probabilities
– Each entry is the probability of one combination of values, e.g., P(X=x1, Y=y1) = p1
– $\sum_i \sum_j P(X=x_i, Y=y_j) = 1$
– P(X, Y, Z) is an MxNxP matrix
• Continuous: Probability density function (pdf) P(X, Y) is
described by a 2D function
– P(x1 < X < x2, y1 < Y < y2) = p1
– $\iint P(X, Y)\,dX\,dY = 1$
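Both normalization conditions can be checked numerically. A minimal sketch using only the standard library: the 2x3 discrete table holds made-up illustrative numbers, and the continuous density is assumed to be two independent standard normals, whose double integral is approximated by a Riemann sum on a grid covering [-6, 6] in each direction.

```python
import math

# Discrete: P(X, Y) stored as an M x N table (M = 2 values of X, N = 3 of Y).
# The entries are made-up illustrative numbers.
p_xy = [[0.10, 0.25, 0.15],
        [0.20, 0.05, 0.25]]
total = sum(sum(row) for row in p_xy)
print(round(total, 12))  # → 1.0

# Continuous: approximate the double integral of a 2D density with a
# Riemann sum. Assumed density: two independent standard normals.
h = 0.05
grid = [-6 + k * h for k in range(241)]  # 241 points covering [-6, 6]

def pdf(x: float, y: float) -> float:
    return math.exp(-(x * x + y * y) / 2) / (2 * math.pi)

integral = sum(pdf(x, y) for x in grid for y in grid) * h * h
print(round(integral, 6))  # → 1.0
```

The discrete check is an exact sum over table entries, while the continuous one can only approximate the integral; the tiny mass outside [-6, 6] is why the grid result is close to, not exactly, 1.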
Discrete probability distribution
$\sum_i p(X = x_i) = 1$
[Figure: bar plot of p(X), axis from 0 to 0.2, for X from 0 to 12]
Continuous probability distribution
$\int_{-\infty}^{\infty} p(X)\,dX = 1$
[Figure: density curve p(X), axis from 0 to 0.4, for X from 0 to 12]
Continuous probability distribution
$\int_{6}^{8} p(X)\,dX = a$, i.e., P(6 < X < 8) = a
P(X=5) = ???
P(X=5) = 0
In general, P(X=x1) = 0 for any single value x1
[Figure: density curve p(X), axis from 0 to 0.4, for X from 0 to 12]
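Probabilities for a continuous variable come from areas under the density, which can be computed from the cumulative distribution function (CDF). A minimal sketch using Python's statistics.NormalDist; the choice of a normal density with mean 7 and standard deviation 1 is a hypothetical stand-in for the curve in the figure.

```python
from statistics import NormalDist

# Hypothetical density: normal with mean 7 and standard deviation 1.
X = NormalDist(mu=7, sigma=1)

a = X.cdf(8) - X.cdf(6)        # P(6 < X < 8): area under p(X) from 6 to 8
p_point = X.cdf(5) - X.cdf(5)  # P(X = 5): an interval of zero width has zero area
print(round(a, 4), p_point)    # → 0.6827 0.0
print(X.pdf(5) > 0)            # the density at 5 is positive → True
```

The last line is the key distinction: p(5) is a density value, not a probability, so it can be nonzero while P(X = 5) is exactly 0.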
Three Axioms of Probability
1. The probability of every event must be nonnegative
– For any event A, P(A) ≥ 0
2. Valid propositions have probability 1
– P(True) = 1
– P(A ∨ ¬A) = 1
3. For disjoint events A1, A2, …
– P(A1 ∨ A2 ∨ …) = P(A1) + P(A2) + …
• From these axioms, all other properties of probabilities can be derived.
– E.g., derive P(A) + P(¬A) = 1
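One way to carry out the suggested derivation, using only Axioms 2 and 3:

```latex
\begin{align*}
&A \text{ and } \neg A \text{ are disjoint, so by Axiom 3:} && P(A \lor \neg A) = P(A) + P(\neg A) \\
&A \lor \neg A \text{ is valid, so by Axiom 2:} && P(A \lor \neg A) = 1 \\
&\text{Therefore:} && P(A) + P(\neg A) = 1
\end{align*}
```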
Some consequences of the axioms
• Unsatisfiable propositions have probability 0
– P(False) = 0
– P(A ∧ ¬A) = 0
• For any two events A and B
– P(A ∨ B) = P(A) + P(B) – P(A ∧ B)
• For the complement Ac of event A
– P(Ac) = 1 – P(A)
• For any event A
– 0 ≤ P(A) ≤ 1
• For independent events A and B
– P(A ∧ B) = P(A) P(B)
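These consequences can be checked exactly on a small finite sample space. A minimal sketch, assuming two fair six-sided dice with all 36 outcomes equally likely; the events A, B and C are illustrative choices, not from the slides. Exact fractions avoid any rounding.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space: two fair dice, all 36 outcomes equally likely.
omega = list(product(range(1, 7), repeat=2))

def P(event):
    """Probability of an event, given as a predicate over outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

A = lambda w: w[0] % 2 == 0      # first die is even
B = lambda w: w[1] >= 5          # second die shows 5 or 6
C = lambda w: w[0] + w[1] >= 10  # sum is at least 10 (depends on the first die)

# Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B)
assert P(lambda w: A(w) or B(w)) == P(A) + P(B) - P(lambda w: A(w) and B(w))
# Complement: P(not A) = 1 - P(A)
assert P(lambda w: not A(w)) == 1 - P(A)
# A and B are independent, so the product rule holds ...
assert P(lambda w: A(w) and B(w)) == P(A) * P(B)
# ... but it fails for the dependent pair A, C.
print(P(lambda w: A(w) and C(w)), P(A) * P(C))  # → 1/9 1/12
```

The last line is a reminder that P(A ∧ B) = P(A) P(B) is a property of independent events, not a general rule.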
Venn Diagram
[Figure: Venn diagram of events A and B inside the universe True, overlapping in A ∧ B]
Visualize: P(True), P(False), P(A), P(B), P(¬A), P(¬B),
P(A ∧ B), P(A ∨ B), …
Joint Probabilities
• A complete probability model is a single joint probability
distribution over all propositions/variables in the domain
– P(X1, X2, …, Xi, …)
• A particular instance of the world has the probability
– P(X1=x1 ∧ X2=x2 ∧ … ∧ Xi=xi ∧ …) = p
• Rather than stating knowledge as
– Raining ⇒ WetGrass
• We can state it as
– P(Raining, WetGrass) = 0.15
– P(Raining, ¬WetGrass) = 0.01
– P(¬Raining, WetGrass) = 0.04
– P(¬Raining, ¬WetGrass) = 0.8
                 ¬WetGrass    WetGrass
  ¬Raining         0.8          0.04
  Raining          0.01         0.15
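With the joint distribution in hand, the probability of any single variable follows by summing out the others (marginalization). A minimal sketch using the four joint probabilities for Raining and WetGrass from this slide; the dictionary keyed by truth values is just one convenient encoding.

```python
# Joint distribution P(Raining, WetGrass), keyed by (raining, wet_grass).
joint = {
    (True, True): 0.15,
    (True, False): 0.01,
    (False, True): 0.04,
    (False, False): 0.80,
}
assert abs(sum(joint.values()) - 1.0) < 1e-12  # a valid joint distribution

# Marginalize: sum over the values of the other variable.
p_raining = sum(p for (r, w), p in joint.items() if r)
p_wet = sum(p for (r, w), p in joint.items() if w)
print(round(p_raining, 4), round(p_wet, 4))  # → 0.16 0.19
```

Unlike the rule Raining ⇒ WetGrass, the joint also records that rain without wet grass has a small but nonzero probability (0.01).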
Conditional Probability
• Unconditional, or Prior, Probability
– Probabilities associated with a proposition or variable, prior to any
evidence
– E.g., P(WetGrass), P(Raining)
• Conditional, or Posterior, Probability
– Probabilities after evidence is gathered
– P(A | B): “The probability of A given that we know B”
– After (posterior to) procuring evidence
– E.g., P(WetGrass | Raining)
$P(X \mid Y) = \frac{P(X, Y)}{P(Y)}$ or $P(X \mid Y)\,P(Y) = P(X, Y)$
Assumes P(Y) is nonzero
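The definition translates directly into code: add up the joint probability of the worlds where both query and evidence hold, then divide by the probability of the evidence. A minimal sketch using the Raining/WetGrass joint probabilities from the previous slide; the helper name conditional is hypothetical.

```python
# Joint distribution P(Raining, WetGrass), keyed by (raining, wet_grass).
joint = {
    (True, True): 0.15,
    (True, False): 0.01,
    (False, True): 0.04,
    (False, False): 0.80,
}

def conditional(query, evidence):
    """P(query | evidence) = P(query, evidence) / P(evidence); needs P(evidence) > 0."""
    p_evidence = sum(p for world, p in joint.items() if evidence(world))
    if p_evidence == 0:
        raise ValueError("P(evidence) is zero, so the conditional is undefined")
    p_both = sum(p for world, p in joint.items() if evidence(world) and query(world))
    return p_both / p_evidence

wet = lambda w: w[1]
raining = lambda w: w[0]
print(round(conditional(wet, raining), 4))                    # 0.15 / 0.16 → 0.9375
print(round(conditional(wet, lambda w: not raining(w)), 4))   # 0.04 / 0.84 → 0.0476
```

Conditioning on Raining raises the probability of WetGrass from its prior of 0.19 to about 0.94, which is what "posterior to procuring evidence" means in practice.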
The chain rule
$P(X, Y) = P(X \mid Y)\,P(Y)$
By the chain rule:
$P(X, Y, Z) = P(X \mid Y, Z)\,P(Y, Z) = P(X \mid Y, Z)\,P(Y \mid Z)\,P(Z)$
or, equivalently,
$P(X, Y, Z) = P(X)\,P(Y \mid X)\,P(Z \mid X, Y)$
Notes:
• Precedence: ‘|’ is lowest
• E.g., P(X | Y, Z) means which?
– P( (X | Y), Z )
– P( X | (Y, Z) )
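The chain rule also works in the constructive direction: given P(Z), P(Y | Z) and P(X | Y, Z), it builds the full joint. A minimal sketch with three binary variables; all the conditional probabilities are made-up illustrative numbers, and bern is a hypothetical helper.

```python
from itertools import product

# Hypothetical factors for three binary variables (values 0 and 1).
p_z1 = 0.3                                   # P(Z=1)
p_y1_given_z = {0: 0.2, 1: 0.9}              # P(Y=1 | Z=z)
p_x1_given_yz = {(0, 0): 0.1, (0, 1): 0.5,   # P(X=1 | Y=y, Z=z), keyed by (y, z)
                 (1, 0): 0.6, (1, 1): 0.95}

def bern(p_one: float, value: int) -> float:
    """Probability that a 0/1 variable takes `value` when P(it = 1) = p_one."""
    return p_one if value == 1 else 1 - p_one

# Chain rule: P(X, Y, Z) = P(X | Y, Z) P(Y | Z) P(Z)
joint = {
    (x, y, z): bern(p_x1_given_yz[(y, z)], x) * bern(p_y1_given_z[z], y) * bern(p_z1, z)
    for x, y, z in product((0, 1), repeat=3)
}
print(round(sum(joint.values()), 12))  # → 1.0: a valid joint distribution

# Summing variables back out of the joint recovers the factors we started from.
p_z1_back = sum(p for (x, y, z), p in joint.items() if z == 1)
p_y1_given_z1_back = sum(p for (x, y, z), p in joint.items() if y == 1 and z == 1) / p_z1_back
print(round(p_z1_back, 12), round(p_y1_given_z1_back, 12))  # → 0.3 0.9
```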
Joint probability distribution
P(X,Y)    x1     x2     x3
  y1      0.2    0.1    0.1
  y2      0.1    0.2    0.3
From P(X,Y), we can always calculate:
P(X), P(Y), P(X|Y), P(Y|X), P(X=x1), P(Y=y2), P(X|Y=y1), P(Y|X=x1), P(X=x1|Y), etc.
P(X,Y)    x1     x2     x3     P(Y)
  y1      0.2    0.1    0.1    0.4
  y2      0.1    0.2    0.3    0.6
P(X)      0.3    0.3    0.4

P(X|Y)    x1     x2     x3
  y1      0.5    0.25   0.25
  y2      0.167  0.333  0.5

P(Y|X)    x1     x2     x3
  y1      0.667  0.333  0.25
  y2      0.333  0.667  0.75

P(X=x1,Y=y2) = ?
P(X=x1) = ?
P(Y=y2) = ?
P(X|Y=y1) = ?
P(X=x1|Y) = ?
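Every quantity on this slide comes from the joint table by summing (marginals) and dividing (conditionals). A minimal sketch using the slide's P(X, Y) values (y1 row: 0.2, 0.1, 0.1; y2 row: 0.1, 0.2, 0.3); the dictionary encoding and variable names are illustrative choices.

```python
# P(X, Y) from the slide, keyed by (x, y).
xs = ["x1", "x2", "x3"]
ys = ["y1", "y2"]
p_xy = {("x1", "y1"): 0.2, ("x2", "y1"): 0.1, ("x3", "y1"): 0.1,
        ("x1", "y2"): 0.1, ("x2", "y2"): 0.2, ("x3", "y2"): 0.3}

p_x = {x: sum(p_xy[(x, y)] for y in ys) for x in xs}  # marginal P(X)
p_y = {y: sum(p_xy[(x, y)] for x in xs) for y in ys}  # marginal P(Y)
p_x_given_y = {(x, y): p_xy[(x, y)] / p_y[y] for x in xs for y in ys}  # P(X|Y)
p_y_given_x = {(y, x): p_xy[(x, y)] / p_x[x] for x in xs for y in ys}  # P(Y|X)

print(p_xy[("x1", "y2")])                              # P(X=x1, Y=y2) → 0.1
print(round(p_x["x1"], 3), round(p_y["y2"], 3))        # P(X=x1), P(Y=y2) → 0.3 0.6
print([round(p_x_given_y[(x, "y1")], 3) for x in xs])  # P(X|Y=y1) → [0.5, 0.25, 0.25]
print([round(p_x_given_y[("x1", y)], 3) for y in ys])  # P(X=x1|Y) → [0.5, 0.167]
```

Note that P(X|Y=y1) is a distribution over X and sums to 1, while P(X=x1|Y) is a vector indexed by the values of Y and in general does not.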
Probability Distributions
               Continuous vars              Discrete vars
P(X)           Function (of one variable)   M vector
P(X=x)         Scalar*                      Scalar
P(X,Y)         Function of two variables    MxN matrix
P(X|Y)         Function of two variables    MxN matrix
P(X|Y=y)       Function of one variable     M vector
P(X=x|Y)       Function of one variable     N vector
P(X=x|Y=y)     Scalar*                      Scalar
* Actually zero; should be P(x1 < X < x2)
Bayes’ Rule
• Since
$P(X, Y) = P(X \mid Y)\,P(Y)$
and
$P(X, Y) = P(Y \mid X)\,P(X)$
• Then
$P(X \mid Y)\,P(Y) = P(Y \mid X)\,P(X)$
Bayes’ Rule:
$P(X \mid Y) = \frac{P(Y \mid X)\,P(X)}{P(Y)}$
Bayes’ Rule
• Similarly, P(X) conditioned on two variables:
$P(X \mid Y, Z) = \frac{P(Y \mid X, Z)\,P(X \mid Z)}{P(Y \mid Z)}$
$P(X \mid Y, Z) = \frac{P(Z \mid X, Y)\,P(X \mid Y)}{P(Z \mid Y)}$
• Or N variables:
$P(X_1 \mid X_2, X_3, \ldots, X_N) = \frac{P(X_2 \mid X_1, X_3, \ldots, X_N)\,P(X_1 \mid X_3, \ldots, X_N)}{P(X_2 \mid X_3, \ldots, X_N)}$
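These identities can be checked numerically. The sketch below builds a random, strictly positive joint distribution over three binary variables (hypothetical numbers, seeded for repeatability) and confirms that both two-variable forms of Bayes' rule agree with P(X | Y, Z) computed directly from the joint. The helpers P and cond are illustrative names.

```python
import random
from itertools import product

random.seed(1)
# Hypothetical joint P(X, Y, Z) over three binary variables, all entries > 0.
worlds = list(product((0, 1), repeat=3))
weights = [random.random() + 0.01 for _ in worlds]
total = sum(weights)
joint = {w: wt / total for w, wt in zip(worlds, weights)}

def P(x=None, y=None, z=None):
    """Marginal probability of the given values; a None argument is summed out."""
    return sum(p for (wx, wy, wz), p in joint.items()
               if x in (None, wx) and y in (None, wy) and z in (None, wz))

def cond(query, given):
    """P(query | given), where query and given are dicts such as {"x": 1}."""
    return P(**query, **given) / P(**given)

max_error = 0.0
for x, y, z in worlds:
    direct = cond({"x": x}, {"y": y, "z": z})
    # P(Y|X,Z) P(X|Z) / P(Y|Z)
    form1 = cond({"y": y}, {"x": x, "z": z}) * cond({"x": x}, {"z": z}) / cond({"y": y}, {"z": z})
    # P(Z|X,Y) P(X|Y) / P(Z|Y)
    form2 = cond({"z": z}, {"x": x, "y": y}) * cond({"x": x}, {"y": y}) / cond({"z": z}, {"y": y})
    max_error = max(max_error, abs(direct - form1), abs(direct - form2))
print(max_error < 1e-12)  # → True
```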
Bayes’ Rule
• This simple equation is very useful in practice
– Usually framed in terms of hypotheses (H) and data (D)
  ◦ Which of the hypotheses is best supported by the data?
$P(H_i \mid D) = \frac{P(D \mid H_i)\,P(H_i)}{P(D)}$
– P(H_i | D): posterior probability (diagnostic knowledge)
– P(D | H_i): likelihood (causal knowledge)
– P(H_i): prior probability
– P(D): normalizing constant
$P(H_i \mid D) = k\,P(D \mid H_i)\,P(H_i)$, where k = 1/P(D)
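The normalized form suggests a practical recipe: score each hypothesis by likelihood times prior, then divide by the sum of the scores. A minimal sketch, assuming the hypotheses are mutually exclusive and exhaustive (so P(D) equals that sum); the three hypotheses and their numbers are hypothetical.

```python
def posterior(priors, likelihoods):
    """Return P(H_i | D) = k * P(D | H_i) * P(H_i) for every hypothesis H_i.

    Assumes the hypotheses are mutually exclusive and exhaustive, so that
    P(D) = sum_j P(D | H_j) P(H_j) and k = 1 / P(D).
    """
    scores = {h: likelihoods[h] * priors[h] for h in priors}
    p_data = sum(scores.values())  # P(D), the normalizing constant
    return {h: s / p_data for h, s in scores.items()}

priors = {"H1": 0.5, "H2": 0.3, "H3": 0.2}       # P(H_i), hypothetical
likelihoods = {"H1": 0.1, "H2": 0.4, "H3": 0.7}  # P(D | H_i), hypothetical
post = posterior(priors, likelihoods)
print({h: round(p, 3) for h, p in post.items()})  # → {'H1': 0.161, 'H2': 0.387, 'H3': 0.452}
print(max(post, key=post.get))                    # → H3
```

H1 has the largest prior but the smallest posterior: the data are much more likely under H3, which outweighs its lower prior.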
Bayes’ rule example: Medical diagnosis
• Meningitis causes a stiff neck 50% of the time
• A patient comes in with a stiff neck – what is the
probability that he has meningitis?
• Need to know two things:
– The prior probability of a patient having meningitis (1/50,000)
– The prior probability of a patient having a stiff neck (1/20)
• P(M | S) = ?
$P(M \mid S) = \frac{P(S \mid M)\,P(M)}{P(S)}$
• P(M | S) = (0.5)(0.00002)/(0.05) = 0.0002
Example (cont.)
• Suppose that we also know about whiplash
– P(W) = 1/1000
– P(S | W) = 0.8
• What is the relative likelihood of whiplash and meningitis?
– P(W | S) / P(M | S)
$P(W \mid S) = \frac{P(S \mid W)\,P(W)}{P(S)} = \frac{(0.8)(0.001)}{0.05} = 0.016$
So the relative likelihood of whiplash vs. meningitis is (0.016/0.0002) = 80
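The two diagnoses can be reproduced in a few lines, using the numbers from the example: P(S|M) = 0.5, P(M) = 1/50,000, P(S|W) = 0.8, P(W) = 1/1000 and P(S) = 1/20. The helper name bayes is illustrative.

```python
def bayes(likelihood: float, prior: float, evidence: float) -> float:
    """P(H | E) = P(E | H) P(H) / P(E)."""
    return likelihood * prior / evidence

p_s = 1 / 20  # P(S): prior probability of a stiff neck
p_m_given_s = bayes(likelihood=0.5, prior=1 / 50_000, evidence=p_s)  # meningitis
p_w_given_s = bayes(likelihood=0.8, prior=1 / 1000, evidence=p_s)    # whiplash
print(round(p_m_given_s, 6), round(p_w_given_s, 6))  # → 0.0002 0.016
print(round(p_w_given_s / p_m_given_s, 6))           # → 80.0

# P(S) cancels in the ratio, so the relative likelihood does not need it:
print(round((0.8 * 1 / 1000) / (0.5 * 1 / 50_000), 6))  # → 80.0
```

The last line shows why relative comparisons between hypotheses are often easier than absolute posteriors: the normalizing constant drops out.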
A useful Bayes rule example
A test for a new, deadly strain of anthrax (that has no symptoms)
is known to be 99.9% accurate. Should you get tested? The
chances of having this strain are one in a million.
What are the random variables?
A – you have anthrax (boolean)
T – you test positive for anthrax (boolean)
Notation: Instead of P(A=True) and P(A=False), we will write P(A) and P(¬A)
What do we want to compute?
P(A|T)
What else do we need to know or assume?
Priors: P(A), P(¬A)
Given: P(T|A), P(¬T|A), P(T|¬A), P(¬T|¬A)
Possibilities: A ∧ T, A ∧ ¬T, ¬A ∧ T, ¬A ∧ ¬T
Example (cont.)
We know:
Given: P(T|A) = 0.999, P(¬T|A) = 0.001, P(T|¬A) = 0.001, P(¬T|¬A) = 0.999
Prior knowledge: P(A) = 10⁻⁶, P(¬A) = 1 – 10⁻⁶
Want to know P(A|T)
P(A|T) = P(T|A) P(A) / P(T)
Calculate P(T) by marginalization
P(T) = P(T|A) P(A) + P(T|¬A) P(¬A) = (0.999)(10⁻⁶) + (0.001)(1 – 10⁻⁶) ≈ 0.001
So P(A|T) = (0.999)(10⁻⁶) / 0.001 ≈ 0.001
Therefore P(¬A|T) ≈ 0.999
What if you work at a Post Office?
[Figure: All people, split into people with anthrax and people without anthrax; within each group, a bad test result (0.1%) vs. a good test result]
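The post-office question amounts to asking what happens when the prior P(A) is larger. A minimal sketch that reproduces the one-in-a-million calculation from the example and then repeats it for two higher priors; those priors (10⁻⁴ and 10⁻²) are hypothetical, chosen only to show the trend. The function name is illustrative.

```python
def p_disease_given_positive(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(A | T) by Bayes' rule, with P(T) found by marginalizing over A."""
    p_t = sensitivity * prior + false_positive_rate * (1 - prior)  # P(T)
    return sensitivity * prior / p_t

# Values from the example: P(T|A) = 0.999, P(T|¬A) = 0.001, P(A) = 10^-6
print(round(p_disease_given_positive(1e-6, 0.999, 0.001), 6))  # → 0.000998

# Hypothetical higher-risk priors.
for prior in (1e-4, 1e-2):
    print(prior, round(p_disease_given_positive(prior, 0.999, 0.001), 3))
```

With a prior of one in a million, nearly all positive results come from the 0.1% false-positive rate among healthy people; as the prior grows toward the false-positive rate and beyond, a positive test becomes far more informative.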