CSC242: Intro to AI
Lecture 15
Administrivia
Project 3
Code available
Due Mon Apr 9 11:59PM
ULW
2nd draft due Apr 1
College Writing Program Contest: $100!
Deadline: April 5
http://writing.rochester.edu
Approximate Inference
in Bayesian Networks
Bayesian Networks
[Figure: the cavity network. Random variables Cavity, Toothache, Catch, with CPTs P(Cavity), P(Toothache | Cavity), P(Catch | Cavity). An arrow means “has direct influence on”; Toothache and Catch are conditionally independent given their parent. In general, each node stores P(Xi | Parents(Xi )).]
The Goal
• Query variable X
• Evidence variables E1 , ..., Em
• Observed values: e = ⟨e1 , ..., em ⟩
• Non-evidence, non-query (“hidden”) variables: Y
• Approximate: P(X | e)
Generating Samples
• Sample each variable in topological order
• Child appears after its parents
• Choose the value for that variable
conditioned on the values already chosen
for its parents
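A minimal sketch of this procedure in Python, using the sprinkler network that appears later in the lecture; the NETWORK encoding and function names here are illustrative choices, not from the slides:

import random

# var -> (parents, {tuple of parent values: P(var = True)}),
# listed in topological order (Python dicts preserve insertion order).
NETWORK = {
    "Cloudy":    ((), {(): 0.5}),
    "Sprinkler": (("Cloudy",), {(True,): 0.10, (False,): 0.50}),
    "Rain":      (("Cloudy",), {(True,): 0.80, (False,): 0.20}),
    "WetGrass":  (("Sprinkler", "Rain"),
                  {(True, True): 0.99, (True, False): 0.90,
                   (False, True): 0.90, (False, False): 0.00}),
}

def prior_sample(net):
    # Sample each variable in topological order,
    # conditioned on the values already chosen for its parents.
    event = {}
    for var, (parents, cpt) in net.items():
        p_true = cpt[tuple(event[p] for p in parents)]
        event[var] = random.random() < p_true
    return event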
Rejection Sampling
• Generate sample from the prior
distribution specified by the network
• Reject sample if inconsistent with the
evidence
• Use remaining samples to estimate
probability of event
• Fraction of samples consistent with the
evidence drops exponentially with number
of evidence variables
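A rejection-sampling sketch built on the prior_sample and NETWORK definitions above (again, illustrative names):

def rejection_sample(query_var, evidence, net, n=100_000):
    # Count query values only among samples consistent with the evidence.
    counts = {True: 0, False: 0}
    for _ in range(n):
        s = prior_sample(net)
        if all(s[v] == val for v, val in evidence.items()):
            counts[s[query_var]] += 1
    total = counts[True] + counts[False]
    return counts[True] / total if total else None  # None: everything rejected

# e.g. rejection_sample("Rain", {"Sprinkler": True, "WetGrass": True}, NETWORK)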
Likelihood Weighting
• Generate sample using topological order
• Evidence variable: Fix value to evidence
value and update weight of sample using
probability in network
• Non-evidence variable: Sample from values
using probabilities in the network (given
parents)
• Probabilistic Graphical Models: Principles and
Techniques by Daphne Koller, Nir Friedman
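A likelihood-weighting sketch under the same assumed encoding:

def weighted_sample(net, evidence):
    # Fix evidence variables and accumulate their probability as a weight;
    # sample everything else from the network given its parents.
    event, weight = {}, 1.0
    for var, (parents, cpt) in net.items():
        p_true = cpt[tuple(event[p] for p in parents)]
        if var in evidence:
            event[var] = evidence[var]
            weight *= p_true if evidence[var] else 1.0 - p_true
        else:
            event[var] = random.random() < p_true
    return event, weight

def likelihood_weighting(query_var, evidence, net, n=100_000):
    totals = {True: 0.0, False: 0.0}
    for _ in range(n):
        event, w = weighted_sample(net, evidence)
        totals[event[query_var]] += w
    return totals[True] / (totals[True] + totals[False])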
[Figure: the sprinkler network. Cloudy → Sprinkler, Cloudy → Rain; Sprinkler, Rain → WetGrass.
P(C) = 0.5
P(S | C): C = t: 0.10; C = f: 0.50
P(R | C): C = t: 0.80; C = f: 0.20
P(W | S, R): t,t: 0.99; t,f: 0.90; f,t: 0.90; f,f: 0.00
Current sample: Cloudy = true, Sprinkler = true, Rain = false, WetGrass = true]
P(Rain | Sprinkler = true, WetGrass = true)
Markov Chain Monte
Carlo Simulation
• To approximate: P(X | e)
• Generate a sequence of states
• Values of evidence variables are fixed
• Values of other variables appear in the
right proportion given the distribution
encoded by the network
[Figure: a node X with parents U1 , ..., Um , children Y1 , ..., Yn , and the children’s other parents Z1j , ..., Znj , shown twice: once labeled “Conditional Independence”, once with the Markov blanket of X (parents, children, children’s parents) highlighted.]
Markov Blanket
• The Markov Blanket of a node:
• Parents
• Children
• Children’s parents
• A node is conditionally independent of all
other nodes in the network given its
Markov Blanket
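Under the NETWORK encoding sketched earlier, the Markov blanket can be read off directly (an illustrative helper, not from the slides):

def markov_blanket(var, net):
    # Parents, children, and children's other parents.
    parents = set(net[var][0])
    children = {c for c, (ps, _) in net.items() if var in ps}
    co_parents = {q for c in children for q in net[c][0]} - {var}
    return parents | children | co_parents

# e.g. markov_blanket("Rain", NETWORK) == {"Cloudy", "WetGrass", "Sprinkler"}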
MCMC Simulation:
Gibbs Sampling
• To approximate: P(X | e)
• Start in a state with evidence variables set
to evidence values (others arbitrary)
• On each step, sample a non-evidence
variable conditioned on the values of the
variables in its Markov Blanket
• Order irrelevant
[Figure: the sprinkler network again (same CPTs as above), with the current sample Cloudy = true, Sprinkler = true, Rain = false, WetGrass = true.]
P(Rain | Sprinkler = true, WetGrass = true)
[Figure: same network and sample: Cloudy = true, Sprinkler = true, Rain = false, WetGrass = true.]
P(Rain | Sprinkler = true, WetGrass = true)
Resample Cloudy given its Markov blanket {Sprinkler, Rain}:
P(Cloudy | Sprinkler = true, Rain = false)
[Figure: same network; Cloudy has been resampled to false. Current sample: Cloudy = false, Sprinkler = true, Rain = false, WetGrass = true.]
P(Rain | Sprinkler = true, WetGrass = true)
Resample Rain given its Markov blanket {Cloudy, Sprinkler, WetGrass}:
P(Rain | Cloudy = false, Sprinkler = true, WetGrass = true)
[Figure: same network; Rain has been resampled to true. Current sample: Cloudy = false, Sprinkler = true, Rain = true, WetGrass = true.]
P(Rain | Sprinkler = true, WetGrass = true)
[Figure: same network and sample: Cloudy = false, Sprinkler = true, Rain = true, WetGrass = true.]
P(Rain | Sprinkler = true, WetGrass = true)

Samples visited so far, tallied by the value of Rain:
Cloudy  Sprinkler  Rain  WetGrass
T       T          F     T         ¬R ✓
F       T          F     T         ¬R ✓
F       T          T     T         R ✓
Gibbs Sampling
• To approximate: P(X | e)
• Start in a state with evidence variables set
to evidence values (others arbitrary)
• On each step, sample non-evidence
variables conditioned on the values of the
variables in their Markov Blanket
• Order irrelevant
• A form of local search!
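A Gibbs-sampling sketch for the sprinkler query above, reusing the NETWORK encoding from the earlier sketches (the code structure is mine; the algorithm is the one described on this slide):

def markov_blanket_prob(var, event, net):
    # P(var = True | values of var's Markov blanket), via the unnormalized
    # local joint: P(var | parents) x product of P(child | its parents).
    parents, cpt = net[var]
    def local_joint(val):
        e = dict(event); e[var] = val
        p = cpt[tuple(e[q] for q in parents)]
        prob = p if val else 1.0 - p
        for child, (cparents, ccpt) in net.items():
            if var in cparents:
                cp = ccpt[tuple(e[q] for q in cparents)]
                prob *= cp if e[child] else 1.0 - cp
        return prob
    t, f = local_joint(True), local_joint(False)
    return t / (t + f)

def gibbs(query_var, evidence, net, n=100_000):
    nonevidence = [v for v in net if v not in evidence]
    state = dict(evidence)
    for v in nonevidence:
        state[v] = random.random() < 0.5   # arbitrary initial values
    counts = {True: 0, False: 0}
    for _ in range(n):
        var = random.choice(nonevidence)   # order irrelevant
        state[var] = random.random() < markov_blanket_prob(var, state, net)
        counts[state[query_var]] += 1
    return counts[True] / n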
Exact Inference in
Bayesian Networks
• #P-hard, even when the distribution is described compactly by a Bayesian network
Approximate Inference
in Bayesian Networks
• Sampling consistent with a distribution
• Rejection Sampling: rejects too much
• Likelihood Weighting: weights get too small
• Gibbs Sampling: MCMC algorithm (like local
search)
• All generate consistent estimates (equal to
exact probability in the large-sample limit)
Probabilistic Reasoning
Over Time
Belief States
• Logic: Defined in terms of possible (or
impossible) worlds
• Probability: Defined in terms of more (or
less) likely possible worlds
• Either way: State of the world is fixed
(doesn’t change during reasoning)
• Random variable has single fixed value
[Figure: two snapshots (a) and (b) of the agent’s knowledge of the 4×4 wumpus world, with squares annotated as visited, safe, breezy, or smelly, and with possible or confirmed pits and wumpus (P?, P!, W!). Legend: A = Agent, B = Breeze, G = Glitter/Gold, OK = Safe square, P = Pit, S = Stench, V = Visited, W = Wumpus.]
Changing Probabilities
• Query: BloodSugar, InsulinLevel
• Evidence: MeasuredBloodSugar, InsulinTaken, FoodEaten, ...
• Hidden: MetabolicActivity, ...
Goal
• Given history of evidence
• Assess current state
• Predict future states
Representation
• Model the world as a series of time slices
• Unobservable state variables Xt
• Observable evidence variables Et
• State at time t: Xt
• Observation at time t: Et = et

Example: Raint = Rt , Umbrellat = Ut
Xt : R0 , R1 , R2 , . . .
Et : U1 , U2 , . . .

• Representation of state: Xt , Et
Modeling Change
• Given state from time 0 until time t, need
to know distribution of state variables for
time t+1
P(Xt+1 | X0:t ) = P(Xt+1 | X0 , X1 , . . . Xt )
Andrey (Andrei) Andreyevich Markov
(1856 – 1922)
Markov Process
(Assumption)
• Current state depends only on the
previous state and not on earlier states
• The future is conditionally independent of
the past, given the present
P(Xt | X0:t−1 ) = P(Xt | Xt−1 )
Markov Process
[Figure: first-order Markov chain Xt−2 → Xt−1 → Xt → Xt+1 → Xt+2 ]
Markov Process
[Figure: the same chain for rain: Rt−2 → Rt−1 → Rt → Rt+1 → Rt+2 ]
Markov Process
[Figure: the rain chain with each link labeled by its transition model: P(Rt−2 | Rt−3 ), P(Rt−1 | Rt−2 ), P(Rt | Rt−1 ), P(Rt+1 | Rt ), P(Rt+2 | Rt+1 )]
Stationary Process
• Changes in the state are caused by a process that does not itself change
• Can use the same model to compute the changes for any pair of states Xt , Xt+1
• Example: P(Rt | Rt−1 ) is the same for any t

[Figure: rain chain Rt−1 → Rt → Rt+1 , each link sharing one transition CPT:
Rt−1   P(Rt )
t      0.7
f      0.3]
• Representation of state: Xt , Et
• Transition model: P(Xt | Xt−1 )
• Markov assumption, stationary process

Raint = Rt , Umbrellat = Ut
Xt : R0 , R1 , R2 , . . .
Et : U1 , U2 , . . .
[Figure (shown twice): the umbrella network. Rain chain Rt−1 → Rt → Rt+1 , each Rt with an observed child Ut , and transition CPT:
Rt−1   P(Rt )
t      0.7
f      0.3]
Sensor Markov
Assumption
• Observed values of evidence variables depend only on the current state
• Evidence is conditionally independent of the past, given the present
P(Et | X0:t , E1:t−1 ) = P(Et | Xt )
Sensor Model
[Figure: the umbrella network with both CPTs.
Transition: Rt−1 = t: P(Rt ) = 0.7; Rt−1 = f: P(Rt ) = 0.3
Sensor: Rt = t: P(Ut ) = 0.9; Rt = f: P(Ut ) = 0.2]
• Representation of state: Xt , Et
• Transition model: P(Xt | Xt−1 )
• Markov assumption, stationary process
• Sensor model: P(Et | Xt )
• Prior distribution at time 0: P(X0 )
Temporal Model
P(X0:t , E1:t ) = P(X0 ) ∏i=1..t P(Xi | Xi−1 ) P(Ei | Xi )
(P(X0 ): initial state model; P(Xi | Xi−1 ): transition model; P(Ei | Xi ): sensor model)
Inference
• Filtering (State Estimation)
• Prediction
• Smoothing
• Most Likely Explanation
Filtering
(State Estimation)
• Compute current belief state given all
evidence to date
Filtering
(State Estimation)
• Compute current belief state given all
evidence to date
• Method: Build network incrementally, do
inference on it
[Figures: the umbrella network built incrementally, one slice at a time: R0 ; then R0 → R1 with U1 ; then R0 → R1 → R2 with U1 , U2 ; then R0 → R1 → R2 → R3 with U1 , U2 , U3 . CPTs throughout: P(Rt | Rt−1 = t/f) = 0.7/0.3, P(Ut | Rt = t/f) = 0.9/0.2.]
Filtering
(State Estimation)
• Compute current belief state given all
evidence to date
• Method: Build network incrementally, do
inference on it
• Have to maintain current state estimate
and update it rather than recomputing over
history of observations every time
Filtering
(State Estimation)
P(Xt+1 | e1:t+1 ) = P(Xt+1 | e1:t , et+1 )
 = α P(et+1 | Xt+1 , e1:t ) P(Xt+1 | e1:t )   (Bayes’ rule)
 = α P(et+1 | Xt+1 ) P(Xt+1 | e1:t )   (sensor Markov assumption)
Filtering
(State Estimation)
P(Xt+1 | e1:t+1 ) = P(Xt+1 | e1:t , et+1 )
 = α P(et+1 | Xt+1 , e1:t ) P(Xt+1 | e1:t )
 = α P(et+1 | Xt+1 ) P(Xt+1 | e1:t )
The second factor is the prediction of the next state; the first updates it with the evidence (using the sensor model).
One-Step Prediction
P(Xt+1 | e1:t ) = Σxt P(Xt+1 | xt , e1:t ) P(xt | e1:t )   (condition on Xt )
 = Σxt P(Xt+1 | xt ) P(xt | e1:t )   (Markov assumption)
Filtering
(State Estimation)
P(Xt+1 | e1:t+1 ) = α P(et+1 | Xt+1 ) Σxt P(Xt+1 | xt ) P(xt | e1:t )
Update with evidence (using the sensor model) applied to the prediction of the next state.
[Figure: the two-slice umbrella network R0 → R1 → R2 with U1 , U2 ; transition CPT 0.7/0.3, sensor CPT 0.9/0.2.]
P(R0 ) = ⟨0.5, 0.5⟩
[Figure: same network; evidence U1 = true.]
P(R0 ) = ⟨0.5, 0.5⟩
P(R1 ) = Σr0 P(R1 | r0 ) P(r0 )
 = ⟨0.7, 0.3⟩ × 0.5 + ⟨0.3, 0.7⟩ × 0.5
 = ⟨0.5, 0.5⟩
[Figure: same network; evidence U1 = true.]
P(R1 | u1 ) = α P(u1 | R1 ) P(R1 )
 = α ⟨0.9, 0.2⟩ ⟨0.5, 0.5⟩
 = α ⟨0.45, 0.1⟩ ≈ ⟨0.818, 0.182⟩
[Figure: same network; evidence U1 = true.]
P(R0 ) = ⟨0.5, 0.5⟩
P(R1 | u1 ) ≈ ⟨0.818, 0.182⟩
[Figure: same network; evidence U1 = true, U2 = true.]
P(R0 ) = ⟨0.5, 0.5⟩
P(R1 | u1 ) ≈ ⟨0.818, 0.182⟩
[Figure: same network; evidence U1 = true, U2 = true.]
P(R1 | u1 ) ≈ ⟨0.818, 0.182⟩
P(R2 | u1 ) = Σr1 P(R2 | r1 ) P(r1 | u1 )
 = ⟨0.7, 0.3⟩ × 0.818 + ⟨0.3, 0.7⟩ × 0.182
 ≈ ⟨0.627, 0.373⟩
[Figure: same network; evidence U1 = true, U2 = true.]
P(R2 | u1 , u2 ) = α P(u2 | R2 ) P(R2 | u1 )
 = α ⟨0.9, 0.2⟩ ⟨0.627, 0.373⟩
 = α ⟨0.565, 0.075⟩ ≈ ⟨0.883, 0.117⟩
[Figure: same network; evidence U1 = true, U2 = true.]
P(R0 ) = ⟨0.5, 0.5⟩
P(R1 | u1 ) ≈ ⟨0.818, 0.182⟩
P(R2 | u1 , u2 ) ≈ ⟨0.883, 0.117⟩
Filtering
(State Estimation)
P(Xt+1 | e1:t+1 ) = α P(et+1 | Xt+1 ) Σxt P(Xt+1 | xt ) P(xt | e1:t )
Implement as recursive procedure:
P(X0 | e1:0 ) = P(X0 )
P(Xt+1 | e1:t+1 ) = α Forward(P(Xt | e1:t ), et+1 )
Updates in constant time and space!
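A minimal Python sketch of this forward recursion for the umbrella model (names are mine); it reproduces the ⟨0.818, 0.182⟩ and ⟨0.883, 0.117⟩ values from the worked example:

def forward(prior, observations,
            p_trans=(0.7, 0.3),    # P(Rt = true | Rt-1 = true / false)
            p_sensor=(0.9, 0.2)):  # P(Ut = true | Rt = true / false)
    belief = prior                 # (P(rain), P(not rain))
    for u in observations:
        # Predict: sum out the previous state.
        p_rain = p_trans[0] * belief[0] + p_trans[1] * belief[1]
        pred = (p_rain, 1.0 - p_rain)
        # Update with the evidence, then normalize.
        like = p_sensor if u else tuple(1.0 - p for p in p_sensor)
        unnorm = (like[0] * pred[0], like[1] * pred[1])
        z = unnorm[0] + unnorm[1]
        belief = (unnorm[0] / z, unnorm[1] / z)
    return belief

print(forward((0.5, 0.5), [True]))        # ≈ (0.818, 0.182)
print(forward((0.5, 0.5), [True, True]))  # ≈ (0.883, 0.117)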
Prediction
• Compute posterior distribution for future
state, given all evidence to date
• This is filtering without the addition of any
new evidence
Prediction
P(Xt+k+1 | e1:t ) = Σxt+k P(Xt+k+1 | xt+k ) P(xt+k | e1:t )
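Continuing the forward sketch above, prediction just runs the transition model beyond the last filtered belief, with no evidence update (illustrative code):

def predict(belief, k, p_trans=(0.7, 0.3)):
    # Push the current belief k steps into the future.
    for _ in range(k):
        p_rain = p_trans[0] * belief[0] + p_trans[1] * belief[1]
        belief = (p_rain, 1.0 - p_rain)
    return belief

# As k grows the prediction converges to the chain's stationary
# distribution (here ⟨0.5, 0.5⟩), regardless of the evidence so far.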
Smoothing
• Compute posterior over past state(s) given
evidence up to the present
• May allow you to improve the estimate you
made at the time, since you know now
what was then in the future
[Figure: same network; evidence U1 = true, U2 = true.]
P(R1 | u1 ) ≈ ⟨0.818, 0.182⟩
P(R2 | u1 , u2 ) ≈ ⟨0.883, 0.117⟩
P(R1 | u1 , u2 ) = α P(R1 | u1 ) P(u2 | R1 )
[Figure: same network; evidence U1 = true, U2 = true.]
P(R1 | u1 , u2 ) = α P(R1 | u1 ) P(u2 | R1 )
 ≈ ⟨0.883, 0.117⟩
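A quick numerical check of this smoothing step in Python, expanding P(u2 | R1 ) with the sensor and transition CPTs from the example:

# P(u2 | R1) = sum over r2 of P(u2 | r2) P(r2 | R1), for R1 = true / false:
b = (0.9 * 0.7 + 0.2 * 0.3,   # = 0.69
     0.9 * 0.3 + 0.2 * 0.7)   # = 0.41
f = (0.818, 0.182)            # P(R1 | u1), from filtering
unnorm = (f[0] * b[0], f[1] * b[1])
z = unnorm[0] + unnorm[1]
print(unnorm[0] / z, unnorm[1] / z)   # ≈ 0.883, 0.117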
Finding the Most Likely
Sequence
• Infer most likely sequence of states that
could have generated observations
• Without enumerating all possible
sequences of states and evaluating their
likelihood
[Figure: (a) possible rain sequences for Rain 1 ... Rain 5 under umbrella observations Ut = (true, true, false, true, true), with the most likely predecessor of each state marked; (b) the most-likely-path values m1:1 ... m1:5 :
Rain = true:   .8182  .5155  .0361  .0334  .0210
Rain = false:  .1818  .0491  .1237  .0173  .0024]
Viterbi Algorithm
Time complexity: O(t)
Space complexity: O(t)
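A Viterbi sketch for the umbrella model under the same assumed CPTs (illustrative, not the lecture’s code); it recovers the most likely rain sequence for the observations in the figure:

def viterbi(observations, prior=(0.5, 0.5),
            p_trans=(0.7, 0.3),     # P(Rt | Rt-1 = t / f)
            p_sensor=(0.9, 0.2)):   # P(Ut | Rt = t / f)
    def t_prob(prev_rain, rain):
        p = p_trans[0] if prev_rain else p_trans[1]
        return p if rain else 1.0 - p
    def s_prob(rain, umbrella):
        p = p_sensor[0] if rain else p_sensor[1]
        return p if umbrella else 1.0 - p

    # m[r]: probability of the best path ending in state r.
    m = {r: prior[0 if r else 1] * s_prob(r, observations[0])
         for r in (True, False)}
    back = []                       # back[i][r]: best predecessor of r
    for u in observations[1:]:
        prev = {r: max((True, False), key=lambda p: m[p] * t_prob(p, r))
                for r in (True, False)}
        m = {r: m[prev[r]] * t_prob(prev[r], r) * s_prob(r, u)
             for r in (True, False)}
        back.append(prev)

    # Follow backpointers from the best final state.
    path = [max((True, False), key=m.get)]
    for prev in reversed(back):
        path.append(prev[path[-1]])
    return list(reversed(path))

print(viterbi([True, True, False, True, True]))
# -> [True, True, False, True, True]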
Temporal Models
• Representation of state: Xt , Et
• Transition model: P(Xt | Xt−1 )
• Markov assumption, stationary process
• Sensor model: P(Et | Xt )
• Sensor Markov assumption
• Prior distribution at time 0: P(X0 )
Temporal Model
P(X0:t , E1:t ) = P(X0 ) ∏i=1..t P(Xi | Xi−1 ) P(Ei | Xi )
(P(X0 ): initial state model; P(Xi | Xi−1 ): transition model; P(Ei | Xi ): sensor model)
Inference
• Filtering (State Estimation): Compute current
belief state given all evidence to date
• Prediction: Compute posterior distribution
for future state, given all evidence to date
• Smoothing: Compute posterior over past
state(s) given evidence up to the present
• Most Likely Explanation: Infer most likely sequence of states that could have generated the observations
For Next Time:
AIMA 16.0-16.3, 16.5