Course Review (post midterm topics)

CS 188: Artificial Intelligence
Spring 2007
Lecture 29: Post-midterm course
review
5/8/2007
Srini Narayanan – ICSI and UC Berkeley
Final Exam
 8:10 to 11 AM on 5/15/2007 at 50 BIRGE
 Final prep page up
 Includes all topics (see page).
 Weighted toward post midterm topics.
 Two double-sided cheat sheets are allowed, as is a calculator.
 Final exam review Thursday 4 PM Soda
306.
Utility-Based Agents
Today
 Review of post midterm topics relevant for the
final.
 Reasoning about time
 Markov Models
 HMM forward algorithm, Viterbi algorithm.
 Classification
 Naïve Bayes, Perceptron
 Reinforcement Learning
 MDP, Value Iteration, Policy iteration
 TD-value learning, Q-learning,
 Advanced topics
 Applications to NLP
Questions
 What is the basic conditional independence assertion for Markov models?
 What is a problem with Markov Models for
prediction into the future?
 What are the basic CI assertions for HMM?
 How do inference algorithms exploit the CI assertions?
 Forward Algorithm
 Viterbi algorithm.
Markov Models
 A Markov model is a chain-structured BN
 Each node is identically distributed (stationarity)
 Value of X at a given time is called the state
 As a BN:
[Chain BN: X1 → X2 → X3 → X4]
 Parameters: called transition probabilities or
dynamics, specify how the state evolves over time
(also, initial probs)
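For concreteness, a minimal Python sketch of a two-state weather chain; the transition numbers are made up for illustration and are not from the slides.

```python
# Illustrative two-state weather Markov chain (hypothetical numbers).
transitions = {
    "sun":  {"sun": 0.9, "rain": 0.1},   # P(X_t | X_{t-1} = sun)
    "rain": {"sun": 0.3, "rain": 0.7},   # P(X_t | X_{t-1} = rain)
}

def step(dist):
    """Push a distribution over states one step forward through the dynamics."""
    return {s2: sum(dist[s1] * transitions[s1][s2] for s1 in dist)
            for s2 in transitions}

print(step({"sun": 1.0, "rain": 0.0}))   # P(X2) after observing sun at t = 1
```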
Conditional Independence
[Chain BN: X1 → X2 → X3 → X4]
 Basic conditional independence:
 Past and future independent given the present
 Each time step only depends on the previous
 This is called the (first order) Markov property
 Note that the chain is just a (growing) BN
 We can always use generic BN reasoning on it (if we truncate the chain)
Example
 From initial state (observation of sun): distributions P(X1), P(X2), P(X3), …, P(X∞)
 From initial state (observation of rain): distributions P(X1), P(X2), P(X3), …, P(X∞)
Hidden Markov Models
 Markov chains not so useful for most agents
 Eventually you don’t know anything anymore
 Need observations to update your beliefs
 Hidden Markov models (HMMs)
 Underlying Markov chain over states S
 You observe outputs (effects) at each time step
 As a Bayes’ net:
[HMM: hidden chain X1 → X2 → X3 → X4 → X5, each Xt emitting evidence Et]
Example
 An HMM is defined by:
 Initial distribution: P(X1)
 Transitions: P(Xt | Xt-1)
 Emissions: P(Et | Xt)
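A minimal sketch of how these three pieces might be written down in Python; the states, observations, and probabilities are illustrative only.

```python
# Hypothetical HMM parameters: weather states, "umbrella" observations.
initial    = {"sun": 0.5, "rain": 0.5}                        # P(X1)
transition = {"sun":  {"sun": 0.9, "rain": 0.1},              # P(X_t | X_{t-1})
              "rain": {"sun": 0.3, "rain": 0.7}}
emission   = {"sun":  {"umbrella": 0.1, "no_umbrella": 0.9},  # P(E_t | X_t)
              "rain": {"umbrella": 0.8, "no_umbrella": 0.2}}
```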
Conditional Independence
 HMMs have two important independence properties:
 Markov hidden process, future depends on past via the present
 Current observation independent of all else given current state
[HMM: hidden chain X1 → X2 → X3 → X4 → X5 with emissions E1 … E5]
 Quiz: does this mean that observations are independent
given no evidence?
 [No, correlated by the hidden state]
Forward Algorithm
 Can ask the same questions for HMMs as Markov chains
 Given current belief state, how to update with evidence?
 This is called monitoring or filtering
 Formally, we want: P(X_t | e_1:t), the belief over the current state given all evidence so far
[HMM: hidden chain X1 → X2 → X3 → X4 → X5 with emissions E1 … E5]
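A possible implementation sketch of the forward (filtering) update, with the same illustrative HMM parameters repeated so the snippet is self-contained.

```python
# Forward algorithm sketch: belief starts as P(X1); fold in e1, then
# alternate a predict step (push through the dynamics) and an update step
# (weight by the emission likelihood and renormalize). Parameters are illustrative.
initial    = {"sun": 0.5, "rain": 0.5}
transition = {"sun":  {"sun": 0.9, "rain": 0.1},
              "rain": {"sun": 0.3, "rain": 0.7}}
emission   = {"sun":  {"umbrella": 0.1, "no_umbrella": 0.9},
              "rain": {"umbrella": 0.8, "no_umbrella": 0.2}}

def forward(evidence):
    belief = None
    for t, e in enumerate(evidence):
        if t == 0:
            predicted = dict(initial)                      # P(X1)
        else:
            predicted = {s2: sum(belief[s1] * transition[s1][s2] for s1 in belief)
                         for s2 in transition}             # predict
        unnorm = {s: predicted[s] * emission[s][e] for s in predicted}
        z = sum(unnorm.values())
        belief = {s: p / z for s, p in unnorm.items()}     # update + normalize
    return belief

print(forward(["umbrella", "umbrella", "no_umbrella"]))
```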
Viterbi Algorithm
 Question: what is the most likely state sequence given
the observations?
 Slow answer: enumerate all possibilities
 Better answer: cached incremental version
[HMM: hidden chain X1 → X2 → X3 → X4 → X5 with emissions E1 … E5]
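A sketch of Viterbi with the same illustrative parameters: instead of enumerating all sequences, it caches the best score per state plus back-pointers.

```python
initial    = {"sun": 0.5, "rain": 0.5}
transition = {"sun":  {"sun": 0.9, "rain": 0.1},
              "rain": {"sun": 0.3, "rain": 0.7}}
emission   = {"sun":  {"umbrella": 0.1, "no_umbrella": 0.9},
              "rain": {"umbrella": 0.8, "no_umbrella": 0.2}}

def viterbi(evidence):
    states = list(initial)
    # best[s]: probability of the best path ending in s; back: predecessor pointers.
    best = {s: initial[s] * emission[s][evidence[0]] for s in states}
    back = []
    for e in evidence[1:]:
        pointers, new_best = {}, {}
        for s2 in states:
            prev, score = max(((s1, best[s1] * transition[s1][s2]) for s1 in states),
                              key=lambda pair: pair[1])
            pointers[s2] = prev
            new_best[s2] = score * emission[s2][e]
        back.append(pointers)
        best = new_best
    # Follow back-pointers from the best final state.
    path = [max(best, key=best.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))

print(viterbi(["umbrella", "umbrella", "no_umbrella"]))
```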
Classification
 Supervised Models
 Generative Models
 Naïve Bayes
 Discriminative Models
 Perceptron
 Unsupervised Models
 K-means
 Agglomerative clustering
Parameter estimation
 What are the parameters for Naïve
Bayes?
 What is Maximum Likelihood estimation
for NB?
 What are the problems with ML estimates?
General Naïve Bayes
 A general naive Bayes model:
 [Diagram: class node C with children E1, E2, …, En]
 For contrast, the full joint over (C, E1, …, En) would need |C| × |E|^n parameters
 Prior P(C): |C| parameters
 Conditionals P(Ei | C): n × |E| × |C| parameters
 We only specify how each feature depends on the class
 Total number of parameters is linear in n
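As an illustration, a tiny naive Bayes classifier over hypothetical binary spam features; note that only P(C) and the per-feature conditionals are stored.

```python
# Hypothetical hand-set parameters, just to show the model structure.
import math

prior = {"spam": 0.4, "ham": 0.6}                          # P(C): |C| numbers
cond = {                                                   # P(F_i = 1 | C)
    "spam": {"free": 0.7, "money": 0.6, "meeting": 0.1},
    "ham":  {"free": 0.1, "money": 0.1, "meeting": 0.5},
}

def predict(features):
    """features maps each feature name to 0/1; returns the most probable class."""
    scores = {}
    for c in prior:
        logp = math.log(prior[c])
        for f, value in features.items():
            p = cond[c][f] if value else 1.0 - cond[c][f]
            logp += math.log(p)
        scores[c] = logp
    return max(scores, key=scores.get)

print(predict({"free": 1, "money": 1, "meeting": 0}))
```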
Estimation: Smoothing
 Problems with maximum likelihood (relative frequency)
estimates:
 If I flip a coin once, and it’s heads, what’s the estimate for
P(heads)?
 What if I flip 10 times with 8 heads?
 What if I flip 10M times with 8M heads?
 Basic idea:
 We have some prior expectation about parameters (here, the
probability of heads)
 Given little evidence, we should skew towards our prior
 Given a lot of evidence, we should listen to the data
Estimation: Laplace Smoothing
 Laplace’s estimate (extended):
 Pretend you saw every outcome k extra times:
P_LAP,k(x) = (count(x) + k) / (N + k·|X|)
 What's Laplace with k = 0?
 k is the strength of the prior
 Laplace for conditionals:
 Smooth each condition independently:
P_LAP,k(x | y) = (count(x, y) + k) / (count(y) + k·|X|)
 Example data from the slide: coin flips H, H, T
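A small sketch of the Laplace estimate applied to the coin example; laplace_estimate is a hypothetical helper, not a standard library function.

```python
from collections import Counter

def laplace_estimate(observations, outcomes, k=1):
    """Add k pseudo-counts for every possible outcome, then normalize."""
    counts = Counter(observations)
    n = len(observations)
    return {x: (counts[x] + k) / (n + k * len(outcomes)) for x in outcomes}

# Coin flips H, H, T from the slide.
print(laplace_estimate(["H", "H", "T"], outcomes=["H", "T"], k=1))   # {'H': 0.6, 'T': 0.4}
print(laplace_estimate(["H", "H", "T"], outcomes=["H", "T"], k=0))   # maximum likelihood
```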
Types of Supervised classifiers
 Generative Models
 Naïve Bayes
 Discriminative Models
 Perceptron
Questions
 What is a binary threshold perceptron?
 How can we make a multi-class
perceptron?
 What sorts of patterns can perceptrons classify correctly?
The Binary Perceptron
 Inputs are features
 Each feature has a weight
 The weighted sum is the activation: Σi wi · fi
 If the activation is:
 Positive, output 1
 Negative, output 0
[Diagram: features f1, f2, f3 weighted by w1, w2, w3, summed (Σ), then thresholded (>0?)]
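A minimal sketch of the binary perceptron decision rule with hypothetical weights and features.

```python
def perceptron_classify(weights, features):
    """Weighted sum of feature values, thresholded at zero."""
    activation = sum(weights[f] * value for f, value in features.items())
    return 1 if activation > 0 else 0

# Hypothetical weights and feature values for illustration.
w = {"bias": -1.0, "free": 2.0, "meeting": -1.5}
x = {"bias": 1.0, "free": 1.0, "meeting": 0.0}
print(perceptron_classify(w, x))   # activation = 1.0 > 0, so output 1
```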
The Multiclass Perceptron
 If we have more than
two classes:
 Have a weight vector for
each class
 Calculate an activation for
each class
 Highest activation wins
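A corresponding multiclass sketch (hypothetical weights): one weight vector per class, highest activation wins.

```python
def multiclass_classify(weights_by_class, features):
    def activation(w):
        return sum(w.get(f, 0.0) * v for f, v in features.items())
    return max(weights_by_class, key=lambda c: activation(weights_by_class[c]))

weights = {"spam": {"free": 2.0, "meeting": -1.0},
           "ham":  {"free": -1.0, "meeting": 2.0}}
print(multiclass_classify(weights, {"free": 1.0, "meeting": 0.0}))   # "spam"
```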
Linear Separators
 Binary classification can be viewed as the task
of separating classes in feature space:
[Decision boundary w · x = 0; points with w · x > 0 on one side, w · x < 0 on the other]
Feature design
 Can we design features f1 and f2 to use a perceptron to separate the two classes?
MDP and Reinforcement Learning
 What is an MDP (Basics) ?
 What is Bellman’s equation and how is it
used in value iteration?
 What is reinforcement learning?
 TD-value learning
 Q learning
 Exploration vs. exploitation
Markov Decision Processes
 Markov decision processes (MDPs)
 A set of states s ∈ S
 A model T(s,a,s’) = P(s’ | s,a)
 Probability that action a in state s
leads to s’
 A reward function R(s, a, s’)
(sometimes just R(s) for leaving a
state or R(s’) for entering one)
 A start state (or distribution)
 Maybe a terminal state
 MDPs are the simplest case of
reinforcement learning
 In general reinforcement learning, we
don’t know the model or the reward
function
Bellman’s Equation for Selecting
actions
 Definition of utility leads to a simple relationship
amongst optimal utility values:
Optimal rewards = maximize over first action and then
follow optimal policy
Formally, Bellman's equation: V*(s) = max_a Σ_s' T(s, a, s') [ R(s, a, s') + γ V*(s') ]
That’s my
equation!
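A sketch of value iteration applying this Bellman update to a small made-up MDP; the states, dynamics, and rewards below are illustrative, not from the lecture.

```python
# Hypothetical MDP: T[s][a] is a list of (next_state, probability);
# R gives the reward for each (s, a, s') transition.
GAMMA = 0.9
T = {
    "cool": {"slow": [("cool", 1.0)],
             "fast": [("cool", 0.5), ("warm", 0.5)]},
    "warm": {"slow": [("cool", 0.5), ("warm", 0.5)],
             "fast": [("overheated", 1.0)]},
    "overheated": {},                      # terminal: no actions
}
R = {("cool", "slow", "cool"): 1.0,
     ("cool", "fast", "cool"): 2.0, ("cool", "fast", "warm"): 2.0,
     ("warm", "slow", "cool"): 1.0, ("warm", "slow", "warm"): 1.0,
     ("warm", "fast", "overheated"): -10.0}

def value_iteration(iterations=100):
    V = {s: 0.0 for s in T}
    for _ in range(iterations):
        # Bellman update: max over actions of expected reward plus discounted value.
        V = {s: max((sum(p * (R[(s, a, s2)] + GAMMA * V[s2]) for s2, p in outcomes)
                     for a, outcomes in T[s].items()), default=0.0)
             for s in T}
    return V

print(value_iteration())
```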
Elements of RL
[Diagram: the agent (state + policy) sends actions to the environment; the environment returns the next state and a reward]
Trajectory: s0 —a0 : r0→ s1 —a1 : r1→ s2 —a2 : r2→ …
 Transition model, how actions influence states
 Reward R, immediate value of a state-action transition
 Policy π, maps states to actions
MDPs
 Which of the following are true?
[Options A through E shown on the slide]
Reinforcement Learning
 What’s wrong with the following agents?
Model-Free Learning
 Big idea: why bother learning T?
 Update each time we experience a transition
 Frequent outcomes will contribute more updates (over time)
 Temporal difference learning (TD)
 Policy still fixed!
 Move values toward the value of whatever successor occurs:
V(s) ← (1 − α) V(s) + α [ R(s, π(s), s') + γ V(s') ]
 [Diagram: transition from s via action a through (s, a) to successor s']
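A sketch of the TD(0) update on a hypothetical experience stream; the α and γ values are arbitrary.

```python
ALPHA, GAMMA = 0.1, 0.9

def td_update(V, s, r, s_next):
    """After experiencing (s, r, s'), move V(s) toward the sample r + gamma * V(s')."""
    sample = r + GAMMA * V.get(s_next, 0.0)
    V[s] = (1 - ALPHA) * V.get(s, 0.0) + ALPHA * sample
    return V

V = {}
# Hypothetical experience: (state, reward, next_state) triples under a fixed policy.
for s, r, s_next in [("A", 1.0, "B"), ("B", 0.0, "A"), ("A", 1.0, "B")]:
    td_update(V, s, r, s_next)
print(V)
```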
Problems with TD Value Learning
 TD value learning is model-free for policy evaluation
 However, if we want to turn our value estimates into a policy, we're sunk:
π(s) = argmax_a Σ_s' T(s, a, s') [ R(s, a, s') + γ V(s') ] still needs the model
 Idea: learn values of state-action pairs (Q-values) directly
 Makes action selection model-free too!
 [Diagram: s → a → (s, a) → s']
Q-Learning
 Learn Q*(s,a) values
 Receive a sample (s, a, s', r) (select a using ε-greedy)
 Consider your old estimate: Q(s, a)
 Consider your new sample estimate: r + γ max_a' Q(s', a')
 Nudge the old estimate towards the new sample:
Q(s, a) ← (1 − α) Q(s, a) + α [ r + γ max_a' Q(s', a') ]
 Set s = s' until s is terminal
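A sketch of the Q-learning update and ε-greedy action selection; the action set, learning rate, and sample below are hypothetical.

```python
import random

ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1
ACTIONS = ["left", "right"]

def q_update(Q, s, a, r, s_next):
    """Nudge Q(s, a) toward the sample r + gamma * max_a' Q(s', a')."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in ACTIONS)
    sample = r + GAMMA * best_next
    Q[(s, a)] = (1 - ALPHA) * Q.get((s, a), 0.0) + ALPHA * sample

def epsilon_greedy(Q, s):
    if random.random() < EPSILON:
        return random.choice(ACTIONS)                            # explore
    return max(ACTIONS, key=lambda a: Q.get((s, a), 0.0))        # exploit

Q = {}
q_update(Q, "A", "right", 1.0, "B")
print(Q, epsilon_greedy(Q, "A"))
```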
Applications to NLP
 How can generative models play a role in
MT, Speech, NLP?
 List three kinds of ambiguities often found in language.
NLP applications of
Bayes' rule!!
 Handwriting recognition
 P(text | strokes) ∝ P(text) · P(strokes | text)
 Spelling correction
 P(text | typos) ∝ P(text) · P(typos | text)
 OCR
 P(text | image) ∝ P(text) · P(image | text)
 MT
 P(english | french) ∝ P(english) · P(french | english)
 Speech recognition
 P(words | sound) ∝ P(words) · P(sound | words), where P(words) is the language model (LM)
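A toy sketch of this noisy-channel idea for spelling correction; the candidate words and probabilities are placeholders, not real estimates.

```python
# Rank candidate corrections of the typo "teh" by prior * likelihood.
candidates = {
    # candidate: (prior P(text), likelihood P("teh" | text)); made-up values.
    "the":  (0.05,  0.9),
    "ten":  (0.002, 0.1),
    "tech": (0.001, 0.2),
}

best = max(candidates, key=lambda w: candidates[w][0] * candidates[w][1])
print(best)   # "the": highest product of prior and likelihood
```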
Ambiguities
 Headlines:
 Iraqi Head Seeks Arms
 Ban on Nude Dancing on Governor’s Desk
 Juvenile Court to Try Shooting Defendant
 Teacher Strikes Idle Kids
 Stolen Painting Found by Tree
 Kids Make Nutritious Snacks
 Local HS Dropouts Cut in Half
 Hospitals Are Sued by 7 Foot Doctors
 Why are these funny?
Learning
I hear and I forget
I see and I remember
I do and I understand
attributed to Confucius, 551–479 B.C.
Thanks!
And good luck on the final and for the future!
Srini Narayanan
snarayan@icsi.berkeley.edu
Phase II: Update Means
 Move each mean to the average of its assigned points: ck = (1 / |Ck|) Σ_{x ∈ Ck} x
 Also can only decrease total
distance… (Why?)
 Fun fact: the point y with
minimum squared Euclidean
distance to a set of points {x}
is their mean
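A sketch of this update step; the points and assignments below are hypothetical, and empty clusters are not handled for brevity.

```python
def update_means(points, assignments, k):
    """points: list of (x, y); assignments: cluster index per point.
    Returns each cluster's mean (the average of its assigned points)."""
    means = []
    for c in range(k):
        cluster = [p for p, a in zip(points, assignments) if a == c]
        means.append(tuple(sum(coord) / len(cluster) for coord in zip(*cluster)))
    return means

pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 12.0)]
print(update_means(pts, [0, 0, 1, 1], k=2))   # [(0.5, 0.0), (10.5, 11.0)]
```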