CS 182
Sections 101 - 102
slides created by
Eva Mok (emok@icsi.berkeley.edu)
modified by JGM
March 15 2006
Announcements
• a5 is due Friday night at 11:59pm
• a6 is out tomorrow (2nd coding assignment), due
the Monday after spring break
• Midterm solutions will be posted soon
Quick Recap
• This Week
– you just had the midterm
– a bit more motor control
– some belief nets and feature structures
• Coming up
– Bailey’s Model of learning hand action words
Your Task:
As far as the brain / thought / language is
concerned, what is the single biggest mystery
to you at this point?
Remember Recruitment Learning?
• One-shot learning
• The idea: for things like words or grammar, kids learn
at least something from a single input
• Granted, they might not get it completely right on the
first try
• But over time, their knowledge slowly converges to
the right answer (i.e. they build a model that fits the data)
Model Merging
• Goal:
– learn a model given data
• The model should:
– explain the data well
– be "simple"
– be able to make generalizations
Naïve way to make a model
• create a special case for each piece of data
• of course, this gets the training data completely right
• but it cannot generalize at all when test data comes
• how to fix this: Model Merging
• "compact" the special cases into more descriptive
rules without losing too much performance
Basic idea of Model Merging
• Start with the naïve model: one special case for each
piece of data
• While performance increases
– Create a more general rule that explains some of
the data
– Discard the corresponding special cases
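• In code, this loop might look like the sketch below (a minimal Python
sketch, not the a6 starter code; propose_merges and cost are hypothetical
helpers standing in for whatever merge operator and scoring you use):

def model_merge(data, propose_merges, cost):
    # naive model: one special-case rule per piece of data
    model = [("S", tuple(d)) for d in data]
    while True:
        # hypothetical helper: every model reachable by one merge
        candidates = propose_merges(model)
        best = min(candidates, key=cost, default=None)
        # stop once no merge improves performance (i.e. lowers cost)
        if best is None or cost(best) >= cost(model):
            return model
        # adopt the merged model; its special cases are discarded
        model = best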
2 examples of Model Merging
• Bailey’s VerbLearn system
– model that maps actions to verb labels
– performance:
complexity of model + ability to explain data → MAP (maximum a posteriori)
• Assignment 6 - Grammar Induction
– model that maps sentences to grammar rules
– performance:
size of grammar + derivation length of sentences → cost
Grammar
• Grammar: rules that govern which sentences are legal
in a language
• e.g. Regular Grammar, Context Free Grammar
• Production rules in a grammar have the form
left-hand side → right-hand side
where each side is a string of terminal and/or non-terminal symbols
• Terminal symbols: a, b, c, etc
• Non-terminal symbols: S, A, B, X, etc
• Different classes of grammar restrict where these
symbols can go
• We’ll see an example on the next page
Right-Regular Grammar
• Right-Regular Grammar is a further restricted class of
Regular Grammar
• A non-terminal symbol may appear only as the last symbol of a rule’s right-hand side
• e.g:
S → a b c X
X → d e
X → f
• valid sentences would be "abcde" and "abcf"
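• The grammar above is small enough to test directly; here is a quick
Python sketch (illustrative only, not part of a6; the dict encoding of
rules is an assumption):

# the example grammar: each non-terminal maps to its right-hand sides
grammar = {
    "S": [("a", "b", "c", "X")],
    "X": [("d", "e"), ("f",)],
}

def generates(symbol, s):
    # True if non-terminal `symbol` derives the string `s`
    for rhs in grammar[symbol]:
        if rhs[-1] in grammar:  # rule ends in a non-terminal
            prefix, tail = rhs[:-1], rhs[-1]
            if tuple(s[:len(prefix)]) == prefix and generates(tail, s[len(prefix):]):
                return True
        elif tuple(s) == rhs:   # all-terminal rule
            return True
    return False

assert generates("S", "abcde") and generates("S", "abcf")
assert not generates("S", "abcd")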
Grammar Induction
• As input data (e.g. “abcde”, “abcf”) comes in, we’d
like to build up a grammar that explains the data
• We can certainly have one rule for each sentence we
see in the data → naïve approach, no generalization
• We’d rather “compact” the grammar
• In a6, you have two ways of doing this “compaction”
– prefix merge
– suffix merge
How do we find the model?
• prefix merge:
S → abcde
S → abcf
becomes
S → abcX
X → de
X → f
• suffix merge:
S → abcde
S → fcde
becomes
S → abX
S → fX
X → cde
Contrived Example
• Suppose you have these 3 grammar rules:
r1: S → eat them here or there
r2: S → eat them anywhere
r3: S → like them anywhere or here or there
• 5 merging options:
– prefix merge (r1, r2, 1)
– prefix merge (r1, r2, 2)
– suffix merge (r1, r3, 1)
– suffix merge (r1, r3, 2)
– suffix merge (r1, r3, 3)
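• You can recover these 5 options mechanically by counting shared
prefix/suffix symbols between rule pairs (a sketch; capping the length
so neither rule’s remainder is empty is an assumption about what counts
as a legal merge):

r1 = "eat them here or there".split()
r2 = "eat them anywhere".split()
r3 = "like them anywhere or here or there".split()

def shared_prefix_lengths(a, b):
    # lengths n with a[:n] == b[:n], capped so neither remainder is empty
    n = 0
    while n < min(len(a), len(b)) - 1 and a[n] == b[n]:
        n += 1
    return list(range(1, n + 1))

def shared_suffix_lengths(a, b):
    return shared_prefix_lengths(a[::-1], b[::-1])

print(shared_prefix_lengths(r1, r2))  # [1, 2]    → prefix merge (r1, r2, 1..2)
print(shared_suffix_lengths(r1, r3))  # [1, 2, 3] → suffix merge (r1, r3, 1..3)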
Computationally
• Kids aren’t presented all the data at once
• Instead they’ll hear these sentences one by one:
1. eat them here or there
2. eat them anywhere
3. like them anywhere or here or there
• As each sentence (i.e. each data point) comes in, you create one
rule for it, e.g.
S → eat them here or there
• Then you look for ways to merge as more sentences come in
Example 1: just prefix merge
• After the first two sentences are presented, we can already
do a prefix merge of length 2:
r1: S → eat them here or there
r2: S → eat them anywhere
r3: S → eat them X1
r4: X1 → here or there
r5: X1 → anywhere
Example 2: just suffix merge
• After the first three sentences are presented, we can do a
suffix merge of length 3:
r1: S → eat them here or there
r2: S → eat them anywhere
r3: S → like them anywhere or here or there
r4: S → eat them X2
r5: S → like them anywhere or X2
r6: X2 → here or there
Your Task in a6
• pull in sentences one by one
• monitor your sentences
• do either a prefix merge or a suffix merge as soon as
it’s “good” to do so
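• Put together, the incremental loop might look like this (a sketch in
the same style as before; propose_merges and cost are hypothetical
helpers, and a6’s actual interface may differ):

def induce(sentences, propose_merges, cost):
    grammar = []
    for sentence in sentences:
        # one new rule per incoming sentence
        grammar.append(("S", tuple(sentence.split())))
        # merge greedily for as long as it is "good" (lowers the cost)
        while True:
            candidates = propose_merges(grammar)
            best = min(candidates, key=cost, default=None)
            if best is None or cost(best) >= cost(grammar):
                break
            grammar = best
    return grammar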
How do we know if a model is good?
• want a small grammar
• but want it to explain the data well
• minimize the cost along the way:
c(G) = α · s(G) + d(G,D)
– s(G): size of grammar
– d(G,D): derivation length of sentences
– α: learning factor to play with
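• As code, the cost might be computed like this (a sketch; passing the
per-sentence derivation lengths in directly, rather than deriving them
from the grammar, is a simplification):

def size(grammar):
    # s(G): total number of symbols on the right-hand sides
    return sum(len(rhs) for _, rhs in grammar)

def cost(grammar, derivation_lengths, alpha):
    # c(G) = alpha * s(G) + d(G, D)
    return alpha * size(grammar) + sum(derivation_lengths)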
Back to Example 2
• Remember your data is:
1. eat them here or there
2. eat them anywhere
3. like them anywhere or here or there
• Your original grammar:
r1: S → eat them here or there
r2: S → eat them anywhere
r3: S → like them anywhere or here or there
• size of grammar: s(G) = 15
• derivation length of sentences: d(G,D) = 1 + 1 + 1 = 3
• c(G) = α · s(G) + d(G,D) = α · 15 + 3
Back to Example 2
• Remember your data is:
1. eat them here or there
2. eat them anywhere
3. like them anywhere or here or there
• Your new grammar:
r2: S → eat them anywhere
r4: S → eat them X2
r5: S → like them anywhere or X2
r6: X2 → here or there
• size of grammar: s(G) = 14
• derivation length of sentences: d(G,D) = 2 + 1 + 2 = 5
• c(G) = α · s(G) + d(G,D) = α · 14 + 5
• so in fact you SHOULDN’T merge if α ≤ 2
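• Checking the arithmetic: the merge wins exactly when α · 14 + 5 < α · 15 + 3,
which simplifies to α > 2:

for alpha in (1, 2, 3):
    old, new = alpha * 15 + 3, alpha * 14 + 5
    print(alpha, old, new, "merge" if new < old else "don't merge")
# 1 18 19 don't merge
# 2 33 33 don't merge
# 3 48 47 merge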