CS 182 Sections 101 - 102
slides created by Eva Mok (emok@icsi.berkeley.edu)
modified by JGM, March 15, 2006

Announcements
• a5 is due Friday night at 11:59pm
• a6 is out tomorrow (2nd coding assignment), due the Monday after spring break
• Midterm solutions will be posted (soon)

Quick Recap
• This week
  – you just had the midterm
  – a bit more motor control
  – some belief nets and feature structures
• Coming up
  – Bailey's model of learning hand-action words

Your Task
• As far as the brain / thought / language is concerned, what is the single biggest mystery to you at this point?

Remember Recruitment Learning?
• One-shot learning
• The idea is that for things like words or grammar, kids learn at least something from a single input
• Granted, they might not get it completely right on the first shot
• But over time, their knowledge slowly converges to the right answer (i.e. they build a model that fits the data)

Model Merging
• Goal:
  – learn a model given data
• The model should:
  – explain the data well
  – be "simple"
  – be able to make generalizations

Naïve way to make a model
• create a special case for each piece of data
• of course gets the training data completely right
• cannot generalize at all when test data comes
• how to fix this: Model Merging
• "compact" the special cases into more descriptive rules without losing too much performance

Basic idea of Model Merging
• Start with the naïve model: one special case for each piece of data
• While performance increases:
  – Create a more general rule that explains some of the data
  – Discard the corresponding special cases

2 examples of Model Merging
• Bailey's VerbLearn system
  – model that maps actions to verb labels
  – performance: complexity of model + ability to explain data (MAP)
• Assignment 6 - Grammar Induction
  – model that maps sentences to grammar rules
  – performance: size of grammar + derivation length of sentences (cost)

Grammar
• Grammar: rules that govern what sentences are legal in a language
• e.g. Regular Grammar, Context-Free Grammar
• Production rules in a grammar have the form X → string of terminal and/or non-terminal symbols
• Terminal symbols: a, b, c, etc.
• Non-terminal symbols: S, A, B, X, etc.
• Different classes of grammar restrict where these symbols can go
• We'll see an example on the next slide

Right-Regular Grammar
• Right-Regular Grammar is a further restricted class of Regular Grammar
• Non-terminal symbols always appear at the right end of a rule
• e.g.:
  S → a b c X
  X → d e
  X → f
• valid sentences would be "abcde" and "abcf"

Grammar Induction
• As input data (e.g. "abcde", "abcf") comes in, we'd like to build up a grammar that explains the data
• We can certainly have one rule for each sentence we see in the data (naïve approach, no generalization)
• We'd rather "compact" the grammar
• In a6, you have two ways of doing this "compaction":
  – prefix merge
  – suffix merge

How do we find the model?
• prefix merge:
  S → a b c d e
  S → a b c f
  becomes
  S → a b c X
  X → d e
  X → f
• suffix merge:
  S → a b c d e
  S → f c d e
  becomes
  S → a b X
  S → f X
  X → c d e

Contrived Example
• Suppose you have these 3 grammar rules:
  r1: S → eat them here or there
  r2: S → eat them anywhere
  r3: S → like them anywhere or here or there
• 5 merging options:
  – prefix merge (r1, r2, 1)
  – prefix merge (r1, r2, 2)
  – suffix merge (r1, r3, 1)
  – suffix merge (r1, r3, 2)
  – suffix merge (r1, r3, 3)
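To make these merge options concrete, here is a minimal Python sketch of prefix merge and suffix merge, with rules represented as (left-hand side, list of right-hand-side symbols) pairs. This is only one way to read the slide, not the a6 starter code: the function names prefix_merge and suffix_merge, the new_nt argument, and the rule representation are all invented here for illustration.

def prefix_merge(r1, r2, length, new_nt):
    """Merge two rules that share their first `length` right-hand-side symbols.

    Returns one rule for the shared prefix (ending in the new non-terminal)
    plus one rule for each distinct remainder.
    """
    (lhs1, rhs1), (lhs2, rhs2) = r1, r2
    assert lhs1 == lhs2 and rhs1[:length] == rhs2[:length]
    return [
        (lhs1, rhs1[:length] + [new_nt]),  # S -> eat them X1
        (new_nt, rhs1[length:]),           # X1 -> here or there
        (new_nt, rhs2[length:]),           # X1 -> anywhere
    ]

def suffix_merge(r1, r2, length, new_nt):
    """Same idea, but the shared `length` symbols are at the end of both rules."""
    (lhs1, rhs1), (lhs2, rhs2) = r1, r2
    assert rhs1[-length:] == rhs2[-length:]
    return [
        (lhs1, rhs1[:-length] + [new_nt]),  # S -> eat them X2
        (lhs2, rhs2[:-length] + [new_nt]),  # S -> like them anywhere or X2
        (new_nt, rhs1[-length:]),           # X2 -> here or there
    ]

# For example, "prefix merge (r1, r2, 2)" from the slide above:
r1 = ("S", "eat them here or there".split())
r2 = ("S", "eat them anywhere".split())
print(prefix_merge(r1, r2, 2, "X1"))
# [('S', ['eat', 'them', 'X1']), ('X1', ['here', 'or', 'there']), ('X1', ['anywhere'])]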
Computationally
• Kids aren't presented all the data at once
• Instead they'll hear these sentences one by one:
  1. eat them here or there
  2. eat them anywhere
  3. like them anywhere or here or there
• As each sentence (i.e. piece of data) comes in, you create one rule for it, e.g.
  S → eat them here or there
• Then you look for ways to merge as more sentences come in

Example 1: just prefix merge
• After the first two sentences are presented, we can already do a prefix merge of length 2:
  r1: S → eat them here or there
  r2: S → eat them anywhere
  become
  r3: S → eat them X1
  r4: X1 → here or there
  r5: X1 → anywhere

Example 2: just suffix merge
• After the first three sentences are presented, we can do a suffix merge of length 3:
  r1: S → eat them here or there
  r2: S → eat them anywhere
  r3: S → like them anywhere or here or there
  r1 and r3 become
  r4: S → eat them X2
  r5: S → like them anywhere or X2
  r6: X2 → here or there

Your Task in a6
• pull in sentences one by one
• monitor your sentences
• do either a prefix merge or a suffix merge as soon as it's "good" to do so

How do we know if a model is good?
• we want a small grammar
• but we also want it to explain the data well
• minimize the cost along the way:
  c(G) = α·s(G) + d(G, D)
  where s(G) is the size of the grammar, d(G, D) is the derivation length of the sentences, and α is a learning factor to play with

Back to Example 2
• Remember your data is:
  1. eat them here or there
  2. eat them anywhere
  3. like them anywhere or here or there
• Your original grammar:
  r1: S → eat them here or there
  r2: S → eat them anywhere
  r3: S → like them anywhere or here or there
• size of grammar = 15
• derivation length of sentences = 1 + 1 + 1 = 3
• c(G) = α·s(G) + d(G, D) = α·15 + 3

Back to Example 2
• Remember your data is:
  1. eat them here or there
  2. eat them anywhere
  3. like them anywhere or here or there
• Your new grammar:
  r2: S → eat them anywhere
  r4: S → eat them X2
  r5: S → like them anywhere or X2
  r6: X2 → here or there
• size of grammar = 14
• derivation length of sentences = 2 + 1 + 2 = 5
• c(G) = α·s(G) + d(G, D) = α·14 + 5
• so in fact you SHOULDN'T merge if α ≤ 2
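As a quick sanity check on the numbers above, here is a small Python sketch of the cost c(G) = α·s(G) + d(G, D), with grammar size counted as the total number of right-hand-side symbols and derivation length as the number of rules used to derive each sentence. The helper names and the rule representation are assumptions made for illustration; this is not the a6 grading code.

def grammar_size(rules):
    # rules is a list of (lhs, rhs) pairs, where rhs is a list of symbols;
    # size counts right-hand-side symbols only (5 + 3 + 7 = 15 originally)
    return sum(len(rhs) for _, rhs in rules)

def cost(rules, derivation_lengths, alpha):
    # c(G) = alpha * s(G) + d(G, D)
    return alpha * grammar_size(rules) + sum(derivation_lengths)

old = [("S", "eat them here or there".split()),
       ("S", "eat them anywhere".split()),
       ("S", "like them anywhere or here or there".split())]

new = [("S", "eat them anywhere".split()),
       ("S", "eat them X2".split()),
       ("S", "like them anywhere or X2".split()),
       ("X2", "here or there".split())]

for alpha in (1, 2, 3):
    print(alpha, cost(old, [1, 1, 1], alpha), cost(new, [2, 1, 2], alpha))
# alpha = 1: old 18, new 19  (merging makes the cost worse)
# alpha = 2: old 33, new 33  (a tie)
# alpha = 3: old 48, new 47  (merging pays off, consistent with "don't merge if alpha <= 2")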