Introduction to Language Acquisition Theory
Janet Dean Fodor
St. Petersburg, July 2013
Class 2. From computer science (then) to psycholinguistics (now)

Syntax acquisition as parameter setting
Like playing "20 Questions". The learner's task is to detect the correct settings of the finite number of parameters.
› Headedness parameter: Are syntactic phrases head-initial (e.g., in VP, the verb precedes its object) or head-final (the verb follows the object)?
› Wh-movement parameter: Does a wh-phrase move to the top of a clause, or does it remain in situ?
Parameter values are 'triggered' by the learner's encountering a distinctive, revealing property of an input sentence.
This Principles-and-Parameters approach has been retained through many subsequent changes in TG theory. It greatly reduces a learner's workload of data-processing. It helps address the Poverty of the Stimulus problem.

Parameter setting as flipping switches
Chomsky never provided a specific implementation of parametric triggering. He often employed a metaphor of setting switches. (Chomsky 1981/1986)
The metaphor suggests that parameter setting is:
› Automatic, instantaneous, effortless: no linguistic reasoning is required of the learner. (Unlike hypothesis-formation models.)
› Input-guided (no trial-and-error process).
› A universal mechanism, but leading reliably to language-specific parameter settings.
› Non-interacting parameters: each can be set separately.
› Each has unambiguous triggers, recognizable regardless of what else the learner does or doesn't know about the language.
› Deterministic learning: fully accurate, so no revision is ever needed.
A wonderful advance if true – if psychologically feasible!

But computational linguists couldn't implement it (parameters yes; triggering no)
Syntacticians largely embraced this neat picture. But as a mechanism, triggering was never implemented. Computational linguists deemed it unfeasible.
Due to the ambiguity and opacity of would-be triggers in the natural-language domain (Clark, 1989). Examples on the next slide.
Only the concept of parameterization was retained: language acquisition is selection of a grammar from a finite set, which is defined by UG (innate principles + innate parametric choices).
The learning process was modeled as a trial-and-error search through the domain of all possible grammars, applying familiar domain-general learning algorithms from computer science. No input guidance toward the correct grammar: input serves only as feedback on hypotheses selected partly at random.

Why doesn't instant triggering work?
Input ambiguity: e.g., Exceptional Case Marking (Clark 1989).
› "We consider him to be clever." Does ECM or the infinitive assign accusative case?
› "I consider myself to be clever." Long-distance anaphora?
Derivational opacity: e.g., surface Adv P not Verb Subj entails -NullSubject. Why?! Because a P with no object must be due to object-topicalization, then topic-drop; and +NullTopic entails -NullSubject.
Conclusion: It's impossible or impractical to recognize the parameter values from the surface sentence. Learners have to guess. (Counter-argument in Classes 6 & 7.)
Also, classic triggering mis-predicts child data (Yang 2002): children's grammar changes are gradual; they must be contemplating two or more (many?) grammars simultaneously.

Trial-and-error domain-search methods: under-powered or over-resourced
› Genetic Algorithm. Clark & Roberts (1993). Test many grammars, each on many sentences; rank them, breed them, repeat, repeat. (Over-resourced)
› Triggering Learning Algorithm. Gibson & Wexler (1994). Test one grammar at a time, on one sentence. If it fails, change one parameter at random. (Under-powered; fails often, slow) See next slide.
› Variational Model. Yang (2000). Give the TLA a memory for the success rate of each parameter value. Test one grammar at a time, but sample the whole domain.
› Bayesian Learner. Perfors, Tenenbaum & Regier (2006). Test all grammars on the total input sample.
Adopt the one with the best mix of simplicity and good fit. (Over-resourced)

The Variational Model's memory for how well each P-value has performed
[Figure: a weight scale from 0 to 1 for each parameter – Null Subject, Head-direction, Wh-movement.]
Test one grammar at a time. If it succeeds, nudge the pointer for each parameter toward the successful P-value. If the grammar fails, nudge the pointers away from those P-values.
Select a grammar to test next, with probability based on the weights of its P-values.

Varieties of domain search, illustrated
Think Easter egg hunt. The eggs are the parameter values, to be found. The search domain is the park.
› Genetic Algorithm: Send out hordes of searchers; compare notes.
› Triggering Learning Algorithm: A lone searcher, following her own nose, small steps: "getting warmer".
› Variational Model: Mark findings/failures on a rough map to focus the search; occasionally dash to another spot to see what's there.
Compare these with decoding: First consult the sentence! Read a clue, decipher its meaning, go where it says; the egg is there.

Varieties of domain search, illustrated (with costs)
› GA: Send out hordes of searchers; compare notes. (Vast effort)
› TLA: A lone searcher, following her own nose, small steps: "getting warmer". (Slow progress)
› VM: Mark findings/failures on a rough map; occasionally dash to another spot to see what's there. (Still a needle in a haystack)

Yang's VM: the best current search model
› Can learn from every input sentence. The choice of a grammar to try is based on its track record.
› But no decoding, so it extracts little information per sentence: only can/cannot parse. Not why, or what would help. Can't recognize unambiguity.
› Non-deterministic: parameters may swing back and forth between their two values repeatedly.
› Inefficiency increases with the size of the domain, perhaps exponentially (especially if the domain is not 'smooth').
Yang's simulations and ours agree: the VM consumes an order of magnitude more input than decoding models.

Is the VM plausible as psychology?
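The nudge-the-pointer scheme described for the Variational Model can be sketched as a toy simulation. Everything below is an illustrative assumption, not Yang's actual implementation: three binary parameters, a linear reward/penalty rule with an invented learning rate, and a fake input stream in which each sentence happens to be informative about exactly one parameter.

```python
import random

# Toy sketch of the Variational Model's weight memory. One weight per
# binary parameter, giving the probability of trying value 1.
PARAMS = ["null_subject", "head_direction", "wh_movement"]
TARGET = {"null_subject": 1, "head_direction": 0, "wh_movement": 1}  # invented target language
RATE = 0.01  # learning rate (assumed)

def run(n_sentences, seed=1):
    rng = random.Random(seed)
    weights = {p: 0.5 for p in PARAMS}  # each pointer starts midway
    for _ in range(n_sentences):
        # Select a grammar to test, with probability based on the weights.
        grammar = {p: 1 if rng.random() < weights[p] else 0 for p in PARAMS}
        # Pretend each sentence is informative about one random parameter;
        # the grammar parses it iff its value for that parameter is correct.
        relevant = rng.choice(PARAMS)
        success = grammar[relevant] == TARGET[relevant]
        for p, v in grammar.items():
            if success:  # nudge every pointer toward the value just tried
                weights[p] += RATE * (v - weights[p])
            else:        # nudge every pointer away from the value just tried
                weights[p] += RATE * ((1 - v) - weights[p])
    return weights
```

Note that every pointer is nudged on every trial, including parameters irrelevant to the sentence – one of the un-human-like properties discussed below. In this toy version, that collateral reward/punishment also keeps the weights hovering short of certainty, an analogue of the model's non-determinism.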
The VM improves on the TLA, achieving more effective search with modest resources. And it avoids getting permanently trapped in a wrong corner of the domain (a 'local minimum').
But it has some strange, un-human-like(?) properties:
› Irrelevant parameter values are rewarded/punished, e.g., prep-stranding in a sentence with no prepositions. Without decoding, the VM can't know which parameters are relevant to the input sentence.
› To explore, it tests some grammars that are NOT highly valued at present. The child will often fail to parse a sentence, even if her currently best grammar can parse it! Exploring fights normal language use.

What's more psychologically realistic?
A crucial aspect of the VM is that even low-valued grammars are occasionally tried out on input sentences. But is this what children do?
When a toddler hears an utterance, what goes on in her brain? Specifically: what grammar does she try to process the sentence with?
Surely, she'd apply her currently 'highest-valued' grammar? Why would she use one that she believes to be wrong?
A low-valued grammar would often fail to deliver a successful parse of the sentence. When it fails, the child doesn't (linguistically) understand the sentence – even if it's one she understood yesterday and it is generated by her current 'best' grammar!

CUNY's alternative: Learning by parsing
This is a brief preview; we'll go into more detail in Class 7.
A child's aim is to understand what people are saying. So, just like adults, children try to parse the sentences they hear. (Assign structure to the word string; semantic composition.)
When the child's grammar licenses an input, her parsing routines function just as in adult sentence comprehension.
When the sentence lies beyond her current grammar, the parsing mechanism can process parts of the sentence but not all. It seeks a way to complete the parse tree. (Not just yes/no.)
To do so, it draws on the additional parameter-values that UG makes available, seeking one that can solve the problem.
If a parameter-value succeeds in rescuing the parse, that means it's useful, so it is adopted into the grammar.

So a parameter value must be something the parser can use
What a parser (adult or child) really needs is a way to connect an incoming word into the tree structure being built: some linkage of syntactic nodes and branches.
At CUNY we take parameter values to be UG-specified 'treelets' that the parser can use. (Not switch-settings.)
A treelet is a sub-structure of larger sentential trees (typically underspecified in some respects).
Example treelet: a PP node immediately dominating a preposition and a nominal trace. It indicates a positive value for the preposition-stranding parameter ("Who are you talking with now?" vs. *"Avec qui parles-tu maintenant?").

Children do what adults do: Example
E.g., "Which rock can you jump to from here?" has a stranded preposition, "to", with no overt complement. That becomes evident at the word "from".
For an adult English speaker, the parsing mechanism has access to a possible piece of tree structure (a 'treelet') which inserts a phonologically null complement to the preposition and links it to the fronted wh-phrase. See the tree diagram.
Now consider a child who already knows wh-movement but not yet preposition stranding (maybe not realistic!). The child's parser would do exactly the same as the adult's, up to the word "from". But the child's current grammar offers no means of continuing the parse: it has no treelet that fits between "to" and "from". So it must look and see whether UG can provide one.

In English, a preposition may have a null complement. Learners will discover this as they parse.
[Tree diagram: the preposition dominating a null complement, +null_i, coindexed with the fronted wh-phrase.]

Children must reach out to UG
The child's parser must search for a treelet in the wider pool of candidates made available by UG, to identify one that will fill that gap in the parse tree. Once found, that treelet would become part of the learner's grammar, for future use in understanding and producing sentences with stranded prepositions.
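This learning-by-parsing cycle can be illustrated with a deliberately crude sketch. Here a 'treelet' is reduced to a licensed attachment between adjacent word categories; the category inventory, the UG pool, and the child's starting grammar are all invented for the illustration, not CUNY's actual model.

```python
# Innately available attachments: a toy stand-in for the UG pool of
# candidate treelets (all contents invented for illustration).
UG_POOL = {
    ("wh", "noun"), ("noun", "aux"), ("aux", "pron"), ("pron", "verb"),
    ("verb", "prep"), ("prep", "noun"),
    ("prep", "prep"),   # stands in for the null-complement (stranding) treelet
    ("prep", "adv"),
}

def parse(categories, grammar):
    """Try to attach each word to its predecessor. When the current
    grammar has no suitable treelet, reach out to UG_POOL; an attachment
    that rescues the parse is adopted into the grammar.
    Returns (parsed_ok, newly_adopted_treelets)."""
    adopted = set()
    for left, right in zip(categories, categories[1:]):
        if (left, right) in grammar:
            continue                      # normal adult-style parsing step
        if (left, right) in UG_POOL:      # UG can rescue the parse...
            grammar.add((left, right))    # ...so the treelet is adopted
            adopted.add((left, right))
        else:
            return False, adopted         # parse cannot be completed
    return True, adopted

# A child who knows wh-movement etc. but not yet preposition stranding:
child = set(UG_POOL) - {("prep", "prep")}

# "Which rock can you jump to from here?" as a category string; the
# stranded "to" is the first of the two adjacent prepositions.
sentence = ["wh", "noun", "aux", "pron", "verb", "prep", "prep", "adv"]
```

Running `parse(sentence, child)` once completes the parse and adopts the stranding treelet; a second run of the same sentence parses with no further learning, mirroring the upgrade-just-where-needed behavior described above.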
Summary: In the treelet model, the learner's innate parsing mechanism works with the learner's single currently-best grammar hypothesis, and upgrades it on-line just if, and where, it finds that a new treelet is needed in order to parse an incoming sentence.
A child's processing of sentences differs from an adult's only in the need to reach out to UG for new treelets.

Compared with domain-search systems
In this way, the specific properties of input sentences provide a word-by-word guide to the adoption of relevant parameter values, in a narrowly channeled process. E.g., what to do if you encounter a sentence containing a preposition without an overt object.
› This input-guidance gets the maximum benefit from the information the input contains.
› It requires no specifically-evolved learning mechanism for language. (But it does need access to UG.)
› It makes use of the sentence-parsing mechanism, which is needed in any case – and which is generally regarded as innate, ready to function as soon as the child knows some words.

Please read before Friday (Class 3)
The 2-page article "Positive and negative evidence in language acquisition", by Grimshaw & Pinker, on the availability and utility of negative data.
The key questions: Does negative evidence exist? Do language learners use it? Do language learners need to?