Why Syntax Is Impossible - Linguistics and English Language

Why Syntax Is Impossible
Mike Dowman

Syntax
Languages have tens of thousands of words.
Some combinations of words make valid sentences; others don't.
No one understands the grammar of any language.

Syntax Is Complicated!
I saw Bill with Mary yesterday.
You saw WHO with Mary yesterday?!
Who did you see with Mary yesterday?
I saw Bill and Mary yesterday.
You saw WHO and Mary yesterday?!
*Who did you see and Mary yesterday?

Generative Grammar
An explicit formal system that defines the set of valid sentences in a language,
and perhaps also explains what each one means.
Generative grammar is the core research topic in linguistics.
It includes both strongly nativist theories and theories proposing that languages are primarily learned.

Grammar Writing
Linguists take a selection of possible sentences,
obtain grammaticality judgments for those sentences,
and then produce a grammar that accounts for all the data.

Grammar Coverage
Linguists' grammars only work for selected sentences.
They can't explain most naturally occurring sentences.
The more data we consider, the more surprising quirks of syntax emerge.

Children's Language Acquisition
Kids observe a limited number of example sentences,
but quickly internalize a system that correctly characterizes the whole language.
E-language → LAD (Language Acquisition Device) → I-language

How can kids do syntax when linguists can't?
Innate component of language (provided by genes) Learned component of language (provided by language data) Linguists have to infer both Children only the learned component Information Theory Both components of language must contain some amount of information Data available to children must provide at least enough information as is in the learned component This puts a limit on the complexity of the learned component of language Linguists’ Task Linguists need to have at least as much information as is in the learned and innate components together Can use data from multiple languages to try to characterize innate components And can use positive and negative data Correspondence to Linguistic Theories Small learned component = parameter setting Large learned component = learned languages Small innate component = general learning mechanism Large innate component = universal grammar Size of Each Component Inna te Co m po nen t small Lea r ne d Co m po nen t large huge small large huge learn = easy learn = easy learn = easy ling = easy ling = hard ling = impossible learn = hard learn = hard learn = hard ling = hard ling = hard ling = impossible learn = impossibl e learn = impossible learn = impossible ling = impossible ling = impossible ling = impossible Which component is large? As we haven’t yet managed to produce a generative grammar, at least one of innate or learned components must be large Children learn relatively easily, so the learned component can’t be too big Size of Each Component Inna te Co m po nen t small Lea r ne d Co m po nen t large huge small large huge learn = easy learn = easy learn = easy ling = easy ling = hard ling = impossible learn = hard learn = hard learn = hard ling = hard ling = hard ling = impossible learn = impossibl e learn = impossible learn = impossible ling = impossible ling = impossible ling = impossible How big could the innate component be? 
The genome contains about 3 billion base pairs = 6 billion bits (2 bits per base pair).
Cell metabolism adds more information.
Each base pair can be modified.
A huge amount of information!

What could be in a huge innate component?
Not word forms: these vary from language to language.
Grammaticality patterns: the rules of syntax would be hugely complex.

Impossibility of Syntax
On average, each grammaticality judgment can provide no more than one bit of information.
If syntax is hugely complex, there will be many grammars that are compatible with any given body of data,
but all but one of these grammars would fail when tested on enough new data.

A Concrete Example
A multi-agent model. Each agent has an innate component and a learned component.
Both are bit strings of fixed length.
Sentences are 100-bit strings.

Deciding on the Grammaticality of a Sentence (1)
Treat the sentence as a binary number $s$ and find $b_i = s \bmod n_i$ and $b_l = s \bmod n_l$,
where $b_i$ ($b_l$) is the index of a bit in the innate (learned) component and $n_i$ ($n_l$) is the number of bits in the innate (learned) component.

Deciding on the Grammaticality of a Sentence (2)
A pseudo-random function maps the two selected bits plus the sentence to a Boolean grammaticality judgment.
It is therefore typically necessary to know every bit of the sentence, and both the selected innate and learned bits, to predict whether the sentence is grammatical.
Every bit counts.
Usually about half of all sentences are grammatical and half are ungrammatical.

4 Kinds of Agent
Teacher: innate 10101000, learned 10010101
Related: innate 10101000, learned 11110001
Unrelated: innate 10110101, learned 00111000
Linguist: innate 00110100, learned 10001100
(Related agents share the teacher's innate component; unrelated agents do not.)

Learning by Related and Unrelated Agents
Observe a sentence from the teacher.
Work out whether it is grammatical according to the current I-language.
If not, invert the relevant bit of the learned component.

Grammar Inference by Linguists
Choose random sentences.
Ask the teacher whether they are grammatical.
Store all sentences and grammaticality judgments.
Search for a setting of the innate and learned components that assigns the correct grammaticality rating to every stored sentence.

[Figure: 1,000-bit innate and 1,000-bit learned components. Curves for related, unrelated and linguist agents; y axis from 0.6 to 1; x axis: Number of Example Sentences, 0 to 20,000.]
[Figure: 1,000-bit innate component, 1,000,000-bit learned component. Same agents and axes.]
[Figure: 1,000,000-bit innate component, 1,000-bit learned component. Same agents and axes.]

Implications of Impossible Syntax
A linguist can write a grammar that adequately characterizes any body of data,
but it will fail when tested on new data.
Partial grammars are not a stepping stone to complete generative grammars.

A Universal Law of Generative Grammar
Generative grammar is impossible if
$H(\text{learned component}) + H(\text{innate component}) > H(\text{language data})$
(where $H$ is information content in bits),
unless we can use information from another source (genetic, neuroscientific, psycholinguistic).

Why Do Syntax?
Studying generative grammar may tell us something about the human mind.
It won't help us build natural language processing systems.
Is studying rare and obscure constructions the best way to do syntax?

Conclusion
The idea that we can characterize a language by considering enough linguistic data is a hypothesis.
It's very unlikely that it's possible to write a complete generative grammar.
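The multi-agent model described in the slides from "A Concrete Example" through "Grammar Inference by Linguists" can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the author's simulation: the slides do not name the pseudo-random function, so SHA-256 from `hashlib` stands in for it; the teacher is assumed to utter only sentences its own grammar accepts; and the linguist's search is done by brute force, which is only feasible for toy component sizes. All function names (`grammatical`, `learn`, `agreement`, `consistent_grammars`) are hypothetical.

```python
import hashlib
import itertools
import random

SENTENCE_BITS = 100  # sentences are 100-bit strings


def grammatical(innate, learned, s):
    """Judge sentence s (an int below 2**100) against one agent's components.

    innate, learned: sequences of 0/1 ints (the two bit-string components).
    """
    b_i = s % len(innate)   # index of the selected innate bit
    b_l = s % len(learned)  # index of the selected learned bit
    # Assumption: SHA-256 stands in for the unspecified pseudo-random function.
    key = f"{innate[b_i]}:{learned[b_l]}:{s}".encode()
    return (hashlib.sha256(key).digest()[0] & 1) == 1


def random_bits(n, rng):
    return [rng.randint(0, 1) for _ in range(n)]


def teacher_sentence(teacher, rng):
    # Assumption: the teacher only utters sentences its own grammar accepts.
    while True:
        s = rng.getrandbits(SENTENCE_BITS)
        if grammatical(*teacher, s):
            return s


def learn(learner, teacher, n_examples, rng):
    """Related/unrelated agents: invert the selected learned bit on each rejection."""
    innate, learned = learner
    for _ in range(n_examples):
        s = teacher_sentence(teacher, rng)
        if not grammatical(innate, learned, s):
            learned[s % len(learned)] ^= 1  # invert the relevant learned bit


def agreement(a, b, n_tests, rng):
    """Fraction of random sentences on which agents a and b give the same judgment."""
    same = 0
    for _ in range(n_tests):
        s = rng.getrandbits(SENTENCE_BITS)
        same += grammatical(*a, s) == grammatical(*b, s)
    return same / n_tests


def consistent_grammars(data, n_i, n_l):
    """Linguist: every (innate, learned) pair that fits all stored judgments.

    Brute force over 2**(n_i + n_l) candidates, so only toy sizes are feasible.
    """
    fits = []
    for bits in itertools.product((0, 1), repeat=n_i + n_l):
        innate, learned = bits[:n_i], bits[n_i:]
        if all(grammatical(innate, learned, s) == ok for s, ok in data):
            fits.append((innate, learned))
    return fits


if __name__ == "__main__":
    rng = random.Random(0)
    n = 100
    teacher = (random_bits(n, rng), random_bits(n, rng))
    related = (list(teacher[0]), random_bits(n, rng))    # shares the teacher's innate bits
    unrelated = (random_bits(n, rng), random_bits(n, rng))  # its own innate bits
    for name, agent in (("related", related), ("unrelated", unrelated)):
        learn(agent, teacher, 3000, rng)
        print(name, agreement(agent, teacher, 2000, rng))

    # A linguist facing a toy teacher with 3 innate and 4 learned bits.
    toy_teacher = (tuple(random_bits(3, rng)), tuple(random_bits(4, rng)))
    for k in (0, 5, 20, 100):
        data = [(s, grammatical(*toy_teacher, s))
                for s in (rng.getrandbits(SENTENCE_BITS) for _ in range(k))]
        print(k, "judgments:", len(consistent_grammars(data, 3, 4)), "consistent grammars")
```

In this sketch a related agent never inverts a bit that already matches the teacher (identical bits give identical judgments), so it converges, while an unrelated agent keeps flipping bits wherever its innate bits differ. The brute-force search makes the slides' point visible at toy scale: with little data many grammars fit, and the candidate space doubles with every extra bit.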

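The information-theoretic argument in the slides can be made concrete with a little arithmetic. This is a sketch assuming that $H$ is measured in bits and that every grammaticality judgment is a yes/no answer. The binary entropy function shows why one judgment carries at most one bit; the genome figure follows from 2 bits per base pair; the "Size of Each Component" grid is reproduced by one summary rule of our own (children's difficulty depends only on the learned component, linguists' difficulty on whichever component is larger); and the universal law becomes a one-line check. All names here are hypothetical.

```python
import math

DIFFICULTY = {"small": "easy", "large": "hard", "huge": "impossible"}
SIZES = ["small", "large", "huge"]


def binary_entropy(p):
    """Bits carried by a yes/no judgment that is 'grammatical' with probability p."""
    if p in (0, 1):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


# 4 possible bases per position -> log2(4) = 2 bits per base pair.
GENOME_BITS = 3_000_000_000 * math.log2(4)  # 6 billion bits


def grid_cell(innate, learned):
    """Assumed summary rule that reproduces the 'Size of Each Component' grid."""
    learn = DIFFICULTY[learned]  # children only have to infer the learned component
    ling = DIFFICULTY[max(innate, learned, key=SIZES.index)]  # linguists infer both
    return learn, ling


def min_judgments(h_learned, h_innate):
    """Each judgment gives at most 1 bit, so a linguist needs at least this many."""
    return math.ceil(h_learned + h_innate)


def ruled_out(h_learned, h_innate, h_data):
    """Universal law: True means generative grammar is impossible from this data
    alone (unless information comes from another source)."""
    return h_learned + h_innate > h_data


if __name__ == "__main__":
    print(binary_entropy(0.5), binary_entropy(0.9))
    print(GENOME_BITS)
    for learned in SIZES:
        print(learned, [grid_cell(innate, learned) for innate in SIZES])
    # The plots stop at 20,000 example sentences, i.e. at most 20,000 bits of judgments.
    print(ruled_out(1_000, 1_000, 20_000), ruled_out(1_000_000, 1_000, 20_000))
```

The entropy peaks at exactly one bit when grammatical and ungrammatical sentences are equally likely, which matches the model's roughly half-and-half split; any skew makes each judgment less informative, so `min_judgments` is a lower bound.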