Linguistics and Language Technologies
Lori Levin
11-721: Grammars and Lexicons
Fall Term 2003

Linguistics
• Linguistics is a
  – Cognitive Science
  – Social Science
  – Area of the Humanities
  – Also, neuroscience, area of mathematics, computer science, etc.
• Primarily about the human mind and human communication behavior.

Linguistics as a Cognitive Science
• Knowledge of language is not conscious knowledge.
  – Like knowing how to walk without knowing which neurons and muscles are involved.
  – What does knowledge of a language consist of?
  – Sub-areas of linguistic knowledge:
    • Grammar of sentences (syntax), grammar of words (morphology), sentence meaning (semantics), word meaning (lexical semantics), language use in context (pragmatics and discourse analysis).
• Do human languages differ from each other in random ways, or are there common, universal properties?
• How are human languages different from mathematical languages, logical languages, programming languages, and animal communication systems?

Linguistics as a Cognitive Science
• First language acquisition: How do human babies learn something so complex so quickly with such imperfect input?
• Second language acquisition: How do adults learn a second language, and why are they so bad at something that babies are so good at?
  – Do adults learn languages better with immediate or delayed feedback on errors?
  – Does explanation of foreign language grammar help adults learn the foreign language?
• Psycholinguistics: How is human language processed in the brain and how is human language produced?
  – When you hear the sentence “He put his money in the bank,” does your brain activate only the sense of “bank” that is related to money, or all of the senses of “bank” because they sound the same (e.g., river bank)?
  – Why do you have to do a double take to understand this sentence:
    • The cotton shirts are made of is soft.
• Neuro-linguistics: What areas of the brain are activated during language processing?
  How do brain injuries affect language production and comprehension?

Linguistics as a Social Science
• Historical Linguistics: How do human languages change over time?
  – Drift:
    • “Corn” used to mean all small grains, e.g., pepper corn, barley corn.
    • What happened to the word “britches”?
    • English “f” is systematically related to French “p”. What was the common sound that they both derived from in some ancient language?
      – Foot/pied
      – Father/père
  – Contact:
    • Languages in proximity to each other will influence each other’s vocabulary and grammar, even if the languages were previously unrelated.

Linguistics as a Social Science
• Sociolinguistics:
  – How do human languages vary with social factors such as
    • Geography
    • Age
    • Ethnic group
    • Sex
    • Race
    • Economic class
  – In situations of language contact, what are the factors that determine whether there will be bilingualism or language loss?

Language Technologies
• Computer-based tools for processing human languages
  – Speech recognition
  – Speech synthesis
  – Machine translation
  – Human-machine dialogue systems
  – Information Retrieval, Extraction, and Summarization
  – Computer-assisted language learning

Why should language technologists learn linguistics?
Audience participation.

What does knowledge of a language consist of?
• Can he and Sam be the same person?
  – He thinks that Sam is wrong.
  – Sam expected to see him.
  – Sam thinks that he is wrong.
  – Sam believed him to be wrong.
  – Sam expected Bill to see him.
  – The person that he saw likes Sam.

What does knowledge of a language consist of?
• Recognition of ambiguity:
  – I saw a man with a telescope.
  – We sold her dog biscuits.
  – Milk drinkers turn to powder.
  – I saw a friend of John’s brother.
  – Grandmother of nine makes hole in one.

What does knowledge of a language consist of?
• Recognition of grammaticality.
• Many linguists (probably a majority) assume that people can distinguish strings of words that are sentences of their language from strings of words that are not sentences of their language.
  – So imagine that you are a machine or a classifier that takes a sentence as input, and returns “accept” or “reject” as output.

Grammaticality
1. I gave back the car to him.
2. I gave the car back to him.
3. I gave the car to him back.
4. I gave back him the car.
5. I gave him back the car.
6. I gave him the car back.

Grammaticality
1. I gave back the car to him.
2. I gave the car back to him.
3. * I gave the car to him back.
4. * I gave back him the car.
5. I gave him back the car.
6. I gave him the car back.

Grammaticality
• A string of words that you recognize as a sentence in your native language is grammatical.
• A string of words that you do not recognize as a sentence in your native language is ungrammatical.
• When you decide whether a sentence is grammatical or ungrammatical, this is called giving a grammaticality judgement.
• Ungrammatical sentences are preceded by an asterisk or star (*). Sometimes they are called starred sentences.
• If native speakers can’t decide whether the sentence is grammatical or ungrammatical, it is preceded by a combination of stars and question marks.

Grammaticality: Descriptive and Prescriptive Linguistics
• Linguists describe what people say.
  – Me and him went to the movies.
  – Sam wants to boldly go where no one has gone before.
• Linguists do not prescribe what people should say.
• Language technologists don’t get a say in the matter.
  – If it’s in the input, you have to deal with it.
• When you give a grammaticality judgement, you are not supposed to judge whether the sentence is the most elegant or appropriate, just whether it is a sentence of your language or not.

Grammaticality
• Grammaticality is not completely determined by meaning:
  – Sentences 1 and 2 have similar meaning:
    1. Bill saw Sam and Sue.
    2. Bill saw Sam with Sue.
  – Sentence 2 can be transformed into a question by (1) changing “Sue” to “Who”, (2) moving it to the beginning of the sentence, and (3) making some changes to the verb.
    • Who did Bill see Sam with?
  – The same process applied to Sentence 1 does not result in a grammatical sentence.
    • * Who did Bill see Sam and?

Grammaticality
• Sentences that are only possible in poetry are probably not grammatical:
  – * To her we laurels bring.
  – * indirect-object subject direct-object verb
  – We bring laurels to her.
  – subject verb direct-object indirect-object

Grammaticality
• Sentences that are only possible in poetry are probably not grammatical:
  – * Bring we to our alma mater trust and honor due.
  – * verb subject indirect-object direct-object
  – We bring trust and honor (that are) due to our alma mater.
  – subject verb direct-object indirect-object

Grammaticality
• Sentences that are understandable but sound like mistakes are probably not grammatical.
  – * These are things that I don’t know anyone who says.

Grammaticality
• However, many types of sentences that are found in writing, or are restricted to special contexts, are considered to be grammatical and even have names:
  – Locative Inversion: In this village live many people.
  – Topicalization: Sam, I like.
  – Heavy NP Shift: I presented to the students many examples of strange and unusual constructions. (The indirect object comes before the direct object because the direct object is too long.)

Problems with Grammaticality
• Dialect differences:
  – The car needs washed.
    • (The car needs to be washed.)
  – We go to the movies a lot anymore.
    • (We go to the movies a lot these days.)
  – I gave it her.
    • (I gave it to her.)
  – It were me what told her.
    • (It was me that told her.)
  – Mine is bigger than what yours is.
    • (Mine is bigger than yours is.)

Problems with Grammaticality
• What is the source of the problem?
  – Colorless green ideas sleep furiously.
  – Sleep ideas green furiously.
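
One way to make the contrast between these two sentences concrete is the "accept or reject" machine described earlier. The following is only a minimal sketch: the lexicon, the tag names (A, N, V, ADV), and the single sentence pattern are illustrative assumptions, not a grammar of English. The acceptor looks only at word categories and their order and knows nothing about meaning.

```python
import re

# Hypothetical toy lexicon: each word gets exactly one part-of-speech tag.
LEXICON = {
    "colorless": "A",
    "green": "A",
    "ideas": "N",
    "sleep": "V",
    "furiously": "ADV",
}

# Assumed toy sentence pattern: any number of adjectives, a noun,
# a verb, and an optional adverb.
SENTENCE_PATTERN = re.compile(r"(A )*N V( ADV)?")


def judge(sentence):
    """Return "accept" or "reject" based only on the order of word categories."""
    words = sentence.lower().rstrip(".").split()
    try:
        tags = " ".join(LEXICON[word] for word in words)
    except KeyError:
        return "reject"  # a word outside the toy lexicon
    return "accept" if SENTENCE_PATTERN.fullmatch(tags) else "reject"


print(judge("Colorless green ideas sleep furiously."))
print(judge("Sleep ideas green furiously."))
```

Under these assumptions the first sentence is accepted even though it is nonsensical, and the second is rejected even though it uses the same words. This is one common reading of the example: the two strings differ in syntactic well-formedness, not in how meaningful they are.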
Grammaticality in Language Technologies
• Real input (especially spoken input) is not always well-formed, so you should not build a program that accepts only grammatical sentences.
• Can we do away with grammar in language technologies?

Grammaticality in Language Technologies
• You cannot extract the meaning of a sentence without processing the grammar:
  – Sue interviewed Sam.
  – Sam interviewed Sue.
• LT output has to be comprehensible, and therefore, mostly grammatical:
  – Synthesized speech
  – An automatically produced translation
  – An automatically produced summary
• Error detection programs for computer-assisted language instruction or for word processing must distinguish grammatical from ungrammatical sentences.
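
To illustrate the point that meaning cannot be extracted without processing the grammar, here is a minimal sketch comparing an order-insensitive representation with an order-sensitive one for the pair “Sue interviewed Sam” and “Sam interviewed Sue”. The function names and the role labels (agent, action, patient) are assumptions made for this example, and the role assignment handles only three-word subject-verb-object sentences.

```python
from collections import Counter


def bag_of_words(sentence):
    """Order-insensitive view: just the words and how often each occurs."""
    return Counter(sentence.lower().rstrip(".").split())


def svo_roles(sentence):
    """Toy role assignment; assumes exactly subject, verb, object in that order."""
    subject, verb, obj = sentence.rstrip(".").split()
    return {"agent": subject, "action": verb, "patient": obj}


a = "Sue interviewed Sam."
b = "Sam interviewed Sue."

# The two sentences contain exactly the same words...
print(bag_of_words(a) == bag_of_words(b))

# ...but who interviewed whom differs once word order is taken into account.
print(svo_roles(a))
print(svo_roles(b))
```

A system that ignores grammar sees these two sentences as identical. Even the very crude positional rule used here is a piece of grammatical knowledge about English word order.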