Linguistics and Language Technologies Lori Levin 11-721: Grammars and Lexicons

Fall Term 2003
Linguistics
• Linguistics is a
– Cognitive Science
– Social Science
– Area of the Humanities
– Also an area of neuroscience, mathematics,
computer science, etc.
• Primarily about the human mind and
human communication behavior.
Linguistics as a Cognitive Science
• Knowledge of language is not conscious knowledge.
– Like knowing how to walk without knowing which neurons and
muscles are involved.
– What does knowledge of a language consist of?
– Sub-areas of linguistic knowledge:
• Grammar of sentences (syntax), grammar of words (morphology),
sentence meaning (semantics), word meaning (lexical semantics),
language use in context (pragmatics and discourse analysis).
• Do human languages differ from each other in random
ways, or are there common, universal properties?
• How are human languages different from mathematical
languages, logical languages, programming languages,
and animal communication systems?
Linguistics as a Cognitive Science
• First language acquisition: How do human babies learn something so
complex so quickly with such imperfect input?
• Second language acquisition: How do adults learn a second language,
and why are they so bad at something that babies are so good at?
– Do adults learn languages better with immediate or delayed feedback on
errors?
– Does explanation of foreign language grammar help adults learn the
foreign language?
• Psycholinguistics: How is human language processed in the brain and
how is human language produced?
– When you hear the sentence “He put his money in the bank,” does your
brain activate only the sense of “bank” that is related to money, or all of
the senses of “bank” because they sound the same (e.g., river bank)?
– Why do you have to do a double take to understand this sentence:
• The cotton shirts are made of is soft.
• Neuro-linguistics: What areas of the brain are activated during
language processing? How do brain injuries affect language
production and comprehension?
Linguistics as a Social Science
• Historical Linguistics: How do human languages
change over time?
– Drift:
• “Corn” used to mean all small grains, e.g., peppercorn,
barleycorn.
• What happened to the word “britches”?
• English “f” is systematically related to French “p”. What was
the common sound that they both derived from in some
ancient language?
– Foot/pied
– Father/père
– Contact:
• Languages in proximity to each other will influence each
other’s vocabulary and grammar, even if the languages were
previously unrelated.
Linguistics as a Social Science
• Sociolinguistics:
– How do human languages vary with social factors
such as
• Geography
• Age
• Ethnic group
• Sex
• Race
• Economic class
– In situations of language contact, what are the factors
that determine whether there will be bilingualism or
language loss?
Language Technologies
• Computer-based tools for processing
human languages
– Speech recognition
– Speech synthesis
– Machine translation
– Human-machine dialogue systems
– Information Retrieval, Extraction, and
Summarization
– Computer-assisted language learning
Why should language technologists
learn linguistics?
Audience participation.
What does knowledge of a
language consist of?
• Can he and Sam be the same person?
– He thinks that Sam is wrong.
– Sam expected to see him.
– Sam thinks that he is wrong.
– Sam believed him to be wrong.
– Sam expected Bill to see him.
– The person that he saw likes Sam.
What does knowledge of a
language consist of?
• Recognition of ambiguity:
– I saw a man with a telescope.
– We sold her dog biscuits.
– Milk drinkers turn to powder.
– I saw a friend of John’s brother.
– Grandmother of nine makes hole in one.
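The first example above, “I saw a man with a telescope,” can be made concrete by counting parse trees. The following is a minimal sketch, not part of the original lecture: the tiny grammar and lexicon are assumptions invented for this one sentence, and `count_parses` is a hypothetical helper that uses a CYK-style chart to count the distinct trees the grammar assigns.

```python
from collections import defaultdict

# Assumed toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # the PP modifies the seeing event
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # the PP modifies the noun phrase
    ("PP", "P", "NP"),
]

# Assumed toy lexicon covering only the words of the example.
LEXICON = {
    "i": ["NP"],
    "saw": ["V"],
    "a": ["Det"],
    "man": ["N"],
    "with": ["P"],
    "telescope": ["N"],
}


def count_parses(sentence, start="S"):
    """Count the distinct parse trees rooted in `start` for the sentence."""
    words = sentence.lower().rstrip(".").split()
    n = len(words)
    # chart[(i, j)][cat] = number of trees of category cat spanning words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for cat in LEXICON.get(word, []):
            chart[(i, i + 1)][cat] += 1
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    chart[(i, j)][parent] += (
                        chart[(i, k)][left] * chart[(k, j)][right]
                    )
    return chart[(0, n)][start]
```

Under these assumptions, `count_parses("I saw a man with a telescope.")` returns 2, one tree for each reading (the man has the telescope, or the telescope was used for seeing), while `count_parses("I saw a man.")` returns 1.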
What does knowledge of a language
consist of?
• Recognition of grammaticality.
• Many linguists (probably a majority)
assume that people can distinguish strings
of words that are sentences of their
language from strings of words that are
not sentences of their language.
– So imagine that you are a machine or a
classifier that takes a sentence as input, and
returns “accept” or “reject” as output.
Grammaticality
1. I gave back the car to him.
2. I gave the car back to him.
3. I gave the car to him back.
4. I gave back him the car.
5. I gave him back the car.
6. I gave him the car back.
Grammaticality
1. I gave back the car to him.
2. I gave the car back to him.
3. * I gave the car to him back.
4. * I gave back him the car.
5. I gave him back the car.
6. I gave him the car back.
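The idea of a speaker as an accept/reject classifier can be sketched for just these six sentences. This is an illustrative toy, not a theory of English: the lexicon, the category labels, and the list of allowed verb-phrase frames are all assumptions read directly off the judgements above, and `chunk` and `classify` are hypothetical helper names.

```python
# Assumed toy lexicon: each word form gets exactly one coarse category.
LEXICON = {
    "i": "SUBJ",
    "gave": "V",
    "back": "PRT",   # particle
    "the": "DET",
    "car": "N",
    "to": "P",
    "him": "PRON",
}

# Verb-phrase frames the toy grammar licenses (taken from the unstarred items).
ALLOWED_FRAMES = {
    ("V", "PRT", "NP", "PP"),    # gave back the car to him
    ("V", "NP", "PRT", "PP"),    # gave the car back to him
    ("V", "PRON", "PRT", "NP"),  # gave him back the car
    ("V", "PRON", "NP", "PRT"),  # gave him the car back
}


def chunk(tags):
    """Group DET N into NP and P PRON into PP; leave other tags unchanged."""
    out = []
    i = 0
    while i < len(tags):
        pair = tuple(tags[i:i + 2])
        if pair == ("DET", "N"):
            out.append("NP")
            i += 2
        elif pair == ("P", "PRON"):
            out.append("PP")
            i += 2
        else:
            out.append(tags[i])
            i += 1
    return out


def classify(sentence):
    """Return "accept" or "reject" for a sentence, using only the toy grammar."""
    words = sentence.lower().rstrip(".").split()
    if any(word not in LEXICON for word in words):
        return "reject"  # unknown words are rejected in this sketch
    tags = chunk([LEXICON[word] for word in words])
    if not tags or tags[0] != "SUBJ":
        return "reject"
    return "accept" if tuple(tags[1:]) in ALLOWED_FRAMES else "reject"
```

With this sketch, sentences 1, 2, 5, and 6 are accepted and sentences 3 and 4 are rejected. The classifier only memorizes frames; it says nothing about why the particle cannot follow the prepositional phrase or precede the pronoun.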
Grammaticality
• A string of words that you recognize as a sentence in
your native language is grammatical.
• A string of words that you do not recognize as a
sentence in your native language is ungrammatical.
• When you decide whether a sentence is grammatical or
ungrammatical, this is called giving a grammaticality
judgement.
• Ungrammatical sentences are preceded by an asterisk
or star (*). Sometimes they are called starred
sentences.
• If native speakers can’t decide whether the sentence is
grammatical or ungrammatical, it is preceded by a
combination of stars and question marks.
Grammaticality: Descriptive and
Prescriptive Linguistics
• Linguists describe what people say.
– Me and him went to the movies.
– Sam wants to boldly go where no one has gone before.
• Linguists do not prescribe what people should say.
• Language technologists don’t get a say in the matter.
– If it’s in the input, you have to deal with it.
• When you give a grammaticality judgement, you are not
supposed to judge whether the sentence is the most
elegant or appropriate --- just whether it is a sentence of
your language or not.
Grammaticality
• Grammaticality is not completely determined by
meaning:
• Sentences 1 and 2 have similar meaning:
1. Bill saw Sam and Sue.
2. Bill saw Sam with Sue.
– Sentence 2 can be transformed into a question by (1)
changing “Sue” to “Who”, (2) moving it to the
beginning of the sentence, and (3) making some
changes to the verb.
– Who did Bill see Sam with?
• The same process applied to Sentence 1 does not
result in a grammatical sentence.
– * Who did Bill see Sam and?
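The three-step question procedure can be written as a purely mechanical string operation, which makes the point visible: the procedure itself has no way of knowing that one of its outputs is ungrammatical. This is a rough sketch under stated assumptions: `form_who_question` and `PAST_TO_BASE` are hypothetical names, and the function assumes a sentence of the shape subject, past-tense verb, then everything else.

```python
# Assumed mapping from past-tense verb forms to base forms (step 3).
PAST_TO_BASE = {"saw": "see"}


def form_who_question(sentence, target):
    """Apply the three steps blindly: replace, front, and adjust the verb."""
    words = sentence.rstrip(".").split()
    subject, verb, rest = words[0], words[1], words[2:]
    if target not in rest:
        raise ValueError(f"{target!r} does not follow the verb in {sentence!r}")
    rest.remove(target)                # steps 1 and 2: the target becomes a fronted "Who"
    base_verb = PAST_TO_BASE[verb]     # step 3: "saw" becomes "did ... see"
    return " ".join(["Who", "did", subject, base_verb] + rest) + "?"
```

Calling `form_who_question("Bill saw Sam with Sue.", "Sue")` gives “Who did Bill see Sam with?”, and calling it on “Bill saw Sam and Sue.” gives the starred string “Who did Bill see Sam and?”. The difference in acceptability has to come from the grammar, not from the meaning or from the procedure.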
Grammaticality
• Sentences that are only possible in poetry
are probably not grammatical:
– * To her we laurels bring.
– * indirect-object subject direct-object verb
– We bring laurels to her.
– subject verb direct-object indirect-object
Grammaticality
• Sentences that are only possible in poetry
are probably not grammatical:
– *Bring we to our alma mater trust and honor
due.
– * verb subject indirect-object direct-object
– We bring trust and honor (that are) due to our
alma mater.
– subject verb direct-object indirect-object
Grammaticality
• Sentences that are understandable but
sound like mistakes are probably not
grammatical.
– *These are things that I don’t know anyone
who says.
Grammaticality
• However, many types of sentences that are
found in writing or are restricted to special
contexts are considered to be grammatical and
even have names:
– Locative Inversion: In this village live many people.
– Topicalization: Sam, I like.
– Heavy NP Shift: I presented to the students many
examples of strange and unusual constructions.
(indirect object comes before direct object because
the direct object is too long)
Problems with Grammaticality
• Dialect differences:
– The car needs washed.
• (The car needs to be washed.)
– We go to the movies a lot anymore.
• (We go to the movies a lot these days.)
– I gave it her.
• (I gave it to her.)
– It were me what told her.
• (It was me that told her.)
– Mine is bigger than what yours is.
• (Mine is bigger than yours is.)
Problems with grammaticality
• What is the source of the problem?
– Colorless green ideas sleep furiously.
– Sleep ideas green furiously.
Grammaticality in language
technologies
• Real input (especially spoken input) is not
always well-formed, so you should not
build a program that accepts only
grammatical sentences.
• Can we do away with grammar in
language technologies?
Grammaticality in Language
Technologies
• You cannot extract the meaning of a sentence without
processing the grammar:
– Sue interviewed Sam.
– Sam interviewed Sue.
• LT output has to be comprehensible, and therefore,
mostly grammatical:
– Synthesized speech
– An automatically produced translation
– An automatically produced summary
• Error detection programs for computer-assisted
language instruction or for word processing must
distinguish grammatical from ungrammatical sentences.
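The first point on this slide, that meaning cannot be extracted without processing the grammar, can be illustrated with the two “interviewed” sentences. This is a minimal sketch: `bag_of_words` and `svo` are hypothetical helpers, and `svo` assumes a three-word subject-verb-object English sentence, which is far simpler than real input.

```python
from collections import Counter


def bag_of_words(sentence):
    """Represent a sentence as word counts, discarding word order."""
    return Counter(sentence.lower().rstrip(".").split())


def svo(sentence):
    """Read roles off word order; assumes exactly Subject Verb Object."""
    subject, verb, obj = sentence.rstrip(".").split()
    return {"subject": subject, "verb": verb, "object": obj}


a = "Sue interviewed Sam."
b = "Sam interviewed Sue."

same_words = bag_of_words(a) == bag_of_words(b)   # True: identical word counts
same_roles = svo(a) == svo(b)                     # False: the roles are swapped
```

A representation that ignores word order treats the two sentences as identical, even though they describe different events. Even the crude order-based analysis in `svo` recovers who interviewed whom.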