August 29, 2004: Dorr
Overview, History, Goals, Problems,
Techniques; Intro to MT
(J&M 1, 24.1, 24.2, 24.9)
Prof. Bonnie J. Dorr
Co-Instructor: Nitin Madnani
TA: Hamid Shahri
http://www.umiacs.umd.edu/~bonnie/courses/cmsc723-Fall07/
IMPORTANT:
• For Today: Chapters 1 and 24.1, 24.2, 24.9
• For Next Time: Chapter 2
This course is interdisciplinary—cuts across different areas of expertise.
Expect that a subset of the class will be learning new material at any time, while others will have to be patient! (The subsets will swap frequently!)
Assignments:
– Before midterm: one without programming, two with programming
– After midterm: one without programming, two with programming
– Programming assignments:
• Both written solutions and code
• Use of NLP toolkits to build modules for different types of NLP processes (e.g., morphological processing, parsing, or machine translation).
• Submission of plain ASCII and/or PDF for written solutions
• Python for programming, submitted as .py file(s).
• Linux accounts will be distributed.
– No solutions will be handed out. Written comments will be sent by TA.
All email correspondence MUST HAVE “CMSC 723” in the Subject line
Put bonnie@cs, nmadnani@cs, hamid@cs in the TO line unless it is an issue that you feel should be directed only to the professor.
Assignment 1 (not for credit) has been posted.
Why “Computational Linguistics” (CL) rather than
“Natural Language Processing” (NLP)?
– Computational Linguistics: Computers dealing with language, modeling what people do
– Natural Language Processing: Applications on the computer side
• Why “natural”? Refers to the language spoken by people, e.g. English, Japanese, Swahili, as opposed to artificial languages, like C++, Java, etc.
Areas related to CL:
– Artificial Intelligence (AI) (notions of representation, search, etc.)
– Machine Learning (particularly probabilistic or statistical ML techniques)
– Human Computer Interaction (HCI)
– Electrical Engineering (EE) (Optical Character Recognition)
– Linguistics (Syntax, Semantics, etc.)
– Psychology
– Theory of Computation
– Philosophy of Language, Formal Logic
– Information Retrieval
[Figure, adapted from Rada Mihalcea (2007): Computers → SWE/HCI, Databases, Artificial Intelligence, Alg/Thy/NA, Sys/networks; Artificial Intelligence → Robotics, ML, Logic, Natural Language Processing, Search; Natural Language Processing → Information Retrieval, Machine Translation, Language Analysis; Language Analysis → Semantics, Parsing]
Linguistics: formal grammars, abstract characterization of what is to be learned.
Computer Science: algorithms for efficient learning or online deployment of these systems in automata.
Engineering: stochastic techniques for characterizing regular patterns for learning and ambiguity resolution.
Psychology: Insights into what linguistic constructions are easy or difficult for people to learn or to use
Development of formal language theory
(Chomsky, Kleene, Backus).
– Formal characterization of classes of grammar
(context-free, regular)
– Association with relevant automata
Probability theory: language understanding as decoding through noisy channel (Shannon)
– Use of information theoretic concepts like entropy to measure success of language models.
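As a rough illustration of how entropy can score a language model, the sketch below builds a unigram model from a token list and measures entropy, cross-entropy on held-out text, and perplexity. The function names, the toy sentences, and the fixed probability floor for unseen words are assumptions made for this example, not material from the course.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram probabilities from a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def entropy(model):
    """Entropy of the model's own distribution, in bits per word."""
    return -sum(p * math.log2(p) for p in model.values() if p > 0)


def cross_entropy(model, tokens, floor=1e-6):
    """Average negative log2 probability the model assigns to the tokens.

    Unseen words get a small fixed probability (an assumption; real systems
    use proper smoothing).
    """
    return -sum(math.log2(model.get(w, floor)) for w in tokens) / len(tokens)


def perplexity(model, tokens):
    """Perplexity is 2 raised to the cross-entropy."""
    return 2 ** cross_entropy(model, tokens)


train = "the terrapin watched the duck".split()
held_out = "the duck watched the crook".split()
model = unigram_model(train)

print("entropy (bits/word):", entropy(model))
print("cross-entropy on held-out text:", cross_entropy(model, held_out))
print("perplexity on held-out text:", perplexity(model, held_out))
```

Lower cross-entropy on held-out text means the model wastes less probability on things that do not occur, which is the sense in which entropy measures the success of a language model.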
Symbolic
– Use of formal grammars as basis for natural language processing and learning systems. (Chomsky, Harris)
– Use of logic and logic based programming for characterizing syntactic or semantic inference (Kaplan, Kay, Pereira)
– First toy natural language understanding and generation systems (Woods, Minsky, Schank, Winograd, Colmerauer)
– Discourse Processing: Role of Intention, Focus (Grosz, Sidner, Hobbs)
Stochastic Modeling
– Probabilistic methods for early speech recognition, OCR (Bledsoe and Browning, Jelinek, Black, Mercer)
Use of stochastic techniques for part of speech tagging, parsing, word sense disambiguation, etc.
Comparison of stochastic, symbolic, more or less powerful models for language understanding and learning tasks.
Advances in software and hardware create NLP needs for information retrieval (web), machine translation, spelling and grammar checking, speech recognition and synthesis.
Stochastic and symbolic methods combine for real world applications.
Large amounts of spoken & written material now widely available: LDC, etc.
Increased focus on learning has led to more serious interplay with statistical ML community.
Unsupervised learning techniques on the rise—in part brought about by difficulty of producing reliably annotated corpora.
Turing test:
– machine, human, and human judge
– Judge asks questions of computer and human.
– Machine’s job is to act like a human, human’s job is to convince judge that he’s not the machine.
– Machine judged “intelligent” if it can fool judge.
Judgement of “intelligence” linked to appropriate answers to questions from the system.
Remarkably simple “Rogerian Psychologist”
Uses Pattern Matching to carry on limited form of conversation.
Seems to “Pass the Turing Test!”
(McCorduck, 1979, pp. 225-226)
Eliza Demo: http://www.lpa.co.uk/pws_dem4.htm
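The pattern-matching idea behind Eliza can be sketched in a few lines. The three rules, the default reply, and the `respond` function below are invented for illustration and are not the original Eliza script, which also swaps pronouns and ranks keywords.

```python
import re

# (pattern, response template) pairs; the first matching rule wins.
RULES = [
    (re.compile(r"\bI am (.*)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.*)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE),
     "Tell me more about your {0}."),
]
DEFAULT = "Please go on."


def respond(utterance):
    """Return a canned reply built from the first rule that matches."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Echo the captured fragment back inside the template.
            return template.format(*match.groups())
    return DEFAULT


print(respond("I am unhappy."))
print(respond("My father hates me"))
print(respond("Hello"))
```

No understanding is involved: the program only reflects fragments of the input back at the user, which is why the conversation stays limited.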
Analysis:
Decomposition of the signal (spoken or written) eventually into meaningful units.
This involves …
Decomposition into words, segmentation of words into appropriate phones or letters
Requires knowledge of phonological patterns:
– I’m enormously proud.
– I mean to make you proud.
Inflectional
– duck + s = [N duck] + [plural s]
– duck + s = [V duck] + [3rd person s]
Derivational
– kind, kindness
Spelling changes
– drop, dropping
– hide, hiding
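A minimal sketch of morphological analysis covering the three cases above (inflection, derivation, spelling changes). The four-word lexicon, the category labels, and the `analyze` function are assumptions chosen to fit the slide's examples; a real analyzer would typically use finite-state transducers.

```python
# Toy lexicon: stem -> set of part-of-speech categories (illustrative only).
LEXICON = {"duck": {"N", "V"}, "kind": {"A"}, "drop": {"V"}, "hide": {"V"}}


def analyze(word):
    """Return every (stem, category, feature) analysis the toy rules allow."""
    analyses = []

    # Inflectional -s: plural on nouns, 3rd person singular on verbs.
    if word.endswith("s") and word[:-1] in LEXICON:
        stem = word[:-1]
        if "N" in LEXICON[stem]:
            analyses.append((stem, "N", "plural"))
        if "V" in LEXICON[stem]:
            analyses.append((stem, "V", "3rd person singular"))

    # Derivational -ness: adjective -> noun.
    if word.endswith("ness") and "A" in LEXICON.get(word[:-4], ()):
        analyses.append((word[:-4], "A", "nominalization -ness"))

    # -ing with spelling changes: restored final e, undoubled consonant.
    if word.endswith("ing"):
        base = word[:-3]
        candidates = [base, base + "e"]            # hiding -> hide
        if len(base) >= 2 and base[-1] == base[-2]:
            candidates.append(base[:-1])           # dropping -> drop
        for stem in candidates:
            if "V" in LEXICON.get(stem, ()):
                analyses.append((stem, "V", "progressive -ing"))

    return analyses


for w in ("ducks", "kindness", "dropping", "hiding"):
    print(w, analyze(w))
```

Note that "ducks" receives two analyses, so morphology alone already produces ambiguity that later stages must resolve.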
Associate constituent structure with string
Prepare for semantic interpretation
[S [NP I] [VP [V watched] [NP [det the] [N terrapin]]]]
OR: watch (Subject: I, Object: terrapin (Det: the))
A way of representing meaning
Abstracts away from syntactic structure
Example:
– First-Order Logic: watch(I,terrapin)
– Can be: “I watched the terrapin” or “The terrapin was watched by me”
Real language is complex:
– Who did I watch?
– The Terrapin is who I watched.
– Watch the Terrapin is what I do best.
– * Terrapin is what I watched the
Predicate: “watch”
Watcher: “I”
Watchee: “Terrapin”
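The claim that a meaning representation abstracts away from syntactic structure can be made concrete with a toy mapping from an active and a passive sentence to the same predicate-argument triple. The two regular-expression patterns, the `NOMINATIVE` table, and the `logical_form` function are illustrative assumptions that cover only these two sentence shapes.

```python
import re

# Toy patterns for one verb in two syntactic frames (illustrative only).
ACTIVE = re.compile(r"^(?P<agent>\w+) watched the (?P<theme>\w+)$", re.IGNORECASE)
PASSIVE = re.compile(
    r"^the (?P<theme>\w+) was watched by (?P<agent>\w+)$", re.IGNORECASE
)
# Map object-case pronouns back to the form used in the logical form.
NOMINATIVE = {"me": "I"}


def logical_form(sentence):
    """Return ('watch', watcher, watchee), or None if no pattern applies."""
    text = sentence.strip().rstrip(".")
    for pattern in (ACTIVE, PASSIVE):
        match = pattern.match(text)
        if match:
            agent = match.group("agent")
            agent = NOMINATIVE.get(agent.lower(), agent)
            return ("watch", agent, match.group("theme").lower())
    return None


print(logical_form("I watched the terrapin"))
print(logical_form("The terrapin was watched by me"))
```

Both sentences yield the same triple, corresponding to watch(I, terrapin); the harder constructions listed above are exactly the cases such surface patterns fail on.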
Association of parts of a proposition with semantic roles
[Proposition [Experiencer I (1st pers, sg)] [Predicate: Be (perc) [pred saw] [patient the Terrapin]]]
Scoping: Every man loves a woman
Any verb can add “able” to form an adjective.
– I taught the class. The class is teachable.
– I rejected the idea. The idea is rejectable.
Association of particular words with specific semantic forms.
– John (masculine)
– The boys (masculine, plural, human)
Real world knowledge, speaker intention, goal of utterance.
Related to sociology.
Example 1:
– Could you turn in your assignments now (command)
– Could you finish the homework? (question, command)
Example 2:
– I couldn’t decide how to catch the crook. Then I decided to spy on the crook with binoculars.
– To my surprise, I found out he had them too. Then I knew to just follow the crook with binoculars.
[the crook [with binoculars]]
[the crook] [with binoculars]
Discourse: How propositions fit together in a conversation—multi-sentence processing.
– Pronoun reference:
The professor told the student to finish the assignment.
He was pretty aggravated at how long it was taking to pass it in.
– Multiple reference to same entity:
George W. Bush, president of the U.S.
– Relation between sentences:
John hit the man. He had stolen his bicycle
[Figure: analysis pipeline. speech → Phonetic Analysis, or text → OCR/Tokenization; then Morphological analysis → Syntactic analysis → Semantic Interpretation → Discourse Processing]
[Figure: analysis and generation. input → Morphological analysis → Syntactic analysis → Semantic Interpretation → Interlingua → Lexical selection → Syntactic realization → Morphological synthesis → output]
I made her duck
I cooked waterfowl for her
I cooked waterfowl belonging to her
I created the (plaster?) duck she owns
I forced her to lower her head
By magic, I changed her into waterfowl
Structural ambiguity:
[S [NP I] [VP [V made] [NP her] [VP [V duck]]]]
[S [NP I] [VP [V made] [NP [det her] [N duck]]]]
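Structural ambiguity can be made concrete by counting the parse trees a grammar assigns to "I made her duck". The sketch below is a small CKY-style counter over a hand-written toy grammar. The rule set, the lexicon, and the helper category `SC` (introduced here only to binarize VP → V NP VP) are assumptions for illustration; they license the two bracketings in the figure, not all five readings listed earlier.

```python
from collections import defaultdict

# Binary rules as (parent, left child, right child); toy grammar, assumed.
RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),    # made [her duck]
    ("VP", "V", "SC"),    # made [her] [duck], binarized via SC
    ("SC", "NP", "VP"),
    ("NP", "Det", "N"),
]
# Each word lists every category it can have; "her" and "duck" are ambiguous.
LEXICON = {
    "I": {"NP"},
    "made": {"V"},
    "her": {"NP", "Det"},
    "duck": {"N", "VP"},
}


def count_parses(words, start="S"):
    """Count distinct parse trees rooted in `start` using a CKY chart."""
    n = len(words)
    # chart[i, j][category] = number of trees of that category over words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for category in LEXICON.get(word, ()):
            chart[i, i + 1][category] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):          # split point
                for parent, left, right in RULES:
                    chart[i, j][parent] += chart[i, k][left] * chart[k, j][right]
    return chart[0, n][start]


print(count_parses("I made her duck".split()))  # 2
```

The two trees differ only in whether "her duck" forms one noun phrase or splits into an object plus a verb phrase, which is the ambiguity the bracketings show.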
[verb Duck]!
[noun Duck] is delicious for dinner
I went to the bank to deposit my check.
I went to the bank to look out at the river.
I went to the bank of windows and chose the one dealing with last names beginning with “d”.
• Dictionary
• Morphology and Spelling Rules
• Grammar Rules
• Semantic Interpretation Rules
• Discourse Interpretation
Natural language processing involves:
(1) learning or fashioning the rules for each component,
(2) embedding the rules in the relevant automaton,
(3) using the automaton to efficiently process the input.
Machine Translation—Babelfish (Alta Vista): http://babelfish.altavista.com/translate.dyn
Question Answering—Ask Jeeves (Ask Jeeves): http://www.ask.com/
Language Summarization—MEAD (U. Michigan): http://www.summarization.com/mead
Spoken Language Recognition— EduSpeak (SRI): http://www.eduspeak.com/
Automatic Essay evaluation—E-Rater (ETS): http://www.ets.org/research/erater.html
Information Retrieval and Extraction—NetOwl (SRA): http://www.netowl.com/extractor_summary.html
Definition: Translation from one natural language to another by means of a computerized system
Early failures
Later: varying degrees of success
The spirit is willing but the flesh is weak
The vodka is good but the meat is rotten
1950’s: Intensive research activity in MT
1960’s: Direct word-for-word replacement
1966 (ALPAC): NRC Report on MT
Conclusion: MT no longer worthy of serious scientific investigation.
1966-1975: “Recovery period”
1975-1985: Resurgence (Europe, Japan)
1985-present: Resurgence (US) http://www.hutchinsweb.me.uk/MTS-93.pdf
Need for MT and other NLP applications confirmed
Change in expectations
Computers have become faster, more powerful
WWW
Political state of the world
Maturation of Linguistics
Development of hybrid statistical/symbolic approaches
Integration of machine learning into new linguistically motivated translation paradigms
(Direct, Transfer, Interlingual)
Direct:
– I checked his answers against those of the teacher →
Yo comparé sus respuestas a las de la profesora
– Rule: [check X against Y] → [comparar X a Y]
Transfer:
– Ich habe ihn gesehen → I have seen him
– Rule: [clause agt aux obj pred] → [clause agt aux pred obj]
Interlingual:
– I like Mary → Mary me gusta a mí
– Rep: [Be_Ident (I [AT_Ident (I, Mary)] Like+ingly)]
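The transfer rule [clause agt aux obj pred] → [clause agt aux pred obj] can be sketched as three steps: analyze the source clause into roles, reorder the roles, then substitute words. The four-word dictionary, the purely positional "analysis", and the `transfer` function are simplifying assumptions for this one example sentence; a real transfer system would parse the source first.

```python
# Toy German-to-English dictionary (illustrative only).
LEXICON_DE_EN = {"ich": "I", "habe": "have", "ihn": "him", "gesehen": "seen"}

# Role order in the source clause and in the target clause, as in the rule.
SOURCE_ORDER = ("agt", "aux", "obj", "pred")
TARGET_ORDER = ("agt", "aux", "pred", "obj")


def transfer(sentence):
    """Translate a four-word German clause via structural and lexical transfer."""
    words = sentence.lower().split()
    if len(words) != len(SOURCE_ORDER):
        raise ValueError("toy rule only handles agt-aux-obj-pred clauses")
    # Analysis: assign roles by position (a stand-in for real parsing).
    clause = dict(zip(SOURCE_ORDER, words))
    # Structural transfer: reorder roles into the target-language order.
    reordered = [clause[role] for role in TARGET_ORDER]
    # Lexical transfer: word-for-word substitution on the reordered clause.
    return " ".join(LEXICON_DE_EN[word] for word in reordered)


print(transfer("Ich habe ihn gesehen"))  # I have seen him
```

The reordering rule mentions both languages' clause structures, which is why each language pair needs its own set of transfer rules.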
Direct: GAT [Georgetown, 1964], TAUM-METEO [Colmerauer et al. 1971]
Transfer: GETA/ARIANE [Boitet, 1978], LMT [McCord, 1989], METAL [Thurmair, 1990], MiMo [Arnold & Sadler, 1990], …
Interlingual: MOPTRANS [Schank, 1974], KBMT [Nirenburg et al, 1992], UNITRAN [Dorr, 1990]
Statistical MT and Hybrid Symbolic/Stats MT: 1990-2002
Candide [Brown, 1990, 1992]; Halo/Nitrogen [Langkilde and Knight, 1998], [Yamada and Knight, 2002]; GHMT [Dorr and Habash, 2002]; DUSTer [Dorr et al. 2002]
EGYPT/GIZA [Och and Ney, 2003]; PHARAOH [Koehn, 2003, 2004]; HIERO [Chiang, 2005]; MOSES [Koehn et al., 2006]
Direct:
Pros
– Fast
– Simple
– Inexpensive
– No translation rules hidden in lexicon
Cons
– Unreliable
– Not powerful
– Rule proliferation
– Requires too much context
– Major restructuring after lexical substitution
Transfer:
Pros
– Don’t need to find language-neutral rep
– Relatively fast
Cons
– N² sets of transfer rules: Difficult to extend
– Proliferation of language-specific rules in lexicon and syntax
– Cross-language generalizations lost
Interlingual:
Pros
– Portable (avoids N² problem)
– Lexical rules and structural transformations stated more simply on normalized representation
– Explanatory Adequacy
Cons
– Difficult to deal with terms on primitive level: universals?
– Must decompose and reassemble concepts
– Useful information lost (paraphrase)
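The N² problem can be illustrated with a quick count. Under the assumption that a transfer system needs one rule set per ordered language pair, while an interlingual system needs one analyzer and one generator per language, the totals grow as follows. The function names are invented for this sketch, and N(N−1) is the exact count behind the order-of-magnitude figure N².

```python
def transfer_rule_sets(n_languages):
    """One transfer rule set per ordered (source, target) language pair."""
    return n_languages * (n_languages - 1)


def interlingual_modules(n_languages):
    """One analyzer into and one generator out of the interlingua per language."""
    return 2 * n_languages


for n in (2, 3, 5, 10):
    print(n, transfer_rule_sets(n), interlingual_modules(n))
```

For 10 languages this gives 90 transfer rule sets against 20 interlingual modules, and adding an eleventh language costs 20 new rule sets in the first design but only 2 new modules in the second.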
J&M Chapter 2
Start examining the Python and NLTK resources for Lecture 3
Consider starting assignment 1 now.