Lecture Notes 1


CMSC 723 / LING 723:

Computational Linguistics I

August 29, 2007: Dorr

Overview, History, Goals, Problems, Techniques; Intro to MT

(J&M 1, 24.1, 24.2, 24.9)

Prof. Bonnie J. Dorr

Co-Instructor: Nitin Madnani

TA: Hamid Shahri

Administrivia

http://www.umiacs.umd.edu/~bonnie/courses/cmsc723-Fall07/

IMPORTANT:

• For Today: Chapter 1 and Sections 24.1, 24.2, 24.9

• For Next Time: Chapter 2


Other Important Stuff

• This course is interdisciplinary—cuts across different areas of expertise.

• Expect that a subset of the class will be learning new material at any time, while others will have to be patient! (The subsets will swap frequently!)

• Assignments:

– Before midterm: one without programming, two with programming

– After midterm: one without programming, two with programming

– Programming assignments:

• Both written solutions and code

• Use of NLP toolkits to build modules for different types of NLP processes (e.g., morphological processing, parsing, or machine translation).

• Submission of plain ASCII and/or PDF for written solutions

• Python for programming, submitted as .py file(s).

• Linux accounts will be distributed.

– No solutions will be handed out. Written comments will be sent by the TA.

• All email correspondence MUST HAVE “CMSC 723” in the Subject line.

• Put bonnie@cs, nmadnani@cs, hamid@cs in the TO line unless it is an issue that you feel should be directed only to the professor.

• Assignment 1 (not for credit) has been posted.

CL vs NLP

Why “Computational Linguistics” (CL) rather than “Natural Language Processing” (NLP)?

– Computational Linguistics: computers dealing with language, modeling what people do

– Natural Language Processing: applications on the computer side

• Why “natural”? Refers to the languages spoken by people, e.g., English, Japanese, Swahili, as opposed to artificial languages like C++, Java, etc.


Relation of CL to Other Disciplines

[Diagram: CL at the center, connected to the following fields:]

• Artificial Intelligence (AI) (notions of representation, search, etc.)

• Machine Learning (particularly probabilistic or statistical ML techniques)

• Human-Computer Interaction (HCI)

• Electrical Engineering (EE) (Optical Character Recognition)

• Linguistics (Syntax, Semantics, etc.)

• Psychology

• Theory of Computation

• Philosophy of Language, Formal Logic

• Information Retrieval

Where does it fit in the CS taxonomy?

[Tree diagram, adapted from Rada Mihalcea (2007):]

Computers
├── SWE/HCI
├── Databases
├── Alg/Thy/NA
├── Sys/Networks
└── Artificial Intelligence
    ├── Robotics
    ├── ML
    ├── Logic
    ├── Search
    └── Natural Language Processing
        ├── Information Retrieval
        ├── Machine Translation
        └── Language Analysis
            ├── Semantics
            └── Parsing


A Sampling of “Other Disciplines”

• Linguistics: formal grammars, abstract characterization of what is to be learned.

• Computer Science: algorithms for efficient learning or online deployment of these systems in automata.

• Engineering: stochastic techniques for characterizing regular patterns for learning and ambiguity resolution.

• Psychology: insights into what linguistic constructions are easy or difficult for people to learn or to use.

History: 1940s-1950s

• Development of formal language theory (Chomsky, Kleene, Backus).

– Formal characterization of classes of grammars (context-free, regular)

– Association with relevant automata

• Probability theory: language understanding as decoding through a noisy channel (Shannon)

– Use of information-theoretic concepts like entropy to measure the success of language models.
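To make the entropy idea concrete, the sketch below estimates the per-word entropy of a toy unigram language model from raw counts; the corpus is invented for illustration:

```python
import math
from collections import Counter

def unigram_entropy(tokens):
    """Entropy, in bits per word, of the unigram distribution
    estimated from a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values())

# A more repetitive (more predictable) corpus yields lower entropy.
corpus = "the cat sat on the mat and the cat sat".split()
print(f"{unigram_entropy(corpus):.3f} bits/word")
```

A model with lower per-word entropy assigns higher probability to held-out text, which is the sense in which entropy “measures success.”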


1957-1983: Symbolic vs. Stochastic

• Symbolic

– Use of formal grammars as basis for natural language processing and learning systems (Chomsky, Harris)

– Use of logic and logic-based programming for characterizing syntactic or semantic inference (Kaplan, Kay, Pereira)

– First toy natural language understanding and generation systems (Woods, Minsky, Schank, Winograd, Colmerauer)

– Discourse processing: role of intention, focus (Grosz, Sidner, Hobbs)

• Stochastic Modeling

– Probabilistic methods for early speech recognition, OCR (Bledsoe and Browning, Jelinek, Black, Mercer)

1983-1993: Return of Empiricism

• Use of stochastic techniques for part-of-speech tagging, parsing, word sense disambiguation, etc.

• Comparison of stochastic, symbolic, and more or less powerful models for language understanding and learning tasks.


1993-1999

• Advances in software and hardware create NLP needs for information retrieval (web), machine translation, spelling and grammar checking, speech recognition and synthesis.

• Stochastic and symbolic methods combine for real-world applications.

The Rise of Machine Learning: 2000-2007

• Large amounts of spoken & written material now widely available: LDC, etc.

• Increased focus on learning has led to more serious interplay with the statistical ML community.

• Unsupervised learning techniques on the rise—in part brought about by the difficulty of producing reliably annotated corpora.


Language and Intelligence: Turing Test

• Turing test: machine, human, and human judge

• Judge asks questions of computer and human.

– Machine’s job is to act like a human; human’s job is to convince the judge that he is not the machine.

– Machine is judged “intelligent” if it can fool the judge.

• Judgment of “intelligence” is linked to appropriate answers to questions from the system.

ELIZA

• Remarkably simple “Rogerian Psychologist”

• Uses pattern matching to carry on a limited form of conversation.

• Seems to “pass the Turing Test!” (McCorduck, 1979, pp. 225-226)

• Eliza demo: http://www.lpa.co.uk/pws_dem4.htm
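To make the pattern-matching idea concrete, here is a minimal ELIZA-style sketch; the two regex rules and canned responses are invented for illustration and are far simpler than Weizenbaum’s actual script:

```python
import re

# A couple of illustrative ELIZA-style rules: (pattern, response template).
RULES = [
    (re.compile(r"i am (.*)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"i feel (.*)", re.IGNORECASE), "How long have you felt {0}?"),
]

def respond(utterance):
    """Return the first matching rule's response, else a generic prompt."""
    for pattern, template in RULES:
        m = pattern.search(utterance)
        if m:
            return template.format(m.group(1).rstrip(".!?"))
    return "Please tell me more."

print(respond("I am worried about my homework."))
# -> Why do you say you are worried about my homework?
```

Note that nothing here understands anything: the illusion comes entirely from reflecting the user’s own words back.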


What’s involved in an “Intelligent” Answer?

Analysis: decomposition of the signal (spoken or written) eventually into meaningful units.

This involves …

Speech/Character Recognition

• Decomposition into words, segmentation of words into appropriate phones or letters

• Requires knowledge of phonological patterns:

– I’m enormously proud.

– I mean to make you proud.

(two different segmentations of a very similar sound stream)
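A closely related problem is segmenting an unspaced character stream (as in OCR or connected speech) into words. Below is a minimal sketch of dictionary-driven segmentation by backtracking; the tiny vocabulary and the longest-match-first strategy are choices made just for this example:

```python
def segment(text, vocab):
    """Return one segmentation of text into vocabulary words, or None."""
    if not text:
        return []
    for i in range(len(text), 0, -1):   # try the longest prefix first
        if text[:i] in vocab:
            rest = segment(text[i:], vocab)
            if rest is not None:
                return [text[:i]] + rest
    return None

vocab = {"i", "mean", "to", "make", "you", "proud"}
print(segment("imeantomakeyouproud", vocab))
# -> ['i', 'mean', 'to', 'make', 'you', 'proud']
```

Real recognizers score competing segmentations with a language model rather than accepting the first dictionary match.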


Morphological Analysis

• Inflectional

– duck + s = [N duck] + [plural s]

– duck + s = [V duck] + [3rd person s]

• Derivational

– kind, kindness

• Spelling changes

– drop, dropping

– hide, hiding
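As a sketch of how spelling-change rules might be coded, the toy analyzer below strips -ing while undoing consonant doubling (drop/dropping) and e-deletion (hide/hiding); the tiny lexicon is invented for illustration:

```python
LEXICON = {"drop", "hide", "duck", "watch"}

def analyze_ing(word):
    """Return possible (stem, '+ing') analyses, undoing spelling changes."""
    if not word.endswith("ing"):
        return []
    base = word[:-3]
    candidates = {base,            # watch+ing -> watching (no change)
                  base + "e"}      # hide -> hid(e)+ing -> hiding
    if len(base) >= 2 and base[-1] == base[-2]:
        candidates.add(base[:-1])  # drop -> dropp+ing -> dropping
    return [(stem, "+ing") for stem in candidates if stem in LEXICON]

for w in ["dropping", "hiding", "watching"]:
    print(w, "->", analyze_ing(w))
```

Production systems encode such rules as finite-state transducers rather than ad hoc string code, but the rule content is the same.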

Syntactic Analysis

• Associate constituent structure with string

• Prepare for semantic interpretation

[Tree diagram for “I watched the terrapin”:
(S (NP I) (VP (V watched) (NP (Det the) (N terrapin))))
OR, as a head-argument structure: watch, with Subject “I” and Object “the terrapin”]
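Since the course will use NLTK (see the readings at the end), here is a minimal sketch of this analysis with a hand-written context-free grammar; the grammar is just big enough for this one sentence and uses the modern NLTK API:

```python
import nltk

# Toy grammar covering only the example sentence.
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> 'I' | Det N
    VP -> V NP
    Det -> 'the'
    N -> 'terrapin'
    V -> 'watched'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("I watched the terrapin".split()):
    print(tree)
# (S (NP I) (VP (V watched) (NP (Det the) (N terrapin))))
```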


Semantics

• A way of representing meaning

• Abstracts away from syntactic structure

• Example:

– First-Order Logic: watch(I, terrapin)

– Can be: “I watched the terrapin” or “The terrapin was watched by me”

• Real language is complex:

– Who did I watch?

Lexical Semantics

– The Terrapin, is who I watched.

– Watch the Terrapin is what I do best.

– * Terrapin is what I watched the

Predicate: “watch”
Watcher: “I”
Watchee: “Terrapin”
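One simple way to hold such a predicate-argument frame in code is a plain mapping; the role names follow the slide, and the realize helper is a purely illustrative stand-in for a real generator:

```python
# Predicate-argument frame for "I watched the Terrapin",
# stable across the word-order variants above.
frame = {
    "predicate": "watch",
    "watcher": "I",
    "watchee": "Terrapin",
}

def realize(frame):
    """Render the frame as a canonical active-voice clause."""
    return f'{frame["watcher"]} {frame["predicate"]}ed the {frame["watchee"]}'

print(realize(frame))  # -> I watched the Terrapin
```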


Compositional Semantics

• Association of parts of a proposition with semantic roles

[Diagram: a Proposition whose Experiencer is “I” (1st pers, sg) and whose Predicate is “Be (perc)”, with pred “saw” taking patient “the Terrapin”]

• Scoping: Every man loves a woman
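The two quantifier scopings of that sentence can be written down explicitly with NLTK’s first-order logic parser; a minimal sketch using the modern NLTK API:

```python
from nltk.sem import Expression

read = Expression.fromstring

# Surface scope: for every man there is some (possibly different) woman.
wide_every = read('all x.(man(x) -> exists y.(woman(y) & love(x, y)))')

# Inverse scope: one particular woman whom every man loves.
wide_a = read('exists y.(woman(y) & all x.(man(x) -> love(x, y)))')

print(wide_every)
print(wide_a)
```

The English string is one; the logical forms are two, which is exactly the scoping problem.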

Word-Governed Semantics

• Any verb can add “able” to form an adjective.

– I taught the class. The class is teachable.

– I rejected the idea. The idea is rejectable.

• Association of particular words with specific semantic forms.

– John (masculine)

– The boys (masculine, plural, human)


Pragmatics

• Real-world knowledge, speaker intention, goal of utterance.

• Related to sociology.

• Example 1:

– Could you turn in your assignments now? (command)

– Could you finish the homework? (question, command)

• Example 2:

– I couldn’t decide how to catch the crook. Then I decided to spy on the crook with binoculars.

– To my surprise, I found out he had them too. Then I knew to just follow the crook with binoculars.

– The two attachments: [the crook [with binoculars]] vs. [the crook] [with binoculars]

Discourse Analysis

• Discourse: how propositions fit together in a conversation—multi-sentence processing.

– Pronoun reference: The professor told the student to finish the assignment. He was pretty aggravated at how long it was taking to pass it in.

– Multiple reference to the same entity: George W. Bush, president of the U.S.

– Relation between sentences: John hit the man. He had stolen his bicycle.


NLP Pipeline

[Flow diagram: speech → Phonetic Analysis; text → OCR/Tokenization; both feed Morphological Analysis → Syntactic Analysis → Semantic Interpretation → Discourse Processing]
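A minimal sketch of how such a pipeline composes in code; every stage below is a hypothetical stub, standing in for a full component:

```python
def tokenize(text):
    """Stub OCR/tokenization stage: whitespace split."""
    return text.split()

def morph(tokens):
    """Stub morphological stage: crude stripping of plural -s."""
    return [t[:-1] if t.endswith("s") else t for t in tokens]

def pipeline(data, stages):
    """Thread the input through each stage in order."""
    for stage in stages:
        data = stage(data)
    return data

print(pipeline("the ducks swim", [tokenize, morph]))
# -> ['the', 'duck', 'swim']
```

The design point is that each stage consumes the previous stage’s output, so components can be swapped independently.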

Relation to Machine Translation

[Diagram: on the analysis (input) side, Morphological Analysis → Syntactic Analysis → Semantic Interpretation leads up to an Interlingua; on the generation (output) side, Lexical Selection → Syntactic Realization → Morphological Synthesis leads back down to output text]


Ambiguity

I made her duck

• I cooked waterfowl for her

• I cooked waterfowl belonging to her

• I created the (plaster?) duck she owns

• I forced her to lower her head

• By magic, I changed her into waterfowl

Syntactic Disambiguation

• Structural ambiguity:

[Two tree diagrams for “I made her duck”:
(S (NP I) (VP (V made) (NP (Det her) (N duck)))), with “her duck” as a noun phrase;
(S (NP I) (VP (V made) (NP her) (VP (V duck)))), with “her” as object and “duck” as a verb]
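This structural ambiguity can be reproduced mechanically: the minimal NLTK grammar below, written only for this sentence, yields exactly the two trees sketched above.

```python
import nltk

grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> 'I' | 'her' | Det N
    Det -> 'her'
    N -> 'duck'
    VP -> V NP | V NP VP | V
    V -> 'made' | 'duck'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("I made her duck".split()):
    print(tree)
# (S (NP I) (VP (V made) (NP (Det her) (N duck))))
# (S (NP I) (VP (V made) (NP her) (VP (V duck))))
```

Disambiguation then amounts to scoring or ruling out one of the parses using context.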


Part-of-Speech Tagging and Word Sense Disambiguation

• [verb Duck]! vs. [noun Duck] is delicious for dinner.

• I went to the bank to deposit my check.

I went to the bank to look out at the river.

I went to the bank of windows and chose the one dealing with last names beginning with “d”.
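For the tagging side of the first example, NLTK ships an off-the-shelf tagger; a minimal sketch using the modern NLTK API (it assumes the punkt and averaged_perceptron_tagger data packages have been fetched with nltk.download):

```python
import nltk

# Assumes nltk.download('punkt') and
# nltk.download('averaged_perceptron_tagger') have been run once.
for sent in ["Duck!", "Duck is delicious for dinner."]:
    tokens = nltk.word_tokenize(sent)
    print(nltk.pos_tag(tokens))
# Ideally "Duck" is tagged as a verb in the first sentence and a noun
# in the second, though real taggers can err on such short inputs.
```

Word sense disambiguation (the bank examples) is harder: the part of speech is the same, and only wider context separates the senses.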

Resources for NLP Systems

• Dictionary

• Morphology and Spelling Rules

• Grammar Rules

• Semantic Interpretation Rules

• Discourse Interpretation

Natural language processing involves:

(1) learning or fashioning the rules for each component,

(2) embedding the rules in the relevant automaton, and

(3) using the automaton to efficiently process the input.


Some NLP Applications

• Machine Translation—Babelfish (Alta Vista): http://babelfish.altavista.com/translate.dyn

• Question Answering—Ask Jeeves (Ask Jeeves): http://www.ask.com/

• Language Summarization—MEAD (U. Michigan): http://www.summarization.com/mead

• Spoken Language Recognition—EduSpeak (SRI): http://www.eduspeak.com/

• Automatic Essay Evaluation—E-Rater (ETS): http://www.ets.org/research/erater.html

• Information Retrieval and Extraction—NetOwl (SRA): http://www.netowl.com/extractor_summary.html

What is MT?

• Definition: translation from one natural language to another by means of a computerized system

• Early failures

• Later: varying degrees of success


An Old Example

The spirit is willing but the flesh is weak

→ (machine-translated into Russian and back) →

The vodka is good but the meat is rotten

Machine Translation History

• 1950s: Intensive research activity in MT

• 1960s: Direct word-for-word replacement

• 1966 (ALPAC): NRC report on MT

– Conclusion: MT no longer worthy of serious scientific investigation.

• 1966-1975: “Recovery period”

• 1975-1985: Resurgence (Europe, Japan)

• 1985-present: Resurgence (US)

http://www.hutchinsweb.me.uk/MTS-93.pdf


What happened between ALPAC and Now?

• Need for MT and other NLP applications confirmed

• Change in expectations

• Computers have become faster, more powerful

• WWW

• Political state of the world

• Maturation of Linguistics

• Development of hybrid statistical/symbolic approaches

• Integration of machine learning into new linguistically motivated translation paradigms

Classical MT and the Vauquois Triangle (Direct, Transfer, Interlingual)


Examples of Three Approaches

• Direct:

– I checked his answers against those of the teacher →
Yo comparé sus respuestas a las de la profesora

– Rule: [check X against Y] → [comparar X a Y] (see the sketch after these examples)

• Transfer:

– Ich habe ihn gesehen → I have seen him

– Rule: [clause agt aux obj pred] → [clause agt aux pred obj]

• Interlingual:

– I like Mary → Mary me gusta a mí

– Rep: [Be_Ident (I [AT_Ident (I, Mary)] Like+ingly)]
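A minimal sketch of the direct strategy, using a toy glossary invented for the check/comparar example above; the naive lack of reordering, agreement, and sense selection is exactly what the later pros-and-cons slide points to:

```python
# Toy direct-MT glossary (invented for illustration).
GLOSSARY = {
    "i": "yo", "checked": "comparé", "his": "sus",
    "answers": "respuestas", "against": "a", "those": "las",
    "of": "de", "the": "la", "teacher": "profesora",
}

def direct_translate(sentence):
    """Word-for-word replacement; unknown words pass through unchanged."""
    return " ".join(GLOSSARY.get(w, w) for w in sentence.lower().split())

print(direct_translate("I checked his answers against those of the teacher"))
# -> yo comparé sus respuestas a las de la profesora
```

The example happens to come out right only because English and Spanish word order coincide here; a German input like “Ich habe ihn gesehen” would not survive word-for-word replacement.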

MT Systems: 1964-1990

• Direct: GAT [Georgetown, 1964], TAUM-METEO [Colmerauer et al., 1971]

• Transfer: GETA/ARIANE [Boitet, 1978], LMT [McCord, 1989], METAL [Thurmair, 1990], MiMo [Arnold & Sadler, 1990], …

• Interlingual: MOPTRANS [Schank, 1974], KBMT [Nirenburg et al., 1992], UNITRAN [Dorr, 1990]


Statistical MT and Hybrid Symbolic/Statistical MT: 1990-2002

Candide [Brown, 1990, 1992]; Halo/Nitrogen [Langkilde and Knight, 1998], [Yamada and Knight, 2002]; GHMT [Dorr and Habash, 2002]; DUSTer [Dorr et al., 2002]

Statistical and Phrase-Based MT: 2003-present

EGYPT/GIZA [Och and Ney, 2003]; PHARAOH [Koehn, 2003, 2004]; HIERO [Chiang, 2005]; MOSES [Koehn et al., 2006]


Direct MT: Pros and Cons

• Pros

– Fast

– Simple

– Inexpensive

– No translation rules hidden in lexicon

• Cons

– Unreliable

– Not powerful

– Rule proliferation

– Requires too much context

– Major restructuring after lexical substitution

Transfer MT: Pros and Cons

• Pros

– Don’t need to find language-neutral rep

– Relatively fast

• Cons

– N² sets of transfer rules: difficult to extend (see the arithmetic sketch after this list)

– Proliferation of language-specific rules in lexicon and syntax

– Cross-language generalizations lost
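To see the N² point concretely: with N languages, transfer needs a rule set for each ordered pair of distinct languages, while an interlingual design needs only an analyzer and a generator per language. A minimal sketch of the arithmetic (the 23-language case is, e.g., roughly the EU's official languages circa 2007):

```python
def transfer_rule_sets(n):
    """One rule set per ordered pair of distinct languages: N(N-1)."""
    return n * (n - 1)

def interlingua_modules(n):
    """One analyzer plus one generator per language: 2N."""
    return 2 * n

for n in (3, 10, 23):
    print(n, transfer_rule_sets(n), interlingua_modules(n))
# 3 -> 6 vs 6; 10 -> 90 vs 20; 23 -> 506 vs 46
```

The gap grows quadratically, which is why the interlingual pros below lead with portability.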


Interlingual MT: Pros and Cons

• Pros

– Portable (avoids the N² problem)

– Lexical rules and structural transformations stated more simply on normalized representation

– Explanatory Adequacy

• Cons

– Difficult to deal with terms on primitive level: universals?

– Must decompose and reassemble concepts

– Useful information lost (paraphrase)

Readings for Next Time

• J&M Chapter 2

• Start examining the Python and NLTK resources for Lecture 3

• Consider starting Assignment 1 now.

