CS4705 Fall 2007 Natural Language Processing CS 4705

What will we study in this course?
• How can machines recognize and generate text
and speech?
• Current real-world applications?
– Searching very large text and speech corpora: e.g. the
Web
– Translating between one language and another: e.g.
Arabic and English
– Summarizing very large amounts of text: e.g. your
email, the news
– Building dialogue systems: e.g. Amtrak’s ‘Julie’
Open Problems in NLP
• If you want to find all references to union
activities in New York, what keywords do you
specify?
– Union…and…Unions? United? Uniform? Onion?
– Activities…and…Activity? Active? Actor? Action?
• Morphology: how words are composed of smaller
units of meaning – which words are related?
• What’s the same about these sentences?
Different?
– John hit Bill
– Bill was hit by John
– Bill, John hit
– Who John hit was Bill
• Syntax: the way words are grouped together into
larger constituents and phrases and the way these
phrases can be ordered – how sentences are related
• Semantics: the context-independent ‘meaning’ of
utterances (the similar part)
• Pragmatics: the context-dependent ‘meaning’ of
utterances (some of the different part)
• If you want to find travel information about Nice,
France, why might you get documents on Nice
views in Cleveland?
– Word Sense Disambiguation: how to distinguish the
different meanings of words spelled the same
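The two search questions above (the "union activities" query and the "Nice" query) can be made concrete with a small sketch. This is only an illustration under stated assumptions: the word list and the two documents are invented here, `prefix_matches`, `strip_plural`, and `keyword_hits` are hypothetical helper names, and the suffix stripping is deliberately crude rather than a real morphological analyzer.

```python
import re

# Toy vocabulary and documents, invented for illustration only.
VOCAB = ["union", "unions", "united", "uniform", "onion",
         "activities", "activity", "active", "actor", "action"]

DOCS = [
    "Cheap flights to Nice, France",
    "Nice views in Cleveland",
]


def prefix_matches(prefix, vocab):
    """Naive truncation search: every word that starts with the prefix."""
    return [w for w in vocab if w.startswith(prefix)]


def strip_plural(word):
    """Very crude suffix stripping (an assumption, not a real stemmer)."""
    if word.endswith("ies"):
        return word[:-3] + "y"      # activities -> activity
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]            # unions -> union
    return word


def keyword_hits(keyword, docs):
    """Case-insensitive whole-word match, ignoring word sense entirely."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [d for d in docs if pattern.search(d)]


if __name__ == "__main__":
    # The prefix search also returns 'united' and 'uniform',
    # which are unrelated to labor unions.
    print(prefix_matches("uni", VOCAB))
    # Suffix stripping relates 'unions' to 'union' and
    # 'activities' to 'activity', but says nothing about 'active' or 'actor'.
    print(sorted({strip_plural(w) for w in VOCAB}))
    # The keyword 'nice' matches both the city and the adjective.
    print(keyword_hits("nice", DOCS))
```

String matching alone cannot tell which words are related or which sense of a word is meant. Those are the gaps that morphological analysis and word sense disambiguation aim to fill.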
Course Focus: NLP for Text and Speech
• Morphology, syntax, semantics, pragmatics/discourse
• Human language phenomena
• Techniques and algorithms for computational language
processing
– Parsing, information extraction/retrieval, statistical and machine
learning approaches (corpus linguistics)
• Applications: Language generation and summarization,
machine translation, dialogue systems and spoken
language processing
• Next term: CS 4706 focuses on spoken NLP
Instructor
• Julia Hirschberg
– Computational Linguist in CS
– Focus: Spoken Language Processing
– Lab: The Speech Lab, CEPSR 7LW3-A
– Research:
• Deceptive speech
• Charismatic speech
• Emotional speech: anger, uncertainty
• Speech summarization: Broadcast News
• Spoken Dialogue Systems: Games Corpus
• ‘Translating Prosody’: English – Mandarin
– Course Details
Is She Lying?
Bureaucracy
• Instructor: Julia Hirschberg
– (julia@cs.columbia.edu)
– Office and hours: CEPSR 705, TBA
• Teaching Assistant: Frank Enos
– (frank@cs.columbia.edu)
– Office and hours: CEPSR 726, TBA
• Syllabus available at
http://www1.cs.columbia.edu/~julia/cs4705/syllabus07.html
• Text: Daniel Jurafsky and James H. Martin,
Speech and Language Processing, Prentice-Hall,
2000 (available at CU Bookstore)
– Note errata available on website; check before reading
each chapter please
– Check courseworks
• Assignments:
– 3 homework assignments
– Midterm and final exams
– Four ‘free’ late days for homework assignments
– You must get a CS account
• Evaluation: 50% homework + 50% exams
Academic Integrity
Copying or paraphrasing someone's work (code
included), or permitting your own work to be copied
or paraphrased, even if only in part, is forbidden, and
will result in an automatic grade of 0 for the entire
assignment or exam in which the copying or
paraphrasing was done. Your grade should reflect
your own work. If you are going to have trouble
completing an assignment, talk to the instructor or
TA in advance of the due date please. Everyone:
Read/write protect your homework files at all times.
For Next Class
• Look at syllabus
• Read Chapters 1-2 of J&M
• Questions?