01 - CLAIR

NLP
Natural Language Processing
Class Logistics
Quiz
• Where is this quote from?
Dave Bowman: Open the pod bay doors, HAL.
HAL: I’m sorry, Dave. I’m afraid I can’t do that.
Quiz Answer
• “2001: A Space Odyssey”
– 1968 film by Stanley Kubrick
– based on a joint screenplay with Arthur C. Clarke.
Watson Example
http://www.geekwire.com/2013/ibm-takes-watson-cloud/
What is Natural Language Processing?
• Natural Language Processing (NLP) is the study of the
computational treatment of natural (human) language.
• In other words, teaching computers how to understand
(and generate) human language.
How Computers Understand Language
Modern Applications
• Search engines
– Google, Yahoo!, Bing, Baidu
• Question answering
– IBM’s Watson
• Natural language assistants
– Apple’s Siri
• Translation systems
– Google Translate
• News digest
– Yahoo!
• Automated earthquake reports
– LA Times
• Automated stock market reports
– Narrative Science
Notes
• Computers are confused by (human) language
– Specific techniques are needed
– NLP draws on research in Linguistics, Theoretical Computer Science,
Mathematics, Statistics, Artificial Intelligence, Psychology, Databases, etc.
• Goals of this class
– Understand that language processing is hard (and why)
– Understand the key problems in NLP
– Learn about the methods used to address these problems
– Understand the limitations of these methods
EECS 595/LING 541/SI 561
• Instructor:
– Dragomir Radev (radev@umich.edu)
• Class times:
– M 3:10-5:55 in 133 Chrysler
• GSI:
– Catherine Finegan-Dollak (cfdollak)
• Grader:
– TBA
EECS 595/LING 541/SI 561
• Course home page:
– http://web.eecs.umich.edu/~radev/NLP-fall2015/
• Textbook:
– Speech and Language Processing
– by Jurafsky and Martin
– Second edition, 2009
– http://www.cs.colorado.edu/~martin/slp.html
– http://web.stanford.edu/~jurafsky/slp3/
• Additional readings:
– www.nltk.org
Other Available Books
• Foundations of Statistical Natural Language Processing
– Chris Manning and Hinrich Schütze
– http://nlp.stanford.edu/fsnlp/
• Natural Language Understanding
– James Allen
Course Dates
• SEP
– 14 21 28
• OCT
– 5 12 26
• NOV
– 2 9 16 23 30
• DEC
– 7 14
• no class Mon Oct 19
• midterm (unofficial) Nov 2
• last class Mon Dec 14
• exams Dec 16-23
Structure of the Course
• Four major parts:
– Linguistic, mathematical, and computational background
– Computational models of morphology, syntax, semantics, discourse, pragmatics
– Core NLP technology: parsing, part of speech tagging, text generation, semantic analysis, etc.
– Applications: text classification, sentiment analysis, text summarization, question answering, machine translation, information extraction, etc.
• Three major goals:
– Learn the basic principles and theoretical issues underlying natural language processing
– Learn techniques and tools used to develop practical, robust systems that can understand text and communicate with users in one or more languages
– Gain insight into some open research problems in natural language
Syllabus
• Book sections
– Introduction (chapter 1)
– Words (chapters 2-6)
– Syntax (chapters 12-16)
– Semantics and Pragmatics (chapters 17-21)
– Applications (chapters 22-25)
Draft Syllabus
Introduction
Language Modeling
Part-of-Speech Tagging
Hidden Markov Models
Formal Grammars of English
Syntactic Parsing
Statistical Parsing
Features and Unification
Dependency Parsing
The Representation of Meaning
Computational Semantics
Lexical Semantics
Computational Lexical Semantics
Computational Discourse
Information Extraction
Question Answering and Summarization
Dialog and Conversational Agents
Machine Translation
Sentiment and Subjectivity Analysis
Vector Semantics
Deep Learning for NLP
Grading
• Assignments
– 4 programming projects (60%)
– Midterm (15%)
– Final (20%)
– Class participation (5%)
Programming Projects
• Language Modeling and Part of Speech Tagging
• Dependency Parsing
• Vector Semantics for Word Sense Disambiguation
• Machine Translation
More Sample Projects
• Noun phrase parser
• Paraphrase identification
• Question answering
• NL access to databases
• Named entity tagging
• Rhetorical parsing
• Anaphora resolution
• Document and sentence alignment
• Using bioinformatics methods
• Information extraction
• Speech processing
• Sentence normalization
• Text summarization
• Sentence compression
• Definition extraction
• Crossword puzzle generation
• Prepositional phrase attachment
• Machine translation
• Generation
• Semi-structured document parsing
• Semantic analysis of short queries
• User-friendly summarization
• Number classification
• Time-dependent fact extraction
Courses at Other Places
• Brick-and-Mortar
– Johns Hopkins University (Jason Eisner)
– Cornell University (Lillian Lee)
– Stanford University (Chris Manning, Dan Jurafsky, Richard Socher)
– U. Maryland (Hal Daumé)
– Berkeley (Dan Klein)
– U. Texas (Ray Mooney)
• Coursera
– Manning/Jurafsky (2012, survey)
– Michael Collins (2013, more advanced)
The Association for Computational Linguistics (ACL)
www.aclweb.org
The Alphabet Soup
• NLP (Natural Language Processing)
• CL (Computational Linguistics)
• IR (Information Retrieval)
• SP (Speech Processing)
• HLT (Human Language Technology)
• NLE (Natural Language Engineering)
• ML (Machine Learning)
Research in NLP
• Conferences:
– ACL/NAACL, EMNLP, SIGIR, AAAI/IJCAI, COLING, HLT, EACL/NAACL, AMTA/MT Summit, ICSLP/Eurospeech
• Journals:
– Computational Linguistics, TACL, Natural Language Engineering, Information Retrieval, Information Processing and Management, ACM Transactions on Information Systems, ACM TALIP, ACM TSLP
• University centers:
– Berkeley, Columbia, Stanford, CMU, JHU, Brown, UMass, MIT, UPenn, USC/ISI, Illinois, Michigan, UW, Maryland, etc.
– Toronto, Edinburgh, Cambridge, Sheffield, Saarland, Trento, Prague, QCRI, NUS, and many others
• Industrial research sites:
– Google, MSR, Yahoo!, FB, IBM, SRI, BBN, MITRE, AT&T Labs
• The ACL Anthology
– http://www.aclweb.org/anthology
The ACL Anthology Network (AAN)
– http://clair.eecs.umich.edu/aan/index.php