Lecture 1 CS4705 Introduction to Natural Language Processing

What is Natural Language Processing?
• The study of human languages: how they can be
represented computationally, and how they can be
analyzed and generated algorithmically
– The cat is on the mat. --> on (cat, mat) (analysis)
– on (cat, mat) --> The cat is on the mat. (generation)
• Studying NLP involves studying natural language,
formal representations, and algorithms for their
manipulation
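The two directions in the cat/mat example, analysis (sentence to representation) and generation (representation to sentence), can be sketched for one narrow sentence type. This is a minimal illustration, assuming a single hand-written pattern for sentences of the form "The X is on the Y."; the pattern and the names `analyze` and `generate` are hypothetical, not part of any NLP library. The sketch puts the located object first, matching the in(x,y) convention in the semantics examples. Real systems need far broader coverage than one regular expression.

```python
import re

# Hypothetical single pattern: covers only "The X is on the Y."
PATTERN = re.compile(r"^The (\w+) is on the (\w+)\.$")


def analyze(sentence):
    """Analysis: map a sentence to a predicate-argument tuple, or None."""
    m = PATTERN.match(sentence)
    if m is None:
        return None  # outside the toy grammar's coverage
    # Located object first, then the object it rests on.
    return ("on", m.group(1), m.group(2))


def generate(form):
    """Generation: map ('on', X, Y) back to an English sentence."""
    pred, figure, ground = form
    if pred != "on":
        raise ValueError(f"no template for predicate {pred!r}")
    return f"The {figure} is on the {ground}."


logical_form = analyze("The cat is on the mat.")
print(logical_form)            # ('on', 'cat', 'mat')
print(generate(logical_form))  # The cat is on the mat.
```

Any sentence outside the pattern (for example "A dog barked.") yields None, which is the usual way such toy grammars signal a lack of coverage.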
What can we learn about language?
• Morphology: words and their composition
– cat, cats, dogs
– child, children
– undo, union
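The morphology examples contrast a regular suffix (cats, dogs), an irregular plural (children), and a string that only looks like it has a prefix (union versus undo). A toy segmenter makes the contrast concrete. This is a minimal sketch, assuming a hypothetical mini-lexicon and three simplified rules; the name `segment_morphemes` is illustrative. The un- rule checks that the remainder is a verb or adjective, which is why "union" (un + the noun "ion") is left whole.

```python
# Hypothetical mini-lexicon: word -> part of speech.
LEXICON = {"cat": "N", "dog": "N", "child": "N", "ion": "N",
           "union": "N", "do": "V", "happy": "ADJ"}
# Irregular forms must be listed; no suffix rule derives them.
IRREGULAR_PLURALS = {"children": "child"}


def segment_morphemes(word):
    """Split a word into morphemes using a few toy rules."""
    if word in IRREGULAR_PLURALS:
        return (IRREGULAR_PLURALS[word], "+PL")
    # Simplification: un- attaches only to verbs and adjectives,
    # so 'union' (un + noun 'ion') is not split.
    if word.startswith("un") and LEXICON.get(word[2:]) in ("V", "ADJ"):
        return ("un+", word[2:])
    # Regular plural: noun stem + -s.
    if word.endswith("s") and LEXICON.get(word[:-1]) == "N":
        return (word[:-1], "+PL")
    return (word,)


for w in ["cat", "cats", "dogs", "children", "undo", "union"]:
    print(w, segment_morphemes(w))
```

Listing irregular forms separately mirrors a common design in morphological analyzers: productive rules handle the regular cases, and an exception table handles the rest.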
• Phonetics and Phonology: speech sounds, their
production, and the rule systems that govern their
use
– tap, butter
– nice white rice; height/hot; kite/cot; night/not...
– city hall, parking lot, city hall parking lot
– The cat is on the mat. The cat is on the mat?
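Assuming the "tap, butter" example refers to the tap (flap) [ɾ] that replaces /t/ in American English "butter", a phonological rule of that kind can be sketched as a rewrite over a list of segments. The transcription scheme (vowels tagged 1 for stressed, 0 for unstressed) and the function names are hypothetical, and the rule is simplified: the real process also applies after /r/ (as in "party") and across word boundaries (as in "get it").

```python
def is_vowel(seg):
    """In this toy scheme, vowels carry a stress digit: 1 = stressed, 0 = unstressed."""
    return seg[-1] in "01"


def apply_flapping(segments):
    """Rewrite /t/ or /d/ as the tap [ɾ] between a vowel and an unstressed vowel."""
    out = list(segments)
    for i in range(1, len(segments) - 1):
        prev, cur, nxt = segments[i - 1], segments[i], segments[i + 1]
        if cur in ("t", "d") and is_vowel(prev) and is_vowel(nxt) and nxt.endswith("0"):
            out[i] = "ɾ"
    return out


print(apply_flapping(["b", "ʌ1", "t", "ɚ0"]))  # butter: t becomes ɾ
print(apply_flapping(["t", "æ1", "p"]))         # tap: word-initial t unchanged
print(apply_flapping(["ə0", "t", "æ1", "k"]))   # attack: next vowel stressed, unchanged
```

The conditioning environment (what precedes and follows the segment) is the essential part of the rule; the same /t/ surfaces differently in "tap", "butter", and "attack".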
• Syntax: the structuring of words into larger
phrases
– John hit Bill
– Bill was hit by John (passive)
– Bill, John hit (preposing)
– Who John hit was Bill (wh-cleft)
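All four syntax examples describe the same event, roughly hit(John, Bill), arranged in different syntactic structures. As a minimal sketch, a hypothetical `surface_forms` function can realize that one predicate-argument structure with four fixed string templates. A real generator would derive these structures from a grammar rather than hard-code word order, and would look up the verb's past and participle forms in a lexicon instead of taking them as parameters.

```python
def surface_forms(agent, patient, past="hit", participle="hit"):
    """Realize hit(agent, patient) in four constructions via fixed templates."""
    return {
        "active": f"{agent} {past} {patient}",
        "passive": f"{patient} was {participle} by {agent}",
        "preposing": f"{patient}, {agent} {past}",
        "wh-cleft": f"Who {agent} {past} was {patient}",
    }


for name, sentence in surface_forms("John", "Bill").items():
    print(f"{name:10} {sentence}")
```

Because the agent and patient are passed in once, every template preserves who did what to whom, which is exactly the information the four sentences share.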
• Semantics: the (truth-functional) meaning of
words and phrases
– gun(x) & holster(y) & in(x,y)
– fake (gun (x)) (compositional semantics)
– The king of France is bald (presupposition violation)
– bass fishing, bass playing (word sense disambiguation)
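Truth-functional meaning can be made concrete by evaluating a formula against a model. A minimal sketch, assuming a hypothetical model with one real gun, one fake gun, and one holster, and reading gun(x) & holster(y) & in(x,y) with x and y existentially quantified. "fake" is treated, as an assumption, as a function from a predicate to a new predicate whose extension need not overlap the original, so fake(gun(x)) does not imply gun(x). The names `EXT`, `FAKE`, `holds`, and `fake` are illustrative.

```python
from itertools import product

# Hypothetical model: a domain of entities and the extension
# (set of satisfying entities or pairs) of each predicate.
DOMAIN = {"g1", "g2", "h1"}
EXT = {
    "gun": {"g1"},
    "holster": {"h1"},
    "in": {("g1", "h1")},
}
# Assumed treatment of "fake": it yields a new extension, so fake guns
# are not in the extension of "gun".
FAKE = {"gun": {"g2"}}


def holds(pred, *args):
    """True if the predicate is satisfied by the argument(s) in the model."""
    key = args[0] if len(args) == 1 else args
    return key in EXT[pred]


def gun_in_holster():
    """Evaluate: exists x, y such that gun(x) & holster(y) & in(x, y)."""
    return any(
        holds("gun", x) and holds("holster", y) and holds("in", x, y)
        for x, y in product(DOMAIN, repeat=2)
    )


def fake(pred):
    """Return a one-place predicate for fake(pred)."""
    return lambda x: x in FAKE.get(pred, set())


print(gun_in_holster())    # True
print(fake("gun")("g2"))   # True
print(holds("gun", "g2"))  # False
```

Changing the model (for example, removing ("g1", "h1") from EXT["in"]) flips the formula to False, which is what it means for its meaning to be its truth conditions.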
• Pragmatics and Discourse: the meaning of words
and phrases in context
– George got married and had a baby.
– George had a baby and got married.
– Some people left early.
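The first two pragmatics examples have the same truth conditions if "and" is read as logical conjunction, yet they suggest opposite orders of events; that ordering is usually analyzed as a conversational implicature rather than part of the meaning of "and". A minimal sketch, assuming a hypothetical timeline for George, contrasts the truth-functional reading with an enriched "and then" reading (the function names are illustrative):

```python
# Hypothetical timeline: event -> time step at which it happened.
TIMELINE = {"got married": 1, "had a baby": 2}


def truth_functional_and(e1, e2):
    """Logical 'and': true if both events happened, in either order."""
    return e1 in TIMELINE and e2 in TIMELINE


def and_then_reading(e1, e2):
    """Enriched 'and then' reading: both happened and e1 came first."""
    return truth_functional_and(e1, e2) and TIMELINE[e1] < TIMELINE[e2]


married_first = ("got married", "had a baby")  # George got married and had a baby.
baby_first = ("had a baby", "got married")     # George had a baby and got married.

print(truth_functional_and(*married_first), truth_functional_and(*baby_first))  # True True
print(and_then_reading(*married_first), and_then_reading(*baby_first))          # True False
```

Only the enriched reading distinguishes the two sentences, which is why the difference is attributed to context and interpretation rather than to truth-functional semantics.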
Prosodic Variation
• German teachers
• Bill doesn’t drink because he’s unhappy.
• John only introduced Mary to Sue.
• John called Bill a Republican and then he insulted
him.
• John likes his mother, and so does Bill.
NLP Applications
• Speech Synthesis, Speech Recognition, IVR
Systems (TOOT: more or less succeeds)
• Information Retrieval (SCANMail demo)
• Information Extraction
– Question Answering (AQUA)
• Machine Translation (SYSTRAN)
• Summarization (NewsBlaster)
• Automated Psychotherapy (Eliza)
AQUA Demo
Bureaucracy
• Instructor: Julia Hirschberg
– Office and hours: CEPSR 705, TTh 2:30-3:30
• Teaching Assistant: Ani Nenkova
– Office and hours: CEPSR 721, M 10-12
• Syllabus available at
http://www.cs.columbia.edu/~julia/cs4705/syllabus.html
• Text: Daniel Jurafsky and James H. Martin,
Speech and Language Processing, Prentice-Hall,
2000 (available at the Columbia bookstore)
– Errata are available on the website
• Assignments: 3 homework assignments, a midterm,
and a final exam
– Evaluation: 50% homework + 50% exams
Academic Integrity
Copying or paraphrasing someone's work (code
included), or permitting your own work to be copied
or paraphrased, even if only in part, is forbidden, and
will result in an automatic grade of 0 for the entire
assignment or exam in which the copying or
paraphrasing was done. Your grade should reflect
your own work. If you expect to have trouble
completing an assignment, please talk to the
instructor or TA before the due date. Everyone:
read/write-protect your homework files at all times.
Questions
• Name
• Email address
• Undergrad/Grad
• Major/Specialization
• Previous language study
• Natural Languages
– Your native language
– Languages you are fluent in
– Languages you have some facility in
• Anything else?
For Next Class
• Read Chapters 1-2
• For fun: Experiment with Eliza:
– Does she pass the Turing Test?
– What kind of input defeats her?
– How could you improve her ability to fool people into
thinking she is human?
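Eliza works by matching keyword patterns in the user's input and echoing fragments back with first- and second-person words swapped. A minimal sketch in that spirit, assuming three hypothetical rules and a small reflection table (none of these are the original program's actual script), shows both the trick and one weakness: any input that matches no rule gets a stock reply.

```python
import re

# Hypothetical rules: (pattern, response template using captured groups).
RULES = [
    (re.compile(r"\bI am (.*)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.*)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
# Swap first- and second-person words so echoed fragments read naturally.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "I", "your": "my"}


def reflect(fragment):
    """Swap pronouns word by word in an echoed fragment."""
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in fragment.split())


def respond(utterance):
    """Return the first matching rule's response, else a stock reply."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        m = pattern.search(text)
        if m:
            return template.format(*(reflect(g) for g in m.groups()))
    return "Please go on."  # fallback when no rule matches


print(respond("I am sad about my job."))  # Why do you say you are sad about your job?
print(respond("My mother called."))       # Tell me more about your mother.
print(respond("What is 2+2?"))            # Please go on.
```

Questions, arithmetic, or references to earlier turns fall through to the fallback, since the sketch keeps no memory and has no model of meaning; those are natural places to start probing the real Eliza.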