lecture1

LING 388: Language and Computers
Sandiway Fong
Lecture 1
Administrivia
• Where and when
– AME S314 Tuesdays/Thursdays 3:30-4:45PM
• No Class
– November 11th (Veterans Day)
– November 27th (Thanksgiving)
• Office Hours
– catch me after class, or
– drop by my office (or make appt.)
– Location: Douglass 311
• TA:
– Benjamin Martin bamartin@email.arizona.edu
Administrivia
• Email
– sandiway@email.arizona.edu
• Homepage
– http://dingo.sbs.arizona.edu/~sandiway
– or Google me by my first name “sandiway”
• Lecture slides:
– available on the homepage during and after each class (may be updated)
– in both PowerPoint (.pptx) and Adobe PDF formats
• .pptx slides may contain animation
Administrivia
• Tips on how to take this class
– No required textbook
– Lecture slides contain everything you need to know in order to do the homeworks
• To understand the slides, you need to attend class to “grok” the concepts
– Unclear on something?
• You are encouraged to ask questions in or after class
• Ask while the question is still fresh in your mind
• Review the lecture video (NEW this year)
– Practice
• You can’t get good at computers just by reading a text
• This is a hands-on class: try the exercises
Administrivia
• Course Objectives
– Theoretical
• Introduction to natural language processing techniques
– Practical
• Be able to write a natural language grammar that runs on a computer
• Get an idea of what’s hard and what’s easy to do on a computer
Outcome: by the end of the course, you will have built a small machine translation engine
Administrivia
• This semester, we will explore two parallel tracks
– one: roll our own, i.e. learn how to write grammar rules
– two: learn to use software available for language analysis
Grammar Rules
• Rules of Syntax:
– S → NP VP
– S = sentence
– NP = noun phrase
– VP = verb phrase
• Example:
• VP → V NP
Grammar Rules
• Rules of Syntax:
– S → NP VP
– VP → V NP
– NP → D N
– S = sentence
– NP = noun phrase
– VP = verb phrase
– V = verb
– N = noun
– D = determiner
• Example:
Grammar Rules
• It can get much more complicated (with modern theories), but the essential idea is the same …
Grammar Rules
• Rules:
– S → NP VP
– VP → V NP
– NP → D N
• On a computer: Prolog language
– http://www.learnprolognow.org
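The course runs these rules in Prolog (for example, in Prolog's DCG notation the first rule can be written `s --> np, vp.`). As a language-neutral illustration, here is a minimal Python sketch of a top-down recognizer for the same three rules. The lexicon (`the`, `a`, `man`, `dog`, `saw`, `chased`) and the function names are hypothetical, since the slide's example sentence did not survive.

```python
# Grammar from the slides: S -> NP VP, VP -> V NP, NP -> D N
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["D", "N"]],
}

# Hypothetical lexicon (not from the lecture): word -> set of categories
LEXICON = {
    "the": {"D"},
    "a": {"D"},
    "man": {"N"},
    "dog": {"N"},
    "saw": {"V"},
    "chased": {"V"},
}


def parse(cat, words, i):
    """Yield (tree, j) for each way category `cat` can cover words[i:j]."""
    if cat in GRAMMAR:
        for rhs in GRAMMAR[cat]:
            yield from expand(cat, rhs, words, i)
    elif i < len(words) and cat in LEXICON.get(words[i], set()):
        # Preterminal (D, N, V): match exactly one word of this category
        yield (cat, words[i]), i + 1


def expand(cat, rhs, words, i):
    """Match the right-hand side of one rule, left to right."""
    def go(k, pos, kids):
        if k == len(rhs):
            yield (cat, *kids), pos
            return
        for sub, nxt in parse(rhs[k], words, pos):
            yield from go(k + 1, nxt, kids + [sub])
    yield from go(0, i, [])


def parses(sentence):
    """Return every S tree that spans the whole sentence."""
    words = sentence.split()
    return [tree for tree, j in parse("S", words, 0) if j == len(words)]


for tree in parses("the man saw a dog"):
    print(tree)
```

Strings the rules do not license, such as "the man saw" (the VP is missing its object NP), simply yield no trees.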
Grammar-based Translator
• Example… (diagram: an English grammar and a Japanese grammar connected by “glue”)
Grammar-based Translator
• Idioms (language-specific):
– John kicked the bucket
– gomasuri (ごますり)
Grammar-based Translator
• gomasuri (ごますり)
– taroo-ga sensei-ni goma-o sutta
  taroo-nom teacher-dat sesame-acc ground
  “John flattered the teacher”
– taroo-ga Hanako-ni goma-o sutta
  taroo-nom Hanako-dat sesame-acc ground
  “John flattered Mary”
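One way the “glue” between the two grammars could treat this idiom is sketched below in Python. It assumes the Japanese input is already romanized and segmented into noun-particle tokens exactly as in the glosses, and the lexicon, idiom table and function names are hypothetical, built only from those glosses. The point is that the pair goma-o + sutta is looked up as a unit and mapped to “flattered”, with the dative NP becoming the English object, instead of being translated word by word as “ground sesame”.

```python
# Hypothetical bilingual lexicon, built only from the glosses on the slide
LEXICON = {
    "taroo": "John",
    "hanako": "Mary",
    "sensei": "the teacher",
    "goma": "sesame",
    "sutta": "ground",
}

# Hypothetical idiom table: (object noun, verb) -> English verb whose
# object is the Japanese dative (-ni) phrase
IDIOMS = {
    ("goma", "sutta"): "flattered",
}

# Case particles -> grammatical roles
CASE_ROLES = {"ga": "subj", "ni": "dat", "o": "obj"}


def analyze(sentence):
    """Split romanized 'noun-particle ... verb' (SOV order) into roles."""
    *args, verb = sentence.lower().split()
    roles = {"verb": verb}
    for token in args:
        noun, particle = token.split("-")
        roles[CASE_ROLES[particle]] = noun
    return roles


def translate(sentence):
    """Glue: map Japanese roles onto English SVO order, checking idioms first."""
    r = analyze(sentence)
    subj = LEXICON[r["subj"]]
    idiom = IDIOMS.get((r.get("obj"), r["verb"]))
    if idiom and "dat" in r:
        # Idiom: the whole 'X-ni goma-o suru' unit becomes 'flatter X'
        return f"{subj} {idiom} {LEXICON[r['dat']]}"
    # Literal fallback: word-by-word translation, reordered SOV -> SVO
    return f"{subj} {LEXICON[r['verb']]} {LEXICON[r['obj']]}"


print(translate("taroo-ga sensei-ni goma-o sutta"))  # → John flattered the teacher
print(translate("taroo-ga Hanako-ni goma-o sutta"))  # → John flattered Mary
```

Without the dative phrase the idiom does not fire, so "taroo-ga goma-o sutta" falls back to the literal "John ground sesame".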
Grammar-based Translator
• What about state of the art systems?
Administrivia
• Laboratory Exercises
– Some lectures will be laboratory sessions
– We will do exercises in class
• use your own laptop
– Homework questions will be handed out in these sessions
– Homework questions are designed to extend the exercises done in the lab
Administrivia
• Grading
– 6-7 homeworks
– Mandatory and extra credit questions:
• extra credit questions may be applied to the current homework
• they may also bump you up a grade if you are borderline at the end of the semester
– Homeworks are typically due one week after they are handed out
– Homeworks must be submitted by email to me (by midnight)
– Example:
• a homework given out on Tuesday will be due the following Tuesday at midnight
• Ethics
– You may discuss the homeworks with your classmates
– However, you must do the work and write it up independently
– Sources must be acknowledged (students, webpages)
– UA Code of Academic Integrity
• http://deanofstudents.arizona.edu/codeofacademicintegrity
Administrivia
• Late Policy
– All homeworks are mandatory
– points are deducted if handed in late
– If you know you’re going to be late or have an upcoming
emergency, let me know ahead of time
• Homework tips
– Homeworks are based on lab exercises
• I don’t take attendance, but practice is essential to understanding
– Nightmare strategy: wait until the evening the homework is due, scratch your head over the lecture notes, have tons of questions and start panicking
• your computer crashes, the net goes down …
Natural Language Processing (NLP)
= Human Language Technology (HLT)
= Computational Linguistics
• Research Question:
– What methods can we use to process natural
languages on a computer?
• Intersects with:
– Computer science (CS)
– Mathematics/Statistics
– Artificial intelligence (AI)
– Linguistic theory
– Psychology: psycholinguistics
• e.g. the human sentence processor
Applications
• Information retrieval
– information is stored and accessed using language (keywords etc.)
– document classification (email, news)
• Machine translation
– babelfish (now Bing)
• http://babelfish.altavista.com/
– Google
• http://translate.google.com
• Language Comprehension
– document summarization
• Jeopardy (Quiz show)
• Speech
– automated 800 toll-free directory (800 555 1212)
– cellphones (handsfree dialing)
– car navigation (voice-synthesized directions)
Applications
– technology is still under development, in its infancy …
• computers can’t really understand language (yet)
– see Google webpage translation
– at least it’s free!
• even if we are willing to pay...
– machine translation has been worked on since shortly after World War II (the 1950s) and is still not perfected today
– why?
– what are the properties of human languages that make it hard?
Natural Language Properties
• Which ones are going to be difficult for computers to deal with?
• Grammar
(Rules for putting words together into sentences)
– How many rules are there?
• 100, 1000, 10000, more …
– Are portions learnt or innate?
– Do we have all the rules written down somewhere?
• Lexicon (Dictionary)
– How many words do we need to know?
• 1000, 10000, 100000 …
Computers vs. Humans
• Knowledge of language
– Computers are way faster than humans
• They kill us at arithmetic and chess
– But human beings are so good at
language, we often take our ability for
granted
• Processed without conscious thought
• Do pretty complex things
Examples
• Ambiguity
– Where can I see the bus stop?
– stop: verb or part of the noun-noun compound bus stop
– Context (discourse or situation)
Let’s explore what computer software can do
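To see why “see the bus stop” gets two analyses, here is a minimal Python sketch of a CKY chart parser that counts parses. The grammar and lexicon are hypothetical, written only for this example in Chomsky normal form: “stop” is listed both as a noun (forming the compound “bus stop”) and as a bare verb (as in “see [the bus] stop”, i.e. watch the bus come to a halt).

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left, right)
BINARY = [
    ("VP", "V", "NP"),    # see [the bus stop]
    ("VP", "V", "SC"),    # see [the bus stop], read as a small clause
    ("SC", "NP", "Vb"),   # [the bus] [stop]  (bare verb)
    ("NP", "D", "N"),     # the bus
    ("NP", "D", "Nom"),   # the [bus stop]
    ("Nom", "N", "N"),    # bus stop  (noun-noun compound)
]

# Hypothetical lexicon: "stop" is ambiguous between noun and verb
LEXICON = {
    "see": {"V"},
    "the": {"D"},
    "bus": {"N"},
    "stop": {"N", "Vb"},
}


def count_parses(words, goal):
    """CKY chart: chart[i][j][A] = number of ways A derives words[i:j]."""
    n = len(words)
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    # Fill in single-word spans from the lexicon
    for i, w in enumerate(words):
        for cat in LEXICON[w]:
            chart[i][i + 1][cat] += 1
    # Combine adjacent spans, shortest spans first
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for parent, left, right in BINARY:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][goal]


print(count_parses("see the bus stop".split(), "VP"))  # → 2
```

The chart does not say which reading is intended; as the slide notes, that choice depends on context (discourse or situation).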
Examples
Available online
http://nlp.stanford.edu:8080/parser/
Examples
• Changes in interpretation
• John is too stubborn to talk to
• John is too stubborn to talk to Bill
Examples
• Stanford parser:
Download