(Lecture for CS410 Text Information Systems)
Jan 28, 2011
ChengXiang Zhai
Department of Computer Science
University of Illinois, Urbana-Champaign
• What is NLP?
• A brief history of NLP
• The current state of the art
• NLP and text management
How can a computer make sense out of this string?
- Morphology: What are the basic units of meaning (words)? What is the meaning of each word?
- Syntax: How are words related to each other?
- Semantics: What is the “combined meaning” of words?
- Pragmatics: What is the “meta-meaning”? (speech act)
- Discourse: Handling a large chunk of text
- Inference: Making sense of everything
Sentence: A dog is chasing a boy on the playground
Lexical analysis (part-of-speech tagging):
  A/Det dog/Noun is/Aux chasing/Verb a/Det boy/Noun on/Prep the/Det playground/Noun
Syntactic analysis (parsing):
  [Sentence [Noun Phrase A dog] [Verb Phrase [Verb Phrase [Complex Verb is chasing] [Noun Phrase a boy]] [Prep Phrase on [Noun Phrase the playground]]]]
Semantic analysis:
  Dog(d1). Boy(b1). Playground(p1). Chasing(d1,b1,p1).
Inference:
  + Scared(x) if Chasing(_,x,_). => Scared(b1)
Pragmatic analysis (speech act):
  A person saying this may be reminding another person to get the dog back…
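The inference step in this example can be sketched in a few lines of Python. This is only an illustration, not how a real inference engine works: the tuple encoding of facts and the function name `apply_scared_rule` are assumptions made for this sketch, and the single rule from the slide is hard-coded.

```python
# Facts from the semantic analysis step, stored as (predicate, arguments) tuples.
facts = {
    ("Dog", ("d1",)),
    ("Boy", ("b1",)),
    ("Playground", ("p1",)),
    ("Chasing", ("d1", "b1", "p1")),
}


def apply_scared_rule(known_facts):
    """Apply the rule Scared(x) if Chasing(_, x, _) once; return only the new facts."""
    derived = set()
    for predicate, args in known_facts:
        if predicate == "Chasing" and len(args) == 3:
            # The second argument of Chasing is the one being chased.
            derived.add(("Scared", (args[1],)))
    return derived - known_facts


new_facts = apply_scared_rule(facts)
print(new_facts)  # {('Scared', ('b1',))}
```

The hard part in practice is not applying such a rule but obtaining the facts and the common-sense rules in the first place.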
BAD NEWS:
Unfortunately, we can’t.
General NLP = “AI-Complete”
• Natural language is designed to make human communication efficient. As a result,
– we omit a lot of “common sense” knowledge, which we assume the hearer/reader possesses
– we keep a lot of ambiguities, which we assume the hearer/reader knows how to resolve
• This makes EVERY step in NLP hard
– Common-sense reasoning is a prerequisite
• Word-level ambiguity: E.g.,
– “design” can be a noun or a verb (Ambiguous POS)
– “root” has multiple meanings (Ambiguous sense)
• Syntactic ambiguity: E.g.,
– “natural language processing” (Modification)
– “A man saw a boy with a telescope.” (PP Attachment)
• Anaphora resolution: “John persuaded Bill to buy a TV for himself.” (himself = John or Bill?)
• Presupposition: “He has quit smoking.” implies that he smoked before.
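The PP-attachment ambiguity can be made concrete by counting parse trees. The sketch below uses a CKY-style chart over a tiny grammar; the grammar, the lexicon, and the function name `count_parses` are all made up for this illustration and are not taken from the lecture.

```python
from collections import defaultdict

# Toy grammar in binary form: (parent, left child, right child). Hypothetical.
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # PP modifies a noun phrase
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP modifies a verb phrase
    ("PP", "P", "NP"),
]

# Toy lexicon: word -> possible tags. Hypothetical.
LEXICON = {
    "a": ["Det"],
    "man": ["N"],
    "boy": ["N"],
    "telescope": ["N"],
    "saw": ["V"],
    "with": ["P"],
}


def count_parses(words, root="S"):
    """Count the distinct parse trees for `words` with a CKY chart."""
    n = len(words)
    chart = defaultdict(int)  # (start, end, symbol) -> number of trees
    for i, word in enumerate(words):
        for tag in LEXICON[word]:
            chart[(i, i + 1, tag)] += 1
    for span in range(2, n + 1):
        for start in range(n - span + 1):
            end = start + span
            for split in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    chart[(start, end, parent)] += (
                        chart[(start, split, left)] * chart[(split, end, right)]
                    )
    return chart[(0, n, root)]


sentence = "A man saw a boy with a telescope".lower().split()
print(count_parses(sentence))  # 2
```

The two trees correspond to the two readings: the man used the telescope to see (VP attachment), or the boy had the telescope (NP attachment).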
• Early enthusiasm (1950’s): Machine Translation
– Too ambitious
– Bar-Hillel report (1960) concluded that fully-automatic high-quality translation could not be accomplished without knowledge (Dictionary + Encyclopedia)
• Less ambitious applications (late 1960’s & early 1970’s): Limited success, failed to scale up
– Speech recognition
– Shallow understanding
– Deep understanding in limited domain
• Real world evaluation (late 1970’s – now)
– Story understanding (late 1970’s & early 1980’s) → Knowledge representation
– Large scale evaluation of speech recognition, text retrieval, information extraction (1980 – now) → Robust component techniques
– Statistical approaches enjoy more success (first in speech recognition & retrieval, later others) → Stat. language models
• Current trend:
– Heavy use of machine learning techniques → Learning-based NLP
– Boundary between statistical and symbolic approaches is disappearing.
– We need to use all the available knowledge
– Application-driven NLP research (bioinformatics, Web, Question answering…) → Applications
A dog is chasing a boy on the playground
Det Noun Aux Verb Det Noun Prep Det Noun
POS tagging: 97%
Parsing: partial >90%(?)
  [Sentence [Noun Phrase A dog] [Verb Phrase [Verb Phrase [Complex Verb is chasing] [Noun Phrase a boy]] [Prep Phrase on [Noun Phrase the playground]]]]
Semantics: some aspects
- Entity/relation extraction
- Word sense disambiguation
- Anaphora resolution
Speech act analysis: ???
Inference: ???
Training data (Annotated text):
This sentence serves as an example of annotated text…
Det  N  V1  P  Det  N  P  V2  N
Input: “This is a new sentence” → POS Tagger → Output:
This is a new sentence
Det Aux Det Adj N
Consider all possibilities, and pick the one with the highest probability:
This is a new sentence
Det Det Det Det Det
… …
Det Aux Det Adj N
… …
V2 V2 V2 V2 V2
With w_1=“this”, w_2=“is”, …, and t_1=Det, t_2=Det, …, the probability of a tagging is
$$
p(w_1,\ldots,w_k,t_1,\ldots,t_k)=
\begin{cases}
p(t_1|w_1)\cdots p(t_k|w_k)\,p(w_1)\cdots p(w_k) & \text{Method 1: Independent assignment (most common tag)}\\
\prod_{i=1}^{k} p(w_i|t_i)\,p(t_i|t_{i-1}) & \text{Method 2: Partial dependency}
\end{cases}
$$
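The two tagging methods can be sketched as follows. This is a minimal illustration under stated assumptions: the three-sentence training corpus is invented, add-one smoothing is an assumption (the slide does not say how probabilities are estimated), and the names `most_common_tag` and `viterbi` are chosen for this sketch. Method 2 is decoded with the Viterbi algorithm rather than by enumerating every tag sequence.

```python
from collections import Counter, defaultdict

# Tiny hand-made tagged corpus (hypothetical, for illustration only).
TAGGED = [
    [("this", "Det"), ("is", "Aux"), ("a", "Det"), ("dog", "N")],
    [("this", "Det"), ("sentence", "N"), ("is", "Aux"), ("new", "Adj")],
    [("a", "Det"), ("new", "Adj"), ("dog", "N"), ("is", "Aux"),
     ("a", "Det"), ("sentence", "N")],
]
START = "<s>"  # plays the role of t_0

word_tag = defaultdict(Counter)  # word -> counts of its tags
tag_word = defaultdict(Counter)  # tag -> counts of words it emits
tag_next = defaultdict(Counter)  # previous tag -> counts of the following tag
for sent in TAGGED:
    prev = START
    for word, tag in sent:
        word_tag[word][tag] += 1
        tag_word[tag][word] += 1
        tag_next[prev][tag] += 1
        prev = tag

TAGS = sorted(tag_word)
VOCAB = sorted(word_tag)


def most_common_tag(words):
    """Method 1: tag each (in-vocabulary) word with its most frequent tag."""
    return [word_tag[w].most_common(1)[0][0] for w in words]


def p_emit(word, tag):
    """Add-one smoothed estimate of p(word | tag)."""
    return (tag_word[tag][word] + 1) / (sum(tag_word[tag].values()) + len(VOCAB))


def p_trans(tag, prev):
    """Add-one smoothed estimate of p(tag | prev)."""
    return (tag_next[prev][tag] + 1) / (sum(tag_next[prev].values()) + len(TAGS))


def viterbi(words):
    """Method 2: maximize prod_i p(w_i|t_i) p(t_i|t_{i-1}) by dynamic programming."""
    # best[t] = (probability, tag path) of the best sequence ending in tag t.
    best = {t: (p_trans(t, START) * p_emit(words[0], t), [t]) for t in TAGS}
    for word in words[1:]:
        new_best = {}
        for t in TAGS:
            prob, path = max(
                ((best[p][0] * p_trans(t, p), best[p][1]) for p in TAGS),
                key=lambda item: item[0],
            )
            new_best[t] = (prob * p_emit(word, t), path + [t])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


words = "this is a new sentence".split()
print(most_common_tag(words))  # ['Det', 'Aux', 'Det', 'Adj', 'N']
print(viterbi(words))          # ['Det', 'Aux', 'Det', 'Adj', 'N']
```

On this toy input both methods agree; they differ on words whose most frequent tag is wrong in context, where the transition term p(t_i|t_{i-1}) can override the word-level preference.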
Grammar (rule, probability):
S → NP VP 1.0
NP → Det BNP 0.3
NP → BNP 0.4
NP → NP PP 0.3
BNP → N …
VP → V …
VP → Aux V NP …
VP → VP PP …
PP → P NP 1.0
…
Lexicon (with probabilities such as 0.01, 0.003, …):
V → chasing
Aux → is
N → dog
N → boy
N → playground
Det → the
Det → a
P → on
…
Generate:
Tree 1: (S (NP (Det A) (BNP (N dog))) (VP (VP (Aux is) (V chasing) (NP a boy)) (PP (P on) (NP the playground))))
Probability of this tree=0.000015
Tree 2: (S (NP (Det A) (BNP (N dog))) (VP (Aux is) (V chasing) (NP (NP a boy) (PP (P on) (NP the playground)))))
Choose a tree with highest prob….
Can also be treated as a classification/decision problem…
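Under a probabilistic grammar, the probability of a tree is the product of the probabilities of all rules used in it. The sketch below compares the two attachments of “on the playground”. The S, NP, and PP rule probabilities follow the slide as best they can be read; every other number (the VP rules, BNP → N, and all lexicon entries) is a made-up placeholder, so the code does not reproduce the slide's 0.000015. The tuple encoding of trees and the helper names are also assumptions of this sketch.

```python
RULE_PROB = {
    # Values read from the slide.
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "BNP")): 0.3,
    ("NP", ("BNP",)): 0.4,
    ("NP", ("NP", "PP")): 0.3,
    ("PP", ("P", "NP")): 1.0,
    # Placeholder values (hypothetical; not from the slide).
    ("BNP", ("N",)): 1.0,
    ("VP", ("V",)): 0.2,
    ("VP", ("Aux", "V", "NP")): 0.4,
    ("VP", ("VP", "PP")): 0.4,
    ("Det", ("a",)): 0.5,
    ("Det", ("the",)): 0.5,
    ("N", ("dog",)): 0.3,
    ("N", ("boy",)): 0.3,
    ("N", ("playground",)): 0.4,
    ("Aux", ("is",)): 1.0,
    ("V", ("chasing",)): 1.0,
    ("P", ("on",)): 1.0,
}


def tree_prob(tree):
    """Multiply the probabilities of every rule used in `tree`.

    A tree is a tuple (label, child, ...); a child is a subtree or a word string.
    """
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    prob = RULE_PROB[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            prob *= tree_prob(child)
    return prob


def noun_phrase(det, noun):
    """Build (NP (Det det) (BNP (N noun)))."""
    return ("NP", ("Det", det), ("BNP", ("N", noun)))


DOG = noun_phrase("a", "dog")
BOY = noun_phrase("a", "boy")
ON_PLAYGROUND = ("PP", ("P", "on"), noun_phrase("the", "playground"))

# Tree 1: the PP attaches to the verb phrase (VP -> VP PP).
VP_ATTACH = ("S", DOG, ("VP", ("VP", ("Aux", "is"), ("V", "chasing"), BOY), ON_PLAYGROUND))
# Tree 2: the PP attaches to "a boy" (NP -> NP PP).
NP_ATTACH = ("S", DOG, ("VP", ("Aux", "is"), ("V", "chasing"), ("NP", BOY, ON_PLAYGROUND)))

best = max([VP_ATTACH, NP_ATTACH], key=tree_prob)
print("VP attachment" if best is VP_ATTACH else "NP attachment")
```

Because the two trees share every rule except VP → VP PP versus NP → NP PP, the choice here comes down to which of those two rules has the higher probability.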
Only successful for VERY limited domains or for SOME aspects of semantics
E.g.,
– Entity extraction (e.g., recognizing a person’s name): use rules and/or machine learning
– Word sense disambiguation: addressed as a classification problem with supervised learning
– Sentiment tagging
– Anaphora resolution …
In general, exploiting machine learning and statistical language models…
• 100% POS tagging
– “He turned off the highway.” vs “He turned off the fan.”
• General complete parsing
– “A man saw a boy with a telescope.”
• Deep semantic analysis
– Will we ever be able to precisely define the meaning of “own” in “John owns a restaurant.”?
Robust & general NLP tends to be “shallow”, while “deep” understanding doesn’t scale up …
• Speech recognition: e.g., Auto telephone call routing
• Text management (our focus)
– Text retrieval/filtering
– Text classification
– Text summarization
– Text mining
– Query answering
• Language tutoring
– Spelling/grammar correction
• Machine translation
– Cross-language retrieval
– Restricted natural language
• Natural language user interface
Better NLP => Better Text Management
Bad NLP => Bad Text Management?
Robust, shallow NLP tends to be more useful than deep, but fragile NLP.
Errors in NLP can hurt text management performance…
Tasks, ordered by increasing dependency on NLP (tasks nearer the top are “easier” and have more “workarounds”):
Classification/Retrieval
Summarization/Extraction/Mining
Translation/Dialogue
Question Answering
ELIZA plays the role of a therapist
How does it work?
Person : Men are all alike.
ELIZA : In what way?
Person : They're always bugging us about something or other.
ELIZA : Can you think of a specific example?
Person : Well, my boyfriend made me come here.
ELIZA : Your boyfriend made you come here?
Person : He says I'm depressed much of the time.
ELIZA : I am sorry to hear that you are depressed.
Person : It's true. I am unhappy.
ELIZA : Do you think coming here will help you not to be unhappy?
Person : I need some help, that much seems certain.
ELIZA : What would it mean to you if you got some help?
Person : Perhaps I could learn to get along with my mother.
ELIZA : Tell me more about your family
Heuristic rules based on keyword matching! E.g.,
I remember X => Do you often think of X?
always => Can you think of a specific example?
It keeps the dialogue going, but NEVER makes any claim!
A breakdown example:
Person : Failure is the mother of success.
ELIZA : Tell me more about your family
No NLP, but useful. Perhaps we should call this NLP?
Statistical NLP often has a similar flavor, with “SOFT” rules LEARNED from data
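A few of the keyword rules shown for ELIZA can be sketched with regular expressions. This is a minimal illustration, not ELIZA's actual rule set: the “mother” rule is inferred from the dialogue, the fallback reply is hypothetical, and pronoun reflection (“my” to “your”) is left out.

```python
import re

# (pattern, response template) pairs, tried in order; the first match wins.
RULES = [
    (re.compile(r"\bI remember (.+?)[.!?]*$", re.IGNORECASE),
     "Do you often think of {0}?"),
    (re.compile(r"\balways\b", re.IGNORECASE),
     "Can you think of a specific example?"),
    (re.compile(r"\bmother\b", re.IGNORECASE),
     "Tell me more about your family"),
]
DEFAULT_RESPONSE = "Please go on."  # hypothetical fallback, not from the lecture


def respond(utterance):
    """Return the response of the first rule whose keyword pattern matches."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(*match.groups())
    return DEFAULT_RESPONSE


print(respond("They're always bugging us about something or other."))
# Can you think of a specific example?
print(respond("Failure is the mother of success."))
# Tell me more about your family
```

The second call reproduces the breakdown example: the rule fires on the keyword “mother” with no understanding of the sentence.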
Learn how to translate Chinese to English from many example translations
Intuitions:
- If we have seen all possible translations, then we simply look them up
- If we have seen a similar translation, then we can adapt it
- If we haven’t seen any example that’s similar, we try to generalize what we’ve seen
All these intuitions are captured through a probabilistic model
Noisy channel: English Words (E), drawn from P(E) → Noisy Channel P(C|E) → Chinese Words (C) → Translator P(E|C)=? → English Translation
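The question P(E|C)=? is answered by Bayes' rule, which combines the two quantities in the noisy-channel picture. The derivation below is the standard one; the symbol E* for the chosen translation is notation introduced here.

```latex
P(E \mid C) = \frac{P(C \mid E)\,P(E)}{P(C)} \propto P(C \mid E)\,P(E),
\qquad
E^{*} = \arg\max_{E} P(E \mid C) = \arg\max_{E} P(C \mid E)\,P(E)
```

P(C) can be dropped because it does not depend on E. P(E) is a language model that prefers fluent English, and P(C|E) is a translation model estimated from the example translations.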
Statistical NLP in general, and statistical language models in particular
The need for high robustness and efficiency implies the dominant use of simple models (i.e., unigram models)
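As a concrete picture of how simple a unigram model is, the sketch below estimates word probabilities from counts and scores a short text by multiplying them, treating words as independent. The add-one smoothing, the two example texts, and the names `train_unigram` and `log_likelihood` are assumptions made for this illustration.

```python
import math
from collections import Counter


def train_unigram(tokens):
    """Return a function word -> p(word), estimated with add-one smoothing."""
    counts = Counter(tokens)
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot reserves mass for unseen words

    def prob(word):
        return (counts[word] + 1) / (total + vocab_size)

    return prob


def log_likelihood(prob, tokens):
    """Log-probability of `tokens` under a unigram model (words are independent)."""
    return sum(math.log(prob(word)) for word in tokens)


nlp_model = train_unigram(
    "natural language processing makes text management easier".split()
)
dog_model = train_unigram("a dog is chasing a boy on the playground".split())

query = "text processing".split()
print(log_likelihood(nlp_model, query) > log_likelihood(dog_model, query))  # True
```

The model ignores word order entirely, which is what makes it robust and cheap to estimate.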
• NLP is the basis for text management
– Better NLP enables better text management
– Better NLP is necessary for sophisticated tasks
• But
– Bad NLP doesn’t mean bad text management
– There are often “workarounds” for a task
– Inaccurate NLP can even hurt the performance of a task
• The most effective NLP techniques are often statistical with the help of linguistic knowledge
• The challenge is to bridge the gap between NLP and applications