Linguistics NLP

Linguistically Rich
Statistical Models of Language
Joseph Smarr
M.S. Candidate
Symbolic Systems Program
Advisor: Christopher D. Manning
December 5th, 2002
Grand Vision

Talk to your computer like another human
HAL, Star Trek, etc.
Ask your computer a question, it finds the answer
“Who’s speaking at this week’s SymSys Forum?”
Computer can read and summarize text for you
“What’s the cutting edge in NLP these days?”
We’re Not There (Yet)

Turns out behaving intelligently is difficult
What does it take to achieve the grand vision?
General Artificial Intelligence problems
Knowledge representation, common sense reasoning, etc.
Language-specific problems
Complexity, ambiguity, and flexibility of language
Always underestimated because language is so easy for us!
Are There Useful Sub-Goals?

Grand vision is still too hard, but we can
solve simpler problems that are still valuable





Filter news for stories about new tech
gadgets
Take the SSP talk email and add it to my
calendar
Dial my cell phone by speaking my friend’s
name
Automatically reply to customer service emails
Find out which episode of The Simpsons is
tonight
Theoretical Linguistics vs. NLP
Theoretical Linguistics
Goal: Understand people’s knowledge of language
Method: Rich logical representations of language’s hidden structure and meaning
Guiding principles:
Separation of (hidden) knowledge of language and (observable) performance
Grammaticality is categorical (all or none)
Describe what are possible
Natural Language Processing
Goal: Develop practical tools for analyzing speech / text
Method: Simple, robust models of everyday language use that are sufficient to perform tasks
Guiding principles:
Exploit (empirical) regularities and patterns in examples of language in text collections
Sentence “goodness” is gradient (better or worse)
Theoretical Linguistics vs.
NLP
Linguistics
NLP
Linguistic Puzzle

When dropping an argument, why do some verbs keep the subject and some keep the object?
John sang the song → John sang
John broke the vase → The vase broke
Not just “quirkiness of language”
Similar patterns show up in other languages
Seems to involve deep aspects of verb meaning
Rules to account for this phenomenon
Two classes of verbs (unergative & unaccusative)
Exception: Imperatives


“Open the pod bay doors, Hal”
Different goals lead to study of different
problems. In NLP...




Need to recognize this as a command
Need to figure out what specific action to
take
Irrelevant how you’d say it in French
Describing language vs. working with
language
Theoretical Linguistics vs.
NLP

Potential for much synergy between linguistics and NLP
However, historically they have remained quite distinct
Chomsky (founder of generative grammar):
“It must be recognized that the notion ‘probability of a sentence’ is an entirely useless one, under any known interpretation of this term.”
Karttunen (founder of finite state technologies at Xerox):
Linguists’ reaction to NLP: “Not interested. You do not understand Theory. Go away you geek.”
Jelinek (former head of IBM speech project):
“Every time I fire a linguist, the performance of our speech recognition system goes up.”
Potential Synergies

Lexical acquisition (unknown words)
Statistically infer new lexical entries from context
Modeling “naturalness” and “conventionality”
Use corpus data to weight constructions
Dealing with ungrammatical utterances
Find “most similar / most likely” correction
Richer patterns for finding information in text
Use argument structure / semantic dependencies
Finding Information in Text

US Government has sponsored lots of
research in “information extraction” from
news articles



Find mentions of terrorists and which
locations they’re targeting
Find which companies are being acquired by
which others and for how much
Progress driven by simplifying the models
used

Early work used rich linguistic parsers


Unable to robustly handle natural text
Modern work is mainly finite state patterns
Web Information Extraction

How much does that textbook cost on Amazon?
Learn patterns for finding relevant fields
Our Price: $##.##
Concept: Book
Title: Foundations of Statistical Natural Language Processing
Author(s): Christopher D. Manning & Hinrich Schütze
Price: $58.45
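The "Our Price: $##.##" pattern can be illustrated with a regular expression. This is only a hand-written sketch: the slide talks about learning such patterns automatically, whereas the pattern, the function name extract_price, and the sample page text below are all assumptions made for illustration.

```python
import re

# Hypothetical hand-written counterpart of the "Our Price: $##.##" pattern.
# A real wrapper-induction system would learn this from labeled pages.
PRICE_PATTERN = re.compile(r"Our Price:\s*\$(\d+\.\d{2})")


def extract_price(page_text):
    """Return the first price that follows 'Our Price:' as a float, or None."""
    match = PRICE_PATTERN.search(page_text)
    return float(match.group(1)) if match else None


# Made-up page snippet, for illustration only.
sample_page = (
    "Foundations of Statistical Natural Language Processing "
    "List Price: $75.00 Our Price: $58.45 You Save: $16.55"
)

print(extract_price(sample_page))
```

Anchoring on the literal label "Our Price:" is what makes this kind of pattern workable on regularly formatted web pages, and it is exactly the regularity that is missing in natural text.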
Improving IE Performance
on Natural Text Documents

How can we scale IE back up for natural text?


Need to look elsewhere for regularities to
exploit
Idea: Consider grammatical structure


Run shallow parser on each sentence
Flatten output into sequence of “typed
chunks”
Example of Tagged Sentence:
Uba2p is located largely in the nucleus.
NP_SEG VP_SEG PP_SEG NP_SEG
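The flattening step can be sketched as follows. The chunk boundaries assigned to the example sentence are an assumption (the slide only gives the tag sequence), and the function names and the (phrase type, tokens) input format are hypothetical stand-ins for whatever a real shallow parser emits.

```python
def chunk_tag_sequence(chunks):
    """Return one typed segment tag per chunk, e.g. NP -> NP_SEG."""
    return [phrase_type + "_SEG" for phrase_type, _tokens in chunks]


def flatten_chunks(chunks):
    """Flatten chunks into (token, segment_tag) pairs, one per token."""
    tagged = []
    for phrase_type, tokens in chunks:
        tag = phrase_type + "_SEG"
        tagged.extend((token, tag) for token in tokens)
    return tagged


# ASSUMED chunking of the example sentence; only the tag order
# NP_SEG VP_SEG PP_SEG NP_SEG comes from the slide.
shallow_parse = [
    ("NP", ["Uba2p"]),
    ("VP", ["is", "located", "largely"]),
    ("PP", ["in"]),
    ("NP", ["the", "nucleus"]),
]

print(chunk_tag_sequence(shallow_parse))
for token, tag in flatten_chunks(shallow_parse):
    print(token, tag)
```

Giving every token the tag of its enclosing chunk lets a token-level extraction learner condition on phrase type without having to handle nested structure.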
Power of Linguistic Features
Using typed phrase segment tags uniformly improves BWI's performance on the 4 natural text MEDLINE extraction tasks
[Figure: average performance on 4 data sets, no tags vs. tags. Precision: 21% increase; Recall: 65% increase; F1: 45% increase]
Linguistically Rich(er) IE

Exploit more grammatical structure for
patterns

e.g. Tim Grow’s work on IE with PCFGs
(S{pur, acq, amt}
  (NP{pur} (NNP{pur} First) (NNP{pur} Union) (NNP{pur} Corp))
  (VP{acq, amt}
    (MD will)
    (VP{acq, amt}
      (VB acquire)
      (NP{acq} (NNP{acq} Sheland) (NNP{acq} Bank) (NNP{acq} Inc))
      (PP{amt}
        (IN for)
        (NP{amt} (CD{amt} three) (CD{amt} million) (NNP{amt} dollars))))))
Classifying Unknown Words

Which of the following is the name of a city?
Cotrimoxazole
Wethersfield
Alien Fury: Countdown to Invasion


Most linguistic grammars assume a fixed
lexicon
How do humans learn to deal with new words?
Context (“I spent a summer living in Wethersfield”)
Makeup of the word itself (“phonesthetics”)
What’s in a Name?
[Figure: the character sequences “oxa”, “:”, and “field” compared across the categories drug, company, movie, place, person]
Generative Model of PNPs
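Length n-gram model and word model
$$P(\mathrm{pnp} \mid c) = P_{n\text{-gram}}(\text{word-lengths}(\mathrm{pnp})) \cdot \prod_{w_i \in \mathrm{pnp}} P(w_i \mid \text{word-length}(w_i))$$
Word model: mixture of character n-gram model and common word model
$$P(w_i \mid \mathit{len}) = \lambda_{\mathit{len}} \cdot P_{n\text{-gram}}(w_i \mid \mathit{len})^{k/\mathit{len}} + (1 - \lambda_{\mathit{len}}) \cdot P_{\text{word}}(w_i \mid \mathit{len})$$
N-Gram Models: deleted interpolation
$$P_{0\text{-gram}}(\text{symbol} \mid \text{history}) = \text{uniform distribution}$$
$$P_{n\text{-gram}}(s \mid h) = \lambda_{C(h)} P_{\text{empirical}}(s \mid h) + (1 - \lambda_{C(h)}) P_{(n-1)\text{-gram}}(s \mid h)$$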
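The interpolation recursion above can be sketched in a few lines of Python. Several details are assumptions, not taken from the slides: the class and method names are hypothetical, the weight is set to C(h) / (C(h) + 1) instead of being estimated on held-out data, words are padded with "^" and terminated with "$", and the tiny training lists are toy examples.

```python
import math
from collections import defaultdict


class InterpolatedCharNGram:
    """Character n-gram model that mixes each order with the next lower one."""

    def __init__(self, order):
        self.order = order                      # n in "n-gram"
        self.counts = defaultdict(int)          # (history, symbol) -> count
        self.history_counts = defaultdict(int)  # history -> C(h)
        self.alphabet = set()                   # symbols seen in training

    def train(self, words):
        for word in words:
            padded = "^" * (self.order - 1) + word + "$"
            for i in range(self.order - 1, len(padded)):
                symbol = padded[i]
                self.alphabet.add(symbol)
                for k in range(self.order):     # history lengths 0 .. order-1
                    history = padded[i - k:i]
                    self.counts[(history, symbol)] += 1
                    self.history_counts[history] += 1

    def weight(self, history_count):
        # ASSUMPTION: a simple count-based weight standing in for lambda_C(h);
        # deleted interpolation would fit these weights on held-out data.
        return history_count / (history_count + 1.0)

    def prob(self, symbol, history, n=None):
        if n is None:                           # top-level call: trim history
            n = self.order
            padded = "^" * (self.order - 1) + history
            history = padded[len(padded) - (self.order - 1):]
        if n == 0:                              # 0-gram: uniform distribution
            return 1.0 / len(self.alphabet)
        c_h = self.history_counts.get(history, 0)
        lam = self.weight(c_h)
        empirical = self.counts.get((history, symbol), 0) / c_h if c_h else 0.0
        lower = self.prob(symbol, history[1:], n - 1)
        return lam * empirical + (1.0 - lam) * lower

    def log_prob_word(self, word):
        padded = word + "$"
        return sum(math.log(self.prob(padded[i], padded[:i]))
                   for i in range(len(padded)))


# Toy training data, for illustration only.
places = InterpolatedCharNGram(order=3)
places.train(["wethersfield", "springfield", "mansfield"])
drugs = InterpolatedCharNGram(order=3)
drugs.train(["cotrimoxazole", "sulfamethoxazole"])

for name in ["greenfield", "ketoconazole"]:
    print(name, places.log_prob_word(name), drugs.log_prob_word(name))
```

Comparing the per-category log-probabilities of an unseen name is the basic classification step; the length model and common word model from the slide would be additional factors on top of this character model.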
Experimental Results
[Figure: results for each classification task, grouped as pairwise, 1-all, and n-way]
Task                            Type      Result
drug-nyse                       pairwise  98.93%
nyse-drug_movie_place_person    1-all     98.70%
nyse-place                      pairwise  98.64%
nyse-person                     pairwise  98.41%
drug-person                     pairwise  98.16%
nyse-movie                      pairwise  97.76%
drug-nyse_movie_place_person    1-all     96.81%
drug-movie                      pairwise  95.77%
person-drug_nyse_movie_place    1-all     95.47%
drug-place                      pairwise  95.24%
nyse-place-person               n-way     94.57%
place-person                    pairwise  94.34%
drug-nyse-place-person          n-way     93.25%
movie-person                    pairwise  92.70%
place-drug_nyse_movie_person    1-all     91.86%
movie-drug_nyse_place_person    1-all     90.90%
movie-place                     pairwise  89.94%
drug-nyse-movie-place-person    n-way     88.11%
Knowledge of Frequencies



Linguistics traditionally assumes Knowledge
of Language doesn’t involve counting
Letter frequencies are clearly an important
source of knowledge for unknown words
Similarly, we saw before that there are
regular patterns to exploit in grammatical
information
Take home point:
Combining Statistical NLP methods with richer linguistic representations is a big win!
Language is Ambiguous!







Ban on Nude Dancing on Governor’s Desk –
from a Georgia newspaper column
discussing current legislation
Lebanese chief limits access to private parts
– talking about an Army General’s initiative
Death may ease tension – an article about
the death of Colonel Jean-Claude Paul in
Haiti
Iraqi Head Seeks Arms
Juvenile Court to Try Shooting Defendant
Teacher Strikes Idle Kids
Stolen Painting Found By Tree
Language is Ambiguous!








Local HS Dropouts Cut in Half
Obesity Study Looks for Larger Test Group
British Left Waffles on Falkland Islands
Red Tape Holds Up New Bridges
Man Struck by Lightning Faces Battery
Charge
Clinton Wins on Budget, but More Lies Ahead
Hospitals Are Sued by 7 Foot Doctors
Kids Make Nutritious Snacks
Coping With Ambiguity

Categorical grammars like HPSG provide many possible analyses for sentences
455 parses for “List the sales of the products produced in 1973 with the products produced in 1972.” (Martin et al., 1987)
In most cases, only one interpretation is intended
Initial solution was hand-coded preferences among rules
Hard to manage as number of rules increases
Need to capture interactions among rules
Statistical HPSG Parse
Selection

HPSG provides deep analyses of sentence
structure and meaning




Useful for NLP tasks like question answering
Need to solve disambiguation problem to
make using these richer representations
practical
Idea: Learn statistical preferences among
constructions from hand-disambiguated
collection of sentences
Result: Correct analysis chosen >80% of the
time
Towards Semantic Extraction

HPSG provides representation of meaning


Computers need meaning to do inference


Who did what to whom?
Can we extend information extraction
methods to extract meaning representations
from pages?
Current project: IE for the semantic web


Large project to build rich ontologies to
describe the content of web pages for
intelligent agents
Use IE to extract new instances of concepts
from web pages (as opposed to manual
Towards the Grand Vision?

Collaboration between Theoretical
Linguistics and NLP is important step
forward


How can we ever teach computers enough
about language and the world?




Practical tools with sophisticated language
power
Hawking: Moore’s Law is sufficient
Moravec: mobile robots must learn like
children
Kurzweil: reverse-engineer the human brain
The experts agree:
Upcoming Convergence
Courses





Ling 139M   Machine Translation            Win
Ling 239E   Grammar Engineering            Win
CS 276B     Text Information Retrieval     Win
Ling 239A   Parsing and Generation         Spr
CS 224N     Natural Language Processing    Spr
Get Involved!!