CS 8520: Artificial Intelligence
Natural Language Processing
Introduction
Paula Matuszek
Spring, 2013
Natural Language Processing
• speech recognition
• natural language understanding
• computational linguistics
• psycholinguistics
• information extraction
• information retrieval
• inference
• natural language generation
• speech synthesis
• language evolution
Applied NLP
• Machine translation
• Spelling/grammar correction
• Information retrieval
• Data mining
• Document classification
• Question answering, conversational agents
Natural Language Understanding
[Diagram: sound waves → acoustic/phonetic → morphological/syntactic → semantic/pragmatic → internal representation]
Natural Language Understanding
[Diagram: sound waves → acoustic/phonetic (Sounds) → morphological/syntactic (Symbols) → semantic/pragmatic (Sense) → internal representation]
Where are the words?
[Diagram: sound waves → acoustic/phonetic → morphological/syntactic → semantic/pragmatic → internal representation]
• “How to recognize speech, not to wreck a nice beach”
• “The cat scares all the birds away”
• “The cat’s cares are few”
– pauses in speech bear little relation to word breaks
+ intonation offers additional clues to meaning
Dissecting words/sentences
[Diagram: sound waves → acoustic/phonetic → morphological/syntactic → semantic/pragmatic → internal representation]
• “The dealer sold the merchant a dog”
• “I saw the Golden Gate Bridge flying into San Francisco”
What does it mean?
[Diagram: sound waves → acoustic/phonetic → morphological/syntactic → semantic/pragmatic → internal representation]
• “I saw Pathfinder on Mars with a telescope”
• “Pathfinder photographed Mars”
• “The Pathfinder photograph from Ford has arrived”
• “When a Pathfinder fords a river it sometimes mars its paint job.”
What does it mean?
[Diagram: sound waves → acoustic/phonetic → morphological/syntactic → semantic/pragmatic → internal representation]
• “Jack went to the store. He found the milk in aisle 3. He paid for it and left.”
• “Q: Did you read the report? A: I read Bob’s email.”
The steps in NLP
• Morphology: concerns the way words are built up from smaller meaning-bearing units.
• Syntax: concerns how words are put together to form correct sentences and what structural role each word has.
• Semantics: concerns what words mean and how these meanings combine in sentences to form sentence meanings.
Taken from ocw.kfupm.edu.sa/user062/ICS48201/NL2introduction.ppt
The steps in NLP (Cont.)
• Pragmatics: concerns how sentences are used in different situations and how use affects the interpretation of the sentence.
• Discourse: concerns how the immediately preceding sentences affect the interpretation of the next sentence.
Taken from ocw.kfupm.edu.sa/user062/ICS48201/NL2introduction.ppt
Some of the Tools
• Regular Expressions and Finite State Automata
• Part-of-Speech taggers
• N-Grams
• Grammars
• Parsers
• Semantic Analysis
Parsing (Syntactic Analysis)
• Assigning a syntactic and logical form to an input sentence
  – uses knowledge about words and word meanings (lexicon)
  – uses a set of rules defining legal structures (grammar)
• Paula ate the apple.
  (S (NP (NAME Paula))
     (VP (V ate)
         (NP (ART the)
             (N apple))))
Taken from ocw.kfupm.edu.sa/user062/ICS48201/NL2introduction.ppt
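A minimal sketch of producing such a parse with NLTK; the toy grammar below is an assumption for illustration, not the lecture's grammar:

```python
import nltk

# Toy grammar mirroring the parse tree above (assumed for illustration).
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> NAME | ART N
VP -> V NP
NAME -> 'Paula'
V -> 'ate'
ART -> 'the'
N -> 'apple'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse(['Paula', 'ate', 'the', 'apple']):
    print(tree)  # (S (NP (NAME Paula)) (VP (V ate) (NP (ART the) (N apple))))
```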
Word Sense Resolution
• Many words have many meanings or senses
• We need to resolve which of the senses of an ambiguous word is invoked in a particular use of the word
• I made her duck. (Made her a bird for lunch, or made her move her head quickly downwards?)
Taken from ocw.kfupm.edu.sa/user062/ICS48201/NL2introduction.ppt
Reference Resolution
• Domain Knowledge (registration transaction)
• Discourse Knowledge
• World Knowledge
  U: I would like to register in a CSC course.
  S: Which number?
  U: Make it 8520.
  S: Which section?
  U: Which section is in the evening?
  S: Section 1.
  U: Then make it that section.
Taken from ocw.kfupm.edu.sa/user062/ICS48201/NL2introduction.ppt
Stages of NLP
[Diagram: stages of NLP — surface form → Morphological Analysis (produces stems, consults lexicon) → Syntactic Analysis (produces parse tree) → Semantic Analysis (produces internal representation) → Discourse Analysis (resolves references) → Pragmatic Analysis (performs action for the user)]
Taken from ocw.kfupm.edu.sa/user062/ICS48201/NL2introduction.ppt
Human Languages
• You know ~50,000 words of your primary language, each with several meanings
• A six-year-old knows ~13,000 words
• For the first 16 years we learn about 1 word every 90 minutes of waking time
• Mental grammar generates sentences; virtually every sentence is novel
• Three-year-olds already have 90% of grammar
• ~6,000 human languages, none of them simple!
Adapted from Martin Nowak 2000 – Evolutionary biology of language – Phil.Trans. Royal Society London
Human Spoken language
• Most complicated mechanical motion of the human body
  – movements must be accurate to within millimeters
  – synchronized within hundredths of a second
• We can understand up to 50 phonemes/sec (normal speech is 10-15 phonemes/sec)
  – but if a sound is repeated 20 times/sec we hear a continuous buzz!
• All aspects of language processing are involved and manage to keep pace
Adapted from Martin Nowak 2000 – Evolutionary biology of language – Phil.Trans. Royal Society London
Why Language is Hard
• NLP is AI-complete
• Abstract concepts are difficult to represent
• LOTS of possible relationships among concepts
• Many ways to represent similar concepts
• Hundreds or thousands of features/dimensions
Why Language is Easy
• Highly redundant
• Many relatively crude methods provide fairly good results
History of NLP
• Prehistory (1940s, 1950s)
  – automata theory, formal language theory, Markov processes (Turing, McCulloch & Pitts, Chomsky)
  – information theory and probabilistic algorithms (Shannon)
  – Turing test: can machines think?
History of NLP
• Early work:
  – symbolic approach
    • generative syntax, e.g. the Transformations and Discourse Analysis Project (TDAP, Harris)
    • AI: pattern matching, logic-based, special-purpose systems
      – Eliza, the Rogerian therapist: http://www.manifestation.com/neurotoys/eliza.php3
  – stochastic approach
    • Bayesian methods
  – early successes brought $$$$ grants!
    – by 1966 the US government had spent $20 million on machine translation
• Critics:
  – Bar-Hillel: “no way to disambiguate without deep understanding”
  – Pierce’s 1966 NSF report: “no way to justify work in terms of practical output”
History of NLP
• The middle ages (1970-1990)
  – stochastic
    • speech recognition and synthesis (Bell Labs)
  – logic-based
    • compositional semantics (Montague)
    • definite clause grammars (Pereira & Warren)
  – ad hoc AI-based NLU systems
    • SHRDLU robot in blocks world (Winograd)
    • knowledge representation systems at Yale (Schank)
  – discourse modeling
    • anaphora
    • focus/topic (Grosz et al.)
    • conversational implicature (Grice)
History of NLP
• NLP Renaissance (1990-2000)
  – Lessons from phonology & morphology successes: finite-state models are very powerful
  – probabilistic models pervasive
  – Web creates new opportunities and challenges
  – practical applications driving the field again
• 21st Century NLP
  – The web changes everything: much greater use for NLP, and much more data available
Document Features
• Most NLP is applied to some quantity of unstructured text.
• For simplicity, we will refer to any such quantity as a document.
• What features of a document are of interest?
• Most common are the actual terms in the document.
Tokenization
• Tokenization is the process of breaking up a string of letters into words and other meaningful components (numbers, punctuation, etc.)
• Typically broken up at white space.
• A very standard NLP tool.
• Language-dependent, and sometimes also domain-dependent.
  – 3,7-dihydro-1,3,7-trimethyl-1H-purine-2,6-dione
• Tokens can also be larger divisions: sentences, paragraphs, etc.
Lexical Analyser
• Basic idea is a finite state machine
• Triples of (input state, transition token, output state)
[Diagram: states 0, 1, 2; transitions on A-Z between states 0 and 1, and on blank/EOF to state 2]
• Must be very efficient; gets used a LOT
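A minimal sketch of such a machine in Python; the two-state design here is an assumption for illustration:

```python
def fsm_tokenize(text):
    """Toy lexer: state 0 = between tokens, state 1 = inside a word."""
    tokens, current, state = [], [], 0
    for ch in text + " ":           # trailing blank plays the role of EOF
        if ch.isalpha():            # transition (0 or 1, letter) -> state 1
            current.append(ch)
            state = 1
        elif state == 1:            # transition (1, blank) -> state 0: emit token
            tokens.append("".join(current))
            current, state = [], 0
    return tokens

print(fsm_tokenize("the cat sat"))  # ['the', 'cat', 'sat']
```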
Design Issues for Tokenizer
• Punctuation
– treat as whitespace?
– treat as characters?
– treat specially?
• Case
– fold?
• Digits
– assemble into numbers?
– treat as characters?
– treat as punctuation?
NLTK Tokenizer
• Natural Language ToolKit
• http://text-processing.com/demo/tokenize/
• “Call me Ishmael. Some years ago--never mind how long precisely--having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world.”
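A sketch of the same tokenization in code, assuming NLTK is installed and its punkt tokenizer model has been downloaded:

```python
import nltk
# One-time download of the tokenizer model may be needed:
# nltk.download('punkt')

text = ("Call me Ishmael. Some years ago--never mind how long precisely--"
        "having little or no money in my purse, and nothing particular to "
        "interest me on shore, I thought I would sail about a little and "
        "see the watery part of the world.")

print(nltk.sent_tokenize(text))       # sentence-level tokens
print(nltk.word_tokenize(text)[:10])  # word/punctuation tokens
```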
Document Features
• Once we have a corpus of standardized and possibly tokenized documents, what features of text documents might be interesting?
  – Word frequencies: Bag of Words (BOW)
  – Language
  – Document length: characters, words, sentences
  – Named entities
  – Parts of speech
  – Average word length
  – Average sentence length
  – Domain-specific stuff
Bag of Words
• Specifically not interested in order
• Frequency of each possible word in this document
• A very sparse vector!
• In order to assign each count to the correct position, we need to know all the words used in the corpus
  – two-pass
  – reverse index
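A minimal two-pass sketch in plain Python, with an assumed toy corpus:

```python
from collections import Counter

docs = ["the cat sat on the mat", "the dog chased the cat"]

# Pass 1: collect the corpus vocabulary so each word has a fixed position.
vocab = sorted({w for doc in docs for w in doc.split()})

# Pass 2: build one count vector per document (sparse in practice).
def bow_vector(doc):
    counts = Counter(doc.split())
    return [counts.get(w, 0) for w in vocab]

print(vocab)
print(bow_vector(docs[0]))
```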
Simplifying BOW
• Do we really want every word?
• Stop words
  – omit common function words
  – e.g. http://www.ranks.nl/resources/stopwords.html
• Stemming or lemmatization
  – convert words to a standard form
  – the lemma is the standard word; a stem may not be a word at all
• Synonym matching
• TF*IDF
  – use the “most meaningful” words
Stemming
• Inflectional stemming: eliminate morphological variants
  – singular/plural, present/past
  – in English, rules plus a dictionary
    • books -> book, children -> child
  – few errors, but many omissions
• Root stemming: eliminate inflections and derivations
  – invention, inventor -> invent
  – much more aggressive
• Which (if either) to use depends on the problem
• http://text-processing.com/demo/stem/
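A short sketch contrasting a rule-based stemmer with a dictionary-based lemmatizer in NLTK; the word list is an assumption, and the lemmatizer needs the WordNet data downloaded once:

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer
# nltk.download('wordnet')  # one-time download for the lemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["books", "children", "invention", "inventor"]:
    print(word, "->", stemmer.stem(word), "|", lemmatizer.lemmatize(word))
# A stem may not be a word at all, while the dictionary-based lemmatizer
# can handle irregular forms like children -> child.
```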
Synonym-matching
• Map some words to their synonyms
• http://thesaurus.com/browse/computer
• Problematic in English
  – requires a large dictionary
  – many words have multiple meanings
• In specific domains may be important
  – biological and chemical domains: http://en.wikipedia.org/wiki/Caffeine
  – any specific domain: Nova, Villanova, V’Nova
TF-IDF
• Word frequency is simple, but:
  – affected by length of document
  – not the best indicator of what a document is about
    • very common words don’t tell us much about differences between documents
    • very uncommon words are typos or idiosyncratic
• Term Frequency * Inverse Document Frequency
  – tf-idf(j) = tf(j) * idf(j), where idf(j) = log(N / df(j)), N = number of documents, df(j) = number of documents containing term j
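A minimal sketch of that formula over an assumed toy corpus:

```python
import math
from collections import Counter

docs = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]
N = len(docs)
df = Counter(w for doc in docs for w in set(doc))  # document frequency

def tf_idf(doc):
    tf = Counter(doc)
    return {w: tf[w] * math.log(N / df[w]) for w in tf}

print(tf_idf(docs[0]))  # "the" scores 0.0: it appears in every document
```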
Language
• Relevant if documents are in multiple languages
• May know from source
• Determining the language itself can be considered a form of NLP.
• http://translate.google.com/?hl=en&tab=wT
• http://fr.wikipedia.org/
• http://de.wikipedia.org/
Document Counts
• Number of characters, words, sentences
• Average length of words, sentences, paragraphs
• E.g.: clustering documents to determine how many authors have written them or how many genres are represented
• NLTK + Python makes this easy
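For instance, a sketch of these counts with NLTK; the input file name is hypothetical:

```python
import nltk
# nltk.download('punkt')  # one-time tokenizer model download

text = open("document.txt").read()   # hypothetical input document
words = nltk.word_tokenize(text)
sents = nltk.sent_tokenize(text)

print(len(text), "characters,", len(words), "words,", len(sents), "sentences")
print("average word length:", sum(len(w) for w in words) / len(words))
print("average sentence length:", len(words) / len(sents), "words")
```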
Named Entities
• What persons, places, companies are mentioned in documents?
• “Proper nouns”
• One of the most common information extraction tasks
• Combination of rules and dictionary
  – Example rules:
    • Capitalized word not at beginning of sentence
    • Two capitalized words in a row
    • One or more capitalized words followed by Inc
  – Dictionaries of common names, places, major corporations. Sometimes called a “gazetteer”
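A sketch of the rule-based side with one regular expression; the pattern and sentence are assumptions for illustration, and a real system would add gazetteer lookups and the “not at beginning of sentence” filter:

```python
import re

text = "Yesterday Paula Matuszek visited Acme Widgets Inc in Philadelphia."

# Rule: two or more capitalized words in a row, or capitalized words + "Inc".
pattern = re.compile(r"(?:[A-Z][a-z]+ )+(?:[A-Z][a-z]+|Inc)")
print(pattern.findall(text))
# ['Yesterday Paula Matuszek', 'Acme Widgets Inc'] -- note the false hit on
# the sentence-initial word, which the "not at beginning" rule would filter.
```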
Parts of Speech
• Part-of-Speech (POS) taggers identify nouns, verbs, adjectives, noun phrases, etc.
• Brill is the best-known rule-based tagger
• More recent work uses machine learning to create taggers from labeled examples
• http://text-processing.com/demo/tag/
• http://cst.dk/online/pos_tagger/uk/index.html
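A sketch with NLTK's off-the-shelf tagger, assuming the models have been downloaded:

```python
import nltk
# One-time model downloads may be needed:
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

tokens = nltk.word_tokenize("The cat scares all the birds away")
print(nltk.pos_tag(tokens))
# e.g. [('The', 'DT'), ('cat', 'NN'), ('scares', 'VBZ'), ...]
```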
Domain-Specific
• Structure in document
  – email: To, From, Subject, Body
  – Villanova Campus Currents: News, Academic Notes, Events, Sports, etc.
• Tags in document
  – Medline corpus is tagged with MeSH terms
  – Twitter feeds may be tagged with #tags
• Intranet documents might have date, source, department, author, etc.
N-Grams
• These language features will take you a long way
• But it’s not hard to come up with examples where they fail:
  – Dog bites man. vs. Man bites dog.
• Without adding explicit semantics we can still get substantial additional information by considering sequences of words.
Free Association Exercise 
• I am going to say some phrases. Write down the next word or two that occur to you.
  – Microsoft announced a new security ____
  – NHL commissioner cancels rest ____
  – One Fish, ______
  – Colorless green ideas ______
  – Conjunction Junction, what’s __________
  – Oh, say, can you see, by the dawn’s ______
  – After I finished my homework I went _____.
Human Word Prediction
• Clearly, at least some of us have the ability to predict future words in an utterance.
• How?
  – Domain knowledge
  – Syntactic knowledge
  – Lexical knowledge
Claim
• A useful part of the knowledge needed to allow word prediction can be captured using simple statistical techniques
• In particular, we'll rely on the notion of the probability of a sequence (a phrase, a sentence)
Applications
• Why do we want to predict a word, given some preceding words?
  – To rank the likelihood of sequences containing various alternative hypotheses, e.g. for automated speech recognition or OCR:
    • Theatre owners say popcorn/unicorn sales have doubled...
  – To assess the likelihood/goodness of a sentence, e.g. for text generation or machine translation:
    • Como mucho pescado. --> At the most fished.
Real Word Spelling Errors
• They are leaving in about fifteen minuets to go to her house.
• The study was conducted mainly be John Black.
• The design an construction of the system will take more than a year.
• Hopefully, all with continue smoothly in my absence.
• Can they lave him my messages?
• I need to notified the bank of….
• He is trying to fine out.
Example from Dorr, http://www.umiacs.umd.edu/~bonnie/courses/cmsc723-04/lecture-notes/Lecture5.ppt
Language Modeling
• Fundamental tool in NLP
• Main idea:
  – Some words are more likely than others to follow each other
  – You can predict that likelihood fairly accurately
• In other words, you can build a language model
Adapted from Hearst, http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/lecture4.ppt
N-Grams
• N-Grams are sequences of tokens.
• The N stands for how many terms are used:
  – Unigram: 1 term
  – Bigram: 2 terms
  – Trigram: 3 terms
• You can use different kinds of tokens:
  – character-based n-grams
  – word-based n-grams
  – POS-based n-grams
• N-Grams give us some idea of the context around the token we are looking at.
Adapted from Hearst, http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/lecture4.ppt
N-Gram Models of Language
• A language model is a model that lets us compute the probability, or likelihood, of a sentence S, P(S).
• N-Gram models use the previous N-1 words in a sequence to predict the next word
  – unigrams, bigrams, trigrams, …
• How do we construct or train these language models?
  – Count frequencies in very large corpora
  – Determine probabilities using Markov models, similar to POS tagging
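A minimal sketch of that count-and-divide training for a bigram model, with an assumed toy corpus:

```python
from collections import Counter, defaultdict

corpus = [["<start>", "i", "want", "to", "eat", "chinese", "food"],
          ["<start>", "i", "want", "chinese", "food"]]

counts = defaultdict(Counter)
for sentence in corpus:
    for w1, w2 in zip(sentence, sentence[1:]):
        counts[w1][w2] += 1

def p(w2, w1):
    """Maximum-likelihood estimate P(w2 | w1) = count(w1 w2) / count(w1)."""
    total = sum(counts[w1].values())
    return counts[w1][w2] / total if total else 0.0

print(p("want", "i"))        # 2/2 = 1.0 in this toy corpus
print(p("chinese", "want"))  # 1/2 = 0.5
```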
Counting Words in Corpora
• What is a word?
  – e.g., are cat and cats the same word?
  – September and Sept?
  – zero and oh?
  – Is _ a word? * ? ‘(‘ ?
  – How many words are there in don’t? Gonna?
  – In Japanese and Chinese text, how do we identify a word?
• Back to stemming!
Terminology
• Sentence: unit of written language
• Utterance: unit of spoken language
• Word form: the inflected form that appears in the corpus
• Lemma: an abstract form, shared by word forms having the same stem, part of speech, and word sense
• Types: number of distinct words in a corpus (vocabulary size)
• Tokens: total number of words
Simple N-Grams
• Assume a language has V word types in its lexicon; how likely is word x to follow word y?
  – Simplest model of word probability: 1/V
  – Alternative 1: estimate the likelihood of x occurring in new text based on its general frequency of occurrence estimated from a corpus (unigram probability)
    • popcorn is more likely to occur than unicorn
  – Alternative 2: condition the likelihood of x occurring on the context of previous words (bigrams, trigrams, …)
    • mythical unicorn is more likely than mythical popcorn
Computing the Probability of a Word Sequence
• Compute the product of component conditional probabilities?
  – P(the mythical unicorn) = P(the) P(mythical|the) P(unicorn|the mythical)
• The longer the sequence, the less likely we are to find it in a training corpus
  – P(Most biologists and folklore specialists believe that in fact the mythical unicorn horns derived from the narwhal)
• Solution: approximate using n-grams
Bigram Model
• Approximate P(unicorn|the mythical) by P(unicorn|mythical)
• Markov assumption: the probability of a word depends only on the probability of a limited history
• Generalization: the probability of a word depends only on the probability of the n previous words
  – trigrams, 4-grams, …
  – the higher n is, the more data needed to train
  – the higher n is, the sparser the matrix
• Leads us to backoff models
Using N-Grams
• For N-gram models:
  – P(wn-1, wn) = P(wn | wn-1) P(wn-1)
  – By the chain rule we can decompose a joint probability, e.g. P(w1, w2, w3):
    P(w1, w2, ..., wn) = P(w1 | w2, w3, ..., wn) P(w2 | w3, ..., wn) ... P(wn-1 | wn) P(wn)
• For bigrams, then, the probability of a sequence is just the product of the conditional probabilities of its bigrams:
  P(the, mythical, unicorn) = P(unicorn | mythical) P(mythical | the) P(the | <start>)
A Simple Example
– P(I want to eat Chinese food) =
  P(I | <start>) * P(want | I) * P(to | want) * P(eat | to) * P(Chinese | eat) * P(food | Chinese)
Counts from the Berkeley Restaurant Project
[Table: bigram counts — rows are the (N-1)th term, columns the Nth term]
BeRP Bigram Table
[Table: BeRP bigram probabilities — rows are the (N-1)th term, columns the Nth term]
A Simple Example
– P(I want to eat Chinese food) =
  P(I | <start>) * P(want | I) * P(to | want) * P(eat | to) * P(Chinese | eat) * P(food | Chinese)
  = .25 * .32 * .65 * .26 * .02 * .56 = .00015
So What?
• P(I want to eat British food) = P(I|<start>) P(want|I) P(to|want) P(eat|to) P(British|eat) P(food|British) = .25 * .32 * .65 * .26 * .001 * .60 = .000008
• vs. P(I want to eat Chinese food) = .00015
• Probabilities seem to capture “syntactic” facts and “world knowledge”:
  – eat is often followed by an NP
  – British food is not too popular
Approximating Shakespeare
• As we increase the value of N, the accuracy of the n-gram model increases, since the choice of next word becomes increasingly constrained
• Generating sentences with random unigrams...
  – Every enter now severally so, let
  – Hill he late speaks; or! a more to leg less first you enter
• With bigrams...
  – What means, sir. I confess she? then all sorts, he is trim, captain.
  – Why dost stand forth thy canopy, forsooth; he is this palpable hit the King Henry.
• Trigrams
  – Sweet prince, Falstaff shall die.
  – This shall forbid it should be branded, if renown made it empty.
• Quadrigrams
  – What! I will go seek the traitor Gloucester.
  – Will you not tell me who I am?
• There are 884,647 tokens, with 29,066 word form types, in an approximately one million word Shakespeare corpus
• Shakespeare produced 300,000 bigram types out of 844 million possible bigrams: 99.96% of the possible bigrams were never seen (have zero entries in the table)
• Quadrigrams are worse: what’s coming out looks like Shakespeare because it is Shakespeare
N-Gram Training Sensitivity
• If we repeated the Shakespeare experiment but trained our n-grams on a Wall Street Journal corpus, what would we get?
• This has major implications for corpus selection or design
Some Useful Empirical Observations
• A few events occur with high frequency
• Many events occur with low frequency
• You can quickly collect statistics on the high frequency events
• You might have to wait an arbitrarily long time to get valid statistics on low frequency events
• Some of the zeroes in the table are really zeros. Others are simply low frequency events you haven’t seen yet. We smooth the frequency table by assigning small but non-zero frequencies to these terms.
Smoothing is like Robin Hood: Steal from the rich
and give to the poor (in probability mass)
From Snow, http://www.stanford.edu/class/linguist236/lec11.ppt
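A sketch of the simplest such scheme, add-one (Laplace) smoothing, with assumed toy counts:

```python
def p_add_one(w2, w1, bigram_counts, vocab_size):
    """Add-one smoothed P(w2 | w1) = (count(w1 w2) + 1) / (count(w1) + V)."""
    count = bigram_counts.get((w1, w2), 0)
    total = sum(c for (first, _), c in bigram_counts.items() if first == w1)
    return (count + 1) / (total + vocab_size)

counts = {("mythical", "unicorn"): 3, ("mythical", "beast"): 1}
print(p_add_one("unicorn", "mythical", counts, vocab_size=10))  # (3+1)/(4+10)
print(p_add_one("popcorn", "mythical", counts, vocab_size=10))  # unseen but > 0
```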
Summary
• N-gram probabilities can be used to estimate the likelihood:
  – of a word occurring in a context of N-1 preceding words
  – of a sentence occurring at all
• Smoothing techniques deal with the problem of words unseen in a corpus
• N-grams are useful in a wide variety of NLP tasks
All Our N-Grams Are Belong to You
http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html
• Google uses n-grams for machine translation, spelling correction, and other NLP
• In 2006 they released a large collection of n-gram counts through the Linguistic Data Consortium, based on about a trillion words of web text
  – a trillion tokens, 300 million bigrams, about a billion tri-, four-, and five-grams
    (http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2006T13)
• This quantity of data makes a qualitative change in what we can do with statistical NLP.
Semantics and Semantic Analysis
Meaning
• So far, we have focused on the structure of language, not on what things mean
• We have been doing natural language processing, but not natural language understanding.
• So what is natural language understanding?
  – Answering an essay question on an exam?
  – Deciding what to order at a restaurant by reading a menu?
  – Realizing you’ve been insulted?
  – Appreciating a sonnet?
• As hard as answering "What is artificial intelligence?"
Meaning, cont
• On the practical side, we want to “understand” natural language because BOW and n-gram methods will only take us so far for some things:
  – Machine translation
  – Generation
  – Question answering
• We saw how we could use n-grams to choose the more likely translation for a word given other words. But n-grams can’t give us the potential translations in the first place.
Semantics
• What kinds of things can we not do well with the tools we have already looked at?
  – Retrieve information in response to unconstrained questions: e.g., travel planning
  – Accurate translations
  – Play the "chooser" side of 20 Questions
  – Read a newspaper article and answer questions about it
• These tasks require that we also consider semantics: the meaning of our tokens and their sequences
So What IS Meaning?
• From an NLP viewpoint, meaning is a mapping from linguistic forms to some kind of representation of knowledge of the world
• It is interpreted within the framework of some sort of action to be taken.
• Often we manipulate symbols all the way through; the “meaning” is put in by the human user. Translations, for instance.
• But not always: voice command systems, for instance, may map from the representation into actions.
Example
• Question: Is there a restaurant in King of Prussia serving vegetarian dinners?
• From a restaurant database:
  – Lemon Grass is a sister restaurant to the one in West Philadelphia serving traditional Thai dishes in addition to a complete vegetarian Thai menu. The service and atmosphere are quite pleasant. A welcome addition to the King of Prussia area.
• What do we need to know to answer this question from this text?
• Can we unambiguously answer it?
Example, continued
• Is there a restaurant in King of Prussia serving vegetarian dinners? Yes.
• Lemon Grass is a sister restaurant to the one in West Philadelphia serving traditional Thai dishes in addition to a complete vegetarian Thai menu. The service and atmosphere are quite pleasant. A welcome addition to the King of Prussia area.
• What do we need to know?
  – Vegetarian Thai menu = vegetarian dinners
  – Welcome addition to the KoP area = in KoP
• Can we unambiguously answer it?
  – No. But we can be fairly certain.
    • The only parse that makes sense
    • Restaurants normally do serve the dinner meal.
Knowledge Representation for NLP
• So what representation?
• A useful KR needs to give us a way to encode the knowledge relevant to our problem in a way which allows us to use it. Any representation which does this will work.
• How do we decide what we want to represent?
  – Entities, categories, events, time, aspect
  – Predicates, relationships among entities, arguments (constants, variables)
  – And… quantifiers, operators (e.g. temporal)
• How much structure do we capture?
Semantic Nets
• A labeled, directed graph
• Nodes represent objects, concepts, or situations
  – labels indicate the name
  – nodes can be instances (individual objects) or classes (generic nodes)
• Links represent relationships
  – the relationships contain the structural information of the knowledge to be represented
  – the label indicates the type of the relationship
Semantic Net Examples
[Diagram: Ship –HasA→ Hull, Ship –HasA→ Propulsion, Steamboat –InstanceOf→ Ship; a give event with agent Bob, recipient Mary, object Candy, where Bob and Mary are instances of Person]
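A minimal sketch of such a graph as Python data, using the names from the example above; "give1" is an assumed name for the instance of the give event:

```python
# Each edge of the labeled, directed graph: (source, label, target).
semantic_net = [
    ("Ship", "HasA", "Hull"),
    ("Ship", "HasA", "Propulsion"),
    ("Steamboat", "InstanceOf", "Ship"),
    ("Bob", "InstanceOf", "Person"),
    ("Mary", "InstanceOf", "Person"),
    ("give1", "agent", "Bob"),        # "give1" is an assumed instance node
    ("give1", "recipient", "Mary"),
    ("give1", "object", "Candy"),
]

def relations_of(node):
    return [(label, target) for src, label, target in semantic_net if src == node]

print(relations_of("give1"))  # [('agent', 'Bob'), ('recipient', 'Mary'), ...]
```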
Generic/Individual
• Generic describes the idea, the “notion”
  – static
• Individual or instance describes a real entity
  – must conform to the notion of the generic
  – dynamic
  – “individuate” or “instantiate”
• A lot of NLP using semantic nets involves instantiating generic nets based on a given piece of text.
Individuation example
[Diagram: generic representation — a give event with agent Person, recipient Person, object Thing. Processing the sentence “Bob gave Mary some candy” produces the instantiation: give with agent Bob, recipient Mary, object Candy]
Frames
• Represent related knowledge about a subject
• A frame has a title and a set of slots
  – the title is what the frame “is”: the concept
  – slots capture relationships of the concept to other things
• Typically can be organized hierarchically
  – most frame systems have an is-a slot
  – allows the use of inheritance
• Slots can contain all kinds of items
  – rules, facts, images, video, questions, hypotheses, other frames
  – in NLP, typically capture relationships to other frames or entities
• Slots can also have procedural attachments
  – on creation, modification, removal of the slot value
Simple Frame Example
Slot Name    Filler
Restaurant   Lemon Grass
Cuisine      Thai, Vegetarian
Price        Expensive
Service      Excellent
Atmosphere   Good
Location     KoP
Web page     Ask Google.
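A sketch of this frame as a Python dict, with slot lookup standing in for "use contents of slots"; the values come from the table above:

```python
lemon_grass = {
    "is-a": "Restaurant",           # hierarchy slot enabling inheritance
    "cuisine": ["Thai", "Vegetarian"],
    "price": "Expensive",
    "service": "Excellent",
    "atmosphere": "Good",
    "location": "KoP",
}

# Using slot contents to answer a question:
def serves_vegetarian(frame):
    return "Vegetarian" in frame.get("cuisine", [])

print(serves_vegetarian(lemon_grass))  # True
```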
Usage of Frames
• Most operations with frames do one of two things:
• Fill slots
  – process a piece of text to identify an entity for which we have a frame
  – fill as many slots as possible
• Use contents of slots
  – look up answers to questions
  – generate new text
[Rogers 1999]
Scripts
• Describe typical events or sequences
• Components are:
  – script variables (players, props)
  – entry conditions
  – transactions
  – exit conditions
• Create an instance by filling in variables
Restaurant Script Example
• A generic template for restaurants
  – different types
  – default values
• A script for a typical sequence of activities at a restaurant
• Often has a frame behind it; the script is essentially instantiating the frame
[Rogers 1999]
EAT-AT-RESTAURANT Script
Restaurant Script
  Props: (Restaurant, Money, Food, Menu, Tables, Chairs)
  Roles: (Hungry-Persons, Wait-Persons, Chef-Persons)
  Point-of-View: Hungry-Persons
  Time-of-Occurrence: (Times-of-Operation of Restaurant)
  Place-of-Occurrence: (Location of Restaurant)
  Event-Sequence:
    first:   Enter-Restaurant Script
    then:    if (Wait-To-Be-Seated-Sign or Reservations) then Get-Maitre-d's-Attention Script
    then:    Please-Be-Seated Script
    then:    Order-Food-Script
    then:    Eat-Food-Script unless (Long-Wait) when Exit-Restaurant-Angry Script
    then:    if (Food-Quality was better than Palatable) then Compliments-To-The-Chef Script
    then:    Pay-For-It-Script
    finally: Leave-Restaurant Script
[Rogers 1999]
Comments on Scripts
• Obviously takes a lot of time to develop them initially.
  – The script itself has much of the knowledge
  – May be serious overkill for most NLP tasks
• We need this level of detail if we want to include answers based on reasoning like “Most restaurants do serve dinner.”
First Order Predicate Calculus (FOPC)
• Terms
  – Constants: Lemon Grass
  – Functions: LocationOf(Lemon Grass)
  – Variables: x in LocationOf(x)
• Predicates: relations that hold among objects
  – Serves(Lemon Grass, VegetarianFood)
• Logical connectives: permit compositionality of meaning
  – I only have $5 and I don’t have a lot of time
  – Have(I, $5) ∧ ¬Have(I, LotOfTime)
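NLTK includes a parser for this kind of logic; a minimal sketch, where the constant and predicate names are assumptions and NLTK writes conjunction as & and negation as -:

```python
from nltk.sem import Expression

read_expr = Expression.fromstring
e = read_expr('Have(Speaker, FiveDollars) & -Have(Speaker, LotOfTime)')
print(e)  # the parsed logical form

q = read_expr('Serves(x, VegetarianFood)')  # lowercase x parses as a variable
print(q.free())                             # {Variable('x')}
```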
Choosing a Representation
• We would like our representation to support:
  – Verifiability
  – Unambiguous representation
  – Canonical form
  – Inference
  – Expressiveness
Verifiability
• The system can match an input representation against representations in the knowledge base. If it finds a match, it can return Yes; otherwise No.
• Does Lemon Grass serve vegetarian food?
  Serves(Lemon Grass, VegetarianFood)
Unambiguous Representation
• A single linguistic input can have different meaning representations
• Each representation unambiguously characterizes one meaning.
• Example: small cars and motorcycles are allowed
  – car(x) & small(x) & motorcycle(y) & small(y) & allowed(x) & allowed(y)
  – car(x) & small(x) & motorcycle(y) & allowed(x) & allowed(y)
Ambiguity and Vagueness
• An expression is ambiguous if, in a given context, it can be disambiguated to have a specific meaning, from a number of discrete, possible meanings. E.g., bank (financial institution) vs. bank (river bank).
• An expression is vague if it refers to a range of a scalar variable, such that, even in a specific context, it’s hard to specify the range entirely. E.g., he’s tall, it’s warm, etc.
Representing Similar Concepts
• Distinct inputs could have the same meaning:
  – Does Lemon Grass have vegetarian dishes?
  – Do they have vegetarian food at Lemon Grass?
  – Are vegetarian dishes served at Lemon Grass?
  – Does Lemon Grass serve vegetarian fare?
• Alternatives:
  – Four different semantic representations
  – Store all possible meaning representations in the KB
Canonical Form
• Solution: inputs that mean the same thing have the same meaning representation
• Is this easy? No!
  – vegetarian dishes, vegetarian food, vegetarian fare
  – have, serve
• What to do?
How to Produce a Canonical Form
• Systematic meaning representations can be derived from a thesaurus
    food ___
    dish ___|____ one overlapping meaning sense
    fare ___|
• We can systematically relate syntactic constructions
  – [S [NP Lemon Grass] serves [NP vegetarian dishes]]
  – [S [NP vegetarian dishes] are served at [NP Lemon Grass]]
Inference
• Consider a more complex request:
  – Can vegetarians eat at Lemon Grass?
  – vs.: Does Lemon Grass serve vegetarian food?
• Why do these result in the same answer?
• Inference: draw conclusions about the truth of propositions not explicitly stored in the KB
• Serves(Lemon Grass, VegetarianFood) => CanEat(Vegetarians, At Lemon Grass)
Non-Yes/No Questions
• Example: I’d like to find a restaurant where I can get vegetarian food.
• Serves(x, VegetarianFood)
• Matching succeeds only if the variable x can be replaced by a known object in the KB.
Meaning Structure of Language
• Human languages:
  – display a basic predicate-argument structure
  – make use of variables
  – make use of quantifiers
  – display a partially compositional semantics
Compositional Semantics
• Assumption: the meaning of the whole is composed of the meaning of its parts
  – George cooks. Dan eats. Dan is sick.
  – Cook(George) Eat(Dan) Sick(Dan)
  – If George cooks and Dan eats, Dan will get sick.
  – (Cook(George) ^ Eat(Dan)) --> Sick(Dan)
• Meaning derives from:
  – the people and activities represented (predicates and arguments, or nouns and verbs)
  – the way they are ordered and related: the syntax of the representation, which may also reflect the syntax of the sentence
• Parse inputs and associate meaning as we parse.
Feature and Information Extraction
• We don’t always need a complete parse.
  – We may be after only some specific information
  – And we know what that information is
• Lemon Grass is a sister restaurant to the one in West Philadelphia serving traditional Thai dishes in addition to a complete vegetarian Thai menu. The service and atmosphere are quite pleasant. A welcome addition to the King of Prussia area.
  – We don’t care at all about the other information
Information Extraction
• Retrieve some specific information which is located somewhere in a set of documents.
• We don’t want all the information, just what we have specified.
• Information may occur multiple times in the text, but we just need to find it once.
• Often what is really wanted from a web search, for instance.
• Often involves long sentences, complex syntax, extended discourse, multiple writers.
• May involve sentence fragments and other non-grammatical text.
Some Examples of Information Extraction
• Financial information
  – Who is the CEO/CTO of a company?
  – What were the dividend payments for stocks I’m interested in for the last five years?
• Biological information
  – Are there known inhibitors of enzymes in a pathway?
  – Are there chromosomally located point mutations that result in a described phenotype?
• Other typical questions
  – Who is expert in some area?
  – Is there a vegetarian restaurant in KoP?
  – Can I get a flight to Chicago tomorrow?
Information Extraction -- How
• Create a model of the information to be extracted
• Create a knowledge base of rules for extraction
  – concepts
  – relations among concepts
• Process information by applying a series of tools
• Always involves some level of domain modeling and considerable natural language processing
Staged Approach
• Apply a series of tools to an input text
• At each stage an element of the text is identified for possible use at the next level
• The end result is a set of relations suitable for entry into a database
Cascaded Process
• Tokenize
• POS tag
• Tag named entities: names, dates, locations
• Tag complex entities and relationships
  – grammatical, such as noun phrases
  – semantic, such as CEO
• Bind values to tags throughout.
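A sketch of this cascade with NLTK's stock components; the model downloads are assumed and the sentence is an illustration:

```python
import nltk
# One-time downloads may be needed: punkt, averaged_perceptron_tagger,
# maxent_ne_chunker, words

sentence = "Paula Matuszek taught CSC 8520 at Villanova in spring 2013."
tokens = nltk.word_tokenize(sentence)    # stage 1: tokenize
tagged = nltk.pos_tag(tokens)            # stage 2: POS tag
entities = nltk.ne_chunk(tagged)         # stage 3: tag named entities
print(entities)                          # tree with PERSON/ORGANIZATION chunks
```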
Named Entities
• Named entity recognition…
  – labeling all the occurrences of named entities in a text
    • people, organizations, lakes, bridges, hospitals, mountains, etc.
• This is well-understood technology, readily available, and works well.
• Typically uses a combination of enumerated lists (often called gazetteers) and regular expressions.
Complex Entities and Relationships
• Uses named entities as components
• Pattern-matching rules specific to a given domain
• May be multi-pass: one pass creates entities which are used as part of later passes
  – e.g.: a first pass locates CEOs; a second pass locates dates for CEOs being replaced
• May involve syntactic analysis as well, especially for things like negatives and reference resolution
Why Does This Work?
• The problem is very constrained.
  – We are only looking for a small set of items that can appear in a small set of roles
• We can ignore stuff we don’t care about.
• BUT: creating the knowledge bases can be very complex and time-consuming.
Summary
• Some NLP tasks require some model of meaning: semantics
• Semantics can be modeled in a variety of different formalisms. The goal of all of them is to map from the sample of text being processed into some representation relevant to a real-world task
• Complete generic semantic parses give us a very rich representation of text, but typically depend on the assumption that language is compositional
• Other tasks such as information extraction may not require a detailed parse, but may depend on extensive domain knowledge
• How you represent and work with semantics will always depend on the NLP task you are trying to accomplish