
Lecture 13
Corpus Linguistics I
CS 4705
From Knowledge-Based to Corpus-Based
Linguistics
• A Paradigm Shift begins in the 1980s
– Seeds planted in the 1950s (Harris, Firth)
– Cut off by Chomsky
– Renewal due to
• Interest in practical applications (ASR, MT, …)
• Availability of powerful machines and large amounts of
storage at major industrial labs
• Increasing availability of large online texts and
speech data
• Crossover efforts with ASR community, fostered by
DARPA
• For many practical tasks, statistical methods
perform better
• Less hand-coded linguistic knowledge required of researchers
Next Word Prediction
• An ostensibly artificial task: predicting the next
word in a sequence.
• From a NY Times story...
– Stocks plunged this ….
– Stocks plunged this morning, despite a cut in interest
rates
– Stocks plunged this morning, despite a cut in interest
rates by the Federal Reserve, as Wall ...
– Stocks plunged this morning, despite a cut in interest
rates by the Federal Reserve, as Wall Street began
– Stocks plunged this morning, despite a cut in interest
rates by the Federal Reserve, as Wall Street began
trading for the first time since last …
– Stocks plunged this morning, despite a cut in interest
rates by the Federal Reserve, as Wall Street began
trading for the first time since last Tuesday's terrorist
attacks.
Human Word Prediction
• Clearly, at least some of us have the ability to
predict future words in an utterance.
• How?
– Domain knowledge
– Syntactic knowledge
– Lexical knowledge
Claim
• A useful part of the knowledge needed to allow
Word Prediction (guessing the next word) can be
captured using simple statistical techniques.
• In particular, we'll rely on the notion of the
probability of a sequence (e.g., sentence) and the
likelihood of words co-occurring.
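As an illustrative sketch of that claim (the toy corpus and function names here are invented for the example), a bigram model estimates these probabilities from simple co-occurrence counts:

```python
from collections import defaultdict

# Tiny toy corpus; a real model would be trained on large amounts of text.
corpus = "stocks plunged this morning despite a cut in interest rates".split()

unigram = defaultdict(int)   # single-word counts
bigram = defaultdict(int)    # adjacent-pair (co-occurrence) counts
for w1, w2 in zip(corpus, corpus[1:]):
    unigram[w1] += 1
    bigram[(w1, w2)] += 1

def p_next(w2, w1):
    """Estimate P(w2 | w1) from counts (no smoothing in this sketch)."""
    return bigram[(w1, w2)] / unigram[w1] if unigram[w1] else 0.0

def sequence_prob(words):
    """Score a word sequence as the product of its bigram probabilities."""
    p = 1.0
    for w1, w2 in zip(words, words[1:]):
        p *= p_next(w2, w1)
    return p

print(p_next("rates", "interest"))                   # 1.0 in this toy corpus
print(sequence_prob(["stocks", "plunged", "this"]))  # also 1.0 here
```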
Why would we want to do this?
• Why would anyone want to predict a word?
– If you can predict the next word, you can rank the
likelihood of sequences containing various alternative
words, that is, of alternative hypotheses
– You can assess the likelihood/goodness of a
hypothesis
• Many NLP problems can be modeled as mapping
from one string of symbols to another.
• In statistical language applications, knowledge of
the source (e.g., a statistical model of word
sequences) is referred to as a Language Model or a
Grammar
Why is this useful?
• Example applications that employ language
models:
– Speech recognition
– Handwriting recognition
– Spelling correction
– Machine translation
– Optical character recognition
Real Word Spelling Errors
• They are leaving in about fifteen minuets to go to
her house.
• The study was conducted mainly be John Black.
• The design an construction of the system will take
more than a year.
• Hopefully, all with continue smoothly in my
absence.
• Can they lave him my messages?
• I need to notified the bank of….
• He is trying to fine out.
Handwriting Recognition
• Assume a note is given to a bank teller, which the
teller reads as I have a gub. (cf. Woody Allen)
• NLP to the rescue ….
– gub is not a word
– gun, gum, Gus, and gull are words, but gun has a higher
probability in the context of a bank
For Spell Checkers
• Collect a list of commonly substituted words
– piece/peace, whether/weather, their/there ...
– Whenever you encounter one of these words in a
sentence, construct the alternative sentence as well
– Assess the goodness of each and choose the word that
yields the more likely sentence
• E.g.
• On Tuesday, the whether
• On Tuesday, the weather
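A minimal sketch of that procedure (the confusion sets and the toy scorer are illustrative stand-ins; a real system would plug in a language model such as the bigram scorer sketched earlier):

```python
# Toy confusion sets of commonly substituted words (illustrative, not exhaustive).
CONFUSION_SETS = [{"piece", "peace"}, {"whether", "weather"}, {"their", "there"}]

# Hypothetical stand-in for a real sentence scorer: rewards one known-good bigram.
def toy_sentence_prob(words):
    return 1.0 if ("the", "weather") in zip(words, words[1:]) else 0.1

def correct(sentence, sentence_prob=toy_sentence_prob):
    """Whenever a commonly substituted word appears, construct the
    alternative sentence too, and keep whichever version scores higher."""
    words = sentence.lower().split()
    for i, w in enumerate(words):
        for cset in CONFUSION_SETS:
            if w in cset:
                for alt in cset - {w}:
                    candidate = words[:i] + [alt] + words[i + 1:]
                    if sentence_prob(candidate) > sentence_prob(words):
                        words = candidate
    return " ".join(words)

print(correct("On Tuesday , the whether"))  # -> "on tuesday , the weather"
```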
The Noisy Channel Model
• A probabilistic model developed by Claude Shannon to
model communication (as over a phone line)
I → [Noisy Channel] → O

Î = argmax_I Pr(I|O) = argmax_I Pr(I) Pr(O|I)

(the second equality is Bayes' rule: Pr(O) is the same for every
candidate I, so it drops out of the argmax)
• Î the most likely input given the observed output O
• Pr(I) the prior probability
• Pr(I|O) the probability of I given O
• Pr(O|I) the probability that O is the output if I is the input
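Applied to the bank-note example above, decoding is just this argmax over candidate inputs; the numbers below are invented for illustration:

```python
# Candidate readings I of the observed scrawl O = "gub".
prior = {"gun": 0.4, "gum": 0.3, "Gus": 0.2, "gull": 0.1}    # Pr(I): "gun" boosted by the bank context
channel = {"gun": 0.3, "gum": 0.3, "Gus": 0.2, "gull": 0.2}  # Pr(O|I): how easily I is misread as "gub"

# I_hat = argmax_I Pr(I) * Pr(O|I)
i_hat = max(prior, key=lambda i: prior[i] * channel[i])
print(i_hat)  # -> gun
```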
Review: Basic Probability
• Prior Probability (or unconditional probability)
– P(A), where A is some event
– Possible events: it raining, the next person you see
being Scandinavian, a child getting the measles, the
word ‘warlord’ occurring in the newspaper
• Conditional Probability
– P(A | B)
– the probability of A, given that we know B
– E.g. it raining, given that we know it’s October; the
next person you see being Scandinavian, given that
you’re in Sweden; the word ‘warlord’ occurring in a
story about Afghanistan
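For reference, the standard definition connecting joint and conditional probability, and the Bayes' rule form used by the noisy channel model above:

\[
P(A \mid B) = \frac{P(A, B)}{P(B)}, \qquad
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
\]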
Example
FFFFFFIIII (a population of ten, six of them Finns)
• P(Finn) = .6
• P(skier) = .5
• P(skier|Finn) = .67
• P(Finn|skier) = .8
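With ten people, these figures imply six Finns, five skiers, and four Finnish skiers, so Bayes' rule recovers the last probability from the other three:

\[
P(\text{Finn} \mid \text{skier})
= \frac{P(\text{skier} \mid \text{Finn})\,P(\text{Finn})}{P(\text{skier})}
= \frac{(4/6)(0.6)}{0.5} = 0.8
\]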
Next class
• Midterm
• Next class:
– Hindle & Rooth 1993
– Begin studying semantics, Ch. 14