Lecture 13: Corpus Linguistics I
CS 4705

From Knowledge-Based to Corpus-Based Linguistics
• A paradigm shift begins in the 1980s
– Seeds planted in the 1950s (Harris, Firth)
– Cut off by Chomsky
– Renewal due to:
• Interest in practical applications (ASR, MT, …)
• Availability at major industrial labs of powerful machines and large amounts of storage
• Increasing availability of large online texts and speech data
• Crossover efforts with the ASR community, fostered by DARPA
• For many practical tasks, statistical methods perform better
• Less hand-coded knowledge required of researchers

Next Word Prediction
• An ostensibly artificial task: predicting the next word in a sequence.
• From a NY Times story…
– Stocks plunged this …
– Stocks plunged this morning, despite a cut in interest rates …
– Stocks plunged this morning, despite a cut in interest rates by the Federal Reserve, as Wall …
– Stocks plunged this morning, despite a cut in interest rates by the Federal Reserve, as Wall Street began …
– Stocks plunged this morning, despite a cut in interest rates by the Federal Reserve, as Wall Street began trading for the first time since last …
– Stocks plunged this morning, despite a cut in interest rates by the Federal Reserve, as Wall Street began trading for the first time since last Tuesday's terrorist attacks.

Human Word Prediction
• Clearly, at least some of us have the ability to predict future words in an utterance.
• How?
– Domain knowledge
– Syntactic knowledge
– Lexical knowledge

Claim
• A useful part of the knowledge needed for word prediction (guessing the next word) can be captured using simple statistical techniques.
• In particular, we'll rely on the notion of the probability of a sequence (e.g., a sentence) and the likelihood of words co-occurring.

Why would we want to do this?
• Why would anyone want to predict a word?
– If you can predict the next word, you can rank the likelihood of sequences containing various alternative words, i.e., rank alternative hypotheses.
– You can then assess the likelihood/goodness of a hypothesis.
• Many NLP problems can be modeled as mapping from one string of symbols to another.
• In statistical language applications, knowledge of the source (e.g., a statistical model of word sequences) is referred to as a Language Model or a Grammar.

Why is this useful?
• Example applications that employ language models:
– Speech recognition
– Handwriting recognition
– Spelling correction
– Machine translation
– Optical character recognition

Real-Word Spelling Errors
• They are leaving in about fifteen minuets to go to her house.
• The study was conducted mainly be John Black.
• The design an construction of the system will take more than a year.
• Hopefully, all with continue smoothly in my absence.
• Can they lave him my messages?
• I need to notified the bank of….
• He is trying to fine out.

Handwriting Recognition
• Assume a note is given to a bank teller, which the teller reads as "I have a gub." (cf. Woody Allen)
• NLP to the rescue…
– gub is not a word
– gun, gum, Gus, and gull are words, but gun has a higher probability in the context of a bank

For Spell Checkers
• Collect a list of commonly substituted words
– piece/peace, whether/weather, their/there, …
• Whenever you encounter one of these words in a sentence, construct the alternative sentence as well.
• Assess the goodness of each and choose the word whose sentence is more likely (a minimal sketch of this idea follows below).
• E.g.:
– On Tuesday, the whether …
– On Tuesday, the weather …
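The slides don't give an implementation, but the ranking idea can be made concrete with a toy bigram language model. Everything here is an illustrative assumption rather than the actual spell checker described in class: the miniature corpus, the add-one smoothing, and the helper functions `bigram_prob` and `sentence_prob` are all invented for this sketch.

```python
# A minimal sketch: rank confusion-set alternatives by sentence probability
# under a toy bigram model. Corpus and counts are invented for illustration.
from collections import defaultdict

# Tiny "corpus" standing in for real training text (lowercased, pre-tokenized).
corpus = (
    "on tuesday the weather was cold . "
    "the weather improved on wednesday . "
    "we wondered whether it would rain . "
    "whether it rains or not , we go ."
).split()

unigrams = defaultdict(int)
bigrams = defaultdict(int)
for w in corpus:
    unigrams[w] += 1
for w1, w2 in zip(corpus, corpus[1:]):
    bigrams[(w1, w2)] += 1

V = len(unigrams)  # vocabulary size, used for add-one smoothing

def bigram_prob(w1, w2):
    # Add-one (Laplace) smoothing so unseen bigrams get nonzero probability.
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

def sentence_prob(words):
    # Approximate P(w1..wn) by the product of bigram probabilities.
    p = 1.0
    for w1, w2 in zip(words, words[1:]):
        p *= bigram_prob(w1, w2)
    return p

for cand in ["on tuesday the whether", "on tuesday the weather"]:
    print(cand, sentence_prob(cand.split()))
# The "weather" variant scores higher because "the weather" occurs in the
# toy corpus while "the whether" never does.
```

A real system would train on millions of words and work in log probabilities to avoid underflow, but the decision rule, pick the alternative whose sentence is more probable, is the same.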
The Noisy Channel Model
• A probabilistic model developed by Claude Shannon to model communication (as over a phone line):

I → [Noisy Channel] → O

Î = argmax_I Pr(I | O) = argmax_I Pr(I) Pr(O | I)

• Î: the most likely input, given the observed output O
• Pr(I): the prior probability of the input
• Pr(I | O): the probability of input I given output O; maximizing it yields the most likely I
• Pr(O | I): the probability that O is the output if I is the input

Review: Basic Probability
• Prior probability (or unconditional probability)
– P(A), where A is some event
– Possible events: it raining; the next person you see being Scandinavian; a child getting the measles; the word 'warlord' occurring in the newspaper
• Conditional probability
– P(A | B): the probability of A, given that we know B
– E.g., it raining, given that we know it's October; the next person you see being Scandinavian, given that you're in Sweden; the word 'warlord' occurring in a story about Afghanistan

Example
[Figure: ten people shown as FFFFFFIIII, i.e., six Finns (F) and four others (I), some of whom are skiers]
• P(Finn) = .6
• P(skier) = .5
• P(skier | Finn) = .67
• P(Finn | skier) = .8
• Note that these numbers are consistent with Bayes' rule, the same identity behind the noisy channel equation: P(Finn | skier) = P(skier | Finn) P(Finn) / P(skier) = (.67)(.6) / .5 = .8.

Next class
• Midterm
• Next class:
– Hindle & Rooth 1993
– Begin studying semantics, Ch. 14
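Finally, a minimal sketch of the noisy channel decision rule applied to the bank teller's "gub" example from earlier. The candidate words come from the handwriting slide, but every probability below is invented for illustration; a real recognizer would estimate the prior Pr(I) from a corpus and the channel model Pr(O | I) from observed character confusions.

```python
# Noisy channel sketch: pick the input word I maximizing Pr(I) * Pr(O | I)
# for the observed (misread) output O = "gub". All numbers are invented.

# Prior Pr(I): how likely each word is in the context of a note to a bank teller.
prior = {"gun": 0.5, "gum": 0.2, "Gus": 0.2, "gull": 0.1}

# Channel model Pr(O = "gub" | I): how likely each word is to be misread as "gub".
channel = {"gun": 0.3, "gum": 0.4, "Gus": 0.1, "gull": 0.05}

# argmax over candidate inputs I of Pr(I) * Pr(O | I)
best = max(prior, key=lambda w: prior[w] * channel[w])
for w in prior:
    print(f"{w}: Pr(I) * Pr(O|I) = {prior[w] * channel[w]:.3f}")
print("Most likely intended word:", best)
# gun wins (0.150) even though gum has the higher channel probability,
# because the prior for gun in a bank context outweighs it.
```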