Why part of speech tagging?
Word classes
Tag sets and problem definition
Automatic approaches 1: rule-based tagging
Automatic approaches 2: stochastic tagging
Automatic approaches 3: transformation-based tagging
Other issues: tagging unknown words, evaluation
Slide 1
“The process of assigning a part-of-speech or other lexical class marker to each word in a corpus”
WORDS: the girl kissed the baby on the cheek
TAGS:  {N, V, P, DET}
Slide 2
WORD:   the    girl    kissed   the    baby    on      the    cheek
LEMMA:  the    girl    kiss     the    baby    on      the    cheek
TAG:    +DET   +NOUN   +VPAST   +DET   +NOUN   +PREP   +DET   +NOUN
Slide 3
Speech synthesis — pronunciation
Speech recognition — class-based N-grams
Information retrieval — stemming, selection of high-content words
Word-sense disambiguation
Corpus analysis of language & lexicography
Slide 4
Why part of speech tagging?
Word classes
Tag sets and problem definition
Automatic approaches 1: rule-based tagging
Automatic approaches 2: stochastic tagging
Automatic approaches 3: transformation-based tagging
Other issues: tagging unknown words, evaluation
Slide 5
Basic word classes: Noun, Verb, Adjective, Adverb, Preposition, …
Open vs. Closed classes
Open:
Nouns, Verbs, Adjectives, Adverbs.
Why “open”? Because these classes readily accept new members.
Closed:
Definition: the category of function words, i.e., parts of speech (or word classes) that do not readily accept new members
determiners: a, an, the
pronouns: she, he, I
prepositions: on, under, over, near, by, …
Slide 6
Every known human language has nouns and verbs
Nouns: people, places, things
Classes of nouns: proper vs. common; count vs. mass
Verbs: actions and processes
Adjectives: properties, qualities
Adverbs: hodgepodge!
Unfortunately, John walked home extremely slowly yesterday
Numerals: one, two, three, third, …
Slide 7
Differ more from language to language than open class words
Examples:
prepositions: on, under, over, …
particles: up, down, on, off, …
determiners: a, an, the, …
pronouns: she, who, I, …
conjunctions: and, but, or, …
auxiliary verbs: can, may, should, …
Slide 8
Why part of speech tagging?
Word classes
Tag sets and problem definition
Automatic approaches 1: rule-based tagging
Automatic approaches 2: stochastic tagging
Automatic approaches 3: transformation-based tagging
Other issues: tagging unknown words, evaluation
Slide 14
• Vary in number of tags: a dozen to over 200
• Size of tag sets depends on language, objectives and purpose
– Some tagging approaches (e.g., constraint-grammar-based ones) make fewer distinctions, e.g., conflating prepositions, conjunctions, and particles
– Simple morphology = more ambiguity = fewer tags
Slide 15
[Penn Treebank tag set excerpt (table not extracted): PRP = personal pronoun, PRP$ = possessive pronoun, …]
Slide 16
The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.
Book/VB that/DT flight/NN ./.
Does/VBZ that/DT flight/NN serve/VB dinner/NN ?/.
See http://www.infogistics.com/posdemo.htm
Slide 17
Words often have more than one word class: this
This is a nice day = PRP
This day is nice = DT
You can go this far = RB
Slide 18
Unambiguous (1 tag): 35,340
Ambiguous (2-7 tags): 4,100
2 tags: 3,760
3 tags: 264
4 tags: 61
5 tags: 12
6 tags: 2
7 tags: 1
Slide 19
• Rule-Based Tagger: ENGTWOL (ENGlish TWO Level analysis)
• Stochastic Tagger: HMM-based
• Transformation-Based Tagger (Brill)
Slide 20
Why part of speech tagging?
Word classes
Tag sets and problem definition
Automatic approaches 1: rule-based tagging
Automatic approaches 2: stochastic tagging
Automatic approaches 3: transformation-based tagging
Other issues: tagging unknown words, evaluation
Slide 21
• Basic Idea:
– Assign all possible tags to words
– Remove tags according to a set of rules of the form:
  if word+1 is an adjective, adverb, or quantifier and the following word is a sentence boundary, and word-1 is not a verb like “consider”, then eliminate non-ADV tags; else eliminate the ADV tag
– Typically more than 1,000 hand-written rules, though they may also be machine-learned (a minimal sketch of the assign-then-eliminate idea follows below)
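A minimal sketch of the assign-then-eliminate idea in Python. The lexicon, the single contextual rule, and all names are illustrative assumptions, not the actual ENGTWOL lexicon or rule set.

```python
# Sketch of rule-based tagging: assign all possible tags, then prune by context.
# Toy lexicon (assumption): unknown words default to NOUN.
LEXICON = {
    "that": {"DET", "ADV", "PRON", "COMP"},
    "odd": {"ADJ"},
    "is": {"VERB"},
    "not": {"ADV"},
    "it": {"PRON"},
    ".": {"PUNCT"},
}

def initial_tags(words):
    """Step 1: assign every tag the lexicon allows for each word."""
    return [set(LEXICON.get(w.lower(), {"NOUN"})) for w in words]

def adverbial_that_rule(words, tags):
    """Step 2 (one toy rule): if the word after 'that' is ADJ/ADV and is
    followed by a sentence boundary, keep only the ADV reading of 'that';
    otherwise eliminate the ADV reading."""
    for i, w in enumerate(words):
        if w.lower() != "that":
            continue
        next_tags = tags[i + 1] if i + 1 < len(words) else set()
        at_boundary = i + 2 >= len(words) or words[i + 2] in {".", "!", "?"}
        if next_tags & {"ADJ", "ADV"} and at_boundary:
            tags[i] &= {"ADV"}      # eliminate non-ADV tags
        else:
            tags[i] -= {"ADV"}      # eliminate the ADV tag
    return tags

words = "it is not that odd .".split()
print(adverbial_that_rule(words, initial_tags(words)))
# -> [{'PRON'}, {'VERB'}, {'ADV'}, {'ADV'}, {'ADJ'}, {'PUNCT'}]
```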
Slide 22
Demo: http://www2.lingsoft.fi/cgi-bin/engtwol
Slide 23
Why part of speech tagging?
Word classes
Tag sets and problem definition
Automatic approaches 1: rule-based tagging
Automatic approaches 2: stochastic tagging
Automatic approaches 3: transformation-based tagging
Other issues: tagging unknown words, evaluation
Slide 24
• Based on the probability of a tag occurring, given the word and the surrounding context (e.g., the preceding tags)
• Requires a training corpus
• No probabilities for words not in corpus.
• Training corpus may be different from test corpus.
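As a concrete instance (the slide does not spell out the model), a bigram HMM tagger chooses the tag sequence that maximizes the product of emission and transition probabilities estimated from the training corpus:

\[ \hat{t}_1^n = \arg\max_{t_1^n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1}) \]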
Slide 25
• Simple Method: Choose most frequent tag in training text for each word!
– Result: 90% accuracy
– This serves as the baseline (a minimal sketch follows below)
– The other methods described in these slides do better
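A minimal sketch of this most-frequent-tag baseline, assuming the training corpus is a list of (word, tag) pairs; the function names, toy corpus, and the NN fallback for unknown words are illustrative assumptions.

```python
from collections import Counter, defaultdict

def train_most_frequent_tag(tagged_corpus):
    """tagged_corpus: iterable of (word, tag) pairs from the training text."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word.lower()][tag] += 1
    # For each word, keep only its single most frequent tag.
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def tag_baseline(words, lexicon, default_tag="NN"):
    """Unknown words get the default tag (here: assume they are nouns)."""
    return [(w, lexicon.get(w.lower(), default_tag)) for w in words]

train = [("the", "DT"), ("flight", "NN"), ("book", "VB"),
         ("book", "NN"), ("book", "NN"), ("that", "DT")]
lex = train_most_frequent_tag(train)
print(tag_baseline("book that flight".split(), lex))
# -> [('book', 'NN'), ('that', 'DT'), ('flight', 'NN')]
```

Note that the baseline tags the imperative “book” as NN because NN is its most frequent tag in the toy corpus; this is exactly the kind of context-dependent error the other methods aim to fix.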
Slide 26
Why part of speech tagging?
Word classes
Tag sets and problem definition
Automatic approaches 1: rule-based tagging
Automatic approaches 2: stochastic tagging
Automatic approaches 3: transformation-based tagging
Other issues: tagging unknown words, evaluation
Slide 27
• Combines the rule-based and stochastic tagging methodologies
– Like the rule-based approach, because rules are used to specify tags in a certain environment
– Like the stochastic approach, because machine learning is used, with a tagged corpus as input
• Input:
– tagged corpus
– dictionary (with most frequent tags)
• Usually constructed from the tagged corpus
Slide 28
• Basic Idea:
– Set the most probable tag for each word as a start value
– Change tags according to rules of type “if word-1 is a determiner and word is a verb then change the tag to noun” in a specific order
• Training is done on a tagged corpus:
  1. Write a set of rule templates
  2. Among all rules instantiated from the templates, find the one with the highest score (e.g., the largest net number of errors corrected)
  3. Apply that rule, then repeat step 2 until the best rule's score falls below a threshold
  4. Keep the resulting ordered list of rules
• Rules make errors that are corrected by later rules (a sketch of rule application follows below)
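A minimal sketch of applying learned transformations at tagging time, assuming rules of the form (previous tag, from-tag, to-tag); the initial lexicon, the single rule, and the helper names are illustrative assumptions, not Brill's learned rule list.

```python
# Sketch of transformation-based (Brill-style) tagging at application time:
# start from the most frequent tag for each word, then apply an ordered
# list of learned rules. The rule below mirrors the slide's example.

def initial_tagging(words, lexicon, default_tag="NN"):
    return [lexicon.get(w.lower(), default_tag) for w in words]

# Each rule: (previous_tag, from_tag, to_tag)
RULES = [
    ("DT", "VB", "NN"),   # "if word-1 is a determiner and word is a verb, retag as noun"
]

def apply_rules(words, tags, rules):
    for prev_tag, from_tag, to_tag in rules:      # rules are applied in learned order
        for i in range(1, len(words)):
            if tags[i - 1] == prev_tag and tags[i] == from_tag:
                tags[i] = to_tag
    return tags

lexicon = {"the": "DT", "can": "VB", "rusted": "VBD"}   # toy most-frequent tags
words = "the can rusted".split()
tags = initial_tagging(words, lexicon)            # ['DT', 'VB', 'VBD']
print(apply_rules(words, tags, RULES))            # ['DT', 'NN', 'VBD']
```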
Slide 29
Why part of speech tagging?
Word classes
Tag sets and problem definition
Automatic approaches 1: rule-based tagging
Automatic approaches 2: stochastic tagging
Automatic approaches 3: transformation-based tagging
Other issues: tagging unknown words, evaluation
Slide 30
• New words are added to (newspaper) language at a rate of 20+ per month
• Plus many proper names …
• Unknown words increase error rates by 1-2%
• Method 1: assume they are nouns
• Method 2: assume the unknown words have a probability distribution similar to words only occurring once in the training set.
• Method 3: Use morphological information, e.g., words ending in -ed tend to be tagged VBN (a sketch of this idea follows below)
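A minimal sketch of Method 3, guessing a tag for an unknown word from its suffix; the suffix table, the capitalization heuristic, and the function name are illustrative assumptions, not a learned model.

```python
# Sketch of guessing tags for unknown words from morphology (Method 3).
# The suffix-to-tag table is a toy assumption, checked in order.
SUFFIX_TAGS = [
    ("ed", "VBN"),    # blarged  -> VBN
    ("ing", "VBG"),   # smorfing -> VBG
    ("ly", "RB"),     # quickly  -> RB
    ("s", "NNS"),     # flibs    -> NNS
]

def guess_unknown_tag(word, default_tag="NN"):
    if word[0].isupper():
        return "NNP"                       # capitalized unknowns: guess proper noun
    for suffix, tag in SUFFIX_TAGS:
        if word.lower().endswith(suffix):
            return tag
    return default_tag                     # fall back to Method 1: assume noun

for w in ["blarged", "Zorblax", "smorfing", "flibs"]:
    print(w, "->", guess_unknown_tag(w))
```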
Slide 31
• The result is compared with a manually coded “Gold Standard” (a minimal accuracy computation is sketched below)
– Typically accuracy reaches 96-97%
– This may be compared with result for a baseline tagger (one that uses no context).
• Important: 100% is impossible even for human annotators.
• Factors that affect performance
– The amount of training data available
– The tag set
– The difference between training corpus and test corpus
– Dictionary
– Unknown words
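A minimal sketch of the accuracy computation against a gold standard; the function and variable names, and the toy tag sequences, are illustrative assumptions.

```python
def tagging_accuracy(predicted_tags, gold_tags):
    """Token-level accuracy: fraction of tags that match the gold standard."""
    assert len(predicted_tags) == len(gold_tags)
    correct = sum(p == g for p, g in zip(predicted_tags, gold_tags))
    return correct / len(gold_tags)

gold = ["VB", "DT", "NN", "."]          # Book that flight .
pred = ["NN", "DT", "NN", "."]          # a baseline that mis-tags "book"
print(f"accuracy = {tagging_accuracy(pred, gold):.2f}")   # accuracy = 0.75
```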
Slide 32