Word-Classes and Part-of-Speech Tagging

Christopher Brewster
University of Sheffield
Computer Science Department
Natural Language Processing Group
C.Brewster@dcs.shef.ac.uk

Lecture Outline
• Definition and Example
• Motivation
• Word-classes
• A Basic Tagging System
• Transformation-Based Tagging
• Tagging Unknown Words
Definition

“the process of assigning a part-of-speech or other lexical class marker to each word in a corpus” – D. Jurafsky and J.H. Martin, 2000, Speech and Language Processing

An Example

WORDS: The girl kissed the boy on the cheek
TAGS: N, V, P, ART, …

A morphological analysis of the same sentence (from http://www.xrce.xerox.com/research/mltt/toolhome.html):

word:  The   girl  kissed  the   boy   on    the   cheek
lemma: the   girl  kiss    the   boy   on    the   cheek
tag:   +DET  +NOUN +VPAST  +DET  +NOUN +PREP +DET  +NOUN
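The definition above can be illustrated with a minimal sketch: a hand-built lexicon mapping each word to a tag, applied word by word. The lexicon entries and tag names below are invented for this example; they are not the output of any real tagger.

```python
# Toy illustration of POS tagging: assign each word a class marker
# from a small hand-built lexicon (invented for this example).
LEXICON = {
    "the": "DET",
    "girl": "NOUN",
    "kissed": "VPAST",
    "boy": "NOUN",
    "on": "PREP",
    "cheek": "NOUN",
}

def tag(sentence):
    """Return (word, tag) pairs; words not in the lexicon get 'UNK'."""
    return [(w, LEXICON.get(w.lower(), "UNK")) for w in sentence.split()]

print(tag("The girl kissed the boy on the cheek"))
# [('The', 'DET'), ('girl', 'NOUN'), ('kissed', 'VPAST'), ...]
```

A real tagger must also resolve ambiguity (words with more than one possible tag), which this lookup sketch ignores.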
Motivation: the uses of Tagging
• Speech synthesis – pronunciation
• Speech recognition – class-based N-grams
• Information retrieval – stemming
• Word-sense disambiguation
• Corpus analysis of language & lexicography

Word Classes
• Basic word classes: Noun, Verb, Adjective, Adverb, Preposition, …
• Open vs. closed classes
  – Closed, e.g.:
    determiners: a, an, the
    pronouns: she, he, I, others
    prepositions: on, under, over, near, by, at, from, to, with
Word Classes: Tag sets
• Vary in number of tags: from a dozen to over 200
• The size of a tag set depends on the language, objectives and purpose
  – Simple morphology = more ambiguity = fewer tags
  – Some tagging approaches (e.g. constraint-grammar based) make fewer distinctions, e.g. conflating adverbs, particles and interjections

Word Classes: Tag set example

CC   coordinating conjunction   and, but, or
CD   cardinal number            one, two, three
DT   determiner                 a, the
EX   existential ‘there’        there
FW   foreign word               mea culpa
IN   preposition                of, in, by
JJ   adjective                  yellow
JJR  adjective, comparative     bigger
NN   noun, singular or mass     llama
NNS  noun, plural               llamas

from the Penn treebank part-of-speech tag set.
The Problem
• Words often have more than one word class, e.g. this:
  – This is a nice day = PR
  – This day is nice = ADJ
  – You can go this far. = ADV

Word Class Ambiguity (in the Brown Corpus)

Unambiguous (1 tag):    35,340
Ambiguous (2-7 tags):    4,100
  2 tags:  3,760
  3 tags:    264
  4 tags:     61
  5 tags:     12
  6 tags:      2
  7 tags:      1 (still)

from DeRose (1988)
A Basic System: the PARTS program
• “PARTS – A System for Assigning Word Classes to English Texts”, L.L. Cherry
• Uses a list of function words, and lists of suffixes and auxiliaries, as its key sources of information
• Many combination classes, e.g. noun_adj
• Words that are members of more than two classes are initially assigned unk
The PARTS program: input
• A list of function words and irregular verbs with tags:
  able, adj; every, adj; own, adj; ago, adj_adv; will, aux; do, auxv; be, be; and, conj; or, conj; but, conj; begun, ed; bitten, ed; outside, prep; up, prep; over, prep; until, prep_adv
• A list of suffixes with the most probable tag for words with that suffix:
  ic, adj; ance, noun; ship, noun; ant, noun_adj; age, noun; ize, verb; ment, noun; ary, adj
  – suffixes chosen by hand
  – if most words with a suffix have only 1 or 2 tags, this single or combined class is assigned; exceptions are added to an exception list
  – the exception list has many obscure words
• A text
The PARTS program: step 1 pre-processing
1. tokenises words and sentences
   • word = string of characters separated by blanks or punctuation
   • sentence = string of words ending in .?! (other punctuation is treated as a comma)
2. marks capitalised words not starting sentences as noun_adj
3. marks hyphenated words as noun_adj
4. looks up function words & irregular verbs in the list

The PARTS program: step 2 suffix analysis
1. applies to words NOT assigned tags in step 1
2. looks up the suffix list
3. unassigned words go on to step 3
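Step 2 can be sketched as a simple lookup over the suffix list given earlier. The function name and data structure are my own illustration; the suffix entries are copied from the input slide.

```python
# Sketch of PARTS-style suffix analysis (step 2): words left untagged
# by the function-word lookup are assigned the most probable tag for
# their suffix. Entries copied from the input slide; code is a sketch.
SUFFIXES = [
    ("ic", "adj"), ("ance", "noun"), ("ship", "noun"),
    ("ant", "noun_adj"), ("age", "noun"), ("ize", "verb"),
    ("ment", "noun"), ("ary", "adj"),
]

def suffix_tag(word):
    """Return the most probable tag for a word by its suffix, or None."""
    for suffix, tag in SUFFIXES:
        if word.endswith(suffix):
            return tag
    return None  # unassigned words go on to step 3

print(suffix_tag("friendship"))  # noun
print(suffix_tag("summarize"))   # verb
print(suffix_tag("dog"))         # None
```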
The PARTS program: step 3 word class assignment
1. finds the verb in the sentence (using auxiliaries)
2. finds nouns
3. applies a set of rules of the form:
   verb_adj & ~a => verb
   “if the word has been assigned the class verb_adj and the verb has not been recognised in the sentence, assign verb to it”
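A rule of this kind might be sketched as follows. The representation of sentences and the rule logic are my own illustration of the idea, not Cherry's implementation.

```python
# Sketch of a PARTS step-3 rule like "verb_adj & ~a => verb": if a word
# carries the combined class verb_adj and no verb has yet been
# recognised in the sentence, commit it to verb. Illustrative only.
def apply_verb_rule(tagged):
    """tagged: list of (word, tag) pairs for one sentence."""
    verb_found = any(t == "verb" for _, t in tagged)
    out = []
    for word, tag in tagged:
        if tag == "verb_adj" and not verb_found:
            tag = "verb"
            verb_found = True  # only the first candidate becomes the verb
        out.append((word, tag))
    return out

print(apply_verb_rule([("they", "pron"), ("run", "verb_adj"), ("fast", "adv")]))
# [('they', 'pron'), ('run', 'verb'), ('fast', 'adv')]
```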
The PARTS program: results and example
• 95% correct assignment
• 41.5% of errors arise from noun-adjective confusion
• Example: “They act as messengers for the legislators.”

  word:    They   act   as        messengers  for       the  legislators
  initial: pronp  unk   prep_adv  nv_pl       prep_adv  art  nv_pl
  final:   pron   verb  prep      noun        prep      art  noun
Other methods: Stochastic Tagging
• Not based on rules, but on the probability of a certain tag occurring, given various possibilities.
• Necessitates a TRAINING CORPUS, i.e. a hand-tagged text, in order to derive probabilities.
• Problem: no probabilities for words not in the corpus
• Problem: bad results if the training corpus is very different from the test corpus

Stochastic tagging: a baseline
• Method: choose the most frequent tag in the training text for each word.
  – Result: 90% accuracy
  – Reason: cf. the figures on word class ambiguity, where ~90% of words have only one tag
  – Therefore: this is a baseline, and any other method must do significantly better
  – cf. HMM tagging (lecture of Nick Webb)

Transformation-Based Learning Tagging (Brill Tagging)
• A combination of rule-based AND stochastic tagging methodologies
  – Like rule-based tagging, because rules are used to specify tags in a certain environment
  – Like the stochastic approach, because machine learning is used, with a tagged corpus as input
• Input:
  – a tagged corpus
  – a dictionary (with the most frequent tags)
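The most-frequent-tag baseline described above can be sketched in a few lines: count tags per word in a (here invented) training corpus, then tag by lookup alone. Corpus, tag names, and the default tag are toy assumptions.

```python
# Unigram baseline: tag each word with its most frequent training tag.
# The tiny training corpus and tag names below are invented.
from collections import Counter, defaultdict

training = [("the", "DET"), ("can", "NOUN"), ("can", "MODAL"),
            ("can", "MODAL"), ("rusted", "VERB")]

counts = defaultdict(Counter)
for word, tag in training:
    counts[word][tag] += 1
most_frequent = {w: c.most_common(1)[0][0] for w, c in counts.items()}

def baseline_tag(words, default="NOUN"):
    # Unseen words get a default tag (here: assume noun)
    return [(w, most_frequent.get(w, default)) for w in words]

print(baseline_tag(["the", "can", "rusted"]))
# [('the', 'DET'), ('can', 'MODAL'), ('rusted', 'VERB')]
```

This is exactly the starting point the TBL tagger below improves on with learned transformation rules.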
TBL: Rule Application
• Example rule:
  – Change NN to VB when the previous tag is TO
• For example, race has the following probabilities in the Brown corpus:
  – P(NN|race) = .98
  – P(VB|race) = .02
• So:
  … is/VBZ expected/VBN to/TO race/NN tomorrow/NN
  becomes
  … is/VBZ expected/VBN to/TO race/VB tomorrow/NN
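Applying this single rule to a tagged sentence can be sketched as follows; the list-of-pairs representation is my own choice, not Brill's.

```python
# The example rule above: change NN to VB when the previous tag is TO.
def nn_to_vb_after_to(tagged):
    out = list(tagged)
    for i in range(1, len(out)):
        word, tag = out[i]
        if tag == "NN" and out[i - 1][1] == "TO":
            out[i] = (word, "VB")
    return out

sent = [("is", "VBZ"), ("expected", "VBN"), ("to", "TO"),
        ("race", "NN"), ("tomorrow", "NN")]
print(nn_to_vb_after_to(sent))
# [..., ('to', 'TO'), ('race', 'VB'), ('tomorrow', 'NN')]
```

Note that tomorrow/NN is untouched: its previous tag (after the change) is VB, not TO.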
TBL: Rule Learning
• 2 parts to a rule:
  – Triggering environment
  – Rewrite rule
• Templates are like underspecified rules:
  – Replace tag X with tag Y, provided tag Z or word Z’ appears in some position
• The range of triggering environments, or templates (from Manning & Schutze 1999:363): schemas 1-9, each triggering on one or more of the neighbouring tags t_{i-3}, t_{i-2}, t_{i-1}, t_{i+1}, t_{i+2}, t_{i+3} of the word being retagged [table not reproduced]

TBL: Rule Learning (2)
• Rules are learned in an ordered sequence – whichever gives the best net improvement at each iteration of the learning algorithm.
• Rules may interact, i.e. Rule 1 may make a change which provides the context for Rule 2 to fire.
• Rules are compact (a few hundred) and can be inspected by humans (vs. the impossibility of inspecting HMM transition probabilities)
TBL: the Algorithm
• Step 1: Label every word with its most likely tag (from the dictionary)
• Step 2: Check every possible transformation & select the one which most improves tagging (with respect to the hand-tagged corpus)
• Step 3: Re-tag the corpus applying the rules
• Repeat steps 2-3 until some stopping criterion is reached, e.g. x% correct with respect to the training corpus
• RESULT: a sequence of transformation rules
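The loop above can be sketched in miniature: start from most-likely tags, then greedily pick, from a small candidate rule space, whichever rule most reduces errors against the hand-tagged reference. The corpus, rule space, and rule encoding (change tag X to Y when the previous tag is Z) are toy assumptions, far smaller than Brill's templates.

```python
# Miniature TBL loop: greedy error-driven rule selection.
# Rules are (from_tag, to_tag, previous_tag) triples -- a toy rule space.
def apply_rule(tags, rule):
    frm, to, prev = rule
    out = list(tags)
    for i in range(1, len(out)):
        if out[i] == frm and out[i - 1] == prev:
            out[i] = to
    return out

def errors(tags, gold):
    return sum(t != g for t, g in zip(tags, gold))

def learn(initial, gold, candidate_rules, max_rules=5):
    tags, learned = list(initial), []
    for _ in range(max_rules):
        best = min(candidate_rules,
                   key=lambda r: errors(apply_rule(tags, r), gold))
        if errors(apply_rule(tags, best), gold) >= errors(tags, gold):
            break  # stopping criterion: no rule improves the tagging
        tags = apply_rule(tags, best)
        learned.append(best)
    return learned  # RESULT: an ordered sequence of transformation rules

initial = ["VBZ", "VBN", "TO", "NN", "NN"]   # most-likely tags (step 1)
gold    = ["VBZ", "VBN", "TO", "VB", "NN"]   # hand-tagged reference
rules   = [("NN", "VB", "TO"), ("NN", "VB", "VBN")]
print(learn(initial, gold, rules))  # [('NN', 'VB', 'TO')]
```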
TBL: Problems
• Execution speed: the TBL tagger is slow compared to the HMM approach
  – Solution: compile the rules to a Finite State Transducer (FST)
• Learning speed: Brill’s implementation takes over a day (600k tokens)
Tagging Unknown Words
• New words are added to (newspaper) language at 20+ per month.
• Plus many proper names …
• Unknown words increase error rates by 1-2%
• Method 1: assume they are nouns
• Method 2: assume the unknown words have a probability distribution similar to hapax legomena
• Method 3: use capitalisation, suffixes, etc. This works very well for morphologically complex languages
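Methods 1 and 3 combine naturally into a simple heuristic, sketched below. The cue lists (suffixes and their tags) and tag names are invented for illustration.

```python
# Guess a tag for an unknown word from capitalisation and suffix cues
# (Method 3), falling back to noun (Method 1). Cues are invented.
def guess_unknown(word, sentence_initial=False):
    if word[0].isupper() and not sentence_initial:
        return "PROPER_NOUN"   # capitalised mid-sentence: likely a name
    for suffix, tag in [("ing", "VERB"), ("ly", "ADVERB"), ("able", "ADJ")]:
        if word.endswith(suffix):
            return tag
    return "NOUN"              # Method 1: assume noun

print(guess_unknown("Snodgrass"))  # PROPER_NOUN
print(guess_unknown("blorfing"))   # VERB
print(guess_unknown("flurp"))      # NOUN
```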
Further Reading
• Introductory:
  – Jurafsky, Daniel & James H. Martin, Speech and Language Processing, Prentice Hall: 2000. Chapter 8, pp. 285-322
  – Manning, Christopher & Hinrich Schutze, Foundations of Statistical Natural Language Processing, Chapter 10, pp. 341-380
• Texts:
  – Brill, Eric. Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging. Computational Linguistics 21:543-565
  – Cherry, L. PARTS: a system for assigning word classes to English text. AT&T memorandum, 1978
  – Church, K. A stochastic parts program and noun phrase parser for unrestricted text. Second Conference on Applied NLP, Austin, 1988
  – Garside, Roger, Geoffrey Sampson and Geoffrey Leech (eds). The Computational Analysis of English: a corpus-based approach. London: 1987
Also check the papers referred to in the Introductory references.