CIKM 2005

Predicting Accuracy of Extracting Information
from Unstructured Text Collections
Eugene Agichtein and Silviu Cucerzan
Microsoft Research
Text Mining Search and Navigation Group
Extracting and Managing Information in Text
• Text document collections: Web documents, blogs, news alerts, …
• Varying properties: different languages, varying consistency, noise/errors, …
• A complex problem: usually many parameters, often tuning required
[Diagram: text document collection → Information Extraction System → entities, events, relations]
• Success ~ Accuracy
The Goal: Predict Extraction Accuracy
Estimate the expected success of an IE system that relies on contextual patterns before:
• running expensive experiments
• tuning parameters
• training the system
Useful when adapting an IE system to:
• a new task
• a new document collection
• a new language
Specific Extraction Tasks
• Named Entity Recognition (NER): entity types Person, Organization, Location, Misc
"European champions Liverpool paved the way to the group stages of the Champions League taking a 3-1 lead over CSKA Sofia on Wednesday [...] Gerard Houllier's men started the match in Sofia on fire with Steven Gerrard scoring [...]"
• Relation Extraction (RE)
"Abraham Lincoln was born on Feb. 12, 1809, in a log cabin in Hardin (now Larue) County, Ky"
BORN(Who: Abraham Lincoln, When: Feb. 12, 1809, Where: Hardin County, KY)
Contextual Clues
• NER: left and right contexts around the entity
"… yesterday, Mrs Clinton told reporters the move to the East Room …"
(left context) [Mrs Clinton] (right context)
• RE: left, middle, and right contexts around the relation arguments
"engineers Orville and Wilbur Wright built the first working airplane in 1903."
(left context) [Orville and Wilbur Wright] (middle context) [airplane] (right context)
Approach: Language Modelling
• The presence of contextual clues for a task appears related to extraction difficulty
• The more "obvious" the clues, the easier the task
• Clue "obviousness" can be modelled as the "unexpectedness" of a word in context
• Use Language Modelling (LM) techniques to quantify this intuition
Language Models (LM)
• An LM is a summary of the word distribution in a text
• Can define unigram, bigram, trigram, and general n-gram models
• More complex models exist
  – Distance, syntax, word classes
  – But: not robust for the web, other languages, …
• LMs are used in IR, ASR, text classification, clustering:
  – Query Clarity: predicting query performance [Cronen-Townsend et al., SIGIR 2002]
  – Context modelling for NER [Cucerzan et al., EMNLP 1999], [Klein et al., CoNLL 2003]
  – …
Document Language Models
• A basic LM is a normalized word histogram for the document collection
• Unigram (word) models are commonly used
• Higher-order n-grams (bigrams, trigrams) can also be used

word        Freq
the         0.0584
to          0.0269
and         0.0199
said        0.0147
...         ...
's          0.0018
company     0.0014
mrs         0.0003
won         0.0003
president   0.0003

[Figure: word-frequency histogram for the collection, from "the" down to "president"]
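The normalized word histogram above can be sketched in a few lines of Python (a toy illustration; the function name is ours, not from the paper):

```python
from collections import Counter

def unigram_lm(tokens):
    """Build a unigram LM: a word histogram normalized to sum to 1."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# Toy "collection"; real frequencies would come from millions of words.
lm = unigram_lm("the company said the president won".split())
print(lm["the"])  # 2/6 ≈ 0.333
```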
Context Language Models
• "Senator Christopher Dodd, D-Conn., named general chairman of the Democratic National Committee last week by President Bill Clinton, said it was premature to talk about lifting the U.S. embargo against Cuba…"
• "Although the Clinton's health plan failed to make it through Congress this year, Mrs Clinton vowed continued support for the proposal."
• "A senior White House official, who accompanied Clinton, told reporters…"
• "By the fall of 1905, the Wright brothers' experimental period ended. With their third powered airplane, they now routinely made flights of several…"
• "Against this backdrop, we see the Wright brothers' efforts to develop an airplane…"
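Context models are built from exactly such windows. A minimal sketch of collecting fixed-width left and right contexts, assuming tokenized text and known entity token spans (the helper name is illustrative):

```python
def context_windows(tokens, entity_spans, width=3):
    """Collect the words within `width` positions to the left and right
    of each entity occurrence; the entity tokens themselves are excluded."""
    contexts = []
    for start, end in entity_spans:  # half-open token spans [start, end)
        contexts += tokens[max(0, start - width):start]  # left context
        contexts += tokens[end:end + width]              # right context
    return contexts

tokens = "yesterday Mrs Clinton told reporters the move".split()
# One occurrence of "Mrs Clinton" at token span [1, 3)
print(context_windows(tokens, [(1, 3)], width=2))
# ['yesterday', 'told', 'reporters']
```

The context LM is then simply the unigram model of the collected context words.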
Key Observation
• If normally rare words consistently appear in the contexts around entities, the extraction task tends to be "easier".
[Figure: word-frequency histogram for the collection, as on the previous slide]
• The contexts for a task are an intrinsic property of the collection and the extraction task, not of a specific information extraction system.
Divergence Measures
• Cosine divergence:
  Cosine(LM_C, LM_BG) = 1 - (LM_C · LM_BG) / (||LM_C||_2 · ||LM_BG||_2)
• Relative entropy (KL divergence):
  KL(LM_C || LM_BG) = Σ_{w ∈ V} LM_C(w) · log( LM_C(w) / LM_BG(w) )
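Both divergences can be computed directly from two dictionary-based unigram models. A small sketch; the epsilon used to smooth words missing from the background model is our choice, not specified on the slide:

```python
import math

def kl_divergence(lm_c, lm_bg, epsilon=1e-10):
    """KL(LM_C || LM_BG) = sum_w LM_C(w) * log(LM_C(w) / LM_BG(w)).
    Words unseen in the background model are smoothed with `epsilon`."""
    return sum(p * math.log(p / lm_bg.get(w, epsilon))
               for w, p in lm_c.items() if p > 0)

def cosine_divergence(lm_c, lm_bg):
    """1 - cosine similarity of the two models viewed as frequency vectors."""
    dot = sum(p * lm_bg.get(w, 0.0) for w, p in lm_c.items())
    norm_c = math.sqrt(sum(p * p for p in lm_c.values()))
    norm_bg = math.sqrt(sum(p * p for p in lm_bg.values()))
    return 1.0 - dot / (norm_c * norm_bg)

lm_bg = {"the": 0.6, "said": 0.3, "mrs": 0.1}
lm_c  = {"mrs": 0.7, "said": 0.3}
print(kl_divergence(lm_c, lm_bg))  # > 0: context model diverges from background
```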
Interpreting Divergence: Reference LM
• Need to calibrate the observed divergence
• Compute a Reference Model LM_R:
  – Pick K random non-stopwords R and compute the context language model around each R_i
  "… the five-star Hotel Astoria is a symbol of elegance and comfort. With an unbeatable location in St Isaac's Square in the heart of St Petersburg, …"
• Normalized KL(LM_C) = KL(LM_C || LM_BG) / KL(LM_R || LM_BG)
• Normalization corrects for the bias introduced by small sample size
Reference LM (cont.)
[Figure: average KL-divergence of LM_R vs. random sample size (1 to 100), for 1-, 2-, and 3-word contexts; divergence falls from about 6.85 at sample size 1 to about 0.53 at sample size 100]
• LM_R converges to LM_BG for large sample sizes
• The divergence of LM_R is substantial for small samples
Predicting Extraction Accuracy: The Algorithm
1. Start with a small sample S of entities (or relation tuples) to be extracted
2. Find occurrences of S in the given collection
3. Compute LM_BG for the collection
4. Compute LM_C for S and the collection
5. Pick |S| random words R from LM_BG
6. Compute the context LM for R → LM_R
7. Compute KL(LM_C || LM_BG) and KL(LM_R || LM_BG)
8. Return the normalized KL(LM_C)
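The steps above can be composed into a minimal end-to-end sketch, assuming unigram models, whitespace tokenization, and single-token entities; all function names and the stopword handling are illustrative, not from the paper:

```python
import math
import random
from collections import Counter

def unigram_lm(tokens):
    """A normalized word histogram over the given tokens (steps 3-4)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def contexts_around(tokens, targets, width=3):
    """Step 2: gather words within `width` positions of every occurrence
    of a target word (the target itself is excluded)."""
    ctx = []
    for i, tok in enumerate(tokens):
        if tok in targets:
            ctx += tokens[max(0, i - width):i] + tokens[i + 1:i + 1 + width]
    return ctx

def kl(lm_c, lm_bg, eps=1e-10):
    """Step 7: KL(LM_C || LM_BG), smoothing unseen background words."""
    return sum(p * math.log(p / lm_bg.get(w, eps)) for w, p in lm_c.items())

def predict_difficulty(tokens, sample, stopwords, width=3, seed=0):
    """Steps 1-8: return the normalized KL-divergence of the context model
    for `sample`; higher values suggest an easier extraction task."""
    lm_bg = unigram_lm(tokens)                                 # step 3
    lm_c = unigram_lm(contexts_around(tokens, sample, width))  # step 4
    vocab = [w for w in lm_bg if w not in stopwords]
    rng = random.Random(seed)
    r = set(rng.sample(vocab, min(len(sample), len(vocab))))   # step 5
    lm_r = unigram_lm(contexts_around(tokens, r, width))       # step 6
    return kl(lm_c, lm_bg) / kl(lm_r, lm_bg)                   # steps 7-8
```

On a real collection, `sample` would hold the seed entities or relation tuples and `stopwords` a standard stopword list; the returned score is then compared across tasks, as in the tables that follow.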
Experimental Evaluation
• How to measure success?
  – Compare the predicted ease of a task vs. the observed extraction accuracy
• Extraction tasks: NER and RE
  – NER: datasets from the CoNLL 2002 and 2003 evaluations
  – RE: binary relations between NEs and generic phrases
Extraction Task Accuracy
NER (F-measure, %):
Entity    English  Spanish  Dutch
LOC       90.21    79.84    79.19
MISC      78.83    55.82    73.90
ORG       81.86    79.69    69.48
PER       91.47    86.83    78.83
Overall   86.77    79.20    75.24

RE:
Relation  Accuracy (strict)  Accuracy (partial)  Task Difficulty
BORN      0.73               0.96                Easy
DIED      0.34               0.97                Easy
INVENT    0.35               0.64                Hard
WROTE     0.12               0.50                Hard
Document Collections
Task  Collection                                  Size
NER   Reuters RCV1, 1/100                         3,566,125 words
      Reuters RCV1, 1/10                          35,639,471 words
      EFE newswire articles, May 2000 (Spanish)   367,589 words
      "De Morgen" articles (Dutch)                268,705 words
RE    Encarta document collection                 64,187,912 words

Note that the Spanish and Dutch corpora are much smaller.
Predicting NER Performance (English)
F-measure (%) by system:
Entity   Florian et al.  Chieu et al.  Klein et al.  Zhang et al.  Carreras et al.  Average
LOC      91.15           91.12         89.98         89.54         89.26            90.21
MISC     80.44           79.16         80.15         75.87         78.54            78.83
ORG      84.67           84.32         80.48         80.46         79.41            81.86
PER      93.85           93.44         90.72         90.44         88.93            91.47
Overall  88.76           88.31         86.31         85.50         85.00            86.77

Absolute and normalized KL-divergence (Reuters 1/10, context = 3 words, stopwords discarded, avg):
Entity   Absolute  Normalized
LOC      0.98      1.07
MISC     1.29      1.40
ORG      2.83      3.08
PER      4.10      4.46
RANDOM   0.92      –

• LOC exception: large overlap between locations in the training and test collections (i.e., simple gazetteers are effective).
NER – Robustness / Different Dimensions
• Counting stopwords (w) or not (w/o); Reuters 1/100, context ±3, avg:
       LOC   MISC  ORG   PER   RAND
F      90.2  78.8  81.9  91.5  –
w      0.93  1.09  2.68  3.91  0.78
w/o    1.48  1.83  3.81  5.62  1.27

• Context size; Reuters 1/100, no stopwords, avg:
Context  LOC   MISC  ORG   PER   RAND
±1       0.88  1.26  2.12  2.94  2.43
±2       1.06  1.47  2.95  4.11  1.14
±3       1.07  1.40  3.08  4.46  0.92

• Corpus size; Reuters, context ±3, no stopwords, avg:
Size   LOC   MISC  ORG   PER   RAND
1/10   1.07  1.40  3.08  4.46  0.92
1/100  1.48  1.83  3.81  5.62  1.27
Other Dimensions: Sample Size
[Figure (left): normalized KL-divergence of LM_C vs. sample size (1 to 50) for LOC, MISC, ORG, PER, 3-word context]
[Figure (right): average KL-divergence of LM_R vs. random sample size (1 to 100), for 1-, 2-, and 3-word contexts]
• The normalized divergence of LM_C remains high
  – Contrast with LM_R, whose divergence drops for larger sample sizes
Other Dimensions: N-gram Size
[Figure: normalized KL-divergence vs. n-gram size (1 to 3) for LOC, MISC, ORG, PER; actual F-measures: LOC 90.21, MISC 78.83, ORG 81.86, PER 91.47]
Higher-order n-grams may help in some cases.
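The unigram models used so far extend to higher-order n-grams by histogramming token tuples instead of single words, after which the same divergence measures apply unchanged (illustrative sketch):

```python
from collections import Counter

def ngram_lm(tokens, n=2):
    """Normalized histogram over n-grams instead of single words;
    the divergence measures apply to these models unchanged."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

lm2 = ngram_lm("the wright brothers built the first airplane".split(), n=2)
print(lm2[("wright", "brothers")])  # 1/6 ≈ 0.167
```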
Other Languages
• Spanish:
Entity   Actual  Context=1  Context=2  Context=3
LOC      79.84   1.18       1.39       1.42
MISC     55.82   1.73       2.12       2.35
ORG      79.69   1.42       1.59       1.64
PER      86.83   2.01       2.31       2.56
RANDOM   –       2.42       1.82       1.53

Problem: very small collections

• Dutch:
Entity   Actual  Context=1  Context=2  Context=3
LOC      79.19   1.44       1.65       1.61
MISC     73.90   1.97       2.02       1.91
ORG      69.48   1.53       1.86       1.92
PER      78.83   2.25       2.63       2.60
RANDOM   –       2.59       1.89       1.71
Predicting RE Performance (English)
Relation  Accuracy (strict / partial)  Context=1  Context=2  Context=3
BORN      0.73 / 0.96                  2.02       2.17       2.39
DIED      0.34 / 0.97                  1.89       1.86       1.83
INVENT    0.35 / 0.64                  1.94       1.75       1.72
WROTE     0.12 / 0.50                  1.59       1.59       1.53
RANDOM    –                            6.87       6.24       5.79

• 2- and 3-word contexts correctly distinguish between "easy" tasks (BORN, DIED) and "difficult" tasks (INVENT, WROTE).
• A 1-word context appears not sufficient for predicting RE performance.
Other Dimensions: Sample Size
[Figure (left): normalized KL-divergence vs. sample size (1 to 40) for BORN, DIED, INVENT, WROTE, 3-word context]
[Figure (right): average KL-divergence of LM_R vs. random sample size (1 to 100), for 1-, 2-, and 3-word contexts]
• Divergence increases with sample size
Results Summary
• Context models can be effective in predicting the success of information extraction systems
• Even a small sample of the available entities can be sufficient for making accurate predictions
• The size of the available collection is the most important limiting factor
Other Applications and Future Work
• Could use the results for:
  – Active learning / training IE systems
  – Improved boundary detection for NER
  – Improved confidence estimation of extraction
    • e.g., Culotta and McCallum [HLT 2004]
• For better results, could incorporate:
  – Internal contexts, gazetteers (e.g., for LOC entities)
    • e.g., Agichtein & Ganti [KDD 2004], Cohen & Sarawagi [KDD 2004]
  – Syntactic/logical distance
  – Coreference resolution
  – Word classes
Summary
• Presented the first attempt to predict information extraction accuracy for a given task and collection
• Developed a general, system-independent method utilizing Language Modelling techniques
• Estimates of extraction accuracy can help:
  – Deploy information extraction systems
  – Port information extraction systems to new tasks, domains, collections, and languages
For More Information
Text Mining, Search, and Navigation Group
http://research.microsoft.com/tmsn/