Toward Unified Graphical Models of
Information Extraction and Data Mining
Andrew McCallum
Computer Science Department
University of Massachusetts Amherst
Joint work with
Charles Sutton, Aron Culotta, Ben Wellner, Khashayar Rohanimanesh,
Wei Li, Andres Corrada, Xuerui Wang
Goal:
Mine actionable knowledge
from unstructured text.
Extracting Job Openings from the Web
foodscience.com-Job2
JobTitle: Ice Cream Guru
Employer: foodscience.com
JobCategory: Travel/Hospitality
JobFunction: Food Services
JobLocation: Upper Midwest
Contact Phone: 800-488-2611
DateExtracted: January 8, 2001
Source: www.foodscience.com/jobs_midwest.htm
OtherCompanyJobs: foodscience.com-Job1
IE from Chinese Documents regarding Weather
Department of Terrestrial System, Chinese Academy of Sciences
200k+ documents
several millennia old
- Qing Dynasty Archives
- memos
- newspaper articles
- diaries
IE from Research Papers
[McCallum et al ‘99]
IE from Research Papers
Mining Research Papers
[Rosen-Zvi, Griffiths, Steyvers,
Smyth, 2004]
What is “Information Extraction”

As a family of techniques:
Information Extraction = segmentation + classification + association + clustering

October 14, 2002, 4:00 a.m. PT

For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation.

Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers.

"We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access."

Richard Stallman, founder of the Free Software Foundation, countered saying…

Extracted mentions, in document order: Microsoft Corporation, CEO, Bill Gates, Microsoft, Gates, Microsoft, Bill Veghte, Microsoft, VP, Richard Stallman, founder, Free Software Foundation

(The example is stepped through once per sub-task; in the final pass, mentions referring to the same entity, e.g. Microsoft Corporation and the three Microsoft mentions, are clustered together.)
Larger Context

[Pipeline: Document collection → Spider → Filter → IE (Segment, Classify, Associate, Cluster) → Database → Data Mining (Discover patterns: entity types, links/relations, events) → Actionable knowledge (Prediction, Outlier detection, Decision support)]
Outline
• Examples of IE and Data Mining. ✓
• Brief review of Conditional Random Fields
• Joint inference: Motivation and examples
  – Joint Labeling of Cascaded Sequences (Belief Propagation)
  – Joint Labeling of Distant Entities (BP by Tree Reparameterization)
  – Joint Co-reference Resolution (Graph Partitioning)
  – Joint Segmentation and Co-ref (Iterated Conditional Samples)
• Piecewise Training for large multi-component models
• Two example projects
  – Email, contact management, and Social Network Analysis
  – Research Paper search and analysis
Hidden Markov Models

HMMs are the standard sequence modeling tool in genomics, music, speech, NLP, …

[Figure: finite state model / graphical model; transitions among states S_{t-1}, S_t, S_{t+1} generate a state sequence and an observation sequence o_1 … o_8.]

Parameters: for all states S = {s_1, s_2, …}
- Start state probabilities: P(s_t)
- Transition probabilities: P(s_t | s_{t-1})
- Observation (emission) probabilities: P(o_t | s_t), usually a multinomial over an atomic, fixed alphabet

P(\vec{s}, \vec{o}) = \prod_{t=1}^{|\vec{o}|} P(s_t \mid s_{t-1}) \, P(o_t \mid s_t)

Training: maximize probability of training observations (with prior)
IE with Hidden Markov Models
Given a sequence of observations:
Yesterday Rich Caruana spoke this example sentence.
and a trained HMM:
person name
location name
background
Find the most likely state sequence: (Viterbi)
Yesterday Rich Caruana spoke this example sentence.
Any words said to be generated by the designated “person name” state are extracted as a person name:
Person name: Rich Caruana
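To make the decoding step concrete, here is a minimal Viterbi sketch. The states, toy parameters, and assumption that per-token observation log-probabilities are given are all illustrative; this is not the trained extractor from the slides.

```python
import numpy as np

# Toy HMM for illustration (not the trained model described above).
states = ["background", "person", "location"]
start = np.log([0.8, 0.1, 0.1])            # log P(s_1)
trans = np.log([[0.8, 0.1, 0.1],           # log P(s_t | s_{t-1})
                [0.4, 0.5, 0.1],
                [0.4, 0.1, 0.5]])

def viterbi(obs_logprobs):
    """obs_logprobs[t][s] = log P(o_t | s); returns the most likely state path."""
    T, S = obs_logprobs.shape
    delta = start + obs_logprobs[0]        # best log-score ending in each state
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + trans    # scores[i, j]: arrive in j from i
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + obs_logprobs[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):          # follow backpointers
        path.append(int(back[t][path[-1]]))
    return [states[s] for s in reversed(path)]
```

Tokens decoded into the "person" state would then be emitted as person names.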
(Linear Chain) Conditional Random Fields
[Lafferty, McCallum, Pereira 2001]

Undirected graphical model, trained to maximize conditional probability of outputs given inputs.

[Figure: finite state model / graphical model; output sequence of FSM states y_{t-1} … y_{t+3} (OTHER, PERSON, OTHER, ORG, TITLE, …) over input sequence x_{t-1} … x_{t+3} (said, Veghte, Microsoft, VP, …).]

p(\vec{y} \mid \vec{x}) = \frac{1}{Z(\vec{x})} \prod_{t=1}^{T} \Phi_y(y_t, y_{t-1}) \, \Phi_{xy}(x_t, y_t),
\quad\text{where}\quad \Phi(\cdot) = \exp\Big(\sum_k \lambda_k f_k(\cdot)\Big)
Fast-growing, widespread interest, many positive experimental results:
• Noun phrase, Named entity [HLT'03], [CoNLL'03]
• Protein structure prediction [ICML'04]
• IE from Bioinformatics text [Bioinformatics '04], …
• Asian word segmentation [COLING'04], [ACL'04]
• IE from Research papers [HLT'04]
• Object classification in images [CVPR '04]
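To make the equation above concrete, here is a sketch of the two quantities it combines, assuming the log-potentials log Φ (i.e. Σ_k λ_k f_k) have already been computed for one input sequence; this is illustrative, not any particular toolkit's API.

```python
import numpy as np

def crf_log_score(log_phi_y, log_phi_xy, y):
    """Unnormalized log p(y|x): summed transition + observation log-potentials.
    log_phi_y[t]:  (S, S) array of log Phi(y_{t-1}, y_t)
    log_phi_xy[t]: (S,)   array of log Phi(x_t, y_t)"""
    score = log_phi_xy[0][y[0]]
    for t in range(1, len(y)):
        score += log_phi_y[t][y[t - 1], y[t]] + log_phi_xy[t][y[t]]
    return score

def crf_log_partition(log_phi_y, log_phi_xy):
    """log Z(x) via the forward algorithm (log-sum-exp over all label paths)."""
    alpha = log_phi_xy[0]
    for t in range(1, len(log_phi_xy)):
        alpha = np.logaddexp.reduce(alpha[:, None] + log_phi_y[t], axis=0)
        alpha += log_phi_xy[t]
    return np.logaddexp.reduce(alpha)

# log p(y|x) = crf_log_score(...) - crf_log_partition(...)
```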
Table Extraction from Government Reports
Cash receipts from marketings of milk during 1995 at $19.9 billion dollars, was
slightly below 1994. Producer returns averaged $12.93 per hundredweight,
$0.19 per hundredweight below 1994. Marketings totaled 154 billion pounds,
1 percent above 1994. Marketings include whole milk sold to plants and dealers
as well as milk sold directly to consumers.
An estimated 1.56 billion pounds of milk were used on farms where produced,
8 percent less than 1994. Calves were fed 78 percent of this milk with the
remainder consumed in producer households.
Milk Cows and Production of Milk and Milkfat: United States, 1993-95

Year | Milk Cows 1/ (1,000 Head) | Milk per Cow (Pounds) | Milkfat per Cow (Pounds) | Fat in All Milk Produced (Percent) | Total Milk (Million Pounds) | Total Milkfat (Million Pounds)
1993 |          9,589            |        15,704         |          575             |               3.66                 |          150,582            |           5,514.4
1994 |          9,500            |        16,175         |          592             |               3.66                 |          153,664            |           5,623.7
1995 |          9,461            |        16,451         |          602             |               3.66                 |          155,644            |           5,694.3

1/ Average number during year, excluding heifers not yet fresh.
2/ Excludes milk sucked by calves.
Table Extraction from Government Reports
[Pinto, McCallum, Wei, Croft, 2003 SIGIR]
100+ documents from www.fedstats.gov
[Figure: the CRF assigns one label per line of the report shown above.]

Labels: (12 in all)
• Non-Table
• Table Title
• Table Header
• Table Data Row
• Table Section Data Row
• Table Footnote
• ...

Features:
• Percentage of digit chars
• Percentage of alpha chars
• Indented
• Contains 5+ consecutive spaces
• Whitespace in this line aligns with prev.
• ...
• Conjunctions of all previous features, time offset: {0,0}, {-1,0}, {0,1}, {1,2}.
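A sketch of how per-line features of this kind might be computed; the function and feature names are illustrative, and the published system's exact feature set and conjunctions differ.

```python
import re

def line_features(line, prev_line=""):
    """Per-line layout features of the kind listed above (a sketch only)."""
    n = max(len(line), 1)
    # start positions of runs of 2+ spaces, a crude proxy for column gaps
    gaps = lambda s: [m.start() for m in re.finditer(r"\s{2,}", s)]
    return {
        "pct_digit": sum(c.isdigit() for c in line) / n,
        "pct_alpha": sum(c.isalpha() for c in line) / n,
        "indented": line.startswith(" "),
        "five_plus_spaces": "     " in line,
        "aligns_with_prev": bool(gaps(line)) and gaps(line) == gaps(prev_line),
    }
```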
Table Extraction Experimental Results
[Pinto, McCallum, Wei, Croft, 2003 SIGIR]
Method           | Line labels, percent correct | Table segments, F1
HMM              | 65%                          | 64%
Stateless MaxEnt | 85%                          | —
CRF              | 95%                          | 92%
IE from Research Papers

Method                                                            | Field-level F1
Hidden Markov Models (HMMs) [Seymore, McCallum, Rosenfeld, 1999]  | 75.6
Support Vector Machines (SVMs) [Han, Giles, et al, 2003]          | 89.7
Conditional Random Fields (CRFs) [Peng, McCallum, 2004]           | 93.9

(CRFs cut error by roughly 40% relative to SVMs.)
Named Entity Recognition
CRICKET MILLNS SIGNS FOR BOLAND
CAPE TOWN 1996-08-22
South African provincial side
Boland said on Thursday they
had signed Leicestershire fast
bowler David Millns on a one
year contract.
Millns, who toured Australia with
England A in 1992, replaces
former England all-rounder
Phillip DeFreitas as Boland's
overseas professional.
Labels and examples:
PER:  Yayuk Basuki, Innocent Butare
ORG:  3M, KDP, Cleveland
LOC:  Cleveland, Nirmal Hriday, The Oval
MISC: Java, Basque, 1,000 Lakes Rally
Named Entity Extraction Results
[McCallum & Li, 2003, CoNLL]
Method                                               | F1
HMMs: BBN's Identifinder                             | 73%
CRFs w/out Feature Induction                         | 83%
CRFs with Feature Induction based on LikelihoodGain  | 90%
Outline
• Examples of IE and Data Mining. ✓
• Brief review of Conditional Random Fields ✓
• Joint inference: Motivation and examples
  – Joint Labeling of Cascaded Sequences (Belief Propagation)
  – Joint Labeling of Distant Entities (BP by Tree Reparameterization)
  – Joint Co-reference Resolution (Graph Partitioning)
  – Joint Segmentation and Co-ref (Iterated Conditional Samples)
• Piecewise Training for large multi-component models
• Two example projects
  – Email, contact management, and Social Network Analysis
  – Research Paper search and analysis
Problem:

[Diagram: Document collection → IE (Segment, Classify, Associate, Cluster) → Database → Knowledge Discovery (Discover patterns: entity types, links/relations, events) → Actionable knowledge]

Combined in serial juxtaposition, IE and DM are unaware of each other's weaknesses and opportunities.
1) DM begins from a populated DB, unaware of where the data came from, or its inherent uncertainties.
2) IE is unaware of emerging patterns and regularities in the DB.
The accuracy of both suffers, and significant mining of complex text sources is beyond reach.
Solution:

[Diagram: the same pipeline — Document collection → Spider → Filter → IE (Segment, Classify, Associate, Cluster) → Database → Data Mining (Discover patterns: entity types, links/relations, events) → Actionable knowledge (Prediction, Outlier detection, Decision support) — with IE passing Uncertainty Info forward to Data Mining, and Data Mining passing Emerging Patterns back to IE]
Solution: Unified Model

[Diagram: Document collection → Spider → Filter → a single Probabilistic Model spanning both IE (Segment, Classify, Associate, Cluster) and Data Mining (Discover patterns: entity types, links/relations, events) → Actionable knowledge (Prediction, Outlier detection, Decision support)]

Discriminatively-trained undirected graphical models:
– Conditional Random Fields [Lafferty, McCallum, Pereira]
– Conditional PRMs [Koller…], [Jensen…], [Getoor…], [Domingos…]

Complex inference and learning: just what we researchers like to sink our teeth into!
Larger-scale Joint Inference for IE
• What model structures will capture salient dependencies?
• Will joint inference improve accuracy?
• How to do inference in these large graphical models?
• How to efficiently train these models,
which are built from multiple large components?
1. Jointly labeling cascaded sequences: Factorial CRFs
[Sutton, Rohanimanesh, McCallum, ICML 2004]

[Figure: a factorial CRF with one chain per layer, bottom to top: English words → Part-of-speech → Noun-phrase boundaries → Named-entity tag]

In a cascade, errors compound: each stage must be near-perfect for the pipeline to do well.

Joint prediction of part-of-speech and noun-phrase boundaries in newswire matches cascaded accuracy with only 50% of the training data.

Inference: Tree reparameterization BP [Wainwright et al, 2002]
2. Jointly labeling distant mentions: Skip-chain CRFs
[Sutton, McCallum, SRL 2004]

"… Senator Joe Green said today … . Green ran for …"

In a linear chain, the dependency among similar, distant mentions (the two "Green"s) is ignored. Adding skip edges between such mentions yields a 14% reduction in error on the most-repeated field in email seminar announcements.

Inference: Tree reparameterization BP [Wainwright et al, 2002]
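A sketch of the skip-edge construction, using one common choice: connect pairs of identical capitalized tokens. The whitespace tokenization and the function name are illustrative assumptions; real systems restrict and score these edges more carefully.

```python
def skip_edges(tokens):
    """Return (i, j) pairs of identical capitalized tokens; a skip-chain CRF
    lays extra pairwise potentials over these edges, on top of the chain."""
    seen, edges = {}, []
    for t, w in enumerate(tokens):
        if w.istitle():
            edges += [(s, t) for s in seen.get(w, [])]
            seen.setdefault(w, []).append(t)
    return edges

tokens = "Senator Joe Green said today ... Green ran for ...".split()
print(skip_edges(tokens))   # [(2, 6)] -- the two mentions of 'Green'
```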
3. Joint co-reference among all pairs: Affinity Matrix CRF
("Entity resolution", "Object correspondence")

[Figure: mentions ". . . Mr Powell . . .", ". . . Powell . . .", ". . . she . . ." connected by pairwise Y/N coreference variables with affinities 45, 99, 11]

~25% reduction in error on co-reference of proper nouns in newswire.

Inference: Correlational clustering graph partitioning [Bansal, Blum, Chawla, 2002]
[McCallum, Wellner, IJCAI WS 2003, NIPS 2004]
Coreference Resolution
AKA "record linkage", "database record deduplication", "entity resolution", "object correspondence", "identity uncertainty"

Input: a news article, with named-entity "mentions" tagged:

Today Secretary of State Colin Powell met with . . . . . . he . . . . . . Condoleezza Rice . . . . Mr Powell . . . she . . . . Powell . . . President Bush . . . Rice . . . Bush . . .

Output: the number of entities, N = 3, and their mentions:
#1: Secretary of State Colin Powell; he; Mr. Powell; Powell
#2: Condoleezza Rice; she; Rice
#3: President Bush; Bush
Inside the Traditional Solution

Pair-wise Affinity Metric between Mention (3) ". . . Mr Powell . . ." and Mention (4) ". . . Powell . . .": coreferent (Y/N)?

Feature                                          Value  Weight
Two words in common                                N      29
One word in common                                 Y      13
"Normalized" mentions are string identical         Y      39
Capitalized word in common                         Y      17
> 50% character tri-gram overlap                   Y      19
< 25% character tri-gram overlap                   N     -34
In same sentence                                   Y       9
Within two sentences                               Y       8
Further than 3 sentences apart                     N      -1
"Hobbs Distance" < 3                               Y      11
Number of entities in between two mentions = 0     N      12
Number of entities in between two mentions > 4     N      -3
Font matches                                       Y       1
Default                                            Y     -19

OVERALL SCORE = 98 (sum of weights for features that fire) > threshold = 0
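A sketch of this traditional pairwise decision: fire binary features on a mention pair, take a weighted sum, and merge if the score clears the threshold. The feature set here is a toy subset of the table above, and the normalization rule is an assumption for illustration.

```python
# Toy subset of the weights shown above.
WEIGHTS = {"two_words_common": 29, "one_word_common": 13,
           "normalized_identical": 39, "cap_word_common": 17, "default": -19}

def features(m1, m2):
    """Binary features on a mention pair (illustrative definitions)."""
    w1, w2 = set(m1.lower().split()), set(m2.lower().split())
    titles = {"mr", "mrs", "ms", "dr"}            # assumed normalization rule
    caps = lambda m: {w for w in m.split() if w.istitle()}
    return {"two_words_common": len(w1 & w2) >= 2,
            "one_word_common": len(w1 & w2) == 1,
            "normalized_identical": (w1 - titles) == (w2 - titles),
            "cap_word_common": bool(caps(m1) & caps(m2)),
            "default": True}

def coreferent(m1, m2, threshold=0):
    score = sum(WEIGHTS[f] for f, on in features(m1, m2).items() if on)
    return score > threshold

print(coreferent("Mr Powell", "Powell"))   # True: 13 + 39 + 17 - 19 = 50 > 0
```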
The Problem

[Figure: Mr Powell–Powell affinity = 98 → Y; Powell–she affinity = 104 → Y; Mr Powell–she affinity = 11 → N]

Pair-wise merging decisions are being made independently from each other.

Affinity measures are noisy and imperfect; the merging decisions should be made in relational dependence with each other.
A Markov Random Field for Co-reference (MRF)
[McCallum & Wellner, 2003, ICML]

[Figure: mentions ". . . Mr Powell . . .", ". . . Powell . . .", ". . . she . . ." with pairwise Y/N variables and edge affinities 45, 30, 11]

Make pair-wise merging decisions in dependent relation to each other by:
– calculating a joint probability,
– including all edge weights,
– adding dependence on consistent triangles.

P(\vec{y} \mid \vec{x}) = \frac{1}{Z_{\vec{x}}} \exp\Big( \sum_{i,j} \sum_{l} \lambda_l \, f_l(x_i, x_j, y_{ij}) \; + \; \sum_{i,j,k} \lambda' \, f'(y_{ij}, y_{jk}, y_{ik}) \Big)
A Markov Random Field for Co-reference (MRF)
[McCallum & Wellner, 2003]

[Figure: three example labelings of the triangle above (edge affinities 45, 30, 11):
 – N, N, Y: score 4
 – Y, N, Y: an inconsistent triangle (transitivity violated), scored "infinity" penalty, i.e. probability zero under the f' factor
 – Y, N, N: score 64]
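A sketch of this scoring rule on a toy triangle. The signed weights, the {True, False} edge encoding, and the helper names are illustrative assumptions, not the published model's parameterization.

```python
import itertools

def labeling_log_score(weights, y):
    """Unnormalized log-score of a coreference labeling y: sum signed edge
    weights, with -inf for any triangle that violates transitivity."""
    score = sum(w if y[e] else -w for e, w in weights.items())
    mentions = sorted({m for e in weights for m in e})
    for i, j, k in itertools.combinations(mentions, 3):
        ys = [y[tuple(sorted(p))] for p in ((i, j), (j, k), (i, k))]
        if sum(ys) == 2:                 # exactly two "same" edges: inconsistent
            return float("-inf")
    return score

weights = {("mr_powell", "powell"): 45, ("powell", "she"): -30,
           ("mr_powell", "she"): 11}    # toy values
y = {("mr_powell", "powell"): True, ("powell", "she"): False,
     ("mr_powell", "she"): False}
print(labeling_log_score(weights, y))   # 45 + 30 - 11 = 64
```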
Inference in these MRFs = Graph Partitioning
[Boykov, Veksler, Zabih, 1999], [Kolmogorov & Zabih, 2002], [Yu, Cross, Shi, 2002]

[Figure: mentions ". . . Mr Powell . . .", ". . . Powell . . .", ". . . Condoleezza Rice . . .", ". . . she . . ." with pairwise edge weights 45, 106, 30, 134, 11, 10; two candidate partitionings of the same graph score 22 and 314]

\log P(\vec{y} \mid \vec{x}) \;\propto\; \sum_{i,j} \sum_l \lambda_l \, f_l(x_i, x_j, y_{ij}) \;=\; \sum_{i,j \text{ w/in partitions}} w_{ij} \;-\; \sum_{i,j \text{ across partitions}} w_{ij}
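A sketch of the partitioning objective above, plus a naive greedy merger. The cited work uses correlational-clustering and graph-cut algorithms; this loop is only a simple stand-in to illustrate what is being optimized.

```python
def partition_score(weights, clusters):
    """The slide's objective: sum of w_ij within partitions minus sum across."""
    total = 0
    for (a, b), w in weights.items():
        within = any(a in c and b in c for c in clusters)
        total += w if within else -w
    return total

def greedy_partition(mentions, weights):
    """Greedily merge clusters while the objective improves (a toy inference)."""
    clusters = [{m} for m in mentions]
    improved = True
    while improved:
        improved = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                merged = [c for k, c in enumerate(clusters) if k not in (i, j)]
                merged.append(clusters[i] | clusters[j])
                if partition_score(weights, merged) > partition_score(weights, clusters):
                    clusters, improved = merged, True
                    break
            if improved:
                break
    return clusters
```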
Co-reference Experimental Results
[McCallum & Wellner, 2003]

Proper noun co-reference

DARPA ACE broadcast news transcripts, 117 stories:
Method                    | Partition F1       | Pair F1
Single-link threshold     | 16%                | 18%
Best prev match [Morton]  | 83%                | 89%
MRFs                      | 88% (error -30%)   | 92% (error -28%)

DARPA MUC-6 newswire article corpus, 30 stories:
Method                    | Partition F1       | Pair F1
Single-link threshold     | 11%                | 7%
Best prev match [Morton]  | 70%                | 76%
MRFs                      | 74% (error -13%)   | 80% (error -17%)
Joint Co-reference of Multiple Fields
[Culotta & McCallum 2005]

[Figure: pairwise Y/N coreference variables among citation mentions and, in parallel, among venue mentions, with dependencies between the two:
 Citations: "X. Li / Predicting the Stock Market"; "X. Li / Predicting the Stock Market"; "Xuerui Li / Predict the Stock Market"
 Venues: "International Conference on Knowledge Discovery and Data Mining"; "KDD"; "KDD"]
Joint Co-reference Experimental Results
[Culotta & McCallum 2005]

CiteSeer Dataset: 1500 citations, 900 unique papers, 350 unique venues

                    Paper F1          Venue F1
Dataset          indep   joint     indep   joint
constraint        88.9    91.0      79.4    94.1
reinforce         92.2    92.2      56.5    60.1
face              88.2    93.7      80.9    82.8
reason            97.4    97.0      75.6    79.5
Micro Average     91.7    93.4      73.1    79.1

(error reduction from indep to joint: 20% for papers, 22% for venues)
4. Joint segmentation and co-reference
[Wellner, McCallum, Peng, Hay, UAI 2004]; see also [Marthi, Milch, Russell, 2003]

Extraction from and matching of research paper citations:

[Figure: two observed citations o, each with a segmentation s and citation attributes c, linked by co-reference decisions y and shared database field values (world knowledge):
 "Laurel, B. Interface Agents: Metaphors with Character, in The Art of Human-Computer Interface Design, B. Laurel (ed), Addison-Wesley, 1990."
 "Brenda Laurel. Interface Agents: Metaphors with Character, in Laurel, The Art of Human-Computer Interface Design, 355-366, 1990."]

35% reduction in co-reference error by using segmentation uncertainty.
6-14% reduction in segmentation error by using co-reference.

Inference: variant of Iterated Conditional Modes [Besag, 1986]
4. Joint segmentation and co-reference
Joint IE and Coreference from Research Paper Citations

Textual citation mentions (noisy, with duplicates) → Paper database, with fields, clean, duplicates collapsed:

AUTHORS              | TITLE       | VENUE
Cowell, Dawid…       | Probab…     | Springer
Montemerlo, Thrun…   | FastSLAM…   | AAAI…
Kjaerulff            | Approxi…    | Technic…
Citation Segmentation and Coreference

Laurel, B. Interface Agents: Metaphors with Character , in The Art of Human-Computer Interface Design , T. Smith (ed) , Addison-Wesley , 1990 .

Brenda Laurel . Interface Agents: Metaphors with Character , in Smith , The Art of Human-Computr Interface Design , 355-366 , 1990 .

(Coreferent? Y / ? / N)

1) Segment citation fields
2) Resolve coreferent citations
3) Form canonical database record, resolving conflicts:

AUTHOR    = Brenda Laurel
TITLE     = Interface Agents: Metaphors with Character
PAGES     = 355-366
BOOKTITLE = The Art of Human-Computer Interface Design
EDITOR    = T. Smith
PUBLISHER = Addison-Wesley
YEAR      = 1990

Perform all three steps jointly.
IE + Coreference Model

[Figure, built up in stages:
 – A linear-chain CRF segments each observed citation x into fields s, e.g. "J Besag 1986 On the…" → AUT AUT YR TITL TITL …
 – Citation mention attributes c are read off the segmentation: AUTHOR = "J Besag", YEAR = "1986", TITLE = "On the…"
 – The same structure is repeated for each citation mention, e.g. "Smyth . 2001 Data Mining…" → AUTHOR = "P Smyth", YEAR = "2001", TITLE = "Data Mining…"
 – Binary coreference variables (y/n) connect each pair of mentions.
 – Research paper entity attribute nodes aggregate the attributes of coreferent mentions.]

Such a highly connected graph makes exact inference intractable…
Parameter Estimation

Trained separately for different regions of the model:
– IE linear chain: exact MAP
– Coref graph edge weights: MAP on individual edges
– Entity attribute potentials: MAP, pseudo-likelihood

In all cases: climb the MAP gradient with a quasi-Newton method.
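For concreteness, a minimal sketch of that last step: maximizing a regularized objective with a quasi-Newton method (L-BFGS via SciPy). The quadratic objective below is a stand-in for the model's actual log-posterior, not the real likelihood.

```python
import numpy as np
from scipy.optimize import minimize

# Toy stand-in for a negative log-posterior: quadratic loss + Gaussian prior.
A = np.array([[2.0, 0.3], [0.3, 1.0]])
b = np.array([1.0, -0.5])

def neg_log_posterior(theta):
    return 0.5 * theta @ A @ theta - b @ theta + 0.5 * theta @ theta

def gradient(theta):
    return A @ theta - b + theta

result = minimize(neg_log_posterior, np.zeros(2), jac=gradient, method="L-BFGS-B")
print(result.x)   # the MAP estimate of this toy objective
```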
Outline
• Examples of IE and Data Mining. ✓
• Brief review of Conditional Random Fields ✓
• Joint inference: Motivation and examples ✓
  – Joint Labeling of Cascaded Sequences (Belief Propagation)
  – Joint Labeling of Distant Entities (BP by Tree Reparameterization)
  – Joint Co-reference Resolution (Graph Partitioning)
  – Joint Segmentation and Co-ref (Iterated Conditional Samples)
• Piecewise Training for large multi-component models
• Two example projects
  – Email, contact management, and Social Network Analysis
  – Research Paper search and analysis
Piecewise Training
Piecewise Training with NOTA
Experimental Results

Named entity tagging (CoNLL-2003)
Training set = 15k newswire sentences, 9 labels

Method | Test F1 | Training time
CRF    | 89.87   | 9 hours
MEMM   | 88.90   | 1 hour
CRF-PT | 90.50   | 5.3 hours

(CRF-PT: stat. sig. improvement at p = 0.001)
Experimental Results 2

Part-of-speech tagging (Penn Treebank, small subset)
Training set = 1154 newswire sentences, 45 labels

Method | Test F1 | Training time
CRF    | 88.1    | 14 hours
MEMM   | 88.1    | 2 hours
CRF-PT | 88.8    | 2.5 hours

(CRF-PT: stat. sig. improvement at p = 0.001)
“Parameter Independence Diagrams”

Graphical models are a formalism for representing independence assumptions among variables. Here we represent independence assumptions among parameters (in a factor graph).
Piecewise Training Research Questions
• How to select the boundaries of “pieces”?
• What choices of limited interaction are best?
• How to sample sparse subsets of NOTA instances?
• Application to simpler models (classifiers)
• Application to more complex models (parsing)
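To ground the idea behind these questions, here is a toy sketch of the piecewise recipe: train each factor type ("piece") on its own locally normalized objective, then reuse the learned weights in the joint model. All names and the training loop are illustrative; the NOTA refinement named above is not shown.

```python
import numpy as np

def train_piece(X, y, n_labels, lr=0.5, iters=300):
    """Train one piece (one factor type) as a locally normalized multinomial
    logistic regression -- a sketch of piecewise training's local objective."""
    W = np.zeros((X.shape[1], n_labels))
    onehot = np.eye(n_labels)[y]
    for _ in range(iters):
        logits = X @ W
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        W += lr * X.T @ (onehot - p) / len(y)   # local log-likelihood gradient
    return W

# Each factor type (e.g. transition factors, observation factors) is trained
# independently like this; at test time the pieces' log-potentials are summed
# in the full joint model and decoded as usual.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([0, 1, 1])
W = train_piece(X, y, n_labels=2)
```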
Piecewise Training in Factorial CRFs
for Transfer Learning
[Sutton, McCallum, 2005]
Emailed seminar ann’mt entities
Email English words
GRAND CHALLENGES FOR MACHINE
LEARNING
60k words training.
Jaime Carbonell
School of Computer Science
Carnegie Mellon University
3:30 pm
7500 Wean Hall
Machine learning has evolved from
obscurity in the 1970s into a vibrant
and popular discipline in artificial
intelligence during the 1980s and 1990s.
As a result of its success and growth,
machine learning is evolving into a
collection of related disciplines:
inductive concept acquisition, analytic
learning in problem solving (e.g.
analogy, explanation-based learning),
learning theory (e.g. PAC learning),
genetic algorithms, connectionist
learning, hybrid systems, and so on.
Too little labeled training data.
Piecewise Training in Factorial CRFs
for Transfer Learning
[Sutton, McCallum, 2005]
Train on “related” task with more data.
Newswire named entities
Newswire English words
200k words training.
CRICKET MILLNS SIGNS FOR BOLAND
CAPE TOWN 1996-08-22
South African provincial side Boland said
on Thursday they had signed
Leicestershire fast bowler David Millns on
a one year contract.
Millns, who toured Australia with England
A in 1992, replaces former England all-rounder Phillip DeFreitas as Boland's
overseas professional.
Piecewise Training in Factorial CRFs
for Transfer Learning
[Sutton, McCallum, 2005]
At test time, label email with newswire NEs...
Newswire named entities
Email English words
Piecewise Training in Factorial CRFs
for Transfer Learning
[Sutton, McCallum, 2005]
…then use these labels as features for final task
Emailed seminar ann’mt entities
Newswire named entities
Email English words
Piecewise Training in Factorial CRFs
for Transfer Learning
[Sutton, McCallum, 2005]
Piecewise training of a joint model.
Seminar Announcement entities
Newswire named entities
English words
CRF Transfer Experimental Results
[Sutton, McCallum, 2005]

Seminar Announcements Dataset [Freitag 1998]

CRF               | stime | etime | location | speaker | overall
No transfer       | 99.1  | 97.3  | 81.0     | 73.7    | 87.8
Cascaded transfer | 99.2  | 96.0  | 84.3     | 74.2    | 88.4
Joint transfer    | 99.1  | 96.0  | 85.3     | 76.3    | 89.2

New "best published" accuracy on this common dataset.
Outline
• Examples of IE and Data Mining. ✓
• Brief review of Conditional Random Fields ✓
• Joint inference: Motivation and examples ✓
  – Joint Labeling of Cascaded Sequences (Belief Propagation)
  – Joint Labeling of Distant Entities (BP by Tree Reparameterization)
  – Joint Co-reference Resolution (Graph Partitioning)
  – Joint Segmentation and Co-ref (Iterated Conditional Samples)
• Piecewise Training for large multi-component models ✓
• Two example projects
  – Email, contact management, and Social Network Analysis
  – Research Paper search and analysis
Managing and Understanding Connections of People in our Email World

Workplace effectiveness ~ ability to leverage network of acquaintances.
But filling a Contacts DB by hand is tedious, and incomplete.

[Figure: Email Inbox + WWW → (automatically) → Contacts DB]
System Overview

[Pipeline: Email and the WWW feed a CRF-based system:
 Email → Person Name Extraction → Name Coreference → Homepage Retrieval (WWW) → Contact Info and Person Name Extraction → Social Network Analysis, with Keyword Extraction supplying per-person keywords]
An Example

To: "Andrew McCallum" mccallum@cs.umass.edu
Subject ...

(Search for new people)

First Name:     Andrew
Middle Name:    Kachites
Last Name:      McCallum
JobTitle:       Associate Professor
Company:        University of Massachusetts
Street Address: 140 Governor's Dr.
City:           Amherst
State:          MA
Zip:            01003
Company Phone:  (413) 545-1323
Links:          Fernando Pereira, Sam Roweis, …
Key Words:      Information extraction, social network, …
Example keywords extracted

Person              | Keywords
William Cohen       | Logic programming; Text categorization; Data integration; Rule learning
Daphne Koller       | Bayesian networks; Relational models; Probabilistic models; Hidden variables
Deborah McGuinness  | Semantic web; Description logics; Knowledge representation; Ontologies
Tom Mitchell        | Machine learning; Cognitive states; Learning apprentice; Artificial intelligence
Summary of Results

Contact info and name extraction performance (25 fields):

      Token Acc | Field Prec | Field Recall | Field F1
CRF     94.50   |   85.73    |    76.33     |  80.76
Expert Finding:
When solving some task, find friends-of-friends with relevant expertise.
Avoid “stove-piping” in large org’s by automatically suggesting collaborators.
Given a task, automatically suggest the right team for the job. (Hiring aid!)
Social Network Analysis:
Understand the social structure of your organization.
Suggest structural changes for improved efficiency.
From LDA to Author-Recipient-Topic (ART)

[Figure: graphical models of LDA and of ART]

Inference and Estimation: Gibbs Sampling
- Easy to implement
- Reasonably fast
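As a flavor of the sampler, here is a minimal collapsed-Gibbs sweep for a plain LDA-style model. This is a sketch under simplifying assumptions; the actual ART sampler additionally conditions each token's topic on the message's author and a sampled recipient, bookkeeping that is omitted here.

```python
import numpy as np

def gibbs_sweep(docs, z, n_dt, n_tw, n_t, alpha=0.1, beta=0.01):
    """One sweep of collapsed Gibbs sampling for an LDA-style topic model.
    docs[d][i] is a word id; z[d][i] its current topic; n_dt, n_tw, n_t are
    doc-topic, topic-word, and topic count tables kept in sync with z."""
    T, V = n_tw.shape
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]                                  # remove current assignment
            n_dt[d, t] -= 1; n_tw[t, w] -= 1; n_t[t] -= 1
            p = (n_dt[d] + alpha) * (n_tw[:, w] + beta) / (n_t + V * beta)
            t = int(np.random.choice(T, p=p / p.sum()))  # resample topic
            z[d][i] = t                                  # add new assignment
            n_dt[d, t] += 1; n_tw[t, w] += 1; n_t[t] += 1
```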
Enron Email Corpus
• 250k email messages
• 23k people
Date: Wed, 11 Apr 2001 06:56:00 -0700 (PDT)
From: debra.perlingiere@enron.com
To: steve.hooser@enron.com
Subject: Enron/TransAltaContract dated Jan 1, 2001
Please see below. Katalin Kiss of TransAlta has requested an
electronic copy of our final draft? Are you OK with this? If
so, the only version I have is the original draft without
revisions.
DP
Debra Perlingiere
Enron North America Corp.
Legal Department
1400 Smith Street, EB 3885
Houston, Texas 77002
dperlin@enron.com
Topics, and prominent sender/receivers discovered by ART

Beck = "Chief Operations Officer"
Dasovich = "Government Relations Executive"
Shapiro = "Vice President of Regulatory Affairs"
Steffes = "Vice President of Government Affairs"
Comparing Role Discovery

Connection strength (A, B) is computed from a different per-person distribution under each model:
– Traditional SNA: distribution over recipients
– ART: distribution over authored topics
– Author-Topic: distribution over authored topics
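One way to make "connection strength" operational is a divergence between two per-person distributions. The metric below (Jensen-Shannon) and the threshold are illustrative assumptions, not necessarily what the slides used.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    p = np.asarray(p, float) + 1e-12           # smooth to avoid log(0)
    q = np.asarray(q, float) + 1e-12
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def similar_roles(dist_a, dist_b, eps=0.2):
    """Compare per-person distributions: over recipients (traditional SNA)
    or over authored topics (ART / Author-Topic)."""
    return js_divergence(dist_a, dist_b) < eps
```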
Comparing Role Discovery: Tracy Geaconne ↔ Dan McCarty
Traditional SNA: similar roles. ART: different roles. Author-Topic: different roles.
(Geaconne = "Secretary", McCarty = "Vice President")

Comparing Role Discovery: Tracy Geaconne ↔ Rod Hayslett
Traditional SNA: different roles. ART: not very similar. Author-Topic: very similar.
(Geaconne = "Secretary", Hayslett = "Vice President & CTO")

Comparing Role Discovery: Lynn Blair ↔ Kimberly Watson
Traditional SNA: different roles. ART: very similar. Author-Topic: very different.
(Blair = "Gas pipeline logistics", Watson = "Pipeline facilities planning")
McCallum Email Corpus 2004
• January - October 2004
• 23k email messages
• 825 people
From: kate@cs.umass.edu
Subject: NIPS and ....
Date: June 14, 2004 2:27:41 PM EDT
To: mccallum@cs.umass.edu
There is pertinent stuff on the first yellow folder that is
completed either travel or other things, so please sign that
first folder anyway. Then, here is the reminder of the things
I'm still waiting for:
NIPS registration receipt.
CALO registration receipt.
Thanks,
Kate
McCallum Email Blockstructure
Four most prominent topics
in discussions with ____?
Two most prominent topics
in discussions with ____?
Words        Prob
love         0.030514
house        0.015402
time         0.013659
great        0.012351
hope         0.011334
dinner       0.011043
saturday     0.00959
left         0.009154
ll           0.009154
visit        0.009009
evening      0.008282
stay         0.008137
bring        0.008137
weekend      0.007847
road         0.007701
sunday       0.007411
kids         0.00712
flight       0.006829
…            0.006539
…            0.006539

Words        Prob
today        0.051152
tomorrow     0.045393
time         0.041289
ll           0.039145
meeting      0.033877
week         0.025484
talk         0.024626
meet         0.023279
morning      0.022789
monday       0.020767
back         0.019358
call         0.016418
free         0.015621
home         0.013967
won          0.013783
day          0.01311
hope         0.012987
leave        0.012987
office       0.012742
tuesday      0.012558

Topic 40
Words        Prob
code         0.060565
mallet       0.042015
files        0.029115
al           0.024201
file         0.023587
version      0.022113
java         0.021499
test         0.020025
problem      0.018305
run          0.015356
cvs          0.013391
add          0.012776
directory    0.012408
release      0.012285
output       0.011916
bug          0.011179
source       0.010197
ps           0.009705
log          0.008968
created      0.0086

Sender            Recipient         Prob
hough             mccallum          0.067076
mccallum          hough             0.041032
mikem             mccallum          0.028501
culotta           mccallum          0.026044
saunders          mccallum          0.021376
mccallum          saunders          0.019656
pereira           mccallum          0.017813
casutton          mccallum          0.017199
mccallum          ronb              0.013514
mccallum          pereira           0.013145
hough             melinda.gervasio  0.013022
mccallum          casutton          0.013022
fuchun            mccallum          0.010811
mccallum          culotta           0.009828
ronb              mccallum          0.009705
westy             hough             0.009214
xuerui            corrada           0.0086
ghuang            mccallum          0.008354
khash             mccallum          0.008231
melinda.gervasio  mccallum          0.008108
Role-Author-Recipient-Topic Models
Outline
• Examples of IE and Data Mining. ✓
• Brief review of Conditional Random Fields ✓
• Joint inference: Motivation and examples ✓
  – Joint Labeling of Cascaded Sequences (Belief Propagation)
  – Joint Labeling of Distant Entities (BP by Tree Reparameterization)
  – Joint Co-reference Resolution (Graph Partitioning)
  – Joint Segmentation and Co-ref (Iterated Conditional Samples)
• Piecewise Training for large multi-component models ✓
• Two example projects
  – Email, contact management, and Social Network Analysis ✓
  – Research Paper search and analysis
Previous Systems

[Screenshot of an existing research-paper search system]

[Entity-relation diagram: a single entity type, Research Paper, with a single relation, Cites]
More Entities and Relations

[Entity-relation diagram: Research Paper, Person, Grant, Venue, University, Groups; relations include Cites and Expertise]
Summary

• Conditional Random Fields
  – Conditional probability models of structured data
• Data mining complex unstructured text suggests the need for joint inference between IE and DM.
• Early examples:
  – Factorial finite state models
  – Jointly labeling distant entities
  – Coreference analysis
  – Segmentation uncertainty aiding coreference
• Piecewise Training
  – Faster + higher accuracy
• Current projects
  – Email, contact management, expert-finding, SNA
  – Mining the scientific literature
End of Talk