Toward Unified Graphical Models of Information Extraction and Data Mining

Andrew McCallum, Computer Science Department, University of Massachusetts Amherst
Joint work with Charles Sutton, Aron Culotta, Ben Wellner, Khashayar Rohanimanesh, Wei Li, Andres Corrada, Xuerui Wang

Goal: Mine actionable knowledge from unstructured text.

Extracting Job Openings from the Web

  foodscience.com-Job2
  JobTitle: Ice Cream Guru
  Employer: foodscience.com
  JobCategory: Travel/Hospitality
  JobFunction: Food Services
  JobLocation: Upper Midwest
  Contact Phone: 800-488-2611
  DateExtracted: January 8, 2001
  Source: www.foodscience.com/jobs_midwest.htm
  OtherCompanyJobs: foodscience.com-Job1

IE from Chinese Documents regarding Weather

Department of Terrestrial System, Chinese Academy of Sciences. 200k+ documents, several millennia old:
- Qing Dynasty archives
- memos
- newspaper articles
- diaries

IE from Research Papers [McCallum et al '99]

Mining Research Papers [Rosen-Zvi, Griffiths, Steyvers, Smyth, 2004]

What is "Information Extraction"?

As a family of techniques:
Information Extraction = segmentation + classification + association + clustering

October 14, 2002, 4:00 a.m. PT

For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation.

Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers.

"We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access."

Richard Stallman, founder of the Free Software Foundation, countered saying...

Extracted, associated, and clustered entities:
* Microsoft Corporation, CEO, Bill Gates; Microsoft, Gates; Microsoft, Bill Veghte, Microsoft VP
* Richard Stallman, founder, Free Software Foundation
Larger Context

Document collection → Spider → Filter → IE (Segment / Classify / Associate / Cluster) → Database → Data Mining (discover patterns: entity types, links/relations, events) → Actionable knowledge (prediction, outlier detection, decision support)

Outline
• Examples of IE and Data Mining
• Brief review of Conditional Random Fields
• Joint inference: Motivation and examples
  – Joint Labeling of Cascaded Sequences (Belief Propagation)
  – Joint Labeling of Distant Entities (BP by Tree Reparameterization)
  – Joint Co-reference Resolution (Graph Partitioning)
  – Joint Segmentation and Co-ref (Iterated Conditional Samples)
• Piecewise Training for large multi-component models
• Two example projects
  – Email, contact management, and Social Network Analysis
  – Research Paper search and analysis

Hidden Markov Models

HMMs are the standard sequence modeling tool in genomics, music, speech, NLP, ...

Finite state model / graphical model over states $S = \{s_1, s_2, \ldots\}$, generating a state sequence $s_1 \ldots s_{|\mathbf{o}|}$ and an observation sequence $o_1 \ldots o_{|\mathbf{o}|}$:

$$P(\mathbf{s}, \mathbf{o}) = \prod_{t=1}^{|\mathbf{o}|} P(s_t \mid s_{t-1}) \, P(o_t \mid s_t)$$

Parameters, for all states: start state probabilities $P(s_1)$, transition probabilities $P(s_t \mid s_{t-1})$, and observation (emission) probabilities $P(o_t \mid s_t)$, usually a multinomial over an atomic, fixed alphabet.

Training: maximize the probability of the training observations (with a prior).

IE with Hidden Markov Models

Given a sequence of observations:
  Yesterday Rich Caruana spoke this example sentence.
and a trained HMM with states such as person name, location name, and background, find the most likely state sequence (Viterbi):
  Yesterday [Rich Caruana] spoke this example sentence.
Any words said to be generated by the designated "person name" state are extracted as a person name:
  Person name: Rich Caruana
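As a rough illustration of this decoding step, here is a minimal Viterbi sketch in Python. It is not the talk's trained model: the state names, toy parameters, and crude capitalization-based emission model are my own, chosen only to make the recipe concrete.

```python
import math

# Toy HMM for person-name extraction; parameters are invented for illustration.
states = ["background", "person"]
start = {"background": 0.9, "person": 0.1}
trans = {"background": {"background": 0.8, "person": 0.2},
         "person":     {"background": 0.5, "person": 0.5}}

def emit(state, word):
    # Stand-in emission model: capitalized words look like names.
    if state == "person":
        return 0.3 if word[0].isupper() else 0.01
    return 0.1

def viterbi(words):
    """Most likely state sequence under the toy HMM (log space)."""
    V = [{s: math.log(start[s]) + math.log(emit(s, words[0])) for s in states}]
    back = []
    for word in words[1:]:
        col, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda p: V[-1][p] + math.log(trans[p][s]))
            col[s] = V[-1][prev] + math.log(trans[prev][s]) + math.log(emit(s, word))
            ptr[s] = prev
        V.append(col)
        back.append(ptr)
    # Trace back from the best final state.
    seq = [max(states, key=lambda s: V[-1][s])]
    for ptr in reversed(back):
        seq.append(ptr[seq[-1]])
    return list(reversed(seq))

words = "Yesterday Rich Caruana spoke this example sentence .".split()
labels = viterbi(words)
print([w for w, l in zip(words, labels) if l == "person"])
```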
(Linear-Chain) Conditional Random Fields [Lafferty, McCallum, Pereira 2001]

Undirected graphical model, trained to maximize the conditional probability of the outputs given the inputs.

Finite state model / graphical model: an output sequence of FSM states $y_{t-1}, y_t, y_{t+1}, \ldots$ over labels such as OTHER, PERSON, ORG, TITLE, aligned with an input sequence of observations $x_{t-1}, x_t, x_{t+1}, \ldots$, e.g. "... said Veghte a Microsoft VP ...":

$$p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \prod_{t=1}^{T} \Phi_y(y_t, y_{t-1}) \, \Phi_{xy}(x_t, y_t), \qquad \text{where } \Phi(\cdot) = \exp\Big(\sum_k \lambda_k f_k(\cdot)\Big)$$

Fast-growing, widespread interest; many positive experimental results:
• Noun phrase, named entity [HLT'03], [CoNLL'03]
• Protein structure prediction [ICML'04]
• IE from bioinformatics text [Bioinformatics '04], ...
• Asian word segmentation [COLING'04], [ACL'04]
• IE from research papers [HLT'04]
• Object classification in images [CVPR'04]
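To make the formula above concrete, here is a minimal scoring sketch of my own (not the talk's implementation): a handful of illustrative weights $\lambda_k$, log-linear factors summed along the chain, and $Z(\mathbf{x})$ computed by brute-force enumeration over a tiny label set. A real implementation would use forward-backward for the normalizer.

```python
import itertools
import math

LABELS = ["OTHER", "PERSON", "ORG", "TITLE"]

# Illustrative weights lambda_k; real CRFs learn many thousands of these.
WEIGHTS = {
    ("cap", "PERSON"): 1.5,          # capitalized word under PERSON
    ("word=Microsoft", "ORG"): 2.0,  # word-identity feature
    ("word=VP", "TITLE"): 2.0,
    ("trans", "ORG", "TITLE"): 1.0,  # transition feature
}

def local_score(word, y_prev, y):
    """log Phi_y(y_t, y_{t-1}) + log Phi_xy(x_t, y_t) = sum_k lambda_k f_k."""
    s = WEIGHTS.get(("trans", y_prev, y), 0.0)
    if word[0].isupper():
        s += WEIGHTS.get(("cap", y), 0.0)
    s += WEIGHTS.get(("word=" + word, y), 0.0)
    return s

def log_p(words, labels):
    """log p(y|x): unnormalized score minus log Z(x) (brute force)."""
    def score(ys):
        return sum(local_score(w, yp, y)
                   for w, yp, y in zip(words, ("START",) + ys, ys))
    log_z = math.log(sum(math.exp(score(ys))
                         for ys in itertools.product(LABELS, repeat=len(words))))
    return score(tuple(labels)) - log_z

words = ["said", "Veghte", "a", "Microsoft", "VP"]
print(log_p(words, ["OTHER", "PERSON", "OTHER", "ORG", "TITLE"]))
```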
Table Extraction from Government Reports [Pinto, McCallum, Wei, Croft, SIGIR 2003]

Example report:

  Cash receipts from marketings of milk during 1995 at $19.9 billion dollars, was slightly below 1994. Producer returns averaged $12.93 per hundredweight, $0.19 per hundredweight below 1994. Marketings totaled 154 billion pounds, 1 percent above 1994. Marketings include whole milk sold to plants and dealers as well as milk sold directly to consumers.

  An estimated 1.56 billion pounds of milk were used on farms where produced, 8 percent less than 1994. Calves were fed 78 percent of this milk with the remainder consumed in producer households.

  Milk Cows and Production of Milk and Milkfat: United States, 1993-95

         : Number of    : Production of Milk and Milkfat 2/
   Year  : Milk Cows 1/ : Per Milk Cow       : Percentage of Fat    : Total
         :              : Milk    : Milkfat  : in All Milk Produced : Milk    : Milkfat
         : (1,000 Head) : --- Pounds ---     : (Percent)            : (Million Pounds)
   1993  : 9,589        : 15,704  : 575      : 3.66                 : 150,582 : 5,514.4
   1994  : 9,500        : 16,175  : 592      : 3.66                 : 153,664 : 5,623.7
   1995  : 9,461        : 16,451  : 602      : 3.66                 : 155,644 : 5,694.3

   1/ Average number during year, excluding heifers not yet fresh.
   2/ Excludes milk sucked by calves.

100+ documents from www.fedstats.gov, labeled line by line with a CRF.

Labels (12 in all):
• Non-Table
• Table Title
• Table Header
• Table Data Row
• Table Section Data Row
• Table Footnote
• ...

Features:
• Percentage of digit chars
• Percentage of alpha chars
• Indented
• Contains 5+ consecutive spaces
• Whitespace in this line aligns with previous line
• ...
• Conjunctions of all previous features, at time offsets {0,0}, {-1,0}, {0,1}, {1,2}
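A sketch of what such per-line features might look like in code. This is a hypothetical helper of mine, one feature dictionary per text line; the paper's conjunction/offset machinery and full feature set are omitted, and the names and thresholds are illustrative.

```python
import re

def line_features(line):
    """Binary/real-valued features for one line of a report, in the spirit
    of the feature list above (names and thresholds are illustrative)."""
    chars = [c for c in line if not c.isspace()]
    n = max(len(chars), 1)
    return {
        "pct_digit": sum(c.isdigit() for c in chars) / n,
        "pct_alpha": sum(c.isalpha() for c in chars) / n,
        "indented": line.startswith(" "),
        "five_spaces": "     " in line,
        "dash_rule": bool(re.match(r"^\s*-{5,}", line)),  # horizontal rule lines
        "blank": not chars,
    }

print(line_features("   1993  : 9,589        : 15,704  : 575"))
```

Conjunctions at offsets {0,0}, {-1,0}, {0,1}, {1,2} would then pair each line's features with those of neighboring lines, letting the CRF see local layout context.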
Table Extraction Experimental Results [Pinto, McCallum, Wei, Croft, SIGIR 2003]

                      Line labels (% correct)   Table segments (F1)
  HMM                 65%                       64%
  Stateless MaxEnt    85%                       --
  CRF                 95%                       92%

IE from Research Papers [McCallum et al '99]: field-level F1

  Hidden Markov Models (HMMs) [Seymore, McCallum, Rosenfeld, 1999]   75.6
  Support Vector Machines (SVMs) [Han, Giles, et al, 2003]           89.7
  Conditional Random Fields (CRFs) [Peng, McCallum, 2004]            93.9  (40% error reduction)

Named Entity Recognition

  CRICKET - MILLNS SIGNS FOR BOLAND
  CAPE TOWN 1996-08-22
  South African provincial side Boland said on Thursday they had signed Leicestershire fast bowler David Millns on a one year contract. Millns, who toured Australia with England A in 1992, replaces former England all-rounder Phillip DeFreitas as Boland's overseas professional.

  Labels: PER, ORG, LOC, MISC
  Examples: Yayuk Basuki, Innocent Butare (PER); 3M, KDP, Cleveland (ORG); Cleveland, Nirmal Hriday, The Oval (LOC); Java, Basque, 1,000 Lakes Rally (MISC)

Named Entity Extraction Results [McCallum & Li, CoNLL 2003]

  Method                                                   F1
  HMMs: BBN's Identifinder                                 73%
  CRFs without Feature Induction                           83%
  CRFs with Feature Induction based on Likelihood Gain     90%

Knowledge Discovery: The Problem

Combined in serial juxtaposition, IE and DM are unaware of each other's weaknesses and opportunities.
1) DM begins from a populated DB, unaware of where the data came from or its inherent uncertainties.
2) IE is unaware of emerging patterns and regularities in the DB.
The accuracy of both suffers, and significant mining of complex text sources is beyond reach.

Solution: Uncertainty Info

Pass uncertainty information forward from IE into the database and data mining, and feed emerging patterns from the database back into IE.

Solution: Unified Model

A single probabilistic model spanning extraction and mining: discriminatively-trained undirected graphical models.
  Conditional Random Fields [Lafferty, McCallum, Pereira]
  Conditional PRMs [Koller...], [Jensen...], [Getoor...], [Domingos...]
Complex inference and learning: just what we researchers like to sink our teeth into!

Larger-scale Joint Inference for IE
• What model structures will capture salient dependencies?
• Will joint inference improve accuracy?
• How to do inference in these large graphical models?
• How to efficiently train these models, which are built from multiple large components?

1. Jointly Labeling Cascaded Sequences: Factorial CRFs [Sutton, Rohanimanesh, McCallum, ICML 2004]

Layers: named-entity tag / noun-phrase boundaries / part-of-speech / English words.
In a cascade, errors propagate: each stage must be perfect for the next to do well. A factorial CRF instead predicts all layers jointly.
Joint prediction of part-of-speech and noun-phrase boundaries in newswire matched the cascade's accuracy with only 50% of the training data.
Inference: tree reparameterization BP [Wainwright et al, 2002].

2. Jointly Labeling Distant Mentions: Skip-chain CRFs [Sutton, McCallum, SRL 2004]

  ... Senator Joe Green said today ... . Green ran for ...

In a linear chain, the dependency among similar, distant mentions is ignored; a skip edge connects them.
14% reduction in error on the most-repeated field in email seminar announcements.
Inference: tree reparameterization BP [Wainwright et al, 2002].

3. Joint Co-reference Among All Pairs: Affinity Matrix CRF

("Entity resolution", "object correspondence".) Pairwise Y/N variables among mentions such as ... Mr Powell ... / ... Powell ... / ... she ..., with affinities 45, 99, 11.
~25% reduction in error on co-reference of proper nouns in newswire.
Inference: correlational clustering graph partitioning [Bansal, Blum, Chawla, 2002]; [McCallum, Wellner, IJCAI WS 2003, NIPS 2004].

Coreference Resolution

AKA "record linkage", "database record deduplication", "entity resolution", "object correspondence", "identity uncertainty".

Input: a news article with named-entity "mentions" tagged:
  Today Secretary of State Colin Powell met with ... he ... Condoleezza Rice ... Mr Powell ... she ... Powell ... President Bush ... Rice ... Bush ...

Output: number of entities N = 3:
  #1 Secretary of State Colin Powell; he; Mr. Powell; Powell
  #2 Condoleezza Rice; she; Rice
  #3 President Bush; Bush

Inside the Traditional Solution: a Pair-wise Affinity Metric

Mention (3): ... Mr Powell ...   vs.   Mention (4): ... Powell ...   Y/N?

  Fires  Feature                                           Weight
  N      Two words in common                                29
  Y      One word in common                                 13
  Y      "Normalized" mentions are string identical         39
  Y      Capitalized word in common                         17
  Y      > 50% character tri-gram overlap                   19
  N      < 25% character tri-gram overlap                  -34
  Y      In same sentence                                    9
  Y      Within two sentences                                8
  N      Further than 3 sentences apart                     -1
  Y      "Hobbs Distance" < 3                               11
  N      Number of entities in between two mentions = 0     12
  N      Number of entities in between two mentions > 4     -3
  Y      Font matches                                        1
  Y      Default                                           -19

  OVERALL SCORE = 98 > threshold = 0
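The affinity metric above is just a weighted feature sum compared to a threshold. A toy version, with a few weights copied from the table and predicate implementations of my own (the remaining rows are omitted):

```python
# Weights from the affinity table above; the predicates deciding whether each
# feature fires are stand-ins for the real ones.
AFFINITY_WEIGHTS = {
    "two_words_in_common": 29,
    "one_word_in_common": 13,
    "capitalized_word_in_common": 17,
    # ... remaining features from the table ...
    "default": -19,
}

def affinity(mention_a, mention_b):
    """Sum the weights of fired features; merge if the score > threshold 0."""
    a, b = set(mention_a.split()), set(mention_b.split())
    common = a & b
    score = AFFINITY_WEIGHTS["default"]
    if len(common) >= 2:
        score += AFFINITY_WEIGHTS["two_words_in_common"]
    elif len(common) == 1:
        score += AFFINITY_WEIGHTS["one_word_in_common"]
    if any(w[0].isupper() for w in common):
        score += AFFINITY_WEIGHTS["capitalized_word_in_common"]
    return score

print(affinity("Mr Powell", "Powell"), affinity("Mr Powell", "Powell") > 0)
```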
The Problem

Pair-wise merging decisions are being made independently of each other:
  Mr Powell -- Powell:  affinity = 98   (Y)
  Powell -- she:        affinity = 104  (Y)
  Mr Powell -- she:     affinity = 11   (N)
Independently thresholded, these three decisions are mutually inconsistent.

Affinity measures are noisy and imperfect. The decisions should be made in relational dependence with each other.

A Markov Random Field for Co-reference [McCallum & Wellner, ICML 2003]

Make pair-wise merging decisions in dependent relation to each other by:
- calculating a joint probability,
- including all edge weights,
- adding dependence on consistent triangles:

$$P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z_{\mathbf{x}}} \exp\Big( \sum_{i,j} \sum_l \lambda_l f_l(x_i, x_j, y_{ij}) \;+\; \sum_{i,j,k} \lambda' f'(y_{ij}, y_{jk}, y_{ik}) \Big)$$

Walking through example labelings of the three-mention graph (Mr Powell / Powell / she, edge weights 45, 30, 11): consistent labelings receive finite scores (e.g., 4 and 64 for two different partitions), while a labeling that violates transitivity is driven to probability zero (score $-\infty$) by the triangle factor.

Inference in these MRFs = Graph Partitioning
[Boykov, Veksler, Zabih, 1999], [Kolmogorov & Zabih, 2002], [Yu, Cross, Shi, 2002]

$$\log P(\mathbf{y} \mid \mathbf{x}) \;\propto\; \sum_{i,j} \sum_l \lambda_l f_l(x_i, x_j, y_{ij}) \;=\; \sum_{i,j \,\text{within partitions}} w_{ij} \;-\; \sum_{i,j \,\text{across partitions}} w_{ij}$$

With four mentions (Mr Powell / Powell / Condoleezza Rice / she) and edge weights 45, 106, 30, 134, 11, 10, different partitions score differently (e.g., 22 vs. 314); inference seeks the partition that maximizes this objective.
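A minimal sketch of that partitioning inference: greedy agglomerative merging, standing in for the correlational-clustering approximations cited above. The edge magnitudes come from the running example, but the sign conventions (attraction vs. repulsion) are my own assumption; with them, the best partition found scores 314, matching the slide.

```python
import itertools

# Signed affinities w_ij; positive means "likely the same entity".
mentions = ["Mr Powell", "Powell", "Condoleezza Rice", "she"]
w = {("Mr Powell", "Powell"): 45, ("Mr Powell", "Condoleezza Rice"): -106,
     ("Mr Powell", "she"): -30, ("Powell", "Condoleezza Rice"): -134,
     ("Powell", "she"): 11, ("Condoleezza Rice", "she"): 10}

def objective(partition):
    """Within-partition weights minus across-partition weights."""
    cluster = {m: i for i, part in enumerate(partition) for m in part}
    return sum(wij if cluster[a] == cluster[b] else -wij
               for (a, b), wij in w.items())

def greedy_partition(items):
    """Repeatedly merge the pair of clusters that most improves the objective."""
    parts = [{m} for m in items]
    while True:
        best, best_gain = None, 0
        for p, q in itertools.combinations(parts, 2):
            merged = [c for c in parts if c is not p and c is not q] + [p | q]
            gain = objective(merged) - objective(parts)
            if gain > best_gain:
                best, best_gain = (p, q), gain
        if best is None:
            return parts
        p, q = best
        parts = [c for c in parts if c is not p and c is not q] + [p | q]

best = greedy_partition(mentions)
print(best, objective(best))   # pairs Powell with Mr Powell, she with Rice
```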
Co-reference Experimental Results [McCallum & Wellner, 2003]

Proper noun co-reference, DARPA ACE broadcast news transcripts (117 stories):

                             Partition F1         Pair F1
  Single-link threshold      16%                  18%
  Best prev match [Morton]   83%                  89%
  MRFs                       88% (error -30%)     92% (error -28%)

DARPA MUC-6 newswire article corpus (30 stories):

                             Partition F1         Pair F1
  Single-link threshold      11%                  7%
  Best prev match [Morton]   70%                  76%
  MRFs                       74% (error -13%)     80% (error -17%)

Joint Co-reference of Multiple Fields [Culotta & McCallum 2005]

Citations and their venues are resolved together, with pairwise Y/N variables both between citations and between venue strings:

  X. Li / Predicting the Stock Market / International Conference on Knowledge Discovery and Data Mining
  X. Li / Predicting the Stock Market / KDD
  Xuerui Li / Predict the Stock Market / KDD

Joint Co-reference Experimental Results [Culotta & McCallum 2005]

CiteSeer dataset: 1500 citations, 900 unique papers, 350 unique venues.

                  Paper F1               Venue F1
                  indep    joint         indep    joint
  constraint      88.9     91.0          79.4     94.1
  reinforce       92.2     92.2          56.5     60.1
  face            88.2     93.7          80.9     82.8
  reason          97.4     97.0          75.6     79.5
  Micro Average   91.7     93.4          73.1     79.1
                           (error -20%)           (error -22%)

4. Joint Segmentation and Co-reference

Extraction from, and matching of, research paper citations:

  Laurel, B. Interface Agents: Metaphors with Character, in The Art of Human-Computer Interface Design, B. Laurel (ed), Addison-Wesley, 1990.
  Brenda Laurel. Interface Agents: Metaphors with Character, in Laurel, The Art of Human-Computer Interface Design, 355-366, 1990.

Segmentation (citation attributes) and co-reference decisions (database field values) inform each other, together with world knowledge.
35% reduction in co-reference error by using segmentation uncertainty.
6-14% reduction in segmentation error by using co-reference.
Inference: variant of Iterated Conditional Modes [Besag, 1986]. [Wellner, McCallum, Peng, Hay, UAI 2004]; see also [Marthi, Milch, Russell, 2003].

Joint IE and Coreference from Research Paper Citations

From textual citation mentions (noisy, with duplicates) to a paper database with fields, clean, duplicates collapsed:

  AUTHORS                TITLE          VENUE
  Cowell, Dawid...       Probab...      Springer
  Montemerlo, Thrun...   FastSLAM...    AAAI...
  Kjaerulff              Approxi...     Technic...

Citation Segmentation and Coreference

  Laurel, B. Interface Agents: Metaphors with Character, in The Art of Human-Computer Interface Design, T. Smith (ed), Addison-Wesley, 1990.
  Brenda Laurel. Interface Agents: Metaphors with Character, in Smith, The Art of Human-Computr Interface Design, 355-366, 1990.

1) Segment citation fields.
2) Resolve coreferent citations (Y / N?).
3) Form a canonical database record, resolving conflicts:

  AUTHOR    = Brenda Laurel
  TITLE     = Interface Agents: Metaphors with Character
  PAGES     = 355-366
  BOOKTITLE = The Art of Human-Computer Interface Design
  EDITOR    = T. Smith
  PUBLISHER = Addison-Wesley
  YEAR      = 1990

Perform all three steps jointly. (A toy sketch of step 3 follows.)
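For intuition about step 3 only, here is a stand-in of mine that merges coreferent mentions' fields by simple majority vote. The talk's model resolves conflicts jointly with segmentation and coreference; this vote is merely illustrative, and ties are broken arbitrarily.

```python
from collections import Counter

def canonical_record(mention_records):
    """Merge field values from coreferent citation mentions into one record,
    resolving conflicts by (illustrative) majority vote over mentions."""
    fields = {f for rec in mention_records for f in rec}
    return {f: Counter(rec[f] for rec in mention_records if f in rec)
                 .most_common(1)[0][0]
            for f in fields}

mentions = [
    {"AUTHOR": "Laurel, B.", "TITLE": "Interface Agents: Metaphors with Character",
     "EDITOR": "T. Smith", "YEAR": "1990"},
    {"AUTHOR": "Brenda Laurel", "TITLE": "Interface Agents: Metaphors with Character",
     "PAGES": "355-366", "YEAR": "1990"},
]
print(canonical_record(mentions))
```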
IE + Coreference Model

A CRF segmentation $s$ of each observed citation $x$ (e.g., "J Besag 1986 On the...") yields citation mention attributes $c$ (AUTHOR = "J Besag", YEAR = "1986", TITLE = "On the..."). With one such structure per citation mention (e.g., "Smyth, P. Data mining...", "Smyth. 2001 Data Mining..."), binary coreference variables (y/n) connect each pair of mentions, and research-paper entity attribute nodes (AUTHOR = "P Smyth", YEAR = "2001", TITLE = "Data Mining...") summarize each coreference cluster.

Such a highly connected graph makes exact inference intractable, so...

Parameter Estimation: separately for different regions

  Region                        Training method
  IE (linear chain)             Exact MAP
  Coref graph edge weights      MAP on individual edges
  Entity attribute potentials   MAP, pseudo-likelihood

In all cases: climb the MAP gradient with a quasi-Newton method.

Piecewise Training (with NOTA): Experimental Results

Named entity tagging (CoNLL-2003); training set = 15k newswire sentences, 9 labels:

          Test F1   Training time
  CRF     89.87     9 hours
  MEMM    88.90     1 hour
  CRF-PT  90.50     5.3 hours   (stat. sig. improvement at p = 0.001)

Part-of-speech tagging (Penn Treebank, small subset); training set = 1154 newswire sentences, 45 labels:

          Test F1   Training time
  CRF     88.1      14 hours
  MEMM    88.1      2 hours
  CRF-PT  88.8      2.5 hours   (stat. sig. improvement at p = 0.001)

"Parameter Independence Diagrams"

Graphical models are a formalism for representing independence assumptions among variables. Here we represent independence assumptions among parameters, in a factor graph.

Piecewise Training Research Questions
• How to select the boundaries of "pieces"?
• What choices of limited interaction are best?
• How to sample sparse subsets of NOTA ("none of the above") instances?
• Application to simpler models (classifiers)
• Application to more complex models (parsing)
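To make the piecewise recipe concrete, here is a schematic sketch of my own, a deliberate simplification of the method in the talk: each "piece" (say, one factor template) is fit on its own local objective as if the rest of the graph were absent, and the union of the learned weights then parameterizes the assembled model used at test time.

```python
import math

def train_piece(piece_features, labels, steps=100, lr=0.1):
    """Fit one log-linear piece by gradient ascent on its local likelihood,
    ignoring all other pieces of the graph (the heart of piecewise training)."""
    weights = {}
    for _ in range(steps):
        for feats, y in zip(piece_features, labels):
            s = sum(weights.get(f, 0.0) for f in feats)
            p = 1.0 / (1.0 + math.exp(-s))        # p(y=1|feats) under this piece
            for f in feats:
                weights[f] = weights.get(f, 0.0) + lr * ((1 if y else 0) - p)
    return weights

# Each piece is trained separately (cheaper than joint training); the union
# of their weights parameterizes the joint model used at test time.
transition_piece = train_piece([["prev=ORG", "cur=TITLE"]], [1])
emission_piece = train_piece([["word=Microsoft", "cap"]], [1])
joint_weights = {**transition_piece, **emission_piece}
print(sorted(joint_weights))
```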
Piecewise Training in Factorial CRFs for Transfer Learning [Sutton, McCallum, 2005]

Target task: emailed seminar announcement entities over email English words, with only 60k words of training. Example announcement:

  GRAND CHALLENGES FOR MACHINE LEARNING
  Jaime Carbonell
  School of Computer Science, Carnegie Mellon University
  3:30 pm, 7500 Wean Hall

  Machine learning has evolved from obscurity in the 1970s into a vibrant and popular discipline in artificial intelligence during the 1980s and 1990s. As a result of its success and growth, machine learning is evolving into a collection of related disciplines: inductive concept acquisition, analytic learning in problem solving (e.g. analogy, explanation-based learning), learning theory (e.g. PAC learning), genetic algorithms, connectionist learning, hybrid systems, and so on.

Too little labeled training data. So train on a "related" task with more data: newswire named entities over newswire English words, with 200k words of training (e.g., the CRICKET MILLNS story above).

At test time, label the email with newswire named entities, then use these labels as features for the final task: seminar announcement entities conditioned on both the newswire NE layer and the email words. This is piecewise training of a joint model: seminar announcement entities, newswire named entities, and English words in one factorial CRF, trained piece by piece.

CRF Transfer Experimental Results [Sutton, McCallum, 2005]

Seminar Announcements dataset [Freitag 1998]:

  CRF                 stime   etime   location   speaker   overall
  No transfer         99.1    97.3    81.0       73.7      87.8
  Cascaded transfer   99.2    96.0    84.3       74.2      88.4
  Joint transfer      99.1    96.0    85.3       76.3      89.2

New "best published" accuracy on this common dataset. (A sketch of the cascaded variant follows.)
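A sketch of the cascaded-transfer variant: run the data-rich newswire NER model over the email, then feed its predictions in as extra features for the seminar-announcement extractor. Everything here is hypothetical scaffolding of mine (the real models are trained CRFs; the stub taggers below just make the data flow runnable), not MALLET's API.

```python
class StubTagger:
    """Stand-in for a trained CRF tagger; predicts one label per item."""
    def __init__(self, rule):
        self.rule = rule
    def predict(self, items):
        return [self.rule(it) for it in items]

def cascaded_transfer(tokens, newswire_ner, seminar_crf):
    """Cascaded transfer: NE predictions become features for the target task."""
    ner = newswire_ner.predict(tokens)
    feats = [{"word": t, "ner": n} for t, n in zip(tokens, ner)]
    return seminar_crf.predict(feats)

newswire_ner = StubTagger(lambda t: "PER" if t[0].isupper() else "O")
seminar_crf = StubTagger(lambda f: "speaker" if f["ner"] == "PER" else "O")
print(cascaded_transfer("seminar by Jaime Carbonell at 3:30".split(),
                        newswire_ner, seminar_crf))
```

Joint transfer instead keeps both label chains in one factorial CRF (trained piecewise), so uncertainty in the NE layer is not collapsed to a single hard labeling.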
Managing and Understanding Connections of People in our Email World

Workplace effectiveness ~ ability to leverage one's network of acquaintances. But filling a Contacts DB by hand is tedious and incomplete. So fill it automatically, from the Email Inbox, the Contacts DB, and the WWW.

System Overview

Email + WWW → CRF → person name extraction and keyword extraction → name coreference (with homepage retrieval) → contact info and person name extraction → social network analysis.

An Example

  To: "Andrew McCallum" mccallum@cs.umass.edu
  Subject: ...

Search for new people, then extract:

  First Name:     Andrew
  Middle Name:    Kachites
  Last Name:      McCallum
  JobTitle:       Associate Professor
  Company:        University of Massachusetts
  Street Address: 140 Governor's Dr.
  City:           Amherst
  State:          MA
  Zip:            01003
  Company Phone:  (413) 545-1323
  Links:          Fernando Pereira, Sam Roweis, ...
  Key Words:      information extraction, social network, ...

Example keywords extracted:

  Person              Keywords
  William Cohen       Logic programming, text categorization, data integration, rule learning
  Daphne Koller       Bayesian networks, relational models, probabilistic models, hidden variables
  Deborah McGuiness   Semantic web, description logics, knowledge representation, ontologies
  Tom Mitchell        Machine learning, cognitive states, learning apprentice, artificial intelligence

Summary of Results

Contact info and name extraction performance (25 fields):

  CRF   Token Acc   Field Prec   Field Recall   Field F1
        94.50       85.73        76.33          80.76

Expert finding: when solving some task, find friends-of-friends with relevant expertise. Avoid "stove-piping" in large organizations by automatically suggesting collaborators. Given a task, automatically suggest the right team for the job (a hiring aid!).

Social network analysis: understand the social structure of your organization; suggest structural changes for improved efficiency.

From LDA to the Author-Recipient-Topic (ART) Model

Inference and estimation by Gibbs sampling:
- easy to implement
- reasonably fast
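For intuition, a minimal collapsed Gibbs sampler in the plain-LDA style. The ART model additionally conditions each message's topics on its author and a sampled recipient; this stripped-down version, and its toy data, are mine.

```python
import random
from collections import defaultdict

random.seed(0)
K, ALPHA, BETA = 2, 0.5, 0.1
docs = [["code", "bug", "cvs", "release"],
        ["dinner", "weekend", "visit", "house"],
        ["code", "java", "test", "bug"]]
vocab = {w for d in docs for w in d}

# Count tables; z[d][i] is the topic assigned to word i of document d.
z = [[random.randrange(K) for _ in d] for d in docs]
n_dk = defaultdict(int); n_kw = defaultdict(int); n_k = defaultdict(int)
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        t = z[d][i]
        n_dk[d, t] += 1; n_kw[t, w] += 1; n_k[t] += 1

for _ in range(200):                      # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]                   # remove the current assignment
            n_dk[d, t] -= 1; n_kw[t, w] -= 1; n_k[t] -= 1
            # p(t) ∝ (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta)
            p = [(n_dk[d, k] + ALPHA) * (n_kw[k, w] + BETA)
                 / (n_k[k] + len(vocab) * BETA) for k in range(K)]
            r = random.uniform(0, sum(p))
            t = 0
            while r > p[t]:
                r -= p[t]; t += 1
            z[d][i] = t                   # record the new assignment
            n_dk[d, t] += 1; n_kw[t, w] += 1; n_k[t] += 1

for k in range(K):
    print("topic", k, sorted(vocab, key=lambda w: -n_kw[k, w])[:3])
```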
Enron Email Corpus
• 250k email messages
• 23k people

Example message:

  Date: Wed, 11 Apr 2001 06:56:00 -0700 (PDT)
  From: debra.perlingiere@enron.com
  To: steve.hooser@enron.com
  Subject: Enron/TransAlta Contract dated Jan 1, 2001

  Please see below. Katalin Kiss of TransAlta has requested an electronic copy of our final draft? Are you OK with this? If so, the only version I have is the original draft without revisions. DP

  Debra Perlingiere, Enron North America Corp., Legal Department, 1400 Smith Street, EB 3885, Houston, Texas 77002, dperlin@enron.com

Topics, and prominent senders/receivers, discovered by ART. For example:
  Beck = "Chief Operations Officer"
  Dasovich = "Government Relations Executive"
  Shapiro = "Vice President of Regulatory Affairs"
  Steffes = "Vice President of Government Affairs"

Comparing Role Discovery

ART's connection strength(A,B) is a distribution over recipients; the Author-Topic model gives only a distribution over authored topics; traditional SNA uses the raw graph.

  Tracy Geaconne vs. Dan McCarty: traditional SNA finds similar roles; ART finds different roles; Author-Topic finds different roles. (Geaconne = "Secretary"; McCarty = "Vice President".)
  Tracy Geaconne vs. Rod Hayslett: traditional SNA finds different roles; ART finds them not very similar; Author-Topic finds them very similar. (Geaconne = "Secretary"; Hayslett = "Vice President & CTO".)
  Lynn Blair vs. Kimberly Watson: traditional SNA finds different roles; ART finds them very similar; Author-Topic finds them very different. (Blair = "Gas pipeline logistics"; Watson = "Pipeline facilities planning".)

McCallum Email Corpus 2004
• January - October 2004
• 23k email messages
• 825 people

Example message:

  From: kate@cs.umass.edu
  Subject: NIPS and ....
  Date: June 14, 2004 2:27:41 PM EDT
  To: mccallum@cs.umass.edu

  There is pertinent stuff on the first yellow folder that is completed either travel or other things, so please sign that first folder anyway. Then, here is the reminder of the things I'm still waiting for: NIPS registration receipt. CALO registration receipt. Thanks, Kate

McCallum Email Blockstructure

Four most prominent topics in discussions with ____? Two most prominent topics in discussions with ____?

Three of the discovered topics (top words with probabilities):

  Word       Prob       |  Word       Prob       |  Topic 40
                        |                        |  Word        Prob
  love       0.030514   |  today      0.051152   |  code        0.060565
  house      0.015402   |  tomorrow   0.045393   |  mallet      0.042015
  time       0.013659   |  time       0.041289   |  files       0.029115
  great      0.012351   |  ll         0.039145   |  al          0.024201
  hope       0.011334   |  meeting    0.033877   |  file        0.023587
  dinner     0.011043   |  week       0.025484   |  version     0.022113
  saturday   0.00959    |  talk       0.024626   |  java        0.021499
  left       0.009154   |  meet       0.023279   |  test        0.020025
  ll         0.009154   |  morning    0.022789   |  problem     0.018305
  visit      0.009009   |  monday     0.020767   |  run         0.015356
  evening    0.008282   |  back       0.019358   |  cvs         0.013391
  stay       0.008137   |  call       0.016418   |  add         0.012776
  bring      0.008137   |  free       0.015621   |  directory   0.012408
  weekend    0.007847   |  home       0.013967   |  release     0.012285
  road       0.007701   |  won        0.013783   |  output      0.011916
  sunday     0.007411   |  day        0.01311    |  bug         0.011179
  kids       0.00712    |  hope       0.012987   |  source      0.010197
  flight     0.006829   |  leave      0.012987   |  ps          0.009705
                        |  office     0.012742   |  log         0.008968
                        |  tuesday    0.012558   |  created     0.0086

Prominent sender/recipient pairs discovered alongside the topics:

  Sender             Recipient          Prob
  hough              mccallum           0.067076
  mccallum           hough              0.041032
  mikem              mccallum           0.028501
  culotta            mccallum           0.026044
  saunders           mccallum           0.021376
  mccallum           saunders           0.019656
  pereira            mccallum           0.017813
  casutton           mccallum           0.017199
  mccallum           ronb               0.013514
  mccallum           pereira            0.013145
  hough              melinda.gervasio   0.013022
  mccallum           casutton           0.013022
  fuchun             mccallum           0.010811
  mccallum           culotta            0.009828
  ronb               mccallum           0.009705
  westy              hough              0.009214
  xuerui             corrada            0.0086
  ghuang             mccallum           0.008354
  khash              mccallum           0.008231
  melinda.gervasio   mccallum           0.008108

Role-Author-Recipient-Topic Models extend ART with explicit roles.

Research Paper Search and Analysis

Previous systems model only research papers and the Cites relation. We add more entities and relations: expertise, cites, research paper, grant, venue, person, university, groups.

Summary
• Conditional Random Fields: conditional probability models of structured data.
• Data mining complex unstructured text suggests the need for joint inference across IE and DM.
• Early examples:
  – factorial finite state models
  – jointly labeling distant entities
  – coreference analysis
  – segmentation uncertainty aiding coreference
• Piecewise training: faster, and higher accuracy.
• Current projects: email, contact management, expert-finding, SNA; mining the scientific literature.

End of Talk