Multi-view Exploratory Learning for AKBC Problems
Bhavana Dalvi and William W. Cohen
School of Computer Science, Carnegie Mellon University

1 Motivation
- Our proposed framework combines structural search for the best class hierarchy with SSL, reducing the semantic drift associated with erroneously grouping unanticipated classes with expected classes.

2 Multi-view Exploratory EM
- The traditional EM method for SSL jointly learns the missing labels of the unlabeled data points as well as the model parameters.
- We consider two extensions of traditional EM for SSL:
  - We introduce a new latent variable, unobserved classes, by dynamically adding new classes when appropriate.
  - We assign multiple labels from multiple levels of the class hierarchy while satisfying ontological constraints and considering multiple data views.

Inputs: X_L: labeled data points; Y_L: labels of X_L; X_U: unlabeled data points; N: #data points; k: #seed classes; Z_k: constraints on the k seed classes.
Outputs: {θ_1 ... θ_{k+m}}: parameters for the k seed and m newly added classes (each class C_j can have v per-view parameter sets θ_j^(1) ... θ_j^(v)); Z_{k+m}: set of class constraints among the k+m classes; Y_U: labels for X_U.

Algorithm (Exploratory EM):
- Initialize the model θ^(0) with a few seeds per class C_j.
- Iterate till convergence (of data likelihood and number of classes):
  - E step (iteration t): predict labels for the unlabeled data points.
    For i = 1 : N
      P(C_j | X_i) = CombineMultiViewScore(X_i^(1...v), θ_j^(1...v))
      If NewClassCreationCriterion(P(C_j | X_i), Z^t): create a new class C_new, assign X_i to it, and set Z^t = UpdateConstraints(Z^t, C_new, ...)
      Y_U^t = OptimalLabelAssignment(P(C_j | X_i), Z^t)
  - M step: re-compute the model parameters θ_{k+m} using the k seed classes and the predicted labels Y_U^t for the unlabeled data points. The number of classes might increase in each iteration.
  - Check whether the model selection criterion is satisfied; if not, revert to the model from iteration t-1.

[Figure: example use-case of Exploratory EM. A class hierarchy with Location (State, Country) and Food (Vegetable, Condiment); an entity such as "Coke", whose posterior mass fits none of the seed classes well, triggers creation of a new class C8.]
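To make the E/M loop above concrete, here is a minimal single-view Python sketch (not the poster's implementation): class centroids stand in for the per-class parameters θ_j, a simple max/min ratio test stands in for NewClassCreationCriterion (the MinMax and JS criteria are detailed in panel 3 below), and the ontological constraints and model-selection check are omitted. All function and parameter names here are illustrative.

```python
import numpy as np

def new_class_criterion(posterior, ratio=2.0):
    """MinMax-style test: if the largest and smallest class posteriors are
    close to each other, the point fits no existing class well."""
    return posterior.max() / (posterior.min() + 1e-12) < ratio

def exploratory_em(X_labeled, y_labeled, X_unlabeled, n_iters=10):
    """Seeded, K-means-style exploratory EM; centroids are the class models."""
    y_arr = np.array(y_labeled)
    seed_classes = sorted(set(y_labeled))
    centroids = [X_labeled[y_arr == c].mean(axis=0) for c in seed_classes]
    assignments = []
    for t in range(n_iters):
        assignments = []
        for x in X_unlabeled:                        # E step
            sims = np.array([x @ c for c in centroids])
            posterior = np.exp(sims - sims.max())    # softmax over class scores
            posterior /= posterior.sum()
            if new_class_criterion(posterior):       # near-uniform posterior
                centroids.append(x.copy())           # create a new class seeded at x
                assignments.append(len(centroids) - 1)
            else:
                assignments.append(int(posterior.argmax()))
        for j in range(len(centroids)):              # M step: refit each class
            members = [x for x, a in zip(X_unlabeled, assignments) if a == j]
            if j < len(seed_classes):                # seed classes keep their seeds
                members += list(X_labeled[y_arr == seed_classes[j]])
            if members:
                centroids[j] = np.mean(members, axis=0)
    return centroids, assignments
```

In the full method the E step would combine scores from all data views, label assignments would be constrained by the ontology, and the model-selection check (AICc, panel 3) could reject an iteration that created too many new classes.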
3 Modeling Unobserved Classes

Dynamically introducing new classes
- Hypothesis: dynamically inducing clusters for data points that do not belong to any of the seeded classes will reduce semantic drift.
- For each data point X_i, we compute the posterior distribution P(C_j | X_i) of X_i belonging to each of the existing classes C_1 ... C_k [Dalvi et al., ECML 2013].
- Criterion 1 (MinMax): max_i = max_j P(C_j | X_i) and min_i = min_j P(C_j | X_i); if max_i / min_i < 2, create a new class/cluster.
- Criterion 2 (JS, Jensen-Shannon divergence): let unif be the uniform distribution over the k classes and JSDiv = JS-Divergence(unif, P(C | X_i)); if JSDiv < 1/k, create a new class/cluster.
- For hierarchical classification we also need to decide where to place the newly created class:
  - a divide-and-conquer (DAC) method for extending a tree-structured ontology [Dalvi et al., AKBC 2013];
  - an extension of DAC that extends a generic ontology with subset and mutual-exclusion constraints (OptDAC) [Dalvi and Cohen, under review].

Model Selection
- This step makes sure that we do not create too many new classes.
- We tried the BIC, AIC, and AICc criteria; Extended AIC (AICc) worked best for our tasks:
  AICc(g) = AIC(g) + 2v(v+1) / (n - v - 1)
  where g is the model being evaluated, AIC(g) = 2v - 2 L(g) with L(g) the log-likelihood of the data given g, v is the number of free parameters of the model, and n is the number of data points.

4 Incorporating Multiple Views and Ontological Constraints

Multiple Data Views
- Each data point and each class centroid (or classifier) has a representation in every view: x_i^(1) ... x_i^(v) and C_j^(1) ... C_j^(v).
- E.g., in the noun-phrase classification task, we consider co-occurrences of NPs in text sentences (View 1) and in HTML tables (View 2).
- Combining scores from multiple views:
  - Sum-Score: addition of per-view scores;
  - Prod-Score: product of per-view scores;
  - Max-Agree: maximize agreement between per-view label assignments [Dalvi and Cohen, in submission].

Ontological Constraints
- Each data point is assigned a bit vector of labels; subset and mutual-exclusion constraints decide which label bit vectors are consistent.
- GLOFIN: a mixed integer program is solved for each data point to get the optimal label vector [Dalvi et al., WSDM 2015].
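As a rough illustration of what a "consistent" label bit vector means under subset and mutual-exclusion constraints, the Python sketch below enumerates label sets for a tiny, made-up ontology and returns the consistent set with the highest total score. GLOFIN itself solves a mixed integer program rather than doing this brute-force search, and the class names, scores, and function names here are illustrative only.

```python
from itertools import product

# Toy ontology (illustrative only): subset constraints (child -> parent) and
# mutual-exclusion constraints (pairs of labels that cannot both be on).
SUBSET = [("State", "Location"), ("Country", "Location"),
          ("Vegetable", "Food"), ("Condiment", "Food")]
MUTEX = [("Location", "Food"), ("State", "Country"), ("Vegetable", "Condiment")]

def is_consistent(labels):
    """labels: the set of class names switched on in the bit vector."""
    for child, parent in SUBSET:
        if child in labels and parent not in labels:
            return False          # subset: a child label implies its parent
    for a, b in MUTEX:
        if a in labels and b in labels:
            return False          # mutual exclusion: at most one of the pair
    return True

def best_label_vector(scores):
    """Brute-force stand-in for GLOFIN's mixed integer program: return the
    consistent label set with the highest total score (exponential in the
    number of classes, so usable only for tiny ontologies). A real objective
    would also penalize switching on low-confidence labels."""
    classes = list(scores)
    best, best_score = set(), 0.0
    for bits in product([0, 1], repeat=len(classes)):
        labels = {c for c, b in zip(classes, bits) if b}
        if is_consistent(labels):
            total = sum(scores[c] for c in labels)
            if total > best_score:
                best, best_score = labels, total
    return best

# Example: per-class scores for one data point.
print(best_label_vector({"Location": 0.1, "State": 0.05, "Country": 0.05,
                         "Food": 0.9, "Vegetable": 0.55, "Condiment": 0.45}))
# prints the set {'Food', 'Vegetable'} (element order may vary)
```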
6 AKBC tasks

Macro-reading (Explore-EM)
- Semi-supervised classification of noun phrases into categories, using distributional features.
- Exploratory learning can reduce the semantic drift of seed classes [Dalvi et al., ECML 2013].
[Figure: results on the 20 Newsgroups dataset (#seed classes = 6).]

Micro-reading
- Task: classify an entity mention using context-specific features.
- Clustering NIL entities for the KBP entity discovery and linking (EDL) task [Mazaitis et al., KBP 2014].

Multi-view Hierarchical SSL (MaxAgree)
- The MaxAgree method exploits clues from different data views.
- We define multi-view clustering as an optimization problem and compare various methods for combining scores across views.
- MaxAgree is more robust than Prod-Score when the difference in performance between the views varies:

  Correlation of the performance improvement over the best single view w.r.t. the difference in performance between views:
  Method      Coefficient  P-value
  Prod-Score  -0.59        0.01
  MaxAgree    -0.05        0.82

- Our proposed Hier-MaxAgree method incorporates both the clues from multiple views and the ontological constraints [Dalvi and Cohen, in submission].
- On entity classification for the NELL KB, our proposed Hier-MaxAgree method gave state-of-the-art performance.
[Figure: macro-averaged F1 score vs. training percentage (5-30%), comparing Concatenation, Co-training, Sum-Score, Prod-Score, and Hier-MaxAgree.]

Hierarchical Exploratory Learning (OptDAC)
- We proposed OptDAC, which performs hierarchical SSL in the presence of incomplete class ontologies.
- Optimized Divide and Conquer (OptDAC) combines 1) a divide-and-conquer, top-down strategy to detect and place new categories in the ontology with 2) a mixed integer programming technique (GLOFIN) to select the optimal set of labels for a data point, consistent with the ontological constraints.
- It employs a mixed integer programming formulation to find optimal label assignments for a data point, while traversing the class ontology top-down to detect whether a new class needs to be added and where to place it [Dalvi and Cohen, under review].
[Figure: an example of an ontology (rooted at Root) extended by OptDAC; results shown for Text-patterns and HTML-tables views on Ontology-1 and Ontology-2.]

Automatic gloss finding for KBs (GLOFIN)
- We developed the GLOFIN method, which takes a gloss-free KB and a large collection of glosses and automatically matches glosses to entities in the KB [Dalvi et al., WSDM 2015].
- Glosses with only one candidate KB entity (unambiguous glosses) are used as training data to train a hierarchical classification model for the categories in the KB. Ambiguous glosses are then disambiguated based on the KB category they are assigned to.
- Different document representations compared within GLOFIN:
  - Naive Bayes: assumes a multinomial distribution for feature occurrences and explicitly models the class prior.
  - Seeded K-Means: similarity based on cosine distance between centroids and data points.
  - Seeded von Mises-Fisher: an SSL method for data distributed on the unit hypersphere.
- Our method outperformed SVM and a label propagation baseline, especially when the amount of training data is small.
- In future work: apply GLOFIN to word sense disambiguation w.r.t. the WordNet synset hierarchy.
[Figure: precision, recall, and F1 of SVM, Label Propagation, and GLOFIN-Naive-Bayes.]

Conclusions
- Exploratory learning helps reduce the semantic drift of seeded classes.
- It gets more powerful in conjunction with multiple data views and a class hierarchy, when these are imposed as soft constraints on the label vectors.
- It can be applied to multiple AKBC tasks such as macro-reading, gloss finding, and ontology extension.
- Datasets and code can be downloaded from: www.cs.cmu.edu/~bbd/exploratory_learning

Acknowledgements: This work is supported by a Google PhD Fellowship in Information Extraction and a Google Research Grant.