KBP2014 Entity Linking Scorer
Xiaoman Pan, Qi Li, Heng Ji, Xiaoqiang Luo, Ralph Grishman
jih@rpi.edu

Overview
We evaluate the KBP2014 Entity Discovery and Linking results in two steps:
1. Clustering score
2. Entity Linking (Wikification) F-score

We represent each entity mention as a tuple ⟨doc-id, start, end, entity-type, kb-id⟩, where kb-id may take the special value NIL. Let s be an entity mention in the system output and g a mention in the gold standard. s matches g iff:
1. s.doc-id = g.doc-id,
2. s.start = g.start and s.end = g.end,
3. s.entity-type = g.entity-type, and
4. s.kb-id = g.kb-id.

Clustering score
This step measures clustering performance only, so the kb_id of every mention is set to NIL. We apply three metrics to the clustering:
● B-Cubed (B³)
● CEAF
● Graph Edit Distance (G-Edit)

B-Cubed
B³: Precision
● The credit of a system mention m is the fraction of the mentions in m's system cluster that are also in m's gold cluster.
● Precision = sum of mention credits / #system-output-mentions = (1/2 + 2/2 + 2/2 + 1/1 + 0)/6 = 0.583
● Credits in the example: 1: 1/2, 2: 2/2, 6: 2/2, 3: 1/1, 4: 0
[Figure: Gold Standard vs. System Output. Ovals cluster mentions together; color refers to kb_id, shape to entity type, and the number to doc_id + offset.]

B³: Recall
● The credit of a gold mention m is the fraction of the mentions in m's gold cluster that are also in m's system cluster.
● Recall = sum of mention credits / #gold-standard-mentions = (1/3 + 2/3 + 2/3 + 1/2)/6 = 0.361
● Credits in the example: 1: 1/3, 2: 2/3, 6: 2/3, 3: 1/2, 4: 0
[Figure: Gold Standard vs. System Output, same example as above.]

CEAF
CEAF (Luo, 2005)
● Idea: a mention or entity should not be credited more than once.
● This is formulated as a bipartite matching problem, a special ILP problem with an efficient solution: the Kuhn-Munkres algorithm.
● We use CEAFm as the official scoring metric because it is more sensitive to cluster size than CEAFe.

CEAFm: Example
● Solid lines: best one-to-one alignment
● Recall = #common / #mentions-in-key = (2+1)/6 = 1/2
● Precision = #common / #mentions-in-response = (2+1)/6 = 1/2
[Figure: Gold Standard vs. System Output, with the best one-to-one alignment drawn as solid lines.]

CEAFe: Example
● Solid lines: best one-to-one alignment
● Each aligned pair of a key entity K and a response entity R is scored by φ(K, R) = 2|K ∩ R| / (|K| + |R|); the figure shows the values 2/5, 4/5, and 2/3.
● Recall = Σ φ(K, R) / #entities-in-key
● Precision = Σ φ(K, R) / #entities-in-response
● Would the Jaccard index, |K ∩ R| / |K ∪ R|, be more reasonable?
[Figure: Gold Standard vs. System Output, with the best one-to-one alignment drawn as solid lines.]

G-Edit
● Step 1: evaluate the name mention tagging F-measure.
● Step 2: evaluate the overlapped mentions (those present in both the gold standard and the system output) with a graph-based edit distance.
To do:
● Find a theoretical proof that the connected-component-based method is the unique optimal solution.
● Try to make the matching more sensitive to cluster size.

G-Edit
● Construct a bipartite graph with bipartition G (gold clusters) and S (system clusters).
● An edge (Gi, Sj) exists iff Gi and Sj share members.
● Find the connected components. In each connected component, merge all the S clusters and then split the result into that component's G clusters.
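The B-Cubed precision and recall described earlier reduce to averaging per-mention credits. The following is a minimal Python sketch, not the official scorer: it assumes each cluster is given as a set of hashable mention keys (standing in for ⟨doc-id, start, end, entity-type⟩ after kb-id is set to NIL), and the function name `b_cubed` and the toy clusters are hypothetical.

```python
def b_cubed(gold, system):
    """Return (precision, recall) for two lists of clusters (sets of mention keys)."""
    gold_of = {m: c for c in gold for m in c}   # mention -> its gold cluster
    sys_of = {m: c for c in system for m in c}  # mention -> its system cluster
    # Precision credit of a system mention: fraction of its system cluster
    # that is also in its gold cluster (0 if the mention is not in the gold standard).
    p_credits = [len(c & gold_of.get(m, set())) / len(c) for c in system for m in c]
    # Recall credit of a gold mention: fraction of its gold cluster
    # that is also in its system cluster (0 if the system missed the mention).
    r_credits = [len(c & sys_of.get(m, set())) / len(c) for c in gold for m in c]
    precision = sum(p_credits) / len(p_credits) if p_credits else 0.0
    recall = sum(r_credits) / len(r_credits) if r_credits else 0.0
    return precision, recall


# Hypothetical toy clusters; mention 9 is a spurious system mention.
gold = [{1, 2, 3}, {4, 5}]
system = [{1, 2}, {3, 4, 5}, {9}]
p, r = b_cubed(gold, system)
print(f"P = {p:.3f}, R = {r:.3f}")  # P = 0.611, R = 0.733
```

The spurious mention 9 earns a precision credit of 0 and lowers precision (11/18) without affecting recall (11/15).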
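The CEAF slides describe an optimal one-to-one alignment between key and response clusters, scored by φ3 (shared mentions) for CEAFm and by 2|K ∩ R| / (|K| + |R|) for CEAFe. The sketch below makes both concrete under stated assumptions: clusters are sets of mention keys, the names `best_alignment`, `phi3`, `phi4`, `ceaf_m`, and `ceaf_e` are hypothetical, and the alignment is found by brute force over permutations, a deliberate swap for Kuhn-Munkres that is only practical on toy inputs.

```python
from itertools import permutations


def best_alignment(key, response, phi):
    """Maximum total phi over one-to-one alignments of key and response clusters."""
    if len(key) > len(response):
        # Enumerate from the smaller side, swapping phi's arguments to match.
        return best_alignment(response, key, lambda r, k: phi(k, r))
    # Brute force: try every injective mapping of key clusters to response
    # clusters. Exponential, so only for toy inputs; the slides name
    # Kuhn-Munkres as the efficient algorithm for this step.
    return max(
        sum(phi(k, response[j]) for k, j in zip(key, perm))
        for perm in permutations(range(len(response)), len(key))
    )


def phi3(k, r):
    """CEAFm similarity: number of shared mentions."""
    return len(k & r)


def phi4(k, r):
    """CEAFe similarity (Luo, 2005): 2|K ∩ R| / (|K| + |R|)."""
    return 2 * len(k & r) / (len(k) + len(r))


def ceaf_m(key, response):
    """Return (precision, recall) normalized by mention counts."""
    common = best_alignment(key, response, phi3)
    return common / sum(map(len, response)), common / sum(map(len, key))


def ceaf_e(key, response):
    """Return (precision, recall) normalized by entity (cluster) counts."""
    total = best_alignment(key, response, phi4)
    return total / len(response), total / len(key)


# Hypothetical toy clusters; mention 9 is a spurious response mention.
key = [{1, 2, 3}, {4, 5}]
response = [{1, 2}, {3, 4, 5}, {9}]
m_p, m_r = ceaf_m(key, response)
e_p, e_r = ceaf_e(key, response)
print(f"CEAFm: P = {m_p:.3f}, R = {m_r:.3f}")  # CEAFm: P = 0.667, R = 0.800
print(f"CEAFe: P = {e_p:.3f}, R = {e_r:.3f}")  # CEAFe: P = 0.533, R = 0.800
```

For realistic cluster counts, the brute-force step could be replaced by `scipy.optimize.linear_sum_assignment(similarity_matrix, maximize=True)`, which solves the same assignment problem in polynomial time.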
[Figure: Gold Standard vs. System Output]

G-Edit
● Merge [1,5], [3], [2,6] into [1,2,3,5,6]
● Merge Cost = 2
[Figure: Gold Standard vs. System Output]

G-Edit
● Split [1,2,3,5,6] into [1,2,6] and [3,5]
● Split Cost = 1
[Figure: Gold Standard vs. System Output]

G-Edit
● Total Cost = Merge Cost + Split Cost
● Total Max Cost = Max Merge Cost + Max Split Cost
● Max Merge Cost = #mentions in S - 1
● Max Split Cost = #clusters in G - 1
● In the example, S contains 6 mentions and G has 3 clusters, so Total Max Cost = (6 - 1) + (3 - 1) = 7.
● Score = 1 - Total Cost / Total Max Cost = 1 - (2+1)/7 = 4/7
[Figure: Gold Standard vs. System Output]

Wikification F-score
Existing Entity Linking (Wikification) F-score
● A system mention is correct only if it matches a gold mention on all five fields, including kb-id.
● Precision = (1+1)/6 = 1/3
● Recall = (1+1)/6 = 1/3
● F-score = 2 * 1/3 * 1/3 / (1/3 + 1/3) = 1/3
[Figure: Gold Standard vs. System Output]
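The G-Edit procedure (connected components, merge then split, normalize by the maximum cost) can be sketched in Python as below. This is an illustrative sketch, not the official scorer: `g_edit` is a hypothetical name, clusters are assumed to be sets of mention ids over the overlapped mentions only, and connected components are found with union-find over mentions, since two clusters fall into the same component exactly when a chain of shared mentions links them. The usage reproduces the worked example, assuming mention 4 is a singleton cluster in both G and S.

```python
from collections import Counter


def g_edit(gold, system):
    """Return (score, merge_cost, split_cost) for clusters over the same mentions."""
    parent = {m: m for cluster in gold + system for m in cluster}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Union the members of every cluster. Afterwards two clusters share a
    # root iff they lie in the same connected component of the bipartite
    # graph whose edges join G and S clusters that share a mention.
    for cluster in gold + system:
        first, *rest = cluster
        for m in rest:
            parent[find(m)] = find(first)

    s_per_comp = Counter(find(next(iter(c))) for c in system)
    g_per_comp = Counter(find(next(iter(c))) for c in gold)
    # Per component: merge all of its S clusters into one, then split into its G clusters.
    merge_cost = sum(n - 1 for n in s_per_comp.values())
    split_cost = sum(n - 1 for n in g_per_comp.values())
    # Worst case: merge every system mention into one cluster, then split into all G clusters.
    max_cost = (sum(len(c) for c in system) - 1) + (len(gold) - 1)
    score = 1 - (merge_cost + split_cost) / max_cost if max_cost else 1.0
    return score, merge_cost, split_cost


# Clusters from the worked example (mention 4 assumed to be a singleton in both).
gold = [{1, 2, 6}, {3, 5}, {4}]
system = [{1, 5}, {3}, {2, 6}, {4}]
score, merge_cost, split_cost = g_edit(gold, system)
print(merge_cost, split_cost, round(score, 4))  # 2 1 0.5714
```

The singleton component {4} contributes no cost, so the total of 3 comes entirely from the component containing mentions 1, 2, 3, 5, and 6, matching the score of 4/7.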
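The Wikification F-score counts a system mention as correct only when all five tuple fields match a gold mention. A minimal Python sketch, assuming mentions are hashable tuples without duplicates and that NIL is encoded as the string "NIL" and compared like any other kb-id; the `Mention` type, `linking_f_score`, and the toy data are hypothetical.

```python
from typing import NamedTuple


class Mention(NamedTuple):
    doc_id: str
    start: int
    end: int
    entity_type: str
    kb_id: str  # "NIL" when the mention has no KB entry (assumed encoding)


def linking_f_score(gold, system):
    """Return (precision, recall, F) for exact five-field mention matches."""
    # NamedTuples compare field by field, so set intersection applies the
    # match rule: same doc-id, start, end, entity-type, and kb-id.
    matched = len(set(gold) & set(system))
    precision = matched / len(system) if system else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 4 gold mentions, 3 system mentions, 2 exact matches.
gold = [
    Mention("doc1", 0, 5, "PER", "E01"),
    Mention("doc1", 10, 14, "ORG", "E02"),
    Mention("doc2", 3, 8, "GPE", "NIL"),
    Mention("doc2", 20, 25, "PER", "E01"),
]
system = [
    Mention("doc1", 0, 5, "PER", "E01"),    # exact match
    Mention("doc1", 10, 14, "ORG", "E09"),  # wrong kb-id, so not a match
    Mention("doc2", 3, 8, "GPE", "NIL"),    # exact match
]
p, r, f = linking_f_score(gold, system)
print(f"P = {p:.3f}, R = {r:.3f}, F = {f:.3f}")  # P = 0.667, R = 0.500, F = 0.571
```

Using a set intersection keeps the sketch short; if the input could contain duplicate mentions, a `collections.Counter` intersection would be needed to count them correctly.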