YAGO – A Core of Semantic Knowledge
Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum
(Max-Planck Institute for Computer Science, Saarbrücken, Germany)

Overview
- Motivation
- The YAGO ontology
- Content
- Model
- Extension
- Conclusion

The Truth about Elvis
Elvis is alive!
He works as an astronaut in NASA's special security program.

Usual solution
Search query: Which NASA astronaut was born when Elvis was born?
Yields only rubbish. Reasons:
1. Google participates in the conspiracy
2. Google does not search knowledge, but Web sites

Solution: An ontology
[Diagram, built up over three slides: an unknown individual "?" is an astronaut and was born in 1935, the year in which Elvis was born. Above the individuals sit the classes: astronaut subclass person subclass entity. Individuals are attached to classes by "is a" edges and to 1935 by "born" edges. The words "Elvis Presley" and "The King" are attached to an individual by "means" edges. The layers of the graph are labeled Classes, Relations, Individuals and Words.]

Where do we get the ontology from?
Previous approaches:
- Assemble the ontology manually (WordNet, SUMO, GeneOntology)
  Problem: Usually low coverage (MPI is in none of these)
- Extract the ontology from corpora, e.g. the Web (KnowItAll, Espresso, Snowball, LEILA)
  Problem: Usually low accuracy (50%-92%)

YAGO approach:
- Assemble the ontology from Wikipedia (=> good coverage)
- Use the category system of Wikipedia (=> good accuracy)
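
Using the category system amounts to deciding, for each category of an article, whether it yields a fact and which one. The sketch below is a minimal illustration and not the YAGO implementation: the regular expressions, the `ADMINISTRATIVE` set, the `1935_deaths`-style pattern and the "plural last word" test are all assumptions made for this example, and the function names are hypothetical. It sorts categories into the four kinds that the slides distinguish (relational, conceptual, administrative, thematic) and keeps facts only from the first two.

```python
import re

# Hypothetical rules for illustration only; the slides do not spell out
# the heuristics that YAGO actually uses.
RELATIONAL_PATTERNS = [
    (re.compile(r"^(\d{4})_births$"), "bornInYear"),
    (re.compile(r"^(\d{4})_deaths$"), "diedInYear"),
]
ADMINISTRATIVE = {"Disputed_articles"}


def classify_category(category):
    """Return (kind, payload) for a Wikipedia category name."""
    for pattern, relation in RELATIONAL_PATTERNS:
        match = pattern.match(category)
        if match:
            return "relational", (relation, match.group(1))
    if category in ADMINISTRATIVE:
        return "administrative", None
    # Naive stand-in for real linguistic analysis: a plural last word
    # suggests a class of things (conceptual), anything else a topic.
    head = category.split("_")[-1]
    if head.lower().endswith("s"):
        return "conceptual", category[:-1]  # crude singular form
    return "thematic", None


def facts_from_categories(entity, categories):
    """Turn relational and conceptual categories into triples."""
    facts = []
    for category in categories:
        kind, payload = classify_category(category)
        if kind == "relational":
            relation, value = payload
            facts.append((entity, relation, value))
        elif kind == "conceptual":
            facts.append((entity, "is_a", payload))
        # administrative and thematic categories yield no facts
    return facts


if __name__ == "__main__":
    elvis_categories = [
        "1935_births",
        "American_singers",
        "Disputed_articles",
        "Rock'n_Roll_Music",
    ]
    for fact in facts_from_categories("Elvis", elvis_categories):
        print(fact)
```

The administrative check runs before the plural test on purpose: a name such as Disputed_articles would otherwise look like a class of things.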

Exploiting the Wikipedia category system
[Mock-up of a Wikipedia article titled "Elvis Pr" with filler body text and a "Categories:" box. The slide is built up one category at a time.]
- Exploit relational categories
  Category 1935_births: Elvis born 1935
- Exploit conceptual categories
  Category American_singers: Elvis is a American_singer
- Avoid administrative categories
  Category Disputed_articles: would give Elvis is a Disputed_article
- Avoid thematic categories
  Category Rock'n_Roll_Music: would give Elvis is a Rock'n_Roll_Music

The Upper Model
[Diagram: Elvis is a American_singer and was born in 1935. How American_singer connects to upper classes such as person and entity is marked "?".]

The Upper Model: From Wikipedia?
[Diagram: the same graph, with the Wikipedia categories People_by_occupation, Social_group and Business above American_singer, marked "?".]

The Upper Model: From WordNet?
[Diagram: American_singer below the WordNet senses Singer#1 ... Singer#17 and Person#3.]
[Second build: for the category American_singers_of_Jewish_origin, the candidate WordNet senses are Origin#7, Singer#1 ... Singer#17 and Person#3.]

The YAGO ontology
[Diagram: Elvis is a American_singer and was born in 1935. American_singer subclass Singer#1 subclass Person#3. The word "singer" means Singer#1; the word "Elvis Presley" means Elvis.]

The YAGO ontology: Accuracy

Relation        Accuracy
subclass        97.70% +/- 1.59%
is a            94.54% +/- 2.36%
familyName      97.81% +/- 1.75%
givenName       97.62% +/- 2.08%
establishedIn   90.84% +/- 4.28%
bornInYear      93.14% +/- 3.71%
diedInYear      98.72% +/- 1.30%
locatedIn       98.41% +/- 1.52%
politicianOf    92.43% +/- 3.93%
writtenInYear   94.35% +/- 3.33%
hasWonPrize     98.47% +/- 1.57%

The YAGO ontology: Number of Facts
Ontologies should not be judged purely by the number of facts! This is just an informational overview.

Ontology     Number of facts
KnowItAll    30,000
SUMO         60,000
WordNet      200,000
OpenCyc      300,000
Cyc          2,000,000
YAGO         6,000,000

The YAGO Model: Why binary is not enough
(Elvis, is_a, singer)
(But only from 1953 to 1977)
(We know this from Wikipedia)
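
The limitation can be made concrete in a few lines. A plain triple has no slot for a time span or a source, so the sketch below gives each fact an identifier that other facts may use as their subject, in the spirit of the fact identifiers #1, #2, #3 used in the talk. The dictionary layout and the helper names `qualifiers` and `describe` are assumptions made for this illustration, not part of YAGO.

```python
# Toy fact store: fact identifier -> (subject, relation, object).
# The subject of a fact may itself be a fact identifier.
facts = {
    "#1": ("Elvis", "is_a", "singer"),
    "#2": ("#1", "time", "1953-1977"),
    "#3": ("#1", "source", "Wikipedia"),
}


def qualifiers(fact_id, store):
    """Collect the facts whose subject is the given fact identifier."""
    return {
        relation: obj
        for subject, relation, obj in store.values()
        if subject == fact_id
    }


def describe(fact_id, store):
    """Render a fact together with the facts that talk about it."""
    subject, relation, obj = store[fact_id]
    extras = qualifiers(fact_id, store)
    text = f"({subject}, {relation}, {obj})"
    if extras:
        details = ", ".join(f"{key}={extras[key]}" for key in sorted(extras))
        text += f" [{details}]"
    return text


if __name__ == "__main__":
    print(describe("#1", facts))
```

Because the qualifiers are ordinary facts, they can be queried, sourced and qualified in exactly the same way as the fact they describe.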

The YAGO Model: Why binary is not enough
#1 (Elvis, is_a, singer)
#2 (#1, time, 1953-1977)
#3 (#1, source, Wikipedia)

The YAGO model formally
A YAGO ontology over
- a set of relations R
- a set of common entities C
- a set of fact identifiers I
is a function
  I → (R ∪ C ∪ I) × R × (R ∪ I ∪ C)
Example:
  #1 (Elvis, is_a, singer)
  #2 (#1, time, 1953-1977)
  #3 (#1, source, Wikipedia)
We can talk about
- facts: (#1, source, Wikipedia)
- additional arguments: (#1, time, 1953-1977)
- relations: (time, hasRange, time_interval)

The YAGO model: Logical aspects
Axioms:
  (x, is_a, y), (y, subclass, z) => (x, is_a, z)
  ...
[Diagram: an individual is a singer, singer subclass person, hence the individual is a person.]
- Derive facts with the axioms: f1, f2, f3, f4, f5 becomes f1, f2, f3, f4, f5, f6, f7, f8, f9, f10 (finite, unique)
- Eliminate derivable facts: f1, f2, f3, f4, f5 becomes f1, f2, f3 (finite, unique)

Extending the Ontology
Whom did Elvis marry?
Pattern: X married Y
Text: Elvis married Priscilla
Answer: Priscilla

Extending the Ontology with LEILA
Whom did Elvis marry?
Pattern: X married Y, with X linked as subj and Y linked as obj
Text: Elvis, the great rock star, married Priscilla (Elvis is subj, Priscilla is obj)
Answer: Priscilla

Extending the Ontology
[Diagram: Ontology (YAGO) and Information Extraction (LEILA) feeding each other.]

The Truth about Elvis
Which astronaut was born in the same year as Elvis?
http://www.mpi-inf.mpg.de/~suchanek/downloads/yago/
Enter your YAGO query:
  "Elvis Presley" bornInYear $year
  $astro bornInYear $year
  $astro isa astronaut
20 results
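
A query of this shape is a list of triple patterns whose variables must be bound consistently across all patterns. The following is a minimal sketch of such an evaluator and not the YAGO query engine: the function names are hypothetical, `FACTS` is a tiny illustrative sample rather than YAGO data, and the quoted word "Elvis Presley" is replaced by the entity name `Elvis_Presley` to skip the word-to-entity lookup.

```python
# Toy facts for illustration only; not taken from the YAGO data set.
FACTS = [
    ("Elvis_Presley", "bornInYear", "1935"),
    ("Roger_Chaffee", "bornInYear", "1935"),
    ("Roger_Chaffee", "isa", "astronaut"),
    ("Neil_Armstrong", "bornInYear", "1930"),
    ("Neil_Armstrong", "isa", "astronaut"),
]

QUERY = [
    ("Elvis_Presley", "bornInYear", "$year"),
    ("$astro", "bornInYear", "$year"),
    ("$astro", "isa", "astronaut"),
]


def is_variable(term):
    """Variables are written with a leading dollar sign, as on the slide."""
    return term.startswith("$")


def match_pattern(pattern, fact, binding):
    """Extend binding so that pattern equals fact, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, fact):
        if is_variable(term):
            # Bind a fresh variable, or check an existing binding.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def run_query(patterns, facts):
    """Join the patterns one by one, keeping only consistent bindings."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for fact in facts:
                extended = match_pattern(pattern, fact, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


if __name__ == "__main__":
    for result in run_query(QUERY, FACTS):
        print(result["$astro"], result["$year"])
```

Adding a further pattern to `QUERY` narrows the result set in the same way that an extra line narrows the query in the talk.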

The Truth about Elvis
Which astronaut codenamed "Roger" was born in the same year as Elvis?
http://www.mpi-inf.mpg.de/~suchanek/downloads/yago/
Enter your YAGO query:
  "Elvis Presley" bornInYear $year
  $astro bornInYear $year
  "Roger" givenNameOf $astro
  $astro isa astronaut
Result: $astro = Roger_Chaffee

Conclusions
- YAGO is based on a logically clean model
- YAGO has an accuracy of around 95%
- YAGO is 3 times larger than the largest competitor
- Elvis is alive

Reference
For all details, please refer to our technical report
"Yago – A Core of Semantic Knowledge" (Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum),
available at http://www.mpii.mpg.de/~suchanek

BibTeX:
  @TECHREPORT{yagotr,
    AUTHOR = {Suchanek, Fabian and Kasneci, Gjergji and Weikum, Gerhard},
    TITLE = {Yago: A Core of Semantic Knowledge},
    TYPE = {Research Report},
    INSTITUTION = {Max-Planck-Institut f{\"u}r Informatik},
    ADDRESS = {Stuhlsatzenhausweg 85, 66123 Saarbr{\"u}cken, Germany},
    NUMBER = {MPI-I-2006-5-006},
    YEAR = {2006}
  }