YAGO-NAGA Project
Presented By: Mohammad Dwaikat
To: Dr. Yuliya Lierler
CSCI 8986 – Fall 2012

Agenda
• What is YAGO-NAGA?
• Why YAGO-NAGA?
• How Does YAGO-NAGA Work?
• Demonstration
• YAGO-NAGA Sub-Projects

What is YAGO-NAGA?
• Harvesting, searching, and ranking knowledge from the Web.
• Building a conveniently searchable, large-scale, highly accurate knowledge base of common facts in a machine-processable representation.
• Harvested knowledge about millions of entities and facts about their relationships, from Wikipedia and WordNet, with careful integration of these two sources.
• Its vision is a confluence of Semantic Web (ontologies), Social Web (Web 2.0), and Statistical Web (information extraction) assets towards a comprehensive repository of human knowledge.

YAGO
• Yet Another Great Ontology (YAGO) knowledge base.
• It is a huge semantic knowledge base, derived from Wikipedia, WordNet, and GeoNames.
• It knows almost 10 million entities (e.g. persons, organizations, cities) and 120 million facts about these entities.
• It has a manually confirmed accuracy of 95%.
• YAGO is an ontology that is anchored in time and space.
  – It attaches a temporal dimension and a spatial dimension to many of its facts and entities.
• It contains all the entities and ontological facts extracted from Wikipedia (from 2010-08-17), with categories mapped to the WordNet class hierarchy.
• It also contains multilingual data from the Universal WordNet (UWN).
• It contains all the entities and facts from GeoNames (from a dump of August 2010).
• It also contains textual and structural data from Wikipedia:
  – All links and anchor texts between the YAGO entities.
  – All Wikipedia category names.
  – The titles of references.
• It is particularly suited for disambiguation purposes, as it contains a large number of names for entities.
It also knows the gender of people.
• YAGO is the resulting knowledge base; its facts are represented as RDF (Resource Description Framework) triples.
• Methods and prototype systems have been developed for querying, ranking, and exploring knowledge.

NAGA
• Not Another Google Answer (NAGA) is a new semantic search engine that provides ranked answers to queries based on statistical models.
• It can operate on knowledge bases that are organized as graphs with labeled nodes and edges, so-called relationship graphs.
• As of now, NAGA uses a projection of YAGO as its knowledge base.
• The underlying query language supports keyword search for the casual user as well as graph-based queries with regular expressions for the expert user.

Consider These Questions
• Which German Nobel laureate survived both world wars and outlived all four of his children?
  – The answer is Max Planck.
• Which politicians are also accomplished scientists?
  – The German chancellor Angela Merkel and Benjamin Franklin.
• How are Max Planck, Angela Merkel, Jim Gray, and the Dalai Lama related?
  – All four have doctoral degrees from German universities.

Why YAGO-NAGA?
• Three major research directions:
  – Semantic-Web-style knowledge repositories, such as SUMO, OpenCyc, and WordNet.
  – Large-scale information extraction.
  – Social tagging and Web 2.0 communities that constitute the Social Web. Wikipedia is another example of the Social Web paradigm.
• The challenge is how to extract the important facts from the Web and organize them into an explicit knowledge base that captures entities and semantic relationships among them.

How Does YAGO-NAGA Work?
• Querying in YAGO-NAGA adopts concepts from SPARQL (the standardized SPARQL Protocol and RDF Query Language) for RDF data, but extends them with more expressive pattern matching and ranking.
• The prototype system that implements these features is NAGA.

Query for the YAGO Knowledge Base
• A big US city with two airports, one named after a World War II hero and one named after a World War II battlefield?

Structured Knowledge Queries
• The same question expressed as a structured query:

    Select Distinct ?c Where {
      ?c type City .
      ?c locatedIn USA .
      ?a1 type Airport .
      ?a2 type Airport .
      ?a1 locatedIn ?c .
      ?a2 locatedIn ?c .
      ?a1 namedAfter ?p .
      ?p type WarHero .
      ?a2 namedAfter ?b .
      ?b type BattleField .
    }

Growing the Knowledge Base
[Figure: pipeline diagram. Labels: WordNet + Wikipedia; YAGO Core Extractors; YAGO Core Checker; YAGO Core (knows all entities); Web sources; YAGO Gatherer; Hypotheses; YAGO Scrutinizer; Growing YAGO (focus on facts).]

Information Extraction from Wikipedia

    Subj.                Pred.          Obj.
    Stanford University  type           Private University
                         hasPresident   J.L.Hennessy
                         hasStudents    15,319
                         foundedBy      L.Stanford
                         foundedIn      1891
    …                    …              …

YAGO Knowledge Base
• Combine knowledge from WordNet & Wikipedia.
• Additional gazetteers (geonames.org).

Searching & Ranking RDF Graphs in NAGA
• Ranking is based on confidence, compactness, and relevance.
• Three kinds of graph queries:
  – Discovery queries.
  – Connectedness queries.
  – Queries with regular expressions.
[Figure: example query graphs. Node labels: Kiel, scientist, German novelist, Thomas Mann, Goethe, Nobel prize, Ling, Beng Chin Ooi, Zhejiang, and the variables $x, $y, $a, $b. Edge labels: bornIn, type, diedOn, hasWon, hasSon, worksFor, hasFirstName | hasLastName, (coAuthor | advisor)*, locatedIn*, and the wildcard *.]
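The structured query for a big US city with two airports can be read as a conjunction of triple patterns that must all match the same variable bindings. The following is a minimal Python sketch of that idea, not NAGA's implementation: the names `FACTS`, `unify`, and `match` are hypothetical, the facts are hand-written toy triples rather than data extracted from YAGO, and no ranking is performed.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hand-written toy facts for illustration only; not extracted from YAGO.
FACTS: List[Triple] = [
    ("Chicago", "type", "City"),
    ("Chicago", "locatedIn", "USA"),
    ("OHare_Airport", "type", "Airport"),
    ("Midway_Airport", "type", "Airport"),
    ("OHare_Airport", "locatedIn", "Chicago"),
    ("Midway_Airport", "locatedIn", "Chicago"),
    ("OHare_Airport", "namedAfter", "Butch_OHare"),
    ("Butch_OHare", "type", "WarHero"),
    ("Midway_Airport", "namedAfter", "Battle_of_Midway"),
    ("Battle_of_Midway", "type", "BattleField"),
    ("Boston", "type", "City"),
    ("Boston", "locatedIn", "USA"),
    ("Logan_Airport", "type", "Airport"),
    ("Logan_Airport", "locatedIn", "Boston"),
]


def unify(pattern: Triple, fact: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `fact`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, fact):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def match(patterns: List[Triple], facts: List[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every binding that satisfies all patterns (naive backtracking)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for fact in facts:
        extended = unify(first, fact, binding)
        if extended is not None:
            yield from match(rest, facts, extended)


# The query from the slide, one tuple per triple pattern.
QUERY: List[Triple] = [
    ("?c", "type", "City"),
    ("?c", "locatedIn", "USA"),
    ("?a1", "type", "Airport"),
    ("?a2", "type", "Airport"),
    ("?a1", "locatedIn", "?c"),
    ("?a2", "locatedIn", "?c"),
    ("?a1", "namedAfter", "?p"),
    ("?p", "type", "WarHero"),
    ("?a2", "namedAfter", "?b"),
    ("?b", "type", "BattleField"),
]

# "Select Distinct ?c": keep only the distinct values bound to ?c.
cities = sorted({b["?c"] for b in match(QUERY, FACTS)})
print(cities)
```

On these toy facts only Chicago satisfies every pattern; Boston is rejected because its single airport has no namedAfter fact. A real engine would use indexes instead of scanning all facts for each pattern.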
Demonstration

YAGO Server: UI & API
• YAGO-UI
  – Interactive online demo.
  – RDF with time, space & provenance annotations.
  – SPARQL + keywords.
• YAGO-API: two basic Web services
  – processQuery (String query)
  – getYagoEntitiesByNames (String[] names)
• www.mpi-inf.mpg.de/yago-naga/demo.html

YAGO
• Browse through the YAGO knowledge base.
  – https://d5gate.ag5.mpisb.mpg.de/webyagospotlx/Browser
• Ask queries on YAGO using SPOTLX patterns. View the results on a map and timeline.
  – https://d5gate.ag5.mpisb.mpg.de/webyagospotlx/WebInterface

YAGO-NAGA Sub-Projects
• More than 13 sub-projects of YAGO-NAGA.
• AIDA is a method, implemented in an online tool, for disambiguating mentions of named entities that occur in natural-language text or Web tables.
  – https://d5gate.ag5.mpi-sb.mpg.de/webaida/

Names, Surface Patterns & Paraphrases
• Example question with part-of-speech tags:

    Which  chemist  was  born  in  London?
           NN       VBD  VBN   IN  NNP/LOC

• (I) Named entity disambiguation
  – chemist → wordnet_chemist, wordnet_pharmacist
  – born → Bertran_de_Born, Born_Identity_(Movie), Born_(Album)
  – London → London_UK, London_Arkansas, Antonio_London
• (II) Mapping surface patterns onto semantic relations
  – <person> was_born_in <location> → bornIn(<person>, <location>)
  – <person> was_born_in <date> → bornOn(<person>, <date>)
• (III) Paraphrases of questions
  – <person> [was] born in <location> → bornIn(<person>, <location>)
  – <location>-born <person> → bornIn(<person>, <location>)

References
• YAGO-NAGA Project:
  – http://www.mpi-inf.mpg.de/yago-naga/
• YAGO:
  – http://yago-knowledge.org
• NAGA:
  – http://www.mpi-inf.mpg.de/yagonaga/naga/demo.html
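Step (II) of the Names, Surface Patterns & Paraphrases slide, mapping a surface pattern onto a semantic relation, can be illustrated with a small Python sketch. This is only an assumption-laden toy, not the method used by any YAGO-NAGA sub-project: the pattern table `PATTERNS` and the function `map_sentence` are hypothetical, and a four-digit year stands in for proper recognition of a <date> argument.

```python
import re
from typing import List, Optional, Pattern, Tuple

# Hypothetical pattern table: (compiled surface pattern, relation name).
# Order matters: the date pattern is tried before the location pattern,
# because "was born in" is ambiguous between bornOn and bornIn.
PATTERNS: List[Tuple[Pattern[str], str]] = [
    # Assumption: a bare four-digit year counts as a <date>.
    (re.compile(r"^(?P<subj>.+?) was born in (?P<obj>\d{4})$"), "bornOn"),
    # Anything else after "was born in" is treated as a <location>.
    (re.compile(r"^(?P<subj>.+?) was born in (?P<obj>.+)$"), "bornIn"),
]


def map_sentence(sentence: str) -> Optional[Tuple[str, str, str]]:
    """Return (relation, subject, object) for the first matching pattern."""
    text = sentence.strip().rstrip(".")
    for pattern, relation in PATTERNS:
        found = pattern.match(text)
        if found:
            return relation, found.group("subj"), found.group("obj")
    return None  # no surface pattern applies


print(map_sentence("Max Planck was born in Kiel."))
print(map_sentence("Max Planck was born in 1858."))
```

The sketch decides between bornIn and bornOn by the shape of the object alone. The entity names it returns are still plain strings, so they would need the disambiguation of step (I) before they could be matched against a knowledge base.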