GDM@FUDAN http://gdm.fudan.edu.cn
Graph Data Management Lab, School of Computer Science

YAGO
Reporter: Qi Liu
Email: zerup123@gmail.com

What is YAGO?
A semantic web
A knowledge base
A combination of WordNet and Wikipedia

Semantic web
Advocated by W3C (World Wide Web Consortium)
Aimed at reconstructing the WWW
A standard framework: RDF (Resource Description Framework)

What is YAGO?
A semantic web
A knowledge base
A combination of WordNet and Wikipedia

Knowledge base
To be:
• A special database for knowledge management
To do:
• Provides a means for collecting, organising, searching and utilising information
Three types:
• Machine-readable knowledge bases (DBpedia)
• Human-readable knowledge bases (Wikipedia)
• Knowledge base analysis and design

What is YAGO?
A semantic web
A knowledge base
A combination of WordNet and Wikipedia

WordNet
To be:
• A lexical database for English since 1985
To do:
• Groups words into synsets
• Provides short, general definitions
• Records the semantic relations between these synsets
25 basic noun groups & 15 verb groups

Key Concepts
Ontology vs Taxonomy
Lexicon: the bridge between a language and the knowledge expressed in that language
Syntactic (there vs their)
Semantic (sight vs site)
Pragmatic (infer vs imply)

Semantics of YAGO
Five relations:
• Domain
• Range
• subRelationOf
• Type
• subClassOf
Entities:
• Domain
• Relation
• Range
• Literal
• ......

Axiomatic rules

Reasoning rules
Correctness and completeness

The YAGO system
Knowledge extraction
YAGO storage
Enriching YAGO

Knowledge extraction
TYPE relation
SUBCLASSOF relation
MEANS relation
Other relations
Meta-relations

TYPE relation extraction
The Wikipedia Category System
• Types: conceptual, administrative, relational, thematic
Identifying Conceptual Categories
• Conceptual categories: candidates for TYPE
• Administrative and relational ones: excluded by
hand
• Employ shallow linguistic parsing (Noun Group Parser) on the remaining two category types
• E.g. Naturalized citizens of United States
• domain and range extracted at the same time

SUBCLASSOF relation extraction
Wikipedia categories
• DAG (directed acyclic graph)
• Reflect merely the thematic structure
• Use only the leaf categories of Wikipedia
Integrating WordNet Synsets
• Match or prefer WordNet
Establishing subClassOf
• E.g. American people in Japan
Exceptions
• Correct manually

MEANS relation extraction
Exploiting WordNet Synsets
• A synset {urban center, metropolis, city}
• Attach a class for the synset ‘city’
Exploiting Wikipedia Redirects
• Search “Einstein, Albert”, redirected to “Albert Einstein”
Parsing Person Names
• givenNameOf subRelationOf means
• familyNameOf subRelationOf means

Other relations extraction
BornInYear & DiedInYear
EstablishedIn & LocatedIn
WrittenInYear
PoliticianOf
HasWonPrize
Filtering the Results

Meta-relations extraction
Descriptions
• Individual DESCRIBES URL
Witness
• Fact FoundIn URL (of its witness page)
• ExtractedBy
Context
• Linkages between A & B: A Context B

Knowledge extraction
TYPE relation
SUBCLASSOF relation
MEANS relation
Other relations
Meta-relations

The YAGO system
Knowledge extraction
YAGO storage
Enriching YAGO

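A minimal sketch of the MEANS relation extraction covered in the slides above, assuming a naive name split (first token as given name, last token as family name) that the slides do not specify. The `Fact` type and the helper names `name_facts`, `redirect_fact` and `with_implied_means` are hypothetical and used only for illustration; only the relation names givenNameOf, familyNameOf, subRelationOf and means come from the slides.

```python
from typing import List, NamedTuple


class Fact(NamedTuple):
    """A YAGO-style triple (subject, relation, object)."""
    subject: str
    relation: str
    obj: str


# From the slides: both name relations are sub-relations of "means".
SUB_RELATION_OF = {"givenNameOf": "means", "familyNameOf": "means"}


def name_facts(entity: str) -> List[Fact]:
    """Naive person-name parsing (assumption): first token is the given
    name, last token is the family name. Single-token names are skipped."""
    tokens = entity.split()
    if len(tokens) < 2:
        return []
    return [
        Fact(tokens[0], "givenNameOf", entity),
        Fact(tokens[-1], "familyNameOf", entity),
    ]


def redirect_fact(redirect_title: str, target: str) -> Fact:
    """A Wikipedia redirect title is an alternative name for its target."""
    return Fact(redirect_title, "means", target)


def with_implied_means(facts: List[Fact]) -> List[Fact]:
    """Add the 'means' fact implied by each sub-relation of 'means'."""
    implied = [
        Fact(f.subject, SUB_RELATION_OF[f.relation], f.obj)
        for f in facts
        if f.relation in SUB_RELATION_OF
    ]
    return list(facts) + implied


facts = name_facts("Albert Einstein")
facts.append(redirect_fact("Einstein, Albert", "Albert Einstein"))
all_facts = with_implied_means(facts)

for fact in all_facts:
    print(fact.subject, fact.relation, fact.obj)
```

Keeping givenNameOf and familyNameOf as separate facts and deriving means through subRelationOf mirrors the slide's statement that both are sub-relations of means; a real extractor would need a more careful name parser than this split.
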
YAGO storage
Model independent of storage
Storage:
• Text files, XML, database tables, RDF

Enriching YAGO
Add the fact (x, r, y)
• Map x, y to existing entities (word sense disambiguation)
• If mapping failed, add new entity
• Map r to YAGO ontology
• If mapping succeeded, add a FoundIn relation
• If mapping failed, add a new fact!

Summary on YAGO1
1M entities & 5M facts
Accuracy around 95%

YAGO2: In Time, Space and Many Languages
YAGO: about 100 manually defined relations
Build YAGO2 architecture based on such rules:
• Factual rules
  • E.g. exceptions, definitions of all relations, domains, ranges and classes
• Implication rules
  • Infer new facts from the facts in the database
• Replacement rules
  • Normalize numbers, tags and other formats
• Extraction rules
  • Extract facts from a given source text

Temporal Dimension
People: wasBornOnDate & diedOnDate
Groups: wasCreatedOnDate & wasDestroyedOnDate
Artifacts (buildings, songs, cities): [same as above]
Events: startedOnDate & endedOnDate
=> startExistingOnDate & endExistingOnDate
Facts: entities in a fact
=> subjectStartRelation & objectStartRelation

GEO-SPATIAL Dimension
All physical objects have a location in space!
Define it with geographical coordinates, i.e. latitude and longitude
=> yagoGeoCoordinates
=> hasGeoCoordinates
Two sources:
• Wikipedia
• GeoNames
• locatedIn & hasGeoCoordinates & <location, TYPE, class>

Textual Dimension
hasWikipediaAnchorText
hasWikipediaCategory
hasCitationTitle
subClassOf hasContext
Integrating UWN to include 200 languages
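
The Temporal Dimension slide maps type-specific date relations onto two generic existence relations. A minimal sketch of that mapping follows, assuming facts are stored as plain (subject, relation, value) tuples; the helper names `existence_facts` and `existence_span` are hypothetical, the relation names are copied from the slide, and the sample facts are illustrative only.

```python
from typing import List, Optional, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, value)

# Relation names as they appear on the Temporal Dimension slide.
START_RELATIONS = {"wasBornOnDate", "wasCreatedOnDate", "startedOnDate"}
END_RELATIONS = {"diedOnDate", "wasDestroyedOnDate", "endedOnDate"}


def existence_facts(facts: List[Fact]) -> List[Fact]:
    """Derive the generic start/end existence facts from specific ones."""
    derived = []
    for subject, relation, value in facts:
        if relation in START_RELATIONS:
            derived.append((subject, "startExistingOnDate", value))
        elif relation in END_RELATIONS:
            derived.append((subject, "endExistingOnDate", value))
    return derived


def existence_span(
    entity: str, facts: List[Fact]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (start, end) of an entity's existence; None where unknown."""
    start: Optional[str] = None
    end: Optional[str] = None
    for subject, relation, value in existence_facts(facts):
        if subject != entity:
            continue
        if relation == "startExistingOnDate":
            start = value
        else:
            end = value
    return start, end


# Illustrative sample facts, not taken from the slides.
sample = [
    ("Albert Einstein", "wasBornOnDate", "1879-03-14"),
    ("Albert Einstein", "diedOnDate", "1955-04-18"),
    ("Woodstock", "startedOnDate", "1969-08-15"),
]

print(existence_span("Albert Einstein", sample))
print(existence_span("Woodstock", sample))
```

Reducing every entity type to the same pair of relations is what lets a query ask when any entity existed without knowing whether it is a person, a group, an artifact or an event.
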