DBpedia - A Crystallization Point for the Web of Data
2011.10.05
Junghee Han

Outline
- The DBpedia Project
- Understanding Linked Data
- The DBpedia Knowledge Extraction Framework
- The DBpedia Knowledge Base
- Accessing the DBpedia Knowledge Base
- Applications facilitated by DBpedia

The DBpedia Project
DBpedia is a community effort to:
- Extract structured information from Wikipedia
- Make this information available on the Web under an open license
- Interlink the DBpedia dataset with other open datasets on the Web

The DBpedia Project
DBpedia knowledge base
- Currently describes more than 2.6 million entities, including:
  - 198,000 persons
  - 328,000 places
  - 101,000 musical works
  - 34,000 films
  - 20,000 companies
- Contains 3.1 million links to external web pages and 4.9 million RDF links into other Web data sources

Linked Data
[Figure: Web Browsers, Search Engines, HTTP]

Linked Data
RDF stands for Resource Description Framework
- Resource: anything that has a URI (web pages, images, videos, etc.)
- Description: describes the attributes, characteristics, and relationships of resources
- Framework: a model, language, and syntax for describing the above
RDF has a graph model.
Reference: [KSWC2010] "Linked Data That Raises the Value of Data" (데이터의 가치를 높이는 Linked Data); also the source for the Linked Data slides that follow

Linked Data
- Graph model example
- Triple representation
- RDF syntax
- SPARQL (SPARQL Protocol and RDF Query Language): an RDF query language created by the W3C

Linked Data
1. Use URIs (Uniform Resource Identifiers) as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful RDF information
4. Include RDF statements that link to other URIs so that they can discover related things
Tim Berners-Lee 2007
http://www.w3.org/DesignIssues/LinkedData.html

Linked Data
Example for the four principles: http://bibleontology.com/page/Bilhah
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful RDF information
4. Include RDF statements that link to other URIs so that they can discover related things

Linked Data
[Figure: example RDF graph]
- Nodes: HongGilDong, Seoul, SemanticWeb, "Hong, Gil Dong", 35
- Edge labels: [name], [age], [residences], [researches], [sameAs], [hasPhotoCollection], [nearbyFeatures]
- Linked URIs:
  - http://dbpedia.org/resource/Seoul
  - http://dbpedia.org/resource/Semantic_Web
  - http://sws.geonames.org/1835848/
  - http://sws.geonames.org/1835848/nearby.rdf
  - http://www4.wiwiss.fu-berlin.de/flickrwrappr/photos/Semantic_Web

Linked Data
- Identify with [...], link, represent with [...], query with [...], and distribute with [...]
- SQL, SPARQL

Linked Data
[Figure: disconnected vs. connected national public information]
- Disconnected national public information: isolated datasets
- Connected national public information: datasets linked with each other and with outside sources
- Dataset labels in the figure: real estate, cultural heritage, jobs, traffic, spatial, bibliographic, environmental, travel, land, product, and "XXX" information
- Outside sources in the figure: private-sector information, portals and media, overseas information, universities, others, DBpedia, BBC, etc.

Wikipedia Content
- Title
- Description
- Images
- Infoboxes
- Languages
- Web links
- Categorization
- Domain-specific data

The DBpedia Knowledge Extraction Framework (1/2)
Currently 19 extractors:
- Labels (title, rdfs:label)
- Abstracts (first paragraph, rdfs:comment)
- Interlanguage links
- Images
- Redirects
- Disambiguation (dbpedia:disambiguates)
- External links (dbpedia:reference)
- Page links (dbpedia:wikilink)
- Homepages (foaf:homepage)
- Geo-coordinates
- Person data
- PND
- SKOS categories
- Page ID
- Revision ID
- Category label
- Article categories
- Mappings
- Infobox
Until March 2010, the DBpedia project used a PHP-based extraction framework to extract different kinds of structured information from Wikipedia. That framework has been superseded by the new Scala-based extraction framework, and the old PHP framework is no longer maintained.

The DBpedia Knowledge Extraction Framework (2/2)
Two workflows
- Dump-based extraction
  - The Wikimedia Foundation publishes SQL dumps of all Wikipedia editions on a monthly basis
  - The dump-based workflow uses the DatabaseWikipedia page collection as the source of article texts and the N-Triples serializer as the output destination
- Live extraction
  - Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)

Infobox Extraction
  dbpedia:BBC  p:network_name  "British Broadcasting Corporation (BBC)"
  dbpedia:BBC  p:country       dbpedia:United_Kingdom
  dbpedia:BBC  p:key_people    dbpedia:Michael_Lyons, dbpedia:Mark_Thompson

The DBpedia Knowledge Base
- Identifying entities
  - Resources are assigned a URI according to the pattern http://dbpedia.org/resource/Name, where Name is taken from the URL of the source Wikipedia article, which has the form http://en.wikipedia.org/wiki/Name
- Classifying entities
  - DBpedia entities are classified within four classification schemata in order to fulfill different application requirements:
    - Wikipedia Categories
    - YAGO
    - UMBEL (Upper Mapping and Binding Exchange Layer)
    - DBpedia Ontology
- Describing entities
  - Every DBpedia entity is described by a set of general properties

Accessing the DBpedia Knowledge Base over the Web
- Linked Data: DBpedia resource identifiers (e.g. http://dbpedia.org/resource/Berlin)
- SPARQL endpoint: http://dbpedia.org/sparql
- RDF dumps: http://wiki.dbpedia.org/Downloads32
- Lookup index: http://lookup.dbpedia.org/api/search.asmx

Interlinked Web Content
- Currently contains 4.9 million outgoing RDF links

Applications facilitated by DBpedia (1/3)
Browsing and exploration
- DBpedia Mobile

Applications facilitated by DBpedia (2/3)
Querying and search
- DBpedia Query Builder: http://querybuilder.dbpedia.org

Applications facilitated by DBpedia (3/3)
Querying and search
- Relationship Finder

Conclusions and Future Work
Conclusion
- The resulting DBpedia knowledge base covers a wide range of different domains and connects entities across these domains
Future work
- Cross-language infobox knowledge fusion
  - Derive an astonishingly detailed multi-domain knowledge base
- Wikipedia article augmentation
  - Develop a MediaWiki extension that augments Wikipedia articles with additional information as well as media items (pictures, audio) from these sources
- Wikipedia consistency checking
  - Improve the overall quality of Wikipedia
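
Supplementary sketch
The URI pattern from the "Identifying Entities" slide and the SPARQL endpoint from the access slide can be tied together in a few lines of Python. This is a minimal sketch, not DBpedia's own tooling: the function names are hypothetical, it only handles article URLs of the form http://en.wikipedia.org/wiki/Name, and it assumes the endpoint accepts the standard SPARQL Protocol `query` parameter on a GET request. It builds the request URL but does not send it.

```python
from urllib.parse import urlencode, urlsplit

# Values taken from the slides.
DBPEDIA_RESOURCE_PREFIX = "http://dbpedia.org/resource/"
DBPEDIA_SPARQL_ENDPOINT = "http://dbpedia.org/sparql"
WIKIPEDIA_ARTICLE_PATH = "/wiki/"


def dbpedia_uri_from_wikipedia_url(wikipedia_url: str) -> str:
    """Map http://en.wikipedia.org/wiki/Name to http://dbpedia.org/resource/Name.

    Hypothetical helper: it copies Name verbatim and does not resolve
    redirects or handle other URL layouts.
    """
    path = urlsplit(wikipedia_url).path
    if not path.startswith(WIKIPEDIA_ARTICLE_PATH):
        raise ValueError(f"not a Wikipedia article URL: {wikipedia_url}")
    name = path[len(WIKIPEDIA_ARTICLE_PATH):]
    if not name:
        raise ValueError(f"no article name in URL: {wikipedia_url}")
    return DBPEDIA_RESOURCE_PREFIX + name


def properties_query(resource_uri: str, limit: int = 10) -> str:
    """Build a SPARQL query listing property/value pairs of one resource."""
    return (
        f"SELECT ?property ?value WHERE {{ <{resource_uri}> ?property ?value }} "
        f"LIMIT {limit}"
    )


def sparql_request_url(query: str) -> str:
    """Encode a query as a GET request URL for the SPARQL endpoint."""
    return DBPEDIA_SPARQL_ENDPOINT + "?" + urlencode({"query": query})


if __name__ == "__main__":
    uri = dbpedia_uri_from_wikipedia_url("http://en.wikipedia.org/wiki/Berlin")
    print(uri)
    print(sparql_request_url(properties_query(uri, limit=5)))
```

The resulting URL could be fetched with any HTTP client; which result formats the endpoint returns is not covered by the slides and is left out here.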