Linked Data

DBpedia - A Crystallization Point for the Web of Data
2011.10.05
Junghee Han
Outline
The DBpedia Project
Understanding Linked Data
The DBpedia Knowledge Extraction Framework
The DBpedia Knowledge Base
Accessing the DBpedia Knowledge Base
Applications facilitated by DBpedia
The DBpedia Project
DBpedia
DBpedia is a community effort to:
- Extract structured information from Wikipedia
- Make this information available on the Web under an open license
- Interlink the DBpedia dataset with other open datasets on the Web
The DBpedia Project
DBpedia knowledge base
Currently describes more than 2.6 million entities, including:
- 198,000 persons
- 328,000 places
- 101,000 musical works
- 34,000 films
- 20,000 companies
The knowledge base contains 3.1 million links to external web pages and 4.9 million RDF links into other Web data sources.
Linked Data
Source:
Linked Data
[Figure: Web browsers and search engines accessing the Web over HTTP]
Source:
Linked Data
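RDF stands for Resource Description Framework:
- Resource: anything that has a URI (web pages, images, videos, etc.)
- Description: the attributes, characteristics, and relationships of resources
- Framework: the model, language, and syntax used to describe them
RDF has a graph data model.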
Source: [KSWC2010] Linked Data That Increases the Value of Data
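The graph model can be sketched in a few lines of Python: each RDF statement is a (subject, predicate, object) triple, and grouping triples by subject yields the directed, labeled edges of the graph. This is a minimal illustration assuming plain string terms; the `ex:` names are hypothetical placeholders, and a real application would normally use an RDF library instead.

```python
from collections import defaultdict

# Each RDF statement is a (subject, predicate, object) triple.
# The ex: names are hypothetical placeholders, not a real vocabulary.
triples = [
    ("ex:HongGilDong", "ex:residence", "ex:Seoul"),
    ("ex:HongGilDong", "ex:name", '"Hong, Gil Dong"'),
    ("ex:Seoul", "ex:locatedIn", "ex:Korea"),
]


def as_graph(statements):
    """Group triples into an adjacency list: subject -> [(predicate, object), ...].

    Every triple becomes one directed edge from subject to object,
    labeled with the predicate.
    """
    graph = defaultdict(list)
    for subject, predicate, obj in statements:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = as_graph(triples)
for subject, edges in graph.items():
    for predicate, obj in edges:
        print(f"{subject} --{predicate}--> {obj}")
```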
Linked Data
Graph model example
Triple representation
RDF syntax
SPARQL (SPARQL Protocol and RDF Query Language)
- An RDF query language developed by the W3C
Source: [KSWC2010] Linked Data That Increases the Value of Data
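At its core, the WHERE clause of a SPARQL query is a set of triple patterns containing variables, and answering it means finding variable bindings that satisfy every pattern at once. The Python sketch below shows that idea with a naive backtracking matcher. It covers only basic graph patterns (no FILTER, OPTIONAL, or other SPARQL features), uses hypothetical `ex:` data, and is not how production SPARQL engines work.

```python
# Toy data with hypothetical ex: names.
triples = [
    ("ex:Hong", "ex:residence", "ex:Seoul"),
    ("ex:Kim", "ex:residence", "ex:Osaka"),
    ("ex:Seoul", "ex:country", "ex:Korea"),
    ("ex:Osaka", "ex:country", "ex:Japan"),
]


def match(pattern, triple, bindings):
    """Unify one triple pattern with one triple.

    Terms starting with '?' are variables. Returns the extended bindings,
    or None if the triple is incompatible with the pattern.
    """
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.get(term, value) != value:
                return None  # variable already bound to a different value
            result[term] = value
        elif term != value:
            return None  # constant term does not match
    return result


def evaluate(patterns, triples, bindings=None):
    """Return every set of bindings that satisfies all patterns (a join)."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        return [bindings]
    first, rest = patterns[0], patterns[1:]
    solutions = []
    for triple in triples:
        extended = match(first, triple, bindings)
        if extended is not None:
            solutions.extend(evaluate(rest, triples, extended))
    return solutions


# Roughly: SELECT ?person ?city WHERE { ?person ex:residence ?city . ?city ex:country ex:Korea }
rows = evaluate([("?person", "ex:residence", "?city"), ("?city", "ex:country", "ex:Korea")], triples)
print(rows)
```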
Linked Data
1. Use URIs (Uniform Resource Identifiers) as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful RDF information
4. Include RDF statements that link to other URIs so that they can discover related things
Tim Berners-Lee, 2007, http://www.w3.org/DesignIssues/LinkedData.html
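Principles 2 and 3 amount to an ordinary HTTP GET in which the client asks for RDF rather than HTML. The Python sketch below only builds such a request with the standard `urllib.request` module; requesting `text/turtle` is an assumption made for illustration, and which serializations a given server actually returns (and whether it redirects first) is up to that server. The network call is left commented out.

```python
import urllib.request


def build_rdf_request(uri, accept="text/turtle"):
    """Build, without sending, an HTTP GET request for an RDF description of `uri`.

    The Accept header performs content negotiation: it tells the server that
    the client prefers an RDF serialization over an HTML page.
    """
    return urllib.request.Request(uri, headers={"Accept": accept}, method="GET")


req = build_rdf_request("http://dbpedia.org/resource/Berlin")
print(req.get_method(), req.full_url, req.get_header("Accept"))

# Dereferencing the URI requires network access:
# with urllib.request.urlopen(req) as response:
#     print(response.status, response.headers.get("Content-Type"))
```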
Linked Data
1. Use URIs as names for things
Example: http://bibleontology.com/page/Bilhah
Source: [KSWC2010] Linked Data That Increases the Value of Data
Linked Data
2. Use HTTP URIs so that people can look up those names
Example: http://bibleontology.com/page/Bilhah
Source: [KSWC2010] Linked Data That Increases the Value of Data
Linked Data
3. When someone looks up a URI, provide useful RDF information
Example: http://bibleontology.com/page/Bilhah
Source: [KSWC2010] Linked Data That Increases the Value of Data
Linked Data
4. Include RDF statements that link to other URIs so that they can discover related things
Example: http://bibleontology.com/page/Bilhah
Source: [KSWC2010] Linked Data That Increases the Value of Data
Linked Data
[Figure: example RDF graph. The resource HongGilDong has the properties [residences] Seoul, [researches] SemanticWeb, [name] "Hong, Gil Dong", and [age] 35. [sameAs] links connect the local resources to http://dbpedia.org/resource/Seoul and http://dbpedia.org/resource/Semantic_Web, which in turn link to http://sws.geonames.org/1835848/ (with [nearbyFeatures] http://sws.geonames.org/1835848/nearby.rdf) and, via [hasPhotoCollection], to http://www4.wiwiss.fu-berlin.de/flickrwrappr/photos/Semantic_Web.]
Source: [KSWC2010] Linked Data That Increases the Value of Data
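Principle 4 is what lets a client "follow its nose" from one dataset into the next. The Python sketch below does a breadth-first traversal over the links in the figure. The edge layout is an assumed reconstruction of the figure, the `ex:` names stand in for its local resources, `owl:sameAs` is assumed for the [sameAs] links, and the literal-valued properties ([name], [age]) are left out. A real Linked Data client would dereference each newly discovered HTTP URI to fetch further triples instead of reading them from a local list.

```python
from collections import deque

# Links from the figure (assumed reconstruction; ex: names are placeholders).
links = [
    ("ex:HongGilDong", "ex:residences", "ex:Seoul"),
    ("ex:HongGilDong", "ex:researches", "ex:SemanticWeb"),
    ("ex:Seoul", "owl:sameAs", "http://dbpedia.org/resource/Seoul"),
    ("ex:SemanticWeb", "owl:sameAs", "http://dbpedia.org/resource/Semantic_Web"),
    ("http://dbpedia.org/resource/Seoul", "owl:sameAs", "http://sws.geonames.org/1835848/"),
    ("http://sws.geonames.org/1835848/", "ex:nearbyFeatures",
     "http://sws.geonames.org/1835848/nearby.rdf"),
    ("http://dbpedia.org/resource/Semantic_Web", "ex:hasPhotoCollection",
     "http://www4.wiwiss.fu-berlin.de/flickrwrappr/photos/Semantic_Web"),
]


def follow_links(triples, start, max_hops=3):
    """Breadth-first traversal along outgoing links.

    Returns {node: number_of_hops} for every node reachable from `start`
    within `max_hops` links.
    """
    outgoing = {}
    for subject, _, obj in triples:
        outgoing.setdefault(subject, []).append(obj)
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in outgoing.get(node, []):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


reached = follow_links(links, "ex:HongGilDong")
for node, distance in sorted(reached.items(), key=lambda item: item[1]):
    print(distance, node)
```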
Linked Data
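Identify and link things with URIs, represent them with RDF, query them with SPARQL, and distribute them over HTTP.
SQL vs. SPARQL
Source: [KSWC2010] Linked Data That Increases the Value of Data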
Linked Data
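[Figure: from disconnected to connected national public information. Disconnected: real estate, cultural heritage, and job information held in isolation. Connected: national public information (real estate, cultural heritage, jobs, transport, spatial, bibliographic, environmental, travel, land, product, and other information) linked with private sources (portals and media), overseas sources (DBpedia, BBC, etc.), universities, and others.]
Source: [KSWC2010] Linked Data That Increases the Value of Data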
Wikipedia Content
- Domain-specific data
- Title
- Images
- Description
- Infoboxes
- Languages
- Web links
- Categorization
The DBpedia Knowledge Extraction Framework (1/2)
Currently 19 extractors:
- Labels (title, rdfs:label)
- Abstracts (first paragraph, rdfs:comment)
- Interlanguage links
- Images
- Redirects
- Disambiguation (dbpedia:disambiguates)
- External links (dbpedia:reference)
- Page links (dbpedia:wikilink)
- Homepages (foaf:homepage)
- Geo-coordinates
- Person data
- PND
- SKOS categories
- Page ID
- Revision ID
- Category label
- Article categories
- Mappings
- Infobox
Until March 2010, the DBpedia project used a PHP-based extraction framework to extract different kinds of structured information from Wikipedia. This framework has been superseded by the new Scala-based extraction framework, and the old PHP framework is no longer maintained.
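The extractor list suggests a simple design: each extractor maps one article to a list of triples, and the framework runs every extractor over every page. The Python sketch below mirrors that idea for the Labels and Abstracts extractors. `Page`, `run_extractors`, and the other names are hypothetical; this is not the DBpedia framework's actual (Scala) API, and literal values are not escaped.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


@dataclass
class Page:
    """Hypothetical, heavily simplified stand-in for a parsed Wikipedia article."""
    title: str
    first_paragraph: str


# An extractor turns one page into zero or more triples.
Extractor = Callable[[Page], List[Triple]]


def resource_uri(title: str) -> str:
    # Resource URI built from the title, with spaces replaced by underscores.
    return "http://dbpedia.org/resource/" + title.replace(" ", "_")


def labels_extractor(page: Page) -> List[Triple]:
    # Labels: the article title becomes an rdfs:label literal (unescaped).
    return [(resource_uri(page.title), RDFS + "label", f'"{page.title}"@en')]


def abstracts_extractor(page: Page) -> List[Triple]:
    # Abstracts: the first paragraph becomes an rdfs:comment literal (unescaped).
    return [(resource_uri(page.title), RDFS + "comment", f'"{page.first_paragraph}"@en')]


def run_extractors(pages: Iterable[Page], extractors: List[Extractor]) -> Iterator[Triple]:
    """Apply every extractor to every page and stream out the resulting triples."""
    for page in pages:
        for extract in extractors:
            yield from extract(page)


pages = [Page("Berlin", "Berlin is the capital of Germany.")]
triples = list(run_extractors(pages, [labels_extractor, abstracts_extractor]))
for triple in triples:
    print(triple)
```

Adding another extractor in this design only means writing one more function with the same signature and appending it to the list.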
The DBpedia Knowledge Extraction Framework (2/2)
Two workflows
Dump-based extraction
- The Wikimedia Foundation publishes SQL dumps of all Wikipedia editions on a monthly basis
- The dump-based workflow uses the DatabaseWikipedia page collection as the source of article texts and the N-Triples serializer as the output destination
Live extraction
- Based on the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
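The dump-based workflow writes its output as N-Triples, a line-based RDF format with one `subject predicate object .` statement per line. A minimal Python sketch of such a serializer follows. The `IRI` and `Literal` classes are hypothetical helpers, and the sketch handles only IRIs and plain (optionally language-tagged) literals, without blank nodes, datatypes, or IRI escaping.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str
    lang: Optional[str] = None


Term = Union[IRI, Literal]


def _escape(text: str) -> str:
    # Escape backslashes, quotes, and line breaks inside a quoted literal.
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def term_to_nt(term: Term) -> str:
    """Render one term: <iri>, "literal", or "literal"@lang."""
    if isinstance(term, IRI):
        return f"<{term.value}>"
    quoted = f'"{_escape(term.value)}"'
    return f"{quoted}@{term.lang}" if term.lang else quoted


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) triples, one statement per line."""
    lines = [f"{term_to_nt(s)} {term_to_nt(p)} {term_to_nt(o)} ." for s, p, o in triples]
    return "\n".join(lines) + "\n"


triples = [
    (IRI("http://dbpedia.org/resource/BBC"),
     IRI("http://www.w3.org/2000/01/rdf-schema#label"),
     Literal("British Broadcasting Corporation", "en")),
]
print(to_ntriples(triples), end="")
```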
Infobox Extraction
dbpedia:BBC p:network_name "British Broadcasting Corporation (BBC)"
dbpedia:BBC p:country dbpedia:United_Kingdom
dbpedia:BBC p:key_people dbpedia:Michael_Lyons, dbpedia:Mark_Thompson
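To illustrate how triples like these can come out of an infobox, here is a deliberately naive Python sketch that turns `| key = value` lines into `p:key` triples, mapping wiki links to `dbpedia:` resources and everything else to plain literals. The infobox snippet and its template name are a hypothetical, simplified example. The real extractor handles much more (nested templates, units, dates, lists), and any text surrounding wiki links is simply dropped here.

```python
import re

# Matches [[Target]] or [[Target|label]]; group 1 captures the link target.
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")

# Hypothetical, simplified infobox wikitext.
INFOBOX = """{{Infobox broadcasting network
| network_name = British Broadcasting Corporation (BBC)
| country      = [[United Kingdom]]
| key_people   = [[Michael Lyons]], [[Mark Thompson]]
}}"""


def parse_infobox(subject, wikitext):
    """Turn `| key = value` lines into (subject, p:key, object) triples.

    Wiki links become dbpedia: resources; other values become quoted literals.
    """
    triples = []
    for line in wikitext.splitlines():
        line = line.strip()
        if not line.startswith("|") or "=" not in line:
            continue  # skip the template header, closing braces, etc.
        key, value = line[1:].split("=", 1)
        key, value = key.strip(), value.strip()
        if not value:
            continue
        targets = WIKILINK.findall(value)
        if targets:
            for target in targets:
                triples.append((subject, f"p:{key}", "dbpedia:" + target.strip().replace(" ", "_")))
        else:
            triples.append((subject, f"p:{key}", f'"{value}"'))
    return triples


triples = parse_infobox("dbpedia:BBC", INFOBOX)
for triple in triples:
    print(*triple)
```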
The DBpedia Knowledge Base
Identifying Entities
Resources are assigned a URI according to the pattern http://dbpedia.org/resource/Name, where Name is taken from the URL of the source Wikipedia article, which has the form http://en.wikipedia.org/wiki/Name.
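The naming rule is mechanical enough to express as a function. This Python sketch assumes English Wikipedia URLs only and copies the article name verbatim, including any percent-encoding; how DBpedia normalizes special characters or handles other language editions is not covered here.

```python
from urllib.parse import urlsplit

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"
WIKI_PATH_PREFIX = "/wiki/"


def dbpedia_uri(wikipedia_url):
    """Map http://en.wikipedia.org/wiki/Name to http://dbpedia.org/resource/Name."""
    parts = urlsplit(wikipedia_url)
    if parts.netloc != "en.wikipedia.org" or not parts.path.startswith(WIKI_PATH_PREFIX):
        raise ValueError(f"not an English Wikipedia article URL: {wikipedia_url}")
    name = parts.path[len(WIKI_PATH_PREFIX):]
    if not name:
        raise ValueError(f"missing article name: {wikipedia_url}")
    return DBPEDIA_RESOURCE + name


print(dbpedia_uri("http://en.wikipedia.org/wiki/Berlin"))
```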
Classifying Entities
DBpedia entities are classified within four classification schemata in order to fulfill different application requirements:
- Wikipedia Categories
- YAGO
- UMBEL (Upper Mapping and Binding Exchange Layer)
- DBpedia Ontology
Describing Entities
Every DBpedia entity is described by a set of general properties, such as a label, an abstract, and a link to the corresponding Wikipedia article
Accessing the DBpedia Knowledge Base over the Web
- Linked Data: DBpedia resource identifiers (e.g., http://dbpedia.org/resource/Berlin)
- SPARQL endpoint: http://dbpedia.org/sparql
- RDF dumps: http://wiki.dbpedia.org/Downloads32
- Lookup index: http://lookup.dbpedia.org/api/search.asmx
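Under the SPARQL protocol, a query can be sent to an endpoint as an HTTP GET with the query text in the `query` parameter. The Python sketch below builds such a request for the endpoint listed above and asks for the standard SPARQL JSON results format; the example query is illustrative, and which result formats an endpoint supports is up to that endpoint. The network call is left commented out.

```python
import urllib.request
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Illustrative query: the English label of the resource for Berlin.
QUERY = """SELECT ?label WHERE {
  <http://dbpedia.org/resource/Berlin> <http://www.w3.org/2000/01/rdf-schema#label> ?label .
  FILTER (lang(?label) = "en")
}"""


def sparql_request(endpoint, query):
    """Build a SPARQL protocol GET request with the query in the `query` parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for results in the SPARQL JSON results format.
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


req = sparql_request(ENDPOINT, QUERY)
print(req.full_url)

# Sending the request requires network access:
# import json
# with urllib.request.urlopen(req) as response:
#     results = json.load(response)
#     for row in results["results"]["bindings"]:
#         print(row["label"]["value"])
```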
Interlinked Web Content
Currently contains 4.9 million outgoing RDF links
Applications facilitated by DBpedia (1/3)
Browsing and Exploration
DBpedia Mobile
Applications facilitated by DBpedia (2/3)
Querying and Search
DBpedia Query Builder
http://querybuilder.dbpedia.org
Applications facilitated by DBpedia (3/3)
Querying and Search
Relationship Finder
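A relationship between two entities can be viewed as a path of RDF links connecting them. The Python sketch below finds the shortest such path with a plain breadth-first search that follows links in both directions. It is an illustration under that assumption, not the Relationship Finder's actual algorithm, and the `ex:` data is hypothetical.

```python
from collections import deque

# Hypothetical example data.
triples = [
    ("ex:Alice", "ex:birthPlace", "ex:Seoul"),
    ("ex:Bob", "ex:birthPlace", "ex:Busan"),
    ("ex:Seoul", "ex:country", "ex:Korea"),
    ("ex:Busan", "ex:country", "ex:Korea"),
]


def find_relationship(triples, source, target):
    """Return the shortest chain of triples connecting source and target, or None.

    Links are followed in both directions, because A -> C <- B still
    relates A and B.
    """
    neighbors = {}
    for s, p, o in triples:
        neighbors.setdefault(s, []).append((o, (s, p, o)))
        neighbors.setdefault(o, []).append((s, (s, p, o)))
    previous = {source: None}  # node -> (parent node, triple used to reach it)
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while previous[node] is not None:
                parent, triple = previous[node]
                path.append(triple)
                node = parent
            return list(reversed(path))
        for neighbor, triple in neighbors.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = (node, triple)
                queue.append(neighbor)
    return None


for s, p, o in find_relationship(triples, "ex:Alice", "ex:Bob"):
    print(s, p, o)
```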
Conclusions and Future Work
Conclusion
The resulting DBpedia knowledge base covers a wide range of different domains and connects entities across these domains.
Future Work
Cross-language infobox knowledge fusion
- Fuse infobox data across Wikipedia language editions to derive a highly detailed multi-domain knowledge base
Wikipedia article augmentation
- Develop a MediaWiki extension that augments Wikipedia articles with additional information and media items (pictures, audio) from interlinked Web data sources
Wikipedia consistency checking
- Improve the overall quality of Wikipedia