SEMEDIA

Linking data on the Semantic Web?
The idea, a search engine and a client
(yes, with some geospatial use cases)
Dr. Giovanni Tummarello
Digital Enterprise Research Institute, Galway
 Copyright 2007 Digital Enterprise Research
Institute. All rights reserved.
www.deri.org
Outline
• Semantic Web and Linked Data
• How to publish, and the Sindice search engine
• DBin 2.0: a Semantic Web Client
– Read/write capabilities for the Semantic Web
– Domain-specific user interfaces (Brainlets)
– Cooperative
– Supports geospatial placing of resources
• Conclusion
2
Semantic Web as a Quad space
• A semantic model (RDF) can be published on the web, at a specific
web location (URL):
Resolve(http://g1o.net/foaf.rdf) → rdf/xml
• The collection of all the RDF models published on the web is today
referred to by some as being the “Semantic Web”
• The Semantic Web can therefore be seen as an infinite quad store
where:
– Any graph is readable (in general, though HTTP access control is possible)
– It is possible to write, but only in controlled web spaces (e.g. one’s
homepage)
A big plus: it ties in with the URL/DNS name ownership mechanism.
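To make the quad idea concrete, here is a minimal Python sketch in which an in-memory dictionary stands in for the web. The class name `QuadSpace`, the subject URI ending in `#me` and the literal value are illustrative assumptions, not taken from any library or from the actual FOAF file.

```python
from collections import defaultdict


class QuadSpace:
    """Toy model of the web as a quad store: graph URL -> set of triples."""

    def __init__(self):
        self._graphs = defaultdict(set)

    def publish(self, graph_url, triple):
        # Writing only makes sense where the publisher controls the URL space.
        self._graphs[graph_url].add(triple)

    def resolve(self, graph_url):
        # Reading a graph corresponds to an HTTP GET on its URL.
        return set(self._graphs.get(graph_url, set()))

    def quads(self):
        # Every triple, tagged with the URL of the graph that states it.
        for url, triples in self._graphs.items():
            for s, p, o in triples:
                yield (s, p, o, url)


space = QuadSpace()
space.publish(
    "http://g1o.net/foaf.rdf",
    # Hypothetical triple, for illustration only.
    ("http://g1o.net/foaf.rdf#me", "http://xmlns.com/foaf/0.1/name", "Giovanni Tummarello"),
)
```

The fourth element of each quad is the publishing location, which is what links a statement back to whoever owns that URL.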
3
Plenty of data on the Semantic Web already
• Databases: DBpedia, DBLP, Uniprot, Geonames
• Personal FOAF, forum data (SIOC), enhanced RSS, etc.
• Size of the current Semantic Web: 50-100 million
documents
• Most of these use the “Linked Data” approach:
4
Linked data
• Create your URI as a resolvable URI
– (this used to be called a URL)
• When someone asks for such a URI (HTTP GET), return an
RDF description or an HTML page according to the
HTTP request headers (content negotiation on Accept)
– Example: http://sws.geonames.org/2950159/
• In practice, this means recreating URIs for concepts that
already have a URI elsewhere
– Argh!? http://dbpedia.org/resource/Berlin, it’s Berlin again.
• Pro: locally simple
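The content negotiation step above can be sketched with Python's standard library. This is a minimal sketch, assuming the server honours the `Accept` header and serves RDF as `application/rdf+xml`. The helper names `build_request` and `fetch` are made up for this example.

```python
import urllib.request


def build_request(uri, want_rdf=True):
    """Build an HTTP GET request whose Accept header selects the representation."""
    accept = "application/rdf+xml" if want_rdf else "text/html"
    return urllib.request.Request(uri, headers={"Accept": accept})


def fetch(uri, want_rdf=True, timeout=10):
    """Dereference the URI; returns (Content-Type reported by the server, body bytes)."""
    request = build_request(uri, want_rdf)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type"), response.read()


# Same URI, two possible representations depending on the header sent.
rdf_request = build_request("http://sws.geonames.org/2950159/")
html_request = build_request("http://sws.geonames.org/2950159/", want_rdf=False)
```

Calling `fetch(...)` needs network access, and what comes back depends entirely on how the remote server is configured.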
5
Tools for the Semantic Web: an overview
• Browsers: Tabulator, Disco, etc.
• Services: SWSE (for full queries), Sindice (find the
sources, query yourself)
• Clients: Protégé (read/local edit), DBin 2.0 (read/write)
6
Implementation in action: Sindice
• Can help a user or a client (e.g. Tabulator) to find
useful Semantic Web Sources to import.
• Quick to update, monitors changes, crawls (soon)
• First beta is out, indexing most of the currently
known Semantic Web
• Discovers and uses Semantic Sitemaps
7
Sindice scenario
[Diagram: data sources (DBpedia, DBLP, GeoNames) on one side; Semantic Web clients (Tabulator, Disco, Piggy Bank, SIOC Explorer, etc.) on the other]
8
Tools for data producers
• You want all your data to be found
• You want your data to be indexed correctly
• HTML: Sitemap to describe your website
• HTML: Validator and link checker
9
Semantic Sitemap extension
• Sitemap protocol exposes “deep web” to crawlers
• Semantic sitemap adds Semantic Web data
10
Large quantities of linked data: how to expose?
• The fact that the data is HTTP retrievable in small bits
makes it crawlable.
• But data producers are very scared of this:
– Millions of hits for each refresh
– Each hit potentially triggers many complex queries to generate the
RDF view of the entity
– DoS incidents on the Semantic Web have happened, and they are not fun.
• And clearly something better must be possible
– Most data producers do in fact already provide full dumps of the
base data
– Or SPARQL endpoints
11
The idea: Extending Sitemaps to expose data
• Sitemaps:
– Originally by Google, immediately adopted by all (Yahoo, MSN,
etc.)
– Expose the “deep web” by providing a list of pages “to be
crawled”
– Written in XML, linked directly from robots.txt
Example:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
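For a sense of what a crawler does with such a file, here is a minimal Python sketch that reads the example sitemap with the standard library's ElementTree. The function name `list_urls` and the prefix `sm` are choices made for this example. Only the namespace URI and element names come from the sitemap itself.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>"""


def list_urls(xml_text):
    """Return one dict per <url> entry, with its child elements as strings."""
    # Parse from bytes so the encoding declaration in the prolog is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        entries.append({
            field: url.findtext("sm:" + field, namespaces=SITEMAP_NS)
            for field in ("loc", "lastmod", "changefreq", "priority")
        })
    return entries


entries = list_urls(SITEMAP_XML)
```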
12
The Semantic Sitemap Extension
Example first:
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:sc="http://sw.deri.org/2007/07/sitemapextension/scschema.xsd">
<sc:dataset>
<sc:datasetLabel>Product Catalog for Example.org</sc:datasetLabel>
<sc:dataDumpLocation>http://example.org/cataloguedump.rdf
</sc:dataDumpLocation>
<sc:linkedDataPrefix>http://example.org/products/</sc:linkedDataPrefix>
<changefreq>monthly</changefreq>
</sc:dataset>
</urlset>
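A consumer could read the dataset description roughly as follows. This is a minimal Python sketch that assumes the element names and the `sc` namespace URI are exactly those of the example above. The function name `read_datasets` and the dictionary keys are invented for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "sc": "http://sw.deri.org/2007/07/sitemapextension/scschema.xsd",
}

SEMANTIC_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:sc="http://sw.deri.org/2007/07/sitemapextension/scschema.xsd">
  <sc:dataset>
    <sc:datasetLabel>Product Catalog for Example.org</sc:datasetLabel>
    <sc:dataDumpLocation>http://example.org/cataloguedump.rdf
    </sc:dataDumpLocation>
    <sc:linkedDataPrefix>http://example.org/products/</sc:linkedDataPrefix>
    <changefreq>monthly</changefreq>
  </sc:dataset>
</urlset>"""


def _text(element, tag):
    """Text of the first matching child, stripped of whitespace, or None."""
    value = element.findtext(tag, namespaces=NS)
    return value.strip() if value else None


def read_datasets(xml_text):
    """Return one dict per <sc:dataset> element in the sitemap."""
    root = ET.fromstring(xml_text)
    return [
        {
            "label": _text(ds, "sc:datasetLabel"),
            "dump": _text(ds, "sc:dataDumpLocation"),
            "prefix": _text(ds, "sc:linkedDataPrefix"),
            # changefreq is a plain sitemap element, so it lives in the default namespace.
            "changefreq": _text(ds, "sm:changefreq"),
        }
        for ds in root.findall("sc:dataset", NS)
    ]


datasets = read_datasets(SEMANTIC_SITEMAP)
```

Stripping whitespace matters here because the dump location in the example wraps onto a second line.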
13
Other features
• Location of the SPARQL endpoint of the dataset
<sc:sparqlEndPoint>http://example.org/queryengine/sparql</sc:sparqlEndPoint>
• A representative URI/URL
<sc:sampleURI>http://example.org/products/id1234</sc:sampleURI>
• Split data dumps
<sc:dataFragmentDump>http://example.org/data
19
How it is meant to be used
As a crawler:
• If you are given a URL for an RDF site, check for the
sitemap
• If a dump is available, download that instead
As a client:
• If you have a dump and want an update, check the
sitemap to locate it in case it has moved
• Or use the sitemap to locate a SPARQL endpoint
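The decision described above can be sketched as a small Python function. This is only one possible policy, assuming each dataset is a dictionary with hypothetical keys `prefix`, `dump` and `sparql` filled in from the sitemap. The function name `choose_source` is likewise an assumption.

```python
def choose_source(uri, datasets):
    """Decide how to obtain data for `uri`, given datasets read from a sitemap.

    Returns a (method, location) pair where method is "dump", "sparql" or "crawl".
    """
    for dataset in datasets:
        prefix = dataset.get("prefix")
        if prefix and uri.startswith(prefix):
            # A full dump avoids millions of single-entity requests.
            if dataset.get("dump"):
                return ("dump", dataset["dump"])
            # Otherwise fall back to querying the endpoint, if one is advertised.
            if dataset.get("sparql"):
                return ("sparql", dataset["sparql"])
    # No sitemap information applies: dereference the URI itself.
    return ("crawl", uri)


catalog = [{
    "prefix": "http://example.org/products/",
    "dump": "http://example.org/cataloguedump.rdf",
    "sparql": "http://example.org/queryengine/sparql",
}]
choice = choose_source("http://example.org/products/id1234", catalog)
```

Preferring the dump over the endpoint is a design choice made here to keep load off the producer. A client that only needs a few entities might reasonably reverse that order.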
20
Who uses it?
Data producers
• Geonames
• DBpedia
• Uniprot
• DBLP
• … (it takes about 10 minutes to write one)
Data consumers
• Sindice
• Next: SWSE, DBin 2.0
21
How does a client write to the Semantic Web?
• Reading is simple.
• Writing requires a server-side module with write access
to a URL space.
• Our own brew: a very simple “Publishing Service”
– PHP: can be placed almost anywhere (e.g. your homepage)
– Almost no installation effort
– Password-protected upload
• Once published, it makes sure the Semantic Web knows
(e.g. by also pinging PingTheSemanticWeb, Sindice,
etc.)
22
DBin 2.0 overview
• Is a desktop client
• Provides rich user interfaces which can be customized
for specific domains (Brainlet model)
• As a basis, it reads from and writes to the Semantic
Web
• Implements “Semantic Web Pipes”: workflows which
combine Semantic Web sources in specific ways
23
Brainlets in a slide
• Created by Power Users (domain experts) using an XML
language
• Defines Ontologies and other data aspects
• Defines the User Interface as a mashup of specific data
visualizers
• Rendered by the DBin platform → used by the end user
25
Example: conference information manager
Similarly to the Web (HTML), the user creates “environments” using a text editor.
Example:
<Brainlet name="Conference Information Manager" author="Onofrio Panzarino" version="1.0">
  <Ontology file="brainlet/Conference.owl"/>
  <GUED name="Conference">
    <Topic name="Conferences" uri="http://www.purl.org/net/ontology/Conference#Conference">
      <Child
          query="SELECT X FROM {X} &lt;rdfs:subClassOf&gt; {$parent} WHERE X != $parent"
          recursive="true">
        <Child subjectBy="rdf:type" icon="/icons/Conference.gif"/>
      </Child>
      <Child subjectBy="rdf:type"
          icon="/icons/Conference.gif"/>
    </Topic>
    <Topic name="Topics" uri="http://www.purl.org/net/ontology/Conference#Topic">
      <Child
          query="SELECT X FROM {X} &lt;rdfs:subClassOf&gt; {$parent} WHERE X != $parent"
          recursive="true">
        <Child subjectBy="rdf:type"/>
      </Child>
      <Child subjectBy="rdf:type"/>
    </Topic>
  </GUED>
  <View id="Focus" />
  <View id="GUEDNavigator" title="ConferenceNavigator" icon="nav.gif" selecterFor="main" />
  <View id="Comments" title="Comments" listenTo="main" selecterFor="comments" />
  <View id="Comment" title="Details" listenTo="comments" />
  <View id="Gallery" listenTo="main" />
</Brainlet>
26
SW researchers expect something like this….
27
Domain-specific data visualizations can be mashed up
28
Reading, writing and visualizing data in DBin 2.0
34
Semantic Web Communities: the idea
35
One more tool: Semantic Web Pipes
• Live, reusable, composable transformation of Semantic
Web Sources
• http://pipes.deri.org
36
Conclusions
• The Semantic Web is now a reality
– Linked data, Semantic sitemaps
– Sindice to find all the connections
– How can a user interact with it?
• DBin 2.0 is a prototypical client to read/write this
information:
– Very prototypical
– Can be a good starting point, though
• Final thought: the Geo Semantic Web should not be seen as a
competitor to other formats, but as an extra opportunity.
37
Thanks for your attention
SEMEDIA
Semantic Web and Multimedia
38