SEMEDIA

Linking data on the Semantic Web? The idea, a search engine and a client (yes, with some geospatial use cases)

Dr. Giovanni Tummarello
Digital Enterprise Research Institute, Galway
Copyright 2007 Digital Enterprise Research Institute. All rights reserved. www.deri.org

Outline
• Semantic Web and Linked Data
• How to publish, and the Sindice search engine
• DBin 2.0: a Semantic Web client
  – Read/write capabilities to the Semantic Web
  – Domain-specific user interfaces (Brainlets)
  – Cooperative
  – Supports geospatial placing of resources
• Conclusion

Semantic Web as a quad space
• A semantic model (RDF) can be published on the web at a specific web location (URL):
  Resolve(http://g1o.net/foaf.rdf) → RDF/XML
• The collection of all the RDF models published on the web is today referred to by some as the “Semantic Web”
• The Semantic Web can therefore be seen as an infinite quad store (each RDF triple together with the URL of the graph it comes from) where:
  – Any graph is readable (in general, though HTTP access control is possible)
  – It is possible to write, but only in controlled web spaces (e.g. one’s homepage)
• A big plus: it ties into the URL/DNS name ownership mechanism

Plenty of data on the Semantic Web already
• Databases: DBpedia, DBLP, Uniprot, Geonames, …
• Personal FOAF, forum data (SIOC), enhanced RSS, etc.
• Size of the current Semantic Web: 50-100 million documents
• Most of these use the “Linked Data” approach

Linked data
• Create your URI as a resolvable URI (what used to be called a URL?)
• When someone asks for such a URI (HTTP GET), return an RDF description or an HTML page according to the request headers (content negotiation on the Accept header)
  – Example: http://sws.geonames.org/2950159/
• In practice, this re-creates URIs for concepts that already have a URI elsewhere
  – Argh!? http://dbpedia.org/resource/Berlin: it’s Berlin again.
• Pro: locally simple

Tools for the Semantic Web: an overview
• Browsers: Tabulator, Disco, etc.
• Services: SWSE (for full queries), Sindice (find the sources, then query them yourself)
• Clients: Protégé (read/local edit), DBin 2.0 (read/write)

Implementation in action: Sindice
• Can help a user or a client (e.g. Tabulator) find useful Semantic Web sources to import
• Quick to update, monitors changes, crawls (soon)
• First beta is out, indexing most of the currently known Semantic Web
• Discovers and uses Semantic Sitemaps

Sindice scenario
[Diagram labels: DBpedia, DBLP, GeoNames; Tabulator, Disco, Piggy Bank, SIOC Explorer, etc.]

Tools for data producers
• You want all your data to be found
• You want your data to be indexed correctly
• HTML: Sitemap to describe your website
• HTML: Validator and link checker

Semantic Sitemap extension
• The Sitemap protocol exposes the “deep web” to crawlers
• The Semantic Sitemap adds Semantic Web data

Large quantities of linked data: how to expose?
• The fact that the data is HTTP-retrievable in small bits makes it crawlable.
• But data producers are very scared of this:
  – Millions of hits for each refresh
  – Each hit potentially triggers many complex queries to generate the RDF view of the entity
  – Denial-of-service (DoS) incidents on Semantic Web sites have happened, and they are not fun.
• And clearly something better must be possible:
  – Most data producers in fact already provide full dumps of the base data
  – Or SPARQL endpoints

The idea: extending Sitemaps to expose data
• Sitemaps:
  – Originally by Google, immediately adopted by all (Yahoo, MSN, etc.)
  – Expose the “deep web” by providing a list of pages “to be crawled”
  – Written in XML, linked directly from robots.txt
Example:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.example.com/</loc>
        <lastmod>2005-01-01</lastmod>
        <changefreq>monthly</changefreq>
        <priority>0.8</priority>
      </url>
    </urlset>

The Semantic Sitemap extension
Example first:

    <urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
            xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
            xmlns:sc="http://sw.deri.org/2007/07/sitemapextension/scschema.xsd">
      <sc:dataset>
        <sc:datasetLabel>Product Catalog for Example.org</sc:datasetLabel>
        <sc:dataDumpLocation>http://example.org/cataloguedump.rdf</sc:dataDumpLocation>
        <sc:linkedDataPrefix>http://example.org/products/</sc:linkedDataPrefix>
        <changefreq>monthly</changefreq>
      </sc:dataset>
    </urlset>

Other features
• Location of the SPARQL endpoint of the dataset
    <sc:sparqlEndPoint>http://example.org/queryengine/sparql</sc:sparqlEndPoint>
• A representative URI/URL
    <sc:sampleURI>http://example.org/products/id1234</sc:sampleURI>
• Split data dumps
    <sc:dataFragmentDump>http://example.org/data

How it is meant to be used
As a crawler:
• If you are given a URL for an RDF site, check for the sitemap
• If a dump is available, download that instead of crawling the individual URLs
As a client:
• If you have a dump and want an update
• Check the sitemap to locate the dump, in case it has moved
• Or to locate a SPARQL endpoint

Who uses it?
Data producers
• Geonames
• DBpedia
• Uniprot
• DBLP
• … (it takes 10 minutes to write one)
Data consumers
• Sindice
• Next: SWSE, DBin 2.0

How does a client write to the Semantic Web?
• Reading is simple
• Writing requires a server-side module with write access to a URL space
• Our brew: a very simple “Publishing Service”
  – PHP: can be placed almost anywhere (e.g. your homepage)
  – Almost no installation effort
  – Password-protected upload
• Once the data is published, it makes sure the Semantic Web knows (e.g. it also pings PingTheSemanticWeb, Sindice, etc.)
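Returning to the Semantic Sitemap extension: the crawler-side rule from “How it is meant to be used” (prefer a dump over crawling individual URIs) can be sketched in Python roughly as follows. This is only an illustrative sketch, not Sindice's actual implementation. The element names are copied from the slide examples; the names `Dataset`, `parse_semantic_sitemap` and `plan_fetch` are hypothetical, and the sitemap is read from an inline string rather than fetched over HTTP.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Namespace URIs as they appear in the slide examples.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "sc": "http://sw.deri.org/2007/07/sitemapextension/scschema.xsd",
}

# Inline sample modelled on the slides (assumption: a real crawler would
# download this document from the site instead).
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:sc="http://sw.deri.org/2007/07/sitemapextension/scschema.xsd">
  <sc:dataset>
    <sc:datasetLabel>Product Catalog for Example.org</sc:datasetLabel>
    <sc:dataDumpLocation>http://example.org/cataloguedump.rdf
    </sc:dataDumpLocation>
    <sc:linkedDataPrefix>http://example.org/products/</sc:linkedDataPrefix>
    <sc:sparqlEndPoint>http://example.org/queryengine/sparql</sc:sparqlEndPoint>
    <changefreq>monthly</changefreq>
  </sc:dataset>
</urlset>"""


@dataclass
class Dataset:
    """Hypothetical container for one <sc:dataset> entry."""
    label: Optional[str]
    dumps: List[str]
    prefix: Optional[str]
    sparql: Optional[str]
    changefreq: Optional[str]


def _text(parent: ET.Element, path: str) -> Optional[str]:
    """Return the stripped text of the first match, or None if absent."""
    found = parent.find(path, NS)
    if found is None or not found.text:
        return None
    return found.text.strip()


def parse_semantic_sitemap(xml_text: str) -> List[Dataset]:
    """Extract every <sc:dataset> block from a sitemap document."""
    root = ET.fromstring(xml_text)
    datasets = []
    for ds in root.findall("sc:dataset", NS):
        dumps = [d.text.strip()
                 for d in ds.findall("sc:dataDumpLocation", NS) if d.text]
        datasets.append(Dataset(
            label=_text(ds, "sc:datasetLabel"),
            dumps=dumps,
            prefix=_text(ds, "sc:linkedDataPrefix"),
            sparql=_text(ds, "sc:sparqlEndPoint"),
            changefreq=_text(ds, "sm:changefreq"),
        ))
    return datasets


def plan_fetch(ds: Dataset, uri: str) -> Tuple[str, str]:
    """Prefer the dump when the URI falls under the dataset's prefix."""
    if ds.prefix and ds.dumps and uri.startswith(ds.prefix):
        return ("dump", ds.dumps[0])
    return ("crawl", uri)


if __name__ == "__main__":
    catalog = parse_semantic_sitemap(SAMPLE)[0]
    print(plan_fetch(catalog, "http://example.org/products/id1234"))
    print(plan_fetch(catalog, "http://example.org/about"))
```

A URI under the declared `sc:linkedDataPrefix` is answered from the dump, so the producer sees one download instead of one hit per entity; anything outside the prefix still falls back to an ordinary HTTP crawl.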
DBin 2.0 overview
• Is a desktop client
• Provides rich user interfaces which can be customized for specific domains (Brainlet model)
• As a basis, it reads from and writes to the Semantic Web
• Implements “Semantic Web Pipes”: workflows which combine Semantic Web sources in specific ways

Brainlets in a slide
• Created by power users (domain experts) using an XML language
• Define ontologies and other data aspects
• Define the user interface as a mashup of specific data visualizers
• Rendered by the DBin platform
• Used by the end user

Example: conference information manager
As on the Web (HTML), the user creates “environments” using a text editor.
Example:

    <Brainlet name="Conference Information Manager" author="Onofrio Panzarino" version="1.0">
      <Ontology file="brainlet/Conference.owl"/>
      <GUED name="Conference">
        <Topic name="Conferences" uri="http://www.purl.org/net/ontology/Conference#Conference">
          <Child query="SELECT X FROM {X} &lt;rdfs:subClassOf&gt; {$parent} WHERE X != $parent" recursive="true">
            <Child subjectBy="rdf:type" icon="/icons/Conference.gif"/>
          </Child>
          <Child subjectBy="rdf:type" icon="/icons/Conference.gif"/>
        </Topic>
        <Topic name="Topics" uri="http://www.purl.org/net/ontology/Conference#Topic">
          <Child query="SELECT X FROM {X} &lt;rdfs:subClassOf&gt; {$parent} WHERE X != $parent" recursive="true">
            <Child subjectBy="rdf:type"/>
          </Child>
          <Child subjectBy="rdf:type"/>
        </Topic>
      </GUED>
      <View id="Focus" />
      <View id="GUEDNavigator" title="ConferenceNavigator" icon="nav.gif" selecterFor="main" />
      <View id="Comments" title="Comments" listenTo="main" selecterFor="comments" />
      <View id="Comment" title="Details" listenTo="comments" />
      <View id="Gallery" listenTo="main" />
    </Brainlet>

SW researchers expect something like this…

Domain-specific data visualizations can be mashed up

Reading, writing and visualizing data in DBin 2.0

Semantic Web Communities: the idea

One more tool: Semantic Web Pipes
• Live, reusable, composable transformations of Semantic Web sources
• http://pipes.deri.org

Conclusions
• The Semantic Web is now a reality
  – Linked data, Semantic Sitemaps
  – Sindice to find all the connections
  – How can a user interact with it?
• DBin 2.0 is a prototypical client to read/write this information:
  – Very prototypical
  – Can be a good starting point, though
• Final thought: the Geo Semantic Web should not be seen as a competitor to other formats, but as an extra opportunity.

Thanks for your attention

SEMEDIA: Semantic Web and Multimedia