Semantic Web & Semantic Web Processes A course at Universidade da Madeira, Funchal, Portugal June 16-18, 2005 Dr. Amit P. Sheth Professor, Computer Sc., Univ. of Georgia Director, LSDIS lab CTO/Co-founder, Semagix, Inc Special Thanks: Cartic Ramakrishnan, Karthik Gomadam Agenda Part I • What is Semantic Web? • What makes the Semantic web • Ontologies – importance of relationships and knowledge • Representation and Languages • Why XML is not enough • Describe semantic web resources- RDF and RDFS • OWL • Query processing and storage Part II • Metadata, Enabling techniques and technologies • Ontology and knowledge engineering: ontology design, ontology population maintaining, ontology freshness • Automated metadata extraction and annotation • Computation and reasoning with focus on relationships • Example commercial Semantic Web platform 1 Agenda 2 Part III • Semantic web applications: search, integration, analysis a. Pan-Web and consumer-centric b. Enterprise Part IV • Semantic Web Services and Processes • • • • What are Web Services ? What are Web processes ? Creating Web processes: Annotation, discovery, composition, etc. Semantic Web Service/Process tools Part I • What is Semantic Web? • What makes the Semantic web • Ontologies – importance of relationships and knowledge • Types and examples of ontologies • Metadata and Semantic Annotation -- metadata classifications • Representation and Languages • Why XML is not enough • RDF - Describe semantic web resources and RDFS - RDF as a triple, RDF as a graph (show example RDF/S) • OWL • RDF Query processing and storage Three generation of Information Systems: Where we have come from, where we are going Semantics Generation III 2000s MediaAnywhere InfoQuilt, OBSERVER , Semantic Web technologies and platforms Semagix Freedom Metadata Generation II 1990s Generation I 1980s (Ontology, Context, Relationships, KB) VisualHarness InfoHarness (Domain model) Metadata based integration, Mediator Systems, Digital Libraries AdaptX/Harness Data (Schema, “semantic data modeling) Mermaid DDTS Intervisio Heterogeneous databases/ Federated Databases Research Semi-Formal Informal Degree of Agreement Formal Other dimensions: how agreements are reached, … Broad Scope of Semantic (Web) Technology Current Semantic Web Focus Lots of Useful Semantic Technology (interoperability, Integration) Qos Execution Scope of Agreement Task/ App Domain Industry Gen. Purpose, Broad Based Function Common Sense Data/ Info. Cf: Guarino, Gruber What is the Semantic Web? • "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." -Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web, Scientific American, May 2001 • Ontologies • RDF/RDFS or OWL Syntax – machine processable • Semantic Metadata – annotation of web resources “An ontology is a specification of a conceptualization” (T. Gruber) • A conceptualization is the way we think about a domain • A specification provides a formal way of writing it down Building Ontologies from the Ground Up When users set out to model their professional activity – Mark Mussen Conceptualization and Ontology Everything that can be expressed in the language http://www.w3c.it/events/minerva20040706/guarino.pdf Ontology Constraining Possible Interpretations Of what can Be expressed Central Role of Ontology • Ontology represents agreement, represents common terminology/nomenclature • Ontology is populated with extensive domain knowledge or known facts/assertions • Key enabler of semantic metadata extraction from all forms of content: – unstructured text (and 150 file formats) – semi-structured (HTML, XML) and – structured data • Ontology is in turn the center piece that enables – resolution of semantic heterogeneity – semantic integration – semantically correlating/associating objects and documents Types of Ontologies (or things close to ontology) • Upper ontologies: modeling of time, space, process, etc • Broad-based or general purpose ontology/nomenclatures: Cyc, CIRCA ontology (Applied Semantics), SWETO, WordNet ; • Domain-specific or Industry specific ontologies – – – – – – News: politics, sports, business, entertainment Financial Market Terrorism Pharma GlycO, ProPreO (GO (a nomenclature), UMLS inspired ontology, …), MGED – – – – Anti-money laundering Equity Research Repertoire Management Financial irregularity • Application Specific and Task specific ontologies Fundamentally different approaches in developing ontologies at the two end of the above spectrum Building ontology Three broad approaches: • social process/manual: many years, committees – Can be based on metadata standard • automatic taxonomy generation (statistical clustering/NLP): limitation/problems on quality, dependence on corpus, naming • Descriptional component (schema) designed by domain experts; Description base (assertional component, extension) using automated processes from trusted knowledge sources Option 2 is being investigated in several research projects; Option 3 is currently supported by Semagix Freedom SUMO -- http://ontology.teknowledge.com/ Part of the CYC Upper Ontology http://www.cyc.com/cyc/technology/whatiscyc_dir/whatdoescycknow SWETO (Semantic Web Testbed Ontology) Current Status • Developed using Semagix technology for free noncommercial usage by the SW community; some initial users • V1.4 population includes over 800,000 entities and over 1,500,000 explicit relationships among them • Continue to populate the ontology with diverse sources thereby extending it in multiple domains, new smaller and larger release due soon; RDF and OWL versions • Significant information for provenance/trust support [UMBC partnership] • 97% of disambiguation performed automatically, 2% manually; not quite a high-quality as an evaluation testset (e.g., low connectivity) • Working on test harness, quality measures, and benchmarks Expressiveness Range: Knowledge Representation and Ontologies TAMBIS KEGG Thesauri “narrower term” relation Catalog/ID DB Schema UMLS Wordnet Terms/ glossary RDFS OO Formal instance SWETO Disjointness, Inverse, part of… Frames (properties) RDF Informal is-a GO Simple Taxonomies Formal is-a BioPAX DAML CYC OWL IEEE SUO Value Restriction General Logical constraints GlycO Pharma Expressive Ontologies Ontology Dimensions After McGuinness and Finin EcoCyc Gene Ontology (GO) • Comprises three independent “ontologies” – molecular function of gene products – cellular component of gene products – biological process representing the gene product’s higher order role. • Uses these terms as attributes of gene products in the collaborating databases (gene product associations) • Allows queries across databases using GO terms, providing linkage of biological information across species http://www.geneontology.org/ GO = Three Ontologies • Molecular Function – elemental activity or task – example: DNA binding • Cellular Component – location or complex – example: cell nucleus • Biological Process – goal or objective within cell – example: secretion http://www.geneontology.org/ GlycO GlycO: a domain Ontology embodying knowledge of the structure and metabolisms of glycans Contains 770 classes – describe structural features of glycans URL: http://lsdis.cs.uga.edu/projects/glycomics/glyco is a focused ontology for the description of glycomics • models the biosynthesis, metabolism, and biological relevance of complex glycans • models complex carbohydrates as sets of simpler structures that are connected with rich relationships GlycO statistics: Ontology schema can be large and complex • 770 classes • 142 slots • Instances Extracted with Semagix Freedom: – – – – – – – 69,516 genes (From PharmGKB and KEGG) 92,800 proteins (from SwissProt) 18,343 publications (from CarbBank and MedLine) 12,308 chemical compounds (from KEGG) 3,193 enzymes (from KEGG) 5,872 chemical reactions (from KEGG) 2210 N-glycans (from KEGG) GlycO taxonomy The first levels of the GlycO taxonomy Most relationships and attributes in GlycO GlycO exploits the expressiveness of OWL-DL. Cardinality constraints, value constraints, Existential and Universal restrictions on Range and Domain of properties allow the classification of unknown entities as well as the deduction of implicit relationships. Query and visualization A biosynthetic pathway N-glycan_beta_GlcNAc_9 GNT-I attaches GlcNAc at position 2 N-acetyl-glucosaminyl_transferase_V N-glycan_alpha_man_4 GNT-V attaches GlcNAc at position 6 UDP-N-acetyl-D-glucosamine + alpha-D-Mannosyl-1,3-(R1)-beta-D-mannosyl-R2 <=> UDP + N-Acetyl-$beta-D-glucosaminyl-1,2-alpha-D-mannosyl-1,3-(R1)-beta-D-mannosyl-$R2 UDP-N-acetyl-D-glucosamine + G00020 <=> UDP + G00021 The impact of GlycO • GlycO models classes of glycans with unprecedented accuracy • Implicit knowledge about glycans can be deductively derived • Experimental results can be validated according to the model N-Glycosylation Process (NGP) Cell CultureBy N-glycosylation Process, extract we mean the identification and Glycoprotein Fraction quantification of proteolysis glycopeptides Glycopeptides Fraction 1 n Separation technique I Glycopeptides Fraction n PNGase Peptide Fraction Separation technique II n*m Peptide Fraction Mass spectrometry ms data ms/ms data Data reduction ms peaklist ms/ms peaklist binning Glycopeptide identification and quantification N-dimensional array Signal integration Data reduction Peptide identification Peptide list Data correlation ProPreO - Experimental Proteomics Process Ontology ProPreO models the phases of proteomics experiment using five fundamental concepts: Data: (Example: a peaklist file from ms/ms raw data) Data_processing_applications: (Example: MASCOT* search engine) Hardware: embodies instrument types used in proteomics (Example: ABI_Voyager_DE_Pro_MALDI_TOF) Parameter_list: describes the different types of parameter lists associated with experimental phases Task: (Example: chromatography) *http://www.matrixscience.com/ component separation, used in Semantic Annotation of Scientific Data 830.9570 194.9604 2 580.2985 0.3592 688.3214 0.2526 779.4759 38.4939 784.3607 21.7736 1543.7476 1.3822 1544.7595 2.9977 1562.8113 37.4790 1660.7776 476.5043 ms/ms peaklist data <ms/ms_peak_list> <parameter instrument=micromass_QTOF_2_quadropole_time_of_flight_m ass_spectrometer mode = “ms/ms”/> <parent_ion_mass>830.9570</parent_ion_mass> <total_abundance>194.9604</total_abundance> <z>2</z> <mass_spec_peak m/z = 580.2985 abundance = 0.3592/> <mass_spec_peak m/z = 688.3214 abundance = 0.2526/> <mass_spec_peak m/z = 779.4759 abundance = 38.4939/> <mass_spec_peak m/z = 784.3607 abundance = 21.7736/> <mass_spec_peak m/z = 1543.7476 abundance = 1.3822/> <mass_spec_peak m/z = 1544.7595 abundance = 2.9977/> <mass_spec_peak m/z = 1562.8113 abundance = 37.4790/> <mass_spec_peak m/z = 1660.7776 abundance = 476.5043/> <ms/ms_peak_list> Annotated ms/ms peaklist data Semantic annotation of Scientific Data <ms/ms_peak_list> <parameter instrument=“micromass_QTOF_2_quadropole_time_of_flight_mass_s pectrometer” mode = “ms/ms”/> <parent_ion_mass>830.9570</parent_ion_mass> <total_abundance>194.9604</total_abundance> <z>2</z> <mass_spec_peak m/z = 580.2985 abundance = 0.3592/> <mass_spec_peak m/z = 688.3214 abundance = 0.2526/> <mass_spec_peak m/z = 779.4759 abundance = 38.4939/> <mass_spec_peak m/z = 784.3607 abundance = 21.7736/> <mass_spec_peak m/z = 1543.7476 abundance = 1.3822/> <mass_spec_peak m/z = 1544.7595 abundance = 2.9977/> <mass_spec_peak m/z = 1562.8113 abundance = 37.4790/> <mass_spec_peak m/z = 1660.7776 abundance = 476.5043/> <ms/ms_peak_list> Annotated ms/ms peaklist data Syntax for Onologies and Metadata • • • • Why not use XML? Why use OWL? Or for that matter why RDF? So many questions … From XML to OWL • XML • XML Schema • RDF • RDF Schema • OWL NO SEMANTICS – surface syntax for structured documents – imposes no semantic constraints on the meaning of these documents. – is a datamodel for objects ("resources") and relations between them, – provides a simple semantics for this datamodel Relationships as – these datamodels can be represented in an XML syntax. first class objects– key to Semantics – is a vocabulary for describing properties and classes of RDF resources – with a semantics for generalization-hierarchies of such properties and classes. – adds more vocabulary for describing properties and classes: • • • • relations between classes (e.g. disjointness), cardinality (e.g. "exactly one"), SEMANTICS equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes. http://en.wikipedia.org/wiki/Semantic_web#Components_of_the_Semantic_Web Expressive Power – is a language for restricting the structure of XML documents. From an alphabet to a Language • XML – “XML is only the first step to ensuring that computers can communicate freely. XML is an alphabet for computers and as everyone traveling in Europe knows, knowing the alphabet doesn’t mean you can speak Italian of French.” – Business Week, March 18th 2002 – Example cited by Nicola Guarino in http://www.w3c.it/events/minerva20040706/guarino.pdf • RDF/RDFS and OWL would therefore be akin to the language computers use to communicate • And ontologies represented in these languages would be akin to the exact interpretations of the concepts being communicated Syntax for Onologies and Metadata • RDF – A simple W3C standard used to describe Web resources – Relationships in RDF (Properties), are binary relationships between two resources or a resource and a literal – Resources take on the roles of Subject and Object respectively. – The Subject, Predicate and Object compose an RDF statement http://www.w3.org/RDF/ What is RDF? • Resource Description Framework • Proposed as the base semantic web language • Data model for describing properties of resources • Statements about properties and values of web resources • Machine-understandable metadata RDF Elements • Resource: – Something that can be described/referenced – Identified by a URI • Property: – Relationship from a resource to a value: • Another resource • An atomic value/literal • Statement: – resource -> property -> value RDF Statement RDF Model • Formal Data Model – Directed labeled graph • Nodes: resources or literals • Edges: properties (relationships/attributes) • Labels: URIs of nodes and edges – Collection of triples • subject (resource) • predicate (property) • object (resource or literal) • W3C recommendation Graph Model Triple Model Subject Predicate Object Shaguille O’Neal plays_for Miami Heat Kobe Bryant plays_for LA Lakers LA Lakers competes_with Miami Heat Phil Jackson coaches LA Lakers RDF Syntax • • • • • Formal syntax Encoded in XML Unambiguous property names and values RDF adds rules for interpretation W3C recommendation Example <sample:Athlete rdf:about="&sample;Kobe_Bryant"> <rdfs:label xml:lang="en">Kobe Bryant</rdfs:label> <sample:plays_for rdf:resource="&sample;LA_Lakers"/> </sample:Athlete> <sample:Athlete rdf:about="&sample;Shaquille_ONeal"> <rdfs:label xml:lang="en">Shaquille O'Neal</rdfs:label> <sample:plays_for rdf:resource="&sample;Miami_Heat"/> </sample:Athlete> <sample:Team rdf:about="&sample;LA_Lakers" <rdfs:label xml:lang="en">LA Lakers</rdfs:label> </sample:Team> <sample:Team rdf:about="&sample;Miami Heat" <rdfs:label xml:lang="en">Miami Heat</rdfs:label> <sample:competes_with rdf:resource="&sample;LA_Lakers"/> </sample:Team> <sample:Coach rdf:about="&sample;sample1_Instance_8" <rdfs:label xml:lang="en">sample1_Instance_8</rdfs:label> <sample:coaches rdf:resource="&sample;LA_Lakers"/> </sample:Coach> What is RDFS? • RDF Vocabulary Description Language • (RDF Schema) • Extension of RDF: same data model – graph or triples • A hierarchy of classes • A hierarchy of properties relating classes • W3C recommendation RDF Schema RDF Instances Passenger Ticket subClassOf(isA) number for String Flight String FFlyer Customer fflierno typeOf(instance) String purchased String subPropertyOf Bank Account Payment ffid FFNo float String CCard &r4 Cash Client ffid “XYZ123” &r11 “M’mmed” &r1 purchased &r2 paidby &r3 “Atta” &r5 “Marwan” &r7 “Al-Shehhi” lname purchased &r8 for paidby &r6 &r9 holder RDFS Core Classes • rdfs:Class – Class of resources that are RDF classes – Instance of rdfs:Class • rdfs:Resource – All things being described – The class type of everything in RDF(S) – Instance of rdfs:Class • rdf:Property – Class of RDF properties – Instance of rdfs:Class http://www.w3.org/TR/rdf-schema/ RDFS Core Properties • rdfs:type – A resource is an instance of a class – Instance of rdf:Property • rdfs:subClassOf – All instances of a class are also instances of another class – Instance of rdf:Property • rdfs:subPropertyOf – All resources related by one property are also related by another property – Instance of rdf:Property RDF Core Properties • rdfs:range – All values of a property are instances of one or more class • The value MUST be an instance of all range classes – Instance of rdf:Property • rdfs:domain – All resources with the given property are instances of one or more class • The resource MUST be an instance of all domain classes – Instance of rdf:Property OWL, W3C definition • “language for defining structured, Webbased ontology which enables richer integration and interoperability of data across application boundaries” http://www.w3.org/2004/OWL/ OWL Use Cases • • • • • • Web portals Multimedia Collections Corporate web site management Design documentation Agents and services Ubiquitous computing OWL Design Goals • • • • • • • • Shared ontologies Ontology evolution Ontology interoperability Inconsistency detection Expressivity vs. scalability Ease of use Compatibility with other standards Internationalization What’s in OWL, but not in RDF • Ability to be distributed across many systems – By means of owl:imports (similar to ‘include’ in C/C++) • Scalable to Web needs (?) • Compatible with Web standards for: – accessibility, and – Internationalization • Open and extensible OWL open and extensible • RDF Schema (meta-modeling facilities, i.e. classes of classes) • OWL Full • OWL DL (Description Logics) • OWL Lite – targeting tool builders owl:Class • Sub class of Class in RDF • Better to forget about classes of classes • Top-most class: owl:Thing OWL Properties Object Properties Data type Properties Ana owns Cuba Ana age 25 Is range a literal / typed value ? then ERROR • XML Schema data types supported – DB people happy Transitivity of properties X p1 Y Y p1 Z implies X p1 Z • Transitivity existed already in RDF – “subClassOf”, and “subPropertyOf” • Example: located_in Atlanta located_in Georgia located_in U.S.A. located_in Symmetric properties X p1 Y implies X p1 Y Spain Portugal has_border_with has_border_with Portugal Spain Functional Properties X p1 Y X p1 Z imply Z is the same as Y (they describe the same) • example, Portugal Portugal p1 = has_name has_capital &r1 has_capital &r2 Result: &r1 and &r2 represent the same entity Inverse Functional Properties Y p1 A Z p1 A imply Z is the same as Y (they describe the same) • example, &r1:Tim Finin &r2:Timothy Finin p1 = has_email has_email has_email finin@umbc.edu Result: &r1 and &r2 represent the same entity OWL Cardinality • min Cardinality • max Cardinality • “Cardinality” – When min = max • has Value – belongs to the class if it has the value OWL Tools • Pellet (umd.edu) – DLbased reasoner implemented in Java • Euler – an inference engine supporting logic based proofs. Finds out whether a given set of facts support a given conclusion • FaCT (Ian Horrocks) – DL classifier that can also be used for modal logic satisfiability testing RDF Storages • • • • • • • • • • Jena Sesame Redland Triple 3store RDFSuite RDFStore Kowari Yars Brahms developed at LSDIS • Variety of available storages • Different APIs and languages • Support from RDF to OWL-full – even reasoning • Storage and query approach: graph Vs. triple-centric http://www.w3.org/2001/sw/Europe/reports/rdf_scalable_storage_report/ Jena • • • • • Implemented in Java by HP Laboratories Support for RDF, RDFS and OWL Reasoning / inference engine Support for reified statements In-memory and persistent storage (Oracle, MySQL, PostgreSQL) • Query language: RDQL, SPARQL • Read/write RDF in RDF/XML, N3 and N-Triples format • Triple-centric organization and API Jena – graph abstraction • Graph interface is separated from (persistent) triple storage layer • Special support for different types of graphs - optimized for performance • Support operations like add, delete, find. “Efficient RDF Storage and Retrieval in Jena2” Kevin Wilkinson, Craig Sayers, Harumi Kuno, Dave Reynolds Jena – query processing • Converting multiple patterns in query into one query to DB • Use DB query optimizer instead of executing multiple queries from Jena level • Cluster properties that are likely to be accessed together - optimize for common patterns • Associate a table with pattern (best) or span pattern between tables (requires join operation) • Query may span between different graphs, but it can be optimized only if they are in the same database Redland, Rasqual, Raptor • Storage for RDF triples - do not implement any language by itself • This is the main module to include in RDF manipulation system • Implemented in pure C for portability • Rich API enables to build modules on top of it • Rasqual - RDF query module – RDQL – SPARQL • Raptor - a fast RDF parser Redland • API available in different languages – C, C#, Java, Perl, Python, PHP, Ruby, Tcl • API for manipulating – triples, URI/literals, graphs • Portable - can built in most OSes • Scalable to handle millions of triples – while using of persistent storage – but indexing is very space-consuming • Support for context and hierarchy of models Redland - model • Abstraction of model to support different storages • In-memory and persistent models – BerkeleyDB, 3store, MySQL • Rich, triple-centric API „The Design and Implementation of the Redland RDF Application Framework” - David Beckett Sesame • Implemented in Java • Database independent – idea of SAIL (Storage Abstraction Interface Layer) • Scalable architecture • Implementation of remote models – can query different models over network • Graph-centric approach • Language: RQL Sesame - architecture • RAL - Repository Abstraction Layer – makes Sesame storage independent – API supportes RDF Schema semantics (e.g. subsumption reasoning) – can be stacked one on another – interface oriented for persistance storage (DBMS, Object-Relational DB) – data returned as streams – can even use net-based RDF services (!) • Due to poor performance, implemented cache as one of RALs – cache mainly for RDFS, as it needs code support in reasoning (subClassOf, ...) “Sesame: An Architecture for Storing and Querying RDF Data and Schema Information” - Jeen Broekstra, Arjohn Kampman, Frank van Harmelen Sesame – query module • Query module – query plan and optimizer similar to already known DB solutions – query is translated to a set of simple RAL calls – each leaf of the query plan can ‘evaluate itself’ and pull data from RAL – data are returned as streams – lack of optimization on storage level Brahms • • • • Implemented in C++ (bindings for Java also available) Graph-centric approach Designed to support large in-memory RDF graphs Optimized for speed and memory usage – other storages do not offer optimized in-memory implementation for large graphs – only main memory offers fastest access - usage of persistent storage decreases performance • In-memory storage with fast precomputed graph snapshot loading – minimize cold-start time Brahms • Framework for fast discovery of long association paths in large RDF bases – memory and CPU intense algorithms • Rich API, but no query language supported – higher level query languages do not support variable length association path queries – association path discovery algorithms operate on lowlevel graph API • Overperformed Jena, Sesame and Redland during tests for association discovery – also was able to work efficiently on much larger inmemory graphs than other storages did not handle “BRAHMS: A WorkBench RDF Store And High Performance Memory System for Semantic Association Discovery” (Technical report) - Maciej Janik, Krys Kochut Why RDF languages? • Find resources based on predicates, values, labels or associations • SQL is not good for querying RDF data – different models: relational and graph • XML query languages cannot deal with graph data • Syntactic approach is not enough • Required semantic querying • Inferencing is desirable Available query languages • • • • • • • • • • RQL RDQL SeRQL Triple SPARQL – (latest) SquishQL Versa N3 RxPath RDFQL • Majority of languages have roots in SQL • No single standard as SQL • Some languages are tightly coupled with specific storages RQL • Based on OQL • Utilizes functional approach with support for generalized path expressions – both nodes and edges can become variables • Not completely compatible with RDF specification – has some additional restrictions • Return bindings to variables (no closure) • Implemented in RDFSuite and partially in Sesame select Res from {Res} ns:label {x} where x=“foo” using namespace ns=… “RQL: A Declarative Query Language for RDF” - Greg Karvounarakis, Sofia Alexaki, Vassilis Christophides, Dimitris Plexousakis, Michel Scholl RDQL • SQL-like syntax – easy to adopt for DB users • Can specify patterns of triples to select • Schema is not interpreted • Not closed under queries – output as bindings to selected variables • Implemented in Jena select ?p, ?q where (?p <rdfs:label> “foo”) (?p <rdf:type> ?q) “RDQL - A Query Language for RDF” (W3C Member Submission) - Andy Seaborne (HP Labs Bristol) SeRQL • Sesame RDF Query Language • Based on RQL and RDQL • Support for generalized path expressions and optional matching • Query filters – select-from-where – return variable bindings and is not closed – construct-from-where – return matching subgraph that can be queried (closure) “SeRQL: Sesame RDF query language” - Jeen Broekstra Triple • Derived from F-logic – should be easy to adopt for logic programmers • Triples are logic expressions – S[PO] • Queries and triples have the same logic representation • Reasoning is a part of language • Does not fulfill closure property • Implemented in Triple system FORALL X <- ( X[rdfs:label -> “foo”] )@default:ln. “TRIPLE - A Query, Inference, and Transformation Language for the Semantic Web” Michael Sintek, Stefan Decker SPARQL • W3C effort to standarize query language – best experience and requirements from different languages (like RQL, RDQL) • Based on matching graph patterns – triples, paths, subgraphs – optional blocks and matching – matching alternatives (union) and disjunction • Many additional operators – grouping, sorting, limit results PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name ?mbox WHERE { ?x foaf:name ?name . OPTIONAL { ?x foaf:mbox ?mbox } } “SPARQL Query Language for RDF” (W3C Working Draft) - Eric Prud'hommeaux , Andy Seaborne Sample path query “A Comparison of RDF Query Languages” - Peter Haase, Jeen Broekstra, Andreas Eberhart, Raphael Volz Expressive power of RDF languages “Ontology Storage and Querying” (Technical Rreport No 308) - Aimilia Magkanaraki et al.