Back to the Future – Should SQL Surrender to SPARQL?

Rainer Manthey

© 2015 Prof. Dr. Rainer Manthey, SOFSEM 2015

How to Communicate with Databases?

From: http://www.intsolgrp.com/

Communicating with Google: Our Everyday Experience

Our request: A line of symbols

Google's answer: 139 million links

( … If Google is/has/uses a database?! )

Asking a Relational Database: More Complex, More Goal-Directed

Our request: An SQL query

The DB's answer: A table with data rows

From: technet.microsoft.com

Reminder of Basic Terminology: DBS = DBMS + n*DB

• DBS: Database System
• DBMS: Database Management System
• DB: Database

Basics (2): Query Language and Query Manager

[Diagram labels: Query (= declarative program), DBS, DBMS, Query Language, Interpreter]

Relational Databases and SQL Systems: A Multi-Billion Dollar Market

• 1970: Proposal of the Relational Model of Data (RM) by Edgar F. Codd
• 1974: Design of SQL by Chamberlin/Boyce started
• 1979: First commercial SQL DBMS (Oracle 2)
• 1986: First SQL standard

RM/SQL: A success story of more than 30 years . . . up till now?

SQL: End of an Era?

NoSQL takes the database market by storm

Are SQL Databases Dead?

Is it the end of the line for SQL?

The relational model is dead, SQL is dead, of the No SQL Hoopla

From: http://crossfitlittleton.net

SPARQL: The Hardest New Competitor

The Semantic Web Dream: SPARQL's Vision and Goal

© 2014 by LyonLabs, LLC and Barrett Lyon

“ I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers.

A "Semantic Web", which makes this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines.

The "intelligent agents" people have touted for ages will finally materialize.”

From: "Weaving the Web" (1999), Sir Timothy Berners-Lee

W3C Activities in Developing New Query Languages

A SPARQL Taster

From: taxoncuration.myspecies.info

Restriction in this Talk: No Distributed Data Management!

SPARQL:

• Designed for managing data over "the semantic web"
• Navigation in distributed data (re)sources is a big issue
• IRIs are used intensively as identifiers for such (re)sources
• At the same time, able to manage data without a web

In this presentation:

All web-related aspects of SPARQL are ignored, as SQL was not made for this context.

On Syntax: Triples vs. Tuples

• Goal of this contribution: Compare SQL and SPARQL with respect to their data management capabilities only!
• Therefore: First look at the underlying data models of the two languages!
• RM: Tables of rows and columns (or: relations as sets of tuples)
• RDF: Datasets consisting of triples

Query language   based on data model
SQL              RM
SPARQL           RDF

RDF: The (Only?) Data Model for the Semantic Web

“RDF (Resource Description Framework) is one of the three foundational Semantic Web technologies, the other two being SPARQL and OWL.

In particular, RDF is the data model of the Semantic Web. That means that all data in Semantic Web technologies is represented as RDF.

If you store Semantic Web data, it's in RDF.

If you query Semantic Web data (typically using SPARQL), it's RDF data.

If you send Semantic Web data to your friend, it's RDF.”

From: http://www.cambridgesemantics.com/semantic-university/rdf-101

RDF Data: Graphs or Triples?

[Figure: RDF data in graph representation and in triple representation; labels: Resource, Literal]

From: http://www.openarchives.org/ore/1.0/primer

RDF Datasets Are Relations

Some quite obvious observations:

• Every RDF triple can be perceived as a relational tuple.
• Every RDF dataset can be perceived as a relational table.
• Every RDF dataset has the same attributes: S, P, O

⇒ We could accommodate every RDF database in an RM database!

(If we wanted to do so!)
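The observation that an RDF dataset fits into a single three-column relational table can be tried out directly. The following is only a minimal sketch, assuming Python's built-in sqlite3 module as the relational system; the table name dataset and the sample triples are made up for illustration.

```python
import sqlite3

# Hypothetical sample triples; the names are illustrative only.
triples = [
    ("doc1", "title", "Weaving the Web"),
    ("doc1", "creator", "person1"),
    ("person1", "name", "Tim"),
]

conn = sqlite3.connect(":memory:")
# One relational table with the three fixed attributes S, P, O.
conn.execute("CREATE TABLE dataset (S TEXT, P TEXT, O TEXT)")
conn.executemany("INSERT INTO dataset VALUES (?, ?, ?)", triples)

# Each stored row is a tuple again, so the table holds exactly the triples.
stored = conn.execute("SELECT S, P, O FROM dataset ORDER BY S, P").fetchall()
print(stored)
```

Whether one would want to store RDF data this way in practice is a separate question; the sketch only shows that nothing in the relational model prevents it.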

Relational Tables Represented in RDF

T as an RM table (A is the primary key attribute):

A     B   C    D     E
. . .
1     a   23   Jim   4.5
. . .

T as an RDF dataset:

S   P   O
. . .
1   B   a
1   C   23
1   D   Jim
1   E   4.5
. . .

• An n-ary tuple is turned into n-1 triples
• Attributes are turned into predicate values, i.e., meta-data into data
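The tuple-to-triples translation can be written down in a few lines. This is a sketch under the assumption that a row is given as a Python dict from attribute names to values; the function name row_to_triples is invented here and does not come from the talk.

```python
def row_to_triples(row, key_attr):
    """Turn one n-ary row (a dict of attribute -> value) into n-1 triples."""
    subject = row[key_attr]
    return [
        (subject, attr, value)
        for attr, value in row.items()
        if attr != key_attr
    ]

# The sample tuple of table T; A is the primary key attribute.
row = {"A": 1, "B": "a", "C": 23, "D": "Jim", "E": 4.5}
triples = row_to_triples(row, key_attr="A")
print(triples)
# [(1, 'B', 'a'), (1, 'C', 23), (1, 'D', 'Jim'), (1, 'E', 4.5)]
```

The primary key value becomes the subject of every triple, and the attribute names, which are meta-data in the table, end up as ordinary predicate values.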

Tuples as Graphs, RM DBs as Graph Databases

Tuple in serialized notation (table T):

A     B   C    D     E
. . .
1     a   23   Jim   4.5
. . .

Tuple in graphical notation (node 1 with one labeled edge per attribute):

1 --B--> a
1 --C--> 23
1 --D--> Jim
1 --E--> 4.5

• Tables in RM can represent graphs as easily as RDF datasets.
• No need to introduce a new data model for "graph-structured data".
• RM databases are graph databases.

RM vs. RDF: Brief Summary

• Triples are special tuples.
  • Uniform length 3, representing SPO statements
  • S, O: "things"; P: "relationships"
• Tuples can be turned into sets of triples (systematically):
  • Provided they have a unary primary key!
  • Attributes are turned into data: become queryable!
• Datasets are special tables.
• Tables can be turned into datasets.

RM and RDF are (in principle) equally expressive.

SQL Basics (1)

Table T (x marks the row shown):

A   B   C     D    E
. . .
7   a   123   eg   2.1
. . .

• x: tuple variable
• Attributes: functions
  • Written in postfix notation, e.g., x.A
  • Applied to each tuple in turn

SQL query:

SELECT B, E
FROM T
WHERE A = 7

SQL query in full syntax:

SELECT x.B, x.E
FROM T AS x
WHERE x.A = 7
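To see that the short form and the full form with an explicit tuple variable behave alike, both can be run against a small table. This is a sketch assuming Python's sqlite3 module; the second row is made up so that the WHERE condition actually filters something.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (A, B, C, D, E)")
# First row is the sample tuple; the second is made up for contrast.
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?, ?, ?)",
    [(7, "a", 123, "eg", 2.1), (8, "b", 456, "xy", 3.4)],
)

# Short form: the tuple variable is implicit.
short_form = conn.execute("SELECT B, E FROM T WHERE A = 7").fetchall()
# Full form: x ranges over the rows of T, attributes are applied to x.
full_form = conn.execute("SELECT x.B, x.E FROM T AS x WHERE x.A = 7").fetchall()
print(short_form, full_form)
```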

SQL Basics (2)

Table T, with tuple variables x and y:

     A     B     C     D     E
     . . .
x:   . .   a     123   . .   . .
     . . .
y:   123   . .   . .   . .   3.4
     . . .

SELECT x.B, y.E
FROM T AS x, T AS y
WHERE x.C = y.A

In SQL: Tuples from different tables (or copies of a table) are linked by explicit comparisons of attributes from both tuples.
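The linking of two tuples of the same table through an explicit comparison can be reproduced as follows. Again a sketch assuming sqlite3; the values filling the unspecified positions of the two rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (A, B, C, D, E)")
# Made-up rows: the C value of the first row equals the A value of the second.
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?, ?, ?)",
    [(1, "a", 123, "p", 0.5), (123, "b", 999, "q", 3.4)],
)

# x and y range over two copies of T; the comparison x.C = y.A links them.
linked = conn.execute(
    "SELECT x.B, y.E FROM T AS x, T AS y WHERE x.C = y.A"
).fetchall()
print(linked)  # [('a', 3.4)]
```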

SPARQL Basics

Dataset T (?x, ?y, ?z mark the components they are bound to):

S          P          O
. . .
. . (?x)   2          123 (?z)
. . .
123 (?z)   . . (?y)   a
. . .

• ?x, ?y, ?z: triple component variables
• Each triple represented by a single (triple) pattern in the WHERE part
• Positional syntax, not attributes as selectors

SPARQL query:

SELECT ?x ?y
FROM T
WHERE { ?x 2 ?z .
        ?z ?y a . }
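How positional triple patterns with shared variables select data can be illustrated with a tiny matcher. This is not how a SPARQL engine is implemented; it is a naive sketch in plain Python, with invented function names (match_pattern, evaluate) and made-up triples, that only mimics the matching idea.

```python
def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; None on failure."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if term in result and result[term] != value:
                return None
            result[term] = value
        elif term != value:
            # A constant must appear at exactly this position.
            return None
    return result

def evaluate(patterns, triples):
    """Match all patterns conjunctively and return the variable bindings."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings

# Made-up dataset T and the two patterns of the query above.
triples = [("s1", 2, 123), (123, "p1", "a"), ("s2", 5, 123)]
patterns = [("?x", 2, "?z"), ("?z", "?y", "a")]
rows = [(b["?x"], b["?y"]) for b in evaluate(patterns, triples)]
print(rows)  # [('s1', 'p1')]
```

The shared variable ?z does the work that an explicit comparison does in SQL: it forces the object of the first triple to equal the subject of the second.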

Common Query Processing Paradigm

[Diagram labels: Query, with FROM, WHERE, and SELECT parts]

• Common principle: Sets of data elements (triples, tuples) as both input and output
• Difference:
  • In SQL: Both input and output are tables, so output can always be used as further input; algebraic composition is possible
  • In SPARQL: Output does not necessarily consist of triples, thus no composition is possible
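The algebraic composition on the SQL side can be shown with a query whose FROM part is itself a query. A minimal sketch, assuming sqlite3; the table contents and the alias inner_result are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (A, B)")
conn.executemany("INSERT INTO T VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])

# The inner query yields a table, which the outer query takes as its input.
composed = conn.execute(
    "SELECT inner_result.A "
    "FROM (SELECT A, B FROM T WHERE B >= 20) AS inner_result "
    "WHERE inner_result.A < 3"
).fetchall()
print(composed)  # [(2,)]
```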

Datalog: SQL's (Relatively) Unknown Brother

• SQL is (was) not the only relational query language, e.g.:
  • Theoretical languages: calculus-based (TRC, DRC), relational algebra (RA)
  • Early languages: QUEL, Query-by-Example
• Nearly as old as SQL (developed in the 1970s/80s): Datalog (Database + PROLOG)
  • Syntactically: Like pure PROLOG (facts and rules, goals as queries)
  • Semantically: Like SQL (set-oriented evaluation, no backtracking)
• In style:
  • Datalog: Minimalistic, purely symbolic (mathematical)
  • SQL: Verbose, rich in variants, English keywords (user-friendly?)
• In science: Quite successful for understanding complex problems (e.g., recursion)
• Commercially: Completely "irrelevant", no Datalog DBMS product ever
• Datalog was never standardized: Free for scientific experiments
• Datalog is (at least) as expressive as SQL, if equipped with the same built-ins.

SQL and Datalog: Two Different Linguistic Approaches to Querying

Datalog rule:

p(X,Y) ← t(X,2,Z), t(Z,Y,a).

• Based on DRC: Domain Relational Calculus
• Variables represent individual tuple components.
• No attributes necessary!
• Strictly symbolic style

SQL view:

CREATE VIEW p AS
SELECT x.A, y.B
FROM t AS x, t AS y
WHERE x.B = 2 AND y.C = 'a' AND x.C = y.A

• Based on TRC: Tuple Relational Calculus
• Variables represent entire tuples.
• Tuple components accessed via attributes!
• Keyword-based style ("verbose")
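The difference between the two calculi can be imitated in plain Python: positional unpacking plays the role of domain variables, attribute access on whole rows plays the role of tuple variables. This is only an analogy under stated assumptions; the relation contents are made up, and the constant a is represented as the string "a".

```python
from collections import namedtuple

Row = namedtuple("Row", ["A", "B", "C"])
# Made-up contents for relation t.
t = [Row(10, 2, 5), Row(5, 7, "a"), Row(20, 2, 6), Row(6, 9, "b")]

# Datalog/DRC style: variables stand for components, matched by position.
p_positional = {
    (x, y)
    for (x, b, z) in t if b == 2
    for (z2, y, c) in t if z2 == z and c == "a"
}

# SQL/TRC style: variables stand for whole tuples, components via attributes.
p_attribute = {
    (x.A, y.B)
    for x in t
    for y in t
    if x.B == 2 and y.C == "a" and x.C == y.A
}

print(p_positional, p_attribute)
```

Both comprehensions compute the same set of pairs; only the way the components are addressed differs.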

SPARQL and Datalog: Two Real „Brother“ Languages

SPARQL query:

SELECT ?x ?y
FROM t
WHERE { ?x 2 ?z . ?z ?y a . FILTER (?y > ?z) }

Datalog query:

{ (X,Y) : t(X, 2, Z) , t(Z, Y, a) , Y > Z } ?

• Obviously very similar basic principle!
• In both languages: Variables represent components of tuples/triples
• Literals in Datalog = triple patterns in SPARQL
• More than one literal/triple pattern connected conjunctively (AND)
• Identity conditions expressed indirectly in literals/triple patterns
  • Constant values appearing in suitable positions
  • Identity of values in different positions: same variable

SPARQL and SQL: Quite Unrelated (Except on the Surface)

SPARQL query:

SELECT ?x ?y
FROM t
WHERE { ?x 2 ?z . ?z ?y a . FILTER (?y > ?z) }

Datalog query:

{ (X,Y) : t(X, 2, Z) , t(Z, Y, a) , Y > Z } ?

In comparison: SQL is very different in "philosophy" and style from both of these!

SQL query:

SELECT t1.A, t2.B
FROM T AS t1, T AS t2
WHERE t1.B = 2 AND t1.C = t2.A AND t2.C = 'a' AND t2.B > t2.A
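That the SQL formulation and the positional Datalog-style formulation describe the same answer can be checked on sample data. A sketch assuming sqlite3; the rows are made up, and the constant a is stored as the string 'a', which is an assumption of this example.

```python
import sqlite3

# Made-up triples for t(A, B, C); the first two rows form the only match.
rows = [(10, 2, 5), (5, 7, "a"), (5, 3, "a"), (20, 2, 6), (6, 9, "b")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (A, B, C)")
conn.executemany("INSERT INTO T VALUES (?, ?, ?)", rows)

# Attribute-based SQL formulation with explicit comparisons.
sql_result = set(conn.execute(
    "SELECT t1.A, t2.B FROM T AS t1, T AS t2 "
    "WHERE t1.B = 2 AND t1.C = t2.A AND t2.C = 'a' AND t2.B > t2.A"
).fetchall())

# Positional reading of t(X, 2, Z), t(Z, Y, a), Y > Z.
datalog_style = {
    (x, y)
    for (x, p1, z) in rows if p1 == 2
    for (z2, y, o2) in rows if z2 == z and o2 == "a" and y > z
}
print(sql_result, datalog_style)
```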

SQL and SPARQL: A Brief Summary of Additional Complex Features

• SQL: More complex queries constructed by . . .
  • Combining SELECT-FROM-WHERE blocks using set operators UNION, INTERSECT, EXCEPT (MINUS in some dialects)
  • Nesting SFW blocks (using EXISTS quantifier in WHERE conditions)
  • Explicit propositional operators AND, OR, NOT
  • Aggregate functions (e.g., COUNT, AVG) and GROUP BY
  • Ordering of query results: ORDER BY

• SPARQL:
  • UNION operator available for merging patterns in WHERE parts
  • No INTERSECT; MINUS and nested subqueries only since SPARQL 1.1
  • EXISTS operator since SPARQL 1.1 for nested patterns in FILTER
  • Boolean operators only in special situations
  • Aggregation as in SQL since SPARQL 1.1
  • ORDER BY as in SQL

SPARQL has been enhanced step by step with further SQL keywords.
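The SQL features listed above can be exercised in a few statements. This is a sketch assuming sqlite3, which spells the set difference operator EXCEPT rather than MINUS; the tables R and S and their contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (name, score)")
conn.execute("CREATE TABLE S (name, score)")
conn.executemany("INSERT INTO R VALUES (?, ?)", [("ann", 1), ("bob", 2), ("bob", 4)])
conn.executemany("INSERT INTO S VALUES (?, ?)", [("bob", 2), ("cy", 3)])

# Set operator combining two SELECT-FROM-WHERE blocks.
difference = conn.execute(
    "SELECT name, score FROM R EXCEPT SELECT name, score FROM S "
    "ORDER BY name, score"
).fetchall()

# Nested block with EXISTS in the WHERE condition.
with_partner = conn.execute(
    "SELECT DISTINCT r.name FROM R AS r "
    "WHERE EXISTS (SELECT 1 FROM S AS s WHERE s.name = r.name)"
).fetchall()

# Aggregation with GROUP BY, result ordered with ORDER BY.
averages = conn.execute(
    "SELECT name, AVG(score) FROM R GROUP BY name ORDER BY name"
).fetchall()
print(difference, with_partner, averages)
```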

Back to the Future?

• Successful science fiction movie from 1985
• Crazy inventor attempts time travel using a futuristic high-tech car
• Reaches the past (1955), aiming at the future
• 30 years back (like SPARQL to SQL)

As far as data management is concerned, SPARQL seems to be a step back in time.

SQL and (even more) Datalog are too close to it, though this is hidden by idiosyncratic new syntax details and by IRIs all over the place.

Conclusion: A Tale of Two Languages

Should SQL Surrender to SPARQL?

• As far as database management is concerned: Certainly not!
  • Both RM and RDF are very close in style and equally expressive.
  • SPARQL cannot really claim any advantage with respect to graph databases.
  • The two languages have more commonalities than differences.
  • Superiority on the SPARQL side is not really visible.
  • Surprising: SPARQL is much closer to Datalog than to SQL!
• As far as "serving the web" is concerned: No competition by SQL (yet)!

Some (more) personal opinions:

• SPARQL's style ("look and feel") is consistent in some aspects, but otherwise appears to me quite ugly and overblown.
• The documentation of SPARQL & Co by W3C is hard to "digest".
• The "propaganda" for SPARQL by the "Semantic Web Movement" makes fair comparisons hard.
• Whether the SQL vendors will again be able to "swallow" a competitor this time remains to be seen . . . I have my doubts.
