“This presentation is for informational purposes only and may not be incorporated into a contract or agreement.”

This document is for informational purposes. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described in this document remains at the sole discretion of Oracle. This document in any form, software or printed matter, contains proprietary information that is the exclusive property of Oracle. This document and information contained herein may not be disclosed, copied, reproduced or distributed to anyone outside Oracle without prior written consent of Oracle. This document is not part of your license agreement nor can it be incorporated into any contractual agreement with Oracle or its subsidiaries or affiliates.

Jayant Sharma
Technical Director, Spatial
Jayant.Sharma@oracle.com

An Introduction to the Oracle Database 10gR2 RDF Storage & Query Model

Outline
• Technology Trends
• Why Oracle RDF
• Oracle RDF model
• Summary

Enterprise Architecture Trends
• Service Oriented Architecture
• Enterprise Grid Computing
• Infrastructure Consolidation
[Diagram labels: Flexibility; Range of solutions; Resource Sharing]

Infrastructure Consolidation
[Diagram labels: Standardize; Consolidate; Automate]
Primary Types of Consolidation
  Centralization             24.9%
  Storage Consolidation      24.5%
  Physical Consolidation     22.3%
  Application Integration    14.0%
  Data Integration           13.8%
  Other                       0.5%
Source: IDC Worldwide Server Consolidation Forecast & Analysis, 2002 - 2006

Grid Computing
• Resource pooling & sharing
• Low-cost modular hardware
• Incremental scaling
• Highly available, scalable & performant
• Self monitoring & self managing
• Dynamically configurable infrastructure

Service Oriented Architecture
[Diagram: Service Oriented Architecture and Grid Infrastructure. Labels: ERP; Worldwide Web; CRM; SCM; Custom; Virtualized Applications; Virtualized Resources; Virtualized Information]

The Missing Layer
• The flexibility of a Grid cannot be realized
  • If application modules are too tightly coupled to data
    – If the creation or discovery of new kinds of data needs continuous UI or application code changes
  • If we cannot make more data machine readable
  • If we cannot provide seamless access to all kinds of data
  • If we cannot relate information across different sources, or analyze heterogeneous information

Relating Information
• Search provides random access to data across sources
• Taxonomic classification provides dynamic categories which can be used to navigate better
• Ontologies help describe and relate information across sources
• Better Decisions

Oracle10g Value Proposition
Secure RDF Data Management
• Highly scalable
• Single source of truth
• Strong Security
• Real-time information updates
• Integrate semantic information from multiple sources
• Enhanced Business and Concept Mapping
[Diagram labels: SOA Mediation Services; Ontology Engineering; Operational Intelligence; ETL; Inferencing Engines; Ontology-based Search]

RDF Support in Oracle 10g R2

Oracle 10g RDF Approach:
• Provide an open and generic network data model and analysis platform for semantic applications.
• Extend the existing Oracle10g network data model (NDM) to support RDF object types
• Perform SQL-based graph analysis
• Support for user-defined rules, rulebases, rules indexes
• RDF Data Model with RDFS inferencing and support for user-defined rules
• Enable combined SQL query of enterprise database and RDF graphs
• Support large graphs (millions to billions of triples)
• Easily extensible by 3rd party tools/apps

Oracle RDF Components
RDF Data Model
• Model: RDF graph consisting of a set of triples
• Rulebase: RDFS and user-defined rules
• Rules Index: Inferred triples (on applying a rulebase to a model)
RDF Query
• SDO_RDF_MATCH Table Function for SQL level access to RDF data
• SQL based approach (instead of a new language approach)
• Graph specification syntax based on SPARQL
• Benefits:
  • Leverage powerful SQL constructs to process RDF match results
  • Combine SQL queries without staging

Components
[Diagram: application tables A1, A2, …, An; Model 1, Model 2, …, Model n; Rulebase 1, Rulebase 2, …, Rulebase m; Rules Index 1, Rules Index 2, …, Rules Index p; operations DDL, Load, DML, RDF Query]

RDF Query

RDF Querying Problem
• Given
  • An RDF dataset (graphs) to be searched
  • A graph-pattern containing a set of variables
• Find
  • Subgraphs that match the graph-pattern
• Return
  • Sets of variable bindings – each set corresponds to a matching subgraph (substituting the bindings into the graph-pattern produces the subgraph)

RDF Query: Example
Graph-pattern: Find <grandpa, parent, grandchild>
  (?x :fatherOf ?y) (?y :parentOf ?z)
Bindings:
  x = :John   y = :Suzie   z = :Cathy
  x = :John   y = :Suzie   z = :Jack
  x = :John   y = :Matt    z = :Tom
  x = :John   y = :Matt    z = :Cindy
Matching subgraphs:
  (:John :fatherOf :Suzie) (:Suzie :parentOf :Cathy)
  (:John :fatherOf :Suzie) (:Suzie :parentOf :Jack)
  (:John :fatherOf :Matt)  (:Matt :parentOf :Tom)
  (:John :fatherOf :Matt)  (:Matt :parentOf :Cindy)

RDF Querying Approach
• New language approach
  • Create new (declarative, SQL-like) languages, e.g., RQL, SeRQL, TRIPLE, Versa, SPARQL, RDQL, RDFQL, SquishQL
• SQL-based approach
  • Introduces a SQL table function SDO_RDF_MATCH that accepts RDF queries
  • Benefits
    – Leverage powerful constructs of SQL to process RDF Query results
    – Combine with SQL queries without staging

Embedding RDF Query in SQL
  SELECT …
  FROM …, TABLE ( RDF Query (expressed via SDO_RDF_MATCH invocation) ) t, …
  WHERE …

SDO_RDF_MATCH Table Function
• Input Parameters
  SDO_RDF_MATCH (
    Query,        graph-pattern (with variables)
    Models,       set of RDF models
    Rulebases,    set of rulebases (e.g., RDFS)
    Aliases,      aliases for namespaces
    Filter        additional selection criteria
  )
• Return type in definition is AnyDataSet
• Actual return type is determined at compile time based on the arguments for each specific invocation

RDF Query: Example
  select * from TABLE(SDO_RDF_MATCH(
    '(?f rdf:type :Female)',   -- find all the females in the family
    SDO_RDF_Models('family'),
    null,
    SDO_RDF_Aliases(
      SDO_RDF_Alias('', 'http://www.example.org/family/')),
    null));
Table Function returns a two-column table
  f            varchar2
  f$rdfVTYP    varchar2

Multiple matching representations
• Select USCities with 37" of annual rainfall
  SELECT n city, r rainfall
  FROM TABLE(SDO_RDF_MATCH(
    '(?c noaa:annInchRainfall "37.0"^^xsd:decimal) (?c noaa:annInchRainfall ?r) (?c usc:name ?n)',
    SDO_RDF_Models('us_territory'),
    null,
    SDO_RDF_Aliases(
      SDO_RDF_Alias('noaa','http://www.ncdc.noaa.gov/oa/climate/online/data#'),
      SDO_RDF_Alias('usc','http://www.daml.ri.cmu.edu/ont/USCity.daml#')),
    null));

  CITY                                     RAINFALL
  ---------------------------------------- ----------
  Grand Rapids                             37
  Rockford                                 37
  Portland                                 37.0

Matching multiple representations of a value
• The same point in value space may have multiple representations
  • “37”^^xsd:integer
  • “37”^^xsd:positiveInteger
  • “037”^^xsd:integer
  • “+37.00”^^xsd:decimal
• SDO_RDF_MATCH automatically resolves these

Join with SQL tables: Example
• Display a map of states in New England
  SELECT a.name state, b.geometry geom
  FROM TABLE(SDO_RDF_MATCH(
    '(usrs:NewEngland usrs:memberstate ?s) (?s usrs:name ?name)',
    SDO_RDF_Models('us_territory'),
    SDO_RDF_Aliases(
      SDO_RDF_Alias('usrs','http://www.daml.ri.cmu.edu/ont/USRegionState.daml#'),
      …)) a,
    states b
  WHERE a.name=b.state_name;

Join: Example 2
• List population of cities in New England
• Three datasets used
  • RDF dataset to determine states in New England
  • Census dataset of cities and their 1990 census population figures
  • Census dataset with state, city, and census block boundaries

Join Example 2
  SELECT c.pop90 Population, c.city || ', ' || c.state_abrv City
  from cities c, states s
  where s.state in
    (SELECT a.name state_name
     FROM TABLE(SDO_RDF_MATCH(
       '(usrs:NewEngland usrs:memberstate ?s) (?s usrs:name ?name)',
       SDO_RDF_Models('us_territory'),
       SDO_RDF_Rulebases('RDFS','us_territory_rb'),
       SDO_RDF_Aliases(
         SDO_RDF_Alias('usrs','http://www.daml.ri.cmu.edu/ont/USRegionState.daml#'),
         SDO_RDF_Alias('usc','http://www.daml.ri.cmu.edu/ont/USCity.daml#')),
       null)) a)
    AND sdo_inside(c.location, s.geom)='TRUE'
  order by population desc;

  POPULATION  CITY
  ----------  ------------------------------
      574283  Boston, MA
      169759  Worcester, MA
      160728  Providence, RI
      156983  Springfield, MA
      141686  Bridgeport, CT
      139739  Hartford, CT
      130474  New Haven, CT
      108961  Waterbury, CT
      108056  Stamford, CT
      103439  Lowell, MA
  10 rows selected.

Inference

Components
[Components diagram repeated]

Rulebases

Rulebase: Overview
• Each rulebase consists of a set of rules
• Each rule consists of
  • antecedent: graph-pattern
  • filter condition (optional)
  • consequent: graph-pattern
• One or more rulebases may be used with relevant RDF models (graphs) to infer new data

Rulebase: Example
Oracle supplied, pre-loaded rulebases: e.g., RDFS
  rdfs:subClassOf is transitive and reflexive
  Antecedent: '(?x rdf:type ?y) (?y rdfs:subClassOf ?c)'
  Consequent: '(?x rdf:type ?c)'
  Antecedent: '(?x ?p ?y) (?p rdfs:domain ?c)'
  Consequent: '(?x rdf:type ?c)'
Rules in a rulebase us_territory_rb:
  Antecedent: '(?x usc:state ?y) (?y usrs:region ?z)'
  Consequent: '(?x usrs:cityRegion ?z)'

Rules Indexes

Rules Index: Overview
• A rules index is created on an RDF dataset (consisting of a set of RDF models and a set of RDF rulebases)
• A rules index contains RDF triples inferred from the model-rulebase combination

Rules Index: Example
• A rules index may be created on a dataset consisting of
  • US territory (city, state, region) RDF data, and
  • us_territory_rb rulebase (shown earlier)
• The rules index will contain inferred triples showing RDFS entailment and cityRegion relationships

RDF Query with Inference

SDO_RDF_MATCH with Rulebases
• Arguments
  • Graph Pattern
  • RDF Data set
    – A set of RDF models
    – A set of Rulebases
  • Filters
  • Aliases
• Example
  • SDO_RDF_Rulebases('RDFS', 'us_territory_rb')

Query w/ RDFS Inference:
USCity is a subClassOf City, hence a USCity is a City
  select cn ME_CITIES
  from TABLE(SDO_RDF_MATCH(
    '(?n rdf:type ac:City) (?n usc:state usrs:ME) (?n usc:name ?cn)',
    SDO_RDF_Models('us_territory'),
    SDO_RDF_Rulebases('RDFS'),
    SDO_RDF_Aliases(
      SDO_RDF_Alias('ac','http://www.daml.ri.cmu.edu/ont/City.daml#'),
      SDO_RDF_Alias('usrs','http://www.daml.ri.cmu.edu/ont/USRegionState.daml#'),
      SDO_RDF_Alias('usc','http://www.daml.ri.cmu.edu/ont/USCity.daml#')),
    null));

  ME_CITIES
  ------------------------------------------------------
  Augusta
  Portland
  Lewiston

Query w/o RDFS Inference:
  select cn ME_CITIES
  from TABLE(SDO_RDF_MATCH(
    '(?n rdf:type ac:City) (?n usc:state usrs:ME) (?n usc:name ?cn)',
    SDO_RDF_Models('us_territory'),
    null,
    SDO_RDF_Aliases(
      SDO_RDF_Alias('ac','http://www.daml.ri.cmu.edu/ont/City.daml#'),
      SDO_RDF_Alias('usrs','http://www.daml.ri.cmu.edu/ont/USRegionState.daml#'),
      SDO_RDF_Alias('usc','http://www.daml.ri.cmu.edu/ont/USCity.daml#')),
    null));

  ME_CITIES
  ------------------------------------------------------
  /* no rows selected */

Operations

Components
[Components diagram repeated]

RDF Model operations

Model: DDL
• Procedures provided as part of the API to
  • Create a model
  • Drop a model
• When a user creates a model, a database view gets created automatically
  • RDFM_us_territory
• A model corresponds to a column of type SDO_RDF_TRIPLE_S in an application table
• Each model has exactly one application table column associated with it

Model: DDL
Creating a Model
• Create an Application Table
  CREATE TABLE us_territory_table ( …, us_territory_triple SDO_RDF_TRIPLE_S, …);
• Create a Model
  exec SDO_RDF.CREATE_RDF_MODEL('us_territory', 'us_territory_table', 'us_territory_triple');
• Automatically creates a database view RDFM_us_territory (…)

Model: DML
• SQL DML commands may be used on an application table to effect DML (i.e., triple insert, delete, and update) on the corresponding model
• Insert Triples
  INSERT INTO us_territory_table VALUES (1,
    SDO_RDF_TRIPLE_S('us_territory',
      '<http://www.daml.ri.cmu.edu/ont/USCity.daml#anchorageak>',
      '<http://www.daml.ri.cmu.edu/ont/USCity.daml#name>',
      'Anchorage'));

Model: Security
• The creator of the application table corresponding to a model can grant privileges to other users
• To perform DML on a model, a user must have DML privileges for the corresponding application table
• The creator of a model can grant SELECT privileges on the corresponding database view to other users
• A user can query only those models for which s/he has SELECT privileges (via corresponding DB views)
• Only the creator of a model can drop the model

Model: Views
• RDFM_<model-name>
  • Contains list of triples for an RDF model

Rulebase operations

Rulebase: DDL
• Procedures provided as part of the API may be used to
  • Create a rulebase
    create_rulebase('us_territory_rb');
  • Drop a rulebase
    drop_rulebase('us_territory_rb');
• When a user creates a rulebase, a database view gets created automatically
  • RDFR_us_territory_rb (rule_name, antecedents, filter, consequents, aliases)

Rulebase: DML
• SQL DML commands may be used on the database view corresponding to a target rulebase to insert, delete, and update rules
  insert into RDFR_us_territory_rb values(
    'cityRegion_rule',
    '(?x usc:state ?y) (?y usrs:region ?z)',
    NULL,
    '(?x usc:cityRegion ?z)',
    SDO_RDF_Aliases(…));

Rulebase: Security
• Creator of a rulebase can grant privileges on the corresponding database view to other users
• Performing DML operations requires the invoker to have appropriate privileges on the database view
• Only the creator of a rulebase can drop the rulebase

Rulebase: Views
• RDF_RULEBASE_INFO
  • Contains the list of rulebases
  • For each rulebase, contains additional information (such as creator, view name, etc.)
• RDFR_<rulebase-name>
  • Shows the content of each rulebase: its list of rules and, for each rule, its name, antecedents, filter, consequents, and aliases

Rules Index operations

Rules Index: DDL
• Procedures provided as part of the API to
  • Create a rules index
    create_rules_index('us_territory_rb_rix',
      SDO_RDF_Models('us_territory'),
      SDO_RDF_Rulebases('rdfs','us_territory_rb'));
  • Drop a rules index
    drop_rules_index('us_territory_rb_rix');
• When a user creates a rules index, a database view gets created automatically
  • RDFI_us_territory_rb_rix (…)

Rules Index: Dependencies
• Content of a rules index depends upon the content of each element of its dataset
  • Any modification to the models or rulebases in its dataset invalidates the rules index
    – Insertion: VALID → INCOMPLETE
    – Deletion/Update: VALID → INVALID
  • Dropping a model or rulebase will drop dependent rules indexes automatically.

Rules Index: Security
• To create a rules index on an RDF dataset (models and rulebases), a user needs SELECT privileges on those models and rulebases
• Creator of a rules index holds SELECT privilege on the rules index and may grant this privilege to other users
• Only the creator of a rules index can drop it

Rules Index: Views
• RDFI_<rules-index-name>
  • Contains the list of inferred triples
• RDF_RULESINDEX_INFO
  • Contains the list of rules indexes
  • For each rules index, contains additional information (such as creator, status, etc.)
• RDF_RULESINDEX_DATASETS
  • For every rules index, contains the names of its models and rulebases

Summary
• Comprehensive RDF support, fully integrated into SQL, in Oracle 10g Release 2
  • Models (Graphs)
  • Rulebases
  • Rules Indexes
  • Query using SDO_RDF_MATCH table function
• Documentation and White Papers
  http://www.oracle.com/technology/tech/semantic_technologies/index.html

Loading RDF Data into Oracle
• Java API provided to load RDF data in N-Triple format
• Loading times (10.2.0.2): approx. 2.5 M triples/hour on an Intel Xeon 3 GHz CPU with 3 GB RAM

Additional Platform Features
• Clustered database servers
• Partitioning: Oracle table partitioning in support of very large graphs
• Parallelism: Oracle parallelism to support load, index and query of very large graph models
• Data Loading: Import and export data in triple formats (e.g. N-triple) using Oracle’s SQL*Loader utility
• Versioning
• Text Search
• Support for unstructured data types (e.g. XML, spatial, images, georaster imagery, audio, video, text)
• XML tools (XDB, XQuery)
• Middleware: Integrated with germane Oracle Application Server technology (BPEL, XSLT, UDDI, portal, …)

Performance Metrics
• Batch Loading
  • 1 million triples loaded in 27 minutes
• Querying
  • 80M triples
  • RDF_MATCH based query performance is scalable, with retrieval cost per result row almost the same as dataset size changes
• WordNet (0.5M triples)
  • Sub-second query response
• UniProt (80M triples)
  • Query Results Range: 0.5 – 5 seconds
• See 2005 VLDB Paper

Large-Scale RDF Data
• UniProt – 10M, 20M, 40M, 80M triples
• 6 example queries given with UniProt
• Number of matches remains constant as dataset size changes (ROWNUM)

UniProt Sample Queries
  Description                                                                        Pattern                            Projection  Result limit
  Q1: Display the ranges of transmembrane regions                                    6 triples, 5 vars                  3 vars      15000 rows
  Q2: List proteins with publications by authors with matching names                 5 triples, 5 vars, 1 LIKE pred.    3 vars      10 rows
  Q3: Count the number of times a publication by a specific author is cited          3 triples, 2 vars                  0 vars      32 rows
  Q4: List resources that are related to proteins annotated with a specific keyword  3 triples, 2 vars                  1 var       3000 rows
  Q5: List genes associated with human diseases                                      7 triples, 5 vars                  3 vars      750 rows
  Q6: List recently modified entries                                                 2 triples, 2 vars, 1 range pred.   2 vars      8000 rows

Query Response Times
RDF_MATCH Performance Scalability
                 Q1      Q2       Q3       Q4      Q5      Q6
  10 M Triples   0.86    < 0.01   < 0.01   0.03    0.18    0.46
  20 M Triples   0.95    < 0.01   < 0.01   0.03    0.19    0.47
  40 M Triples   0.96    < 0.01   < 0.01   0.03    0.18    0.47
  80 M Triples   1.03    < 0.01   < 0.01   0.03    0.20    0.49
  Maximum        0.054   0.002    0.002    0.011   0.065   0.07

More Information
• www.oracle.com/technology/tech/semantic_technologies
• Product Development contacts:
  • Product Management
    – Xavier Lopez (xavier.lopez@oracle.com)
    – Jayant Sharma (jayant.sharma@oracle.com)
  • Development
    – Souri Das (souripriya.das@oracle.com)
    – Melliyal (Melli) Annamalai (melliyal.annamalai@oracle.com)
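
Appendix: graph-pattern matching and rule application, sketched in Python

The RDF Querying Problem and Rulebase slides describe two operations: finding the variable bindings that make a graph-pattern match a set of triples, and applying an antecedent/consequent rule to infer new triples. The sketch below illustrates both on plain Python sets. It is only a toy model of the semantics and is not how SDO_RDF_MATCH or rules indexes are implemented in Oracle. The function names (match_triple, match_pattern, apply_rule) are hypothetical. The family triples are the ones implied by the "RDF Query: Example" slide; the two city/state triples are made up for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]


def match_triple(pattern: Triple, triple: Triple, bindings: Bindings) -> Optional[Bindings]:
    """Extend `bindings` so that `pattern` equals `triple`; None if impossible."""
    new = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # A variable binds on first use and must agree on later uses.
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new


def match_pattern(patterns: List[Triple], graph: Set[Triple]) -> List[Bindings]:
    """Return every set of variable bindings that matches all triple patterns."""
    results: List[Bindings] = [{}]
    for pattern in patterns:
        extended_results = []
        for bindings in results:
            for triple in graph:
                extended = match_triple(pattern, triple, bindings)
                if extended is not None:
                    extended_results.append(extended)
        results = extended_results
    return results


def apply_rule(antecedent: List[Triple], consequent: Triple, graph: Set[Triple]) -> Set[Triple]:
    """Return triples produced by one pass of the rule that are not already in the graph."""
    inferred = set()
    for bindings in match_pattern(antecedent, graph):
        triple = tuple(bindings.get(term, term) for term in consequent)
        if triple not in graph:
            inferred.add(triple)
    return inferred


# Triples implied by the "RDF Query: Example" slide.
FAMILY = {
    (":John", ":fatherOf", ":Suzie"), (":John", ":fatherOf", ":Matt"),
    (":Suzie", ":parentOf", ":Cathy"), (":Suzie", ":parentOf", ":Jack"),
    (":Matt", ":parentOf", ":Tom"), (":Matt", ":parentOf", ":Cindy"),
}
GRANDPA = [("?x", ":fatherOf", "?y"), ("?y", ":parentOf", "?z")]

for b in sorted(match_pattern(GRANDPA, FAMILY), key=lambda b: (b["?y"], b["?z"])):
    print(b["?x"], b["?y"], b["?z"])

# Hypothetical data for the us_territory_rb rule shown on the Rulebase slide.
TERRITORY = {
    ("usc:portlandme", "usc:state", "usrs:ME"),
    ("usrs:ME", "usrs:region", "usrs:NewEngland"),
}
CITY_REGION_RULE = (
    [("?x", "usc:state", "?y"), ("?y", "usrs:region", "?z")],
    ("?x", "usrs:cityRegion", "?z"),
)
print(apply_rule(*CITY_REGION_RULE, TERRITORY))
```

The nested-loop matcher is deliberately naive: it rescans the whole graph for every triple pattern, which is fine for a handful of triples but not for the graph sizes discussed in the performance slides. The inferred set returned by apply_rule plays the role of a rules index in this toy model, since it holds only the triples derived from the model-rulebase combination.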