“This presentation is for informational purposes only and may not be incorporated into a contract or agreement.”
This document is for informational purposes. It is not a commitment to deliver any
material, code, or functionality, and should not be relied upon in making purchasing
decisions. The development, release, and timing of any features or functionality
described in this document remains at the sole discretion of Oracle. This document
in any form, software or printed matter, contains proprietary information that is the
exclusive property of Oracle. This document and information contained herein may
not be disclosed, copied, reproduced or distributed to anyone outside Oracle
without prior written consent of Oracle. This document is not part of your license
agreement nor can it be incorporated into any contractual agreement with Oracle or
its subsidiaries or affiliates.
Jayant Sharma
Technical Director, Spatial
Jayant.Sharma@oracle.com
An Introduction to the
Oracle Database 10gR2
RDF storage & query model
Outline
• Technology Trends
• Why Oracle RDF
• Oracle RDF model
• Summary
Enterprise Architecture Trends
• Service Oriented Architecture
• Enterprise Grid Computing
• Infrastructure Consolidation
[Diagram labels: Flexibility, Range of solutions, Resource Sharing]
Infrastructure Consolidation
Primary Types of Consolidation
• Centralization: 24.9%
• Storage Consolidation: 24.5%
• Physical Consolidation: 22.3%
• Application Integration: 14.0%
• Data integration: 13.8%
• Other: 0.5%
[Chart labels: Standardize, Consolidate, Automate]
Source: IDC Worldwide Server Consolidation Forecast & Analysis, 2002 - 2006
Grid Computing
• Resource pooling & sharing
• Low-cost modular hardware
• Incremental scaling
• Highly Available, scalable & performant
• Self monitoring & self managing
• Dynamically configurable infrastructure
Service Oriented Architecture
[Diagram: Service Oriented Architecture layered over Grid Infrastructure. Applications shown: ERP, Worldwide Web, CRM, SCM, Custom. Layers: Virtualized Applications, Virtualized Resources, Virtualized Information]
The Missing Layer
• The flexibility of a Grid cannot be realized
  • If application modules are too tightly coupled to data
    – If the creation or discovery of new kinds of data needs continuous UI or application code changes
  • If we cannot make more data machine readable
  • If we cannot provide seamless access to all kinds of data
  • If we cannot relate information across different sources, or analyze heterogeneous information
Relating Information
• Search provides random access to data across sources
• Taxonomic classification provides dynamic categories which can be used to navigate better
• Ontologies help describe and relate information across sources
• Better Decisions
Oracle10g Value Proposition
Secure RDF Data Management
• Highly scalable
• Single source of truth
• Strong Security
• Real-time information updates
• Integrate semantic information from multiple sources
• Enhanced Business and Operational Intelligence
[Diagram labels: SOA Mediation Services, Ontology Engineering, Concept Mapping, ETL, Inferencing Engines, Ontology-based Search]
RDF Support in Oracle 10g R2
Oracle 10g RDF Approach:
• Provide an open and generic network data model and
analysis platform for semantic applications.
• Extended existing Oracle10g network data model
(NDM) to support RDF object types
• Perform SQL-based graph analysis
• Support for user-defined rules, rulebases, rule indexes
• RDF Data Model with RDFS inferencing and support for
user-defined rules
• Enable combined SQL query of enterprise database
and RDF graphs
• Support large graphs (millions to billions of triples)
• Easily extensible by 3rd party tools/apps
Oracle RDF Components
RDF Data Model
• Model: RDF graph consisting of a set of triples
• Rulebase: RDFS and user-defined rules
• Rules Index: Inferred triples (on applying a rulebase to a model)
RDF Query
• SDO_RDF_MATCH Table Function for SQL level access to RDF data
• SQL based approach (instead of a new language approach)
• Graph specification syntax based on SPARQL
• Benefits:
  • Leverage powerful SQL constructs to process RDF match results
  • Combine with SQL queries without staging
Components
[Diagram: application tables A1 … An (with DDL, Load, and DML operations) correspond to Model 1 … Model n; Rulebase 1 … Rulebase m; Rules Index 1 … Rules Index p; RDF Query]
RDF Query
RDF Querying Problem
• Given
  • An RDF dataset (graphs) to be searched
  • A graph-pattern containing a set of variables
• Find
  • Subgraphs that match the graph-pattern
• Return
  • Sets of variable bindings
    – each set corresponds to a matching subgraph (substitution in graph-pattern produces subgraph)
RDF Query: Example
Graph-pattern: Find <grandpa, parent, grandchild>
(?x :fatherOf ?y) (?y :parentOf ?z)
Bindings:
x = :John y = :Suzie z = :Cathy
x = :John y = :Suzie z = :Jack
x = :John y = :Matt z = :Tom
x = :John y = :Matt z = :Cindy
Matching subgraphs:
(:John :fatherOf :Suzie) (:Suzie :parentOf :Cathy)
(:John :fatherOf :Suzie) (:Suzie :parentOf :Jack)
(:John :fatherOf :Matt) (:Matt :parentOf :Tom)
(:John :fatherOf :Matt) (:Matt :parentOf :Cindy)
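The matching step in this example can be illustrated with a small in-memory sketch. This is a conceptual illustration in Python, not Oracle's implementation: the `match` function and the tuple representation of triples are assumptions made for the example, and the six triples are simply the ones that appear in the matching subgraphs on this slide.

```python
# Conceptual sketch only: a tiny in-memory matcher, not how SDO_RDF_MATCH works internally.
triples = {
    (":John", ":fatherOf", ":Suzie"),
    (":John", ":fatherOf", ":Matt"),
    (":Suzie", ":parentOf", ":Cathy"),
    (":Suzie", ":parentOf", ":Jack"),
    (":Matt", ":parentOf", ":Tom"),
    (":Matt", ":parentOf", ":Cindy"),
}


def match(pattern, data):
    """Return every set of variable bindings that maps the graph pattern onto data."""
    results = [{}]
    for triple_pattern in pattern:
        extended = []
        for binding in results:
            for triple in data:
                candidate = dict(binding)
                # A variable must bind consistently; a constant must match exactly.
                if all(
                    candidate.setdefault(term, value) == value
                    if term.startswith("?") else term == value
                    for term, value in zip(triple_pattern, triple)
                ):
                    extended.append(candidate)
        results = extended
    return results


pattern = [("?x", ":fatherOf", "?y"), ("?y", ":parentOf", "?z")]
bindings = match(pattern, triples)
for b in sorted(bindings, key=lambda b: (b["?y"], b["?z"])):
    print(b["?x"], b["?y"], b["?z"])
```

Each returned dictionary is one set of variable bindings; substituting it into the graph pattern reproduces one matching subgraph.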
RDF Querying Approach
• New language approach
  • Create new (declarative, SQL-like) languages, e.g., RQL, SeRQL, TRIPLE, Versa, SPARQL, RDQL, RDFQL, SquishQL
• SQL-based approach
  • Introduces a SQL table function SDO_RDF_MATCH that accepts RDF queries
  • Benefits
    – Leverage powerful constructs of SQL to process RDF Query results
    – Combine with SQL queries without staging
Embedding RDF Query in SQL
SELECT …
FROM …, TABLE (
  <RDF Query, expressed via SDO_RDF_MATCH invocation>
) t, …
WHERE …
SDO_RDF_MATCH Table Func
• Input Parameters
SDO_RDF_MATCH (
  Query,      → graph-pattern (with variables)
  Models,     → set of RDF models
  Rulebases,  → set of rulebases (e.g., RDFS)
  Aliases,    → aliases for namespaces
  Filter      → additional selection criteria
)
• Return type in definition is AnyDataSet
• Actual return type is determined at compile time based on
the arguments for each specific invocation
RDF Query: Example
select * from TABLE(SDO_RDF_MATCH(
'(?f rdf:type :Female)',
-- find all the females in the family
SDO_RDF_Models('family'),
null,
SDO_RDF_Aliases(
SDO_RDF_Alias('', 'http://www.example.org/family/')),
null));
Table Function returns a two-column table
f varchar2
f$rdfVTYP varchar2
Multiple matching representations
• Select USCities with 37" of annual rainfall
SELECT n city, r rainfall
FROM TABLE(SDO_RDF_MATCH(
  '(?c noaa:annInchRainfall "37.0"^^xsd:decimal)
   (?c noaa:annInchRainfall ?r) (?c usc:name ?n)',
  SDO_RDF_Models('us_territory'), null,
  SDO_RDF_Aliases(
    SDO_RDF_Alias('noaa','http://www.ncdc.noaa.gov/oa/climate/online/data#'),
    SDO_RDF_Alias('usc','http://www.daml.ri.cmu.edu/ont/USCity.daml#')),
  null));

CITY                                     RAINFALL
---------------------------------------- ----------
Grand Rapids                             37
Rockford                                 37
Portland                                 37.0
Matching multiple representations of a
value
• The same point in value space may have multiple representations
  • "37"^^xsd:integer
  • "37"^^xsd:positiveInteger
  • "037"^^xsd:integer
  • "+37.00"^^xsd:decimal
• SDO_RDF_MATCH automatically resolves these
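The idea of comparing literals in value space rather than by their lexical form can be sketched as follows. This is an illustrative Python example, not a description of how SDO_RDF_MATCH does it: the `canonical` helper is a hypothetical name, and it ignores datatype checking and treats every literal as a decimal number.

```python
from decimal import Decimal


def canonical(lexical_form):
    """Map a numeric lexical form to its point in value space (sketch: decimals only)."""
    return Decimal(lexical_form)


# The lexical forms listed above all denote the same value.
forms = ["37", "037", "+37.00", "37.0"]
values = {canonical(f) for f in forms}
print(len(values))

# Matching stored rainfall literals against the query literal "37.0".
rainfall = [("Grand Rapids", "37"), ("Rockford", "37"), ("Portland", "37.0")]
matches = [city for city, lexical in rainfall if canonical(lexical) == canonical("37.0")]
print(matches)
```

A plain string comparison would match only Portland here; comparing canonical values matches all three cities, which is the behavior shown in the rainfall query result.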
Join with SQL tables: Example
• Display a map of states in New England
SELECT a.name state, b.geometry geom
FROM TABLE(SDO_RDF_MATCH(
  '(usrs:NewEngland usrs:memberstate ?s)
   (?s usrs:name ?name)',
  SDO_RDF_Models('us_territory'),
  null,
  SDO_RDF_Aliases(SDO_RDF_Alias
    ('usrs','http://www.daml.ri.cmu.edu/ont/USRegionState.daml#'),
  …)) a, states b
WHERE a.name=b.state_name;
Join: Example 2
• List population of cities in New England
• Three datasets used
  • RDF dataset to determine states in New England
  • Census dataset of cities and their 1990 census population figures
  • Census dataset with state, city, and census block boundaries
Join Example 2
SELECT c.pop90 Population, c.city || ', ' || c.state_abrv City
from cities c, states s where s.state in
  (SELECT a.name state_name
   FROM TABLE(SDO_RDF_MATCH(
     '(usrs:NewEngland usrs:memberstate ?s) (?s usrs:name ?name)',
     SDO_RDF_Models('us_territory'),
     SDO_RDF_Rulebases('RDFS','us_territory_rb'),
     SDO_RDF_Aliases(
       SDO_RDF_Alias('usrs','http://www.daml.ri.cmu.edu/ont/USRegionState.daml#'),
       SDO_RDF_Alias('usc','http://www.daml.ri.cmu.edu/ont/USCity.daml#')),
     null)) a) AND
  sdo_inside(c.location, s.geom)='TRUE'
order by population desc;

POPULATION CITY
---------- ---------------------------------------------
    574283 Boston, MA
    169759 Worcester, MA
    160728 Providence, RI
    156983 Springfield, MA
    141686 Bridgeport, CT
    139739 Hartford, CT
    130474 New Haven, CT
    108961 Waterbury, CT
    108056 Stamford, CT
    103439 Lowell, MA
10 rows selected.
Inference
Rulebases
Rulebase: Overview
• Each rulebase consists of a set of rules
• Each rule consists of
  • antecedent: graph-pattern
  • filter condition (optional)
  • consequent: graph-pattern
• One or more rulebases may be used with relevant RDF models (graphs) to infer new data
Rulebase: Example
Oracle supplied, pre-loaded rulebases: e.g., RDFS
rdfs:subClassOf is transitive and reflexive
Antecedent: ‘(?x rdf:type ?y) (?y rdfs:subClassOf ?c)’
Consequent: ‘(?x rdf:type ?c)’
Antecedent: ‘(?x ?p ?y) (?p rdfs:domain ?c)’
Consequent: ‘(?x rdf:type ?c)’
Rules in a rulebase us_territory_rb:
Antecedent: ‘(?x usc:state ?y) (?y usrs:region ?z)’
Consequent: ‘(?x usrs:cityRegion ?z)’
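How a rule's antecedent and consequent produce new triples can be sketched in a few lines. This is a conceptual Python illustration, not Oracle's rule engine: `match` and `apply_rule` are hypothetical helpers, and the three sample triples (including the identifiers `usc:portlandme`, `usrs:ME`, and `usrs:AK`) are invented for the example. Only the rule itself is taken from the slide.

```python
# Conceptual sketch only; not Oracle's rule engine.

def match(pattern, data):
    """Return all variable bindings that map the graph pattern onto triples in data."""
    results = [{}]
    for triple_pattern in pattern:
        extended = []
        for binding in results:
            for triple in data:
                candidate = dict(binding)
                if all(
                    candidate.setdefault(term, value) == value
                    if term.startswith("?") else term == value
                    for term, value in zip(triple_pattern, triple)
                ):
                    extended.append(candidate)
        results = extended
    return results


def apply_rule(antecedent, consequent, data):
    """Instantiate the consequent for every antecedent match; return only new triples."""
    inferred = set()
    for binding in match(antecedent, data):
        triple = tuple(binding.get(term, term) for term in consequent)
        if triple not in data:
            inferred.add(triple)
    return inferred


# Hypothetical sample data.
data = {
    ("usc:portlandme", "usc:state", "usrs:ME"),
    ("usrs:ME", "usrs:region", "usrs:NewEngland"),
    ("usc:anchorageak", "usc:state", "usrs:AK"),
}

# The us_territory_rb rule shown above.
antecedent = [("?x", "usc:state", "?y"), ("?y", "usrs:region", "?z")]
consequent = ("?x", "usrs:cityRegion", "?z")

inferred = apply_rule(antecedent, consequent, data)
print(inferred)
```

Only the Portland triple yields an inferred `usrs:cityRegion` triple, because the sample data has no region triple for `usrs:AK`.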
Rules Indexes
Rules Index: Overview
• A rules index is created on an RDF dataset
(consisting of a set of RDF models and a set of RDF
rulebases)
• A rules index contains RDF triples inferred from the
model-rulebase combination
Rules Index: Example
• A rules index may be created on a dataset consisting of
  • US territory (city, state, region) RDF data, and
  • us_territory_rb rulebase (shown earlier)
• The rules index will contain inferred triples showing RDFS entailment and cityRegion relationships
RDF Query with Inference
SDO_RDF_MATCH with Rulebases
• Arguments
  • Graph Pattern
  • RDF Data set
    – A set of RDF models
    – A set of Rulebases
  • Filters
  • Aliases
• Example
  • SDO_RDF_Rulebases ('RDFS', 'us_territory_rb')
Query w/ RDFS Inference:
USCity is a subclassOf City hence a USCity is a City
select cn ME_CITIES from TABLE(SDO_RDF_MATCH(
  '(?n rdf:type ac:City) (?n usc:state usrs:ME) (?n usc:name ?cn)',
  SDO_RDF_Models('us_territory'),
  SDO_RDF_Rulebases('RDFS'),
  SDO_RDF_Aliases(
    SDO_RDF_Alias(
      'ac','http://www.daml.ri.cmu.edu/ont/City.daml#'),
    SDO_RDF_Alias(
      'usrs','http://www.daml.ri.cmu.edu/ont/USRegionState.daml#'),
    SDO_RDF_Alias(
      'usc','http://www.daml.ri.cmu.edu/ont/USCity.daml#')), null));

ME_CITIES
------------------------------------------------------
Augusta
Portland
Lewiston
Query w/o RDFS Inference:
select cn ME_CITIES from TABLE(SDO_RDF_MATCH(
  '(?n rdf:type ac:City) (?n usc:state usrs:ME) (?n usc:name ?cn)',
  SDO_RDF_Models('us_territory'),
  null,
  SDO_RDF_Aliases(
    SDO_RDF_Alias(
      'ac','http://www.daml.ri.cmu.edu/ont/City.daml#'),
    SDO_RDF_Alias(
      'usrs','http://www.daml.ri.cmu.edu/ont/USRegionState.daml#'),
    SDO_RDF_Alias(
      'usc','http://www.daml.ri.cmu.edu/ont/USCity.daml#')), null));

ME_CITIES
------------------------------------------------------
/* no rows selected */
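The difference between the two queries comes down to whether the rdfs:subClassOf rule has been applied. A minimal Python sketch of that effect follows; it is not Oracle's implementation. The function names are hypothetical, the resource identifiers (`usc:augustame` and so on) are invented, and the sketch assumes the cities are stored only as instances of `usc:USCity`.

```python
def rdfs_subclass_closure(data):
    """Apply (?x rdf:type ?y) (?y rdfs:subClassOf ?c) => (?x rdf:type ?c) to a fixpoint."""
    closed = set(data)
    while True:
        new = {
            (x, "rdf:type", c)
            for (x, p1, y) in closed if p1 == "rdf:type"
            for (y2, p2, c) in closed if p2 == "rdfs:subClassOf" and y2 == y
        } - closed
        if not new:
            return closed
        closed |= new


def instances_of(cls, data):
    """Answer the pattern (?n rdf:type cls) against the given triples."""
    return sorted(s for (s, p, o) in data if p == "rdf:type" and o == cls)


# Hypothetical sample data: cities are typed as usc:USCity only.
model = {
    ("usc:USCity", "rdfs:subClassOf", "ac:City"),
    ("usc:augustame", "rdf:type", "usc:USCity"),
    ("usc:portlandme", "rdf:type", "usc:USCity"),
    ("usc:lewistonme", "rdf:type", "usc:USCity"),
}

without_inference = instances_of("ac:City", model)
with_inference = instances_of("ac:City", rdfs_subclass_closure(model))
print(without_inference)
print(with_inference)
```

Without the inferred triples the pattern `(?n rdf:type ac:City)` finds nothing; once the subclass rule has been applied, all three cities qualify.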
Operations
RDF Model operations
Model: DDL
• Procedures provided as part of the API to
  • Create a model
  • Drop a model
• When a user creates a model, a database view gets created automatically
  • RDFM_us_territory
• A model corresponds to a column of type SDO_RDF_TRIPLE_S in an application table
• Each model has exactly one application table column associated with it
Model: DDL (Creating a Model)
• Create an Application Table
CREATE TABLE us_territory_table (
  …, us_territory_triple SDO_RDF_TRIPLE_S, …);
• Create a Model
exec SDO_RDF.CREATE_RDF_MODEL(
  'us_territory', 'us_territory_table',
  'us_territory_triple');
• Automatically creates a database view
RDFM_us_territory (…)
Model: DML
• SQL DML commands on an application table effect DML (i.e., triple insert, delete, and update) on the corresponding model
• Insert Triples
INSERT INTO us_territory_table VALUES (1,
  SDO_RDF_TRIPLE_S('us_territory',
    '<http://www.daml.ri.cmu.edu/ont/USCity.daml#anchorageak>',
    '<http://www.daml.ri.cmu.edu/ont/USCity.daml#name>',
    'Anchorage'));
Model: Security
• The creator of the application table corresponding to a
model can grant privileges to other users
• To perform DML to a model, a user must have DML
privileges for the corresponding application table
• The creator of a model can grant SELECT privileges on the
corresponding database view to other users
• A user can query only those models for which s/he has
SELECT privileges (via corresponding DB views)
• Only the creator of a model can drop the model
Model: Views
• RDFM_<model-name>
  • Contains list of triples for an RDF model
Rulebase operations
Rulebase: DDL
• Procedures provided as part of the API may be used to
  • Create a rulebase
    create_rulebase('us_territory_rb');
  • Drop a rulebase
    drop_rulebase('us_territory_rb');
• When a user creates a rulebase, a database view gets created automatically
  • RDFR_us_territory_rb (rule_name, antecedents, filter, consequents, aliases)
Rulebase: DML
• SQL DML commands may be used on the
database view corresponding to a target
rulebase to insert, delete, and update rules
insert into RDFR_us_territory_rb values(
  'cityRegion_rule',
  '(?x usc:state ?y) (?y usrs:region ?z)',
  NULL,
  '(?x usc:cityRegion ?z)',
  SDO_RDF_Aliases(…));
Rulebase: Security
• Creator of a rulebase can grant privileges on the
corresponding database view to other users
• Performing DML operations requires invoker to have
appropriate privileges on the database view
• Only the creator of a rulebase can drop the rulebase
Rulebase: Views
• RDF_RULEBASE_INFO
  • Contains the list of rulebases
  • For each rulebase, contains additional information (such as creator, view name, etc.)
• RDFR_<rulebase-name>
  • Shows the content of each rulebase: its list of rules and, for each rule, its name, antecedents, filter, consequents, and aliases
Rules Index operations
Rules Index: DDL
• Procedures provided as part of the API to
  • Create a rules index
    create_rules_index ('us_territory_rb_rix',
      SDO_RDF_Models ('us_territory'),
      SDO_RDF_Rulebases ('rdfs','us_territory_rb'));
  • Drop a rules index
    drop_rules_index ('us_territory_rb_rix');
• When a user creates a rules index, a database view gets created automatically
  • RDFI_us_territory_rb_rix (…)
Rules Index: Dependencies
• Content of a rules index depends upon the content of each element of its dataset
  • Any modification to the models or rulebases in its dataset invalidates the rules index
    – Insertion: VALID → INCOMPLETE
    – Deletion/Update: VALID → INVALID
  • Dropping a model or rulebase will drop dependent rules indexes automatically.
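The status changes listed above can be written down as a small state machine. This Python sketch is illustrative only and is not an Oracle API: the `Status` enum and `next_status` function are hypothetical names. The slide defines only the two transitions out of VALID; the behavior for an index that is already INCOMPLETE or INVALID is an assumption (a further insertion leaves the status unchanged, a deletion or update makes it INVALID).

```python
from enum import Enum


class Status(Enum):
    VALID = "VALID"
    INCOMPLETE = "INCOMPLETE"
    INVALID = "INVALID"


def next_status(status, operation):
    """Return the rules index status after a DML operation on its dataset."""
    if operation in ("delete", "update"):
        # From the slide: VALID -> INVALID. Assumed for the other states as well.
        return Status.INVALID
    if operation == "insert":
        # From the slide: VALID -> INCOMPLETE. Assumed: other states are unchanged.
        return Status.INCOMPLETE if status is Status.VALID else status
    raise ValueError(f"unknown operation: {operation}")


status = Status.VALID
for op in ("insert", "insert", "delete"):
    status = next_status(status, op)
    print(op, "->", status.value)
```

The distinction matters because an INCOMPLETE index is missing some inferred triples but contains no wrong ones, whereas an INVALID index may contain triples that no longer follow from the data.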
Rules Index: Security
• To create a rules index on an RDF dataset (models
and rulebases), user needs to have SELECT
privileges on those models and rulebases
• Creator of a rules index holds SELECT privilege on
the rules index and may grant this privilege to other
users
• Only the creator of a rules index can drop it
Rules Index: Views
• RDFI_<rules-index-name>
  • Contains the list of inferred triples
• RDF_RULESINDEX_INFO
  • Contains the list of rules indexes
  • For each rules index, contains additional information (such as creator, status, etc.)
• RDF_RULESINDEX_DATASETS
  • For every rules index, contains the names of its models and rulebases
Summary
• Comprehensive RDF support, fully integrated into SQL, in Oracle 10g Release 2
  • Models (Graphs)
  • Rulebases
  • Rules Indexes
  • Query using SDO_RDF_MATCH table function
• Documentation and White Papers
  http://www.oracle.com/technology/tech/semantic_technologies/index.html
Loading RDF Data into Oracle
• Java API provided to load RDF data in N-Triple format
• Loading times (10.2.0.2)
  approx. 2.5 M triples/hour
  Intel Xeon 3 GHz CPU, 3 GB RAM
Additional Platform Features
• Clustered database servers
• Partitioning: Oracle table partitioning in support of very large graphs
• Parallelism: Oracle parallelism to support load, index and query of very large graph models
• Data Loading: Import and export data in triple formats (e.g. N-triple) using Oracle's SQL*Loader utility
• Versioning
• Text Search
• Support for unstructured data types (e.g. XML, spatial, images, georaster imagery, audio, video, text)
• XML tools (XDB, XQuery)
• Middleware: Integrated with germane Oracle Application Server technology (BPEL, XSLT, UDDI, portal, …)
Performance Metrics
• Batch Loading
  • 1 million triples loaded in 27 minutes
• Querying
  • 80M triples
  • RDF_MATCH based query performance is scalable, with retrieval cost per result row staying almost the same as dataset size changes
• WordNet (0.5M triples)
  • Sub-second query response
• UniProt (80M triples)
  • Query Results Range: 0.5 – 5 seconds
• See 2005 VLDB Paper
Large-Scale RDF Data
• UniProt – 10M, 20M, 40M, 80M triples
• 6 example queries given with UniProt
• Number of matches remains constant as dataset size changes (ROWNUM)
UniProt Sample Queries
Description                                          Pattern                              Projection  Result limit
Q1: Display the ranges of transmembrane regions      6 triples, 5 vars                    3 vars      15000 rows
Q2: List proteins with publications by authors       5 triples, 5 vars, 1 LIKE pred.      3 vars      10 rows
    with matching names
Q3: Count the number of times a publication by a     3 triples, 2 vars                    0 vars      32 rows
    specific author is cited
Q4: List resources that are related to proteins      3 triples, 2 vars                    1 var       3000 rows
    annotated with a specific keyword
Q5: List genes associated with human diseases        7 triples, 5 vars                    3 vars      750 rows
Q6: List recently modified entries                   2 triples, 2 vars, 1 range pred.     2 vars      8000 rows
Query Response Times
RDF_MATCH Performance Scalability
                Q1      Q2       Q3       Q4      Q5      Q6
10 M Triples    0.86    < 0.01   < 0.01   0.03    0.18    0.46
20 M Triples    0.95    < 0.01   < 0.01   0.03    0.19    0.47
40 M Triples    0.96    < 0.01   < 0.01   0.03    0.18    0.47
80 M Triples    1.03    < 0.01   < 0.01   0.03    0.20    0.49
Maximum ±       .054    0.002    0.002    .011    .065    0.07
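The scalability claim can be checked with simple arithmetic on the reported figures: compare how much the response time grows between the 10 M and 80 M triple datasets with how much the data grows. The Python sketch below uses only values from the response-time table; Q2 and Q3 are left out because they are reported only as "< 0.01", and the variable names are chosen for illustration.

```python
# Response times for the 10 M and 80 M triple datasets, copied from the table.
# Q2 and Q3 are omitted because they are reported only as "< 0.01".
response_times = {
    "Q1": {"10M": 0.86, "80M": 1.03},
    "Q4": {"10M": 0.03, "80M": 0.03},
    "Q5": {"10M": 0.18, "80M": 0.20},
    "Q6": {"10M": 0.46, "80M": 0.49},
}

data_growth = 80 / 10  # the dataset grows by a factor of 8
slowdown = {q: t["80M"] / t["10M"] for q, t in response_times.items()}

for q, ratio in slowdown.items():
    print(f"{q}: data x{data_growth:.0f}, response time x{ratio:.2f}")
```

For every query with a measurable time, the response time grows by well under a factor of 1.25 while the data grows eightfold, which is consistent with the fixed result limits (ROWNUM) keeping the number of matches constant.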
More Information
• www.oracle.com/technology/tech/semantic_technologies
• Product Development contacts:
  • Product Management
    – Xavier Lopez (xavier.lopez@oracle.com)
    – Jayant Sharma (jayant.sharma@oracle.com)
  • Development
    – Souri Das (souripriya.das@oracle.com)
    – Melliyal (Melli) Annamalai (melliyal.annamalai@oracle.com)