Enterprise Information Integration using Semantic Web Technologies

advertisement
Enterprise Information Integration
using Semantic Web Technologies:
RDF as the Lingua Franca
David Booth, Ph.D.
HP Software
Semantic Technology Conference 20-May-2008
In collaboration with Steve Battle, HP Labs
Latest version of these slides:
http://dbooth.org/2008/stc/slides.ppt
© 2008 Hewlett-Packard Development Company, L.P.
The information contained herein is subject to change without notice
Disclaimer
This work reflects research and is presented for
discussion purposes only. No product
commitment whatsoever is expressed or implied.
Furthermore, views expressed herein are those
of the author and do not necessarily reflect those
of HP.
2
Outline
•
PART 0: The problem
PART 1: RDF: The lingua franca for information exchange
•
−
Why
1.
2.
3.
4.
−
How
1.
2.
3.
4.
•
RDF message semantics
REST-based SPARQL endpoints
XML with GRDDL transformations
Aggregators
PART 2: POC: A SPARQL adaptor for UCMDB
−
−
3
Focus on semantics
Easier data integration
Easier to bridge other formats/models
Looser coupling
What is UCMDB
SPARQL adaptor
PART 0
The problem
4
Problem 1: Integration complexity
•
Multiple producers/consumers need to share data
•
Tight coupling hampers independent versioning
Compliance Management
Discovery
Provisioning
Release
Management
Incident
Management
Change
Management
Release Managers
Compliance
Managers
Operation Centers
Source Control
Monitoring
Networking Engineers
Storage
Administrators
Unix System
Administrators
Networking
Administrators
5
Windows System
Administrators
Ticketing
Problem 2: Babelization
•
Proliferation of data models (XML schemas, etc.)
•
Parsing issues influence data models
•
No consistent semantics
•
Data chaos
Tower of Babel, Abel Grimmer (1570-1619)
6
PART 1
RDF: The lingua franca
for information exchange
7
Why?
Four reasons . . .
8
Why?
1. Focus on semantics
•
XML:
− Schema is focused on how to serialize
• Constrains more than the model
− Parent/child and sibling relationships are not named
• Are their semantics documented? E.g., does sibling order
matter?
•
RDF:
− One URI per concept
− Syntax independent
•
9
Who cares about syntax?
Why?
2. Easier data integration
10
Why?
2. Easier data integration
•
11
Blue App has model
Why?
2. Easier data integration
•
Red App has model
•
Need to integrate Red & Blue models
12
Why?
2. Easier data integration
•
Step 1: Merge RDF
•
Same nodes (URIs) join automatically
13
Why?
2. Easier data integration
•
Step 2: Add relationships and rules
•
(Relationships are also RDF)
14
Why?
2. Easier data integration
•
Step 3: Define Green model
•
(Making use of Red
& Blue models)
15
Why?
2. Easier data integration
•
What the Blue app sees:
− No difference!
16
Why?
2. Easier data integration
•
What the Red app sees
•
No difference!
17
Why?
3. RDF helps bridge other formats/models
•
•
•
Producers and consumers may use different formats/models
Rules can specify transformations
Inference engine finds path to desired result model
A1
X
A2
A3
RDF
Model
Transform
Y
B1
B2
C1
Z
18
C2
Ontologies
Ontologies
&&Ontologies
Rules
&Rules
Rules
Why?
4. Looser coupling
•
Without breaking consumers:
− Ontologies can be mixed and extended
− Triples can be added
•
19
Producer & consumer can be versioned more
independently
Example of looser coupling
•
•
RedCust and GreenCust ontologies added
Blue app is not affected
(Blue app)
Consumer
20
Producer
How?
Four ways . . .
21
How?
1. RDF message semantics
•
Interface contract specifies RDF, regardless of
serialization
•
RDF pins the semantics
Consumer
22
RDF
Producer
How?
2. REST-based SPARQL endpoints
Consumer
23
RDF
SPARQL
HTTP
Producer
REST-based SPARQL endpoints
•
Why REST:
− HTTP is ubiquitous
− Simpler than SOAP-based Web services (WS*)
− Looser process coupling
24
REST-based SPARQL endpoints
•
Why SPARQL:
− One endpoint supports multiple data needs
• Each consumer gets what it wants
− Insulates consumers from internal model changes
• Inferencing transforms data to consumer's desired model
• Looser data coupling
25
How?
3. XML with GRDDL transformations
•
GRDDL is a W3C standard
•
GRDDL permits RDF to be "gleaned" from XML
− XML document or schema specifies desired GRDDL
transformation
− GRDDL transformation produces RDF from XML
document
− Mostly intended for getting microformat and other
data/metadata from HTML pages
26
Using GRDDL for XML document
semantics
•
Each XML format can be viewed as a custom serialization
of RDF!
− GRDDL transformation produces semantics of the XML document
•
Helps bridge XML and RDF worlds
•
Same XML document can be consumed by:
− Legacy XML app
− RDF app
•
App interface contract can specify RDF
− Serializations can vary
− Semantics are pinned by RDF
27
Using GRDDL for XML document
semantics
Service
Normalize
to RDF
Client
Core App
Processing
Serialize as
XML/other/RDF
See: http://dbooth.org/2007/rdf-and-soa/rdf-and-soa-paper.htm
28
RDF Engine
/ Store
How?
4. Aggregators
•
Gets data from multiple sources
•
Provides data to consumers
A1
X
A2
SPARQL
A3
Y
B1
Aggregator
B2
C1
Z
29
C2
Ontologies
Ontologies
&&Ontologies
Rules
&Rules
Rules
Aggregator
•
Conceptual component
− Not necessarily a separate physical service
•
Handles mechanics of getting data
− Different adaptors for different sources
• REST, WS*, Relational, XML, etc.
• Diverse data models
− Might do caching and query distribution (federation)
•
Provides model transformation
− Plug in ontologies and inference rules as needed
30
PART 2
Proof-of-Concept: A SPARQL
adaptor for UCMDB
31
IT Service Management (ITSM)
•
Manage IT environment
•
Configuration
Management Data
Base (CMDB) is central
32
The HP Universal CMDB (UCMDB)
Goal:
•
Maintain a comprehensive and
current record of all configuration
items (CIs) and their relationships
CMDB : Configuration Management DB
33
Example: host information
Host properties
http://cmdb.mercury.com#nt.35014541
One particular host
machine
34
SPARQL adaptor
•
Uses existing SOAP
interface to UCMDB
•
Enables SPARQL
queries
•
Results can be RDF
SPARQL adaptor
•
No model
transformation (yet)
SOAP interface
HP UCMDB
35
Architecture of SPARQL adaptor
Run Time
SPARQL adaptor
SPARQL
SOAP interface
compile
TQL
submit
RDF
lift
XML
HP UCMDB
Database
Design Time
CMDB metadata
36
export
OWL
UCMDB ontology
•
The HP UCMDB ontology
defines CI types and
relationship hierarchies.
•
Derived automatically from
HP UCMDB metadata.
37
Jena based implementation
•
Jena, ARQ, Joseki developed at HP Labs*.
Jena : Semantic Web toolkit
ARQ : Query Engine
Joseki : SPARQL server
RDF, OWL,
inference
SPARQL query
algebra, evaluation
SPARQL protocol
* http://www.hpl.hp.com/semweb/
38
Query returning a table
Select the names of host servers
on the network with addresses from 192.168.81.0
SELECT ?host_name
WHERE {
[ a object:network ] attr:network_netaddr "192.168.81.0" ;
link:member [ a object:host ;
attr:host_dnsname ?host_name
] }
host_name
"ILDTRD129" ^^<http://www.w3.org/2001/XMLSchema#string>
"JONI" ^^<http://www.w3.org/2001/XMLSchema#string>
"MBADIR-IL" ^^<http://www.w3.org/2001/XMLSchema#string>
39
Query returning an RDF subgraph
Describe a network
(192.168.81.0) with host
servers containing a
DB.
40
Example RDF result set
Database
Host
41
Outline
•
PART 0: The problem
PART 1: RDF: The lingua franca for information exchange
•
−
Why
1.
2.
3.
4.
−
How
1.
2.
3.
4.
•
RDF message semantics
REST-based SPARQL endpoints
XML with GRDDL transformations
Aggregators
PART 2: POC: A SPARQL adaptor for UCMDB
−
−
42
Focus on semantics
Easier data integration
Easier to bridge other formats/models
Looser coupling
What is UCMDB
SPARQL adaptor
Questions?
© 2008 Hewlett-Packard Development Company, L.P.
The information contained herein is subject to change without notice
Download