Data Services

Semantic Interoperability:
Automatically Resolving Vocabularies
Chuck Mosher
8500 Leesburg Pike
Vienna, VA
cmosher@metamatrix.com
4th Semantic Interoperability Conference
February 10, 2006
Interoperable Information Backbone
Enterprise Data Service Layer
Applications
MetaMatrix
Data Sources
• Enterprise-wide data abstraction layer for applications
• Integrated views of data from multiple sources
– Relational databases, applications, files
• Re-useable Data Services for data consistency
• Metadata-driven data management and integration
• Complements other data integration tools (ETL, EAI, quality, etc.)
Data Services
• A type of Web Service
• Does all of the work to transform any data in
any format to a W3C compliant service
– Implements all of the logic to effect the
transformation
– Provides access to data sources, regardless of
source API, technology
• Does not implement application logic
• Decouples the data from the application
while making the data discoverable and
accessible
Model-Based Approach Maximizes Re-use
Data Abstraction Without Coding
[Figure: Enterprise Information Sources (EIS), shown as databases, packaged apps, services, warehouses, custom apps, geo-spatial, rich media, XML, spreadsheets, …, feed Reusable Integrated Business Objects. These are exposed as Information Services (SOAP with <WSDL> contracts, JDBC, ODBC, XML documents) to Information Consumers: Web Services and Business Processes; EAI and Data warehouses; Reporting and Analytics.]
Meta Object Facility (MOF)
Metamodel
Model
Data
MetaMatrix MetaBase Modeler
• Model disparate
information sources
– Relational DBs
– Content Management
Systems
– Files
– Services
– Applications
• Uses and retains
domain-specific
modeling terminology
– Relational models
have “Tables”,
“Foreign Keys”,
“Columns”, etc.
– UML models have
“Packages”, “Classes”,
“Attributes”, etc.
MetaMatrix MetaBase Modeler
• Define reusable
data services/
business objects
• Transformations
defined with:
– Selects
– Joins
– Criteria
– Unions
– Functions
– User defined
• Perform schema
and semantic
matching, data
type conversion
Semantic Mediation: The Problem
[Figure: Business Intelligence applications (ODBC/JDBC), Portal applications (JDBC), and Web Services (SOAP, virtual XML document) sit above a Logical Data Model. Transformations (T) connect the model to Multiple Internal/External Information Sources. Field names shown in the figure: Location_ID, Location_Type, bldg_id, bldg_type, Depot_Number, SITENUM, Facility_ID.]
Aggregate Data Services:
• Relational or XML
• Application-specific
• Access via ODBC, JDBC, or SOAP APIs
Enterprise-wide or COI-driven Data Model
• Rationalization and Semantic mediation Layer
• Harmonization
• Data Catalog/Dictionary
Data Sources
- Authoritative
- Redundant
- Overlapping
Building Enterprise Semantic Model(s)
[Figure: Business Intelligence applications (ODBC/JDBC), Portal applications (JDBC), and Web Services (SOAP) sit above a stack of enterprise-wide or COI-driven data models (rationalization, harmonization, data catalogs): J-1 Manpower / Personnel, J-2 Intelligence, J-3 Operations, J-4 Logistics (GCSS), J-5 Plans & Policy, J-6 C4CS, J-7 Operational Plans, J-8 Force Structure. Transformations (T) connect the models to Multiple Internal/External Information Sources (authoritative, redundant, overlapping).]
Biggest Challenge in Creating Data Services?
• Semantics!!!
• Structural differences are straightforward
• Differing definitions among data sources
• Differing vocabularies among COIs
• Established, emerging, and evolving data standards
– C2IEDM, JC3IEDM, GJXDM, NIEM, GFM, many more
• Not addressed by ETL, EAI, SOA
A Previously Intractable Problem
• TWPDES has 1000+ core entities
• NIEM has 100,000+!
• Even a limited program with a dozen data
sources could yield tens of thousands of
potential mappings
• Humans cannot address this without help
• Indeed, it has stopped many data
integration/reconciliation programs in their
tracks.
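The scale argument above can be checked with a little arithmetic. The sketch below is only an illustration: the figure of 25 elements per source is an assumption chosen for the example, and `candidate_pairs` is a hypothetical helper, not part of any MetaMatrix product.

```python
from itertools import combinations


def candidate_pairs(element_counts):
    """Count element-to-element comparisons across every pair of sources."""
    return sum(a * b for a, b in combinations(element_counts, 2))


# Assumption for illustration: a dozen sources with 25 elements each.
sources = [25] * 12

# 66 source pairs x 625 element pairs per source pair.
print(candidate_pairs(sources))  # 41250

# Comparing every element against a 1000-entity standard instead.
print(sum(sources) * 1000)  # 300000
```

Even with these small, made-up sources the candidate space is far beyond what a person can review by hand.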
Automated Semantic Matching
DISCLAIMER
• Semantic matching can't really be done
automatically yet!
• Requires intelligence to understand the
context and semantics.
• So use computers to do most of the work
but then have the user confirm or check
the result.
The Matching Problem
• Given two symbols, calculate a measure
of the relationship between them:
amount
quantity
Doesn’t seem so hard…
The Matching Problem
• Given two symbols, calculate a measure
of the relationship between them:
ftuqky
aqfkyeyr
This is what a computer “sees.”
The Matching Problem
• Even after extracting likely symbols,
matching is a difficult problem.
• Symbols alone are not enough to generate
good matches:
– “ID” -> “SocialSecurityNumber” or “NY”
• The solution relies on context:
– “NJ”, “MA”, “CA”, “ID”
– “Ego”, “SuperEgo”, “ID”
• MatchIt provides that context
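One way to picture context-based disambiguation is to score each possible sense of a symbol by how many of its cue words appear among the neighbouring symbols. This is a minimal sketch under stated assumptions: the `SENSES` table and the `disambiguate` function are hypothetical and hand-written for the “ID” example, and the slides do not describe how MatchIt itself uses context.

```python
# Hypothetical sense inventory: each sense lists cue words that tend to
# appear alongside the symbol when that sense is intended.
SENSES = {
    "ID": {
        "identifier": {"number", "code", "key", "ssn", "name"},
        "Idaho (US state)": {"nj", "ma", "ca", "ny", "state"},
        "id (psychology)": {"ego", "superego", "psyche"},
    }
}


def disambiguate(symbol, context):
    """Return the sense whose cue words overlap most with the context.

    Returns None when the symbol is unknown or no cue word matches.
    """
    cues = {c.lower() for c in context}
    senses = SENSES.get(symbol.upper(), {})
    scored = {sense: len(words & cues) for sense, words in senses.items()}
    best = max(scored, key=scored.get, default=None)
    if best is None or scored[best] == 0:
        return None
    return best


print(disambiguate("ID", ["NJ", "MA", "CA"]))   # Idaho (US state)
print(disambiguate("ID", ["Ego", "SuperEgo"]))  # id (psychology)
print(disambiguate("ID", []))                   # None
```

With no context at all the function declines to choose, which mirrors the point that symbols alone are not enough.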
MatchIT 1.0
• Integrated component of the MetaMatrix
Semantic Data Services product
• Based on ontology-driven semantic knowledge
base
– Word relationships, dictionaries, lexicons, thesauri
• Plug-in architecture
• Standards-compliant:
– OWL
– RDF
– Inference engines
– OSGi
– Eclipse
– JDBC
(Semi-)Automated Semantic Mediation
* An extensible semantic knowledge base provides dictionary- and thesaurus-like information on “words”, their “meanings”, and their relationships to other words.
* A sophisticated set of matching algorithms provides string similarity matches and semantic matches with confidence ratings and explanations.
[Figure: The ontology records that “Sex” is semantically related to “Gender”, so “Person Sex Code” and “Gender ID” are matched with a confidence of 90%. Data source services shown: FBI, CBP, NYC, NY, NJ.]
Matching Techniques
• MatchIT uses two types of matching techniques:
– String Matching
• Attempts to determine string similarity based on the lexical distance between two symbols.
– Semantic Matching
• Attempts to determine semantic similarity based on the distance between two concepts within a semantic ontology.
• Generate Match Sets
• Can be run individually or in combinations
• Pluggable architecture allows for algorithmic extensibility
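The idea of running matchers individually or in combination can be sketched as a set of interchangeable scoring functions. Everything here is an assumption made for illustration: the `Matcher` type, the tiny `SYNONYMS` table, and the take-the-highest-score rule are not taken from MatchIT, and `difflib.SequenceMatcher` merely stands in for whatever string algorithms the product uses.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple

# A matcher is any function that scores two symbols between 0.0 and 1.0.
Matcher = Callable[[str, str], float]

# Hypothetical stand-in for a semantic knowledge base.
SYNONYMS = {frozenset({"sex", "gender"}), frozenset({"amount", "quantity"})}


def string_matcher(a: str, b: str) -> float:
    """Lexical similarity from the standard library's SequenceMatcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def semantic_matcher(a: str, b: str) -> float:
    """Score 1.0 for identical words, 0.9 for known synonyms, else 0.0."""
    a, b = a.lower(), b.lower()
    if a == b:
        return 1.0
    return 0.9 if frozenset({a, b}) in SYNONYMS else 0.0


def match_set(sources: List[str], targets: List[str],
              matchers: Dict[str, Matcher]) -> List[Tuple[str, str, float, str]]:
    """Score every source/target pair with each plugged-in matcher.

    Keeps the best score per pair plus the name of the matcher that
    produced it, as a rudimentary explanation.
    """
    results = []
    for s in sources:
        for t in targets:
            scores = {name: m(s, t) for name, m in matchers.items()}
            best = max(scores, key=scores.get)
            results.append((s, t, scores[best], best))
    return sorted(results, key=lambda r: r[2], reverse=True)


both = {"string": string_matcher, "semantic": semantic_matcher}
for row in match_set(["sex"], ["gender", "sextant"], both):
    print(row)
```

Passing a dictionary with a single entry runs one technique on its own; adding a new technique means adding one more function to the dictionary.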
String Matching
• What is the lexical distance between two
symbols?
– “PUZZLE”, “PUZZ”
– “ID”, “IDENTIFIER”
– “STRONG”, “SONG”
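The three pairs above can be scored with a standard edit distance. The slides do not say which lexical metric is used, so Levenshtein distance here is an assumption, and the normalization to a 0 to 1 similarity is one common convention among several.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalize the distance by the longer string's length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


for a, b in [("PUZZLE", "PUZZ"), ("ID", "IDENTIFIER"), ("STRONG", "SONG")]:
    print(a, b, edit_distance(a, b), round(similarity(a, b), 2))
```

Under this metric “STRONG” and “SONG” are 2 edits apart, the same as “PUZZLE” and “PUZZ”, while “ID” and “IDENTIFIER” are 8 edits apart. The unrelated pair scores higher than the abbreviation, which is why string matching alone is not sufficient.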
Semantic Matching
• How semantically similar are two
concepts?
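[Figure: “is a” hierarchy rooted at vehicle]
– car is a motor vehicle
– truck is a motor vehicle
– motor vehicle is a self-propelled vehicle
– self-propelled vehicle is a wheeled vehicle
– wheeled vehicle is a vehicle
– airplane is a heavier-than-air craft
– heavier-than-air craft is an aircraft
– aircraft is a craft
– craft is a vehicle
Car and truck are very similar
Car and airplane are less similar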
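The car, truck, and airplane comparison can be made concrete by counting “is a” links between two concepts through their nearest shared ancestor. This is a minimal sketch: the `PARENT` table is reconstructed from the labels on this slide, and path length is only one simple notion of ontological distance, not necessarily the one MatchIT uses.

```python
# Child -> parent links, reconstructed from the "is a" labels on the slide.
PARENT = {
    "wheeled vehicle": "vehicle",
    "self-propelled vehicle": "wheeled vehicle",
    "motor vehicle": "self-propelled vehicle",
    "car": "motor vehicle",
    "truck": "motor vehicle",
    "craft": "vehicle",
    "aircraft": "craft",
    "heavier-than-air craft": "aircraft",
    "airplane": "heavier-than-air craft",
}


def ancestors(concept: str) -> list:
    """Return the concept followed by each successive parent up to the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def ontological_distance(a: str, b: str) -> int:
    """Number of is-a links on the path from a to b via their
    nearest common ancestor."""
    steps_from_b = {c: i for i, c in enumerate(ancestors(b))}
    for steps_from_a, concept in enumerate(ancestors(a)):
        if concept in steps_from_b:
            return steps_from_a + steps_from_b[concept]
    raise ValueError(f"no common ancestor for {a!r} and {b!r}")


print(ontological_distance("car", "truck"))     # 2
print(ontological_distance("car", "airplane"))  # 8
```

A shorter path means a closer match: car and truck meet at “motor vehicle”, while car and airplane only meet at the root, “vehicle”.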
Semantic Matching Objectives
• Find and rank the potential matches, but
let the user review and decide for sure.
• I.e., eliminate 99+% of the things that don't
match, and let the user review the <1%.
• Many times, a user can visually scan a
small list of the top 1% and very quickly
agree or disagree with the results.
• Favor false positives over false negatives.
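These objectives suggest a simple review workflow: discard the clear non-matches, then show the user a short ranked list per source element. The sketch below assumes scores have already been computed; the `shortlist` function, the 0.3 threshold, and all scores other than the 90% from the earlier slide are made up for illustration. The threshold is deliberately low so that doubtful candidates reach the reviewer rather than being dropped silently.

```python
def shortlist(scored_pairs, threshold=0.3, top_n=5):
    """Group (source, target, confidence) triples by source, drop those
    below the threshold, and keep the top_n best for user review."""
    by_source = {}
    for source, target, confidence in scored_pairs:
        if confidence >= threshold:
            by_source.setdefault(source, []).append((target, confidence))
    return {
        source: sorted(cands, key=lambda tc: tc[1], reverse=True)[:top_n]
        for source, cands in by_source.items()
    }


# Illustrative scores only; 0.90 echoes the "Confidence of 90%" example.
pairs = [
    ("Person Sex Code", "Gender ID", 0.90),
    ("Person Sex Code", "Location_ID", 0.35),
    ("Person Sex Code", "SITENUM", 0.05),
]
print(shortlist(pairs))
```

Here “Location_ID” survives as a probable false positive for the user to reject, while “SITENUM” is eliminated without review.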
Semantic Matching in MetaMatrix
[Figure: architecture diagram. Labels shown:
– Enterprise Information Sources: RDBMS, file system, JDBC, XML, custom (any source)
– MetaMatrix Connector Framework: data/content access
– MetaMatrix Importer Framework: metadata access, import, export
– Representations: Conceptual/Logical/Physical Data Models (Relational, Domain [UML/ER], XML) and Ontologies [OWL/RDF]
– MetaBase Modeler: Analyze, Visualize, Collaborate, Transform
– MatchIt: Find Matches, Schema-level Match, Instance-level Match, Complete Data Harmonization
– Semantic Knowledge Base: Ontology, Ontological Semantics Access, Fact Repository, Onomasticons, Lexicons
– MetaBase Repository: Models & Files [versioned], Search Index, Web Reporting]
Example
Overall process
• Import two nontrivial vocabularies
– ERwin model of large data warehouse
– TWPDES XML schema
• Extract symbols
– Schema-specific tokenization algorithms
• Assign semantics to each
– Symbols are keys into dictionaries
• Perform semantic matching between them
• Analyze results
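The “extract symbols” and “assign semantics” steps can be sketched as tokenizing each schema name and looking the tokens up in a dictionary. This is an assumption-laden illustration: the slides describe schema-specific tokenization algorithms, whereas the regular expression here is a generic splitter, and the `DICTIONARY` entries are hypothetical.

```python
import re

# Splits on case changes, underscores, and digits. An all-caps run is kept
# whole unless it is followed by lowercase letters (e.g. "XMLSchema").
TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")

# Hypothetical symbol dictionary: token -> candidate meanings.
DICTIONARY = {
    "id": ["identifier"],
    "bldg": ["building"],
    "sex": ["gender"],
    "location": ["place", "site"],
}


def tokenize(symbol: str) -> list:
    """Extract lowercase word-like symbols from a schema name."""
    return [t.lower() for t in TOKEN.findall(symbol)]


def assign_semantics(symbol: str) -> dict:
    """Use each extracted token as a key into the dictionary."""
    return {tok: DICTIONARY.get(tok, []) for tok in tokenize(symbol)}


print(tokenize("PersonSexCode"))      # ['person', 'sex', 'code']
print(tokenize("Location_ID"))        # ['location', 'id']
print(tokenize("SITENUM"))            # ['sitenum']
print(assign_semantics("bldg_id"))    # {'bldg': ['building'], 'id': ['identifier']}
```

The “SITENUM” case shows the limit of a generic splitter: run-together abbreviations need schema-specific rules or a dictionary edit before they can be matched.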
ERwin Data Warehouse Model
TWPDES XML Schema
Mapping Classes for each XML fragment in hierarchy
Generated Symbol Dictionary (TWPDES)
Generated Symbol Dictionary (ERwin model)
Editing the Dictionary
Modify Definition
Editing the Semantics
Control Senses
Target Model
Match Results
Examine Details
Match Details
Matches Used to Build Mappings
From Pat Cassidy & COSMO
The Integrating Function of the Common Semantic Model – via Domain-level Mapping
[Figure: GenericObligation linked by SameAs to Obligation and by SameAs to Duty]
MatchIt Semantic Matching Tool
• A way to use ontologies in a world where nearly
100% of what already exists is not in an
ontology.
• Map connections between ontologies that are
being built and artifacts currently in use:
– RDBMS schemas
– XML and XSD files
– Spreadsheet data
– More coming, including ontologies!
• Map an imported model to a Vocabulary, and a
Vocabulary to an Ontological structure
Thank you