Overcoming Ontological Conflicts in Information Integration

Aykut Firat, Stuart Madnick and Benjamin Grosof
MIT Sloan School of Management
{aykut,smadnick,bgrosof}@mit.edu
Presentation for
ICIS 2002
Barcelona, Spain
December 15-18
Death. Taxes. Integration.
Motivation
• DATEK IT INTEGRATION CHALLENGES AMERITRADE
“...consolidate data centers, develop a single online-trading
Web site and set up unified systems…” (CW)
• DEFENDING THE U.S.
“..a sea of unconnected islands of information technology..” (EW)
• A FRANCO-GERMAN PARTNERSHIP IN THE COCKPIT
“EADS: a big challenge to forge an integrated, cross-border
group within Europe…” (FT)
• MARKET MAKES IT PRIORITY IN DRUG MERGER
“Competitive pressures make it priority for (Glaxo Wellcome
and Smith-Kline Beecham) to combine their systems ...” (CWorld).
Roadmap
•Key Concepts
•Case Study
•Solution Methodology
•Test Example
•Concluding Remarks
Key Concepts
Semantic Integration
[Figure: progression from Disconnected Sources to Physical Integration to Semantic Integration. Other labels in the figure: Internet, Semantic Web.]
Key Concepts
Ontology
“Specification of a conceptualization”
[Figure: A snapshot from a financial ontology. Entities: Company, Country, Currency, Financials, PE Ratio, Earnings. Relationship labels: located in, has, is-a, official currency.]
Key Concepts
Ontological Heterogeneity
The Bloomberg ontology and the Market Guide ontology both contain Company, Financials, Country, Currency, Earnings, and PE Ratio. The ontological conflict is in the definition of PE Ratio:
• Bloomberg: trailing PE Ratio = Price / Earnings in the last 4 quarters
• Market Guide: forward-looking PE Ratio = Price / (Earnings in the last 3 quarters + forecasted next quarter)
Case Study
Information Chain
[Figure: information chain. Labels: Company, Local accounting, Pro-forma, SEC Filing, Auditors & Investors, Analyst, Analyst Reports.]
Case Study
Data Source Analysis
Found three different types of heterogeneities:
• Data Level
• Ontological
• Temporal
Sources analyzed:
• Primark: Worldscope, DataStream, Disclosure
• SEC
• Web: Bloomberg, Hoovers, Yahoo, Market Guide, Fortune, Corporate Information, …
Case Study
Data Level Heterogeneities
Definition: Same entity, different representations
Sales Data

DATASOURCE              FIAT                    DAIMLER CHRYSLER BENZ
Hoovers                 48,741.0 (Dec99)        152,446.0
Yahoo                   N/A                     131.4B
Market Guide            45,871.5 (Dec99)        145,076.4
Money Central           49,274.6 (’99)          152.4 Bil
Corporate Information   57,603,000,000 (’00)    162,384,000,000 (’00)
World Scope             93,719,340,540 (99)     257,743,189 (98)
Disclosure              48,402,000 (99)         131,782,000 (98)
Primark Review          51,264 (99)             71354 (97)

Sales(FIAT) in context:
• WorldScope: 93,719,340,540 (Currency: Local, Scale Factor: 1000)
• Market Guide: 45,871.5 (Currency: USD, Scale Factor: Millions)
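The FIAT figures differ largely because each source reports in its own context. As a minimal sketch (not the authors' implementation), the Python below re-expresses a value from one context in another. The `Context` class, the `convert` function, the reading of "Local" as ITL for FIAT, and above all the exchange rate are assumptions for illustration; the rate is a placeholder, not a historical ITL/USD quote.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Context:
    currency: str       # e.g. "USD" or "ITL"
    scale_factor: int   # one reported unit equals this many currency units

# Placeholder rates to USD: illustrative only, not real market data.
RATES_TO_USD = {"USD": 1.0, "ITL": 1 / 2000.0}

def convert(value: float, source: Context, target: Context) -> float:
    """Re-express `value` from the source context in the target context."""
    absolute = value * source.scale_factor                # undo the source scale
    in_usd = absolute * RATES_TO_USD[source.currency]     # source currency -> USD
    in_target = in_usd / RATES_TO_USD[target.currency]    # USD -> target currency
    return in_target / target.scale_factor                # apply the target scale

# Contexts as labeled on the slide; "Local" is assumed to mean ITL for FIAT.
worldscope = Context(currency="ITL", scale_factor=1000)
market_guide = Context(currency="USD", scale_factor=1_000_000)

fiat_in_mg_terms = convert(93_719_340_540, worldscope, market_guide)
print(round(fiat_in_mg_terms, 1))
```

With the placeholder rate, the WorldScope figure lands in the same order of magnitude as Market Guide's 45,871.5; a real rate for the reporting date would be needed for an actual comparison.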
Case Study
Ontological Heterogeneities
Definition: Different entity type definitions and/or relationships

SOURCE        P/E RATIO
ABC           11.6
Bloomberg     5.57
DBC           19.19
MarketGuide   7.46

• Bloomberg: 5.57 is a trailing P/E Ratio
• Market Guide: 7.46 is a forward-looking P/E Ratio

Price / PE_Trailing = Price / PE_Forward - Forecasted_Quarterly_Sales(t+1) + Quarterly_Sales(t-3)
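The relation between the two P/E definitions can be read as a conversion rule. A minimal sketch in Python, assuming the quarterly terms are per-share figures in the same currency as the price; the function name and the sample numbers are hypothetical and are not taken from Bloomberg or Market Guide.

```python
def trailing_pe_from_forward(price, pe_forward, forecast_next_q, actual_q_minus_3):
    """Apply Price/PE_Trailing = Price/PE_Forward - forecast(t+1) + actual(t-3)."""
    forward_base = price / pe_forward                 # last 3 quarters + forecast
    # Swap the forecasted quarter for the oldest actual quarter.
    trailing_base = forward_base - forecast_next_q + actual_q_minus_3
    if trailing_base == 0:
        raise ValueError("trailing base is zero; P/E is undefined")
    return price / trailing_base

# Hypothetical numbers, for illustration only.
pe_trailing = trailing_pe_from_forward(
    price=40.0, pe_forward=8.0, forecast_next_q=1.5, actual_q_minus_3=1.0
)
print(pe_trailing)
```

The point of the sketch is that a forward-looking ratio is not comparable to a trailing one until the two extra quarterly terms are supplied.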
Case Study
Temporal Heterogeneities
Definition: Entity values or definitions belong to different times or time intervals.
Data Level:
• 1996: Worldscope Scale Factor = 1000
• 2000: Worldscope Scale Factor = 1
Ontological (for Exxon):
• 1996: “Worldscope.Revenues” = “Disclosure.Net Sales” - “SEC.Excise Taxes”
• 2000: “Worldscope.Revenues” = “Disclosure.Net Sales” - “SEC.Earnings from Equity Interests and Other Revenue” - “SEC.Excise Taxes”
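One way to picture a data-level temporal heterogeneity is a context attribute that depends on the reporting year. The sketch below is an illustration under assumptions: it reads the slide as Worldscope using a scale factor of 1000 in 1996 and 1 in 2000, and it refuses to guess for any other year because the slide gives no change-over date. The names are hypothetical.

```python
# Scale factor per reporting year, as read from the slide (assumed mapping).
WORLDSCOPE_SCALE_BY_YEAR = {1996: 1000, 2000: 1}

def absolute_value(reported, year):
    """Undo the year-specific scale factor; unknown years are rejected, not guessed."""
    try:
        scale = WORLDSCOPE_SCALE_BY_YEAR[year]
    except KeyError:
        raise ValueError(f"no scale factor recorded for {year}") from None
    return reported * scale

# The same reported number stands for different amounts in different years.
print(absolute_value(5_000, 1996))
print(absolute_value(5_000, 2000))
```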
Our Challenge
Problem Definition
How to represent and reason with semantic heterogeneities?
Why challenging?
•Machine to machine communication: AI
•Declarative, scalable, extendible solution required
•Modeling: combination of art and science
•Includes NP-Complete problems (e.g. Source Selection)
Approach
Solution Methodology
Context Interchange (COIN): Data Level
•Research undertaken by Sloan IT Ph.D. Cheng Hian Goh
•Logical Framework: COIN Data Model + Abductive Logic Programming
•Loosely coupled approach to semantic integration
Extended COIN: Ontological
•Extended Data Model + Symbolic Equation Solving Techniques
•Based on Constraint Logic Programming
•Source Selection
•Ontology Merging
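To make "symbolic equation solving" concrete, here is a deliberately small Python sketch. It is not ECOIN's constraint logic programming engine: it handles only equations that are signed sums of named terms, and the `LinearEquation` class and the sample figures are hypothetical. The equation encoded is the simpler Exxon relation from the temporal heterogeneity slide, Revenues = Net Sales - Excise Taxes.

```python
from dataclasses import dataclass

@dataclass
class LinearEquation:
    """Represents sum(coeff * term) == 0 over named terms."""
    coeffs: dict  # term name -> coefficient

    def solve_for(self, unknown, known):
        """Isolate `unknown`, given values for every other term."""
        if self.coeffs.get(unknown, 0) == 0:
            raise ValueError(f"{unknown} does not appear in the equation")
        rest = sum(c * known[name]
                   for name, c in self.coeffs.items() if name != unknown)
        return -rest / self.coeffs[unknown]

# Revenues - NetSales + ExciseTaxes == 0
revenues_eq = LinearEquation({"Revenues": 1, "NetSales": -1, "ExciseTaxes": 1})

# Hypothetical figures: the same equation answers for either unknown.
revenues = revenues_eq.solve_for("Revenues", {"NetSales": 120.0, "ExciseTaxes": 20.0})
net_sales = revenues_eq.solve_for("NetSales", {"Revenues": 100.0, "ExciseTaxes": 20.0})
print(revenues, net_sales)
```

Storing the relation once and rearranging it on demand is what lets a single declared equation serve queries posed in either source's terms.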
Solution Methodology
COIN Architecture
[Figure: COIN architecture. Components: Query (e.g. Sales of FIAT), Context Mediator, Ontology, Context Axioms, Conversion Library (e.g. Currency Converter), Mediated Query, Optimizer/Executioner. Sources: Market Guide, DBMS, DataStream, Semistructured Data Sources (e.g. XML), each with Context Axioms. Context annotations in the figure: "Scale factor: 1000, Currency: USD, …"; "Scale factor: 1000, Currency: Local, …"; "Currency: USD"; "Currency: ITL".]
Solution Methodology
ECOIN Architecture
[Figure: ECOIN architecture. Components: Query (e.g. PERatio of FIAT), Context Mediator, Ontology, Context Axioms, Conversion Library (e.g. Currency Converter), Equation Library, Constraint Engine, Symbolic Equation Solver, Mediated Query, Optimizer/Executioner. Sources: Market Guide, DBMS, DataStream, Semistructured Data Sources (e.g. XML), each with Context Axioms. Context annotations in the figure: "PE Ratio: Trailing, Scale factor: 1000, Currency: USD, …"; "PE Ratio: Forward, Scale factor: 1000, Currency: Local, …"; "PERatio: Forward"; "PERatio: Trailing".]
Equation Library example:
Price / PE_Trailing = Price / PE_Forward - Forecasted_Quarterly_Sales(t+1) + Quarterly_Sales(t-3)
Solution Methodology
Key Link: ECOIN & Semantic Web
[Figure: Semantic Web Architecture (Tim Berners-Lee, W3C), with ECOIN marked on the diagram. Layer labels: Unicode; URIs; Tagged data: XML + Namespaces + XML Schema; RDF + RDF Schema; Ontology; Rules; Logic; Proof; Trust; Digital Signatures.]
Test Example
E-Business Application
Query: Prices of products cheaper in eToys compared to Kid’s World
[Figure: the Query goes to the Context Mediator, which uses Price Equations and returns Results. Contexts shown: "Price: Nominal, Product Code: Numeric"; "Price: Nominal + Tax + Shipping, Product Code: Alpha"; "Price: Nominal + Tax, Product Code: Numeric".]
Source data (eToys and Kid’s World):
  123456     20        pokemon    17
  234567     40        starwars   45
  …          …         …          …
Results:
  pokemon    13.3
  starwars   30.1
Concluding Remarks
• We developed ECOIN
• Symbolic equation solving techniques
• Proof-of-concept prototype
• First system to deal with both data-level and ontological heterogeneities
• Temporal heterogeneities are next on our agenda
• Contribute to the global semantic interoperability efforts
Industry solutions?
Problems addressed
Many Experts in Physical and Tightly-Coupled Integration
• Content Aggregation: Yodlee, …
• Workflow Integration: IBM, Microsoft, …
• Enterprise Application Integration: SAP, WebMethods, Siebel
• B2B: Ariba, CommerceOne, …
• Portals: Vignette, Broadvision, …
Our focus is on Semantic and Loosely-Coupled Integration