Overcoming Ontological Conflicts in Information Integration

Aykut Firat, Stuart Madnick and Benjamin Grosof
MIT Sloan School of Management
{aykut,smadnick,bgrosof}@mit.edu

Presentation for ICIS 2002
Barcelona, Spain
December 15-18

Motivation: Death. Taxes. Integration.

• DATEK IT INTEGRATION CHALLENGES AMERITRADE
  “...consolidate data centers, develop a single online-trading Web site and set up unified systems…” (CW)
• DEFENDING THE U.S.
  “..a sea of unconnected islands of information technology..” (EW)
• A FRANCO-GERMAN PARTNERSHIP IN THE COCKPIT
  “EADS: a big challenge to forge an integrated, cross-border group within Europe…” (FT)
• MARKET MAKES IT PRIORITY IN DRUG MERGER
  “Competitive pressures make it priority for (Glaxo Wellcome and Smith-Kline Beecham) to combine their systems ...” (CWorld)

Roadmap
• Key Concepts
• Case Study
• Solution Methodology
• Test Example
• Concluding Remarks

Key Concepts: Semantic Integration
[Figure labels: Disconnected Sources, Physical Integration, Internet, Semantic Integration, Semantic Web]

Key Concepts: Ontology
“Specification of a conceptualization”
[Figure: A snapshot from a financial ontology. Concepts: Company, Financials, Country, Currency, PE Ratio, Earnings. Relationship labels: located in, has, is-a, official currency.]

Key Concepts: Ontological Heterogeneity
The Bloomberg Ontology and the Market Guide Ontology contain the same concepts (Company, Financials, Country, Currency, Earnings, PE Ratio) but define PE Ratio differently. This is an ontological conflict:
• Bloomberg, trailing: Price / (Earnings in the last 4 quarters)
• Market Guide, forward looking: Price / (Earnings in the last 3 quarters + forecasted next quarter)

Case Study: Information Chain
[Figure labels: Company, Local accounting, Pro-forma, SEC Filing, Auditors & Investors, Analyst, Analyst Reports]

Case Study: Data Source Analysis
Found Three Different Types of Heterogeneities
• Data Level
• Ontological
• Temporal

Primark
• Worldscope
• DataStream
• Disclosure
SEC
Web
• Bloomberg
• Hoovers
• Yahoo
• Market Guide
• Fortune
• Corporate Information
• …

Case Study: Data Level Heterogeneities
Definition: Same entity, different representations

Sales Data

Data source            FIAT                   DAIMLER CHRYSLER BENZ
Hoovers                48,741.0 (Dec99)       152,446.0
Yahoo                  N/A                    131.4B
Market Guide           45,871.5 (Dec99)       145,076.4
Money Central          49,274.6 (’99)         152.4 Bil
Corporate Information  57,603,000,000 (’00)   162,384,000,000 (’00)
World Scope            93,719,340,540 (99)    257,743,189 (98)
Disclosure             48,402,000 (99)        131,782,000 (98)
Primark Review         51,264 (99)            71354 (97)

Sales(FIAT)
• WorldScope: 93,719,340,540 (Currency: Local, Scale Factor: 1000)
• Market Guide: 45,871.5 (Currency: USD, Scale Factor: Millions)

Case Study: Ontological Heterogeneities
Definition: Different entity type definitions and/or relationships

SOURCE        P/E RATIO
ABC           11.6
Bloomberg     5.57
DBC           19.19
MarketGuide   7.46

PERatio
• Bloomberg: 5.57 (Trailing P/E Ratio)
• Market Guide: 7.46 (Forward looking P/E Ratio)

Price/PE_Trailing = Price/PE_Forward - Forecasted_Quarterly_Sales(t+1) + Quarterly_Sales(t-3)

Case Study: Temporal Heterogeneities
Definition: Entity values or definitions belong to different times, or time intervals.

Time axis: 1996, 2000

Data Level
• Worldscope Scale Factor = 1000
• Worldscope Scale Factor = 1

Ontological
• For Exxon: “Worldscope.Revenues” = “Disclosure.Net Sales” - “SEC.Excise Taxes”
• For Exxon: “Worldscope.Revenues” = “Disclosure.Net Sales” - “SEC.Earnings from Equity Interests and Other Revenue” - “SEC.Excise Taxes”

Problem Definition: Our Challenge
How to represent and reason with semantic heterogeneities?

Why challenging?
• Machine to machine communication: AI
• Declarative, scalable, extensible solution required
• Modeling: combination of art and science
• Includes NP-Complete problems (e.g. Source Selection)

Solution Methodology: Approach
Context Interchange (COIN): Data Level
• Research undertaken by Sloan IT Ph.D. Cheng-Goh
• Logical Framework: COIN Data Model + Abductive Logic Programming
• Loosely coupled approach to semantic integration

Extended COIN: Ontological
• Extended Data Model + Symbolic Equation Solving Techniques
• Based on Constraint Logic Programming
• Source Selection
• Ontology Merging

Solution Methodology: COIN Architecture
[Figure: COIN architecture. Components: Query (e.g. Sales of FIAT), Context Mediator, Mediated Query, Optimizer/Executioner, Ontology, Conversion Library (e.g. Currency Converter), Context Axioms. Sources: Market Guide, DataStream, DBMS, Semistructured Data Sources (e.g. XML). Context labels: “Scale factor: 1000, Currency: USD, …”; “Scale factor: 1000, Currency: Local, …”; “Currency: USD”; “Currency: ITL”.]

Solution Methodology: ECOIN Architecture
[Figure: ECOIN architecture. Components: Query (e.g. PERatio of FIAT), Context Mediator, Mediated Query, Optimizer/Executioner, Ontology, Conversion Library (e.g. Currency Converter), Equation Library, Constraint Engine, Symbolic Equation Solver, Context Axioms. Sources: Market Guide, DataStream, DBMS, Semistructured Data Sources (e.g. XML). Context labels: “PE Ratio: Trailing, Scale factor: 1000, Currency: USD, …”; “PE Ratio: Forward, Scale factor: 1000, Currency: Local, …”; “PERatio: Forward”; “PERatio: Trailing”.]

Equation shown with the Equation Library:
Price/PE_Trailing = Price/PE_Forward - Forecasted_Quarterly_Sales(t+1) + Quarterly_Sales(t-3)

Solution Methodology: Key Link: ECOIN & Semantic Web
Semantic Web Architecture (Tim Berners-Lee, W3C)
[Figure labels: Unicode; URIs; Tagged data: XML + Namespaces + XML Schema; RDF + RDF Schema; Ontology; Rules; Logic; Proof; Trust; Digital Signatures; ECOIN]

Test Example: E-Business Application
Query: Prices of Products Cheaper in eToys compared to Kid’s World

eToys
123456   20
234567   40
…        …

Kid’s World
pokemon    17
starwars   45
…          …

Results
pokemon    13.3
starwars   30.1

[Figure labels: Price Equations; Context Mediator. Context labels: “Price: Nominal, Product Code: Numeric”; “Price: Nominal + Tax + Shipping, Product Code: Alpha”; “Price: Nominal + Tax, Product Code: Numeric”.]

Concluding Remarks
We developed ECOIN
• Symbolic Equation Solving Techniques
• Proof-of-concept prototype
• First system to deal with both data level and ontological heterogeneities
• Temporal heterogeneities are next on our agenda
• Contribute to the global semantic interoperability efforts

Industry solutions? Problems addressed
Many Experts in Physical and Tightly-Coupled Integration
• Content Aggregation: Yodlee,..
• Workflow Integration: IBM, Microsoft,..
• Enterprise Application Integration: SAP, WebMethods, Siebel
• B2B: Ariba, CommerceOne,…
• Portals: Vignette, Broadvision,..

Our focus is on Semantic and Loosely-Coupled Integration
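
Illustrative sketch: data-level context conversion

The Data Level Heterogeneities slide shows the same FIAT sales figure reported under different contexts (Worldscope: local currency, scale factor 1000; Market Guide: USD, millions). The Python sketch below is one possible way to picture the kind of conversion a context mediator would have to apply; it is not the COIN implementation. The names `Context`, `convert` and `EXAMPLE_RATES` are hypothetical. Treating FIAT's local currency as ITL is an assumption based on the “Currency: ITL” label in the COIN Architecture slide, and the exchange rate is a made-up round number, not a historical quote.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Context:
    """How a source reports a monetary value (hypothetical helper type)."""
    currency: str        # e.g. "ITL" or "USD"
    scale_factor: float  # reported number * scale_factor = amount in currency units


def convert(value: float, source: Context, target: Context,
            rates: Dict[Tuple[str, str], float]) -> float:
    """Re-express `value`, reported in `source` context, in `target` context."""
    amount = value * source.scale_factor  # undo the source scale factor
    if source.currency != target.currency:
        # Look up the (from, to) exchange rate; a KeyError means no rate is known.
        amount *= rates[(source.currency, target.currency)]
    return amount / target.scale_factor   # apply the target scale factor


# Contexts as stated on the slide; "ITL" for "Local" is an assumption.
WORLDSCOPE = Context(currency="ITL", scale_factor=1_000)
MARKET_GUIDE = Context(currency="USD", scale_factor=1_000_000)

# HYPOTHETICAL exchange rate, chosen only to keep the arithmetic readable.
EXAMPLE_RATES = {("ITL", "USD"): 0.0005}

# Worldscope's reported FIAT sales, re-expressed in Market Guide's context.
fiat_sales_in_market_guide_terms = convert(
    93_719_340_540, WORLDSCOPE, MARKET_GUIDE, EXAMPLE_RATES)
print(fiat_sales_in_market_guide_terms)
```

Keeping the scale factor and currency as declared properties of each source, rather than hard-coding a conversion per pair of sources, mirrors the declarative, loosely coupled approach described in the Approach slide: adding a source means declaring one more context, not writing one converter for every existing source.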