Application Integration in an E-Commerce World
Leslie M. Tierstein
STR LLC

Application Integration
Overview
  – Acquire data from one or more sources
  – Transform its meaning and/or format
  – Deliver it to one or more targets

Application Integration
Processing scenarios:
  – Data warehouse loads
  – Conversions from legacy systems or application interfaces performed in a “batch window”
  – Ongoing “real-time” interfaces
What are the properties of each scenario?
How are the scenarios different/the same?

Warehouse Loads (1)
Repeatable, regularly scheduled
  – Data is initially loaded (see Conversions)
  – Then it is “refreshed”, typically via Change Data Propagation
Load must ensure consistent user views
  – “Checkpoint” OLAP environment

Warehouse Loads (2)
Data must be “transformed”
  – From third-normal-form OLTP source systems to star-schema OLAP target systems
  – Possibly to an Operational Data Store (ODS)
  – Usually from multiple, heterogeneous sources
  – Summarization to the desired level of detail may also be required

Warehouse Loads (3)
Vast amounts of operational data
Importance of metadata
  – Oracle’s Common Warehouse Metadata (CWM)

Legacy Conversions
“One-time” task
  – In “big-bang” implementation
  – Some phased implementations need conversions repeated numerous times
  – Scheduled “cut over” to the new system
Data in the source system is expendable after it is converted; “quick and dirty” is an option

Application Interfaces (1)
Repeatable
  – Regularly scheduled (“batch”)
  – Event-driven (“near” real-time)
Small to large volumes of data
Operational data at both ends
  – Source and target
  – Custom, COTS, external applications (owned by another entity/business)

Old Terminology
E(T)TL
  – Extract, (Transport,) Transform, Load
      – Extract source data
      – (Transport data to new platform)
      – Transform data to new format
      – Load data into new database
  – Typically applied to batch application integration or warehouse loads

Newer Terminology
EAI
  – Enterprise Application Integration
      – Acquire data from source application(s)
      – Transform data
      – Deliver data to target application(s)
  – Exchange of data between two or more applications

Newest Terminology (1)
A2A: Application to Application Integration
  – Exchange of data between two or more applications, typically without a web interface
  – May be “real-time” or batch
  – “Interfaces” between systems/applications (cf. Oracle Applications Interface tables)

Newest Terminology (2)
B2C: Business to Consumer Integration
  – A consumer, via a web site, interacts with software owned by one business
  – The business’s corporate database(s) is (are) queried in the transaction
  – The business’s corporate database(s) is (are) updated as a result of the transaction

Newest Terminology (3)
B2B: Business to Business Integration
  – “I’ll have my computer call your computer”
  – A transaction in one business’s computer automatically triggers a transaction in another business’s computer
  – B2B integration may be under the covers in B2C scenarios or performed independently of B2C transactions

Newest Terminology (4)
B2B:
“I took my notepad from my shirt pocket and displayed a standard contract … She glanced at it, then had her own computer scrutinize the document. Conversing in modulated infrared, the machines rapidly negotiated the fine details.
My notepad signed the agreement on my behalf, and Lansing’s did the same, and they both chimed happily in unison to let us know that the deal had been concluded.”
  Greg Egan, “Cocoon”, ©1994

Extract/Acquire (1)
Online, real-time database access
  – Native Oracle access
  – ODBC/JDBC
  – Oracle gateways ($$$)
  – Heterogeneous replication packages (such as DataBridge)
  – APIs (COTS packages such as SAP)

Extract/Acquire (2)
Alternate character sets
  – EBCDIC, ASCII, Unicode
  – 7-bit, 8-bit, 16-bit
Change Data Propagation (CDP)
  – Triggers
  – Event logs

Load/Deliver
Same access issues as Extract/Acquire

Transport (1)
Files
  – Connectivity
      – LAN/WAN, Internet, Sneaker-Net
  – Transfer protocols
      – FTP, proprietary, HTTP, HTTPS
      – WAP

Transport (2)
Messages
  – via queues
      – IBM MQ Series, Oracle AQ
      – Microsoft MSMQ, Java JMS
  – via email
      – POP3, attachments

Transform – Data Mapping (1)
Potential many-to-many mapping between sources and targets
  – “Point-to-point” mappings
  – vs. hub-and-spoke transformation engines
Algorithms to change the format and semantics of the data

Transform – Data Mapping (2)
Relationships
  – 1:1, many:many - facts of life
  – 1:many
      – Normalization - conversions, semantically overloaded attributes
  – Many:1
      – Mergers/acquisitions; multi-line text to LOB
Repository for impact analysis

Data Transformation (1)
Data type translation
  – Should be transparent (a la Oracle)
  – Except for rare types (e.g., bit maps)
Mutually intelligible data
  – XML
  – Emerging XML standards

Data Transformation (2)
Algorithms are often referred to as “business rules”
  – Rules may range from simple assignments
  – To complex lookups/translations on multiple columns, with referential integrity checks, data cleansing, functions, etc.
Rules and/or their components should be reusable

Data Transformation (3)
Algorithms/business rules
  – Ability to INSERT/UPDATE/DELETE
  – Ability to produce multiple target records per one source, or one target per multiple sources
  – Ability to track (and potentially reprocess) exceptions (not part of transform per se)

Data Transformation (4)
Data Cleansing - specialized transform process, applied to “dirty” legacy data
  – Report on fixes, exceptions
  – Ability to resubmit failed rows
  – Third-party products
      – Merge-purge software (typically for addresses)

Technology (1)
Selection criteria
  – Runs on your hardware and software
  – Support for physical data types
      – Files, databases, message queues
      – Internet and wireless protocols
  – Support for logical data types
      – Adapters, connectors, pre-built interfaces
      – Especially for COTS packages (Oracle Apps, PeopleSoft, SAP, Siebel CRM)

Technology (2)
Selection criteria
  – Ability to write business rules
      – Language, point-and-click, combination
      – Ability to use external code (custom or bought)
  – Maintainability
      – Ability to reuse components
      – Cost-benefit over the system development life cycle

Technology (3)
Selection criteria
  – Real-time, “near real-time”, batch
  – Metadata
      – Operational
      – Programmer-oriented (business rules)
  – Scalability (maintenance windows?)
  – Integration with other tools, skill sets
  – Support for corporate standards
  – Infrastructure required (middleware)

Oracle Technology (1)
PL/SQL and SQL*Loader
Data Mart Suite (RIP)
Oracle Warehouse Builder
Oracle Integration Server (MIA)
XML services

Oracle Technology (2)
Database services
  – Scheduling (DBMS_JOB)
  – Advanced Queuing (AQ)
  – Replication

Third-Party Tools
Specialized for a processing scenario
  – Conversions
  – Warehouse loads
  – Data Integration (A2A, B2B, B2C)
General purpose
  – Mainframe gateways
  – Heterogeneous replication

Summary
Select a methodology and tool to fit your processing scenario, using more than one tool if necessary
Integrate the tool(s) into your development and maintenance methodology

About the Author
Leslie Tierstein is a Technical Project Manager at STR LLC in Fairfax, VA. She can be reached at: ltierstein@strllc.com
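
Illustrative sketch
The acquire/transform/deliver pattern from the Overview, together with the reusable business rules and exception tracking listed under Data Transformation, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the approach of any tool named in this presentation: the function names (rename, lookup, integrate), the column names, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable

Row = dict
Rule = Callable[[Row], Row]  # a reusable "business rule" (hypothetical type alias)


def rename(old: str, new: str) -> Rule:
    """Simple assignment rule: move a source column to a target column name."""
    def rule(row: Row) -> Row:
        out = dict(row)
        out[new] = out.pop(old)
        return out
    return rule


def lookup(column: str, table: dict) -> Rule:
    """Lookup/translation rule: an unknown code raises KeyError."""
    def rule(row: Row) -> Row:
        out = dict(row)
        out[column] = table[out[column]]
        return out
    return rule


@dataclass
class Result:
    delivered: list = field(default_factory=list)
    exceptions: list = field(default_factory=list)  # (source row, error text)


def integrate(source: Iterable[Row], rules: list, target: list) -> Result:
    """Acquire rows, apply each rule in order, deliver good rows to the target."""
    result = Result()
    for row in source:
        try:
            out = row
            for rule in rules:
                out = rule(out)
        except (KeyError, ValueError) as exc:
            # Track the failed source row so it can be reported and resubmitted.
            result.exceptions.append((row, repr(exc)))
            continue
        target.append(out)
        result.delivered.append(out)
    return result


# Hypothetical "dirty" legacy data: the second row has an unknown state code.
legacy_rows = [
    {"CUST_NM": "Acme", "ST": "VA"},
    {"CUST_NM": "Globex", "ST": "??"},
]
rules = [rename("CUST_NM", "customer_name"), lookup("ST", {"VA": "Virginia"})]
warehouse = []
result = integrate(legacy_rows, rules, warehouse)

# Resubmit the failed rows after extending the lookup table.
fixed_rules = [rules[0], lookup("ST", {"VA": "Virginia", "??": "Unknown"})]
retry = integrate([row for row, _ in result.exceptions], fixed_rules, warehouse)
```

Keeping each rule as a small function is one way to make rules reusable across interfaces; collecting failed source rows, rather than aborting the load, is what makes the "report on exceptions" and "resubmit failed rows" steps possible.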