[Diagram: Text Files, My ODS or OLTP System, ETL, My DW, Reports]
• Alter the shape
  • Create a Star Schema (de-normalized for analysis queries)
  • Surrogate Keys (in place of business keys)
  • Pre-Aggregations (to support some types of reporting)
• Track History
  • Slowly Changing Dimensions (history of entities)
  • Manage Partitions (once a month, roll up details and archive)
• Take changes from the store
  • React to Inserts/Updates/Deletes.
  • Could be a “full refresh” or incremental

[Diagram: Old Accounts Receivable on SAP, Create Consistency, New Custom AR system in SQL]
• A long running ‘bridge’
  • Existing systems will be left in place and kept in sync.
  • Reacts to changes in either system.
  • Needs a way to react to changes or messages to minimize tax on App systems
• The systems are different
  • Often different back ends.
  • Match schemas, tables, columns
  • Consistent data domains (like keys)
  • Detect and resolve duplicates
• Create a consistent level of granularity
  • Aggregate
  • Allocate

[Diagram: Old Accounts Receivable on SAP, New Custom AR system in SQL. Caption: Once the design is set and tested, execute this: transfer all data and map the shape.]
• Systems or Companies merged or acquired.
• Bring the data together into the “new” place.
• An integration system is designed, built, and tested to minimize the down time for the old system and make one smooth transition.
• Match schemas
• Consistent data domains (like keys)
• Detect and resolve duplicates
• May create a long running ‘bridge’ while the systems settle.

[Diagram: Customers data in Support, Accounting, Marketing, and Sales]
Creating ‘One Version of the Truth’
• Data residing in many sources where each source schema is fixed but different.
• Combined into one store with a consistent schema
  • Pivot / Unpivot
  • Type and domain mapping
  • Key generation
• Ensure quality
  • Remove duplicates
  • Provide missing data
  • Hard matching to find duplicates
• Bulk update and trickle changes
• Changes to central store delivered back to operational system

[Diagram: PartsAreUs, EZ Buy, WeShip, Supplier’s System, Order Fulfillment System, Shipper’s System, Orders, Internet / WAN]
• Contracts
• SLAs
• Standardized formats
• Long running transactions or business process
• Loosely coupled
• Coordination, message passing
• A very specific perspective on Application Integration.

Data Warehouse and Business Intelligence
Data Consistency Between Applications
Data System Migration and Consolidation
Master Data Management
Inter Enterprise Data Acquisition and Sharing

[Diagram: Point A, Point B, RDBMS, Text Files, XML, ETL, ELT]
• Move a sizeable set of rows from point A to point B
• Often
  • Part of a scheduled process
  • Transform the shape of the data being moved
  • Combine many sources or split into many destinations
• Two flavors
  • ETL (Extract Transform Load)
    • SSIS
    • Ascential DataStage (IBM)
  • ELT (Extract Load Transform)
    • Oracle Warehouse Builder
    • Bulk Insert

[Diagram: Line Of Business Application, Coordinator, RDBMS, Text Files, XML, Event, nodes A, B, C, D, with this message log]
From  To  Message
D     C   File Date
C     A   Insert
A     B   Purchase
• Central ‘Coordinator’
• Guarantees receipt and delivery of messages.
• Components are ‘at rest’ until activated by the coordinator or an external event.
• Data delivered in packets along with the message.
• Terms that might fit in this category:
  • CDC
  • Trickle Feed
  • SOA
  • Message Bus

[Diagram: XML, Text Files, Repl / Sync Agent, and the same message log]
• Maintaining equivalent copies of data in different locations
  • One master, many slaves
  • Multi-master
  • High Availability (live backups)
• Similarity between systems
  • Most often table copies on the same brand of RDBMS
  • Heterogeneous possible
    • Attunity, GoldenGate, etc.
• Transformations: Little to none
• Terms that might fit in this category:
  • CDC, Log mining
  • Merge Replication
  • Checksum tables

[Diagram: View Provider, Reports]
• Answers queries directly from many source systems
• View Provider may:
  • Optimize and execute the combined query (Joins, etc.)
  • Push query parts down to the source.
  • Provide unified security model
  • Provide unified metadata
  • Cache source data
  • Support Heterogeneous Sources

[Diagram: Source, Destination, CEP Engine, Event Processing, Event]
• Monitor a stream of data, create an event when
  • Temporal (time based) events occur
  • Running average or aggregate hits a limit
  • Interesting sequence of records is detected
• Also called CEP (Complex Event Processing)
• Different from the other Technology Types? I can’t tell yet.

[Diagram: Event Log]
• A collection of services common to most Data Integration solutions
• Shared semantic model
  • Metadata library
  • Manage hierarchies
  • Data artifact level security model
• Data Quality
  • Profile to understand
  • Merge to resolve duplicates
  • Find approximate matches
  • Test and monitor quality.
• Version management for data.
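The Data Quality bullets above (“Find approximate matches”, “Merge to resolve duplicates”) can be illustrated with a minimal sketch. This is not how any product named in the deck does matching; it uses Python's standard `difflib.SequenceMatcher`, and the function names, the sample customer names, and the 0.85 threshold are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not count."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_approximate_matches(names, threshold=0.85):
    """Return (i, j, score) for every pair of names at or above the threshold."""
    matches = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            score = similarity(names[i], names[j])
            if score >= threshold:
                matches.append((i, j, round(score, 2)))
    return matches


# Hypothetical customer names; the first two differ only in case and spacing.
customers = ["Contoso Ltd", "contoso  ltd", "Fabrikam Inc", "Northwind Traders"]
pairs = find_approximate_matches(customers)
```

The pairwise loop is quadratic, which is acceptable for a sketch but would likely need blocking or indexing on a real customer table. Candidate pairs would then go to a merge step that decides which record survives.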
Bulk Movement
Message Oriented Movement
Replication and Synchronization
Federated Views
Data Management and Quality
Stream Processing (CEP)

[Diagram: scenarios Data Warehouse and Business Intelligence; Data Consistency Between Applications; Data System Migration and Consolidation; Master Data Management; Inter Enterprise Data Acquisition and Sharing. Technology types with the percentages shown: Bulk Movement 15%; Message Oriented Movement 10%; Replication and Synchronization 60%; Federated Views; Data Management and Quality; Stream Processing (CEP) 15%]

[Diagram: the same scenarios and technology types, with the products SSIS, Service Broker, SQL Replication, BizTalk, Distributed Query, Master Data Services, StreamInsight]

Developer’s Mindset
How does a developer approach building a solution or modeling their application?
• “I just know SQL”.
• Message Oriented vs Sequential.

Application Pattern
What is the canonical application that is most resembled?
• DW Fundamentals (SSIS)
• Business Orchestration (BizTalk)

Data Size
Expected amount of data that will be processed in one transaction or integration event.
• One record at a time
• 1 million records

Data push or pull
Is data pulled from sources (sources must respond to queries) by way of the integration process and then pushed at destinations, or is data “made available” by a source on its own schedule and pushed through the integration, or perhaps data is pulled into a destination through the integration when the destination desires it?
• Push
• Pull

Latency
The integrated data has some amount of “staleness” when compared to the sources.
• Monthly / Weekly / Daily (SSIS)
• Hourly / Near real time (SI, DQ)

• One machine drives a process
• Many masters
• Message orchestrator (BizTalk)

Data Heterogeneity
Hub-spoke, etc.
Middle-tier or other locations for integration engine.
Availability (determines hub-spoke)
Authority: Who is in charge? (who is master)
Need for heterogeneity of Sources / Destination.

Conflict detection and resolution
Integration problem has a need for detecting and resolving conflicting versions of the same records in different systems.
• None (SSIS)
• Merge Replication

Data Integration or Movement
Before data is delivered to its final destination, must it be combined with other data that comes from a different source, versus a need to simply move, transform and react to data from mostly one source?

Data access patterns
Ad hoc vs. known-in-advance. Are the access patterns hard coded into the solution and fixed at “development” time, or are the access patterns determined at runtime via some flexible specification?
• SQL is very flexible
• SSIS hard codes metadata
• BizTalk can change sources on the fly

Data Shape
“Point” (data about a single entity) vs. table-valued data access patterns vs. message content or event data.
• Tables
• XML hierarchies

Topology
• SQL Server to SQL Server
• Oracle, SAP, Teradata, XML

Need for flexibility to changes in data shape. Should the mainline non-error case behavior expect to handle variant data formats? Need for complex transformation of data shape versus the simple data type conversions required by heterogeneity.
• SSIS Fixed structure
• BizTalk ‘Promoted’ properties
• SQL just adapts

Structured or unstructured data
Working with unstructured documents, blob data, semi-structured XML / rigid XML, flexible / rigid file formats that must be parsed, or rectangular table data?
• Structured Tables (SSIS)
• XML Messages (BizTalk)

Supports per-user security
If returning results to a user as if it were a data server, do the end user's credentials become part of a request and enable enforcement of heterogeneous security policies?
• Dist Query enforces user.
• SSIS batch runs with job’s context.
Recovery SLA
What happens when nodes are down or disconnected, and what kind of recovery is required when connection is reestablished? Are business processes “stopped” or “failed” when integration is delayed or incomplete?
• SSIS has error handling
• Dist Queries just ‘Fail’
• BizTalk has long running transactions and auto-retry.

Stream Processing
Need to react to temporal or localized changes in a stream of records.
• The ‘Point’ of StreamInsight
• User built script in SSIS

Known vs. variable data formats. Complexity of transformation
• Minor transform (Replication)
• XSL (BizTalk)

Move, Conform, Combine Data
• Text files, Oracle, SQL, SAP BW, Excel, etc.
• Merge, look-up, union
• Pivot, calculate, filter
Build a Data Warehouse
• Create slowly changing dimensions
• Pre aggregate
• Partition data
Coordinate Activities
• Send mail
• Loop over files
• Connect to FTP

Tool for ETL Developers
• Departmental and IT pros
• Special class of developer, might be able to write C# script.
• In BIDS (Visual Studio). Graphical Editor, Debugging
An Execution Environment
• Heads free automation of jobs
• Object model for embedded applications.
The I/E Wizard
• 1 time utility
• Load or export a file
• Movement of tables from one place to another

• Constructing a Data Warehouse
• Migration / Consolidation
[Diagram: Text Files, My ODS or OLTP System, SSIS, My DW]
• Bulk Movement
• ETL

Developer’s mindset: Sequential, some scripting
Heterogeneity: Files, XML, Access, Oracle, Teradata, etc.
Application pattern: DW fundamentals
Shape / Access: Rigid schema and access
Latency: Hourly
Conflict resolution: None
Data size: Millions of rows
Complex transform: Complex business logic, reshaping
Topology: 1 Machine drives
Recovery: Custom error handling logic.
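The “Create slowly changing dimensions” and “Surrogate Keys” ideas above can be sketched outside of SSIS. The following is a minimal plain-Python illustration of a Type 2 slowly changing dimension, not the SSIS Slowly Changing Dimension transformation itself; the `DimCustomerRow` class, the `apply_scd2` function, and the column names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DimCustomerRow:
    surrogate_key: int                 # warehouse-generated key used by fact tables
    business_key: str                  # key from the source system
    city: str                          # the attribute whose history is tracked
    valid_from: date
    valid_to: Optional[date] = None    # None marks the current version


def apply_scd2(dim: List[DimCustomerRow], business_key: str,
               city: str, as_of: date) -> DimCustomerRow:
    """Add a new version when the tracked attribute changes; otherwise do nothing."""
    current = next(
        (r for r in dim if r.business_key == business_key and r.valid_to is None),
        None,
    )
    if current is not None and current.city == city:
        return current                 # unchanged, keep the current version
    if current is not None:
        current.valid_to = as_of       # close the old version to preserve history
    new_row = DimCustomerRow(
        surrogate_key=max((r.surrogate_key for r in dim), default=0) + 1,
        business_key=business_key,
        city=city,
        valid_from=as_of,
    )
    dim.append(new_row)
    return new_row


dim: List[DimCustomerRow] = []
apply_scd2(dim, "CUST-1", "Seattle", date(2010, 1, 1))
apply_scd2(dim, "CUST-1", "Seattle", date(2010, 2, 1))   # no change, no new row
apply_scd2(dim, "CUST-1", "Atlanta", date(2010, 6, 1))   # change, new version
```

After these three calls the dimension holds two rows for the same business key: the Seattle version, closed on 2010-06-01, and the current Atlanta version with a new surrogate key. Fact rows loaded before the move keep pointing at the older surrogate key, which is how the history of the entity is preserved.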
[Diagram: CRM (SQL Server); Inventory Management (Oracle); Manufacturing Data (Flat files); Operational Database (Shop Floor Application); Attunity CDC for Oracle; Flat File Source; SQL Server Source; Staging DB; Data Warehouse (SQL Server); Data Mart (Reporting and Analytics). SSIS Packages are annotated “Data conversions, parsing, data quality, aggregations, …”, “Lookups, slowly changing dimensions, address cleansing, …”, and “Lookups, load facts and dimensions, surrogate key generation, …”]

Distributed Applications
• Run asynchronously
• Communicate reliably
• Communicate securely
Loosely Coupled Messaging
• Every system has its own data managed and administered independently
• Only communicate via messages
• Transactions do not span
• Specify message types and contracts
• A queue looks like a SQL Table. Routes connect queues
• Conversation is a persistent 2 way session of communication between two services
Part of SQL Server
• Single Install
• Unified programming, administration and security. Great if you love SQL.
• SQL Server benefits: Transactions, Backup, Mirroring

• Consistency Between Applications
• Master Data Management (?)
[Diagram: Database 1, Service A, Queue A, conversation, Queue B, Service B, Database 2]
• Message Oriented Movement

Developer’s mindset: “I Love SQL!”
Heterogeneity: SQL Server to SQL Server
Application pattern: Data tier, Loose coupled
Shape / Access: Flexible
Latency: Near real time
Data size: Many small messages
Complex transform: Minimal, Data carried in messages
Recovery: SQL Transactions

[Diagram: Service Broker, sproc, Source Table split into subset 1 (x rows) … subset n (x rows), SSIS on server 1 … server n, 32 cores, Result Table]

Synchronized Tables
• SQL Tables
• Many copies in different databases
• Changes may originate in any database.
Key Scenarios
• Read Scale
• Reporting and Staging
• Geo Data Locality
• Branch Office
• Offline Sync
• EIM
Part of SQL Server
• Tables
• Stored Procedures
• Build a custom data tier application
Management and Configuration
• SQL Server Management Studio

• Data Warehousing
• Data consistency between Applications
• Migration / Consolidation
• Replication and Synchronization

Developer’s mindset: SQL centric
Heterogeneity: Mostly SQL to SQL. Some support
Shape / Access: Rigid schema and access
Latency: Minutes
Conflict resolution: Merge
Data size: Changed records
Complex transform: Slight in the heterogeneous case
Topology: Bi Directional, Many masters

[Diagram: Corporate Offices, LOB Systems, SSIS (daily), ‘Central’, Service Broker, Merge Replication, ‘Branch’, Transactional Replication, Online Terminal, Branch Office]

Unified View of Data sources
• One SQL Query that joins/combines data from n remote servers.
• Consistent type system
• Consistent query grammar
Gateway to Remote Data
• Cannot move data
• Healthcare, Finance
• Privacy restrictions. Stored Procedures only access to data
• Augment restricted data
Federated Databases
• Ad-hoc BI. One time or infrequent use.
• Combine data from Microsoft eco-system (Access, Excel, SSAS)

SQL Features
• Linked Servers
• OPENROWSET, OPENQUERY, OPENDATASOURCE
Data Sources
• OLE DB as protocol
Query Optimizer
• Rowset Remoting
• Query Expression Remoting

• Business Intelligence
• Master Data Management
[Diagram: Access, SQL, SQL]
• Federated Views

Developer’s mindset: SQL
Application pattern: Ad-Hoc or infrequent reports
Latency: none
Data size: Quickly remotable tables
Topology: Hub and spoke
Heterogeneity: Some via OLEDB.
Mostly SQL to SQL
Complex transform: Through SQL operators

Messaging
• Connecting Disparate Systems Across Various Boundaries
Orchestration
• Automating Business Processes
Heterogeneous Data
• LOB, Legacy, Technologies, RDBMS
Business Activity Monitoring
• Providing Process Visibility and Analytics
B2B
• Connecting Business Partners
Manage Business Rules

Server
Messages
• Hosts and runs ‘Orchestrations’
• Message Delivery
• Long Running Transactions
• XML, XPath, XSLT

• B2B
• Data Consistency Between Applications
[Diagram: LOB App, OLTP, XML Docs, Orchestration Logic, BizTalk Server]
• Message Oriented Movement

Developer’s mindset: Message Oriented. SOA
Heterogeneity: Highly mixed sources of messages
Application pattern: Orchestrated Business Process
Shape: XML
Latency: Minutes / Seconds
Data size: Message Contents. 100KB
Complex transform: XSLT on Message content.

[Diagram: LOB App, XML Docs, Orchestration Logic, BizTalk Server, OLTP, SSIS Package, SSIS Package, DW]

CEP Engine
Captures Events
Rich Query Semantics
.Net integration
• Monitor stream of data from database query, hardware device, internet feed, etc.
• Point in time event
• Fixed duration events with a sliding window
• Interesting sequence of events
• Grouping and aggregation with windows
• Correlate event streams
• Absence of activity or too much activity.
• Calculations, filters, top-K
• Ideal for custom applications
• LINQ Syntax for stream semantics

• Business Intelligence
• Data Warehousing
• Message Oriented Movement
• Stream Processing
[Diagram: Data Sources, Operations, Assets, Feeds, Sensors, Devices; Input Data Streams; CEP Engine with query operators f(x), g(y), f'(x), h(x,y); Results; Operational Data Store & Archive]

Developer’s mindset: SQL and .Net
Application pattern: Stream Processing

[Diagram: Switch Logs, Fact Processing, StreamInsight Component, SSIS, DW, Fraud]

www.microsoft.com/teched
www.microsoft.com/learning
http://microsoft.com/technet
http://microsoft.com/msdn
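The stream-processing idea described in this deck (an aggregate over a sliding window that raises an event when it hits a limit) can be sketched in plain Python. This is only an illustration of the concept and does not use StreamInsight or its LINQ syntax; the function name `running_average_alerts`, the sample readings, and the window and limit values are assumptions made for the example.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def running_average_alerts(
    events: Iterable[Tuple[float, float]],
    window_seconds: float,
    limit: float,
) -> Iterator[Tuple[float, float]]:
    """Yield (timestamp, average) whenever the windowed average reaches the limit.

    `events` must be (timestamp, value) pairs in increasing timestamp order,
    and `window_seconds` must be positive.
    """
    window = deque()   # events currently inside the sliding window
    total = 0.0
    for ts, value in events:
        window.append((ts, value))
        total += value
        # Drop events that have slid out of the window.
        while window and window[0][0] <= ts - window_seconds:
            _, old = window.popleft()
            total -= old
        average = total / len(window)
        if average >= limit:
            yield ts, average


# Hypothetical sensor readings as (timestamp in seconds, value).
readings = [(0, 10.0), (1, 20.0), (2, 90.0), (3, 100.0), (10, 5.0)]
alerts = list(running_average_alerts(readings, window_seconds=3, limit=50.0))
```

With these readings a single alert is produced at timestamp 3, when the three readings still inside the window average 70.0. A generator is used so that events are handled one at a time as they arrive, which matches the “monitor a stream” framing better than loading a batch.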