Informatica Data Virtualization
The “Foundation” for AGILITY & PRODUCTIVITY
Kerry Holton, Informatica Senior Sales Engineer

Let’s Win Something!!!
A copy of “Lean Integration.”
Tell me which box is the ONLY thing that data virtualization built on data federation does – and why?
• Answer questions along the way…
• Take some good notes!

Informatica Corporation Confidential – Do Not Distribute

To Learn More…
• Informatica.com > Products > PowerCenter > Data Virtualization Edition
• Informatica.com > Products > Data Virtualization
• Sign-Up: Expert Roundtables
• JOIN & DISCUSS: 2000+ Strong “Data Virtualization & Data Services Architecture” Group
• Data Virtualization Corner: http://vip.informatica.com/?elqPURLPage=8668

Agenda
• “2012” – The Year of “BI” Agility
• Data Virtualization – Overview, Problem & Need
• Key Use Cases
• Customer Examples
• Data Virtualization in Action
• Why Informatica?
• Next Steps & Q&A

“I’m writing you a million dollar check, but you’re not solving my big problem. My big problem isn’t getting the data into the data warehouse. My big problem is … getting the data out!”
ICC Director (VP of IM) to Dave Lyle (VP Product Strategy), end of Q3, 2009

“2012”
Have any of you had this discussion?
• Need for a new BI infrastructure
• Replacing spreadsheets
• Faster data access & reporting
[Diagram: Business / BI on one side, IT on the other]
BI will be the top priority for the CIO in 2012!
“Demands by users of business intelligence (BI) applications to "just get it done" are turning typical BI relationships, such as business/IT alignment and the roles that traditional and next-generation BI technologies play, upside down.
As business users demand more control over BI applications, IT is losing its once-exclusive control over BI platforms, tools, and applications.”
– Boris Evelson, Forrester Research, Blog “Top 10 BI Predictions for 2012”
• Business-focused BI
• $100M Qtr. in 2011
• 10k+ customers

How Long Does it Take to Deliver New Critical Data or Reports to the Business?

The Business Can’t Wait 3-6 Months For a Single View of All Enterprise Data
[Diagram: many Business Intelligence consumers wired to sources through Hand Coding, ETL, ESB/EAI, SOA, and EII. Sources: Applications, Databases, Warehouses, Cloud Computing, NoSQL, Social, Unstructured, and Partner Data (SWIFT, NACHA, HIPAA …).]

Overview

HealthNow’s Data Integration Challenges
• To add 1 product attribute to an existing report, IT estimated 1700 hours
• Different price info in each LOB
• 16 types of data sources
• NO REUSE
• So what did the business do? 30,000 data marts were created by shadow IT teams
[Diagram: Business consumers: BI (Cognos), Portal (WebSphere). IT sources: 30,000 Data Marts (MS Access), Data Warehouse (DB2), Facets [Benefits, Products] (Sybase ASE), Product Config Mgmt (MS SQL Server).]

The Fundamental Problem(s)…
Typical Data Integration Process: 1. Design 2. Change 3. Integrate 4. Unit Test 5. Validate 6. Deploy
• It takes too long to explain requirements
• It takes months to change a DW / add new critical data
• It takes many iterations to get the right data / reports
• Changes can break existing integrations & impact apps
Business is Involved Too Late
As-Is Value Stream Map (LOT OF WAIT & WASTE)

Trying to Solve it in BI Layer Just Won’t Scale… Why?
• No Reuse
• No Common Data Access Layer
• No Easy Way to Handle Change
• No Data Quality & No Data Consistency
[Diagram: BI layer directly on top of Unstructured Data, Spread Marts, EDW, Applications, and Data Mart]

What is Needed to Solve these Problems?
COMMON ACCESS LAYER ACROSS MANY DATA SOURCES
• Fast, direct access to data the business trusts
• Data abstraction & reuse of skills/logic
• Support all use cases: BI / DW, MDM, SOA
Think Virtual Machines for DATA!
[Diagram: Data Consumers (BI, Composite Apps, Portal) over a Data Abstraction layer of Logical Data Objects (CUSTOMER, ORDER, PRODUCT, …) giving a logical view of all underlying data in the Enterprise Data Sources]

How is the Market Trying to Address the Problems?
Time GAINED by federation is nullified by time SPENT on more processing
[Diagram: Data Virtualization (Built-On Data Federation): Access, Merge, Deliver from DW through a Virtual View to BI, with three gaps marked X: limited or data-source-only profiling; SQL/XQuery-only transformations & no data quality; cannot easily move to persistent store or reuse.]
• Addresses specific use cases
• No data movement / no copies / only federation
• Code heavy / not model-based / no reuse
• No tools for business self-service
• SQL/XQuery-only transformations
• No data profiling / no data quality
It’s like ONE step forward & TWO steps backward

What Are the Top 3 Key Capabilities for a Project that Needs Data Virtualization?
If performance is a given…
[Chart not recoverable. Dataset: 600. Source: Informatica Data Virtualization Expert’s Forum, 2011]

Are We Talking About TWO Separate Tools?

What Does the Ideal Solution Look Like?
[Diagram: seven numbered capabilities around a set of virtual tables (Customer, Accounts, CRM, Orders, Call Center, DW) shared by Business (Business Manager, Analyst, Steward) and IT (Developer, Architect) through Common Metadata: 1 MODEL, 2 ACCESS & MERGE, 3 PROFILE IN RT, 4 TRANSFORM IN RT (Advanced Transformations, Data Quality, Data Masking), 5 REUSE INSTANTLY (Batch, Web Services), 6 MOVE OR FEDERATE, 7 SCALE & PERFORM (Query Engine, WS Server, Optimizations & Caching).]

How Does Informatica Deliver the Ideal Solution?
Data Virtualization = (Data Integration + Data Federation) in ONE Tool
[Diagram: Access, Merge, Deliver from DW sources through a Virtual View to BI, annotated: Early Business Involvement; Prototype First; Analyze & Profile Data & Logic Anytime; Advanced Transformations & Data Quality; Move to DW or Instantly Reuse as SQL / WS.]
• Single environment for both data integration and data federation
• No data movement / no copies – but easily reuse virtual views for batch
• Early & iterative business (analyst) involvement – self-service
• Pre-built library of rich ETL-like advanced data transformations
• Integrated real-time, on-the-fly data profiling & data quality

NEW REQUEST
• Change / add an attribute
• Join new data not in DW
• Create a new report
NEW DATA & REPORTS THAT BUSINESS NEEDS & TRUSTS, DELIVERED IN DAYS vs. MONTHS

How Does It Work?
EXISTING QUERY
    SELECT * FROM customer_table

NEW QUERY
    SELECT * FROM customer_table
    INNER JOIN support_table
    ON customer_table.customer_num = support_table.customer_id

INSTANT REUSE
    SELECT * FROM SUPPORT
    WHERE customer_name = 'ACME'

[Diagram: data marts (DM), data warehouses (DW), ODS, and WEB sources beneath the virtual views CUSTOMER, CustSUPPORT, PRODUCT, INVOICE. The callout text overlaps in the source; legible fragments: data quality rules applied on-the-fly, historical data blended with operational data in real-time without data movement, existing integrations do not break, virtual view can be physically materialized later into DW.]

Informatica Data Virtualization at HealthNow
• Instant reuse (SQL, Web Services, Batch) across DW, BI, SOA & MDM, where the earlier picture showed NO REUSE
• Fast, direct data delivery: 1 week (vs. 3 months)
• Shared repository
[Diagram: Business consumers BI (Cognos) and Portal (WebSphere) read a “Virtual Table” Common Data Model (MEMBER, CLAIM, PRODUCT, ORDER) over the IT sources: 30,000 Data Marts (MS Access), Data Warehouse (DB2), Facets [Benefits, Products] (Sybase ASE), Product Config Mgmt (MS SQL Server).]

What Does Informatica’s Data Virtualization Solution Look Like?
NEW PowerCenter Data Virtualization Edition
New PowerCenter Edition for AGILITY & PRODUCTIVITY
Combines:
• Data integration (PowerCenter SE)
• Data Federation (Data Services)
• Data Profiling
• Business-IT Collaboration (Analyst)
Packaged for simplicity and attractively priced
Reuses existing skills and resources
[Diagram: package contents: Partitioning, Data Virtualization (IDS Full Use), Data Profiling (IDE Full Use), Developer Tool, Analyst Tool, 2 Adapters (PWX for Relational), ETL (PC Standard Edition).]

What Use Cases Are Supported?
1. DW/Business Intelligence (BI): Prototype DW & accelerate new data & reports from months to days
[Diagram: Business and IT, Change Request to Deploy to Production, Weeks/Days vs. Months]
2. MDM: Deliver a complete view of master & transactional data in real-time
[Diagram: INCOMPLETE VIEW vs. COMPLETE VIEW OF CUSTOMER through a Virtual View over MDM HUB, TRANSACTIONAL SYSTEMS, and DATA WAREHOUSE]
3. SOA: Deliver the missing data services layer to SOA & applications
[Diagram: BPM, ESB, Registry, Applications, Biz. Services, Data Abstraction, Data Sources]

What are the Benefits of Informatica’s Solution?
• Provide fast, direct access to critical new data & reports in days vs. months
• Enable rapid iterations to results with instant Biz-IT collaboration
• Deliver flexibility, ensure reuse & insulate applications from changes
COMPLETE, CURRENT & TRUSTED View of All Data, On-Demand

Customer Examples

BI, MDM, SOA – HealthNow NY Improves Risk & Pricing Analysis With Data Services
[Diagram: BI (Cognos) and Portal (WebSphere) consume SQL and Web Services from IDS, which exposes a Virtual Table over Data Marts (MS Access), Data Warehouse (DB2), Facets [Benefits, Products] (Sybase ASE), and Product Config Mgmt (MS SQL Server).]
The Challenge
• 16 enterprise databases and over 30,000 Access databases
• Took 1700 man hours to add a new product to portfolio
• Business had to go to 5 different sources for all information related to paid claims
• Continued data growth with over 30,000 claims processed per day
• Data proliferation leading to HIPAA compliance concerns
The Solution
• Logical data models and data services to represent their core data entities – MEMBER, CLAIMS, PROVIDER, ENCOUNTER, LAB RESULTS
The Benefits
• Speed of data delivery – Implemented first project in around 40 man hours.
This would have taken an order of magnitude more in the past.
• Complete view of the truth – Business users now access plan rate information from single service
• Better governance – Centrally managed virtual views as opposed to one-off data marts is improving governance of data
The Solution (continued)
• ‘Rate Letter’ project for determination of policy rates and discounts went live in May 2010
• Over 400 logical data objects and 2 web services being used by around 125 end users

BI, SOA – Large Latin American Bank Improves Governance
[Diagram: Microsoft Reporting Services and Customized Applications consume SQL and Web Services from Data Virtualization, which exposes a Virtual Table over Transactions Tables (Mainframe – Adabas, DB2), Data Warehouse (DB2 LUW), Credit Analysis, Applications, AML (SQL Server), and Financial Institutions (Flat Files and Messages).]
The Challenge
• Lack of visibility for proper supervision and regulation of the national financial system
• Persistent data replication even for one-time use
• Huge data volumes (Online 6 TB, DW 14 TB)
• Different reporting tools requesting different data combinations across heterogeneous data sources (Adabas, DB2, SQL Server, Files)
The Solution
• Logical data models to represent core business entities (e.g. CUSTOMER)
• Mainframe virtualization (join data from Adabas, DW DB2, Apps., 3rd Party)
• Logical data models and Web services to deliver flexibility and agility to respond to changing business needs
• Creation of logical data objects and physical materialization of virtual views to familiar PowerCenter environment
• Real-time analysis and joining of data
The Benefits
• Speed of data delivery – implemented first project in around 60 man hours and delivered a new virtual view in < 1 hour
• Better risk/fraud governance (across more than 6000 financial institutions) and compliance with BASEL I, BASEL II and SOX
• Complete single view of the truth – business users can now access consistent customer and plan rate data
• Centralized management and administration of logical data objects

BI, MDM –
VW Delivers a Complete View of Critical Data On-Demand
[Diagram: BI and Portal consume SQL and Web Services (Reuse) from IDS and IDQ, which expose a Virtual Table over MDM Hub (Customer, Purchase, Case) (IBM), DW (Service History) (Teradata), PRD [Campaign History] (SAGA/Win), and Transactional Systems (Warranty, Service) (Varied).]
The Challenge
• CUSTOMER data in > 30 systems, MDM hub, transaction systems, DW
• Have 80% data but missing critical 20% transactions - WARRANTY, SERVICE
• No authoritative source of CUSTOMER, PRODUCT data, conflicting relationships
• No complete view of CUSTOMER data on-demand is affecting service
• Without complete view of data, can’t meet goal to sell 3x more cars by 2018
The Solution
• Create a common data model for VW owners, prospects, & partners
• Federate data in real-time from > 30 systems & transactional systems
• Provide easy-to-use, browser-based tools for business & IT to collaborate
• Apply reusable DQ rules on-the-fly to CUSTOMER, PRODUCT data
• Instantly reuse data services for SQL or Web services
The Benefits
• Completed DI, DQ, & data services production pilot in < 1 month
• Can leverage operational efficiency & real-time decisions to differentiate
• Delivered accurate, complete view of CUSTOMER data, on-demand
• Lowered costs by increasing productivity & reuse of data services
• Supported strategy to triple sales to 1M vehicles annually, by 2018

Data Virtualization in Action

The “Keystone” – Business Owns the Data While IT Retains Control
[Diagram: Analyst Tool (Web Browser) and Developer Tool (Eclipse) share Common Metadata around a VIRTUAL TABLE, which is delivered as SQL or Web Service to a BI Report.]
• Role-based tools for Analysts (Web) & IT developers (Eclipse)
• Empower business analysts to:
  • Define entities & directly access & merge data to create virtual views
  • Rapidly profile data sources & logic without more processing
  • Quickly find data & rules via business glossary
  • Collaborate, test, validate & share results
• Cuts the wait & the waste in the process
Common metadata lets Analysts & IT collaborate in real time
[Diagram: the virtual table is also delivered as SQL or Web Service to a Portal, and as Batch ETL to the Data Warehouse.]

The 7 Steps to AGILITY & PRODUCTIVITY
[Diagram: seven numbered capabilities around a set of virtual tables (Customer, Accounts, CRM, Orders, Call Center, DW) shared by Business (Business Manager, Analyst, Steward) and IT (Developer, Architect) through Common Metadata: 1 MODEL, 2 ACCESS & MERGE, 3 PROFILE IN RT, 4 TRANSFORM IN RT (Advanced Transformations, Data Quality, Data Masking), 5 REUSE INSTANTLY (Batch, Web Services), 6 MOVE OR FEDERATE, 7 SCALE & PERFORM (Query Engine, WS Server, Optimizations & Caching).]

1. Model
Common Data Access Layer – Logical Data Object
[Diagram: logical data objects CUSTOMER, ORDER, PRODUCT, INVOICE over Unstructured Data, Spread Marts, Data Marts, EDW, and Applications]
• Represent underlying data as business entities (CUSTOMER)
• Provide a common logical view or abstraction of all data
• Import logical model from 200+ modeling tools (ERWIN)
• Use visual and metadata-based mapping language
• Instantly reuse logical data object for all applications

2. Access and Merge
Turn many data sources into ONE with Data Virtualization
[Diagram: CUSTOMER, SUPPORT, PRODUCT, INVOICE drawn from Analytical Data, Transactional Data, Master Data, Interactional Data, and Archived Data. Sources: Applications, Databases, Warehouses, Cloud Computing, NoSQL, Social, Unstructured, and Partner Data (SWIFT, NACHA, HIPAA …).]

3. Profile in RT
Rich set of integrated profiling capabilities to find data anomalies and to discover keys and hidden relationships:
• Column & Rule Profiling
• Midstream or Comparative Profiling
• Join & Overlap Analysis
• Primary Key / Foreign Key Profiling
• Dependency Profiling

4. Transform in RT
• Metadata-driven, codeless, graphical environment
• Rich, pre-built library of advanced transformations
• Integrated Data Quality transformations
• Define policies to mask sensitive data in real time

5.
Reuse Instantly
[Diagram: one METADATA REPOSITORY serving Batch, SQL, and Web services]
• Instantly reuse LDOs for any mode/protocol (SQL, WS)
• Single-click deployment to batch
• Execution & optimization separate from design-time
• No re-development & rebuilding of LDOs

6. Move or Federate
[Diagram: Data Federation (Access, Merge, Deliver from DW through a Virtual View to BI) beside Data Integration (Extract, Advanced Transform & Quality, Load from DW to DW and BI), linked by single-click deployment to PowerCenter (batch).]
Data Federation
• Specific use cases
• No data movement / no copies
• Real-time federation
• SQL/XQuery-only transformations
• No data quality / business validation
Data Integration
• Majority of use cases
• Physical data movement
• Bulk/batch, near real-time, real-time
• Advanced transformations
• Built-in data quality

7. Scale & Perform
• Leverage the proven, high-performance Informatica engine
• Optimized SQL Query engine & graphical Query Plan
• High-performance Web services server
• Rich set of optimizations & caching mechanisms
  • Rule Based, Cost Based, Push Down, Early Projection, Early Selection, Semi-Join, Virtual Table & Result Set Caching
• Fine-grained access control, WS-Security & pass-through security
  • Database, Schema, Table, Column, Row-Level (v9.5) security

Data Virtualization Built On Data Federation Does 1 Box – Which 1?
[Diagram: the same seven boxes: 1 MODEL, 2 ACCESS & MERGE, 3 PROFILE IN RT, 4 TRANSFORM IN RT, 5 REUSE INSTANTLY, 6 MOVE OR FEDERATE, 7 SCALE & PERFORM.]

Do it Right – Avoid Costly Mistakes!
Enabling Rapid Development
• Do it right: Sustain & Maintain (TIME, COST). Model & metadata-driven environment
• Costly mistake: Maintenance Nightmare (TIME, COST). 1000s of lines of code
Analyzing & Profiling
• Do it right: Get it Right 1st Time (TIME, COST, RISK). Profile data AND logic anywhere
• Costly mistake: Many Iterations & Mistakes (TIME, COST, RISK). Only source profiling, need extra processing
Integrating with Quality
• Do it right: Bake-in Quality (TIME, COST, RISK). Leverage pre-built logic including quality
• Costly mistake: Limited Rules, No Data Quality (TIME, COST, RISK). Hand-coding can’t do advanced transforms (SQL, XQuery, simple cleansing)
Scaling with Flexibility
• Do it right: Prototype First & Then Scale (TIME, COST). Virtualize or physically materialize in 1 tool
• Costly mistake: Overburden Data Virtualization (TIME, COST, RISK). Non-integrated technologies
Leveraging Investments
• Do it right: Re-purpose Logic & Skills (TIME, COST). Naturally extend your infrastructure
• Costly mistake: Re-invent the Wheel (TIME, COST). Re-work, re-deploy & re-train every time
[Diagram labels: Virtual Table, Optimizations, EII, Web Service]

Data Virtualization in Action

Scenario – Big Company
ISSUES
• Call center talk times increasing = scattered data + many screens
• Time wasted in correcting inconsistent & inaccurate customer data
• Agents can’t easily & quickly identify what products are owned
IMPACT
• Can’t easily identify top customers to improve up-sell/cross-sell
• Low customer satisfaction & growing customer attrition
• High marketing costs without targeted campaigns

Demo – Big Company
• Business needs a new report – NOW vs. months!
• Quickly merge data from multiple systems & cleanse
• Analysts know the data – want some self-service
Analyst
• Join CUSTOMER (Oracle CRM) & ORDER (file)
• Get ORDER TOTAL for ACTIVE customers
• Analyst defines business entity, profiles, defines rules & hands over to IT
IT Architect / Developer
• Integrate missing data, do data cleansing “on-the-fly,” validate
• IT enriches the business entity & publishes for BI tool, portal or batch

Why Informatica?
[Charts: Gartner Magic Quadrant for Data Integration Tools, 2011; Forrester Wave: Data Virtualization, Q1 ’12]
ONLY INFORMATICA COMBINES…
Power of The Platform

THE BEST OF “DATA INTEGRATION” (SOPHISTICATION)
“The ability to switch seamlessly and transparently between delivery modes (bulk / batch vs. granular real-time vs. federation) with minimal rework will be key for IT organizations seeking to develop a successful data integration strategy.”
Ted Friedman, VP Distinguished Analyst, Gartner

THE BEST OF “DATA VIRTUALIZATION” (AGILITY)
“With v9, Informatica advanced its capabilities with on-the-fly data quality and profiling, a model-driven approach to provisioning data services, performance enhancements, cloud integration, common metadata, and role-specific tools.”
The Forrester Wave: Data Virtualization, Q1 2012

…INTO ONE SOLUTION THAT REUSES SKILLS

Only Informatica Provides ONE Solution for Data Integration and Federation
[Diagram: Access, Transform, Deliver from DW sources through a Virtual View to BI, annotated: Early Business Involvement; Prototype First; Analyze & Profile Data & Logic Anytime; Advanced Transformations & Data Quality; Move to DW or Instantly Reuse as SQL/WS.]
• Single environment for both data integration and data federation
• No data movement / no copies – but can easily reuse virtual views for batch
• Early & iterative business (analyst) involvement, efficient collaboration
• Pre-built library of rich ETL-like advanced data transformations
• Integrated real-time, on-the-fly data profiling & data quality

Next Steps & Q&A

Have the Conversation with the Business!
Business: “New data & reports take too long…”
IT: “YOU” can now do it in DAYS!
1. Identify a Critical Project in Your Company
2. Involve the Business Early & Often
3. Bake-In Quality & Support Advanced Logic
4. Demonstrate Business Value Early
5. Self-Service + Data Virtualization = ROI

Next Steps & Q&A
• Informatica.com > Products > PowerCenter > Data Virtualization Edition
• Informatica.com > Products > Data Virtualization
• Sign-Up: Expert Roundtables
• JOIN & DISCUSS: 2000+ Strong “Data Virtualization & Data Services Architecture” Group
• Data Virtualization Corner: http://vip.informatica.com/?elqPURLPage=8668
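
Appendix sketch: the virtual view idea in plain SQL

The “How Does It Work?” slide and the Big Company demo both come down to one pattern: define a join once as a virtual view, then let any consumer query the view instead of the sources. The sketch below illustrates only that pattern. It is not Informatica code: an in-memory SQLite database stands in for the two sources, and the table names, the `status` and `amount` columns, and the sample rows are all assumptions made up for the example.

```python
import sqlite3


def build_demo_connection():
    """Create two stand-in sources and one virtual view over them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Stand-in for the CUSTOMER source (hypothetical columns).
        CREATE TABLE customer (
            customer_num INTEGER PRIMARY KEY,
            customer_name TEXT,
            status TEXT
        );
        -- Stand-in for the ORDER file (hypothetical columns).
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            amount REAL
        );

        INSERT INTO customer VALUES
            (1, 'ACME', 'ACTIVE'),
            (2, 'Globex', 'INACTIVE'),
            (3, 'Initech', 'ACTIVE');
        INSERT INTO orders VALUES
            (10, 1, 100.0),
            (11, 1, 50.0),
            (12, 2, 75.0),
            (13, 3, 20.0);

        -- The "virtual table": a view stores no rows of its own,
        -- the join is evaluated each time the view is queried.
        CREATE VIEW customer_order_total AS
        SELECT c.customer_name, SUM(o.amount) AS order_total
        FROM customer AS c
        INNER JOIN orders AS o ON c.customer_num = o.customer_id
        WHERE c.status = 'ACTIVE'
        GROUP BY c.customer_name;
        """
    )
    return conn


def order_total(conn, customer_name):
    """Reuse the view: consumers filter it without repeating the join logic."""
    row = conn.execute(
        "SELECT order_total FROM customer_order_total WHERE customer_name = ?",
        (customer_name,),
    ).fetchone()
    return None if row is None else row[0]


if __name__ == "__main__":
    conn = build_demo_connection()
    print(order_total(conn, "ACME"))
```

Because the view holds no copy of the data, a row added to `orders` shows up in the next query against `customer_order_total` with no reload step. A single local database cannot show the cross-system federation, profiling, or data quality steps the deck describes, so treat this as an illustration of the reuse idea only.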