Teradata
Leaders in Enterprise Data Warehousing
John Tulley, Vice President, Teradata Canada
Email: John.tulley@ncr.com
Office: 905-478-8997

NCR Corporate Overview
• Fortune 500 company
• Global operations in more than 100 countries & territories
• 28,500 employees
• 2004 Revenue $5.984B
• 1999-2004 >51% revenue growth
[Chart: 2004 Revenue by Business Unit: Teradata, Financial, Retail, Systemedia, Customer Service, Payment & Imaging, Other]
Business units: Retail Solutions, Teradata Data Warehouse, Financial Solutions, Systemedia, Worldwide Customer Services

Top Industry Leaders Rely on Teradata
Teradata Top 10 (FORTUNE Global Rankings, July 2005)
> 80% of Top 10 Global Telco Firms
> 60% of Top 10 Most Admired Global Companies
> 60% of Top 10 Global Airlines
> 50% of Top 10 Global Retailers
> 50% of the Top 10 Transportation Logistic Firms
• Leading industries
> Banking
> Government
> Insurance & Healthcare
> Manufacturing
> Retail
> Telecommunications
> Transportation Logistics
> Travel
• World class customer list
> More than 800 customers
> Over 1200 installations
• Global presence
> Over 100 countries
• 4,000 world-wide professionals dedicated to data warehousing

The Teradata Difference
What We Do….
• Enterprise data warehouse
• Windows 2003/Unix/Linux scales from Intel laptop to MPP
• Analytic capabilities transform data into information
• Extreme high availability
• Industry leader in analytical applications
• Integration with SAP, Siebel, Hyperion
• Partnerships include Accenture, BearingPoint, Capgemini, Deloitte, EDS, Lockheed Martin
• Strong customer references
All we do is Data Warehousing!

Teradata - the recognized leader in data warehousing and high-performance decision analytics. … Gartner ASEM
[Chart: Gartner ASEM ratings by platform: IBM S/390 (OS/390, DB2 EEE); Sun Enterprise (Solaris, Oracle); HP HP9000 (HP-UX, Oracle); IBM SP RS/6000 (AIX, DB2 EEE); Compaq Alpha (Tru64, Oracle); Teradata; Generic Intel IA-32 (Win2000, SQL Server); Unisys ES7000 (Win2000, SQL Server)]
Rating criteria: Data Mgmt., Data Admin., Scalability and Suitability, Concurrent Query Mgmt.,
DW Track Record, Query Perform. (chart scale: Worst to Best)
Source: Gartner ASEM Ratings 2004

Industry Leadership Recognition
• Gartner - “Dominant Lead” – 5th Consecutive Year
> “DBMS is surely the place where NCR Teradata sets the gold standard. As in previous years, the Teradata score was 98%, leaving little scope (and need) for improvement.” – Gartner's [Application Server Evaluation Model] ASEM Data Warehouse Server Update, A. Butler, K. Strange, J. Enck, M. Chuba, November 2004
> “Teradata [database management system] DBMS capabilities remain unchallenged by its competitors in the market.” – Gartner’s Magic Quadrant for Data Warehouse DBMSs, 2004, Kevin H. Strange, June 2004
> “Teradata continues to drive a strong vision.” – Gartner Research, MarketScope: Customer Relationship Marketing, 1Q04, G. Herschel, J. Radcliffe, Feb 2004
> Gartner Dataquest recognized Teradata as the growth leader in the RDBMS market, with above-market growth of 17.4% (2005)
> Teradata is rated “Positive” in Gartner’s MarketScope for Campaign Management, the highest rating awarded (2005)
• META Group
> “Teradata has displayed unmatched (but often copied) strength of vision and focus in the [enterprise data warehouse] EDW market.” – METAspectrum Market Summary, Enterprise Data Warehouse METAspectrum(SM) Evaluation, 2004

Industry Awards and Recognition - 2005
• BI Excellence Award (Sponsor: Gartner Group)
> Continental Airlines - winner
> Cardinal Health - finalist
• Technology Leadership Award (Sponsor: Frost & Sullivan)
> Teradata selected for Leadership Award – CRM Analytics
• TDWI Best Practices Award
> sunrise TDC Switzerland AG – winner - Customer Relationship Management
• 1to1 Impact Award (Sponsor: Peppers & Rogers)
> Continental Airlines recognized as Technology Optimization winner
• Editors’ Choice Awards (Sponsor: Intelligent Enterprise)
> Teradata selected for the “Dozen” Most Influential BI Companies
> Winner, Customer Analytics category
• NEXUS Awards (Sponsor: New Zealand Direct Marketing Association)
> Bank of New Zealand, silver award - data mining & analytics; bronze award - data management

Government Agencies with Teradata Presence
• US Air Force
• US Navy
• US Transportation Command
• Defense Commissary Agency
• Army, Air Force Exchange
• Intelligence Community
• US Postal Service
• Italian Post Office
• Dept. of Justice
• Dept. of Housing and Urban Development
• Dept. of Agriculture
• Arizona, Iowa, Florida, Texas, Illinois, New York, Utah, Michigan
• RAMQ – Quebec
• Australian Tax Office
• South African Tax Office

Teradata Solutions Methodology
Project Management spans all phases; services are Technology Neutral Services.
• Strategy: Opportunity Assessment; Enterprise Assessment
• Research: Business Value; EDW Roadmap; Information Sourcing
• Analyze: Application Requirement; Logical Model; Data Mapping; Infrastructure & Education
• Design: System Architecture; Package Adaptation; Custom Component; Test Plan; Education Plan
• Equip: Hardware Platform; Software Platform; Support Management; Operational Mentoring; Technical Education; User Curriculum
• Build: Physical Database; ECTL Application; Information Exploitation; Operational Applications; Backup & Recovery
• Integrate: Components for Testing; System Test; Production Install; Initial Data; Acceptance Testing; User Training; Value Assessment
• Manage: Help Desk; Capacity Planning; System Performance; Business Continuity; Data Migration; HW/SW Upgrade; Availability SLA; System DBA; Solution Architect
Teradata’s success is the combination of hardware, software and methodology

Data Warehouse Needs Will Evolve
• Query complexity grows
• Workload mixture grows
• Data volume grows
• Schema complexity grows
• Depth of history grows
• Number of users grows
• Expectations grow
Stages, from most to least advanced (vertical axis: Workload Complexity):
> ACTIVATING: MAKE it happen! (Event-Based Triggering Takes Hold)
> OPERATIONALIZING: WHAT IS happening?
> PREDICTING: WHAT WILL happen?
> ANALYZING: WHY did it happen?
> REPORTING: WHAT happened?
Stage captions: Analytical Modeling Grows (predicting); Increase in Ad Hoc Analysis (analyzing); Primarily Batch & Some Ad Hoc Reports (reporting)
Workload types (legend): Batch, Ad Hoc, Analytics, Continuous Update/Short Queries, Event-Based Triggering
Horizontal axis: Data Sophistication

Enterprise Analytical Topologies
[Diagram: Sources feed each topology; components shown include Middleware, ODS, DW, Marts and Users]
• Virtual, Distributed, Federated: Leave Data Where it Lies
> Pros: No need for ETL; No need for separate platform
> Cons: No ETL; Meta data issues; Network bandwidth and join complexity issues; Only viable for low volume
• Data Mart Centric: Independent Data Marts
> Pros: Easy to Build Organizationally; Easy to Build Technically
> Cons: Business Enterprise view unavailable; Redundant data costs; High ETL costs; High App costs; High DBA and operational costs
• Hub-and-Spoke Data Warehouse: Dependent Data Marts
> Pros: Allows easier customization of user interfaces & reports
> Cons: Business Enterprise view challenging; Redundant data costs; High DBA and operational costs; Data latency; ODS duplication
• Enterprise Data Warehouse: Centralized Integrated Data With Direct Access
> Pros: Enterprise view; Design consistency & data quality; Data reusability
> Cons: Requires vision; Requires Data Owners to willingly participate

Typical Data Warehouse Architecture
What’s wrong with this picture?
[Diagram: Transaction Systems feed Operational Data Stores, then a Central store / Hub / Clearing house, then Data Marts]
1. There are too many copies of the data. Will they all be the same?
2. There is too much latency - too long to get the data to the people who need it. Everyone sees different, inconsistent points in time.
3. The solution is too complex. Every line on the chart represents an ETL process that requires $$ for Life Cycle Maintenance.
4. The solution is too expensive. There are numerous components that lead to increased costs. Costs are often hidden in a distributed organization.
Teradata’s Enterprise Data Warehouse
An Integrated, Centralized Data Warehouse Solution
• Single version of data in the “Enterprise” Data Warehouse
[Diagram: a normalized logical data model (ORDER, ORDER ITEM BACKORDERED, ORDER ITEM SHIPPED, CUSTOMER, ITEM) alongside a dimensional star schema (SALES fact table with PERIOD, PRODUCT, CUSTOMER and MARKET dimensions)]
• Data mart options: Data Replication; Logical (Views); Application; Dimensional; Co-Located Dependent DM; Optional Virtual Views
• Data flow: Transactional Data; Middleware/Enterprise Message Bus; Data Transformation (Optional ETL Hub, Optional ELT); Operational Data Store (ODS)
• Foundation: Logical Data Model; Physical Data Base Design; Metadata; Enterprise, System, & Database Management; Business & Technology – Consultation Support & Education Services
• Users: Transactional Users; Decision Users; Strategic Users; Tactical Users; Reporting; OLAP Users; Data Miners; Event-driven/Closed Loop

TERADATA is an Open System
Virtually any application or middleware framework can be integrated with TERADATA!
[Diagram: applications and middleware (JSP, ASP, JAVA, EJB, CORBA/IIOP, .NET, TAP Appl, WEB) connect to TERADATA through JDBC, ODBC and OLE-DB, and through TERADATA Utilities and Adapter(s) on a Message Bus (JMS Messages, Publish & Subscribe, Queues)]

Teradata Active Data Warehouse in action
Parties: Front Line, Base Supply, DOD Supplier, Warfighter Support
1. Continuous Transaction feeds on supplies usage
2. Conditioning & Loading of trans data (Ascential, Informatica)
3. Stored Procedures trigger based event detection; TERADATA sends alert to Warfighter, Warfighter Support, & DOD Supplier via MSTR Narrowcaster
4. (…) and/or DOD Vendor notified and reorders
5. Warfighter receives alert via Secure Blackberry, adjusts Battle Plans to align with rush replenishment
[Diagram components: Secure DOD Network; Secure Wireless; Enterprise Application Integration (EAI): Web Services, WebSphere, Tibco, .NET; Business Services: OLAP Queries, Rules Engine, Event Engine, Intel Agents; Strategic & Tactical Queries; Information Exchange; Legacy Systems; Direct Data Access; Data Acquisition: MQ Adapter, T-Pump, Fast Load, Multi Load, Fast Export; inside TERADATA: Stored Procedures, Q Tables, UDF, Triggers; Transactional Environment and Decision Making Environment]

So what is Teradata?

What is Teradata?
• RDBMS designed to run the world’s largest databases
• Latest Intel technology nodes
• UNIX-MP-RAS, Windows 2003
• Linux in Fall 2005
• Scales linearly from Laptop to MPP
• Has a parallel aware optimizer that allows multiple complex queries to run concurrently
• Standard access language (SQL)
• Uses a “Shared-Nothing” architecture
• Unlimited, unconditional parallelism
• Linear Scalability allows for increased workload without decreased throughput.
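The "Shared-Nothing" and linear-scalability bullets above can be made concrete with a small sketch. This is only an illustration under stated assumptions, not Teradata's implementation: MD5 stands in for the hash function (the deck does not describe the real one), and the names `amp_for_key` and `rows_per_amp` are hypothetical. It shows how hashing a row's key spreads rows almost evenly over any number of parallel units (the AMPs shown on the hardware slides), so each unit owns and works on its own slice.

```python
import hashlib
from collections import Counter


def amp_for_key(key: str, num_amps: int) -> int:
    """Map a row key to one parallel unit (hypothetical helper).

    MD5 is an assumed stand-in for the real hash function.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    row_hash = int.from_bytes(digest[:4], "big")  # first 32 bits of the digest
    return row_hash % num_amps


def rows_per_amp(keys, num_amps: int) -> list:
    """Count how many rows each unit would own under hash distribution."""
    counts = Counter(amp_for_key(k, num_amps) for k in keys)
    return [counts.get(amp, 0) for amp in range(num_amps)]


if __name__ == "__main__":
    keys = [f"customer-{i}" for i in range(100_000)]
    for num_amps in (4, 8, 16):
        counts = rows_per_amp(keys, num_amps)
        # Each unit holds roughly len(keys) / num_amps rows, so doubling the
        # number of units roughly halves the rows each one has to scan.
        print(num_amps, "units: min", min(counts), "max", max(counts))
```

Because row placement depends only on the key and the number of units, no central directory is consulted to find a row, which is the property the shared-nothing bullet relies on.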
Teradata Hardware Architecture
• SMP Nodes
> Latest Intel SMP CPUs
> Configured in 2 to 8 node cliques
> Windows, Unix or Linux
• BYNET Interconnect
> Fully scalable bandwidth
> 1 to 1024 nodes
• Connectivity
> Fully scalable
> Channel - ESCON
> LAN, WAN
• Storage
> Independent I/O
> Scales per node
• Server Management
> One console to view the entire system
[Diagram: four SMP nodes (Node1 to Node4) on the BYNET Interconnect, each running 2 PEs and 4 AMPs, with Server Management alongside]

Teradata Shared Nothing Architecture
[Diagram: nodes, each with processors (P) on a front-side bus (FSB) with their own Memory and I/O, linked through their I/O interfaces]
• Similar to Large SMP, except Interconnect runs at I/O Rates and not Memory Rates
• Longer Lifetime: I/O Interfaces have a 3-5 Year Lifetime
• Scaling Is By Increasing Link Data Rates and Parallel Links

SMP vs. MPP: The Teradata Advantage
• 2-Way SMP
> 1.8 Relative CPUs
> 4 GB Memory
> 3.2 GB/Sec BUS
> 3.2 GB/Sec Memory
> 1.5 GB/Sec I/O
• 4-Way SMP
> 3.1 Relative CPUs
> 4 GB Memory
> 3.2 GB/Sec BUS
> 3.2 GB/Sec Memory
> 1.5 GB/Sec I/O
• 2 2-Way Teradata Nodes
> 3.6 Relative CPUs
> 8 GB Memory
> 6.4 GB/Sec BUS
> 6.4 GB/Sec Memory
> 3 GB/Sec I/O
• 32 2-Way Teradata Nodes
> 57.6 Relative CPUs
> 128 GB Memory
> 102.0 GB/Sec BUS
> 102.0 GB/Sec Memory
> 48 GB/Sec I/O

Teradata Data Distribution
Dividing the Work
[Diagram: rows of Table A, Table B and Table C spread across the VAMPs]
• Rows are distributed evenly by hash partitioning
> Done in real-time as data are loaded, appended, or changed.
> No reorgs, repartitioning, space management
• Shared nothing software:
> Each VAMP owns an equal slice of the data.
> Each VAMP works exclusively & independently on its rows
> Nothing centralized: No single point of control for any operation (I/O, Buffers, Locking, Logging, Dictionary)
[Diagram: the Primary Index value passes through the Teradata Parallel Hash Function to produce a RowHash (Hash Bucket); each row (RowHash plus Data Fields) is assigned to one of VAMP1, VAMP2, VAMP3, VAMP4 … VAMPn, each with its own processor, memory and disk]

File System
• File system architecture is fundamentally different
> Broke all the rules
> No Pages, BufferPools, TableSpaces, Extents,...
> Data location and management are entirely automatic
> Space allocation is entirely dynamic
• Absolutely minimal labor required
> No reorgs – Don’t even have a reorg utility
> No index rebuilds
> No re-partitioning
> No detailed space management
> Easy database and table definition
> Minimum ongoing maintenance – All performed automatically

Self Managing Architecture
• Teradata’s self-managing philosophy provides the lowest total cost of ownership of any RDBMS
> Automatic, random and even data distribution
> Parallel-aware optimizer eliminates query tuning
> Parallel utilities with low setup and checkpoint restart
> Single operational view of entire MPP complex (AWS)
> Single point of control for the DBA (Teradata Manager)
> SQL-ready database management information (log files)

Teradata DBAs Don’t Worry About!
1. Install the Database
2. Understand, monitor and tune extensive operating system parameters
3. Understand, monitor and tune extensive database parameters
4. Determine the size and physical location and/or space allocations of tables and index partitions
5. Perform periodic table and index re-orgs
6. Manually restart multi-step load process when failure occurs
7. Ability to run queries and data maintenance 24x7
8. Sort data before loading
9. Calculate and configure fail-over plans in a clustered multiprocessing environment
10. Spend a lot of time planning and expanding the system
11. Query tuning for decision support

Teradata High Availability
• Teradata software provides high availability beyond other databases
> Compensates for hardware failures:
– Automatic failover for dynamic workload rebalancing (migrating VPROCS)
– Online, continuous backup (Fallback)
> Recycles before the operating system completes its reboot (multi-node system)
[Diagram: four SMP nodes (Node1 to Node4) on the BYNET Interconnect, each running 2 PEs and 4 AMPs]

Teradata’s Multidimensional Scalability
(It’s more than just big data)
• Amount of Detailed Data
• Concurrent Users
• Multiple Subject Areas
• Sophisticated Queries
> Simple Direct at the start
> Moderate Multi-table Join
> Regression analysis
> Query tool support
[Diagram: sample data model with ORDER, ORDER ITEM BACKORDERED, ORDER ITEM SHIPPED, CUSTOMER and ITEM tables]

EDW Requires Multi-dimensional Scalability
• Data Volume (Raw, User Data)
• Mixed Workload
• Query Concurrency
• Data Freshness
• Query Freedom
• Query Complexity
• Query Data Volume
• Schema Sophistication

The Teradata Difference
“Multi-dimensional Scalability”
[Diagram: the same eight dimensions, comparing Teradata with the competition]
• Teradata can Scale Simultaneously Across Multiple Dimensions: Driven by Business!
• Competition Scales One Dimension at the Expense of Others: Limited by Technology!
The Teradata Difference
“Multi-dimensional Scalability”
[Chart: Teradata vs. Others across six axes]
• Data Storage (raw, user data): 5 TB; 10 TB; 15 TB; 20 TB; 100’s TBs +
• Schema Sophistication: Simple Star; Multiple, Integrated Stars; Normalized; Multiple, Integrated Stars and Normalized
• Query Complexity: 3-5 Way Joins; 5-10 Way Joins; 15+ way Joins + OLAP operations + Aggregation + Complex “Where” constraints + Views
• Parallelism / # of Concurrent Queries: up to 1,000’s
• Query Data Volumes: MBs; GBs; TBs
• Workload Mix: Batch Reporting, Repetitive Queries; “Iterative”, Ad Hoc Queries; Data Analysis/Mining; Near Real Time Data Feeds; Active Data Warehousing

State of Michigan, Department of Community Health (DCH)
Teradata Customer Since 1991
Customer Profile
As the largest department in the State of Michigan, DCH is responsible for managing delivery of health care services to more than 1.2 million clients and overseeing an annual budget of $9.5 billion. DCH administers many of the state’s most critical programs, including Medicaid, WIC, and child immunizations.
Business Solutions
• Data warehouse integrates claims/encounters; beneficiary eligibility data; provider data; birth records; death records; long-term care assessments; WIC data; immunizations; lead screening; newborn screening; & notifiable diseases.
• Fraud & abuse
• Contract management with health plans
• Healthcare cost & quality assessment
• Overpayment & COB analysis
• Program effectiveness
• Predict State’s healthcare needs
• Prioritize health initiatives for future
Implementation Summary
• Integrated data from nine separate health-related agencies
• Managed and used by agency subject matter/programmatic experts, not by the IT department
• Over 200 users in Medicaid and 8,000 state-wide
Realizations and ROI
• Estimated annual savings of $75 million–$100 million due to advanced health care analysis
• Medicaid administrative costs have been reduced by 25 percent
• Recoveries for Medicaid fraud have doubled
• Maximized Medicaid program savings while sustaining quality care
• Warehouse helped Michigan go from “last to first” in child immunization rates
• Track and substantiate savings in Medicaid pharmacy costs
• 2004 TDWI Best Practice Award Winner – Government and Non-Profit Category

The New York State Department of Health (DoH)
Teradata Customer Since 1999
Customer Profile
New York’s Medicaid program provides critical health care services to more than 3.7 million participants – 2.4 million in New York City alone. To serve this constituency, the state processes and analyzes more than 300 million claims totaling more than $38 billion annually. It is the largest Medicaid program in the US.
Business Solutions
New York is making more rapid, informed decisions about programs, policies, and people across its vast Medicaid system.
• Fraud & abuse
• Tracking bio-terrorism indicators daily by pharmaceutical purchases with acute illness data from hospital emergency rooms
• Determining disease patterns and trends and the best possible treatment
• Tracking drug pattern usage to prevent abuse
• Program effectiveness
• Service delivery effectiveness
• Enhanced audit control
• Forecasting the cost and utilization of expensive prescription drugs
• Identification of overpayments
• Responding quickly to legislative inquiries
Implementation Summary
• More than five years of history
• 1.3 billion claims
• 650 users from 17 counties, expected to grow to thousands
Realizations and ROI
• First year in operation paid for entire implementation of the DW!
• Better analysis of integrated data resulted in recoveries in the millions!
• $16 million - Coordination of Benefits, $5 million - duplicate payments, $1 million - overpayments
• $187 million saved due to better policy decisions based on medical and pharmaceutical analysis
• Millions saved due to efficiency of analysis, such as an audit process reduced to 2 hours from 8 weeks
• 2004 NASCIO Award – Best Information Architecture Category

Iowa Department of Revenue
Tax Compliance
Business Benefits
• Have more accurate leads because of better information
• Experienced substantial savings; staff can:
> Analyze greater volumes of data
> Manage a greater number of cases
> Exercise a higher level of control over taxpaying behavior
> Before the EDW, this additional work would have required a 20-25% increase of the audit staff
• Generated $69.7M in incremental collections and refund reductions in 2003
> $30.6M through office examinations
> $17.4M in refund reductions
> $9.1M from tax gap revenues
> $7.5M in out-of-state audits of multi-state businesses
> $5.1M from in-state field audits

The Teradata Mission
Teradata Active Data Warehousing: strategic, tactical, event-driven decision making in a single, centralized, mission-critical, up-to-date version of the enterprise data
[Diagram: Sources feed the Active Data Warehouse, which serves tactical and strategic Users]
“Any Question, By Any User, At Any Time”
All Decision Making…from One Copy of the Data.

The Industry Leader in Data Warehousing
john.tulley@ncr.com