Data Warehousing at Acxiom
Paul Montrose

Agenda
• Acxiom Overview
• Data warehouses
• Transactional databases
• Hybrid databases
• What's new/future innovations
• Summary
• Questions and answers

Acxiom Overview
At Acxiom, we create and deliver Customer and Information Management Solutions that enable many of the largest, most respected companies in the world to build great relationships with their customers. Acxiom achieves this by blending data, technology and services to provide the most advanced customer information infrastructure available in the marketplace today.

Acxiom customizes industry-specific solutions to solve the unique business issues of the Automotive, Financial Services, Government Services, Healthcare, Insurance, Media, Retail, Technology and Telecommunications industries, as well as the Travel and Leisure industry.

Every solution that Acxiom offers is built from our core competencies:
• CDI/Technology
• Data
• Database
• Consulting and Analytics
• Privacy Leadership
• IT Outsourcing

Customer and Information Management Solutions for marketing, risk and IT help companies:
• Improve acquisition, retention, cross-sell, up-sell and channel management
• Improve authorization, increase collections and reduce fraud
• Increase operational efficiencies and improve end-user satisfaction

Data Warehouses
The characteristics of an Acxiom data warehouse generally are:
• Large, multi-terabyte databases
• Large periodic sequential data loads
• Denormalized database schema
• Sequential reads/full table scans
• Few or no indices
• Little or no transaction logging
• Robust periodic backup solutions
• Performance measured in megabytes/gigabytes per second (MBPS, GBPS)

The processing platform is generally a large global class server, or a cluster of servers, running UNIX.
The database is a large vertical database: denormalized, with few tables, but those tables are very long, hold sorted data, and sometimes contain several billion rows.
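Because warehouse performance is quoted in MBPS/GBPS, a quick back-of-the-envelope calculation shows why bandwidth dominates full-table-scan workloads. The sketch below is illustrative only: the row count, average row length and the two throughput rates are hypothetical figures, not Acxiom measurements, and the model ignores caching, compression and CPU cost.

```python
"""Back-of-the-envelope full-table-scan time (hypothetical figures only)."""

MB = 10**6
GB = 10**9
TB = 10**12  # decimal units, as storage throughput is usually quoted


def table_bytes(rows: int, avg_row_bytes: int) -> int:
    """Approximate table size as row count times average row length."""
    return rows * avg_row_bytes


def full_scan_seconds(size_bytes: float, throughput_bytes_per_sec: float) -> float:
    """Time to read the whole table sequentially at a sustained aggregate rate."""
    if throughput_bytes_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return size_bytes / throughput_bytes_per_sec


if __name__ == "__main__":
    # Hypothetical table: 4 billion rows averaging 1,000 bytes each (4 TB).
    size = table_bytes(4_000_000_000, 1_000)
    # Hypothetical rates: a narrow 200 MB/s path versus 2 GB/s of wide, striped bandwidth.
    for label, rate in [("200 MB/s", 200 * MB), ("2 GB/s", 2 * GB)]:
        secs = full_scan_seconds(size, rate)
        print(f"{label}: {secs:,.0f} s (~{secs / 60:,.1f} min)")
```

With these assumed numbers, one full scan takes 20,000 seconds at 200 MB/s but 2,000 seconds at 2 GB/s, which is why the warehouse design leans on wide storage bandwidth rather than indices.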
The data is striped across the storage in a way that prevents physical hot spots and takes advantage of the wide bandwidth.
The storage subsystem is very fast, with wide bandwidth and high levels of redundancy, which allows large amounts of sequential data to be moved in a very short time.

Transactional Databases
The characteristics of an Acxiom transactional database generally are:
• Small, usually no larger than a few terabytes
• Random and simultaneous inserts, updates, deletes, and queries
• Random reads and writes
• Normalized database schema
• Transaction logging and archiving, with incremental and periodic backup solutions
• Sub-second response generally required per transaction, taking concurrency into account
• Performance measured in transactions per second (TPS) and I/O latency

The processing platform is generally a medium/enterprise class server.
The database is normalized and uses lookup tables. Data is stored in random order within a table but is striped across the storage to prevent physical hot spots.
The storage subsystem is very fast, with low latency, nominal bandwidth and high levels of redundancy, which allows small amounts of selected data to be moved quickly.

Hybrid Databases
The characteristics of an Acxiom hybrid database generally are:
• Medium-sized, usually three to ten terabytes
• Random and simultaneous inserts, updates, deletes, and queries
• Random and sequential reads and writes
• Loosely normalized database schema
• Indices used sparingly
• Usually a batch maintenance process
• Transaction logging and archiving, with incremental and periodic backup solutions
• Sub-second response generally required per transaction, taking concurrency into account
• Performance measured in TPS, I/O latency, and MBPS

The processing platform is generally a medium-sized global class server.
The database is a large vertical database: loosely normalized, with few tables, but those tables are very long, hold sorted data, and sometimes contain more than a billion rows.
The data is striped across the storage in a way that prevents physical hot spots and takes advantage of the wide bandwidth.
The storage subsystem is very fast, with wide bandwidth and high levels of redundancy, which allows large amounts of random and sequential data to be moved in a very short time.

What's New/Future Innovations
Grid or scale-out environments:
• Use low-cost, commodity-based servers
• Use low-cost/no-cost operating systems
• Many servers can work on one problem, with aggregate processing power greater than that of one large server, for less money
• Not locked into a single vendor or supplier
• When adding a new node, you can use current technology at a lower price
• Need to understand and factor in peripheral costs such as network, administration, data center, etc.

[Diagram: parallel grid and clustered grid configurations built from IBM pSeries servers, with each node running an OS and a database (DB)]

Distributed Grid Database
• Shared-nothing environment: each partition has its own resources, allowing the database to scale out to as many as 999 partitions.
• Any partition can receive connections and distribute queries among the other nodes.
• Centralized management of the partitioned environment.
• Data is distributed equally across all partitions.

Summary
• Understand the process in which the database will be used, and fashion a solution that meets the requirements and customer expectations.
• Even though a DBA may be responsible only for the database, many factors, such as the operating system and hardware configuration, affect how the database functions and are therefore a concern to the DBA. A DBA must relate the database to its environment to achieve an optimized solution.
• A large multi-terabyte database is not a scary monster; it is the same as dealing with a smaller database, just with a few more zeros.

Questions?
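To illustrate the shared-nothing, equally distributed layout described on the distributed grid database slide, here is a toy, single-process Python sketch. It is not how IBM's product (or any specific database) implements partitioning: the CRC-32 hash, the `PartitionedTable` and `Partition` classes, and the `count_where` scatter-gather method are hypothetical names chosen for illustration, and real partitions would run on separate servers and exchange work over the network.

```python
import zlib
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class Partition:
    """One shared-nothing partition: it owns its rows and scans only those."""
    pid: int
    rows: List[Row] = field(default_factory=list)

    def local_count(self, predicate: Callable[[Row], bool]) -> int:
        # Evaluated locally; no partition reads another partition's data.
        return sum(1 for row in self.rows if predicate(row))


class PartitionedTable:
    """Hypothetical hash-distributed table spread across N partitions."""

    def __init__(self, num_partitions: int, dist_key: str) -> None:
        if num_partitions < 1:
            raise ValueError("need at least one partition")
        self.partitions = [Partition(i) for i in range(num_partitions)]
        self.dist_key = dist_key

    def partition_for(self, key_value: Any) -> int:
        # CRC-32 is deterministic across runs (unlike Python's salted str hash).
        digest = zlib.crc32(str(key_value).encode("utf-8"))
        return digest % len(self.partitions)

    def insert(self, row: Row) -> None:
        # Each row goes to exactly one partition, chosen by its distribution key.
        self.partitions[self.partition_for(row[self.dist_key])].rows.append(row)

    def count_where(self, predicate: Callable[[Row], bool], coordinator: int = 0) -> int:
        # Any partition can accept the connection and act as coordinator.
        if not 0 <= coordinator < len(self.partitions):
            raise ValueError("no such partition")
        # Scatter: every partition counts its own matches (serially here;
        # on a real grid these would run in parallel on separate nodes).
        partial_counts = [p.local_count(predicate) for p in self.partitions]
        # Gather: the coordinator combines the partial results.
        return sum(partial_counts)


if __name__ == "__main__":
    table = PartitionedTable(num_partitions=4, dist_key="customer_id")
    for cid in range(10_000):
        table.insert({"customer_id": cid, "state": "AR" if cid % 10 == 0 else "TX"})
    print("rows per partition:", [len(p.rows) for p in table.partitions])
    print(table.count_where(lambda r: r["state"] == "AR", coordinator=2))  # 1000
```

Hashing on a high-cardinality key such as a customer ID tends to spread rows roughly evenly, so each partition does a similar share of a scan. Distributing on a low-cardinality key (for example, the two-valued state code above) would concentrate rows on a few partitions and recreate the hot spots the earlier slides warn about.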