RED BRICK WAREHOUSE / INFORMIX STATE OF TECHNOLOGY

Joe Carr
Frank A. LoPinto
Robert Totin

ISM 611: Enterprise Data Systems
November 20, 1999

Joe Carr Frank A. LoPinto Robert Totin Red Brick State of Technology Page 1 November 8, 1999 ISM 611: Enterprise Data Systems

EXECUTIVE SUMMARY

"As an integral piece of Informix Decision Frontier Solution Suite, Informix Red Brick Warehouse implementations for data marts clearly show proven time-to-market advantages and the highest success rates. Informix Red Brick Warehouse is a specialized server technology that has been designed and optimized for analytic data mart solutions, for complex queries, fast load performance, high-capacity/high-performance processing, and for efficient management of very large databases."

Today’s global business environment is unprecedented in its dynamics, demanding that organizations adapt their strategies and business practices far more rapidly and intelligently than ever before. Changing market conditions, such as mass customization in the retail sector, deregulation in banking, utilities, communications, and insurance, and the rise of electronic commerce, are driving change within organizations at an unprecedented rate and affecting business across all industries.

Organizations know that having high-quality information (about markets, competitors, economic conditions, resources, and their own business) has gone beyond being a success factor and has become business-critical. The quantity, complexity, and scope of data are growing exponentially, and decision-makers are finding it harder to make sense of it. However, some of Informix’s leading customers are developing new business strategies. For example:

A bank can combine information from different product lines to understand who its most valuable customers are. Once they are identified, the institution can proactively market to them in new, innovative ways.

A telecommunications company can analyze call records and identify customers who are likely to defect to a competing vendor. The company can then decide how much to invest to prevent "churning".

A manufacturing firm can consolidate data from various sources to determine which vendors' goods cause downstream product failures. Enabling manufacturers to work with their suppliers to improve processes and procedures helps them produce higher-quality products, reduce waste, and improve customer satisfaction.

Achieving better knowledge management through the acquisition and use of data and information technologies allows businesses to implement new techniques in analytic merchandising that help them survive. Informix Red Brick Warehouse empowers businesses to get in front of the business intelligence curve by providing smart solutions for data warehouses, Web warehouses, and analytic data marts.

BUILT SPECIFICALLY FOR DATA MARTS

Data marts empower knowledge workers with the ability to make better, fact-based decisions about their business. A database that supports this process must store large volumes of business data and quickly and reliably answer the widest range of business questions. These high-performance, high-capacity requirements demand a product with specialized technologies. Informix Red Brick Warehouse is an open, relational database designed specifically to meet the specialized needs of data marts. In addition, Informix Red Brick Warehouse is optimized for complex, high-performance Web analytics. The advanced Red Brick Warehouse architecture delivers superlative response times to the most complex business questions. Businesses that can gather, analyze, understand, and act on information fastest will achieve greater success.
Informix Red Brick Warehouse can provide answers to “any question, of any data - fast.”

DATA MARTS FOR E-COMMERCE

E-commerce, the new model for business, is growing exponentially. Built on the Internet, the most extensive communications network available today, e-commerce is profoundly altering the way business is done around the world. To be competitive, organizations must develop new markets, retain customers, and capitalize on e-commerce opportunities. Informix Red Brick Warehouse enables companies to execute e-commerce strategies by giving them the tools they need: business-critical intelligence and high performance.

Analytical capabilities like "market basket analysis" show a hardware retailer not only how many printers are sold but also a snapshot of each transaction, including the time of purchase, the payment method, and the other items purchased with the printer. Aggregating the millions of transactions that occur over time provides a detailed composite of customer behavior and allows better targeting of promotions, resulting in higher sales. Informix Red Brick Warehouse provides the ease of implementation, scalability, and reliability that companies need to analyze their use of the Internet for e-commerce more effectively. To handle the most demanding data warehousing e-commerce requirements, Informix Red Brick Warehouse also supports Java, the language of the Internet.

THE BUSINESS-CRITICAL DATA MART SOLUTION

Informix Red Brick Warehouse delivers higher levels of performance more cost-effectively and significantly outperforms other vendor offerings. Informix Red Brick Warehouse relies on performance-enhancing technologies, which provide faster solutions, lower costs, and higher return on investment.
Regardless of the complexity, Informix Red Brick Warehouse delivers solutions faster, keeps costs lower, and provides better answers to business intelligence questions.

SUPPORTS THE ENTIRE DATA MART PROCESS

Significant Red Brick Warehouse advantages in data mart design span loading data into the data mart, storing and managing data within the database, and extracting information. Solutions built around Informix Red Brick Warehouse are successful because Informix Red Brick Warehouse integrates the entire data mart process, with key performance technologies applied at each step.

THE RED BRICK ADVANTAGE

In 1998, Red Brick Warehouse version 5.1 was released with the claim that it was the fastest and most scalable relational database for data warehousing, data marts, OLAP, and data mining. Skeptics questioned the use of a relational database for OLAP and data mining. They knew from experience that relational databases worked fine for data warehouses and data marts, but they believed that special functionality was needed for data mining and that OLAP required multidimensional modeling to drill around various dimensions.

Red Brick Warehouse version 5.1 consists of three components: a database server, a load subsystem, and gateway technologies for client/server access. The server was designed to support databases larger than 500GB with billions of records, using compressed indexes to reduce storage space requirements. It uses parallel joins and a trademarked technology called parallel-on-demand, which partitions queries for optimal parallelism. Red Brick supports conventional B-tree, star, and target indexes for different types of queries. Red Brick uses a multiple join algorithm to overcome the performance problems of conventional sequential pairwise joins in star schemas.
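
The contrast above between a single multi-table join and a chain of pairwise joins can be illustrated with a toy example. The sketch below is not Red Brick's STARjoin implementation, whose internals this paper does not describe. It only shows the general idea of filtering the small dimension tables first and then probing one composite-key index on the fact table. All table names, keys, and values are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy star schema: two dimension tables and one fact table.
stores = {1: {"city": "Reno"}, 2: {"city": "Boise"}, 3: {"city": "Reno"}}
products = {10: {"brand": "Acme"}, 11: {"brand": "Zed"}}
# Fact rows are (store_id, product_id, dollars).
sales = [(1, 10, 5.0), (1, 11, 7.0), (2, 10, 3.0), (3, 10, 4.0), (3, 11, 1.0)]


def build_star_index(fact_rows):
    """Map each (store_id, product_id) key combination to fact row positions."""
    index = defaultdict(list)
    for pos, (store_id, product_id, _dollars) in enumerate(fact_rows):
        index[(store_id, product_id)].append(pos)
    return index


def star_join(fact_rows, index, store_dim, product_dim, store_pred, product_pred):
    """Filter the dimensions first, then probe the fact index once per
    qualifying key combination, instead of joining tables pair by pair."""
    store_ids = [key for key, row in store_dim.items() if store_pred(row)]
    product_ids = [key for key, row in product_dim.items() if product_pred(row)]
    matches = []
    for key in product(store_ids, product_ids):
        for pos in index.get(key, []):
            matches.append(fact_rows[pos])
    return matches


star_index = build_star_index(sales)
reno_acme = star_join(
    sales, star_index, stores, products,
    store_pred=lambda row: row["city"] == "Reno",
    product_pred=lambda row: row["brand"] == "Acme",
)
total = sum(dollars for _store, _product, dollars in reno_acme)
print(reno_acme, total)
```

In this sketch the fact table is never scanned in full: only the key combinations that survive the dimension filters are looked up. A production engine would also have to deal with large cross products of qualifying keys, which this illustration ignores.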
The use of "hybrid" index types can mix the domains for a column to neutralize the effects of a skewed data distribution. Red Brick Intelligent SQL (RISQL) has numeric and string functions and macros and was specially developed to simplify repetitive DSS queries. Red Brick's high-performance load subsystem, called Table Management Utility (TMU), was designed to populate data warehouses quickly and efficiently by providing data aggregation, conversion, and transformation utilities along with integrity checking and index updating in one integrated run.

RED BRICK DATA MINE OPTION

Red Brick Data Mine Option operates on the premise that the data-mining tool should be taken to the data instead of taking the data to the mining tool. In collaboration with DataMind Corp, Red Brick integrated DataMind's neural network, decision tree, and statistical algorithms into the core of the RDBMS server to allow data mining directly on the Red Brick Warehouse database. Users can create multidimensional models that appear as tables in the database. When data is inserted into the data warehouse tables, data mining calculations are performed and the results are stored in model tables, which can be analyzed or "mined" using RISQL. The model tables can be created using the GUI-based Red Brick Data Mine builder or through extended SQL statements issued directly to the warehouse DBMS server.

This approach to OLAP and data mining saves a considerable amount of data extraction, transformation, shipping, and loading and minimizes redundant storage of data in different databases. It reduces administrative procedures by allowing security and administrative tasks to be performed only once on the database.
The biggest advantage of this approach is that when analysts want to perform drill-down analysis, the detailed data is at hand in the same database and the mappings between the OLAP data and the warehouse data are defined in the metadata.

NEW FEATURES

New features in Red Brick Warehouse version 5.1 improve ease of use and administration. Red Brick Vista, a component of the Red Brick server, has features to manage and process aggregate queries for life cycle management. Aggregate Advisor can audit selected aggregates and compare their estimated gains to actual system usage in order to calculate a cost formula for each potential aggregate. This allows DBAs to choose which aggregates to create. The Transparent Query Rewrite function analyzes complex SQL queries and transforms them to use the appropriate stored aggregate. An administrator can edit the aggregation strategies without affecting existing applications and queries. Red Brick's parallel loader, TMU, which loads aggregates automatically when a base table is updated, uses the metadata layer, as do Aggregate Advisor and Transparent Query Rewrite. This layer contains the definitions of the aggregates and the tables upon which they are based.

SQL-Backtrack is an administrative utility to manage fast backup and recovery. It supports online, incremental, and parallel backups. Red Brick Warehouse Administrator is a GUI-based administrative tool that controls all the data warehousing tasks on all the platforms that Red Brick Warehouse supports and is especially focused on segmentation and partitioning.

LIMITATIONS

All queries submitted to a Red Brick Warehouse database are subject to a limit of 8K on the row size of the intermediate and final result tables. This limit is the maximum size of a row in a table and is important when considering all the possible queries a user may wish to make.
Joining large descriptive columns from the dimension tables to a wide fact table could exceed this limit. The database server has a default stack size of 5MB and will fail if it runs out of stack space. For extensive data mining operations, this may be problematic.

FINAL ANALYSIS

Red Brick's table, index, and query strategy designs are aimed at carrying out data warehouse queries of any complexity as fast as possible on very large data warehouses. Red Brick caters to large data warehouses in its loading, administrative, and backup and recovery facilities, all of which promote a high degree of parallelism. The OLAP functionality built into the relational database server is a very interesting feature and a unique approach, since most other vendors view OLAP as a specialized area requiring a multidimensional server. The strategy of taking the OLAP function to the data saves a considerable amount of data duplication and upload/reload processing. It also maintains links between the multidimensional views and the warehouse data that are extremely useful for drill-down operations. Implementation and management of a data warehouse should be much easier using a product such as Red Brick Warehouse version 5.1.

COMPANY BACKGROUND

First released in 1990, Red Brick Warehouse is an RDBMS designed exclusively for data warehouse processing and uniquely suited for decision-support processing. Based upon "Star Schema" design principles, touted by noted system architect Ralph Kimball, Red Brick Warehouse quickly gained support in the field of data warehousing and data mining. However, by 1998, Red Brick Warehouse had fallen on hard times financially. In the fall of 1998, Informix, Inc. negotiated the purchase of Red Brick Warehouse for $35 million and began integrating Red Brick's products into an organization that already had a worldwide presence. The impetus for this acquisition was the strength of the Red Brick Warehouse database product.

THE BUY-OUT

Informix officials were not very talkative in early October 1998, when the acquisition of Red Brick Warehouse by Informix was announced. Red Brick Warehouse was and still is based on a star schema model ideally suited for decision-support data warehousing but not for OLTP. Informix, Inc., a worldwide presence in the market for high-end OLTP with its Dynamic Server, obtained much stronger data warehousing capabilities with the purchase of Red Brick Systems, Inc. The $35 million deal, which closed late in 1998, gave Informix new decision-support and data movement capabilities for its Dynamic Server database. It also provided Informix with superior data warehouse talent, a significant presence in key markets, and the "best in class" data mart technology. Officials declined to predict how the Red Brick products would be integrated into the Informix family but made it clear that the two flagship products would continue to be supported separately for the time being. The Red Brick brand will be maintained in selected geographies because of its strong equity with customers. Informix wants to continue to leverage and extend this advantage in the high-end enterprise data warehouse market.

BETWEEN THEN AND NOW

In January 1999, Informix announced that it would offer two distinct versions of its Dynamic Server database, one for OLTP and the other for data warehousing. The data warehousing version, Yellowstone, would be available in mid-1999.

In May 1999, Informix announced that it would introduce, through Red Brick technology, a new product, i.Sell, which will give Internet-based retailers a way to trace and predict their customers' buying habits by allowing users to analyze the actions of visitors to a web site.

In July 1999, Informix CEO Jean-Yves Dexmier spoke about the continued focus on the Internet as a revolutionary new market, particularly for business intelligence. Dexmier maintains that on the Internet, every click changes and expands business intelligence, and that business intelligence is now exponentially more complex. Informix's Web solution is two-pronged. The first prong, Centaur, is a technology foundation strategy tied to Informix Dynamic Server. The second, the i.Reach and i.Sell products, is tied to the Red Brick technology and will provide tools to analyze web-based traffic.

Also in July, Informix announced a new data warehousing software product called Red Brick Decision Server, which will provide administrators with advanced analysis of web traffic. It will be attractive to those who want to analyze click-stream data. The software will support variable-length character strings, enabling storage of URLs while minimizing disk space use. Informix is integrating with Red Brick to provide data warehousing and data mining of web traffic patterns.

In August 1999, beating all previous results, the Red Brick Warehouse, on the Sun platform, successfully loaded, queried, and scaled a data warehouse to more than 300GB of raw data with up to 600 concurrent users. Only half of the CPU capacity was used, and table loading at 14GB/hour was 2.3 times faster than in prior tests. The test simulated a retail environment of 63 stores, more than 19,000 products, 3.6 million transactions/day, and 35 ongoing promotions. The data warehouse included two fact tables and five dimension tables. Loading consisted of data cleaning, index building, referential checking, and aggregation updating. The results demonstrate the system's outstanding performance for large-scale warehousing in the intense retail environment.
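
The load figures quoted above can be put in perspective with a short calculation. This is a back-of-the-envelope sketch, not part of the published results: it assumes the 14GB/hour rate was sustained over the whole load and treats "more than 300GB" as exactly 300GB, so the times are lower bounds. The variable names are illustrative.

```python
def hours_to_load(size_gb: float, rate_gb_per_hour: float) -> float:
    """Elapsed hours to load size_gb at a constant rate (an assumption)."""
    return size_gb / rate_gb_per_hour


reported_rate = 14.0   # GB/hour, as quoted for the August 1999 test
speedup = 2.3          # "2.3 times faster than in prior tests"
raw_data_gb = 300.0    # "more than 300GB", taken here as a lower bound

# Rate implied for the prior tests by the quoted speedup.
implied_prior_rate = reported_rate / speedup

hours_at_reported = hours_to_load(raw_data_gb, reported_rate)
hours_at_prior = hours_to_load(raw_data_gb, implied_prior_rate)

print(f"implied prior rate: {implied_prior_rate:.2f} GB/hour")
print(f"300GB at 14GB/hour: {hours_at_reported:.1f} hours")
print(f"300GB at the implied prior rate: {hours_at_prior:.1f} hours")
```

Under these assumptions the quoted speedup is the difference between a load that fits in roughly one day and one that takes about two.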

WHERE RED BRICK WAREHOUSE IS TODAY

It should be obvious by now that Informix has no intention of burying the value of the Red Brick Warehouse technology within its stable of Informix products but fully intends to leverage the name recognition and reputation of Red Brick Warehouse to the fullest extent possible. Below is Informix's positioning statement for Red Brick Warehouse, which sets out the features that separate Red Brick from the rest of the players in the high-end data warehousing market:

"To gain competitive advantage in today's rapidly changing business environment, customers are increasingly challenged to develop and deploy decision-support applications quickly. Informix Red Brick Warehouse is an open, relational database designed specifically to meet the requirements for single-object or departmental data marts. Informix Red Brick Warehouse is a specialized server technology that has been designed and optimized for analytic data mart solutions, for complex queries, fast load performance, high-capacity/high-performance processing, and for efficient management of very large databases. As an integral piece of Informix Decision Frontier Solution Suite, Informix Red Brick Warehouse implementations of data marts clearly show proven time-to-market advantages and the highest success rates. With Informix Red Brick Warehouse, more users can gather, understand and act on information faster and more easily. Informix is the technology leader for complete end-to-end, business-critical data warehousing solutions, and employing Informix Red Brick Warehouse as part of Decision Frontier Solution Suite provides better ways to gain competitive advantage."

CONCLUSION

Informix Red Brick Warehouse is the choice for implementing business-critical data marts and data warehouses.
Today’s global business environment demands that organizations adapt their strategies and business practices far more rapidly and intelligently than ever before. A rapidly changing business environment is driving change within organizations at an unprecedented rate and requires quick and effective storage and retrieval of business-critical knowledge. Achieving better knowledge management through the acquisition and use of data and information technologies allows businesses to implement new techniques that help them survive. Organizations know that having high-quality information (about markets, competitors, economic conditions, resources, and their own business) has gone beyond being a success factor and has become business-critical. The quantity, complexity, and scope of data are growing exponentially, and decision-makers are finding it harder to make sense of it. This is why Informix Red Brick Warehouse should be the solution used to empower businesses to get in front of the business intelligence curve and gain a competitive advantage.

APPENDIX A

Red Brick Database Server - Red Brick's relational database server was designed to support databases larger than 500GB with billions of records. It uses compact representations for numeric data and compressed indexes to reduce storage space requirements.

Red Brick Load subsystem - Red Brick uses PTMU (Parallel Table Management Utility) to populate data warehouses quickly and efficiently. It provides data aggregation, conversion, and transformation utilities to mask out or merge specific fields from an input data stream and roll up totals through data hierarchies. It performs integrity checking and updates all necessary indexes in one integrated run.

Client/server access - Provided by Gateway technologies.

Parallel-on-demand - Using this feature, the Red Brick query analyzer partitions queries for the optimal degree of parallelism, considering the query's complexity, the table's partitioning, and available resources.

Indexes - Red Brick supports conventional B-tree, star, and Pattern indexes for different types of queries. STAR indexes are automatically built when tables are created.

STARindex - Red Brick's STARindex is automatically built when tables are created. This index, which maintains relationships between primary and foreign keys, is a unique feature of Red Brick. With multiple STARindexes, multiple join processing is greatly accelerated at the time of query execution, and the indexes occupy less disk space than the multiple indexes required by other vendor solutions.

Targetindex - Continually adaptive bitmap indexing technology specifically designed for fast selection of records from large tables.

Extended SQL - Red Brick offers RISQL (Red Brick Intelligent SQL), specially developed for decision-support queries. It supports analysis queries through sequential, rank, running total, moving average, and ratio functions. This was the only software that provided a means to look at time slices of data.

Product maximums - Unlimited databases per system, 32,727 tables per database, 7,200 columns per table, 4.3 billion rows per table, 2 terabytes per table.

Product and server requirements - 12MB disk space for installing software; 4MB physical memory per user during execution.

STARjoins - A unique multiple join algorithm technique optimized for star schemas that boasts the fastest possible response for multidimensional analysis of data warehouse queries.

Targetjoin - Allows the user to apply sets of restrictions across multiple tables in parallel, providing a narrower, more targeted view of data.

Vista - An aggregate management system, integrated into the server, which provides comprehensive aggregate creation, management, and query optimization.

SuperScan - Lets multiple users leverage a single I/O stream, resulting in dramatically reduced I/O across a group of users and queries.

Table Segmentation - Allows a table to be physically split across a number of devices or file systems while maintaining all of the administrative and usability advantages of a single, logical table.

Time-Cyclic Data Management - Allows efficient handling of time-sensitive data, such as a rolling set of time periods.

Query Priority Concurrency - A unique multitasking mechanism designed specifically for DSS environments so that query execution remains unaffected during data modification and load operations.

SQL Backtrack - Provides a comprehensive and flexible solution for fast, easy, and safe backup and recovery of databases.

Informix Red Brick Security Features - Ease large-scale database administration by providing hierarchical, role-based security and rich logging to support administration, tuning, and charge-back accounting.

Informix Red Brick Warehouse Administrator - A graphical, windows-based tool that makes administration simpler and more efficient.

APPENDIX B

CUSTOMER TESTIMONIALS

AT&T

In 1996, Congress passed the Telecommunications Reform Act, which opened the $90 billion per year local telephone market to competition. AT&T realized that in order to survive it needed to meet many new challenges. One of these challenges was to strategically leverage customer data to achieve a competitive advantage. One of the solutions was to consolidate detailed customer information from multiple mainframe systems into a single data warehouse.
AT&T chose Red Brick as the vendor for its decision-support warehouse. One key to AT&T's success would be to mine the vast amounts of data buried in mainframe databases to gain a better understanding of customer needs.

"We chose Red Brick to provide highly summarized data for decision-support applications because of its fast load and query-retrieval capabilities. It would have been difficult to bring the warehouse where it is were it not for Red Brick's professional services."

General Mills

General Mills is a consumer foods giant with more than $5.5 billion in annual sales and is the second largest cereal manufacturer in the U.S. In 1993, General Mills decided to upgrade its Nielsen database. General Mills reviewed several database alternatives and ultimately chose Red Brick Warehouse as the best solution for decision-support environments. Red Brick's high-speed data loading and retrieval specifications were twice as fast as the others evaluated.

"Red Brick delivered on its promises. When we were installing Red Brick, we also brought in a new decision-support tool that had never worked against Red Brick. Red Brick and this vendor collaborated to make Red Brick and this tool work together."

Today, General Mills uses the Red Brick Warehouse database to sift through its Nielsen data and track the buying activities of millions of customers. The high-performance Red Brick Warehouse supports insight-seeking query activity by changing the dynamics of how staff query databases.

Sara Lee Intimates

Sara Lee Intimates is a manufacturer of quality women's intimate apparel. Sara Lee uses sales data shared by its retailers to build product plans that maximize product margins. Sara Lee needed a way to sift through volumes of retail sales data to understand emerging fashion trends and direct manufacturing decisions.
Sara Lee began a search for the right system to support its information analysis needs and, after reviewing several possibilities, chose Red Brick Warehouse for its new decision-support data warehouse.

"Red Brick helped with the product design, including how it needed to be set up and run inside Red Brick. What we bought worked. It went incredibly smoothly. We had no life before Red Brick. Now we have a life. We've got the data warehouse system tuned to do what Red Brick does best. Our day-to-day transaction database didn't have to change. It was an easy process."

APPENDIX C

Red Brick And HP Benchmark

Red Brick and Hewlett-Packard have collaborated to produce a benchmark test for Proof of Performance and Scalability (POPS). The overall goal of the test was to prove Red Brick and HP’s superior ability to load, query, and scale a data warehouse system using more than a terabyte of raw data and hundreds of concurrent users. This test is based on a real-world retail schema and addresses the type of complex, ad hoc business questions commonly posed in a retail environment. Release of the results is proof that the performance and scalability goals were accomplished and that the joint hardware/software solution will perform exceptionally well in similar tests.

Performance Goals

Demonstrate the ability to load and implement a 1+ TB data warehouse within an acceptable time frame
Demonstrate superior query performance against 1+ TB of raw data in a data warehouse
Demonstrate the ability to scale up to a very large number of simultaneous users (600) against a 1+ TB data warehouse with linear system and query performance

Architecture

The POPS test was conducted on a symmetric multiprocessing (SMP) system.
SMP systems excel over massively parallel processing (MPP) systems, especially in decision-support and data warehouse applications, because of their advantages in CPU utilization and data distribution. The POPS test has proven that an SMP system can deliver the performance and scalability normally expected from a more costly MPP architecture.

The schema used in the POPS test, based upon a retail business, consisted of two fact tables, Daily Sales and Daily Forecast, and five dimension tables, Promotion, Product, Customer, Period, and Store. More than 1 TB of raw data was loaded into the combination of Daily Sales (0.983 TB) and Daily Forecast (0.019 TB). The diagram above shows each table in relation to the other tables and, in parentheses, the number of rows in each table.

Sample Retail Business Questions Asked

What were the profits per store for a given category by store on a given day? In addition, what were the shares per store by district?
How much was spent by a given customer over a one-month period?
Based on revenue for a given brand, what is the revenue per store?

Load Performance

Loading data into the data warehouse is one of the most critical performance criteria
“Load windows” determine the availability of critical data and timely action on strategic decisions
Data loading in the load window includes data cleansing, index building, referential integrity checking, and aggregation updating

Red Brick Warehouse advantages:

Cleanses and loads data, builds indexes, checks referential integrity, and updates aggregates all within a single load process
Red Brick’s parallel loader utility takes advantage of the high-performance SMP characteristics of the HP 9000 V2200 hardware
The load rate for loading the largest table, 7.7 billion rows and 1+ TB in size, was nearly 9GB per hour
Loading and implementing 1+ TB of data was accomplished in approximately 10 days

Query performance

Query performance in a data warehouse is one of the most critical issues for end-users.
The speed at which queries execute determines how efficiently analysis and decisions can be produced. Red Brick Warehouse uniquely provides a family of superior join technologies to produce the absolute best query performance, including STARindex(TM), STARjoin(TM), TARGETindex(TM), and TARGETjoin(TM), all of which contributed to the excellent query response times posted in the POPS benchmark results.

REFERENCES

http://www.informix.com/answers/english/prbrick60.htm
http://www.informix.com/answers/english/prbrick51.htm
http://www.informix.com/informix/solutions/dw/redbrick/wpapers/index.html
http://www.redbrick.com/
http://www.informix.com/informix/solutions/dw/redbrick/
DBMS article, "Red Brick Warehouse 5.1," June 1998.