06-5-AGGREGATES – STORAGE METHODS PRE AGGREGATE DISCUSSION Fact table contains only base data Sept 20 Store 1 Product 1 Sept 20 Store 1 Product 2 Etc Etc Etc -- through millions of rows Sales $ Sales $ To get the total for 2005 there are millions of rows to process. To compare 2005 with 2006 there are millions more rows to process. If ask the same questions again later or a very similar questions such as compare on monthly basis the sales in 2005 and 2006 -- then same millions of rows to process RESULT SLOW SLOW SLOW Aggregates are pre-calculated and stored summaries. ADVANTAGE A standard query would have to add up all the atomic fact rows to calculate a summary. Could be billions of rows Having a pre-calculated total already defined and sitting in a table can Save processing Save reprocessing Speed up significantly the retrieval for commonly requested queries LEVEL OF AGGREGATION Could be daily, monthly, or yearly depending upon business requirements If daily or monthly kept “Maybe” don’t need to keep yearly, as the calculation is small -- 12 months to add vs 365 days vs 600 millions rows AGGREGATES are more tables holding the stored data Document1 by rt -- 9 February 2016 1 of 4 DIFFERENT IMPLEMENTATIONS OR EFFECT OF AGGREGATION WITH 3 METHODS OF STORAGE ROLAP (Relational data storage) ROLAP uses the structures of a relational database to store the cube’s aggregations. The data is left in the data source. When cube is queried, the base level of data is needed. Data is retrieved from the original data source. Aggregations are stored in the relational database as a set of tables. Aggregate tables can take up a lot of space in the data warehouse, especially if they only “lightly” summarize the atomic data. With light summarization the data is almost duplicated Briefly – query goes to FACT table -- Add the data -- Display it To improve performance – keep summarized tables Data is still in STAR SCHEMA Source ROLAP Document1 by rt -- 9 February 2016 2 of 4 MOLAP (Multidimensional data storage) Uses a type of data storage specifically created for OLAP analysis. Data is copied from data source (FACT table) and stored in the MOLAP cube’s multidimensional structure When cube is queried, the original data in the relational database is not needed All data is available in the cube. Aggregations are stored in the specialized multidimensional cube structure. COMPARE ROLAP vs MOLAP FAST MOLAP - Everything there - Easy slice and dice SPACE MOLAP - Requires more HOLAP (Hybrid data storage) It combines ROLAP and MOLAP. It handles the data like ROLAP and aggregations like MOLAP. - DETAILED data stays in source database - Aggregations in CUBES Partition Storage Methods – The following is reported to be from Microsoft (this is copied for my personal use to remember what partition storage methods is all about) Physical storage options affect the performance, storage requirements, and storage locations of partitions and their parent cubes. One of these options is the storage mode of the partition. A partition can have one of three storage modes: Multidimensional OLAP (MOLAP) Relational OLAP (ROLAP) Hybrid OLAP (HOLAP) Analysis Services supports all three storage modes. With the Storage Design Wizard you can choose the storage mode most appropriate for your partition. Alternatively, you can use the Usage-Based Optimization Wizard to select a storage mode and optimize aggregation design based on queries that have been sent to the cube. Also, you can use an explicitly defined filter to restrict the source data that is read into the partition when using any of the three storage modes. Document1 by rt -- 9 February 2016 3 of 4 The MOLAP and ROLAP storage modes have somewhat different meanings when applied to dimensions and local cubes rather than partitions. The HOLAP storage mode does not apply to dimensions or local cubes. MOLAP The MOLAP storage mode causes the aggregations of the partition and a copy of its source data to be stored in a multidimensional structure on an Analysis server computer. This computer can be the Analysis server computer where the partition is defined or another Analysis server computer, depending on whether the partition is defined as local or remote. The multidimensional structure that stores the partition's data is located in a subfolder of the Data folder of the Analysis server. For more information about the Data folder, see Analysis Server. Because a copy of the source data resides on the Analysis server computer, queries can be resolved without accessing the partition's source data even when the results cannot be obtained from the partition's aggregations. The MOLAP storage mode provides the potential for the most rapid query response times, depending on the percentage and design of the partition's aggregations. In general, MOLAP is more appropriate for partitions in cubes with frequent use and the necessity for rapid query response. ROLAP The ROLAP storage mode causes the aggregations of the partition to be stored in tables in the relational database specified in the partition's data source. However, you can use the ROLAP storage mode for the partition's data without creating aggregations in the relational database. For more information, see Set Aggregation Options (Storage Design Wizard) or Set Aggregation Options (Usage-Based Optimization Wizard). Also, indexed views are created instead of tables if the partition's source data is stored in SQL Server 2000 and if certain criteria are met. For more information, see Indexed Views for ROLAP Partitions. Unlike the MOLAP storage mode, ROLAP does not cause a copy of the source data to be stored; the partition's fact table is accessed to answer queries when the results cannot be derived from the aggregations or client cache. With the ROLAP storage mode, query response is generally slower than that available with the other two storage modes. ROLAP is typically used for large datasets that are infrequently queried, such as historical data from less recent previous years. Note: Aggregations cannot be created for a partition with ROLAP storage if the data source is Analysis Services (that is, if the provider is the Microsoft OLE DB Provider for Analysis Services). HOLAP The HOLAP storage mode combines attributes of both MOLAP and ROLAP. Like MOLAP, HOLAP causes the aggregations of the partition to be stored in a multidimensional structure on an Analysis server computer. HOLAP does not cause a copy of the source data to be stored. For queries that access only summary data contained in the aggregations of a partition, HOLAP is the equivalent of MOLAP. Queries that access source data, such as a drilldown to an atomic cube cell for which there is no aggregation data, must retrieve data from the relational database and will not be as fast as if the source data were stored in the MOLAP structure. Partitions stored as HOLAP are smaller than equivalent MOLAP partitions and respond faster than ROLAP partitions for queries involving summary data. HOLAP storage mode is generally suitable for partitions in cubes that require rapid query response for summaries based on a large amount of source data. Document1 by rt -- 9 February 2016 4 of 4