07 AGGREGATES * STORAGE METHODS

06-5-AGGREGATES – STORAGE METHODS
PRE AGGREGATE DISCUSSION
Fact table contains only base data
Sept 20   Store 1   Product 1   Sales $
Sept 20   Store 1   Product 2   Sales $
Etc       Etc       Etc
-- through millions of rows
To get the total for 2005 there are millions of rows to process.
To compare 2005 with 2006 there are millions more rows to process.
If the same question is asked again later, or a very similar question such as comparing the sales
in 2005 and 2006 on a monthly basis, then the same millions of rows must be processed again.
RESULT: SLOW, SLOW, SLOW
Aggregates are pre-calculated and stored summaries.
ADVANTAGE
A standard query would have to add up all the atomic fact rows to calculate a summary.
- Could be billions of rows
Having a pre-calculated total already defined and sitting in a table can
- Save processing
- Save reprocessing
- Significantly speed up retrieval for commonly requested queries
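The advantage can be sketched with a small in-memory SQLite database. The table names `sales_fact` and `sales_by_year`, the column names, and the sales figures are assumptions made up for illustration; only the idea of a pre-calculated summary table comes from these notes.

```python
import sqlite3

# Hypothetical schema and figures, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_fact (sale_date TEXT, store TEXT, product TEXT, sales REAL)"
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
    [
        ("2005-09-20", "Store 1", "Product 1", 100.0),
        ("2005-09-20", "Store 1", "Product 2", 50.0),
        ("2005-10-03", "Store 1", "Product 1", 70.0),
        ("2006-09-20", "Store 1", "Product 1", 120.0),
    ],
)

# Build the aggregate once: one row per year instead of one row per sale.
conn.execute(
    """
    CREATE TABLE sales_by_year AS
    SELECT substr(sale_date, 1, 4) AS year, SUM(sales) AS total_sales
    FROM sales_fact
    GROUP BY substr(sale_date, 1, 4)
    """
)


def total_from_fact(year):
    """Add up every atomic fact row for the year (the slow path)."""
    row = conn.execute(
        "SELECT SUM(sales) FROM sales_fact WHERE substr(sale_date, 1, 4) = ?",
        (year,),
    ).fetchone()
    return row[0]


def total_from_aggregate(year):
    """Read the single pre-calculated row for the year (the fast path)."""
    row = conn.execute(
        "SELECT total_sales FROM sales_by_year WHERE year = ?", (year,)
    ).fetchone()
    return row[0]
```

Both functions return the same total for a given year. The difference is that `total_from_aggregate` reads one stored row, while `total_from_fact` must touch every matching fact row each time the question is asked.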
LEVEL OF AGGREGATION
Could be daily, monthly, or yearly depending upon business requirements
If daily or monthly aggregates are kept, “maybe” there is no need to keep yearly ones, as the calculation is small
-- 12 months to add vs 365 days vs 600 million rows
AGGREGATES are additional tables holding the stored summary data
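The point about not needing a yearly aggregate can be shown with a few lines of arithmetic. This is a minimal sketch: the `monthly_totals` figures are invented, and `yearly_from_monthly` is a hypothetical helper, not part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical monthly aggregate rows: (year, month) -> sales total.
monthly_totals = {(2005, m): 1000.0 + m for m in range(1, 13)}
monthly_totals.update({(2006, m): 2000.0 + m for m in range(1, 13)})


def yearly_from_monthly(monthly):
    """Derive yearly totals by adding at most 12 monthly rows per year."""
    yearly = defaultdict(float)
    for (year, _month), total in monthly.items():
        yearly[year] += total
    return dict(yearly)


yearly_totals = yearly_from_monthly(monthly_totals)
```

Each yearly figure costs only 12 additions here, which is why storing a separate yearly table may not be worth the space when a monthly one already exists.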
Document1 by rt -- 9 February 2016
1 of 4
DIFFERENT IMPLEMENTATIONS
OR
EFFECT OF AGGREGATION WITH 3 METHODS OF STORAGE
ROLAP (Relational data storage)
ROLAP uses the structures of a relational database to store the cube’s aggregations.
The data is left in the data source.
When the cube is queried and base-level data is needed, it is retrieved from the original data source.
Aggregations are stored in the relational database as a set of tables.
Aggregate tables can take up a lot of space in the data warehouse, especially if they only “lightly”
summarize the atomic data.
With light summarization the data is almost duplicated.
Briefly
- Query goes to FACT table
- Add the data
- Display it
To improve performance, keep summarized tables
Data is still in the STAR SCHEMA source (ROLAP)
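The ROLAP idea of keeping summary tables next to the fact table in the same relational database can be sketched as follows. The schema, the `agg_sales_by_store` table, the `SUMMARY_TABLES` registry, and the `sales_by` helper are all assumptions for illustration; a real ROLAP engine decides which table to use in its own way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_fact (sale_date TEXT, store TEXT, product TEXT, sales REAL);
INSERT INTO sales_fact VALUES
  ('2005-09-20', 'Store 1', 'Product 1', 100.0),
  ('2005-09-20', 'Store 1', 'Product 2', 50.0),
  ('2005-09-21', 'Store 2', 'Product 1', 30.0);
-- Summary table kept alongside the fact table in the same relational database.
CREATE TABLE agg_sales_by_store AS
  SELECT store, SUM(sales) AS total_sales FROM sales_fact GROUP BY store;
""")

# Hypothetical registry: which grouping columns have a summary table.
SUMMARY_TABLES = {"store": "agg_sales_by_store"}


def sales_by(column):
    """Return ({value: total}, table_used) for one grouping column."""
    if column in SUMMARY_TABLES:
        # A summary table exists: read the stored totals directly.
        table = SUMMARY_TABLES[column]
        sql = f"SELECT {column}, total_sales FROM {table}"
    elif column in ("sale_date", "store", "product"):
        # No summary table: go to the fact table and add the data.
        table = "sales_fact"
        sql = f"SELECT {column}, SUM(sales) FROM sales_fact GROUP BY {column}"
    else:
        raise ValueError(f"unknown column: {column}")
    return dict(conn.execute(sql).fetchall()), table
```

Calling `sales_by("store")` is answered from the summary table, while `sales_by("product")` falls back to scanning and summing the fact table.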
MOLAP (Multidimensional data storage)
Uses a type of data storage specifically created for OLAP analysis.
Data is copied from the data source (FACT table) and stored in the MOLAP cube’s multidimensional
structure.
When the cube is queried, the original data in the relational database is not needed
- All data is available in the cube.
Aggregations are stored in the specialized multidimensional cube structure.
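A toy model of a multidimensional structure may help here. This sketch assumes a plain dictionary keyed by (date, store, product) coordinates, with a made-up `ALL` marker for a rolled-up dimension. Real MOLAP engines use specialized storage formats, so this shows only the idea of copying the fact data and pre-computing the aggregations.

```python
from itertools import product

ALL = "ALL"  # hypothetical marker meaning "summed over this dimension"


def build_cube(fact_rows):
    """Copy fact rows into a cube and pre-compute every roll-up combination."""
    cube = {}
    for date, store, item, sales in fact_rows:
        # Each row contributes to 2**3 = 8 cells: every mix of value / ALL.
        for key in product((date, ALL), (store, ALL), (item, ALL)):
            cube[key] = cube.get(key, 0.0) + sales
    return cube


# Invented sample rows in the shape of the fact table shown earlier.
fact_rows = [
    ("Sept 20", "Store 1", "Product 1", 100.0),
    ("Sept 20", "Store 1", "Product 2", 50.0),
    ("Sept 21", "Store 2", "Product 1", 30.0),
]
cube = build_cube(fact_rows)
```

Any slice is then a single lookup, for example `cube[("Sept 20", ALL, ALL)]` for all sales on one day, with no trip back to the relational source. The cube also holds more cells than there are fact rows, which is the space cost noted in the comparison.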
COMPARE ROLAP vs MOLAP
FAST: MOLAP
- Everything there
- Easy slice and dice
SPACE: MOLAP
- Requires more
HOLAP (Hybrid data storage)
It combines ROLAP and MOLAP.
It handles the data like ROLAP and aggregations like MOLAP.
- DETAILED data stays in source database
- Aggregations in CUBES
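The hybrid split can be sketched by combining the two ideas: aggregations held outside the database, detail rows left in the source. The in-memory SQLite database stands in for the source, and `store_totals`, `store_total`, and `store_detail` are hypothetical names used only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for the source database
conn.executescript("""
CREATE TABLE sales_fact (sale_date TEXT, store TEXT, product TEXT, sales REAL);
INSERT INTO sales_fact VALUES
  ('2005-09-20', 'Store 1', 'Product 1', 100.0),
  ('2005-09-20', 'Store 1', 'Product 2', 50.0),
  ('2005-09-21', 'Store 2', 'Product 1', 30.0);
""")

# Aggregations only: pre-computed per-store totals held outside the database.
store_totals = dict(
    conn.execute("SELECT store, SUM(sales) FROM sales_fact GROUP BY store")
)


def store_total(store):
    """Summary query: answered from the stored aggregations."""
    return store_totals[store]


def store_detail(store):
    """Drill-down query: detail rows are fetched from the source database."""
    return conn.execute(
        "SELECT sale_date, product, sales FROM sales_fact "
        "WHERE store = ? ORDER BY sale_date, product",
        (store,),
    ).fetchall()
```

Summary questions never touch the database after the totals are built, while a drill-down to detail still pays the cost of a relational query.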
Partition Storage Methods
– The following is reported to be from Microsoft
(this is copied for my personal use to remember what partition storage methods are all about)
Physical storage options affect the performance, storage requirements, and storage locations of
partitions and their parent cubes. One of these options is the storage mode of the partition. A partition
can have one of three storage modes:
Multidimensional OLAP (MOLAP)
Relational OLAP (ROLAP)
Hybrid OLAP (HOLAP)
Analysis Services supports all three storage modes. With the Storage Design Wizard you can choose
the storage mode most appropriate for your partition. Alternatively, you can use the Usage-Based
Optimization Wizard to select a storage mode and optimize aggregation design based on queries that
have been sent to the cube. Also, you can use an explicitly defined filter to restrict the source data
that is read into the partition when using any of the three storage modes.
The MOLAP and ROLAP storage modes have somewhat different meanings when applied to
dimensions and local cubes rather than partitions. The HOLAP storage mode does not apply to
dimensions or local cubes.
MOLAP
The MOLAP storage mode causes the aggregations of the partition and a copy of its source data to
be stored in a multidimensional structure on an Analysis server computer. This computer can be the
Analysis server computer where the partition is defined or another Analysis server computer,
depending on whether the partition is defined as local or remote. The multidimensional structure that
stores the partition's data is located in a subfolder of the Data folder of the Analysis server. For more
information about the Data folder, see Analysis Server.
Because a copy of the source data resides on the Analysis server computer, queries can be resolved
without accessing the partition's source data even when the results cannot be obtained from the
partition's aggregations. The MOLAP storage mode provides the potential for the most rapid query
response times, depending on the percentage and design of the partition's aggregations. In general,
MOLAP is more appropriate for partitions in cubes with frequent use and the necessity for rapid query
response.
ROLAP
The ROLAP storage mode causes the aggregations of the partition to be stored in tables in the
relational database specified in the partition's data source. However, you can use the ROLAP storage
mode for the partition's data without creating aggregations in the relational database. For more
information, see Set Aggregation Options (Storage Design Wizard) or Set Aggregation Options
(Usage-Based Optimization Wizard).
Also, indexed views are created instead of tables if the partition's source data is stored in SQL Server
2000 and if certain criteria are met. For more information, see Indexed Views for ROLAP Partitions.
Unlike the MOLAP storage mode, ROLAP does not cause a copy of the source data to be stored; the
partition's fact table is accessed to answer queries when the results cannot be derived from the
aggregations or client cache. With the ROLAP storage mode, query response is generally slower than
that available with the other two storage modes. ROLAP is typically used for large datasets that are
infrequently queried, such as historical data from less recent previous years.
Note: Aggregations cannot be created for a partition with ROLAP storage if the data source is
Analysis Services (that is, if the provider is the Microsoft OLE DB Provider for Analysis Services).
HOLAP
The HOLAP storage mode combines attributes of both MOLAP and ROLAP. Like MOLAP, HOLAP
causes the aggregations of the partition to be stored in a multidimensional structure on an Analysis
server computer. HOLAP does not cause a copy of the source data to be stored. For queries that
access only summary data contained in the aggregations of a partition, HOLAP is the equivalent of
MOLAP. Queries that access source data, such as a drilldown to an atomic cube cell for which there
is no aggregation data, must retrieve data from the relational database and will not be as fast as if the
source data were stored in the MOLAP structure.
Partitions stored as HOLAP are smaller than equivalent MOLAP partitions and respond faster than
ROLAP partitions for queries involving summary data. HOLAP storage mode is generally suitable for
partitions in cubes that require rapid query response for summaries based on a large amount of
source data.