Programme: II B.Com BA
Course Title: Business Data Mining
Content: Unit I - Schemas for Multidimensional Databases
Ms. S. Valarmathi, Assistant Professor
B.Com (CA) Department, KPR College of Arts Science and Research

What is a Schema?
An operational database is usually designed with a normalized relational model, while a data warehouse is organized around a multidimensional schema. A schema is a logical description of the entire database: it includes the name and description of records of all record types. Much like a database, a data warehouse also needs to maintain a schema.

Types of Schemas
• Star Schema
• Snowflake Schema
• Galaxy Schema (Fact Constellation)

Fact Table
• Contains the primary information of the warehouse.
• Holds the contents of the data warehouse and stores different types of measures.
• Is located at the center of a star or snowflake schema, surrounded by dimension tables.
• Has two kinds of columns: measurements (numeric values) and foreign keys to the dimension tables.

Dimension Table
• Contains information about a particular dimension.
• Holds the textual (descriptive) information of the business.
• Stores the attributes that describe the objects in a fact table.
• Has a surrogate key column that uniquely identifies each dimension record.
• Is usually denormalized, because it is built to make analysis as easy as possible.

Star Schema
A star schema is a relational model with a one-to-many relationship between each dimension table and the fact table: one dimension row can be referenced by many fact rows.
• Denormalized model.
• Easy for users to understand.
• Easy querying and data analysis.
• Ability to drill down or roll up.

Star Schema
Each dimension in a star schema is represented by only one dimension table, and this dimension table contains the set of attributes. The fact table sits at the center and contains the keys to each of the four dimensions.
The fact table also contains the measures, such as dollars sold and units sold.

Figure: Star schema for sales
• Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
• time: time_key, day, day_of_the_week, month, quarter, year
• item: item_key, item_name, brand, type, supplier_type
• branch: branch_key, branch_name, branch_type
• location: location_key, street, city, state_or_province, country

Star Schema
Here the Sales fact table has a concatenated (composite) key:
• It is the concatenation of the primary keys of all the dimension tables, i.e. the time key of the Time dimension table, the item key of the Item dimension table, the location key of the Location dimension table and the branch key of the Branch dimension table.

Star Schema in Real-world Database
The following query returns the total units of 'tv' products sold in 1997, grouped by brand and country:

    SELECT P.Brand, S.Country AS Countries, SUM(F.Units_Sold)
    FROM Fact_Sales F
    INNER JOIN Dim_Date D ON (F.Date_Id = D.Id)
    INNER JOIN Dim_Store S ON (F.Store_Id = S.Id)
    INNER JOIN Dim_Product P ON (F.Product_Id = P.Id)
    WHERE D.Year = 1997 AND P.Product_Category = 'tv'
    GROUP BY P.Brand, S.Country

Benefits of Star Schema
Star schemas are denormalized, meaning the normal rules of normalization applied to transactional relational databases are relaxed during star schema design and implementation. The benefits of star schema denormalization are:
• Simpler queries - star schema join logic is generally simpler than the join logic required to retrieve data from a highly normalized transactional schema.
• Simplified business reporting logic - compared with highly normalized schemas, the star schema simplifies common business reporting logic, such as period-over-period and as-of reporting.
• Query performance gains - star schemas can provide performance enhancements for read-only reporting applications compared with highly normalized schemas.
• Fast aggregations - the simpler queries against a star schema can result in improved performance for aggregation operations.
• Feeding cubes - star schemas are widely used by OLAP systems to build proprietary OLAP cubes efficiently; most major OLAP systems also provide a ROLAP mode of operation that can use a star schema directly as a source without building a proprietary cube structure.

Demerits
• The main disadvantage of the star schema is that data integrity is not enforced as well as in a highly normalized database. One-off inserts and updates can result in data anomalies, which normalized schemas are designed to avoid. Generally speaking, star schemas are loaded in a highly controlled fashion via batch processing or near-real-time "trickle feeds" to compensate for the lack of protection afforded by normalization.
• A star schema is also less flexible in terms of analytical needs than a normalized data model.
• Normalized models allow any kind of analytical query to be executed as long as it follows the business logic defined in the model. Star schemas tend to be purpose-built for a particular view of the data and so do not readily support more complex analytics.
• Star schemas do not support many-to-many relationships between business entities, at least not naturally. Typically these relationships are simplified in a star schema to conform to the simple dimensional model.

Snowflake Schema
Some dimension tables in the snowflake schema are normalized, and the normalization splits the data into additional tables. Unlike in a star schema, where each dimension is a single denormalized table, the dimension tables in a snowflake schema are normalized.
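The normalization step described above can be sketched with Python's standard `sqlite3` module. This is a minimal illustration, not a prescribed procedure: the sample rows, the staging table `item_star` and the function `snowflake_item_dimension` are hypothetical, while the target tables follow the item and supplier tables of the snowflake diagram. The repeated `supplier_type` text is moved into its own table and replaced by a `supplier_key` foreign key.

```python
import sqlite3

# Hypothetical rows of a denormalized item dimension (star layout):
# (item_key, item_name, brand, type, supplier_type).
STAR_ITEM_ROWS = [
    (1, "TV-40", "Sony", "tv", "wholesale"),
    (2, "TV-55", "Sony", "tv", "wholesale"),
    (3, "Phone-X", "Nokia", "phone", "retail"),
]


def snowflake_item_dimension(conn: sqlite3.Connection) -> None:
    """Split a star-style item table into item + supplier tables."""
    conn.execute("CREATE TABLE item_star (item_key INTEGER PRIMARY KEY,"
                 " item_name TEXT, brand TEXT, type TEXT, supplier_type TEXT)")
    conn.executemany("INSERT INTO item_star VALUES (?, ?, ?, ?, ?)", STAR_ITEM_ROWS)

    # One supplier row per distinct supplier_type; SQLite assigns supplier_key.
    conn.execute("CREATE TABLE supplier (supplier_key INTEGER PRIMARY KEY,"
                 " supplier_type TEXT UNIQUE)")
    conn.execute("INSERT INTO supplier (supplier_type)"
                 " SELECT DISTINCT supplier_type FROM item_star")

    # The normalized item table stores a foreign key instead of the repeated text.
    conn.execute("CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT,"
                 " brand TEXT, type TEXT,"
                 " supplier_key INTEGER REFERENCES supplier(supplier_key))")
    conn.execute("INSERT INTO item"
                 " SELECT i.item_key, i.item_name, i.brand, i.type, s.supplier_key"
                 " FROM item_star i JOIN supplier s ON i.supplier_type = s.supplier_type")


conn = sqlite3.connect(":memory:")
snowflake_item_dimension(conn)
# Reading supplier_type back now needs an extra join.
rows = conn.execute("SELECT i.item_key, i.item_name, i.brand, i.type, s.supplier_type"
                    " FROM item i JOIN supplier s ON i.supplier_key = s.supplier_key"
                    " ORDER BY i.item_key").fetchall()
print(rows == STAR_ITEM_ROWS)  # True
```

"wholesale" is now stored once instead of on every Sony row, which is the storage saving; the price is the extra join needed to read it back.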
Figure: Snowflake schema for sales
• Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
• Time: time_key, day, day_of_the_week, month, quarter, year
• Item: item_key, item_name, brand, type, supplier_key
• Supplier: supplier_key, supplier_type
• Branch: branch_key, branch_name, branch_type
• Location: location_key, street, city_key
• City: city_key, city, state_or_province, country

Snowflake Schema
For example, the item dimension table of the star schema is normalized and split into two dimension tables, namely item and supplier. In the same way, the location dimension is split into location and city.
The advantage of the snowflake schema is that normalized structures are easier to update and maintain.
The disadvantage of the snowflake schema is that it degrades query performance because of the additional joins.

Snowflake Schema in Real-world Database
In the snowflake version of the earlier query, country, brand and product category live in their own tables, so three extra joins are needed:

    SELECT B.Brand, G.Country, SUM(F.Units_Sold)
    FROM Fact_Sales F
    INNER JOIN Dim_Date D ON F.Date_Id = D.Id
    INNER JOIN Dim_Store S ON F.Store_Id = S.Id
    INNER JOIN Dim_Geography G ON S.Geography_Id = G.Id
    INNER JOIN Dim_Product P ON F.Product_Id = P.Id
    INNER JOIN Dim_Brand B ON P.Brand_Id = B.Id
    INNER JOIN Dim_Product_Category C ON P.Product_Category_Id = C.Id
    WHERE D.Year = 1997 AND C.Product_Category = 'tv'
    GROUP BY B.Brand, G.Country

Benefits of Snowflake Schema
The snowflake schema is in the same family as the star schema logical model. In fact, the star schema is considered a special case of the snowflake schema. The snowflake schema provides some advantages over the star schema in certain situations, including:
• Some OLAP multidimensional database modeling tools are optimized for snowflake schemas.
• Normalizing attributes results in storage savings, the tradeoff being additional complexity in source query joins.
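To see the tradeoff concretely, the star query and the snowflake query from the text can be run side by side. The sketch below is an illustration under stated assumptions: it uses SQLite (through Python's `sqlite3`) as a stand-in engine, the sample rows and the helper names `build_star` and `build_snowflake` are hypothetical, and an `ORDER BY` clause has been added to each query only so the results can be compared row by row. Both layouts hold the same facts, so they return the same totals; only the number of joins differs.

```python
import sqlite3

STAR_QUERY = """
SELECT P.Brand, S.Country AS Countries, SUM(F.Units_Sold)
FROM Fact_Sales F
INNER JOIN Dim_Date D ON F.Date_Id = D.Id
INNER JOIN Dim_Store S ON F.Store_Id = S.Id
INNER JOIN Dim_Product P ON F.Product_Id = P.Id
WHERE D.Year = 1997 AND P.Product_Category = 'tv'
GROUP BY P.Brand, S.Country
ORDER BY P.Brand, S.Country
"""

SNOWFLAKE_QUERY = """
SELECT B.Brand, G.Country, SUM(F.Units_Sold)
FROM Fact_Sales F
INNER JOIN Dim_Date D ON F.Date_Id = D.Id
INNER JOIN Dim_Store S ON F.Store_Id = S.Id
INNER JOIN Dim_Geography G ON S.Geography_Id = G.Id
INNER JOIN Dim_Product P ON F.Product_Id = P.Id
INNER JOIN Dim_Brand B ON P.Brand_Id = B.Id
INNER JOIN Dim_Product_Category C ON P.Product_Category_Id = C.Id
WHERE D.Year = 1997 AND C.Product_Category = 'tv'
GROUP BY B.Brand, G.Country
ORDER BY B.Brand, G.Country
"""

# Hypothetical sample data; facts are (Date_Id, Store_Id, Product_Id, Units_Sold).
DATES = [(1, 1997), (2, 1998)]
FACTS = [(1, 1, 1, 10), (1, 1, 1, 5), (1, 2, 2, 7), (2, 1, 1, 100), (1, 2, 3, 4)]
FACT_DDL = ("CREATE TABLE Fact_Sales (Date_Id INTEGER, Store_Id INTEGER,"
            " Product_Id INTEGER, Units_Sold INTEGER)")


def build_star() -> sqlite3.Connection:
    """Star layout: one denormalized table per dimension."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Dim_Date (Id INTEGER PRIMARY KEY, Year INTEGER)")
    conn.execute("CREATE TABLE Dim_Store (Id INTEGER PRIMARY KEY, Country TEXT)")
    conn.execute("CREATE TABLE Dim_Product (Id INTEGER PRIMARY KEY,"
                 " Brand TEXT, Product_Category TEXT)")
    conn.execute(FACT_DDL)
    conn.executemany("INSERT INTO Dim_Date VALUES (?, ?)", DATES)
    conn.executemany("INSERT INTO Dim_Store VALUES (?, ?)", [(1, "India"), (2, "France")])
    conn.executemany("INSERT INTO Dim_Product VALUES (?, ?, ?)",
                     [(1, "Sony", "tv"), (2, "LG", "tv"), (3, "Sony", "phone")])
    conn.executemany("INSERT INTO Fact_Sales VALUES (?, ?, ?, ?)", FACTS)
    return conn


def build_snowflake() -> sqlite3.Connection:
    """Snowflake layout: country, brand and category moved to their own tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Dim_Date (Id INTEGER PRIMARY KEY, Year INTEGER)")
    conn.execute("CREATE TABLE Dim_Geography (Id INTEGER PRIMARY KEY, Country TEXT)")
    conn.execute("CREATE TABLE Dim_Store (Id INTEGER PRIMARY KEY, Geography_Id INTEGER)")
    conn.execute("CREATE TABLE Dim_Brand (Id INTEGER PRIMARY KEY, Brand TEXT)")
    conn.execute("CREATE TABLE Dim_Product_Category (Id INTEGER PRIMARY KEY,"
                 " Product_Category TEXT)")
    conn.execute("CREATE TABLE Dim_Product (Id INTEGER PRIMARY KEY,"
                 " Brand_Id INTEGER, Product_Category_Id INTEGER)")
    conn.execute(FACT_DDL)
    conn.executemany("INSERT INTO Dim_Date VALUES (?, ?)", DATES)
    conn.executemany("INSERT INTO Dim_Geography VALUES (?, ?)", [(1, "India"), (2, "France")])
    conn.executemany("INSERT INTO Dim_Store VALUES (?, ?)", [(1, 1), (2, 2)])
    conn.executemany("INSERT INTO Dim_Brand VALUES (?, ?)", [(1, "Sony"), (2, "LG")])
    conn.executemany("INSERT INTO Dim_Product_Category VALUES (?, ?)", [(1, "tv"), (2, "phone")])
    conn.executemany("INSERT INTO Dim_Product VALUES (?, ?, ?)", [(1, 1, 1), (2, 2, 1), (3, 1, 2)])
    conn.executemany("INSERT INTO Fact_Sales VALUES (?, ?, ?, ?)", FACTS)
    return conn


star_rows = build_star().execute(STAR_QUERY).fetchall()
snowflake_rows = build_snowflake().execute(SNOWFLAKE_QUERY).fetchall()
print(star_rows == snowflake_rows)  # True
print(star_rows)  # [('LG', 'France', 7), ('Sony', 'India', 15)]
```

The 1998 sale and the phone sale are filtered out by the `WHERE` clause in both versions; the snowflake version simply reaches the same attributes through six joins instead of three.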
Demerits
• The primary disadvantage of the snowflake schema is that the additional levels of attribute normalization add complexity to source query joins, compared with the star schema.
• Snowflake schemas, in contrast to flat single-table dimensions, have been heavily criticized. Their goal is assumed to be efficient and compact storage of normalized data, but this comes at the significant cost of poor performance when browsing the joins required in this dimension. This disadvantage may have been reduced in the years since it was first recognized, owing to better query performance within the browsing tools.
• When compared to a highly normalized transactional schema, the snowflake schema's denormalization removes the data integrity assurances provided by normalized schemas. Data loads into the snowflake schema must therefore be highly controlled and managed to avoid update and insert anomalies.

Difference between Star Schema and Snowflake Schema
Star Schema
• The star schema is the simplest data warehouse schema.
• In a star schema, each dimension is represented in a single table, and there should be no hierarchies between dimension tables.
• It contains a fact table surrounded by dimension tables. If the dimensions are denormalized, we say it is a star schema design.
• Only one join establishes the relationship between the fact table and any one of the dimension tables.
• A star schema optimizes performance by keeping queries simple and providing fast response times.
• All the information about each level is stored in one row.
• It is called a star schema because the diagram resembles a star.
Snowflake Schema
• The snowflake schema is a more complex data warehouse model than a star schema.
• In a snowflake schema, at least one hierarchy should exist between dimension tables.
• It contains a fact table surrounded by dimension tables.
• If a dimension is normalized, we say it is a snowflaked design.
• Since there are relationships between the dimension tables, many joins are needed to fetch the data.
• Snowflake schemas normalize dimensions to eliminate redundancy. The result is more complex queries and reduced query performance.
• It is called a snowflake schema because the diagram resembles a snowflake.

Fact Constellation Schema
A fact constellation has multiple fact tables. It is also known as a galaxy schema. The following diagram shows two fact tables, namely sales and shipping.

Figure: Fact constellation schema for sales and shipping
• Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
• Shipping Fact Table: item_key, time_key, shipper_key, from_location, to_location; measures: dollars_cost, units_shipped
• time: time_key, day, day_of_the_week, month, quarter, year
• item: item_key, item_name, brand, type, supplier_type
• branch: branch_key, branch_name, branch_type
• location: location_key, street, city, province_or_state, country
• shipper: shipper_key, shipper_name, location_key, shipper_type

Fact Constellation Schema
• The sales fact table is the same as in the star schema.
• The shipping fact table has five dimension keys, namely item_key, time_key, shipper_key, from_location and to_location, and two measures, dollars_cost and units_shipped.
• Dimension tables can also be shared between fact tables. For example, the time, item and location dimension tables are shared between the sales and shipping fact tables.
• A fact constellation is therefore a collection of star schemas in which multiple fact tables share dimension tables.
• Sophisticated applications require this kind of schema.
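The sharing of dimension tables between fact tables can be sketched in SQLite through Python's `sqlite3`. This is a minimal sketch under stated assumptions: the dimension tables keep only a few of the columns from the diagram, the sample rows and the names `sales_fact`, `shipping_fact`, `build_constellation` and `sold_vs_shipped` are hypothetical, and the query technique shown (often called drilling across) is one common way to compare two fact tables, not the only one. Each fact table is aggregated separately and the results are joined on the shared item dimension; joining the two fact tables directly on `item_key` would multiply rows and double-count the measures.

```python
import sqlite3


def build_constellation() -> sqlite3.Connection:
    """Two fact tables (sales, shipping) sharing the time, item and location dimensions."""
    conn = sqlite3.connect(":memory:")
    # REFERENCES clauses document the links; SQLite enforces them only
    # when PRAGMA foreign_keys is switched on.
    conn.executescript("""
        CREATE TABLE time (time_key INTEGER PRIMARY KEY, year INTEGER);
        CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT);
        CREATE TABLE location (location_key INTEGER PRIMARY KEY, city TEXT);
        CREATE TABLE branch (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
        CREATE TABLE shipper (shipper_key INTEGER PRIMARY KEY, shipper_name TEXT);
        CREATE TABLE sales_fact (
            time_key INTEGER REFERENCES time(time_key),
            item_key INTEGER REFERENCES item(item_key),
            branch_key INTEGER REFERENCES branch(branch_key),
            location_key INTEGER REFERENCES location(location_key),
            units_sold INTEGER, dollars_sold REAL);
        -- from_location and to_location both point at the same location table
        CREATE TABLE shipping_fact (
            item_key INTEGER REFERENCES item(item_key),
            time_key INTEGER REFERENCES time(time_key),
            shipper_key INTEGER REFERENCES shipper(shipper_key),
            from_location INTEGER REFERENCES location(location_key),
            to_location INTEGER REFERENCES location(location_key),
            dollars_cost REAL, units_shipped INTEGER);
    """)
    conn.executemany("INSERT INTO time VALUES (?, ?)", [(1, 1997), (2, 1998)])
    conn.executemany("INSERT INTO item VALUES (?, ?)", [(1, "TV"), (2, "Phone")])
    conn.executemany("INSERT INTO location VALUES (?, ?)", [(1, "Chennai"), (2, "Coimbatore")])
    conn.execute("INSERT INTO branch VALUES (1, 'B1')")
    conn.execute("INSERT INTO shipper VALUES (1, 'S1')")
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?)",
                     [(1, 1, 1, 1, 10, 5000.0), (1, 1, 1, 2, 5, 2500.0),
                      (1, 2, 1, 1, 3, 900.0), (2, 1, 1, 1, 50, 25000.0)])
    conn.executemany("INSERT INTO shipping_fact VALUES (?, ?, ?, ?, ?, ?, ?)",
                     [(1, 1, 1, 1, 2, 100.0, 12), (2, 1, 1, 2, 1, 40.0, 3),
                      (1, 2, 1, 1, 2, 300.0, 40)])
    return conn


def sold_vs_shipped(conn: sqlite3.Connection, year: int) -> list:
    """Aggregate each fact table separately, then join on the shared item dimension."""
    return conn.execute("""
        SELECT i.item_name, s.units_sold, sh.units_shipped
        FROM item i
        JOIN (SELECT f.item_key, SUM(f.units_sold) AS units_sold
              FROM sales_fact f JOIN time t ON f.time_key = t.time_key
              WHERE t.year = ? GROUP BY f.item_key) s ON s.item_key = i.item_key
        JOIN (SELECT g.item_key, SUM(g.units_shipped) AS units_shipped
              FROM shipping_fact g JOIN time t ON g.time_key = t.time_key
              WHERE t.year = ? GROUP BY g.item_key) sh ON sh.item_key = i.item_key
        ORDER BY i.item_name
    """, (year, year)).fetchall()


conn = build_constellation()
print(sold_vs_shipped(conn, 1997))  # [('Phone', 3, 3), ('TV', 15, 12)]
```

Because both fact tables reference the same time and item tables, the year filter and the item names mean the same thing on both sides of the comparison.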