Uploaded by valarmathi.s

Unit I - Schemas

Programme: II B.Com BA
Course Title: Business Data Mining
Content: Unit I - Schemas for Multidimensional Databases
Ms. S. Valarmathi
Assistant Professor
B.Com (CA) Department
KPR College of Arts Science and Research
II B.Com BA
Schemas for Multidimensional Databases
What is a Schema?
• A schema is a logical description of the entire database.
• It includes the names and descriptions of the records.
• An operational database typically uses a relational model, while a data warehouse uses a star, snowflake, or galaxy (fact constellation) schema.
• Much like a database, a data warehouse also needs to maintain a schema.

Types of Schemas
• Star Schema
• Snowflake Schema
• Galaxy Schema (Fact Constellation Schema)
Fact Table
• Contains the primary information of the warehouse.
• Holds the contents of the data warehouse and stores different types of measures.
• Located at the center of a star or snowflake schema, surrounded by dimension tables.
• Has two kinds of columns: measures (numeric values) and foreign keys to the dimension tables.

Dimension Table
• Contains information about a particular dimension.
• Holds the textual (descriptive) information of the business.
• Stores attributes, or dimensions, that describe the objects in a fact table.
• Has a surrogate key column that uniquely identifies each dimension record.
• Is de-normalized, because it is built to make analyzing data as easy as possible.

Star Schema
• A star schema is a relational model with a one-to-many relationship from each dimension table to the fact table.
• De-normalized model.
• Easy for users to understand.
• Easy querying and data analysis.
• Ability to drill down or roll up.
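Roll-up and drill-down can be illustrated with a small sketch. The table names (time_dim, sales_fact), the sample rows, and the helper units_by are hypothetical and chosen only for illustration; Python's built-in sqlite3 module is used so the SQL can be run as-is. Rolling up means grouping by fewer time attributes, and drilling down means grouping by more of them.

```python
import sqlite3

# Hypothetical miniature star schema: one time dimension and one fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (
    time_key INTEGER PRIMARY KEY, month INTEGER, quarter INTEGER, year INTEGER
);
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key), units_sold INTEGER
);
-- Illustrative sample data only.
INSERT INTO time_dim VALUES (1, 1, 1, 1997), (2, 2, 1, 1997), (3, 4, 2, 1997);
INSERT INTO sales_fact VALUES (1, 10), (1, 5), (2, 20), (3, 7);
""")

def units_by(level_columns):
    """Sum units_sold at the granularity given by the time_dim columns."""
    cols = ", ".join(level_columns)
    sql = (
        f"SELECT {cols}, SUM(f.units_sold) FROM sales_fact f "
        f"JOIN time_dim t ON f.time_key = t.time_key "
        f"GROUP BY {cols} ORDER BY {cols}"
    )
    return conn.execute(sql).fetchall()

by_month = units_by(["year", "quarter", "month"])  # most detailed level
by_quarter = units_by(["year", "quarter"])         # rolled up one level
by_year = units_by(["year"])                       # rolled up to the top level

print(by_month)
print(by_quarter)
print(by_year)
```

Reading the three results from bottom to top is a drill-down: the yearly total is broken into quarters and then into months.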
Star Schema
• Each dimension in a star schema is represented by only one dimension table.
• This dimension table contains the set of attributes.
• There is a fact table at the center. It contains the keys to each of the four dimensions.
• The fact table also contains the measures, such as dollars sold and units sold.

Star Schema (diagram)
Sales Fact Table: time_key, item_key, branch_key, location_key; Measures: units_sold, dollars_sold, avg_sales
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, state_or_province, country
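The star schema diagram can be sketched as table definitions. This is a minimal sketch, not a production design: the column names come from the diagram, while the table names with a _dim suffix, the column types, and the use of SQLite through Python's sqlite3 module are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Four dimension tables and one central fact table (types are assumptions).
conn.executescript("""
CREATE TABLE time_dim (
    time_key INTEGER PRIMARY KEY, day INTEGER, day_of_the_week TEXT,
    month INTEGER, quarter INTEGER, year INTEGER
);
CREATE TABLE item_dim (
    item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
    type TEXT, supplier_type TEXT
);
CREATE TABLE branch_dim (
    branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT
);
CREATE TABLE location_dim (
    location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
    state_or_province TEXT, country TEXT
);
CREATE TABLE sales_fact (
    time_key     INTEGER NOT NULL REFERENCES time_dim(time_key),
    item_key     INTEGER NOT NULL REFERENCES item_dim(item_key),
    branch_key   INTEGER NOT NULL REFERENCES branch_dim(branch_key),
    location_key INTEGER NOT NULL REFERENCES location_dim(location_key),
    units_sold   INTEGER,  -- measure
    dollars_sold REAL,     -- measure
    avg_sales    REAL      -- measure
);
""")

# The fact table points at every dimension table; no dimension points at another.
referenced = sorted(
    row[2] for row in conn.execute("PRAGMA foreign_key_list(sales_fact)")
)
print(referenced)
```

The printed list names the four dimension tables that surround the fact table, which is the "star" shape described above.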
Star Schema
• Here the Sales Fact table has a concatenated (composite) key.
• The key is the concatenation of the primary keys of all the dimension tables, i.e. the time key of the Time dimension table, the item key of the Item dimension table, the location key of the Location dimension table, and the branch key of the Branch dimension table.
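A concatenated key can be sketched as a composite PRIMARY KEY over the four dimension keys. This is a minimal sketch using SQLite through Python's sqlite3 module; the dimension tables are omitted for brevity, and the helper try_insert and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE sales_fact (
    time_key     INTEGER NOT NULL,
    item_key     INTEGER NOT NULL,
    branch_key   INTEGER NOT NULL,
    location_key INTEGER NOT NULL,
    units_sold   INTEGER,
    dollars_sold REAL,
    -- The concatenation of the four dimension keys identifies a fact row.
    PRIMARY KEY (time_key, item_key, branch_key, location_key)
)""")

def try_insert(row):
    """Return True if the row was inserted, False if the key already exists."""
    try:
        conn.execute("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

first = try_insert((1, 10, 100, 1000, 5, 250.0))      # new key combination
duplicate = try_insert((1, 10, 100, 1000, 9, 450.0))  # same four keys again
other_item = try_insert((1, 11, 100, 1000, 2, 80.0))  # differs in item_key only

print(first, duplicate, other_item)
```

Only the second insert is rejected: no single key column is unique on its own, but the combination of all four must be.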
Star Schema in Real-world Database
SELECT
    P.Brand,
    S.Country AS Countries,
    SUM(F.Units_Sold)
FROM Fact_Sales F
INNER JOIN Dim_Date D ON (F.Date_Id = D.Id)
INNER JOIN Dim_Store S ON (F.Store_Id = S.Id)
INNER JOIN Dim_Product P ON (F.Product_Id = P.Id)
WHERE D.Year = 1997 AND P.Product_Category = 'tv'
GROUP BY P.Brand, S.Country
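To see the star schema query run, the sketch below builds the four tables it refers to in an in-memory SQLite database. The table and column names come from the query; the column types, the sample rows, and the added ORDER BY clause (for a stable output order) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dim_Date (Id INTEGER PRIMARY KEY, Year INTEGER);
CREATE TABLE Dim_Store (Id INTEGER PRIMARY KEY, Country TEXT);
CREATE TABLE Dim_Product (
    Id INTEGER PRIMARY KEY, Brand TEXT, Product_Category TEXT
);
CREATE TABLE Fact_Sales (
    Date_Id INTEGER, Store_Id INTEGER, Product_Id INTEGER, Units_Sold INTEGER
);
-- Illustrative sample data only.
INSERT INTO Dim_Date VALUES (1, 1997), (2, 1998);
INSERT INTO Dim_Store VALUES (1, 'India'), (2, 'Japan');
INSERT INTO Dim_Product VALUES
    (1, 'Acme', 'tv'), (2, 'Zenco', 'tv'), (3, 'Acme', 'radio');
INSERT INTO Fact_Sales VALUES
    (1, 1, 1, 10), (1, 1, 1, 5), (1, 2, 2, 7), (2, 1, 1, 99), (1, 1, 3, 50);
""")

# One join per dimension: the fact table is always one hop from each dimension.
star_query = """
SELECT P.Brand, S.Country AS Countries, SUM(F.Units_Sold)
FROM Fact_Sales F
INNER JOIN Dim_Date D ON (F.Date_Id = D.Id)
INNER JOIN Dim_Store S ON (F.Store_Id = S.Id)
INNER JOIN Dim_Product P ON (F.Product_Id = P.Id)
WHERE D.Year = 1997 AND P.Product_Category = 'tv'
GROUP BY P.Brand, S.Country
ORDER BY P.Brand, S.Country
"""
rows = conn.execute(star_query).fetchall()
print(rows)
```

The 1998 sale and the radio sale are filtered out by the WHERE clause, so only the 1997 TV sales are summed per brand and country.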
Benefits of Star Schema
• Star schemas are denormalized, meaning the normal rules of normalization
applied to transactional relational databases are relaxed during star schema
design and implementation. The benefits of star schema denormalization are:
• Simpler queries - star schema join logic is generally simpler than the join logic
required to retrieve data from a highly normalized transactional schema.
• Simplified business reporting logic - when compared to
highly normalized schemas, the star schema simplifies common business
reporting logic, such as period-over-period and as-of reporting.
• Query performance gains - star schemas can provide performance
enhancements for read-only reporting applications when compared to
highly normalized schemas.
• Fast aggregations - the simpler queries against a star schema can result
in improved performance for aggregation operations.
• Feeding cubes - star schemas are used by all OLAP systems to build
proprietary OLAP cubes efficiently; in fact, most major OLAP systems provide
a ROLAP mode of operation which can use a star schema directly as a source
without building a proprietary cube structure.
Demerits
• The main disadvantage of the star schema is that data integrity is not
enforced as well as it is in a highly normalized database. One-off inserts and
updates can result in data anomalies which normalized schemas are
designed to avoid. Generally speaking, star schemas are loaded in a highly
controlled fashion via batch processing or near-real time "trickle feeds", to
compensate for the lack of protection afforded by normalization.
• Star schema is also not as flexible in terms of analytical needs as a
normalized data model.
• Normalized models allow any kind of analytical queries to be executed
as long as they follow the business logic defined in the model. Star
schemas tend to be more purpose-built for a particular view of the data,
thus not really allowing more complex analytics.
• Star schemas don't support many-to-many relationships between
business entities - at least not very naturally. Typically these relationships
are simplified in star schema to conform to the simple dimensional model.
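The data integrity point can be made concrete with a small sketch of an update anomaly. The location_dim table uses the columns of the location dimension; the sample rows and the one-off UPDATE are hypothetical. Because the denormalized table repeats the state for every street in a city, an update that touches only one row leaves the dimension inconsistent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location_dim (
    location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
    state_or_province TEXT, country TEXT
);
-- Illustrative sample data: the city and state are repeated on every row.
INSERT INTO location_dim VALUES
    (1, '1 Main St', 'Coimbatore', 'Tamil Nadu', 'India'),
    (2, '9 Lake Rd', 'Coimbatore', 'Tamil Nadu', 'India');
""")

# A one-off update that changes only one of the two rows for the same city.
conn.execute(
    "UPDATE location_dim SET state_or_province = 'TN' WHERE location_key = 1"
)

# Nothing in the schema prevents one city from now having two different states.
inconsistent = conn.execute("""
SELECT city, COUNT(DISTINCT state_or_province)
FROM location_dim
GROUP BY city
HAVING COUNT(DISTINCT state_or_province) > 1
""").fetchall()
print(inconsistent)
```

In a normalized design the state would be stored once per city, so this kind of partial update could not occur; this is why star schemas are usually loaded through controlled batch processes.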
Snowflake Schema
• Some dimension tables in the snowflake schema are normalized.
• The normalization splits up the data into additional tables.
• Unlike in the star schema, the dimension tables in a snowflake schema are normalized.

Snowflake Schema (diagram)
Sales Fact Table: time_key, item_key, branch_key, location_key; Measures: units_sold, dollars_sold, avg_sales
Time: time_key, day, day_of_the_week, month, quarter, year
Item: item_key, item_name, brand, type, supplier_key
Supplier: supplier_key, supplier_type
Branch: branch_key, branch_name, branch_type
Location: location_key, street, city_key
City: city_key, city, state_or_province, country
Snowflake Schema
• For example, the item dimension table of the star schema is normalized and split into two dimension tables, namely the item and supplier tables.
• The advantage of the snowflake schema is that normalized structures are easier to update and maintain.
• The disadvantage of the snowflake schema is that it degrades query performance because of the additional joins.
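The split of the item dimension into item and supplier tables can be sketched as follows. The column names follow the two diagrams; the table name item_star, the column types, and the sample rows are assumptions for illustration, run here on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star-schema item dimension: supplier_type is repeated on every item row.
CREATE TABLE item_star (
    item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
    type TEXT, supplier_type TEXT
);
INSERT INTO item_star VALUES
    (1, 'TV-32', 'Acme', 'tv', 'wholesale'),
    (2, 'TV-40', 'Acme', 'tv', 'wholesale'),
    (3, 'Radio-X', 'Zenco', 'radio', 'retail');

-- Snowflake step 1: move each distinct supplier_type into its own table.
CREATE TABLE supplier (
    supplier_key INTEGER PRIMARY KEY, supplier_type TEXT UNIQUE
);
INSERT INTO supplier (supplier_type)
    SELECT DISTINCT supplier_type FROM item_star ORDER BY supplier_type;

-- Snowflake step 2: the item table keeps only a key to the supplier table.
CREATE TABLE item (
    item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT,
    supplier_key INTEGER REFERENCES supplier(supplier_key)
);
INSERT INTO item
    SELECT s.item_key, s.item_name, s.brand, s.type, p.supplier_key
    FROM item_star s JOIN supplier p ON p.supplier_type = s.supplier_type;
""")

supplier_rows = conn.execute("SELECT COUNT(*) FROM supplier").fetchone()[0]
original = conn.execute("SELECT * FROM item_star ORDER BY item_key").fetchall()
# Reading the same information back now costs one extra join.
rebuilt = conn.execute("""
SELECT i.item_key, i.item_name, i.brand, i.type, p.supplier_type
FROM item i JOIN supplier p ON i.supplier_key = p.supplier_key
ORDER BY i.item_key
""").fetchall()
print(supplier_rows, rebuilt == original)
```

The supplier type is now stored once per distinct value, which makes it easier to maintain, while the extra join needed to rebuild the original rows is the query cost mentioned above.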
Snowflake Schema in Real-world Database
SELECT
    B.Brand,
    G.Country,
    SUM(F.Units_Sold)
FROM Fact_Sales F
INNER JOIN Dim_Date D ON F.Date_Id = D.Id
INNER JOIN Dim_Store S ON F.Store_Id = S.Id
INNER JOIN Dim_Geography G ON S.Geography_Id = G.Id
INNER JOIN Dim_Product P ON F.Product_Id = P.Id
INNER JOIN Dim_Brand B ON P.Brand_Id = B.Id
INNER JOIN Dim_Product_Category C ON P.Product_Category_Id = C.Id
WHERE D.Year = 1997 AND C.Product_Category = 'tv'
GROUP BY B.Brand, G.Country
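The snowflake query can be run against a small in-memory SQLite database to see the longer join chain in action. The table and column names come from the query; the column types, the sample rows, and the added ORDER BY clause (for a stable output order) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dim_Date (Id INTEGER PRIMARY KEY, Year INTEGER);
CREATE TABLE Dim_Geography (Id INTEGER PRIMARY KEY, Country TEXT);
CREATE TABLE Dim_Store (Id INTEGER PRIMARY KEY, Geography_Id INTEGER);
CREATE TABLE Dim_Brand (Id INTEGER PRIMARY KEY, Brand TEXT);
CREATE TABLE Dim_Product_Category (
    Id INTEGER PRIMARY KEY, Product_Category TEXT
);
CREATE TABLE Dim_Product (
    Id INTEGER PRIMARY KEY, Brand_Id INTEGER, Product_Category_Id INTEGER
);
CREATE TABLE Fact_Sales (
    Date_Id INTEGER, Store_Id INTEGER, Product_Id INTEGER, Units_Sold INTEGER
);
-- Illustrative sample data only.
INSERT INTO Dim_Date VALUES (1, 1997), (2, 1998);
INSERT INTO Dim_Geography VALUES (1, 'India'), (2, 'Japan');
INSERT INTO Dim_Store VALUES (1, 1), (2, 2);
INSERT INTO Dim_Brand VALUES (1, 'Acme'), (2, 'Zenco');
INSERT INTO Dim_Product_Category VALUES (1, 'tv'), (2, 'radio');
INSERT INTO Dim_Product VALUES (1, 1, 1), (2, 2, 1), (3, 1, 2);
INSERT INTO Fact_Sales VALUES
    (1, 1, 1, 10), (1, 1, 1, 5), (1, 2, 2, 7), (2, 1, 1, 99), (1, 1, 3, 50);
""")

# Six joins: three from the fact table, three more between dimension tables.
snowflake_query = """
SELECT B.Brand, G.Country, SUM(F.Units_Sold)
FROM Fact_Sales F
INNER JOIN Dim_Date D ON F.Date_Id = D.Id
INNER JOIN Dim_Store S ON F.Store_Id = S.Id
INNER JOIN Dim_Geography G ON S.Geography_Id = G.Id
INNER JOIN Dim_Product P ON F.Product_Id = P.Id
INNER JOIN Dim_Brand B ON P.Brand_Id = B.Id
INNER JOIN Dim_Product_Category C ON P.Product_Category_Id = C.Id
WHERE D.Year = 1997 AND C.Product_Category = 'tv'
GROUP BY B.Brand, G.Country
ORDER BY B.Brand, G.Country
"""
rows = conn.execute(snowflake_query).fetchall()
print(rows)
```

The question asked is the same as in the star schema example (units sold per brand and country for TVs in 1997), but the brand, category, and country now each sit one table further away from the fact table.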
Benefits of Snowflake Schema
• The snowflake schema is in the same family as the star schema logical model.
• In fact, the star schema is considered a special case of the snowflake schema. The snowflake schema provides some advantages over the star schema in certain situations, including:
• Some OLAP multidimensional database modeling tools are optimized for snowflake schemas.
• Normalizing attributes results in storage savings, the tradeoff being additional complexity in source query joins.
Demerits
• The primary disadvantage of the snowflake schema is that the additional
levels of attribute normalization add complexity to source query joins,
when compared to the star schema.
• Snowflake schemas, in contrast to flat single table dimensions, have
been heavily criticized. Their goal is assumed to be an efficient and
compact storage of normalized data but this is at the significant cost of
poor performance when browsing the joins required in this dimension.
This disadvantage may have reduced in the years since it was first
recognized, owing to better query performance within the browsing
tools.
• When compared to a highly normalized transactional schema, the
snowflake schema's denormalization removes the data integrity
assurances provided by normalized schemas. Data loads into the
snowflake schema must be highly controlled and managed to avoid update
and insert anomalies.
Difference between Star Schema and Snowflake Schema
Star Schema
• The star schema is the simplest data warehouse schema.
• In a star schema, each dimension is represented by a single table. There should be no hierarchies between dimension tables.
• It contains a fact table surrounded by dimension tables. If the dimensions are de-normalized, we say it is a star schema design.
• In a star schema, only one join establishes the relationship between the fact table and any one of the dimension tables.
• A star schema optimizes performance by keeping queries simple and providing fast response times. All the information about each level is stored in one row.
• It is called a star schema because the diagram resembles a star.
Snowflake Schema
• The snowflake schema is a more complex data warehouse model than the star schema.
• In a snowflake schema, at least one hierarchy should exist between dimension tables.
• It contains a fact table surrounded by dimension tables. If a dimension is normalized, we say it is a snowflaked design.
• In a snowflake schema, since there are relationships between the dimension tables, many joins are needed to fetch the data.
• Snowflake schemas normalize dimensions to eliminate redundancy. The result is more complex queries and reduced query performance.
• It is called a snowflake schema because the diagram resembles a snowflake.
Fact Constellation Schema
• A fact constellation has multiple fact tables. It is also known as a galaxy schema.
• The following diagram shows two fact tables, namely sales and shipping.
Fact Constellation Schema (diagram)
Sales Fact Table: time_key, item_key, branch_key, location_key; Measures: units_sold, dollars_sold, avg_sales
Shipping Fact Table: item_key, time_key, shipper_key, from_location, to_location; Measures: dollars_cost, units_shipped
Time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, province_or_state, country
shipper: shipper_key, shipper_name, location_key, shipper_type
Fact Constellation Schema
• The sales fact table is the same as that in the star schema.
• The shipping fact table has five dimension keys (item_key, time_key, shipper_key, from_location, to_location) and two measures (dollars_cost, units_shipped).
• It is also possible to share dimension tables between fact tables.
• For example, the time, item, and location dimension tables are shared between the sales and shipping fact tables.
• It is a collection of schemas in which multiple fact tables share dimension tables. Sophisticated applications require such a schema.
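Sharing a dimension between two fact tables can be sketched as follows. Only the item dimension and the two measures units_sold and units_shipped are kept; the table names, column types, and sample rows are assumptions for illustration, run on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One shared dimension table, referenced by both fact tables.
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE sales_fact (
    item_key INTEGER REFERENCES item_dim(item_key), units_sold INTEGER
);
CREATE TABLE shipping_fact (
    item_key INTEGER REFERENCES item_dim(item_key), units_shipped INTEGER
);
-- Illustrative sample data only.
INSERT INTO item_dim VALUES (1, 'TV-32'), (2, 'Radio-X');
INSERT INTO sales_fact VALUES (1, 10), (1, 5), (2, 7);
INSERT INTO shipping_fact VALUES (1, 12), (2, 3), (2, 4);
""")

# Each fact table is summed on its own first and then joined through the
# shared dimension; joining the raw fact rows to each other would multiply
# rows and double count the measures.
rows = conn.execute("""
SELECT i.item_name, s.units_sold, h.units_shipped
FROM item_dim i
JOIN (SELECT item_key, SUM(units_sold) AS units_sold
      FROM sales_fact GROUP BY item_key) s ON s.item_key = i.item_key
JOIN (SELECT item_key, SUM(units_shipped) AS units_shipped
      FROM shipping_fact GROUP BY item_key) h ON h.item_key = i.item_key
ORDER BY i.item_name
""").fetchall()
print(rows)
```

Because both fact tables use the same item_key values, sales and shipping figures can be compared item by item without keeping two copies of the item dimension.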
