Tanvi Madgavkar
CSE 7330
FALL 2009
Ralph Kimball states that:
A data warehouse is a copy of transaction data
specifically structured for query and analysis.
Bill Inmon states that:
A warehouse is a subject-oriented, integrated,
time-variant and non-volatile collection of data
in support of management's decision making
process.
• A data warehouse provides a common data
model for all data of interest regardless of the
data's source.
• Prior to loading data into the data warehouse,
inconsistencies are identified and resolved.
• The information in the warehouse can be stored
safely for extended periods of time.
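As a rough illustration of the second point, the sketch below maps records from two source systems onto one common row layout before loading. The record layouts, the field names and the `COUNTRY_ALIASES` mapping are all hypothetical; a real warehouse would normally do this inside an ETL tool.

```python
from datetime import datetime

# Assumed mapping of the spellings each source uses to one agreed code.
COUNTRY_ALIASES = {"usa": "US", "united states": "US", "us": "US"}

def to_common_model(record, date_format):
    """Map one source record onto a single warehouse row layout."""
    country = record["country"].strip().lower()
    return {
        "customer": record["customer"].strip().title(),
        "country": COUNTRY_ALIASES.get(country, country.upper()),
        "order_date": datetime.strptime(record["date"], date_format).date().isoformat(),
    }

# Two hypothetical sources that describe the same order differently.
source_a = [{"customer": "ann lee", "country": "USA", "date": "12/31/2008"}]
source_b = [{"customer": "Ann Lee ", "country": "united states", "date": "2008-12-31"}]

rows = [to_common_model(r, "%m/%d/%Y") for r in source_a]
rows += [to_common_model(r, "%Y-%m-%d") for r in source_b]
```

After the mapping, both records have the same name, country code and date format, so the warehouse sees one consistent representation regardless of the source.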
• OLTP is short for Online Transaction Processing.
• OLTP refers to a class of systems that facilitate
and manage transaction-oriented applications,
typically for data entry and information retrieval.
• It is characterized by a large number of short
on-line transactions.
• The main emphasis for OLTP systems is put on
very fast query processing in multi-access
environments.
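A short on-line transaction can be sketched as follows, using Python's built-in `sqlite3` module as a stand-in for an OLTP database. The `account` table and the `transfer` helper are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between two accounts as one short, atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected an overdraft

ok = transfer(conn, 1, 2, 30)         # succeeds
rejected = transfer(conn, 1, 2, 500)  # fails and is rolled back as a whole
balances = dict(conn.execute("SELECT id, balance FROM account"))
```

Each call touches only two rows and finishes quickly, which is the typical shape of OLTP work. The failed transfer leaves no partial update behind.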
• OLAP is short for Online Analytical Processing.
• OLAP is an approach to quickly answer
multi-dimensional analytical queries.
• The term OLAP was created as a slight
modification of the traditional database term OLTP.
• It is characterized by a relatively low volume of
transactions.
In general, OLTP systems provide source data to data
warehouses, whereas OLAP systems help to analyze it.
                    | OLTP                                          | OLAP
Source of data      | OLTPs are the original source of data         | Data comes from various OLTP databases
Purpose of data     | To run fundamental transaction-related tasks  | To help with planning and decision support
Queries             | Standardized and simple queries               | Complex queries involving aggregation
Processing speed    | Very fast                                     | Depends on the amount of data involved
Space requirements  | Relatively small                              | Larger due to existence of historical data
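The difference in query style can be sketched with one small table. The `orders` table and its rows are made up for illustration, and SQLite stands in for both kinds of system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, year INTEGER, region TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, 2008, "East", 100), (2, 2008, "West", 250),
     (3, 2009, "East", 300), (4, 2009, "East", 50)],
)

# OLTP-style: a simple, standardized lookup of one row by its key.
one_order = conn.execute(
    "SELECT amount FROM orders WHERE order_id = ?", (3,)
).fetchone()

# OLAP-style: an aggregation that scans the accumulated history.
totals = conn.execute(
    "SELECT year, region, SUM(amount) FROM orders "
    "GROUP BY year, region ORDER BY year, region"
).fetchall()
```

The lookup touches a single row, while the cost of the aggregation grows with the amount of history stored, which matches the "Processing speed" row of the comparison.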
• Multidimensional OLAP - MOLAP
This is the more traditional way of OLAP analysis. In MOLAP,
data is not stored in the relational database but in a
multidimensional cube.
• Relational OLAP - ROLAP
ROLAP works directly with relational databases: the base data is
stored as relational tables, and new tables are created to hold
the aggregated information.
• Hybrid OLAP - HOLAP
HOLAP attempts to combine the advantages of MOLAP and
ROLAP. Here, a database divides data between relational storage,
which holds the larger quantities of detailed data, and specialized
storage, which holds smaller quantities of less-detailed data.
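The ROLAP idea of holding aggregates in new relational tables can be sketched as below. The `sales` and `sales_by_month` tables are hypothetical, and SQLite is used only because it ships with Python; it is not an OLAP product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Base data stays in an ordinary relational table.
conn.execute("CREATE TABLE sales (month TEXT, item TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Jan", "pen", 10), ("Jan", "pen", 5), ("Jan", "ink", 2), ("Feb", "pen", 7)],
)

# A new table is created to hold the aggregated information.
conn.execute(
    "CREATE TABLE sales_by_month AS "
    "SELECT month, SUM(units) AS units FROM sales GROUP BY month"
)

summary = dict(conn.execute("SELECT month, units FROM sales_by_month"))
```

Queries for monthly totals can then read the small summary table instead of re-scanning the detailed rows.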
Steps in OLAP creation process:
•OLAP systems are designed to give an overview
analysis of what happened. Hence the data
storage has to be set up differently from an
OLTP database.
•An OLAP cube, also called a multidimensional
cube or a hypercube, is created from
data models.
•OLAP cubes are not strictly cuboids; "cube" is
the name given to the process of linking data
from the different dimensions.
•There can be a number of cubes, developed
along units of dimensions, or one giant cube can
be formed with all the dimensions.
•The OLAP cube is present at the core of any
OLAP system and consists of a number of
tables arranged in a particular schema.
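As a minimal sketch of what linking data along dimensions means, the code below treats a cube as fact rows addressed by dimension values and rolls the measure up along any subset of dimensions. The dimension names, the sample facts and the `roll_up` helper are assumptions for illustration, not how any particular OLAP engine stores its cubes.

```python
from collections import defaultdict

DIMENSIONS = ("time", "item", "location")  # assumed dimension names

# Hypothetical facts: one measure (units sold) per combination of dimension values.
facts = [
    ("Q1", "pen", "Dallas", 10),
    ("Q1", "ink", "Dallas", 4),
    ("Q2", "pen", "Austin", 6),
    ("Q2", "pen", "Dallas", 3),
]

def roll_up(facts, keep):
    """Sum the measure over every dimension that is not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    cells = defaultdict(int)
    for *coords, units in facts:
        cells[tuple(coords[i] for i in idx)] += units
    return dict(cells)

by_time = roll_up(facts, ["time"])
by_item_location = roll_up(facts, ["item", "location"])
grand_total = roll_up(facts, [])
```

Keeping all three dimensions gives the "giant cube"; keeping fewer gives the smaller cubes developed along particular dimensions.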
• The cube metadata is typically created from
either a star schema or snowflake schema of
tables in a relational database.
• The most common method is the star design,
so called because it resembles a star in
shape.
• The star schema, also known as the star join
schema, is the simplest style of data warehouse
schema.
• The star schema consists of a few fact tables
(possibly only one, which justifies the name)
referencing a number of dimension tables.
•Create Table TIME (time_key INTEGER, day VARCHAR(10),   -- dimension table
month VARCHAR(10), year VARCHAR(10),
day_of_week VARCHAR(10),
quarter VARCHAR(10),
PRIMARY KEY (time_key))
•Create Table BRANCH (branch_key INTEGER,                -- dimension table
branch_name VARCHAR(10),
branch_type VARCHAR(10),
PRIMARY KEY (branch_key))
•Create Table FACT1 (time_key INTEGER, item_key INTEGER, -- fact table
branch_key INTEGER, location_key INTEGER,
PRIMARY KEY (time_key, item_key, branch_key, location_key),
FOREIGN KEY (time_key) REFERENCES TIME (time_key),
FOREIGN KEY (branch_key) REFERENCES BRANCH (branch_key))
•Advantages:
Simplest DW schema.
Easy to understand.
Easy to navigate between the tables because
fewer joins are needed.
Most suitable for query processing.
•Disadvantages:
Occupies more space.
Highly denormalized.
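The point about fewer joins can be sketched with a tiny star schema in SQLite. The table names `time_dim`, `branch_dim` and `fact_sales`, the `units_sold` measure and the sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim   (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE branch_dim (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
-- units_sold is an assumed measure column.
CREATE TABLE fact_sales (
    time_key   INTEGER REFERENCES time_dim (time_key),
    branch_key INTEGER REFERENCES branch_dim (branch_key),
    units_sold INTEGER
);
INSERT INTO time_dim VALUES (1, 'Q1', 2009), (2, 'Q2', 2009);
INSERT INTO branch_dim VALUES (10, 'North'), (20, 'South');
INSERT INTO fact_sales VALUES (1, 10, 5), (1, 20, 7), (2, 10, 3), (2, 10, 4);
""")

# Every dimension is exactly one join away from the fact table.
rows = conn.execute("""
    SELECT t.quarter, b.branch_name, SUM(f.units_sold)
    FROM fact_sales AS f
    JOIN time_dim   AS t ON t.time_key = f.time_key
    JOIN branch_dim AS b ON b.branch_key = f.branch_key
    GROUP BY t.quarter, b.branch_name
    ORDER BY t.quarter, b.branch_name
""").fetchall()
```

Because each dimension is a single table, a query never needs more than one join per dimension it uses.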
• A snowflake schema is a logical arrangement of
tables in a multidimensional database such that
the entity relationship diagram resembles a
snowflake in shape.
• It is closely related to the star schema, being a
variation of it. The only difference is that
dimensions are normalized into multiple related
tables in a snowflake schema, whereas the star
schema's dimensions are denormalized, with each
dimension represented by a single table.
• Create Table SUPPLIER (supplier_key INTEGER,            -- normalized out of ITEM
supplier_type VARCHAR(10),
PRIMARY KEY (supplier_key))
• Create Table ITEM (item_key INTEGER,
item_name VARCHAR(10),
brand VARCHAR(10), type VARCHAR(10),
supplier_key INTEGER,
PRIMARY KEY (item_key),
FOREIGN KEY (supplier_key) REFERENCES SUPPLIER (supplier_key))
• Create Table CITY (city_key INTEGER,                    -- normalized out of LOCATION
city VARCHAR(10), state VARCHAR(10),
country VARCHAR(10),
PRIMARY KEY (city_key))
• Create Table LOCATION (location_key INTEGER,
street VARCHAR(10), city_key INTEGER,
PRIMARY KEY (location_key),
FOREIGN KEY (city_key) REFERENCES CITY (city_key))
• Create Table FACT1 (time_key INTEGER, item_key INTEGER, -- fact table
branch_key INTEGER, location_key INTEGER,
PRIMARY KEY (time_key, item_key, branch_key, location_key),
FOREIGN KEY (item_key) REFERENCES ITEM (item_key),
FOREIGN KEY (location_key) REFERENCES LOCATION (location_key))
• Advantages:
These tables are easier to maintain.
Saves storage space.
• Disadvantages:
The large number of joins makes it complex
to navigate.
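The extra joins can be sketched with a normalized location dimension in SQLite. The table names `city_dim`, `location_dim` and `fact_sales`, the `units_sold` measure and the sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city_dim (city_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE location_dim (
    location_key INTEGER PRIMARY KEY,
    street       TEXT,
    city_key     INTEGER REFERENCES city_dim (city_key)
);
-- units_sold is an assumed measure column.
CREATE TABLE fact_sales (
    location_key INTEGER REFERENCES location_dim (location_key),
    units_sold   INTEGER
);
INSERT INTO city_dim VALUES (1, 'Dallas', 'USA'), (2, 'Pune', 'India');
INSERT INTO location_dim VALUES (100, 'Main St', 1), (101, 'Elm St', 1), (102, 'MG Rd', 2);
INSERT INTO fact_sales VALUES (100, 5), (101, 2), (102, 9);
""")

# Reaching the country takes two joins: fact -> location -> city.
rows = conn.execute("""
    SELECT c.country, SUM(f.units_sold)
    FROM fact_sales   AS f
    JOIN location_dim AS l ON l.location_key = f.location_key
    JOIN city_dim     AS c ON c.city_key = l.city_key
    GROUP BY c.country
    ORDER BY c.country
""").fetchall()
```

The city and country values are stored once per city rather than once per location, which is where the space saving comes from, at the price of the second join.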
• The star schema is the better option from the
user's point of view. This schema exposes users to
the underlying table structures, and the
queries are simpler in nature. It is more likely to
be used when the data warehouse is large.
•Snowflake schemas are often better suited to more
sophisticated query tools and smaller data
warehouses. Even though maintenance is
relatively easy, the schema is aimed at environments
that have numerous queries with complex criteria,
and hence query execution takes more time.
W.H. Inmon. “What is a Data Warehouse?”, Prism, Volume 1, Number 1, 1995.
Ralph Kimball. “The Data Warehouse Toolkit: Practical Techniques for Building
Dimensional Data Warehouses”.
Jun Yang. “WareHouse Information Prototype at Stanford”.
C. Caldeira. “Data Warehousing – Concepts and Models”.
RainMaker DataWarehousing. “OLAP_vs_OLTP.pdf”,
http://www.rainmakerworks.com/pdfdocs/OLTP_vs_OLAP.pdf
“Data Warehousing: A look at Business Intelligence and Data Warehouse”,
http://www.1keydata.com/datawarehousing/molap-rolap.html
Hari Mailvaganam. “Data Warehousing Review – Introduction to OLAP”,
http://www.dwreview.com/OLAP/Introduction_OLAP.html
Mri Sonam. “What is the difference between star schema and snow flake schema?”,
http://www.geekinterview.com/question_details/38599
Wikipedia, The Free Encyclopedia. “Data Warehouse, OLAP, OLTP, Star Schema,
Snowflake Schema”, http://en.wikipedia.org/wiki/Main_Page