Data Warehousing –
A Technology Marvel
-by Swati Chawla
Agenda
• Introduction
• Business Need Beyond Reporting
• Traditional Approaches
• Definition
• Data Classification
• Components of Data Warehouse
• Benefits
• Tools for Data Warehousing
• Data Modeling Terminologies
• Schemas
– Star Schema
– Snowflake Schema
Scenario 1
• Your company has made less profit than in the previous year.
• What could be the reason?
• How would you generate a report of your yearly sales
and how long would you need to figure out the problem?
• Your manager wants the reason as early as possible…
Business Need Beyond Reporting…
Scenario 2
• You are a frequent traveler.
• You have a savings bank account with ABC Bank Pvt. Ltd.
• You use your bank's ATM card to buy your air tickets.
• One day you receive an exciting offer from the bank: a 15 percent discount on all air tickets booked using the bank's ATM card.
• Sounds fascinating, doesn't it?
• What do you think happened?
• How did your bank get to know the nature of your transactions?
• Did the bank manager own a magic ball?
Traditional Approaches
• Programs were written to analyze the data stored on
tapes or on mainframes.
• With the advent of personal computers, programs were
run on data dumps ("data islands") stored on individual
PCs in order to analyze the data.
• Decision Support Systems
• Executive Information Systems
Data Warehousing holds the key to all these questions…
Defining Data Warehouse
According to Bill Inmon, known as the father of Data Warehousing, a data
warehouse is a subject-oriented, integrated, time-variant, nonvolatile
collection of data in support of management decisions.
A few applications of a DWH:
– Cloth Manufacturer: Analyze sales and product trends by location to
understand customer buying patterns
– Pharma Manufacturer: Analysis of physicians and their prescribing
patterns
– Retailer: Analyze sale fluctuations across different regions
– Movie Theatre Chain: Key performance indicators including average
ticket price, attendance, box office ticket sales, concession sales,
buttered vs. non-buttered popcorn
– Airline Industry: Analysis of airline network trends by revenue class,
routes, origin-destination, point of booking
Data classification
Data
– Operational Data: operational processing
– Informational Data: analytical processing
Informational & Operational Data
Comparison of a data warehouse with an OLTP DB:
• Typical operation
– Data warehouse: Query scans thousands or millions of rows. For example, "Find the total sales of last month."
– OLTP DB: Accesses only a handful of records. For example, "Retrieve the current order for this customer."
• Schema design
– Data warehouse: De-normalized or partially normalized schemas
– OLTP DB: Fully normalized schemas
• Data modification
– Data warehouse: Updated on a regular basis. The end users of a data warehouse do not directly update the data.
– OLTP DB: Always up to date, and reflects the current state of each business transaction.
• Historical data
– Data warehouse: Usually stores many months or years of data.
– OLTP DB: Usually stores data from only a few weeks or months.
• User
– Data warehouse: Knowledge worker, Business Analyst
– OLTP DB: Clerk, IT Professional
• Number of users
– Data warehouse: Hundreds
– OLTP DB: Thousands
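The difference in typical operation can be made concrete with a small sketch. The `orders` table, its columns, and the sample rows below are hypothetical and exist only for illustration; a single in-memory SQLite table stands in for both kinds of system, which a real deployment would keep separate.

```python
import sqlite3

# Hypothetical table and sample rows, used only to contrast the two access patterns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "order_month TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 101, "2006-06", 250.0),
        (2, 102, "2006-06", 100.0),
        (3, 101, "2006-07", 75.0),
        (4, 103, "2006-07", 300.0),
    ],
)


def current_order(customer_id):
    """OLTP-style access: touch a handful of rows for one customer.

    Assumes a higher order_id means a more recent order.
    """
    return conn.execute(
        "SELECT order_id, amount FROM orders WHERE customer_id = ? "
        "ORDER BY order_id DESC LIMIT 1",
        (customer_id,),
    ).fetchone()


def total_sales(month):
    """Warehouse-style access: scan and aggregate every row for a month."""
    return conn.execute(
        "SELECT SUM(amount) FROM orders WHERE order_month = ?", (month,)
    ).fetchone()[0]


print(current_order(101))
print(total_sales("2006-06"))
```

The first function reads one record by key; the second has to visit every row for the period, which is the kind of query a warehouse is designed to absorb so that the operational system does not have to.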
Components of Data Warehouse
A Data Warehouse typically comprises the following
components –
• Source Data Layer
• Data Transformation Layer
• Data Store / Warehouse Layer
• Reporting Layer
• Metadata Layer
• Operations Layer
Source Data Layer & Data Transformation Layer
ETL is the process of Extracting, Transforming & Loading data into the
data warehouse.
• EXTRACTION: Data is extracted from the source. It can come from
more than a single source.
• TRANSFORMATION: Any manipulation the extracted data needs is
done at this stage. This includes converting the data into a suitable
format and presenting it in a way that makes it easy to understand
and helps business users carry out their data analysis.
• LOADING: The transformed data is then loaded, that is, inserted
into the target system, which is the data warehouse.
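The three ETL stages above can be sketched as three small functions. This is a minimal illustration under stated assumptions, not a production pipeline: the two source lists, their field names, the date formats, and the `sales` table are all hypothetical.

```python
import sqlite3

# Hypothetical extracts from two source systems that format the same facts differently.
SOURCE_A = [{"city": " jalandhar ", "sold_on": "2006-07-03", "qty": "200"}]
SOURCE_B = [{"city": "JALANDHAR", "sold_on": "03/07/2006", "qty": "300"}]


def extract():
    """Extraction: gather rows from more than one source."""
    return SOURCE_A + SOURCE_B


def transform(rows):
    """Transformation: bring every row to one common format."""
    cleaned = []
    for row in rows:
        date = row["sold_on"]
        if "/" in date:  # assumption: source B uses DD/MM/YYYY
            day, month, year = date.split("/")
            date = f"{year}-{month}-{day}"
        cleaned.append((row["city"].strip().title(), date, int(row["qty"])))
    return cleaned


def load(conn, rows):
    """Loading: insert the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (city TEXT, sold_on TEXT, qty INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
totals = warehouse.execute(
    "SELECT city, SUM(qty) FROM sales GROUP BY city"
).fetchall()
print(totals)
```

Because the city names and dates are normalized before loading, the two source rows end up grouped under one city in the warehouse query.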
Data Flow (Data Warehousing Layer)
A Data Mart is
• A scaled-down version of a DWH, designed for a particular line of business.
• Focused on one subject area or only one group of users.
[Figure: data flow among OLTP, DWH, and Data Marts. Labels shown: Finance, Orders, Billing, Marketing, Product, Customer, Customer Service]
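One way to picture a data mart is as a subject-specific slice of the warehouse. The sketch below models it as a SQL view over a hypothetical `dwh_facts` table; the table, column names, and values are invented for illustration, and in practice a data mart is often a separate physical store, not a view.

```python
import sqlite3

dwh = sqlite3.connect(":memory:")
# Hypothetical warehouse table holding facts for several subject areas.
dwh.execute("CREATE TABLE dwh_facts (subject_area TEXT, metric TEXT, value REAL)")
dwh.executemany(
    "INSERT INTO dwh_facts VALUES (?, ?, ?)",
    [
        ("finance", "revenue", 1200.0),
        ("finance", "cost", 800.0),
        ("marketing", "campaign_leads", 45.0),
    ],
)

# The "finance mart" exposes only the finance subject area to its users.
dwh.execute(
    "CREATE VIEW finance_mart AS "
    "SELECT metric, value FROM dwh_facts WHERE subject_area = 'finance'"
)

finance_rows = dwh.execute(
    "SELECT metric, value FROM finance_mart ORDER BY metric"
).fetchall()
print(finance_rows)
```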
Reporting Layer
• Reporting is the process of developing and producing
business reports based on data warehouse data.
• Data mining is the process of examining data for trends
and patterns that might have evaded human analysis.
• OLAP, an acronym for 'Online Analytical Processing', is a
technique by which data sourced from a data
warehouse or data mart is visualized and summarized to
provide a multidimensional view of the data.
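The summarization that OLAP performs can be sketched as a roll-up over chosen dimensions. The records, the dimension names, and the `roll_up` helper below are hypothetical; real OLAP tools add storage, indexing, and visualization on top of this basic idea.

```python
from collections import defaultdict

# Hypothetical sales records: (product, city, month, units).
SALES = [
    ("Soft Drink", "Jalandhar", "2006-07", 500),
    ("Soft Drink", "Amritsar", "2006-07", 300),
    ("Juice", "Jalandhar", "2006-07", 120),
    ("Soft Drink", "Jalandhar", "2006-08", 450),
]
# Position of each dimension inside a record.
DIMENSIONS = {"product": 0, "city": 1, "month": 2}


def roll_up(records, *dims):
    """Sum units for every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for record in records:
        key = tuple(record[DIMENSIONS[d]] for d in dims)
        totals[key] += record[3]
    return dict(totals)


by_city = roll_up(SALES, "city")
by_product_month = roll_up(SALES, "product", "month")
print(by_city)
print(by_product_month)
```

The same records answer different questions depending on which dimensions are kept, which is what the multidimensional view refers to.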
Data Warehousing – End to End
Benefits
Data Warehouse –
• Queries do not impact Operational systems
• Provides quick response to queries for reporting
• Enables Subject Area Orientation
• Integrates data from multiple, diverse sources
• Enables multiple interpretations of same data by different
users or groups
• Provides thorough analysis of data over a period of time
• Accuracy of Operational systems can be checked
• Provides analysis capabilities to decision makers
Tools Available for Data Warehousing
Fact: 500
1. Bottles of Soft Drink
2. Sold in the month of July 2006
3. Sold in Jalandhar City
Data Modeling Terminologies
• A fact table consists of the measurements, metrics or
facts of a business process.
• A dimension table is one of a set of companion tables to
a fact table; it describes the context of each fact.
• A schema is a collection of database objects, including
tables, views, indexes, and synonyms.
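The relationship between a fact and its dimensions can be sketched with the soft drink example: 500 is the measured fact, and product, month, and city are its dimensions. The class names, keys, and the `describe` helper below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductDim:
    product_key: int
    name: str


@dataclass(frozen=True)
class TimeDim:
    time_key: int
    month: str


@dataclass(frozen=True)
class LocationDim:
    location_key: int
    city: str


@dataclass(frozen=True)
class SalesFact:
    # The fact row holds keys into each dimension plus the measurement itself.
    product_key: int
    time_key: int
    location_key: int
    units_sold: int


PRODUCTS = {1: ProductDim(1, "Soft Drink")}
MONTHS = {1: TimeDim(1, "July 2006")}
LOCATIONS = {1: LocationDim(1, "Jalandhar")}
FACTS = [SalesFact(product_key=1, time_key=1, location_key=1, units_sold=500)]


def describe(fact):
    """Resolve a fact's dimension keys into a readable sentence."""
    product = PRODUCTS[fact.product_key]
    when = MONTHS[fact.time_key]
    where = LOCATIONS[fact.location_key]
    return (
        f"{fact.units_sold} bottles of {product.name} "
        f"sold in {when.month} in {where.city}"
    )


summary = describe(FACTS[0])
print(summary)
```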
Data Warehouse Schemas
– Star Schema
• A star schema is a relational database schema for representing
multidimensional data. At its center is a large fact table, which
references the surrounding dimension tables.
– Snowflake Schema
• A snowflake schema is a variation on the star schema, in
which very large dimension tables are normalized into
multiple tables. Dimensions with hierarchies can be
decomposed into a snowflake structure when the dimension
tables need to be normalized in order to save space.
The snowflake approach increases the number of joins,
which can slow down data retrieval.
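The extra join that a snowflake schema introduces can be seen in a small sketch. All table names, columns, and rows below are hypothetical; the same question (units sold per category) is asked of a star-style product dimension and of a snowflaked one.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE fact_sales (product_key INTEGER, units INTEGER);

    -- Star: the category is stored inline in the product dimension.
    CREATE TABLE dim_product_star (
        product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);

    -- Snowflake: the category is normalized into its own table.
    CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_product_snow (
        product_key INTEGER PRIMARY KEY, name TEXT, category_key INTEGER);

    INSERT INTO fact_sales VALUES (1, 500), (2, 120), (1, 450);
    INSERT INTO dim_product_star VALUES
        (1, 'Cola', 'Soft Drink'), (2, 'Orange Juice', 'Juice');
    INSERT INTO dim_category VALUES (10, 'Soft Drink'), (20, 'Juice');
    INSERT INTO dim_product_snow VALUES (1, 'Cola', 10), (2, 'Orange Juice', 20);
    """
)

# One join: fact table to the product dimension.
STAR_QUERY = """
SELECT p.category, SUM(f.units)
FROM fact_sales f
JOIN dim_product_star p ON p.product_key = f.product_key
GROUP BY p.category ORDER BY p.category
"""

# Two joins: fact table to product, then product to category.
SNOWFLAKE_QUERY = """
SELECT c.category, SUM(f.units)
FROM fact_sales f
JOIN dim_product_snow p ON p.product_key = f.product_key
JOIN dim_category c ON c.category_key = p.category_key
GROUP BY c.category ORDER BY c.category
"""

star_result = db.execute(STAR_QUERY).fetchall()
snow_result = db.execute(SNOWFLAKE_QUERY).fetchall()
print(star_result)
print(snow_result)
```

Both queries return the same totals; the snowflake version stores each category name once but pays for it with an additional join at query time.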
Example of a Star Schema
Example of a Snowflake Schema
Thank You