Uploaded by Taqiya Yaman

Data warehouse unit 3

UNIT
DATA WAREHOUSE ARCHITECTURE
Names of Sub-Units
Data Warehouse Architecture Types, Centralized Data Warehouse, Independent Data Marts,
Federated Data Model, Hub and Spoke Data Model, and Data Mart Bus.
Overview
The centralized data warehouse architecture takes into account the enterprise-level information requirements.
The overall infrastructure is established. Atomic-level normalized data at the lowest level
of granularity is stored in the third normal form. Occasionally, some summarized data is
included. Queries and applications access the normalized data in the central data
warehouse. There are no separate data marts.
Learning Objectives
In this unit, you will learn to:
 Appraise and interpret the architectural design of a data warehouse.
Learning Outcomes
At the end of this unit, you will be able to:
 Describe the design of a data warehouse.
JGI
JAIN
DEEMED-TO-BE UNIVERSITY
UNIT 3: Data Warehouse Architecture
Pre-Unit Preparatory Material
 https://archive.nptel.ac.in/courses/110/105/110105147/
 http://nitttrc.edu.in/nptel/courses/video/106108102/L05.html
3.1 Data Warehouse Architecture Types
The independent data marts architecture evolves in companies where the organizational units develop
their own data marts for their own specific purposes. Although each data mart serves a
particular organizational unit, these separate data marts do not provide “a single version of
the truth.” The data marts are independent of one another. As a result, these different data
marts are likely to have inconsistent data definitions and standards. Such variances hinder
the analysis of data across data marts. For example, if there are two independent data marts,
one for sales and the other for shipments, although sales and shipments are related subjects,
the independent data marts would make it difficult to analyze sales and shipments data
together.
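To make the sales and shipments example concrete, the following minimal sketch uses two hypothetical independent data marts that identify the same products with different codes. The row layouts, the product codes, and the `naive_join` helper are assumptions made for illustration; they do not come from any particular system.

```python
# Hypothetical rows from two independently built data marts.
# The sales mart keys products by SKU; the shipments mart uses its own item code.
sales_mart = [
    {"product_sku": "SKU-001", "units_sold": 120},
    {"product_sku": "SKU-002", "units_sold": 75},
]
shipments_mart = [
    {"item_code": "A1", "units_shipped": 118},
    {"item_code": "B7", "units_shipped": 75},
]


def naive_join(sales, shipments):
    """Match rows whose product identifiers are literally equal."""
    shipped_by_code = {row["item_code"]: row["units_shipped"] for row in shipments}
    return [
        (row["product_sku"], row["units_sold"], shipped_by_code[row["product_sku"]])
        for row in sales
        if row["product_sku"] in shipped_by_code
    ]


# Without a shared product definition, no rows line up.
matched = naive_join(sales_mart, shipments_mart)
print(matched)  # prints []
```

An analyst would first have to build and maintain a mapping between the two coding schemes, which is the kind of extra work that consistent data definitions avoid.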
Fig. Data warehouse architectural types
There are three common data warehouse architecture types typically used
for building a data warehouse.
 Single-Tier Architecture
 Two-Tier Architecture
 Three-Tier Architecture
Each type of data warehouse architecture has its own benefits and limitations. Let’s
explore the unique characteristics of each one of them.
3.1.1 Single-Tier Architecture
The single-tier data warehouse architecture reduces the amount of data stored in a
data warehouse by building a more compact data set. Its advantage is that it helps remove
data redundancies and improves the quality of your data. However, it is not well suited to
organizations that hold large volumes of data and operate multiple data streams, because it
becomes inefficient at that scale.
The single-tier architecture has three layers:
 A source layer
 A data warehouse layer
 An analysis layer
In the single-tier architecture, only the source layer is physical. The data warehouse layer
is virtual and provides data in a multidimensional view, created by an intermediate
processing layer. One drawback of the single-tier architecture is the lack of separation
between analytical and transactional processing. For this reason, this type of data warehouse
architecture is not used frequently.
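One way to picture a virtual data warehouse layer is as a database view defined over the source tables. The sketch below uses Python's standard `sqlite3` module; the `orders` table, the `sales_by_region` view, and the sample rows are hypothetical and serve only to illustrate the idea.

```python
import sqlite3

# Source layer: the only physical storage in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("North", "Widget", 100.0), ("North", "Gadget", 50.0), ("South", "Widget", 70.0)],
)

# Virtual warehouse layer: a view stores no data of its own,
# it is recomputed from the source table whenever it is queried.
conn.execute(
    "CREATE VIEW sales_by_region AS "
    "SELECT region, SUM(amount) AS total_amount FROM orders GROUP BY region"
)

# Analysis layer: queries go through the view.
totals = conn.execute(
    "SELECT region, total_amount FROM sales_by_region ORDER BY region"
).fetchall()
print(totals)  # prints [('North', 150.0), ('South', 70.0)]
```

Because the analytical query runs against the same database that records the transactions, the sketch also shows the drawback noted above: analytical and transactional processing are not separated.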
Fig. Single Tier Architecture
3.1.2 Two-Tier Architecture
Unlike the single-tier architecture, the two-tier architecture contains a data staging
area that ensures any data you load into the warehouse is cleansed and in the right format.
It’s found between the source layer and the data warehouse layer, as depicted in the image
below.
Fig. Two Tier Architecture
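The role of the staging area can be sketched as a cleansing step that runs before any row is loaded into the warehouse. In the example below, the field names, the two accepted date formats, and the rule that rows without a customer name are rejected are all assumptions chosen for illustration.

```python
from datetime import datetime

# Hypothetical raw rows as they might arrive from different source systems.
raw_rows = [
    {"customer": "  Alice ", "order_date": "2023-01-15", "amount": "100.50"},
    {"customer": "Bob", "order_date": "15/01/2023", "amount": "20"},
    {"customer": "", "order_date": "2023-01-16", "amount": "5"},
]


def cleanse(row):
    """Return a cleaned copy of the row, or None if it cannot be repaired."""
    customer = row["customer"].strip()
    if not customer:
        return None  # assumed rule: a row without a customer is rejected
    # Try each assumed source date format until one parses.
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            order_date = datetime.strptime(row["order_date"], fmt).date()
            break
        except ValueError:
            continue
    else:
        return None  # no known date format matched
    return {
        "customer": customer,
        "order_date": order_date.isoformat(),
        "amount": float(row["amount"]),
    }


# Only rows that pass cleansing reach the warehouse layer.
staged = [clean for clean in map(cleanse, raw_rows) if clean is not None]
for row in staged:
    print(row)
```

Here the first two rows end up with a trimmed name, an ISO date, and a numeric amount, while the third row is held back because its customer field is empty.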
3.1.3 Three-Tier Architecture
The three-tier architecture is what most organizations go for when building a data
warehouse system. It solves the connectivity problems that two-tier architecture
commonly faces. The three-tier architecture is made up of:
 A source layer
 A reconciled layer
 A data warehouse layer
The three-tier architecture is useful for extensive, enterprise-wide systems. But its
disadvantage is the additional storage space it uses through the redundant, reconciled layer.
The three-tier architecture is also described in terms of three tiers:
 A bottom tier
 A middle tier
 A top tier
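These three tiers are commonly called the layers of a data warehouse architecture.
In the usual description, the bottom tier is the data warehouse database server, the middle
tier is an OLAP server, and the top tier holds the front-end query, reporting, and analysis tools.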
Fig. Three Tier Architecture
3.2 Centralized Data Warehouse
A Centralized Data Warehouse is a data warehousing implementation wherein a single
data warehouse serves the needs of several separate business units simultaneously using a
single data model that spans the needs of multiple business divisions.
Today, having information means having power. In many aspects of daily living,
having relevant information makes everyday activities easier. The use of the internet
makes this evident. Because of the information that can be obtained every day,
the number of internet users keeps growing, and people spend more time online
as web services become more sophisticated, with applications that can
gather and aggregate billions of disparate data items into useful information.
Fig. Centralized Data Warehouse
3.3 Independent Data Marts
A data mart is a simple form of a data warehouse that is focused on a single subject (or
functional area), such as Sales or Finance or Marketing. Data marts are often built and
controlled by a single department within an organization. Given their single-subject focus,
data marts usually draw data from only a few sources. The sources could be internal
operational systems, a central data warehouse, or external data.
An independent data mart is created without the use of a central data warehouse. This
could be desirable for smaller groups within an organization.
Fig. Independent Data Marts
3.3.1 Dependent Data Mart
A dependent data mart draws its data from a central data warehouse, which allows you to
unite your organization's data in one place. This gives you the usual advantages of centralization.
Fig. Dependent Data Mart
3.4 Federated Data Model
Some companies get into data warehousing with an existing legacy of an assortment
of decision-support structures in the form of operational systems, extracted datasets,
primitive data marts, and so on. For such companies, it may not be prudent to discard all
that huge investment and start from scratch. The practical solution is a federated
architectural type where data may be physically or logically integrated through shared key
fields, overall global metadata, distributed queries, and other methods. In this architectural
type, there is no one overall data warehouse.
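Logical integration through a shared key field can be sketched as a query that reads from each existing system and combines the results at query time, without copying the data into one warehouse. The two in-memory SQLite databases, their tables, and the `billed_by_customer` helper below are hypothetical stand-ins for legacy systems.

```python
import sqlite3

# Two hypothetical existing systems, each left in place with its own database.
billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
billing.executemany(
    "INSERT INTO invoices VALUES (?, ?)", [(1, 200.0), (2, 50.0), (1, 25.0)]
)

crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])


def billed_by_customer():
    """Query each system separately, then join on the shared customer_id key."""
    names = dict(crm.execute("SELECT customer_id, name FROM customers"))
    totals = billing.execute(
        "SELECT customer_id, SUM(amount) FROM invoices "
        "GROUP BY customer_id ORDER BY customer_id"
    )
    return [(names[customer_id], total) for customer_id, total in totals]


federated_result = billed_by_customer()
print(federated_result)  # prints [('Acme', 225.0), ('Globex', 50.0)]
```

The join works only because both systems agree on `customer_id`. In practice, agreeing on such shared keys and on global metadata is the main integration effort in a federated setup.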
3.5 Hub and Spoke Data Model
This is the Inmon Corporate Information Factory approach. Similar to the centralized
data warehouse architecture, here too there is an overall enterprise-wide data warehouse. Atomic
data in the third normal form is stored in the centralized data warehouse. The major and
useful difference is the presence of dependent data marts in this architectural type.
Dependent data marts obtain data from the centralized data warehouse. The centralized
data warehouse forms the hub to feed data to the data marts on the spokes. The dependent
data marts may be developed for a variety of purposes: departmental analytical needs,
specialized queries, data mining, and so on. Each dependent data mart may have normalized,
denormalized, summarized, or dimensional data structures based on individual
requirements. Most queries are directed to the dependent data marts although the
centralized data warehouse may itself be used for querying. This architectural type results
from adopting a top-down approach to data warehouse development.
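The hub-to-spoke data flow can be sketched as dependent data marts that are derived entirely from the atomic rows held in the central warehouse. The row layout, the department names, and the `build_dependent_mart` helper below are assumptions made for illustration.

```python
from collections import defaultdict

# Hub: atomic rows in the central warehouse (hypothetical sample data).
hub_sales = [
    {"date": "2023-01-01", "department": "Retail", "product": "Widget", "amount": 100.0},
    {"date": "2023-01-02", "department": "Retail", "product": "Widget", "amount": 40.0},
    {"date": "2023-01-02", "department": "Retail", "product": "Gadget", "amount": 60.0},
    {"date": "2023-01-03", "department": "Wholesale", "product": "Widget", "amount": 500.0},
]


def build_dependent_mart(hub_rows, department):
    """Summarize the hub rows of one department into totals per product."""
    totals = defaultdict(float)
    for row in hub_rows:
        if row["department"] == department:
            totals[row["product"]] += row["amount"]
    return dict(totals)


# Spokes: each mart is fed from the hub, never directly from source systems.
retail_mart = build_dependent_mart(hub_sales, "Retail")
wholesale_mart = build_dependent_mart(hub_sales, "Wholesale")
print(retail_mart)     # prints {'Widget': 140.0, 'Gadget': 60.0}
print(wholesale_mart)  # prints {'Widget': 500.0}
```

Because every mart is computed from the same hub rows, the marts stay consistent with one another, and each one can still use whatever summarized structure suits its department.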
3.6 Data Mart Bus
This is the Kimball conformed supermarts approach. You begin with analyzing
requirements for a specific business subject such as orders, shipments, billings, insurance
claims, car rentals, and so on. You build the first data mart (supermart) using business
dimensions and metrics. These business dimensions will be shared in the future data marts.
The principal notion is that by conforming dimensions among the various data marts, the
result would be logically integrated supermarts that will provide an enterprise view of the
data. The data marts contain atomic data organized as a dimensional data model. This
architectural type results from adopting an enhanced bottom-up approach to data
warehouse development.
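The effect of conforming dimensions can be sketched with two small data marts that share one product dimension. The dimension contents, the fact rows, and the helper names below are hypothetical; the point is only that a shared key lets results from separate marts be combined.

```python
# Conformed dimension: one product definition shared by both data marts.
product_dim = {1: "Widget", 2: "Gadget"}

# Fact rows of two separately built marts, both keyed by the shared product_key.
orders_fact = [
    {"product_key": 1, "units_ordered": 10},
    {"product_key": 2, "units_ordered": 4},
    {"product_key": 1, "units_ordered": 5},
]
shipments_fact = [
    {"product_key": 1, "units_shipped": 12},
    {"product_key": 2, "units_shipped": 4},
]


def total_by_product(fact_rows, measure):
    """Sum one measure per product_key within a single mart."""
    totals = {}
    for row in fact_rows:
        key = row["product_key"]
        totals[key] = totals.get(key, 0) + row[measure]
    return totals


def drill_across():
    """Combine the per-mart totals through the conformed product dimension."""
    ordered = total_by_product(orders_fact, "units_ordered")
    shipped = total_by_product(shipments_fact, "units_shipped")
    return {
        product_dim[key]: (ordered.get(key, 0), shipped.get(key, 0))
        for key in sorted(product_dim)
    }


report = drill_across()
print(report)  # prints {'Widget': (15, 12), 'Gadget': (4, 4)}
```

Each mart is queried on its own and the results are merged on `product_key`, so orders and shipments can be compared per product even though they live in different marts.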
Summary
A data warehouse architecture is a set of interconnected databases that store,
organize, and analyze data. A data warehouse is a collection of databases that stores and
organizes data in a systematic way. A data warehouse architecture consists of three main
components: a data warehouse, an analytical framework, and an integration layer. The data
warehouse is the central repository for all the data. The analytical framework is the software
that processes the data and organizes it into tables. The integration layer is the software that
connects the databases and makes them accessible to other applications. A data
warehouse architecture is an important part of any IT infrastructure because it helps to
optimize the performance of the entire system.
Self-Practice
Mini Project: Fraud Detection using PaySim Financial Dataset
Reference link:
https://www.projectpro.io/article/data-warehouse-project-ideas-for-practice/572
Self-Assessment Questions
a) Essay
Short
1. Define Data Warehouse Architecture.
2. List Data Warehouse Architecture Types.
3. Define data marts.
4. Define the Data Model.
5. Define Centralized Data Warehouse.
Medium
1. Write the working principle of the single-tier Data Warehouse Architecture.
2. Write the working principle of the two-tier Data Warehouse Architecture.
3. Write the working principle of the three-tier Data Warehouse Architecture.
4. Briefly describe the Federated Data Model in Data Warehouse Architecture.
5. Briefly describe the Data Mart Bus.
Long
1. Explain single-tier Data Warehouse Architecture with a neat sketch.
2. Describe Two-tier Data Warehouse Architecture with a neat sketch.
3. Explain Three-tier Data Warehouse Architecture with a neat sketch.
4. Elaborate Centralized Data Warehouse with a suitable sketch.
5. Explain Hub and Spoke Data Model with a neat diagram.
3.7 POST-UNIT READING MATERIAL
 https://www.youtube.com/watch?v=J326LIUrZM8&list=PL9ooVrP1hQOEDSc5QEbI8WYVV_EbWKJwX
 https://www.youtube.com/watch?v=G4NYQox4n2g
3.8 TOPICS FOR DISCUSSION FORUMS