Slides Set 7 - WordPress.com

CHAPTER 7:
ARCHITECTURAL COMPONENTS
CHAPTER OBJECTIVES
• Understand data warehouse architecture
• Examine how the architectural framework supports the flow of data
• Study the functions and services of the architectural components
• Revisit the five major architectural types
UNDERSTANDING DATA WAREHOUSE ARCHITECTURE
• Architecture: Definitions
• Architecture in Three Major Areas
ARCHITECTURAL FRAMEWORK
• Architecture Supporting Flow of Data
• The Management and Control Module
TECHNICAL ARCHITECTURE
• Data Acquisition
• Data Storage
• Information Delivery
ARCHITECTURAL TYPES
• Centralized Corporate Data Warehouse
• Independent Data Marts
• Federated
• Hub-and-Spoke
• Data-Mart Bus
UNDERSTANDING DATA WAREHOUSE ARCHITECTURE
Earlier, we were introduced to the building blocks of the data warehouse. At that
stage, we looked quickly at the list of components and reviewed each one briefly.
In this chapter, we review the data warehouse architecture from different
perspectives.
You will study the architectural components in the order in which they enable the
flow of data from the sources to the end-users as business intelligence.
Then you will be able to look at each area of the architecture and examine the
functions, procedures, and features in that area.
Architecture: Definitions
The structure that brings all the components of a data warehouse together is known
as the architecture.
In your data warehouse,
• The architecture includes the integrated data that is the centerpiece.
• The architecture includes everything that is needed to prepare the data and
store it.
• The architecture includes all the means for delivering information from your data
warehouse.
• The architecture is further composed of the rules, procedures, and functions that
enable your data warehouse to work and fulfill the business requirements.
• The architecture is made up of the technology that empowers your data
warehouse.
What is the general purpose of the data warehouse architecture?
• The architecture provides the overall framework for developing and deploying
your data warehouse; it is a comprehensive blueprint.
• The architecture defines the standards, measurements, general design, and
support techniques.
Architecture in Three Major Areas
As you already know, the three major areas in the data warehouse are:
• Data acquisition
• Data storage
• Information delivery
Figure 7-1 groups these major architectural components into the three areas.
ARCHITECTURAL FRAMEWORK
Architecture Supporting Flow of Data
Data collected from the various sources moves to the staging area. What
happens next? The extracted data goes through a detailed preparation process in
the staging area before it is sent forward to the data warehouse to be properly
stored.
From the data warehouse storage, data transformed into useful information is
retrieved by the users or delivered to the user desktops as required.
Figure 7-2 shows the flow of data from beginning to end and also highlights the
architectural components enabling the flow of data as the data moves along.
Figure 7-2 Architectural framework supporting the flow of data.
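The flow described above can be traced in a few lines of Python. This is only a minimal sketch: the `Warehouse` class, the three functions, and the sample source systems are hypothetical names chosen for illustration, and the "preparation" step is reduced to normalizing one field.

```python
from dataclasses import dataclass, field


@dataclass
class Warehouse:
    staging: list = field(default_factory=list)      # staging area
    repository: list = field(default_factory=list)   # data warehouse storage


def acquire(sources, wh):
    """Data acquisition: collect rows from every source into the staging area."""
    for rows in sources.values():
        wh.staging.extend(rows)


def prepare_and_store(wh):
    """Prepare staged rows (normalize region, convert amount), then store them."""
    for row in wh.staging:
        wh.repository.append({"region": row["region"].strip().upper(),
                              "amount": float(row["amount"])})
    wh.staging.clear()


def deliver(wh, region):
    """Information delivery: answer a user query from warehouse storage."""
    return sum(r["amount"] for r in wh.repository if r["region"] == region)


# Hypothetical source systems with inconsistent formatting.
sources = {
    "orders_system": [{"region": " east", "amount": "100.0"}],
    "billing_system": [{"region": "East ", "amount": "50.5"},
                       {"region": "west", "amount": "20"}],
}
wh = Warehouse()
acquire(sources, wh)
prepare_and_store(wh)
east_total = deliver(wh, "EAST")
```

The user only ever calls `deliver`; the staging area is emptied once its contents have been stored.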
The Management and Control Module
This architectural component is an overall module managing and controlling the
entire data warehouse environment. It is an umbrella component working at various
levels and covering all the operations.
Major functions:
• Monitors all the ongoing operations.
• Recovers from problems when things go wrong.
• Manages and controls the data acquisition functions, ensuring that extracts
and transformations are carried out correctly and in a timely fashion.
• Manages backing up significant parts of the data warehouse and recovering
from failures.
• Monitors the growth of the data warehouse and periodically archives data
from it.
• Governs data security and provides authorized access to the data
warehouse.
• Interfaces with the end-user information delivery component to ensure that
information delivery is carried out properly.
Figure 7-3 shows how the management component relates to and manages all of
the data warehouse operations.
Figure 7-3 The management and control component.
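As a rough illustration of the "monitor" and "recover" functions, the sketch below wraps each warehouse job in a small runner that logs the outcome and retries on failure. The job names, the retry policy, and the `run_job` helper are assumptions made for this example; a real management and control module would cover far more (security, backup, archiving).

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("dw.control")


def run_job(name, job, retries=2):
    """Run one warehouse job, log its outcome, and retry on failure."""
    for attempt in range(1, retries + 2):
        try:
            result = job()
            log.info("%s succeeded on attempt %d", name, attempt)
            return {"job": name, "status": "ok", "attempts": attempt, "result": result}
        except Exception as exc:  # broad on purpose: every failure is recorded
            log.warning("%s failed on attempt %d: %s", name, attempt, exc)
    return {"job": name, "status": "failed", "attempts": retries + 1, "result": None}


calls = {"extract": 0}


def flaky_extract():
    """Hypothetical extract job that fails once, then succeeds."""
    calls["extract"] += 1
    if calls["extract"] == 1:
        raise ConnectionError("source system unavailable")
    return 120  # rows extracted


def broken_load():
    """Hypothetical load job that always fails."""
    raise RuntimeError("disk full")


status_report = [run_job("nightly_extract", flaky_extract),
                 run_job("nightly_load", broken_load, retries=1)]
```

The status report gives the module one place to see which operations completed and which need attention.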
TECHNICAL ARCHITECTURE
Let us now consider the technical architecture in each of the three major areas of
the data warehouse: data acquisition, data storage, and information delivery.
Data Acquisition
This area covers the entire process of extracting data from the data sources,
moving all the extracted data to the staging area, and preparing the data for
loading into the data warehouse repository.
Figure 7-4 Data acquisition: technical architecture.
1) Data Flow
Flow
In the data acquisition area, the data flow begins at the data sources and pauses
at the staging area. After transformation and integration, the data is ready for
loading into the data warehouse repository.
Data Sources
1. Usually, the source systems are supported by relational DBMSs. Here you may use
an SQL-based language for extracting data.
2. A fairly large number of companies have adopted ERP (enterprise resource
planning) systems. ERP data sources provide an advantage in that the data
from these sources is already consolidated and integrated.
3. To include data from outside sources, you will have to create temporary files
to hold the data received. After reformatting and rearranging the data
elements, you will have to move the data to the staging area.
Intermediary Data Stores
As data gets extracted from the data sources, it moves through temporary files.
Sometimes, extracts of homogeneous data from several source applications are
pulled into separate temporary files and then merged into another temporary file
before the data moves to the staging area.
The general practice is to use flat files to extract data from operational
systems.
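The extract-then-merge pattern above might look like the following sketch. It assumes two hypothetical source applications that each hold a `customer` table (modeled here with in-memory SQLite databases), CSV as the flat-file format, and illustrative file names.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def extract_to_flat_file(conn, query, path):
    """Run an SQL extract against a source database and write a CSV flat file."""
    cur = conn.execute(query)
    header = [col[0] for col in cur.description]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(cur)


def merge_flat_files(paths, merged_path):
    """Merge homogeneous extract files (same columns) into one temporary file."""
    with open(merged_path, "w", newline="") as out:
        writer = None
        for path in paths:
            with open(path, newline="") as f:
                reader = csv.DictReader(f)
                if writer is None:
                    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
                    writer.writeheader()
                writer.writerows(reader)


def make_source(rows):
    """Stand-in for an operational system backed by a relational DBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)
    return conn


workdir = Path(tempfile.mkdtemp())
extracts = []
for name, rows in {"sales": [(1, "Ann")], "support": [(2, "Bob"), (3, "Cy")]}.items():
    path = workdir / f"{name}_customer.csv"
    extract_to_flat_file(make_source(rows), "SELECT id, name FROM customer", path)
    extracts.append(path)

merged = workdir / "customer_merged.csv"
merge_flat_files(extracts, merged)
with open(merged, newline="") as f:
    merged_rows = list(csv.DictReader(f))
```

The merged file is what would move on to the staging area; the per-source files are the intermediary data stores.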
Staging Area
This is the place where all the extracted data is put together and prepared for
loading into the data warehouse. The staging area is like an assembly plant or a
construction area.
In this area, you examine each extracted file, review the business rules,
perform the various data transformation functions, sort and merge data,
resolve inconsistencies, and cleanse the data.
When the data is finally prepared either for an enterprise-wide data warehouse or
one of the conformed data marts, the data temporarily resides in the staging area
repository waiting to be loaded into the data warehouse repository.
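A small sketch of the staging work just described: cleansing each record, resolving inconsistent codes through a business rule, and sorting and merging the extracts. The status codes in `STATUS_RULES` and the record layout are invented for illustration.

```python
import heapq

# Hypothetical business rule: each source encodes customer status differently.
STATUS_RULES = {"A": "ACTIVE", "ACT": "ACTIVE", "1": "ACTIVE",
                "I": "INACTIVE", "INACT": "INACTIVE", "0": "INACTIVE"}


def cleanse(record):
    """Trim whitespace and map inconsistent status codes to one standard value."""
    return {
        "customer_id": int(record["customer_id"]),
        "name": " ".join(record["name"].split()).title(),
        "status": STATUS_RULES[record["status"].strip().upper()],
    }


def sort_and_merge(*extracts):
    """Sort each cleansed extract by key, then merge them into one ordered list."""
    sorted_runs = [sorted(map(cleanse, e), key=lambda r: r["customer_id"])
                   for e in extracts]
    return list(heapq.merge(*sorted_runs, key=lambda r: r["customer_id"]))


orders_extract = [{"customer_id": "3", "name": "  cy  young", "status": "1"},
                  {"customer_id": "1", "name": "ann lee", "status": "a"}]
billing_extract = [{"customer_id": "2", "name": "BOB  RAY", "status": "inact"}]

staged = sort_and_merge(orders_extract, billing_extract)
```

An unknown status code raises a `KeyError` here, which is one simple way to force a review of the business rules instead of loading bad data silently.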
2) Functions and Services
The list of functions and services in this section relates to the data acquisition
area and is broken down into three groups. This is a general list. It does not
indicate the extent or complexity of each function or service. For the technical
architecture of your data warehouse, you have to determine the content and
complexity of each function or service.
List of Functions and Services
Data Extraction
• Generate automatic extract files from operational systems using
replication and other techniques.
• Create intermediary files to store selected data to be merged later.
• Transport extracted files from multiple platforms.
• Reformat input from outside sources.
• Reformat input from departmental data files, databases, and
spreadsheets.
• Generate common application codes for data extraction.
Data Transformation
• Map input data to data for data warehouse repository.
• Clean data, de-duplicate, and merge/purge.
• Convert data types.
• Calculate and derive attribute values.
• Aggregate data as needed.
• Resolve missing values.
• Consolidate and integrate data.
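Several of the transformation functions listed above can be shown on a toy sales feed. The field names (`dt`, `prod_cd`, `qty`, `unit_price`, `region`) and the "UNKNOWN" default are assumptions for this sketch, not part of any particular source system.

```python
from collections import defaultdict
from datetime import date


def transform(raw):
    """Map one raw input record to the warehouse layout."""
    quantity = int(raw["qty"])                       # convert data types
    unit_price = float(raw["unit_price"])
    return {
        "sale_date": date.fromisoformat(raw["dt"]),
        "product": raw["prod_cd"].strip().upper(),   # map and clean the source field
        "region": raw.get("region") or "UNKNOWN",    # resolve missing values
        "quantity": quantity,
        "revenue": round(quantity * unit_price, 2),  # derive an attribute value
    }


def deduplicate(records):
    """Drop exact duplicates while keeping first-seen order."""
    seen, unique = set(), []
    for rec in records:
        key = tuple(rec.items())
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def aggregate_by_product(records):
    """Aggregate revenue to the product level."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["product"]] += rec["revenue"]
    return dict(totals)


raw_input = [
    {"dt": "2024-01-05", "prod_cd": " p1 ", "qty": "2", "unit_price": "9.50", "region": "EAST"},
    {"dt": "2024-01-05", "prod_cd": " p1 ", "qty": "2", "unit_price": "9.50", "region": "EAST"},
    {"dt": "2024-01-06", "prod_cd": "p2", "qty": "1", "unit_price": "4.00", "region": ""},
]
clean = deduplicate([transform(r) for r in raw_input])
totals = aggregate_by_product(clean)
```

Treating exact duplicates as errors is itself a business decision; some feeds legitimately contain repeated rows, so the de-duplication rule has to be agreed on per source.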
Data Staging
• Provide backup and recovery for staging area repositories.
• Sort and merge files.
• Create files as input to make changes to dimension tables.
• If data staging storage is a relational database, create and populate database.
• Resolve and create primary and foreign keys for load tables.
• If staging area storage is a relational database, extract load files.
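One way to picture "resolve and create primary and foreign keys for load tables" is the sketch below. It assumes the warehouse uses generated integer (surrogate) keys for dimension rows, which the list above does not state; the helper names and sample rows are hypothetical.

```python
def build_key_map(dimension_rows, natural_key, existing=None):
    """Assign a generated primary key to each new natural key."""
    key_map = dict(existing or {})
    next_key = max(key_map.values(), default=0) + 1
    for row in dimension_rows:
        nk = row[natural_key]
        if nk not in key_map:
            key_map[nk] = next_key
            next_key += 1
    return key_map


def resolve_foreign_keys(fact_rows, key_map, natural_key, fk_name):
    """Replace the natural key in each fact row with the dimension's generated key."""
    load_rows, rejected = [], []
    for row in fact_rows:
        nk = row[natural_key]
        if nk in key_map:
            rest = {k: v for k, v in row.items() if k != natural_key}
            load_rows.append({fk_name: key_map[nk], **rest})
        else:
            rejected.append(row)  # no matching dimension row; hold for review
    return load_rows, rejected


products = [{"product_code": "P1"}, {"product_code": "P2"}]
sales = [{"product_code": "P2", "amount": 4.0},
         {"product_code": "P9", "amount": 1.0}]

product_keys = build_key_map(products, "product_code")
load_rows, rejected = resolve_foreign_keys(
    sales, product_keys, "product_code", "product_key")
```

Rows that cannot be resolved are set aside rather than loaded, so the load tables never carry a foreign key that points nowhere.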
Data Storage
This covers the process of loading the data from the staging area into the data
warehouse repository. All functions for transforming and integrating the data are
completed in the data staging area.
Figure 7-5 shows a summarized view of the technical architecture for data
storage.
Figure 7-5 Data storage: technical architecture.
1) Data Flow
Flow
For data storage, the data flow begins at the data staging area. The transformed
and integrated data is moved from the staging area to the data warehouse
repository. If the data warehouse is an enterprise-wide data warehouse being built
in a top-down fashion, then there could be movements of data from the
enterprise-wide data warehouse repository to the repositories of the dependent
data marts.
Data Groups
Prepared data waiting in the data staging area falls into two groups.
• The first group is the set of files or tables containing data for a full refresh. This
group of data is usually meant for the initial loading of the data warehouse.
Occasionally, some data warehouse tables may be refreshed fully.
• The other group of data is the set of files or tables containing ongoing
incremental loads.
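The difference between the two groups can be sketched against a relational repository. The example below uses Python's built-in `sqlite3` module with a hypothetical `product_dim` table; treating an incremental change as "insert or overwrite by key" is a simplifying assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, name TEXT)")


def full_refresh(conn, rows):
    """Replace the table contents entirely (initial load or occasional refresh)."""
    with conn:  # one transaction: the delete and the inserts commit together
        conn.execute("DELETE FROM product_dim")
        conn.executemany("INSERT INTO product_dim VALUES (?, ?)", rows)


def incremental_load(conn, rows):
    """Apply ongoing changes: insert new keys, overwrite rows whose key exists."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO product_dim (product_key, name) VALUES (?, ?)",
            rows,
        )


full_refresh(conn, [(1, "Widget"), (2, "Gadget")])
incremental_load(conn, [(2, "Gadget v2"), (3, "Gizmo")])
loaded = conn.execute("SELECT * FROM product_dim ORDER BY product_key").fetchall()
```

After both loads the table holds the untouched row, the changed row, and the new row.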
The Data Repository
Almost all of today’s data warehouse databases are relational databases. All the
power, flexibility, and ease of use capabilities of the RDBMS become available for
the processing of data.
2) Functions and Services
The general list of functions and services given in this section is for your guidance.
It relates to the data storage area and covers the broad functions and services; it
does not indicate the extent or complexity of each function or service. For the
technical architecture of your data warehouse, you have to determine the content
and complexity of each function or service.
List of Functions and Services
• Load data for full refreshes of data warehouse tables.
• Support loading into multiple tables at the detailed and summarized levels.
• Optimize the loading process.
• Provide automated job control services for loading the data warehouse.
• Provide backup and recovery for the data warehouse database.
• Provide security.
• Monitor and fine-tune the database.
• Periodically archive data from the database.
Information Delivery
This area spans a broad spectrum of methods for making information available to
users. For your users, the information delivery component is the data warehouse.
They do not come into contact with the other components directly. For the users,
the strength of your data warehouse architecture is mainly concentrated in the
flexibility of the information delivery component. The information delivery
component makes it easy for the users to access the information either directly
from the enterprise-wide data warehouse, from the dependent data marts, or from
the set of conformed data marts.
Almost all modern data warehouses provide for online analytical processing
(OLAP). In this case, the primary data warehouse feeds data to proprietary
multidimensional databases (MDDBs) where summarized data is kept as
multidimensional cubes of information. The users perform complex
multidimensional analysis using the information cubes in the MDDBs. Refer to
Figure 7-6 for a summarized view of the technical architecture for information
delivery.
Figure 7-6 Information delivery: technical architecture.
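To make "summarized data kept as multidimensional cubes" concrete, here is a toy cube in plain Python. It pre-summarizes one measure for every combination of three assumed dimensions (`product`, `region`, `month`); real MDDBs use proprietary storage and are far more sophisticated, so treat this purely as an illustration of the idea.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("product", "region", "month")


def build_cube(facts, measure="sales"):
    """Pre-summarize the measure for every combination of dimensions."""
    cube = defaultdict(float)
    for fact in facts:
        for size in range(len(DIMENSIONS) + 1):
            for dims in combinations(DIMENSIONS, size):
                cell = tuple((d, fact[d]) for d in dims)
                cube[cell] += fact[measure]
    return cube


def query(cube, **filters):
    """Look up one summarized cell, e.g. query(cube, region="EAST")."""
    cell = tuple((d, filters[d]) for d in DIMENSIONS if d in filters)
    return cube.get(cell, 0.0)


facts = [
    {"product": "P1", "region": "EAST", "month": "2024-01", "sales": 100.0},
    {"product": "P1", "region": "WEST", "month": "2024-01", "sales": 40.0},
    {"product": "P2", "region": "EAST", "month": "2024-02", "sales": 60.0},
]
cube = build_cube(facts)
grand_total = query(cube)
east_total = query(cube, region="EAST")
p1_january = query(cube, product="P1", month="2024-01")
```

Because every combination is summarized ahead of time, each user question becomes a single lookup instead of a scan of the detailed data.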
1) Data Flow
Flow
For information delivery, the data flow begins at the enterprise-wide data
warehouse and the dependent data marts when the design is based on the
top-down technique. When the design follows the bottom-up method, the data flow
starts at the set of conformed data marts. Generally, data transformed into
information flows to user desktops during query sessions. Sometimes, the result
sets from individual queries or reports are held in proprietary data stores of the
query or reporting tool vendors.
More recently, progressive organizations have implemented dashboards and
scorecards as part of information delivery. Dashboards are real-time or
near-real-time information display devices.
Service Locations
In your information delivery component, you may provide query services from the
user desktop, from an application server, or from the database itself. This will be
one of the critical decisions for your architecture design.
Data Stores
For information delivery, you may consider the following intermediary data
stores:
• Proprietary temporary stores to hold results of individual queries and
reports for repeated use
• Data stores for standard reporting
• Proprietary multidimensional databases
2) Functions and Services
Review the general list of functions and services given below and use it as a guide
to establish the information delivery component of your data warehouse
architecture. The list relates to information delivery and covers the broad functions
and services. Again, this is a general list. It does not indicate the extent or
complexity of each function or service. For the technical architecture of your data
warehouse, you have to determine the content and complexity of each function or
service.
• Provide security to control information access.
• Monitor user access to improve service and for future enhancements.
• Allow users to browse data warehouse content.
• Simplify access by hiding internal complexities of data storage from users.
• Automatically reformat queries for optimal execution.
• Govern queries and control runaway queries.
• Provide self-service report generation for users, consisting of a variety of flexible
options to create, schedule, and run reports.
• Store result sets of queries and reports for future use.
• Make provision for the users to perform complex analysis through online
analytical processing (OLAP).
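"Govern queries and control runaway queries" can be illustrated with a simple time budget. The sketch below uses the progress handler of Python's `sqlite3` module to interrupt a query that runs too long; the `run_governed` helper, the time limits, and the deliberately endless query are assumptions made for the example. Commercial query tools provide their own governors.

```python
import sqlite3
import time


def run_governed(conn, sql, params=(), max_seconds=1.0):
    """Run a query but abort it once it exceeds the allowed wall-clock time."""
    deadline = time.monotonic() + max_seconds

    def over_budget():
        # A non-zero return value tells SQLite to interrupt the running query.
        return 1 if time.monotonic() > deadline else 0

    conn.set_progress_handler(over_budget, 1000)  # check every 1000 VM steps
    try:
        return conn.execute(sql, params).fetchall()
    except sqlite3.OperationalError as exc:
        # Note: this also catches other operational errors, not only interrupts.
        raise TimeoutError(f"query cancelled by governor: {exc}") from exc
    finally:
        conn.set_progress_handler(None, 0)  # remove the handler


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?)", [(10.0,), (5.5,)])

quick = run_governed(conn, "SELECT SUM(amount) FROM sales")

# A deliberately unbounded recursive query stands in for a runaway user query.
RUNAWAY = """
WITH RECURSIVE n(i) AS (SELECT 1 UNION ALL SELECT i + 1 FROM n)
SELECT COUNT(*) FROM n
"""
try:
    run_governed(conn, RUNAWAY, max_seconds=0.2)
    cancelled = False
except TimeoutError:
    cancelled = True
```

The normal query returns its result untouched, while the runaway query is stopped before it can tie up the database.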
ARCHITECTURAL TYPES
1) Centralized Corporate Data Warehouse
In this architecture type, a centralized enterprise data warehouse is present. There
are no data marts, whether dependent or independent. Therefore all information
delivery is from the centralized data warehouse.
Figure 7-7 Overview of the components of a centralized data warehouse.
2) Independent Data Marts
In this architecture type, the data warehouse is a collection of unconnected,
disparate data marts, each serving a specific department or purpose. Each data
mart delivers information to its own group of users.
Figure 7-8 Overview of the components of independent data marts.
3) Federated
In the federated architectural type, common data elements in the various data
marts and even data warehouses that compose the federation are integrated
physically or logically. The goal is to strive for a single version of truth for the
organization.
Figure 7-9 Overview of the components of a federated data warehouse.
4) Hub-and-Spoke
In this architecture type, a centralized enterprise data warehouse is present. In
addition, there are data marts that depend on the enterprise data warehouse for
data feed. Information delivery can, therefore, be both from the centralized data
warehouse and the dependent data marts.
Figure 7-10 Overview of the components of a hub-and-spoke type of data warehouse.
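The hub-and-spoke relationship can be sketched with two SQLite databases on one connection: the main database plays the enterprise data warehouse (the hub) and an attached database plays one dependent data mart (a spoke). The table names, the `finance` department filter, and the choice of a summary table are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                           # hub: enterprise warehouse
conn.execute("ATTACH DATABASE ':memory:' AS finance_mart")   # spoke: dependent mart

conn.execute("CREATE TABLE sales (dept TEXT, region TEXT, amount REAL)")
conn.execute("CREATE TABLE finance_mart.sales_by_region (region TEXT, total REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("finance", "EAST", 100.0), ("finance", "EAST", 50.0),
    ("finance", "WEST", 25.0), ("marketing", "EAST", 70.0),
])


def feed_dependent_mart(conn, dept):
    """Rebuild the mart's summary table using warehouse data as the only feed."""
    with conn:
        conn.execute("DELETE FROM finance_mart.sales_by_region")
        conn.execute(
            "INSERT INTO finance_mart.sales_by_region "
            "SELECT region, SUM(amount) FROM main.sales "
            "WHERE dept = ? GROUP BY region",
            (dept,),
        )


feed_dependent_mart(conn, "finance")

# Information delivery from the spoke and from the hub.
mart_rows = conn.execute(
    "SELECT region, total FROM finance_mart.sales_by_region ORDER BY region"
).fetchall()
warehouse_total = conn.execute("SELECT SUM(amount) FROM main.sales").fetchone()[0]
```

The mart never reads a source system directly; everything it holds is derived from the warehouse, which is what makes it dependent.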
5) Data-Mart Bus
In this architecture type, no distinct, single data warehouse exists. The
collection of all the data marts forms the data warehouse because the data
marts are conformed “super-marts.”
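What lets separate marts act as one warehouse is that they share the same dimension definitions. The sketch below assumes two hypothetical marts (sales and shipping) that both reference one shared product dimension, so their results can be combined on a common attribute.

```python
# Conformed dimension shared by both marts (same keys, same attribute values).
product_dim = {1: {"name": "Widget", "category": "Hardware"},
               2: {"name": "Gizmo", "category": "Hardware"},
               3: {"name": "Manual", "category": "Print"}}

# Each data mart keeps its own facts but references the shared dimension keys.
sales_mart = [{"product_key": 1, "revenue": 100.0},
              {"product_key": 2, "revenue": 60.0},
              {"product_key": 3, "revenue": 15.0}]
shipping_mart = [{"product_key": 1, "ship_cost": 8.0},
                 {"product_key": 3, "ship_cost": 2.5}]


def summarize(facts, measure, attribute):
    """Total one mart's measure by an attribute of the shared dimension."""
    totals = {}
    for fact in facts:
        group = product_dim[fact["product_key"]][attribute]
        totals[group] = totals.get(group, 0.0) + fact[measure]
    return totals


def drill_across(attribute):
    """Combine results from both marts on the shared dimension attribute."""
    revenue = summarize(sales_mart, "revenue", attribute)
    cost = summarize(shipping_mart, "ship_cost", attribute)
    return {g: {"revenue": revenue.get(g, 0.0), "ship_cost": cost.get(g, 0.0)}
            for g in sorted(set(revenue) | set(cost))}


by_category = drill_across("category")
```

If each mart defined "category" in its own way, the two summaries could not be lined up, and the collection would behave like the independent data marts described earlier.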
CHAPTER SUMMARY
• Architecture is the structure that brings all the components together.
• Data warehouse architecture consists of distinct components with the read-only data
repository as the centerpiece.
• A few typical data warehouse architectural types are in use at various organizations. Broadly,
these types reflect how data is stored and made available: centrally as the single enterprise
data warehouse database or as a collection of cohesive data marts.
• The architectural components support the functioning of the data warehouse in the three
major areas of data acquisition, data storage, and information delivery.
• Data warehouse architecture is wide, complex, and expansive, and it has several
distinguishing characteristics.
• The architectural framework enables the flow of data from the data sources at one end to
the user’s desktop at the other.
• The technical architecture of a data warehouse is the complete set of functions and services
provided within its components. It includes the procedures and rules needed to perform the
functions and to provide the services.
• The flow of data from the source systems to end-users as business intelligence depends on
the architectural type.