Data Warehouse Design Consideration_ODS_NIIT

Shiva.inbox@gmail.com
Operational Data Store Design Consideration:
The operational data store (ODS) is an architectural construct that has elements of both
operational and DSS processing. The ODS is able to provide operational transaction response
time of two to three seconds and allows updates to be made directly. In that sense it is very
operational. But an ODS is also integrated and can be used for DSS processing. In that sense
it is decision support.
The ODS is architecturally similar to the data warehouse in that both the ODS and the data
warehouse are integrated and subject oriented. But the ODS and the data warehouse are very
dissimilar in that the data warehouse is not updated by transactions while the ODS certainly
can be updated by transactions, the data warehouse contains summarized data while the ODS
does not, the data warehouse contains a lengthy historical perspective of data while the ODS
does not, the data warehouse is used for strategic decisions while the ODS is used for tactical
decisions, and the data warehouse is used at the managerial level while the ODS is used at
the clerical level.
The ODS sits between the legacy systems environment and the data warehouse. As data
passes out of the legacy environment and into the ODS, the data is integrated, just as it is as
data passes into the data warehouse. Then as the data ages it passes out of the ODS and into
the data warehouse environment.
The ODS is defined to be a structure that is:

Integrated

Subject oriented

Volatile, where update can be done

Current valued, containing data that is a day or perhaps a month old

Detailed, containing detailed data only
The ODS is like a data warehouse in that it is integrated and is subject oriented. But a data
warehouse is very different from an ODS when it comes to update. An ODS contains data that
can be altered, whereas a data warehouse contains data that exists in snapshots and cannot
be altered.
The second major difference between an ODS and a data warehouse is that the ODS contains
very current data while the data warehouse contains a robust amount of history. The ODS
typically contains daily, weekly, or even monthly data. The data warehouse contains historical
data that may be five or even ten years old.
Another important difference between the data warehouse and the ODS is in the summary
data found in each environment. Summary data can be created in the ODS but it is not usually
stored, once having been created.
The summary data that is created in the ODS can be called "dynamic summary data".
Dynamic summary data is summary data whose accuracy depends on the moment of
calculation.
DWH Design Operational Data Store
The summary data found in the data warehouse is quite different from the dynamic summary
data found in the ODS. The summary data found in the data warehouse can be called "static
summary data". Static summary data is data whose accuracy does not depend on the moment
of calculation.
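The distinction between dynamic and static summary data can be illustrated with a minimal Python sketch. The account names, balances, and snapshot date below are hypothetical and are not taken from the text; the sketch only assumes that ODS detail can be updated in place while warehouse snapshots cannot.

```python
from datetime import date

# Hypothetical ODS detail: volatile, updated in place by transactions.
ods_balances = {"acct-1": 100.0, "acct-2": 250.0}

def dynamic_total(balances):
    """Dynamic summary: computed on demand from current, updatable detail."""
    return sum(balances.values())

# Hypothetical warehouse snapshot: closed-period detail that is never altered.
warehouse_snapshot = {
    (date(2023, 1, 31), "acct-1"): 90.0,
    (date(2023, 1, 31), "acct-2"): 240.0,
}

def static_total(snapshot, as_of):
    """Static summary: computed over an unchanging snapshot for a fixed date."""
    return sum(value for (day, _), value in snapshot.items() if day == as_of)

# The dynamic summary changes as soon as an online update arrives.
before = dynamic_total(ods_balances)
ods_balances["acct-1"] += 25.0
after = dynamic_total(ods_balances)

# The static summary gives the same answer whenever it is calculated.
first_run = static_total(warehouse_snapshot, date(2023, 1, 31))
second_run = static_total(warehouse_snapshot, date(2023, 1, 31))
```

Because the dynamic total is only correct at the moment it is computed, it is usually recalculated on request rather than stored.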
The ODS is a particularly difficult structure to build because the design of the ODS must satisfy
both operational and DSS requirements at the same time. Because the ODS must satisfy more
than one master at a time, the ODS design is necessarily a compromise. In many regards the
ODS is a staging area for data as it passes into the DSS environment.
TRANSACTIONS TO ODS:
The operational data store (ODS) is designed to provide integrated, collective information for
the operational environment. The ODS allows updates to occur and contains only current
information. Architecturally, the ODS is in many ways the operational counterpart of the data
warehouse.
There are four classes of ODS, based upon the speed with which operational transactions are
entered into the ODS after being transacted in the operational transaction processing
environment.

A class I ODS is one in which transaction changes are entered into the ODS almost
immediately upon being transacted in the transaction processing environment.

A class II ODS is one in which transactions are stored and forwarded into the ODS after
having been executed in the transaction environment.

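The class III ODS is the same as the class II ODS insofar as the mechanics of the storing
and forwarding of the transaction are concerned. The two differ in how long transactions
are held before forwarding, with the class III ODS typically refreshed on a slower cycle,
such as overnight.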

A class IV ODS is one where data is loaded from the data warehouse into the ODS. In this
case much analytical work has occurred in the data warehouse environment and the
results of the analysis are fed back to the ODS. Once fed back to the ODS, the results are
available on an OLTP type of response time.
An ODS provides a foundation for collective, up-to-the-second views of the enterprise. At
the same time the ODS supports decision support system (DSS) processing.
Because of the many roles that an ODS fulfills, it is necessarily a complex structure. Its
underlying technology is complex. Its design is complex. Monitoring and maintaining the ODS
is complex.
The ODS typically takes a long time to design and implement. The ODS requires changing or
replacing old legacy systems that are un-integrated.
The classical design of the structures found in the DSS environment begins with a data model,
which reflects the informational needs of the corporation.
From the data model are generated normalized tables. These tables constitute what can be
described as a logical design. The many normalized tables are combined into a form of
physical design that can be described as lightly normalized design. In a lightly normalized
design, tables are combined on the basis of containing common keys and general common
usage.
The design technique of creating normalized / lightly normalized structures based on a data
model that has been described here fits many instances of DSS design.
But there is a problem with this approach. When the issues of:

Performance, where many tables must be joined,

Performance, where there are many occurrences of data that will populate the design, and

Simplicity, where users find it unnatural to join many tables together to represent data in
a form comprehensible to the end user each time the end user does a transaction
are considered, the design technique of light normalization yields marginal results.
An alternate design approach is to take into consideration the volume and usage of the data.
When the volume and usage of the data are factored into the design, a mutant form of
normalization is achieved. The light normalization turns into heavy denormalization, and a
structure known as the "star join" is created.
There are two essential parts to a star join - fact tables and dimension tables. The fact table
represents the structure that holds the majority of the occurrences of the data. Fact tables
typically combine data and cross reference keys from a variety of other tables.
The other type of table that participates in a star join is the dimension table. Dimension tables
contain data that is not terribly voluminous. Dimension tables are related to fact tables by
means of a foreign key relationship.
Fact tables are efficient to access because data has been prejoined into the table at the
moment of loading. The end user is able to access fact tables efficiently because the fact
tables are extremely streamlined in their design. In addition, the fact table is familiar to the
end user, in terms of the day-to-day structuring of data that the end user is used to seeing.
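A star join of this kind can be sketched with Python's built-in sqlite3 module. The table names (fact_sales, dim_customer, dim_product), columns, and rows are hypothetical and chosen only for illustration; the sketch assumes one fact table that carries the bulk of the rows plus cross-reference keys, and two small dimension tables related to it by foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: small, descriptive data.
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, product_name  TEXT);

-- Fact table: most of the rows, plus keys that cross-reference the dimensions.
CREATE TABLE fact_sales (
    sale_id     INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    product_id  INTEGER REFERENCES dim_product(product_id),
    quantity    INTEGER,
    amount      REAL
);

INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO dim_product  VALUES (10, 'Widget'), (20, 'Gadget');
INSERT INTO fact_sales VALUES
    (100, 1, 10, 2, 20.0),
    (101, 1, 20, 1, 15.0),
    (102, 2, 10, 5, 50.0);
""")

# A typical star join query: one fact table joined to one dimension.
rows = conn.execute("""
    SELECT c.customer_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_customer AS c ON c.customer_id = f.customer_id
    GROUP BY c.customer_name
    ORDER BY c.customer_name
""").fetchall()
```

The query touches a single large table and one small lookup table, which is the access pattern the star join is designed around.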
By building star joins, the designer has created a structure for efficient access, large volumes
of data, and natural end user viewing. But there is a problem with star joins. In order to know
how to create the star join, the designer must make assumptions about the usage of the data.
Stated differently, without knowing the predominant pattern of access and usage of the data,
you cannot create a star join. At the heart of the design of any star join is the implicit
understanding of how the data in the star join is to be used.
There is a second problem with star join structures: online update plays havoc
with the underlying data management required to make the star join complete. In a DSS
world where there is no update, this is not a problem. But in an ODS world where online
update is a normal event, the inability of the star join to gracefully handle updates presents a
special challenge.
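One reason updates are awkward for prejoined data can be shown with a small sqlite3 sketch. The tables and the 1,000-row volume are hypothetical; the sketch assumes that a descriptive attribute (a customer name) has been copied into every fact row at load time, and compares the number of rows touched by a single change in each design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized design: the customer name is stored exactly once.
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);

-- Prejoined design: the customer name is repeated on every fact row.
CREATE TABLE fact_sales_prejoined (
    sale_id       INTEGER PRIMARY KEY,
    customer_id   INTEGER,
    customer_name TEXT,
    amount        REAL
);

INSERT INTO customer VALUES (1, 'Acme');
""")
conn.executemany(
    "INSERT INTO fact_sales_prejoined VALUES (?, 1, 'Acme', ?)",
    [(sale_id, 10.0) for sale_id in range(1000)],
)

# The same business change, applied to each design.
normalized_rows = conn.execute(
    "UPDATE customer SET customer_name = 'Acme Corp' WHERE customer_id = 1"
).rowcount
prejoined_rows = conn.execute(
    "UPDATE fact_sales_prejoined SET customer_name = 'Acme Corp' "
    "WHERE customer_id = 1"
).rowcount
```

In this sketch the normalized update touches one row while the prejoined update touches every fact row for that customer, which is costly when updates arrive online.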
NORMALIZED vs. STAR JOIN STRUCTURE:
NORMALIZED STRUCTURE             STAR JOIN STRUCTURE
Inefficient to access            Efficient to access
Holds modest amounts of data     Holds large amounts of data
Applicable to a wide audience    Applicable to a restricted audience
Handles updates                  Does not handle updates
The best approach for designing an ODS is to consider a dual design approach.
ODS Underlying Technology:
The technology underlying the ODS is perhaps the most complex technology there is. The ODS
technology gets to be especially complex when the volume of data found in the ODS or the
number of users of the ODS grows large.
As long as the volume of data for the ODS is small and as long as there are not too many
users of the ODS, then standard, general purpose technology can accommodate the ODS. But
in the face of growth, the technology underlying the ODS must be chosen and managed very
carefully.
The ODS is characterized by operating under a very mixed workload. On the one hand, the
ODS must support online high performance transaction processing. At the same time the ODS
must support integration and DSS processing. Operating systems and DBMS cannot be made
optimal for both environments. It is inevitable that compromise of design and system
configuration be made.
One way the ODS server supports the very different environments is to break apart day and
night time processing. The daytime becomes the time for OLTP processing. The nighttime
becomes the time for DSS processing.
But the hardware and software do not change as day turns to night and vice versa. The
hardware and the software supporting the ODS need to look like one kind of processor at one
time of the day and another kind of processor the other part of the day. In order to
accomplish this, several considerations must be made:

The DBMS and the operating system must be versatile,

System configuration parameters must be flexible and able to accommodate many
different patterns of processing,

The applications running in the ODS must recognize the boundaries of operation within
which they must live,

The systems programmers that support the ODS must have a clear picture of how the
environment operates, and so forth.
The hardware and software platform for the ODS environment then must support a very mixed
workload.
When there is not too much volume of data and processing, an SMP processor may well suit
the needs of the ODS. But as the workload grows, the volume of data grows, and the number
of users grows, the balance is tipped toward an MPP architecture. With an MPP architecture,
the volume of data that can be accommodated is limited only by economics, not by technology
itself.
WORKLOAD:
The daily workload of the ODS has two distinct periods. During the daytime, from 8:00 am to
6:00 pm (or thereabouts), the ODS workload looks like a classical OLTP environment. During
this time span the queries that are submitted are very short, each looking for a
finite amount of data. There are typically many queries that are submitted during this time. In
addition, the pattern in which these queries are submitted is very predictable.
Usually users that have a clerical orientation submit the queries. After 6:00 pm the queries
change. The ODS goes into its nighttime processing mode. After 6:00 pm and going to the
following morning until 7:00 am (or thereabouts), the ODS server turns into a batch
processor. This is the period of time when long sequential jobs are run. The queries and other
jobs that are submitted run for long periods of time. Many records of data are accessed by a
given job. However, relatively few queries are submitted, certainly far fewer than the number
of daytime OLTP queries. In addition, during the night time hours, utilities, loads, indexing,
and other activity essential to the ODS environment are done.
By drastically altering the nature of transactions and processing between daytime and
nighttime hours, the ODS is able to accommodate two very different workloads.
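The day and night split can be sketched as a simple mode switch. The function names below are hypothetical. The 8:00 am and 6:00 pm boundaries come from the text; treating every hour outside that window as batch time, including the gap between 7:00 am and 8:00 am, is an assumption made here for simplicity.

```python
from datetime import time

# Daytime OLTP window taken from the text (approximate in practice).
OLTP_START = time(8, 0)
OLTP_END = time(18, 0)

def processing_mode(now: time) -> str:
    """Return 'OLTP' during the daytime window and 'BATCH' otherwise."""
    return "OLTP" if OLTP_START <= now < OLTP_END else "BATCH"

def may_run_now(job_kind: str, now: time) -> bool:
    """Assumed policy: long sequential jobs wait for the batch window,
    while short queries are always admitted."""
    if job_kind == "long_sequential":
        return processing_mode(now) == "BATCH"
    return True

morning = processing_mode(time(9, 30))
evening = processing_mode(time(18, 0))
load_during_day = may_run_now("long_sequential", time(14, 0))
load_at_night = may_run_now("long_sequential", time(2, 0))
```

A real scheduler would also switch system configuration parameters between the two modes; the sketch only shows the admission decision.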
ODS REFRESHMENT:
The ODS is refreshed from the integration and transformation layer. In some cases the same
programs that do the enterprise data warehouse refreshment will be used. In other cases
separate programs will be written for the refreshment of data into the ODS. There is one very
important difference between enterprise data warehouse refreshment and ODS refreshment
and that difference is that refreshment in the ODS environment is very sensitive to time while
refreshment in the enterprise data warehouse environment is not.
For example, there may be a lag of a day or even a week from the execution of an operational
transaction until that transaction is refreshed in the enterprise data warehouse. But
transactions are refreshed much more quickly into the ODS. Depending on the class of ODS,
refreshment may be done in a matter of milliseconds. A class I ODS may be refreshed very,
very quickly from the operational environment.
One of the consequences of rapid refreshment into the ODS is that the faster the refreshment,
the less opportunity there is for integration. For example, with a class I ODS it is very difficult
to do anything other than copy the transaction into the ODS. With sub-second refreshment,
there simply is very little opportunity for any serious integration.
But with a class II, III, or IV ODS there is ample opportunity to perform heavy integration on
the transaction data as it passes from the application environment to the ODS.
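The trade-off between refreshment speed and integration can be sketched as follows. All names here (StoreAndForwardFeed, immediate_feed, the gender code table) are hypothetical and not part of the original text. The sketch assumes that a near-immediate class I feed copies each transaction as-is, while a store-and-forward feed buffers transactions and applies an integration step, here the reconciliation of inconsistent source encodings, before loading.

```python
from collections import deque

# Hypothetical encodings used by different source systems.
GENDER_CODES = {"m": "M", "male": "M", "1": "M",
                "f": "F", "female": "F", "0": "F"}

def integrate(txn):
    """Map source-specific gender encodings onto one common form."""
    out = dict(txn)
    out["gender"] = GENDER_CODES.get(str(txn["gender"]).lower(), "U")
    return out

def immediate_feed(txn, ods):
    """Class I style: copy the transaction into the ODS unchanged."""
    ods[txn["id"]] = dict(txn)

class StoreAndForwardFeed:
    """Class II / III style: buffer transactions, integrate, then load."""

    def __init__(self):
        self._buffer = deque()

    def capture(self, txn):
        self._buffer.append(txn)

    def forward(self, ods):
        """Integrate and load everything buffered so far; return the count."""
        count = 0
        while self._buffer:
            txn = integrate(self._buffer.popleft())
            ods[txn["id"]] = txn
            count += 1
        return count

fast_ods, slow_ods = {}, {}
immediate_feed({"id": 1, "gender": "male"}, fast_ods)

feed = StoreAndForwardFeed()
feed.capture({"id": 1, "gender": "male"})
feed.capture({"id": 2, "gender": "0"})
loaded = feed.forward(slow_ods)
```

The immediate feed leaves the source encoding untouched, whereas the buffered feed has time to reconcile it before the data reaches the ODS.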
Another important implication of the different classes of ODS is that the faster the
refreshment, the more expensive the technology and the design. In other words, a class I
ODS is expensive, a class II ODS is less expensive, and a class III ODS is relatively cheap.