Shiva.inbox@gmail.com

Operational Data Store Design Consideration:

The operational data store (ODS) is an architectural construct that has elements of both operational and DSS processing. The ODS is able to provide operational transaction response times of two to three seconds and allows updates to be made directly. In that sense it is very operational. But an ODS is also integrated and can be used for DSS processing. In that sense it is decision support.

The ODS is architecturally similar to the data warehouse in that both are integrated and subject oriented. But the two are very dissimilar in other respects:
- The data warehouse is not updated by transactions, while the ODS certainly can be.
- The data warehouse contains summarized data, while the ODS does not.
- The data warehouse contains a lengthy historical perspective of data, while the ODS does not.
- The data warehouse is used for strategic decisions, while the ODS is used for tactical decisions.
- The data warehouse is used at the managerial level, while the ODS is used at the clerical level.

The ODS sits between the legacy systems environment and the data warehouse. As data passes out of the legacy environment and into the ODS, it is integrated, just as it is when data passes into the data warehouse. Then, as the data ages, it passes out of the ODS and into the data warehouse environment.

The ODS is defined to be a structure that is:
- Integrated
- Subject oriented
- Volatile, where update can be done
- Current valued, containing data that is a day or perhaps a month old
- Detailed, containing detailed data only

The ODS is like a data warehouse in that it is integrated and subject oriented. But a data warehouse is very different from an ODS when it comes to update. An ODS contains data that can be altered, whereas a data warehouse contains data that exists in snapshots and cannot be altered.
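The update difference described above can be illustrated with a minimal sketch. It is not taken from any particular product: the class names OperationalDataStore and DataWarehouse, the key "account-1", and the balance values are all hypothetical, chosen only to show in-place update next to append-only snapshots.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class OperationalDataStore:
    """Hypothetical ODS: volatile and current valued, one value per key."""
    current: dict = field(default_factory=dict)

    def apply_update(self, key, value):
        # An update overwrites the previous value; no history is kept.
        self.current[key] = value


@dataclass
class DataWarehouse:
    """Hypothetical warehouse: dated snapshots that are never altered."""
    snapshots: list = field(default_factory=list)

    def load_snapshot(self, key, value, as_of):
        # Each load appends a new dated snapshot; earlier ones stay as they were.
        self.snapshots.append((key, value, as_of))


ods = OperationalDataStore()
dwh = DataWarehouse()

# The same two balance changes reach both stores (illustrative values).
for as_of, balance in [(date(2024, 1, 1), 100), (date(2024, 1, 2), 250)]:
    ods.apply_update("account-1", balance)
    dwh.load_snapshot("account-1", balance, as_of)

print(ods.current)         # only the latest value survives in the ODS
print(len(dwh.snapshots))  # both snapshots remain in the warehouse
```

After both changes the ODS holds only the current balance, while the warehouse holds one snapshot per load.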
The second major difference between an ODS and a data warehouse is that the ODS contains very current data while the data warehouse contains a robust amount of history. The ODS typically contains daily, weekly, or even monthly data. The data warehouse contains historical data that may be five or even ten years old.

Another important difference between the data warehouse and the ODS is in the summary data found in each environment. Summary data can be created in the ODS, but once created it is not usually stored. The summary data created in the ODS can be called "dynamic summary data". Dynamic summary data is summary data whose accuracy depends on the moment of calculation.

DWH Design Operational Data Store

The summary data found in the data warehouse is quite different from the dynamic summary data found in the ODS. It can be called "static summary data". Static summary data is data whose accuracy does not depend on the moment of calculation.

The ODS is a particularly difficult structure to build because its design must satisfy both operational and DSS requirements at the same time. Because the ODS must serve more than one master at a time, its design is necessarily a compromise. In many regards the ODS is a staging area for data as it passes into the DSS environment.

TRANSACTIONS TO ODS:

The operational data store (ODS) is designed to provide integrated, collective information for the operational environment. The ODS allows updates to occur and contains only current information. Architecturally, the ODS is in many ways the operational counterpart of the data warehouse.

There are four classes of ODS, based upon the speed with which operational transactions are entered into the ODS after being transacted in the operational transaction processing environment.
- A class I ODS is one in which transaction changes are entered into the ODS almost immediately upon being transacted in the transaction processing environment.
- A class II ODS is one in which transactions are stored and forwarded into the ODS after having been executed in the transaction environment.
- A class III ODS is the same as a class II ODS insofar as the mechanics of storing and forwarding the transaction are concerned.
- A class IV ODS is one where data is loaded from the data warehouse into the ODS. In this case much analytical work has occurred in the data warehouse environment, and the results of the analysis are fed back to the ODS. Once fed back to the ODS, the results are available with an OLTP type of response time.

An ODS provides a foundation for collective, up-to-the-second views of the enterprise. At the same time, the ODS supports decision support (DSS) processing. Because of the many roles that an ODS fulfills, it is necessarily a complex structure. Its underlying technology is complex. Its design is complex. Monitoring and maintaining the ODS is complex. The ODS typically takes a long time to design and implement, and it requires changing or replacing old legacy systems that are unintegrated.

The classical design of the structures found in the DSS environment begins with a data model, which reflects the informational needs of the corporation. Normalized tables are generated from the data model. These tables constitute what can be described as a logical design. The many normalized tables are then combined into a form of physical design that can be described as a lightly normalized design. In a lightly normalized design, tables are combined on the basis of containing common keys and general common usage. The design technique described here, creating normalized or lightly normalized structures based on a data model, fits many instances of DSS design.
But there is a problem with this approach. The design technique of light normalization yields marginal results when the following issues are considered:
- Performance, where many tables must be joined
- Performance, where there are many occurrences of data that will populate the design
- Simplicity, where users find it unnatural to join many tables together, each time they do a transaction, in order to represent data in a form they can comprehend

An alternate design approach is to take into consideration the volume and usage of the data. When the volume and usage of the data are factored into the design, a mutant form of normalization is achieved. The light normalization turns into heavy denormalization, and a structure known as the "star join" is created.

There are two essential parts to a star join: fact tables and dimension tables. The fact table is the structure that holds the majority of the occurrences of the data. Fact tables typically combine data and cross-reference keys from a variety of other tables. The other type of table that participates in a star join is the dimension table. Dimension tables contain data that is not terribly voluminous. Dimension tables are related to fact tables by means of a foreign key relationship.

Fact tables are efficient to access because data has been prejoined into the table at the moment of loading. The end user is able to access fact tables efficiently because they are extremely streamlined in their design. In addition, the fact table is familiar to the end user, because it matches the day-to-day structuring of data that the end user is used to seeing. By building star joins, the designer has created a structure for efficient access, large volumes of data, and natural end user viewing.

But there is a problem with star joins. In order to know how to create the star join, the designer must make assumptions about the usage of the data.
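The two parts of a star join described above can be sketched with an in-memory SQLite database. This is only an illustration under stated assumptions: the table names (fact_sales, dim_customer, dim_product), the columns, and the sample rows are hypothetical and do not come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Dimension tables: small, descriptive. Fact table: voluminous, holding
# measures plus foreign keys that point at the dimensions.
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE fact_sales (
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    product_id  INTEGER REFERENCES dim_product(product_id),
    sale_date   TEXT,
    quantity    INTEGER,
    amount      REAL
);
""")

conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(10, "Widget"), (20, "Gadget")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)", [
    (1, 10, "2024-01-01", 3, 30.0),
    (1, 20, "2024-01-01", 1, 25.0),
    (2, 10, "2024-01-02", 2, 20.0),
])

# A typical query: one join from the fact table out to a single dimension.
totals = conn.execute("""
    SELECT c.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_customer AS c ON c.customer_id = f.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

print(totals)
```

Note that choosing customer and product as the dimensions already encodes a guess about how the data will be queried, which is exactly the kind of usage assumption the designer has to make.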
Stated differently, without knowing the predominant pattern of access and usage of the data, you cannot create a star join. At the heart of the design of any star join is the implicit understanding of how the data in the star join is to be used.

There is a second problem with star join structures: online update plays havoc with the underlying data management required to make the star join complete. In a DSS world where there is no update, this is not a problem. But in an ODS world where online update is a normal event, the inability of the star join to handle updates gracefully presents a special challenge.

NORMALIZED vs. STAR JOIN STRUCTURE:

NORMALIZED STRUCTURE             STAR JOIN STRUCTURE
Inefficient to access            Efficient to access
Holds modest amounts of data     Holds large amounts of data
Applicable to a wide audience    Applicable to a restricted audience
Handles updates                  Does not handle updates

The best approach to designing an ODS is to consider a dual design approach.

ODS Underlying Technology:

The technology underlying the ODS is perhaps the most complex technology there is. It becomes especially complex when the volume of data found in the ODS or the number of users of the ODS grows large. As long as the volume of data is small and there are not too many users, standard, general purpose technology can accommodate the ODS. But in the face of growth, the technology underlying the ODS must be chosen and managed very carefully.

The ODS is characterized by operating under a very mixed workload. On the one hand, the ODS must support online, high performance transaction processing. At the same time, the ODS must support integration and DSS processing. Operating systems and DBMSs cannot be made optimal for both environments. It is inevitable that compromises of design and system configuration be made.
One way the ODS server supports these very different environments is to break apart daytime and nighttime processing. The daytime becomes the time for OLTP processing. The nighttime becomes the time for DSS processing. But the hardware and software do not change as day turns to night and vice versa. The hardware and the software supporting the ODS need to look like one kind of processor at one time of the day and another kind of processor at another. In order to accomplish this, several considerations must be made:
- The DBMS and the operating system must be versatile.
- System configuration parameters must be flexible and able to accommodate many different patterns of processing.
- The applications running in the ODS must recognize the boundaries of operation within which they must live.
- The systems programmers that support the ODS must have a clear picture of how the environment operates, and so forth.

The hardware and software platform for the ODS environment must therefore support a very mixed workload. When there is not too much volume of data and processing, an SMP processor may well suit the needs of the ODS. But as the workload, the volume of data, and the number of users grow, the balance is tipped toward an MPP architecture. With an MPP architecture, the volume of data that can be accommodated is limited only by economics, not by the technology itself.

WORKLOAD:

The daily workload of the ODS has two distinct periods. During the daytime, from 8:00 am to 6:00 pm (or thereabouts), the ODS workload looks like a classical OLTP environment. During this time span the queries that are submitted are very short, each looking for a limited amount of data. There are typically many queries submitted during this time. In addition, the pattern in which these queries are submitted is very predictable. The queries are usually submitted by users with a clerical orientation.
After 6:00 pm the queries change. The ODS goes into its nighttime processing mode. From 6:00 pm until 7:00 am the following morning (or thereabouts), the ODS server turns into a batch processor. This is the period when long sequential jobs are run. The queries and other jobs that are submitted run for long periods of time, and a given job accesses many records of data. However, relatively few queries are submitted, certainly far fewer than the number of daytime OLTP queries. In addition, utilities, loads, indexing, and other activities essential to the ODS environment are run during the nighttime hours. By drastically altering the nature of transactions and processing between daytime and nighttime hours, the ODS is able to accommodate a very mixed workload.

ODS REFRESHMENT:

The ODS is refreshed from the integration and transformation layer. In some cases the same programs that do the enterprise data warehouse refreshment will be used. In other cases separate programs will be written for the refreshment of data into the ODS.

There is one very important difference between enterprise data warehouse refreshment and ODS refreshment: refreshment in the ODS environment is very sensitive to time, while refreshment in the enterprise data warehouse environment is not. For example, there may be a lag of a day or even a week from the execution of an operational transaction until that transaction is refreshed in the enterprise data warehouse. But transactions are refreshed much more quickly into the ODS. Depending on the class of ODS, refreshment may be done in a matter of milliseconds. A class I ODS may be refreshed very quickly from the operational environment.

One of the consequences of rapid refreshment into the ODS is that the faster the refreshment, the less opportunity there is for integration. For example, with a class I ODS it is very difficult to do anything other than copy the transaction into the ODS.
With subsecond refreshment, there simply is very little opportunity for any serious integration. But with a class II, III, or IV ODS there is ample opportunity to perform heavy integration on the transaction data as it passes from the application environment to the ODS.

Another important implication of the different classes of ODS is that the faster the refreshment, the more expensive the technology and the design. In other words, a class I ODS is expensive, a class II ODS is less expensive, and a class III ODS is relatively cheap.
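The trade-off between refreshment speed and integration can be sketched as follows. This is a simplified illustration, not a description of any real refresh mechanism: the classes ClassOneOds and StoreAndForwardOds, the integrate function, and the transaction fields "cust" and "amt" are all hypothetical.

```python
from collections import deque


def integrate(txn):
    """Hypothetical integration step: normalise the key and the amount."""
    return {
        "customer_id": txn["cust"].strip().upper(),
        "amount": round(float(txn["amt"]), 2),
    }


class ClassOneOds:
    """Near-immediate refresh: the transaction is simply copied as-is."""

    def __init__(self):
        self.rows = []

    def on_transaction(self, txn):
        self.rows.append(dict(txn))  # no time for integration


class StoreAndForwardOds:
    """Class II/III style: transactions are queued, then integrated later."""

    def __init__(self):
        self.pending = deque()
        self.rows = []

    def on_transaction(self, txn):
        self.pending.append(txn)  # store now

    def forward(self):
        # Forward later, with time available to integrate each transaction.
        while self.pending:
            self.rows.append(integrate(self.pending.popleft()))


txn = {"cust": " acme ", "amt": "10.50"}

fast = ClassOneOds()
fast.on_transaction(txn)

slow = StoreAndForwardOds()
slow.on_transaction(txn)
rows_before_forward = list(slow.rows)  # nothing visible in the ODS yet
slow.forward()

print(fast.rows)
print(slow.rows)
```

The class I sketch makes the raw transaction visible at once, while the store-and-forward sketch shows cleaned data only after the forwarding step has run.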