Course: M0584 - Data Warehouse
Year: Sep - 2009
Granularity in the Data Warehouse
Session 05

• The single most important design issue facing the data warehouse developer is determining the *granularity*, that is, the level of detail or summarization of the units of data in the warehouse. The more detail there is, the lower the level of granularity; the less detail there is, the higher the level of granularity.
• The primary issue of granularity is that of *getting it at the right level*.
Bina Nusantara University

Raw Estimates
• The starting point for determining the appropriate level of granularity is a raw estimate of the number of rows of data, and of the DASD (direct access storage device) space, that will be in the data warehouse.
• The raw estimate of the number of rows of data that will reside in the data warehouse tells the architect a great deal about how carefully the granularity must be designed and whether some of the data will have to go to overflow storage.
• See Figure 4.1 on page 148 for details.
• Estimating rows/space for the warehouse environment:
• 1. For each known table:
   – How big is a row (in bytes)?
      • Biggest estimate
      • Smallest estimate
   – For the 1-year horizon:
      • What is the maximum number of rows possible?
      • What is the minimum number of rows possible?
   – For the 5-year horizon:
      • What is the maximum number of rows possible?
      • What is the minimum number of rows possible?
   – For each key of the table:
      • What is the size of the key (in bytes)?
   – Total maximum 1-year space = biggest row x 1-year max rows
   – Total minimum 1-year space = smallest row x 1-year min rows plus index space
• 2. Repeat step 1 for all known tables.

Input to the Planning Process
• The estimate of rows and DASD then serves as input to the planning process, as shown in Figure 4.2.
• See details on page 150.
• Also look up these terms:
   – Data in overflow
   – Overflow storage

What the Levels of Granularity will be
• Once the simple analysis is done, the next step is to determine the level of granularity for data residing on disk storage. This step requires common sense and a certain amount of intuition.
• Creating a disk-based data warehouse at a very low level of granularity (that is, holding very fine detail) doesn't make sense, because too many resources are required to process the data.
• On the other hand, creating a disk-based data warehouse with a level of granularity that is too high means that much of the analysis must be done against data that resides in overflow storage, because the detail that analysis needs is no longer on disk.
• See Figure 4.6.

Some Feedback Loop Techniques
• Build the first parts of the data warehouse in very small, very fast steps, and listen carefully to the end users' comments at the end of each step of development. Be prepared to make adjustments quickly.
• If available, use prototyping, and let the feedback loop work from observations gleaned from the prototype.
• Look at how other people have built their levels of granularity and learn from their experience.
• Go through the feedback process with an experienced user who is aware of the process occurring. Under no circumstances should you keep your users in the dark about the dynamics of the feedback loop.
• Look at whatever the organization has now that appears to be working, and use those functional requirements as a guideline.
• Run joint application development (JAD) sessions and simulate the output in order to achieve the desired feedback.

Levels of Granularity - samples
• See Figure 4.7 for the banking environment.
• See Figure 4.9 for the manufacturing environment.
• See Figure 4.11 for the insurance environment.

Summary
• Choosing the proper levels of granularity for the architected environment is vital to success.
• The process of granularity design begins with a raw estimate of how large the warehouse will be on the one-year and the five-year horizon.
• There is an important feedback loop for the data warehouse environment. Upon building the data warehouse's first iteration, the data architect listens very carefully to the feedback from the end user. Adjustments are made based on the user's input.
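The raw-estimate procedure from Figure 4.1 (row size times row count, per table, then summed over all known tables) is plain arithmetic. It can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the book's method. The names `TableEstimate` and `warehouse_estimate` and the example tables are hypothetical. Index space is approximated as the total key size per row times the number of rows, with no index overhead. Index space is also added to both the maximum and the minimum totals, although the slide mentions it explicitly only on the minimum line. Both the 1-year and the 5-year horizons are computed.

```python
from dataclasses import dataclass, field


@dataclass
class TableEstimate:
    """Hypothetical per-table inputs mirroring the questions in Figure 4.1."""
    name: str
    biggest_row_bytes: int     # biggest estimate of the row size
    smallest_row_bytes: int    # smallest estimate of the row size
    rows_1yr: tuple[int, int]  # (maximum, minimum) rows on the 1-year horizon
    rows_5yr: tuple[int, int]  # (maximum, minimum) rows on the 5-year horizon
    key_bytes: list[int] = field(default_factory=list)  # size of each key

    def rows(self, horizon_years: int) -> tuple[int, int]:
        """Return (maximum, minimum) rows for the 1- or 5-year horizon."""
        if horizon_years == 1:
            return self.rows_1yr
        if horizon_years == 5:
            return self.rows_5yr
        raise ValueError("only the 1-year and 5-year horizons are estimated")

    def space(self, horizon_years: int) -> tuple[int, int]:
        """Return (maximum, minimum) bytes, including approximate index space."""
        max_rows, min_rows = self.rows(horizon_years)
        # Assumption: index space = total key size per row x number of rows.
        index_per_row = sum(self.key_bytes)
        maximum = (self.biggest_row_bytes + index_per_row) * max_rows
        minimum = (self.smallest_row_bytes + index_per_row) * min_rows
        return maximum, minimum


def warehouse_estimate(tables: list[TableEstimate], horizon_years: int) -> dict[str, int]:
    """Step 2: repeat the per-table estimate for every known table and sum it."""
    totals = {"max_rows": 0, "min_rows": 0, "max_bytes": 0, "min_bytes": 0}
    for table in tables:
        max_rows, min_rows = table.rows(horizon_years)
        max_bytes, min_bytes = table.space(horizon_years)
        totals["max_rows"] += max_rows
        totals["min_rows"] += min_rows
        totals["max_bytes"] += max_bytes
        totals["min_bytes"] += min_bytes
    return totals


if __name__ == "__main__":
    # Hypothetical tables; the numbers are illustrative, not from the book.
    tables = [
        TableEstimate("account_activity", 200, 100,
                      rows_1yr=(50_000_000, 20_000_000),
                      rows_5yr=(250_000_000, 100_000_000),
                      key_bytes=[10, 8]),
        TableEstimate("customer", 500, 300,
                      rows_1yr=(1_000_000, 800_000),
                      rows_5yr=(2_000_000, 1_500_000),
                      key_bytes=[10]),
    ]
    for horizon in (1, 5):
        print(f"{horizon}-year horizon:", warehouse_estimate(tables, horizon))
```

The sketch keeps the maximum and minimum figures separate rather than averaging them. The range is what feeds the planning process in Figure 4.2.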
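The trade-off on the "What the Levels of Granularity will be" slides can be made concrete by rolling detailed rows up to a higher level of granularity. On one side, very detailed data on disk costs too many resources. On the other, summarized data pushes detailed analysis into overflow storage. The sketch below is hypothetical: the `Transaction` and `DailySummary` records and the `summarize_daily` function are invented for illustration and are not taken from Figures 4.6 to 4.11. It shows that the summary needs fewer rows. It also shows that the summary can no longer answer questions about individual events.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Transaction:
    """Hypothetical detailed record (low granularity): one row per event."""
    account: str
    timestamp: datetime
    amount: float


@dataclass
class DailySummary:
    """Hypothetical summary record (higher granularity): one row per account per day."""
    account: str
    day: date
    count: int = 0
    total: float = 0.0


def summarize_daily(transactions: list[Transaction]) -> list[DailySummary]:
    """Roll detailed transactions up to one summary row per (account, day)."""
    summaries: dict[tuple[str, date], DailySummary] = {}
    for tx in transactions:
        key = (tx.account, tx.timestamp.date())
        if key not in summaries:
            summaries[key] = DailySummary(account=tx.account, day=key[1])
        summary = summaries[key]
        summary.count += 1
        summary.total += tx.amount
    # The summary keeps counts and totals but drops each event's time and amount,
    # so questions about individual events still need the detailed rows.
    return list(summaries.values())


if __name__ == "__main__":
    detail = [
        Transaction("A-1", datetime(2009, 9, 1, 9, 15), 100.0),
        Transaction("A-1", datetime(2009, 9, 1, 14, 40), -40.0),
        Transaction("A-1", datetime(2009, 9, 2, 10, 5), 25.0),
        Transaction("B-7", datetime(2009, 9, 1, 11, 30), 500.0),
    ]
    summary = summarize_daily(detail)
    print(f"{len(detail)} detailed rows -> {len(summary)} summary rows")
    for row in summary:
        print(row)
```

With four detailed rows the saving is trivial. The point is that the summary's row count is bounded by the number of distinct account-day pairs rather than by the number of events.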