Course: M0584 - Data Warehouse
Year: Sep - 2009
Granularity in the Data Warehouse
Session 05
Granularity in the Data Warehouse
• The single most important design issue facing the data
warehouse developer is determining the
GRANULARITY.
• The primary issue of granularity is *getting it at
the right level*.
Bina Nusantara University
Raw Estimates
• The starting point for determining the appropriate level
of granularity is to do a raw estimate of the number of
rows of data and the DASD (direct access storage
device) that will be in the data warehouse.
• The raw estimate of the number of rows of data that will
reside in the data warehouse tells the architect a great
deal.
• See Figure 4.1 at page 148 for details
Raw Estimates
• Estimating rows/space for the warehouse environment
• 1. For each known table:
– How big is a row (in bytes)?
• Biggest estimate
• Smallest estimate
– For the 1-year horizon
• What is the maximum number of rows possible?
• What is the minimum number of rows possible?
– For the 5-year horizon
• What is the maximum number of rows possible?
• What is the minimum number of rows possible?
Raw Estimates (cont’d)
– For each key of the table
• What is the size of the key (in bytes)?
– Total maximum 1-year space = biggest row x 1-year max rows
– Total minimum 1-year space = smallest row x 1-year min rows
plus index space
• 2. Repeat (1) for all known tables
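The estimating steps above can be sketched in a few lines of Python. This is a minimal illustration, not the method from the referenced figure: the table names and all numbers are hypothetical, and index space is approximated as key size times number of rows, which is an assumption made here for simplicity.

```python
from dataclasses import dataclass


@dataclass
class TableEstimate:
    """Raw sizing inputs for one known table (all values are hypothetical)."""
    name: str
    min_row_bytes: int   # smallest estimate of row size
    max_row_bytes: int   # biggest estimate of row size
    min_rows_1yr: int
    max_rows_1yr: int
    min_rows_5yr: int
    max_rows_5yr: int
    key_bytes: int       # size of the key, used for the index estimate


def space_range(t: TableEstimate, horizon: str) -> tuple[int, int]:
    """Return (min_bytes, max_bytes) for one table at the '1yr' or '5yr' horizon."""
    if horizon == "1yr":
        min_rows, max_rows = t.min_rows_1yr, t.max_rows_1yr
    elif horizon == "5yr":
        min_rows, max_rows = t.min_rows_5yr, t.max_rows_5yr
    else:
        raise ValueError("horizon must be '1yr' or '5yr'")
    # Assumption: index space is roughly key size x number of rows.
    min_space = t.min_row_bytes * min_rows + t.key_bytes * min_rows
    max_space = t.max_row_bytes * max_rows + t.key_bytes * max_rows
    return min_space, max_space


def warehouse_range(tables: list[TableEstimate], horizon: str) -> tuple[int, int]:
    """Step 2: repeat the per-table estimate for all known tables and sum."""
    ranges = [space_range(t, horizon) for t in tables]
    return sum(lo for lo, _ in ranges), sum(hi for _, hi in ranges)


tables = [
    TableEstimate("customer_activity", 100, 200,
                  1_000_000, 5_000_000, 5_000_000, 30_000_000, 16),
    TableEstimate("account", 50, 80,
                  10_000, 20_000, 40_000, 100_000, 8),
]

for horizon in ("1yr", "5yr"):
    lo, hi = warehouse_range(tables, horizon)
    print(f"{horizon}: between {lo:,} and {hi:,} bytes")
```

The useful output is the spread between the minimum and maximum, not a single number: a wide range at the five-year horizon is itself a signal that the level of granularity deserves careful thought.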
Input to the Planning Process
• The estimate of rows and DASD then serves as input to
the planning process, as shown in Figure 4.2
• See details on page 150
• Find these terms also:
– Data in Overflow?
– Overflow storage
What the Levels of Granularity will be
• Once the simple analysis is done, the next step
is to determine the level of granularity for data
residing on disk storage. This step requires
common sense and a certain amount of intuition.
• Creating a disk-based data warehouse at a
very low level of granularity (very fine detail)
doesn’t make sense because too many resources
are required to process the data.
What the Levels of Granularity will be
• On the other hand, creating a disk-based
data warehouse with a level of granularity
that is too high means that much analysis
must be done against data that resides in
overflow storage.
• See figure 4.6
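To make the trade-off concrete, here is a small Python sketch that rolls hypothetical transaction-level rows up to one row per account per month. The record layout, account IDs, and amounts are invented for illustration; the point is only that raising the level of granularity shrinks the row count while discarding detail that some analyses may still need.

```python
from collections import defaultdict

# Hypothetical detail rows: (account_id, ISO date, amount)
detail = [
    ("A1", "2009-01-03", 50.0),
    ("A1", "2009-01-17", -20.0),
    ("A1", "2009-02-02", 10.0),
    ("A2", "2009-01-09", 100.0),
    ("A2", "2009-01-30", 25.0),
]


def summarize_by_month(rows):
    """Roll transaction-level rows up to one row per (account, month)."""
    totals = defaultdict(lambda: {"count": 0, "total": 0.0})
    for account_id, date, amount in rows:
        month = date[:7]  # "YYYY-MM" prefix of the ISO date
        entry = totals[(account_id, month)]
        entry["count"] += 1
        entry["total"] += amount
    return dict(totals)


summary = summarize_by_month(detail)
print(f"{len(detail)} detail rows -> {len(summary)} summary rows")
for (account_id, month), entry in sorted(summary.items()):
    print(account_id, month, entry["count"], entry["total"])
```

A question such as "what was the total activity for A1 in January?" can be answered from the summary alone. A question such as "on which day did A1 make a withdrawal?" cannot, so it would have to go back to the detailed rows, which in the scenario described above would sit in overflow storage.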
Some Feedback Loop Techniques
• Build the first parts of the data warehouse in very small,
very fast steps, and carefully listen to the end users’
comments at the end of each step of development. Be
prepared to make adjustments quickly.
• If available, use prototyping and allow the feedback loop
to function using observations gleaned from the
prototype
• Look at how other people have built their levels of
granularity and learn from their experience.
Some Feedback Loop Techniques
• Go through the feedback process with an experienced
user who is aware of the process occurring. Under no
circumstances should you keep your users in the dark as
to the dynamics of the feedback loop
• Look at whatever the organization has now that appears
to be working, and use those functional requirements as
a guideline.
• Execute joint application development (JAD) sessions
and simulate the output in order to achieve the desired
feedback
Levels of Granularity - samples
• See figure 4.7 for Banking environment
Levels of Granularity - samples
• See figure 4.9 for Manufacturing environment
Levels of Granularity - samples
• See figure 4.11 for Insurance environment
Summary
• Choosing the proper levels of granularity for the
architected environment is vital to success.
• The process of granularity design begins with a raw
estimate of how large the warehouse will be on the
one-year and five-year horizons.
• There is an important feedback loop for the data
warehouse environment. Upon building the data
warehouse’s first iteration, the data architect listens very
carefully to the feedback from the end user. Adjustments
are made based on the user’s input.