Network database evaluation using analytical modeling*

by TOBY J. TEOREY and LEWIS B. OBERLANDER
The University of Michigan
Ann Arbor, Michigan

INTRODUCTION
The proliferation of computerized databases and the widespread demand for timely access to data have resulted in the need to develop automated techniques for database design. In the past, database design was accomplished manually by data processing personnel through past experience and trial and error. However, the implementation of large integrated databases in organizations made it increasingly difficult for individual users to do the necessary design. A new approach was required to consider the many tradeoffs among user objectives and within the complex logical structures required to integrate the various types of data.

The stages of database design have only recently been well-defined,5 and much of that work is still done manually with some assorted design aids available at some of the stages. One of the more quantifiable stages in the database design process is the implementation of a physical database from a logical database design. Logical database design results in the definition of record types and their relationships under the constraints of a particular data model, but independent of physical structure. The design of the physical database must take into account the variations in access methods, i.e., the access paths and search mechanisms available, as well as physical clustering of records near each other within blocks or in a contiguously defined set of blocks. Considerably more insight is required in physical database design before determining whether or not it can be accomplished independently of logical database design.

Currently there exist some very general physical database designers10,13 which address tradeoffs among major classes of storage structure. On the other hand, several evaluators of physical databases have also been proposed. Some are simulation models and are quite expensive to run.2,7,9 Others are probabilistic and are limited by expected value assumptions.3,11

This paper describes the concepts leading to the development of an operational Database Design Evaluator (DBDE), which accounts for the more significant parameters affecting physical database design performance and yet is computationally fast enough for consideration as the central module of a physical designer. The DBDE is a software package which evaluates Honeywell's Integrated Data Store, or IDS.6 While the model produces "expected value" results, it goes beyond the usual expected value assumptions and considers the effects of the distribution of logical record sizes and placement. The DBDE is directly applicable to CODASYL database systems.
The most easily identifiable forerunner to the DBDE was
the general analytical method of evaluating physical database designs for single record type files proposed by
Yao.13,14 The general method decomposed all candidate organizations and access methods into several basic parameters: the number of index levels, the expected search length
at each index level and the level of usable data, and the
fraction of sequential accesses versus pointer (random) accesses at each level. All applications were assumed to consist of randomly addressed operations such as queries and
updates.
The File Design Analyzer11 extended the basic method to
consider both sequential and batched processing in addition
to random processing, and it extended the concept of the
pointer (random) access parameter to specify the degree of
randomness required by each configuration. An example of
this is the implementation of overflow in an indexed sequential organization. The pointer to an overflow record
could refer to another point on the same track, another track
within the same cylinder, a different cylinder, or even a different device. Each specification would imply a different
expected access time to the overflow area. This extension
enabled the File Design Analyzer to determine a range of
performance between "single access" (i.e., a dedicated device with no seek time, but including an average rotational
delay between consecutive sequential accesses) and "multiaccess" (i.e., a shared device with the worst case conditions
of all consecutive sequential accesses becoming random
accesses in reality). The purpose of this dichotomy was to
place bounds on I/O service time for the extremes of one
user and many users. No attempt was made to determine
the effect on queuing delays with this model or with the
DBDE.
The DBDE does, however, represent significant improvements over the previous models in the following areas:
1. Computation of database overflow probabilities on the basis of database growth and distribution of record size, and determination of the effect of increasing overflow on the I/O service time required to execute various data manipulation operations.
2. The effect of buffering on I/O service time.
3. The modeling of "currency" and dependent sequences of operations performed on network databases.

Other analytical models have been confined to storage structures for flat files or hierarchical systems, but have not considered network databases. The proliferation of CODASYL DBTG or similar implementations for network databases has increased the need for network evaluation models.
* Supported by the Defense Communications Agency, Contract Number DCA 100-75-C-0064, for the WWMCCS ADP Directorate, Reston, Virginia.
INTEGRATED DATA STORE (IDS)
The Integrated Data Store (IDS) is a host language Database Management System which interfaces extensively
with COBOL. It was developed in 1963 by General Electric
and is now being used on the Honeywell 6000 series and
other configurations. It is widely regarded as a forerunner
of the CODASYL DBTG concepts, and is very close to the
specifications of the 1971 DBTG report. 4 It was developed
to organize information so as to minimize redundancy and
duplication of records or data, provide a single database for
many applications, store and retrieve logically related data
in a cohesive manner, and provide an efficient query capability.6
The structure of IDS may best be summarized in terms of record classification (the interrelationships between records and access mechanisms) and chain classification.
Record classification
A record may either be a master record or a detail record.
Masters and details are related to each other by means of
chains. A chain may contain only one master record. However, a record may be the master or detail of more than one
chain, and a record may be a master in one chain and a
detail in another chain simultaneously. All IDS records of
a specific record type are fixed length and fixed format.
Different record types may have different lengths and formats.
Another form of record classification is by the retrieval
access mechanism. Three methods are allowed in IDS:
1. Primary records-retrieved by reference codes which
point directly to the physical page and line number on
which the record resides. The page is the basic unit of
transfer between secondary storage and main storage.
It is a physical block of fixed size within a subfile (subfiles are somewhat analogous to DBTG areas) that may contain many record types simultaneously. PAGE RANGE is a feature of IDS that specifies a set of physically contiguous
blocks (pages) to contain records of a given type or
types.
2. Calculated records-retrieved by calculating a page address (via a hash code) from a particular key value. A sequential search in the page, and possibly an overflow page, yields the desired record.
3. Secondary records-These are records that can only be retrieved by accessing the master records and then traversing the chain to which they belong. Often it is possible that the immediate master cannot be accessed directly. In this event, master records that are on a higher level than that of the required immediate master need to be accessed. The search is then performed from these higher level records down to the immediate master in question and then finally to the desired secondary record. Such higher level masters are known as entry points and are accessed via a primary record operation, a calculated (CALC) record operation, or a RETRIEVE master record operation.

Chain classification

In IDS a record may belong to any number of chains. If retrieval of a record is specified by a particular chain, it is retrieved via that chain. If the chain through which the record is to be retrieved is not specifically stated, the record is retrieved via its prime chain, which is declared a priori (in the RETRIEVAL VIA clause). This manner of retrieval by means of a chain applies only to the secondary records.

Finally, a chain may be implemented as any combination of the options chain PRIOR (backward pointer) and chain MASTER (pointer to the master record). Every chain has a NEXT (forward) pointer.

MODEL PARAMETERS

DBDE inputs

Categories of inputs and outputs are summarized in Figure 1.

Figure 1-The data base design evaluator (DBDE). [Diagram: user commands plus the database description, workload specification, and hardware characteristics enter through the user interface; the IDS (network) storage structure alternatives are evaluated; the outputs are main storage required, secondary storage required, I/O service time, and CPU time.]

a. Database Description

The IDS model requires the definition of master records and detail records of chains, the page size and page range of subfiles, and the record lengths and page ranges of all record types (see Table I for an example).

b. Workload Specification
The model identifies applications in terms of specific IDS
operations called: RETRIEVE, HEAD (i.e., find master
record), STORE, DELETE, MODIFY, and QUERY. An
application can be defined by a sequence of related retrieval
operations, which traverse the network in various ways,
ending with a specified operation on the desired record(s).
It is also possible to specify location mode in terms of
database entry points and type of access method, and the
PLACE NEAR option. PLACE NEAR allows the clustering of detail records near the master records with which they are most frequently processed.
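As a concrete illustration, the sketch below shows one way such a workload specification might be expressed as data. The structure, field names, and op codes are hypothetical, not DBDE input syntax; the operation sequence mirrors the retrieve-and-modify application evaluated later in Table II.

```python
# Hypothetical encoding of a DBDE application: a sequence of related IDS
# retrieval operations ending with the operation on the desired record.
application = [
    {"op": "RETRIEVE", "record_type": 1, "via": "CALC"},            # entry point
    {"op": "RETRIEVE", "record_type": 7, "via": "CHAIN", "chain": 8},
    {"op": "MODIFY",   "record_type": 7},                           # target operation
]
runs_per_time_unit = 5  # frequency with which the application is executed
```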
c. Hardware Characteristics
Hardware specifications are, for the most part, limited to
the timing characteristics and capacity levels of the secondary storage subsystem. It is assumed that only one type of
secondary storage device is used, although the alternative
types of devices could be evaluated by reinitializing these
parameters. In IDS, allocation across several devices is possible.
Other input parameters which fall loosely under the category of hardware are the estimated CPU time required to
initiate I/O starts and completions, the CPU time for moving
data within main storage, and the main storage required for
support software plus user applications.
One parameter that falls into none of the above categories is the request that the effect of overflow on performance be evaluated. The analyst may specify any number of time periods for which overflow performance data is to be computed,
with the database growth rate implicitly specified by the
current database size and the number of record additions
per time period. All computations in the DBDE are based
on a unit time period. The analyst may choose the most
meaningful time unit for his analysis, and merely needs to
be consistent with that time unit when specifying workload
parameters and overflow time periods.
TABLE I-Input data for test network database

HARDWARE SECTION

Secondary storage device              Honeywell 6060/DSS191
Average seek time                     30 millisec
Adjacent cyl. seek time               10 millisec
Avg. rotational latency               8.35 millisec
Full rotation time                    16.7 millisec
Bytes per word                        6
Transfer rate                         1,074,000 bytes/sec
Cylinders per disk pack               404
Tracks per cylinder                   19
Bytes per track                       11,904
Block gap size                        12 bytes
CPU time for I/O start + end          1 millisec
CPU time for core-core move           2.75 microsec/byte

IDS SECTION

IDS DATA DIVISION

Number of buffers (max. is 256)       50
Write verify ON?                      NO
Main storage reqmt. for software      100K bytes
CALC chain retrievals/time unit       0
RETRIEVE EACH operations/time unit    0

SUBFILE DIVISION
                      SUBFILE 1    SUBFILE 2    SUBFILE 3
First page number     1            101          151
Last page number      100          150          200
Page size             320 words    320 words    320 words
Disk number           1            2            3

RECORD DIVISION (max. 100 record types)

Record number             1      2      3      4      5      6      7      8
Record length (words)     10     90     60     35     65     25     40     75
Retrieval via             CALC   CHN.6  CALC   CALC   CHN.2  CHN.4  CHN.8  CHN.10
No. occurrences           50     50     50     50     50     50     50     50
First page number         1      1      101    151    151    1      1      101
Last page number          100    100    150    200    200    100    100    150
Workload per time unit:
  a. Retrieve direct      5      5      5      5      5      5      5      5
  b. Retrieve record      5      5      5      5      5      5      5      5
  c. Retrieve current     5      5      5      5      5      5      5      5
  d. Store                15     15     15     15     15     15     15     15
  e. Delete               10     10     10     10     10     10     10     10
  f. Modify               0      0      0      0      0      0      0      0

CHAIN DIVISION (max. 100 chain types)

Chain number              1      2      3      4      5      6      7      8      9      10
Master record             3      4      1      2      1      1      2      1      -      -
Detail record             4      5      5      6      6      2      7      7      -      8
Chain order               FIRST  FIRST  FIRST  FIRST  FIRST  FIRST  FIRST  FIRST  FIRST  FIRST
Link to next              YES    YES    YES    YES    YES    YES    YES    YES    YES    YES
Link to prior             NO     NO     NO     NO     NO     NO     NO     NO     NO     NO
Link to master            NO     NO     NO     NO     NO     NO     NO     NO     NO     NO
Record selection          CURR.  CURR.  CURR.  CURR.  CURR.  CURR.  UNIQ.  CURR.  CURR.  CURR.
Retrv. next/unit time     0      0      0      8      0      0      0      0      0      0
Retrv. master/unit time   0      0      0      0      0      1      3      8      0      0
Head opns./unit time      0      0      0      0      0      0      0      0      0      0
Place near master         NO     YES    NO     YES    NO     YES    NO     YES    NO     YES

DBDE outputs

The DBDE outputs reflect an attempt to present a fairly complete picture of resource usage and the components of response time. Enough performance data is given so the analyst can estimate the real cost of the applications evaluated.

a. Main Storage Requirement

This is a straightforward computation, adding the user estimate for object program storage space to the space required for buffers for each application.

b. Secondary Storage Requirement

In IDS this is input explicitly by the user (analyst) in terms of subfile specifications for first and last page numbers.

c. I/O Service Time

I/O time implies elapsed I/O service time, which affects either response time or turnaround time. It differs from channel time in that it always includes all the major components of I/O service: seek time, rotational delay, and data transfer. Bounds on I/O time are provided for the extremes of the "single access" and "multi-access" configurations described above.

d. CPU Time

The DBDE estimates the CPU time required for the database functions: overhead to start and complete I/O commands (for physical reads and writes), time for data movement between buffers and user work areas where applicable, database search time, and software write verification. The user CPU time to process logical records is merely echoed from the input data, and is an optional feature.
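A minimal sketch of the two I/O service-time bounds, using the device parameters listed in Table I; the helper names are ours, and a single one-page transfer is assumed. The results line up with the 10.14 and 40.14 millisecond single- and multi-access entries for a one-page retrieval in Table II.

```python
# "Single access" vs. "multi-access" I/O service-time bounds for one page,
# computed from the Table I device parameters (a sketch, not the DBDE's code).
AVG_SEEK_MS = 30.0          # average seek time
AVG_LATENCY_MS = 8.35       # average rotational latency
BYTES_PER_MS = 1074.0       # 1,074,000 bytes/sec transfer rate

def transfer_ms(block_bytes):
    return block_bytes / BYTES_PER_MS

def single_access_ms(block_bytes):
    # dedicated device: no seek, only an average rotational delay plus transfer
    return AVG_LATENCY_MS + transfer_ms(block_bytes)

def multi_access_ms(block_bytes):
    # shared device: every access degenerates to a full random access
    return AVG_SEEK_MS + AVG_LATENCY_MS + transfer_ms(block_bytes)

page_bytes = 320 * 6        # a 320-word page at 6 bytes per word
print(f"{single_access_ms(page_bytes):.2f} ms")   # about 10.14
print(f"{multi_access_ms(page_bytes):.2f} ms")    # about 40.14
```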
MAJOR COMPONENTS
An algorithm for database sizing and overflow computation
Given a database with N records of varying size, an algorithm to determine its size needs to answer the following
questions:
1. How many physical blocks (pages) will be required if
the database is organized sequentially and the ordering
results in random placement of records of a given size?
2. How many record occurrences will typically fit into a
block?
The first question is of interest in modeling the record
distribution or population of sequential and indexed sequential database organizations, which can be computed by the
DBDE. The second question is of interest in modeling the
placement of chains in IDS.
If all records were of the same size, R words, and the
physical block size were B words, then the number of records that could fit into a block would be
[B/R]-

where [expr]- denotes the "floor function" or "greatest integer equal to or smaller than" the actual value of the expression. The resulting database size would be

[N/[B/R]-]+ blocks

where [expr]+ denotes the "ceiling function" or "smallest integer equal to or greater than" the actual value of the expression.
The problem becomes significantly more difficult when
there are a variety of record sizes and not much is known
about their distribution in the database. The database size
is extremely sensitive to record placement when there are
very few record sizes, but a large difference in those sizes.
An example will help illustrate the possible extreme values. Suppose there were 1000 records of length 20 words
and 1000 records of length 90 words and a constant block
size of 100 words. If the ordering specified that all the large
records must come first and all the small records second,
the total size of the database would be:
DATABASE SIZE = [1000/[100/90]-]+ + [1000/[100/20]-]+
              = 1000 + 200
              = 1200 blocks
On the other hand, an ordering which alternated large and
small records would require 2000 blocks. Any other ordering
of records would produce a database size somewhere between these two extremes. For very large databases this
nearly 2-to-1 difference in size will greatly affect the estimated performance for many applications, and consequently
record distribution must be represented accurately in the
model.
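The two extremes can be checked directly from the floor and ceiling formulas above; the following sketch (the function name is ours) reproduces the arithmetic.

```python
from math import ceil, floor

# Blocks needed when all records of one size are stored contiguously.
def blocks_needed(n_records, record_words, block_words):
    return ceil(n_records / floor(block_words / record_words))

# Large records first, then small: 1000 + 200 = 1200 blocks.
print(blocks_needed(1000, 90, 100) + blocks_needed(1000, 20, 100))

# Alternating large and small: 90 + 20 > 100, so no block can hold a pair,
# and each of the 1000 pairs occupies two blocks: 2000 blocks.
print(2 * 1000)
```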
The algorithm assumes that the size of any record in the
database is independent of the size of its neighbors. If this
assumption is not true for sequential or indexed sequential
organizations, then the database size must be computed
manually. The IDS model allows dependent record placement to be specified in terms of page ranges. The algorithm
then enumerates the possible combinations of record sizes that would overflow a physical block and require the populating of the next block, and it obtains a distribution of unused space in each block. The expected used space in a block, called E(used), is easily computed as the difference between block size B and the expected value for unused space. The total database size is therefore computed as

DATABASE SIZE = [ Σ (Record_size_j * Number_of_occurrences_j) / E(used) ]+

where the sum is taken over all record types j, and a record type is defined by its length and number of occurrences.
The algorithm takes into account parameters such as percent fill and page overhead associated with specific storage structures in order to compute realistic values for E(used).12

As an example of the type of analysis required, let the database consist of 1000 records with an expected record size of 200 words and a block size of 320 words. The actual distribution of record size is:

Record Type    Record Size (Words)    Number of Occurrences
A              100                    300
B              200                    400
C              300                    300

For this simple configuration, a random distribution of record types across the database results in the seven possible arrangements of records in a block shown in Figure 2, occurring with probability Pi, i = 1, 2, ..., 7. From the data in Figure 2 we can compute the expected value of used space in each block and then the database size.

Figure 2-Random sequence distribution of record types in blocks. [The seven arrangements, as (probability of occurrence Pi, used space in the block): (.027, 300), (.063, 200), (.09, 100), (.12, 300), (.12, 300), (.28, 200), (.30, 300).]
E[used] = Σ (i = 1 to 7) Pi * USEDi = 247.7 words

Total data volume = Σ (j = 1 to 3) Record_size_j * Number_of_occurrences_j
                  = 100*300 + 200*400 + 300*300
                  = 200,000 words

Expected number of blocks required = [200,000 words / 247.7 words/block]+ = 808 blocks
The same number of fixed size records of 200 words each
would require 1000 blocks. Therefore a significant reduction
is possible in this simple case when the record sizes are
distributed over a larger range.
The above computation becomes extremely difficult as
the number of record types becomes large (i.e., 500 or
more), and even the computer time to exhaustively enumerate all cases is prohibitive. The DBDE uses an algorithm
to compute E(used) that reduces the computational complexity to a feasible range for 50-100 record types, which is
adequate for most IDS databases. The complete derivation
of this algorithm is given in Reference 12.
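For intuition, the exhaustive enumeration described above can be sketched in a few lines. This is the brute-force computation, not the reduced-complexity algorithm of Reference 12, and the function names are ours; run on the three-record-type example, it reproduces E(used) = 247.7 words and 808 blocks.

```python
from math import ceil

RECORDS = {100: 300, 200: 400, 300: 300}   # record size (words) -> occurrences
BLOCK = 320                                # block size in words

def expected_used(block_words, records):
    """E(used) for one block: draw record sizes independently in proportion
    to their populations and append them until the next draw no longer fits;
    that draw closes the block and starts the next one."""
    total = sum(records.values())
    probs = {size: n / total for size, n in records.items()}
    e_used = 0.0
    states = [(0, 1.0)]                    # (space used so far, probability)
    while states:
        used, p = states.pop()
        for size, q in probs.items():
            if used + size <= block_words:
                states.append((used + size, p * q))
            else:
                e_used += p * q * used     # this draw overflows; block closes
    return e_used

e_used = expected_used(BLOCK, RECORDS)                # 247.7 words
volume = sum(s * n for s, n in RECORDS.items())       # 200,000 words
print(f"E(used) = {e_used:.1f} words, blocks = {ceil(volume / e_used)}")  # 808
```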
A natural extension of the database sizing algorithm is to
consider new records being added to the database and the
probability that they will overflow the page where they
would be naturally placed. From this value one can determine the expected number of overflow records generated by
a given number of record additions over a particular time
period. Overflow in IDS significantly increases the expected
retrieval time for records.
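A hedged sketch of that extension: given some per-period overflow probability (the function below is a hypothetical stand-in for the value the sizing model derives), the expected overflow count accumulates with the record additions.

```python
# Expected overflow records after a number of time periods, assuming a fixed
# number of record additions per period.  p_overflow here is hypothetical;
# the DBDE derives it from the sizing algorithm and record-size distribution.
def expected_overflow_records(p_overflow, additions_per_period, periods):
    return sum(additions_per_period * p_overflow(t) for t in range(1, periods + 1))

p = lambda t: min(1.0, 0.02 * t)   # assumed growth of the overflow probability
print(expected_overflow_records(p, additions_per_period=100, periods=10))
```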
Buffering
The original purpose for buffering on second generation
(uniprogramming) computer systems was to provide a mechanism for overlapping CPU and I/O time and improve overall
performance of each application. Buffering for multiprogramming systems still produces an overlap of resources,
but not necessarily for the same application (job). While job
A is executing, job B may be inputting into its buffer while
job C is done with I/O and waiting for the CPU. The existence of a large number of jobs in the system causes other
delays which usually subsume the synchronous CPU-I/O
sequences of a sequential data processing operation. Consequently, the benefits of such buffering received by sequential or random database applications in multiprogramming environments are not easily measurable with
probability models, such as DBDE, but they can be better
approximated by a queuing model for the whole system.
The more directly measurable function of buffering in
multiprogramming systems is that of keeping active data in
main storage in order to reduce physical I/O operations
when data is continually referenced. The DBDE considers
this form of buffering, and it allows the analyst to specify
up to 256 buffers for IDS applications. Naturally, increasing
the number of buffers increases the probability that the next
record you need is already in main storage and consequently
decreases the expected I/O service time. However, increasing the buffer size too much may result in wasted storage,
increased cost for storage used, and main storage allocation
delays that may degrade overall performance. Therefore the
choice of proper buffer size is critical to an effective IDS
application. The DBDE implements the buffer concept by
maintaining buffer currency data based on the most recently
executed IDS operations and the database definition of
chains and record types. It also simulates the basic IDS
buffer replacement algorithm, least-recently-used (LRU).
Thus, for this part of the IDS model, the DBDE uses a
hybrid analytical and deterministic simulation technique.
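A minimal sketch of the deterministic part of that hybrid technique: an LRU buffer pool replayed against a page-reference trace, yielding the hit ratio that offsets physical I/O. The class and the trace are illustrative assumptions, not the DBDE's internals.

```python
from collections import OrderedDict

class LRUBufferPool:
    """Toy LRU page-buffer pool (IDS's replacement policy, per the text)."""
    def __init__(self, n_buffers):
        self.n_buffers = n_buffers
        self.pages = OrderedDict()          # page number -> None, in LRU order
        self.hits = 0
        self.refs = 0

    def reference(self, page):
        self.refs += 1
        if page in self.pages:
            self.hits += 1
            self.pages.move_to_end(page)    # now the most recently used
        else:
            if len(self.pages) >= self.n_buffers:
                self.pages.popitem(last=False)   # evict least recently used
            self.pages[page] = None

    def hit_ratio(self):
        return self.hits / self.refs if self.refs else 0.0

pool = LRUBufferPool(n_buffers=3)
for page in [17, 4, 17, 9, 4, 17, 25, 4]:   # hypothetical reference trace
    pool.reference(page)
print(f"buffer hit ratio = {pool.hit_ratio():.2f}")
```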
Currency in a network database
IDS (as well as DBTG databases) has currency indicators which specify, for example, the most recently accessed record of a particular record type or the current master and detail records of a chain type. The DBDE maintains similar information
to represent the current status of access to the database.
Consequently the next operation is evaluated according to
current position and next position desired. The logical structure diagram of the database is implicitly known from the
form of the input parameters, and the algorithm consults
this information to compute I/O time and CPU overhead to
access the next record.
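The bookkeeping can be pictured as a small table of indicators, updated on every operation; the sketch below is a hypothetical representation, not the DBDE's internal one.

```python
# Currency indicators: the most recently accessed record of each record type
# and the current position on each chain type (illustrative structure).
currency = {"record_type": {}, "chain_type": {}}

def note_access(record_type, page, line, chain_type=None):
    """After an IDS operation, (page, line) of record_type becomes current."""
    currency["record_type"][record_type] = (page, line)
    if chain_type is not None:
        currency["chain_type"][chain_type] = (record_type, page, line)

# The cost of the next operation is then evaluated from the current position
# to the desired position; a reference to an already-current page, for
# instance, requires no physical read.
note_access(record_type=1, page=17, line=3, chain_type=7)
```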
EXAMPLE NETWORK DATABASE EVALUATIONS
The objectives of the test network database evaluations
were to validate the IDS model with live test data from an
existing database, to provide insight regarding the sensitivity
of database performance to the values of storage structure
and user workload parameters, and to compare the performance of alternative database designs.
The conclusions from preliminary validation tests are that
the DBDE is able to describe the major database design
parameters and their relationships accurately, that the input
data can be created easily and quickly, and that many meaningful experiments can be conducted at a single interactive
session with the program. The validation IDS database,
which represented a large equipment inventory system, was
supplied by a client U.S. government agency. The database
consisted of 155,000 records of varying length across three
major record types, and live test data was collected on an
IDS system interfaced with a Honeywell 6060 GCOS operating system for an extensive series of retrieval operations.
The test was conducted during peak hours in a large batch
multiprogramming environment. Estimates for elapsed time
for the live test data (54×10³ seconds) and the DBDE (42.7×10³ seconds) differed by 21 percent. The ability of the
DBDE to estimate total elapsed time was due to the availability of an extension which implements a central server
queuing model of the test system and estimates total elapsed
time based on CPU and I/O requirements derived by the
IDS model for the test database. Although these preliminary
results have been encouraging, we believe that the validation process should be an on-going one as more detailed monitor data is made available. Validation at the individual IDS operation level is currently not possible in the existing environment.
Example 1
The first test database chosen for experimentation was
the support database for the DBDE,1 which provided the opportunity to evaluate our own design for the DBDE before applying the technology to client systems. The test database
schema is illustrated in Figure 3. The database contains eight
record types and ten chain types. The database is quite small
(755 records), but this kept the cost down while allowing a
great deal of analysis of the model. Other input parameters
are summarized in Table I.

Figure 3-DBDE database schema. [Schema diagram: the eight record types connected by the ten chain types of Table I.]
The basic output produced by the program is illustrated
in Table II. Each operation is considered separately and
totals are produced for given frequencies of occurrence of
each operation. Table II illustrates a two-operation application that retrieves a record and modifies it. To accomplish
this task, 2 sub-operations are required for each retrieval
and 5 sub-operations are necessary for the modify operation.
This feature of providing detailed output for each operation
enables the analyst to obtain a better picture of cause and
effect relationships for overall system performance, and thus
make better decisions on parameter values to choose.
TABLE II-IDS retrieval and update application. Sample output

APPLICATION: Retrieve and Modify Record 7

                                      Record   Chain   I/O Service Time (MLSEC)      System CPU Time
ACTION                                Type     Type    Single Access   Multi-Access  Estimate (MLSEC)
Retrieve as CALC record               1        -        10.20           40.40         2.77
Retrieve as chain detail record       7        8          .07             .26          .02
Modify and delink                     7        -        50.69          200.70         8.50
Retrieve master w/o currency table    2        7        10.14           40.14         2.75
Link to chain 7                       7        7        20.28           80.28         2.00
Retrieve record 1 as CALC             1        -        10.20           40.40         2.77
Link to chain 8                       7        8        20.28           80.28         2.00
TOTAL                                                  121.86          482.46        20.81

Example 2

The estimates of database I/O requirements for several configurations are summarized in Figure 4. The purpose of this experiment was to investigate the tradeoff between I/O time and secondary storage space. The test configuration consisted of a single master record type (100 occurrences) and five detail record types (5000 occurrences each), with each detail record type contained in a separate chain type. The secondary storage space for this configuration was:

                  CHAIN NEXT          CHAIN NEXT, PRIOR, MASTER
Master records    100*(144 bytes)     100*(164 bytes)
Detail records    25000*(128 bytes)   25000*(136 bytes)
Total             3214K bytes         3416K bytes

The implementation of pointers for PRIOR and MASTER in each record represents an overall increase of slightly over 6 percent in total storage space. The results illustrated in Figure 4 show that the I/O time for most operations is independent of the chain pointer implementation. However, there is a slight increase in the STORE and a very significant decrease in the HEAD (i.e., retrieve master) operations. The results for "multi-access" were relatively the same as those in Figure 4, but all values were approximately 30 percent larger on a point-by-point basis. The increase for STORE is due to a greater number of pointers to reset, while the decrease for HEAD is due to direct access to the master record with a MASTER pointer. Overall, this experiment shows that the choice of pointer implementation depends on the types of applications to be run. Normally, a 6 percent increase in storage space is not significant; therefore, the increase in storage space appears to be less important than the effect on I/O time, which in turn affects total response time.

Figure 4-I/O service time for IDS operations with two pointer implementations ("single access" with dedicated disk). [Bar comparison, I/O service time in milliseconds (100-800), of the RETRIEVE FIRST, MODIFY FIRST, STORE, and HEAD (RETRIEVE MASTER) operations under CHAIN NEXT versus CHAIN NEXT, PRIOR, and MASTER.]
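The storage figures can be verified directly from the per-record byte counts quoted above; the following one-line check reproduces the totals and the "slightly over 6 percent" figure.

```python
# Example 2 storage arithmetic, straight from the byte counts in the text.
next_only = 100 * 144 + 25000 * 128     # masters + details, NEXT pointer only
full_ptrs = 100 * 164 + 25000 * 136     # NEXT, PRIOR, and MASTER pointers
print(next_only, full_ptrs)             # 3,214,400 vs. 3,416,400 bytes
print(f"increase = {(full_ptrs / next_only - 1) * 100:.1f}%")   # about 6.3%
```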
Example 3
We now return to the data structure for Example 1 to
illustrate the capability of the DBDE to predict degradation
in I/O service time as the probability of overflow increases.
Using the database configuration in Table I, we assume a
database growth rate (number of record occurrences of each
type) of 10 percent per time period for an arbitrary time
period length. The IDS model derives the overflow probability at discrete time period intervals for as many intervals
as the analyst wishes to specify. The assignment of a value
such as "3 days" for a time period has no effect on the
model; real time units for time periods are independent of
the overflow algorithm.
I/O service time to retrieve a record (type 7) is shown to
degrade at a nonlinear rate over 20 time periods in Figure
5. The linear 10 percent database growth rate is included for
comparison. The purpose of conducting this type of experiment is to compare storage structures, not just for the
current configuration, but also for future configurations that
involve more record occurrences and possibly a different
level of workload. Observation of the results in Figure 5
could also help determine when and how often the database
should be reorganized to clear the overflow areas.
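A hedged sketch of the shape of this experiment: a simple retrieval-cost model in which an extra access is paid whenever the target record has overflowed. The per-access constants come from Table II; the overflow-growth curve is a hypothetical stand-in for the probabilities the DBDE derives.

```python
# Expected I/O time to retrieve a record when a fraction p has overflowed
# (illustrative model; "single access" per-access times taken from Table II).
def retrieve_io_ms(p_overflow, page_ms=10.14, overflow_ms=40.14):
    # one access to the natural page, plus a random access on overflow
    return page_ms + p_overflow * overflow_ms

for t in (0, 5, 10, 15, 20):            # future time periods
    p = 1 - 0.96 ** t                   # assumed nonlinear overflow growth
    print(f"period {t:2d}: p = {p:.2f}, I/O = {retrieve_io_ms(p):6.2f} ms")
```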
Figure 5-I/O service time for the retrieve record 7 operation as a function of increasing IDS database size. [Total I/O service time in milliseconds (50-150, "single access") over future time periods (5-20); the linear 10 percent database growth line is shown for comparison.]
FUTURE DIRECTIONS
Experience in the development of the DBDE has led to
the conclusion that considerable insight can be gained into
the causes and effects of physical database performance.
Based on this experience it is felt that reasonable extensions
can be made in three major directions. The first potential
extension needed is to model the interface between the
DBMS and the operating system in its many facets: mutual
exclusion of processes attempting to read and update the
same data, queuing delays from mutual exclusion or other
resource contention, and details of operating system overhead to facilitate DBMS commands to access data. The
simulation model of Hulten and Soderlund7 has achieved
considerable success in this area.
Secondly, the evaluation can be extended to databases
based on hierarchical or other logical models of data because
they exhibit the same properties as network models: access
can be described in terms of some combination of direct
(hashing) or indexing plus a series of sequential or pointer
steps within a particular level of indexing or at the final level
of data. The generalization applies to existing search methods, indexing methods, and overflow techniques, although
the resulting implementation in the analytical model is more
arduous in some cases. At the level of detail exemplified by
the DBDE, each technique must be modeled individually,
and although models designed by Yao14 and Severance10 are
more general, they lack important detail available in the
DBDE.
The third major extension would be to incorporate the
DBDE into a physical database designer that optimizes user
specified performance measures as a function of user specified control parameters. Typically, the control parameters
would be defined as a subset of the database characteristics
currently input to the DBDE. Several of these characteristics are fixed (i.e., bound) at the logical database design
phase, and so the range of physical design control parameters is further narrowed. In IDS one could control subfile
page size, page range for subfiles and record types, PLACE
NEAR specifications, RETRIEVAL VIA clauses, and
pointer options.
ACKNOWLEDGMENTS
The research and development for the Database Design
Evaluator was conducted under Defense Communications
Agency Contract Number DCA 100-75-C-0064 for the
WWMCCS ADP Directorate, Reston, Virginia. The authors
also gratefully acknowledge the fine programming contributions made by Judy Botwick and Rick Haan.
REFERENCES
1. Bastarache, M. J. and E. A. Hershey, "The Data Base Management
System User Manual and Example," ISDOS Working Paper No. 89,
Dept. of Industrial and Operations Engin., The University of Michigan,
April, 1975.
2. Cardenas, A. F., "Evaluation and Selection of File Organization-A
Model and System," Comm. ACM, 16, 9, 1973, pp. 540-548.
3. Cardenas, A. F., "Analysis and Performance of Inverted Data Base
Structures," Comm. ACM, 18, 5, 1975, pp. 253-263.
4. CODASYL Data Base Task Group, April 1971 Report, ACM, New York.
5. Fry, J. P. and B. K. Kahn, "A Stepwise Approach to Database Design,"
Proc. ACM Southeast Regional Conference, Birmingham, Alabama,
April 1976.
6. Honeywell Information Systems, Inc. Integrated Data Store, Wellesley
Hills, Massachusetts, Order No. BR69, 1971.
7. Hulten, C. and L. Soderlund, "A Simulation Model for Performance
Analysis of Large Shared Data Bases," Proc. International Conf. on
Very Large Data Bases, Tokyo, Oct. 6-8, 1977.
8. Knuth, D. E. The Art of Computer Programming, Vol. 3: Searching and
Sorting, Addison-Wesley, Reading, Massachusetts, 1973.
9. Senko, M. E., V. Lum, and P. Owens, "A File Organization Evaluation
Model (FOREM)," Proc. IFIP 1968, pp. C19-C23.
10. Severance, D. G., Some Generalized Modeling Structures for Use in
Design of File Organizations, Ph.D. Dissertation, University of Michigan, 1972.
11. Teorey, T. J., and K. S. Das, "Application of an Analytical Model to
Evaluate Storage Structures," Proc. ACM/SIGMOD International Conference on the Management of Data, Washington, D.C., June 2-4, 1976, pp. 9-19.
12. Teorey, T. J., J. Botwick, R. A. Haan, and L. Oberlander, "Design
Specifications for the Database Design Evaluator," Data Translation
Project Working Paper DE 7.2, Graduate School of Business Admin.,
University of Michigan, 1977.
13. Yao, S. B. and A. G. Merten, "Selection of File Organization Using
Analytic Modeling", Proc. International Conference on Very Large Databases, Framingham, Massacliusetts, September 22-24, 1975, pp. 255267.
14. Yao, S. B., "An Attribute Based Model for Database Cost Analysis,"
ACM Trans. on Database Systems, 2, 1, March 1977, pp. 45-67.