Center for Information Systems Research Massachusetts Institute of Alfred P Sloan School of Technology Management 50 Memorial Drive Cambridge, Massachusetts, 02139 617 253-1000 Contract Number N0039-80-K-0498 Internal Renort Number MOlO-8011-05 A Prelijnir.aiy /urcliitectural Design For Ihe Functional Hierarchy Of Ihe HMDPLEX Database Ifeiclran Ccirputer Hsu Teclinical Report # 5 WP 1197-81 tfeveinber 1980 Principal Investigator: Professor S.E. Madnick Prepared for: Naval Electronics Systems Comrrand V;ashington, D.C. ABSTRACT Conventional computer architecture has shown in Database machines which supporting large-scale information processing. specialize limitations its database functions have been suggested to alleviate the in problem of increasing loads on these computers. capacity. This paper is about hierarchy. functional the storage hierarchy contains two subsystems: It subsystem hierarchy, the employs Madnick by a architecture to achieve high performance and computer highly-parallel proposed machine database The INFOPLEX the design of the functional a INFOPLEX the of and which performs database functions. As originally propsosed by Madnick, the made up hierarchical of microprocessors. The and attempts to this turn accomplishing the to is be implemented (2) original following: requirements (i.e., features) data model; idea (1) into a (3) detailed Identification of a generalized DBMS; (2) (1) design of these by functional features Development of an Detailed specifications of architectural levels, including their functions and implementation (4) multiple This paper functional abstraction. are to be supported by the functional hierarchy; integrated by guidelines for identifying these levels are pipelining of transactions, and is each level is designed to perform levels; certain databasae functions hierarchy functional Pointing out future research directions. strategies; and TABLE OF CONTENTS 1. Introduction 1.1 1.2 1.3 2. 2. 2 3. Stratification of the Database Management System 2.1.1 The ANSI/SPARC DBMS Architecture 2.1.2 DIAM Concepts 2.1.3 The INFOPLEX Approach Data Models 2.2.1 The Conceptual Data Model 2.2.1.1 Literature Overview 2.2.1.2 the INFOPLEX Approach 2.2.2 The Internal Data Model 2.2.3 Support of Multiple Views 2.2.4 Summary Memory Management 3.1 3.2 3.3 3.4 3.5 4. Storage Hierarchy 6 Functional Hierarchy 9 9 1.2.1 Hierarchical Functional Decomposition 1.2.1.1 Hierarchical vs. Non-hierarchical design--^ 13 1.2.1.2 Family of systems 1.2.2 Multiple Microprocessor Implementation 15 19 1.2.3 Summary Research Goals and an Overview of this Report 22 The General Structure of the Functional Hierarhcy 2.1 The id Approach Allocating Storage Space Page Fix and Clustering Considerations Virtual Storage Interface Memory Management Interface Internal Structure 4.1 4.2 4.3 5 Introduction Data Encoding Level 4.2.1 Data Definition Interface 4.2.2 Operational Interface Unary Set Level 4.3.1 Introduction 4.3.2 Primary Sets and Secondary Sets (Subsets) 4.3.3 Catalogue Implementation 4.3.4 Fast Search Mechanisms 4.3.4.1 Sorting 4.3.4.2 Index Table Implementation 4.3.4.3 Hash Table Implementation 4.3.4.4 Summary of Fast Search Mechanisms 27 27 27 30 30 32 32 32 35 46 49 54 56 56 60 61 61 62 67 67 71 71 72 74 74 76 76 78 79 79 80 82 4.4 5. 83 83 84 89 89 89 91 ^5 97 Database Semantics 5.1 5.2 5.3 6. 4.3.5 Data Definition Interface 4.3.6 Operational Interface 4.3.7 Conclusion of Unary Set Level Binary Association Level 4.4.1 Introduction 4.4.2 General Mechanism 4.4.3 Data Definition Interface 4.4.4 Operational Interface 4.4.5 Summary 103 N-ary Level 5.1.1 Introduction 5.1.2 Data Definition 5.1.3 N-ary Operators 5.1.4 Retrieval Strategy 5.1.5 Entity Record Construction Virtual Information Level 5.2.1 Introduction 5.2.2 The General Mechanism 5.2.3 Data Definition Interface 5.2.4 Operational Interface Data Validity Level 5.3.1 General Mechanism 5.3.2 Data Definition Interface 5.3.3 Operational Interface 103 t ; 103 103 104 107 113 119 119 ^ 20 122 122 '.'.'.'.'.'.'.'.'.'.'.'.'.'.'.'.'.'.'.'.'. A 2^ 125 ^26 127 128 User Views and Database Secureity 128 Introduction 129 6.1.1 Mappings 134 6.2. View Enforcement Level 134 6.2.1 Introduction 136 6.2.2 General Mechanism 136 6.2.3 Data Definition Interface 137 6.2.4 Operational Interface 6.3 View Translation Level 138 138 6.3.1 View Translation Level -- Relational View 6.3.2 View Translation Level -- Hierarchical View, ...147 153 6.3.3 View Translation Level Network View 6.3.4 Database Sublanguage Facility and Summary of the View ^^^ Translation Level 164 6.4 View Authorization Level 1 ^^ 6.4.1 Introduction 165 6.4.2 Data Definition Interface 168 6.4.3 Operatinal Interface 6.1 — 7. Summary and Future Research Directions 7.1 Summary of Report 169 16" 7.2 References Future 7.2.1 7.2.2 7.2.3 7.2.4 7.2.5 7.2.6 Research Directions Formal Design Methodology Locking Mechanisms Mapping of Operators Implementation of a Software Prototype Performance Evaluation Recovery and Reliability 172 172 173 174 174 175 175 1 76 . I. INTRODUCTION Conventional computer architecture has shown high-performance, supporting limitations its high-reliability large-capacity and systems dictated by today's information processing needs. the form of instruction microcoded in Attempts in intelligent controllers, sets, back-end processors and database machines have been made to augment the data processing capability of a computer <Hsiao77>. database computer represents one effort which employs architecture computer designed specifically a highly parallel such for INFOPLEX The information procesing needs. Using the concept of hierarchical decomposition in design, its multiple microprocessors in its implementation, and decentralization in its control mechanism, the INFOPLEX database computer architecture has as its objective the support of large-scale information management with high reliability <Madnick79>. It aims to provide solution a to the of increasing loads, in terms of both throughput and volume of problem stored data, faced by today's and tomorrow's information processing nodes INFOPLEX consists of large data storage a storage hierarchy, which system, and a functional supports hierarchy, a which is responsible for providing all database management functions other device management. The functional hierarchy storage hierarchy. This data computer may be used database where machine, users interact with than built on top of is as it very a alone stand directly through Data-Definition/Data-Manipulation/Query language interface, or a it a can connected be to a computer the host regular host computer through augments processors and other utilities. language providing generates commands machine's database the data channel, where a received be to functions The host computer DDL/DML/Query INFOPLEX's by by interface, as depicted in Fig 1.1. "1. 1 Storage Hierarchy The storage hierarchy is comprised of levels various with performance and features. cost <Madnick75>, it is found that the requirements and storage low-cost technologies inexpensive system combining are a hierarchy. the highest level of the The high-performance lowest of level the on the top hierarchy), while low-performance devices such as mass storage systems are placed at the with decomposition is devices, such as cache memory and main memory, are placed (i.e., mixture of a devices Hierarchical devices. applied to organize this ensemble as by high-performance expensive low-performance high-performance a satisfied best research our In of devices storage of hierarchy). An bottom the example of (i.e., our storage hierarchy is shown in Fig 1.2. The storage providing time. a The hierarchy supports the very large linear virtual address space with actual functional a by small access structure of the storage hierarchy and movement of information between levels within the storage system the hierarchy functional hierarchy. are hidden from The lowest level of the storage hierarchy always contains all the information of the system, while higher levels p. Ho:;t Compute] User Terminal) User Interface Functiona Hierarchy Controllers Storage Interface Storage Hierarchy Fig 1.1 INFOPLEX: Backend machine/Stand-alone data computer 7 storage references 4v controller 1. CACHE 2. MAIN 3. BLOCK 4. BACKING 5. SECONDARY 6. MASS \/ <sZ -Ch O Figure 1,2 An Example Mer.ory Hierarchy contain subsets the highest level, depending levels information and actual upon Requests for data are made to database. of the total or anticipated information most likely to be referenced highest level. The automatically moved is in between such that the usage, the future is kept at the effectiveness of the storage hierarchy therefore depends heavily on locality of reference. Microprocessors, are used at each level to implement data algorithms. enhance Simultaneous and parallel operations at all storage levels throughput and reliability the storage system. of desirable properties of storage hierarchies have been the relationships between these properties management strategies have been studied 1.2 Functional in and detail identified, various and storage Abraham79>. <Lam79} functional decomposition INFOPLEX functional Hierarchy is designed around hierarchical Various Hierarchy 1.2.1 Hierarchical The movement functional decomposition. The a the key functional modules that a technique have minimal interdependenc ies and can be combined hierarchically to form system, such as an operating system or 1.2.1.1 Hierarchical vs. a of concept of hierarchical decomposition, as applied to the functior.al design area, is that identifies concept a software database management system. non-hierarchical design 10 comparison. and Fig In to be variable a very strict manner illustrated as to 1.^. In were E have may also D) is distributed a hierarchical the In modules in to structure hierarchical a as this case, the modules are designed such only is serve its function. to modified, there is only one must and be variable only impacts propagation a In dependent upon one other case, if that subroutine is can change in a Besides subroutine. a be directly a data pool minimizing the changes, this hierarchical approach also makes it much of since well-defined.. a Similarly, tested. single such subroutine other must easier to determine which subroutine manner, and an Furthermore, each routine maintains its own private data pool needed affected produce routines nine that each of the routine. B,C, A, pool. functionality as so Fig in M, data the in approach, decomposition represents figure Ihe same argument applies to changes in the format or changed. usage of program "main" a simple a interface to suroutine or changed, five other subroutines (i.e. by to common data pool used a the in function the If . link opposed as shown be addition, there is Each by all of these routines. interdependency can 1.3 depicts a system consisting of subroutines. eight design modular subroutine conventional design modular hierarchical of The advantages the functionality of be each This approach has been found to changed, and subroutine be very in what must be effective in earlier design work on file systems <Kadnick69, Madnick7^>. Another reason for the choice of advantage of database system. has pipelining the A nature a hierarchical design is to of transaction that enters transaction processing a database system to go through a sequence of stages of processing. take in a normally For example, it p. Fig 1.3: Non-hierarchical desiqr gn 11 p. M. J*_ M, M. .4^3 M, -v^. i_ M, 3Z M, -V''6. ^^ ^^7 M, IL M, Fig 1.4 ... P Hierarchical design 12 13 may first be checked by security control module; a then it passed is name-mapping module which determines the records to be accessed; to a and then it is given to finally the records; from memory. the search module which determines the address of a storage module is invoked to obtain the a Moreover, structure that reflects their sequence. support a database system modules the that stages of processing (e.g., security control and earlier the strongly suggest stages Ihese record name-mapping) also require services provided modules those by that support the later stages of processing (e.g., searching and accessing). Research chapter stratification in database systems will be reviewed in Here we conclude by pointing out that, 2. hierarchical since enables higher-level modules to be implemented on top of an modularity •extended machine' lower of incorporating all the primitives implemented at modules, it contributes to the reduction of redundancy of level functions in the system, and therefore enhances the reliability. '1.2.1.2 Family of systems Another advantage development systems' DBMS's is of a of 'family over systems'. of proliferation motivated by the developed decomposition the approach The in systems user to find a structure through a set of It can that By decomposing systems. for system which possesses just the right number of qualities that he desires. general or While the individual systems may have various desirable features, it is often difficult the the concept of 'family of operating of past decade or two. the lies a is therefore advantageous to used as system into modules well-defined a basis for many different be a develop that are compatible interfaces, it is nearly possible to . 14 proposed. to In needs other words, way the assembled. system specific develop any In a stereo a an by appropriate choice modules of desired system may be assembled according system or a automobile cutomized is our research on Family of Operating Systems <FOS76> and Family of Database Management Systems <FODS76>, design was found to be effective in providing a hierarchical modular normative model for the systems Members of differences in a family of systems will differ as result a of the contents of the modules that make up the hierarchy. There are three broad classes of module differences: (1) Functional: module must be Although the purpose of and the interface to each clearly defined, the functions specific and algorithms used may vary significantly. (2) Performance: For different implementations a given that be performance different offer may there functionality, characteristics. (3) Existence: As an extreme case of minimal functionality, certain modules may not exist at all in certain system. Applying the concept of heirarchical decomposition, Functional access path optimization, data integrity encoding, hierarchy of tasks, each of which to be implemented as hierarchy. level INFOPLEX Hierarchy attempts to decompose the typical DBMS functions, such as data restructuring, security control, checking, the Levels are connected in modules a top-down and validity etc., a fashion, 'level' with into a of the higher supported by the 'primitives' of the 'extended machine' 5 1 hardware composed of the and all lower functional level modules. However, design and implementation of each levei is made as independent of other levels particular level are particular, possible as so affected not that algorithms incorporated those by communication is made through inter-level levels. other of in a In set of clean, a pre-defined interfaces. 1.2.2 Multiple Microprocessor Implementation The functional levels specified above are to be implemented using microprocessors to take advantage of possible parallelism and multiple pipelining effects in processing incoming streams of transactions. shown Fig in 1.6, each level in the hierarchy communicates only with adjacent levels and each module within a processor in a level requires service from the it places an operation code and level, level communicates a is called an next associated operands in memory area accessible to only these two levels. module only with Thus no central control mechanism is necessary. adjacent modules. When As , or IRQ, shared memory special This Interlevel-Request-Queue a lower to be distinguished from the storage hierarchy and the local memory described below. addition to the IRQ, every In working modules space. Therefore, each level has processor may have : (1) some IRQ shared by the next higher level, local 3 memory as sets of memory 16 (2) local working space, and (3) IRQ shared by the next lower level. This concept is shown in Fig Even though operations processor has a can 1.6. supported be a number in of memory conceptually a assigning different ranges of the address space each of operations ( i .e modules. memory three its .LOAD, STORE MOVE etc , , .) Thus and , simple of the modules, memory fashion by processor the to same set of storage addressing same the mechanisms can be used for each of the three types of memory. To places illustrate, suppose the data encoding level (refer to fig request to the memory management level to fetch a of data, given its (The 'id'. 'id' is a a 'FETCH') This message contains an operation and the id of the data element to be retreived the memory management level completes this request, string module calling message and stores it in the IRQ shared by the data encoding and memory management levels. (i.e. byte string unique identifier for the byte string, and is described in detail in section 2.1) The formats a 1.5) data in the IRQ, and returns of level, containing a a . When it stores the byte message to the data encoding pointer to the data in the shared IRQ. There are several ways to implement this hierarchical ensemble processors and code memory modules. One approach is to of simulate the hierarchical structure of structure Research in the area of multiple microprocessor (Fig networks has 1.7). shown that the system improved with a communication linear, single protocols and bus, bus architectures can be used, with today's technology, to linearly connect e i p. View Tra? Level En- 1 r Viev/ fnrrpment LevQl Validity/ Integrity I- Level Virtual Level n Info'. Data Encoding Level i Memory Mgmt. Level 77 ^ y rtual Sto r age Virtual Storage Fig 1.5: F unctional Hierarchy Inte r f a c 17 v.. 1! Level i-1 IRQ (i-l,i) Level - IRQ i (i,i+l) Level i+1 Fig 1,6 Multi-processor implementation of Functional Hierarchy Microprocessors Memory modules Single Bus Fig 1.7 Simulating Functional Hierarchy with single-bus microprocessor ensemble a . up to communication bottlenecks taking without any performance degradation due to microprocessors 60 Another . approach, advantage of new fabrication technologies, is to implement each clusters. As illustrated referred to as a functional fabricated a single chip. on Fig in multi-memory multi-microprocessor several using functional level FPC, bus<Toong80> the on processor of each 1.8, cluster these clusters is and (FPC), IRQ may also be implemented using The whose data buse.s communicate with adjacent functional shown in Fig 1.9 can levels, be a as <Madnick80>. 1.2.3 Summary The advantages of decomposition functional the approach to database computer design are summarized below: (a) Decomposed design decomposition breaks potentially design the implementation and of a DBMS into smaller, much-easier-to-tackle large very Functional implementation: and modules, where each module can be worked on separately. (b) Modularity: interacts clean set of interfaces; therefore levels with" other modules that algorithms Each level of the functional hierarchy through perform the a same task while using different or employing different levels of sophistication can be selectively plugged into the system. (c) Parallelism implementation rates (orders systems) and pipelining: Multiple microprocessor makes it possible to process very high transaction of magnitude higher than currently available p. Fig 1.8: Functional Processor Clusters (FPC) in a functional level 2; p. Inter Level Requests y FIGURE IRQ Structure as a Functional Inter Level Requests 1.9 Processor Cluster (FCP) 2T 22 Distributed (d) control: functions, it is easier to erroneous module. Since each module has clearly defined detect errors and to identify the Also the use of multiple parallel processors at each level enhances the availability of the system in the event of any isolated software or hardware breakdown. 1.3 Research goals, and an overview of this report The goal of this research is to turn the pipelined, multi-processor-based detailed design. (1) have INFOPLEX management database concept of a system into a Specific accomplishments are the following: Identification of functional objectives identified, of system: the We drawing from the current literature in the DBMS area, the following as major features to be included in the design of the system: (a) multiple types external views of the database (b) a high-level conceptual data model rich in semantics (c) a flexible physical data structure (d) explicit support of database security, validity, alerting constraints and virtual information (e) (2) data concurrent use of the database Development of an integrated data model: we have developed model, which is used to describe the database and serve as a a media for inter-level communications (3) Specification database of architectural stratifications in the levels: past and We have proposed examined a more p. architecture generalized layered objectives. functions particular, In achieve to This can used be as a blue-print functional our performed each structures used to implement them at data and are described. software prototype level for 23 implementation and for future study and refinement. In this chapter, we have reviewed the architecture of the INFOPLEX database machine and introduced basic design concepts of the functional hierarchy as outlined in <Madnick79>. In is hierarchy chapter two, the general structure of the functional discussed. It stratification in describes the integrated an rationale fashion, and for proposed the relates to it the literature of various database research areas. structure The rest of the report is organized around the proposed of the Functional Hierarchy. For each level identified in the Functional Hierarchy, the following general issues are discussed: 7) The functions this level performs; 2) The rationale for singling out this level; 3) The implementation strategies; 4) The interfaces; As shown in Fig 1.5 , our design Hierarchy has the following levels: A. Memory Management of the INFOPLEX Functional . 24 memory management level 1) Internal Structure B. 2) data encoding level 3) unary set 4) binary association level Database Semantics C. 5) n-ary entity level 6) virtual information level 7) data validity and data integrity level User Views and Database Security D. 8) view enforcement level 9) view translation level view authorization level 10) The In level report chapter three, focuses on from the lowest level of the Functional Hierarchy. starts we discuss the memory manager. The discussion how this level is to be implemented in order to manage the virtual memory resource and to provide mapping functions of the logical identifier of a data element to its physical identifier in the virtual memory In database chapter four, we describe how the internal are supported. structures of the Descriptions of the three levels supporting the internal structure, namely, the data encoding level, the unary set level, and the binary association level are presented. The encoding scheme of a stored data element may change when the . 25 element is passed from one level to another. may particular, an element In through data compaction, editing, or varl 'Us forms of encoding go right before it is to be stored into the storage hierarchy. encoding level conversion . provides functions perform to types of data these Stored data elements are grouped into unary sets. (The notion to that of grouping stored records into files.) similar data The The unary set level deals with search and retrieval of stored data elements from unary sets. providf'^ It superior levels. "content-addressable" a incorporates It data is the interface to its structures facilitate that searching into the database. The binary specified association the in implements level conceptual schema. connections binary Even though it is classified as one of the internal structure levels, it provides the basic service materialize complex semantic constructs. data element from database the given It is capable of extracting to a the content of an associated element In chapter five, we discuss how database semantics may be built in and maintained. the n-ary A structure of describe to attributes this n-ary construct. which hierarchical levels is proposed. At entity level, binary associations are grouped into an n-ary construct that is used Multi-valued 3 as well an entity the in real world. as nested attributes are built into This level supports an n-ary entity interface, returns an entity based on some description of the attributes of this entity. It also has to resolve certain access path selection . . 26 problems The virtual information functions provides level information which is not physically stored. The validity control level implements constraints on updating of the database. further derive to These two levels complete the discussion of database semantics. chapter six, we describe the implementation of definitions In mappings of external views control and database of three-level functional structure is discussed. security. view The and A translation sits in the middle, performing mapping and translation of views. level Three kinds of views are views, and relational discussed: network views. It shows, by way of hierarchical views, a sample database, how different views and their operators may be translated. Below the view integrates constraints. all translation external level, views On the other hand, and view the enforcement operational enforces level access the view authorization level on top of the view translation level authentiates log-on users and authorizes views to the users. Finally, in chapter seven, we conclude this report, and point dimensions hi er archy for further research in the design of out the functional 27 p. The General II. Structure of the Functional Hierarchy Before we go on to the description of hierarchy, integrated an presented here. area design database and levels the in of the stratification proposed is literature This chapter dwells on the recent database of overview individual management the in systems, and its relationship to the design of the Functional Hierarchy. 2.1 Stratification of the DatabaSe Management System In section, this stratification shall we management database of review concepts important two systems. first The represented by the ANSI/SPARC recommendations, emphasizes abstraction data of and other one, represented by functions. Both a in one, process the three-level hierarchy of data models. DIAM the model, The abstraction stresses of been drawn upon for determining features to be have supported by the functional hierarchy and its architecture. 2. 1. The ANSI/SPAhC DBMS architecture T It to be objectives of the INFOPLEX is one of the able to support information modeller. capability flexibility order In in network to design a DBMS views) of the database, (e.g. as the has that relational, well the as the organization and reorganization of the stored data, INFOPLEX has adopted the various high level constructs demanded by an to provide many different kinds of views hierarchical or Hierarchy Functional a DBMS architecture similar to that ANSI/SPARC study group. <ANSI75, YourmarkTT, suggested Tsichritz78> . by Under . 28 this framework, to as shown Fig 2.1, a definitions (i.e. view insulate in structure definitions (i.e. the conceptual schema is introduced the external schema) from stored schema). internal application The program views are mapped to the conceptual schema, such that changes or additions views will not affect the definitions of the individual of On the other hand, conceptual schema is mapped to the internal others. structure, such that changes to the internal structure will affect only the mapping between the conceptual schema and the internal schema, not Therefore data independence may be preserved external views. the and protection of existing application programs can serve purpose, its but conceptual the be effected. To schema should have the following properties: (1) It is description a of the enterprise that stay will relatively stable compared to the external or internal schema. It (2) is capable existent in the of expressing high level semantic constructs enterprise order in faithfully to model the enterprise (3) It is simple to work with and flexible in restructuring itself to provide different external views. A very similar architecture is found System R <Astranhan76 > Storage System (RSS) . As shown which in Fig 2.2, corresponds to in the description System R has the a Relational internal Relational Data System (RDS) which corresponds to the coceptual and various interface. programs run on top of of level, a level, the RDI to support other user 29 External Model A external Model N External/ Conceptual Mapping Conceotual Data Model Conceptual/Internal Mapping Stored data base (Internal Model) Fig 2.1: The ANSI/SPARC DBMS Architecture ^—Programs to support various interfaces Relational Data System <— Relational Data Interface (RDI) (RDS) Relational Storage System ^ Relational Storage Interface (RSI) (R5S) Fig 2.2: The Svsteni R Architecture s 30 The choice of data models has levels a impact great functional on design the hierarchy. We shall interfaces between examine, section 2.2, the significance of choices of data models each in the at levels, and approaches taken in the design of the three the of of of Functioanl Hierarchy. 2.1.2 The DIAM concepts Senko <Senko73> has also exploited to of stratification implementing in Independent Accessing Model level), DBMS: the string model model. device physical a (the basic construct the Encoding model database a (DIAM), as shown four levels of abstraction for info-logical great extent a Even in Entity the functions. model, is with emphasis the in a a of model Set though there is certainly the line a (the level), of higher database at of DIAM ' abstraction of database of lower functions internal its builds its always layer a the similarity a purpose the decomposition further that has identified implementation level), and implemented those stratification of <Navathe76> along This results in functions on top of example more Data His (the const ruct-bui Id ing between DIAM and the ANSI/SPARC architecture, stratification system. Fig 2.3, concept the layer. is Another presented in more limited context. 2.1.3 The INFOPLEX Approach The architecture of the Functional Hierarchy the following features of a DBMS: strives to achieve : Concept Names Entity Set ^ Representation Independent Language ± Translator J±. M- String Model Representation Names Encoding Model •^ Physical Device Model Fig 2.3: Representation Dependent Language The DIAM Architecture Data Structure Conventional hierarchical or network models Info Structure Nonr.alized relational model Schmid; Chen: roleconcept Aggregation generalization E-R model; Senko Entity-Set model Fig 2.4; Conceptual Data Modeling McLead: SDM Codd: RM/T AI: Semantic Networks 32 (1) Support of multiple types of views and of the structure of views Separation (2) the structure of the stored data relatively (the stable and simple data model 'external schema') (the 'internal schema") (the by a 'conceptual schema') (3) Implementation of various stored data structure techniques (4) Explicit support for virtual information, validity/consistency checking and security checking (5) Support concurrent use of database In order pipelining to and achieve these features and at the functional abstraction in same the system, hierarchy is given the propsoed architecture as shown 2. 2 in realize time the functional Fig 1.5. Data Models From the discussion above, it is clear that selection of data models at the external, conceptual and internal levels has great impact on design of interfaces the supported at each level. in the area of data between levels and functions to be This section presents an overview of research models and points out approaches to be taken by the functional hierarchy. 2.2.1 Conceptual Data Models 2.2.1.1 Literature Overview- Recent research in the area of logical data models has followed 33 directions. two network). hierarchical, relational, and models (e.g. which suffer from deficiency in semantic constructs. have made been enrich to is especially them, efforts refinement of the in a relational an attempt to understand better the meaning of a relation by recognizing functional dependencies among the further <5chmid75> characteristic data classified relations by 'type'. which by indicating type type) entity (e.g. <Codd72>. Schmid He suggests that, association type, type, normalized relation belongs to, the meaning of a storage operations (e.g. insert, delete and update) the on relation This concept of 'type' of relations enrich semantics of are clarified. the in Numerous The concept of 'normalization' relational data model. model argued is It models are basically 'syntactic' data models conventional these that focused on enriching the conventional data has One syntactic strictly structure, and has motivated work on further The Entity-Relationship normalization <Fagin77a>. model proposed by Chen <Chen76> is also concerned with an improved modelling technique to be applied to real world facts. added the concepts an & 77b> then and cluster Along the similar line, Bachman <Bachman77>, in attempt to extend the network model, introduced the role concept in representation of real world entities. data generalization aggregation, of membership attributes. Smith and Smith <Smith77a model semantic Semantic Data Model concepts such included. A extensions as recent in constructs (SDM) cover paper by is A comprehensive found codd semantic constructs into and a where event-type <Codd79> of the development of the <Mcleod78>, McLeod aggregation by in discussion has additional entities summarized are these relational model called RM/T. One way to suramarize these developments is to plot them on a 34 one-dimensional chart (Fig where 2. A), strictly syntactic data models such as conventional netv/ork models, and more semantics are added progress the Semantic the in v;i information naturally area of artificial naming a and the data models as they to the field of artificial Networks provide the user as hierarcliical right, then at the right end of the chart there be models proposed v.'ill as towards left end there are the at powerful set th a where <Roussop75>, model to world real Data models developed possible. as effort is made to an tools of such intelligence, in the intelligence also strive to provide flexibilities in certain set of objects, depending context the on of the application and the angle from which information is viewed. Another direction of research in logical the identification of a basic models data construct. simple This construct, sometimes termed "minimum information unit", is simple, clean semantics, and may be easily collected in represent complex a not flexible varieties in semantic structures. restructuring in themselves N-ary relations (or single-leveled files) v/ith <Schmid75>. the real relation All the fields in world, to purpose. a a different view. and binary relations are more plagued by semantic However, the ambiguity tuple are equally associated, while in a single tuple may lead to meaning of the information. misunderstanding of Even though considerable efforts have been put in the concept of 'normalized' best They some associations may be direct and others indirect. Placing all of them in the is and Hierarchical and appropriate for use as the basic construct in this sense. conventional n-ary clear meaningful fashion to network models are considered too complicated for this are emphasizes relations, it is felt that guard against spurious information is a the binary association model. 35 has been argued that It approximation the association binary association Fal kenbery76>. model data Briefly, semantics, clean close much more desirable technical properties than n-ary has The advantages of relations for use at the logical level <Senko77>. binary some or it discussed are and are most <Bracchi76, in associations believed that binary is a have flexible in supporting various external representations. 2.2.1.2 The INFOPLEX approach is not our It However models. use of a purpose here to add to the debate we in binary network type of model as the basis for logical handling representations. of the proposed a and mapping of the support the remaining part of this section, network binary design. binary network are clean semantics and its mul t iple- view In data look into research in this area, and propose the The important qualities of ease various of model examples demonstrating these qualites. given, is internal description a followed by some We also believe that the binary construct is capable of supporting more complicated constructs demanded by some recent data models. The Binary Network Model: A visual presentation of our binary network Fig 2.5. represent There some Primitive Set is generic are objects a basic four group or of facts (BN) constructs. in primitive elements a is Primitive real world the properties and therefore are given model that (Fig have shown in Elements 2.5a). A similar common group name, called p. 9 Office Automation "jac^]" Decision Support "MARY" Fig 2.5a: PrJMtive elements /Office Automation * . ^ "JOIIN" \ I I ' i I \ ^ Decision Support \ "MARY" ' Z N PRDJ / ' H^l_^ EMP Fig 2.5b: Primitive sets ' (P set) Office Automation^, ( Decision Suppo ^ PRDJ EIIP EI:^IM o Fig 2.5c: Binary associations ' Office Automation u__V, Q ~ \ "JOTN" \_^Decision Suppot 7 c Ero V -r-"MARY" / ~J' Name Proi Fig 2.5d: Binary association sets (B_set) \ ( I m\ 36 37 Primitive Set Name, or Pset_n3me (Fig 2.5b). a representations of elements the relationships among different primitive sets (Fig 2.5c). from associations group of binary (i.e., world real some Binary associations are primitive incident have that similar A pair of primitive set names ahd binary set is a properties generic elements belong to the same primitive sets, and the associations have the same meaning). a primitive pair a of is designated by It association names (Fig 2.5d). portion Fig 2.5d, the upper In (primitive portion represents the schema of the database. our BN 'nodes') model composed is of and binary association sets Therefore, (also known as the graph, In together. By the value network a value node. nodes or world attributes. objects An entity 'equally related' we mean that those nodes tied to instead this entity node are all direct attributes of the entity node, of 'derived' schema 'arcs'). binary a node can be either an entity node or a entity node serves to tie all equally related nodes binary primitive sets (also known as the Further classification of nodes and arcs: schema and represents the instance of the database, while the lower associations) of elements An (tangible or entity node corresponds to any set of real intangible) that have some common set of attributes which are revealed by the node's binary connections to other value nodes or entity nodes. associations; i.e., an It's own identity is reflected by these instance of an entity node does not have any value or identity, and its designation is made through instances of its associated nodes. collect equally In a sense, the purpose related binary oi' associations an entity to form node a is to semantically 38 Therefore an entity node is also clean n-ary association. as entity n-ary an to Those nodes that are not enity nodes are node. entity Value nodes, in contrast to value nodes. refered nodes, have values assigned to their instances. Arcs can further be specified by several parameters. which is given in terms of 1:1, syntactic function, The other, to be specified for semantic function, 'association', 'total' direction given is terms in 'optional', or ' of the n:l and n:m. l:n, the is arc, is the 'hierarchical' or of candidate_key ' or 'non-key', parameters help further define and clarify the meaning of These etc. which each One the storage operations on these nodes or arcs in our conceptual schema. the instance of an incident node For example, of hierarchical a arc depends on the existence of the associated instance at the other end of arc the existence, while the association arc does not imply this for restriction. A network binary these with schema distinctions is presented in Fig 2.5e. At the instance Conditional Arcs: where the existence of instance of another node. an with associati.^n a an association For example, value node there level, associated TYPE of an instance of the node situations depends on the value of the an entity node TYPE, are and if PERSON is PERSON may have the value of the 'doctor', then this instance will have an association with another entity node DOCTOR; on the other hand, if it is associated with a TYPE have an association with another entity node NURSE. the existence another node. of \le the instance 'nurse', This it will means that of an arc may depend on the value of shall distinguish this kind of conditional arcs from p. 39 (Convention: : O: I : : Entity node Value node Arc Direction of Spec. A: assoc.; H: Hierarchial; O: optional; T: total candicate key i; NK: non-key) K. : J, Fig 2.5e: An exanple of BN schema with distinctions of nodes and arcs A,T,K,UNCO: 1:1 H,0,NI'.,COr rig 2.5f : An exanple of a BN schema with conditional arcs . 40 unconditional the Fig 2.5f sliows arcs. diagram incorporating this a distinction. We shall conclude the description of the BN model its definition language specifications. 'Def ine_Nset node; in 'Def ine_Vset Specifically, specification. the ' Fig 2.6 by referring BNF expression of is a ' will create will create an entity node; the nodes that correspond to example of corresponding node-arc diagram connecting given is 2.7a, Fig in given in Fig 2.7b. is this the domain of the attributes. to definition schema a value a and the attribute set the Nset definition' is manifested by creating arcs Nset to and An its is believed It that this definition language is very simple to understand and easy to use Implementation: (the lowest level This level in accepts The BN model implemented at is the hierarchy that supports the conceptual schema definitions in the form as constructs. interpretes operations against the (More details given are implementation of the n-ary level.) to the internal access path design. schema schema 5, It to be these of which describes The binary netv/ork designer The latter is instances chapter in made also is as a basis for the file and specified in an internal specification language implemented at the internal levels. binary network is also made known to the external describe network. catalogue of all the value, binary and entity sets defined in a the schema and known level 'database semantics'). shown in Fig 2.6, and generates the corresponding binary keeps N-ary the different external view levels. types of views, which are schema designer implemented at The to the ; ) 41 <BN stateinent) = <Vset> <Vset <Nset> <Attr_list) ( Attr_dG scrip) < dcmai n_nan-ie > (Arc_spec> = = = = ~ <Hier> <Key> ^.condition) icon> <link> < equivalence) <Bset_nairB> 1: 2: , ) <Attr_list>^Attr_descrip> Arc_spec> attr nanie < doiTiain_nameV Hsetnarre Vset namg op name ] Sem Syn [< equivalence) ( | , , ) <f , ] , , [ l:l[l:n|n:l[n:m Hier> i Hier ( Itotal , 0:irparatlve> ,^Key> , ^condition) Assoc Optional I Caiid^key | I-Jon_key Lmcon attr name ^lin]-:>) <;link> (].iteral, Nset name (literal, Mset name eq=Bse t_name R eq= Bset name' Nset nama op name Nset nama. Attr name <icon>| , ( ) 1 ^ , ) . [ . denotes optional paraireters; terminal symbols are imderlined. ] Bset_narrE.R refers to the reverse of Bset_nan-B [ A Bl^ Fig 2.6: (a) ( <Attr_dGscrir-j> | = = = = = = = = = = = <.Inparative> ^3ote: <Nset> Define_Vset ( Vset name ) Define_Nset Mset narre <:Attr_list> I specification of the schema definition language Define_Nset (EMP, (E#,E#,l:l,H,T,K,\incon) (DEPT,DEPT,n:l,A,T,IK,uncon) DefineJSIset (DEPT (DN,DN,l:l,H,T,K,uncon) EMP,EMP,l:n,A,0,NK,uncon,eq=EMP.DEPT.R) ) Define_Vset (DN) Define Vset (EMP#) l:n A,0,M<,uncon A,T,NK, (b) H,T,K 1:1 Fig 2.7a & 2.7b: An example schema definition and it BN graph 42 Some examples are given here Examples: demonstrate to the advantages of the Binary Network data model: Semantics: Clean (1) potentially relational schema may while a illustrated As ambiguous carry n-ary an 2.8, information, binary form of the schema eliminates this ambiguity. (2) Ease map an of mapping into different constructs: relational n-ary relationships into while the mapping is explicit, constraints. it (For containing schema more performed is is awkward It to one-to-many equivalent hierarchical form (Fig 2.9a), its naturally from Binary the Also, since all the binary connections Network schema (Fig 2.9b). are Fig in easier example, maintain to deleting certain product a semantic will trigger deleting of the shipments of that product.) • (3) Ease of mapping into internal constructs: the binary network may be mapped structures by simply into specifying a Fig 2.10 shov/s how variety of nodes how internal and arcs are to be while this convenience does not exist in implemeted; data other most types of schema representation. Support of rich semantics: This subsection summarizes the BN model's capability of supporting rich semantics. BN model, while parsimonious in This is done to show that the its constructs, can be used as the simply by basic building block for richer models. (1) multi-valued attributes: This specifying the syntactic function of attribute to be l:n. is the acliieved arc implementing this No other explicit specifications such as 'characteristic entities' are necessary. p. 43 7^ ainbiguous n-ary scheina: M7\NAGER| SEQSTZXRY SZvIARY (Does the SALAPY refer to the MANAGER'S S7iLARY or the SECRETARY 's SALARY?) Elimination of aj±)ituity through the use of the binary form: riANAGER ISEa^TZARY I \/'SAIARY Fig 2.8; (a) Binary (SALARY refers to the SECRETARY'S SALARY) netV'/ork n-ar^^ relational schema safeguards clean semantics — p. Pecord SegrrentatLon "" \" Stored record 2 Stored record 1 — Record pointer linkage — Inexing a record — Record integration Fig 2.10: Exanples of internal specification using binary network data model 44 p. (2) This is achieved by specifying the domain of the aggregation: aggregated attribute which restriction a torn ic (3) ) be to entity. another (There is no says that the domain of an attribute has to be . generalization: This is achieved the case of a property. In by that to at association while the unconditional arc. from In lower arc unconditional generalization an lower level a conditional the hierarchy <Smith77a>, the association from level 45 is node a based on a higher conditional arc, a higher to at based is on an the case of an alternative general izai ton conditional. <Codd79>, the associations in both directions may be scheme also implements cluster membership attributes and the This role concept. (4) hierarchical association: arc 'hierarchical'. be to This is achieved by specifying It clarifies association provides external identify to the that this fact the the 'child' node, and deletion of an instance of the 'parent' node necessitates deletion of all the associated instances of the Summary: 'atomic In Our approach bears certain resemblance semantics' that paper, semantics, and simple while n-ary molecular relations semantics referred are represent In propose to use binary associations as 'atomic' to show concepts to introduced 'molecular semantics' atomic semantics to form complex constructs. ahead node. 'child' a to 'bonds' similar semantics. in of <Codd79>. as atomic that tie up spirit, we We also move how these atomic semantics are actually implemented at the internal levels, and hov; they are collected to realize more complex logical structures at the semantic construct levels. Moreover, we will 46 sent the formation of different views for the from user end the underlying binary structures. 2.2.2 The Internal Data Model 2.2.2.1 Literature overview The internal structures of conceptual database. a the outcome of a schema database used is to The choice of a design database physical describe data physical physical data structure is process, which uses the and statistics on usage of the database to generate either an optimized or physical model data a 'good' deesign and response time, under physical data structure. The of performance, i.e., good throughput is good certain access/update pattern a goal and on load the database. The scope of physical database design spans the problem (e.g., sequential selection problem (e.g., segmentation and hierarchies .sometimes included and in or sequential allocation physical record), and the memory file inverted scan or file structuring list), the access path indexing), the record problem (e.g., the number of fields in reorganization files allocating problem. among the physical database design, problems The a of storage devices are but they are not addressed at the internal levels of the Functional Hierarchy. There has been database design. a great deal of research in the This results in a of area desisre to support a physical large number p. of data structures in of structures these database management system. a is given <Date77>, and in survey general A 47 survey of physical a database design methodologies is given in <Schkolnick78>. In order to support a large number data model has to be very general, i.e., internal structures, data of the it has to be a model through v;hich the various data structures may be described by the implemented by the DBMS. and dedicated been has While most of the research in data models conceptual to in models (as indicated data have previous section), some prominent ideas context generated been in the the in the data translation and conversion problems <Smith71> and of the development of the DIAM system <Senko73>. In the DIAM system, the concept of introduced. computer. The A It form.er pointer basic encoding may field (Fig information The and idea is and that, definitions of the control information parameters structures data can This provides be realized. describing data structures, and a a data field. into the identifier field, the length and encoding type) 2.11). is unit of data stored in the is a be further broken down (i.e. basic encoding unit (BEU) a unit is comprised of control attribute field The user the relationship by manipulating of a common basis for a various BEU, powerful media for implementing them. implementation will consist of functions that decode the parameters and build up data structures accordingly. The concept of BEU summarized generalize all data structures in unit), and allowed variations a to the attempts up to then to single construct (i.e., an encoding be parameterized. Use of the BEU p. CDr«TC)L INFO, IDN ATTRIBUTE LINKlPOirJTERS d. DATA 7 ETIP tt ^ ? Fig 2.11: Format of a BEU 0396 Ju Y JOI-i;S Al 48 p. concept is extended and further formalized in authors paper by <Fry77>. a The that paper adopted this concept to express the "translator of view" in their data tranlation project conducted at the Michigan. ^9 They call it Logical the University Encoding Unit (LEU). of Several operations are defined on the basic construct: 1. Collapsing/expanding: this pair operations of encode and decode data values into bit strings; 2. Extracting (factoring) dispersing / first operation condenses the encoding fields into catalogue a relationship pointers are pointers entry. expressed by physical contiguity, or (distribution) unit It also : common bringing by may the specify how actural (i.e. whether etc.). The second operation by does the reverse. 2.2.2.2 The INFOPLEX approach We have adopted the BEU approach to internal modelling because its power and simplicity. is It considered fairly general in its ability to encode various data structures, and at the neat to work with. is used to In describe of same time very chapter four, we will describe how the BEU model and implement the physical data structures of a database formatted in terms of the BN conceptual data model. 2.2.3 Multiple-View Support As discussed in section 2.1, a sophisticated DBMS ought to be able . 50 p. This is important to support different external views of the database. on two grounds: (1) Protection of investment programs: of the existing application programs are either written in Most conventional environment without on application existing in database a IMS). old It is system that employs important that a implemented database system or a a a different data model (e.g., DBMS is capable of 'simulating' the data structures so that the exisiting application programs do not have rewritten be to scratch. from practical This consideration is critical in implementing conversion from one DBMS to another (2) . Diversity users: in views in different applications and by diferent Each user's view of the real world may differ depending on the application context and the preference of the individuals. selection <Nijssen76> it is pointed out that data model of application an by the user is analoguous to selection of Therefore an effective DBMS should be capable of In a religion. providing the 'freedom of choice' by supporting diversity of views. Supporting multiple views requires: integration during (1) view modelling the logical database design, and (2) the logical Navathe78, of a view. The design Vetter77>. In data and the first one has been discussed in the context of database concpetual view specification and implementation of the mappings between the external views conceptual and process fact, model it is which 'integrated view' of the database as <e.g., Bernstein76, very related to the development describe the result of view integration. The must a Chen76, be used to 51 second one, other the on hand, is issue in the design of the an database management system, and is to be incorporated into the external view levels of the functional hierarchy. essence, the external view In levels are responsible for accepting definitions of the views in of different data models (e.g. relational or hierarchical, etc.) their structural and operational based the on mappings schema conceptual the to and network model, and translating operators issued binary against the external data models to the operators. terms equivalent schema conceptual These are problems to be addressed in this section. Scope of the problem: In order to clarify the mapping problem, levels three of complexity of the multiple-view support are defined here: (1) A This is the simplest level of the mapping problem. Subschema: subschema is a conceptual schema. the conceptual view that represents strictly a subset of the For example, if a relational model schema, the relational, and each individual view contains relations subsets (in terms of either defined for the conceptual extremely useful degree of data expressed in for schema. cardinality) or subschema The data But it models does to not fully objectives described in the beginning of this of in are that of those facility is security control, and does provide certain independence. different degree used views are also external allowable is provide accomplish section. views the Examples this kind of facility are DBTG's Sub-schema facility <DBTG76>, IMS's logical database facility <IMSa>, and System R's view p. 52 facility <Astranhan76>. Simulating (2) This level of external data model: different a mapping actually involves more than one data models. in the research of the Database Computer that it has been , hierarchical or network type of external models explicitly models into Another idiosyncratic the the example is information record-oriented DBC model on top of System R by incorporating into the relations these external <Hsiao79b>. model a non-relational data sequence-number field level of multiple-view This <Astrahan76>. support has generally ignored the a incorporating by about data implementation of the shown can be used to accomodate relational, model data DBC the (DBC) For example, possible interactions between different external models due to the explicit altering made to the conceptual schema. (This is conceptual schema in question is semantic-oriented.) It syntactic-oriented is one step but may still not be ideal in due to the fact that the largely than rather above the subschema approach, models supporting multiple types of simultaneously. (3) Transformation: multiple view hierarchy This support, strives to is the is the achieve. It and external data models simultaneously. kind of view support is that the most kind ambitious level of the that supports The basic functional the multiple types of premise conceptual schema this of is an embodiment of all the knowledge available, and the external models are merely different templates for abstraction and of this knowledge. transformation As pointed out in <Falkenberg77>, the process 53 p. of realizing external the 'de-conceptualization' views analoguous is process discussed in linguistics. <Klug77>, the correspondence between the external levels is described to be similar in and the to Also in conceptual nature to the implementation of one abstract data type in terms of another. The implications of implementing the last type of multiple-view support are the following: (1) Three levels of abstraction for the mapping process are to be considered <Rothnie76>. At the highest level there is data the model correspondence, which specifies the mapping language for the two different schema level) , models. At the data definition level (i.e., the an association between the exteranl schema and conceptual schema is defined in terms of the mapping language. instance the level, the At the mapping must perform the translation of the external operators into the conceptual operators. (2) mapping The (basically name has to cover mapping) and both correspondence structural operational correspondence (i.e., translation of operators). (3) There has to be control over interference different among views and proof of correctness of the mapping. Correctness of discussion in <Paolini77>, a also gives an Mapping: There has relatively been little mapping. In need for formal definition of mappings is proposed. It the literature about correctness of example of interactions among external models. <Borkin78> furthered this study by providing a set of Borkin formal p. 54 models in definitions for data model equivalence. INFOPLEX results hierarhcy. than in data Support of multiple external INFOPLEX approach: external view levels in the functional multiple These levels are basically parallel to each the views expressed in a rather Each level is designed to provide all hierarchically. organized other, particular external data Presently, model. the relational, the hierarchical, and three data models are supported: the network models. In chapter six we how show shall structural the conceptual schema are Proof specified. of correctness conducted as one of the future research dimensions. a will be Basically, to show mapping is correct, the following is to be demonstrated: Given a conceptual data model of conceptual operators would like to C^ construct and a operational mapping algorithm t 2. 3 as mapping between the three types of external models and our operational that well as Summary = MCc) a C, an external data model set of external E, operators a set E we strucutral mapping language M and an M^ such that p. We have discussed the formation of the general functional structure architectural and We have functional examined the literature logical data hierarhcy. both Research in the models and physical data models is reviewed to shed light on the structures to be functional from points of view and identified functions and their organization in the functional hierarchy. of the hierarhcy in the context of recent research in the database management systems. area of 55 supported by the levels of the Finally, the need for mul tiple- view support and problems associated with it are discussed. . 56 p. MEMORY MANAGEMENT III. As described in consists computer hierarchy, which is large virtual previous the a collection storage, and the database management functions. between these two components. functional hierarchy — storage of functional There is a the INFOPLEX components: hierarchical two of chapters, devices implementing hierarchy level a because it virtual storage interface in interacts that logical units of data) the the storage with large the We have isolated memory management as deals of a DBMS. a physical memory issues (e.g. with bytes and byte addresses) which are very different from logical (e.g. a implementing We now start at the lowest level of virtual storage address space. level storage the hierarchy through the virtual storage interface and manages separate database issues This level is depicted in Fig 3. 1. 3.1 The id approach The primary task of the memory manager of storage virtual while We first give manage vast a volume insulating the rest of the system from the details of virtual memory management. proposed here. is to a An approach using the data id brief description of this approach is and then provide some rationales. This method divides the entire memory into pages. data is stored in a page, it is given an id. When piece of The id comprises the page number and a extention of the TID <tuple id> concept used in System R pointer slot number within that a page. idea The ( is Astrahan76 an ) : Higher Level Mem Mgt Interface y_ Mem Mgt Processor local memory /\ \/ Storage Hierarchy Fig 3.1: page h Data id h — A \ Memory Management Level page# X data area slot# ) pointer slot format: Flags pointer area 58 p. The pointer slots reside in a special region, called the pointer within each The concept page. If a item is moved the is presented in Fig 3.2. data item is moved around within needs offset pointer slot contains an offset within this Each page. pointer to another page, slot actually a flag points same the only page, remains unchanged. be updated, and the id to area, the a data is set in the original slot, and to the data If item's new page/slot. This operation is illustrated in Fig 3.3. the data element is to be moved If its original time, (see Fig 3.3). pointer slot (h,k) In order to to is a new page updated and for (l,m) second a is support this process, the original released id must be stored along with the data element each time it is moved (h,k) to a new page. Using this approach, the memory manager isolates the virtual storage from higher levels of the system, i.e. higher level modules do not have direct access to the virtual storage. Levels above the memory manager reference data physical 'byte addresses'. (a) solely via the id, and are not concerned with Advantages of this approach are: The length of the id may be much smaller than that of the byte address of the virtual memory. (b) data It provides a (i.e. address) id) such separation between the logical identifier of the and that the physical identifier (i.e. the byte when data are moved around in storage, their id's need not be affected. page h data (a) h id; k page 1 p. 59 p. The memory manager (c) insulates detail' 'byte the 60 other from levels, centralizes the memory management algorithms and policies, reduces therefore and complexity the eliminates contention involved shared in other of usage levels of and storage the hierarchy. 3.2 Allocating storage space When request to store a manager has to store data a In order space, the memory manager has to have keep table, a in which chained memory the to keep track of free storage catalogue. a (2) method One is to each page has an entry, which points to the first free slot in that page. are received, is allocate some free storage to this data item and (1) and update pointers. it item All other free slots of same the page The catalogue also contains an offset of that together. This is length field is page which indicates the starting byte of the free data area. shown in Fig 3.4. Data stored for items are of variable lengths. v;ith each item. When an id is Therefore, presented to a the memory manager retrieval of the item, the length of the item is first examined and the number of bytes to be pulled it.self may be of out determined. variable length to accomodate a The length field wide range of sizes of data items. For simplicity, boundary. When a data item the size of the is usually not split across page free data area of the first available page is not enough to store approached, until a Another field found. a particular data item, the next p. 61 page is page that is capable of holding this data item is may added be the to which catalogue, page indicates the size of the free data area of each page. 3.3 Page reservation and clustering consideration Even though c-ntrally decisions allocation definitely is it advantageous storage have to at the memory management level, made there are situations in which, for practical reasons, certain areas Once the storage space are requested to be set aside by higher levels. pages assigned are be dedicated to certain purposes, to longer be used for storage of items outside example, some pages may calculates a the id of id to the memory manager, they can no purposes. these For be reserved to store only elements of a set which are stored according to actually of of hashing a function. higher The level data item to be stored and passes the instead of having the latter assign the id. Special parameters are incorporated into these commands. 3.4 Virtual Storage interface It with is the through this interface that Storage memory, and conventional Hierarchy. the memory The latter provides STORE/LOAD operations computer. Details are of manager a performed virtual storage interacts byte-addressable as within a operations are completely concealed beneath this interface and are responsibilities of . p. the storage hierarchy. equipped with memory of a memory The simply manager views itself 62 as very large size. a 3.5 Memory management interface The memory manager provides the following functions for modules of higher levels that call for service (refer to Fig 3.5 for their logic): operations arguments (mode, byte_string, CREATE idO) UPDATE (id, new_string) DELETE (id) RETRIEVE (id) RESERVE (code, no_of_pages) CREATE is invoked when a data item (i.e. be stored in the database. storage area CREATE: (1) is (2) may be a byte string) passed is to Other parameters concerning the data item's passed at the same time. In Regular mode, There are the calltr does not care where 3 modes for the item to be stored. In id mode, the caller specifies the id of the item to be created (3) In approx mode, the caller provides the id of another item around which this new item is to be created. UPDATE replaces the old content of the data element designated via id with the new byte string passed. the same size of the old one, new the if stored is page) is string byte discarded, and a If the it new string is of a p. 63 smaller or written over the old one. is However, larger, the area where the old string is is new free data area (preferably in the id is not affected. In either case, sought to store it. same the DELETE is effected simply by chaining the pointer slot to the free slot chain and setting not recaptured until compaction page a through pages to collect them. the number this counter exceeds a deleted bytes of functions to is invoked walk to order to facilitate page compaction, in a a flag is set for this page, and (see Fig. that When particular page is recorded. critical value, a module In request for compaction is filed In addition The data area freed by DELETE is flag in the slot. a 3.4). may be invoked through the interface, there are miscellaneous housekeeping tasks to be maintained. compaction Page is one, and reorganization may be another. data area statistics collection data and Other design issues such as page size, sizes of pointer slots and length fields, as well as size, whether an overflow area is to be reserved for the page, etc., are to be discussed in detail design. One final consideration at this level catalogue is be to stored. is the place where the Since the storage hierarchy is directly accessable at this level, it seems natural to use part of this storage to store the page catalogue. An is virtual adequage number of some pre-determined pages may be assigned to the page catalogue, of the catalogue page retrieved by the following formula: and entry 64 base address of catalogue + (page no - 1) * size of entry The structure and information to be stored with the catalogue are to be determined as to how during large decomposition, the detail design. catalogues since the set are of to In be general, it opens maintained question functional in catalogues represents a a very large database, and data structure manipulation functions devised to maintain the database may also be needed catalogue represents a to housekeep catalogues. The page design problem and different alternatives and their tradeoffs are to be explored. . ; ; ' ; ; ; 65 operational commands at memory management level Arguments suffixed by '*' are return arguments) (id, byte_addr*, code*) calculate slot addr of id; load content of slot; if free flag set, then code = 'free', and return; else if new page not set, then go to 7; else use info in current slot to obtain new slot address; load content of new slot; calculate addr of data item and store it in byte_addr return (byte_addr, code='o.k.') Fig 3.5: (note: LOCATE 1. 2. 3. 4. 5. 6. 7. 8. CREATE (mode, byte_string, idO, id*, code*) mode: 1. LOCATE (idO, A, R) 2. if R not equal to 'free', then Return else id 3. 4. 5. 6. 7. id = (Code= contention ' ') idO; F_CHAIN (' remove id, l=length byte_str ing) LOCATE (idO, A); store byte_string at A; Return (idO, code= o.k approx mode: 8. (assume idO=(pO,kO)) p=pO; l=length byte_str ing) 9. if Reserve (p) not set, and F_space(p) >= 1, then go to 11; else 10. p=p+l; go to 9; 11. let k=F_slot(p) and id = (p,k) 12. go to 4; regular mode: 13. let p=PAGE; 14. if F_space(p) >= 1 then go to 11; else 15. p=p+l; go to 14; ( , . ' ) ) ( (Note: DELETE 1. 2. 3. 4. 5. 6. 7 UPDATE 1. 2. 3. 4. 5. PAGE is a variable that points to an immediately available page) (id) LOCATE (id. A, Code); let Ll=length of data element at A; F_CHAIN insert' id, LI); if new page flag not set, then return; else let idl=id of new slot; F_CHAIN (• insert' ,idl); return; ( , • byte_string) LOCATE (id. A, Code) let Ll=length of data element at A; byte_st r ing then go to 8; else if Ll>= length let idO=id and call CREATE ('approx', idO, byte_string, id); if id and idO are of the same page, then update content of slot designated by idO; and call F_CHAIN ('insert', idO, LI), (id, ( ) . . 66 6. 7. 8. 9 and go to 9; else set new page flag at slot designated by idO; go to 9; store byte_string at A; return; F_CHAIN (op, id, L) where op=' insert' or 'remove' assume id = (p,k) 1. if remove op, then remove kth slot from free slot chain, update F_data(p) by adding L to it, and if p full, declare it; 2. if insert op, then insert kth slot into free slot chain of page p; increment delete_byte_counter by L. If critical value exceeded, add page p to compaction request queue; set free flag of that slot. 67 INTERNAL STRUCTURE IV. 4. 1 Introduction In chapter two, we have discussed the choice of We shall, in the present chapter, expand this internal modelling. for concept and show how the internal levels of designed are concept BEU the hierarchy functional the support the binary-network conceptual data model with to various data structure techniques. Our approach produces to the conceptual data a gradual mapping of the internal model. highest The level of our structure, the binary association level, may be viewed level of the conceptual model loosely construct building. then distributed coupled with set of is parameters levels that are parameters specifying how many indexes particular level lowest the as simply are be to internal consistency concern. of binary the guide that These parameters are checked for to internal The data definition language to itself. be accepcted by this internal structure definition construct and example, For maintained for a elements are processed by the unary set processor, while those specifying how data is to be edited before being stored are processed by the internal encoding level. It is easy to show that changes in techniques of internal representation can be accomplished by changes in the parameter space presented to the internal schema writer. The parameter the database conceptual space is virtually the collection of tools available to designer. definition is an example of how a While these remains stable. parameters may change, the This parameterization approach true separation of the conceptual schema and the 68 internal schema may be brought about. the following specifications are made to the BEU's: In our design, 1. A BED is the smallest logical unit of data to a collection of generically similar BEUs. similar' we mean that they share common control 'generically By information which has been factored into the catalogue entry of the unary set. identifier field (which is used to name the unary set) by link a to 'factored'). contiguity. 2. Binary catalogue entry (i.e. a This link may be a pointer, is to be specified by the It is are links also The replaced the identifier field is a table, or via physical internal schema writer. are implemented by association links. relations meaning of these entries. and They are grouped into sets, called Unary Sets A unary retrieved. set is stored be factored into the The catalogue Binary links may again take the form of actual pointers, physical contiguity or data duplication. 3. We break the relationship pointer field of the BEU into two areas, one called SP area (Set Pointer area) (Associative Pointer area). has to exist before Therefore we follow managing levels. other, 4. these two a It natural types of unit unit route that connection set breaks the task of into two hierarchical processor level, and the on top of the former, the binary association level. The stored representation of encoding encoding is obvious that an its associations to others may be created. One is called the unary built and the other AP area may the data value field of the be very different from that of the unit being processed at various levels of the system. This specification, if 69 common to catalogue encoding units of entry of that a certain set, may be factored into representation transformation as relationship management. We set. a Therefore identify the task of stored very different a a task from level called data encoder the is isolated for this job. To conclude, 4.1. the format of our BEU takes the shape as shown in Fig p. Header part Length Eield Data part 70 p. 71 4.2 Data Encoding Level this section, the data encoding level, and the In unary set level, the phrase a one, 'set', unless otherwise qualified, to the unary set, while the phrase to next 'element' or refers element' 'data the refers BEU. The data encoding techniques is singled out to implement various encoding and text editing, such as suppressing of data in level blanks and duplicated characters in the text, other text compaction techniques, crptographic methods to encode data for protection, etc. An element to be stored it belongs to. A catalogue is passed to this level along with the set traversed to determine whether it is is a set of which the data part is to be encoded according to some specified function. encoding If it the element is stored as it is; is not, function is before it is stored. stored element is set if it has gone through encoding, and procedure (i.e. it is, the located and transformation performed on the data part of the element (refer to Fig 4.1) a if decoding) A flag a of reversed followed when this element is retrieved. is 4.2.1 Data Definition Interface A set that requires data the Encoding Structure definition, as shown in field encoding will have Parameter the following: Space (ESPS) an coupled element In of its p. 72 Define_set (Setname, other parameters, u6ESPS). This parameter is then given u the data encoding level of data encoding methods may be precoded name, may and to build into this level, each given be invoked by giving this name. a Various types the set name serving as the key entry. with catalogue, to a Data encoding is then accomplished by executing the procedure that implements the method. To facilitate fast retrieval of catalogue entries, hashed be generate to address the catalogue is small, it may be stored level; it if accomodate it. large, is procedure memory working the If this of an entry is made up of the set name and The latter is represented excuted. be to of its catalogue entry. the may set name the virtual storage may have to be used to In either case, encoding method name. in a by pointer a to a Procedures, again, may be stored either in the working memory or the virtual storage. 4.2.2 Operational Interface Requests to create, delete, update and passed of a down from higher levels. retrieve an element are An element is distinctively composed header part, which is intact at this level, and a Also data part. passed as an argument is the set name of the element. In memory element. short, this level sits between the unary set processor and manager to perform transformation of the the data field of an Clearly, the system will still function without this level. p. It the to represents an option presented to the user. encoding 73 When this level exists, methods that it supports may also differ from one system another, depending on the needs of the user. 74 p. Unary Set Level 4. 3 4.3.1 Introduction The purpose of this level is elements link to facilitate fast retrieval of an element in element stored one Primary Set. These defined. in is clarified Every first. The set processor maintains a catalogue of sets all sets may be logical unary sets defined by the user or purposes. higher levels information store to share stored database Every for Therefore, "set", to the unary set processor, merely some collection of data elements that properties. and the storage hierarchy belongs to one and only sets defined by modules at housekeeping sets set. a The meaning of the "set" may need to be data into element identified by the combination of a the in set name and certain is common uniquely content the of the set. An element. Every catalogue entry serves as the 'head' of contains entry contains a contiguity, used to such as sort, Together with the element, the set is implementation also index, given to the is of a set. member, its hashing implement retrieval mechanisms. data element when passed to this level belongs primary pointer pointing to the first instance of information, other concerning information a The and It and physical format of a shown in Fig 4.2. name to set processor. which this element Accordingly, the set processor pulls out the catalogue entry of this set, concatenates a set p. '1 "^ length field Data AP area -z^ Header Fig 4.2: Format of an element as it is passed through unary set interface ] 3 length field SP area AP area Data Header Fig 4,3a; Format of an element after "SP" area is added to it Catalogue Asv AP "MARY" STUDENT H ^SP AP Fig 4.3b: Inserting an element into set "JOHN" a primary 75 . 76 p. chain pointer a proper field to the element, and (SP) position in inserts this element into This is shown in Fig 4.3a and 4.3b for the set. sets that are implemented as linked lists. 4.3.2 Primary Sets and Secondary Sets (Subsets) There are two types of sets implemented at this level. primary set, chained by the primary set pointer primary set is created by actually storing a (PSP) One is the A member . of a data element into the data base The other type is the secondary set. secondary set by is Inserting an element into way of passing the id of the element (i.e., element is already stored), and the secondary set it belongs to. is a catalogue the linkage entry for each secondary set defining the this of A set. set processing is very useful the form l:n or n:l. this level • available It makes to in the There structure of of this type can be considered subset, in contrast to the primary set discussed above. set a This mode a of implementing binary associations of implemented the retrieval mechanism subsets of elements as well. An example at is given in Fig 4.4. 4.3.3 Catalogue implementation Catalogue entries by themselves are members of the name of CATALOGUE. Techniques used to a primary implement set sets by and s . p. CATALOGUE ^—^: PSP SP, AP 77 DATA .2ML AP PSP DATA SLOAIL. JOHN 1OLIT U LARY RT.OAN.F^n Unary set DEPT unary set STUDENT Fig k.h-i Subsets - . Binary association between unary sets DEPT and STUDENT is of the type l:n. In this example, while SAM and JOHN have AP's pointing to SLOAN, SLOAN' AP points to a catalogue entry SLOAN. ST which chains SAM and JOHN together with a SP> CATALOGUE INDEX TABLE Index Pointer STUDENT ^ Lov; Id CHRIS id3' EVON id5 1 idL AMY 'id2 EVON BOB tzz Fig ^.5a: DAVID Sorted linked list and its index table 78 facilities available for search and retrieval at the catalogue as well. To employed process to set is defined, catalogue the transformed set is into an a After command catalogue entry is to be retrieved, a retrieval be is the catalogue set. of a which a Most structure the command to define insertion can illustrate, the first of this set into the catalogue set; entry catalogue structure the the catalogue set is hased. for example, likely, of defines which level DEFINE-CATALOGUE, Data Definition command to the set processor, statement this regular unary inserts the likev;ise, when a command used is to accomplish this job. 4.3.4 Fast Search Mechanisms The set processor is responsible for presenting to a caller, to given its set name and data part. accomplish sequential retrieval of a It particular set. accomplish this retrieval task; for example, whether desired, whether it is The. internal set is to be implemented in order to a is a a sorted data part. If and sorted, a a affected, are updated. structure requested, the set members is usually Other parameters may be added to determine of the index table. specified, are member is inserted or deleted, index entries, if the A pointer to the beginning of the index table is maintained in the catalogue entry of that set. table of the scatter table is to be maintained for the hashed an index is when linked two-way or one-way link, whether an index is to be built on the data part (or part of the data part) element, and whether element may also be required schema, therefore, specifies how list stored a If a scatter the hashing function as well as the beginning of p. the scatter table are stored with set the catalogue entry. 79 Other techniques may be incorporated by augmenting the parameter space of the catalogue entry. 4.3.4.1 Sorting Sorting A module used to facilitate sequential processing and is SORT used to perform this task. is indexing. To make this mechanism more powerful, sorting can be based on the entire data part or part the data part. The sort field may or may not be unique. be desirable that sorting be performed according to the It data of may even part of elements of another set whose id's are part of the data part of the set to be sorted. These different modes of sorting are specified when sets are defined, and indices built accordingly. The sort module is invoked after the Then sorted the of is first loaded. set is maintained by logic incorporated into INSERT, REMOVE and UPDATE functions. time database It may be invoked during the operational the database to reorganize sets that are previously unsorted, or sorted with another key. 4.3.4.2 Index Table Implementation Suppose we have index table implemented whole table on in the a sorted linked list as shown in Fig 4.5a, and right is built for this set. several ways. If contiguity and stored away. Index table may be it is by physical adjacency, may be considered as a an then the sorted set implemented by physical When search in the table is desired, the p. entries the of table are retrieved the same way members of set are a retrieved, and decoded according to its structure parameters (e.g. length of each table entry) are that stored with index the 80 the table. These parameters are passed when this set is defined by DEFINE-SET. Another approach would be to build the index linked list, then and make be built. also then the higher level example If by sorted a multilevel A index the lower level index table is built as index table is merely an index on this indexing of as use of functions designed to manipulate linked lists to manipulate entries of the table. may table linked list given is in a set, set. An Fig 4.5b, and a multi-level indexing example is given in Fig 4.5c. When removal of an item in an indexed set is requested and element is a example, on is a the BUILD-INDEX) and module the critical value. is BUILD_INDEX a delete or insert is called when During the database load period, A periodically index table as the set is augmented. counter may be incremented when a set, a reaches of reconstruct to that member in the index table, the table has to be modifed. module that builds index tables (called called if is For done this counter this mode calling can be suppressed and indices built only after the database fully loaded. Essentially, BUILD-INDEX would visit every element of the set select elements at a and particular interval to be entries in the index table. 4.3.4.3 Hash Table Implementation 1 p. Catalogue entry STUDENT ir;Dx 81 .Control in fo, about index J2i: ^HRJS tl^ C=i [evon | j_d3I I INDEX TABLE id 5 id3. I CHRIS Fig 4.. 5b: Catalogue entry EKP_NO r Index table implemented as elements of a set ^ Control info. Indx p |0^50fid3| 102^0 First level Index table Fig ^.5c: Control info. 1 ici^ Second level Index table Multi-level index table implementation 82 The hash pointer contained One approach is to assign to the catalogue points entry to a function and other parameters are defined. the hashing where location in certain series of pages that are to be used a store this particular set. The hashing function is performed on the whole or part of the data string, and returns number slot a and the The essence is to generate lower positions of the page number as well. number which belongs to the area assigned to this set. This area an id is reserved for this set only and no other sets are to be stored within it. When collision occurs, either starting from the collided id pointer chain or a linear a search can be used to handle the problem. Again many of these properties may be par ametar i zed together the internal schema of the set involved. when the data part of an element in One also has to be careful hashed a with set is modified. For example, if JOHN is modified to be JOHNNY, and if H(JOHN) = 0100 H (JOHNNY) = 0907, then in original. the new location (0907) a flag set to point back to the is The logic of hashing is incorporated into INSERT and UPDATE. \ 4.3.4.4 Summary of Fast Search Mechanisms The salient feature of this level all is • the fact that approximately techniques for internal representation that are geared toward fast search or retrieval of an element of incorporated a set into the structure of the set. can be parameterized and What has been shown is an example implementation, in which sets are primarily linear search, facilities 83 either as or by physical contiguity, and search mechanisms include lists linked stored p. may sequential indexed be provided if However, hashing. and other the parameter space of the structure of the unary set is augmented. 4.3.5 Data Definition Interface This is the interface across which sets as well mechanism and encoding format are defined. retrieval of unary sets is given in Fig 4.6; a The structure of The data definition the SP area of an element is also given. types, their as typical definition of a language unary set looks like the following: DEFINE_USET (Uset_name, other parameters, x^SSPS, uCESPS) Where u is to be passed to the data encoding level, and and entered into the set catalogue at this level. DELETE_USET Two is processed other commands x and UPDATE_USET are used to modify set definitions. stands for Set Structure Parameter Space) (SSPS . 4.3.6 Operational Interface It data is through this elements are interface made. data part, and/or header part. to be found in Fig 4.7. operational manipulations of Commands are provided to insert, retrieve, delete and update members of sets. or that Arguments include the set name, id A list of commands at this level is To provide a feeling as to what operations may 84 be involved when these commands are invoked, some general logic is also given in Fig 4.7 4.3.7 Conclusion of Unary Set Level A summary of modules identified at this level Some modules are implemented on top of between in modules others, is and given in Fig 4.8. inter-connections are delineated according to the module logic outlined Fig 4.6 and Fig 4.7. . ) 85 Fig 4.6: (1) (2) DD interface at unary level Def ine_catalogue This command defines the structure of the unary catalogue. : Define_Uset (Uset_name, x€ SSPS, u( ESPS): This command defines a unary set, where SSPS stands for Set Structure Parameter Space, which may include specifications of the following parameters: a. set type: primary set or secondary set (i.e. subset); set element storage location: this parameter specifies b. how the storage location of each element in this set is determined; there are 3 modes: in this set is the location of each element (1) id mode: determined by higher levels; hashing mod*^the location is to be determined by (2) hashing the da:.a part of the element; link element location determined by mode: (3) system described below; This parameter specifies how elements c. set element link: in a set are linked together; there are 3 different ways: (1) (2) (3) pointer: by way of link list; pointer sequential: by way of sorted linked list; physical contiguity: by way of id contiguity; this parameter specifies the number of indexes to index: following the required, built. For thus be each index information is furnished: (It may Which part of the element is to be indexed? (1) either be part of the AP area or part of Data area); indexing partial If (2) Full indexing or partial indexing? be? is used, how sparse is it going to (3) Will any sort previously performed on this set be useful? the location and entry (i.e. structure of the index? (4) si ze etc d. , . sorted (by be a set may additional sort and indexing: link list) based on different keys, and further indexes may be built. These are specifed in a similar way as described in d. above. e. (3) Update_uset (Uset_name, xtSSPS, utESPS); (4) Delete_Uset (Uset_name) These commands effect changes or two deletions of a catalogue entry. The former may force internal data re-organization, while the latter may involve deleting all the elements in the set. Ramifications will be studied and : deta iled and ; ; ; , p. Fig 4.7: ^fi operational commands at unary level Create_element (Uset_name, (AP, data) or idO, id*): This command creates a unary element. 1. 2. 3. retrieve catalogue entry by Retr ieve_element ('data' mode, Uset_name, ctl_entry*); decode ctl_entry; case 'set element storage location' of id mode: id=idO; hashing mode: do; perform hashing on data; generate id; format SP area and then BEU; id by try: try to store this element at location CREATE ('id' mode, r eturn_code* if return_code = 'contention', then call Collision_Handler id*) and go to try; ) ( end; 4. system mode: continue; case 'set element link' of pointer or pointer_sequential do; format SP and BEU; try to store this element by GREAT ('regular' mode, id*); : end; physical_cont do; obtain Last_id and Inc from Ctl_entry; if id out of bound of reserve area of this set, then call Reserve_more; id=Last_id + Inc; format SP and BEU; store BEU at id by GREAT ('id' mode); : 5. end; if set element link is pointer then call Insert (Beg_po inter , id); else if set element link is pointer sequential then do; id2*); call Search (Uset_name, data, fnd*, idl*, call Insert (idl, id, id2); end; 6. 7. additional link list sorts and indexes are specified, then update the lists and indexes; return id) if ( Retr ieve_element (mode, Uset_name, full_data or partial_data or id, (AP, datar, idr)*, code*); data mode: 1. call Search (Uset_name, full_data or par tial_data fnd*, idl*, id2*); 2. if fnd = null then return (code = not_found else do; ' ' ) . ; 87 id r=fnd; call Retr ieve_element ('id' return (AP, datar, id); mode, f nd , AP*, datar*); end; id mode: 1. 2. 3. RETRIEVE (id, byte_string*) decompose byte_string into SP, AP and datar; return (AP, datar, idr-id); Search_element (Uset_name, patial_data or full_data, fnd*,idl*, id2*) This subroutine locates elements in a set. It is also responsible making an intelligent decision about which of the following for access paths to use (if available): 1. 2. 3. 4. 5. hashing indexed indexed sequential binary search' linear search in the Information necessary for this decision making is stored (When only partial data is specifed, and if unary set catalogue. more than one elements contain that partial data, all of them will unless otherwise suppressed by the be located and returned, caller) Delete_element (mode, Uset_name, full_data or partial_data or id, code*) This command removes an element or a group of elements from a set. related If the Uset_name is a secondary set, only the connections That is to say, only the part to the secondary set are removed. of the SP of the element that is used to chain this element to the subset is affected. this If the Uset_name is a primary set, then element is deleted from the database, so are its connections to all subsets. Special attention is paid to connections made through hashing or physical contiguity. Update__element new AP, new data) This command replaces the old content of the element designated by id to the new content specified within the command. (id, 88 Define_ Catalogue Define Uset Delete. Update. Uset Uset Retrieve. Update_ Create_ Delete_ Element Element Element Element Collision. Build. Index Search Handler Access_ Path_ Selection Fig ^.8: Unary level sub-modules Sort 89 4.4 Binary Association Level 4.4.1 Introduction schema. conceptual levels terms in hand, On the one of with communicates it higher units', such as primitive sets and 'information conceptual binary relations specified in the hand, the in serves as the bridge between the conceptual and It the internal models. specified associations binary implements This level schema; on other the such talks to the lower levels in terms of 'stored elements' it as BEUs and unary sets. essence The mapping this of briefly is summarized below: a. set in the conceptual model primitive A necessarily) mapped Therefore a to primitive a unary element set is usually implemented by However, there are situations in which exactly correspond to a unary explained with details later, embedded set. in an set. when model. internal the in usually (but not is a a primitive For set a BEU. does not example, as will be primitive set associated set, it will not be mapped Rather, its existence is manipulated through the to is to a unary be unary set that implements the embedding primitive set. b. Binary associations of within the AP area of its BEU. 4.4.2 General Mechanism a primitive element are implemented : 90 The function of this level primitive elements. among implement to is keeps two catalogues; It describes the collection of primitive sets defined and one, called CTLP, database, the for other, CTLB, contains information concerning binary relations the An entry of the latter is composed of names among these sets. sets involved in function the associations binary the the binary relation, their reciprocol attribute names, type l:n (e.g., and the association etc.), n:m, or specifications, Based on these structure structure. of sets unary and their formats are defined. Recall that is composed of a BEU, a an Association-Pointer area, (SP) stored element, a (AP) area, and a data part. area is created and manipulated at the unary set level, area to is be Set-Pointer while discussed in section 4.1, (AP's) have we and set made the pointers distinction (SP's) As between they since represent different types of connections among data elements. to AP and maintained by the binary association constructed associative pointers to the The AP area contains a collection of associative pointers. level. used The SP An SP is chain BEUs of the same unary set together, while an AP is used connect primitive elements of different primitive sets together. Function types refer to the way elements of two binary sets are related. There are design, we have identified 3 4 types: different 1:1, associated l:n, n:l and m:n. modes of binary In this association implementation 1. Pointer mode: In this mode, 1:1 type is implemented through 91 inserting associative pointers into the AP area of An association pointer binary association. (Recall that of unary iii sets, secondary set.) is the id l:n and n:l 'are implemented by creating section 4.3.2 it being one v/as to a subset. mentioned that there are two types the primary set, the other the subset, or Many-to-may type 4.9a Fig in element. data of the counterpart element in this is effected through that incorporates the binary elements involved. shown the 4.9c. Fig Note These that the a dummy unary set structures subsets are may be implemented as either linked lists or pointer arrays. 2. Physical duplication mode: assoicated Instead storing of the id of the element, the data part of that element is duplicated in the AP area, as shown in Fig 4.9d through Fig 4.9f. 3. Physical embedding mode: element physically is 'embedded' associated data the associating element. This Under this scheme, stored within element may have its own identity, belongs to certain primary unary set and is as shown always in Fig in the sense that it recognized by the unary set processor, as shown in Fig 4.9g, or it may be manipulation the a sub-unit, such that its depends on manipulation of the embedding element, 4.9h. 4.4.3 Data Definition Interface Data definitions of pr imitive, sets and binary relations are passed through this interface. Also passed are values of parameter spaces to be discussed later. parameters in the Two statements are identified: p. 92 id. 12 FIRST ST CA.M Header Fig 4.9a: Fig 4.9b: Pointer mode, 1:1 Relationship Pointer mode, l:n (children of JOHN) Relationshii (EMP#) (dummy elemer Fig 4.9c: Pointer mode, m:n Relationship (EMP PROJ) — EADDR Fig 4.9d: ENM EMP# Physical duplication mode; 1:1 Relationship Children EMP# RICH AP 7> Fig 4.9e: Physical duplication mode; l:n Relationship I 1 1 p. 94 other elements DATA CAMB MA other AP' s EOOl embedded ADDR element other address elements Fig 4.9g: Embedding mode: embedded element has its ov;n id 95 Define_Pset (Pset_name, x€SSPS, uEESPS) Define_Bset (Bset_name, zeASPS) ASPS represents Association the Structure specifies the function type, sets involved data structures implementation chosen implement to specifications in this Parameter this binary relation, and binary are parameter spaces sections 4.2.1 and 4.3.5). and therefore becomes provide guidance to the binary level in that are of the The SSPS and processed at lower levels (see The binary level defines aware These set. building the structure of the AP area of each element. ESPS which Space, the unary sets, existence of these unary sets. Brief statements of logic of these two definition commands are given in Fig 4.10. Binary set definitions may be deleted, hand, when are the a deleted. primitive sets deleted. When a binary involved are left intact. set On the other primitive set is deleted, all binary sets defined upon Binary definitions set modification may represent a may also is be modified. it This data reorganization at the internal level. The detailed mechanism of this modification will be studied. 4.4.4 Operational Interface Through this interface, insert, delete, update elements in primitive or binary sets are made. and retrieval of 96 modes Retrieval of the data base is done in two Unique (1) and satisfies the requirement is retrieved. satisfy that items sequential processing, Under requirement the second the retrieved. are mode first the level: the first mode, an item that Under mode. Set (2) this at all To facilitate classified further is mode, into self-contained commands and sequential operation commands. Since the majority of existing very applications database are procedure oriented, and not like the higher level query languages which are relatively self-contained, the sequential operation such as Get_Next become at this level, content a the processor has the need that variable is. of When such necessity. know to that commands all what the current For simplicity, it is assumed that such arguments, the through this interface will be self-contained. assumption However, this commands command is received a commands will pass information of this sort as part of so still may modified be later for performance considerations. At the affects instance level, deleting an instance of only association structure (e.g. the two elements, while deleting a primitive relations stemmed from primitive element is deleted. implemented by a BEU, related elements. re-used, the the id of If problem be left deletes all relation of the binary problem may occur when if a primitive element a is then*its id may be encoded into the BEUs of many is However, for efficiency, practice, For example, binary pointers etc.) element integrity An it. a the deleted element simplifed by setting the unused id of a indefinitely. a deleted When is not to be delete bit for the id. element the id cannot, in of a deleted p. element is re-used, those elements that are originally associated the deleted now would have areas) has assumed problem (and therefore have its id encoded element the a mistaken association to the the deleted element. of id 97 with into their AP new element that One way to avoild this to do as follows: is 1. Locate the element to be deleted 2. Remove it from the primary set and all chains subset (i.e., update the link list of the unary connection) Locate 3. associated other all element to be deleted is stored. 4. id This elements where the Set it to null. also applies to situations where consideration consideration is that the is changed. This bi-directional vertical associations (i.e. a may linked be lists affected accomplished, and 4.4.5 Summary data implication The when a related for example, with bi-directional binary complete binary implementation). The set of commands to be supported at this level 4.11 a database system has to have the capability to pull out elements that may be element this the free slot chain of that page). to integrity this return Delete the element (i.e., set the delete bit and part is physically duplicated within another element. of of the id is given in Fig 98 This concludes functions are now internal our built structure We shall see at Higher level on top of these structures, and communicate with the internal constructs through the level. design. a later interface provided at point how manipulation and query commands are eventually translated into operators that are accepted internal constructs. this by . : 99 Fig 4.10: DD interface at the binary Def ine_Pset (Pset_name, xf SSPS, log ic u 6 as:..oc iation level ESPS) : format header according to the structure of CTLP. header, data = Pset_naine) Create_element 'CTLP If this Pset is to be implemented as a Uset then Def ine_Uset (Uset_name , x,u) 4. return 1. 2. 3. ( ' , Define_Bset (Bset_name, z t ASPS) This command defines a binary set. ASPS stands for Association Structure Parameter Space, which includes the following: 1. names and roles of the two primitive sets involved in this asso c 2. 3. 4. function types (e.g. l:n, n:l, 1:1, m:n) storage mode (e.g. pointer, physical duplication or embedding) the structure of the dummy set if functional type is m:n log ic 1. 2. 3. 4. 5. 6. check consistency of primitive sets involved in this binary set against CTLP format z according to the structure of CTLB into Z Inser t_element ('CTLB', header=Z, data=Bset_name) if function type not equal m:n, then go to 6, else Define_Uset (Uset_name=setl/set2, x3, y3) return The following commands effect changes in the definitions of primitive ind binary sets: Update_Pset (Pset_name, changes in parameters) Update_Bset (Bset_name, chagnes in parameters) Delete_Pset (Pset_name) Delete Bset (Bset name) . p. 100 operational commands at binary association level Fig 4.11: Create_p (Pset_name [Byte_str ingl effects creation of an entry in the primitive set named here; returns an id; ) , Create_b (Bset_name, idl [data_of_role2 (or id2)J effects creation of a binary association; optionally returns an id; ) , Delete_p (Pset_name, id) Delete_p (Pset_name, Rel_op, data) effects deletion of all elements that satisfy rel_op,data) pair; the predicate ( Delete_b (Bset_name, idl, (^id2(or data2)J Delete_b_multiple (idl, n, , (erJ ) (Bset_namei, (idi(or datai)^ , j^ER^ , i=l,n)) effects deletion of a binary association: If "ER" is not specified, then only the association between the two data is deleted; If "ER" is specified, then the association and the incident data are deleted; For one-to-many relationship, the incident data has to be specified; if not, all the associated data in that Bset_name are deleted; Update_p (Pset_name, old_data(or id), new_data) Update_b (Bset_name, idl, id2 (or data2), new_data) updates the content of the incident data Update_b (Bset_name, idl, id2 (or data2), new_id) updates the association between idl and id2 to be idl and new_id Retrieval of the database is done in two modes unique, which retrieves one item at a time, and set, which retrieves a set of elements. A limited ability to process predicate requirements is incorporated : p. 101 unique mode: Find (Pset_name, data, fnd*) Select_p_f i rst (Pset_name, data*, id*) Select_p_next {Pset_name, idO, data*, id*) retrieves the element that follows element at idO; Select b (Bset_name, datal(or idl), data2*, id*) RetrTeves the binary associated element of datal or idl in Bset_name; Select_b_f i rst (Bset_name, dataller idl), data2*, id*) retrieves the first occurrence of binary associated element of dataller idl) in bset_name; Select_b_next (Bset_name, dataller idl), idO, data2*, id*) retrieves the next occurrence of the binary associated element after idO of datal in bset_name; (note: meaningful only when Bset__name is 1-to-many or many-to many) Select_b_join_f irst (m, (Bset_namei, datai(or idi), i=l,m)) gives the first element that satisfies m binary predicates; Select_b_join_next (m, (Bset_namei, datai(or idi), i=l,m), idO) gives the next element that satisfies the m binary prdeicates next to idO; set mode: Retr ieve_p_set (Pset_name, n* ptr*) gets the set of element in Pset_name; n is the number of element returned; ptr points to the area where the whole set can be found; , Retr ieve_b_set (Bset_name, n* , ptr*) Select_p_set (Pset_name, rel_opl, datal, n* ptr*) gives the set of element that satisfies the predicate; , Select_b_set (Bset_name, rel_op, datal, n* ptr*) gives the set of binary relation that satisfies the predicate; , p. Select_b_join_set (n, (Bset_namei, Rel_opi datai(or idi), i=l,n)) gives the set of the elements that satisfy the m binary prdicates; , Count_p_set (Pset_name, n*) gives the number of elements in the primitive set; Count_b_set (Bset_name, n*) counts the number of binary 'tuples' in the binary set; 102 p. 103 DATABASE SEMANTICS V. Making use of functions provided by the internal structure levels, 3 additional levels enrich levels These are established the data build to model and semantic provide constructs. facility a for expressing schema constraints. 5. 1 N-ary Level 5.1.1 Introduction The N-ary level processes constructs schema. defined The conceptual schema defines the total, the database with clean semantics. in the conceptual integrated 'view' of The major functions of this level are the following: (1) Accept the data definitions of the conceptual schema expressed in the BN model (as shown in Fig 2.6), check for consistency and build the conceptual schema catalogue. (2) Process operations on instances of the conceptual schema. 5.1.2 N-ary Data Definitions The DD interface consists of the following definition Define_Vset (Vset_name, other parameters) statements p. Define_Nset (Nset_name, ) stands for Value Set, Nset for Entity Set, and ATPS stands Vset where (attr ibute_naine, ytATPS)_list 104 for Attribute Parameter Space. The detailed format of data definitions processed discussed been has in chapter 2 Define_Vset or Define_Nset will result in schema, each and attribute defined result in an arc being created. items as shown and in a set a checking includes such of and Processing arcs. data definition statements results in building Binary. in interpreting and set of executing The primitive catalogue contains an entry for each direction of an arc defined. in such a way so as to a There are two catalogues to be maintained: each node defined in the schema, and the binary catalogue for the proper equivalence definition, proper domain definition, and operations on the database. entry in the Define_Nset statement will Consistency catalogues to be used by the N-ary level Primitive Either node being created compatible syntax and semantic specification on the of level 2.6. Fig in this at contains an The catalogues are built facilitate cross referencing. 5.1.3 N-ary Operators In order to distinguish between instances of Nsets defined at this level and instances of constructs defined at external view levels, former is called an entity record from now on. and update the entity records are modes of retrieval: supported. set mode and unique mode. the Operators to retrieve Again, there are two Set mode will give all 105 p. the entity records that satisfy the requirement, while unique will give Sequential operator Retr ieve_next is also included. just one. The retrieval operators are listed below: Retrieve_Set (Nset_name, (attr ibute_name WHERE rel_opr, value)_list , Retr ieve_Unique (attr ibute_name att r ibute_name_list) (Nset_name, attr ibute_name_li st) WHERE rel_opr, value)_list , Retr ieve_next (Nset_name, (attr ibute_name, attr ibute_name_list) rel_opr, value)_list CURRENT IS WHERE ((attribute name, value)_list) where rel_opr are the relational operators such as =, note that if the 'next' the list is not in If the domain of Also any of the in the command. instances of an entity set, one has to be careful in defining the meaning of the update. are etc. value set, the attributes of that a non-value attribute to be retrieved are specified To update the >, mode is specified, the current content of the attr ibute_name list has to be supplied. attr ibute_name >=, (Recall that not restricted to normalized relations). defined Nsets here Therefore, the following rules are imposed: 1. To insert employee's a new entity record record into attributes may be passed. to be enclosed 2. To in the (e.g., Nset Values of parantheses. delete an entity record to add EMPLOYEE), a (e.g. all l:n or m:n An example newly a is given hired the immediate attribute in have Fig 5.1. to delete the record of an p. lOG Suppose we have an n-ary entity set EMP defined as EMP(EMP#,EN7\ME,DEPT, CHILDREN) To create a new employee "Joe John", ; use the follov;ing INSERT_ENTITY statement: INSERT_ENTITY (EMP, EMP#="0907", ENAME="JOE JOHN", DEPT="0 015", CHILDREN- ("MARY JOHN", "RICH JOHN")) To add another child to Joe John's CHILDREN attribute, the following INSERT_ATTRIBUTE statement: INSERT_ATTRIBUTE (EMP, ENAME="JOE JOHN", CHILDREN=("JEFF JOHN")) Fig 5.1 and 5.2: INSERT ENTITY and INSERT ATTRIBUTE use 107 p. employee who has quit from the Nset EMPLOYEFV candidate has keys be to passed. All only one rest the of the done is automat ically. 3. To insert or delete an instance of (e.g., insert to relation EMP) attribute a , candidate a key m:n or entity the of given. are relation are not cited. 5. l:n attribute new child into the attribute CHILDREN of the a instance a record and All other attributes within that See Fig 5.2. Updating of any ^htribute value is done pair-wise, i.e., candidate key the of the entity only record and the attribute to be changed are involved each time. The operators to be supported for update are listed below: Insert_Entity (Nset_name, key value, (attribute ,value)_list) Insert_Attribute (Nset_name, key value, (attribute val ue ( s) )_list) Delete_enti ty (Nset_name, key value) Delete_Attr ibute value (s) (Nset_name, value, (attribute, )_list) Update (Nset_name, key value, (attribute, old value, new value)) The logic of these operators is section. key The detailed algorithms operations will be studied. 5.1.4 Retrieval Strategy briefly as well discussed as in the next the effect of these p. When retrieval commands for entity records translated operators upon underlying binary associations. into binary operators are then given to the next Since instance the binary the attributes of an entity set are implemented as the binary asociations to the entity node, establish across These This translation procedure involves access path association interface. selection. level are they given, are 108 simplest the solution to is of the entity node first, and then follow the binary associations to obtain its attributes. instance There may be different ways to establish the desired an Therefore, optimization of the construction effort is entity node. to be considered. followed. set to In general, To clarify this idea, are tree. to constructed be Fig tree, where attributes are However, example is given in Fig 5.4. tree in a the before those of the lower levels of the established 5.3. n-ary an Branches represent binary associations in consideration. in many in Instances of attributes at higher levels of example an there are represent exactly the same relation with the be imagine that the mapping of natural choice of the tree from of shown can record its binary form is to be stored as a represented by nodes. tree a on the choice of the starting role and the path to be depending ways, of It a retrieval equivalent command a is trees that may different starting role. that the latter graph is seen Then An 'hangs' slightly different orientation and completely changes the path of construction. Since the performance, shape it is of a retrieval strategy module. the decision The may tree to be issue may affect made even record construction intelligently involve the by a internal . . . p. structure, such as indices sorted lists. or To see what all 109 these mean, we use the above example to illustrate the different construction efforts that may be involved suppose PROJ# Ideally field. procedure specified is the In example, the the retrieval command to be the sequence in system trees. two translate should command the into generated by subjecting form a that records are constructed according to the sequence such entirely Alternatively, it may sort the records after they are order. tree these in and, if the:T\ the un to . l sort module. a set 'PROJ#' > If we choose sorted in the internal is generated the second desired structure, then records automatically. On the other hand, the first tree form will not result will ordered by in a set of records be the in requires therefore PROJ#, order further a sort Another more obvious consideration would be some attributes are to be restricted Restriction usually takes values if PRO JECT= system (e.g. save some effort if construction starts at that, ' the restricted place when the WHERE clause is ') / of it may attribute. used in the command To summarize, the strategy used in this design is listed below; more examples are also given: (1) If predicate no given is , the entity record construction starts from one of the candidate key attributes of the entity set. (Fig (2) 5.5a) If a predicate, key attribute of the the entity record(s) entity is(are) is restricted in the constructed starting from . . : ) p. that key. (3) (Fig 5. 5b) If none of the while 110 key attributes are restricted in the predicate, non-key attribute is, then there are two approaches: a Locate those unary instances of (a) attribute restricted the non-key that satisfy the restriction and trace back to its entity node instance (Fig 5.5c). Visit every entity in (b) attribute collect and the relevant all from Start set. fields, some and then see whether this entity record satisfies the predicate. for every entity in the set. choice again would be either 3(a) this Do (Fig 5.5d). If multiple non-key attributes are specified (4) key or 3(b) or some the predicate, the in combination of the two attributes sequence If (5) are specifed, then there are again two approaches (a) Construct entity records according to the sort order (b) Construct entity records with another strategy and sort them at the end. centralize To this issue retreival_strategy module is singled out. take (for advantage in are developments in strategy, a Conceivably, this module can the area of query-decomposition, example, <Yao79, Ast rahan76>. When n-ary of retrieval of a storage operation is given (e.g., insert or level has to carry out its semantic ramifications. inserting a declared delete), For example, new entity instance into an Nset, those attributes to be total (i.e., those that cannot the have that value ) p. Ill Fig 5.3 command: & Tree forms of access path selection 5.4: RETRIEVE SET (N. EMP , (EMP# ,ENAME DEPT (DEPT# DNTU^E) , , ) ) loop Fig 5.5a: command: RETRIEVE UNIQUE Retrieval strategy (N .EMP, (EMP# Fig 5.5b: ,ENAME,DEPT#) Retrieval strategy (1) ) where (EMP# = (2) ' 0909 ' command: RETRIEVE SET (N .EMP , (ENAME ADDR) , ) where (DEPT(DNAME = 'SYSTEM')) (DNM) 'SYSTEM' ® i (l:n loop) i ^® /enm^ Fig 5.5c: command: RETRIEVE SET (N. EMP (addr\ (z) , Retrieval strategy 3(a) (ENMIE, ADDR) ) where (DEPT(DNAME = 'SYSTEM') loOD /DNAMfe if Fig 5.5d: 'SYSTEM', then ©"0 Retrieval strategy 3(b) 113 p. must be supplied. 'undefined') Also, if it an is arc, an instance of its hierarchical hierarchical instance hierarchical parent hierarchical implemented have to be deleted; other some to node, must be deleted; 'child' the and if it is instance the all information level and These rules must be and so on. data the discussed later in this chapter certain database a this of make sure that database assertions are maintained. to virtual the instances other to a 'parent' must exist. As another example, when an instance of an entity is deleted, associations of validity level assertions (At to be are that not carried out at this level are also maintained.) 5.1.5 Entity record construction We now show by some examples how an entity record is format The of Nset the catalogue data and constructed. structure used during construction are also exemplified. Suppose we have shown in Fig 5.2c. Fig 5.6. (N.EMP, ) definitions as command evaluates the in is EMP_PROF ADDR, (PROJ#, (PROJ a table of a format shown in Fig 5.7a. first passed to the retrieval strategy module, which possible Fig (EMP#, ) equivalent to generating depicted entity An example format of the Nset catalogue is given in PNAME) ,TIMEFRAC) This 3 Then processing the following retrieval command: Retrieve_Set is database composed of a 5.7b. access It slows paths. a A possible access path is tree diagram that portrays the route according to which the database is to be traversed in order to carry p. Out the command. A data structure that corresponds to this tree is then generated by the retrieval strategy module. structure is shown in Fig 5.7c. The An general example logic of of this record construction given this data structure is shown in Fig 5.8a, while procedure Fig 5.8b. 114 the used to carry out this example retrieval command is shown in p. Nset Cagalogiie Nset Attr List naire 115 Field no " , 1 ; ; ; ; ; Record Construction (Catalogue, Record Reserve wo r kin,; space VS Begin: ID , the record, for 117 p. niap) stack. :ind ST; /'''Initiate the first record */ Initiate: N= number of nodes to be traversed; I-l; J=Field_no (WS (J) , (T) ID(J) )-Select_e_f irst (Domain ( 1 ).> ; NEXT- 2; Ca 1 ] Re record s t ; Gail Print record; Continue: /"Process other records:^/ Do until EOD__o f_s t a r t ing__node /^checking stack for looping*/ Do while scack not empty; I-ST(SP); J=Field_no (I) (V7S(J) ID(J) )=Select_b_next (Bset(Il, WSCPred), id =ID(J)L; If not end_of_Qata then do; NEXT=I+I; Call Res t__reco rd Call Pr i n t_r ecord ; ) , : ; end ; Elsepopstack; end ; /* Establish next 1 data of starting node"/ = 1; J=Field_no(I) ; = Select_a_next NEXT=2; Call Res t_^r eco rd Call Pr in t_r ecord (.WS(J) , ID(J)) ), CDomain ( I). , i d ° =ID(Ji); ; ; end ; Return; Rest_record: /*Internai routine for completing Do I = NEXT to N: J=Field_no(I) If Bset(I) is 1:1 or n:l then record*/ a ' . • . ; (WS ( J ) Else do , I D ( J ) ) = Se 1 ec t_b ( B set (I ), , W S CP r ed ) ) ; ; Push. I (WSCJ). on stack; .ID(J))-Select_b_f irst (BsetCD,, WS (Pr ed ), ). fend end Print_r ec ord Print WS; : /^'Internal routine for output the record just constructed--/ End Record Construction: Fig 5 (a : A General Logic for Record Constructi on (Given schema catalogue and record map produced by Re t r i eva l_!iiod u le") 118 For example, the following procedure will be executed by the record-construction module when invoked to build the table shown in Fig 4.7a: /*Reserve WS 12 WS ID EMP # , ID and ST as follows:*/ 3 4 5 Tl T2 T3 ST 119 p. 5.2 Virtual Information Level 5.2.1 Introduction Semantic or statistical relationships prevalent in a is equal to elements often are For example, an employee's age can be derived database. from his birthday and the current date; account among the accrued its balance multiplied by the interest of interest rate. problems consistency the derived element is also stored, certain bank a If may occur. accuracy. There are two approaches to the issue of database is maintain to a catalogue of consistency constraints as well as some traverse "housekeeping" routines that also be invoked when a from the in may DBMS stored derived from other data. used Housekeeping routines may It maintain the derivation process. be computed. routines used in of computing and Also, since This catalogue of functions a to but approach may do away with those housekeeping recomputing are be This gives rise to the term 'virtual However, it adds to the overhead the previous method. they by those data fields that can be database maintains consistency database information', signifying information that is not physically stored may to sensitive data element is to be updated. Alternatively the eliminating periodically database the of these constraints. satisfaction enforce One a data field whenever it is accessed. not physically stored, it is difficult to make direct retrieval against these fields. the accounts that have accrued For example, a query to get all interests exceeding a certain level 120 p. that the accrued interest of every account be computed. require would these The database designer has to take tradeoffs of approaches into 'derived' from consideration in making internal structure decisions. Extending this concept, any combinations algorithms of <Madnick73>. data and On one extreme, the algorithms through that an (e.g. be physically is may information be employee name direct a given combination a person's age is algorithms of search derived by retrieving the the in employee number). his and purely On the other between these extremes, however, there is information that through stored derived Sine and Consine functions). (e.g. extreme, information may be derived through database may information data (e.g., is derived query on a date current In and a his from the stored database and then performing a subtraction). birthdate Under this framework, several categories of virtual information are identified here: (1) Computed facts: algorithms to compute from stored data; (2) Representation: data type conversion functions; (3) Encoding: data string encoding functions. 5.2.2 The General Mechanism To support the serves as a virtual information implementation, this level front gate to the level immediately lower to it, namely, the n-ary entity level. 121 p. keeps It a catalogue of all information that is virtually defined. Every request to create, retrieve, update or delete through filtered in the request, (1) the gate, and it provides if a data unit of is any virtual information is involved functions for the transformation data. of Computed facts: Any attribute definition that has its The virtual attribute is entered into next level. derivation function also is stored a on (VT) is catalogue. with (or chained along The function may use both raw data and catalogue entry. algorithm, its flag and processed at this level without being passed down to the extracted in virtual The the to) data virtual For therefore effects nested virtual information. example, the attribute VOLUME is derived from AREA and HEIGHT, while AREA may in turn be derived from LENGTH and WIDTH. (2) Representation: As also a request for records is given, the data type of each field given. If it different is from the stored data is type, transformation is done before the element is passed down to the storage Therefore, the virtual information level or returned to the requester. has to be aware of the data types of all the stored elements. (3) Encoding Attributes that need noted at this level. tc be encoded before they are passed down are Encoding functions are given. This type of : p. information virtual service definition languages (e.g. is especially relation names) 122 useful for encoding data processed are before they by lower levels. 5.2.3 Data Definition Interface The Data Definition statements of database constructs Binary or N-ary) that have a and processed at this level. (Primitive, virtual flag attached to are intercepted The definition statement looks like attribute_name list) the fol lowing Define_Nset (Nset_name, where ATPS is recognized ( and ytATPS, vtVTPS) , processed by lower levels, while VTPS represents Virtual Parameter Space, and v is decoded at this level; appropriate, entries are made the in information virtual if catalogue accord ingly. 5.2.4 Operational Interface The operators assocition level. are almost Operations identical on intercepted. the at n-ary attributes are passed nonvirtural untouched to the lower level, while operations are those to on virtual attributes Encoding of database mnemonics such as Nset or Bset names is also accomplished at this point. An elaborate record these mnemonic designations is to be maintained. catalogue to While the derivation changed by updating to be considered virtual element cannot catalogue entry may of virtual a the attribute definition, the element cannot be updated. is function (For example, meaningless). be be inserted It or attribute p. 123 can be individual virtual request to update JOHN'S age also follows that individual deleted; thus operated on. operations are to be processed at this level. only the virtual Therefore, only retrieval 124 5.3 Data Validity Level The conceptual schema legitimate assume. 12 values include and 28 to 31 days allowed be in exceed to constraints on set the of elements in the database are allowed data For example, elements describing months may not some may a date has to be confined to to the calender; employee's salary data certain etc. a maximum; These constraints uphold the data validity of the database. Another type of semantic restriction affects the interrelationship among data elements. courses that For example, conflicting have a student cannot be allowed to schedules, or be appointed assistant before he clears all the 'incompletes' Another example would equal to the sum of branches. one another) allocated if a functional budgets of is year should be subordinate the relationship exists between data elements, the information the task previous section). to term. exactly computable from (i.e. that may be derived needs not be stored, but computed This a teaching last the the total budget of the all that (Note that be from a take translate of the virtual when requested. information level mentioned in When these constraints are specified, the DBMS them realization of them. into routines that Some descriptions of monitor and integrity the enforce the has the constraints are presented in <Eswaran75>. There are also situations that are allowed to occur, but users are to be warned or alerted immediately <Morgan75>. the death rate at the health a certain region detected by management would be an example. A sudden a increase in system used to monitor The DBMS is expected to p. conditions output the alerting message when 'abnormal' such as 125 these are detected in the databse. The data validity, integrity and alerting problems are targets of the current level. General Mechanism 5.3.1 All primitive, binary or nary sets of the database that have these constraints attached constraints, are written translated into a as flagged part they when of procedure that is the data be to defined. are definition, executed is this at The to be level, mostly when update of elements of these sensitive sets are encountered. Operators at levels below shall be used implemented the translated Constraints may also be specified for virtual procedures. There are two modes of enforcement: (1) first mode, a attributes. periodic checking and checking invoked only when data elemnts are updated. the in In implementing hardware timer has to be made available. list, sorted by scheduled time of the events is maintained and either constantly, dedicated processor. the routine is by If invoked is detected, and pending for correction. An event checked way of timer interrupt or through the use of the time of the inserted into the event list. error (2) the a scheduled event has arrived next scheduled time for this routine A message is error printed condition when a a and is database may be logged or left . To support database sensitive delete) the second mode, that the result, 3) update Any precompute the result, 1) catalogue of (including filtered be to a 126 insert and through the check for legitimacy of 2) determine whether to accept or deny the request, and request a constructs. of an element of these sets has procedures if this level maintains p. denied, be to return explanatory an message. operations on non-sensitive data, the requests are passed 4) For directly to the next level How to select the mode of constraint implementation is, DB again, a designer decision, which has to have situational factors taken into considerations. The simplest type of constraint would whether it must be numeric or alphanumeric. may be specified in terms of continuous data element legitimate values of is a consideration, database 5.3.2 in a 1) or range the 2) a the data type, i.e., More strict constraints values of catalogue discrete data element. allowed that contains When for a a set of inter- relationship procedure has to retrieve other data in the order to do the computation. DD Interface The most complex problem at this interface where by these constraints. level is the data definition the specification of the validity and consistency constraints are crossed. to the be Commands are also available to make changes p. Define_Vset (Nset_name, (Vset_name, ( other parameters, w VLPS) or Define_Nset attr ibute_name , w VLPS, v VTPS, y ATPS) where VLPS stands for VaLidity Parameter Space, to be interpreted at current spaces are passed down. 5.3.3 127 list) intercepted and level, while parameters in the rest parameter (See section 5.1.2 and 5.2.3 for explanation). Operational Interface Commands to manipulate data are passed through. The are identical to those to be accepted by the next level, and operators checking elements is performed for sensitive data sets. p. USER VIEWS AND DATABASE SECURITY VI. 6. 128 Introduction 1 The conceptual schema designer may have structured the information into entities constructs and are attributes. implemented database, however, may see from to Operators in levels the look at the data as organized into hierarchy of segments. below. structure of the data a of the conceptual data model. that operate to They a that user end is allowed to look at data in application and choose comfortable use. to data a different ways they may want a sublanguages various a sublanguage In this manner, way most natural to his that most feels he addition, existing application programs are In protected from becoming obsolete due These different is different set of relations or accordingly to operate on these "external constructs". the these End users of the In particular, choose then upon looking of changes to at data in structure. structure of the database the constitute different "user views", also called "external views", of the database. and mapped To support external views, onto conceptual these views have structures. Operators external views have to be translated into operators to be defined performed upon entity sets upon defined at the conceptual level. In addition to providing user views, the DBMS is also required maintain database security. access to the database. to Database security refers to the control of Since an integrated database is accessed by number of users, and since each of them may be granted permission a to p. part only identify a of Read of DBMS has to have the capability to the user, determine his access limitations, and accept or reject Furthermore, permissions may be an access request. terms database, the There Write. and are view of definitions, modification. identified user views three-level a security and Authorization Level authentiates to particular view. a mappings between two approaches to The first one one through is and a hierarchical control. At for top, the Viev; the log-on user and authorizes the defined in external translates the operators. Before going into these individual levels, user views a and Belov/ it, a the the View viev/. formal definition of between external constructs and conceptual constructs is given in the next fashion, by query structure Enforcement Level checks for legality of the operation of mapping is Next, the View Translation Level keeps track of constructs schema, conceptual second the in Here we choose fo use the first approach. We have handling and differentiated generally maintaining database security control <Hsiao79>. way 129 a subsection. single To illustrate transformations database in a coherent example is used throughout this chapter. The conceptual definition of this example database shown is in Fig 6.1.1. 6.1.1 Mappings Mapping herein refers to the correspondence scehma and the conceptual schema. between One of the reasons v;hy an external we choose to ) p. 130 DATABASE EMP_ASSIGNMENT CONCEPTUAL DEFINITION: EMP(E#, EN,ADDR,DEPT,E_P, JB_HSTRY,MGR_OF, PRJ--E_P (PRJ) DEPT(D#,DN,LOC,EMP) PRJ P # ( , PN' , MGR EMP=E_P EMP ( , ) , E_P E_P (EMP,PRJ,TIMEFRAC) J_I1 (EMP, JOB, DATE) JOB(JB^, JBN, JB HSTRY) Primitive/Binary Data Structure Diagram of EMP._ASSIGNMENT; JOB J H EMP Fig 6.1.1: An example database (VIGR) PRJ ) p. use binary relations mentioned earlier as in underlying the section transformation (or "deconceptual provided for constructs in 2.2, i za t ion" conceptual is for ) In . data models. structure, as flexibility in our system, mapping is three different external data models, or "types of views", namely, the relational, network its 131 Constructs in the hierarchical, and the these models are mapped using a common mapping language which describes any external construct in terms of the follov/ing conceptual constructs: (1) Entities and their direct attributes or derived attributes; (2) Binary relations between entities and/or attributes; (3) Predicates restricting above constructs; In essence, to be any external construct (e.g., mapped to a a relation or BNF is portion of the integrated database described by the conceptual schema, and the mapping statements specify The segment) a this specificatin of this mapping language is given Some examples of mapping language statements and their in "portion". Fig 6.1.2a. corresponding graphical representation, called the tree form of the mapping, which is basically a "clipping" of the data structure diagram of the integrated conceptual schema, are given in Fig 6.1.2b. p. 132 Mappings statement ::= Entity_statcment Binary_Gtatement Entity_statement ::= Entity_attribute_clause, predicate clause Entity_attri'bute_ clause ::= Nset_name '(attr_list) attr_phrase attr_list attr_phrase attr_list attr_name attr_name(attr_phrase) attr_phrase Nset_name (attrj)hrase) Binary_statement condition_list) predicate_clause condition condition_list condition condition_list attr_phrase, comp_op, target_phrase condition* >(= |< >= <= |-i=Ul9 comp_op v'ariat)le_name literal targe t_phrase I | | ( | I I | Conditions can be further elaborated to include set-theoretic comparisons. Fig 6.1.2a: A BNF specification of mapping language p. (1) External construct e and attributes: 133 mapped to unrestricted entities '-iiuxuies e^ = EMP(EN,ADDR,DEPT(Dff) ) treeform: ^^^ /Xv^ \^R DEPT (2) 62 mapped to restricted entities and attributes: e^ = PRJ(PN,P#) WHERE (EMP (EN) =Var EN) . treeform: PRJ EMP (3) e^ mapped to a binary relation e^ = EMP(JB_HSTRY) treeform: J H Fig 6.1.2b • EMP Mapping statements and their 'treeform' graph 134 p. 6.2 View Enforcomtnt Level 6.2.1 Introduction two major tasks: performs coordinates integrates and This level external all views. It process operational security parameters (1) of the views and check for compatibility of these parameters among and views; enforce these operational security requirements. (2) definition. The first task is performed during view the DBMS conflicts identify to A conflict of this kind views. portion of integrated the one when occurs database conflicts These domain. designates view database are not made translation level. constructs and v/hen are translation in with translation each level other performs chapter, the view at mapping On other the hand, constructs operation. task and terms of the common conceptual data model they reach the view enforcement level, therefore coordination The second of of operations in each view independent of views. expressed be facilitated here. into communicate The view the existence of other operations to into detected at the view easily translation level because, as will be explained later in this views a exclusive use, to be of its own not are enables It inconsistencies among different or while another view attempts to include that part of the its all can This is graphically sho\;n in fig 6.2.1. of this All operations on a level is performed during database particular view, after being translated operators on conceptual constructs by the view translation level, are checked against the security parameters maintained at this level. 135 p. (Next Higher Level) Hlerg Relatlional Viev; Viev^ Netvjork Viev View-3 View-4 View-i 'A \ . I . 1 : p. 136 6.2.2 General Mechanism This level accomplishes its contains which information on construct. enables This catalogue, by views and conceptual constructs. lists, for each conceptual construct the views that are allowed to maintaining tasks entities (e.g., a attributes), and read, write, share or exclusively use the current the level identify conflicts to between viev/s and generate messages to effect intervention base administrator to It resolve the conflict. by a On the other hand, prevent information is also used during database operation to a data this user from issuing operations not allowed within the viev; he is using. This catalogue itself can be defined as entities and attributes. A third entity, Two basic entities are VIEWS and CONCEPTUAL_CONSTRUCT. called SECURITY, may be used to designate the many-to-many relationship between them. The binary network model of this catalogue is shown in This strategy of Fig 6.2.2. catalogue implementation makes use of functions provided at lower levels and releases the burden of catalogue maintenance from the view enforcement level. 6.2.3 Data Definition Interface There are two parts in this schema definitions, whereby interface. the in these conceptual schema is the conceptual enforcement level obtains all view construct names in the conceptual schema. embodied One More complicated definitions information and other semantic parameters) are of no (sucli parameters as virtual concern to the . p. Current level, and passed are down intact. 137 The other part of this interface is the view definition, which consists of identification of a view and the corresponding conceptual schema this view is mapped to, as well as security requirements. catalogue The data definition interface is shown below: this information. using The current level builds its Define_Nset (Nset_name, attr_list) Define_View (View_id, conceptual_constr uct_l ist) where the conceptual_constr uct_list is the view mapped is to a constructs list of conceptual and their corresponding operational security parameters. 6.2.4 Operational Interface Operators to through. The manipulate operators this operation is a elements are tag which identifies the view issued. enforce legality of this operation, denied data passed are identical to those to be accepted by the next lower level, except for which conceptual Security checking and unauthorized is based on performed to operations are p. 138 6.3 View Translation Level This level incorporates several parallel modules, called model data each processors, designed to handle views, or external data model. translation from network, three models, data the mapping language introduced these in particular type of a this section we describe mapping and in hierarchical, relational, the common conceptual data model. to constructs In external and We shall illustrate how section 6.1.1 used is to map external data models, and how operators in these models are translated. 6.3.1 View Translation Level -- Relational View 6.3.1.1 Definition and Mapping Relational views are defined in terms of names, attribute relation names and sequence attribute(s) the attribute (i.e. according to which the relation is to be sorted, if any). of these mapping The names to the conceptual constructs are also specified. types may be declared to be different from the form physically The relation name gives Since each attribute in has to mapping may take shows the mapped be tree a to of is stored. stored. relation has to be atomic, an a a the mapping. attribute The relation definition and value set. format exemplified in Fig form Data rise to an entry in the relation catalogue, where information about this relation and its mapping name domain names, 6.3.1a, and Fig 6.3.1b Due to the great similarities between the relational data model and our conceptual n-ary entity sets. 139 Relational View ET-IP Efi Mapping tree form for relations EMP Mapping tree form for relations JOB HISTRY Fig 6.3.1b: EiMP , DEPT, PROJ PROJ & & JOB; JOB HISTRY; EMP PROJ Tree form of the relational view of EMP ASSIGNMENT , 141 p. mapping and translation of operators are fairly straightforward. 6.3.1.2 Tuple Construction actual construction of In our model, place when relation is defined; the the relation name. Tuples of a the does relation are built one by one It next to when an definitions no relations are built when they are Because levels. take is built by passing n-ary entity retrieval commands extracted from the mapping down not only the mapping is stored with command of the relation is encountered. output tuples Again, there are defined, redundancy of data is completely eliminated. no restrictions on the normality of the relations. 6.3.1.3 Operations Relational operator JOIN, SELECT, DELETE_TUPLE, and UPDATE_TUPLE are supported. command available examining for tuples one Ge t_next_tuple is also by one. describes the general logic how these operators may be performed INSERT_TUPLE SELECT_WHERE, a This section translated and . JOIN, SELECT and SELECT_WHERE commands are set operators that give rise to new relations. These relations, as usual, are not constructed, but their mappings are generated automatically and then stored. In the following discussions, the effect of these operations is exemplified by the tree form of the map. , p. JOIN operation domain. involves joining relations two operation would result in 6.3.3. Note that selection that incorporates eliminates a 'pruned' tree, as shown in Fig lower level this is that, SELECT if a command resulting relation shall have the shown tree reason The eliminate the intermediate level from the if we tree, the semantics of this relation this example, roles intermediate level role would result in the some latter being marked in the tree, but not eliminated. for common a An Example is given in Fig 6.3.2. SELECT but over 142 a may (R, become (EMP_NO, ambiguous. LOC) is ) In given, the form as shown in Fig 6.3.3b, while 6.3.3c has an ambiguous semantics between in Fig the node EMP-NO and DEPT. SELECT_WHERE is used to impose restrictions on values attributes in a translated into the implemented at the n-ary level. mapping commands Retrieve into with relations. However, tree. WHERE It is clause An example is given in Fig 6.3.4. Due to semantic considerations, the update commands are normalized certain this command is easily implemented by relation; incorporating this restriction readily of if this restricted to restriction is lifted, other measures may be taken to remedy the semantic ambiguity. INSERT_TUPLE: Inserting equivalent to command is translated tuple a inserting into an n-ary relation record into an nary entity sets. a into Insert_enti ty or is This Inser t_attr ibute depending on the context. DELETE_TUPLE: Deleting a tuple from an n-ary relation is treated ) ) R = JOIN(EMP,DEPT,OVERD#) ,'. R(E#,ENM,ADDR,D#,DNM,LOC) BMP (E# EN ADDR, DEPT (D# DN LOG) = , , , , ) Join operation results in a joined tree Fig 6.3.2': Rl = SELECT(R, (E#,D#,LOC) .', R1(E#,D#,L0C) Fig 6.3.3a: = EMP(E#,DEPT(D#,LOC) Select operation results in a 'pruned' tree R2 = SELECT(R1, (E#,LOC) R2 = SEI,ECT{R1, (E#,LOC) . 6.3.3b: . ) R2 = N.EMP(E#,DEPT(LOC) D3 is marked not to be ) I output I Fig 6.3.3c: Ambiguous tree map p. R3 = SELECT (R, (E#, D#, LOG)) (LOG = "Ne^J YORK") WERE ©(Restricted = * R3(Ei, D#, LOG) = EI-IP (E#, VJHERE "Nn\' by YORK") DEPT(D#, LOG)) (DEPT(LOC) = Fig 6.3.4: SELEGT VJKERE operation "l\IEi'J YORK") 144 145 p. the entity designated by the key of this relation from as deleting the coi respond ing entity set. should normalized relation relation not normalized. "what is does exactly Note that, in Then : one might the ask that a the question: relations removal of any of these would result in removal of Delete_attr ibute Update_Tuple Suppose referenced. why point this at this user want to delete from the database?" This command the tuple.) be is clear tuple where several entities and binary a involved, are (It , is translated Delete_enti ty into or according to the context. This operation may be translated into a sequence of Update commands to be passed down to the n-ary level. 6.3.1.4 Defining Relations Using Relational Operator New relational views may be defined by commands JOIN, SELECT_WHERE. considered a A relation generated by these temporary one, therefore not entered into catalogue unless attempt to save it is made. is saved, it access to constraints. to by it a view, and other users may add a relation be granted calling the relation name and satisfying the access Therefore two commands are also available at dynamically be permanent the Once and would commands relation may be treated as SELECT entries to level or remove them from the catalogue of relations: Save_Relat ion Rel_name, access constraint parameteres Delete Relation Rel name this p. where access parameters are extracted and processed at the next 146 higher level, namely, the view authorization level. 6.3.1.5 Relational Sublanguage The relational operators just described relations in our level, such as used manipulate to However, the end users may still find them model. cumbersome to use, and query higher are or SEQUEL manipulating <Ast rahan76>, languages may of desired. be database sublanguage facility, which will be described even an section in The 6.4, provides translators to facilitate ease of user interface. Operations other than those described here may be added (e.g., operator that tests to see if two relations are equal, etc.) the modularity of the system, the current repertoire of be easily expanded to accommodate future needs. . an Due to operators may 147 p. 6.3.2 View Translation Level -- Hierarchical View 6.3.2.1 Definition and Mapping This type of external structure may be treated in a When' the data definition and the relational structure. similar way as mapping of hierarchy of segments are received, they are translated and stored catalogue. To in a illustrate, we again adopt the EMP_ASSIGNMENT database example shov/n in Fig 6.1. Now suppose database, as shovm in Fig 6.3.5, the a is a hierarchical view this of The definition of to be generated. view, v/hich is very similar to an IMS type of DDL <IMSa>, with the addition of mapping specifications, may look like what is shown in 6.3.6. Fig Taking these definition statements as input, the DD translator may translate them into a set hierarchical view catalogue. of parameters be to stored the in A tree form of this hierarchy is shown in Fig 5. 3.7. 6.3.2.2 Basic construct At Work Comparing Fig 6.3.7 with tree discussing relational relations may be used views, freely forms may one demonstrated see how constructing in when v;e underlying views. In were binary essence, primitive and binary sets serve as the basic construct of the database, which are flexible enough to expressions, and has the property Another benefit that may be be of read built into relative from any kind of external stability over time. these figures is the clear -1 rig 6.3.7: I A tree form of Hierarchical view ) p. semantics in an external view derived associations. binary from 150 Not only may it help the DBA in clarifying the path during data definition, it also provides better documentation of the meaning of a database. 6.3.2.3 Operations Conventional operators hierarchical GET_NEXT_WITHIN_PARENT, DELETE, INSERT and GET_UNIQUE, REPLACE are GET_NEXT supported. <IMSb>. (a) GET commands GET commands invoke the looking at the segment construction For which, by map and the content of the current buffer, determines how various data elements are to be retrieved and buffer. module example, typical a IMS placed into output retrieval command against our example database: GU DEFT (DNAME=' system ' ) EMP • . JOB_HISTRY (JOB_NAME=' programmer' may be translated into operations on the nary entity sets, as shown Fig 6.3.8. (b) Update commands in ) ) 151 p. Retrievejjnique (JJI (JOB (JBS,JBN) , DATE) (E;^lP(DEPT{Dr^f="SYSTEi4"))) )V-7IiERE AIO (JOB (JN =" PROGRAMMER" SEQ (EI^1P(DEPT(D#) ,E#) The tree form of a strategy to satisfy this request is: Fig 6.3.8; GU operation; tree form of the retrieval ) p. Insertion of Insert entity segment a and Insert_att r ibute ensure proper update of the database. those discussed under be compatible into However, semantics of commands. DELETE and REPLACE operations may need to to down broken is clarified order in views, relational require and special languages. "Subschema" A subset of generate a hierarchy original 'subviev/' of segments various views on this hierarchy. name of the new 'subviev/' the to These issues are, again, similar attention during design of data definition and manipulation (c) 152 is hierarchy. to be defined, may be defined to DDL is input to specify the as well as its connection Access information is also given. entered into the catalogue and may legitimate users. also be operated to Then this upon by p. 153 6.3.3 View Translation Level -- Network View 6.3.3.1 Definition and iMapping In this section we discuss definition and mapping of network views Based <DBTG71>. the on definition conceptual of database the EMP_ASSIGNMENT of Fig 6.1, the system would like to present to the user a DBTG type of network view as shown in Fig 6.3.9a. mapping defined language shown is 6.3.9b. Fig in the network, are mapped to in entity In sets, The definition and essence, the records their identifiers « mapped to key attributes of the entities, and DBTG underlying in are mapped to A tree form of this mapping associations. binary 'sets' is shown Fig 6.3.9c. 6.3.3.2 Operations Operations on network view include <Date77>: Find: Retrieves any record or Modify: Writes content of Insert; Remove: Delete: Store: The a 'currency' Insert a Removes record into a a specific record within a a set the database record back to record from a Deletes Inserts a a set a set record from the database record into the database concept used in DBTG model is utilized in the tranlation of those commands. The user is given a UWA (User Working Area), p. DEPT D#|l 154 ; p. NET:^-raRK EMP_^SIGNMElTr RECORD DEPT, IDENT IS DEPT# IN DEPT 155 02 DETTt, aiAR5 02 DNAME, amR20 02 IDZ, CHAR20 * PEC. DEPT RECORD = DEPT (D#,DN, LOG) IDETW IS EMP# IN EMP E>IP, 02 B>IP#, CHARS 02 ENATIE, CHAR20 02 ADDR, CHAR30 * PEC. EMP RECOPJD = EMP (E#, EN, ADDR); JOB, IDEITT IS J# IN JOB 02 J#, amR5 02 JN, CHAP20 * I^EC.JOB RECORD = JOB(JB#,JBN) ; JB_H, IDEIIT IS J\\ IN JOB, EMP# IN EMP 02 J#, CHAR 5 02 EMP#, aiARS 02 DATE, CHARS * REC.JBJI = SET * J_H (E-IP(E#), JOB(JB#), DATE) S_D_E, aJNER IS DEPT, MHIBER IS H^P SET(S_D_E) M -> M = DEPT (EMP) -» = E!-IP(DEPT) Fig 6.3.9b: 7\n a exanple of data definition and mapping of view net\\«Drk p. Treeform of DEPT record: Tree form of H-IP record: Treeform of S D E set: (o->m) t I (m->o) Iw^J Fig 6.3.90: Treeform of a newtork view of E-1P_ASSIGM'IENT 156 p. containing a buffer for every record type defined content of this UWA, together retrieval/update generate in the view. th the mapping definition, 157 The used to conmands against the nary entity level. To v/i is illustrate, suppose that the network and mapping definitions are stored in a data structure exemplified by Fig 6.3.10. buffer is reserved as shown in Fig together with some examples of translation shown in Fig 6.3.12. During run time, 6.3.11; of the a UWA and the basic logic DBTG commands is 158 MAPI; Rec name ) ) 159 FIND Eec_nane : Retrievejani.que (NSET (Rec_naii'e) ) where (Key_attr (Rec_naiTie) =ID (Rec_nane) --rf-^i^-.3- - ?:-. (Here NSET(rtec name) refers to the second column of MPAl in Fig G.ll; Key_attr (Rec_nairB) refers to the 3rd colurnn of MAPI; ID(R3c_narTB) refers to~"the ID field of the record Rec_naiTie in U'iA; etc.) FIND FIST(1\EXT) METBER TOTHIN SetjiaiTE: Retrieve_f irst (next) (NSET (riEM_REC (Set_naiTie) ) where (MEM_TO_a'MER_ATTR Se t_naiiB = ( ID(a7NER REC(Set ) nanie) ) FIND Q'JNER VJITHIN Set_naiTB: Retrieve_\anique where ATIRlSet name) = ID (MEM REC(Set name))) (NSET(a-JNER_REC(Set_naiTie) (Qi/NER TO liEM ) ) Fig 6.3.12a: General logic for translating FIND commands 160 MOVE "15" TO FIND DEPT; DEPTJI IN DEPT; RETRIE\'E_U>r[QUE (DEPT (D#,DN,DX)) \^iERE (Djf- DEPT# in DEPT in UWA) DIWE ; IN DEPT = "SYSTEM"; MODIFY DEPT; UPDATE (DEPT) (D#=DEPT# in DEPT in U<!A) (DN= DNT^^ in DEPT in U^JA, 1CC=LDC in DEPT in U\ZA) FIND FIRST Ef-IP WITHIN S_D_E; RETPJEVE_UiaQUE (EMP (E#,EN)) I'?HERE(DEPT(D#)= DEPT# in DEPT in UWA); FIl-D FIRST ET-LP_PRDJ TOTIGN S_E_EP; RETRIEVE_UNIQUE (E_P (EMP(E#) ,PPuJ(P#) ,TIMEFRAC) WHERE (Et-ff'(E#)=e-IP# in EMP in O-IA) ; FIND a-JNER UTETHIN S_P_PJ; RETinE\/E_UNIQUE (PRJ (P#,PN) \nHERE (P#= P# in EMP_PRJ in U^^7A) PRINT PROJ; FIND l^EXT EMP WIT!iIN S_D_E; RETRIEVEJEXT (E:-IP (E#,EN)) VJIiERE (DEPT(D#)=DEPT# in DEPT in Ul'iA) SEQ(E#) CURRENT IS (EMP# in EMP in UV-JA) ; ; ) Fig 6.3.12b Exanple of tmaslation of a set of network data model corrmands (* denotes translated statements) ; 161 p. and Summary of View Translation Facility Sublanguage Database 6.3.4 Level As mentioned section 6.3.1, in relational sublanguage such as SEQUEL may wish to use a database. general, In user of the relational data model a very-high-level any to user-oriented sublanguage may be incorporated into our system, so long model assumes it supported is the at operators it uses can be mapped to the data model. A graphical Note 6.3.13. query view operators the database data the as translation level and that of particular representation of this concept is shown in Fig that some sublanguages may be built on top of different external data models simultaneously, depending on application the it supports. In from summary, the three types of views discussed are very each allowed. other in sets is similar, views into the basic constructs. a of these views on top of our since all of them involved breaking the Therefore the final mapping may have common format. In be definitions and the set of operations of implementation Hov^ever, conceptual terms different general, other views may also be provided, so long as they broken down into a can clearly defined collection of basic constructs supported by the conceptual schema. Once this is done, any operation on the viev;s may be translated. Finally, we should take note of the issue of we have truely established a performance. Since system of many levels of indirection, care p. r DB sublanguage! Database Sublanguage DBSLl EDMl Facility mapping I External External Data Data Model Model Processors L 162 DB sublanguage2 DBSL2 EDM2 mapping DBSL2 EDM3 mapping Y 3 Fig 6.3.13 External Data Model 2 External Data Model 3 Architecture for handling Database Sublanguages p. has to be taken to address the problem of overhead. to 163 How we may proceed take advantage of sequential processing by penetrating through these levels in order not to lose connectivity between one transaction and the next is one of the vital performance issues we shall look an external view and processings against it are established. at when p. 164 hierarchy of 6.4 View Authorization Level 6.4.1 Introduction Every external view (e.g. segments) associated has designated by a set of relations or with set of accounts. are specified when a it a set of legal a 'viewers', which is Legitimate user accounts of a view view is defined. Hierarchy of authority: organized a hierarchically; The access power of accounts may be for example, accounts beginning at letter A may have all the access capabilities of accounts beginning at letter B, but not vice versa. views Also, those accounts that are capable of have the authority to designate will users and specify their capability (e.g. a set of accounts as legal a that has authority to define the view. the On the other read or write). view definition may only be changed or deleted hand, defining by Therefore a an account hierarchy of authorization is constituted. Log-on: As conventionally done, a user gains access to an account through its password which is to be supplied attempts to created. The process controller log on. Once to the system at the front end he a process is level maintains user is in the system, the when information of all active processes. View authorization: level maintains several For the purpose control tables. of security control, this The first one is an access p. List. It lists, for each view defined for the database, allowed to retrieve or update the view. are that accounts that have the authority to change the view accounts the It also 165 lists those definition. An example is given in Fig 6.4.1. The other table is an active process table, v/hich, for each active process in the system, lists the view currently in use by the A process^ process has to declare the view it wishes to operate upon before any access command is given. command), OPEN_VIElV Once view the is declared the access list checked and (through for legality of this declaration, an entry in the active process table is made. of proceeds to create view translation 'process control block' a the appropriate external data model processor. block' also maintains working space command) wants it viev/ to see by closing and declaring this level is shown a new one. in Fig level, which then for this process within This necessary construction procedures to serve this process. the format The The view authorization level also the table is shown in Fig 6.4.2. posts this information with the an A for those process may the previous one A summary of control 'process the view change (by CLOSE_VIEW command flow at 6.4.3. 6.4.2 DD Interface Database designers have to spell out access information database or a view of it is defined. DD statements are intercepted further and when the All authorization parameters in entered into the catalogue, and manipulation on the database are checked against this security 166 VTKV r^!A^IE ; 167 VAL: BEGIN; Case command Of "Define View": Do; Process View Authorisation definition; If definition legal then do; Make entry in the access list; Pass the rest of the data definition to an appropriate EDMP; end; Else do; Formulate error message; Pass error message out; end; end;' "Open View": Do; Check against the acess list; If legal then do; Make entry in the active process table; Pass command to the appropriate EDMP; end; Else do; Formulate error message; Pass error message out; end; end "Close View" : Do; Erase view entry in the active process table; end; "Operations": Do; Check against active process table: Pass com.mand to the EDMP which processes the view; end; END; (EDMP: External Data Model Processor) Fig 6.4.3: Command flow in the View Authorization Level : p. 168 information in the catalogue. A special module called author i zation_pa rameter_processo to decode the parameters security r is used and make entries in the security catalogue before passing the rest of the DD statement down. 6.4.3 Operational Interface The set of operators accepted at this level purposes is addition, a identical it, view or becomes f and manipulation those accepted at external view levels. process must open against identi to for data a view before it issues an In operator close the view before it wishes to change to another inactive. Therefore two additional ied OPEN_VIEW (process_id, account_id, view_name) CLOSE VIEW (process id) commands are . 169 p. SUMMARY AND FUTURE RESEARCH DIRECTIONS VII. 7.1 Summary of report The INFOPLEX database computer project has provided the motivation for this report. is and will is our It be information belief that, while playing the central role the in processing application of computers, the conventional computing-oriented computer architecture is not adequate for information large-scale handling INFOPLEX database computer is geared toward design of processing. a The highly-parallel computer system specialized in information processing. chapter In concepts of 1, the INFOPLEX consists Functional general the database INFOPLEX of parts, two Hierarchy. background The computer are Storage the Storage basic and virtual other database preliminary while storage; management design of hierarchical functional the The introduced. Hierarchy and the Hierarchy handles an ensemble of storage devices and supervises data movement among them, large architectural supporting a the Functional Hierarchy performs all This functions. report presents a latter component based on the concepts of decomposition and microprocessor multiple implementation The first task during preliminary functional requirements of the system. design In chapter is 2, to a identify search into the literature of database systems has helped clarifying the picture. notably, the Most two concepts in stratification of database managment systems. p. information abstracton and functional abstraction, are reviewed namely, to provide insights into functions to be supported by contemporary a system and how to organize them into hierarchical level. database important following are hierarchy: (1) flexible (3) internal objectives functional multiple conceptual data model; a 170 variety of stored data model); (4) explicit validity, alerting and virtual information; functional the of external views; types a of a (2) (5) high-level structures support and The (i.e. security, of concurrent use of the database. A Binary Network data model model in functional the is developed as hierarchy. incorporates clean semantics. view support are also outlined. stored functions may grouped be structure management, conceptual data structure, data The internal data model and external Chapter 2 concludes with hierarchy of functions, which are further detailed These conceptual model is shown to provide The natural mapping to the external views and the and the into memory structure in later management, management, and a 10-level chapters. internal external structure management. The memory management level uses the id approach to byte detail from the upper all Encoding Unit The upper levels to use an id as the location of and not to be concerned about the byte address. management is broken (BEU) into three concept. the levels, and keeps track of free storage space and performs compaction and garbage collection. enables insulate levels, id a approach data item, The internal structure integrated by the Basic The data encoding level provides 'final touches', such as text editing, compaction and encryption, to the data 171 p. before the data enters into the storage hierarchy. organizes elements into unary sets, and data intellignent search into element. in unary a set for is The unary set level capable of performing particular a The binary association level maps binary relations described the conceptual schema to their implementation, and unary data elements organization of information insulation and between changes in internal structures will be reflected only between the two. These three conceptual the structure internal its together pieces and pointers among them. (records) internal structure levels provide upper data unary in such that mapping the The interface to the binary association level enables levels to dedicate search and storage of their own house-keeping information (e.g. functional catalogues) to Furthermore, abstraction. the concept BEU realizing thus the internal levels, reflects a parametric and modular approach to internal structure building. The conceptual structure management is also broken into The n-ary level processes the part of the conceptual schema, core keeping track of entities and attributes and enforces relationships among them. The levels. 3 binary semantic virtual information level builds new consructs whose values are not stored, but derived. The validity level maintains more complex update constraints. The external structure management provides integiated database. Three types of views user views the are demonstrated to be The view translation level keeps mappable to the conceptual structure. track of construct mapping and performs operator translation. system security is maintained through onto a Finally, view mechanism contained view enforcement level and the view authorization level. in the 172 p. To summarize, processing of a request in the database below. is It authorization first checked for legality entry by into those in The request then goes into conceptual structure the conceptual schema. which in turn call up internal structure levels to retrieve or update the target elements. Each level in performing its task may call upon lower levels for information or subtasks. The system employs both transaction pipelining and functional abstraction. the view the then it enters into the view translation level, level; which translates its references to external constructs levels, described is notion 'family of others through of systems' It supports also by having levels communicate with implementation- independent interfaces, allowing easy reconfiguration of the system. 7.2 Future research directions We plan to conduct further research in the area of the functional hierarchy along the following dimensions. 7.2.1 Formal Design Methodology In database this report, we have identified functional system and organized them into a requirements of a hierarchical structure. A formal design methodology will be utilized such that the current design can be verified and alternative methodology will also designs examined. A formal design help in detecting potential ambiguities in the design before implementing a software prototype of the system. We plan p. to build our methodology based on developed (SDM) Systematic the Methodology Design Some extensions of SDM may have to by Huff <Huff79>. be generated to tie this design methodology more closely to the of research This hierarchy. functional the 173 design include will both investigation and application of the methodology. 7.2.2 Locking Mechanisms concurrent To support mechanisms must uses of a interlock database, shared Care has to be used to coordinate update operations. be taken in designing and implementing this locking mechanism to adversely affecting this area includes performance its Current research in the system. of aspects theoretical Bernstein80a> Gray76, BadalSO, Ries77>. method that is They will be reviewed in an most suited relationship hierarchy. The functional level to database systems in certain performance issues <e.g., and , BernsteinSOb, <e.g., Eswaran74>, strategies used for concurrency control <e.g., avoid attempt develop to a architecture of the functional the concurrency between control at the and at the physical level will also be investigated. 7.2.3 Mapping of the operators The need hierarchy is loi from the fact the that composed of layers each of which supports of operators. between mapping stems levels The differences in data structures a and functional different set data models contribute most to the differences in these operators. ; p. 174 Further research is required to 1) specify in more detail the meaning of the operators each at level 2) show how these operators are translated into those implemented at the next lower 3) prove that the level; preserve algorithms translation desired the meaning of the operators. Current-day research in the area of data models, much of it having been reviewed chapter two in (notably section 2.2.3), will drawn be upon to obtain insights into this problem. 7.2.4 Implementation of For better implementation a software prototype understanding of a of functional the software prototype shall be planned. hierarchy, One of the major purposes of this software prototype is to facilitate measurements of parameters for performance evaluation. The following steps will be taken for carrying out this implementation: 1) Select a member system from the family of the database systems supported by the functional hierarchy; 2) Resolve the detailed design issues and algorithms selected system; 3) Conduct coding and testing of the system 4) Identify properties in performance of the system. in PL/1; in the . p. 175 7.2.5 Performance evaluation built Models will be of discussed above as estimates of some of the parameters Experiments system will conducted be to closely examine in the model. performance of the various user environments and implementation alternatives. in comparison in performance will highly proposed the will be taken from the software prototype Measurements architecture. performance examine to parallel also decentralized and be made between the A proposed, computer architecture and the proposed architecture of the conventional architectures. 7.2.6 Recovery and reliability One of functional the advantages hierarchy is and hardware breakdowns. methods the its ability to recover from isolated software Further research is required to identify the and protocols to be used by the functional hierarchy to detect and recover itself reliability are ar chi tec of ture to from be possible taken for breakdowns. comparison Measurements of with the conventional 176 p. REFERENCES Abraham79: Abraham, M., 'Properties of reference algorithms Sloan multilevel storage hierarchy' Master Thesis, School Management, MIT, 1979 Astrahan76: Astrahan, M.M. 1976 June 2, ANSI75: ANS I/X3/SPARC et al. study group 'System R' ACM TODS, Vol in of NO. 1, on DBMS interim report, Feburary 1975 Bachman77: Bachman, C.W., and Daya, M., models', Proc, VLDB, 1977 'The role concept data in BadalSO: Badal, D.Z., 'The analysis of the effects of concurrency control on distributed database system performance', VLDB 1980 Bernstein76: Bernstein, P. A., 'Synthesizing third relations from functional dependency', ACM TODS, Vol. normal 1, No. 4, form Dec 1976 BernsteinSOa: Bernstein, P. A., D.W., Shipman, Rothnie, and J.B., 'Concurrency control in a system for distributed databases (SDD-1)' ACM TODS Vol. 5, No. 1, March 1980 BernsteinSOb: Bernstein, P. A., Goodman, N. 'Fundamental and Algorithms for concurrency control in distributed database systems'. Technical Report CCA-80-05, Computer Corporation of America, Feb 1980 , Borkin78: Borkin, S.A., 'Data model equivalence', Proc, VLDB, 1978 Bracchi74: Bracchi G. et al. 'A multilevel relational model for database management systems', in Data Base Management, North-Holland Publishing Company, 1974 , Bracchi76: Bracchi, Paolini, P., Pelagatti, G., 'Binary G., and logical associations in data modelling', Base Modelling in Data in Management System, (ed., Nijssen), North-Holland, 1976 Chen76: Chen, P. P., 'The entity-relationship model -- toward unified view of data', ACM TODS Vol 1, No. 1, March 1976 a 1 , p. Codd72: Codd, E.F., 'Further normalization of the database model', in DataBase Systems, Prentice-Hall, 1972 177 relational Codd79: Codd, E. F., 'Extending database relational model to capture 4, December 1979 more meaning' ACM TODS, vol 4, no. introduction Date77: Date, C.J., 'An Addison-Wesley Publishing Company, 1977 database systems' Programming Language to Data Base Task Group of CODASAL DBTG76: Committee, COBOL Journal of Development, 1976 notions of consistency 'The et.al., Eswaran74: Eswaran K. P. IBM RJ 1329, December 1974 system', locks a database predicate in , , D. D. Chamberlin, Eswaran75: Eswaran, K. P. and integrity', specifications of a subsystem for database and 'Functional VLDB, Procd., 1975 Fagin77a: Fagin, R. 'Multivalued depencdencies and a new normal from 2, No. 3, September 1977 for relational database', ACM TODS, Vol. synthetic Fagin77b: Fain, R. 'The decomposition versus the to relational database design', Proc, VLDB, 1977 , 'Significations: Falkenberg76: Falkenberg, E. base management', Information Systems, 1976, Vol approach the key to unify data 2, No . Falkenberg77: Falkenberg, E. 'Concepts for the coexistence approach Symp., April 1977 Corap. to database management', Proc, Int. , FODS76: 'Proposal for research on the design of a family of data base systems' CISR Draft, Sloan School of Management, MIT, December 1976 FOS76: Madnick Goldberg, 'Family of Operating Systems', Sloan School & of Management, MIT, Fry77: Fry, translation: PP. 95-148 and description 'Stored-data et al., model and language', Information Systems, Vol 2, J. P. a 1976 data 1977, p. 178 Lorie, R.A., Putzolu, G.R., Traiger, I.L., Gray, J.N., Gray76: 'Granularity of locks and degrees of consistency in a shared database' Holland Publishing in Modelling in database management systems. North Company, 1976 'Relations and Entities', Hall76: Hall, P., Owlett, J., and Todd, S. in Modelling in Database Management System, North-Holland, 1976 , S.B., functional B.C., Waddle, V., and Yao 'The Housel79: Housel dependency model for logical database design', Proc, VLDB, 1979 , , 'Database machine architecture Hsiao, D. K. Madnick, S.E. Hsiao77: Proceeding, VLDB in the context of information technology evolution', 1977 , Hsiao79a: Hsiao, D.K., Kerr, D.S., Madnick, S.E., Academic Press 1979 'Computer security' 'Collected readings on a database computer Hsiao79b: Hsiao, D.K., ed (DBC) Department of computer and Information Science, The Ohio State University, Columbus, March 1979 . , ' , Huff 79: Huff, S.L., systematic methodology for designing the 'A architecture of complex software systems', Ph.D. Thesis, Sloan School of Management, MIT, June 1980 IBM Information Reference Manual IMSa: Management System/ Virtual Storage Utilities Application IMSb: IBM Information Management System/ Virtual Programming Reference Manual Storage Kerschberg77: Kerschberg, L. Klug, A., taxonomy of data models', Proc., VLDB, 1976 Tsichritzis, and , Klug77: Klug, A. and Tsichritzis, D. VLDB, the ANSI/SPARC framework', Proc. , Lam79: Lam, Madnick, multiple page C. , systems with September, 1979 Madnick69: Madnick, S.E. S.E., sizes 'Multiple view support D. , 'A within 1977 'Properties of storage hierarchy TODS, redundant data' ACM and and Alsop, J.W., 'A modular approach to file p. system desing', Proc, AFPIS, 179 1969 Madnick73: Madnick,S.E. et al., 'Virtual Information in data base systems", Sloan School of Management, M.I.T. Madnick74: Madnick, S.E., McGrqw-Hill, New York, 1974 Donovan, 'Operating J.J- Systems', Madnick75: Madnick, 'Design of a general hierarchical storage S.E., system', IEEE International Conference Procedings, 1975 concepts Madnick, S.E., 'The INFOPLEX database computer: Madnick79: directions' Proceedings, IEEE Computer Conference, Feburary 1979 & MadnickSO: Madnick, S.E., 'Proposal for Research on the Desgin of a high-er formance high-availability intelligent memory system', Sloan School of Management, MIT, May 1980 associated McLeod78: McLeod, D. and its 'A semantic database model Thesis, L.C.S., MIT, August 1978 structured user interface', Ph.D. 'Alerting in database Morgan75: Morgan, H. L. Buneman, and O.P., systems: concepts and techniues', working paper 75-12-02, The Wharton School , Navathe76: databases: Fry, Navathe, S.B., and J. P., 'Restructuring for large 2 June, Three levels of abstraction', ACM TODS, Vol 1, No. 1976 Navathe77: Navathe, 'Schema S.B., restructuring', VLDB Proceedings, 1977 analysis for database 'View representation in Navathe78: Navathe, S.B. and Schkolnick, M. logical database design', Proc, ACM SIGMOD 1978 , Data Base in Nijssen76: Nijssen CM. Modelling ed Systems, North-Holland Publishing Company, New York 1976 . , Paolini77: mappings in Pelagatti, G. Paolini, P. and a database', Proc, ACM SIGMOD 1977 , Management 'Formal definitions of , 180 p. Rics77: Rie^, D.R., granularity in September 1977 a and database Stonebraker, M. 'Effects of locking management system', ACM TODS Vol.2, No 3 , . Rothnie76: Rothnie, J.B. and Hardgrave, W.T., 'Data model theory: beginning', Texas Conference on computer Systems, 1976 Roussop75: Roussopoulos N. and Mylopoulos, J., networks for data base managment', Proc, VLDB 1975 a 'Using semantic , Schkolnick78: Schkolnick, M. 'A survey of physical database methodology and techniques', IBM Research RJ 2306, August 1978 , design Schmid75: Schmid, H.A., Swenson, J.R., 'On the semantics of the relational data model', ACM SIGMOD Conference Proceedings, 1975 Senko73: Senko, M.E., 'Data structures and accessing in database systems: II. Information Organization', and 'III. Data representations and data independent accessing model', IBM Systems Journal, No 1 1973 . Senko77: systems: Senko, M.E, 'Data strucutre and data accessing in database past, present and future', IBM Systems Journal No. 3, 1977 Smith71: Smith, D.P., 'An approach to data description and conversion'. Moor School Report No. 72-20, University of Pennsylvania, 1971 Smith77a: Smith J.M., Smith D.C.P., 'Database abstractions: Aggregation and generalizations', ACM TODS Vol 2, No. 2, June 1977 Smith77b: Smith, J.M., and Smith, D.C.P, Aggregation', CACM, Vol 20, No. 6, 1977 'Database abstractions: ToongSO. Toong H.D., 'A general multi-microprocessor interconnection mechanism for non-numeric processing'. Fifth Workshop on Computer Architecture for non-numeric processing, March 1980 , Tsichritz78: Tsichritzis D. et al. 'The ANSI/SPAC DBMS framework'. Information Systems, 3,3,1978, pp. 173-192 Vetter77: Vetter, M. , 'Database design by applied data synthesis'. p. VLDB Proc, 181 1977 Yao79: Yao S.B., 'Optimization of query evaluation algorithms', ACM TODS Vol 4, No. 2, June 1979 , Yourmark77: Yormark, B. 'The ANSI/X3/SPARC/SGDBMS architecture' 1977