VIRTUAL INFORMATION IN THE INFOPLEX DATABASE COMPUTER by LAWRENCE ABRAM KRAKAUER B.S.E., Princeton University (1978) SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE at the MASSACHUSETTS INSTITUTE (F TECHNOLOGY June 1980 0 Massachusetts Institute of Technology 1980 Signature of Author Sloan School of Management May 21, 1980 Certified by Stuart E. Madnick Thesis Supervisor Accepted by Michael S. Scott Morton Chairman, Department Committee VIRTUAL INFORMATION IN THE INFOPLEX DATABASE COMPUTER by LAWRENCE ABRAM KRAKAUER Submitted to the Sloan School of Management on May 21, 1980 in partial fulfillment of the requirements for the Degree of Master of Science in Management ABSTRACT The concept of virtualness is extended to information. Virtual information is, informally speaking, information that is derived from other information. The term virtual information was proposed in 1973 in a paper by Folinus, Madnick, and Schutzman{R7}. We extend that paper by defining virtual information in a precise manner. Database concepts related to virtual information (although typically not called virtual information in the literature) are reviewed. These concepts include data typing, data type abstractions, security, data independence, and multiple views of data. Finally, the function of virtual information in the INFOPLEX database computer is discussed{R15}. Thesis Supervisor: Dr. Stuart E. Madnick Title: Associate Professor of Management Science Table of Contents Title Page Abstract Table of Contents Introduction Chapter 1. Virtual Information Chapter 2. Applications of the Virtual Information Concept Chapter 3. Virtual Information Functions in INFOPLEX 37 Conclusion 48 Introduction The concept of virtualness plays a prominent the world of computers. virtual card readers, virtual printers{R26}. concept to In information. proposed in 1973 Schutzman{R7}. in a We this virtual extend information Finally, the function virtual card punches, and paper, we extend the The term virtual information was paper information machines, virtual by Folinus, of and Database concepts related (although in Madnick, that paper by defining virtual information in a precise manner. to virtual in For example, the VM/CMS operating system contains virtual memory, virtual disks, role the typically literature) virtual not are called reviewed. information in the INFOPLEX database computer is discussed{R15}. In chapter 1, the concept of virtualness is by example. Then a model of information is presented and virtual information is analogy is drawn defined in terms of that model. between virtual concepts of data type abstraction Chapter described information and data and 2 describes one application of virtual information context. Chapter first second computer. the independence. in a non-DBMS context, and several applications in and An 3 relates chapters to the the ideas presented INFOPLEX data a DBMS in the base Chapter 1. Virtual Information The Concept of Virtualness Bruce sells watches with leather and metal customer bands. A asks Bruce for a brand X watch with a metal band. Bruce has a brand X watch with a leather band, but he has a spare metal band in a drawer. Mary-Louise is condition that invited What is Bruce's reply? to John's she brings dessert. party on the She has flour, sugar, eggs, vanilla, baking soda, salt, chocolate chips, and mother's recipe for chocolate chip cookies. her Can she attend the party? David is also invited to ingredients and John's party. He no recipe (although his mother is cook), but he does have $10. The bakery is walk from David's front door. a is effectively yes, minute Can he attend the party? the posed conditional on there being sufficient time -to complete the obvious transformation. say that Bruce has a virtual brand X band. Mary-Louise and David no a great two In each of the above stories the answer to question has watch with a have virtual cookies. We metal For example, as far as the customer is concerned, Bruce X brand with watch a The strap. metal has customer a is concerned only with results, not implementation. The advantages of utilizing the concept of virtualness are obvious. larger For example, inventory would be stuck stocked chocolate cookies and chip to keep is disadvantage cookies if a David he had if John had requested raisin other On the cookies instead of just dessert. major have would if he kept "actual" watches only. with actual Bruce hand, the the speed at which each person can respond to a request. What makes Mary-Louise's cookies virtual and they become edible? We will refer to more desireable, edible, concrete, useable form of the The less tasty form of cookie as the materialization. cookie is the basis. the materializing The recipe is the cookies course, the ingredients themselves which case ingredients. to as the they would be the the cookie's basis Specifically, comprises the ingredients. for do This question implies that cookies are more desireable in one form than another. the when prescription from the ingredients. Of be in might composed of virtual, even more basic The most basic set of ingredients is referred cookie's primary basis, everything else is derived. but Cookies become edible when they are materialized, Suppose we kept what made them virtual in the first place? an in cookie edible yet However, view of materialized) until the in in ant outside cookie a drawer the drawer form usable opened. is point of The cookie outside the the drawer. drawer is derived from the cookie inside the kept longer no a materialized cookie from the there is the and drawer, The cookie is not ingredients around. (not a is the but drawer, not a virtual cookie because the respective materializations are identical. As another example, suppose that cookies are made from ingredients, but temporarily are before being consumed. and outside stored in CI. of the drawer respectively. From The ingredients, previous the its basis, I. respect It seems unreasonable, however, to say that CO is not virtual when its basis is. virtual as well. basis examples, CO is not virtual with respect to its basis, CI, while CI is virtual with to drawer Let CI and.CO denote cookies inside I, compose the primary basis for CO and CI, and for the Therefore CO is A Model of Information at, An object t, denoted by an to refer called strings bit of be created, destroyed, and tested. may Atomics atomics. existence the We postulate is simply name the The association, or binding, of an atomic. object to an atomic gives a semantic meaning to the therefore age" is an object and 24 is atomics.) atomic contains information., For example, "Mike's and descriptive to used an atomic. than representations Binding "Mike's information that Mike is age" 24. (We will more use bit strings to describe to 24 the produces An atomic that is bound to an object t is called the materialization of Ot and is denoted by The binding process is called the materialization tOt. process. The database is the set of all objects. database comprises q objects, {O 1,.., is independent of (TO1 ,..., t}, 0 If is TOi tOj not materializations We say that tOi iff for every set of materializations independent of TO If of Toi is objects the binding materialization of a then toi is independent in of said to be all the database then other TOi The materialization of a said to be a ba'sic object. is }. the no change in TO3 results in a change in tO'. derived from TOn. object 0 Suppose of derived basic the object to an atomic. object is the is process The of mapping the materializations of other objects onto a new atomic and binding that atomic to the object. For example, consider the objects "Mike's birth year", "Mike's age", and "current year". birth year" The materialization of the object "Mike's is the process of mapping the materialization of Mike's age and the current year onto an atomic and binding that atomic to "Mike's birth year". The materialization of a derived object is defined terms of other t basis of 0 objects, called the object's basis. t is denoted by Bt. More ordered set of objects, Bt={ 0 ,..., of the basis is "Mike's age" is The [1,Su]. where p>0. 0 P} is an The size rt i. by For example, component of "Mike's birth year". Bt, tBt, defined to be The materialization of two bases, TBu and identical are denoted the first of materialization {T01,...,top}. tBV, is t B The The ith member of Bt is called a p=St. component of Ot and precisely, in iff Su=Sv is truli=TrV' and The rule for materializing to for i in from tBt is called the materialization mapping, Mt For 0 derived from sense to i t to be a component of Ot, we require that to be TO i . The rationale is that define tot in terms of it doesn't objects t materializations are completely unrelated to to. requirement imposed on the make whose Another components of Ot is that the A basis, B , components must form a ccmplete basis for 0 . is following the if is complete 0t If true. is materialized at two different instances to produce t0tl and then 10 t1=ot2 iff tBt =TBt2 . Hence h0t is said to be t completely determined by tBt. A necessary conditibn for 0 t2 completeness is that all objects that tot is derived from t are either in tB or are objects that tBt is derived from. between O corresponds to Each node, Nt, directed edges. A directed edge, Eu,v, leads from Nu to N component that the convention to corresponds components two leftmost the the first through St component follow of child from of Nt left Therefore, Bt corresponds to the children of N. the graph are an object, iff 0 is a When drawing object graphs, we will use Ou. of nodes and comprises graph object An objects. relationships the show to used An object graph is not edges leading from it. that and to Nt right. Cycles in A leaf is a node with no permitted. A node, Nt, basic, otherwise Nt is derived. node a is a leaf iff The primary basis of Nt is t is the set of all basic objects whose nodes are descendents of Nt in the graph. We define a cell in terms of its materialization. The materialization of a cell is a substring of an atomic. If we can decompose to the completely such that each tCtri is where p=S t, (in into a set of cells {Ctl ,...,CtPl then we say that sense used earlier) by trt, An non-cellular object is in shown and object cellular a of example is cellular. determined figure 1. iff tC ti-Trtri to its basis, tBt (tOt=TBt), each i in [1,St]. t by definition, each (relations), corresponding object graph, and the materialization of object. materializations PERSON and YEAR dashed by tables. materializations, tedious process (materialization of The lines. Some nodes instead their tables are the naming labelled are with of names, to avoid the object. each The of the) person's age is derived from (the materialization of) her birth year and the to connected are Objects example If then tOt/tBt Figure 2 shows an instance of two tables the be for cellular is not If an object, 0t, t a The materialization of a cellular object, Ot, is defined to identical t the current year. current year were to change to 1981, then Joanne's age would materialize as 16. Note cellular, except for '15' and '14'. that all objects are OF OBJecr Ak~c) 'P-s A CT,-LLVLA R. flcff(FE jac,-(O Cf-(PONEMTS IJA HE i~ot tuA Mg I-tEIGHr (TtJCid3) gq0 THC HATekiALVlafTlJO\S objeCT MAME Ar JD OF i-rS A t\30 - CFLLULAK CoPJE~S 131l KT EAK AGE 51'oU cLL~ENT IQgo. VUL 13 Fjukre 2% TMig i^ e PERW$~) AMT B8en-M veJr, tI Jco.nne, {,~. YA&TAL A66T % ~.tro~in61966 LcZi1 6 i I4ATGP,,i ALI EAaI\IjS OF 0 &J EC FS Kin~ P6'~~~ eC Irzylok)id We are is Object Ot a position to define virtual information. in now virtual statements is iff one of following the two true. (i) TOtfTBt (ii) Trt8i is virtual for some i in [1 ,St Thus, Ot is virtual to identical the if its materialization object in its basis is virtual. t then is said figure 2, the respect to materialization to be objects their of is not its basis or if an If statement (i) is true virtual with respect to Bt. '15' and bases. '14' are virtual In with Objects tl, t2, and PERSON are virtual, but not with respect to their bases. As a corollary we information, from a virtual to the that Ot is virtual show object, 0u u completely described by 0u). a path must exist between N path from Nt to Nu edges leading from N From (although If Ptu, is virtual. if it virtual is not derived necessarily from Ou then in the object graph. A an ordered set of directed to Nu, where Also, of Ot is derived and N the definition of virtual because OUis virtual. definition information, Ot rl,..Eik u Pt=-Et is 0 ik virtual is virtual if 0 il Therefore, by induction, Ot must be virtual. is 15 For another example of virtual the PAY relations an distributions within Each tuple themselves are top secret, and we a to study salary want salary Since organization. confidential, is we Suppose object. salary information 3. figure in VPAY of PAY comprises a basic name object and (Otl,0t2 ,etc.) basic and consider information, executive salaries new relation define a In System R consisting only of the salaries under $50,000. {R6} VPAY could be defined as follows. DEFINE VIEW VPAY AS: SELECT SALARY FROM PAY WHERE SALARY < 50000 The corresponding object graph and example materialization are also in figure 3. as 0PAY and M The The System R statements define BVPAY in terms of the salary virtual object in figure 3 is 0 only cell . of tO . The reason that 0 VPAY is virtual is that, although it is cellular, VPAY34 PAY. We can generalize this observation by noting . TO t0 that if the information embodied in the materialization of an object, to , is a subset of the information embodied tBt, then 0t is a virtual object. in PAY 5 0 AP,- ti Chr-15 1)6 __ ____ 1110 6,456 L3Frarz H ALA fZY II)5 ___ tSCeo~ I..) qO2 S5qcl 67,e c296 ?pT-k+on PAM rxni VAY Ho5+PriXli za-iorL-s oA 0 b5 ects ( no+ calt Shown) iaoco Harcio. lchb- ElI F-orsj I (oru1cter.. GA.* objerd 6&rek Fcubre3 fi.)s9)j_ Virtual Information: A Functional Perspective Virtual information from objects existing enables objects, new of construction make the new objects to appear as real as the existing objects. This is analagous to data type abstraction in programming languages{R18,R20}. type Data abstraction types from existing data types, appear as of new data construction enables to make the new data real as the existing data types. types We proceed to devlop this analogy further. Liskov and Zilles define an abstract data type "a class of characterized by objects"{R18}. abstract objects the operations which is available to be completely on those In other words, to define an abstract data type one simply defines (i) the operations that can manipulate the data (ii) (iii) type, the internal representation of the data type, the implementation of each operation. and A programming CLU{R20} as (such capacity language with an abstract data type to a lesser extent, and, PASCALfR25}) real, provides a set of primitive data types (for example, boolean, integer, to enable definition of new data etc.) Abstract data types can also be used to define types. abstract definition of an integer stack{R18}. empty(the stack). specified of Second, the internal representation is to be an array of integers with a pointer to the top of the stack. terms relevant the First, the push, pop, looktop, erasetop, enumerated: are operations consider example, an As types. data new operation each Third, is defined in manipulation of the stack and the pointer to the top of the stack. Implementation details defining an abstract are data type. data type only the behavior -- data is of importance. type know how simplified details. manipulate to by not the being when important very However, when using the the functionality -- of the The programmer needs only to data type. complicated by His task is implementation The result should be that programs will be easier to write and maintain{Rl8}. 19 Virtual information is analagous to data type abstraction both in terms of implementation and in terms of benefits. type The only abstraction difference concept is that there is no data analagous to non-virtual information. A virtual object is completely characterized by operations on (as materialization that object mapping). it is useful to spectrum from pure data to pure to {R7}, "recorded facts information in the data base data." by the We therefore take a functional perspective of virtual information. perspective, defined To view investigate this information along a algorithm{R7}. According which are independent of other may be considered as pure In terms of our definition, basic objects are pure objects. An example of pure algorithm is the trigonometric sine function. In between pure lies information. derived combination of pure data and data A pure and derived pure algorithm object algorithm. is A a virtual object is a derived object with the additional quality that the materialization of the object's basis not be identical to the employee materialization salary of information the is object itself. represented comprising tuples that comprise three fields: name (EMPN), (WSAL), Although to ASAL, the employee's weekly salary by Suppose a table the employee in and theemployee's annual salary in dollars dollars (ASAL). a functional dependency exists between tOWSAL and the materializations are independent. The table is 20 shown in figure 4.FTy re 4 A tJon - Vir voi Tdi6 A5AL- KiSAL EMP' 3mo 600 \ERTRUDE toA 5SO .2860O L1.'J EON60o t B^A Obj ed H ate, alia~ O b erk 1r phk a The functional dependency is undesirable because in tOWSAL not will be automatically corresponding change in to reflected 5 .SALFigure change by how shows a the concept of virtual information can be applied to the tables figure in 4 to solve the dependency problem. the basis of table A' (BA) is 4, while tOA to tO The pure data . (WSAL) to give virtual information Note that the functionality of 0A is identical to (ASAL). delete, standard database operations such as insert, ; update, and retrieve still apply assuming that M A' -1 are suitably defined (for example, assuming (M ) the is pure algorithm (multiply by 52), via the materialization mapping, 0 figure is different than BA in identical a with combined In figure 5 update course, Of WSAL). MA' be always not may that ASAL by 52 before updating divides operation and Several example of this are given in invertible. {R16}. Table A' in figure 5 is virtual with respect to table A' were and table A' can be manipulated exactly as if TO A' A' B and is defined in terms of M Once 0 identical to tO . B, BA', can 0A implementation. function is manipulated The insulation extremely system design -- languages. be In its of implementation and regard important to database and database just as it fact, to without this support of data independence. is important insulation to programming is crucial to the Fivwe 9' A k/irf uoJ IcbfeeMPt4 W3,A L A5A L 1~OA m PI1 WJcSA L LECI~360o C-e KTR UDE 6so to. L3~IIA LEZIIIJ EFIZ Obd -c- HoMieriaI -- ~o n,,s cAbje-cA- nrafk. 5A 23 been has Data independence various in defined both precisely {R24} and imprecisely {R7,R15} fashions, to For our purposes it is sufficient the literature. in say independence exists if a change in rBt does not t assuming that Mt is necessarily imply a change in to, data that For example, data independence exists change. to allowed in the relations of figure 5 if a new field, be changing to without 0 to added contained in B ). can (note that BA' is data example, As another EMPAGE, independence at the applications programming level if changes to exists the physical storage structure do not force changes to programs. applications the refer We the literature for discussions of many the reader to the virtues of data independence {R13,R24}. information A virtual independence A. comprises the 0 in save storage space 0B creates Suppose the database Noting that the ASAL can figure 4. by shown in eliminating figure 5. the ASAL Without field. a have programs, However, to change for example, with a virtual to tO. He virtual information capability, since tBA has changed to tB A', would be the database administrator decides to WASL, from derived following way. the in data facilitates capability toA Therefore, applications using toA would have to be modified. information capability, a new table A' (shown in figure 5) can be defined in terms of to . We such that toA (in figure 4) is identical an M find must Indeed, 5). (in figure to* to they in identical are figures 4 and 5. In practice, find a suitable always to is impossible to guarantee that we can materialization is it because More importantly, however, a change in basis a corresponding change in Mt to Mt, such and B that tot is identical to tot, Mt. mapping, possible to change a basis in such a way as to lose information. Bt it may give us a non-invertible Data independence will exist for retrieval operations, but not for store operations. storable at level We say that an object 0 k if Mtk-i is invertible. This problem has been addressed in the computer literature {R16} and implemented DBMS systems. two is in In implemented systems there are possible reasons for non-storable virtual information: (i) The system doesn't support some forms of storable virtual information. (ii) The materialization mapping is not invertible-. 25 Applications of the Virtual Information Concept Chapter 2. A Non-DBMS Application of the Virtual Information Concept Data Reorganization/Translation The process of data translation is one of changing the new hardware/software system, or possibly a on processed reorganize a database range from efficiency to to the of simplification Figure 6 shows new hardware/software). necessity (e.g. to approach Michigan {Rl,R2,R3}. language definition source and target is language the University First, the source and target data respect with are defined, (DDL). to structure, by Second, a mapping between the defined by a translation definition The Reader then reads the source database (TDL). form. The Translator uses form. Finally, the Writer common the source data in the common form and the TDL mapping to produce target common data a using the source DDL and translates the data into a data a the problem taken by members of the Data Translation Project at of for Reasons more effectively on the same system {Rl,R2}. wanting be can it organization of the data in a database so that data in uses the target data definition to convert the target data to the correct target format. 26 FR~ &wa6 A DbofcLa TroasICLt"I C)n C eme~ Fisorhe Do--cdcx Tra c Ict "f~ i or) as on App r o~Virfvoi r-frmfdi'on I t~o Ic~ The data translation problE!m virtual of terms figure 6 is problem redrawn in the conte xt find to is Thus, structure at the target level. mapping that DBMS, DD a of that such M where Consider figure 7, information. tOD in restated easily is The 0D has the desired we are looking for a will give us our desired virtual view of the source data. DBMS Applications of the Virtual Information Concept Multiple Views of Data A view of an object is the Objects object. views of 0 object iff 0 Ot simply the materialization and Ou are said to be different is a descendent of both objects representation graph. in the We discuss multiple views of data because much of the work on implemented DBMS systems -- of views -- especially in has focussed on multiple views. Multiple views serve two major functions: (i) The support of multiple data models, the hierarchical, network and relational models. (ii) The support of multiple users' views, its specifically own each with requirements/authorization for a subset of the total database. the and system, DBTG the R, INFOPLEX, System ANSI/X3/SPARC architecture all have been discussed in terms the efficiently support Our data and relational) network, a but new models database the is 0D that Assume model to be used to support database 0n hierarchical, network, and relational databases (0h Or respectively). 0 In then TOh, the basis of 0h, other words, 0 be must, in each case, virtual if the If 0 , and not were model ,and ton, to r, and toD would all be identical Without proof, (since they have a common basis). that as is to express the problem in terms of purpose with to enough three standard models the only not virtual information. implemented multiple flexible be should models level below well. R5 that the database representation at the is support (hierarchical, {R15,R6,R13,and The difficulty in providing respectively}. model support model data multiple of we claim the materialization of different model views were always identical then little would be gained by supporting multiple models. The support of multiple users' some extent by terminology differs. block (PCB) In specifies "physical" databases sub-schema if most, definition the is DBMSs -- not all, IMS, the views program handled to although the communication mapping between "logical" and system, the In the DBTG specifies the mapping between the {R13}. 29 view facility enables selection of a subset of definition .R6}. System R is a. The view one as user one or more relations to be presented to the view the R, System In the "sub-schema" and the schema (R13}. support provided by IMS, DBTG, and important for two reasons: The user can view the This wants. part him frees database the of worrying from sees only be changed about Since irrelevant fields, sets, tuples, etc. he he part of the database, other parts can without him affecting -- thereby introducing data independence. b. Views can be defined particular and to permission perform operations on data can be selectively Thus views granted to users. can be used for security purposes. To see the connection between virtual information the of support multiple users' views, we can look at a user's view as a view of a database with only a the potential information. of the user's view is and subset of Obviously, the materialization not identical to the materialization of the user's basis, since the user's basis is identical to the basis of the database subset of the database. -- while the view is only a Prevention of Errors of The prevention securi-ty, integrity, errors comprises consistency, four areas: and reliability. These areas are defined as follows {R21}. Security: prevent users from accessing and modifying data in unauthorized ways. due prevent semantic errors made by users Integrity: to carelessness or lack of knowledge. Consistency: prevent interaction of semantic multiple errors processes due to operating concurrently on shared data. Reliability: prevent errors due to malfunctioning of hardware/software. Virtual information is important to two of areas: the above security and integrity. Security Security is related to virtual information because it necessarily implies different information content available to different users. provide security multiple view In fact, System R, DBTG, and IMS all facilities facilities. in For conjunction example, with in their System R, granted to specified users to perform certain permission is operations on specified views (in System R a user's overall smaller of views Another way that virtual information can be used objects). to of combination a is database view For provide security is through encryption techniques. example, a view of an object could be defined as a function of the view of the secure object Without insertion and key. encryption an of the correct key, the object would be garbled upon materialization. Integrity Virtual information supports integrity data type (for example, three kilograms to tons), related type concepts: and scaling (for example, Unit conversion and scaling can be thousands to millions). thought of as a subset of type conversion. are and other operations meaningful operations between done between conversions. unlike can- be made. objects a virtual like Second, where no objects it by rejects meaningful A discussion of the relationship of data conversions to integrity can is Type conversion First, it insures that comparison aids integrity two ways. Conversion Data (for example, real to integer), unit conversion conversion conversion ways: two conversion and elimination of redundancy. type conversion comprises performing in be information found in function {R21}. in the 32 following sense. represents materialization 0u component of (that For an example the number, of MIMS (u/m). a data the sole is as integer an The can we Using MIMS terminology, have user's facility, conversion system {R23}. an view associated of unit that field is then of MIMS allows To enable u/m conversions, always in that u/m. the user to create a file called the UM file. in whose t a virtual view of each field in a record measure real Ifu if 0u can be manipulated and real) then t0u is consider a at, object conversion is defined between types integer if is, that Assume Each record the UM file comprises a u/m code (feet, seconds, kg), u/m type (distance, time, mass), a u/m standard indicator. determination of the magnitudes. internal units measures of a given u/m type are to be shows a field an example UM file. is defined representation would be yards. and a The u/m definition expression enables relative specifies indicator definition, a with Thus, a u/m The in stored. standard which Figure all 8 referring to figure 8, if of feet the internal (the materialization of its sole component) MIMS U/M File u/m definition u/m standard u/m code u/m type inch distance 1 yes feet distance 12 no yard distance 36 no sec time 1 yes min time 60 no hour time 60*min no feet/sec velocity feet/sec no inch/sec velocity inch/sec yes Figure 8 A data conversion facility requires some sort of typing facility. One proposal for such a facility is contained in {R22}. That proposal is couched domain' definitions for relational attributes. definition comprises a description of the domain, an approach might integrity be representation details. of The domain objects in the of the domain. A These {R22}) , in order to hide issues will be discussed reference to INFOPLEX. Redundancy of (inserts, deletes) data must means susceptible that multiple updates be done when one object is to be updated (inserted, deleted). Database integrity is then to processes that either neglect to modify all occurences of the example, to can be terms to use the concept of abstract data types (this is pointed out in later in in ordering, and an action to be taken in case of attempted violation of the better data due completed. object or that are terminated (for a system crash) before all modifications On the other hand, redundancy facilitates processes that only retrieve the redundant data - since it tends to reduce inter-relation (record, set, etc.) references. For example, consider the network pictured in figure 9. Suppose we want to determine Kathy's school. find Person-School record via the P-PS set, since her her We school name (sname) is located in the Person-School record. We would have had to search in the P-PS set, were it not 35 Rc iuaeA Ec anpje Th-,cn RO-COr pnctme- Gcxrrnple. 5chcdI 'cr cL-jeJ t~nl~r~Q ~I~cj Ga m pI- Percscni Sc hcJ I G-(-6d jpn~ea 3flCjm1e- GPA aih 6a eii~n31q io(Cat ton 36 the for redundancy in sname. To eliminate the redundancy while maintaining the benefits of redundancy, we could make pname and sname virtual within In fact, DBTG this provides Person-School the record. with its VIRTUAL facility SOURCE clause {R13}. Data Encoding Data encoding is typically used to reduce the space required by objects. of special end-of-the-line encoding information, would object. fixed length lines text might be compressed via the substitution of a the Data For example, storage be clearly character falls for under blanks. trailing the hat of virtual since the materialization of a decoded object different from the materialization of an encoded Virtual Information Functions in INFOPLEX Chapter 3. the INFOPLEX DBMS. storage and high extremely desire for and comprises a a and retrieval, speed, storage high to relevance the discuss hierarchy functional of reliability, hierarchy for data for Our objective performing information management functions. is in INFOPLEX is the for motivation The phase. design currently computer, INFOPLEX {Rl5} is a database virtual information to INFOPLEX's functional hierarchy. Three functions in the INFOPLEX computer These are data virtual information functions. essentially are typing (including conversion, unit of measure, scaling and encoding), virtual view definition (including virtual views of relations and attributes) , and security. are functions their related, functional hierarchy cannot be example, the same level. For data encoding must be done at a low level so that levels with data information hierarchy is functions comparison comparisons on decoded data. virtual in the INFOPLEX placement at Although- these depicted do the Our proposal for placement of functions in can in the INFOPLEX functional fi.gure 10. Intuitively, the security, integrity, and data typing (except data encoding) levels belong above the view definition level because LC VlkTUAL TN-FCRJAATIO&i EoNJc,TIONtJL I hleZ ThM THO TAIFJOPLE-. PI i)L/QU~fZy VI rta1V e0;I e+;cir, j / 39 terms virtual of discussion greater in Placement will be views (and objects). discussed later in slightly expressed be security, integrity, and data typing may detail. complete A the INFOPLEX functional hierarchy is beyond of The, reader is referred the scope of this paper. {R15} to and {R28}. The Security Function prevent users accessing from unauthorized ways. previously, stated The security function, as and modifying more general necessary to prevent data modification. to to in data Although data access can be effectively controlled through encryption, a mechanism is the assert userid) already exists. true is We assume that the identity We are then left method of with a user (the two major design issues: (i) Location Strategies 1. Keep access permission information for information for objects in a user profile. 2. Keep access permission objects with the object itself. 40 to (ii) Basis Strategies (not with confused be an object's basis) 1. Control access on a views basis. 2. Control access on a query modification. basis. Logically, the user that creates an object should be solely responsible for granting access permission to other users. This is the strategy used in System R {R6}. user A If to grant permission to user's B and C, and wishes location strategy 1 is used then A must access B's and user a profiles--which may be C's located in a slow retrieval on (e.g. level of the INFOPLEX storage hierarchy tape). If location strategy 2 is used then only the object profile must be retrieved because accessed). the the (which is object likely to be readily accessible likely is Furthermore, location granting of to access to been have 2 strategy all users facilitates (System R uses a special keyword of PUBLIC for this purpose). question recently Finally, the "what users have access to this object" is easier to answer with location strategy 2. The only advantage of location strategy 1 is that access rights can be determined before an object is accessed. strategy for INFOPLEX is 2. Overall, the better location Query modification is really a form of view definition in the sense that the modified query corresponds to view that only lasts for the duration of the query. INGRES {R13} DBMS uses query modification as security. We a prefer to seperate the a basis newThe for view definitions functions from the security function, and therefore favor the use of view definitions as a basis for security (rather than something the security level does) in INFOPLEX. is the approach taken, for example, by System R, IMS, This and DBTG. The Virtual View Definition Function (VVDF) The VVD function is to enable the construction of new objects from objects already in the database. specifying, for object 0t, Bt and Mt. earlier, most existing DBMSs have some This means As was mentioned (perhaps limited) VVDF. One of provided by the most System comprehensive R {R6}. VVDF facility The System R user defines a view preceeding a query with the header "DEFINE VIEW" the example definition. on A catalog When an object is If the name page ?). The holds all referred to, is (see query itself is the view the view definitions. System R checks the catalog. of the object is corresponding definition is present in the'catalog, substituted for the the object referenced. A System R view definition of explicitly defined unless updated object cannot be one-to-one a is there is Mt) A view-defined query itself. the by Bt retrieval The mapping, materialization the operation (i.e., defines Ot objects. other referencing by simply object between the object and an object defined in correspondence a base relation (System R's terminology for basic objects). a We propose below located view similar level, the then data If it typing not be able to do type conversion for objects would Security cannot defined by the VVD level. if INFOPLEX, the data typing and security levels. were above the data typing level for facility be VVD; below it were then the security function could not be handled via virtual views. Suppose that a user wishes to view of collection a INFOPLEX's n-ary level. the of define objects a already hierarchical defined The user writes a query to at define virtual (hierarchical) object in terms of the existing n-ary objects. operations that That can query take effectively place on all defines the new object. We assume that the hierarchical external view level provides a mechanism for mapping hierarchical operations into n-ary 43 . query Since the explicitly of retrieval (for mapping the mechanism would be non-trivial). (although operations such modification operations problem the update), defines relation and delete, insert, of inferring the latter exists. general we can permit modification operations in a n-ary (for but not (M) atomics), as materialization the (object) if the In virtual materializations of the attributes of the virtual relation are invertible functions of of the materializations of the attributes exactly We can delete or insert a tuple only if relation. the constraint that the attributes of the virtual include we meet relation least one candidate attribute from each of the at if we mentioned above. In relations in the basis of the virtual relation, and meet the one invertibility requirement addition, modification is possible when deletes and inserts are possible. The Data Typing Function Each object has its own materialization - determines the operations that reasonable to behavior can identical, the manipulate object the that by defining the object. It is expect that many objects would have similar materialization mappings. were of mapping If the materialization they could be catalogued, mappings and each object could refer to the catalog for its materialization mapping. We could ref3r to catalog a instead Thus, definition. materialization mapping for with include the usual types (real, complex each integer, constraints, composite data types distinct (e.g. data levels, the encoding (DE) level. encoding techniques inplementation a new object, we Types would as etc.) could well network, etc.). extensive require as type would size constraints, and array of integers). typing can be divided into two data typing (DT) level and the data The DE for minimizing level comprise encoding/decoding techniques. in {R28}. type facilities to enable parameterization of, range In INFOPLEX object an object type. Such a data typing facility would for example, new relation, hierarchy, types (array, parameterization an defining of each object associate as entry is responsible storage a use. catalog for Its of Further details can be found The DT level is responsible for: (i) insuring differing that operations between objects of data types are rejected or that one or both objects are properly converted 45 (ii) enforcing (i) with respect to different units of measure and scale factors allowing user definition of the (iii) materialization of objects at the next higher level (iv) providing whatever necessary deemed suport towards an abstract data typing facility. The advantage of typing materialization operation The well-defined. to impossible mapping disadvantage pre-specify all all would is that a the would it A be since the type data facility, similar to those used in programming abstraction such that be explicit and object types -- number of possible object types is infinite. languages, is objects would have to be built{R19,R29}. facility is has of No abstract data type non-trivial. bases The design been in developed the facility for data literature. We prefer restricting INFOPLEX data typing to objects whose materializations structure (e.g., real, have integer, a relatively string, addition, a limited data abstraction lines of {R22},.would be quite useful. facility, simple etc.). along In the 46 To perform (i) requests to (iv) higher from parse must level the DT all levels in order to identify each of the objects references and each of the operations involved. Suppose the following request were issued: SELECT EMP.NAME,EMP.VACATION TIME WHERE EMP.VACATION TIME<EMP.SICK TIME catalog DT The EMP relation and an example is shown in The DT level would convert the request to: figure 11. SELECT EMP.NAMEEMP.VACATION TIME WHERE REAL GREATER THAN(CONVERT INT TO REAL( EMP.VACATIONTIME),EMP.VACATION TIME) When the response is returned by the lower levels, DT must modify the objects EMP.NAME and EMP.VACATION TIME so that they are materialized in the prescribed format. The above example illustrates two implementation considerations: (i) Lower levels must be capable of handling arithmetic complex operations comparison and the generated by the DT level. (ii) Virtual view definitions must be modified by DT level before definition (VVD) level. level were above to going the Note DT objects could not be typed. the that level virtual if the the view VVD then. vir~tual 47 Th& EMP The R~~ci C-o-±CL Icj Obiech-On Me CLS - e, TieL mm AIO - ies -9 T ne va-oi Cm Nateri al ikc+C L)re-, Daseo-I Itr 48 Conclusion have We concepts. database to attempted information model and a corresponding notation virtual information concepts major fault of the notation is retrieving information defined in terms of can the an which with concept The of database in fairly well materialization the provide communicated. that while the from be related many of Virtual information is a collection mapping, the concept of changing the information content of the database is poorly defined. A more desirable information model would completely characterize the behavior of objects in terms of the operations that can be performed on them. The implementation of virtual information in was discussed in terms of general issues. INFOPLEX Further details could be given only if assumptions were made as to details of the interface between the virtual information levels and other levels. Hopefully, this determination of those details. paper will help in the Bibliography and FryJ.P. Merten,A.G. R1. Approach Language , Data and Ann April 1974 Michigan, Fry,J.P. R2. Data Michigan, Translation Project, The University of Arbor, Description Translation", File to "A Translation Project, Formulation Data Reorganization", Data of Definition a "Towards and Jervis,D.W., Michigan, The University of Ann Arbor, Michigan, January 1974 Lum,V., Behymer,J.A., Data Description and Language", Information Frank,R.L., Schneiderman,B., "Stored-Data A Translation: Systems, Vol. Model and 2, pp.95-148, 1977 Pergamon Press, Tsichritzis,D. R4. Taylor,R.W., SmithD.C.P., Fry,J.P., R3. and Klug,A., "The ANSI/X3/SPARC DBMS Database Systems and Tsichritzis,D., "Multiple View Support Framework", X3/SPARC Study Group on (X3 Project 226) Klug,A. R5. Within the ANSI/SPARC Framework", Proceedings on Very Large Data Bases, Astraham,M.M. R6. to et al, "System R Relational Approach Database Management", ACM Transactions on Database Systems, Vol. R7. 1977 1, No. 2, June 1976 Folinus,J.J., Madnick,S.E., Schutzman,H.B., "Virtual Information in Data-Base Systems", Information Systems Group, Sloan School of Management, M.I.T., Cambridge, MA 02139 R8. Each,M.J., System: A Coguen,N.F., Kaplan,M.M., Generalized Approach Conversion", Proceedings on 1979, R9. Very "The ADAPT Towards Data Large Data Bases, pp183-193 Navathe,S.B., Schkolick,M., "View Representation Logical Database Design", SIGMOD in Proceedings 1978, ppl44-156 R10. NavatheS.B., Databases: FryJ.P., Three "Restructuring Levels of for Large Abstraction", ACM Transactions on Database Systems, June 1976, Rll. Vol. 1, No. 2, Is The Use Of pp138-158 Brodie.M.L., Abstract Schmidt,J., Data Types In "What Data Bases?", VLDB '78, pp140-141 R12. Smith,J.M., "A Normal Form Proceedings R13. For Abstract Syntax", on Very Large Data Bases, 1978, pp156-162 Date,C.J., "An Introduction to Database Systems", Addison-Wesley Publishing Co., Reading, Mass., 1975 R14. CardenasA., "Data Base Management Systems", Allyn and Bacon, Inc., Boston, Mass., 1979 R15. Madnick,S.E., Concepts and "The Infoplex Directions", Database Proceedings Computer: of the IEEE Computer Conference, pp168-176, February 26, 1979 R16. Codd,E.F., "Recent Investigations Into Relational IFIP Congress 1974 Data Base Systems", Proc. R17. Liskov,B.H., Zilles,S.N., "Specification Techniques 10, No. for Data Abstractions", SIGPLAN Notices, Vol. 6, R18. pp72-87, June 1975 Types", Data Notices, SIGPLAN Abstract With "Programming Liskov,B., Zilles,S., 9, Vol. 4, No. pp50-59, April 1974 R19. R20. Liskov,B., "Abstraction pp56 4 -5 76 SnyderA., Atkinson,R., Mechamisms in CACM, CLU", 8, Vol. International Conf. Data Bases, Sept. 1975 McLeod,D.J., "High Level Conference on Data Base", Data: Base on Very Large Definition Domain Proc. Data for Subsystem a Integrity", Proc. Relational "Functional Chamberlin,D.D., of Specifications R22. Schaffert,C., , August 1977 Eswaran,K.P., R21. 2, 1976, pp58-59 8, No. SIGPLAN Notices, Vol. Bases", Data for Hammer,M.M., "Data Abstractions ACM Abstraction, in a SIGPLAN/SIGMOD Definition, and Structure (March 1976) R23. Mitrol, Inc., "MIMS Request Reference Manual", June 1975 R24. Stonebraker,M.R., Independence", Proc. "A Functional View of Data 1974 ACM SIGMOD Workshop on Data Description, Access and Control R25. Jensen,K., Wirth,N., "Pascal: User Manual and 52 Report", R26. Springer-Verlag, I.B.M., CMS Users 1979 Second Guide, Edition, August 1977 R27. Chen,P.P.S., "The Entity-Relationship TODS, Vol. R28. Hsu,M., Functional Machine", Management, R29. et 1, No. ACM 1, March 1976 "Architectural Hierarchy of Unpublished M.I.T., Model", Specifications the Draft, Infoplex Sloan of the Database School of 1980 "Data Abstractions For al, ACM TODS, Vol. Databases", 4, No. Lockemann,P.C. 1, March 1979, pp60-75