DATABASE MANAGEMENT SYSTEM

Unit I
Database System Architecture - Basic Concepts: Data System, Operational Data, Data Independence, Architecture for a Database System, Distributed Databases. Storage Structures: Representation of Data. Data Structures and Corresponding Operators: Introduction, Relational Approach, Hierarchical Approach, Network Approach.

Unit II
Relational Approach: Relational Data Structure: Relation, Domain, Attributes, Key. Relational Algebra: Introduction, Traditional Set Operations, Attribute Names for Derived Relations, Special Relational Operations.

Unit III
Embedded SQL: Introduction, Operations not involving cursors, Operations involving cursors, Dynamic statements. Query by Example: Retrieval operations, Built-in Functions, Update operations, QBE Dictionary. Normalization: Functional dependency, First, Second and Third normal forms, Relations with more than one candidate key, Good and bad decomposition.

Unit IV
Hierarchical Approach: IMS data structure: Physical Database, Database Description, Hierarchical sequence. External level of IMS: Logical Databases, the Program Communication Block. IMS Data manipulation: Defining the Program Communication Block, DL/1 Examples.

Unit V
Network Approach: Architecture of DBTG System. DBTG Data Structure: The set construct, Singular sets, Sample Schema, the external level of DBTG. DBTG Data Manipulation.

UNIT I

1. Database system
A database system is essentially a computer-based record-keeping system, that is, a system whose overall purpose is to record and maintain information. The information concerned can be anything that is deemed to be of significance to the organization the system serves, in particular anything that supports the decision-making processes involved in managing that organization. A database system involves four major components: data, hardware, software and users.
[Figure: users (application programmers and end users) reach the database through application programs and the database management system]
Fig: Simplified picture of a database system

Data
The data stored in the system is partitioned into one or more databases. A database is a repository for stored data. In general it is both integrated and shared.
Integrated: The database can be thought of as a unification of several otherwise distinct files, with the redundancy among those files eliminated. Example: combining the EMPLOYEE and ENROLLMENT data files.
Shared: Individual pieces of data in the database can be shared among different users, that is, many users can have access to the same piece of data. Example: the department information in the EMPLOYEE file would be shared by users in the personnel department, the education department and so on.

Hardware
The hardware consists of the secondary storage devices (disks, drums, etc.) on which the database resides, together with the associated devices, control units, channels and so forth.

Software
Between the physical database and the users of the system is a layer of software, usually called the DBMS. All requests from users for access to the database are handled by the DBMS. One general function provided by the DBMS is thus the shielding of database users from hardware-level detail. The DBMS provides a view of the database that is elevated somewhat above the hardware level, and supports user operations that are expressed in terms of that higher-level view.

Users
We consider three broad categories of database users:
* Application programmers
* End-users
* DBA

1. Application programmers
An application programmer is responsible for writing application programs that use the database. These application programs operate on the data in all the usual ways: retrieving information, creating new information, and deleting or changing existing information.

2. End-users
End-users access the database from a terminal.
An end-user may employ a query language provided as an integral part of the system, or may invoke a user-written application program that accepts commands from the terminal and in turn issues requests to the DBMS on the end-user's behalf.

3. Database administrator
One of the main reasons for using a DBMS is to have central control of both the data and the programs that access those data. The person who has such control over the system is called the DBA. The main functions of the DBA are:
* Schema definition
* Storage structure and access-method definition
* Schema and physical-organization modification
* Granting of authorization for data access
* Integrity-constraint specification
These are the various components of a database system.

2. Operational data
A database is a collection of stored operational data used by the application systems of some particular enterprise. Here "enterprise" is a conventional generic term for any reasonably self-contained commercial, scientific, technical or other organization.
Examples: manufacturing company, bank, hospital, university, government department.
An enterprise must maintain a lot of data about its operation. The operational data for the enterprises listed above would be, respectively, product data, account data, patient data, student data and planning data.

Example for the illustration of operational data
Consider a manufacturing company. The enterprise will wish to retain information about the projects it has on hand, the parts used in those projects, the suppliers who supply the parts, the warehouses in which the parts are stored, the employees who work on the projects, and so on. These are the basic entities about which data is recorded in the database (an entity is any distinguishable object). In general there will be associations or relationships linking the basic entities together. For example, there is an association between suppliers and parts: each supplier supplies certain parts, and conversely each part is supplied by certain suppliers.
[Figure: the entities projects, suppliers, parts, warehouses, locations, employees and departments, joined by arrows that represent associations]
Fig: An example of operational data

The figure illustrates the following points.
1. Most associations link two types of entity, but some link more than two. Example: the arrow connecting suppliers, parts and projects. Here supplier S2 supplies part P4 to project J3.
2. One arrow involves only one type of entity (parts). Example: some parts are components of other parts (a screw is a component of a larger assembly such as a chair).
3. The same entities may be associated in more than one relationship. Example: projects and employees are linked in two relationships:
a. the employee works on the project
b. the employee is the manager of the project
This example illustrates operational data and the associations within it.

3. Data independence
The ability to modify a schema definition at one level without affecting the schema at the next higher level is called data independence.
Many applications outside a database system are data-dependent. This means that the way in which the data is organized in secondary storage and the way in which it is accessed are both dictated by the requirements of the application, and moreover that knowledge of the data organization and access technique is built into the application logic. For example, if a file is stored in indexed sequential form, the application must know which indexes are defined in order to modify the file. Here the application is data-dependent, and a change to the storage organization requires the application program to be rewritten.
In a database system, applications are independent of the way the data is stored, so a modification made at the physical or conceptual level need not affect them.
Two types of data independence are distinguished.

1. Physical data independence
Physical data independence is the ability to modify the physical schema without causing application programs to be rewritten. Modifications at the physical level are occasionally necessary to improve performance. Example: modifying the storage structure of the database (for instance with an ALTER command).
2. Logical data independence
Logical data independence is the ability to modify the logical schema without causing application programs to be rewritten. Example: adding new columns or fields to the database.

Most such modifications are made by the DBA. The types of change that the DBA may wish to make can be explained with the help of the following definitions.
Stored field: A stored field is the smallest unit of data stored in the database. Example: a database containing information about parts would probably include a stored field type called part number.
Stored record: A stored record is a named collection of associated stored fields.
Stored file: A stored file is the collection of all occurrences of one type of stored record.

A change to the data type of a stored field is likewise made by the DBA. The aspects of the storage structure that may change include the following.
1. Representation of numeric data
Data may be stored in internal arithmetic form or as a character string.
2. Representation of character data
A character field may be stored in any of several character codes (e.g. EBCDIC, ASCII).
3. Units for numeric data
The units in a numeric field may change. Example: from inches to centimeters.
4. Data coding
In some situations it may be desirable to represent data in storage by coded values. Example: the part color RED may be stored as the code 1, so that 1 = 'RED'.
5. Structure of stored records
Two existing types of stored record may be combined into one. Example: the record types (part number, color) and (part number, weight) may be integrated to give (part number, color, weight). Conversely, a single type of stored record may be split into two. Example: (part number, color, weight) may be broken down into (part number, color) and (part number, weight).
6. Structure of stored files
A given stored file may be physically implemented in storage in a wide variety of ways. Example: storing the file on a single storage volume or spreading it across several volumes.
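The kinds of change listed above (data coding, and combining or splitting stored records) can be illustrated with a small sketch. Everything in it is hypothetical: the part data, the colour coding scheme and the function names are made up for illustration and do not come from any particular DBMS. The point it tries to show is only that the record returned to the application is the same whichever stored layout is used.

```python
# Assumed coding scheme: colours are stored as small integers (data coding).
COLOR_CODES = {1: "RED", 2: "GREEN", 3: "BLUE"}

# Layout A: one stored record type (part number -> (colour code, weight)).
combined_file = {"P1": (1, 12), "P2": (2, 17)}

# Layout B: the same data split into two stored record types.
color_file = {"P1": 1, "P2": 2}
weight_file = {"P1": 12, "P2": 17}


def get_part_combined(part_no):
    """Build the record the application sees from layout A."""
    code, weight = combined_file[part_no]
    return {"part_no": part_no, "color": COLOR_CODES[code], "weight": weight}


def get_part_split(part_no):
    """Build the same record from layout B."""
    return {
        "part_no": part_no,
        "color": COLOR_CODES[color_file[part_no]],
        "weight": weight_file[part_no],
    }


# The application-level record does not depend on the stored layout.
assert get_part_combined("P1") == get_part_split("P1")
```

In this sketch the application would call only one accessor; swapping `get_part_combined` for `get_part_split` stands in for the DBA changing the mapping after a reorganization of the stored records.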
Data independence implies that the database is able to grow without affecting existing applications.

4. Architecture for a database system
The architecture is divided into three general levels: internal, conceptual and external.

External level     (individual user views)
Conceptual level   (community user view)
Internal level     (storage view)
Fig: Three levels of architecture

* Internal level (physical level)
This level is the one closest to physical storage. It is a low-level representation of the entire database and consists of many occurrences of each of many types of internal record. The storage view is described by means of the internal schema, which not only defines the various stored record types but also specifies what indexes exist, how stored files are represented, what physical sequence the stored records are in, and so on.
* Conceptual level (community logical level)
This level is a representation of the entire information content of the database. It consists of many occurrences of each of many types of conceptual record. It is also a level of indirection between the other two levels.
* External level (user logical level)
This level is the one closest to the users and is concerned with the way the data is seen by individual users. The users may be application programmers, end-users, the DBA and so on. Each user has a language at his or her disposal to interact with the database. For the application programmer it will be a conventional programming language such as C++ or Java. For the end-user it will be either a query language or some special-purpose language. Each such language includes a data sublanguage (DSL), that is, the subset of the total language that is concerned with database objects and operations. The DSL is embedded within the corresponding host language.
A given system might support any number of host languages and any number of data sublanguages. However, one particular data sublanguage that is supported by almost all current systems is SQL.
Any given data sublanguage is a combination of at least two subordinate languages: a data definition language (DDL) and a data manipulation language (DML). The DDL portion consists of declarative constructs and the DML portion consists of executable statements.
The individual user will generally be interested only in some portion of the total database. Moreover, that user's view of that portion will generally be somewhat abstract when compared with the way the data is physically stored. The term for an individual user's view is an external view. An external view is thus the content of the database as seen by some particular user. For example, a user from the personnel department might view the details of employees and departments and nothing else.

Detailed system architecture
[Figure: users A1 and A2 (each with host language + DSL) work through external schema A and external view A; users B1 and B2 (each with host language + DSL) work through external schema B and external view B. External/conceptual mappings A and B connect the external views to the conceptual view (conceptual schema). The conceptual/internal mapping connects the conceptual view to the stored database (internal level), which is described by the storage structure definition (internal schema). The database management system (DBMS) spans all levels below the user interface.]
Fig: Database system architecture

Mappings
The mappings involved in the architecture are the conceptual/internal mapping and the external/conceptual mappings. The conceptual/internal mapping defines the correspondence between the conceptual view and the stored database. It specifies how conceptual records and fields are represented at the internal level. If the structure of the stored database is changed, then the conceptual/internal mapping must be changed accordingly, so that the conceptual schema can remain invariant.
The effects of such changes must be isolated below the conceptual level, in order to preserve physical data independence. The external/conceptual mapping defines the correspondence between a particular external view and the conceptual view.

Database administrator (DBA)
The data administrator (DA) is the person who makes the strategic and policy decisions regarding the data of the enterprise, and the DBA is the person who provides the necessary technical support for implementing those decisions. Thus the DBA is responsible for the overall control of the system at a technical level. The major tasks of the DBA are:
* defining the conceptual schema (schema definition)
* storage structure and access-method definition
* schema and physical-organization modification
* granting of authorization for data access
* integrity-constraint specification

DBMS
The DBMS is the software that handles all access to the database. Conceptually, an access request is handled as follows.
1. A user issues an access request using some particular data sublanguage.
2. The DBMS intercepts that request and analyses it.
3. The DBMS, in turn, inspects the external schema for that user, the corresponding external/conceptual mapping, the conceptual schema, the conceptual/internal mapping and the storage structure definition.
4. The DBMS executes the necessary operations on the stored database.

Fig: Major functions of the DBMS and its components (source schemas and mappings, DDL processors, planned DML requests, unplanned DML requests, DML processor, query language processor, compiled requests, enforcement of security and integrity constraints)

5. Distributed databases
The key objective of a distributed system is that it should look like a centralized system to its users. Distributed processing means that distinct machines can be connected together into a communication network such as the Internet, so that a single data-processing task can span several machines in the network.
A distributed database is typically a database that is not stored in its entirety at a single physical location, but rather is spread across a network of computers that are geographically dispersed and connected through communication links. For example, consider a banking system in which the customer accounts database is distributed across the bank's branch offices, such that each individual customer account record is stored at the customer's local branch. In other words, the data is stored at the location at which it is most frequently used, but is still available through the communication network to users at other locations, for example users at the bank's central office.

[Figure: several sites, each acting as both client and server, connected by a communication network]
Fig: Distributed database

Advantages
* Efficiency of local processing
* Data sharing
Disadvantages
* Overhead may be quite high
* Technical difficulties

6. Storage structures and their purposes
Data is maintained for future reference, so it has to be stored. Various techniques exist for the storage and access of data, such as sequential access and direct access.
Once the data is stored at the internal level (physical storage), it is accessed through DML operations expressed in terms of external records. The DBMS converts these into operations on stored records, and these must be converted in turn into operations at the actual hardware level, that is, operations on physical records or blocks. The component responsible for this internal/physical conversion is called an access method. The access method consists of a set of routines whose function is to conceal all device-dependent details from the DBMS and to present the DBMS with a stored record interface.

[Figure: USER, user interface (external record occurrences), DBMS, stored record interface (stored record occurrences), access method, physical record interface (physical record occurrences)]
Fig: The stored record interface

The stored record interface thus corresponds to the internal level, just as the user interface corresponds to the external level.
The stored record interface also allows the DBMS to view the storage structure as a collection of stored files, each consisting of all occurrences of one type of stored record. The DBMS knows:
* what stored files exist
* the structure of the corresponding stored records
* the stored fields on which each file is sequenced
* the stored fields that can be used for direct access
This information is specified as part of the storage structure definition. The DBMS does not know:
a) anything about physical records
b) how sequencing is performed
c) how direct access is performed
This information is specified to the access method, not to the DBMS.

When a new stored record occurrence is first created and entered into the database, the access method is responsible for assigning it a unique stored record address (SRA). This value distinguishes the stored record occurrence from all others. The SRA for a particular occurrence is returned to the DBMS by the access method when the occurrence is first created, and may be used by the DBMS for subsequent direct access to that occurrence. The SRA for a given occurrence does not change until the occurrence is physically moved as part of a database reorganization.

7. How data is stored in physical storage
There are various possible representations of data in storage, and some of them are explained here. Consider the following example.

S#    Sname   Status   City
S1    Smith   20       London
S2    Jones   10       Paris
S3    Blake   30       Paris
S4    Clark   20       London
S5    Adams   30       Athens

The table holds information about five suppliers. For each supplier a supplier number, a supplier name, a status value and a location are recorded. The supplier number is unique for each supplier, so it serves as the primary key, and the records are sequenced on it. This is the simplest form of data representation: a single stored file containing five record occurrences, one per supplier number.
If there were 10,000 suppliers rather than five, located in only 10 different cities, storage would be wasted by repeating the 10 city names across 10,000 supplier records. Instead, the city attribute can be separated into a file of its own, with each supplier record holding a pointer to the appropriate city record. This gives a second form of data representation:

Supplier file
S#    Sname   Status   City-ptr
S1    Smith   20       -> London
S2    Jones   10       -> Paris
S3    Blake   30       -> Paris
S4    Clark   20       -> London
S5    Adams   30       -> Athens

City file
City
Athens
London
Paris

In the figure above the pointers run from the supplier file to the city file, and they are SRAs (stored record addresses). The advantage of this representation over the previous one is that it saves storage space.

The third form of data representation is indexing. If a file is indexed on one of its attributes (typically one that queries use frequently), access to the file by that attribute becomes much easier. The representation can be:

City index (indexed on city)
City     Supplier-ptr
Athens   -> S5
London   -> S1, S4
Paris    -> S2, S3

Supplier file
S#    Sname   Status
S1    Smith   20
S2    Jones   10
S3    Blake   30
S4    Clark   20
S5    Adams   30

For example, the query "Find all suppliers in a given city" can be answered easily when the data is represented in this indexed form.
The purpose of indexing is to provide an access path to the file. An index is a file in which each entry (record) consists of a data value together with one or more pointers. The data value is a value for some field of the indexed file, and the pointers identify the records in the indexed file that have that value for that field. An index can be used in two ways: for sequential access to the indexed file, in the order of the indexed field, and for direct access to individual records in the indexed file on the basis of a given value for that same field.
Another form of data representation is the multilist organisation.
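The three representations discussed above (a single supplier file, a supplier file with pointers into a city file, and an index on city) can be sketched as follows. This is a minimal illustration only: list positions stand in for stored record addresses, and the names `supplier_file`, `city_index` and `suppliers_in` are made up for the sketch rather than taken from any real access method.

```python
# Representation 1: one stored file, sequenced on the primary key S#.
suppliers = [
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
    ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
]

# Representation 2: city names factored out into a city file.
# Each supplier record holds a pointer; a list position stands in for an SRA.
city_file = sorted({city for _, _, _, city in suppliers})
supplier_file = [
    (s_no, name, status, city_file.index(city))
    for s_no, name, status, city in suppliers
]

# Representation 3: an index on city. Each entry pairs a data value (the city)
# with pointers to the supplier records that have that value.
city_index = {}
for position, (_, _, _, city) in enumerate(suppliers):
    city_index.setdefault(city, []).append(position)


def suppliers_in(city):
    """Direct access: follow the index pointers for one city."""
    return [suppliers[position][0] for position in city_index.get(city, [])]


def suppliers_by_city():
    """Sequential access: visit the indexed file in city order."""
    return [suppliers[p][0] for city in sorted(city_index) for p in city_index[city]]
```

With this data, `suppliers_in("London")` follows two pointers and returns S1 and S4 without scanning the whole supplier file, which is the benefit the index is meant to provide.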
8. DATA STRUCTURES AND CORRESPONDING OPERATORS
The range of data structures supported at the user level is a factor that critically affects many components of the system. It dictates the design of the corresponding data manipulation language, since each DML operation must be defined in terms of its effect on those data structures. Database systems may be categorized according to the approach they take. The best-known approaches are:
* Relational approach
* Hierarchical approach
* Network approach

The relational approach
The relational approach uses a collection of tables to represent both data and the relationships among those data. Each table has multiple columns and each column has a unique name.

Sample relational database: Bank

Customer
customer-name   social-security-no.   customer-street   customer-city   account-no.
Johnson         192-83-7465           Alma              Palo Alto       A-101
Smith           019-28-3746           North             Rye             A-215
Hayes           677-28-9011           Main              Harrison        A-102
Turner          182-73-6091           Putnam            Stamford        A-305
Johnson         192-83-7465           Alma              Palo Alto       A-201
Jones           321-12-3123           Main              Harrison        A-217
Lindsay         336-66-9999           Park              Pittsfield      A-222
Smith           019-28-3746           North             Rye             A-201

Accounts
account-no.   balance
A-101         500
A-215         700
A-102         400
A-305         350
A-201         900
A-217         750
A-222         700

For example, customer Johnson, whose social-security number is 192-83-7465, lives on Alma in Palo Alto and has two accounts: A-101 with a balance of 500 and A-201 with a balance of 900. Smith and Johnson share account A-201.
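The sample bank database above can be loaded into an in-memory SQLite database to show how the two tables are related through the account number. This is only a sketch: the table and column names and the helper functions `accounts_of` and `holders_of` are assumptions made for the example, and Python's standard `sqlite3` module is used simply because it needs no setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (name TEXT, ssn TEXT, street TEXT, city TEXT, account_no TEXT)"
)
conn.execute("CREATE TABLE account (account_no TEXT PRIMARY KEY, balance INTEGER)")

conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?, ?)",
    [
        ("Johnson", "192-83-7465", "Alma", "Palo Alto", "A-101"),
        ("Smith", "019-28-3746", "North", "Rye", "A-215"),
        ("Hayes", "677-28-9011", "Main", "Harrison", "A-102"),
        ("Turner", "182-73-6091", "Putnam", "Stamford", "A-305"),
        ("Johnson", "192-83-7465", "Alma", "Palo Alto", "A-201"),
        ("Jones", "321-12-3123", "Main", "Harrison", "A-217"),
        ("Lindsay", "336-66-9999", "Park", "Pittsfield", "A-222"),
        ("Smith", "019-28-3746", "North", "Rye", "A-201"),
    ],
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-101", 500), ("A-215", 700), ("A-102", 400), ("A-305", 350),
     ("A-201", 900), ("A-217", 750), ("A-222", 700)],
)


def accounts_of(ssn):
    """Return (account_no, balance) pairs for one customer, by joining the tables."""
    rows = conn.execute(
        "SELECT a.account_no, a.balance FROM customer c "
        "JOIN account a ON a.account_no = c.account_no "
        "WHERE c.ssn = ? ORDER BY a.account_no",
        (ssn,),
    )
    return rows.fetchall()


def holders_of(account_no):
    """Return the names of all customers who hold the given account."""
    rows = conn.execute(
        "SELECT name FROM customer WHERE account_no = ? ORDER BY name", (account_no,)
    )
    return [name for (name,) in rows]
```

Here `accounts_of("192-83-7465")` finds Johnson's two accounts and `holders_of("A-201")` finds the two customers who share that account. No pointers are stored; the relationship is carried entirely by matching account-number values in the two tables.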
Network model
Data in the network model is represented by collections of records, and relationships among data are represented by links, which can be viewed as pointers.

Sample network database
[Figure: customer record (Johnson, 192-83-7465, Alma, Palo Alto) linked to account record (A-101, 500); customer record (Smith, 019-28-3746, North, Rye) linked to account record (A-215, 700)]

Hierarchical model
This form of data representation is similar to the network model in that data and relationships among data are represented by records and links respectively. It differs from the network model in that the records are organized as collections of trees rather than arbitrary graphs.

9. Advantages of using a DBMS
Many enterprises choose to store their operational data in an integrated database because it provides the enterprise with centralized control of that data, which is one of its most valuable assets. The DBA has central responsibility for the operational data. The advantages of storing data under centralized control are as follows.

1. Redundancy can be reduced
In a non-database system each application has its own private files, which may cause redundancy in the stored data. Integration avoids this.

2. Inconsistency can be avoided (to some extent)
Suppose the fact "employee E3 works in department D8" is represented by two distinct entries in the database, and the system is not aware of this duplication. If only one of the entries is updated, the two will no longer agree and the database is in an inconsistent state. If the redundancy is removed, this cannot happen. If the redundancy is not removed but is controlled, the system can guarantee that the database is never inconsistent as seen by the user, by ensuring that any change made to either of the two entries is automatically made to the other. This process is known as propagating updates.

3. The data can be shared
New applications can access the existing stored data.

4. Security restrictions can be applied
Users can access the database only if they have been given permission. Permissions are granted by the DBA, so the data is kept secure.
5. Integrity can be maintained
Integrity means that the data in the database is accurate. Under centralized control the data can be validated whenever it is entered or updated.

10. Database administrator
One of the main reasons for using a DBMS is to have central control of both the data and the programs that access those data. The person who has such central control over the system is called the database administrator (DBA). The functions of the DBA include the following.
Schema definition: The DBA creates the original database schema by writing a set of definitions that is translated by the DDL compiler to a set of tables that is stored permanently in the data dictionary.
Storage structure and access-method definition: The DBA creates appropriate storage structures and access methods by writing a set of definitions, which is translated by the data-storage and data-definition-language compiler.
Schema and physical-organization modification: Programmers accomplish the relatively rare modifications either to the database schema or to the description of the physical storage organization by writing a set of definitions that is used by either the DDL compiler or the data-storage and data-definition-language compiler.
Granting of authorization for data access: Granting different types of authorization allows the DBA to regulate which parts of the database various users can access.
Integrity-constraint specification: The DBA sets constraints (conditions) that data must satisfy when it is entered into the database. Example: the balance in an account must be at least 500.
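The minimum-balance rule mentioned above can be written as a declarative constraint so that the DBMS, not each application, rejects invalid data. The sketch below uses an SQL CHECK constraint through Python's standard `sqlite3` module. The table layout and the `open_account` helper are assumptions made for the example, not part of the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The constraint is part of the schema definition, so it applies to every
# program that inserts or updates rows in this table.
conn.execute(
    "CREATE TABLE account ("
    " account_no TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 500))"
)


def open_account(account_no, balance):
    """Try to insert an account; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO account VALUES (?, ?)", (account_no, balance))
        return True
    except sqlite3.IntegrityError:
        return False
```

A call such as `open_account("A-102", 400)` is refused by the database itself because it violates the CHECK condition, and a second account with an existing account number is refused by the primary-key constraint in the same way.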
UNIT I

Objective questions
1. A database system is
   a) Computer-based billing system  b) Computer-based record keeping system  c) Computer-based animation system
2. The software used for access to the database is
   a) BASIC  b) PASCAL  c) DBMS
3. End-users access the database from the terminal using
   a) Query language  b) English language  c) C language
4. DBA stands for
   a) Data Base Administrator  b) Data Base Access  c) Data Batch Administration
5. Which of the following is not operational data?
   a) Product data  b) Account data  c) Two numbers
6. The database system provides the enterprise with ___________ control of its operational data
   a) Centralized  b) Single  c) Shared
7. The ability to modify the schema definition at one level without affecting the schema at the next higher level is called
   a) Data dependence  b) Data independence  c) Data abstraction
8. Which of the following is not a level of database architecture?
   a) External  b) Logical  c) Super  d) Conceptual
9. A data sublanguage is a combination of
   a) DDL and DML  b) DDL and TCL  c) C and C++
10. A database that is not stored in its entirety at a single physical location but is spread across a network is a
   a) Centralized database  b) Distributed database  c) Shared database
11. A DBMS is
   a) Software that handles all access to the database  b) Hardware  c) An interface between end-user and computer
12. The component responsible for internal/physical conversion is called
   a) Access method  b) Internal conversion  c) Hardware
13. SRA is
   a) Stored Record Array  b) Stored Record Access  c) Stored Record Address
14. A primary key is a key which
   a) Avoids duplication of data  b) Supports duplication of data  c) Allows null values
15. Data is represented in terms of 1) relational approach 2) hierarchical approach 3) network approach
   a) 1,2  b) 1,2,3  c) None of the above
16. Data in the relational approach is represented as 1) tables 2) tuples 3) relations
   a) 1  b) 1,2  c) 1,2,3  d) None
17. Data in the network approach is represented through
   a) Records and links  b) Tables  c) Trees
18. The ___________ permits the DBMS to view the storage structure as a collection of stored files.
   a) Stored record interface  b) Stored record address  c) Access method
19. An entity is
   a) Any distinguishable real-world object  b) Not an object  c) Incident
20. DBMS stands for
   a) Data Base Management System  b) Database Multimedia System  c) Data Base Management Standards

Short questions
1. What are the basic components of a database system?
2. Explain the components of a database system with the simplified diagram.
3. What is operational data?
4. Explain operational data with an example.
5. Explain data independence.
6. Why are database systems adopted rather than file systems? (Or: write down the advantages of a database system.)
7. Distinguish between input, output and operational data.
8. Explain the three levels of a database system in brief.
9. What is the role of the DBA?
10. What are the functions of the DBMS?
11. Explain distributed databases in brief.
12. Relate distributed databases to client-server architecture.
13. Explain access method, SRA and stored record interface.
14. Differentiate the relational, network and hierarchical approaches.
15. Explain any one form of data representation.

Elaborate questions
1. Describe the role of the DBA, explaining any one function in detail.
2. Describe the DBMS and its functions, advantages and disadvantages.
3. Database systems are widely adopted nowadays. Justify.
4. Explain the architecture of a database system.
5. Explain a database system with its simplified structure.
6. Explain storage structures with at least one representation.
7. Explain the various data structures used to represent data in a database system.

Unit II

Syllabus
Relational approach: Relational data structure: relation, domain, attributes, keys.
Relational algebra: Introduction, traditional set operations, attribute names for derived relations, special relational operations.

Books for Reference:
Database System Concepts - Abraham Silberschatz, Henry F. Korth, S. Sudarshan
An Introduction to Database Systems - C. J. Date
Principles of Database Systems - Jeffrey D. Ullman
An Introduction to Database Systems - Bipin C. Desai

Relational Approach

Introduction:
The relational model has established itself as the primary data model for commercial data-processing applications. The first database systems were based on either the network model or the hierarchical model. The relational model is now also being used in numerous applications outside the domain of traditional data processing.

Structure of relational databases
A relational database consists of a collection of tables, each of which is assigned a unique name. A row in a table represents a relationship among a set of values. The rows are termed tuples and the columns are termed attributes. Since a table is a collection of such relationships, there is a close correspondence between the concept of a table and the mathematical concept of a relation, from which the relational data model takes its name.
The following account table, or relation, has three column headers: branch-name, account-number and balance. These are its attributes. For each attribute there is a set of permitted values, called the domain of that attribute. For the attribute branch-name, the domain is the set of all branch names.
The account relation
Branch-name   Account-number   Balance
Downtown      A-101            500
Mianus        A-215            700
Perryridge    A-102            400
Round Hill    A-305            350
Brighton      A-201            900
Redwood       A-222            700
Brighton      A-217            750

Let D1 denote the set of all branch names, D2 the set of all account numbers and D3 the set of all balances. Any row of the account relation must consist of a 3-tuple (v1, v2, v3), where v1 is a branch name, v2 is an account number and v3 is a balance. In general, account will contain only a subset of the set of all possible rows, so account is a subset of
D1 × D2 × D3
In general, a table of n attributes must be a subset of
D1 × D2 × ... × Dn-1 × Dn
A relation is thus a subset of a Cartesian product of a list of domains. Because tables are essentially relations, the mathematical terms relation and tuple are used in place of the terms table and row respectively. The account relation in the figure above has seven tuples. Let the tuple variable t refer to the first tuple of the relation. We use the notation t[branch-name] to denote the value of t on the branch-name attribute. Thus t[branch-name] = "Downtown" and t[balance] = 500. Since a relation is a set of tuples, we use the mathematical notation t ∈ r to denote that tuple t is in relation r.

Domain:
A domain is a pool of values. A domain is atomic if its elements are considered to be indivisible units. For example, the set of integers is an atomic domain, but the set of all sets of integers is a nonatomic domain. The distinction is that we do not normally consider integers to have subparts, but we do consider sets of integers to have subparts, namely the integers comprising the set.
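The statement that a relation is a subset of a Cartesian product of domains, and the notation t[branch-name], can be checked with a short sketch. One simplifying assumption is made and should be kept in mind: each domain here is taken to be just the set of values that happen to appear in the account relation, whereas a real domain (all branch names, all balances) would be much larger. The helper `value` is a made-up name for the sketch.

```python
from itertools import product

# The account relation as a set of 3-tuples (branch-name, account-number, balance).
account = {
    ("Downtown", "A-101", 500),
    ("Mianus", "A-215", 700),
    ("Perryridge", "A-102", 400),
    ("Round Hill", "A-305", 350),
    ("Brighton", "A-201", 900),
    ("Redwood", "A-222", 700),
    ("Brighton", "A-217", 750),
}

# Assumed domains: only the values that occur in the relation above.
D1 = {t[0] for t in account}  # branch names
D2 = {t[1] for t in account}  # account numbers
D3 = {t[2] for t in account}  # balances

# All possible rows over these domains.
cartesian = set(product(D1, D2, D3))

# The relation holds only some of the possible rows.
assert account <= cartesian
assert len(account) < len(cartesian)

ATTRIBUTES = ("branch-name", "account-number", "balance")


def value(t, attribute):
    """Return t[attribute], the value of tuple t on the named attribute."""
    return t[ATTRIBUTES.index(attribute)]


t = ("Downtown", "A-101", 500)
assert t in account  # t is a member of the relation
assert value(t, "branch-name") == "Downtown"
```

Even with these artificially small domains the product contains far more rows than the seven tuples of the relation, which is why a relation is described as a subset of the product rather than the product itself.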
It is possible for several attributes to have the same domain. For example, suppose that we have a relation customer that has the three attributes customer-name, customer-street and customer-city, and a relation employee that includes the attribute employee-name. It is possible that the attributes customer-name and employee-name will have the same domain: the set of all person names. The domains of balance and branch-name are certainly distinct. It is perhaps less clear whether customer-name and branch-name should have the same domain. At the physical level, both customer names and branch names are character strings. At the logical level, however, we may want customer-name and branch-name to have distinct domains.

The customer relation

Customer-name   Customer-street   Customer-city
Jones           Main              Harrison
Smith           North             Rye
Hayes           Main              Harrison
Curry           North             Rye
Lindsay         Park              Pittsfield
Turner          Putnam            Stamford
Williams        Nassau            Princeton
Adams           Spring            Pittsfield
Johnson         Alma              Palo Alto
Glenn           Sand Hill         Woodside
Brooks          Senator           Brooklyn
Green           Walnut            Stamford

Relation:
Definition of a relation (mathematical): Given a collection of sets D1, D2, …, Dn (not necessarily distinct), R is a relation on those n sets if it is a set of ordered n-tuples <d1, d2, …, dn> such that d1 belongs to D1, d2 belongs to D2, …, dn belongs to Dn. The sets D1, D2, …, Dn are the domains of R. The value of n is the degree of R.

The concept of a relation corresponds to the programming-language notion of a variable. The concept of a relation schema corresponds to the programming-language notion of a type definition. It is convenient to give a name to a relation schema, just as we give names to type definitions in programming languages. We adopt the convention of using lowercase names for relations, and names beginning with an uppercase letter for relation schemas. For example,

Account-schema = (branch-name, account-number, balance)

The structure of relations can also be expressed diagrammatically with the help of E-R diagrams.
Before discussing E-R diagrams, the common terms used in them are reviewed.

Entity: An entity is a thing or object in the real world that is distinguishable from all other objects. For example, each person in an enterprise is an entity. An entity has a set of properties, and the values for some set of properties may uniquely identify an entity. For example, the social-security number 677-89-9011 (employee number 1111) uniquely identifies one particular person in the enterprise.

Entity Set: An entity set is a set of entities of the same type that share the same properties or attributes. The set of all persons who are customers at a given bank, for example, can be defined as the entity set customer.

Attributes: An entity is represented by a set of attributes. Attributes are descriptive properties possessed by each member of an entity set. Possible attributes of the customer entity set are customer-name, customer-street and customer-city. An attribute, as used in the E-R model, can be characterized by the following attribute types.

Simple and Composite attributes: Attributes that cannot be divided into subparts are simple attributes; attributes that can be divided into subparts are composite attributes. For example, name is a composite attribute made up of first-name, middle-name and last-name.

Single-valued and Multivalued attributes: The attributes that we have specified in our examples all have a single value for a particular entity. For instance, the loan-number attribute for a specific loan entity refers to only one loan number. Such attributes are said to be single valued. There may be instances where an attribute has a set of values for a specific entity. Such attributes are said to be multivalued.

Null attributes: A null value is used when an entity does not have a value for an attribute.

Derived attribute: The value for this type of attribute can be derived from the values of other related attributes or entities. For instance, suppose that the customer entity set has an attribute loans-held, which represents how many loans a customer has from the bank.
We can derive the value for this attribute by counting the number of loan entities associated with that customer.

Relationship sets
Consider the relation loan.

Branch-name   Loan-number   Amount
Downtown      L-17          1000
Redwood       L-23          2000
Perryridge    L-15          1500
Downtown      L-14          1500
Mianus        L-93          500
Round Hill    L-11          900
Perryridge    L-16          1300

A relationship is an association among several entities. For example, we can define a relationship that associates customer Hayes with loan number L-15. This relationship specifies that Hayes is a customer with loan number L-15.

A relationship set is a set of relationships of the same type. Formally, it is a mathematical relation on n ≥ 2 (possibly nondistinct) entity sets. If E1, E2, …, En are entity sets, then a relationship set R is a subset of

{(e1, e2, …, en) | e1 ∈ E1, e2 ∈ E2, …, en ∈ En}

where (e1, e2, …, en) is a relationship.

Consider the two entity sets customer and loan. We can define the relationship set borrower to denote the association between customers and the bank loans that the customers have. As another example, consider the two entity sets loan and branch. We can define the relationship set loan-branch to denote the association between a bank loan and the branch in which that loan is maintained.

Each row of a table represents one n-tuple of the relation. The number of tuples in a relation is called the cardinality of the relation. For example, the cardinality of the relation loan is 7.

Relations may be unary, binary, ternary or n-ary, according to their degree.

Unary: Relations of degree one are unary. For example, the query "Find the branch name that issued the loan with number L-17" on the loan relation gives the output

Branch-name
Downtown

Binary: Relations of degree two are binary. For example, the query "Find the branch-name and amount for loan-number L-17" on the loan relation gives the output

Branch-name   Amount
Downtown      1000

Ternary: Relations of degree three are ternary.

N-ary: Relations of degree n are n-ary.
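The terms degree and cardinality, and the unary and binary results above, can be checked with a short Python sketch. It assumes the loan relation is held as a set of tuples in the column order (branch-name, loan-number, amount); the variable names are illustrative only.

```python
# The loan relation as a set of (branch-name, loan-number, amount) tuples.
LOAN_SCHEMA = ("branch-name", "loan-number", "amount")
loan = {
    ("Downtown", "L-17", 1000),
    ("Redwood", "L-23", 2000),
    ("Perryridge", "L-15", 1500),
    ("Downtown", "L-14", 1500),
    ("Mianus", "L-93", 500),
    ("Round Hill", "L-11", 900),
    ("Perryridge", "L-16", 1300),
}

degree = len(LOAN_SCHEMA)   # number of attributes
cardinality = len(loan)     # number of tuples

# Unary result: the branch name that issued loan L-17.
unary = {(branch,) for (branch, number, amount) in loan if number == "L-17"}

# Binary result: branch name and amount for loan L-17.
binary = {(branch, amount) for (branch, number, amount) in loan if number == "L-17"}

print(degree, cardinality)
print(unary, binary)
```

Each result is itself a relation: unary has degree one and binary has degree two, and both have cardinality one here.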
Mapping cardinalities: Mapping cardinalities, or cardinality ratios, express the number of entities to which another entity can be associated via a relationship set. Mapping cardinalities are most useful in describing binary relationship sets, although occasionally they contribute to the description of relationship sets that involve more than two entity sets. For a binary relationship set R between entity sets A and B, the mapping cardinality must be one of the following:

One to one: An entity in A is associated with at most one entity in B, and an entity in B is associated with at most one entity in A.

One to many: An entity in A is associated with any number of entities in B. An entity in B, however, can be associated with at most one entity in A.

Many to one: An entity in A is associated with at most one entity in B. An entity in B, however, can be associated with any number of entities in A.

Many to many: An entity in A is associated with any number of entities in B, and an entity in B is associated with any number of entities in A.

Keys: In a relation there is an attribute, or a combination of attributes, whose values are unique within the relation and can thus be used to identify the tuples of that relation. For example, in the loan relation above, the loan number can be considered a key: it is unique, and can be used to distinguish each tuple from all other tuples in that relation. Before discussing the various keys, let us have a glance at integrity constraints.

Integrity constraints: An integrity constraint is a mechanism used by Oracle to prevent invalid data entry into a table. It is a rule enforced on a column of a table. The following are the various types of integrity constraints:

*Domain integrity constraints
These maintain values according to a specification such as the 'not null' condition, under which the user has to enter a value for the column on which it is specified. 'Not null' and 'Check' constraints fall under this category.

*Entity integrity constraint
This maintains the uniqueness of a record.
*Referential integrity constraint
This enforces a relationship between tables. To establish a 'parent-child' or 'master-detail' relationship between two tables having a common column, we make use of referential integrity constraints. To implement this, we define the column in the parent table as a primary key and the same column in the child table as a foreign key referring to the corresponding parent entry.

A constraint can be defined at either the table level or the column level. If it is defined at the table level, it can be enforced on any number of columns in the table. If, on the other hand, it is defined at the column level, it holds only for the column for which it is defined.

The various keys related to the relational approach are:

Primary Key: A primary key is a set of one or more attributes that, taken collectively, allows us to identify uniquely an entity in the entity set.
Ex. 1) Loan-number in the loan relation. 2) The combination of branch-name and loan-number also identifies a loan uniquely, although it is not minimal, since loan-number alone is sufficient.

Candidate Key: Several distinct sets of attributes could each serve as a key. Each such minimal set is a candidate key, and the primary key is the candidate key chosen as the principal means of identification.

Referenced Key: A unique or primary key that is defined on a column belonging to the parent table.

Foreign Key: A column or combination of columns, included in the definition of referential integrity, that refers to a referenced key.

Child table: This table depends upon the values present in the referenced key of the parent table, which is referred to by a foreign key.

Parent table: This table determines whether insertion or updating of data can be done in the child table. This table is referred to by the child table's foreign key.

On delete cascade clause: If a row containing a referenced key value is deleted from the parent table, then all rows in the child table with a dependent foreign key value are also deleted automatically.

Entity-Relationship Diagrams: An E-R diagram can express the overall logical structure of a database graphically.
Such a diagram consists of the following major components:

The symbol used to represent an entity set is a rectangle.
The symbol used to represent an attribute is an ellipse.
The symbol used to represent a link is a line.
The symbol used to represent a relationship set is a diamond.
The symbol used to represent a multivalued attribute is a double ellipse.
The symbol used to represent a derived attribute is a dashed ellipse.
The symbol used to represent total participation of an entity set in a relationship set is a double line.

Fig: E-R diagram for a banking enterprise (entity sets: account, branch, customer, loan; relationship sets: account-branch, depositor, loan-branch, borrower; attributes: account-number, balance, branch-name, branch-city, assets, customer-name, customer-street, customer-city, loan-number, amount)

The relations used for the discussion in this chapter are:

1. Account relation

Branch-name   Account-number   Balance
Downtown      A-101            500
Mianus        A-215            700
Perryridge    A-102            400
Round Hill    A-305            350
Brighton      A-201            900
Redwood       A-222            700
Brighton      A-217            750

2. Loan relation

Branch-name   Loan-number   Amount
Downtown      L-17          1000
Redwood       L-23          2000
Perryridge    L-15          1500
Downtown      L-14          1500
Mianus        L-93          500
Round Hill    L-11          900
Perryridge    L-16          1300

3. Branch relation

Branch-name   Branch-city   Assets
Downtown      Brooklyn      9000000
Redwood       Palo Alto     2100000
Perryridge    Horseneck     1200000
Mianus        Horseneck     400000
Round Hill    Horseneck     8000000
Pownal        Bennington    300000
North Town    Rye           3700000
Brighton      Brooklyn      7100000

4. Customer relation

Customer-name   Customer-street   Customer-city
Jones           Main              Harrison
Smith           North             Rye
Hayes           Main              Harrison
Curry           North             Rye
Lindsay         Park              Pittsfield
Turner          Putnam            Stamford
Williams        Nassau            Princeton
Adams           Spring            Pittsfield
Johnson         Alma              Palo Alto
Glenn           Sand Hill         Woodside
Brooks          Senator           Brooklyn
Green           Walnut            Stamford

5. Depositor relation

Customer-name   Account-number
Johnson         A-101
Smith           A-215
Hayes           A-102
Turner          A-305
Johnson         A-201
Jones           A-217
Lindsay         A-222

6. Borrower relation

Customer-name   Loan-number
Jones           L-17
Smith           L-23
Hayes           L-15
Jackson         L-14
Curry           L-93
Smith           L-11
Williams        L-17
Adams           L-16

Relational Algebra

Note: Query languages
A query language is a language in which a user requests information from the database. These languages are typically of a level higher than that of a standard programming language. Query languages can be categorized as either procedural or non-procedural. In a procedural language, the user instructs the system to perform a sequence of operations on the database to compute the desired result. In a non-procedural language, the user describes the information desired without giving a specific procedure for obtaining that information.

Introduction
Relational algebra is a collection of operations on relations. It is a procedural query language: it consists of a set of operations that take one or two relations as input and produce a new relation as their result. The fundamental operations, or traditional set operations, available in relational algebra are select, project, union, set difference, Cartesian product and rename. In addition to the fundamental operations, there are several other operations, namely set intersection, natural join, division and assignment. These operations are defined in terms of the fundamental operations. The selection, projection, join and division operations can also be regarded as special relational operations.

Fundamental operations
The select, project and rename operations are called unary operations, because they operate on one relation. The other three operations, union, set difference and Cartesian product, operate on pairs of relations and are therefore called binary operations.

The select operation
The select operation selects tuples that satisfy a given predicate. The lowercase Greek letter sigma (σ) is used to denote selection. The predicate appears as a subscript to σ, written in these notes as σ_{predicate}. The argument relation is given in parentheses following the σ.

Example:
1. Select those tuples of the loan relation where the branch is "Perryridge".
σ_{branch-name = "Perryridge"}(loan)

The result of the query is

Branch-name   Loan-number   Amount
Perryridge    L-15          1500
Perryridge    L-16          1300

2. Find all tuples in which the amount lent is more than $1200.

σ_{amount > 1200}(loan)

In general, comparisons using =, ≠, <, ≤, > and ≥ are allowed in the selection predicate. We can also combine several predicates into a larger predicate using the connectives and (∧) and or (∨).

3. Find those tuples pertaining to loans of more than $1200 made by the Perryridge branch.

σ_{branch-name = "Perryridge" ∧ amount > 1200}(loan)

The project operation
Suppose we want to list all loan numbers and the amounts of the loans, but do not care about the branch name. The project operation allows us to produce this relation. The project operation is a unary operation that returns its argument relation with certain attributes left out. Since a relation is a set, any duplicate rows are eliminated. Projection is denoted by the Greek letter pi (π). We list those attributes that we wish to appear in the result as a subscript to π. The argument relation follows in parentheses.

Example:
1. List all loan numbers and the amount of each loan. The corresponding query is

π_{loan-number, amount}(loan)

The relation that results from this query is

Loan-number   Amount
L-17          1000
L-23          2000
L-15          1500
L-14          1500
L-93          500
L-11          900
L-16          1300

The set difference operation
The set-difference operation, denoted by −, allows us to find tuples that are in one relation but not in another. The expression r − s results in a relation containing those tuples that are in r but not in s.

Example:
1. Find all customers of the bank who have an account but not a loan.

π_{customer-name}(depositor) − π_{customer-name}(borrower)

The result will be

Customer-name
Johnson
Turner
Lindsay

For a set-difference operation r − s to be valid, we require that the relations r and s be of the same arity, and that the domains of the ith attribute of r and the ith attribute of s be the same.
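The select, project and set-difference examples above can be reproduced with a minimal Python sketch. It assumes each relation is stored as a set of tuples together with a schema tuple naming its columns; the function names select and project are hypothetical helpers written for this illustration, not part of any library.

```python
LOAN = ("branch-name", "loan-number", "amount")
loan = {
    ("Downtown", "L-17", 1000), ("Redwood", "L-23", 2000),
    ("Perryridge", "L-15", 1500), ("Downtown", "L-14", 1500),
    ("Mianus", "L-93", 500), ("Round Hill", "L-11", 900),
    ("Perryridge", "L-16", 1300),
}
DEPOSITOR = ("customer-name", "account-number")
depositor = {
    ("Johnson", "A-101"), ("Smith", "A-215"), ("Hayes", "A-102"),
    ("Turner", "A-305"), ("Johnson", "A-201"), ("Jones", "A-217"),
    ("Lindsay", "A-222"),
}
BORROWER = ("customer-name", "loan-number")
borrower = {
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"),
    ("Jackson", "L-14"), ("Curry", "L-93"), ("Smith", "L-11"),
    ("Williams", "L-17"), ("Adams", "L-16"),
}

def select(schema, rows, predicate):
    """Sigma: keep the tuples for which the predicate holds.
    The predicate receives the tuple as a dict keyed by attribute name."""
    return {t for t in rows if predicate(dict(zip(schema, t)))}

def project(schema, rows, attributes):
    """Pi: keep only the listed attributes; the set removes duplicates."""
    positions = [schema.index(a) for a in attributes]
    return {tuple(t[i] for i in positions) for t in rows}

# sigma_{branch-name = "Perryridge"}(loan)
perryridge = select(LOAN, loan, lambda r: r["branch-name"] == "Perryridge")

# sigma_{branch-name = "Perryridge" AND amount > 1200}(loan)
large = select(LOAN, loan,
               lambda r: r["branch-name"] == "Perryridge" and r["amount"] > 1200)

# pi_{loan-number, amount}(loan)
numbers = project(LOAN, loan, ["loan-number", "amount"])

# pi_{customer-name}(depositor) - pi_{customer-name}(borrower)
account_only = (project(DEPOSITOR, depositor, ["customer-name"])
                - project(BORROWER, borrower, ["customer-name"]))

print(sorted(perryridge))
print(sorted(account_only))
```

Python's built-in set difference can be used directly here because both projections have the same arity and the same attribute domain, which is the validity condition stated above.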
The Cartesian-product operation
The Cartesian-product operation, denoted by a cross (×), allows us to combine information from any two relations. We write the Cartesian product of relations r1 and r2 as r1 × r2. Since the same attribute name may appear in both r1 and r2, we need to devise a naming scheme to distinguish between these attributes. We do so here by attaching to an attribute the name of the relation from which the attribute originally came. For example, the relation schema for r = borrower × loan is

(borrower.customer-name, borrower.loan-number, loan.branch-name, loan.loan-number, loan.amount)

So now we can distinguish borrower.loan-number from loan.loan-number. For those attributes that appear in only one of the two schemas, we shall usually drop the relation-name prefix. We can then write the relation schema for r as

(customer-name, borrower.loan-number, branch-name, loan.loan-number, amount)

This naming convention requires that the relations that are the arguments of the Cartesian-product operation have distinct names.

Assume that we have n1 tuples in borrower and n2 tuples in loan. Then there are n1 * n2 ways of choosing a pair of tuples, one tuple from each relation, so there are n1 * n2 tuples in r. In particular, note that for some tuples t in r, it may be that t[borrower.loan-number] ≠ t[loan.loan-number].

In general, if we have relations r1(R1) and r2(R2), then r1 × r2 is a relation whose schema is the concatenation of R1 and R2. The relation r1 × r2 contains all tuples t for which there is a tuple t1 in r1 and a tuple t2 in r2 such that t[R1] = t1[R1] and t[R2] = t2[R2].

For example:
1. Suppose we want to find the names of all customers who have a loan at the Perryridge branch. We need the information in both the loan relation and the borrower relation to do so. The relation borrower × loan is shown below.

Customer-name   Borrower.loan-number   Branch-name   Loan.loan-number   Amount
Jones           L-17                   Downtown      L-17               1000
Jones           L-17                   Redwood       L-23               2000
…               …                      …             …                  …
Adams           L-16                   Round Hill    L-11               900
Adams           L-16                   Perryridge    L-16               1300

Table: Result of borrower × loan

If we write

σ_{branch-name = "Perryridge"}(borrower × loan)

then the result is as follows.

Customer-name   Borrower.loan-number   Branch-name   Loan.loan-number   Amount
Jones           L-17                   Perryridge    L-15               1500
Jones           L-17                   Perryridge    L-16               1300
Smith           L-23                   Perryridge    L-15               1500
Smith           L-23                   Perryridge    L-16               1300
Hayes           L-15                   Perryridge    L-15               1500
Hayes           L-15                   Perryridge    L-16               1300
Jackson         L-14                   Perryridge    L-15               1500
Jackson         L-14                   Perryridge    L-16               1300
Curry           L-93                   Perryridge    L-15               1500
Curry           L-93                   Perryridge    L-16               1300
Smith           L-11                   Perryridge    L-15               1500
Smith           L-11                   Perryridge    L-16               1300
Williams        L-17                   Perryridge    L-15               1500
Williams        L-17                   Perryridge    L-16               1300
Adams           L-16                   Perryridge    L-15               1500
Adams           L-16                   Perryridge    L-16               1300

Table: Result of σ_{branch-name = "Perryridge"}(borrower × loan)

This relation pertains to the Perryridge branch alone. However, its customer-name column may contain customers who do not have a loan at the Perryridge branch, because the Cartesian product pairs every borrower tuple with every loan tuple. Since we want only the tuples in which the two loan numbers agree, the query can be rewritten as

σ_{borrower.loan-number = loan.loan-number}(σ_{branch-name = "Perryridge"}(borrower × loan))

In order to retrieve only the customer-name, we apply the projection operation:

π_{customer-name}(σ_{borrower.loan-number = loan.loan-number}(σ_{branch-name = "Perryridge"}(borrower × loan)))

The result is as shown below.

Customer-name
Hayes
Adams

Table: Result of π_{customer-name}(σ_{borrower.loan-number = loan.loan-number}(σ_{branch-name = "Perryridge"}(borrower × loan)))

The rename operation
Unlike relations in the database, the results of relational-algebra expressions do not have a name that we can use to refer to them. It is useful to be able to give them names; the rename operator, denoted by the lowercase Greek letter rho (ρ), lets us perform this task. Given a relational-algebra expression E, the expression

ρ_{x}(E)

returns the result of expression E under the name x. A relation r by itself is considered to be a trivial relational-algebra expression.
Thus, we can also apply the rename operation to a relation r to get the same relation under a new name.

A second form of the rename operation is as follows. Assume that a relational-algebra expression E has arity n. Then the expression

ρ_{x(A1, A2, …, An)}(E)

returns the result of expression E under the name x, and with the attributes renamed to A1, A2, …, An.

For example:
1. Find the largest balance in the bank.

The steps involved are:
Step 1: Compute a temporary relation consisting of those balances that are not the largest.
Step 2: Take the set difference between the relation π_{balance}(account) and the temporary relation just computed.

The temporary relation of step 1 is computed by the query

π_{account.balance}(σ_{account.balance < d.balance}(account × ρ_{d}(account)))

This expression gives those balances in the account relation for which a larger balance appears somewhere in the account relation (renamed as d). The result contains all balances except the largest one. The relation is

Balance
500
700
400
350
750

The query to find the largest account balance in the bank can then be written as follows:

π_{balance}(account) − π_{account.balance}(σ_{account.balance < d.balance}(account × ρ_{d}(account)))

The result of this query is

Balance
900

Fig: Largest account balance in the bank

2. Find the names of all customers who live on the same street and in the same city as Smith.

The street and city of Smith can be obtained by writing

π_{customer-street, customer-city}(σ_{customer-name = "Smith"}(customer))

In order to find other customers with this street and city, we must reference the customer relation a second time.
In the following query, we use the rename operation on the preceding expression to give its result the name smith-addr, and to rename its attributes to street and city instead of customer-street and customer-city:

π_{customer.customer-name}(σ_{customer.customer-street = smith-addr.street ∧ customer.customer-city = smith-addr.city}(customer × ρ_{smith-addr(street, city)}(π_{customer-street, customer-city}(σ_{customer-name = "Smith"}(customer)))))

The result of this query is as shown below.

Customer-name
Smith
Curry

Additional operations or special relational operations

1. The set-intersection operation
Set intersection is denoted by the symbol ∩.

Example:
1. Find all customers who have both a loan and an account. The query is

π_{customer-name}(borrower) ∩ π_{customer-name}(depositor)

The result will be

Customer-name
Hayes
Jones
Smith

Table: Customers with both an account and a loan at the bank

The intersection operation can be rewritten using the set-difference operation as

r ∩ s = r − (r − s)

The union operation
With the help of this operation we can collect the tuples that are present in either of two relations.

For example:
1. Find the names of all bank customers who have either an account or a loan or both.

The customer relation does not contain this information, since a customer does not need to have either an account or a loan at the bank. To answer this query we need the information in the depositor relation and in the borrower relation.

*To find the names of all customers with a loan at the bank:
π_{customer-name}(borrower)

*To find the names of all customers with an account at the bank:
π_{customer-name}(depositor)

To find the customers who hold an account, a loan, or both, we take the union of these two sets:

π_{customer-name}(borrower) ∪ π_{customer-name}(depositor)

The result of this query is

Customer-name
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Adams

For a union operation r ∪ s to be valid, we require two conditions:
1. The relations r and s must be of the same arity. That is, they must have the same number of attributes.
2. The domains of the ith attribute of r and the ith attribute of s must be the same, for all i.

Note that r and s can be, in general, temporary relations that are the result of relational-algebra expressions.

The natural-join operation
It is often desirable to simplify certain queries that require a Cartesian product. Typically, a query that involves a Cartesian product includes a selection operation on the result of the Cartesian product. Consider the query: find the names of all customers who have a loan at the bank, and find the amount of the loan.

Steps:
1. Form the Cartesian product of the borrower and loan relations.
2. Select those tuples that pertain to the same loan-number.
3. Project the customer-name, loan-number and amount.

π_{customer-name, loan.loan-number, amount}(σ_{borrower.loan-number = loan.loan-number}(borrower × loan))

The natural join is a binary operation that allows us to combine certain selections and a Cartesian product into one operation. It is denoted by the "join" symbol ⋈. The natural-join operation forms a Cartesian product of its two arguments, performs a selection forcing equality on those attributes that appear in both relation schemas, and finally removes duplicate attributes.

For example:
1. Find the names of all customers who have a loan at the bank, and find the amount of the loan.

π_{customer-name, loan-number, amount}(borrower ⋈ loan)

The result of the query is

Customer-name   Loan-number   Amount
Jones           L-17          1000
Smith           L-23          2000
Hayes           L-15          1500
Jackson         L-14          1500
Curry           L-93          500
Smith           L-11          900
Williams        L-17          1000
Adams           L-16          1300

2. Find the names of all branches with customers who have an account in the bank and who live in Harrison.

π_{branch-name}(σ_{customer-city = "Harrison"}(customer ⋈ account ⋈ depositor))

The result of the query is

Branch-name
Brighton
Perryridge

The division operation
The division operation, denoted by ÷, is suited to queries that include the phrase "for all".

Example:
1. Find all customers who have an account at all the branches located in Brooklyn.
Steps:
1. All branches in Brooklyn can be obtained as

r1 = π_{branch-name}(σ_{branch-city = "Brooklyn"}(branch))

The result is

Branch-name
Brighton
Downtown

2. We can find all (customer-name, branch-name) pairs for which the customer has an account at a branch by writing

r2 = π_{customer-name, branch-name}(depositor ⋈ account)

Customer-name   Branch-name
Johnson         Downtown
Smith           Mianus
Hayes           Perryridge
Turner          Round Hill
Lindsay         Redwood
Johnson         Brighton
Jones           Brighton

Table: Result of π_{customer-name, branch-name}(depositor ⋈ account)

3. We now need to find those customers who appear in r2 with every branch name in r1. We formulate the query by writing

π_{customer-name, branch-name}(depositor ⋈ account) ÷ π_{branch-name}(σ_{branch-city = "Brooklyn"}(branch))

The result is a relation with the schema (customer-name) that contains the single tuple (Johnson), since Johnson is the only customer with an account at both Brighton and Downtown.

Extended relational-algebra operations
The basic relational-algebra operations have been extended in several ways. A simple extension is to allow arithmetic operations as part of projection. An important extension is to allow aggregate operations, such as computing the sum of the elements of a set, or their average. Another important extension is the outer-join operation, which allows relational-algebra expressions to deal with null values, which model missing information.

Generalized Projection
The generalized-projection operation extends the projection operation by allowing arithmetic functions to be used in the projection list. The generalized projection has the form

π_{F1, F2, …, Fn}(E)

where E is any relational-algebra expression, and each of F1, F2, …, Fn is an arithmetic expression involving constants and attributes in the schema of E. As a special case, the arithmetic expression may be simply an attribute or a constant. The following example illustrates the use of the generalized-projection operation.
Suppose we have a relation credit-info, as shown, which lists the credit limit and the expenses so far (the credit-balance) of each customer. If we want to find how much more each person can spend, we can write the following expression:

π_{customer-name, limit − credit-balance}(credit-info)

Customer-name   Limit   Credit-balance
Jones           6000    700
Smith           2000    400
Hayes           1500    1500
Curry           2000    1750

Table: The credit-info relation

Customer-name   Limit − credit-balance
Jones           5300
Smith           1600
Hayes           0
Curry           250

Table: The result of π_{customer-name, limit − credit-balance}(credit-info)

Outer join
The outer-join operation is an extension of the join operation to deal with missing information.

Aggregate functions
Aggregate functions are functions that take a collection of values and return a single value as a result. For example, the aggregate function sum takes a collection of values and returns the sum of the values. The function sum applied to the collection <1, 1, 3, 4, 4, 11> returns the value 24. The function avg returns the average of the values; the average of the above collection is 4. The function count returns the number of elements in the collection, and would return 6 on the preceding collection. The functions min and max return the minimum and maximum values in a collection; on the preceding collection they return 1 and 11 respectively.

Examples:
1. Find the total sum of the salaries of all part-time employees in the bank. The query is

sum_{salary}(pt-works)

The result of this query is a relation with a single attribute, containing a single row with a numerical value corresponding to the sum of the salaries of all employees working part-time in the bank.

For further details of aggregate functions, refer to
1. Database System Concepts - Abraham Silberschatz, Henry F. Korth
2. An Introduction to Database Systems, chapter 4 - Bipin C. Desai (for the relational approach)

Short questions:
1. What is the relational approach?
2. What is relational algebra?
3. Write the definition of relational algebra.
4. What are the fundamental operations of relational algebra?
5. What is an entity, a relation, an entity set, a relationship, a relationship set and an attribute?
6. Briefly explain mapping cardinalities.
7. Draw the entity-relationship diagram for a banking enterprise.
8. Explain the selection and projection operations with examples.
9. Explain aggregate functions in brief.
10. Explain the set operations.
11. Explain unary, binary, ternary and n-ary relations.
12. What are the various symbols used in an entity-relationship diagram?
13. What is a constraint?
14. Write a note on integrity rules.
15. What is a key?

Elaborate questions:
1. Write the definition of a key and explain the various keys with examples.
2. Explain the structure of relational databases with an example.
3. Explain the referential integrity constraint or rule, with an example.
4. Explain all the fundamental operations of relational algebra, or traditional set operations, with examples.
5. List all the aggregate functions and explain them in detail with examples.
6. What are extended relational operations? Explain all the available operations.

Unit III

Syllabus
Embedded SQL: Introduction, operations not involving cursors, operations involving cursors, dynamic statements. Query by Example: retrieval operations, built-in functions, update operations, QBE dictionary. Normalization: functional dependency, first, second and third normal forms, relations with more than one candidate key, good and bad decomposition.

Books for Reference:
An Introduction to Database Systems - C. J. Date
Database System Concepts - Abraham Silberschatz, Henry F. Korth, S. Sudarshan
Principles of Database Systems - Jeffrey D. Ullman

Embedded SQL
SQL provides a powerful declarative query language; writing queries in SQL is typically much easier than coding the same queries in a general-purpose programming language. However, access to a database from a general-purpose programming language is still required, for the following two reasons.

1. Not all queries can be expressed in SQL, since SQL does not provide the full expressive power of a general-purpose language.
That is, there exist queries that can be expressed in a language such as Pascal, C, COBOL or FORTRAN that cannot be expressed in SQL. To write such queries, we can embed SQL within a more powerful language.

2. Nondeclarative actions, such as printing a report, interacting with a user, or sending the results of a query to a graphical user interface, cannot be done from within SQL.

A language in which SQL queries are embedded is referred to as a host language, and the SQL structures permitted in the host language constitute embedded SQL.

SQL statements operate on whole sets of records at a time, whereas languages such as PL/I are not well equipped to handle more than one record at a time. It is therefore necessary to provide some form of bridge between the two functional levels, and embedded SQL provides such a bridge by means of a new type of object called a cursor.

Operations not involving cursors
The DML statements that do not need cursors are as follows:
"Singleton SELECT"
UPDATE
INSERT
DELETE

Singleton SELECT
We use the term "singleton SELECT" to mean a SELECT statement for which the retrieved table contains at most one row.
Example: the SELECT statement of SQL.

UPDATE
This statement is executed to change existing data in the database.
Example: the UPDATE statement of SQL.

INSERT
This statement is used to add a new row of information.
Example: the INSERT statement of SQL.

DELETE
This statement is used to delete information from the database.
Example: the DELETE statement of SQL.

Operations involving cursors
Consider the case of a SELECT that selects a whole set of records, not just one. What is needed is a mechanism for accessing the records in the set one by one, and cursors provide such a mechanism.

Explicitly defined cursors are constructs that enable the user to name an area of memory that holds a specific statement for access at a later time. Explicit cursors are defined by the programmer in order to process a multiple-row active set one record at a time. The following are the steps for using explicitly defined cursors within PL/SQL.
1. Declare the cursor
* Name the cursor
* Associate a query with the cursor
Syntax
    Declare
        cursor cursor-name is select-statement;
Example
    Declare
        cursor c_names is select branch_name from branch where branch_city='Brooklyn';

2. Open the cursor
Opening the cursor activates the query and identifies the active set. Open also initializes the cursor pointer to just before the first row of the active set.
Syntax
    Open cursor-name;

3. Fetch from the cursor
Retrieving data from the cursor is accomplished with the fetch command. The fetch command retrieves the rows of the active set one row at a time.
Syntax
    Fetch cursor-name into record-list;

4. Close the cursor
The close statement closes (deactivates) the previously opened cursor and makes the active set undefined. Oracle will implicitly close a cursor when the user's program or session is terminated. After a cursor is closed, we cannot perform any operation on it.
Syntax
    Close cursor-name;

Attributes involved in cursors
%ISOPEN returns TRUE if the cursor is already open.
%FOUND returns TRUE if the last FETCH returned a row, and FALSE if the last FETCH failed to return a row.
%NOTFOUND is the logical opposite of %FOUND.
%ROWCOUNT yields the number of rows fetched so far.

Examples to illustrate cursors
1) Testing whether a cursor is open:
    Declare
        cursor c4 is select salary, job from emp where job='CLERK';
    Begin
        -- c4 has only been declared, so %isopen is false here
        if c4%isopen then
            dbms_output.put_line('This message will not be displayed');
        else
            open c4;
            dbms_output.put_line('Cursor was not open, so it has been opened');
        end if;
        close c4;
    end;

2) A procedure that updates each student's record by computing the total and the average:
    Declare
        st stu%rowtype;
        cursor c1 is select * from stu;
    Begin
        open c1;
        loop
            fetch c1 into st;
            exit when c1%notfound;   -- stop when no more rows remain
            st.total := st.m1 + st.m2 + st.m3;
            st.average := st.total / 3;
            -- a student passes only with at least 50 in every subject
            if st.m1 >= 50 and st.m2 >= 50 and st.m3 >= 50 then
                st.result := 'PASS';
            else
                st.result := 'FAIL';
            end if;
            update stu set total = st.total, average = st.average, result = st.result
                where regno = st.regno;
        end loop;
        close c1;
        commit;
    end;

Dynamic Statements
Embedded SQL provides certain features to facilitate the writing of on-line application programs, that is, programs that support on-line access to the database from an end-user at a terminal. The steps involved in such a program are:
1. Accept a command from the terminal.
2. Analyze the command.
3. Issue the appropriate SQL statements.
4. Return a message and/or results to the terminal.

The precompiler is a compiler for the SQL language. Suppose the application programmer has written a program P that includes some embedded SQL statements. Pre-compilation proceeds as follows.
* The precompiler scans the source program P and locates the embedded SQL statements.
* For each statement it finds, the precompiler decides on a strategy for implementing that statement in terms of RSI operations. This process is referred to as optimization.
* The precompiler replaces each of the original embedded SQL statements by an ordinary PL/I statement.

The dynamic SQL component of SQL-92 allows programs to construct and submit SQL queries at run time. In contrast, embedded SQL statements must be completely present at compile time, and they are compiled by the embedded SQL preprocessor. Using dynamic SQL, programs can create SQL queries as strings at run time (perhaps based on input from the user) and can either have them executed immediately or have them prepared for subsequent use. The two principal dynamic statements are PREPARE and EXECUTE.
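The idea of statement text that is built at run time and then executed repeatedly can be sketched outside PL/I as well. The following is a minimal illustration in Python using the standard sqlite3 module, not the mechanism described in these notes: sqlite3 has no separate PREPARE and EXECUTE statements, the helper build_delete and the sample rows are invented for the illustration, and the branch table only imitates the bank schema used in this unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE branch (branch_name TEXT, branch_city TEXT, assets INTEGER)")
conn.executemany(
    "INSERT INTO branch VALUES (?, ?, ?)",
    [("Perryridge", "Horseneck", 1700000),   # invented sample rows
     ("Downtown", "Brooklyn", 9000000),
     ("Brighton", "Brooklyn", 7100000)],
)

def build_delete(table, column):
    """Assemble the statement text at run time (hypothetical helper).
    Table and column names cannot be bound as parameters, so they are
    checked against a fixed list before being placed in the text."""
    allowed = {"branch": {"branch_name", "branch_city"}}
    if column not in allowed.get(table, set()):
        raise ValueError("unknown table or column")
    return f"DELETE FROM {table} WHERE {column} = ?"

# The statement text does not exist until this point in the run.
sql_source = build_delete("branch", "branch_city")

# The same statement text is executed twice with different values.
for city in ("Brooklyn", "Horseneck"):
    deleted = conn.execute(sql_source, (city,)).rowcount
    print(city, deleted)
```

The value to compare against is passed as a parameter rather than pasted into the string, which keeps user input out of the statement text itself.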
    DCL SQLSOURCE CHAR(256);
    SQLSOURCE = 'DELETE FROM BRANCH WHERE BRANCH_CITY = ''PERRYRIDGE''';
    $PREPARE SQLOBJ FROM SQLSOURCE;
    $EXECUTE SQLOBJ;

The PREPARE statement passes the SQLSOURCE string to the RDS precompiler, which goes through its normal process of parsing, optimization and code generation and builds a machine-language version of the statement, called SQLOBJ. The EXECUTE statement causes this machine-language routine to be executed and thus causes the actual deletions to occur.
Once PREPAREd, a given dynamically generated SQL statement can be EXECUTEd many times. The generated statement can be replaced by another by issuing PREPARE again with the same target and a different source.

QUERY-BY-EXAMPLE
Query-by-Example (QBE) is the name of both a data-manipulation language and the database system that included this language. The QBE database system was developed at the IBM T.J. Watson Research Center in the early 1970s. Today, some database systems for personal computers support variants of the QBE language.
It has two distinctive features:
1. Unlike most query languages and programming languages, QBE has a two-dimensional syntax: queries look like tables. A query in a one-dimensional language can be written on one (possibly long) line. A two-dimensional language requires two dimensions for its expression.
2. QBE queries are expressed “by example”. Instead of giving a procedure for obtaining the desired answer, the user gives an example of what is desired. The system generalizes this example to compute the answer to the query.

We express queries in QBE using skeleton tables. These tables show the relation schema, as shown below.
Example: the skeleton table for the branch relation

    branch | branch-name | branch-city | assets
    -------+-------------+-------------+-------
           |             |             |

Retrieval operations
Queries on one relation
Examples:
1. Find all loan numbers at the Perryridge branch.

    loan | branch-name | loan-number | amount
    -----+-------------+-------------+-------
         | Perryridge  | P._x        |

The preceding query causes the system to look for tuples in loan that have “Perryridge” as the value for the branch-name attribute.
For each such tuple, the value of the loan-number attribute is assigned to the variable x. The value of the variable x is “printed”, because the command P. appears in the loan-number column next to the variable x. QBE assumes that a blank position in a row contains a unique variable. As a result, if a variable does not appear more than once in a query, it may be omitted. Thus the previous query can be rewritten as

    loan | branch-name | loan-number | amount
    -----+-------------+-------------+-------
         | Perryridge  | P.          |

QBE performs duplicate elimination automatically. To suppress duplicate elimination, we insert the command ALL. after the P. command:

    loan | branch-name | loan-number | amount
    -----+-------------+-------------+-------
         | Perryridge  | P.ALL.      |

To display the entire loan relation, we can create a single row consisting of P. in every field.

    loan | branch-name | loan-number | amount
    -----+-------------+-------------+-------
         | P.          | P.          | P.

QBE allows queries that involve arithmetic comparisons.
Examples
1. Find the loan numbers of all loans with a loan amount of more than $700.

    loan | branch-name | loan-no. | amount
    -----+-------------+----------+-------
         |             | P.       | >700

The arithmetic comparisons that QBE supports are =, <, ≤, >, ≥ and ¬ (not equal).

2. Find the names of all branches that are not located in Brooklyn.

    branch | branch-name | branch-city | assets
    -------+-------------+-------------+-------
           | P.          | ¬Brooklyn   |

3. Find the loan numbers of all loans made jointly to Smith and Jones.

    borrower | customer-name | loan-no.
    ---------+---------------+---------
             | 'Smith'       | P._x
             | 'Jones'       | _x

4. Find the loan numbers of all loans made to Smith, to Jones, or to both jointly.

    borrower | customer-name | loan-no.
    ---------+---------------+---------
             | 'Smith'       | P._x
             | 'Jones'       | P._y

5. Find all customers who live in the same city as Jones.

    customer | customer-name | customer-street | customer-city
    ---------+---------------+-----------------+--------------
             | P._x          |                 | _y
             | Jones         |                 | _y

Queries on several relations
QBE allows queries that span several different relations. The connections among the various relations are achieved through variables that force certain tuples to have the same value on certain attributes.
Examples
1. Find the names of all customers who have a loan from the Perryridge branch.

    loan | branch-name | loan-no. | amount
    -----+-------------+----------+-------
         | Perryridge  | _x       |

    borrower | customer-name | loan-no.
    ---------+---------------+---------
             | P._y          | _x

2. Find the names of all customers who have both an account and a loan at the bank.

    depositor | customer-name | account-no.
    ----------+---------------+------------
              | P._x          |

    borrower | customer-name | loan-no.
    ---------+---------------+---------
             | _x            |

3. Find the names of all customers who have an account at the bank, but who do not have a loan from the bank. The ¬ under the relation name negates the borrower row.

    depositor | customer-name | account-no.
    ----------+---------------+------------
              | P._x          |

    borrower | customer-name | loan-no.
    ---------+---------------+---------
    ¬        | _x            |

4. Find all customers who have at least two accounts.

    depositor | customer-name | account-no.
    ----------+---------------+------------
              | P._x          | _y
              | _x            | ¬_y

The condition box
It is not always convenient to express all the constraints on the domain variables within the skeleton tables. To overcome this, QBE includes a condition box feature that allows the expression of general constraints over any of the domain variables.
Examples:
1. Find all customers who are not named 'Jones' and who have at least two accounts.

    depositor | customer-name | account-no.
    ----------+---------------+------------
              | P._x          | _y
              | _x            | ¬_y

    conditions
    ----------
    _x ¬= Jones

2. To find all account numbers with a balance between $1300 and $1500, we write

    account | branch-name | account-no. | balance
    --------+-------------+-------------+--------
            |             | P.          | _x

    conditions
    ----------
    _x ≥ 1300
    _x ≤ 1500

3. Find all branches that have assets greater than those of at least one branch located in Brooklyn.

    branch | branch-name | branch-city | assets
    -------+-------------+-------------+-------
           | P._x        |             | _y
           |             | Brooklyn    | _z

    conditions
    ----------
    _y > _z

Options available with the condition box
1. QBE allows complex arithmetic expressions to appear in a condition box.
Example: Find all branches that have assets that are at least twice as large as the assets of one of the branches located in Brooklyn.

    branch | branch-name | branch-city | assets
    -------+-------------+-------------+-------
           | P._x        |             | _y
           |             | Brooklyn    | _z

    conditions
    ----------
    _y ≥ 2 * _z

2. QBE allows logical expressions to appear in a condition box. The operators used are and (&) and or (|).
Example: Find all account numbers with a balance between $1300 and $2000 but not exactly $1500.

    account | branch-name | account-no. | balance
    --------+-------------+-------------+--------
            |             | P.          | _x

    conditions
    ----------
    _x = (≥ 1300 and ≤ 2000 and ¬ 1500)

The result relation
If the result of a query includes attributes from several relation schemas, we need a mechanism to display the desired result in a single table.
Example
1. Find the customer-name, account-no.
and balance for all accounts at the Perryridge branch.
In relational algebra, we would:
1. Join the depositor and account relations.
2. Project customer-name, account-no. and balance.
The corresponding steps in QBE are:
1. Create a skeleton table, called result, with attributes customer-name, account-no. and balance.
2. Write the query as follows.

    account | branch-name | account-no. | balance
    --------+-------------+-------------+--------
            | Perryridge  | _y          | _z

    depositor | customer-name | account-no.
    ----------+---------------+------------
              | _x            | _y

    result | customer-name | account-no. | balance
    -------+---------------+-------------+--------
    P.     | _x            | _y          | _z

Ordering of the display of tuples
By using the commands AO. (ascending order) and DO. (descending order) we can order the displayed tuples.
Example
1. List all customers in descending alphabetical order.

    depositor | customer-name | account-no.
    ----------+---------------+------------
              | P.DO.         |

Aggregate functions [Built-in functions]
QBE includes the aggregate operators AVG, MAX, MIN, SUM and CNT. We must postfix these operators with ALL. to create a multiset on which the aggregate operation is evaluated.
Examples
1. Find the total balance of all the accounts maintained at the Perryridge branch.

    account | branch-name | account-no. | balance
    --------+-------------+-------------+-----------
            | Perryridge  |             | P.SUM.ALL.

2. Find the total number of customers who have an account at the bank. UNQ. eliminates duplicates, so that each customer is counted once.

    depositor | customer-name   | account-no.
    ----------+-----------------+------------
              | P.CNT.UNQ.ALL.  |

3. Find the name, street and city of all customers who have more than one account at the bank. G. groups the depositor tuples by customer.

    customer | cust-name | cust-street | cust-city
    ---------+-----------+-------------+----------
    P.       | _x        |             |

    depositor | cust-name | account-no.
    ----------+-----------+------------
              | G._x      | CNT.ALL._y

    conditions
    ----------
    CNT.ALL._y > 1

Update operations / Modification of the database
This section deals with how to add, remove or change information using QBE.

Deletion
Deletion of tuples from a relation is expressed in much the same way as a query. The major difference is the use of D. in place of P. In QBE we can delete whole tuples, as well as values in selected columns. When we delete information in only some of the columns, null values, specified by -, are inserted.
A D. command operates on only one relation. To delete tuples from several relations, we must use one D. operator for each relation.

* Delete customer Smith.

    customer | cust-name | cust-street | cust-city
    ---------+-----------+-------------+----------
    D.       | Smith     |             |

* Delete the branch-city value of the branch whose name is “Perryridge”.

    branch | branch-name | branch-city | assets
    -------+-------------+-------------+-------
           | Perryridge  | D.          |

* Delete all loans with a loan amount between $1300 and $1500.

    loan | branch-name | loan-no. | amount
    -----+-------------+----------+-------
    D.   |             | _y       | _x

    borrower | cust-name | loan-no.
    ---------+-----------+---------
    D.       |           | _y

    conditions
    ----------
    _x = (≥ 1300 and ≤ 1500)

* Delete all accounts at all branches located in Brooklyn.

    account | branch-name | account-no. | balance
    --------+-------------+-------------+--------
    D.      | _x          | _y          |

    depositor | cust-name | account-no.
    ----------+-----------+------------
    D.        |           | _y

    branch | branch-name | branch-city | assets
    -------+-------------+-------------+-------
           | _x          | Brooklyn    |

Insertion
We do the insertion by placing the I. operator in the query expression. The attribute values for inserted tuples must be members of the attribute's domain.
Examples
* To insert into the branch relation information about a new branch with name “Capital” and city “Queens”, but with a null asset value, we write

    branch | branch-name | branch-city | assets
    -------+-------------+-------------+-------
    I.     | Capital     | Queens      |

* To insert the fact that account A-9732 at the Perryridge branch has a balance of $700, we write

    account | branch-name | account-no. | balance
    --------+-------------+-------------+--------
    I.      | Perryridge  | A-9732      | 700

Updates
If we want to change one value in a tuple without changing all values in the tuple, we use the update facility, whose operator is U. QBE does not allow users to update the primary key fields.
* Update the asset value of the Perryridge branch to $10,000,000.

    branch | branch-name | branch-city | assets
    -------+-------------+-------------+-----------
           | Perryridge  |             | U.10000000

The query updates the assets of the Perryridge branch to $10,000,000 regardless of the old value. If we want to update a value using the previous value, we must express the request using two rows: one specifying the old tuples that need to be updated, and the other indicating the new, updated tuples to be inserted in the database.
* Interest payments are being made, and all balances are to be increased by 5%.

    account | branch-name | account-no. | balance
    --------+-------------+-------------+----------
    U.      |             |             | _x * 1.05
            |             |             | _x

QBE Dictionary
QBE has a built-in dictionary that is represented to the user as a collection of tables.
The dictionary includes, for example, a TABLE table and a DOMAIN table, giving details of all tables and all domains currently known to the system. The dictionary tables can be interrogated using the ordinary retrieval operations of the DML.

Retrieval of table-names
Get the names of all tables known to the system.

    P. |      |      |
    ---+------+------+------
       |      |      |

Instead of having to build a skeleton for the TABLE table and entering “P.” in the NAME column of that skeleton, the user can formulate this query by simply entering “P.” in the table-name position of a blank table.

Retrieval of column-names for a given table
Get the names of all columns in table S.

    S P. |      |      |
    -----+------+------+------
         |      |      |

The user enters the table name (S) followed by “P.” against the row of (blank) column names.

Creation of a new table
1. Create table branch.

    I. branch I. | branch-name | branch-city | branch-street
    -------------+-------------+-------------+--------------
                 |             |             |

The first I. creates a dictionary entry for table branch; the second I. creates dictionary entries for the three columns of the table branch. In addition, information must be specified for each column. This information includes the name of the underlying domain and, if that domain is not already known to QBE, the data type of the domain.

Dropping a table
Drop table branch. A table can be dropped only if it is currently empty.
1) Delete all branch rows.

    branch | branch-name | branch-city | branch-street
    -------+-------------+-------------+--------------
    D.     |             |             |

2) Drop the table.

    D. branch | branch-name | branch-city | branch-street
    ----------+-------------+-------------+--------------
              |             |             |

Expanding a table
Add an assets column to the table branch.
QBE does not directly support the dynamic addition of a new column to an existing table unless that table is currently empty. So the following steps should be followed.
1) Define a new table with the same shape as the existing table plus the new column.
2) Load the new table from the old using a multiple-record insert.
3) Delete all data from the old table.
4) Drop the old table.
5) Change the name of the new table to that of the old table.

Normalization
Introduction
Normalization theory is built around the concept of normal forms.
A relation is said to be in a particular normal form if it satisfies a certain specified set of constraints. For example, a relation is said to be in first normal form if and only if it satisfies the constraint that it contains atomic values only. The various normal forms include First Normal Form, Second Normal Form, Third Normal Form, BCNF and DKNF.
The need for normalization arises when we want to design a relational database that is free of unnecessary redundancy and that is easy to retrieve from and to update.
For the description of normalization, we shall consider the suppliers-and-parts database. Its relations are as follows:

    S
    S#  | SNAME | STATUS | CITY
    ----+-------+--------+-------
    S1  | Smith | 20     | London
    S2  | Jones | 10     | Paris
    S3  | Blake | 30     | Paris
    S4  | Clark | 20     | London
    S5  | Adams | 30     | Athens

    P
    P#  | PNAME | COLOR | WEIGHT | CITY
    ----+-------+-------+--------+-------
    P1  | Nut   | Red   | 12     | London
    P2  | Bolt  | Green | 17     | Paris
    P3  | Screw | Blue  | 17     | Rome
    P4  | Screw | Red   | 14     | London
    P5  | Cam   | Blue  | 12     | Paris
    P6  | Cog   | Red   | 19     | London

    SP
    S#  | P#  | QTY
    ----+-----+----
    S1  | P1  | 300
    S1  | P2  | 200
    S1  | P3  | 400
    S1  | P4  | 200
    S1  | P5  | 100
    S1  | P6  | 100
    S2  | P1  | 300
    S2  | P2  | 400
    S3  | P2  | 200
    S4  | P2  | 200
    S4  | P4  | 300
    S4  | P5  | 400

FIG:1

Functional Dependency
Definition:
Given a relation R, attribute Y of R is functionally dependent on attribute X of R if and only if each X-value in R has associated with it precisely one Y-value in R.
In the suppliers-and-parts database, the attributes SNAME, STATUS and CITY of relation S are each functionally dependent on attribute S#: for a particular value of S# there exists precisely one corresponding value for each of SNAME, STATUS and CITY.
    S.S# → S.SNAME
    S.S# → S.STATUS
    S.S# → S.CITY
Or, more compactly,
    S.S# → S.(SNAME, STATUS, CITY)
The statement S.S# → S.CITY is read as “attribute S.CITY is functionally dependent on attribute S.S#”, or “attribute S.S# functionally determines attribute S.CITY”.
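The definition can be tried out mechanically on a tabulation. The following is a minimal sketch in Python, not part of any DBMS: the function name holds and the list-of-dictionaries layout are assumptions made for the illustration, and the data is the supplier relation S of FIG:1.

```python
# Supplier relation S from FIG:1, one dictionary per tuple.
S = [
    {"S#": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
    {"S#": "S2", "SNAME": "Jones", "STATUS": 10, "CITY": "Paris"},
    {"S#": "S3", "SNAME": "Blake", "STATUS": 30, "CITY": "Paris"},
    {"S#": "S4", "SNAME": "Clark", "STATUS": 20, "CITY": "London"},
    {"S#": "S5", "SNAME": "Adams", "STATUS": 30, "CITY": "Athens"},
]

def holds(rows, x, y):
    """Return True if, in this tabulation, each x-value is associated
    with exactly one y-value (hypothetical helper for illustration)."""
    seen = {}  # maps each x-value to the single y-value seen with it
    for row in rows:
        if row[x] in seen and seen[row[x]] != row[y]:
            return False  # same x-value, two different y-values
        seen[row[x]] = row[y]
    return True

print(holds(S, "S#", "CITY"))      # each supplier has one city
print(holds(S, "CITY", "STATUS"))  # Paris appears with status 10 and 30
```

A check of this kind can only refute a dependency. A functional dependency is a statement about every legal tabulation of the relation, so a True result for one sample table is consistent with the dependency but does not prove it.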
Alternate definition for functional dependence
Given a relation R, attribute Y of R is functionally dependent on attribute X of R if and only if, whenever two tuples of R agree on their X-value, they also agree on their Y-value.

    SP'
    S#  | P#  | QTY | STATUS
    ----+-----+-----+-------
    S1  | P1  | 300 | 20
    S1  | P2  | 200 | 20
    S1  | P3  | 400 | 20
    S1  | P4  | 100 | 20

Fig: Partial tabulation of relation SP'.
For example, in this relation SP',
    SP'.S# → SP'.STATUS
A functional dependence is a special form of integrity constraint. For example, when we say that relation S satisfies the FD S.S# → S.CITY, we mean that every legal extension (tabulation) of that relation satisfies that constraint.
It is convenient to represent the FDs in a given set of relations by means of a functional dependency diagram.
Example:
    S:  S# → SNAME, S# → STATUS, S# → CITY
    P:  P# → PNAME, P# → COLOR, P# → WEIGHT, P# → CITY
    SP: (S#, P#) → QTY
Fig: Functional dependencies in relations S, P, SP.

Various Normal Forms
Brief description of normal forms
First Normal Form
* Eliminates repeating data, that is, converts each data value to its atomic form
* No two rows should be identical
* Each table entry should be single-valued
* Every table has a primary key, which is a unique label or identifier for each row
Second Normal Form
* Requires taking out data that is dependent on only a part of the key
* Each non-key attribute is functionally dependent on the entire key
Third Normal Form
* Involves getting rid of anything in the tables that does not depend solely on the primary key
* 3NF is sometimes characterized as “the key, the whole key, and nothing but the key”

First Normal Form
Definition:
A relation R is in first normal form (1NF) if and only if all underlying domains contain atomic values only.
A relation that is only in first normal form has a structure that is undesirable for a number of reasons.
For example: let us assume that information concerning suppliers and shipments, rather than being split into two separate relations (S and SP), is combined into a single relation, and let its name be FIRST, with attributes (S#, STATUS, CITY, P#, QTY).
Here S# is the supplier number, STATUS is the supplier's status value, CITY is the city in which the supplier is located, P# is the part number and QTY is the quantity supplied.
We also introduce an additional constraint: STATUS is functionally dependent on CITY. The meaning of this constraint is that a supplier's status is determined by the supplier's location; e.g., all London suppliers must have a status of 20. We also ignore the attribute SNAME for simplicity.
The primary key of FIRST is the combination (S#, P#). The functional dependencies in this relation are:
    (S#, P#) → QTY
    S# → CITY
    S# → STATUS
    CITY → STATUS
Fig: Functional dependencies in the relation FIRST
In the diagram
i) STATUS and CITY are not fully functionally dependent on the primary key (each depends on S# alone).
ii) STATUS and CITY are not mutually independent (STATUS depends on CITY).
These two features cause difficulties with each of the three update operations on FIRST. They are explained below.
Insert: We cannot enter the fact that a particular supplier is located in a particular city until that supplier supplies at least one part. The following is the tabulation of FIRST.

    S#  | STATUS | CITY   | P#  | QTY
    ----+--------+--------+-----+----
    S1  | 20     | London | P1  | 300
    S1  | 20     | London | P2  | 200
    S1  | 20     | London | P3  | 400
    S1  | 20     | London | P4  | 200
    S1  | 20     | London | P5  | 100
    S1  | 20     | London | P6  | 100
    S2  | 10     | Paris  | P1  | 300
    S2  | 10     | Paris  | P2  | 400
    S3  | 10     | Paris  | P2  | 200
    S4  | 20     | London | P2  | 200
    S4  | 20     | London | P4  | 300
    S4  | 20     | London | P5  | 400

Table: FIRST
The FIRST relation does not show that supplier S5 is located in Athens, because until S5 supplies some part we have no appropriate primary key value.
Deletion: If we delete the only FIRST tuple for a particular supplier, we destroy not only the shipment connecting that supplier to some part but also the information that the supplier is located in a particular city. For example, if we delete the FIRST tuple with S# value S3 and P# value P2, we lose the information that S3 is located in Paris.
Updation: The city value for a given supplier appears in FIRST many times. This redundancy causes update problems.
For example, if supplier S1 moves from London to Amsterdam, we are faced with either the problem of searching the FIRST relation to find (and change) every tuple connecting S1 and London, or the possibility of producing an inconsistent result, in which the city for S1 is given as Amsterdam in one tuple and London in another.
The solution to these problems is to replace the relation FIRST by the two relations SECOND (S#, STATUS, CITY) and SP (S#, P#, QTY). The functional dependencies in these two relations are as shown here.
    SECOND: S# → CITY, S# → STATUS, CITY → STATUS
    SP:     (S#, P#) → QTY
Fig: Functional dependencies in the relations SECOND and SP.
The following tables show sample tabulations corresponding to the data values of the table FIRST, except that information for supplier S5 has been included in SECOND but not in SP.

    SECOND
    S#  | STATUS | CITY
    ----+--------+-------
    S1  | 20     | London
    S2  | 10     | Paris
    S3  | 10     | Paris
    S4  | 20     | London
    S5  | 30     | Athens

    SP
    S#  | P#  | QTY
    ----+-----+----
    S1  | P1  | 300
    S1  | P2  | 200
    S1  | P3  | 400
    S1  | P4  | 200
    S1  | P5  | 100
    S1  | P6  | 100
    S2  | P1  | 300
    S2  | P2  | 400
    S3  | P2  | 200
    S4  | P2  | 200
    S4  | P4  | 300
    S4  | P5  | 400

Fig: Sample tabulations of SECOND and SP.
With the tables built as shown, we overcome the update difficulties of the FIRST relation: a supplier's city can be recorded before any shipment exists, deleting a shipment does not lose the supplier's city, and the city of a supplier is stored only once.

SECOND NORMAL FORM
DEFINITION:
A relation R is in second normal form (2NF) if and only if it is in 1NF and every nonkey attribute is fully dependent on the primary key.
Relations SECOND and SP are both in 2NF (the primary keys are S# and the combination (S#, P#), respectively). Relation FIRST is not in 2NF. A relation that is in first normal form and not in second can always be reduced to an equivalent collection of 2NF relations. The reduction consists of replacing the relation by suitable projections; the collection of these projections is equivalent to the original relation, in the sense that the original relation can always be recovered by taking the natural join of these projections, so no information is lost in the process. In other words, the process is reversible.
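The reversibility claim can be tried on a small sample. The following is a minimal sketch in Python, assuming a relation is represented as a set of tuples (a representation chosen for the illustration only) and using a few tuples from the tabulation of FIRST.

```python
# A few tuples of FIRST (S#, STATUS, CITY, P#, QTY) from the tabulation above.
FIRST = {
    ("S1", 20, "London", "P1", 300),
    ("S1", 20, "London", "P2", 200),
    ("S2", 10, "Paris", "P1", 300),
    ("S3", 10, "Paris", "P2", 200),
    ("S4", 20, "London", "P4", 300),
}

# Projections: building a set removes duplicate tuples automatically.
SECOND = {(s, status, city) for (s, status, city, p, qty) in FIRST}
SP = {(s, p, qty) for (s, status, city, p, qty) in FIRST}

# Natural join of SECOND and SP over the common attribute S#.
joined = {
    (s1, status, city, p, qty)
    for (s1, status, city) in SECOND
    for (s2, p, qty) in SP
    if s1 == s2
}

print(len(SECOND), len(SP))  # sizes of the two projections
print(joined == FIRST)       # the join gives back exactly the original tuples
```

The join recovers FIRST here because each S# value appears with a single (STATUS, CITY) pair, so no extra tuples are produced when the projections are recombined.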
In our example, SECOND and SP are projections of FIRST, and FIRST is the natural join of SECOND and SP over S#.
The reduction of FIRST to the pair (SECOND, SP) is an example of nonloss decomposition. In general, given a relation R with possibly composite attributes A, B, C satisfying the FD R.A → R.B, R can always be “nonloss-decomposed” into its projections R1 (A, B) and R2 (A, C). Since no information is lost in the reduction process, any information that can be derived from the original structure can also be derived from the new structure. The converse is not true, however: the new structure may contain information (such as the fact that S5 is located in Athens) that could not be represented in the original. In this sense the new structure is a slightly more faithful reflection of the real world.
The SECOND/SP structure still causes problems, however. Relation SP is satisfactory; as a matter of fact, relation SP is now in third normal form, and we shall ignore it for the remainder of this section. Relation SECOND, on the other hand, still suffers from a lack of mutual independence among its nonkey attributes. The dependency diagram for SECOND is still more complex than a 3NF diagram. To be specific, the dependency of STATUS on S#, though it is functional, is transitive (via CITY): each S# value determines a CITY value, and this in turn determines the STATUS value. This transitivity leads, once again, to difficulties over update operations. (We now concentrate on the association between cities and status values, i.e., on the functional dependency of STATUS on CITY.)
INSERTING: We cannot enter the fact that a particular city has a particular status value (for example, we cannot state that any supplier in Rome must have a status of 50) until we have some supplier located in that city. The reason is, again, that until such a supplier exists we have no appropriate primary key value.
DELETING: If we delete the only SECOND tuple for a particular city, we destroy not only the information for the supplier concerned but also the information that the city has that particular status value. For example, if we delete the SECOND tuple for S5, we lose the information that the status for Athens is 30.
UPDATING: The status value for a given city appears in SECOND many times. Thus, if we need to change the status value for London from 20 to 30, we are faced with either the problem of searching the SECOND relation to find every tuple for London or the possibility of producing an inconsistent result.
The solution to these problems is to replace the original relation (SECOND) by two projections, SC (S#, CITY) and CS (CITY, STATUS). The corresponding functional dependencies are shown here.
    SC: S# → CITY
    CS: CITY → STATUS
The tabulations corresponding to these are

    SC
    S#  | CITY
    ----+-------
    S1  | London
    S2  | Paris
    S3  | Paris
    S4  | London
    S5  | Athens

    CS
    CITY   | STATUS
    -------+-------
    Athens | 30
    London | 20
    Paris  | 10

Fig:2 Sample tabulations of SC and CS.
It should be clear that this new structure overcomes all the problems over update operations concerning the CITY-STATUS association.

Third Normal Form
Definition:
A relation R is in third normal form (3NF) if and only if it is in 2NF and every nonkey attribute is nontransitively dependent on the primary key.
Relations SC and CS (shown in Fig:2) are both in 3NF; relation SECOND (tabulated earlier) is not in 3NF. A relation that is in second normal form and not in third can always be reduced to an equivalent collection of 3NF relations.

Relations with more than one candidate key or BCNF (Boyce-Codd normal form)
Definition:
A relation R is in BCNF if and only if every determinant is a candidate key. (A determinant is any attribute, possibly composite, on which some other attribute is fully functionally dependent.)
The objective of BCNF is to handle a relation having two or more composite and overlapping candidate keys. Although BCNF is stronger than 3NF, it is still true that any relation can be decomposed in a nonloss way into an equivalent collection of BCNF relations.
Relation FIRST contains three determinants: S#, CITY and the combination (S#, P#). Among these, (S#, P#) alone is a candidate key; hence FIRST is not in BCNF. Relation SECOND is also not in BCNF, because the determinant CITY is not a candidate key. Relations SP, SC and CS are in BCNF, because in each case the primary key is the only determinant in the relation.

Example involving two disjoint (non-overlapping) candidate keys.
Let us consider relation S (S#, SNAME, STATUS, CITY), in which supplier names are assumed to be unique, so that SNAME is a candidate key as well as S#. The relation S is in BCNF. However, it is desirable to specify both keys in the definition of the relation:
a) To inform the DBMS, so that it may enforce the constraints implied by the two-way dependency between the two keys, namely, that corresponding to each supplier number there exists a unique supplier name, and conversely.
b) To inform the users, since the uniqueness of the two attributes is an aspect of the semantics of the relation and is therefore of interest to people using it.

Example where the candidate keys overlap.
Two candidate keys overlap if they involve two or more attributes each and have an attribute in common.
1) We suppose that supplier names are unique, and we consider the relation SSP (S#, SNAME, P#, QTY). The keys are (S#, P#) and (SNAME, P#). This relation is not in BCNF, because we have two determinants, S# and SNAME, which are not keys for the relation (S# determines SNAME, and conversely). But the relation is in 3NF if we consider the definition: a relation R is in 3NF if and only if it is in 2NF and every nonkey attribute is nontransitively dependent on the primary key. This definition does not require an attribute to be fully dependent on the primary key if it is itself a component of some other key in the relation, and so the fact that SNAME is not fully dependent on (S#, P#) is ignored.
But this fact leads to redundancy and hence to update problems in the relation SSP. For example, updating the name of supplier S1 from Smith to Robinson leads either to search problems or to possibly inconsistent results. The solution to the problems, as usual, is to decompose the relation SSP into two projections, in this case SS (S#, SNAME) and either SP (S#, P#, QTY) or SP (SNAME, P#, QTY). These projections are both in BCNF.
2) Second example: consider the relation SJT with attributes S (student), J (subject) and T (teacher). The meaning of an SJT tuple is that the specified student is taught the specified subject by the specified teacher. The semantic rules are:
1. For each subject, each student of that subject is taught by only one teacher.
2. Each teacher teaches only one subject.
3. Each subject is taught by several teachers.
The sample tabulation of this relation is as follows.

    SJT
    S     | J       | T
    ------+---------+------------
    Smith | Math    | Prof. White
    Smith | Physics | Prof. Green
    Jones | Math    | Prof. White
    Jones | Physics | Prof. Brown

The functional dependencies of SJT are as follows. From the first semantic rule we have a functional dependency of T on the composite attribute (S, J). From the second semantic rule we have a functional dependency of J on T. From the third semantic rule it is understood that there is no functional dependency of T on J. So the dependencies are
    (S, J) → T
    T → J
Fig: Functional dependencies in the relation SJT.
Here again we have two overlapping candidate keys: the combination (S, J) and the combination (S, T). Once again the relation is in 3NF and not in BCNF, and once again the relation suffers from certain anomalies in connection with update operations. For example, if we wish to delete the information that Jones is studying physics, we cannot do so without at the same time losing the information that Professor Brown teaches physics. The difficulties are caused by the fact that T is a determinant but not a candidate key. Again we can get over the problem by replacing the original relation by two BCNF projections, in this case ST (S, T) and TJ (T, J).
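The BCNF test applied to SJT above can be sketched as a small program. The following Python sketch uses the common formulation that the left-hand side of every nontrivial functional dependency must determine all attributes of the relation; the function names closure and bcnf_violations and the set-based representation are assumptions made for the illustration, and the dependencies of each projection are supplied by hand.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs.
    fds is a list of (left, right) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for left, right in fds:
            if left <= result and not right <= result:
                result |= right   # left is known, so right follows
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """List the left-hand sides that do not determine every attribute."""
    return [left for left, right in fds
            if not right <= left                     # skip trivial FDs
            and closure(left, fds) != set(attributes)]

# SJT with the dependencies (S, J) -> T and T -> J from the semantic rules.
SJT = {"S", "J", "T"}
sjt_fds = [({"S", "J"}, {"T"}), ({"T"}, {"J"})]
print(bcnf_violations(SJT, sjt_fds))   # T is reported as the offending determinant

# Projection TJ (T, J) keeps T -> J; projection ST (S, T) has no nontrivial FD.
print(bcnf_violations({"T", "J"}, [({"T"}, {"J"})]))
print(bcnf_violations({"S", "T"}, []))
```

Both projections come back with an empty list, which agrees with the statement that ST and TJ are in BCNF.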
Finally, we note that the concept of BCNF eliminates certain problem cases that could occur under the old definition of 3NF. Moreover, BCNF is conceptually simpler than 3NF, in that it involves no reference to the concepts of primary key, transitive dependence and full dependence. The reference to candidate keys can also be replaced by a reference to the more fundamental notion of functional dependence.

Good and Bad decompositions
During the reduction process it is frequently the case that a given relation can be decomposed in a variety of different ways. Consider the relation SECOND (S#, STATUS, CITY) with the functional dependencies (FDs)
    SECOND.S# → SECOND.CITY
    SECOND.CITY → SECOND.STATUS
and therefore, by transitivity,
    SECOND.S# → SECOND.STATUS
Fig: Functional dependencies in the relation SECOND
As seen earlier, the update problems encountered with SECOND can be overcome by replacing it by its decomposition into the two 3NF projections
    A: SC (S#, CITY) and CS (CITY, STATUS)
Let this decomposition be A. An alternative decomposition is
    B: SC (S#, CITY) and SS (S#, STATUS)
Decomposition B is also nonloss, and the two projections are again in BCNF. But decomposition B is less satisfactory than decomposition A. For example, it is still not possible (in B) to insert the fact that a particular city has a particular status value unless some supplier is located in that city.
The explanation of this example is as follows. In decomposition A the two projections are independent of each other, in the sense that updates can be made to either one without regard for the other; provided each update is legal within its own projection, joining the two will not violate the FD constraints on SECOND.
In decomposition B, updates to either of the two projections must be monitored to ensure that the FD SECOND.CITY → SECOND.STATUS is not violated. Thus projections SC and SS are not independent of each other. A relation that cannot be decomposed into independent components is said to be atomic.

Questions:
1. What is embedded SQL?
2. Define QBE.
3. Explain operations involving cursors and not involving cursors.
4. What do you mean by dynamic statements?
5. Explain retrieval operations of QBE.
6. Explain update operations of QBE.
7. Explain built-in functions of QBE.
8. Define normalization.
9. What are the various forms of normalization?
10. What do you mean by QBE dictionary?
11. Explain first, second and third normal forms.
12. Explain relations with more than one candidate key [BCNF].
13. What do you mean by good and bad decomposition?
14. What are QBE aggregate functions?
15. What is functional dependency?

Unit IV
Syllabus
Hierarchical Approach: IMS data structure. Physical database, database description, hierarchical sequence. External level of IMS: Logical databases, the program communication block. IMS data manipulation: Defining the program communication block: DL/I examples.

Books for Reference:
An Introduction to Database Systems - C. J. Date
Database System Concepts - Abraham Silberschatz, Henry F. Korth, S. Sudarshan
Principles of Database Systems - Jeffrey D. Ullman

IMS data structure (Information Management System)
A physical database (PDB) is an ordered set, the elements of which consist of all occurrences of one type of physical database record (PDBR). A PDBR occurrence in turn consists of a hierarchical arrangement of fixed-length segment occurrences, and a segment occurrence consists of a set of associated fixed-length field occurrences. As an example we consider a PDB that contains information about the internal education system of a large industrial company.
The hierarchical structure of this PDB, that is, the PDBR type, is shown here:

COURSE (Course#, Title, Description)
    PREREQ (Course#, Title)
    OFFERING (Date, Location, Format)
        TEACHER (Emp#, Name)
        STUDENT (Emp#, Name, Grade)

Fig: PDBR type for the education database.

In this example we are assuming that the company maintains an education department whose function is to run a number of training courses. Each course is offered at a number of different locations within the company. The PDB contains details both of offerings already given and of offerings scheduled to be given in the future. The details are as follows:
*For each course: course number (unique), course title, course description, details of prerequisite courses if any, and details of all offerings.
*For each prerequisite course of a given course: course number and title.
*For each offering of a given course: date, location, format, details of all teachers and details of all students.
*For each teacher of a given offering: employee number and name.
*For each student of a given offering: employee number, name and grade.

In the PDBR structure shown, we have five types of segments: COURSE, PREREQ, OFFERING, TEACHER and STUDENT, each one consisting of the field types indicated. COURSE is the root segment type and the others are dependent segment types. Each dependent has a parent; for example, the parent of TEACHER is OFFERING. Similarly each parent has at least one child; for example, COURSE has two children. For one occurrence of any given segment type there may be any number of occurrences of each of its child segment types.

COURSE    M23  Dynamics  ...
    PREREQ    M16  Trigonometry
    PREREQ    M19  Calculus
    OFFERING  730813  Madrid  F3
        TEACHER  421633  Sharp, R.
        STUDENT  102141  Byrd, W.     B
        STUDENT  183009  Gibbons, O.  A
        STUDENT  761620  Tallis, T.   B
    OFFERING  750106  Oslo    F2
    OFFERING  751104  Dublin  F3

Fig: Sample PDBR occurrence for the education database.

The database description
Each physical database is defined together with its mapping to storage by a database description (DBD).
The source form of the DBD is written using special System/370 Assembler language macro statements. Once written, the DBD is assembled and the object form is stored away in a system library, from which it may be extracted when required by the IMS control program. The following is the DBD for the education database.

 1  DBD    NAME=EDUCPDBD
 2  SEGM   NAME=COURSE,BYTES=256
 3  FIELD  NAME=(COURSE#,SEQ),BYTES=3,START=1
 4  FIELD  NAME=TITLE,BYTES=33,START=4
 5  FIELD  NAME=DESCRIPN,BYTES=220,START=37
 6  SEGM   NAME=PREREQ,PARENT=COURSE,BYTES=36
 7  FIELD  NAME=(COURSE#,SEQ),BYTES=3,START=1
 8  FIELD  NAME=TITLE,BYTES=33,START=4
 9  SEGM   NAME=OFFERING,PARENT=COURSE,BYTES=20
10  FIELD  NAME=(DATE,SEQ,M),BYTES=6,START=1
11  FIELD  NAME=LOCATION,BYTES=12,START=7
12  FIELD  NAME=FORMAT,BYTES=2,START=19
13  SEGM   NAME=TEACHER,PARENT=OFFERING,BYTES=24
14  FIELD  NAME=(EMP#,SEQ),BYTES=6,START=1
15  FIELD  NAME=NAME,BYTES=18,START=7
16  SEGM   NAME=STUDENT,PARENT=OFFERING,BYTES=25
17  FIELD  NAME=(EMP#,SEQ),BYTES=6,START=1
18  FIELD  NAME=NAME,BYTES=18,START=7
19  FIELD  NAME=GRADE,BYTES=1,START=25

Fig: DBD for the education PDB.

Explanation
Statement 1: Assigns the name EDUCPDBD ("education physical database description") to the DBD. All names in IMS are limited to a maximum length of eight characters.
Statement 2: Defines the root segment type, with the name COURSE and a total length of 256 bytes.
Statements 3-5: Define the field types that go to make up COURSE. Each is given a name, a length in bytes, and a start position within the segment. The first field, COURSE#, is defined to be the sequence field for the segment, so the PDBR occurrences will be sequenced in ascending course number order.
Statement 6: Defines PREREQ as a 36-byte segment that is dependent on COURSE.
Statements 7-8: Define the fields of PREREQ.
Statement 9: Defines OFFERING as a child of COURSE.
Statements 10-12: Define the fields of OFFERING. DATE is defined as the sequence field for OFFERING.
The specification M (multiple) means that twin OFFERING occurrences may contain the same date value.
Statements 13-15: Define the TEACHER segment and its fields.
Statements 16-19: Define the STUDENT segment and its fields.

The sequence of statements in the DBD is significant. Specifically, SEGM statements must appear in the sequence that reflects the hierarchical structure, and each SEGM statement must be immediately followed by the appropriate FIELD statements.

Hierarchical Sequence
The concept of hierarchical sequence within a database is a very important one in IMS. It is defined as follows. For each segment occurrence, we define the "hierarchical sequence key value" to consist of the sequence field value for that segment, prefixed with the type code for that segment, prefixed with the hierarchical sequence key value of its parent, if any. For example, the hierarchical sequence key value for the STUDENT occurrence for "Byrd, W." is

1M2337308135102141

Here 1 is the type code of COURSE, M23 is the COURSE#, 3 is the type code of OFFERING, 730813 is the DATE of the OFFERING, 5 is the type code of STUDENT, and 102141 is the EMP# of the STUDENT. The hierarchical sequence for an IMS database is then the sequence of segment occurrences defined by ascending values of the hierarchical sequence key. This notion is important because IMS databases are stored in hierarchical sequence.

External Level of IMS
Logical databases:
In the IMS architecture the user's external view is defined as a subset of the corresponding physical database. A logical database (LDB) is an ordered set, the elements of which consist of all occurrences of one type of logical database record (LDBR). An LDBR type is a hierarchical arrangement of segment types, and is derived from the corresponding PDBR hierarchy in accordance with the following rules.
*Any segment type of the PDBR hierarchy, together with all its dependents, can be omitted from the LDBR hierarchy.
*The fields of an LDBR segment type can be a subset of those of the corresponding PDBR segment type, and can be rearranged within that LDBR segment type.

Example:

COURSE (Course#, Title, Description)
    OFFERING (Date, Location, Format)
        STUDENT (Emp#, Name, Grade)

Fig: Sample LDBR type for the education database.

Sensitive segments:
The segments of the PDB that are included in the LDB are said to be sensitive segments. In the above example COURSE, OFFERING and STUDENT are sensitive segments. The user of this LDB will not be aware of the existence of any other segments. For example, the DL/I "get next" operation, which in general is used for sequential retrieval, will simply skip over any segments that are not sensitive for the user. If the user deletes a sensitive segment, all children of that segment are deleted too, whether or not they are sensitive. So a user should not be given the authority to delete a segment if doing so would also delete hidden segments. The sensitive-segment concept also protects the user from certain kinds of growth in the PDB: a new segment type can be added to the PDB without the user being affected, provided that the addition does not affect any existing parent-child relationship. The sensitive-segment concept also provides a degree of control over data security, inasmuch as users can be prevented from accessing particular segment types by the omission of those segments from the LDB.

Sensitive fields
Sensitive fields are those fields of the PDB that are included in the LDB. Every sensitive field must be contained within a sensitive segment. In general, a given LDB may include or exclude any combination of fields from the PDB, except that if the program intends to insert new occurrences of a given segment type, then it must be "sensitive to" the sequence field for that segment type.
Field sensitivity, like segment sensitivity, protects the user from certain types of growth in the database and provides a simple level of data security.

The program communication block (PCB)
Each LDB is defined by a PCB. The PCB includes the specification of the mapping between the LDB and the corresponding PDB. Like a DBD (database description), a PCB is written using special System/370 Assembler language macro statements. These statements constitute the "external DDL" for IMS. The set of all PCBs for a given user forms that user's program specification block (PSB); the object form of the PSB is stored in a system library, from which it may be extracted when required by the IMS control program.

Example:

1  PCB     TYPE=DB,DBDNAME=EDUCPDBD,KEYLEN=15
2  SENSEG  NAME=COURSE,PROCOPT=G
3  SENSEG  NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4  SENSEG  NAME=STUDENT,PARENT=OFFERING,PROCOPT=G

Fig: PCB for the LDB

Explanation
Statement 1: Specifies that this is a database PCB (TYPE=DB), that the underlying DBD is EDUCPDBD, and that the length of the key feedback area is 15 bytes.

Key feedback: When the user accesses an LDB, the corresponding PCB is held in storage and acts as a communication area between the user's program and IMS. One of the fields in the PCB is the key feedback area. When the user retrieves a segment from the LDB, IMS not only fetches the requested segment but also places a "fully concatenated key" into the key feedback area. The fully concatenated key consists of the concatenation of the sequence field values of all segments in the hierarchical path from the root down to the retrieved segment. For example, if the user retrieves the STUDENT occurrence for Byrd, W., IMS will place the value M23730813102141 in the key feedback area. The fully concatenated key of a segment is not quite the same as its "hierarchical sequence key", because it does not include segment type code information.
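The relationship between the two kinds of key can be sketched in Python. The type codes 1, 3 and 5 for COURSE, OFFERING and STUDENT are the ones quoted in these notes; the codes 2 and 4 for PREREQ and TEACHER are an assumption based on the order of the SEGM statements in the DBD. The function names are invented for this sketch.

```python
# Segment type codes. 1, 3 and 5 come from the worked example in the notes;
# 2 and 4 are assumed from the order of the SEGM statements in the DBD.
TYPE_CODE = {
    "COURSE": "1",
    "PREREQ": "2",
    "OFFERING": "3",
    "TEACHER": "4",
    "STUDENT": "5",
}


def hierarchical_sequence_key(path):
    """Build the key from (segment_type, sequence_value) pairs, root first.

    Each sequence field value is prefixed with its segment type code.
    """
    return "".join(TYPE_CODE[segment] + value for segment, value in path)


def fully_concatenated_key(path):
    """Concatenate only the sequence field values along the path."""
    return "".join(value for _, value in path)


# Hierarchical path from the root down to the STUDENT occurrence for Byrd, W.
byrd = [("COURSE", "M23"), ("OFFERING", "730813"), ("STUDENT", "102141")]

print(hierarchical_sequence_key(byrd))  # 1M2337308135102141
print(fully_concatenated_key(byrd))     # M23730813102141
```

The fully concatenated key here is 15 characters long, which agrees with KEYLEN=15 in the PCB statement.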
Statement 2: Specifies the first sensitive segment in the LDB. The name of the sensitive segment must be the same as the name assigned to the segment in the DBD. The PROCOPT ("processing options") entry specifies the types of operation that the user will be permitted to perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other options are I ("insert"), R ("replace") and D ("delete").
Statement 3: Defines the next sensitive segment in the LDB.
Statement 4: Defines the last sensitive segment.

In our example statements 3 and 4 are very similar. The PROCOPT entry is the same for each of the three sensitive segments. In such a situation we may specify PROCOPT in the PCB statement instead of in each SENSEG statement.

If PROCOPT=K is specified in the SENSEG statement for OFFERING, the user may largely ignore the presence of OFFERINGs in the hierarchy. The effect of this modification is shown below.

COURSE (Course#, Title, Description)
    STUDENT (Emp#, Name, Grade)

Fig: Effect of specifying PROCOPT=K for OFFERING

The main difference is that when a STUDENT occurrence is retrieved, the fully concatenated key in the key feedback area will still include the date value from the parent OFFERING.

The LDB shown in the example above is sensitive to all fields in the segments COURSE, OFFERING and STUDENT of the underlying PDB. Suppose we wish to exclude the LOCATION field of the OFFERING segment from the LDB while remaining sensitive to all other fields. The SENSEG statement for OFFERING is then followed by:

SENFLD  NAME=FORMAT,START=1
SENFLD  NAME=DATE,START=3

These statements specify the fields to be included in the LDB segment and their start positions within that segment (FORMAT occupies bytes 1-2, so DATE starts at byte 3). If no SENFLD statement is given for a particular SENSEG statement, then by default that segment is taken to be identical to the underlying PDB segment.
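The effect of field sensitivity on one segment occurrence can be sketched in Python. This is a toy model and not IMS code: the OFFERING layout (DATE in bytes 1-6, LOCATION in bytes 7-18, FORMAT in bytes 19-20) is an assumption consistent with the 20-byte segment length, and the LDB start positions (FORMAT at 1, DATE at 3) are likewise assumed so that the two fields do not overlap.

```python
# Assumed layout of the 20-byte OFFERING segment in the PDB:
# field name -> (start position, length), positions counted from 1.
OFFERING_PDB = {"DATE": (1, 6), "LOCATION": (7, 12), "FORMAT": (19, 2)}


def ldb_segment(pdb_bytes, pdb_layout, senflds=None):
    """Return the LDB view of one segment occurrence.

    senflds is a list of (field_name, ldb_start) pairs, or None when no
    SENFLD statement was given for the segment.
    """
    if senflds is None:
        return pdb_bytes  # default: identical to the PDB segment
    length = max(start + pdb_layout[name][1] - 1 for name, start in senflds)
    out = [" "] * length
    for name, ldb_start in senflds:
        pdb_start, size = pdb_layout[name]
        # Copy the field from its PDB position to its LDB position.
        out[ldb_start - 1:ldb_start - 1 + size] = \
            pdb_bytes[pdb_start - 1:pdb_start - 1 + size]
    return "".join(out)


# One OFFERING occurrence: date 730813, location Madrid, format F3.
pdb_occurrence = "730813" + "Madrid".ljust(12) + "F3"

# FORMAT placed first, DATE after it, LOCATION omitted.
view = ldb_segment(pdb_occurrence, OFFERING_PDB, [("FORMAT", 1), ("DATE", 3)])
print(repr(view))
```

The program sees only the rearranged fields; the location never reaches it, which is the simple form of data security described above.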
IMS Data Manipulation
Defining the Program Communication Block (PCB)
The IMS data manipulation language (DL/I) is invoked from the host language (here PL/I) by means of ordinary subroutine calls. When an application program is operating on a particular logical database (LDB), the PCB for that LDB is kept in storage to serve as a communication area between the program and IMS. In fact, when the program calls DL/I, it has to quote the storage address of the appropriate PCB to identify to DL/I which LDB it is to operate on. The PCB address is supplied to the program by IMS when the program is first entered.

What actually happens is this. When a database application is to be run, IMS is given control first. IMS determines which PSB and DBD(s) are required, fetches them from their respective libraries and loads them into storage. IMS then fetches the application program and gives it control, passing it the PCB addresses as parameters. In order for the application program to be able to access the information in the PCB for a particular LDB, it must contain a definition of that PCB.

DLITPLI: PROCEDURE (COSPCB_ADDR) OPTIONS (MAIN);
         . . .
         DECLARE 1 COSPCB BASED (COSPCB_ADDR),
                   2 DBDNAME    CHARACTER(8),
                   2 SEGLEVEL   CHARACTER(2),
                   2 STATUS     CHARACTER(2),
                   2 PROCOPT    CHARACTER(4),
                   2 RESERVED   FIXED BINARY(31),
                   2 SEGNAME    CHARACTER(8),
                   2 KEYFBLEN   FIXED BINARY(31),
                   2 #SENSEGS   FIXED BINARY(31),
                   2 KEYFBAREA  CHARACTER(15);

Fig A: Example of program entry and PCB definition (PL/I).

Explanation:
The PROCEDURE statement (labeled DLITPLI) is the program entry point. The expression in parentheses following the keyword PROCEDURE represents the parameters to be passed to the program by IMS; it consists of the pointer giving the address of the PCB. The rest of Fig A consists of a DECLARE statement that defines a structure to represent the single PCB used in the application. The field DBDNAME contains the name of the underlying DBD throughout the execution of the program.
The SEGLEVEL field is set after each DL/I operation to contain the segment level number of the segment just accessed.
The STATUS field is the most important field in the PCB. After each DL/I call, a two-character value is placed in this field to indicate the success or otherwise of the requested operation. A blank value indicates that the operation was completed satisfactorily; any other value represents an exceptional or error condition.
The PROCOPT field contains the PROCOPT value as specified in the PCB statement when the PCB was originally defined.
The SEGNAME field contains the name of the segment last accessed.
The KEYFBLEN field contains the length of the fully concatenated key.
The #SENSEGS field contains a count of the number of sensitive segments.
The KEYFBAREA field is the key feedback area, which contains the fully concatenated key.

DL/I Examples

Get unique (GU)                 Direct retrieval
Get next (GN)                   Sequential retrieval
Get next within parent (GNP)    Sequential retrieval under current parent
Get hold (GHU, GHN, GHNP)       Allows subsequent DLET/REPL
Insert (ISRT)                   Add new segment occurrence
Delete (DLET)                   Delete existing segment occurrence
Replace (REPL)                  Replace existing segment occurrence

Tab: DL/I Operations

Direct retrieval: Get the first OFFERING occurrence where the location is Stockholm.

    GU   COURSE
         OFFERING (LOCATION='STOCKHOLM')

Sequential retrieval with an SSA: Get all STUDENT occurrences in the LDB, starting with the first student for the first offering in Stockholm.

    GU   COURSE
         OFFERING (LOCATION='STOCKHOLM')
         STUDENT
NS  GN   STUDENT
    GOTO NS

Sequential retrieval with an SSA within a parent: Get all students for the offering on 13 August 1973 of course M23.

    GU   COURSE (COURSE#='M23')
         OFFERING (DATE='730813')
NP  GNP  STUDENT
    GOTO NP

Segment occurrence insertion: Add a new STUDENT occurrence for the offering on 13 August 1973 of course M23.

    ISRT COURSE (COURSE#='M23')
         OFFERING (DATE='730813')
         STUDENT

Segment deletion: Delete the offering of course M23 on 13 August 1973.
    GHU  COURSE (COURSE#='M23')
         OFFERING (DATE='730813')
    DLET

Segment replacement: Change the location of the 13 August 1973 offering of course M23 to Helsinki.

    GHU  COURSE (COURSE#='M23')
         OFFERING (DATE='730813')
    (change the location to Helsinki in the retrieved segment)
    REPL

Questions:
1. Explain the physical and logical databases of the hierarchical approach with an example.
2. Explain the database description (DBD) with an example.
3. Explain the hierarchical sequence key value.
4. Explain the program communication block (PCB).
5. Discuss DL/I operations with some examples.

UNIT-V
Syllabus
Network approach: Architecture of DBTG system. DBTG data structure: The set construct, singular sets, sample schema, and the external level of DBTG. DBTG data manipulation.

Books for reference:
1: Database System Concepts - Abraham Silberschatz and Henry F. Korth
2: An Introduction to Database Systems - C. J. Date

Basic concepts:
A network database consists of a collection of records, which are connected to one another through links. A record is in many respects similar to an entity in the entity-relationship model. Each record is a collection of fields (attributes), each of which contains only one value. A link can be viewed as a restricted (binary) form of relationship in the sense of the E-R model.

To illustrate, consider a database representing a customer-account relationship in a banking system. There are two record types, customer and account. The customer record type can be defined, using Pascal-like notation, as follows:

type customer = record
        name: string;
        street: string;
        city: string;
end

The account record type can be defined as follows:

type account = record
        number: integer;
        balance: integer;
end

The sample database in Fig 1 shows that Lowman has account 305, Camp has accounts 226 and 177, and Kahn has account 155.
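The two record types and the links between their occurrences can be sketched in Python. This is only a rough analogy: the class and function names are invented here, a link is modeled as a simple list of references held by the customer record, and the street and balance values are illustrative placeholders rather than authoritative data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Account:
    """Counterpart of the account record type."""
    number: int
    balance: int


@dataclass
class Customer:
    """Counterpart of the customer record type."""
    name: str
    street: str
    city: str
    # One entry per link from this customer to an account record.
    accounts: List[Account] = field(default_factory=list)


def link(customer: Customer, account: Account) -> None:
    """Connect an account record to its customer record."""
    customer.accounts.append(account)


# Street and balance values below are illustrative placeholders.
lowman = Customer("Lowman", "Square", "Dallas")
camp = Customer("Camp", "Downridge", "Garland")
kahn = Customer("Kahn", "Bayside", "Plano")

for owner, number, balance in [
    (lowman, 305, 500),
    (camp, 226, 336),
    (camp, 177, 205),
    (kahn, 155, 62),
]:
    link(owner, Account(number, balance))

for customer in (lowman, camp, kahn):
    print(customer.name, [a.number for a in customer.accounts])
```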
Customer (name, street, city)        Account (number, balance)
Lowman   Square      Dallas          305   500
Camp     Downridge   Garland         226   336
                                     177   205
Kahn     Bayside     Plano           155   62

Fig 1: Sample database

Data-structure diagrams: [Architecture of network model]
A data-structure diagram is a scheme representing the design of a network database. Such a diagram consists of two basic components:
*Boxes, which correspond to record types.
*Lines, which correspond to links.
A data-structure diagram serves the same purpose as an entity-relationship diagram; namely, it specifies the overall logical structure of the database. We shall consider how the binary, ternary and other relationships of entity-relationship diagrams are represented.

Binary relationship
The entity-relationship diagram for the banking example and the corresponding data-structure diagram are shown here:

[Fig 2: (a) E-R diagram with entity sets customer (name, street, city) and account (number, balance) related by CustAcct; (b) the corresponding data-structure diagram with record types customer and account]

Diagram (a) is the entity-relationship diagram. It consists of two entity sets, customer and account, which are related through a binary many-to-many relationship CustAcct with no descriptive attributes. The diagram shows that a customer may have several accounts and that an account may belong to several different customers. The corresponding data-structure diagram is shown in (b). Here the record type customer corresponds to the entity set customer. It includes three fields: name, street and city. Similarly, account is the record type corresponding to the account entity set and includes the fields number and balance. Since the CustAcct relationship in the E-R diagram is many-to-many, we draw no arrows on the link CustAcct in the data-structure diagram. If the relationship CustAcct were one-to-many from customer to account, then the link CustAcct would have an arrow pointing to the customer record type.
The representation is shown as follows:

[Fig 3: Data-structure diagrams for customer (name, street, city) and account (number, balance): (a) many-to-many link, drawn without arrows; (b) one-to-many link from customer to account, drawn with an arrow pointing to customer]

A sample database corresponding to the data-structure diagram of Fig 3a is shown in Fig 4. Since the relationship is many-to-many, we show that Katz has accounts 256 and 347 and that account 347 is owned by both Katz and Doner.

[Fig 4: Sample database corresponding to the diagram of Fig 3a. Customer records (name, street, city): (Beck, Maple, San Francisco), (Katz, North, San Jose), (Doner, Sidehill, Palo Alto). Account records (number, balance): (200, 55), (256, 100 000), (347, 667), (301, 10 533).]

If the relationship is one-to-many from customer to account, a customer may have more than one account, as is the case with Camp, who owns both 226 and 177. An account, however, cannot belong to more than one customer, as is indeed observed in the sample database of Fig 1, which corresponds to the data-structure diagram of Fig 3b.

How is the E-R diagram of Fig 2a transformed if the relationship has a descriptive attribute? The transformation is more complicated, because a link cannot contain any data value. So a new record type has to be created and links have to be established. For example, suppose we add to the E-R diagram of Fig 2a the descriptive attribute date on the CustAcct relationship, to denote the last time the customer accessed the account. The newly derived E-R diagram is shown here. To transform this diagram into a data-structure diagram we need to:
1: Replace the entity sets customer and account with record types customer and account.
2: Create a new record type date with a single field to represent the date.
3: Create the following many-to-one links:
*custdate from the date record type to the customer record type
*acctdate from the date record type to the account record type

The DBTG CODASYL Model
The Database Task Group wrote the first database standard specification, called the CODASYL DBTG 1971 report, in the late 1960s. Since then a number of changes have been suggested to that report, the last official one in 1978. The main features of the DBTG model discussed here are:
*Link restriction
*DBTG sets
*Repeating groups

Link Restriction
In the DBTG model, only many-to-one links can be used. Many-to-many links are disallowed in order to simplify the implementation. One-to-one links are represented using a many-to-one link. Let us illustrate this with the help of an example.

Consider a binary relationship that is either one-to-many or one-to-one. For our customer-account database, the data-structure diagrams for a one-to-many CustAcct relationship, without and with the descriptive attribute date, are shown in the following figure:

[Fig: Two data-structure diagrams for customer (name, street, city) and account (number, balance): without a descriptive attribute, and with the record type date]

If the CustAcct relationship is many-to-many, then our transformation algorithm must be refined as follows. If the relationship has no descriptive attributes, then the following algorithm must be employed:
1: Replace the entity sets customer and account with record types customer and account.
2: Create a new dummy record type rlink that may either have no fields or have a single field containing an externally defined unique identifier.
3: Create the following two many-to-one links:
*custrlink from the rlink record type to the customer record type
*acctlink from the rlink record type to the account record type
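The three steps above can be sketched in Python. This is a toy illustration and not DBTG syntax: the function names are invented, each rlink record is represented only by its unique identifier, and the sample pairs are the Katz and Doner accounts mentioned earlier.

```python
from itertools import count


def to_many_to_one(pairs):
    """Replace many-to-many (customer, account) pairs by dummy rlink records.

    Returns two dictionaries, custrlink and acctlink. Each maps an rlink
    identifier to exactly one owner, so both links are many-to-one when
    read from rlink.
    """
    ids = count(1)
    custrlink, acctlink = {}, {}
    for customer, account in pairs:
        rid = next(ids)            # stands in for the unique identifier field
        custrlink[rid] = customer  # link from rlink to customer
        acctlink[rid] = account    # link from rlink to account
    return custrlink, acctlink


# Many-to-many sample: Katz has 256 and 347; 347 is also owned by Doner.
custacct = [("Katz", 256), ("Katz", 347), ("Doner", 347)]
custrlink, acctlink = to_many_to_one(custacct)


def accounts_of(customer):
    """Follow custrlink, then acctlink, to list a customer's accounts."""
    return sorted(acctlink[r] for r, c in custrlink.items() if c == customer)


def owners_of(account):
    """Follow acctlink, then custrlink, to list the owners of an account."""
    return sorted(custrlink[r] for r, a in acctlink.items() if a == account)


print(accounts_of("Katz"), owners_of(347))
```

Every rlink record has exactly one customer and exactly one account, yet the original many-to-many information can still be recovered by following both links.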
[Fig: Data-structure diagram for the customer (name, street, city) and account (number, balance) record types of the CustAcct relationship]

DBTG sets
Given that only many-to-one links can be used in the DBTG model, a data-structure diagram consisting of two record types that are linked together has the general form of the following figure:

[Fig A: DBTG-set with owner record type A and member record type B]

This structure is referred to in the DBTG model as a DBTG-set. The name of the set is usually chosen to be the same as the name of the link connecting the two record types. In each such DBTG-set, the record type A is called the owner (or parent) of the set, and the record type B is called the member (or child) of the set. Each DBTG-set can have any number of set occurrences, that is, actual instances of linked records. For example, in the figure we have three set occurrences corresponding to the DBTG-set of Fig A.

Since many-to-many links are disallowed, each set occurrence has precisely one owner and zero or more member records. In addition, no member record of a set can participate in more than one occurrence of that set at any point. A member record can, however, participate simultaneously in several set occurrences of different DBTG-sets.

To illustrate, consider the data-structure diagram shown here. There are two DBTG-sets:
*custacct, having customer as the owner of the DBTG-set, and account as the member of the DBTG-set.
*brncacct, having branch as the owner of the DBTG-set, and account as the member of the DBTG-set.

The set custacct may be defined as follows:

set name is custacct
owner is customer
member is account

The set brncacct may be defined similarly as

set name is brncacct
owner is branch
member is account

An instance of the database is shown here. Five set occurrences are shown: three of set custacct, and two of set brncacct.
1: Owner is customer record Lowman with a single member account record 305.
2: Owner is customer record Camp with two member account records 177 and 226.
3: Owner is customer record Kahn with three member account records 155, 402 and 408.
4: Owner is branch record Hillside with three member account records 305, 226 and 155.
5: Owner is branch record Valleyview with three member account records 177, 402 and 408.

Note that an account record cannot appear in more than one set occurrence of one individual set type. This is because an account can belong to exactly one customer, and can be associated with only one bank branch. An account can, however, appear in two set occurrences of different set types. For example, account 305 is a member of set occurrence 1 of type custacct and is also a member of set occurrence 4 of type brncacct. The member records of a set occurrence may be ordered in a variety of ways.

Repeating Groups:
The DBTG model provides a mechanism for a field to have a set of values, rather than one single value. For example, suppose that a customer has several addresses. In this case, the (street, city) pair of fields in the customer record type is defined as a repeating group. So the customer record for Kahn is shown here:

The repeating-group construct is another way of representing the notion of weak entities in the E-R model. To illustrate, we shall split the entity set customer into two sets:
*customer, with descriptive attribute name
*address, with descriptive attributes street and city.
The address entity set is a weak entity set, since it depends on the strong entity set customer.

DBTG data retrieval facility
The data manipulation language of the DBTG proposal consists of a number of commands that are embedded in a host language. The commands are explained as follows.

The Find and Get commands
The two most frequently used DBTG commands are
*find, which locates a record in the database and sets the appropriate currency pointers
*get, which copies the record to which the current of run-unit points from the database to the appropriate program work-area template.

Access of individual records:
The find command has a number of forms.
There are two different find commands for locating individual records in the database. The simplest command has the form:

find any <record type> using <record-field>

Purpose: Locates a record of type <record type> whose <record-field> value is the same as the value of <record-field> in the <record type> template in the program work area. The following currency pointers are set to point to that record:
*The current of run-unit pointer
*The record-type currency pointer for <record type>
*For each set to which that record belongs, the appropriate set currency pointer

For example: Construct the DBTG query that prints the street address of Lowman.

customer.name := "Lowman";
find any customer using name;
get customer;
print (customer.street);

To locate further records with the same field value, the command is

find duplicate <record type> using <record-field>

which locates the next record that matches the <record-field>.

Example: Construct the DBTG query that prints the names of all the customers who live in Dallas:

customer.city := "Dallas";
find any customer using city;
while DB-status = 0 do
    begin
        get customer;
        print (customer.name);
        find duplicate customer using city;
    end;

Access of records within a set
Purpose: Locate records in a particular DBTG-set. There are three different types of commands. The basic find command is

find first <record type> within <set-type>

which locates the first database record of type <record type> belonging to the current <set-type>. To locate the other members of a set the command is

find next <record type> within <set-type>

This command finds the next element in the set <set-type>.

Example: Construct the DBTG query that prints the total balance of all accounts belonging to Lowman.

sum := 0;
customer.name := "Lowman";
find any customer using name;
find first account within custacct;
while DB-status = 0 do
    begin
        get account;
        sum := sum + account.balance;
        find next account within custacct;
    end
print (sum);

To find the owner of a particular DBTG-set occurrence, the command used is

find owner within <set-type>

Example: Construct the DBTG query that prints all the customers of the Hillside branch:

branch.name := "Hillside";
find any branch using name;
find first account within brncacct;
while DB-status = 0 do
    begin
        find owner within custacct;
        get customer;
        print (customer.name);
        find next account within brncacct;
    end

DBTG update facility
Creating new records
To create a new record of type <record type> we insert the appropriate values in the corresponding <record type> template. The command used is

store <record type>

Example: Construct the DBTG query to add a new customer Jackson to the database.

customer.name := "Jackson";
customer.street := "Old road";
customer.city := "Richardson";
store customer;

Modifying an existing record
In order to modify an existing record of type <record type> we must find the record in the database, get that record into memory, and then change the desired fields in the template of <record type>. Once this is accomplished, we reflect the changes to the record to which the currency pointer of <record type> points by executing the command:

modify <record type>

The DBTG model requires that the find command executed prior to modifying a record have the additional clause "for update", so that the system is aware of the fact that the record is to be modified.

Example: Construct the DBTG program to change the street address of Kahn to North Loop.
customer.name := "Kahn";
find for update any customer using name;
get customer;
customer.street := "North Loop";
modify customer;

Deleting a record
To delete an existing record of type <record type> we use the command:

erase <record type>

Example: Construct the DBTG program to delete account 402 belonging to Kahn:

finish := false;
customer.name := "Kahn";
find any customer using name;
find for update first account within custacct;
while DB-status = 0 and not finish do
    begin
        get account;
        if account.number = 402 then
            begin
                erase account;
                finish := true;
            end
        else
            find for update next account within custacct;
    end;

It is possible to delete an entire set occurrence by finding the owner of the set (say, a record of type <record type>) and executing

erase all <record type>

This will delete the owner of the set as well as all its members. If a member of the set is an owner of another set, the members of that set are also deleted. That is, the erase all operation is recursive.

Example: Consider the DBTG program to delete customer "Camp" and all of her accounts.

customer.name := "Camp";
find for update any customer using name;
erase all customer;

DBTG set-processing facility
This is mainly concerned with the mechanism for inserting records into and removing records from a particular set occurrence.

The connect statement
To insert a new record of type <record type> into a particular occurrence of <set-type> we must first insert the record into the database, then set the currency pointers of <record type> and <set-type> to point to the appropriate record and set occurrence. The command used is

connect <record type> to <set-type>

A new record can be inserted as follows:
1: Create a new record of type <record type>.
2: Find the appropriate owner of the set <set-type>.
3: Insert the new record into the set by executing the connect statement.
Example: Create the DBTG query for creating new account 267, which belongs to Jackson:

    Account.number := 267;
    Account.balance := 0;
    Store account;
    Customer.name := "Jackson";
    Find any customer using name;
    Connect account to custacct;

The disconnect statement

To remove a record of type <record type> from a set occurrence of <set-type>, we need to set the currency pointers of <record type> and <set-type> to point to the appropriate record and set occurrence. Once this is accomplished, the record can be removed from the set by executing

    Disconnect <record type> from <set-type>

Example: To remove account 177 from the set occurrence of type custacct:

    Account.number := 177;
    Find for update any account using number;
    Get account;
    Find owner within custacct;
    Disconnect account from custacct;

The reconnect statement

To move a record of type <record type> from one set occurrence to another set occurrence of type <set-type>, we need to find the appropriate record and the owner of the set occurrence to which the record is to be moved. Once this is done, we can move the record by executing:

    Reconnect <record type> to <set-type>

Example: Consider the DBTG program to move all accounts of Lowman that are currently at the Hillside branch to the Valley View branch.

    Customer.name := "Lowman";
    Find any customer using name;
    Find first account within custacct;
    While DB-status = 0 do
    Begin
        Find owner within brncacct;
        Get branch;
        If branch.name = "Hillside" then
        Begin
            Branch.name := "Valley View";
            Find any branch using name;
            Reconnect account to brncacct;
        End;
        Find next account within custacct;
    End;

Set Insertion and Retention

When a new set is defined, we must specify how member records are to be inserted. In addition, we must specify the conditions under which a record must be retained in the set occurrence in which it was initially inserted.
Set Insertion

A newly created member record of type <record type> of a set type <set-type> can be added to a set occurrence either explicitly (manually) or implicitly (automatically). This distinction is specified at set definition time via

    Insertion is <insert mode>

where <insert mode> can take one of two forms.

Manual: The new record can be inserted into the set manually (explicitly) by executing

    Connect <record type> to <set-type>

Automatic: The new record is inserted into the set automatically (implicitly) when it is created, that is, when we execute

    Store <record type>

In either case, just prior to insertion, the <set-type> currency pointer must point to the set occurrence into which the insertion is to be made.

Set Retention

There are various restrictions on how and when a member record can be removed from a set occurrence into which it has been inserted previously. These restrictions are specified at set definition time via

    Retention is <retention-mode>

where <retention-mode> can take one of three forms.

Fixed: Once a member record has been inserted into a particular set occurrence, it cannot be removed from that set. If retention is fixed, then to reconnect a record to another set, we must first erase that record, re-create it, and then insert it into the new set occurrence.

Mandatory: Once a member record has been inserted into a particular set occurrence, it can be reconnected only to another set occurrence of type <set-type>. It can neither be disconnected nor be reconnected to a set of another type.

Optional: No restrictions are placed on how and when a member record can be removed from a set occurrence. A member record can be reconnected, disconnected, and connected at will.

The decision as to which option to choose depends on the application.
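The three retention modes can be summarised as two yes/no questions: may a member be disconnected, and may it be reconnected to another occurrence of the same set type? The following Python sketch encodes the rules as stated above. The names Retention, can_disconnect and can_reconnect are hypothetical and are not part of the DBTG language.

```python
from enum import Enum


class Retention(Enum):
    """Retention modes named in the text (the Python names are hypothetical)."""
    FIXED = "fixed"
    MANDATORY = "mandatory"
    OPTIONAL = "optional"


def can_disconnect(mode):
    # Only optional members may leave a set occurrence and belong to none.
    return mode is Retention.OPTIONAL


def can_reconnect(mode):
    # Mandatory and optional members may move to another occurrence of the
    # same set type; fixed members stay where they were first inserted.
    return mode in (Retention.MANDATORY, Retention.OPTIONAL)


for mode in Retention:
    print(mode.value, can_disconnect(mode), can_reconnect(mode))
# fixed False False
# mandatory False True
# optional True True
```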
Deletion

When a record is deleted (erased) and that record is the owner of a set occurrence of type <set-type>, the best way of handling this deletion depends on the set retention specified for <set-type>.

If the retention status is optional, then the record will be deleted and every member of the set it owns will be disconnected. These records, however, are kept in the database.

If the retention status is fixed, then the record and all of its owned members will be deleted. This follows from the fact that the fixed status indicates that a member record cannot be removed from the set occurrence without being deleted.

If the retention status is mandatory, then the record cannot be erased. This is because the mandatory status indicates that a member record must belong to a set occurrence; it cannot be disconnected from that set.

Set Ordering

The members of a set occurrence of <set-type> may be ordered in a variety of ways. A programmer specifies the order when the set is defined, via

    Order is <order-mode>

where <order-mode> can be one of the following.

First: When a new record is added to a set, it is inserted in the first position. Thus, the set is in reverse chronological order.

Last: When a new record is added to a set, it is inserted in the last position. Thus, the set is in chronological order.

Next: Suppose that the currency pointer of <set-type> points to record X. If X is a member type, then when a new record is added to the set, it is inserted in the position following X. If X is an owner type, then when a new record is added, it is inserted in the last position.

Prior: Suppose that the currency pointer of <set-type> points to record X. If X is a member type, then when a new record is added to the set, it is inserted in the position just prior to X. If X is an owner type, then when a new record is added, it is inserted in the last position.
System default: When a new record is added to a set, it is inserted in an arbitrary position determined by the system.

Sorted: When a new record is added to a set, it is inserted in a position that ensures that the set remains sorted. The sorting order is specified by a particular key value when a programmer defines the set. The programmer must specify whether members are ordered in ascending or descending order relative to that key.

Refer to the textbook for further reference.

Questions:
1. Explain the architecture of the network model.
2. Write short notes on a) Link restriction b) DBTG sets c) Repeating groups
3. Explain the DBTG data retrieval facility.
4. Explain the DBTG set-processing facility.
5. Explain the DBTG update facility.
6. What are set insertion and retention?