Chapter 1, Problem 1RQ
Problem
Define the following terms: data, database, DBMS, database system, database catalog, program-data independence, user view, DBA, end user, canned transaction, deductive database system, persistent object, meta-data, and transaction-processing application.
Step-by-step solution
Step 1 of 14
Data
The word data derives from the Latin datum, "something given"; data are given facts from which additional facts can be inferred. Data is a collection of known facts that can be recorded and that have implicit meaning.
Step 2 of 14
Database
A database is a collection of related data, such as the operational data of a firm or organization. In other words, an organized collection of related data is called a database.
Step 3 of 14
DBMS (Database Management System)
A DBMS is a collection of programs that enables users to create, maintain, and manipulate a database. It is a general-purpose software system that facilitates the process of defining, constructing, and manipulating databases.
Step 4 of 14
Database System
A database system comprises a database of operational data together with the processing functionality required to access and manage that data. The combination of the DBMS and the database is called a database system.
Step 5 of 14
Database Catalog
A database catalog contains a complete description of the stored databases: the database objects, the database structure, details of users, constraints, and so on.
Step 6 of 14
Program-data independence
In traditional file processing, the structure of the data files is "hard-coded" into the programs. To change the structure of a data file, every program that accesses that file must be changed, and each change can introduce errors. In contrast, a DBMS stores the structure in a catalog, separating the access programs from the data definition.
Keeping the data definition separate from the programs that access the data is known as program-data independence.
Step 7 of 14
User View
The way in which the database appears to a particular user is called a user view.
Step 8 of 14
DBA (Database Administrator)
The DBA is the person responsible for authorizing access to the database, coordinating and monitoring its use, and acquiring software and hardware resources as needed.
Step 9 of 14
End User
End users are the people who access the database for purposes such as querying, updating, and generating reports.
Step 10 of 14
Canned Transactions
Canned transactions are standardized queries and updates on the database that are carried out by carefully programmed and tested programs.
Step 11 of 14
Deductive Database System
A deductive database system is a database system that supports the proof-theoretic view of a database. In particular, it is capable of deducing or inferring additional facts from the given facts in the extensional database by applying specified deductive axioms or rules of inference to those facts.
Step 12 of 14
Persistent object
Object-oriented database systems are compatible with programming languages such as C++ and Java. An object that is stored in such a way that it survives the termination of the program that created it is persistent.
Step 13 of 14
Meta-data
Information about the data is called meta-data. The information stored in the catalog is meta-data; the schema of a table is one example.
Step 14 of 14
Transaction-processing application
A transaction is a logical unit of database processing. It includes one or more database operations such as insertion, deletion, modification, and retrieval. The database operations that form a transaction can either be embedded within an application program or be specified interactively via a high-level query language such as SQL.
Chapter 1, Problem 2RQ
Problem
What four main types of actions involve databases?
Briefly discuss each.
Step-by-step solution
Step 1 of 5
The four types of actions that involve databases are as follows:
• Database administration
• Database design
• Database usage by end users
• System analysis and application programming
Step 2 of 5
Database administration:
• Database administration is the process of administering the database resources, such as the application programs and the database management system.
• The Database Administrator (DBA) is responsible for granting permission to access the database.
• The administrative work also includes acquiring software and hardware resources.
• The security of the database is also managed by database administration.
Step 3 of 5
Database design:
• Database design is the process of identifying the data to be stored in the database and choosing the data structures required to store it.
• The database design should fulfill the requirements of all the user groups of the organization.
Step 4 of 5
Database usage by end users:
• End users are the users who directly access the database for querying, updating, and generating reports. The types of end users are as follows:
o Casual end users: These users access the database occasionally. Middle-level and high-level managers are examples of casual end users.
o Parametric end users: These users constantly access the database. Bank tellers are examples of parametric end users.
o Sophisticated end users: These are engineers, scientists, and similar users who implement their own applications to meet complex requirements.
o Standalone users: These users maintain personal databases by using ready-made program packages.
Step 5 of 5
System analysis and application programming:
• System analysis is the process of determining the requirements of the end users.
• System analysis is done by the system analysts.
System analysts develop the specifications for the canned transactions that meet the requirements of the end users.
• These specifications are implemented by the application programmers.
Chapter 1, Problem 3RQ
Problem
Discuss the main characteristics of the database approach and how it differs from traditional file systems.
Step-by-step solution
Step 1 of 4
Characteristics of the database approach:
Self-describing nature of a database system:
A fundamental characteristic of the database approach is that the database system contains not only the database itself but also a complete definition or description of the database structure and constraints.
• The information stored in the catalog is called meta-data, and it describes the structure of the primary database.
• In traditional file processing, data definition is typically part of the application programs themselves. Those programs are constrained to work with only one specific database, whose structure is declared in the application programs.
Step 2 of 4
Insulation between programs and data, and data abstraction:
In traditional file processing, the structure of data files is embedded in the application programs, so any change to the structure of a file may require changing all programs that access that file.
• DBMS access programs do not require such changes in most cases.
• The structure of data files is stored in the DBMS catalog separately from the access programs.
Step 3 of 4
Support of multiple views of the data:
A database typically has many users, each of whom may require a different perspective or view of the database.
• A multi-user DBMS whose users have a variety of distinct applications must provide facilities for defining multiple views.
• In the traditional approach, multiple views of the data are not supported.
Step 4 of 4
Sharing of data and multi-user transaction processing:
A multi-user DBMS must allow multiple users to access the database at the same time.
The DBMS must include concurrency control software to ensure that several users trying to update the same data do so in a controlled manner, so that the result of the updates is correct.
• In traditional file processing, no such data sharing is possible, and no such concurrency control software is available.
Chapter 1, Problem 4RQ
Problem
What are the responsibilities of the DBA and the database designers?
Step-by-step solution
Step 1 of 2
Responsibilities of the DBA:
DBA stands for Database Administrator. The role is highly technical: the DBA is responsible for managing the databases used in the organization.
• The database administrator is responsible for building the physical design of the database.
• The database administrator deals with technical responsibilities such as:
o Security enforcement
o Performance of the database
o Providing access to the database
o Acquiring resources such as hardware and software components
o Backing up the data in the database
o Recovering lost data
o Monitoring and coordinating the use of the database
o Monitoring response time and security breaches
Step 2 of 2
Responsibilities of the database designer:
The database designer is the architect of the database. The work is versatile, and the designer works with everyone in the organization. The responsibilities of the database designer are as follows:
• Identify the data to be stored in the database.
• Choose appropriate structures to store the data.
• Study and understand the business needs.
• Communicate the architecture to business and management, and possibly participate in business development as an advisor.
• Ensure consistency across the database.
• Create and enforce database development standards and processes.
Chapter 1, Problem 5RQ
Problem
What are the different types of database end users? Discuss the main activities of each.
Step-by-step solution
Step 1 of 2
End users perform various database operations such as querying, updating, and generating reports. The different types of end users are as follows:
• Casual end users
• Naive or parametric end users
• Sophisticated end users
• Standalone users
Step 2 of 2
Casual end users:
• Casual end users access the database occasionally.
• Their requests vary each time they access the database.
• They use a sophisticated database query language to retrieve data from the database.
Naive or parametric end users:
• Naive or parametric end users spend most of their time querying and updating the database using standard types of queries.
Sophisticated end users:
• Sophisticated end users access the database to implement their own applications to meet their specific goals.
• Sophisticated end users include engineers, scientists, and business analysts.
Standalone users:
• Standalone users maintain their own databases, created with ready-made program packages that provide a graphical user interface.
Chapter 1, Problem 7RQ
Problem
Discuss the differences between database systems and information retrieval systems.
Step-by-step solution
Step 1 of 14
Database approach: A database is more than a file; it contains information about more than one entity and information about the relationships among the entities.
Information retrieval systems: In an information retrieval system, data are stored in files. This is a very old but often used approach to system development.
Step 2 of 14
Database approach: Data about each single entity (e.g., product, customer, department) are stored in a "table" in the database.
Step 3 of 14
Information retrieval systems: Each program (system) often has its own unique set of files.
Step 4 of 14
Database approach: Databases are designed to meet the needs of multiple users and to be used in multiple applications.
Step 5 of 14
Information retrieval systems: Users of information retrieval systems are almost always at the mercy of the information department to write programs that manipulate stored data and produce the needed information.
Step 6 of 14
Database approach: Databases are relatively complex to design, implement, and maintain.
Step 7 of 14
Information retrieval systems: Information retrieval systems are very simple to design and implement, as they are normally based on a single application or information system.
Step 8 of 14
Database approach: Processing is slow in comparison to information retrieval systems.
Step 9 of 14
Information retrieval systems: Processing is faster than with other ways of storing data.
Step 10 of 14
Other differences: Database systems provide program-data independence, whereas in information retrieval systems programs and data are dependent on each other.
Step 11 of 14
Database systems offer minimal data redundancy, improved data consistency, enforcement of standards, and improved data quality, whereas in information retrieval systems duplication of data is present.
Step 12 of 14
Database systems improve data sharing, whereas information retrieval systems offer only limited data sharing.
Step 13 of 14
Database systems are flexible and scalable, whereas information retrieval systems are not.
Step 14 of 14
Database systems reduce data redundancy, whereas in information retrieval systems data redundancy is one of the important problems.
Chapter 1, Problem 8E
Problem
Identify some informal queries and update operations that you would expect to apply to the database shown in Figure 1.2.
Step-by-step solution
Step 1 of 2
Informal queries:
a) Retrieve the transcript (a list of all courses and grades) of "Smith".
b) List the names of students who took the section of the "Database" course offered in fall 2005, and their grades in that section.
c) List the prerequisites of the "Database" course.
Step 2 of 2
Update operations:
a) Change the class of "Smith" to sophomore.
b) Create a new section for the "Database" course for this semester.
c) Enter a grade of "A" for "Smith" in the "Database" section of last semester.
Chapter 1, Problem 9E
Problem
What is the difference between controlled and uncontrolled redundancy? Illustrate with examples.
Step-by-step solution
Step 1 of 3
Storing the same facts or data at multiple places in the database is considered redundancy. In other words, duplication of data is known as redundancy. Some of the problems with redundant data are as follows:
• Inconsistency of data
• Wastage of memory space
Step 2 of 3
The differences between controlled redundancy and uncontrolled redundancy are as follows:
Step 3 of 3
An example that illustrates controlled and uncontrolled redundancy is as follows. Consider the following tables:
Employee(empno, ename, job, salary, dob)
Department(deptno, dname, location)
Project(pno, pname, description)
Works(empno, deptno, pno)
Assume that an employee can work on multiple projects. Then, in the Works table, empno and deptno are stored redundantly whenever an employee works on two or more projects.
Figure 1 is an example of controlled redundancy.
Deptno for empno 100 is the same in all three records.
Figure 2 is an example of uncontrolled redundancy. Deptno for empno 100 is inconsistent in the two records.
Chapter 1, Problem 10E
Problem
Specify all the relationships among the records of the database shown in Figure 1.2.
Step-by-step solution
Step 1 of 2
Relationships in the database specify how the data tables are related to each other.
Step 2 of 2
The relationships between the tables are as follows:
• Consider the tables COURSE and SECTION. The two tables have the common column Course_number. Hence, SECTION is related to COURSE through Course_number.
• Consider the tables STUDENT and GRADE_REPORT. The two tables have the common column Student_number. Hence, GRADE_REPORT is related to STUDENT through Student_number.
• Consider the tables COURSE and PREREQUISITE. The two tables have the common column Course_number. Hence, PREREQUISITE is related to COURSE through Course_number.
• Consider the tables SECTION and GRADE_REPORT. The two tables have the common column Section_identifier. Hence, GRADE_REPORT is related to SECTION through Section_identifier.
Chapter 1, Problem 11E
Problem
Give some additional views that may be needed by other user groups for the database shown in Figure 1.2.
Step-by-step solution
Step 1 of 2
Additional views for the given database:
A new view can be created that lists, for each section, the students in that section and their grades.
GRADE_SEC_REPORT(Student_number, Section_identifier, Course_number, Grade)
This view is helpful for the university's administration when printing each section's grade report.
Step 2 of 2
Another view can be created that lists the courses taken by each student and the grade achieved in each of those courses.
COURSE_GRADE_REPORT(Student_number, Course_number, Grade, GPA)
This view is helpful for the university's administration when determining students' honours.
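The GRADE_SEC_REPORT view described above can be sketched as a SQL view. The following is a minimal illustration, assuming Python's built-in sqlite3 module and simplified versions of the SECTION and GRADE_REPORT tables from Figure 1.2. The helper names build_demo_db and section_grades and the sample rows are hypothetical and serve only to show the idea.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with simplified tables and one view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE SECTION (
            Section_identifier INTEGER PRIMARY KEY,
            Course_number      TEXT NOT NULL
        );
        CREATE TABLE GRADE_REPORT (
            Student_number     INTEGER NOT NULL,
            Section_identifier INTEGER NOT NULL
                REFERENCES SECTION(Section_identifier),
            Grade              TEXT
        );
        -- The view joins the two base tables; it stores no data of its own.
        CREATE VIEW GRADE_SEC_REPORT AS
            SELECT g.Student_number, g.Section_identifier,
                   s.Course_number, g.Grade
            FROM GRADE_REPORT AS g
            JOIN SECTION AS s
              ON s.Section_identifier = g.Section_identifier;
        """
    )
    # Hypothetical sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO SECTION VALUES (?, ?)",
        [(85, "MATH2410"), (112, "MATH2410"), (119, "CS1310")],
    )
    conn.executemany(
        "INSERT INTO GRADE_REPORT VALUES (?, ?, ?)",
        [(17, 112, "B"), (17, 119, "C"), (8, 85, "A")],
    )
    conn.commit()
    return conn


def section_grades(conn, section_identifier):
    """Return (Student_number, Course_number, Grade) rows for one section."""
    return conn.execute(
        "SELECT Student_number, Course_number, Grade "
        "FROM GRADE_SEC_REPORT "
        "WHERE Section_identifier = ? "
        "ORDER BY Student_number",
        (section_identifier,),
    ).fetchall()
```

A user group that only needs per-section grade reports can query the view by name and never has to know how SECTION and GRADE_REPORT are joined.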
Chapter 1, Problem 12E
Problem
Cite some examples of integrity constraints that you think can apply to the database shown in Figure 1.2.
Step-by-step solution
Step 1 of 1
A few constraints that can be imposed on the database are:
1. A grade can be given only to enrolled students.
2. Each section must belong to a course.
3. Each course must be part of an existing department.
4. The prerequisite of each course must be a course that was offered in the past or an existing course.
5. A student must be part of the section for which he or she is graded.
Chapter 1, Problem 13E
Problem
Give examples of systems in which it may make sense to use traditional file processing instead of a database approach.
Step-by-step solution
Step 1 of 2
Despite the advantages of using a database approach, there are some situations in which a DBMS may involve unnecessary overhead costs that would not be incurred in traditional file processing.
Step 2 of 2
The following are examples of systems in which it may make sense to use traditional file processing instead of a database approach:
• Many computer-aided design (CAD) tools used by chemical and civil engineers have proprietary file and data management software that is geared to the internal manipulation of drawings and 3D objects.
• Similarly, communication and switching systems designed by companies like AT&T.
• GIS implementations often use their own data organization schemes to efficiently implement functions related to processing maps, physical contours, lines, polygons, and so on. General-purpose DBMSs are inadequate for their purpose.
• Small single-user applications.
• Real-time navigation systems that require little data.
Chapter 1, Problem 14E
Problem
Consider Figure 1.2.
a.
If the name of the "CS" (Computer Science) Department changes to "CSSE" (Computer Science and Software Engineering) Department and the corresponding prefix for the course number also changes, identify the columns in the database that would need to be updated.
b. Can you restructure the columns in the COURSE, SECTION, and PREREQUISITE tables so that only one column will need to be updated?
Step-by-step solution
Step 1 of 2
a) The following columns need to be updated when the name of the department changes along with the course-number prefix:
• In the STUDENT table, Major has to be updated.
• In the COURSE table, Course_number and Department have to be updated.
• In the SECTION table, Course_number has to be updated.
• In the PREREQUISITE table, Course_number and Prerequisite_number have to be updated.
Step 2 of 2
b) The columns of the tables are split as follows:
The tables are as follows after restructuring:
Chapter 2, Problem 1RQ
Problem
Define the following terms: data model, database schema, database state, internal schema, conceptual schema, external schema, data independence, DDL, DML, SDL, VDL, query language, host language, data sublanguage, database utility, catalog, client/server architecture, three-tier architecture, and n-tier architecture.
Step-by-step solution
Step 1 of 19
Data model
The data model describes the logical structure of the database and introduces abstraction in the DBMS (Database Management System). It provides a tool to describe the data and their relationships.
Step 2 of 19
Database schema
The database schema describes the overall design of the database. It is the basic structure that defines how the data is organized in the database. The database schema can be depicted by schema diagrams.
Step 3 of 19
Database state
The actual data stored in the database at a particular moment in time is called the database state.
Step 4 of 19
Internal schema
It is also referred to as the physical-level schema.
The internal schema represents the structure of the data as viewed by the DBMS and describes the physical storage structure of the database.
Step 5 of 19
Conceptual schema
It is also referred to as the logical-level schema. It describes the logical structure of the whole database for a community of users and hides the internal details of the physical storage structure.
Step 6 of 19
External schema
The external schema is also referred to as the user-level schema. It describes the data as viewed by the end users: the part of the database that a particular user group is interested in, with the rest of the database hidden from that group.
Step 7 of 19
Data independence
The capacity to change the schema at one level of a database system without having to change the schema at the next higher level is called data independence. For example, the physical-level schema can change without affecting the conceptual or external schemas.
Step 8 of 19
DDL
DDL stands for Data Definition Language. It is used to create, alter, and drop database tables, views, and indexes.
Step 9 of 19
DML
DML stands for Data Manipulation Language. It is used to insert, retrieve, update, and delete records in the database.
Step 10 of 19
SDL
SDL stands for Storage Definition Language. It is used to specify the internal schema of the database; it may also specify the mapping between the internal and conceptual schemas.
Step 11 of 19
VDL
VDL stands for View Definition Language. It specifies the user views and their mappings to the conceptual (logical) schema of the database.
Step 12 of 19
Query language
A query language is a high-level language used interactively to retrieve data from the database.
Step 13 of 19
Host language
The host language is the general-purpose programming language used for application programming against a database. DML commands are embedded in the host language to manipulate the data in the database.
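The idea of DML commands embedded in a host language can be sketched as follows. This is a minimal illustration, assuming Python as the host language, SQL as the embedded DML, and Python's built-in sqlite3 module as the DBMS interface. The function name enroll_and_list, the simplified STUDENT table, and the sample rows are hypothetical.

```python
import sqlite3


def enroll_and_list(conn):
    """Run DDL and DML statements from a host-language (Python) program."""
    # DDL: define a simplified STUDENT table (an assumption for illustration).
    conn.execute(
        "CREATE TABLE STUDENT ("
        "Student_number INTEGER PRIMARY KEY, Name TEXT, Class INTEGER)"
    )
    # DML (insert): the host language supplies the values as parameters.
    conn.executemany(
        "INSERT INTO STUDENT VALUES (?, ?, ?)",
        [(17, "Smith", 1), (8, "Brown", 2)],
    )
    # DML (update): change the class of Smith to sophomore (class 2).
    conn.execute("UPDATE STUDENT SET Class = ? WHERE Name = ?", (2, "Smith"))
    # DML (retrieve): results come back as ordinary host-language values.
    return conn.execute(
        "SELECT Name, Class FROM STUDENT ORDER BY Student_number"
    ).fetchall()
```

The SQL strings carry the database operations, while variables, parameters, and control flow belong to the host language.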
Step 14 of 19
Data sublanguage
When DML commands for operations such as insert, update, and delete are embedded in a general-purpose language, the DML is referred to as a data sublanguage.
Step 15 of 19
Database utility
A database utility is a software module that helps the DBA (Database Administrator) manage the database.
Step 16 of 19
Catalog
The catalog stores the complete description of the database structure and its constraints.
Step 17 of 19
Client/server architecture
The client/server architecture is a database architecture that contains two modules. The client module, usually a PC, provides the user interface. The server module responds to user queries and provides services to the client machines.
Step 18 of 19
Three-tier architecture
The three-tier architecture consists of three layers: client, application server, and database server. The client machine usually contains the user interface; the intermediate layer (application layer) runs the application programs and stores the business rules; the database layer stores the data.
Step 19 of 19
n-tier architecture
The n-tier architecture consists of four or five tiers. The intermediate (business logic) layer is divided into multiple layers, so that programming and data can be distributed throughout a network.
Chapter 2, Problem 2RQ
Problem
Discuss the main categories of data models. What are the basic differences among the relational model, the object model, and the XML model?
Step-by-step solution
Step 1 of 2
The three main categories of data models are as follows:
• High-level or conceptual data models
• Representational or implementation data models
• Low-level or physical data models
Step 2 of 2
The differences between the relational model, the object model, and the XML model are as follows:
Relational model:
• The data in the relational model is represented logically, together with information about the relationship types.
• The data is defined in columns with field names, and all the data in a column must be of the same type.
• A relational database uses a high-level query language.
• Example: SQL
Object model:
• It refers to the model that deals with how applications interact with resources from any external source.
• It also deals with the relationships between the classes and with the methods and properties of the classes.
• It is closer to conceptual data models.
• The classes in the object model are organized as an acyclic graph.
• Example: Document Object Model (DOM)
XML model:
• The data in the XML model is hierarchical. Different types of data can be defined in a single XML document.
• The data in an XML document does not have any inherent ordering.
• Data is represented in the form of tags known as elements.
• Example: Stylus Studio
Chapter 2, Problem 3RQ
Problem
What is the difference between a database schema and a database state?
Step-by-step solution
Step 1 of 1
Difference between a database schema and a database state:
The database schema is a description of the database, and the database state is the database itself.
The description of a database is called the database schema. It is specified during database design and is not expected to change frequently. Most data models have certain conventions for displaying schemas as diagrams. A displayed schema is called a schema diagram. A schema diagram displays the structure of each record type but not the actual instances of records.
A schema diagram displays only some aspects of a schema, such as the names of record types and data items, and some types of constraints.
The data in the database at a particular moment in time is called a database state. It is also called the current set of occurrences or instances in the database. In a given database state, each schema construct has its own current set of instances. Many database states can be constructed to correspond to a particular database schema. Every time we insert or delete a record, or change the value of a data item in a record, we change one state of the database into another state.
When we define a new database, we specify only its database schema to the DBMS. At this point, the corresponding database state is the empty state, with no data. The DBMS is partly responsible for ensuring that every state of the database is a valid state, that is, a state that satisfies the structure and constraints specified in the schema. The schema is sometimes called the intension, and a database state is called an extension of the schema.
Chapter 2, Problem 4RQ
Problem
Describe the three-schema architecture. Why do we need mappings among schema levels? How do different schema definition languages support this architecture?
Step-by-step solution
Step 1 of 3
Three-schema architecture:
The goal of the three-schema architecture is to separate the user applications from the physical database. In this architecture, schemas can be defined at the following three levels.
(1) Internal level: It has an internal schema, which describes the physical storage structure of the database.
(2) Conceptual level: It has a conceptual schema, which describes the structure of the whole database for a community of users. The conceptual schema hides the details of physical storage structures and concentrates on describing entities, data types, relationships, user operations, and constraints.
Step 2 of 3
(3) External level: It includes a number of external schemas or user views. Each external schema describes the part of the database that a particular user group is interested in and hides the rest of the database from that group. A high-level data model or an implementation data model can be used at this level.
Need for mappings:
The processes of transforming requests and results between levels are called mappings. The conceptual/internal mapping defines the correspondence between the conceptual view and the stored database. It specifies how conceptual records and fields are represented at the internal level. An external/conceptual mapping defines the correspondence between a particular external view and the conceptual view.
Step 3 of 3
Different schema definition languages:
DDL: The data definition language is used to specify the conceptual and internal schemas for the database and any mappings between the two. The DBMS has a DDL compiler whose function is to process DDL statements in order to identify descriptions of the schema constructs and to store the schema description in the DBMS catalog.
SDL: The storage definition language is used to specify the internal schema. The mappings between the two schemas may be specified in either one of these languages. In most relational DBMSs today, there is no specific language that performs the role of SDL. Instead, the internal schema is specified by a combination of parameters and specifications related to storage.
VDL: The view definition language is used to specify user views and their mappings to the conceptual schema, but in most DBMSs the DDL is used to define both conceptual and external schemas. In relational DBMSs, SQL is used in the role of VDL to define user or application views as results of predefined queries.
Chapter 2, Problem 5RQ
Problem
What is the difference between logical data independence and physical data independence? Which one is harder to achieve? Why?
Step-by-step solution
Step 1 of 3
Data independence is the capacity to change the schema at one level of a database system without having to change the schema at the next higher level. There are two types of data independence:
• Logical data independence
• Physical data independence
Step 2 of 3
Logical data independence is the capacity to change the conceptual schema without changing the external schemas. Only the view definitions and the mappings need to be changed. Examples are changing the constraints on an attribute in a way that does not affect the external schema, and inserting or deleting data items, which changes the table size but does not affect the external schema. Physical data independence is the capacity to change the internal schema without changing the conceptual schema or the external schemas. An example is reorganizing files on physical storage to speed up operations on the database: the data is the same and only the files are relocated, so the conceptual and external schemas remain unaffected.
Step 3 of 3
Logical data independence is harder to achieve. Changing attribute constraints or the structure of a table might make existing data invalid for the changed attributes. The tables or application programs that reference the modified table may then be affected, which should not happen under logical data independence.

Chapter 2, Problem 6RQ
Problem
What is the difference between procedural and nonprocedural DMLs?
Step-by-step solution
Step 1 of 2
Difference between procedural and nonprocedural DML
Procedural DML: A procedural data manipulation language is also called a low-level DML. A procedural DML must be embedded in a general-purpose programming language. This type of DML typically retrieves individual records or objects from the database and processes each one separately. Therefore, it needs to use programming language constructs, such as looping, to retrieve and process each record from a set of records. Procedural DMLs are also called record-at-a-time DMLs.
Step 2 of 2
Nonprocedural DML: A nonprocedural DML is also called a high-level DML. It can be used on its own to specify complex database operations concisely. Many DBMSs allow high-level DML statements either to be entered interactively from a display monitor or terminal, or to be embedded in a general-purpose programming language. A query in a high-level DML specifies which data to retrieve rather than how to retrieve it. Therefore, such languages are also called declarative. A nonprocedural DML requires a user to specify what data are needed without specifying how to get these data.

Chapter 2, Problem 7RQ
Problem
Discuss the different types of user-friendly interfaces and the types of users who typically use each.
Step-by-step solution
Step 1 of 7
The user-friendly interfaces provided by a DBMS are as follows:
(a) Menu-based interfaces:
• These interfaces present lists of options through which the user can formulate a request.
• Pull-down menus are a very popular technique in Web-based user interfaces.
Users of the interface:
• These interfaces are used by Web-browsing users and Web clients.
Step 2 of 7
(b) Forms-based interfaces:
• These interfaces display a form to each user.
• The user can fill out the form entries to insert new data.
• Forms are usually designed and programmed for naive users as interfaces to canned transactions.
Users of the interface:
• Users who want to submit information online by filling in and submitting a form.
• Mostly used for tasks such as creating an account on a website or enrolling in an institution.
Step 3 of 7
(c) Graphical user interfaces:
• A graphical user interface displays a schema to the user in diagrammatic form.
• The user can specify a query by manipulating the diagram.
• These interfaces use a mouse as a pointing device to select certain parts of the displayed schema diagram.
Users of the interface:
• Mostly users of electronic devices such as mobile phones and touch screens.
• Users of applications that are operated with pointing devices.
Step 4 of 7
(d) Natural language interfaces:
• These interfaces accept a request from the user and try to interpret it.
• A natural language interface has its own schema, which is similar to the database conceptual schema.
Users of the interface:
• Search engines today use natural language interfaces.
• Users of these search engines enter words and retrieve the related information.
Step 5 of 7
(e) Speech input and output:
• These interfaces accept speech as input and produce speech as output.
Users of the interface:
• These interfaces are used for telephone directory inquiries, for getting flight information on smart devices, and so on.
Step 6 of 7
(f) Interfaces for parametric users:
• Parametric users, such as bank tellers, have a small set of operations that they must perform repeatedly.
• These interfaces provide a small set of commands so that a request can be issued with a minimum number of keystrokes.
Users of the interface:
• These interfaces can be used in bank transactions to deposit or withdraw money.
Step 7 of 7
(g) Interfaces for the DBA:
• These interfaces provide commands for creating accounts, manipulating the database, and performing other operations on the database.
Users of the interface:
• These interfaces are used specifically by database administrators.

Chapter 2, Problem 8RQ
Problem
With what other computer system software does a DBMS interact?
Step-by-step solution
Step 1 of 7
Database management system (DBMS): A database management system (DBMS) is a set of programs that enables users to build and maintain a database.
It is a general-purpose software system that facilitates the processes of defining, constructing, manipulating, and sharing databases among various applications and users.
Step 2 of 7
A database management system (DBMS) interacts with the following other computer system software:
• Computer-Aided Software Engineering (CASE) tools.
• Data dictionary systems.
• Application development environments.
• Information repository systems.
• Communication software.
Step 3 of 7
CASE tools: CASE tools are often used in the design phase of a database system.
Step 4 of 7
Data dictionaries: Data dictionaries are similar to the DBMS catalog, but they include a wider variety of information.
• Typically, data dictionaries can be accessed directly by the database administrator (DBA) whenever required.
Step 5 of 7
Application development environments: Application development environments provide an environment for developing database applications and include facilities that help with many aspects of database systems, including graphical user interface (GUI) development, database design, querying, updating, and application program development.
• Examples of application development environments are listed below:
o JBuilder (Borland)
o PowerBuilder (Sybase)
Step 6 of 7
Information repository systems:
• The information repository is a kind of data dictionary that also stores information such as design decisions, application program descriptions, usage standards, and user information.
• Like data dictionaries, the information repository can be accessed directly by the database administrator.
Step 7 of 7
Communication software:
• The database management system also needs to interface with communication software.
• The main function of the communication software is to enable users at locations remote from the database system to access the database through personal computers or workstations.
• The communication software is connected to the database system through communications hardware such as routers, local networks, phone lines, or satellite communication devices.

Chapter 2, Problem 9RQ
Problem
What is the difference between the two-tier and three-tier client/server architectures?
Step-by-step solution
Step 1 of 2
The difference between a two-tier architecture and a three-tier architecture lies in the number of layers through which data and queries pass during processing. A two-tier architecture has two layers: the client layer (user interface) and the query server or transaction server. Application programs run on the client side, and when data processing is required, a connection is established with the server (DBMS), where the data is stored. Once the connection is established, transaction and query requests are sent using the Open Database Connectivity (ODBC) API and are then processed on the server side. It may also happen that the client side takes care of user interaction and query processing while the server stores data, manages disks, and so on. The exact distribution of functionality differs, but a two-tier architecture always has two layers.
Step 2 of 2
A three-tier architecture has three layers: a new application or Web layer sits between the client layer and the database server layer. The idea behind the three-tier architecture is to partition roles among layers so that each layer has a specific task. The user or client layer provides the user interface from which the user can run queries. The query is processed at the application or Web server layer. This layer also checks any business constraints that may be imposed on the type of query a user can send, and it can verify the user's credentials so as to check the access permissions that the user has.
This layer is also called the business logic layer. Finally, the database server manages the storage of data in the system.

Chapter 2, Problem 10RQ
Problem
Discuss some types of database utilities and tools and their functions.
Step-by-step solution
Step 1 of 2
Some categories of database utilities and tools and their functions are:
1. Loading: Loads existing data files, such as text files, into the database.
• Transferring data from one DBMS to another is a common need in many organizations.
• Vendors offer conversion tools for this purpose, and these tools serve as loading programs.
2. Backup: A utility that creates a backup copy of the database.
• It typically dumps the entire database onto tape. The backup copy can be used to restore the system state in case of catastrophic loss.
Step 2 of 2
3. Database storage reorganization: A utility that restructures a set of database files into a different file organization to improve the performance of the database.
4. CASE tools: CASE tools are used to produce a design for a database application.
5. Data dictionary system: The information repository plays the main role in a data dictionary system.
• The repository is used to store design decisions, user information, and application program descriptions.
• Users can access this information when required.
• The information repository contains more information than the DBMS catalog.
6. Performance monitoring: A utility that monitors database usage and provides statistics.
• The DBA uses these statistics to make decisions, such as whether to reorganize files or add indexes, in order to improve the performance of the database.
Other utilities are also available, for example:
• Sorting files.
• Handling data compression.

Chapter 2, Problem 11RQ
Problem
What is the additional functionality incorporated in n-tier architecture (n > 3)?
Step-by-step solution
Step 1 of 1
It is customary to divide the layers between the user and the stored data in a three-tier architecture into finer components, thereby giving rise to an n-tier architecture, where n may be 4 or 5. Typically, the business logic layer is divided into multiple layers.
1. An n-tier architecture distributes data and programming over the network.
2. Each tier can run on an appropriate processor or operating system platform and can be handled independently.
Another layer that is typically used by vendors of ERP and CRM packages is the middleware layer, which handles the communication between the front-end modules and a number of back-end databases.

Chapter 2, Problem 13E
Problem
Choose a database application with which you are familiar. Design a schema and show a sample database for that application, using the notation of Figures 1.2 and 2.1. What types of additional information and constraints would you like to represent in the schema? Think of several users of your database, and design a view for each.
Step-by-step solution
Step 1 of 2
Consider a flight reservation system.
• Each FLIGHT is identified by a Number, consists of one or more FLIGHT_LEGs with a Leg_no, and flies on certain weekdays.
• Each FLIGHT_LEG has scheduled arrival and departure times, arrival and departure airports, and one or more LEG_INSTANCEs, one for each Date on which the flight travels.
• A FARE is kept for each flight, and there is a set of restrictions on each FARE.
• For each LEG_INSTANCE, SEAT_RESERVATIONs are kept, as are the AIRPLANE used on the leg and the actual arrival and departure times and airports.
• An AIRPLANE is identified by an airplane id and is of a particular AIRPLANE_TYPE. It has a fixed number of seats.
• CAN_LAND relates AIRPLANE_TYPEs to the AIRPORTs at which they can land.
• An AIRPORT is identified by an airport code.
Step 2 of 2
The following constraints hold on the schema:
a. The requested flight number or flight leg must be available on the given date.
This can be checked from the LEG_INSTANCE table.
b. A non-reserved seat must exist for the specified date and flight. The total number of seats available can be obtained from AIRPLANE.
c. A FLIGHT_LEG can correspond only to an existing flight number.
d. Arrival and departure airport codes must be those of existing airports.
e. LEG_INSTANCE can have entries only for a valid Flight_number and Leg_number combination.
f. Flight_number in any relation must be that of a valid flight that has an entry in the FLIGHT table.
g. Airplane_type_name in CAN_LAND must be a valid name from AIRPLANE_TYPE.

Chapter 2, Problem 14E
Problem
If you were designing a Web-based system to make airline reservations and sell airline tickets, which DBMS architecture would you choose from Section 2.5? Why? Why would the other architectures not be a good choice?
Step-by-step solution
Step 1 of 4
Four architectures are discussed in Section 2.5 of the textbook:
1. Centralized DBMS architecture
2. Basic client/server architecture
3. Two-tier client/server architecture
4. Three-tier client/server architecture
Step 2 of 4
For a Web-based system that makes airline reservations and sells airline tickets, the three-tier client/server architecture is the best choice.
• A Web user interface is necessary because different types of users, such as naive users and casual users, will interact with the system.
• The Web user interface is placed on the client system.
• The user interacts with the user interface and submits transactions.
• The Web server handles those transactions, validates the data, and manipulates the database accordingly.
• The Web server or application server handles the application logic of the system.
• The database server contains the DBMS.
Step 3 of 4
In the centralized DBMS architecture, the DBMS functionality and the user interface run on the same system. For a Web-based system, however, they must be on different systems. Hence, the centralized DBMS architecture is not appropriate for a Web-based system.
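To make the three-tier choice from Step 2 more concrete, the following is a minimal sketch, not an actual reservation system. The `seat` table, the function names, and the sample flight `FL100` are all hypothetical, and an in-memory SQLite database merely stands in for the database server. The point is only the division of labor: the client collects input, the application tier applies the business rules, and the database tier stores the data.

```python
import sqlite3


def make_database():
    """Database server tier: an in-memory SQLite database stands in for the DBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE seat ("
        " flight_number TEXT, seat_number TEXT, customer_name TEXT,"
        " PRIMARY KEY (flight_number, seat_number))"
    )
    # Two unreserved seats on a hypothetical flight.
    conn.executemany(
        "INSERT INTO seat VALUES (?, ?, NULL)",
        [("FL100", "1A"), ("FL100", "1B")],
    )
    conn.commit()
    return conn


def reserve_seat(conn, flight_number, seat_number, customer_name):
    """Application server tier: validates the request, then updates the database."""
    if not customer_name.strip():
        return "rejected: customer name required"
    # The seat is reserved only if it exists and is still unreserved.
    cursor = conn.execute(
        "UPDATE seat SET customer_name = ?"
        " WHERE flight_number = ? AND seat_number = ? AND customer_name IS NULL",
        (customer_name, flight_number, seat_number),
    )
    conn.commit()
    return "confirmed" if cursor.rowcount == 1 else "rejected: seat unavailable"


def submit_form(conn, form):
    """Client tier: only collects user input and forwards it to the application tier."""
    return reserve_seat(conn, form["flight"], form["seat"], form["name"])


db = make_database()
print(submit_form(db, {"flight": "FL100", "seat": "1A", "name": "Ann"}))
print(submit_form(db, {"flight": "FL100", "seat": "1A", "name": "Bob"}))
```

In this sketch the second request is rejected by the application tier because the seat is already taken; neither the client nor the database needs to know that rule, which is the separation the three-tier architecture aims for.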
Step 4 of 4
In the three-tier client/server architecture, the business logic is placed on the application server or Web server. The basic client/server architecture or the two-tier client/server architecture could be considered for a Web-based system only if the business logic were placed on the database server or on the client. Placing the business logic there, however, would overburden the database server or the client. Hence, the basic client/server architecture and the two-tier client/server architecture are not appropriate for a Web-based system.

Chapter 2, Problem 15E
Problem
Consider Figure 2.1. In addition to constraints relating the values of columns in one table to columns in another table, there are also constraints that impose restrictions on values in a column or a combination of columns within a table. One such constraint dictates that a column or a group of columns must be unique across all rows in the table. For example, in the STUDENT table, the Student_number column must be unique (to prevent two different students from having the same Student_number). Identify the column or the group of columns in the other tables that must be unique across all rows in the table.
Step-by-step solution
Step 1 of 2
The database tables are constructed from the schema diagram of the database. Each table contains a column, or a group of columns, whose values must be unique across all rows.
Step 2 of 2
The columns or groups of columns that must be unique in each table are:
1. STUDENT: Student_number
2. COURSE: Course_number. If every course has a distinct name, Course_name can also be a unique column.
3. PREREQUISITE: Course_number can be a unique identifier, but only if each course has a single prerequisite; otherwise Course_number and Prerequisite_number together form the unique combination.
4. SECTION: Section_identifier
• This assumes that no two sections can have the same Section_identifier.
• If Section_identifier were unique only within a given course offered in a given term, it would have to be combined with Course_number, Semester, and Year to be unique.
5. GRADE_REPORT: Section_identifier and Student_number.
• The Section_identifier will be different if a student takes the same course or a different course in another term.

Chapter 3, Problem 1RQ
Problem
Discuss the role of a high-level data model in the database design process.
Step-by-step solution
Step 1 of 2
A high-level data model provides concepts for presenting data that are close to the way users perceive data. It helps to express the data requirements of the users as a detailed description of the entity types, relationships, and constraints.
Step 2 of 2
The role of a high-level data model in the database design process is as follows:
• A design expressed in a high-level data model is easy to understand and useful for communicating with non-technical users.
• The model acts as a reference to ensure that all user requirements are met and do not conflict with each other.
• A high-level data model lets database designers concentrate on specifying the properties of the data without being concerned with storage details.
• This data model helps in conceptual design.

Chapter 3, Problem 2RQ
Problem
List the various cases where use of a NULL value would be appropriate.
Step-by-step solution
Step 1 of 2
Use of NULL values is appropriate in two situations:
1. When the value of an attribute is not applicable to an entity. For example, suppose a schema that stores information about a person has an attribute called Company, which stores the name of the company where the person works. For a student who is not working, this attribute is not applicable, so we can put a NULL value in its place.
Step 2 of 2
2. When the value of a particular attribute is not known, either because it is not known whether a value for the attribute exists or because the existing value is unknown. In both cases we can put NULL as the value. For example, suppose a schema that stores information about a person has an attribute called Company, which stores the name of the company where the person works.
For a given person, it may not be known whether he or she is working at all, or the person may be working but the name of the company may be unknown, so we can put a NULL value in its place.

Chapter 3, Problem 3RQ
Problem
Define the following terms: entity, attribute, attribute value, relationship instance, composite attribute, multivalued attribute, derived attribute, complex attribute, key attribute, and value set (domain).
Step-by-step solution
Step 1 of 5
1. Entity: An entity is an object (thing) with independent physical (car, home, person) or conceptual (company, university course) existence in the real world.
2. Attribute: Each real-world entity (thing) has certain properties that describe it or represent its significance in the real world. These properties of an entity are known as attributes. For example, consider a car: the things that describe a car include its model, manufacturer, color, cost, and so on. All of these are relevant in a miniworld and are important in describing a car. They are attributes of a CAR.
Step 2 of 5
3. Attribute Value: Associated with each real-world entity are certain attributes that describe that entity. The value of one of these attributes for a particular entity is called an attribute value. For example, the attribute value of the color attribute of a car entity can be Red.
4. Relationship Instance: Each relationship instance rj in a relationship set R is an association of entities, where the association includes exactly one entity from each participating entity type. Each such relationship instance rj represents the fact that the entities participating in rj are related in some way in the corresponding miniworld situation. For example, consider the relationship type WORKS_FOR between the two entity types EMPLOYEE and DEPARTMENT, which associates each employee with the department for which the employee works. Each relationship instance in the relationship set WORKS_FOR associates one EMPLOYEE and one DEPARTMENT.
Step 3 of 5
5. Composite Attribute: An attribute that can be divided into smaller subparts, which represent more basic attributes with independent meanings, is called a composite attribute. For example, consider an attribute called phone number for an employee of a company. One can store the phone number as a single attribute or as two attributes, namely area code and number. Since the phone number can be broken into two independent attributes, it is a composite attribute. Whether to keep a composite attribute as a whole or divide it into its basic attributes depends on how the attribute is used in the miniworld.
6. Multivalued Attribute: For a real-world entity, an attribute may have more than one value. For example, consider the phone number attribute of a person. A person may have one, two, or three phones, so this attribute may have more than one value. Any attribute that can have more than one value is a multivalued attribute.
Step 4 of 5
7. Derived Attribute: For a real-world entity, an attribute may have a value that is independent of other attributes and cannot be derived from them; such attributes are called stored attributes. There are also attributes whose value can be derived from the values of other attributes; such attributes are known as derived attributes. For example, if the date of birth of a person is a stored attribute, the age of the person can be calculated from the date of birth and the current date, so age is a derived attribute.
8. Complex Attribute: Composite and multivalued attributes can be nested arbitrarily. Arbitrary nesting can be represented by grouping the components of a composite attribute between parentheses () and separating the components with commas, and by displaying multivalued attributes between braces {}. Such attributes are called complex attributes.
For example, if a person can have more than one residence and each residence can have a single address and multiple phones, an Address_phone attribute can be specified as: {Address_phone({Phone(Area_code, Ph_Num)}, Address(Street_address(Number, Street, Apartment_number), City, State, Zip))}
Step 5 of 5
9. Key Attribute: Each real-world entity is unique. There are certain attributes whose value is different for every entity of the same type. These attributes are called key attributes, and they are used to specify a uniqueness constraint. For example, consider the entity type Car. The attributes registration number and car number have different values for every car, so each of them is a key of the Car entity type. It is also possible for a set of attributes together to form a key.
10. Value Set (domain): For each attribute of a real-world entity, there is a range of values from which the attribute can take its value. For example, if the Age attribute of an employee must have a value between 18 and 70, then the integers in the range 18-70 form the domain of the attribute Age. In most programming languages, basic data types such as integer, string, float, and date are used to specify the domain of a particular attribute.

Chapter 3, Problem 4RQ
Problem
What is an entity type? What is an entity set? Explain the differences among an entity, an entity type, and an entity set.
Step-by-step solution
Step 1 of 4
Entity type: An entity type defines a collection (or set) of entities that have the same attributes. A database usually contains groups of entities that are similar. These entities have the same attributes but different attribute values. A collection of such entities is described by an entity type. For example, a car dealer might want to store the details of all cars in the showroom in a car database. The collection of all car entities is described by an entity type. Each entity type in a database is represented by its name and its attributes.
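The ideas of a key attribute, a value set, and an entity type described by a name and attributes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the EMPLOYEE attributes chosen here (`ssn` as the key, `name`, `age`) and the helper `add_entity` are hypothetical, and the Age domain of 18-70 is taken from the example above.

```python
from dataclasses import dataclass

# Value set (domain) of the Age attribute: the integers 18 through 70.
AGE_DOMAIN = range(18, 71)


@dataclass(frozen=True)
class Employee:
    """Entity type EMPLOYEE: a name plus a list of attributes."""
    ssn: str   # key attribute (assumed for this sketch)
    name: str
    age: int

    def __post_init__(self):
        # Reject attribute values that fall outside the value set.
        if self.age not in AGE_DOMAIN:
            raise ValueError(f"age {self.age} is outside the domain 18-70")


def add_entity(entity_set, employee):
    """Add an entity to the current entity set, enforcing key uniqueness."""
    if employee.ssn in entity_set:
        raise ValueError(f"duplicate key value {employee.ssn}")
    entity_set[employee.ssn] = employee


employees = {}  # the entity set: all EMPLOYEE entities at this moment
add_entity(employees, Employee("111", "Ann", 30))
add_entity(employees, Employee("222", "Raj", 45))
print(len(employees))
```

Keeping the entity set in a dictionary indexed by the key attribute is one simple way to make the uniqueness constraint explicit; a DBMS would enforce the same rule through its own key constraints.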
Step 2 of 4
For example, CAR can be the name of the entity type, and Reg_num, Car_num, Manufacturer, model, cost, and color can be its attributes.
Entity set: At a particular time the dealer might have a set of eight cars, and at some other time a different set of four cars. The collection of all entities of a particular entity type in a database at any point in time is called an entity set. It is referred to by the same name as the entity type.
Step 3 of 4
For example, if we have 4 entities (4 cars), the entity set will include:
Name: CAR
Entities:
e1(reg_1, DL_1, ford, 1870, 2000000, white),
e2(reg_2, DL_2, ford, 1830, 1000000, white),
e3(reg_3, DL_3, ford, 1877, 2100000, red),
e4(reg_4, DL_4, ford, 1970, 2500000, white)
Step 4 of 4
An entity is a real-world object or thing that has independent physical or conceptual existence. Often there are many entities of a similar type about which information needs to be stored in the database. The name and the attributes shared by these entities jointly define an entity type; in other words, an entity type describes a collection of entities that have the same attributes. At two different points in time, the entities in the miniworld about which information is stored in the database can be different. The collection of entities of an entity type at a given point in time is called an entity set.

Chapter 3, Problem 5RQ
Problem
Explain the difference between an attribute and a value set.
Step-by-step solution
Step 1 of 2
Attribute: Every entity has certain properties that describe it or represent its importance in the real world. These properties of entities are known as attributes. Example: Consider a bus. The things that describe a bus include its model, color, manufacture date, year, country, and so on.
Value set: For each attribute of an entity, there is a range of values from which the attribute can take its value. Example: The Age attribute of an employee must have a value.
Suppose the Age attribute must lie in the range 16-60; then the integers in that range are the value set of the attribute Age.
Step 2 of 2
The difference between an attribute and a value set:
Attribute: A table groups data in rows and columns, and the columns are known as the attributes of that table. An attribute represents a certain property of an entity.
Value set: The value set is the group of values that may be assigned to that attribute for each entity. It is the range of values from which an attribute can take a value.

Chapter 3, Problem 6RQ
Problem
What is a relationship type? Explain the differences among a relationship instance, a relationship type, and a relationship set.
Step-by-step solution
Step 1 of 3
Relationship type: This expresses a type of relationship that occurs between the entities and also lists the possible number of relationships between entities.
Step 2 of 3
Consider the following diagram.
Explanation: STUDENT and COURSE are entity types, and ENROLL is the relationship between them. S1, S2, S3… are the instances of the entity type STUDENT. C1, C2, C3… are the instances of the entity type COURSE. r1, r2, r3… are the relationship instances between the entities.
A relationship type is the association between the entity types. In the above diagram, ENROLL is the relationship type.
A relationship instance includes exactly one entity from each participating entity type. S1 is related to C1 through r1. (S1, C1) is one instance, (S2, C2) is another instance, and so on.
A relationship set is the set of all instances of a relationship type. {(S1, C1), (S2, C2), (S1, C3), …} forms the relationship set.
Step 3 of 3
Differences among relationship instance, relationship type, and relationship set:
Relationship instance: It includes exactly one entity from each participating entity type.
Relationship type: It is the association between the entity types.
Relationship set: It is the collection of instances of a relationship type.

Chapter 3, Problem 7RQ
Problem
What is a participation role?
When is it necessary to use role names in the description of relationship types?
Step-by-step solution
Step 1 of 3
The participation role is the part that an entity plays in a relationship.
• Role names must be used in the description of a relationship type when the same entity type participates more than once in the relationship type, in different roles.
• Role names are therefore necessary in recursive relationships.
Example: An employee is related to the department in which he or she works in a company. So, a relationship may exist between various entities (of the same or of different entity types). Each entity type that participates in a relationship type plays a role in the relationship.
Step 2 of 3
The participation role, or role name, signifies the role that a participating entity from the entity type plays in each relationship instance and helps to explain what the relationship means.
Example: In the WORKS_FOR relationship type, EMPLOYEE plays the role of worker and DEPARTMENT plays the role of department or employer. In the figure below, an employee works for a department. E1 and E3 work for D1, and E2 works for D2.
Step 3 of 3
Role names are not necessary in the description of relationship types where all participating entity types are distinct, as in the example above, because in such cases the name of each entity type generally specifies the role it plays. But when one entity type participates in a relationship type in more than one role, as in recursive relationships, it becomes necessary to use role names in the description of the relationship type.
Example: Consider the entity type EMPLOYEE. One employee can supervise another employee. In this case the roles cannot be described using the entity type name, because this is a relationship of an entity type with itself. Here, using role names becomes important. In the figure below, the SUPERVISION relationship type relates an employee and a supervisor. E1 supervises E2.
Here each relationship instance ri in SUPERVISION associates two employees, ei and ej, one playing the role of supervisor and the other playing the role of supervisee.

Chapter 3, Problem 8RQ
Problem
Describe the two alternatives for specifying structural constraints on relationship types. What are the advantages and disadvantages of each?
Step-by-step solution
Step 1 of 3
The two alternatives for specifying structural constraints on relationship types are as follows:
• Cardinality ratio
• Participation constraint
Step 2 of 3
Cardinality ratio:
• An entity can participate in any number of relationship instances.
• The cardinality ratio specifies the maximum number of relationship instances in which an entity can participate.
• For a binary relationship, the cardinality ratios can be 1:1, 1:N, N:1, and M:N.
• The cardinality ratio is represented on an ER diagram by 1, M, and N on the left and right sides of the diamond.
Participation constraint:
• The participation constraint specifies the minimum number of relationship instances in which each entity must participate.
• Because it specifies the minimum participation of the entity, it is also called the minimum cardinality constraint.
• There are two types of participation constraints: total and partial.
• The participation constraint is represented on an ER diagram by the line joining the participating entity type and the relationship. Total participation is represented by a double line, whereas partial participation is represented by a single line.
Step 3 of 3
Advantages and disadvantages:
• The cardinality ratio and the participation constraint together specify the participation of an entity in relationship instances.
• They are helpful in describing binary relationship types.
• Some entities and relationships are costly to express using only these two modeling constructs.
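As an illustration of how the two structural constraints differ, the sketch below checks them against a small WORKS_FOR relationship set, using the instances mentioned earlier (E1 and E3 work for D1, E2 works for D2). The function names and the representation of a relationship set as a Python set of (employee, department) pairs are assumptions made for this example only.

```python
from collections import Counter


def satisfies_one_to_many(relationship_set):
    """Cardinality ratio check for DEPARTMENT:EMPLOYEE = 1:N.

    Each employee may appear in at most one (employee, department) pair,
    while a department may appear in any number of pairs.
    """
    per_employee = Counter(employee for employee, _department in relationship_set)
    return all(count <= 1 for count in per_employee.values())


def has_total_participation(entities, relationship_set, position):
    """Participation check: every entity appears in at least one pair.

    `position` selects the side of the pair to inspect
    (0 for employees, 1 for departments).
    """
    participating = {pair[position] for pair in relationship_set}
    return set(entities) <= participating


works_for = {("E1", "D1"), ("E2", "D2"), ("E3", "D1")}

print(satisfies_one_to_many(works_for))
print(has_total_participation(["E1", "E2", "E3"], works_for, 0))
print(has_total_participation(["D1", "D2", "D3"], works_for, 1))
```

The first function looks only at the maximum number of instances per entity, while the second looks only at the minimum, which mirrors the distinction between the cardinality ratio and the participation constraint. In the sample data, a hypothetical department D3 with no employees would make DEPARTMENT's participation partial rather than total.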
Chapter 3, Problem 9RQ Problem Under what conditions can an attribute of a binary relationship type be migrated to become an attribute of one of the participating entity types? Step-by-step solution Step 1 of 2 • The attributes of a relationship type with cardinality ratio 1:1 or 1:N can be migrated to become attributes of a participating entity type. • In the case of 1:1 cardinality, the attribute can be moved to either of the entity types in the binary relationship. • In the case of 1:N cardinality, the attribute can be migrated only to the entity type on the N side of the relationship. Step 2 of 2 Example • Consider a binary relationship type, WORKS_FOR, between EMPLOYEE and DEPARTMENT. • The relationship between DEPARTMENT and EMPLOYEE has cardinality ratio 1:N. • Each employee is in one department, but there can be several employees in a single department. • In this scenario, the attribute Start_date of the relationship type WORKS_FOR can be migrated to the EMPLOYEE entity type, where it gives the date on which the employee started working for that department. Chapter 3, Problem 10RQ Problem When we think of relationships as attributes, what are the value sets of these attributes? What class of data models is based on this concept? Step-by-step solution Step 1 of 3 Solution: Relationships as attributes: • Whenever an attribute of one entity type refers to another entity type, a relationship exists. • Relationship types can have attributes, like entity types. • For relationship types with cardinality ratio 1:1 or 1:N, these attributes can be migrated to one of the participating entity types. For example, take the following scenario: There is a relationship between EMPLOYEE and DEPARTMENT. • The relationship DEPARTMENT:EMPLOYEE has cardinality ratio 1:N. • Here, each employee is in one department and several employees are in a single department. The Start_date attribute of the WORKS_FOR relationship type can be migrated to the EMPLOYEE entity type.
This gives the date on which the EMPLOYEE started working for that department. Date is the domain, or value set, of Start_date of EMPLOYEE in any department. It does not depend on whether any other attribute is present. Step 2 of 3 The value sets of attributes: The set of values that an attribute can take is called its domain or value set. In the conceptual design phase of data modeling, all entity types, relationships and constraints are specified as follows: • The DEPARTMENT entity type contains the attributes name, locations, number, manager and managerstartdate. • Here, locations is a multivalued attribute. Name and number are both key attributes. • The PROJECT entity type contains the attributes name, number, location and controllingdepartment. • Name and number are both key attributes. • The EMPLOYEE entity type contains the attributes name, sex, ssn, salary, department, address, birthdate and supervisor. • Name and address are both composite attributes. • The DEPENDENT entity type contains the attributes employee, dependantname, sex, relationship and birthdate. Step 3 of 3 Functional data models are based on this concept. Relational databases use a related idea, in which foreign keys act as reference attributes that represent relationships. Chapter 3, Problem 11RQ Problem What is meant by a recursive relationship type? Give some examples of recursive relationship types. Step-by-step solution Step 1 of 2 Recursive relationship: A relationship between two entities of the same entity type is called a recursive relationship. • The relationship is between two different occurrences (entities) of the same entity type. Step 2 of 2 Example of recursive relationship: Consider the entity type PERSON, with an attribute MOTHER whose value is itself a person. Here a recursive relationship exists because one row in the PERSON table refers to another row in the same PERSON table.
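The PERSON and MOTHER example above can be sketched in Python as a table whose rows refer to other rows of the same table. This is only an illustrative sketch: the Person class, its field names and the sample rows are hypothetical, not part of the original solution.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Person:
    person_id: int
    name: str
    # MOTHER refers to another row of the same PERSON table (recursive relationship).
    mother_id: Optional[int] = None

# Hypothetical PERSON table keyed by person_id.
people = {
    1: Person(1, "Alice"),
    2: Person(2, "Beth", mother_id=1),
    3: Person(3, "Cara", mother_id=2),
}

def maternal_line(person_id, table):
    """Follow the MOTHER reference repeatedly; return ancestor names, nearest first."""
    names = []
    current = table[person_id].mother_id
    while current is not None:
        names.append(table[current].name)
        current = table[current].mother_id
    return names

line_of_cara = maternal_line(3, people)   # Cara's mother, then her mother's mother
line_of_alice = maternal_line(1, people)  # Alice has no recorded mother
```

Because both ends of the relationship are PERSON rows, the roles (mother and child) are what distinguish the two participants, which is why role names matter for recursive relationship types.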
Chapter 3, Problem 12RQ Problem When is the concept of a weak entity used in data modeling? Define the terms owner entity type, weak entity type, identifying relationship type, and partial key. Step-by-step solution Step 1 of 5 The concept of a weak entity is used in the conceptual phase of data modeling, for entity types that do not have key attributes of their own. Example Consider the entity types DEPENDENT and EMPLOYEE. • A DEPENDENT exists only in association with an EMPLOYEE of the company. • The DEPENDENT attribute values can be the same for relatives of two different employees, so there is no way to distinguish two such records by their own attributes alone. Such entity types are called weak entity types. Step 2 of 5 Owner entity type Entities belonging to a weak entity type are identified by being associated with specific entities from another entity type, in combination with one of their own attribute values. That other entity type is called the owner entity type. Step 3 of 5 Weak Entity Type Entity types that do not have key attributes of their own are called weak entity types. Step 4 of 5 Identifying Relationship Type A relationship type that relates a weak entity type to its owner entity type is called the identifying relationship type. Step 5 of 5 Partial key A partial key is a set of attributes of a weak entity type that uniquely identifies the weak entities related to the same owner entity. Chapter 3, Problem 13RQ Problem Can an identifying relationship of a weak entity type be of a degree greater than two? Give examples to illustrate your answer. Step-by-step solution Step 1 of 4 Identifying relationship: The relationship between a weak entity type and its owner (strong) entity type is known as the identifying relationship. Step 2 of 4 The degree of an identifying relationship of a weak entity type can be two or greater than two. Step 3 of 4 Consider the following ER diagram: Here, • Student and Company are the two strong entities and Interview is the weak entity.
• The selection_process is an identifying relationship. • The degree of the identifying relationship (selection_process) is 3. • In the above ER diagram, the student applies for a job in a company, and the interview is the selection process through which the student takes a job in the company. Step 4 of 4 Therefore, from the above ER diagram, it can be concluded that the degree of an identifying relationship of a weak entity type can be greater than 2. Chapter 3, Problem 14RQ Problem Discuss the conventions for displaying an ER schema as an ER diagram. Step-by-step solution Step 1 of 1 Chapter 3, Problem 15RQ Problem Discuss the naming conventions used for ER schema diagrams. Step-by-step solution Step 1 of 1 The naming conventions used for ER schema diagrams are as follows: • Entity type names should be singular. • The names of entity types and relationship types should be written in uppercase letters. • Attribute names have their initial letter capitalized. • Role names are in lowercase. Chapter 3, Problem 16E Problem Which combinations of attributes have to be unique for each individual SECTION entity in the UNIVERSITY database shown in Figure 3.20 to enforce each of the following miniworld constraints: a. During a particular semester and year, only one section can use a particular classroom at a particular DaysTime value. b. During a particular semester and year, an instructor can teach only one section at a particular DaysTime value. c. During a particular semester and year, the section numbers for sections offered for the same course must all be different. Can you think of any other similar constraints? Step-by-step solution Step 1 of 4 a. Consider the following miniworld constraint: A particular classroom can be used by only one section at a particular DaysTime value, during a particular semester and year.
The attribute combination that must be unique for the above constraint is as follows: Sem, Year, CRoom, DaysTime Step 2 of 4 b. Consider the following miniworld constraint: Only one section can be taught by an instructor at a particular DaysTime value, during a particular semester and year. The attribute combination that must be unique for the above constraint is as follows: Sem, Year, DaysTime, Id (of the INSTRUCTOR teaching the SECTION) Step 3 of 4 c. Consider the following miniworld constraint: The section numbers corresponding to the sections offered for the same course must all be different during a particular semester and year. The attribute combination that must be unique for the above constraint is as follows: Sem, Year, SecNo, CCode (of the COURSE related to the SECTION) Step 4 of 4 Some other similar constraints related to the SECTION entity are as follows: • In a particular semester and year, a student can take only one section at a particular DaysTime value. • In a particular semester and year, an instructor of a particular rank cannot teach two sections at the same DaysTime value. • Each section of a particular course can use only one classroom during a particular semester and year. Chapter 3, Problem 17E Problem Composite and multivalued attributes can be nested to any number of levels. Suppose we want to design an attribute for a STUDENT entity type to keep track of previous college education. Such an attribute will have one entry for each college previously attended, and each such entry will be composed of college name, start and end dates, degree entries (degrees awarded at that college, if any), and transcript entries (courses completed at that college, if any). Each degree entry contains the degree name and the month and year the degree was awarded, and each transcript entry contains a course name, semester, year, and grade. Design an attribute to hold this information.
Use the conventions in Figure 3.5. Step-by-step solution Step 1 of 3 Complex attributes are attributes formed by nesting multivalued attributes and composite attributes. • Curly braces {} are used to group the components of multivalued attributes. • Parentheses () are used to group the components of composite attributes. Step 2 of 3 A multivalued attribute PreviousCollege is used to hold the colleges previously attended by the student. • The components of PreviousCollege are CollegeName, StartDate, EndDate. A multivalued attribute Degree is used to hold the details of the degrees awarded to the student. • The components of Degree are DegreeName, Month, Year. A multivalued attribute Transcript is used to hold the details of the transcript of the student. • The components of Transcript are CourseName, Semester, Year, Grade. Step 3 of 3 An attribute that holds the details of PreviousCollege, Degree and Transcript of the STUDENT entity is as follows: {PreviousCollege (CollegeName, StartDate, EndDate, {Degree (DegreeName, Month, Year)}, {Transcript (CourseName, Semester, Year, Grade)})} Chapter 3, Problem 18E Problem Show an alternative design for the attribute described in Exercise that uses only entity types (including weak entity types, if needed) and relationship types. Exercise Composite and multivalued attributes can be nested to any number of levels. Suppose we want to design an attribute for a STUDENT entity type to keep track of previous college education. Such an attribute will have one entry for each college previously attended, and each such entry will be composed of college name, start and end dates, degree entries (degrees awarded at that college, if any), and transcript entries (courses completed at that college, if any). Each degree entry contains the degree name and the month and year the degree was awarded, and each transcript entry contains a course name, semester, year, and grade.
Design an attribute to hold this information. Use the conventions in Figure 3.5. Step-by-step solution Step 1 of 3 The alternative design for the STUDENT entity, with the attribute that keeps track of previous college education as discussed in the previous problem, is shown below: Step 2 of 3 The strong entities are as follows: • STUDENT • COLLEGE • DEGREE The weak entities are as follows: • TRANSCRIPT • ATTENDANCE Step 3 of 3 The relationships between the entities are as follows: • There exists a binary 1:N relationship PREVIOUS_ATTENDED_COLLEGE between STUDENT and ATTENDANCE. • There exists a binary 1:N relationship ATTENDED between COLLEGE and ATTENDANCE. • There exists a binary M:N relationship DEGREE_AWARDED between ATTENDANCE and DEGREE. • There exists a binary 1:N relationship MAINTAIN_ATTENDANCE between ATTENDANCE and TRANSCRIPT. Chapter 3, Problem 19E Problem Consider the ER diagram in Figure, which shows a simplified schema for an airline reservations system. Extract from the ER diagram the requirements and constraints that produced this schema. Try to be as precise as possible in your requirements and constraints specification. Figure An ER diagram for an AIRLINE database schema Step-by-step solution Step 1 of 2 Refer to the ER diagram of the AIRLINE database schema given in Figure 3.21. The requirements and constraints that produced the schema are as follows: AIRPORT • The database represents the information about each AIRPORT. • Each AIRPORT has a unique Airport_code, a Name, and the City and State where it is located. • Each AIRPORT is identified by its airport code. FLIGHT • Each FLIGHT is identified by a unique number. • The database also records the airline for the FLIGHT and the days on which it is scheduled. FLIGHT_LEG • Each FLIGHT consists of one or more FLIGHT_LEGs, each with a Leg_no. • A FARE is kept for each flight, and there is a certain set of restrictions on each FARE.
• Each FLIGHT_LEG has the details of its scheduled arrival time, scheduled departure time, arrival airport and departure airport. Step 2 of 2 LEG_INSTANCE • Each FLIGHT_LEG, with its scheduled arrival time, departure time, arrival airport and departure airport, has one or more LEG_INSTANCEs. • A LEG_INSTANCE is an instance of a FLIGHT_LEG on a particular date on which the flight travels. • The AIRPLANE used and the number of available seats are kept in the LEG_INSTANCE. RESERVATION • For each LEG_INSTANCE, the RESERVATIONs for every customer include the Customer Name, Phone, and Seat Number(s). AIRPLANE, AIRPLANE_TYPE, CAN_LAND • Information about the AIRPLANEs and AIRPLANE_TYPEs is included. • An AIRPLANE is identified by an airplane id and is of a particular AIRPLANE_TYPE. • An AIRPLANE_TYPE has a fixed number of seats and a particular manufacturing company name. • CAN_LAND relates each AIRPLANE_TYPE to the AIRPORTs at which it can land. Chapter 3, Problem 20E Problem In Chapters 1 and 2, we discussed the database environment and database users. We can consider many entity types to describe such an environment, such as DBMS, stored database, DBA, and catalog/data dictionary. Try to specify all the entity types that can fully describe a database system and its environment; then specify the relationship types among them, and draw an ER diagram to describe such a general database environment. Step-by-step solution Step 1 of 1 Entity types that can fully describe a database environment and its users are: 1. USERS (User_name, User_id, Kind_of_user): User_name gives the name of the user, User_id is a unique identifier for each user, and Kind_of_user tells whether the user is from the DBA staff, a casual user, an application programmer, or a parametric user (the list can be expanded to include menu-based application users, form-based application users and so on). 2.
COMMAND_INTERFACE_TYPE (Interface_identifier, User_group, Next_tool): Interface_identifier tells which interfaces a user can use, viz. DDL statements, privileged commands, interactive query, application programs, compiled transactions, menu-based interface, form-based interface and so on. User_group tells which user group will use this interface, so that other users cannot carry out instructions to which they do not have access. Next_tool gives the Tool_id of the tool that the interface will use for further processing. 3. TOOLS (Tool_id, Tool_type, Next_tool): Tool_id uniquely identifies the tool, Tool_type tells whether the tool is a compiler, an optimizer or a storage tool, and Next_tool gives the Tool_id of the next tool that this tool will use to complete the transaction. E-R diagram: Chapter 3, Problem 21E Problem Design an ER schema for keeping track of information about votes taken in the U.S. House of Representatives during the current two-year congressional session. The database needs to keep track of each U.S. STATE’s Name (e.g., ‘Texas’, ‘New York’, ‘California’) and include the Region of the state (whose domain is {‘Northeast’, ‘Midwest’, ‘Southeast’, ‘Southwest’, ‘West’}). Each CONGRESS_PERSON in the House of Representatives is described by his or her Name, plus the District represented, the Start_date when the congressperson was first elected, and the political Party to which he or she belongs (whose domain is {‘Republican’, ‘Democrat’, ‘Independent’, ‘Other’}). The database keeps track of each BILL (i.e., proposed law), including the Bill_name, the Date_of_vote on the bill, whether the bill Passed_or_failed (whose domain is {‘Yes’, ‘No’}), and the Sponsor (the congressperson(s) who sponsored, that is, proposed, the bill). The database also keeps track of how each congressperson voted on each bill (domain of Vote attribute is {‘Yes’, ‘No’, ‘Abstain’, ‘Absent’}). Draw an ER schema diagram for this application.
State clearly any assumptions you make. Step-by-step solution Step 1 of 2 Step 2 of 2 ASSUMPTIONS: 1. Each CONGRESS_PERSON represents one district, and each district is represented by one CONGRESS_PERSON. 2. Each bill is sponsored by one CONGRESS_PERSON. 3. Every BILL has a different name. The above schema has three entity types: 1. US_STATE_REGION: represents the states and regions in the US. 2. CONGRESS_PERSON: congresspersons, who are elected from various regions and are related to US_STATE_REGION by the relationship REPRESENTATIVE. 3. BILL: each bill is related to the CONGRESS_PERSON who sponsors it, and is voted on by all CONGRESS_PERSONs. Chapter 3, Problem 22E Problem A database is being constructed to keep track of the teams and games of a sports league. A team has a number of players, not all of whom participate in each game. It is desired to keep track of the players participating in each game for each team, the positions they played in that game, and the result of the game. Design an ER schema diagram for this application, stating any assumptions you make. Choose your favorite sport (e.g., soccer, baseball, football). Step-by-step solution Step 1 of 2 Consider a soccer league in which various teams participate to win the title. The following is the ER diagram for the database of a sports league. Step 2 of 2 Assumptions: • Only two teams can participate in each game. • Each player in a team has a unique number. • Only one game takes place on a given date. • A player can play many games. Chapter 3, Problem 23E Problem Consider the ER diagram shown in Figure for part of a BANK database. Each bank can have multiple branches, and each branch can have multiple accounts and loans. a. List the strong (nonweak) entity types in the ER diagram. b. Is there a weak entity type? If so, give its name, partial key, and identifying relationship. c. What constraints do the partial key and the identifying relationship of the weak entity type specify in this diagram? d.
List the names of all relationship types, and specify the (min, max) constraint on each participation of an entity type in a relationship type. Justify your choices. e. List concisely the user requirements that led to this ER schema design. f. Suppose that every customer must have at least one account but is restricted to at most two loans at a time, and that a bank branch cannot have more than 1,000 loans. How does this show up on the (min, max) constraints? An ER diagram for a BANK database schema. Step-by-step solution Step 1 of 6 (a) The nonweak entity types are: • LOAN • CUSTOMER • ACCOUNT • BANK Step 2 of 6 (b) Yes, there is a weak entity type, BANK_BRANCH. Its partial key is Branch_no and its identifying relationship is BRANCHES. Step 3 of 6 (c) • No two branches of the same bank have the same number. • A bank can have any number of branches, but a branch belongs to only one bank. Step 4 of 6 (d) The relationship types are: • BRANCHES: BANK (min, max) = (1, 1) and BANK_BRANCH (min, max) = (1, N). A bank can have any number of branches, but a branch can be owned by only a single bank. • ACCTS: ACCOUNT (min, max) = (1, N) and BANK_BRANCH (min, max) = (1, 1). An account can be with only one branch, but a branch can have many accounts. • LOANS: LOAN (min, max) = (1, N) and BANK_BRANCH (min, max) = (1, 1). A branch can give any number of loans, but a loan is given from only one branch. • A_C: ACCOUNT (min, max) = (1, N) and CUSTOMER (min, max) = (1, 1). A customer can have any number of accounts, but an account is owned by only one customer. • L_C: CUSTOMER (min, max) = (1, 1) and LOAN (min, max) = (1, N). A customer can take any number of loans, but a loan is given to only one customer. Step 5 of 6 (e) Consider a banking system: • Each BANK has a unique code, a name and an address. • A bank can have any number of BANK_BRANCHes. Each BANK_BRANCH has a number that is unique among the branches of that bank. • Each BANK_BRANCH opens accounts and gives loans to customers.
• Each account and each loan is identified by a number, has a balance or amount, and is of a particular type. • Each customer is identified by Ssn. The name, address and phone of each customer are stored. Step 6 of 6 (f) The relationship type constraints are: • BRANCHES: BANK (min, max) = (1, 1) and BANK_BRANCH (min, max) = (1, N) • ACCTS: ACCOUNT (min, max) = (1, 500) and BANK_BRANCH (min, max) = (1, 1) • LOANS: LOAN (min, max) = (1, 1000) and BANK_BRANCH (min, max) = (1, 1) • A_C: ACCOUNT (min, max) = (1, N) and CUSTOMER (min, max) = (1, 1) • L_C: CUSTOMER (min, max) = (1, 1) and LOAN (min, max) = (1, 2) Chapter 3, Problem 24E Problem Consider the ER diagram in Figure. Assume that an employee may work in up to two departments or may not be assigned to any department. Assume that each department must have one and may have up to three phone numbers. Supply (min, max) constraints on this diagram. State clearly any additional assumptions you make. Under what conditions would the relationship HAS_PHONE be redundant in this example? Part of an ER diagram for a COMPANY database. Step-by-step solution Step 1 of 2 Consider the ER diagram for the COMPANY database. An employee may work in up to two departments or may not be part of any department. The (min, max) constraint in this case is (0, 2). Each department must have one phone number and may have up to three phone numbers. The (min, max) constraint in this case is (1, 3). The following are the other assumptions made for the COMPANY database: • Each department must have one employee and may have up to twenty employees. The (min, max) constraint in this case is (1, 20). • Each phone is used by only one department. The (min, max) constraint in this case is (1, 1). • Each phone is assigned to at least one employee and may be assigned to up to 5 employees. The (min, max) constraint in this case is (1, 5). • Each employee must have one phone and may have up to 3 phones. The (min, max) constraint in this case is (1, 3).
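The (min, max) constraints listed above can be read as bounds on how many relationship instances each entity participates in. A minimal Python sketch of such a check follows, assuming relationship instances are stored as (employee, department) pairs. The function name, the variable names and the sample data are hypothetical; only the bounds (0, 2) and (1, 20) come from the solution.

```python
from collections import Counter

def violations(entities, instances, side, min_count, max_count):
    """Return the entities whose participation count falls outside [min_count, max_count]."""
    counts = Counter(inst[side] for inst in instances)
    # Counter returns 0 for entities that never appear in an instance.
    return sorted(e for e in entities if not (min_count <= counts[e] <= max_count))

# Hypothetical employee-to-department assignments.
employees = ["E1", "E2", "E3", "E4"]
departments = ["D1", "D2", "D3", "D4"]
assignments = [
    ("E1", "D1"), ("E1", "D2"),
    ("E2", "D1"),
    ("E3", "D1"), ("E3", "D2"), ("E3", "D3"),  # E3 works in three departments
]

# An employee works in 0 to 2 departments: (min, max) = (0, 2).
employee_violations = violations(employees, assignments, 0, 0, 2)

# A department has 1 to 20 employees: (min, max) = (1, 20).
department_violations = violations(departments, assignments, 1, 1, 20)
```

With this sample data, E3 exceeds the maximum of two departments and D4 falls below the minimum of one employee, while E4 is allowed to have no department because the minimum on that side is 0.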
Step 2 of 2 The following is the ER diagram after supplying the (min, max) constraints for the COMPANY database: The relationship HAS_PHONE would be redundant under the following condition: • If every EMPLOYEE is assigned all the PHONEs of his or her DEPARTMENT and none of any other department. Chapter 3, Problem 25E Problem Consider the ER diagram in Figure. Assume that a course may or may not use a textbook, but that a text by definition is a book that is used in some course. A course may not use more than five books. Instructors teach from two to four courses. Supply (min, max) constraints on this diagram. State clearly any additional assumptions you make. If we add the relationship ADOPTS, to indicate the textbook(s) that an instructor uses for a course, should it be a binary relationship between INSTRUCTOR and TEXT, or a ternary relationship among all three entity types? What (min, max) constraints would you put on the relationship? Why? Part of an ER diagram for a COURSES database. Step-by-step solution Step 1 of 1 The relationship type constraints are: TEACHES: INSTRUCTOR (min, max) = (1, 1) and COURSE (min, max) = (2, 4). Assumption: Each course is taught by a single instructor. USES: TEXT (min, max) = (0, 5) and COURSE (min, max) = (1, 1). Assumption: Each text is used by a single course. If the relationship ADOPTS is added between INSTRUCTOR and TEXT, the (min, max) constraints would be: INSTRUCTOR (min, max) = (1, 1) and TEXT (min, max) = (0, 20). Since each instructor can teach two to four courses and can use up to five texts, or none, for each course, the minimum is 0 and the maximum is 4 × 5 = 20. Chapter 3, Problem 26E Problem Consider an entity type SECTION in a UNIVERSITY database, which describes the section offerings of courses. The attributes of SECTION are Section_number, Semester, Year.
Course_number, Instructor, Room_no (where section is taught), Building (where section is taught), Weekdays (domain is the possible combinations of weekdays in which a section can be offered {‘MWF’, ‘MW’, ‘TT’, and so on}), and Hours (domain is all possible time periods during which sections are offered {‘9–9:50 a.m.’, ‘10–10:50 a.m.’, …, ‘3:30–4:50 p.m.’, ‘5:30–6:20 p.m.’, and so on}). Assume that Section_number is unique for each course within a particular semester/year combination (that is, if a course is offered multiple times during a particular semester, its section offerings are numbered 1, 2, 3, and so on). There are several composite keys for section, and some attributes are components of more than one key. Identify three composite keys, and show how they can be represented in an ER schema diagram. Step-by-step solution Step 1 of 4 The attributes of the SECTION entity are as follows: • Section_number • Semester • Year • Course_number • Instructor • Room_no • Building • Weekdays • Hours Comment Step 2 of 4 As Section_number is unique for a course in particular semester of a year, {Section_number, Semester, Year, Course} can be considered as composite key for SECTION entity. As unique room can be allocated for a specific days and hours in a particular semester of a year, {Semester, Year, Room_no, Weekdays, Hours} can be considered as composite key for SECTION entity. As unique Instructor can be allocated to teach for a specific days and hours in a particular semester of a year, {Semester, Year, Instructor, Weekdays, Hours} can be considered as composite key for SECTION entity. Comment Step 3 of 4 Hence, the composite keys for SECTION entity are as follows: • Key 1: Section_number, Semester, Year, Course • Key 2: Semester, Year, Room_no, Weekdays, Hours • Key 3: Semester, Year, Instructor, Weekdays, Hours Comment Step 4 of 4 The ER schema diagram is as follows: Chapter 3, Problem 27E Problem Cardinality ratios often dictate the detailed design of a database. 
The cardinality ratio depends on the real-world meaning of the entity types involved and is defined by the specific application. For the following binary relationships, suggest cardinality ratios based on the common-sense meaning of the entity types. Clearly state any assumptions you make. Entity 1 Cardinality Ratio Entity 2 1. STUDENT ______________ SOCIAL_SECURITY_CARD 2. STUDENT ______________ TEACHER 3. CLASSROOM ______________ WALL 4. COUNTRY ______________ CURRENT_PRESIDENT 5. COURSE ______________ TEXTBOOK 6. ITEM (that can be found in an order) ______________ ORDER 7. STUDENT ______________ CLASS 8. CLASS ______________ INSTRUCTOR 9. INSTRUCTOR ______________ OFFICE 10. EBAY_AUCTION_ITEM ______________ EBAY_BID Step-by-step solution Step 1 of 3 1. Each student has a unique social security number. So there exists a 1:1 cardinality ratio between the STUDENT and SOCIAL_SECURITY_CARD entities. 2. A student can be taught by many teachers and a teacher can teach many students. So there exists an M:N cardinality ratio between the STUDENT and TEACHER entities. 3. A classroom can have 4 walls, and a wall can be common to two classrooms. So there exists a 2:4 cardinality ratio between the CLASSROOM and WALL entities. 4. Each country has only one president, and a person can be president of only one country. So there exists a 1:1 cardinality ratio between the COUNTRY and CURRENT_PRESIDENT entities. 5. A course can have any number of textbooks, but a textbook can belong to only one course. So there exists a 1:N cardinality ratio between the COURSE and TEXTBOOK entities. Step 2 of 3 6. An order can consist of many items and an item can belong to more than one order. So there exists an M:N cardinality ratio between the ORDER and ITEM entities. 7. A student can belong to one class, but a class can consist of many students. So there exists an N:1 cardinality ratio between the STUDENT and CLASS entities. 8.
A class can have many instructors and an instructor can belong to more than one class. So there exists a M: N cardinality ratio between CLASS and INSTRUCTOR entities. 9. An instructor can belong to one office, but an office can have more than one instructor. So there exists a N:1 cardinality ratio between INSTRUCTOR and OFFICE entities. 10. An eBay auction item can have any number of bids. So there exists a 1:N cardinality ratio between EBAY_AUCTION_ITEM and EBAY-BID entities. Comment Step 3 of 3 Summary of cardinality ratio: Comment Chapter 3, Problem 28E Problem Consider the ER schema for the MOVIES database in Figure. Assume that MOVIES is a populated database. ACTOR is used as a generic term and includes actresses. Given the constraints shown in the ER schema, respond to the following statements with True, False, or Maybe. Assign a response of Maybe to statements that, although not explicitly shown to be True, cannot be proven False based on the schema as shown. Justify each answer. a. There are no actors in this database that have been in no movies. b. There are some actors who have acted in more than ten movies. c. Some actors have done a lead role in multiple movies. d. A movie can have only a maximum of two lead actors. e. Every director has been an actor in some movie. f. No producer has ever been an actor. g. A producer cannot be an actor in some other movie. h. There are movies with more than a dozen actors. i. Some producers have been a director as well. j. Most movies have one director and one producer. k. Some movies have one director but several producers. l. There are some actors who have done a lead role, directed a movie, and produced a movie. m. No movie has a director who also acted in that movie. Figure An ER diagram for a MOVIES database schema. Step-by-step solution Step 1 of 13 a. There exists a many to many (M: N) relationship named PERFORMS_IN between ACTOR and MOVIE. ACTOR and MOVIE have full participation in relationship PERFORMS_IN. 
Hence, the given statement is TRUE. Step 2 of 13 b. There exists a many-to-many (M:N) relationship named PERFORMS_IN between ACTOR and MOVIE. The maximum cardinality M or N indicates that there is no maximum number. Some of the actors may have acted in more than ten movies. Hence, the given statement is MAYBE. Step 3 of 13 c. There exists a 2-to-N relationship named LEAD_ROLE between ACTOR and MOVIE. The maximum cardinality for an actor to act in a lead role in movies is N, and N can be 2 or more. Hence, the given statement is TRUE. Step 4 of 13 d. There exists a 2-to-N relationship named LEAD_ROLE between ACTOR and MOVIE. The maximum cardinality 2 on the ACTOR side indicates that a movie can have at most two lead actors. Hence, the given statement is TRUE. Step 5 of 13 e. There exists a one-to-one (1:1) relationship named ALSO_A_DIRECTOR between ACTOR and DIRECTOR. DIRECTOR does not have total participation in the relationship named ALSO_A_DIRECTOR. So there may be an actor who is also a director, but not every director has to be an actor. Hence, the given statement is FALSE. Step 6 of 13 f. There exists a one-to-one (1:1) relationship named ACTOR_PRODUCER between ACTOR and PRODUCER. PRODUCER does not have total participation in the relationship named ACTOR_PRODUCER. So there may be an actor who is also a producer. Hence, the given statement is FALSE. Step 7 of 13 g. A producer can act in any movie, including movies other than those he or she produced. Hence, the given statement is FALSE. Step 8 of 13 h. There exists a many-to-many (M:N) relationship named PERFORMS_IN between ACTOR and MOVIE. The maximum cardinality M indicates that there is no maximum number. A movie can have more than 12 actors performing in it. Hence, the given statement is MAYBE. Step 9 of 13 i. There exists a one-to-one (1:1) relationship named ALSO_A_DIRECTOR between ACTOR and DIRECTOR.
There exists a one-to-one (1:1) relationship named ACTOR_PRODUCER between ACTOR and PRODUCER. Hence, there may be an actor who is a director as well as a producer. Hence, the given statement is TRUE.
Step 10 of 13
j. There exists a one-to-many relationship named DIRECTS between DIRECTOR and MOVIE. A director can direct N movies. There exists a many-to-many relationship named PRODUCES between PRODUCER and MOVIE. A producer can produce any number of movies. So, there may be one director and one producer for a movie. Hence, the given statement is MAYBE.
Step 11 of 13
k. There exists a one-to-many relationship named DIRECTS between DIRECTOR and MOVIE. A director can direct N movies. There exists a many-to-many relationship named PRODUCES between PRODUCER and MOVIE. A producer can produce any number of movies. So, a movie can have one director and several producers. Hence, the given statement is TRUE.
Step 12 of 13
l. There exists a 2-to-N relationship named LEAD_ROLE between ACTOR and MOVIE. There exists a one-to-one (1:1) relationship named ALSO_A_DIRECTOR between ACTOR and DIRECTOR. There exists a one-to-one (1:1) relationship named ACTOR_PRODUCER between ACTOR and PRODUCER. So, there may be an actor who is a producer, is a director, and has performed a lead role in a movie. Hence, the given statement is TRUE.
Step 13 of 13
m. There may be a movie whose director also performed in it. Hence, the given statement is FALSE.

Chapter 3, Problem 29E
Problem
Given the ER schema for the MOVIES database in Figure, draw an instance diagram using three movies that have been released recently. Draw instances of each entity type: MOVIES, ACTORS, PRODUCERS, DIRECTORS involved; make up instances of the relationships as they exist in reality for those movies.
Figure: An ER diagram for a MOVIES database schema.
Step-by-step solution
Step 1 of 2
[Instance diagram not included.]
Step 2 of 2
Amir Khan: produced a movie he acted in and also directed the movie.
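An instance diagram like the one requested in Problem 29E can also be written down as plain sets of pairs and checked against a participation constraint. The Python sketch below is only illustrative: the identifiers (actor_1, movie_1, and so on) are placeholders rather than real films or people, and the helper name has_total_participation is hypothetical. It assumes, as the solution to Problem 28E(a) states, that ACTOR and MOVIE both participate totally in PERFORMS_IN.

```python
# Placeholder instances; these are NOT taken from any real film data.
actors = {"actor_1", "actor_2", "actor_3"}
movies = {"movie_1", "movie_2", "movie_3"}

# PERFORMS_IN is M:N, so an instance of it is a set of (actor, movie) pairs.
performs_in = {
    ("actor_1", "movie_1"),
    ("actor_1", "movie_2"),
    ("actor_2", "movie_2"),
    ("actor_3", "movie_3"),
}


def has_total_participation(entities, relationship, position):
    """Return True if every entity appears at `position` in at least one pair."""
    participating = {pair[position] for pair in relationship}
    return entities <= participating


# Hypothetical check of the total-participation assumption on both sides.
actors_total = has_total_participation(actors, performs_in, 0)
movies_total = has_total_participation(movies, performs_in, 1)
```

Adding an actor who appears in no pair would make the first check fail, which is the situation that statement (a) of Problem 28E rules out.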

Chapter 3, Problem 30E
Problem
Illustrate the UML diagram for Exercise 16. Your UML design should observe the following requirements:
a. A student should have the ability to compute his/her GPA and add or drop majors and minors.
b. Each department should be able to add or delete courses and hire or terminate faculty.
c. Each instructor should be able to assign or change a student’s grade for a course.
Note: Some of these functions may be spread over multiple classes.
Reference: Problem 16
Which combinations of attributes have to be unique for each individual SECTION entity in the UNIVERSITY database shown in Figure 3.20 to enforce each of the following miniworld constraints:
a. During a particular semester and year, only one section can use a particular classroom at a particular DaysTime value.
b. During a particular semester and year, an instructor can teach only one section at a particular DaysTime value.
c. During a particular semester and year, the section numbers for sections offered for the same course must all be different.
Can you think of any other similar constraints?
Step-by-step solution
Step 1 of 5
A UML diagram consists of classes, where a class is equivalent to an entity type in an ER diagram. A class consists of the following three sections:
• Class name: the top section of the UML class diagram. The class name is similar to the entity type name in an ER diagram.
• Attributes: the middle section of the UML class diagram. Attributes are the same as the attributes of an entity in the ER diagram.
• Operations: the last section of the UML class diagram. It lists the operations that can be performed on individual objects, where each object is similar to an entity in the ER diagram.
Step 2 of 5
a. The operations that let a student calculate his/her GPA and add or drop majors and minors are specified in the last section of the UML class diagram.
The operations are as follows:
• compute_gpa
• add_major
• drop_major
• add_minor
• drop_minor
Step 3 of 5
b. The operations that let each department add or delete a course and hire or terminate a faculty member are specified in the last section of the UML class diagram. The operations are as follows:
• add_course
• delete_course
• hire_faculty
• terminate_faculty
Step 4 of 5
c. The operations that let each instructor assign or change the grade of a student for a particular course are specified in the last section of the UML class diagram. The operations are as follows:
• assign_grade
• change_grade
Step 5 of 5
The UML diagram corresponding to the above requirements is as follows: [UML diagram not included.]

Chapter 3, Problem 31LE
Problem
Consider the UNIVERSITY database described in Exercise 16. Build the ER schema for this database using a data modeling tool such as ERwin or Rational Rose.
Reference: Exercise 16
Which combinations of attributes have to be unique for each individual SECTION entity in the UNIVERSITY database shown in Figure 3.20 to enforce each of the following miniworld constraints:
a. During a particular semester and year, only one section can use a particular classroom at a particular DaysTime value.
b. During a particular semester and year, an instructor can teach only one section at a particular DaysTime value.
c. During a particular semester and year, the section numbers for sections offered for the same course must all be different.
Can you think of any other similar constraints?
Step-by-step solution
Step 1 of 1
Refer to Exercise 3.16 for the UNIVERSITY database. Use the Rational Rose tool to create the ER schema for the database as follows:
• In the options available on the left, right-click the option Logical view, go to New, and select the option Class Diagram.
• Name the class diagram UNIVERSITY.
Select the option Class available in the toolbar and then click on an empty space in the Class Diagram file. Name the class COLLEGE. Right-click the class, select the option New Attribute, and name the attribute CName. Similarly, create the other attributes COffice and CPhone.
• Now right-click the attribute CName, available on the left under the class COLLEGE, and select the option Open Specification. Select the Protected option under Export Control. This marks CName as the primary key.
• Similarly, create another class INSTRUCTOR with the attributes Id, Rank, IName, IOffice, and IPhone, and with Id as the primary key.
• Select the option Unidirectional Association from the toolbar to create a relationship between the two classes. Click on the class COLLEGE and, while holding the click, drag the mouse towards the class INSTRUCTOR and release. This creates the relationship between the two selected classes. Name the association DEAN.
Since the structural constraints in the ER diagram are specified using (min, max) notation, specify them in the Rational Rose tool as follows:
• Right-click the association close to the class COLLEGE and select 1 from the option Multiplicity.
• Again, right-click the association close to the class INSTRUCTOR and select Zero or One from the option Multiplicity.
• Similarly, create the other classes and their associated attributes. Specify the relationships and structural constraints between the classes as described above.
The ER schema may be specified in an alternative diagrammatic notation, the class diagram, using the Rational Rose tool as follows: [Class diagram not included.]

Chapter 3, Problem 32LE
Problem
Consider a MAIL_ORDER database in which employees take orders for parts from customers. The data requirements are summarized as follows:
• The mail order company has employees, each identified by a unique employee number, first and last name, and Zip Code.
• Each customer of the company is identified by a unique customer number, first and last name, and Zip Code.
• Each part sold by the company is identified by a unique part number, a part name, price, and quantity in stock.
• Each order placed by a customer is taken by an employee and is given a unique order number. Each order contains specified quantities of one or more parts. Each order has a date of receipt as well as an expected ship date. The actual ship date is also recorded.
Design an entity-relationship diagram for the mail order database and build the design using a data modeling tool such as ERwin or Rational Rose.
Step-by-step solution
There is no solution to this problem yet.

Chapter 3, Problem 35LE
Problem
Consider the ER diagram for the AIRLINE database shown in Figure. Build this design using a data modeling tool such as ERwin or Rational Rose.
Figure: An ER diagram for an AIRLINE database schema.
Step-by-step solution
Step 1 of 1
Refer to Figure 3.21 for the ER schema of the AIRLINE database. Use the Rational Rose tool to create the ER schema for the database as follows:
• In the options available on the left, right-click the option Logical view, go to New, and select the option Class Diagram.
• Name the class diagram AIRLINE. Select the option Class available in the toolbar and then click on an empty space in the Class Diagram file. Name the class AIRPORT. Right-click the class, select the option New Attribute, and name the attribute Airport_code. Similarly, create the other attributes City, State, and Name.
• Now right-click the attribute Airport_code, available on the left under the class AIRPORT, and select the option Open Specification. Select the Protected option under Export Control. This marks Airport_code as the primary key.
• Similarly, create another class FLIGHT_LEG and its attribute Leg_no.
• Select the option Unidirectional Association from the toolbar to create a relationship between the two classes. Click on the class AIRPORT and, while holding the click, drag the mouse towards the class FLIGHT_LEG and release. This creates the relationship between the two selected classes. Name the association DEPARTURE_AIRPORT.
Since the structural constraints in the ER diagram are specified using (min, max) notation, specify them in the Rational Rose tool as follows:
• Right-click the association close to the class AIRPORT and select 1 from the option Multiplicity.
• Again, right-click the association close to the class FLIGHT_LEG and select n from the option Multiplicity.
• Similarly, create the other classes and their associated attributes. Specify the relationships and structural constraints between the classes as described above.
The ER schema may be specified in an alternative diagrammatic notation, the class diagram, using the Rational Rose tool as follows: [Class diagram not included.]

Chapter 4, Problem 1RQ
Problem
What is a subclass? When is a subclass needed in data modeling?
Step-by-step solution
Step 1 of 3
Subclass: A subclass is also called a derived class. It extends another class (the parent class) and inherits the protected and public members of that parent class. A subclass entity represents the same real-world entity as a member of the superclass, but in a distinct, specific role.
Step 2 of 3
An entity is an object (thing) with an independent physical existence (car, home, person) or conceptual existence (company, university course) in the real world. Each real-world entity has certain properties that describe it or represent its significance in the real world. These properties of an entity are known as attributes. An entity type defines a collection (or set) of entities that have the same attributes.
A database usually contains groups of entities that are similar. These entities have the same attributes but different attribute values. A collection of such entities is an entity type. Within an entity type there may exist smaller groupings based on one attribute or relationship or another. Such attributes or relationships may not apply to all entities in the entity type but are of significant value for that group. All such groups can be represented as separate classes or entity types. These form subclasses of the bigger entity type.
Example: Consider an entity type VEHICLE. All vehicles have properties such as manufacturer, number_plate, registration_number, colour, etc., but there are certain properties that we may link only to carrier vehicles, like load_capacity and size (the width and height of the product it can take), and certain attributes that can be attached to passenger vehicles only, such as sitting_capacity and ac/non-ac. So we can have the subclasses PASSENGER_VEHICLE and GOODS_VEHICLE for the entity type VEHICLE. PASSENGER_VEHICLE and GOODS_VEHICLE are subclasses of the VEHICLE superclass.
Step 3 of 3
When a subclass is needed in data modeling: A subclass is needed to define an inheritance relationship between two classes. The concept of a subclass is used in data modeling to represent data more meaningfully and to represent clearly those attributes and relationships that belong to a group of entities in the superclass rather than to all of its entities.

Chapter 4, Problem 2RQ
Problem
Define the following terms: superclass of a subclass, superclass/subclass relationship, IS-A relationship, specialization, generalization, category, specific (local) attributes, and specific relationships.
Step-by-step solution
Step 1 of 9
1. Superclass of a subclass: Within an entity type there may exist smaller groupings based on one attribute or relationship or another.
Such attributes or relationships may not apply to all entities in the entity type but are of significant value for that particular group. All such groups can be represented as separate classes or entity types. These form subclasses of the bigger entity type. The bigger entity type is known as the superclass.
For example: Consider an entity type VEHICLE. All vehicles have properties such as manufacturer, number_plate, registration_number, colour, etc., but there are certain properties that we may link only to carrier vehicles, like load_capacity and size (the width and height of the product it can take), and certain attributes that can be attached to passenger vehicles only, such as sitting_capacity and ac/non-ac. So we can have the subclasses PASSENGER_VEHICLE and GOODS_VEHICLE for the entity type VEHICLE. PASSENGER_VEHICLE and GOODS_VEHICLE are subclasses of the VEHICLE superclass.
Step 2 of 9
2. Superclass/subclass relationship: The relationship between a superclass and any one of its subclasses is known as a superclass/subclass relationship.
Step 3 of 9
3. IS-A relationship: A superclass/subclass relationship is often called an IS-A relationship because of the way in which the concept is referred to.
For example: Consider an entity type VEHICLE. All vehicles have properties such as manufacturer, number_plate, registration_number, colour, etc., but there are certain properties that we may link only to carrier vehicles, like load_capacity and size (the width and height of the product it can take), and certain attributes that can be attached to passenger vehicles only, such as sitting_capacity and ac/non-ac. So we can have the subclasses PASSENGER_VEHICLE and GOODS_VEHICLE for the entity type VEHICLE. PASSENGER_VEHICLE and GOODS_VEHICLE are subclasses of the VEHICLE superclass. Or we can say a GOODS_VEHICLE is a VEHICLE.
Step 4 of 9
4. Specialization: Specialization is the process of defining a set of subclasses of an entity type (the superclass of the specialization).
The set of subclasses that forms a specialization is defined on the basis of some distinguishing characteristic of the entities in the superclass.
For example: the set {GOODS_VEHICLE, PASSENGER_VEHICLE} is a specialization of the superclass VEHICLE that distinguishes among vehicle entities on the basis of the purpose each vehicle serves. There can be several specializations of the same entity type based on different distinguishing characteristics. For example: on the basis of whether a vehicle is commercial or not, we can have another specialization {COMMERCIAL, PRIVATE}.
Specialization is a process that allows the user to do the following:
a. Define a set of subclasses of an entity type.
b. Establish additional specific attributes for each subclass.
c. Establish additional specific relationship types between each subclass and other entity types or other subclasses.
Step 5 of 9
5. Generalization: This is the reverse process of abstraction, in which differences between several entity types are suppressed, common features are identified, and the entity types are generalized into a single superclass of which the original entity types are special subclasses.
For example: GOODS_VEHICLE and PASSENGER_VEHICLE are two classes that have certain attributes in common, viz., number_plate, reg_number, color, etc. These common attributes can be taken from both classes to create a new superclass VEHICLE. This is called generalization.
Step 6 of 9
6. Category: Sometimes the need arises to model a single superclass/subclass relationship with more than one superclass, where the superclasses represent different entity types. In this case, the subclass represents a collection of objects that is a subset of the union of distinct entity types; such a subclass is called a union type or a category.
Step 7 of 9
Step 8 of 9
7. Specific (local) attributes: Consider an entity type VEHICLE. All vehicles have properties such as manufacturer, number_plate, registration_number, colour, etc.
, but there are certain properties that we may link only to the CARRIER_VEHICLES subclass, like load_capacity and size (the width and height of the product it can take), and certain attributes that can be attached to the PASSENGER_VEHICLES subclass only: sitting_capacity, ac/non-ac, etc. Attributes that belong only to subclasses and not to the superclass are called local attributes or specific attributes.
Step 9 of 9
8. Specific relationships: Like local attributes, there are certain relationships that hold only for one subclass of a superclass and not for all subclasses or for the superclass. Such relationships are called specific relationships.
For example: CARRIES_GOODS can be a relationship between CARRIER_VEHICLES and COMPANY but not between PASSENGER_VEHICLE and COMPANY.

Chapter 4, Problem 3RQ
Problem
Discuss the mechanism of attribute/relationship inheritance. Why is it useful?
Step-by-step solution
Step 1 of 2
The Enhanced Entity-Relationship (EER) model is an extension of the ER model. The EER model includes some new concepts in addition to the concepts of the ER model: subclass, superclass, specialization, generalization, and category or union type. Associated with these additional concepts is the mechanism of attribute and relationship inheritance.
Step 2 of 2
The type of each entity is defined by its set of attributes and the relationship types in which it participates. The members of a subclass inherit the attributes and the relationships of the superclass entity. This mechanism is useful because the entities in the subclass possess all the characteristics of the superclass, so those characteristics do not have to be defined again for each subclass.

Chapter 4, Problem 4RQ
Problem
Discuss user-defined and predicate-defined subclasses, and identify the differences between the two.
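To make the two kinds of subclass named in this question concrete, here is a minimal Python sketch. It assumes a hypothetical VEHICLE entity set with an invented purpose attribute; the names vehicles, goods_vehicle, and vintage_vehicle are illustrative only and do not come from the textbook. Membership of the predicate-defined subclass is computed from a condition on a superclass attribute, while membership of the user-defined subclass is whatever a user has explicitly added.

```python
# Hypothetical superclass entities, keyed by an invented registration number.
vehicles = {
    "R1": {"purpose": "goods"},
    "R2": {"purpose": "passenger"},
    "R3": {"purpose": "goods"},
}

# Predicate-defined subclass: membership follows automatically from a
# condition (the defining predicate) on an attribute of the superclass.
goods_vehicle = {reg for reg, v in vehicles.items() if v["purpose"] == "goods"}

# User-defined subclass: there is no defining condition, so a user adds
# each member individually.
vintage_vehicle = set()
vintage_vehicle.add("R2")

# In both cases every subclass member must also be a superclass member.
all_vehicle_ids = set(vehicles)
subclasses_are_subsets = (
    goods_vehicle <= all_vehicle_ids and vintage_vehicle <= all_vehicle_ids
)
```

If the purpose of a vehicle changes, recomputing goods_vehicle updates its membership automatically, whereas vintage_vehicle changes only when a user edits it.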
Step-by-step solution
Step 1 of 1
Predicate-defined subclasses: When the entities that become members of a subclass are determined by placing a condition on some attribute of the superclass, the subclass is called a predicate-defined subclass.
User-defined subclasses: When there is no condition for determining membership in a subclass, the subclass is called user-defined. Membership in such a subclass is determined by the database users when they apply the operation to add an entity to the subclass; hence, membership is specified individually for each entity by the user, not by any condition that may be evaluated automatically.
The difference between predicate-defined and user-defined subclasses is:
1. Membership of predicate-defined subclasses can be decided automatically, but membership of user-defined subclasses cannot.

Chapter 4, Problem 5RQ
Problem
Discuss user-defined and attribute-defined specializations, and identify the differences between the two.
Step-by-step solution
Step 1 of 5
User-defined specialization:
Step 2 of 5
If there is no condition for deciding membership in any of the subclasses, the specialization is called a user-defined specialization.
Step 3 of 5
Membership in such a specialization is determined by the database users when an operation is performed to add an entity to a subclass.
Step 4 of 5
Hence, membership is specified individually for each entity by the user.
Attribute-defined specialization: If the entities that become members of each subclass of the specialization are determined by placing a condition on the same attribute of the superclass, the specialization is called an attribute-defined specialization.
Step 5 of 5
The difference between user-defined specialization and attribute-defined specialization is as follows:
User-defined specialization:
• The user is responsible for identifying the proper subclass.
• Membership of a user-defined specialization cannot be decided automatically.
Attribute-defined specialization:
• The value of the same attribute is used in defining the predicate for all subclasses.
• Membership of an attribute-defined specialization can be decided automatically.

Chapter 4, Problem 6RQ
Problem
Discuss the two main types of constraints on specializations and generalizations.
Step-by-step solution
Step 1 of 1
The two main constraints on specialization and generalization are:
1. Disjointness constraint: This specifies that the subclasses of the specialization must be disjoint, meaning that an entity can be a member of at most one of the subclasses of the specialization. A specialization that is attribute-defined implies the disjointness constraint if the attribute used to define the membership predicate is single-valued. If the disjointness constraint holds, the specialization is disjoint. If the subclasses are not constrained to be disjoint, their sets of entities may have members in common; this is the overlapping case.
2. Completeness constraint: This may be total or partial. A total specialization constraint specifies that every entity in the superclass must be a member of at least one of the subclasses in the specialization. A partial specialization allows an entity not to belong to any of the subclasses.

Chapter 4, Problem 7RQ
Problem
What is the difference between a specialization hierarchy and a specialization lattice?
Step-by-step solution
Step 1 of 1
A subclass itself may have further subclasses specified on it, forming a hierarchy or a lattice of specializations. A specialization hierarchy has the constraint that every subclass participates as a subclass in only one class/subclass relationship; that is, each subclass has only one parent, which results in a tree structure. In contrast, in a specialization lattice, a subclass can be a subclass in more than one class/subclass relationship.

Chapter 4, Problem 8RQ
Problem
What is the difference between specialization and generalization? Why do we not display this difference in schema diagrams?
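One way to picture the two processes named in this question is in terms of attribute sets. The Python sketch below is an illustration under assumed attribute lists: the entity types and the shared attributes follow the VEHICLE example used elsewhere in these solutions, while the variable names and the permit_number attribute are invented for the example. Generalization factors the common attributes of existing entity types into a superclass; specialization starts from the superclass and adds local attributes to a subclass.

```python
# Generalization: start from two entity types and pull out what they share.
goods_vehicle = {"number_plate", "reg_number", "color", "load_capacity"}
passenger_vehicle = {"number_plate", "reg_number", "color", "sitting_capacity"}

# The common attributes become the attributes of the new superclass.
vehicle = goods_vehicle & passenger_vehicle

# What remains in each original type is local to that subclass.
goods_local = goods_vehicle - vehicle
passenger_local = passenger_vehicle - vehicle

# Specialization: start from the superclass and add local attributes.
# 'permit_number' is an invented attribute used only for illustration.
commercial_vehicle = vehicle | {"permit_number"}
```

Both directions end with the same kind of structure, a superclass plus subclasses with local attributes, which is consistent with the point that a schema diagram does not need to record which process produced it.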
Step-by-step solution
Step 1 of 2
Specialization is the process of defining a set of subclasses of an entity type (the superclass of the specialization). The set of subclasses that forms a specialization is defined on the basis of some distinguishing characteristic of the entities in the superclass.
For example: the set {GOODS_VEHICLE, PASSENGER_VEHICLE} is a specialization of the superclass VEHICLE that distinguishes among vehicle entities on the basis of the purpose each vehicle serves. There can be several specializations of the same entity type based on different distinguishing characteristics. For example: on the basis of whether a vehicle is commercial or not, we can have another specialization {COMMERCIAL, PRIVATE}.
Specialization is a process that allows the user to do the following:
a. Define a set of subclasses of an entity type.
b. Establish additional specific attributes for each subclass.
c. Establish additional specific relationship types between each subclass and other entity types or other subclasses.
Step 2 of 2
Generalization: This is the reverse process of abstraction, in which differences between several entity types are suppressed, common features are identified, and the entity types are generalized into a single superclass of which the original entity types are special subclasses.
For example: GOODS_VEHICLE and PASSENGER_VEHICLE are two classes that have certain attributes in common, viz., number_plate, reg_number, color, etc. These common attributes can be taken from both classes to create a new superclass VEHICLE. This is called generalization.
Specialization and generalization can be viewed as functionally reverse processes of each other. We do not generally display the difference in schema diagrams because the decision as to which process is more appropriate in a particular situation is often subjective.

Chapter 4, Problem 9RQ
Problem
How does a category differ from a regular shared subclass? What is a category used for? Illustrate your answer with examples.
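The set-theoretic side of this question can be sketched in a few lines of Python. The entity sets below are invented placeholders: the names engineer, manager, person, bank, company, and owner, and all member ids, are assumptions made for illustration. A shared subclass must lie inside the intersection of its superclasses, whereas a category only needs to lie inside their union.

```python
# Shared subclass: the superclasses contain the same kind of entity.
engineer = {"e1", "e2", "e3"}
manager = {"e2", "e3", "e4"}
engineering_manager = {"e2"}  # every member must be in BOTH superclasses

# Category: the superclasses are different entity types.
person = {"p1", "p2"}
bank = {"b1"}
company = {"c1", "c2"}
owner = {"p1", "b1", "c2"}  # each member comes from exactly ONE superclass

# A shared subclass is a subset of the intersection of its superclasses.
is_shared_subclass = engineering_manager <= (engineer & manager)

# A category is a subset of the union of its superclasses.
is_category = owner <= (person | bank | company)

# The same category would fail the intersection test, because no entity
# is simultaneously a person, a bank, and a company.
owner_in_intersection = owner <= (person & bank & company)
```

The last check shows why a category cannot be modeled as an ordinary shared subclass when its superclasses are distinct entity types.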
Step-by-step solution
Step 1 of 3
A category differs from a regular shared subclass in the following ways:
1. A category has two or more superclasses, which may represent distinct entity types, within a single superclass/subclass relationship, whereas each superclass/subclass relationship of a regular shared subclass has a single superclass.
[Figures not included: regular shared subclass; category.]
Step 2 of 3
2. An entity that is a member of a shared subclass must exist in all of its superclasses, i.e., the shared subclass is a subset of the intersection of its superclasses. In the case of a category, a member entity need belong to only one of the superclasses, i.e., the category is a subset of the union of its superclasses.
3. Attribute inheritance works selectively in the case of categories. The attributes of only one superclass are inherited, depending on the superclass to which the entity belongs. A shared subclass, on the other hand, inherits all the attributes of its superclasses.
Step 3 of 3
Use: Sometimes the need arises to model a single superclass/subclass relationship with more than one superclass, where the superclasses represent different entity types. In this case, the subclass represents a collection of objects that is a subset of the union of distinct entity types; in such cases a union type or category is used.
For example: Consider a piece of property. It can be owned by a person, a business firm, a charitable institution, a bank, etc. All these entities are of different types but jointly form the total set of land owners. The category figure above illustrates this example.

Chapter 4, Problem 10RQ
Problem
For each of the following UML terms (see Sections 3.8 and 4.6) discuss the corresponding term in the EER model, if any: object, class, association, aggregation, generalization, multiplicity, attributes, discriminator, link, link attribute, reflexive association, and qualified association.
Step-by-step solution
Step 1 of 1
S.No | UML term | EER model term
1 | Object | Entity
2 | Class | Entity type
3 | Association | Relationship type
4 | Aggregation | Relationship between a whole object and its component parts
5 | Generalization | Generalization
6 | Multiplicity | (min, max) notation
7 | Attributes | Attributes
8 | Discriminator | Partial key
9 | Link | Relationship instance
10 | Link attribute | Relationship attribute
11 | Reflexive association | Recursive relationship
12 | Qualified association | Weak entity

Chapter 4, Problem 11RQ
Problem
Discuss the main differences between the notation for EER schema diagrams and UML class diagrams by comparing how common concepts are represented in each.
Step-by-step solution
Step 1 of 1
Some of the differences between the EER schema diagram notation and the UML class diagram notation are as follows: [Comparison figure not included.]

Chapter 4, Problem 12RQ
Problem
List the various data abstraction concepts and the corresponding modeling concepts in the EER model.
Step-by-step solution
Step 1 of 3
The four abstraction concepts in the EER (Enhanced Entity-Relationship) model are as follows:
• Classification and instantiation
• Identification
• Specialization and generalization
• Aggregation and association
Step 2 of 3
Classification and instantiation
• Classification assigns similar entities or objects to an entity type or object type.
• Instantiation is the inverse of classification; it refers to the generation and specific examination of distinct objects of a class.
Identification
• Identification is the process by which classes and objects are made uniquely identifiable by means of some identifier.
• Identification is needed at two levels:
o To tell the difference between classes and objects.
o To identify database objects and to relate them to their real-world counterparts.
Specialization and generalization
• Specialization categorizes a class of objects into subclasses.
• Generalization is the inverse of specialization; it combines several classes into a higher-level class.
Aggregation and association
• Aggregation builds composite objects from their component objects.
• Association relates objects from several independent classes.
Step 3 of 3
The modeling concepts of the EER model are as follows:
• The EER model includes almost all the modeling concepts of the ER model. In addition, it contains the subclass and superclass concepts, which are related to specialization and generalization.
• Another modeling concept in the EER model is the category or union type, which has no standard terminology among the abstraction concepts.

Chapter 4, Problem 13RQ
Problem
What aggregation feature is missing from the EER model? How can the EER model be further enhanced to support it?
Step-by-step solution
Step 1 of 2
Missing feature: The EER model does not explicitly support the form of aggregation that combines objects related by a specific relationship instance into a higher-level aggregate object.
• This is sometimes useful because the higher-level aggregate may itself be related to some other object.
• This type of relationship between a primitive object and its aggregate object is referred to as IS-A-PART-OF, and its inverse is called IS-A-COMPONENT-OF.
Step 2 of 2
Enhancement: The EER model can be enhanced to represent this aggregation feature correctly by creating additional entity types.

Chapter 4, Problem 14RQ
Problem
What are the main similarities and differences between conceptual database modeling techniques and knowledge representation techniques?
Step-by-step solution
Step 1 of 2
The major similarities and differences between conceptual database modeling techniques and knowledge representation (KR) techniques are:
1. Both disciplines use an abstraction process to identify common properties and important aspects of objects in the miniworld while suppressing insignificant differences and unimportant details.
2. Both disciplines provide concepts, constraints, operations, and languages for defining data and representing knowledge.
3. KR is generally broader in scope than semantic data models. Different forms of knowledge, such as rules, incomplete and default knowledge, and temporal and spatial knowledge, are represented in KR schemes.
Step 2 of 2
4. KR schemes include reasoning mechanisms that deduce additional facts from the facts stored in a database. Hence, whereas most current database systems are limited to answering direct queries, knowledge-based systems using KR schemes can answer queries that involve inferences over the stored data.
5. Whereas most data models concentrate on the representation of database schemas, or meta-knowledge, KR schemes often mix the schemas with the instances themselves in order to provide flexibility in representing exceptions. This often makes KR schemes less efficient than databases when implemented, especially when a large amount of data needs to be stored.

Chapter 4, Problem 15RQ
Problem
Discuss the similarities and differences between an ontology and a database schema.
Step-by-step solution
Step 1 of 1
The difference between an ontology and a database schema is that the schema is usually limited to describing a small subset of a miniworld from reality in order to store and manage data. An ontology is usually considered to be more general.
It attempts to describe a part of reality or a domain of interest (e.g., medical terms, electronic-commerce applications) as completely as possible. Comment Chapter 4, Problem 16E Problem Design an EER schema for a database application that you are interested in. Specify all constraints that should hold on the database. Make sure that the schema has at least five entity types, four relationship types, a weak entity type, a superclass/subclass relationship, a category, and an n-ary (n > 2) relationship type. Step-by-step solution Step 1 of 2 Comment Step 2 of 2 Here the weak entity type INTERVIEW has a ternary identifying relationship with JOB_OFFER, CANDIDATE, and EMPLOYER. An interview is related to the candidate who gives the interview, the employer that conducts it, and the job offer for which the interview is held. An employer can be a government organization or a private firm, and hires for a department to which a candidate can apply. A candidate can be a fresher or may have some work experience. Comment Chapter 4, Problem 17E Problem Consider the BANK ER schema in Figure, and suppose that it is necessary to keep track of different types of ACCOUNTS (SAVINGS_ACCTS, CHECKING_ACCTS, …) and LOANS (CAR_LOANS, HOME_LOANS, …). Suppose that it is also desirable to keep track of each ACCOUNT's TRANSACTIONS (deposits, withdrawals, checks, …) and each LOAN's PAYMENTS; both of these include the amount, date, and time. Modify the BANK schema, using ER and EER concepts of specialization and generalization. State any assumptions you make about the additional requirements. An ER diagram for an AIRLINE database schema Step-by-step solution Step 1 of 2 Following are the assumptions: • There are only three types of accounts: SAVING, CURRENT, and CHECKING accounts. • There are only three types of loans: CAR loans, HOME loans, and PERSONAL loans. • Each user can do any number of transactions on an account.
• A loan can be repaid in any number of payments. • Each transaction and each payment has a unique ID. Comment Step 2 of 2 The modified enhanced entity relationship diagram is as follows: Comment Chapter 4, Problem 18E Problem The following narrative describes a simplified version of the organization of Olympic facilities planned for the summer Olympics. Draw an EER diagram that shows the entity types, attributes, relationships, and specializations for this application. State any assumptions you make. The Olympic facilities are divided into sports complexes. Sports complexes are divided into one-sport and multisport types. Multisport complexes have areas of the complex designated for each sport with a location indicator (e.g., center, NE corner, and so on). A complex has a location, chief organizing individual, total occupied area, and so on. Each complex holds a series of events (e.g., the track stadium may hold many different races). For each event there is a planned date, duration, number of participants, number of officials, and so on. A roster of all officials will be maintained together with the list of events each official will be involved in. Different equipment is needed for the events (e.g., goal posts, poles, parallel bars) as well as for maintenance. The two types of facilities (one-sport and multisport) will have different types of information. For each type, the number of facilities needed is kept, together with an approximate budget. Step-by-step solution Step 1 of 3 In the EER diagram, • A rectangle denotes an entity type. • A diamond-shaped symbol represents a relationship. • An oval connected to an entity type or relationship represents an attribute. Comment Step 2 of 3 The following is the EER diagram for the organization of Olympic facilities planned for the summer Olympics. Comment Step 3 of 3 Explanation: • The Olympic facilities are divided into sports complexes. Sports complexes are divided into one-sport and multisport types.
• There exists a HOLDS relationship between the Complex and Event entities. A complex holds a number of events. • Each event is assigned to an official. • Both complexes and events have equipment. A complex has maintenance equipment and an event has event equipment. Comment Chapter 4, Problem 19E Problem Identify all the important concepts represented in the library database case study described below. In particular, identify the abstractions of classification (entity types and relationship types), aggregation, identification, and specialization/generalization. Specify (min, max) cardinality constraints whenever possible. List details that will affect the eventual design but that have no bearing on the conceptual design. List the semantic constraints separately. Draw an EER diagram of the library database. Case Study: The Georgia Tech Library (GTL) has approximately 16,000 members, 100,000 titles, and 250,000 volumes (an average of 2.5 copies per book). About 10% of the volumes are out on loan at any one time. The librarians ensure that the books that members want to borrow are available when the members want to borrow them. Also, the librarians must know how many copies of each book are in the library or out on loan at any given time. A catalog of books is available online that lists books by author, title, and subject area. For each title in the library, a book description is kept in the catalog; the description ranges from one sentence to several pages. The reference librarians want to be able to access this description when members request information about a book. Library staff includes chief librarian, departmental associate librarians, reference librarians, check-out staff, and library assistants. Books can be checked out for 21 days. Members are allowed to have only five books out at a time. Members usually return books within three to four weeks.
Most members know that they have one week of grace before a notice is sent to them, so they try to return books before the grace period ends. About 5% of the members have to be sent reminders to return books. Most overdue books are returned within a month of the due date. Approximately 5% of the overdue books are either kept or never returned. The most active members of the library are defined as those who borrow books at least ten times during the year. The top 1% of membership does 15% of the borrowing, and the top 10% of the membership does 40% of the borrowing. About 20% of the members are totally inactive in that they are members who never borrow. To become a member of the library, applicants fill out a form including their SSN, campus and home mailing addresses, and phone numbers. The librarians issue a numbered, machine-readable card with the member's photo on it. This card is good for four years. A month before a card expires, a notice is sent to a member for renewal. Professors at the institute are considered automatic members. When a new faculty member joins the institute, his or her information is pulled from the employee records and a library card is mailed to his or her campus address. Professors are allowed to check out books for three-month intervals and have a two-week grace period. Renewal notices to professors are sent to their campus address. The library does not lend some books, such as reference books, rare books, and maps. The librarians must differentiate between books that can be lent and those that cannot be lent. In addition, the librarians have a list of some books they are interested in acquiring but cannot obtain, such as rare or out-of-print books and books that were lost or destroyed but have not been replaced. The librarians must have a system that keeps track of books that cannot be lent as well as books that they are interested in acquiring.
Some books may have the same title; therefore, the title cannot be used as a means of identification. Every book is identified by its International Standard Book Number (ISBN), a unique international code assigned to all books. Two books with the same title can have different ISBNs if they are in different languages or have different bindings (hardcover or softcover). Editions of the same book have different ISBNs. The proposed database system must be designed to keep track of the members, the books, the catalog, and the borrowing activity. Step-by-step solution Step 1 of 2 Entity Types: 1. LIBRARY_MEMBER 2. BOOK 3. STAFF_MEMBER Relationship types: 1. ISSUE_CARD 2. ISSUE_NOTICE 3. ISSUE_BOOK 4. GET_DESCRIPTION Aggregation: 1. All entity types are aggregations of their constituent attributes, as can be seen from the EER diagram. 2. Relationship types that have attributes of their own (see figure) are also aggregations. Identification: 1. All entity types and relationship types are identified by their names. 2. Each entity of an entity type is identified by: a. LIBRARY_MEMBER: Ssn b. BOOK: Key(Title, Bind, Language, ISBN) c. STAFF_MEMBER: Ssn Specialization/generalization: 1. Specialization of STAFF_MEMBER on the basis of Designation. This is a partial disjoint specialization. 2. Specialization of BOOK on the basis of In_Library. This is a total disjoint specialization. 3. Specialization of IN_LIBRARY_BOOK on the basis of Can_be_rented. This is a total disjoint specialization. Other constraints that may arise in the future: 1. Fine that will be charged for a lost card. 2. Expiry period of a lost card. Comment Step 2 of 2 3. Privileges that may be entitled to a particular group of users. 4. Book description might change with new issues. 5. Fine that will be charged for a damaged book. Comment Chapter 4, Problem 20E Problem Design a database to keep track of information for an art museum. Assume that the following requirements were collected: ■ The museum has a collection of ART_OBJECTS.
Each ART_OBJECT has a unique Id_no, an Artist (if known), a Year (when it was created, if known), a Title, and a Description. The art objects are categorized in several ways, as discussed below. ■ ART_OBJECTS are categorized based on their type. There are three main types (PAINTING, SCULPTURE, and STATUE) plus another type called OTHER to accommodate objects that do not fall into one of the three main types. ■ A PAINTING has a Paint_type (oil, watercolor, etc.), material on which it is Drawn_on (paper, canvas, wood, etc.), and Style (modern, abstract, etc.). ■ A SCULPTURE or a statue has a Material from which it was created (wood, stone, etc.), Height, Weight, and Style. ■ An art object in the OTHER category has a Type (print, photo, etc.) and Style. ■ ART_OBJECTs are categorized as either PERMANENT_COLLECTION (objects that are owned by the museum) or BORROWED. Information captured about objects in the PERMANENT_COLLECTION includes Date_acquired, Status (on display, on loan, or stored), and Cost. Information captured about BORROWED objects includes the Collection from which it was borrowed, Date_borrowed, and Date_returned. ■ Information describing the country or culture of Origin (Italian, Egyptian, American, Indian, and so forth) and Epoch (Renaissance, Modern, Ancient, and so forth) is captured for each ART_OBJECT. ■ The museum keeps track of ARTIST information, if known: Name, Date_born (if known), Date_died (if not living), Country_of_origin, Epoch, Main_style, and Description. The Name is assumed to be unique. ■ Different EXHIBITIONS occur, each having a Name, Start_date, and End_date. EXHIBITIONS are related to all the art objects that were on display during the exhibition. ■ Information is kept on other COLLECTIONS with which the museum interacts; this information includes Name (unique), Type (museum, personal, etc.), Description, Address, Phone, and current Contact_person. Draw an EER schema diagram for this application.
Discuss any assumptions you make, and then justify your EER design choices. Step-by-step solution Step 1 of 2 Consider the following museum database to create the ER diagram: The following are the assumptions: • An ARTIST can create any number of ART_OBJECTS. • ART_OBJECT will be displayed in the exhibition. • Many ART_OBJECTS can be displayed in many EXHIBITIONS. Comment Step 2 of 2 The EER schema diagram for the art museum database is as follows: Comments (1) Chapter 4, Problem 21E Problem Figure shows an example of an EER diagram for a small-private-airport database; the database is used to keep track of airplanes, their owners, airport employees, and pilots. From the requirements for this database, the following information was collected: Each AIRPLANE has a registration number [Reg#], is of a particular plane type [OF_TYPE], and is stored in a particular hangar [STORED_IN]. Each PLANE_TYPE has a model number [Model], a capacity [Capacity], and a weight [Weight]. Each HANGAR has a number [Number], a capacity [Capacity], and a location [Location]. The database also keeps track of the OWNERs of each plane [OWNS] and the EMPLOYEES who have maintained the plane [MAINTAIN]. Each relationship instance in OWNS relates an AIRPLANE to an OWNER and includes the purchase date [Pdate]. Each relationship instance in MAINTAIN relates an EMPLOYEE to a service record [SERVICE]. Each plane undergoes service many times; hence, it is related by [PLANE_SERVICE] to a number of SERVICE records. A SERVICE record includes as attributes the date of maintenance [Date], the number of hours spent on the work [Hours], and the type of work done [Work_code]. We use a weak entity type [SERVICE] to represent airplane service, because the airplane registration number is used to identify a service record. An OWNER is either a person or a corporation. Hence, we use a union type (category) [OWNER] that is a subset of the union of corporation [CORPORATION] and person [PERSON] entity types. 
Both pilots [PILOT] and employees [EMPLOYEE] are subclasses of PERSON. Each PILOT has specific attributes license number [Lic_num] and restrictions [Restr]; each EMPLOYEE has specific attributes salary [Salary] and shift worked [Shift]. All PERSON entities in the database have data kept on their Social Security number [Ssn], name [Name], address [Address], and telephone number [Phone]. For CORPORATION entities, the data kept includes name [Name], address [Address], and telephone number [Phone]. The database also keeps track of the types of planes each pilot is authorized to fly [FLIES] and the types of planes each employee can do maintenance work on [WORKS_ON]. Show how the SMALL_AIRPORT EER schema in Figure 4.12 may be represented in UML notation. (Note: We have not discussed how to represent categories (union types) in UML, so you do not have to map the categories in this and the following question.) EER schema for a SMALL_AIRPORT database. Step-by-step solution Step 1 of 2 Consider the EER schema for a SMALL_AIRPORT database. The following is the UML diagram that represents the SMALL_AIRPORT database. Comment Step 2 of 2 All entities and relationships are shown in the UML diagram. In the provided EER diagram, there is a union type (category) specified for OWNER. The OWNER is a subset of the union of CORPORATION and PERSON. As the problem allows, the categories are not mapped in the UML diagram. Comments (2) Chapter 4, Problem 22E Problem Show how the UNIVERSITY EER schema in Figure 4.9 may be represented in UML notation. Step-by-step solution Step 1 of 2 • An entity relationship diagram represents the relationships between different entities and their attributes. The entities can be people, objects, etc. • UML, the Unified Modeling Language, is a language used to model systems in software engineering. It is very helpful for understanding the design of a system.
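As a rough illustration of how an EER superclass/subclass construct carries over to UML-style classes, the PERSON, PILOT, and EMPLOYEE entity types from the SMALL_AIRPORT description can be sketched in Python. This is only a sketch under assumptions: the lowercase field names and the helper `attribute_names` are hypothetical choices made here, and the sample values are invented for demonstration.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:
    # Attributes kept for every PERSON entity (superclass).
    ssn: str
    name: str
    address: str
    phone: str


@dataclass
class Pilot(Person):
    # Specific attributes of the PILOT subclass.
    lic_num: str
    restr: str


@dataclass
class Employee(Person):
    # Specific attributes of the EMPLOYEE subclass.
    salary: float
    shift: str


def attribute_names(cls):
    """Return inherited attributes first, then the subclass-specific ones."""
    return [f.name for f in fields(cls)]


# Invented sample data, for illustration only.
pilot = Pilot("123-45-6789", "Ann Lee", "1 Hangar Rd", "555-0100", "L-42", "none")
print(isinstance(pilot, Person))
print(attribute_names(Pilot))
```

Each subclass inherits the PERSON attributes, which mirrors the generalization arrow a UML class diagram would draw from PILOT and EMPLOYEE to PERSON.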
Comment Step 2 of 2 For the given ER diagram, the UML diagram is shown below: Comment Chapter 4, Problem 23E Problem Consider the entity sets and attributes shown in the following table. Place a checkmark in one column in each row to indicate the relationship between the far left and far right columns. a. The left side has a relationship with the right side. b. The right side is an attribute of the left side. c. The left side is a specialization of the right side. d. The left side is a generalization of the right side.
Table columns: Entity Set; (a) Has a Relationship with; (b) Has an Attribute that is; (c) Is a Specialization of; (d) Is a Generalization of; Entity Attrib
1. MOTHER, PERS
2. DAUGHTER, MOT
3. STUDENT, PERS
4. STUDENT, Stude
5. SCHOOL, STUD
6. SCHOOL, CLAS
7. ANIMAL, HOR
8. HORSE, Breed
9. HORSE, Age
10. EMPLOYEE, SSN
11. FURNITURE, CHAI
12. CHAIR, Weig
13. HUMAN, WOM
14. SOLDIER, PERS
15. ENEMY_COMBATANT, PERS
Step-by-step solution Step 1 of 2 Relationship between Entity Sets and Attributes Specialization: Specialization is the process of classifying a class of objects into more specialized subclasses. For example, the objects of a PERSON class can be classified into more specialized subclasses such as MOTHER, STUDENT, SOLDIER, and so on. Generalization: Generalization is a relationship in which the child class is based on the parent class. Both child and parent class elements in a generalization relationship must be of the same type. Aggregation: It specifies a whole/part relationship between the aggregate (whole) and a component part. When a class is formed as a collection of other classes, it is called an aggregation relationship between these classes. It is also called a “has a” relationship. Inheritance: The properties of a child class are derived from the properties of its parent class. It is also called an “is a” relationship. Comment Step 2 of 2 Consider the entity sets and attributes, and apply one of the relationships to each row.
Entity Sets and Attributes Relationship Table Comment Chapter 4, Problem 24E Problem Draw a UML diagram for storing a played game of chess in a database. You may look at http://www.chessgames.com for an application similar to what you are designing. State clearly any assumptions you make in your UML diagram. A sample of assumptions you can make about the scope is as follows: 1. The game of chess is played between two players. 2. The game is played on an 8 x 8 board like the one shown below: 3. The players are assigned a color of black or white at the start of the game. 4. Each player starts with the following pieces (traditionally called chessmen): a. king b. queen c. 2 rooks d. 2 bishops e. 2 knights f. 8 pawns 5. Every piece has its own initial position. 6. Every piece has its own set of legal moves based on the state of the game. You do not need to worry about which moves are or are not legal except for the following issues: a. A piece may move to an empty square or capture an opposing piece. b. If a piece is captured, it is removed from the board. c. If a pawn moves to the last row, it is “promoted” by converting it to another piece (queen, rook, bishop, or knight). Note: Some of these functions may be spread over multiple classes. Step-by-step solution Step 1 of 1 Assumptions: 1. At most two pieces can be affected by any move. 2. A player can promote a piece. 3. A piece can be promoted. 4. After a move, a captured piece is removed from the board. Comment Chapter 4, Problem 25E Problem Draw an EER diagram for a game of chess as described in Exercise. Focus on persistent storage aspects of the system. For example, the system would need to retrieve all the moves of every game played in sequential order. Exercise Draw a UML diagram for storing a played game of chess in a database. You may look at http://www.chessgames.com for an application similar to what you are designing. State clearly any assumptions you make in your UML diagram.
A sample of assumptions you can make about the scope is as follows: 1. The game of chess is played between two players. 2. The game is played on an 8 x 8 board like the one shown below: 3. The players are assigned a color of black or white at the start of the game. 4. Each player starts with the following pieces (traditionally called chessmen): a. king b. queen c. 2 rooks d. 2 bishops e. 2 knights f. 8 pawns 5. Every piece has its own initial position. 6. Every piece has its own set of legal moves based on the state of the game. You do not need to worry about which moves are or are not legal except for the following issues: a. A piece may move to an empty square or capture an opposing piece. b. If a piece is captured, it is removed from the board. c. If a pawn moves to the last row, it is “promoted” by converting it to another piece (queen, rook, bishop, or knight). Note: Some of these functions may be spread over multiple classes. Step-by-step solution Step 1 of 1 EER diagram for chess game The Enhanced Entity Relationship (EER) model adds the concepts of superclass and subclass entity types to the ER model. Here the entity types are PLAYER, MOVES, and PIECES, and their attributes are Name, Color, Cur_position, Initial_Position, Piece_name, Position_before_move, and Changed_position. Sequence order for game play: Step 1: A PLAYER makes the first move. Step 2: The PIECES are moved, which changes the position. Step 3: The other PLAYER takes a turn and gets ready for the next move. Step 4: The PIECES change position in response to the PLAYER's move. Step 5: This process continues until the end of the game. Comment Chapter 4, Problem 26E Problem Which of the following EER diagrams is/are incorrect and why? State clearly any assumptions you make. a. b. c. Step-by-step solution Step 1 of 3 a. The given EER diagram is correct. • E is a super class and E1 and E2 are sub classes of entity E. • E1 and E2 are overlapping entities of entity E. It indicates that E may be a member of E1 or E2 or both.
• There exists a one to many relationship R between E2 and E3. Comment Step 2 of 3 b. The given EER diagram is correct. • E is a super class and E1 and E2 are sub classes of entity E. • E1 and E2 are disjoint entities of entity E. It indicates that E may be a member of E1 or E2. • There exists a one to one relationship R between E1 and E2. Comment Step 3 of 3 c. The given EER diagram is incorrect. • E1 and E3 are overlapping entities of entity say E. It indicates that E may be a member of E1 or E3 or both. • The overlapping entities E1 and E3 cannot share a relationship R. So there cannot be a many to many relationship between E1 and E3. Hence, given EER is not possible. Comments (1) Chapter 4, Problem 27E Problem Consider the following EER diagram that describes the computer systems at a company. Provide your own attributes and key for each entity type. Supply max cardinality constraints justifying your choice. Write a complete narrative description of what this EER diagram represents. 
Step-by-step solution Step 1 of 5 Entity types with their attributes and keys (columns: S.No, Entity Type, Attributes, Key):
1. COMPUTER: RAM, ROM, Processor, S_no, Manufacturer, Cost; key S_no
2. ACCESSORY: S_no, cost, type; key S_no
3. LAPTOP: Weight, Screen_size; key NA
4. DESKTOP: Color; key NA
5. SOFTWARE: Lic_no, Cost, Manufacturer, Is_system_software, Year_of_manufacturing, Version, Author; key Lic_no
6. OPERATING_SYSTEM: Name, size; key NA
7. COMPONENT: Manufacturer, S_no, Cost, Type; key S_no
8. KEYBOARD: Type; key NA
9. MEMORY: Size; key NA
10. MONITOR: Size, Resolution, Type; key NA
11. MOUSE: Type, Is_wired; key NA
12. SOUND_CARD: Type; key NA
13. VIDEO_CARD: Type; key NA
Comment Step 2 of 5 Relationships with their (min,max) constraints (columns: S.No, Relationship name, Entity type 1 name and (min,max) constraint, Entity type 2 name and (min,max) constraint):
1. SOLD_WITH: COMPUTER (1,1), ACCESSORY (1,N)
2. INSTALLED: COMPUTER (1,1), SOFTWARE (1,M)
3. INSTALLED_OS: COMPUTER (1,1), OPERATING_SYSTEM (1,N)
4. MEM_OPTIONS: LAPTOP (1,1), MEMORY (1,N)
5. OPTIONS: DESKTOP (1,1), COMPONENT (1,N)
6. SUPPORTS: SOFTWARE (1,N), COMPONENT (1,M)
Comment Step 3 of 5 Reason: Because every component and accessory is identified by its S_no and every piece of software by its Lic_no, each one can belong to only a single LAPTOP, DESKTOP, or COMPUTER. Conversely, a computer can have any number of ACCESSORY, SOFTWARE, OPERATING_SYSTEM, COMPONENT, or MEMORY entities. A SOFTWARE may need many supporting COMPONENTs, and a COMPONENT can support many SOFTWAREs. Comment Step 4 of 5 Narrative description: A database is needed to maintain all computer systems in a company. Each COMPUTER in the company has a unique S_no. It has a fixed RAM, ROM, Processor, Manufacturer, and Cost. A COMPUTER can be a LAPTOP or a DESKTOP. Each LAPTOP has a Screen_size and a Weight. Each DESKTOP has a Color. A COMPUTER has many SOFTWARE INSTALLED. Each SOFTWARE has a unique Lic_no. It also has an associated Cost, Manufacturer, Is_system_software, Year_of_manufacturing, Version, and Author. An OPERATING_SYSTEM is also software that is related to a COMPUTER; it has a name and the size of memory it consumes. Comment Step 5 of 5 With a COMPUTER one can get an ACCESSORY.
Each ACCESSORY has cost, S_no, and type (audio/video/input/output). ACCESSORY can be categorized into KEYBOARD (type), MOUSE (type, Is_wired), and MONITOR (size, resolution, type). Associated with DESKTOP and SOFTWARE we have various COMPONENTs (Manufacturer, S_no, Cost, Type). COMPONENTs are further divided into MEMORY (size), SOUND_CARD (type), and VIDEO_CARD (type). A LAPTOP can also have MEMORY_OPTIONS. Comment Chapter 4, Problem 29LE Consider an ONLINE AUCTION database system in which members (buyers and sellers) participate in the sale of items. The data requirements for this system are summarized as follows: The online site has members, each of whom is identified by a unique member number and is described by an e-mail address, name, password, home address, and phone number. A member may be a buyer or a seller. A buyer has a shipping address recorded in the database. A seller has a bank account number and routing number recorded in the database. Items are placed by a seller for sale and are identified by a unique item number assigned by the system. Items are also described by an item title, a description, starting bid price, bidding increment, the start date of the auction, and the end date of the auction. Items are also categorized based on a fixed classification hierarchy (for example, a modem may be classified as COMPUTER → HARDWARE → MODEM). Buyers make bids for items they are interested in. Bid price and time of bid is recorded. The bidder at the end of the auction with the highest bid price is declared the winner and a transaction between buyer and seller may then proceed. The buyer and seller may record feedback regarding their completed transaction. Feedback contains a rating of the other party participating in the transaction (1-10) and a comment. EER diagram for Online Auction Database Chapter 4, Problem 30LE Consider a database system for a baseball organization such as the major leagues.
The data requirements are summarized as follows: The personnel involved in the league include players, coaches, managers, and umpires. Each is identified by a unique personnel id. They are also described by their first and last names along with the date and place of birth. Players are further described by other attributes such as their batting orientation (left, right, or switch) and have a lifetime batting average (BA). Within the players group is a subset of players called pitchers. Pitchers have a lifetime ERA (earned run average) associated with them. Teams are uniquely identified by their names. Teams are also described by the city in which they are located and the division and league in which they play (such as Central division of the American League). Teams have one manager, a number of coaches, and a number of players. Games are played between two teams, with one designated as the home team and the other the visiting team on a particular date. The score (runs, hits, and errors) is recorded for each team. The team with the most runs is declared the winner of the game. With each finished game, a winning pitcher and a losing pitcher are recorded. In case there is a save awarded, the save pitcher is also recorded. With each finished game, the number of hits (singles, doubles, triples, and home runs) obtained by each player is also recorded. Design an enhanced entity-relationship diagram for the BASEBALL database. Using that EER diagram, model the database in Microsoft Access. Populate each table with appropriate data. our populated Access database with all relationships added (with referential integrity, of course) If you are a completist, you can find data at ESPN or MLB sites.
The EER model of the Baseball database is as follows: Below are the database tables designed in MS Access for Teams, Managers, Umpires, Players, and Pitchers: For Managers: For Players: For Pitchers: For Umpires: These are all the related tables used to manage the baseball games, together with a master database as follows: Master DB Part 1: Master DB Part 2: Chapter 4, Problem 31LE Problem Consider the EER diagram for the UNIVERSITY database shown in Figure 4.9. Enter this design using a data modeling tool such as ERwin or Rational Rose. Make a list of the differences in notation between the diagram in the text and the corresponding equivalent diagrammatic notation you end up using with the tool. Step-by-step solution Step 1 of 1 Refer to Figure 4.9 for the EER diagram of the UNIVERSITY database. Use the Rational Rose tool to create the ER schema for the database as follows: • In the options available on the left, right-click on the option Logical view, go to New, and select the option Class Diagram. • Name the class diagram UNIVERSITY. Select the option Class available in the toolbar and then click on empty space in the Class Diagram file. Name the class FACULTY. Right-click on the class, select the option New Attribute, and name the attribute Rank. Similarly, create the other attributes Foffice, Fphone, and Salary. • Similarly, create another class GRANT and its attributes Title, No, Agency, and St_date. • Now right-click on the attribute No, available on the left under the class GRANT, and select the option Open Specification. Select the Protected option under Export Control. This makes the attribute No the primary key. • Select the option Unidirectional Association from the toolbar to create a relationship between the two classes. Click on the class FACULTY and, while holding the click, drag the mouse towards the class GRANT and release the click. This creates the relationship between the two selected classes. Name the association PI.
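The class-creation steps above describe clicks in a modeling tool, but the resulting structure can also be sketched in code. The following is a minimal Python sketch, not a representation of what Rational Rose generates: the `metadata={"key": True}` flag is an assumption made here to stand in for marking No as the primary key, and the `pi` field and the `key_attributes` helper are hypothetical names used for illustration.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class Faculty:
    # Attributes of the FACULTY class as listed in the steps.
    rank: str
    foffice: str
    fphone: str
    salary: float


@dataclass
class Grant:
    # The "key" metadata flag is an assumed convention, not a tool feature.
    no: str = field(metadata={"key": True})
    title: str = ""
    agency: str = ""
    st_date: str = ""
    # The PI association: a grant refers to at most one faculty member.
    pi: Optional[Faculty] = None


def key_attributes(cls):
    """List the attributes flagged as key attributes in a dataclass."""
    return [f.name for f in fields(cls) if f.metadata.get("key")]


print(key_attributes(Grant))
```

Storing the reference on the GRANT side reflects the direction of the 1:N ratio: each grant points to one faculty member, while one faculty member may be referenced by many grants.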
Since the structural constraint in the EER diagram is specified using a cardinality ratio, specify the structural constraints in the Rational Rose tool as follows: • Right-click on the association close to the class FACULTY and select 1 from the option Multiplicity. • Again, right-click on the association close to the class GRANT and select n from the option Multiplicity. • Similarly, create the other classes and their associated attributes. Specify the relationships and structural constraints between the classes, as mentioned above. The ER schema may thus be specified in an alternative diagrammatic notation, the class diagram, using the Rational Rose tool as follows: The differences in notation between the EER diagram in Figure 4.9 and its equivalent diagrammatic notation, drawn with the Rational Rose tool, are as follows: • In the EER diagram, entities are specified in a rectangle. The class diagram in Rational Rose uses the top section of a class for specifying the entity. • Attributes are specified in the EER diagram using ovals. The class diagram in Rational Rose uses the middle section of a class for specifying the attributes. • Primary keys in the EER diagram are specified by underlining the attribute in an oval. An attribute can be made a primary key in the class diagram in Rational Rose by selecting the option Open Specification and then selecting the Protected option under Export Control. A yellow key icon against the attribute in the class diagram indicates a primary key. • The relationship between two entities is specified in a diamond-shaped box. For example, in Figure 4.9, PI is the relationship between FACULTY and GRANT. The class diagram in Rational Rose uses the option Unidirectional Association for specifying the relationship or association between two entities.
For example, in the class diagram, the association named PI is written on the line joining the two classes.
• In the EER diagram, the structural constraint is specified using a cardinality ratio. For example, in the PI relationship, FACULTY:GRANT has cardinality ratio 1:N. In the class diagram made with Rational Rose, the Multiplicity option is used to specify the cardinality ratio.

Chapter 4, Problem 32LE
Problem
Consider the EER diagram for the small AIRPORT database shown in the figure (EER schema for a SMALL_AIRPORT database). Build this design using a data modeling tool such as ERwin or Rational Rose. Be careful how you model the category OWNER in this diagram. (Hint: Consider using CORPORATION_IS_OWNER and PERSON_IS_OWNER as two distinct relationship types.)
Step-by-step solution
Step 1 of 2
Refer to Figure 4.12 for the EER schema of the SMALL_AIRPORT database. Use the Rational Rose tool to create the EER schema for the database as follows:
• In the options available on the left, right-click Logical View, go to New, and select Class Diagram.
• Name the class diagram SMALL_AIRPORT. Select the Class option in the toolbar and click on empty space in the class diagram. Name the class PLANE_TYPE. Right-click the class, select New Attribute, and name the attribute Model. Similarly, create the other attributes Capacity and Weight.
• Right-click the attribute Model, listed on the left under the class PLANE_TYPE, and select Open Specification. Select the Protected option under Export Control. This marks Model as the primary key.
• Similarly, create another class EMPLOYEE with the attributes Salary and Shift.
• Select Unidirectional Association from the toolbar to create a relationship between the two classes. Click on the class PLANE_TYPE, drag the mouse to the class EMPLOYEE while holding the button, and release.
This creates the relationship between the two selected classes. Name the association WORKS_ON.
Because the structural constraint in the EER diagram is specified using a cardinality ratio, specify the structural constraints in Rational Rose as follows:
• Right-click the association end close to the class PLANE_TYPE and select n from the Multiplicity option.
• Right-click the association end close to the class EMPLOYEE and select n from the Multiplicity option.
• Similarly, create the other classes and their attributes, and specify the relationships and structural constraints between them as described above.
The resulting ER schema, expressed in the alternative class diagram notation of Rational Rose, was shown as a figure. [Figure not reproduced.]
Step 2 of 2
In the class diagram, OWNER is the superclass, and PERSON and CORPORATION are the subclasses. The subclasses can participate in further specific relationship types. For example, the subclass PERSON is related to an entity type PERSON_IS_OWNER via the OWNER_TYPE relationship. Similarly, the subclass CORPORATION is related to CORPORATION_IS_OWNER via the OWNER_TYPE relationship.
These relationship types can be specified in Rational Rose as follows:
• Create the subclass PERSON_IS_OWNER of the class PERSON as explained above. Then create the association between the class PERSON and its subclass PERSON_IS_OWNER and name it OWNER_TYPE.
• Similarly, create the subclass CORPORATION_IS_OWNER of the class CORPORATION and name the association between them OWNER_TYPE.

Chapter 4, Problem 33LE
Problem
Consider the UNIVERSITY database described in Exercise 3.16. You already developed an ER schema for this database using a data modeling tool such as ERwin or Rational Rose in Lab Exercise 3.31.
Modify this diagram by classifying COURSES as either UNDERGRAD_COURSES or GRAD_COURSES and INSTRUCTORS as either JUNIOR_PROFESSORS or SENIOR_PROFESSORS. Include appropriate attributes for these new entity types. Then establish relationships indicating that junior instructors teach undergraduate courses whereas senior instructors teach graduate courses.
Reference Exercise 3.31
Consider the EER diagram for the UNIVERSITY database shown in Figure 4.9. Enter this design using a data modeling tool such as ERwin or Rational Rose. Make a list of the differences in notation between the diagram in the text and the corresponding equivalent diagrammatic notation you end up using with the tool.
Reference Problem 3.16
Which combinations of attributes have to be unique for each individual SECTION entity in the UNIVERSITY database shown in Figure 3.20 to enforce each of the following miniworld constraints:
a. During a particular semester and year, only one section can use a particular classroom at a particular DaysTime value.
b. During a particular semester and year, an instructor can teach only one section at a particular DaysTime value.
c. During a particular semester and year, the section numbers for sections offered for the same course must all be different.
Can you think of any other similar constraints?
Step-by-step solution
Step 1 of 1
Refer to Exercise 3.16 for the UNIVERSITY database and to the ER schema developed for it with the Rational Rose tool. Using Rational Rose, make the required changes to the ER schema as follows:
• COURSE is the superclass, and UNDERGRAD_COURSES and GRAD_COURSES are its subclasses. The subclasses are added to the class diagram developed in Lab Exercise 3.31 as follows:
• Consider the class COURSE developed in Exercise 3.31. Select the Class option in the toolbar and click on empty space in the class diagram. Name the subclass UNDERGRAD_COURSES.
Right-click the class, select New Attribute, and name the attribute Title. Similarly, create the other attribute Department. In the same way, create another subclass GRAD_COURSES of the class COURSE with the attributes Title and Department.
• Similarly, create the subclasses JUNIOR_PROFESSORS and SENIOR_PROFESSORS of the superclass INSTRUCTOR. Also create the attributes Specialization, Designation, and Qualification for these subclasses, as described above.
• The subclass JUNIOR_PROFESSORS is related to the subclass UNDERGRAD_COURSES via the TEACHES relationship. Likewise, the subclass SENIOR_PROFESSORS is related to the subclass GRAD_COURSES via the TEACHES relationship.
The relationship types between these subclasses can be specified in Rational Rose as follows:
• Select Unidirectional Association from the toolbar to create a relationship between the two classes. Click on the class JUNIOR_PROFESSORS, drag the mouse to the class UNDERGRAD_COURSES while holding the button, and release. This creates the relationship between the two selected classes. Name the association TEACHES.
• Similarly, create the relationship between the classes SENIOR_PROFESSORS and GRAD_COURSES.
The modified ER schema, expressed in the alternative class diagram notation of Rational Rose, was shown as a figure. [Figure not reproduced.]

Chapter 5, Problem 1RQ
Problem
Define the following terms as they apply to the relational model of data: domain, attribute, n-tuple, relation schema, relation state, degree of a relation, relational database schema, and relational database state.
Step-by-step solution
Step 1 of 7
1. Domain: A domain is a set of atomic (indivisible) values that can appear in a particular column of a relation. A common method of specifying a domain is to specify a data type (integer, character, floating point, etc.)
from which the data values forming the domain are drawn.
For example: Consider a relation schema called STUDENT that holds facts about the students in a particular course. One such fact is the name of the student. The name of a student must be a character string, so the domain of Name is the set of character strings.
Step 2 of 7
2. Attribute: An attribute is the name of a role played by some domain in the relation schema.
For example: In the relation schema STUDENT, NAME can be one of the attributes of the relation.
NOTATION:
• Relation schema: R(A1, A2, …, An)
• Attributes: A1, A2, …, An
• Domain of an attribute, say A1: dom(A1)
• Tuple: t
Step 3 of 7
3. N-tuple: If a relation schema consists of n attributes, that is, the degree of the relation schema is n, then an n-tuple is an ordered list of n values t = <v1, v2, …, vn>, where each value vi, 1 ≤ i ≤ n, is an element of dom(Ai) or is a special NULL value.
For example: If the relation schema STUDENT has the four attributes Name, Roll No., Class, and Rank, then an n-tuple for a student holds one value per attribute, for instance a tuple recording that student Ram has roll number 1, studies in a given class, and got rank 5 in that class.
4. Relation Schema: A relation schema is a named collection of attributes that describe facts about a real-world entity. In other words, a relation schema R, denoted by R(A1, A2, …, An), is made up of a relation name R and a list of attributes A1, A2, …, An.
For example: STUDENT can be the name of a relation schema, and Name, Roll No., Class, and Rank can be its four attributes.
Step 4 of 7
5. Relation State: A relation state r of a relation schema R(A1, A2, …, An) is a set of n-tuples. In other words, a relation state is a collection of tuples, where each tuple represents information about a single entity.
For example: In the relation schema STUDENT, the collection of tuples holding the data for two students is a relation state.
Formal Definition: A relation state r(R) is a mathematical relation of degree n on the domains of all attributes, which is a subset of the Cartesian product of the domains that define R:
r(R) ⊆ (dom(A1) × dom(A2) × … × dom(An))
Step 5 of 7
6. Degree of a Relation: The degree (or arity) of a relation is the number of attributes n of its relation schema.
Step 6 of 7
7. Relational Database Schema: A relational database schema S is a set of relation schemas, S = {R1, R2, …, Rm}, together with a set of integrity constraints IC.
Step 7 of 7
8. Relational Database State: A relational database state DB of S is a set of relation states, DB = {r1, r2, …, rm}, such that each ri is a state of Ri and the ri relation states satisfy the integrity constraints specified in IC.

Chapter 5, Problem 2RQ
Problem
Why are tuples in a relation not ordered?
Step-by-step solution
Step 1 of 2
A relation in database management is defined as a set of tuples. Mathematically, the elements of a set have no order among them.
Step 2 of 2
Hence, the tuples in a relation are not ordered.

Chapter 5, Problem 3RQ
Problem
Why are duplicate tuples not allowed in a relation?
Step-by-step solution
Step 1 of 1
Duplicate tuples are not allowed in a relation because a relation is a set of tuples, and because duplicates would violate the relational integrity constraints.
• A key constraint states that there must be an attribute or combination of attributes in a relation whose values are unique.
• Consequently, no two tuples in a relation can have the same values for all of their attributes.
• If a relation contained duplicate tuples, the key constraint would be violated.
Hence, duplicate tuples are not allowed in a relation.

Chapter 5, Problem 4RQ
Problem
What is the difference between a key and a superkey?
Step-by-step solution
Step 1 of 2
A superkey SK is a set of attributes that uniquely identifies the tuples of a relation. It satisfies the uniqueness constraint.
A key K is an attribute or set of attributes that uniquely identifies the tuples of a relation. It is a minimal superkey. In other words, when any attribute is removed from a key, the remaining attributes no longer form a superkey.
Step 2 of 2
The differences between a key and a superkey are as follows:
• Every key is a superkey, but not every superkey is a key.
• A superkey may contain redundant attributes. A key contains none, so removing any attribute from a key destroys the uniqueness property.

Chapter 5, Problem 5RQ
Problem
Why do we designate one of the candidate keys of a relation to be the primary key?
Step-by-step solution
Step 1 of 1
Every relation must contain an attribute or combination of attributes that can be used to uniquely identify each tuple in the relation.
• An attribute or combination of attributes that can be used to uniquely identify each tuple in a relation is known as a candidate key.
• A relation can have more than one candidate key.
• Among several candidate keys, one, usually a single simple attribute, is chosen as the primary key.
• The primary key is the attribute or set of attributes used to identify each tuple in the relation.

Chapter 5, Problem 6RQ
Problem
Discuss the characteristics of relations that make them different from ordinary tables and files.
Step-by-step solution
Step 1 of 2
Tables, relations, and files are closely related concepts. A relation resembles a table, but it has added constraints that allow the links between relations to be used efficiently. A file is basically a collection of records, or a table, stored on a physical device.
Step 2 of 2
Even though both a relation and a table are used to store and represent data, there are differences between them:
• A relation is a set of tuples, so its tuples have no inherent order. The rows of an ordinary table and the records of a file are stored in some order.
• A relation cannot contain duplicate tuples. An ordinary table or file can contain duplicate rows or records.
• Each value in a tuple is atomic or NULL. A table cell or a file field is not restricted in this way.

Chapter 5, Problem 7RQ
Problem
Discuss the various reasons that lead to the occurrence of NULL values in relations.
Step-by-step solution
Step 1 of 2
NULL value: A NULL marks the absence of data, that is, “nothing” represented as an empty value.
• NULL is a special marker stored in place of a data value.
• NULL is not the same as “zero”, “blank”, or “none”, each of which is an actual value.
For example, if a student does not have a pen or pencil for the exam, the values of those attributes for that particular student are recorded as NULL.
• A NULL can mean that the value does not exist, that the value is unknown, or that the value is not yet available.
Step 2 of 2
The occurrence of NULL values in relations:
• Not applicable: an attribute value is NULL when the attribute does not apply to that tuple.
• Unknown: an attribute value is NULL when a value exists but is not known or cannot be found.
• Not yet available: an attribute value is NULL when a value exists but is not available at present.
• The different meanings of NULL can be conveyed by different codes.
• Operations that involve NULL, such as comparisons, need special handling because a missing value cannot be treated like an ordinary value.

Chapter 5, Problem 8RQ
Problem
Discuss the entity integrity and referential integrity constraints. Why is each considered important?
Step-by-step solution
Step 1 of 2
Entity Integrity Constraint: It states that no primary key value can be NULL.
Importance: Primary key values are used to identify a tuple in a relation. A NULL value for the primary key would mean that some tuples cannot be identified.
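The entity integrity rule above can be tried out with a minimal sketch using Python's built-in sqlite3 module. The STUDENT table, its columns, and the helper `try_insert` are assumptions made up for this illustration. NOT NULL is written explicitly because SQLite does not always treat PRIMARY KEY as implying NOT NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical one-table schema used only for this illustration.
# NOT NULL is stated explicitly: SQLite does not always treat
# PRIMARY KEY as implying NOT NULL, although the SQL standard does.
conn.execute(
    "CREATE TABLE STUDENT (Roll_no TEXT NOT NULL PRIMARY KEY, Name TEXT)"
)


def try_insert(roll_no, name):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO STUDENT (Roll_no, Name) VALUES (?, ?)",
                (roll_no, name),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted_normal = try_insert("1", "Ram")         # primary key value present
accepted_null_key = try_insert(None, "Unknown")  # NULL primary key value
stored = conn.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0]
print(accepted_normal, accepted_null_key, stored)
```

In this sketch the row with a NULL key is rejected, so every stored tuple can still be identified by its primary key value.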
Referential Integrity Constraint: It states that a tuple in one relation that refers to another relation must refer to an existing tuple in that relation.
Step 2 of 2
Definition using Foreign Key: For two relation schemas R1 and R2, a set of attributes FK in relation schema R1 is a foreign key of R1 that references relation R2 if it satisfies the following conditions:
• The attributes in FK have the same domain(s) as the primary key attributes PK of R2; the attributes FK are said to reference relation R2.
• A value of FK in a tuple t1 of the current state r1(R1) either occurs as a value of PK for some tuple t2 in the current state r2(R2) or is NULL. In the former case (t1[FK] = t2[PK]), tuple t1 is said to refer to the tuple t2.
When these two conditions hold between R1, the referencing relation, and R2, the referenced relation, the referential integrity constraint is said to hold.
Importance: Referential integrity constraints are specified between two relations and are used to maintain consistency among the tuples of the two relations.

Chapter 5, Problem 9RQ
Problem
Define foreign key. What is this concept used for?
Step-by-step solution
Step 1 of 2
A foreign key is an attribute or set of attributes of one relation that refers to the primary key of another relation and is used to maintain the relationship between the two relations.
• A relation can have more than one foreign key.
• A foreign key can contain NULL values.
Step 2 of 2
The concept of a foreign key is used to maintain the referential integrity constraint between two relations, and hence consistency among the tuples of the two relations.
• A non-NULL value of a foreign key must match a value of the primary key in the referenced relation.
• A foreign key value that does not exist in the primary key of the referenced relation cannot be added.
• A tuple cannot simply be deleted from the referenced relation while a matching tuple exists in the referencing relation; the deletion must be rejected, or the referencing tuples must be deleted or updated as well.
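The foreign key rules listed above can be sketched with Python's built-in sqlite3 module. The two-table schema (a DEPARTMENT relation and an EMPLOYEE relation whose Dno refers to it), the sample values, and the helper `attempt` are assumptions chosen for illustration. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
# Hypothetical schema for illustration: EMPLOYEE.Dno references DEPARTMENT.Dnumber.
conn.execute("CREATE TABLE DEPARTMENT (Dnumber INTEGER PRIMARY KEY, Dname TEXT)")
conn.execute(
    "CREATE TABLE EMPLOYEE ("
    " Ssn TEXT NOT NULL PRIMARY KEY,"
    " Dno INTEGER REFERENCES DEPARTMENT(Dnumber))"
)
conn.execute("INSERT INTO DEPARTMENT VALUES (5, 'Research')")
conn.commit()


def attempt(sql, params=()):
    """Run one statement; report whether the DBMS accepted or rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


insert = "INSERT INTO EMPLOYEE VALUES (?, ?)"
results = {
    # Foreign key value matches an existing primary key value.
    "matching_fk": attempt(insert, ("123456789", 5)),
    # A foreign key may be NULL.
    "null_fk": attempt(insert, ("333445555", None)),
    # No DEPARTMENT tuple has Dnumber = 7.
    "dangling_fk": attempt(insert, ("999887777", 7)),
    # Department 5 is still referenced by an EMPLOYEE tuple.
    "delete_referenced": attempt("DELETE FROM DEPARTMENT WHERE Dnumber = ?", (5,)),
}
print(results)
```

With the default behavior used here, the delete is rejected outright. Declaring the foreign key with ON DELETE CASCADE or ON DELETE SET NULL would instead propagate the deletion to the referencing tuples.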

Chapter 5, Problem 10RQ
Problem
What is a transaction? How does it differ from an Update operation?
Step-by-step solution
Step 1 of 2
A transaction is a program in execution that involves various operations on the database. At the end of the transaction, the database must be left in a valid, consistent state. The operations that can be included in a transaction are as follows:
• Reading data from the database.
• Deleting a tuple from the database.
• Inserting new tuples into the database.
• Updating values of existing tuples in the database.
Step 2 of 2
The main difference between an update operation and a transaction is as follows:
• An update operation is a single operation that changes the values of one or more attributes in existing tuples.
• A transaction can combine several update operations with retrievals, insertions, and deletions, and it is treated as one logical unit.

Chapter 5, Problem 11E
Problem
Suppose that each of the following Update operations is applied directly to the database state shown in Figure 5.6. Discuss all integrity constraints violated by each operation, if any, and the different ways of enforcing these constraints.
a. Insert <‘Robert’, ‘F’, ‘Scott’, ‘943775543’, ‘1972-06-21’, ‘2365 Newcastle Rd, Bellaire, TX’, M, 58000, ‘888665555’, 1> into EMPLOYEE.
b. Insert <‘ProductA’, 4, ‘Bellaire’, 2> into PROJECT.
c. Insert <‘Production’, 4, ‘943775543’, ‘2007-10-01’> into DEPARTMENT.
d. Insert <‘677678989’, NULL, ‘40.0’> into WORKS_ON.
e. Insert <‘453453453’, ‘John’, ‘M’, ‘1990-12-12’, ‘spouse’> into DEPENDENT.
f. Delete the WORKS_ON tuples with Essn = ‘333445555’.
g. Delete the EMPLOYEE tuple with Ssn = ‘987654321’.
h. Delete the PROJECT tuple with Pname = ‘ProductX’.
i. Modify the Mgr_ssn and Mgr_start_date of the DEPARTMENT tuple with Dnumber = 5 to ‘123456789’ and ‘2007-10-01’, respectively.
j. Modify the Super_ssn attribute of the EMPLOYEE tuple with Ssn = ‘999887777’ to ‘943775543’.
k. Modify the Hours attribute of the WORKS_ON tuple with Essn = ‘999887777’ and Pno = 10 to ‘5.0’.
Step-by-step solution
Step 1 of 11
(a) Acceptable operation.
Step 2 of 11
(b) Not acceptable. Violates the referential integrity constraint, because the value 2 of the department number, which is a foreign key, is not present in the DEPARTMENT relation. Ways of enforcing:
• Reject the operation and explain the cause to the user.
• Insert NULL in the department field and then perform the operation.
• Prompt the user to insert a department with department number 2 into the DEPARTMENT relation and then perform the operation.
Step 3 of 11
(c) Not acceptable. Violates the key constraint, because a department with department number 4 already exists. It also violates the referential integrity constraint, because Mgr_ssn = ‘943775543’ does not match any Ssn in the EMPLOYEE relation. Ways of enforcing:
• Reject the operation and explain the cause to the user.
Step 4 of 11
(d) Not acceptable. Violates the entity integrity constraint and the referential integrity constraint. The value of one of the attributes of the primary key is NULL. In addition, the value of Essn is not present in the referenced relation, EMPLOYEE. Ways of enforcing:
• Reject the operation and explain the cause to the user.
• Prompt the user to specify correct values for the primary key and then perform the operation.
Step 5 of 11
(e) Acceptable.
Step 6 of 11
(f) Acceptable.
Step 7 of 11
(g) Not acceptable. Violates the referential integrity constraint, because Ssn is referenced by foreign keys in the WORKS_ON, EMPLOYEE, DEPENDENT, and DEPARTMENT relations, and deleting the tuple with Ssn = ‘987654321’ would leave tuples in those relations, for example in WORKS_ON, that refer to an employee who no longer exists. Ways of enforcing:
• Reject the operation and explain the cause to the user.
• Delete the corresponding tuples in the referencing relations as well.
Step 8 of 11
(h) Not acceptable. Violates the referential integrity constraint: Pnumber is referenced as a foreign key by the WORKS_ON relation, and the PROJECT tuple with Pname = ‘ProductX’ is the one with Pnumber = 1.
Because this Pnumber value is used in the WORKS_ON relation, deleting the PROJECT tuple would leave WORKS_ON tuples without a matching project. Ways of enforcing:
• Reject the operation and explain the cause to the user.
• Delete the corresponding tuples in the referencing relations as well.
Step 9 of 11
(i) Acceptable.
Step 10 of 11
(j) Not acceptable. Violates the referential integrity constraint, because Super_ssn is a foreign key that references the EMPLOYEE relation itself. Since no employee with Ssn = ‘943775543’ exists, the Super_ssn of an employee cannot be ‘943775543’. Ways of enforcing:
• Reject the operation and explain the cause to the user.
• Prompt the user either to add a tuple with Ssn = ‘943775543’ to the EMPLOYEE relation or to change Super_ssn to some valid value.
Step 11 of 11
(k) Acceptable.

Chapter 5, Problem 12E
Problem
Consider the AIRLINE relational database schema shown in the figure, which describes a database for airline flight information. Each FLIGHT is identified by a Flight_number, and consists of one or more FLIGHT_LEGs with Leg_numbers 1, 2, 3, and so on. Each FLIGHT_LEG has scheduled arrival and departure times, airports, and one or more LEG_INSTANCEs, one for each Date on which the flight travels. FAREs are kept for each FLIGHT. For each FLIGHT_LEG instance, SEAT_RESERVATIONs are kept, as are the AIRPLANE used on the leg and the actual arrival and departure times and airports. An AIRPLANE is identified by an Airplane_id and is of a particular AIRPLANE_TYPE. CAN_LAND relates AIRPLANE_TYPEs to the AIRPORTs at which they can land. An AIRPORT is identified by an Airport_code. Consider an update for the AIRLINE database to enter a reservation on a particular flight or flight leg on a given date.
a. Give the operations for this update.
b. What types of constraints would you expect to check?
c. Which of these constraints are key, entity integrity, and referential integrity constraints, and which are not?
d.
Specify all the referential integrity constraints that hold on the schema shown in the figure (the AIRLINE relational database schema).
Step-by-step solution
Step 1 of 4
a. First, check whether seats are available on the particular flight or flight leg on the given date. This can be done by querying the LEG_INSTANCE relation:
SELECT Number_of_available_seats FROM LEG_INSTANCE WHERE Flight_number = 'FL01' AND Date = '2000-06-07';
If Number_of_available_seats > 0, perform the following operation to reserve a seat:
INSERT INTO SEAT_RESERVATION VALUES ('FL01', '1', '2000-06-07', '1', 'John', '9910110110');
Step 2 of 4
b. The constraints that need to be checked in order to perform the update are as follows:
• Check that Number_of_available_seats in the LEG_INSTANCE relation for the particular flight on the particular date is greater than 0.
• Check that the particular Seat_number for the particular flight on the particular date is still available.
Step 3 of 4
c. Checking Number_of_available_seats in the LEG_INSTANCE relation is neither a key, an entity integrity, nor a referential integrity constraint; it is a general semantic constraint. Checking that the Seat_number for the particular flight leg on the particular date is not already reserved is a key constraint on SEAT_RESERVATION, and requiring that none of its primary key attributes is NULL is the entity integrity constraint.
Step 4 of 4
d. A referential integrity constraint specifies that the value of a foreign key must match a value of the primary key in the referenced relation. The referential integrity constraints that hold are as follows:
• Flight_number of FLIGHT_LEG is a foreign key that references the Flight_number of the FLIGHT relation.
• Flight_number of LEG_INSTANCE is a foreign key that references the Flight_number of the FLIGHT relation.
• Flight_number of FARE is a foreign key that references the Flight_number of the FLIGHT relation.
• Flight_number of SEAT_RESERVATION is a foreign key that references the Flight_number of the FLIGHT relation.
• Departure_airport_code and Arrival_airport_code of FLIGHT_LEG are foreign keys that reference the Airport_code of the AIRPORT relation.
• Departure_airport_code and Arrival_airport_code of LEG_INSTANCE are foreign keys that reference the Airport_code of the AIRPORT relation.
• Airport_code of CAN_LAND is a foreign key that references the Airport_code of the AIRPORT relation.
• Flight_number and Leg_number of LEG_INSTANCE together form a foreign key that references Flight_number and Leg_number of FLIGHT_LEG.
• Airplane_id of LEG_INSTANCE is a foreign key that references the Airplane_id of the AIRPLANE relation.
• Flight_number, Leg_number, and Date of SEAT_RESERVATION together form a foreign key that references Flight_number, Leg_number, and Date of the LEG_INSTANCE relation.
• Airplane_type_name of CAN_LAND is a foreign key that references the Airplane_type_name of the AIRPLANE_TYPE relation.
• Airplane_type of AIRPLANE is a foreign key that references the Airplane_type_name of the AIRPLANE_TYPE relation.

Chapter 5, Problem 13E
Problem
Consider the relation CLASS(Course#, Univ_Section#, Instructor_name, Semester, Building_code, Room#, Time_period, Weekdays, Credit_hours). This represents classes taught in a university, with unique Univ_section#s. Identify what you think should be various candidate keys, and write in your own words the conditions or assumptions under which each candidate key would be valid.
Step-by-step solution
Step 1 of 2
The relation CLASS describes the classes taught in a university, each with a unique Univ_Section#. The possible candidate keys, with the assumption under which each is valid, are as follows:
1. {Univ_Section#}: valid if the section number is unique throughout all the semesters.
2. {Instructor_name, Semester}: valid if an instructor teaches at most one course in each semester.
3. {Semester, Building_code, Room#, Time_period, Weekdays}: valid if, at a given time in a specific semester, the same room cannot be used by more than one course.
Step 2 of 2
4. If Univ_Section# is not unique, for example when more than one university is considered and each assigns section numbers by its own rules, the section number alone cannot serve as a key and must be combined with other attributes.
5. Otherwise, if Univ_Section# is unique, all the sections are assigned unique numbers throughout the semester.

Chapter 5, Problem 14E
Problem
Consider the following six relations for an order-processing database application in a company:
CUSTOMER(Cust#, Cname, City)
ORDER(Order#, Odate, Cust#, Ord_amt)
ORDER_ITEM(Order#, Item#, Qty)
ITEM(Item#, Unit_price)
SHIPMENT(Order#, Warehouse#, Ship_date)
WAREHOUSE(Warehouse#, City)
Here, Ord_amt refers to total dollar amount of an order; Odate is the date the order was placed; and Ship_date is the date an order (or part of an order) is shipped from the warehouse. Assume that an order can be shipped from several warehouses. Specify the foreign keys for this schema, stating any assumptions you make. What other constraints can you think of for this database?
Step-by-step solution
Step 1 of 2
Foreign Keys:
a. Cust# of ORDER is a foreign key referencing CUSTOMER: orders are taken from recognized customers only.
b. Order# of ORDER_ITEM is a foreign key referencing ORDER.
c. Item# of ORDER_ITEM is a foreign key referencing ITEM: orders are taken only for items in stock.
d. Order# of SHIPMENT is a foreign key referencing ORDER: shipments are made only for orders taken.
e. Warehouse# of SHIPMENT is a foreign key referencing WAREHOUSE: shipments are made only from the company's warehouses.
Step 2 of 2
Other Constraints:
• Ship_date must not be earlier than Odate in ORDER, because an order must be taken before it is shipped.
• Ord_amt should equal the sum of Qty × Unit_price over the items of the order, so it can never be less than the Unit_price of an ordered item.

Chapter 5, Problem 15E
Problem
Consider the following relations for a database that keeps track of business trips of salespersons in a sales office:
SALESPERSON(Ssn, Name, Start_year, Dept_no)
TRIP(Ssn, From_city, To_city, Departure_date, Return_date, Trip_id)
EXPENSE(Trip_id, Account#, Amount)
A trip can be charged to one or more accounts. Specify the foreign keys for this schema, stating any assumptions you make.

Step-by-step solution

Step 1 of 3
A foreign key is a column, or a combination of columns, in one table that refers to the primary key of another table. It is used to maintain the relationship between the two tables.
• A foreign key is mainly used for establishing a relationship between two tables.
• A table can have more than one foreign key.

Step 2 of 3
The foreign keys in the given relations are as follows:
• Ssn is a foreign key in the TRIP relation. It references Ssn of the SALESPERSON relation.
• Trip_id is a foreign key in the EXPENSE relation. It references Trip_id of the TRIP relation.

Step 3 of 3
Assume that there are additional tables that store department information and account details. Then two more foreign keys are possible:
• Dept_no is a foreign key in the SALESPERSON relation.
• Account# is a foreign key in the EXPENSE relation.

Chapter 5, Problem 16E

Problem
Consider the following relations for a database that keeps track of student enrollment in courses and the books adopted for each course:
STUDENT(Ssn, Name, Major, Bdate)
COURSE(Course#, Cname, Dept)
ENROLL(Ssn, Course#, Quarter, Grade)
BOOK_ADOPTION(Course#, Quarter, Book_isbn)
TEXT(Book_isbn, Book_title, Publisher, Author)
Specify the foreign keys for this schema, stating any assumptions you make.

Step-by-step solution

Step 1 of 2
A foreign key is a column, or a combination of columns, in one table that refers to the primary key of another table. It is used to maintain the relationship between the two tables.
• A foreign key is mainly used for establishing a relationship between two tables.
• A table can have more than one foreign key.

Step 2 of 2
The foreign keys in the given relations are as follows:
• Ssn is a foreign key in the ENROLL table that references Ssn of the STUDENT table. Ssn is the primary key of STUDENT.
• Course# is a foreign key in the ENROLL table that references Course# of the COURSE table. Course# is the primary key of COURSE.
• Course# is a foreign key in the BOOK_ADOPTION table that references Course# of the COURSE table. Course# is the primary key of COURSE.
• Book_isbn is a foreign key in the BOOK_ADOPTION table that references Book_isbn of the TEXT table. Book_isbn is the primary key of TEXT.

Chapter 5, Problem 17E

Problem
Consider the following relations for a database that keeps track of automobile sales in a car dealership (OPTION refers to some optional equipment installed on an automobile):
CAR(Serial_no, Model, Manufacturer, Price)
OPTION(Serial_no, Option_name, Price)
SALE(Salesperson_id, Serial_no, Date, Sale_price)
SALESPERSON(Salesperson_id, Name, Phone)
First, specify the foreign keys for this schema, stating any assumptions you make. Next, populate the relations with a few sample tuples, and then give an example of an insertion in the SALE and SALESPERSON relations that violates the referential integrity constraints and of another insertion that does not.

Step-by-step solution

Step 1 of 4
The foreign keys are:
a. Serial_no of OPTION is a foreign key referencing CAR: options can be installed only on cars whose serial number exists in CAR.
b. Serial_no of SALE is a foreign key referencing CAR: only a car whose serial number exists in CAR can be sold.
c. Salesperson_id of SALE is a foreign key referencing SALESPERSON: any listed salesperson can sell any car.

Step 2 of 4
Consider the following relation state:

CAR:
Serial_no   Model   Manufacturer   Price (lakh)
1           1987    Ford           7
2           1998    Tata           4
3           1988    Ferrari        20
4           1952    Ford           2

OPTION:
Serial_no   Option_name   Price
2           Abc           200
4           def           400

Step 3 of 4
SALESPERSON:
Salesperson_id   Name    Phone
Sl1              Ram     9910101010
Sl2              John    9999999999
Sl3              Mario   9090909090

SALE:
Salesperson_id   Serial_no   Date        Sale_price (lakh)
Sl1              1           2000-6-07   7.5
Sl2              2           2000-6-08   4.1

Step 4 of 4
Insertion into SALE that violates the referential integrity constraints:
Insert <'Sl4', '5', '2000-07-07', '21'> into SALE.
Both the Salesperson_id and the Serial_no are invalid, because no matching rows exist in SALESPERSON and CAR.
Insertion into SALE that does not violate the referential integrity constraints:
Insert <'Sl1', '4', '2000-09-07', '2.1'> into SALE.
An insertion into SALESPERSON cannot violate referential integrity, because SALESPERSON has no foreign keys. A valid insertion into SALESPERSON is:
Insert <'Sl4', 'Jack', '9190000000'> into SALESPERSON.

Chapter 5, Problem 18E

Problem
Database design often involves decisions about the storage of attributes. For example, a Social Security number can be stored as one attribute or split into three attributes (one for each of the three hyphen-delineated groups of numbers in a Social Security number, XXX-XX-XXXX). However, Social Security numbers are usually represented as just one attribute. The decision is based on how the database will be used. This exercise asks you to think about specific situations where dividing the SSN is useful.

Step-by-step solution

Step 1 of 2
During database design, the Social Security number (SSN) is usually stored as a single attribute.
• An SSN is made up of 9 digits divided into three parts.
• The format of an SSN is XXX-XX-XXXX.
• The parts are separated by hyphens.
• The first part represents the area number.
• The second part represents the group number.
• The third part represents the serial number.

Step 2 of 2
Situations where it is preferable to store the SSN in parts instead of as a single attribute are as follows:
• The area number determines the location or state. In some cases it is necessary to group the data by location to generate statistical information.
• A comparable case is a phone number, where the area code (or city code), and sometimes the country code, is needed separately for dialing international numbers.
• Each part has its own independent meaning and may need to be accessed on its own.

Chapter 5, Problem 19E

Problem
Consider a STUDENT relation in a UNIVERSITY database with the following attributes (Name, Ssn, Local_phone, Address, Cell_phone, Age, Gpa). Note that the cell phone may be from a different city and state (or province) from the local phone.
A possible tuple of the relation is shown below:

Name          George Shaw William Edwards
Ssn           123-45-6789
Local_phone   555-1234
Address       123 Main St., Anytown, CA 94539
Cell_phone    555-4321
Age           19
Gpa           3.75

a. Identify the critical missing information from the Local_phone and Cell_phone attributes. (Hint: How do you call someone who lives in a different state or province?)
b. Would you store this additional information in the Local_phone and Cell_phone attributes or add new attributes to the schema for STUDENT?
c. Consider the Name attribute. What are the advantages and disadvantages of splitting this field from one attribute into three attributes (first name, middle name, and last name)?
d. What general guideline would you recommend for deciding when to store information in a single attribute and when to split the information?
e. Suppose the student can have between 0 and 5 phones. Suggest two different designs that allow this type of information.

Step-by-step solution

Step 1 of 5
a. The state, province, or city code (the area code) is missing from the phone number information.

Step 2 of 5
b. Since the cell phone and the local phone can belong to different cities or states, the additional information must be stored for each of them, so it is added to the Local_phone and Cell_phone attributes.

Step 3 of 5
c. If Name is split into First_name, Middle_name and Last_name attributes, the advantage is:
• Sorting can be done on the basis of first name, last name, or middle name.
The disadvantages are:
• Splitting a single attribute into three may increase the number of NULL values in the database (for example, when some students have no middle name).
• Extra memory is consumed for storing NULL values of attributes that do not exist for a particular student (for example, Middle_name).

Step 4 of 5
d. To decide when to store information in a single attribute:
• When storing the information in separate attributes would create NULL values, a single attribute is preferred.
• When a single attribute cannot be kept atomic, separate attributes must be used.
• When the information needs to be sorted on some sub-field of an attribute, or when a sub-field is needed for decision making, the single attribute must be split.

Step 5 of 5
e. First design:
• STUDENT(Name, Ssn, Phone_number_count, Address, Age, Gpa)
  PHONE(Ssn, Phone_number)
Second design:
• STUDENT(Name, Ssn, Phone_number1, Phone_number2, Phone_number3, Phone_number4, Phone_number5, Address, Age, Gpa)
The schema can be designed either way, but the first design is better than the second because it leaves fewer NULL values.

Chapter 5, Problem 20E

Problem
Recent changes in privacy laws have disallowed organizations from using Social Security numbers to identify individuals unless certain restrictions are satisfied. As a result, most U.S. universities cannot use SSNs as primary keys (except for financial data). In practice, Student_id, a unique identifier assigned to every student, is likely to be used as the primary key rather than SSN since Student_id can be used throughout the system.
a. Some database designers are reluctant to use generated keys (also known as surrogate keys) for primary keys (such as Student_id) because they are artificial. Can you propose any natural choices of keys that can be used to identify the student record in a UNIVERSITY database?
b. Suppose that you are able to guarantee uniqueness of a natural key that includes last name. Are you guaranteed that the last name will not change during the lifetime of the database? If last name can change, what solutions can you propose for creating a primary key that still includes last name but remains unique?
c. What are the advantages and disadvantages of using generated (surrogate) keys?

Step-by-step solution

Step 1 of 1
(a) An identifier can be generated by combining the student's name with the local and cell phone numbers. For example:
first name + initials of the full name + '_' + last name + '_' + local phone number + sum of the digits of the cell phone number + '_' + an increasing record counter.
For the record of George Shaw William Edwards (local phone 555-1234, cell phone 555-4321), entered as the 57th record in the system, the digits of the cell phone number sum to 25, so the unique identifier is:
GeorgeGSWE_Edwards_555-123425_57
Assumptions: each student has a different local phone number unless two students share an address, and two students with the same address do not have the same name. Hash operations on various fields can also be used to generate the key.

(b) The last name is not guaranteed to stay the same. If the natural key uses the last name, a column called original last name can be included and used for identification, so that the key does not change when the current last name does.

(c) Advantages of surrogate keys:

Immutability
Surrogate keys do not change while the row exists. This has two advantages:
• Database applications do not lose their "handle" on the row when the data changes.
• Many database systems do not support cascading updates of keys across foreign keys of related tables, which makes primary key data difficult to modify. An immutable key never needs such an update.

Flexibility for changing requirements
Because of changing requirements, the attributes that uniquely identify an entity might change. In that case, the attribute(s) initially chosen as the natural key will no longer be a suitable natural key.
Example: an employee ID is chosen as the natural key of an employee database. Because of a merger with another company, new employees from the merged company must be inserted, and their IDs conflict with existing ones (the IDs were generated independently while the companies were separate). In such cases a new attribute generally must be added to the natural key (for example, an attribute "original_company"). With a surrogate key, only the table that defines the surrogate key must be changed. With natural keys, all tables (and possibly other, related software) that use the natural key will have to change.
More generally, in some problem domains it is simply not clear what might be a suitable natural key. Surrogate keys avoid the problems of choosing a natural key that later turns out to be incorrect.

Performance
Surrogate keys are often of a compact data type, such as a four-byte integer. This allows the database to query the single key column faster than it could query multiple columns.
• A non-redundant distribution of keys causes the resulting B-tree index to be completely balanced.
• If the natural key is a compound key, joining is more expensive because there are multiple columns to compare. Surrogate keys are always contained in a single column.

Compatibility
Several database application development systems, drivers, and object-relational mapping systems, such as Ruby on Rails or Hibernate (Java), depend on the use of integer or GUID surrogate keys in order to support database-system-agnostic operations and object-to-row mapping.

Disadvantages of surrogate keys:

Disassociation
Because the surrogate key is completely unrelated to the data of the row to which it is attached, the key is disassociated from that row. Disassociated keys are unnatural to the application's world, resulting in an additional level of indirection from which to audit.

Query Optimization
Relational databases assume that a unique index is applied to a table's primary key. The unique index serves two purposes: (1) to enforce entity integrity (primary key data must be unique across rows) and (2) to quickly search for the rows queried. Since surrogate keys replace a table's identifying attributes (the natural key), and since the identifying attributes are likely to be those queried, the query optimizer is forced to perform a full table scan when fulfilling likely queries. The remedy for the full table scan is to apply a (non-unique) index on each of the identifying attributes. However, these additional indexes take up disk space and slow down inserts and deletes.

Normalization
The presence of a surrogate key can result in the database administrator forgetting to establish, or accidentally removing, a secondary unique index on the natural key of the table. Without a unique index on the natural key, duplicate rows are likely to appear and are difficult to identify.

Business Process Modeling
Because surrogate keys are unnatural, flaws can appear when modeling the business requirements. Business requirements that rely on the natural key then need to be translated to the surrogate key.

Inadvertent Disclosure
Proprietary information may be leaked if sequential key generators are used. By subtracting a previously generated sequential key from a recently generated sequential key, one could learn the number of rows inserted during that time period. This could expose, for example, the number of transactions or new accounts per period. The solution to the inadvertent disclosure problem is to generate a random primary key. However, a randomly generated key must be looked up before it is assigned, to avoid a duplicate that would cause the insert to be rejected.

Inadvertent Assumptions
Sequentially generated surrogate keys create the illusion that events with a higher primary key value occurred after events with a lower primary key value. The illusion arises when an event is missed during normal data entry and is inserted later, after subsequent events have already been inserted. As with inadvertent disclosure, the solution is a randomly generated primary key, with the same need to check for duplicates before assignment.

Chapter 6, Problem 1RQ

Problem
How do the relations (tables) in SQL differ from the relations defined formally in Chapter 3? Discuss the other differences in terminology. Why does SQL allow duplicate tuples in a table or in a query result?

Step-by-step solution

Step 1 of 1
SQL allows a table (relation) to have two or more tuples that are identical in all their attribute values. Hence, in general, an SQL table is not a set of tuples, because a set does not allow two identical members; rather, it is a multiset of tuples. Some SQL relations are constrained to be sets because a key constraint has been declared or because the DISTINCT option has been used in the SELECT statement. In contrast, the formal definition says that a relation is a set of tuples, so no two tuples may have the same values for all attributes.

As for terminology, SQL uses the terms table, row, and column for the formal terms relation, tuple, and attribute. The correspondence between the ER model and the relational model can help in understanding other differences in terminology:

ER Model                        Relational Model
Entity type                     Entity relation
1:1 or 1:N relationship type    Foreign key (or relationship relation)
M:N relationship type           Relationship relation and two foreign keys
n-ary relationship type         Relationship relation and n foreign keys
Simple attribute                Attribute
Composite attribute             Set of simple component attributes
Multivalued attribute           Relation and foreign key
Value set                       Domain
Key attribute                   Primary (or secondary) key

SQL allows duplicate tuples for the following reasons:
1. Duplicate elimination is an expensive operation.
2. The user may want to see duplicate tuples in the result of a query.
3. When an aggregate function is applied to tuples, in most cases the user does not want duplicates removed.

Chapter 6, Problem 2RQ

Problem
List the data types that are allowed for SQL attributes.

Step-by-step solution

Step 1 of 1
The basic data types available for SQL attributes are:
• Numeric data types
• Character string
• Bit string
• Boolean
• Date and time

Chapter 6, Problem 3RQ

Problem
How does SQL allow implementation of the entity integrity and referential integrity constraints described in Chapter 3? What about referential triggered actions?

Step-by-step solution

Step 1 of 6
An entity integrity constraint specifies that every table must have a primary key, and that the primary key must contain unique values and cannot contain NULL values.
SQL implements the entity integrity constraint with the PRIMARY KEY clause.
• The PRIMARY KEY clause is normally specified when the table is created (it can also be added later with ALTER TABLE).
• It ensures that no duplicate or NULL key values are inserted into the table.

Step 2 of 6
The following examples illustrate how the entity integrity constraint is implemented in SQL:

CREATE TABLE BOOKS (
    BOOK_CODE INT PRIMARY KEY,
    BOOK_TITLE VARCHAR(20),
    BOOK_PRICE INT
);

In the table BOOKS, BOOK_CODE is the primary key.

CREATE TABLE AUTHOR (
    AUTHOR_ID INT PRIMARY KEY,
    AUTHOR_NAME VARCHAR(20)
);

In the table AUTHOR, AUTHOR_ID is the primary key.

Step 3 of 6
A foreign key is an attribute, or a set of attributes, in one table that refers to the primary key of another table. It is used to maintain the relationship between the two tables.
A referential integrity constraint specifies that the value of a foreign key must match a value of the primary key in the referenced (primary) table.
SQL implements the referential integrity constraint with the FOREIGN KEY clause.
• The FOREIGN KEY clause is normally specified when the table is created.
• It ensures that a value cannot be added to a foreign key unless it exists in the primary key of the referenced table.

Step 4 of 6
The following example illustrates how the referential integrity constraint is implemented in SQL:

CREATE TABLE BOOKSTORE (
    BOOK_CODE INT,
    AUTHOR_ID INT,
    BOOK_TYPE VARCHAR(20),
    PRIMARY KEY (BOOK_CODE, AUTHOR_ID),
    FOREIGN KEY (BOOK_CODE) REFERENCES BOOKS(BOOK_CODE),
    FOREIGN KEY (AUTHOR_ID) REFERENCES AUTHOR(AUTHOR_ID)
);

In the table BOOKSTORE, BOOK_CODE and AUTHOR_ID together form the primary key.
BOOK_CODE is a foreign key that refers to BOOK_CODE of table BOOKS.
AUTHOR_ID is a foreign key that refers to AUTHOR_ID of table AUTHOR.
Because of the foreign key BOOK_CODE, a tuple cannot be added to the BOOKSTORE table unless its BOOK_CODE exists in the BOOKS table.
Because of the foreign key AUTHOR_ID, a tuple cannot be added to the BOOKSTORE table unless its AUTHOR_ID exists in the AUTHOR table.

Step 5 of 6
When an operation would violate a foreign key, the default action taken by SQL is to reject the operation.
• Instead of rejecting the operation, a referential triggered action clause can be attached to the foreign key so that the referencing rows are adjusted automatically, for example by setting the foreign key to NULL or to a default value.
• The options for a referential triggered action are SET NULL, SET DEFAULT, and CASCADE.
• Each option must be qualified with ON DELETE or ON UPDATE.

Step 6 of 6
The following example illustrates how a referential triggered action is implemented in SQL:

CREATE TABLE EMPLOYEE (
    EMPNO INT PRIMARY KEY,
    ENAME VARCHAR(20),
    JOB VARCHAR(20),
    SALARY INT,
    MANAGER INT,
    FOREIGN KEY (MANAGER) REFERENCES EMPLOYEE(EMPNO) ON DELETE SET NULL
);

Chapter 6, Problem 4RQ

Problem
Describe the four clauses in the syntax of a simple SQL retrieval query. Show what type of constructs can be specified in each of the clauses. Which are required and which are optional?

Step-by-step solution

Step 1 of 1
The four clauses of a simple SQL retrieval query are as follows.
SELECT:
• It lists the attributes (or expressions) to be retrieved and is used together with the FROM clause to get data from the database in a readable form.
• The SELECT clause is required.
FROM:
• The FROM clause is used in combination with the SELECT clause to retrieve the data.
• It tells the DBMS which table to retrieve the data from; several tables can be listed in the FROM clause.
• It is required.
WHERE:
• It imposes conditions on the query and removes the rows (tuples) that do not satisfy them.
• More than one condition can be used in the WHERE clause.
• It is optional.
ORDER BY:
• This clause sorts the output values in either ascending or descending order on the listed attributes.
• The default for ORDER BY is ascending order.
• This clause is also optional.
Example of a simple SQL query:

SELECT * FROM employee WHERE empno = 10 ORDER BY empno DESC;

Chapter 6, Problem 5E

Problem
Consider the database shown in Figure 1.2, whose schema is shown in Figure 2.1. What are the referential integrity constraints that should hold on the schema? Write appropriate SQL DDL statements to define the database.

Step-by-step solution

Step 1 of 2
The referential integrity constraints that should hold on the database of Figure 1.2 are written in the following notation:
R.(A1, ..., An) --> S.(B1, ..., Bn)
This represents a foreign key from the attributes A1, ..., An of the referencing relation R to the attributes B1, ..., Bn of the referenced relation S:
PREREQUISITE.(CourseNumber) --> COURSE.(CourseNumber)
PREREQUISITE.(PrerequisiteNumber) --> COURSE.(CourseNumber)
SECTION.(CourseNumber) --> COURSE.(CourseNumber)
GRADE_REPORT.(StudentNumber) --> STUDENT.(StudentNumber)
GRADE_REPORT.(SectionIdentifier) --> SECTION.(SectionIdentifier)

Step 2 of 2
The SQL statements for the above database are:

CREATE TABLE STUDENT (
    Name VARCHAR(30) NOT NULL,
    StudentNumber INTEGER NOT NULL,
    Class CHAR NOT NULL,
    Major CHAR(4),
    PRIMARY KEY (StudentNumber)
);

CREATE TABLE COURSE (
    CourseName VARCHAR(30) NOT NULL,
    CourseNumber CHAR(8) NOT NULL,
    CreditHours INTEGER,
    Department CHAR(4),
    PRIMARY KEY (CourseNumber),
    UNIQUE (CourseName)
);

CREATE TABLE PREREQUISITE (
    CourseNumber CHAR(8) NOT NULL,
    PrerequisiteNumber CHAR(8) NOT NULL,
    PRIMARY KEY (CourseNumber, PrerequisiteNumber),
    FOREIGN KEY (CourseNumber) REFERENCES COURSE (CourseNumber),
    FOREIGN KEY (PrerequisiteNumber) REFERENCES COURSE (CourseNumber)
);

CREATE TABLE SECTION (
    SectionIdentifier INTEGER NOT NULL,
    CourseNumber CHAR(8) NOT NULL,
    Semester VARCHAR(6) NOT NULL,
    Year CHAR(4) NOT NULL,
    Instructor VARCHAR(15),
    PRIMARY KEY (SectionIdentifier),
    FOREIGN KEY (CourseNumber) REFERENCES COURSE (CourseNumber)
);

CREATE TABLE GRADE_REPORT (
    StudentNumber INTEGER NOT NULL,
    SectionIdentifier INTEGER NOT NULL,
    Grade CHAR,
    PRIMARY KEY (StudentNumber, SectionIdentifier),
    FOREIGN KEY (StudentNumber) REFERENCES STUDENT (StudentNumber),
    FOREIGN KEY (SectionIdentifier) REFERENCES SECTION (SectionIdentifier)
);

Chapter 6, Problem 6E

Problem
Repeat the exercise below, but use the AIRLINE database schema of the accompanying figure.
Exercise: Consider the database shown in Figure 1.2, whose schema is shown in Figure 2.1. What are the referential integrity constraints that should hold on the schema? Write appropriate SQL DDL statements to define the database.
Figure: The AIRLINE relational database.

Step-by-step solution

Step 1 of 10
The referential integrity constraints below are based on the AIRLINE database schema referenced in the problem. The DDL in the following steps names the date column LEG_DATE, because DATE is a reserved word.
FLIGHT_LEG.(FLIGHT_NUMBER) --> FLIGHT.(NUMBER)
FLIGHT_LEG.(DEPARTURE_AIRPORT_CODE) --> AIRPORT.(AIRPORT_CODE)
FLIGHT_LEG.(ARRIVAL_AIRPORT_CODE) --> AIRPORT.(AIRPORT_CODE)
LEG_INSTANCE.(FLIGHT_NUMBER, LEG_NUMBER) --> FLIGHT_LEG.(FLIGHT_NUMBER, LEG_NUMBER)
LEG_INSTANCE.(AIRPLANE_ID) --> AIRPLANE.(AIRPLANE_ID)
LEG_INSTANCE.(DEPARTURE_AIRPORT_CODE) --> AIRPORT.(AIRPORT_CODE)
LEG_INSTANCE.(ARRIVAL_AIRPORT_CODE) --> AIRPORT.(AIRPORT_CODE)
FARES.(FLIGHT_NUMBER) --> FLIGHT.(NUMBER)
CAN_LAND.(AIRPLANE_TYPE_NAME) --> AIRPLANE_TYPE.(TYPE_NAME)
CAN_LAND.(AIRPORT_CODE) --> AIRPORT.(AIRPORT_CODE)
AIRPLANE.(AIRPLANE_TYPE) --> AIRPLANE_TYPE.(TYPE_NAME)
SEAT_RESERVATION.(FLIGHT_NUMBER, LEG_NUMBER, DATE) --> LEG_INSTANCE.(FLIGHT_NUMBER, LEG_NUMBER, DATE)

Step 2 of 10
The CREATE TABLE statements for the database are given below in the original step order. In a DBMS that checks referenced tables at creation time, AIRPLANE_TYPE and AIRPLANE must be created before LEG_INSTANCE.

CREATE TABLE AIRPORT (
    AIRPORT_CODE CHAR (3) NOT NULL,
    NAME VARCHAR (30) NOT NULL,
    CITY VARCHAR (30) NOT NULL,
    STATE VARCHAR (30),
    PRIMARY KEY (AIRPORT_CODE)
);

Step 3 of 10
CREATE TABLE FLIGHT (
    NUMBER VARCHAR (6) NOT NULL,
    AIRLINE VARCHAR (20) NOT NULL,
    WEEKDAYS VARCHAR (10) NOT NULL,
    PRIMARY KEY (NUMBER)
);

Step 4 of 10
CREATE TABLE FLIGHT_LEG (
    FLIGHT_NUMBER VARCHAR (6) NOT NULL,
    LEG_NUMBER INTEGER NOT NULL,
    DEPARTURE_AIRPORT_CODE CHAR (3) NOT NULL,
    SCHEDULED_DEPARTURE_TIME TIMESTAMP WITH TIME ZONE,
    ARRIVAL_AIRPORT_CODE CHAR (3) NOT NULL,
    SCHEDULED_ARRIVAL_TIME TIMESTAMP WITH TIME ZONE,
    PRIMARY KEY (FLIGHT_NUMBER, LEG_NUMBER),
    FOREIGN KEY (FLIGHT_NUMBER) REFERENCES FLIGHT (NUMBER),
    FOREIGN KEY (DEPARTURE_AIRPORT_CODE) REFERENCES AIRPORT (AIRPORT_CODE),
    FOREIGN KEY (ARRIVAL_AIRPORT_CODE) REFERENCES AIRPORT (AIRPORT_CODE)
);

Step 5 of 10
CREATE TABLE LEG_INSTANCE (
    FLIGHT_NUMBER VARCHAR (6) NOT NULL,
    LEG_NUMBER INTEGER NOT NULL,
    LEG_DATE DATE NOT NULL,
    NO_OF_AVAILABLE_SEATS INTEGER,
    AIRPLANE_ID INTEGER,
    DEPARTURE_AIRPORT_CODE CHAR(3),
    DEPARTURE_TIME TIMESTAMP WITH TIME ZONE,
    ARRIVAL_AIRPORT_CODE CHAR(3),
    ARRIVAL_TIME TIMESTAMP WITH TIME ZONE,
    PRIMARY KEY (FLIGHT_NUMBER, LEG_NUMBER, LEG_DATE),
    FOREIGN KEY (FLIGHT_NUMBER, LEG_NUMBER) REFERENCES FLIGHT_LEG (FLIGHT_NUMBER, LEG_NUMBER),
    FOREIGN KEY (AIRPLANE_ID) REFERENCES AIRPLANE (AIRPLANE_ID),
    FOREIGN KEY (DEPARTURE_AIRPORT_CODE) REFERENCES AIRPORT (AIRPORT_CODE),
    FOREIGN KEY (ARRIVAL_AIRPORT_CODE) REFERENCES AIRPORT (AIRPORT_CODE)
);

Step 6 of 10
CREATE TABLE FARES (
    FLIGHT_NUMBER VARCHAR (6) NOT NULL,
    FARE_CODE VARCHAR (10) NOT NULL,
    AMOUNT DECIMAL (8, 2) NOT NULL,
    RESTRICTIONS VARCHAR (200),
    PRIMARY KEY (FLIGHT_NUMBER, FARE_CODE),
    FOREIGN KEY (FLIGHT_NUMBER) REFERENCES FLIGHT (NUMBER)
);

Step 7 of 10
CREATE TABLE AIRPLANE_TYPE (
    TYPE_NAME VARCHAR (20) NOT NULL,
    MAX_SEATS INTEGER NOT NULL,
    COMPANY VARCHAR (15) NOT NULL,
    PRIMARY KEY (TYPE_NAME)
);

Step 8 of 10
CREATE TABLE CAN_LAND (
    AIRPLANE_TYPE_NAME VARCHAR (20) NOT NULL,
    AIRPORT_CODE CHAR (3) NOT NULL,
    PRIMARY KEY (AIRPLANE_TYPE_NAME, AIRPORT_CODE),
    FOREIGN KEY (AIRPLANE_TYPE_NAME) REFERENCES AIRPLANE_TYPE (TYPE_NAME),
    FOREIGN KEY (AIRPORT_CODE) REFERENCES AIRPORT (AIRPORT_CODE)
);

Step 9 of 10
CREATE TABLE AIRPLANE (
    AIRPLANE_ID INTEGER NOT NULL,
    TOTAL_NUMBER_OF_SEATS INTEGER NOT NULL,
    AIRPLANE_TYPE VARCHAR (20) NOT NULL,
    PRIMARY KEY (AIRPLANE_ID),
    FOREIGN KEY (AIRPLANE_TYPE) REFERENCES AIRPLANE_TYPE (TYPE_NAME)
);

Step 10 of 10
CREATE TABLE SEAT_RESERVATION (
    FLIGHT_NUMBER VARCHAR (6) NOT NULL,
    LEG_NUMBER INTEGER NOT NULL,
    LEG_DATE DATE NOT NULL,
    SEAT_NUMBER VARCHAR (4),
    CUSTOMER_NAME VARCHAR (30) NOT NULL,
    CUSTOMER_PHONE CHAR (12),
    PRIMARY KEY (FLIGHT_NUMBER, LEG_NUMBER, LEG_DATE, SEAT_NUMBER),
    FOREIGN KEY (FLIGHT_NUMBER, LEG_NUMBER, LEG_DATE) REFERENCES LEG_INSTANCE (FLIGHT_NUMBER, LEG_NUMBER, LEG_DATE)
);

Chapter 6, Problem 7E

Problem
Consider the LIBRARY relational database schema shown in the figure. Choose the appropriate action (reject, cascade, set to NULL, set to default) for each referential integrity constraint, both for the deletion of a referenced tuple and for the update of a primary key attribute value in a referenced tuple.
Justify your choices.
Figure: A relational database schema for a LIBRARY database.

Step-by-step solution

Step 1 of 7
The appropriate actions for the LIBRARY relational database schema are as follows, one referential integrity constraint per step. This step covers BOOK_AUTHORS referencing BOOK.
• The REJECT action does not permit any automatic change in the LIBRARY database; the offending delete or update is simply refused.
• If a BOOK is deleted, ON DELETE CASCADE automatically propagates the deletion to the matching rows of the referencing relation BOOK_AUTHORS.
• If the key of a BOOK is updated, ON UPDATE CASCADE automatically propagates the new value to the matching rows of the referencing relation BOOK_AUTHORS.
Therefore, ON DELETE CASCADE and ON UPDATE CASCADE are chosen for this constraint.

Step 2 of 7
BOOK referencing PUBLISHER:
• A row in the PUBLISHER relation cannot be deleted while rows in the BOOK table reference it.
• If a PUBLISHER's name is updated, ON UPDATE CASCADE automatically propagates the new value to the matching rows of the referencing relation BOOK.
Therefore, ON DELETE REJECT and ON UPDATE CASCADE are chosen for this constraint.

Step 3 of 7
BOOK_LOANS referencing BOOK:
• With ON DELETE CASCADE, deleting a BOOK automatically deletes the matching rows of the referencing relation BOOK_LOANS.
• If the key of a BOOK is updated, ON UPDATE CASCADE automatically propagates the new value to the matching rows of BOOK_LOANS.
• With ON DELETE REJECT instead, a row in the BOOK relation cannot be deleted while rows in the BOOK_LOANS table reference it.
Therefore, ON UPDATE CASCADE is chosen. A constraint can carry only one ON DELETE action, so for deletion the choice is between CASCADE (the loan records disappear with the book) and REJECT (the book cannot be removed while it is on loan).

Step 4 of 7
BOOK_COPIES referencing BOOK:
• If a BOOK is deleted, all its associated rows in the relation BOOK_COPIES should be deleted as well.
• ON DELETE CASCADE propagates the deletion of a BOOK automatically to the matching rows of the referencing relation BOOK_COPIES.
• If the key of a BOOK is updated, ON UPDATE CASCADE automatically propagates the new value to the matching rows of BOOK_COPIES.
Therefore, ON DELETE CASCADE and ON UPDATE CASCADE are chosen for this constraint.

Step 5 of 7
BOOK_LOANS referencing BORROWER:
• With ON DELETE CASCADE, deleting a row of the BORROWER table automatically deletes the matching rows of the referencing relation BOOK_LOANS.
• If CardNo is updated in the BORROWER table, ON UPDATE CASCADE automatically propagates the new value to the matching rows of BOOK_LOANS.
• With ON DELETE REJECT instead, a row in the BORROWER relation cannot be deleted while rows in the BOOK_LOANS table reference it.
Therefore, ON UPDATE CASCADE is chosen, and for deletion the choice is between CASCADE and REJECT, depending on whether a borrower with outstanding loans may be removed.

Step 6 of 7
BOOK_COPIES referencing LIBRARY_BRANCH:
• With ON DELETE CASCADE, deleting a row of the LIBRARY_BRANCH table automatically deletes the matching rows of the referencing relation BOOK_COPIES.
• If Branch_id is updated in the LIBRARY_BRANCH table, ON UPDATE CASCADE automatically propagates the new value to the matching rows of BOOK_COPIES.
• With ON DELETE REJECT instead, a row in the LIBRARY_BRANCH relation cannot be deleted while rows in the BOOK_COPIES table reference it.
Therefore, ON UPDATE CASCADE is chosen, and for deletion the choice is between CASCADE and REJECT, depending on whether a branch that still holds copies may be removed.

Step 7 of 7
BOOK_LOANS referencing LIBRARY_BRANCH:
• With ON DELETE CASCADE, deleting a row of the LIBRARY_BRANCH table automatically deletes the matching rows of the referencing relation BOOK_LOANS.
• If Branch_id is updated in the LIBRARY_BRANCH table, ON UPDATE CASCADE automatically propagates the new value to the matching rows of BOOK_LOANS.
• With ON DELETE REJECT instead, a row in the LIBRARY_BRANCH relation cannot be deleted while rows in the BOOK_LOANS table reference it.
Therefore, ON UPDATE CASCADE is chosen, and for deletion the choice is between CASCADE and REJECT, depending on whether a branch with outstanding loans may be removed.
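
The difference between the REJECT and CASCADE choices discussed in this solution can be tried out on a small scale. The following is a minimal sketch, not part of the original solution: it assumes SQLite through Python's standard sqlite3 module, uses a cut-down PUBLISHER / BOOK / BOOK_AUTHORS schema with illustrative column names and sample values, and spells the reject action as RESTRICT, which is how SQLite names it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Cut-down LIBRARY tables; names and sample rows are illustrative assumptions.
conn.executescript("""
CREATE TABLE PUBLISHER (Name TEXT PRIMARY KEY);
CREATE TABLE BOOK (
    BookId TEXT PRIMARY KEY,
    PublisherName TEXT,
    FOREIGN KEY (PublisherName) REFERENCES PUBLISHER (Name)
        ON DELETE RESTRICT ON UPDATE CASCADE
);
CREATE TABLE BOOK_AUTHORS (
    BookId TEXT NOT NULL,
    AuthorName TEXT NOT NULL,
    PRIMARY KEY (BookId, AuthorName),
    FOREIGN KEY (BookId) REFERENCES BOOK (BookId)
        ON DELETE CASCADE ON UPDATE CASCADE
);
INSERT INTO PUBLISHER VALUES ('Acme Press');
INSERT INTO BOOK VALUES ('B1', 'Acme Press');
INSERT INTO BOOK_AUTHORS VALUES ('B1', 'A. Author');
""")

# Reject: a publisher that is still referenced by a book cannot be deleted.
try:
    conn.execute("DELETE FROM PUBLISHER WHERE Name = 'Acme Press'")
    publisher_delete_rejected = False
except sqlite3.IntegrityError:
    publisher_delete_rejected = True

# ON UPDATE CASCADE: renaming the publisher rewrites BOOK.PublisherName.
conn.execute("UPDATE PUBLISHER SET Name = 'Acme Books' WHERE Name = 'Acme Press'")
book_publisher = conn.execute("SELECT PublisherName FROM BOOK").fetchone()[0]

# ON DELETE CASCADE: deleting the book also removes its BOOK_AUTHORS rows.
conn.execute("DELETE FROM BOOK WHERE BookId = 'B1'")
remaining_authors = conn.execute("SELECT COUNT(*) FROM BOOK_AUTHORS").fetchone()[0]

print(publisher_delete_rejected, book_publisher, remaining_authors)
```

Swapping RESTRICT for CASCADE on the BOOK foreign key would make the publisher deletion succeed and remove the book with it, which is the trade-off the steps above weigh for each constraint.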
Chapter 6, Problem 8E
Problem Write appropriate SQL DDL statements for declaring the LIBRARY relational database schema of Figure. Specify the keys and referential triggered actions. Figure: A relational database schema for a LIBRARY database.
Step-by-step solution
Step 1 of 7 The statements below declare the LIBRARY relational schema from Figure 6.14 in the textbook. Because BOOK references PUBLISHER, and BOOK_COPIES and BOOK_LOANS reference LIBRARY_BRANCH, a DBMS that checks references at creation time needs the referenced tables to be created first. The CREATE TABLE statements are as follows:
CREATE TABLE BOOK (
  BookId CHAR(20) NOT NULL,
  Title VARCHAR(30) NOT NULL,
  PublisherName VARCHAR(20),
  PRIMARY KEY (BookId),
  FOREIGN KEY (PublisherName) REFERENCES PUBLISHER (Name) ON UPDATE CASCADE );
Step 2 of 7
CREATE TABLE BOOK_AUTHORS (
  BookId CHAR(20) NOT NULL,
  AuthorName VARCHAR(30) NOT NULL,
  PRIMARY KEY (BookId, AuthorName),
  FOREIGN KEY (BookId) REFERENCES BOOK (BookId) ON DELETE CASCADE ON UPDATE CASCADE );
Step 3 of 7
CREATE TABLE PUBLISHER (
  Name VARCHAR(20) NOT NULL,
  Address VARCHAR(40) NOT NULL,
  Phone CHAR(12),
  PRIMARY KEY (Name) );
Step 4 of 7
CREATE TABLE BOOK_COPIES (
  BookId CHAR(20) NOT NULL,
  BranchId INTEGER NOT NULL,
  No_Of_Copies INTEGER NOT NULL,
  PRIMARY KEY (BookId, BranchId),
  FOREIGN KEY (BookId) REFERENCES BOOK (BookId) ON DELETE CASCADE ON UPDATE CASCADE,
  FOREIGN KEY (BranchId) REFERENCES LIBRARY_BRANCH (BranchId) ON DELETE CASCADE ON UPDATE CASCADE );
Step 5 of 7
CREATE TABLE BORROWER (
  CardNo INTEGER NOT NULL,
  Name VARCHAR(30) NOT NULL,
  Address VARCHAR(40) NOT NULL,
  Phone CHAR(12),
  PRIMARY KEY (CardNo) );
Step 6 of 7
CREATE TABLE BOOK_LOANS (
  CardNo INTEGER NOT NULL,
  BookId CHAR(20) NOT NULL,
  BranchId INTEGER NOT NULL,
  DateOut DATE NOT NULL,
  DueDate DATE NOT NULL,
  PRIMARY KEY (CardNo, BookId, BranchId),
  FOREIGN KEY (CardNo) REFERENCES BORROWER (CardNo) ON DELETE CASCADE ON UPDATE CASCADE,
  FOREIGN KEY (BranchId) REFERENCES LIBRARY_BRANCH (BranchId) ON DELETE CASCADE ON UPDATE CASCADE,
  FOREIGN KEY (BookId) REFERENCES BOOK (BookId) ON DELETE CASCADE ON UPDATE CASCADE );
Step 7 of 7
CREATE TABLE LIBRARY_BRANCH (
BranchId INTEGER NOT NULL, BranchName VARCHAR(20) NOT NULL, Address VARCHAR(40) NOT NULL, PRIMARY KEY (BranchId) );
Chapter 6, Problem 9E
Problem How can the key and foreign key constraints be enforced by the DBMS? Is the enforcement technique you suggest difficult to implement? Can the constraint checks be executed efficiently when updates are applied to the database?
Step-by-step solution
Step 1 of 3 Enforcement of key constraints in a DBMS (Database Management System):
Key constraint: The technique often used to check the key constraint efficiently is to create an index on the combination of attributes that form each key (primary or secondary).
• Before a new record (tuple) is inserted, each index is searched to check that no value currently in the index matches the key value in the new record.
• If no matching value is found, the record is inserted; otherwise the insertion is rejected.
Foreign key constraint: Using the index on the primary key of each referenced relation makes the foreign key check relatively efficient. Whenever a new record is inserted in a referencing relation, its foreign key value is used to search the index on the primary key of the referenced relation; if the referenced record exists, the new record can be inserted in the referencing relation. For deletion of a referenced record, it is useful to have an index on the foreign key of each referencing relation, so that the DBMS can determine efficiently whether any records reference the record being deleted.
Step 2 of 3 Implementation of the enforcement technique: The enforcement technique of using an index is not difficult to implement, and the index makes it easy to identify duplicate records.
• An alternative structure such as hashing can be used instead of an index. If neither an index nor such an alternative structure is used, linear searches are needed to check the constraints, which makes the checks quite inefficient.
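The index-based checks described in Step 1 can be sketched in a few lines. This is only an illustration under stated assumptions: a Python dict stands in for the index on the primary key, and the Relation class, the try_insert helper, and the sample rows are hypothetical names invented here, not part of any DBMS.

```python
class Relation:
    """Toy relation whose primary-key index is a dict (stand-in for a real index)."""

    def __init__(self, name, key):
        self.name = name
        self.key = key          # name of the primary-key attribute
        self.pk_index = {}      # key value -> row

    def insert(self, row, foreign_keys=()):
        key_value = row[self.key]
        # Key constraint: search the index before inserting.
        if key_value in self.pk_index:
            raise ValueError(f"duplicate key {key_value!r} in {self.name}")
        # Foreign key constraint: look up each foreign key value in the
        # primary-key index of the referenced relation.
        for attribute, referenced in foreign_keys:
            value = row[attribute]
            if value is not None and value not in referenced.pk_index:
                raise ValueError(f"{value!r} not found in {referenced.name}")
        self.pk_index[key_value] = row


def try_insert(relation, row, foreign_keys=()):
    """Return True if the insert passed both checks, False if it was rejected."""
    try:
        relation.insert(row, foreign_keys)
        return True
    except ValueError:
        return False


publisher = Relation("PUBLISHER", "Name")
book = Relation("BOOK", "BookId")
book_fk = [("PublisherName", publisher)]

ok_publisher = try_insert(publisher, {"Name": "Acme Press"})
ok_book = try_insert(book, {"BookId": "B1", "PublisherName": "Acme Press"}, book_fk)
duplicate = try_insert(book, {"BookId": "B1", "PublisherName": "Acme Press"}, book_fk)
dangling = try_insert(book, {"BookId": "B2", "PublisherName": "Nobody"}, book_fk)
print(ok_publisher, ok_book, duplicate, dangling)
```

Each check is a single index lookup rather than a scan of the whole relation, which is the reason the technique is considered efficient.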
Step 3 of 3 Efficient constraint checks: The constraint checks can be executed efficiently when records are inserted into or deleted from the database.
• Using an index to enforce the key constraint prevents duplicate records, which also helps keep data storage and management efficient. Thus, constraint checking with an index is efficient.
Chapter 6, Problem 10E
Problem Specify the following queries in SQL on the COMPANY relational database schema shown in Figure 5.5. Show the result of each query if it is applied to the COMPANY database in Figure 5.6. a. Retrieve the names of all employees in department 5 who work more than 10 hours per week on the ProductX project. b. List the names of all employees who have a dependent with the same first name as themselves. c. Find the names of all employees who are directly supervised by 'Franklin Wong'.
Step-by-step solution
Step 1 of 9 a) Query:
SELECT emp.Fname, emp.Lname FROM employee emp, works_on w, project p WHERE emp.Dno = 5 AND emp.ssn = w.Essn AND w.Pno = p.pnumber AND p.pname = 'ProductX' AND w.hours > 10
Step 2 of 9 Result:
Fname Lname
John Smith
Joyce English
Step 3 of 9 Explanation: The above query displays the names of all employees of department 5 who work more than 10 hours per week on the project ProductX.
Step 4 of 9 b) Query:
SELECT emp.Fname, emp.Lname FROM employee emp, dependent d WHERE emp.ssn = d.essn AND emp.Fname = d.Dependent_name
Step 5 of 9 Result: (empty)
Fname Lname
Step 6 of 9 Explanation: The above query displays the names of all employees who have a dependent with the same first name as themselves.
• Here, the result is empty, because no employee has a dependent whose name matches the employee's first name.
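Queries a and b above can be checked against a small sample. The sketch below assumes Python's built-in sqlite3 module and a handful of illustrative rows; it is not the full Figure 5.6 state, and the row for Alicia Zelaya working on ProductX is a made-up addition that exercises the department filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced COMPANY tables with only the columns the two queries need (assumption).
conn.executescript("""
CREATE TABLE employee (Fname TEXT, Lname TEXT, Ssn TEXT PRIMARY KEY, Dno INTEGER);
CREATE TABLE project (Pname TEXT, Pnumber INTEGER PRIMARY KEY);
CREATE TABLE works_on (Essn TEXT, Pno INTEGER, Hours REAL);
CREATE TABLE dependent (Essn TEXT, Dependent_name TEXT);
INSERT INTO employee VALUES
    ('John', 'Smith', '123456789', 5),
    ('Joyce', 'English', '453453453', 5),
    ('Franklin', 'Wong', '333445555', 5),
    ('Alicia', 'Zelaya', '999887777', 4);
INSERT INTO project VALUES ('ProductX', 1), ('ProductY', 2);
INSERT INTO works_on VALUES
    ('123456789', 1, 32.5),
    ('453453453', 1, 20.0),
    ('333445555', 2, 10.0),
    ('999887777', 1, 15.0);
INSERT INTO dependent VALUES ('333445555', 'Alice'), ('123456789', 'Michael');
""")

# Query a: department 5 employees working more than 10 hours on ProductX.
query_a = sorted(conn.execute("""
    SELECT emp.Fname, emp.Lname
    FROM employee emp, works_on w, project p
    WHERE emp.Dno = 5 AND emp.Ssn = w.Essn AND w.Pno = p.Pnumber
      AND p.Pname = 'ProductX' AND w.Hours > 10
""").fetchall())

# Query b: employees with a dependent sharing their first name.
query_b = conn.execute("""
    SELECT emp.Fname, emp.Lname
    FROM employee emp, dependent d
    WHERE emp.Ssn = d.Essn AND emp.Fname = d.Dependent_name
""").fetchall()

print(query_a, query_b)
```

On these sample rows query a returns John Smith and Joyce English and query b returns no rows, matching the results listed in the solution.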
Step 7 of 9 c) Query:
SELECT emp.Fname, emp.Lname FROM employee emp, employee emp1 WHERE emp1.Fname = 'Franklin' AND emp1.Lname = 'Wong' AND emp.superssn = emp1.ssn
Step 8 of 9 Result:
Fname Lname
John Smith
Ramesh Narayan
Joyce English
Step 9 of 9 Explanation: The above query uses a self-join to display the names of all the employees who are under the direct supervision of Franklin Wong.
Chapter 6, Problem 11E
Problem Specify the updates of Exercise using the SQL update commands.
Exercise What is meant by a recursive relationship type? Give some examples of recursive relationship types.
If the same entity type participates more than once in a relationship type, in different roles, the relationship type is called a recursive relationship. Recursive relationships are unary: only one entity type participates, so the degree of the relationship is one. The connectivity may be 1:1, 1:M, or M:N. For example, in the figure below, REPORTS_TO is a recursive relationship, as the Employee entity type plays two roles: 1) Supervisor and 2) Subordinate. The relationship can also be defined as a relationship between a manager and an employee, since a manager is also an employee. To implement a recursive relationship, a foreign key holding the employee's manager number is stored in each employee record.
Emp_entity(Emp_no, Emp_Fname, Emp_Lname, Emp_DOB, Emp_NI_Number, Manager_no); Manager no - (this is the employee no of the
Chapter 6, Problem 12E
Problem Specify the following queries in SQL on the database schema of Figure 1.2. a. Retrieve the names of all senior students majoring in 'cs' (computer science). b. Retrieve the names of all courses taught by Professor King in 2007 and 2008. c. For each section taught by Professor King, retrieve the course number, semester, year, and number of students who took the section. d. Retrieve the name and transcript of each senior student (Class = 4) majoring in CS. A transcript includes course name, course number, credit hours, semester, year, and grade for each course completed by the student.
Step-by-step solution
Step 1 of 4 a. The query to display the names of senior students majoring in CS is as follows:
Query: SELECT Name FROM STUDENT WHERE Major = 'CS' AND Class = 4;
Output:
Explanation:
• There are no rows in the database where Class is Senior and Major is CS.
• SELECT specifies the fields to be returned by the query.
o Name is a column of the STUDENT table.
• FROM specifies the table from which the data is retrieved.
o STUDENT is a table name.
• WHERE specifies the condition that retrieved rows must satisfy. In the database, seniors are represented by Class 4. The condition is as follows:
o Major = 'CS' AND Class = 4
Step 2 of 4 b. The query to get the names of the courses taught by Professor King in 2007 and 2008 is as follows:
Query: SELECT Course_name FROM COURSE, SECTION WHERE COURSE.Course_number = SECTION.Course_number AND Instructor = 'King' AND (Year = '07' OR Year = '08');
Output:
Explanation:
• SELECT specifies the fields to be returned by the query.
o Course_name is a column of the COURSE table.
• FROM specifies the tables from which the data is retrieved.
o COURSE, SECTION are table names.
• WHERE specifies the conditions that retrieved rows must satisfy. The conditions are as follows:
o COURSE.Course_number = SECTION.Course_number
o Instructor = 'King'
o (Year = '07' OR Year = '08')
• The conditions are combined with the AND operator, so all of them must be satisfied.
Step 3 of 4 c. The query to retrieve the course number, semester, year, and number of students for each section taught by Professor King is as follows:
Query: SELECT S.Course_number, S.Semester, S.Year, COUNT(G.Student_number) AS "Number of Students" FROM SECTION AS S, GRADE_REPORT AS G WHERE S.Instructor = 'King' AND S.Section_identifier = G.Section_identifier GROUP BY S.Section_identifier, S.Course_number, S.Semester, S.Year;
Output:
Explanation:
• SELECT specifies the fields to be returned by the query.
o Course_number, Semester, Year are columns of the SECTION table.
• FROM specifies the tables from which the data is retrieved.
o GRADE_REPORT, SECTION are table names.
• WHERE specifies the conditions that retrieved rows must satisfy. The conditions are as follows:
o S.Instructor = 'King'
o S.Section_identifier = G.Section_identifier
• GROUP BY forms one group per section, so that COUNT returns the number of students for each section rather than a single total.
Step 4 of 4 d. The query to display the name and transcript of each senior student majoring in CS is as follows:
Query: SELECT ST.Name, C.Course_name, C.Course_number, C.Credit_hours, S.Semester, S.Year, G.Grade FROM STUDENT AS ST, COURSE AS C, SECTION AS S, GRADE_REPORT AS G WHERE Class = 4 AND Major = 'CS' AND ST.Student_number = G.Student_number AND G.Section_identifier = S.Section_identifier AND S.Course_number = C.Course_number;
Output: No rows selected.
Explanation:
• SELECT specifies the fields to be returned by the query.
o Course_name, Course_number, Credit_hours are columns of the COURSE table.
o Semester, Year are columns of the SECTION table.
o Name is the columns of STUDENT table. o Grade is the columns of GRADE_REPORT table. • FROM is used to query the database and get back the preferred information by specifying the table name. o STUDENT, COURSE, GRADE_REPORT, SECTION are table names. o ST is the alias name for STUDENT table. o G is the alias name for GRADE_REPORT table. o S is the alias name for SECTION table. o C is the alias name for COURSE table. • WHERE is used to specify a condition based on which the data is to be retrieved. The conditions are as follows: o Class = 4 o Major='CS' o ST.Student_number= G.Student_number o G.Section_identifier= S.Section_identifier o S.Course_number= C.Course_number Comment Chapter 6, Problem 13E Problem Write SQL update statements to do the following on the database schema shown in Figure 1.2. a. Insert a new student, , in the database. b. Change the class of student ‘Smith’ to 2. c. Insert a new course, <’Knowledge Engineering’, ‘cs4390’, 3, ‘cs’>. d. Delete the record for the student whose name is ‘Smith’ and whose student number is 17. Step-by-step solution Step 1 of 4 a. The query to insert a new student into STUDENT relation is as follows: Query: INSERT INTO STUDENT VALUES ('Johnson', 25, 1, 'MATH'); Explanation: • INSERT command is used to insert a row into a relation. • STUDENT is the name of the relation. Output: Comment Step 2 of 4 b. The query to update the class of a student with name Smith to 2 is as follows: Query: UPDATE STUDENT SET CLASS = 2 WHERE Name='Smith'; Explanation: • UPDATE command is used to modify the data in a relation. • STUDENT is the name of the relation. • SET is used to specify the new value for a column. • WHERE is used to specify a condition based on which the data is to be retrieved. Output: Comment Step 3 of 4 c. Query: INSERT INTO COURSE VALUES ('Knowledge Engineering','cs4390', 3,'cs'); Explanation: • INSERT command is used to insert a row into a relation. • COURSE is the name of the relation. Output: Comment Step 4 of 4 d. 
Query: DELETE FROM STUDENT WHERE Name='Smith' AND Student_number=17; Explanation: • DELETE command is used to delete a row from the specified relation. • STUDENT is the name of the relation. • WHERE is used to specify a condition based on which the data is to be retrieved. Output: Chapter 6, Problem 14E Problem Design a relational database schema for a database application of your choice. a. Declare your relations using the SQL DDL. b. Specify a number of queries in SQL that are needed by your database application. c. Based on your expected use of the database, choose some attributes that should have indexes specified on them. d. Implement your database, if you have a DBMS that supports SQL. Step-by-step solution Step 1 of 6 Consider a student database that stores the information about students, courses and faculty. a. The DDL statement to create the relation STUDENT is as follows: CREATE TABLE STUDENT ( StudentID int(11) NOT NULL, FirstName varchar(20) NOT NULL, LastName varchar(20) NOT NULL, Address varchar(30) NOT NULL, DOB date, Gender char ); The DDL statement to add a primary key to the relation STUDENT is as follows: ALTER TABLE STUDENT ADD PRIMARY KEY (StudentID); The DDL statement to create the relation COURSE is as follows: CREATE TABLE COURSE ( CourseID varchar(30) NOT NULL, CourseName varchar(30) NOT NULL, PRIMARY KEY (CourseID) ); The DDL statement to create the relation FACULTY is as follows: CREATE TABLE FACULTY ( FacultyID int(11) NOT NULL, FacultyName varchar(30) NOT NULL, PRIMARY KEY (FacultyID) ); The DDL statement to create the relation REGISTRATION is as follows: CREATE TABLE REGISTRATION ( StudentID int(11) NOT NULL, CourseID varchar(30) NOT NULL, PRIMARY KEY (StudentID, CourseID) ); The DDL statement to create the relation TEACHES is as follows: CREATE TABLE TEACHES ( FacultyID int(11) NOT NULL, CourseID varchar(30) NOT NULL, DateQualified varchar(12), PRIMARY KEY (FacultyID,CourseID) ); The DDL statement to add a column GradePoints to the 
relation COURSE is as follows: ALTER TABLE COURSE ADD COLUMN GradePoints int(2); Comment Step 2 of 6 b. A wide number of queries can be written using the five relations based on the requirement of the user. So, the number of queries is not fixed and will vary. Some of the possible queries that are needed by the database application are as follows: The query to retrieve the details of the students is as follows: SELECT * FROM STUDENT; The query to retrieve the details of the faculties is as follows: SELECT * FROM FACULTY; The query to retrieve the details of the courses offered is as follows: SELECT * FROM COURSE; The query to retrieve which course is taught by which faulty is as follows: SELECT * FROM TEACHES; The query to retrieve the names of the students who have registered for a course is as follows: SELECT FirstName, LastName FROM STUDENT, REGISTRATION WHERE STUDENT.StudentID=REGISTRATION.StudentID; Comment Step 3 of 6 The query to retrieve the details of the male students is as follows: SELECT * FROM STUDENT WHERE GENDER= 'M'; The query to retrieve the courses with grade point 3 and above is as follows: SELECT * FROM COURSE WHERE GradePoints >=3; Comment Step 4 of 6 c. Indexes are used for faster retrieval of data. Some of the attributes that can used as indexes are as follows: • An index can be specified on FirstName in STUDENT relation. • An index can be specified on LastName in STUDENT relation. • An index can be specified on CourseName in COURSE relation. • An index can be specified on FacultyName in FACULTY relation. Comment Step 5 of 6 d. The implementation of the student database is as follows: Comment Step 6 of 6 Comment Chapter 6, Problem 15E Problem Consider that the EMPLOYEE table’s constraint EMPSUPERFK as specified in Figure 6.2 is changed to read as follows: CONSTRAINT EMPSUPERFK FOREIGN KEY (Super_ssn) REFERENCES EMPLOYEE(Ssn) Answer the following questions: a. 
What happens when the following command is run on the database state shown in Figure 5.6? DELETE FROM EMPLOYEE WHERE Lname = 'Borg' b. Is it better to CASCADE or SET NULL in case of EMPSUPERFK constraint ON DELETE?
Step-by-step solution
Step 1 of 2 a) Take the EMPLOYEE table of Figure 6.2 in the textbook with the constraint specified as CONSTRAINT EMPSUPERFK FOREIGN KEY (Super_ssn) REFERENCES EMPLOYEE(Ssn) ON DELETE CASCADE ON UPDATE CASCADE. On the database state of Figure 5.6 the result is as follows. The James E. Borg entry is deleted from the table, and each employee with him as a supervisor is also deleted (and their supervisees, and so on). In total, 8 rows are deleted and the table is empty.
Step 2 of 2 b) It is better to SET NULL, since an employee is not fired (deleted) when their supervisor is deleted. Instead, their Super_ssn should be set to NULL so that they can later get a new supervisor.
Chapter 6, Problem 16E
Problem Write SQL statements to create a table EMPLOYEE_BACKUP to back up the EMPLOYEE table shown in Figure 5.6.
Step-by-step solution
Step 1 of 4 Step 1: Create the table EMPLOYEE as follows:
CREATE TABLE EMPLOYEE (
  Fname varchar(15) NOT NULL,
  Minit char(1) DEFAULT NULL,
  Lname varchar(15) NOT NULL,
  Ssn char(9) NOT NULL,
  Bdate date DEFAULT NULL,
  Address varchar(30) DEFAULT NULL,
  Sex char(1) DEFAULT NULL,
  Salary decimal(10,2) DEFAULT NULL,
  Super_ssn char(9) DEFAULT NULL,
  Dno int(11) NOT NULL,
  PRIMARY KEY (Ssn) );
Step 2: Insert the data into the EMPLOYEE table using the INSERT command.
INSERT INTO EMPLOYEE VALUES ('James', 'E', 'Borg', '888665555', DATE '1937-11-10', '450 Stone, Houston, TX', 'M', 55000, NULL, 1); INSERT INTO EMPLOYEE VALUES ('Jennifer', 'S', 'Wallace', '987654321', DATE '1941-06-20', '291 Berry, Bellaire, Tx', 'F', 37000, '888665555', 4); INSERT INTO EMPLOYEE VALUES ('Franklin', 'T', 'Wong', '333445555', DATE '1955-12-08', '638 Voss, Houston, TX', 'M', 40000, '888665555', 5); INSERT INTO EMPLOYEE VALUES ('John', 'B', 'Smith', '123456789', DATE '1965-01-09', '731 Fondren, Houston, TX', 'M', 30000, '333445555', 5); INSERT INTO EMPLOYEE VALUES ('Alicia', 'J', 'Zelaya', '999887777', DATE '1968-01-19', '3321 castle, Spring, TX', 'F', 25000, '987654321', 4); INSERT INTO EMPLOYEE VALUES ('Ramesh', 'K', 'Narayan', '666884444', DATE '1920-09-15', '975 Fire Oak, Humble, TX', 'M', 38000, '333445555', 5); INSERT INTO EMPLOYEE VALUES ('Joyce', 'A', 'English', '453453453', DATE '1972-07-31', '5631 Rice, Houston, TX', 'F', 25000, '333445555', 5); INSERT INTO EMPLOYEE VALUES ('Ahmad', 'V', 'Jabbar', '987987987', DATE '1969-03-29', '980 Dallas, Houston, TX', 'M', 22000, '987654321', 4); INSERT INTO EMPLOYEE VALUES ('Melissa', 'M', 'Jones', '808080808', DATE '1970-07-10', '1001 Western, Houston, TX', 'F', 27500, '333445555', 5); Step3: Now, select the EMPLOYEE table to display all the rows. select * from EMPLOYEE; Sample Output: Comment Step 2 of 4 The SQL statements to create a table EMPLOYEE_BACKUP to store the backup data of EMPLOYEE table is as follows: The SQL statement to create the EMPLOYEE_BACKUP table: CREATE TABLE EMPLOYEE_BACKUP LIKE EMPLOYEE; Explanation: • The SQL statement will create the table EMPLOYEE_BACKUP with the same structure as the table EMPLOYEE. • CREATE TABLE is the command to create a table. • LIKE is the keyword used to copy the structure of the table EMPLOYEE. 
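Note that CREATE TABLE ... LIKE is MySQL syntax and is not accepted by every DBMS. A portable way to get the same effect is to repeat the column definitions for the backup table and then copy the rows with INSERT ... SELECT. The following is a minimal sketch, assuming Python's built-in sqlite3 module, a reduced set of EMPLOYEE columns, and two sample rows only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced EMPLOYEE table (assumption: only four columns, two sample rows).
CREATE TABLE EMPLOYEE (
    Fname  TEXT NOT NULL,
    Lname  TEXT NOT NULL,
    Ssn    TEXT PRIMARY KEY,
    Salary REAL
);
INSERT INTO EMPLOYEE VALUES
    ('James', 'Borg', '888665555', 55000),
    ('Franklin', 'Wong', '333445555', 40000);

-- Backup table with the same structure, declared explicitly.
CREATE TABLE EMPLOYEE_BACKUP (
    Fname  TEXT NOT NULL,
    Lname  TEXT NOT NULL,
    Ssn    TEXT PRIMARY KEY,
    Salary REAL
);

-- Copy every row from EMPLOYEE into the backup.
INSERT INTO EMPLOYEE_BACKUP SELECT * FROM EMPLOYEE;
""")

original = conn.execute("SELECT * FROM EMPLOYEE ORDER BY Ssn").fetchall()
backup = conn.execute("SELECT * FROM EMPLOYEE_BACKUP ORDER BY Ssn").fetchall()
print(backup == original, len(backup))
```

Declaring the backup table explicitly keeps the NOT NULL and PRIMARY KEY constraints, which a plain CREATE TABLE ... AS SELECT would not carry over in SQLite.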
Step 3 of 4 The SQL statement to insert the data into EMPLOYEE_BACKUP:
INSERT INTO EMPLOYEE_BACKUP (SELECT * FROM EMPLOYEE);
Explanation:
• The SQL statement inserts the data from the table EMPLOYEE into the table EMPLOYEE_BACKUP.
Step 4 of 4
• SELECT * FROM EMPLOYEE fetches all the records from the table EMPLOYEE.
Sample Output:
Chapter 7, Problem 1RQ
Problem Describe the six clauses in the syntax of an SQL retrieval query. Show what type of constructs can be specified in each of the six clauses. Which of the six clauses are required and which are optional?
Step-by-step solution
Step 1 of 3 A query in SQL consists of up to six clauses. The clauses are specified in the following order.
• SELECT < attribute list >
• FROM < table list >
• [ WHERE < condition > ]
• [ GROUP BY < grouping attribute(s) > ]
• [ HAVING < group condition > ]
• [ ORDER BY < attribute list > ]
Step 2 of 3 The SELECT clause lists the attributes or functions whose values the query returns. The FROM clause specifies the tables from which the data is retrieved. The WHERE clause is a conditional clause: it restricts the rows that are retrieved. The GROUP BY clause groups the rows according to the values of the grouping attributes. The HAVING clause restricts which groups produced by GROUP BY appear in the result. The ORDER BY clause sorts the values returned by the query in a specific order.
Step 3 of 3 The SELECT and FROM clauses are required; the WHERE, GROUP BY, HAVING, and ORDER BY clauses are optional.
Chapter 7, Problem 2RQ
Problem Describe conceptually how an SQL retrieval query will be executed by specifying the conceptual order of executing each of the six clauses.
Step-by-step solution
Step 1 of 1 A retrieval query in SQL can consist of up to six clauses, but only the first two, SELECT and FROM, are mandatory.
The clauses are specified in the following order, with the clauses between square brackets [...] being optional: SELECT < attribute list > FROM < table list > [ WHERE < condition > ] [ GROUP BY < grouping attribute(s) > ] [ HAVING < group condition > ] [ ORDER BY < attribute list > ]
The SELECT clause lists the attributes or functions to be retrieved. The FROM clause specifies all relations needed in the query, including joined relations, but not those in nested queries. The WHERE clause specifies the conditions for selecting tuples from these relations, including join conditions if needed. GROUP BY specifies grouping attributes, and HAVING specifies a condition on the groups being selected rather than on individual tuples. ORDER BY specifies an order for displaying the result of a query. A query is evaluated conceptually by first applying the FROM clause, followed by the WHERE clause, and then GROUP BY and HAVING. The values of the attributes specified in the SELECT clause are then shown in the result, and ORDER BY is applied at the end to sort the query result.
Chapter 7, Problem 3RQ
Problem Discuss how NULLs are treated in comparison operators in SQL. How are NULLs treated when aggregate functions are applied in an SQL query? How are NULLs treated if they exist in grouping attributes?
Step-by-step solution
Step 1 of 1 In SQL, NULL is treated as an UNKNOWN value. SQL uses a three-valued logic with the values TRUE, FALSE, and UNKNOWN. SQL treats each NULL as distinct from every other NULL, so a comparison involving NULL with =, <, or > evaluates to UNKNOWN; to test for NULL, the IS NULL or IS NOT NULL operator is used instead. In general, NULL values are discarded when aggregate functions are applied to a particular column. If NULL exists in the grouping attribute, a separate group is created for all tuples with a NULL value in the grouping attribute.
Chapter 7, Problem 4RQ
Problem Discuss how each of the following constructs is used in SQL, and discuss the various options for each construct. Specify what each construct is useful for. a. Nested queries b. Joined tables and outer joins c. Aggregate functions and grouping d. Triggers e.
Assertions and how they differ from triggers f. The SQL WITH clause g. SQL CASE construct h. Views and their updatability i. Schema change commands
Step-by-step solution
Step 1 of 11 a. Nested Queries: A nested query is a complete SQL query placed inside another SQL query, typically in its WHERE clause. It is also known as a subquery or inner query.
Options: It can be used within SELECT, INSERT, UPDATE, and DELETE statements, together with the comparison operators <, >, <=, >=, = and with IN and BETWEEN.
Syntax: For example, to get the employee id of all employees who are in the same business as an employee with salary 35000, the query has the general form SELECT * FROM <table> WHERE <column> IN (<nested query>)
Use: The outer query keeps the rows whose values compare successfully against the values returned by the inner query.
Step 2 of 11 b. Joined Tables: A joined table is the table that results from an inner join, an outer join, or a cross join.
Uses of Joined Tables: A joined table can be used in the FROM clause wherever a table is expected in a SELECT statement.
Outer Join: The types of outer join are:
1) Left outer join: returns all the rows from the left table together with the matching rows from the right table; left rows without a match are padded with NULLs. It is denoted by the symbol ⟕.
Syntax: SELECT column FROM table_A LEFT JOIN table_B ON table_A.column1 = table_B.column2;
2) Right outer join: returns all the rows from the right table together with the matching rows from the left table; right rows without a match are padded with NULLs. It is denoted by the symbol ⟖.
Syntax: SELECT column FROM table_A RIGHT JOIN table_B ON table_A.column1 = table_B.column2;
3) Full outer join: returns all the rows from both the left and the right table, matched where possible. It is denoted by the symbol ⟗.
Syntax: SELECT column FROM table_A FULL OUTER JOIN table_B ON table_A.column1 = table_B.column2;
Options: A join is specified in the FROM clause of a SELECT statement, with the join condition given after ON.
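The difference between an inner join and the outer joins above is easiest to see on two tiny tables. This is a minimal sketch, assuming Python's built-in sqlite3 module; table_A, table_B, and their rows are illustrative. Because older SQLite releases do not accept RIGHT JOIN, the right outer join is written here as a left outer join with the tables swapped, which returns the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_A (id INTEGER, a_val TEXT);
CREATE TABLE table_B (id INTEGER, b_val TEXT);
INSERT INTO table_A VALUES (1, 'a1'), (2, 'a2');
INSERT INTO table_B VALUES (2, 'b2'), (3, 'b3');
""")

# Inner join: only rows that match on id.
inner = conn.execute("""
    SELECT a_val, b_val
    FROM table_A JOIN table_B ON table_A.id = table_B.id
""").fetchall()

# Left outer join: every row of table_A, NULL (None) where table_B has no match.
left = conn.execute("""
    SELECT a_val, b_val
    FROM table_A LEFT OUTER JOIN table_B ON table_A.id = table_B.id
    ORDER BY table_A.id
""").fetchall()

# Right outer join of table_A with table_B, expressed as a swapped left join.
right = conn.execute("""
    SELECT a_val, b_val
    FROM table_B LEFT OUTER JOIN table_A ON table_A.id = table_B.id
    ORDER BY table_B.id
""").fetchall()

print(inner, left, right)
```

A full outer join would return the union of the left and right results, with the matched row appearing once.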
Use: A join produces a single result table by combining related rows from two different tables.
Step 3 of 11 c. Aggregate Functions: An aggregate function takes multiple input values from a column and generates a single value as output. Aggregate functions include AVG, COUNT, MAX, MIN, and SUM (some systems also provide FIRST and LAST).
Option: Aggregate functions are used in the SELECT clause of a query or in a HAVING clause.
Use: They are used to summarize data, for example by computing totals, averages, and counts.
Grouping: In many cases the aggregate function should be applied to subgroups of the tuples in a relation, where the subgroups depend on some attribute values. Applying the GROUP BY clause divides the table into such groups.
Syntax for using the GROUP BY clause: SELECT column_name, function(column_name) FROM table_name WHERE column_name operator value GROUP BY column_name;
Options: It can be used with the SELECT, FROM, and WHERE clauses.
Use: The GROUP BY clause is applied when the table needs to be divided into groups according to attribute values.
Step 4 of 11 d. Triggers: A database trigger is procedural code that automatically executes (fires) when an event (INSERT, DELETE, or UPDATE) occurs.
Syntax for trigger: CREATE TRIGGER <trigger name> {BEFORE | AFTER} {INSERT | DELETE | UPDATE} ON <table name> FOR EACH ROW <action>
Options: It can be used with the INSERT, DELETE, and UPDATE statements.
Use: Triggers can be used for the following purposes:
1. To compute derived column values automatically.
2. To strengthen security authorization.
3. To prevent invalid transactions.
Step 5 of 11 e. Assertions: An assertion is a condition that the database must always satisfy. Once an assertion is created, the DBMS checks it after any change to the database that may violate the condition.
Syntax for assertions: CREATE ASSERTION <assertion name> CHECK (<predicate>) The predicate always evaluates to either true or false.
Option: It can be used with the CREATE, CHECK, and FROM keywords.
Use: It is used only to check a condition on the database; it does not modify data.
The following table shows the differences between ASSERTIONS and TRIGGERS:
ASSERTIONS | TRIGGERS
An assertion only checks the condition; it does not modify the data. | A trigger checks the condition and, if required, also changes the data.
An assertion is linked neither to a particular table nor to particular events in the database. | A trigger is linked both to a particular table and to particular events in the database.
Every assertion can be implemented as a trigger. | Not every trigger can be implemented as an assertion.
Oracle database does not implement assertions. | Oracle database implements triggers.
Step 6 of 11 f. The SQL WITH clause: This clause was introduced as a convenience in SQL:99 and was added to the Oracle SQL syntax in Oracle 9.2; it may not be available in all SQL-based DBMSs. It allows the user to define a table that is used only in one particular query. It is similar to creating a view that is used in a particular query and then dropped.
Syntax for the SQL WITH clause: WITH <temporary table> AS (<query>) SELECT <column name> FROM <table name> WHERE <condition> GROUP BY <column name>;
Option: It can be used with the SELECT, FROM, WHERE, and GROUP BY clauses.
Use: It can be used to break a complex SQL query down into simpler parts, which makes the query easier to debug and process.
Step 7 of 11 g. SQL CASE construct: The SQL CASE construct plays the same role in SQL as the if-then-else construct in Java. It is used when a value differs depending on a particular condition, and it can appear in any SQL query where such conditional values have to be produced.
Syntax of the SQL CASE construct: CASE expression WHEN condition_a THEN result_1 WHEN condition_b THEN result_2 WHEN condition_c THEN result_3 ELSE result END;
Step 8 of 11 Option: It can be used with the SELECT and FROM clauses.
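The WITH clause and the CASE construct are often used together. The following is a minimal sketch, assuming Python's built-in sqlite3 module, a three-column EMPLOYEE table, and made-up rows; DEPT_AVG is a hypothetical name for the temporary table. The WITH clause computes each department's average salary, and CASE labels each employee relative to that average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (Lname TEXT, Salary INTEGER, Dno INTEGER);
INSERT INTO EMPLOYEE VALUES
    ('Borg', 55000, 1),
    ('Wong', 40000, 5),
    ('Smith', 30000, 5);
""")

rows = conn.execute("""
    WITH DEPT_AVG (Dno, Avg_sal) AS (
        -- temporary table that exists only for this query
        SELECT Dno, AVG(Salary) FROM EMPLOYEE GROUP BY Dno
    )
    SELECT E.Lname,
           CASE WHEN E.Salary > D.Avg_sal THEN 'above'
                WHEN E.Salary < D.Avg_sal THEN 'below'
                ELSE 'equal'
           END AS Position
    FROM EMPLOYEE AS E JOIN DEPT_AVG AS D ON E.Dno = D.Dno
    ORDER BY E.Lname
""").fetchall()

print(rows)
```

Without the WITH clause the same query would need a view or a repeated subquery for the department averages.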
Step 9 of 11 Use: It can be used to perform an operation only when a particular condition occurs.
Step 10 of 11 h. Views and their updatability: A view is a virtual table that is derived from other tables, which are called base tables. The base tables physically exist, and their tuples are stored in the database.
Syntax for creating a view: CREATE VIEW <virtual table> AS SELECT <attributes> FROM <tables> WHERE <conditions>;
The statement names the view; the SELECT after AS lists the attributes that make up the virtual table, the FROM clause names the tables from which those attributes are taken, and the WHERE clause gives the condition that the rows of the virtual table must satisfy.
Option: It can be used with the AS SELECT, FROM, and WHERE clauses.
Use: A view is created when the same derived table needs to be referenced frequently.
Step 11 of 11 i. Schema change commands: The schema change commands are used in SQL to alter a schema by adding or dropping attributes, tables, constraints, and other schema elements. This can be done while the database is operational and does not require the database schema to be recompiled. The schema change commands are as follows:
• The DROP command
• The ALTER command
DROP command: The DROP command can be used to drop schema elements such as tables, attributes, and constraints. The whole schema can be dropped with the command DROP SCHEMA.
Syntax of the DROP command: DROP SCHEMA employee CASCADE;
ALTER command: The schema can be changed with the ALTER command, for example by changing a column name or by adding or dropping attributes.
Syntax of the ALTER command: ALTER TABLE employee ADD COLUMN phone_no VARCHAR(15);
Use: These commands are used to change the schema or to drop the schema.
Chapter 7, Problem 5E
Problem Specify the following queries on the database in Figure 5.5 in SQL.
Show the query results if each query is applied to the database state in Figure 5.6.
a. For each department whose average employee salary is more than $30,000, retrieve the department name and the number of employees working for that department.
b. Suppose that we want the number of male employees in each department making more than $30,000, rather than all employees (as in Exercise a). Can we specify this query in SQL? Why or why not?

Step-by-step solution
Step 1 of 2
a) The query to retrieve the department name and the number of employees for each department whose average salary is greater than 30000 is as follows:
Query:
    SELECT Dname, COUNT(*)
    FROM DEPARTMENT, EMPLOYEE
    WHERE DEPARTMENT.Dnumber = EMPLOYEE.Dno
    GROUP BY Dname
    HAVING AVG(Salary) > 30000;
Output (on the database state in Figure 5.6):
    Dname             COUNT(*)
    Research          4
    Administration    3
    Headquarters      1
Explanation:
• SELECT is used to query the database and get back the specified fields.
  o Dname is an attribute of the DEPARTMENT table.
• FROM specifies the tables from which the information is retrieved.
  o EMPLOYEE and DEPARTMENT are table names.
• WHERE specifies the condition on which the data is retrieved. The condition is:
  o DEPARTMENT.Dnumber = EMPLOYEE.Dno
• GROUP BY groups the rows that have the same value in the grouping attribute.
  o Dname is the grouping attribute.
• HAVING specifies a condition on each group.
  o AVG(Salary) > 30000 is the condition.
• COUNT(*) counts the number of tuples in each group that satisfies the conditions.

Step 2 of 2
(b) Yes, the query can be specified in SQL. Reading the request as counting, for each department, the male employees whose own salary is greater than 30000, the conditions on Sex and Salary are placed in the WHERE clause so that they filter individual tuples before grouping:
Query:
    SELECT Dname, COUNT(*)
    FROM DEPARTMENT, EMPLOYEE
    WHERE DEPARTMENT.Dnumber = EMPLOYEE.Dno AND Sex = 'M' AND Salary > 30000
    GROUP BY Dname;
Output (on the database state in Figure 5.6):
    Dname           COUNT(*)
    Research        2
    Headquarters    1
Explanation:
• SELECT is used to query the database and get back the specified fields.
  o Dname is an attribute of the DEPARTMENT table.
• FROM specifies the tables from which the information is retrieved.
  o EMPLOYEE and DEPARTMENT are table names.
• WHERE specifies the conditions on which the data is retrieved. The conditions are as follows:
  o DEPARTMENT.Dnumber = EMPLOYEE.Dno
  o Sex = 'M'
  o Salary > 30000
• GROUP BY groups the rows that have the same value in the grouping attribute.
  o Dname is the grouping attribute.

Chapter 7, Problem 6E Problem Specify the following queries in SQL on the database schema in Figure 1.2.
a. Retrieve the names and major departments of all straight-A students (students who have a grade of A in all their courses).
b. Retrieve the names and major departments of all students who do not have a grade of A in any of their courses.

Step-by-step solution
Step 1 of 2
a. The query to retrieve the names and major departments of the students who got an A grade in all their courses is as follows:
Query:
    SELECT Name, Major
    FROM STUDENT
    WHERE NOT EXISTS (SELECT *
                      FROM GRADE_REPORT
                      WHERE Student_number = STUDENT.Student_number
                        AND NOT (Grade = 'A'));
Explanation:
• SELECT is used to query the database and get back the specified fields.
  o Name and Major are columns of the STUDENT table.
• FROM specifies the table from which the information is retrieved.
  o STUDENT is a table name.
• WHERE specifies the condition on which the data is retrieved.
• The inner query retrieves the grade reports of the current student that have a grade other than A.
• The outer query retrieves the name and major of each student for whom the inner query returns no rows, that is, each student who got an A in all courses.
• NOT EXISTS is true only when the inner query returns no rows.

Step 2 of 2 b.
The query to retrieve the names and major departments of the students who do not have a grade of A in any of their courses is as follows:
Query:
    SELECT Name, Major
    FROM STUDENT
    WHERE NOT EXISTS (SELECT *
                      FROM GRADE_REPORT
                      WHERE Student_number = STUDENT.Student_number
                        AND Grade = 'A');
Explanation:
• SELECT is used to query the database and get back the specified fields.
  o Name and Major are columns of the STUDENT table.
• FROM specifies the table from which the information is retrieved.
  o STUDENT is a table name.
• WHERE specifies the condition on which the data is retrieved.
• The inner query retrieves the grade reports of the current student that have a grade of A.
• The outer query retrieves the name and major of each student for whom the inner query returns no rows, that is, each student who did not get an A in any course.
• NOT EXISTS is true only when the inner query returns no rows.

Chapter 7, Problem 7E Problem In SQL, specify the following queries on the database in Figure 5.5 using the concept of nested queries and other concepts described in this chapter.
a. Retrieve the names of all employees who work in the department that has the employee with the highest salary among all employees.
b. Retrieve the names of all employees whose supervisor's supervisor has '888665555' for Ssn.
c. Retrieve the names of employees who make at least $10,000 more than the employee who is paid the least in the company.

Step-by-step solution
Step 1 of 4
SQL: Structured Query Language (SQL) is a database language for managing and accessing the data in a relational database.
• SQL provides statements to insert, update, delete, and retrieve records from a database. It can also create a new database and new database tables.
Nested query: Some queries require that existing values in the database be fetched first and then used in a comparison condition. Such a query is referred to as a nested query.
In a nested query, a complete "select-from-where" block appears inside the WHERE clause of another query. The enclosing query is referred to as the outer query.
The format of the SELECT statement is:
    SELECT attribute-list FROM table-list WHERE condition
  o Here, SELECT, FROM, and WHERE are keywords.
  o "attribute-list" is the list of attributes.
    • To retrieve all the attributes of a table, an asterisk (*) can be used instead of listing every attribute.
  o "table-list" is the list of tables.
  o The condition is optional.

Step 2 of 4
a) Query:
    SELECT Lname
    FROM EMPLOYEE
    WHERE Dno IN (SELECT Dno
                  FROM EMPLOYEE
                  WHERE Salary = (SELECT MAX(Salary) FROM EMPLOYEE));
Explanation: The innermost query finds the highest salary among all employees. The middle query selects the department number of the employee (or employees) earning that salary. The outer query selects the names of all employees who work in that department. IN is used instead of = because more than one employee may share the highest salary.

Step 3 of 4
b) Query:
    SELECT Lname
    FROM EMPLOYEE
    WHERE Super_ssn IN (SELECT Ssn
                        FROM EMPLOYEE
                        WHERE Super_ssn = '888665555');
Explanation: The nested query selects the Ssn of every employee whose supervisor has Ssn '888665555'. The outer query selects the names of the employees who are supervised by one of those employees, so their supervisor's supervisor is '888665555'.

Step 4 of 4
c) Query:
    SELECT Lname
    FROM EMPLOYEE
    WHERE Salary >= 10000 + (SELECT MIN(Salary) FROM EMPLOYEE);
Explanation: The nested query selects the lowest salary in the company. The outer query selects the names of the employees whose salary is at least 10,000 more than that lowest salary, which is why the comparison is >= rather than >.

Chapter 7, Problem 8E Problem Specify the following views in SQL on the COMPANY database schema shown in Figure 5.5.
a. A view that has the department name, manager name, and manager salary for every department
b. A view that has the employee name, supervisor name, and employee salary for each employee who works in the 'Research' department
c. A view that has the project name, controlling department name, number of employees, and total hours worked per week on the project for each project
d.
A view that has the project name, controlling department name, number of employees, and total hours worked per week on the project for each project with more than one employee working on it

Step-by-step solution
Step 1 of 4
a. A view that has the department name along with the name and salary of the manager for every department is as follows:
    CREATE VIEW MANAGER_INFORMATION AS
    SELECT Dname, Fname AS Manager_first_name, Lname AS Manager_last_name, Salary
    FROM DEPARTMENT, EMPLOYEE
    WHERE DEPARTMENT.Mgr_ssn = EMPLOYEE.Ssn;
Explanation:
• CREATE VIEW creates a view named MANAGER_INFORMATION.
• SELECT is used to query the database and get back the specified fields.
  o Dname is an attribute of the DEPARTMENT table.
  o Fname, Lname and Salary are attributes of the EMPLOYEE table.
• FROM specifies the tables from which the information is retrieved.
  o DEPARTMENT and EMPLOYEE are table names.
• WHERE specifies the condition on which the data is retrieved.
  o DEPARTMENT.Mgr_ssn = EMPLOYEE.Ssn is the condition.

Step 2 of 4
b. A view that has the employee name, supervisor name and employee salary for each employee who works in the Research department is as follows:
    CREATE VIEW EMPLOYEE_INFORMATION AS
    SELECT e.Fname AS Employee_first_name, e.Minit AS Employee_middle_init, e.Lname AS Employee_last_name,
           s.Fname AS Manager_fname, s.Minit AS Manager_minit, s.Lname AS Manager_Lname,
           e.Salary
    FROM EMPLOYEE AS e, EMPLOYEE AS s, DEPARTMENT AS d
    WHERE e.Super_ssn = s.Ssn AND e.Dno = d.Dnumber AND d.Dname = 'Research';
Salary is qualified as e.Salary because both e and s refer to EMPLOYEE; an unqualified Salary would be ambiguous.
Explanation:
• CREATE VIEW creates a view named EMPLOYEE_INFORMATION.
• SELECT is used to query the database and get back the specified fields.
  o Dname is an attribute of the DEPARTMENT table.
  o Fname, Lname, Minit and Salary are attributes of the EMPLOYEE table.
• FROM specifies the tables from which the information is retrieved.
  o DEPARTMENT and EMPLOYEE are table names.
  o e and s are alias names of the EMPLOYEE table (e for the employee, s for the supervisor).
  o d is the alias name of the DEPARTMENT table.
• WHERE specifies the conditions on which the data is retrieved. The conditions specified in the query are:
  o e.Super_ssn = s.Ssn, which pairs each employee with his or her supervisor
  o e.Dno = d.Dnumber
  o d.Dname = 'Research'

Step 3 of 4
c. A view that has the project name, controlling department name, number of employees, and total hours worked per week on the project is as follows:
    CREATE VIEW PROJECT_INFORMATION AS
    SELECT Pname, Dname, COUNT(WO.Essn) AS No_of_employees, SUM(WO.Hours) AS Total_hours
    FROM PROJECT AS P, WORKS_ON AS WO, DEPARTMENT AS D
    WHERE P.Dnum = D.Dnumber AND P.Pnumber = WO.Pno
    GROUP BY P.Pnumber, Pname, Dname;
The column aliases No_of_employees and Total_hours are illustrative names; a view column computed by an aggregate needs some name.
Explanation:
• CREATE VIEW creates a view named PROJECT_INFORMATION.
• SELECT is used to query the database and get back the specified fields.
  o Dname is an attribute of the DEPARTMENT table.
  o Pname is an attribute of the PROJECT table.
  o Essn and Hours are attributes of the WORKS_ON table.
• FROM specifies the tables from which the information is retrieved.
  o PROJECT, WORKS_ON and DEPARTMENT are table names.
  o P is the alias name of the PROJECT table.
  o D is the alias name of the DEPARTMENT table.
  o WO is the alias name of the WORKS_ON table.
• WHERE specifies the conditions on which the data is retrieved. The conditions specified in the query are:
  o P.Dnum = D.Dnumber
  o P.Pnumber = WO.Pno
• GROUP BY groups the rows that have the same values in the grouping attributes.
  o The project number is the grouping attribute; Pname and Dname are listed as well because every non-aggregated column in the SELECT list must appear in GROUP BY.

Step 4 of 4
d. The following is the view that has the project name, controlling department name, number of employees, and total hours worked per week on the project for each project with more than one employee working on it.
    CREATE VIEW PROJECT_INFO AS
    SELECT Pname, Dname, COUNT(WO.Essn) AS No_of_employees, SUM(WO.Hours) AS Total_hours
    FROM PROJECT AS P, WORKS_ON AS WO, DEPARTMENT AS D
    WHERE P.Dnum = D.Dnumber AND P.Pnumber = WO.Pno
    GROUP BY P.Pnumber, Pname, Dname
    HAVING COUNT(WO.Essn) > 1;
The column aliases No_of_employees and Total_hours are illustrative names; a view column computed by an aggregate needs some name.
Explanation:
• CREATE VIEW creates a view named PROJECT_INFO.
• SELECT is used to query the database and get back the specified fields.
  o Dname is an attribute of the DEPARTMENT table.
  o Pname is an attribute of the PROJECT table.
  o Essn and Hours are attributes of the WORKS_ON table.
• FROM specifies the tables from which the information is retrieved.
  o PROJECT, WORKS_ON and DEPARTMENT are table names.
  o P is the alias name of the PROJECT table.
  o D is the alias name of the DEPARTMENT table.
  o WO is the alias name of the WORKS_ON table.
• WHERE specifies the conditions on which the data is retrieved. The conditions specified in the query are:
  o P.Dnum = D.Dnumber
  o P.Pnumber = WO.Pno
• GROUP BY groups the rows that have the same values in the grouping attributes.
  o The project number is the grouping attribute; Pname and Dname are listed as well because every non-aggregated column in the SELECT list must appear in GROUP BY.
• HAVING specifies a condition on each group.
  o COUNT(WO.Essn) > 1 is the condition.

Chapter 7, Problem 9E Problem Consider the following view, DEPT_SUMMARY, defined on the COMPANY database in Figure 5.6:
    CREATE VIEW DEPT_SUMMARY (D, C, Total_s, Average_s) AS SELECT Dno, COUNT
State which of the following queries and updates would be allowed on the view. If a query or update would be allowed, show what the corresponding query or update on the base relations would look like, and give its result when applied to the database in Figure 5.6.
a. SELECT * FROM DEPT_SUMMARY;
b. SELECT D, C FROM DEPT_SUMMARY WHERE TOTAL_S > 100000;
c. SELECT D, AVERAGE_S FROM DEPT_SUMMARY WHERE C > ( SELECT C FROM D
d. UPDATE DEPT_SUMMARY SET D=3 WHERE D = 4;
e. DELETE FROM DEPT_SUMMARY WHERE C > 4;

Step-by-step solution
Step 1 of 5
a) Allowed

| D | C | Total_s | Average_s |
| --- | --- | --- | --- |
| 5 | 4 | 133000 | 33250 |
| 4 | 3 | 93000 | 31000 |
| 1 | 1 | 55000 | 55000 |

Step 2 of 5
b) Allowed

| D | C |
| --- | --- |
| 5 | 4 |

Step 3 of 5
c) Allowed

| D | Average_s |
| --- | --- |
| 5 | 33250 |

Step 4 of 5
d) Not allowed. The view is defined with grouping and aggregate functions, so an update on it cannot be translated into an update on the base relation.

Step 5 of 5
e) Not allowed. The view is defined with grouping and aggregate functions, so the deletion could be mapped to the base relation in more than one way.

Chapter 8, Problem 1RQ Problem List the operations of relational algebra and the purpose of each.

Step-by-step solution
Step 1 of 6
The operations of relational algebra are as follows:
• SELECT
• PROJECT
• THETA JOIN
• EQUI JOIN
• NATURAL JOIN
• UNION
• INTERSECTION
• MINUS or DIFFERENCE
• CARTESIAN PRODUCT
• DIVISION

Step 2 of 6
SELECT operation:
• It is used to obtain a subset of the tuples of a relation based on a condition. In other words, it retrieves only those tuples that satisfy the condition.
• The symbol used to denote the SELECT operation is $\sigma$.
• The notation of the SELECT operation is $\sigma_{\langle\text{selection condition}\rangle}(R)$.
• For example, applying $\sigma$ to relation Employee with the condition that the job is clerk retrieves the tuples of Employee whose job is clerk.
PROJECT operation:
• It is used to obtain certain attributes (columns) of a relation. The attributes to be retrieved must be specified as a list separated by commas.
• The symbol used to denote the PROJECT operation is $\pi$.
• The notation of the PROJECT operation is $\pi_{\langle\text{attribute list}\rangle}(R)$.
• For example, applying $\pi$ to relation Employee with the last name, first name and employee number attributes retrieves only those three attributes for all employees in Employee.

Step 3 of 6
THETA JOIN operation:
• The THETA JOIN operation combines related tuples from two relations into single tuples.
• The symbol used to denote the THETA JOIN operation is $\bowtie$.
• The notation of THETA JOIN between the relations R and S is $R \bowtie_{\theta} S$, where $\theta$ is the join condition.
EQUI JOIN operation:
• An EQUIJOIN operation combines all the tuples of relations R and S that satisfy the condition. The comparison operator must be =.
• The notation of EQUI JOIN between the relations R and S is $R \bowtie_{A = B} S$, where A is an attribute of R and B is an attribute of S.

Step 4 of 6
NATURAL JOIN operation:
• It is similar to EQUIJOIN. The only difference is that the join attributes of relation S are not included in the resultant relation.
• The notation of NATURAL JOIN between the relations R and S is $R * S$.
UNION operation:
• When the UNION operation is applied to relations R and S, the resultant relation consists of all the tuples that are in R, in S, or in both R and S.
• If the same tuple is in both R and S, it appears only once in the resultant relation.
• The UNION operation can be applied to relations R and S only if the relations are union compatible.
• The symbol used to denote the UNION operation is $\cup$.
• The notation of UNION between the relations R and S is $R \cup S$.

Step 5 of 6
INTERSECTION operation:
• When the INTERSECTION operation is applied to relations R and S, the resultant relation consists of only the tuples that are in both R and S.
• The symbol used to denote the INTERSECTION operation is $\cap$.
• The notation of INTERSECTION between the relations R and S is $R \cap S$.
MINUS or DIFFERENCE operation:
• When the DIFFERENCE operation is applied to relations R and S, the resultant relation consists of only the tuples that are in R but not in S.
• The symbol used to denote the DIFFERENCE operation is $-$.
• The notation of DIFFERENCE between the relations R and S is $R - S$.

Step 6 of 6
CARTESIAN PRODUCT operation:
• When the CARTESIAN PRODUCT operation is applied to relations R and S, the resultant relation has all the attributes of R and S and contains every possible combination of a tuple of R with a tuple of S.
• The symbol used to denote the CARTESIAN PRODUCT operation is $\times$.
• The notation of CARTESIAN PRODUCT between the relations R and S is $R \times S$.
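The set-based operations above (UNION, INTERSECTION, DIFFERENCE and CARTESIAN PRODUCT) can be sketched in a few lines of Python by modelling a relation as a set of tuples. The relations R and S and their contents are hypothetical sample data, chosen only so that the two relations are union compatible (same number of attributes, same domains).

```python
from itertools import product

# Two union-compatible relations: (last name, department number).
R = {("Smith", 5), ("Wong", 5), ("Borg", 1)}
S = {("Wong", 5), ("Wallace", 4)}

# UNION: tuples in R, in S, or in both; duplicates collapse automatically.
union = R | S

# INTERSECTION: tuples that are in both R and S.
intersection = R & S

# DIFFERENCE: tuples that are in R but not in S.
difference = R - S

# CARTESIAN PRODUCT: every tuple of R concatenated with every tuple of S.
cartesian = {r + s for r, s in product(R, S)}

print(len(union), intersection, difference, len(cartesian))
```

Because Python sets discard duplicates, the sketch matches the relational (set) semantics described above rather than SQL's multiset behaviour.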
DIVISION operation:
• It is applied to relations R and S where the attributes of S are a subset of the attributes of R. It forms a new relation over the attributes of R that are not in S, containing each tuple that appears in R in combination with every tuple from S.
• The symbol used to denote the DIVISION operation is $\div$.
• The notation of DIVISION between R and S is $R \div S$.

Chapter 8, Problem 2RQ Problem What is union compatibility? Why do the UNION, INTERSECTION, and DIFFERENCE operations require that the relations on which they are applied be union compatible?

Step-by-step solution
Step 1 of 2
Union compatibility: Two relations are said to be union compatible if they have the same number of attributes and each pair of corresponding attributes has the same domain.

Step 2 of 2
The UNION, INTERSECTION and DIFFERENCE operations require that the relations on which they are applied be union compatible because all of these operations are binary set operations. They compare the tuples of the two relations directly, so the tuples must have the same number of attributes and the corresponding attributes must have the same domains.

Chapter 8, Problem 3RQ Problem Discuss some types of queries for which renaming of attributes is necessary in order to specify the query unambiguously.

Step-by-step solution
Step 1 of 1
When a query contains a NATURAL JOIN operation, the join attribute (typically the foreign key) must be renamed if it does not already have the same name in both relations; otherwise the operation cannot be carried out. In an EQUIJOIN, the result contains two attributes that have the same value in every tuple, namely the attributes compared in the join condition. In a NATURAL JOIN one of them is removed, so only a single attribute remains. DIVISION is another such operation: it is carried out on the basis of the common attributes, so their names must be the same.

Chapter 8, Problem 4RQ Problem Discuss the various types of inner join operations. Why is theta join required?
Step-by-step solution
Step 1 of 6
Various types of inner join operations: When data from multiple relations is combined, the related information can be presented in a single table. This operation is known as an inner join. Besides the general THETA JOIN, the commonly used inner join operations are:
• EQUI JOIN operations
• NATURAL JOIN operations

Step 2 of 6
EQUI JOIN operation:
• This operation joins the relations using conditions that contain only equality comparisons.
• A JOIN in which the only comparison operator used is = is called an EQUIJOIN.
• The result of an EQUIJOIN always has one or more pairs of attributes that have identical values in every tuple.
Example syntax:
    table_expression [INNER] JOIN table_expression ON boolean_expression
or, in relational algebra notation, $R \bowtie_{A_i = B_j} S$.

Step 3 of 6
NATURAL JOIN operation: In the result of an EQUIJOIN, one attribute of each pair with identical values is superfluous.
• The NATURAL JOIN operation is denoted by *.
• It was created to get rid of the second (superfluous) attribute in an EQUIJOIN condition.
Definition:
• The standard definition of the NATURAL JOIN operation requires that the two join attributes

Step 4 of 6
have the same name in both relations.
• If this is not the case, a renaming operation is applied first.
Example syntax: $R * S$

Step 5 of 6
Theta join operation:
• Theta join is the general form of the JOIN operation.
• It is required when tuples from two relations have to be combined and the combination condition is not simply the equality of shared attributes.
• In that case it is convenient for the JOIN operation to have a more general form.
• The operator that represents the theta join is $\bowtie_{\theta}$. The JOIN operation is a binary operation. It is denoted as $R \bowtie_{A_i\,\theta\,B_j} S$, where:
  o $A_i$ is an attribute of relation R
  o $B_j$ is an attribute of relation S
  o $A_i$ and $B_j$ have the same domain, and $\theta$ is a comparison operator from $\{=, <, \le, >, \ge, \ne\}$
• Tuples whose join attributes are NULL, or for which the join condition is FALSE, do not appear in the result.
• So, joining the two relations results in a subset of the Cartesian product, namely the subset determined by the join condition.
Example syntax: $\text{Professions} \bowtie_{\text{Job} = \text{Career}} \text{Careers}$
The result of joining Professions and Careers is shown below.

| Name | Job | Career | Pays |
| --- | --- | --- | --- |
| Haney | Mechanic | Mechanic | 6500 |
| David | Archaeologist | Archaeologist | 40,000 |
| John | Doctor | Doctor | 50,000 |

Step 6 of 6

Chapter 8, Problem 5RQ Problem What role does the concept of foreign key play when specifying the most common types of meaningful join operations?

Step-by-step solution
Step 1 of 3
A foreign key is a column, or a combination of columns, that is the primary key of another table and is used to maintain the relationship between the two tables.
• A foreign key is mainly used for establishing a relationship between two tables.
• A table can have more than one foreign key.

Step 2 of 3
The JOIN operation is used to combine related tuples from two relations into a single tuple.
• In order to perform a meaningful JOIN operation, there should be a relationship between the two tables.
• The relationship is maintained through the concept of the foreign key.
• If there is no foreign key, the JOIN operation may not lead to meaningful results.
Hence, the foreign key concept is needed to establish the relationship between the two tables.

Step 3 of 3
Example: Consider the following relational database.
    EMPLOYEE(Name, Ssn, Manager_ssn, Job, Salary, Address, DeptNum)
    DEPARTMENT(Dno, Dname, Mgr_ssn)
DeptNum is a foreign key in relation EMPLOYEE that refers to Dno in DEPARTMENT. The JOIN operation can be performed on the two relations based on this foreign key. To retrieve the employee name, DeptNum and Dname, the JOIN is as follows:
$\pi_{\text{Name},\,\text{DeptNum},\,\text{Dname}}(\text{EMPLOYEE} \bowtie_{\text{DeptNum} = \text{Dno}} \text{DEPARTMENT})$

Chapter 8, Problem 6RQ Problem What is the FUNCTION operation? For what is it used?

Step-by-step solution
Step 1 of 3
FUNCTION operation:
• The FUNCTION operation, also known as the AGGREGATE FUNCTION operation, is used to apply mathematical aggregate functions to numeric data.
• It also allows grouping of tuples based on some attributes of the relation.
• The aggregate functions are SUM, AVERAGE, MAXIMUM, MINIMUM and COUNT.

Step 2 of 3
The syntax of the FUNCTION operation is as follows:
$_{\langle\text{grouping attributes}\rangle}\ \Im\ _{\langle\text{function list}\rangle}(R)$
where,
$\langle\text{grouping attributes}\rangle$ is a list of attributes of R on which the grouping is performed.
$\Im$ is the symbol used for the aggregate function operation.
$\langle\text{function list}\rangle$ is a list of pairs, where each pair consists of a function and an attribute.

Step 3 of 3
The FUNCTION operation is used for obtaining summarized data from relations.
Example: $\Im_{\text{MAXIMUM Salary},\ \text{MINIMUM Salary}}(\text{EMPLOYEE})$
The above query finds the maximum and minimum salary in the EMPLOYEE relation.

Chapter 8, Problem 7RQ Problem How are the OUTER JOIN operations different from the INNER JOIN operations? How is the OUTER UNION operation different from UNION?

Step-by-step solution
Step 1 of 2
OUTER JOIN and INNER JOIN: Consider two relations R and S. When the user wants to keep all the tuples in R, or all those in S, or all those in both relations in the result of the JOIN, regardless of whether or not they have matching tuples in the other relation, a set of operations called outer joins can be used. They satisfy the need of queries in which tuples from two tables are to be combined by matching corresponding rows, but without losing any tuples for lack of matching values. When the resultant relation contains only the matching tuples (based on the join condition) and not all tuples, the join is an INNER JOIN (EQUIJOIN and NATURAL JOIN). In an OUTER JOIN, if there are no matching values in the other relation, the corresponding fields are padded with NULL values.

Step 2 of 2
OUTER UNION and UNION: For the UNION operation the relations have to be union compatible, i.e., they must have the same number of attributes and each corresponding pair of attributes must have the same domain. The OUTER UNION operation was developed to take the union of tuples from two relations when the relations are not union compatible.
This operation takes the UNION of the tuples in two relations R(X, Y) and S(X, Z) that are partially compatible, meaning that only some of their attributes, say X, are union compatible. The resultant relation is of the form RESULT(X, Y, Z). Two tuples t1 in R and t2 in S are said to match if t1[X] = t2[X], and they are considered to represent the same entity instance. They are combined into a single tuple. The remaining tuples are padded with NULL values.

Chapter 8, Problem 8RQ Problem In what sense does relational calculus differ from relational algebra, and in what sense are they similar?

Step-by-step solution
Step 1 of 2
Difference between relational calculus and relational algebra:

| Relational calculus | Relational algebra |
| --- | --- |
| It is a non-procedural language. | It is a procedural language. |
| The query specifies what output is to be retrieved. The order of the operations to be followed for getting the result is not specified. | The query specifies how the desired output is retrieved. The order of the operations to be followed for getting the result is specified. |
| The evaluation of the query does not depend on the order of the operations. | The evaluation of the query depends on the order of the operations. |
| New relations are not created by performing operations on the existing relations. Formulas are directly applied on the existing relations. | New relations can be obtained by performing operations on the existing relations. |
| The queries are domain dependent. | The queries are domain independent. |

Step 2 of 2
Similarities between relational calculus and relational algebra:
• Relational algebra and relational calculus are both formal query languages for the relational model.
• Both are used for retrieving information from a database.

Chapter 8, Problem 9RQ Problem How does tuple relational calculus differ from domain relational calculus?

Step-by-step solution
Step 1 of 2
The relational calculus is a non-procedural query language that uses predicates.
• The query in relational calculus specifies what output is to be retrieved.
• The order of the operations to be followed for getting the result is not specified.
• In other words, the evaluation of the query does not depend on the order of the operations.
• The two variations of relational calculus are:
  o Tuple relational calculus
  o Domain relational calculus

Step 2 of 2
The differences between tuple relational calculus and domain relational calculus are as follows:
• In the tuple relational calculus, the variables range over the tuples of a relation.
• In the domain relational calculus, the variables range over single values from the domains of attributes.

Chapter 8, Problem 10RQ Problem Discuss the meanings of the existential quantifier (∃) and the universal quantifier (∀).

Step-by-step solution
Step 1 of 2
Quantifiers are of two types:
(1) Existential quantifiers
(2) Universal quantifiers
(1) Existential quantifier: The existential quantifier is a logical quantifier symbolized as $\exists$ ("there exists"). If F is a formula, then so is $(\exists t)(F)$, where t is a tuple variable. If the formula F evaluates to TRUE for some (at least one) tuple assigned to the free occurrences of t in F, then $(\exists t)(F)$ is TRUE. Otherwise, it is FALSE.

Step 2 of 2
(2) Universal quantifier: The universal quantifier is a logical quantifier symbolized as $\forall$ ("for all"). If F is a formula, then so is $(\forall t)(F)$, where t is a tuple variable. If the formula F evaluates to TRUE for every tuple assigned to the free occurrences of t in F, then $(\forall t)(F)$ is TRUE. Otherwise, it is FALSE.

Chapter 8, Problem 11RQ Problem Define the following terms with respect to the tuple calculus: tuple variable, range relation, atom, formula, and expression.

Step-by-step solution
Step 1 of 3
Tuple relational calculus: The tuple relational calculus is a non-procedural language. It contains a declarative expression that specifies what is to be retrieved.

Step 2 of 3
Tuple variable: A query in the tuple relational calculus is represented as $\{t \mid P(t)\}$.
Here, t is a tuple variable for which predicate P is true.
Range Relation: In the tuple relational calculus, every tuple variable ranges over a relation. The variable takes any tuple of that relation as its value.
Atom: An atom is the smallest condition in the tuple relational calculus. It either identifies the range of a tuple variable or compares an attribute of a tuple variable with another attribute or with a constant. The condition in the tuple relational calculus is made of atoms.

Step 3 of 3
Formula: A formula or condition is made of atoms. The atoms in a formula are connected via the logical operators AND, OR and NOT. Every atom is itself a formula, i.e., a formula may consist of a single atom or of multiple atoms.
Expression: The tuple relational calculus contains a declarative expression that specifies what is to be retrieved.
Example: Consider an expression of the form $\{t \mid R(t) \text{ AND } t.A = c\}$, where R, A and c stand for an arbitrary relation, attribute and constant. In this expression, t is the tuple variable, $R(t) \text{ AND } t.A = c$ is the formula, $R(t)$ and $t.A = c$ are atoms, and $R(t)$ is a range relation atom that specifies the range of the tuple variable t over the relation R.

Chapter 8, Problem 12RQ Problem Define the following terms with respect to the domain calculus: domain variable, range relation, atom, formula, and expression.

Step-by-step solution
Step 1 of 3
Domain variable: A variable whose value is drawn from the domain of an attribute. To form a relation of degree n for a query result, n domain variables are used.
Ex: The domain of the domain variable Crs might be the set of possible values of the Crs code attribute of the relation teaching.

Step 2 of 3
Range relation: In the domain calculus, the variables used in formulas differ from those of the tuple calculus: rather than ranging over tuples, the variables range over single values from the domains of attributes.
Atom: An atom states that a list of values must be a tuple in a relation. It has the form $R(x_1, x_2, \ldots, x_j)$, where R is the name of a relation of degree j and each $x_i$, $1 \le i \le j$, is a domain variable.

Step 3 of 3
Formula: In the domain relational calculus, a formula is defined recursively, starting with simple atomic formulas and building larger formulas using the logical connectives. A formula is made up of atoms.
Expression: An expression in the domain relational calculus has the form $\{x_1, x_2, \ldots, x_n \mid \text{COND}(x_1, x_2, \ldots, x_n, x_{n+1}, \ldots, x_{n+m})\}$. Here $x_1, x_2, \ldots, x_{n+m}$ are domain variables and COND is a formula of the domain calculus.

Chapter 8, Problem 13RQ Problem What is meant by a safe expression in relational calculus?

Step-by-step solution
Step 1 of 3
An expression in relational calculus is said to be a safe expression if it is guaranteed to output a finite set of tuples.
Step 2 of 3
The following relational calculus expression generates all the tuples from the universe that are not student tuples:
{t | NOT (STUDENT(t))}
It generates an infinite number of tuples, because infinitely many tuples are not student tuples. Expressions in relational calculus that do not generate a finite set of tuples are known as unsafe expressions.
Step 3 of 3
Every value in the result of a safe expression must come from the domain of the expression, that is, from the constants in the expression and the values in the tuples of the relations it references. Otherwise the expression is considered unsafe.

Chapter 8, Problem 14RQ
Problem
When is a query language called relationally complete?
Step-by-step solution
Step 1 of 2
A relational query language is said to be relationally complete if every query that can be expressed in relational calculus can also be expressed in that language.
• The expressive power of such a language is at least equivalent to that of relational algebra.
• Relational completeness is a criterion by which the expressive strength of a language can be measured.
Step 2 of 2
• Some queries cannot be expressed in relational calculus or relational algebra.
• Almost all relational query languages (for example, SQL) are relationally complete. They are more expressive than relational algebra or relational calculus because they add operations such as aggregate functions, grouping, and ordering.

Chapter 8, Problem 15E
Problem
Show the result of each of the sample queries in Section 8.5 as it would apply to the database state in Figure 5.6.
Step-by-step solution
Step 1 of 6
Query 1: Result:
FNAME     LNAME    ADDRESS
John      Smith    731 Fondren, Houston, TX
Franklin  Wong     638 Voss, Houston, TX
Ramesh    Narayan  975 Fire Oak, Humble, TX
Joyce     English  5631 Rice, Houston, TX
Step 2 of 6
Query 2: Result:
PNUMBER  DNUM  LNAME    ADDRESS                  BDATE
10       4     Wallace  291 Berry, Bellaire, TX  20-JUN-31
30       4     Wallace  291 Berry, Bellaire, TX  20-JUN-31
Step 3 of 6
Query 3: Result: empty, because no tuples satisfy the query (result columns LNAME, FNAME).
Query 4: Result:
PNO
1
2
Step 4 of 6
Query 5: Result:
LNAME  FNAME
Smith  John
Wong   Franklin
Step 5 of 6
Query 6: Result:
LNAME    FNAME
Zelaya   Alicia
Narayan  Ramesh
English  Joyce
Jabbar   Ahmad
Borg     James
Step 6 of 6
Query 7: Result:
LNAME    FNAME
Wallace  Jennifer
Wong     Franklin

Chapter 8, Problem 16E
Problem
Specify the following queries on the COMPANY relational database schema shown in Figure 5.5 using the relational operators discussed in this chapter. Also show the result of each query as it would apply to the database state in Figure 5.6.
a. Retrieve the names of all employees in department 5 who work more than 10 hours per week on the ProductX project.
b. List the names of all employees who have a dependent with the same first name as themselves.
c. Find the names of all employees who are directly supervised by ‘Franklin Wong’.
d. For each project, list the project name and the total hours per week (by all employees) spent on that project.
e. Retrieve the names of all employees who work on every project.
f. Retrieve the names of all employees who do not work on any project.
g. For each department, retrieve the department name and the average salary of all employees working in that department.
h. Retrieve the average salary of all female employees.
i. Find the names and addresses of all employees who work on at least one project located in Houston but whose department has no location in Houston.
j. List the last names of all department managers who have no dependents.
Step-by-step solution
Steps 1 to 10 of 10: [expressions and result tables for parts a to j not shown]

Chapter 8, Problem 17E
Problem
Consider the AIRLINE relational database schema shown in Figure, which was described in Exercise. Specify the following queries in relational algebra: a.
For each flight, list the flight number, the departure airport for the first leg of the flight, and the arrival airport for the last leg of the flight. b. List the flight numbers and weekdays of all flights or flight legs that depart from Houston Intercontinental Airport (airport code ‘iah’) and arrive in Los Angeles International Airport (airport code ‘lax’). c. List the flight number, departure airport code, scheduled departure time, arrival airport code, scheduled arrival time, and weekdays of all flights or flight legs that depart from some airport in the city of Houston and arrive at some airport in the city of Los Angeles. d. List all fare information for flight number ‘col97’. e. Retrieve the number of available seats for flight number ‘col97’ on ‘2009-10-09’. The AIRLINE relational database scheme. Exercise Consider the AIRLINE relational database schema shown in Figure, which describes a database for airline flight information. Each FLIGHT is identified by a Flight_number, and consists of one or more FLIGHT_LEGs with Leg_numbers 1, 2, 3, and so on. Each FLIGHT_LEG has scheduled arrival and departure times, airports, and one or more LEG_INSTANCEs— one for each Date on which the flight travels. FAREs are kept for each FLIGHT. For each FLIGHT_LEG instance, SEAT_RESERVATIONs are kept, as are the AIRPLANE used on the leg and the actual arrival and departure times and airports. An AIRPLANE is identified by an Airplane_id and is of a particular AIRPLANE_TYPE. CAN_LAND relates AIRPLANE_TYPEs to the AIRPORTs at which they can land. An AIRPORT is identified by an Airport_code. Consider an update for the AIRLINE database to enter a reservation on a particular flight or flight leg on a given date. a. Give the operations for this update. b. What types of constraints would you expect to check? c. Which of these constraints are key, entity integrity, and referential integrity constraints, and which are not? d. 
Specify all the referential integrity constraints that hold on the schema shown in Figure.
Step-by-step solution
Step 1 of 4
The following symbols are used to write a relational algebra query: σ (SELECT), π (PROJECT), ⋈ (JOIN), ∪ (UNION), and ← (assignment of an intermediate result to a named relation).
Step 2 of 4
a. Query to list, for each flight, the flight number, the departure airport of the first leg, and the arrival airport of the last leg: [expression not shown]
Explanation:
• FLIGHT_LEG_IN holds the combinations of FLIGHT and FLIGHT_LEG tuples in which FLIGHT’s Flight_number is equal to FLIGHT_LEG’s Flight_number.
• MAX_FLIGHT_LEG holds, for each Flight_number, the maximum Leg_number in FLIGHT_LEG_IN.
• MIN_FLIGHT_LEG holds, for each Flight_number, the minimum Leg_number in FLIGHT_LEG_IN.
• RESULT1 stores the Flight_number, Leg_number, and Arrival_airport_code for the legs in MAX_FLIGHT_LEG.
• RESULT2 stores the Flight_number, Leg_number, and Departure_airport_code for the legs in MIN_FLIGHT_LEG.
• RESULT combines the tuples of RESULT1 and RESULT2.
Step 3 of 4
b. Query to retrieve the flight numbers and weekdays of all flights or flight legs that depart from Houston Intercontinental Airport (code ‘iah’) and arrive at Los Angeles International Airport (code ‘lax’):
FLIGHT_LEG_IN ← FLIGHT ⋈ FLIGHT.Flight_number=FLIGHT_LEG.Flight_number FLIGHT_LEG
RESULT1 ← σ Departure_airport_code=‘iah’ AND Arrival_airport_code=‘lax’ (FLIGHT_LEG_IN)
RESULT ← π Flight_number, Weekdays (RESULT1)
Explanation:
• FLIGHT_LEG_IN holds the combinations of FLIGHT and FLIGHT_LEG tuples in which FLIGHT’s Flight_number is equal to FLIGHT_LEG’s Flight_number.
• RESULT1 holds the tuples of FLIGHT_LEG_IN whose Departure_airport_code is ‘iah’ and whose Arrival_airport_code is ‘lax’.
• RESULT displays the Flight_number and Weekdays of RESULT1.
c. Query to retrieve the flight number, departure airport code, scheduled departure time, arrival airport code, scheduled arrival time, and weekdays of all flights or flight legs that depart from some airport in the city of Houston and arrive at some airport in the city of Los Angeles:
FLIGHT_LEG_IN ← FLIGHT ⋈ FLIGHT.Flight_number=FLIGHT_LEG.Flight_number FLIGHT_LEG
DEPART_CODE ← π Airport_code (σ City=‘Houston’ (AIRPORT))
ARRIVE_CODE ← π Airport_code (σ City=‘Los Angeles’ (AIRPORT))
HOUST_DEPART ← DEPART_CODE ⋈ Airport_code=Departure_airport_code FLIGHT_LEG_IN
HOUST_TO_LA ← ARRIVE_CODE ⋈ ARRIVE_CODE.Airport_code=Arrival_airport_code HOUST_DEPART
RESULT ← π Flight_number, Departure_airport_code, Scheduled_departure_time, Arrival_airport_code, Scheduled_arrival_time, Weekdays (HOUST_TO_LA)
Explanation:
• FLIGHT_LEG_IN holds the combinations of FLIGHT and FLIGHT_LEG tuples in which FLIGHT’s Flight_number is equal to FLIGHT_LEG’s Flight_number.
• DEPART_CODE holds the Airport_code of every AIRPORT whose City = ‘Houston’.
• ARRIVE_CODE holds the Airport_code of every AIRPORT whose City = ‘Los Angeles’.
• HOUST_DEPART holds the result of joining DEPART_CODE with FLIGHT_LEG_IN on the condition Airport_code = Departure_airport_code.
• HOUST_TO_LA holds the result of joining ARRIVE_CODE with HOUST_DEPART on the condition Airport_code = Arrival_airport_code.
• RESULT displays the Flight_number, Departure_airport_code, Scheduled_departure_time, Arrival_airport_code, Scheduled_arrival_time, and Weekdays of HOUST_TO_LA.
d. Query to retrieve all fare information for flight number ‘col97’:
RESULT ← σ Flight_number=‘col97’ (FARE)
Explanation: RESULT holds all the FARE tuples whose Flight_number is ‘col97’.
Step 4 of 4
e. Query to retrieve the number of available seats for flight number ‘col97’ on ‘2009-10-09’:
LEG_INST_INFO ← σ Flight_number=‘col97’ AND Date=‘2009-10-09’ (LEG_INSTANCE)
RESULT ← π Number_of_available_seats (LEG_INST_INFO)
Explanation:
• LEG_INST_INFO holds the LEG_INSTANCE tuples whose Flight_number is ‘col97’ and whose Date is ‘2009-10-09’.
• RESULT displays the Number_of_available_seats of LEG_INST_INFO.

Chapter 8, Problem 18E
Problem
Consider the LIBRARY relational database schema shown in Figure, which is used to keep track of books, borrowers, and book loans.
Referential integrity constraints are shown as directed arcs in Figure, as in the notation of Figure 5.7. Write down relational expressions for the following queries:
a. How many copies of the book titled The Lost Tribe are owned by the library branch whose name is ‘Sharpstown’?
b. How many copies of the book titled The Lost Tribe are owned by each library branch?
c. Retrieve the names of all borrowers who do not have any books checked out.
d. For each book that is loaned out from the Sharpstown branch and whose Due_date is today, retrieve the book title, the borrower’s name, and the borrower’s address.
e. For each library branch, retrieve the branch name and the total number of books loaned out from that branch.
f. Retrieve the names, addresses, and number of books checked out for all borrowers who have more than five books checked out.
g. For each book authored (or coauthored) by Stephen King, retrieve the title and the number of copies owned by the library branch whose name is Central.
Figure: A relational database schema for a LIBRARY database.
Step-by-step solution
Step 1 of 7
a. Relational expression to find the number of copies of the book titled ‘The Lost Tribe’ that are owned by the library branch named ‘Sharpstown’: [expression not shown]
Step 2 of 7
b. Relational expression to find the number of copies of the book titled ‘The Lost Tribe’ that are owned by each library branch: [expression not shown]
Step 3 of 7
c. Relational expression to retrieve the names of the borrowers who have no books checked out: [expression not shown]
Step 4 of 7
d. Relational expression to retrieve the book title and the borrower’s name and address for each book that is loaned out from the branch named ‘Sharpstown’ and whose due date is today: [expression not shown]
Step 5 of 7
e. Relational expression to retrieve the branch name and the total number of books loaned out from that branch: [expression not shown]
Step 6 of 7
f. Relational expression to retrieve the name, address, and total number of books checked out for all borrowers who have more than five books checked out: [expression not shown]
Step 7 of 7
g. Relational expression to retrieve the title and number of copies of each book authored or coauthored by Stephen King that is owned by the library branch named Central: [expression not shown]

Chapter 8, Problem 19E
Problem
Specify the following queries in relational algebra on the database schema given in Exercise:
a. List the Order# and Ship_date for all orders shipped from Warehouse# W2.
b. List the WAREHOUSE information from which the CUSTOMER named Jose Lopez was supplied his orders. Produce a listing: Order#, Warehouse#.
c. Produce a listing Cname, No_of_orders, Avg_order_amt, where the middle column is the total number of orders by the customer and the last column is the average order amount for that customer.
d. List the orders that were not shipped within 30 days of ordering.
e. List the Order# for orders that were shipped from all warehouses that the company has in New York.
Exercise
Consider the following six relations for an order-processing database application in a company:
CUSTOMER(Cust#, Cname, City)
ORDER(Order#, Odate, Cust#, Ord_amt)
ORDER_ITEM(Order#, Item#, Qty)
ITEM(Item#, Unit_price)
SHIPMENT(Order#, Warehouse#, Ship_date)
WAREHOUSE(Warehouse#, City)
Here, Ord_amt refers to the total dollar amount of an order; Odate is the date the order was placed; and Ship_date is the date an order (or part of an order) is shipped from the warehouse. Assume that an order can be shipped from several warehouses. Specify the foreign keys for this schema, stating any assumptions you make. What other constraints can you think of for this database?
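Query e asks for orders shipped from all New York warehouses, which is the pattern that the DIVISION operation captures. The following is a minimal Python sketch of that idea, not part of the exercise itself. The sample tuples and the helper name `divide` are hypothetical; only the relation schemas come from the exercise.

```python
# Hypothetical sample tuples; the exercise gives only the schemas.
WAREHOUSE = {("W1", "New York"), ("W2", "New York"), ("W3", "Chicago")}  # (Warehouse#, City)
SHIPMENT = {  # (Order#, Warehouse#, Ship_date)
    ("O1", "W1", "2009-01-05"),
    ("O1", "W2", "2009-01-07"),
    ("O2", "W1", "2009-01-06"),
    ("O3", "W3", "2009-01-08"),
}


def divide(dividend, divisor):
    """Relational division (hypothetical helper).

    dividend is a set of (x, y) pairs and divisor is a set of y values.
    The result is every x that is paired with all y values in divisor.
    """
    candidates = {x for x, _ in dividend}
    return {x for x in candidates if all((x, y) in dividend for y in divisor)}


# Select the New York warehouses, then project SHIPMENT on (Order#, Warehouse#).
ny_warehouses = {w for w, city in WAREHOUSE if city == "New York"}
order_warehouse = {(order, w) for order, w, _ in SHIPMENT}

# Orders that appear with every New York warehouse.
orders_from_all_ny = divide(order_warehouse, ny_warehouses)
print(orders_from_all_ny)
```

With this sample data only order O1 was shipped from both New York warehouses, so it is the only order in the result. Note that the dividend must keep both Order# and Warehouse#; projecting SHIPMENT on Warehouse# alone would lose the order numbers.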
Step-by-step solution
Step 1 of 6
Relational Algebra
Relational algebra is a procedural language for expressing queries on a database. Its basic operations are as follows:
• Select: selects tuples; it is represented by the symbol σ.
• Project: projects columns; it is represented by the symbol π.
• Union is represented by ∪.
• Set difference is represented by –.
• Cartesian product is represented by ×.
• Rename is represented by ρ.
Step 2 of 6
a. Query to retrieve the order number and shipping date for all the orders that are shipped from Warehouse "W2":
π Order#, Ship_date (σ Warehouse#=‘W2’ (SHIPMENT))
Explanation:
• The query first selects the SHIPMENT tuples whose Warehouse# is "W2" and then projects Order# and Ship_date.
• The result contains the Order# and Ship_date fields of every SHIPMENT tuple whose warehouse number is "W2".
Step 3 of 6
b. Query to retrieve the order number and warehouse number for all the orders of the customer named "Jose Lopez":
TEMP ← σ Cname=‘Jose Lopez’ (CUSTOMER) ⋈ ORDER
RESULT ← π Order#, Warehouse# (TEMP ⋈ SHIPMENT)
Explanation:
• The query first selects the customer named "Jose Lopez" and his orders, and then projects the listing of Order# and Warehouse#.
• TEMP holds the CUSTOMER and ORDER details for the customer whose Cname is ‘Jose Lopez’.
• RESULT performs a natural join of TEMP and SHIPMENT on Order# and displays only the Order# and Warehouse#.
Step 4 of 6
c. Query to retrieve the Cname, total number of orders, and average order amount of each customer:
TEMP(Cust#, No_of_orders, Avg_order_amt) ← Cust# ℑ COUNT Order#, AVERAGE Ord_amt (ORDER)
RESULT ← π Cname, No_of_orders, Avg_order_amt (CUSTOMER ⋈ TEMP)
Explanation:
• The attributes of the relation TEMP are named by the list between parentheses in the RENAME operation.
• Aggregate functions are specified with the syntax: grouping_attributes ℑ function_list (R).
• The number of orders and the average order amount are computed per customer, grouped by Cust#.
• RESULT performs a natural join of CUSTOMER and TEMP on Cust# and displays only the customer name, number of orders, and average order amount.
Step 5 of 6
d. Query to list the orders that are not shipped within 30 days of ordering:
π Order#, Odate, Cust#, Ord_amt (σ Ship_date − Odate > 30 (ORDER ⋈ SHIPMENT))
Explanation:
• The query performs a natural join of ORDER and SHIPMENT on Order#, and selects the tuples in which the number of days between ordering and shipping exceeds thirty.
• The number of days is calculated by subtracting the order date from the shipping date. The projection keeps the Order#, Odate, Cust#, and Ord_amt of the selected orders.
Step 6 of 6
e. Query to list the Order# of the orders shipped from all the warehouses located in New York:
TEMP ← π Warehouse# (σ City=‘New York’ (WAREHOUSE))
RESULT ← π Order#, Warehouse# (SHIPMENT) ÷ TEMP
Explanation:
• TEMP holds the Warehouse# of every WAREHOUSE whose City is ‘New York’.
• The Order# and Warehouse# columns are projected from the SHIPMENT table, and the projection is divided by TEMP.
• The division operator returns each Order# that appears in SHIPMENT in combination with every Warehouse# in TEMP.

Chapter 8, Problem 20E
Problem
Specify the following queries in relational algebra on the database schema given in Exercise:
a. Give the details (all attributes of trip relation) for trips that exceeded $2,000 in expenses.
b. Print the Ssns of salespeople who took trips to Honolulu.
c. Print the total trip expenses incurred by the salesperson with SSN = ‘234-56-7890’.
Exercise
Consider the following relations for a database that keeps track of business trips of salespersons in a sales office:
SALESPERSON(Ssn, Name, Start_year, Dept_no)
TRIP(Ssn, From_city, To_city, Departure_date, Return_date, Trip_id)
EXPENSE(Trip_id, Account#, Amount)
A trip can be charged to one or more accounts. Specify the foreign keys for this schema, stating any assumptions you make.
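The three queries can be checked on a small data set before they are written in relational algebra. The following is a minimal Python sketch under stated assumptions: the sample tuples are hypothetical, and query a is read as "the total of all expenses charged to the trip exceeds $2,000", which is one possible interpretation of the exercise.

```python
from collections import defaultdict

# Hypothetical sample tuples; only the schemas come from the exercise.
TRIP = [  # (Ssn, From_city, To_city, Departure_date, Return_date, Trip_id)
    ("234-56-7890", "Dallas", "Honolulu", "2009-03-01", "2009-03-05", "T1"),
    ("234-56-7890", "Dallas", "Chicago", "2009-04-10", "2009-04-12", "T2"),
    ("111-22-3333", "Austin", "Boston", "2009-05-01", "2009-05-03", "T3"),
]
EXPENSE = [  # (Trip_id, Account#, Amount)
    ("T1", "A1", 1500.0),
    ("T1", "A2", 900.0),
    ("T2", "A1", 400.0),
    ("T3", "A3", 700.0),
]

# Group EXPENSE by Trip_id and sum Amount (aggregate function SUM).
total_by_trip = defaultdict(float)
for trip_id, _account, amount in EXPENSE:
    total_by_trip[trip_id] += amount

# a. Trips whose total expenses exceed $2,000 (assumed interpretation).
expensive_trips = [t for t in TRIP if total_by_trip[t[5]] > 2000]

# b. Ssns of salespeople who took trips to Honolulu (select, then project).
honolulu_ssns = {t[0] for t in TRIP if t[2] == "Honolulu"}

# c. Total trip expenses for the salesperson with Ssn '234-56-7890'
#    (select on TRIP, join with EXPENSE on Trip_id, then SUM).
total_for_ssn = sum(total_by_trip[t[5]] for t in TRIP if t[0] == "234-56-7890")

print(expensive_trips, honolulu_ssns, total_for_ssn)
```

Because a trip can be charged to several accounts, the sketch sums the amounts per Trip_id before comparing with 2,000; comparing each EXPENSE tuple individually would answer a different question.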
Step-by-step solution
Step 1 of 4
The relational database schema is:
SALESPERSON(Ssn, Name, Start_year, Dept_no)
TRIP(Ssn, From_city, To_city, Departure_date, Return_date, Trip_id)
EXPENSE(Trip_id, Account#, Amount)
Step 2 of 4
a) Details for trips that exceeded $2,000 in expenses: [expression not shown]
Step 3 of 4
b) Print the Ssn of each salesperson who took trips to ‘Honolulu’: [expression not shown]
Step 4 of 4
c) Print the total trip expenses incurred by the salesperson with Ssn = ‘234-56-7890’: [expression not shown]

Chapter 8, Problem 21E
Problem
Specify the following queries in relational algebra on the database schema given in Exercise:
a. List the number of courses taken by all students named John Smith in Winter 2009 (i.e., Quarter=W09).
b. Produce a list of textbooks (include Course#, Book_isbn, Book_title) for courses offered by the ‘CS’ department that have used more than two books.
c. List any department that has all its adopted books published by ‘Pearson Publishing’.
Exercise
Consider the following relations for a database that keeps track of student enrollment in courses and the books adopted for each course:
STUDENT(Ssn, Name, Major, Bdate)
COURSE(Course#, Cname, Dept)
ENROLL(Ssn, Course#, Quarter, Grade)
BOOK_ADOPTION(Course#, Quarter, Book_isbn)
TEXT(Book_isbn, Book_title, Publisher, Author)
Specify the foreign keys for this schema, stating any assumptions you make.
Step-by-step solution
Step 1 of 3
a. Π Course# (σ Quarter=‘W09’ (σ Name=‘John Smith’ (STUDENT) ⋈ ENROLL))
Explanation:
• This query gives the courses taken by the students named ‘John Smith’ in Winter 2009. Applying the aggregate function COUNT to Course# on this result gives the number of courses.
• Here, ‘Π’ is the projection operation, ‘σ’ is the selection operation, and ‘⋈’ is the natural join operation.
Step 2 of 3
b. CS_BOOKS ← σ Dept=‘CS’ (COURSE) ⋈ BOOK_ADOPTION
BOOK_COUNT(Course#, No_of_books) ← Course# ℑ COUNT Book_isbn (Π Course#, Book_isbn (CS_BOOKS))
RESULT ← Π Course#, Book_isbn, Book_title (σ No_of_books>2 (BOOK_COUNT) ⋈ CS_BOOKS ⋈ TEXT)
Explanation:
• The query retrieves the list of textbooks for the courses of the CS department by using natural joins.
• The aggregate function COUNT, grouped by Course#, keeps only the courses that have used more than two books. A union would not express this condition, because a union returns the rows that appear in either operand.
Step 3 of 3
c. BOOK_ALL_DEPTS = π Dept (BOOK_ADOPTION ⋈ COURSE)
BOOK_OTHER_DEPTS = π Dept (σ Publisher ≠ ‘Pearson Publishing’ (BOOK_ADOPTION ⋈ TEXT) ⋈ COURSE)
BOOK_ANY_DEPTS = BOOK_ALL_DEPTS − BOOK_OTHER_DEPTS
Explanation:
• The query lists the departments for which all the adopted books are published by ‘Pearson Publishing’. It removes from the departments that adopted any book those that adopted at least one book from another publisher.
• In this query, the ‘≠’ operator denotes the “not equal to” comparison.

Chapter 8, Problem 22E
Problem
Consider the two tables T1 and T2 shown in Figure 8.15. Show the results of the following operations:
a. T1 ⋈ T1.P=T2.A T2
b. T1 ⋈ T1.Q=T2.B T2
c. T1 ⟕ T1.P=T2.A T2
d. T1 ⟖ T1.Q=T2.B T2
e. T1 ∪ T2
f. T1 ⋈ (T1.P=T2.A AND T1.R=T2.C) T2
Step-by-step solution
Step 1 of 7
Operations of relational algebra
The two tables T1 and T2 represent database states.
TABLE T1       TABLE T2
P   Q  R       A   B  C
10  a  5       10  b  6
15  b  8       25  c  3
25  a  6       10  b  5
Step 2 of 7
a) The operation is “THETA JOIN”. It produces all the combinations of tuples that satisfy the join condition T1.P = T2.A. The following table is the result of the “THETA JOIN” operation.
P   Q  R  A   B  C
10  a  5  10  b  6
10  a  5  10  b  5
25  a  6  25  c  3
Step 3 of 7
b) The operation is “THETA JOIN”. It produces all the combinations of tuples that satisfy the join condition T1.Q = T2.B. The following table is the result of the “THETA JOIN” operation.
P   Q  R  A   B  C
15  b  8  10  b  6
15  b  8  10  b  5
Step 4 of 7
c) The operation is “LEFT OUTER JOIN”. It keeps every tuple of the first or left relation T1 and uses the join condition T1.P = T2.A. If no matching tuple is found in T2, the attributes of T2 are filled with NULL values. The following table is the result of the “LEFT OUTER JOIN” operation.
P   Q  R  A     B     C
10  a  5  10    b     6
10  a  5  10    b     5
15  b  8  NULL  NULL  NULL
25  a  6  25    c     3
Step 5 of 7
d) The operation is “RIGHT OUTER JOIN”. It keeps every tuple of the second or right relation T2 and uses the join condition T1.Q = T2.B. If no matching tuple is found in T1, the attributes of T1 are filled with NULL values. The following table is the result of the “RIGHT OUTER JOIN” operation.
P     Q     R     A   B  C
15    b     8     10  b  6
NULL  NULL  NULL  25  c  3
15    b     8     10  b  5
Step 6 of 7
e) The operation is “UNION”. It produces a relation that includes all the tuples that are in T1 or T2 or both T1 and T2. The operation is possible since T1 and T2 are union compatible. The following table is the result of the “UNION” operation.
P   Q  R
10  a  5
15  b  8
25  a  6
10  b  6
25  c  3
10  b  5
Step 7 of 7
f) The operation is “THETA JOIN”. It produces all the combinations of tuples that satisfy the join condition T1.P = T2.A AND T1.R = T2.C. The following table is the result of the “THETA JOIN” operation.
P   Q  R  A   B  C
10  a  5  10  b  5

Chapter 8, Problem 23E
Problem
Specify the following queries in relational algebra on the database schema in Exercise:
a. For the salesperson named ‘Jane Doe’, list the following information for all the cars she sold: Serial#, Manufacturer, Sale_price.
b. List the Serial# and Model of cars that have no options.
c. Consider the NATURAL JOIN operation between SALESPERSON and SALE. What is the meaning of a left outer join for these tables (do not change the order of relations)? Explain with an example.
d. Write a query in relational algebra involving selection and one set operation and say in words what the query does.
Exercise
Consider the following relations for a database that keeps track of automobile sales in a car dealership (OPTION refers to some optional equipment installed on an automobile):
CAR(Serial_no, Model, Manufacturer, Price)
OPTION(Serial_no, Option_name, Price)
SALE(Salesperson_id, Serial_no, Date, Sale_price)
SALESPERSON(Salesperson_id, Name, Phone)
First, specify the foreign keys for this schema, stating any assumptions you make. Next, populate the relations with a few sample tuples, and then give an example of an insertion in the SALE and SALESPERSON relations that violates the referential integrity constraints and of another insertion that does not.
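Part c of this exercise concerns a left outer join of SALESPERSON with SALE. The following is a minimal Python sketch of that operation. The sample tuples and the helper name `left_outer_join` are hypothetical, `None` stands in for NULL, and the sketch keeps both Salesperson_id columns (a natural join would keep only one).

```python
# Hypothetical sample tuples; only the schemas come from the exercise.
SALESPERSON = [  # (Salesperson_id, Name, Phone)
    ("S1", "Jane Doe", "555-0100"),
    ("S2", "John Roe", "555-0101"),
]
SALE = [  # (Salesperson_id, Serial_no, Date, Sale_price)
    ("S1", "C100", "2008-08-02", 25000),
]


def left_outer_join(left, right, left_key, right_key, right_width):
    """Join on left[left_key] == right[right_key] (hypothetical helper).

    Every left tuple is kept; a left tuple without a match is padded
    with None values for the right-hand attributes.
    """
    rows = []
    for left_row in left:
        matches = [r for r in right if r[right_key] == left_row[left_key]]
        if matches:
            rows.extend(left_row + r for r in matches)
        else:
            rows.append(left_row + (None,) * right_width)
    return rows


result = left_outer_join(SALESPERSON, SALE, 0, 0, 4)
for row in result:
    print(row)
```

In this sample, salesperson S2 has made no sale, so the corresponding row carries `None` in all four SALE attributes, while S1 appears once for each matching sale.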
Step-by-step solution
Step 1 of 4
(a) [expression not shown]
Step 2 of 4
(b) [expression not shown]
Step 3 of 4
(c) The LEFT OUTER JOIN of SALESPERSON and SALE returns all the records for which the join condition evaluates to true, together with all the records from SALESPERSON that do not match the condition. For the unmatched records, the values of the attributes that come from the SALE table are marked as NULL.
For example, consider records for two salespersons:
a. ID_1, ABC, 9999999
b. ID_2, DEF, 8888888
And a single SALE tuple:
a) ID_1, 111, 2-08-2008, 500000
The result of the join operation will have two tuples:
a) ID_1, ABC, 9999999, 111, 2-08-2008, 500000
b) ID_2, DEF, 8888888, NULL, NULL, NULL
Step 4 of 4
(d) [expression not shown] This query gives information about the Doe couple, who happen to work at the same place.

Chapter 8, Problem 24E
Problem
Specify queries a, b, c, e, f, i, and j of Exercise 8.16 in both tuple and domain relational calculus.
Reference Exercise 8.16
Specify the following queries on the COMPANY relational database schema shown in Figure 5.5 using the relational operators discussed in this chapter. Also show the result of each query as it would apply to the database state in Figure 5.6.
a. Retrieve the names of all employees in department 5 who work more than 10 hours per week on the ProductX project.
b. List the names of all employees who have a dependent with the same first name as themselves.
c. Find the names of all employees who are directly supervised by ‘Franklin Wong’.
d. For each project, list the project name and the total hours per week (by all employees) spent on that project.
e. Retrieve the names of all employees who work on every project.
f. Retrieve the names of all employees who do not work on any project.
g. For each department, retrieve the department name and the average salary of all employees working in that department.
h. Retrieve the average salary of all female employees.
i.
Find the names and addresses of all employees who work on at least one project located in Houston but whose department has no location in Houston. j. List the last names of all department managers who have no dependents. Step-by-step solution Step 1 of 10 Tuple relational calculus The tuple relational calculus is dependent on the use of tuple variables. A tuple variable is a named relation of “ranges over”. Domain relational calculus The variables in the tuple relational calculus take their values from domains of attributes rather than tuples of relations. Comment Step 2 of 10 a. • To specify the range of a tuple variable e as the EMPLOYEE relation. • Select the LNAME, FNAME attributes of the EMPLOYEE relation where DNO=5 work for HOURS>10. Tuple relational calculus: Explanation: • In the provided Tuple Relational calculus, the EMPLOYEE considers as the a, the PROJECT considers as the b and the WORKS_ON considers as the c. • In the above tuple relational calculus, there is a free variable a and these appear to the left of the bar (|). • The variables are retrieved which come before the bar (|), for all those tuples which satisfy the conditions provided after the bar. • The conditions EMPLOYEE (a) and WORKS_ON (c) specify the range relations for a and c. The condition a.ssn=c.ESSN is a join condition. Domain relational calculus: Explanation: • There is a need of the 10 variables for the EMPLOYEE relation, of the variables q, r, s…z. The only q and s are free because they appear to the left of the bar. • Firstly, there is a specification of the requested attribute, the name of the barrower, by the free domain variable q and s for Name fields. • There is a condition for selecting a tuple after the bar (|). • A condition relating two domain variables from relations t=e is a join condition. Comment Step 3 of 10 b. • To specify the range of a tuple variable e as the EMPLOYEE relation. 
• Select the LNAME, FNAME attributes of the EMPLOYEE relation who have a dependent with the same first name as themselves. Tuple relational calculus: Explanation: • In the provided Tuple Relational calculus, the EMPLOYEE considers as the a and the DEPENDENT considers as the b. • In the above tuple relational calculus, there is a free variable a and these appear to the left of the bar (|). • The variables are retrieved which come before the bar (|), for all those tuples which satisfy the conditions provided after the bar. • The conditions EMPLOYEE (a) and DEPENDENT (b) specify the range relations for a and b. The condition a.ssn=b.ESSN is a join condition. Domain relational calculus: Explanation: • There is a need of the 10 variables for the EMPLOYEE relation, of the variables q, r, s…z. The only q and s are free because they appear to the left of the bar. • Firstly, there is a specification of the requested attribute, the name of the barrower, by the free domain variable q and s for Name fields. • There is a condition for selecting a tuple after the bar (|). • A condition relating two domain variables from relations a=t and b=q is a join condition. Comment Step 4 of 10 c. • To specify the range of a tuple variable e as the EMPLOYEE relation. • Select the LNAME, FNAME attributes of the EMPLOYEE relation to find the names of employees that are directly supervised by 'Franklin Wong'. Tuple relational calculus: Explanation: • In the provided Tuple Relational calculus, the EMPLOYEE considers as the a and the EMPLOYEE considers as the b by using self-join. • In the above tuple relational calculus, there is a free variable a and these appear to the left of the bar (|). • The variables are retrieved which come before the bar (|), for all those tuples which satisfy the conditions provided after the bar. • The conditions EMPLOYEE (a) and EMPLOYEE (b) specify the range relations for e and s. The condition a.ssn=b.SSN is a self-join condition. 
Domain relational calculus: Explanation: • There is a need of the 10 variables for the EMPLOYEE relation, of the variables q, r, s…z. The only q and s are free because they appear to the left of the bar. • Firstly, there is a specification of the requested attribute, the name of the barrower, by the free domain variable q and s for Name fields. • There is a condition for selecting a tuple after the bar (|). • A condition relating two domain variables from relations y=d and S.FNAME='Franklin' AND S.LNAME='Wong' is a join condition. Comment Step 5 of 10 e. • To specify the range of a tuple variable e as the EMPLOYEE relation. • Select the LNAME, FNAME attributes of the EMPLOYEE relation to retrieve the names of employees who work on every project. Tuple relational calculus: Explanation: • In the provided Tuple Relational calculus, the EMPLOYEE considers as the a and the FORALL PROJECT considers as the b. • In the above tuple relational calculus, there is a free variable a and these appear to the left of the bar (|). • The variables are retrieved which come before the bar (|), for all those tuples which satisfy the conditions provided after the bar. • The conditions EMPLOYEE (a) and FORALL PROJECT (b) specify the range relations for a and b. The condition WHERE PNUMBER=PNO AND ESSN=SSN. Domain relational calculus: Explanation: • There is a need of the 10 variables for the EMPLOYEE relation, of the variables q, r, s…z. The only q and s are free because they appear to the left of the bar. • Firstly, there is a specification of the requested attribute, the name of the barrower, by the free domain variable q and s for Name fields. • There is a condition for selecting a tuple after the bar (|). • A condition relating two domain variables from relations e=t and PNUMBER=PNO AND ESSN=SSN is a join condition. Comment Step 6 of 10 f. • To specify the range of a tuple variable e as the EMPLOYEE relation. 
• Select the LNAME, FNAME attributes of the EMPLOYEE relation to retrieve the names of employees who do not work on any project. Tuple relational calculus: Comment Step 7 of 10 Explanation: • In the provided Tuple Relational calculus, the EMPLOYEE considers as the a and the WORKS_ON considers as the b. • In the above tuple relational calculus, there is a free variable a and these appear to the left of the bar (|). • The variables are retrieved which come before the bar (|), for all those tuples which satisfy the conditions provided after the bar. • The conditions EMPLOYEE (a) and WORKS_ON (b) specify the range relations for e and w. The condition WHERE ESSN=SSN. Domain relational calculus: Explanation: • There is a need of the 10 variables for the EMPLOYEE relation, of the variables q, r, s…z. The only q and s are free because they appear to the left of the bar. • Firstly, there is a specification of the requested attribute, the name of the barrower, by the free domain variable q and s for Name fields. • There is a condition for selecting a tuple after the bar (|). • A condition relating two domain variables from relations a=t WHERE ESSN=SSN is a join condition. Comment Step 8 of 10 i. • To specify the range of a tuple variable e as the EMPLOYEE relation. • Select the LNAME, FNAME, and ADDRESS attributes of the EMPLOYEE relation employees who work on at least one project located in Houston. Tuple relational calculus: . Explanation: • In the provided Tuple Relational calculus, the EMPLOYEE considers as the a, the PROJECT considers as the b and the WORKS_ON considers as the c. • In the above tuple relational calculus, there is a free variable a and these appear to the left of the bar (|). • The variables are retrieved which come before the bar (|), for all those tuples which satisfy the conditions provided after the bar. • The conditions EMPLOYEE (a) and WORKS_ON (c) specify the range relations for a and c. 
The conditions a.ssn=c.ESSN and PNO=PNUMBER are join conditions, and PLOCATION='Houston' is a selection condition.
Domain relational calculus:
Explanation:
• Ten domain variables, q, r, s, …, z, are needed for the EMPLOYEE relation, one for each attribute. Only q, s, and v are free, because they appear to the left of the bar (|).
• First, the requested attributes, the employee's name and address, are specified by the free domain variables q, s, and v.
• The condition for selecting a tuple follows the bar (|).
• A condition relating two domain variables from different relations, here t=e (SSN=ESSN) and PNO=PNUMBER, is a join condition; PLOCATION='Houston' is a selection condition.
Step 9 of 10
j.
• Specify the EMPLOYEE relation as the range of a tuple variable.
• Select the LNAME attribute of the EMPLOYEE relation for department managers who have no dependents.
Tuple relational calculus:
Explanation:
• In the tuple relational calculus expression, the tuple variable a ranges over EMPLOYEE, b ranges over DEPARTMENT, and c ranges over DEPENDENT.
Step 10 of 10
• The only free variable is a, because it appears to the left of the bar (|).
• The attributes listed before the bar (|) are retrieved for all tuples that satisfy the conditions given after the bar.
• The conditions EMPLOYEE (a) and DEPARTMENT (b) specify the range relations for a and b. The condition a.ssn=b.MGRSSN and SSN=ESSN is a join condition.
Domain relational calculus:
Explanation:
• Ten domain variables, q, r, s, …, z, are needed for the EMPLOYEE relation, one for each attribute. Only s is free, because it appears to the left of the bar (|).
• First, the requested attribute, the manager's last name, is specified by the free domain variable s.
• The condition for selecting a tuple follows the bar (|).
• A condition relating two domain variables from different relations, here e=t with SSN=MGRSSN and SSN=ESSN, is a join condition.
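The tuple relational calculus expression for query j does not appear in the text above. As a hedged sketch only, one way to write it that agrees with the explanation (a over EMPLOYEE, b over DEPARTMENT, c over DEPENDENT) is given below. The attribute names SSN, MGRSSN, and ESSN are taken from the explanation; the exact form of the original solution's expression is an assumption.

```latex
% Sketch of query j (department managers with no dependents).
% Assumed reconstruction; not necessarily the expression used in the original solution.
\[
\{\, a.\mathrm{LNAME} \mid \mathrm{EMPLOYEE}(a)
   \land (\exists b)\,\bigl(\mathrm{DEPARTMENT}(b) \land a.\mathrm{SSN} = b.\mathrm{MGRSSN}\bigr)
   \land \lnot (\exists c)\,\bigl(\mathrm{DEPENDENT}(c) \land c.\mathrm{ESSN} = a.\mathrm{SSN}\bigr) \,\}
\]
```

Here a is the only free variable. The existential quantifier over b keeps only employees who manage some department, and the negated existential quantifier over c excludes anyone who has a dependent.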
Chapter 8, Problem 25E
Problem
Specify queries a, b, c, and d of Exercise 1 in both tuple and domain relational calculus.
Exercise 1
Consider the AIRLINE relational database schema shown in Figure, which was described in Exercise 2. Specify the following queries in relational algebra:
a. For each flight, list the flight number, the departure airport for the first leg of the flight, and the arrival airport for the last leg of the flight.
b. List the flight numbers and weekdays of all flights or flight legs that depart from Houston Intercontinental Airport (airport code 'IAH') and arrive in Los Angeles International Airport (airport code 'LAX').
c. List the flight number, departure airport code, scheduled departure time, arrival airport code, scheduled arrival time, and weekdays of all flights or flight legs that depart from some airport in the city of Houston and arrive at some airport in the city of Los Angeles.
d. List all fare information for flight number 'CO197'.
e. Retrieve the number of available seats for flight number 'CO197' on '2009-10-09'.
The AIRLINE relational database schema.
Exercise 2
Consider the AIRLINE relational database schema shown in Figure, which describes a database for airline flight information. Each FLIGHT is identified by a Flight_number, and consists of one or more FLIGHT_LEGs with Leg_numbers 1, 2, 3, and so on. Each FLIGHT_LEG has scheduled arrival and departure times, airports, and one or more LEG_INSTANCEs, one for each Date on which the flight travels. FAREs are kept for each FLIGHT. For each FLIGHT_LEG instance, SEAT_RESERVATIONs are kept, as are the AIRPLANE used on the leg and the actual arrival and departure times and airports. An AIRPLANE is identified by an Airplane_id and is of a particular AIRPLANE_TYPE. CAN_LAND relates AIRPLANE_TYPEs to the AIRPORTs at which they can land. An AIRPORT is identified by an Airport_code.
Consider an update for the AIRLINE database to enter a reservation on a particular flight or flight leg on a given date.
a. Give the operations for this update.
b. What types of constraints would you expect to check?
c. Which of these constraints are key, entity integrity, and referential integrity constraints, and which are not?
d. Specify all the referential integrity constraints that hold on the schema shown in Figure.
Step-by-step solution
Step 1 of 5
a.
Tuple Relational Calculus:
In the tuple relational calculus expression, the tuple variable f ranges over FLIGHT and l ranges over FLIGHT_LEG.
• There are two free variables, f and l, which appear to the left of the bar (|).
• The attributes listed before the bar (|) are retrieved for all tuples that satisfy the conditions given after the bar.
• The conditions FLIGHT (f) and FLIGHT_LEG (l) specify the range relations for f and l. The condition f.Fnumber = l.flight_number is a join condition, whose purpose is similar to that of the INNER JOIN operation.
Domain Relational Calculus:
• Ten variables, q, r, s, …, z, are needed for the FLIGHT relation. Only q and v are free, because they appear to the left of the bar (|).
• First, the requested attributes are specified: the flight number, the departure airport for the first leg of the flight, and the arrival airport for the last leg of the flight.
• The condition for selecting a tuple follows the bar (|).
• A condition relating two domain variables from different relations, here m=z, is a join condition.
Step 2 of 5
b.
Tuple Relational Calculus:
In the tuple relational calculus expression, the tuple variable f ranges over FLIGHT and l ranges over FLIGHT_LEG.
• There is a single free variable, f, which appears to the left of the bar (|).
• The attributes listed before the bar (|) are retrieved for all tuples that satisfy the conditions given after the bar.
• The condition l.Departure_airport_code='IAH' and l.Arrival_airport_code='LAX' is a selection condition, which is similar to the SELECT operation in relational algebra.
• The conditions FLIGHT (f) and FLIGHT_LEG (l) specify the range relations for f and l. The condition f.Fnumber = l.flight_number is a join condition, whose purpose is similar to that of the INNER JOIN operation.
Domain Relational Calculus:
• Ten variables, q, r, s, …, z, are needed for the FLIGHT relation. Only u and v are free, because they appear to the left of the bar (|).
• First, the requested attributes are specified: the flight number and weekdays of all flights or flight legs that depart from Houston Intercontinental Airport and arrive at Los Angeles International Airport.
• The values assigned to the variables q, r, s, …, z form a tuple of the FLIGHT relation; q (Departure_airport_code) and r (Arrival_airport_code) must equal 'IAH' and 'LAX', respectively.
• The condition for selecting a tuple follows the bar (|).
• A condition relating two domain variables from different relations, here m=z, is a join condition.
Step 3 of 5
c.
Tuple Relational Calculus:
In the tuple relational calculus expression, the tuple variable f ranges over FLIGHT and l ranges over FLIGHT_LEG.
• There are two free variables, f and l, which appear to the left of the bar (|).
• The attributes listed before the bar (|) are retrieved for all tuples that satisfy the conditions given after the bar.
• The condition l.Departure_airport_code='IAH' and l.Arrival_airport_code='LAX' is a selection condition, which is similar to the SELECT operation in relational algebra.
Step 4 of 5
• The conditions FLIGHT (f) and FLIGHT_LEG (l) specify the range relations for f and l. The condition f.Fnumber = l.flight_number is a join condition, whose purpose is similar to that of the INNER JOIN operation.
Domain Relational Calculus:
• Ten variables are needed for the FLIGHT relation and five for FLIGHT_LEG. Of the 15 variables k, l, …, q, r, s, …, z, only u, l, m, n, o, and v are free, because they appear to the left of the bar (|).
• First, the requested attributes are specified: flight number, Departure_airport_code, Scheduled_departure_time, Arrival_airport_code, Scheduled_arrival_time, and weekdays, for all flights that depart from some airport in the city of Houston and arrive at some airport in the city of Los Angeles.
• The values assigned to the variables q, r, s, …, z and j, k, l, m, n, o, p form tuples of the FLIGHT and FLIGHT_LEG relations; q (Departure_airport_code) and r (Arrival_airport_code) must equal 'IAH' and 'LAX', respectively.
• The condition for selecting a tuple follows the bar (|).
• A condition relating two domain variables from different relations, here m=z, is a join condition.
Step 5 of 5
d.
Tuple Relational Calculus:
• There are two free variables, f and r, which appear to the left of the bar (|).
• The attributes listed before the bar (|) are retrieved for all tuples that satisfy the conditions given after the bar.
• The condition r.Fnumber='CO197' is a selection condition, which is similar to the SELECT operation in relational algebra.
• The conditions FLIGHT (f) and FARE (r) specify the range relations for f and r.
The condition r.Fnumber = f.flight_number is a join condition, whose purpose is similar to that of the INNER JOIN operation.
Domain Relational Calculus:
• Ten variables are needed for the FLIGHT relation and five for FARE. Of the 15 variables k, l, …, q, r, s, …, z, only s, t, u, v, and m are free, because they appear to the left of the bar (|).
• First, the requested attributes are specified: flight number, Fare_code, Amount, Restrictions, and Airline, giving all fare information for flight number 'CO197'.
• The values assigned to the variables q, r, s, …, z and l, m, n, o, p form tuples of the FARE and FLIGHT relations; q (flight_number) must equal 'CO197'.
• The condition for selecting a tuple follows the bar (|).
• A condition relating two domain variables from different relations, here m=z, is a join condition.
Chapter 8, Problem 26E
Problem
Specify queries c, d, and f of Exercise in both tuple and domain relational calculus.
Exercise
Consider the LIBRARY relational database schema shown in Figure, which is used to keep track of books, borrowers, and book loans. Referential integrity constraints are shown as directed arcs in Figure, as in the notation of Figure 5.7. Write down relational expressions for the following queries:
a. How many copies of the book titled The Lost Tribe are owned by the library branch whose name is 'Sharpstown'?
b. How many copies of the book titled The Lost Tribe are owned by each library branch?
c. Retrieve the names of all borrowers who do not have any books checked out.
d. For each book that is loaned out from the Sharpstown branch and whose Due_date is today, retrieve the book title, the borrower's name, and the borrower's address.
e. For each library branch, retrieve the branch name and the total number of books loaned out from that branch.
f. Retrieve the names, addresses, and number of books checked out for all borrowers who have more than five books checked out.
g.
For each book authored (or coauthored) by Stephen King, retrieve the title and the number of copies owned by the library branch whose name is Central.
A relational database schema for a LIBRARY database.
Step-by-step solution
Step 1 of 3
c. The following relational expression retrieves the names of the borrowers who have no books checked out:
Tuple Relational calculus:
Explanation:
• In the tuple relational calculus expression, the tuple variable b ranges over Borrower and l ranges over Book_Loans.
• There are two free variables, b and l, which appear to the left of the bar (|).
• The attributes listed before the bar (|) are retrieved for all tuples that satisfy the conditions given after the bar.
• The conditions Borrower (b) and Book_Loans (l) specify the range relations for b and l. The condition b.Card_No = l.Card_No is a join condition.
Domain Relational Calculus:
Explanation:
• Ten variables, q, r, s, …, z, are needed for the Borrower relation. Only q is free, because it appears to the left of the bar (|).
• First, the requested attribute, the name of the borrower, is specified by the free domain variable q for the Name field.
• The condition for selecting a tuple follows the bar (|).
• A condition relating two domain variables from different relations, here m=z, is a join condition.
Step 2 of 3
d. The following relational expression retrieves the book title and the borrower's name and address for each book that is loaned out from the branch named 'Sharpstown' and whose due date is today:
Tuple Relational calculus:
Explanation:
• In the tuple relational calculus expression, the tuple variable b ranges over Borrower and c ranges over Book_Loans.
• There are two free variables, b and c, which appear to the left of the bar (|).
• The attributes listed before the bar (|) are retrieved for all tuples that satisfy the conditions given after the bar.
• The conditions Borrower (b) and Book_Loans (c) specify the range relations for b and c. The condition c.Card_No = b.Card_No and c.Card_No = a.Card_No is a join condition, and b.branch_name = 'Sharpstown' is a selection condition.
Domain Relational Calculus:
• Sixteen domain variables are needed in this query. Only a, e, and f are free, because they appear to the left of the bar (|).
• First, the requested attributes are specified: the title from Book and the name and address fields from Borrower.
• The values assigned to the variables i, j, k, l, m form a tuple of the Book_Loans relation; i (Card_No) must equal o (Card_No), and Branch_name must equal 'Sharpstown'.
• The condition for selecting a tuple follows the bar (|).
• A condition relating two domain variables from different relations, here i=o and j=f, is a join condition.
Step 3 of 3
f. The following relational expression retrieves the name, address, and total number of books for all borrowers who have more than five books checked out:
Tuple Relational calculus:
Explanation:
• In the tuple relational calculus expression, the tuple variable b ranges over Borrower and a ranges over Book_Loans.
• There are two free variables, b and a, which appear to the left of the bar (|).
• The attributes listed before the bar (|) are retrieved for all tuples that satisfy the conditions given after the bar.
• The conditions Borrower (b) and Book_Loans (a) specify the range relations for b and a. The condition b.Card_No = a.Card_No is a join condition, and the total number of books for each borrower is obtained with the count() function.
Domain Relational Calculus:
Explanation:
• Ten variables, q, r, s, …, z, are needed for the Borrower relation.
Only q, s, and v are free, because they appear to the left of the bar (|).
• First, the requested attributes are specified by the free domain variables: q for the Name field, s for the address, and v for the total number of books.
• The condition for selecting a tuple follows the bar (|).
• A condition relating two domain variables from different relations, here m=z, is a join condition; in addition, the count must be greater than 5.
Chapter 8, Problem 27E
Problem
In a tuple relational calculus query with n tuple variables, what would be the typical minimum number of join conditions? Why? What is the effect of having a smaller number of join conditions?
Step-by-step solution
Step 1 of 1
A tuple relational calculus query with n tuple variables typically needs at least (n - 1) join conditions, because each join condition links one more range relation to the others. With fewer join conditions, a Cartesian product with one of the range relations would be taken, which usually does not make sense.
Chapter 8, Problem 28E
Problem
Rewrite the domain relational calculus queries that followed Q0 in Section 8.7 in the style of the abbreviated notation of Q0A, where the objective is to minimize the number of domain variables by writing constants in place of variables wherever possible.
Step-by-step solution
Step 1 of 5
Q1: {qsv | (EXISTS z) (EXISTS l) (EXISTS m) (EMPLOYEE (qrstuvwxyz) AND DEPARTMENT (lmno) AND l = 'Research' AND m = z)}
Step 2 of 5
One condition relates two domain variables that range over attributes from two relations, m = z in Q1, and another relates a domain variable to a constant, l = 'Research'. Writing the constant in place of l gives the abbreviated domain relational calculus query:
Step 3 of 5
Q1A: {qsv | (EXISTS z) (EXISTS m) (EMPLOYEE (qrstuvwxyz) AND DEPARTMENT ('Research', m, n, o) AND m = z)}
Step 4 of 5
Q2: {iksuv | (EXISTS j) (EXISTS m) (EXISTS n) (EXISTS t) (PROJECT (hijk) AND EMPLOYEE (qrstuvwxyz) AND DEPARTMENT (lmno) AND k = m AND n = t AND j = 'Stafford')}
In the abbreviated notation, the domain relational calculus query is:
Step 5 of 5
Q2A: {iksuv | (EXISTS m) (EXISTS n) (EXISTS t) (PROJECT (h, i, 'Stafford', k) AND EMPLOYEE (q, r, s, t, u, v, w, x, y, z) AND DEPARTMENT (l, m, n, o) AND k = m AND n = t)}
The remaining queries, Q6 and Q7, do not change, because they contain no constants.
Chapter 8, Problem 29E
Problem
Consider this query: Retrieve the Ssns of employees who work on at least those projects on which the employee with Ssn = 123456789 works. This may be stated as (FORALL x) (IF P THEN Q), where
• x is a tuple variable that ranges over the PROJECT relation.
• P ≡ employee with Ssn = 123456789 works on project x.
• Q ≡ employee e works on project x.
Express the query in tuple relational calculus, using the rules
• (∀x)(P(x)) ≡ NOT (∃x)(NOT (P(x))).
• (IF P THEN Q) ≡ (NOT (P) OR Q).
Step-by-step solution
Step 1 of 1
Applying the second rule to (FORALL x) (IF P THEN Q) gives:
{e.Ssn | EMPLOYEE(e) AND (∀x)(NOT (PROJECT(x)) OR NOT ((∃y)(WORKS_ON(y) AND y.Essn = '123456789' AND x.Pnumber = y.Pno)) OR ((∃w)(WORKS_ON(w) AND w.Essn = e.Ssn AND x.Pnumber = w.Pno)))}
Applying the first rule to remove the universal quantifier gives the equivalent form:
{e.Ssn | EMPLOYEE(e) AND NOT (∃x)(PROJECT(x) AND ((∃y)(WORKS_ON(y) AND y.Essn = '123456789' AND x.Pnumber = y.Pno)) AND NOT ((∃w)(WORKS_ON(w) AND w.Essn = e.Ssn AND x.Pnumber = w.Pno)))}
Chapter 8, Problem 30E
Problem
Show how you can specify the following relational algebra operations in both tuple and domain relational calculus.
a. σA=C(R(A, B, C))
b. π<A, B>(R(A, B, C))
c. R(A, B, C) * S(C, D, E)
d. R(A, B, C) ∪ S(A, B, C)
e. R(A, B, C) ∩ S(A, B, C)
f. R(A, B, C) - S(A, B, C)
g. R(A, B, C) × S(D, E, F)
h.
R(A, B) ÷ S(A)
Step-by-step solution
Step 1 of 7
(a) Tuple calculus expression followed by the domain calculus expression is
Step 2 of 7
(b) Tuple calculus expression followed by the domain calculus expression is
Step 3 of 7
(c) Tuple calculus expression followed by the domain calculus expression is
Step 4 of 7
(d) Tuple calculus expression followed by the domain calculus expression is
Step 5 of 7
(e) Tuple calculus expression
(f) Tuple calculus expression
Step 6 of 7
(g) Tuple calculus expression
Step 7 of 7
(h) Tuple relational calculus expression is
Chapter 8, Problem 31E
Problem
Suggest extensions to the relational calculus so that it may express the following types of operations that were discussed in Section 8.4: (a) aggregate functions and grouping; (b) OUTER JOIN operations; (c) recursive closure queries.
Step-by-step solution
Step 1 of 3
(a) We can define a relation AGGREGATE with attributes Sum, Minimum, Maximum, Average, Count, and so on. A query can then be written as
{t.Sum | AGGREGATE(t) AND (∃x)(EMPLOYEE(x) AND t.Sum = Σ x.Salary)}
which gives the sum of the salaries of all employees. Similar functions can be included for the other aggregate operations.
Step 2 of 3
(b) For OUTER JOIN, a special operation, say with the symbol δ, can be used. A query may look like:
{t | (EMPLOYEE δ DEPARTMENT)(t)}
Step 3 of 3
(c) For recursive closure, a special operation, say with the symbol Φ, can be used. A query may look like:
{t | EMPLOYEE(t) AND t.Ssn Φ t.Mgr_ssn}
By specifying that this is a recursive closure operation, we instruct the system to compute the result of the query recursively.
Chapter 8, Problem 32E
Problem
A nested query is a query within a query. More specifically, a nested query is a parenthesized query whose result can be used as a value in a number of places, such as instead of a relation. Specify the following queries on the database specified in Figure 5.5 using the concept of nested queries and the relational operators discussed in this chapter.
Also show the result of each query as it would apply to the database state in Figure 5.6.
a. List the names of all employees who work in the department that has the employee with the highest salary among all employees.
b. List the names of all employees whose supervisor's supervisor has '888665555' for Ssn.
c. List the names of employees who make at least $10,000 more than the employee who is paid the least in the company.
Step-by-step solution
Step 1 of 3
Consider the COMPANY database specified in Figure 5.5.
a. List the names of all employees who work in the department that has the employee with the highest salary among all employees.
The query using the relational operators is as follows:
Result:
Step 2 of 3
b. List the names of all employees whose supervisor's supervisor has '888665555' for Ssn.
The query using the relational operators is as follows:
Result:
Step 3 of 3
c. List the names of employees who make at least $10,000 more than the employee who is paid the least in the company.
The query using the relational operators is as follows:
Result:
Chapter 8, Problem 33E
Problem
State whether the following conclusions are true or false:
a. NOT (P(x) OR Q(x)) → (NOT (P(x)) AND (NOT (Q(x))))
b. NOT (∃x)(P(x)) → (∀x)(NOT (P(x)))
c. (∃x)(P(x)) → (∀x)(P(x))
Step-by-step solution
Step 1 of 3
(a) TRUE
Step 2 of 3
(b) TRUE
Step 3 of 3
(c) FALSE
Chapter 8, Problem 34LE
Problem
Specify and execute the following queries in relational algebra (RA) using the RA interpreter on the COMPANY database schema in Figure 5.5.
a. List the names of all employees in department 5 who work more than 10 hours per week on the ProductX project.
b. List the names of all employees who have a dependent with the same first name as themselves.
c. List the names of employees who are directly supervised by Franklin Wong.
d. List the names of employees who work on every project.
e.
List the names of employees who do not work on any project.
f. List the names and addresses of employees who work on at least one project located in Houston but whose department has no location in Houston.
g. List the names of department managers who have no dependents.
Step-by-step solution
Step 1 of 7
a)
EMP_WORK_PRODUCT <-- (σ Pname='ProductX' (Project)) ⋈ (Pnumber),(Pno) (Works_on)
EMP_W_10 <-- (Employee) ⋈ (Ssn),(Essn) (σ Hours>10 (EMP_WORK_PRODUCT))
π Lname, Fname, Minit (σ Dno=5 (EMP_W_10))
Explanation: The query displays the names of all employees of department 5 who work more than 10 hours per week on the ProductX project. It uses joins; σ selects tuples and π projects attributes, eliminating duplicates.
Step 2 of 7
b)
EMP <-- (Employee) ⋈ (Ssn,Fname),(Essn,Dependent_name) (DEPENDENT)
π Lname, Fname, Minit (EMP)
Explanation: The query displays the names of all employees who have a dependent with the same first name as themselves.
Step 3 of 7
c)
Wong_S <-- π Ssn (σ Fname='Franklin' AND Lname='Wong' (Employee))
Emp_wong <-- (Employee) ⋈ (SuperSsn),(Ssn) (Wong_S)
π Lname, Fname, Minit (Emp_wong)
Explanation: The query uses a self-join on Employee to display the names of all employees who are directly supervised by Franklin Wong.
Step 4 of 7
d)
Emp_proj(Pno, Ssn) <-- π Pno, Essn (Works_on)
All_proj <-- π Pnumber (Project)
All_proj_emp <-- Emp_proj ÷ All_proj
π Lname, Fname, Minit (Employee * All_proj_emp)
Explanation: The query gives the names of employees who work on every project by using the division operator, which keeps only those Ssn values in Emp_proj that are paired with every project number in All_proj.
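The division step in query d) can be sketched outside the RA interpreter to show what ÷ computes. The following is a minimal Python illustration, not the interpreter's implementation; the function name `divide` and the sample (Ssn, Pno) pairs are hypothetical and chosen only for the example.

```python
def divide(r, s):
    """Relational division R(x, y) ÷ S(y) on sets of tuples.

    Returns every x that is paired in r with *every* y in s.
    (Hypothetical helper for illustration only.)
    """
    s_values = set(s)
    candidates = {x for x, _ in r}
    return {x for x in candidates if all((x, y) in r for y in s_values)}


# Made-up sample data: (Ssn, Pno) pairs standing in for Emp_proj,
# and the set of all project numbers standing in for All_proj.
emp_proj = {
    ("111", 1), ("111", 2),
    ("222", 1),
    ("333", 1), ("333", 2), ("333", 3),
}
all_proj = {1, 2, 3}

# Employees who work on every project.
all_proj_emp = divide(emp_proj, all_proj)
print(all_proj_emp)
```

Only the employee paired with all three project numbers survives the division, which is the behavior that the step All_proj_emp <-- Emp_proj ÷ All_proj relies on.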
Step 5 of 7
e)
Emps <-- π Ssn (Employee)
Emps_Working(Ssn) <-- π Essn (Works_on)
Emp_Non_work_Project <-- Emps - Emps_Working
π Lname, Fname, Minit (Employee * Emp_Non_work_Project)
Explanation: The query gives the names of employees who do not work on any project by using the minus operator, which removes from the left-hand relation every tuple that also appears in the right-hand relation.
Step 6 of 7
f)
Emp_proj_Hou(Ssn) <-- π Essn (Works_on ⋈ (Pno),(Pnumber) (σ Plocation='Houston' (Project)))
Dept_NOLOC_HOU(Dno) <-- π Dnumber (Department) - π Dnumber (σ Dlocation='Houston' (Dept_locations))
Emp_Dept_No_Hou <-- π Ssn (Employee ⋈ (Dno),(Dno) (Dept_NOLOC_HOU))
Emps_Result <-- Emp_proj_Hou ∩ Emp_Dept_No_Hou
π Lname, Fname, Minit, Address (Employee * Emps_Result)
Explanation: The query gives the names and addresses of employees who work on at least one project located in Houston but whose department has no location in Houston. The minus operator finds the departments with no Houston location, and the intersection keeps the employees who satisfy both conditions.
Step 7 of 7
g)
Managers_Dept(Ssn) <-- π Mgr_Ssn (Department)
Dependents_Of_Emps(Ssn) <-- π Essn (Dependent)
Emps_Result <-- Managers_Dept - Dependents_Of_Emps
π Lname, Fname, Minit (Employee * Emps_Result)
Explanation: The query gives the names of department managers who have no dependents by using the minus operator, which removes from the left-hand relation every tuple that also appears in the right-hand relation.
Chapter 8, Problem 35LE
Problem
Consider the following MAILORDER relational schema describing the data for a mail order company.
PARTS(Pno, Pname, Qoh, Price, Olevel)
CUSTOMERS(Cno, Cname, Street, Zip, Phone)
EMPLOYEES(Eno, Ename, Zip, Hdate)
ZIP_CODES(Zip, City)
ORDERS(Ono, Cno, Eno, Received, Shipped)
ODETAILS(Ono, Pno, Qty)
Qoh stands for quantity on hand; the other attribute names are self-explanatory. Specify and execute the following queries using the RA interpreter on the MAILORDER database schema.
a. Retrieve the names of parts that cost less than $20.00.
b.
Retrieve the names and cities of employees who have taken orders for parts costing more than $50.00.
c. Retrieve the pairs of customer number values of customers who live in the same ZIP Code.
d. Retrieve the names of customers who have ordered parts from employees living in Wichita.
e. Retrieve the names of customers who have ordered parts costing less than $20.00.
f. Retrieve the names of customers who have not placed an order.
g. Retrieve the names of customers who have placed exactly two orders.
Step-by-step solution
Step 1 of 7
MAILORDER relational schema
a) The following command retrieves the names of parts that cost less than $20.00.
SELECT Pname FROM PARTS WHERE Price < 20.00;
Step 2 of 7
b) The following command retrieves the names and cities of employees who have taken orders for parts costing more than $50.00.
SELECT DISTINCT Emp.Ename, Z.City FROM PARTS P, ODETAILS OT, ORDERS O, EMPLOYEES Emp, ZIP_CODES Z WHERE P.Pno = OT.Pno AND OT.Ono = O.Ono AND O.Eno = Emp.Eno AND Emp.Zip = Z.Zip AND P.Price > 50.00;
Step 3 of 7
c) The following command retrieves the pairs of customer number values of customers who live in the same ZIP Code:
SELECT C.Cno, C1.Cno FROM CUSTOMERS C, CUSTOMERS C1 WHERE C.Zip = C1.Zip AND C.Cno <> C1.Cno;
Step 4 of 7
d) The following command retrieves the names of customers who have ordered parts from employees living in Wichita.
SELECT DISTINCT C.Cname FROM CUSTOMERS C, ORDERS O, EMPLOYEES E, ZIP_CODES Z WHERE C.Cno = O.Cno AND O.Eno = E.Eno AND E.Zip = Z.Zip AND Z.City = 'Wichita';
Step 5 of 7
e) The following command retrieves the names of customers who have ordered parts costing less than $20.00. Because it uses a double NOT EXISTS, it returns the customers for whom no part costing less than $20.00 is missing from their orders, that is, customers who have ordered every such part.
SELECT C.Cname FROM CUSTOMERS C WHERE NOT EXISTS (SELECT P.Pno FROM PARTS P WHERE P.Price < 20.00 AND NOT EXISTS (SELECT * FROM ORDERS O, ODETAILS OT WHERE O.Ono = OT.Ono AND O.Cno = C.Cno AND OT.Pno = P.Pno));
Step 6 of 7
f) The following command retrieves the names of customers who have not placed an order.
SELECT C.Cname FROM CUSTOMERS C WHERE NOT EXISTS (SELECT O.Ono FROM ORDERS O WHERE O.Cno = C.Cno);
Step 7 of 7
g) The following command retrieves the names of customers who have placed exactly two orders.
SELECT C.Cname FROM CUSTOMERS C, ORDERS O WHERE O.Cno = C.Cno GROUP BY C.Cno, C.Cname HAVING COUNT(O.Ono) = 2;
Chapter 8, Problem 36LE
Problem
Consider the following GRADEBOOK relational schema describing the data for a grade book of a particular instructor. (Note: The attributes A, B, C, and D of COURSES store grade cutoffs.)
CATALOG(Cno, Ctitle)
STUDENTS(Sid, Fname, Lname, Minit)
COURSES(Term, Sec_no, Cno, A, B, C, D)
ENROLLS(Sid, Term, Sec_no)
Specify and execute the following queries using the RA interpreter on the GRADEBOOK database schema.
a. Retrieve the names of students enrolled in the Automata class during the fall 2009 term.
b. Retrieve the Sid values of students who have enrolled in CSc226 and CSc227.
c. Retrieve the Sid values of students who have enrolled in CSc226 or CSc227.
d. Retrieve the names of students who have not enrolled in any class.
e. Retrieve the names of students who have enrolled in all courses in the CATALOG table.
Step-by-step solution
Step 1 of 5
GRADEBOOK database
a) The following command retrieves the names of students enrolled in the Automata class during the fall 2009 term.
• SELECT Fname, Minit, Lname FROM STUDENTS, ENROLLS, COURSES, CATALOG WHERE STUDENTS.Sid = ENROLLS.Sid AND COURSES.Cno = CATALOG.Cno AND COURSES.Term = ENROLLS.Term AND COURSES.Sec_no = ENROLLS.Sec_no AND CATALOG.Ctitle = 'Automata' AND ENROLLS.Term = 2009;
Step 2 of 5
b) The following command retrieves the Sid values of students who have enrolled in CSc226 and CSc227.
• SELECT Sid FROM STUDENTS WHERE Sid IN (SELECT ENROLLS.Sid FROM ENROLLS, COURSES WHERE COURSES.Term = ENROLLS.Term AND COURSES.Sec_no = ENROLLS.Sec_no AND COURSES.Cno = 'CSc226') AND Sid IN (SELECT ENROLLS.Sid FROM ENROLLS, COURSES WHERE COURSES.Term = ENROLLS.Term AND COURSES.Sec_no = ENROLLS.Sec_no AND COURSES.Cno = 'CSc227');
Step 3 of 5
c) The following command retrieves the Sid values of students who have enrolled in CSc226 or CSc227.
• SELECT Sid FROM STUDENTS WHERE Sid IN (SELECT ENROLLS.Sid FROM ENROLLS, COURSES WHERE COURSES.Term = ENROLLS.Term AND COURSES.Sec_no = ENROLLS.Sec_no AND COURSES.Cno = 'CSc226') OR Sid IN (SELECT ENROLLS.Sid FROM ENROLLS, COURSES WHERE COURSES.Term = ENROLLS.Term AND COURSES.Sec_no = ENROLLS.Sec_no AND COURSES.Cno = 'CSc227');
Step 4 of 5
d) The following command retrieves the names of students who have not enrolled in any class.
• SELECT Fname, Minit, Lname FROM STUDENTS WHERE NOT EXISTS (SELECT ENROLLS.Sid FROM ENROLLS WHERE ENROLLS.Sid = STUDENTS.Sid);
Step 5 of 5
e) The following command retrieves the names of students who have enrolled in all courses in the CATALOG table.
• SELECT Fname, Minit, Lname FROM STUDENTS WHERE NOT EXISTS ((SELECT Cno FROM CATALOG) MINUS (SELECT COURSES.Cno FROM COURSES, ENROLLS WHERE COURSES.Term = ENROLLS.Term AND COURSES.Sec_no = ENROLLS.Sec_no AND STUDENTS.Sid = ENROLLS.Sid));
Chapter 8, Problem 37LE
Problem
Consider a database that consists of the following relations.
SUPPLIER(Sno, Sname)
PART(Pno, Pname)
PROJECT(Jno, Jname)
SUPPLY(Sno, Pno, Jno)
The database records information about suppliers, parts, and projects and includes a ternary relationship between suppliers, parts, and projects. This relationship is a many-many-many relationship. Specify the following queries in relational algebra.
1. Retrieve the part numbers that are supplied to exactly two projects.
2. Retrieve the names of suppliers who supply more than two parts to project J1.
3. Retrieve the part numbers that are supplied by every supplier.
4. Retrieve the project names that are supplied by supplier S1 only.
5. Retrieve the names of suppliers who supply at least two different parts each to at least two different projects.
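No solution is given for this problem. As a hedged illustration of query 1 only, the sketch below models SUPPLY as a Python set of (Sno, Pno, Jno) tuples and mirrors a "group by Pno, count distinct Jno, select count = 2" plan. It is not relational algebra and not a definitive answer; the function name `parts_supplied_to_exactly` and the sample tuples are hypothetical.

```python
from collections import defaultdict


def parts_supplied_to_exactly(supply, k):
    """Return the part numbers supplied to exactly k distinct projects.

    supply is a set of (Sno, Pno, Jno) tuples. Projecting onto (Pno, Jno)
    first removes duplicates caused by several suppliers supplying the
    same part to the same project. (Hypothetical helper for illustration.)
    """
    projects_by_part = defaultdict(set)
    for _sno, pno, jno in supply:
        projects_by_part[pno].add(jno)
    return {pno for pno, jnos in projects_by_part.items() if len(jnos) == k}


# Made-up SUPPLY(Sno, Pno, Jno) state.
SUPPLY = {
    ("S1", "P1", "J1"), ("S2", "P1", "J2"), ("S3", "P1", "J2"),
    ("S1", "P2", "J1"),
    ("S1", "P3", "J1"), ("S2", "P3", "J2"), ("S3", "P3", "J3"),
}

result = parts_supplied_to_exactly(SUPPLY, 2)
print(result)
```

Counting distinct projects rather than SUPPLY tuples matters here: P1 appears in three tuples of the sample state but is supplied to only two projects.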
Chapter 9, Problem 1RQ
Problem
(a) Discuss the correspondences between the ER model constructs and the relational model constructs. Show how each ER model construct can be mapped to the relational model and discuss any alternative mappings.
(b) Discuss the options for mapping EER model constructs to relations, and the conditions under which each option could be used.
Step-by-step solution
Step 1 of 3
The ER model represents data in a conceptual and abstract way. It is used in database modeling to reduce the complexity of the database schema and to produce a semantic data model of a system. In a relational schema, relationship types are not represented explicitly; instead, they are represented by two attributes, one a primary key and the other a foreign key.
a. Some of the correspondences between the ER model and the relational model are as follows:
• ER model: entities and relationships among the entities. Relational model: relations consisting of attributes; relationships are established through foreign keys.
• ER model: a strong entity type, represented by a rectangle. Relational model: an entity relation is constructed for each strong entity.
• ER model: a weak entity type, represented by a double rectangle. Relational model: an entity relation is constructed for each weak entity.
• ER model: a binary 1:1 or 1:N relationship type, represented by a connecting line. Relational model: a foreign key, or a relationship relation with two foreign keys, each referencing the corresponding entity relation.
• ER model: a binary M:N relationship type, represented by a connecting line. Relational model: a relationship relation with two foreign keys.
• ER model: an n-ary relationship type (n > 2), represented by a connecting line. Relational model: a relationship relation with n foreign keys.
• ER model: entities have simple attributes. Relational model: relations have attributes corresponding to those of the entities of the ER model.
• ER model: entities have composite attributes. Relational model: relations have the set of simple component attributes.
• ER model: entities have multivalued attributes. Relational model: each multivalued attribute is represented by a relation and a foreign key.
• ER model: entities have derived attributes. Relational model: derived attributes are not included.
• ER model: a value set is the set of values that may be assigned to an attribute. Relational model: a domain is the value scope of a particular attribute.
• ER model: key attributes are underlined. Relational model: keys are declared as primary keys, foreign keys, composite keys, or candidate keys.
The following steps map an ER model to the relational model:
1. Ignore derived attributes.
Derived attributes are attributes that can be derived from other attributes, such as age or full name. If the ER diagram has any derived attributes, remove them all to make the schema simpler. For example, the full name can be calculated by concatenating the first name, middle name, and last name of the candidate, so it does not need to be stored separately.
2. Mapping of strong entities.
• Map all strong entities into tables. Create a separate relation for each strong entity, including all of its simple attributes in the ER diagram, and choose the key attribute of the entity as the primary key of the relation.
• For an entity type T in the ER model E, create a relation R that includes all simple attributes of T, and choose a unique attribute as the primary key of R.
• If multiple keys exist for T in E, keep all of them to describe specific information about the attributes. The keys can also be used for indexing the database and for other analysis.
3. Mapping of weak entities.
• Map all weak entities into tables. Create a separate relation for each weak entity, including all of its simple attributes. Include the primary keys of the relations to which the weak entity is related as foreign keys, to establish the connection among the relations.
• A weak entity does not have its own candidate key.
Here, the candidate key of the relation R is composed of the primary key(s) of the owner entity (or entities) and the partial key of the weak entity.
4. Binary 1:1 mapping.
• For each binary 1:1 relationship in the ER schema, identify the relations of the two participating entities. The relationship can be represented by a foreign key in one of them, or by merging the two entity types into a single relation.
• Also add the attributes that belong to the relationship. A further alternative is to create a new relation that includes the primary keys of both participating relations as foreign keys.
5. Binary 1:N mapping.
• Identify all 1:N relationships in the ER diagram. For each binary 1:N relationship, the primary key of the relation on the 1-side becomes a foreign key in the relation on the N-side.
• Another approach is to create a new relation S that includes the primary keys of both participating entities. Both primary keys act as foreign keys in S.
6. Binary M:N mapping.
• Identify all M:N relationships in the ER diagram. For each binary M:N relationship R, create a new relation S to represent it. Include the primary keys of both participating relations as foreign keys in S, together with the simple attributes of the relationship.
• The combination of the foreign keys forms the primary key of S. Unlike a 1:1 or 1:N relationship, an M:N relationship cannot be represented by a single foreign key attribute in one of the participating relations.

Step 2 of 3
7. Mapping of multivalued attributes.
• For each multivalued attribute A in the ER diagram, create a new relation R. R contains the simple attributes corresponding to A.
• R also contains a foreign key attribute K, where K is the primary key of the relation that represents the entity or relationship type having A as a multivalued attribute. The primary key of R is the combination of A and K.
8. Mapping of n-ary relationships.
• For each n-ary relationship with n > 2, create a new relation R to represent it. Include the primary keys of all participating relations as foreign key attributes, together with the simple attributes of the n-ary relationship.
• Because more than two entities participate, the relationship cannot be mapped without creating a new relation. The combination of all foreign keys is generally used as the primary key of R.

Step 3 of 3
b. Mapping an EER model into a relational model.
Mapping an Enhanced Entity Relationship (EER) model to relations includes all 8 steps followed in part (a). The EER model extends the ER model with additional constructs: specialization or generalization, and shared subclasses. The following additional step is used for EER-to-relational mapping:
9. Mapping specialization or generalization.
• The subclasses that constitute a specialization can be mapped into a relational schema using several options. The first is to map the whole specialization into a single table. The second is to map it into multiple tables. Each has variations that depend on the constraints on the specialization or generalization.
• A specialization with m subclasses S_1, ..., S_m and a generalized superclass C, where C has the primary key k and the attributes a_1, ..., a_n, is converted into relational schemas using one of the following options:
Option 9a: Multiple relations (superclass and subclasses).
• Create a relation R for the superclass that includes all the attributes of C, with primary key k. Create a separate relation R_i for each subclass S_i, where 1 ≤ i ≤ m, containing k and the attributes of S_i. Here k is working as the primary key of each relation R_i.
Option 9b: Multiple relations (subclass relations only).
• Create a relation R_i corresponding to every subclass S_i that includes the attributes of S_i together with the attributes k, a_1, ..., a_n of C, where k is the primary key of each relation R_i.
This option can be used for a specialization whose subclasses are total, that is, one in which every entity of the superclass belongs to at least one subclass.
• A specialization with the disjointness constraint can be mapped through this option. If the subclasses overlap, the same entity may be replicated in several relations, which causes redundancy in the relational schema.
Option 9c: Single relation with one type attribute.
• Create a single relation R that includes the attributes of C, the attributes of every subclass, and a type attribute t. The attribute t is called a type attribute or discriminating attribute. It indicates the subclass to which each tuple belongs. The attribute k is the primary key.
• This option is applicable to a specialization whose subclasses are disjoint. It generates many NULL values if the subclasses have many attributes of their own.
Option 9d: Single relation with multiple type attributes.
• Create a single relation R that includes the attributes of C, the attributes of every subclass, and the type attributes t_1, ..., t_m. The attribute k is the primary key of R.
• Each attribute t_i is a Boolean type attribute that indicates whether a record belongs to subclass S_i or not. This option can be used for a specialization with overlapping subclasses.

Chapter 9, Problem 2E
Problem
Map the UNIVERSITY database schema shown in Figure 3.20 into a relational database schema.

Step-by-step solution
Step 1 of 3
Refer to Figure 3.20 in Chapter 3 of the textbook for the UNIVERSITY database schema.

Step 2 of 3
The basic steps to map an ER diagram into a relational database schema are as follows:
1. Ignore derived attributes.
If the ER diagram has any derived attributes, remove them to make the schema simpler. Derived attributes are attributes that can be computed from other attributes, such as age or full name. Age can be calculated as the difference between the current date and the date of birth.
2. Mapping of all strong entities into tables.
Map all strong entities into tables.
Create a relation R that includes all simple attributes in the ER diagram, and choose the key attribute of the ER diagram as the primary key of relation R.
COLLEGE(CName, COffice, CPhone)
INSTRUCTOR(Id, IName, Rank, IOffice, IPhone)
DEPT(DCode, DName, DOffice, DPhone)
STUDENT(Sid, DOB, FName, MName, LName, Addr, Phone, Major)
COURSE(CCode, Credits, CoName, Level, CDesc)
SECTION(SecId, SecNo, Sem, Year, Bldg, RoomNo, DaysTime)
3. Mapping of weak entities.
For each weak entity, create a separate relation R. Add all the simple attributes of the weak entity to R. Include as foreign keys the primary keys of the relations to which the weak entity is related, to establish the connection among the relations. Since the provided ER diagram has no weak entity, there is nothing to map in this step.
4. 1:1 mapping.
For each binary 1:1 relationship in the ER schema, identify the relations of the two participating entities. The relationship can be represented by a foreign key or by merging the two into one (both must have exactly the same number of attributes). Also add the attributes that belong to the relationship.
COLLEGE(CName, COffice, CPhone, DeanId)
INSTRUCTOR(Id, IName, Rank, IOffice, IPhone, DCode, CStartDate)
5. 1:N mapping.
Identify all 1:N relationships in the ER diagram. For each regular binary 1:N relationship, add the primary key of the participating relation on the 1-side as a foreign key to the relation on the N-side.
COLLEGE(CName, COffice, CPhone, DeanId, DCode)
DEPT(DCode, DName, DOffice, DPhone, CCode, InstId, SId)
INSTRUCTOR(Id, IName, Rank, IOffice, IPhone, DCode, CStartDate, SecId)
COURSE(CCode, Credits, CoName, Level, CDesc, SecId)
6. M:N mapping.
Identify all M:N relationships in the ER diagram. For each M:N relationship, create a new relation S to represent the relationship. Include the primary key attributes of the participating relations as foreign keys in S.
TAKES(Sid, Grade, SecId)
7. Mapping of multivalued attributes.
For each multivalued attribute in the ER diagram, create a new relation R.
R will include all attributes corresponding to the multivalued attribute. Add the primary key attribute of the owning relation as a foreign key in R. Since the provided ER diagram has no multivalued attributes, there is nothing to map in this step.
8. Mapping of n-ary relationships.
For each n-ary relationship, where n > 2, create a new relation R to represent the relationship. Include the primary key attributes of all participating relations as foreign key attributes, together with the simple attributes of the n-ary relationship. Since the maximum value of n in the provided ER diagram is 2, there is no n-ary relationship to map.

Step 3 of 3
The final relational schema for the ER diagram in Figure 3.20 is as follows:
The final schema has seven relations, six from the strong entities and one from the binary M:N relationship. Each relational table has primary and foreign keys. The TAKES table represents the relationship between the STUDENT and SECTION tables. A Grade is looked up by Sid and SecId, that is, for a student in a particular section of a given semester and year.
• In the COLLEGE table, CName is the primary key, and DeanId and DCode are foreign keys referencing the INSTRUCTOR and DEPT tables respectively. DeanId refers to the Id attribute of the INSTRUCTOR table.
• In the INSTRUCTOR table, Id is the primary key. DCode and SecId are foreign keys referencing the DEPT and SECTION tables respectively.
• In the DEPT table, DCode is unique for each department and is the primary key. To establish the connection with COURSE, INSTRUCTOR, and STUDENT, their primary keys are used as foreign keys. InstId in the DEPT table refers to the primary key (Id) of the INSTRUCTOR table and acts as a foreign key here.
• The STUDENT table has a primary key only. SId is used to get the personal information of a student, but retrieving academic information requires a connection with the DEPT and TAKES tables.
• Each course has its unique CCode in the COURSE table.
The COURSE table is logically connected with the SECTION and DEPT tables, which place the course in a department and a section.
• The TAKES table is created from the binary M:N relationship between STUDENT and SECTION. This is the normalized form of both tables.
• In the SECTION table, SecId is the primary key.

Chapter 9, Problem 3E
Problem
Try to map the relational schema in Figure 6.14 into an ER schema. This is part of a process known as reverse engineering, where a conceptual schema is created for an existing implemented database. State any assumptions you make.

Step-by-step solution
Step 1 of 3
Take the relational schema from Figure 6.14 of the textbook. The ER schema can be constructed from the relations it shows.

Step 2 of 3

Step 3 of 3
Here, BOOK_AUTHORS is a multivalued attribute, so it can be represented as a weak entity type.

Chapter 9, Problem 4E
Problem
The figure shows an ER schema for a database that can be used to keep track of transport ships and their locations for maritime authorities. Map this schema into a relational schema and specify all primary keys and foreign keys.
Figure: An ER schema for a SHIP_TRACKING database.

Step-by-step solution
Step 1 of 6
The steps to convert the given ER schema into a relational schema are as follows:
Step 1: Mapping the regular entity types:
Identify the regular entities in the given ER schema and create a relation for each one. Include all the simple attributes of the regular entities in the relations. The relations are SHIP, SHIP_TYPE, STATE_COUNTRY, and SEA/OCEAN/LAKE.

Step 2 of 6
Step 2: Mapping the weak entity types:
The weak entities in the given ER schema are SHIP_MOVEMENT, PORT, and PORT_VISIT. Create a relation for each weak entity. Include all the simple attributes of the weak entity in its relation, and include the primary key of the owner entity type as a foreign key.
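The weak-entity step can be sketched concretely for SHIP_MOVEMENT, whose owner is SHIP. This is a minimal illustration using Python's sqlite3 module, not the schema from the missing figure. It assumes that SHIP is keyed by Sname and that the partial key of SHIP_MOVEMENT is (Date, Time); the column types and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript("""
CREATE TABLE SHIP (
    Sname TEXT PRIMARY KEY              -- key of the owner entity (assumed)
);
CREATE TABLE SHIP_MOVEMENT (
    Sname TEXT NOT NULL REFERENCES SHIP(Sname),  -- owner key as foreign key
    Date  TEXT NOT NULL,                -- partial key (assumed)
    Time  TEXT NOT NULL,                -- partial key (assumed)
    PRIMARY KEY (Sname, Date, Time)     -- owner key + partial key
);
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO SHIP VALUES ('Aurora')")
conn.execute("INSERT INTO SHIP_MOVEMENT VALUES ('Aurora', '2024-01-05', '08:00')")

# A movement cannot exist without its owning ship.
try:
    conn.execute("INSERT INTO SHIP_MOVEMENT VALUES ('Ghost', '2024-01-05', '09:00')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

movement_count = conn.execute("SELECT COUNT(*) FROM SHIP_MOVEMENT").fetchone()[0]
print(orphan_rejected, movement_count)
```

The composite primary key is what distinguishes this mapping from a regular entity: the weak entity's rows are identified only in combination with the key of their owner.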
Step 3 of 6
Step 3: Mapping of binary 1:1 relationship types:
There is one binary 1:1 relationship type in the given ER schema, SHIP_AT_PORT.
Step 4: Mapping of binary 1:N relationship types:
The 1:N relationship types in the given ER schema are HISTORY, TYPE, IN, ON, and HOME_PORT.
For the HISTORY 1:N relationship type, include the primary key of SHIP in SHIP_MOVEMENT. This is already handled in step 2.
For the TYPE 1:N relationship type, include the primary key of SHIP_TYPE in SHIP.
For the IN 1:N relationship type, include the primary key of STATE_COUNTRY in PORT.
For the ON 1:N relationship type, include the primary key of SEA/OCEAN/LAKE in PORT.
For the HOME_PORT 1:N relationship type, include the primary key of PORT in SHIP.

Step 4 of 6
Step 5: Mapping of binary M:N relationship types:
There are no binary M:N relationship types in the given ER schema.
Step 6: Mapping of multivalued attributes:
There are no multivalued attributes in the given ER schema.
The relational schema is shown below:

Step 5 of 6
The primary keys in the schema are:
SHIP: Sname
SHIP_TYPE: Type
SHIP_MOVEMENT: Statename, Date, Time (compound key)
SEA/OCEAN/LAKE: SeaName
PORT: Pname
STATE_COUNTRY: Name
PORT_VISIT: VSname, Start_date (compound key)

Step 6 of 6
The foreign keys in the schema are:
SHIP: Ship_type, P_name
SHIP_TYPE: None
SHIP_MOVEMENT: Statename
SEA/OCEAN/LAKE: None
PORT: None
STATE_COUNTRY: Name
PORT_VISIT: VSname

Chapter 9, Problem 5E
Problem
Map the BANK ER schema of Exercise 1 (shown in Figure 2) into a relational schema. Specify all primary keys and foreign keys. Repeat for the AIRLINE schema (Figure 3.20) of Exercise 2 and for the other schemas for Exercises 1 through 9.
Exercise 1
Consider the ER diagram shown in Figure 1 for part of a BANK database. Each bank can have multiple branches, and each branch can have multiple accounts and loans.
a. List the strong (nonweak) entity types in the ER diagram.
b. Is there a weak entity type?
If so, give its name, partial key, and identifying relationship. c. What constraints do the partial key and the identifying relationship of the weak entity type specify in this diagram? d. List the names of all relationship types, and specify the (min, max) constraint on each participation of an entity type in a relationship type. Justify your choices. Figure 1 An ER diagram for a BANK database schema. Exercise 2 Consider the ER diagram in Figure 2, which shows a simplified schema for an airline reservations system. Extract from the ER diagram the requirements and constraints that produced this schema. Try to be as precise as possible in your requirements and constraints specification. Figure 2 An ER diagram for an AIRLINE database schema. Exercise 3 Which combinations of attributes have to be unique for each individual SECTION entity in the UNIVERSITY database shown in Figure 3.20 to enforce each of the following miniworld constraints: a. During a particular semester and year, only one section can use a particular classroom at a particular DaysTime value. b. During a particular semester and year, an instructor can teach only one section at a particular DaysTime value. c. During a particular semester and year, the section numbers for sections offered for the same course must all be different. Can you think of any other similar constraints? Exercise 4 Composite and multivalued attributes can be nested to any number of levels. Suppose we want to design an attribute for a STUDENT entity type to keep track of previous college education. Such an attribute will have one entry for each college previously attended, and each such entry will be composed of college name, start and end dates, degree entries (degrees awarded at that college, if any), and transcript entries (courses completed at that college, if any). 
Each degree entry contains the degree name and the month and year the degree was awarded, and each transcript entry contains a course name, semester, year, and grade. Design an attribute to hold this information. Use the conventions in Figure 3.5.
Exercise 5
Show an alternative design for the attribute described in Exercise 4 that uses only entity types (including weak entity types, if needed) and relationship types.
Exercise 6
In Chapters 1 and 2, we discussed the database environment and database users. We can consider many entity types to describe such an environment, such as DBMS, stored database, DBA, and catalog/data dictionary. Try to specify all the entity types that can fully describe a database system and its environment; then specify the relationship types among them, and draw an ER diagram to describe such a general database environment.
Exercise 7
Design an ER schema for keeping track of information about votes taken in the U.S. House of Representatives during the current two-year congressional session. The database needs to keep track of each U.S. STATE's Name (e.g., 'Texas', 'New York', 'California') and include the Region of the state (whose domain is {'Northeast', 'Midwest', 'Southeast', 'Southwest', 'West'}). Each CONGRESS_PERSON in the House of Representatives is described by his or her Name, plus the District represented, the Start_date when the congressperson was first elected, and the political Party to which he or she belongs (whose domain is {'Republican', 'Democrat', 'Independent', 'Other'}). The database keeps track of each BILL (i.e., proposed law), including the Bill_name, the Date_of_vote on the bill, whether the bill Passed_or_failed (whose domain is {'Yes', 'No'}), and the Sponsor (the congressperson(s) who sponsored, that is, proposed, the bill). The database also keeps track of how each congressperson voted on each bill (the domain of the Vote attribute is {'Yes', 'No', 'Abstain', 'Absent'}). Draw an ER schema diagram for this application.
State clearly any assumptions you make.
Exercise 8
A database is being constructed to keep track of the teams and games of a sports league. A team has a number of players, not all of whom participate in each game. It is desired to keep track of the players participating in each game for each team, the positions they played in that game, and the result of the game. Design an ER schema diagram for this application, stating any assumptions you make. Choose your favorite sport (e.g., soccer, baseball, football).
Exercise 9
Consider the ER diagram in Figure 3. Assume that an employee may work in up to two departments or may not be assigned to any department. Assume that each department must have one and may have up to three phone numbers. Supply (min, max) constraints on this diagram. State clearly any additional assumptions you make. Under what conditions would the relationship HAS_PHONE be redundant in this example?
Figure 3 Part of an ER diagram for a COMPANY database.
Figure 3.20

Step-by-step solution
There is no solution to this problem yet.

Chapter 9, Problem 6E
Problem
Map the EER diagrams in Figures 4.9 and 4.12 into relational schemas. Justify your choice of mapping options.

Step-by-step solution
Step 1 of 7
The relational schema diagram for the EER diagram in Figure 4.9 is as shown below:

Step 2 of 7
Explanation:
• The regular entity types are PERSON, DEPARTMENT, COLLEGE, COURSE and SECTION. So, create a relation for each entity with its respective attributes.
• FACULTY and STUDENT are subclasses of the entity PERSON. So, two relations, one for FACULTY and one for STUDENT, are created, and the primary key of PERSON is included in both relations along with their respective attributes.
• An entity INSTRUCTOR_RESEARCHER is created with Instructor_id as an attribute. This attribute is included as a foreign key in the relations FACULTY and GRAD_STUDENT.
• There exists a binary 1:1 relationship CHAIRS between FACULTY and DEPARTMENT. So, include the primary key of FACULTY as a foreign key in relation DEPARTMENT.
• There exists a binary 1:N relationship CD between COLLEGE and DEPARTMENT. So, include the primary key of COLLEGE as a foreign key in relation DEPARTMENT.

Step 3 of 7
• There exists a binary 1:N relationship DC between DEPARTMENT and COURSE. So, include the primary key of DEPARTMENT as a foreign key in relation COURSE.
• There exists a binary 1:N relationship CS between COURSE and SECTION. So, include the primary key of COURSE as a foreign key in relation SECTION.
• There exists a binary 1:N relationship ADVISOR between FACULTY and GRAD_STUDENT. So, include the primary key of FACULTY as a foreign key in relation GRAD_STUDENT.
• There exists a binary 1:N relationship PI between FACULTY and GRANT. So, include the primary key of FACULTY as a foreign key in relation GRANT.
• There exists a binary 1:N relationship TEACH between SECTION and INSTRUCTOR_RESEARCHER. Create a relation TEACH and include the primary keys of SECTION and INSTRUCTOR_RESEARCHER as attributes of TEACH.
• There exists a binary 1:N relationship MAJOR between STUDENT and DEPARTMENT. Create a relation MAJOR and include the primary keys of STUDENT and DEPARTMENT as attributes of MAJOR.

Step 4 of 7
• There exists a binary 1:N relationship MINOR between STUDENT and DEPARTMENT. Create a relation MINOR and include the primary keys of STUDENT and DEPARTMENT as attributes of MINOR.
• There exists a binary M:N relationship COMMITTEE between FACULTY and GRAD_STUDENT. Create a relation COMMITTEE and include the primary keys of FACULTY and GRAD_STUDENT as attributes of COMMITTEE.
• There exists a binary M:N relationship BELONGS between FACULTY and DEPARTMENT. Create a relation BELONGS and include the primary keys of FACULTY and DEPARTMENT as attributes of BELONGS.
• There exists a binary M:N relationship REGISTERED between STUDENT and CURRENT_SECTION. Create a relation REGISTERED and include the primary keys of STUDENT and CURRENT_SECTION as attributes of REGISTERED.
• There exists a binary M:N relationship TRANSCRIPT between SECTION and STUDENT. Create a relation TRANSCRIPT and include the primary keys of SECTION and STUDENT as attributes of TRANSCRIPT, along with the additional attributes of TRANSCRIPT.

Step 5 of 7
The relational schema diagram for the EER diagram in Figure 4.12 is as shown below:

Step 6 of 7
Explanation:
• The regular entity types are PLANE_TYPE, AIRPLANE and HANGAR. So, create a relation for each entity with its respective attributes.
• Create two relations, CORPORATION and PERSON, and include their respective attributes.
• The OWNER category is a subset of the union of the two entities CORPORATION and PERSON. So, a relation OWNER is created with Owner_id as an attribute. This attribute is included as a foreign key in the relations CORPORATION and PERSON.
• EMPLOYEE and PILOT are subclasses of the entity PERSON. So, two relations, one for EMPLOYEE and one for PILOT, are created, and the primary key of PERSON is included as the primary key in both relations along with their respective attributes.
• SERVICE is a weak entity. So, create a relation SERVICE and include as attributes the primary key of AIRPLANE along with the attributes of SERVICE.
• There exists a binary 1:N relationship OF_TYPE between AIRPLANE and PLANE_TYPE, with PLANE_TYPE on the 1-side. So, include the primary key of PLANE_TYPE as a foreign key in relation AIRPLANE.
• There exists a binary 1:N relationship STORED_IN between AIRPLANE and HANGAR, with HANGAR on the 1-side. So, include the primary key of HANGAR as a foreign key in relation AIRPLANE.
• There exists a binary M:N relationship WORKS_ON between PLANE_TYPE and EMPLOYEE. Create a relation WORKS_ON and include the primary keys of PLANE_TYPE and EMPLOYEE as attributes of WORKS_ON.

Step 7 of 7
• There exists a binary M:N relationship FLIES between PLANE_TYPE and PILOT. Create a relation FLIES and include the primary keys of PLANE_TYPE and PILOT as attributes of FLIES.
• There exists a binary M:N relationship OWNS between AIRPLANE and OWNER. Create a relation OWNS and include the primary keys of AIRPLANE and OWNER as attributes of OWNS, along with the attribute Pdate.
• There exists a binary M:N relationship MAINTAIN between SERVICE and EMPLOYEE. Create a relation MAINTAIN and include the primary keys of SERVICE and EMPLOYEE as attributes of MAINTAIN.

Chapter 9, Problem 7E
Problem
Is it possible to successfully map a binary M : N relationship type without requiring a new relation? Why or why not?

Step-by-step solution
Step 1 of 3
When a many-to-many relationship exists between two entities, the relationship type is known as a binary M:N relationship type.

Step 2 of 3
The steps to map a binary M:N relationship type R into a relation are as follows:
• Create a new relation R1 to represent the relationship type R.
• Include the primary keys of the two participating entities as foreign keys in the new relation R1.
• The primary keys of the two participating entities together also become the composite primary key of relation R1.
• Also include any simple attributes of the relationship type R.

Step 3 of 3
A single foreign key column in one of the participating relations can hold only one value per tuple, so it cannot record the many related tuples on the other side. Hence, it is not possible to map a binary M:N relationship type without requiring a new relation.

Chapter 9, Problem 8E
Problem
Consider the EER diagram in the figure for a car dealer. Map the EER schema into a set of relations. For the VEHICLE to CAR/TRUCK/SUV generalization, consider the four options presented in Section 9.2.1 and show the relational schema design under each of those options.
Figure: EER diagram for a car dealer.
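As orientation, the four options named in the problem statement can be written down as table definitions and compared by the relations each one produces. The following is a rough sketch using Python's sqlite3 module, not the textbook's schema diagrams. The attribute names Vin, Model, Price, Engine_size, Tonnage, and No_seats follow the car-dealer schema as used in this exercise; the option labels A to D, the column types, the CHECK constraint, and the helper table_names are assumptions made for illustration.

```python
import sqlite3

# One DDL script per mapping option (labels A-D are illustrative).
OPTIONS = {
    # Multiple relations: superclass and subclasses.
    "A": """
        CREATE TABLE VEHICLE (Vin TEXT PRIMARY KEY, Model TEXT, Price REAL);
        CREATE TABLE CAR   (Vin TEXT PRIMARY KEY REFERENCES VEHICLE(Vin), Engine_size REAL);
        CREATE TABLE TRUCK (Vin TEXT PRIMARY KEY REFERENCES VEHICLE(Vin), Tonnage REAL);
        CREATE TABLE SUV   (Vin TEXT PRIMARY KEY REFERENCES VEHICLE(Vin), No_seats INTEGER);
    """,
    # Multiple relations: subclass relations only (superclass attributes repeated).
    "B": """
        CREATE TABLE CAR   (Vin TEXT PRIMARY KEY, Model TEXT, Price REAL, Engine_size REAL);
        CREATE TABLE TRUCK (Vin TEXT PRIMARY KEY, Model TEXT, Price REAL, Tonnage REAL);
        CREATE TABLE SUV   (Vin TEXT PRIMARY KEY, Model TEXT, Price REAL, No_seats INTEGER);
    """,
    # Single relation with one type attribute.
    "C": """
        CREATE TABLE VEHICLE (
            Vin TEXT PRIMARY KEY, Model TEXT, Price REAL,
            Engine_size REAL, Tonnage REAL, No_seats INTEGER,
            Vehicle_type TEXT CHECK (Vehicle_type IN ('CAR', 'TRUCK', 'SUV'))
        );
    """,
    # Single relation with one Boolean type attribute per subclass.
    "D": """
        CREATE TABLE VEHICLE (
            Vin TEXT PRIMARY KEY, Model TEXT, Price REAL,
            Car_type INTEGER, Engine_size REAL,
            Truck_type INTEGER, Tonnage REAL,
            Suv_type INTEGER, No_seats INTEGER
        );
    """,
}


def table_names(ddl):
    """Run the DDL in a fresh in-memory database and list the tables it creates."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(ddl)
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]


tables_per_option = {label: table_names(ddl) for label, ddl in OPTIONS.items()}
for label, names in tables_per_option.items():
    print(label, names)
```

Options A and B spread the generalization over several relations, while C and D keep a single VEHICLE relation whose subclass-specific columns are NULL for rows of the other subclasses.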
Step-by-step solution
Step 1 of 8
Option multiple relations – superclass and subclasses:
The set of relations for the VEHICLE to CAR/TRUCK/SUV generalization using the option multiple relations – superclass and subclasses is as follows:

Step 2 of 8
Using the option multiple relations – superclass and subclasses, a separate relation is created for the superclass and for each subclass in the generalization.
• A relation VEHICLE is created with attributes Vin, Model and Price.
• A relation CAR is created with attributes Vin and Engine_size.
• A relation TRUCK is created with attributes Vin and Tonnage.
• A relation SUV is created with attributes Vin and No_seats.

Step 3 of 8
The relational schema for the car dealer EER diagram (refer to Figure 9.9) using the option multiple relations – superclass and subclasses is as shown below:

Step 4 of 8
Option multiple relations – subclass relations only:
The set of relations for the VEHICLE to CAR/TRUCK/SUV generalization using the option multiple relations – subclass relations only is as follows:
Using the option multiple relations – subclass relations only, a separate relation is created for each subclass in the generalization.
• A relation CAR is created with attributes Vin, Model, Price and Engine_size.
• A relation TRUCK is created with attributes Vin, Model, Price and Tonnage.
• A relation SUV is created with attributes Vin, Model, Price and No_seats.

Step 5 of 8
Option single relation with one type attribute:
The set of relations for the VEHICLE to CAR/TRUCK/SUV generalization using the option single relation with one type attribute is as follows:
Using the option single relation with one type attribute, a single relation is created for the superclass as well as the subclasses.
• The attributes of the relation will be the union of the attributes of the superclass and the subclasses.
• An attribute Vehicle_Type is added to specify the type of the vehicle.
• A relation Vehicle is created with attributes Vin, Model, Price, Engine_size, Tonnage, No_seats and Vehicle_Type.

Step 6 of 8
The relational schema for the car dealer EER diagram (refer to Figure 9.9) using the option single relation with one type attribute is as shown below:

Step 7 of 8
Option single relation with multiple type attributes:
The set of relations for the VEHICLE to CAR/TRUCK/SUV generalization using the option single relation with multiple type attributes is as follows:
Using the option single relation with multiple type attributes, a single relation is created for the superclass as well as the subclasses.
• The attributes of the relation will be the union of the attributes of the superclass and the subclasses.
• A Boolean attribute Car_Type is added to indicate whether the vehicle is a car.
• A Boolean attribute Truck_Type is added to indicate whether the vehicle is a truck.
• A Boolean attribute SUV_Type is added to indicate whether the vehicle is an SUV.
• A relation Vehicle is created with attributes Vin, Model, Price, Car_Type, Engine_size, Truck_Type, Tonnage, SUV_Type, No_seats.

Step 8 of 8
The relational schema for the car dealer EER diagram (refer to Figure 9.9) using the option single relation with multiple type attributes is as shown below:

Chapter 9, Problem 9E
Problem
Using the attributes you provided for the EER diagram in the Exercise, map the complete schema into a set of relations. Choose an appropriate option out of 8A through 8D from Section 9.2.1 in doing the mapping of generalizations and defend your choice.
Exercise
Consider the following EER diagram that describes the computer systems at a company. Provide your own attributes and key for each entity type. Supply max cardinality constraints justifying your choice. Write a complete narrative description of what this EER diagram represents.
Step-by-step solution
Step 1 of 2
What the EER diagram represents:
The EER diagram represents the computer systems at a company.
• The EER diagram starts with the relation computer.
• The relation computer has the attributes RAM, ROM, Processor, S_no, Manufacturer, and Cost.
• Its primary key is S_no, and it participates with a cardinality of 1:M.
• From computer, the diagram connects to Accessory, Installed, and the specialization marked d.
• Accessory has a one-to-many cardinality and covers the keyboard, monitor, and mouse.
• The Installed and installed_OS relationships connect the computer to software and operating_system, which run on the computer system and support it.
• The specialization d divides computer into laptop and desktop, each with its other components.
• The other related components are memory, video_card, and sound_card.
Cardinality:
• One-to-one cardinality: one occurrence of an entity is related to only one occurrence of another entity.
• One-to-many cardinality: one occurrence of an entity is related to many occurrences of another entity.
• Many-to-many cardinality: many occurrences of an entity are related to many occurrences of another entity.

Step 2 of 2
The following table describes the attributes, primary key, and cardinality of each relation:

Chapter 9, Problem 10LE
Problem
Consider the ER design for the UNIVERSITY database that was modeled using a tool like ERwin or Rational Rose in Laboratory Exercise 3.31. Using the SQL schema generation feature of the modeling tool, generate the SQL schema for an Oracle database.
Reference Exercise 3.31
Consider the UNIVERSITY database described in Exercise 16. Build the ER schema for this database using a data modeling tool such as ERwin or Rational Rose.
Reference Exercise 16
Which combinations of attributes have to be unique for each individual SECTION entity in the UNIVERSITY database shown in Figure 3.20 to enforce each of the following miniworld constraints:
a. During a particular semester and year, only one section can use a particular classroom at a particular DaysTime value.
b. During a particular semester and year, an instructor can teach only one section at a particular DaysTime value.
c. During a particular semester and year, the section numbers for sections offered for the same course must all be different.
Can you think of any other similar constraints?

Step-by-step solution
Step 1 of 1
Refer to the ER schema for the UNIVERSITY database, generated using the Rational Rose tool in Laboratory Exercise 3.31. Use the Rational Rose tool to create the SQL schema for an Oracle database as follows:
• Open the ER schema generated using the Rational Rose tool in Laboratory Exercise 3.31. In the options available on the left, right click on the option Component view, go to Data Modeler, then go to New and select the option Database.
• Name the database Oracle Database.
• Right click on Oracle Database and select the option Open Specification. In the field Target select Oracle 7.x and click on OK.
• Import the ER schema, generated using the Rational Rose tool in Laboratory Exercise 3.31, into the Oracle Database as follows:
  • Right click on the Oracle Database, then go to New and select the option File.
  • Browse to and select the ER schema generated using the Rational Rose tool in Laboratory Exercise 3.31. Selecting the file imports the ER schema for the UNIVERSITY database.
• Click on the File option in the menu bar, then click on the Save as option. Save the ER schema under the file name 714374-9-10LE.
• This will generate the SQL schema of the UNIVERSITY database for the Oracle database.

Chapter 9, Problem 11LE
Problem
Consider the ER design for the MAIL_ORDER database that was modeled using a tool like ERwin or Rational Rose in Laboratory Exercise. Using the SQL schema generation feature of the modeling tool, generate the SQL schema for an Oracle database.
Exercise
Consider a MAIL_ORDER database in which employees take orders for parts from customers. The data requirements are summarized as follows:
• The mail order company has employees, each identified by a unique employee number, first and last name, and Zip Code.
• Each customer of the company is identified by a unique customer number, first and last name, and Zip Code.
• Each part sold by the company is identified by a unique part number, a part name, price, and quantity in stock.
• Each order placed by a customer is taken by an employee and is given a unique order number. Each order contains specified quantities of one or more parts. Each order has a date of receipt as well as an expected ship date. The actual ship date is also recorded.
Design an entity-relationship diagram for the mail order database and build the design using a data modeling tool such as ERwin or Rational Rose.

Step-by-step solution
Step 1 of 1
Refer to the ER schema for the MAIL_ORDER database, generated using the Rational Rose tool in Laboratory Exercise 3.32. Use the Rational Rose tool to create the SQL schema for an Oracle database as follows:
• Open the ER schema generated using the Rational Rose tool in Laboratory Exercise 3.32. In the options available on the left, right click on the option Component view, go to Data Modeler, then go to New and select the option Database.
• Name the database Oracle Database.
• Right click on Oracle Database and select the option Open Specification. In the field Target select Oracle 7.x and click on OK.
• Import the ER schema, generated using the Rational Rose tool in Laboratory Exercise 3.32, into the Oracle Database as follows:
  • Right click on the Oracle Database, then go to New and select the option File.
  • Browse to and select the ER schema generated using the Rational Rose tool in Laboratory Exercise 3.32. Selecting the file imports the ER schema for the MAIL_ORDER database.
• Click on the File option in the menu bar, then click on the Save as option. Save the ER schema under the file name 714374-9-11LE.
• This will generate the SQL schema of the MAIL_ORDER database for the Oracle database.

Chapter 10, Problem 1RQ
Problem
What is ODBC? How is it related to SQL/CLI?
Step-by-step solution
Step 1 of 1
ODBC: Open Database Connectivity (ODBC) is a standardized application programming interface (API) for accessing a database. Programs use ODBC software to reach the stored data, and the programming support for ODBC comes from Microsoft.
SQL/CLI: SQL/CLI is part of the SQL standard; CLI stands for Call Level Interface. It was developed as a follow-up to the earlier technique known as ODBC. SQL/CLI is a set of functions that a program calls to access the database.

Chapter 10, Problem 2RQ
Problem
What is JDBC? Is it an example of embedded SQL or of using function calls?
Step-by-step solution
Step 1 of 2
JDBC: JDBC stands for Java Database Connectivity. It is a registered trademark of Sun Microsystems. JDBC is a function-call interface for accessing databases from Java. A JDBC driver is basically an implementation of the function calls specified in the JDBC application programming interface. JDBC is designed to allow a single Java program to connect to several different databases.
Step 2 of 2
JDBC is not an example of embedded SQL. It uses function calls, which are specified in the JDBC API. JDBC function calls can access any RDBMS for which a JDBC driver is available. The function libraries for this access are known as JDBC.
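The function-call style can be illustrated with a minimal sketch. It assumes the GRADE_REPORT table (Student_number, Grade) used in Exercise 10.7 and an already opened Connection; the class name JdbcCallSketch and the helper gradePoints are hypothetical names introduced only for this illustration. The SQL text is passed to the driver as an ordinary string argument, so no precompiler is involved.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class JdbcCallSketch {

    // Hypothetical helper: A = 4, B = 3, C = 2, D = 1, anything else 0.
    static int gradePoints(String grade) {
        if (grade == null || grade.isEmpty()) {
            return 0;
        }
        switch (grade.charAt(0)) {
            case 'A': return 4;
            case 'B': return 3;
            case 'C': return 2;
            case 'D': return 1;
            default:  return 0;
        }
    }

    // Every database interaction below is a plain Java method call on a
    // java.sql object. Table and column names are assumptions taken from
    // Exercise 10.7.
    static double averagePoints(Connection conn, int studentNumber) throws SQLException {
        String sql = "SELECT Grade FROM GRADE_REPORT WHERE Student_number = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, studentNumber);          // bind the ? parameter
            try (ResultSet rs = ps.executeQuery()) {
                int total = 0;
                int count = 0;
                while (rs.next()) {               // iterate over the result rows
                    total += gradePoints(rs.getString(1));
                    count++;
                }
                return count == 0 ? 0.0 : (double) total / count;
            }
        }
    }
}
```

A caller would obtain the Connection from DriverManager.getConnection with a driver-specific URL and pass it to averagePoints; that URL depends on the DBMS and is left out here.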

Chapter 10, Problem 3RQ
Problem
List the three main approaches to database programming. What are the advantages and disadvantages of each approach?
Step-by-step solution
Step 1 of 3
Main approaches to database programming: The main approaches to database programming are:
(1) Embedding database commands in a general-purpose programming language: Database statements are embedded in the host programming language and identified by a special prefix. A precompiler or preprocessor scans the source program code to identify the database statements and extracts them for processing by the DBMS.
Step 2 of 3
(2) Using a library of database functions: A library of functions is made available to the host programming language for database calls.
Step 3 of 3
(3) Designing a brand-new language: A database programming language is designed from scratch to be compatible with the database model and query language. Loops and conditional statements are added to the database language to turn it into a full-fledged programming language.
Advantages and disadvantages of the approaches: The first two approaches are the most common, because many applications are written in a general-purpose language and need only some database access. Their main disadvantage is impedance mismatch. The third approach is more appropriate for applications with intensive database interaction, and impedance mismatch does not occur in it.

Chapter 10, Problem 4RQ
Problem
What is the impedance mismatch problem? Which of the three programming approaches minimizes this problem?
Step-by-step solution
Step 1 of 2
Impedance mismatch: Impedance mismatch is the term used for the problems that arise from the differences between the database model and the programming language model. It is less of a problem when a special database programming language is designed that uses the same data model and data types as the database model.
The relational model has three main constructs: attributes, tuples, and tables.
Step 2 of 2
First problem: The data types of the programming language differ from the attribute data types of the data model. It is therefore necessary to have a binding for each programming language, because different languages have different data types.
Second problem: The results of most queries are sets or multisets of tuples, and each tuple is formed of a sequence of attribute values. A binding is needed to map the query result data structure, which is a table, to an appropriate data structure in the programming language.
The third approach to database programming, designing a brand-new language, minimizes the impedance mismatch problem.

Chapter 10, Problem 5RQ
Problem
Describe the concept of a cursor and how it is used in embedded SQL.
Step-by-step solution
Step 1 of 2
A cursor is a pointer that points to a single tuple (row) from the result of a query that retrieves multiple tuples. It is declared when the SQL query command is declared in the program. The program then uses two commands with the cursor: the OPEN CURSOR command and the FETCH command. The cursor variable acts as an iterator over the tuples of the query result.
Step 2 of 2
In embedded SQL, an update or delete command can use the condition WHERE CURRENT OF <cursor name>, which specifies that the current tuple referenced by the cursor is the one to be updated or deleted. A cursor declaration in embedded SQL has the following general form:
DECLARE <cursor name> [ INSENSITIVE ] [ SCROLL ] CURSOR
  [ WITH HOLD ] FOR <query specification>
  [ ORDER BY <ordering specification> ]
  [ FOR READ ONLY | FOR UPDATE [ OF <attribute list> ] ] ;

Chapter 10, Problem 6RQ
Problem
What is SQLJ used for? Describe the two types of iterators available in SQLJ.
Step-by-step solution
Step 1 of 2
SQLJ: SQLJ is a standard adopted by several vendors for embedding SQL in Java. SQLJ is used to access SQL databases from Java through SQL statements embedded in the Java program, rather than through explicit function calls.
It is used with the Oracle DBMS. An SQLJ translator converts the SQL statements into Java, which is then executed through the JDBC interface. In SQLJ, an iterator is associated with the tuples and attributes in a query result. There are two types of iterators:
(1) A named iterator.
(2) A positional iterator.
Step 2 of 2
A named iterator is associated with a query result by listing the attribute names and types that appear in the query result. A positional iterator lists only the attribute types that appear in the query result. In both cases, the list should be in the same order as the attributes listed in the SELECT clause of the query. Looping over a query result is different for the two types of iterators. With a named iterator, attribute values are accessed by name. A positional iterator carries only attribute types, so its values are fetched into program variables, which makes it behave more like embedded SQL.

Chapter 10, Problem 7E
Problem
Consider the database shown in Figure 1.2, whose schema is shown in Figure 2.1. Write a program segment to read a student’s name and print his or her grade point average, assuming that A = 4, B = 3, C = 2, and D = 1 points. Use embedded SQL with C as the host language.
Step-by-step solution
Step 1 of 1
Assuming that all required variables have been declared already and that the Name of a STUDENT is unique, the code looks like this:

/* Sname, name, number and grade are host variables assumed to be declared
   in an EXEC SQL DECLARE SECTION; grade is a single char. */
int total_points = 0, total_course_count = 0;
double grade_avg = 0.0;

prompt("Enter name of student: ", Sname);

EXEC SQL SELECT Student_number, Name INTO :number, :name
         FROM STUDENT WHERE Name = :Sname;

EXEC SQL DECLARE GR CURSOR FOR
         SELECT Grade FROM GRADE_REPORT WHERE Student_number = :number;
EXEC SQL OPEN GR;
EXEC SQL FETCH FROM GR INTO :grade;
while (SQLCODE == 0) {
    /* Each case needs a break so that points are not added more than once. */
    switch (grade) {
        case 'A': total_points += 4; break;
        case 'B': total_points += 3; break;
        case 'C': total_points += 2; break;
        case 'D': total_points += 1; break;
    }
    total_course_count++;
    EXEC SQL FETCH FROM GR INTO :grade;
}
EXEC SQL CLOSE GR;

/* Divide as double so that the average is not truncated. */
if (total_course_count != 0)
    grade_avg = (double) total_points / total_course_count;
printf("Grade average of student is %.2f\n", grade_avg);

Chapter 10, Problem 8E
Problem
Repeat Exercise 10.7, but use SQLJ with Java as the host language.
Reference 10.7
Consider the database shown in Figure 1.2, whose schema is shown in Figure 2.1. Write a program segment to read a student’s name and print his or her grade point average, assuming that A = 4, B = 3, C = 2, and D = 1 points. Use embedded SQL with C as the host language.
Step-by-step solution
Step 1 of 1
Assuming that all required variables have been declared already, that the headers have been included, and that the Name of a STUDENT is unique, the code looks like this:

int total_points = 0, total_course_count = 0;
double grade_avg = 0.0;
Sname = readEntry("Enter student name: ");
try {
    #sql { SELECT Student_number, Name INTO :number, :name
           FROM STUDENT WHERE Name = :Sname };
} catch (SQLException se) {
    System.out.println("No student with this name: " + Sname);
    return;
}
// Named iterator over the single Grade column of the query result.
#sql iterator GR(String grade);
GR g = null;
#sql g = { SELECT Grade FROM GRADE_REPORT WHERE Student_number = :number };
while (g.next()) {
    switch (g.grade().charAt(0)) {
        case 'A': total_points += 4; break;
        case 'B': total_points += 3; break;
        case 'C': total_points += 2; break;
        case 'D': total_points += 1; break;
    }
    total_course_count++;
}
g.close();
if (total_course_count != 0) {
    grade_avg = (double) total_points / total_course_count;
}
System.out.println("Grade average of student is " + grade_avg);

Chapter 10, Problem 9E
Problem
Consider the library relational database schema in Figure. Write a program segment that retrieves the list of books that became overdue yesterday and that prints the book title and borrower name for each. Use embedded SQL with C as the host language.
Figure
A relational database schema for a LIBRARY database.
Step-by-step solution
Step 1 of 1
Assuming that all required variables have been declared already:

/* Select the loans whose due date was yesterday. */
EXEC SQL DECLARE DB CURSOR FOR
         SELECT B.Book_id, B.Title, BW.Name
         FROM BOOK B, BORROWER BW, BOOK_LOANS BL
         WHERE BL.Due_date = CurDate() - 1
           AND BL.Card_no = BW.Card_no
           AND BL.Book_id = B.Book_id;
EXEC SQL OPEN DB;
EXEC SQL FETCH FROM DB INTO :bookId, :bookTitle, :borrowerName;
while (SQLCODE == 0) {
    printf("Book id: %s\n", bookId);
    printf("Book title: %s\n", bookTitle);
    printf("Borrower name: %s\n", borrowerName);
    EXEC SQL FETCH FROM DB INTO :bookId, :bookTitle, :borrowerName;
}
EXEC SQL CLOSE DB;

Chapter 10, Problem 10E
Problem
Repeat Exercise, but use SQLJ with Java as the host language.
Exercise
Consider the library relational database schema in Figure. Write a program segment that retrieves the list of books that became overdue yesterday and that prints the book title and borrower name for each. Use embedded SQL with C as the host language.
Figure
A relational database schema for a LIBRARY database.
Step-by-step solution
Step 1 of 1
Assuming that all required variables have been declared already and that the headers have been included:

// The column aliases match the names declared in the named iterator.
#sql iterator DB(String bookId, String bookTitle, String borrowerName);
DB d = null;
#sql d = { SELECT B.Book_id AS bookId, B.Title AS bookTitle, BW.Name AS borrowerName
           FROM BOOK B, BORROWER BW, BOOK_LOANS BL
           WHERE BL.Due_date = CurDate() - 1
             AND BL.Card_no = BW.Card_no
             AND BL.Book_id = B.Book_id };
while (d.next()) {
    System.out.println("book id: " + d.bookId() + " book title: " + d.bookTitle()
                       + " borrower name: " + d.borrowerName());
}
d.close();

Chapter 10, Problem 11E
Problem
Repeat Exercise 10.7 and 10.9, but use SQL/CLI with C as the host language.
Reference 10.7
Consider the database shown in Figure 1.2, whose schema is shown in Figure 2.1. Write a program segment to read a student’s name and print his or her grade point average, assuming that A = 4, B = 3, C = 2, and D = 1 points. Use embedded SQL with C as the host language.
Reference 10.9
Consider the library relational database schema in Figure 6.6. Write a program segment that retrieves the list of books that became overdue yesterday and that prints the book title and borrower name for each. Use embedded SQL with C as the host language.
Step-by-step solution
Step 1 of 4
Exercise 10.7 using SQL/CLI:

/* The calls follow the simplified SQL/CLI signatures of the original solution.
   Sname, name, number, grade and the fetchlen variables are assumed declared. */
#include "sqlcli.h"
void printGPA() {
    SQLHSTMT stmt1, stmt2;
    SQLHDBC con1;
    SQLHENV env1;
    SQLRETURN ret1, ret2, ret3, ret4;
    int total_points = 0, total_course_count = 0;

    ret1 = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env1);
    if (!ret1) ret2 = SQLAllocHandle(SQL_HANDLE_DBC, env1, &con1); else exit(1);
    if (!ret2) ret3 = SQLConnect(con1, "dbs", SQL_NTS, "js", SQL_NTS, "xyz", SQL_NTS); else exit(1);
    if (!ret3) ret4 = SQLAllocHandle(SQL_HANDLE_STMT, con1, &stmt1); else exit(1);
    if (!ret4) ret4 = SQLAllocHandle(SQL_HANDLE_STMT, con1, &stmt2); else exit(1);
    if (ret4) exit(1);

    /* Look up the student number for the name that was entered. */
    SQLPrepare(stmt1, "SELECT Student_number, Name FROM STUDENT WHERE Name = ?", SQL_NTS);
    prompt("Enter student name: ", Sname);
    SQLBindParameter(stmt1, 1, SQL_CHAR, &Sname, 15, &fetchlen1);
    ret1 = SQLExecute(stmt1);
    if (!ret1) {
        SQLBindCol(stmt1, 1, SQL_INT, &number, 4, &fetchlen1);
        SQLBindCol(stmt1, 2, SQL_CHAR, &name, 15, &fetchlen2);
        ret2 = SQLFetch(stmt1);
    }
    if (ret1 || ret2) {
        printf("Sname does not match\n");
        return;
    }

Step 2 of 4
    /* Read the grades on a second statement handle so stmt1 is not reused. */
    SQLPrepare(stmt2, "SELECT Grade FROM GRADE_REPORT WHERE Student_number = ?", SQL_NTS);
    SQLBindParameter(stmt2, 1, SQL_INT, &number, 4, &fetchlen1);
    ret1 = SQLExecute(stmt2);
    if (!ret1) {
        SQLBindCol(stmt2, 1, SQL_CHAR, &grade, 1, &fetchlen1);
        ret2 = SQLFetch(stmt2);
        while (!ret2) {
            switch (grade) {
                case 'A': total_points += 4; break;
                case 'B': total_points += 3; break;
                case 'C': total_points += 2; break;
                case 'D': total_points += 1; break;
            }
            total_course_count++;
            ret2 = SQLFetch(stmt2);
        }
    }
    if (total_course_count != 0)
        printf("Grade average of student is %.2f\n",
               (double) total_points / total_course_count);
}

Step 3 of 4
Exercise 10.9 using SQL/CLI:

#include "sqlcli.h"
void printDueBookRecord() {
    SQLHSTMT stmt1;
    SQLHDBC con1;
    SQLHENV env1;
    SQLRETURN ret1, ret2, ret3, ret4;

    ret1 = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env1);
    if (!ret1) ret2 = SQLAllocHandle(SQL_HANDLE_DBC, env1, &con1); else exit(1);
    if (!ret2) ret3 = SQLConnect(con1, "dbs", SQL_NTS, "js", SQL_NTS, "xyz", SQL_NTS); else exit(1);
    if (!ret3) ret4 = SQLAllocHandle(SQL_HANDLE_STMT, con1, &stmt1); else exit(1);
    if (ret4) exit(1);

    /* Select the loans whose due date was yesterday. */
    SQLPrepare(stmt1,
        "SELECT B.Book_id, B.Title, BW.Name FROM BOOK B, BORROWER BW, BOOK_LOANS BL "
        "WHERE BL.Due_date = CurDate() - 1 AND BL.Card_no = BW.Card_no "
        "AND BL.Book_id = B.Book_id", SQL_NTS);
    ret1 = SQLExecute(stmt1);

Step 4 of 4
    if (!ret1) {
        SQLBindCol(stmt1, 1, SQL_CHAR, &Book_id, 4, &fetchlen1);
        SQLBindCol(stmt1, 2, SQL_CHAR, &Title, 30, &fetchlen2);
        SQLBindCol(stmt1, 3, SQL_CHAR, &Borrower_name, 30, &fetchlen3);
        ret2 = SQLFetch(stmt1);
        while (!ret2) {
            printf("%s  %s  %s\n", Book_id, Title, Borrower_name);
            ret2 = SQLFetch(stmt1);
        }
    }
}

Chapter 10, Problem 12E
Problem
Repeat Exercise 10.7 and 10.9, but use JDBC with Java as the host language.
Reference 10.7
Consider the database shown in Figure 1.2, whose schema is shown in Figure 2.1. Write a program segment to read a student’s name and print his or her grade point average, assuming that A = 4, B = 3, C = 2, and D = 1 points. Use embedded SQL with C as the host language.
Reference 10.9
Consider the library relational database schema in Figure 6.6. Write a program segment that retrieves the list of books that became overdue yesterday and that prints the book title and borrower name for each. Use embedded SQL with C as the host language.
Step-by-step solution
Step 1 of 2
Exercise 10.7 using JDBC:

import java.io.*;
import java.sql.*;
…..
class PrintGPAAverage {
    // readEntry() is assumed to be a helper that prompts for and returns one line of input.
    public static void main(String[] args) throws SQLException, IOException {
        try {
            Class.forName("oracle.jdbc.driver.OracleDriver");
        } catch (ClassNotFoundException x) {
            System.out.println("Driver could not be loaded");
        }
        String dbacct, passwrd, Sname;
        int totalPoints = 0, totalCourseCount = 0;
        dbacct = readEntry("Enter database account: ");
        passwrd = readEntry("Enter password: ");
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:oci8:" + dbacct + "/" + passwrd);
        Sname = readEntry("Enter student name: ");
        // Parameter markers (?) keep the user input out of the SQL text.
        PreparedStatement s = conn.prepareStatement(
                "SELECT Student_number, Name FROM STUDENT WHERE Name = ?");
        s.setString(1, Sname);
        ResultSet r = s.executeQuery();
        while (r.next()) {
            int number = r.getInt(1);
            PreparedStatement g = conn.prepareStatement(
                    "SELECT Grade FROM GRADE_REPORT WHERE Student_number = ?");
            g.setInt(1, number);
            ResultSet rs = g.executeQuery();
            while (rs.next()) {
                String grade = rs.getString(1);
                if (grade == null || grade.isEmpty()) continue;
                switch (grade.charAt(0)) {
                    case 'A': totalPoints += 4; break;
                    case 'B': totalPoints += 3; break;
                    case 'C': totalPoints += 2; break;
                    case 'D': totalPoints += 1; break;
                }
                totalCourseCount++;
            }
        }
        if (totalCourseCount != 0) {
            System.out.println("Grade average of student is "
                    + (double) totalPoints / totalCourseCount);
        }
    }
}

Step 2 of 2
Exercise 10.9 using JDBC:

import java.io.*;
import java.sql.*;
…..
class PrintOverdueBooks {
    // readEntry() is assumed to be a helper that prompts for and returns one line of input.
    public static void main(String[] args) throws SQLException, IOException {
        try {
            Class.forName("oracle.jdbc.driver.OracleDriver");
        } catch (ClassNotFoundException x) {
            System.out.println("Driver could not be loaded");
        }
        String dbacct, passwrd;
        String Book_Id, Book_title, Borrower_name;
        dbacct = readEntry("Enter database account: ");
        passwrd = readEntry("Enter password: ");
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:oci8:" + dbacct + "/" + passwrd);
        // Select the loans whose due date was yesterday.
        String q = "SELECT B.Book_id, B.Title, BW.Name "
                 + "FROM BOOK B, BORROWER BW, BOOK_LOANS BL "
                 + "WHERE BL.Due_date = CurDate() - 1 AND BL.Card_no = BW.Card_no "
                 + "AND BL.Book_id = B.Book_id";
        Statement s = conn.createStatement();
        ResultSet r = s.executeQuery(q);
        while (r.next()) {
            Book_Id = r.getString(1);
            Book_title = r.getString(2);
            Borrower_name = r.getString(3);
            System.out.println("book id: " + Book_Id + " book title: " + Book_title
                    + " borrower name: " + Borrower_name);
        }
    }
}

Chapter 10, Problem 13E
Problem
Repeat Exercise 10.7, but write a function in SQL/PSM.
Reference 10.7
Consider the database shown in Figure 1.2, whose schema is shown in Figure 2.1. Write a program segment to read a student’s name and print his or her grade point average, assuming that A = 4, B = 3, C = 2, and D = 1 points. Use embedded SQL with C as the host language.
Step-by-step solution
Step 1 of 2
Consider the following SQL/PSM function to determine the grade point average of a student.

-- Function PSM2:
1. CREATE FUNCTION Average_grad (IN in_name CHAR(20))
2. RETURNS FLOAT
3. BEGIN
-- Declare variables to store intermediate values
4. DECLARE total_points INTEGER DEFAULT 0;
5. DECLARE course_count INTEGER DEFAULT 0;
6. DECLARE std_no INTEGER;
7. DECLARE temp_grd CHAR(1);
8. DECLARE done INTEGER DEFAULT 0;
-- Declare a cursor to process the multiple rows returned by the query
9. DECLARE grd CURSOR FOR SELECT Grade FROM GRADE_REPORT WHERE Student_number = std_no;
10. DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
-- Find the student number for the given student name
11. SELECT Student_number INTO std_no FROM STUDENT WHERE Name = in_name;
12. OPEN grd;
13. FETCH grd INTO temp_grd;
14. WHILE done = 0 DO
15. SET course_count = course_count + 1;
-- Use IF/ELSEIF to add up the grade points of the student
16. IF temp_grd = 'A' THEN SET total_points = total_points + 4;
17. ELSEIF temp_grd = 'B' THEN SET total_points = total_points + 3;
18. ELSEIF temp_grd = 'C' THEN SET total_points = total_points + 2;
19. ELSEIF temp_grd = 'D' THEN SET total_points = total_points + 1;
20. END IF;
21. FETCH grd INTO temp_grd;
22. END WHILE;
23. CLOSE grd;
-- Return the average, or NULL when no grades were found
24. IF course_count = 0 THEN RETURN NULL; END IF;
25. RETURN CAST(total_points AS FLOAT) / course_count;
26. END;

Step 2 of 2
Explanation of the above function:
• Lines 1 to 3 create the function Average_grad, which takes the student name as input and returns a FLOAT.
• Lines 4 to 8 declare the variables that store intermediate values.
• Line 9 declares a cursor to process the multiple rows returned by the grade query, and line 10 declares a handler that sets done when no more rows are found.
• The query in line 11 finds the student number for the student name that was passed in.
• Lines 12 to 23 open the cursor and loop over its rows. The loop counts the rows, and the IF/ELSEIF statement inside it adds up the grade points of the student.
• Lines 24 and 25 calculate and return the average. The division is done on a FLOAT value so that the result is not truncated.

Chapter 10, Problem 14E
Problem
Create a function in PSM that computes the median salary for the EMPLOYEE table shown in Figure 5.5.
Step-by-step solution
Step 1 of 2
Following is the function in Persistent Stored Modules (PSM) to calculate the median salary for the EMPLOYEE table:

-- Function PSM1:
0) CREATE FUNCTION Emp_Median_Sal()
1) RETURNS INTEGER
   BEGIN
2) DECLARE median_salary INTEGER;
3) SELECT MEDIAN(Salary) INTO median_salary
4) FROM EMPLOYEE;
5) RETURN median_salary;
   END;

Step 2 of 2
Explanation:
Line 0: CREATE FUNCTION is used to create a function. The name of the function created is Emp_Median_Sal. It needs no input parameter, because it reads the salaries directly from the EMPLOYEE table.
Line 1: RETURNS specifies the data type of the value that the function returns, here the median salary.
Line 2: DECLARE is used to declare local variables. median_salary is a variable declared to hold the value of the median salary.
Line 3: MEDIAN(Salary) gives the median value among the salaries. MEDIAN is an aggregate offered by some DBMSs, such as Oracle, and is not part of standard SQL. The INTO clause assigns the value returned by MEDIAN(Salary) to the local variable median_salary.
Line 4: FROM specifies the table from which the data is taken.
Line 5: RETURN is used to return median_salary.

Chapter 14, Problem 1RQ
Problem
Discuss attribute semantics as an informal measure of goodness for a relation schema.
Step-by-step solution
Step 1 of 2
The semantics of a relation refers to the way the meaning of an attribute value in a tuple is explained.
Step 2 of 2
• The semantics of an attribute should be chosen so that they can be interpreted easily.
• Once the semantics of the attributes are clear, the relation is easy to interpret.
• A relation that is easy to interpret results in a good schema design.
Thus, the semantics of the attributes serve as an informal measure of the goodness of a relation schema.

Chapter 14, Problem 2RQ
Problem
Discuss insertion, deletion, and modification anomalies. Why are they considered bad? Illustrate with examples.
Step-by-step solution
Step 1 of 6
Insertion anomaly refers to the situation where it is not possible to enter data for certain attributes into the database without entering data for other attributes.
Deletion anomaly refers to the situation where deleting some data causes other, unrelated data to be lost as well.
Modification anomaly refers to the situation where a partial update of redundant data leads to inconsistent data.
Step 2 of 6
Insertion, deletion and modification anomalies are considered bad for the following reasons:
• It is difficult to maintain the consistency of data in the database.
• They are a consequence of redundant data.
• They cause unnecessary updates of data.
• Memory space is wasted at the storage level.

Step 3 of 6
Consider the following relation named Emp_Proj:
Insertion anomalies:
• Assume that there is an employee E11 who is not yet working on a project. Then it is not possible to enter the details of employee E11 into the relation Emp_Proj.
• Similarly, assume there is a project P7 with no employees assigned to it. Then it is not possible to enter the details of project P7 into the relation Emp_Proj.
• Therefore, it is possible to enter an employee's details into relation Emp_Proj only if the employee is assigned to a project.
• Similarly, it is possible to enter the details of a project into relation Emp_Proj only if an employee is assigned to that project.

Step 4 of 6
Deletion anomalies:
• Assume that employee E07 has left the company. So, it is necessary to delete the details of employee E07 from the relation Emp_Proj.
• If the details of employee E07 are deleted from the relation Emp_Proj, then the details of project P5 will also be lost.
Update anomalies:
• Assume that the location of project P1 is changed from Atlanta to New Jersey. Then the update must be made in three places.
• If the update is applied to two tuples but not to the third, the data becomes inconsistent.

Step 5 of 6
In order to remove insertion, deletion and modification anomalies, decompose the relation Emp_Proj into three relations as shown below:

Step 6 of 6
Insertion anomalies:
• It is possible to enter the details of employee E11 into relation Employee even though he is not yet working on a project.
• It is possible to enter the details of project P7 into relation Project even though there are no employees assigned to it.
Deletion anomalies:
• If the details of employee E07 are deleted from the relation Employee, the details of project P5 are not lost.
Update anomalies:
• If the location of project P1 is changed from Atlanta to New Jersey, then the update is made in relation Project at only one place.
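The update anomaly can be replayed on a few in-memory rows. The sketch below is only an illustration under stated assumptions: project P1, employee E07, project P5 and the Atlanta to New Jersey move come from the answer above, while the employee ids E01 to E03, the location Boston and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class UpdateAnomalySketch {

    // Flat Emp_Proj rows as {employee, project, project location}.
    // E01..E03 and Boston are made-up values for the illustration.
    static List<String[]> sampleRows() {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"E01", "P1", "Atlanta"});
        rows.add(new String[] {"E02", "P1", "Atlanta"});
        rows.add(new String[] {"E03", "P1", "Atlanta"});
        rows.add(new String[] {"E07", "P5", "Boston"});
        return rows;
    }

    // Distinct locations stored for one project in the flat relation.
    static Set<String> locationsOf(List<String[]> empProj, String project) {
        Set<String> found = new HashSet<>();
        for (String[] row : empProj) {
            if (row[1].equals(project)) {
                found.add(row[2]);
            }
        }
        return found;
    }

    // Returns a copy in which only the first `limit` rows of the project
    // receive the new location, imitating an incomplete update.
    static List<String[]> partiallyUpdate(List<String[]> empProj, String project,
                                          String newLocation, int limit) {
        List<String[]> out = new ArrayList<>();
        int changed = 0;
        for (String[] row : empProj) {
            if (row[1].equals(project) && changed < limit) {
                out.add(new String[] {row[0], row[1], newLocation});
                changed++;
            } else {
                out.add(row.clone());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two of the three P1 rows are updated: P1 now has two locations.
        List<String[]> partial = partiallyUpdate(sampleRows(), "P1", "New Jersey", 2);
        System.out.println("Flat relation, P1 locations: " + locationsOf(partial, "P1"));

        // Decomposed design: the location is stored once per project.
        Map<String, String> project = new HashMap<>();
        project.put("P1", "Atlanta");
        project.put("P1", "New Jersey");   // the single update
        System.out.println("Project relation, P1 location: " + project.get("P1"));
    }
}
```

In the flat relation the partial update leaves P1 with two different locations, whereas the Project map holds exactly one value per project, so there is nothing left to forget.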

Chapter 14, Problem 3RQ
Problem
Why should NULLs in a relation be avoided as much as possible? Discuss the problem of spurious tuples and how we may prevent it.
Step-by-step solution
Step 1 of 4
NULL values should be avoided in a relation as much as possible for the following reasons:
• Memory space is wasted at the storage level.
• The meaning and purpose of the attributes are not communicated well.
Step 2 of 4
• When aggregate operations such as SUM or AVG are performed on an attribute that has NULL values, the result may be incorrect or misleading.
• When a JOIN operation involves an attribute with NULL values, the result may be unpredictable.
• A NULL value can have different meanings. It may mean unknown, not applicable, or absent.
Step 3 of 4
Spurious tuples are generated as the result of bad design or improper decomposition of the base table.
• Spurious tuples are the tuples generated when a JOIN operation is performed on badly designed relations. The result has more tuples than the original set of tuples.
• The main problem with spurious tuples is that they are invalid, because they do not appear in the base tables.
Step 4 of 4
Spurious tuples can be avoided by taking care while designing relational schemas.
• The relations should be designed so that when a JOIN operation is performed, the attributes involved in the JOIN are a primary key in one table and a foreign key in the other.
• While decomposing a base table into two tables, the tables must have a common attribute. The common attribute must be the primary key in one table and a foreign key in the other.

Chapter 14, Problem 4RQ
Problem
State the informal guidelines for relation schema design that we discussed. Illustrate how violation of these guidelines may be harmful.
Step-by-step solution
Step 1 of 1
Informal guidelines for relational schema: For designing a relational database schema there are four informal guidelines:
(1) Semantics of the attributes.
(2) Reducing the redundant information in tuples.
(3) Reducing the NULL values in tuples.
(4) Disallowing the possibility of generating spurious tuples.
Violating these guidelines may be harmful in the following ways:
(1) Anomalies cause redundant work to be done during insertion into and modification of a relation, and may cause accidental loss of information during a deletion from a relation.
(2) Storage space is wasted because of NULLs, and selections, aggregation operations and joins become difficult to perform because of NULL values.
(3) Invalid and spurious data is generated during joins on improperly related base relations.
These problems can be detected without additional tools of analysis.

Chapter 14, Problem 5RQ
Problem
What is a functional dependency? What are the possible sources of the information that defines the functional dependencies that hold among the attributes of a relation schema?
Step-by-step solution
Step 1 of 3
Functional dependency: A functional dependency describes a relationship between the attributes in a table. A functional dependency between two attributes X and Y in a relation R is said to exist if one attribute determines the other attribute uniquely.
Step 2 of 3
A functional dependency is a property of the semantics, i.e., it represents the semantic association between the attributes of the relation schema R. The main use of functional dependencies is to describe the relation schema R by specifying constraints on a relation R. Relation states that satisfy these constraints are called legal extensions.
Step 3 of 3
In the functional dependency X -> Y, the value of Y is determined by the value of X, i.e., X determines Y.
Full functional dependency: if A and B are attributes of the relation R, then B is fully functionally dependent on A if B is functionally dependent on A but not on any proper subset of A.
Partial functional dependency: if A and B are attributes of the relation R, then B is partially dependent on A if some attribute can be removed from A and the dependency still holds among the attributes of the relation schema.

Chapter 14, Problem 6RQ
Problem
Why can we not infer a functional dependency automatically from a particular relation state?
Step-by-step solution
Step 1 of 1
Certain FDs can be specified without referring to a specific relation, as a property of the attributes given their generally understood meaning. It is also possible that certain functional dependencies cease to exist in the real world if the relationship changes. Some tuples may have values that agree with a supposed FD, but a new tuple may not agree with it. Since a functional dependency is a property of the relation schema R, and not of a particular legal relation state r of R, it is not possible to infer FDs from a particular relation state.

Chapter 14, Problem 7RQ
Problem
What does the term unnormalized relation refer to? How did the normal forms develop historically from first normal form up to Boyce-Codd normal form?
Step-by-step solution
Step 1 of 1
An unnormalized relation refers to a relation that does not meet any normal form condition. The normalization process, first proposed by Codd (1972), takes a relation schema through a series of tests to certify whether it satisfies a certain normal form. The process proceeds in a top-down fashion by evaluating each relation against the criteria for normal forms and decomposing relations as necessary, and can thus be considered relational design by analysis. Initially Codd proposed three normal forms: 1NF, 2NF and 3NF. A stronger definition of 3NF, called Boyce-Codd normal form (BCNF), was proposed later by Boyce and Codd.
All these normal forms are based on a single analytical tool: the functional dependencies among the attributes of a relation. 1NF splits a relation schema into schemas in which every attribute has a domain of atomic values and no attribute value is a set of values. 2NF removes all partial dependencies of nonprime attributes A in R on a key and ensures that all nonprime attributes are fully functionally dependent on the key of R. 3NF removes all transitive dependencies on the key of R and ensures that no nonprime attribute is transitively dependent on the key.

Chapter 14, Problem 8RQ
Problem
Define first, second, and third normal forms when only primary keys are considered. How do the general definitions of 2NF and 3NF, which consider all keys of a relation, differ from those that consider only primary keys?
Step-by-step solution
Step 1 of 2
Definition of normal forms when only primary keys are considered
First Normal Form: The domain of an attribute must include only atomic values, and the value of any attribute in a tuple must be a single value from the domain of that attribute. In other words, first normal form does not allow relations within relations as attribute values within tuples.
Second Normal Form: It is based on the concept of full functional dependency. A dependency X → Y is a full functional dependency if removing any attribute A from X means that the dependency no longer holds; otherwise it is a partial dependency. A relation schema R is in 2NF if every nonprime attribute A in R is fully functionally dependent on the primary key of R.
Third Normal Form: It is based on the concept of transitive dependency. A functional dependency X → Y in a relation schema R is a transitive dependency if there is a set of attributes Z in R that is neither a candidate key nor a subset of any key of R, and both X → Z and Z → Y hold.
A relation schema is in third normal form if it satisfies second normal form and no nonprime attribute of R is transitively dependent on the primary key.
Step 2 of 2
The general definitions of 2NF and 3NF differ from the primary-key definitions because they take all candidate keys into account. Under the general definitions, an attribute that is part of any candidate key is considered prime. Partial, full, and transitive dependencies are considered with respect to all candidate keys of a relation.
General definition of 2NF: A relation schema R is in second normal form if no nonprime attribute A in R is partially dependent on any key of R.
General definition of 3NF: A relation schema R is in 3NF if, whenever a nontrivial functional dependency X → A holds in R, either (a) X is a superkey of R, or (b) A is a prime attribute of R.
A functional dependency X → Y is trivial if X is a superset of Y; otherwise it is nontrivial.

Chapter 14, Problem 9RQ
Problem
What undesirable dependencies are avoided when a relation is in 2NF?
Step-by-step solution
Step 1 of 1
2NF removes all partial dependencies of nonprime attributes A in R on a key and ensures that all nonprime attributes are fully functionally dependent on the key of R.

Chapter 14, Problem 10RQ
Problem
What undesirable dependencies are avoided when a relation is in 3NF?
Step-by-step solution
Step 1 of 1
3NF removes all transitive dependencies on the key of R and ensures that no nonprime attribute is transitively dependent on the key.

Chapter 14, Problem 11RQ
Problem
In what way do the generalized definitions of 2NF and 3NF extend the definitions beyond primary keys?
Step-by-step solution
Step 1 of 1
The generalized definitions of second normal form and third normal form extend beyond the primary key by taking into consideration all the candidate keys of a relation.
• These definitions do not depend only on the primary key of a relation.
• These definitions take into consideration every set of attributes that can be a key for the relation.
• These definitions also consider the partial and transitive dependencies on the candidate keys.

Chapter 14, Problem 12RQ
Problem
Define Boyce-Codd normal form. How does it differ from 3NF? Why is it considered a stronger form of 3NF?
Step-by-step solution
Step 1 of 3
Boyce-Codd Normal Form (BCNF):
• A relation is in BCNF if and only if every determinant is a candidate key.
• In the functional dependency X → Y, if the attribute Y is fully functionally dependent on X, then X is said to be a determinant.
• A determinant can be a composite or a single attribute.
• BCNF is a stronger form of third normal form (3NF).
• A relation that is in BCNF is also in third normal form.
Step 2 of 3
Following are the differences between 3NF and BCNF:
• Strength. BCNF: a stronger normal form than 3NF. 3NF: a weaker normal form than BCNF.
• Right-hand side of a nontrivial functional dependency X → Y. BCNF: X must be a superkey whether or not Y is a prime attribute. 3NF: if X is not a superkey, Y must be a prime attribute.
• Determinants. BCNF: does not allow non-key attributes as determinants. 3NF: allows non-key attributes as determinants when the determined attribute is prime.
Step 3 of 3
BCNF is a stronger form of third normal form (3NF).
• In BCNF, every determinant must be a candidate key.
• BCNF does not allow some dependencies that are allowed in 3NF.
• A relation that is in BCNF is also in third normal form.
• A relation that is in third normal form need not be in BCNF.

Chapter 14, Problem 13RQ
Problem
What is multivalued dependency? When does it arise?
Step-by-step solution
Step 1 of 3
Multivalued Dependency:
• It is defined as a full constraint between two different sets of attributes in a relation.
• Because first normal form does not allow an attribute to hold a set of values in a tuple, each value must appear in a separate tuple.
• Unlike a functional dependency, a multivalued dependency requires that certain tuples be present in the relation.
Step 2 of 3
Occurrence of multivalued dependency:
• A multivalued dependency arises when a relation has constraints that cannot be specified as functional dependencies.
• It typically occurs when two or more independent multivalued facts about the same entity are stored in the same table, so that every combination of their values must appear as a separate tuple.
Step 3 of 3
Example of the occurrence of multivalued dependency:
The Employee table has the two multivalued dependencies listed below.
Ename ↠ Pname
Ename ↠ Dname
Here Ename indicates employee name, Pname indicates project name, and Dname indicates dependent’s name. These are multivalued dependencies because an employee can work on more than one project and can have more than one dependent.

Chapter 14, Problem 14RQ
Problem
Does a relation with two or more columns always have an MVD? Show with an example.
Step-by-step solution
Step 1 of 2
In a relation, when one attribute has multiple values associated with a single value of another attribute, there is a multivalued dependency (MVD) in the relation. An example of a relation with three attributes that has an MVD is as follows: In the above relation, there exist two MVDs: In order to remove the MVDs, decompose the relation into two relations as shown below:
Step 2 of 2
A relation with two or more columns will not always have a multivalued dependency (MVD). An example of a relation with two attributes that does not have an MVD is as follows: An example of a relation with three attributes that does not have an MVD is as follows:

Chapter 14, Problem 15RQ
Problem
Define fourth normal form. When is it violated? When is it typically applicable?
Step-by-step solution
Step 1 of 2
Violation of fourth normal form:
• Fourth normal form is violated when a relation has a nontrivial multivalued dependency whose left-hand side is not a superkey. Such multivalued dependencies are used to identify and decompose the relations in the relational schema R.
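A multivalued dependency such as Ename ↠ Pname can be tested against a concrete relation state. The sketch below is only an illustration: the function name `mvd_holds` and the sample rows are hypothetical, not taken from the textbook figures. It applies the standard definition: whenever two tuples agree on X, the tuple that takes its X and Y values from the first and its remaining values from the second must also be present.

```python
def mvd_holds(rows, x, y):
    """Check whether the MVD x ->> y holds in a relation state.

    rows: list of dicts sharing the same attribute names.
    x, y: lists of attribute names (hypothetical helper, for illustration).
    """
    attrs = list(rows[0])
    z = [a for a in attrs if a not in x and a not in y]  # remaining attributes
    present = {tuple(r[a] for a in attrs) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # Required tuple: X and Y values from t1, Z values from t2.
                t3 = {**{a: t1[a] for a in x + y}, **{a: t2[a] for a in z}}
                if tuple(t3[a] for a in attrs) not in present:
                    return False
    return True


# Illustrative state: one employee, two projects, two dependents.
EMP = [
    {"Ename": "Smith", "Pname": "X", "Dname": "John"},
    {"Ename": "Smith", "Pname": "Y", "Dname": "Anna"},
    {"Ename": "Smith", "Pname": "X", "Dname": "Anna"},
    {"Ename": "Smith", "Pname": "Y", "Dname": "John"},
]

print(mvd_holds(EMP, ["Ename"], ["Pname"]))      # all four combinations present
print(mvd_holds(EMP[:3], ["Ename"], ["Pname"]))  # one combination missing
```

With all four project/dependent combinations present the dependency holds; dropping any one tuple breaks it, which is exactly the redundancy that a 4NF decomposition into (Ename, Pname) and (Ename, Dname) removes.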
Step 2 of 2
Conditions for fourth normal form:
• A relation in fourth normal form must also satisfy third normal form (in fact, BCNF).
• A relation schema R is in 4NF if, for every nontrivial multivalued dependency X ↠ Y that holds in R, X is a superkey of R.

Chapter 14, Problem 16RQ
Problem
Define join dependency and fifth normal form.
Step-by-step solution
Step 1 of 2
Join dependency:
• It is a constraint specified on a relation schema R, denoted by JD(R1, R2, R3, ..., Rn).
• A join dependency is trivial if one of the relation schemas Ri in JD(R1, R2, ..., Rn) is equal to R.
• It is a constraint on the set of legal relations over a database schema: every legal state of R must have a lossless join decomposition into R1, R2, ..., Rn.
Step 2 of 2
Fifth normal form:
• It is a level of database normalization used to reduce redundancy in relational databases that record multivalued facts.
• The relation must already be in fourth normal form.
• It is also called project-join normal form because any decomposition of the relational schema R based on its join dependencies is lossless.
• Fifth normal form is defined in terms of join dependencies.

Chapter 14, Problem 17RQ
Problem
Why is 5NF also called project-join normal form (PJNF)?
Step-by-step solution
Step 1 of 2
Fifth Normal Form (5NF):
• A relation schema R is in fifth normal form with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) implied by F, every Ri is a superkey of R.
• Fifth normal form is defined in terms of join dependencies. If the relational schema R is decomposed by projection, joining the projections gives back R without loss. So, 5NF is called project-join normal form (PJNF).
Step 2 of 2
Example of project-join normal form:
Consider the case where a supplier (s) supplies parts (p) to projects (j). The relationships are derived as follows:
• Supplier (s) supplies part (p).
• Project (j) uses the part (p), and
• Supplier (s) supplies at least one part (p) to the project (j).
Therefore the relation has a join dependency, and it is decomposed into three relations that are shown above; each relation is in 5NF.

Chapter 14, Problem 18RQ
Problem
Why do practical database designs typically aim for BCNF and not aim for higher normal forms?
Step-by-step solution
Step 1 of 1
Boyce-Codd normal form (BCNF): A relation schema R is in BCNF if, whenever a nontrivial functional dependency X → A holds in R, X is a superkey of R.
Practical database designs prefer BCNF over the higher normal forms for the following reasons:
• It has a simpler definition than 3NF (third normal form).
• It reduces the redundancy (duplication) of information across thousands of tuples.
• A data model normalized to BCNF is easy to understand.
• It keeps the number of relations, and therefore the number of joins, lower than the higher normal forms do.
• It is stronger than 3NF because a relation in BCNF is also in 3NF, but not vice versa.
• In most practical cases, the multivalued and join dependencies that would violate normal forms beyond BCNF are rare and hard to detect.
For these reasons, database designers in practice stop at BCNF, which already improves the consistency and quality of the database.

Chapter 14, Problem 19E
Problem
Suppose that we have the following requirements for a university database that is used to keep track of students’ transcripts:
a.
The university keeps track of each student’s name (Sname), student number (Snum), Social Security number (Ssn), current address (Sc_addr) and phone (Sc_phone), permanent address (Sp_addr) and phone (Sp_phone), birth date (Bdate), sex (Sex), class (Class) (‘freshman’, ‘sophomore’, …, ‘graduate’), major department (Major_code), minor department (Minor_code) (if any), and degree program (Prog) (‘b.a.’, ‘b.s.’, ..., ‘ph.d.’). Both Ssn and student number have unique values for each student.
b. Each department is described by a name (Dname), department code (Dcode), office number (Doffice), office phone (Dphone), and college (Dcollege). Both name and code have unique values for each department.
c. Each course has a course name (Cname), description (Cdesc), course number (Cnum), number of semester hours (Credit), level (Level), and offering department (Cdept). The course number is unique for each course.
d. Each section has an instructor (Iname), semester (Semester), year (Year), course (Sec_course), and section number (Sec_num). The section number distinguishes different sections of the same course that are taught during the same semester/year; its values are 1, 2, 3, ..., up to the total number of sections taught during each semester.
e. A grade record refers to a student (Ssn), a particular section, and a grade (Grade).
Design a relational database schema for this database application. First show all the functional dependencies that should hold among the attributes. Then design relation schemas for the database that are each in 3NF or BCNF. Specify the key attributes of each relation. Note any unspecified requirements, and make appropriate assumptions to render the specification complete.
Step-by-step solution
Step 1 of 4
Functional Dependency: A functional dependency exists when one attribute in a relation uniquely determines another attribute. A functional dependency is represented as X → Y. X and Y can be composite.
The functional dependencies from the given information are as follows:
Step 2 of 4
From the functional dependencies FD 1 and FD 2, the relation STUDENT can be defined. Either Ssn or Snum can be the primary key.
From the functional dependencies FD 3 and FD 4, the relation DEPARTMENT can be defined. Either Dname or Dcode can be the primary key.
From the functional dependency FD 5, the relation COURSE can be defined. Cnum is the primary key.
From the functional dependency FD 6, the relation SECTION can be defined. {Sec_num, Sec_course, Semester, Year} is the composite primary key.
From the functional dependencies FD 7 and FD 8, the relation GRADE can be defined. {Ssn, Sec_course, Semester, Year} is the composite primary key.
Step 3 of 4
The relations that are in third normal form are as follows:
Explanation:
• In the STUDENT relation, either Ssn or Snum can be the primary key. Either key can be used to retrieve data from the STUDENT table.
• In the DEPARTMENT relation, either Dname or Dcode can be the primary key. Either key can be used to retrieve data from the DEPARTMENT table.
• In the COURSE table, Cnum is the primary key.
• The primary key for the SECTION table is the composite key {Sec_num, Sec_course, Semester, Year}.
• The primary key for the GRADE table is the composite key {Ssn, Sec_course, Semester, Year}.
Step 4 of 4
The relational schema is as follows:

Chapter 14, Problem 20E
Problem
What update anomalies occur in the EMP_PROJ and EMP_DEPT relations of Figures 14.3 and 14.4?
Step-by-step solution
Step 1 of 2
In EMP_PROJ, the partial dependencies {SSN} → {ENAME} and {PNUMBER} → {PNAME, PLOCATION} can cause anomalies. For example, if a PROJECT temporarily has no EMPLOYEEs working on it, its information (PNAME, PNUMBER, PLOCATION) will not be represented in the database when the last EMPLOYEE working on it is removed.
A new PROJECT cannot be added unless at least one EMPLOYEE is assigned to work on it. Inserting a new tuple relating an existing EMPLOYEE to an existing PROJECT requires checking both partial dependencies; for example, if a different value is entered for PLOCATION than the values in other tuples with the same value for PNUMBER, we get an update anomaly. Similar comments apply to EMPLOYEE information. The reason is that EMP_PROJ represents the relationship between EMPLOYEEs and PROJECTs, and at the same time represents information concerning EMPLOYEE and PROJECT entities.
Step 2 of 2
In EMP_DEPT, the transitive dependency {SSN} → {DNUMBER} → {DNAME, DMGRSSN} can cause anomalies. For example, if a DEPARTMENT temporarily has no EMPLOYEEs working for it, its information (DNAME, DNUMBER, DMGRSSN) will not be represented in the database when the last EMPLOYEE working for it is removed. A new DEPARTMENT cannot be added unless at least one EMPLOYEE is assigned to work for it. Inserting a new tuple relating a new EMPLOYEE to an existing DEPARTMENT requires checking the transitive dependency; for example, if a different value is entered for DMGRSSN than the values in other tuples with the same value for DNUMBER, we get an update anomaly. The reason is that EMP_DEPT represents the relationship between EMPLOYEEs and DEPARTMENTs, and at the same time represents information concerning EMPLOYEE and DEPARTMENT entities.

Chapter 14, Problem 21E
Problem
In what normal form is the LOTS relation schema in Figure 14.12(a) with respect to the restrictive interpretations of normal form that take only the primary key into account? Would it be in the same normal form if the general definitions of normal form were used?
Step-by-step solution
Step 1 of 1
With respect to the restrictive interpretation of normal forms, the LOTS relation schema is in 2NF, since there are no partial dependencies on the primary key.
However, it is not in 3NF, since the following two transitive dependencies on the primary key exist: PROPERTY_ID# → COUNTY_NAME → TAX_RATE, and PROPERTY_ID# → AREA → PRICE. If we take all keys into account and use the general definitions of 2NF and 3NF, the LOTS relation schema is only in 1NF, because there is a partial dependency COUNTY_NAME → TAX_RATE on the secondary key {COUNTY_NAME, LOT#}, which violates 2NF.

Chapter 14, Problem 22E
Problem
Prove that any relation schema with two attributes is in BCNF.
Step-by-step solution
Step 1 of 2
BCNF:
• A relation R is in BCNF if, for every functional dependency a → b that holds in R, one of the following is true.
• Either a → b is a trivial FD, or a is a superkey of the relation R.
Step 2 of 2
Take the relation schema R = {a, b} with two attributes. The only possible nontrivial FDs are {a} → {b} and {b} → {a}. There are four cases:
Case 1: No FD holds in R. In this case, the key is {a, b} and the relation satisfies BCNF.
Case 2: Only {a} → {b} holds. In this case, the key is {a} and the relation satisfies BCNF.
Case 3: Only {b} → {a} holds. In this case, the key is {b} and the relation satisfies BCNF.
Case 4: Both {a} → {b} and {b} → {a} hold. In this case, there are two keys, {a} and {b}, and the relation satisfies BCNF.
In every case the left-hand side of each nontrivial FD is a key. Hence, any relation with two attributes is in BCNF.

Chapter 14, Problem 23E
Problem
Why do spurious tuples occur in the result of joining the EMP_PROJ1 and EMP_LOCS relations in Figure 14.5 (result shown in Figure 14.6)?
Step-by-step solution
Step 1 of 1
Spurious tuples are tuples that do not represent valid information. They occur in the result of joining the EMP_PROJ1 and EMP_LOCS relations because the natural join is based on the common attribute Plocation.
• In EMP_LOCS, the primary key is {Ename, Plocation}.
• In EMP_PROJ1, the primary key is {Ssn, Pnumber}.
• The attribute Plocation is neither a primary key nor a foreign key in the relations EMP_PROJ1 and EMP_LOCS.
• Because the join attribute Plocation is neither a primary key nor a foreign key in either relation, the natural join produces spurious tuples.

Chapter 14, Problem 24E
Problem
Consider the universal relation R = {A, B, C, D, E, F, G, H, I, J} and the set of functional dependencies F = {{A, B} → {C}, {A} → {D, E}, {B} → {F}, {F} → {G, H}, {D} → {I, J}}. What is the key for R? Decompose R into 2NF and then 3NF relations.
Step-by-step solution
Step 1 of 1
Let R = {A, B, C, D, E, F, G, H, I, J} and the set of functional dependencies F = {{A, B} → {C}, {A} → {D, E}, {B} → {F}, {F} → {G, H}, {D} → {I, J}}.
A minimal set of attributes whose closure includes all the attributes in R is a key. Since the closure of {A, B} is {A, B}+ = R, and neither {A} nor {B} alone has this property, one key of R is {A, B}.
Decompose R into 2NF and then 3NF:
To normalize R intuitively into 2NF and then 3NF, we follow the steps below.
Step 1: Identify the partial dependencies that violate 2NF. These involve attributes that are functionally dependent on only part of the key, {A} or {B} alone. We calculate the closures {A}+ and {B}+ to determine the partially dependent attributes:
{A}+ = {A, D, E, I, J}. Hence {A} → {D, E, I, J} ({A} → {A} is a trivial dependency).
{B}+ = {B, F, G, H}. Hence {B} → {F, G, H} ({B} → {B} is a trivial dependency).
To normalize into 2NF, we remove the attributes that are functionally dependent on part of the key (A or B) from R and place them in separate relations R1 and R2, along with the part of the key they depend on (A or B), which is copied into each of these relations but also remains in the original relation, which we call R3 below:
R1 = {A, D, E, I, J}, R2 = {B, F, G, H}, R3 = {A, B, C}
The keys of R1, R2, and R3 are {A}, {B}, and {A, B}, respectively. Next, we look for transitive dependencies in R1, R2, R3.
The relation R1 has the transitive dependency {A} → {D} → {I, J}, so we remove the transitively dependent attributes {I, J} from R1 into a relation R11 and copy the attribute D they depend on into R11. The remaining attributes are kept in a relation R12. Hence, R1 is decomposed into R11 and R12 as follows:
R11 = {D, I, J}, R12 = {A, D, E}
The relation R2 is similarly decomposed into R21 and R22 based on the transitive dependency {B} → {F} → {G, H}:
R21 = {F, G, H}, R22 = {B, F}
The final set of relations in 3NF is {R11, R12, R21, R22, R3}.

Chapter 14, Problem 25E
Problem
Repeat the following exercise for the different set of functional dependencies G = {{A, B} → {C}, {B, D} → {E, F}, {A, D} → {G, H}, {A} → {I}, {H} → {J}}.
Exercise
Consider the universal relation R = {A, B, C, D, E, F, G, H, I, J} and the set of functional dependencies F = {{A, B} → {C}, {A} → {D, E}, {B} → {F}, {F} → {G, H}, {D} → {I, J}}. What is the key for R? Decompose R into 2NF and then 3NF relations.
Step-by-step solution
Step 1 of 6
The relation is R = {A, B, C, D, E, F, G, H, I, J}.
The set of functional dependencies is as follows:
{A, B} → {C}
{B, D} → {E, F}
{A, D} → {G, H}
{A} → {I}
{H} → {J}
Step 1: Find the closure of single attributes:
{A}+ = {A, I}
{B}+ = {B}
{C}+ = {C}
{D}+ = {D}
{E}+ = {E}
{F}+ = {F}
{G}+ = {G}
{H}+ = {H, J}
{I}+ = {I}
{J}+ = {J}
From the above closures, it is clear that the closure of no single attribute equals the relation R. So, no single attribute forms the key for the relation R.
Step 2 of 6
Step 2: Find the closure of the pairs of attributes that appear on the left-hand side of the functional dependencies.
The closure of {A, B}: from the functional dependencies {A, B} → {C} and {A} → {I},
{A, B}+ = {A, B, C, I}
The closure of {B, D}: from the functional dependency {B, D} → {E, F},
{B, D}+ = {B, D, E, F}
The closure of {A, D}: from the functional dependencies {A, D} → {G, H}, {A} → {I}, and {H} → {J},
{A, D}+ = {A, D, G, H, I, J}
From the above closures, it is clear that the closure of none of these pairs of attributes equals the relation R. So, no pair of attributes forms the key for the relation R.
Step 3 of 6
Step 3: Find the closure of the union of the three pairs of attributes that appear in the set of functional dependencies.
The closure of {A, B, D}: from the functional dependencies {A, B} → {C}, {B, D} → {E, F}, and {A, D} → {G, H},
{A, B, D}+ ⊇ {A, B, C, D, E, F, G, H}
From the functional dependency {A} → {I}, the attribute I is added to {A, B, D}+. Hence, {A, B, D}+ ⊇ {A, B, C, D, E, F, G, H, I}.
From the functional dependency {H} → {J}, the attribute J is added to {A, B, D}+. Hence, {A, B, D}+ = {A, B, C, D, E, F, G, H, I, J}.
The closure of {A, B, D} equals the relation R. Hence, the key for relation R is {A, B, D}.
Step 4 of 6
Decomposing the relation R into second normal form (2NF):
According to second normal form, each non-key attribute must be fully functionally dependent on the primary key.
• The key for relation R is {A, B, D}.
• {A} is a partial key that functionally determines the attribute I.
• {A, B} is a partial key that functionally determines the attribute C.
• {B, D} is a partial key that functionally determines the attributes E and F.
• {A, D} is a partial key that functionally determines the attributes G and H.
So, decompose the relation R into the following relations.
R1 = {A, I}. The key for R1 is {A}.
R2 = {A, B, C}. The key for R2 is {A, B}.
R3 = {B, D, E, F}. The key for R3 is {B, D}.
R4 = {A, B, D}. The key for R4 is {A, B, D}.
R5 = {A, D, G, H, J}. The key for R5 is {A, D}.
The relations R1, R2, R3, R4, R5 are in second normal form.
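The closure computations in Steps 1 to 3 can be checked mechanically. The following is a minimal sketch, assuming functional dependencies are represented as pairs of Python sets; the function names `closure` and `candidate_keys` are illustrative choices, and the key search is a plain brute-force enumeration rather than an optimized algorithm.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its left side is covered and it adds something new.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """Try subsets from smallest to largest so that every key found is minimal."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys


R = set("ABCDEFGHIJ")
G = [
    (set("AB"), set("C")),
    (set("BD"), set("EF")),
    (set("AD"), set("GH")),
    (set("A"), set("I")),
    (set("H"), set("J")),
]

print(sorted(closure({"A", "B"}, G)))                 # closure of {A, B}
print(sorted(closure({"A", "D"}, G)))                 # closure of {A, D}
print([sorted(key) for key in candidate_keys(R, G)])  # all minimal keys of R
```

Under these assumptions the search reports {A, B, D} as the only candidate key, which agrees with Step 3: A, B, and D never appear on the right-hand side of any dependency in G, so every key must contain all three.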
Step 5 of 6
Decomposing the relation R into third normal form (3NF):
According to third normal form, the relation must be in second normal form and no non-key attribute may functionally determine another non-key attribute.
• H is a non-key attribute that functionally determines the attribute J.
So, decompose the relation R5 into the following relations.
R6 = {A, D, G, H}. The key for R6 is {A, D}.
R7 = {H, J}. The key for R7 is {H}.
Step 6 of 6
The final set of relations that are in third normal form is as follows:
R1 = {A, I}
R2 = {A, B, C}
R3 = {B, D, E, F}
R4 = {A, B, D}
R6 = {A, D, G, H}
R7 = {H, J}

Chapter 14, Problem 26E
Problem
Consider the following relation:
A    B    C    TUPLE#
10   b1   c1   1
10   b2   c2   2
11   b4   c1   3
12   b3   c4   4
13   b1   c1   5
14   b3   c4   6
a. Given the previous extension (state), which of the following dependencies may hold in the above relation? If the dependency cannot hold, explain why by specifying the tuples that cause the violation.
i. A → B, ii. B → C, iii. C → B, iv. B → A, v. C → A
b. Does the above relation have a potential candidate key? If it does, what is it? If it does not, why not?
Step-by-step solution
Step 1 of 2
a)
i. A → B does not hold in the current state of the relation, because attribute B has two values for the value 10 of attribute A (tuples 1 and 2).
ii. B → C may hold in the current state of the relation.
iii. C → B does not hold in the current state of the relation, because attribute B has two values for the value c1 of attribute C (tuples 1 and 3).
iv. B → A does not hold in the current state of the relation, because attribute A has two values for each of the values b1 (tuples 1 and 5) and b3 (tuples 4 and 6) of attribute B.
v. C → A does not hold in the current state of the relation, because attribute A has more than one value for each of the values c1 (tuples 1, 3, and 5) and c4 (tuples 4 and 6) of attribute C.
Step 2 of 2
b) If the value of the attribute TUPLE# remains different for all tuples in the relation, it can act as a candidate key.
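The checks in part (a) amount to looking for two tuples that agree on the left-hand side but differ on the right-hand side. The sketch below illustrates this on the six tuples of the problem; the helper name `violations` is hypothetical, and the values b1 and c1 assume that the "bl" and "cl" in the extracted table are misread digits. A relation state can only refute a dependency, so an empty result means "may hold", not "holds".

```python
from collections import defaultdict

# Relation state from the problem (b1/c1 assumed for the garbled "bl"/"cl").
ROWS = [
    {"A": 10, "B": "b1", "C": "c1", "TUPLE#": 1},
    {"A": 10, "B": "b2", "C": "c2", "TUPLE#": 2},
    {"A": 11, "B": "b4", "C": "c1", "TUPLE#": 3},
    {"A": 12, "B": "b3", "C": "c4", "TUPLE#": 4},
    {"A": 13, "B": "b1", "C": "c1", "TUPLE#": 5},
    {"A": 14, "B": "b3", "C": "c4", "TUPLE#": 6},
]


def violations(rows, lhs, rhs):
    """Return TUPLE# pairs whose rows agree on lhs but differ on rhs."""
    seen = defaultdict(list)  # lhs value -> rows seen so far with that value
    bad = []
    for row in rows:
        for other in seen[row[lhs]]:
            if other[rhs] != row[rhs]:
                bad.append((other["TUPLE#"], row["TUPLE#"]))
        seen[row[lhs]].append(row)
    return bad


for lhs, rhs in [("A", "B"), ("B", "C"), ("C", "B"), ("B", "A"), ("C", "A")]:
    bad = violations(ROWS, lhs, rhs)
    verdict = "may hold" if not bad else f"violated by tuple pairs {bad}"
    print(f"{lhs} -> {rhs}: {verdict}")
```

Only B → C comes back without a violating pair, which matches the answer to part (a).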

Chapter 14, Problem 27E
Problem
Consider a relation R(A, B, C, D, E) with the following dependencies:
AB → C, CD → E, DE → B
Is AB a candidate key of this relation? If not, is ABD? Explain your answer.
Step-by-step solution
Step 1 of 3
A candidate key is a minimal attribute or combination of attributes in a relation that uniquely identifies all the other attributes of the relation. Whether a set of attributes is a candidate key is checked by computing its closure under the functional dependencies of the relation.
Step 2 of 3
Consider the given relation R(A, B, C, D, E) and the following functional dependencies:
AB → C, CD → E, DE → B
To check whether AB is a candidate key of R, find the closure of AB:
{A, B}+ = {A, B, C}
Only AB → C can be applied; CD → E and DE → B both need D, which is not in the closure. Since AB does not determine all the attributes of R, AB is not a candidate key of R.
Step 3 of 3
To check whether ABD is a candidate key of R, find the closure of ABD:
{A, B, D}+ = {A, B, C, D, E}
AB → C adds C, and then CD → E adds E. No proper subset of ABD has this property ({A, B}+ = {A, B, C}, {A, D}+ = {A, D}, {B, D}+ = {B, D}). Since ABD determines all the attributes of R and is minimal, ABD is a candidate key of R.

Chapter 14, Problem 28E
Problem
Consider the relation R, which has attributes that hold schedules of courses and sections at a university; R = {Course_no, Sec_no, Offering_dept, Credit_hours, Course_level, Instructor_ssn, Semester, Year, Days_hours, Room_no, No_of_students}. Suppose that the following functional dependencies hold on R:
{Course_no} → {Offering_dept, Credit_hours, Course_level}
{Course_no, Sec_no, Semester, Year} → {Days_hours, Room_no, No_of_students, Instructor_ssn}
{Room_no, Days_hours, Semester, Year} → {Instructor_ssn, Course_no, Sec_no}
Try to determine which sets of attributes form keys of R. How would you normalize this relation?
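Step-by-step solution
Step 1 of 5
Consider the following relation and functional dependencies:
Relation: R = {Course_no, Sec_no, Offering_dept, Credit_hours, Course_level, Instructor_ssn, Semester, Year, Days_hours, Room_no, No_of_students}
Functional dependencies:
{Course_no} → {Offering_dept, Credit_hours, Course_level}
{Course_no, Sec_no, Semester, Year} → {Days_hours, Room_no, No_of_students, Instructor_ssn}
{Room_no, Days_hours, Semester, Year} → {Instructor_ssn, Course_no, Sec_no}
Step 2 of 5
The closure of Course_no is {Course_no}+ = {Course_no, Offering_dept, Credit_hours, Course_level}. From the functional dependency {Course_no} → {Offering_dept, Credit_hours, Course_level}, the attributes Offering_dept, Credit_hours, and Course_level are added to the closure of Course_no, because Course_no functionally determines them.
Step 3 of 5
The closure of {Course_no, Sec_no, Semester, Year} is R: the second functional dependency adds Days_hours, Room_no, No_of_students, and Instructor_ssn, and the first adds Offering_dept, Credit_hours, and Course_level.
Step 4 of 5
The closure of {Room_no, Days_hours, Semester, Year} is R: the third functional dependency adds Instructor_ssn, Course_no, and Sec_no, after which the first and second functional dependencies add the remaining attributes.
Step 5 of 5
Both sets have closure R and no proper subset of either does, so {Course_no, Sec_no, Semester, Year} and {Room_no, Days_hours, Semester, Year} are candidate keys of R. Because {Course_no} → {Offering_dept, Credit_hours, Course_level} is a partial dependency on the first key, R is not in 2NF; moving Course_no, Offering_dept, Credit_hours, and Course_level into a separate relation keyed on Course_no removes it.

Chapter 14, Problem 29E
Problem
Consider the following relations for an order-processing application database at ABC, Inc.
ORDER (O#, Odate, Cust#, Total_amount)
ORDER_ITEM (O#, I#, Qty_ordered, Total_price, Discount%)
Assume that each item has a different discount. The Total_price refers to one item, Odate is the date on which the order was placed, and the Total_amount is the amount of the order. If we apply a natural join on the relations ORDER_ITEM and ORDER in this database, what does the resulting relation schema RES look like? What will be its key? Show the FDs in this resulting relation. Is RES in 2NF? Is it in 3NF? Why or why not? (State assumptions, if you make any.)
Step-by-step solution
Step 1 of 4
The natural join of two relations can be performed only when the relations have a common attribute with the same name. The relations ORDER and ORDER_ITEM have O# as a common attribute. So, the natural join of ORDER and ORDER_ITEM is performed on the attribute O#.
The resulting relation RES when the natural join is applied to ORDER and ORDER_ITEM is as follows:
RES (O#, I#, Odate, Cust#, Total_amount, Qty_ordered, Total_price, Discount%)
The key of the relation RES will be {O#, I#}.
Step 2 of 4
The functional dependencies in the relation RES, which follow from the keys of ORDER and ORDER_ITEM, are as given below:
{O#} → {Odate, Cust#, Total_amount}
{O#, I#} → {Qty_ordered, Total_price, Discount%}
Step 3 of 4
The relation RES is not in second normal form, as partial dependencies exist in the relation.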
• The key of the relation RES is {O#, I#}.
• O# is only part of the primary key, and it functionally determines Odate, Cust#, and Total_amount.
Step 4 of 4
According to third normal form, the relation must be in second normal form and no non-key attribute may functionally determine another non-key attribute. The relation RES is not in third normal form because it is not in second normal form.

Chapter 14, Problem 30E
Problem
Consider the following relation:
CAR_SALE(Car#, Date_sold, Salesperson#, Commission%, Discount_amt)
Assume that a car may be sold by multiple salespeople, and hence {Car#, Salesperson#} is the primary key. Additional dependencies are
Date_sold → Discount_amt and
Salesperson# → Commission%
Based on the given primary key, is this relation in 1NF, 2NF, or 3NF? Why or why not? How would you successively normalize it completely?
Step-by-step solution
Step 1 of 4
The relation CAR_SALE is in first normal form (1NF) but not in second normal form.
• According to first normal form, the relation should contain only atomic values.
• The primary key is {Car#, Salesperson#}.
• As the relation CAR_SALE contains only atomic values, it is in first normal form.
Step 2 of 4
The relation CAR_SALE is not in second normal form, as partial dependencies exist in the relation.
• According to second normal form, each non-key attribute must be fully functionally dependent on the primary key.
• Salesperson# is only part of the primary key, and it functionally determines Commission%.
• As a partial dependency exists in the relation, CAR_SALE is not in second normal form.
• In order to satisfy second normal form, remove the partial dependency by decomposing the relation as shown below:
CAR_SALE1 (Car#, Date_sold, Salesperson#, Discount_amt)
CAR_SALE2 (Salesperson#, Commission%)
• The relations CAR_SALE1 and CAR_SALE2 are in second normal form.
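The 2NF test in this step can be sketched in code: take each proper subset of the primary key, compute its closure, and report any non-key attributes it reaches. The helper names `closure` and `partial_dependencies` are illustrative, and the sketch assumes, as the problem does, that {Car#, Salesperson#} is the only key.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, fds):
    """Map each proper, non-empty subset of the key to the non-key attributes it determines."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) - set(key)
            if determined:
                found[subset] = determined
    return found


KEY = {"Car#", "Salesperson#"}
FDS = [
    ({"Car#", "Salesperson#"}, {"Date_sold", "Commission%", "Discount_amt"}),
    ({"Date_sold"}, {"Discount_amt"}),
    ({"Salesperson#"}, {"Commission%"}),
]

print(partial_dependencies(KEY, FDS))
```

For CAR_SALE the only subset of the key that reaches a non-key attribute is Salesperson#, which determines Commission%, so that pair is what gets split off into CAR_SALE2.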
Step 3 of 4
The relation CAR_SALE2 is in third normal form, but the relation CAR_SALE1 is not, because a transitive dependency exists in it.
• For a relation to be in third normal form, it must be in second normal form and no non-key attribute may functionally determine another non-key attribute.
• In relation CAR_SALE1, Date_sold is a non-key attribute that functionally determines Discount_amt.
• As a transitive dependency exists in the relation, CAR_SALE1 is not in third normal form.
• To satisfy third normal form, remove the transitive dependency by decomposing the relation CAR_SALE1 as shown below:
CAR_SALE3 (Car#, Date_sold, Salesperson#)
CAR_SALE4 (Date_sold, Discount_amt)
• The relations CAR_SALE3 and CAR_SALE4 are now in third normal form.
Step 4 of 4
The final set of relations in third normal form is as follows:
CAR_SALE2 (Salesperson#, Commission%)
CAR_SALE3 (Car#, Date_sold, Salesperson#)
CAR_SALE4 (Date_sold, Discount_amt)

Chapter 14, Problem 31E
Problem
Consider the following relation for published books:
BOOK (Book_title, Author_name, Book_type, List_price, Author_affil, Publisher)
Author_affil refers to the affiliation of author. Suppose the following dependencies exist:
Book_title → Publisher, Book_type
Book_type → List_price
Author_name → Author_affil
a. What normal form is the relation in? Explain your answer.
b. Apply normalization until you cannot decompose the relations further. State the reasons behind each decomposition.
Step-by-step solution
Step 1 of 4
a. The relation BOOK is in first normal form (1NF) but not in second normal form.
Explanation:
• According to the first normal form, the relation should contain only atomic values.
• The primary key is (Book_title, Author_name).
• As the relation BOOK contains only atomic values, it is in first normal form.
• According to the second normal form, each non-key attribute must be fully functionally dependent on the primary key.
• Author_name is only part of the primary key, yet it functionally determines Author_affil.
• Book_title is only part of the primary key, yet it functionally determines Publisher and Book_type.
• As partial dependencies exist in the relation, BOOK is not in second normal form.
Step 2 of 4
b. The relation BOOK is in first normal form. It is not in second normal form because partial dependencies exist in the relation. To satisfy second normal form, remove the partial dependencies by decomposing the relation as shown below:
Book_author (Book_title, Author_name)
Book_publisher (Book_title, Publisher, Book_type, List_price)
Author (Author_name, Author_affil)
The relations Book_author, Book_publisher, and Author are in second normal form.
Step 3 of 4
For a relation to be in third normal form, it must be in second normal form and no non-key attribute may functionally determine another non-key attribute.
• The relations Book_author and Author are in third normal form.
• The relation Book_publisher is not in third normal form because a transitive dependency exists in it.
• Book_type is a non-key attribute that functionally determines List_price.
• To satisfy third normal form, remove the transitive dependency by decomposing the relation Book_publisher as shown below:
Book_details (Book_title, Publisher, Book_type)
Book_price (Book_type, List_price)
The relations Book_author, Book_details, Book_price, and Author are in third normal form.
Step 4 of 4
The final set of relations in third normal form is as follows:
Book_author (Book_title, Author_name)
Book_details (Book_title, Publisher, Book_type)
Book_price (Book_type, List_price)
Author (Author_name, Author_affil)

Chapter 14, Problem 32E
Problem
This exercise asks you to convert business statements into dependencies. Consider the relation DISK_DRIVE (Serial_number, Manufacturer, Model, Batch, Capacity, Retailer).
Each tuple in the relation DISK_DRIVE contains information about a disk drive with a unique Serial_number, made by a manufacturer, with a particular model number, released in a certain batch, which has a certain storage capacity and is sold by a certain retailer. For example, the tuple Disk_drive (‘1978619’, ‘WesternDigital’, ‘A2235X’, ‘765234’, 500, ‘CompUSA’) specifies that WesternDigital made a disk drive with serial number 1978619 and model number A2235X, released in batch 765234; it is 500GB and sold by CompUSA. Write each of the following dependencies as an FD:
a. The manufacturer and serial number uniquely identifies the drive.
b. A model number is registered by a manufacturer and therefore can’t be used by another manufacturer.
c. All disk drives in a particular batch are the same model.
d. All disk drives of a certain model of a particular manufacturer have exactly the same capacity.
Step-by-step solution
Step 1 of 1
a) Manufacturer, Serial_number → Model, Batch, Capacity, Retailer
b) Model → Manufacturer
c) Manufacturer, Batch → Model (this assumes that a batch number is unique only within a manufacturer)
d) Model → Capacity (by (b), the model already determines the manufacturer, so this is equivalent to Manufacturer, Model → Capacity)

Chapter 14, Problem 33E
Problem
Consider the following relation:
R (Doctor#, Patient#, Date, Diagnosis, Treat_code, Charge)
In the above relation, a tuple describes a visit of a patient to a doctor along with a treatment code and daily charge. Assume that diagnosis is determined (uniquely) for each patient by a doctor. Assume that each treatment code has a fixed charge (regardless of patient). Is this relation in 2NF? Justify your answer and decompose if necessary. Then argue whether further normalization to 3NF is necessary, and if so, perform it.
Step-by-step solution
Step 1 of 1
Consider the relation R (Doctor#, Patient#, Date, Diagnosis, Treat_code, Charge). The functional dependencies of relation R are:
{Doctor#, Patient#, Date} → {Diagnosis, Treat_code, Charge}
{Treat_code} → {Charge}
There are no partial dependencies, so the given relation is in 2NF.
However, it is not in 3NF, because Charge is a nonkey attribute that is determined by another nonkey attribute, Treat_code. We must decompose R as:
R (Doctor#, Patient#, Date, Diagnosis, Treat_code)
R1 (Treat_code, Charge)
One might also assume that the treatment is functionally dependent on the diagnosis, but that dependency is better left out so that the doctor keeps some flexibility when prescribing treatment.

Chapter 14, Problem 34E
Problem
Consider the following relation:
CAR_SALE (Car_id, Option_type, Option_listprice, Sale_date, Option_discountedprice)
This relation refers to options installed in cars (e.g., cruise control) that were sold at a dealership, and the list and discounted prices of the options. If Car_id → Sale_date and Option_type → Option_listprice and Car_id, Option_type → Option_discountedprice, argue using the generalized definition of the 3NF that this relation is not in 3NF. Then argue from your knowledge of 2NF, why it is not even in 2NF.
Step-by-step solution
Step 1 of 3
The relation CAR_SALE is as shown below:
CAR_SALE (Car_id, Option_type, Option_listprice, Sale_date, Option_discountedprice)
The functional dependencies are as given below:
Car_id → Sale_date
Option_type → Option_listprice
Car_id, Option_type → Option_discountedprice
Step 2 of 3
Under the generalized definition, a relation is in third normal form if, for every nontrivial functional dependency X → A, either X is a superkey or A is a prime attribute. In other words, there must be no partial dependency and no transitive dependency on a key.
• For the relation CAR_SALE, {Car_id, Option_type} is the primary key.
• In the functional dependency Car_id → Sale_date, Car_id is only part of the key (not a superkey) and Sale_date is not a prime attribute. Hence, a partial dependency exists in the relation.
• In the functional dependency Option_type → Option_listprice, Option_type is only part of the key (not a superkey) and Option_listprice is not a prime attribute. Hence, a partial dependency exists in the relation.
Therefore, the relation CAR_SALE is not in third normal form.
Step 3 of 3
For a relation to be in second normal form, it must be in first normal form and each non-key attribute must be fully functionally dependent on the primary key. In other words, there must be no partial dependency.
• For the relation CAR_SALE, {Car_id, Option_type} is the primary key.
• In the functional dependency Car_id → Sale_date, Car_id is only part of the key, and it determines Sale_date. Hence, a partial dependency exists in the relation.
• In the functional dependency Option_type → Option_listprice, Option_type is only part of the key, and it determines Option_listprice. Hence, a partial dependency exists in the relation.
Therefore, the relation CAR_SALE is not in second normal form.

Chapter 14, Problem 35E
Problem
Consider the relation: BOOK (Book_Name, Author, Edition, Year) with the data:
Book_Name        Author   Edition  Copyright_Year
DB_fundamentals  Navathe  4        2004
DB_fundamentals  Elmasri  4        2004
DB_fundamentals  Elmasri  5        2007
DB_fundamentals  Navathe  5        2007
a. Based on a common-sense understanding of the above data, what are the possible candidate keys of this relation?
b. Justify that this relation has the MVD {Book} ↠ {Author} | {Edition, Year}.
c. What would be the decomposition of this relation based on the above MVD? Evaluate each resulting relation for the highest normal form it possesses.
Step-by-step solution
Step 1 of 3
Candidate Key
A candidate key is a minimal set of one or more attributes that uniquely identifies each tuple of a relation. Attributes that belong to some candidate key are called prime attributes, and the remaining attributes are called non-prime attributes.
In the sample data given in the problem, Book_Name has the same value in every row, so it contributes nothing to telling the rows apart.
a.
Possible candidate keys: (Author, Edition), (Author, Copyright_Year), (Book_Name, Author, Edition), (Book_Name, Author, Copyright_Year), (Author, Edition, Copyright_Year), (Book_Name, Author, Edition, Copyright_Year).
All of the above sets identify the rows of the sample uniquely, so all of them are superkeys; only the minimal ones are candidate keys. On this sample alone, the minimal sets are (Author, Edition) and (Author, Copyright_Year). If the relation is expected to hold more than one book, which is the common-sense reading, Book_Name must be part of the key, giving (Book_Name, Author, Edition) and (Book_Name, Author, Copyright_Year).
Step 2 of 3
b. Multivalued Dependency (MVD):
An MVD occurs when the presence of one or more tuples in a table implies the presence of one or more other tuples in the same table. If two rows of the table agree on all the determining attributes, then their remaining components can be swapped, and the resulting tuples must also be in the table. MVDs play a central role in 4NF.
Consider the MVD {Book_Name} ↠ {Author} | {Edition, Copyright_Year}. It indicates that the relationship between Book_Name and Author is independent of the relationship between Book_Name and (Edition, Copyright_Year).
In the data, DB_fundamentals is associated with two authors (Navathe and Elmasri) and with two (Edition, Copyright_Year) pairs ((4, 2004) and (5, 2007)), and every combination of an author with one of these pairs appears as a row. Swapping the Author components of any two rows therefore produces rows that are already in the table.
Therefore, the relation has the MVD {Book_Name} ↠ {Author} | {Edition, Copyright_Year}.
Step 3 of 3
c. Decomposition on the basis of the MVD:
A nontrivial MVD forces redundant combinations of values to be stored even when no functional dependency is violated, so a relation can satisfy BCNF and still need decomposition to reach 4NF. The relation can first be decomposed into the following relations:
BOOK1 (Book_Name, Author, Edition)
BOOK2 (Edition, Copyright_Year)
BOOK1 still exhibits the MVD. Decomposing it further gives a final schema in which each relation is in its highest normal form:
BOOK1_1 (Book_Name, Author)
BOOK1_2 (Book_Name, Edition)
BOOK2 (Edition, Copyright_Year)

Chapter 14, Problem 36E
Problem
Consider the following relation:
TRIP (Trip_id, Start_date, Cities_visited, Cards_used)
This relation refers to business trips made by company salespeople.
Suppose the TRIP has a single Start_date but involves many Cities and salespeople may use multiple credit cards on the trip. Make up a mock-up population of the table.
a. Discuss what FDs and/or MVDs exist in this relation.
b. Show how you will go about normalizing the relation.
Step-by-step solution
Step 1 of 2
Each trip in the relation TRIP has a unique Trip_id, and a particular Trip_id has a single Start_date. So Start_date is fully functionally dependent on Trip_id.
a. The FDs and MVDs that exist in the relation are:
FD1: Trip_id → Start_date
Cities_visited and Cards_used may each take several values for a particular Trip_id. They are independent of each other, and both depend on the trip, so the MVDs present in the relation are as follows:
MVD1: Trip_id ↠ Cities_visited
MVD2: Trip_id ↠ Cards_used
Step 2 of 2
b. Normalizing the relation
The relation has one FD and two MVDs, so first split the relation to remove the functional dependency FD1:
TRIP1 (Trip_id, Start_date)
Next, split the relation to remove the multivalued dependencies. Cities_visited and Cards_used are independent of each other: if their values are swapped between two tuples of the same trip, the relation remains unchanged. The decomposition is made on Trip_id rather than Start_date, because two trips may start on the same date and Start_date therefore does not identify a trip:
TRIP2 (Trip_id, Cities_visited)
TRIP3 (Trip_id, Cards_used)
The final schema is:
TRIP1 (Trip_id, Start_date)
TRIP2 (Trip_id, Cities_visited)
TRIP3 (Trip_id, Cards_used)

Chapter 15, Problem 1RQ
Problem
What is the role of Armstrong’s inference rules (inference rules IR1 through IR3) in the development of the theory of relational design?
Step-by-step solution
Step 1 of 1
There are six inference rules (IR) for functional dependencies (FD), of which the first three (reflexive, augmentation, and transitive) are referred to as Armstrong's axioms.
Inference Rule 1 (reflexive rule): If X ⊇ Y, then X → Y.
The reflexive rule states that a set of attributes always determines itself or any of its subsets.
Inference Rule 2 (augmentation rule): {X → Y} ⊨ XZ → YZ.
The augmentation rule states that adding the same set of attributes to both sides of an FD results in another valid FD.
Inference Rule 3 (transitive rule): {X → Y, Y → Z} ⊨ X → Z.
The transitive rule states that if A determines B and B determines C, then A determines C.
Database designers first specify the set of functional dependencies F that can be determined from the meaning of the attributes of relation R; IR1, IR2, and IR3 are then used to infer the additional functional dependencies that hold on R. These three rules are enough to infer all new functional dependencies (the other rules can be derived from them). Hence they yield new facts and are relied on by database designers in relational database design.

Chapter 15, Problem 2RQ
Problem
What is meant by the completeness and soundness of Armstrong’s inference rules?
Step-by-step solution
Step 1 of 1
The reflexive, augmentation, and transitive inference rules (IR) for functional dependencies (FD) are referred to as Armstrong's inference rules.
Inference Rule 1 (reflexive rule): If X ⊇ Y, then X → Y.
The reflexive rule states that a set of attributes always determines itself or any of its subsets.
Inference Rule 2 (augmentation rule): {X → Y} ⊨ XZ → YZ.
The augmentation rule states that adding the same set of attributes to both sides of an FD results in another valid FD.
Inference Rule 3 (transitive rule): {X → Y, Y → Z} ⊨ X → Z.
The transitive rule states that if A determines B and B determines C, then A determines C.
As shown by Armstrong, the inference rules IR1, IR2, and IR3 are sound and complete.
Sound
Soundness means that, given a set of functional dependencies F specified on a relation schema R, any dependency that can be inferred from F by using IR1 through IR3 holds in every relation state r of R that satisfies the dependencies in F.
Complete
Completeness means that applying IR1 through IR3 repeatedly to infer dependencies, until no more dependencies can be inferred, results in the complete set of all possible dependencies that can be inferred from F.

Chapter 15, Problem 3RQ
Problem
What is meant by the closure of a set of functional dependencies? Illustrate with an example.
Step-by-step solution
Step 1 of 2
The closure of a set of functional dependencies F of a relation is the set consisting of the functional dependencies in F together with all the functional dependencies that can be inferred from, or are implied by, F. The closure of a set of functional dependencies F of a relation R is denoted by F+.
Step 2 of 2
Example:
Consider a relation Student with attributes StudentNo, Sname, address, DOB, CourseNo, CourseName, Credits, Duration.
The functional dependencies of Student form the set F. [the individual dependencies are not preserved in the source]
Further functional dependencies can be inferred from F. [the inferred dependencies are not preserved in the source]
Hence, F+ consists of the dependencies in F together with the inferred dependencies.

Chapter 15, Problem 4RQ
Problem
When are two sets of functional dependencies equivalent? How can we determine their equivalence?
Step-by-step solution
Step 1 of 1
• Two sets of functional dependencies (FD) A and B are equivalent if A+ = B+. Equivalence therefore means that every FD in A can be inferred from B, and every FD in B can be inferred from A. A is equivalent to B if both conditions hold: A covers B and B covers A.
• A set of functional dependencies A is said to cover another set of functional dependencies B if every FD in B is also in A+, that is, if every dependency in B can be inferred from A. We then say that B is covered by A.
• To determine whether A covers B, compute X+ with respect to A for each FD X → Y in B, and then check whether this X+ includes the attributes in Y. If this holds true for every FD in B, then A covers B.
Whether B covers A is determined in the same way. If both hold, A and B are equivalent.

Chapter 15, Problem 5RQ
Problem
What is a minimal set of functional dependencies? Does every set of dependencies have a minimal equivalent set? Is it always unique?
Step-by-step solution
Step 1 of 1
A set of functional dependencies F is said to be minimal if it satisfies the following conditions.
1. Every dependency in F has a single attribute on its right-hand side.
2. No dependency P → A in F can be replaced with a dependency Q → A, where Q is a proper subset of P, and still yield a set of dependencies that is equivalent to F.
3. No dependency can be removed from F and still yield a set of dependencies that is equivalent to F.
Condition 1 puts every dependency in a standard form with a single attribute on the right-hand side. Conditions 2 and 3 ensure that there is no redundancy, either from redundant attributes on the left-hand side of a dependency (condition 2) or from a dependency that can be inferred from the remaining FDs in F (condition 3).
A minimal cover of a set of functional dependencies A is a minimal set of functional dependencies F that is equivalent to A: every dependency in A is in the closure of F, and F contains no redundancy and is in the standard form. Every set of dependencies has at least one minimal equivalent set, but it is not always unique; a set of functional dependencies can have several minimal covers.

Chapter 15, Problem 6RQ
Problem
What is meant by the attribute preservation condition on a decomposition?
Step-by-step solution
Step 1 of 1
Attribute preservation condition on decomposition:
Decomposition: Replace an unnormalized relation by a set of normalized relations. If R = {A1, A2, ..., An} is the relation schema, then D = {R1, R2, ..., Rm} is a decomposition of R.
Attribute preservation: Every attribute is in some relation. All attributes must be preserved through the process of normalization.
Start with a universal relation schema R = {A1, A2, ..., An} that includes all the attributes of the database. Here every attribute name is unique. Using the functional dependencies, the algorithms decompose the universal relation schema R into a set of relation schemas D = {R1, R2, ..., Rm} that will become the relational database schema. D is called a decomposition of R, such that R1 ∪ R2 ∪ ... ∪ Rm = R. Each attribute in R will appear in at least one relation schema Ri in the decomposition, so that no attributes are lost. This is the attribute preservation condition of a decomposition.

Chapter 15, Problem 7RQ
Problem
Why are normal forms alone insufficient as a condition for a good schema design?
Step-by-step solution
Step 1 of 1
Normal forms alone are insufficient as a condition for a good schema design because they say nothing about two properties of decompositions: 1) the lossless join property and 2) the dependency preservation property. The design algorithms use both properties to achieve desirable decompositions.
It is insufficient to test the relation schemas independently of one another for compliance with higher normal forms such as 2NF, 3NF, and BCNF. The resulting relations must collectively satisfy the two additional properties, dependency preservation and lossless join, to qualify as a good design.

Chapter 15, Problem 8RQ
Problem
What is the dependency preservation property for a decomposition? Why is it important?
Step-by-step solution
Step 1 of 2
Dependency preservation property for a decomposition:
Let F be a set of functional dependencies on schema R, and let D = {R1, R2, ..., Rm} be a decomposition of R. The projection of F on Ri, where Ri is a subset of R, is denoted by πRi(F). It is the set of all functional dependencies X → Y in F+ such that the attributes in X ∪ Y are all contained in Ri. Hence the projection of F on each relation schema Ri in the decomposition D is the set of functional dependencies in F+ such that all their LHS and RHS attributes are in Ri.
The decomposition D is dependency-preserving with respect to F if the union of the projections of F on each Ri in D is equivalent to F, that is, if the closure of the union of the dependencies that hold on each Ri is equal to the closure of F:
(πR1(F) ∪ ... ∪ πRm(F))+ = F+
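A small worked case may make the dependency preservation condition more concrete. The following derivation is an illustrative sketch only: the schema R(A, B, C), the dependency set F, and the two decompositions D1 and D2 are assumptions chosen for the example and do not come from the textbook. The condition being checked is that the closure of the union of the projected dependencies equals F+.

```latex
% Assumed example: R(A,B,C) with F = {A -> B, B -> C}.
\begin{align*}
F &= \{A \to B,\; B \to C\}, \qquad F^{+} \ni A \to C \\[4pt]
% Decomposition D1 keeps each dependency inside one schema.
D_1 &= \{R_1(A,B),\; R_2(B,C)\}: \\
\pi_{R_1}(F) &\supseteq \{A \to B\}, \qquad \pi_{R_2}(F) \supseteq \{B \to C\} \\
\bigl(\pi_{R_1}(F) \cup \pi_{R_2}(F)\bigr)^{+} &= F^{+}
  \quad\Rightarrow\quad D_1 \text{ is dependency-preserving} \\[4pt]
% Decomposition D2 splits B and C into different schemas.
D_2 &= \{R_1(A,B),\; R_2(A,C)\}: \\
\pi_{R_1}(F) &\supseteq \{A \to B\}, \qquad \pi_{R_2}(F) \supseteq \{A \to C\} \\
\{B\}^{+} \text{ under } \{A \to B,\; A \to C\} &= \{B\}
  \quad\Rightarrow\quad B \to C \notin \bigl(\pi_{R_1}(F) \cup \pi_{R_2}(F)\bigr)^{+}
\end{align*}
```

Under these assumptions D2 is not dependency-preserving: enforcing B → C would require joining R1 and R2 after every update, which is exactly the cost that dependency preservation is meant to avoid.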
Step 2 of 2
Importance:
1) With this property we can easily check that updates to the database do not result in illegal relations being created.
2) A dependency-preserving design lets us check updates without having to compute natural joins.
3) We want to preserve dependencies because each dependency in F represents a constraint on the database.
4) It is always possible to find a dependency-preserving decomposition D with respect to F such that each relation Ri in D is in 3NF.

Chapter 15, Problem 9RQ
Problem
Why can we not guarantee that BCNF relation schemas will be produced by dependency-preserving decompositions of non-BCNF relation schemas? Give a counterexample to illustrate this point.
Step-by-step solution
Step 1 of 3
We cannot guarantee that a dependency-preserving decomposition of a non-BCNF relation schema will produce BCNF relation schemas. As a counterexample, consider the two functional dependencies that exist in the relation TEACH (Student, Course, Instructor):
fd1: {Student, Course} → Instructor
fd2: Instructor → Course
Here {Student, Course} is a candidate key, so this relation is in 3NF but not in BCNF.
Step 2 of 3
[content not preserved in the source]
Step 3 of 3
A relation that is not in BCNF should be decomposed so as to meet this property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations. In TEACH, every decomposition into BCNF schemas produces relations with at most two attributes, so fd1, which involves all three attributes, cannot be contained in any single relation and is lost.

Chapter 15, Problem 10RQ
Problem
What is the lossless (or nonadditive) join property of a decomposition? Why is it important?
Step-by-step solution
Step 1 of 1
Lossless join property of a decomposition:
This is one of the properties of a decomposition. The word loss in lossless refers to loss of information, not to loss of tuples.
Basic definition of lossless join: A decomposition D = {R1, R2, ..., Rm} of R has the lossless join property with respect to the set of dependencies F on R if, for every relation state r of R that satisfies F, the following holds:
*(πR1(r), ..., πRm(r)) = r
where * is the natural join of all the relations in D.
Example:
EMP_PROJ (SSN, PNUM, HOURS, ENAME, PNAME, PLOCATION)
is decomposed into
R1 (SSN, ENAME)
R2 (PNUM, PNAME, PLOCATION)
R3 (SSN, PNUM, HOURS)
Here R3, the relation containing HOURS, links the other two and makes the join lossless.
Importance:
An important feature of a good decomposition is that it gives lossless joins, which addresses the problem of spurious tuples. If the chosen relations do not carry the complete information about the entity or relationship, then joining them produces tuples that do not actually belong to the original relation. These spurious tuples contain wrong information. The lossless join property rules out this kind of problem.

Chapter 15, Problem 11RQ
Problem
Between the properties of dependency preservation and losslessness, which one must definitely be satisfied? Why?
Step-by-step solution
Step 1 of 1
Dependency preservation and losslessness are both properties of decompositions, and both are used by the design algorithms to achieve desirable decompositions.
Property of dependency preservation: It allows a constraint on the original relation to be enforced by enforcing constraints on the corresponding instances of the smaller relations.
Property of lossless join: It ensures that any instance of the original relation can be recovered from the corresponding instances of the smaller relations. No spurious rows are generated when the relations are reunited through a natural join operation.
Testing the relation schemas independently of one another for compliance with higher normal forms such as 2NF, 3NF, and BCNF is not sufficient, and neither is dependency preservation alone. Of the two properties, the lossless join property must definitely be satisfied, because a decomposition that produces spurious tuples yields wrong information, whereas a lost dependency can still be enforced, at some cost, by computing a join.

Chapter 15, Problem 12RQ
Problem
Discuss the NULL value and dangling tuple problems.
Step-by-step solution
Step 1 of 2
NULL values and dangling tuple problems
When designing a relational database schema, we must consider the problems associated with NULLs. NULLs can have multiple interpretations:
1) The attribute does not apply to this tuple.
2) The attribute value for this tuple is unknown.
3) The value is known but absent, that is, it has not been recorded yet.
Step 2 of 2
Dangling tuples: Tuples that “disappear” when a join is computed.
Consider a pair of relations r(R) and s(S) and their natural join r ⋈ s. A dangling tuple is a tuple t in r that does not join with any tuple in s, that is, there is no tuple t' in s such that t and t' agree on the common attributes R ∩ S. Such a tuple may or may not be acceptable.
Example: Suppose there is a tuple in the account relation with the branch name “Town 1”, but no matching tuple in the branch relation for the Town 1 branch. This is undesirable, as an account should refer to a branch that exists.
Now suppose there is another tuple in the branch relation for some branch, but no matching tuple in the account relation for that branch. This means that a branch exists for which no accounts exist, which is acceptable, for example when a branch is being opened.

Chapter 15, Problem 13RQ
Problem
Illustrate how the process of creating first normal form relations may lead to multivalued dependencies. How should the first normalization be done properly so that MVDs are avoided?
Step-by-step solution
Step 1 of 2
Multivalued dependencies are a consequence of first normal form, which disallows an attribute in a tuple from having a set of values. If we have two or more independent multivalued attributes in the same relation schema, we have to repeat every value of one of the attributes with every value of the other attribute to keep the relation state consistent and to maintain the independence among the attributes involved. This constraint is specified by a multivalued dependency.
For example, consider an EMP relation with attributes Ename, Project_name, Dependent_name. The relation has the following tuples:
1.) ('a','x','n')
2.) ('a','x','m')
3.) ('a','y','n')
4.) ('a','y','m')
Step 2 of 2
Here the employee named 'a' has two dependents and works on two projects. Since each attribute value must be atomic, a multivalued dependency has arisen in the relation.
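The four EMP tuples above can be used to check the multivalued dependency directly. The following is a minimal sketch in Java, assuming the standard characterization that Ename ↠ Project_name holds in a relation state exactly when the state equals the natural join of its projections on (Ename, Project_name) and (Ename, Dependent_name); the class name MvdSketch and its methods are hypothetical names introduced for illustration.

```java
import java.util.*;

public class MvdSketch {
    // The four EMP tuples (Ename, Project_name, Dependent_name) from the example.
    static Set<List<String>> emp() {
        Set<List<String>> r = new HashSet<>();
        r.add(Arrays.asList("a", "x", "n"));
        r.add(Arrays.asList("a", "x", "m"));
        r.add(Arrays.asList("a", "y", "n"));
        r.add(Arrays.asList("a", "y", "m"));
        return r;
    }

    // Project onto (Ename, Project_name) and (Ename, Dependent_name),
    // then natural-join the two projections on Ename.
    static Set<List<String>> projectAndJoin(Set<List<String>> r) {
        Set<List<String>> projects = new HashSet<>();
        Set<List<String>> dependents = new HashSet<>();
        for (List<String> t : r) {
            projects.add(Arrays.asList(t.get(0), t.get(1)));
            dependents.add(Arrays.asList(t.get(0), t.get(2)));
        }
        Set<List<String>> joined = new HashSet<>();
        for (List<String> p : projects) {
            for (List<String> d : dependents) {
                if (p.get(0).equals(d.get(0))) {
                    joined.add(Arrays.asList(p.get(0), p.get(1), d.get(1)));
                }
            }
        }
        return joined;
    }

    // Ename ->> Project_name holds in r exactly when the join gives r back.
    static boolean mvdHolds(Set<List<String>> r) {
        return projectAndJoin(r).equals(r);
    }

    public static void main(String[] args) {
        Set<List<String>> r = emp();
        System.out.println(mvdHolds(r));      // all four combinations present
        r.remove(Arrays.asList("a", "y", "m"));
        System.out.println(mvdHolds(r));      // one combination missing
    }
}
```

With all four tuples the join reproduces the relation, so the MVD holds; once one combination is removed, the join produces a tuple that is no longer in the relation and the MVD fails. This is the repetition of every project with every dependent that first normal form forces on the flattened relation.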
Informally, whenever two independent 1:N relationships A:B and A:C are mixed in the same relation R(A, B, C), an MVD may arise.
Whenever a relation schema R is decomposed into R1 = (X ∪ Y) and R2 = (R − Y) based on an MVD X ↠ Y that holds in R, the decomposition has the nonadditive join property.
Property NJB': The relation schemas R1 and R2 form a nonadditive join decomposition of R with respect to a set of functional and multivalued dependencies if and only if (R1 ∩ R2) ↠ (R1 − R2) or, by symmetry, (R1 ∩ R2) ↠ (R2 − R1).
This property handles the problem of MVDs: by decomposing on each MVD in this way, we obtain relations that are in 1NF and contain no nontrivial MVD. In practice, this means that each independent multivalued attribute is placed in its own relation together with the key, rather than all of them being flattened into one relation.

Chapter 15, Problem 14RQ
Problem
What types of constraints are inclusion dependencies meant to represent?
Step-by-step solution
Step 1 of 1
Inclusion dependencies are defined in order to formalize two types of interrelational constraints that cannot be expressed using functional dependencies or multivalued dependencies. The two types are:
Referential integrity constraint: It relates attributes across relations, so the foreign key or referential integrity constraint cannot be specified as a functional or multivalued dependency.
Class/subclass relationship: It represents a constraint between two relations that stand in a class/subclass relationship. This relationship also has no formal definition in terms of functional, multivalued, and join dependencies.

Chapter 15, Problem 15RQ
Problem
How do template dependencies differ from the other types of dependencies we discussed?
Step-by-step solution
Step 1 of 2
Template dependencies: A template dependency is a technique for representing constraints in relations. Based on the semantics of the attributes within a relation, some peculiar constraints may arise that the other dependency types cannot express. The basic idea of template dependencies is to specify a template, or example, that defines each constraint or dependency.
There are two types of templates: (1) tuple-generating templates and (2) constraint-generating templates. A template consists of a number of hypothesis tuples that are meant to show an example of the tuples that may appear in one or more relations.
Step 2 of 2
The other part of the template is the template conclusion. The conclusion is a set of tuples that must also exist in the relations if the hypothesis tuples are there.
As an example, take a relation R. A template dependency can be applied to this relation to show the template for a functional dependency X → Y, with hypothesis tuples and a conclusion. [the template itself is not preserved in the source]
The other dependency types, by contrast, are each defined by a fixed formal rule rather than by an example of this kind.

Chapter 15, Problem 16RQ
Problem
Why is the domain-key normal form (DKNF) known as the ultimate normal form?
Step-by-step solution
Step 1 of 1
The idea behind domain-key normal form is to specify the ultimate normal form, one that takes into account all possible types of dependencies and constraints. A relation is in DKNF if all the constraints and dependencies that should hold on the valid relation states can be enforced simply by enforcing the domain constraints and key constraints.
- A relation in DKNF has no modification anomalies, and conversely.
- DKNF is the ultimate normal form in the sense that no higher normal form related to modification anomalies exists.
- In domain-key normal form, every constraint on the relation is a logical consequence of the definition of keys and domains.
Key: the unique identifier of a tuple.
Domain: the physical and logical description of an attribute.

Chapter 15, Problem 17E
Problem
Show that the relation schemas produced by Algorithm 15.4 are in 3NF.
Step-by-step solution
Step 1 of 1
Assume that one of the relation schemas Ri formed by Algorithm 15.4 is not in 3NF. Then a functional dependency M → A is valid in Ri, where
• M is not a superkey of Ri.
• A is not a prime attribute of Ri.
However, as per step 2 of the algorithm, Ri comprises a set of attributes {X ∪ A1 ∪ A2 ∪ ... ∪ Ak}, where X → Aj for 1 ≤ j ≤ k, implying that X is a key of Ri and A1, A2, ..., Ak are the only nonprime attributes of Ri. Thus, if a functional dependency M → A holds in the relation schema Ri, where A is not prime and M is not a superkey of Ri, then M must be a proper subset of X, or else M would contain X and therefore would be a superkey. If both M → A and X → A hold and M is a proper subset of X, then this contradicts the condition that X → A is a functional dependency in a minimal cover of functional dependencies, because removing an attribute from the key X leaves a valid functional dependency. This violates one of the minimality conditions, and hence the relation schema Ri must be in 3NF.

Chapter 15, Problem 18E
Problem
Show that, if the matrix S resulting from Algorithm 15.3 does not have a row that is all a symbols, projecting S on the decomposition and joining it back will always produce at least one spurious tuple.
Step-by-step solution
Step 1 of 2
Take the universal relation R = {A1, A2, ..., An}, a decomposition D = {R1, R2, ..., Rm} of R, and a set F of functional dependencies.
Based on Algorithm 15.3 (given in the textbook):
The matrix S is considered to be some relation state r of R. Row i in S represents a tuple ti corresponding to Ri: by step 1 of the algorithm, it has a symbols in the columns that correspond to the attributes of Ri and b symbols in the remaining columns.
From step 4 of the algorithm:
During the loop, the algorithm transforms the rows of this matrix so that they represent tuples that satisfy all the functional dependencies in F. Any two rows in S, which represent two tuples in r, that agree in their values for the left-hand-side attributes X of a functional dependency X → Y in F will also agree in their values for the right-hand-side attributes Y.
If any row in S ends up with all a symbols, then the decomposition D has the nonadditive join property with respect to F.
On the other hand, if no row ends up being all a symbols, the decomposition D does not satisfy the lossless join property.
Step 2 of 2
In that case, the relation state r represented by S at the end of the algorithm is a relation state of R that satisfies the dependencies in F but does not satisfy the nonadditive join condition.
From step 4: the loop in the algorithm never changes an a symbol to a b symbol, so row i still has a symbols in every column of Ri. The projection of r on Ri therefore contains a tuple made only of a symbols, for every i. Joining these projections produces the tuple that has a symbols in all columns. Since the resulting matrix S does not have a row with all a symbols, this tuple is not in r, so it is a spurious tuple, and the decomposition does not have the lossless join property.
For example, consider any relational schema R and set of functional dependencies F for which S ends with no all-a row: joining the projections of the state S yields the all-a tuple, which S itself does not contain.

Chapter 15, Problem 19E
Problem
Show that the relation schemas produced by Algorithm 15.5 are in BCNF.
Step-by-step solution
Step 1 of 2
In this algorithm the loop continues until all relation schemas are in BCNF.
Algorithm 15.5
Input: A universal relation R and a set of functional dependencies F on the attributes of R.
Step 1: Set D := {R};
Step 2: While there is a relation schema Q in D that is not in BCNF do
{ choose a relation schema Q in D that is not in BCNF;
find a functional dependency X → Y in Q that violates BCNF;
replace Q in D by two relation schemas (Q − Y) and (X ∪ Y); }
Step 2 of 2
In each pass, the algorithm decomposes one relation schema Q that is not in BCNF into two relation schemas. By Property NJB (the lossless join test for binary decompositions) and Claim 2 (preservation of nonadditivity in successive decompositions), both given in the textbook, the decomposition D has the nonadditive join property. Each pass replaces Q by two schemas with fewer attributes, so the loop terminates, and it can only terminate when no schema in D violates BCNF. At the end of the algorithm, therefore, all relation schemas in D are in BCNF.
Example: working of this algorithm.
Take one relation (for example) which is not in BCNF.
(Project-ID, Company-name, Plot#, Area, Price, Tax-Rate)
First loop: the result is not yet in BCNF
(Project-ID, Company-name, Plot#, Area, Price)
(Company-name, Tax-Rate)
Second loop: it is also not in BCNF
(Project-ID, Company-name, Plot#, Area)
(Area, Price)
(Company-name, Tax-Rate)
Final loop: it is now in BCNF
(Project-ID, Area, Plot#)
(Area, Company-name)
(Area, Price)
(Company-name, Tax-Rate)

Chapter 15, Problem 20E
Problem
Write programs that implement Algorithms 15.4 and 15.5.
Step-by-step solution
Step 1 of 6
Program to implement Algorithm 15.4
The following program converts a relational schema into 3NF. SynthesisAlgorithm is a public class whose main method starts execution. First, the program reads its input from the keyboard and stores it in several lists. The input values are the attribute names and the functional dependencies of the relation.
In this program, the first step calculates a minimal cover of the functional dependencies. The second step calculates the attributes to be placed in each relation. The third step checks whether a key is contained in any of the relations. The fourth step finds any redundant relation and removes it from the schema.
Following is the Java code that implements the synthesis algorithm to convert a relation into 3NF.

import java.util.*;
import java.io.*;

public class SynthesisAlgorithm {
    // main method to start the execution of the program.
    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        // If irrelevant values are entered, an exception may be
        // thrown at runtime.
        System.out.println("Note: Everything is case Sensitive, please enter values in the same case everywhere.");
        System.out.println("Enter the name of Relation:");
        // It will store the name of relation.
        String relationName = br.readLine();
        System.out.println("How many attributes are there in the Relation?");
        // Number of attributes in the relation for efficient
        // management of the attributes.
        int n = Integer.parseInt(br.readLine());
        System.out.println("Type name of one attribute in each line:");
        // This list contains all attribute names.
        LinkedList<String> attributeList = new LinkedList<String>();
        // for loop will insert all attributes to the list.
        for (int i = 0; i < n; i++)
            attributeList.add(br.readLine());
        System.out.println("How many functional dependencies are there in the relation " + relationName);
        // Number of Functional Dependencies.
        int numOfFuncDep = Integer.parseInt(br.readLine());
        // this will initialize Left Hand Side attributes of Functional Dependencies.
        LinkedList<String>[] fucDepLHSattr = new LinkedList[numOfFuncDep];
        // this will initialize Right Hand Side attributes of Functional Dependencies.
        LinkedList<String>[] fucDepRHSattr = new LinkedList[numOfFuncDep];
        for (int i = 0; i < numOfFuncDep; i++) {
            // Left Hand side of functional dependency might
            // have more than one determinant.
            fucDepLHSattr[i] = new LinkedList<String>();
            System.out.println("Number of attributes in LHS of functional dependency[" + i + "]");
            // Number of determinants in Left Hand side of
            // functional dependency.
            // temp1 is reassigned for each functional dependency.
            int temp1 = Integer.parseInt(br.readLine());
            System.out.println("Enter the attribute names of LHS[" + i + "]");
            for (int j = 0; j < temp1; j++)
                fucDepLHSattr[i].add(br.readLine());
            // Right Hand side of functional dependency might
            // have more than one dependent.
            fucDepRHSattr[i] = new LinkedList<String>();
            System.out.println("Number of attributes in RHS of functional dependency[" + i + "]");
            // Number of dependents in Right Hand side of
            // functional dependency.
            // temp2 is reassigned for each functional dependency.
            int temp2 = Integer.parseInt(br.readLine());
            System.out.println("Enter the attribute names of RHS[" + i + "]");
            // inserting all attributes on right hand side of
            // the functional dependency.
            for (int j = 0; j < temp2; j++)
                fucDepRHSattr[i].add(br.readLine());
        }
        System.out.println("Step 1: Finding minimal cover...");
        // Minimal cover in canonical form: every functional dependency
        // has a single attribute on its right-hand side.
        List<FD> cover = minimalCover(fucDepLHSattr, fucDepRHSattr);
Step 2 of 6
        System.out.println("Step 2: Calculating attributes for each Functional Dependency...");
        // One relation per distinct LHS: the LHS attributes plus every
        // attribute that this LHS determines in the minimal cover.
        Map<Set<String>, Set<String>> grouped = new LinkedHashMap<Set<String>, Set<String>>();
        for (FD fd : cover)
            grouped.computeIfAbsent(fd.lhs, k -> new TreeSet<String>(k)).add(fd.rhs);
        List<Set<String>> relations = new ArrayList<Set<String>>(grouped.values());
        System.out.println("Step 3: Checking whether key attributes exist in any of the relations...");
        // If no relation contains a key of the original relation,
        // add one more relation that holds the key.
        Set<String> key = findKey(attributeList, cover);
        boolean keyCovered = false;
        for (Set<String> r : relations)
            if (r.containsAll(key)) keyCovered = true;
        if (!keyCovered) relations.add(key);
        System.out.println("Step 4: Removing redundant relations from the schema...");
        // A relation is redundant if all its attributes appear in
        // another relation of the schema.
        List<Set<String>> finalSchema = new ArrayList<Set<String>>();
        for (int i = 0; i < relations.size(); i++) {
            boolean redundant = false;
            for (int j = 0; j < relations.size(); j++) {
                Set<String> a = relations.get(i), b = relations.get(j);
                if (i != j && b.containsAll(a) && (b.size() > a.size() || j < i))
                    redundant = true;
            }
            if (!redundant) finalSchema.add(relations.get(i));
        }
        System.out.println("Final schema is as follows:");
        // this loop will print the final schema, one relation per line.
        for (int i = 0; i < finalSchema.size(); i++)
            System.out.println(relationName + (i + 1) + "(" + String.join(",", finalSchema.get(i)) + ")");
    }

    // A functional dependency in canonical form: a set of LHS
    // attributes and a single RHS attribute.
    static class FD {
        Set<String> lhs;
        String rhs;
        FD(Set<String> lhs, String rhs) { this.lhs = lhs; this.rhs = rhs; }
    }

    // closure of a set of attributes under the given dependencies.
    static Set<String> closure(Set<String> attrs, List<FD> fds) {
        Set<String> result = new TreeSet<String>(attrs);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (FD fd : fds)
                if (result.containsAll(fd.lhs) && result.add(fd.rhs)) changed = true;
        }
        return result;
    }

    // this method will find the minimal cover of FDs.
    public static List<FD> minimalCover(LinkedList<String>[] LHSlist, LinkedList<String>[] RHSlist) {
        // if the set of FDs is null this will throw an exception.
        if (LHSlist == null || RHSlist == null)
            throw new IllegalArgumentException("Functional Dependency can't be NULL.");
        System.out.println(" Converting Functional Dependencies into canonical form...");
        List<FD> fds = new ArrayList<FD>();
        for (int i = 0; i < LHSlist.length && i < RHSlist.length; i++)
            for (String a : RHSlist[i])
                fds.add(new FD(new TreeSet<String>(LHSlist[i]), a));
        // remove extraneous attributes from each left-hand side.
        for (FD fd : fds) {
            for (String b : new ArrayList<String>(fd.lhs)) {
                if (fd.lhs.size() == 1) break;
                Set<String> reduced = new TreeSet<String>(fd.lhs);
                reduced.remove(b);
                if (closure(reduced, fds).contains(fd.rhs)) fd.lhs = reduced;
            }
        }
        // remove dependencies that follow from the remaining ones.
        for (int i = fds.size() - 1; i >= 0; i--) {
            FD fd = fds.remove(i);
            if (!closure(fd.lhs, fds).contains(fd.rhs)) fds.add(i, fd);
        }
        return fds;
    }

    // this method finds one key: start from all attributes and drop
    // every attribute whose removal still determines the whole relation.
    static Set<String> findKey(List<String> attributes, List<FD> fds) {
        Set<String> key = new TreeSet<String>(attributes);
        for (String a : attributes) {
            Set<String> reduced = new TreeSet<String>(key);
            reduced.remove(a);
            if (closure(reduced, fds).containsAll(attributes)) key = reduced;
        }
        return key;
    }
}

Step 3 of 6
Program to implement Algorithm 15.5
The following program converts a relation into BCNF using the relational decomposition algorithm. In the first step, it places all attributes in a single relation. In the second step, it loops over the functional dependencies and checks whether any functional dependency violates BCNF. If a functional dependency violates BCNF, a new relation is created that holds all the attributes participating in that functional dependency. At the same time, the dependents are removed from the parent relation.
Following is the Java code that implements the decomposition algorithm to convert a relation into BCNF.

import java.util.*;
import java.io.*;

public class DecompositionIntoBCNF {
    // main method to start the execution of the program.
    public static void main(String[] args) throws Exception {
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        // If irrelevant values are entered, an exception may be
        // thrown at runtime.
        System.out.println("Note: Everything is case sensitive, please enter values in the same case everywhere.");
        System.out.println("Enter the name of Relation:");
        // It will store the name of relation.
        String relationName = br.readLine();
        System.out.println("How many attributes are there in the Relation?");
        // Number of attributes in the relation for efficient
        // management of the attributes.
        int n = Integer.parseInt(br.readLine());
        System.out.println("Type name of one attribute in each line:");
        // This list contains all attribute names.
        LinkedList<String> attributeList = new LinkedList<String>();
        // for loop will insert all attributes to the list.
        for (int i = 0; i < n; i++)
            attributeList.add(br.readLine());
        System.out.println("How many functional dependencies are there in the relation " + relationName);
        // Number of Functional Dependencies.
        int numOfFuncDep = Integer.parseInt(br.readLine());
        // this will initialize Left Hand Side attributes of
        // Functional Dependencies.
        LinkedList<String>[] fucDepLHSattr = new LinkedList[numOfFuncDep];
        // this will initialize Right Hand Side attributes of
        // Functional Dependencies.
        LinkedList<String>[] fucDepRHSattr = new LinkedList[numOfFuncDep];
        for (int i = 0; i < numOfFuncDep; i++) {
            // Left Hand side of functional dependency might
            // have more than one determinant.
            fucDepLHSattr[i] = new LinkedList<String>();
            System.out.println("Number of attributes in LHS of functional dependency[" + i + "]");
            // Number of determinants in Left Hand side of
            // functional dependency.
            // temp1 is reassigned for each functional dependency.
            int temp1 = Integer.parseInt(br.readLine());
            System.out.println("Enter the attribute names of LHS[" + i + "]");
            for (int j = 0; j < temp1; j++)
                fucDepLHSattr[i].add(br.readLine());
            // Right Hand side of functional dependency might
            // have more than one dependent.
            fucDepRHSattr[i] = new LinkedList<String>();
            System.out.println("Number of attributes in RHS of functional dependency[" + i + "]");
            // Number of dependents in Right Hand side of
            // functional dependency.
            // temp2 is reassigned for each functional dependency.
            int temp2 = Integer.parseInt(br.readLine());
            System.out.println("Enter the attribute names of RHS[" + i + "]");
            // inserting all attributes on right hand side of
            // the functional dependency.
            for (int j = 0; j < temp2; j++)
                fucDepRHSattr[i].add(br.readLine());
        }
        LinkedList<String> output = new LinkedList<String>();
        LinkedList[] decomposition = new LinkedList[numOfFuncDep];
        output = attributeList;
        int d = 0;
        // repeat while the current functional dependency violates
        // BCNF.
        while (!inBCNF(output, fucDepLHSattr[d], fucDepRHSattr[d], d)) {
            decomposition[d] = new LinkedList<String>();
            // if FD violates BCNF, create a new relation
            // consisting of the LHS attributes of the FD.
            for (int j = 0; j < fucDepLHSattr[d].size(); j++)
Step 4 of 6
                decomposition[d].add(fucDepLHSattr[d].get(j));
            // add RHS attributes to the relation.
            for (int j = 0; j < fucDepRHSattr[d].size(); j++)
                decomposition[d].add(fucDepRHSattr[d].get(j));
            // remove RHS attributes of FD from parent
            // relation.
            for (int j = 0; j < fucDepRHSattr[d].size(); j++)
                output.remove(fucDepRHSattr[d].get(j));
            d++;
            // limit the loop to the number of functional
            // dependencies.
            if (d >= numOfFuncDep)
                break;
        }
        System.out.println("Following are the decomposed relations:");
        // this loop will print the relations.
        for (int k = 0; k < d; k++) {
            System.out.print(relationName + "" + (k + 1) + "(");
            // HashSet removes the redundant attributes from the
            // relation.
            HashSet hs = new HashSet();
            for (int q = 0; q < decomposition[k].size(); q++)
                hs.add(decomposition[k].get(q));
            Iterator it = hs.iterator();
            // while loop will print one attribute at a time.
            while (it.hasNext()) {
                System.out.print(it.next());
            }
            System.out.print(")\n");
        }
    }

    // inBCNF compares the attributes of one functional dependency with
    // the attributes of the relation. Note that this is a simplified
    // comparison, not a full superkey (closure) test, and that it
    // appends the RHS attributes to the caller's LHS list.
    public static boolean inBCNF(LinkedList<String> relation, LinkedList<String> list1, LinkedList<String> list2, int index) {
        // this loop will concatenate the attributes of LHS
        // and RHS.
        for (int i = 0; i < list2.size(); i++)
            list1.add(list2.get(i));
        // if the functional dependency covers fewer attributes than
        // the relation, it is treated as a BCNF violation.
        if (list1.size() < relation.size())
            return false;
        else {
            // sorting attributes so that they can be compared
            // position by position.
            Collections.sort(list1);
            Collections.sort(relation);
            // if the attributes of the functional dependency and the
            // relation match, the relation is treated as being in
            // BCNF; otherwise false is returned.
            for (int j = 0; j < list1.size() && j < relation.size(); j++) {
                // compare string contents, not object references.
                if (list1.get(j).equals(relation.get(j)))
                    continue;
                else
                    return false;
            }
        }
        return true;
    }
}

Step 5 of 6
The following output gets displayed by the above program:
E:\Tom\java, c & c++ code>javac DecompositionIntoBCNF.java
Note: DecompositionIntoBCNF.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
E:\Akram\java, c & c++ code>java DecompositionIntoBCNF
Note: Everything is case sensitive, please enter values in the same case everywhere.
Enter the name of Relation:
MyRelation
How many attributes are there in the Relation?
5
Type name of one attribute in each line:
A
B
C
D
E
How many functional dependencies are there in the relation MyRelation
3
Number of attributes in LHS of functional dependency[0]
2
Enter the attribute names of LHS[0]
A
B
Number of attributes in RHS of functional dependency[0]
Step 6 of 6
1
Enter the attribute names of RHS[0]
C
Number of attributes in LHS of functional dependency[1]
2
Enter the attribute names of LHS[1]
C
D
Number of attributes in RHS of functional dependency[1]
1
Enter the attribute names of RHS[1]
E
Number of attributes in LHS of functional dependency[2]
2
Enter the attribute names of LHS[2]
D
E
Number of attributes in RHS of functional dependency[2]
1
Enter the attribute names of RHS[2]
B
Following are the decomposed relations:
MyRelation1(ABC)
MyRelation2(CDE)
MyRelation3(BDE)
E:\Tom\java, c & c++ code>

Chapter 15, Problem 21E
Problem
Consider the relation REFRIG(Model#, Year, Price, Manuf_plant, Color), which is abbreviated as REFRIG(M, Y, P, MP, C), and the following set F of functional dependencies: F = {M → MP, {M, Y} → P, MP → C}
a. Evaluate each of the following as a candidate key for REFRIG, giving reasons why it can or cannot be a key: {M}, {M, Y}, {M, C}.
b. Based on the above key determination, state whether the relation REFRIG is in 3NF and in BCNF, and provide proper reasons.
c. Consider the decomposition of REFRIG into D = {R1(M, Y, P), R2(M, MP, C)}. Is this decomposition lossless? Show why. (You may consult the test under Property NJB in Section 14.5.1.)
Step-by-step solution
Step 1 of 3
Consider the relation schema REFRIG and the functional dependencies F provided in the question.
a. Consider the key {M}.
{M} cannot be a candidate key as it cannot determine the attributes P and Y.
Consider the key {M, Y}.
It is provided that {M, Y} → P.
Since {M, Y} is a superset of M, by IR1, {M, Y} → M.
Since {M, Y} → M and M → MP, by IR3, {M, Y} → MP.
Since {M, Y} → MP and MP → C, by IR3, {M, Y} → C.
Therefore, {M, Y} is a candidate key, as it determines the attributes P, MP and C and neither M nor Y alone does so.
Consider the key {M, C}.
{M, C} cannot be a candidate key as it cannot determine the attributes P and Y.
Step 2 of 3
b. REFRIG is not in 2NF, as there is a functional dependency M → MP in which the nonprime attribute MP is partially dependent on the key {M, Y}. Hence REFRIG is not in 3NF.
Since M is not a superkey in M → MP, REFRIG is not in BCNF either.
Step 3 of 3
c. Consider the decomposition of REFRIG as follows:
R1(M, Y, P)
R2(M, MP, C)
Applying the test for binary decomposition (Property NJB):
It is provided that M → MP.
Since M → MP and MP → C, by IR3, M → C.
Hence, M → {MP, C}.
In the above decomposition, R1 ∩ R2 is {M} and R2 − R1 is {MP, C}.
Since (R1 ∩ R2) → (R2 − R1) holds, the NJB test is satisfied and hence the decomposition is lossless.

Chapter 15, Problem 22E
Problem
Specify all the inclusion dependencies for the relational schema in Figure 5.5.
Step-by-step solution
Step 1 of 1
Inclusion dependencies
Inclusion dependencies are defined to formalize two types of interrelational constraints:
- referential integrity constraints
- class/subclass relationships
Definition of inclusion dependency: an inclusion dependency R.X < S.Y between two sets of attributes, X of relation schema R and Y of relation schema S, specifies the constraint that, at any specific time when r is a relation state of R and s is a relation state of S, we must have π_X(r(R)) ⊆ π_Y(s(S)).
From Figure 5.5 in the textbook, we can specify the following inclusion dependencies on the relational schema:
DEPENDENT.Essn < EMPLOYEE.Ssn
WORKS_ON.Pnumber < PROJECT.Pnumber
DEPT_LOCATIONS.Dnumber < DEPARTMENT.Dnumber
All the preceding inclusion dependencies represent referential integrity constraints. We can also use inclusion dependencies to represent class/subclass relationships.

Chapter 15, Problem 23E
Problem
Prove that a functional dependency satisfies the formal definition of multivalued dependency.
Step-by-step solution
Step 1 of 2
We show that a functional dependency satisfies the formal definition of a multivalued dependency.
Functional dependencies
A functional dependency X → Y holds on a relation schema R, where X and Y are sets of attributes of R, if for any two tuples t1 and t2 in a relation state of R with t1[X] = t2[X], we also have t1[Y] = t2[Y]. The proof is based on this formal definition of functional dependencies.
Multivalued dependencies: a multivalued dependency X ↠ Y holds on R, where X and Y are sets of attributes of R and Z = R − (X ∪ Y), if whenever two tuples t1 and t2 exist in a relation state with t1[X] = t2[X], two tuples t3 and t4 also exist in that state with
t3[X] = t4[X] = t1[X] = t2[X],
t3[Y] = t1[Y] and t4[Y] = t2[Y],
t3[Z] = t2[Z] and t4[Z] = t1[Z].
Step 2 of 2
As with functional dependencies (FDs), inference rules for multivalued dependencies (MVDs) have been developed. That a functional dependency is also a multivalued dependency is known as the replication rule, i.e. if X → Y then X ↠ Y holds.
To prove it, assume that X → Y holds on R and let t1 and t2 be any two tuples with t1[X] = t2[X]. By the functional dependency, t1[Y] = t2[Y]. Choose t3 = t2 and t4 = t1. Then t3[X] = t4[X] = t1[X]; t3[Y] = t2[Y] = t1[Y] and t4[Y] = t1[Y] = t2[Y]; and t3[Z] = t2[Z] and t4[Z] = t1[Z]. The required tuples t3 and t4 therefore always exist.
So every functional dependency is also a multivalued dependency, because it satisfies the formal definition of a multivalued dependency.

Chapter 15, Problem 24E
Problem
Consider the example of normalizing the LOTS relation in Sections 14.4 and 14.5. Determine whether the decomposition of LOTS into {LOTS1AX, LOTS1AY, LOTS1B, LOTS2} has the lossless join property by applying Algorithm 15.3 and also by using the test under property NJB from Section 14.5.1.
Step-by-step solution
Step 1 of 8
Consider the example given in the textbook.
Step 2 of 8
Step 3 of 8
Take the relation
LOTS(Property_id, County_name, Lot#, Area, Price, Tax_rate)
Suppose we decompose the above relation into two relations, which this solution calls LOTS1AX and LOTS1AY, as follows (from the first steps of the algorithm):
LOTS1AX(Property_id, County_name, Lot#, Area, Price)
LOTS1AY(County_name, Tax_rate)
There are several issues with this decomposition, but we focus on one aspect at the moment.
Let an instance of the relation LOTS be
Step 4 of 8
Step 5 of 8
Now let the decomposed relations LOTS1AX and LOTS1AY be
Step 6 of 8
and
Step 7 of 8
All the information that was in the relation LOTS appears to be still available in LOTS1AX and LOTS1AY, but this has to be verified. LOTS1AX is constructed by removing from LOTS the attribute Tax_rate, which violates 2NF, and placing it together with County_name into another relation, LOTS1AY.
Step 8 of 8
To retrieve the original tuples, we need to join LOTS1AX and LOTS1AY. A decomposition D = {R1, R2} of a relation R is called a lossless join decomposition with respect to a set F of functional dependencies on R if this join produces no spurious tuples.
Property NJB states that a decomposition D = {R1, R2} of R has the nonadditive join property with respect to F if and only if either the functional dependency (R1 ∩ R2) → (R1 − R2) is in F+, or the functional dependency (R1 ∩ R2) → (R2 − R1) is in F+.
For the above relations, let R1 = LOTS1AX and R2 = LOTS1AY. Then R1 ∩ R2 = {County_name} and R2 − R1 = {Tax_rate}. Applying Property NJB, the functional dependency County_name → Tax_rate is in F and therefore also in F+, so this decomposition has the lossless join property.

Chapter 15, Problem 25E
Problem
Show how the MVDs Ename ↠ Pname and Ename ↠ Dname in Figure 14.15(a) may arise during normalization into 1NF of a relation, where the attributes Pname and Dname are multivalued.
Step-by-step solution
Step 1 of 2
The given multivalued dependencies are Ename ↠ Pname and Ename ↠ Dname.
According to the figure given in the textbook, the relation EMP(Ename, Pname, Dname) is in first normal form.
We need to show how these MVDs arise when Pname and Dname are multivalued attributes. Before normalization, one tuple for the employee Smith would hold the set of projects {X, Y} and the set of dependents {John, Anna}. Flattening both multivalued attributes into 1NF gives the relation state shown in the textbook figure:
EMP
Ename   Pname   Dname
Smith   X       John
Smith   Y       Anna
Smith   X       Anna
Smith   Y       John
Step 2 of 2
The relation EMP above shows that an employee whose name is Ename works on the project whose name is Pname and has a dependent whose name is Dname. An employee may work on several projects and may have several dependents. The employee's projects and dependents are independent of one another.
To keep this relation state consistent.
We must have a separate tuple to represent every combination of an employee's project and an employee's dependent. This requirement is exactly what the MVDs Ename ↠ Pname and Ename ↠ Dname specify on EMP. Based on this, the EMP relation is decomposed into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS:
EMP_PROJECTS
Ename   Pname
Smith   X
Smith   Y
EMP_DEPENDENTS
Ename   Dname
Smith   John
Smith   Anna

Chapter 15, Problem 26E
Problem
Apply Algorithm 15.2(a) to the relation in the Exercise below to determine a key for R. Create a minimal set of dependencies G that is equivalent to F, and apply the synthesis algorithm (Algorithm 15.4) to decompose R into 3NF relations.
Exercise
Consider the universal relation R = {A, B, C, D, E, F, G, H, I, J} and the set of functional dependencies F = {{A, B}→{C}, {A}→{D, E}, {B}→{F}, {F}→{G, H}, {D}→{I, J}}. What is the key for R? Decompose R into 2NF and then 3NF relations.
Step-by-step solution
Step 1 of 4
Refer to Exercise 14.24 for the set of functional dependencies F and relation R. The functional dependencies in F are as follows:
{A, B} → {C}
{A} → {D, E}
{B} → {F}
{F} → {G, H}
{D} → {I, J}
• The combination of all the attributes is always a superkey for the relation, so ABCDEFGHIJ is the starting point for finding a key of R.
• Remove unnecessary attributes from the key as follows:
• C can be determined by {A, B} → {C}, so remove it from the key.
• Attributes D and E can be removed because they are determined by {A} → {D, E}.
• Attribute F can be removed because it is determined by {B} → {F}.
• Attributes G and H can be removed because they are determined by {F} → {G, H}.
• Attributes I and J can be removed because they are determined by {D} → {I, J}.
Therefore, the attribute set AB is a candidate key for relation R.
Step 2 of 4
Minimal set of dependencies (minimal cover)
If the functional dependencies of a relation are not in canonical form, first convert them into canonical form using the decomposition rule of inference.
Refer to Exercise 14.24 for the set of functional dependencies F and convert them into canonical form as follows:
{A, B} → C
A → D
A → E
B → F
F → G
F → H
D → I
D → J
If there exist any extraneous functional dependency, remove it.
Determine the minimal set of dependencies G, using the tests as follows:
• Test for minimal set of LHS (only test functional dependencies with ≥2 attributes on the left-hand side)
1. Testing for A in {A, B} → C: test the functional dependency B → C. Since B+ = {B, F, G, H} does not contain C, A is necessary.
2. Testing for B in {A, B} → C: test the functional dependency A → C. Since A+ = {A, D, E, I, J} does not contain C, B is necessary.
• Test for minimal set of RHS (compute the closure of the left-hand side without the dependency being tested)
1. Testing for {A, B} → C: since {A, B}+ = {A, B, D, E, F, G, H, I, J} does not contain C, it is necessary.
2. Testing for A → D: since A+ = {A, E} does not contain D, it is necessary.
3.
Step 3 of 4
Testing for A → E: since A+ = {A, D, I, J} does not contain E, it is necessary.
4. Testing for B → F: since B+ = {B} does not contain F, it is necessary.
5. Testing for F → G: since F+ = {F, H} does not contain G, it is necessary.
6. Testing for F → H: since F+ = {F, G} does not contain H, it is necessary.
7. Testing for D → I: since D+ = {D, J} does not contain I, it is necessary.
8. Testing for D → J: since D+ = {D, I} does not contain J, it is necessary.
Therefore all eight canonical functional dependencies are necessary.
After applying the composition rule of inference, the minimal set of dependencies G, which is equivalent to F, is:
{A, B} → {C}
{A} → {D, E}
{B} → {F}
{F} → {G, H}
{D} → {I, J}
Step 4 of 4
The following steps are used to decompose R into 3NF relations with the synthesis algorithm:
Step 1: Calculate minimal cover
The set of functional dependencies above is a minimal cover of F.
Step 2: Creating relation for each functional dependency
There are five functional dependencies in the minimal cover. Create five relations with the corresponding attributes as follows:
R1(A, B, C)
R2(A, D, E)
R3(B, F)
R4(F, G, H)
R5(D, I, J)
Step 3: Creating relation for key attributes
• AB is the candidate key of relation R. Since attributes A and B already exist together in relation R1, there is no need to create another relation for the key attributes.
• If another relation were created containing the candidate key AB, it would be redundant, and step 4 of the algorithm would remove it.
Therefore, the final 3NF relations obtained after decomposing R are R1(A, B, C), R2(A, D, E), R3(B, F), R4(F, G, H) and R5(D, I, J).

Chapter 15, Problem 27E
Problem
Repeat Exercise 1 for the functional dependencies in Exercise 2.
Exercise 1
Apply Algorithm 15.2(a) to the relation in Exercise to determine a key for R.
Create a minimal set of dependencies G that is equivalent to F, and apply the synthesis algorithm (Algorithm 15.4) to decompose R into 3NF relations.
Exercise
Consider the universal relation R = {A, B, C, D, E, F, G, H, I, J} and the set of functional dependencies F = {{A, B}→{C}, {A}→{D, E}, {B}→{F}, {F}→{G, H}, {D}→{I, J}}. What is the key for R? Decompose R into 2NF and then 3NF relations.
Exercise 2
Repeat Exercise for the following different set of functional dependencies G = {{A, B}→{C}, {B, D}→{E, F}, {A, D}→{G, H}, {A}→{I}, {H}→{J}}.
Step-by-step solution
Step 1 of 5
Refer to Exercise 14.25 for the set of functional dependencies G and relation R. The functional dependencies are as follows:
{A, B} → {C}
{B, D} → {E, F}
{A, D} → {G, H}
{A} → {I}
{H} → {J}
• The combination of all attributes is always a superkey for the relation, so ABCDEFGHIJ is the starting point for finding a key of R. Remove unnecessary attributes from the key.
• C can be determined by {A, B} → {C}, so remove it from the key.
• Attributes E and F are determined by {B, D}, so both are removed from the key.
• Attributes G and H are determined by {A, D}, so both are removed from the key.
• Attribute I is determined by A, so it is removed from the key.
• Attribute J is determined by H, so it is removed from the key.
Therefore, the attribute set ABD is a candidate key for the relation R.
Step 2 of 5
Minimal set of dependencies (minimal cover)
If the functional dependencies of a relation are not in canonical form, first convert them into canonical form using the decomposition rule of inference.
Refer to Exercise 14.25 for the set of functional dependencies and convert them into canonical form as follows:
{A, B} → C
{B, D} → E
{B, D} → F
{A, D} → G
{A, D} → H
A → I
H → J
If there exist any extraneous functional dependency, remove it.
Determine the minimal set of dependencies, using the tests as follows:
• Test for minimal set of LHS (only test functional dependencies with ≥2 attributes on the left-hand side)
1. Testing for A in {A, B} → C: since B+ = {B} does not contain C, A is necessary.
2. Testing for B in {A, B} → C: since A+ = {A, I} does not contain C, B is necessary.
3. Testing for B in {B, D} → {E, F}: since D+ = {D} contains neither E nor F, B is necessary.
4. Testing for D in {B, D} → {E, F}: since B+ = {B} contains neither E nor F, D is necessary.
5. Testing for A in {A, D} → {G, H}: since D+ = {D} contains neither G nor H, A is necessary.
6. Testing for D in {A, D} → {G, H}: since A+ = {A, I} contains neither G nor H, D is necessary.
•
Step 3 of 5
Test for minimal set of RHS (compute the closure of the left-hand side without the dependency being tested)
1. Testing for {A, B} → C: since {A, B}+ = {A, B, I} does not contain C, it is necessary.
2. Testing for {B, D} → E: since {B, D}+ = {B, D, F} does not contain E, it is necessary.
3. Testing for {B, D} → F: since {B, D}+ = {B, D, E} does not contain F, it is necessary.
4. Testing for {A, D} → G: since {A, D}+ = {A, D, H, I, J} does not contain G, it is necessary.
5. Testing for {A, D} → H: since {A, D}+ = {A, D, G, I} does not contain H, it is necessary.
6. Testing for A → I: since A+ = {A} does not contain I, it is necessary.
7. Testing for H → J: since H+ = {H} does not contain J, it is necessary.
Therefore all seven canonical functional dependencies are necessary.
After applying the composition rule of inference to the above canonical functional dependencies, the minimal set of functional dependencies, which is equivalent to the given set, is:
{A, B} → {C}
{B, D} → {E, F}
{A, D} → {G, H}
{A} → {I}
{H} → {J}
Step 4 of 5
The following steps are used to decompose the relation R into 3NF relations with the synthesis algorithm. Refer to Exercise 14.25 for the functional dependencies.
Step 1: Calculate minimal cover
The minimal cover of the given functional dependencies is the set of five functional dependencies listed above.
Step 2: Creating relation for each functional dependency
There are five functional dependencies in the minimal cover. Create five relations with the corresponding attributes:
R1(A, B, C)
R2(B, D, E, F)
R3(A, D, G, H)
R4(A, I)
R5(H, J)
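The key determination used in Step 1 (start from all attributes and drop each attribute whose removal still determines all of R) can be sketched in Java as follows. This is only an illustration under stated assumptions: the class `KeyFinder`, its methods, and the comma-separated encoding of dependencies are invented for the example, and the first dependency is taken to be {A, B} → {C}, as the solution's Step 1 implies.

```java
import java.util.*;

// Hypothetical sketch of key finding by attribute removal.
class KeyFinder {

    // Closure of attrs under fds; each fd is {lhs, rhs} with comma-separated attributes.
    static Set<String> closure(Set<String> attrs, String[][] fds) {
        Set<String> result = new TreeSet<>(attrs);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] fd : fds) {
                if (result.containsAll(Arrays.asList(fd[0].split(",")))
                        && result.addAll(Arrays.asList(fd[1].split(","))))
                    changed = true;
            }
        }
        return result;
    }

    // Start with K = R; remove an attribute whenever K without it is still a superkey.
    static Set<String> findKey(String attrCsv, String[][] fds) {
        List<String> all = Arrays.asList(attrCsv.split(","));
        Set<String> key = new TreeSet<>(all);
        for (String a : all) {
            Set<String> reduced = new TreeSet<>(key);
            reduced.remove(a);
            if (closure(reduced, fds).containsAll(all)) key = reduced;
        }
        return key;
    }

    public static void main(String[] args) {
        String[][] g = { {"A,B", "C"}, {"B,D", "E,F"}, {"A,D", "G,H"}, {"A", "I"}, {"H", "J"} };
        System.out.println(findKey("A,B,C,D,E,F,G,H,I,J", g)); // prints [A, B, D]
    }
}
```

The procedure returns a single key, and which one it returns can depend on the order in which attributes are tried; it does not enumerate all candidate keys.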
Step 5 of 5
Step 3: Creating relation for key attributes
ABD is the candidate key of relation R, and none of the five relations contains all of A, B and D. Create a new relation containing the attributes A, B and D. Therefore, all six relations with their corresponding attributes are as follows:
R1(A, B, C)
R2(B, D, E, F)
R3(A, D, G, H)
R4(A, I)
R5(H, J)
R6(A, B, D)
Step 4: Eliminating redundant relations
Remove all relations that are redundant. A relation R is redundant if R is a projection of another relation S in the same schema. Since there is no redundant relation in the schema, there is no need to remove any relation.
Therefore, the final 3NF relations obtained after decomposing R are R1(A, B, C), R2(B, D, E, F), R3(A, D, G, H), R4(A, I), R5(H, J) and R6(A, B, D).

Chapter 15, Problem 29E
Problem
Apply Algorithm 15.2(a) to the relations in Exercises 1 and 2 to determine a key for R. Apply the synthesis algorithm (Algorithm 15.4) to decompose R into 3NF relations and the decomposition algorithm (Algorithm 15.5) to decompose R into BCNF relations.
Exercise 1
Consider a relation R(A, B, C, D, E) with the following dependencies: AB → C, CD → E, DE → B
Is AB a candidate key of this relation? If not, is ABD? Explain your answer.
Exercise 2
Consider the relation R, which has attributes that hold schedules of courses and sections at a university; R = {Course_no, Sec_no, Offering_dept, Credit_hours, Course_level, Instructor_ssn, Semester, Year, Days_hours, Room_no, No_of_students}. Suppose that the following functional dependencies hold on R:
{Course_no} → {Offering_dept, Credit_hours, Course_level}
{Course_no, Sec_no, Semester, Year} → {Days_hours, Room_no, No_of_students, Instructor_ssn}
{Room_no, Days_hours, Semester, Year} → {Instructor_ssn, Course_no, Sec_no}
Try to determine which sets of attributes form keys of R. How would you normalize this relation?
Step-by-step solution
Step 1 of 6
Refer to Exercise 14.27 for the set of functional dependencies and relation R. The functional dependencies are as follows:
AB → C
CD → E
DE → B
Canonical functional dependency
A functional dependency having only one attribute on its right-hand side.
• The combination of all attributes is always a superkey of the relation, so ABCDE is a superkey of R. Since all the functional dependencies are already in canonical form, there is no need to convert them.
• Remove unnecessary attributes from the key as follows:
• Since C can be determined by AB → C, remove it from the key. The attribute set ABDE is still a superkey.
• Since E can then be determined from the remaining attributes (AB → C, followed by CD → E), remove it from the key. The attribute set ABD is still a superkey, and no further attribute can be removed.
Therefore, ABD is a candidate key for the relation R.
Step 2 of 6
Refer to Exercise 14.27 for the set of functional dependencies and the relation R. The following steps are used to decompose R into 3NF relations using the synthesis algorithm:
Step 1: Finding the minimal cover
AB → C, CD → E, DE → B
The above set of functional dependencies is already a minimal cover of R.
Step 2: Creating a relation for each functional dependency
There are three functional dependencies, and the corresponding relations are as follows: R1(A, B, C), R2(C, D, E), R3(B, D, E).
Step 3: Creating a relation for the key attributes
• ABD is a candidate key of relation R. None of the relations created in Step 2 contains all of the attributes A, B and D, so one more relation, R4(A, B, D), is created for the key attributes.
• Step 4 of the algorithm removes any relation that is a projection of another relation. None of the four relations is redundant, so nothing is removed.
Therefore, the final 3NF relations obtained after decomposing R are as follows: R1(A, B, C), R2(C, D, E), R3(B, D, E) and R4(A, B, D).
Step 3 of 6
Refer to Exercise 14.27 for the set of functional dependencies and the relation R. The following steps are used to decompose R into BCNF relations using the decomposition algorithm:
Step 1: Initialize the decomposition algorithm. S = (A, B, C, D, E)
Step 2: Check whether any functional dependency violates BCNF. If so, decompose the relation.
Decompose R into three relations having the following attributes: Therefore, the final BCNF relations obtained after decomposing R are as follows: and .
Step 4 of 6
Refer to Exercise 14.28 for the set of functional dependencies and the relation R. Since the functional dependencies are not in canonical form, convert them into canonical functional dependencies as follows:
FD1: {Course_no} → {Offering_dept}
FD2: {Course_no} → {Credit_hours}
FD3: {Course_no} → {Course_level}
FD4: {Course_no, Sec_no, Semester, Year} → {Days_hours}
FD5: {Course_no, Sec_no, Semester, Year} → {Room_no}
FD6: {Course_no, Sec_no, Semester, Year} → {No_of_students}
FD7: {Course_no, Sec_no, Semester, Year} → {Instructor_ssn}
FD8: {Room_no, Days_hours, Semester, Year} → {Instructor_ssn}
FD9: {Room_no, Days_hours, Semester, Year} → {Course_no}
FD10: {Room_no, Days_hours, Semester, Year} → {Sec_no}
The entire attribute set of the relation R is a superkey. Since Days_hours, Room_no, No_of_students and Instructor_ssn can be determined by FD4, FD5, FD6 and FD7 respectively, remove them from the key. The remaining attributes are as follows: {Course_no, Sec_no, Offering_dept, Credit_hours, Course_level, Semester, Year}
Since Offering_dept, Credit_hours and Course_level can be determined by FD1, FD2 and FD3 respectively, remove them from the key. The remaining attributes are as follows: {Course_no, Sec_no, Semester, Year}
Therefore, {Course_no, Sec_no, Semester, Year} is a candidate key for the relation R.
Step 5 of 6
Refer to Exercise 14.28 for the set of functional dependencies and the relation R. The following steps are used to decompose R into 3NF relations using the synthesis algorithm:
Step 1: Finding the minimal cover
Since the functional dependencies are not in canonical form, convert them into canonical functional dependencies as follows: Since Instructor_ssn, Course_no and Sec_no have already been determined, they are treated as extraneous. The minimal cover for the relation R is as follows: The composed form of the above functional dependencies is as follows:
Step 2: Creating a relation for each functional dependency
There are two functional dependencies, and their corresponding relations are as follows:
Step 3: Creating a relation for the key attributes
• The relation R has the candidate key (Course_no, Sec_no, Semester, Year). Since the relation created for the second functional dependency already contains all of the attributes (Course_no, Sec_no, Semester, Year), there is no need to create another relation for the key attributes.
• If another relation containing the candidate key (Course_no, Sec_no, Semester, Year) were created, it would be redundant, and step 4 would remove it.
Therefore, the final 3NF relations obtained after decomposing R are as follows: and
Step 6 of 6
Refer to Exercise 14.28 for the set of functional dependencies and the relation R. The following steps are used to decompose R into BCNF relations using the decomposition algorithm:
Step 1: Initialize the decomposition algorithm.
Step 2: Check whether any functional dependency violates BCNF. If so, decompose the relation. Since violates BCNF, relation R is decomposed into two relations . Therefore, the final BCNF relations obtained after decomposing R are as follows: and
Chapter 15, Problem 31E
Problem
Consider the following decompositions for the relation schema R of Exercise. Determine whether each decomposition has (1) the dependency preservation property, and (2) the lossless join property, with respect to F. Also determine which normal form each relation in the decomposition is in.
a. D1 = {R1, R2, R3, R4, R5}; R1 = {A, B, C}, R2 = {A, D, E}, R3 = {B, F}, R4 = {F, G, H}, R5 = {D, I, J}
b. D2 = {R1, R2, R3}; R1 = {A, B, C, D, E}, R2 = {B, F, G, H}, R3 = {D, I, J}
c. D3 = {R1, R2, R3, R4, R5}; R1 = {A, B, C, D}, R2 = {D, E}, R3 = {B, F}, R4 = {F, G, H}, R5 = {D, I, J}
Exercise
Consider the universal relation R = {A, B, C, D, E, F, G, H, I, J} and the set of functional dependencies F = {{A, B}→{C}, {A}→{D, E}, {B}→{F}, {F}→{G, H}, {D}→{I, J}}. What is the key for F? Decompose R into 2NF and then 3NF relations.
Step-by-step solution
Step 1 of 10
Consider the relation R and the functional dependencies as follows: R = {A, B, C, D, E, F, G, H, I, J} and F = {{A, B}→{C}, {A}→{D, E}, {B}→{F}, {F}→{G, H}, {D}→{I, J}}.
Step 2 of 10
a. The decomposition for the relation schema R is: D1 = {R1, R2, R3, R4, R5}
Relation R1 satisfies the functional dependency {A, B}→{C}.
Relation R2 satisfies the functional dependency {A}→{D, E}.
Relation R3 satisfies the functional dependency {B}→{F}.
Relation R4 satisfies the functional dependency {F}→{G, H}.
Relation R5 satisfies the functional dependency {D}→{I, J}.
Hence, the decomposition D1 satisfies the dependency preservation property.
Step 3 of 10
To find out whether D1 satisfies the nonadditive join property, apply Algorithm 15.3 (testing for the nonadditive join property) given in the textbook. The first row consists of “a” symbols in all the cells. Hence, the decomposition D1 satisfies the nonadditive join property.
Step 4 of 10
In relation R1, {A, B} is the primary key and also a super key. It satisfies Boyce-Codd normal form.
In relation R2, {A} is the primary key and also a super key. It satisfies Boyce-Codd normal form.
In relation R3, {B} is the primary key and also a super key. It satisfies Boyce-Codd normal form.
In relation R4, {F} is the primary key and also a super key. It satisfies Boyce-Codd normal form.
In relation R5, {D} is the primary key and also a super key. It satisfies Boyce-Codd normal form.
All the relations of decomposition D1 are in Boyce-Codd normal form.
Step 5 of 10
b. The decomposition for the relation schema R is: D2 = {R1, R2, R3}
Relation R1 satisfies the functional dependencies {A, B}→{C} and {A}→{D, E}.
Relation R2 satisfies the functional dependencies {B}→{F} and {F}→{G, H}.
Relation R3 satisfies the functional dependency {D}→{I, J}.
Hence, the decomposition D2 satisfies the dependency preservation property.
Step 6 of 10
To find out whether D2 satisfies the nonadditive join property, apply Algorithm 15.3 (testing for the nonadditive join property) given in the textbook. The first row consists of “a” symbols in all the cells. Hence, the decomposition D2 satisfies the nonadditive join property.
Step 7 of 10
In relation R1, {A, B} is the primary key. The relation R1 is only in first normal form because there is a partial dependency: the attribute A is part of the key and it determines the attributes D and E.
In relation R2, {B} is the primary key. The relation R2 is only in second normal form because there is a transitive dependency.
In R2, the attribute F is a non-key attribute that functionally determines the attributes G and H.
In relation R3, {D} is the primary key and also a super key. It satisfies Boyce-Codd normal form.
Step 8 of 10
c. The decomposition for the relation schema R is: D3 = {R1, R2, R3, R4, R5}
Relation R1 satisfies the functional dependency {A, B}→{C}.
Relation R3 satisfies the functional dependency {B}→{F}.
Relation R4 satisfies the functional dependency {F}→{G, H}.
Relation R5 satisfies the functional dependency {D}→{I, J}.
The functional dependency {A}→{D, E} is not satisfied, because A and E do not appear together in any relation of D3.
Hence, the decomposition D3 does not satisfy the dependency preservation property.
Step 9 of 10
To find out whether D3 satisfies the nonadditive join property, apply Algorithm 15.3 (testing for the nonadditive join property) given in the textbook. There is no row in the matrix that consists of “a” symbols in all the cells. Hence, the decomposition D3 does not satisfy the nonadditive join property.
Step 10 of 10
In relation R1, {A, B} is the primary key. Besides {A, B}→{C}, the dependency {A}→{D}, which follows from {A}→{D, E}, also holds in R1. It is a partial dependency, so R1 is only in first normal form.
No nontrivial functional dependency holds in relation R2 = {D, E}, so {D, E} is its key and R2 is in Boyce-Codd normal form.
In relation R3, {B} is the primary key and also a super key. It satisfies Boyce-Codd normal form.
In relation R4, {F} is the primary key and also a super key. It satisfies Boyce-Codd normal form.
In relation R5, {D} is the primary key and also a super key. It satisfies Boyce-Codd normal form.
Chapter 16, Problem 1RQ
Problem
What is the difference between primary and secondary storage?
Step-by-step solution
Step 1 of 1
Following are the differences between primary and secondary storage:
Primary storage
• The CPU can directly access the primary storage devices.
• Fast access to data is provided by the primary storage devices.
• The storage capacity is limited.
• The cost of primary storage devices is higher than that of secondary storage devices.
• Examples of primary storage are main memory and cache memory.
Secondary storage
• The CPU cannot directly access the secondary storage devices.
• Slower access to data is provided by the secondary storage devices.
• The storage capacity is larger.
• The cost of secondary storage devices is lower than that of primary storage devices.
• Examples of secondary storage are hard disk drives, magnetic disks, magnetic tapes, optical disks and flash memory.
Chapter 16, Problem 2RQ
Problem
Why are disks, not tapes, used to store online database files?
Step-by-step solution
Step 1 of 1
Disks are used to store online database files. A disk is a secondary storage device that is random-access addressable: data is stored and retrieved in units called disk blocks, and any block can be reached directly through its address. Tapes, by contrast, are sequential-access devices: to reach a block, all the blocks before it must be scanned first, which is too slow for online access.
Chapter 16, Problem 3RQ
Problem
Define the following terms: disk, disk pack, track, block, cylinder, sector, interblock gap, and read/write head.
Step-by-step solution
Step 1 of 3
Disk: The disk is a secondary storage device that is used to store large amounts of data. The disk stores the data in digital form, i.e., as 0’s and 1’s. The most basic unit of data that can be stored on the disk is the bit.
Disk pack: A disk pack contains several hard disks stacked in layers to increase the storage capacity, i.e., it includes many disks.
Step 2 of 3
Track: On the disk, information is stored on the surface in concentric circles of various diameters. Each circle on the surface is called a track.
Block: Each track of the disk is divided into equal-sized slices. One or more such slices are grouped together to form a disk block. A block may consist of a single slice (sector). The size of the block is fixed at the time of disk formatting.
Step 3 of 3
Cylinder: In a disk pack, the tracks with the same diameter on the various surfaces form a cylinder.
Sector: Each track of the disk is divided into small slices. Each slice is called a sector.
Interblock gap: The interblock gap separates the disk blocks.
Data cannot be stored in the interblock gap.
Read/write head: The read/write head is used to read or write a block.
Chapter 16, Problem 4RQ
Problem
Discuss the process of disk initialization.
Step-by-step solution
Step 1 of 1
Process of disk initialization: In the disk formatting (initialization) process, tracks are divided into blocks of equal size. The block size is set by the operating system. Initialization is the process of defining the tracks and sectors so that data and programs can be stored and retrieved. The block size is fixed during initialization and cannot be changed dynamically.
Chapter 16, Problem 5RQ
Problem
Discuss the mechanism used to read data from or write data to the disk.
Step-by-step solution
Step 1 of 1
The disk drive begins to rotate the disk whenever a read or write request is initiated. Once the read/write head is positioned on the right track and the block specified in the block address moves under the read/write head, the electronic component of the read/write head is activated to transfer the data.
The following procedure is followed when data is read from or written to the disk:
(1) The head seeks to the correct track.
(2) The correct head is turned on.
(3) The correct sector is located.
(4) The data is read from the hard disk and transferred to a buffer in RAM.
Chapter 16, Problem 6RQ
Problem
What are the components of a disk block address?
Step-by-step solution
Step 1 of 1
Disk block address: Data is stored and retrieved in units called disk blocks or pages.
Address of a block: It consists of a combination of the cylinder number, the track number (the surface number within the cylinder on which the track is located), and the block number (within the track). This address is supplied to the disk.
Chapter 16, Problem 7RQ
Problem
Why is accessing a disk block expensive? Discuss the time components involved in accessing a disk block.
Step-by-step solution
Step 1 of 4
Arranging data in order and storing it in the blocks of a disk is known as blocking. Data is transferred between the disk and main memory in units of blocks. Accessing data in main memory is less expensive than accessing data on the disk. This is due to the following time components:
• Seek time.
• Rotational latency.
• Block transfer time.
Step 2 of 4
Accessing data on the disk is more expensive because of these time components, which are explained as follows:
• Seek time:
o The disk contains a set of tracks on each surface. Each surface has one read/write head, known as the disk head. A track is made up of sectors of fixed size.
o A sector is a small subdivision of a track on the disk.
o Each sector typically stores up to 512 bytes of data that the user can access.
o To read the data on the disk, the heads are mounted on an arm, which is used to read a record from the disk.
o Seek time is the total time taken to position the arm so that the disk head is over the correct track.
o Seek time makes up a large part of the time needed to access a disk block. Therefore, it is one of the major reasons why accessing a disk block is expensive.
Step 3 of 4
• Rotational latency:
o Latency means time delay.
o Once the head is on the correct track, some time passes until the disk rotates the sector that holds the requested data under the head. This time is known as rotational latency.
o It is a waiting time: the longer it is, the more expensive it is to access a disk block.
Step 4 of 4
• Block transfer time:
o Once the head is positioned over the block, some additional time is needed to transfer the data of the block between the disk and main memory.
o This time is known as block transfer time. The more data that has to be transferred when a block is accessed, the longer the transfer time.
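The three time components can be added up to estimate the cost of one random block access. The sketch below is only an illustration: the function name `average_access_time_ms` and all drive parameters in the example call are assumed values, not figures from the textbook. The average rotational latency is taken as half of one full rotation.

```python
def average_access_time_ms(seek_ms, rpm, block_bytes, transfer_bytes_per_ms):
    """Estimate the average time (in ms) to access one random disk block.

    access time = seek time + rotational latency + block transfer time
    """
    rotation_ms = 60_000 / rpm                 # time for one full rotation
    rotational_latency_ms = rotation_ms / 2    # on average, half a rotation
    block_transfer_ms = block_bytes / transfer_bytes_per_ms
    return seek_ms + rotational_latency_ms + block_transfer_ms


# Hypothetical drive: 8 ms average seek, 7200 rpm,
# 4096-byte blocks, 100,000 bytes transferred per millisecond.
print(average_access_time_ms(8, 7200, 4096, 100_000))
```

With parameters of this kind, the seek time and the rotational latency dominate, while the transfer of a single block contributes very little.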
Taken together, these delays are what make accessing a disk block expensive.
Chapter 16, Problem 8RQ
Problem
How does double buffering improve block access time?
Step-by-step solution
Step 1 of 1
Improving block access time using double buffering: Double buffering is used to read a continuous stream of blocks from disk into memory. It permits continuous reading or writing of data on consecutive disk blocks, which eliminates the seek time and rotational delay for all but the first block transfer. Moreover, data is kept ready for processing by the program, which reduces the waiting time. With double buffering, the time to process n blocks is n × p, where n is the number of blocks and p is the processing time per block.
Chapter 16, Problem 9RQ
Problem
What are the reasons for having variable-length records? What types of separator characters are needed for each?
Step-by-step solution
Step 1 of 1
Variable-length records (reasons)
A file is a sequence of records. If all the records in a file have exactly the same size, the file is made up of fixed-length records. If different records in the file have different sizes, the file is said to be made up of variable-length records. A file may have variable-length records for several reasons:
* The file records are of the same record type, but one or more of the fields are of varying size.
* The file records are of the same record type, but one or more of the fields are optional; that is, they have values in some records but not in all.
* The file contains records of different record types and hence of varying size. This occurs if related records of different types are placed together on disk blocks.
Types of separator characters needed for each: With variable-length fields, each record has a value for each field, but the exact length of some field values is not known. To determine which bytes within a record represent each field, special separator characters such as ? or % or $ can be used to mark the end of a field.
Types of separator characters: With optional fields, separator characters are needed to separate the field name from the field value and to separate one field from the next field.
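How separator characters delimit a record with optional fields can be sketched as follows. The three separator characters chosen here ("=", "|" and ";"), the function names, and the sample field names are assumptions made for illustration; the sketch also assumes that no separator character occurs inside a field name or value.

```python
FIELD_VALUE_SEP = "="   # assumed: separates a field name from its value
FIELD_SEP = "|"         # assumed: separates one field from the next
RECORD_END = ";"        # assumed: terminates the record


def encode_record(fields):
    """Store only the fields that have a value, as name=value pairs."""
    pairs = [
        f"{name}{FIELD_VALUE_SEP}{value}"
        for name, value in fields.items()
        if value is not None          # optional field without a value: omitted
    ]
    return FIELD_SEP.join(pairs) + RECORD_END


def decode_record(text):
    """Rebuild the field dictionary from the separator-delimited record."""
    if not text.endswith(RECORD_END):
        raise ValueError("record terminator missing")
    body = text[: -len(RECORD_END)]
    if not body:
        return {}
    return dict(pair.split(FIELD_VALUE_SEP, 1) for pair in body.split(FIELD_SEP))


record = encode_record({"Name": "Smith", "Ssn": "123456789", "Dept": None})
print(record)
print(decode_record(record))
```

Because each stored value carries its field name, records of the same type can have different lengths depending on which optional fields are present.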
Three types of separator characters are used for this: one separates the field name from the field value, one separates one field from the next, and one terminates the record.
Chapter 16, Problem 10RQ
Problem
Discuss the techniques for allocating file blocks on disk.
Step-by-step solution
Step 1 of 1
Techniques for allocating file blocks on disk: There are several techniques for allocating the blocks of a file on disk:
Contiguous allocation
Linked allocation
Clusters
Indexed allocation
Contiguous allocation: File blocks are allocated to consecutive disk blocks. This makes reading the whole file very fast using double buffering.
Linked allocation: Each file block contains a pointer to the next file block. This makes it easy to expand the file but slow to read the whole file.
Clusters: A combination of the two techniques allocates clusters of consecutive disk blocks. Clusters are sometimes called file segments or extents.
Indexed allocation: One or more index blocks contain pointers to the actual file blocks.
Chapter 16, Problem 11RQ
Problem
What is the difference between a file organization and an access method?
Step-by-step solution
Step 1 of 1
Difference between a file organization and an access method
File organization:
- It describes how the physical records of a file are arranged on the disk. A file organization refers to the organization of the data of a file into records, blocks, and access structures.
- It includes the way records and blocks are placed on the storage medium and interlinked.
Access method: It describes how the data can be retrieved based on the file organization. It provides a group of operations that can be applied to a file. Some access methods can be applied only to files organized in certain ways.
Chapter 16, Problem 12RQ
Problem
What is the difference between static and dynamic files?
Step-by-step solution
Step 1 of 1
Difference between static and dynamic files:
Static file: A file on which update operations are rarely performed.
Dynamic file: A file that changes frequently; update operations are constantly applied to it.
Chapter 16, Problem 13RQ
Problem
What are the typical record-at-a-time operations for accessing a file? Which of these depend on the current file record?
Step-by-step solution
Step 1 of 1
Typical record-at-a-time operations are:
1.) Reset: Sets the file pointer to the beginning of the file.
2.) Find (Locate): Searches for the first record that satisfies a search condition and transfers the block containing that record into a main memory buffer. The file pointer points to the record in the buffer, and it becomes the current record.
3.) Read (Get): Copies the current record from the buffer to a program variable in the user program. This command may also advance the current record pointer to the next record in the file, which may necessitate reading the next file block from disk.
4.) FindNext: Searches for the next record in the file that satisfies the search condition and transfers the block containing that record into a main memory buffer. The record is located in the buffer and becomes the current record.
5.) Delete: Deletes the current record and updates the file on disk to reflect the deletion.
6.) Modify: Modifies some field values of the current record and eventually updates the file on disk to reflect the modification.
7.) Insert: Inserts a new record in the file by locating the block where the record is to be inserted, transferring the block into a main memory buffer, writing the record into the buffer, and eventually writing the buffer to disk to reflect the insertion.
Operations that depend on the current record are:
1.) Read
2.) FindNext
3.) Delete
4.) Modify
Chapter 16, Problem 14RQ
Problem
Discuss the techniques for record deletion.
Step-by-step solution
Step 1 of 1
Techniques for record deletion: A record may be deleted from a file using the following techniques.
(1) A program must first find the block that contains the record, copy the block into a buffer, delete the record from the buffer, and then rewrite the block back to the disk. This leaves unused space in the disk block. When this technique is used to delete a large number of records, storage space is wasted.
(2) Another technique for record deletion is to use a deletion marker (an extra byte or bit stored with each record). A record is deleted by setting the deletion marker to a certain (deleted) value. A different value of the marker indicates a valid record. Search programs consider only the valid records in a block.
Both deletion techniques require periodic reorganization of the file. During reorganization, the file blocks are accessed consecutively and the records are packed by removing the deleted records.
For an unordered file, either a spanned or an unspanned organization can be used, with either fixed-length or variable-length records.
Chapter 16, Problem 15RQ
Problem
Discuss the advantages and disadvantages of using (a) an unordered file, (b) an ordered file, and (c) a static hash file with buckets and chaining. Which operations can be performed efficiently on each of these organizations, and which operations are expensive?
Step-by-step solution
Step 1 of 4
(a) An unordered file: It is a collection of records that are placed in the file in the same order in which they are inserted.
Advantages:
• Insertion is fast: new records are simply added at the end of the last page of the file.
• Records are easy to retrieve in the order in which they were inserted.
Disadvantages:
• Blank (unused) spaces may appear in the unordered file after deletions.
• Searching for a record requires a linear search, and retrieving the records in sorted order requires sorting the file first, which takes time.
Step 2 of 4
(b) An ordered file: An ordered file stores its records in the order of an ordering field, and the file has to be rearranged when records are inserted.
Advantages:
• Reading the records sequentially in the order of the ordering field is efficient, because the records are already stored in that order.
• Helpful when a large volume of data is present.
Disadvantage:
• Inserting, modifying, or deleting records may require rearranging the file.
Step 3 of 4
(c) Static file hashing:
Advantages:
• Speed is the biggest advantage; it is efficient when a huge volume of data is present.
Disadvantages:
• Static file hashing is difficult to implement.
Step 4 of 4
Hashing is the most efficient technique to execute, but it is an expensive one to implement because of its sophisticated structures.
• Extendible hashing is a type of dynamic hashing that splits and coalesces buckets as the size of the database changes, so that the hash function is adjusted on a dynamic basis.
• A cache is an added advantage for faster retrieval of information.
Chapter 16, Problem 16RQ
Problem
Discuss the techniques for allowing a hash file to expand and shrink dynamically. What are the advantages and disadvantages of each?
Step-by-step solution
Step 1 of 5
Some hashing techniques allow the number of file records to grow and shrink dynamically. These techniques include dynamic hashing, extendible hashing and linear hashing. In static hashing, by contrast, the primary pages are fixed and allocated sequentially, pages are never deallocated, and overflow pages are added if needed. The dynamic techniques use the binary representation of the hash value.
In dynamic hashing, the directory is a binary tree. The directories can be stored on disk, and they expand or shrink dynamically. Directory entries point to the disk blocks that contain the stored records. Dynamic hashing is good for a database that grows and shrinks in size, because it allows the hash function to be modified dynamically.
Step 2 of 5
Extendible hashing is one form of dynamic hashing. The hash function generates values over a large range, typically b-bit integers, and only a prefix of the hash value is used to index into a table of bucket addresses.
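The use of a hash prefix as an index into the bucket address table can be sketched as follows. This is only an illustration under stated assumptions: the hash width of 32 bits, the use of `zlib.crc32` as a stand-in hash function, and the names `hash_value` and `directory_index` are not taken from the textbook.

```python
import zlib

HASH_BITS = 32  # assumed value of b: the width of the hash value in bits


def hash_value(key):
    """Map a key to a 32-bit integer (CRC-32 is only a stand-in hash function)."""
    return zlib.crc32(str(key).encode("utf-8")) & 0xFFFFFFFF


def directory_index(h, prefix_bits):
    """Return the index given by the first (high-order) prefix_bits bits of h.

    A bucket address table addressed this way has 2 ** prefix_bits entries.
    """
    if not 0 <= prefix_bits <= HASH_BITS:
        raise ValueError("prefix length must be between 0 and HASH_BITS")
    return h >> (HASH_BITS - prefix_bits)


h = hash_value("Smith")
for prefix_bits in (0, 1, 2, 3):
    # The same hash value selects a different entry as the table doubles.
    print(prefix_bits, directory_index(h, prefix_bits))
```

When the table doubles, the prefix grows by one bit, so each old entry corresponds to two new entries and only the buckets that actually overflow have to be split.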
Example: Let the length of the prefix be i bits, where i must lie between 0 and 32. The bucket address table size is 2^i; initially i is 0. The value of i grows and shrinks with the size of the database.
Step 3 of 5
The number of buckets also changes dynamically because of the coalescing and splitting of buckets.
Advantages and disadvantages of hashing techniques:
(1) Static hashing:
Advantages: Static hashing uses a fixed address space and performs its computation on the internal binary representation of the search key. Bucket overflow can be reduced, but it cannot be eliminated.
Disadvantages: Databases grow with time. If the initial number of buckets is too small, performance degrades. If the database shrinks, space is wasted.
Step 4 of 5
Extendible hashing:
Advantages: It uses a directory. Hash performance does not degrade as the file grows. The space overhead is minimal.
Disadvantages: The bucket address table may itself become big. Changing the size of the bucket address table is an expensive operation.
Step 5 of 5
Linear hashing:
Advantages: It avoids the directory by splitting buckets in a fixed order. Overflow chains are not likely to be long. Duplicates are handled easily. It allows a hash file to expand and shrink its number of buckets dynamically without a directory file.
Disadvantages: Buckets are split in a fixed linear order rather than where the overflow occurred, so overflow chains are still needed, although they are not likely to be long.
Chapter 16, Problem 17RQ
Problem
What is the difference between the directories of extendible and dynamic hashing?
Step-by-step solution
Step 1 of 1
The differences between the directories of extendible and dynamic hashing are as follows:
Chapter 16, Problem 18RQ
Problem
What are mixed files used for? What are other types of primary file organizations?
Step-by-step solution
Step 1 of 3
A mixed file is a file that contains records of different record types.
• An additional field, known as the record type field, is added as the first field of each record to identify the record type to which the record belongs.
• The records in a mixed file are of varying size.
Step 2 of 3
The uses of a mixed file are as follows:
• To place related records of different record types together on a disk block.
• To increase the efficiency of the system when retrieving related records.
Step 3 of 3
The other types of primary file organization are as follows:
• Unordered file organization
• Ordered (sorted) file organization
• Hashed file organization
• Indexed (B-tree) file organization
Chapter 16, Problem 19RQ
Problem
Describe the mismatch between processor and disk technologies.
Step-by-step solution
Step 1 of 2
In computer systems, collections of data are stored physically on a storage medium.
• Through the DBMS (DataBase Management System), the data can be processed, retrieved, and updated whenever needed.
• The storage media of a computer form a storage hierarchy that is used to hold the collections of data.
There are two main divisions in the storage hierarchy of a computer system.
• Primary storage
• Secondary and tertiary storage
Primary storage:
• This storage medium can be accessed directly by the CPU (Central Processing Unit); data is stored in it only temporarily.
• Primary storage is also called main memory (RAM).
• Primary storage provides fast access to data, and cache memories are faster still, but it has less storage capacity and is more expensive.
• Please note that in case of a power failure or system crash, the contents of main memory are lost.
Secondary and tertiary storage:
• This storage holds data permanently on media such as disks, tapes, CD-ROMs, or DVDs.
• Secondary storage is also called secondary memory; hard disk drives are the most common example.
• Data can also be stored offline on removable media; this is called tertiary storage.
• It stores the data on a permanent medium.
• Data in this type of storage cannot be accessed directly; it is first copied to primary storage, and then the CPU processes it.
Step 2 of 2
The mismatch between processor and disk technologies:
In computer systems, processing is carried out by the processor on data held in RAM, which consists of a series of chips.
• For efficient performance, fast memory is provided to the processor.
• The processor also has the support of cache memory to retrieve information faster, which is an added advantage.
Disk technologies, on the other hand, provide the space to accumulate data.
• With disk technologies, collections of data are stored physically.
• Data on disk cannot be accessed directly; it is first copied to primary storage, and then the CPU processes it.
• Compared with the processor, the disk takes far more time, so the processor runs processes much faster than the disk can supply data.
Hence, the processor performs much better than disk technologies, and this gap is the mismatch between the two.
Chapter 16, Problem 20RQ
Problem
What are the main goals of the RAID technology? How does it achieve them?
Step-by-step solution
Step 1 of 2
A main goal of RAID (redundant array of independent disks) is to increase the reliability of the database storage by introducing redundancy.
Disk mirroring: One technique for introducing redundancy is called mirroring or shadowing. Data is stored redundantly on two identical physical disks that are treated as one logical disk. With mirrored data, a data item can be read from either disk, but for writing, the data item must be written on both. This means that when data is read, it can be retrieved from the disk with the shorter queuing, seek, and rotational delays. If one disk fails, the other disk is still there to continue providing the data.
Mirroring therefore improves reliability.
Step 2 of 2
Quantitative example from the book: The mean time to failure (MTTF) of a mirrored disk depends on the mean time to failure of the individual disks, as well as on the mean time to repair, which is the time it takes (on average) to replace a failed disk and to restore the data on it. Suppose that the failures of the two disks are independent; that is, there is no connection between the failure of one disk and the failure of the other. Suppose the system has 100 disks in an array, the mean time to repair is 24 hours, and the MTTF is 200,000 hours for each disk. The mean time to data loss of a mirrored system is then (200,000)^2 / (2 × 24) ≈ 8.33 × 10^8 hours, which is roughly 95,000 years.
Chapter 16, Problem 21RQ
Problem
How does disk mirroring help improve reliability? Give a quantitative example.
Step-by-step solution
Step 1 of 1
RAID uses the technique of data striping to achieve higher transfer rates and to improve disk performance. Striping has two levels: (i) bit-level data striping and (ii) block-level data striping.
Chapter 16, Problem 22RQ
Problem
What characterizes the levels in RAID organization?
Step-by-step solution
Step 1 of 2
RAID levels: In the RAID organization, one solution that presents itself because of the increased size and reduced cost of hard drives is to build in redundancy. RAID can be implemented in hardware or software; it is a set of physical disk drives viewed by the operating system as a single logical drive.
Levels: The levels depend on the data redundancy introduced and on the correctness checking technique used in the scheme.
Level 0: Uses data striping; it has no redundancy and no correctness checking.
Level 1: Redundancy through mirroring and no correctness checking.
Level 2: Mirroring or no mirroring, combined with memory-style correctness checking, for example using parity bits. Various versions of level 2 are possible.
Step 2 of 2
Level 3: Level 3 is similar to level 2 but uses a single disk for parity. Level 3 is sometimes called bit-interleaved.
The disk controller can detect whether a sector has been read correctly, so a single parity bit can be used for error correction as well as detection.
Level 4: Block-level data striping with parity, like level 3, except that it stripes blocks rather than bits.
Level 5: Block-level data striping, with data and parity distributed across all disks.
Level 6: Uses the P+Q redundancy scheme with Reed-Solomon codes to recover from multiple disk failures.
Chapter 16, Problem 23RQ
Problem
What are the highlights of the popular RAID levels 0, 1, and 5?
Step-by-step solution
Step 1 of 1
Different RAID (Redundant Array of Inexpensive Disks) organizations were defined based on different combinations of two factors:
1. Granularity of data interleaving (striping)
2. Pattern used to compute redundant information
There are various levels of RAID, from 0 to 6. The popular RAID organizations are level 0 with striping, level 1 with mirroring, and level 5 with an extra drive for parity.
RAID level 0
• It uses data striping.
• It has no redundant data and hence provides the best write performance, because updates do not have to be duplicated.
• It splits data evenly across multiple disks.
RAID level 1
• It provides good read performance because it uses mirrored disks.
• Performance can be improved by scheduling a read request on the disk with the shortest expected seek and rotational delay.
RAID level 5
• It uses block-level data striping.
• Data and parity information are distributed across all the disks. If any one disk fails, the lost data is reconstructed from the data and parity information available on the remaining disks.
Chapter 16, Problem 24RQ
Problem
What are storage area networks? What flexibility and advantages do they offer?
Step-by-step solution
Step 1 of 1
As data is integrated across an organization, demand grows for cost-effective storage and management of all of it. It becomes necessary to move from static, fixed, data-center-oriented operation to a more flexible and dynamic infrastructure for information processing. Most organizations have therefore moved to storage area networks (SANs).
• In a SAN, online storage peripherals are configured as nodes on a high-speed network and can be attached to and removed from servers in a very flexible manner.
• SANs allow storage systems to be placed at longer distances from the servers and provide good performance and different connectivity options.
• A SAN provides point-to-point connections between servers and storage systems through Fibre Channel, and it allows multiple RAID systems and tape libraries to be connected to servers.
Advantages
1. It is more flexible, because it provides many-to-many connectivity among servers and storage devices using Fibre Channel hubs and switches.
2. A server and a storage system can be separated by up to 10 km using fiber-optic cables.
3. It provides better isolation capabilities, allowing nondisruptive addition of new peripheral devices and servers.
Chapter 16, Problem 25RQ
Problem
Describe the main features of network-attached storage as an enterprise storage solution.
Step-by-step solution
Step 1 of 1
Enterprise applications need storage solutions that deliver high performance at very low cost. Network-attached storage (NAS) devices are used for this purpose. A NAS device does not provide any of the common server services; it simply allows the addition of storage for file sharing.
Features
• It provides a very large amount of hard-disk storage space attached to a network. Multiple servers can make use of that space without being shut down, which eases maintenance and improves performance.
• It can be located anywhere in the local area network (LAN) and used in different configurations.
• A hardware device called the NAS box or NAS head acts as a gateway between the NAS system and the clients connected to the network.
• The NAS head needs no monitor, keyboard, or mouse. Disk drives can be connected to many NAS systems to increase total capacity.
• It can store any data that appears in the form of files, such as e-mail, Web content (text, images, or video), and remote system backups.
• It is designed for reliable operation and easy maintenance.
• It includes built-in features such as security (access authentication) and automatic sending of alerts by e-mail when an error occurs on a connected device.
• It offers a high degree of scalability, reliability, flexibility, and performance.
Chapter 16, Problem 26RQ
Problem
How have new iSCSI systems improved the applicability of storage area networks?
Step-by-step solution
Step 1 of 1
Internet SCSI (iSCSI) is a protocol that allows clients (called initiators) to send SCSI commands to SCSI storage devices over remote channels.
• Its main feature is that it does not require the special cabling needed by Fibre Channel, and it can run over longer distances using existing network infrastructure.
• iSCSI allows data transfers over intranets and manages storage over long distances.
• It can transfer data over a variety of networks, including local area networks (LANs), wide area networks (WANs), and the Internet.
• It is bidirectional: when a request is given, it is processed and the resulting data is sent in response to the original request.
• The combination of simplicity, low cost, and functionality makes iSCSI devices a good upgrade path, so they are widely applied in small and medium-sized businesses.
Chapter 16, Problem 27RQ
Problem
What are SATA, SAS, and FC protocols?
Step-by-step solution
Step 1 of 3
SATA protocol:
SATA stands for serial ATA, where ATA stands for AT attachment; SATA is therefore serial AT attachment. SATA is a modern storage protocol that has replaced the previously common SCSI (small computer system interface) and parallel ATA in laptops and small personal computers. SATA overcomes design limitations of the earlier storage protocols.
• SATA is suitable for tiered storage environments.
• SATA can be used by small and medium-sized enterprises.
• SATA supports interchangeability.
Step 2 of 3
SAS protocol:
SAS stands for serial attached SCSI. SAS overcomes design limitations of the earlier storage protocols and is also considered superior to SATA.
• SAS was designed to replace SCSI interfaces in storage area networks (SANs).
• SAS drives are faster than SATA drives and are dual-ported.
• SATA can be used by small and medium-sized enterprises.
• SATA supports interchangeability.
Step 3 of 3
FC protocol:
FC stands for the Fibre Channel protocol. Fibre Channel is used to connect multiple RAID systems and tape libraries that have different configurations.
• Fibre Channel supports point-to-point connections between a server and a storage system. It also provides the flexibility of many-to-many connections between servers and storage devices.
• Fibre Channel has almost the same performance as SAS. It uses fiber-optic cables, so high-speed data transfer is supported.
• No distance limitation.
• Low-cost alternative for devices.
Chapter 16, Problem 28RQ
Problem
What are solid-state drives (SSDs) and what advantage do they offer over HDDs?
Step-by-step solution
Step 1 of 2
Solid-state drives (SSDs):
SSD is the abbreviation for solid-state drive, a device that uses integrated circuit assemblies to store data permanently. It is nonvolatile memory, meaning that it does not lose its data when the system is turned off. SSDs are based on flash memory technology, which is why they are sometimes called flash storage. They do not require a continuous power supply to retain data, and they are known as solid-state disks or solid-state drives.
An SSD does not have a read/write head like a traditional magnetic disk; instead, it has a controller (an embedded processor) for its various operations. This makes data retrieval faster than on magnetic disks. SSDs commonly use interconnected NAND flash memory cards. An SSD uses the wear-leveling technique, which extends the life of the drive by writing data to a separate NAND cell instead of overwriting the same cell.
Step 2 of 2
Advantages of SSDs over HDDs are as follows:
• Faster access time and higher transfer rate: In an SSD, data can be accessed directly at different locations in flash memory, so access time can be about 100 times faster than on an HDD and latency is low. Consequently, the data transfer rate is high and system boot time is short.
• More reliable: An SSD does not have a moving mechanical arm for read and write operations. Data is stored on integrated circuit chips. A flash cell can be written and erased only a limited number of times before it fails; the SSD controller manages all operations on the flash cells so that the SSD can work for many years under normal use.
• No moving components (durable): Because an SSD has no moving components, its data is safer even when the equipment is handled roughly.
• Uses less power: Because an SSD has no rotating platters or moving head for reading and writing data, its power consumption is lower than that of an HDD, which saves battery life.
An SSD uses only about 2-3 watts, whereas an HDD uses about 6-7 watts.
• No noise and less heat: Because there are no moving parts, an SSD generates less heat and makes no noise, which helps to increase the life and reliability of the drive.
• Lightweight: SSDs are mounted on a circuit board and have no moving head or spindle, so they are light and small.
Chapter 16, Problem 29RQ
Problem
What is the function of a buffer manager? What does it do to serve a request for data?
Step-by-step solution
Step 1 of 2
The buffer manager is a software module of the DBMS whose responsibility is to serve all data requests, decide which buffer to use, and manage page replacement.
The main functions of the buffer manager are:
• To speed up processing and increase efficiency.
• To increase the probability that a requested page is found in main memory.
• To find an appropriate page to replace when a new disk block must be read from disk, such that the replaced page will not be required again soon.
• To ensure that the number of buffers fits in main memory.
• To apply the buffer replacement policy and select the buffers that must be emptied when the requested amount of data exceeds the available buffer space.
Step 2 of 2
The buffer manager keeps two kinds of bookkeeping information for each page in the buffer pool:
1. Pin count: A counter that tracks the number of requests for the page, that is, the number of users currently using it. Initially the counter is set to zero. While the counter is zero, the page is unpinned. Only unpinned pages may be written to disk and replaced. Once the counter is incremented, the page is said to be pinned.
2. Dirty bit: Initially set to zero for all pages. When a page is updated, its dirty bit is set to 1.
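The pin count and dirty bit described above can be sketched as a small in-memory model. This is a minimal illustration, not how any particular DBMS implements its buffer manager: the names `Frame`, `BufferPool`, `request`, and `release` are hypothetical, the disk is simulated by a Python dict, and least-recently-used replacement is assumed as the policy.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Frame:
    data: bytes
    pin_count: int = 0    # number of current users of the page
    dirty: bool = False   # True once the in-memory copy has been modified


class BufferPool:
    """Toy buffer pool; `disk` is a plain dict standing in for disk blocks."""

    def __init__(self, disk, capacity):
        self.disk = disk
        self.capacity = capacity
        self.frames = OrderedDict()  # page_id -> Frame, least recently used first

    def request(self, page_id):
        """Return the frame for page_id, pinned, reading it in if necessary."""
        if page_id in self.frames:
            self.frames.move_to_end(page_id)      # mark as most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[page_id] = Frame(self.disk[page_id])
        frame = self.frames[page_id]
        frame.pin_count += 1
        return frame

    def release(self, page_id, modified=False):
        """Unpin the page; remember whether the caller changed it."""
        frame = self.frames[page_id]
        frame.pin_count -= 1
        frame.dirty = frame.dirty or modified

    def _evict(self):
        # Choose the least recently used page that nobody has pinned.
        victim_id = next(
            (pid for pid, f in self.frames.items() if f.pin_count == 0), None
        )
        if victim_id is None:
            raise RuntimeError("all buffers are pinned")
        victim = self.frames.pop(victim_id)
        if victim.dirty:                          # write back only changed pages
            self.disk[victim_id] = victim.data


disk = {1: b"a", 2: b"b", 3: b"c"}
pool = BufferPool(disk, capacity=2)
page = pool.request(1)
page.data = b"A"
pool.release(1, modified=True)
pool.request(2)
pool.release(2)
pool.request(3)   # pool is full: page 1 is unpinned and dirty, so it is written back
```

In this sketch a clean page is simply dropped on replacement, while a dirty page is written to the simulated disk first; a pinned page is never chosen as the victim.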
The buffer manager processes a page request in the following steps:
• The buffer manager checks whether the page is available in the buffer. If it is, the buffer manager increments the pin count and returns the page.
• If the page is not in the buffer, the buffer manager takes the following steps:
• It chooses a page for replacement according to the replacement policy and increments that page's pin count.
• If the dirty bit of the replacement page is on, the buffer manager writes that page to disk, replacing the old copy there.
• If the dirty bit is not on, the buffer manager does not write the page back to disk.
• The buffer manager reads the new page and passes the memory location of the page to the requesting application.
Chapter 16, Problem 30RQ
Problem
What are some of the commonly used buffer replacement strategies?
Step-by-step solution
Step 1 of 2
Buffer replacement strategies:
In large DBMSs, files contain so many pages that it is not possible to keep all the data in memory at the same time. To overcome this storage problem and improve the efficiency of DBMS transactions, the buffer manager (software) uses buffer replacement strategies that decide which buffer to use and which pages to replace in the buffer to make space for newly requested pages.
Step 2 of 2
Some commonly used buffer replacement strategies are as follows:
• LRU (least recently used): The LRU strategy keeps track of page usage and removes the page that has gone unused for the longest time. LRU works on the principle that pages used recently are the most likely to be used again in further processing. To support the strategy, the buffer manager has to maintain a table in which it records the time of every page access. This is a very common and simple policy. It suffers from sequential flooding: with repeated sequential scans of a file that does not fit in the buffer pool, every page request causes an I/O.
• Clock policy: This is an approximation of LRU. It resembles a round-robin strategy.
In the clock replacement policy, buffers are arranged in a circle like a clock with a single clock hand. The buffer manager sets a "use bit" on each reference. If the use bit is not set (flag 0) for a buffer, that buffer has not been used for a long time and is a candidate for replacement. The policy replaces an old page, though not necessarily the oldest.
• FIFO (first in, first out): This is the simplest buffer replacement technique. When a buffer is required to store a new page, the page that arrived earliest is swapped out. The pages are arranged in the buffer as a queue, with the most recent page at the tail and the oldest arrival at the head. During replacement, the page at the head of the queue is replaced first. This strategy is simple and easy to implement but not always desirable, because the oldest page may also be the most frequently used one; if it is needed again, it has to be swapped back in, which creates processing overhead.
• MRU (most recently used): It removes the most recently used page first. This is also called fetch and discard. It is useful in sequential scanning, when the most recently used page will not be used again for some time. In sequential scanning, the LRU and clock strategies do not perform well.
To enhance the performance of FIFO, it can be modified by pinning certain blocks, such as the root index block, to make sure that they cannot be replaced and always remain in the buffer.
Chapter 16, Problem 31RQ
Problem
What are optical and tape jukeboxes? What are the different types of optical media served by optical drives?
Step-by-step solution
Step 1 of 2
Optical jukeboxes:
An optical jukebox is an intelligent data storage device that uses an array of optical disk platters and automatically loads and unloads these disks according to storage needs. Jukeboxes offer high-capacity storage, supporting terabytes and even petabytes of tertiary storage.
• Optical jukeboxes have up to 2000 different disk slots.
Because an optical jukebox keeps switching between disks according to the data required, it incurs a time overhead that affects processing.
• Jukeboxes are cost-effective and provide random access to data.
• The process of dynamically loading and unloading disks is called migration.
Tape jukeboxes:
A magnetic tape jukebox uses a number of tapes as storage and automatically loads and unloads tapes on tape drives. This is a popular tertiary storage medium that can handle terabytes of data.
Step 2 of 2
Optical media used by optical drives:
Optical media store data in digital form. They can hold all types of data, such as audio, video, software, images, and text. An optical drive is used to read and write data on optical media. The drive reads and writes data using laser light, an electromagnetic wave whose specific wavelength depends on the type of media being read.
The following optical media are used by optical drives:
• CD (compact disc): According to use and recording type, there are three kinds of CD:
Read-only: CD-ROM
Writable: CD-R
Rewritable: CD-RW
• DVD (digital versatile disc): higher-capacity discs.
• Blu-ray disc: most commonly used to store video.
Chapter 16, Problem 32RQ
Problem
What is automatic storage tiering? Why is it useful?
Step-by-step solution
Step 1 of 1
Automated storage tiering (AST):
AST is a storage technique that dynamically moves data among different types of storage, such as SATA, SAS, and SSDs, based on storage requirements. The automated tiering mechanism is managed by the storage administrator. According to the tiering policy, less-used data is moved to SATA drives, which are slower and less expensive, and frequently used data is moved to high-speed SAS or solid-state drives. Automated tiering greatly improves the performance of the DBMS.
EMC implements FAST (fully automated storage tiering).
It automatically monitors how active the data is, moving active data to high-performance storage such as SSDs and inactive data to inexpensive, slower storage such as SATA. AST is therefore useful because it yields high performance at low cost.
Chapter 16, Problem 33RQ
Problem
What is object-based storage? How is it superior to conventional storage systems?
Step-by-step solution
Step 1 of 2
Object-based storage:
In an object-based storage system, data is organized in units called objects instead of blocks in a file. Data is not stored in a hierarchy; instead, all data is stored in the form of objects, and a required object can be located directly, without overhead, using its unique global identifier.
Every object in object-based storage has three parts:
• Data: The information that is to be stored in the object.
• Variable metadata: Information about the main data, such as its location, usability, confidentiality, and other information required to manage the data.
• Unique global identifier: Stores the address information of the data so that the data can be located easily.
Step 2 of 2
An object storage system is better than a conventional storage system in the following ways:
• As organizations expand, their data grows day by day. If a file system is used for storage and data is stored in blocks, it becomes very difficult to manage huge amounts of data. In conventional file systems, data is stored hierarchically, in blocks that each have their own unique address. To avoid this management overhead, data is stored in the form of objects with additional metadata.
• Object-based storage provides security of data. In object-based systems, objects can be accessed directly by applications through the unique global identifier.
In a file storage system, by contrast, data has to be searched for in a linear or binary fashion, which generates processing overhead and is time-consuming.
• An object-based storage system supports features such as replication, encapsulation, and distribution of objects, which make data secure, manageable, and easily accessible. A conventional file-based storage system does not support replication and distribution of objects.
Chapter 16, Problem 34E
Problem
Consider a disk with the following characteristics (these are not parameters of any particular disk unit): block size B = 512 bytes; interblock gap size G = 128 bytes; number of blocks per track = 20; number of tracks per surface = 400. A disk pack consists of 15 double-sided disks.
a. What is the total capacity of a track, and what is its useful capacity (excluding interblock gaps)?
b. How many cylinders are there?
c. What are the total capacity and the useful capacity of a cylinder?
d. What are the total capacity and the useful capacity of a disk pack?
e. Suppose that the disk drive rotates the disk pack at a speed of 2,400 rpm (revolutions per minute); what are the transfer rate (tr) in bytes/msec and the block transfer time (btt) in msec? What is the average rotational delay (rd) in msec? What is the bulk transfer rate? (See Appendix B.)
f. Suppose that the average seek time is 30 msec. How much time does it take (on the average) in msec to locate and transfer a single block, given its block address?
g. Calculate the average time it would take to transfer 20 random blocks, and compare this with the time it would take to transfer 20 consecutive blocks using double buffering to save seek time and rotational delay.
Step-by-step solution
Step 1 of 8
Given data:
Block size B = 512 bytes
Interblock gap size G = 128 bytes
Number of blocks per track = 20
Number of tracks per surface = 400
The disk pack consists of 15 double-sided disks.
Step 2 of 8
(a) Total track size = blocks per track × (block size + interblock gap size) = 20 × (512 + 128) = 12,800 bytes = 12.8 Kbytes
Useful capacity of a track = blocks per track × block size = 20 × 512 = 10,240 bytes = 10.24 Kbytes
Step 3 of 8
(b) Number of cylinders = number of tracks per surface = 400
Step 4 of 8
(c) Total cylinder capacity = 15 × 2 × 20 × (512 + 128) = 384,000 bytes = 384 Kbytes
Useful cylinder capacity = 15 × 2 × 20 × 512 = 307,200 bytes = 307.2 Kbytes
Step 5 of 8
(d) Total capacity of a disk pack = 400 × 384,000 = 153,600,000 bytes = 153.6 Mbytes
Useful capacity of a disk pack = 400 × 307,200 = 122,880,000 bytes = 122.88 Mbytes
Step 6 of 8
(e) Time for one revolution = 60,000 / 2,400 = 25 msec
Transfer rate tr = total track size / time for one revolution = 12,800 / 25 = 512 bytes/msec
Block transfer time btt = B / tr = 512 / 512 = 1 msec
Average rotational delay rd = 25 / 2 = 12.5 msec
Bulk transfer rate btr = tr × B / (B + G) = 512 × 512 / 640 = 409.6 bytes/msec
Step 7 of 8
(f) Average time to locate and transfer a block = s + rd + btt = 30 + 12.5 + 1 = 43.5 msec
Step 8 of 8
(g) Time to transfer 20 random blocks = 20 × (s + rd + btt) = 20 × 43.5 = 870 msec
Time to transfer 20 consecutive blocks using double buffering = s + rd + 20 × (B / btr) = 30 + 12.5 + 20 × 1.25 = 67.5 msec
Chapter 16, Problem 35E
Problem
A file has r = 20,000 STUDENT records of fixed length. Each record has the following fields: Name (30 bytes), Ssn (9 bytes), Address (40 bytes), PHONE (10 bytes), Birth_date (8 bytes), Sex (1 byte), Major_dept_code (4 bytes), Minor_dept_code (4 bytes), Class_code (4 bytes, integer), and Degree_program (3 bytes). An additional byte is used as a deletion marker. The file is stored on the disk whose parameters are given in Exercise.
a. Calculate the record size R in bytes.
b. Calculate the blocking factor bfr and the number of file blocks b, assuming an unspanned organization.
c. Calculate the average time it takes to find a record by doing a linear search on the file if (i) the file blocks are stored contiguously, and double buffering is used; (ii) the file blocks are not stored contiguously.
d. Assume that the file is ordered by Ssn; by doing a binary search, calculate the time it takes to search for a record given its Ssn value.
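The quantities asked for in this exercise can be worked out with a short script. This is a sketch under one assumption: the disk referred to is taken to be the one from Problem 34E (B = 512 bytes, G = 128 bytes, 20 blocks per track, 2,400 rpm, average seek time 30 msec). The variable names are illustrative, and the timing formulas are the usual ones for linear search with and without contiguous blocks and for binary search on an ordered file.

```python
import math

# Disk parameters, assumed to be those of Problem 34E.
B, G = 512, 128                 # block size and interblock gap, in bytes
BLOCKS_PER_TRACK = 20
RPM = 2400
s = 30.0                        # average seek time, msec

rev_ms = 60_000 / RPM                       # time for one revolution, msec
tr = BLOCKS_PER_TRACK * (B + G) / rev_ms    # transfer rate, bytes/msec
btt = B / tr                                # block transfer time, msec
rd = rev_ms / 2                             # average rotational delay, msec
btr = tr * B / (B + G)                      # bulk transfer rate, bytes/msec

# STUDENT record layout from the problem statement (sizes in bytes).
fields = {
    "Name": 30, "Ssn": 9, "Address": 40, "Phone": 10, "Birth_date": 8,
    "Sex": 1, "Major_dept_code": 4, "Minor_dept_code": 4,
    "Class_code": 4, "Degree_program": 3,
}
r = 20_000

R = sum(fields.values()) + 1    # a. record size, plus 1-byte deletion marker
bfr = B // R                    # b. blocking factor, unspanned
b = math.ceil(r / bfr)          #    number of file blocks

# c. linear search reads half of the blocks on average
linear_contiguous = s + rd + (b / 2) * (B / btr)   # (i) double buffering
linear_scattered = (b / 2) * (s + rd + btt)        # (ii) one seek per block

# d. binary search on the ordered file
binary_accesses = math.ceil(math.log2(b))
binary_ms = binary_accesses * (s + rd + btt)

print(R, bfr, b)
print(linear_contiguous, linear_scattered)
print(binary_accesses, binary_ms)
```

With these inputs the sketch gives R = 114 bytes, bfr = 4 records per block, and b = 5,000 blocks; the linear search takes about 3,167.5 msec when blocks are contiguous and 108,750 msec when they are not, and the binary search needs 13 block accesses, or 565.5 msec.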
Step-by-step solution
Step 1 of 6
Step 2 of 6
Step 3 of 6
Step 4 of 6
Step 5 of 6
Step 6 of 6
Chapter 16, Problem 36E
Problem
Suppose that only 80% of the STUDENT records from Exercise have a value for Phone, 85% for Major_dept_code, 15% for Minor_dept_code, and 90% for Degree_program; and suppose that we use a variable-length record file. Each record has a 1-byte field type for each field in the record, plus the 1-byte deletion marker and a 1-byte end-of-record marker. Suppose that we use a spanned record organization, where each block has a 5-byte pointer to the next block (this space is not used for record storage).
a. Calculate the average record length R in bytes.
b. Calculate the number of blocks needed for the file.
Step-by-step solution
Step 1 of 3
Assume that a variable-length record file is being used. Each record has a 1-byte field type for each field, along with a 1-byte deletion marker and a 1-byte end-of-record marker. The fixed part of the record size is calculated from the fields for which no percentage is given in the question, that is, Name, Ssn, Address, Birth_date, Sex, and Class_code. Therefore,
R_fixed = (30 + 1) + (9 + 1) + (40 + 1) + (8 + 1) + (1 + 1) + (4 + 1) + 1 + 1 = 100 bytes
For the remaining variable-length fields (Phone, Major_dept_code, Minor_dept_code, and Degree_program), the average number of bytes per record can be calculated as,
R_variable = (10 + 1) × 0.80 + (4 + 1) × 0.85 + (4 + 1) × 0.15 + (3 + 1) × 0.90 = 8.8 + 4.25 + 0.75 + 3.6 = 17.4 bytes
Step 2 of 3
a. Therefore, the average record length R is,
R = R_fixed + R_variable = 100 + 17.4 = 117.4 bytes
The average record length is 117.4 bytes.
Step 3 of 3
b. Since a spanned record organization is being used, where each block reserves 5 bytes for a pointer, the usable bytes in each block are 512 - 5 = 507. The number of blocks required for the file can be calculated as,
b = ceil((r × R) / 507) = ceil((20,000 × 117.4) / 507) = ceil(2,348,000 / 507) = 4,632
The number of blocks required for the file is 4,632.
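The averages in this solution can be checked numerically. The following is a minimal sketch, assuming the STUDENT field sizes from Problem 35E (including a 10-byte Phone) and a 512-byte block; the dictionary names and variable names are illustrative only.

```python
import math

B = 512          # block size in bytes (assumed, from the disk of Problem 34E)
POINTER = 5      # bytes per block reserved for the next-block pointer
r = 20_000       # number of STUDENT records

# Fields that every record has: size in bytes.
always_present = {
    "Name": 30, "Ssn": 9, "Address": 40,
    "Birth_date": 8, "Sex": 1, "Class_code": 4,
}
# Optional fields: (size in bytes, fraction of records that have a value).
optional = {
    "Phone": (10, 0.80),
    "Major_dept_code": (4, 0.85),
    "Minor_dept_code": (4, 0.15),
    "Degree_program": (3, 0.90),
}

# Each stored field carries a 1-byte field type; add the deletion marker
# and the end-of-record marker (1 byte each).
fixed_part = sum(size + 1 for size in always_present.values()) + 2
variable_part = sum((size + 1) * p for size, p in optional.values())

R = fixed_part + variable_part                 # a. average record length
blocks = math.ceil(r * R / (B - POINTER))      # b. spanned organization

print(fixed_part, round(variable_part, 2), round(R, 2), blocks)
```

Because the organization is spanned, records may cross block boundaries, so the block count is simply the total number of bytes divided by the usable bytes per block, rounded up.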
Chapter 16, Problem 37E
Problem
Suppose that a disk unit has the following parameters: seek time s = 20 msec; rotational delay rd = 10 msec; block transfer time btt = 1 msec; block size B = 2400 bytes; interblock gap size G = 600 bytes. An EMPLOYEE file has the following fields: Ssn, 9 bytes; Last_name, 20 bytes; First_name, 20 bytes; Middle_init, 1 byte; Birth_date, 10 bytes; Address, 35 bytes; Phone, 12 bytes; Supervisor_ssn, 9 bytes; Department, 4 bytes; Job_code, 4 bytes; deletion marker, 1 byte. The EMPLOYEE file has r = 30,000 records, fixed-length format, and unspanned blocking. Write appropriate formulas and calculate the following values for the above EMPLOYEE file:
a. Calculate the record size R (including the deletion marker), the blocking factor bfr, and the number of disk blocks b.
b. Calculate the wasted space in each disk block because of the unspanned organization.
c. Calculate the transfer rate tr and the bulk transfer rate btr for this disk unit (see Appendix B for definitions of tr and btr).
d. Calculate the average number of block accesses needed to search for an arbitrary record in the file, using linear search.
e. Calculate in msec the average time needed to search for an arbitrary record in the file, using linear search, if the file blocks are stored on consecutive disk blocks and double buffering is used.
f. Calculate in msec the average time needed to search for an arbitrary record in the file, using linear search, if the file blocks are not stored on consecutive disk blocks.
g. Assume that the records are ordered via some key field. Calculate the average number of block accesses and the average time needed to search for an arbitrary record in the file, using binary search.
Step-by-step solution
Step 1 of 7
Consider the following parameters of a disk:
Seek time s = 20 msec
Rotational delay rd = 10 msec
Block transfer time btt = 1 msec
Block size B = 2400 bytes
Interblock gap size G = 600 bytes
Consider an EMPLOYEE file with r = 30,000 records. The fields common to each record are as follows:
Field name        Size (in bytes)
Ssn               9
First_name        20
Last_name         20
Middle_init       1
Address           35
Phone             12
Birth_date        10
Supervisor_ssn    9
Department        4
Job_code          4
deletion marker   1
Step 2 of 7
The record size R can be calculated as,
R = 9 + 20 + 20 + 1 + 35 + 12 + 10 + 9 + 4 + 4 + 1 = 125 bytes
The record size is 125 bytes.
Since the file is unspanned, the blocking factor bfr can be calculated as,
bfr = floor(B / R) = floor(2400 / 125) = 19 records per block
The blocking factor is 19.
In an unspanned organization of records, the number of file blocks can be calculated as,
b = ceil(r / bfr) = ceil(30,000 / 19) = 1579 blocks
The number of file blocks is 1579.
Step 3 of 7
As the file has an unspanned organization, the wasted space in each block can be calculated as,
Wasted space = B - (bfr × R) = 2400 - (19 × 125) = 25 bytes
The wasted space in each disk block is 25 bytes.
Step 4 of 7
The transfer rate tr can be calculated as,
tr = B / btt = 2400 / 1 = 2400 bytes/msec
The transfer rate for the disk is 2400 bytes/msec.
The bulk transfer rate btr can be calculated as,
btr = tr × B / (B + G) = 2400 × 2400 / 3000 = 1920 bytes/msec
The bulk transfer rate for the disk is 1920 bytes/msec.
Step 5 of 7
When searching for an arbitrary record in a file using linear search, the average number of block accesses can be found as follows:
• Records are searched on a key field. If one record satisfies the search condition, then on average half of the blocks are searched, that is, b/2 = 789.5 blocks. If no record satisfies the search condition, all blocks are searched, that is, b = 1579 blocks.
• Records are searched on a non-key field. In this case all blocks are searched, that is, b = 1579 blocks.
To calculate the average time to find a record using linear search on the file, the search is performed on half of the file blocks on average. Half of the 1579 file blocks is 1579/2 = 789.5 blocks.
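The values derived so far follow mechanically from the given parameters, so they can be checked with a few lines of code. This is a minimal sketch with illustrative variable names; it covers only the record size, blocking factor, block count, wasted space, transfer rates, and the average number of block accesses for a linear search on a key field.

```python
import math

# Disk parameters from the problem statement.
B, G = 2400, 600      # block size and interblock gap, in bytes
btt = 1.0             # block transfer time, msec
r = 30_000            # number of EMPLOYEE records

# Field sizes in bytes, including the 1-byte deletion marker.
fields = {
    "Ssn": 9, "Last_name": 20, "First_name": 20, "Middle_init": 1,
    "Birth_date": 10, "Address": 35, "Phone": 12, "Supervisor_ssn": 9,
    "Department": 4, "Job_code": 4, "deletion_marker": 1,
}

R = sum(fields.values())          # record size
bfr = B // R                      # blocking factor (unspanned: round down)
b = math.ceil(r / bfr)            # number of file blocks (round up)
wasted = B - bfr * R              # unused bytes at the end of each block

tr = B / btt                      # transfer rate, bytes/msec
btr = tr * B / (B + G)            # bulk transfer rate, bytes/msec

avg_linear_accesses = b / 2       # successful linear search on a key field

print(R, bfr, b, wasted, tr, btr, avg_linear_accesses)
```

Rounding the blocking factor down and the block count up reflects the unspanned organization: a record that does not fit entirely in the remaining space of a block starts in the next block.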
If the blocks are stored on consecutive disk blocks and double buffering is used, the average time taken to read 789.5 blocks is,
s + rd + (b/2) × (B / btr) = 20 + 10 + 789.5 × (2400 / 1920) = 30 + 986.875 = 1016.875 msec
If the file blocks are stored consecutively and double buffering is used, then the average time taken to find a record by doing a linear search on the file is about 1016.9 msec.
Step 6 of 7
If the file blocks are not stored on consecutive disk blocks, the time taken to read 789.5 blocks is,
(b/2) × (s + rd + btt) = 789.5 × (20 + 10 + 1) = 24,474.5 msec
If the file blocks are not stored consecutively, then the average time taken to find a record by doing a linear search on the file is 24,474.5 msec.
Step 7 of 7
When the records are ordered on some key field and binary search is used, each block access halves the remaining search range, so the number of block accesses is about log2 b whether or not the record is found:
ceil(log2 b) = ceil(log2 1579) = 11 block accesses
Assuming that the records are ordered on some key field, the time taken to search for a record using binary search is calculated as,
ceil(log2 b) × (s + rd + btt) = 11 × 31 = 341 msec
The average time taken to search for a record via the key field is 341 msec.
Chapter 16, Problem 39E
Problem
Load the records of Exercise into expandable hash files based on extendible hashing. Show the structure of the directory at each step, and the global and local depths. Use the hash function h(K) = K mod 128.
Step-by-step solution
Step 1 of 10
Consider the following records:
2369, 3760, 4692, 4871, 5659, 1821, 1074, 7115, 1620, 2428, 3943, 4750, 6975, 4981 and 9208.
The hash function is h(K) = K mod 128.
Step 2 of 10
Calculate the hash value (bucket number) and its 7-bit binary value for each record as follows:
Record   K mod 128   Binary
2369     65          1000001
3760     48          0110000
4692     84          1010100
4871     7           0000111
5659     27          0011011
1821     29          0011101
1074     50          0110010
7115     75          1001011
1620     84          1010100
2428     124         1111100
3943     103         1100111
4750     14          0001110
6975     63          0111111
4981     117         1110101
9208     120         1111000
Step 3 of 10
Now, perform the extendible hashing, starting with local depth 0 and global depth 0. Here, each bucket can hold two records.
The third record, 4692, cannot be inserted because the bucket already holds two records.
Increase the global depth to one to insert more elements.
Now, the global depth is 1 and the local depth is 1. Check the binary value of each record. Map the record to 0 if its binary value starts with 0, and to 1 if its binary value starts with 1.
For example, the binary value of the bucket number for 2369 is 1000001. The first bit is 1, so the record is mapped to 1. The binary value of the bucket number for 3760 is 0110000. The first bit is 0, so the record is mapped to 0.
Step 4 of 10
The next record cannot be inserted because the target bucket is full.
Step 5 of 10
Now, increase the global depth to 2, so that the first two bits of the binary value of the bucket number are checked. Then insert the next record.
Step 6 of 10
The record 1821 cannot be inserted. Thus, increase the global depth to 3.
Step 7 of 10
Now, insert the other records. The record 1074 can be inserted easily because there is space in its bucket. Next, insert 7115.
Step 8 of 10
The record 7115 cannot be inserted. Increase the local depth to 3 for the last bucket and insert the elements.
The records left are 6975, 4981 and 9208. The record 6975 cannot be inserted. Increase the global depth to 4 and insert the elements.
Step 9 of 10
The last record cannot be inserted. Insert 9208 by increasing the local depth to 4 in the corresponding bucket. The final table is as follows:
Step 10 of 10
Chapter 16, Problem 40E
Problem
Load the records of Exercise into an expandable hash file, using linear hashing. Start with a single disk block, using the hash function h0 = K mod 2^0, and show how the file grows and how the hash functions change as the records are inserted. Assume that blocks are split whenever an overflow occurs, and show the value of n at each stage.
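The mechanics that this exercise asks about can be sketched in code. The following is a minimal, illustrative model of linear hashing, not a reference implementation: the class name `LinearHashFile` and its methods are hypothetical, the bucket capacity of two records is an assumption carried over from the extendible hashing solution, and overflow records are simply kept in the same Python list rather than in a separate overflow chain. The records are the fifteen keys used in Problem 39E.

```python
class LinearHashFile:
    """Toy linear hash file that starts with one bucket and h0 = K mod 2**0."""

    def __init__(self, capacity=2):
        self.capacity = capacity   # records per bucket before overflow (assumed)
        self.buckets = [[]]        # a single initial bucket
        self.level = 0             # j: current hash function is K mod 2**j
        self.n = 0                 # next bucket to be split

    def _address(self, key):
        addr = key % (2 ** self.level)
        if addr < self.n:          # that bucket was already split in this round
            addr = key % (2 ** (self.level + 1))
        return addr

    def insert(self, key):
        bucket = self.buckets[self._address(key)]
        bucket.append(key)
        if len(bucket) > self.capacity:   # any overflow triggers one split
            self._split()

    def _split(self):
        # Always split bucket n, which is not necessarily the one that overflowed.
        old = self.buckets[self.n]
        modulus = 2 ** (self.level + 1)
        self.buckets.append([])           # new bucket gets index n + 2**level
        self.buckets[self.n] = [k for k in old if k % modulus == self.n]
        self.buckets[-1] = [k for k in old if k % modulus != self.n]
        self.n += 1
        if self.n == 2 ** self.level:     # every bucket of this round was split
            self.level += 1
            self.n = 0

    def lookup(self, key):
        return key in self.buckets[self._address(key)]


records = [2369, 3760, 4692, 4871, 5659, 1821, 1074, 7115,
           1620, 2428, 3943, 4750, 6975, 4981, 9208]
hash_file = LinearHashFile()
for key in records:
    hash_file.insert(key)
    # Show the split pointer n and the hash level after each insertion.
    print(key, "n =", hash_file.n, "level =", hash_file.level, hash_file.buckets)
```

The distinguishing design choice is that buckets are split one at a time in a fixed order given by n, so no directory is needed; a bucket that overflows may have to wait several insertions before its own turn to be split arrives.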
Step-by-step solution

Step 1 of 1
When we apply the hash function K mod 2^0, we get a single bucket. We split this bucket into two buckets with the new function K mod 2^1:
Bucket1: 2369, 4871, 5659, 1821, 7115, 3943, 6975, 4981
Bucket2: 3760, 4692, 1074, 1620, 2428, 4750, 9208

Now we can split these into four buckets using the function K mod 2^2:
B1a: 2369, 1821, 4981
B1b: 1074, 4750
B1c: 4871, 5659, 7115, 3943, 6975
B1d: 3760, 4692, 1620, 2428, 9208

Since some buckets hold more than 2 records, they are split using the function K mod 2^3 (each bucket is labelled with its remainder, and the bucket for remainder 0 is labelled B8):
B1: 2369
B2: 1074
B3: 5659, 7115
B4: 4692, 1620, 2428
B5: 1821, 4981
B6: 4750
B7: 4871, 3943, 6975
B8: 3760, 9208

Since some buckets are still too large, we apply another function, K mod 2^4, to them (the bucket for remainder 0 is labelled B16):
B1: 2369
B2: 1074
B4: 4692, 1620
B5: 4981
B7: 4871, 3943
B8: 9208
B11: 5659, 7115
B12: 2428
B13: 1821
B14: 4750
B15: 6975
B16: 3760

Now all buckets are of the correct size.

Chapter 16, Problem 41E

Problem
Compare the file commands listed in Section 16.5 to those available on a file access method you are familiar with.

Step-by-step solution

Step 1 of 1
The file commands considered here are those for files of unordered records. Records are placed in the file in the order in which they are inserted, and new records are inserted at the end of the file. This organization is called a heap or pile file.

File commands on files of unordered records: inserting a record, deleting a record, and external sorting.

Inserting a record: Inserting a new record is very efficient. The last block of the file is copied into a buffer, the new record is added, and the block is rewritten back to the disk.

Deleting a record: The program must first find the record's block and copy the block into a buffer, then delete the record from the buffer, and finally rewrite the block back to the disk. A deletion marker can also be used for record deletion.

External sorting: When we want to read all records in order of the values of some field, we create a sorted copy of the file.
For a large disk file this is an expensive operation, so external sorting is used.

Chapter 16, Problem 42E

Problem
Suppose that we have an unordered file of fixed-length records that uses an unspanned record organization. Outline algorithms for insertion, deletion, and modification of a file record. State any assumptions you make.

Step-by-step solution

Step 1 of 1
Compare the heap file (unordered file) with file access methods.

Heap file:
- It is the simplest and most basic type of organization.
- Records are placed in the file in the order in which they are inserted.
- Inserting a new record is very efficient.
- New records are inserted at the end of the file.
- Searching for a record involves a linear search, which is an expensive procedure.

File organization:
- File organization refers to the organization of the data of a file into records, blocks, and access structures.
- Records and blocks are placed on the storage medium and are interlinked. Example: a sorted file.

Access methods:
- An access method provides a group of operations that can be applied to a file. Examples: open, find, delete, modify, insert, close, and so on.
- Several access methods may be applicable to one file organization.
- Some access methods can be applied only to files organized in certain ways, namely:
  records organized serially (sequential organization);
  organization based on relative record number (relative organization);
  index-based organization (indexed organization).

The access method refers to the way in which records are accessed. A file with an indexed or relative organization may still have its records accessed sequentially, but the records of a file with a sequential organization cannot be accessed directly.

Chapter 16, Problem 43E

Problem
Suppose that we have an ordered file of fixed-length records and an unordered overflow file to handle insertion. Both files use unspanned records.
Outline algorithms for insertion, deletion, and modification of a file record and for reorganizing the file. State any assumptions you make.

Step-by-step solution

Step 1 of 2
For the ordered file of fixed-length records:

Assume that the file is named abc and is ordered in increasing order on a numeric key field, Key.

Insertion (the record to be inserted has key value n):
1. Open file abc and take the file pointer in variable fp.
2. Find the first record where fp.key > n.
3. Insert the new record at this position.
4. Save the file data.
5. Close the file.

Deletion (the record to be deleted has key value n):
1. Open file abc and take the file pointer in variable fp.
2. Find the record where fp.key = n.
3. Delete the record.
4. Save the result.
5. Close the file.

Modification (the record to be modified has key value n, and the value of Name is to be changed to xyz):
1. Open file abc and take the file pointer in variable fp.
2. Find the record where fp.key = n.
3. Set fp.name = 'xyz'.
4. Save the result.
5. Close the file.

For the unordered file:

Step 2 of 2
Insertion (the record to be inserted has key value n):
1. Open file abc and take the file pointer in variable fp.
2. Seek to the end of the file.
3. Insert the new record at this position.
4. Save the file data.
5. Close the file.

Deletion (the record to be deleted has key value n):
1. Open file abc and take the file pointer in variable fp.
2. Find the record where fp.key = n.
3. Delete the record.
4. Save the result.
5. Close the file.

Modification (the record to be modified has key value n, and the value of Name is to be changed to xyz):
1. Open file abc and take the file pointer in variable fp.
2. Find the record where fp.key = n.
3. Set fp.name = 'xyz'.
4. Save the result.
5. Close the file.

Chapter 16, Problem 44E

Problem
Can you think of techniques other than an unordered overflow file that can be used to make insertions in an ordered file more efficient?
Step-by-step solution

Step 1 of 1
Yes. It is possible to use an overflow file in which the records are chained together in a manner similar to the overflow for static hash files. The overflow records that should be inserted in each block of the ordered file are linked together in the overflow file, and a pointer to the first record in the linked list is kept in the block of the main file. The list may or may not be kept ordered.

Chapter 16, Problem 45E

Problem
Suppose that we have a hash file of fixed-length records, and suppose that overflow is handled by chaining. Outline algorithms for insertion, deletion, and modification of a file record. State any assumptions you make.

Step-by-step solution

Step 1 of 2
Overflow is handled by chaining: when a bucket is full, overflow buckets are chained to it, so a bucket may consist of several blocks linked together.

In a hash structure, insertion is done as follows:
Step 1: Each bucket j stores a value i_j. All the entries that point to the same bucket have the same values on the first i_j bits.
Step 2: To locate the bucket containing search key K_j, compute h(K_j) = X. Use the first i high-order bits of X as a displacement into the bucket address table and follow the pointer to the appropriate bucket.
Step 3: To insert a record with search key value K_j, follow the lookup procedure to locate the bucket, say j. If there is room in bucket j, insert the record. Otherwise the bucket must be split and the insertion reattempted.

Step 2 of 2
Deletion in a hash file: To delete a key value,
Step 1. Locate it in its bucket and remove it.
Step 2. The bucket itself can be removed if it becomes empty.
Step 3.
Coalescing of buckets is possible: a bucket can be coalesced only with a "buddy" bucket having the same value of i_j and the same i_j - 1 prefix, if such a bucket exists.

Assumptions:
- Each key in the record is unique.
- The data file is open.
- The overflow file is open.
- A bucket record has been defined.

Chapter 16, Problem 46E

Problem
Can you think of techniques other than chaining to handle bucket overflow in external hashing?
Step-by-step solution

Step 1 of 5
Besides chaining, bucket overflow in external hashing can be handled with a technique such as trie-based hashing. This technique:
- allows the number of allocated buckets to grow and shrink as needed;
- distributes records among buckets based on the values of the leading bits in their hash values.
The technique can be illustrated as follows, starting from a single bucket at some disk address.

Step 3 of 5
When the bucket (block) overflows, it is split based on the first binary digit of the hash address, so the address space is split into two buckets.

Step 5 of 5
When a bucket overflows again, it is split on the second bit of the hash address. Inserting a further record into the previous structure then changes the structure accordingly.

Chapter 16, Problem 47E

Problem
Write pseudocode for the insertion algorithms for linear hashing and for extendible hashing.

Step-by-step solution

Step 1 of 2
Pseudocode for the insertion algorithm:
We assume that the elements in the hash table T are keys with no additional information, so the key K is identical to the element containing key K. Every slot contains either a key or NIL. The table has m slots, and h(K, i) gives the i-th slot probed for key K.

HASH-INSERT(T, K)
    i = 0
    repeat
        j = h(K, i)
        if T[j] == NIL then
            T[j] = K
            return j
        else
            i = i + 1
    until i == m
    error "hash table overflow"

Step 2 of 2
Pseudocode for the insertion algorithm for extendible hashing:

Algorithm: initialize(numBuckets)
Input: desired number of buckets
    1. Initialize an array of linked lists.

Algorithm: insert(key, value)
Input: key-value pair
    // compute the table entry
    entry = key.hashCode() mod numBuckets
    if table[entry] is null
        // no list present, so create one
        table[entry] = new linked list
        table[entry].add(key, value)
    else
        // otherwise, add to the existing list
        table[entry].add(key, value)
    end if

Chapter 16, Problem 48E

Problem
Write program code to access individual fields of records under each of the following circumstances.
For each case, state the assumptions you make concerning pointers, separator characters, and so on. Determine the type of information needed in the file header in order for your code to be general in each case.
a. Fixed-length records with unspanned blocking
b. Fixed-length records with spanned blocking
c. Variable-length records with variable-length fields and spanned blocking
d. Variable-length records with repeating groups and spanned blocking
e. Variable-length records with optional fields and spanned blocking
f. Variable-length records that allow all three cases in parts c, d, and e

Step-by-step solution

Step 1 of 6
a. Consider the following program code for fixed-length records with unspanned blocking.

// initialize the initial address of the starting location using a pointer
*starting_location = 200;
// record to access
int x;
// x is the fifth record in the file
x = 5;
// y is the second field of the fifth record
y = 2;
// record size
R = 25;
// the for loop is used to check the value of byte
for (B = 0; B < 25; B++)
    // the while loop is used to check the bytes B remaining in each field
    while (B
    {
        x = starting_location + (R * x) + y;
    }

• In the above code, the starting memory address is assumed to be 200.
• In computer memory, records are stored in blocks.
• When the record size is less than the block size, each block stores more than one record.
• The block size is B bytes and the record size is R bytes.

Step 2 of 6
b. Consider the following program code for fixed-length records with spanned blocking.

// initialize the initial address of the starting location
*starting_location = 200;
// record to access
int x;
// x is the fifth record in the file
x = 5;
// y is the second field of the fifth record
y = 2;
// record size
R = 25;
// initialize the value of i
int i = 0;
// B is the block size
int B;
// a is the field size
int a = 1;
// the for loop is used to check the value of byte
for (B = 0; B < 25; B++)
{
    // the while loop is used to check for the separating character
    while ($)
    {
        // if the while loop finds the separating symbol, update the value of current_location
        current_location = current_location + 25 * B;
        // the while loop is used to check the bytes B remaining in each field
        while (B
        {
            // update the value of variable i
            i = i + 2 * (a + 1);
        }
    }
}

• In the above code, $ is used as the separator character.
• When the while loop finds the separating symbol, the value of current_location is updated.
• The value of variable i is then updated.

Step 3 of 6
c. Consider the following code for variable-length records with variable-length fields and spanned blocking.

// initialize the initial address of the starting location
*starting_location = 200;
// record to access
int x;
// x is the fifth record in the file
x = 5;
// y is the second field of the fifth record
y = 2;
// record size
R = 25;
// a is the field size
int a = 1;
// ReadFirstByte reads the first byte of the current line and returns true if it indicates an empty record
empty = ReadFirstByte(a);
// check whether the record is empty
if (!empty)
{
    // update the value of crnt_Rcrd_Length
    crnt_Rcrd_Length += a.length();
}
// check the value of crnt_Rcrd_Length
if (crnt_Rcrd_Length != R)
{
    empty = false;
}
// check the value of crnt_Rcrd_Length
if (crnt_Rcrd_Length > R)
{
    // not efficient, nor thread safe - deep copy occurs here
    records.push_back(*this);
}

• In the above code, assume that each record has an end-of-record byte.
• Move byte by byte to access the records.
• Each if statement checks the condition given in its comment.

Step 4 of 6
d. Consider the following code for variable-length records with repeating groups and spanned blocking.

if (!empty)
{
    // update the value of crnt_Rcrd_Length
    crnt_Rcrd_Length += a.length();
}

• The code shown above is the portion that is removed from part (c) to handle variable-length records with repeating groups and spanned blocking.
• Since spanned blocking involves records spanning more than one block, the record length is not required.

Step 5 of 6
e. Consider the following code for variable-length records with optional fields and spanned blocking.

if (crnt_Rcrd_Length != R)
{
    empty = false;
}

• The code shown above is the portion that is removed from part (c) to handle variable-length records with optional fields and spanned blocking.
• Because some of the fields in the file records are optional, the check on the record length of the records in the file can be skipped.

Step 6 of 6
f. Consider the following code for variable-length records that allow all three cases in parts c, d and e.

if (crnt_Rcrd_Length > R)
{
    // not efficient, nor thread safe - deep copy occurs here
    records.push_back(*this);
}

• The code shown above is the portion that is removed from part (c) to handle variable-length records that allow all three cases in parts c, d and e.
• One or more of the fields of the records in the file are of varying size, so their size need not be compared with R. Hence the above part of the code can be skipped.

Chapter 16, Problem 49E

Problem
Suppose that a file initially contains r = 120,000 records of R = 200 bytes each in an unsorted (heap) file. The block size B = 2,400 bytes, the average seek time s = 16 ms, the average rotational latency rd = 8.3 ms, and the block transfer time btt = 0.8 ms. Assume that 1 record is deleted for every 2 records added until the total number of active records is 240,000.
a. How many block transfers are needed to reorganize the file?
b. How long does it take to find a record right before reorganization?
c. How long does it take to find a record right after reorganization?

Step-by-step solution

Step 1 of 4
Let X = number of records deleted and 2X = number of records added.
So, total active records = 240,000 = 120,000 - X + 2X, which gives X = 120,000.
Deleted records are only marked, so the number of records physically in the file before reorganization = 120,000 + 2X = 360,000.

Step 2 of 4
(a) No.
of blocks for reorganization = blocks read + blocks written.
- 200 bytes/record and 2,400 bytes/block give 12 records per block.
- Reading involves 360,000 records: 360,000/12 = 30K blocks.
- Writing involves 240,000 records: 240,000/12 = 20K blocks.
Total blocks transferred during reorganization = 30K + 20K = 50K blocks.

Step 3 of 4
(b) On average, half the file is read. So, time = (b/2) * btt = 15,000 * 0.8 ms = 12,000 ms = 12 sec.

Step 4 of 4
(c) Time to locate a record after reorganization = (b/2) * btt = 10,000 * 0.8 ms = 8,000 ms = 8 sec.

Chapter 16, Problem 50E

Problem
Suppose we have a sequential (ordered) file of 100,000 records where each record is 240 bytes. Assume that B = 2,400 bytes, s = 16 ms, rd = 8.3 ms, and btt = 0.8 ms. Suppose we want to make X independent random record reads from the file. We could make X random block reads or we could perform one exhaustive read of the entire file looking for those X records. The question is to decide when it would be more efficient to perform one exhaustive read of the entire file than to perform X individual random reads. That is, what is the value for X when an exhaustive read of the file is more efficient than random X reads? Develop this as a function of X.

Step-by-step solution

Step 1 of 3
The records in the file are ordered sequentially.
Total number of records in the file (Tr) = 100,000.
Size of each record (rs) = 240 bytes.
Size of each block (B) = 2,400 bytes.
Average seek time (s) = 16 ms.
Average rotational latency (rd) = 8.3 ms.
Block transfer time (btt) = 0.8 ms.
Calculate the total number of blocks (TB) in the file using the formula TB = Tr / (B / rs) = 100,000 / (2,400 / 240) = 100,000 / 10.
Hence, the total number of blocks in the file (TB) = 10,000 blocks.

Step 2 of 3
Calculate the time required for an exhaustive read (er) using the formula er = s + rd + (TB * btt) = 16 + 8.3 + (10,000 * 0.8).
Hence, the time required for an exhaustive read (er) = 8024.3 ms.

Step 3 of 3
Let X be the number of records that need to be read.
One exhaustive read of the entire file is more efficient than X individual random reads when:
Time required to perform X individual random reads > time required for the exhaustive read
X * (s + rd + btt) > er
X * (16 + 8.3 + 0.8) > 8024.3
25.1 * X > 8024.3
X > 319.7
Therefore, when 320 or more individual random reads are required, it is better to read the file exhaustively.
The function in X that relates the individual random reads and the exhaustive read is given by the following inequality:
X * (s + rd + btt) > s + rd + (TB * btt), that is, 25.1 * X > 8024.3

Chapter 16, Problem 51E

Problem
Suppose that a static hash file initially has 600 buckets in the primary area and that records are inserted that create an overflow area of 600 buckets. If we reorganize the hash file, we can assume that most of the overflow is eliminated. If the cost of reorganizing the file is the cost of the bucket transfers (reading and writing all of the buckets) and the only periodic file operation is the fetch operation, then how many times would we have to perform a fetch (successfully) to make the reorganization cost effective? That is, the reorganization cost and subsequent search cost are less than the search cost before reorganization. Support your answer. Assume s = 16 msec, rd = 8.3 msec, and btt = 1 msec.

Step-by-step solution

Step 1 of 1
Primary area = 600 buckets
Overflow area = 600 buckets
Total reorganization cost = buckets read + buckets written = (600 + 600) + 1200 = 2400 buckets = 2400 * (1 ms) = 2400 ms
Let X = number of random fetches from the file.
Average search time per fetch = time to access (1 + 1/2) buckets, where 50% of the time we need to access the overflow bucket.
Access time for one bucket access = (s + rd + btt) = 16 + 8.3 + 1 = 25.3 ms
Time with reorganization for the X fetches = 2400 + X * (25.3) ms
Time without reorganization for the X fetches = X * (25.3) * (1 + 1/2) ms = 1.5 * X * (25.3) ms
So, 2400 + X * (25.3) < 1.5 * X * (25.3)
2400 < 12.65 * X
2400 / 12.65 < X
So, 189.72 < X
If we perform at least 190 fetches, then the reorganization is worthwhile.
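The break-even reasoning for Problem 51E can be checked with a short Python sketch. The function name `break_even_fetches` and its parameter names are hypothetical, chosen only for illustration. The sketch assumes, as the solution does, that reorganization is charged one block transfer time per bucket read or written, that each fetch is a random bucket access costing s + rd + btt, and that btt = 1 ms as given in the problem statement.

```python
import math


def break_even_fetches(primary_buckets, overflow_buckets, s_ms, rd_ms, btt_ms,
                       overflow_probability=0.5):
    """Return the smallest number of fetches for which reorganizing pays off."""
    total_buckets = primary_buckets + overflow_buckets
    # Reorganization reads every bucket once and writes every bucket once,
    # each charged at one block transfer time (assumption from the solution).
    reorg_cost_ms = 2 * total_buckets * btt_ms
    # One random bucket access pays seek + rotational delay + transfer.
    access_ms = s_ms + rd_ms + btt_ms
    # Before reorganization a fetch touches (1 + p) buckets on average;
    # afterwards it touches exactly one, so each fetch saves p accesses.
    saving_per_fetch_ms = overflow_probability * access_ms
    # Need reorg_cost + X * access < X * (1 + p) * access,
    # i.e. X > reorg_cost / saving_per_fetch.
    return math.floor(reorg_cost_ms / saving_per_fetch_ms) + 1


print(break_even_fetches(600, 600, 16, 8.3, 1))
```

Raising `overflow_probability` models a file in which more fetches have to follow the overflow chain, which lowers the number of fetches needed before reorganization becomes worthwhile.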
Chapter 16, Problem 52E

Problem
Suppose we want to create a linear hash file with a file load factor of 0.7 and a blocking factor of 20 records per bucket, which is to contain 112,000 records initially.
a. How many buckets should we allocate in the primary area?
b. What should be the number of bits used for bucket addresses?

Step-by-step solution

Step 1 of 2
(a) Number of buckets in the primary area = 112,000 / (20 * 0.7) = 8000.

Step 2 of 2
(b) Let K be the number of bits used for bucket addresses.
So, 2^K <= 8000 <= 2^(K+1)
2^12 = 4096
2^13 = 8192
K = 12
Boundary value = 8000 - 2^12 = 8000 - 4096 = 3904

Chapter 17, Problem 1RQ

Problem
Define the following terms: indexing field, primary key field, clustering field, secondary key field, block anchor, dense index, and nondense (sparse) index.

Step-by-step solution

Step 1 of 1
Indexing field: A record structure consists of several fields, and record fields are used to construct an index. An index access structure is usually defined on a single field of a file, called the indexing field. Any field in a file can be used to create an index, and multiple indexes on different fields can be constructed on a file.

Primary key field: A primary key is the ordering key field of the file, that is, a field that uniquely identifies a record.

Clustering field: If the records of a file are physically ordered on a non-key field, which does not have a distinct value for each record, that field is called the clustering field.

Secondary key field: A secondary index is also an ordered file with two fields (like a primary index). The first field is of the same data type as some non-ordering field of the data file that is an indexing field. If the secondary access structure uses a key field, which has a distinct value for every record, that field is called a secondary key field.

Block anchor: The total number of entries in a primary index is the same as the number of disk blocks in the ordered data file. The first record in each block of the data file is called the block anchor.

Dense index: An index that has an index entry for every search key value (and hence every record) in the data file.
Each index record contains the search key value and a pointer to the record on the disk.

Nondense (sparse) index: An index that has entries for only some of the search values.

Chapter 17, Problem 2RQ

Problem
What are the differences among primary, secondary, and clustering indexes? How do these differences affect the ways in which these indexes are implemented? Which of the indexes are dense, and which are not?

Step-by-step solution

Step 1 of 1
Differences among primary, secondary, and clustering indexes:

Chapter 17, Problem 3RQ

Problem
Why can we have at most one primary or clustering index on a file, but several secondary indexes?

Step-by-step solution

Step 1 of 2
A primary index is defined on an ordered file of fixed-size records, using the ordering key field. A clustering index is also defined on an ordered file, but on a non-key ordering field; each index entry holds a value of the clustering field and a block pointer. Because the data records are physically ordered, adding or removing records in the file is not easy. To alleviate this problem, a whole block can be reserved for each value of the clustering field.

Step 2 of 2
A secondary index is defined on a field on which the file is not ordered. It can be defined on a key field with a unique value in every record, or on a non-key field with repeated values.
The reason why a file can have at most one primary or clustering index but several secondary indexes is the following:
• Primary and clustering indexes both rely on the physical ordering of the records, and a file can be physically ordered on only one field, so the two cannot both exist on a file. A secondary index does not depend on the physical order: it can be built on a key field with a unique value in every record, or on a non-key field with repeated values, in which case the index pointers point to another block that holds pointers to the records with the repeated values.

Chapter 17, Problem 4RQ

Problem
How does multilevel indexing improve the efficiency of searching an index file?
Step-by-step solution

Step 1 of 4
Multilevel indexing improves the efficiency of searching an index file.
• In multilevel indexing, the main idea is to reduce the part of the index that must still be searched at each step.
• Each step reduces it by a factor of bfri, the blocking factor of the index, so the search space is reduced much faster than with binary search.
• A multilevel index treats the index file, referred to as the first level, as an ordered file with a distinct value for each key K(i).
• A primary index is created on the single-level (first-level) index, giving the second level; the third level is created on the second level, and so on.
• In this way the multilevel index is built up until the top level fits in a single index block.

Step 2 of 4
To improve the efficiency of searching the index file, a multilevel index is built in the following steps:
Step 1:
• The multilevel index considers the index file, called the first level, as an ordered file with a distinct value for each key K(i).
Step 2:
• Create a primary index for the first level.
• This index to the first level is called the second level of the multilevel index.
• Because the second level is a primary index, block anchors can be used.
• So, the second level has one entry for each block of the first level.
• The blocking factor bfri of the second level is the same as that of the first level, because all index entries are the same size.
• If the first level has r1 entries and the blocking factor (fan-out) is fo = bfri, then the first level needs ceil(r1 / fo) blocks.
• This is the number r2 of entries needed at the second level of the index.
Step 3:
• The next (third) level is a primary index for the second level and has one entry for each second-level block. So, the number of entries in the third level is r3 = ceil(r2 / fo).
• Repeat the process until all the entries of some index level t fit in a single block.
• This block at the t-th level is the top index level.
• Each level reduces the number of entries at the previous level by a factor of fo.

Step 3 of 4
The number of levels t is calculated from the condition 1 <= r1 / (fo)^t.
Hence a multilevel index with r1 first-level entries has approximately t levels, where t = ceil(log_fo(r1)).
Following the above steps and process improves the efficiency of searching an index file.
Step 4 of 4
Multilevel indexing improves the efficiency of searching an index file in the following ways:
• It reduces the number of blocks accessed when searching for a record given its indexing field value.
• Dynamic multilevel indexes reduce the insertion and deletion problems of indexing.
• Some space is left in each block for inserting new entries, which is an advantage for developers who adopt multilevel indexing.
• It is often implemented by using B-trees and B+-trees.

Chapter 17, Problem 5RQ

Problem
What is the order p of a B-tree? Describe the structure of B-tree nodes.

Step-by-step solution

Step 1 of 2
Order p of a B-tree:
In a B-tree of order p, each node contains at most p - 1 search values and p pointers, in the order ⟨P1, K1, P2, K2, ..., Pq-1, Kq-1, Pq⟩, where q <= p. Here each Pi is a pointer to a child node and each Ki is a search value from some ordered set of values.

Step 2 of 2
Structure of the B-tree:
The structure of a B-tree of order p satisfies the following conditions.
Step 1: Each internal node in the B-tree is of the form ⟨P1, ⟨K1, Pr1⟩, P2, ⟨K2, Pr2⟩, ..., ⟨Kq-1, Prq-1⟩, Pq⟩, where q <= p. Here each Pi is a tree pointer and each Pri is a data pointer to the record whose search key value is equal to Ki.
Step 2: Within each node, K1 < K2 < ... < Kq-1.
Step 3: For all search key field values X in the subtree pointed at by Pi, we have Ki-1 < X < Ki for 1 < i < q, X < Ki for i = 1, and Ki-1 < X for i = q.
Step 4: Each node has at most p tree pointers.
Step 5: Each node, except the root and leaf nodes, has at least ceil(p/2) tree pointers. The root node has at least two tree pointers unless it is the only node in the tree.
Step 6: A node with q tree pointers, q <= p, has q - 1 search key field values.
Step 7: All leaf nodes are at the same level. Leaf nodes have the same structure as internal nodes except that all of their tree pointers Pi are NULL.

Chapter 17, Problem 6RQ

Problem
What is the order p of a B+-tree? Describe the structure of both internal and leaf nodes of a B+-tree.

Step-by-step solution

Step 1 of 4
Order p of a B+-tree:
Most implementations of a dynamic multilevel index use a variation of the B-tree data structure called a B+-tree.
Structure of the internal nodes of a B+-tree:

Step 3 of 4
Step 1: Each internal node is of the form ⟨P1, K1, P2, K2, ..., Pq-1, Kq-1, Pq⟩, where q <= p and each Pi is a tree pointer.
Step 2: Within each internal node, K1 < K2 < ... < Kq-1.
Step 3: For all search field values X in the subtree pointed at by Pi, we have Ki-1 < X <= Ki for 1 < i < q, X <= Ki for i = 1, and Ki-1 < X for i = q.
Step 4: Each internal node has at most p tree pointers.
Step 5: Each internal node, except the root, has at least ceil(p/2) tree pointers. The root node has at least two tree pointers if it is an internal node.
Step 6: An internal node with q pointers, q <= p, has q - 1 search field values.

Structure of the leaf nodes of a B+-tree:

Step 4 of 4
Step 1: Each leaf node is of the form ⟨⟨K1, Pr1⟩, ⟨K2, Pr2⟩, ..., ⟨Kq-1, Prq-1⟩, Pnext⟩, where q <= p, each Pri is a data pointer, and Pnext points to the next leaf node of the B+-tree.
Step 2: Within each leaf node, K1 <= K2 <= ... <= Kq-1, q <= p.
Step 3: Each Pri is a data pointer that points to the record whose search field value is Ki, or to a file block containing the record.
Step 4: Each leaf node has at least ceil(p/2) values.
Step 5: All leaf nodes are at the same level.

Chapter 17, Problem 7RQ

Problem
How does a B-tree differ from a B+-tree? Why is a B+-tree usually preferred as an access structure to a data file?

Step-by-step solution

Step 1 of 1
The main difference between a B-tree and a B+-tree is that a B-tree has data pointers in both internal and leaf nodes, whereas a B+-tree has only tree pointers in internal nodes and all data pointers are in leaf nodes.
A B+-tree is preferred as an access structure to a data file because, without data pointers, more entries fit in each internal node of a B+-tree, leading to fewer levels and improving the search time. In addition, the entire tree can be traversed in order using the Pnext pointers.

Chapter 17, Problem 8RQ

Problem
Explain what alternative choices exist for accessing a file based on multiple search keys.

Step-by-step solution

Step 1 of 3
The choices for accessing a file based on multiple fields are:
1.
Ordered Index on Multiple Attributes: Here an index is created on a search key field that is a combination of attributes (A1, A2, ..., An). If an index is created on attributes (A1, A2, ..., An), the search key values are tuples with n values: (v1, v2, ..., vn). A lexicographic ordering of these tuple values establishes an order on this composite search key. Lexicographic ordering works similarly to the ordering of character strings. An index on a composite key of n attributes works similarly to primary or secondary indexing.

Step 2 of 3
2. Partitioned Hashing: Partitioned hashing is an extension of static external hashing that allows access on multiple keys. It is only suitable for equality comparisons; range queries are not supported. In partitioned hashing, for a key consisting of n components, the hash function is designed to produce a result with n separate hash addresses. The bucket address is a concatenation of these n addresses. It is then possible to search for the required composite search keys by looking up the appropriate buckets that match the parts of the address in which we are interested.
For example, consider the composite search key (Dno, Age). If Dno is hashed to 3 bits and Age to 5 bits, we get 8 bits of bucket address. Suppose that a given Dno value has hash address '100' and Age = 59 has hash address '10101'. To search for this combination, search bucket address 10010101.
An advantage of partitioned hashing is that it can be easily extended to any number of attributes. The bucket address can be designed so that the high-order bits in the address correspond to more frequently accessed attributes. Additionally, no separate access structure needs to be maintained for the individual attributes. The main drawback of partitioned hashing is that it cannot handle range queries on any of the component attributes.

Step 3 of 3
3. Grid Files: We can construct a grid array with one linear scale for each of the search attributes.
This method is particularly useful for range queries, which map into a set of cells corresponding to a group of values along the linear scales. Conceptually, the grid file concept may be applied to any number of search keys. For n search keys, the grid array has n dimensions, one per search key attribute, and provides access by combinations of values along those dimensions. Grid files perform well in terms of reducing the time for multiple-key access. However, they carry a space overhead for the grid array structure. Moreover, with dynamic files, frequent reorganization of the file adds to the maintenance cost.
Chapter 17, Problem 9RQ
Problem
What is partitioned hashing? How does it work? What are its limitations?
Step-by-step solution
Step 1 of 1
Partitioned hashing: It is an extension of static external hashing that allows access on multiple keys. Hash values are split into segments, each of which depends on one attribute of the search key.
As an example, let n = 2 for a customer file, with the search key being (customer-street, customer-city):
Search-key value    Hash value
(Main, )            101111
(Main, )            101001
(Park, )            010010
(Spring, )          001001
(, )                110010
How partitioned hashing works: For a key consisting of n components, the hash function is designed to produce a result with n separate hash addresses. The bucket address is the concatenation of these n addresses. It is then possible to search for the required composite search key by looking up the appropriate buckets that match the parts of the address in which we are interested.
Advantages of partitioned hashing: It can be easily extended to any number of attributes, and it needs no separate access structure for the individual attributes.
Limitation of partitioned hashing: It cannot handle range queries on any of the component attributes.
Chapter 17, Problem 11RQ
Problem
Show an example of constructing a grid array on two attributes on some file.
Step-by-step solution
Step 1 of 2
Take a grid array for the EMPLOYEE file with one linear scale for Dno and another for the Age attribute.
Linear scale for Dno:
Scale value    Dno
0              1, 2
1              3, 4
2              5
3              6, 7
4              8
5              9, 10
Step 2 of 2
Linear scale for Age:
Scale value    Age
0              <20
1              21-25
2              26-30
3              31-40
4              41-50
5              >50
The linear scale for Dno combines Dno = 1 and Dno = 2 into the single value 0 on the scale, while Dno = 5 corresponds to the value 2 on the scale. Age is divided into the scale values 0 to 5 by grouping ages so that the employees are distributed uniformly by age. The grid array for this file therefore has 36 cells, and each cell points to some bucket address where the records corresponding to that cell are stored. A request for Dno = 4 and Age = 59 maps into cell (1, 5) of the grid array, and the matching records will be found in the corresponding bucket. For n search keys, the grid array would have n dimensions.
Figure: Grid array on the Dno and Age attributes of the EMPLOYEE file.
Chapter 17, Problem 12RQ
Problem
What is a fully inverted file? What is an indexed sequential file?
Step-by-step solution
Step 1 of 2
Fully inverted file: A file that has a secondary index on every one of its fields is called a fully inverted file. Because all the indexes are secondary, new records are inserted at the end of the file, so the data file itself is an unordered file. The indexes are usually implemented as B+-trees, so they are updated dynamically to reflect the insertion or deletion of records.
Step 2 of 2
Indexed sequential file: An indexed sequential file is a sequential file that has an index. A sequential file is one that is stored in order of a key field. Indexed sequential files are important for applications where data needs to be accessed both sequentially and randomly using the index. An indexed sequential file allows fast access to a specific record. Consider an example.
An organization may store the details about its employees as an indexed sequential file. Sometimes the file is accessed sequentially, for example when the whole file is processed to produce pay slips at the end of the month. Sometimes it is accessed randomly, for example when an employee changes address or changes surname after marriage. An indexed sequential file can therefore only be stored on a random access device, for example a magnetic disk or a CD.
Chapter 17, Problem 13RQ
Problem
How can hashing be used to construct an index?
Step-by-step solution
Step 1 of 1
Hashing is a search technique used where fast retrieval of records is necessary. The file used for this is known as a hash file. The search condition is an equality condition on the hash key, which is the field value to be found.
Functions of hashing:
• A hash function f, also called a randomizing function, is applied to the hash field value of a record and determines the address of that record.
• Hashing is also used as an internal search structure within a program whenever a group of records is accessed by using the value of only one field.
• Access structures similar to indexes can be based on hashing; a hash index is a secondary structure that accesses the file by applying a hash function to a search key.
• The index entries contain the key (K) and a pointer (P) to the record containing the key, or to the block containing the record for that key.
• The index file that contains these index entries can be organized as a dynamically expandable hash file, using dynamic, linear, or extendible hashing techniques; searching for an entry is performed by using the hash search algorithm on K.
• Once an entry is found, the pointer (P) is used to locate the corresponding record in the data file.
Chapter 17, Problem 14RQ
Problem
What is bitmap indexing? Create a relation with two columns and sixteen tuples and show an example of a bitmap index on one or both.
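As a rough illustration of the bitmap index asked for in the problem just stated, the following Python sketch builds one bit string per distinct value of a column. The function name `build_bitmaps` is hypothetical, and the 16 rows are the sample (Customer Name, Gender) relation used in this solution; row j of the relation corresponds to bit j of each bitmap, counting from the left.

```python
def build_bitmaps(column_values):
    """Return {value: bit string}; bit j is '1' when row j holds that value."""
    n = len(column_values)
    bitmaps = {}
    for j, value in enumerate(column_values):
        # Create an all-zero bitmap the first time a value is seen.
        bits = bitmaps.setdefault(value, ["0"] * n)
        bits[j] = "1"
    return {value: "".join(bits) for value, bits in bitmaps.items()}


# Sample relation (Customer Name, Gender), assumed for illustration.
rows = [
    ("Alan", "M"), ("Clara", "F"), ("John", "M"), ("Benjamin", "M"),
    ("Marcus", "M"), ("Alice", "F"), ("Joule", "F"), ("Louis", "M"),
    ("Samuel", "M"), ("Lara", "F"), ("Andy", "F"), ("Martin", "M"),
    ("Catherine", "F"), ("Fuji", "F"), ("Zarain", "F"), ("Ford", "M"),
]

gender_bitmaps = build_bitmaps([gender for _, gender in rows])
for value, bitmap in sorted(gender_bitmaps.items()):
    print(value, bitmap)
```

Because each distinct value gets its own n-bit string, this layout only pays off for columns with few distinct values, which is the condition stated for bitmap indexes.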
Step-by-step solution
Step 1 of 1
A bitmap index is a data structure that allows querying on multiple keys.
• It is used for relations that contain a large number of rows, so that the rows holding a specific key value can be identified quickly.
• It creates an index for one or more columns, and each value or value range in the selected columns is indexed.
• A bitmap index is created for columns that contain only a small number of unique values.
Construction
• To create a bitmap index for a set of records in a relation or table, the records must be numbered from 0 to n with an id that can be mapped to a physical address made up of a block number and a record offset within the block.
• A bitmap is created for one particular value of a particular field (or column) as an array of bits.
• For example, a bitmap index is constructed for the column F and a value V of that column. For a relation with n rows, the bitmap contains n bits. The jth bit is set to 1 if row j has the value V for column F; otherwise it is set to 0.
Example
S.No    Customer Name    Gender
1       Alan             M
2       Clara            F
3       John             M
4       Benjamin         M
5       Marcus           M
6       Alice            F
7       Joule            F
8       Louis            M
9       Samuel           M
10      Lara             F
11      Andy             F
12      Martin           M
13      Catherine        F
14      Fuji             F
15      Zarain           F
16      Ford             M
Bitmap index for Gender
For M: 1011100110010001. The bit is set to 1 for each row whose Gender value is M and to 0 for every other row.
For F: 0100011001101110. The bit is set to 1 for each row whose Gender value is F and to 0 for every other row.
Chapter 17, Problem 15RQ
Problem
What is the concept of function-based indexing? What additional purpose does it serve?
Step-by-step solution
Step 1 of 1
Function-based indexing is a newer type of indexing that has been developed and used by the Oracle database system as well as in some other commercial products.
An index is created in which the key is the result of applying a function to the value of a field or of a collection of fields. This ensures that the Oracle database system will use the index for the search instead of performing a full table scan, even when a function is used in the search predicate of a query.
For example, the following statement creates an index using the function LOWER(CustomerName):
CREATE INDEX lower_ix ON Customer (LOWER(CustomerName));
LOWER returns the customer name in lowercase letters; LOWER('MARTIN') results in 'martin'. The query given below uses the index:
SELECT CustomerName FROM Customer WHERE LOWER(CustomerName) = 'martin';
If function-based indexing is not used, the Oracle database system scans the entire table, because a B+-tree index is searched directly by the column value only, so applying any function to a column prevents such an index from being used.
Chapter 17, Problem 16RQ
Problem
What is the difference between a logical index and a physical index?
Step-by-step solution
Step 1 of 1
Physical index
• The index entries contain the key (K) and a physical pointer (P) that points to the physical address of the record stored on disk, given as a block number and offset. This is referred to as a physical index.
• For example, if the primary file organization is based on extendible or linear hashing, then each time a bucket is split, some of the records are allocated to a new bucket and hence are given new physical addresses.
• If a secondary index is used on the file, the pointers to each moved record must be found and updated, which is a difficult task.
Logical index
• The index entries of a logical index are pairs of keys K and Ks.
• Each entry pairs a value Ks of the secondary indexing field with the value K of the field used for the primary organization of the file.
• While searching the secondary index on the value of Ks, a program can locate the corresponding value of K and use it to access the record through the primary organization of the file. A logical index thus introduces an extra level of indirection between the access structure and the data.
Chapter 17, Problem 17RQ
Problem
What is column-based storage of a relational database?
Step-by-step solution
Step 1 of 1
Column-based storage of relations is an alternative to the traditional way of storing relations row by row. It stores each column of data in a relational database individually and provides performance advantages, especially for read-only queries on read-only databases.
Advantages
• The table is partitioned vertically, column by column, so that a two-column table is constructed for every attribute of the table and only the columns that are needed are accessed.
• Column-wise indexes and join indexes on multiple tables are used to answer queries without accessing the data tables.
• Materialized views are used to support queries on multiple columns.
Column-wise storage of data also provides extra freedom in index creation: the same column may be present in a number of projections of a table, and indexes can be created on each projection. For storing the values of the same column, various strategies such as data compression, null-value suppression, and dictionary encoding techniques are used.
Chapter 17, Problem 18E
Problem
Consider a disk with block size B = 512 bytes. A block pointer is P = 6 bytes long, and a record pointer is PR = 7 bytes long. A file has r = 30,000 EMPLOYEE records of fixed length.
Each record has the following fields: Name (30 bytes), Ssn (9 bytes), Department_code (9 bytes), Address (40 bytes), Phone (10 bytes), Birth_date (8 bytes), Sex (1 byte), Job_code (4 bytes), and Salary (4 bytes, real number). An additional byte is used as a deletion marker. a. Calculate the record size R in bytes. b. Calculate the blocking factor bfr and the number of file blocks b, assuming an unspanned organization. c. Suppose that the file is ordered by the key field Ssn and we want to construct a primary index on Ssn. Calculate (i) the index blocking factor bfri (which is also the index fan-out fo); (ii) the number of first-level index entries and the number of first-level index blocks; (iii) the number of levels needed if we make it into a multilevel index; (iv) the total number of blocks required by the multilevel index; and (v) the number of block accesses needed to search for and retrieve a record from the file—given its Ssn value—using the primary index. d. Suppose that the file is not ordered by the key field Ssn and we want to construct a secondary index on Ssn. Repeat the previous exercise (part c) for the secondary index and compare with the primary index. e. Suppose that the file is not ordered by the nonkey field Department_code and we want to construct a secondary index on Department_code, using option 3 of Section 17.1.3, with an extra level of indirection that stores record pointers. Assume there are 1,000 distinct values of Department_code and that the EMPLOYEE records are evenly distributed among these values. 
Calculate (i) the index blocking factor bfri (which is also the index fan-out fo); (ii) the number of blocks needed by the level of indirection that stores record pointers; (iii) the number of first-level index entries and the number of first-level index blocks; (iv) the number of levels needed if we make it into a multilevel index; (v) the total number of blocks required by the multilevel index and the blocks used in the extra level of indirection; and (vi) the approximate number of block accesses needed to search for and retrieve all records in the file that have a specific Department_code value, using the index. f. Suppose that the file is ordered by the nonkey field Department_code and we want to construct a clustering index on Department_code that uses block anchors (every new value of Department_code starts at the beginning of a new block). Assume there are 1,000 distinct values of Department_code and that the EMPLOYEE records are evenly distributed among these values. Calculate (i) the index blocking factor bfri (which is also the index fan-out fo); (ii) the number of first-level index entries and the number of first-level index blocks; (iii) the number of levels needed if we make it into a multilevel index; (iv) the total number of blocks required by the multilevel index; and (v) the number of block accesses needed to search for and retrieve all records in the file that have a specific Department_code value, using the clustering index (assume that multiple blocks in a cluster are contiguous). g. Suppose that the file is not ordered by the key field Ssn and we want to construct a B+-tree access structure (index) on Ssn. 
Calculate (i) the orders p and pleaf of the B+-tree; (ii) the number of leaf-level blocks needed if blocks are approximately 69% full (rounded up for convenience); (iii) the number of levels needed if internal nodes are also 69% full (rounded up for convenience); (iv) the total number of blocks required by the B+-tree; and (v) the number of block accesses needed to search for and retrieve a record from the file, given its Ssn value, using the B+-tree. h. Repeat part g, but for a B-tree rather than for a B+-tree. Compare your results for the B-tree and for the B+-tree.
Step-by-step solution
Step 1 of 31
Disk operations on the file using primary, secondary, clustering, B+-tree, and B-tree access methods
(a) Calculation of record size
Record size R = Name + Ssn + Department_code + Address + Phone + Birth_date + Sex + Job_code + Salary + 1 (one byte for the deletion marker)
$R = 30 + 9 + 9 + 40 + 10 + 8 + 1 + 4 + 4 + 1 = 116$ bytes
Step 2 of 31
(b) Calculation of blocking factor and number of file blocks
Blocking factor $bfr = \lfloor B/R \rfloor = \lfloor 512/116 \rfloor = 4$ records per block
Number of file blocks $b = \lceil r/bfr \rceil = \lceil 30000/4 \rceil = 7500$ blocks
Step 3 of 31
(c) Operations on the file ordered by the key field Ssn
(i) Calculation of index blocking factor
Index record length $R_i = V_{Ssn} + P = 9 + 6 = 15$ bytes
Index blocking factor $bfr_i = fo = \lfloor B/R_i \rfloor = \lfloor 512/15 \rfloor = 34$ index records per block
Step 4 of 31
(ii) Calculation of number of first-level index entries and number of first-level index blocks
Number of first-level index entries $r_1$ = number of file blocks $b$ = 7500 entries
Number of first-level index blocks $b_1 = \lceil r_1/bfr_i \rceil = \lceil 7500/34 \rceil = 221$ blocks
Step 5 of 31
(iii) Calculation of number of levels for the multilevel index
Number of second-level index entries $r_2$ = number of first-level index blocks $b_1$ = 221 entries
Number of second-level index blocks $b_2 = \lceil 221/34 \rceil = 7$ blocks
Number of third-level index entries $r_3$ = number of second-level index blocks $b_2$ = 7 entries
Number of third-level index blocks $b_3 = \lceil 7/34 \rceil = 1$ block
It is the top index level because the third level has only one block.
Hence, the index has x = 3 levels.
(iv) Calculation of number of blocks for the multilevel index
From part (ii), the number of first-level index blocks is $b_1 = 221$ blocks. From part (iii), the number of second-level index blocks is $b_2 = 7$ blocks and the number of third-level index blocks is $b_3 = 1$ block.
Therefore, the total number of blocks for the index is $b_i = b_1 + b_2 + b_3 = 221 + 7 + 1 = 229$ blocks.
Step 6 of 31
(v) Calculation of number of block accesses to search for and retrieve a record using the primary index
Since the file is ordered on the single key field Ssn, this is a primary index. For a primary index, the number of block accesses equals one block at each index level plus one block from the data file.
Number of block accesses to search for a record $= x + 1 = 3 + 1 = 4$
Step 7 of 31
(d) Repetition of part (c) for the secondary index
(i) Index record length $R_i = V_{Ssn} + P = 9 + 6 = 15$ bytes
Index blocking factor $bfr_i = fo = \lfloor B/R_i \rfloor = \lfloor 512/15 \rfloor = 34$ index records per block
As in part (c), this assumes that the leaf-level index blocks contain block pointers. It is also possible to assume that they contain record pointers, in which case the index record size is $V_{Ssn} + P_R = 9 + 7 = 16$ bytes and the leaf-level index blocking factor is $\lfloor 512/16 \rfloor = 32$ index records per block. For internal nodes, block pointers are always used, so the fan-out for internal nodes is 34.
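The part (c) arithmetic can be checked with a short script. This is a minimal sketch under the problem's stated sizes; the helper name `index_levels` is hypothetical and simply repeats the "divide by the fan-out and round up" step until a single top-level block remains. The same helper applies to a dense secondary index on Ssn by starting from r entries instead of b.

```python
from math import ceil

B, P = 512, 6                       # block size and block pointer size (bytes)
FIELD_SIZES = [30, 9, 9, 40, 10, 8, 1, 4, 4]  # Name ... Salary, from the problem
V_SSN = 9                           # size of the Ssn search key (bytes)
r = 30_000                          # number of EMPLOYEE records

R = sum(FIELD_SIZES) + 1            # record size, +1 for the deletion marker
bfr = B // R                        # records per block (unspanned)
b = ceil(r / bfr)                   # number of file blocks
fan_out = B // (V_SSN + P)          # index entries per block


def index_levels(first_level_entries, fan_out):
    """Blocks at each index level, bottom level first, top level last."""
    levels = []
    entries = first_level_entries
    while True:
        blocks = ceil(entries / fan_out)
        levels.append(blocks)
        if blocks == 1:
            return levels
        entries = blocks            # one entry per block of the level below


primary = index_levels(b, fan_out)      # one entry per file block
secondary = index_levels(r, fan_out)    # one entry per record (dense)

print("R, bfr, b, fan-out:", R, bfr, b, fan_out)
print("primary levels:", primary, "total:", sum(primary),
      "accesses:", len(primary) + 1)
print("secondary levels:", secondary, "total:", sum(secondary),
      "accesses:", len(secondary) + 1)
```

Both indexes end up with three levels here; the secondary index is larger only because its first level has one entry per record rather than one per block.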
Step 8 of 31
(ii) Number of first-level index entries $r_1$ = number of file records $r$ = 30,000 entries
Number of first-level index blocks $b_1 = \lceil r_1/bfr_i \rceil = \lceil 30000/34 \rceil = 883$ blocks
Step 9 of 31
(iii) Calculation of the number of levels
Number of second-level index entries $r_2$ = number of first-level index blocks $b_1$ = 883 entries
Number of second-level index blocks $b_2 = \lceil 883/34 \rceil = 26$ blocks
Number of third-level index entries $r_3$ = number of second-level index blocks $b_2$ = 26 entries
Number of third-level index blocks $b_3 = \lceil 26/34 \rceil = 1$ block
The third level has one block, so it is the top level, and the index has x = 3 levels in total.
(iv) Total number of blocks for the index $b_i = b_1 + b_2 + b_3 = 883 + 26 + 1 = 910$ blocks
(v) Number of block accesses to search for a record $= x + 1 = 3 + 1 = 4$
Step 10 of 31
(e) Operations on the file using a secondary index on Department_code
(i) Calculation of index blocking factor
Index record size $R_i = V_{Department\_code} + P = 9 + 6 = 15$ bytes
Index blocking factor $bfr_i = fo = \lfloor B/R_i \rfloor = \lfloor 512/15 \rfloor = 34$ index records per block
Step 11 of 31
(ii) Calculation of number of blocks for the level of indirection
There are 1,000 distinct values of Department_code, so the number of records for each value is $30000/1000 = 30$.
The record pointer size is $P_R = 7$ bytes, so the number of bytes needed at the level of indirection for each value of Department_code is $7 \times 30 = 210$ bytes, which fits in one block.
So, 1000 blocks are needed for the level of indirection.
Step 12 of 31
(iii) Calculation of number of first-level index entries and number of first-level index blocks
Number of first-level index entries $r_1$ = number of distinct values of Department_code = 1000 entries
Number of first-level index blocks $b_1 = \lceil r_1/bfr_i \rceil = \lceil 1000/34 \rceil = 30$ blocks
Step 13 of 31
(iv) Calculation of number of levels for the multilevel index
Number of second-level index entries $r_2$ = number of first-level index blocks $b_1$ = 30 entries
Number of second-level index blocks $b_2 = \lceil 30/34 \rceil = 1$ block
The second level has one block and is the top index level, so the index has x = 2 levels.
Step 14 of 31
(v) Calculation of number of blocks for the multilevel index
Total number of blocks for the index and the level of indirection $= b_1 + b_2 + b_{indirection} = 30 + 1 + 1000 = 1031$ blocks
Step 15 of 31
(vi) Calculation of number of block accesses to search for and retrieve all records in the file for a Department_code value
Number of block accesses to search for and retrieve the block containing the record pointers at the level of indirection $= x + 1 = 2 + 1 = 3$
If the 30 records are distributed over 30 distinct blocks, we need an additional 30 block accesses.
So, the total number of block accesses needed on average to retrieve all the records with a given value of Department_code is $x + 1 + 30 = 33$.
Step 16 of 31
(f) Operations on the file using a clustering index on Department_code
(i) Calculation of index blocking factor
Index blocking factor $bfr_i = fo = \lfloor B/R_i \rfloor = \lfloor 512/15 \rfloor = 34$ index records per block, where $R_i = V_{Department\_code} + P = 9 + 6 = 15$ bytes
Step 17 of 31
(ii) Calculation of number of first-level index entries and number of first-level index blocks
Number of first-level index entries $r_1$ = number of distinct Department_code values = 1000 entries.
Number of first-level index blocks $b_1 = \lceil r_1/bfr_i \rceil = \lceil 1000/34 \rceil = 30$ blocks
Step 18 of 31
(iii) Calculation of number of levels for the multilevel index
Number of second-level index entries $r_2$ = number of first-level index blocks $b_1$ = 30 entries
Number of second-level index blocks $b_2 = \lceil 30/34 \rceil = 1$ block
The second level has one block and is the top index level, so the index has x = 2 levels.
Step 19 of 31
(iv) Calculation of number of blocks for the multilevel index
Total number of blocks for the index $b_i = b_1 + b_2 = 30 + 1 = 31$ blocks
Step 20 of 31
(v) Calculation of number of block accesses to search for and retrieve all records in the file for a Department_code value
Number of block accesses to search for the first block in the cluster of blocks $= x + 1 = 2 + 1 = 3$
The 30 records with a given value are clustered in $\lceil 30/bfr \rceil = \lceil 30/4 \rceil = 8$ blocks.
So, the total number of block accesses needed on average to retrieve all the records with a given Department_code is $x + 8 = 2 + 8 = 10$.
Step 21 of 31
(g) Operations on the B+-tree
(i) Calculation of orders p and pleaf of the B+-tree
Each internal node has at most p tree pointers and p − 1 search key values, so p must satisfy $p \cdot P + (p - 1) \cdot V_{Ssn} \le B$, that is, $6p + 9(p - 1) \le 512$, or $15p \le 521$.
So, p = 34.
For leaf nodes, the record pointers are included in the leaf nodes, so pleaf must satisfy $p_{leaf} \cdot (V_{Ssn} + P_R) + P \le B$, that is, $16\,p_{leaf} + 6 \le 512$, or $16\,p_{leaf} \le 506$.
So, pleaf = 31.
Step 22 of 31
(ii) Calculation of leaf nodes if the blocks are 69 percent full
Nodes are 69% full on average, so the average number of key values in a leaf node is $0.69 \times p_{leaf} = 0.69 \times 31 = 21.39$. If we round this up for convenience, we get 22 key values and 22 record pointers per leaf node.
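The B+-tree sizing in part (g) follows mechanically from the two node-capacity inequalities, so it can be sketched in a few lines of Python. The variable names below are chosen for illustration only; the sizes come from the problem statement, and the 69% fill factor is applied with rounding up, as the problem instructs.

```python
from math import ceil

B, P, PR, V = 512, 6, 7, 9   # block, block pointer, record pointer, Ssn key (bytes)
r = 30_000                   # number of records, one Ssn value each
FILL = 0.69                  # assumed average node occupancy

# Largest p with p*P + (p - 1)*V <= B  (internal nodes).
p = (B + V) // (P + V)
# Largest p_leaf with p_leaf*(V + PR) + P <= B  (leaf nodes).
p_leaf = (B - P) // (V + PR)

keys_per_leaf = ceil(FILL * p_leaf)   # average keys per leaf, rounded up
fan_out = ceil(FILL * p)              # average pointers per internal node

# Count blocks level by level, leaves first, until one root block remains.
blocks_per_level = [ceil(r / keys_per_leaf)]
while blocks_per_level[-1] > 1:
    blocks_per_level.append(ceil(blocks_per_level[-1] / fan_out))

x = len(blocks_per_level)
print("p, p_leaf:", p, p_leaf)
print("blocks per level (leaf first):", blocks_per_level)
print("total blocks:", sum(blocks_per_level), "block accesses:", x + 1)
```

The final count adds one to the number of levels because, after reaching a leaf, one more block access is needed to fetch the record from the data file.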
So, the file has r = 30,000 records and hence 30,000 values of Ssn, and the number of leaf-level nodes needed is $b_1 = \lceil 30000/22 \rceil = 1364$ blocks.
Step 23 of 31
(iii) Calculation of number of levels if internal nodes are 69 percent full
The average fan-out for the internal nodes is $fo = \lceil 0.69 \times 34 \rceil = \lceil 23.46 \rceil = 24$.
Number of second-level tree blocks $b_2 = \lceil 1364/24 \rceil = 57$ blocks
Number of third-level tree blocks $b_3 = \lceil 57/24 \rceil = 3$ blocks
Number of fourth-level tree blocks $b_4 = \lceil 3/24 \rceil = 1$ block
The fourth level has one block, so the tree has x = 4 levels.
Step 24 of 31
(iv) Calculation of total number of blocks
Total number of blocks for the tree $= b_1 + b_2 + b_3 + b_4 = 1364 + 57 + 3 + 1 = 1425$ blocks
Step 25 of 31
(v) Calculation of number of block accesses to search for and retrieve a record given its Ssn using the B+-tree
Number of block accesses to search for a record $= x + 1 = 4 + 1 = 5$
Step 26 of 31
(h) Repetition of part (g) for a B-tree
(i) Order p of the B-tree
Each B-tree node has at most p tree pointers, p − 1 search key values, and p − 1 record pointers, so p must satisfy $p \cdot P + (p - 1)(V_{Ssn} + P_R) \le B$, that is, $6p + 16(p - 1) \le 512$, or $22p \le 528$.
Choose p as a large value that satisfies the inequality. So, p = 23 (p = 24 would fill the block exactly and leave no room for any additional bookkeeping information in the node).
A B-tree has no separate leaf order: leaf nodes have the same structure as internal nodes, with record pointers stored in every node, so the same bound applies to them.
Step 27 of 31
(ii) Each node of the B-tree is assumed to be 69% full, so the average number of key values in a node is $0.69 \times 23 \approx 15.87$. Rounding up for convenience gives 16 key values, 16 record pointers, and 17 tree pointers per node. Counting level by level:
Root       1 node        16 entries        17 pointers
Level 1    17 nodes      272 entries       289 pointers
Level 2    289 nodes     4,624 entries     4,913 pointers
Level 3    4,913 nodes   78,608 entries
The file has r = 30,000 records and hence 30,000 values of Ssn. The root, level 1, and level 2 together hold $16 + 272 + 4624 = 4912$ entries, so the remaining $30000 - 4912 = 25088$ entries are stored at level 3, and the number of leaf-level nodes needed is $25088/16 = 1568$ blocks.
Step 28 of 31
(iii) The first three levels cannot hold all 30,000 entries, so a fourth level is needed and the tree has x = 4 levels.
Step 29 of 31
(iv) Total number of blocks for the tree $= 1 + 17 + 289 + 1568 = 1875$ blocks
Step 30 of 31
(v) Number of block accesses to search for a record $= x + 1 = 4 + 1 = 5$ in the worst case; fewer accesses are needed when the key is found in an upper level of the tree.
Step 31 of 31
Comparison of B+-tree and B-tree
Approximate number of entries in a B+-tree, taking each node to have 34 pointers and 33 (p − 1) search field values:
Root       1 node         33 entries           34 pointers
Level 1    34 nodes       1,122 entries        1,156 pointers
Level 2    1,156 nodes    38,148 entries       39,304 pointers
Level 3    39,304 nodes   1,297,032 entries
Approximate number of entries in a B-tree, taking each node to have 23 pointers and 22 (p − 1) search field values:
Root          1 node         22 entries         23 pointers
Level 1       23 nodes       506 entries        529 pointers
Level 2       529 nodes      11,638 entries     12,167 pointers
Leaf level    12,167 nodes   267,674 entries
For the given block size, pointer size, and search key field size, a B+-tree with three levels below the root holds 1,336,335 entries in total, whereas a B-tree of the same height holds 279,840 entries. Therefore, a B+-tree stores more entries than a B-tree of the same height.
Chapter 17, Problem 19E
Problem
A PARTS file with Part# as the key field includes records with the following Part# values: 23, 65, 37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20, 24, 28, 39, 43, 47, 50, 69, 75, 8, 49, 33, 38.
Step-by-step solution
Step 1 of 34
B+-tree insertion: Here, a given set of keys is to be inserted into a B+-tree of order p = 4 and pleaf = 3.
• Order p = 4 implies that each internal node in the tree has at most 4 tree pointers (and therefore at most 3 keys).
• pleaf = 3 means that the leaf nodes must have at least 2 keys and at most 3 keys.
• Insertion starts from the root; when the root or any other node overflows its capacity, it must split.
• When a leaf node overflows, the first two elements are kept in that node and the remaining elements form a new node to its right.
• The element at the rightmost position of the left partition propagates up to the parent node.
• If the propagation is from a leaf node, a copy of the element is kept in the leaf. Otherwise, the element is simply moved to the parent node.
• All the elements in the key list must be present in the leaf nodes.
Step 2 of 34
The problem gives a set of keys to insert into the B+-tree in order. The given list is 23, 65, 37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20, 24, 28, 39, 43, 47, 50, 69, 75, 8, 49, 33, 38.
First, insert the first three keys into the root; this does not result in an overflow, since the capacity of the node is 3. The resultant B+-tree is:
Level 1: 23, 37, 65
Since this node is also a leaf node, it has no tree pointers.
Step 3 of 34
Insert 60: Inserting 60 into this node results in an overflow, so the node is split into two and a new level is created:
Level 1: 37
Level 2: 23, 37: 60, 65
Step 4 of 34
Insert 46: Inserting 46 does not violate the capacity constraint of the second node in level 2. The resultant tree is:
Level 1: 37
Level 2: 23, 37: 46, 60, 65
Step 5 of 34
Insert 92: Inserting the next key, 92, results in an overflow of the second node in level 2, which becomes 46, 60, 65, 92.
• Therefore, we need to split that node at 60, create one new node in level 2, and duplicate 60 in the parent node:
Step 6 of 34
Level 1: 37, 60
Level 2: 23, 37: 46, 60: 65, 92
Step 7 of 34
Insert 48: Inserting 48 does not cause any overflow; it is inserted into the second node in level 2:
Level 1: 37, 60
Level 2: 23, 37: 46, 48, 60: 65, 92
Step 8 of 34
Insert 71: Inserting 71 into the B+-tree also does not cause any overflow.
• It can be inserted into the third node of level 2 without violating the order constraints.
• Therefore, the updated tree is:
Level 1: 37, 60
Level 2: 23, 37: 46, 48, 60: 65, 71, 92
Step 9 of 34
Insert 56: The next insertion is 56.
• 56 belongs in the second node of level 2, but inserting it results in an overflow.
• So, we need to split that node (46, 48, 56, 60).
• The first two keys (46, 48) form the first node of the split and (56, 60) form the second; the last element of the first node (48) propagates up.
• Since it is a leaf node, 48 is only duplicated in the parent. The resultant B+-tree is:
Level 1: 37, 48, 60
Level 2: 23, 37: 46, 48: 56, 60: 65, 71, 92
Step 10 of 34
The remaining insertion operations are performed as shown below.
• Levels are counted from the root to the leaves; that is, the root has level 1 and the level number increases by 1 going downwards.
Insert 59:
Level 1: 37, 48, 60
Level 2: 23, 37: 46, 48: 56, 59, 60: 65, 71, 92
Step 11 of 34
Insert 18:
Level 1: 37, 48, 60
Level 2: 18, 23, 37: 46, 48: 56, 59, 60: 65, 71, 92
Step 12 of 34
Insert 21:
Level 1: 37, 48, 60
Level 2: 18, 21, 23, 37: 46, 48: 56, 59, 60: 65, 71, 92
Overflow. Split (18, 21, 23, 37) and propagate 21 to the level above.
Level 1: 21, 37, 48, 60
Level 2: 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71, 92
Again, overflow in level 1. Split and propagate 37; since this is not a leaf node, there is no need to keep a copy of 37. This results in a new level in the tree.
Level 1: 37
Level 2: 21: 48, 60
Level 3: 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71, 92
Step 13 of 34
Insert 10:
Level 1: 37
Level 2: 21: 48, 60
Level 3: 10, 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71, 92
Step 14 of 34
Insert 74:
Level 1: 37
Level 2: 21: 48, 60
Level 3: 10, 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71, 74, 92
Overflow in level 3. Split the overflowing node at 71.
Level 1: 37
Level 2: 21: 48, 60, 71
Level 3: 10, 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71: 74, 92
Step 15 of 34
Insert 78:
Level 1: 37
Level 2: 21: 48, 60, 71
Level 3: 10, 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Step 16 of 34
Insert 15:
Level 1: 37
Level 2: 21: 48, 60, 71
Level 3: 10, 15, 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Overflow in the first node of level 3; split it at 15 and propagate 15 up.
Step 17 of 34
Level 1: 37
Level 2: 15, 21: 48, 60, 71
Level 3: 10, 15: 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Step 18 of 34
Insert 16:
Level 1: 37
Level 2: 15, 21: 48, 60, 71
Level 3: 10, 15: 16, 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Step 19 of 34
Insert 20:
Level 1: 37
Level 2: 15, 21: 48, 60, 71
Level 3: 10, 15: 16, 18, 20, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Overflow at the node receiving the key; split it at 18 and propagate 18 up.
Level 1: 37
Level 2: 15, 18, 21: 48, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Step 20 of 34
Insert 24:
Level 1: 37
Level 2: 15, 18, 21: 48, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Step 21 of 34
Insert 28:
Level 1: 37
Level 2: 15, 18, 21: 48, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24, 28, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Overflow in the fourth node of level 3; split it at 24 and propagate 24 up as below.
Level 1: 37
Level 2: 15, 18, 21, 24: 48, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Again, overflow at level 2; one more split is needed, at 18, as below.
Level 1: 18, 37
Level 2: 15: 21, 24: 48, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Step 22 of 34
Insert 39:
Level 1: 18, 37
Level 2: 15: 21, 24: 48, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Step 23 of 34
Insert 43:
Level 1: 18, 37
Level 2: 15: 21, 24: 48, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43, 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Overflow at the node receiving the key, so split that node at its second element, 43, as below.
Step 24 of 34
Level 1: 18, 37
Level 2: 15: 21, 24: 43, 48, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Again, overflow at level 2; split at 48 and propagate 48 up.
Level 1: 18, 37, 48
Level 2: 15: 21, 24: 43: 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Step 25 of 34
Insert 47:
Level 1: 18, 37, 48
Level 2: 15: 21, 24: 43: 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 56, 59, 60: 65, 71: 74, 78, 92
Step 26 of 34
Insert 50:
Level 1: 18, 37, 48
Level 2: 15: 21, 24: 43: 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56, 59, 60: 65, 71: 74, 78, 92
Overflow at the node receiving the key. Split the node at 56, the second element, and propagate it up as below.
Level 1: 18, 37, 48
Level 2: 15: 21, 24: 43: 56, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56: 59, 60: 65, 71: 74, 78, 92
Step 27 of 34
Insert 69:
Level 1: 18, 37, 48
Level 2: 15: 21, 24: 43: 56, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56: 59, 60: 65, 69, 71: 74, 78, 92
Step 28 of 34
Insert 75:
Level 1: 18, 37, 48
Level 2: 15: 21, 24: 43: 56, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56: 59, 60: 65, 69, 71: 74, 75, 78, 92
Overflow at the node receiving the key; split the node at its second element, 75, and propagate 75 up.
Level 1: 18, 37, 48
Level 2: 15: 21, 24: 43: 56, 60, 71, 75
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56: 59, 60: 65, 69, 71: 74, 75: 78, 92
Again, overflow at the level 2 node that received 75; split it at 60 and propagate 60 up.
Level 1: 18, 37, 48, 60
Level 2: 15: 21, 24: 43: 56: 71, 75
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56: 59, 60: 65, 69, 71: 74, 75: 78, 92
Again, overflow at the root, which received 60. Split it at 37 and propagate 37 into a new level.
Level 1: 37 Level 2: 18: 48, 60 Level 3: 15: 21, 24: 43: 56: 71, 75 Level 4: 10, 15 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56: 59, 60: 65, 69, 71: 74, 75: 78, 92 Comment Step 29 of 34 Insert 8: Level 1: 37 Level 2: 18: 48, 60 Level 3: 15: 21, 24: 43: 56: 71, 75 Level 4: 8, 10, 15 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56: 59, 60: 65, 69, 71: 74, 75: 78, 92 Comment Step 30 of 34 Insert 49: Level 1: 37 Level 2: 18: 48, 60 Level 3: 15: 21, 24: 43: 56: 71, 75 Level 4: 8, 10, 15 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 49, 50, 56: 59, 60: 65, 69, 71: 74, 75: 78, 92 Comment Step 31 of 34 Insert 33: Level 1: 37 Level 2: 18: 48, 60 Level 3: 15: 21, 24: 43: 56: 71, 75 Level 4: 8, 10, 15 16, 18: 20, 21: 23, 24: 28, 33, 37: 39, 43: 46, 47, 48: 49, 50, 56: 59, 60: 65, 69, 71: 74, 75: 78, 92 Comment Step 32 of 34 Insert 38: Level 1: 37 Level 2: 18: 48, 60 Level 3: 15: 21, 24: 43: 56: 71, 75 Level 4: 8, 10, 15 16, 18: 20, 21: 23, 24: 28, 33, 37: 38, 39, 43: 46, 47, 48: 49, 50, 56: 59, 60: 65, 69, 71: 74, 75: 78, 92 Comment Step 33 of 34 The tree after the insertion of the last key 38 will give us the final B+ tree. • From each node except the leaf nodes, a left pointer is there to the child nodes in which left pointer points to node having keys less than that parent node and right pointer points to the node having key values larger than that parent node. • Each set in the above tree levels will form a node and set elements are the keys present in that node. Comment Step 34 of 34 Graphically the final tree after the insertion of keys will look as below: Comment Chapter 17, Problem 19E Problem A PARTS file with Part# as the key field includes records with the following Part# values: 23, 65, 37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20, 24, 28, 39,43,47, 50,69, 75, 8,49, 33, 38. 
Suppose that the search field values are inserted in the given order in a B+-tree of order p = 4 and pleaf = 3; show how the tree will expand and what the final tree will look like.

Step-by-step solution

Step 1 of 34
B+-Tree Insertion: A given set of keys is to be inserted into a B+-tree of order p = 4 with pleaf = 3.
• Order p = 4 implies that each internal node in the tree has at most 4 tree pointers, and therefore at most 3 keys.
• pleaf = 3 means that each leaf node holds at most 3 keys and, unless the leaf is the root, at least 2 keys.
• Insertion starts from the root; when the root or any other node overflows its capacity, it must be split.
• When a leaf node overflows, the first two of its four entries stay in that node and the remaining entries form a new node to its right.
• The element at the rightmost position of the left partition propagates up to the parent node.
• If the propagation is from a leaf node, a copy of the element stays in the leaf. Otherwise, the element is simply moved to its parent node.
• All the keys in the key list must appear in the leaf nodes.

Step 2 of 34
The keys are inserted into the B+-tree in the given order: 23, 65, 37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20, 24, 28, 39, 43, 47, 50, 69, 75, 8, 49, 33, 38.
First, insert the first three keys (23, 65, 37) into the root. This does not cause an overflow, since the capacity of the node is 3. The root is also a leaf node at this point, so it has no tree pointers. The resulting B+-tree is:
Level 1: 23, 37, 65

Step 3 of 34
Insert 60:
Inserting 60 into this node causes an overflow, so the node is split into two and a new level is created:
Level 1: 37
Level 2: 23, 37: 60, 65

Step 4 of 34
Insert 46:
Inserting 46 does not violate the capacity constraint of the second node in level 2. The resulting tree is:
Level 1: 37
Level 2: 23, 37: 46, 60, 65

Step 5 of 34
Insert 92:
Inserting the next key, 92, causes an overflow in the second node of level 2, which becomes 46, 60, 65, 92.
• Therefore, we need to split that node at 60, create one new node in level 2, and duplicate 60 in the parent node:
Level 1: 37, 60
Level 2: 23, 37: 46, 60: 65, 92

Step 6 of 34
[Figure: resulting tree]

Step 7 of 34
Insert 48:
Inserting 48 does not cause an overflow; it goes into the second node of level 2:
Level 1: 37, 60
Level 2: 23, 37: 46, 48, 60: 65, 92

Step 8 of 34
Insert 71:
Inserting 71 into the B+-tree does not cause an overflow either.
• It can be inserted into the third node of level 2 without violating the order constraints.
• Therefore, the updated tree is:
Level 1: 37, 60
Level 2: 23, 37: 46, 48, 60: 65, 71, 92

Step 9 of 34
Insert 56:
• 56 belongs in the second node of level 2, but inserting it there causes an overflow.
• So that node (46, 48, 56, 60) must be split.
• The first two keys (46, 48) form the first node of the split and (56, 60) form the second; the last element of the first set (48) propagates up.
• Since it is a leaf node, 48 is only duplicated in the parent. The resulting B+-tree is:
Level 1: 37, 48, 60
Level 2: 23, 37: 46, 48: 56, 60: 65, 71, 92

Step 10 of 34
The remaining insertions are performed as shown below.
• Levels are counted from the root to the leaves; that is, the root is level 1 and the level number increases by 1 at each step downward.
Insert 59:
Level 1: 37, 48, 60
Level 2: 23, 37: 46, 48: 56, 59, 60: 65, 71, 92

Step 11 of 34
Insert 18:
Level 1: 37, 48, 60
Level 2: 18, 23, 37: 46, 48: 56, 59, 60: 65, 71, 92

Step 12 of 34
Insert 21:
Level 1: 37, 48, 60
Level 2: 18, 21, 23, 37: 46, 48: 56, 59, 60: 65, 71, 92
Overflow. Split (18, 21, 23, 37) and propagate 21 to the level above.
Level 1: 21, 37, 48, 60
Level 2: 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71, 92
Again, overflow in level 1. Split and propagate 37; since this is not a leaf node, there is no need to keep a copy of 37. This results in a new level in the tree.
Level 1: 37
Level 2: 21: 48, 60
Level 3: 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71, 92

Step 13 of 34
Insert 10:
Level 1: 37
Level 2: 21: 48, 60
Level 3: 10, 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71, 92

Step 14 of 34
Insert 74:
Level 1: 37
Level 2: 21: 48, 60
Level 3: 10, 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71, 74, 92
Overflow in level 3. Split the overflowing node at 71.
Level 1: 37
Level 2: 21: 48, 60, 71
Level 3: 10, 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71: 74, 92

Step 15 of 34
Insert 78:
Level 1: 37
Level 2: 21: 48, 60, 71
Level 3: 10, 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92

Steps 16 and 17 of 34
Insert 15:
Level 1: 37
Level 2: 21: 48, 60, 71
Level 3: 10, 15, 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Overflow in the first node of level 3; split it at 15 and propagate 15 up.
Level 1: 37
Level 2: 15, 21: 48, 60, 71
Level 3: 10, 15: 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92

Step 18 of 34
Insert 16:
Level 1: 37
Level 2: 15, 21: 48, 60, 71
Level 3: 10, 15: 16, 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92

Step 19 of 34
Insert 20:
Level 1: 37
Level 2: 15, 21: 48, 60, 71
Level 3: 10, 15: 16, 18, 20, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Overflow at the inserted node; split it at 18 and propagate 18 up.
Level 1: 37
Level 2: 15, 18, 21: 48, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92

Step 20 of 34
Insert 24:
Level 1: 37
Level 2: 15, 18, 21: 48, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92

Step 21 of 34
Insert 28:
Level 1: 37
Level 2: 15, 18, 21: 48, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24, 28, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Overflow in the fourth node of level 3; split it at 24 and propagate 24 up as below.
Level 1: 37
Level 2: 15, 18, 21, 24: 48, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Again, overflow at level 2; one more split is needed, at 18, as below.
Level 1: 18, 37
Level 2: 15: 21, 24: 48, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92

Step 22 of 34
Insert 39:
Level 1: 18, 37
Level 2: 15: 21, 24: 48, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 46, 48: 56, 59, 60: 65, 71: 74, 78, 92

Steps 23 and 24 of 34
Insert 43:
Level 1: 18, 37
Level 2: 15: 21, 24: 48, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43, 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Overflow at the inserted node, so split that node at its second element, 43, as below.
Level 1: 18, 37
Level 2: 15: 21, 24: 43, 48, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Again, overflow at level 2.
Level 1: 18, 37, 48
Level 2: 15: 21, 24: 43: 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92

Step 25 of 34
Insert 47:
Level 1: 18, 37, 48
Level 2: 15: 21, 24: 43: 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 56, 59, 60: 65, 71: 74, 78, 92

Step 26 of 34
Insert 50:
Level 1: 18, 37, 48
Level 2: 15: 21, 24: 43: 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56, 59, 60: 65, 71: 74, 78, 92
Overflow at the inserted node. Split the node at 56, its second element, and propagate 56 up as below.
Level 1: 18, 37, 48
Level 2: 15: 21, 24: 43: 56, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56: 59, 60: 65, 71: 74, 78, 92

Step 27 of 34
Insert 69:
Level 1: 18, 37, 48
Level 2: 15: 21, 24: 43: 56, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56: 59, 60: 65, 69, 71: 74, 78, 92

Step 28 of 34
Insert 75:
Level 1: 18, 37, 48
Level 2: 15: 21, 24: 43: 56, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56: 59, 60: 65, 69, 71: 74, 75, 78, 92
Overflow at the inserted node; split the node at its second element, 75, and propagate 75 up.
Level 1: 18, 37, 48
Level 2: 15: 21, 24: 43: 56, 60, 71, 75
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56: 59, 60: 65, 69, 71: 74, 75: 78, 92
Again, overflow at the node that received 75; split it at 60 and propagate 60 up.
Level 1: 18, 37, 48, 60
Level 2: 15: 21, 24: 43: 56: 71, 75
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56: 59, 60: 65, 69, 71: 74, 75: 78, 92
Again, overflow at the node that received 60. Split it at 37 and propagate 37 into a new level.
Level 1: 37
Level 2: 18: 48, 60
Level 3: 15: 21, 24: 43: 56: 71, 75
Level 4: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56: 59, 60: 65, 69, 71: 74, 75: 78, 92

Step 29 of 34
Insert 8:
Level 1: 37
Level 2: 18: 48, 60
Level 3: 15: 21, 24: 43: 56: 71, 75
Level 4: 8, 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56: 59, 60: 65, 69, 71: 74, 75: 78, 92

Step 30 of 34
Insert 49:
Level 1: 37
Level 2: 18: 48, 60
Level 3: 15: 21, 24: 43: 56: 71, 75
Level 4: 8, 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 49, 50, 56: 59, 60: 65, 69, 71: 74, 75: 78, 92

Step 31 of 34
Insert 33:
Level 1: 37
Level 2: 18: 48, 60
Level 3: 15: 21, 24: 43: 56: 71, 75
Level 4: 8, 10, 15: 16, 18: 20, 21: 23, 24: 28, 33, 37: 39, 43: 46, 47, 48: 49, 50, 56: 59, 60: 65, 69, 71: 74, 75: 78, 92

Step 32 of 34
Insert 38:
Level 1: 37
Level 2: 18: 48, 60
Level 3: 15: 21, 24: 43: 56: 71, 75
Level 4: 8, 10, 15: 16, 18: 20, 21: 23, 24: 28, 33, 37: 38, 39, 43: 46, 47, 48: 49, 50, 56: 59, 60: 65, 69, 71: 74, 75: 78, 92

Step 33 of 34
The tree after the insertion of the last key, 38, is the final B+-tree.
• In each node except the leaf nodes, the pointer to the left of a key points to the child node holding key values less than or equal to that key, and the pointer to its right points to the child node holding larger key values.
• Each colon-separated set in the levels above forms one node, and the elements of the set are the keys present in that node.

Step 34 of 34
Graphically, the final tree after the insertion of all keys looks as follows:
[Figure: final B+-tree]

Chapter 17, Problem 20E
Problem
Repeat Exercise, but use a B-tree of order p = 4 instead of a B+-tree.
Exercise
A PARTS file with Part# as the key field includes records with the following Part# values: 23, 65, 37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20, 24, 28, 39, 43, 47, 50, 69, 75, 8, 49, 33, 38.
Suppose that the search field values are inserted in the given order in a B+-tree of order p = 4 and pleaf = 3; show how the tree will expand and what the final tree will look like.

Step-by-step solution

Step 1 of 1
Insertion takes place in the steps represented in the diagram:
[Figure: B-tree after each insertion]

Chapter 17, Problem 21E
Problem
Suppose that the following search field values are deleted, in the given order, from the B+-tree of Exercise; show how the tree will shrink and show the final tree. The deleted values are 65, 75, 43, 18, 20, 92, 59, 37.
Exercise
A PARTS file with Part# as the key field includes records with the following Part# values: 23, 65, 37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20, 24, 28, 39, 43, 47, 50, 69, 75, 8, 49, 33, 38. Suppose that the search field values are inserted in the given order in a B+-tree of order p = 4 and pleaf = 3; show how the tree will expand and what the final tree will look like.

Step-by-step solution

Step 1 of 10
In the B+-tree deletion algorithm, deleting a key value from a leaf node can lead to two situations:
(1) The leaf node becomes less than half full. In this case, we may combine it with the next leaf node.

Step 2 of 10
(2) The deleted key value is the rightmost value in its leaf node, so it also appears in an internal node. In this case, the key value to the left of the deleted key in the leaf node replaces the deleted key value in the internal node.
From the data, deleting 65 affects only the leaf node. Deleting 75 causes a leaf node to become less than half full, so it is combined with the next node, and 75 is also removed from the internal node.

Step 3 of 10
[Figure: resulting tree]

Step 4 of 10
Deleting 43 causes a leaf node to become less than half full, so it is combined with the next node. The combined node then has 3 entries. Its rightmost entry, 48, can replace 43 in both the leaf and internal nodes.

Step 5 of 10
[Figure: resulting tree]

Step 6 of 10
The next step deletes 18. It is the rightmost entry in a leaf node and therefore also appears in an internal node of the B+-tree.
Now the leaf node is less than half full and is combined with the next node. The value 18 must also be removed from the internal node, which causes underflow in that internal node. One approach for dealing with underflow in internal nodes is to reorganize the values of the underflowing node with its child nodes, so 21 is moved up into the underflowing node, leading to the following tree.

Step 7 of 10
[Figure: resulting tree]

Step 8 of 10
Deleting 20 and 92 does not cause underflow. Deleting 59 causes underflow, and the remaining value, 60, is combined with the next leaf node. Hence 60 is no longer a rightmost entry in a leaf node. This is normally handled by moving 56 up to replace 60 in the internal node, but since that leads to underflow in the node that used to contain 56, the nodes can be reorganized as follows.

Step 9 of 10
[Figure: resulting tree]

Step 10 of 10
Finally, removing 37 causes serious underflow, leading to a reorganization of the whole tree. One approach to deleting the value in the root node is to use the rightmost value in the next leaf node to replace the root and to move this leaf node to the left subtree. In this case the resulting tree may look as follows.
[Figure: final tree]

Chapter 17, Problem 22E
Problem
Repeat Exercise 1, but for the B-tree of Exercise 3.
Exercise 1
Suppose that the following search field values are deleted, in the given order, from the B+-tree of Exercise 2; show how the tree will shrink and show the final tree. The deleted values are 65, 75, 43, 18, 20, 92, 59, 37.
Exercise 2
A PARTS file with Part# as the key field includes records with the following Part# values: 23, 65, 37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20, 24, 28, 39, 43, 47, 50, 69, 75, 8, 49, 33, 38. Suppose that the search field values are inserted in the given order in a B+-tree of order p = 4 and pleaf = 3; show how the tree will expand and what the final tree will look like.
Exercise 3 Repeat Exercise 2, but use a B-tree of order p = 4 instead of a B+-tree. Step-by-step solution Step 1 of 1 Deletion will take place in following order: Comment Chapter 17, Problem 23E Problem Algorithm 17.1 outlines the procedure for searching a nondense multilevel primary index to retrieve a file record. Adapt the algorithm for each of the following cases: a. A multilevel secondary index on a nonkey nonordering field of a file. Assume that option 3 of Section 17.1.3 is used, where an extra level of indirection stores pointers to the individual records with the corresponding index field value. b. A multilevel secondary index on a nonordering key field of a file. c. A multilevel clustering index on a nonkey ordering field of a file. Step-by-step solution Step 1 of 3 Comment Step 2 of 3 Comment Step 3 of 3 Comment Chapter 17, Problem 24E Problem Suppose that several secondary indexes exist on nonkey fields of a file, implemented using option 3 of Section 17.1.3; for example, we could have secondary indexes on the fields Department_code, Job_code, and Salary of the EMPLOYEE file of Exercise. Describe an efficient way to search for and retrieve records satisfying a complex selection condition on these fields, such as (Department_code = 5 AND Job_code =12 AND Salary = 50,000), using the record pointers in the indirection level. Exercise Consider a disk with block size B = 512 bytes. A block pointer is P = 6 bytes long, and a record pointer is PR = 7 bytes long. A file has r = 30,000 EMPLOYEE records of fixed length. Each record has the following fields: Name (30 bytes), Ssn (9 bytes), Department_code (9 bytes), Address (40 bytes), Phone (10 bytes), Birth_date (8 bytes), Sex (1 byte), Job_code (4 bytes), and Salary (4 bytes, real number). An additional byte is used as a deletion marker. a. Calculate the record size R in bytes. b. Calculate the blocking factor bfr and the number of file blocks b, assuming an unspanned organization. c. 
Suppose that the file is ordered by the key field Ssn and we want to construct a primary index on Ssn. Calculate (i) the index blocking factor bfri (which is also the index fan-out fo); (ii) the number of first-level index entries and the number of first-level index blocks; (iii) the number of levels needed if we make it into a multilevel index; (iv) the total number of blocks required by the multilevel index; and (v) the number of block accesses needed to search for and retrieve a record from the file—given its Ssn value—using the primary index. d. Suppose that the file is not ordered by the key field Ssn and we want to construct a secondary index on Ssn. Repeat the previous exercise (part c) for the secondary index and compare with the primary index. e. Suppose that the file is not ordered by the nonkey field Department_code and we want to construct a secondary index on Department_code, using option 3 of Section 17.1.3, with an extra level of indirection that stores record pointers. Assume there are 1,000 distinct values of Department_code and that the EMPLOYEE records are evenly distributed among these values. Calculate (i) the index blocking factor bfri (which is also the index fan-out fo); (ii) the number of blocks needed by the level of indirection that stores record pointers; (iii) the number of first-level index entries and the number of first-level index blocks; (iv) the number of levels needed if we make it into a multilevel index; (v) the total number of blocks required by the multilevel index and the blocks used in the extra level of indirection; and (vi) the approximate number of block accesses needed to search for and retrieve all records in the file that have a specific Department_code value, using the index. f. Suppose that the file is ordered by the nonkey field Department_code and we want to construct a clustering index on Department_code that uses block anchors (every new value of Department_code starts at the beginning of a new block). 
Assume there are 1,000 distinct values of Department_code and that the EMPLOYEE records are evenly distributed among these values. Calculate (i) the index blocking factor bfri (which is also the index fan-out fo); (ii) the number of first-level index entries and the number of first-level index blocks; (iii) the number of levels needed if we make it into a multilevel index; (iv) the total number of blocks required by the multilevel index; and (v) the number of block accesses needed to search for and retrieve all records in the file that have a specific Department_code value, using the clustering index (assume that multiple blocks in a cluster are contiguous). g. Suppose that the file is not ordered by the key field Ssn and we want to construct a B+-tree access structure (index) on Ssn. Calculate (i) the orders p and pleaf of the B+-tree; (ii) the number of leaf-level blocks needed if blocks are approximately 69% full (rounded up for convenience); (iii) the number of levels needed if internal nodes are also 69% full (rounded up for convenience); (iv) the total number of blocks required by the B+-tree; and (v) the number of block accesses needed to search for and retrieve a record from the file?given its Ssn value?using the B+-tree. h. Repeat part g, but for a B-tree rather than for a B+-tree. Compare your results for the B-tree and for the B+-tree. Step-by-step solution Step 1 of 2 The EMPLOYEE file contains the fields Name, Ssn, Department_code, Address, Phone, Birth_date, Sex, Job_code, Salary. The primary index is maintained on the key field Ssn . Consider that the secondary indexes are maintained on the fields Department_code, Job_code and Salary. The fields Department_code, Job_code and Salary are non-key fields. Comment Step 2 of 2 The steps to retrieve records based on the complex condition (Department_code = 5 AND Job_code = 12 AND Salary = 50,000) using record pointers in indirection level is as follows: 1. 
First, retrieve the record pointers of the records that satisfy the condition Department_code = 5 using the secondary index on Department_code.
2. Then retrieve the record pointers of the records that satisfy the condition Job_code = 12 using the secondary index on Job_code, and keep only the pointers that also appear in the set from step 1.
3. Then retrieve the record pointers of the records that satisfy the condition Salary = 50000 using the secondary index on Salary, and keep only the pointers that also appear in the set from step 2.
Only the records whose pointers survive all three steps are fetched from the data file, so no record that fails the condition is read.

Chapter 17, Problem 25E
Problem
Adapt Algorithms 17.2 and 17.3, which outline search and insertion procedures for a B+-tree, to a B-tree.

Step-by-step solution

Step 1 of 2
Searching for a record in a B-tree with key field value K:
n <- block containing root node of B-tree;
read block n;
while (n is not a leaf node of the tree) do
begin
    q <- number of tree pointers in node n;
    search node n for an entry i such that n.Ki = K; (* n.Ki refers to the ith search field value in node n *)
    if found
        then begin use the data pointer stored with n.Ki to access the file record; exit end
    else if K < n.K1
        then n <- n.P1 (* n.Pi refers to the ith tree pointer in node n *)
    else if K > n.Kq-1
        then n <- n.Pq
    else
        begin
        search node n for an entry i such that n.Ki-1 < K < n.Ki;
        n <- n.Pi
        end;
    read block n
end;
search leaf node n for an entry i such that n.Ki = K;
if found
    then use the data pointer stored with n.Ki to access the file record
    else the value does not exist; (* if we reach this point, no node on the search path contains K *)

Step 2 of 2
Inserting a record with key field value K in a B-tree of order p:
n <- block containing root node of B-tree;
read block n;
set stack S to empty;
while (n is not a leaf node of the tree) do
begin
    if K is equal to some n.Ki then begin record already in file; cannot insert; exit end;
    push address of n on stack S;
    q <- number of tree pointers in node n;
    if K < n.K1 then n <- n.P1
    else if K > n.Kq-1 then n <- n.Pq
    else
        begin
        search node n for an entry i such that n.Ki-1 < K < n.Ki;
        n <- n.Pi
        end;
    read block n
end;
search block n for an entry (Ki, Pri) with K = Ki;
if found
    then record already in file; cannot insert
    else
    begin
    create entry (K, Pr) where Pr points to the new record;
    if leaf node n is not full
        then insert entry (K, Pr) in correct position in leaf node n
        else
        begin
        copy n to temp;
        insert entry (K, Pr) in temp in correct position; (* temp now holds p entries, one more than a node can hold *)
        new <- a new empty leaf node for the tree;
        j <- ceiling(p/2);
        n <- first j-1 entries in temp;
        new <- entries j+1 to p in temp;
        (K, Pr) <- entry j of temp; (* the middle entry moves up; unlike in a B+-tree, no copy stays in the leaf *)
        finished <- false;
        repeat
            if stack S is empty
                then
                begin
                root <- a new empty internal node for the tree;
                root <- <n, (K, Pr), new>;
                finished <- true
                end
                else
                begin
                n <- pop stack S;
                if internal node n is not full
                    then
                    begin
                    insert ((K, Pr), new) in correct position in internal node n;
                    finished <- true
                    end
                    else
                    begin
                    copy n to temp;
                    insert ((K, Pr), new) in temp in correct position;
                    new <- a new empty internal node for the tree;
                    j <- ceiling(p/2);
                    n <- entries up to tree pointer Pj in temp;
                    new <- entries from tree pointer Pj+1 in temp;
                    (K, Pr) <- entry (Kj, Prj) of temp
                    end
                end
        until finished
        end
    end;

Chapter 17, Problem 26E
Problem
It is possible to modify the B+-tree insertion algorithm to delay the case where a new level is produced by checking for a possible redistribution of values among the leaf nodes. Figure 17.17 illustrates how this could be done for our example in Figure 17.12; rather than splitting the leftmost leaf node when 12 is inserted, we do a left redistribution by moving 7 to the leaf node to its left (if there is space in this node). Figure 17.17 shows how the tree would look when redistribution is considered. It is also possible to consider right redistribution. Try to modify the B+-tree insertion algorithm to take redistribution into account.

Step-by-step solution

Step 1 of 1
Refer to Figure 17.17 for the redistribution of the values among the leaf nodes at a new level. The figure shows the insertion of the values 12, 9, and 6. In the figure, value 12 is inserted into the leaf node by moving 7 to the leaf node on its left through left redistribution.
The values 12, 9, and 6 can be distributed among the leaf nodes, at a new level, using right redistribution as follows:
• When a new value is inserted in a leaf node, the tree is divided into leaf nodes and internal nodes. Every value that appears in an internal node also appears as the rightmost value in a leaf node, in the subtree reached through the tree pointer to the left of that value.
• If a new value needs to be inserted in a leaf node and the leaf node is full, then the node is split. The first j = ⌈(pleaf + 1)/2⌉ values, where pleaf denotes the order of leaf nodes, are retained in the original node and the rest of the values are moved to a new leaf node. A duplicate of the jth search value is kept in the parent node, and a pointer to the new node is created.
• This pointer to the new node is inserted in the parent node. If the parent node is full, then it is split. The jth search value is moved to the parent, and the values present in the internal node up to Pj, where Pj is the jth tree pointer and j = ⌊(p + 1)/2⌋, are kept.
• The values from Pj+1 to the last value present in the node are kept in the new internal node. The splitting of parent nodes and leaf nodes continues in this way and can result in a new level for the tree.
The modified B+-tree insertion algorithm based on right redistribution is as follows:
[Figure: modified B+-tree insertion algorithm]

Chapter 17, Problem 27E
Problem
Outline an algorithm for deletion from a B+-tree.
Step-by-step solution

Step 1 of 1
Delete the entry with key value K:
n <- block containing root node of B+-tree;
read block n;
while (n is not a leaf node of the B+-tree) do
begin
    q <- number of tree pointers in node n;
    if K <= n.K1 (* n.Ki refers to the ith search field value in node n *)
        then n <- n.P1 (* n.Pi refers to the ith tree pointer in node n *)
    else if K > n.Kq-1
        then n <- n.Pq
    else
        begin
        search node n for an entry i such that n.Ki = K;
        if found then
            begin
            access the leftmost value in the subtree pointed to by n.Pi+1;
            store this value in temp;
            delete this value from the tree;
            replace K with temp;
            exit
            end
        else
            begin
            search node n for an entry i such that n.Ki-1 < K <= n.Ki;
            n <- n.Pi
            end
        end;
    read block n
end;
search leaf node n for an entry (Ki, Pri) with K = Ki;
if not found
    then the value does not exist in the tree
else if it is the single entry in the leaf node and n.Pnext is not null then
    begin
    temp.K1, temp.Pr <- Pnext.K1, Pnext.Pr;
    delete Pnext.K1;
    exchange the record value of the parent record and temp;
    replace the value of n by temp;
    exit
    end
else if it is not the single entry
    then delete it
else if it is the single entry and Pnext = NULL then
    begin
    access the rightmost value in the tree pointed to by the parent;
    store this value in temp;
    exchange parent and temp;
    access the record to be deleted;
    replace this record by temp;
    exit
    end;

Chapter 17, Problem 28E
Problem
Repeat Exercise for a B-tree.
Exercise
Outline an algorithm for deletion from a B+-tree.

Step-by-step solution

Step 1 of 2
Algorithm for deletion from a B-tree:
B-Tree-Delete (x, k)
// x is the root of the subtree and k is the key which is to be deleted.
// If k is deleted successfully, B-Tree-Delete returns true. Otherwise it returns false.
// Note: this function is designed so that whenever it is called recursively on a node, that node has at least t keys.
if x is a leaf then
    if k is in x then
        delete k from x and return true
    else return false // k is not in the subtree
else // x is an internal node
    if k is in x then
        y <- the child of x that precedes k
        if y has at least t keys then
            k' <- the predecessor of k (use B-Tree-Find-Largest)
            copy k' over k // replace k with k'
            B-Tree-Delete (y, k') // recursive call
        else // y has t-1 keys
            z <- the child of x that follows k
            if z has at least t keys then
                k' <- the successor of k
                copy k' over k // replace k with k'
                B-Tree-Delete (z, k') // recursive call
            else // both y and z have t-1 keys
                merge k and all of z into y // here y contains 2t-1 keys
                // k and the pointer to z will be deleted from x
                B-Tree-Delete (y, k) // recursive call
    else // k is not in internal node x
        c <- the child of x that points to the root of the subtree that could contain k
        if c has t-1 keys then
            if c has an immediate left/right sibling z with t or more keys then
                let k1 be the key in x that precedes/follows c
                move k1 into c as the first/last key in c
                let k2 be the last/first key in the immediate left/right sibling z
                replace k1 in x with k2 from z (i.e., move k2 up into x)
                move the last/first child subtree of z to be the first/last child subtree of c
            else // c and both of its immediate siblings have t-1 keys
                // we cannot descend to a child node with only t-1 keys, so

Step 2 of 2
                merge c with one of its immediate siblings and make the appropriate key of x the middle key of the new node
        B-Tree-Delete (c, k) // c has at least t keys

Chapter 20, Problem 1RQ
Problem
What is meant by the concurrent execution of database transactions in a multiuser system? Discuss why concurrency control is needed, and give informal examples.

Step-by-step solution

Step 1 of 1
Multiuser system: A system that many users can use, and whose data they can access, at the same time is called a multiuser system. Concurrent execution means that the transactions submitted by these users run interleaved with one another rather than strictly one after another.
Concurrency control is needed to avoid the following problems:
Lost update problem: Two or more transactions read a record at the same time, but when the records are saved, only the last record saved reflects its changes, while all other changes are lost.
Temporary update (or dirty read) problem: A transaction updates a data item and then fails before it commits, so its update has to be rolled back. However, another transaction reads the temporary update before it is undone.
The data read from the temporary update is now incorrect, or "dirty data".
Incorrect summary problem: The example from the book uses an airline seat reservation system. A person wants to buy a ticket for a seat, so the system computes a summary of how many seats are open on the plane. Between the time the summary operation starts and the time it finishes, other seats are reserved by other tellers, so the summary returned to the customer is inaccurate: it does not reflect the true number of seats available.

Chapter 20, Problem 2RQ
Problem
Discuss the different types of failures. What is meant by catastrophic failure?

Step-by-step solution

Step 1 of 1
Types of Failures
Failures in a database management system are categorized as transaction, system, and media failures. There are many possible reasons for a transaction to fail during execution:
Computer failure: During transaction execution, the computer hardware, media, software, or network may crash. Such crashes cause database management system failures.
Transaction or system error: Operations such as divide by zero or integer overflow cause the transaction to fail. Logical programming errors or erroneous parameter values also cause failures. The user may also interrupt the system during transaction execution.
Local errors: Errors or exception conditions detected by the transaction cause failures. The transaction then halts and cancels all the data it has input, because something along the way prevents it from proceeding.
Concurrency control enforcement: Several transactions may become deadlocked and be aborted.
Disk failure: The data stored in disk blocks may be lost because of a read/write error or a read/write head crash. This could occur during a read or a write operation of the transaction.
Physical problems and catastrophes: Power failure, robbery, fire, destruction, and many more are physical problems.
Catastrophic failure: Catastrophic failures occur very rarely. They include many forms of physical misfortune that can strike the database, and the list of such problems is endless:
• The hard drive holding all the data may be completely damaged.
• Fire may destroy physical devices and cause data loss.
• Power or air-conditioning failures.
• Destruction of physical devices.
• Theft of storage media and physical devices.
• Overwriting disks or tapes by mistake.
Chapter 20, Problem 3RQ
Problem
Discuss the actions taken by the read_item and write_item operations on a database.
Step-by-step solution
Step 1 of 1
In a database, the read_item and write_item operations take the following actions.
Actions taken by the read_item operation on a database (assume the read operation is performed on data item X):
• Find the address of the disk block that contains item X.
• Copy the disk block into a buffer in main memory if that disk block is not already in some main memory buffer.
• Copy item X from the buffer to the program variable named X.
Actions taken by the write_item operation on a database (assume the write operation is performed on data item X):
• Find the address of the disk block that contains item X.
• Copy the disk block into a buffer in main memory if that disk block is not already in some main memory buffer.
• Copy item X from the program variable named X into its correct location in the buffer.
• Store the updated block from the buffer back to disk (either immediately or at some later point in time).
Chapter 20, Problem 4RQ
Problem
Draw a state diagram and discuss the typical states that a transaction goes through during execution.
Step-by-step solution
Step 1 of 2
State diagram of a transaction (figure).
Step 2 of 2
Typical states that a transaction goes through:
begin_transaction - start.
read or write - read, change, or delete a record.
end_transaction - finish.
commit_transaction - the change or delete completed.
rollback - the change or delete was unsuccessful; all changes will be reset.
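The operations listed above can be read as the edges of a small state machine. The following is only a sketch under stated assumptions: the state names (active, partially committed, committed, failed, terminated) follow the usual textbook diagram rather than anything shown in this solution, and the `terminate` operation and the `run` helper are hypothetical names introduced for illustration.

```python
from enum import Enum


class State(Enum):
    # Assumed state names; the solution above lists only the operations.
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    TERMINATED = "terminated"


# (current state, operation) -> next state
TRANSITIONS = {
    (State.ACTIVE, "read"): State.ACTIVE,
    (State.ACTIVE, "write"): State.ACTIVE,
    (State.ACTIVE, "end_transaction"): State.PARTIALLY_COMMITTED,
    (State.ACTIVE, "rollback"): State.FAILED,
    (State.PARTIALLY_COMMITTED, "commit_transaction"): State.COMMITTED,
    (State.PARTIALLY_COMMITTED, "rollback"): State.FAILED,
    (State.COMMITTED, "terminate"): State.TERMINATED,  # hypothetical operation name
    (State.FAILED, "terminate"): State.TERMINATED,     # hypothetical operation name
}


def run(operations):
    """Return the state reached after begin_transaction followed by the given operations."""
    state = State.ACTIVE  # begin_transaction puts the transaction in the active state
    for op in operations:
        key = (state, op)
        if key not in TRANSITIONS:
            raise ValueError(f"operation {op!r} is not allowed in state {state.value!r}")
        state = TRANSITIONS[key]
    return state
```

For instance, `run(["read", "write", "end_transaction", "commit_transaction"])` ends in the committed state, while a `rollback` issued from the active state leads to the failed state.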
Importance of transaction commit points: A transaction's commit point is the point in the log where the transaction has completed successfully, together with all of the reads and writes that go along with it.
Chapter 20, Problem 5RQ
Problem
What is the system log used for? What are the typical kinds of records in a system log? What are transaction commit points, and why are they important?
Step-by-step solution
Step 1 of 2
System log: The system log is used "to recover from failures that affect transactions. The system maintains a log to keep track of all transaction operations that affect the values of database items." In short, it records every operation that is needed to recover the database.
Step 2 of 2
Typical kinds of records in a system log:
start_transaction - start
commit - finish
read - read
write - write
abort - don't change anything.
Importance of transaction commit points: Commit points are the points in the log where a transaction has completed successfully, together with all of the reads and writes that go along with it. They matter for recovery: a transaction that has reached its commit point is treated as permanently recorded, while a transaction with no commit record in the log may have to be rolled back after a failure.
Chapter 20, Problem 6RQ
Problem
Discuss the atomicity, durability, isolation, and consistency preservation properties of a database transaction.
Step-by-step solution
Step 1 of 4
Atomicity:
• This property states that a transaction must be treated as an atomic unit; that is, either all of its operations are executed or none are.
• There must be no state in a database where a transaction is left partially completed.
• States should be defined either before the execution of the transaction or after the execution, abortion, or failure of the transaction.
• This property requires that a transaction be executed to completion.
• If there is a failure midway, the user explicitly cancels the operation, or an internal error occurs, the database ensures that no partial state is left over from the operations already executed.
• The database can UNDO or ROLLBACK all the changes, restoring the state it was in before the transaction started.
• If a transaction fails to complete for some reason, such as a system crash during transaction execution, the recovery technique must undo any effects of the transaction on the database.
Step 2 of 4
Durability or permanency:
• The changes applied to the database by a committed transaction must persist in the database and must not be lost if a failure occurs.
• It is the responsibility of the recovery subsystem of the DBMS.
• If a transaction updates a chunk of data in a database and commits, then the database holds the modified data.
• Even if a transaction commits but the system fails before the data can be written to disk, the data will be updated once the system is back in operation.
Step 3 of 4
Isolation:
• A transaction should appear as though it is being executed in isolation from other transactions, even though many transactions are executing simultaneously or in parallel.
• That is, the execution of a transaction should not be interfered with by any other transactions executing concurrently.
• It is enforced by the concurrency control subsystem of the DBMS. Isolation holds if every transaction does not make its updates visible to other transactions until it is committed.
• This form of isolation solves the temporary update problem and eliminates cascading rollbacks.
• In simple terms, one transaction cannot read data written by another transaction until that transaction has completed.
• If two transactions are executing concurrently and one wants to see the changes made by the other, it must wait until the other is finished.
Step 4 of 4
Consistency preservation:
• The consistency property ensures that the database is in a consistent state before the start of the transaction and after the transaction is over (whether it is successful or not).
• It states that when a transaction is finished, the data will remain in a consistent state.
• A transaction either creates a new and valid state of data, or, if any failure occurs, returns all data to its state before the transaction was started.
• Execution of a transaction should take the database from one consistent state to another.
Chapter 20, Problem 7RQ
Problem
What is a schedule (history)? Define the concepts of recoverable, cascade-less, and strict schedules, and compare them in terms of their recoverability.
Step-by-step solution
Step 1 of 4
Schedule (or history)
A schedule (or history) S of n transactions T1, T2, ..., Tn is an ordering of the operations of the transactions subject to the constraint that, for each transaction Ti that participates in S, the operations of Ti in S must appear in the same order in which they occur in Ti.
If we can ensure that a transaction T, once committed, never has to be rolled back, then we have a demarcation between recoverable and non-recoverable schedules. Schedules determined to be non-recoverable should not be permitted. Among the recoverable schedules, transaction failures generate a spectrum of recoverability, from easy to complex.
Step 2 of 4
Recoverable: A schedule S is recoverable if no transaction T in S commits until all transactions T’ that have written an item that T reads have committed. A transaction T reads from transaction T’ in a schedule S if some item X is first written by T’ and later read by T. In addition, T’ should not be aborted before T reads item X, and there should be no transactions that write X after T’ writes it and before T reads it (unless those transactions, if any, have aborted before T reads X).
Step 3 of 4
Cascadeless schedule: A schedule is said to avoid cascading rollback if every transaction in the schedule reads only items that were written by committed transactions. This guarantees that the items read will not be discarded. In a cascading rollback, an uncommitted transaction has to be rolled back because it read an item from a transaction that failed. This form of rollback is undesirable, since it can lead to undoing a significant amount of work.
It is desirable to restrict the schedules to those where cascading rollbacks cannot occur.
Step 4 of 4
Strict schedule: Transactions can neither read nor write an item X until the last transaction that wrote X has committed or aborted. Strict schedules simplify the recovery process. Undoing a write(X) operation of an aborted transaction is simply a matter of restoring the before image, the old value of X. Though this always works correctly for strict schedules, it may not work for recoverable or cascadeless schedules.
If a schedule is cascadeless, it is recoverable. If it is strict, it is cascadeless. The reverse is not always true.
Chapter 20, Problem 8RQ
Problem
Discuss the different measures of transaction equivalence. What is the difference between conflict equivalence and view equivalence?
Step-by-step solution
Step 1 of 3
Different measures of transaction equivalence are:
1.) Conflict equivalence: Two schedules are said to be conflict equivalent if the order of any two conflicting operations is the same in both schedules. Two operations in a schedule are said to conflict if they belong to different transactions, access the same database item, and at least one of the two operations is a write_item operation. If two conflicting operations are applied in different orders in two schedules, the effect can be different on the database or on other transactions in the schedules, and hence the schedules are not conflict equivalent.
Step 2 of 3
2.) View equivalence: Another, less restrictive definition of equivalence of schedules is called view equivalence. Two schedules S and S' are said to be view equivalent if the following 3 conditions hold:
1.) The same set of transactions participates in S and S', and S and S' include the same operations of those transactions.
2.)
For any operation ri(X) of Ti in S, if the value of X read by the operation has been written by an operation wj(X) of Tj, the same condition must hold for the value of X read by operation ri(X) of Ti in S'.
3.) If the operation wk(Y) of Tk is the last operation to write item Y in S, then wk(Y) of Tk must also be the last operation to write item Y in S'.
The idea behind view equivalence is that as long as each read operation of a transaction reads the result of the same write operation in both schedules, the write operations of each transaction must produce the same results. The read operations are thus said to see the same view in both schedules. Condition 3 ensures that the final write operation on each data item is the same in both schedules, so the database state should be the same at the end of both schedules.
Step 3 of 3
The difference between view equivalence and conflict equivalence arises under the unconstrained write assumption. View serializability is less restrictive under the unconstrained write assumption, where the value written by an operation wi(X) in Ti can be independent of its old value in the database. This is called a blind write, and it is illustrated by the following schedule Sg of three transactions T1: r1(X); w1(X); T2: w2(X); and T3: w3(X):
Sg: r1(X); w2(X); w1(X); w3(X); c1; c2; c3;
In Sg the operations w2(X) and w3(X) are blind writes, since T2 and T3 do not read the value of X. The schedule Sg is view serializable but not conflict serializable. Conflict serializable schedules are view serializable, but not vice versa. Testing for view serializability has been shown to be NP-hard, meaning that finding an efficient polynomial-time algorithm for this problem is highly unlikely.
Chapter 20, Problem 9RQ
Problem
What is a serial schedule? What is a serializable schedule? Why is a serial schedule considered correct? Why is a serializable schedule considered correct?
Step-by-step solution
Step 1 of 4
Serial schedule: A schedule S is serial if, for every transaction T participating in the schedule, all the operations of T are executed consecutively in the schedule.
• From this perspective, only one transaction is active at a time, and the execution of the next transaction begins only when the active transaction has committed (or aborted).
Step 2 of 4
Serializable schedule: A schedule S of n transactions is serializable if it is equivalent to some serial schedule of the same n transactions. There are n! possible serial schedules of n transactions and many more possible nonserial schedules. The nonserial schedules form two disjoint groups: those that are equivalent to one or more of the serial schedules, and those that are not equivalent to any serial schedule. The schedules in the first group are serializable.
Step 3 of 4
Reason for the correctness of a serial schedule: A serial schedule is considered correct on the assumption that the transactions are independent of each other. According to the consistency preservation property, a transaction that runs in isolation, executing from beginning to end without interference from other transactions, produces a correct result on the database. Therefore, a set of transactions executed one at a time is correct.
Step 4 of 4
Reason for the correctness of a serializable schedule: The correctness of a serializable schedule follows from the definition of equivalence. The definition compares the effects of two schedules on the database; a schedule that is equivalent to a serial schedule is serializable. Since every serial schedule is correct, a serializable schedule is correct as well.
Chapter 20, Problem 10RQ
Problem
What is the difference between the constrained write and the unconstrained write assumptions?
Which is more realistic?
Step-by-step solution
Step 1 of 1
The constrained write assumption states that any write operation wi(X) in Ti is preceded by an ri(X) in Ti and that the value written by wi(X) in Ti depends only on the value of X read by ri(X). This assumes that the computation of the new value of X is a function f(X) based on the old value of X read from the database.
The unconstrained write assumption states that the value written by an operation wi(X) in Ti can be independent of its old value in the database. This is called a blind write, and it is illustrated by the following schedule Sg of three transactions T1: r1(X); w1(X); T2: w2(X); and T3: w3(X):
Sg: r1(X); w2(X); w1(X); w3(X); c1; c2; c3;
In Sg the operations w2(X) and w3(X) are blind writes, since T2 and T3 do not read the value of X.
The constrained write assumption is more realistic, since an application or query usually needs to take the current value of a variable into account before changing it.
Chapter 20, Problem 11RQ
Problem
Discuss how serializability is used to enforce concurrency control in a database system. Why is serializability sometimes considered too restrictive as a measure of correctness for schedules?
Step-by-step solution
Step 1 of 4
The concept of serializability of schedules is used to identify which schedules are correct when transaction executions have their operations interleaved in the schedules. A schedule S of n transactions is serializable if it is equivalent to some serial schedule of the same n transactions. Saying that a nonserial schedule S is serializable is equivalent to saying that it is correct, because it is equivalent to a serial schedule, which is considered correct.
There are several ways to define when two schedules are equivalent:
Two schedules are result equivalent if they produce the same final state of the database. However, two schedules may accidentally produce the same final state, so result equivalence cannot be used to define equivalence of schedules.
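A small worked example may make the weakness of result equivalence concrete. The sketch below is illustrative only: the two single-transaction schedules and the function names `s1`, `s2`, and `result_equivalent_for` are assumptions introduced here, not part of the solution. One schedule adds 10 to X and the other raises X by 10 percent (written with integer arithmetic), so they agree only for one particular starting value.

```python
def s1(x):
    # Hypothetical schedule S1: read_item(X); X := X + 10; write_item(X)
    return x + 10


def s2(x):
    # Hypothetical schedule S2: read_item(X); X := X * 1.1; write_item(X)
    # Integer arithmetic is used so the comparison below is exact.
    return x * 11 // 10


def result_equivalent_for(x0):
    """True if both schedules leave the same final value of X when started from x0."""
    return s1(x0) == s2(x0)
```

Starting from X = 100, both schedules leave X = 110, so they look result equivalent; starting from X = 200, they leave 210 and 220. The agreement at 100 is accidental, which is why equivalence is defined on the order of operations rather than on the final state.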
Step 2 of 4
Conflict equivalence: Two schedules are said to be conflict equivalent if the order of any two conflicting operations is the same in both schedules. Two operations in a schedule are said to conflict if they belong to different transactions, access the same database item, and at least one of the two operations is a write_item operation. If two conflicting operations are applied in different orders in two schedules, the effect can be different on the database or on other transactions in the schedules, and hence the schedules are not conflict equivalent.
Step 3 of 4
View equivalence: Another, less restrictive definition of equivalence of schedules is called view equivalence. Two schedules S and S' are said to be view equivalent if the following 3 conditions hold:
1.) The same set of transactions participates in S and S', and S and S' include the same operations of those transactions.
2.) For any operation ri(X) of Ti in S, if the value of X read by the operation has been written by an operation wj(X) of Tj, the same condition must hold for the value of X read by operation ri(X) of Ti in S'.
3.) If the operation wk(Y) of Tk is the last operation to write item Y in S, then wk(Y) of Tk must also be the last operation to write item Y in S'.
Step 4 of 4
Serializability of schedules is sometimes considered too restrictive as a condition for ensuring the correctness of concurrent executions. Some applications can produce schedules that are correct by satisfying conditions less stringent than either conflict serializability or view serializability. An example is the type of transactions known as debit-credit transactions, for example, those that apply deposits and withdrawals to a data item whose value is the current balance of a bank account.
The semantics of debit-credit operations is that they update the value of a data item X by either adding to or subtracting from its current value. Both of these operations are commutative, so it is possible to produce correct schedules that are not serializable. With the additional knowledge, or semantics, that the operations between each ri(I) and wi(I) are commutative, we know that the order of executing the sequences consisting of (read, update, write) is not important as long as each (read, update, write) sequence by a particular transaction Ti on a particular item I is not interrupted by conflicting operations. Hence a nonserializable schedule can also be considered correct. Researchers have been working on extending concurrency control theory to deal with cases where serializability is considered too restrictive as a condition for correctness of schedules.
Chapter 20, Problem 12RQ
Problem
Describe the four levels of isolation in SQL. Also discuss the concept of snapshot isolation and its effect on the phantom record problem.
Step-by-step solution
Step 1 of 2
The ISOLATION LEVEL statement is used to specify the isolation value, which can be SERIALIZABLE, REPEATABLE READ, READ COMMITTED, or READ UNCOMMITTED. SERIALIZABLE is the default isolation level, but some systems use READ COMMITTED as the default level. The four isolation levels are as follows:
1. Level 0: A transaction has level 0 isolation if it does not overwrite the dirty reads of higher-level transactions. This isolation level has the value READ UNCOMMITTED. It lets a transaction read data changed by another transaction's statements, whether or not that transaction has committed. This is also called a dirty read.
Example:
Statement 1: BEGIN TRAN UPDATE stu SET marks = 200 WHERE rollno
= 34 WAITFOR DELAY '00:00:20' COMMIT;
Statement 2: SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; SELECT * FROM stu;
Statement 2 executes after Statement 1 has updated the stu table and displays the updated records before the transaction is committed.
2. Level 1: A transaction at this isolation level has no lost updates. This isolation level has the value READ COMMITTED. At this level, an SQL query reads only committed values. If the data is locked by an incomplete transaction, the SELECT statement waits until that transaction completes.
3. Level 2: A transaction at this isolation level has no dirty reads as well as no lost updates. This isolation level has the value REPEATABLE READ. Repeatable read is an extension of committed read. It ensures that if the same query is executed again within the transaction, it will not see a change to the data values made by another query. No other user can modify the data values until the transaction is committed or rolled back by the previous user.
4. Level 3: In addition to the properties of level 2, isolation level 3 has repeatable reads. This isolation level has the value SERIALIZABLE. The serializable isolation level works like repeatable read except that it also prevents phantoms when the same query is executed twice. This option works with range locks. It locks the whole table if no condition on an index is specified.
Step 2 of 2
Snapshot isolation:
Snapshot isolation is used in concurrency control protocols and in some commercial DBMSs. Under snapshot isolation, a transaction sees the data items that it reads based on the committed values of the items in the database snapshot.
Snapshot isolation ensures that the phantom record problem does not happen, because the transaction sees only the records that were committed in the database at the beginning of the transaction.
Chapter 20, Problem 13RQ
Problem
Define the violations caused by each of the following: dirty read, nonrepeatable read, and phantoms.
Step-by-step solution
Step 1 of 1
Violations caused by:
Dirty read – A transaction reads information written by another transaction that has not yet committed. The reading transaction commits while the other transaction aborts. As a result, the data used by the reading transaction is incorrect.
Nonrepeatable read – The transaction reads a value from a record. Another transaction changes the values of the record that was read. When the initial transaction reads the record again, the values are different.
Phantoms – A transaction may read a set of rows from a table based on some condition specified in the SQL WHERE clause. If another transaction inserts a new row that satisfies the condition while the initial transaction is in progress, the initial transaction may see a row that was not there before. The new row shows up only if the initial transaction repeats the read.
Chapter 20, Problem 14E
Problem
Change transaction T2 in Figure 20.2(b) to read
read_item(X); X := X + M; if X > 90 then exit else write_item(X);
Discuss the final result of the different schedules in Figures 20.3(a) and (b), where M = 2 and N = 2, with respect to the following questions: Does adding the above condition change the final outcome? Does the outcome obey the implied consistency rule (that the capacity of X is 90)?
Step-by-step solution
Step 1 of 1
The condition is
read_item(X); X := X + M; if X > 90 then exit else write_item(X);
Adding this condition does not change the final outcome unless the initial value of X > 88. The outcome, however, does obey the implied consistency rule that X ≤ 90, since the value of X is not updated if it becomes greater than 90.
Chapter 20, Problem 15E
Problem
Repeat Exercise 20.14, adding a check in T1 so that Y does not exceed 90.
Reference Exercise 20.14
Change transaction T2 in Figure 20.2(b) to read
read_item(X); X := X + M; if X > 90 then exit else write_item(X);
Discuss the final result of the different schedules in Figures 20.3(a) and (b), where M = 2 and N = 2, with respect to the following questions: Does adding the above condition change the final outcome? Does the outcome obey the implied consistency rule (that the capacity of X is 90)?
Step-by-step solution
Step 1 of 1
Starting from
read_item(X); X := X + M; if X > 90 then exit else write_item(X);
and adding the corresponding check on Y in T1, the schedule can be written as follows (time runs downward):
T1                                                    T2
read_item(X); X := X - N;
                                                      read_item(X); X := X + M;
write_item(X); read_item(Y);
                                                      if X > 90 then exit else write_item(X);
Y := Y + N; if Y > 90 then exit else write_item(Y);
This condition does not change the final outcome unless the initial value of X > 88 or Y > 88. The outcome obeys the implied consistency rule that X ≤ 90 and Y ≤ 90.
Chapter 20, Problem 16E
Problem
Add the operation commit at the end of each of the transactions T1 and T2 in Figure 20.2, and then list all possible schedules for the modified transactions. Determine which of the schedules are recoverable, which are cascade-less, and which are strict.
Step-by-step solution
Step 1 of 6
The two transactions from the textbook, with commit added, are:
T1                    T2
read_item(X);         read_item(X);
X := X - N;           X := X + M;
write_item(X);        write_item(X);
read_item(Y);         commit T2
Y := Y + N;
write_item(Y);
commit T1
In shorthand notation these transactions can be written as
T1: r1(X); w1(X); r1(Y); w1(Y); C1;
T2: r2(X); w2(X); C2;
Step 2 of 6
Given m transactions with n1, n2, ..., nm operations respectively, the number of possible schedules is
(n1 + n2 + ... + nm)! / (n1! * n2! * ... * nm!),
where ! is the factorial function. In our case, m = 2, n1 = 5, and n2 = 3, so the number of possible schedules is
(5+3)! / (5! * 3!) = (8*7*6*5*4*3*2*1) / ((5*4*3*2*1) * (3*2*1)) = 56.
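The count above can be checked mechanically. The following is a minimal sketch, not part of the original solution: `count_schedules` and `interleavings` are hypothetical helper names, and the operation strings simply reuse the shorthand for T1 and T2. It evaluates the factorial formula and, as a cross-check, enumerates every interleaving that keeps each transaction's own operations in order.

```python
from math import factorial


def count_schedules(op_counts):
    """Evaluate (n1 + ... + nm)! / (n1! * ... * nm!) for the given operation counts."""
    total = factorial(sum(op_counts))
    for n in op_counts:
        total //= factorial(n)
    return total


def interleavings(t1, t2):
    """Yield every merge of t1 and t2 that preserves the order within each transaction."""
    if not t1:
        yield list(t2)
        return
    if not t2:
        yield list(t1)
        return
    for rest in interleavings(t1[1:], t2):
        yield [t1[0]] + rest
    for rest in interleavings(t1, t2[1:]):
        yield [t2[0]] + rest


T1 = ["r1(X)", "w1(X)", "r1(Y)", "w1(Y)", "C1"]
T2 = ["r2(X)", "w2(X)", "C2"]
ALL_SCHEDULES = list(interleavings(T1, T2))
```

Both `count_schedules([5, 3])` and `len(ALL_SCHEDULES)` give 56 here. The enumeration is only practical for small transactions, since the number of interleavings grows very quickly with the number of operations.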
Step 3 of 6
The 56 possible schedules, and the type of each schedule, are:
S1: r1(X); w1(X); r1(Y); w1(Y); C1; r2(X); w2(X); C2; strict (and hence cascadeless)
S2: r1(X); w1(X); r1(Y); w1(Y); r2(X); C1; w2(X); C2; recoverable
S3: r1(X); w1(X); r1(Y); w1(Y); r2(X); w2(X); C1; C2; recoverable
S4: r1(X); w1(X); r1(Y); w1(Y); r2(X); w2(X); C2; C1; non-recoverable
S5: r1(X); w1(X); r1(Y); r2(X); w1(Y); C1; w2(X); C2; recoverable
S6: r1(X); w1(X); r1(Y); r2(X); w1(Y); w2(X); C1; C2; recoverable
S7: r1(X); w1(X); r1(Y); r2(X); w1(Y); w2(X); C2; C1; non-recoverable
S8: r1(X); w1(X); r1(Y); r2(X); w2(X); w1(Y); C1; C2; recoverable
S9: r1(X); w1(X); r1(Y); r2(X); w2(X); w1(Y); C2; C1; non-recoverable
S10: r1(X); w1(X); r1(Y); r2(X); w2(X); C2; w1(Y); C1; non-recoverable
S11: r1(X); w1(X); r2(X); r1(Y); w1(Y); C1; w2(X); C2; recoverable
S12: r1(X); w1(X); r2(X); r1(Y); w1(Y); w2(X); C1; C2; recoverable
S13: r1(X); w1(X); r2(X); r1(Y); w1(Y); w2(X); C2; C1; non-recoverable
S14: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); C1; C2; recoverable
S15: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); C2; C1; non-recoverable
S16: r1(X); w1(X); r2(X); r1(Y); w2(X); C2; w1(Y); C1; non-recoverable
S17: r1(X); w1(X); r2(X); w2(X); r1(Y); w1(Y); C1; C2; recoverable
S18: r1(X); w1(X); r2(X); w2(X); r1(Y); w1(Y); C2; C1; non-recoverable
S19: r1(X); w1(X); r2(X); w2(X); r1(Y); C2; w1(Y); C1; non-recoverable
S20: r1(X); w1(X); r2(X); w2(X); C2; r1(Y); w1(Y); C1; non-recoverable
Step 4 of 6
S21: r1(X); r2(X); w1(X); r1(Y); w1(Y); C1; w2(X); C2; strict (and hence cascadeless)
S22: r1(X); r2(X); w1(X); r1(Y); w1(Y); w2(X); C1; C2; cascadeless
S23: r1(X); r2(X); w1(X); r1(Y); w1(Y); w2(X); C2; C1; cascadeless
S24: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y); C1; C2; cascadeless
S25: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y); C2; C1; cascadeless
S26: r1(X); r2(X); w1(X); r1(Y); w2(X); C2; w1(Y); C1; cascadeless
S27: r1(X); r2(X); w1(X); w2(X); r1(Y); w1(Y); C1; C2; cascadeless
S28: r1(X); r2(X); w1(X); w2(X); r1(Y); w1(Y); C2; C1; cascadeless
S29: r1(X); r2(X); w1(X); w2(X); r1(Y); C2; w1(Y); C1; cascadeless
S30: r1(X); r2(X); w1(X); w2(X); C2; r1(Y); w1(Y); C1; cascadeless
S31: r1(X); r2(X); w2(X); w1(X); r1(Y); w1(Y); C1; C2; cascadeless
S32: r1(X); r2(X); w2(X); w1(X); r1(Y); w1(Y); C2; C1; cascadeless
S33: r1(X); r2(X); w2(X); w1(X); r1(Y); C2; w1(Y); C1; cascadeless
S34: r1(X); r2(X); w2(X); w1(X); C2; r1(Y); w1(Y); C1; cascadeless
S35: r1(X); r2(X); w2(X); C2; w1(X); r1(Y); w1(Y); C1; strict (and hence cascadeless)
S36: r2(X); r1(X); w1(X); r1(Y); w1(Y); C1; w2(X); C2; strict (and hence cascadeless)
S37: r2(X); r1(X); w1(X); r1(Y); w1(Y); w2(X); C1; C2; cascadeless
S38: r2(X); r1(X); w1(X); r1(Y); w1(Y); w2(X); C2; C1; cascadeless
S39: r2(X); r1(X); w1(X); r1(Y); w2(X); w1(Y); C1; C2; cascadeless
S40: r2(X); r1(X); w1(X); r1(Y); w2(X); w1(Y); C2; C1; cascadeless
Step 5 of 6
S41: r2(X); r1(X); w1(X); r1(Y); w2(X); C2; w1(Y); C1; cascadeless
S42: r2(X); r1(X); w1(X); w2(X); r1(Y); w1(Y); C1; C2; cascadeless
S43: r2(X); r1(X); w1(X); w2(X); r1(Y); w1(Y); C2; C1; cascadeless
S44: r2(X); r1(X); w1(X); w2(X); r1(Y); C2; w1(Y); C1; cascadeless
S45: r2(X); r1(X); w1(X); w2(X); C2; r1(Y); w1(Y); C1; cascadeless
S46: r2(X); r1(X); w2(X); w1(X); r1(Y); w1(Y); C1; C2; cascadeless
S47: r2(X); r1(X); w2(X); w1(X); r1(Y); w1(Y); C2; C1; cascadeless
S48: r2(X); r1(X); w2(X); w1(X); r1(Y); C2; w1(Y); C1; cascadeless
S49: r2(X); r1(X); w2(X); w1(X); C2; r1(Y); w1(Y); C1; cascadeless
S50: r2(X); r1(X); w2(X); C2; w1(X); r1(Y); w1(Y); C1; cascadeless
Step 6 of 6
S51: r2(X); w2(X); r1(X); w1(X); r1(Y); w1(Y); C1; C2; non-recoverable
S52: r2(X); w2(X); r1(X); w1(X); r1(Y); w1(Y); C2; C1; recoverable
S53: r2(X); w2(X); r1(X); w1(X); r1(Y); C2; w1(Y); C1; recoverable
S54: r2(X); w2(X); r1(X); w1(X); C2; r1(Y); w1(Y); C1; recoverable
S55: r2(X); w2(X); r1(X); C2; w1(X); r1(Y); w1(Y); C1; recoverable
S56: r2(X); w2(X); C2; r1(X); w1(X); r1(Y); w1(Y); C1; strict (and hence cascadeless)
Chapter 20, Problem 17E
Problem
List all possible schedules for transactions T1 and T2 in Figure 20.2, and determine which are conflict serializable (correct) and which are not.
Step-by-step solution
Step 1 of 3
The two transactions T1 and T2 are as follows:
Step 2 of 3
The shorthand notation for the two transactions is:
Step 3 of 3
Below are the 15 possible schedules and the type of each schedule:
Chapter 20, Problem 18E
Problem
How many serial schedules exist for the three transactions in Figure 20.8(a)? What are they? What is the total number of possible schedules?
Step-by-step solution
Step 1 of 2
The three transactions from the textbook are:
T1                 T2                 T3
read_item(X);      read_item(Z);      read_item(Y);
write_item(X);     read_item(Y);      read_item(Z);
read_item(Y);      write_item(Y);     write_item(Y);
write_item(Y);     read_item(X);      write_item(Z);
                   write_item(X);
Step 2 of 2
By the definition of a serial schedule, the serial schedules of the three transactions are:
T1 T2 T3
T3 T2 T1
T2 T3 T1
T2 T1 T3
T3 T1 T2
T1 T3 T2
The total number of serial schedules for the three transactions is 6. In general, the total number of serial schedules for n transactions is factorial(n), i.e., n!.
The total number of possible schedules follows from the operation counts of the transactions (4, 5, and 4): 13! / (4! * 5! * 4!) = 90090.
Chapter 20, Problem 19E
Problem
Write a program to create all possible schedules for the three transactions in Figure 20.8(a), and to determine which of those schedules are conflict serializable and which are not. For each conflict-serializable schedule, your program should print the schedule and list all equivalent serial schedules.
Step-by-step solution
Step 1 of 1
Program for finding serializable schedules:
Array TransactionT1Commands[4];  // operations of T1
Array TransactionT2Commands[5];  // operations of T2
Array TransactionT3Commands[4];  // operations of T3
Array FinalSchedule[13];         // one schedule holds 4 + 5 + 4 = 13 operations
Array Schedules[5000][13];       // there can be many schedules; we will take only 5000
Int maxCounter = 0;
While (maxCounter < 5000) {
    // start every new schedule from the first operation of each transaction
    Int t1Counter = 0; Int t2Counter = 0; Int t3Counter = 0;
    Int i = 0;
    While (i < 13) {
        Int ti = Rand(3);  // random transaction number: 1, 2, or 3
        If (ti == 1 && t1Counter < 4) { FinalSchedule[i++] = TransactionT1Commands[t1Counter++]; }
        Else If (ti == 2 && t2Counter < 5) { FinalSchedule[i++] = TransactionT2Commands[t2Counter++]; }
        Else If (ti == 3 && t3Counter < 4) { FinalSchedule[i++] = TransactionT3Commands[t3Counter++]; }
    }
    If (FinalSchedule in Schedules) {
        // duplicate schedule: do nothing
    } Else {
        Save FinalSchedule in Schedules;
        maxCounter++;
        CheckIfSerializable(FinalSchedule);
    }
}
CheckIfSerializable(Array FinalSchedule[13]) {
    For each transaction create a node.
    For each case in schedule S where Tj executes a read_item(X) after Ti executes a write_item(X), create an edge (Ti -> Tj) in the precedence graph.
    For each case in schedule S where Tj executes a write_item(X) after Ti executes a read_item(X), create an edge (Ti -> Tj) in the precedence graph.
    For each case in schedule S where Tj executes a write_item(X) after Ti executes a write_item(X), create an edge (Ti -> Tj) in the precedence graph.
    The schedule is serializable only if the precedence graph has no cycles.
    If serializable, print FinalSchedule; every ordering of the transactions that respects all edges of the graph is an equivalent serial schedule.
    Return;
}
Chapter 20, Problem 20E
Problem
Why is an explicit transaction end statement needed in SQL but not an explicit begin statement?
Step-by-step solution
Step 1 of 1
A transaction is an atomic operation. It has only one way to begin, with syntax like this:
BEGIN_TRANSACTION
    ...; // READ OR WRITE
    ...;
END_TRANSACTION;
COMMIT_TRANSACTION
A transaction can end in two ways: it either successfully installs its updates in the database (i.e., commit) or removes its partial updates (which may be incorrect) from the database (abort). So it is important for the database system to identify which way a transaction ends. For this reason an explicit end command is needed in SQL2, while the beginning of a transaction can remain implicit.
Chapter 20, Problem 21E
Problem
Describe situations where each of the different isolation levels would be useful for transaction processing.
Step-by-step solution
Step 1 of 2
Transaction isolation measures the influence of other concurrent transactions on a given transaction. This influence is highest at Read Uncommitted and lowest at Serializable.
Isolation level Serializable: This level preserves consistency in all situations, so it is the safest execution mode. It is recommended for execution environments where every update is crucial for a correct result. For example, airline reservation, debit credit, salary increase, and so on.
Isolation level Repeatable Read: This level is similar to Serializable except that the phantom problem may occur. Thus, with record locking (finer granularity), this isolation level must be avoided. It can be used in all types of environments except those where accurate summary information is desired (e.g., computing the total of all the different accounts of a bank customer).
Step 2 of 2
Isolation level Read Committed: In this level a transaction may see two different values of the same data item during its execution. A transaction at this level applies a write lock and keeps it until it commits. It also applies a read (shared) lock, but that lock is released as soon as the data item has been read. This isolation level may be used for looking up balances, weather, departure or arrival times, and so on.
Isolation level Read Uncommitted: In this level a transaction applies neither a shared lock nor a write lock. The transaction is not allowed to write any data item, and it may experience dirty reads, unrepeatable reads, and phantoms. It may be used in environments where a statistical average over a large amount of data is required.

Chapter 20, Problem 22E
Problem
Which of the following schedules is (conflict) serializable? For each serializable schedule, determine the equivalent serial schedules.
a. r1(X); r3(X); w1(X); r2(X); w3(X);
b. r1(X); r3(X); w3(X); w1(X); r2(X);
c. r3(X); r2(X); w3(X); r1(X); w1(X);
d. r3(X); r2(X); r1(X); w3(X); w1(X);
Step-by-step solution
Step 1 of 5
Serializable schedule: The conflict graph of a schedule decides whether the schedule is conflict serializable. If the conflict graph contains a cycle, the schedule is not serializable.
The conflict graph is drawn as follows:
1) Create a node labeled Ti for each transaction Ti that participates in schedule S.
2) Create an edge from Ti to Tj wherever Ti executes a write_item(X) and Tj later executes a read_item(X).
3) Create an edge from Ti to Tj wherever Ti executes a read_item(X) and Tj later executes a write_item(X).
4) Create an edge from Ti to Tj wherever Ti executes a write_item(X) and Tj later executes a write_item(X).
5) If the conflict graph has no cycles, the schedule is serializable.
Step 2 of 5
(a) Given schedule: r1(X); r3(X); w1(X); r2(X); w3(X);
Conflict graph edges: T3 -> T1 (r3(X) before w1(X)), T1 -> T2 (w1(X) before r2(X)), T1 -> T3 (r1(X) and w1(X) before w3(X)), T2 -> T3 (r2(X) before w3(X)).
The conflict graph has a cycle between T1 and T3. Hence, the given schedule is not serializable.
Step 3 of 5
(b) Given schedule: r1(X); r3(X); w3(X); w1(X); r2(X);
Conflict graph edges: T1 -> T3 (r1(X) before w3(X)), T3 -> T1 (r3(X) and w3(X) before w1(X)), T3 -> T2 (w3(X) before r2(X)), T1 -> T2 (w1(X) before r2(X)).
The conflict graph has a cycle between T1 and T3. Hence, the schedule S is not serializable.
Step 4 of 5
(c) Given schedule: r3(X); r2(X); w3(X); r1(X); w1(X);
Conflict graph edges: T2 -> T3 (r2(X) before w3(X)), T3 -> T1 (r3(X) and w3(X) before r1(X) and w1(X)), T2 -> T1 (r2(X) before w1(X)).
The graph contains no cycles. Hence, the schedule S is serializable.
• The equivalent serial schedule is T2, T3, T1, that is, r2(X); r3(X); w3(X); r1(X); w1(X);
Step 5 of 5
(d) Given schedule: r3(X); r2(X); r1(X); w3(X); w1(X);
Conflict graph edges: T2 -> T3 (r2(X) before w3(X)), T1 -> T3 (r1(X) before w3(X)), T3 -> T1 (r3(X) and w3(X) before w1(X)), T2 -> T1 (r2(X) before w1(X)).
The conflict graph has a cycle between T1 and T3. Hence, the schedule S is not serializable.

Chapter 20, Problem 23E
Problem
Consider the three transactions T1, T2, and T3, and the schedules S1 and S2 given below. Draw the serializability (precedence) graphs for S1 and S2 and state whether each schedule is serializable or not. If a schedule is serializable, write down the equivalent serial schedule(s).
T1: r1 (X); r1 (Z); w1 (X);
T2: r2 (Z); r2 (Y); w2 (Z); w2(Y);
T3: r3 (X); r3 (Y); w3 (Y);
S1: r1 (X); r2 (Z); r1 (Z); r3 (X); r3 (Y); w1 (X); w3 (Y); r2 (Y); w2 (Z); w2 (Y);
S2: r1 (X); r2 (Z); r3 (X); r1 (Z); r2 (Y); r3 (Y); w1 (X); w2 (Z); w3 (Y); w2 (Y);
Step-by-step solution
Step 1 of 2
The schedule S1 is as follows:
S1: r1(X); r2(Z); r1(Z); r3(X); r3(Y); w1(X); w3(Y); r2(Y); w2(Z); w2(Y)
The precedence graph for S1 has the edges T3 -> T1, T1 -> T2, and T3 -> T2.
The schedule S1 is a serializable schedule as there is no cycle in the precedence graph.
• T3 reads X before X is modified by T1.
• T1 reads Z before Z is modified by T2.
• T2 reads Y and writes it only after T3 has written to it.
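The precedence-graph test used in these solutions can be sketched in a few lines of Python. This is a minimal illustration, not part of the textbook solution: the function names (`parse_schedule`, `precedence_edges`, `equivalent_serial_order`) are hypothetical, and the sketch returns only one equivalent serial order rather than all of them.

```python
import re
from graphlib import TopologicalSorter, CycleError  # Python 3.9+


def parse_schedule(text):
    """Turn 'r1(X); w2(Y)' into [('r', 1, 'X'), ('w', 2, 'Y')]."""
    pattern = r"([rw])\s*(\d+)\s*\(\s*(\w+)\s*\)"
    return [(op, int(t), item) for op, t, item in re.findall(pattern, text)]


def precedence_edges(ops):
    """Edge (Ti, Tj) for every pair of conflicting operations, Ti's first."""
    edges = set()
    for i, (op_i, t_i, x_i) in enumerate(ops):
        for op_j, t_j, x_j in ops[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if t_i != t_j and x_i == x_j and "w" in (op_i, op_j):
                edges.add((t_i, t_j))
    return edges


def equivalent_serial_order(ops):
    """Return one equivalent serial order, or None if the graph has a cycle."""
    graph = {t: set() for _, t, _ in ops}
    for before, after in precedence_edges(ops):
        graph[after].add(before)  # TopologicalSorter maps node -> predecessors
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None


S1 = "r1(X); r2(Z); r1(Z); r3(X); r3(Y); w1(X); w3(Y); r2(Y); w2(Z); w2(Y)"
S2 = "r1(X); r2(Z); r3(X); r1(Z); r2(Y); r3(Y); w1(X); w2(Z); w3(Y); w2(Y)"
print(equivalent_serial_order(parse_schedule(S1)))  # [3, 1, 2]
print(equivalent_serial_order(parse_schedule(S2)))  # None
```

For S1 the only topological order of the graph is T3, T1, T2; for S2 the sorter reports a cycle, so no equivalent serial schedule exists.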
The equivalent serial schedule is as follows: T3, T1, T2.
Step 2 of 2
The schedule S2 is as follows:
S2: r1(X); r2(Z); r3(X); r1(Z); r2(Y); r3(Y); w1(X); w2(Z); w3(Y); w2(Y)
The precedence graph for S2 has the edges T3 -> T1, T1 -> T2, T2 -> T3, and T3 -> T2.
The schedule S2 is not a serializable schedule as there is a cycle (between T2 and T3) in the precedence graph.
• T2 reads Y before T3 modifies Y (T2 -> T3).
• T3 reads and writes Y, which is later modified by T2 (T3 -> T2).

Chapter 20, Problem 24E
Problem
Consider schedules S3, S4, and S5 below. Determine whether each schedule is strict, cascadeless, recoverable, or nonrecoverable. (Determine the strictest recoverability condition that each schedule satisfies.)
S3: r1 (X); r2 (Z); r1 (Z); r3 (X); r3 (Y); w1 (X); c1; w3 (Y); c3; r2(Y); w2(Z); w2(Y); c2;
S4: r1 (X); r2 (Z); r1 (Z); r3 (X); r3 (Y); w1 (X); w3 (Y); r2(Y); w2(Z); w2(Y); c1; c2; c3;
S5: r1 (X); r2 (Z); r3 (X); r1 (Z); r2 (Y); r3 (Y); w1 (X); c1; w2(Z); w3(Y); w2(Y); c3; c2;
Step-by-step solution
Step 1 of 5
Strict schedule: A schedule is strict if no transaction reads or writes an item X until the last transaction that wrote X has committed (or aborted). A read that precedes another transaction's write, such as r3(X) before w1(X), does not violate this condition.
The schedule S3 is a strict schedule for the following reasons:
• After w1(X), no other transaction accesses X, and T1 commits immediately.
• T3 writes Y and commits (c3) before T2 reads and writes Y.
• After w2(Z) and w2(Y), no other transaction accesses Z or Y.
The schedule S4 is not a strict schedule for the following reason:
• The operations r2(Y) and w2(Y) come after w3(Y) but before T3 commits.
The schedule S5 is not a strict schedule for the following reason:
• The operation w2(Y) comes after w3(Y) but before c3, so T2 overwrites an item written by the uncommitted transaction T3.
Step 2 of 5
Cascadeless schedule: A schedule is cascadeless if every transaction reads only items that were written by committed transactions.
The schedule S3 is a cascadeless schedule:
• The only item read after another transaction wrote it is Y (r2(Y) after w3(Y)), and T3 has already committed at that point. Every strict schedule is also cascadeless.
The schedule S4 is not a cascadeless schedule for the following reason:
• The operation r2(Y) comes after w3(Y) and before c3, so T2 reads a value written by the uncommitted transaction T3.
The schedule S5 is a cascadeless schedule:
• All read operations in S5 come before the first write operation w1(X), so no transaction reads an item written by another transaction.
Step 3 of 5
Recoverable and non-recoverable schedule: A schedule is recoverable if no transaction T commits until every transaction T' that wrote an item read by T has committed.
Schedule S3:
• The only transaction that reads an item written by another transaction is T2, which reads the value of Y written by T3.
• T3 commits (c3) before T2 commits (c2), so S3 is recoverable.
• Strictest condition satisfied by S3: strict (and therefore also cascadeless and recoverable).
Step 4 of 5
Schedule S4:
• T2 reads the value of Y written by T3 (r2(Y) after w3(Y)).
• T2 commits (c2) before T3 commits (c3). If T3 aborted after c2, T2 would already have committed using a value that must be rolled back.
• Recoverability requires that T3 commit before T2, so S4 is nonrecoverable.
Step 5 of 5
Schedule S5:
• No transaction reads an item written by another transaction, so no commit can depend on an uncommitted write, and S5 is recoverable.
• S5 is cascadeless (Step 2) but not strict (Step 1).
• Strictest condition satisfied by S5: cascadeless.

Chapter 21, Problem 1RQ
Problem
What is the two-phase locking protocol? How does it guarantee serializability?
Step-by-step solution
Step 1 of 2
Two-phase locking: Two-phase locking is a locking scheme in which a transaction cannot request a new lock once it has released any lock; that is, all locking operations in the transaction precede its first unlock operation. Each transaction therefore has two phases:
• Locking phase
• Unlocking phase
Locking phase: This is the expanding or growing phase, in which new locks are acquired but none is released.
Unlocking phase: This is the second phase, referred to as the shrinking phase, in which the transaction releases its existing locks and does not acquire new locks.
Step 2 of 2
Guarantee of serializability: The attraction of the two-phase algorithm derives from a theorem stating that two-phase locking always leads to serializable schedules. It is proved that if every transaction in a schedule follows the two-phase locking protocol, then the schedule is guaranteed to be serializable.
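The two-phase rule stated above can be checked mechanically for a single transaction. The following Python sketch is only an illustration under stated assumptions: the function name `is_two_phase` and the representation of a transaction as a list of `("lock" | "unlock", item)` pairs are hypothetical, not taken from the textbook.

```python
def is_two_phase(actions):
    """Return True if no lock is requested after the first unlock.

    `actions` lists the lock operations of ONE transaction in the order
    it issues them, e.g. [("lock", "X"), ("unlock", "X")].
    """
    shrinking = False  # becomes True once the first unlock is seen
    for kind, _item in actions:
        if kind == "unlock":
            shrinking = True
        elif kind == "lock" and shrinking:
            return False  # a new lock in the shrinking phase breaks 2PL
    return True


# Growing phase (lock X, lock Y) followed by shrinking phase: two-phase.
print(is_two_phase([("lock", "X"), ("lock", "Y"), ("unlock", "X"), ("unlock", "Y")]))
# Lock Y is requested after X was unlocked: not two-phase.
print(is_two_phase([("lock", "X"), ("unlock", "X"), ("lock", "Y"), ("unlock", "Y")]))
```

The check looks only at the order of lock and unlock requests; it does not model lock modes or conflicts between transactions.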
With the two-phase locking protocol, the schedule is guaranteed to be serializable because the protocol prevents interference among different transactions; when two-phase locking is enforced, it avoids the problems of lost update, uncommitted dependency, and inconsistent analysis.

Chapter 21, Problem 2RQ
Problem
What are some variations of the two-phase locking protocol? Why is strict or rigorous two-phase locking often preferred?
Step-by-step solution
Step 1 of 2
Variations of the two-phase locking protocol: Under two-phase locking, locks are handled by the transactions, and there are a number of variations of the protocol:
(1) Conservative 2PL (or static 2PL): It requires a transaction to lock all the items it accesses before the transaction begins execution, by predeclaring its read-set and write-set. It is a deadlock-free protocol.
(2) Basic 2PL: The transaction locks data items incrementally. This may cause deadlock, which then has to be dealt with.
Step 2 of 2
Strict or rigorous two-phase locking is preferred for the following reasons. In strict 2PL, a transaction T does not release any of its exclusive (write) locks until after it commits or aborts. So no other transaction can read or write an item that is written by T unless T has committed, which leads to strict schedules. Strict 2PL is not deadlock-free. A more restrictive variation of strict 2PL is rigorous 2PL, which also guarantees strict schedules. In this variation, a transaction T does not release any of its locks until after it commits or aborts, so it is easier to implement than strict 2PL.

Chapter 21, Problem 3RQ
Problem
Discuss the problems of deadlock and starvation, and the different approaches to dealing with these problems.
Step-by-step solution
Step 1 of 4
Deadlock:
• A deadlock refers to a situation in which a transaction Ti waits for an item that is locked by transaction Tj. The transaction Tj in turn waits for an item that is locked by transaction Tk.
• When each transaction in a set of transactions is waiting for an item that is locked by other transaction, then it is called deadlock. Example: Suppose there are two transaction T1 and T2 and there are two items X and Y. • Initially transaction T1 hold the item X and transaction T2 hold the item Y. • In order for the transaction T1 to complete its execution, it needs item Y which is locked by transaction T2. • In order for the transaction T2 to complete its execution, it needs item X which is locked by transaction T1. Such a situation is known as deadlock situation because neither transaction T1 and T2 can complete its execution. Comment Step 2 of 4 The different approaches to dealing with deadlock are as follows: • Deadlock prevention: The transaction acquires the lock on all the items it needs before starting the execution. If it cannot acquire a lock on an item, then it should not lock any other items and should wait and try to acquire locks again. • Deadlock detection: A wait for graph is used to check for deadlocks. • Timeouts: A transaction is aborted if it waits for a period longer than the system defined time. Comment Step 3 of 4 Starvation: • Starvation refers to a situation in which a low priority transaction waits indefinitely while other high priority transactions execute normally. • Starvation problem occurs when locking is used. Comment Step 4 of 4 The different approaches to dealing with starvation are as follows: • Use the first come first serve queue to maintain the transactions that are waiting. The transactions can acquire lock on an item in the same order they have been placed in the queue. • Increase the priority of the transactions that are waiting longer so that at some point of time it becomes the transaction with highest priority and proceeds to execute. Comment Chapter 21, Problem 4RQ Problem Compare binary locks to exclusive/shared locks. Why is the latter type of locks preferable? 
Step-by-step solution
Step 1 of 2
Binary locks: A binary lock has only two states (locked and unlocked). It is simple but too restrictive, and it is not used in practice.
Exclusive/shared locks: Exclusive/shared locks provide more general locking capabilities and are used in practical database locking schemes. In this scheme, a read lock is a shared lock and a write lock is an exclusive lock.
Of the two, the exclusive/shared lock is preferable because a read-locked (share-locked) item may also be read by other transactions, whereas a write-locked item is held exclusively by a single transaction.
There are three locking operations: read_lock(X), write_lock(X), and unlock(X).
Step 2 of 2
If the shared/exclusive locking scheme is used, the system must enforce the following rules:
(1) A transaction T must issue the operation read_lock(X) or write_lock(X) before any read_item(X) operation is performed in T.
(2) A transaction T must issue the operation write_lock(X) before any write_item(X) operation is performed in T.
(3) A transaction T must issue the operation unlock(X) after all read_item(X) and write_item(X) operations are completed in T.
(4) A transaction T will not issue a read_lock(X) operation if it already holds a read (shared) lock or a write (exclusive) lock on item X. This rule may be relaxed.

Chapter 21, Problem 5RQ
Problem
Describe the wait-die and wound-wait protocols for deadlock prevention.
Step-by-step solution
Step 1 of 2
Wait-die and wound-wait protocols: Transaction timestamps are assigned in the order in which transactions start. Hence, if transaction T1 starts before transaction T2, then TS(T1) < TS(T2). Notice that the older transaction has the smaller timestamp value.
Two schemes that prevent deadlock are called wait-die and wound-wait. Suppose that transaction Ti tries to lock an item X but is not able to because X is locked by some other transaction Tj with a conflicting lock. The rules followed by the two schemes are given below.
Step 2 of 2
Wait-die: If TS(Ti) < TS(Tj) (Ti is older than Tj), then Ti is allowed to wait; otherwise (Ti is younger than Tj) abort Ti and restart it later with the same timestamp.
In wait-die, an older transaction is allowed to wait on a younger transaction, whereas a younger transaction requesting an item held by an older transaction is aborted and restarted.
Wound-wait: If TS(Ti) < TS(Tj) (Ti is older than Tj), then abort Tj and restart it later with the same timestamp; otherwise (Ti is younger than Tj) Ti is allowed to wait.
Wound-wait is the opposite of wait-die: a younger transaction is allowed to wait on an older one, whereas an older transaction requesting an item held by a younger transaction preempts the younger transaction by aborting it.

Chapter 21, Problem 6RQ
Problem
Describe the cautious waiting, no waiting, and timeout protocols for deadlock prevention.
Step-by-step solution
Step 1 of 3
Deadlock may be prevented by using the following protocols.
Cautious waiting: Suppose that transaction Ti tries to lock an item X but is not able to do so because X is locked by some other transaction Tj with a conflicting lock. If Tj is not blocked, then Ti is blocked and allowed to wait; otherwise abort Ti. That is, if Ti is waiting for Tj, let it wait unless Tj is itself waiting for some other item to be released.
Step 2 of 3
No waiting: If a transaction is unable to obtain a lock, it is aborted immediately and resubmitted after a fixed delay.
Step 3 of 3
Timeout: If a transaction waits for a period longer than a system-defined timeout period, the system assumes that the transaction may be deadlocked and aborts it, regardless of whether a deadlock actually exists.
With the timeout protocol, some transactions that were not deadlocked may be aborted and have to be resubmitted.

Chapter 21, Problem 7RQ
Problem
What is a timestamp? How does the system generate timestamps?
Step-by-step solution
Step 1 of 1
Timestamp: A timestamp is a unique identifier created by the DBMS to identify a transaction. Timestamp values are assigned in the order in which the transactions are submitted to the system. In other words, a timestamp is:
A monotonically increasing variable (integer) indicating the age of an operation or a transaction.
Timestamps can be generated by the system in several ways. One way is to use a counter that is incremented each time its value is assigned to a transaction. In this scheme, the transaction timestamps are numbered 1, 2, 3, and so on. A computer counter has a finite maximum value, so the system must periodically reset the counter to zero when no transactions are executing for some short period of time. Another way to implement timestamps is to use the current date/time value of the system clock and ensure that no two timestamp values are generated during the same tick of the clock.

Chapter 21, Problem 8RQ
Problem
Discuss the timestamp ordering protocol for concurrency control. How does strict timestamp ordering differ from basic timestamp ordering?
Step-by-step solution
Step 1 of 3
Timestamp ordering protocol for concurrency control: The protocol manages concurrent execution such that the timestamps determine the serializability order. For each data item Q, the protocol maintains two timestamp values:
(1) W-timestamp(Q): the largest timestamp of any transaction that executed write(Q) successfully.
(2) R-timestamp(Q): the largest timestamp of any transaction that executed read(Q) successfully.
The timestamp ordering protocol ensures that any conflicting read and write operations are executed in timestamp order.
Step 2 of 3
How strict timestamp ordering differs from basic timestamp ordering:
Strict timestamp ordering (TO): When a transaction T issues a read_item(X) or write_item(X) operation such that TS(T) > write_TS(X), the operation is delayed until the transaction T' that wrote the current value of X (so that TS(T') = write_TS(X)) has committed or aborted. This ensures that the schedules are both strict and conflict serializable.
Step 3 of 3
Basic timestamp ordering: Operations are never delayed; an operation that arrives out of timestamp order is rejected and its transaction is aborted.
When transaction T issues a write_item(X) operation:
• If read_TS(X) > TS(T) or write_TS(X) > TS(T), then a younger transaction has already read or written the data item, so abort and roll back T and reject the operation.
• If that condition does not hold, then execute write_item(X) of T and set write_TS(X) to TS(T).
When transaction T issues a read_item(X) operation:
• If write_TS(X) > TS(T), then a younger transaction has already written the data item, so abort and roll back T and reject the operation.
• If write_TS(X) <= TS(T), then execute read_item(X) of T and set read_TS(X) to the larger of TS(T) and the current read_TS(X).

Chapter 21, Problem 9RQ
Problem
Discuss two multiversion techniques for concurrency control. What is a certify lock? What are the advantages and disadvantages of using certify locks?
Step-by-step solution
Step 1 of 4
Multiversion concurrency control techniques are the ones that retain the old values of data items while also working with the newer versions of those values. The purpose of holding the older values is to maintain serializability: a read that would otherwise be rejected can be served from an older version that is compatible with it.
Two multiversion techniques for concurrency control are as follows:
1. Multiversion Technique Based on Timestamp Ordering.
2. Multiversion Two-Phase Locking Using Certify Locks.
Step 2 of 4
Consider the description of the two multiversion techniques for concurrency control listed above:
1. Multiversion Technique Based on Timestamp Ordering: In this technique, several versions of each data item X are maintained. For each version, two timestamps are kept:
• Read_TS: the largest of the timestamps of all transactions that have successfully read that version.
• Write_TS: the timestamp of the transaction that wrote the value of that version.
Whenever a write operation is performed on an item X, a new version of X is created with both its read_TS and write_TS set to the timestamp of the writing transaction, while the previous version is also retained.
Step 3 of 4
2.
Multiversion Two-Phase Locking Using Certify Locks: In this, there are three kinds of locking modes for each item. These three kinds of locking modes are as follows: • Read • Write • Certify So, if a state is said to be locked then it may be any of these three locks. • In the previous locking scheme, if a transaction holds a write lock over an item, then no one item is allowed to access that. But here it is to allow other transactions T to read an item X while a single transaction T holds a write lock on X. • For this purpose two version of x is to be held. Then in case of committing a transaction, certify lock is to be maintained over an item. Comment Step 4 of 4 Certify Lock: It is the kind of lock that is attained only when all the updated values need to be finalized so that it can get a stable state. It is similar to a commit statement when all the transactions that are performed successfully are need to be saved. Advantages of Certify Lock: • When the transaction is completed and is ready to be saved, then a certify lock is maintained over a transaction or over an item so as to maintain a monopoly over it. • The updating of the data item can be completed securely and the data get saved from any kind of hindrance. Disadvantage of Certify Lock: When a transaction is completed and there is maintained a certifies lock, then in that case none of the other data item or other process is not able to have access over that item and cannot have access even for reading the item. Comment Chapter 21, Problem 10RQ Problem How do optimistic concurrency control techniques differ from other concurrency control techniques? Why are they also called validation or certification techniques? Discuss the typical phases of an optimistic concurrency control method. Step-by-step solution Step 1 of 2 In all concurrency control techniques, certain degree of checking is done before a database operation can be executed. 
For example, in locking a check is done to determine weather the item being accessed is locked. In timestamp ordering, the transaction timestamp is checked against the read and the write timestamps of the item. Such checking represent overhead during transactions. In optimistic concurrency control techniques, also known as validation or certification techniques, no checking is done while the transaction is executing. In one of validation schemes, updates in the transaction are not applied directly to the database items until the transaction reaches its end. During transaction execution all updates are made to the local copies of data items that are kept for transaction. At the end of transaction execution, validation phases checks weather any of the transaction’s updates violate serializability. Certain information needed by validation phase must be kept in the system. If serializability is not violated the transaction is committed and database is updated from local copies; otherwise the transaction is aborted and restarted later. Comment Step 2 of 2 Phases of Concurrency control protocol: 1.) Read phase: A transaction can read values of committed data items from the database. However, updates are applied only to local copies of the data items kept in the transaction workspace. 2.) Validation phase: Checking is performed to ensure that serializability will not be violated if the transaction updates are applied to the database. 3.) Write phase: If the validation phase is successful, the transaction updates are applied to the database; otherwise, the updates are discarded and the transaction restarted. The idea behind optimistic concurrency control is to do all checks at once; hence, transaction execution proceeds with a minimum overhead until the validation phase is reached. Since in the validation phase it is decided that if transaction can be committed or must be aborted it is also called as validation or certification technique. 
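The three phases described above can be sketched as follows. This is a simplified illustration, not the textbook's validation algorithm: it assumes a per-item version counter in place of transaction timestamps and read/write sets, and the class names `Database` and `Transaction` are hypothetical.

```python
class Database:
    """Committed values plus a version counter per item (an assumption)."""

    def __init__(self, data):
        self.values = dict(data)
        self.versions = {item: 0 for item in data}


class Transaction:
    def __init__(self, db):
        self.db = db
        self.read_versions = {}  # item -> version seen in the read phase
        self.local = {}          # local copies of updated items

    # Read phase: read committed values, update only local copies.
    def read(self, item):
        if item in self.local:
            return self.local[item]
        self.read_versions.setdefault(item, self.db.versions[item])
        return self.db.values[item]

    def write(self, item, value):
        self.local[item] = value

    def commit(self):
        # Validation phase: every item read must be unchanged since it was read.
        for item, seen in self.read_versions.items():
            if self.db.versions[item] != seen:
                self.local.clear()  # discard updates; caller may restart
                return False
        # Write phase: apply the local copies to the database.
        for item, value in self.local.items():
            self.db.values[item] = value
            self.db.versions[item] = self.db.versions.get(item, 0) + 1
        return True


db = Database({"X": 10})
t1, t2 = Transaction(db), Transaction(db)
t1.write("X", t1.read("X") + 1)
t2.write("X", t2.read("X") + 5)
print(t1.commit())     # True: nothing changed since T1 read X
print(t2.commit())     # False: X was updated after T2 read it
print(db.values["X"])  # 11
```

No checking happens while the transactions execute; all of it is deferred to `commit`, which is why the second transaction only learns at validation time that it must be restarted.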
Comment Problem Chapter 21, Problem 11RQ What is snapshot isolation? What are the advantages and disadvantages of concurrency control methods that are based on snapshot isolation? Step-by-step solution Step 1 of 1 Snapshot isolation: Snapshot isolation is used in concurrency control protocols and some commercial DBMSs. Its definition comprises of the data items that is read by a transaction based on the committed values of the items present in the database snapshot. Snapshot isolation ensures that Phantom record problem does not happen. It ensures this, through the records that are executed in the database at the beginning of a transaction. Advantages of concurrency control methods based on snapshot isolation are as follows: • As the database statement or even database transaction only have the records, that were executed in the database when the transaction had started, so the snapshot isolation ensures that the phantom record problem does not arises. • The problems of nonrepeatable read and dirty read might arise during the transaction execution. Snapshot isolation ensures that these problems of nonrepeatable read and dirty read does not occur. • The concurrency control methods based on snapshot isolation has reduced overhead associated with the two phase locking, as there is no necessity to apply read locks to the items, in the read operations linked with the concurrency control methods. Disadvantages of concurrency control methods based on snapshot isolation are as follows: • Nonserializable schedules can occur in the case of concurrency control based snapshot isolation. There are few anomalies such as write-skew anomalies, read-only transaction anomaly that violates serializability. Such anomalies results in corrupted or inconsistent database. Comment Next Chapter 21, Problem 12RQ Problem How does the granularity of data items affect the performance of concurrency control? What factors affect selection of granularity size for data items? 
Step-by-step solution Step 1 of 3 The size of data item is often referred to as data item granularity. Smaller the size of data item it is fine granularity, larger size is course granularity. Comment Step 2 of 3 How does it affect performance of concurrency control? 1.) First notice that the larger the data item size is, the lower the degree of concurrency permitted. For example, if the data item size is a disk block, a transaction T that needs to lock a record B must lock the whole disk block X that contains B because a lock is associated with the whole data item (block). Now, if another transaction S wants to lock a different record C that happens to reside in the same block X in a conflicting lock mode, it is forced to wait. If the data item size was a single record, transaction S would be able proceed, because it would be locking a different data item (record). 2.) The smaller the data item size is, the more the number of items in the database. Because every item is associated with a lock, the system will have a larger number of active locks to be handled by lock manager. More lock and unlock operations will be performed, causing a higher overhead. In addition , more storage space will be required for the lock table. For timestamps, storage is required for the read_TS and write_TS for each item, and there will be similar overhead for handling a large number of items. Comment Step 3 of 3 Factors affecting selection of granularity size for data items: Best item size is dependent on transactions involved. If a typical transaction accesses a small number of records, it is advantageous to have the data item granularity be one record. On other hand, if a transaction typically accesses many records in the same file. It may be better to have block or file granularity so that the transaction will consider all the records as one data item. Comment Chapter 21, Problem 13RQ Problem What type of lock is needed for insert and delete operations? 
Step-by-step solution Step 1 of 2 Types of locks needed for insert and delete operations:If we want to per form a delete/insert operation a new item in the database, it can not be accessed until. The item is created and the insert operation is completed. For this we use the locks that (1) two-phase locking (2) index locking by using two-phase locking if we use the delete operation, that may be performed only if the transaction deleting the tuple holds an exclusive lock on the tuple to be deleted. And Comment Step 2 of 2 A transaction that inserts a new tuple into the database is automatically given an exclusive lock on the inserted tuple. Insertion and deletion can lead to the phantom phenomenon. A transaction that scans a relation and a transaction that inserts a tuple in the relation. And if only tuple locks are used non-serializable schedules can result. the transaction scanning the relation is reading information that indicates which tuples the relation contains and while a transaction inserting a tuple updates the same information. Transactions inserting or deleting a tuple acquire and exclusive lock on the data item. From the above protocol, it provides a law concurrency for insertions/deletions. And Index locking protocols provide higher concurrency while preventing the phantom problem by requiring the locks on certain index buckets. Comment Chapter 21, Problem 14RQ Problem What is multiple granularity locking? Under what circumstances is it used? Step-by-step solution Step 1 of 1 Multiple granularity locking is a lock that may contain locks are set of objects. That contain other object locks are exploiting the hierarchical nature of contains a relationship. Multiple granularity locks should have to make some decision for all transactions and data containers are nested. Multiple granularity locks used in where the granularity level can be different for various maxis of transactions. 
• Multiple granularity locking is used to improve concurrency control performance while still ensuring correctness and efficiency.
• To implement multiple granularity locking, some additional types of locks are required. These locks are called intention locks.
Chapter 21, Problem 15RQ
Problem
What are intention locks?
Step-by-step solution
Step 1 of 2
Intention locks: To make locking at multiple granularity levels practical, additional types of locks are needed. These are the intention locks. The main idea behind intention locks is for a transaction to indicate, at a higher level such as a table, which type of lock it will require later on a lower-level item such as a row in that table. The transaction does not lock the whole object; it declares its intention to lock part of the object. There are three types of intention locks.
Step 2 of 2
(1) Intention-shared (IS): Indicates that one or more shared locks (S) will be requested on some descendant node(s).
(2) Intention-exclusive (IX): Indicates that one or more exclusive locks (X) will be requested on some descendant node(s).
(3) Shared-intention-exclusive (SIX): Indicates that the current node is locked in shared mode but that one or more exclusive locks (X) will be requested on some descendant node(s).
The intention lock protocol is as follows:
(1) Before a given transaction can acquire an S lock on a given row, it must first acquire an IS or stronger lock on the table containing that row.
(2) Before a given transaction can acquire an X lock on a given row, it must first acquire an IX or stronger lock on the table containing that row.
Chapter 21, Problem 16RQ
Problem
When are latches used?
Step-by-step solution
Step 1 of 1
Latches are used to guarantee the physical integrity of a page when that page is being written from the buffer to disk. A latch is acquired for the page, the page is written to disk, and then the latch is released. Locks that are held only for a short duration are typically called latches.
Chapter 21, Problem 17RQ
Problem
What is a phantom record?
Discuss the problem that a phantom record can cause for concurrency control.
Step-by-step solution
Step 1 of 2
Phantom record: A phantom arises when a new record inserted by some transaction T satisfies a condition that defines a set of records accessed by another transaction T′. If the equivalent serial order is T followed by T′, then T′ must see the new record; if the order is T′ followed by T, it must not. The transactions logically conflict, yet in the latter case there is really no record in common between them, since T′ may have locked all the matching records before T inserted the new one. The record that causes the conflict is called a phantom record.
Step 2 of 2
The problem a phantom record can cause for concurrency control: Consider an example. Suppose transaction T is inserting a new EMPLOYEE record whose Dno=5 while transaction T′ is accessing all EMPLOYEE records whose Dno=5 (say, to add up their salaries). If the equivalent serial order is T followed by T′, then T′ must read the new EMPLOYEE record and include its salary in the sum calculation. If the equivalent serial order is T′ followed by T, the new salary should not be included. In the latter case there is really no record in common between the two transactions, since T′ may have locked all the records with Dno=5 before T inserted the new record. The record that causes the conflict is a phantom record: it suddenly appears in the database on being inserted. Unless other operations in the two transactions conflict, the conflict due to the phantom record may not be recognized by the concurrency control protocol.
Chapter 21, Problem 18RQ
Problem
How does index locking resolve the phantom problem?
Step-by-step solution
Step 1 of 2
Index locking: An index includes entries that have an attribute value plus a set of pointers to all records in the file with that value. Suppose the index entry is locked before the record itself can be accessed.
Then the conflict on the phantom record can be detected, because the reading transaction would request a read lock on the index entry and the inserting transaction T would request a write lock on the same entry before either could place locks on the actual records. Since the index locks conflict, the phantom conflict is detected.
Step 2 of 2
Example: An index on Dno of EMPLOYEE would include an entry for each distinct Dno value, plus a set of pointers to all EMPLOYEE records with that value. If the index entry is locked before the record itself can be accessed, then the conflict on the phantom record can be detected, because the reading transaction would request a read lock on the index entry for Dno=5 and transaction T would request a write lock on the same entry before they could place locks on the actual records. Since the index locks conflict, the phantom conflict is detected.
Chapter 21, Problem 19RQ
Problem
What is a predicate lock?
Step-by-step solution
Step 1 of 1
Predicate lock: A predicate lock locks all records that satisfy some logical predicate, and the predicate may be arbitrary. Index locking is a special case of predicate locking for which an index supports efficient implementation of the predicate lock. In general, predicate locking has a lot of locking overhead and is too expensive, so index locking techniques are used in practice.
Chapter 21, Problem 20E
Problem
Prove that the basic two-phase locking protocol guarantees conflict serializability of schedules. (Hint: Show that if a serializability graph for a schedule has a cycle, then at least one of the transactions participating in the schedule does not obey the two-phase locking protocol.)
Step-by-step solution
Step 1 of 1
For this proof we argue by contradiction, and assume binary locks for simplicity.
Consider n transactions T1, T2, ..., Tn that all obey the basic two-phase locking rule, which is that no transaction has an unlock operation followed by a lock operation. Suppose that a non-(conflict)-serializable schedule S for T1, T2, ..., Tn does occur; then the precedence (serialization) graph for S must have a cycle. Hence, there must be some sequence within the schedule of the form:
S: ...; [o1(X); ...; o2(X);] ...; [o2(Y); ...; o3(Y);] ...; [on(Z); ...; o1(Z);] ...
where each pair of operations between square brackets [o,o] is conflicting (either [w,w], or [w,r], or [r,w]) in order to create an arc in the serialization graph. This implies that in transaction T1, a sequence of the following form occurs:
T1: ...; o1(X); ...; o1(Z); ...
Furthermore, T1 has to unlock item X (so that T2 can lock it before applying o2(X), following the rules of locking) and has to lock item Z (before applying o1(Z), but this must occur after Tn has unlocked it). Hence, a sequence of the following form occurs in T1:
T1: ...; o1(X); ...; unlock(X); ...; lock(Z); ...; o1(Z); ...
This implies that T1 does not obey the two-phase locking protocol (since lock(Z) follows unlock(X)), contradicting our assumption that all transactions in S follow the two-phase locking protocol.
Chapter 21, Problem 21E
Problem
Modify the data structures for multiple-mode locks and the algorithms for read_lock(X), write_lock(X), and unlock(X) so that upgrading and downgrading of locks are possible. (Hint: The lock needs to check the transaction id(s) that hold the lock, if any.)
Step-by-step solution
Step 1 of 1
A list of the transaction ids that have read-locked an item is maintained, as well as the (single) transaction id that has write-locked an item. Only read_lock and write_lock are shown below.
read_lock (X, Tn):
B: if lock (X) = "unlocked"
      then begin lock (X) <- "read_locked, List(Tn)";
                 no_of_reads (X) <- 1
           end
   else if lock (X) = "read_locked, List"
      then begin (* add Tn to the list of transactions that have read_lock on X *)
                 lock (X) <- "read_locked, Append(List,Tn)";
                 no_of_reads (X) <- no_of_reads (X) + 1
           end
   else if lock (X) = "write_locked, Tn"
      (* downgrade the lock if write_lock on X is held by Tn itself *)
      then begin lock (X) <- "read_locked, List(Tn)";
                 no_of_reads (X) <- 1
           end
   else begin wait (until lock (X) = "unlocked" and the lock manager wakes up the transaction);
              goto B;
        end;

write_lock (X, Tn):
B: if lock (X) = "unlocked"
      then lock (X) <- "write_locked, Tn"
   else if ( (lock (X) = "read_locked, List") and (no_of_reads (X) = 1) and (transaction in List = Tn) )
      (* upgrade the lock if read_lock on X is held only by Tn itself *)
      then lock (X) <- "write_locked, Tn"
   else begin wait (until ( [ lock (X) = "unlocked" ] or
                            [ (lock (X) = "read_locked, List") and (no_of_reads (X) = 1) and (transaction in List = Tn) ] )
                    and the lock manager wakes up the transaction);
              goto B;
        end;

Chapter 21, Problem 22E
Problem
Prove that strict two-phase locking guarantees strict schedules.
Step-by-step solution
Step 1 of 1
In strict two-phase locking, a transaction T does not release any of its exclusive (write) locks until after it commits or aborts. Hence no other transaction can read or write an item written by T until T has committed, which is exactly the condition for a strict schedule.
Chapter 21, Problem 23E
Problem
Prove that the wait-die and wound-wait protocols avoid deadlock and starvation.
Step-by-step solution
Step 1 of 2
Two schemes that prevent deadlock are called wait-die and wound-wait. Suppose that transaction Ti tries to lock an item X but is not able to because X is locked by some other transaction Tj with a conflicting lock.
The rules followed by these schemes are as follows:
• Wait-die: If TS(Ti) < TS(Tj), then (Ti older than Tj) Ti is allowed to wait; otherwise (Ti younger than Tj) abort Ti (Ti dies) and restart it later with the same timestamp.
• Wound-wait: If TS(Ti) < TS(Tj), then (Ti older than Tj) abort Tj (Ti wounds Tj) and restart it later with the same timestamp; otherwise (Ti younger than Tj) Ti is allowed to wait.
Step 2 of 2
In wait-die, an older transaction is allowed to wait on a younger transaction, whereas a younger transaction requesting an item held by an older transaction is aborted and restarted. The wound-wait approach does the opposite: a younger transaction is allowed to wait on an older one, whereas an older transaction requesting an item held by a younger transaction preempts the younger transaction by aborting it. Both schemes end up aborting the younger of the two transactions that may be involved in a deadlock. It can be shown that these two techniques are deadlock-free: in wait-die, transactions only wait on younger transactions, and in wound-wait, transactions only wait on older transactions, so in either case no cycle can be created in the wait-for graph. Starvation is avoided because an aborted transaction is restarted with its original timestamp; it therefore eventually becomes the oldest active transaction, and the oldest transaction is never aborted by either scheme. However, both techniques may cause some transactions to be aborted and restarted needlessly, even though those transactions may never actually cause a deadlock.
Chapter 21, Problem 24E
Problem
Prove that cautious waiting avoids deadlock.
Step-by-step solution
Step 1 of 1
Cautious waiting avoids deadlock: In cautious waiting, a transaction Ti can wait on a transaction Tj (and hence Ti becomes blocked) only if Tj is not blocked at that time, say time b(Ti), when Ti waits. Later, at some time b(Tj) > b(Ti), Tj can be blocked and wait on another transaction Tk only if Tk is not blocked at that time. However, Tj cannot be blocked by waiting on an already blocked transaction, since this is not allowed by the protocol. Hence, the wait-for graph among the blocked transactions in this system follows the blocking times and never has a cycle, and so deadlock cannot occur.
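The waiting rules discussed above can be made concrete with a small sketch. It is only an illustration under simplifying assumptions: timestamps are plain integers (smaller means older), each blocked transaction waits on exactly one other transaction, and the names `wait_die`, `wound_wait`, and `CautiousWaiting` are hypothetical rather than taken from the textbook.

```python
def wait_die(ts_requester, ts_holder):
    """Wait-die: an older requester waits, a younger requester dies."""
    return "wait" if ts_requester < ts_holder else "abort requester"


def wound_wait(ts_requester, ts_holder):
    """Wound-wait: an older requester wounds the holder, a younger one waits."""
    return "abort holder" if ts_requester < ts_holder else "wait"


class CautiousWaiting:
    """Tracks which blocked transaction waits on which other transaction."""

    def __init__(self):
        self.waits_for = {}  # blocked transaction -> transaction it waits on

    def request(self, requester, holder):
        # The requester may wait only if the holder is not itself blocked.
        if holder in self.waits_for:
            return "abort requester"
        self.waits_for[requester] = holder
        return "wait"

    def has_cycle(self):
        # Follow wait-for edges from every blocked transaction.
        for start in self.waits_for:
            seen, node = set(), start
            while node in self.waits_for:
                if node in seen:
                    return True
                seen.add(node)
                node = self.waits_for[node]
        return False


cw = CautiousWaiting()
first = cw.request("T1", "T2")   # T2 is not blocked, so T1 may wait
second = cw.request("T2", "T1")  # T1 is blocked, so T2 is aborted instead
```

In this run T1 is allowed to wait, while the request that would have closed the cycle T1 -> T2 -> T1 is refused, so the wait-for graph stays acyclic.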
Chapter 21, Problem 27E
Problem
Why is two-phase locking not used as a concurrency control method for indexes such as B+-trees?
Step-by-step solution
Step 1 of 1
Two-phase locking can also be applied to indexes such as B+-trees, where the nodes of an index correspond to disk pages. However, holding locks on index pages until the shrinking phase of 2PL could cause an undue amount of transaction blocking, because searching an index always starts at the root. Therefore, if a transaction wants to insert a record (a write operation), the root would be locked in exclusive mode, so all other conflicting lock requests for the index must wait until the transaction enters the shrinking phase. This blocks all other transactions from accessing the index, so in practice other approaches to locking an index must be used.
Chapter 21, Problem 28E
Problem
The compatibility matrix in Figure 21.8 shows that IS and IX locks are compatible. Explain why this is valid.
Step-by-step solution
Step 1 of 1
IS and IX are compatible. When transaction T holds an IS lock on a node and T′ requests an IX lock on it, T only intends to acquire shared locks on some descendants, and T′ intends to acquire an exclusive lock on a descendant that may be different from the ones T is working on. Similarly, T′ might hold IX and T might request IS; since T′ may intend to access only a node different from the one accessed by T, both operations are compatible. Any actual conflict is detected at the lower level, where the S and X locks themselves are requested.
Chapter 21, Problem 29E
Problem
The MGL protocol states that a transaction T can unlock a node N only if none of the children of node N are still locked by transaction T. Show that without this condition, the MGL protocol would be incorrect.
Step-by-step solution
Step 1 of 2
The rule states that a parent node can be unlocked only when none of its children is still locked by transaction T. This rule enforces the 2PL rules needed to produce serializable schedules.
If this rule is not followed, the schedule may not be serializable; a non-serializable schedule may produce incorrect results, and thus the protocol fails.
Step 2 of 2
This rule ensures serializability by governing the order in which a transaction T locks and manipulates data items. Suppose transaction T wants to insert data into a leaf node, and suppose that the root node is unlocked before the data is inserted and the leaf node is unlocked. If the leaf node is full, the insertion calls for a split, but because the root has been unlocked and may now be locked by another transaction T′, the operation cannot proceed. Hence the protocol fails.
Chapter 22, Problem 1RQ
Problem
Discuss the different types of transaction failures. What is meant by catastrophic failure?
Step-by-step solution
Step 1 of 1
Types of failures:
Computer failure (system crash): A hardware, software, or network error occurs during transaction execution. The contents of main memory may be lost, so anything that was not yet written to disk is gone.
Transaction or system error: An operation in the transaction, such as division by zero or integer overflow, causes it to fail. Transaction failure may also occur because of erroneous parameter values or because of a logical programming error.
Local errors or exception conditions detected by the transaction: The transaction starts but then halts and cancels its work because some condition along the way prevents it from proceeding.
Concurrency control enforcement: The concurrency control method aborts a transaction, for example because several transactions have become deadlocked.
Disk failure: Some disk blocks may lose their data because of a read or write malfunction or because of a read/write head crash.
Catastrophic failure: This covers the many forms of physical misfortune that can strike the database server, such as power or air-conditioning failure, fire, theft, sabotage, or overwriting disks by mistake. In such cases a large part of the stored database may be lost.
Chapter 22, Problem 2RQ
Problem
Discuss the actions taken by the read_item and write_item operations on a database.
Step-by-step solution
Step 1 of 1
The basic database access operations are read_item and write_item.
Actions taken by the read_item operation on a database (assume the read operation is performed on data item X):
Find the address of the disk block that contains item X.
Copy the disk block into a buffer in main memory if that disk block is not already in some main memory buffer.
Copy item X from the buffer to the program variable named X.
Actions taken by the write_item operation on a database (assume the write operation is performed on data item X):
Find the address of the disk block that contains item X.
Copy the disk block into a buffer in main memory if that disk block is not already in some main memory buffer.
Copy item X from the program variable named X into its correct location in the buffer.
Store the updated block from the buffer back to disk (either immediately or at some later point in time).
Chapter 22, Problem 3RQ
Problem
What is the system log used for? What are the typical kinds of entries in a system log? What are checkpoints, and why are they important? What are transaction commit points, and why are they important?
Step-by-step solution
Step 1 of 4
System log: Recovery from transaction failures usually means that the database is restored to the most recent consistent state just before the time of failure. To do this, the system must keep information about the changes that were applied to data items by the various transactions. This information is typically kept in the system log. Thus the system log supports data recovery in case of failures.
Step 2 of 4
A typical strategy for recovery may be summarized as follows: 1.)
If there is extensive damage to a wide portion of the database due to catastrophic failure, such as a disk crash, the recovery method restores a past copy of the database that was backed up to archival storage and reconstructs a more current state by reapplying (redoing) the operations of committed transactions from the backed-up log, up to the time of failure.
2.) When the database is not physically damaged but has become inconsistent due to noncatastrophic failures, the strategy is to reverse any changes that caused the inconsistency by undoing some operations. It may also be necessary to redo some operations in order to restore a consistent state of the database. In this case, we do not need a complete archival copy of the database. Rather, the entries kept in the online system log are consulted during recovery.
Typical kinds of entries in the system log:
1.) [start_transaction, T]
2.) [write_item, T, data item, old value, new value]
3.) [read_item, T, data item] (used for checking accesses to the database)
4.) [commit, T]
5.) [abort, T]
6.) [checkpoint]
Step 3 of 4
Checkpoint: This is a type of entry in the system log. A [checkpoint] record is written into the log periodically, at the point when the system writes out to the database on disk all DBMS buffers that have been modified. As a consequence, all transactions that have their [commit, T] entries in the log before a [checkpoint] entry do not need to have their WRITE operations redone in case of a system crash, since all their updates were recorded in the database on disk during checkpointing. A checkpoint record may also include additional information, such as a list of active transaction ids and the locations of the first and the most recent records in the log for each active transaction. This can facilitate undoing transaction operations in the event that a transaction must be rolled back.
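The role of the checkpoint described above can be sketched in a few lines. This is a minimal illustration, not the textbook's algorithm: it assumes the log is a Python list of tuples, that a crash happens right after the last record, and that no [abort, T] records occur. The function name `transactions_to_redo` and the sample log are hypothetical.

```python
def transactions_to_redo(log):
    """Return (redo, active) sets of transaction ids for a crash at the end of log."""
    # Position of the most recent checkpoint record, or -1 if there is none.
    last_checkpoint = -1
    for position, record in enumerate(log):
        if record[0] == "checkpoint":
            last_checkpoint = position

    started = {r[1] for r in log if r[0] == "start_transaction"}
    committed = {r[1] for r in log if r[0] == "commit"}
    # Commits recorded before the checkpoint are already reflected on disk.
    committed_before = {
        r[1] for r in log[: last_checkpoint + 1] if r[0] == "commit"
    }
    redo = committed - committed_before   # committed after the checkpoint
    active = started - committed          # never reached a commit point
    return redo, active


sample_log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "A", 10, 20),
    ("commit", "T1"),
    ("checkpoint",),
    ("start_transaction", "T2"),
    ("write_item", "T2", "B", 5, 7),
    ("commit", "T2"),
    ("start_transaction", "T3"),
    ("write_item", "T3", "C", 1, 2),
]
redo, active = transactions_to_redo(sample_log)
```

Here T1 committed before the checkpoint and needs no redo, T2 committed after it and is a redo candidate, and T3 was still active at the crash.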
Step 4 of 4
Commit point: A transaction reaches its commit point when all its operations that access the database have been executed successfully and the effect of all its operations has been recorded in the log. Beyond the commit point, the transaction is said to be committed; its effect is permanently recorded in the database and it can no longer be rolled back. The commit point is especially important in recovery techniques based on deferred update. A typical deferred update protocol is stated as follows:
1.) A transaction cannot change the database on disk until it reaches its commit point.
2.) A transaction does not reach its commit point until all its update operations are recorded in the log and the log is force-written to disk.
Chapter 22, Problem 4RQ
Problem
How are buffering and caching techniques used by the recovery subsystem?
Step-by-step solution
Step 1 of 1
Buffering and caching techniques in the recovery subsystem: The recovery process is often closely intertwined with operating system functions. In general, one or more disk pages that include the data items to be updated are cached into main memory buffers and then updated in memory before being written back to disk. As the performance gap between disk and CPU increases, disk I/O has become a major performance bottleneck for data-intensive applications. Disk I/O latency, in particular, is much more difficult to improve than disk bandwidth. Buffering and caching in main memory have therefore been used extensively to bridge the performance gap between CPU and disk.
Chapter 22, Problem 5RQ
Problem
What are the before image (BFIM) and after image (AFIM) of a data item? What is the difference between in-place updating and shadowing, with respect to their handling of BFIM and AFIM?
Step-by-step solution
Step 1 of 3
BFIM and AFIM:
Before image (BFIM): The old value of the data item before updating is called the before image (BFIM).
After image (AFIM): The new value of the data item after updating is called the after image (AFIM).
Step 2 of 3
When a modified buffer is flushed back to disk, one of two strategies is followed:
(1) In-place updating.
(2) Shadowing.
Step 3 of 3
Difference between in-place updating and shadowing:
In-place updating: Writes the buffer to the same original disk location, overwriting the old value of any changed data items on disk. Hence a single copy of each database disk block is maintained, and the BFIM is lost from the database unless it has first been saved in the log.
Shadowing: Writes an updated buffer at a different disk location, so multiple versions of data items can be maintained. The old version is the BFIM and the new version is the AFIM. Because both the BFIM and the AFIM are kept on disk, it is not strictly necessary to maintain a log for recovery.
Chapter 22, Problem 6RQ
Problem
What are UNDO-type and REDO-type log entries?
Step-by-step solution
Step 1 of 2
UNDO-type and REDO-type log entries: In database recovery techniques, recovery is achieved by performing only UNDOs, only REDOs, or a combination of the two. These operations are recorded in the log when they happen. The log entry for a write command carries the information needed for UNDO and for REDO.
UNDO-type log entries: These entries include the old value(s) in the database before a write operation was executed. UNDO-type log entries are necessary for rollback operations. They are used to restore all BFIMs on disk, that is, to remove all AFIMs.
Step 2 of 2
REDO-type log entries: These entries include the new values in the database after a write operation has been executed. They are necessary for repeating the writes of already committed transactions, for example in case of disk failure. They are used to restore all AFIMs on disk.
Chapter 22, Problem 7RQ
Problem
Describe the write-ahead logging protocol.
Step-by-step solution
Step 1 of 1
Write-ahead logging protocol: When in-place updating (immediate or deferred) is used, a log is necessary for recovery, and it must be available to the recovery manager.
Specifically, the BFIM of the data item must be recorded in the appropriate log entry, and that log entry must be flushed to disk, before the BFIM is overwritten with the AFIM in the database on disk. This is achieved by the write-ahead logging (WAL) protocol. The write-ahead logging protocol states that:
(1) For undo: Before a data item's AFIM is flushed to the database on disk, its BFIM must be written to the log and the log must be saved on stable storage (the log disk).
(2) For redo: Before a transaction executes its commit operation, all its AFIMs must be written to the log and the log must be saved on stable storage.
Chapter 22, Problem 8RQ
Problem
Identify three typical lists of transactions that are maintained by the recovery subsystem.
Step-by-step solution
Step 1 of 1
Lists of transactions maintained by the recovery subsystem: For the best performance of the recovery process, the DBMS recovery subsystem may need to maintain a number of lists of transactions. The three typical lists are:
(1) Active transactions.
(2) Committed transactions.
(3) Aborted transactions.
These three lists make the recovery process more efficient.
Chapter 22, Problem 9RQ
Problem
What is meant by transaction rollback? What is meant by cascading rollback? Why do practical recovery methods use protocols that do not permit cascading rollback? Which recovery techniques do not require any rollback?
Step-by-step solution
Step 1 of 4
Transaction rollback: Transaction rollback means that if a transaction has failed after a disk write, its writes need to be undone. To maintain atomicity, a transaction's operations are redone or undone.
Undo: Restore all BFIMs on disk (remove all AFIMs).
Redo: Restore all AFIMs on disk.
Database recovery is achieved by performing only undos, only redos, or a combination of the two. These operations are recorded in the log as they happen.
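The Undo and Redo operations above can be sketched as two small functions over a write log. This is an illustration only: it assumes each log entry is a tuple (transaction, item, BFIM, AFIM), that the database is a plain dictionary standing in for the disk, and that the names `undo` and `redo` and all sample values are hypothetical.

```python
def undo(database, log, transaction):
    """Restore the BFIMs of one transaction, scanning the log backward."""
    for txn, item, bfim, _afim in reversed(log):
        if txn == transaction:
            database[item] = bfim


def redo(database, log, transaction):
    """Reapply the AFIMs of one transaction, scanning the log forward."""
    for txn, item, _bfim, afim in log:
        if txn == transaction:
            database[item] = afim


# Write log: (transaction, item, BFIM, AFIM)
write_log = [
    ("T1", "X", 1, 10),
    ("T2", "Y", 2, 20),
    ("T1", "X", 10, 11),
]

# Assumed state on disk at the crash: the writes of T1 (uncommitted)
# reached disk, while the write of T2 (committed) did not.
disk = {"X": 11, "Y": 2}

undo(disk, write_log, "T1")  # X goes 11 -> 10 -> 1
redo(disk, write_log, "T2")  # Y goes 2 -> 20
```

Undo scans backward so that the oldest BFIM is the one left in place; redo scans forward so that the newest AFIM wins.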
Step 2 of 4
Cascading rollback: Cascading rollback occurs when the failure and rollback of some transaction requires the rollback of other, uncommitted transactions because they read updates of the failed transaction. Any values that were derived from the values that were rolled back must also be undone.
Step 3 of 4
Practical recovery methods use protocols that do not permit cascading rollback because it is complex and time-consuming. Practical recovery methods guarantee cascadeless or strict schedules.
Step 4 of 4
The deferred update (NO-UNDO/REDO) recovery technique does not require any rollback, because a transaction makes no change to the database on disk before it commits.
Chapter 22, Problem 10RQ
Problem
Discuss the UNDO and REDO operations and the recovery techniques that use each.
Step-by-step solution
Step 1 of 1
UNDO/REDO operations: To describe a protocol for write-ahead logging, we must distinguish between two types of log entry information included for a write command: UNDO and REDO. An UNDO-type log entry includes the old value (BFIM) of the item, since this is needed to undo the effect of the operation from the log. A REDO-type log entry includes the new value (AFIM) of the item written by the operation, since this is needed to redo the effect of the operation from the log. In the UNDO/REDO algorithm, both types of log entries are combined. If cascading rollback is possible, the read_item entries in the log are also considered to be UNDO-type entries. Deferred update techniques use only REDO (NO-UNDO/REDO), while immediate update techniques use UNDO, either alone (UNDO/NO-REDO) or together with REDO (UNDO/REDO).
Chapter 22, Problem 11RQ
Problem
Discuss the deferred update technique of recovery. What are the advantages and disadvantages of this technique? Why is it called the NO-UNDO/REDO method?
Step-by-step solution
Step 1 of 5
Deferred update technique of recovery: The main idea of this technique is to defer, or postpone, any actual updates to the database until the transaction completes its execution successfully and reaches its commit point.
With this technique, during execution the updates are recorded only in the log and in the cache buffers. After the transaction reaches its commit point and the log is force-written to disk, the updates are recorded in the database. The deferred update technique is also called NO-UNDO/REDO recovery.
Deferred update protocol: It maintains two main rules.
A transaction cannot change any items in the database until it commits.
A transaction may not commit until all of its write operations are successfully recorded in the log. This means that we must check that the log has actually been written to disk.
Example:
Step 2 of 5
Log file: start, write, commit, checkpoint, start, write, write, commit, start, write, start, write, system crash.
Step 3 of 5
From this example: The transactions whose commit records appear in the log committed, so their changes were written to disk. The remaining transactions did not commit; hence, their changes were not written to disk. To recover, we simply ignore those transactions that did not commit.
Step 4 of 5
Advantages and disadvantages of the deferred update technique:
Advantages: Recovery is made easier. Any transaction that reached its commit point (according to the log) has its writes applied to the database (REDO). All other transactions are ignored. Cascading rollback does not occur, because no transaction sees the work of another until it is committed (no stale reads).
Disadvantages: Concurrency is limited, because strict 2PL must be employed.
Step 5 of 5
The deferred update technique is called the NO-UNDO/REDO recovery method for the following reason. The second rule of the protocol (a transaction does not reach its commit point until all its update operations are recorded in the log and the log is force-written to disk) is a restatement of the write-ahead logging (WAL) protocol, so committed updates can always be redone from the log. Because the database is never updated on disk until after the transaction commits, there is never a need to UNDO any operations. Hence this is known as the NO-UNDO/REDO method.
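A minimal sketch of NO-UNDO/REDO recovery is given below. It follows the order of records in the example log above, but the transaction names T1 to T4, the item names, and all values are assumptions introduced for illustration, as is the function name `recover_deferred`. For simplicity the sketch redoes every committed transaction in the log rather than only those that committed after the checkpoint; this is safe because redoing a write is idempotent.

```python
def recover_deferred(database, log):
    """NO-UNDO/REDO: redo writes of committed transactions, ignore the rest."""
    committed = {record[1] for record in log if record[0] == "commit"}
    recovered = dict(database)  # leave the caller's copy untouched
    for record in log:
        if record[0] == "write" and record[1] in committed:
            _kind, _txn, item, new_value = record
            recovered[item] = new_value
    return recovered


# Deferred update only needs the new value (AFIM) in each write record.
example_log = [
    ("start", "T1"), ("write", "T1", "A", 1), ("commit", "T1"),
    ("checkpoint",),
    ("start", "T2"), ("write", "T2", "B", 2), ("write", "T2", "C", 3),
    ("commit", "T2"),
    ("start", "T3"), ("write", "T3", "A", 9),
    ("start", "T4"), ("write", "T4", "D", 4),
    # system crash here
]

# Assumed disk state at the crash: only the write of T1 had reached disk.
on_disk = {"A": 1, "B": 0, "C": 0, "D": 0}
after_recovery = recover_deferred(on_disk, example_log)
```

The writes of T3 and T4 never touch the database, so nothing has to be undone; they are simply skipped.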
Chapter 22, Problem 12RQ
Problem
How can recovery handle transaction operations that do not affect the database, such as the printing of reports by a transaction?
Step-by-step solution
Step 1 of 1
If a transaction that has actions that do not affect the database, such as generating and printing messages or reports from information retrieved from the database, fails before completion, we may not want the user to get these reports, since the transaction has failed to complete. If such erroneous reports are produced, part of the recovery process would have to inform the user that these reports are wrong, since the user may take an action based on these reports that affects the database. Hence such reports must be generated only after the transaction reaches its commit point. A common method of dealing with such actions is to issue the commands that generate the reports but keep them as batch jobs, which are executed only after the transaction reaches its commit point. If the transaction fails, the batch jobs are canceled.
Chapter 22, Problem 13RQ
Problem
Discuss the immediate update recovery technique in both single-user and multiuser environments. What are the advantages and disadvantages of immediate update?
Step-by-step solution
Step 1 of 2
Immediate update technique: Immediate update applies the write operations to the database as the transaction is executing. When the transaction issues an update command, the database can be updated without any need to wait for the transaction to reach its commit point. The update operation must still be recorded in the log before it is applied to the database, using the write-ahead logging protocol. Two logs are maintained:
(1) REDO log: A record of the new value of each updated data item.
(2) UNDO log: A record of the old value of each updated data item.
The technique follows two rules:
(1) Transaction T may not update the database until all undo entries have been written to the UNDO log.
(2) Transaction T is not allowed to commit until all REDO and UNDO log entries are written.
Step 2 of 2
Advantages and disadvantages of immediate update:
Advantages: Immediate update allows higher concurrency, because transactions write continuously to the database rather than waiting until the commit point.
Disadvantages: It can lead to cascading rollbacks, which are time-consuming and may be problematic.
Chapter 22, Problem 14RQ
Problem
What is the difference between the UNDO/REDO and the UNDO/NO-REDO algorithms for recovery with immediate update? Develop the outline for an UNDO/NO-REDO algorithm.
Step-by-step solution
Step 1 of 3
Difference between the UNDO/REDO and UNDO/NO-REDO algorithms:
UNDO/REDO algorithm: This is the general recovery technique based on immediate update, in which a transaction is allowed to commit before all its changes are written to the database on disk. Recovery schemes of this category apply both undo and redo to recover the database from failure. In a single-user environment no concurrency control is required, but a log is maintained under WAL. The recovery manager performs:
An undo of a transaction if it is in the active table.
A redo of a transaction if it is in the commit table.
Step 2 of 3
UNDO/NO-REDO algorithm: In this algorithm, the AFIMs of a transaction are flushed to the database on disk under WAL before the transaction commits. For this reason, the recovery manager only needs to undo the transactions that had not committed at the time of the crash; no transaction is redone. A transaction that had completed execution and was ready to commit, but had not yet committed, is also undone.
Step 3 of 3
Outline for an UNDO/NO-REDO algorithm:
1. From the log, determine the list of transactions that were active (started but not committed) at the time of the crash.
2. Undo all the write_item operations of the active transactions, in the reverse of the order in which they were written into the log, restoring the BFIM of each item.
3. Perform no redo: because the AFIMs of every committed transaction were flushed to the database on disk under WAL before it committed, the updates of committed transactions are already on disk.
Chapter 22, Problem 15RQ
Problem
Describe the shadow paging recovery technique.
Under what circumstances does it not require a log? Step-by-step solution Step 1 of 3 Shadow paging recovery technique: Shadow paging considers the database to be made up of a number of fixed-size disk pages (or disk blocks), say n, for recovery purposes. To manage access to data items by concurrent transactions, two directories (current and shadow) are used. The directory arrangement is illustrated below. Comment Step 2 of 3 [Figure: the current directory (after updating data items 2 and 5) and the shadow directory (not updated).] Comment Step 3 of 3 Here data items means pages. Shadow paging does not require a log in a single-user environment. Comment Chapter 22, Problem 16RQ Problem Describe the three phases of the ARIES recovery method. Step-by-step solution Step 1 of 2 Three phases of the ARIES recovery method: The ARIES recovery algorithm consists of three phases. (1) Analysis phase (2) Redo phase (3) Undo phase. Comment Step 2 of 2 The analysis phase identifies the dirty pages in the buffer and the set of transactions active at the time of the crash. The appropriate point in the log where redo is to start is also determined. In the redo phase, the necessary redo operations are applied. In the undo phase, the log is scanned backward and the operations of transactions that were active at the time of the crash are undone in reverse order. Comment Chapter 22, Problem 17RQ Problem What are log sequence numbers (LSNs) in ARIES? How are they used? What information do the Dirty Page Table and Transaction Table contain? Describe how fuzzy checkpointing is used in ARIES. Step-by-step solution Step 1 of 4 Log sequence numbers in ARIES: In ARIES, every log record has an associated log sequence number (LSN) that is monotonically increasing and indicates the address of the log record on disk. A log record is written for each of the following actions: (1) data update (2) transaction commit (3) transaction abort (4) undo (5) transaction end.
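The idea of a log whose records carry monotonically increasing LSNs can be illustrated with a minimal Python sketch. The class and field names (`Log`, `LogRecord`, `RECORD_KINDS`) are hypothetical, and the LSN is assumed to be a simple counter rather than a disk address, so this is a simplification and not the ARIES implementation.

```python
from dataclasses import dataclass
from itertools import count

# Record kinds taken from the list above; the names used here are
# hypothetical and chosen only for this illustration.
RECORD_KINDS = ("update", "commit", "abort", "undo", "end")


@dataclass(frozen=True)
class LogRecord:
    lsn: int   # log sequence number, strictly increasing
    txn: str   # transaction identifier
    kind: str  # one of RECORD_KINDS


class Log:
    def __init__(self):
        # Assumption: LSNs are simply 1, 2, 3, ... instead of disk addresses.
        self._next_lsn = count(1)
        self.records = []

    def append(self, txn, kind):
        """Append a record and return the LSN assigned to it."""
        if kind not in RECORD_KINDS:
            raise ValueError(f"unknown record kind: {kind}")
        record = LogRecord(next(self._next_lsn), txn, kind)
        self.records.append(record)
        return record.lsn


log = Log()
lsns = [
    log.append("T1", "update"),
    log.append("T2", "update"),
    log.append("T1", "commit"),
    log.append("T1", "end"),
]
```

Because every call to `append` draws the next value from the counter, a later record always has a larger LSN than an earlier one, which is the property recovery relies on when it compares LSNs.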
Comment Step 2 of 4 In the case of undo, a compensating log record is written. Dirty page table and Transaction table: For efficient recovery, two tables are needed. These tables are stored in the log during checkpointing. (1) Transaction table: This table contains an entry for each active transaction, with information such as the transaction ID, the transaction status, and the LSN of the most recent log record for the transaction. (2) Dirty page table: This table contains an entry for each dirty page in the buffer, which includes the page ID and the LSN corresponding to the earliest update to that page. Comment Step 3 of 4 Fuzzy checkpointing: Fuzzy checkpointing is used to reduce the cost of checkpointing and to allow the system to continue executing transactions. ARIES performs fuzzy checkpointing as follows. It writes a begin-checkpoint record in the log. It writes an end-checkpoint record in the log; with this record, the contents of the transaction table and dirty page table are appended to the end of the log. It writes the LSN of the begin-checkpoint record to a special file. This special file is accessed during recovery to locate the last checkpoint information. Comment Step 4 of 4 In practice, with the fuzzy checkpoint technique the system can resume transaction processing after the begin-checkpoint record is written to the log, without having to wait for checkpoint step 2 (force-writing all modified memory buffers to disk) to finish. Until that step is completed, the previous checkpoint record should remain valid. To accomplish this, the system maintains a pointer to the valid checkpoint, which continues to point to the previous checkpoint record in the log. Once the step is concluded, the pointer is changed to point to the new checkpoint in the log. Comment Chapter 22, Problem 18RQ Problem What do the terms steal/no-steal and force/no-force mean with regard to buffer management for transaction processing?
Step-by-step solution Step 1 of 1 In transaction processing, buffer management is characterized by two policy choices. (1) Steal/no-steal: A system is said to steal buffers if it allows buffers that contain dirty data (data that has been updated but not yet committed) to be written to physical storage. If steal is allowed, undo is necessary. (2) Force/no-force: A system is said to force buffers if all pages updated by a transaction are guaranteed to be written to disk at commit time. If force is not used, redo is necessary. Comment Chapter 22, Problem 19RQ Problem Describe the two-phase commit protocol for multidatabase transactions. Step-by-step solution Step 1 of 1 Prepare phase: The global coordinator (the initiating node) asks all participants to prepare, that is, to promise to commit or roll back the transaction even if there is a failure. Commit phase: If all participants respond to the coordinator that they are prepared, the coordinator asks all nodes to commit the transaction. If any participant cannot prepare, the coordinator asks all nodes to roll back the transaction. Comment Chapter 22, Problem 20RQ Problem Discuss how disaster recovery from catastrophic failures is handled. Step-by-step solution Step 1 of 1 Catastrophic failures are handled by disaster recovery: the entire database, along with the log file, is periodically copied to a cheap, large storage device. When a catastrophe strikes, the most recent backup copy is placed back where the database used to be. Comment Chapter 22, Problem 21E Problem Suppose that the system crashes before the [read_item, T3, A] entry is written to the log in Figure 22.1(b). Will that make any difference in the recovery process? Step-by-step solution Step 1 of 1 Consider the log in Figure 22.1(b) of the textbook.
If the system crashes before the [read_item, T3, A] entry is written to the log, there will be no difference in the recovery process, because read_item operations are needed only for determining whether cascading rollback of additional transactions is necessary. Comment Chapter 22, Problem 22E Problem Suppose that the system crashes before the [write_item, T2, D, 25, 26] entry is written to the log in Figure 22.1(b). Will that make any difference in the recovery process? Step-by-step solution Step 1 of 2 When the system crashes before the transaction T2 performs its write operation on item D, there will be a difference in the recovery process. Comment Step 2 of 2 During the recovery process, the following transactions must be rolled back. • The transaction T3 has not reached its commit point. So, the transaction T3 has to be rolled back. • Also, the transaction T2 has not reached its commit point. So, the transaction T2 has to be rolled back. Hence, the transactions T2 and T3 have to be rolled back in the recovery process. Comment Chapter 22, Problem 23E Problem Figure shows the log corresponding to a particular schedule at the point of a system crash for four transactions T1, T2, T3, and T4. Suppose that we use the immediate update protocol with checkpointing. Describe the recovery process from the system crash. Specify which transactions are rolled back, which operations in the log are redone and which (if any) are undone, and whether any cascading rollback takes place. Figure A sample schedule and its corresponding log. Step-by-step solution Step 1 of 5 The recovery process from the system crash will be as follows: • Undo all the write operations of the transactions that are not committed. • Redo all the write operations of the transactions that committed after the checkpoint. • Do not redo or undo the transactions that committed before the checkpoint.
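The three rules above can be sketched as a small Python function that sorts transactions into undo, redo, and ignore sets. The tuple-based log format, the function name `classify`, and the sample log are assumptions made for illustration; the sample log is hypothetical and is not the log from the textbook figure.

```python
def classify(log):
    """Return (undo, redo, ignore) sets of transaction ids.

    Each log entry is a tuple whose first element is the entry kind:
    ("start", txn), ("write", txn, item, old, new), ("commit", txn)
    or ("checkpoint",). This format is a hypothetical simplification.
    """
    # Position of the most recent checkpoint, or -1 if there is none.
    last_cp = max(
        (i for i, entry in enumerate(log) if entry[0] == "checkpoint"),
        default=-1,
    )
    started, before_cp, after_cp = set(), set(), set()
    for i, entry in enumerate(log):
        if entry[0] == "start":
            started.add(entry[1])
        elif entry[0] == "commit":
            (before_cp if i < last_cp else after_cp).add(entry[1])
    undo = started - before_cp - after_cp  # never committed
    return undo, after_cp, before_cp


# Hypothetical log: T1 commits before the checkpoint, T4 after it,
# and T2 and T3 are still active at the crash.
sample_log = [
    ("start", "T1"), ("write", "T1", "D", 20, 25), ("commit", "T1"),
    ("checkpoint",),
    ("start", "T4"), ("write", "T4", "D", 25, 15),
    ("start", "T2"), ("write", "T2", "B", 12, 18),
    ("commit", "T4"),
    ("start", "T3"), ("write", "T3", "C", 30, 40),
]
undo, redo, ignore = classify(sample_log)
```

With this sample log the function places T2 and T3 in the undo set, T4 in the redo set, and T1 in the ignore set, matching the shape of the answer given in the following steps.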
Comment Step 2 of 5 The transactions that need to be rolled back are as follows: • The transaction T3 has not reached its commit point. So, the transaction T3 has to be rolled back. • Also, the transaction T2 has not reached its commit point. So, the transaction T2 has to be rolled back. Comment Step 3 of 5 The operations that are to be redone are as follows: • write_item, T4, D, 25, 15: The transaction T4 must redo the write operation on item D. • write_item, T4, A, 30, 20: The transaction T4 must redo the write operation on item A. Comment Step 4 of 5 The operations that are to be undone are as follows: • write_item, T2, D, 15, 25 • write_item, T3, C, 30, 40 • write_item, T2, B, 12, 18 Comment Step 5 of 5 As no transaction has read an item that was written by an uncommitted transaction, no cascading rollbacks occur in the schedule. Comment Chapter 22, Problem 24E Problem Suppose that we use the deferred update protocol for the example in Figure 22.6. Show how the log would be different in the case of deferred update by removing the unnecessary log entries; then describe the recovery process, using your modified log. Assume that only REDO operations are applied, and specify which operations in the log are redone and which are ignored. Step-by-step solution Step 1 of 2 With deferred update and the unnecessary log entries removed, the write operations of uncommitted transactions are not recorded in the database until the transactions commit. So, the write operations of T2 and T3 would not have been applied to the database, and T4 would have read the previous values of items A and B, thus leading to a recoverable schedule. By using the procedure RDU_M (deferred update with concurrent execution in a multiuser environment), the following result is obtained: Comment Step 2 of 2 The list of committed transactions T since the last checkpoint contains only transaction T4. The list of active transactions T' contains transactions T2 and T3.
Only the WRITE operations of the committed transactions are to be redone. Hence, REDO is applied to: [write_item,T4,B,15] [write_item,T4,A,20] The transactions that were active and did not commit, i.e., transactions T2 and T3, are canceled and must be resubmitted. Their operations do not have to be undone since they were never applied to the database. Comments (1) Chapter 22, Problem 25E Problem How does checkpointing in ARIES differ from checkpointing as described in Section 22.1.4? Step-by-step solution Step 1 of 1 Compared with the checkpointing described in Section 22.1.4 of the textbook, the main difference is that with ARIES, main memory buffers that have been modified are not flushed to disk. ARIES, however, writes additional information to the log in the form of a Transaction Table and a Dirty Page Table when a checkpoint occurs. Comment Chapter 22, Problem 26E Problem How are log sequence numbers used by ARIES to reduce the amount of REDO work needed for recovery? Illustrate with an example using the information shown in Figure 22.5. You can make your own assumptions as to when a page is written to disk. Step-by-step solution Step 1 of 1 ARIES uses log sequence numbers to reduce the amount of REDO work as follows: • ARIES reduces the amount of REDO work by starting to redo only after the point where all prior changes have been applied to the database. ARIES starts REDO at the position in the log that corresponds to the smallest LSN, M, in the Dirty Page Table. • In Figure 22.5, REDO must start at log position 1, as the smallest LSN in the Dirty Page Table is 1. • When the LSN stored on a page is less than the LSN of the log record being processed, the change is reapplied to the page and propagated to the database. • In Figure 22.5, the transaction performs the update of page C and page C has an LSN of 7. • When REDO starts at log position 1, page C is considered. But page C is not changed, as its LSN (7) is greater than the LSN of the current log position (1). • Now consider the LSN 2.
Page B is associated with this LSN and it would be propagated to the database. Page B would be updated if its LSN is less than 2. Similarly, the page corresponding to LSN 6 would be updated. • However, the page corresponding to LSN 7 need not be updated, as the LSN of page C, that is 7, is not less than the current log position. Comment Chapter 22, Problem 27E Problem What implications would a no-steal/force buffer management policy have on checkpointing and recovery? Step-by-step solution Step 1 of 1 No-Steal/Force Buffer Management Policy Implications • No-steal means that a cache or buffer page that has been updated by a transaction cannot be written to disk before the transaction commits. • Force means that pages updated by a transaction are written to disk before the transaction commits. • With no-steal, a checkpoint scheme that writes all modified main memory buffers to disk would not be able to write pages updated by uncommitted transactions. • With force, once a transaction is done, its updates are written to disk. If there is any failure during this transaction, then REDO is still needed. UNDO is not needed since uncommitted updates are never written to disk. Comment Chapter 22, Problem 28E Problem Choose the correct answer for each of the following multiple-choice questions: Incremental logging with deferred updates implies that the recovery system must a. store the old value of the updated item in the log b. store the new value of the updated item in the log c. store both the old and new value of the updated item in the log d. store only the Begin Transaction and Commit Transaction records in the log Step-by-step solution Step 1 of 1 Incremental logging with deferred updates implies that the recovery system must necessarily: Option (b) Store the new value of the updated item in the log.
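Why the new value alone is sufficient under deferred update can be seen in a minimal Python sketch of REDO-only recovery. The log format and the function name `redo_committed` are assumptions for illustration; the two T4 writes follow the REDO list in Problem 24E above, while the T2 write and the initial database values are hypothetical.

```python
def redo_committed(db, log):
    """Reapply, in log order, the writes of committed transactions only.

    Each write entry stores just the new value: ("write", txn, item, new).
    A ("commit", txn) entry marks the transaction as committed.
    """
    committed = {entry[1] for entry in log if entry[0] == "commit"}
    for entry in log:
        if entry[0] == "write" and entry[1] in committed:
            _, _txn, item, new_value = entry
            db[item] = new_value  # no old value is ever needed
    return db


log = [
    ("write", "T4", "B", 15),
    ("write", "T2", "B", 99),  # hypothetical write of an uncommitted transaction
    ("write", "T4", "A", 20),
    ("commit", "T4"),
]
db = {"A": 30, "B": 12}  # hypothetical state on disk before recovery
redo_committed(db, log)
first_pass = dict(db)
redo_committed(db, log)  # running recovery again changes nothing
```

Writes of T2 are skipped because T2 never committed, so nothing has to be undone. Running the function a second time leaves the database unchanged, which illustrates why redo operations are expected to be idempotent.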
Comment Chapter 22, Problem 29E Problem Choose the correct answer for each of the following multiple-choice questions: The write-ahead logging (WAL) protocol simply means that a. writing of a data item should be done ahead of any logging operation b. the log record for an operation should be written before the actual data is written c. all log records should be written before a new transaction begins execution d. the log never needs to be written to disk Step-by-step solution Step 1 of 1 The write-ahead logging (WAL) protocol simply means that the log record for an operation should be written before the actual data is written. Option (b) The log record for an operation should be written before the actual data is written. Comment Chapter 22, Problem 30E Problem Choose the correct answer for each of the following multiple-choice questions: In case of transaction failure under a deferred update incremental logging scheme, which of the following will be needed? a. an undo operation b. a redo operation c. an undo and redo operation d. none of the above Step-by-step solution Step 1 of 1 In case of transaction failure under a deferred update incremental logging scheme, the following will be needed: Option (b) A redo operation. Deferred update is a NO-UNDO/REDO technique: updates reach the database only after the commit point, so an undo is never required, and only the logged writes of committed transactions may need to be redone. Comments (1) Chapter 22, Problem 31E Problem Choose the correct answer for each of the following multiple-choice questions: For incremental logging with immediate updates, a log record for a transaction would contain a. a transaction name, a data item name, and the old and new value of the item b. a transaction name, a data item name, and the old value of the item c. a transaction name, a data item name, and the new value of the item d. a transaction name and a data item name Step-by-step solution Step 1 of 1 For incremental logging with immediate updates, a log record for a transaction would contain:
Option (a) A transaction name, a data item name, and the old and new value of the item. Comment Chapter 22, Problem 32E Problem Choose the correct answer for each of the following multiple-choice questions: For correct behavior during recovery, undo and redo operations must be a. commutative b. associative c. idempotent d. distributive Step-by-step solution Step 1 of 1 For correct behavior during recovery, undo and redo operations must be: Option (c) Idempotent Comment Chapter 22, Problem 33E Problem Choose the correct answer for each of the following multiple-choice questions: When a failure occurs, the log is consulted and each operation is either undone or redone. This is a problem because a. searching the entire log is time consuming b. many redos are unnecessary c. both (a) and (b) d. none of the above Step-by-step solution Step 1 of 1 When a failure occurs, the log is consulted and each operation is either undone or redone. This is a problem because: Option (c) Both (a) and (b). Searching the entire log is time consuming, and many redos are unnecessary because the updates of most committed transactions have already been written to the database; checkpointing addresses both issues. Comment Chapter 22, Problem 34E Problem Choose the correct answer for each of the following multiple-choice questions: Using a log-based recovery scheme might improve performance as well as provide a recovery mechanism by a. writing the log records to disk when each transaction commits b. writing the appropriate log records to disk during the transaction’s execution c. waiting to write the log records until multiple transactions commit and writing them as a batch d. never writing the log records to disk Step-by-step solution Step 1 of 1 Using a log-based recovery scheme might improve performance as well as provide a recovery mechanism by: Option (c) Waiting to write the log records until multiple transactions commit and writing them as a batch. Comment Chapter 22, Problem 35E Problem Choose the correct answer for each of the following multiple-choice questions: There is a possibility of a cascading rollback when a.
a transaction writes items that have been written only by a committed transaction b. a transaction writes an item that is previously written by an uncommitted transaction c. a transaction reads an item that is previously written by an uncommitted transaction d. both (b) and (c) Step-by-step solution Step 1 of 1 There is a possibility of a cascading rollback when: Option (c) A transaction reads an item that is previously written by an uncommitted transaction. Cascading rollback is triggered by reads: if the writing transaction is rolled back, every transaction that read its uncommitted value must be rolled back as well, which is why read_item entries are logged only to detect this case. Comment Chapter 22, Problem 36E Problem Choose the correct answer for each of the following multiple-choice questions: To cope with media (disk) failures, it is necessary a. for the DBMS to only execute transactions in a single user environment b. to keep a redundant copy of the database c. to never abort a transaction d. all of the above Step-by-step solution Step 1 of 1 To cope with media (disk) failures, it is necessary: Option (b) To keep a redundant copy of the database. Comment Chapter 22, Problem 37E Problem Choose the correct answer for each of the following multiple-choice questions: If the shadowing approach is used for flushing a data item back to disk, then a. the item is written to disk only after the transaction commits b. the item is written to a different location on disk c. the item is written to disk before the transaction commits d. the item is written to the same disk location from which it was read Step-by-step solution Step 1 of 1 If the shadowing approach is used for flushing a data item back to disk, then: Option (b) The item is written to a different location on disk. Comment Chapter 30, Problem 1RQ Problem Discuss what is meant by each of the following terms: database authorization, access control, data encryption, privileged (system) account, database audit, audit trail. Step-by-step solution Step 1 of 1 Database authorization: Database authorization protects portions of the database against unauthorized access.
Access control: The most common security problem is preventing unauthorized persons from accessing the system, either to obtain information or to inject malicious content that modifies the database. A DBMS must include security mechanisms that restrict access to the database system as a whole. The DBMS performs this function by creating user accounts and passwords that control the login process. Data encryption: Sensitive data, such as ATM or credit card numbers issued by a bank, must be protected when it is transmitted through a communications network; encryption provides additional protection for the database. The data is encoded so that unauthorized users who access it will have difficulty decoding it. Privileged account: The DBA account provides powerful capabilities. It can issue privileged commands, including commands for granting and revoking privileges to individual accounts, users, or user groups, and for performing the following actions: • Account creation • Privilege granting • Privilege revocation • Security level assignment Database audit: If tampering with the database is suspected, a database audit is performed. It consists of reviewing the log to examine all accesses and operations applied to the database during a certain period of time. Audit trail: A database log that is used mainly for security purposes, because it records the details of all accesses and operations, is referred to as an audit trail. Comment Chapter 30, Problem 2RQ Problem Which account is designated as the owner of a relation? What privileges does the owner of a relation have? Step-by-step solution Step 1 of 1 The owner account is designated as the owner of a relation; it is typically the account that was used when the relation was created in the first place. The owner of a relation is given all privileges on that relation.
The owner account holder can pass privileges on any of the owned relations to other users by granting privileges to their accounts. Comment Chapter 30, Problem 3RQ Problem How is the view mechanism used as an authorization mechanism? Step-by-step solution Step 1 of 1 The view mechanism is an important discretionary authorization mechanism in its own right. For example: If the owner A of a relation R wants another account B to be able to retrieve only some fields of R, then A can create a view V of R that includes only those attributes and then grant SELECT on V to B. The same applies to limiting B to retrieving only certain tuples of R; a view V can be created by defining it by means of a query that selects only those tuples from R that A wants to allow B to access. Comment Chapter 30, Problem 4RQ Problem Discuss the types of privileges at the account level and those at the relation level. Step-by-step solution Step 1 of 1 There are two levels at which privileges to use the database system are assigned: the account level and the relation (or table) level. • At the account level, the database administrator specifies the particular privileges that each account holds independently of the relations in the database. • At the relation level, the database administrator controls the privileges to access each individual relation or view in the database. Account level: It includes, 1. CREATE SCHEMA or CREATE TABLE privilege, to create a schema or base relation. 2. CREATE VIEW privilege. 3. ALTER privilege, to perform changes such as adding or removing attributes. 4. DROP privilege, to delete relations or views. 5. MODIFY privilege, to insert, delete, or update tuples. 6. SELECT privilege, to retrieve information from the database. Relation level: • It refers to either a base relation or a view (virtual) relation. • For each user, the types of commands that can be applied are specified for each individual relation. An authorization model known as the access matrix model is used for granting and revoking privileges.
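The access matrix model mentioned above can be pictured as a table whose rows are subjects (accounts), whose columns are objects (relations or views), and whose cells hold the privileges a subject has on an object. The following Python sketch is a minimal illustration under that reading; the class name `AccessMatrix`, its methods, and the sample accounts and relation are hypothetical.

```python
from collections import defaultdict


class AccessMatrix:
    """Sparse access matrix: (subject, object) -> set of privileges."""

    def __init__(self):
        self._cells = defaultdict(set)

    def grant(self, subject, obj, privilege):
        self._cells[(subject, obj)].add(privilege)

    def revoke(self, subject, obj, privilege):
        # discard() is a no-op when the privilege was never granted.
        self._cells[(subject, obj)].discard(privilege)

    def allowed(self, subject, obj, privilege):
        # .get() avoids creating empty cells for unknown pairs.
        return privilege in self._cells.get((subject, obj), set())


matrix = AccessMatrix()
matrix.grant("B", "EMPLOYEE", "SELECT")
matrix.grant("B", "EMPLOYEE", "INSERT")
can_select = matrix.allowed("B", "EMPLOYEE", "SELECT")
can_delete = matrix.allowed("B", "EMPLOYEE", "DELETE")
matrix.revoke("B", "EMPLOYEE", "INSERT")
can_insert_after_revoke = matrix.allowed("B", "EMPLOYEE", "INSERT")
```

Storing only the non-empty cells keeps the structure small when most accounts hold privileges on only a few relations.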
Comment Chapter 30, Problem 5RQ Problem What is meant by granting a privilege? What is meant by revoking a privilege? Step-by-step solution Step 1 of 1 Granting and revoking of privileges should ensure secure and authorized access, and hence both should be controlled on each relation R in a database. This is done by assigning each relation an owner account, which is the account that was used when the relation was created. The owner of the relation holds all privileges on that relation. Granting of privileges: The owner account holder can pass privileges on any of the owned relations to other users by issuing the GRANT command (granting privileges) to their accounts. Types of privileges granted on each individual relation R by using the GRANT command are as follows: • SELECT privilege on a relation gives the privilege to retrieve information (tuples) from that relation. • Modification privilege allows insert, delete, and update operations that modify the relation. • References privilege allows the relation to be referenced when specifying integrity constraints. Revoking of privileges: When a privilege is granted only temporarily, it is necessary to cancel that privilege after the task has been completed. The REVOKE command is used in SQL for canceling privileges. Comment Chapter 30, Problem 6RQ Problem Discuss the system of propagation of privileges and the restraints imposed by horizontal and vertical propagation limits. Step-by-step solution Step 1 of 2 Propagation of privileges: Whenever the owner A of a relation R grants a privilege on R to another account B, the privilege can be given to B with or without the GRANT OPTION. If the GRANT OPTION is given, this means that B can also grant that privilege on R to other accounts. Suppose that B is given the GRANT OPTION by A and that B then grants the privilege on R to a third account C, also with GRANT OPTION.
In this way, privileges on R can propagate to other accounts without the knowledge of the owner of R. If the owner account A now revokes the privilege granted to B, all the privileges that B propagated based on that privilege should automatically be revoked by the system. It is possible for a user to receive a certain privilege from two or more sources. For example, A' may receive a certain privilege from both B' and C'. If B' revokes the privilege from A', A' will still have it by virtue of C'. If C' then also revokes the privilege, A' will lose it permanently. A DBMS that allows propagation of privileges must keep track of how all the privileges were granted so that revoking of privileges can be done correctly and completely. Comment Step 2 of 2 Propagation of privileges can lead to many accounts having privileges on a relation without the knowledge of the owner, so there must be ways to restrict the number of accounts that can hold privileges on a relation. This can be done by limiting horizontal propagation and by limiting vertical propagation. Limiting horizontal propagation to an integer number i means that an account B given the GRANT OPTION can grant the privilege to at most i other accounts. Vertical propagation limits the depth of the granting of privileges. Granting a privilege with vertical propagation of zero is equivalent to granting the privilege with no GRANT OPTION. If account A grants a privilege to account B with vertical propagation set to j > 0, this means that account B has the GRANT OPTION on the privilege, but B can grant the privilege to other accounts only with a vertical propagation less than j. In effect, vertical propagation limits the sequence of GRANT OPTIONs that can be given from one account to the next based on a single original grant of the privilege. For example: Suppose that A grants SELECT to B on the EMPLOYEE relation with horizontal propagation = 1 and vertical propagation = 2.
B can grant SELECT to at most one account because horizontal propagation = 1. Additionally, B can grant the privilege to another account only with vertical propagation set to 0 (no GRANT OPTION) or 1. Thus we can limit propagation by using these two methods. Comment Chapter 30, Problem 7RQ Problem List the types of privileges available in SQL. Step-by-step solution Step 1 of 1 The following types of privileges can be granted on each individual relation R: 1.) Select (retrieval or read) privilege on R: Gives the account retrieval privilege. In SQL this gives the account the privilege to use the SELECT statement to retrieve tuples from R. 2.) Modify privilege on R: This gives the account the capability to modify tuples of R. In SQL this privilege is further divided into UPDATE, DELETE, and INSERT privileges to apply the corresponding SQL commands to R. Additionally, both the INSERT and UPDATE privileges can specify that only certain attributes of R can be updated by the account. 3.) References privilege on R: This gives the account the capability to reference relation R when specifying integrity constraints. This privilege can also be restricted to specific attributes of R. To create a view, an account must have the SELECT privilege on all relations involved in the view definition. Comment Chapter 30, Problem 8RQ Problem What is the difference between discretionary and mandatory access control? Step-by-step solution Step 1 of 2 a. Discretionary Access Control (DAC) policies are characterized by a high degree of flexibility, which makes them suitable for a large variety of application domains. By contrast, Mandatory Access Control (MAC) policies have the drawback of being too rigid, in that they require a strict classification of subjects and objects into security levels, and therefore they are applicable to very few environments. Comment Step 2 of 2 b. The main drawback of DAC models is their vulnerability to malicious attacks, such as Trojan horses embedded in application programs.
The reason is that discretionary authorization models do not impose any control on how information is propagated and used once it has been accessed by users authorized to do so. By contrast, Mandatory Access Control policies ensure a high degree of protection; in a way, they prevent any illegal flow of information. Comment Chapter 30, Problem 9RQ Problem What are the typical security classifications? Discuss the simple security property and the *property, and explain the justification behind these rules for enforcing multilevel security. Step-by-step solution Step 1 of 1 Typical security classes are top secret (TS), secret (S), confidential (C), and unclassified (U), where TS is the highest level and U the lowest: $TS \geq S \geq C \geq U$. Simple security: A subject S is not allowed read access to an object O unless $\text{class}(S) \geq \text{class}(O)$. This is known as the simple security property. *-property: A subject S is not allowed to write an object O unless $\text{class}(S) \leq \text{class}(O)$. This is known as the star property. The first rule says that no subject can read an object whose security classification is higher than the subject’s security clearance. The second restriction is less intuitive; it prohibits a subject from writing an object at a lower security classification than the subject’s security clearance. Violations of this rule would allow information to flow from higher to lower classifications, which violates a basic tenet of multilevel security. Comment Chapter 30, Problem 10RQ Problem Describe the multilevel relational data model. Define the following terms: apparent key, polyinstantiation, filtering. Step-by-step solution Step 1 of 3 Define: 1.) Apparent key: The apparent key of a multilevel relation is the set of attributes that would have formed the primary key in a regular (single-level) relation. Comment Step 2 of 3 2.) Filtering: A multilevel relation will appear to contain different data to subjects with different clearance levels.
In some cases, it is possible to store a single tuple in the relation at a higher classification level and produce the corresponding tuples at a lower-level classification through a process known as filtering. Comment Step 3 of 3 3.) Polyinstantiation: In some cases, it is necessary to store two or more tuples at different classification levels with the same value for the apparent key. This leads to the concept of polyinstantiation, where several tuples can have the same apparent key value but different attribute values for users at different classification levels. Comment Chapter 30, Problem 11RQ Problem What are the relative merits of using DAC or MAC? Step-by-step solution Step 1 of 1 Discretionary access control (DAC) policies are characterized by a high degree of flexibility, which makes them suitable for a large variety of application domains. The main drawback of DAC models is their vulnerability to malicious attacks, such as Trojan horses embedded in application programs. Mandatory policies, in contrast, ensure a high degree of protection; in a way, they prevent any illegal flow of information. MAC policies have the drawback of being too rigid, and they are only applicable in limited environments. In many practical situations discretionary policies are preferred because they offer a better trade-off between security and applicability. Comment Chapter 30, Problem 12RQ Problem What is role-based access control? In what ways is it superior to DAC and MAC? Step-by-step solution Step 1 of 1 Role-based access control (RBAC) is a technology for managing and enforcing security in large-scale enterprise-wide systems. The basic notion is that permissions are associated with roles, and users are assigned to appropriate roles. Roles can be created and destroyed using the CREATE ROLE and DESTROY ROLE commands, and GRANT and REVOKE are used to assign privileges to and revoke privileges from roles.
RBAC appears to be a viable alternative to traditional DAC and MAC; it ensures that only authorized users are given access to certain data or resources. Many DBMSs support the concept of roles, where privileges can be assigned to roles. A role hierarchy in RBAC is a natural way of organizing roles to reflect the organization's lines of authority and responsibility. Using an RBAC model is a highly desirable goal for addressing the key security requirements of web-based applications. DAC and MAC models lack the capabilities needed to support the security requirements of emerging enterprises and web-based applications.

Chapter 30, Problem 13RQ
Problem
What are the two types of mutual exclusion in role-based access control?
Step-by-step solution
Step 1 of 1
Separation of duties is an important requirement in various database management systems. It is necessary to prevent a single user from doing work that requires the involvement of two or more people, so that collusion can be prevented. To implement this successfully, mutual exclusion of roles is used. Two roles are said to be mutually exclusive if a user cannot use both roles.
Mutual exclusion of roles can be classified into two types:
1. Authorization time exclusion.
2. Runtime exclusion.
Authorization time exclusion: This is a static process in which two mutually exclusive roles cannot both be part of a user's authorization at the same time.
Runtime exclusion: This is a dynamic process in which two mutually exclusive roles can both be authorized to one user, but the user can activate only one of them at a time; both roles cannot be activated at the same time.

Chapter 30, Problem 14RQ
Problem
What is meant by row-level access control?
Step-by-step solution
Step 1 of 1
In row-level access control, as the name indicates, access control rules are implemented on the data row by row. Each row is given a label, in which data sensitivity information is stored.
• It ensures data security by allowing permissions to be set not only for a column or table but also for each row.
• The database administrator initially provides the user with a default session label.
• Row-level labels form a hierarchy of data sensitivity levels to maintain privacy or security.
• Unauthorized users are prevented from viewing or altering certain data by means of the assigned labels.
• A user with low-level authorization is represented by a low number and is denied access to data that has a higher-level number.
• If a label is not given to a row, one is assigned automatically depending on the user's session label.

Chapter 30, Problem 15RQ
Problem
What is label security? How does an administrator enforce it?
Step-by-step solution
Step 1 of 1
A Label Security policy is a policy defined by the administrator. The policy is invoked automatically whenever data affected by the policy is accessed through an application. When this policy is implemented, a new column is added to each row. The new column contains the label for each row, which is considered to be the sensitivity of the row as per the policy. Each user has an identity in label-based security; it is compared with the label assigned to each row to determine whether the user has the right to view the contents of that row. The database administrator has the privilege to set an initial label for the row. The label security administrator defines the security labels for data and the authorizations that govern access to specified projects for users.
Example
If a user has the SELECT privilege on the table, Label Security automatically evaluates each row returned by the query to determine whether the user has the right to view the data. If the user is assigned sensitivity level 25, the user can view all rows that have a security level of 25 or lower. Label Security can also be used to perform security checks on statements that include insert, delete, and update.
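The label comparison in the example above can be sketched in a few lines of Python. This is only an illustration of the rule "a user at level 25 sees rows labeled 25 or lower"; the Row class, the label values, and the function name visible_rows are hypothetical and do not correspond to any particular DBMS feature.

```python
from dataclasses import dataclass

@dataclass
class Row:
    data: str
    label: int  # sensitivity label (hypothetical extra column added by the policy)

def visible_rows(rows, user_level):
    """Return only the rows whose label does not exceed the user's level."""
    return [row for row in rows if row.label <= user_level]

# Hypothetical sample data: three rows with increasing sensitivity.
rows = [Row("public notice", 10), Row("salary report", 25), Row("merger plan", 40)]

# A user assigned sensitivity level 25 sees the rows labeled 10 and 25 only.
print([row.data for row in visible_rows(rows, 25)])
```

In a real system this check would run inside the DBMS for every row a query touches, so the application would never see the rows that were filtered out.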
Chapter 30, Problem 16RQ
Problem
What are the different types of SQL injection attacks?
Step-by-step solution
Step 1 of 1
SQL injection attacks are among the more common threats to database systems. Types of injection attacks include:
• SQL manipulation
• Code injection
• Function call injection
Explanation
SQL manipulation
A manipulation attack changes an SQL command in the application, for example by adding conditions to the WHERE clause of a query, or by extending a query with additional query components using set operations such as UNION, INTERSECT, or MINUS.
Example
The query used to check authentication:
SELECT * FROM loginusers WHERE username='john' AND password='johnpwd';
The application checks whether any rows are returned by this query. The attacker can try to change or manipulate the SQL statement as follows:
SELECT * FROM loginusers WHERE username='john' AND password='johnpwd' OR 'a'='a';
Because the added condition is always true, an attacker who knows that "john" is a valid login can log into the database system without knowing his password.
Code injection
• It allows the addition of extra SQL statements or commands to the existing or original SQL statement by exploiting a computer bug that is caused by processing invalid data.
• The attacker injects code into a computer program to change its course of action.
• It is one of the methods used for hacking a system to obtain information without authorization.
Function call injection
• A database or operating system (OS) function call is inserted into a vulnerable SQL statement to manipulate the data or to make a privileged system call.
• It is possible, for example, to exploit a function that performs some network communication. SQL queries that are constructed dynamically are especially exposed, because they are executed at run time.
Example
The following query causes a page to be requested from a web server.
SELECT TRANSLATE ('' || UTL_HTTP.REQUEST ('http://129.107.12.1/') || '', '97876763', '9787') FROM dual;
The attacker controls the input string, here the URL of the web page, and can use it to perform other illegal operations.

Chapter 30, Problem 17RQ
Problem
What risks are associated with SQL injection attacks?
Step-by-step solution
Step 1 of 1
Risks associated with SQL injection attacks include the following:
Database fingerprinting: The attacker determines the type of back-end database in use, so that attacks specific to weaknesses in that DBMS can be carried out.
Denial of service: The attacker can flood the server with requests, consume excessive resources, or delete data, thus denying service to the intended users.
Bypassing authentication: The attacker can access the database system as an authorized user and perform all the desired operations.
Identifying injectable parameters: The attacker obtains sensitive information such as the type and structure of the back-end database of a web application. This is possible because the default error pages returned by application servers are often overly descriptive.
Executing remote commands: The attacker uses a tool to execute commands on the database. For example, the attacker can execute stored procedures and functions from a remote SQL interface.
Performing privilege escalation: This attack makes use of logical flaws within the database to raise the level of access.

Chapter 30, Problem 18RQ
Problem
What preventive measures are possible against SQL injection attacks?
Step-by-step solution
Step 1 of 1
Protection against SQL injection attacks is achieved by applying certain programming rules to all procedures and functions that are accessible through the web. Some of the techniques include:
Bind variables
• Using bind variables (parameters) protects against injection attacks and also improves performance.
• For example, consider the following code using Java and JDBC:
PreparedStatement st = con.prepareStatement("SELECT * FROM employee WHERE empid=? AND pwd=?");
st.setString(1, empid);
st.setString(2, pwd);
• User input should be bound to a parameter instead of being used directly in the statement; in this example, parameter 1 is bound to the variable empid instead of the input string being concatenated into the query text.
Filtering input
• It is used to remove escape characters from input strings by using the Replace function of SQL.
• For example, the single-quote delimiter (') is replaced by two single quotes ('').
Function security
Standard and custom database functions should be restricted, as they can be exploited in SQL function injection attacks.

Chapter 30, Problem 19RQ
Problem
What is a statistical database? Discuss the problem of statistical database security.
Step-by-step solution
Step 1 of 1
Statistical databases are used mainly to produce statistics on various populations. The database may contain confidential data on individuals, which should be protected from user access. Users are permitted to retrieve statistical information on the populations, such as averages, sums, counts, minimums, maximums, and standard deviations. A population is a set of tuples of a relation (table) that satisfy some selection condition. Statistical queries involve applying statistical functions to a population of tuples; for example, we may want to retrieve the number of individuals in a population or the average income in the population. Statistical database security techniques fail to protect individual data in some situations.

Chapter 30, Problem 20RQ
Problem
How is privacy related to statistical database security? What measures can be taken to ensure some degree of privacy in statistical databases?
Step-by-step solution
Step 1 of 2
Statistical databases are used mainly to produce statistics about various populations.
The database may contain confidential data about individuals, which should be protected from user access. However, users are permitted to retrieve statistical information about the populations, such as averages, sums, counts, maximums, minimums, and standard deviations. Because aggregate functions can sometimes be used to retrieve private information when enough is known about a person, statistical databases that store such information pose potential threats to privacy.
Consider an example: a PERSON relation with attributes Name, Ssn, Income, Address, City, Zip, Sex, and Last_degree. A population is a set of tuples of a relation that satisfy some selection condition. Hence, each selection condition on the PERSON relation specifies a particular population of PERSON tuples, for example Sex = 'F' or Last_degree = 'M.Tech'. Statistical queries involve applying statistical functions to a population of tuples, for example the average of Income. However, access to personal information is not allowed. Statistical database security techniques must prohibit queries that retrieve individual attribute values and allow only queries that involve aggregate functions such as SUM, MIN, MAX, AVG, COUNT, and STANDARD DEVIATION. Such queries are sometimes called statistical queries.
Step 2 of 2
It is the responsibility of a database management system to ensure the confidentiality of information about individuals while still providing useful statistical summaries of data about those individuals to users. Protecting privacy is paramount. Its violation can be illustrated with the following statistical queries:
Q1: SELECT COUNT(*) FROM PERSON WHERE <condition>;
Q2: SELECT AVG(Income) FROM PERSON WHERE <condition>;
Suppose someone is interested in finding the salary of Jane Smith, who is female, has 'M.S.' as her last degree, and lives in Houston. Suppose a condition combining all of these facts yields 1 as the result of Q1. Using the same condition for Q2 then gives the salary of Jane Smith.
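The Q1/Q2 inference described above can be reproduced with a small in-memory SQLite table. This is a minimal sketch: the three PERSON rows and their incomes are invented for illustration, and only a subset of the PERSON attributes is modeled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PERSON (Name TEXT, Sex TEXT, Last_degree TEXT, City TEXT, Income REAL)"
)
# Hypothetical sample data; only the first row matches the attacker's condition.
conn.executemany(
    "INSERT INTO PERSON VALUES (?, ?, ?, ?, ?)",
    [
        ("Jane Smith", "F", "M.S.", "Houston", 72000.0),
        ("Ann Lee", "F", "Ph.D.", "Houston", 95000.0),
        ("Bob Ray", "M", "M.S.", "Houston", 64000.0),
    ],
)

# The condition is fixed text written by the attacker, built from facts
# already known about the target; no individual attribute is selected.
condition = "Sex = 'F' AND Last_degree = 'M.S.' AND City = 'Houston'"

# Q1: the population described by the condition has exactly one member.
count = conn.execute(f"SELECT COUNT(*) FROM PERSON WHERE {condition}").fetchone()[0]

# Q2: the "average" over a population of one is that person's income.
avg_income = conn.execute(f"SELECT AVG(Income) FROM PERSON WHERE {condition}").fetchone()[0]

print(count, avg_income)
```

Both queries are legitimate statistical queries, yet together they disclose one individual's income, which is why a DBMS may have to restrict queries over very small populations.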
Even if the result of Q1 is not 1, the MAX and MIN functions can still be used to obtain the range of the salary.
Measures taken to ensure privacy:
1.) No statistical queries are permitted whenever the number of tuples in the population specified by the selection condition falls below some threshold.
2.) Prohibit sequences of queries that refer repeatedly to the same population of tuples.
3.) Introduce slight noise into the results of queries.
4.) Partition the database into groups; any query must refer to complete groups, never to subsets of records within a group.

Chapter 30, Problem 21RQ
Problem
What is flow control as a security measure? What types of flow control exist?
Step-by-step solution
Step 1 of 3
Flow control regulates the distribution or flow of information among accessible objects. A flow between object X and object Y occurs when a program reads values from X and writes values into Y. Flow control checks that information contained in some objects does not flow explicitly or implicitly into less protected objects. Thus, a user cannot get indirectly in Y what he or she cannot get directly in X. Most flow controls employ some concept of security class; the transfer of information from a sender to a receiver is allowed only if the receiver's security class is at least as privileged as the sender's. Examples of flow control include preventing a service program from leaking a customer's confidential data and blocking the transmission of secret military data to an unknown classified user.
A flow policy specifies the channels along which information is allowed to move. The simplest flow policy specifies just two classes of information, confidential (C) and nonconfidential (N), and allows all flows except those from class C to class N. This policy can solve the confidentiality problem that arises when a service program handles data such as customer information, some of which may be confidential.
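The two-class policy just described is small enough to write down directly. The sketch below assumes a numeric ordering N < C and a hypothetical helper named flow_allowed; it illustrates the rule and is not a real enforcement mechanism.

```python
# Security classes of the two-class policy: N (nonconfidential) is below C (confidential).
LEVEL = {"N": 0, "C": 1}

def flow_allowed(sender_class, receiver_class):
    """Allow a flow only if the receiver's class is at least as high as the sender's."""
    return LEVEL[receiver_class] >= LEVEL[sender_class]

# Enumerate all four possible flows; only C -> N is rejected.
for sender in ("N", "C"):
    for receiver in ("N", "C"):
        print(sender, "->", receiver, flow_allowed(sender, receiver))
```

The same comparison extends to more classes (for example U, C, S, TS) by adding entries to the LEVEL table.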
Step 2 of 3
Access control mechanisms are responsible for checking users' authorizations for resource access: only granted operations are executed. Flow controls can be enforced by an extended access control mechanism, which involves assigning a security class to each running program. The program is allowed to read a particular memory segment only if its class is as high as that of the segment. It is allowed to write in a segment only if its class is as low as that of the segment. This automatically ensures that no information transmitted by the program can move from a higher to a lower class. For example, a military program with secret clearance can only read from objects that are unclassified and confidential and can only write into objects that are secret or top secret.
Two types of flows exist:
1.) Explicit flows: occurring as a consequence of assignment instructions, such as Y := f(X1, ..., Xn).
2.) Implicit flows: generated by conditional instructions, such as if f(Xm+1, ..., Xn) then Y := f(X1, ..., Xm).
Step 3 of 3
Flow control mechanisms must verify that only authorized flows, both explicit and implicit, are executed. A set of rules must be satisfied to ensure secure information flows. Rules can be expressed using flow relations among classes that are assigned to information, stating the authorized flows within the system. This relation can define, for a class, the set of classes to which information can flow, or it can state the specific relations to be verified between two classes to allow information to flow from one to the other. In general, flow control mechanisms implement the controls by assigning a label to each object and by specifying the security class of the object. Labels are then used to verify the flow relations defined in the model.

Chapter 30, Problem 22RQ
Problem
What are covert channels? Give an example of a covert channel.
Step-by-step solution
Step 1 of 2
A covert channel allows a transfer of information that violates the security policy.
Specifically, a covert channel allows information to pass from a higher classification level to a lower classification level through improper means. Covert channels can be classified into two broad categories:
1.) Timing channels: In a timing channel, the information is conveyed by the timing of events or processes.
2.) Storage channels: Storage channels do not require any temporal synchronization, in that information is conveyed by accessing system information or what is otherwise inaccessible to the user.
Step 2 of 2
As a simple example of a covert channel, consider a distributed database system in which two nodes have user security levels of secret (S) and unclassified (U). In order for a transaction to commit, both nodes must agree to commit. They can mutually perform only operations that are consistent with the *-property, which states that in any transaction, the S site cannot write or pass information to the U site. However, if these two sites collude to set up a covert channel between them, a transaction involving secret data may be committed unconditionally by the U site, while the S site may do so in some predefined, agreed-upon way, so that certain information can be passed from the S site to the U site.
Measures such as locking prevent concurrent writing of information by users with different security levels into the same objects, preventing storage-type covert channels. Operating systems and distributed databases provide control over the multiprogramming of operations, which allows resources to be shared without the possibility of one program or process encroaching on another's memory or other resources in the system, thus preventing timing-oriented covert channels. In general, covert channels are not a major problem in well-implemented, robust database implementations. However, certain schemes may be contrived by clever users that implicitly transfer information.
Some security experts believe that one way to avoid covert channels is to disallow programmers from actually gaining access to sensitive data that a program will process after the program has been put into operation.

Chapter 30, Problem 23RQ
Problem
What is the goal of encryption? What process is involved in encrypting data and then recovering it at the other end?
Step-by-step solution
Step 1 of 1
Suppose data is communicated via a supposedly secure channel but still falls into the wrong hands. In this situation, by using encryption we can disguise the message so that even if the transmission is diverted, the message will not be revealed. Encryption is a means of maintaining secure data in an insecure environment. Encryption consists of applying an encryption algorithm to data using some predefined encryption key. The resulting data has to be decrypted using a decryption key to recover the original data.

Chapter 30, Problem 24RQ
Problem
Give an example of an encryption algorithm and explain how it works.
Step-by-step solution
Step 1 of 3
Public key encryption: Public key algorithms are based on mathematical functions rather than on operations on bit patterns. They also involve the use of two separate keys, in contrast to conventional encryption, which uses only one key. The use of two keys can have profound consequences in the areas of confidentiality, key distribution, and authentication. The two keys used for public key encryption are referred to as the public key and the private key. Invariably, the private key is kept secret, but it is referred to as a private key rather than a secret key to avoid confusion with conventional encryption.
Step 2 of 3
A public key encryption scheme, or infrastructure, has six ingredients:
1.) Plaintext: the data that is to be transmitted (encrypted).
2.) Encryption algorithm: the algorithm that performs transformations on the plaintext.
3. and 4.)
Public key and private key: If one of these is used for encryption, the other is used for decryption.
5.) Ciphertext: the encrypted data, or scrambled text, for a given plaintext and set of keys.
6.) Decryption algorithm: This algorithm accepts the ciphertext and the matching key and produces the original plaintext.
Step 3 of 3
The public key is made public for others to use, whereas the private key is known only to its owner. The scheme relies on one key for encryption and the other for decryption. The essential steps are as follows:
1.) Each user generates a pair of keys to be used for the encryption and decryption of messages.
2.) Each user places one of the keys in a public register or other accessible file. This is the public key. The companion key is kept private.
3.) If a sender wishes to send a private message to a receiver, the sender encrypts the message using the receiver's public key.
4.) When the receiver receives the message, he or she decrypts it using the receiver's private key. No other recipient can decrypt the message because only the receiver knows his or her private key.

Chapter 30, Problem 25RQ
Problem
Repeat the question for the popular RSA algorithm.
Question
Give an example of an encryption algorithm and explain how it works.
Step-by-step solution
Step 1 of 1
The RSA encryption algorithm incorporates results from number theory, combined with the difficulty of determining the prime factors of a target. The RSA algorithm also operates with modular arithmetic, mod n. Two keys, e and d, are used for encryption and decryption. An important property is that they can be interchanged. n is chosen as a large integer that is the product of two large distinct prime numbers, a and b. The encryption key e is a randomly chosen number between 1 and n that is relatively prime to (a-1) * (b-1). The plaintext block P is encrypted as P^e mod n. Because the exponentiation is performed mod n, factoring P^e to uncover the encrypted plaintext is difficult.
However, the decryption key d is carefully chosen so that (P^e)^d mod n = P. The key d can be computed from the condition that d * e = 1 mod ((a-1) * (b-1)). Thus, the legitimate receiver who knows d simply computes (P^e)^d mod n = P and recovers P without having to factor P^e.

Chapter 30, Problem 26RQ
Problem
What is a symmetric key algorithm for key-based security?
Step-by-step solution
Step 1 of 1
A symmetric key algorithm uses the same key for both encryption and decryption. Because of this characteristic, fast encryption and decryption are possible, which makes such algorithms suitable for sensitive data in the database.
• The message is encrypted with a secret key and can be decrypted with the same secret key.
• An algorithm used for symmetric key encryption is called a symmetric key algorithm; because such algorithms are mostly used for encrypting the content of a message, they are also called content-encryption algorithms.
• The secret key can be derived from a password string supplied by the user by applying the same function to the string at both the sender and the receiver. Thus it is also referred to as a password-based encryption algorithm.
• Content encrypted with a longer key is harder to break than content encrypted with a shorter key, because the strength of the encryption depends entirely on the key.

Chapter 30, Problem 27RQ
Problem
What is the public key infrastructure scheme? How does it provide security?
Step-by-step solution
Step 1 of 2
A public key encryption scheme has the following ingredients:
1. Plaintext: This is the data or readable message that is fed into the algorithm as input.
2. Encryption algorithm: This algorithm performs various transformations on the plaintext.
3. Public and private keys: These are a pair of keys that have been selected so that if one is used for encryption, the other is used for decryption. The exact transformations performed by the encryption algorithm depend on the public or private key that is provided as input.
4. Ciphertext: This is the scrambled message produced as output. It depends on the plaintext and the key.
For a given message, two different keys will produce two different ciphertexts.
5. Decryption algorithm: This algorithm accepts the ciphertext and the matching key and produces the original plaintext.
A general-purpose public key cryptographic algorithm relies on one key for encryption and a different but related key for decryption.
Step 2 of 2
The steps are as follows:
1. Each user generates a pair of keys to be used for the encryption and decryption of messages.
2. Each user places one of the two keys in a public register or other accessible file. This is the public key; the companion key is kept private.
3. If a sender wishes to send a private message to a receiver, the sender encrypts it using the receiver's public key.
4. When the receiver receives the message, he or she decrypts it using the receiver's private key. No other user can decrypt the message, and this is how the scheme provides security for the data.

Chapter 30, Problem 28RQ
Problem
What are digital signatures? How do they work?
Step-by-step solution
Step 1 of 1
A digital signature is a means of associating a mark unique to an individual with a body of text. The mark should be unforgeable, meaning that others should be able to check that the signature comes from the originator. A digital signature consists of a string of symbols.
- The signature must be different for each use. This can be achieved by making each digital signature a function of the message that it is signing, together with a timestamp.
- Public key techniques are the means of creating digital signatures.

Chapter 30, Problem 29RQ
Problem
What type of information does a digital certificate include?
Step-by-step solution
Step 1 of 1
A digital certificate combines a public key with the identity of the person who holds the corresponding private key into a digitally signed statement. Certificates are issued and signed by a certification authority (CA). The following is the list of information included in the certificate:
1.
The certificate owner information, which is represented by a unique identifier known as the distinguished name (DN) of the owner. It includes the owner's name, organization, and other related information about the owner.
2. The public key of the owner.
3. The date of issue of the certificate.
4. The validity period, specified by 'Valid From' and 'Valid To' dates.
5. Issuer identifier information.
6. The digital signature of the certification authority (CA) that issues the certificate. All the information is encoded through a message-digest function, which creates the signature.

Chapter 30, Problem 30E
Problem
How can privacy of data be preserved in a database?
Step-by-step solution
Step 1 of 3
Protecting data from unauthorized access is referred to as data privacy. Data warehouses, in which a large amount of data is stored, must be kept private and secure. There are many challenges associated with data privacy. Some of them are as follows:
• To preserve data privacy, data mining and analysis should be kept to a minimum. Usually, a large amount of data is collected and stored in a centralized location, so a single security violation can expose all the data. It is therefore better to avoid storing data in a central warehouse.
Step 2 of 3
• The database contains personal data about individuals, which must be kept secure and private.
• Many people inside and outside the organization access the data. The data must be protected from illegal access and attacks.
Step 3 of 3
Some of the measures to provide data privacy are as follows:
• A good security mechanism should be imposed to protect the data from unauthorized users. This includes physical security, that is, protecting the location where the data is stored.
• Provide controlled and limited access to the data. Ensure that only authorized users can access the data by using biometrics, passwords, and so on. Also impose mechanisms so that users can access only the data that they need.
• It is better to avoid storing data in a central warehouse and instead distribute the data across different locations.
• Anonymize the data and remove all personal information.

Chapter 30, Problem 31E
Problem
What are some of the current outstanding challenges for database security?
Step-by-step solution
Step 1 of 3
Challenges in database security:
1.) Data quality: The database community needs techniques and organizational solutions to assess and attest to the quality of data. Techniques may be as simple as quality stamps posted on Web sites. We also need techniques that provide more efficient integrity semantics verification and tools for the assessment of data quality, based on techniques such as record linkage.
Step 2 of 3
2.) Intellectual property rights: With the widespread use of the Internet and intranets, legal and informational aspects of data are becoming major concerns of organizations. To address these concerns, watermarking techniques for relational data have recently been proposed. The main purpose of digital watermarking is to protect content from unauthorized duplication and distribution by enabling provable ownership of the content. It has traditionally relied upon the availability of a large noise domain within which the object can be altered while retaining its essential properties. However, research is needed to assess the robustness of such techniques and to investigate different approaches aimed at preventing intellectual property rights violations.
Step 3 of 3
3.) Data survivability: Database systems need to operate and continue their functions, even with reduced capabilities, despite disruptive events such as information warfare attacks. A DBMS, in addition to making every effort to prevent an attack and detecting one in the event of occurrence, should be able to do the following:
1.) Confinement: Take immediate action to eliminate the attacker's access to the system and to isolate or contain the problem to prevent further spread.
2.)
Damage assessment: Determine the extent of the problem, including failed functions and corrupted data.
3.) Reconfiguration: Reconfigure to allow operation to continue in a degraded mode while recovery proceeds.
4.) Repair: Recover corrupted or lost data and repair or reinstall failed system functions to reestablish a normal level of operation.
5.) Fault treatment: To the extent possible, identify the weaknesses exploited in the attack and take steps to prevent a recurrence.
The goal of the information warfare attacker is to damage the organization's operation and fulfillment of its mission through disruption of its information systems. The specific target of an attack may be the system itself or its data. While attacks that bring the system down outright are severe and dramatic, they must also be well timed to achieve the attacker's goal, since such attacks receive immediate and concentrated attention in order to bring the system back to operational condition, diagnose how the attack took place, and install preventive measures.

Chapter 30, Problem 32E
Problem
Consider the relational database schema in Figure 5.5. Suppose that all the relations were created by (and hence are owned by) user X, who wants to grant the following privileges to user accounts A, B, C, D, and E:
a. Account A can retrieve or modify any relation except DEPENDENT and can grant any of these privileges to other users.
b. Account B can retrieve all the attributes of EMPLOYEE and DEPARTMENT except for Salary, Mgr_ssn, and Mgr_start_date.
c. Account C can retrieve or modify WORKS_ON but can only retrieve the Fname, Minit, Lname, and Ssn attributes of EMPLOYEE and the Pname and Pnumber attributes of PROJECT.
d. Account D can retrieve any attribute of EMPLOYEE or DEPENDENT and can modify DEPENDENT.
e. Account E can retrieve any attribute of EMPLOYEE but only for EMPLOYEE tuples that have Dno = 3.
f. Write SQL statements to grant these privileges. Use views where appropriate.
Step-by-step solution Step 1 of 6 (a) GRANT SELECT, UPDATE ON EMPLOYEE, DEPARTMENT, DEPT_LOCATIONS, PROJECT, WORKS_ON TO USER_A WITH GRANT OOPTION ; Comment Step 2 of 6 (b) CREATE VIEW EMPS AS SELECT FNAME, MINIT, LNAME, SSN, BDATE, ADDRESS, SEX, SUPERSSN, DN O FROM EMPLOYEE ; GRANT SELECT ON EMPS TO USER _ B; CREATE VIEW DEPTS AS SELECT DNAME, DNUMBER FROM DEPARTMENT; GRANT SELECTION ON DEPTS TO USER _ B; Comment Step 3 of 6 (c) GRANT SELECT, UPDATE ON WORKS ON TO USE_C CREATE VIEW EMPI AS SELECT FNAME, MINIT, LNAME, SSN FROM EMPLOYEE ; GRANT SELECT ON EMPL TO USER _ C; CREATE VIEWPROJIAS SELECT PNAME, PNUMBER, FROM PROJECT; GRANT SELECTION PROJ1 TO USER_C; Comment Step 4 of 6 (d) GRANT SELECT ON EMPLOYEE, DEPEN DENT TO USER_D; GRANT UPDATE ON DEPENDENT TO USER_D; Comment Step 5 of 6 (e) CREATE VIEW DNO 3_ EMPLOYEEES AS SELECT * FROM EMPLOYEE WHERE DNO = 3; GRANT SELECT ON DNO 3_EMPLOYEES TO USER_E; Comment Step 6 of 6 (f) Working of the above statements grants privileges. Comment Chapter 30, Problem 32E Problem Consider the relational database schema in Figure 5.5. Suppose that all the relations were created by (and hence are owned by) user X, who wants to grant the following privileges to user accounts A, B, C, D, and E a. Account A can retrieve or modify any relation except DEPENDENT and can grant any of these privileges to other users. b. Account B can retrieve all the attributes of EMPLOYEE and DEPARTMENT except for Salary, Mgr_ssn, and Mgr_start_date. c. Account C can retrieve or modify WORKS_ON but can only retrieve the Fname, Minit, Lname, and Ssn attributes of EMPLOYEE and the Pname and Pnumber attributes of PROJECT. d. Account D can retrieve any attribute of EMPLOYEE or DEPENDENT and can modify DEPENDENT. e. Account E can retrieve any attribute of EMPLOYEE but only for EMPLOYEE tuples that have Dno = 3. f. Write SQL statements to grant these privileges. Use views where appropriate. 
Chapter 30, Problem 33E
Problem
Suppose that privilege (a) of Exercise 30.32 is to be given with GRANT OPTION but only so that account A can grant it to at most five accounts, and each of these accounts can propagate the privilege to other accounts but without the GRANT OPTION privilege. What would the horizontal and vertical propagation limits be in this case?
Reference Problem 30.32
Consider the relational database schema in Figure 5.5. Suppose that all the relations were created by (and hence are owned by) user X, who wants to grant the following privileges to user accounts A, B, C, D, and E:
a. Account A can retrieve or modify any relation except DEPENDENT and can grant any of these privileges to other users.
b. Account B can retrieve all the attributes of EMPLOYEE and DEPARTMENT except for Salary, Mgr_ssn, and Mgr_start_date.
c. Account C can retrieve or modify WORKS_ON but can only retrieve the Fname, Minit, Lname, and Ssn attributes of EMPLOYEE and the Pname and Pnumber attributes of PROJECT.
d. Account D can retrieve any attribute of EMPLOYEE or DEPENDENT and can modify DEPENDENT.
e. Account E can retrieve any attribute of EMPLOYEE but only for EMPLOYEE tuples that have Dno = 3.
f. Write SQL statements to grant these privileges. Use views where appropriate.
Step-by-step solution
Step 1 of 1
The horizontal propagation limit granted to USER_A is 5. The vertical propagation limit granted to USER_A is 1. USER_A can then grant the privilege with a vertical limit of 0 (i.e., without the GRANT OPTION) to at most five users, who cannot grant the privilege any further.
Chapter 30, Problem 34E
Problem
Consider the relation shown in Figure 30.2(d). How would it appear to a user with classification U? Suppose that a classification U user tries to update the salary of ‘Smith’ to $50,000; what would be the result of this action?
Step-by-step solution
Step 1 of 1
EMPLOYEE would appear to users with classification U as follows:

Name      Salary    JobPerformance   TC
Smith U   null U    null U           U

If a classification U user tried to update the salary of Smith to $50,000, a third polyinstantiation of the Smith tuple would result, as follows:

Name      Salary    JobPerformance   TC
Smith U   40000 C   Fair S           S
Smith U   40000 C   Excellent C      C
Smith U   50000 U   null U           U
Brown C   80000 S   Good C           S
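The filtering and polyinstantiation behaviour described in the solution to Exercise 30.34 can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the textbook's algorithm: only the levels U < C < S are modelled, a hidden value is shown as None labelled at the user's own level, identical filtered rows are collapsed, and the function names (filter_for, update_at, tuple_class) are hypothetical.

```python
LEVELS = {"U": 0, "C": 1, "S": 2}  # assumed ordering U < C < S
ATTRS = ("Name", "Salary", "JobPerformance")

# Rows in the style of Figure 30.2(d): each attribute is a (value, label) pair.
EMPLOYEE = [
    {"Name": ("Smith", "U"), "Salary": (40000, "C"), "JobPerformance": ("Fair", "S")},
    {"Name": ("Smith", "U"), "Salary": (40000, "C"), "JobPerformance": ("Excellent", "C")},
    {"Name": ("Brown", "C"), "Salary": (80000, "S"), "JobPerformance": ("Good", "C")},
]


def visible(label, clearance):
    """A value is visible when its label does not exceed the user's clearance."""
    return LEVELS[label] <= LEVELS[clearance]


def filter_for(relation, clearance):
    """Rows as seen at `clearance`; hidden values become None at the user's level."""
    seen = []
    for row in relation:
        if not visible(row["Name"][1], clearance):
            continue  # apparent key is classified above the user: hide the tuple
        view = {a: row[a] if visible(row[a][1], clearance) else (None, clearance)
                for a in ATTRS}
        if view not in seen:  # identical filtered rows collapse into one
            seen.append(view)
    return seen


def update_at(relation, clearance, name, attr, value):
    """Polyinstantiate: add a new tuple at the user's level instead of overwriting."""
    base = next(r for r in filter_for(relation, clearance) if r["Name"][0] == name)
    new_row = dict(base)
    new_row[attr] = (value, clearance)
    return relation + [new_row]


def tuple_class(row):
    """Tuple classification (TC): the highest label among the row's attributes."""
    return max((row[a][1] for a in ATTRS), key=LEVELS.get)


u_view = filter_for(EMPLOYEE, "U")
updated = update_at(EMPLOYEE, "U", "Smith", "Salary", 50000)
print(u_view)
print(updated[-1], tuple_class(updated[-1]))
```

The update returns a new list and leaves the higher-classified Smith tuples in place, which is the point of polyinstantiation: the U user's write cannot overwrite, or reveal the existence of, values stored at levels C and S.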