
11-fundamentals-of-database-systems-solutions

Chapter 1, Problem 1RQ
Problem
Define the following terms: data, database, DBMS, database system, database catalog,
program-data independence, user view, DBA, end user, canned transaction, deductive database
system, persistent object, meta-data, and transaction-processing application.
Step-by-step solution
Step 1 of 14
Data
The word data is derived from the Latin datum, meaning 'something given'; data are given facts, from which additional facts can be inferred. Data is a collection of known facts that can be recorded and that have implicit meaning.
Step 2 of 14
Database
A database is a collection of related data, or operational data, extracted from a firm or organization. In other words, a collection of organized data is called a database.
Step 3 of 14
DBMS (Database Management System)
A DBMS is a collection of programs that enables users to create, maintain, and manipulate a database. The DBMS is a general-purpose software system that facilitates the process of defining, constructing, and manipulating databases.
Step 4 of 14
Database Systems
A database system comprises a database of operational data, together with the processing functionality required to access and manage that data. The combination of the DBMS and the database is called a database system.
Step 5 of 14
Database Catalog
A database catalog stores a complete description of the database: its structure, its objects, details of its users, and its constraints.
Step 6 of 14
Program-data independence
In traditional file processing, the structure of the data files is 'hard-coded' into the programs. To change the structure of a data file, one or more programs that access that file must be changed, and the process of changing them can introduce errors. In contrast to this more traditional approach, a DBMS stores the structure in a catalog, separating the DBMS programs from the data definition. Storing the data definition separately from the programs is known as program-data independence.
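The catalog-based separation described above can be sketched with Python's built-in sqlite3 module: the program discovers the table structure from the catalog at run time instead of hard-coding it. The table and column names here are illustrative, not from the text.

```python
import sqlite3

# The table structure lives in the DBMS catalog, not in the program:
# the program reads the column names from the catalog at run time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, student_number INTEGER, class INTEGER)")

# Query the catalog (PRAGMA table_info) instead of hard-coding the structure.
columns = [row[1] for row in conn.execute("PRAGMA table_info(student)")]
print(columns)  # ['name', 'student_number', 'class']
```

If the table structure later changes, this program keeps working without modification, which is the point of program-data independence.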
Step 7 of 14
User View
The way in which the database appears to a particular user is called a user view.
Step 8 of 14
DBA (Database Administrator)
The DBA is the person responsible for authorizing access to the database, coordinating and monitoring its use, and acquiring software and hardware resources as needed.
Step 9 of 14
End User
End users are the people who access the database for different purposes, such as querying, updating, and generating reports.
Step 10 of 14
Canned Transactions
Canned transactions are standardized queries and updates performed on the database through carefully programmed and tested programs.
Step 11 of 14
Deductive Database System
A deductive database system is a database system that supports the proof-theoretic view of a database and, in particular, is capable of deducing or inferring additional facts from the given facts in the extensional database by applying specified deductive axioms or rules of inference to those given facts.
Step 12 of 14
Persistent object
Object-oriented database systems are compatible with programming languages such as C++ and Java. An object that is stored in such a way that it survives the termination of the DBMS program is persistent.
Step 13 of 14
Metadata
Information about the data is called metadata. The information stored in the catalog is metadata; the schema of a table is an example.
Step 14 of 14
Transaction-processing application
A transaction is a logical unit of database processing that includes one or more database operations, such as insertion, deletion, modification, and retrieval. The database operations that form a transaction can either be embedded within an application program or be specified interactively via a high-level query language such as SQL.
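A minimal sketch of a transaction as a logical unit, using sqlite3. The debit and credit below either both take effect (commit) or neither does (rollback); the account table and its values are illustrative assumptions, not from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
conn.commit()

try:
    # Two operations forming one logical unit: transfer 30 from account 1 to 2.
    conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
    conn.execute("UPDATE account SET balance = balance + 30 WHERE id = 2")
    conn.commit()      # both updates succeed together
except sqlite3.Error:
    conn.rollback()    # or neither is applied

balances = dict(conn.execute("SELECT id, balance FROM account"))
print(balances)  # {1: 70, 2: 80}
```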
Chapter 1, Problem 2RQ
Problem
What four main types of actions involve databases? Briefly discuss each.
Step-by-step solution
Step 1 of 5
The four main types of actions involving the database are as follows:
• Database Administration
• Database Designing
• Database Usage by end users
• System Analysis and Application Programming
Step 2 of 5
• Database Administration:
• Database administration is the process of administering database resources such as the application programs and the database management system.
• The Database Administrator (DBA) is responsible for granting permission to access the database.
• The administrative work also includes acquiring software and hardware resources.
• The security of the database is also managed through database administration.
Step 3 of 5
• Database designing:
• Database designing is the process of designing the database, which includes identifying the data to be stored and choosing the data structures required to store it.
• The database design should fulfill the requirements of all the user groups of the organization.
Step 4 of 5
• Database Usage by end users:
• End users are the users who directly access the database for querying, updating, and generating reports. The following are the types of end users:
o Casual end users: users who access the database occasionally. Middle- and high-level managers are examples of casual end users.
o Parametric end users: users who constantly access the database. Bank tellers are examples of parametric end users.
o Sophisticated end users: engineers and scientists who implement applications to meet their complex requirements.
o Standalone users: users who maintain personal databases by using ready-made program packages.
Step 5 of 5
• System Analysis and Application Programming:
• System analysis is the process of determining the requirements of the end users.
• System analysis is done by system analysts, who develop the specifications for canned transactions that meet the requirements of the end users.
• The implementation of these specifications is done by application programmers.
Chapter 1, Problem 3RQ
Problem
Discuss the main characteristics of the database approach and how it differs from traditional file
systems.
Step-by-step solution
Step 1 of 4
Characteristics of the Database Approach:
Self-describing nature of a database system:
A fundamental characteristic of the database approach is that the database system contains not only the database itself but also a complete definition or description of the database's structure and constraints.
• The information stored in the catalog is called metadata, and it describes the structure of the primary database.
• In traditional file processing, data definition is typically part of the application programs themselves. Those programs are constrained to work with only one specific database, whose structure is declared in the application programs.
Step 2 of 4
Insulation between programs and data, and data abstraction:
In traditional file processing, the structure of data files is embedded in the application programs, so any change to the structure of a file may require changing all programs that access that file.
• DBMS access programs do not require such changes in most cases.
• The structure of data files is stored in the DBMS catalog separately from the access programs.
Step 3 of 4
Support of multiple views of the data:
A database typically has many users, each of whom may require a different perspective or view of the database.
• A multi-user DBMS whose users have a variety of distinct applications must provide facilities for defining multiple views.
• The traditional file-processing approach does not support multiple views of the data.
Step 4 of 4
Sharing of data and multi-user transaction processing:
A multi-user DBMS must allow multiple users to access the database at the same time. The DBMS must include concurrency control software to ensure that several users trying to update the same data do so in a controlled manner, so that the result of the updates is correct.
• In traditional file processing, no such data sharing is possible, and no such concurrency control software is available.
Chapter 1, Problem 4RQ
Problem
What are the responsibilities of the DBA and the database designers?
Step-by-step solution
Step 1 of 2
Responsibilities of the DBA:
DBA stands for Database Administrator. The role of a database administrator is highly technical; the DBA is responsible for managing the database used in the organization.
• The database administrator has the responsibility of building the physical design of the database.
• The database administrator handles technical responsibilities such as:
o Security enforcement
o Performance of the database
o Providing access to the database
o Acquiring resources such as hardware and software components
o Backup of the data in the database
o Recovery of lost data from the database
o Monitoring and coordinating the use of the database
o Monitoring response time and security breaches
Step 2 of 2
Responsibilities of the Database Designer:
The database designer is the architect of the database. The designer's work is versatile, and he or she works with everyone in the organization. The responsibilities of the database designer are as follows:
• The data to be stored in the database is identified by the database designer.
• Appropriate structures to store the data are chosen by the database designer.
• The database designer studies and understands the business needs.
• The designer communicates the architecture to business and management, and may also participate in business development as an advisor.
• Ensuring consistency across the database.
• Creating and enforcing database development standards and processes.
Chapter 1, Problem 5RQ
Problem
What are the different types of database end users? Discuss the main activities of each.
Step-by-step solution
Step 1 of 2
End users perform various database operations such as querying, updating, and generating reports.
The different types of end users are as follows:
• Casual end users
• Naive or parametric end users
• Sophisticated end users
• Standalone users
Step 2 of 2
Casual end users:
• The Casual end users access the database occasionally.
• Each time they access the database, their request will vary.
• They use sophisticated database query language to retrieve the data from the database.
Naive or parametric end users:
• Naïve or parametric end users spend most of their time in querying and updating the database
using standard types of queries.
Sophisticated end users:
• The sophisticated end users access the database to implement their own applications to meet
their specific goals.
• The sophisticated end users are engineers, scientists, and business analysts.
Standalone Users:
• The standalone end users maintain their own databases by creating them using ready-made program packages that provide a graphical user interface.
Chapter 1, Problem 7RQ
Problem
Discuss the differences between database systems and information retrieval systems.
Step-by-step solution
Step 1 of 14
Database approach: A database is more than a file; it contains information about more than one entity and about the relationships among the entities.
Information retrieval systems: In an information retrieval system, data are stored in files, a very old but often-used approach to system development.
Step 2 of 14
Database approach: Data about a single entity (e.g., product, customer, department) are stored in a "table" in the database.
Step 3 of 14
Information retrieval systems: Each program (system) often had its own unique set of files.
Step 4 of 14
Database approach: Databases are designed to meet the needs of multiple users and to be used
in multiple applications.
Step 5 of 14
Information retrieval systems: Users of information retrieval systems are almost always at the mercy of the information department to write programs that manipulate stored data and produce the needed information.
Step 6 of 14
Database approach: Databases are relatively complex to design, implement, and maintain.
Step 7 of 14
Information retrieval systems: Information retrieval systems are very simple to design and
implement as they are normally based on a single application or information system.
Step 8 of 14
Database approach: The processing speed is slow in comparison to information retrieval systems.
Step 9 of 14
Information retrieval systems: The processing speed is faster than that of other ways of storing data.
Step 10 of 14
Other differences:
Database systems provide program-data independence, whereas in information retrieval systems programs and data are dependent on each other.
Step 11 of 14
Database systems offer minimal data redundancy, improved data consistency, and enforcement of standards that improves data quality, whereas in information retrieval systems duplication of data is present.
Step 12 of 14
Improved data sharing is present in databases, whereas information retrieval systems offer only limited data sharing.
Step 13 of 14
Databases offer flexibility and scalability, whereas in retrieval systems the data are not flexible or scalable.
Step 14 of 14
Databases reduce data redundancy, whereas in information retrieval systems data redundancy is one of the important problems.
Chapter 1, Problem 8E
Problem
Identify some informal queries and update operations that you would expect to apply to the
database shown in Figure 1.2.
Step-by-step solution
Step 1 of 2
Informal queries:
a) Retrieve the transcript (a list of all courses and grades) of 'Smith'.
b) List the names of students who took the section of the 'Database' course offered in fall 2005, and their grades in that section.
c) List the prerequisites of the 'Database' course.
Step 2 of 2
Update operations:
a) Change the class of 'Smith' to sophomore.
b) Create a new section for the 'Database' course for this semester.
c) Enter a grade of 'A' for 'Smith' in the 'Database' section of last semester.
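The first informal query can be sketched in SQL over simplified STUDENT and GRADE_REPORT tables loosely based on Figure 1.2. The sample rows and the class numbering are illustrative assumptions.

```python
import sqlite3

# Simplified tables in the spirit of Figure 1.2 (columns and data assumed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (Student_number INTEGER, Name TEXT, Class INTEGER);
CREATE TABLE GRADE_REPORT (Student_number INTEGER, Section_identifier INTEGER, Grade TEXT);
INSERT INTO STUDENT VALUES (17, 'Smith', 1), (8, 'Brown', 2);
INSERT INTO GRADE_REPORT VALUES (17, 112, 'B'), (8, 85, 'A');
""")

# Query (a): retrieve the grades of 'Smith'.
grades = [g for (g,) in conn.execute("""
    SELECT g.Grade
    FROM GRADE_REPORT g JOIN STUDENT s ON s.Student_number = g.Student_number
    WHERE s.Name = 'Smith'""")]
print(grades)  # ['B']

# Update (a): change the class of 'Smith' to sophomore (assumed class 2).
conn.execute("UPDATE STUDENT SET Class = 2 WHERE Name = 'Smith'")
```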
Chapter 1, Problem 9E
Problem
What is the difference between controlled and uncontrolled redundancy? Illustrate with
examples.
Step-by-step solution
Step 1 of 3
Storing the same facts or data at multiple places in the database is considered redundancy. In other words, duplication of data is known as redundancy.
Some of the problems with redundant data are as follows:
• Inconsistency of data
• Wastage of memory space
Step 2 of 3
The differences between controlled redundancy and uncontrolled redundancy are as follows:
Step 3 of 3
An example to illustrate controlled and uncontrolled redundancy is as follows:
Consider the following tables:
Employee(empno, ename, job, salary, dob)
Department(deptno, dname, location)
Project(pno, pname, description)
Works(empno, deptno, pno)
Assume that an employee can work on multiple projects. In the Works table, empno and deptno are then redundant whenever an employee works on two or more projects.
Figure 1 is an example of controlled redundancy: the deptno for empno 100 is the same in all three records.
Figure 2 is an example of uncontrolled redundancy: the deptno for empno 100 is inconsistent across the two records.
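The consistency check implied above can be sketched as a query: under controlled redundancy every repetition of deptno for an employee agrees, and the query below flags employees for whom it does not. The sample rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE works (empno INTEGER, deptno INTEGER, pno INTEGER)")
# Employee 100 works on three projects; the third row records a different
# deptno, i.e. uncontrolled redundancy has crept in.
conn.executemany("INSERT INTO works VALUES (?, ?, ?)",
                 [(100, 10, 1), (100, 10, 2), (100, 20, 3)])

# Detect employees whose duplicated deptno values disagree.
inconsistent = [e for (e,) in conn.execute(
    "SELECT empno FROM works GROUP BY empno HAVING COUNT(DISTINCT deptno) > 1")]
print(inconsistent)  # [100]
```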
Chapter 1, Problem 10E
Problem
Specify all the relationships among the records of the database shown in Figure 1.2.
Step-by-step solution
Step 1 of 2
Relationships in the database specify how the data tables are related to each other.
Step 2 of 2
The relationships between the tables are as follows:
• Consider the tables COURSE and SECTION. The two tables have common column
“Course_number”.
Hence, the table SECTION is related to COURSE through Course_number.
• Consider the tables STUDENT and GRADE_REPORT. The two tables have common column
“Student_number”.
Hence, the table GRADE_REPORT is related to STUDENT through Student_number.
• Consider the tables COURSE and PREREQUISITE. The two tables have common column
“Course_number”.
Hence, the table PREREQUISITE is related to COURSE through Course_number.
• Consider the tables SECTION and GRADE_REPORT. The two tables have common column
“Section_identifier”.
Hence, the table GRADE_REPORT is related to SECTION through Section_identifier.
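Each of the relationships above can be exercised as a join on the shared column; for instance, the COURSE–SECTION relationship, sketched here with illustrative rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE COURSE (Course_number TEXT, Course_name TEXT)")
conn.execute("CREATE TABLE SECTION (Section_identifier INTEGER, Course_number TEXT)")
conn.execute("INSERT INTO COURSE VALUES ('CS1310', 'Computer Science I')")
conn.execute("INSERT INTO SECTION VALUES (85, 'CS1310')")

# SECTION is related to COURSE through the common Course_number column.
rows = conn.execute("""
    SELECT s.Section_identifier, c.Course_name
    FROM SECTION s JOIN COURSE c ON s.Course_number = c.Course_number
""").fetchall()
print(rows)  # [(85, 'Computer Science I')]
```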
Chapter 1, Problem 11E
Problem
Give some additional views that may be needed by other user groups for the database shown in
Figure 1.2.
Step-by-step solution
Step 1 of 2
Additional views for the given database:
A new view can be created that lists, for each student, the student's section number, course number, and grade:
GRADE_SEC_REPORT
Student_number Section_identifier Course_number Grade
This view is very helpful for the university's administration to print each section's grade report.
Step 2 of 2
An additional view can be created that lists the courses taken by a student and the grades achieved in those courses:
COURSE_GRADE_REPORT
Student_number Course_number Grade GPA
This view is very helpful for the university's administration to determine students' honours.
Chapter 1, Problem 12E
Problem
Cite some examples of integrity constraints that you think can apply to the database shown in
Figure 1.2.
Step-by-step solution
Step 1 of 1
A few constraints that can be imposed on the database are:
1. A grade can be given only to enrolled students.
2. Each section must belong to a course.
3. Each course must be part of an existing department.
4. The prerequisite of each course must be a course that was offered in the past or is an existing course.
5. A student must be part of the section for which he or she is graded.
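Constraint 2 ("each section must belong to a course") can be sketched as a foreign key; the table layout follows Figure 1.2 loosely and the rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE COURSE (Course_number TEXT PRIMARY KEY)")
conn.execute("""CREATE TABLE SECTION (
    Section_identifier INTEGER PRIMARY KEY,
    Course_number TEXT NOT NULL REFERENCES COURSE(Course_number))""")
conn.execute("INSERT INTO COURSE VALUES ('CS1310')")
conn.execute("INSERT INTO SECTION VALUES (85, 'CS1310')")  # valid: course exists

try:
    conn.execute("INSERT INTO SECTION VALUES (92, 'CS9999')")  # no such course
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the DBMS rejects a section without a course
print(rejected)  # True
```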
Chapter 1, Problem 13E
Problem
Give examples of systems in which it may make sense to use traditional file processing instead
of a database approach.
Step-by-step solution
Step 1 of 2
Despite the advantages of using a database approach, there are some situations in which a
DBMS may involve unnecessary overhead costs that would not be incurred in traditional file
processing.
Step 2 of 2
The following are examples of systems in which it may make sense to use traditional file processing instead of a database approach:
• Many computer-aided design (CAD) tools used by chemical and civil engineers have proprietary file and data management software that is geared toward the internal manipulation of drawings and 3D objects.
• Similarly, communication and switching systems designed by companies like AT&T.
• GIS implementations often implement their own data organization schemes for efficiently implementing functions related to processing maps, physical contours, lines, polygons, and so on; general-purpose DBMSs are inadequate for their purpose.
• Small single-user applications.
• Real-time navigation systems that require little data.
Chapter 1, Problem 14E
Problem
Consider Figure 1.2.
a. If the name of the ‘CS’ (Computer Science) Department changes to ‘CSSE’ (Computer
Science and Software Engineering) Department and the corresponding prefix for the course
number also changes, identify the columns in the database that would need to be updated.
b. Can you restructure the columns in the COURSE, SECTION, and PREREQUISITE tables so
that only one column will need to be updated?
Step-by-step solution
Step 1 of 2
a) The following columns need to be updated when the name of the department changes along with the course-number prefix:
In the STUDENT table, Major has to be updated. In the COURSE table, Course_number and Department should be updated. In the SECTION table, Course_number should be updated. In the PREREQUISITE table, Course_number and Prerequisite_number are to be modified.
Step 2 of 2
b) The columns of the tables are split as follows:
The tables are as follows after restructuring:
Chapter 2, Problem 1RQ
Problem
Define the following terms: data model, database schema, database state, internal schema,
conceptual schema, external schema, data independence, DDL, DML, SDL, VDL, query
language, host language, data sublanguage, database utility, catalog, client/server architecture,
three-tier architecture, and n-tier-architecture.
Step-by-step solution
Step 1 of 19
Data model
The data model describes the logical structure of the database and it introduces abstraction in
the DBMS (Database Management System). The data model provides a tool to describe the data
and their relationships.
Step 2 of 19
Database Schema
The database schema describes the overall design of the database. It is a basic structure to
define how the data is organized in the database. The database schema can be depicted by the
schema diagrams.
Step 3 of 19
Database state
The actual data stored in the database at a moment in time is called the database state.
Step 4 of 19
Internal Schema
It is also referred as the Physical level schema. The internal schema represents the structure of
the data as viewed by the DBMS and it describes the physical storage structure of the database.
Comment
Step 5 of 19
Conceptual Schema
It is also referred to as the Logical level schema. It describes the logical structure of the whole
database for a group of users. It hides the internal details of the physical storage structure.
Step 6 of 19
External Schema
The external schema is also referred to as the user-level schema. It describes the data viewed by the end users. This schema describes the part of the database relevant to a particular user group and hides the rest of the database from that group.
Step 7 of 19
Data independence
The capacity to change the schema at the physical level of a database system without affecting
the schema at the conceptual or external level is called data independence.
Step 8 of 19
DDL
DDL stands for Data Definition Language. It is used to create, alter, and drop the database
tables, views, and indexes.
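The three DDL verbs can be sketched directly; the employee table here is an illustrative assumption.

```python
import sqlite3

# DDL in action: CREATE, ALTER, and DROP define and change the schema,
# not the records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (empno INTEGER PRIMARY KEY, ename TEXT)")
conn.execute("ALTER TABLE employee ADD COLUMN salary INTEGER")

cols = [r[1] for r in conn.execute("PRAGMA table_info(employee)")]
print(cols)  # ['empno', 'ename', 'salary']

conn.execute("DROP TABLE employee")
```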
Step 9 of 19
DML
DML stands for Data Manipulation Language. It is used to insert, retrieve, update, and delete the
records in the database.
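The four DML operations can likewise be sketched; table and values are illustrative.

```python
import sqlite3

# DML in action: INSERT, UPDATE, DELETE, and SELECT manipulate the
# records themselves, leaving the schema untouched.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (empno INTEGER PRIMARY KEY, ename TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'Ada'), (2, 'Grace')")
conn.execute("UPDATE employee SET ename = 'Ada L.' WHERE empno = 1")
conn.execute("DELETE FROM employee WHERE empno = 2")

names = [n for (n,) in conn.execute("SELECT ename FROM employee")]
print(names)  # ['Ada L.']
```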
Step 10 of 19
SDL
SDL stands for Storage Definition Language. It is used to specify the internal schema of the database and the mapping between the two schemas.
Step 11 of 19
VDL
VDL stands for View Definition Language. It specifies the user views and their mappings to the
logical schema in the database.
Step 12 of 19
Query Language
The query language is a high-level language used to retrieve the data from the database.
Step 13 of 19
Host Language
The host language is used for application programming in a database. The DML commands are
embedded in a general-purpose language to manipulate the data in the database.
Step 14 of 19
Data Sublanguage
When data manipulation language commands are embedded in a general-purpose language to manipulate the data (insert, update, and delete operations) in the database, the DML is referred to as a data sublanguage.
Step 15 of 19
Database utility
The database utility is a software module to help the DBA (Database Administrator) to manage
the database.
Step 16 of 19
Catalog
The catalog stores the complete description of the database structure and its constraints.
Step 17 of 19
Client/server architecture
The client/server architecture is a database architecture that contains two modules. A client module, usually running on a PC, provides the user interface. A server module responds to user queries and provides services to the client machines.
Step 18 of 19
Three-tier architecture
The three-tier architecture consists of three layers: the client, the application server, and the database server. The client machine usually contains the user interface, the intermediate layer (application server) runs the application programs and stores the business rules, and the database layer stores the data.
Step 19 of 19
n-tier architecture
The n-tier architecture consists of four or five tiers. The intermediate (business logic) layer is divided into multiple layers, distributing programs and data throughout a network.
Chapter 2, Problem 2RQ
Problem
Discuss the main categories of data models. What are the basic differences among the relational
model, the object model, and the XML model?
Step-by-step solution
Step 1 of 2
The three main categories of data models are as follows:
• High-level or conceptual data models
• Representational or implementational data models
• Low-level or physical data models
Step 2 of 2
The differences between the relational model, the object model, and the XML model are as follows:

Relational Model:
• Data is represented logically in tables, together with information about the relationship types.
• Data is defined in columns with field names, and all the data in a column must be of the same type.
• The relational database uses a high-level query language. Example: SQL

Object Model:
• It refers to the model that deals with how applications will interact with resources from any external source.
• It also deals with the relationships between the classes, methods, and properties of the classes.
• The classes in the object model are designed in an acyclic-graph manner; it is closer to conceptual data models. Example: Document Object Model (DOM)

XML Model:
• The data in the XML model is hierarchical; different types of data can be defined in a single XML document.
• The data in an XML document does not have any inherent ordering.
• Data is represented in the form of tags known as elements. Example: Stylus Studio
Chapter 2, Problem 3RQ
Problem
What is the difference between a database schema and a database state?
Step-by-step solution
Step 1 of 1
Difference between a database schema and a database state:
The database schema is a description of the database, and the database state is the database itself.
The description of a database is called the database schema, which is specified during database design and is not expected to change frequently. Most data models have certain conventions for displaying schemas as diagrams. A displayed schema is called a schema diagram. A schema diagram displays the structure of each record type but not the actual instances of records; it displays only some aspects of a schema, such as the names of record types and data items, and some types of constraints.
The data in the database at a particular moment in time is called a database state. It is also called the current set of occurrences or instances in the database. In a given database state, each schema construct has its own current set of instances; many database states can be constructed to correspond to a particular database schema. Every time we insert or delete a record, or change the value of a data item in a record, we change one state of the database into another state.
When we define a new database, we specify its database schema to the DBMS. At this point, the corresponding database state is the empty state, with no data. The DBMS is partly responsible for ensuring that every state of the database is a valid state, that is, a state that satisfies the structure and constraints specified in the schema.
The schema is sometimes called the intension, and a database state is called an extension of the schema.
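The distinction can be sketched in a few lines: the CREATE TABLE fixes the schema once, and each subsequent insert or delete moves the database from one state to the next while the schema stays the same. The table is an illustrative assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT)")          # schema defined; state is empty

conn.execute("INSERT INTO student VALUES ('Smith')")       # state 1
conn.execute("INSERT INTO student VALUES ('Brown')")       # state 2
conn.execute("DELETE FROM student WHERE name = 'Smith'")   # state 3

# The schema (one TEXT column) never changed; only the state did.
state = [n for (n,) in conn.execute("SELECT name FROM student")]
print(state)  # ['Brown']
```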
Chapter 2, Problem 4RQ
Problem
Describe the three-schema architecture. Why do we need mappings among schema levels? How
do different schema definition languages support this architecture?
Step-by-step solution
Step 1 of 3
Three-schema architecture :The goal of he three-schema architecture is to separate the user applications and the physical
database. In this architecture schemas can be defined at the following three levels.
(1) internal level :it has an internal schema, which describes the physical storage structure of the database.
(2) Conceptual level :It has a conceptual schema, which describes the structure of the whole database for a
community of users. The conceptual schema hides the details of physical storage structures and
concentrates on describing entities, data types, relationships, user operations and constraints.
Comment
Step 2 of 3
(3) External level :It includes a number of external schema are user views. Each external schema describes the
part of the database that a particular user group is interested in and hides the rest of the
database from that group. A high-level data model on an implementation data model can be used
at this level.
Need of mapping :The process of transforming requests and results between levels are called mappings.
The conceptual internal mapping define the coverspondence between the conceptual view and
the stared database. It specifies how conceptual records and fields are represented at the
internal level.
An external conceptual mapping defines the covers pondence between a particular external view
and the conceptual view.
Comment
Step 3 of 3
Different schema definition language :DDL :Data definition language is used to specify conceptual and internal schemas for the database
and any mappings between the two, the DBMS will have a DDL compiler whose function is to
process DDL statements in order to identify descriptions of the schema constructs and to store
the schema description in the DBMS catalog.
SDL :-
Storage definition language is used to specify the internal schema. The mappings between the
two schemas may be specified in either one of these languages. In mast relational DBMS’s to
day, there is no specific language that performs the sale of SDL. Instead the internal schema is
specified by a combination of parameters and specifications related to storage.
VDL: The view definition language (VDL) is used to specify user views and their mappings to the conceptual schema, but in most DBMSs the DDL is used to define both conceptual and external schemas. In relational DBMSs, SQL plays the role of VDL to define user or application views as results of predefined queries.
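As a rough sketch of these ideas, the following uses SQLite (via Python's sqlite3 module) to show DDL statements defining a table and a view, with the resulting schema descriptions recorded in the catalog. SQLite and its sqlite_master catalog are illustrative choices here, not something the text prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL: define a conceptual-level construct (a base table)
conn.execute("CREATE TABLE student (student_number INTEGER PRIMARY KEY, name TEXT)")
# SQL playing the role of VDL: define an external view over the table
conn.execute("CREATE VIEW student_names AS SELECT name FROM student")
# The DDL compiler stores the schema descriptions in the catalog;
# in SQLite the catalog is the sqlite_master table
rows = conn.execute("SELECT type, name FROM sqlite_master ORDER BY name").fetchall()
print(rows)  # [('table', 'student'), ('view', 'student_names')]
```

Querying the catalog this way is how the DBMS (and the DBA) can look up descriptions of schema constructs at run time.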
Chapter 2, Problem 5RQ
Problem
What is the difference between logical data independence and physical data independence?
Which one is harder to achieve? Why?
Step-by-step solution
Step 1 of 3
Data independence is the capacity to change the schema at one level of a database system without having to change the schema at the next higher level.
Data independence is achieved in the following two ways:
• Logical data independence
• Physical data independence
Step 2 of 3
Logical data independence is the capacity to change the conceptual schema without having to change the external schemas; only the view definitions and the mappings need to be changed. For example, changing the constraints of an attribute, or inserting and deleting data items (which changes the table size), need not affect the external schemas.
Physical data independence is the capacity to change the internal schema without having to change the conceptual schema or the external schemas. For example, files on physical storage may be reorganized to enhance the performance of operations on the database; since the data is the same and only the files are relocated, the conceptual and external schemas remain unaffected.
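A small illustration of logical data independence, using SQLite for concreteness: an external view continues to work unchanged after the conceptual schema changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'Ada')")
# External schema: a view that an application program queries
conn.execute("CREATE VIEW emp_names AS SELECT name FROM employee")
# Conceptual schema changes: a new column is added to the base table
conn.execute("ALTER TABLE employee ADD COLUMN salary REAL")
# The external view, and any program written against it, is unaffected
names = conn.execute("SELECT name FROM emp_names").fetchall()
print(names)  # [('Ada',)]
```

The view shields the application from this particular conceptual change; as the next step notes, more drastic conceptual changes (dropping or restructuring attributes the views depend on) are what make logical data independence hard in general.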
Step 3 of 3
Logical data independence is harder to achieve. Changing attribute constraints or the structure of a table may result in invalid data for the changed attributes, and the application programs that reference the modified table may be affected, which should not happen under logical data independence.
Chapter 2, Problem 6RQ
Problem
What is the difference between procedural and nonprocedural DMLs?
Step-by-step solution
Step 1 of 2
Difference between procedural and nonprocedural DML:
Procedural DML: A procedural data manipulation language is called a low-level DML. A procedural DML must be embedded in a general-purpose programming language. This type of DML typically retrieves individual records or objects from the database and processes each separately. Therefore, it needs to use programming-language constructs, such as looping, to retrieve and process each record from a set of records.
Procedural DMLs are also called record-at-a-time DMLs.
Step 2 of 2
Non-procedural DML: A non-procedural DML is called a high-level DML. It can be used on its own to specify complex database operations concisely; many DBMSs allow high-level DML statements either to be entered interactively from a display monitor or terminal, or to be embedded in a general-purpose programming language.
A query in a high-level DML often specifies which data to retrieve rather than how to retrieve it. Therefore, such languages are also called declarative.
A non-procedural DML requires a user to specify what data are needed without specifying how to get the data.
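The contrast can be sketched as follows, using Python with SQLite for illustration; the host-language loop plays the part of record-at-a-time processing, while the WHERE clause is the declarative, set-at-a-time formulation of the same request.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, salary REAL)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [("Ada", 90000), ("Bob", 50000), ("Eve", 70000)])

# Procedural style (record-at-a-time): a host-language loop fetches
# each record and processes it separately
high_paid = []
for name, salary in conn.execute("SELECT name, salary FROM emp"):
    if salary > 60000:
        high_paid.append(name)

# Declarative style (set-at-a-time): state WHICH data to retrieve,
# not HOW to loop over the records
declarative = [r[0] for r in
               conn.execute("SELECT name FROM emp WHERE salary > 60000")]

print(sorted(high_paid) == sorted(declarative))  # True
```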
Chapter 2, Problem 7RQ
Problem
Discuss the different types of user-friendly interfaces and the types of users who typically use
each.
Step-by-step solution
Step 1 of 7
User friendly interfaces provided by the DBMS are as follows:
(a)
Menu-Based interfaces:
• These interfaces present lists of options (menus) through which the user can formulate a request.
• Pull-down menus are a very popular technique in web-based user interfaces.
Users who use the interface:
• These interfaces are used by Web-browsing users and Web clients.
Step 2 of 7
(b)
Forms-based interfaces:
• These interfaces display a form to each user.
• The user can fill in the form entries to insert new data.
• Forms are usually designed and programmed for naive users as interfaces to canned transactions.
Users who use the interface:
• Users who want to submit information online by filling in and submitting a form.
• Mostly used to create accounts on a website, enroll in an institution, and so on.
Step 3 of 7
(c)
Graphical user interfaces:
• A graphical user interface displays a schema to the user in diagrammatic form.
• The user can specify a query by manipulating the diagram.
• These interfaces use a mouse as a pointing device to pick parts of the displayed schema diagram.
Users who use the interface:
• Mostly used by users of electronic gadgets such as mobile phones and touch screens.
• Users of applications that are operated through pointing devices.
Step 4 of 7
(d)
Natural language interfaces:
• These interfaces accept a request expressed in a natural language and try to interpret it.
• A natural language interface has its own schema, which is similar to the database conceptual schema.
Users who use the interface:
• Search engines nowadays use natural language interfaces.
• Users can use these search engines, which accept words or phrases and retrieve the related information.
Step 5 of 7
(e)
Speech input and output:
• These interfaces accept speech as input and produce speech as output.
Users who use the interface:
• These interfaces are used for telephone directory inquiries or to get flight information over smart gadgets, etc.
Step 6 of 7
(f)
Interfaces for parametric users:
• Parametric users, such as bank tellers, have a small set of operations that they must perform repeatedly.
• These interfaces provide commands to perform a request with a minimum of keystrokes.
Users who use the interface:
• These interfaces can be used by bank tellers for deposit and withdrawal transactions.
Step 7 of 7
(g)
Interfaces for the DBA:
• These interfaces provide commands for creating accounts, manipulating the database, and performing other privileged operations on the database.
Users who use the interface:
• These interfaces are used specifically by database administrators.
Chapter 2, Problem 8RQ
Problem
With what other computer system software does a DBMS interact?
Step-by-step solution
Step 1 of 7
Database management system (DBMS):
A database management system (DBMS) is a set of programs that enables users to create and maintain a database.
It is a general-purpose software system that facilitates the processes of defining, constructing, manipulating, and sharing databases among various applications and users.
Step 2 of 7
List of other computer system software a database management system (DBMS) interacts
with:
The following are the list of other computer system software a database management system
(DBMS) interacts with:
• Computer-Aided Software Engineering (CASE) tools.
• Data dictionary systems.
• Application development environments.
• Information repository systems.
• Communication software.
Step 3 of 7
CASE tools:
The design phase of a database system often employs CASE tools.
Step 4 of 7
Data dictionaries:
Data dictionaries are similar to the DBMS catalog, but they contain a wider variety of information.
• Typically, a data dictionary can be accessed directly by the database administrator (DBA) whenever required.
Step 5 of 7
Application development environments:
Application development environments provide an environment for developing database applications, with facilities that aid in many aspects of database systems, including graphical user interface (GUI) development, database design, querying, update, and application program development.
• Examples of application development environments are listed below:
o JBuilder (Borland)
o PowerBuilder (Sybase)
Step 6 of 7
Information repository systems:
• The information repository is a kind of data dictionary that also stores information such as design decisions, application program descriptions, usage standards, and user information.
• Like data dictionaries, an information repository can be accessed directly by the database administrator.
Step 7 of 7
Communication software:
• The database management system also requires an interface with communication software.
• The main function of the communication software is to enable users located remotely from the database system to access the database through personal computers or workstations.
• The communication software is connected to the database system through communications hardware such as routers, local networks, phone lines, or satellite communication devices.
Chapter 2, Problem 9RQ
Problem
What is the difference between the two-tier and three-tier client/server architectures?
Step-by-step solution
Step 1 of 2
The difference between a two-tier architecture and a three-tier architecture lies in the number of layers through which data and queries pass during processing.
In a two-tier architecture there are two layers: the client layer (user interface) and the query server or transaction server. Application programs run on the client side, and when data processing is required, a connection is established with the server (DBMS), where the data is stored. Once the connection is established, transaction and query requests are sent using Open Database Connectivity (ODBC) APIs and are then processed on the server side. It may also happen that the client side takes care of user interaction and query processing while the server stores data, manages disks, and so on. The exact distribution of functionality differs, but a two-tier architecture always has two layers.
Step 2 of 2
In a three-tier architecture there are three layers; a new application or Web layer sits between the client layer and the database server layer. The idea behind the three-tier architecture is to partition roles into different layers, with each layer having a specific task. The user or client layer provides the user interface from which the user can run a query. The query is processed at the application or Web server layer. This layer also checks any business constraints that may be imposed on the type of query a user can send, and verifies the user's credentials and access permissions. This layer can also be called the business logic layer. Finally, the database server manages the storage of data in the system.
Chapter 2, Problem 10RQ
Problem
Discuss some types of database utilities and tools and their functions.
Step-by-step solution
Step 1 of 2
A few categories of database utilities and tools and their functions are:
1. Loading:
A loading utility loads existing data files, such as text files, into the database.
• Organizations often need to transfer data from one DBMS to another.
• Vendors offer conversion tools for this purpose; such tools are, in effect, loading programs.
2. Backup:
A backup utility creates a backup copy of the database.
• Typically, the entire database is copied onto tape (or other mass storage), and the backup copies can be used to recover the system state in case of catastrophic failure.
Step 2 of 2
3. Database storage reorganization:
This utility reorganizes a set of database files into a different file organization to improve the performance of the database.
4. CASE tools:
CASE tools are used to produce a design for a database application.
5. Data dictionary system:
An information repository plays the main role in a data dictionary system.
• The repository stores design decisions, user information, and application program descriptions.
• This information can be accessed by users when required.
• An information repository contains more information than the DBMS catalog.
6. Performance monitoring:
This utility monitors database usage and maintains statistics.
• The statistics are used by the DBA in making decisions about file restructuring and indexing to improve the performance of the database.
Several other utilities are also available, for example:
• Sorting files in the database.
• Handling data compression.
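The backup utility described above can be sketched with SQLite's built-in backup API, used here purely as an illustrative stand-in for a full DBMS backup utility; in practice the target would be a file or archival storage rather than a second in-memory database.

```python
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (x INTEGER)")
src.execute("INSERT INTO t VALUES (42)")

# Backup utility: copy the entire database to another connection
dst = sqlite3.connect(":memory:")
src.backup(dst)

# After a catastrophic loss of src, the backup restores the system state
restored = dst.execute("SELECT x FROM t").fetchall()
print(restored)  # [(42,)]
```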
Chapter 2, Problem 11RQ
Problem
What is the additional functionality incorporated in n-tier architecture (n > 3)?
Step-by-step solution
Step 1 of 1
It is customary to divide the layers between the user and the stored data in the three-tier architecture into finer components, giving rise to an n-tier architecture, where n may be 4 or 5.
Typically, the business logic layer is divided into multiple layers.
1. An n-tier architecture distributes data and programming over the network.
2. Each tier can run on the appropriate processor or operating-system platform and can be handled independently.
Another layer typically used by vendors of ERP and CRM packages is the middleware layer, which allows the front-end modules to communicate with a number of back-end databases.
Chapter 2, Problem 13E
Problem
Choose a database application with which you are familiar. Design a schema and show a sample database for that application, using the notation of Figures 1.2 and 2.1. What types of additional information and constraints would you like to represent in the schema? Think of several users of your database, and design a view for each.
Step-by-step solution
Step 1 of 2
Consider a flight reservation system.
• Each FLIGHT is identified by a flight Number, consists of one or more FLIGHT_LEGs with Leg_no, and flies on certain weekdays.
• Each FLIGHT_LEG has scheduled arrival and departure times and airports, and one or more LEG_INSTANCEs, one for each Date on which the flight travels.
• A FARE is kept for each flight, and there is a set of restrictions on each FARE.
• For each LEG_INSTANCE, SEAT_RESERVATIONs are kept, as are the AIRPLANE used on the leg and the actual arrival and departure times and airports.
• An AIRPLANE is identified by an airplane id and is of a particular AIRPLANE_TYPE; it has a fixed number of seats.
• CAN_LAND relates AIRPLANE_TYPEs to the AIRPORTs at which they can land.
• An AIRPORT is identified by an airport code.
Step 2 of 2
The following constraints hold on the schema:
a. The requested flight number or flight leg must be available on the given date; this can be checked from the LEG_INSTANCE table.
b. A non-reserved seat must exist for the specified date and flight; the total number of seats available can be obtained from AIRPLANE.
c. A flight leg must correspond to an existing flight number.
d. Arrival and departure airport codes must be codes of existing airports.
e. LEG_INSTANCE can have entries only for a valid Flight_number and Leg_number combination.
f. Flight_number in any relation must be a valid flight that has an entry in the FLIGHT table.
g. Airplane_type_name in CAN_LAND must be a valid name from AIRPLANE_TYPE.
Chapter 2, Problem 14E
Problem
If you were designing a Web-based system to make airline reservations and sell airline tickets,
which DBMS architecture would you choose from Section 2.5? Why? Why would the other
architectures not be a good choice?
Step-by-step solution
Step 1 of 4
There are four architectures discussed in section 2.5 in the textbook. They are
1. Centralized DBMS architecture
2. Basic Client/Server Architecture
3. Two-Tier Client/Server Architecture
4. Three-Tier Client/Server Architecture
Step 2 of 4
For designing a Web-based system to make airline reservations and sell airline tickets, the three-tier client/server architecture is the best choice.
• A web user interface is necessary as different types of users such as naive users or casual
users will interact with the system.
• Web user interface is placed in the client system.
• User can interact with user interface and submit the transactions.
• The Web server can handle these transactions, validate the data, and manipulate the database accordingly.
• The Web server/application server handles the application logic of the system.
• The database server contains the DBMS.
Step 3 of 4
In centralized DBMS architecture, DBMS functionality and user interface are performed on the
same system. But for a Web-based system, they must be on different systems.
Hence centralized DBMS architecture is not appropriate for web-based system.
Step 4 of 4
In the three-tier client/server architecture, the business logic is placed in the application server or Web server.
The basic client/server architecture or the two-tier client/server architecture could be considered for a Web-based system only if the business logic were placed in the database server or the client; but placing the business logic there would be a burden on those components.
Hence, the basic client/server architecture and the two-tier client/server architecture are not appropriate for a Web-based system.
Chapter 2, Problem 15E
Problem
Consider Figure 2.1. In addition to constraints relating the values of columns in one table to columns in another table, there are also constraints that impose restrictions on values in a column or a combination of columns within a table. One such constraint dictates that a column or a group of columns must be unique across all rows in the table. For example, in the STUDENT table, the Student_number column must be unique (to prevent two different students from having the same Student_number). Identify the column or the group of columns in the other tables that must be unique across all rows in the table.
Step-by-step solution
Step 1 of 2
The database tables are constructed from the schema diagram of the database. In each table, certain columns, or combinations of columns, must be unique.
Step 2 of 2
The columns or groups of columns that must be unique in each table are:
1. STUDENT: Student_number.
2. COURSE: Course_number. If each course has a distinct name, Course_name can also be a unique column.
3. PREREQUISITE: Course_number can be a unique identifier only if a course has a single prerequisite; otherwise, Course_number and Prerequisite_number together form the unique combination.
4. SECTION: Section_identifier.
• This assumes that no two sections can have the same Section_identifier.
• Note that Section_identifier may be unique only within a given course offering in a given term.
5. GRADE_REPORT: Section_identifier and Student_number together.
• The Section_identifier will be different if a student takes the same course or a different course in another term.
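The GRADE_REPORT case, where two columns are unique only in combination, can be expressed with a composite UNIQUE constraint; SQLite is used here purely for illustration, and the sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Student_number and Section_identifier must be unique together,
# not individually
conn.execute("""CREATE TABLE grade_report (
    student_number INTEGER,
    section_identifier INTEGER,
    grade TEXT,
    UNIQUE (student_number, section_identifier))""")
conn.execute("INSERT INTO grade_report VALUES (17, 112, 'B')")
conn.execute("INSERT INTO grade_report VALUES (17, 119, 'A')")  # ok: new section
try:
    conn.execute("INSERT INTO grade_report VALUES (17, 112, 'A')")  # duplicate pair
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```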
Chapter 3, Problem 1RQ
Problem
Discuss the role of a high-level data model in the database design process.
Step-by-step solution
Step 1 of 2
A high-level data model provides concepts for presenting data in a way that is close to how users perceive the data. It helps express the data requirements of the users as a detailed description of the entity types, relationships, and constraints.
Step 2 of 2
The role of a high-level data model in the database design process is as follows:
• The design produced with a high-level data model is easy to understand and is useful in communicating with non-technical users.
• The model acts as a reference to ensure that all user requirements are met and do not conflict with one another.
• A high-level data model lets database designers concentrate on specifying the properties of the data, without being concerned with storage details, during the database design process.
• This data model supports conceptual design.
Chapter 3, Problem 2RQ
Problem
List the various cases where use of a NULL value would be appropriate.
Step-by-step solution
Step 1 of 2
Use of NULL values is appropriate in two situations:
1. When the value of an attribute is irrelevant (does not apply) for an entity.
For example, in a schema that stores information about persons, suppose there is an attribute Company that stores the name of the company where a person works. For a student who is not working, this attribute value is irrelevant, so a NULL value can be used in its place.
Step 2 of 2
2. When the value of a particular attribute is not known, either because it is not known whether a value exists, or because a value exists but is unknown.
For example, with the same Company attribute, a person may not be working, or the person may be working but the name of the company is unknown; in either case a NULL value can be used in its place.
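Both situations can be sketched in SQL, using SQLite for illustration; note that a NULL by itself does not record which of the two situations applies, and that ordinary comparisons skip NULLs, so IS NULL is needed to find them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, company TEXT)")
# NULL because the attribute does not apply (a student who is not working)
conn.execute("INSERT INTO person VALUES ('Sam', NULL)")
# NULL because a value exists but is unknown
conn.execute("INSERT INTO person VALUES ('Lee', NULL)")
conn.execute("INSERT INTO person VALUES ('Ada', 'Acme')")
# company = NULL would match nothing; IS NULL finds the missing values
rows = conn.execute(
    "SELECT name FROM person WHERE company IS NULL ORDER BY name").fetchall()
print(rows)  # [('Lee',), ('Sam',)]
```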
Chapter 3, Problem 3RQ
Problem
Define the following terms: entity, attribute, attribute value, relationship instance, composite
attribute, multivalued attribute, derived attribute, complex attribute, key attribute, and value set
(domain).
Step-by-step solution
Step 1 of 5
1. Entity: An entity is an object (thing) with an independent physical (car, home, person) or conceptual (company, university course) existence in the real world.
2. Attribute: Each real-world entity has certain properties that represent its significance in the real world or describe it. These properties of an entity are known as attributes.
For example, consider a car: the things that describe a car include model, manufacturer, color, cost, and so on.
All of these are relevant in a miniworld and are important in describing a car; they are attributes of a CAR.
Step 2 of 5
3. Attribute value: Associated with each real-world entity are certain attributes that describe that entity. The value of an attribute for a particular entity is called an attribute value.
For example, the value of the Color attribute of a car entity can be Red.
4. Relationship instance: Each relationship instance rj in a relationship type R is an association of entities, where the association includes exactly one entity from each participating entity type. Each relationship instance rj represents the fact that the entities participating in rj are related in some way in the corresponding miniworld situation.
For example, in the relationship type WORKS_FOR between the entity types EMPLOYEE and DEPARTMENT, which associates each employee with the department for which the employee works, each relationship instance in WORKS_FOR associates one EMPLOYEE with one DEPARTMENT.
Step 3 of 5
5. Composite attribute: An attribute that can be divided into smaller subparts, which represent more basic attributes with independent meanings, is called a composite attribute.
For example, consider the phone number of an employee of a company. One can have the phone number as a single attribute or as two attributes, viz. area code and number. Since the phone number can be broken into two independent attributes, it is a composite attribute.
Whether to keep a composite attribute whole or divide it into basic attributes depends on how the attribute is used in the miniworld.
6. Multivalued attribute: For a real-world entity, an attribute may have more than one value. For example, consider the phone number attribute of a person: a person may have one, two, or three phones, so there is a possibility of more than one value for this attribute. Any attribute that can have more than one value is a multivalued attribute.
Step 4 of 5
7. Derived attribute: For a real-world entity, some attributes have values that are independent of other attributes and cannot be derived from them; such attributes are called stored attributes. Other attributes have values that can be derived from the values of other attributes; such attributes are known as derived attributes.
For example, the date of birth of a person is a stored attribute; using the DOB attribute and the current date, the age of the person can be calculated, so Age is a derived attribute.
8. Complex attribute: Composite and multivalued attributes can be nested arbitrarily. Arbitrary nesting can be represented by grouping the components of a composite attribute between parentheses () and separating the components with commas, and by displaying multivalued attributes between braces {}. Such attributes are called complex attributes.
For example, if a person has more than one residence and each residence has multiple phones, an Address_phone attribute can be specified as:
{Address_phone({Phone(Area_code, Phone_number)}, Address(Street_address(Number, Street, Apartment_number), City, State, Zip))}
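The derived-attribute idea in item 7 can be sketched in Python: Age is computed from the stored attribute DOB and the current date rather than being stored itself. The function name and the sample dates are illustrative.

```python
from datetime import date

def age(dob: date, today: date) -> int:
    """Derived attribute: Age is computed from the stored attribute DOB."""
    years = today.year - dob.year
    # One year less if this year's birthday has not occurred yet
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years

print(age(date(1990, 6, 15), date(2024, 3, 1)))   # 33
print(age(date(1990, 6, 15), date(2024, 6, 15)))  # 34
```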
Step 5 of 5
9. Key attribute: Each real-world entity is unique in itself. Certain attributes have values that are distinct for every entity of a given type; these attributes are called key attributes. They are used to specify the uniqueness constraint on an entity type.
For example, consider the entity type Car: the registration number and car number attributes have different values for every car, so each is a key of the Car entity type.
It is also possible for a set of attributes together to form a key.
10. Value set (domain): For each attribute of a real-world entity, there is a range of values from which the attribute can take its value. For example, if the Age attribute of an employee must have a value in the range 18 to 70, then the integers in the range 18 to 70 form the domain of the attribute Age. In most programming languages, basic data types such as integer, string, float, and date are used to specify the domain of a particular attribute.
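The value-set idea in item 10 can be sketched with a CHECK constraint, using SQLite for illustration; the 18-70 range follows the example above, and the sample names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Value set (domain) of Age enforced with a CHECK constraint
conn.execute("""CREATE TABLE employee (
    name TEXT,
    age INTEGER CHECK (age BETWEEN 18 AND 70))""")
conn.execute("INSERT INTO employee VALUES ('Ada', 30)")      # inside the domain
try:
    conn.execute("INSERT INTO employee VALUES ('Kim', 12)")  # outside the domain
    in_domain = True
except sqlite3.IntegrityError:
    in_domain = False
print(in_domain)  # False
```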
Chapter 3, Problem 4RQ
Problem
What is an entity type? What is an entity set? Explain the differences among an entity, an entity
type, and an entity set.
Step-by-step solution
Step 1 of 4
Entity type: An entity type defines a collection (or set) of entities that have the same attributes. A database usually contains groups of entities that are similar; these entities have the same attributes but different attribute values. A collection of such entities is an entity type.
For example, a car dealer might like to store details of all the cars in his showroom in a car database. The collection of all car entities forms an entity type.
Each entity type in a database is described by its name and its attributes.
Step 2 of 4
For example, CAR can be the name of the entity type, and Reg_num, Car_num, Manufacturer, Model, Cost, and Color can be its attributes.
Entity set: At one particular time the dealer might have a set of eight cars, and at some other time he might have a different set of four cars.
The collection of all entities of a particular entity type in the database at any point in time is called an entity set. It is usually referred to by the same name as the entity type.
Step 3 of 4
For example, if we have four entities (four cars), the entity set will include:
Name: CAR
Entities:
e1 = (reg_1, DL_1, ford, 1870, 2000000, white)
e2 = (reg_2, DL_2, ford, 1830, 1000000, white)
e3 = (reg_3, DL_3, ford, 1877, 2100000, red)
e4 = (reg_4, DL_4, ford, 1970, 2500000, white)
Step 4 of 4
An entity is a real-world object or thing that has an independent physical or conceptual existence. Often there are many entities of a similar type about which information needs to be stored in the database. The name of this collection and the attributes of its entities jointly form an entity type; in other words, an entity type is a collection of entities that have similar attributes. At two instants of time, the entities in the miniworld about which information is stored in the database can be different. The collection of entities of an entity type at a particular instant of time is called an entity set.
Chapter 3, Problem 5RQ
Problem
Explain the difference between an attribute and a value set.
Step-by-step solution
Step 1 of 2
Attribute:
Every entity has certain properties that represent its significance in the real world. These properties of entities are known as attributes.
Example:
Consider a bus; the attributes that describe a bus can be model, color, manufacture date, year, country, etc.
Value set:
For each attribute of an entity, there is a range of values from which the attribute can take a value.
Example:
The Age attribute of an employee must have a value. If Age is restricted to the range 16 to 60, then the integers in that range are known as the value set of the attribute Age.
Step 2 of 2
The differences between an attribute and a value set:
Attribute:
• A table groups data into rows and columns; the columns are known as the attributes of that table.
• An attribute represents a certain property of an entity.
Value set:
• A value set is the group of values that may be assigned to an attribute for each entity.
• A value set is the range of values from which an attribute can take a value.
Chapter 3, Problem 6RQ
Problem
What is a relationship type? Explain the differences among a relationship instance, a relationship
type, and a relationship set.
Step-by-step solution
Step 1 of 3
Relationship type:
A relationship type defines an association among entities of the participating entity types and determines the possible relationship instances among them.
Step 2 of 3
Consider a diagram with entity types STUDENT and COURSE related through ENROLL.
Explanation:
STUDENT and COURSE are entity types, and ENROLL is the relationship.
S1, S2, S3, … are the instances of the entity type STUDENT.
C1, C2, C3, … are the instances of the entity type COURSE.
r1, r2, r3, … are the relationship instances between the entities.
A relationship type is the association between the entity types. In the diagram, ENROLL is the relationship type.
A relationship instance relates exactly one entity from each participating entity type. S1 is related to C1 through r1; likewise S2 and C2 form one instance, S1 and C3 another, and so on.
A relationship set is the set of all instances of a relationship type: {(S1, C1), (S2, C2), (S1, C3), …} forms the relationship set.
Step 3 of 3
Differences between relationship instance, relationship type, and relationship set:
Relationship instance: It relates exactly one entity from each participating entity type.
Relationship type: It is the association between the entity types.
Relationship set: It is the collection of all instances of a relationship type.
Chapter 3, Problem 7RQ
Problem
What is a participation role? When is it necessary to use role names in the description of
relationship types?
Step-by-step solution
Step 1 of 3
A participation role is the role that each participating entity type plays in a relationship.
• Role names are important in the description of a relationship type when the same entity type participates more than once in the relationship type, in different roles.
• Role names are therefore necessary in recursive relationships.
Example:
An employee is related to the department in which he works in a company.
So a relationship may exist between various entities (of the same or different entity types).
Each entity type that participates in a relationship type plays a particular role in the relationship.
Step 2 of 3
The participation role (or role name) signifies the role that a participating entity from the entity type plays in each relationship instance and helps to explain what the relationship means.
Example:
In the WORKS_FOR relationship type, EMPLOYEE plays the role of worker and DEPARTMENT plays the role of department or employer. For instance, employees E1 and E3 work for department D1, and E2 works for D2.
Step 3 of 3
Using role names is not necessary in the description of relationship types where all participating entity types are distinct, as in the example above, because in such cases the name of each entity type generally specifies the role it plays.
But when one entity type participates in a relationship type in more than one role, as in recursive relationships, it becomes necessary to use role names in the description of the relationship type.
Example:
Consider the entity type EMPLOYEE. One employee can supervise another employee. In this case the roles cannot be described using the entity type name alone, because this is a relationship of an entity type with itself, so using role names becomes important. The SUPERVISION relationship type relates employees and supervisors.
If E1 supervises E2, each relationship instance ri in SUPERVISION associates two employees, ei and ej, one playing the role of supervisor and the other playing the role of supervisee.
Chapter 3, Problem 8RQ
Problem
Describe the two alternatives for specifying structural constraints on relationship types. What are
the advantages and disadvantages of each?
Step-by-step solution
Step 1 of 3
The two alternatives for specifying structural constraints on relationship types are as follows:
• Cardinality ratio
• Participation constraint
Step 2 of 3
Cardinality Ratio:
• The cardinality ratio specifies the maximum number of relationship instances in which an entity
can participate.
• For a binary relationship type, the possible cardinality ratios are 1:1, 1:N, N:1, and M:N.
• The cardinality ratio is represented on an ER diagram by placing 1, M, or N on the two sides of
the relationship diamond.
Participation constraint:
• The participation constraint specifies the minimum number of relationship instances in which
each entity can participate.
• Because it specifies the minimum participation of the entity, it is also called the minimum
cardinality constraint.
• There are two types of participation constraints: total and partial.
• A participation constraint is represented in an ER diagram by the line joining the participating
entity type and the relationship type. Total participation is represented by a double line, whereas
partial participation is represented by a single line.
Step 3 of 3
Advantages and disadvantages:
• The cardinality ratio and participation constraint specify the participation of the entity in the
relationship instances.
• They are helpful in describing the binary relationship types.
• However, some constraints on entities and relationships are difficult or impossible to express
using only these two modeling constructs.
Chapter 3, Problem 9RQ
Problem
Under what conditions can an attribute of a binary relationship type be migrated to become an
attribute of one of the participating entity types?
Step-by-step solution
Step 1 of 2
• The attributes of a relationship type with cardinality ratio 1:1 or 1:N can be migrated to become
attributes of one of the participating entity types.
• In case of 1:1 cardinality, the attribute can be moved to either of entity types in the binary
relationship.
• In case of 1: N cardinality, the attribute can be migrated only to N side of the relationship.
Step 2 of 2
Example
• Consider a binary relationship type WORKS_FOR between EMPLOYEE and DEPARTMENT.
• The relationship between DEPARTMENT and EMPLOYEE has cardinality ratio 1:N.
• Each employee works in one department, but there can be several employees in a single
department.
• In this scenario, the attribute Start_date of the relationship type WORKS_FOR can be migrated
to the EMPLOYEE entity type; it records the date on which the employee started working for that
department.
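This migration can be sketched in relational terms with Python's sqlite3 (names are illustrative): because each employee relates to exactly one department, the relationship and its Start_date attribute can be folded into the EMPLOYEE side as two columns rather than a separate table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE department (dnumber INTEGER PRIMARY KEY, dname TEXT)")

# 1:N WORKS_FOR: the relationship and its migrated Start_date attribute
# become two columns on the N-side entity (EMPLOYEE).
conn.execute("""
    CREATE TABLE employee (
        ssn        TEXT PRIMARY KEY,
        name       TEXT NOT NULL,
        dnumber    INTEGER NOT NULL REFERENCES department(dnumber),
        start_date TEXT NOT NULL  -- migrated from WORKS_FOR
    )
""")
conn.execute("INSERT INTO department VALUES (5, 'Research')")
conn.execute("INSERT INTO employee VALUES ('E1', 'Alice', 5, '2020-01-15')")
conn.execute("INSERT INTO employee VALUES ('E2', 'Bob',   5, '2021-06-01')")

# Several employees per department, one department (and start date) per employee.
count = conn.execute("SELECT COUNT(*) FROM employee WHERE dnumber = 5").fetchone()[0]
print(count)  # 2
```

With an M:N relationship this folding would not work, since each employee could have a different start date per department; the attribute would have to stay on the relationship.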
Chapter 3, Problem 10RQ
Problem
When we think of relationships as attributes, what are the value sets of these attributes? What
class of data models is based on this concept?
Step-by-step solution
Step 1 of 3
Solution:
Relationships as attributes:
• Whenever an attribute of one entity type refers to another entity type, a relationship exists.
• Relationship types can have attributes, just as entity types can.
• For relationship types with cardinality ratio 1:1 or 1:N, the relationship attributes can be
migrated to become attributes of one of the participating entity types.
For example:
Consider the following scenario:
There is a relationship WORKS_FOR between EMPLOYEE and DEPARTMENT.
• The relationship between DEPARTMENT and EMPLOYEE has cardinality ratio 1:N.
• Here, each employee is in one department, and several employees can be in a single department.
The Start_date attribute of the WORKS_FOR relationship type can be migrated to the
EMPLOYEE entity type.
It records the date on which the employee started working for that department.
Date is the domain, or value set, of the Start_date attribute of EMPLOYEE; it does not change
or depend on which other attributes are present.
Step 2 of 3
The value sets of attributes:
The set of values that an attribute can take is called its domain or value set. When a relationship
is viewed as an attribute of an entity type, the value set of that attribute is the set of entities of
the other participating entity type.
In the conceptual design phase of the data model, all entity types, relationships, and constraints
are specified as follows:
• The DEPARTMENT entity type contains the attributes name, locations, number, manager and
managerstartdate.
• Here, location is a multivalued attribute. Name and number are both key attributes.
• The PROJECT entity type contains the attributes name, number, location and
controllingdepartment.
• Name and number are both key attributes.
• The EMPLOYEE entity type contains the attributes name, sex, ssn, salary, address,
department, birthdate and supervisor.
• Name and address are both composite attributes.
• The DEPENDENT entity type contains the attributes employee, dependantname, sex,
relationship, and birthdate.
Step 3 of 3
The functional data model is based on this concept.
Chapter 3, Problem 11RQ
Problem
What is meant by a recursive relationship type? Give some examples of recursive relationship
types.
Step-by-step solution
Step 1 of 2
Recursive relationship:
A relationship between two entities of the same entity type is called a recursive relationship.
• In other words, a relationship type in which the same entity type participates more than once,
in different roles, is a recursive relationship.
Step 2 of 2
Example of recursive relationship:
The following is the example of recursive relationship,
Consider an entity type PERSON with an attribute MOTHER, whose value is itself a person.
Here a recursive relationship exists, because one row in the PERSON table refers to another
row in the same PERSON table.
Chapter 3, Problem 12RQ
Problem
When is the concept of a weak entity used in data modeling? Define the terms owner entity type,
weak entity type, identifying relationship type, and partial key.
Step-by-step solution
Step 1 of 5
The concept of a weak entity is used in the conceptual design phase of data modeling, for entity
types that do not have key attributes of their own.
Example
Consider the entity types DEPENDENT and EMPLOYEE.
• A DEPENDENT can exist only in relation to an EMPLOYEE of the company.
• The DEPENDENT attributes can be the same for the relatives of two different employees, so
there is no unique way of distinguishing between two such records on their own. Entity types
like this are called weak entity types.
Step 2 of 5
Owner entity type
The entity type through which a weak entity type is identified is called the owner entity type.
Entities belonging to a weak entity type are identified by being associated with specific entities
of the owner entity type, in combination with one of their own attribute values.
Step 3 of 5
Weak Entity Type
Entity types that do not have key attributes of their own are called weak entity types.
Step 4 of 5
Identifying Relationship Type
A relationship type that relates a weak entity to its owner entity type is called identifying
relationship type.
Step 5 of 5
Partial key
A partial key is a set of attributes in weak entity types that can uniquely identify weak entities that
are related to the same owner entity.
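In relational terms, a weak entity type maps to a table whose primary key combines the owner's key with the partial key. A minimal sketch of the DEPENDENT example using Python's sqlite3 (column names and sample values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, name TEXT)")

# Weak entity DEPENDENT: its key = owner key (essn) + partial key (dependent_name).
conn.execute("""
    CREATE TABLE dependent (
        essn           TEXT NOT NULL REFERENCES employee(ssn),
        dependent_name TEXT NOT NULL,  -- partial key
        birth_date     TEXT,
        PRIMARY KEY (essn, dependent_name)
    )
""")
conn.execute("INSERT INTO employee VALUES ('E1', 'Alice')")
conn.execute("INSERT INTO employee VALUES ('E2', 'Bob')")

# Two dependents of different employees may share the same name.
conn.execute("INSERT INTO dependent VALUES ('E1', 'Sam', '2010-01-01')")
conn.execute("INSERT INTO dependent VALUES ('E2', 'Sam', '2012-05-05')")

# But the partial key must be unique among dependents of the same owner.
try:
    conn.execute("INSERT INTO dependent VALUES ('E1', 'Sam', '2015-09-09')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
print(duplicate_allowed)  # False
```

The composite primary key is exactly the partial-key definition above: dependent_name identifies a dependent only relative to one owner employee.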
Chapter 3, Problem 13RQ
Problem
Can an identifying relationship of a weak entity type be of a degree greater than two? Give
examples to illustrate your answer.
Step-by-step solution
Step 1 of 4
Identifying relationship: The relationship between a strong entity and a weak entity is known as
an identifying relationship.
Step 2 of 4
The degree of an identifying relationship of a weak entity can be two or greater than two.
Step 3 of 4
Consider the following ER diagram:
Here,
• Student and Company are the two strong entities and Interview is the weak entity.
• The selection_process is an identifying relationship.
• The degree of the identifying relationship (selection_process) is 3.
• In the above ER diagram, a student applies for a job in a company, and an interview is the
selection process through which the student gets the job in the company.
Step 4 of 4
Therefore, from the above ER diagram, it can be concluded that the degree of an identifying
relationship of a weak entity can be greater than 2.
Chapter 3, Problem 14RQ
Problem
Discuss the conventions for displaying an ER schema as an ER diagram.
Step-by-step solution
Step 1 of 1
The conventions for displaying an ER schema as an ER diagram are as follows:
• Entity types are shown as rectangles; weak entity types are shown as double rectangles.
• Relationship types are shown as diamonds; identifying relationship types are shown as double
diamonds.
• Attributes are shown as ovals attached to their entity type or relationship type. Key attributes
are underlined, multivalued attributes are shown in double ovals, and the components of a
composite attribute are attached to its oval.
• Entity types are connected to relationship types by lines; total participation is shown by a
double line, and partial participation by a single line.
• Cardinality ratios are shown by labeling these lines with 1, M, or N.
Chapter 3, Problem 15RQ
Problem
Discuss the naming conventions used for ER schema diagrams.
Step-by-step solution
Step 1 of 1
The naming conventions used for ER schema diagrams are as follows:
• Entity type names should be singular.
• The names of entity types and relationship types should be written in uppercase letters.
• Attribute names have their initial letter capitalized.
• Role names are written in lowercase letters.
Chapter 3, Problem 16E
Problem
Which combinations of attributes have to be unique for each individual SECTION entity in the
UNIVERSITY database shown in Figure 3.20 to enforce each of the following miniworld
constraints:
a. During a particular semester and year, only one section can use a particular classroom at a
particular DaysTime value.
b. During a particular semester and year, an instructor can teach only one section at a particular
DaysTime value.
c. During a particular semester and year, the section numbers for sections offered for the same
course must all be different.
Can you think of any other similar constraints?
Step-by-step solution
Step 1 of 4
a.
Consider the following miniworld constraint:
During a particular semester and year, only one section can use a particular classroom at a
particular DaysTime value.
The attribute combination that must be unique to enforce this constraint is:
Sem, Year, CRoom, DaysTime
Step 2 of 4
b.
Consider the following miniworld constraint:
During a particular semester and year, an instructor can teach only one section at a particular
DaysTime value.
The attribute combination that must be unique to enforce this constraint is:
Sem, Year, DaysTime, Id (of the INSTRUCTOR teaching the SECTION)
Step 3 of 4
c.
Consider the following miniworld constraint:
The section numbers corresponding to the sections offered for the same course must all be
different during a particular semester and year.
The attribute combination that must be unique to enforce this constraint is:
Sem, Year, SecNo, CCode (of the COURSE related to the SECTION)
Step 4 of 4
Some of the other similar constraints related to SECTION entity are as follows:
• In a particular semester and year, a student can take only one section at a particular DaysTime
value.
• In a particular semester and year, an instructor of a particular rank cannot teach two sections at
the same DaysTime value.
• During a particular semester and year, only one section of a particular course can be offered
in a particular classroom.
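One way to enforce such miniworld constraints in a relational rendering is with composite UNIQUE constraints, each listing the minimal attribute combination that must be unique. A sketch using Python's sqlite3 with simplified column names (an assumption, not the exact schema of Figure 3.20):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE section (
        sec_id    INTEGER PRIMARY KEY,
        sem       TEXT, year INTEGER, sec_no INTEGER,
        c_code    TEXT, instructor_id TEXT,
        c_room    TEXT, days_time TEXT,
        UNIQUE (sem, year, c_room, days_time),        -- (a) one section per room and time
        UNIQUE (sem, year, instructor_id, days_time), -- (b) one section per instructor and time
        UNIQUE (sem, year, c_code, sec_no)            -- (c) section numbers unique per course
    )
""")
conn.execute("""INSERT INTO section
    VALUES (1, 'Fall', 2024, 1, 'CS101', 'I1', 'R10', 'MWF 9')""")

# Violates (a): same classroom and time slot in the same semester and year.
try:
    conn.execute("""INSERT INTO section
        VALUES (2, 'Fall', 2024, 1, 'CS102', 'I2', 'R10', 'MWF 9')""")
    clash_allowed = True
except sqlite3.IntegrityError:
    clash_allowed = False
print(clash_allowed)  # False
```

Each UNIQUE constraint corresponds to one of the three attribute combinations identified above.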
Chapter 3, Problem 17E
Problem
Composite and multivalued attributes can be nested to any number of levels. Suppose we want
to design an attribute for a STUDENT entity type to keep track of previous college education.
Such an attribute will have one entry for each college previously attended, and each such entry
will be composed of college name, start and end dates, degree entries (degrees awarded at that
college, if any), and transcript entries (courses completed at that college, if any). Each degree
entry contains the degree name and the month and year the degree was awarded, and each
transcript entry contains a course name, semester, year, and grade. Design an attribute to hold
this information. Use the conventions in Figure 3.5.
Step-by-step solution
Step 1 of 3
Complex attributes are the attributes that are formed by nesting multivalued attributes and
composite attributes.
• The curly braces {} are used to group the components of multivalued attributes.
• Parentheses ( ) are used to group the components of composite attributes.
Step 2 of 3
A multivalued attribute PreviousCollege is used to hold the colleges previously attended by the
student.
• The components of PreviousCollege are CollegeName, StartDate, EndDate.
A multivalued attribute Degree is used to hold the details of degrees awarded to the student.
• The components of Degree are DegreeName, Month, Year.
A multivalued attribute Transcript is used to hold the details of transcript of the student.
• The components of Transcript are CourseName, Semester, Year, Grade.
Step 3 of 3
An attribute that holds the details of PreviousCollege, Degree and Transcript of the STUDENT
entity is as follows:
{PreviousCollege (CollegeName, StartDate, EndDate,
{Degree (DegreeName, Month, Year)},
{Transcript (CourseName, Semester, Year, Grade)})}
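The nested attribute above maps naturally onto nested Python data structures, with lists for the multivalued parts ({ }) and dicts for the composite parts (( )). A small illustration with made-up sample values:

```python
# One entry per college previously attended; Degree and Transcript are
# themselves multivalued components nested inside each entry.
previous_college = [
    {
        "CollegeName": "State College",  # sample values, purely illustrative
        "StartDate": "2015-08-20",
        "EndDate": "2019-05-15",
        "Degree": [
            {"DegreeName": "BS", "Month": 5, "Year": 2019},
        ],
        "Transcript": [
            {"CourseName": "Databases", "Semester": "Fall", "Year": 2018, "Grade": "A"},
            {"CourseName": "Algorithms", "Semester": "Spring", "Year": 2019, "Grade": "B"},
        ],
    },
]

# Navigating the nesting mirrors the attribute structure.
first = previous_college[0]
print(first["Degree"][0]["DegreeName"])  # BS
print(len(first["Transcript"]))          # 2
```

Here each level of list nesting corresponds to one pair of curly braces in the attribute specification.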
Chapter 3, Problem 18E
Problem
Show an alternative design for the attribute described in Exercise that uses only entity types
(including weak entity types, if needed) and relationship types.
Exercise
Composite and multivalued attributes can be nested to any number of levels. Suppose we want
to design an attribute for a STUDENT entity type to keep track of previous college education.
Such an attribute will have one entry for each college previously attended, and each such entry
will be composed of college name, start and end dates, degree entries (degrees awarded at that
college, if any), and transcript entries (courses completed at that college, if any). Each degree
entry contains the degree name and the month and year the degree was awarded, and each
transcript entry contains a course name, semester, year, and grade. Design an attribute to hold
this information. Use the conventions in Figure 3.5.
Step-by-step solution
Step 1 of 3
The alternative design for the entity STUDENT with attribute to keep track of previous college
education as discussed in the previous problem is as shown below:
Step 2 of 3
The strong entities are as given below:
• STUDENT
• COLLEGE
• DEGREE
The weak entities are as given below:
• TRANSCRIPT
• ATTENDANCE
Step 3 of 3
Relationships between the entities are as given below:
• There exists a binary 1:N relationship PREVIOUS_ATTENDED_COLLEGE between STUDENT
and ATTENDANCE.
• There exists a binary 1:N relationship ATTENDED between COLLEGE and ATTENDANCE.
• There exists a binary M:N relationship DEGREE_AWARDED between ATTENDANCE and
DEGREE.
• There exists a binary 1:N relationship MAINTAIN_ATTENDANCE between ATTENDANCE and
TRANSCRIPT.
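The design above can be sketched as relational tables; ATTENDANCE and TRANSCRIPT are weak, so their keys combine the owners' keys with a partial key. A sketch using Python's sqlite3 (the column names, and the use of start_date in the partial key, are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE college (college_name TEXT PRIMARY KEY);
    CREATE TABLE degree  (degree_name TEXT PRIMARY KEY);

    -- Weak entity: one row per attendance of a student at a college.
    CREATE TABLE attendance (
        student_id   TEXT REFERENCES student(student_id),
        college_name TEXT REFERENCES college(college_name),
        start_date   TEXT,
        end_date     TEXT,
        PRIMARY KEY (student_id, college_name, start_date)
    );

    -- M:N DEGREE_AWARDED between ATTENDANCE and DEGREE.
    CREATE TABLE degree_awarded (
        student_id TEXT, college_name TEXT, start_date TEXT,
        degree_name TEXT REFERENCES degree(degree_name),
        month INTEGER, year INTEGER,
        PRIMARY KEY (student_id, college_name, start_date, degree_name)
    );

    -- Weak entity: one transcript row per course within an attendance.
    CREATE TABLE transcript (
        student_id TEXT, college_name TEXT, start_date TEXT,
        course_name TEXT, semester TEXT, year INTEGER, grade TEXT,
        PRIMARY KEY (student_id, college_name, start_date, course_name)
    );
""")
tables = {r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")}
print(sorted(tables))
```

The nested multivalued attribute of the previous exercise has been flattened into one table per level of nesting.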
Chapter 3, Problem 19E
Problem
Consider the ER diagram in Figure, which shows a simplified schema for an airline reservations
system. Extract from the ER diagram the requirements and constraints that produced this
schema. Try to be as precise as possible in your requirements and constraints specification.
Figure An ER diagram for an AIRLINE database schema
Step-by-step solution
Step 1 of 2
Refer to the ER diagram of the AIRLINE database schema given in Figure 3.21.
The requirements and constraints that produced the schema are as follows:
AIRPORT
• The database represents information about each AIRPORT.
• Each AIRPORT has a unique Airport_code, plus the airport Name and the City and State where
it is located.
• Each AIRPORT is identified by its airport code.
FLIGHT
• Each FLIGHT is identified by a unique flight number.
• The database also records the airline operating the FLIGHT and the days on which it is
scheduled.
FLIGHT_LEG
• Each FLIGHT consists of one or more FLIGHT_LEGs, each with a Leg_no.
• A FARE is kept for each flight, and a certain set of restrictions applies to each FARE.
• Each FLIGHT_LEG records its scheduled departure time and departure airport and its
scheduled arrival time and arrival airport.
Step 2 of 2
LEG_INSTANCE
• Each FLIGHT_LEG has one or more LEG_INSTANCEs, in addition to its scheduled departure
and arrival times and airports.
• A LEG_INSTANCE is an instance of a FLIGHT_LEG on a particular date on which the flight
travels.
• The AIRPLANE used and the number of available seats are kept in the LEG_INSTANCE.
RESERVATION
• For each LEG_INSTANCE, the RESERVATIONs of every customer include the customer name,
phone, and seat number(s).
AIRPLANE, AIRPLANE_TYPE, CAN_LAND
• All the information about AIRPLANEs and AIRPLANE_TYPEs is included.
• Each AIRPLANE is identified by an airplane id and is of a particular AIRPLANE_TYPE.
• Each AIRPLANE_TYPE has a fixed number of seats and a particular manufacturing company
name.
• CAN_LAND relates AIRPLANE_TYPEs to the AIRPORTs at which they can land.
Chapter 3, Problem 20E
Problem
In Chapters 1 and 2, we discussed the database environment and database users. We can
consider many entity types to describe such an environment, such as DBMS, stored database,
DBA, and catalog/data dictionary. Try to specify all the entity types that can fully describe a
database system and its environment; then specify the relationship types among them, and draw
an ER diagram to describe such a general database environment.
Step-by-step solution
Step 1 of 1
Entity types that can fully describe a database environment and its users are:
1. USERS (User_name, User_id, Kind_of_user): User_name gives the name of the user, User_id
is a unique identifier for each user, and Kind_of_user tells whether the user is DBA staff, a
casual user, an application programmer, or a parametric user. (The list can be expanded to
include menu-based application users, form-based application users, and so on.)
2. COMMAND_INTERFACE_TYPE (Interface_identifier, User_group, Next_tool):
Interface_identifier tells which interface the user can use, viz. DDL statements, privileged
commands, interactive queries, application programs, compiled transactions, menu-based
interfaces, form-based interfaces, and so on. User_group tells which user group will use this
interface, so that other users cannot carry out instructions to which they do not have access.
Next_tool gives the Tool_id of the tool that the interface will use for further processing.
3. TOOLS (Tool_id, Tool_type, Next_tool): Tool_id uniquely identifies the tool; Tool_type tells
whether the tool is a compiler, an optimizer, or a storage tool; Next_tool gives the Tool_id of the
next tool that this tool will use to complete the transaction.
E-R diagram:
Chapter 3, Problem 21E
Problem
Design an ER schema for keeping track of information about votes taken in the U.S. House of
Representatives during the current two-year congressional session. The database needs to keep
track of each U.S. STATE’s Name (e.g., ‘Texas’, ‘New York’, ‘California’) and include the Region
of the state (whose domain is {‘Northeast’, ‘Midwest’, ‘Southeast’, ‘Southwest’, ‘West’}). Each
CONGRESS_PERSON in the House of Representatives is described by his or her Name, plus
the District represented, the Start_date when the congressperson was first elected, and the
political Party to which he or she belongs (whose domain is {‘Republican’, ‘Democrat’,
‘Independent’, ‘Other’}). The database keeps track of each BILL (i.e., proposed law), including
the Bill_name, the Date_of_vote on the bill, whether the bill Passed_or_failed (whose domain is
{‘Yes’, ‘No’}), and the Sponsor (the congressperson(s) who sponsored—that is, proposed—the
bill). The database also keeps track of how each congressperson voted on each bill (domain of
Vote attribute is {‘Yes’, ‘No’, ‘Abstain’, ‘Absent’}). Draw an ER schema diagram for this
application. State clearly any assumptions you make.
Step-by-step solution
Step 1 of 2
Step 2 of 2
ASSUMPTIONS:
1. Each CONGRESS_PERSON represents one district, and each district is represented by one
CONGRESS_PERSON.
2. Each BILL is sponsored by one CONGRESS_PERSON.
3. Every BILL has a different name.
The above schema has three entity types:
1. US_STATE_REGION: represents the states and regions in the US.
2. CONGRESS_PERSON: congresspersons elected from the various regions, related to
US_STATE_REGION by the relationship REPRESENTATIVE.
3. BILL: each bill is related to the CONGRESS_PERSON who sponsors it and is voted on by all
CONGRESS_PERSONs.
Chapter 3, Problem 22E
Problem
A database is being constructed to keep track of the teams and games of a sports league. A
team has a number of players, not all of whom participate in each game. It is desired to keep
track of the players participating in each game for each team, the positions they played in that
game, and the result of the game. Design an ER schema diagram for this application, stating any
assumptions you make. Choose your favorite sport (e.g., soccer, baseball, football).
Step-by-step solution
Step 1 of 2
Consider a soccer league in which various teams participate to win the title. The following is the
ER diagram for the database of a sports league.
Step 2 of 2
Assumptions:
• Only two teams can participate in each game.
• Each player in a team has a unique number.
• Only one game takes place on a given date.
• A player can play in many games.
Chapter 3, Problem 23E
Problem
Consider the ER diagram shown in Figure for part of a BANK database. Each bank can have
multiple branches, and each branch can have multiple accounts and loans.
a. List the strong (nonweak) entity types in the ER diagram.
b. Is there a weak entity type? If so, give its name, partial key, and identifying relationship.
c. What constraints do the partial key and the identifying relationship of the weak entity type
specify in this diagram?
d. List the names of all relationship types, and specify the (min, max) constraint on each
participation of an entity type in a relationship type. Justify your choices.
e. List concisely the user requirements that led to this ER schema design.
f. Suppose that every customer must have at least one account but is restricted to at most two
loans at a time, and that a bank branch cannot have more than 1,000 loans. How does this show
up on the (min, max) constraints?
An ER diagram for a BANK database schema.
Step-by-step solution
Step 1 of 6
(a)
The strong (nonweak) entity types are:
• LOAN
• CUSTOMER
• ACCOUNT
• BANK
Step 2 of 6
(b)
Yes, there is a weak entity type, BANK_BRANCH. Its partial key is Branch_no, and its
identifying relationship is BRANCHES.
Step 3 of 6
(c)
• No two branches of the same bank have the same branch number (the partial key Branch_no
is unique only within one bank).
• A bank can have any number of branches, but a branch belongs to only one bank.
Step 4 of 6
(d)
Relationship types (using the convention that the (min, max) pair on an entity type's side
specifies how many relationship instances each of its entities participates in) are:
• BRANCHES: BANK (min, max) = (1, N) and BANK_BRANCH (min, max) = (1, 1). A bank can
have any number of branches, but a branch is owned by a single bank.
• ACCTS: BANK_BRANCH (min, max) = (1, N) and ACCOUNT (min, max) = (1, 1). A branch can
have many accounts, but an account is held at only one branch.
• LOANS: BANK_BRANCH (min, max) = (1, N) and LOAN (min, max) = (1, 1). A branch can give
any number of loans, but a loan is given by only one branch.
• A_C: CUSTOMER (min, max) = (1, N) and ACCOUNT (min, max) = (1, 1). A customer can have
any number of accounts, but an account is owned by only one customer.
• L_C: CUSTOMER (min, max) = (1, N) and LOAN (min, max) = (1, 1). A customer can take any
number of loans, but a loan is given to only one customer.
Step 5 of 6
(e)
Consider a banking system:
• Each BANK has a unique code, a name, and an address.
• A bank can have any number of BANK_BRANCHes. Each BANK_BRANCH has a branch
number that is unique among the branches of that bank.
• Each BANK_BRANCH opens accounts and gives loans to customers.
• Each account is identified by an account number, has a balance, and is of a particular type;
likewise, each loan is identified by a number, has an amount, and is of a particular type.
• Each customer is identified by Ssn; the customer's name, address, and phone are stored.
Step 6 of 6
(f)
Relationship type constraints become:
• BRANCHES: BANK (min, max) = (1, N) and BANK_BRANCH (min, max) = (1, 1) (unchanged).
• ACCTS: BANK_BRANCH (min, max) = (1, N) and ACCOUNT (min, max) = (1, 1) (unchanged).
• LOANS: BANK_BRANCH (min, max) = (1, 1000) and LOAN (min, max) = (1, 1), since a branch
cannot have more than 1,000 loans.
• A_C: CUSTOMER (min, max) = (1, N) and ACCOUNT (min, max) = (1, 1), since every customer
must have at least one account.
• L_C: CUSTOMER (min, max) = (0, 2) and LOAN (min, max) = (1, 1), since a customer can have
at most two loans at a time.
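Numeric limits like these cannot be expressed as plain keys, but they can be enforced with triggers. A sketch of the at-most-two-loans-per-customer rule from part (f), using Python's sqlite3 (table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (ssn TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE loan (
        loan_no INTEGER PRIMARY KEY,
        ssn     TEXT REFERENCES customer(ssn),
        amount  REAL
    );

    -- Enforce the customer-loan maximum: at most two loans per customer.
    CREATE TRIGGER max_two_loans BEFORE INSERT ON loan
    WHEN (SELECT COUNT(*) FROM loan WHERE ssn = NEW.ssn) >= 2
    BEGIN
        SELECT RAISE(ABORT, 'customer already has two loans');
    END;
""")
conn.execute("INSERT INTO customer VALUES ('C1', 'Alice')")
conn.execute("INSERT INTO loan VALUES (1, 'C1', 1000.0)")
conn.execute("INSERT INTO loan VALUES (2, 'C1', 2000.0)")

# The third loan for the same customer is rejected by the trigger.
try:
    conn.execute("INSERT INTO loan VALUES (3, 'C1', 3000.0)")
    third_allowed = True
except sqlite3.DatabaseError:
    third_allowed = False
print(third_allowed)  # False
```

The 1,000-loans-per-branch limit could be enforced the same way, with the count taken over the branch instead of the customer.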
Chapter 3, Problem 24E
Problem
Consider the ER diagram in Figure Assume that an employee may work in up to two
departments or may not be assigned to any department. Assume that each department must
have one and may have up to three phone numbers. Supply (min, max) constraints on this
diagram. State clearly any additional assumptions you make. Under what conditions would the
relationship HAS_PHONE be redundant in this example?
Part of an ER diagram for a COMPANY database.
Step-by-step solution
Step 1 of 2
Consider the ER diagram for the COMPANY database. The employee may work in up to two
departments or may not be a part of any department. The (min, max) constraint in this case is (0,
2). Each department must have one phone number and may have up to three phone numbers.
The (min, max) constraint in this case is (1, 3).
The following are the other assumptions made for the COMPANY database:
• Each department must have one employee and may have up to twenty employees. The (min,
max) constraint in this case is (1, 20).
• Each phone is used by only one department. The (min, max) constraint in this case is (1, 1).
• Each phone is assigned to at least one employee and may be assigned to up to five
employees. The (min, max) constraint in this case is (1, 5).
• Each employee must have one phone and may have up to 3 phones. The (min, max) constraint
in this case is (1, 3).
Step 2 of 2
The following is the ER diagram after supplying the (min, max) constraints for the COMPANY
database:
The relationship HAS_PHONE would be redundant under the following condition:
• If every EMPLOYEE is assigned all the PHONEs of his or her DEPARTMENT and none from
any other department, HAS_PHONE can be derived from the relationships between EMPLOYEE
and DEPARTMENT and between DEPARTMENT and PHONE.
Chapter 3, Problem 25E
Problem
Consider the ER diagram in Figure. Assume that a course may or may not use a textbook, but
that a text by definition is a book that is used in some course. A course may not use more than
five books. Instructors teach from two to four courses. Supply (min, max) constraints on this
diagram. State clearly any additional assumptions you make. If we add the relationship ADOPTS,
to indicate the textbook(s) that an instructor uses for a course, should it be a binary relationship
between INSTRUCTOR and TEXT, or a ternary relationship among all three entity types? What
(min, max) constraints would you put on the relationship? Why?
Part of an ER diagram or a COURSES database.
Step-by-step solution
Step 1 of 1
Relationship type constraints are:
TEACHES: INSTRUCTOR (min, max) = (2, 4) and COURSE (min, max) = (1, 1), since instructors
teach from two to four courses. Assumption: each course is taught by a single instructor.
USES: COURSE (min, max) = (0, 5) and TEXT (min, max) = (1, 1), since a course may use up to
five texts or none. Assumption: each text is used by a single course, because a text is by
definition a book used in some course.
ADOPTS should be a ternary relationship among INSTRUCTOR, COURSE, and TEXT, because a
binary relationship between INSTRUCTOR and TEXT alone cannot record for which course an
instructor adopts a text. Its (min, max) constraints would be:
INSTRUCTOR (min, max) = (0, 20), COURSE (min, max) = (0, 5), and TEXT (min, max) = (1, 1).
Since each instructor teaches two to four courses and can use up to five texts for each course,
or none, these constraints follow.
Chapter 3, Problem 26E
Problem
Consider an entity type SECTION in a UNIVERSITY database, which describes the section
offerings of courses. The attributes of SECTION are Section_number, Semester, Year,
Course_number, Instructor, Room_no (where section is taught), Building (where section is
taught), Weekdays (domain is the possible combinations of weekdays in which a section can be
offered {‘MWF’, ‘MW’, ‘TT’, and so on}), and Hours (domain is all possible time periods during
which sections are offered {‘9–9:50 a.m.’, ‘10–10:50 a.m.’, …, ‘3:30–4:50 p.m.’, ‘5:30–6:20 p.m.’,
and so on}). Assume that Section_number is unique for each course within a particular
semester/year combination (that is, if a course is offered multiple times during a particular
semester, its section offerings are numbered 1, 2, 3, and so on). There are several composite
keys for section, and some attributes are components of more than one key. Identify three
composite keys, and show how they can be represented in an ER schema diagram.
Step-by-step solution
Step 1 of 4
The attributes of the SECTION entity are as follows:
• Section_number
• Semester
• Year
• Course_number
• Instructor
• Room_no
• Building
• Weekdays
• Hours
Step 2 of 4
Since Section_number is unique for each course within a particular semester and year,
{Section_number, Semester, Year, Course_number} can be considered a composite key for the
SECTION entity.
Since a room (in a particular building) can be allocated to only one section for specific weekdays
and hours in a particular semester and year, {Semester, Year, Building, Room_no, Weekdays,
Hours} can be considered a composite key for the SECTION entity.
Since an instructor can teach only one section on specific weekdays and hours in a particular
semester and year, {Semester, Year, Instructor, Weekdays, Hours} can be considered a
composite key for the SECTION entity.
Step 3 of 4
Hence, the composite keys for the SECTION entity are as follows:
• Key 1: Section_number, Semester, Year, Course_number
• Key 2: Semester, Year, Building, Room_no, Weekdays, Hours
• Key 3: Semester, Year, Instructor, Weekdays, Hours
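Each composite key would become a UNIQUE constraint in a relational rendering of SECTION. A sketch using Python's sqlite3 (the surrogate id and simplified column names are assumptions; Building is included alongside Room_no since room numbers repeat across buildings):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE section (
        id INTEGER PRIMARY KEY,  -- surrogate key (an assumption)
        section_number INTEGER, semester TEXT, year INTEGER,
        course_number TEXT, instructor TEXT,
        building TEXT, room_no TEXT, weekdays TEXT, hours TEXT,
        UNIQUE (section_number, semester, year, course_number),      -- Key 1
        UNIQUE (semester, year, building, room_no, weekdays, hours), -- Key 2
        UNIQUE (semester, year, instructor, weekdays, hours)         -- Key 3
    )
""")
# Each UNIQUE constraint materializes as one unique index.
unique_indexes = [row for row in conn.execute("PRAGMA index_list('section')")
                  if row[2] == 1]
print(len(unique_indexes))  # 3
```

In an ER schema diagram, these keys would instead be shown by underlining (or otherwise marking) each set of attributes that together form a key.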
Step 4 of 4
The ER schema diagram is as follows:
Chapter 3, Problem 27E
Problem
Cardinality ratios often dictate the detailed design of a database. The cardinality ratio depends on
the real-world meaning of the entity types involved and is defined by the specific application. For
the following binary relationships, suggest cardinality ratios based on the common-sense
meaning of the entity types. Clearly state any assumptions you make.
Entity 1 (Cardinality Ratio) Entity 2
1. STUDENT ______________ SOCIAL_SECURITY_CARD
2. STUDENT ______________ TEACHER
3. CLASSROOM ______________ WALL
4. COUNTRY ______________ CURRENT_PRESIDENT
5. COURSE ______________ TEXTBOOK
6. ITEM (that can be found in an order) ______________ ORDER
7. STUDENT ______________ CLASS
8. CLASS ______________ INSTRUCTOR
9. INSTRUCTOR ______________ OFFICE
10. EBAY_AUCTION_ITEM ______________ EBAY_BID
Step-by-step solution
Step 1 of 3
1. Each student has exactly one social security card, and each card belongs to one student. So
there exists a 1:1 cardinality ratio between the STUDENT and SOCIAL_SECURITY_CARD entities.
2. A student can be taught by many teachers, and a teacher can teach many students. So there
exists an M:N cardinality ratio between the STUDENT and TEACHER entities.
3. A classroom has four walls, and a wall can be shared by two classrooms. So there exists an
M:N cardinality ratio between the CLASSROOM and WALL entities.
4. Each country has only one current president, and a person can be president of only one
country. So there exists a 1:1 cardinality ratio between the COUNTRY and CURRENT_PRESIDENT
entities.
5. A course can have any number of textbooks, but a textbook can belong to only one course. So
there exists a 1:N cardinality ratio between the COURSE and TEXTBOOK entities.
Step 2 of 3
6. An order can consist of many items and an item can belong to more than one order. So there
exists a M: N cardinality ratio between ORDER and ITEM entities.
7. A student can belong to one class, but a class can consist of many students. So there exists a
N:1 cardinality ratio between STUDENT and CLASS entities.
8. A class can have many instructors and an instructor can belong to more than one class. So
there exists a M: N cardinality ratio between CLASS and INSTRUCTOR entities.
9. An instructor can belong to one office, but an office can have more than one instructor. So
there exists a N:1 cardinality ratio between INSTRUCTOR and OFFICE entities.
10. An eBay auction item can have any number of bids, but each bid is on one item. So there
exists a 1:N cardinality ratio between the EBAY_AUCTION_ITEM and EBAY_BID entities.
Step 3 of 3
Summary of cardinality ratio:
Chapter 3, Problem 28E
Problem
Consider the ER schema for the MOVIES database in Figure.
Assume that MOVIES is a populated database. ACTOR is used as a generic term and includes
actresses. Given the constraints shown in the ER schema, respond to the following statements
with True, False, or Maybe. Assign a response of Maybe to statements that, although not
explicitly shown to be True, cannot be proven False based on the schema as shown. Justify each
answer.
a. There are no actors in this database that have been in no movies.
b. There are some actors who have acted in more than ten movies.
c. Some actors have done a lead role in multiple movies.
d. A movie can have only a maximum of two lead actors.
e. Every director has been an actor in some movie.
f. No producer has ever been an actor.
g. A producer cannot be an actor in some other movie.
h. There are movies with more than a dozen actors.
i. Some producers have been a director as well.
j. Most movies have one director and one producer.
k. Some movies have one director but several producers.
l. There are some actors who have done a lead role, directed a movie, and produced a movie.
m. No movie has a director who also acted in that movie.
Figure An ER diagram for a MOVIES database schema.
Step-by-step solution
Step 1 of 13
a.
There exists a many to many (M:N) relationship named PERFORMS_IN between ACTOR and
MOVIE. ACTOR and MOVIE have total participation in the relationship PERFORMS_IN, so every
actor must have performed in at least one movie.
Hence, the given statement is TRUE.
Step 2 of 13
b.
There exists a many to many (M: N) relationship named PERFORMS_IN between ACTOR and
MOVIE. The maximum cardinality M or N indicates that there is no fixed maximum, so some
actors may have acted in more than ten movies.
Hence, the given statement is Maybe.
Step 3 of 13
c.
There exists a 2 to N relationship named LEAD_ROLE between ACTOR and MOVIE. The
maximum cardinality for an actor playing a lead role is N, and N can be 2 or more, so some
actors may have played a lead role in multiple movies.
Hence, the given statement is TRUE.
Step 4 of 13
d.
There exists a 2 to N relationship named LEAD_ROLE between ACTOR and MOVIE. The
maximum cardinality 2 indicates that a movie can have at most two lead actors.
Hence, the given statement is TRUE.
Step 5 of 13
e.
There exists a one to one (1:1) relationship named ALSO_A_DIRECTOR between ACTOR and
DIRECTOR. DIRECTOR does not have total participation in the relationship named
ALSO_A_DIRECTOR. So, while an actor may also be a director, it is not guaranteed that every
director has been an actor.
Hence, the given statement is FALSE.
Step 6 of 13
f.
There exists a one to one (1:1) relationship named ACTOR_PRODUCER between ACTOR and
PRODUCER. PRODUCER does not have total participation in the relationship named
ACTOR_PRODUCER, but the relationship shows that a producer may also be an actor.
Hence, the given statement is FALSE.
Step 7 of 13
g.
A producer can also be an actor (through the ACTOR_PRODUCER relationship), so a producer
can act in some other movie.
Hence, the given statement is FALSE.
Step 8 of 13
h.
There exists a many to many (M: N) relationship named PERFORMS_IN between ACTOR and
MOVIE. The maximum cardinality M indicates that there is no fixed maximum, so a movie can
have more than 12 actors performing in it.
Hence, the given statement is Maybe.
Step 9 of 13
i.
There exists a one to one (1: 1) relationship named ALSO_A_DIRECTOR between ACTOR and
DIRECTOR.
There exists a one to one (1: 1) relationship named ACTOR_PRODUCER between ACTOR and
PRODUCER.
So, there may be an actor who is a director as well as a producer.
Hence, the given statement is TRUE.
Step 10 of 13
j.
There exists a one to many relationship named DIRECTS between DIRECTOR and MOVIE. A
director can direct N movies.
There exists a many to many relationship named PRODUCES between PRODUCER and
MOVIE. A producer can produce any number of movies.
So, there may be one director and one producer for a movie.
Hence, the given statement is MAY BE.
Step 11 of 13
k.
There exists a one to many relationship named DIRECTS between DIRECTOR and MOVIE. A
director can direct N movies.
There exists a many to many relationship named PRODUCES between PRODUCER and
MOVIE. A producer can produce any number of movies.
So, there can be one director and several producers for a movie.
Hence, the given statement is TRUE.
Step 12 of 13
l.
There exists a 2 to N relationship named LEAD_ROLE between ACTOR and MOVIE.
There exists a one to one (1: 1) relationship named ALSO_A_DIRECTOR between ACTOR and
DIRECTOR.
There exists a one to one (1: 1) relationship named ACTOR_PRODUCER between ACTOR and
PRODUCER.
So, there may be an actor who is a producer, a director, and has performed a lead role in a movie.
Hence, the given statement is TRUE.
Step 13 of 13
m.
A director may also have performed in a movie that he directed.
Hence, the given statement is FALSE.
Problem
Chapter 3, Problem 29E
Given the ER schema for the MOVIES database in Figure, draw an instance diagram using three
movies that have been released recently. Draw instances of each entity type: MOVIES,
ACTORS, PRODUCERS, DIRECTORS involved; make up instances of the relationships as they
exist in reality for those movies.
An ER diagram for a MOVIES database schema.
Step-by-step solution
Step 1 of 2
Step 2 of 2
Amir Khan: produced a movie he acted in, and also directed the movie.
Chapter 3, Problem 30E
Problem
Illustrate the UML diagram for Exercise. Your UML design should observe the following
requirements:
a. A student should have the ability to compute his/her GPA and add or drop majors and minors.
b. Each department should be able to add or delete courses and hire or terminate faculty.
c. Each instructor should be able to assign or change a student’s grade for a course.
Note: Some of these functions may be spread over multiple classes.
Reference Problem 16
Which combinations of attributes have to be unique for each individual SECTION entity in the
UNIVERSITY database shown in Figure 3.20 to enforce each of the following miniworld
constraints:
a. During a particular semester and year, only one section can use a particular classroom at a
particular DaysTime value.
b. During a particular semester and year, an instructor can teach only one section at a particular
DaysTime value.
c. During a particular semester and year, the section numbers for sections offered for the same
course must all be different.
Can you think of any other similar constraints?
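Each of the three miniworld constraints in the referenced problem amounts to requiring that a certain combination of SECTION attributes be unique across all sections. The following Python sketch illustrates the idea; the attribute names (Semester, Year, Classroom, DaysTime, Instructor, Course, SecNumber) are taken from the problem text, and the sample data is made up:

```python
# Each miniworld constraint corresponds to a combination of SECTION
# attributes that must be unique over the whole SECTION entity set.
sections = [
    {"SecNumber": 1, "Course": "CS101", "Semester": "Fall", "Year": 2024,
     "Instructor": "King", "Classroom": "R1", "DaysTime": "MWF 9"},
    {"SecNumber": 2, "Course": "CS101", "Semester": "Fall", "Year": 2024,
     "Instructor": "King", "Classroom": "R2", "DaysTime": "MWF 10"},
]

def is_unique(records, attrs):
    """True if no two records agree on all of the given attributes."""
    seen = set()
    for r in records:
        key = tuple(r[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

# a. one section per classroom per DaysTime in a given semester and year
assert is_unique(sections, ["Semester", "Year", "Classroom", "DaysTime"])
# b. one section per instructor per DaysTime in a given semester and year
assert is_unique(sections, ["Semester", "Year", "Instructor", "DaysTime"])
# c. section numbers unique within one course offering
assert is_unique(sections, ["Semester", "Year", "Course", "SecNumber"])
```

Each call names exactly the attribute combination that would have to form a composite unique key for the corresponding constraint.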
Step-by-step solution
Step 1 of 5
The UML diagram consists of classes, where a class is equivalent to an entity type in an ER
diagram. Each class consists of the following three sections:
• Class name: It is the top section of the UML class diagram. Class name is similar to the entity
type name in ER diagram.
• Attributes: It is the middle section of the UML class diagram. Attributes are the same as the
attributes of an entity in the ER diagram.
• Operations: It is the last section of the UML class diagram. It indicates the operations that can
be performed on individual objects, where each object is similar to the entities in ER diagram.
Step 2 of 5
a.
The operation that indicates the ability of the student to calculate his/her GPA and also to add or
drop the majors and minors is specified in the last section of the UML class diagram. The
operations are as follows:
• compute_gpa
• add_major
• drop_major
• add_minor
• drop_minor
Step 3 of 5
b.
The operation that indicates the ability of each department to add or delete a course and also to
hire or terminate a faculty is specified in the last section of the UML class diagram. The
operations are as follows:
• add_course
• delete_course
• hire_faculty
• terminate_faculty
Step 4 of 5
c.
The operation that indicates the ability of each instructor to assign or change the grade of a
student for a particular course is specified in the last section of the UML class diagram. The
operations are as follows:
• assign_grade
• change_grade
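The operations listed in parts a through c can be sketched as methods on Student, Department, and Instructor classes. The Python sketch below is illustrative only; in particular, the representation of grades as (course, grade points, credits) triples is an assumption made for the sketch, not part of the original design:

```python
class Student:
    """Operations from part (a): GPA computation, majors and minors."""
    def __init__(self):
        self.grades = []          # assumed: (course, grade_points, credits)
        self.majors = set()
        self.minors = set()

    def compute_gpa(self):
        total_credits = sum(cr for _, _, cr in self.grades)
        if total_credits == 0:
            return 0.0
        return sum(g * cr for _, g, cr in self.grades) / total_credits

    def add_major(self, m): self.majors.add(m)
    def drop_major(self, m): self.majors.discard(m)
    def add_minor(self, m): self.minors.add(m)
    def drop_minor(self, m): self.minors.discard(m)

class Department:
    """Operations from part (b): courses and faculty."""
    def __init__(self):
        self.courses = set()
        self.faculty = set()
    def add_course(self, c): self.courses.add(c)
    def delete_course(self, c): self.courses.discard(c)
    def hire_faculty(self, f): self.faculty.add(f)
    def terminate_faculty(self, f): self.faculty.discard(f)

class Instructor:
    """Operations from part (c): assigning and changing grades."""
    def assign_grade(self, student, course, grade_points, credits):
        student.grades.append((course, grade_points, credits))
    def change_grade(self, student, course, grade_points):
        student.grades = [(c, grade_points if c == course else g, cr)
                          for c, g, cr in student.grades]
```

For example, after an instructor assigns a 4.0 grade in a single 3-credit course, compute_gpa returns 4.0; this also shows how a function such as grade assignment is spread over two classes, as the note in the problem anticipates.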
Step 5 of 5
The UML diagram corresponding to the above requirements is as follows:
Chapter 3, Problem 31LE
Problem
Consider the UNIVERSITY database described in Exercise 16. Build the ER schema for this
database using a data modeling tool such as ERwin or Rational Rose.
Reference Exercise 16
Which combinations of attributes have to be unique for each individual SECTION entity in the
UNIVERSITY database shown in Figure 3.20 to enforce each of the following miniworld
constraints:
a. During a particular semester and year, only one section can use a particular classroom at a
particular DaysTime value.
b. During a particular semester and year, an instructor can teach only one section at a particular
DaysTime value.
c. During a particular semester and year, the section numbers for sections offered for the same
course must all be different.
Can you think of any other similar constraints?
Step-by-step solution
Step 1 of 1
Refer to exercise 3.16 for the UNIVERSITY database. Use the Rational Rose tool to create the
ER schema for the database as follows:
• In the options available on left, right click on the option Logical view, go to New and select the
option Class Diagram.
• Name the class diagram as UNIVERSITY. Select the option Class available in the toolbar and
then click on empty space of the Class Diagram file. Name the class as COLLEGE.
Right click on the class, select the option New Attribute, and name the attribute as CName.
Similarly, create the other attributes COffice and CPhone.
• Now right click on the attribute CName, available on the left under the class COLLEGE, and
select the option Open Specification. Select the Protected option under Export Control. This
will make CName the primary key.
• Similarly create another class INSTRUCTOR; its attributes Id, Rank, IName, IOffice and
IPhone; and Id as the primary key.
• Select the option Unidirectional Association from the toolbar, for creating relationships
between the two classes. Now click on the class COLLEGE; while holding the click drag the
mouse towards the class INSTRUCTOR and release the click. This will create the relationship
between the two selected classes.
Name the association as DEAN. Since the structural constraint in the ER diagram is specified
using (min, max) notation, specify the structural constraints using the Rational Rose tool as
follows:
• Right click on the association close to the class COLLEGE and select 1 from the option
Multiplicity.
• Again, right click on the association close to the class INSTRUCTOR and select Zero or One
from the option Multiplicity.
• Similarly, create other classes and their associated attributes. Specify the relationships and
structural constraints between the classes, as mentioned above.
The ER schema may thus be specified using an alternative diagrammatic notation, the class
diagram, through the use of the Rational Rose tool, as follows:
Chapter 3, Problem 32LE
Problem
Consider a MAIL_ORDER database in which employees take orders for parts from customers.
The data requirements are summarized as follows:
■ The mail order company has employees, each identified by a unique employee number, first
and last name, and Zip Code.
■ Each customer of the company is identified by a unique customer number, first and last name,
and Zip Code.
■ Each part sold by the company is identified by a unique part number, a part name, price, and
quantity in stock.
■ Each order placed by a customer is taken by an employee and is given a unique order number.
Each order contains specified quantities of one or more parts. Each order has a date of receipt
as well as an expected ship date. The actual ship date is also recorded.
Design an entity-relationship diagram for the mail order database and build the design using a
data modeling tool such as ERwin or Rational Rose.
Step-by-step solution
There is no solution to this problem yet.
Chapter 3, Problem 35LE
Problem
Consider the ER diagram for the AIRLINE database shown in Figure Build this design using a
data modeling tool such as ERwin or Rational Rose.
An ER diagram for an AIRLINE database schema
Step-by-step solution
Step 1 of 1
Refer to figure 3.21 for the ER schema of the AIRLINE database. Use the Rational Rose tool to
create the ER schema for the database as follows:
• In the options available on left, right click on the option Logical view, go to New and select the
option Class Diagram.
• Name the class diagram as AIRLINE. Select the option Class available in the toolbar and then
click on empty space of the Class Diagram file. Name the class as AIRPORT.
Right click on the class, select the option New Attribute, and name the attribute as Airport_code.
Similarly, create the other attributes City, State and Name.
• Now right click on the attribute Airport_code, available on the left under the class AIRPORT,
and select the option Open Specification. Select the Protected option under Export Control.
This will make Airport_code as primary key.
• Similarly, create another class FLIGHT_LEG and its attribute Leg_no.
• Select the option Unidirectional Association from the toolbar, for creating relationships
between the two classes. Now click on the class AIRPORT; while holding the click drag the
mouse towards the class FLIGHT_LEG and release the click. This will create the relationship
between the two selected classes.
Name the association as DEPARTURE_AIRPORT. Since the structural constraint in the ER
diagram is specified using (min, max) notation, specify the structural constraints using the
Rational Rose tool as follows:
• Right click on the association close to the class AIRPORT and select 1 from the option
Multiplicity.
• Again, right click on the association close to the class FLIGHT_LEG and select n from the
option Multiplicity.
• Similarly, create other classes and their associated attributes. Specify the relationships and
structural constraints between the classes, as mentioned above.
The ER schema may thus be specified using an alternative diagrammatic notation, the class
diagram, through the use of the Rational Rose tool, as follows:
Chapter 4, Problem 1RQ
Problem
What is a subclass? When is a subclass needed in data modeling?
Step-by-step solution
Step 1 of 3
Subclass:
A subclass is also called a derived class. It extends another class (the parent class) and
inherits the attributes and relationships of that class.
An entity in the subclass is the same entity as in the superclass, but in a distinct, specific role.
Step 2 of 3
An entity is an object (thing) with independent physical (car, home, person) or conceptual
(company, university course) existence in the real world.
Each real-world entity (thing) has certain properties that represent its significance in the real
world or describe it. These properties of an entity are known as attributes. An entity type defines
a collection (or set) of entities that have the same attributes.
A database usually contains a group of entities that are similar. These entities have the same
attributes but different attribute values. A collection of these entities is an entity type.
In each entity type there may exist smaller groupings, based on one attribute or relationship or
another. Such attributes or relationships may not apply to all entities in the entity type but are of
significant value for that group. All such groups can be represented as separate classes or entity
types; these form subclasses of the bigger entity type.
Example:
Consider an entity type VEHICLE. All vehicles have properties such as manufacturer,
number_plate, registration_number, and colour, but there are certain properties that apply only
to goods (carrier) vehicles, such as load_capacity and size (the width and height of the load the
vehicle can carry), and certain attributes that apply only to passenger vehicles, such as
sitting_capacity and ac/non-ac. So, we can have subclasses of the entity type VEHICLE:
PASSENGER_VEHICLE and GOODS_VEHICLE are subclasses of the VEHICLE superclass.
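The VEHICLE example above can be sketched as a small class hierarchy in Python, where each subclass inherits the common attributes and adds its specific (local) ones. The attribute spellings follow the example; the sketch is illustrative only:

```python
class Vehicle:
    """Superclass: attributes common to all vehicles."""
    def __init__(self, manufacturer, number_plate, registration_number, colour):
        self.manufacturer = manufacturer
        self.number_plate = number_plate
        self.registration_number = registration_number
        self.colour = colour

class GoodsVehicle(Vehicle):
    """Subclass: inherits all Vehicle attributes, adds carrier-specific ones."""
    def __init__(self, manufacturer, number_plate, registration_number,
                 colour, load_capacity, size):
        super().__init__(manufacturer, number_plate, registration_number, colour)
        self.load_capacity = load_capacity   # specific (local) attribute
        self.size = size                     # specific (local) attribute

class PassengerVehicle(Vehicle):
    """Subclass: inherits all Vehicle attributes, adds passenger-specific ones."""
    def __init__(self, manufacturer, number_plate, registration_number,
                 colour, sitting_capacity, has_ac):
        super().__init__(manufacturer, number_plate, registration_number, colour)
        self.sitting_capacity = sitting_capacity
        self.has_ac = has_ac
```

A GoodsVehicle instance carries both the inherited attributes (manufacturer, colour, and so on) and its local ones (load_capacity, size), mirroring the superclass/subclass structure of the example.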
Step 3 of 3
Subclass needed in data modeling:
A subclass is needed in data modeling to define an inheritance relationship between two
classes. The concept of a subclass is used to represent data more meaningfully and to clearly
represent those attributes and relationships that apply only to a group of entities in the
superclass, and not to all of its entities.
Chapter 4, Problem 2RQ
Problem
Define the following terms: superclass of a subclass, superclass/subclass relationship, IS-A
relationship, specialization, generalization, category, specific (local) attributes, and specific
relationships.
Step-by-step solution
Step 1 of 9
1. Superclass of a subclass: In each entity type there may exist smaller groupings, based on
one attribute or relationship or another. Such attributes or relationships may not apply to all
entities in the entity type but are of significant value for that particular group. All such groups can
be represented as separate classes or entity types; these form subclasses of the bigger entity
type. The bigger entity type is known as the superclass.
For example: Consider an entity type VEHICLE. All vehicles have properties such as
manufacturer, number_plate, registration_number, and colour, but certain properties, such as
load_capacity and size (the width and height of the load a vehicle can carry), apply only to
carrier vehicles, and certain attributes, such as sitting_capacity and ac/non-ac, apply only to
passenger vehicles. So, we can have subclasses of the entity type VEHICLE:
PASSENGER_VEHICLE and GOODS_VEHICLE are subclasses of the VEHICLE superclass.
Step 2 of 9
2. Superclass/subclass relationship: The relationship between a superclass and any one of its
subclasses is known as a superclass/subclass relationship.
Step 3 of 9
3. Is-a relationship: A superclass/subclass relationship is often called an is-a relationship
because of the way the concept is referred to. Using the VEHICLE example above, we say that a
GOODS_VEHICLE is a VEHICLE.
Step 4 of 9
4. Specialization: Specialization is the process of defining a set of subclasses of an entity
type (the superclass of the specialization). The set of subclasses that forms a specialization is
defined on the basis of some distinguishing characteristic of the entities in the superclass.
For example: the set {PASSENGER_VEHICLE, GOODS_VEHICLE} is a specialization of the
superclass VEHICLE that distinguishes among vehicle entities on the basis of the purpose each
vehicle serves. There can be several specializations of the same entity type based on different
distinguishing characteristics.
For example: on the basis of whether a vehicle is commercial or not, we can have another
specialization {COMMERCIAL, PRIVATE}.
Specialization is a process that allows the user to do the following:
a. Define a set of subclasses of an entity type.
b. Establish additional specific attributes with each subclass.
c. Establish additional specific relationship types between each subclass and other entity types
or other subclasses.
Step 5 of 9
5. Generalization: This is the reverse of specialization, an abstraction in which the differences
between several entity types are suppressed, their common features are identified, and they are
generalized into a single superclass of which the original entity types are special subclasses.
For example: PASSENGER_VEHICLE and GOODS_VEHICLE are two classes with certain
common attributes, viz. number_plate, reg_number, colour, etc.; these attributes from both
classes can be taken in common and a new superclass VEHICLE can be created. This is called
generalization.
Step 6 of 9
6. Category: Sometimes the need arises for modeling a single superclass/subclass relationship
with more than one superclass, where the superclasses represent different entity types. In this
case, the subclass will represent a collection of objects that is a subset of the union of the
distinct entity types; such a subclass is called a union type or a category.
Step 7 of 9
Step 8 of 9
7. Specific (local) attributes: Consider an entity type VEHICLE. All vehicles have properties
such as manufacturer, number_plate, registration_number, and colour, but certain properties,
such as load_capacity and size (the width and height of the load a vehicle can carry), apply only
to the CARRIER_VEHICLES subclass, and certain attributes, such as sitting_capacity and
ac/non-ac, apply only to the PASSENGER_VEHICLES subclass. Attributes that belong only to a
subclass and not to the superclass are called local attributes or specific attributes.
Step 9 of 9
8. Specific relationships: Like local attributes, there are certain relationship types that hold only
for one subclass of a superclass and not for the other subclasses or for the superclass. Such
relationships are called specific relationships.
For example: CARRIES_GOODS can be a relationship between CARRIER_VEHICLES and
COMPANY, but not between PASSENGER_VEHICLES and COMPANY.
Chapter 4, Problem 3RQ
Problem
Discuss the mechanism of attribute/relationship inheritance. Why is it useful?
Step-by-step solution
Step 1 of 2
The Enhanced entity relationship (EER) model is the extension of the ER model. The EER model
includes some new concepts in addition to the concepts of the ER model. The EER model
includes the concepts of subclass, superclass, specialization, generalization, category or union
type. The ER model with all these additional concepts is associated with the mechanism of
attribute and relationship inheritance.
Step 2 of 2
The type of each entity is defined by its set of attributes and the relationship types in which it
participates. The members of a subclass inherit all the attributes and relationships of the
superclass entity type. This mechanism is useful because common attributes and relationships
are defined once, on the superclass, and every subclass automatically possesses them.
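As a sketch of this inheritance mechanism, consider a hypothetical EMPLOYEE superclass with a WORKS_FOR relationship; a SECRETARY subclass then inherits both the attributes and the relationship while adding a local attribute. The names here are illustrative (borrowed from the usual COMPANY-style example), not from this text:

```python
class Employee:
    """Superclass: attributes and relationships defined once, here."""
    def __init__(self, name, ssn):
        self.name = name            # attribute of the superclass
        self.ssn = ssn              # attribute of the superclass
        self.department = None
    def works_for(self, department):
        """WORKS_FOR relationship, also defined on the superclass."""
        self.department = department

class Secretary(Employee):
    """Subclass: inherits name, ssn, and works_for without redefining them."""
    def __init__(self, name, ssn, typing_speed):
        super().__init__(name, ssn)
        self.typing_speed = typing_speed  # specific (local) attribute
```

A Secretary entity can participate in WORKS_FOR even though that relationship was never declared on Secretary itself; it was inherited from Employee.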
Chapter 4, Problem 4RQ
Problem
Discuss user-defined and predicate-defined subclasses, and identify the differences between the
two.
Step-by-step solution
Step 1 of 1
Predicate-defined subclasses: When we determine the entities that will become members of a
subclass by placing a condition on some attribute of the superclass, such a subclass is called a
predicate-defined subclass.
User-defined subclasses: When there is no condition for determining membership in a
subclass, the subclass is called user-defined. Membership in such a subclass is determined by
the database users when they apply the operation to add an entity to the subclass; hence,
membership is specified individually for each entity by the user, not by any condition that may be
evaluated automatically.
The difference between predicate-defined and user-defined subclasses is:
1. Membership of a predicate-defined subclass can be determined automatically from the
defining condition, whereas membership of a user-defined subclass must be specified explicitly
by the user.
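The difference can be sketched in Python: a predicate-defined subclass is populated automatically by evaluating a condition on a superclass attribute, while a user-defined subclass is populated only by explicit user actions. The employee records and the job_type attribute below are made-up illustrations:

```python
employees = [
    {"name": "Joyce", "job_type": "Secretary"},
    {"name": "Franklin", "job_type": "Engineer"},
    {"name": "Alicia", "job_type": "Secretary"},
]

# Predicate-defined subclass: membership follows automatically from a
# condition on a superclass attribute (here job_type = 'Secretary').
secretaries = [e for e in employees if e["job_type"] == "Secretary"]

# User-defined subclass: there is no defining condition; a user places
# each entity into the subclass individually and explicitly.
honored_employees = []
honored_employees.append(employees[1])   # an individual user decision
```

Re-evaluating the predicate always reproduces the same membership, whereas the user-defined subclass has no rule from which its members could be recomputed.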
Chapter 4, Problem 5RQ
Problem
Discuss user-defined and attribute-defined specializations, and identify the differences between
the two.
Step-by-step solution
Step 1 of 5
User-defined specialization:
Step 2 of 5
If there is no condition for deciding membership in any of the subclasses of a specialization, it is
called a user-defined specialization.
Step 3 of 5
Membership in such a specialization is determined by the database users when any operation is
performed to add an entity to the subclass.
Step 4 of 5
Hence, membership is specified individually for each entity by the user.
Attribute-defined specialization:
If all the subclasses in a specialization have their membership condition on the same attribute of
the superclass, the specialization is called an attribute-defined specialization, and its
subclasses are attribute-defined subclasses.
Step 5 of 5
The difference between user-defined specialization and attribute-defined specialization is as
follows:
User-defined specialization:
• The user is responsible for identifying the proper subclass for each entity.
• Membership cannot be decided automatically.
Attribute-defined specialization:
• The value of the same attribute is used in defining the predicate for all subclasses.
• Membership can be decided automatically.
Chapter 4, Problem 6RQ
Problem
Discuss the two main types of constraints on specializations and generalizations.
Step-by-step solution
Step 1 of 1
The two main constraints on specialization and generalization are:
1. Disjointness constraint: This specifies that the subclasses of the specialization must be
disjoint, meaning that an entity can be a member of at most one of the subclasses of the
specialization. An attribute-defined specialization implies the disjointness constraint if the
attribute used to define the membership predicate is single-valued.
If the disjointness constraint holds, the specialization is disjoint; otherwise, a set of entities may
be common to several subclasses, which is the overlapping case.
2. Completeness constraint: This may be total or partial. A total specialization constraint
specifies that every entity in the superclass must be a member of at least one subclass of the
specialization. A partial specialization allows an entity not to belong to any of the subclasses.
Problem
Chapter 4, Problem 7RQ
What is the difference between a specialization hierarchy and a specialization lattice?
Step-by-step solution
Step 1 of 1
A subclass itself may have further subclasses specified on it, forming a hierarchy or a lattice of
specializations. A specialization hierarchy has the constraint that every subclass participates
as a subclass in only one class/subclass relationship; that is, each subclass has only one parent,
which results in a tree structure.
In contrast, in a specialization lattice, a subclass can be a subclass in more than one
class/subclass relationship.
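The distinction mirrors single versus multiple inheritance in programming languages. In the Python sketch below, Secretary sits in a hierarchy (exactly one parent), while StudentAssistant turns the structure into a lattice by participating as a subclass in two class/subclass relationships. The class names follow the usual UNIVERSITY-style examples and are illustrative:

```python
class Person:
    pass

class Employee(Person):     # one parent each: still a hierarchy (tree)
    pass

class Student(Person):
    pass

class Secretary(Employee):  # hierarchy: Secretary has exactly one parent
    pass

class StudentAssistant(Employee, Student):
    # Lattice: a shared subclass participating in two
    # class/subclass relationships (two parents).
    pass
```

StudentAssistant is a subclass of both Employee and Student, which is exactly what a hierarchy forbids and a lattice permits.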
Chapter 4, Problem 8RQ
Problem
What is the difference between specialization and generalization? Why do we not display this
difference in schema diagrams?
Step-by-step solution
Step 1 of 2
Specialization is the process of defining a set of subclasses of an entity type (the superclass of
the specialization). The set of subclasses that forms a specialization is defined on the basis of
some distinguishing characteristic of the entities in the superclass.
For example: the set {PASSENGER_VEHICLE, GOODS_VEHICLE} is a specialization of the
superclass VEHICLE that distinguishes among vehicle entities on the basis of the purpose each
vehicle serves. There can be several specializations of the same entity type based on different
distinguishing characteristics.
For example: on the basis of whether a vehicle is commercial or not, we can have another
specialization {COMMERCIAL, PRIVATE}.
Specialization is a process that allows the user to do the following:
a. Define a set of subclasses of an entity type.
b. Establish additional specific attributes with each subclass.
c. Establish additional specific relationship types between each subclass and other entity types
or other subclasses.
Step 2 of 2
Generalization: This is the reverse of specialization, an abstraction in which the differences
between several entity types are suppressed, their common features are identified, and they are
generalized into a single superclass of which the original entity types are special subclasses.
For example: PASSENGER_VEHICLE and GOODS_VEHICLE are two classes with certain
common attributes, viz. number_plate, reg_number, colour, etc.; these attributes from both
classes can be taken in common and a new superclass VEHICLE can be created. This is called
generalization.
Specialization and generalization can be viewed as functionally reverse processes of each other.
We generally do not display the difference in the schema diagram because the decision as to
which process was used in a particular situation is often subjective.
Chapter 4, Problem 9RQ
Problem
How does a category differ from a regular shared subclass? What is a category used for?
Illustrate your answer with examples.
Step-by-step solution
Step 1 of 3
A category is different from a regular shared subclass for the following reasons:
1. A category has two or more superclasses that may represent distinct entity types, whereas
other superclass/subclass relationships always have a single superclass.
Regular shared subclass fig:
Category fig:
Step 2 of 3
2. An entity that is a member of a shared subclass must exist in all of its superclasses; that is,
the shared subclass is a subset of the intersection of the superclasses. In the case of a category,
a member entity can belong to any one of the superclasses; that is, the category is a subset of
the union of the superclasses.
3. Attribute inheritance works selectively in the case of categories. The attributes of only one
superclass are inherited, depending on the superclass to which the entity belongs. A shared
subclass, on the other hand, inherits all the attributes of all its superclasses.
Step 3 of 3
Use: Sometimes the need arises for modeling a single superclass/subclass relationship with
more than one superclass, where the superclasses represent different entity types. In this case,
the subclass represents a collection of objects that is a subset of the union of the distinct entity
types; in such cases a union type or category is used.
For example: consider a piece of property. It can be owned by a person, a business firm, a
charitable institution, a bank, etc. All these entity types are different, but together they form the
total set of land owners. The figure above illustrates this example.
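The land-owner example can be sketched in Python: an OWNER category holds members drawn from the union of several distinct superclasses, with each member belonging to just one of them. The classes below are illustrative stand-ins for the entity types in the example:

```python
class Person:
    pass

class BusinessFirm:
    pass

class Bank:
    pass

# OWNER is a category (union type): each member is an entity of ANY ONE
# of the superclasses, so the category is a subset of the UNION of their
# entity sets. (A shared subclass, by contrast, would be a subset of their
# INTERSECTION: one entity that exists in all the superclasses at once.)
owners = [Person(), Bank()]

assert all(isinstance(o, (Person, BusinessFirm, Bank)) for o in owners)
```

Each member of `owners` inherits from exactly one of the superclasses, which also illustrates why attribute inheritance is selective for categories.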
Chapter 4, Problem 10RQ
Problem
For each of the following UML terms (see Sections 3.8 and 4.6) discuss the corresponding term
in the EER model, if any: object, class, association, aggregation, generalization, multiplicity,
attributes, discriminator, link, link attribute, reflexive association, and qualified association.
Step-by-step solution
Step 1 of 1
S.No. UML term — EER model term
1. Object — Entity
2. Class — Entity type
3. Association — Relationship type
4. Aggregation — Relationship between a whole object and its component parts
5. Generalization — Generalization
6. Multiplicity — (min, max) notation
7. Attributes — Attributes
8. Discriminator — Partial key
9. Link — Relationship instance
10. Link attribute — Relationship attribute
11. Reflexive association — Recursive relationship
12. Qualified association — Weak entity
Chapter 4, Problem 11RQ
Problem
Discuss the main differences between the notation for EER schema diagrams and UML class
diagrams by comparing how common concepts are represented in each.
Step-by-step solution
Step 1 of 1
Some of the differences between the notation for EER schema diagrams and the notation for
UML class diagrams are as follows:
Problem
Chapter 4, Problem 12RQ
List the various data abstraction concepts and the corresponding modeling concepts in the EER
model.
Step-by-step solution
Step 1 of 3
The list of four abstraction concepts in the EER (Enhanced Entity-Relationship) model is as
follows:
• Classification and instantiation
• Identification
• Specialization and generalization
• Aggregation and association
Step 2 of 3
Classification and instantiation
• Classification is used to assign similar entities or objects to an entity type or object type.
• Instantiation is the inverse of classification; it refers to the generation and specific examination
of distinct objects of a class.
Identification
• Identification is the process by which classes and objects are made uniquely identifiable by
some identifier.
• Identification is needed at two levels:
o to tell the difference between classes and objects, and
o to identify database objects and relate them to their real-world counterparts.
Specialization and generalization
• Specialization is used to categorize a class of objects into subclasses.
• Generalization is the inverse of specialization; it is used to combine several classes into a
higher-level class.
Aggregation and association
• Aggregation is used to build composite objects from their component objects.
• Association is used to associate objects from several independent classes.
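A small Python sketch of the last pair of concepts: aggregation builds a composite object from component objects, while association merely relates objects of independent classes. The Car/Engine/Wheel/Driver names are illustrative, not from the text:

```python
class Engine:
    pass

class Wheel:
    pass

class Car:
    """Aggregation: a composite object built from component objects."""
    def __init__(self, engine, wheels):
        self.engine = engine        # the Engine IS-A-COMPONENT-OF the Car
        self.wheels = wheels        # each Wheel IS-A-PART-OF the Car

class Driver:
    pass

# Association: relating objects from several independent classes.
# A DRIVES pair links a Driver to a Car without either object being
# a component of the other.
drives = [(Driver(), Car(Engine(), [Wheel() for _ in range(4)]))]
```

The Car holds its components as part of its own structure (aggregation), whereas the `drives` pairing leaves Driver and Car fully independent (association).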
Step 3 of 3
The following are the modeling concepts of the EER model:
• The EER model includes all the modeling concepts of the ER model. In addition, the EER
model contains subclasses and superclasses, which relate to the concepts of specialization and
generalization.
• Another modeling concept in the EER model is the category or union type, for which there is no
standard terminology among the abstraction concepts.
Comment
Chapter 4, Problem 13RQ
Problem
What aggregation feature is missing from the EER model? How can the EER model be further
enhanced to support it?
Step-by-step solution
Step 1 of 2
Missing feature:
The EER model lacks an explicit aggregation construct: the ability to combine objects that are
related through a specific relationship instance into a higher-level aggregate object.
• This is sometimes useful because the higher-level aggregate itself may be related to some
other object.
• The relationship between a primitive object and its aggregate object is referred to as
IS-A-PART-OF, and its inverse is called IS-A-COMPONENT-OF.
Comment
Step 2 of 2
Enhancement:
The EER model can be further enhanced to represent the aggregation feature correctly by
creating additional entity types that stand for the aggregate objects.
Comment
Chapter 4, Problem 14RQ
Problem
What are the main similarities and differences between conceptual database modeling
techniques and knowledge representation techniques?
Step-by-step solution
Step 1 of 2
Major similarities and differences between conceptual database modeling techniques and
knowledge representation techniques:
1. Both the disciplines use an abstraction process to identify common properties and important
aspects of objects in the miniworld while suppressing insignificant differences and unimportant
details.
2. Both disciplines provide concepts, constraints, operations, and languages for defining data
and representing knowledge.
3. KR is generally broader in scope than semantic data models. Different forms of knowledge,
such as rules, incomplete and default knowledge, temporal and spatial knowledge, are
represented in KR schemes.
Comment
Step 2 of 2
4. KR schemes include reasoning mechanisms that deduce additional facts from the facts stored in a database.
Hence, whereas most current database systems are limited to answering the direct queries,
knowledge-based systems using KR schemes can answer queries that involve inferences over
the stored data.
5. Whereas most data models concentrate on the representation of database schemas, or meta-knowledge, KR schemes often mix up the schemas with the instances themselves in order to
provide flexibility in representing exceptions. This often leads to inefficiencies when KR schemes
are implemented, compared to databases, especially when a large amount of data needs to be
stored.
Comment
Chapter 4, Problem 15RQ
Problem
Discuss the similarities and differences between an ontology and a database schema.
Step-by-step solution
Step 1 of 1
Both an ontology and a database schema describe the concepts of a domain and the
relationships among them. The difference is that a schema is usually limited to describing a small
subset of a miniworld from reality in order to store and manage data, whereas an ontology is
usually considered to be more general: it attempts to describe a part of reality or a domain of
interest (e.g., medical terms, electronic-commerce applications) as completely as possible.
Comment
Chapter 4, Problem 16E
Problem
Design an EER schema for a database application that you are interested in. Specify all
constraints that should hold on the database. Make sure that the schema has at least five entity
types, four relationship types, a weak entity type, a superclass/subclass relationship, a category,
and an n-ary (n > 2) relationship type.
Step-by-step solution
Step 1 of 2
Comment
Step 2 of 2
Here the weak entity type INTERVIEW has a ternary identifying relationship with JOB_OFFER,
CANDIDATE, and EMPLOYER. An interview is related to the candidate who gives the interview,
the employer that conducts it, and the job offer for which the interview is held.
An employer can be a government organization or a private firm, and hires for a department to
which a candidate can apply or in which the candidate wants to work.
A candidate can be a fresher or may have some work experience.
Comment
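As a sketch of how this design could map to relations, the weak entity INTERVIEW can take its key from all three owner entity types plus a partial key such as the interview date. All table and column names below are assumptions for illustration, not a prescribed answer:

```python
# Sketch: mapping the weak entity INTERVIEW and its ternary identifying
# relationship to relational tables (names are illustrative assumptions).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CANDIDATE (Ssn TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE EMPLOYER  (Eid TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE JOB_OFFER (Jid TEXT PRIMARY KEY, Title TEXT);

-- Weak entity: its key combines the owners' keys with the partial key Idate.
CREATE TABLE INTERVIEW (
    Ssn   TEXT REFERENCES CANDIDATE(Ssn),
    Eid   TEXT REFERENCES EMPLOYER(Eid),
    Jid   TEXT REFERENCES JOB_OFFER(Jid),
    Idate TEXT,
    PRIMARY KEY (Ssn, Eid, Jid, Idate)
);
""")
conn.execute("INSERT INTO CANDIDATE VALUES ('111', 'Ana')")
conn.execute("INSERT INTO EMPLOYER VALUES ('E1', 'Acme')")
conn.execute("INSERT INTO JOB_OFFER VALUES ('J1', 'Engineer')")
conn.execute("INSERT INTO INTERVIEW VALUES ('111', 'E1', 'J1', '2024-01-15')")
```

An INTERVIEW row cannot exist without identifying a candidate, an employer, and a job offer, which mirrors the weak entity's dependence on its identifying relationship.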
Chapter 4, Problem 17E
Problem
Consider the BANK ER schema in Figure, and suppose that it is necessary to keep track of
different types of ACCOUNTS (SAVINGS_ACCTS, CHECKING_ACCTS, …) and LOANS
(CAR_LOANS, HOME_LOANS, …). Suppose that it is also desirable to keep track of each
ACCOUNT’S TRANSACTIONS (deposits, withdrawals, checks, …) and each LOAN's
PAYMENTS; both of these include the amount, date, and time. Modify the BANK schema, using
ER and EER concepts of specialization and generalization. State any assumptions you make
about the additional requirements.
An ER diagram for the BANK database schema
Step-by-step solution
Step 1 of 2
Following are the assumptions:
• There are only three types of accounts: SAVINGS, CURRENT, and CHECKING accounts.
• There are only three types of loans: CAR loans, HOME loans, and PERSONAL loans.
• Each user can perform any number of transactions on an account.
• A loan can be repaid in any number of payments.
• Each transaction and payment has a unique id.
Comment
Step 2 of 2
The modified enhanced entity relationship diagram is as follows:
Comment
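One common way to realize the ACCOUNT specialization relationally is a table per subclass that shares the superclass key. The sketch below uses hypothetical attribute names (Interest_rate, Overdraft_limit) and records each transaction's amount, date, and time as the problem requires; it is one possible mapping, not the textbook's answer:

```python
# Sketch of the ACCOUNT specialization mapped to tables: each subclass table
# reuses the superclass primary key (illustrative names, not a fixed schema).
import sqlite3

bank = sqlite3.connect(":memory:")
bank.executescript("""
CREATE TABLE ACCOUNT (
    Acct_no INTEGER PRIMARY KEY,
    Balance REAL
);
-- Subclasses of ACCOUNT share its key (EER specialization).
CREATE TABLE SAVINGS_ACCT (
    Acct_no INTEGER PRIMARY KEY REFERENCES ACCOUNT(Acct_no),
    Interest_rate REAL
);
CREATE TABLE CHECKING_ACCT (
    Acct_no INTEGER PRIMARY KEY REFERENCES ACCOUNT(Acct_no),
    Overdraft_limit REAL
);
-- Each account's transactions record amount, date, and time.
CREATE TABLE ACCT_TRANSACTION (
    Acct_no INTEGER REFERENCES ACCOUNT(Acct_no),
    Tdate TEXT, Ttime TEXT, Amount REAL,
    PRIMARY KEY (Acct_no, Tdate, Ttime)
);
""")
bank.execute("INSERT INTO ACCOUNT VALUES (1, 500.0)")
bank.execute("INSERT INTO SAVINGS_ACCT VALUES (1, 0.03)")
bank.execute("INSERT INTO ACCT_TRANSACTION VALUES (1, '2024-05-01', '09:30', 120.0)")
```

The LOAN hierarchy and its PAYMENTS would follow the same pattern, with a PAYMENT table keyed on the loan number plus date and time.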
Chapter 4, Problem 18E
Problem
The following narrative describes a simplified version of the organization of Olympic facilities
planned for the summer Olympics. Draw an EER diagram that shows the entity types, attributes,
relationships, and specializations for this application. State any assumptions you make. The
Olympic facilities are divided into sports complexes. Sports complexes are divided into one-sport
and multisport types. Multisport complexes have areas of the complex designated for each sport
with a location indicator (e.g., center, NE corner, and so on). A complex has a location, chief
organizing individual, total occupied area, and so on. Each complex holds a series of events
(e.g., the track stadium may hold many different races). For each event there is a planned date,
duration, number of participants, number of officials, and so on. A roster of all officials will be
maintained together with the list of events each official will be involved in. Different equipment is
needed for the events (e.g., goal posts, poles, parallel bars) as well as for maintenance. The two
types of facilities (one-sport and multisport) will have different types of information. For each type,
the number of facilities needed is kept, together with an approximate budget.
Step-by-step solution
Step 1 of 3
In the EER diagram,
• “Rectangle box” denotes entity.
• “Diamond-shaped” symbol represents the relationship.
• “Oval” symbol connected with attribute represents the attribute.
Comment
Step 2 of 3
The following is the EER diagram for the organization of Olympic facilities planned for the
summer Olympics.
Comment
Step 3 of 3
Explanation:
• The Olympic facilities are divided into sports complexes, and the sports complexes are divided
into one-sport and multisport types.
• There exists a HOLDS relationship between the Complex and Event entities; a complex holds a
number of events.
• Each event is assigned one or more officials.
• Both complex and event have equipment: a complex maintains maintenance equipment, and an
event has event equipment.
Comment
Chapter 4, Problem 19E
Problem
Identify all the important concepts represented in the library database case study described
below. In particular, identify the abstractions of classification (entity types and relationship types),
aggregation, identification, and specialization/generalization. Specify (min, max) cardinality
constraints whenever possible. List details that will affect the eventual design but that have no
bearing on the conceptual design. List the semantic constraints separately. Draw an EER
diagram of the library database.
Case Study: The Georgia Tech Library (GTL) has approximately 16,000 members, 100,000
titles, and 250,000 volumes (an average of 2.5 copies per book). About 10% of the volumes are
out on loan at any one time. The librarians ensure that the books that members want to borrow
are available when the members want to borrow them. Also, the librarians must know how many
copies of each book are in the library or out on loan at any given time. A catalog of books is
available online that lists books by author, title, and subject area. For each title in the library, a
book description is kept in the catalog; the description ranges from one sentence to several
pages. The reference librarians want to be able to access this description when members
request information about a book. Library staff includes chief librarian, departmental associate
librarians, reference librarians, check-out staff, and library assistants.
Books can be checked out for 21 days. Members are allowed to have only five books out at a
time. Members usually return books within three to four weeks. Most members know that they
have one week of grace before a notice is sent to them, so they try to return books before the
grace period ends. About 5% of the members have to be sent reminders to return books. Most
overdue books are returned within a month of the due date. Approximately 5% of the overdue
books are either kept or never returned. The most active members of the library are defined as
those who borrow books at least ten times during the year. The top 1% of membership does 15%
of the borrowing, and the top 10% of the membership does 40% of the borrowing. About 20% of
the members are totally inactive in that they are members who never borrow.
To become a member of the library, applicants fill out a form including their SSN, campus and
home mailing addresses, and phone numbers. The librarians issue a numbered, machine-readable card with the member's photo on it. This card is good for four years. A month before a
card expires, a notice is sent to a member for renewal. Professors at the institute are considered
automatic members. When a new faculty member joins the institute, his or her information is
pulled from the employee records and a library card is mailed to his or her campus address.
Professors are allowed to check out books for three-month intervals and have a two-week grace
period. Renewal notices to professors are sent to their campus address.
The library does not lend some books, such as reference books, rare books, and maps. The
librarians must differentiate between books that can be lent and those that cannot be lent. In
addition, the librarians have a list of some books they are interested in acquiring but cannot
obtain, such as rare or out-of-print books and books that were lost or destroyed but have not
been replaced. The librarians must have a system that keeps track of books that cannot be lent
as well as books that they are interested in acquiring. Some books may have the same title;
therefore, the title cannot be used as a means of identification. Every book is identified by its
International Standard Book Number (ISBN), a unique international code assigned to all books.
Two books with the same title can have different ISBNs if they are in different languages or have
different bindings (hardcover or softcover). Editions of the same book have different ISBNs.
The proposed database system must be designed to keep track of the members, the books, the
catalog, and the borrowing activity.
Step-by-step solution
Step 1 of 2
Entity Types:
1. LIBRARY_MEMBER
2. BOOK
3. STAFF_MEMBER
Relationship types:
1. ISSUE_CARD
2. ISSUE_NOTICE
3. ISSUE_BOOK
4. GET_DESCRIPTION
Aggregation:
1. All entity types are aggregations of their constituent attributes, as can be seen from the EER diagram.
2. Relationship types that have member attributes (see figure) are also aggregations.
Identification:
1. All entity types and relationship types are identified by name.
2. Entities of each entity type are identified by:
a. LIBRARY_MEMBER: Ssn
b. BOOK: Key(Title, Bind, Language, ISBN)
c. STAFF_MEMBER: Ssn
Specialization/generalization:
1. Specialization of STAFF_MEMBER on the basis of Designation. This is a partial, disjoint
specialization.
2. Specialization of BOOK on the basis of In_Library. This is a total, disjoint specialization.
3. Specialization of IN_LIBRARY_BOOK on the basis of Can_be_rented. This is a total, disjoint
specialization.
Other constraints that may arise in the future:
1. Fine that will be charged for a lost card.
2. Expiry period of a lost card.
Comment
Step 2 of 2
3. Privileges that may be granted to a particular group of users.
4. Book description might change with new editions.
5. Fine that will be charged for a damaged book.
Comment
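Since the BOOK specializations above are total and disjoint, they can be sketched relationally with discriminator columns guarded by CHECK constraints. The column names below are illustrative, following the identification section above:

```python
# Sketch: enforcing the total, disjoint BOOK specializations with
# discriminator columns and CHECK constraints (illustrative names).
import sqlite3

lib = sqlite3.connect(":memory:")
lib.execute("""
CREATE TABLE BOOK (
    ISBN TEXT PRIMARY KEY,
    Title TEXT,
    In_library INTEGER NOT NULL CHECK (In_library IN (0, 1)),  -- total, disjoint
    Can_be_rented INTEGER CHECK (Can_be_rented IN (0, 1))      -- for in-library books
)
""")
lib.execute("INSERT INTO BOOK VALUES ('978-0', 'DB Systems', 1, 1)")
```

Because the specialization is total and disjoint, every book falls into exactly one subclass, so a single discriminator value per row is sufficient; an invalid discriminator is rejected by the CHECK constraint.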
Chapter 4, Problem 20E
Problem
Design a database to keep track of information for an art museum. Assume that the following
requirements were collected:
■ The museum has a collection of ART_OBJECTS. Each ART_OBJECT has a unique Id_no, an
Artist (if known), a Year (when it was created, if known), a Title, and a Description. The art
objects are categorized in several ways, as discussed below.
■ ART_OBJECTS are categorized based on their type. There are three main types—PAINTING,
SCULPTURE, and STATUE—plus another type called OTHER to accommodate objects that do
not fall into one of the three main types.
■ A PAINTING has a Paint_type (oil, watercolor, etc.), material on which it is Drawn_on (paper,
canvas, wood, etc.), and Style (modern, abstract, etc.).
■ A SCULPTURE or a STATUE has a Material from which it was created (wood, stone, etc.),
Height, Weight, and Style.
■ An art object in the OTHER category has a Type (print, photo, etc.) and Style.
■ ART_OBJECTs are categorized as either PERMANENT_COLLECTION (objects that are
owned by the museum) or BORROWED. Information captured about objects in the
PERMANENT_COLLECTION includes Date_acquired, Status (on display, on loan, or stored), and
Cost. Information captured about BORROWED objects includes the Collection from which it was
borrowed, Date_borrowed, and Date_returned.
■ Information describing the country or culture of Origin (Italian, Egyptian, American, Indian, and
so forth) and Epoch (Renaissance, Modern, Ancient, and so forth) is captured for each
ART_OBJECT.
■ The museum keeps track of ARTIST information, if known: Name, Date_born (if known),
Date_died (if not living), Country_of_origin, Epoch, Main_style, and Description. The Name is
assumed to be unique.
■ Different EXHIBITIONS occur, each having a Name, Start_date, and End_date. EXHIBITIONS
are related to all the art objects that were on display during the exhibition.
■ Information is kept on other COLLECTIONS with which the museum interacts; this information
includes Name (unique), Type (museum, personal, etc.), Description, Address, Phone, and
current Contact_person.
Draw an EER schema diagram for this application. Discuss any assumptions you make, and then
justify your EER design choices.
Step-by-step solution
Step 1 of 2
Consider the art museum requirements above to create the EER diagram.
The following are the assumptions:
• An ARTIST can create any number of ART_OBJECTS.
• An ART_OBJECT may be displayed in an exhibition.
• Many ART_OBJECTS can be displayed in many EXHIBITIONS.
Comment
Step 2 of 2
The EER schema diagram for the art museum database is as follows:
Comment
Chapter 4, Problem 21E
Problem
Figure shows an example of an EER diagram for a small-private-airport database; the database
is used to keep track of airplanes, their owners, airport employees, and pilots. From the
requirements for this database, the following information was collected: Each AIRPLANE has a
registration number [Reg#], is of a particular plane type [OF_TYPE], and is stored in a particular
hangar [STORED_IN]. Each PLANE_TYPE has a model number [Model], a capacity [Capacity],
and a weight [Weight]. Each HANGAR has a number [Number], a capacity [Capacity], and a
location [Location]. The database also keeps track of the OWNERs of each plane [OWNS] and
the EMPLOYEES who have maintained the plane [MAINTAIN]. Each relationship instance in
OWNS relates an AIRPLANE to an OWNER and includes the purchase date [Pdate]. Each
relationship instance in MAINTAIN relates an EMPLOYEE to a service record [SERVICE]. Each
plane undergoes service many times; hence, it is related by [PLANE_SERVICE] to a number of
SERVICE records. A SERVICE record includes as attributes the date of maintenance [Date], the
number of hours spent on the work [Hours], and the type of work done [Work_code]. We use a
weak entity type [SERVICE] to represent airplane service, because the airplane registration
number is used to identify a service record. An OWNER is either a person or a corporation.
Hence, we use a union type (category) [OWNER] that is a subset of the union of corporation
[CORPORATION] and person [PERSON] entity types. Both pilots [PILOT] and employees
[EMPLOYEE] are subclasses of PERSON. Each PILOT has specific attributes license number
[Lic_num] and restrictions [Restr]; each EMPLOYEE has specific attributes salary [Salary] and
shift worked [Shift]. All PERSON entities in the database have data kept on their Social Security
number [Ssn], name [Name], address [Address], and telephone number [Phone]. For
CORPORATION entities, the data kept includes name [Name], address [Address], and telephone
number [Phone]. The database also keeps track of the types of planes each pilot is authorized to
fly [FLIES] and the types of planes each employee can do maintenance work on [WORKS_ON].
Show how the SMALL_AIRPORT EER schema in Figure 4.12 may be represented in UML
notation. (Note: We have not discussed how to represent categories (union types) in UML, so
you do not have to map the categories in this and the following question.)
EER schema for a SMALL_AIRPORT database.
Step-by-step solution
Step 1 of 2
Consider the EER schema for a SMALL_AIRPORT database. The following is the UML diagram
that represents the SMALL_AIRPORT database.
Comment
Step 2 of 2
Each entity and relationship is shown in the UML diagram. In the provided EER diagram, there
is a union type (category) specified for OWNER: the OWNER is a subset of the union of
CORPORATION and PERSON. The categories are not mapped in the UML diagram, as specified.
Comment
Chapter 4, Problem 22E
Problem
Show how the UNIVERSITY EER schema in Figure 4.9 may be represented in UML notation.
Step-by-step solution
Step 1 of 2
• An entity-relationship diagram represents the relationships between different entities and
their attributes.
The entities can be people, objects, etc.
• UML refers to the Unified Modeling Language, a modeling language used in the field of
software engineering.
It is very helpful for understanding the design of a system.
Comment
Step 2 of 2
For the given ER diagram, the UML diagram is shown below:
Comment
Chapter 4, Problem 23E
Problem
Consider the entity sets and attributes shown in the following table. Place a checkmark in one
column in each row to indicate the relationship between the far left and far right columns.
a. The left side has a relationship with the right side.
b. The right side is an attribute of the left side.
c. The left side is a specialization of the right side.
d. The left side is a generalization of the right side.
Columns: (a) Has a relationship with; (b) Has an attribute that is; (c) Is a specialization of;
(d) Is a generalization of.
Entity Set              Entity or Attribute
1.  MOTHER              PERSON
2.  DAUGHTER            MOTHER
3.  STUDENT             PERSON
4.  STUDENT             Student_id
5.  SCHOOL              STUDENT
6.  SCHOOL              CLASS_ROOM
7.  ANIMAL              HORSE
8.  HORSE               Breed
9.  HORSE               Age
10. EMPLOYEE            SSN
11. FURNITURE           CHAIR
12. CHAIR               Weight
13. HUMAN               WOMAN
14. SOLDIER             PERSON
15. ENEMY_COMBATANT     PERSON
Step-by-step solution
Step 1 of 2
Relationship between Entity Sets and Attributes
Specialization: Specialization is the process of classifying a class of objects into more
specialized subclasses. For example, a “PERSON” class can be classified into more specialized
subclasses such as MOTHER, STUDENT, SOLDIER, and so on.
Generalization: Generalization is a relationship in which the child class is based on the parent
class. Both child and parent class elements in a generalization relationship must be of the same
type.
Aggregation: It specifies a whole/part relationship between the aggregate (whole) and a
component part. When a class is formed as a collection of other classes, it is called an
aggregation relationship between these classes. It is also called a “has a” relationship.
Inheritance: A child class derives its properties from the parent class. It is also called an “is
a” relationship.
Comment
Step 2 of 2
Consider the entity sets and attributes, and apply one of the relationships to each row.
Entity Sets and Attributes Relationship Table
Comment
Chapter 4, Problem 24E
Problem
Draw a UML diagram for storing a played game of chess in a database. You may look at
http://www.chessgames.com for an application similar to what you are designing. State clearly
any assumptions you make in your UML diagram. A sample of assumptions you can make about
the scope is as follows:
1. The game of chess is played between two players.
2. The game is played on an 8 x 8 board like the one shown below:
3. The players are assigned a color of black or white at the start of the game.
4. Each player starts with the following pieces (traditionally called chessmen):
a. king
b. queen
c. 2 rooks
d. 2 bishops
e. 2 knights
f. 8 pawns
5. Every piece has its own initial position.
6. Every piece has its own set of legal moves based on the state of the game. You do not need to
worry about which moves are or are not legal except for the following issues:
a. A piece may move to an empty square or capture an opposing piece.
b. If a piece is captured, it is removed from the board.
c. If a pawn moves to the last row, it is “promoted” by converting it to another piece (queen, rook,
bishop, or knight).
Note: Some of these functions may be spread over multiple classes.
Step-by-step solution
Step 1 of 1
Assumptions:
1. In any move, at most two pieces can be affected.
2. A player can promote a piece.
3. A piece gets promoted upon reaching the last row.
4. After a move, the captured piece (if any) is removed from the board.
Comment
Chapter 4, Problem 25E
Problem
Draw an EER diagram for a game of chess as described in Exercise. Focus on persistent
storage aspects of the system. For example, the system would need to retrieve all the moves of
every game played in sequential order.
Exercise
Draw a UML diagram for storing a played game of chess in a database. You may look at
http://www.chessgames.com for an application similar to what you are designing. State clearly
any assumptions you make in your UML diagram. A sample of assumptions you can make about
the scope is as follows:
1. The game of chess is played between two players.
2. The game is played on an 8 x 8 board like the one shown below:
3. The players are assigned a color of black or white at the start of the game.
4. Each player starts with the following pieces (traditionally called chessmen):
a. king
b. queen
c. 2 rooks
d. 2 bishops
e. 2 knights
f. 8 pawns
5. Every piece has its own initial position.
6. Every piece has its own set of legal moves based on the state of the game. You do not need to
worry about which moves are or are not legal except for the following issues:
a. A piece may move to an empty square or capture an opposing piece.
b. If a piece is captured, it is removed from the board.
c. If a pawn moves to the last row, it is “promoted” by converting it to another piece (queen, rook,
bishop, or knight).
Note: Some of these functions may be spread over multiple classes.
Step-by-step solution
Step 1 of 1
EER diagram for the chess game
An Enhanced Entity-Relationship diagram extends the ER model with the concepts of superclass
and subclass entity types.
Here the main entity types are PLAYER, MOVE, and PIECE, with attributes such as Name, Color,
Cur_position, Initial_position, Piece_name, Position_before_move, and Changed_position.
Sequence order for game play:
Step 1: A PLAYER makes the first move.
Step 2: The PIECE is moved, and its position changes.
Step 3: The other PLAYER takes the chance and makes the next move.
Step 4: PIECES change position in response to the PLAYER's move.
Step 5: This process continues until the end of the game.
Comment
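The requirement that all moves of a game be retrievable in sequential order can be sketched with a MOVE table keyed on the game and a move number. The schema below is one possible reading of the exercise, with assumed table and column names, not a fixed answer:

```python
# Sketch: persistent chess storage where moves are replayable in order
# (table and column names are assumptions drawn from the exercise).
import sqlite3

chess = sqlite3.connect(":memory:")
chess.executescript("""
CREATE TABLE GAME (Game_id INTEGER PRIMARY KEY, White TEXT, Black TEXT);
CREATE TABLE MOVE (
    Game_id INTEGER REFERENCES GAME(Game_id),
    Move_no INTEGER,                 -- preserves sequential order for replay
    Piece TEXT, From_sq TEXT, To_sq TEXT,
    Captured TEXT,                   -- NULL if no piece was captured
    Promoted_to TEXT,                -- NULL unless a pawn was promoted
    PRIMARY KEY (Game_id, Move_no)
);
""")
chess.execute("INSERT INTO GAME VALUES (1, 'Ana', 'Ben')")
chess.execute("INSERT INTO MOVE VALUES (1, 1, 'pawn', 'e2', 'e4', NULL, NULL)")
chess.execute("INSERT INTO MOVE VALUES (1, 2, 'pawn', 'e7', 'e5', NULL, NULL)")

# Replay: fetch all moves of a game in sequential order.
moves = chess.execute(
    "SELECT Move_no, From_sq, To_sq FROM MOVE WHERE Game_id = 1 ORDER BY Move_no"
).fetchall()
```

Keying MOVE on (Game_id, Move_no) makes MOVE behave like a weak entity owned by GAME, and the ORDER BY on Move_no reconstructs the game exactly as played.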
Chapter 4, Problem 26E
Problem
Which of the following EER diagrams is/are incorrect and why? State clearly any assumptions
you make.
a.
b.
c.
Step-by-step solution
Step 1 of 3
a.
The given EER diagram is correct.
• E is a superclass, and E1 and E2 are subclasses of entity E.
• E1 and E2 are overlapping subclasses of E, indicating that an entity of E may be a member of
E1, of E2, or of both.
• There exists a one-to-many relationship R between E2 and E3.
Comment
Step 2 of 3
b.
The given EER diagram is correct.
• E is a superclass, and E1 and E2 are subclasses of entity E.
• E1 and E2 are disjoint subclasses of E, indicating that an entity of E may be a member of E1 or
E2, but not both.
• There exists a one-to-one relationship R between E1 and E2.
Comment
Step 3 of 3
c.
The given EER diagram is incorrect.
• E1 and E3 are overlapping subclasses of some entity E, indicating that an entity of E may be a
member of E1, of E3, or of both.
• The overlapping subclasses E1 and E3 cannot share the relationship R, so there cannot be a
many-to-many relationship between E1 and E3.
Hence, the given EER diagram is not possible.
Comment
Chapter 4, Problem 27E
Problem
Consider the following EER diagram that describes the computer systems at a company. Provide
your own attributes and key for each entity type. Supply max cardinality constraints justifying
your choice. Write a complete narrative description of what this EER diagram represents.
Step-by-step solution
Step 1 of 5
S.No   Entity Type        Attributes                                         Key
1      COMPUTER           RAM, ROM, Processor, S_no, Manufacturer, Cost      S_no
2      ACCESSORY          S_no, Cost, Type                                   S_no
3      LAPTOP             Weight, Screen_size                                NA
4      DESKTOP            Color                                              NA
5      SOFTWARE           Lic_no, Cost, Manufacturer, Is_system_software,    Lic_no
                          Year_of_manufacturing, Version, Author
6      OPERATING_SYSTEM   Name, Size                                         NA
7      COMPONENT          Manufacturer, S_no, Cost, Type                     S_no
8      KEYBOARD           Type                                               NA
9      MEMORY             Size                                               NA
10     MONITOR            Size, Resolution, Type                             NA
11     MOUSE              Type, Is_wired                                     NA
12     SOUND_CARD         Type                                               NA
13     VIDEO_CARD         Type                                               NA
Comment
Step 2 of 5
S.No   Relationship name   Entity type 1   (min,max)   Entity type 2       (min,max)
1      SOLD_WITH           COMPUTER        (1,1)       ACCESSORY           (1,N)
2      INSTALLED           COMPUTER        (1,1)       SOFTWARE            (1,M)
3      INSTALLED_OS        COMPUTER        (1,1)       OPERATING_SYSTEM    (1,N)
4      MEM_OPTIONS         LAPTOP          (1,1)       MEMORY              (1,N)
5      OPTIONS             DESKTOP         (1,1)       COMPONENT           (1,N)
6      SUPPORTS            SOFTWARE        (1,N)       COMPONENT           (1,M)
Comment
Step 3 of 5
Since all components and accessories are identified by S_no and all software by Lic_no, each
can belong to only a single LAPTOP/DESKTOP/COMPUTER. In contrast, a computer can have
any number of ACCESSORY/SOFTWARE/OPERATING_SYSTEM/COMPONENT/MEMORY
instances.
A SOFTWARE product may need many supporting COMPONENTs, and a COMPONENT can
SUPPORT many SOFTWARE products.
Comment
Step 4 of 5
Narrative description: A database is needed to maintain all computer systems in a company.
Each COMPUTER in the company has a unique S_no and a fixed RAM, ROM, Processor,
Manufacturer, and Cost. A COMPUTER can be a LAPTOP or a DESKTOP. Each LAPTOP has a
Screen_size and Weight; each DESKTOP has a Color. A COMPUTER has many SOFTWARE
products INSTALLED. Each SOFTWARE has a unique Lic_no and an associated Cost,
Manufacturer, Is_system_software flag, Year_of_manufacturing, Version, and Author. An
OPERATING_SYSTEM is also software related to a COMPUTER; it has a name and the size of
memory it consumes.
Comment
Step 5 of 5
With a COMPUTER one can get ACCESSORY items. Each ACCESSORY has a Cost, S_no, and
Type (audio/video/input/output). ACCESSORY is categorized into KEYBOARD (Type),
MOUSE (Type, Is_wired), and MONITOR (Size, Resolution, Type).
Associated with DESKTOP and SOFTWARE we have various COMPONENTs (Manufacturer,
S_no, Cost, Type). COMPONENT is further divided into MEMORY (Size), SOUND_CARD (Type),
and VIDEO_CARD (Type). A LAPTOP can also have memory options (MEM_OPTIONS).
Comment
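The disjoint LAPTOP/DESKTOP specialization of COMPUTER described above can be sketched relationally, together with a query that checks the disjointness constraint. The names follow the tables above but remain illustrative:

```python
# Sketch: COMPUTER with disjoint LAPTOP/DESKTOP subclasses sharing the
# superclass key S_no (names follow the answer's tables, for illustration).
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE COMPUTER (S_no TEXT PRIMARY KEY, Processor TEXT, Cost REAL);
CREATE TABLE LAPTOP  (S_no TEXT PRIMARY KEY REFERENCES COMPUTER(S_no),
                      Weight REAL, Screen_size REAL);
CREATE TABLE DESKTOP (S_no TEXT PRIMARY KEY REFERENCES COMPUTER(S_no),
                      Color TEXT);
""")
db.execute("INSERT INTO COMPUTER VALUES ('C1', 'i7', 1200.0)")
db.execute("INSERT INTO LAPTOP VALUES ('C1', 1.4, 13.3)")

# Disjointness check: no S_no may appear in both subclass tables.
overlap = db.execute(
    "SELECT COUNT(*) FROM LAPTOP JOIN DESKTOP USING (S_no)"
).fetchone()[0]
```

The relational model cannot declare disjointness directly, so a periodic check query (or a trigger) of this form is one way to enforce it.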
Chapter 4, Problem 29LE
Consider an ONLINE AUCTION database system in which members (buyers and sellers) participate in the sale of items. The data
requirements for this system are summarized as follows: The online site has members, each of whom is identified by a unique
member number and is described by an e-mail address, name, password, home address, and phone number. A member may
be a buyer or a seller. A buyer has a shipping address recorded in the database. A seller has a bank account number and
routing number recorded in the database. Items are placed by a seller for sale and are identified by a unique item number
assigned by the system. Items are also described by an item title, a description, starting bid price, bidding increment, the start
date of the auction, and the end date of the auction. Items are also categorized based on a fixed classification hierarchy (for
example, a modem may be classified as COMPUTER → HARDWARE → MODEM). Buyers make bids for items
they are interested in. Bid price and time of bid are recorded. The bidder at the end of the auction with the highest bid price is
declared the winner, and a transaction between buyer and seller may then proceed. The buyer and seller may record feedback
regarding their completed transaction. Feedback contains a rating of the other party participating in the transaction (1-10) and a
comment.

EER diagram for Online Auction Database
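Because a member may be both a buyer and a seller, the MEMBER specialization here is overlapping. A relational sketch (with assumed table and column names) lets the same member number appear in both subclass tables:

```python
# Sketch: overlapping BUYER/SELLER specialization of MEMBER for the
# online auction requirements (illustrative names, not a fixed answer).
import sqlite3

auction = sqlite3.connect(":memory:")
auction.executescript("""
CREATE TABLE MEMBER (
    Member_no INTEGER PRIMARY KEY,
    Email TEXT, Name TEXT, Password TEXT, Home_address TEXT, Phone TEXT
);
CREATE TABLE BUYER (
    Member_no INTEGER PRIMARY KEY REFERENCES MEMBER(Member_no),
    Shipping_address TEXT
);
CREATE TABLE SELLER (
    Member_no INTEGER PRIMARY KEY REFERENCES MEMBER(Member_no),
    Bank_account_no TEXT, Routing_no TEXT
);
""")
# The same member may appear in both subclasses (overlapping constraint).
auction.execute("INSERT INTO MEMBER VALUES (7, 'a@x.com', 'Ana', 'pw', 'Home St', '555')")
auction.execute("INSERT INTO BUYER VALUES (7, 'Ship St')")
auction.execute("INSERT INTO SELLER VALUES (7, 'ACC1', 'RT1')")
```

ITEM, BID, and FEEDBACK tables would reference SELLER and BUYER in the same style, with the auction's classification hierarchy modeled as a self-referencing CATEGORY table.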
Chapter 4, Problem 30LE
Consider a database system for a baseball organization such as the major leagues. The data requirements
are summarized as follows:
The personnel involved in the league include players, coaches, managers, and umpires. Each is identified
by a unique personnel id. They are also described by their first and last names along with the date and
place of birth.
Players are further described by other attributes such as their batting orientation (left, right, or switch) and
have a lifetime batting average (BA).
Within the players group is a subset of players called pitchers. Pitchers have a lifetime ERA (earned run
average) associated with them.
Teams are uniquely identified by their names. Teams are also described by the city in which they are
located and the division and league in which they play (such as the Central division of the American League).
Teams have one manager, a number of coaches, and a number of players.
Games are played between two teams, with one designated as the home team and the other the visiting
team on a particular date. The score (runs, hits, and errors) is recorded for each team. The team with the
most runs is declared the winner of the game.
With each finished game, a winning pitcher and a losing pitcher are recorded. In case there is a save
awarded, the save pitcher is also recorded.
With each finished game, the number of hits (singles, doubles, triples, and home runs) obtained by each
player is also recorded.
Design an enhanced entity-relationship diagram for the BASEBALL database.
Using that EER diagram, model the database in Microsoft Access.
Populate each table with appropriate data.
Your populated Access database with all relationships added (with referential integrity, of course).
If you are a completist, you can find data at ESPN or MLB sites.
The EER model of the Baseball Database is as follows:
Below are the Database tables designed in MS Access for Teams, Managers, Umpires, Players and Pitchers:
For Managers:
For Players:
For Pitchers:
For Umpires:
These all the related tables used to manage the Baseball game with a Master Database as follows:
Master DB: Part1:
Master DB Part2:
Chapter 4, Problem 31LE
Problem
Consider the EER diagram for the UNIVERSITY database shown in Figure 4.9. Enter this design
using a data modeling tool such as ERwin or Rational Rose. Make a list of the differences in
notation between the diagram in the text and the corresponding equivalent diagrammatic notation
you end up using with the tool.
Step-by-step solution
Step 1 of 1
Refer to Figure 4.9 for the EER diagram of the UNIVERSITY database. Use the Rational Rose
tool to create the ER schema for the database as follows:
• In the options available on left, right click on the option Logical view, go to New and select the
option Class Diagram.
• Name the class diagram as UNIVERSITY. Select the option Class available in the toolbar and
then click on empty space of the Class Diagram file. Name the class as FACULTY.
Right click on the class, select the option New Attribute, and name the attribute as Rank.
Similarly, create the other attributes Foffice, Fphone and Salary.
• Similarly create another class GRANT and its attributes Title, No, Agency and St_date.
• Now right click on the attribute No, available on the left under the class GRANT, and select the
option Open Specification. Select the Protected option under Export Control. This will make
the attribute No as primary key.
• Select the option Unidirectional Association from the toolbar, for creating relationships
between the two classes. Now click on the class FACULTY; while holding the click drag the
mouse towards the class GRANT and release the click. This will create the relationship between
the two selected classes.
Name the association as PI. Since the structural constraint in the EER diagram is specified using
a cardinality ratio, specify the structural constraints using the Rational Rose tool as follows:
• Right click on the association close to the class FACULTY and select 1 from the option
Multiplicity.
• Again, right click on the association close to the class GRANT and select n from the option
Multiplicity.
• Similarly, create other classes and their associated attributes. Specify the relationships and
structural constraints between the classes, as mentioned above.
The ER schema may be specified using an alternate diagrammatic notation, the class diagram,
through the Rational Rose tool as follows:
The list of differences in notation between the EER diagram used in the figure 4.9 and its
equivalent diagrammatic notation, drawn through the Rational Rose tool, are as follows:
• In the EER diagram the entities are specified in a rectangle. However, the class diagram in
Rational Rose makes use of top section of the class diagram for specifying the entities.
• The attributes are specified in the EER diagram using the oval. The class diagram in the
Rational Rose makes use of the middle section, for specifying the attributes.
• The primary keys in the EER diagram are specified by underlining the attribute in an oval.
An attribute can be made a primary key in the class diagram in the Rational Rose by selecting
the option Open Specification; followed by selecting the Protected option under Export
Control. A yellow color key against the attribute in the class diagram in the Rational Rose
indicates primary key.
• The relationship between two entities is specified in the diamond shaped box. For example, in
figure 4.9 PI is the relationship between FACULTY and GRANT.
The class diagram in Rational Rose makes use of option Unidirectional Association for
specifying the relation or association between two entities. For example, in the above class
diagram, the association named PI is specified on the line joining the two entities.
• The structural constraint in the EER diagram is specified using cardinality ratio. For example, in
the PI relationship, FACULTY: GRANT is of cardinality ratio 1:N.
In the class diagram made using Rational Rose, the Multiplicity option is used for specifying the
cardinality ratio.
Comment
Chapter 4, Problem 32LE
Problem
Consider the EER diagram for the small AIRPORTdatabase shown in Figure. Build this design
using a data modeling tool such as ERwin or Rational Rose. Be careful how you model the
category OWNER in this diagram. (Hint: Consider using CORPORATION_IS_OWNER and
PERSON_IS_OWNER as two distinct relationship types.)
EER schema for a SMALL_AIRPORT database.
Step-by-step solution
Step 1 of 2
Refer to the figure 4.12 for the EER schema of the SMALL_AIRPORT database. Use the Rational
Rose tool to create the EER schema for the database as follows:
• In the options available on left, right click on the option Logical view, go to New and select the
option Class Diagram.
• Name the class diagram as SMALL_AIRPORT. Select the option Class available in the toolbar
and then click on empty space of the Class Diagram file. Name the class as PLANE_TYPE.
Right click on the class, select the option New Attribute, and name the attribute as Model.
Similarly, create the other attributes Capacity and Weight.
• Now right click on the attribute Model, available on the left under the class PLANE_TYPE, and
select the option Open Specification. Select the Protected option under Export Control. This
will make Model as the primary key.
• Similarly create another class EMPLOYEE and its attributes Salary and Shift.
• Select the option Unidirectional Association from the toolbar, for creating relationships
between the two classes. Now click on the class PLANE_TYPE; while holding the click drag the
mouse towards the class EMPLOYEE and release the click. This will create the relationship
between the two selected classes.
Name the association as WORKS_ON. Since the structural constraint in the EER diagram is
specified using a cardinality ratio, specify the structural constraints using the Rational Rose tool
as follows:
• Right click on the association close to the class PLANE_TYPE and select n from the option
Multiplicity.
• Again, right click on the association close to the class EMPLOYEE and select n from the option
Multiplicity.
• Similarly, create other classes and their associated attributes. Specify the relationships and
structural constraints between the classes, as mentioned above.
The ER schema may be specified using an alternate diagrammatic notation, the class diagram,
through the Rational Rose tool as follows:
Comment
Step 2 of 2
In the above class diagram, OWNER is the superclass, and PERSON and CORPORATION are
the subclasses. The subclasses can further participate in specific relationship types.
For example, in the above class diagram the PERSON subclass participates in the
OWNER_TYPE relationship. The subclass PERSON is further related to an entity type
PERSON_IS_OWNER via the OWNER_TYPE relationship.
Similarly, the subclass CORPORATION is related to CORPORATION_IS_OWNER via the
OWNER_TYPE relationship.
The relationship types can be specified using the Rational Rose as follows:
• Create the subclass PERSON_IS_OWNER of the class PERSON as explained above. Also
create the association between the class PERSON and its subclass PERSON_IS_OWNER and
name it as OWNER_TYPE, as explained above.
• Similarly, create the subclass CORPORATION_IS_OWNER of the class CORPORATION and
name the association between them as OWNER_TYPE.
Comment
Chapter 4, Problem 33LE
Problem
Consider the UNIVERSITY database described in Exercise 3.16. You already developed an ER
schema for this database using a data modeling tool such as ERwin or Rational Rose in Lab
Exercise 3.31. Modify this diagram by classifying COURSES as either
UNDERGRAD_COURSES or GRAD_COURSES and INSTRUCTORS as either
JUNIOR_PROFESSORS or SENIOR_PROFESSORS. Include appropriate attributes for these
new entity types. Then establish relationships indicating that junior instructors teach
undergraduate courses whereas senior instructors teach graduate courses.
Reference Exercise 3.31
Consider the EER diagram for the UNIVERSITY database shown in Figure 4.9. Enter this design
using a data modeling tool such as ERwin or Rational Rose. Make a list of the differences in
notation between the diagram in the text and the corresponding equivalent diagrammatic notation
you end up using with the tool.
Reference Problem 3.16
Which combinations of attributes have to be unique for each individual SECTION entity in the
UNIVERSITY database shown in Figure 3.20 to enforce each of the following miniworld
constraints:
a. During a particular semester and year, only one section can use a particular classroom at a
particular DaysTime value.
b. During a particular semester and year, an instructor can teach only one section at a particular
DaysTime value.
c. During a particular semester and year, the section numbers for sections offered for the same
course must all be different.
Can you think of any other similar constraints?
Step-by-step solution
Step 1 of 1
Refer to the Exercise 3.16 for the UNIVERSITY database and the ER schema developed for this
database through Rational Rose tool. Using Rational Rose, make the required changes and
create the ER schema as follows:
• COURSE is the superclass and UNDERGRAD_COURSES and GRAD_COURSES are its
subclasses. The subclasses are introduced in the class diagram, developed using Rational Rose
tool in Lab Exercise 3.31, via Rational Rose tool as follows:
• Consider the class COURSE developed in Exercise 3.31. Select the option Class available in
the toolbar and then click on empty space of the Class Diagram file. Name the subclass as
UNDERGRAD_COURSES.
Right click on the class, select the option New Attribute, and name the attribute as Title.
Similarly, create the other attribute Department.
Similarly, create another subclass GRAD_COURSES of the class COURSE and its attributes
Title and Department.
• Similarly, create the subclasses JUNIOR_PROFESSORS and SENIOR_PROFESSORS of the
superclass INSTRUCTOR. Also create the attributes Specialization, Designation and
Qualification for these subclasses, as described above.
• The subclass JUNIOR_PROFESSORS is further related to another subclass
UNDERGRAD_COURSES via the TEACHES relationship. Also, the subclass
SENIOR_PROFESSORS is further related to another subclass GRAD_COURSES via the
TEACHES relationship.
The relationship types between the subclass and superclass can be specified using the Rational
Rose as follows:
• Select the option Unidirectional Association from the toolbar, for creating relationships
between the two classes. Now click on the class JUNIOR_PROFESSORS; while holding the
click drag the mouse towards the class UNDERGRAD_COURSES and release the click. This will
create the relationship between the two selected classes.
Name the association as TEACHES.
• Similarly, create the relationship between the classes SENIOR_PROFESSORS and
GRAD_COURSES.
The ER schema with the changes may be specified using an alternate diagrammatic notation, the
class diagram, through the Rational Rose tool as follows:
Comment
Chapter 5, Problem 1RQ
Problem
Define the following terms as they apply to the relational model of data: domain, attribute, n-tuple, relation schema, relation state, degree of a relation, relational database schema, and
relational database state.
Step-by-step solution
Step 1 of 7
1. Domain: Domain is a set of atomic (indivisible) values that can appear in a particular column
in a relational schema. A common method of specifying domain is to specify a data type (integer,
character, floating point, etc...) from which the data values forming a domain can be drawn.
For example: Consider a relational schema called STUDENT that holds facts about students
in a particular course. One such fact is the name of a student; a name must be a character
string, so the domain of Name is the set of character strings.
Comment
Step 2 of 7
2. Attribute: An attribute is the role played by some domain in the relation schema.
For example: In the relational schema STUDENT, Name can be one of the attributes of the relation.
NOTATIONS:
• Relation schema of n attributes: R(A1, A2, …, An)
• Attributes: A1, A2, …, An
• Domain of attribute A1: dom(A1)
• Tuple: t
Comment
Step 3 of 7
3. N-tuple: If a relation schema consists of n attributes (that is, the degree of the relation schema
is n), then an n-tuple is an ordered list of n values t = <v1, v2, …, vn>, where each value vi,
1<=i<=n, is an element of dom(Ai) or is a special NULL value.
For example: In a relational schema STUDENT with four attributes Name, Roll No., Class, and
Rank, the n-tuple for a student can be t = <'Ram', 1, 10, 5>, meaning that student Ram has roll
number 1, studies in class 10, and got rank 5 in class.
4. Relational Schema: A relational schema is a collection of attributes, together with a name, that
defines facts about a real-world entity and the relationships among them. In other words, a
relational schema R, denoted by R(A1, A2, …, An), is made up of a name and a list of attributes
A1, A2, …, An.
For example: STUDENT can be the name of a relational schema, and Name, Roll No., Class, and
Rank can be its four attributes.
Comment
Step 4 of 7
5. Relation state: A relation state r of a relation schema R(A1, A2, …, An) is a set of n-tuples. In
other words, a relation state of a relational schema is a collection of tuples, where each tuple
represents information about a single entity.
For example: In the STUDENT relational schema, the collection of the tuples for two students is a
relation state.
Formal definition: A relation state r(R) is a mathematical relation of degree n on the domains of
all the attributes; it is a subset of the Cartesian product of the domains that define R:
r(R) ⊆ (dom(A1) × dom(A2) × … × dom(An))
Comment
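The formal definition can be checked directly with Python sets; this is a minimal sketch, and the STUDENT domains and tuples below are illustrative, not taken from the text.

```python
# A relation state r(R) is a finite subset of the Cartesian product of the
# attribute domains. The domains and tuples below are illustrative.
from itertools import product

dom_name = {"Ram", "Shyam", "Mohan"}   # dom(Name)
dom_roll = {1, 2, 3}                   # dom(Roll)
dom_rank = {1, 2, 3, 4, 5}             # dom(Rank)

# One possible relation state for STUDENT(Name, Roll, Rank)
r = {("Ram", 1, 5), ("Shyam", 2, 3)}

# The full Cartesian product dom(Name) x dom(Roll) x dom(Rank)
cartesian = set(product(dom_name, dom_roll, dom_rank))

print(r <= cartesian)   # True: the state is a subset of the product
print(len(cartesian))   # 45 = 3 * 3 * 5 possible tuples
```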
Step 5 of 7
6. Degree of a Relation: The degree (or arity) of a relation is the number of attributes n of its
relational schema.
Comment
Step 6 of 7
7. Relational Database Schema: A Relational Database Schema S is a set of relation schemas,
S = { R1,R2,….Rn} and a set of integrity constraints IC.
Comment
Step 7 of 7
8. Relational Database State: A relational database state DB of S is a set of relation states,
DB = {r1, r2, …, rn}, such that each ri is a state of Ri and the ri relation states satisfy the
integrity constraints specified in IC.
Comment
Problem
Chapter 5, Problem 2RQ
Why are tuples in a relation not ordered?
Step-by-step solution
Step 1 of 2
A relation in database management is defined as a set of tuples.
And mathematically, the elements of a set have no order among them.
Comment
Step 2 of 2
Hence, the tuples in a relation are not ordered.
Comment
Chapter 5, Problem 3RQ
Problem
Why are duplicate tuples not allowed in a relation?
Step-by-step solution
Step 1 of 1
Duplicate tuples are not allowed in a relation as it violates the relational integrity constraints.
• A key constraint states that there must be an attribute or combination of attributes in a relation
whose values are unique.
• There should not be any two tuples in a relation whose values are the same for their key
attributes.
• If the tuples contain duplicate values, the key constraint is violated.
Hence, duplicate tuples are not allowed in a relation.
Comment
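A minimal SQLite sketch of the key constraint at work (the STUDENT table and values are illustrative): declaring a primary key makes the DBMS itself reject a duplicate tuple.

```python
# Sketch: a primary key declaration makes the DBMS reject duplicate tuples.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Ram')")

try:
    conn.execute("INSERT INTO student VALUES (1, 'Ram')")  # duplicate key value
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

print(duplicate_allowed)  # False: the key constraint forbids the duplicate
```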
Chapter 5, Problem 4RQ
Problem
What is the difference between a key and a superkey?
Step-by-step solution
Step 1 of 2
A super key SK is a set of attributes that uniquely identifies the tuples of a relation. It satisfies the
uniqueness constraint.
A key K is an attribute or set of attributes that uniquely identifies the tuples of a relation and is a
minimal super key. In other words, when any attribute is removed from a key, the remaining set of
attributes is no longer a super key.
Comment
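The minimality test that separates a key from a super key can be sketched in plain Python; the toy EMPLOYEE tuples and attribute names below are illustrative.

```python
# Plain-Python sketch of the super key / key distinction on a toy relation.
rows = [
    {"ssn": "111", "name": "Ann", "dept": 5},
    {"ssn": "222", "name": "Bob", "dept": 5},
    {"ssn": "333", "name": "Ann", "dept": 4},
]

def is_superkey(attrs):
    """attrs satisfy the uniqueness constraint over all tuples."""
    values = [tuple(row[a] for a in attrs) for row in rows]
    return len(values) == len(set(values))

def is_key(attrs):
    """A key is a minimal super key: dropping any attribute breaks uniqueness."""
    return is_superkey(attrs) and all(
        not is_superkey([a for a in attrs if a != removed]) for removed in attrs
    )

print(is_superkey(["ssn", "name"]))  # True: uniquely identifies tuples...
print(is_key(["ssn", "name"]))       # False: ...but it is not minimal
print(is_key(["ssn"]))               # True: ssn alone is a (candidate) key
```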
Step 2 of 2
The differences between a key and a super key are as follows:
• Every key is a super key, but not every super key is a key.
• A super key may contain attributes that are unnecessary for unique identification; a key is
minimal, so removing any attribute from it destroys the uniqueness property.
• A relation typically has many super keys but only a few candidate keys.
Comment
Chapter 5, Problem 5RQ
Problem
Why do we designate one of the candidate keys of a relation to be the primary key?
Step-by-step solution
Step 1 of 1
Every relation must contain an attribute or combination of attributes that can be used to uniquely
identify each tuple in the relation.
• An attribute or combination of attributes that can be used to uniquely identify each tuple in a
relation is known as a candidate key.
• A relation can have more than one candidate key.
• Among the several candidate keys, one that is usually single-attribute and simple is chosen
as the primary key.
• A primary key is an attribute that uniquely identifies each tuple in a relation.
Comment
Chapter 5, Problem 6RQ
Problem
Discuss the characteristics of relations that make them different from ordinary tables and files.
Step-by-step solution
Step 1 of 2
The tables, relations, and files are the key concepts of the relational data model. A relation
resembles a table, but it has some added constraints to it to use the link between two tables in
an efficient way.
A file is basically a collection of records or a table stored on a physical device.
Comment
Step 2 of 2
Even though both a relation and a table are used to store and represent data, there are
differences between them:
• A relation cannot contain duplicate tuples, whereas an ordinary table or file may contain
duplicate rows or records.
• The tuples of a relation have no inherent order, whereas the rows of a table or the records of a
file are stored in a particular physical order.
• Each value in a relation is atomic, whereas an ordinary table or file may hold composite or
multivalued entries.
Comment
Chapter 5, Problem 7RQ
Problem
Discuss the various reasons that lead to the occurrence of NULL values in relations.
Step-by-step solution
Step 1 of 2
NULL value:
A NULL value represents the absence of data, that is, “nothing” recorded as an empty value; it is
not the same as “zero” or “blank”.
For example,
if a student does not have a pen or a pencil for the exam,
• then for that particular student, the values of those attributes are defined as NULL.
• A NULL value can mean that the value does not exist, that it is unknown, or that it is not yet
available.
Comment
Step 2 of 2
The occurrence of NULL values in relations:
• An attribute is marked NULL when its value does not apply to the tuple (value not applicable).
• An attribute is marked NULL when a value exists for the tuple but is unknown.
• An attribute is marked NULL when the value is known to exist but is not yet available.
• Because a single NULL marker conveys all of these different meanings, some designs use
different codes to distinguish them.
Comment
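A short SQLite sketch of NULL behavior (the table and values are illustrative): a NULL is stored when a value is unknown or unavailable, and it never compares equal to anything, so `IS NULL` must be used to find it.

```python
# Sketch of NULL semantics in SQLite (illustrative table and values).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, phone TEXT)")
conn.execute("INSERT INTO student VALUES ('Ram', '555-0101')")
conn.execute("INSERT INTO student VALUES ('Shyam', NULL)")  # value unknown/unavailable

# NULL never compares equal to anything, so '= NULL' matches no rows;
# IS NULL is the correct test for a missing value.
print(conn.execute("SELECT COUNT(*) FROM student WHERE phone = NULL").fetchone()[0])   # 0
print(conn.execute("SELECT COUNT(*) FROM student WHERE phone IS NULL").fetchone()[0])  # 1
```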
Chapter 5, Problem 8RQ
Problem
Discuss the entity integrity and referential integrity constraints. Why is each considered
important?
Step-by-step solution
Step 1 of 2
Entity Integrity Constraint: It states that no primary key value can be NULL.
Importance: Primary key values are used to identify a tuple in a relation. Having NULL value for
primary key will mean that we cannot identify some tuples.
Referential Integrity Constraints: It states that a tuple in one relation that refers to another
relation must refer to an existing tuple in that relation
Comment
Step 2 of 2
Definition using Foreign Key: For two relation schemas R1 and R2, a set of attributes FK in
relation schema R1 is a foreign key of R1 that references relation R2 if it satisfies the following
conditions:
• The attributes in FK have the same domain(s) as the primary key attributes PK of R2; the
attributes FK are said to reference relation R2.
• A value of FK in a tuple t1 of the current state r1(R1) either occurs as a value of PK for some
tuple t2 in the current state r2(R2) or is NULL. In the former case (t1[FK] = t2[PK]), tuple t1 is
said to refer to tuple t2.
When these two conditions hold true between R1 the referencing relation and R2 the referenced
relation the referential integrity constraint is said to hold true.
Importance: Referential Integrity constraints are specified among two relations and are used to
maintain consistency among tuples in two relations.
Comment
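The constraint can be demonstrated with a minimal SQLite sketch (illustrative DEPARTMENT/EMPLOYEE tables, not the book's figures); note that foreign-key checking is off by default in SQLite and must be enabled with a PRAGMA.

```python
# Sketch: with foreign-key checking enabled, SQLite enforces referential
# integrity and rejects a tuple that refers to a nonexistent tuple.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.execute("CREATE TABLE department (dnumber INTEGER PRIMARY KEY, dname TEXT)")
conn.execute(
    "CREATE TABLE employee (ssn TEXT PRIMARY KEY, "
    "dno INTEGER REFERENCES department(dnumber))"
)
conn.execute("INSERT INTO department VALUES (5, 'Research')")
conn.execute("INSERT INTO employee VALUES ('123456789', 5)")  # valid reference

try:
    conn.execute("INSERT INTO employee VALUES ('987654321', 2)")  # no dept 2
    violation_allowed = True
except sqlite3.IntegrityError:
    violation_allowed = False

print(violation_allowed)  # False: the dangling reference is rejected
```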
Chapter 5, Problem 9RQ
Problem
Define foreign key. What is this concept used for?
Step-by-step solution
Step 1 of 2
A foreign key is an attribute, or a combination of attributes, of one relation that matches the
primary key of another relation and is used to maintain a relationship between the two relations.
• A relation can have more than one foreign key.
• A foreign key can contain null values.
Comment
Step 2 of 2
The concept of foreign key is used to maintain referential integrity constraint between two
relations and hence in maintaining consistency among tuples in two relations.
• The value of a foreign key must match a value of the primary key in the referenced relation.
• A value to a foreign key cannot be added which does not exist in the primary key of the
referenced relation.
• It is not possible to delete a tuple from the referenced relation if there is any matching record in
the referencing relation.
Comment
Chapter 5, Problem 10RQ
Problem
What is a transaction? How does it differ from an Update operation?
Step-by-step solution
Step 1 of 2
A transaction is a program in execution that involves various operations that can be done on the
database.
The operations that are included in a transaction are as follows:
• Reading data from the database.
• Deleting a tuple from the database.
• Inserting new tuples to the database
• Updating values of existing tuples in the database.
Comment
Step 2 of 2
The main differences between an update operation and a transaction are as follows:
• An update operation changes attribute values of existing tuples; it is a single operation.
• A transaction can combine several operations: reading data from the database, and inserting,
deleting, and updating tuples.
Comment
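The difference can be sketched with SQLite (an illustrative `account` table, not the book's data): the transaction groups several update operations so they commit or roll back as one unit, whereas a lone UPDATE is a single operation.

```python
# Illustrative sketch: a transaction groups several update operations so
# they succeed or fail together; a lone UPDATE is a single operation.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

with conn:  # one transaction: both updates commit together, or neither does
    conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
    conn.execute("UPDATE account SET balance = balance + 30 WHERE id = 2")

try:
    with conn:  # a failure mid-transaction rolls back every change in it
        conn.execute("UPDATE account SET balance = balance - 999 WHERE id = 1")
        raise RuntimeError("simulated failure before the matching credit")
except RuntimeError:
    pass

print(dict(conn.execute("SELECT id, balance FROM account")))  # {1: 70, 2: 80}
```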
Chapter 5, Problem 11E
Problem
Suppose that each of the following Update operations is applied directly to the database state
shown in Figure 5.6. Discuss all integrity constraints violated by each operation, if any, and the
different ways of enforcing these constraints.
a. Insert <‘Robert’, ‘F’ ‘Scott’, ‘943775543’, ‘1972-06-21’, ‘2365 Newcastle Rd, Bellaire, TX’, M,
58000, ‘888665555’, 1> into EMPLOYEE.
b. Insert <‘ProductA’, 4, ‘Bellaire’, 2> into PROJECT.
c. Insert <‘Production’, 4, ‘943775543’, ‘2007-10-01’> into DEPARTMENT.
d. Insert <‘677678989’, NULL, ‘40.0’> into WORKS_ON.
e. Insert <‘453453453’, ‘John’, ‘M’, ‘1990-12-12’, ‘spouse’> into DEPENDENT.
f. Delete the WORKS_ON tuples with Essn = ‘333445555’.
g. Delete the EMPLOYEE tuple with Ssn = ‘987654321’..
h. Delete the PROJECT tuple with Pname = ‘ProductX’.
i. Modify the Mgr_ssn and Mgr_start_date of the DEPARTMENT tuple with Dnumber = 5 to
‘123456789’ and ‘2007-10-01’, respectively.
j. Modify the Super_ssn attribute of the EMPLOYEE tuple with Ssn = ‘999887777’ to
‘943775543’.
k .Modify the Hours attribute of the WORKS_ON tuple with Essn = ‘999887777’ and Pno = 10 to
‘5.0’.
Step-by-step solution
Step 1 of 11
(a)
Acceptable operation.
Comment
Step 2 of 11
(b)
Not acceptable. Violates the referential integrity constraint, because the department number, a
foreign key, does not exist in the DEPARTMENT relation.
Ways of enforcing the constraint:
• Rejecting the operation and explaining the cause to the user.
• Inserting a NULL value in the department field and performing the operation.
• Prompting the user to insert a department with number 2 into the DEPARTMENT relation and
then performing the operation.
Comment
Step 3 of 11
(c)
Not acceptable. Violates the key constraint: a department with number 4 already exists. Ways of
enforcing the constraint:
• Rejecting the operation and explaining the cause to the user.
Comment
Step 4 of 11
(d)
Not acceptable. Violates both the entity integrity constraint and the referential integrity constraint.
The value of one of the attributes of the primary key is NULL, and the Essn value is not present in
the referenced relation EMPLOYEE.
Ways of enforcing the constraint:
• Rejecting the operation and explaining the cause to the user.
• Prompting the user to specify correct values for the primary key and then performing the
operation.
Comment
Step 5 of 11
(e)
Acceptable
Comment
Step 6 of 11
(f)
Acceptable
Comment
Step 7 of 11
(g)
Not acceptable.
Violates the referential integrity constraint, because the Ssn value is referenced as a foreign key
in the WORKS_ON, EMPLOYEE, DEPENDENT, and DEPARTMENT relations; deleting the record
with Ssn = ‘987654321’ would leave dangling references in WORKS_ON.
Ways of enforcing the constraint:
• Rejecting the operation and explaining the cause to the user.
• Deleting the corresponding records in the referencing tables as well (cascading the delete).
Comment
Step 8 of 11
(h)
Not acceptable.
Violates the referential integrity constraint: the Pnumber value is referenced as a foreign key in
the WORKS_ON relation, and deleting the record with Pname = ‘ProductX’ also deletes the
project with Pnumber = 1, which is referenced in WORKS_ON. Ways of enforcing the constraint:
• Rejecting the operation and explaining the cause to the user.
• Deleting the corresponding records in the referencing tables as well (cascading the delete).
Comment
Step 9 of 11
(i)
Acceptable.
Comment
Step 10 of 11
(j)
Not acceptable.
Violates the referential integrity constraint, because Super_ssn is a foreign key referencing the
EMPLOYEE relation itself. Since no employee with Ssn = ‘943775543’ exists, the Super_ssn of
an employee cannot be set to ‘943775543’.
Ways of enforcing the constraint:
• Rejecting the operation and explaining the cause to the user.
• Prompting the user either to add a record with Ssn = ‘943775543’ to the EMPLOYEE relation or
to change Super_ssn to some valid value.
Comment
Step 11 of 11
(k)
Acceptable.
Comment
Chapter 5, Problem 12E
Problem
Consider the AIRLINE relational database schema shown in Figure, which describes a database
for airline flight information. Each FLIGHT is identified by a Flight_number, and consists of one or
more FLIGHT_LEGs with Leg_numbers 1, 2, 3, and so on. Each FLIGHT_LEG has scheduled
arrival and departure times, airports, and one or more LEG_INSTANCEs—one for each Date on
which the flight travels. FAREs are kept for each FLIGHT. For each FLIGHT_LEG instance,
SEAT_RESERVATIONs are kept, as are the AIRPLANE used on the leg and the actual arrival
and departure times and airports. An AIRPLANE is identified by an Airplane_id and is of a
particular AIRPLANE_TYPE. CAN_LAND relates AIRPLANE_TYPEs to the AIRPORTs at which
they can land. An AIRPORT is identified by an Airport_code. Consider an update for the AIRLINE
database to enter a reservation on a particular flight or flight leg on a given date.
a. Give the operations for this update.
b. What types of constraints would you expect to check?
c. Which of these constraints are key, entity integrity, and referential integrity constraints, and
which are not?
d. Specify all the referential integrity constraints that hold on the schema shown in Figure.
The AIRLINE relational database schema.
Step-by-step solution
Step 1 of 4
a.
First, it is necessary to check whether seats are available on the particular flight or flight leg on
the given date. This can be done by checking the LEG_INSTANCE relation.
SELECT Number_of_available_seats FROM LEG_INSTANCE
WHERE Flight_number ='FL01' and Date='2000-06-07';
If the Number_of_available_seats>0, then perform the following operation to reserve a seat.
INSERT INTO SEAT_RESERVATION VALUES
('FL01', '1', '2000-06-07', '1', 'John','9910110110');
Comment
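A sketch of how the check and the insert above can be grouped into one transaction, using SQLite; the table definitions are simplified (the Date column is renamed `leg_date` here), and the data values are illustrative.

```python
# Sketch of the reservation update as one transaction (SQLite; simplified
# schema, Date renamed to leg_date, data values illustrative).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE leg_instance (flight_number TEXT, leg_number TEXT, "
    "leg_date TEXT, number_of_available_seats INTEGER)"
)
conn.execute(
    "CREATE TABLE seat_reservation (flight_number TEXT, leg_number TEXT, "
    "leg_date TEXT, seat_number TEXT, customer_name TEXT, customer_phone TEXT)"
)
conn.execute("INSERT INTO leg_instance VALUES ('FL01', '1', '2000-06-07', 2)")

with conn:  # check seat availability and reserve as one atomic unit
    (seats,) = conn.execute(
        "SELECT number_of_available_seats FROM leg_instance "
        "WHERE flight_number = ? AND leg_date = ?",
        ("FL01", "2000-06-07"),
    ).fetchone()
    if seats > 0:
        conn.execute(
            "INSERT INTO seat_reservation VALUES (?, ?, ?, ?, ?, ?)",
            ("FL01", "1", "2000-06-07", "1", "John", "9910110110"),
        )
        conn.execute(
            "UPDATE leg_instance SET number_of_available_seats = "
            "number_of_available_seats - 1 "
            "WHERE flight_number = ? AND leg_date = ?",
            ("FL01", "2000-06-07"),
        )

print(conn.execute(
    "SELECT number_of_available_seats FROM leg_instance").fetchone()[0])  # 1
```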
Step 2 of 4
b.
The constraints that need to be checked in order to perform the update are as follows:
• Check that Number_of_available_seats in the LEG_INSTANCE relation for the particular flight
on the particular date is at least 1.
• Check whether the particular Seat_number on the particular flight and date is still available.
Comment
Step 3 of 4
c.
Checking the Number_of_available_seats in LEG_INSTANCE relation does not come under
entity or referential integrity constraint.
Checking for SEAT_NUMBER particular flight on the particular date comes under entity integrity
constraint.
Comment
Step 4 of 4
d.
A referential integrity constraint specifies that the value of a foreign key must match a value of
the primary key in the referenced relation.
The referential integrity constraints that hold are as follows:
• Flight_number of FLIGHT_LEG relation is a foreign key which references the Flight_number of
FLIGHT relation.
• Flight_number of LEG_INSTANCE is a foreign key which references the Flight_number of
FLIGHT relation.
• Flight_number of FARE is a foreign key which references the Flight_number of FLIGHT relation.
• Flight_number of SEAT_RESERVATION is a foreign key which references the Flight_number of
FLIGHT relation.
• Departure_airport_code and Arrival_airport_code of FLIGHT_LEG are foreign keys that
reference the Airport_code of the AIRPORT relation.
• Departure_airport_code and Arrival_airport_code of LEG_INSTANCE are foreign keys that
reference the Airport_code of the AIRPORT relation.
• Airport_code of CAN_LAND is a foreign key that references the Airport_code of the AIRPORT
relation.
• Flight_number and Leg_number of LEG_INSTANCE are foreign keys that reference the
Flight_number and Leg_number of FLIGHT_LEG.
• Airplane_id of LEG_INSTANCE is a foreign key that references the Airplane_id of the
AIRPLANE relation.
• Flight_number, Leg_number, and Date of SEAT_RESERVATION are foreign keys that reference
the Flight_number, Leg_number, and Date of the LEG_INSTANCE relation.
• Airplane_type_name of CAN_LAND is a foreign key which references the Airplane_type_name
of AIRPLANE_TYPE relation.
Comment
Chapter 5, Problem 13E
Problem
Consider the relation CLASS(Course#, Univ_Section#, Instructor_name, Semester,
Building_code, Room#, Time_period, Weekdays, Credit_hours). This represents classes taught
in a university, with unique Univ_section#s. Identify what you think should be various candidate
keys, and write in your own words the conditions or assumptions under which each candidate
key would be valid.
Step-by-step solution
Step 1 of 2
The relation CLASS describes the classes taught in a university, with unique Univ_Section#s.
As per the CLASS relation, the following are the possible candidate keys:
1. {Univ_Section#} – if a section number is unique throughout all the semesters.
2. {Instructor_name, Semester} – if at most one course is taught by an instructor in each
semester.
3. {Building_code, Room#, Time_period, Weekdays, Semester} – if, at a given time in a specific
semester, the same room cannot be used by more than one course.
Comment
Step 2 of 2
4. {Univ_Section#, Semester} – this would be the candidate key if Univ_Section# is not unique
across semesters; for example, when more than one university is considered, the section
numbers used depend on the rules of each university.
5. Otherwise, if Univ_Section# is unique only within a course, then all the sections of a course
are assigned unique numbers throughout the semester, and {Course#, Univ_Section#,
Semester} is a candidate key.
Comment
Chapter 5, Problem 14E
Problem
Consider the following six relations for an order-processing database application in a company:
CUSTOMER(Cust#, Cname, City)
ORDER(Order#, Odate, Cust#, Ord_amt)
ORDER_ITEM(Order#, Item#, Qty)
ITEM(Item#, Unit_price)
SHIPMENT(Order#, Warehouse#, Ship_date)
WAREHOUSE(Warehouse#, City)
Here, Ord_amt refers to total dollar amount of an order; Odate is the date the order was placed;
and Ship_date is the date an order (or part of an order) is shipped from the warehouse. Assume
that an order can be shipped from several warehouses. Specify the foreign keys for this schema,
stating any assumptions you make. What other constraints can you think of for this database?
Step-by-step solution
Step 1 of 2
Foreign Keys:
a. Cust# of ORDER is FK for CUSTOMER: orders are taken from recognized customers only.
b. Order# of ORDER_ITEM is FK of ORDER.
c. Item# of ORDER_ITEM is FK of ITEM: Orders are taken only for items in stock.
d. Order# of SHIPMENT is FK of ORDER: Shipment is done only for orders taken.
e. Warehouse# of SHIPMENT is FK of WAREHOUSE: shipment is done only from the company's
warehouses.
Comment
Step 2 of 2
Other Constraints:
• Ship_date must be greater than (i.e., a later date than) Odate in ORDER: an order must be
taken before it is shipped.
• Ord_amt should equal the total of Qty × Unit_price over the ORDER_ITEM rows of the order.
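The first constraint above spans two relations, so a plain CHECK clause is not enough; a trigger is one way to enforce it. A hedged sketch in Python with SQLite follows, with table and column names adapted (ORDERS for ORDER, order_no for Order#) because ORDER is a reserved word; ISO-8601 date strings compare correctly as text.

```python
import sqlite3

# ORDERS stands in for ORDER (a reserved word), order_no for Order#;
# only the columns needed for the constraint are kept.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE ORDERS   (order_no INTEGER PRIMARY KEY, odate TEXT);
CREATE TABLE SHIPMENT (order_no INTEGER REFERENCES ORDERS,
                       warehouse_no INTEGER, ship_date TEXT);

-- Reject any shipment dated before the order was placed.
CREATE TRIGGER ship_after_order BEFORE INSERT ON SHIPMENT
WHEN NEW.ship_date < (SELECT odate FROM ORDERS WHERE order_no = NEW.order_no)
BEGIN
    SELECT RAISE(ABORT, 'shipment predates order');
END;

INSERT INTO ORDERS VALUES (1, '2024-01-10');
""")

con.execute("INSERT INTO SHIPMENT VALUES (1, 3, '2024-01-12')")      # OK: later date
try:
    con.execute("INSERT INTO SHIPMENT VALUES (1, 3, '2024-01-05')")  # predates order
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```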
Comment
Chapter 5, Problem 15E
Problem
Consider the following relations for a database that keeps track of business trips of salespersons
in a sales office:
SALESPERSON(Ssn, Name, Start_year, Dept_no)
TRIP(Ssn, From_city, To_city, Departure_date, Return_date, Trip id)
EXPENSE(Trip id, Account#, Amount)
A trip can be charged to one or more accounts. Specify the foreign keys for this schema, stating
any assumptions you make.
Step-by-step solution
Step 1 of 3
A foreign key is a column, or a combination of columns, that matches the primary key of another
table and is used to maintain the relationship between the two tables.
• A foreign key is mainly used for establishing relationship between two tables.
• A table can have more than one foreign key.
Comment
Step 2 of 3
The foreign keys in the given relations are as follows:
• Ssn is a foreign key in TRIP relation. It references the Ssn of SALESPERSON relation.
• Trip_id is a foreign key in EXPENSE relation. It references the Trip_id of TRIP relation.
Comment
Step 3 of 3
Assume that there are additional tables that store the department information and account
details. Then the possible foreign keys are as follows:
• Dept_no is a foreign key in SALESPERSON relation.
• Account# is a foreign key in EXPENSE relation.
Comment
Chapter 5, Problem 16E
Problem
Consider the following relations for a database that keeps track of student enrollment in courses
and the books adopted for each course:
STUDENT(Ssn, Name, Major, Bdate)
COURSE(Course#, Cname, Dept)
ENROLL(Ssn, Course#, Quarter, Grade)
BOOK_ADOPTION(Course#, Quarter, Book_isbn)
TEXT(Book_isbn, Book_title, Publisher, Author)
Specify the foreign keys for this schema, stating any assumptions you make.
Step-by-step solution
Step 1 of 2
A foreign key is a column, or a combination of columns, that matches the primary key of another
table and is used to maintain the relationship between the two tables.
• A foreign key is mainly used for establishing relationship between two tables.
• A table can have more than one foreign key.
Comment
Step 2 of 2
The foreign keys in the given relations are as follows:
• Ssn is a foreign key in the ENROLL table which references the Ssn of the STUDENT table. Ssn
is a primary key in the STUDENT table.
• Course# is a foreign key in the ENROLL table which references the Course# of the COURSE
table. Course# is a primary key in the COURSE table.
• Course# is a foreign key in the BOOK_ADOPTION table which references the Course# of the
COURSE table. Course# is a primary key in the COURSE table.
• Book_isbn is a foreign key in the BOOK_ADOPTION table which references the Book_isbn of
the TEXT table. Book_isbn is a primary key in the TEXT table.
Comment
Chapter 5, Problem 17E
Problem
Consider the following relations for a database that keeps track of automobile sales in a car
dealership (OPTION refers to some optional equipment installed on an automobile):
CAR(Serial no, Model, Manufacturer, Price)
OPTION(Serial_no, Option_name, Price)
SALE(Salesperson_id, Serial_no, Date, Sale_price)
SALESPERSON(Salesperson_id, Name, Phone)
First, specify the foreign keys for this schema, stating any assumptions you make. Next, populate
the relations with a few sample tuples, and then give an example of an insertion in the SALE and
SALESPERSON relations that violates the referential integrity constraints and of another
insertion that does not.
Step-by-step solution
Step 1 of 4
Foreign keys are:
a. Serial_no of OPTION is FK for CAR: optional equipment can be installed only on cars with a
valid serial number.
b. Serial_no of SALE is FK for CAR: only a car with a valid serial number can be put up for sale.
c. Salesperson_id of SALE is FK for SALESPERSON: a sale must be made by a known
salesperson.
Comments (2)
Step 2 of 4
Consider a relation schema state:
CAR:
Serial_no  Model  Manufacturer  Price (lakh)
1          1987   Ford          7
2          1998   Tata          4
3          1988   Ferrari       20
4          1952   Ford          2
OPTION:
Serial_no  Option_name  Price
2          Abc          200
4          def          400
Comment
Step 3 of 4
SALESPERSON:
Salesperson_id  Name   Phone
Sl1             Ram    9910101010
Sl2             John   9999999999
Sl3             Mario  9090909090
SALE:
Salesperson_id  Serial_no  Date        Sale_price (lakh)
Sl1             1          2000-06-07  7.5
Sl2             2          2000-06-08  4.1
Comment
Step 4 of 4
Insertion into SALE that violates the referential integrity constraint:
Insert <'Sl4', '5', '2000-07-07', '21'> into SALE.
Both Salesperson_id and Serial_no are invalid: neither salesperson 'Sl4' nor car 5 exists.
Insertion into SALE that does not violate the referential integrity constraint:
Insert <'Sl1', '4', '2000-09-07', '2.1'> into SALE.
An insertion into SALESPERSON cannot violate the referential integrity constraint, since
SALESPERSON has no foreign keys. A valid insertion for SALESPERSON can be:
Insert <’Sl4’, ‘Jack’,’9190000000’> into SALESPERSON.
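The valid and invalid SALE insertions above can be replayed against a real engine. A minimal sketch in Python with SQLite (column names adapted to lowercase identifiers); note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite has FK checks off by default
con.executescript("""
CREATE TABLE CAR (serial_no INTEGER PRIMARY KEY, model TEXT,
                  manufacturer TEXT, price REAL);
CREATE TABLE SALESPERSON (salesperson_id TEXT PRIMARY KEY, name TEXT, phone TEXT);
CREATE TABLE SALE (salesperson_id TEXT REFERENCES SALESPERSON,
                   serial_no INTEGER REFERENCES CAR,
                   sale_date TEXT, sale_price REAL);
INSERT INTO CAR VALUES (4, '1952', 'Ford', 2);
INSERT INTO SALESPERSON VALUES ('Sl1', 'Ram', '9910101010');
""")

# Valid insertion: both foreign keys resolve to existing rows.
con.execute("INSERT INTO SALE VALUES ('Sl1', 4, '2000-09-07', 2.1)")

# Invalid insertion: salesperson 'Sl4' and car 5 do not exist.
try:
    con.execute("INSERT INTO SALE VALUES ('Sl4', 5, '2000-07-07', 21)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```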
Comment
Chapter 5, Problem 18E
Problem
Database design often involves decisions about the storage of attributes. For example, a Social
Security number can be stored as one attribute or split into three attributes (one for each of the
three hyphen-delineated groups of numbers in a Social Security number—XXX-XX-XXXX).
However, Social Security numbers are usually represented as just one attribute. The decision is
based on how the database will be used. This exercise asks you to think about specific situations
where dividing the SSN is useful.
Step-by-step solution
Step 1 of 2
Usually during the database design, the social security number (SSN) is stored as single
attribute.
• SSN is made up of 9 digits divided into three parts.
• The format of SSN is XXX-XX-XXXX.
• Each part is separated by a hyphen.
• The first part represents the area number.
• The second part represents the group number.
• The third part represents the serial number.
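Splitting the single SSN attribute into its three parts is a one-line operation; a minimal Python sketch (the helper name split_ssn is just for illustration):

```python
# split_ssn is a hypothetical helper: it breaks the XXX-XX-XXXX format into
# its area, group, and serial parts.
def split_ssn(ssn: str) -> dict:
    area, group, serial = ssn.split("-")
    return {"area": area, "group": group, "serial": serial}

print(split_ssn("123-45-6789"))  # {'area': '123', 'group': '45', 'serial': '6789'}
```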
Comment
Step 2 of 2
The situations where it is preferable to store the SSN as parts instead of as a single attribute are
as follows:
• The area number determines the location or state. In some cases, it is necessary to group the
data based on the location to generate statistical information.
• Analogously, for phone numbers, the area code (or city code) is required, and sometimes the
country code is needed, for dialing international phone numbers.
• Each part has its own independent existence.
Comment
Chapter 5, Problem 19E
Problem
Consider a STUDENT relation in a UNIVERSITY database with the following attributes (Name,
Ssn, Local_phone, Address, Cell_phone, Age, Gpa). Note that the cell phone may be from a
different city and state (or province) from the local phone. A possible tuple of the relation is
shown below:
Name                         Ssn          Local_phone  Address                          Cell_phone  Age  Gpa
George Shaw William Edwards  123-45-6789  555-1234     123 Main St., Anytown, CA 94539  555-4321    19   3.75
a. Identify the critical missing information from the Local_phone and Cell_phone attributes. (Hint:
How do you call someone who lives in a different state or province?)
b. Would you store this additional information in the Local_phone and Cell_phone attributes or
add new attributes to the schema for STUDENT?
c. Consider the Name attribute. What are the advantages and disadvantages of splitting this field
from one attribute into three attributes (first name, middle name, and last name)?
d. What general guideline would you recommend for deciding when to store information in a
single attribute and when to split the information?
e. Suppose the student can have between 0 and 5 phones. Suggest two different designs that
allow this type of information.
Step-by-step solution
Step 1 of 5
a. State, province or city code is missing from phone number information.
Comment
Step 2 of 5
b. Since the cell phone and the local phone can be from different cities or states, the additional
information must be added to the Local_phone and Cell_phone attributes.
Comment
Step 3 of 5
c. If Name is split into First_name, Middle_name, and Last_name attributes, there can be the
following advantages:
• Sorting can be done on the basis of first name, last name, or middle name.
Disadvantages:
• Splitting one attribute into three may increase the number of NULL values in the database (if
some students do not have a middle name).
• Extra memory will be consumed for storing NULL values of attributes that do not exist for a
particular student (e.g., middle name).
Comment
Step 4 of 5
d. Guidelines to decide when to store information in a single attribute:
• When storing information in separate attributes would create NULL values, a single attribute
must be preferred.
• When atomicity cannot be maintained with a single attribute, separate attributes must be used.
• When information needs to be sorted on the basis of some sub-field of an attribute, or when a
sub-field is needed for decision making, the single attribute must be split into many.
e.
Comment
Step 5 of 5
First Design:
• STUDENT(Name, Ssn, Phone_number_count, Address, Age, Gpa)
• PHONE(Ssn, Phone_number)
Second Design:
• STUDENT(Name, Ssn, Phone_number1, Phone_number2, Phone_number3, Phone_number4,
Phone_number5, Address, Age, Gpa)
Although the schema can be designed in either of the two ways, the first design is better than the
second because it leaves a smaller number of NULL values.
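The first design can be sketched directly. A minimal Python/SQLite sketch (lowercase column names are adaptations of the schema above); a student with no phones simply has no PHONE rows, so no NULL padding is stored.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE STUDENT (ssn TEXT PRIMARY KEY, name TEXT,
                      address TEXT, age INTEGER, gpa REAL);
CREATE TABLE PHONE (ssn TEXT REFERENCES STUDENT,
                    phone_number TEXT,
                    PRIMARY KEY (ssn, phone_number));
INSERT INTO STUDENT VALUES ('123-45-6789', 'George Shaw William Edwards',
                            '123 Main St., Anytown, CA 94539', 19, 3.75);
INSERT INTO PHONE VALUES ('123-45-6789', '555-1234');
INSERT INTO PHONE VALUES ('123-45-6789', '555-4321');
""")

# Count phones per student: between 0 and 5 rows, with no NULL padding.
count = con.execute("SELECT COUNT(*) FROM PHONE WHERE ssn = ?",
                    ('123-45-6789',)).fetchone()[0]
print(count)  # 2
```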
Comment
Chapter 5, Problem 20E
Problem
Recent changes in privacy laws have disallowed organizations from using Social Security
numbers to identify individuals unless certain restrictions are satisfied. As a result, most U.S.
universities cannot use SSNs as primary keys (except for financial data). In practice, Student_id,
a unique identifier assigned to every student, is likely to be used as the primary key rather than
SSN since Student_id can be used throughout the system.
a. Some database designers are reluctant to use generated keys (also known as surrogate keys)
for primary keys (such as Student_id) because they are artificial. Can you propose any natural
choices of keys that can be used to identify the student record in a UNIVERSITY database?
b. Suppose that you are able to guarantee uniqueness of a natural key that includes last name.
Are you guaranteed that the last name will not change during the lifetime of the database? If last
name can change, what solutions can you propose for creating a primary key that still includes
last name but remains unique?
c. What are the advantages and disadvantages of using generated (surrogate) keys?
Step-by-step solution
Step 1 of 1
(a)
Some operation on the student's name and the (original) local and cell phone numbers can
jointly be used to generate an id for the student.
For example:
first name + initials of the name + '_' + last name + '_' + digits of the
local phone number + sum of the digits of the cell phone number + '_' + an
increasing record counter.
For example, for the record above, let it be the 57th entry into the system.
We can have a unique identifier such as:
GeorgeGWE_Edwards_555-123430_57.
Assumptions: each student has a different local number unless they share the same address,
and two students with the same address will not have the same name.
Some hash operation can also be applied to various fields to generate the key.
(b)
No: a last name can change (for example, after marriage), so a natural key that includes it is not
guaranteed to stay stable. One solution is to include a column for the original last name (the last
name at the time the record was created) and build the key on that, since it never changes.
(c)
Advantages of Surrogate keys:
Immutability:
• Surrogate keys do not change while the row exists. This has two advantages:
• Database applications won't lose their "handle" on the row when the data changes.
• Many database systems do not support cascading updates of keys across foreign keys of
related tables, which makes it difficult to modify primary key data; an immutable surrogate key
avoids this problem.
Flexibility for changing requirements
Because of changing requirements, the attributes that uniquely identify an entity might change. In
that case, the attribute(s) initially chosen as the natural key will no longer be a suitable natural
key.
Example :
An employee ID is chosen as the natural key of an employee DB. Because of a merger with
another company, new employees from the merged company must be inserted, who have
conflicting IDs (as their IDs were independently generated when the companies were separate).
In these cases, generally a new attribute must be added to the natural key (e.g. an attribute
"original_company"). With a surrogate key, only the table that defines the surrogate key must be
changed. With natural keys, all tables (and possibly other, related software) that use the natural
key will have to change. More generally, in some problem domains it is simply not clear what
might be a suitable natural key. Surrogate keys avoid problems from choosing a natural key that
later turns out to be incorrect.
Performance
Often surrogate keys are composed of a compact data type, such as a four-byte integer. This
allows the database to query faster than it could with multiple, wider columns.
• A non-redundant distribution of keys causes the resulting b-tree index to be completely
balanced.
• If the natural key is a compound key, joining is more expensive as there are multiple columns to
compare. Surrogate keys are always contained in a single column.
Compatibility
Several database application development systems, drivers, and object-relational mapping
systems, such as Ruby on Rails or Hibernate (Java), depend on the use of integer or GUID
surrogate keys in order to support database-system-agnostic operations and object-to-row
mapping.
Disadvantages of surrogate keys:
Disassociation
Because the surrogate key is completely unrelated to the data of the row to
which it is attached, the key is disassociated from that row. Disassociated
keys are unnatural to the application's world, resulting in an additional level of
indirection from which to audit.
Query Optimization
Relational databases assume a unique index is applied to a table's primary
key. The unique index serves two purposes: 1) to enforce entity integrity—
primary key data must be unique across rows—and 2) to quickly search for
rows queried. Since surrogate keys replace a table's identifying attributes—the
natural key—and since the identifying attributes are likely to be those queried,
then the query optimizer is forced to perform a full table scan when fulfilling
likely queries. The remedy to the full table scan is to apply a (non-unique)
index on each of the identifying attributes. However, these additional indexes
will take up disk space, slow down inserts, and slow down deletes.
Normalization
The presence of a surrogate key can result in the database administrator
forgetting to establish, or accidentally removing, a secondary unique index on
the natural key of the table. Without a unique index on the natural key,
duplicate rows are likely to appear and are difficult to identify.
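The remedy described above, a surrogate primary key plus a secondary unique index on the natural key, can be sketched as follows. A hedged Python/SQLite sketch; the EMPLOYEE table and its (company, badge_no) natural key are hypothetical.

```python
import sqlite3

# Hypothetical EMPLOYEE table: integer surrogate primary key plus a
# UNIQUE constraint on the natural key so duplicate rows cannot slip in.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE EMPLOYEE (
    emp_id   INTEGER PRIMARY KEY,   -- surrogate key
    company  TEXT NOT NULL,
    badge_no TEXT NOT NULL,         -- natural key from each source company
    name     TEXT,
    UNIQUE (company, badge_no))""")  # secondary unique index on the natural key

con.execute("INSERT INTO EMPLOYEE (company, badge_no, name) VALUES ('A', '007', 'Ann')")
try:
    con.execute("INSERT INTO EMPLOYEE (company, badge_no, name) VALUES ('A', '007', 'Bob')")
except sqlite3.IntegrityError as e:
    print("duplicate natural key rejected:", e)
```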
Business Process Modeling
Because surrogate keys are unnatural, flaws can appear when modeling the
business requirements. Business requirements, relying on the natural key,
then need to be translated to the surrogate key.
Inadvertent Disclosure
Proprietary information may be leaked if sequential key generators are used.
By subtracting a previously generated sequential key from a recently
generated sequential key, one could learn the number of rows inserted during
that time period. This could expose, for example, the number of transactions
or new accounts per period. The solution to the inadvertent disclosure
problem is to generate a random primary key. However, a randomly generated
primary key must be queried before assigned to prevent duplication and cause
an insert rejection.
Inadvertent Assumptions
Sequentially generated surrogate keys create the illusion that events with a
higher primary key value occurred after events with a lower primary key value.
This illusion would appear when an event is missed during the normal data
entry process and is, instead, inserted after subsequent events were
previously inserted. The solution to the inadvertent assumption problem is to
generate a random primary key. However, a randomly generated primary key
must be queried before assigned to prevent duplication and cause an insert
rejection.
Comment
Chapter 6, Problem 1RQ
Problem
How do the relations (tables) in SQL differ from the relations defined formally in Chapter 3?
Discuss the other differences in terminology. Why does SQL allow duplicate tuples in a table or in
a query result?
Step-by-step solution
Step 1 of 1
SQL allows a table (relation) to have two or more tuples that are identical in all their attribute
values. Hence, in general, an SQL table is not a set of tuples, because a set does not allow two
identical members; rather, it is a multiset of tuples. Some SQL relations are constrained to be
sets because a key constraint has been declared or because the DISTINCT option has been
used in the SELECT statement.
In contrast, a formally defined relation is a set of tuples; that is, duplicate tuples are not allowed.
Correspondence between ER and Relational Model can help in understanding other differences
in terminology:
ER Model                        Relational Model
Entity type                     Entity relation
1:1 or 1:N relationship type    Foreign key (or relationship relation)
M:N relationship type           Relationship relation and two foreign keys
n-ary relationship type         Relationship relation and n foreign keys
Simple attribute                Attribute
Composite attribute             Set of simple component attributes
Multivalued attribute           Relation and foreign key
Value set                       Domain
Key attribute                   Primary (or secondary) key
SQL allows duplicate tuples for the following reasons:
1. Duplicate elimination is an expensive operation.
2. The user may want to see duplicate tuples in the result of a query.
3. When an aggregate function is applied to tuples, in most cases the user does not want to
remove duplicates.
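The multiset behavior and the DISTINCT remedy can be observed in any SQL engine; a minimal Python/SQLite sketch:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE T (x INTEGER);          -- no key declared: duplicates allowed
INSERT INTO T VALUES (1), (1), (2);
""")

rows_all = con.execute("SELECT x FROM T ORDER BY x").fetchall()
rows_distinct = con.execute("SELECT DISTINCT x FROM T ORDER BY x").fetchall()
print(rows_all)       # [(1,), (1,), (2,)] -- the table is a multiset
print(rows_distinct)  # [(1,), (2,)]       -- DISTINCT restores set semantics
```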
Comment
Chapter 6, Problem 2RQ
Problem
List the data types that are allowed for SQL attributes.
Step-by-step solution
Step 1 of 1
List of data types allowed for SQL attributes: The basic data types available for attributes are
Numeric data types
Character string
Bit string
Boolean
Date and time.
Comment
Chapter 6, Problem 3RQ
Problem
How does SQL allow implementation of the entity integrity and referential integrity constraints
described in Chapter 3? What about referential triggered actions?
Step-by-step solution
Step 1 of 6
An entity integrity constraint specifies that every table must have a primary key and the primary
key should contain unique values and cannot contain null values.
SQL allows implementation of the entity integrity constraint using PRIMARY KEY clause.
• The PRIMARY KEY clause must be specified at the time of creating a table.
• It ensures that no duplicate values are inserted into the table.
Comment
Step 2 of 6
Following are the examples to illustrate how the entity integrity constraint is implemented in SQL:
CREATE TABLE BOOKS
(BOOK_CODE INT PRIMARY KEY,
BOOK_TITLE VARCHAR(20),
BOOK_PRICE INT );
In the table BOOKS, BOOK_CODE is a primary key.
CREATE TABLE AUTHOR
(AUTHOR_ID INT PRIMARY KEY,
AUTHOR_NAME VARCHAR(20));
In the table AUTHOR, AUTHOR_ID is a primary key.
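The BOOKS example above can be exercised to watch the entity integrity constraint reject bad rows. A minimal Python/SQLite sketch; NOT NULL is spelled out explicitly because SQLite, as a legacy quirk, otherwise tolerates NULLs in a primary key.

```python
import sqlite3

# The BOOKS table from the answer above, run in SQLite.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE BOOKS (
    BOOK_CODE  INT PRIMARY KEY NOT NULL,
    BOOK_TITLE VARCHAR(20),
    BOOK_PRICE INT)""")
con.execute("INSERT INTO BOOKS VALUES (1, 'DB Basics', 250)")

# Both a duplicate key and a NULL key violate entity integrity.
for bad_row in [(1, 'Duplicate code', 100), (None, 'Missing code', 100)]:
    try:
        con.execute("INSERT INTO BOOKS VALUES (?, ?, ?)", bad_row)
    except sqlite3.IntegrityError as e:
        print("rejected:", e)
```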
Comment
Step 3 of 6
A foreign key is an attribute, or a combination of attributes, that matches the primary key of
another table and is used to maintain the relationship between the two tables.
A referential integrity constraint specifies that the value of a foreign key should match with value
of the primary key in the primary table.
SQL allows implementation of the referential integrity constraint using FOREIGN KEY clause.
• The FOREIGN KEY clause must be specified at the time of creating a table.
• It ensures that it is not possible to add a value to a foreign key which does not exist in the
primary key of the primary/linked table.
Comment
Step 4 of 6
Following is the example to illustrate how the referential integrity constraint is implemented in
SQL:
CREATE TABLE BOOKSTORE
(BOOK_CODE INT FOREIGN KEY REFERENCES BOOKS(BOOK_CODE),
AUTHOR_ID INT FOREIGN KEY REFERENCES AUTHOR(AUTHOR_ID),
BOOK_TYPE VARCHAR(20),
PRIMARY KEY(BOOK_CODE, AUTHOR_ID));
In the table BOOKSTORE, BOOK_CODE, AUTHOR_ID together form the primary key.
BOOK_CODE is a foreign key which refers the BOOK_CODE of table BOOKS.
AUTHOR_ID is a foreign key which refers the AUTHOR_ID of table AUTHOR.
The use of the foreign key BOOK_CODE is that it is not possible to add a tuple to BOOKSTORE
table unless there is a valid BOOK_CODE in the BOOKS table.
The use of the foreign key AUTHOR_ID is that it is not possible to add a tuple to BOOKSTORE
table unless there is a valid AUTHOR_ID in the AUTHOR table.
Comment
Step 5 of 6
When a foreign key is violated, the default action performed by the SQL is to reject the operation.
• Instead of rejecting the operation, it is possible to add a REFERENTIAL TRIGGERED ACTION
clause to the foreign key, which will automatically set the foreign key to NULL, set it to a default
value, or cascade the change.
• The options provided along with REFERENTIAL TRIGGERED ACTION are SET NULL, SET
DEFAULT, and CASCADE.
• A qualifier ON DELETE or ON UPDATE must be specified along with the options.
Comment
Step 6 of 6
Following is the example to illustrate how the referential triggered action is implemented in SQL:
CREATE TABLE EMPLOYEE
(EMPNO INT PRIMARY KEY,
ENAME VARCHAR(20),
JOB VARCHAR(20),
SALARY INT,
MANAGER INT FOREIGN KEY REFERENCES EMPLOYEE(EMPNO)
ON DELETE SET NULL);
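The EMPLOYEE example above can be run almost verbatim in SQLite (its syntax places REFERENCES directly on the column, and foreign keys must be switched on). A minimal Python sketch:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # needed for the action to fire
con.execute("""CREATE TABLE EMPLOYEE (
    EMPNO   INTEGER PRIMARY KEY,
    ENAME   VARCHAR(20),
    MANAGER INTEGER REFERENCES EMPLOYEE(EMPNO) ON DELETE SET NULL)""")
con.execute("INSERT INTO EMPLOYEE VALUES (1, 'Boss', NULL)")
con.execute("INSERT INTO EMPLOYEE VALUES (2, 'Worker', 1)")

# Deleting the manager does not reject the delete; the reference is nulled.
con.execute("DELETE FROM EMPLOYEE WHERE EMPNO = 1")
print(con.execute("SELECT EMPNO, MANAGER FROM EMPLOYEE").fetchall())  # [(2, None)]
```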
Comment
Chapter 6, Problem 4RQ
Problem
Describe the four clauses in the syntax of a simple SQL retrieval query. Show what type of
constructs can be specified in each of the clauses. Which are required and which are optional?
Step-by-step solution
Step 1 of 1
The four clauses in the syntax of a simple SQL retrieval query:
The following are the four clauses of a simple SQL retrieval query.
Select:
• It lists the attributes or expressions to be retrieved and, together with the From clause, extracts
the data from the database in a human-readable format.
• The select clause is required.
From:
• The From clause is used in combination with the Select statement for retrieving the data.
• It tells the database which table(s) to retrieve the data from; multiple tables can be mentioned
in the from clause.
• It is required.
Where:
• It is used to impose conditions on the query and remove the rows or tuples that do not satisfy
the condition.
• More than one condition can be used in the where clause.
• It is optional.
Order By:
• This clause is used to sort the values of the output either in ascending order or descending
order.
• The default value of the Order By is ascending order.
• This clause is also optional.
Example of a simple SQL query:
SELECT * FROM employee WHERE empno = 10 ORDER BY empno DESC;
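The four clauses can be put together in one runnable query; a minimal Python/SQLite sketch with a hypothetical EMPLOYEE table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE EMPLOYEE (empno INTEGER PRIMARY KEY, ename TEXT, salary INTEGER);
INSERT INTO EMPLOYEE VALUES (10, 'Ann', 500), (20, 'Bob', 900), (30, 'Cid', 700);
""")

rows = con.execute("""
    SELECT ename, salary        -- required: which attributes to retrieve
    FROM EMPLOYEE               -- required: which table(s) to use
    WHERE salary > 600          -- optional: filter condition
    ORDER BY salary DESC        -- optional: sort order
""").fetchall()
print(rows)  # [('Bob', 900), ('Cid', 700)]
```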
Comment
Chapter 6, Problem 5E
Problem
Consider the database shown in Figure 1.2, whose schema is shown in Figure 2.1. What are the
referential integrity constraints that should hold on the schema? Write appropriate SQL DDL
statements to define the database.
Step-by-step solution
Step 1 of 2
From Figure 1.2 in the textbook, the referential integrity constraints that should hold are listed
using the following notation:
R.(A1, ..., An) --> S.(B1, ..., Bn)
This represents a foreign key from the attributes A1, ..., An of the referencing relation R
to the referenced relation S:
PREREQUISITE.(CourseNumber) --> COURSE.(CourseNumber)
PREREQUISITE.(PrerequisiteNumber) --> COURSE.(CourseNumber)
SECTION.(CourseNumber) --> COURSE.(CourseNumber)
GRADE_REPORT.(StudentNumber) --> STUDENT.(StudentNumber)
GRADE_REPORT.(SectionIdentifier) --> SECTION.(SectionIdentifier)
Comment
Step 2 of 2
SQL statements for above data base.
CREATE TABLE STUDENT ( Name VARCHAR(30) NOT NULL,
StudentNumber INTEGER NOT NULL, Class CHAR NOT NULL, Major CHAR(4),
PRIMARY KEY (StudentNumber) );
CREATE TABLE COURSE ( CourseName VARCHAR(30) NOT NULL,
CourseNumber CHAR(8) NOT NULL, CreditHours INTEGER, Department CHAR(4),
PRIMARY KEY (CourseNumber), UNIQUE (CourseName) );
CREATE TABLE PREREQUISITE ( CourseNumber CHAR(8) NOT NULL,
PrerequisiteNumber CHAR(8) NOT NULL, PRIMARY KEY (CourseNumber,
PrerequisiteNumber), FOREIGN KEY (CourseNumber) REFERENCES
COURSE (CourseNumber), FOREIGN KEY (PrerequisiteNumber) REFERENCES
COURSE (CourseNumber) );
CREATE TABLE SECTION ( SectionIdentifier INTEGER NOT NULL,
CourseNumber CHAR(8) NOT NULL, Semester VARCHAR(6) NOT NULL,
Year CHAR(4) NOT NULL, Instructor VARCHAR(15), PRIMARY KEY (SectionIdentifier),
FOREIGN KEY (CourseNumber) REFERENCES
COURSE (CourseNumber) );
CREATE TABLE GRADE_REPORT ( StudentNumber INTEGER NOT NULL,
SectionIdentifier INTEGER NOT NULL, Grade CHAR, PRIMARY KEY (StudentNumber,
SectionIdentifier), FOREIGN KEY (StudentNumber) REFERENCES
STUDENT (StudentNumber), FOREIGN KEY (SectionIdentifier) REFERENCES
SECTION (SectionIdentifier) );
Comment
Chapter 6, Problem 6E
Problem
Repeat Exercise, but use the AIRLINE database schema of Figure.
Exercise
Consider the database shown in Figure 1.2, whose schema is shown in Figure 2.1. What are the
referential integrity constraints that should hold on the schema? Write appropriate SQL DDL
statements to define the database.
The AIRLINE relational database.
Step-by-step solution
Step 1 of 10
The referential integrity constraints below for the AIRLINE database schema are based on
Figure 2.1 from the textbook.
FLIGHT_LEG.(FLIGHT_NUMBER) --> FLIGHT.(NUMBER)
FLIGHT_LEG.(DEPARTURE_AIRPORT_CODE) --> AIRPORT.(AIRPORT_CODE)
FLIGHT_LEG.(ARRIVAL_AIRPORT_CODE) --> AIRPORT.(AIRPORT_CODE)
LEG_INSTANCE.(FLIGHT_NUMBER, LEG_NUMBER) -->
FLIGHT_LEG.(FLIGHT_NUMBER, LEG_NUMBER)
LEG_INSTANCE.(AIRPLANE_ID) --> AIRPLANE.(AIRPLANE_ID)
LEG_INSTANCE.(DEPARTURE_AIRPORT_CODE) --> AIRPORT.(AIRPORT_CODE)
LEG_INSTANCE.(ARRIVAL_AIRPORT_CODE) --> AIRPORT.(AIRPORT_CODE)
FARES.(FLIGHT_NUMBER) --> FLIGHT.(NUMBER)
CAN_LAND.(AIRPLANE_TYPE_NAME) --> AIRPLANE_TYPE.(TYPE_NAME)
CAN_LAND.(AIRPORT_CODE) --> AIRPORT.(AIRPORT_CODE)
AIRPLANE.(AIRPLANE_TYPE) --> AIRPLANE_TYPE.(TYPE_NAME)
SEAT_RESERVATION.(FLIGHT_NUMBER, LEG_NUMBER, DATE) -->
LEG_INSTANCE.(FLIGHT_NUMBER, LEG_NUMBER, DATE)
Comment
Step 2 of 10
CREATE TABLE statements for the database is,
CREATE TABLE AIRPORT (AIRPORT_CODE CHAR (3) NOT NULL, NAME VARCHAR (30) NOT
NULL, CITY VARCHAR (30) NOT NULL, STATE VARCHAR (30), PRIMARY KEY
(AIRPORT_CODE) );
Comment
Step 3 of 10
CREATE TABLE FLIGHT (NUMBER VARCHAR (6) NOT NULL, AIRLINE VARCHAR (20) NOT
NULL, WEEKDAYS VARCHAR (10) NOT NULL, PRIMARY KEY (NUMBER));
Comment
Step 4 of 10
CREATE TABLE FLIGHT_LEG (FLIGHT_NUMBER VARCHAR (6) NOT NULL,
LEG_NUMBER INTEGER NOT NULL, DEPARTURE_AIRPORT_CODE CHAR (3) NOT NULL,
SCHEDULED_DEPARTURE_TIME TIMESTAMP WITH TIME ZONE,
ARRIVAL_AIRPORT_CODE CHAR (3) NOT NULL, SCHEDULED_ARRIVAL_TIME TIMESTAMP
WITH TIME ZONE, PRIMARY KEY (FLIGHT_NUMBER, LEG_NUMBER), FOREIGN KEY
(FLIGHT_NUMBER) REFERENCES FLIGHT (NUMBER), FOREIGN KEY
(DEPARTURE_AIRPORT_CODE) REFERENCES
AIRPORT (AIRPORT_CODE), FOREIGN KEY (ARRIVAL_AIRPORT_CODE) REFERENCES
AIRPORT (AIRPORT_CODE));
Comment
Step 5 of 10
CREATE TABLE LEG_INSTANCE (FLIGHT_NUMBER VARCHAR (6) NOT NULL,
LEG_NUMBER INTEGER NOT NULL, LEG_DATE DATE NOT NULL,
NO_OF_AVAILABLE_SEATS INTEGER, AIRPLANE_ID INTEGER,
DEPARTURE_AIRPORT_CODE CHAR(3), DEPARTURE_TIME TIMESTAMP WITH TIME
ZONE, ARRIVAL_AIRPORT_CODE CHAR(3), ARRIVAL_TIME TIMESTAMP WITH TIME ZONE,
PRIMARY KEY (FLIGHT_NUMBER, LEG_NUMBER, LEG_DATE), FOREIGN KEY
(FLIGHT_NUMBER, LEG_NUMBER) REFERENCES FLIGHT_LEG (FLIGHT_NUMBER,
LEG_NUMBER), FOREIGN KEY (AIRPLANE_ID) REFERENCES
AIRPLANE (AIRPLANE_ID), FOREIGN KEY (DEPARTURE_AIRPORT_CODE) REFERENCES
AIRPORT (AIRPORT_CODE),
FOREIGN KEY (ARRIVAL_AIRPORT_CODE) REFERENCES AIRPORT (AIRPORT_CODE) );
Comment
Step 6 of 10
CREATE TABLE FARES (FLIGHT_NUMBER VARCHAR (6) NOT NULL,
FARE_CODE VARCHAR (10) NOT NULL, AMOUNT DECIMAL (8, 2) NOT NULL,
RESTRICTIONS VARCHAR (200), PRIMARY KEY (FLIGHT_NUMBER, FARE_CODE),
FOREIGN KEY (FLIGHT_NUMBER) REFERENCES FLIGHT (NUMBER) );
Comment
Step 7 of 10
CREATE TABLE AIRPLANE_TYPE (TYPE_NAME VARCHAR (20) NOT NULL,
MAX_SEATS INTEGER NOT NULL, COMPANY VARCHAR (15) NOT NULL,
PRIMARY KEY (TYPE_NAME) );
Comment
Step 8 of 10
CREATE TABLE CAN_LAND (AIRPLANE_TYPE_NAME VARCHAR (20) NOT NULL,
AIRPORT_CODE CHAR (3) NOT NULL, PRIMARY KEY (AIRPLANE_TYPE_NAME,
AIRPORT_CODE), FOREIGN KEY (AIRPLANE_TYPE_NAME) REFERENCES
AIRPLANE_TYPE (TYPE_NAME),
FOREIGN KEY (AIRPORT_CODE) REFERENCES AIRPORT (AIRPORT_CODE) );
Comment
Step 9 of 10
CREATE TABLE AIRPLANE (AIRPLANE_ID INTEGER NOT NULL,
TOTAL_NUMBER_OF_SEATS INTEGER NOT NULL, AIRPLANE_TYPE VARCHAR (20) NOT
NULL, PRIMARY KEY (AIRPLANE_ID),
FOREIGN KEY (AIRPLANE_TYPE) REFERENCES AIRPLANE_TYPE (TYPE_NAME) );
Comment
Step 10 of 10
CREATE TABLE SEAT_RESERVATION (FLIGHT_NUMBER VARCHAR (6) NOT NULL,
LEG_NUMBER INTEGER NOT NULL, LEG_DATE DATE NOT NULL,
SEAT_NUMBER VARCHAR (4), CUSTOMER_NAME VARCHAR (30) NOT NULL,
CUSTOMER_PHONE CHAR (12), PRIMARY KEY (FLIGHT_NUMBER, LEG_NUMBER,
LEG_DATE, SEAT_NUMBER), FOREIGN KEY (FLIGHT_NUMBER, LEG_NUMBER,
LEG_DATE) REFERENCES
LEG_INSTANCE (FLIGHT_NUMBER, LEG_NUMBER, LEG_DATE) );
Comment
Chapter 6, Problem 7E
Problem
Consider the LIBRARY relational database schema shown in Figure. Choose the appropriate
action (reject, cascade, set to NULL, set to default) for each referential integrity constraint, both
for the deletion of a referenced tuple and for the update of a primary key attribute value in a
referenced tuple. Justify your choices.
A relational database scheme for a LIBRARY database.
Step-by-step solution
Step 1 of 7
The appropriate actions for the LIBRARY relational database schema are as follows:
• The REJECT action does not permit automatic changes in the LIBRARY database.
• If a BOOK is deleted, the CASCADE ON DELETE action is automatically propagated to the
rows of the referencing relation BOOK_AUTHORS.
• If a BOOK is updated, the CASCADE ON UPDATE action is automatically propagated to the
rows of the referencing relation BOOK_AUTHORS.
Therefore, the CASCADE ON DELETE and CASCADE ON UPDATE actions are chosen for this
referential integrity constraint.
Comment
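The chosen CASCADE actions can be checked with a quick sketch. The sqlite3 engine is used here only as a convenient in-process database, and the sample book 'B1' and its author row are invented for illustration (SQLite requires foreign keys to be switched on per connection):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE BOOK (BookId TEXT PRIMARY KEY, Title TEXT)")
conn.execute("""CREATE TABLE BOOK_AUTHORS (
    BookId TEXT, AuthorName TEXT,
    PRIMARY KEY (BookId, AuthorName),
    FOREIGN KEY (BookId) REFERENCES BOOK (BookId)
        ON DELETE CASCADE ON UPDATE CASCADE)""")
conn.execute("INSERT INTO BOOK VALUES ('B1', 'SQL Basics')")
conn.execute("INSERT INTO BOOK_AUTHORS VALUES ('B1', 'Jones')")

# Deleting the referenced BOOK row cascades to BOOK_AUTHORS.
conn.execute("DELETE FROM BOOK WHERE BookId = 'B1'")
remaining = conn.execute("SELECT COUNT(*) FROM BOOK_AUTHORS").fetchone()[0]
print(remaining)
```

The author row disappears together with its book, which is exactly the behavior the CASCADE choice is meant to give.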
Step 2 of 7
• Rows in the PUBLISHER relation should not be deleted while they are referenced by rows in
the BOOK table, so the deletion is rejected.
• If a PUBLISHER's name is updated, the CASCADE ON UPDATE action automatically
propagates the new name to the matching rows of the referencing relation BOOK.
Therefore, the ON DELETE REJECT and CASCADE ON UPDATE actions are chosen for this
referential integrity constraint.
Comment
Step 3 of 7
• If a BOOK row is deleted, the CASCADE ON DELETE action automatically propagates the
deletion to the matching rows of the referencing relation BOOK_LOANS.
• If a BOOK's primary key is updated, the CASCADE ON UPDATE action automatically
propagates the new value to the matching rows of BOOK_LOANS.
Therefore, the CASCADE ON DELETE and CASCADE ON UPDATE actions are chosen for this
referential integrity constraint.
Comment
Step 4 of 7
• If a BOOK row is deleted, the CASCADE ON DELETE action automatically propagates the
deletion to the matching rows of the referencing relation BOOK_COPIES.
• If a BOOK's primary key is updated, the CASCADE ON UPDATE action automatically
propagates the new value to the matching rows of BOOK_COPIES.
Therefore, the CASCADE ON DELETE and CASCADE ON UPDATE actions are chosen for this
referential integrity constraint.
Comment
Step 5 of 7
• If a row is deleted from the BORROWER table, the CASCADE ON DELETE action is
automatically propagated to the matching rows of the referencing relation BOOK_LOANS.
• If a CardNo is updated in the BORROWER table, the CASCADE ON UPDATE action is
automatically propagated to the matching rows of BOOK_LOANS.
Therefore, the CASCADE ON DELETE and CASCADE ON UPDATE actions are chosen for this
referential integrity constraint.
Comment
Step 6 of 7
• If a row is deleted from the LIBRARY_BRANCH table, the CASCADE ON DELETE action is
automatically propagated to the matching rows of the referencing relation BOOK_COPIES.
• If a Branch_id is updated in the LIBRARY_BRANCH table, the CASCADE ON UPDATE action
is automatically propagated to the matching rows of BOOK_COPIES.
Therefore, the CASCADE ON DELETE and CASCADE ON UPDATE actions are chosen for this
referential integrity constraint.
Comment
Step 7 of 7
• If a row is deleted from the LIBRARY_BRANCH table, the CASCADE ON DELETE action is
automatically propagated to the matching rows of the referencing relation BOOK_LOANS.
• If a Branch_id is updated in the LIBRARY_BRANCH table, the CASCADE ON UPDATE action
is automatically propagated to the matching rows of BOOK_LOANS.
Therefore, the CASCADE ON DELETE and CASCADE ON UPDATE actions are chosen for this
referential integrity constraint.
Comment
Chapter 6, Problem 8E
Problem
Write appropriate SQL DDL statements for declaring the LIBRARY relational database schema of
Figure. Specify the keys and referential triggered actions.
A relational database scheme for a LIBRARY database.
Step-by-step solution
Step 1 of 7
The SQL DDL statements for the LIBRARY relational schema of Figure 6.14 in the textbook are
given below. (PUBLISHER and LIBRARY_BRANCH must be created before the tables that
reference them, or the foreign keys must be added afterward with ALTER TABLE.) The CREATE
TABLE statements are as follows:
CREATE TABLE BOOK ( BookId CHAR(20) NOT NULL, Title VARCHAR(30) NOT NULL,
PublisherName VARCHAR(20), PRIMARY KEY (BookId), FOREIGN KEY (PublisherName)
REFERENCES PUBLISHER (Name) ON UPDATE CASCADE );
Comment
Step 2 of 7
CREATE TABLE BOOK_AUTHORS ( BookId CHAR(20) NOT NULL, AuthorName
VARCHAR(30) NOT NULL, PRIMARY KEY (BookId, AuthorName), FOREIGN KEY (BookId)
REFERENCES BOOK (BookId) ON DELETE CASCADE ON UPDATE CASCADE );
Comment
Step 3 of 7
CREATE TABLE PUBLISHER ( Name VARCHAR(20) NOT NULL, Address VARCHAR(40) NOT
NULL, Phone CHAR(12), PRIMARY KEY (Name) );
Comment
Step 4 of 7
CREATE TABLE BOOK_COPIES ( BookId CHAR(20) NOT NULL, BranchId INTEGER NOT
NULL, No_Of_Copies INTEGER NOT NULL, PRIMARY KEY (BookId, BranchId), FOREIGN KEY
(BookId) REFERENCES BOOK (BookId)
ON DELETE CASCADE ON UPDATE CASCADE, FOREIGN KEY (BranchId) REFERENCES
LIBRARY_BRANCH (BranchId) ON DELETE CASCADE ON UPDATE CASCADE );
Comment
Step 5 of 7
CREATE TABLE BORROWER ( CardNo INTEGER NOT NULL, Name VARCHAR(30) NOT
NULL, Address VARCHAR(40) NOT NULL, Phone CHAR(12),
PRIMARY KEY (CardNo) );
Comment
Step 6 of 7
CREATE TABLE BOOK_LOANS ( CardNo INTEGER NOT NULL, BookId CHAR(20) NOT NULL,
BranchId INTEGER NOT NULL, DateOut DATE NOT NULL,
DueDate DATE NOT NULL, PRIMARY KEY (CardNo, BookId, BranchId),
FOREIGN KEY (CardNo) REFERENCES BORROWER (CardNo) ON DELETE CASCADE ON
UPDATE CASCADE, FOREIGN KEY (BranchId) REFERENCES LIBRARY_BRANCH (BranchId)
ON DELETE CASCADE ON UPDATE CASCADE,
FOREIGN KEY (BookId) REFERENCES BOOK (BookId) ON DELETE CASCADE ON UPDATE
CASCADE );
Comment
Step 7 of 7
CREATE TABLE LIBRARY_BRANCH ( BranchId INTEGER NOT NULL, BranchName
VARCHAR(20) NOT NULL, Address VARCHAR(40) NOT NULL,
PRIMARY KEY (BranchId) );
Comment
Chapter 6, Problem 9E
Problem
How can the key and foreign key constraints be enforced by the DBMS? Is the enforcement
technique you suggest difficult to implement? Can the constraint checks be executed efficiently
when updates are applied to the database?
Step-by-step solution
Step 1 of 3
Enforcement of key constraint in DBMS (Database management System):
Key constraint:
The technique that is often used to check efficiently for the key constraint is to create an index on
the combination of attributes that form each key (primary or secondary).
• Before inserting a new record (tuple), each index is searched to check that no value currently
exists in the index that matches the key value in the new record.
• If no matching value is found, the record is inserted; otherwise the insertion is rejected.
Foreign key constraint:
The technique to check the foreign key constraint is that using the index on the primary key of
each referenced relation will make the check relatively efficient.
Whenever a new record is inserted in a referencing relation, its foreign key value is used to
search the index for the primary key of the referenced relation, and if the referenced record
exists, then the new record can be successfully inserted in the referencing relation.
For deletion of a referenced record, it is useful to have an index on the foreign key of each
referencing relation so as to be able to determine efficiently whether any records reference the
record being deleted.
Comment
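The index-based enforcement described above can be observed directly: a PRIMARY KEY is backed by a unique index, so inserting a duplicate key value fails the index check and is rejected. The table and rows below are invented, and sqlite3 stands in as a convenient in-process engine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The PRIMARY KEY declaration creates the index that backs the key check.
conn.execute("CREATE TABLE EMPLOYEE (Ssn CHAR(9) PRIMARY KEY, Lname TEXT)")
conn.execute("INSERT INTO EMPLOYEE VALUES ('123456789', 'Smith')")
try:
    # A second record with the same Ssn matches an existing index entry,
    # so the DBMS rejects the insertion.
    conn.execute("INSERT INTO EMPLOYEE VALUES ('123456789', 'Jones')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```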
Step 2 of 3
Implementation of the enforcement technique:
No, the enforcement technique is not difficult to implement: using an index makes duplicate key
values easy to detect.
• If no index (or an alternative access structure such as hashing) is kept on the key attributes,
every constraint check requires a linear search of the relation, which makes the checks quite
inefficient.
Comment
Step 3 of 3
Efficient constraint checks:
Yes, the constraint checks are executed efficiently when records are inserted into or deleted
from the database.
• Using an index to enforce the key constraint avoids duplicate key values, and each check
costs roughly one index lookup rather than a scan of the entire relation.
Thus, constraint checking using indexes is efficient.
Comment
Chapter 6, Problem 10E
Problem
Specify the following queries in SQL on the COMPANY relational database schema shown in
Figure 5.5. Show the result of each query if it is applied to the COMPANY database in Figure 5.6.
a. Retrieve the names of all employees in department 5 who work more than 10 hours per week
on the ProductX project.
b. List the names of all employees who have a dependent with the same first name as
themselves.
c. Find the names of all employees who are directly supervised by ‘Franklin Wong’.
Step-by-step solution
Step 1 of 9
a)
Query:
Select emp.Fname, emp.Lname
from employee emp, works_on w, project p
where emp.Dno = 5 and emp.ssn = w.Essn and w.Pno = p.pnumber and p.pname = 'ProductX'
and w.hours > 10
Comment
Step 2 of 9
Result:
Fname Lname
John Smith
Joyce English
Comment
Step 3 of 9
Explanation:
The above query displays the names of all employees of department 5 who work more
than 10 hours per week on the project 'ProductX'.
Comment
Step 4 of 9
b)
Query:
Select emp.Fname, emp.Lname
from employee emp, dependent d
where emp.ssn= d.essn and emp.Fname = d.Dependent_name
Comment
Step 5 of 9
Result: (empty)
Fname Lname
Comment
Step 6 of 9
Explanation:
The above query displays the names of all employees who have a dependent with the
same first name as themselves.
• Here, the result is empty because no employee in the EMPLOYEE table has a dependent with
the same first name.
Comment
Step 7 of 9
c)
Query:
Select emp.Fname, emp.Lname
from employee emp, employee emp1
where emp1.Fname = 'Franklin' and emp1.Lname = 'Wong' and emp.superssn = emp1.ssn
Comment
Step 8 of 9
Fname Lname
John Smith
Ramesh Narayan
Joyce English
Comment
Step 9 of 9
Explanation:
The above query uses self-join to display the names of all the employees who are under the
supervision of Franklin Wong.
Comment
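The self-join in part (c) can be replayed on a minimal invented subset of the EMPLOYEE data, with emp1 playing the supervisor role and emp the supervisee:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (Fname TEXT, Lname TEXT, ssn TEXT, superssn TEXT)")
conn.executemany("INSERT INTO employee VALUES (?,?,?,?)", [
    ('Franklin', 'Wong', '333445555', '888665555'),
    ('John', 'Smith', '123456789', '333445555'),
    ('Joyce', 'English', '453453453', '333445555'),
])
# Same relation listed twice: emp1 selects Wong's row, emp selects
# every row whose superssn equals Wong's ssn.
rows = conn.execute("""
    SELECT emp.Fname, emp.Lname
    FROM employee emp, employee emp1
    WHERE emp1.Fname = 'Franklin' AND emp1.Lname = 'Wong'
      AND emp.superssn = emp1.ssn""").fetchall()
print(rows)
```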
Chapter 6, Problem 11E
Chapter 6, Problem 11E
Problem
What is meant by a recursive relationship type? Give some examples of recursive relationship
types.
Step-by-step solution
Step 1 of 1
If the same entity type participates more than once in a relationship type, in different roles, the
relationship type is called a recursive relationship. It occurs within unary relationships; that is,
the degree of the relationship is one. The connectivity may be 1:1, 1:M, or M:N.
For example, in the figure below, REPORTS_TO is a recursive relationship in which the
Employee entity type plays two roles: 1) Supervisor and 2) Subordinate.
The relationship can also be described as the relationship between a manager and an
employee: an employee may be a manager as well as an employee.
To implement a recursive relationship, a foreign key holding the employee number of the
employee's manager is held in each employee record:
Emp_entity( Emp_no, Emp_Fname, Emp_Lname, Emp_DOB, Emp_NI_Number, Manager_no );
Manager_no is the employee number of the employee's manager.
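The self-referencing Manager_no foreign key can be walked with a recursive common table expression (supported in SQLite and most modern systems) to list everyone who reports, directly or indirectly, to a given manager. The table and rows here are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Manager_no references Emp_no of the same table: a recursive relationship.
conn.execute("""CREATE TABLE Emp_entity (
    Emp_no INTEGER PRIMARY KEY,
    Emp_Fname TEXT,
    Manager_no INTEGER REFERENCES Emp_entity (Emp_no))""")
conn.executemany("INSERT INTO Emp_entity VALUES (?,?,?)", [
    (1, 'Ava', None),   # top manager
    (2, 'Ben', 1),      # reports to Ava
    (3, 'Cal', 2),      # reports to Ben
])
# Walk the chain downward from manager 1.
subordinates = conn.execute("""
    WITH RECURSIVE reports(Emp_no) AS (
        SELECT Emp_no FROM Emp_entity WHERE Manager_no = 1
        UNION ALL
        SELECT e.Emp_no FROM Emp_entity e
        JOIN reports r ON e.Manager_no = r.Emp_no)
    SELECT Emp_no FROM reports ORDER BY Emp_no""").fetchall()
print(subordinates)
```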
Chapter 6, Problem 12E
Problem
Specify the following queries in SQL on the database schema of Figure 1.2.
a. Retrieve the names of all senior students majoring in ‘cs’ (computer science).
b. Retrieve the names of all courses taught by Professor King in 2007 and 2008.
c. For each section taught by Professor King, retrieve the course number, semester, year, and
number of students who took the section.
d. Retrieve the name and transcript of each senior student (Class = 4) majoring in CS. A
transcript includes course name, course number, credit hours, semester, year, and grade for
each course completed by the student.
Step-by-step solution
Step 1 of 4
a.
The query to display the names of senior students majoring in CS is as follows:
Query:
SELECT Name FROM STUDENT
WHERE Major = 'CS' AND Class = 4;
Output:
Explanation:
• There are no rows in the database where Class is Senior, and Major is CS.
• SELECT is used to query the database and get back the specified fields.
o Name is the columns of STUDENT table.
• FROM is used to query the database and get back the preferred information by specifying the
table name.
o STUDENT is a table name.
• WHERE is used to specify a condition based on which the data is to be retrieved. In the
database, Seniors are represented by Class 4. The condition is as follows:
o Major = 'CS' AND Class = 4
Comment
Step 2 of 4
b.
The query to get the course name that are taught by professor King in year 2007 and 2008 is as
follows:
Query:
SELECT Course_name
FROM COURSE, SECTION
WHERE COURSE.Course_number = SECTION.Course_number
AND Instructor = 'King'
AND (Year='07' or Year='08');
Output :
Explanation:
• SELECT is used to query the database and get back the specified fields.
o Course_name is the columns of COURSE table.
• FROM is used to query the database and get back the preferred information by specifying the
table name.
o COURSE, SECTION are table names.
• WHERE is used to specify a condition based on which the data is to be retrieved. The
conditions are as follows:
o COURSE.Course_number = SECTION.Course_number
o Instructor = 'King'
o (Year='07' or Year='08')
• The conditions are concatenated with AND operator. All the conditions must be satisfied.
Comment
Step 3 of 4
c.
The query to retrieve the course number, Semester, Year and number of students who took the
section taught by professor King is as follows:
Query:
SELECT S.Course_number, S.Semester, S.Year, COUNT(G.Student_number) AS Number_of_students
FROM SECTION AS S, GRADE_REPORT AS G
WHERE S.Instructor = 'King'
AND S.Section_identifier = G.Section_identifier
GROUP BY S.Course_number, S.Semester, S.Year;
Output :
Explanation:
• SELECT is used to query the database and get back the specified fields.
o Course_number, Semester, Year are the columns of SECTION table.
• FROM is used to query the database and get back the preferred information by specifying the
table name.
o GRADE_REPORT, SECTION are table names.
• WHERE is used to specify a condition based on which the data is to be retrieved. The
conditions are as follows:
o S.Instructor= 'King'
o S.Section_identifier=G.Section_identifier
Comment
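The per-section count in part (c) depends on grouping: the rows matching King's sections must be grouped by the section-identifying attributes before COUNT is applied. A sketch with invented sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE SECTION (Section_identifier INTEGER,
    Course_number TEXT, Semester TEXT, Year TEXT, Instructor TEXT)""")
conn.execute("""CREATE TABLE GRADE_REPORT (Student_number INTEGER,
    Section_identifier INTEGER, Grade TEXT)""")
conn.executemany("INSERT INTO SECTION VALUES (?,?,?,?,?)", [
    (85, 'MATH2410', 'Fall', '07', 'King'),
    (92, 'CS1310', 'Fall', '08', 'King')])
conn.executemany("INSERT INTO GRADE_REPORT VALUES (?,?,?)", [
    (17, 85, 'B'), (8, 85, 'A'), (8, 92, 'A')])
# GROUP BY makes COUNT run once per section instead of over all rows.
rows = conn.execute("""
    SELECT S.Course_number, S.Semester, S.Year, COUNT(G.Student_number)
    FROM SECTION AS S, GRADE_REPORT AS G
    WHERE S.Instructor = 'King'
      AND S.Section_identifier = G.Section_identifier
    GROUP BY S.Course_number, S.Semester, S.Year""").fetchall()
print(rows)
```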
Step 4 of 4
d.
The query to display the name and transcript of each senior students majoring in CS is as
follows:
Query:
SELECT ST.Name, C.Course_name, C.Course_number, C.Credit_hours, S.Semester, S.Year,
G.Grade
FROM STUDENT AS ST, COURSE AS C, SECTION AS S, GRADE_REPORT As G
WHERE Class = 4 AND Major='CS'
AND ST.Student_number= G.Student_number
AND G.Section_identifier= S.Section_identifier
AND S.Course_number= C.Course_number;
Output :
No rows selected.
Explanation:
• SELECT is used to query the database and get back the specified fields.
o Course_number, Course_number, Credit_hours are the columns of COURSE table.
o Semester, Year are the columns of SECTION table.
o Name is the columns of STUDENT table.
o Grade is the columns of GRADE_REPORT table.
• FROM is used to query the database and get back the preferred information by specifying the
table name.
o STUDENT, COURSE, GRADE_REPORT, SECTION are table names.
o ST is the alias name for STUDENT table.
o G is the alias name for GRADE_REPORT table.
o S is the alias name for SECTION table.
o C is the alias name for COURSE table.
• WHERE is used to specify a condition based on which the data is to be retrieved. The
conditions are as follows:
o Class = 4
o Major='CS'
o ST.Student_number= G.Student_number
o G.Section_identifier= S.Section_identifier
o S.Course_number= C.Course_number
Comment
Chapter 6, Problem 13E
Problem
Write SQL update statements to do the following on the database schema shown in Figure 1.2.
a. Insert a new student, <'Johnson', 25, 1, 'MATH'>, in the database.
b. Change the class of student ‘Smith’ to 2.
c. Insert a new course, <’Knowledge Engineering’, ‘cs4390’, 3, ‘cs’>.
d. Delete the record for the student whose name is ‘Smith’ and whose student number is 17.
Step-by-step solution
Step 1 of 4
a.
The query to insert a new student into STUDENT relation is as follows:
Query:
INSERT INTO STUDENT VALUES ('Johnson', 25, 1, 'MATH');
Explanation:
• INSERT command is used to insert a row into a relation.
• STUDENT is the name of the relation.
Output:
Comment
Step 2 of 4
b.
The query to update the class of a student with name Smith to 2 is as follows:
Query:
UPDATE STUDENT
SET CLASS = 2
WHERE Name='Smith';
Explanation:
• UPDATE command is used to modify the data in a relation.
• STUDENT is the name of the relation.
• SET is used to specify the new value for a column.
• WHERE is used to specify a condition based on which the data is to be retrieved.
Output:
Comment
Step 3 of 4
c.
Query:
INSERT INTO COURSE VALUES
('Knowledge Engineering','cs4390', 3,'cs');
Explanation:
• INSERT command is used to insert a row into a relation.
• COURSE is the name of the relation.
Output:
Comment
Step 4 of 4
d.
Query:
DELETE FROM STUDENT
WHERE Name='Smith' AND Student_number=17;
Explanation:
• DELETE command is used to delete a row from the specified relation.
• STUDENT is the name of the relation.
• WHERE is used to specify a condition based on which the data is to be retrieved.
Output:
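The four statements above can be replayed end-to-end; the schema below is a simplified, invented rendering of the STUDENT and COURSE relations of Figure 1.2:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (Name TEXT, Student_number INTEGER, Class INTEGER, Major TEXT)")
conn.execute("CREATE TABLE COURSE (Course_name TEXT, Course_number TEXT, Credit_hours INTEGER, Department TEXT)")
conn.execute("INSERT INTO STUDENT VALUES ('Smith', 17, 1, 'CS')")  # pre-existing row

conn.execute("INSERT INTO STUDENT VALUES ('Johnson', 25, 1, 'MATH')")            # (a)
conn.execute("UPDATE STUDENT SET Class = 2 WHERE Name = 'Smith'")                # (b)
conn.execute("INSERT INTO COURSE VALUES ('Knowledge Engineering', 'cs4390', 3, 'cs')")  # (c)
conn.execute("DELETE FROM STUDENT WHERE Name = 'Smith' AND Student_number = 17")        # (d)

students = conn.execute("SELECT Name FROM STUDENT").fetchall()
print(students)
```

After the delete in (d), only the newly inserted Johnson row remains in STUDENT.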
Chapter 6, Problem 14E
Problem
Design a relational database schema for a database application of your choice.
a. Declare your relations using the SQL DDL.
b. Specify a number of queries in SQL that are needed by your database application.
c. Based on your expected use of the database, choose some attributes that should have
indexes specified on them.
d. Implement your database, if you have a DBMS that supports SQL.
Step-by-step solution
Step 1 of 6
Consider a student database that stores the information about students, courses and faculty.
a.
The DDL statement to create the relation STUDENT is as follows:
CREATE TABLE STUDENT (
StudentID int(11) NOT NULL,
FirstName varchar(20) NOT NULL,
LastName varchar(20) NOT NULL,
Address varchar(30) NOT NULL,
DOB date,
Gender char
);
The DDL statement to add a primary key to the relation STUDENT is as follows:
ALTER TABLE STUDENT
ADD PRIMARY KEY (StudentID);
The DDL statement to create the relation COURSE is as follows:
CREATE TABLE COURSE (
CourseID varchar(30) NOT NULL,
CourseName varchar(30) NOT NULL,
PRIMARY KEY (CourseID)
);
The DDL statement to create the relation FACULTY is as follows:
CREATE TABLE FACULTY (
FacultyID int(11) NOT NULL,
FacultyName varchar(30) NOT NULL,
PRIMARY KEY (FacultyID)
);
The DDL statement to create the relation REGISTRATION is as follows:
CREATE TABLE REGISTRATION (
StudentID int(11) NOT NULL,
CourseID varchar(30) NOT NULL,
PRIMARY KEY (StudentID, CourseID)
);
The DDL statement to create the relation TEACHES is as follows:
CREATE TABLE TEACHES (
FacultyID int(11) NOT NULL,
CourseID varchar(30) NOT NULL,
DateQualified varchar(12),
PRIMARY KEY (FacultyID,CourseID)
);
The DDL statement to add a column GradePoints to the relation COURSE is as follows:
ALTER TABLE COURSE
ADD COLUMN GradePoints int(2);
Comment
Step 2 of 6
b.
A wide number of queries can be written using the five relations based on the requirement of the
user. So, the number of queries is not fixed and will vary.
Some of the possible queries that are needed by the database application are as follows:
The query to retrieve the details of the students is as follows:
SELECT *
FROM STUDENT;
The query to retrieve the details of the faculties is as follows:
SELECT *
FROM FACULTY;
The query to retrieve the details of the courses offered is as follows:
SELECT *
FROM COURSE;
The query to retrieve which course is taught by which faculty is as follows:
SELECT *
FROM TEACHES;
The query to retrieve the names of the students who have registered for a course is as follows:
SELECT FirstName, LastName
FROM STUDENT, REGISTRATION
WHERE STUDENT.StudentID=REGISTRATION.StudentID;
Comment
Step 3 of 6
The query to retrieve the details of the male students is as follows:
SELECT * FROM STUDENT
WHERE GENDER= 'M';
The query to retrieve the courses with grade point 3 and above is as follows:
SELECT * FROM COURSE
WHERE GradePoints >=3;
Comment
Step 4 of 6
c.
Indexes are used for faster retrieval of data. Some of the attributes that can be used as indexes
are as follows:
• An index can be specified on FirstName in STUDENT relation.
• An index can be specified on LastName in STUDENT relation.
• An index can be specified on CourseName in COURSE relation.
• An index can be specified on FacultyName in FACULTY relation.
Comment
Step 5 of 6
d.
The implementation of the student database is as follows:
Comment
Step 6 of 6
Comment
Chapter 6, Problem 15E
Problem
Consider that the EMPLOYEE table’s constraint EMPSUPERFK as specified in Figure 6.2 is
changed to read as follows:
CONSTRAINT EMPSUPERFK FOREIGN KEY (Super_ssn) REFERENCES EMPLOYEE(Ssn)
Answer the following questions:
a. What happens when the following command is run on the database state shown in Figure 5.6?
DELETE EMPLOYEE WHERE Lname = ‘Borg’
b. Is it better to CASCADE or SET NULL in case of EMPSUPERFK constraint ON DELETE?
Step-by-step solution
Step 1 of 2
a)
From Figure 6.2 in the textbook, the EMPLOYEE table constraint is changed to read:
CONSTRAINT EMPSUPERFK FOREIGN KEY (Super_ssn) REFERENCES EMPLOYEE (Ssn)
ON DELETE CASCADE ON UPDATE CASCADE
Applied to the database state shown in Figure 5.6, the result is as follows:
The James E. Borg row is deleted from the table, and every employee who has him as a
supervisor is deleted as well (and their supervisees, and so on). In total, 8 rows are deleted and
the table is left empty.
Comment
Step 2 of 2
b)
It is better to SET NULL, since an employee is not fired (deleted) when his or her supervisor is
deleted. Instead, the employee's Super_ssn should be set to NULL so that a new supervisor can
be assigned later.
Comment
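The SET NULL choice behaves as described: deleting a supervisor keeps the supervisee's row and clears only Super_ssn. A sketch with an invented two-row sample, using sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""CREATE TABLE EMPLOYEE (
    Ssn CHAR(9) PRIMARY KEY,
    Lname TEXT,
    Super_ssn CHAR(9) REFERENCES EMPLOYEE (Ssn) ON DELETE SET NULL)""")
conn.execute("INSERT INTO EMPLOYEE VALUES ('888665555', 'Borg', NULL)")
conn.execute("INSERT INTO EMPLOYEE VALUES ('333445555', 'Wong', '888665555')")

# Deleting the supervisor does NOT delete Wong; it only nulls his Super_ssn.
conn.execute("DELETE FROM EMPLOYEE WHERE Lname = 'Borg'")
wong_super = conn.execute(
    "SELECT Super_ssn FROM EMPLOYEE WHERE Lname = 'Wong'").fetchone()[0]
print(wong_super)
```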
Chapter 6, Problem 16E
Problem
Write SQL statements to create a table EMPLOYEE_BACKUP to back up the EMPLOYEE table
shown in Figure 5.6.
Step-by-step solution
Step 1 of 4
Step1:
Create a table EMPLOYEE is as follows:
CREATE TABLE EMPLOYEE (
Fname varchar(15) NOT NULL,
Minit char(1) DEFAULT NULL,
Lname varchar(15) NOT NULL,
Ssn char(9) NOT NULL,
Bdate date DEFAULT NULL,
Address varchar(30) DEFAULT NULL,
Sex char(1) DEFAULT NULL,
Salary decimal(10,2) DEFAULT NULL,
Super_ssn char(9) DEFAULT NULL,
Dno int(11) NOT NULL,
PRIMARY KEY ( Ssn )
);
Step2:
Insert the data into the EMPLOYEE table using INSERT command.
INSERT INTO EMPLOYEE VALUES ('James', 'E', 'Borg', '888665555', DATE '1937-11-10', '450
Stone, Houston, TX', 'M', 55000, NULL, 1);
INSERT INTO EMPLOYEE VALUES ('Jennifer', 'S', 'Wallace', '987654321', DATE '1941-06-20',
'291 Berry, Bellaire, Tx', 'F', 37000, '888665555', 4);
INSERT INTO EMPLOYEE VALUES ('Franklin', 'T', 'Wong', '333445555', DATE '1955-12-08',
'638 Voss, Houston, TX', 'M', 40000, '888665555', 5);
INSERT INTO EMPLOYEE VALUES ('John', 'B', 'Smith', '123456789', DATE '1965-01-09', '731
Fondren, Houston, TX', 'M', 30000, '333445555', 5);
INSERT INTO EMPLOYEE VALUES ('Alicia', 'J', 'Zelaya', '999887777', DATE '1968-01-19', '3321
castle, Spring, TX', 'F', 25000, '987654321', 4);
INSERT INTO EMPLOYEE VALUES ('Ramesh', 'K', 'Narayan', '666884444', DATE '1920-09-15',
'975 Fire Oak, Humble, TX', 'M', 38000, '333445555', 5);
INSERT INTO EMPLOYEE VALUES ('Joyce', 'A', 'English', '453453453', DATE '1972-07-31',
'5631 Rice, Houston, TX', 'F', 25000, '333445555', 5);
INSERT INTO EMPLOYEE VALUES ('Ahmad', 'V', 'Jabbar', '987987987', DATE '1969-03-29',
'980 Dallas, Houston, TX', 'M', 22000, '987654321', 4);
INSERT INTO EMPLOYEE VALUES ('Melissa', 'M', 'Jones', '808080808', DATE '1970-07-10',
'1001 Western, Houston, TX', 'F', 27500, '333445555', 5);
Step3:
Now, select the EMPLOYEE table to display all the rows.
select * from EMPLOYEE;
Sample Output:
Comment
Step 2 of 4
The SQL statements to create a table EMPLOYEE_BACKUP to store the backup data of
EMPLOYEE table is as follows:
The SQL statement to create the EMPLOYEE_BACKUP table:
CREATE TABLE EMPLOYEE_BACKUP LIKE EMPLOYEE;
Explanation:
• The SQL statement will create the table EMPLOYEE_BACKUP with the same structure as the
table EMPLOYEE.
• CREATE TABLE is the command to create a table.
• LIKE is the keyword used to copy the structure of the table EMPLOYEE.
Comment
Step 3 of 4
The SQL statement to insert the data into the EMPLOYEE_BACKUP:
INSERT INTO EMPLOYEE_BACKUP (SELECT * FROM EMPLOYEE);
Explanation:
• The SQL statement inserts the data from the table EMPLOYEE into the table
EMPLOYEE_BACKUP.
Comment
Step 4 of 4
SELECT * FROM EMPLOYEE will fetch all the records from the table EMPLOYEE.
Sample Output:
Comment
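Note that CREATE TABLE ... LIKE is MySQL syntax. In SQLite and several other systems, the structure-plus-data backup can be done in one step with CREATE TABLE ... AS SELECT (which copies rows and column layout, though not constraints). A sketch with invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (Ssn CHAR(9) PRIMARY KEY, Lname TEXT, Salary REAL)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?,?,?)", [
    ('888665555', 'Borg', 55000), ('333445555', 'Wong', 40000)])

# One-step backup: columns and data are copied, constraints are not.
conn.execute("CREATE TABLE EMPLOYEE_BACKUP AS SELECT * FROM EMPLOYEE")
backup_count = conn.execute("SELECT COUNT(*) FROM EMPLOYEE_BACKUP").fetchone()[0]
print(backup_count)
```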
Chapter 7, Problem 1RQ
Problem
Describe the six clauses in the syntax of an SQL retrieval query. Show what type of constructs
can be specified in each of the six clauses. Which of the six clauses are required and which are
optional?
Step-by-step solution
Step 1 of 3
A query in SQL consists of up to six clauses. The clauses are specified in following order.
• SELECT < attribute list >
• FROM < table list >
• [ WHERE < condition > ]
• [ GROUP BY < grouping attribute(s) > ]
• [ HAVING < group condition > ]
• [ ORDER BY < attribute list > ]
Comment
Step 2 of 3
The definition of the types of the values returned by the query is made with the help of the
SELECT clause.
The FROM clause is used to retrieve the desired data from the table for the provided query.
The WHERE clause is a conditional clause. It is used to retrieve the values with restriction.
The GROUP BY clause is used to group the results for the provided query according to the
properties.
The HAVING clause is used to retrieve the results of the GROUP BY clause with some
restriction.
The ORDER BY clause is used to sort the values returned by the query in a specific order.
Comment
Step 3 of 3
The SELECT and FROM clauses are the required clauses and the clauses like WHERE,
GROUP BY, HAVING and ORDER BY are optional clauses.
Comment
Chapter 7, Problem 2RQ
Problem
Describe conceptually how an SQL retrieval query will be executed by specifying the conceptual
order of executing each of the six clauses.
Step-by-step solution
Step 1 of 1
A retrieval query in SQL can consist of up to six clauses, but only the first two, SELECT and
FROM, are mandatory. The clauses are specified in the following order, with the clauses
between square brackets [...] being optional:
SELECT <attribute list>
FROM <table list>
[WHERE <condition>]
[GROUP BY <grouping attributes>]
[HAVING <group condition>]
[ORDER BY <attribute list>]
The SELECT clause lists the attributes or functions to be retrieved. The FROM clause specifies
all relations needed in the query, including joined relations, but not those in nested queries. The
WHERE clause specifies the conditions for selection of tuples from these relations, including join
conditions if needed. GROUP BY specifies grouping attributes, and HAVING specifies a condition
on the groups being selected rather than on individual tuples. ORDER BY specifies an order for
displaying the result of a query.
A query is evaluated conceptually by first applying the FROM clause, followed by the WHERE
clause, and then GROUP BY and HAVING. ORDER BY is applied at the end to sort the query
result. The values of the attributes specified in the SELECT clause are shown in the result.
Comment
Chapter 7, Problem 3RQ
Problem
Discuss how NULLs are treated in comparison operators in SQL. How are NULLs treated when
aggregate functions are applied in an SQL query? How are NULLs treated if they exist in
grouping attributes?
Step-by-step solution
Step 1 of 1
In SQL, NULL is treated as an UNKNOWN value. SQL uses three-valued logic with the truth
values TRUE, FALSE, and UNKNOWN.
In comparisons, NULL can be tested only with the IS NULL and IS NOT NULL operators. SQL
treats each NULL as a distinct value, so =, <, and > cannot be used to compare NULLs.
In general, NULL values are discarded when aggregate functions are applied to a particular
column (COUNT(*) is the exception, since it counts rows rather than values).
If NULLs exist in the grouping attribute, then a separate group is created for all tuples with a
NULL value in the grouping attribute.
Comment
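These NULL rules are easy to confirm; the single-column table below is invented, and sqlite3 stands in for the DBMS:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (x INTEGER)")
conn.executemany("INSERT INTO T VALUES (?)", [(1,), (None,), (3,)])

# "= NULL" evaluates to UNKNOWN for every row, so nothing matches.
eq_matches = conn.execute("SELECT COUNT(*) FROM T WHERE x = NULL").fetchone()[0]
# IS NULL is the correct test and finds the one NULL row.
is_matches = conn.execute("SELECT COUNT(*) FROM T WHERE x IS NULL").fetchone()[0]
# Aggregates skip NULLs: AVG sees only 1 and 3.
avg_x = conn.execute("SELECT AVG(x) FROM T").fetchone()[0]
print(eq_matches, is_matches, avg_x)
```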
Chapter 7, Problem 4RQ
Problem
Discuss how each of the following constructs is used in SQL, and discuss the various options for
each construct. Specify what each construct is useful for.
a. Nested queries
b. Joined tables and outer joins
c. Aggregate functions and grouping
d. Triggers
e. Assertions and how they differ from triggers
f. The SQL WITH clause
g. SQL CASE construct
h. Views and their updatability
i. Schema change commands
Step-by-step solution
Step 1 of 11
a.
Nested Queries:
A nested query is an SQL query placed inside another SQL query, typically in the WHERE
clause. It is also known as a subquery or inner query.
Options:
It can be used with the SELECT, INSERT, UPDATE, and DELETE statements. These statements
are used with the operators <, >, <=, >=, =, IN, and BETWEEN.
SYNTAX:
For example, to get the IDs of all employees who work in the same department as the employee
whose salary is 35,000 (illustrative attribute names):
SELECT Emp_id FROM EMPLOYEE
WHERE Dept IN ( SELECT Dept FROM EMPLOYEE WHERE Salary = 35000 );
Use:
The inner query returns a set of values that the outer query then compares against.
Comment
Step 2 of 11
b.
Joined Tables:
A joined table is the result table generated by an inner join, an outer join, or a cross join.
Uses of Joined Tables:
A joined table can be used in any context where a base table can appear in a SELECT
statement.
Outer Join:
Types of outer join:
1) Left outer join: returns all rows from the left table, together with the matching rows from the
right table (NULLs where there is no match). It is denoted by the symbol ⟕.
Syntax:
SELECT column FROM table_A LEFT JOIN table_B ON table_A.column_1 = table_B.column_2;
2) Right outer join: returns all rows from the right table, together with the matching rows from
the left table. It is denoted by the symbol ⟖.
Syntax:
SELECT column FROM table_A RIGHT JOIN table_B ON table_A.column_1 = table_B.column_2;
3) Full outer join: returns all rows from both the left and the right table. It is denoted by the
symbol ⟗.
Syntax:
SELECT column FROM table_A FULL OUTER JOIN table_B ON table_A.column_1 = table_B.column_2;
Options:
It is used with the SELECT, FROM, and ON clauses.
Use:
A join combines rows from two different tables into one result table.
Comment
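A left outer join sketch with invented tables (SQLite accepted only LEFT JOIN until RIGHT and FULL OUTER JOIN were added in version 3.39):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (dno INTEGER, dname TEXT)")
conn.execute("CREATE TABLE emp (name TEXT, dno INTEGER)")
conn.executemany("INSERT INTO dept VALUES (?,?)", [(1, 'HQ'), (2, 'Research')])
conn.execute("INSERT INTO emp VALUES ('Wong', 2)")

# Every dept row survives; HQ has no employee, so its emp column is NULL.
rows = conn.execute("""
    SELECT d.dname, e.name
    FROM dept d LEFT JOIN emp e ON d.dno = e.dno
    ORDER BY d.dno""").fetchall()
print(rows)
```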
Step 3 of 11
c.
Aggregate Functions:
An aggregate function takes multiple input values from a column and produces a single value
as its output.
Aggregate functions include AVG, COUNT, MAX, MIN, and SUM (some systems also provide
FIRST and LAST).
Option:
It can be used in the SELECT and HAVING clauses.
Use:
It is used to perform summary calculations easily.
Grouping:
In many cases an aggregate function is applied to subgroups of the tuples in a relation, where
the subgroups depend on some attribute values. On applying the GROUP BY clause, the table
is divided into groups.
Syntax for the GROUP BY clause:
SELECT column_name, function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name;
Options:
It can be used with the SELECT, FROM, and WHERE clauses.
Use:
The GROUP BY clause is applied when the table needs to be divided into groups according to
attribute values.
Comment
Step 4 of 11
d.
Triggers:
A database trigger is procedural code that automatically executes (fires) when a specified event
(INSERT, DELETE, or UPDATE) occurs on a table.
Syntax for a trigger:
CREATE TRIGGER <trigger_name> BEFORE | AFTER <event> ON <table_name>
FOR EACH ROW <triggered action>;
Options:
It can be attached to INSERT, DELETE, and UPDATE events.
Use:
Triggers can be used for the following purposes:
1. To maintain derived columns automatically.
2. To enforce security authorizations.
3. To prevent invalid transactions.
Comment
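A trigger sketch, using sqlite3: an AFTER INSERT trigger that maintains a derived audit row automatically, which is the first use listed above. The orders tables are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("CREATE TABLE order_log (order_id INTEGER, note TEXT)")

# The trigger fires automatically on every INSERT into orders;
# NEW refers to the row just inserted.
conn.execute("""CREATE TRIGGER log_order AFTER INSERT ON orders
    BEGIN
        INSERT INTO order_log VALUES (NEW.id, 'inserted');
    END""")

conn.execute("INSERT INTO orders VALUES (1, 99.0)")
log_rows = conn.execute("SELECT * FROM order_log").fetchall()
print(log_rows)
```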
Step 5 of 11
e.
Assertions:
An assertion is an expression that should always be true. Once the assertion is created, its
condition must hold at all times; the DBMS checks the assertion after any change that might
violate it.
Syntax for assertions:
CREATE ASSERTION <assertion_name> CHECK ( <predicate> );
A predicate always returns either true or false.
Option:
It is used with the CREATE, CHECK, and FROM statements.
Use:
It is used to state conditions on the database schema as a whole.
The following table shows the difference between ASSERTIONS and TRIGGERS:
ASSERTIONS | TRIGGERS
Assertions only check conditions; they do not modify data. | Triggers check the condition and, if required, also change the data.
An assertion is not linked to a particular table or a particular event in the database. | A trigger is linked to both a particular table and a particular event in the database.
Every assertion can be implemented as a trigger. | Not every trigger can be implemented as an assertion.
The Oracle database does not implement assertions. | The Oracle database implements triggers.
Step 6 of 11
f.
The SQL WITH clause:
This clause was introduced as a convenience in SQL:1999 and was added to Oracle SQL syntax
in Oracle 9.2, so it may not be available in all SQL-based DBMSs. It allows the user to define a
temporary table in such a way that it is used only in a particular query. It is similar to creating a
view that is used in a single query and then dropped.
Syntax for the SQL WITH clause:
WITH temporary_table AS (
SELECT column_name
FROM table_name
WHERE condition
GROUP BY column_name
)
SELECT column_name FROM temporary_table;
Option:
It can be used with the SELECT, FROM, WHERE, and GROUP BY statements.
Use:
It can be used to build a complex statement out of simpler statements. It breaks down complex
SQL queries, which makes debugging and processing them easier.
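A runnable sketch of the WITH clause, using SQLite from Python (the EMPLOYEE table, its rows, and the name dept_avg are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (Name TEXT, Dno INTEGER, Salary INTEGER)")
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
                [("Ann", 1, 30000), ("Bob", 1, 40000), ("Carl", 2, 25000)])

# WITH names a temporary result (dept_avg) visible only to the query that follows.
rows = con.execute("""
    WITH dept_avg AS (
        SELECT Dno, AVG(Salary) AS avg_sal FROM EMPLOYEE GROUP BY Dno
    )
    SELECT Dno FROM dept_avg WHERE avg_sal > 30000
""").fetchall()
print(rows)  # [(1,)]
```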
Step 7 of 11
g.
SQL CASE construct:
The SQL CASE construct is used like the if-then-else construct in Java. It is used when a value
is computed differently depending on a particular condition. The CASE construct can be used in
any SQL query where conditional values have to be extracted.
Syntax of the SQL CASE construct:
CASE
WHEN condition_a THEN result_1
WHEN condition_b THEN result_2
WHEN condition_c THEN result_3
ELSE result
END;
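A runnable sketch of the construct, using SQLite from Python (the scores table and the grade cutoffs are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE scores (name TEXT, mark INTEGER)")
con.executemany("INSERT INTO scores VALUES (?, ?)",
                [("Ann", 95), ("Bob", 72), ("Carl", 40)])

# CASE returns a different value per row depending on which condition holds first.
rows = con.execute("""
    SELECT name,
           CASE WHEN mark >= 90 THEN 'A'
                WHEN mark >= 60 THEN 'B'
                ELSE 'F'
           END
    FROM scores ORDER BY name
""").fetchall()
print(rows)  # [('Ann', 'A'), ('Bob', 'B'), ('Carl', 'F')]
```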
Step 8 of 11
Option:
It can be used with the SELECT and FROM statement.
Step 9 of 11
Use:
It can be used to perform an operation when a particular condition occurs.
Step 10 of 11
h.
Views and their updatability:
A view is a virtual table that is derived from other tables, called base tables. The base tables
physically exist, and their tuples are stored in the database.
Syntax for creating a view:
CREATE VIEW virtual_table
AS SELECT attributes
FROM tables
WHERE conditions;
CREATE VIEW names the view; AS SELECT defines the attributes that appear in the virtual
table; the FROM clause names the tables from which the attributes are extracted; and the
WHERE clause gives the condition that must be satisfied by the tuples of the virtual table.
Option:
It can be used with the AS SELECT, FROM, and WHERE statements.
Use:
A virtual table is created when a table needs to be referenced frequently.
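A runnable sketch of view behavior, using SQLite from Python (the EMPLOYEE table and the view name high_paid are invented); note that a view stores a query, so reading it re-runs that query over the current base table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (Name TEXT, Dno INTEGER, Salary INTEGER)")
con.execute("INSERT INTO EMPLOYEE VALUES ('Ann', 1, 30000)")
con.execute("""
    CREATE VIEW high_paid AS
    SELECT Name FROM EMPLOYEE WHERE Salary >= 30000
""")
con.execute("INSERT INTO EMPLOYEE VALUES ('Bob', 2, 50000)")  # view sees new row

rows = con.execute("SELECT Name FROM high_paid ORDER BY Name").fetchall()
print(rows)  # [('Ann',), ('Bob',)]
```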
Step 11 of 11
i.
Schema change Commands:
The schema change commands are used in SQL to alter a schema by adding or dropping
attributes, tables, constraints, and other schema elements. This can be done while the database
is operational and does not require recompilation of the database schema.
The different Schema change Commands are as follows:
• The drop command
• The alter command
DROP command:
The DROP command can be used to drop schema elements such as tables, attributes, and
constraints. A whole schema can be dropped with the DROP SCHEMA command.
Syntax of drop command:
DROP SCHEMA employee CASCADE;
ALTER command:
The schema can be changed with the help of the ALTER command, for example by changing a
column name or adding or dropping attributes.
Syntax of alter command:
ALTER TABLE employee ADD COLUMN phone_no VARCHAR (15);
Use:
It can be used to change the schema or to drop the schema.
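A runnable sketch of a schema change, using SQLite from Python (the employee table is a throwaway example; SQLite supports ALTER TABLE and DROP TABLE, though not DROP SCHEMA):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (name TEXT)")

# Add a column without recreating the table or recompiling the schema.
con.execute("ALTER TABLE employee ADD COLUMN phone_no VARCHAR(15)")
cols = [row[1] for row in con.execute("PRAGMA table_info(employee)")]
print(cols)  # ['name', 'phone_no']

con.execute("DROP TABLE employee")  # drops the schema element entirely
```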
Chapter 7, Problem 5E
Problem
Specify the following queries on the database in Figure 5.5 in SQL. Show the query results if
each query is applied to the database state in Figure 5.6.
a. For each department whose average employee salary is more than $30,000, retrieve the
department name and the number of employees working for that department.
b. Suppose that we want the number of male employees in each department making more than
$30,000, rather than all employees (as in Exercise a). Can we specify this query in SQL? Why or
why not?
Step-by-step solution
Step 1 of 2
a)
The query to retrieve the department name and the count of employees working in each
department whose average salary is greater than 30000 is as follows:
Query:
SELECT Dname, COUNT(*) FROM DEPARTMENT, EMPLOYEE
WHERE DEPARTMENT.Dnumber=EMPLOYEE.DNo
GROUP BY Dname
HAVING AVG(Salary) > 30000;
Output:
Explanation:
• SELECT is used to query the database and get back the specified fields.
o Dname is an attribute of the DEPARTMENT table.
• FROM is used to query the database and get back the preferred information by specifying the
table name.
o EMPLOYEE and DEPARTMENT are table names.
• WHERE is used to specify a condition based on which the data is to be retrieved.
The conditions are as follows:
o DEPARTMENT.Dnumber=EMPLOYEE.DNo
• GROUP BY is used to group the result of a SELECT statement done on a table where the tuple
values are similar for more than one column.
o Dname is the group by attribute.
• HAVING clause is used to specify the condition based on group by function.
o AVG(Salary) > 30000 is the condition.
• COUNT(*) is used to count the number of tuples that satisfy the conditions.
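The same GROUP BY ... HAVING pattern can be run on a tiny made-up instance (not the Figure 5.6 data), using SQLite from Python; only departments whose average salary exceeds 30000 survive the HAVING filter:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE DEPARTMENT (Dname TEXT, Dnumber INTEGER)")
con.execute("CREATE TABLE EMPLOYEE (Lname TEXT, Salary INTEGER, Dno INTEGER)")
con.executemany("INSERT INTO DEPARTMENT VALUES (?, ?)",
                [("Research", 5), ("Admin", 4)])
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
                [("Smith", 30000, 5), ("Wong", 40000, 5), ("Zelaya", 25000, 4)])

# WHERE joins, GROUP BY forms per-department groups, HAVING filters the groups.
rows = con.execute("""
    SELECT Dname, COUNT(*) FROM DEPARTMENT, EMPLOYEE
    WHERE DEPARTMENT.Dnumber = EMPLOYEE.Dno
    GROUP BY Dname
    HAVING AVG(Salary) > 30000
""").fetchall()
print(rows)  # [('Research', 2)]
```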
Step 2 of 2
(b)
The query to retrieve the department name and the count of male employees in each department
making more than 30000 is as follows:
Query:
SELECT Dname, COUNT(*) FROM DEPARTMENT, EMPLOYEE
WHERE DEPARTMENT.Dnumber=EMPLOYEE.DNo
AND Sex='M'
AND Salary > 30000
GROUP BY Dname;
Output:
Explanation:
• SELECT is used to query the database and get back the specified fields.
o Dname is an attribute of the DEPARTMENT table.
• FROM is used to query the database and get back the preferred information by specifying the
table name.
o EMPLOYEE and DEPARTMENT are table names.
• WHERE is used to specify a condition based on which the data is to be retrieved.
The conditions are as follows:
o DEPARTMENT.Dnumber=EMPLOYEE.DNo
o Sex='M'
o Salary > 30000
• GROUP BY is used to group the result of a SELECT statement done on a table where the tuple
values are similar for more than one column.
o Dname is the group by attribute.
Chapter 7, Problem 6E
Problem
Specify the following queries in SQL on the database schema in Figure 1.2.
a. Retrieve the names and major departments of all straight-A students (students who have a
grade of A in all their courses).
b. Retrieve the names and major departments of all students who do not have a grade of A in
any of their courses.
Step-by-step solution
Step 1 of 2
a.
The query to retrieve the names and major departments of the students who got A grade in all
the courses is as follows:
Query:
SELECT Name, Major FROM STUDENT
WHERE NOT EXISTS (SELECT * FROM GRADE_REPORT
WHERE Student_number= STUDENT.Student_number
AND NOT (GRADE='A'));
Explanation:
• SELECT is used to query the database and get back the specified fields.
o Name, Major are columns of STUDENT table.
• FROM is used to query the database and get back the preferred information by specifying the
table name.
o STUDENT is a table name.
• WHERE is used to specify a condition based on which the data is to be retrieved.
• The inner query retrieves the courses in which the student received a grade other than A.
• The outer query retrieves the name and major of each student for whom no such course exists,
that is, each student who received a grade of A in all courses.
• NOT EXISTS is used to retrieve only those students which are not retrieved by inner query.
Output:
Comment
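The NOT EXISTS double negation can be run on a toy instance (using SQLite from Python; the tables and rows below are invented, not the Figure 1.2 data): a student is "straight-A" when no GRADE_REPORT row of theirs has a grade other than 'A'.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE STUDENT (Name TEXT, Student_number INTEGER, Major TEXT)")
con.execute("CREATE TABLE GRADE_REPORT (Student_number INTEGER, Grade TEXT)")
con.executemany("INSERT INTO STUDENT VALUES (?, ?, ?)",
                [("Ann", 1, "CS"), ("Bob", 2, "Math")])
con.executemany("INSERT INTO GRADE_REPORT VALUES (?, ?)",
                [(1, "A"), (1, "A"), (2, "A"), (2, "B")])

# Keep each student for whom no non-A grade row exists.
rows = con.execute("""
    SELECT Name, Major FROM STUDENT
    WHERE NOT EXISTS (SELECT * FROM GRADE_REPORT
                      WHERE Student_number = STUDENT.Student_number
                      AND NOT (Grade = 'A'))
""").fetchall()
print(rows)  # [('Ann', 'CS')]
```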
Step 2 of 2
b.
The query to retrieve the names and major departments of the students who do not have a
grade of A in any of their courses is as follows:
Query:
SELECT Name, Major FROM STUDENT
WHERE NOT EXISTS (SELECT * FROM GRADE_REPORT
WHERE Student_number= STUDENT.Student_number
AND (GRADE= 'A'));
Explanation:
• SELECT is used to query the database and get back the specified fields.
o Name, Major are columns of STUDENT table.
• FROM is used to query the database and get back the preferred information by specifying the
table name.
o STUDENT is a table name.
• WHERE is used to specify a condition based on which the data is to be retrieved.
• The inner query retrieves the courses in which the student received a grade of A.
• The outer query retrieves the name and major of each student who did not receive a grade of A
in any course.
• NOT EXISTS is used to retrieve only those students which are not retrieved by inner query.
Output:
Chapter 7, Problem 7E
Problem
In SQL, specify the following queries on the database in Figure 5.5 using the concept of nested
queries and other concepts described in this chapter.
a. Retrieve the names of all employees who work in the department that has the employee with
the highest salary among all employees.
b. Retrieve the names of all employees whose supervisor’s supervisor has ‘888665555’ for Ssn.
c. Retrieve the names of employees who make at least $10,000 more than the employee who is
paid the least in the company.
Step-by-step solution
Step 1 of 4
SQL:
Structured Query Language (SQL) is a database language for managing and accessing the data
in a relational database.
• SQL consists of queries to insert, update, delete, and retrieve records from a database. It even
creates a new database and database table.
Nested query:
Some queries require existing values to be obtained from the database and then used in a
comparison condition. This is referred to as a nested query: a complete "select from where"
block appears inside the WHERE clause of another query, which is referred to as the outer
query.
The format of the "SELECT" statement is:
SELECT attribute-list FROM table-list WHERE condition
o Here, “SELECT”, “FROM”, and “WHERE” are the keywords.
o “attribute-list” is the list of attributes.
• To retrieve all the attributes of a table, instead of giving all attributes in the table, asterisk (*) can
be used.
o “table-list” is the list of tables.
o Condition is optional.
Step 2 of 4
a)
Query:
SELECT LNAME FROM EMPLOYEE WHERE DNO = (SELECT DNO FROM
EMPLOYEE WHERE SALARY = (SELECT MAX(SALARY) FROM EMPLOYEE) )
Explanation:
The outer query selects the names of all employees who work in the selected department, while
the inner queries find the department number of the employee with the highest salary among all
employees.
Step 3 of 4
b)
Query:
SELECT LNAME FROM EMPLOYEE WHERE SUPERSSN IN (SELECT SSN
FROM EMPLOYEE WHERE SUPERSSN = ‘888665555’)
Explanation:
The outer query selects the names of employees whose supervisor's Ssn is returned by the inner
query; that is, employees whose supervisor's supervisor has Ssn '888665555'.
Step 4 of 4
c)
Query:
SELECT LNAME FROM EMPLOYEE WHERE SALARY >= 10000 + ( SELECT MIN(SALARY)
FROM EMPLOYEE)
Explanation:
The outer query selects the names of employees whose salary is at least 10,000 more than the
minimum salary, which is found by the inner query.
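A runnable sketch of the scalar-subquery pattern, using SQLite from Python (the EMPLOYEE rows are invented, not the Figure 5.5 data; ">=" captures "at least"):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (Lname TEXT, Salary INTEGER)")
con.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
                [("Smith", 25000), ("Wong", 35000), ("Borg", 55000)])

# The inner query yields a single value (the minimum salary) used in the comparison.
rows = con.execute("""
    SELECT Lname FROM EMPLOYEE
    WHERE Salary >= 10000 + (SELECT MIN(Salary) FROM EMPLOYEE)
    ORDER BY Salary
""").fetchall()
print(rows)  # [('Wong',), ('Borg',)]
```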
Chapter 7, Problem 8E
Problem
Specify the following views in SQL on the COMPANY database schema shown in Figure 5.5.
a. A view that has the department name, manager name, and manager salary for every
department
b. A view that has the employee name, supervisor name, and employee salary for each
employee who works in the ‘Research’ department
c. A view that has the project name, controlling department name, number of employees, and
total hours worked per week on the project for each project
d. A view that has the project name, controlling department name, number of employees, and
total hours worked per week on the project for each project with more than one employee
working on it
Step-by-step solution
Step 1 of 4
a.
A view that has the department name along with the name and salary of the manager for every
department is as follows:
CREATE VIEW MANAGER_INFORMATION
AS SELECT Dname, Fname AS Manager_First_name, Salary
FROM DEPARTMENT, EMPLOYEE
WHERE DEPARTMENT.Mgr_ssn = EMPLOYEE.Ssn;
Explanation:
• CREATE VIEW will create a view with the name MANAGER_INFORMATION.
• SELECT is used to query the database and get back the specified fields.
o Dname is an attribute of DEPARTMENT table.
o Fname and Salary are attributes of EMPLOYEE table.
• FROM is used to query the database and get back the preferred information by specifying the
table name.
o DEPARTMENT, EMPLOYEE are table names.
• WHERE is used to specify a condition based on which the data is to be retrieved.
o DEPARTMENT.Mgr_ssn = EMPLOYEE.Ssn is the condition.
Step 2 of 4
b.
A view that has the employee name, supervisor name and employee salary for each employee
who works in the Research department is as follows:
CREATE VIEW EMPLOYEE_INFORMATION
AS SELECT e.Fname AS Employee_first_name,
e.Minit AS Employee_middle_init,
e.Lname AS Employee_last_name,
s.Fname AS Manager_fname,
s.Minit AS Manager_minit,
s.Lname AS Manager_Lname, Salary
FROM EMPLOYEE AS e, EMPLOYEE AS s,
DEPARTMENT AS d
WHERE e.Super_ssn = s.Ssn
AND e.Dno = d.Dnumber
AND d.Dname = 'Research';
Explanation:
• CREATE VIEW will create a view with the name EMPLOYEE_INFORMATION.
• SELECT is used to query the database and get back the specified fields.
o Dname is an attribute of DEPARTMENT table.
o Fname, Lname, Minit and Salary are attributes of EMPLOYEE table.
• FROM is used to query the database and get back the preferred information by specifying the
table name.
o DEPARTMENT, EMPLOYEE are table names.
o e, s are the alias names of EMPLOYEE table.
o d is alias name of DEPARTMENT table.
• WHERE is used to specify a condition based on which the data is to be retrieved. The
conditions specified in the query are
o e.Super_ssn = s.Ssn checks
o e.Dno = d.Dnumber
o d.Dname = 'Research'
Step 3 of 4
c.
A view that has the project name, controlling department name, number of employees, and total
hours worked per week on the project is as follows:
CREATE VIEW PROJECT_INFORMATION
AS SELECT Pname, Dname, COUNT(WO.Essn) AS No_of_employees, SUM(WO.Hours) AS Total_hours
FROM PROJECT AS P, WORKS_ON AS WO,
DEPARTMENT AS D
WHERE P.Dnum = D.Dnumber
AND P.Pnumber = WO.Pno
GROUP BY Pname, Dname;
Explanation:
• CREATE VIEW will create a view with the name PROJECT_INFORMATION.
• SELECT is used to query the database and get back the specified fields.
o Dname is an attribute of DEPARTMENT table.
o Pname is an attribute of PROJECT table.
o Essn and Hours are attributes of WORKS_ON table.
• FROM is used to query the database and get back the preferred information by specifying the
table name.
o DEPARTMENT, PROJECT, and WORKS_ON are table names.
o P is the alias name for PROJECT table.
o D is alias name of DEPARTMENT table.
o WO is alias name of WORKS_ON table.
• WHERE is used to specify a condition based on which the data is to be retrieved. The
conditions specified in the query are
o P.Dnum = D.Dnumber
o P.Pnumber = WO.Pno
• GROUP BY is used to group the result of a SELECT statement done on a table where the tuple
values are similar for more than one column.
o Pname and Dname are the grouping attributes.
Step 4 of 4
d.
The following is the view that has the project name, controlling department name, number of
employees, and total hours worked per week on the project for each project with more than one
employee working on it.
CREATE VIEW PROJECT_INFO
AS SELECT Pname, Dname, COUNT(WO.Essn) AS No_of_employees, SUM(WO.Hours) AS Total_hours
FROM PROJECT AS P, WORKS_ON AS WO,
DEPARTMENT AS D
WHERE P.Dnum = D.Dnumber
AND P.Pnumber = WO.Pno
GROUP BY Pname, Dname
HAVING COUNT(WO.Essn) > 1;
Explanation:
• CREATE VIEW will create a view with the name PROJECT_INFO.
• SELECT is used to query the database and get back the specified fields.
o Dname is an attribute of DEPARTMENT table.
o Pname is an attribute of PROJECT table.
o Essn and Hours are attributes of WORKS_ON table.
• FROM is used to query the database and get back the preferred information by specifying the
table name.
o DEPARTMENT, PROJECT, and WORKS_ON are table names.
o P is the alias name for PROJECT table.
o D is alias name of DEPARTMENT table.
o WO is alias name of WORKS_ON table.
• WHERE is used to specify a condition based on which the data is to be retrieved. The
conditions specified in the query are
o P.Dnum = D.Dnumber
o P.Pnumber = WO.Pno
• GROUP BY is used to group the result of a SELECT statement done on a table where the tuple
values are similar for more than one column.
o Pname and Dname are the grouping attributes.
• HAVING clause is used to specify the condition based on group by function.
o COUNT(WO.Essn) > 1 is the condition.
Chapter 7, Problem 9E
Problem
Consider the following view, DEPT_SUMMARY, defined on the COMPANY database in Figure
5.6:
CREATE VIEW DEPT_SUMMARY (D, C, Total_s, Average_s)
AS SELECT Dno, COUNT (*), SUM (Salary), AVG (Salary)
FROM EMPLOYEE
GROUP BY Dno;
State which of the following queries and updates would be allowed on the view. If a query or
update would be allowed, show what the corresponding query or update on the base relations
would look like, and give its result when applied to the database in Figure 5.6.
a.
SELECT
*
FROM
DEPT_SUMMARY;
b.
SELECT
D,C
FROM
DEPT_SUMMARY
WHERE
TOTAL_S > 100000;
c.
SELECT D, AVERAGE_S
FROM
DEPT_SUMMARY
WHERE C > ( SELECT C FROM D
d.
UPDATE
DEPT_SUMMARY
SET
D=3
WHERE
D = 4;
e.
DELETE
FROM DEPT_SUMMARY
WHERE
C > 4;
Step-by-step solution
Step 1 of 5
a) Allowed
D C Total_s Average_s
5 4 133000 33250
4 3 93000 31000
1 1 55000 55000
Step 2 of 5
b) Allowed
D C
5 4
Step 3 of 5
c) Allowed
D Average_s
5 33250
Step 4 of 5
d) Not allowed, because a view defined using grouping and aggregate functions is not updatable.
Step 5 of 5
e) Not allowed, because the deletion cannot be mapped unambiguously to deletions on the base
relation.
Chapter 8, Problem 1RQ
Problem
List the operations of relational algebra and the purpose of each.
Step-by-step solution
Step 1 of 6
The operations of relational algebra are as follows:
• SELECT
• PROJECT
• THETA JOIN
• EQUI JOIN
• NATURAL JOIN
• UNION
• INTERSECTION
• MINUS or DIFFERENCE
• CARTESIAN PRODUCT
• DIVISION
Step 2 of 6
SELECT operation:
• It is used to obtain a subset of tuples of a relation based on a condition. In other words, it
retrieves only those tuples that satisfy the condition.
• The symbol used to denote the SELECT operation is σ (sigma).
• The notation of the SELECT operation is σ<selection condition>(R).
• σ Job='Clerk'(Employee) retrieves the tuples from relation Employee whose job is clerk.
PROJECT operation:
• It is used to obtain certain attributes/columns of a relation. The attributes to be retrieved must
be specified as a list separated by commas.
• The symbol used to denote the PROJECT operation is π (pi).
• The notation of the PROJECT operation is π<attribute list>(R).
• π Lname, Fname, Employee_number(Employee) retrieves only the employee's last name, first
name, and employee number of all employees in relation Employee.
Step 3 of 6
THETA JOIN operation:
• THETA JOIN operation combines related tuples from two relations and outputs as a single
tuple.
• The symbol used to denote the THETA JOIN operation is ⋈.
• The notation of THETA JOIN between the relations R and S is given as R ⋈<condition> S.
EQUI JOIN operation:
• An EQUIJOIN operation combines all the tuples of relations R and S that satisfy the condition.
The comparison operator must be =.
• The notation of EQUI JOIN between the relations R and S is given as R ⋈<A=B> S.
Step 4 of 6
NATURAL JOIN operation:
• It is similar to EQUIJOIN. The only difference is the join attributes of relation S are not included
in the resultant relation.
• The notation of NATURAL JOIN between the relations R and S is given as R * S.
UNION operation:
• When UNION operation is applied on relations R and S, the resultant relation consists of all the
tuples in relation R or S or both R and S.
• If similar tuples are in both R and S relations, then only one tuple will be in the resultant relation.
• The UNION operation can be applied on relations R and S only if the relations are union
compatible.
• The symbol used to denote the UNION operation is ∪.
• The notation of UNION between the relations R and S is given as R ∪ S.
Step 5 of 6
INTERSECTION operation:
• When INTERSECTION operation is applied on relations R and S, the resultant relation consists
of only the tuples that are in both R and S.
• The symbol used to denote the INTERSECTION operation is ∩.
• The notation of INTERSECTION between the relations R and S is given as R ∩ S.
MINUS or DIFFERENCE operation:
• When DIFFERENCE operation is applied on relations R and S, the resultant relation consists of
only the tuples that are R but not in S.
• The symbol used to denote the DIFFERENCE operation is −.
• The notation of DIFFERENCE between the relations R and S is given as R − S.
Step 6 of 6
CARTESIAN PRODUCT operation:
• When CARTESIAN PRODUCT operation is applied on relations R and S, the resultant relation
consists of all the attributes of relation R and S along with all possible combination of the tuples
of R and S.
• The symbol used to denote the CARTESIAN PRODUCT operation is ×.
• The notation of CARTESIAN PRODUCT between the relations R and S is given as R × S.
DIVISION operation:
• This combines every tuple of R with every tuple from S to form a new relation consisting of the
tuples of R that appear in combination with all the tuples of S.
• The symbol used to denote the DIVISION operation is ÷.
• The notation of DIVISION between R and S is given as R ÷ S.
Chapter 8, Problem 2RQ
Problem
What is union compatibility? Why do the UNION, INTERSECTION, and DIFFERENCE
operations require that the relations on which they are applied be union compatible?
Step-by-step solution
Step 1 of 2
Union compatibility: Two relations are said to be union compatible if both relations have the
same number of attributes and the domains of corresponding attributes are the same.
Step 2 of 2
The UNION, INTERSECTION, and DIFFERENCE operations require that the relations on which
they are applied be union compatible because all these operations are binary set operations.
The tuples of the relations are directly compared under these operations, so the tuples must
have the same number of attributes and the domains of corresponding attributes must be the
same.
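Union compatibility can be sketched with Python sets of tuples (relation names and rows below are invented): set operations compare whole tuples positionally, which only makes sense when both relations have the same number and domains of attributes.

```python
# R(Name, Dno) and S(Name, Dno) are union compatible: same arity, same domains.
R = {("Ann", 1), ("Bob", 2)}
S = {("Bob", 2), ("Carl", 3)}

print(sorted(R | S))  # UNION: [('Ann', 1), ('Bob', 2), ('Carl', 3)]
print(sorted(R & S))  # INTERSECTION: [('Bob', 2)]
print(sorted(R - S))  # DIFFERENCE: [('Ann', 1)]
```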
Chapter 8, Problem 3RQ
Problem
Discuss some types of queries for which renaming of attributes is necessary in order to specify
the query unambiguously.
Step-by-step solution
Step 1 of 1
When a query uses a NATURAL JOIN operation, renaming of the foreign key attribute is
necessary if the join attributes do not already have the same name in both relations; otherwise
the operation cannot be executed as intended. After an EQUIJOIN is performed, the result
contains two attributes that have the same value in every tuple, namely the attributes compared
in the join condition; in a NATURAL JOIN one of them is removed, so only a single attribute
remains.
The DIVISION operation is another such operation. Division takes place on the basis of common
attributes, so their names must be the same.
Chapter 8, Problem 4RQ
Problem
Discuss the various types of inner join operations. Why is theta join required?
Step-by-step solution
Step 1 of 6
Various types of inner join operations:
When data from multiple relations is combined, the related information can be presented in a
single table.
This operation is known as an inner join operation.
Inner join operations are of two types. They are:
• EQUI JOIN operations
• NATURAL JOIN operations
Step 2 of 6
EQUI JOIN operation:
• In this operation, the join condition uses only equality comparisons.
• A JOIN in which the only comparison operator used is = is called an EQUIJOIN.
• The result of an EQUIJOIN always has one or more pairs of attributes that have identical
values in every tuple.
Example syntax:
table_expression [INNER] JOIN table_expression
ON boolean_expression
Step 3 of 6
NATURAL JOIN operation:
• One of each pair of attributes with identical values is superfluous.
• The NATURAL JOIN operation is denoted by *.
• It was created to get rid of the second (superfluous) attribute in an EQUIJOIN condition.
Definition:
• The standard definition of the NATURAL JOIN operation requires that the two join attributes
have the same name in both relations.
Step 4 of 6
• If the join attributes do not already have the same name in both relations, a renaming operation
is applied first.
Example syntax:
R * S
Step 5 of 6
Theta join operation:
• A theta join combines tuples from two relations when the combination condition is not simply
the equality of shared attributes.
• In such cases it is convenient for the JOIN operation to have a more general form.
• The theta join is a binary operation denoted as R ⋈<Ai θ Bj> S, where Ai is an attribute of
relation R, Bj is an attribute of relation S, Ai and Bj have the same domain, and θ is a
comparison operator.
• Tuples whose join attributes are NULL, or for which the join condition is FALSE, do not appear
in the result.
• Thus the join of the two relations results in a subset of the Cartesian product, the subset
determined by the join condition.
Example:
The result of Professions ⋈<Job = Career> Careers is shown below.
Name Job Career Pays
Haney Mechanic Mechanic 6,500
David Archaeologist Archaeologist 40,000
John Doctor Doctor 50,000
Chapter 8, Problem 5RQ
Problem
What role does the concept of foreign key play when specifying the most common types of
meaningful join operations?
Step-by-step solution
Step 1 of 3
A foreign key is a column, or a combination of columns, that refers to the primary key of another
table and is used to maintain a relationship between the two tables.
• A foreign key is mainly used for establishing relationship between two tables.
• A table can have more than one foreign key.
Step 2 of 3
The JOIN operation is used to combine related tuples from two relations into a single tuple.
• In order to perform JOIN operation, there should exist relationship between two tables.
• The relationship is maintained through the concept of foreign key.
• If there is no foreign key, then JOIN operation may not lead to meaningful results.
Hence, a foreign key concept is needed to establish relationship between two tables.
Step 3 of 3
Example:
Consider the following relational database.
EMPLOYEE(Name, Ssn, Manager_ssn, Job, Salary, Address, DeptNum)
DEPARTMENT(Dno,Dname, Mgr_ssn)
DeptNum is a foreign key in relation EMPLOYEE. The JOIN operation can be performed on two
relations based on the foreign key.
To retrieve employee name, DeptNum, Dname, the JOIN is as follows:
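A sketch of such a join over the schema just given, made runnable with SQLite from Python (the rows are illustrative): equality on the foreign key DeptNum pairs each employee with their department.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (Name TEXT, DeptNum INTEGER)")
con.execute("CREATE TABLE DEPARTMENT (Dno INTEGER, Dname TEXT)")
con.execute("INSERT INTO EMPLOYEE VALUES ('Ann', 1)")
con.execute("INSERT INTO DEPARTMENT VALUES (1, 'Research')")

# Join on the foreign key: EMPLOYEE.DeptNum references DEPARTMENT.Dno.
rows = con.execute("""
    SELECT Name, DeptNum, Dname FROM EMPLOYEE, DEPARTMENT
    WHERE EMPLOYEE.DeptNum = DEPARTMENT.Dno
""").fetchall()
print(rows)  # [('Ann', 1, 'Research')]
```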
Chapter 8, Problem 6RQ
Problem
What is the FUNCTION operation? For what is it used?
Step-by-step solution
Step 1 of 3
FUNCTION operation:
• FUNCTION operation also known as AGGREGATE FUNCTION operation is used to perform
some mathematical aggregate functions on the numeric data.
• It also allows grouping of data/tuples based on some attributes of the relation.
• The aggregate functions are SUM, AVERAGE, MAXIMUM, MINIMUM and COUNT.
Step 2 of 3
The syntax of the FUNCTION operation is as follows:
<grouping attributes> ℑ <function list> (R)
where,
<grouping attributes> is a list of attributes of R based on which grouping is to be performed,
ℑ is the symbol used for the aggregate function operation, and
<function list> is a list of pairs, where each pair consists of a function and an attribute.
Step 3 of 3
FUNCTION operation is used for obtaining the summarized data from the relations.
Example:
ℑ MAXIMUM Salary, MINIMUM Salary (EMPLOYEE)
The above query will find the maximum and minimum salary in the EMPLOYEE relation.
Chapter 8, Problem 7RQ
Problem
How are the OUTER JOIN operations different from the INNER JOIN operations? How is the
OUTER UNION operation different from UNION?
Step-by-step solution
Step 1 of 2
OUTER JOIN and INNER JOIN: Consider two relations R and S. When the user wants to keep
all the tuples in R, or all those in S, or all those in both relations in the result of the JOIN,
regardless of whether or not they have matching tuples in the other relation, a set of operations
called outer joins can do so. This satisfies the need of queries in which tuples from two tables
are to be combined by matching corresponding rows, but without losing any tuples for lack of
matching values.
When only the matching tuples (based on the condition), and not all tuples, are contained in the
resultant relation, the join is an INNER JOIN (EQUIJOIN and NATURAL JOIN).
In an OUTER JOIN, if matching values from the other relation are not present, the fields are
padded with NULL values.
Step 2 of 2
OUTER UNION and UNION: For the UNION operation, the relations have to be union
compatible, i.e., they have the same number of attributes and each corresponding pair of
attributes has the same domain.
The OUTER UNION operation was developed to take the union of tuples from two relations that
are not union compatible. It takes the UNION of tuples in two relations R(X, Y) and S(X, Z) that
are partially compatible, meaning that only some of their attributes, say X, are union compatible.
The resultant relation is of the form RESULT(X, Y, Z).
Two tuples t1 in R and t2 in S are said to match if t1[X] = t2[X]; they are considered to represent
the same entity instance and are combined into a single tuple.
The remaining tuples are padded with NULL values.
Chapter 8, Problem 8RQ
Problem
In what sense does relational calculus differ from relational algebra, and in what sense are they
similar?
Step-by-step solution
Step 1 of 2
Difference between relational calculus and relational algebra:
Relational calculus | Relational algebra
It is a non-procedural language. | It is a procedural language.
The query specifies what output is to be retrieved. | The query specifies how the desired output is retrieved.
The order of the operations to be followed for getting the result is not specified. | The order of the operations to be followed for getting the result is specified.
The evaluation of the query does not depend on the order of the operations. | The evaluation of the query depends on the order of the operations.
New relations are not created by performing operations on the existing relations; formulas are directly applied on the existing relations. | New relations can be obtained by performing operations on the existing relations.
The queries can be domain dependent. | The queries are domain independent.
Step 2 of 2
Similarities between relational calculus and relational algebra:
• Relational algebra and relational calculus are formal query languages for relational model.
• They are used for retrieving information from database.
Chapter 8, Problem 9RQ
Problem
How does tuple relational calculus differ from domain relational calculus?
Step-by-step solution
Step 1 of 2
The relational calculus is a non-procedural query language that uses predicates.
• The query in relational calculus specifies what output is to be retrieved.
• The order of the operations to be followed for getting the result is not specified.
• In other words, the evaluation of the query does not depend on the order of the operations.
• The two variations of relational calculus are:
o Tuple relational calculus
o Domain relational calculus
Step 2 of 2
The differences between tuple relational calculus and domain relational calculus are as follows:
• In the tuple relational calculus, variables range over tuples of a relation; a query has the form
{t | P(t)}.
• In the domain relational calculus, variables range over the domains of individual attributes; a
query has the form {x1, x2, ..., xn | P(x1, x2, ..., xn)}.
• SQL is based on the tuple relational calculus, while QBE (Query-By-Example) is related to the
domain relational calculus.
Chapter 8, Problem 10RQ
Problem
Discuss the meanings of the existential quantifier (∃) and the universal quantifier (∀).
Step-by-step solution
Step 1 of 2
Quantifiers are of two types:
(1) Existential quantifiers
(2) Universal quantifiers
(1) Existential quantifier: The existential quantifier is a logical symbol, written ∃ ("there exists").
If F is a formula, then so is (∃t)(F), where t is a tuple variable.
If the formula F evaluates to TRUE for some tuple assigned to the free occurrences of t in F,
then the formula (∃t)(F) is TRUE. Otherwise, it is FALSE.
Step 2 of 2
(2) Universal quantifier: The universal quantifier is a logical symbol, written ∀ ("for all").
If F is a formula, then so is (∀t)(F), where t is a tuple variable.
If the formula F evaluates to TRUE for every tuple assigned to the free occurrences of t in F,
then the formula (∀t)(F) is TRUE. Otherwise, it is FALSE.
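The two quantifiers behave like Python's any() and all() over a relation's tuples (the salaries and predicates below are invented; the predicate stands in for the formula F):

```python
employees = [30000, 40000, 25000]  # a toy relation of salary values

# Existential quantifier: F is TRUE for some tuple.
exists_high = any(s > 35000 for s in employees)
# Universal quantifier: F is TRUE for every tuple.
forall_paid = all(s > 20000 for s in employees)

print(exists_high, forall_paid)  # True True
```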
Chapter 8, Problem 11RQ
Problem
Define the following terms with respect to the tuple calculus: tuple variable, range relation,atom,
formula, and expression.
Step-by-step solution
Step 1 of 3
Tuple relational calculus: The tuple relational calculus is a non-procedural language. It contains
a declarative expression that specifies what is to be retrieved.
Step 2 of 3
Tuple variable: A query in the tuple relational calculus is represented as {t | P(t)}. Here, t is a tuple variable and P is a predicate that must be TRUE for t.
Range Relation: In the tuple relational calculus, every tuple ranges over a relation. The variable
takes any tuple as its value from the relation.
Atom: The atom in the tuple relational calculus identifies the range of the tuple variable. The
condition in the tuple relational calculus is made of atoms.
Step 3 of 3
Formula: A formula (condition) is made up of atoms connected by the logical operators AND, OR, and NOT. Every atom is itself a formula, so a formula may consist of a single atom or of several atoms.
Expression: The tuple relational calculus contains a declarative expression that specifies what is
to be retrieved.
Example:
Consider the expression {t | EMPLOYEE(t) AND t.Salary > 50000}. In this expression, t is the tuple variable; EMPLOYEE(t) AND t.Salary > 50000 is the formula; EMPLOYEE(t) and t.Salary > 50000 are atoms; and EMPLOYEE(t) is a range relation atom that specifies the range of the tuple variable t over the relation EMPLOYEE.
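The declarative reading of a tuple calculus expression only says what to retrieve, not how; a set-builder-style comprehension is the closest programming analogue. A small sketch, with invented EMPLOYEE data:

```python
# {t | EMPLOYEE(t) AND t.Salary > 30000} written as a comprehension.
# The EMPLOYEE contents below are illustrative, not from the textbook figures.
EMPLOYEE = [
    {"fname": "John", "lname": "Smith", "salary": 30000},
    {"fname": "Franklin", "lname": "Wong", "salary": 40000},
]

# "for t in EMPLOYEE" plays the role of the range relation atom EMPLOYEE(t):
# t ranges over the tuples of EMPLOYEE. The condition after "if" is the
# formula, and the comparison is an atom.
result = [t for t in EMPLOYEE if t["salary"] > 30000]
print(result)
```

Only the tuples for which the formula evaluates to TRUE appear in the result.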
Chapter 8, Problem 12RQ
Problem
Define the following terms with respect to the domain calculus: domain variable, range relation,
atom, formula, and expression.
Step-by-step solution
Step 1 of 3
Domain variable: A variable whose value is drawn from the domain of an attribute.
To form a relation of degree n for a query result, n domain variables are used.
Example:
The domain of the domain variable Crs might be the set of possible values of the Crs_code attribute of the relation TEACHING.
Step 2 of 3
Range relation: In the domain calculus, the variables used in formulas do not range over tuples; instead, each variable ranges over the single values in the domain of an attribute.
Atom: An atom has the form R(x1, x2, …, xj), where R is the name of a relation of degree j and each xi, 1 ≤ i ≤ j, is a domain variable. An atom evaluates to TRUE if the list of values assigned to the variables forms a tuple of R.
Step 3 of 3
Formula: In the domain relational calculus, a formula is defined recursively, starting with simple atomic formulas and building larger formulas from them using the logical connectives.
A formula is made up of atoms.
Expression: An expression in the domain relational calculus has the form
{x1, x2, …, xn | P(x1, x2, …, xn)}
where x1, x2, …, xn are domain variables and P is a formula.
Chapter 8, Problem 13RQ
Problem
What is meant by a safe expression in relational calculus?
Step-by-step solution
Step 1 of 3
An expression in relational calculus is said to be a safe expression if it is guaranteed to yield a finite set
of tuples.
Step 2 of 3
The relational calculus expression {t | NOT STUDENT(t)} generates all the tuples in the universe that are not student tuples.
It yields an infinite number of tuples, since there are infinitely many possible tuples other than student tuples.
Expressions in relational calculus that do not generate a finite set of tuples are known as unsafe expressions.
Step 3 of 3
All values in the result of a safe expression must come from the domain of the expression (the values appearing in the database or mentioned in the expression itself); otherwise, the expression is considered unsafe.
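One way to make the safety condition concrete is to restrict evaluation to the active domain, the values actually occurring in the database, so that even a negated formula returns a finite result. A sketch under that assumption, with invented relation contents:

```python
# {t | NOT STUDENT(t)} is unsafe: "all tuples that are not student tuples"
# is an infinite set. Restricting t to tuples built from the active domain
# (values occurring anywhere in the database) makes the result finite.
STUDENT = {("Alice",), ("Bob",)}
OTHER = {("Carol",), ("Bob",)}

# Active domain: every value appearing in some relation of the database.
active_domain = {v for rel in (STUDENT, OTHER) for t in rel for v in t}

# Safe variant: candidate tuples over the active domain, minus STUDENT.
result = {(v,) for v in active_domain} - STUDENT
print(sorted(result))
```

The answer is now finite and drawn entirely from values already in the database.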
Chapter 8, Problem 14RQ
Problem
When is a query language called relationally complete?
Step-by-step solution
Step 1 of 2
A relational query language is said to be relationally complete if any query that can be expressed in the relational calculus can also be expressed in that language.
• The expressive power of such a language is at least equivalent to that of relational algebra.
• Relational completeness is a criterion by which the expressive strength of a query language can be measured.
Step 2 of 2
• Some queries, such as aggregations and recursive queries, cannot be expressed in relational calculus or relational algebra.
• Almost all relational query languages (for example, SQL) are relationally complete, and most are in fact more expressive than relational algebra and relational calculus.
Chapter 8, Problem 15E
Problem
Show the result of each of the sample queries in Section 8.5 as it would apply to the database
state in Figure 5.6.
Step-by-step solution
Step 1 of 6
Query 1:
Result:
FNAME      LNAME     ADDRESS
John       Smith     731 Fondren, Houston, TX
Franklin   Wong      638 Voss, Houston, TX
Ramesh     Narayan   975 Fire Oak, Humble, TX
Joyce      English   5631 Rice, Houston, TX
Step 2 of 6
Query 2:
Result:
PNUMBER   DNUM   LNAME     ADDRESS                   BDATE
10        4      Wallace   291 Berry, Bellaire, TX   1941-06-20
30        4      Wallace   291 Berry, Bellaire, TX   1941-06-20
Step 3 of 6
Query 3:
Result: empty, because no tuple satisfies the condition.
LNAME   FNAME
Query 4:
Result:
PNO
1
2
Step 4 of 6
Query 5:
Result:
LNAME   FNAME
Smith   John
Wong    Franklin
Step 5 of 6
Query 6:
Result:
LNAME     FNAME
Zelaya    Alicia
Narayan   Ramesh
English   Joyce
Jabbar    Ahmad
Borg      James
Step 6 of 6
Query 7:
Result:
LNAME     FNAME
Wallace   Jennifer
Wong      Franklin
Chapter 8, Problem 16E
Problem
Specify the following queries on the COMPANY relational database schema shown in Figure 5.5
using the relational operators discussed in this chapter. Also show the result of each query as it
would apply to the database state in Figure 5.6.
a. Retrieve the names of all employees in department 5 who work more than 10 hours per week
on the ProductX project.
b. List the names of all employees who have a dependent with the same first name as
themselves.
c. Find the names of all employees who are directly supervised by ‘Franklin Wong’.
d. For each project, list the project name and the total hours per week (by all employees) spent
on that project.
e. Retrieve the names of all employees who work on every project.
f. Retrieve the names of all employees who do not work on any project.
g. For each department, retrieve the department name and the average salary of all employees
working in that department.
h. Retrieve the average salary of all female employees.
i. Find the names and addresses of all employees who work on at least one project located in
Houston but whose department has no location in Houston.
j. List the last names of all department managers who have no dependents.
Step-by-step solution
(The relational algebra expressions and result tables for parts a. through j. were given as figures in the original solution and are not reproduced here.)
Chapter 8, Problem 17E
Problem
Consider the AIRLINE relational database schema shown in Figure, which was described in
Exercise. Specify the following queries in relational algebra:
a. For each flight, list the flight number, the departure airport for the first leg of the flight, and the
arrival airport for the last leg of the flight.
b. List the flight numbers and weekdays of all flights or flight legs that depart from Houston
Intercontinental Airport (airport code ‘iah’) and arrive in Los Angeles International Airport (airport
code ‘lax’).
c. List the flight number, departure airport code, scheduled departure time, arrival airport code,
scheduled arrival time, and weekdays of all flights or flight legs that depart from some airport in
the city of Houston and arrive at some airport in the city of Los Angeles.
d. List all fare information for flight number ‘col97’.
e. Retrieve the number of available seats for flight number ‘col97’ on ‘2009-10-09’.
The AIRLINE relational database scheme.
Exercise
Consider the AIRLINE relational database schema shown in Figure, which describes a database
for airline flight information. Each FLIGHT is identified by a Flight_number, and consists of one or
more FLIGHT_LEGs with Leg_numbers 1, 2, 3, and so on. Each FLIGHT_LEG has scheduled
arrival and departure times, airports, and one or more LEG_INSTANCEs— one for each Date on
which the flight travels. FAREs are kept for each FLIGHT. For each FLIGHT_LEG instance,
SEAT_RESERVATIONs are kept, as are the AIRPLANE used on the leg and the actual arrival
and departure times and airports. An AIRPLANE is identified by an Airplane_id and is of a
particular AIRPLANE_TYPE. CAN_LAND relates AIRPLANE_TYPEs to the AIRPORTs at which
they can land. An AIRPORT is identified by an Airport_code. Consider an update for the AIRLINE
database to enter a reservation on a particular flight or flight leg on a given date.
a. Give the operations for this update.
b. What types of constraints would you expect to check?
c. Which of these constraints are key, entity integrity, and referential integrity constraints, and
which are not?
d. Specify all the referential integrity constraints that hold on the schema shown in Figure.
Step-by-step solution
Step 1 of 4
The following symbols are used to write a relation algebra query:
Step 2 of 4
a.
Following is the query to list the flight number, the first leg of flight’s departure airport, and the
last leg of flight’s arrival airport from each flight:
Explanation:
• FLIGHT_LEG_IN holds the combinations of FLIGHT and FLIGHT_LEG tuples in which the FLIGHT's Flight_number is equal to the FLIGHT_LEG's Flight_number.
• MAX_FLIGHT_LEG holds the data about Flight_numbers whose Leg_number is maximum in
the FLIGHT_LEG_IN.
• MIN_FLIGHT_LEG holds the data about Flight_numbers whose Leg_number is minimum in the
FLIGHT_LEG_IN.
• In RESULT1, the Flight_number, Leg_number, and Arrival_airport_code of MAX_FLIGHT_LEG are stored.
• In RESULT2, the Flight_number, Leg_number, and Departure_airport_code of MIN_FLIGHT_LEG are stored.
• RESULT displays the tuples of the UNION of RESULT1 and RESULT2.
Step 3 of 4
b.
Following is the query to retrieve the flight numbers and weekdays of all flights or flight legs that fly from Houston Intercontinental Airport, whose code is given as ‘iah’, to Los Angeles International Airport, whose code is given as ‘lax’:
Explanation:
• FLIGHT_LEG_IN holds the combinations of FLIGHT and FLIGHT_LEG tuples in which the FLIGHT's Flight_number is equal to the FLIGHT_LEG's Flight_number.
• In RESULT1, the data about the FLIGHT_LEG is stored whose Departure_airport_code is
‘iah’and Arrival_airport_code is ‘lax’.
• RESULT will display the Flight_number, Weekdays of RESULT1.
c.
Following is the query to retrieve the flight number, departure airport code and scheduled departure time, arrival airport code and scheduled arrival time, and weekdays of all flights or flight legs that fly from some airport in the city of Houston and land at some airport in the city of Los Angeles:
Explanation:
• FLIGHT_LEG_IN holds the combinations of FLIGHT and FLIGHT_LEG tuples in which the FLIGHT's Flight_number is equal to the FLIGHT_LEG's Flight_number.
• The DEPART_CODE will hold the data about the Airport_code of AIRPORT whose City =
‘Houston’.
• The ARRIVE_CODE will hold the data about the Airport_code of AIRPORT whose City = ‘Los
Angeles’.
• The HOUST_DEPART holds the resultant of the relation obtained when the JOIN operation is
applied between the relations DEPART_CODE and FLIGHT_LEG_IN which satisfies condition
that Airport_Code = Departure_airport_code.
• The HOUST_TO_LA holds the resultant of the relation obtained when the JOIN operation is
applied between the relations ARRIVE_CODE and HOUST_DEPART which satisfies condition
that Airport_Code = Arrival_airport_code.
• RESULT will display the Flight_number, Departure_airport_code, Scheduled_departure_time,
Arrival_airport_code, Scheduled_arrival_time and Weekdays of HOUST_TO_LA.
d.
Following is the query to retrieve the fare information of the flight whose flight number is ‘col97’:
Explanation:
RESULT will hold the data about the all the FARE’s whose Flight_number is ‘col97’.
Step 4 of 4
e.
Following is the query to get the number of available seats for the flight numbered ‘col97’ on ‘2009-10-09’:
Explanation:
• LEG_INST_INFO holds the data about LEG_INSTANCE whose Flight_number is ‘col97’ and
Date is ‘2009-10-09’.
• RESULT will display the Number_of_available_seats information of the LEG_INST_INFO.
Chapter 8, Problem 18E
Problem
Consider the LIBRARY relational database schema shown in Figure, which is used to keep track
of books, borrowers, and book loans. Referential integrity constraints are shown as directed arcs
in Figure, as in the notation of Figure 5.7. Write down relational expressions for the following
queries:
a. How many copies of the book titled The Lost Tribe are owned by the library branch whose
name is ‘Sharpstown’?
b. How many copies of the book titled The Lost Tribe are owned by each library branch?
c. Retrieve the names of all borrowers who do not have any books checked out.
d. For each book that is loaned out from the Sharpstown branch and whose Due_date is today,
retrieve the book title, the borrower’s name, and the borrower’s address.
e. For each library branch, retrieve the branch name and the total number of books loaned out
from that branch.
f. Retrieve the names, addresses, and number of books checked out for all borrowers who have
more than five books checked out.
g. For each book authored (or coauthored) by Stephen King, retrieve the title and the number of
copies owned by the library branch whose name is Central.
A relational database scheme for a LIBRARY database.
Step-by-step solution
Step 1 of 7
a.
Following is the relational expression to find the number of copies of the book whose title is ‘The
Lost Tribe’ in the library branch whose name is ‘Sharpstown’:
Step 2 of 7
b.
Following is the relational expression to find the number of copies of the book whose title is ‘The
Lost Tribe’ is available at each branch of the library:
Step 3 of 7
c.
Following is the relational expression to retrieve the names of the borrowers who have no books
checked out:
Step 4 of 7
d.
Following is the relational expression to retrieve the book title and the borrower’s name and address for each book that is loaned out from the branch named ‘Sharpstown’ and whose due date is today:
Step 5 of 7
e.
Following is the relational expression to retrieve the branch name and the total number of books
loaned out from that branch:
Step 6 of 7
f.
Following is the relational expression to retrieve the name, address and total number of books for
all borrowers who have more than five books checked out:
Step 7 of 7
g.
Following is the relational expression to retrieve the title and number of copies of each book
authored or coauthored by Stephen King in library branch whose name is Central:
Chapter 8, Problem 19E
Problem
Specify the following queries in relational algebra on the database schema given in Exercise:
a. List the Order# and Ship_date for all orders shipped from Warehouse# W2.
b. List the WAREHOUSE information from which the CUSTOMER named Jose Lopez was
supplied his orders. Produce a listing: Order#, Warehouse#.
c. Produce a listing Cname, No_of_orders, Avg_order_amt, where the middle column is the total
number of orders by the customer and the last column is the average order amount for that
customer.
d. List the orders that were not shipped within 30 days of ordering.
e. List the Order# for orders that were shipped from all warehouses that the company has in New
York.
Exercise
Consider the following six relations for an order-processing database application in a company:
CUSTOMER(Cust#, Cname, City)
ORDER(Order#, Odate, Cust#, Ord_amt)
ORDER_ITEM(Order#, Item#, Qty)
ITEM(Item#, Unit_price)
SHIPMENT(Order#, Warehouse#, Ship_date)
WAREHOUSE(Warehouse#, City)
Here, Ord_amt refers to total dollar amount of an order; Odate is the date the order was placed;
and Ship_date is the date an order (or part of an order) is shipped from the warehouse. Assume
that an order can be shipped from several warehouses. Specify the foreign keys for this schema,
stating any assumptions you make. What other constraints can you think of for this database?
Step-by-step solution
Step 1 of 6
Relational Algebra
It is a procedural language to perform various queries on the database.
The operations of the relational algebra are as follows:
• Select: It is used to select tuples and is represented by the symbol σ.
• Project: It is used to project columns and is represented by Π.
• Union is represented by ∪.
• Set difference is represented by −.
• Cartesian product is represented by ×.
• Rename is represented by ρ.
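These operators can be sketched over relations modeled as sets of attribute-value tuples; the helper names and the sample SHIPMENT data below are illustrative, not from the exercise figures:

```python
# Minimal sketches of the core relational algebra operators, with relations
# represented as sets of tuples of (attribute, value) pairs.
def select(rel, pred):            # sigma: keep tuples satisfying pred
    return {t for t in rel if pred(dict(t))}

def project(rel, attrs):          # pi: keep only the named columns
    return {tuple((a, dict(t)[a]) for a in attrs) for t in rel}

def union(r, s):                  # r UNION s (union-compatible relations)
    return r | s

def difference(r, s):             # r MINUS s
    return r - s

def cartesian(r, s):              # r X s: every combination of tuples
    return {t + u for t in r for u in s}

# Invented SHIPMENT tuples for illustration.
SHIPMENT = {
    (("Order#", 1001), ("Warehouse#", "W2")),
    (("Order#", 1002), ("Warehouse#", "W1")),
}

# Part a as composition: PROJECT Order# (SELECT Warehouse#='W2' (SHIPMENT)).
w2_orders = project(select(SHIPMENT, lambda t: t["Warehouse#"] == "W2"),
                    ["Order#"])
print(w2_orders)
```

Select filters rows, project drops columns, and the set operators behave exactly as Python set operations because relations are sets of tuples.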
Step 2 of 6
a.
Query to retrieve the order number and shipping date for all the orders that are shipped from
Warehouse "W2":
Explanation:
• First select the SHIPMENT tuples whose Warehouse# is "W2", and then project the Order# and Ship_date attributes.
• The query returns the Order# and Ship_date fields from the SHIPMENT table for all orders whose warehouse number is "W2".
Step 3 of 6
b.
Query to retrieve the order number and warehouse number for all the orders of customer named
"Jose Lopez":
Explanation:
• First select the orders supplied to the customer named "Jose Lopez", and then project the Order# and Warehouse# columns.
• TEMP holds the natural join of ORDER and CUSTOMER restricted to the tuples whose Cname is ‘Jose Lopez’.
• The query then joins SHIPMENT with TEMP on matching Order# values and displays only the Order# and Warehouse#.
Step 4 of 6
c.
Query to retrieve the Cname and total number of orders and average order amount of each
customer:
Explanation:
• The relation TEMP specifies the list of attributes between parentheses in the RENAME operation.
• The aggregate functions in the query are defined using the aggregate function (ℱ) operation.
• The number of orders and the average order amount are grouped by the Cust# field.
• The query then joins CUSTOMER with TEMP on matching Cust# values and displays only the customer name, number of orders, and average order amount.
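The grouping-and-aggregation step described above can be sketched with a plain dictionary accumulation; the ORDER tuples below are invented for illustration:

```python
# Group ORDER tuples by Cust# and compute COUNT and AVG of Ord_amt,
# mirroring the aggregate-function step of the query. Data is invented.
ORDER = [
    {"Order#": 1, "Cust#": "C1", "Ord_amt": 100.0},
    {"Order#": 2, "Cust#": "C1", "Ord_amt": 300.0},
    {"Order#": 3, "Cust#": "C2", "Ord_amt": 50.0},
]

# Partition the order amounts by customer number.
groups = {}
for t in ORDER:
    groups.setdefault(t["Cust#"], []).append(t["Ord_amt"])

# For each group: (No_of_orders, Avg_order_amt).
temp = {cust: (len(amts), sum(amts) / len(amts)) for cust, amts in groups.items()}
print(temp)
```

Joining this back to CUSTOMER on Cust# would then attach the Cname to each (count, average) pair, as the query describes.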
Step 5 of 6
d.
Query to list the orders that are not shipped within 30 days of ordering:
Explanation:
• First join ORDER with SHIPMENT on matching Order# values, then select the orders that were not shipped within thirty days.
• The number of days is calculated by subtracting the order date from the shipping date; orders for which this difference exceeds 30 are selected, and the Order#, Odate, Cust#, and Ord_amt attributes are projected.
Step 6 of 6
e.
Query to list the order# of the orders shipped from the warehouses located in New York:
Explanation:
• TEMP holds the Warehouse# values of the WAREHOUSE tuples whose City is ‘New York’.
• The Order# and Warehouse# pairs projected from the SHIPMENT table are divided by TEMP.
• The DIVISION operator keeps only those Order# values that appear in SHIPMENT in combination with every Warehouse# in TEMP.
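The DIVISION step can be checked mechanically: keep only the orders paired with every warehouse in the divisor. A sketch with invented order and warehouse values:

```python
# SHIPMENT projected on (Order#, Warehouse#), divided by TEMP.
# All values below are invented for illustration.
shipment = {(1001, "W1"), (1001, "W3"), (1002, "W1"), (1003, "W3")}
temp = {"W1", "W3"}  # warehouses located in New York

# DIVISION: an order qualifies only if its set of warehouses
# is a superset of every warehouse in temp.
orders = {o for o, _ in shipment}
result = {o for o in orders
          if temp <= {w for (o2, w) in shipment if o2 == o}}
print(result)
```

Order 1001 was shipped from both W1 and W3, so it alone survives the division.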
Chapter 8, Problem 20E
Problem
Specify the following queries in relational algebra on the database schema given in Exercise:
a. Give the details (all attributes of trip relation) for trips that exceeded $2,000 in expenses.
b. Print the Ssns of salespeople who took trips to Honolulu.
c. Print the total trip expenses incurred by the salesperson with SSN = ‘234-56-7890’.
Exercise
Consider the following relations for a database that keeps track of business trips of salespersons
in a sales office:
SALESPERSON(Ssn, Name, Start_year, Dept_no)
TRIP(Ssn, From_city, To_city, Departure_date, Return_date, Trip id)
EXPENSE(Trip id, Account#, Amount)
A trip can be charged to one or more accounts. Specify the foreign keys for this schema, stating
any assumptions you make.
Step-by-step solution
Step 1 of 4
The relational database schema is:
SALESPERSON (Ssn, Name, Start_year, Dept_no)
TRIP (Ssn, From_city, To_city, Departure_date, Return_date, Trip_id)
EXPENSE(Trip_id, Account#, Amount)
Step 2 of 4
a) Details for trips that exceeded $2000 in expenses.
Step 3 of 4
b) Print the SSN of salesman who took trips to ‘Honolulu’.
Step 4 of 4
c) Print the total trip expenses incurred by the salesman with SSN= ‘234-56-7890’.
Chapter 8, Problem 21E
Problem
Specify the following queries in relational algebra on the database schema given in Exercise:
a. List the number of courses taken by all students named John Smith in Winter 2009 (i.e.,
Quarter=W09).
b. Produce a list of textbooks (include Course#, Book_isbn, Book_title) for courses offered by the
‘CS’ department that have used more than two books.
c. List any department that has all its adopted books published by ‘Pearson Publishing’.
Exercise
Consider the following relations for a database that keeps track of student enrollment in courses
and the books adopted for each course:
STUDENT(Ssn, Name, Major, Bdate)
COURSE(Course#, Cname, Dept)
ENROLL(Ssn, Course#, Quarter, Grade)
BOOK_ADOPTION(Course#, Quarter, Book_isbn)
TEXT(Book_isbn, Book_title, Publisher, Author)
Specify the foreign keys for this schema, stating any assumptions you make.
Step-by-step solution
Step 1 of 3
a.
Π Course# (σ Quarter=‘W09’ ((σ Name=‘John Smith’ (STUDENT)) ⋈ ENROLL))
Explanation:
• This query gives the courses taken by the student named ‘John Smith’ in Winter 2009.
• Here, ‘Π’ represents the projection operation, ‘σ’ represents the selection operation, and ‘⋈’ represents the natural join operation.
Step 2 of 3
b.
CS_BOOKS ← σ Dept=‘CS’ (COURSE) ⋈ BOOK_ADOPTION ⋈ TEXT
MULTI ← σ Book_count>2 (Course# ℱ COUNT Book_isbn (CS_BOOKS))
RESULT ← Π Course#, Book_isbn, Book_title (MULTI ⋈ CS_BOOKS)
Explanation:
• The query retrieves the list of textbooks for courses offered by the ‘CS’ department with the use of natural joins.
• The aggregate step keeps only the courses that have used more than two books, and the final join and projection produce the required Course#, Book_isbn, and Book_title columns.
Step 3 of 3
c.
BOOK_ALL_DEPTS ← Π Dept (BOOK_ADOPTION ⋈ COURSE)
BOOK_OTHER_DEPTS ← Π Dept ((σ Publisher≠‘Pearson Publishing’ (BOOK_ADOPTION ⋈ TEXT)) ⋈ COURSE)
RESULT ← BOOK_ALL_DEPTS − BOOK_OTHER_DEPTS
Explanation:
• The above query lists the departments that have all their adopted books published by ‘Pearson Publishing’: departments that adopted at least one book from another publisher are subtracted from the set of all departments that adopted books.
• In this query, the ‘≠’ operator is used for the “not equal to” comparison.
Chapter 8, Problem 22E
Problem
Consider the two tables T1 and T2 shown in Figure 8.15. Show the results of the following
operations:
Step-by-step solution
Step 1 of 7
Operations of relational algebra
The two tables T1 and T2 represent database states.
TABLE T1
P    Q    R
10   a    5
15   b    8
25   a    6
TABLE T2
A    B    C
10   b    6
25   c    3
10   b    5
Step 2 of 7
a) The operation T1 ⋈ T1.P=T2.A T2 is a THETA JOIN. It produces all the combinations of tuples that satisfy the join condition T1.P = T2.A. The following table is the result of the THETA JOIN operation.
P    Q    R    A    B    C
10   a    5    10   b    6
10   a    5    10   b    5
25   a    6    25   c    3
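The theta join result above can be reproduced mechanically from the T1 and T2 tuples; a short sketch:

```python
# THETA JOIN of T1 and T2 on the condition T1.P = T2.A.
T1 = [(10, "a", 5), (15, "b", 8), (25, "a", 6)]   # columns P, Q, R
T2 = [(10, "b", 6), (25, "c", 3), (10, "b", 5)]   # columns A, B, C

# Every combination of a T1 tuple and a T2 tuple satisfying P = A.
result = [t + u for t in T1 for u in T2 if t[0] == u[0]]
for row in result:
    print(row)
```

The tuple (15, b, 8) finds no partner in T2, so it contributes nothing to a theta join; contrast this with the outer joins in the later parts.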
Step 3 of 7
b) The operation T1 ⋈ T1.Q=T2.B T2 is a THETA JOIN. It produces all the combinations of tuples that satisfy the join condition T1.Q = T2.B. The following table is the result of the THETA JOIN operation.
P    Q    R    A    B    C
15   b    8    10   b    6
15   b    8    10   b    5
Step 4 of 7
c) The operation T1 ⟕ T1.P=T2.A T2 is a LEFT OUTER JOIN. It produces all the tuples of the first (left) relation T1 under the join condition T1.P = T2.A; if no matching tuple is found in T2, the T2 attributes are filled with NULL values. The following table is the result of the LEFT OUTER JOIN operation.
P    Q    R    A      B      C
10   a    5    10     b      6
10   a    5    10     b      5
15   b    8    NULL   NULL   NULL
25   a    6    25     c      3
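The NULL-padding behaviour of the left outer join can be sketched the same way, with None standing in for NULL:

```python
# LEFT OUTER JOIN of T1 and T2 on T1.P = T2.A: every T1 tuple survives;
# unmatched T1 tuples are padded with NULLs (None) on the T2 side.
T1 = [(10, "a", 5), (15, "b", 8), (25, "a", 6)]   # columns P, Q, R
T2 = [(10, "b", 6), (25, "c", 3), (10, "b", 5)]   # columns A, B, C

result = []
for t in T1:
    matches = [u for u in T2 if t[0] == u[0]]
    if matches:
        result.extend(t + u for u in matches)       # inner-join part
    else:
        result.append(t + (None, None, None))       # NULL-padded part
for row in result:
    print(row)
```

A right outer join is the mirror image: iterate over T2 and pad unmatched T2 tuples on the T1 side.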
Step 5 of 7
d) The operation T1 ⟖ T1.Q=T2.B T2 is a RIGHT OUTER JOIN. It produces all the tuples of the second (right) relation T2 under the join condition T1.Q = T2.B; if no matching tuple is found in T1, the T1 attributes are filled with NULL values. The following table is the result of the RIGHT OUTER JOIN operation.
P      Q      R      A    B    C
15     b      8      10   b    6
NULL   NULL   NULL   25   c    3
15     b      8      10   b    5
Step 6 of 7
e) The operation T1 ∪ T2 is a UNION. It produces a relation that includes all the tuples that are in T1, in T2, or in both. The operation is possible because T1 and T2 are union compatible. The following table is the result of the UNION operation.
P    Q    R
10   a    5
15   b    8
25   a    6
10   b    6
25   c    3
10   b    5
Step 7 of 7
f) The operation T1 ⋈ (T1.P=T2.A AND T1.R=T2.C) T2 is a THETA JOIN. It produces all the combinations of tuples that satisfy the join condition T1.P = T2.A AND T1.R = T2.C. The following table is the result of the THETA JOIN operation.
P    Q    R    A    B    C
10   a    5    10   b    5
Chapter 8, Problem 23E
Problem
Specify the following queries in relational algebra on the database schema in Exercise:
a. For the salesperson named ‘Jane Doe’, list the following information for all the cars she sold:
Serial#, Manufacturer, Sale_price.
b. List the Serial# and Model of cars that have no options.
c. Consider the NATURAL JOIN operation between SALESPERSON and SALE. What is the
meaning of a left outer join for these tables (do not change the order of relations)? Explain with
an example.
d. Write a query in relational algebra involving selection and one set operation and say in words
what the query does.
Exercise
Consider the following relations for a database that keeps track of automobile sales in a car
dealership (OPTION refers to some optional equipment installed on an automobile):
CAR(Serial no, Model, Manufacturer, Price)
OPTION(Serial_no, Option_name, Price)
SALE(Salesperson_id, Serial_no, Date, Sale_price)
SALESPERSON(Salesperson_id, Name, Phone)
First, specify the foreign keys for this schema, stating any assumptions you make. Next, populate
the relations with a few sample tuples, and then give an example of an insertion in the SALE and
SALESPERSON relations that violates the referential integrity constraints and of another
insertion that does not.
Step-by-step solution
Step 1 of 4
(a)
Step 2 of 4
(b)
Step 3 of 4
(c)
The meaning of a LEFT OUTER JOIN operation between SALESPERSON and SALE is that all the records for which the join condition evaluates to true are displayed, and all the records from SALESPERSON that do not match the condition are also displayed, with the attribute values corresponding to the SALE table marked as NULL.
For example, consider records for two salespersons:
(ID_1, ABC, 9999999)
(ID_2, DEF, 8888888)
and a single SALE tuple:
(ID_1, 111, 2-08-2008, 500000)
The result of the join operation will have two tuples:
(ID_1, ABC, 9999999, 111, 2-08-2008, 500000)
(ID_2, DEF, 8888888, NULL, NULL, NULL)
Step 4 of 4
(d)
This query gives information about the Doe couple, who happen to work at the same place.
Chapter 8, Problem 24E
Problem
Specify queries a, b, c, e, f, i, and j of Exercise 8.16 in both tuple and domain relational calculus.
Reference Exercise 8.16
Specify the following queries on the COMPANY relational database schema shown in Figure 5.5
using the relational operators discussed in this chapter. Also show the result of each query as it
would apply to the database state in Figure 5.6.
a. Retrieve the names of all employees in department 5 who work more than 10 hours per week
on the ProductX project.
b. List the names of all employees who have a dependent with the same first name as
themselves.
c. Find the names of all employees who are directly supervised by ‘Franklin Wong’.
d. For each project, list the project name and the total hours per week (by all employees) spent
on that project.
e. Retrieve the names of all employees who work on every project.
f. Retrieve the names of all employees who do not work on any project.
g. For each department, retrieve the department name and the average salary of all employees
working in that department.
h. Retrieve the average salary of all female employees.
i. Find the names and addresses of all employees who work on at least one project located in
Houston but whose department has no location in Houston.
j. List the last names of all department managers who have no dependents.
Step-by-step solution
Step 1 of 10
Tuple relational calculus
The tuple relational calculus is based on the use of tuple variables. A tuple variable usually “ranges over” a particular database relation.
Domain relational calculus
In the domain relational calculus, the variables take their values from domains of attributes rather than from tuples of relations.
Step 2 of 10
a.
• To specify the range of a tuple variable e as the EMPLOYEE relation.
• Select the LNAME and FNAME attributes of the EMPLOYEE tuples for which DNO=5 and HOURS>10 on the ProductX project.
Tuple relational calculus:
Explanation:
• In the provided tuple relational calculus, the tuple variable a ranges over EMPLOYEE, b ranges over PROJECT, and c ranges over WORKS_ON.
• In the above tuple relational calculus, a is a free variable, and it appears to the left of the bar (|).
• The variables before the bar (|) are retrieved for all tuples that satisfy the conditions given after the bar.
• The conditions EMPLOYEE (a) and WORKS_ON (c) specify the range relations for a and c.
The condition a.ssn=c.ESSN is a join condition.
Domain relational calculus:
Explanation:
• Ten domain variables, q, r, s, …, z, are needed for the EMPLOYEE relation; only q and s are free because they appear to the left of the bar.
• First, the requested attributes, the employee’s first and last name, are specified by the free domain variables q and s for the name fields.
• The conditions for selecting a tuple appear after the bar (|).
• A condition relating two domain variables of different relations, such as t = e, is a join condition.
Step 3 of 10
b.
• To specify the range of a tuple variable e as the EMPLOYEE relation.
• Select the LNAME, FNAME attributes of the EMPLOYEE relation who have a dependent with
the same first name as themselves.
Tuple relational calculus:
Explanation:
• In the provided tuple relational calculus, the tuple variable a ranges over EMPLOYEE and b ranges over DEPENDENT.
• In the above tuple relational calculus, a is a free variable, and it appears to the left of the bar (|).
• The variables before the bar (|) are retrieved for all tuples that satisfy the conditions given after the bar.
• The conditions EMPLOYEE (a) and DEPENDENT (b) specify the range relations for a and b.
The condition a.ssn=b.ESSN is a join condition.
Domain relational calculus:
Explanation:
• Ten domain variables, q, r, s, …, z, are needed for the EMPLOYEE relation; only q and s are free because they appear to the left of the bar.
• First, the requested attributes, the employee’s first and last name, are specified by the free domain variables q and s for the name fields.
• The conditions for selecting a tuple appear after the bar (|).
• The conditions relating domain variables of different relations, a = t and b = q, are join conditions.
Step 4 of 10
c.
• To specify the range of a tuple variable e as the EMPLOYEE relation.
• Select the LNAME, FNAME attributes of the EMPLOYEE relation to find the names of
employees that are directly supervised by 'Franklin Wong'.
Tuple relational calculus:
Explanation:
• In the provided tuple relational calculus, the tuple variable a ranges over the EMPLOYEE relation, and the tuple variable b also ranges over EMPLOYEE, giving a self-join.
• In the above tuple relational calculus, a is a free variable, and it appears to the left of the bar (|).
• The attributes listed before the bar (|) are retrieved for all tuples that satisfy the conditions given after the bar.
• The conditions EMPLOYEE (a) and EMPLOYEE (b) specify the range relations for a and b. The condition a.Super_ssn = b.Ssn is a self-join condition.
Domain relational calculus:
Explanation:
• Ten domain variables, q, r, s, …, z, are needed for the EMPLOYEE relation; only q and s are free because they appear to the left of the bar.
• First, the requested attributes (the employee's first and last names) are specified by the free domain variables q and s, which stand for the name fields.
• The condition for selecting a tuple appears after the bar (|).
• The condition y = d, relating domain variables from different relations, is a join condition, while S.FNAME = 'Franklin' AND S.LNAME = 'Wong' are selection conditions.
Comment
Step 5 of 10
e.
• Specify the range of a tuple variable e to be the EMPLOYEE relation.
• Select the LNAME and FNAME attributes of the EMPLOYEE tuples to retrieve the names of employees who work on every project.
Tuple relational calculus:
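The expression itself is an image in the original; a standard reconstruction, assuming the usual COMPANY attribute names and an extra existential variable w over WORKS_ON (an assumption, since the image is not shown), is:

```latex
\{\, a.Lname,\ a.Fname \mid \mathrm{EMPLOYEE}(a) \wedge (\forall b)\big(\neg \mathrm{PROJECT}(b) \vee (\exists w)(\mathrm{WORKS\_ON}(w) \wedge w.Essn = a.Ssn \wedge w.Pno = b.Pnumber)\big) \,\}
```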
Explanation:
• In the provided tuple relational calculus, the tuple variable a ranges over the EMPLOYEE relation, and the universally quantified (FORALL) tuple variable b ranges over the PROJECT relation.
• In the above tuple relational calculus, a is a free variable, and it appears to the left of the bar (|).
• The attributes listed before the bar (|) are retrieved for all tuples that satisfy the conditions given after the bar.
• The conditions EMPLOYEE (a) and PROJECT (b) specify the range relations for a and b. The conditions PNUMBER = PNO and ESSN = SSN are join conditions.
Domain relational calculus:
Explanation:
• Ten domain variables, q, r, s, …, z, are needed for the EMPLOYEE relation; only q and s are free because they appear to the left of the bar.
• First, the requested attributes (the employee's first and last names) are specified by the free domain variables q and s, which stand for the name fields.
• The condition for selecting a tuple appears after the bar (|).
• The condition e = t, together with PNUMBER = PNO AND ESSN = SSN, forms the join conditions relating domain variables from different relations.
Comment
Step 6 of 10
f.
• Specify the range of a tuple variable e to be the EMPLOYEE relation.
• Select the LNAME and FNAME attributes of the EMPLOYEE tuples to retrieve the names of employees who do not work on any project.
Tuple relational calculus:
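The expression itself is an image in the original; a standard reconstruction, assuming the usual COMPANY attribute names and the variables a and b used below, is:

```latex
\{\, a.Lname,\ a.Fname \mid \mathrm{EMPLOYEE}(a) \wedge \neg (\exists b)(\mathrm{WORKS\_ON}(b) \wedge b.Essn = a.Ssn) \,\}
```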
Comment
Step 7 of 10
Explanation:
• In the provided tuple relational calculus, the tuple variable a ranges over the EMPLOYEE relation and the tuple variable b ranges over the WORKS_ON relation.
• In the above tuple relational calculus, a is a free variable, and it appears to the left of the bar (|).
• The attributes listed before the bar (|) are retrieved for all tuples that satisfy the conditions given after the bar.
• The conditions EMPLOYEE (a) and WORKS_ON (b) specify the range relations for a and b. The join condition is ESSN = SSN.
Domain relational calculus:
Explanation:
• Ten domain variables, q, r, s, …, z, are needed for the EMPLOYEE relation; only q and s are free because they appear to the left of the bar.
• First, the requested attributes (the employee's first and last names) are specified by the free domain variables q and s, which stand for the name fields.
• The condition for selecting a tuple appears after the bar (|).
• The condition a = t, together with ESSN = SSN, forms the join condition relating domain variables from different relations.
Comment
Step 8 of 10
i.
• Specify the range of a tuple variable e to be the EMPLOYEE relation.
• Select the LNAME, FNAME, and ADDRESS attributes of the EMPLOYEE tuples for employees who work on at least one project located in Houston.
Tuple relational calculus:
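The expression itself is an image in the original; a standard reconstruction matching the explanation below (assuming the usual COMPANY attribute names and the variables a, b, and c), is:

```latex
\{\, a.Lname,\ a.Fname,\ a.Address \mid \mathrm{EMPLOYEE}(a) \wedge (\exists b)(\exists c)(\mathrm{PROJECT}(b) \wedge \mathrm{WORKS\_ON}(c) \wedge a.Ssn = c.Essn \wedge c.Pno = b.Pnumber \wedge b.Plocation = \text{'Houston'}) \,\}
```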
Explanation:
• In the provided tuple relational calculus, the tuple variable a ranges over EMPLOYEE, b ranges over PROJECT, and c ranges over WORKS_ON.
• In the above tuple relational calculus, a is a free variable, and it appears to the left of the bar (|).
• The attributes listed before the bar (|) are retrieved for all tuples that satisfy the conditions given after the bar.
• The conditions EMPLOYEE (a), PROJECT (b), and WORKS_ON (c) specify the range relations for a, b, and c. The conditions a.Ssn = c.ESSN and PNO = PNUMBER are join conditions, and PLOCATION = 'Houston' is a selection condition.
Domain relational calculus:
Explanation:
• Ten domain variables, q, r, s, …, z, are needed for the EMPLOYEE relation; only q, s, and v are free because they appear to the left of the bar.
• First, the requested attributes (the employee's name and address) are specified by the free domain variables q, s, and v for the name and address fields.
• The condition for selecting a tuple appears after the bar (|).
• The conditions t = e, e.Ssn = w.ESSN, and PNO = PNUMBER are join conditions, and PLOCATION = 'Houston' is a selection condition.
Comment
Step 9 of 10
j.
• Specify the range of a tuple variable e to be the EMPLOYEE relation.
• Select the LNAME attribute of the EMPLOYEE tuples to retrieve the last names of department managers who have no dependents.
Tuple relational calculus:
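The expression itself is an image in the original; a standard reconstruction, assuming the usual COMPANY attribute names and the variables a, b, and c used below, is:

```latex
\{\, a.Lname \mid \mathrm{EMPLOYEE}(a) \wedge (\exists b)(\mathrm{DEPARTMENT}(b) \wedge b.Mgr\_ssn = a.Ssn) \wedge \neg (\exists c)(\mathrm{DEPENDENT}(c) \wedge c.Essn = a.Ssn) \,\}
```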
Explanation:
• In the provided tuple relational calculus, the tuple variable a ranges over EMPLOYEE, b ranges over DEPARTMENT, and c ranges over DEPENDENT.
Comment
Step 10 of 10
• In the above tuple relational calculus, a is a free variable, and it appears to the left of the bar (|).
• The attributes listed before the bar (|) are retrieved for all tuples that satisfy the conditions given after the bar.
• The conditions EMPLOYEE (a) and DEPARTMENT (b) specify the range relations for a and b. The conditions a.Ssn = b.MGRSSN and SSN = ESSN are join conditions.
Domain relational calculus:
Explanation:
• Ten domain variables, q, r, s, …, z, are needed for the EMPLOYEE relation; only s is free because it appears to the left of the bar.
• First, the requested attribute (the manager's last name) is specified by the free domain variable s for the name field.
• The condition for selecting a tuple appears after the bar (|).
• The conditions e = t, e.Ssn = d.MGRSSN, and SSN = ESSN are join conditions relating domain variables from different relations.
Comment
Chapter 8, Problem 25E
Problem
Specify queries a, b, c, and d of Exercise 1 in both tuple and domain relational calculus.
Exercise 1
Consider the AIRLINE relational database schema shown in Figure, which was described in
Exercise 2. Specify the following queries in relational algebra:
a. For each flight, list the flight number, the departure airport for the first leg of the flight, and the
arrival airport for the last leg of the flight.
b. List the flight numbers and weekdays of all flights or flight legs that depart from Houston
Intercontinental Airport (airport code ‘iah’) and arrive in Los Angeles International Airport (airport
code ‘lax’).
c. List the flight number, departure airport code, scheduled departure time, arrival airport code,
scheduled arrival time, and weekdays of all flights or flight legs that depart from some airport in
the city of Houston and arrive at some airport in the city of Los Angeles.
d. List all fare information for flight number ‘col97’.
e. Retrieve the number of available seats for flight number ‘col97’ on ‘2009-10-09’.
The AIRLINE relational database schema.
Exercise 2
Consider the AIRLINE relational database schema shown in Figure, which describes a database
for airline flight information. Each FLIGHT is identified by a Flight_number, and consists of one or
more FLIGHT_LEGs with Leg_numbers 1, 2, 3, and so on. Each FLIGHT_LEG has scheduled
arrival and departure times, airports, and one or more LEG_INSTANCEs— one for each Date on
which the flight travels. FAREs are kept for each FLIGHT. For each FLIGHT_LEG instance,
SEAT_RESERVATIONs are kept, as are the AIRPLANE used on the leg and the actual arrival
and departure times and airports. An AIRPLANE is identified by an Airplane_id and is of a
particular AIRPLANE_TYPE. CAN_LAND relates AIRPLANE_TYPEs to the AIRPORTs at which
they can land. An AIRPORT is identified by an Airport_code. Consider an update for the AIRLINE
database to enter a reservation on a particular flight or flight leg on a given date.
a. Give the operations for this update.
b. What types of constraints would you expect to check?
c. Which of these constraints are key, entity integrity, and referential integrity constraints, and
which are not?
d. Specify all the referential integrity constraints that hold on the schema shown in Figure.
Step-by-step solution
Step 1 of 5
a.
Tuple Relational Calculus:
In the provided tuple relational calculus, the tuple variable f ranges over the FLIGHT relation and the tuple variable l ranges over the FLIGHT_LEG relation.
• In the above tuple relational calculus there are two free variables, f and l, and they appear to the left of the bar (|).
• The attributes listed before the bar (|) are retrieved for all tuples that satisfy the conditions given after the bar.
• The conditions FLIGHT (f) and FLIGHT_LEG (l) specify the range relations for f and l. The condition f.Flight_number = l.Flight_number is a join condition, whose purpose is similar to the INNER JOIN operation.
Domain Relational Calculus:
• Ten domain variables, q, r, s, …, z, are needed for the FLIGHT relation; only q and v are free, because they appear to the left of the bar.
• First, the requested attributes are specified: the flight number, the departure airport of the first leg of the flight, and the arrival airport of the last leg of the flight.
• The condition for selecting a tuple appears after the bar (|).
• The condition m = z, relating two domain variables from different relations, is a join condition.
Comment
Step 2 of 5
b.
Tuple Relational Calculus:
In the provided tuple relational calculus, the tuple variable f ranges over the FLIGHT relation and the tuple variable l ranges over the FLIGHT_LEG relation.
• In the created tuple relational calculus there is a single free variable, f, which appears to the left of the bar (|).
• The attributes listed before the bar (|) are retrieved for all tuples that satisfy the conditions given after the bar.
• The condition l.Departure_airport_code = 'iah' AND l.Arrival_airport_code = 'lax' is a selection condition, similar to the SELECT operation in relational algebra.
• The conditions FLIGHT (f) and FLIGHT_LEG (l) specify the range relations for f and l. The condition f.Flight_number = l.Flight_number is a join condition, whose purpose is similar to the INNER JOIN operation.
Domain Relational Calculus:
• Ten domain variables, q, r, s, …, z, are needed for the FLIGHT relation; only u and v are free, because they appear to the left of the bar.
• First, the requested attributes are specified: the flight numbers and weekdays of all flights or flight legs that depart from Houston Intercontinental Airport and arrive at Los Angeles International Airport.
• The values assigned to the variables q, r, s, t, u, v, w, x, y, z form a tuple of the FLIGHT relation; the values of q (Departure_airport_code) and r (Arrival_airport_code) are set to 'iah' and 'lax', respectively.
• Then the condition for selecting a tuple appears after the bar (|).
• The condition m = z, relating two domain variables from different relations, is a join condition.
Comment
Step 3 of 5
c.
Tuple Relational Calculus:
In the provided tuple relational calculus, the tuple variable f ranges over the FLIGHT relation and the tuple variable l ranges over the FLIGHT_LEG relation.
• In the created tuple relational calculus there are two free variables, f and l, and they appear to the left of the bar (|).
• The attributes listed before the bar (|) are retrieved for all tuples that satisfy the conditions given after the bar.
• The condition l.Departure_airport_code = 'iah' AND l.Arrival_airport_code = 'lax' is a selection condition, similar to the SELECT operation in relational algebra.
Comment
Step 4 of 5
• The conditions FLIGHT (f) and FLIGHT_LEG (l) specify the range relations for f and l. The condition f.Flight_number = l.Flight_number is a join condition, whose purpose is similar to the INNER JOIN operation.
Domain Relational Calculus:
• Ten domain variables are needed for the FLIGHT relation and five for the FLIGHT_LEG relation; of the fifteen variables k, l, …, q, r, s, …, z, only u, l, m, n, o, and v are free, because they appear to the left of the bar.
• First, the requested attributes are specified: the flight number, Departure_airport_code, Scheduled_departure_time, Arrival_airport_code, Scheduled_arrival_time, and weekdays of all flights or flight legs that depart from some airport in the city of Houston and arrive at some airport in the city of Los Angeles.
• The values assigned to the variables q, r, s, t, u, v, w, x, y, z and j, k, l, m, n, o, p form tuples of the FLIGHT and FLIGHT_LEG relations; the values of q (Departure_airport_code) and r (Arrival_airport_code) are set to 'iah' and 'lax', respectively.
• Then the condition for selecting a tuple appears after the bar (|).
• The condition m = z, relating two domain variables from different relations, is a join condition.
Comment
Step 5 of 5
d.
Tuple Relational Calculus:
• In the created tuple relational calculus there are two free variables, f and r, and they appear to the left of the bar (|).
• The attributes listed before the bar (|) are retrieved for all tuples that satisfy the conditions given after the bar.
• The condition r.Flight_number = 'col97' is a selection condition, similar to the SELECT operation in relational algebra.
• The conditions FLIGHT (f) and FARE (r) specify the range relations for f and r. The condition r.Flight_number = f.Flight_number is a join condition, whose purpose is similar to the INNER JOIN operation.
Domain Relational Calculus:
• Ten domain variables are needed for the FLIGHT relation and five for the FARE relation; of the fifteen variables k, l, …, q, r, s, …, z, only s, t, u, v, and m are free, because they appear to the left of the bar.
• First, the requested attributes are specified: the flight number, Fare_code, Amount, Restrictions, and Airline, giving all fare information for flight number 'col97'.
• The values assigned to the variables q, r, s, t, u, v, w, x, y, z and l, m, n, o, p form tuples of the FARE and FLIGHT relations; the value of q (Flight_number) is set to 'col97'.
• Then the condition for selecting a tuple appears after the bar (|).
• The condition m = z, relating two domain variables from different relations, is a join condition.
Comment
Chapter 8, Problem 26E
Problem
Specify queries c, d, and f of Exercise in both tuple and domain relational calculus.
Exercise
Consider the LIBRARY relational database schema shown in Figure, which is used to keep track
of books, borrowers, and book loans. Referential integrity constraints are shown as directed arcs
in Figure, as in the notation of Figure 5.7. Write down relational expressions for the following
queries:
a. How many copies of the book titled The Lost Tribe are owned by the library branch whose
name is ‘Sharpstown’?
b. How many copies of the book titled The Lost Tribe are owned by each library branch?
c. Retrieve the names of all borrowers who do not have any books checked out.
d. For each book that is loaned out from the Sharpstown branch and whose Due_date is today,
retrieve the book title, the borrower’s name, and the borrower’s address.
e. For each library branch, retrieve the branch name and the total number of books loaned out
from that branch.
f. Retrieve the names, addresses, and number of books checked out for all borrowers who have
more than five books checked out.
g. For each book authored (or coauthored) by Stephen King, retrieve the title and the number of
copies owned by the library branch whose name is Central.
A relational database schema for a LIBRARY database.
Step-by-step solution
Step 1 of 3
c.
Following is the relational expression to retrieve the names of the borrowers who have no books checked out:
Tuple Relational calculus:
Explanation:
• In the provided tuple relational calculus, the tuple variable b ranges over the Borrower relation and the tuple variable l ranges over the Book_Loans relation.
• In the above tuple relational calculus, there are two free variables, b and l, and they appear to the left of the bar (|).
• The attributes listed before the bar (|) are retrieved for all tuples that satisfy the conditions given after the bar.
• The conditions Borrower (b) and Book_Loans (l) specify the range relations for b and l. The condition b.Card_No = l.Card_No is a join condition.
Domain Relational Calculus:
Explanation:
• Ten domain variables, q, r, s, …, z, are needed for the Borrower relation; only q is free because it appears to the left of the bar.
• First, the requested attribute, the name of the borrower, is specified by the free domain variable q for the Name field.
• The condition for selecting a tuple appears after the bar (|).
• The condition m = z, relating two domain variables from different relations, is a join condition.
Comment
Step 2 of 3
d.
Following is the relational expression to retrieve the book title and the borrower's name and address for each book that is loaned out from the branch whose name is 'Sharpstown' and whose due date is today:
Tuple Relational calculus:
Explanation:
• In the provided tuple relational calculus, the tuple variable b ranges over the Borrower relation and the tuple variable c ranges over the Book_Loans relation.
• In the above tuple relational calculus, there are two free variables, b and c, and they appear to the left of the bar (|).
• The attributes listed before the bar (|) are retrieved for all tuples that satisfy the conditions given after the bar.
• The conditions Borrower (b) and Book_Loans (c) specify the range relations for b and c. The condition Branch_name = 'Sharpstown' is a selection condition, and c.Card_No = b.Card_No and c.Card_No = a.Card_No are join conditions.
Domain Relational Calculus:
• Sixteen domain variables are needed for the BOOK and related relations; only a, e, and f are free because they appear to the left of the bar.
• First, the requested attributes are specified: the title field from BOOK and the name and address fields from BORROWER.
• The values assigned to the variables i, j, k, l, m form a tuple of the Book_Loans relation; the value of i (Card_no) is equal to o (Card_no), and Branch_name = 'Sharpstown'.
• Then the condition for selecting a tuple appears after the bar (|).
• The conditions i = o and j = f, relating domain variables from different relations, are join conditions.
Comment
Step 3 of 3
f.
Following is the relational expression to retrieve the name, address, and total number of books checked out for all borrowers who have more than five books checked out:
Tuple Relational calculus:
Explanation:
• In the provided tuple relational calculus, the tuple variable b ranges over the Borrower relation and the tuple variable a ranges over the Book_Loans relation.
• In the above tuple relational calculus, there are two free variables, b and a, and they appear to the left of the bar (|).
• The attributes listed before the bar (|) are retrieved for all tuples that satisfy the conditions given after the bar.
• The conditions Borrower (b) and Book_Loans (a) specify the range relations for b and a. The condition b.Card_No = a.Card_No is a join condition, and the total number of books for each borrower is obtained with the COUNT() function.
Domain Relational Calculus:
Explanation:
• Ten domain variables, q, r, s, …, z, are needed for the Borrower relation; only q, s, and v are free because they appear to the left of the bar.
• First, the requested attributes are specified by the free domain variables: q for the Name field, s for the address, and v for the total number of books.
• The condition for selecting a tuple appears after the bar (|).
• The condition m = z, relating two domain variables from different relations, is a join condition, and the count must be greater than 5.
Comment
Chapter 8, Problem 27E
Problem
In a tuple relational calculus query with n tuple variables, what would be the typical minimum
number of join conditions? Why? What is the effect of having a smaller number of join
conditions?
Step-by-step solution
Step 1 of 1
In a tuple relational calculus query with n tuple variables, there should typically be at least (n − 1) join conditions. With a smaller number of join conditions, the Cartesian product with one of the range relations would be taken, so every combination of tuples from the unrelated relations would appear in the result; this usually does not make sense.
Comment
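The effect of having too few join conditions can be seen concretely: with two tuple variables and no join condition, every combination of tuples qualifies (a Cartesian product), while adding the single (n − 1 = 1) join condition keeps only related pairs. A minimal sketch in Python, using two hypothetical relations (the row values are illustrative, not the textbook data):

```python
# Two "relations" represented as lists of dicts (hypothetical sample rows).
EMPLOYEE = [{"Ssn": "1", "Dno": 5}, {"Ssn": "2", "Dno": 4}]
DEPARTMENT = [{"Dnumber": 5, "Dname": "Research"}, {"Dnumber": 4, "Dname": "Admin"}]

# n = 2 tuple variables with no join condition: the full Cartesian product.
product = [(e, d) for e in EMPLOYEE for d in DEPARTMENT]

# With the single join condition, only related pairs remain.
joined = [(e, d) for e in EMPLOYEE for d in DEPARTMENT if e["Dno"] == d["Dnumber"]]

print(len(product), len(joined))  # 4 2
```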
Chapter 8, Problem 28E
Problem
Rewrite the domain relational calculus queries that followed Q0 in Section 8.7 in the style of the
abbreviated notation of Q0A, where the objective is to minimize the number of domain variables
by writing constants in place of variables wherever possible.
Step-by-step solution
Step 1 of 5
Q1: {qsv | (EXISTS z) (EXISTS l) (EXISTS m) (EMPLOYEE(qrstuvwxyz) AND DEPARTMENT(lmno) AND l = 'Research' AND m = z)}
Comment
Step 2 of 5
This query contains a condition relating two domain variables that range over attributes from two relations, m = z, and a condition relating a domain variable to a constant, l = 'Research'. The abbreviated domain relational calculus for the above query is:
Comment
Step 3 of 5
Q1A: {qsv | (EXISTS z) (EXISTS m) (EMPLOYEE(qrstuvwxyz) AND DEPARTMENT('Research', m, n, o) AND m = z)}
Comment
Step 4 of 5
Q2: {iksuv | (PROJECT(hijk) AND EMPLOYEE(qrstuvwxyz) AND DEPARTMENT(lmno) AND k = m AND n = t AND j = 'Stafford')}
The abbreviated domain relational calculus is:
Comment
Step 5 of 5
Q2A: {iksuv | (EXISTS m) (EXISTS n) (EXISTS t) (PROJECT(h, i, 'Stafford', k) AND EMPLOYEE(q, r, s, t, u, v, w, x, y, z) AND DEPARTMENT(l, m, n, o) AND k = m AND n = t)}
The remaining queries, Q6 and Q7, are not different in the abbreviated style, since they contain no constants.
Comment
Chapter 8, Problem 29E
Problem
Consider this query: Retrieve the Ssns of employees who work on at least those projects on
which the employee with Ssn = 123456789 works. This may be stated as (FORALL x) (IF P
THEN Q), where
■ x is a tuple variable that ranges over the PROJECT relation.
■ P ≡ employee with Ssn = 123456789 works on project x.
■ Q ≡ employee e works on project x.
Express the query in tuple relational calculus, using the rules
■ (∀x)(P(x)) = NOT (∃x)(NOT (P(x))).
■ (IF P THEN Q) ≡ (NOT (P) OR Q).
Step-by-step solution
Step 1 of 1
{e.Ssn | EMPLOYEE(e) AND (FORALL x)(NOT (PROJECT(x)) OR NOT ((EXISTS y)(WORKS_ON(y) AND y.Essn = '123456789' AND x.Pnumber = y.Pno)) OR ((EXISTS w)(WORKS_ON(w) AND w.Essn = e.Ssn AND x.Pnumber = w.Pno)))}
Applying the rule (FORALL x)(P(x)) = NOT (EXISTS x)(NOT (P(x))), the same query can be written as:
{e.Ssn | EMPLOYEE(e) AND NOT ((EXISTS x)(PROJECT(x) AND (EXISTS y)(WORKS_ON(y) AND y.Essn = '123456789' AND x.Pnumber = y.Pno) AND NOT ((EXISTS w)(WORKS_ON(w) AND w.Essn = e.Ssn AND x.Pnumber = w.Pno))))}
Comment
Chapter 8, Problem 30E
Problem
Show how you can specify the following relational algebra operations in both tuple and domain
relational calculus.
a. σA=C(R(A, B, C))
b. π<A, B>(R(A, B, C))
c. R(A, B, C) * S(C, D, E)
d. R(A, B, C) ∪ S(A, B, C)
e. R(A, B, C) ∩ S(A, B, C)
f. R(A, B, C) − S(A, B, C)
g. R(A, B, C) × S(D, E, F)
h. R(A, B) ÷ S(A)
Step-by-step solution
Step 1 of 7
(a)
Tuple calculus expression followed by the domain calculus expression is
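The expressions are images in the original; for part (a), a standard pair of answers (tuple calculus first, with t a tuple variable and x, y, z domain variables for A, B, C) is:

```latex
\{\, t \mid R(t) \wedge t.A = t.C \,\} \qquad \{\, xyz \mid R(xyz) \wedge x = z \,\}
```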
Comment
Step 2 of 7
(b)
Tuple calculus followed by the domain calculus is
Comment
Step 3 of 7
(c)
Tuple calculus expression followed by the domain calculus is
Comment
Step 4 of 7
(d)
Tuple calculus expression followed by the domain calculus is
Comment
Step 5 of 7
(e)
Tuple calculus expression
(f)
Tuple calculus expression
Comments (1)
Step 6 of 7
(g)
Tuple calculus expression
Comment
Step 7 of 7
(h)
Tuple relation calculus expression is
Comment
Chapter 8, Problem 31E
Problem
Suggest extensions to the relational calculus so that it may express the following types of
operations that were discussed in Section 8.4: (a) aggregate functions and grouping; (b) OUTER
JOIN operations; (c) recursive closure queries.
Step-by-step solution
Step 1 of 3
1. We can define a relation AGGREGATE with attributes Sum, Minimum, Maximum, Average, Count, etc. Using such a relation, a query can be written as
{t.Sum | AGGREGATE(t) AND t.Sum = Σ e.Salary, summed over all EMPLOYEE tuples e}
to get the sum of the salaries of all employees. Similar functions can be included for the other aggregate operations.
Comment
Step 2 of 3
2. For OUTER JOIN a special Operation say with symbol δ can be used.
And query may look like:
{t | (EMPLOYEE δ DEPARTMENT)(t)}
Comment
Step 3 of 3
3. Recursive closure: a special Operation say with symbol Φ can be used.
And query may look like:
{t | EMPLOYEE(t) AND t.Ssn Φ t.Mgr_ssn}
By specifying that this is a recursive closure operation, we instruct the system to compute the result of the query.
Comment
Chapter 8, Problem 32E
Problem
A nested query is a query within a query. More specifically, a nested query is a parenthesized
query whose result can be used as a value in a number of places, such as instead of a relation.
Specify the following queries on the database specified in Figure 5.5 using the concept of nested
queries and the relational operators discussed in this chapter. Also show the result of each query
as it would apply to the database state in Figure 5.6.
a. List the names of all employees who work in the department that has the employee with the
highest salary among all employees.
b. List the names of all employees whose supervisor’s supervisor has ‘888665555’ for Ssn.
c. List the names of employees who make at least $10,000 more than the employee who is paid
the least in the company.
Step-by-step solution
Step 1 of 3
Consider the COMPANY database specified in Figure 5.5.
a. List the names of all employees who work in the department that has the employee with the
highest salary among all employees.
The query using the relational operators is as follows:
Result:
Comment
Step 2 of 3
b. List the names of all employees whose supervisor’s supervisor has '888665555' for SSN.
The query using the relational operators is as follows:
Result:
Comments (1)
Step 3 of 3
c. List the names of employees who make at least $10,000 more than the employee who is paid
the least in the company.
The query using the relational operators is as follows:
Result:
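Since the relational-operator expressions and results appear only as images in the original, the nested-query idea can be sketched in runnable SQL. The sketch below (SQLite via Python) covers query b with a three-row hypothetical EMPLOYEE sample, not the full Figure 5.6 state:

```python
import sqlite3

# In-memory sketch of EMPLOYEE with a handful of hypothetical rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE EMPLOYEE (Fname TEXT, Lname TEXT, Ssn TEXT, Super_ssn TEXT)")
con.executemany("INSERT INTO EMPLOYEE VALUES (?,?,?,?)", [
    ("James", "Borg", "888665555", None),           # top manager
    ("Franklin", "Wong", "333445555", "888665555"),
    ("John", "Smith", "123456789", "333445555"),    # supervisor's supervisor is Borg
])

# Query b: the nested query finds the Ssns of employees supervised by
# '888665555'; the outer query lists employees whose supervisor is in that set.
rows = con.execute("""
    SELECT Fname, Lname FROM EMPLOYEE
    WHERE Super_ssn IN
        (SELECT Ssn FROM EMPLOYEE WHERE Super_ssn = '888665555')
""").fetchall()
print(rows)  # [('John', 'Smith')]
```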
Comment
Chapter 8, Problem 33E
Problem
State whether the following conclusions are true or false:
a. NOT (P(x) OR Q(x)) → (NOT (P(x))) AND (NOT (Q(x)))
b. NOT (∃x)(P(x)) → (∀x)(NOT (P(x)))
c. (∃x)(P(x)) → (∀x)(P(x))
Step-by-step solution
Step 1 of 3
(a) TRUE
Comments (2)
Step 2 of 3
(b) TRUE
Comment
Step 3 of 3
(c) FALSE
Comment
Chapter 8, Problem 34LE
Problem
Specify and execute the following queries in relational algebra (RA) using the RA interpreter on
the COMPANY database schema in Figure 5.5.
a. List the names of all employees in department 5 who work more than 10 hours per week on
the ProductX project.
b. List the names of all employees who have a dependent with the same first name as
themselves.
c. List the names of employees who are directly supervised by Franklin Wong.
d. List the names of employees who work on every project.
e. List the names of employees who do not work on any project.
f. List the names and addresses of employees who work on at least one project located in
Houston but whose department has no location in Houston.
g. List the names of department managers who have no dependents.
Step-by-step solution
Step 1 of 7
a)
EMP_WORK_PRODUCT ← (σ Pname='ProductX'(PROJECT)) ⋈ (Pnumber),(Pno) (WORKS_ON)
EMP_W_10 ← (EMPLOYEE) ⋈ (Ssn),(Essn) (σ Hours>10(EMP_WORK_PRODUCT))
π Lname, Fname, Minit(σ Dno=5(EMP_W_10))
Explanation: The above query displays the names of all employees of department 5 who work more than 10 hours per week on the ProductX project. Here '⋈' is the join on the listed attributes, 'σ' is the selection, and 'π' is the projection, which eliminates duplicates.
Comment
Step 2 of 7
b)
EMP ← (EMPLOYEE) ⋈ (Ssn, Fname),(Essn, Dependent_name) (DEPENDENT)
π Lname, Fname, Minit(EMP)
Explanation: The above query displays the names of all employees who have a dependent with the same first name as themselves.
Comment
Step 3 of 7
c)
Wong_Ssn ← π Ssn(σ Fname='Franklin' AND Lname='Wong'(EMPLOYEE))
Emp_Wong ← (EMPLOYEE) ⋈ (Super_ssn),(Ssn) (Wong_Ssn)
π Lname, Fname, Minit(Emp_Wong)
Explanation: The above query uses a self-join to display the names of all employees who are directly supervised by Franklin Wong.
Comment
Step 4 of 7
d)
Emp_proj(Pno, Ssn) ← π Pno, Essn(WORKS_ON)
All_proj ← π Pnumber(PROJECT)
All_proj_emp ← Emp_proj ÷ All_proj
π Lname, Fname, Minit(EMPLOYEE * All_proj_emp)
Explanation: The above query gives the names of employees who work on every project by using the DIVISION operator, which keeps only those Ssn values paired with every project number, followed by a natural join with EMPLOYEE.
Comment
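The DIVISION step in d) can be sketched in Python over plain sets; the (Ssn, Pno) pairs below are hypothetical sample values, not the Figure 5.6 data:

```python
# Emp_proj as a set of (Ssn, Pno) pairs; All_proj as the set of all project numbers.
emp_proj = {("123", 1), ("123", 2), ("456", 1)}
all_proj = {1, 2}

# DIVISION: keep each Ssn whose set of project numbers covers all_proj.
ssns = {s for (s, _) in emp_proj}
result = {s for s in ssns if all_proj <= {p for (e, p) in emp_proj if e == s}}
print(result)  # {'123'}
```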
Step 5 of 7
e)
Emps ← π Ssn(EMPLOYEE)
Emps_Working(Ssn) ← π Essn(WORKS_ON)
Emp_Non_work_Project ← Emps − Emps_Working
π Lname, Fname, Minit(EMPLOYEE * Emp_Non_work_Project)
Explanation: The above query gives the names of employees who do not work on any project by using the MINUS (set difference) operator, which removes from Emps every Ssn value that also appears in Emps_Working.
Comment
Step 6 of 7
f)
Emp_proj_Hou(Ssn) ← π Essn(WORKS_ON ⋈ (Pno),(Pnumber) (σ Plocation='Houston'(PROJECT)))
Dept_Loc_Hou ← π Dnumber(σ Dlocation='Houston'(DEPT_LOCATIONS))
Emp_Dept_Hou ← π Ssn(EMPLOYEE ⋈ (Dno),(Dnumber) (Dept_Loc_Hou))
Emps_Result ← Emp_proj_Hou − Emp_Dept_Hou
π Lname, Fname, Minit, Address(EMPLOYEE * Emps_Result)
Explanation: The above query gives the names and addresses of employees who work on at least one project located in 'Houston' while their own department has no location in 'Houston'. The MINUS operator removes from Emp_proj_Hou every employee whose department does have a Houston location.
Comment
Step 7 of 7
g)
Managers_Dept(Ssn) ← π Mgr_ssn(DEPARTMENT)
Dependents_Of_Emps(Ssn) ← π Essn(DEPENDENT)
Emps_Result ← Managers_Dept − Dependents_Of_Emps
π Lname, Fname, Minit(EMPLOYEE * Emps_Result)
Explanation: The above query gives the names of department managers who have no dependents. The MINUS operator removes from Managers_Dept every manager Ssn that appears among the Essn values of DEPENDENT.
Comment
Chapter 8, Problem 35LE
Problem
Consider the following MAILORDER relational schema describing the data for a mail order
company.
PARTS(Pno, Pname, Qoh, Price, Olevel)
CUSTOMERS(Cno, Cname, Street, Zip, Phone)
EMPLOYEES(Eno, Ename, Zip, Hdate)
ZIP_CODES(Zip, City)
ORDERS(Ono, Cno, Eno, Received, Shipped)
ODETAILS(Ono, Pno, Qty)
Qoh stands for quantity on hand; the other attribute names are self-explanatory. Specify and execute the following queries using the RA interpreter on the MAILORDER database schema.
a. Retrieve the names of parts that cost less than $20.00.
b. Retrieve the names and cities of employees who have taken orders for parts costing more
than $50.00.
c. Retrieve the pairs of customer number values of customers who live in the same ZIP Code.
d. Retrieve the names of customers who have ordered parts from employees living in Wichita.
e. Retrieve the names of customers who have ordered parts costing less than $20.00.
f. Retrieve the names of customers who have not placed an order.
g. Retrieve the names of customers who have placed exactly two orders.
Step-by-step solution
Step 1 of 7
MAILORDER Relational Schema
a)
The following command is used to retrieve the names of PARTS that cost less than $20.00:
SELECT Pname FROM PARTS WHERE Price < 20.00;
Comment
Step 2 of 7
b)
The following command is used to retrieve the names and cities of employees who have taken orders for parts costing more than $50.00 (the ORDERS relation is needed to link an order detail to the employee who took the order):
SELECT Emp.Ename, Z.City FROM PARTS P, EMPLOYEES Emp,
ZIP_CODES Z, ORDERS O, ODETAILS OT WHERE P.Pno = OT.Pno AND
OT.Ono = O.Ono AND O.Eno = Emp.Eno AND Emp.Zip = Z.Zip AND P.Price > 50.00;
Comment
Step 3 of 7
c)
The following command is used to retrieve pairs of customer number values of customers who live in the same ZIP Code:
SELECT C.Cno, C1.Cno FROM CUSTOMERS C, CUSTOMERS C1 WHERE
C.Zip = C1.Zip AND C.Cno != C1.Cno;
Comment
Step 4 of 7
d)
The following command is used to retrieve the names of customers who have ordered parts from employees living in Wichita:
SELECT DISTINCT C.Cname FROM CUSTOMERS C, ORDERS O, EMPLOYEES E, ZIP_CODES Z
WHERE C.Cno = O.Cno AND O.Eno = E.Eno AND E.Zip = Z.Zip AND Z.City = 'Wichita';
Step 5 of 7
e)
The following query retrieves the names of customers who have ordered parts costing less
than $20.00. A simple join suffices here; the customer need only have ordered some such part.
SELECT DISTINCT C.Cname
FROM CUSTOMERS C, ORDERS O, ODETAILS OD, PARTS P
WHERE C.Cno = O.Cno AND O.Ono = OD.Ono AND OD.Pno = P.Pno
AND P.Price < 20.00;
Step 6 of 7
f)
The following query retrieves the names of customers who have not placed an order. The
subquery must be correlated on the customer number:
SELECT C.Cname FROM CUSTOMERS C
WHERE NOT EXISTS (SELECT * FROM ORDERS O WHERE O.Cno = C.Cno);
Step 7 of 7
g)
The following query retrieves the names of customers who have placed exactly two orders.
Counting per customer requires GROUP BY with a HAVING clause:
SELECT C.Cname FROM CUSTOMERS C, ORDERS O
WHERE C.Cno = O.Cno
GROUP BY C.Cno, C.Cname
HAVING COUNT(*) = 2;
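These queries can be sanity-checked on any SQL engine. The sketch below uses Python's sqlite3 with a reduced schema and invented sample rows (names and data are illustrative only), running the queries for parts f and g:

```python
import sqlite3

# Reduced MAILORDER subset; sample rows are invented for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE CUSTOMERS (Cno INTEGER PRIMARY KEY, Cname TEXT);
CREATE TABLE ORDERS (Ono INTEGER PRIMARY KEY, Cno INTEGER);
INSERT INTO CUSTOMERS VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Eve');
INSERT INTO ORDERS VALUES (10, 1), (11, 1), (12, 2);
""")

# f) Customers who have not placed an order (correlated NOT EXISTS).
no_orders = con.execute("""
    SELECT C.Cname FROM CUSTOMERS C
    WHERE NOT EXISTS (SELECT * FROM ORDERS O WHERE O.Cno = C.Cno)
""").fetchall()

# g) Customers who have placed exactly two orders (GROUP BY ... HAVING).
two_orders = con.execute("""
    SELECT C.Cname FROM CUSTOMERS C, ORDERS O
    WHERE C.Cno = O.Cno
    GROUP BY C.Cno, C.Cname
    HAVING COUNT(*) = 2
""").fetchall()

print(no_orders)   # [('Eve',)]
print(two_orders)  # [('Ann',)]
```

The same pattern (load a tiny fixture, run the query, inspect the rows) works for the other parts as well.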
Chapter 8, Problem 36LE
Problem
Consider the following GRADEBOOK relational schema describing the data for a grade book of a
particular instructor. ( Note : The attributes A, B, C, and D of COURSES store grade cutoffs.)
CATALOG(Cno, Ctitle)
STUDENTS(Sid, Fname, Lname, Minit)
COURSES(Term, Sec_no, Cno, A, B, C, D)
ENROLLS(Sid, Term, Sec_no)
Specify and execute the following queries using the RA interpreter on the GRADEBOOK
database schema.
a. Retrieve the names of students enrolled in the Automata class during the fall 2009 term.
b. Retrieve the Sid values of students who have enrolled in CSc226 and CSc227.
c. Retrieve the Sid values of students who have enrolled in CSc226 or CSc227.
d. Retrieve the names of students who have not enrolled in any class.
e. Retrieve the names of students who have enrolled in all courses in the CATALOG table.
Step-by-step solution
Step 1 of 5
GRADEBOOK Database
a)
The following query retrieves the names of students enrolled in the Automata class during the
fall 2009 term. ENROLLS must be joined to COURSES on both Term and Sec_no; the literal for
the fall 2009 term depends on how Term values are encoded in the data.
• SELECT S.Fname, S.Minit, S.Lname
FROM STUDENTS S, ENROLLS E, COURSES C, CATALOG CT
WHERE S.Sid = E.Sid AND E.Term = C.Term AND E.Sec_no = C.Sec_no
AND C.Cno = CT.Cno AND CT.Ctitle = 'Automata' AND C.Term = 'Fall 2009';
Step 2 of 5
b)
The following query retrieves the Sid values of students who have enrolled in both CSc226 and
CSc227. ENROLLS must be joined to COURSES on (Term, Sec_no) to obtain the course number.
• SELECT DISTINCT E1.Sid
FROM ENROLLS E1, COURSES C1
WHERE E1.Term = C1.Term AND E1.Sec_no = C1.Sec_no AND C1.Cno = 'CSc226'
AND E1.Sid IN (SELECT E2.Sid FROM ENROLLS E2, COURSES C2
WHERE E2.Term = C2.Term AND E2.Sec_no = C2.Sec_no
AND C2.Cno = 'CSc227');
Step 3 of 5
c)
The following query retrieves the Sid values of students who have enrolled in CSc226 or
CSc227.
• SELECT DISTINCT E.Sid
FROM ENROLLS E, COURSES C
WHERE E.Term = C.Term AND E.Sec_no = C.Sec_no
AND (C.Cno = 'CSc226' OR C.Cno = 'CSc227');
Step 4 of 5
d)
The following query retrieves the names of students who have not enrolled in any class. The
NOT EXISTS subquery must be correlated on Sid:
• SELECT S.Fname, S.Minit, S.Lname FROM STUDENTS S
WHERE NOT EXISTS (SELECT * FROM ENROLLS E WHERE E.Sid = S.Sid);
Step 5 of 5
e)
The following query retrieves the names of students who have enrolled in all courses in the
CATALOG table (a relational division: no catalog course may remain after subtracting the
student's courses).
• SELECT S.Fname, S.Minit, S.Lname FROM STUDENTS S
WHERE NOT EXISTS (
(SELECT Cno FROM CATALOG)
MINUS
(SELECT C.Cno FROM COURSES C, ENROLLS E
WHERE C.Term = E.Term AND C.Sec_no = E.Sec_no AND E.Sid = S.Sid));
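The division-style query in part (e) is the trickiest; below is a quick check using Python's sqlite3 with invented sample data (Ann takes both catalog courses, Bob takes only one). SQLite spells MINUS as EXCEPT:

```python
import sqlite3

# Minimal GRADEBOOK subset with invented rows for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE CATALOG (Cno TEXT PRIMARY KEY, Ctitle TEXT);
CREATE TABLE STUDENTS (Sid INTEGER PRIMARY KEY, Fname TEXT, Lname TEXT);
CREATE TABLE COURSES (Term TEXT, Sec_no INTEGER, Cno TEXT);
CREATE TABLE ENROLLS (Sid INTEGER, Term TEXT, Sec_no INTEGER);
INSERT INTO CATALOG VALUES ('CSc226', 'Automata'), ('CSc227', 'Databases');
INSERT INTO STUDENTS VALUES (1, 'Ann', 'Ames'), (2, 'Bob', 'Best');
INSERT INTO COURSES VALUES ('F09', 1, 'CSc226'), ('F09', 2, 'CSc227');
INSERT INTO ENROLLS VALUES (1, 'F09', 1), (1, 'F09', 2), (2, 'F09', 1);
""")

# Students enrolled in every catalog course: no catalog course may remain
# after subtracting the courses the student has enrolled in.
rows = con.execute("""
    SELECT S.Fname, S.Lname FROM STUDENTS S
    WHERE NOT EXISTS (
        SELECT Cno FROM CATALOG
        EXCEPT
        SELECT C.Cno FROM COURSES C, ENROLLS E
        WHERE C.Term = E.Term AND C.Sec_no = E.Sec_no AND E.Sid = S.Sid)
""").fetchall()
print(rows)  # [('Ann', 'Ames')]
```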
Chapter 8, Problem 37LE
Consider a database that consists of the following relations.
SUPPLIER(Sno, Sname)
PART(Pno, Pname)
PROJECT(Jno, Jname)
SUPPLY(Sno, Pno, Jno)
The database records information about suppliers, parts, and projects and includes a ternary relationship
between suppliers, parts, and projects. This relationship is a many-many-many relationship. Specify the
following queries in relational algebra.
1. Retrieve the part numbers that are supplied to exactly two projects.
2. Retrieve the names of suppliers who supply more than two parts to project J1.
3. Retrieve the part numbers that are supplied by every supplier.
4. Retrieve the project names that are supplied by supplier S1 only.
5. Retrieve the names of suppliers who supply at least two different parts each to at least two different
projects.
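As a hint for query 3 (part numbers supplied by every supplier), relational division applies; this sketch is illustrative, not an official solution:

```latex
\mathit{result} \;=\; \pi_{\mathit{Pno},\,\mathit{Sno}}(\mathrm{SUPPLY}) \;\div\; \pi_{\mathit{Sno}}(\mathrm{SUPPLIER})
```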
Chapter 9, Problem 1RQ
Problem
(a) Discuss the correspondences between the ER model constructs and the relational model
constructs. Show how each ER model construct can be mapped to the relational model and
discuss any alternative mappings.
(b) Discuss the options for mapping EER model constructs to relations, and the conditions under
which each option could be used.
Step-by-step solution
Step 1 of 3
The ER model represents data in a conceptual, abstract way. It is used in database modeling,
both to reduce the complexity of the database schema and to produce a semantic data model of
a system.
In a relational schema, relationship types are not represented explicitly; instead, a relationship is
represented by a pair of matching attributes, a primary key in one relation and a foreign key in
another.
a.
Some of the correspondences between the ER model and the relational model are as follows:
• ER model: entities and relationships among the entities. Relational model: relations consisting
of attributes; relationships are established through foreign keys.
• ER model: a strong entity type, represented by a rectangle. Relational model: an entity relation
is constructed for each strong entity.
• ER model: a weak entity type, represented by a double rectangle. Relational model: an entity
relation is constructed for each weak entity, including the owner's primary key as a foreign key.
• ER model: a binary 1:1 or 1:N relationship type, represented by a diamond connected to the
participating entities. Relational model: a foreign key, or a relationship relation having two foreign
keys, each referencing a participating entity.
• ER model: a binary M:N relationship type. Relational model: a relationship relation with two
foreign keys.
• ER model: an n-ary relationship type (n > 2). Relational model: a relationship relation with n
foreign keys.
• ER model: simple attributes of an entity. Relational model: attributes of the corresponding
relation.
• ER model: composite attributes. Relational model: the set of simple component attributes.
• ER model: multivalued attributes. Relational model: a separate relation plus a foreign key.
• ER model: derived attributes. Relational model: not included.
• ER model: a value set, the set of values that may be assigned to an attribute. Relational
model: a domain, the value scope of a particular attribute.
• ER model: key attributes are underlined. Relational model: primary keys, foreign keys,
candidate keys, and so on.
Follow these steps to map an ER model into a relational model efficiently:
1. Ignore derived attributes.
Derived attributes are attributes that can be computed from other attributes, such as age or full
name. If the ER diagram has any derived attributes, remove them to make the schema simpler.
A full name, for example, can be produced by concatenating the first, middle, and last names,
so it need not be stored separately.
2. Mapping of all strong Entities into tables.
• Map all strong entities to tables. Create a separate relation for each strong entity, including all
of its simple attributes from the ER diagram, and choose the key attribute of the ER diagram as
the primary key of the relation.
• That is, for each entity type T in the ER schema, create a relation R that includes all simple
attributes of T, and choose a unique attribute of T as the primary key of R.
• If multiple keys were identified for T during the design, keep all of them recorded as candidate
keys; they describe specific information about the attributes and can also be used for indexing
the database and for other analysis.
3. Mapping of weak Entities.
• Map all weak entities to tables. Create a separate relation for each weak entity, including all of
its simple attributes. Include the primary keys of the relations to which the weak entity is related
as foreign keys, to establish the connection among the relations.
• A weak entity does not have its own candidate key. The primary key of its relation is composed
of the primary key(s) of the owner entity type(s) together with the partial key of the weak entity.
4. Binary 1:1 Mapping.
• For each binary 1:1 relationship type in the ER schema, identify the relations corresponding to
the two participating entity types. The relationship can be mapped by placing the primary key of
one relation as a foreign key in the other, or by merging the two relations into one when both
participations are total. Also add any attributes of the relationship itself.
• Alternatively, a new relation can be created that includes the primary keys of both participating
relations as foreign keys.
5. Binary 1: N Mapping.
• Identify all 1:N relationship types in the ER diagram. For each binary 1:N relationship type, the
primary key of the relation on the 1-side of the relationship becomes a foreign key in the relation
on the N-side.
• Another approach is to create a new relation S that includes the primary keys of both
participating entities; both primary keys act as foreign keys in S.
6. Binary M: N Mapping.
• Identify all M: N relationship in ER diagram. Create new relation S, corresponding to each
binary M: N relationship, to represent relationship R. Include both primary key attributes of
participating relations as foreign keys in the relation S. Also include the simple attributes of the
relationship.
• The combination of the two foreign keys forms the primary key of S. Unlike a 1:1 or 1:N
relationship, an M:N relationship cannot be represented by a single foreign key attribute in one
of the participating relations.
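Steps 5 and 6 can be sketched as SQL DDL, run here through Python's sqlite3; the DEPARTMENT/EMPLOYEE/PROJECT names are illustrative, not taken from any exercise:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Step 5 (1:N): the primary key of the 1-side (DEPARTMENT) becomes
-- a foreign key in the N-side relation (EMPLOYEE).
CREATE TABLE DEPARTMENT (Dno INTEGER PRIMARY KEY, Dname TEXT);
CREATE TABLE EMPLOYEE (
    Eno   INTEGER PRIMARY KEY,
    Ename TEXT,
    Dno   INTEGER REFERENCES DEPARTMENT(Dno));

-- Step 6 (M:N): a new relation whose primary key is the combination
-- of the two foreign keys, plus the relationship's own attribute.
CREATE TABLE PROJECT (Pno INTEGER PRIMARY KEY, Pname TEXT);
CREATE TABLE WORKS_ON (
    Eno   INTEGER REFERENCES EMPLOYEE(Eno),
    Pno   INTEGER REFERENCES PROJECT(Pno),
    Hours REAL,
    PRIMARY KEY (Eno, Pno));
""")
tables = {r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")}
print(sorted(tables))  # ['DEPARTMENT', 'EMPLOYEE', 'PROJECT', 'WORKS_ON']
```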
Step 2 of 3
7. Mapping of Multivalued attributes.
• Create a new relation R, corresponding to each multivalued attribute A present in the ER
diagram. All simple attributes corresponding to A, would be present in relation R.
• The relation R will also include, as a foreign key K, the primary key of the relation that
represents the entity type having A as a multivalued attribute. The primary key of R is the
combination of A and K.
8. Mapping of N-ary relationship.
• For each n-ary relationship type R with n > 2, create a new relation S to represent R. Include
the primary key attributes of all participating relations as foreign key attributes, and also include
any simple attributes of the n-ary relationship.
• Since more than two entities participate, R cannot be mapped without creating a new relation.
The combination of all the foreign keys is generally used as the primary key of S.
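For a ternary (n = 3) example, the SUPPLY relationship from the earlier supplier/part/project exercise maps as follows (a DDL sketch via Python's sqlite3):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE SUPPLIER (Sno TEXT PRIMARY KEY, Sname TEXT);
CREATE TABLE PART     (Pno TEXT PRIMARY KEY, Pname TEXT);
CREATE TABLE PROJECT  (Jno TEXT PRIMARY KEY, Jname TEXT);
-- The ternary relationship becomes its own relation with three foreign
-- keys; their combination forms the primary key.
CREATE TABLE SUPPLY (
    Sno TEXT REFERENCES SUPPLIER(Sno),
    Pno TEXT REFERENCES PART(Pno),
    Jno TEXT REFERENCES PROJECT(Jno),
    PRIMARY KEY (Sno, Pno, Jno));
""")
cols = [r[1] for r in con.execute("PRAGMA table_info(SUPPLY)")]
print(cols)  # ['Sno', 'Pno', 'Jno']
```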
Step 3 of 3
b.
Method of mapping the EER model into the relational model.
Mapping an Enhanced Entity Relationship (EER) model to relations includes all eight steps
followed in part (a). The EER model extends the ER model with additional constructs; the
extended elements are specialization/generalization and shared subclasses.
The following steps can also be used for EER-to-relation mapping:
Mapping specialization or generalization.
• A specialization consisting of a number of subclasses can be mapped into a relational schema
using several options.
The first option maps the whole specialization into a single table; the second maps it into
multiple tables. Within each option, variations arise depending on the constraints (total or
partial, disjoint or overlapping) on the specialization or generalization.
• Each specialization with m subclasses {S1, S2, ..., Sm}, and generalized superclass C whose
attributes are {k, a1, ..., an} with primary key k, is converted into relational schemas using one
of the following options:
Option 9a: Multiple relations—superclass and subclasses.
• Create a relation L for the superclass C that includes the attributes {k, a1, ..., an}, with k as its
primary key. Create a separate relation Li for each subclass Si, 1 <= i <= m, that includes the
attribute k together with the specific attributes of Si. Here k is working as the primary key of
each Li.
Option 9b: Multiple relations—subclasses relations only.
• Create a relation Li corresponding to every subclass Si, 1 <= i <= m, that includes the specific
attributes of Si together with {k, a1, ..., an}, where k is the primary key of each Li. This option
can be used only for a specialization whose subclasses are total, that is, every entity of the
superclass must belong to at least one subclass.
• A specialization with disjoint subclasses is best mapped through this option. If the subclasses
overlap, the same entity may be replicated in several relations, which causes redundancy in the
relational schema.
Option 9c: Single relation with one type attribute.
• Create a single relation L that includes the attributes {k, a1, ..., an}, the specific attributes of all
subclasses, and a type attribute t. The attribute t is called a type attribute or discriminating
attribute; it indicates the subclass to which each tuple/record belongs. The attribute k is the
primary key.
• This option is applicable only for a specialization whose subclasses are disjoint. It generates
many NULL values if the subclasses have many specific attributes.
Option 9d: Single relation with multiple type attributes.
• Create a single relation L that includes the attributes {k, a1, ..., an}, the specific attributes of all
subclasses, and m Boolean type attributes {t1, ..., tm}. The attribute k is the primary key of L.
• Each attribute ti is a Boolean type attribute indicating whether a tuple belongs to subclass Si.
This option can be used for a specialization whose subclasses overlap.
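Option 9c can be sketched as a single table with a discriminating attribute. The EMPLOYEE/Job_type names below are illustrative assumptions, not part of the problem:

```python
import sqlite3

# Option 9c sketch: one relation for the whole specialization. Subclass-
# specific columns hold NULL for tuples of the other subclasses.
con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE EMPLOYEE (
    Ssn          TEXT PRIMARY KEY,   -- k, the superclass key
    Name         TEXT,               -- shared superclass attribute
    Job_type     TEXT CHECK (Job_type IN ('Secretary', 'Technician', 'Engineer')),
    Typing_speed INTEGER,            -- Secretary-specific
    Tgrade       TEXT,               -- Technician-specific
    Eng_type     TEXT)               -- Engineer-specific
""")
con.execute("INSERT INTO EMPLOYEE VALUES ('123', 'Ann', 'Engineer', NULL, NULL, 'Civil')")
row = con.execute(
    "SELECT Name, Eng_type FROM EMPLOYEE WHERE Job_type = 'Engineer'").fetchone()
print(row)  # ('Ann', 'Civil')
```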
Chapter 9, Problem 2E
Problem
Map the UNIVERSITY database schema shown in Figure 3.20 into a relational database
schema.
Step-by-step solution
Step 1 of 3
Refer to Fig. 3.20 of Chapter 3 in the textbook for the UNIVERSITY database schema.
Step 2 of 3
Basic steps to map ER diagram into Relational Database Schema are as follows:
1. Ignore derived attributes.
If the ER diagram has any derived attributes, then remove them to make the schema simpler.
Derived attributes are attributes that can be computed from other attributes, such as age or full
name. Age, for example, can be calculated as the difference between the current date and the
date of birth.
2. Mapping of all strong Entities into tables.
Map all strong entities to tables. Create a relation R that includes all simple attributes in the ER
diagram, and choose the key attribute of the ER diagram as the primary key of relation R.
COLLEGE
CName COffice CPhone
INSTRUCTOR
Id IName Rank IOffice IPhone
DEPT
DCode DName DOffice DPhone
STUDENT
Sid DOB FName MName LName Addr Phone Major
COURSE
CCode Credits CoName Level CDesc
SECTION
SecId SecNo Sem Year Bldg RoomNo DaysTime
3. Mapping of weak Entities.
For each weak entity, create a separate relation R and add all the simple attributes of the weak
entity to it. Include the primary keys of the relations to which the weak entity is related as
foreign keys, to establish the connection among the relations. Since the provided ER diagram
has no weak entity, there is no need to map weak entities.
4. 1:1 Mapping.
For each binary 1:1 relationship type in the ER schema, identify the relations corresponding to
the two participating entities. The relationship can be mapped with a foreign key in one of the
relations, or by merging the two relations into one (appropriate when both participations are
total). Also add the attributes which come under the relationship.
COLLEGE
CName COffice CPhone DeanId
INSTRUCTOR
Id IName Rank IOffice IPhone DCode CStartDate
5. 1: N Mapping.
Identify all 1:N relationships in ER diagram. For each regular binary 1:N relationship in relation R,
add primary key of participating relation of 1-side as foreign keys to the N-side relation.
COLLEGE
CName COffice CPhone DeanId DCode
DEPT
DCode DName DOffice DPhone CCode InstId SId
INSTRUCTOR
Id IName Rank IOffice IPhone DCode CStartDate SecId
COURSE
CCode Credits CoName Level CDesc SecId
6. M: N Mapping.
Identify all M:N relationship in ER diagram. For each M:N relationship, create new relation S to
represent relationship. Include all primary key attributes of participating relation as foreign key in
the relation S.
TAKES
Sid Grade SecId
7. Mapping of Multivalued attributes.
For each multivalued attribute in the ER diagram, create a new relation R. R will include all
attributes corresponding to multivalued attribute. Add primary key attribute as a foreign key in R.
Since the provided ER diagram has no multivalued attributes, there is no need to map
multivalued attributes.
8. Mapping of N-ary relationship.
For each n-ary relationship type, where n > 2, create a new relation R to represent the
relationship.
Include primary keys attributes of all participating relations as foreign key attributes and also
include the simple attribute of n-ary relationship.
Since the maximum value of n is 2 in the provided ER diagram, there are no n-ary relationships
to map.
Step 3 of 3
Final relational schema, for ER diagram provided in Fig-3.20, can be generated as follows:
Final schema has seven relations, six from the strong entities and one from binary M: N
relationship. Each relational table has primary and foreign keys. TAKES table represents
relationship between STUDENT and SECTION table.
Also, a Grade can be looked up using Sid and SecId for the corresponding semester, year, or
particular section.
• In COLLEGE table, CName is primary key and DeanId and DCode are foreign keys for
INSTRUCTOR and DEPT tables respectively. DeanId is the projection of Id attribute in
INSTRUCTOR table.
• In INSTRUCTOR table, Id is working as primary key. DCode and SecId are working as foreign
key for DEPT and SECTION tables respectively.
• In DEPT table, DCode is unique for each department and it is working as primary key. To
establish connection with COURSE, INSTRUCTOR and STUDENT, their primary keys can be
used as foreign keys. InstId of DEPT table is primary key (Id) attribute in INSTRUCTOR table
and it is working as foreign key here.
• STUDENT table has primary key only. To get the personal information of student SId will be
used. But to retrieve academic information connection is required with DEPT and TAKES table.
• Each course has a unique CCode in the COURSE table. The COURSE table is logically
connected with the SECTION and DEPT tables to associate each course with its department
and sections.
• TAKES table is created using binary M: N relationship between STUDENT and SECTION. This
is normalized form of both tables.
• In SECTION table SecId is primary key.
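The cross-references among COLLEGE, DEPT, and INSTRUCTOR described above can be sketched in DDL (attribute lists abbreviated; SQLite accepts a foreign key naming a table created later in the same script):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- COLLEGE references both INSTRUCTOR (DeanId) and DEPT (DCode),
-- while INSTRUCTOR references DEPT in turn.
CREATE TABLE COLLEGE (
    CName  TEXT PRIMARY KEY,
    DeanId INTEGER REFERENCES INSTRUCTOR(Id),
    DCode  INTEGER REFERENCES DEPT(DCode));
CREATE TABLE DEPT (
    DCode INTEGER PRIMARY KEY,
    DName TEXT);
CREATE TABLE INSTRUCTOR (
    Id    INTEGER PRIMARY KEY,
    IName TEXT,
    DCode INTEGER REFERENCES DEPT(DCode));
""")
tables = {r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")}
print(sorted(tables))  # ['COLLEGE', 'DEPT', 'INSTRUCTOR']
```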
Chapter 9, Problem 3E
Problem
Try to map the relational schema in Figure 6.14 into an ER schema. This is part of a process
known as reverse engineering, where a conceptual schema is created for an existing
implemented database. State any assumptions you make.
Step-by-step solution
Step 1 of 3
Refer to the relational schema in Figure 6.14 of the textbook; it shows the relations that result
from mapping the EER categories. Based on this, the ER schema can be constructed.
Step 2 of 3
Step 3 of 3
Here, BOOK_AUTHORS represents a multivalued attribute, so it can be modeled as a weak
entity type.
Chapter 9, Problem 4E
Problem
Figure shows an ER schema for a database that can be used to keep track of transport ships
and their locations for maritime authorities. Map this schema into a relational schema and specify
all primary keys and foreign keys.
Figure
An ER schema for a SHIP_TRACKING database.
Step-by-step solution
Step 1 of 6
Following are the steps to convert the given ER scheme into a relational schema:
Step 1: Mapping the regular entity types:
Identify the regular entities in the given ER scheme and create a relation for each regular entity.
Include all the simple attributes of regular entities into relations.
The relations are SHIP, SHIP_TYPE, STATE_COUNTRY, and SEA/OCEAN/LAKE.
Step 2 of 6
Step 2: Mapping the weak entity types:
The weak entities in the given ER scheme are SHIP_MOVEMENT, PORT, and PORT_VISIT.
Create a relation for each weak entity. Include all the simple attributes of weak entities into
relations and include the primary key of the strong entity that corresponds to the owner entity
type as a foreign key.
Step 3 of 6
Step 3: Mapping of binary 1:1 relationship types:
There exists one binary 1:1 relationship mapping which is SHIP_AT_PORT in given ER scheme.
Step 4: Mapping of binary 1: N relationship types:
1: N relationship types in given ER scheme are HISTORY, TYPE, IN, ON, HOME_PORT.
For HISTORY 1: N relationship type, include the primary key of SHIP in SHIP_MOVEMENT. That
is handled in step 2.
For TYPE 1:N relationship type, include the primary key of SHIP_TYPE in SHIP.
For IN 1: N relationship type, include the primary key of STATE_COUNTRY in PORT.
For ON 1: N relationship type, include the primary key of SEA/OCEAN/LAKE in PORT.
For HOME_PORT 1:N relationship type, include the primary key of PORT_VISIT in SHIP.
Step 4 of 6
Step 5: Mapping of binary M: N relationship types:
There are no binary M: N relationship types in the given ER scheme.
Step 6: Mapping of multivalued attributes:
There are no multivalued attributes in the given ER scheme.
The relational schema is shown below:
Step 5 of 6
The primary keys in the schema are:
SHIP: Sname
SHIP_TYPE: Type
SHIP_MOVEMENT: Statename, Date, Time (compound key)
SEA/OCEAN/LAKE: SeaName
PORT: Pname
STATE_COUNTRY: Name
PORT_VISIT: VSname, Start_date (compound key)
Step 6 of 6
The foreign keys in the schema are:
SHIP: Ship_type, P_name
SHIP_TYPE: None
SHIP_MOVEMENT: Statename
SEA/OCEAN/LAKE: None
PORT: None
STATE_COUNTRY: Name
PORT_VISIT: VSname
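The key structure listed above can be sketched in DDL; only key attributes are shown, and the non-key columns are omitted (a sketch, not the full schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE SHIP_TYPE (Type TEXT PRIMARY KEY);
CREATE TABLE PORT (Pname TEXT PRIMARY KEY);
CREATE TABLE SHIP (
    Sname     TEXT PRIMARY KEY,
    Ship_type TEXT REFERENCES SHIP_TYPE(Type),
    P_name    TEXT REFERENCES PORT(Pname));
-- Weak entity: the owner's key plus the partial key form a compound key.
CREATE TABLE PORT_VISIT (
    VSname     TEXT REFERENCES SHIP(Sname),
    Start_date TEXT,
    PRIMARY KEY (VSname, Start_date));
""")
pk_cols = [r[1] for r in con.execute("PRAGMA table_info(PORT_VISIT)") if r[5] > 0]
print(pk_cols)  # ['VSname', 'Start_date']
```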
Chapter 9, Problem 5E
Problem
Map the BANKER schema of Exercise 1 (shown in Figure 2) into a relational schema. Specify all
primary keys and foreign keys. Repeat for the AIRLINE schema (Figure 3.20) of Exercise 2 and
for the other schemas for Exercises 1 through 9.
Exercise 1
Consider the ER diagram shown in Figure 1 for part of a BANK database. Each bank can have
multiple branches, and each branch can have multiple accounts and loans.
a. List the strong (nonweak) entity types in the ER diagram.
b. Is there a weak entity type? If so, give its name, partial key, and identifying relationship.
c. What constraints do the partial key and the identifying relationship of the weak entity type
specify in this diagram?
d. List the names of all relationship types, and specify the (min, max) constraint on each
participation of an entity type in a relationship type. Justify your choices.
Figure 1
An ER diagram for a BANK database schema.
Exercise 2
Consider the ER diagram in Figure 2, which shows a simplified schema for an airline
reservations system. Extract from the ER diagram the requirements and constraints that
produced this schema. Try to be as precise as possible in your requirements and constraints
specification.
Figure 2
An ER diagram for an AIRLINE database schema.
Exercise 3
Which combinations of attributes have to be unique for each individual SECTION entity in the
UNIVERSITY database shown in Figure 3.20 to enforce each of the following miniworld
constraints:
a. During a particular semester and year, only one section can use a particular classroom at a
particular DaysTime value.
b. During a particular semester and year, an instructor can teach only one section at a particular
DaysTime value.
c. During a particular semester and year, the section numbers for sections offered for the same
course must all be different.
Can you think of any other similar constraints?
Exercise 4
Composite and multivalued attributes can be nested to any number of levels. Suppose we want
to design an attribute for a STUDENT entity type to keep track of previous college education.
Such an attribute will have one entry for each college previously attended, and each such entry
will be composed of college name, start and end dates, degree entries (degrees awarded at that
college, if any), and transcript entries (courses completed at that college, if any). Each degree
entry contains the degree name and the month and year the degree was awarded, and each
transcript entry contains a course name, semester, year, and grade. Design an attribute to hold
this information. Use the conventions in Figure 3.5.
Exercise 5
Show an alternative design for the attribute described in Exercise 4 that uses only entity types
(including weak entity types, if needed) and relationship types.
Exercise 6
In Chapters 1 and 2, we discussed the database environment and database users. We can
consider many entity types to describe such an environment, such as DBMS, stored database,
DBA, and catalog/data dictionary. Try to specify all the entity types that can fully describe a
database system and its environment; then specify the relationship types among them, and draw
an ER diagram to describe such a general database environment.
Exercise 7
Design an ER schema for keeping track of information about votes taken in the U.S. House of
Representatives during the current two-year congressional session. The database needs to keep
track of each U.S. STATE's Name (e.g., 'Texas', 'New York', 'California') and include the
Region of the state (whose domain is {'Northeast', 'Midwest', 'Southeast', 'Southwest',
'West'}). Each CONGRESS_PERSON in the House of Representatives is described by his or her
Name, plus the District represented, the Start_date when the congressperson was first elected,
and the political Party to which he or she belongs (whose domain is {'Republican', 'Democrat',
'Independent', 'Other'}). The database keeps track of each BILL (i.e., proposed law), including
the Bill_name, the Date_of_vote on the bill, whether the bill Passed_or_failed (whose domain is
{'Yes', 'No'}), and the Sponsor (the congressperson(s) who sponsored, that is, proposed, the
bill). The database also keeps track of how each congressperson voted on each bill (the domain
of the Vote attribute is {'Yes', 'No', 'Abstain', 'Absent'}). Draw an ER schema diagram for this
application. State clearly any assumptions you make.
Exercise 8
A database is being constructed to keep track of the teams and games of a sports league. A
team has a number of players, not all of whom participate in each game. It is desired to keep
track of the players participating in each game for each team, the positions they played in that
game, and the result of the game. Design an ER schema diagram for this application, stating any
assumptions you make. Choose your favorite sport (e.g., soccer, baseball, football).
Exercise 9
Consider the ER diagram in Figure 3. Assume that an employee may work in up to two
departments or may not be assigned to any department. Assume that each department must
have one and may have up to three phone numbers. Supply (min, max) constraints on this
diagram. State clearly any additional assumptions you make. Under what conditions would the
relationship HAS_PHONE be redundant in this example?
Figure 3
Part of an ER diagram for a COMPANY database.
Figure 3.20
Step-by-step solution
There is no solution to this problem yet.
Chapter 9, Problem 6E
Problem
Map the EER diagrams in Figures 4.9 and 4.12 into relational schemas. Justify your choice of
mapping options.
Step-by-step solution
Step 1 of 7
The relational schema diagram for the EER diagram in figure 4.9 is as shown below:
Step 2 of 7
Explanation:
• The regular entity types are PERSON, DEPARTMENT, COLLEGE, COURSE and SECTION.
So, create a relation for each entity with their respective attributes.
• FACULTY and STUDENT are subclasses of the entity PERSON. So, two relations, one for
FACULTY and one for STUDENT, are created, and the primary key of PERSON is included in
both relations along with their respective attributes.
• An entity INSTRUCTOR_RESEARCHER is created with Instructor_id as an attribute. This
attribute is included as a foreign key in the relations FACULTY and GRAD_STUDENT.
• There exists a binary 1:1 relationship CHAIRS between FACULTY and DEPARTMENT. So,
include the primary key of Faculty as a foreign key in relation DEPARTMENT.
• There exists a binary 1:N relationship CD between COLLEGE and DEPARTMENT. So, include
the primary key of COLLEGE as a foreign key in relation DEPARTMENT.
Step 3 of 7
• There exists a binary 1:N relationship DC between DEPARTMENT and COURSE. So, include
the primary key of DEPARTMENT as a foreign key in relation COURSE.
• There exists a binary 1:N relationship CS between COURSE and SECTION. So, include the
primary key of COURSE as a foreign key in relation SECTION.
• There exists a binary 1:N relationship ADVISOR between FACULTY and GRAD_STUDENT.
So, include the primary key of FACULTY as a foreign key in relation GRAD_STUDENT.
• There exists a binary 1:N relationship PI between FACULTY and GRANT. So, include the
primary key of FACULTY as a foreign key in relation GRANT.
• There exists a binary 1:N relationship TEACH between SECTION and
INSTRUCTOR_RESEARCHER. Create a relation TEACH and include the primary keys of
SECTION and INSTRUCTOR_RESEARCHER as attributes of TEACH.
• There exists a binary 1:N relationship MAJOR between STUDENT and DEPARTMENT. Create
a relation MAJOR and include the primary keys of STUDENT and DEPARTMENT as attributes of
MAJOR.
Step 4 of 7
• There exists a binary 1:N relationship MINOR between STUDENT and DEPARTMENT. Create a
relation MINOR and include the primary keys of STUDENT and DEPARTMENT as attributes of
MINOR.
• There exists a binary M:N relationship COMMITTEE between FACULTY and
GRAD_STUDENT. Create a relation COMMITTEE and include the primary keys of FACULTY
and GRAD_STUDENT as attributes of COMMITTEE.
• There exists a binary M:N relationship BELONGS between FACULTY and DEPARTMENT.
Create a relation BELONGS and include the primary keys of FACULTY and DEPARTMENT as
attributes of BELONGS.
• There exists a binary M:N relationship REGISTERED between STUDENT and
CURRENT_SECTION. Create a relation REGISTERED and include the primary keys of
STUDENT and CURRENT_SECTION as attributes of REGISTERED.
• There exists a binary M:N relationship TRANSCRIPT between SECTION and STUDENT.
Create a relation TRANSCRIPT and include the primary keys of SECTION and STUDENT as
attributes of TRANSCRIPT along with additional attributes of relation TRANSCRIPT.
Step 5 of 7
The relational schema diagram for the EER diagram in figure 4.12 is as shown below:
Step 6 of 7
Explanation:
• The regular entity types are PLANE_TYPE, AIRPLANE and HANGAR. So, create a relation for
each entity with their respective attributes.
• Create two relations CORPORATION and PERSON and include their respective attributes.
• Owner category is a subset of the union of two entities CORPORATION and PERSON. So, a
relation OWNER is created with Owner_id as an attribute. This attribute is included as a foreign
key in the relations CORPORATION and PERSON.
• EMPLOYEE and PILOT are subclasses of the entity PERSON. So, two relations, one for
EMPLOYEE and one for PILOT, are created, and the primary key of PERSON is included as the
primary key in both relations along with their respective attributes.
• An entity SERVICE is a weak entity. So, create a relation SERVICE and include as attributes
the primary key of AIRPLANE along with the attributes of SERVICE.
• There exists a binary N:1 relationship OF_TYPE between AIRPLANE and PLANE_TYPE. So,
include the primary key of PLANE_TYPE as a foreign key in relation AIRPLANE, since the
foreign key belongs on the N-side.
• There exists a binary N:1 relationship STORED_IN between AIRPLANE and HANGAR. So,
include the primary key of HANGAR as a foreign key in relation AIRPLANE.
• There exists a binary M:N relationship WORKS_ON between PLANE_TYPE and EMPLOYEE.
Create a relation WORKS_ON and include the primary keys of PLANE_TYPE and EMPLOYEE
as attributes of WORKS_ON.
Step 7 of 7
• There exists a binary M:N relationship FLIES between PLANE_TYPE and PILOT. Create a
relation FLIES and include the primary keys of PLANE_TYPE and PILOT as attributes of FLIES.
• There exists a binary M:N relationship OWNS between AIRPLANE and OWNER. Create a
relation OWNS and include the primary keys of AIRPLANE and OWNER as attributes of OWNS
along with the attribute Pdate.
• There exists a binary M:N relationship MAINTAIN between SERVICE and EMPLOYEE. Create
a relation MAINTAIN and include the primary keys of SERVICE and EMPLOYEE as attributes of
MAINTAIN.
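The foreign-key placement for the 1:N relationships and the weak entity above can be sketched as SQL DDL. The attribute names below (Model, Reg_no, Number, Date, Workcode) are illustrative assumptions, not taken from the figure; the DDL is executed through Python's sqlite3 only to check that it is well-formed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# The "1" side of each 1:N relationship contributes a foreign key to the
# "N" side: each AIRPLANE is OF_TYPE one PLANE_TYPE and STORED_IN one HANGAR.
conn.executescript("""
CREATE TABLE PLANE_TYPE (
    Model     TEXT PRIMARY KEY,
    Capacity  INTEGER,
    Weight    INTEGER
);
CREATE TABLE HANGAR (
    Number    INTEGER PRIMARY KEY,
    Capacity  INTEGER
);
CREATE TABLE AIRPLANE (
    Reg_no    TEXT PRIMARY KEY,
    Model     TEXT REFERENCES PLANE_TYPE(Model),  -- OF_TYPE (1:N)
    Hangar_no INTEGER REFERENCES HANGAR(Number)   -- STORED_IN (1:N)
);
-- SERVICE is a weak entity owned by AIRPLANE: its key combines the
-- owner's primary key with the partial key Date.
CREATE TABLE SERVICE (
    Reg_no    TEXT REFERENCES AIRPLANE(Reg_no),
    Date      TEXT,
    Workcode  TEXT,
    PRIMARY KEY (Reg_no, Date)
);
""")

conn.execute("INSERT INTO PLANE_TYPE VALUES ('DC-10', 380, 250000)")
conn.execute("INSERT INTO HANGAR VALUES (1, 20)")
conn.execute("INSERT INTO AIRPLANE VALUES ('N123', 'DC-10', 1)")
print(conn.execute("SELECT Model FROM AIRPLANE").fetchone()[0])
```

Note that the foreign keys sit in AIRPLANE, not in PLANE_TYPE or HANGAR: a hangar stores many airplanes, so the hangar key must be repeated once per airplane, not the other way around.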
Chapter 9, Problem 7E
Problem
Is it possible to successfully map a binary M : N relationship type without requiring a new
relation? Why or why not?
Step-by-step solution
Step 1 of 3
When there exists a many-to-many relationship between two entity types, the relationship type is
known as a binary M:N relationship type.
Step 2 of 3
The steps to map a binary M:N relationship type R into a relation are as follows:
• Create a new relation R1 to represent the relationship type R.
• Include the primary keys of the two participating entities as foreign keys in new relation R1.
• The primary keys of the two participating entities also become the composite primary key of
relation R1.
• Also include any simple attributes of the relationship type R.
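The steps above can be sketched in SQL DDL. The entities EMPLOYEE and PROJECT and the relationship WORKS_ON with its Hours attribute are hypothetical examples, executed here through Python's sqlite3 so the composite key can be checked:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE PROJECT  (Pnumber INTEGER PRIMARY KEY, Pname TEXT);

-- New relation for the M:N relationship type WORKS_ON:
-- the two participating primary keys become foreign keys, and together
-- they form the composite primary key. Hours is a simple attribute of
-- the relationship itself.
CREATE TABLE WORKS_ON (
    Essn  TEXT    REFERENCES EMPLOYEE(Ssn),
    Pno   INTEGER REFERENCES PROJECT(Pnumber),
    Hours REAL,
    PRIMARY KEY (Essn, Pno)
);
""")

conn.execute("INSERT INTO EMPLOYEE VALUES ('123', 'Smith')")
conn.execute("INSERT INTO PROJECT VALUES (1, 'ProductX')")
conn.execute("INSERT INTO WORKS_ON VALUES ('123', 1, 32.5)")

# The composite key rejects a duplicate (employee, project) pair.
try:
    conn.execute("INSERT INTO WORKS_ON VALUES ('123', 1, 10.0)")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
print(duplicate_allowed)  # False
```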
Step 3 of 3
Hence, it is not possible to map a binary M:N relationship type without a new relation. Placing a
foreign key in either participating relation would require multiple values per tuple, which the relational
model does not allow.
Chapter 9, Problem 8E
Problem
Consider the EER diagram in Figure for a car dealer.
Map the EER schema into a set of relations. For the VEHICLE to CAR/TRUCK/SUV
generalization, consider the four options presented in Section 9.2.1 and show the relational
schema design under each of those options.
Figure
EER diagram for a car dealer.
Step-by-step solution
Step 1 of 8
Option multiple relations – superclass and subclasses:
Following are the set of relations for the VEHICLE to CAR/TRUCK/SUV generalization using the
option multiple relations – superclass and subclasses:
Step 2 of 8
Using the option multiple relations – superclass and subclasses, a separate relation is created for the
superclass and for each subclass in the generalization.
• A relation VEHICLE is created with attributes Vin, Model and Price.
• A relation CAR is created with attributes Vin and Engine_size.
• A relation TRUCK is created with attributes Vin and Tonnage.
• A relation SUV is created with attributes Vin and No_seats.
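Assuming Vin is the key, this option can be sketched as DDL (checked here through Python's sqlite3):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Option 8A: one relation for the superclass, one per subclass.
-- Each subclass relation repeats only the superclass key Vin.
CREATE TABLE VEHICLE (Vin TEXT PRIMARY KEY, Model TEXT, Price REAL);
CREATE TABLE CAR   (Vin TEXT PRIMARY KEY REFERENCES VEHICLE(Vin), Engine_size REAL);
CREATE TABLE TRUCK (Vin TEXT PRIMARY KEY REFERENCES VEHICLE(Vin), Tonnage REAL);
CREATE TABLE SUV   (Vin TEXT PRIMARY KEY REFERENCES VEHICLE(Vin), No_seats INTEGER);
""")

conn.execute("INSERT INTO VEHICLE VALUES ('V1', 'Sedan', 20000)")
conn.execute("INSERT INTO CAR VALUES ('V1', 2.0)")

# Reassembling a full CAR tuple requires a join with the superclass relation,
# which is the main cost of this option.
row = conn.execute("""
    SELECT V.Vin, V.Model, V.Price, C.Engine_size
    FROM VEHICLE V JOIN CAR C ON V.Vin = C.Vin
""").fetchone()
print(row)
```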
Step 3 of 8
The relational schema for a car dealer EER diagram (refer figure 9.9) using the option multiple
relations – superclass and subclasses is as shown below:
Step 4 of 8
Option multiple relations – subclass relations only:
Following are the set of relations for the VEHICLE to CAR/TRUCK/SUV generalization using the
option multiple relations – subclass relations only.
Using this option, a separate relation is created for each subclass in the generalization, and the
superclass attributes are repeated in each:
• A relation CAR is created with attributes Vin, Model, Price and Engine_size.
• A relation TRUCK is created with attributes Vin, Model, Price and Tonnage.
• A relation SUV is created with attributes Vin, Model, Price and No_seats.
Step 5 of 8
Option single relation with one type attribute:
Following are the set of relations for the VEHICLE to CAR/TRUCK/SUV generalization using the
option single relation with one type attribute:
Using the option single relation with one type attribute, a single relation is created for the superclass
and all its subclasses.
• The attributes of the relation are the union of the attributes of the superclass and the subclasses.
• An attribute Vehicle_Type is added to specify the type of the vehicle.
• A relation VEHICLE is created with attributes Vin, Model, Price, Engine_size, Tonnage, No_seats
and Vehicle_Type.
Step 6 of 8
The relational schema for a car dealer EER diagram (refer figure 9.9) using the option single
relation with one type attribute is as shown below:
Step 7 of 8
Option single relation with multiple type attributes:
Following are the set of relations for the VEHICLE to CAR/TRUCK/SUV generalization using the
option single relation with multiple type attributes:
Using the option single relation with multiple type attributes, a single relation is created for the
superclass and all its subclasses.
• The attributes of the relation are the union of the attributes of the superclass and the subclasses.
• A Boolean attribute Car_Type is added to indicate whether the vehicle is a car.
• A Boolean attribute Truck_Type is added to indicate whether the vehicle is a truck.
• A Boolean attribute SUV_Type is added to indicate whether the vehicle is an SUV.
• A relation VEHICLE is created with attributes Vin, Model, Price, Car_Type, Engine_size,
Truck_Type, Tonnage, SUV_Type and No_seats.
Step 8 of 8
The relational schema for a car dealer EER diagram (refer figure 9.9) using the option single
relation with multiple type attributes is as shown below:
Chapter 9, Problem 9E
Problem
Using the attributes you provided for the EER diagram in Exercise, map the complete schema
into a set of relations. Choose an appropriate option out of 8A thru 8D from Section 9.2.1 in doing
the mapping of generalizations and defend your choice.
Exercise
Consider the following EER diagram that describes the computer systems at a company. Provide
your own attributes and key for each entity type. Supply max cardinality constraints justifying
your choice. Write a complete narrative description of what this EER diagram represents.
Step-by-step solution
Step 1 of 2
EER diagram represents:
The EER diagram represents the computer systems at a company.
• The diagram is built around the entity COMPUTER.
• The entity COMPUTER has the attributes RAM, ROM, Processor, S_no, Manufacturer, and Cost.
• Its primary key is S_no, and its relationships have 1:M cardinality.
• COMPUTER participates in several relationships: Accessory, Installed, Installed_OS, and the
specialization d.
• The Accessory relationship has one-to-many cardinality and connects COMPUTER to the
keyboard, monitor, and mouse entities.
• The Installed and Installed_OS relationships connect COMPUTER to SOFTWARE and
OPERATING_SYSTEM, recording what is installed on each computer system.
• The specialization d divides COMPUTER into LAPTOP and DESKTOP, along with their other
components.
• The other related components are memory, video_card, and sound_card.
Cardinality:
• One-to-one cardinality relates one occurrence of an entity to exactly one occurrence of another
entity.
• One-to-many cardinality relates one occurrence of an entity to many occurrences of another
entity.
• Many-to-many cardinality relates many occurrences of an entity to many occurrences of another
entity.
Step 2 of 2
The following table describes the attributes, primary key, and cardinality of each relation:
Chapter 9, Problem 10LE
Problem
Consider the ER design for the UNIVERSITY database that was modeled using a tool like ERwin
or Rational Rose in Laboratory Exercise 3.31. Using the SQL schema generation feature of the
modeling tool, generate the SQL schema for an Oracle database.
Reference Exercise 3.31
Consider the UNIVERSITY database described in Exercise 16. Build the ER schema for this
database using a data modeling tool such as ERwin or Rational Rose.
Reference Exercise 16
Which combinations of attributes have to be unique for each individual SECTION entity in the
UNIVERSITY database shown in Figure 3.20 to enforce each of the following miniworld
constraints:
a. During a particular semester and year, only one section can use a particular classroom at a
particular DaysTime value.
b. During a particular semester and year, an instructor can teach only one section at a particular
DaysTime value.
c. During a particular semester and year, the section numbers for sections offered for the same
course must all be different.
Can you think of any other similar constraints?
Step-by-step solution
Step 1 of 1
Refer to the ER schema for UNIVERSITY database, generated using Rational Rose tool in
Laboratory Exercise 3.31. Use Rational Rose tool to create the SQL schema for an Oracle
database as follows:
• Open the ER schema generated using the Rational Rose tool in Laboratory Exercise 3.31. In the
options available on the left, right-click on the option Component View, go to Data Modeler, then go
to New and select the option Database.
• Name the database as Oracle Database.
• Right click on Oracle Database and select the option Open Specification. In the field Target
select Oracle 7.x and click on OK.
• Import the ER schema, generated using the Rational Rose tool in Laboratory Exercise 3.31, to the
Oracle Database as follows:
• Right click on the Oracle Database, then go to New and select the option File.
• Now browse and select the ER schema generated using Rational Rose tool in Laboratory
Exercise 3.31. Selecting the file would import the ER schema for the UNIVERSITY database,
generated using Rational Rose tool in Laboratory Exercise 3.31.
• Click on File option in menu bar, followed by clicking on Save as option. Save the ER schema
by the file name 714374-9-10LE.
• This will generate the SQL schema of the UNIVERSITY database for the Oracle database.
Chapter 9, Problem 11LE
Problem
Consider the ER design for the MAIL_ORDER database that was modeled using a tool like
ERwin or Rational Rose in Laboratory Exercise. Using the SQL schema generation feature of the
modeling tool, generate the SQL schema for an Oracle database.
Exercise
Consider a MAIL_ORDER database in which employees take orders for parts from customers.
The data requirements are summarized as follows:
■ The mail order company has employees, each identified by a unique employee number, first
and last name, and Zip Code.
■ Each customer of the company is identified by a unique customer number, first and last name,
and Zip Code.
■ Each part sold by the company is identified by a unique part number, a part name, price, and
quantity in stock.
■ Each order placed by a customer is taken by an employee and is given a unique order number.
Each order contains specified quantities of one or more parts. Each order has a date of receipt
as well as an expected ship date. The actual ship date is also recorded.
Design an entity-relationship diagram for the mail order database and build the design using a
data modeling tool such as ERwin or Rational Rose.
Step-by-step solution
Step 1 of 1
Refer to the ER schema for MAIL_ORDER database, generated using Rational Rose tool in
Laboratory Exercise 3.32. Use Rational Rose tool to create the SQL schema for an Oracle
database as follows:
• Open the ER schema generated using the Rational Rose tool in Laboratory Exercise 3.32. In the
options available on the left, right-click on the option Component View, go to Data Modeler, then go
to New and select the option Database.
• Name the database as Oracle Database.
• Right click on Oracle Database and select the option Open Specification. In the field Target
select Oracle 7.x and click on OK.
• Import the ER schema, generated using the Rational Rose tool in Laboratory Exercise 3.32, to the
Oracle Database as follows:
• Right click on the Oracle Database, then go to New and select the option File.
• Now browse and select the ER schema generated using Rational Rose tool in Laboratory
Exercise 3.32. Selecting the file would import the ER schema for the MAIL_ORDER database.
• Click on File option in menu bar, followed by clicking on Save as option. Save the ER schema
by the file name 714374-9-11LE.
• This will generate the SQL schema of the MAIL_ORDER database for the Oracle database.
Chapter 10, Problem 1RQ
Problem
What is ODBC? How is it related to SQL/CLI?
Step-by-step solution
Step 1 of 1
ODBC
Open Database Connectivity (ODBC) is a standardized application programming interface (API)
for accessing a database from an application program. ODBC was developed by Microsoft, which
also provides programming support for it.
SQL/CLI
SQL/CLI stands for SQL Call-Level Interface and is part of the SQL standard. It was developed as
a follow-on to the technique known as ODBC.
SQL/CLI is defined as a set of library functions.
Chapter 10, Problem 2RQ
Problem
What is JDBC? Is it an example of embedded SQL or of using function calls?
Step-by-step solution
Step 1 of 2
JDBC
JDBC stands for Java Database Connectivity. It is a registered trademark of Sun Microsystems.
JDBC is a call-function interface for accessing databases from Java.
A JDBC driver is basically an implementation of the function calls specified in the JDBC
application programming interface (API). It is designed to allow a single Java program to connect to
several different databases.
Step 2 of 2
JDBC is not an example of embedded SQL; it uses function calls specified in the JDBC API.
JDBC function calls can access any RDBMS for which a JDBC driver is available. The function
libraries for this kind of access are known as JDBC.
Chapter 10, Problem 3RQ
Problem
List the three main approaches to database programming. What are the advantages and
disadvantages of each approach?
Step-by-step solution
Step 1 of 3
Main approaches to database programming:
The main approaches to database programming are:
(1) Embedding database commands in a general-purpose programming language:
Database statements are embedded into the host programming language and identified by a
special prefix. A precompiler or preprocessor scans the source program code to identify the
database statements and extracts them for processing by the DBMS.
Step 2 of 3
(2) Using a library of database functions:
A library of functions is made available to the host programming language for database calls.
Step 3 of 3
(3) Designing a brand-new language:
A database programming language is designed from scratch to be compatible with the database
model and query language. Loops and conditional statements are added to the database
language to turn it into a full-fledged programming language.
Advantages and disadvantages:
The first two approaches are the most common in practice, since they allow database access from
an existing general-purpose language; their main disadvantage is the impedance mismatch between
the language's data types and the database model. The third approach is more appropriate for
applications with intensive database interaction and avoids the impedance mismatch, but it requires
programmers to learn a special-purpose language.
Chapter 10, Problem 4RQ
Problem
What is the impedance mismatch problem? Which of the three programming approaches
minimizes this problem?
Step-by-step solution
Step 1 of 2
Impedance mismatch:
Impedance mismatch is the term used to refer to the problems that occur because of differences
between the database model and the programming language model.
It is less of a problem when a special database programming language is designed that uses the
same data model and data types as the database model.
The relational model has three main constructs:
• attributes
• tuples
• tables (relations)
Step 2 of 2
First problem: the data types of the programming language differ from the attribute data types of
the data model. Hence, a binding is necessary for each programming language, because different
languages have different data types.
Second problem: the results of most queries are sets or multisets of tuples, and each tuple is
formed of a sequence of attribute values.
Hence, a binding is needed to map the query result data structure, which is a table, onto an
appropriate data structure in the programming language.
The third approach to database programming, designing a brand-new language, minimizes the
impedance mismatch problem.
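Both bindings can be illustrated in Python: SQL rows arrive as untyped tuples, and the program must map each SQL type onto a host type and the table-shaped result onto a host data structure. The Employee record type and the sample row below are invented for illustration:

```python
import sqlite3
from dataclasses import dataclass

# A host-language record type: the program's own model of an employee.
@dataclass
class Employee:
    ssn: str
    name: str
    salary: float

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (Ssn TEXT, Name TEXT, Salary REAL)")
conn.execute("INSERT INTO EMPLOYEE VALUES ('123', 'Smith', 30000)")

# Binding 1: each SQL type must be mapped to a host type
# (TEXT -> str, REAL -> float), and the mapping differs per language.
# Binding 2: the query result is a multiset of tuples, so a second mapping
# turns the table-shaped result into a host data structure (a list of records).
employees = [Employee(*row)
             for row in conn.execute("SELECT Ssn, Name, Salary FROM EMPLOYEE")]
print(employees[0].name)  # Smith
```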
Chapter 10, Problem 5RQ
Problem
Describe the concept of a cursor and how it is used in embedded SQL.
Step-by-step solution
Step 1 of 2
A cursor is a pointer that points to a single tuple (row) from the result of a query that retrieves
multiple tuples.
A cursor is declared for each SQL query command in the program that may return multiple rows.
The program manipulates a cursor with two commands:
the OPEN CURSOR command
the FETCH command
The cursor variable acts as an iterator over the query result.
Step 2 of 2
In embedded SQL, UPDATE/DELETE commands use the condition WHERE CURRENT OF
<cursor name> to specify that the current tuple, the one the cursor points to, is the target.
Declaring a cursor in embedded SQL has the following general form:
DECLARE <cursor name> [ INSENSITIVE ] [ SCROLL ] CURSOR
[ WITH HOLD ] FOR <query specification>
[ ORDER BY <ordering specification> ]
[ FOR READ ONLY | FOR UPDATE [ OF <attribute list> ] ] ;
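The same OPEN/FETCH pattern appears in Python's DB-API, where the cursor object plays the role of the embedded-SQL cursor. The table and rows below are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (Name TEXT, Salary REAL)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
                 [("Smith", 30000), ("Wong", 40000)])

# DECLARE + OPEN: executing the query associates the cursor with its result.
cur = conn.cursor()
cur.execute("SELECT Name, Salary FROM EMPLOYEE ORDER BY Name")

# FETCH loop: each fetchone() advances the cursor to the next tuple,
# returning None when the result is exhausted (much like testing SQLCODE).
names = []
row = cur.fetchone()
while row is not None:
    names.append(row[0])
    row = cur.fetchone()
cur.close()
print(names)  # ['Smith', 'Wong']
```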
Chapter 10, Problem 6RQ
Problem
What is SQLJ used for? Describe the two types of iterators available in SQLJ.
Step-by-step solution
Step 1 of 2
SQLJ
SQLJ is a standard adopted by several vendors for embedding SQL statements in Java; it is used,
for example, in the Oracle DBMS.
SQLJ statements are translated into calls that go through the JDBC interface.
In SQLJ, an iterator is associated with the tuples and attributes of a query result. There are two
types of iterators:
(1) a named iterator
(2) a positional iterator
Step 2 of 2
A named iterator is associated with a query result by listing both the attribute names and their
types as they appear in the query result.
A positional iterator lists only the attribute types.
In both cases, the list must be in the same order as the attributes listed in the SELECT clause of
the query. Looping over a query result differs for the two types of iterators: with a named iterator,
attributes are accessed by name, while with a positional iterator they are fetched by position into
host variables, which behaves much like the FETCH command of embedded SQL.
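The distinction can be mimicked in Python with sqlite3: the default tuple rows are accessed by position, while the sqlite3.Row factory allows access by the attribute names of the query result (schema and data invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (Name TEXT, Class INTEGER)")
conn.execute("INSERT INTO STUDENT VALUES ('Smith', 1)")

# Positional style: only the order of the SELECT list matters.
row = conn.execute("SELECT Name, Class FROM STUDENT").fetchone()
positional_name = row[0]

# Named style: columns are reached by the attribute names of the query
# result, as with an SQLJ named iterator.
conn.row_factory = sqlite3.Row
named = conn.execute("SELECT Name, Class FROM STUDENT").fetchone()
named_name = named["Name"]

print(positional_name, named_name)  # Smith Smith
```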
Chapter 10, Problem 7E
Problem
Consider the database shown in Figure 1.2, whose schema is shown in Figure 2.1. Write a
program segment to read a student’s name and print his or her grade point average, assuming
that A = 4, B = 3, C = 2, and D = 1 points. Use embedded SQL with C as the host language.
Step-by-step solution
Step 1 of 1
Assuming all required variables have been declared already and assuming that Name of
STUDENT is unique , code will look like:
int total_points = 0, total_course_count = 0;
float gpa = 0.0;
prompt("Enter name of student: ", Sname);
EXEC SQL
SELECT Student_number, Name
INTO :number, :name
FROM STUDENT
WHERE Name = :Sname;
EXEC SQL DECLARE GR CURSOR FOR
SELECT Grade
FROM GRADE_REPORT
WHERE Student_number = :number;
EXEC SQL OPEN GR;
EXEC SQL FETCH FROM GR INTO :grade;
while (SQLCODE == 0)
{
switch (grade)
{
case 'A': total_points += 4; break;
case 'B': total_points += 3; break;
case 'C': total_points += 2; break;
case 'D': total_points += 1; break;
}
total_course_count++;
EXEC SQL FETCH FROM GR INTO :grade;
}
EXEC SQL CLOSE GR;
if (total_course_count != 0)
gpa = (float) total_points / total_course_count;
printf("Grade point average of student is %.2f\n", gpa);
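For readers without an embedded-SQL precompiler, the same cursor logic can be checked in plain Python with sqlite3. The schema follows Figure 2.1; the sample student and grades are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (Student_number INTEGER, Name TEXT)")
conn.execute("CREATE TABLE GRADE_REPORT (Student_number INTEGER, Grade TEXT)")
conn.execute("INSERT INTO STUDENT VALUES (17, 'Smith')")
conn.executemany("INSERT INTO GRADE_REPORT VALUES (17, ?)",
                 [("A",), ("B",), ("C",)])

POINTS = {"A": 4, "B": 3, "C": 2, "D": 1}

def grade_point_average(name):
    # Look up the student number, then iterate over the grade rows
    # exactly as the embedded-SQL cursor does.
    (number,) = conn.execute(
        "SELECT Student_number FROM STUDENT WHERE Name = ?", (name,)).fetchone()
    total = count = 0
    for (grade,) in conn.execute(
            "SELECT Grade FROM GRADE_REPORT WHERE Student_number = ?", (number,)):
        total += POINTS[grade]
        count += 1
    return total / count if count else 0.0

print(grade_point_average("Smith"))  # 3.0
```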
Chapter 10, Problem 8E
Problem
Repeat Exercise 10.7, but use SQLJ with Java as the host language.
Reference 10.7
Consider the database shown in Figure 1.2, whose schema is shown in Figure 2.1. Write a
program segment to read a student’s name and print his or her grade point average, assuming
that A = 4, B = 3, C = 2, and D = 1 points. Use embedded SQL with C as the host language.
Step-by-step solution
Step 1 of 1
Assuming all required variables have been declared already, headers have been included, and
assuming that Name of STUDENT is unique, code will look like:
int total_points = 0, total_course_count = 0;
double gpa = 0.0;
Sname = readEntry("Enter student name: ");
try
{
#sql { SELECT Student_number, Name
INTO :number, :name
FROM STUDENT
WHERE Name = :Sname };
}
catch (SQLException se)
{
System.out.println("No student with the name " + Sname);
return;
}
#sql iterator Grades(String grade);
Grades g = null;
#sql g = { SELECT Grade
FROM GRADE_REPORT
WHERE Student_number = :number };
while (g.next())
{
switch (g.grade().charAt(0))
{
case 'A': total_points += 4; break;
case 'B': total_points += 3; break;
case 'C': total_points += 2; break;
case 'D': total_points += 1; break;
}
total_course_count++;
}
g.close();
if (total_course_count != 0)
{
gpa = (double) total_points / total_course_count;
}
System.out.println("Grade point average of student is " + gpa);
Chapter 10, Problem 9E
Problem
Consider the library relational database schema in Figure. Write a program segment that
retrieves the list of books that became overdue yesterday and that prints the book title and
borrower name for each. Use embedded SQL with C as the host language.
Figure
A relational database schema for a LIBRARY database.
Step-by-step solution
Step 1 of 1
Assuming all required variables have been declared already. A book that became overdue
yesterday is one whose due date was yesterday, i.e., Due_date = CURRENT_DATE - 1:
EXEC SQL DECLARE DB CURSOR FOR
SELECT B.Book_id, B.Title, BW.Name
FROM BOOK B, BORROWER BW, BOOK_LOANS BL
WHERE BL.Due_date = CURRENT_DATE - 1
AND BL.Card_no = BW.Card_no
AND BL.Book_id = B.Book_id;
EXEC SQL OPEN DB;
EXEC SQL FETCH FROM DB INTO :bookId, :bookTitle, :borrowerName;
while (SQLCODE == 0)
{
printf("Book id: %s\n", bookId);
printf("Book title: %s\n", bookTitle);
printf("Borrower name: %s\n", borrowerName);
EXEC SQL FETCH FROM DB INTO :bookId, :bookTitle, :borrowerName;
}
EXEC SQL CLOSE DB;
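The date condition can be checked in plain Python with sqlite3, using its date() function for the "yesterday" arithmetic. The book, borrower, and loan rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BOOK       (Book_id TEXT, Title TEXT);
CREATE TABLE BORROWER   (Card_no TEXT, Name TEXT);
CREATE TABLE BOOK_LOANS (Book_id TEXT, Card_no TEXT, Due_date TEXT);
""")
conn.execute("INSERT INTO BOOK VALUES ('B1', 'Database Systems')")
conn.execute("INSERT INTO BORROWER VALUES ('C1', 'Jones')")
# One loan that fell due yesterday, one due tomorrow.
conn.execute("INSERT INTO BOOK_LOANS VALUES ('B1', 'C1', date('now', '-1 day'))")
conn.execute("INSERT INTO BOOK_LOANS VALUES ('B1', 'C1', date('now', '+1 day'))")

# "Became overdue yesterday" means the due date equals yesterday's date,
# so only the first loan qualifies.
rows = conn.execute("""
    SELECT B.Title, BW.Name
    FROM BOOK B, BORROWER BW, BOOK_LOANS BL
    WHERE BL.Due_date = date('now', '-1 day')
      AND BL.Card_no = BW.Card_no
      AND BL.Book_id = B.Book_id
""").fetchall()
for title, name in rows:
    print(title, name)  # Database Systems Jones
```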
Chapter 10, Problem 10E
Problem
Repeat Exercise, but use SQLJ with Java as the host language.
Exercise
Consider the library relational database schema in Figure. Write a program segment that
retrieves the list of books that became overdue yesterday and that prints the book title and
borrower name for each. Use embedded SQL with C as the host language.
Figure
A relational database schema for a LIBRARY database.
Step-by-step solution
Step 1 of 1
Assuming all required variables have been declared already and headers have been included. A
book that became overdue yesterday is one whose due date was yesterday:
#sql iterator DB(String bookId, String bookTitle, String borrowerName);
DB d = null;
#sql d = { SELECT B.Book_id, B.Title, BW.Name
FROM BOOK B, BORROWER BW, BOOK_LOANS BL
WHERE BL.Due_date = CURRENT_DATE - 1
AND BL.Card_no = BW.Card_no
AND BL.Book_id = B.Book_id };
while (d.next())
{
System.out.println("Book id: " + d.bookId() + ", book title: " + d.bookTitle()
+ ", borrower name: " + d.borrowerName());
}
d.close();
Chapter 10, Problem 11E
Problem
Repeat Exercise 10.7 and 10.9, but use SQL/CLI with C as the host language.
Reference 10.7
Consider the database shown in Figure 1.2, whose schema is shown in Figure 2.1. Write a
program segment to read a student’s name and print his or her grade point average, assuming
that A = 4, B = 3, C = 2, and D = 1 points. Use embedded SQL with C as the host language.
Reference 10.9
Consider the library relational database schema in Figure 6.6. Write a program segment that
retrieves the list of books that became overdue yesterday and that prints the book title and
borrower name for each. Use embedded SQL with C as the host language.
Step-by-step solution
Step 1 of 4
Exercise 10.7 using SQL/CLI:
#include <sqlcli.h>
#include <stdio.h>
void printGPA() {
SQLHSTMT stmt1, stmt2;
SQLHDBC con1;
SQLHENV env1;
SQLRETURN ret1, ret2, ret3, ret4;
SQLINTEGER number, fetchlen1, fetchlen2;
char Sname[16], name[16], grade;
int total_points = 0, total_course_count = 0;
float gpa = 0.0;
ret1 = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env1);
if (!ret1) ret2 = SQLAllocHandle(SQL_HANDLE_DBC, env1, &con1); else return;
if (!ret2) ret3 = SQLConnect(con1, "dbs", SQL_NTS, "js", SQL_NTS, "xyz", SQL_NTS); else return;
if (!ret3) ret4 = SQLAllocHandle(SQL_HANDLE_STMT, con1, &stmt1); else return;
SQLPrepare(stmt1, "SELECT Student_number, Name FROM STUDENT WHERE Name = ?", SQL_NTS);
prompt("Enter student name: ", Sname);
SQLBindParameter(stmt1, 1, SQL_CHAR, &Sname, 15, &fetchlen1);
ret1 = SQLExecute(stmt1);
if (!ret1)
{
SQLBindCol(stmt1, 1, SQL_INTEGER, &number, 4, &fetchlen1);
SQLBindCol(stmt1, 2, SQL_CHAR, &name, 15, &fetchlen2);
ret2 = SQLFetch(stmt1);
if (!ret2)
{
/* A second statement handle is used for the inner query so that
re-preparing does not destroy the outer result. */
SQLAllocHandle(SQL_HANDLE_STMT, con1, &stmt2);
SQLPrepare(stmt2, "SELECT Grade FROM GRADE_REPORT WHERE Student_number = ?", SQL_NTS);
SQLBindParameter(stmt2, 1, SQL_INTEGER, &number, 4, &fetchlen1);
ret1 = SQLExecute(stmt2);
if (!ret1)
{
SQLBindCol(stmt2, 1, SQL_CHAR, &grade, 1, &fetchlen1);
ret2 = SQLFetch(stmt2);
while (!ret2)
{
switch (grade)
{
case 'A': total_points += 4; break;
case 'B': total_points += 3; break;
case 'C': total_points += 2; break;
case 'D': total_points += 1; break;
}
total_course_count++;
ret2 = SQLFetch(stmt2);
}
}
Step 2 of 4
if (total_course_count != 0) gpa = (float) total_points / total_course_count;
printf("Grade point average of student is %.2f\n", gpa);
}
Step 3 of 4
else printf("No student named %s matches\n", Sname);
}
}
Exercise 10.9 using SQL/CLI:
#include <sqlcli.h>
#include <stdio.h>
void printDueBookRecords() {
SQLHSTMT stmt1;
SQLHDBC con1;
SQLHENV env1;
SQLRETURN ret1, ret2, ret3, ret4;
char Book_id[21], Title[31], Borrower_name[31];
SQLINTEGER fetchlen1, fetchlen2, fetchlen3;
ret1 = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env1);
if (!ret1) ret2 = SQLAllocHandle(SQL_HANDLE_DBC, env1, &con1); else return;
if (!ret2) ret3 = SQLConnect(con1, "dbs", SQL_NTS, "js", SQL_NTS, "xyz", SQL_NTS); else return;
if (!ret3) ret4 = SQLAllocHandle(SQL_HANDLE_STMT, con1, &stmt1); else return;
/* A book that became overdue yesterday has a due date equal to yesterday. */
SQLPrepare(stmt1, "SELECT B.Book_id, B.Title, BW.Name "
"FROM BOOK B, BORROWER BW, BOOK_LOANS BL "
"WHERE BL.Due_date = CURRENT_DATE - 1 "
"AND BL.Card_no = BW.Card_no "
"AND BL.Book_id = B.Book_id", SQL_NTS);
ret1 = SQLExecute(stmt1);
if (!ret1)
{
SQLBindCol(stmt1, 1, SQL_CHAR, &Book_id, 20, &fetchlen1);
SQLBindCol(stmt1, 2, SQL_CHAR, &Title, 30, &fetchlen2);
Step 4 of 4
SQLBindCol(stmt1, 3, SQL_CHAR, &Borrower_name, 30, &fetchlen3);
ret2 = SQLFetch(stmt1);
while (!ret2)
{
printf("%s %s %s\n", Book_id, Title, Borrower_name);
ret2 = SQLFetch(stmt1);
}
}
}
Chapter 10, Problem 12E
Problem
Repeat Exercise 10.7 and 10.9, but use JDBC with Java as the host language.
Reference 10.7
Consider the database shown in Figure 1.2, whose schema is shown in Figure 2.1. Write a
program segment to read a student’s name and print his or her grade point average, assuming
that A = 4, B = 3, C = 2, and D = 1 points. Use embedded SQL with C as the host language.
Reference 10.9
Consider the library relational database schema in Figure 6.6. Write a program segment that
retrieves the list of books that became overdue yesterday and that prints the book title and
borrower name for each. Use embedded SQL with C as the host language.
Step-by-step solution
Step 1 of 2
Exercise 10.7 using JDBC:
import java.io.*;
import java.sql.*;
…..
class PrintGPA
{
public static void main(String args[])
throws SQLException, IOException {
try {
Class.forName("oracle.jdbc.driver.OracleDriver");
} catch (ClassNotFoundException x) {
System.out.println("Driver could not be loaded");
}
String dbacct, passwrd, sname;
int number, total_points = 0, total_course_count = 0;
double gpa = 0.0;
dbacct = readEntry("Enter database account: ");
passwrd = readEntry("Enter password: ");
Connection conn = DriverManager.getConnection("jdbc:oracle:oci8:" + dbacct + "/" + passwrd);
sname = readEntry("Enter student name: ");
String q = "SELECT Student_number FROM STUDENT WHERE Name = '" + sname + "'";
Statement s = conn.createStatement();
ResultSet r = s.executeQuery(q);
while (r.next())
{
number = r.getInt(1);
String t = "SELECT Grade FROM GRADE_REPORT WHERE Student_number = " + number;
Statement g = conn.createStatement();
ResultSet rs = g.executeQuery(t);
while (rs.next()) {
switch (rs.getString(1).charAt(0))
{
case 'A': total_points += 4; break;
case 'B': total_points += 3; break;
case 'C': total_points += 2; break;
case 'D': total_points += 1; break;
}
total_course_count++;
}
}
if (total_course_count != 0) gpa = (double) total_points / total_course_count;
System.out.println("Grade point average of student is " + gpa);
}
}
Step 2 of 2
Exercise 10.9 using JDBC:
import java.io.*;
import java.sql.*;
…..
class PrintOverdueBooks
{
public static void main(String args[])
throws SQLException, IOException {
try {
Class.forName("oracle.jdbc.driver.OracleDriver");
} catch (ClassNotFoundException x) {
System.out.println("Driver could not be loaded");
}
String dbacct, passwrd;
String bookId, bookTitle, borrowerName;
dbacct = readEntry("Enter database account: ");
passwrd = readEntry("Enter password: ");
Connection conn = DriverManager.getConnection("jdbc:oracle:oci8:" + dbacct + "/" + passwrd);
String q = "SELECT B.Book_id, B.Title, BW.Name "
+ "FROM BOOK B, BORROWER BW, BOOK_LOANS BL "
+ "WHERE BL.Due_date = CURRENT_DATE - 1 "
+ "AND BL.Card_no = BW.Card_no "
+ "AND BL.Book_id = B.Book_id";
Statement s = conn.createStatement();
ResultSet r = s.executeQuery(q);
while (r.next())
{
bookId = r.getString(1);
bookTitle = r.getString(2);
borrowerName = r.getString(3);
System.out.println("Book id: " + bookId + ", book title: " + bookTitle
+ ", borrower name: " + borrowerName);
}
}
}
Chapter 10, Problem 13E
Problem
Repeat Exercise 10.7, but write a function in SQL/PSM.
Reference 10.7
Consider the database shown in Figure 1.2, whose schema is shown in Figure 2.1. Write a
program segment to read a student’s name and print his or her grade point average, assuming
that A = 4, B = 3, C = 2, and D = 1 points. Use embedded SQL with C as the host language.
Step-by-step solution
Step 1 of 2
Consider the following SQL/PSM function to determine the average grade points of a student.
//Function PSM2:
1. CREATE FUNCTION Average_grad ( IN in_name CHAR(20) )
//Declare variables to store intermediate values
2. DECLARE total_avg INTEGER DEFAULT 0;
3. DECLARE std_no INTEGER;
4. DECLARE count INTEGER DEFAULT 0;
5. DECLARE final_avg FLOAT; DECLARE temp_grd CHAR(1);
//Query to find the student number for the user-entered student name
6. SELECT Student_number INTO std_no FROM STUDENT WHERE
Name = in_name;
//Declare a cursor to process the multiple rows returned by the query
7. DECLARE grd CURSOR FOR SELECT Grade FROM GRADE_REPORT WHERE
Student_number = std_no;
8. OPEN grd;
9. LOOP
10. FETCH grd INTO temp_grd;
11. EXIT WHEN grd%NOTFOUND;
12. count := count + 1;
//Use an IF-ELSEIF ladder to accumulate the student's total points
13. IF temp_grd = 'A' THEN total_avg := total_avg + 4;
14. ELSEIF temp_grd = 'B' THEN total_avg := total_avg + 3;
15. ELSEIF temp_grd = 'C' THEN total_avg := total_avg + 2;
16. ELSEIF temp_grd = 'D' THEN total_avg := total_avg + 1;
17. END IF;
18. END LOOP;
//Calculate the average
19. final_avg := total_avg / count;
//Display the student's average grade points
20. DBMS_OUTPUT.PUT_LINE('The average is: ' || final_avg);
Step 2 of 2
Explanation of the above function:
• First, a function Average_grad is created, which takes the student name as input.
• From line 2 to line 5, variables are declared to store intermediate values.
• The query in line 6 finds the student number for the user-entered student name.
• In line 7, a cursor is declared to process the multiple rows returned by the grade query.
• From line 8 to line 18, a loop counts the grade rows, and an IF-ELSEIF statement inside the loop
accumulates the student's total points.
• In line 19, the average is calculated.
• In line 20, DBMS_OUTPUT.PUT_LINE displays the average.
Chapter 10, Problem 14E
Problem
Create a function in PSM that computes the median salary for the EMPLOYEE table shown in
Figure 5.5.
Step-by-step solution
Step 1 of 2
Following is a function in Persistent Stored Modules (PSM) to calculate the median salary for
the EMPLOYEE table:
//Function PSM1:
0) CREATE FUNCTION Emp_Median_Sal(IN Salary INTEGER)
1) RETURNS INTEGER
2) DECLARE median_salary INTEGER;
3) SELECT MEDIAN(Salary) INTO median_salary
4) FROM EMPLOYEE;
5) RETURN median_salary;
Step 2 of 2
Explanation:
Line 0: CREATE FUNCTION is used to create a function. The name of the function
created is Emp_Median_Sal. It takes the salaries of the EMPLOYEE table
as input.
Line 1: RETURNS is used to return the median salary among the inputs.
Line 2: DECLARE is used to declare local variables. median_salary is a
variable declared to hold the value of median salary.
Line 3: MEDIAN(Salary) will give the median value among the salaries. INTO
clause will assign the value returned by MEDIAN(Salary) into local
variable median_salary.
Line 4: FROM is used to specify from which table the data is to be considered.
Line 5: RETURN is used to return the median_salary.
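Note that MEDIAN is not a standard SQL aggregate (Oracle provides it as an extension). Where it is missing, the median can be computed in the host program instead, sketched here with Python's sqlite3 and invented salaries:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (Ssn TEXT, Salary REAL)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
                 [("1", 30000), ("2", 40000), ("3", 25000),
                  ("4", 43000), ("5", 38000)])

def median_salary():
    # Fetch the salaries in sorted order and pick the middle value
    # (average of the two middle values for an even count).
    salaries = [s for (s,) in conn.execute(
        "SELECT Salary FROM EMPLOYEE ORDER BY Salary")]
    n = len(salaries)
    mid = n // 2
    if n % 2 == 1:
        return salaries[mid]
    return (salaries[mid - 1] + salaries[mid]) / 2

print(median_salary())  # 38000.0
```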
Chapter 14, Problem 1RQ
Problem
Discuss attribute semantics as an informal measure of goodness for a relation schema.
Step-by-step solution
Step 1 of 2
The semantics of a relation refers to the way the meaning of an attribute value in a tuple is
explained and interpreted.
Step 2 of 2
• The semantics of an attribute should be considered in such a way that they can be interpreted
easily.
• Once the semantics of an attribute are clear, it will be easy to interpret a relation.
• The relation that is easy to interpret will indeed result in a good schema design.
Thus, the semantics of attributes serve as an informal measure of goodness for a relation schema design.
Chapter 14, Problem 2RQ
Problem
Discuss insertion, deletion, and modification anomalies. Why are they considered bad? Illustrate
with examples.
Step-by-step solution
Step 1 of 6
Insertion anomaly refers to the situation where it is not possible to enter data for certain attributes into the database without also entering data for other attributes.
Deletion anomaly refers to the situation where data of certain attributes are lost as a result of deleting tuples.
Modification anomaly refers to the situation where a partial update of redundant data leads to inconsistency of data.
Step 2 of 6
Insertion, deletion and modification anomalies are considered bad due to the following reasons:
• It will be difficult to maintain consistency of data in the database.
• It leads to redundant data.
• It causes unnecessary updates of data.
• Memory space will be wasted at the storage level.
Step 3 of 6
Consider the following relation named Emp_Proj:
Insertion Anomalies:
• Assume that there is an employee E11 who is not yet working in a project. Then it is not
possible to enter details of employee E11 into the relation Emp_Proj.
• Similarly assume there is a project P7 with no employees assigned to it. Then it is not possible
to enter details of project P7 into the relation Emp_Proj.
• Therefore, it is possible to enter an employee's details into the relation Emp_Proj only if the employee is assigned to a project.
• Similarly, it is possible to enter the details of a project into the relation Emp_Proj only if at least one employee is assigned to it.
Step 4 of 6
Deletion Anomalies:
• Assume that an employee E07 has left the company, so it is necessary to delete employee E07's details from the relation Emp_Proj.
• If employee E07's details are deleted from the relation Emp_Proj, then the details of project P5 will also be lost.
Update anomalies:
• Assume that the location of project P1 is changed from Atlanta to New Jersey. Then the update
should be done at three places.
• If the update is reflected for two tuples and is not done for the third tuple, then inconsistency of
data occurs.
Step 5 of 6
In order to remove insertion, deletion and modification anomalies, decompose the relation
Emp_Proj into three relations as shown below:
Step 6 of 6
Insertion Anomalies:
• It is possible to enter the details of employee E11 into relation Employee even though he is not
yet working in a project.
• It is possible to enter the details of project P7 into relation Project even though there are no
employees assigned to it.
Deletion Anomalies:
• If employee E07's details are deleted from the relation Employee, the details of project P5 will still not be lost.
Update anomalies:
• If the location of project P1 is changed from Atlanta to New Jersey, then the update should be
done in relation Project at only one place.
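The effect of the decomposition can be checked with a small SQLite session. The table and column names below are assumptions, since the Emp_Proj figures are not reproduced here; the point is only that the decomposed design permits the inserts and the single-row update discussed above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Decomposed design (illustrative names; the original figures are not shown).
conn.executescript("""
CREATE TABLE Employee (Eno TEXT PRIMARY KEY, Ename TEXT);
CREATE TABLE Project  (Pno TEXT PRIMARY KEY, Pname TEXT, Plocation TEXT);
CREATE TABLE Works_On (Eno TEXT REFERENCES Employee,
                       Pno TEXT REFERENCES Project,
                       PRIMARY KEY (Eno, Pno));
""")

# No insertion anomaly: employee E11 needs no project, and project P7
# needs no employees.
conn.execute("INSERT INTO Employee VALUES ('E11', 'Smith')")
conn.execute("INSERT INTO Project VALUES ('P7', 'NewProduct', 'Atlanta')")

# No update anomaly: a project's location is stored in exactly one row,
# so changing it is a single-row update.
conn.execute("UPDATE Project SET Plocation = 'New Jersey' WHERE Pno = 'P7'")
print(conn.execute("SELECT Plocation FROM Project WHERE Pno = 'P7'").fetchone())
```

In the single Emp_Proj table, the first two inserts would be impossible without fabricating values for the missing project or employee attributes.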
Chapter 14, Problem 3RQ
Problem
Why should NULLs in a relation be avoided as much as possible? Discuss the problem of
spurious tuples and how we may prevent it.
Step-by-step solution
Step 1 of 4
NULL values should be avoided in a relation as much as possible for the following reasons:
• Memory space will be wasted at the storage level.
• Meaning and purpose of the attributes is not communicated well.
Step 2 of 4
• When aggregate operations such as SUM, AVG, etc. are performed on an attribute that has
null values, the result will be incorrect.
• When JOIN operation involves an attribute with null values, the result may be unpredictable.
• The NULL value has different meanings. It may be unknown, not applicable or absent.
Step 3 of 4
Spurious tuples are generated as the result of bad design or improper decomposition of the base
table.
• Spurious tuples are the tuples generated when a JOIN operation is performed on badly
designed relations. The result will have more tuples than the original set of tuples.
• The main problem with spurious tuples is that they are considered invalid as they do not appear
in the base tables.
Step 4 of 4
Spurious tuples can be avoided by taking care while designing relational schemas.
• The relations should be designed in such a way that when a JOIN operation is performed, the
attributes involved in the JOIN operation must be a primary key in one table and foreign key in
another table.
• While decomposing a base table into two tables, the tables must have a common attribute. The
common attribute must be primary key in one table and foreign key in another table.
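The spurious-tuple effect can be reproduced with a small SQLite session. The relation names follow the EMP_LOCS/EMP_PROJ1 example used later in this chapter, but the schemas and data values here are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Badly designed decomposition: the only shared attribute, Plocation, is
# neither a primary key nor a foreign key (data values are made up).
conn.executescript("""
CREATE TABLE EMP_LOCS  (Ename TEXT, Plocation TEXT);
CREATE TABLE EMP_PROJ1 (Ssn TEXT, Pnumber INTEGER, Plocation TEXT);
INSERT INTO EMP_LOCS  VALUES ('Smith', 'Bellaire'), ('Wong', 'Bellaire');
INSERT INTO EMP_PROJ1 VALUES ('123', 1, 'Bellaire'), ('456', 2, 'Bellaire');
""")

# NATURAL JOIN matches on Plocation alone, pairing every employee at a
# location with every project row at that location.
rows = conn.execute(
    "SELECT Ename, Ssn, Pnumber FROM EMP_LOCS NATURAL JOIN EMP_PROJ1"
).fetchall()
print(len(rows))  # 4 tuples, although only 2 genuine facts were stored
```

Two of the four result tuples pair an employee with a project that employee never worked on; they are spurious.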
Chapter 14, Problem 4RQ
Problem
State the informal guidelines for relation schema design that we discussed. Illustrate how
violation of these guidelines may be harmful.
Step-by-step solution
Step 1 of 1
Informal guidelines for relational schema design:
For designing a relational database schema, there are four informal measures of goodness:
(1) Clear semantics of the attributes.
(2) Reducing redundant information in tuples.
(3) Reducing NULL values in tuples.
(4) Disallowing the possibility of generating spurious tuples.
Violating these guidelines may be harmful in the following ways:
(1) Anomalies that cause redundant work to be done during insertion into and modification of a relation, and that may cause accidental loss of information during a deletion from a relation.
(2) Waste of storage space due to NULLs and the difficulty of performing selections, aggregation operations, and joins due to NULL values.
(3) Generation of invalid and spurious data during joins on improperly related base relations.
These problems can be detected without additional tools of analysis.
Chapter 14, Problem 5RQ
Problem
What is a functional dependency? What are the possible sources of the information that defines
the functional dependencies that hold among the attributes of a relation schema?
Step-by-step solution
Step 1 of 3
Functional dependency: A functional dependency describes the relationship between attributes in a table. A functional dependency X -> Y between two attributes X and Y in a relation R is said to exist if the value of X uniquely determines the value of Y.
Step 2 of 3
The functional dependency is a property of the semantics; that is, it represents a semantic association between the attributes of the relation schema R. The main use of a functional dependency is to describe the relation schema R by specifying constraints on it; the relation states that satisfy these constraints are called legal extensions.
Step 3 of 3
In the functional dependency X -> Y, the value of Y is determined by the value of X; that is, X determines Y.
Full functional dependency indicates that if A and B are attributes of the relation R, then B is fully functionally dependent on A but not on any proper subset of A.
Partial functional dependency indicates that if A and B are attributes of the relation R, then B is partially dependent on A if some attribute can be removed from A and the dependency still holds.
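Whether a candidate dependency X -> Y is violated by a particular relation state can be checked mechanically (a violation disproves the FD; its absence does not prove it, as the next question discusses). A minimal Python sketch, with made-up attribute names and data:

```python
def holds(rows, X, Y):
    """Return True if no two tuples agree on the attributes X while
    disagreeing on Y, i.e., the FD X -> Y is not violated by this state.

    rows: list of dicts mapping attribute name -> value.
    """
    seen = {}
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        if x_val in seen and seen[x_val] != y_val:
            return False  # same X value, different Y value: FD violated
        seen[x_val] = y_val
    return True

# Illustrative state (values invented for the example).
state = [
    {"Ssn": "123", "Ename": "Smith", "Pnumber": 1},
    {"Ssn": "123", "Ename": "Smith", "Pnumber": 2},
    {"Ssn": "456", "Ename": "Wong",  "Pnumber": 2},
]
print(holds(state, ["Ssn"], ["Ename"]))    # True: not violated in this state
print(holds(state, ["Ssn"], ["Pnumber"]))  # False: Ssn 123 maps to 1 and 2
```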
Chapter 14, Problem 6RQ
Problem
Why can we not infer a functional dependency automatically from a particular relation state?
Step-by-step solution
Step 1 of 1
Certain FDs can be specified without referring to a specific relation state, as a property of the attributes given their generally understood meaning. It is also possible that certain functional dependencies may cease to exist in the real world if the relationship changes. Some tuples may have values that agree with a supposed FD, while a newly inserted tuple may not. Since a functional dependency is a property of the relation schema R, and not of a particular legal relation state r of R, it is not possible to infer FDs from a particular relation state.
Chapter 14, Problem 7RQ
Problem
What does the term unnormalized relation refer to? How did the normal forms develop historically
from first normal form up to Boyce-Codd normal form?
Step-by-step solution
Step 1 of 1
An unnormalized relation refers to a relation that does not meet any normal form condition.
The normalization process, first proposed by Codd (1972), takes a relation schema through a series of tests to certify whether it satisfies a certain normal form. The process proceeds in a top-down fashion by evaluating each relation against the criteria for the normal forms and decomposing relations as necessary; it can thus be considered relational design by analysis.
Initially, Codd proposed three normal forms: 1NF, 2NF, and 3NF. A stronger definition of 3NF, called Boyce-Codd normal form (BCNF), was proposed later by Boyce and Codd. All these normal forms are based on a single analytical tool: the functional dependencies among the attributes of a relation.
1NF splits a relation schema into schemas that have atomic values as the domain of every attribute, so that no attribute value is a set of values. 2NF removes all partial dependencies of nonprime attributes A in R on the key and ensures that all nonprime attributes are fully functionally dependent on the key of R. 3NF removes all transitive dependencies on the key of R and ensures that no nonprime attribute is transitively dependent on the key.
Chapter 14, Problem 8RQ
Problem
Define first, second, and third normal forms when only primary keys are considered. How do the
general definitions of 2NF and 3NF, which consider all keys of a relation, differ from those that
consider only primary keys?
Step-by-step solution
Step 1 of 2
Definition of normal forms when only primary keys are considered
First Normal Form: It states that the domain of an attribute must include only atomic values and that the value of any attribute in a tuple must be a single value from the domain of that attribute. In other words, first normal form does not allow relations within relations as attribute values within tuples.
Second Normal Form: It is based on the concept of full functional dependency. A dependency X -> Y is a full functional dependency if removing any attribute A from X makes the dependency no longer hold; otherwise it is a partial dependency.
A relation schema R is said to be in 2NF if every nonprime attribute A in R is fully functionally dependent on the primary key of R.
Third Normal Form: It is based on the concept of transitive dependency. A functional dependency X -> Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R, and both X -> Z and Z -> Y hold.
A relation schema is said to be in third normal form if it satisfies second normal form and no nonprime attribute of R is transitively dependent on the primary key.
Step 2 of 2
The general definitions of 2NF and 3NF differ from the primary-key-based definitions because the general definitions take candidate keys into account as well. Under the general definition of a prime attribute, an attribute that is part of any candidate key is considered prime. Partial and full functional dependencies and transitive dependencies are considered with respect to all candidate keys of a relation.
General definition of 2NF: A relation schema R is in second normal form if every nonprime attribute A in R is not partially dependent on any key of R.
General definition of 3NF: A relation schema is said to be in 3NF if, whenever a nontrivial functional dependency X -> A holds in R, either (a) X is a superkey of R, or (b) A is a prime attribute of R.
A functional dependency X -> Y is trivial if X is a superset of Y; otherwise the dependency is nontrivial.
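The general 3NF test can be expressed directly in code. The sketch below is illustrative; the closure helper implements the standard attribute-closure algorithm, and the LOTS-like relation at the end is an assumed example, not a figure from the text:

```python
def closure(attrs, fds):
    """Attribute closure under a set of FDs given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def in_3nf(R, fds, candidate_keys):
    """General 3NF test: every nontrivial FD X -> A must have X a superkey
    of R or A a prime attribute (a member of some candidate key)."""
    prime = {a for key in candidate_keys for a in key}
    for lhs, rhs in fds:
        for a in set(rhs) - set(lhs):          # nontrivial part only
            x_is_superkey = closure(lhs, fds) >= set(R)
            if not x_is_superkey and a not in prime:
                return False
    return True

# LOTS-like example (attribute and key names are assumptions):
R = ["County", "Lot#", "PropertyId", "Area"]
fds = [(["PropertyId"], ["County", "Lot#", "Area"]),
       (["County", "Lot#"], ["PropertyId", "Area"])]
print(in_3nf(R, fds, [["PropertyId"], ["County", "Lot#"]]))  # True
```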
Chapter 14, Problem 9RQ
Problem
What undesirable dependencies are avoided when a relation is in 2NF?
Step-by-step solution
Step 1 of 1
2NF removes all partial dependencies of nonprime attributes A in R on the key and ensures that all nonprime attributes are fully functionally dependent on the key of R.
Chapter 14, Problem 10RQ
Problem
What undesirable dependencies are avoided when a relation is in 3NF?
Step-by-step solution
Step 1 of 1
3NF removes all transitive dependencies on the key of R and ensures that no nonprime attribute is transitively dependent on the key.
Chapter 14, Problem 11RQ
Problem
In what way do the generalized definitions of 2NF and 3NF extend the definitions beyond primary
keys?
Step-by-step solution
Step 1 of 1
The generalized definitions of second normal form and third normal form extend beyond primary
key by taking into consideration all the candidate keys of a relation.
• These definitions do not depend/revolve around only the primary key of a relation.
• These definitions take into consideration all the attributes that can be part of a possible key for a relation.
• These definitions also consider the partial and transitive dependencies on the candidate keys.
Chapter 14, Problem 12RQ
Problem
Define Boyce-Codd normal form. How does it differ from 3NF? Why is it considered a stronger
form of 3NF?
Step-by-step solution
Step 1 of 3
Boyce-Codd Normal Form (BCNF):
• A relation is said to be in BCNF if and only if every determinant is a candidate key.
• In the functional dependency X -> Y, if the attribute Y is fully functionally dependent on X, then X is said to be a determinant.
• A determinant can be composite or single attribute.
• BCNF is a stronger form of third normal form (3NF).
• A relation that is in BCNF will also be in third normal form.
Step 2 of 3
Following are the differences between 3NF and BCNF:
• BCNF is a stronger normal form than 3NF; equivalently, 3NF is a weaker normal form than BCNF.
• In 3NF, a nontrivial functional dependency X -> Y whose left-hand side is not a superkey is still allowed when Y is a prime attribute; BCNF grants no such exception, whether or not Y is prime.
• BCNF does not allow non-key attributes as determinants, whereas 3NF allows them.
Step 3 of 3
BCNF is a stronger form of third normal form (3NF).
• In BCNF, every determinant must be a candidate key.
• BCNF does not allow some dependencies which are allowed in 3NF.
• A relation that is in BCNF will also be in third normal form.
• A relation that is in third normal form need not be in BCNF.
Chapter 14, Problem 13RQ
Problem
What is multivalued dependency? When does it arise?
Step-by-step solution
Step 1 of 3
Multivalued Dependency:
• It is a constraint between two sets of attributes in a relation.
• An MVD X ->> Y requires that the set of Y values associated with each X value be independent of the values of the remaining attributes.
• Whenever certain tuples are present in the relation, it requires certain other tuples to be present as well.
Step 2 of 3
Occurrence of Multivalued dependency:
• A multivalued dependency arises when a relation has a constraint that cannot be specified as a functional dependency.
• It also occurs when two or more independent multivalued attributes are represented in the same table.
Step 3 of 3
Example of the occurrence of multivalued dependency:
The Employee table has two multivalued dependencies listed below.
Ename ->> Pname
Ename ->> Dname
Here Ename indicates employee name, Pname indicates project name, and Dname indicates
dependent’s name.
This is a multivalued dependency because; an employee can work in more than one project and
can have more than one dependent.
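An MVD X ->> Y can be tested against a relation state by checking, for each X value, that the combinations of Y values and remaining-attribute values form a full cross product. A sketch with made-up data for the Ename ->> Pname example:

```python
from itertools import product

def mvd_holds(rows, X, Y):
    """Check X ->> Y in a relation state: for each X value, the (Y, Z)
    combinations present must be the full cross product of the Y values
    and Z values seen with that X value (Z = all remaining attributes)."""
    attrs = rows[0].keys()
    Z = [a for a in attrs if a not in X and a not in Y]
    groups = {}
    for t in rows:
        key = tuple(t[a] for a in X)
        groups.setdefault(key, set()).add(
            (tuple(t[a] for a in Y), tuple(t[a] for a in Z)))
    for g in groups.values():
        ys = {y for y, z in g}
        zs = {z for y, z in g}
        if g != set(product(ys, zs)):
            return False
    return True

# Illustrative state: Smith works on projects X and Y, dependents John, Anna.
emp = [
    {"Ename": "Smith", "Pname": "X", "Dname": "John"},
    {"Ename": "Smith", "Pname": "Y", "Dname": "John"},
    {"Ename": "Smith", "Pname": "X", "Dname": "Anna"},
    {"Ename": "Smith", "Pname": "Y", "Dname": "Anna"},
]
print(mvd_holds(emp, ["Ename"], ["Pname"]))  # True: all combinations present
```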
Chapter 14, Problem 14RQ
Problem
Does a relation with two or more columns always have an MVD? Show with an example.
Step-by-step solution
Step 1 of 2
In a relation, when one attribute has multiple values referring to another attribute, then it indicates
that there is a multivalued dependency (MVD) in a relation.
An example of a relation with three attributes that have an MVD is as follows:
In the above relation, there exist two MVDs:
In order to remove the MVDs, decompose the relation into two relations as shown below:
Step 2 of 2
A relation with two or more columns will not always have a multivalued dependency (MVD).
An example of a relation with two attributes that does not have an MVD is as follows:
An example of a relation with three attributes that does not have an MVD is as follows:
Chapter 14, Problem 15RQ
Problem
Define fourth normal form. When is it violated? When is it typically applicable?
Step-by-step solution
Step 1 of 2
Violation of Fourth normal form:
• The fourth normal form is violated when a relation has a nontrivial multivalued dependency X ->> Y in which X is not a superkey; such multivalued dependencies are used to identify and decompose the relations in the relational schema R.
Step 2 of 2
Conditions for applying fourth normal form:
• A relation schema R is in fourth normal form (4NF) if, for every nontrivial multivalued dependency X ->> Y in R, X is a superkey of R (this implies BCNF).
• 4NF is typically applicable when two or more independent multivalued facts about the same entity are recorded in one relation.
Chapter 14, Problem 16RQ
Problem
Define join dependency and fifth normal form.
Step-by-step solution
Step 1 of 2
Join dependency:
• It is a constraint specified on a relation schema R, denoted by JD(R1, R2, ..., Rn).
• A join dependency is said to be a trivial join dependency if one of the relation schemas Ri in JD(R1, R2, ..., Rn) is equal to R itself.
• It is a constraint on the set of legal relation states over a database schema.
Step 2 of 2
Fifth normal form:
• It is a database normalization technique used to reduce redundancy in relational databases recording multivalued facts.
• The relation must already satisfy fourth normal form.
• It is also called project-join normal form because, for a relation in this form, any decomposition specified by a join dependency can be rejoined without loss.
• The fifth normal form is defined in terms of join dependencies.
Chapter 14, Problem 17RQ
Problem
Why is 5NF also called project-join normal form (PJNF)?
Step-by-step solution
Step 1 of 2
Fifth Normal Form (5NF):
• A relation schema is said to be in fifth normal form if it is in fourth normal form and every nontrivial join dependency, given the set of functional and join dependencies, is implied by its candidate keys.
• The fifth normal form is defined in terms of join dependencies.
Because any decomposition of the relational schema R specified by a join dependency can be reconstructed losslessly by projecting and joining, 5NF is also called project-join normal form (PJNF).
Step 2 of 2
Example of project-join normal form:
Consider a supplier (s) that supplies parts (p) to projects (j).
The relationships are derived as follows:
• Supplier (s) supplies part (p).
• Project (j) uses part (p).
• Supplier (s) supplies at least one part (p) to project (j).
This exhibits a join dependency in the relation, which is decomposed into the three relations shown above, each of which is in 5NF.
Chapter 14, Problem 18RQ
Problem
Why do practical database designs typically aim for BCNF and not aim for higher normal forms?
Step-by-step solution
Step 1 of 1
Boyce-Codd normal form (BCNF): A relation schema R is in BCNF if, whenever a nontrivial functional dependency X -> A holds in R, X is a superkey of the relational schema R.
Practical database designers prefer BCNF rather than going for higher normal forms for the following reasons:
• It is a simpler (and stricter) form of 3NF (third normal form).
• It reduces redundancy (duplication) of information across thousands of tuples.
• The data model can be easily understood using the BCNF normalization technique.
• It also improves the performance of the database when compared to the lower normal forms.
• It is stronger than 3NF, because a relation in BCNF is also in 3NF, but not vice versa.
• In most practical cases, the multivalued and join dependencies addressed by the higher normal forms (4NF and 5NF) are not present.
The above points explain why database designers in practice use BCNF rather than other, higher normal forms, which improves the consistency, performance, and quality of the database.
Chapter 14, Problem 19E
Problem
Suppose that we have the following requirements for a university database that is used to keep
track of students’ transcripts:
a. The university keeps track of each student’s name (Sname), student number (Snum), Social
Security number (Ssn), current address (Sc_addr) and phone (Sc_phone), permanent address
(Sp_addr) and phone (Sp_phone), birth date (Bdate), sex (Sex), class (Class) (‘freshman’,
‘sophomore’, …, ‘graduate’), major department (Major_code), minor department (Minor_code) (if
any), and degree program (Prog) (‘b.a.’, ‘b.s.’, ..., ‘ph.d.’). Both Ssn and student number have
unique values for each student.
b. Each department is described by a name (Dname), department code (Dcode), office number
(Doffice), office phone (Dphone), and college (Dcollege). Both name and code have unique
values for each department.
c. Each course has a course name (Cname), description (Cdesc), course number (Cnum),
number of semester hours (Credit), level (Level), and offering department (Cdept). The course
number is unique for each course.
d. Each section has an instructor (Iname), semester (Semester), year (Year), course
(Sec_course), and section number (Sec_num). The section number distinguishes different
sections of the same course that are taught during the same semester/year; its values are 1,2, 3,
..., up to the total number of sections taught during each semester.
e. A grade record refers to a student (Ssn), a particular section, and a grade (Grade).
Design a relational database schema for this database application. First show all the functional
dependencies that should hold among the attributes. Then design relation schemas for the
database that are each in 3NF or BCNF. Specify the key attributes of each relation. Note any
unspecified requirements, and make appropriate assumptions to render the specification
complete.
Step-by-step solution
Step 1 of 4
Functional Dependency:
Functional dependency exists when one attribute in a relation uniquely determines another attribute. A functional dependency is represented as X -> Y, where X and Y can be composite.
The functional dependencies from the given information are as follows:
Step 2 of 4
From the functional dependencies FD 1 and FD 2, the relation STUDENT can be defined. Either
Ssn or Snum can be primary key.
From the functional dependencies FD 3 and FD 4, the relation DEPARTMENT can be defined.
Either Dname or Dcode can be primary key.
From the functional dependency FD 5, the relation COURSE can be defined. Cnum is the primary key.
From the functional dependency FD 6, the relation SECTION can be defined. {Sec_num, Sec_course, Semester, Year} will be the composite primary key.
From the functional dependencies FD 7 and FD 8, the relation GRADE can be defined. {Ssn, Sec_course, Semester, Year} will be the composite primary key.
Step 3 of 4
The relations that are in third normal form are as follows:
Explanation:
• In the STUDENT relation, either Ssn or Snum can be the primary key. Either key can be used to retrieve data from the STUDENT table.
• In the DEPARTMENT relation, either Dname or Dcode can be the primary key. Either key can be used to retrieve data from the DEPARTMENT table.
• In COURSE table, Cnum is the primary key.
• The primary key for the SECTION table is {Sec_num, Sec_course, Semester, Year} which is a
composite primary key.
• The primary key for the GRADE table is {Ssn, Sec_course, Semester, Year} which is a
composite primary key.
Step 4 of 4
The relational schema is as follows:
Chapter 14, Problem 20E
Problem
What update anomalies occur in the EMP_PROJ and EMP_DEPT relations of Figures 14.3 and
14.4?
Step-by-step solution
Step 1 of 2
In EMP_PROJ, the partial dependencies can cause anomalies, that is:
{SSN} -> {ENAME} and {PNUMBER} -> {PNAME, PLOCATION}
For example, suppose a PROJECT temporarily has no EMPLOYEEs working on it. Its information (PNAME, PNUMBER, PLOCATION) will not be represented in the database once the last EMPLOYEE working on it is removed, and a new PROJECT cannot be added unless at least one EMPLOYEE is assigned to work on it.
Inserting a new tuple relating an existing EMPLOYEE to an existing PROJECT requires checking both partial dependencies; for example, if a different value is entered for PLOCATION than the values in other tuples with the same value for PNUMBER, we get an update anomaly. Similar comments apply to EMPLOYEE information. The reason is that EMP_PROJ represents the relationship between EMPLOYEEs and PROJECTs, and at the same time represents information concerning the EMPLOYEE and PROJECT entities.
Step 2 of 2
In EMP_DEPT, the transitive dependency can cause anomalies. That is
{SSN}->{DNUMBER}->{DNAME, DMGRSSN}
For example, if a DEPARTMENT temporarily has no EMPLOYEEs working for it, its information (DNAME, DNUMBER, DMGRSSN) will not be represented in the database when the last EMPLOYEE working in it is removed. A new DEPARTMENT cannot be added unless at least one EMPLOYEE is assigned to work in it.
Inserting a new tuple relating a new EMPLOYEE to an existing DEPARTMENT requires checking
the transitive dependencies; for example, if a different value is entered for DMGRSSN than those
values in other tuples with the same value for DNUMBER, we get an update anomaly. The
reason is that EMP_DEPT represents the relationship between EMPLOYEEs and
DEPARTMENTs, and at the same time represents information concerning EMPLOYEE and
DEPARTMENT entities.
Chapter 14, Problem 21E
Problem
In what normal form is the LOTS relation schema in Figure 14.12(a) with respect to the restrictive
interpretations of normal form that take only the primary key into account? Would it be in the
same normal form if the general definitions of normal form were used?
Step-by-step solution
Step 1 of 1
With respect to the restrictive interpretation of normal forms, the LOTS relation schema is in 2NF, since there are no partial dependencies on the primary key. However, it is not in 3NF, since the following two transitive dependencies on the primary key exist:
PROPERTY_ID# -> COUNTY_NAME -> TAX_RATE, and
PROPERTY_ID# -> AREA -> PRICE.
Now, if we take all keys into account and use the general definitions of 2NF and 3NF, the LOTS relation schema will only be in 1NF, because there is a partial dependency COUNTY_NAME -> TAX_RATE on the candidate key {COUNTY_NAME, LOT#}, which violates 2NF.
Chapter 14, Problem 22E
Problem
Prove that any relation schema with two attributes is in BCNF.
Step-by-step solution
Step 1 of 2
BCNF:
• A relation R is said to be in BCNF if, for every functional dependency (FD) of the form a -> b that holds in R, either a -> b is a trivial FD or {a} is a superkey of the relation R.
Step 2 of 2
Take the relation schema R = {a, b} with two attributes. The only possible nontrivial FDs are
{a} -> {b} and {b} -> {a}.
The functional dependencies fall into the following cases:
Case 1: No nontrivial FD holds in R.
In this case, the key is {a, b} and the relation satisfies BCNF.
Case 2: Only {a} -> {b} holds.
In this case, the key is {a} and the relation satisfies BCNF.
Case 3: Only {b} -> {a} holds.
In this case, the key is {b} and the relation satisfies BCNF.
Case 4: Both {a} -> {b} and {b} -> {a} hold.
In this case, there are two keys, {a} and {b}, and the relation satisfies BCNF.
Hence, any relation with two attributes is in BCNF.
Chapter 14, Problem 23E
Problem
Why do spurious tuples occur in the result of joining the EMP_PROJ1 and EMP_ LOCS relations
in Figure 14.5 (result shown in Figure 14.6)?
Step-by-step solution
Step 1 of 1
Spurious tuples are tuples that are not valid. Spurious tuples occur in the result of joining the EMP_PROJ1 and EMP_LOCS relations because the natural join is based on the common attribute Plocation.
• In EMP_LOCS, the primary key is {Ename, Plocation}.
• In EMP_PROJ1, the primary key is {Ssn, Pnumber}.
• The attribute Plocation is not a primary key or a foreign key in the relations EMP_PROJ1 and
EMP_LOCS.
• As Plocation is not a primary key or a foreign key in the relations EMP_PROJ1 and
EMP_LOCS, it resulted in spurious tuples.
Chapter 14, Problem 24E
Problem
Consider the universal relation R = {A, B, C, D, E, F, G, H, I, J} and the set of functional
dependencies F = {{A, B}→{C}, {A}→{D, E}, {B}→{F}, {F}→{G, H}, {D}→{I, J}}. What is the key for
R? Decompose R into 2NF and then 3NF relations.
Step-by-step solution
Step 1 of 1
Let R = {A, B, C, D, E, F, G, H, I, J} and the set of functional dependencies
F = { {A, B} -> {C}, {A} -> {D, E}, {B} -> {F}, {F} -> {G, H}, {D} -> {I, J} }
A minimal set of attributes whose closure includes all the attributes in R is a key. Since the closure of {A, B} is {A, B}+ = R, one key of R is {A, B}.
Decompose R into 2NF and then 3NF:
To normalize R intuitively into 2NF and then 3NF, we may follow the steps below.
Step 1:
Identify partial dependencies that may violate 2NF. These involve attributes that are functionally dependent on a part of the key, {A} or {B}, alone.
Now we can calculate the closures {A}+ and {B}+ to determine the partially dependent attributes:
{A}+ = {A, D, E, I, J}. Hence {A} -> {D, E, I, J} ({A} -> {A} is a trivial dependency).
{B}+ = {B, F, G, H}. Hence {B} -> {F, G, H} ({B} -> {B} is a trivial dependency).
For normalizing into 2NF, we may remove the attributes that are functionally dependent on part of
the key (A or B) from R and place them in separate relations R1 and R2, along with the part of
the key they depend on (A or B), which are copied into each of these relations but also remains
in the original relation, which we call R3 below:
R1 = {A, D, E, I, J}, R2 = {B, F, G, H}, R3 = {A, B, C}
The keys for R1, R2, and R3 are {A}, {B}, and {A, B}, respectively. Next, we look for transitive dependencies in R1, R2, and R3.
The relation R1 has the transitive dependency {A} -> {D} -> {I, J}, so we remove the transitively
dependent attributes {I, J} from R1 into a relation R11 and copy the attribute D they are
dependent on into R11. The remaining attributes are kept in a relation R12. Hence, R1 is
decomposed into R11 and R12 as follows:
R11 = {D, I, J}, R12 = {A, D, E}
The relation R2 is similarly decomposed into R21 and R22 based on the transitive dependency
{B} -> {F} -> {G, H}:
R21 = {F, G, H}, R22 = {B, F}
The final set of relations in 3NF are {R11, R12, R21, R22, R3}
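The closures used in this step can be computed mechanically with the standard attribute-closure algorithm; a short sketch over the given F (single-letter attributes written as characters of a string):

```python
def closure(attrs, fds):
    """Closure of a set of attributes under FDs given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

R = set("ABCDEFGHIJ")
F = [("AB", "C"), ("A", "DE"), ("B", "F"), ("F", "GH"), ("D", "IJ")]

print(sorted(closure("AB", F)))  # all of A..J, so {A, B} is a key
print(sorted(closure("A", F)))   # ['A', 'D', 'E', 'I', 'J']: partial dependency on A
print(sorted(closure("B", F)))   # ['B', 'F', 'G', 'H']: partial dependency on B
```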
Chapter 14, Problem 25E
Problem
Repeat Exercise for the following different set of functional dependencies G = {{A, B}, → {C}, {B,
D}→ {E, F}, {A, D}→{G, H}, {A}→{I},{H} → {J}}.
Exercise
Consider the universal relation R = {A, B, C, D, E, F, G, H, I, J} and the set of functional
dependencies F = {{A, B}→{C}, {A}→{D, E}, {B}→{F}, {F}→{G, H}, {D}→{I, J}}. What is the key for
R? Decompose R into 2NF and then 3NF relations.
Step-by-step solution
Step 1 of 6
The relation R={ A, B, C, D, E, F, G, H, I, J}
The set of functional dependencies are as follows:
{A, B} -> {C}
{B, D} -> {E, F}
{A, D} -> {G, H}
{A} -> {I}
{H} -> {J}
Step 1: Find the closure of single attributes:
{A}+ = {A, I}
{B}+ = {B}
{C}+ = {C}
{D}+ = {D}
{E}+ = {E}
{F}+ = {F}
{G}+ = {G}
{H}+ = {H, J}
{I}+ = {I}
{J}+ = {J}
From the above closures of single attributes, it is clear that the closure of any single attribute
does not represent relation R. So, no single attribute forms the key for the relation R.
Step 2 of 6
Step 2: Find the closure of pairs of attributes that are in the set of functional
dependencies.
The closure of {A, B} is as shown below:
From the functional dependencies {A, B} -> {C} and {A} -> {I},
{A, B}+ = {A, B, C, I}
The closure of {B, D} is as shown below:
From the functional dependency {B, D} -> {E, F},
{B, D}+ = {B, D, E, F}
The closure of {A, D} is as shown below:
From the functional dependencies {A, D} -> {G, H}, {A} -> {I} and {H} -> {J},
{A, D}+ = {A, D, G, H, I, J}
From the above closures of pairs of attributes, it is clear that none of these closures represents the relation R. So, no such pair of attributes forms the key for the relation.
Step 3 of 6
Step 3: Find the closure of union of the three pairs of attributes that are in the set of
functional dependencies.
The closure of {A, B, D} is as shown below:
From the functional dependencies {A, B} → {C}, {B, D} → {E, F} and {A, D} → {G, H},
{A, B, D}+ = {A, B, C, D, E, F, G, H}
From the functional dependency {A} → {I}, the attribute I is added to {A, B, D}+.
Hence, {A, B, D}+ = {A, B, C, D, E, F, G, H, I}
From the functional dependency {H} → {J}, the attribute J is added to {A, B, D}+.
Hence, {A, B, D}+ = {A, B, C, D, E, F, G, H, I, J}
The closure of {A, B, D} represents relation R.
Hence, the key for relation R is {A, B, D}.
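The closures computed by hand above can be checked mechanically. The following is an illustrative sketch, not part of the original solution; the helper name `closure` and the representation of FDs as (left-hand side, right-hand side) pairs are choices of this example.

```python
def closure(attrs, fds):
    """Closure of the attribute set `attrs` under the FDs in `fds`.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If lhs is already inside the closure, rhs joins it too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# The set G of functional dependencies from this exercise.
G = [({'A', 'B'}, {'C'}), ({'B', 'D'}, {'E', 'F'}),
     ({'A', 'D'}, {'G', 'H'}), ({'A'}, {'I'}), ({'H'}, {'J'})]

R = set('ABCDEFGHIJ')
print(sorted(closure({'A', 'B'}, G)))      # ['A', 'B', 'C', 'I']
print(closure({'A', 'B', 'D'}, G) == R)    # True: {A, B, D} is a key of R
```

Running the same function on the single attributes and pairs reproduces every closure listed in Steps 1 and 2.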
Comment
Step 4 of 6
Decomposing the relation R into second normal form (2NF):
According to the second normal form, each non-key attribute must depend only on primary key.
• The key for relation R is {A, B, D}.
• {A} is a partial key that functionally determines the attribute I.
• {A, B} is a partial key that functionally determines the attribute C.
• {B, D} is a partial key that functionally determines the attributes E and F.
• {A, D} is a partial key that functionally determines the attributes G and H.
So, decompose the relation R into the following relations.
R1 = {A, I}
The key for R1 is {A}.
R2 = {A, B, C}
The key for R2 is {A, B}.
R3 = {B, D, E, F}
The key for R3 is {B, D}.
R4 = {A, B, D}
The key for R4 is {A, B, D}.
R5 = {A, D, G, H, J}
The key for R5 is {A, D}.
The relations R1, R2, R3, R4, and R5 are in second normal form.
Comment
Step 5 of 6
Decomposing the relation R into third normal form (3NF):
According to the third normal form, the relation must be in second normal form and any non-key
attribute should not describe any non-key attribute.
• H is a non-key attribute that functionally determines the attribute J.
So, decompose the relation R5 into the following relations.
R6 = {A, D, G, H}
The key for R6 is {A, D}.
R7 = {H, J}
The key for R7 is {H}.
Comment
Step 6 of 6
The final set of relations that are in third normal form is as follows:
R1 = {A, I}
R2 = {A, B, C}
R3 = {B, D, E, F}
R4 = {A, B, D}
R6 = {A, D, G, H}
R7 = {H, J}
Comment
Chapter 14, Problem 26E
Problem
Consider the following relation:
A  | B  | C  | TUPLE#
10 | b1 | c1 | 1
10 | b2 | c2 | 2
11 | b4 | c1 | 3
12 | b3 | c4 | 4
13 | b1 | c1 | 5
14 | b3 | c4 | 6
a. Given the previous extension (state), which of the following dependencies may hold in the
above relation? If the dependency cannot hold, explain why by specifying the tuples that cause
the violation.
i. A → B,
ii. B → C,
iii. C → B,
iv. B → A,
v. C → A
b. Does the above relation have a potential candidate key? If it does, what is it? If it does not,
why not?
Step-by-step solution
Step 1 of 2
a)
i. A → B does not hold in the current relation state: tuples 1 and 2 agree on A = 10 but have
different B values (b1 and b2).
ii. B → C may hold in the current relation state: every B value is paired with a single C value.
iii. C → B does not hold: tuples 1 and 3 agree on C = c1 but have different B values (b1 and b4).
iv. B → A does not hold: tuples 1 and 5 agree on B = b1 but have different A values (10 and 13);
likewise tuples 4 and 6 agree on B = b3 but differ on A.
v. C → A does not hold: tuples 1, 3, and 5 agree on C = c1 but have different A values (10, 11,
and 13).
Comment
Step 2 of 2
b) If the value of the attribute TUPLE# is guaranteed to remain distinct for every tuple in the
relation, TUPLE# can act as a candidate key.
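Part (a) can also be verified programmatically by checking, for each dependency, whether two tuples agree on the left-hand side yet differ on the right. This is an illustrative sketch; the function name `fd_holds` and the dict-based row encoding are assumptions of this example.

```python
def fd_holds(rows, x, y):
    """True if no two rows agree on attributes x yet differ on y."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in x)
        val = tuple(row[a] for a in y)
        if key in seen and seen[key] != val:
            return False  # a pair of tuples violates X -> Y
        seen[key] = val
    return True

# The extension from the problem; TUPLE# is omitted since the FDs
# under test involve only A, B, and C.
rows = [
    {'A': 10, 'B': 'b1', 'C': 'c1'},
    {'A': 10, 'B': 'b2', 'C': 'c2'},
    {'A': 11, 'B': 'b4', 'C': 'c1'},
    {'A': 12, 'B': 'b3', 'C': 'c4'},
    {'A': 13, 'B': 'b1', 'C': 'c1'},
    {'A': 14, 'B': 'b3', 'C': 'c4'},
]
print(fd_holds(rows, ['A'], ['B']))  # False: tuples 1 and 2 agree on A = 10
print(fd_holds(rows, ['B'], ['C']))  # True: B -> C may hold in this state
```

Note that such a check can only refute an FD; a state that satisfies it does not prove the FD holds in general.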
Comment
Chapter 14, Problem 27E
Problem
Consider a relation R(A, B, C, D, E) with the following dependencies:
AB → C, CD → E, DE→ B
Is AB a candidate key of this relation? If not, is ABD? Explain your answer.
Step-by-step solution
Step 1 of 3
A candidate key is a minimal attribute or combination of attributes in a relation that uniquely
identifies each tuple of the relation.
The candidate key is checked using the closure property of the set and the functional
dependencies of the given relation.
Comment
Step 2 of 3
Consider the given relation R (A, B, C, D, E) and the following function dependencies:
AB → C, CD → E, DE → B
To check whether AB is a candidate key of the given relation R, find the closure of AB as shown
below:
{A, B}+ = {A, B, C} (only AB → C applies)
Since all the attributes of the relation R cannot be identified using AB, AB is not a candidate
key for the given relation R.
Comment
Step 3 of 3
To check whether ABD is a candidate key of the given relation R, find the closure of ABD as
shown below:
{A, B, D}+ = {A, B, C, D, E} (AB → C adds C, and then CD → E adds E)
Since all the attributes of the relation R can be identified using ABD, ABD is a candidate key for
the given relation R.
Comment
Chapter 14, Problem 28E
Problem
Consider the relation R, which has attributes that hold schedules of courses and sections at a
university; R = {Course_no, Sec_no, Offering_dept, Credit_hours, Course_level, Instructor_ssn,
Semester, Year, Days_hours, Room_no, No_of_students}. Suppose that the following functional
dependencies hold on R:
{Course_no} → {Offering_dept, Credit_hours, Course_level}
{Course_no, Sec_no, Semester, Year} → {Days_hours, Room_no, No_of_students, Instructor_ssn}
{Room_no, Days_hours, Semester, Year} → {Instructor_ssn, Course_no, Sec_no}
Try to determine which sets of attributes form keys of R. How would you normalize this relation?
Step-by-step solution
Step 1 of 5
Consider the relation R = {Course_no, Sec_no, Offering_dept, Credit_hours, Course_level,
Instructor_ssn, Semester, Year, Days_hours, Room_no, No_of_students} with the functional
dependencies:
FD1: {Course_no} → {Offering_dept, Credit_hours, Course_level}
FD2: {Course_no, Sec_no, Semester, Year} → {Days_hours, Room_no, No_of_students, Instructor_ssn}
FD3: {Room_no, Days_hours, Semester, Year} → {Instructor_ssn, Course_no, Sec_no}
Comment
Step 2 of 5
The closure of Course_no is as shown below:
{Course_no}+ = {Course_no, Offering_dept, Credit_hours, Course_level}
From FD1, the attributes Offering_dept, Credit_hours, and Course_level are added to the closure
of Course_no, as Course_no functionally determines them. Since this closure does not contain
all the attributes of R, Course_no alone is not a key.
Comment
Step 3 of 5
The closure of {Course_no, Sec_no, Semester, Year} is as shown below:
{Course_no, Sec_no, Semester, Year}+ = R (FD2 adds Days_hours, Room_no, No_of_students,
and Instructor_ssn; FD1 then adds Offering_dept, Credit_hours, and Course_level). Hence,
{Course_no, Sec_no, Semester, Year} is a key of R.
Comment
Step 4 of 5
The closure of {Room_no, Days_hours, Semester, Year} is as shown below:
{Room_no, Days_hours, Semester, Year}+ = R (FD3 adds Instructor_ssn, Course_no, and
Sec_no; FD1 and FD2 then add the remaining attributes). Hence, {Room_no, Days_hours,
Semester, Year} is also a key of R.
Comment
Step 5 of 5
To normalize R, remove the partial dependency FD1 (Course_no, a proper subset of the key
{Course_no, Sec_no, Semester, Year}, determines Offering_dept, Credit_hours, and
Course_level) by decomposing R into:
R1 = {Course_no, Offering_dept, Credit_hours, Course_level}
R2 = {Course_no, Sec_no, Semester, Year, Days_hours, Room_no, No_of_students, Instructor_ssn}
Both R1 and R2 are in 3NF; in R2, the left-hand side of FD3 is itself a candidate key.
Comment
Chapter 14, Problem 29E
Problem
Consider the following relations for an order-processing application database at ABC, Inc.
ORDER (O#, Odate, Cust#, Total_amount)
ORDER_ITEM (O#, I#, Qty_ordered, Total_price, Discount%)
Assume that each item has a different discount. The Total_price refers to one item, Odate is the
date on which the order was placed, and the Total_amount is the amount of the order. If we apply
a natural join on the relations ORDER_ITEM and ORDER in this database, what does the
resulting relation schema RES look like? What will be its key? Show the FDs in this resulting
relation. Is RES in 2NF? Is it in 3NF? Why or why not? (State assumptions, if you make any.)
Step-by-step solution
Step 1 of 4
The natural join of two relations can be performed only when the relations have a common
attribute with the same name.
The relations ORDER and ORDER_ITEM have O# as a common attribute. So, based on the
attribute O#, the natural join of two relations ORDER and ORDER_ITEM can be performed.
The resulting relation RES when natural join is applied on relations ORDER and ORDER_ITEM
is as follows:
RES (O#, Odate, Cust#, Total_amount, I#, Qty_ordered, Total_price, Discount%)
The key of the relation RES will be {O#, I#}.
Comment
Step 2 of 4
The functional dependencies in the relation RES are as given below:
{O#} → {Odate, Cust#, Total_amount}
{O#, I#} → {Qty_ordered, Total_price, Discount%}
Comment
Step 3 of 4
The relation RES is not in second normal form as partial dependencies exist in the relation.
• The key of the relation RES is {O#,I#}.
• O# is a partial primary key and it functionally determines Odate, Cust#, and Total_amount.
Comment
Step 4 of 4
According to the third normal form, the relation must be in second normal form and any non-key
attribute should not describe any non-key attribute.
The relation RES is not in third normal form as it is not in second normal form.
Comment
Chapter 14, Problem 30E
Problem
Consider the following relation:
CAR_SALE(Car#, Date_sold, Salesperson#, Commission%, Discount_amt)
Assume that a car may be sold by multiple salespeople, and hence {Car#, Salesperson#} is the
primary key. Additional dependencies are
Date_sold → Discount_amt and
Salesperson# → Commission%
Based on the given primary key, is this relation in 1NF, 2NF, or 3NF? Why or why not? How would
you successively normalize it completely?
Step-by-step solution
Step 1 of 4
The relation CAR_SALE is in first normal form (1NF) but not in second normal form.
• According to the first normal form, the relation should contain only atomic values.
• The primary key is {Car#, Salesperson#}.
• As the relation CAR_SALE contains only atomic values, the relation CAR_SALE is in the first
normal form.
Comment
Step 2 of 4
The relation CAR_SALE is not in second normal form as partial dependencies exist in the
relation.
• According to the second normal form, each non-key attribute must depend only on primary key.
• Salesperson# is a partial primary key and it functionally determines Commission%.
• As partial dependency exists in the relation, the relation CAR_SALE is not in second normal
form.
• In order to satisfy second normal form, remove the partial dependencies by decomposing the
relation as shown below:
CAR_SALE1(Car#, Date_sold, Salesperson#, Discount_amt)
CAR_SALE2 (Salesperson#, Commission%)
• The relations CAR_SALE1, and CAR_SALE2 are in second normal form.
Comment
Step 3 of 4
The relation CAR_SALE2 is in third normal form but the relation CAR_SALE1 is not in third
normal form as transitive dependencies exist in the relation.
• According to the third normal form, the relation must be in second normal form and any non-key
attribute should not describe any non-key attribute.
• In relations CAR_SALE1, Date_sold is a non-key attribute which functionally determines
Discount_amt.
• As transitive dependency exists in the relation, the relation CAR_SALE1 is not in third normal
form.
• In order to satisfy third normal form, remove the transitive dependencies by decomposing the
relation CAR_SALE1as shown below:
CAR_SALE3 (Car#, Date_sold, Salesperson#)
CAR_SALE4 (Date_sold, Discount_amt)
• The relations CAR_SALE3 and CAR_SALE4 are now in third normal form.
Comment
Step 4 of 4
The final set of relations that are in third normal are as follows:
CAR_SALE2 (Salesperson#, Commission%)
CAR_SALE3 (Car#, Date_sold, Salesperson#)
CAR_SALE4 (Date_sold, Discount_amt)
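The partial and transitive dependencies identified above can be flagged with a simplified check against the stated primary key. This sketch is not from the textbook; the function name `violations` is assumed, and the test only compares determinants with the key by subset relationships, which is sufficient for this exercise but not a full 2NF/3NF test.

```python
def violations(key, fds):
    """Split FDs into 2NF violations (partial) and 3NF violations (transitive).

    Simplified rule: an FD whose determinant is a proper subset of the key
    is a partial dependency; an FD whose determinant contains no key
    attributes but determines a non-key attribute is a transitive dependency.
    """
    prime = set(key)
    partial, transitive = [], []
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if lhs < prime:
            partial.append((lhs, rhs))
        elif not prime <= lhs and rhs - prime:
            transitive.append((lhs, rhs))
    return partial, transitive

key = {'Car#', 'Salesperson#'}
fds = [({'Date_sold'}, {'Discount_amt'}),
       ({'Salesperson#'}, {'Commission%'})]
partial, transitive = violations(key, fds)
print(partial)     # [({'Salesperson#'}, {'Commission%'})]
print(transitive)  # [({'Date_sold'}, {'Discount_amt'})]
```

The two flagged dependencies are exactly the ones removed by the CAR_SALE2 and CAR_SALE4 decompositions above.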
Comment
Chapter 14, Problem 31E
Problem
Consider the following relation for published books:
BOOK (Book_title, Author_name, Book_type, List_price, Author_affil, Publisher)
Author_affil refers to the affiliation of author. Suppose the following dependencies exist:
Book_title → Publisher, Book_type
Book_type → List_price
Author_name → Author_affil
a. What normal form is the relation in? Explain your answer.
b. Apply normalization until you cannot decompose the relations further. State the reasons
behind each decomposition.
Step-by-step solution
Step 1 of 4
a.
The relation Book is in first normal form (1NF) but not in second normal form.
Explanation:
• According to the first normal form, the relation should contain only atomic values.
• The primary key is (Book_Title, Author_Name).
• As the relation Book contains only atomic values, the relation Book is in the first normal form.
• According to the second normal form, each non-key attribute must depend only on primary key.
• Author_Name is a partial primary key and it functionally determines Author_affil.
• Book_title is a partial primary key and it functionally determines Publisher and Book_type.
• As partial dependency exists in the relation, the relation Book is not in second normal form.
Comment
Step 2 of 4
b.
The relation Book is in first normal form. It is not in second normal form as partial dependencies
exist in the relation.
In order to satisfy second normal form, remove the partial dependencies by decomposing the
relation as shown below:
Book_author (Book_title, Author_name)
Book_publisher(Book_title, Publisher, Book_type,
List_price)
Author(Author_name, Author_affil)
The relations Book_author, Book_publisher and Author are in second normal form.
Comment
Step 3 of 4
According to the third normal form, the relation must be in second normal form and any non-key
attribute should not describe any non-key attribute.
• The relations Book_author and Author is in third normal form.
• The relations Book_publisher is not in third normal form as transitive dependency exists in the
relation.
• Book_type is a non-key attribute which functionally determines List_price.
• In order to satisfy third normal form, remove the transitive dependencies by decomposing the
relation Book_publisher as shown below:
Book_details(Book_title, Publisher,Book_type)
Book_price (Book_type, List_price)
The relations Book_author, Book_details, Book_price and Author are in third normal form.
Comment
Step 4 of 4
The final set of relations that are in third normal are as follows:
Book_author (Book_title, Author_name)
Book_details (Book_title, Publisher,Book_type)
Book_price (Book_type, List_price)
Author(Author_name, Author_affil)
Comment
Chapter 14, Problem 32E
Problem
This exercise asks you to convert business statements into dependencies. Consider the relation
DISK_DRIVE (Serial_number, Manufacturer, Model, Batch, Capacity, Retailer). Each tuple in the
relation DISK_DRIVE contains information about a disk drive with a unique Serial_number, made
by a manufacturer, with a particular model number, released in a certain batch, which has a
certain storage capacity and is sold by a certain retailer. For example, the tuple Disk_drive
(‘1978619’, ‘WesternDigital’, ‘A2235X’, ‘765234’, 500, ‘CompUSA’) specifies that WesternDigital
made a disk drive with serial number 1978619 and model number A2235X, released in batch
765234; it is 500GB and sold by CompUSA.
Write each of the following dependencies as an FD:
a. The manufacturer and serial number uniquely identifies the drive.
b. A model number is registered by a manufacturer and therefore can’t be used by another
manufacturer.
c. All disk drives in a particular batch are the same model.
d. All disk drives of a certain model of a particular manufacturer have exactly the same capacity.
Step-by-step solution
Step 1 of 1
a)
{Manufacturer, Serial_number} → {Model, Batch, Capacity, Retailer}
b)
{Model} → {Manufacturer}
c)
{Manufacturer, Batch} → {Model}
d)
{Model} → {Capacity}
Comments (1)
Chapter 14, Problem 33E
Problem
Consider the following relation:
R(Doctor#, Patient#, Date, Diagnosis, Treat_code, Charge)
In the above relation, a tuple describes a visit of a patient to a doctor along with a treatment code
and daily charge. Assume that diagnosis is determined (uniquely) for each patient by a doctor.
Assume that each treatment code has a fixed charge (regardless of patient). Is this relation in
2NF? Justify your answer and decompose if necessary. Then argue whether further
normalization to 3NF is necessary, and if so, perform it.
Step-by-step solution
Step 1 of 1
Consider the relation R (Doctor#, Patient#, Date, Diagnosis, Treat_code, Charge).
The functional dependencies of relation R are:
{Doctor#, Patient#, Date} → {Diagnosis, Treat_code, Charge}
{Treat_code} → {Charge}
Here there are no partial dependencies, so the given relation is in 2NF. It is not in 3NF, because
Charge is a nonkey attribute that is determined by another nonkey attribute, Treat_code.
We must decompose it as:
R (Doctor#, Patient#, Date, Diagnosis, Treat_code)
R1 (Treat_code, Charge)
We could further infer that the treatment for a given diagnosis is functionally dependent, but we
should be sure to allow the doctor some flexibility when prescribing cures.
Comment
Chapter 14, Problem 34E
Problem
Consider the following relation:
CAR_SALE (Car_id, Option_type, Option_listprice, Sale_date, Option_discountedprice)
This relation refers to options installed in cars (e.g., cruise control) that were sold at a dealership,
and the list and discounted prices of the options.
If Car_id → Sale_date and Option_type → Option_listprice and Car_id, Option_type →
Option_discountedprice, argue using the generalized definition of the 3NF that this relation is not
in 3NF. Then argue from your knowledge of 2NF, why it is not even in 2NF.
Step-by-step solution
Step 1 of 3
The relation CAR_SALE is as shown below:
CAR_SALE( Car_id, Option_type, Option_listprice,
Sale_date, Option_discountedprice)
The functional dependencies are as given below:
Car_id → Sale_date
Option_type → Option_listprice
Car_id, Option_type → Option_discountedprice
Comment
Step 2 of 3
In order for a relation to be in third normal form, all nontrivial functional dependencies must be
fully dependent on the primary key and any non-key attribute should not describe any non-key
attribute. In other words, there should not be any partial dependency and transitive dependency.
• For the relation CAR_SALE, Car_id, Option_type is a primary key.
• In the functional dependency Car_id → Sale_date, Car_id is a partial key that determines
Sale_date. Hence, there exists a partial dependency in the relation.
• In the functional dependency Option_type → Option_listprice, Option_type is a partial key that
determines Option_listprice. Hence, there exists a partial dependency in the relation.
Therefore, the relation CAR_SALE is not in third normal form.
Comment
Step 3 of 3
According to the second normal form, the relation must be in first normal form and each non-key
attribute must depend only on primary key. In other words, there should not be any partial
dependency.
• For the relation CAR_SALE, Car_id, Option_type is a primary key.
• In the functional dependency Car_id → Sale_date, Car_id is a partial key that determines
Sale_date. Hence, there exists a partial dependency in the relation.
• In the functional dependency Option_type → Option_listprice, Option_type is a partial key that
determines Option_listprice. Hence, there exists a partial dependency in the relation.
Therefore, the relation CAR_SALE is not in second normal form.
Comment
Chapter 14, Problem 35E
Problem
Consider the relation:
BOOK (Book_Name, Author, Edition, Year)
with the data:
Book_Name       | Author  | Edition | Copyright_Year
DB_fundamentals | Navathe | 4       | 2004
DB_fundamentals | Elmasri | 4       | 2004
DB_fundamentals | Elmasri | 5       | 2007
DB_fundamentals | Navathe | 5       | 2007
a. Based on a common-sense understanding of the above data, what are the possible candidate
keys of this relation?
b. Justify that this relation has the MVD {Book} ↠ {Author} | {Edition, Year}.
c. What would be the decomposition of this relation based on the above MVD? Evaluate each
resulting relation for the highest normal form it possesses.
Step-by-step solution
Step 1 of 3
Candidate Key
A candidate key may be a single attribute or a set of attributes that uniquely identifies each tuple
or record in a database. Attributes that belong to some candidate key are called prime attributes,
and the rest of the attributes in the table are called non-prime attributes.
Book_Name       | Author  | Edition | Copyright_Year
DB_fundamentals | Navathe | 4       | 2004
DB_fundamentals | Elmasri | 4       | 2004
DB_fundamentals | Elmasri | 5       | 2007
DB_fundamentals | Navathe | 5       | 2007
Book_Name has the same value in all rows, so on its own it cannot distinguish tuples and is not
needed in a minimal candidate key.
a.
Possible candidate keys:
(Author, Edition) and (Author, Copyright_Year).
Both sets uniquely identify every tuple and are minimal. Larger sets that contain them, such as
(Book_Name, Author, Edition), are superkeys rather than candidate keys, because a candidate
key must be minimal. Either (Author, Edition) or (Author, Copyright_Year) can be chosen as the
primary key.
Comment
Step 2 of 3
b.
Multi Valued Dependency (MVD):
An MVD occurs when the presence of one or more tuples in a table implies the presence of one
or more other tuples in the same table. If at least two rows of the table agree on all implying
attributes, then their dependent components can be swapped, and the resulting tuples must also
be in the table. MVDs play a very important role in 4NF.
Consider the MVD {Book_Name} ↠ {Author} | {Edition, Copyright_Year}.
This MVD indicates that the relationship between Book_Name and Author is independent of the
relationship between Book_Name and (Edition, Copyright_Year).
By the definition of MVD, Book_Name implies more than one value of Author and of (Edition,
Copyright_Year). If the components of Author and (Edition, Copyright_Year) are swapped, the
resulting rows are still present in the table. Therefore, the relation has the MVD
{Book_Name} ↠ {Author} | {Edition, Copyright_Year}.
Comment
Step 3 of 3
c.
Decomposition on the basis of the MVD:
When a relation has a nontrivial MVD, redundant values appear in its tuples even though no
nontrivial functional dependency holds, so the relation can be in BCNF and still not be in 4NF.
The relation can therefore be decomposed into the following relations:
BOOK1 (Book_Name, Author, Edition)
BOOK2 (Edition, Copyright_Year)
BOOK1 still contains an MVD ({Book_Name} ↠ {Author} | {Edition}). Decomposing it further
gives a final schema in the highest normal form:
BOOK1_1 (Book_Name, Author)
BOOK1_2 (Book_Name, Edition)
BOOK2 (Edition, Copyright_Year)
Comment
Chapter 14, Problem 36E
Problem
Consider the following relation:
TRIP (Trip_id, Start_date, Cities_visited, Cards_used)
This relation refers to business trips made by company salespeople. Suppose the TRIP has a
single Start_date but involves many Cities and salespeople may use multiple credit cards on the
trip. Make up a mock-up population of the table.
a. Discuss what FDs and/or MVDs exist in this relation.
b. Show how you will go about normalizing the relation.
Step-by-step solution
Step 1 of 2
Relation TRIP has the unique attribute Trip_id, and a particular Trip_id has a single Start_date of
the trip. So Start_date is fully functionally dependent on Trip_id.
a.
The FD and MVDs that exist in the relation are:
FD1: Trip_id → Start_date
Cities_visited and Cards_used may repeat for a particular Trip_id. Cities_visited and Cards_used
are independent of each other, and each has multiple values per trip. So the MVDs present in the
relation are as follows:
MVD1: Trip_id ↠ Cities_visited
MVD2: Trip_id ↠ Cards_used
Comment
Step 2 of 2
b.
Normalizing relation
The relation has one FD and two MVDs, so first split the relation to remove the functional
dependency FD1:
TRIP1 (Trip_id, Start_date)
Now split the relation to remove the multivalued dependencies. Cities_visited and Cards_used
are independent of each other; if their components are swapped, the relation remains
unchanged. On the basis of Trip_id, the relation can be decomposed as follows:
TRIP2 (Trip_id, Cities_visited)
TRIP3 (Trip_id, Cards_used)
Following is the final schema for the table provided:
TRIP1 (Trip_id, Start_date)
TRIP2 (Trip_id, Cities_visited)
TRIP3 (Trip_id, Cards_used)
Comment
Chapter 15, Problem 1RQ
Problem
What is the role of Armstrong’s inference rules (inference rules IR1 through IR3) in the
development of the theory of relational design?
Step-by-step solution
Step 1 of 1
There are six inference rules (IR) for functional dependencies (FD), of which the first three, the
reflexive, augmentation, and transitive rules, are referred to as Armstrong's axioms.
Inference Rule 1 (reflexive rule)
If Y ⊆ X, then X → Y.
The reflexive rule states that a set of attributes functionally determines itself and any of its
subsets.
Inference Rule 2 (augmentation rule)
{X → Y} ⊨ XZ → YZ.
The augmentation rule states that augmenting both sides of an FD with the same set of
attributes results in another valid FD.
Inference Rule 3 (transitive rule)
{X → Y, Y → Z} ⊨ X → Z.
The transitive rule states that if X determines Y and Y determines Z, then X determines Z.
Database designers specify the set of functional dependencies F that hold on the attributes of a
relation R, and then IR1, IR2, and IR3 are used to infer additional functional dependencies that
hold on R. These three inference rules generate new functional dependencies (and the remaining
rules can be derived from them). Hence they let designers infer new facts and are central to the
theory of relational database design.
Comment
Chapter 15, Problem 2RQ
Problem
What is meant by the completeness and soundness of Armstrong’s inference rules?
Step-by-step solution
Step 1 of 1
The inference rules (IR) for functional dependencies (FD) known as the reflexive, augmentation,
and transitive rules are referred to as Armstrong's inference rules.
Inference Rule 1 (reflexive rule)
If Y ⊆ X, then X → Y.
The reflexive rule states that a set of attributes functionally determines itself and any of its
subsets.
Inference Rule 2 (augmentation rule)
{X → Y} ⊨ XZ → YZ.
The augmentation rule states that augmenting both sides of an FD with the same set of
attributes results in another valid FD.
Inference Rule 3 (transitive rule)
{X → Y, Y → Z} ⊨ X → Z.
The transitive rule states that if X determines Y and Y determines Z, then X determines Z.
As given by Armstrong, the inference rules IR1, IR2, and IR3 are sound and complete.
Sound
Soundness means that, for any set of functional dependencies F specified on a relation schema
R, any dependency inferred from F using IR1 through IR3 holds in every relation state r of R that
satisfies the dependencies in F.
Complete
Completeness means that applying IR1 through IR3 repeatedly, until no more dependencies can
be inferred, results in the complete set of all possible dependencies that can be inferred from F;
this set is the closure F+.
Comment
Chapter 15, Problem 3RQ
Problem
What is meant by the closure of a set of functional dependencies? Illustrate with an example.
Step-by-step solution
Step 1 of 2
The closure of a set of functional dependencies is nothing but a set of dependencies that consist
of functional dependencies of a relation denoted by F as well as the functional dependencies that
can be inferred from or implied by F.
The closure of a set of functional dependencies of a relation R is denoted by F+.
Comment
Step 2 of 2
Example:
Consider a relation Student with attributes StudentNo, Sname, Address, DOB, CourseNo,
CourseName, Credits, and Duration.
The functional dependencies of Student are as follows:
• StudentNo → Sname, Address, DOB
• CourseNo → CourseName, Credits, Duration
The set of these functional dependencies of Student is denoted by F.
Functional dependencies that can be inferred from F include the following:
• StudentNo → Sname
• CourseNo → Credits
• {StudentNo, CourseNo} → Sname, CourseName
Hence, F+ consists of F together with all such inferred dependencies.
Comment
Chapter 15, Problem 4RQ
Problem
When are two sets of functional dependencies equivalent? How can we determine their
equivalence?
Step-by-step solution
Step 1 of 1
• Two sets of functional dependencies A and B are equivalent if A+ = B+. Hence, equivalence
means that every FD in A can be inferred from B, and every FD in B can be inferred from A; A is
equivalent to B if both conditions, A covers B and B covers A, hold.
• A set of functional dependencies A is said to cover another set of functional dependencies B if
every FD in B is also in A+; that is, if every dependency in B can be inferred from A, then B is
covered by A.
• Whether A covers B is determined by computing the attribute closure X+ with respect to A for
each FD X → Y in B, and then checking whether Y ⊆ X+. If this holds for every FD in B, then A
covers B. Whether B covers A is determined in the same way, and if both hold, A and B are
said to be equivalent.
Comment
Chapter 15, Problem 5RQ
Problem
What is a minimal set of functional dependencies? Does every set of dependencies have a
minimal equivalent set? Is it always unique?
Step-by-step solution
Step 1 of 1
A set of functional dependencies F is said to be minimal if it satisfies the following conditions:
1. Every dependency in F has a single attribute on its right-hand side.
2. We cannot replace any dependency P → A in F with a dependency Q → A, where Q is a
proper subset of P, and still have a set of dependencies equivalent to F.
3. We cannot remove any dependency from F and still have a set of dependencies equivalent
to F.
Condition 1 states that every dependency is in a standard form with a single attribute on the
right-hand side.
Conditions 2 and 3 ensure that there is no redundancy, either from redundant attributes on the
left-hand side of a dependency or from a dependency that can be inferred from the remaining
FDs in F.
A minimal cover of a set of functional dependencies A is a minimal set of dependencies F that is
equivalent to A (every dependency in A is in F+). Every set of dependencies has at least one
minimal equivalent set, but it is not always unique: a set of dependencies can have several
different minimal covers.
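The three conditions can be applied mechanically to compute one minimal cover (one, not the: the result can depend on the order in which attributes and dependencies are examined). An illustrative sketch with assumed names, not a library routine:

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the (lhs, rhs-set) pairs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    # Condition 1: single attribute on every right-hand side.
    g = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]
    g = list(dict.fromkeys(g))                     # drop exact duplicates
    # Condition 2: remove extraneous left-hand-side attributes.
    changed = True
    while changed:
        changed = False
        for i, (lhs, a) in enumerate(g):
            for attr in lhs:
                smaller = lhs - {attr}
                if smaller and a in closure(smaller, [(l, {r}) for l, r in g]):
                    g[i] = (smaller, a)            # attr was redundant
                    changed = True
                    break
        g = list(dict.fromkeys(g))
    # Condition 3: remove dependencies inferable from the rest.
    for fd in list(g):
        rest = [(l, {r}) for l, r in g if (l, r) != fd]
        if fd[1] in closure(fd[0], rest):
            g.remove(fd)
    return [(set(l), {r}) for l, r in g]

# F = {A -> BC, B -> C, AB -> C}; a minimal cover is {A -> B, B -> C}.
F = [({'A'}, {'B', 'C'}), ({'B'}, {'C'}), ({'A', 'B'}, {'C'})]
for lhs, rhs in minimal_cover(F):
    print(sorted(lhs), '->', sorted(rhs))
```

On this input the redundant attribute in AB → C and the inferable dependency A → C are both eliminated, leaving A → B and B → C.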
Comment
Chapter 15, Problem 6RQ
Problem
What is meant by the attribute preservation condition on a decomposition?
Step-by-step solution
Step 1 of 1
Attribute preservation condition on decomposition:
Decomposition replaces a relation schema by a set of smaller relation schemas during
normalization.
Start with a universal relation schema R = {A1, A2, ..., An} that includes all the attributes of the
database, where every attribute name is unique.
Using the functional dependencies, the design algorithms decompose the universal relation
schema R into a set of relation schemas D = {R1, R2, ..., Rm} that will become the relational
database schema. D is called a decomposition of R.
The attribute preservation condition requires that
R1 ∪ R2 ∪ ... ∪ Rm = R,
so that each attribute of R appears in at least one relation schema Ri of the decomposition and
no attributes are lost through the process of normalization.
Comment
Chapter 15, Problem 7RQ
Problem
Why are normal forms alone insufficient as a condition for a good schema design?
Step-by-step solution
Step 1 of 1
Normal forms alone are insufficient as a condition for a good schema design because a
decomposition must also satisfy two additional properties:
1) the lossless (nonadditive) join property, and
2) the dependency preservation property.
Both are used by the design algorithms to achieve desirable decompositions.
It is insufficient to test the relation schemas independently of one another for compliance with
higher normal forms like 2NF, 3NF, and BCNF. The resulting relations must collectively satisfy
these two additional properties, dependency preservation and lossless join, to qualify as a good
design.
Comment
Chapter 15, Problem 8RQ
Problem
What is the dependency preservation property for a decomposition? Why is it important?
Step-by-step solution
Step 1 of 2
Dependency preservation property for decomposition:
Let F be a set of functional dependencies on a schema R, and let D = {R1, R2, ..., Rm} be a
decomposition of R.
The projection of F on Ri, denoted πRi(F), is the set of all functional dependencies X → Y in F+
such that all the attributes in X ∪ Y are contained in Ri.
The decomposition D is dependency-preserving with respect to F if the union of the projections
of F on each Ri in D is equivalent to F; that is,
(πR1(F) ∪ ... ∪ πRm(F))+ = F+.
Comment
Step 2 of 2
Importance:
1) With this property, we can easily check that updates to the database do not result in illegal
relations being created.
2) The design allows us to check updates without having to compute natural joins of the
decomposed relations.
3) We want to preserve dependencies because each dependency in F represents a constraint
on the database.
4) It is always possible to find a dependency-preserving decomposition D with respect to F such
that each relation Ri in D is in 3NF.
Comment
Chapter 15, Problem 9RQ
Problem
Why can we not guarantee that BCNF relation schemas will be produced by dependency-preserving decompositions of non-BCNF relation schemas? Give a counterexample to illustrate
this point.
Step-by-step solution
Step 1 of 3
We cannot guarantee that BCNF relation schemas will be produced by dependency-preserving
decompositions of non-BCNF relation schemas.
For this, consider an example: the relation TEACH(Student, Course, Instructor) with the two
functional dependencies
fd1: {Student, Course} → Instructor
fd2: Instructor → Course
Here {Student, Course} is a candidate key, so this relation is in 3NF but not in BCNF, because in
fd2 the determinant Instructor is not a superkey.
Comment
Step 2 of 3
Comment
Step 3 of 3
A relation that is not in BCNF should be decomposed so as to meet this property, while possibly
forgoing the preservation of some functional dependencies in the decomposed relations. In this
example, any lossless decomposition into BCNF, such as {Instructor, Course} and {Instructor,
Student}, fails to preserve fd1.
Comment
Chapter 15, Problem 10RQ
Problem
What is the lossless (or nonadditive) join property of a decomposition? Why is it important?
Step-by-step solution
Step 1 of 1
Lossless (nonadditive) join property of a decomposition:
This is one of the properties of a decomposition. The word loss in lossless refers to loss of information, not to loss of tuples.
Basic definition of the lossless join property:
A decomposition D = {R1, R2, ..., Rm} of R has the lossless join property with respect to a set of functional dependencies F on R if, for every relation state r of R that satisfies F, the following holds, where * is the natural join of all the relations in D:
*(πR1(r), ..., πRm(r)) = r
For example, the relation
EMP_PROJ(Ssn, Pnum, Hours, Ename, Pname, Plocation)
can be decomposed into
EMP(Ssn, Ename)
PROJ(Pnum, Pname, Plocation)
WORKS_ON(Ssn, Pnum, Hours)
and joining these relations back recovers exactly the original EMP_PROJ state, so this decomposition is lossless.
Important: a key feature of a good decomposition is that it gives lossless joins, because a lossy decomposition exhibits the problem of spurious tuples. If the relations chosen do not carry the complete information about the entity or relationship, then when we join the relations we obtain tuples that do not actually belong there. These spurious tuples contain wrong information.
To avoid this type of problem, we require the lossless join property.
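The spurious-tuple problem can be reproduced with a few lines of code. The sketch below (relation contents, class, and method names are ours, not from the textbook) projects a toy relation R(A, B, C) onto R1(A, B) and R2(B, C) and joins the projections back; since B does not functionally determine C, the decomposition is not lossless and extra tuples appear:

```java
import java.util.*;

public class SpuriousJoinDemo {
    // Natural join of r1(A, B) and r2(B, C) on the shared attribute B.
    static List<String> join(List<String[]> r1, List<String[]> r2) {
        List<String> out = new ArrayList<>();
        for (String[] t1 : r1)
            for (String[] t2 : r2)
                if (t1[1].equals(t2[0]))
                    out.add(t1[0] + "," + t1[1] + "," + t2[1]);
        return out;
    }

    public static void main(String[] args) {
        // R(A, B, C) held only the tuples (a1,b,c1) and (a2,b,c2).
        // Projections onto R1(A, B) and R2(B, C):
        List<String[]> r1 = List.of(new String[]{"a1", "b"}, new String[]{"a2", "b"});
        List<String[]> r2 = List.of(new String[]{"b", "c1"}, new String[]{"b", "c2"});

        // Joining the projections yields four tuples; (a1,b,c2) and
        // (a2,b,c1) are spurious -- they were never in R.
        System.out.println(join(r1, r2));
    }
}
```

Running it prints four joined tuples, of which (a1,b,c2) and (a2,b,c1) never existed in the original relation.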
Chapter 15, Problem 11RQ
Problem
Between the properties of dependency preservation and losslessness, which one must definitely
be satisfied? Why?
Step-by-step solution
Step 1 of 1
Dependency preservation and losslessness are both properties of decompositions, and both are used by the design algorithms to achieve desirable decompositions.
Property of dependency preservation: it allows us to enforce a constraint of the original relation on the corresponding instances of the smaller relations.
Lossless join property: it ensures that any instance of the original relation can be recovered from the corresponding instances of the smaller relations. No spurious rows are generated when the relations are reunited through a natural join operation.
Of the two, the lossless join property must definitely be satisfied: without it, joining the smaller relations generates spurious tuples, that is, wrong information, and the original relation cannot be recovered. Dependency preservation, in contrast, can sometimes be sacrificed; indeed, to test the relation schemas independently of one another for compliance with higher normal forms like BCNF, 4NF, and 5NF, dependency preservation is not sufficient.
Chapter 15, Problem 12RQ
Problem
Discuss the NULL value and dangling tuple problems.
Step-by-step solution
Step 1 of 2
NULL values and dangling tuple problems.
When designing a relational database schema, we must consider the problems with NULLS.
NULLs can have multiple interpretations:
1) The attribute does not apply to this tuple.
2) The attribute value for this tuple is unknown.
3) The value is known but absent; that is, it has not been recorded yet.
Comment
Step 2 of 2
Dangling tuples: tuples that "disappear" when computing a join.
Consider a pair of relation states r1(R1) and r2(R2) and their natural join r1 * r2. A tuple t in r1 that does not join with any tuple in r2 (there is no tuple t' in r2 such that t and t' agree on the common attributes of R1 and R2) is called a dangling tuple. A dangling tuple may or may not be acceptable.
Example:
Suppose there is a tuple in the account relation whose branch name is "Town1", but there is no matching tuple in the branch relation for the Town1 branch. This is undesirable, as every account should refer to a branch that exists.
Now suppose there is another tuple in the branch relation whose branch name matches no tuple in the account relation. This means only that a branch exists for which no accounts exist yet, which is acceptable, for example, when a branch is being opened.
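Detecting dangling tuples amounts to an anti-join on the shared attribute. A minimal sketch along the lines of the account/branch example above (table contents and helper names are ours, not from the textbook):

```java
import java.util.*;

public class DanglingTupleDemo {
    // Tuples of r whose join attribute (column k) matches no tuple of s (column j).
    static List<String[]> dangling(List<String[]> r, int k, List<String[]> s, int j) {
        Set<String> keys = new HashSet<>();
        for (String[] t : s) keys.add(t[j]);
        List<String[]> out = new ArrayList<>();
        for (String[] t : r)
            if (!keys.contains(t[k]))
                out.add(t);
        return out;
    }

    public static void main(String[] args) {
        // account(Account_no, Branch_name) and branch(Branch_name, City).
        List<String[]> account = List.of(new String[]{"A-101", "Town1"},
                                         new String[]{"A-102", "Central"});
        List<String[]> branch = List.of(new String[]{"Central", "Metro"});

        // The account tuple for "Town1" joins no branch tuple: it is
        // dangling and would disappear from account NATURAL JOIN branch.
        for (String[] t : dangling(account, 1, branch, 0))
            System.out.println("dangling: " + Arrays.toString(t));
    }
}
```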
Chapter 15, Problem 13RQ
Problem
Illustrate how the process of creating first normal form relations may lead to multivalued
dependencies. How should the first normalization be done properly so that MVDs are avoided?
Step-by-step solution
Step 1 of 2
Multivalued dependencies are a consequence of first normal form, which disallows an attribute in a tuple from having a set of values. If we have two or more multivalued independent attributes in the same relation schema, we get into the problem of having to repeat every value of one of the attributes with every value of the other attribute, to keep the relation state consistent and to maintain the independence among the attributes involved. This constraint is specified by a multivalued dependency.
For example, consider an EMP relation with attributes Ename, Project_name, and Dependent_name.
The relation has the following tuples:
1.) ('a','x','n')
2.) ('a','x','m')
3.) ('a','y','n')
4.) ('a','y','m')
Step 2 of 2
Here the employee named 'a' has two dependents and works on two projects. Since each attribute value must be atomic, the problem of multivalued dependency arises in the relation.
Informally, whenever two independent 1:N relationships A:B and A:C are mixed in the same relation R(A, B, C), an MVD may arise.
Whenever a relation schema R is decomposed into R1 = (X ∪ Y) and R2 = (R − Y) based on an MVD X ↠ Y that holds in R, the decomposition has the nonadditive join property.
The property NJB': the relation schemas R1 and R2 form a nonadditive join decomposition of R with respect to a set of functional and multivalued dependencies if and only if
(R1 ∩ R2) ↠ (R1 − R2)
(or, equivalently, (R1 ∩ R2) ↠ (R2 − R1)). Decomposing based on this property deals with the problem of the MVD, and thus we obtain relations that are in 1NF and do not contain the problematic MVD.
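The combination blow-up that introduces the MVD can be sketched directly, following the EMP example above (class and method names are ours, not from the textbook):

```java
import java.util.*;

public class MvdDemo {
    // Flatten two independent multivalued attributes into one 1NF relation:
    // every project must be paired with every dependent.
    static List<String> flatten(String ename, List<String> projects, List<String> dependents) {
        List<String> tuples = new ArrayList<>();
        for (String p : projects)
            for (String d : dependents)
                tuples.add(ename + "," + p + "," + d);
        return tuples;
    }

    public static void main(String[] args) {
        List<String> projects = List.of("x", "y");
        List<String> dependents = List.of("n", "m");

        // The single-relation 1NF form needs the full cross product
        // (k * m tuples), which is exactly the pattern enforced by
        // Ename ->> Pname and Ename ->> Dname.
        System.out.println(flatten("a", projects, dependents));

        // The 4NF form EMP_PROJECTS + EMP_DEPENDENTS stores k + m tuples instead.
        System.out.println(projects.size() + dependents.size() + " tuples after decomposition");
    }
}
```

With k projects and m dependents the flattened relation holds k * m tuples, while the decomposed 4NF pair holds only k + m.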
Chapter 15, Problem 14RQ
Problem
What types of constraints are inclusion dependencies meant to represent?
Step-by-step solution
Step 1 of 1
Types of constraints that inclusion dependencies are meant to represent:
Inclusion dependencies are defined in order to formalize two types of interrelational constraints that cannot be expressed using functional dependencies or multivalued dependencies.
The two are:
Referential integrity constraints: these relate attributes across relations, so a foreign key (referential integrity) constraint cannot be specified as a functional or multivalued dependency.
Class/subclass relationships: the relationship between a class and its subclass also has no formal definition in terms of the functional, multivalued, and join dependencies.
Chapter 15, Problem 15RQ
Problem
How do template dependencies differ from the other types of dependencies we discussed?
Step-by-step solution
Step 1 of 2
Template dependencies differ from the other types of dependencies as follows.
Template dependencies: a technique for representing constraints in relations. Based on the semantics of the attributes within a relation, some peculiar constraints may apply that no other dependency type can capture. The basic idea of template dependencies is to specify a template, or example, that defines each constraint or dependency.
There are two types of templates:
(1) tuple-generating templates
(2) constraint-generating templates
A template consists of a number of hypothesis tuples that are assumed to appear in one or more relations.
Step 2 of 2
The other part of a template is the template conclusion. For tuple-generating templates, the conclusion is a set of tuples that must also exist in the relations if the hypothesis tuples are present; for constraint-generating templates, the conclusion is a condition that must hold on the hypothesis tuples.
As an example, a functional dependency X → Y on a relation R can be shown as a constraint-generating template whose hypothesis consists of two tuples of R that agree on X and whose conclusion requires them to agree on Y.
Template dependencies thus differ from the other dependency types in that they are specified by example rather than by a fixed algebraic form, which makes them general enough to represent constraints the other types cannot.
Chapter 15, Problem 16RQ
Problem
Why is the domain-key normal form (DKNF) known as the ultimate normal form?
Step-by-step solution
Step 1 of 1
Domain-key normal form (DKNF) is known as the ultimate normal form.
The idea behind domain-key normal form is to specify the ultimate normal form that takes into account all possible types of dependencies and constraints: every constraint that should hold on the valid relation states can be enforced simply by domain constraints and key constraints.
- A relation in DKNF has no modification anomalies, and conversely.
- DKNF is the ultimate normal form in the sense that there is no higher normal form related to modification anomalies.
- In domain-key normal form, every constraint on the relation is a logical consequence of the definition of keys and domains.
Key: the unique identifier of a tuple.
Domain: the physical and logical description of an attribute.
Chapter 15, Problem 17E
Problem
Show that the relation schemas produced by Algorithm 15.4 are in 3NF.
Step-by-step solution
Step 1 of 1
Assume that one of the relation schemas R, formed by Algorithm 15.4, is not in 3NF. Then a functional dependency M → A is valid in R where:
• M is not a superkey of R.
• A is not a prime attribute of R.
However, as per step 2 of the algorithm, R comprises a set of attributes X ∪ {A1} ∪ ... ∪ {Ak}, where the X → Ai for i = 1, ..., k are functional dependencies in the minimal cover, implying that X is a key of R and that A1, ..., Ak are the only nonprime attributes of R.
Thus, if a functional dependency M → A holds in the relation schema R, where A is not prime and M is not a superkey of R, then M must be a proper subset of X, or else M would comprise X and therefore would be a superkey.
If both M → A and X → A hold and M is a proper subset of X, then this contradicts the condition that X → A is a functional dependency in a minimal cover of the functional dependencies, since removing an attribute from the left-hand side X still leaves a valid functional dependency M → A.
This infringes one of the minimality conditions; hence no such M → A can exist, and the relation schema R must be in 3NF.
Chapter 15, Problem 18E
Problem
Show that, if the matrix S resulting from Algorithm 15.3 does not have a row that is all a symbols,
projecting Son the decomposition and joining it back will always produce at least one spurious
tuple.
Step-by-step solution
Step 1 of 2
Consider a universal relation R = {A1, A2, ..., An}, a decomposition D = {R1, R2, ..., Rm} of R, and a set F of functional dependencies, as used in Algorithm 15.3 (given in the textbook).
The matrix S can be considered to be some relation state r of R. Row i in S represents a tuple ti corresponding to Ri: it has 'a' symbols in the columns that correspond to the attributes of Ri and 'b' symbols in the remaining columns (from step 1 of the algorithm).
From step 4 of the algorithm: during the loop, the algorithm transforms the rows of this matrix so that they represent tuples that satisfy all the functional dependencies in F. Any two rows in S, which represent two tuples in r, that agree in their values for the left-hand-side attributes X of a functional dependency X → Y in F are made to agree in their values for the right-hand-side attributes Y as well.
If any row in S ends up with all 'a' symbols, then the decomposition D has the nonadditive join property with respect to F. On the other hand, if no row ends up being all 'a' symbols, the decomposition D does not satisfy the lossless join property.
Step 2 of 2
Suppose the resulting matrix S has no row that is all 'a' symbols. The relation state r represented by S satisfies all the dependencies in F, because the loop in step 4 changed symbols only to enforce them, and the loop never changes an 'a' symbol back into a 'b' symbol.
Now project r onto each Ri of the decomposition and join the projections back. The projection of row i onto Ri consists entirely of 'a' symbols, and these m projected tuples all agree on their common attributes (which are all 'a' symbols), so their natural join contains the tuple made up entirely of 'a' symbols. By assumption, this all-'a' tuple is not in r. Hence the join of the projections contains at least one tuple that is not in r, that is, at least one spurious tuple, and the decomposition does not have the lossless join property.
Chapter 15, Problem 19E
Problem
Show that the relation schemas produced by Algorithm 15.5 are in BCNF.
Step-by-step solution
Step 1 of 2
We show that the relation schemas produced by Algorithm 15.5 are in BCNF. In this algorithm, the loop continues until all relation schemas in the decomposition are in BCNF:
Input: a universal relation R and a set of functional dependencies F on the attributes of R.
Step 1: set D := {R};
Step 2: while there is a relation schema Q in D that is not in BCNF do
{ choose a relation schema Q in D that is not in BCNF;
find a functional dependency X → Y in Q that violates BCNF;
replace Q in D by the two relation schemas (Q − Y) and (X ∪ Y); };
Step 2 of 2
According to this algorithm, we repeatedly decompose a relation schema Q that is not in BCNF into two relation schemas. By property NJB (the lossless join test for binary decompositions) and Claim 2 (preservation of nonadditivity in successive decompositions), both given in the textbook, the decomposition D keeps the nonadditive join property at every step. Since the loop terminates only when no schema in D violates BCNF, at the end of the algorithm all relation schemas in D are in BCNF.
Example of the working of this algorithm. Take a relation that is not in BCNF:
LOTS(Property_id, County_name, Lot#, Area, Price, Tax_rate)
First loop: County_name → Tax_rate violates BCNF (County_name is not a superkey), so replace LOTS by
LOTS1(Property_id, County_name, Lot#, Area, Price)
LOTS2(County_name, Tax_rate)
Second loop: LOTS1 is still not in BCNF, because Area → Price violates BCNF; replace LOTS1 by
LOTS1A(Property_id, County_name, Lot#, Area)
LOTS1B(Area, Price)
Final loop: Area → County_name violates BCNF in LOTS1A; replace LOTS1A by
LOTS1AX(Property_id, Area, Lot#)
LOTS1AY(Area, County_name)
Now every schema in D = {LOTS1AX, LOTS1AY, LOTS1B, LOTS2} is in BCNF.
Chapter 15, Problem 20E
Problem
Write programs that implement Algorithms 15.4 and 15.5.
Step-by-step solution
Step 1 of 6
Program to implement Algorithm 15.4
The following program converts a relational schema into 3NF. SynthesisAlgorithm is a public class with a main method to start execution. First, the program takes the input from the keyboard and stores it in several lists. The input values are the attribute names and the functional dependencies for the relation.
In this program, the first step calculates a minimal cover of the functional dependencies. The second step calculates the attributes to be considered for each relation. The third step checks whether or not the primary key is contained in any of the relations. The fourth step finds any redundant relation and removes it from the schema.
Following is the Java code to implement the synthesis algorithm to convert a relation into 3NF.
Following is the java code to implement Synthesis algorithm to convert a relation into 3NF.
import java.util.*;
import java.io.*;
public class SynthesisAlgorithm
{
// main method to start the execution of the program.
public static void main(String []args) throws Exception
{
BufferedReader br=new BufferedReader(new InputStreamReader(System.in));
// If irrelevant values will be entered, it might give // Exception at Runtime.
System.out.println("Note: Everything is case sensitive, please enter values in the same case everywhere.");
System.out.println("Enter the name of Relation:");
// It will store the name of relation.
String relationName=br.readLine();
System.out.println("How many attributes are there in
the Relation?");
// Number of attributes in the relation for efficient management of the attributes.
int n=Integer.parseInt(br.readLine());
System.out.println("Type name of one attribute in each
line:");
// This list contains all attribute names.
LinkedList<String> attributeList=new
LinkedList<String>();
// for loop will insert all attributes to the list.
for(int i=0;i<n;i++)
attributeList.add(br.readLine());
System.out.println("How many functional dependencies
are there in the relation "+relationName);
// Number of Functional Dependencies.
int numOfFuncDep=Integer.parseInt(br.readLine());
// this will initialize Left Hand Side attributes of
Functional Dependencies.
LinkedList<String>[] fucDepLHSattr=new
LinkedList[numOfFuncDep];
// this will initialize Right Hand Side attributes of
Functional Dependencies.
LinkedList<String>[] fucDepRHSattr=new
LinkedList[numOfFuncDep];
for(int i=0;i<numOfFuncDep;i++)
{
// Left Hand side of functional dependency might // have more than one determinants.
fucDepLHSattr[i]=new LinkedList<String>();
System.out.println("Number of attributes in LHS of
functional dependency["+i+"]");
//Number of determinant in Left Hand side of //functional dependency.
//temp1 variable overrides itself for each //functional dependency.
int temp1=Integer.parseInt(br.readLine());
System.out.println("Enter the attribute names of
LHS["+i+"]");
for(int j=0;j<temp1;j++)
fucDepLHSattr[i].add(br.readLine());
// Right Hand side of functional dependency might // have more than one determinants.
fucDepRHSattr[i]= new LinkedList<String>();
System.out.println("Number of attributes in RHS of
functional dependency["+i+"]");
//Number of dependants in Right Hand side of //functional dependency.
//temp2 variable overrides itself for each //functional dependency.
int temp2=Integer.parseInt(br.readLine());
System.out.println("Enter the attribute names of
RHS["+i+"]");
// inserting all attributes on right hand side of // the functional dependency.
for(int j=0;j<temp2;j++)
fucDepRHSattr[i].add(br.readLine());
}
System.out.println("Step 1: Finding minimal
cover...");
//initializing a collection to contain the minimal //cover of FDs.
HashMap<String,String> canonicalFDs=new
HashMap<String,String>();
// calling the minimal cover to calculate minimum FDs // required for the relation.
canonicalFDs=minimalCover(fucDepLHSattr,fucDepRHSattr)
;
Step 2 of 6
// The HashMap keeps a single right-hand side per left-hand
// side, so exact duplicate FDs have already been merged.
// Remove any trivial dependency of the form X -> X.
Iterator<Map.Entry<String,String>> fdIt=
canonicalFDs.entrySet().iterator();
while(fdIt.hasNext())
{
Map.Entry<String,String> fd=fdIt.next();
if(fd.getKey().equals(fd.getValue()))
fdIt.remove();
}
System.out.println("Step 2: Calculating attributes for
each Functional Dependency...");
int relNum=0;
for(Map.Entry<String,String> fd:canonicalFDs.entrySet())
{
System.out.print("Relation"+relNum+": ");
// this will print the relation for each functional // dependency.
System.out.print(relationName+"("+fd.getKey()+","+fd.getValue()+")");
//printing each relation in new line.
System.out.print("\n");
relNum++;
}
System.out.println("Step 3: Checking whether key
attributes are exist in any of the relations...");
//checking primary keys that exist in the created //relations.
if(canonicalFDs.equals(minimalCover(fucDepLHSattr,fucDepRHSattr)))
System.out.println("No redundant attributes exist.");
System.out.println("Step 4: Reducing redundant
relations from the schema...");
System.out.println("Final schema is as follows:");
// this loop will print the final schema.
int finalNum=0;
for(Map.Entry<String,String> fd:canonicalFDs.entrySet())
{
System.out.print("Relation"+finalNum+": ");
System.out.print(relationName+"("+fd.getKey()+","+fd.getValue()+")");
System.out.print("\n");
finalNum++;
}
}
public static HashMap<String,String>
minusFD(HashMap<String,String> map, Object pair)
{
map.remove(pair);
return map;
}
// this method will find the minimal cover of FDs.
public static HashMap<String,String>
minimalCover(LinkedList[] LHSlist,LinkedList[]
RHSlist)
{
//if the set of FDs are null this will throw //exception.
if(LHSlist==null || RHSlist==null)
throw new IllegalArgumentException("Functional
Dependency can't be NULL.");
else
System.out.println(" Converting Functional
Dependencies into canonical form...");
HashMap<String,String> canonicalFDs=new
HashMap<String,String>();
for(int i=0;i<LHSlist.length && i<RHSlist.length;i++)
{
canonicalFDs.putAll(convertIntoCanonical(LHSlist[i],RHSlist[i]));
}
return canonicalFDs;
}
// this method converts all functional dependencies into // canonical form.
public static HashMap<String,String>
convertIntoCanonical(LinkedList<String>
list1,LinkedList<String> list2)
{
// initializing a HashMap to hold canonical FDs.
HashMap<String,String> map=new
HashMap<String,String>();
// both loop will insert FDs into map, that hold only // unique pair.
for(int j=0;j<list1.size();j++)
{
for(int i=0;i<list2.size();i++)
{
map.put(list1.get(j),list2.get(i));
}
}
return map;
}
}
Step 3 of 6
Program to implement Algorithm 15.5
The following program converts a relation into BCNF using the relational decomposition algorithm. In the first step, it considers all attributes as one single relation.
In the second step, it enters a loop over the functional dependencies and checks whether or not any functional dependency violates BCNF. If an FD violates BCNF, a new relation is created containing all the attributes that participate in that functional dependency.
At the same time, the dependent attributes are removed from the parent relation.
Following is the Java code to implement the decomposition algorithm to convert a relation into BCNF.
import java.util.*;
import java.io.*;
public class DecompositionIntoBCNF
{
// main method to start the execution of the program.
public static void main(String []args) throws Exception
{
BufferedReader br=new BufferedReader(new
InputStreamReader(System.in));
// If irrelevant values will be entered, it might give // Exception at Runtime.
System.out.println("Note: Everything is case
sensitive, please enter values in the same case
everywhere.");
System.out.println("Enter the name of Relation:");
// It will store the name of relation.
String relationName=br.readLine();
System.out.println("How many attributes are there in
the Relation?");
// Number of attributes in the relation for efficient
// management of the attributes.
int n=Integer.parseInt(br.readLine());
System.out.println("Type name of one attribute in each
line:");
// This list contains all attribute names.
LinkedList<String> attributeList=new
LinkedList<String>();
// for loop will insert all attributes to the list.
for(int i=0;i<n;i++)
attributeList.add(br.readLine());
System.out.println("How many functional dependencies
are there in the relation "+relationName);
// Number of Functional Dependencies.
int numOfFuncDep=Integer.parseInt(br.readLine());
// this will initialize Left Hand Side attributes of // Functional Dependencies.
LinkedList<String>[] fucDepLHSattr=new
LinkedList[numOfFuncDep];
// this will initialize Right Hand Side attributes of // Functional Dependencies.
LinkedList<String>[] fucDepRHSattr=new
LinkedList[numOfFuncDep];
for(int i=0;i<numOfFuncDep;i++)
{
// Left Hand side of functional dependency might
// have more than one determinants.
fucDepLHSattr[i]=new LinkedList<String>();
System.out.println("Number of attributes in LHS of
functional dependency["+i+"]");
// Number of determinant in Left Hand side of
// functional dependency.
// temp1 variable overrides itself for each
// functional dependency.
int temp1=Integer.parseInt(br.readLine());
System.out.println("Enter the attribute names of
LHS["+i+"]");
for(int j=0;j<temp1;j++)
fucDepLHSattr[i].add(br.readLine());
// Right Hand side of functional dependency might
// have more than one determinants.
fucDepRHSattr[i]= new LinkedList<String>();
System.out.println("Number of attributes in RHS of
functional dependency["+i+"]");
// Number of dependants in Right Hand side of
// functional dependency.
// temp2 variable overrides itself for each
// functional dependency.
int temp2=Integer.parseInt(br.readLine());
System.out.println("Enter the attribute names of
RHS["+i+"]");
// inserting all attributes on right hand side of
// the functional dependency.
for(int j=0;j<temp2;j++)
fucDepRHSattr[i].add(br.readLine());
}
LinkedList<String> output=new LinkedList<String>();
LinkedList[] decomposition=new
LinkedList[numOfFuncDep];
output=attributeList;
int d=0;
// repeat while some functional dependency violates
// BCNF.
while(!inBCNF(output,fucDepLHSattr[d],fucDepRHSattr[d],d))
{
decomposition[d]=new LinkedList<String>();
// if FD violates BCNF, create new relation
// consisting attributes of LHS in FD.
for(int j=0;j<fucDepLHSattr[d].size();j++)
Step 4 of 6
decomposition[d].add(fucDepLHSattr[d].get(j));
// add RHS attributes to the relation.
for(int j=0;j<fucDepRHSattr[d].size();j++)
decomposition[d].add(fucDepRHSattr[d].get(j));
// remove RHS attributes of FD from parent
// relation.
for(int j=0;j<fucDepRHSattr[d].size();j++)
output.remove(fucDepRHSattr[d].get(j));
d++;
// limit the loop up to the Number of functional
// dependencies.
if(d>=numOfFuncDep)
break;
}
System.out.println("Following are the decomposed
relations:");
// this loop will print the relations.
for(int k=0;k<d;k++)
{
System.out.print(relationName+""+(k+1)+"(");
// HashSet removes the redundant attributes from the
// relation.
HashSet hs=new HashSet();
for(int q=0;q<decomposition[k].size();q++)
hs.add(decomposition[k].get(q));
Iterator it=hs.iterator();
// while loop will print one attribute at a time.
while(it.hasNext())
{
System.out.print(it.next());
}
System.out.print(")\n");
}
}
// inBCNF method will check whether or not a relation is
// in BCNF.
public static boolean inBCNF(LinkedList<String>
relation,LinkedList<String> list1,LinkedList<String>
list2,int index)
{
// this loop will concatenate the attributes of LHS
// and RHS.
for(int i=0;i<list2.size();i++)
list1.add(list2.get(i));
// if the functional dependency violates BCNF this
// will return false otherwise return true.
if(list1.size()< relation.size())
return false;
else
{
// sorting attributes to compare attributes whether
// or not they exist in the relation.
Collections.sort(list1);
Collections.sort(relation);
// if attributes of functional dependency and
// relation are similar this follows BCNF otherwise
// it will return false.
for(int j=0;j<list1.size() && j<relation.size();
j++)
{
if(list1.get(j).equals(relation.get(j)))
continue;
else return false;
}
}
return true;
}
}
Step 5 of 6
The following output gets displayed by the above program:
E:\Tom\java, c & c++ code>javac DecompositionIntoBCNF.java
Note: DecompositionIntoBCNF.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
E:\Akram\java, c & c++ code>java DecompositionIntoBCNF
Note: Everything is case sensitive, please enter values in the same case everywhere.
Enter the name of Relation:
MyRelation
How many attributes are there in the Relation?
5
Type name of one attribute in each line:
A
B
C
D
E
How many functional dependencies are there in the relation MyRelation
3
Number of attributes in LHS of functional dependency[0]
2
Enter the attribute names of LHS[0]
A
B
Number of attributes in RHS of functional dependency[0]
Step 6 of 6
1
Enter the attribute names of RHS[0]
C
Number of attributes in LHS of functional dependency[1]
2
Enter the attribute names of LHS[1]
C
D
Number of attributes in RHS of functional dependency[1]
1
Enter the attribute names of RHS[1]
E
Number of attributes in LHS of functional dependency[2]
2
Enter the attribute names of LHS[2]
D
E
Number of attributes in RHS of functional dependency[2]
1
Enter the attribute names of RHS[2]
B
Following are the decomposed relations:
MyRelation1(ABC)
MyRelation2(CDE)
MyRelation3(BDE)
E:\Tom\java, c & c++ code>
Chapter 15, Problem 21E
Problem
Consider the relation REFRIG(Model#, Year, Price, Manuf_plant, Color), which is abbreviated as
REFRIG(M, Y, P, MP, C), and the following set F of functional dependencies: F = {M → MP, {M,
Y}→ P, MP → C}
a. Evaluate each of the following as a candidate key for REFRIG, giving reasons why it can or
cannot be a key: {M}, {M, Y}, {M, C}.
b. Based on the above key determination, state whether the relation REFRIG is in 3NF and in
BCNF, and provide proper reasons.
c. Consider the decomposition of REFRIG into D = {R1 (M, Y, P), R2(M, MP, C)}. Is this
decomposition lossless? Show why. (You may consult the test under Property NJB in Section
14.5.1.)
Step-by-step solution
Step 1 of 3
Consider the relation schema REFRIG and the functional dependencies F provided in the
question.
a.
Consider the key {M}.
{M} cannot be a candidate key, as it cannot determine the attributes Y and P: from M → MP and MP → C, the closure {M}+ is only {M, MP, C}.
Consider the key {M, Y}.
It is provided that M → MP. Since {M, Y} is a superset of {M}, by IR1 (reflexivity) {M, Y} → M, and with M → MP, by IR3 (transitivity) {M, Y} → MP.
Since {M, Y} → MP and MP → C, by IR3 {M, Y} → C. Also, {M, Y} → P is given.
Therefore, {M, Y} is a candidate key, as it determines all the remaining attributes P, MP, and C, and no proper subset of it does.
Consider the key {M, C}.
{M, C} cannot be a candidate key, as it cannot determine the attributes Y and P.
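The closure computations used in this argument can be checked mechanically. A minimal sketch (class and method names are ours, not from the textbook) computes X+ under F for REFRIG:

```java
import java.util.*;

public class ClosureDemo {
    // Compute the closure X+ of an attribute set X under a set of FDs.
    static Set<String> closure(Set<String> x, Map<Set<String>, Set<String>> fds) {
        Set<String> result = new HashSet<>(x);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<Set<String>, Set<String>> fd : fds.entrySet())
                // If the LHS is contained in the closure so far, add the RHS.
                if (result.containsAll(fd.getKey()) && result.addAll(fd.getValue()))
                    changed = true;
        }
        return result;
    }

    public static void main(String[] args) {
        // F = {M -> MP, {M, Y} -> P, MP -> C} on REFRIG(M, Y, P, MP, C).
        Map<Set<String>, Set<String>> f = new HashMap<>();
        f.put(Set.of("M"), Set.of("MP"));
        f.put(Set.of("M", "Y"), Set.of("P"));
        f.put(Set.of("M", "P"), Set.of("C"));
        f.put(Set.of("MP"), Set.of("C"));
        f.remove(Set.of("M", "P")); // keep exactly the three given FDs

        System.out.println(closure(Set.of("M"), f));      // misses Y and P: not a key
        System.out.println(closure(Set.of("M", "Y"), f)); // all five attributes: candidate key
    }
}
```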
Step 2 of 3
b.
REFRIG is not in 2NF, as the functional dependency M → MP makes the nonprime attribute MP partially dependent on the candidate key {M, Y}. Hence REFRIG is not in 3NF either.
Since M is not a superkey in M → MP, REFRIG is not in BCNF.
Step 3 of 3
c.
Consider the decomposition of REFRIG into D = {R1(M, Y, P), R2(M, MP, C)}.
Applying the test for binary decomposition (property NJB):
R1 ∩ R2 = {M}, R1 − R2 = {Y, P}, and R2 − R1 = {MP, C}.
Now, it is provided that M → MP and MP → C, so by IR3 (transitivity) M → C. Hence M → {MP, C}, that is, (R1 ∩ R2) → (R2 − R1).
Since (R1 ∩ R2) → (R2 − R1) holds, the NJB test is satisfied and hence the decomposition is lossless.
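The NJB test in part (c) can likewise be automated with an attribute-closure helper. A minimal sketch (class and method names are ours, not from the textbook):

```java
import java.util.*;

public class NjbTest {
    // Attribute closure X+ under a set of FDs.
    static Set<String> closure(Set<String> x, Map<Set<String>, Set<String>> fds) {
        Set<String> result = new HashSet<>(x);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<Set<String>, Set<String>> fd : fds.entrySet())
                if (result.containsAll(fd.getKey()) && result.addAll(fd.getValue()))
                    changed = true;
        }
        return result;
    }

    // NJB: {R1, R2} is lossless iff (R1 n R2) -> (R1 - R2) or (R1 n R2) -> (R2 - R1).
    static boolean njb(Set<String> r1, Set<String> r2, Map<Set<String>, Set<String>> fds) {
        Set<String> common = new HashSet<>(r1);
        common.retainAll(r2);
        Set<String> c = closure(common, fds);
        Set<String> r1MinusR2 = new HashSet<>(r1); r1MinusR2.removeAll(r2);
        Set<String> r2MinusR1 = new HashSet<>(r2); r2MinusR1.removeAll(r1);
        return c.containsAll(r1MinusR2) || c.containsAll(r2MinusR1);
    }

    public static void main(String[] args) {
        Map<Set<String>, Set<String>> f = new HashMap<>();
        f.put(Set.of("M"), Set.of("MP"));
        f.put(Set.of("M", "Y"), Set.of("P"));
        f.put(Set.of("MP"), Set.of("C"));

        // D = {R1(M, Y, P), R2(M, MP, C)}: the common attribute M determines MP and C.
        System.out.println(njb(Set.of("M", "Y", "P"), Set.of("M", "MP", "C"), f)); // true
    }
}
```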
Chapter 15, Problem 22E
Problem
Specify all the inclusion dependencies for the relational schema in Figure 5.5.
Step-by-step solution
Step 1 of 1
Inclusion dependencies
Inclusion dependencies are defined in order to formalize two types of interrelational constraints:
- referential integrity constraints
- class/subclass relationships
Definition of inclusion dependency: an inclusion dependency R.X < S.Y, between a set of attributes X of relation schema R and a set of attributes Y of relation schema S, specifies the constraint that at any specific time, where r is a relation state of R and s is a relation state of S, we must have πX(r(R)) ⊆ πY(s(S)).
From Figure 5.5 in the textbook (the COMPANY schema), we can specify the following inclusion dependencies on the relational schema:
DEPENDENT.Essn < EMPLOYEE.Ssn
WORKS_ON.Essn < EMPLOYEE.Ssn
WORKS_ON.Pno < PROJECT.Pnumber
DEPT_LOCATIONS.Dnumber < DEPARTMENT.Dnumber
EMPLOYEE.Dno < DEPARTMENT.Dnumber
EMPLOYEE.Super_ssn < EMPLOYEE.Ssn
DEPARTMENT.Mgr_ssn < EMPLOYEE.Ssn
PROJECT.Dnum < DEPARTMENT.Dnumber
All the preceding inclusion dependencies represent referential integrity constraints.
We can also use inclusion dependencies to represent class/subclass relationships.
Chapter 15, Problem 23E
Problem
Prove that a functional dependency satisfies the formal definition of multivalued dependency.
Step-by-step solution
Step 1 of 2
A functional dependency satisfies the formal definition of a multivalued dependency.
A functional dependency X → Y on R specifies that any two tuples t1 and t2 of a relation state r(R) that agree on X (t1[X] = t2[X]) must also agree on Y (t1[Y] = t2[Y]).
A multivalued dependency X ↠ Y on R specifies that if two tuples t1 and t2 of r(R) agree on X, then two tuples t3 and t4 must also exist in r(R) such that:
(a) t3[X] = t4[X] = t1[X] = t2[X]
(b) t3[Y] = t1[Y] and t4[Y] = t2[Y]
(c) t3[R − X − Y] = t2[R − X − Y] and t4[R − X − Y] = t1[R − X − Y]
Step 2 of 2
As with functional dependencies (FDs), inference rules for multivalued dependencies (MVDs) have been developed. A functional dependency is a multivalued dependency by the replication rule: if X → Y, then X ↠ Y.
To see that the formal definition is satisfied, suppose X → Y holds, and take any two tuples t1 and t2 with t1[X] = t2[X]. Then t1[Y] = t2[Y]. Choose t3 = t2 and t4 = t1. Condition (a) holds, since all four tuples agree on X. Condition (b) holds, because t3[Y] = t2[Y] = t1[Y] and t4[Y] = t1[Y] = t2[Y]. Condition (c) holds trivially by the choice of t3 and t4, since t3[R − X − Y] = t2[R − X − Y] and t4[R − X − Y] = t1[R − X − Y].
So every functional dependency is also a multivalued dependency, because it satisfies the formal definition of a multivalued dependency.
Chapter 15, Problem 24E
Problem
Consider the example of normalizing the LOTS relation in Sections 14.4 and 14.5. Determine
whether the decomposition of LOTS into {LOTS1AX, LOTS1AY, LOTS1B, LOTS2} has the
lossless join property by applying Algorithm 15.3 and also by using the test under property NJB
from Section 14.5.1.
Step-by-step solution
Step 1 of 3
Consider the LOTS example given in the textbook:
LOTS(Property_id, County_name, Lot#, Area, Price, Tax_rate)
with the functional dependencies
FD1: Property_id → {County_name, Lot#, Area, Price, Tax_rate}
FD2: {County_name, Lot#} → {Property_id, Area, Price, Tax_rate}
FD3: County_name → Tax_rate
FD4: Area → Price
FD5: Area → County_name
and the decomposition D = {LOTS1AX, LOTS1AY, LOTS1B, LOTS2}, where
LOTS1AX(Property_id, Area, Lot#)
LOTS1AY(Area, County_name)
LOTS1B(Area, Price)
LOTS2(County_name, Tax_rate)
Step 2 of 3
Applying Algorithm 15.3, build the matrix S with one row per relation in D and one column per attribute of LOTS, in the order (Property_id, County_name, Lot#, Area, Price, Tax_rate):
LOTS1AX: a1 b12 a3 a4 b15 b16
LOTS1AY: b21 a2 b23 a4 b25 b26
LOTS1B: b31 b32 b33 a4 a5 b36
LOTS2: b41 a2 b43 b44 b45 a6
Apply FD4 (Area → Price): the first three rows agree on Area, so their Price entries become a5.
Apply FD5 (Area → County_name): the first three rows' County_name entries become a2.
Apply FD3 (County_name → Tax_rate): all four rows now agree on County_name, so their Tax_rate entries become a6.
The LOTS1AX row is now (a1 a2 a3 a4 a5 a6), all 'a' symbols, so by Algorithm 15.3 the decomposition has the lossless (nonadditive) join property.
Step 3 of 3
The same conclusion follows from the test under property NJB applied to the successive binary decompositions:
- LOTS into LOTS1(Property_id, County_name, Lot#, Area, Price) and LOTS2: LOTS1 ∩ LOTS2 = {County_name}, and County_name → Tax_rate (FD3) determines LOTS2 − LOTS1, so this step is lossless.
- LOTS1 into LOTS1A(Property_id, County_name, Lot#, Area) and LOTS1B: LOTS1A ∩ LOTS1B = {Area}, and Area → Price (FD4), so this step is lossless.
- LOTS1A into LOTS1AX and LOTS1AY: LOTS1AX ∩ LOTS1AY = {Area}, and Area → County_name (FD5), so this step is lossless.
By the preservation of nonadditivity in successive decompositions, the decomposition of LOTS into {LOTS1AX, LOTS1AY, LOTS1B, LOTS2} has the lossless join property.
Chapter 15, Problem 25E
Problem
Show how the MVDs Ename ↠ Pname and Ename ↠ Dname in Figure 14.15(a) may arise during normalization into 1NF of a relation, where the attributes Pname and Dname are multivalued.
Step-by-step solution
Step 1 of 2
Given multi valued dependency is
E name
P name and E name
D name
According 11.4 cal figure given in Text book
EMP
E name P name D name
It is in first normal form
Now, we need to show that attributes P name and D name are multi valued.
And
hold the EMP relation.
Let example of 11.4 (a) gives in text book
EMP
E name P name D name
Smith
X
John
Smith
Y
Anna
Smith
X
Anna
Smith
Y
John
Comment
Step 2 of 2
The relation EMP above shows that an employee whose name is Ename works on the project
named Pname and has a dependent whose name is Dname.
An employee may work on several projects and may have several dependents, and the
employee's projects and dependents are independent of one another.
To keep the relation state consistent, we must have a separate tuple to represent every
combination of an employee's project and dependent. This is exactly what the MVDs
Ename ↠ Pname and Ename ↠ Dname specify on EMP. Decomposing the EMP relation into
two 4NF relations EMP_PROJECTS and EMP_DEPENDENTS gives:
EMP_PROJECTS        EMP_DEPENDENTS
Ename   Pname       Ename   Dname
Smith   X           Smith   John
Smith   Y           Smith   Anna
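As a quick check, the natural join of the two 4NF projections can be computed on the relation state above. A minimal Python sketch (relation names and tuples taken from the example above):

```python
# Sketch: verify on the relation state above that the 4NF decomposition of
# EMP joins back without spurious tuples. The MVDs Ename ->> Pname and
# Ename ->> Dname force EMP to hold every project/dependent combination,
# so the natural join of the two projections reproduces EMP exactly.

emp = {("Smith", "X", "John"), ("Smith", "Y", "Anna"),
       ("Smith", "X", "Anna"), ("Smith", "Y", "John")}

# Project EMP onto the two 4NF relations.
emp_projects = {(e, p) for (e, p, d) in emp}    # EMP_PROJECTS(Ename, Pname)
emp_dependents = {(e, d) for (e, p, d) in emp}  # EMP_DEPENDENTS(Ename, Dname)

# Natural join on Ename.
joined = {(e1, p, d)
          for (e1, p) in emp_projects
          for (e2, d) in emp_dependents if e1 == e2}

assert joined == emp  # no tuples lost, no spurious tuples added
```

If EMP lacked one of the four combinations, the assertion would fail, which is exactly why such a relation state would violate the MVDs.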
Chapter 15, Problem 26E
Problem
Apply Algorithm 15.2(a) to the relation in Exercise to determine a key for R. Create a minimal set
of dependencies G that is equivalent to F, and apply the synthesis algorithm (Algorithm 15.4) to
decompose R into 3NF relations.
Exercise
Consider the universal relation R = {A, B, C, D, E, F, G, H, I, J} and the set of functional
dependencies F = {{A, B}→{C}, {A}→{D, E}, {B}→{F}, {F}→{G, H}, {D}→{I, J}}. What is the key for
F? Decompose R into 2NF and then 3NF relations.
Step-by-step solution
Step 1 of 4
Refer to the Exercise 14.24 for the set of functional dependencies F and relation R. The
functional dependencies in F are as follows:
• The combination of all the attributes is always a candidate key for that relation. So
ABCDEFGHIJ will be a candidate key for the relation R.
• Reduce unnecessary attributes from the key as follows:
• Since C can be determined by
so remove it from the key.
• Attributes D and E can be removed because they are determined by
• Attribute F can be removed because it can be determined by
• Attributes G and H can be removed because they are determined by
• Attributes I and J can be removed because they are determined by
Therefore, attribute set AB is a candidate key for relation R.
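The key-finding argument above amounts to computing attribute closures. A minimal Python sketch (the `closure` helper is an illustrative implementation, not code from the textbook):

```python
# Sketch: compute attribute closures to verify that AB is a candidate key of
# R = {A, ..., J} under F. Each FD is written as a (lhs, rhs) pair of sets.

F = [({"A", "B"}, {"C"}), ({"A"}, {"D", "E"}), ({"B"}, {"F"}),
     ({"F"}, {"G", "H"}), ({"D"}, {"I", "J"})]

def closure(attrs, fds):
    """Return the closure X+ of attribute set X under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

R = set("ABCDEFGHIJ")
assert closure({"A", "B"}, F) == R   # AB determines every attribute of R
assert closure({"A"}, F) != R        # neither A alone...
assert closure({"B"}, F) != R        # ...nor B alone is a key
```

Since {A, B}+ covers all of R while neither proper subset does, AB is a (minimal) candidate key.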
Step 2 of 4
Minimal set of dependencies (minimal cover)
If the functional dependencies of a relation are not in canonical form, first convert them into
canonical form using the decomposition rule of inference.
Refer to the Exercise 14.24 for the set of functional dependencies F and convert them into
canonical form (one attribute on each right-hand side) as follows:
{A, B} → {C}, {A} → {D}, {A} → {E}, {B} → {F}, {F} → {G}, {F} → {H}, {D} → {I}, {D} → {J}
If there exists any extraneous attribute or redundant dependency, remove it.
Determine the minimal set of dependencies G using the following tests:
• Test for minimal set of LHS (only test functional dependencies with ≥ 2 attributes on the
left-hand side, here {A, B} → {C}):
1. Testing attribute A: test the functional dependency {B} → {C}.
Since C ∉ {B}+ = {B, F, G, H}, attribute A is necessary.
2. Testing attribute B: test the functional dependency {A} → {C}.
Since C ∉ {A}+ = {A, D, E, I, J}, attribute B is necessary.
• Test for minimal set of RHS (remove each canonical dependency in turn and check whether
it is still implied by the remaining ones):
1. Testing {A, B} → {C}:
Since C ∉ {A, B}+ under the remaining dependencies, it is necessary.
2. Testing {A} → {D}:
Since D ∉ {A}+ under the remaining dependencies, it is necessary.
Step 3 of 4
3. Testing {A} → {E}: since E ∉ {A}+ under the remaining dependencies, it is necessary.
4. Testing {B} → {F}: since F ∉ {B}+ under the remaining dependencies, it is necessary.
5. Testing {F} → {G}: since G ∉ {F}+ under the remaining dependencies, it is necessary.
6. Testing {F} → {H}: since H ∉ {F}+ under the remaining dependencies, it is necessary.
7. Testing {D} → {I}: since I ∉ {D}+ under the remaining dependencies, it is necessary.
8. Testing {D} → {J}: since J ∉ {D}+ under the remaining dependencies, it is necessary.
Therefore, all the canonical functional dependencies are necessary.
After applying the composition rule of inference, the minimal set of dependencies G, that is
equivalent to F, is:
G = {{A, B} → {C}, {A} → {D, E}, {B} → {F}, {F} → {G, H}, {D} → {I, J}}
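The LHS and RHS tests above can be mechanized with the same closure computation. A sketch, assuming the canonical form listed above (the `closure` helper is illustrative):

```python
# Sketch: mechanize the minimal-cover tests. An LHS attribute a of X -> Y is
# extraneous if Y is still implied by (X - {a}); a whole FD is redundant if
# its right-hand side follows from the remaining FDs.

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Canonical form of F (one attribute per right-hand side).
G = [({"A", "B"}, {"C"}), ({"A"}, {"D"}), ({"A"}, {"E"}), ({"B"}, {"F"}),
     ({"F"}, {"G"}), ({"F"}, {"H"}), ({"D"}, {"I"}), ({"D"}, {"J"})]

# LHS test: neither A nor B is extraneous in AB -> C.
assert not {"C"} <= closure({"A"}, G)   # A alone does not give C
assert not {"C"} <= closure({"B"}, G)   # B alone does not give C

# RHS test: every FD is necessary (removing it loses its right-hand side).
for fd in G:
    rest = [g for g in G if g is not fd]
    assert not fd[1] <= closure(fd[0], rest)
```

All assertions pass, confirming that the canonical set is already minimal, so G equals F after recomposition.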
Step 4 of 4
The following steps are used to decompose R into 3NF relations using the synthesis algorithm:
Step 1: Calculate the minimal cover
The set of functional dependencies G above is a minimal cover of R.
Step 2: Create a relation for each functional dependency
There are five functional dependencies, so create five relations, each having the attributes of
the corresponding dependency:
R1(A, B, C), R2(A, D, E), R3(B, F), R4(F, G, H), R5(D, I, J)
Step 3: Create a relation for the key attributes
• AB is the candidate key of relation R. Since attributes A and B already exist together in
relation R1, there is no need to create another relation for the key attributes.
• If another relation R6(A, B) containing the candidate key AB were created, it would result in
redundancy, and step 4 (eliminating redundant relations) would remove it.
Therefore, the final 3NF relations obtained after decomposing R are:
R1(A, B, C), R2(A, D, E), R3(B, F), R4(F, G, H), and R5(D, I, J)
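The synthesis steps above can be sketched directly; the final check mirrors step 3, where a key relation is added only when no existing relation already contains the candidate key:

```python
# Sketch of the synthesis step: one relation per FD in the minimal cover,
# plus a key relation only if no existing relation contains a candidate key.

minimal_cover = [({"A", "B"}, {"C"}), ({"A"}, {"D", "E"}), ({"B"}, {"F"}),
                 ({"F"}, {"G", "H"}), ({"D"}, {"I", "J"})]
candidate_key = {"A", "B"}

relations = [lhs | rhs for lhs, rhs in minimal_cover]
if not any(candidate_key <= r for r in relations):
    relations.append(set(candidate_key))   # add R(A, B) only if needed

# The key AB is already inside R1(A, B, C), so nothing was added.
assert relations == [{"A", "B", "C"}, {"A", "D", "E"}, {"B", "F"},
                     {"F", "G", "H"}, {"D", "I", "J"}]
```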
Chapter 15, Problem 27E
Problem
Repeat Exercise 1 for the functional dependencies in Exercise 2.
Exercise 1
Apply Algorithm 15.2(a) to the relation in Exercise to determine a key for R. Create a minimal set
of dependencies G that is equivalent to F, and apply the synthesis algorithm (Algorithm 15.4) to
decompose R into 3NF relations.
Exercise
Consider the universal relation R = {A, B, C, D, E, F, G, H, I, J} and the set of functional
dependencies F = {{A, B}→{C}, {A}→{D, E}, {B}→{F}, {F}→{G, H}, {D}→{I, J}}. What is the key for
F? Decompose R into 2NF and then 3NF relations.
Exercise 2
Repeat Exercise for the following different set of functional dependencies G = {{A, B}→{C},
{B, D}→{E, F}, {A, D}→{G, H}, {A}→{I}, {H}→{J}}.
Exercise
Consider the universal relation R = {A, B, C, D, E, F, G, H, I, J} and the set of functional
dependencies F = {{A, B}→{C}, {A}→{D, E}, {B}→{F}, {F}→{G, H}, {D}→{I, J}}. What is the key for
F? Decompose R into 2NF and then 3NF relations.
Step-by-step solution
Step 1 of 5
Refer to the Exercise 14.25 for the set of functional dependencies G and relation R. The
functional dependencies in G are as follows:
G = {{A, B} → {C}, {B, D} → {E, F}, {A, D} → {G, H}, {A} → {I}, {H} → {J}}
• The combination of all attributes is always a candidate key for a relation, so ABCDEFGHIJ
is a candidate key for the relation R. Reduce unnecessary attributes from the key as follows.
Since C can be determined by {A, B} → {C}, remove it from the key.
• Attributes E and F can be removed because they are determined by {B, D} → {E, F}.
• Attributes G and H can be removed because they are determined by {A, D} → {G, H}.
• Attribute I can be removed because it is determined by {A} → {I}.
• Attribute J can be removed because it is determined by {H} → {J}.
Therefore, the attribute set ABD is a candidate key for the relation R.
Step 2 of 5
Minimal set of dependencies (minimal cover)
If the functional dependencies of a relation are not in canonical form, first convert them into
canonical form using the decomposition rule of inference.
Refer to the Exercise 14.25 for the set of functional dependencies and convert them into
canonical form as follows:
{A, B} → {C}, {B, D} → {E}, {B, D} → {F}, {A, D} → {G}, {A, D} → {H}, {A} → {I}, {H} → {J}
If there exists any extraneous attribute or redundant dependency, remove it.
Determine the minimal set of dependencies using the following tests:
• Test for minimal set of LHS (only test functional dependencies with ≥ 2 attributes on the
left-hand side):
1. Testing attribute A in {A, B} → {C}: test the functional dependency {B} → {C}.
Since C ∉ {B}+ = {B}, attribute A is necessary.
2. Testing attribute B in {A, B} → {C}: test the functional dependency {A} → {C}.
Since C ∉ {A}+ = {A, I}, attribute B is necessary.
3. Testing attribute B in {B, D} → {E, F}: test the functional dependency {D} → {E, F}.
Since E, F ∉ {D}+ = {D}, attribute B is necessary.
4. Testing attribute D in {B, D} → {E, F}: test the functional dependency {B} → {E, F}.
Since E, F ∉ {B}+ = {B}, attribute D is necessary.
5. Testing attribute A in {A, D} → {G, H}: test the functional dependency {D} → {G, H}.
Since G, H ∉ {D}+ = {D}, attribute A is necessary.
6. Testing attribute D in {A, D} → {G, H}: test the functional dependency {A} → {G, H}.
Since G, H ∉ {A}+ = {A, I}, attribute D is necessary.
Step 3 of 5
Test for minimal set of RHS (remove each canonical dependency in turn and check whether it
is still implied by the remaining ones):
1. Testing {A, B} → {C}: since C ∉ {A, B}+ under the remaining dependencies, it is necessary.
2. Testing {B, D} → {E}: since E ∉ {B, D}+ under the remaining dependencies, it is necessary.
3. Testing {B, D} → {F}: since F ∉ {B, D}+ under the remaining dependencies, it is necessary.
4. Testing {A, D} → {G}: since G ∉ {A, D}+ under the remaining dependencies, it is necessary.
5. Testing {A, D} → {H}: since H ∉ {A, D}+ under the remaining dependencies, it is necessary.
6. Testing {A} → {I}: since I ∉ {A}+ under the remaining dependencies, it is necessary.
7. Testing {H} → {J}: since J ∉ {H}+ under the remaining dependencies, it is necessary.
Therefore, all the canonical functional dependencies are necessary.
After applying the composition rule of inference to the above canonical functional
dependencies, the minimal set of functional dependencies obtained is:
{A, B} → {C}, {B, D} → {E, F}, {A, D} → {G, H}, {A} → {I}, {H} → {J}
Hence, the minimal set of functional dependencies is the given set G itself.
Step 4 of 5
The following steps are used to decompose the relation R into 3NF relations using the
synthesis algorithm. Refer to the Exercise 14.25 for the functional dependencies.
Step 1: Calculate the minimal cover
The minimal cover of the given functional dependencies is:
{A, B} → {C}, {B, D} → {E, F}, {A, D} → {G, H}, {A} → {I}, {H} → {J}
Step 2: Create a relation for each functional dependency
There are five functional dependencies in the minimal cover, so create five relations, each
having the attributes of the corresponding dependency:
R1(A, B, C), R2(B, D, E, F), R3(A, D, G, H), R4(A, I), R5(H, J)
Step 5 of 5
Step 3: Create a relation for the key attributes
ABD is the candidate key of relation R, and no single relation above contains all of A, B, and D.
Create a new relation R6(A, B, D) containing the attributes A, B, and D. All six relations with
their corresponding attributes are then:
R1(A, B, C), R2(B, D, E, F), R3(A, D, G, H), R4(A, I), R5(H, J), R6(A, B, D)
Step 4: Eliminate redundant relations
Remove all relations that are redundant. A relation is redundant if it is a projection of another
relation in the same schema. Since there is no redundant relation in the schema, there is no
need to remove any relation.
Therefore, the final 3NF relations obtained after decomposing R are:
R1(A, B, C), R2(B, D, E, F), R3(A, D, G, H), R4(A, I), R5(H, J), and R6(A, B, D)
Chapter 15, Problem 29E
Problem
Apply Algorithm 15.2(a) to the relations in Exercises 1 and 2 to determine a key for R. Apply the
synthesis algorithm (Algorithm 15.4) to decompose R into 3NF relations and the decomposition
algorithm (Algorithm 15.5) to decompose R into BCNF relations.
Exercise 1
Consider a relation R(A, B, C, D, E) with the following dependencies:
AB → C, CD → E, DE→ B
Is AB a candidate key of this relation? If not, is ABD? Explain your answer.
Exercise 2
Consider the relation R, which has attributes that hold schedules of courses and sections at a
university; R = {Course_no, Sec_no, Offering_dept, Credit_hours, Course_level, lnstructor_ssn,
Semester, Year, Days_hours, Room_no, No_of_students}. Suppose that the following functional
dependencies hold on R:
{Course_no} → {Offering_dept, Credit_hours, Course_level}
{Course_no, Sec_no, Semester, Year} → {Days_hours, Room_no, No_of_students,
lnstructor_ssn}
{Room_no, Days_hours, Semester, Year} → {lnstructor_ssn, Course_no, Sec_no}
Try to determine which sets of attributes form keys of R. How would you normalize this relation?
Step-by-step solution
Step 1 of 6
Refer to the Exercise 14.27 for the relation R(A, B, C, D, E) and its functional dependencies:
AB → C, CD → E, DE → B
Canonical functional dependency
A canonical functional dependency has only one attribute on its right-hand side.
• The combination of all attributes is always a candidate key for a relation, so ABCDE is a
candidate key for the relation R. Since all the functional dependencies are already in canonical
form, there is no need to convert them.
• Reduce unnecessary attributes from the key as follows:
• Since C can be determined by AB → C, remove it from the key.
The attribute set ABDE can be considered as a candidate key.
• Since E can be determined by CD → E (and C is itself determined by AB), remove it from the
key. The attribute set ABD can be considered as a candidate key.
Therefore, ABD is a candidate key for the relation R.
Step 2 of 6
Refer to the Exercise 14.27 for the set of functional dependencies and relation R. The following
steps are used to decompose R into 3NF relations using the synthesis algorithm:
Step 1: Find the minimal cover
The set of functional dependencies {AB → C, CD → E, DE → B} is a minimal cover of R.
Step 2: Create a relation for each functional dependency
There are three functional dependencies, and their corresponding relations are:
R1(A, B, C), R2(C, D, E), R3(D, E, B)
Step 3: Create a relation for the key attributes
• ABD is the candidate key of relation R. Since the attributes A, B, and D already exist in the
above relations, there is no need to create another relation for the key attributes.
• If another relation R4(A, B, D) containing the candidate key ABD were created, it would result
in redundancy, and step 4 can be used for removing the redundant relation.
Therefore, the final 3NF relations obtained after decomposing R are:
R1(A, B, C), R2(C, D, E), and R3(D, E, B).
Step 3 of 6
Refer to the Exercise 14.27 for the set of functional dependencies and relation R. The following
steps are used to decompose R into BCNF relations using the decomposition algorithm:
Step 1: Initialize the decomposition algorithm.
S = {A, B, C, D, E}
Step 2: Check whether any functional dependency violates BCNF. If so, decompose the
relation.
AB → C violates BCNF because AB is not a superkey, so decompose S into R1(A, B, C) and
S'(A, B, D, E). In S', DE → B violates BCNF because DE is not a superkey of S', so decompose
S' into R2(D, E, B) and R3(A, D, E).
Therefore, the final BCNF relations obtained after decomposing R are:
R1(A, B, C), R2(D, E, B), and R3(A, D, E).
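The decomposition steps above can be sketched as a small recursive procedure. This is an illustrative simplification of the decomposition algorithm, not the textbook's pseudocode: it checks only the given FDs rather than all projected dependencies, which is sufficient for this example.

```python
# Sketch: BCNF decomposition of R(A,B,C,D,E) with F = {AB->C, CD->E, DE->B}.
# Split any schema on an applicable FD whose left-hand side is not a superkey.

F = [({"A", "B"}, {"C"}), ({"C", "D"}, {"E"}), ({"D", "E"}, {"B"})]

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf(schema):
    """Recursively split `schema` on any FD in F that violates BCNF in it."""
    for lhs, rhs in F:
        y = rhs & schema
        if lhs <= schema and y and not schema <= closure(lhs, F):
            # X -> Y violates BCNF: replace schema by (schema - Y) and (X u Y)
            return bcnf(schema - y) + bcnf(lhs | y)
    return [schema]

result = bcnf(set("ABCDE"))
assert sorted(map(sorted, result)) == [["A", "B", "C"],
                                       ["A", "D", "E"],
                                       ["B", "D", "E"]]
```

The procedure reproduces the three relations derived above, though the splits may occur in a different order.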
Step 4 of 6
Refer to the Exercise 14.28 for the set of functional dependencies and relation R. Since the
functional dependencies are not in canonical form, convert them into canonical functional
dependencies as follows:
FD1: {Course_no} → {Offering_dept}
FD2: {Course_no} → {Credit_hours}
FD3: {Course_no} → {Course_level}
FD4: {Course_no, Sec_no, Semester, Year} → {Days_hours}
FD5: {Course_no, Sec_no, Semester, Year} → {Room_no}
FD6: {Course_no, Sec_no, Semester, Year} → {No_of_students}
FD7: {Course_no, Sec_no, Semester, Year} → {Instructor_ssn}
FD8: {Room_no, Days_hours, Semester, Year} → {Instructor_ssn}
FD9: {Room_no, Days_hours, Semester, Year} → {Course_no}
FD10: {Room_no, Days_hours, Semester, Year} → {Sec_no}
The entire attribute set of the relation R is a candidate key. Since Days_hours, Room_no,
No_of_students, and Instructor_ssn can be determined by the functional dependencies FD4,
FD5, FD6, and FD7 respectively, remove them from the candidate key. The remaining
attributes in the candidate key are:
{Course_no, Sec_no, Offering_dept, Credit_hours, Course_level, Semester, Year}
Since Offering_dept, Credit_hours, and Course_level can be determined by FD1, FD2, and FD3
respectively, remove them from the candidate key. The remaining attributes in the candidate
key are:
{Course_no, Sec_no, Semester, Year}
Therefore, {Course_no, Sec_no, Semester, Year} would be the minimal candidate key for the
relation R.
Step 5 of 6
Refer to the Exercise 14.28 for the set of functional dependencies and relation R. The following
steps are used to decompose R into 3NF relations using the synthesis algorithm:
Step 1: Find the minimal cover
Since the functional dependencies are not in canonical form, convert them into canonical
functional dependencies FD1 through FD10 as above. Since Instructor_ssn, Course_no, and
Sec_no have already been determined, the dependencies FD8, FD9, and FD10 are extraneous,
so the minimal cover for the relation R consists of FD1 through FD7.
The composed form of the above functional dependencies is:
{Course_no} → {Offering_dept, Credit_hours, Course_level}
{Course_no, Sec_no, Semester, Year} → {Days_hours, Room_no, No_of_students,
Instructor_ssn}
Step 2: Create a relation for each functional dependency
There are two functional dependencies, and their corresponding relations are:
R1(Course_no, Offering_dept, Credit_hours, Course_level)
R2(Course_no, Sec_no, Semester, Year, Days_hours, Room_no, No_of_students,
Instructor_ssn)
Step 3: Create a relation for the key attributes
• The relation R has the candidate key {Course_no, Sec_no, Semester, Year}. Since the
attributes Course_no, Sec_no, Semester, and Year already exist in relation R2, there is no need
to create another relation for the key attributes.
• If another relation R3 containing the candidate key (Course_no, Sec_no, Semester, Year)
were created, it would result in redundancy, and step 4 can be used for removing the
redundant relation.
Therefore, the final 3NF relations obtained after decomposing R are R1 and R2.
Step 6 of 6
Refer to the Exercise 14.28 for the set of functional dependencies and relation R. The following
steps are used to decompose R into BCNF relations using the decomposition algorithm:
Step 1: Initialize the decomposition algorithm with S = R.
Step 2: Check whether any functional dependency violates BCNF. If so, decompose the
relation.
Since {Course_no} → {Offering_dept, Credit_hours, Course_level} violates BCNF (Course_no is
not a superkey of R), relation R is decomposed into two relations:
R1(Course_no, Offering_dept, Credit_hours, Course_level)
R2(Course_no, Sec_no, Semester, Year, Days_hours, Room_no, No_of_students,
Instructor_ssn)
Therefore, the final BCNF relations obtained after decomposing R are R1 and R2.
Chapter 15, Problem 31E
Problem
Consider the following decompositions for the relation schema R of Exercise. Determine whether
each decomposition has (1) the dependency preservation property, and (2) the lossless join
property, with respect to F. Also determine which normal form each relation in the decomposition
is in.
a. D1 = {R1, R2, R3, R4, R5}; R1 = {A, B, C}, R2 = {A, D, E}, R3 = {B, F}, R4 = {F, G, H}, R5 =
{D, I, J}
b. D2 = {R1, R2, R3}; R1 = {A, B, C, D, E}, R2 = {B, F, G, H}, R3 = {D, I, J}
c. D3 = {R1, R2, R3, R4, R5}; R1= {A, B, C, D}, R2= {D, E}, R3 = {B, F}, R4 = {F, G, H}, R5= {D, I,
J}
Exercise
Consider the universal relation R = {A, B, C, D, E, F, G, H, I, J} and the set of functional
dependencies F = {{A, B}→{C}, {A}→{D, E}, {B}→{F}, {F}→{G, H}, {D}→{I, J}}. What is the key for
F? Decompose R into 2NF and then 3NF relations.
Step-by-step solution
Step 1 of 10
Consider the relation R = {A, B, C, D, E, F, G, H, I, J} and the set of functional dependencies
F = {{A, B} → {C}, {A} → {D, E}, {B} → {F}, {F} → {G, H}, {D} → {I, J}}.
Step 2 of 10
a.
The decomposition for the relation schema R is:
D1 = {R1, R2, R3, R4, R5}; R1 = {A, B, C}, R2 = {A, D, E}, R3 = {B, F}, R4 = {F, G, H},
R5 = {D, I, J}
Relation R1 satisfies the functional dependency {A, B} → {C}.
Relation R2 satisfies the functional dependency {A} → {D, E}.
Relation R3 satisfies the functional dependency {B} → {F}.
Relation R4 satisfies the functional dependency {F} → {G, H}.
Relation R5 satisfies the functional dependency {D} → {I, J}.
Hence, the decomposition D1 satisfies the dependency preservation property.
Step 3 of 10
In order to know whether D1 satisfies the nonadditive join property, apply Algorithm 15.3
(testing for the nonadditive join property) given in the textbook.
After applying the algorithm, the first row consists of "a" symbols in all the cells. Hence, the
decomposition satisfies the nonadditive join property.
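The matrix test just described (one row per relation, chase with the FDs, look for an all-"a" row) can be sketched mechanically. This is an illustrative re-implementation of the algorithm's idea, shown here for decomposition D1:

```python
# Sketch of the nonadditive join test: build the matrix with one row per
# relation in the decomposition, chase it with the FDs, then check whether
# some row has become all "a" symbols.

R = list("ABCDEFGHIJ")
D1 = [set("ABC"), set("ADE"), set("BF"), set("FGH"), set("DIJ")]
F = [({"A", "B"}, {"C"}), ({"A"}, {"D", "E"}), ({"B"}, {"F"}),
     ({"F"}, {"G", "H"}), ({"D"}, {"I", "J"})]

# matrix[i][X] is "a" if relation i contains X, else a distinct "b" symbol.
matrix = [{x: ("a" if x in ri else f"b{i}{x}") for x in R}
          for i, ri in enumerate(D1)]

changed = True
while changed:                # chase: equate symbols forced by each FD
    changed = False
    for lhs, rhs in F:
        for i in range(len(matrix)):
            for j in range(len(matrix)):
                if all(matrix[i][x] == matrix[j][x] for x in lhs):
                    for y in rhs:
                        vals = {matrix[i][y], matrix[j][y]}
                        if len(vals) == 2:
                            keep = "a" if "a" in vals else min(vals)
                            old = (vals - {keep}).pop()
                            for row in matrix:      # rename old symbol globally
                                for x in R:
                                    if row[x] == old:
                                        row[x] = keep
                            changed = True

lossless = any(all(row[x] == "a" for x in R) for row in matrix)
assert lossless   # D1 has the nonadditive (lossless) join property
```

Running the same chase on decomposition D3 from part (c) leaves no all-"a" row, matching the lossy result derived later.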
Step 4 of 10
In relation R1, AB is the primary key and also a superkey, so R1 satisfies Boyce-Codd normal
form.
In relation R2, A is the primary key and also a superkey, so R2 satisfies Boyce-Codd normal
form.
In relation R3, B is the primary key and also a superkey, so R3 satisfies Boyce-Codd normal
form.
In relation R4, F is the primary key and also a superkey, so R4 satisfies Boyce-Codd normal
form.
In relation R5, D is the primary key and also a superkey, so R5 satisfies Boyce-Codd normal
form.
All the relations of decomposition D1 are in Boyce-Codd normal form.
Step 5 of 10
b.
The decomposition for the relation schema R is:
D2 = {R1, R2, R3}; R1 = {A, B, C, D, E}, R2 = {B, F, G, H}, R3 = {D, I, J}
Relation R1 satisfies the functional dependencies {A, B} → {C} and {A} → {D, E}.
Relation R2 satisfies the functional dependencies {B} → {F} and {F} → {G, H}.
Relation R3 satisfies the functional dependency {D} → {I, J}.
Hence, the decomposition D2 satisfies the dependency preserving property.
Step 6 of 10
In order to know whether D2 satisfies the nonadditive join property, apply Algorithm 15.3
(testing for the nonadditive join property) given in the textbook.
After applying the algorithm, the first row consists of "a" symbols in all the cells. Hence, the
decomposition satisfies the nonadditive join property.
Step 7 of 10
In relation R1, AB is the primary key. The relation R1 is only in first normal form because there
is a partial dependency: the attribute A is a partial key and it determines the attributes D and E.
In relation R2, B is the primary key. The relation R2 is only in second normal form because
there is a transitive dependency: the attribute F is a non-key attribute that functionally
determines the attributes G and H.
In relation R3, D is the primary key and also a superkey, so R3 satisfies Boyce-Codd normal
form.
Step 8 of 10
c.
The decomposition for the relation schema R is:
D3 = {R1, R2, R3, R4, R5}; R1 = {A, B, C, D}, R2 = {D, E}, R3 = {B, F}, R4 = {F, G, H},
R5 = {D, I, J}
Relation R1 satisfies the functional dependency {A, B} → {C}.
Relation R3 satisfies the functional dependency {B} → {F}.
Relation R4 satisfies the functional dependency {F} → {G, H}.
Relation R5 satisfies the functional dependency {D} → {I, J}.
The functional dependency {A} → {D, E} is not satisfied by any relation of the decomposition.
Hence, the decomposition D3 does not satisfy the dependency preserving property.
Step 9 of 10
In order to know whether D3 satisfies the nonadditive join property, apply Algorithm 15.3
(testing for the nonadditive join property) given in the textbook.
After applying the algorithm, there is no row in the matrix that consists of "a" symbols in all the
cells. Hence, the decomposition does not satisfy the nonadditive join property.
Step 10 of 10
The normal form of relation R1 cannot be determined, as it satisfies only the functional
dependency {A, B} → {C}; nothing can be said about the attribute D of relation R1.
The normal form of relation R2 cannot be determined, as it does not satisfy any functional
dependency.
In relation R3, B is the primary key and also a superkey, so R3 satisfies Boyce-Codd normal
form.
In relation R4, F is the primary key and also a superkey, so R4 satisfies Boyce-Codd normal
form.
In relation R5, D is the primary key and also a superkey, so R5 satisfies Boyce-Codd normal
form.
Chapter 16, Problem 1RQ
Problem
What is the difference between primary and secondary storage?
Step-by-step solution
Step 1 of 1
Following are the differences between primary and secondary storage:
• The CPU can directly access primary storage devices; it cannot directly access secondary
storage devices.
• Primary storage devices provide fast access to data; secondary storage devices provide
slower access to data.
• The storage capacity of primary storage is limited; the storage capacity of secondary storage
is larger.
• The cost of primary storage devices is higher than that of secondary storage devices.
• Examples of primary storage are main memory and cache memory; examples of secondary
storage are hard disk drives, magnetic disks, magnetic tapes, optical disks, and flash memory.
Chapter 16, Problem 2RQ
Problem
Why are disks, not tapes, used to store online database files?
Step-by-step solution
Step 1 of 1
Disks are used to store online database files. A disk is a secondary storage device that is
randomly addressable: data is stored and retrieved in units called disk blocks, and any block
can be accessed directly by its address.
Tapes, in contrast, are sequentially addressable devices, so reaching a particular block
requires reading past all the blocks before it, which makes tapes unsuitable for online access.
Chapter 16, Problem 3RQ
Problem
Define the following terms: disk, disk pack, track, block, cylinder, sector, interblock gap, and
read/write head.
Step-by-step solution
Step 1 of 3
Disk: The disk is the secondary storage device that is used to store the huge amount of data.
The disk stores the data in the digital form i.e., 0’s and 1’s. The most basic unit of data that can
be stored in the disk is bit.
Disk pack: The disk pack contains the layers of hard disks to increase the storage capacity i.e.,
it includes many disks.
Step 2 of 3
Track: In the disk, the information is stored on the surface in the form of circles with various
diameters. Each circle of the surface is called a track.
Block: Each track of the disk is divided into equal sized slices. One or more such slices are
grouped together to form a disk block. The block may contain single slice (sector). The size of
the block is fixed at the time of disk formatting.
Step 3 of 3
Cylinder: In the disk pack, the tracks with the same diameter form a cylinder.
Sector: Each track of the disk is divided into small slices. Each slice is called a sector.
Interblock gap: The interblock gap separates the disk blocks. The data cannot be stored in the
interblock gap.
Read/write head: The read/write head is used to read or write the block.
Chapter 16, Problem 4RQ
Problem
Discuss the process of disk initialization.
Step-by-step solution
Step 1 of 1
Process of disk initialization:
In the disk formatting (initialization) process, the tracks are divided into equal-sized sectors;
the size is set by the operating system.
Initialization means defining the tracks and sectors so that data and programs can be stored
and retrieved.
During initialization of the disk, the block size is fixed, and it cannot be changed dynamically.
Chapter 16, Problem 5RQ
Problem
Discuss the mechanism used to read data from or write data to the disk.
Step-by-step solution
Step 1 of 1
The disk drive begins to rotate the disk whenever a read or write request is initiated. Once the
read/write head is positioned on the right track and the block specified in the block address
moves under the read/write head, the electronic component of the read/write head is activated
to transfer the data.
The following procedure is followed when data is read from or written to the disk:
(1) The head seeks to the correct track.
(2) The correct head is turned on.
(3) The correct sector is located.
(4) The data is read from the disk and transferred to a buffer in RAM (or written from the buffer
to the disk).
Chapter 16, Problem 6RQ
Problem
What are the components of a disk block address?
Step-by-step solution
Step 1 of 1
Disk block address:
Data is stored and retrieved in units called disk blocks or pages.
The address of a block consists of a combination of the cylinder number, the track number (the
surface number within the cylinder on which the track is located), and the block number within
the track. This address is supplied to the disk I/O hardware.
Chapter 16, Problem 7RQ
Problem
Why is accessing a disk block expensive? Discuss the time components involved in accessing a
disk block.
Step-by-step solution
Step 1 of 4
Arranging data in order and storing it in blocks of the disk is known as blocking. Data is
transferred between the disk and main memory in units of blocks.
Accessing data in main memory is much less expensive than accessing data on the disk.
This is due to the following components:
• Seek time.
• Rotational latency.
• Block transfer time.
Step 2 of 4
The access of the data in the disk is more expensive because of the time components. The time
components are explained as follows:
• Seek time:
o The disk contains a set of surfaces, each with its own read/write head, called the disk heads.
Each track is formed of fixed-size sectors.
o A sector is a small subdivision of a track on the disk.
o Each sector can typically store 512 bytes of data that the user can access.
o To read data from the disk, a mechanical arm positions the heads over the desired track.
o Seek time is the total time taken to position the arm so that the head is on the correct track.
o Accessing a disk block always incurs this seek time, which is one of the major reasons why
accessing a disk block is expensive.
Step 3 of 4
• Rotational latency:
o Latency means time delay.
o Rotational latency is the time between the request for information and the disk rotating the
sector containing that data under the read/write head.
o This is a waiting time; as it increases, the cost of accessing a disk block also increases.
Step 4 of 4
• Block transfer time:
o Once the head is positioned, transferring a block of data between the disk surface and main
memory takes some time, known as the block transfer time.
o When many blocks are accessed, the transfer times add up, which also contributes to the
cost of accessing disk blocks.
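The relative sizes of the three components can be illustrated with back-of-the-envelope arithmetic. All numbers below are assumed, typical magnitudes, not figures from the textbook:

```python
# Illustrative arithmetic with ASSUMED typical values: the mechanical delays
# (seek and rotation) dominate the cost of a single block access.

seek_time_ms = 8.0                           # average seek time (move the arm)
rpm = 7200                                   # rotational speed of the disk
rotational_latency_ms = 0.5 * 60_000 / rpm   # average wait: half a revolution
transfer_rate_mb_s = 100.0                   # sustained transfer rate
block_kb = 4.0                               # block size
transfer_ms = block_kb / 1024 / transfer_rate_mb_s * 1000

total_ms = seek_time_ms + rotational_latency_ms + transfer_ms
# Seek plus rotation is hundreds of times the transfer of the block itself.
assert transfer_ms < 0.1 < rotational_latency_ms < seek_time_ms
```

With these assumed values, over 99% of the access time is spent before a single byte of the block is transferred, which is why reducing random block accesses matters so much in database file design.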
Chapter 16, Problem 8RQ
Problem
How does double buffering improve block access time?
Step-by-step solution
Step 1 of 1
Improving block access time using double buffering:
Double buffering is used to read a continuous stream of blocks from disk into memory.
Double buffering permits continuous reading or writing of data on consecutive disk blocks,
which eliminates the seek time and rotational delay for all but the first block transfer.
Moreover, data is kept ready for processing in one buffer while the next block is read into the
other buffer, reducing the program's waiting time.
With double buffering, the total processing time is approximately n × p, where n is the number
of blocks and p is the processing time per block.
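The saving can be illustrated with assumed timings: with a single buffer the program alternates reading and processing, while with double buffering the read of block i+1 overlaps the processing of block i:

```python
# Sketch with ASSUMED timings: single buffering alternates read and process;
# double buffering overlaps them, so only the first read is exposed.

n = 100        # number of blocks to process
r = 1.0        # time to read one block into a buffer
p = 2.0        # time to process one block (here p >= r)

single_buffered = n * (r + p)         # read, process, read, process, ...
double_buffered = r + n * max(r, p)   # first read, then full overlap

assert double_buffered < single_buffered
assert double_buffered == r + n * p   # with p >= r, total time ~ n * p
```

When processing is at least as slow as reading, the reads effectively become free after the first one, which is the n × p figure stated above.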
Chapter 16, Problem 9RQ
Problem
What are the reasons for having variable-length records? What types of separator characters are
needed for each?
Step-by-step solution
Step 1 of 1
Variable-length records (reasons): A file is a sequence of records. Often, all records in a file are
of the same type and the same size. If different records in the file have different sizes, the file is
said to be made up of variable-length records.
A file may have variable-length records for several reasons:
* The file records are of the same record type, but one or more of the fields are of varying size.
* The file records are of the same record type, but one or more of the fields are optional; that
is, they may have values in some records but not in all.
* The file contains records of different record types and hence of varying size. This occurs if
related records of different types are placed together in the same disk blocks.
Types of separator characters needed:
With variable-length fields, each record has a value for each field, but we do not know the
exact length of some field values in advance. To determine the bytes within the record that
represent each field, we can use special separator characters (such as ? or % or $) that do not
appear in any field value.
Three kinds of separation are needed: separating the field name from the field value,
separating one field from the next, and terminating the record. For this we use three different
separator characters.
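A minimal sketch of separator-based encoding; the characters "=", ";", and "$" are illustrative choices (any characters guaranteed not to appear in field names or values would do):

```python
# Sketch: encoding a variable-length record with three separator characters.
# "=" separates field name from value, ";" separates fields, "$" terminates
# the record. Assumes no field name or value contains these characters.

def encode(record):
    """Serialize a record as name=value pairs, ';'-separated, '$'-terminated."""
    return ";".join(f"{name}={value}" for name, value in record.items()) + "$"

def decode(text):
    """Recover the record by splitting on the separator characters."""
    body = text.rstrip("$")
    return dict(pair.split("=", 1) for pair in body.split(";"))

rec = {"Name": "Smith", "Ssn": "123456789", "Dept": "Research"}
encoded = encode(rec)
assert decode(encoded) == rec   # round-trips despite varying field lengths
```

The separators let a scan locate field boundaries without any length prefixes, at the cost of forbidding the separator characters inside values.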
Chapter 16, Problem 10RQ
Problem
Discuss the techniques for allocating file blocks on disk.
Step-by-step solution
Step 1 of 1
Techniques for allocating file blocks on disk:
There are several techniques for allocating the blocks of a file on disk:
Contiguous allocation
Linked allocation
Clusters
Indexed allocation
Contiguous allocation: The file blocks are allocated to consecutive disk blocks. This makes
reading the whole file very fast using double buffering.
Linked allocation: Each file block contains a pointer to the next file block. This makes it easy to
expand the file but slow to read the whole file.
Clusters: A combination of the above two: clusters of consecutive disk blocks are allocated and
linked together. Clusters are sometimes called file segments or extents.
Indexed allocation: One or more index blocks contain pointers to the actual file blocks.
Chapter 16, Problem 11RQ
Problem
What is the difference between a file organization and an access method?
Step-by-step solution
Step 1 of 1
Difference between a file organization and an access method
File organization:
- It describes how the physical records in a file are arranged on the disk.
- A file organization refers to the organization of the data of a file into records, blocks, and
access structures.
- It determines how records and blocks are placed on the storage medium and interlinked.
Access method:
- It describes how the data can be retrieved based on the file organization.
- It provides a group of operations that can be applied to a file.
- Some access methods can be applied only to files organized in certain ways.
Chapter 16, Problem 12RQ
Problem
What is the difference between static and dynamic files?
Step-by-step solution
Step 1 of 1
Difference between static and dynamic files:
Static files: files on which update operations are rarely performed; the file organization rarely
changes.
Dynamic files: files that may change frequently; update operations are constantly applied to
them.
Chapter 16, Problem 13RQ
Problem
What are the typical record-at-a-time operations for accessing a file? Which of these depend on
the current file record?
Step-by-step solution
Step 1 of 1
Typical record at a time operations are:
1.) Reset: Set the file pointer to the beginning of file.
2.) Find (locate): Searches for the first record that satisfies a search condition. Transfer the block
containing that record into memory buffer. The file pointer points to the record in buffer and it
becomes the current record.
3.) Read (Get): Copies current record from the buffer to the program variable in the user
program. This command may also advance the current record pointer to the next record in the
file, which may necessitate reading the next file block from disk.
4.) FindNext: Searches for next record in file that satisfies the search condition. Transfer the
block containing that record into main memory buffer. The record is located in the buffer and
becomes current record.
5.) Delete: Delete current record and updates file on disk to reflect the deletion.
6.) Modify: Modifies some field values for current record and eventually update file on disk to
reflect the modification.
7.) Insert: Insert new record in the file by locating the block where record is to be inserted,
transferring the block into main memory buffer, writing the record into the buffer, and eventually
writing buffer to disk to reflect insertion.
Operations that are dependent on current record are:
1.) Read
2.) FindNext
3.) Delete
4.) Modify
Chapter 16, Problem 14RQ
Problem
Discuss the techniques for record deletion.
Step-by-step solution
Step 1 of 1
Techniques for record deletion:
A record may be deleted from a file using the following techniques.
(1) A program must first find the record's block,
copy the block into a buffer,
delete the record from the buffer, and then
rewrite the block back to the disk.
This leaves unused space in the disk block; using this technique to delete a large number of
records results in wasted storage space.
(2) Another technique is the deletion marker: an extra byte or bit is stored with each record. In
this technique,
deleting a record means setting the deletion marker to a certain ("deleted") value;
a different value of the marker indicates a valid record; and
search programs consider only the valid records in a block.
Both deletion techniques require periodic reorganization of the file. During this reorganization,
the file blocks are accessed consecutively and records are packed by removing the deleted
records.
For an unordered file, either spanned or unspanned organization can be used, with either
fixed-length or variable-length records.
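The deletion-marker technique can be sketched as follows (the record layout here is illustrative, not from the textbook):

```python
# Sketch: the deletion-marker technique. Each record carries an extra flag;
# delete sets the flag instead of physically removing data, and a periodic
# reorganization packs the file by dropping marked records.

records = [{"key": k, "deleted": False} for k in (10, 20, 30, 40)]

def delete(key):
    for rec in records:
        if rec["key"] == key:
            rec["deleted"] = True          # mark only; no data is moved

def scan():
    """Search programs consider only valid (unmarked) records."""
    return [r["key"] for r in records if not r["deleted"]]

def reorganize():
    """Periodic reorganization: pack the file by removing deleted records."""
    global records
    records = [r for r in records if not r["deleted"]]

delete(20)
assert scan() == [10, 30, 40]   # the marked record is invisible to searches
reorganize()
assert len(records) == 3        # the space is reclaimed only now
```

The trade-off shown is exactly the one described above: deletion itself is a single in-place write, but wasted space accumulates until the file is reorganized.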
Chapter 16, Problem 15RQ
Problem
Discuss the advantages and disadvantages of using (a) an unordered file, (b) an ordered file,
and (c) a static hash file with buckets and chaining. Which operations can be performed
efficiently on each of these organizations, and which operations are expensive?
Step-by-step solution
Step 1 of 4
(a) An unordered file:
An unordered (heap) file is a collection of records placed in the file in the same order in which
they are inserted.
Advantages:
• Insertion is fast and simple: new records are appended at the end of the last page of the file.
• Retrieving all of the records from the file is easy.
Disadvantages:
• Blank (unused) spaces may appear in the file after deletions.
• Searching or sorting the records of an unordered file takes time.
Step 2 of 4
(b) An ordered file:
An ordered file stores its records sorted on an ordering field, and the file must be reorganized
when records are inserted.
Advantages:
• Reading the records in order of the ordering field is very efficient, since they are stored in that
order.
• It is helpful when a large volume of data is present, since binary search is possible on the
ordering field.
Disadvantage:
• Rearranging the file is needed when storing, modifying, or deleting records.
Step 3 of 4
(c) A static hash file with buckets and chaining:
Advantages:
• Speed is the biggest advantage: lookup by the hash key takes roughly constant time, which is
efficient even when a huge volume of data is present.
Disadvantages:
• Static file hashing is more difficult to implement, and ordered or range access is expensive.
Step 4 of 4
Hashing is the most efficient organization for equality searches, but it is expensive to maintain
because of its more sophisticated structure.
• Extendible hashing is a type of dynamic hashing that splits and coalesces buckets as the
database changes size, since the hash function is adjusted on a dynamic basis.
• A directory cache is an added advantage for faster retrieval of information.
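A minimal sketch of part (c), a static hash file with buckets and chaining; the bucket count, capacity, and names are illustrative assumptions.

```python
# Minimal sketch of a static hash file with buckets and chaining (part (c)).
# The number of buckets M is fixed; records that overflow a bucket's capacity
# go to a chained overflow list. M, CAPACITY, and names are illustrative.

M = 4           # fixed number of primary buckets
CAPACITY = 2    # records per primary bucket

buckets = [[] for _ in range(M)]            # primary buckets
overflow = [[] for _ in range(M)]           # overflow chain per bucket

def insert(key, record):
    b = hash(key) % M                       # static hash function
    if len(buckets[b]) < CAPACITY:
        buckets[b].append((key, record))
    else:
        overflow[b].append((key, record))   # chaining on overflow

def search(key):
    b = hash(key) % M
    for k, rec in buckets[b] + overflow[b]:
        if k == key:
            return rec
    return None
```

An equality search touches only one bucket and its chain, which is why hashing is fast for exact lookups but expensive for ordered access.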
Chapter 16, Problem 16RQ
Problem
Discuss the techniques for allowing a hash file to expand and shrink dynamically. What are the
advantages and disadvantages of each?
Step-by-step solution
Step 1 of 5
Hashing techniques that allow dynamic growth and shrinking of the number of file records
include dynamic hashing, extendible hashing, and linear hashing.
In static hashing, the primary pages are fixed and allocated sequentially; pages are never
deallocated, and overflow pages are used if needed.
The dynamic techniques use the binary representation of the hash value.
In dynamic hashing, the directory is a binary tree; directories can be stored on disk, and they
expand or shrink dynamically. Directory entries point to the disk blocks that contain the stored
records.
Dynamic hashing is good for a database that grows and shrinks in size, since the hash function
adapts dynamically.
Step 2 of 5
Extendible hashing is one form of dynamic hashing. The hash function generates values over a
large range, typically b-bit integers with b = 32.
Only a prefix of the hash value is used to index into a table of bucket addresses.
Example:
Let the length of the prefix be i bits,
where i must lie between 0 and 32.
The bucket address table size is 2^i,
and initially i is 0.
The value of i then grows and shrinks with the size of the database.
Step 3 of 5
The number of buckets also changes dynamically because of the coalescing and splitting of
buckets.
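The prefix-indexing idea in the example above can be sketched as follows; `b = 32` and the function names are assumptions for illustration.

```python
# Sketch of how an extendible-hashing directory indexes buckets by an i-bit
# prefix of the hash value (assumed b = 32 bits, as in the example above).
# Directory size is 2**i; doubling the directory increments i.

def bucket_index(key, i, b=32):
    """Return the directory slot for `key` using the top i bits of its hash."""
    h = hash(key) & ((1 << b) - 1)   # b-bit hash value
    if i == 0:
        return 0                     # initially i = 0: a single slot
    return h >> (b - i)              # top-i-bit prefix indexes the directory

def double_directory(directory):
    """Double the directory: each old prefix entry becomes two adjacent
    entries, so existing records remain reachable under the longer prefix."""
    return [entry for entry in directory for _ in (0, 1)]
```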
Advantages and disadvantages of the hashing techniques:
(1) Static hashing:
Advantages:
Static hashing uses a fixed address space and performs the computation on the internal binary
representation of the search key.
Using overflow buckets, the impact of bucket overflow is reduced, but it cannot be eliminated.
Disadvantages:
The database grows over time; if the initial number of buckets is too small, performance will
degrade.
If the database shrinks, space will be wasted.
Step 4 of 5
(2) Extendible hashing:
Advantages:
It uses a directory of bucket addresses.
Hash performance does not degrade with the growth of the file.
The space overhead is minimal.
Disadvantages:
The bucket address table may itself become big.
Changing the size of the bucket address table is an expensive operation.
Step 5 of 5
(3) Linear hashing:
Advantages:
It avoids the need for a directory by splitting buckets in a fixed, linear order.
Overflow pages are not likely to be long.
Duplicates are handled easily.
It allows a hash file to expand and shrink its number of buckets dynamically without a
directory file.
Disadvantages:
If splitting lags behind insertions, overflow chains can still grow and degrade performance,
and space utilization may be lower than with extendible hashing.
Chapter 16, Problem 17RQ
Problem
What is the difference between the directories of extendible and dynamic hashing?
Step-by-step solution
Step 1 of 1
The differences between the directories of extendible and dynamic hashing are as follows:
• In extendible hashing, the directory is a flat array of 2^d entries, indexed by the first d bits of
the hash value; the array doubles or halves as a whole, adjusting d, as the file grows or shrinks.
• In dynamic hashing, the directory is a binary tree whose internal nodes guide the search bit by
bit; the tree expands and shrinks incrementally, one node at a time.
• Locating a bucket in extendible hashing is a single array lookup, whereas dynamic hashing
requires traversing tree nodes to reach a leaf that points to the bucket.
Chapter 16, Problem 18RQ
Problem
What are mixed files used for? What are other types of primary file organizations?
Step-by-step solution
Step 1 of 3
A mixed file is a file that contains records of different record types.
• An additional field, known as the record type field, is added as the first field alongside the
other fields of each record to distinguish the record type to which it belongs.
• The records in a mixed file will be of varying sizes.
Step 2 of 3
The uses of mixed files are as follows:
• To place related records of different record types together on the same disk block.
• To increase the efficiency of the system when retrieving related records.
Step 3 of 3
The other types of primary file organization are as follows:
• Unordered (heap) file organization
• Ordered (sorted) file organization
• Hashed file organization
• Indexed (B-tree) file organization
Chapter 16, Problem 19RQ
Problem
Describe the mismatch between processor and disk technologies.
Step-by-step solution
Step 1 of 2
In computer systems, collections of data are stored physically on a storage
medium.
• Through the DBMS (DataBase Management System), the data can be processed, retrieved, and
updated whenever it is needed.
• The storage media in a computer form a storage hierarchy that supports processing these
collections of data.
There are two main divisions in the storage hierarchy of a computer system:
• Primary storage
• Secondary and tertiary storage
Primary storage:
• This storage medium can be directly accessed by the CPU (Central Processing Unit); it holds
data only temporarily.
• Primary storage is also called main memory (RAM).
• In main memory, data can be accessed quickly, aided by even faster cache memories, but the
storage capacity is smaller and the cost per byte is higher.
• Note that in case of a power failure or system crash, the contents of main memory are erased.
Secondary and tertiary storage:
• These storage media hold data permanently, in the form of magnetic disks, tapes, CD-ROMs,
or DVDs.
• Secondary storage, also called secondary memory, commonly takes the form of hard disk
drives (HDDs).
• Removable media used for offline storage are referred to as tertiary storage.
• These media store data as the permanent medium of choice.
• Data on these media cannot be accessed directly; it must first be copied to primary storage
before the CPU can process it.
Step 2 of 2
The mismatch between processor and disk technologies:
In computer systems, processing works out of RAM, which consists of a series of memory
chips.
• For efficient performance, fast memory is provided to the processor.
• The processor also has the support of cache memory to retrieve information even faster, which
is an added advantage.
Disk technologies, by contrast, provide the space to accumulate data.
• In disk technologies, the collections of data are stored physically.
• Data on disk cannot be accessed directly; it must first be copied to primary storage before the
CPU can process it.
• Compared to processor speeds, disk access takes far more time, leaving the processor waiting
on the disk.
Hence, processor performance has improved much faster than disk access times, and this
growing gap is the mismatch between the two technologies.
Chapter 16, Problem 20RQ
Problem
What are the main goals of the RAID technology? How does it achieve them?
Step-by-step solution
Step 1 of 2
The main goal of RAID (redundant array of independent disks) technology is to increase the
reliability of the database by introducing redundancy.
Disk mirroring:
One technique for introducing redundancy is called mirroring (or shadowing). It stores data
redundantly on two identical physical disks that are treated as one logical disk.
With mirrored data, a data item can be read from either disk, but a write must be performed on
both disks. When data is read, it can be retrieved from the disk with the shorter queuing, seek,
and rotational delays.
If one disk fails, the other disk is still there to continue providing the data. This improves
reliability.
Step 2 of 2
Quantitative example from the book:
The mean time to failure of a mirrored disk depends on the mean time to failure of the individual
disks, as well as on the mean time to repair, which is the time it takes (on average) to replace a
failed disk and to restore the data on it. Suppose that the failures of the two disks are
independent; that is, there is no connection between the failure of one disk and the failure of the
other.
Suppose the system has 100 disks in an array, the mean time to repair is 24 hours, and the MTTF
is 200,000 hours for each disk. Then the mean time until some disk in the array fails is
200,000 / 100 = 2,000 hours.
The mean time to data loss of a mirrored system is MTTF² / (2 × MTTR)
= 200,000² / (2 × 24) ≈ 8.33 × 10⁸ hours.
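The numbers in this example can be checked with a short computation; the formula MTTF²/(2 × MTTR) is the standard independent-failures approximation for a mirrored pair.

```python
# Worked computation of the mirrored-disk reliability example above.
# Assumes independent failures: mean time to data loss of a mirrored
# pair = MTTF**2 / (2 * MTTR).

mttf = 200_000      # mean time to failure of one disk, in hours
mttr = 24           # mean time to repair, in hours
n_disks = 100       # disks in the array

# Expected time until *some* disk in the array fails:
mttf_array = mttf / n_disks                     # 2,000 hours

# Expected time until a mirrored pair loses data (both copies fail
# before the first failure is repaired):
mtt_data_loss = mttf ** 2 / (2 * mttr)          # about 8.33e8 hours

print(f"array MTTF: {mttf_array:.0f} h, mirrored data loss: {mtt_data_loss:.3e} h")
```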
Chapter 16, Problem 21RQ
Problem
How does disk mirroring help improve reliability? Give a quantitative example.
Step-by-step solution
Step 1 of 1
Disk mirroring keeps a redundant copy of the data on a second disk, so reads can be served by
either disk and a single disk failure loses no data; with independent failures, the mean time to
data loss of a mirrored pair is MTTF² / (2 × MTTR). For example, with a per-disk MTTF of
200,000 hours and a mean time to repair of 24 hours, this is about 8.33 × 10⁸ hours.
Separately, data striping is used in RAID to achieve higher transfer rates and improve disk
performance; it has two forms: (i) bit-level data striping and (ii) block-level data striping.
Chapter 16, Problem 22RQ
Problem
What characterizes the levels in RAID organization?
Step-by-step solution
Step 1 of 2
RAID levels:
In the RAID organization, one solution that presents itself because of the increased size and
reduced cost of hard drives is to build in redundancy. RAID can be implemented in hardware or
software; it is a set of physical disk drives viewed by the operating system as a single logical
drive.
Levels:
The levels depend on the kind of data redundancy introduced and the correctness-checking
technique used in the scheme.
Level 0: Uses data striping; it has no redundancy and no correctness checking.
Level 1: Redundancy through mirroring, with no correctness checking.
Level 2: Striping combined with memory-style correctness checking using error-correcting
(Hamming) codes, for example parity bits. Various versions of level 2 are possible.
Step 2 of 2
Level 3: Level 3 is similar to level 2 but uses a single disk for parity; it is sometimes called
bit-interleaved parity. Since the disk controller can detect whether a sector has been read
correctly, a single parity bit suffices for error correction as well as detection.
Level 4: Block-level data striping with a dedicated parity disk, as in level 3, but parity is kept for
blocks.
Level 5: Block-level data striping, with data and parity distributed across all disks.
Level 6: Uses the P + Q redundancy scheme, with Reed-Solomon codes, to recover from
multiple disk failures.
Chapter 16, Problem 23RQ
Problem
What are the highlights of the popular RAID levels 0, 1, and 5?
Step-by-step solution
Step 1 of 1
Different RAID (Redundant Array of Inexpensive Disks) organizations were defined based on
different combinations of the two factors,
1. Granularity of data interleaving (striping)
2. Pattern used to compute redundant information.
There are various levels of RAID from 0 to 6. The popularly used RAID organization is level 0
with striping, level 1 with mirroring, and level 5 with an extra drive for parity.
RAID level 0
• It uses data striping.
• It has no redundant data and hence it provides best write performance as updates are not
required to be duplicated.
• It splits data evenly across multiple disks.
RAID level 1
• It provides good read performance as it uses mirrored disks.
• Performance improvement is possible by scheduling a read request to the disk with shortest
expected seek and rotational delay.
RAID level 5
• It uses block-level data striping.
• Data and parity information are distributed across all the disks. If any one disk fails, the lost
data can be reconstructed using the parity information available on the remaining disks.
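The parity idea behind level 5 can be illustrated with bytewise XOR; the block contents and function names below are assumptions for illustration, not a real RAID driver.

```python
# Sketch of the parity idea behind RAID level 5.
# Parity is the bytewise XOR of the data blocks; XOR-ing the surviving
# blocks with the parity block reconstructs a lost block.

def parity(blocks):
    """Bytewise XOR of equal-length data blocks."""
    p = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            p[i] ^= byte
    return bytes(p)

def reconstruct(surviving_blocks, parity_block):
    """Recover the single missing block from the survivors and the parity."""
    return parity(surviving_blocks + [parity_block])

data = [b"ABCD", b"EFGH", b"IJKL"]          # three data blocks on three disks
p = parity(data)                            # parity block on a fourth disk
lost = reconstruct([data[0], data[2]], p)   # disk holding data[1] failed
# lost == b"EFGH"
```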
Chapter 16, Problem 24RQ
Problem
What are storage area networks? What flexibility and advantages do they offer?
Step-by-step solution
Step 1 of 1
As data becomes integrated across organizations, there is demand for storing and managing all
of it cost-effectively, and it is necessary to move from a static, fixed, data-center-centered
architecture to a more flexible and dynamic infrastructure for processing information. Most
organizations have therefore moved to storage area networks (SANs).
• In a SAN, online storage peripherals are configured as nodes on a high-speed network and can
be attached to and detached from servers in a very flexible manner.
• SANs allow storage systems to be placed at longer distances from the servers and provide
good performance and various connectivity options.
• They provide point-to-point connections between servers and storage systems over Fibre
Channel, allowing multiple RAID systems and tape libraries to be connected to servers.
Advantages
1. A SAN is more flexible, as it provides many-to-many connectivity among servers and storage
devices using Fibre Channel hubs and switches.
2. Fiber-optic cables allow a separation of up to 10 km between a server and a storage system.
3. It provides better isolation capabilities, allowing non-disruptive addition of new peripheral
devices and servers.
Chapter 16, Problem 25RQ
Problem
Describe the main features of network-attached storage as an enterprise storage solution.
Step-by-step solution
Step 1 of 1
Enterprise applications need storage solutions that provide high performance at low cost.
Network-attached storage (NAS) devices serve this purpose: a NAS device does not provide the
general services of a server, but it allows storage to be added for file sharing.
Features
• It provides a very large amount of hard-disk storage space attached to a network, and multiple
servers can make use of that space without being shut down, which ensures easier maintenance
and improves performance.
• It can be located anywhere on the local area network (LAN) and used in different
configurations.
• A hardware device called a NAS box or NAS head acts as a gateway between the NAS
system and the clients connected to the network.
• It requires no monitor, keyboard, or mouse; additional disk drives can be attached to a NAS
system to increase total capacity.
• It can store any data that appears in the form of files, such as e-mail, Web content (text,
images, or video), and remote system backups.
• It is designed for reliable operation and easy maintenance.
• It includes built-in features such as security (access authentication) and automatic e-mail
alerts when an error occurs on a connected device.
• It provides a high degree of scalability, reliability, flexibility, and performance.
Chapter 16, Problem 26RQ
Problem
How have new iSCSI systems improved the applicability of storage area networks?
Step-by-step solution
Step 1 of 1
Internet SCSI (iSCSI) is a protocol that allows clients (initiators) to send SCSI commands to
SCSI storage devices over remote channels.
• Its main feature is that it does not require the special cabling needed by Fibre Channel; it can
run over longer distances using existing network infrastructure.
• iSCSI allows data transfer over intranets and storage management over long distances.
• It can transfer data over a variety of networks, including local area networks (LANs), wide
area networks (WANs), and the Internet.
• It is bidirectional: when a request is issued, it is processed and the resulting data is sent in
response to the original request.
• It combines simplicity and low cost, and iSCSI devices upgrade well, so the protocol is widely
applied in small and medium-sized business applications.
Chapter 16, Problem 27RQ
Problem
What are SATA, SAS, and FC protocols?
Step-by-step solution
Step 1 of 3
SATA protocol:
SATA stands for Serial ATA, where ATA stands for AT Attachment; SATA is therefore Serial
AT Attachment.
SATA is a modern storage protocol that has largely replaced the formerly common SCSI (Small
Computer System Interface) and parallel ATA interfaces in laptops and small personal
computers. SATA overcomes the design limitations of the earlier storage protocols.
• SATA is suitable for tiered storage environments.
• SATA can be used by small and medium-sized enterprises.
• SATA supports interchangeability.
Step 2 of 3
SAS protocol:
SAS stands for Serial Attached SCSI. SAS overcomes the design limitations of earlier storage
protocols and is considered superior to SATA.
• SAS was designed to replace parallel SCSI interfaces in storage area networks (SANs).
• SAS drives are faster than SATA drives and support dual porting.
Step 3 of 3
FC protocol:
FC stands for the Fibre Channel protocol. Fibre Channel is used to connect multiple RAID
systems and tape libraries, which may have different configurations.
• Fibre Channel supports point-to-point connections between servers and storage systems, and
it also provides the flexibility of many-to-many connections between servers and storage
devices.
• Fibre Channel performs about as well as SAS; it uses fiber-optic cables, so high-speed data
transfer is supported.
• It imposes almost no distance limitation between devices.
Chapter 16, Problem 28RQ
Problem
What are solid-state drives (SSDs) and what advantage do they offer over HDDs?
Step-by-step solution
Step 1 of 2
Solid-state drives (SSDs):
SSD is an abbreviation for solid-state drive, a device that uses integrated-circuit assemblies as
storage to hold data permanently. It is nonvolatile memory, meaning the data is retained when
the system is turned off.
SSDs are based on flash-memory technology, which is why they are sometimes called flash
drives; because they do not require a continuous power supply or moving parts to retain data on
secondary storage, they are known as solid-state disks or solid-state drives.
An SSD has no read/write head like a traditional electromagnetic disk; instead it has a controller
(an embedded processor) that manages its operations. This makes data retrieval faster than on
magnetic disks. SSDs commonly use interconnected NAND flash memory chips.
SSDs use a wear-leveling technique that extends the drive's life by writing data to a fresh NAND
cell instead of repeatedly overwriting the same one.
Step 2 of 2
Advantages of SSDs over HDDs are as follows:
• Faster access time and higher transfer rate:
In an SSD, data can be accessed directly from any location on the flash memory, so access
times can be as much as 100 times faster than an HDD's and latency is low; consequently the
data transfer rate is high and system boot-up time is short.
• More reliable:
An SSD has no moving mechanical arm for read and write operations; data is stored on
integrated-circuit chips. The controller manages all operations on the flash cells, and since a
flash cell can be written and erased only a limited number of times before it fails, the controller
spreads these writes so that an SSD can work for many years under normal use.
• No moving components (durable):
Because an SSD has no moving components, data on an SSD is safer, even when the
equipment is handled roughly.
• Uses less power:
Since an SSD has no spinning platters or head movement, its power consumption is lower than
an HDD's, which saves battery life. An SSD uses only 2-3 watts, whereas an HDD uses 6-7
watts of power.
• No noise and less heat:
With no spinning parts, an SSD generates less heat and makes no noise, which helps increase
the life and reliability of the drive.
• Light weight:
SSDs are mounted on a circuit board and have no moving head or spindle, so they are
lightweight and small in size.
Chapter 16, Problem 29RQ
Problem
What is the function of a buffer manager? What does it do to serve a request for data?
Step-by-step solution
Step 1 of 2
The buffer manager is a software module of the DBMS responsible for serving all data requests,
choosing which buffer to use, and managing page replacement.
The main functions of the buffer manager are:
• To speed up processing and increase efficiency.
• To increase the likelihood that a requested page is found in main memory.
• To find an appropriate page to replace when reading a new disk block, choosing one that will
not be required again soon.
• To ensure that the number of buffers fits within main memory.
• To apply the buffer replacement policy and select the buffers to be emptied when the requested
amount of data exceeds the available buffer space.
Step 2 of 2
The buffer manager maintains two pieces of bookkeeping for each buffer-pool page:
1. Pin count: a counter tracking the number of outstanding requests for the page, that is, the
number of users currently using it. Initially the counter is set to zero. A page whose counter is
zero is unpinned, and only unpinned pages may be written back to disk and replaced. While the
counter is greater than zero, the page is said to be pinned.
2. Dirty bit: initially set to zero for every page; when the page is updated in the buffer, its dirty
bit is set to 1.
The buffer manager processes a page request in the following steps:
• It checks whether the page is already in the buffer pool. If so, it increments the page's pin
count and returns the page.
• If the page is not in the buffer, the buffer manager takes the following steps:
• It chooses a page to replace according to the replacement policy and increments the incoming
page's pin count.
• If the dirty bit of the replacement page is set, the buffer manager writes that page to disk
before replacing the old copy.
• If the dirty bit is not set, the page need not be written back to disk.
• The buffer manager reads the new page into the freed frame and conveys the memory location
of the page to the requesting application.
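The request-handling steps above can be sketched as follows; the frame-table layout and method names are illustrative assumptions, and a real buffer manager would add a proper replacement policy.

```python
# Minimal sketch of the pin-count / dirty-bit request protocol described
# above. The frame table, eviction choice, and disk I/O are simplified
# stand-ins (illustrative names, not a real DBMS API).

class BufferManager:
    def __init__(self, num_frames, disk):
        self.disk = disk                  # page_id -> page bytes
        self.num_frames = num_frames
        self.frames = {}                  # page_id -> {"data", "pin", "dirty"}

    def request_page(self, page_id):
        frame = self.frames.get(page_id)
        if frame is None:                 # miss: may need to evict a frame
            if len(self.frames) >= self.num_frames:
                self._evict()
            frame = {"data": self.disk[page_id], "pin": 0, "dirty": False}
            self.frames[page_id] = frame
        frame["pin"] += 1                 # page is now pinned by the caller
        return frame["data"]

    def release_page(self, page_id, dirty=False):
        frame = self.frames[page_id]
        frame["pin"] -= 1                 # unpin; mark dirty if modified
        frame["dirty"] = frame["dirty"] or dirty

    def _evict(self):
        # Pick any unpinned frame (a real policy would use LRU or clock).
        for pid, frame in self.frames.items():
            if frame["pin"] == 0:
                if frame["dirty"]:        # write back only if modified
                    self.disk[pid] = frame["data"]
                del self.frames[pid]
                return
        raise RuntimeError("all frames pinned")
```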
Chapter 16, Problem 30RQ
Problem
What are some of the commonly used buffer replacement strategies?
Step-by-step solution
Step 1 of 2
Buffer replacement strategies:
In large DBMSs, files contain many pages, and it is not possible to keep all of the data in
memory at the same time. To work within this limit and improve the efficiency of DBMS
transactions, the buffer manager uses a buffer replacement strategy that decides which buffer
to use and which page to replace in order to make space for a newly requested page.
Step 2 of 2
Some commonly used buffer replacement strategies are as follows:
• LRU (least recently used):
The LRU strategy keeps track of when each page was last used and removes the least recently
used page.
LRU works on the principle that pages used recently are likely to be used again soon. To
maintain the strategy, the buffer manager keeps a table recording the last use of every buffered
page. This is a very common and simple policy.
It suffers from sequential flooding: a repeated sequential scan of a file slightly larger than the
buffer causes every page access to incur an I/O.
• Clock policy:
This is an approximation of LRU, organized like a round-robin scan. The buffers are arranged in
a circle like a clock face with a single clock hand. The buffer manager sets a "use bit" on each
reference; when replacement is needed, the hand sweeps forward, clearing set use bits, and a
buffer whose use bit is already 0 has not been used in a long time and is chosen for
replacement. It replaces an old page, though not necessarily the oldest.
• FIFO (first in, first out):
This is the simplest buffer replacement technique. When a buffer is required for a new page, the
page that arrived earliest is swapped out. The pages are arranged in a queue such that the most
recent arrival is at the tail and the oldest arrival is at the head.
During replacement, the page at the head of the queue is replaced first. This strategy is simple
and easy to implement but not always desirable, because the oldest page may be among the
most frequently used; it will soon be needed and swapped back in, creating processing
overhead. FIFO can be improved by pinning certain blocks, such as the root index block, so that
they can never be replaced and always remain in the buffer.
• MRU (most recently used):
It removes the most recently used page first; this is also called fetch-and-discard. It is useful in
sequential scanning, where the most recently used page will not be needed again for some time.
In the sequential-scanning situation, the LRU and clock strategies do not perform well.
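As an illustration, the LRU strategy above can be sketched with an ordered mapping; the class name and the `load` callback are assumptions, and pins and dirty bits are omitted for brevity.

```python
# Minimal LRU buffer-replacement sketch using an ordered mapping
# (illustrative only; a real DBMS also tracks frames, pins, and dirty bits).
from collections import OrderedDict

class LRUBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()        # page_id -> data, oldest first

    def access(self, page_id, load):
        """Return the page, loading (and possibly evicting) on a miss."""
        if page_id in self.pages:
            self.pages.move_to_end(page_id)   # mark as most recently used
            return self.pages[page_id]
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)    # evict least recently used
        data = load(page_id)
        self.pages[page_id] = data
        return data

buf = LRUBuffer(capacity=2)
buf.access("A", lambda p: f"data-{p}")
buf.access("B", lambda p: f"data-{p}")
buf.access("A", lambda p: f"data-{p}")    # touch A: B is now the LRU page
buf.access("C", lambda p: f"data-{p}")    # evicts B, not A
# list(buf.pages) == ["A", "C"]
```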
Chapter 16, Problem 31RQ
Problem
What are optical and tape jukeboxes? What are the different types of optical media served by
optical drives?
Step-by-step solution
Step 1 of 2
Optical jukeboxes:
An optical jukebox is an intelligent data storage device that uses an array of optical disk platters
and automatically loads and unloads these disks according to storage needs. Jukeboxes have
high storage capacity, supporting terabytes and even petabytes of tertiary storage.
• Optical jukeboxes can have up to 2,000 disk slots. Because a jukebox must keep swapping
disks in and out as data is requested, it creates a time overhead that affects processing.
• Jukeboxes are cost-effective and provide random access to data.
• The process of dynamically loading and unloading disks in the drives is called migration.
Magnetic tape jukeboxes:
A magnetic tape jukebox uses a number of tapes as storage and automatically loads and
unloads tapes onto the tape drives. This is a popular tertiary storage medium that can handle
data up to terabytes.
Step 2 of 2
Optical media served by optical drives:
Optical media store data in digital form and can hold all types of data: audio, video, software,
images, and text.
An optical drive is used to read and write data on optical media. The drive reads and writes
using a laser, an electromagnetic beam whose wavelength is chosen for the specific type of
media.
The following optical media are served by optical drives:
• CD (compact disc): by use and recording type there are three kinds of CDs:
Read-only: CD-ROM
Writable: CD-R
Rewritable: CD-RW
• DVD (digital versatile disc): higher-capacity discs.
• Blu-ray disc: most commonly used to store video.
Chapter 16, Problem 32RQ
Problem
What is automatic storage tiering? Why is it useful?
Step-by-step solution
Step 1 of 1
Automated storage tiering (AST):
AST is a storage approach that classifies data and moves it dynamically among different types
of storage, such as SATA, SAS, and SSDs, according to how the data is used.
The automated tiering mechanism is managed by the storage administrator. According to the
tiering policy, less frequently used data is moved to SATA drives, which are slower but
inexpensive, while frequently used data is moved to high-speed SAS or solid-state drives.
Automated tiering greatly improves the performance of the DBMS.
EMC implements FAST (Fully Automated Storage Tiering), which automatically monitors data
activity and moves active data to high-performance storage such as SSDs and inactive data to
inexpensive, slower storage such as SATA. AST is therefore useful because it yields high
performance at low cost.
Chapter 16, Problem 33RQ
Problem
What is object-based storage? How is it superior to conventional storage systems?
Step-by-step solution
Step 1 of 2
Object-based storage:
In an object-based storage system, data is organized in units called objects instead of blocks
within files. Data is not stored in a hierarchy; rather, all data is stored in the form of objects, and
a required object can be located directly using its unique global identifier, without traversal
overhead.
Every object in object-based storage has three parts:
• Data: the information to be stored in the object.
• Variable metadata: information about the data, such as its location, usage, confidentiality,
and anything else required to manage it.
• Unique global identifier: the identifier through which the object is addressed, so that the data
can be located easily.
Step 2 of 2
An object storage system is better than a conventional storage system in the following ways:
• As organizations expand, their data grows day by day. If a file system is used as the data
store, with data placed in blocks, managing huge amounts of data becomes very difficult: in
conventional file systems, data is stored hierarchically, in blocks, each with its own unique
address.
To reduce this management overhead, object storage keeps the data as objects carrying
additional metadata.
• Object-based storage improves data access. In object-based systems, applications access an
object directly through its unique global identifier, whereas in a file storage system data must be
searched in a linear or hierarchical fashion, which generates processing overhead and is
time-consuming.
• An object-based storage system supports features such as replication, encapsulation, and
distribution of objects, which make the data secure, manageable, and easily accessible.
Conventional file-based storage does not support replication and distribution of objects in this
way.
Chapter 16, Problem 34E
Problem
Consider a disk with the following characteristics (these are not parameters of any particular disk
unit): block size B = 512 bytes; interblock gap size G = 128 bytes; number of blocks per track =
20; number of tracks per surface = 400. A disk pack consists of 15 double-sided disks.
a. What is the total capacity of a track, and what is its useful capacity (excluding interblock
gaps)?
b. How many cylinders are there?
c. What are the total capacity and the useful capacity of a cylinder?
d. What are the total capacity and the useful capacity of a disk pack?
e. Suppose that the disk drive rotates the disk pack at a speed of 2,400 rpm (revolutions per
minute); what are the transfer rate (tr) in bytes/msec and the block transfer time (btt) in msec?
What is the average rotational delay (rd) in msec? What is the bulk transfer rate? (See Appendix
B.)
f. Suppose that the average seek time is 30 msec. How much time does it take (on the average)
in msec to locate and transfer a single block, given its block address?
g. Calculate the average time it would take to transfer 20 random blocks, and compare this with
the time it would take to transfer 20 consecutive blocks using double buffering to save seek time
and rotational delay.
Step-by-step solution
Step 1 of 8
Given data
Block size
Inter block gap size
Number of blocks per track
Number of tracks per surface
Disk pack consists of 15 double – sided disks
Comment
Step 2 of 8
(a) Total track size
Block per track
(block size
block gap size)
Bytes
k bytes
Useful capacity of a track = block per tract
block size
Bytes
Comment
Step 3 of 8
(b) Number of cylinders
Numbers of tracks
400
Comment
Step 4 of 8
(c) Total cylinder capacity = 30 tracks × 12,800 bytes = 384,000 bytes.
Useful cylinder capacity = 30 tracks × 10,240 bytes = 307,200 bytes.
Step 5 of 8
(d) Total capacity of a disk pack = 400 cylinders × 384,000 bytes
= 153,600,000 bytes ≈ 153.6 Mbytes
Useful capacity of a disk pack = 400 × 307,200 = 122,880,000 bytes ≈ 122.9 Mbytes
Step 6 of 8
(e) At 2,400 rpm, one revolution takes 60,000 / 2,400 = 25 msec.
Transfer rate tr = track capacity / revolution time = 12,800 / 25 = 512 bytes/msec
Block transfer time btt = B / tr = 512 / 512 = 1 msec
Average rotational delay rd = half a revolution = 25 / 2 = 12.5 msec
Bulk transfer rate btr = (B / (B + G)) × tr = (512 / 640) × 512 = 409.6 bytes/msec
Step 7 of 8
(f) Average time to locate and transfer a block = s + rd + btt = 30 + 12.5 + 1 = 43.5 msec.
Step 8 of 8
(g) Time to transfer 20 random blocks = 20 × (s + rd + btt) = 20 × 43.5 = 870 msec.
Time to transfer 20 consecutive blocks using double buffering (one seek and one rotational
delay, then 20 block transfers) = s + rd + 20 × btt = 30 + 12.5 + 20 = 62.5 msec.
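The arithmetic above can be checked with a short Python sketch; the variable names are mine, the numbers come from the problem statement.

```python
B, G = 512, 128            # block size and interblock gap (bytes)
blocks_per_track = 20
tracks_per_surface = 400
surfaces = 15 * 2          # 15 double-sided disks

track_total = blocks_per_track * (B + G)    # total track capacity
track_useful = blocks_per_track * B         # useful track capacity
cylinders = tracks_per_surface              # one track per surface per cylinder
cyl_total = surfaces * track_total
pack_total = cylinders * cyl_total

rotation_ms = 60_000 / 2400                 # 25 ms per revolution at 2,400 rpm
tr = track_total / rotation_ms              # transfer rate, bytes/msec
btt = B / tr                                # block transfer time, msec
rd = rotation_ms / 2                        # average rotational delay, msec
btr = (B / (B + G)) * tr                    # bulk transfer rate, bytes/msec

s = 30                                      # average seek time (msec)
one_block = s + rd + btt                    # locate and transfer one block
random_20 = 20 * one_block                  # 20 random blocks
consecutive_20 = s + rd + 20 * btt          # 20 consecutive, double buffering
```

Running this reproduces the figures derived step by step above.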
Chapter 16, Problem 35E
Problem
A file has r = 20,000 STUDENT records of fixed length. Each record has the following fields:
Name (30 bytes), Ssn (9 bytes), Address (40 bytes), PHONE (10 bytes), Birth_date (8 bytes),
Sex (1 byte), Major_dept_code (4 bytes), Minor_dept_code (4 bytes), Class_code (4 bytes,
integer), and Degree_program (3 bytes). An additional byte is used as a deletion marker. The file
is stored on the disk whose parameters are given in Exercise.
a. Calculate the record size R in bytes.
b. Calculate the blocking factor bfr and the number of file blocks b, assuming an unspanned
organization.
c. Calculate the average time it takes to find a record by doing a linear search on the file if (i) the
file blocks are stored contiguously, and double buffering is used; (ii) the file blocks are not stored
contiguously.
d. Assume that the file is ordered by Ssn; by doing a binary search, calculate the time it takes to
search for a record given its Ssn value.
Exercise
What are SATA, SAS, and FC protocols?
Step-by-step solution
Step 1 of 6
a. The record size R is the sum of the field sizes plus the deletion marker:
R = (30 + 9 + 40 + 10 + 8 + 1 + 4 + 4 + 4 + 3) + 1 = 114 bytes
Step 2 of 6
b. Using the disk of Problem 34 (B = 512 bytes), the blocking factor for an unspanned
organization is
bfr = floor(B / R) = floor(512 / 114) = 4 records per block
The number of file blocks is
b = ceiling(r / bfr) = ceiling(20,000 / 4) = 5,000 blocks
Step 3 of 6
c. A linear search reads b/2 = 2,500 blocks on average.
Step 4 of 6
(i) If the file blocks are stored contiguously and double buffering is used, only one seek and
one rotational delay are needed:
time = s + rd + (b/2) × btt = 30 + 12.5 + 2,500 × 1 = 2,542.5 msec
Step 5 of 6
(ii) If the file blocks are not stored contiguously, every block access needs its own seek and
rotational delay:
time = (b/2) × (s + rd + btt) = 2,500 × 43.5 = 108,750 msec
Step 6 of 6
d. A binary search accesses about ceiling(log2 b) = ceiling(log2 5,000) = 13 blocks:
time = 13 × (s + rd + btt) = 13 × 43.5 = 565.5 msec
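The same numbers can be verified in Python; the disk parameters (B = 512 bytes, s = 30 msec, rd = 12.5 msec, btt = 1 msec) are taken from Problem 34, and the names are mine.

```python
import math

field_sizes = [30, 9, 40, 10, 8, 1, 4, 4, 4, 3]   # Name .. Degree_program
R = sum(field_sizes) + 1                           # plus the deletion marker
B = 512
bfr = B // R                                       # unspanned blocking factor
b = math.ceil(20_000 / bfr)                        # number of file blocks

s, rd, btt = 30, 12.5, 1.0                         # msec, from Problem 34
linear_contiguous = s + rd + (b / 2) * btt         # contiguous, double buffered
linear_scattered = (b / 2) * (s + rd + btt)        # non-contiguous blocks
binary = math.ceil(math.log2(b)) * (s + rd + btt)  # ordered file, binary search
```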
Chapter 16, Problem 36E
Problem
Suppose that only 80% of the STUDENT records from Exercise have a value for Phone, 85% for
Major_dept_code, 15% for Minor_dept_code, and 90% for Degree_program; and suppose that
we use a variable-length record file. Each record has a 1-byte field type for each field in the
record, plus the 1-byte deletion marker and a 1-byte end-of-record marker. Suppose that we use
a spanned record organization, where each block has a 5-byte pointer to the next block (this
space is not used for record storage).
a. Calculate the average record length R in bytes.
b. Calculate the number of blocks needed for the file.
Exercise
What are solid-state drives (SSDs) and what advantage do they offer over HDDs?
Step-by-step solution
Step 1 of 3
A variable-length record file is used. Each record has a 1-byte field type for each field, plus
a 1-byte deletion marker and a 1-byte end-of-record marker.
The fields that are present in every record are Name, Ssn, Address, Birth_date, Sex, and
Class_code. Their fixed contribution is
30 + 9 + 40 + 8 + 1 + 4 = 92 bytes
For the optional fields (Phone, Major_dept_code, Minor_dept_code, Degree_program), the
expected number of bytes per record is
(0.80 × 10) + (0.85 × 4) + (0.15 × 4) + (0.90 × 3) = 8 + 3.4 + 0.6 + 2.7 = 14.7 bytes
Step 2 of 3
a.
Adding the 10 field-type bytes, the deletion marker, and the end-of-record marker (12 bytes in
all), the average record length is
R = 92 + 14.7 + 12 = 118.7 bytes
Step 3 of 3
b.
Since a spanned record organization is used and each block has a 5-byte pointer to the next
block, the usable space in each block is 512 − 5 = 507 bytes.
The number of blocks needed for the file is
b = ceiling((r × R) / 507) = ceiling((20,000 × 118.7) / 507) = ceiling(2,374,000 / 507)
= 4,683 blocks
Chapter 16, Problem 37E
Problem
Suppose that a disk unit has the following parameters; seek time s = 20 msec; rotational delay rd
= 10 msec; block transfer time btt= 1 msec; block size B = 2400 bytes; interblock gap size G =
600 bytes. An EMPLOYEE file has the following fields: Ssn, 9 bytes; Last_name, 20 bytes;
First_name, 20 bytes; Middle_init, 1 byte; Birth_date, 10 bytes; Address, 35 bytes; Phone, 12
bytes; Supervisor_ssn, 9 bytes; Department, 4 bytes; Job_code, 4 bytes; deletion marker, 1 byte.
The EMPLOYEE file has r = 30,000 records, fixed-length format, and unspanned blocking. Write
appropriate formulas and calculate the following values for the above EMPLOYEE file:
a. Calculate the record size R (including the deletion marker), the blocking factor bfr, and the
number of disk blocks b.
b. Calculate the wasted space in each disk block because of the unspanned organization.
c. Calculate the transfer rate tr and the bulk transfer rate btr for this disk unit (see Appendix B for
definitions of tr and btr).
d. Calculate the average number of block accesses needed to search for an arbitrary record in
the file, using linear search.
e. Calculate in msec the average time needed to search for an arbitrary record in the file, using
linear search, if the file blocks are stored on consecutive disk blocks and double buffering is
used.
f. Calculate in msec the average time needed to search for an arbitrary record in the file, using
linear search, if the file blocks are not stored on consecutive disk blocks.
g. Assume that the records are ordered via some key field. Calculate the average number of
block accesses and the average time needed to search for an arbitrary record in the file, using
binary search.
Step-by-step solution
Step 1 of 7
Consider the following parameters of a disk:
Seek time s = 20 msec
Rotational delay rd = 10 msec
Block transfer time btt = 1 msec
Block size B = 2400 bytes
Interblock gap size G = 600 bytes
The EMPLOYEE file has r = 30,000 fixed-length records with the following fields:
Ssn: 9 bytes
Last_name: 20 bytes
First_name: 20 bytes
Middle_init: 1 byte
Birth_date: 10 bytes
Address: 35 bytes
Phone: 12 bytes
Supervisor_ssn: 9 bytes
Department: 4 bytes
Job_code: 4 bytes
Deletion marker: 1 byte
Step 2 of 7
a. The record size R (including the deletion marker) is
R = 9 + 20 + 20 + 1 + 10 + 35 + 12 + 9 + 4 + 4 + 1 = 125 bytes
Since the file is unspanned, the blocking factor is
bfr = floor(B / R) = floor(2400 / 125) = 19 records per block
The number of file blocks is
b = ceiling(r / bfr) = ceiling(30,000 / 19) = 1,579 blocks
Step 3 of 7
b. With an unspanned organization, the wasted space in each disk block is
B − (bfr × R) = 2400 − (19 × 125) = 2400 − 2375 = 25 bytes
Step 4 of 7
c. The transfer rate is
tr = B / btt = 2400 / 1 = 2400 bytes/msec
The bulk transfer rate is
btr = tr × (B / (B + G)) = 2400 × (2400 / 3000) = 1920 bytes/msec
Step 5 of 7
d. For a linear search on an arbitrary record, the average number of block accesses is as
follows:
• If the search is on a key field and a record satisfies the condition, on average half of the
blocks must be read: b/2 = 1579/2 = 789.5 blocks.
• If no record satisfies the condition, or if the search is on a non-key field, all b = 1,579
blocks must be read.
e. For the average case, the search reads b/2 = 789.5 blocks.
If the file blocks are stored on consecutive disk blocks and double buffering is used, only one
seek and one rotational delay are needed:
time = s + rd + (b/2) × btt = 20 + 10 + 789.5 × 1 = 819.5 msec
Step 6 of 7
f. If the file blocks are not stored on consecutive disk blocks, every block access needs its
own seek and rotational delay:
time = (b/2) × (s + rd + btt) = 789.5 × (20 + 10 + 1) = 789.5 × 31 = 24,474.5 msec
Step 7 of 7
g. If the records are ordered on a key field, a binary search accesses about
ceiling(log2 b) = ceiling(log2 1,579) = 11 blocks, whether or not the record is found.
The average search time is therefore
time = 11 × (s + rd + btt) = 11 × 31 = 341 msec
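All of the EMPLOYEE-file figures can be reproduced with a small Python sketch (names are mine, parameters from the problem statement):

```python
import math

sizes = {"Ssn": 9, "Last_name": 20, "First_name": 20, "Middle_init": 1,
         "Birth_date": 10, "Address": 35, "Phone": 12, "Supervisor_ssn": 9,
         "Department": 4, "Job_code": 4, "deletion_marker": 1}
R = sum(sizes.values())                    # record size in bytes
B, G = 2400, 600
bfr = B // R                               # unspanned blocking factor
b = math.ceil(30_000 / bfr)                # number of file blocks
wasted = B - bfr * R                       # wasted bytes per block

s, rd, btt = 20, 10, 1                     # msec
tr = B / btt                               # transfer rate, bytes/msec
btr = tr * B / (B + G)                     # bulk transfer rate, bytes/msec
lin_consec = s + rd + (b / 2) * btt        # linear search, consecutive blocks
lin_scattered = (b / 2) * (s + rd + btt)   # linear search, scattered blocks
bin_accesses = math.ceil(math.log2(b))     # binary search block accesses
bin_time = bin_accesses * (s + rd + btt)   # binary search time, msec
```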
Chapter 16, Problem 39E
Problem
Load the records of Exercise into expandable hash files based on extendible hashing. Show the
structure of the directory at each step, and the global and local depths. Use the hash function
h(K) = K mod 128.
Exercise
What are optical and tape jukeboxes? What are the different types of optical media served by
optical drives?
Step-by-step solution
Step 1 of 10
Consider the following records:
2369, 3760, 4692, 4871, 5659, 1821, 1074, 7115, 1620, 2428, 3943, 4750, 6975, 4981 and
9208.
The hash function is h(K) = K mod 128.
Step 2 of 10
Calculate the hash value (bucket number) and its 7-bit binary representation for each record,
as follows:
2369: 65 = 1000001    3760: 48 = 0110000    4692: 84 = 1010100
4871: 7 = 0000111     5659: 27 = 0011011    1821: 29 = 0011101
1074: 50 = 0110010    7115: 75 = 1001011    1620: 84 = 1010100
2428: 124 = 1111100   3943: 103 = 1100111   4750: 14 = 0001110
6975: 63 = 0111111    4981: 117 = 1110101   9208: 120 = 1111000
Step 3 of 10
Now, perform extendible hashing, starting with local depth 0 and global depth 0. Each bucket
can hold two records.
The third record, 4692, cannot be inserted because the single bucket already holds two records.
Increase the global depth to 1 to allow more buckets; the local depth also becomes 1.
Check the binary value of each record. Map the record to directory entry 0 if its binary value
starts with 0, and to entry 1 if it starts with 1. For example, the binary value of the bucket
number for 2369 is 1000001; its first bit is 1, so it maps to entry 1. The binary value of the
bucket number for 3760 is 0110000; its first bit is 0, so it maps to entry 0.
Step 4 of 10
The next record cannot be inserted because all the blocks are filled.
Step 5 of 10
Now, increase the global depth to 2. Thus, check for the first two bits of the binary value of the
bucket number.
Now, insert the next record.
Step 6 of 10
The record 1821 cannot be inserted. Thus, increase the global depth to 3.
Step 7 of 10
Now, insert other records. The record 1074 can be inserted easily because there is a space in
the bucket.
Now, insert 7115.
Step 8 of 10
The record 7115 cannot be inserted. Now, increase the local depth to 3 for the last bucket and
insert the elements.
The records left are 6975, 4981 and 9208. The record 6975 cannot be inserted. Increase the
global depth to 4 and insert the elements.
Step 9 of 10
The last record cannot be inserted. Insert 9208 by increasing the local depth to 4 in the
corresponding block. The final table is as follows:
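The hash values and binary representations above can be recomputed with a few lines of Python:

```python
# Records and hash function from the problem; print each record's bucket
# number (K mod 128) and its 7-bit binary representation.
records = [2369, 3760, 4692, 4871, 5659, 1821, 1074, 7115,
           1620, 2428, 3943, 4750, 6975, 4981, 9208]
hashes = {k: k % 128 for k in records}
for k, h in hashes.items():
    print(f"{k}: bucket {h:3d} = {h:07b}")
```

Note that 4692 and 1620 both hash to bucket 84, so they always share a bucket.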
Chapter 16, Problem 40E
Problem
Load the records of Exercise into an expandable hash file, using linear hashing. Start with a
single disk block, using the hash function h0 = K mod 2^0, and show how the file grows and how
the hash functions change as the records are inserted. Assume that blocks are split whenever an
overflow occurs, and show the value of n at each stage.
Exercise
What are optical and tape jukeboxes? What are the different types of optical media served by
optical drives?
Step-by-step solution
Step 1 of 1
Applying the hash function h0 = K mod 2^0, all records fall into a single bucket.
When this bucket overflows, it is split into two buckets using h1 = K mod 2^1:
Bucket 0: 3760, 4692, 1074, 1620, 2428, 4750, 9208
Bucket 1: 2369, 4871, 5659, 1821, 7115, 3943, 6975, 4981
Further overflows split these buckets using h2 = K mod 2^2:
Bucket 0: 3760, 4692, 1620, 2428, 9208
Bucket 1: 2369, 1821, 4981
Bucket 2: 1074, 4750
Bucket 3: 4871, 5659, 7115, 3943, 6975
Since some buckets still hold more than two records, they are split using h3 = K mod 2^3:
Bucket 0: 3760, 9208
Bucket 1: 2369
Bucket 2: 1074
Bucket 3: 5659, 7115
Bucket 4: 4692, 1620, 2428
Bucket 5: 1821, 4981
Bucket 6: 4750
Bucket 7: 4871, 3943, 6975
Buckets 4 and 7 are still too large, so h4 = K mod 2^4 is applied:
Bucket 0: 3760
Bucket 1: 2369
Bucket 2: 1074
Bucket 4: 4692, 1620
Bucket 5: 4981
Bucket 7: 4871, 3943
Bucket 8: 9208
Bucket 11: 5659, 7115
Bucket 12: 2428
Bucket 13: 1821
Bucket 14: 4750
Bucket 15: 6975
Now all buckets hold at most two records.
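The bucket assignments at each stage can be checked with a short Python sketch (a simplification: it redistributes all records under K mod 2^i at once rather than splitting one block at a time):

```python
from collections import defaultdict

records = [2369, 3760, 4692, 4871, 5659, 1821, 1074, 7115,
           1620, 2428, 3943, 4750, 6975, 4981, 9208]

def distribute(i):
    """Assign each record to bucket K mod 2**i."""
    buckets = defaultdict(list)
    for k in records:
        buckets[k % 2 ** i].append(k)
    return dict(buckets)

for i in range(5):
    b = distribute(i)
    print(f"K mod 2^{i}: {len(b)} buckets, largest holds {max(map(len, b.values()))}")
```

At i = 4 every bucket holds at most two records, matching the final stage above.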
Chapter 16, Problem 41E
Problem
Compare the file commands listed in Section 16.5 to those available on a file access method you
are familiar with.
Step-by-step solution
Step 1 of 1
Compare the file commands of Section 16.5 (files of unordered records) with a typical file
access method.
In a heap (pile) file, records are placed in the file in the order in which they are inserted;
new records are inserted at the end of the file. The main operations on such a file are
inserting a record, deleting a record, and external sorting.
Inserting a record: Insertion of a new record is very efficient. The last block of the file is
copied into a buffer, the new record is added, and the block is rewritten back to disk.
Deleting a record: The program must first find the block containing the record, copy the block
into a buffer, delete the record from the buffer, and finally rewrite the block back to disk.
Deletion is commonly handled with a deletion marker rather than physical removal.
External sorting: When all records must be read in order of the values of some field, a sorted
copy of the file is created. For a large disk file this is expensive, so external sorting is
used.
Chapter 16, Problem 42E
Problem
Suppose that we have an unordered file of fixed-length records that uses an unspanned record
organization. Outline algorithms for insertion, deletion, and modification of a file record. State any
assumptions you make.
Step-by-step solution
Step 1 of 1
Compare the heap file (unordered file) with file organizations and access methods.
Heap file:
- The simplest and most basic type of organization.
- Records are placed in the file in the order in which they are inserted.
- Inserting a new record is very efficient: new records are appended at the end of the file.
- Searching involves a linear scan of the file, which is an expensive procedure.
File organization:
- The organization of the data of a file into records, blocks, and access structures.
- Records and blocks are placed on the storage medium and interlinked. Example: a sorted file.
Access methods:
- An access method provides a group of operations that can be applied to a file, for example
open, find, insert, delete, modify, and close.
- Several access methods may be applied to one file organization, but some access methods can
be applied only to files organized in certain ways:
- records organized serially (sequential organization)
- records addressed by relative record number (relative organization)
- records accessed through an index (indexed organization)
An access method refers to the way records are accessed. A file with an indexed or relative
organization may still have its records accessed sequentially, but records in a file with a
sequential organization cannot be accessed directly.
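The insertion, deletion, and modification outlines the problem asks for can be sketched in Python; an in-memory list of blocks stands in for disk blocks, and the blocking factor, record layout, and all names are my assumptions.

```python
BFR = 4                  # records per block (unspanned, fixed-length records)
blocks = [[]]            # each inner list stands in for one disk block

def insert(record):
    # append at the end of the file: start a new block if the last one is full
    if len(blocks[-1]) == BFR:
        blocks.append([])
    blocks[-1].append({"deleted": False, "data": record})

def delete(key):
    # linear search; set the deletion marker instead of physically removing
    for block in blocks:
        for rec in block:
            if not rec["deleted"] and rec["data"]["key"] == key:
                rec["deleted"] = True
                return True
    return False

def modify(key, field, value):
    # linear search for the record, then rewrite the field in place
    for block in blocks:
        for rec in block:
            if not rec["deleted"] and rec["data"]["key"] == key:
                rec["data"][field] = value
                return True
    return False
```

The deletion-marker approach matches the heap-file discussion above: deleted records stay in place until the file is reorganized.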
Chapter 16, Problem 43E
Problem
Suppose that we have an ordered file of fixed-length records and an unordered overflow file to
handle insertion. Both files use unspanned records. Outline algorithms for insertion, deletion, and
modification of a file record and for reorganizing the file. State any assumptions you make.
Step-by-step solution
Step 1 of 2
For the ordered file of fixed-length records:
Assume the file is named abc and is ordered on a numeric Key field in increasing order.
For insertion: let the Key value of the record to be inserted be n.
1. Open file abc and take the file pointer in variable fp
2. Find the first record where fp.key > n
3. Insert the new record at this position
4. Save the file data
5. Close the file
For deletion: let record to be deleted has value for key field = n
1. Open file abc and take file pointer in variable fp
2. Find record where fp.key = n
3. Delete the record.
4. Save result
5. Close file
For modification: let record to be modified has value of key field = n and value of Name is
to be modified to xyz.
1. Open file abc and take file pointer in variable fp
2. Find record where fp.key=n
3. Set fp.name = ‘xyz’
4. Save result
5. Close file.
For the unordered overflow file:
Step 2 of 2
For insertion: Let for record that is to be inserted value of Key field be n
1. Open file abc and take file pointer in variable fp
2. Seek end of file
3. Insert current record at this position.
4. Save the file data
5. Close file
For deletion: let record to be deleted has value for key field = n
1. Open file abc and take file pointer in variable fp
2. Find record where fp.key = n
3. Delete the record.
4. Save result
5. Close file
For modification: let record to be modified has value of key field = n and value of Name is
to be modified to xyz.
1. Open file abc and take file pointer in variable fp
2. Find record where fp.key = n
3. Set fp.name = ‘xyz’
4. Save result
5. Close file.
Chapter 16, Problem 44E
Problem
Can you think of techniques other than an unordered overflow file that can be used to make
insertions in an ordered file more efficient?
Step-by-step solution
Step 1 of 1
Yes. One possibility is to use an overflow file in which the records are chained together, in a
manner similar to the overflow handling for static hash files. The overflow records that belong
after each block of the ordered file are linked together in the overflow file, and a pointer to
the first record of each linked list is kept in the corresponding block of the main file.
The list may or may not be kept ordered.
Chapter 16, Problem 45E
Problem
Suppose that we have a hash file of fixed-length records, and suppose that overflow is handled
by chaining. Outline algorithms for insertion, deletion, and modification of a file record. State any
assumptions you make.
Step-by-step solution
Step 1 of 2
Assume that overflow is handled by chaining: within a bucket, multiple blocks are chained
together, with a number of overflow blocks attached to the main bucket.
Insertion proceeds as follows:
Step 1: Each bucket j stores a local depth i_j; all directory entries that point to the same
bucket agree on the first i_j bits of the hash value.
Step 2: To locate the bucket containing search-key value K, compute h(K) and use its first
(high-order) bits as a displacement into the bucket address table, then follow the pointer to
the appropriate bucket.
Step 3: To insert a record with search-key value K, follow the lookup procedure to locate the
bucket, say bucket j. If there is room in bucket j, insert the record; otherwise place the
record in a chained overflow block (or split the bucket and reattempt the insertion).
Step 2 of 2
Deletion in the hash file: to delete a key value,
Step 1: Locate the record in its bucket (following the overflow chain if necessary) and remove
it.
Step 2: The bucket itself can be removed if it becomes empty.
Step 3: Coalescing of buckets is possible, but a bucket can only coalesce with a "buddy" bucket
having the same local depth and the same bit prefix, if such a bucket exists.
Assumptions:
- Each key in the file is unique
- The data file is open
- The overflow file is open
- A bucket record structure has been defined
Chapter 16, Problem 46E
Problem
Can you think of techniques other than chaining to handle bucket overflow in external hashing?
Step-by-step solution
Step 1 of 5
To handle a bucket overflow in external hashing, there is a techniques like chaining and TrieBased hashing.
Through this technique:
- it allow the number of allocated buckets to grow and shrink as needed.
- Distributes records among buckets based on the values of the leading bits in their hash values.
We can show this technique by the following.
Let bucket of disk address is
Comment
Step 2 of 5
Comment
Step 3 of 5
Over flow is done by,
the bucket (block) based on the first binary digit of the hash address.
So, the address is split into
Comment
Step 4 of 5
Comment
Step 5 of 5
Here bulk flow is done and now again it is split on 2nd bit in the hash address
Ti show this,
Suppose we have:
If we want to inset
Comment
in the previous structure thour the structure is comes like this
Chapter 16, Problem 47E
Problem
Write pseudocode for the insertion algorithms for linear hashing and for extendible hashing.
Step-by-step solution
Step 1 of 2
Pseudocode for the insertion algorithms follows. Assume that the elements of the hash table T
are keys with no satellite information, so the key K is identical to the element containing it,
and every slot contains either a key or NIL.
Insertion by probing slots until a free one is found:
HASH-INSERT(T, K)
i = 0
repeat
j = h(K, i)
if T[j] == NIL
then T[j] = K
return j
else i = i + 1
until i == m
error "hash table overflow"
Step 2 of 2
Pseudocode for the insertion algorithm for extendible hashing, sketched here with a bucket
table of linked lists:
Algorithm: initialize(numBuckets)
Input: desired number of buckets
1. Initialize the array of linked lists.
Algorithm: insert(key, value)
Input: a key-value pair
// compute the table entry:
entry = key.hashCode() mod numBuckets
if table[entry] is null
// no list present, so create one
table[entry] = new linked list
table[entry].add(key, value)
else
// otherwise, add to the existing list
table[entry].add(key, value)
end if
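As a runnable cross-check of extendible-hashing insertion, here is a Python sketch with directory doubling and bucket splitting. Bucket capacity 2 matches Problem 39; indexing the directory by low-order bits (rather than the leading bits used in the worked answer above) and all names are my choices.

```python
class Bucket:
    def __init__(self, local_depth):
        self.ld = local_depth    # local depth of this bucket
        self.keys = []

class ExtendibleHash:
    def __init__(self, capacity=2):
        self.capacity = capacity
        self.gd = 0                      # global depth
        self.dir = [Bucket(0)]           # directory of bucket pointers

    def insert(self, key):
        while True:
            b = self.dir[key & ((1 << self.gd) - 1)]
            if len(b.keys) < self.capacity:
                b.keys.append(key)
                return
            if b.ld == self.gd:          # directory must double first
                self.dir = self.dir + self.dir
                self.gd += 1
            # split bucket b on its next bit and redistribute its keys
            b.ld += 1
            nb = Bucket(b.ld)
            mask = 1 << (b.ld - 1)
            for j in range(len(self.dir)):
                if self.dir[j] is b and (j & mask):
                    self.dir[j] = nb     # repoint half of b's directory entries
            old, b.keys = b.keys, []
            for k in old:
                (nb if k & mask else b).keys.append(k)

    def lookup(self, key):
        return key in self.dir[key & ((1 << self.gd) - 1)].keys
```

Inserting the fifteen hash values from Problem 39 leaves every bucket with at most two keys, as in the worked directory above.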
Chapter 16, Problem 48E
Problem
Write program code to access individual fields of records under each of the following
circumstances. For each case, state the assumptions you make concerning pointers, separator
characters, and so on. Determine the type of information needed in the file header in order for
your code to be general in each case.
a. Fixed-length records with unspanned blocking
b. Fixed-length records with spanned blocking
c. Variable-length records with variable-length fields and spanned blocking
d. Variable-length records with repeating groups and spanned blocking
e. Variable-length records with optional fields and spanned blocking
f. Variable-length records that allow all three cases in parts c, d, and e
Step-by-step solution
Step 1 of 6
a.
Consider the following sketch for fixed-length records with unspanned blocking. The record size
R and the byte offset of each field are assumed to come from the file header.
/* Assumptions: the block holding the record is already in memory at
starting_location; records are fixed length (R bytes) and do not span
blocks; the desired field begins at byte offset off_y within a record. */
char *starting_location = buffer;  /* address of the block in memory */
int R = 25;                        /* record size in bytes (from file header) */
int x = 5;                         /* we want the fifth record in the block */
int off_y = 2;                     /* byte offset of the desired field */
char *field = starting_location + (x - 1) * R + off_y;
• Records are stored in blocks; when the record size is less than the block size, each block
stores more than one record.
• For this code to be general, the file header must contain the record size R and the offset of
each field within a record.
Step 2 of 6
b.
Consider the following sketch for fixed-length records with spanned blocking. A record may
continue in the next block, so the header must also describe how blocks are linked.
/* Assumptions: R is the record size, B the block size; each block ends with
a pointer to the next block of the file, and a record that does not fit in
the remaining bytes of a block continues at the start of the next block. */
int R = 25, B = 512;
int x = 5, off_y = 2;              /* fifth record, field at byte offset 2 */
long byte_pos = (long)(x - 1) * R + off_y;  /* logical byte position in file */
int block_no = byte_pos / B;       /* which block of the file holds the byte */
int offset = byte_pos % B;         /* offset of the field inside that block */
/* read block block_no (following the block pointers), then access
buffer + offset; if the field itself crosses a block boundary, the
remaining bytes are read from the start of the following block. */
• For this code to be general, the file header must contain the record size, the field offsets,
the block size, and the location of the next-block pointer.
Step 3 of 6
c.
Consider the following code for variable length records with variable length fields and spanned
blocking.
//initialize the initial address of starting location
*starting_location=200;
// record_to_access
int x
//x is the fifth record in the field
x = 5;
//y is the second field of the fifth record
y = 2;
//record_size
R=25;
//a is field size
int a=1
// ReadFirstByte reads the first byte of the current field and returns true
// if it indicates an empty (absent) field
empty = ReadFirstByte(a);
// if the field is present, add its length to the running record length
if (!empty)
{
crnt_Rcrd_Length += a.length();
}
// if the record length read so far is not yet R, the record is incomplete
if (crnt_Rcrd_Length != R)
{
empty = false;
}
// if the record length exceeds the block boundary, the record spans into the
// next block, so save the partial record and continue there
if (crnt_Rcrd_Length > R)
{
records.push_back(*this);
}
• In the above code, assume that each record ends with an end-of-record byte.
• The record is scanned byte by byte to access its fields.
• For this code to be general, the file header must record which fields are variable length and
the separator characters used.
Step 4 of 6
d.
Consider the following code for variable length records with repeating group and spanned
blocking.
if (! empty)
{
// update the value of crnt_Rcrd_Length
crnt_Rcrd_Length += a.length ();
}
• The code fragment above is the portion of the part (c) code that is removed to handle
variable-length records with repeating groups and spanned blocking.
• Since spanned blocking lets records span more than one block, the fixed record-length check
is not required.
Step 5 of 6
e.
Consider the following code for variable length records with optional field and spanned blocking.
if (crnt_Rcrd_Length!= R)
{
empty = false;
}
• The code fragment above is the portion of the part (c) code that is removed to handle
variable-length records with optional fields and spanned blocking.
• As some of the fields in the file records are optional, the check of the record length
against a fixed R can be skipped.
Step 6 of 6
f.
Consider the following code for variable length records that allow all three cases in parts c, d and
e.
if (crnt_Rcrd_Length > R)
{
// not efficient, nor thread safe - deep copy occurs here
records.push_back(*this);
}
• The code fragment above is the portion of the part (c) code that is removed to handle
variable-length records that allow all three cases in parts c, d, and e.
• One or more fields of the records are of varying size, so their total size need not exceed R;
hence this part of the code can be skipped.
Chapter 16, Problem 49E
Problem
Suppose that a file initially contains r = 120,000 records of R = 200 bytes each in an unsorted
(heap) file. The block size B = 2,400 bytes, the average seek time s = 16 ms, the average
rotational latency rd = 8.3 ms, and the block transfer time btt = 0.8 ms. Assume that 1 record is
deleted for every 2 records added until the total number of active records is 240,000.
a. How many block transfers are needed to reorganize the file?
b. How long does it take to find a record right before reorganization?
c. How long does it take to find a record right after reorganization?
Step-by-step solution
Step 1 of 4
Let X be the number of records deleted; then 2X records are added.
Total active records = 240,000 = 120,000 − X + 2X, so X = 120,000.
The number of records physically present before reorganization (including records that are only
marked as deleted) is 120,000 + 240,000 = 360,000.
Step 2 of 4
(a)
Number of block transfers for reorganization = blocks read + blocks written.
- 200 bytes/record and 2,400 bytes/block give 12 records per block.
- Reading involves 360,000 records: 360,000 / 12 = 30,000 blocks.
- Writing involves 240,000 records: 240,000 / 12 = 20,000 blocks.
Total blocks transferred during reorganization = 30,000 + 20,000
= 50,000 blocks.
Step 3 of 4
(b)
On average, half the file must be read. Right before reorganization the file has 30,000 blocks,
so
Time = (b/2) × btt = 15,000 × 0.8 ms = 12,000 ms = 12 sec.
Step 4 of 4
(c)
After reorganization the file has 240,000 / 12 = 20,000 blocks, so
Time to locate a record = (b/2) × btt = 10,000 × 0.8 = 8,000 ms = 8 sec.
Chapter 16, Problem 50E
Problem
Suppose we have a sequential (ordered) file of 100,000 records where each record is 240 bytes.
Assume that B = 2,400 bytes, s = 16 ms, rd = 8.3 ms, and btt = 0.8 ms. Suppose we want to
make X independent random record reads from the file. We could make X random block reads or
we could perform one exhaustive read of the entire file looking for those X records. The question
is to decide when it would be more efficient to perform one exhaustive read of the entire file than
to perform X individual random reads. That is, what is the value for X when an exhaustive read of
the file is more efficient than random X reads? Develop this as a function of X.
Step-by-step solution
Step 1 of 3
The records in the file are ordered sequentially.
Total number of records in the file (Tr) = 100000.
Size of each record (rs) = 240 bytes.
Size of each block (B) = 2400 bytes.
Average seek time (s) = 16 ms.
Average rotational latency (rd) = 8.3 ms.
Block transfer time (btt) = 0.8 ms.
Calculate the total number of blocks (TB) in the file:
TB = (Tr × rs) / B = (100,000 × 240) / 2,400 = 10,000 blocks.
Step 2 of 3
Calculate the time required for an exhaustive read (er), assuming the blocks are read
sequentially after one initial seek and rotational delay:
er = s + rd + (TB × btt) = 16 + 8.3 + (10,000 × 0.8) = 8,024.3 ms.
Step 3 of 3
Let X be the number of records to be read. Each individual random read costs
s + rd + btt = 16 + 8.3 + 0.8 = 25.1 ms.
An exhaustive read of the entire file is more efficient than X individual random reads when
X × 25.1 ms > 8,024.3 ms, that is, X > 319.7.
Therefore, when 320 or more individual random reads are required, it is better to read the file
exhaustively.
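The break-even point can be confirmed with a few lines of Python (names are mine):

```python
import math

s, rd, btt = 16, 8.3, 0.8
TB = (100_000 * 240) // 2400        # total number of blocks in the file
exhaustive = s + rd + TB * btt      # one full sequential pass, in ms
per_random = s + rd + btt           # cost of one random block read, in ms
X = math.ceil(exhaustive / per_random)  # smallest X where exhaustive wins
print(X)
```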
Chapter 16, Problem 51E
Problem
Suppose that a static hash file initially has 600 buckets in the primary area and that records are
inserted that create an overflow area of 600 buckets. If we reorganize the hash file, we can
assume that most of the overflow is eliminated. If the cost of reorganizing the file is the cost of
the bucket transfers (reading and writing all of the buckets) and the only periodic file operation is
the fetch operation, then how many times would we have to perform a fetch (successfully) to
make the reorganization cost effective? That is, the reorganization cost and subsequent search
cost are less than the search cost before reorganization. Support your answer. Assume s = 16
msec, rd = 8.3 msec, and btt = 1 msec.
Step-by-step solution
Step 1 of 1
Primary area = 600 buckets
Overflow area = 600 buckets
Total reorganization cost = buckets read + buckets written
= (600 + 600) + 1,200 = 2,400 bucket transfers
= 2,400 × (1 ms) = 2,400 ms
Let X = number of successful random fetches from the file.
Average search time per fetch before reorganization = time to access 1.5 buckets, since 50% of
the time the overflow bucket must also be accessed.
Access time for one bucket = s + rd + btt
= 16 + 8.3 + 1
= 25.3 ms
Time with reorganization for X fetches = 2,400 + X × (25.3) ms
Time without reorganization for X fetches = X × (25.3) × (1 + 1/2) ms
= 1.5 × X × (25.3) ms
The reorganization pays off when
2,400 + 25.3X < 1.5 × 25.3X
2,400 < 12.65X
189.7 < X
If at least 190 fetches are performed, the reorganization is worthwhile.
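The break-even count can be checked in Python, using btt = 1 msec as given in the problem statement (names are mine):

```python
s, rd, btt = 16, 8.3, 1
access = s + rd + btt              # cost of one bucket access, in ms
reorg = 2400 * btt                 # 1,200 buckets read + 1,200 written, in ms
# break even when: reorg + X*access < 1.5 * X * access, i.e. X > reorg / (0.5*access)
X_break = reorg / (0.5 * access)
fetches_needed = int(X_break) + 1  # smallest whole number of fetches
print(fetches_needed)
```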
Chapter 16, Problem 52E
Problem
Suppose we want to create a linear hash file with a file load factor of 0.7 and a blocking factor of
20 records per bucket, which is to contain 112,000 records initially.
a. How many buckets should we allocate in the primary area?
b. What should be the number of bits used for bucket addresses?
Step-by-step solution
Step 1 of 2
(a)
Number of buckets in the primary area = 112,000 / (20 × 0.7)
= 112,000 / 14
= 8,000 buckets.
Step 2 of 2
(b)
Let K be the number of bits used for bucket addresses, so that 2^K ≤ 8,000 ≤ 2^(K+1).
2^12 = 4,096
2^13 = 8,192
Therefore K = 12.
Boundary value = 8,000 − 2^12
= 8,000 − 4,096
= 3,904
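The sizing in parts (a) and (b) can be computed directly (the function name is illustrative):

```python
# Sizing a linear hash file, using the exercise's numbers.
import math

def linear_hash_sizing(records, bfr, load_factor):
    buckets = math.ceil(records / (bfr * load_factor))   # primary-area buckets
    k = buckets.bit_length() - 1                         # largest k with 2**k <= buckets
    boundary = buckets - 2**k                            # boundary value
    return buckets, k, boundary

print(linear_hash_sizing(112_000, 20, 0.7))   # -> (8000, 12, 3904)
```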
Chapter 17, Problem 1RQ
Problem
Define the following terms: indexing field, primary key field, clustering field, secondary key field,
block anchor, dense index, and nondense (sparse) index.
Step-by-step solution
Step 1 of 1
Define the following terms:
Indexing field: A record structure consists of several fields, and an index access structure is usually defined on a single field of a file, called the indexing field. Any field of the file can be used to create an index, and multiple indexes on different fields can be constructed for the same file.
Primary key field: The key ordering field of an ordered file of records; its value uniquely identifies each record.
Clustering field: A non-key ordering field of the file, so that several records can share the same value; records with the same clustering field value are physically clustered together on disk.
Secondary key field: A field, other than the ordering field, on which a secondary access structure is built and which has a distinct value in every record.
Block anchor: When the index has one entry per disk block of the ordered data file (so the total number of index entries equals the number of blocks), the first record in each block, whose key value appears in the index entry, is called the block anchor.
Dense index: An index that has an index entry for every search key value (and hence every record) in the data file. Each index entry contains the search key value and a pointer to the record on disk.
Nondense (sparse) index: An index that has entries for only some of the search values.
Chapter 17, Problem 2RQ
Problem
What are the differences among primary, secondary, and clustering indexes? How do these
differences affect the ways in which these indexes are implemented? Which of the indexes are
dense, and which are not?
Step-by-step solution
Step 1 of 1
Differences among primary, secondary, and clustering indexes:
• A primary index is defined on the key ordering field of an ordered file. It is a nondense (sparse) index, with one entry per data block, using block anchors.
• A clustering index is defined on a non-key ordering field of an ordered file. It is also nondense, with one entry per distinct value of the clustering field.
• A secondary index is defined on a field that the file is not ordered on. A secondary index on a key field is dense, with one entry per record; on a non-key field it needs either repeated entries or an extra level of indirection.
These differences affect implementation: primary and clustering indexes depend on the physical ordering of the file, so a file can have at most one of them, while any number of secondary indexes can be created. Primary and clustering indexes are nondense; a secondary index on a key field is dense.
Chapter 17, Problem 3RQ
Problem
Why can we have at most one primary or clustering index on a file, but several secondary
indexes?
Step-by-step solution
Step 1 of 2
A primary index is defined on an ordered file of fixed-size records using its key field. A clustering index is defined on a non-key ordering field and contains one entry, with a block pointer, for each distinct value of that field.
Because the data records must be kept physically ordered, adding or removing records is expensive. To ease this problem, a whole block (or cluster of blocks) can be reserved for each value of the clustering field.
Step 2 of 2
A secondary index is defined on a field that the file is not ordered on. It can be defined on a key field with a unique value in every record, or on a non-key field with repeated values, in which case the index entries point to a block of pointers to all the records having that value.
The reason there can be at most one primary or clustering index, but several secondary indexes, is the following:
• Primary and clustering indexes both rely on the physical ordering of the file, and a file can be physically ordered on only one field, so a file cannot have both (or two of either). Secondary indexes do not depend on the physical order of the records, so any number of them can be created, on key fields with unique values or on non-key fields with repeated values.
Chapter 17, Problem 4RQ
Problem
How does multilevel indexing improve the efficiency of searching an index file?
Step-by-step solution
Step 1 of 4
Solution:
Multilevel indexing improves the efficiency of searching an index file.
• The main idea of a multilevel index is to reduce the part of the index that must be searched at each step.
• The reduction factor at each step is the blocking factor of the index (the fan-out fo), rather than the factor of 2 of binary search, so the search space shrinks much faster.
A multilevel index treats the first-level index as an ordered file with a distinct value for each entry and builds a primary index on it; repeating this creates the second level, third level, and so on, each built out of single index blocks.
Step 2 of 4
To improve the efficiency of searching the index file, a multilevel index is built as follows:
Step 1:
• Consider the first-level index file: an ordered file with a distinct value for each key K(i).
Step 2:
• Create a primary index for the first level, using block anchors, so that there is one entry in the second level for each block of the first level.
• The blocking factor for the second level, and for all subsequent levels, is the same fan-out fo as for the first level.
• If the first level has r1 entries, then the first level needs ⌈r1/fo⌉ blocks, which is therefore the number of entries r2 of the second level.
Step 3:
• In the next level, the index has one entry for each second-level block, so the number of entries in the third level is r3 = ⌈r2/fo⌉.
• The process is repeated until all the entries of some index level t fit in a single block.
• The block at the t-th level is the top index level; each level reduces the number of entries at the previous level by a factor of fo.
Step 3 of 4
Use the following formula to calculate the number of levels t:
t = ⌈log_fo(r1)⌉
where r1 is the number of first-level entries and fo is the fan-out.
Hence a multilevel index has approximately ⌈log_fo(r1)⌉ levels for r1 first-level entries, and a search reads one block per level.
From the above steps and process, multilevel indexing improves the efficiency of searching an index file.
Step 4 of 4
The ways in which multilevel indexing improves the efficiency of searching an index file are:
• While searching for a record, it reduces the number of block accesses for a given indexing field value: one block access per level, instead of a binary search over all the index blocks.
• It reduces the insertion and deletion problems of a single-level index, because some space can be left free in each block for new entries, which is an advantage of adopting multilevel indexing.
• It is often implemented using B-trees and B+-trees.
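The level-counting recurrence above can be sketched as follows (the function name is illustrative):

```python
# Number of levels in a multilevel index: each level divides the number of
# entries by the fan-out fo, so t = ceil(log_fo(r1)).
import math

def index_levels(first_level_entries, fan_out):
    levels = 0
    n = first_level_entries
    while n > 1:
        n = math.ceil(n / fan_out)   # blocks needed at this level
        levels += 1
    return levels

# Example: 30,000 first-level entries with a fan-out of 34.
print(index_levels(30_000, 34))   # -> 3
```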
Chapter 17, Problem 5RQ
Problem
What is the order p of a B-tree? Describe the structure of B-tree nodes.
Step-by-step solution
Step 1 of 2
Order p of a B-tree:
A B-tree of order p is a tree in which each node contains at most p − 1 search key values and at most p tree pointers, where each tree pointer points to a child node and each search key value comes from some ordered set of values.
Step 2 of 2
Structure of B-tree nodes:
The structure of a B-tree of order p satisfies the following constraints.
Step 1: Each internal node is of the form ⟨P1, ⟨K1, Pr1⟩, P2, ⟨K2, Pr2⟩, …, ⟨Kq−1, Prq−1⟩, Pq⟩, where q ≤ p. Each Pi is a tree pointer, and each Pri is a data pointer that points to the record whose search key field value is Ki (or to the block containing that record).
Step 2: Within each node, K1 < K2 < … < Kq−1.
Step 3: For all search key field values X in the subtree pointed at by Pi, we have Ki−1 < X < Ki for 1 < i < q, X < K1 for i = 1, and Kq−1 < X for i = q.
Step 4: Each node has at most p tree pointers.
Step 5: Each node, except the root and leaf nodes, has at least ⌈p/2⌉ tree pointers; the root node has at least two tree pointers unless it is the only node in the tree.
Step 6: A node with q tree pointers, q ≤ p, has q − 1 search key field values (and hence q − 1 data pointers).
Step 7: All leaf nodes are at the same level. Leaf nodes have the same structure as internal nodes except that all of their tree pointers Pi are null.
Chapter 17, Problem 6RQ
Problem
What is the order p of a B+-tree? Describe the structure of both internal and leaf nodes of a B+-tree.
Step-by-step solution
Step 1 of 4
Order p of a B+-tree:
A B+-tree is a variation of the B-tree data structure used to implement a dynamic multilevel index. Its order p is the maximum number of tree pointers that can appear in an internal node.
Step 2 of 4
Structure of internal nodes of a B+-tree:
Each internal node is of the form ⟨P1, K1, P2, K2, …, Pq−1, Kq−1, Pq⟩, where q ≤ p and each Pi is a tree pointer.
Step 3 of 4
The internal nodes satisfy the following constraints:
Step 1: Each internal node has the form given above, with q ≤ p tree pointers.
Step 2: Within each internal node, K1 < K2 < … < Kq−1.
Step 3: For all search field values X in the subtree pointed at by Pi, we have Ki−1 < X ≤ Ki for 1 < i < q, X ≤ K1 for i = 1, and Kq−1 < X for i = q.
Step 4: Each internal node has at most p tree pointers.
Step 5: Each internal node, except the root, has at least ⌈p/2⌉ tree pointers. The root node has at least two tree pointers if it is an internal node.
Step 6: An internal node with q pointers, q ≤ p, has q − 1 search field values.
Step 4 of 4
Structure of leaf nodes of a B+-tree:
Step 1: Each leaf node is of the form ⟨⟨K1, Pr1⟩, ⟨K2, Pr2⟩, …, ⟨Kq, Prq⟩, Pnext⟩, where q ≤ pleaf, each Pri is a data pointer, and Pnext points to the next leaf node of the B+-tree.
Step 2: Within each leaf node, K1 ≤ K2 ≤ … ≤ Kq.
Step 3: Each Pri is a data pointer that points to the record whose search field value is Ki, or to a file block containing the record.
Step 4: Each leaf node has at least ⌈pleaf/2⌉ values.
Step 5: All leaf nodes are at the same level.
Chapter 17, Problem 7RQ
Problem
How does a B-tree differ from a B+-tree? Why is a B+-tree usually preferred as an access
structure to a data file?
Step-by-step solution
Step 1 of 1
The main difference between a B-tree and a B+-tree is:
A B-tree has data pointers in both internal and leaf nodes, whereas a B+-tree has only tree pointers in internal nodes; all data pointers are in the leaf nodes.
A B+-tree is usually preferred as an access structure to a data file because its internal nodes hold more entries, leading to fewer levels and improved search time.
In addition, the entire tree can be traversed in order using the next pointers that link the leaf nodes.
Chapter 17, Problem 8RQ
Problem
Explain what alternative choices exist for accessing a file based on multiple search keys.
Step-by-step solution
Step 1 of 3
Choices for accessing a file based on multiple fields are:
1. Ordered Index on Multiple Attributes: Here an index is created on a search key field that is a combination of attributes ⟨A1, A2, …, An⟩; the search key values are tuples of n values ⟨v1, v2, …, vn⟩.
A lexicographic ordering of these tuple values establishes an order on this composite search key. Lexicographic ordering works similarly to the ordering of character strings. An index on a composite key of n attributes works similarly to primary or secondary indexing on a single attribute.
Step 2 of 3
2. Partitioned Hashing: Partitioned hashing is an extension of static external hashing that allows access on multiple keys. It is only suitable for equality comparisons; range queries are not supported. In partitioned hashing, for a key consisting of n components, the hash function is designed to produce a result with n separate hash addresses. The bucket address is a concatenation of these n addresses. It is then possible to search for the required composite search key by looking up the appropriate buckets that match the parts of the address in which we are interested.
For example, consider a composite search key in which Dno is hashed to 3 bits and Age to 5 bits, giving an 8-bit bucket address. Suppose a given Dno value hashes to '100' and Age = 59 hashes to '10101'; to search for that combination, we look up bucket address 10010101.
An advantage of partitioned hashing is that it can be easily extended to any number of attributes. The bucket addresses can be designed so that high-order bits in the address correspond to more frequently accessed attributes. Additionally, no separate access structure needs to be maintained for the individual attributes. The main drawback of partitioned hashing is that it cannot handle range queries on any of the component attributes.
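The address composition can be sketched as follows; `partial_hash` is an illustrative stand-in for a real per-component hash function, and the bit widths follow the example above:

```python
# Partitioned hashing: each key component is hashed to a fixed number of bits,
# and the bucket address is the concatenation of the pieces.

def partial_hash(value, bits):
    # Illustrative component hash: reduce the value modulo 2**bits.
    return value % (1 << bits)

def bucket_address(dno, age, dno_bits=3, age_bits=5):
    # Concatenate the 3-bit Dno hash and the 5-bit Age hash.
    return (partial_hash(dno, dno_bits) << age_bits) | partial_hash(age, age_bits)

# All records with the same (Dno, Age) pair land in the same bucket:
addr = bucket_address(4, 59)
print(f"{addr:08b}")   # prints 10011011
```

A search on Dno alone would examine all 2^5 buckets whose address starts with Dno's 3-bit hash, which is why partitioned hashing favors the more frequently accessed attributes in the high-order bits.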
Step 3 of 3
3. Grid Files: We can construct a grid array with one linear scale for each of the search attributes. This method is particularly useful for range queries, which map into a set of cells corresponding to a group of values along the linear scales. Conceptually, the grid file concept may be applied to any number of search keys: for n search keys, the grid array has n dimensions, one per attribute, and provides access by combinations of values along those dimensions.
Grid files perform well in terms of reducing the time for multiple-key access. However, they represent a space overhead in terms of the grid array structure. Moreover, with dynamic files, frequent reorganization of the file adds to the maintenance cost.
Chapter 17, Problem 9RQ
Problem
What is partitioned hashing? How does it work? What are its limitations?
Step-by-step solution
Step 1 of 1
Partitioned hashing:
Partitioned hashing is an extension of static external hashing that allows access on multiple keys. The hash value is split into segments, one segment depending on each attribute of the composite search key.
Working of partitioned hashing:
In partitioned hashing, for a key consisting of n components, the hash function is designed to produce n separate hash addresses.
The bucket address is the concatenation of these n addresses.
It is then possible to search for the required composite search key by looking up the appropriate buckets that match the parts of the address in which we are interested.
For example, if one key component hashes to 3 bits and another to 5 bits, the bucket address is the 8-bit concatenation of the two partial addresses: an equality search on both components looks up exactly one bucket, while a search on only one component must look up all buckets matching that part of the address.
Limitations of partitioned hashing:
It is suitable only for equality comparisons; it cannot handle range queries on any of the component attributes.
There is no separate access structure for the individual attributes, so a search on a single component must examine many buckets.
Chapter 17, Problem 11RQ
Problem
Show an example of constructing a grid array on two attributes on some file.
Step-by-step solution
Step 1 of 2
Take a grid array for the EMPLOYEE file with one linear scale for Dno and another for the Age attribute.
Linear scale for Dno:
Scale value:  0    1    2   3    4   5
Dno values:   1,2  3,4  5   6,7  8   9,10
Step 2 of 2
Linear scale for Age:
Scale value:  0    1      2      3      4      5
Age range:    <20  21-25  26-30  31-40  41-50  >50
Through this data we want to show that the linear scale for Dno groups values: Dno values 1 and 2 are combined as the single value 0 on the scale, while Dno value 5 corresponds to the value 2 on the scale. Age is divided into its own scale of 0 to 5 by grouping ages so as to distribute the employees uniformly by age.
For this, the grid array has 6 × 6 = 36 cells, and each cell points to some bucket address where the records corresponding to that cell are stored.
A request for a particular combination of Dno and Age maps into one cell of the grid array, and the matching records will be found in the corresponding bucket. For n search keys, the grid array would have n dimensions.
Grid array on the Dno and Age attributes of the EMPLOYEE file.
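The cell lookup against the two linear scales can be sketched as follows (the scale tables mirror the example above):

```python
# Grid-file cell lookup using the two linear scales above.
import bisect

# Dno scale: each Dno value maps to a scale position 0..5.
DNO_SCALE = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 3, 7: 3, 8: 4, 9: 5, 10: 5}
# Age scale: upper bounds of the ranges <20, 21-25, 26-30, 31-40, 41-50, >50.
AGE_BOUNDS = [20, 25, 30, 40, 50]

def grid_cell(dno, age):
    # The cell coordinates index into the 6 x 6 grid array.
    return (DNO_SCALE[dno], bisect.bisect_left(AGE_BOUNDS, age))

print(grid_cell(5, 32))   # Dno 5 -> scale 2, Age 32 -> range 31-40 -> scale 3
```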
Chapter 17, Problem 12RQ
Problem
What is a fully inverted file? What is an indexed sequential file?
Step-by-step solution
Step 1 of 2
Fully inverted file:
If all indexes are secondary and new records are inserted at the end of the file, then the data file itself is an unordered file. A file that has a secondary index on every one of its fields is called a fully inverted file. Usually these indexes are implemented as B+-trees and are updated dynamically to reflect insertion or deletion of records.
Step 2 of 2
Indexed sequential file:
An indexed sequential file is a sequential file that also has an index. Sequential means it is stored in order of a key field.
Indexed sequential files are important for applications where data needs to be accessed both sequentially and randomly using the index, and they allow fast access to a specific record.
For example, an organization may store the details of its employees as an indexed sequential file, and the file is sometimes accessed:
Sequentially: for example, when the whole file is processed to produce pay slips at the end of the month.
Randomly: for example, when an employee changes address, or a female employee gets married and changes her surname.
An indexed sequential file can only be stored on a random access device, for example a magnetic disk.
Chapter 17, Problem 13RQ
Problem
How can hashing be used to construct an index?
Step-by-step solution
Step 1 of 1
The hashing technique is used for searching where fast retrieval of records is necessary. The file organized this way is known as a hash file, and the search condition is an equality condition on the hash key.
How hashing can be used to construct an index:
• A hash function h (a randomizing function) is applied to the hash field value of a record to determine the address where the record is stored.
• Hashing can also be used as an internal search structure within a program, whenever a group of records is accessed by the value of only one field.
• Access structures similar to indexes can be based on hashing: a hash index is a secondary structure that gives access to the file by applying a hash function to a search key.
• The index entries contain the key K and a pointer Pr, which points either to the record containing the key or to the block containing that record.
• The index file holding these entries can be organized as a dynamically expandable hash file, using dynamic, linear, or extendible hashing; searching for an entry is performed by the hash search algorithm on K.
• Once an entry is found, the pointer Pr is used to locate the corresponding record in the data file.
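A minimal in-memory sketch of such a hash index (the class, its 64-bucket directory, and the sample data are illustrative, not from the text):

```python
# A hash index maps a search key K to a record pointer; here the "pointer"
# is simply the record's position in a small in-memory data file.
from collections import defaultdict

class HashIndex:
    def __init__(self, n_buckets=64):
        self.n = n_buckets
        self.buckets = defaultdict(list)   # bucket number -> [(key, pointer)]

    def insert(self, key, pointer):
        self.buckets[hash(key) % self.n].append((key, pointer))

    def search(self, key):
        # Hash to the bucket, then match the key within it.
        return [p for k, p in self.buckets[hash(key) % self.n] if k == key]

data = [("123", "Alice"), ("456", "Bob"), ("789", "Carol")]
idx = HashIndex()
for i, (ssn, _name) in enumerate(data):
    idx.insert(ssn, i)
print(idx.search("456"))   # -> [1]
```

A real DBMS would store the buckets on disk and use dynamic or extendible hashing so the directory can grow, as the answer above notes.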
Chapter 17, Problem 14RQ
Problem
What is bitmap indexing? Create a relation with two columns and sixteen tuples and show an
example of a bitmap index on one or both.
Step-by-step solution
Step 1 of 1
A bitmap index is a data structure that supports querying on multiple keys.
• It is used for relations that contain a large number of rows, to quickly identify the rows matching a specific key value.
• It is created for one or more columns, and each value or value range in those columns is indexed.
• A bitmap index is typically created for columns that contain only a small number of unique values.
Construction
• To create a bitmap index for a set of records in a relation or table, the records are numbered from 0 to n − 1 with an id that can be mapped to a physical address consisting of a block number and a record offset within the block.
• The index is created for one particular value of a particular field (or column) as an array of bits.
• For example, a bitmap index constructed for column F and a value V of that column, in a relation with n rows, contains n bits. The jth bit is set to 1 if row j has the value V for column F; otherwise it is set to 0.
Example
S.No  Customer Name  Gender
1     Alan           M
2     Clara          F
3     John           M
4     Benjamin       M
5     Marcus         M
6     Alice          F
7     Joule          F
8     Louis          M
9     Samuel         M
10    Lara           F
11    Andy           F
12    Martin         M
13    Catherine      F
14    Fuji           F
15    Zarain         F
16    Ford           M
Bitmap index for Gender:
For M: 1011100110010001, where the bit for each row whose Gender is M is set to 1, and the others are set to 0.
For F: 0100011001101110, where the bit for each row whose Gender is F is set to 1, and the others are set to 0.
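The two bitmaps can be reproduced from the table above with a few lines:

```python
# Building the Gender bitmaps for the 16-row relation above.
genders = "MFMMMFFMMFFMFFFM"   # Gender of rows 1..16, as in the table

def bitmap(column, value):
    # One bit per row: 1 where the row has `value`, 0 elsewhere.
    return "".join("1" if v == value else "0" for v in column)

print(bitmap(genders, "M"))   # -> 1011100110010001
print(bitmap(genders, "F"))   # -> 0100011001101110
```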
Chapter 17, Problem 15RQ
Problem
What is the concept of function-based indexing? What additional purpose does it serve?
Step-by-step solution
Step 1 of 1
Function-based indexing is a type of indexing that has been developed and used in the Oracle database system, as well as in some other commercial products.
The index is created by applying a function to the value of a field, or to a collection of fields; the result of the function becomes the key of the index.
It ensures that the Oracle database system will use this index to search instead of performing a full table scan, even when a function is applied in the search condition of a query.
Example:
The following statement creates an index using the function LOWER(CustomerName), which returns the customer name in lowercase letters; LOWER('MARTIN') results in 'martin':
CREATE INDEX lower_ix ON Customer (LOWER(CustomerName));
The query given below can then use the index:
SELECT CustomerName
FROM Customer
WHERE LOWER(CustomerName) = 'martin';
If function-based indexing is not used, the Oracle database system performs a scan of the entire table, because a regular B+-tree index searches directly on the column value only; applying a function to a column prevents the use of such an index.
Chapter 17, Problem 16RQ
Problem
What is the difference between a logical index and a physical index?
Step-by-step solution
Step 1 of 1
Physical index
• An index whose entries consist of the key K and a physical pointer P giving the physical address of the record on disk, as a block number and offset, is referred to as a physical index.
• For example, if the primary file organization is based on extendible or linear hashing, then each time a bucket is split, some of the records are allocated to a new bucket and hence receive new physical addresses.
• If there is a secondary index on the file, the pointers that point to those records must all be found and updated (a pointer must be changed whenever its record moves to another location), which is a difficult task.
Logical index
• The index entries of a logical index are pairs of keys Ks and K.
• Each entry contains one value Ks of the secondary indexing field, matched with the value K of the field used for the primary organization of the file.
• When searching the secondary index on a value of Ks, a program retrieves the corresponding value of K and uses it to access the record through the primary organization of the file; thus a logical index introduces an extra level of indirection between the access structure and the data.
• Because the entries refer to logical key values rather than physical addresses, a logical index does not have to be updated when records move on disk.
Chapter 17, Problem 17RQ
Problem
What is column-based storage of a relational database?
Step-by-step solution
Step 1 of 1
Traditionally, relations are stored by row, one record after another. Column-based storage instead stores each column of a relational database individually, which provides performance advantages, especially for the read-only queries typical of read-only databases.
Advantages
• The table is partitioned vertically, column by column, so a two-column table is constructed for each attribute of the original table, and only the columns needed by a query have to be accessed.
• Column-wise indexes and join indexes on multiple tables can answer queries without accessing the data tables.
• Materialized views are used to support queries on multiple columns.
Column-wise storage of data also provides extra flexibility in index creation: the same column can appear in a number of projections, and indexes can be created on each projection. For storing the values in the same column, various strategies are used, such as data compression, null-value suppression, and different encoding techniques.
Chapter 17, Problem 18E
Problem
Consider a disk with block size B = 512 bytes. A block pointer is P = 6 bytes long, and a record
pointer is PR = 7 bytes long. A file has r = 30,000 EMPLOYEE records of fixed length. Each
record has the following fields: Name (30 bytes), Ssn (9 bytes), Department_code (9 bytes),
Address (40 bytes), Phone (10 bytes), Birth_date (8 bytes), Sex (1 byte), Job_code (4 bytes),
and Salary (4 bytes, real number). An additional byte is used as a deletion marker.
a. Calculate the record size R in bytes.
b. Calculate the blocking factor bfr and the number of file blocks b, assuming an unspanned
organization.
c. Suppose that the file is ordered by the key field Ssn and we want to construct a primary index
on Ssn. Calculate (i) the index blocking factor bfri (which is also the index fan-out fo); (ii) the
number of first-level index entries and the number of first-level index blocks; (iii) the number of
levels needed if we make it into a multilevel index; (iv) the total number of blocks required by the
multilevel index; and (v) the number of block accesses needed to search for and retrieve a record
from the file—given its Ssn value—using the primary index.
d. Suppose that the file is not ordered by the key field Ssn and we want to construct a secondary
index on Ssn. Repeat the previous exercise (part c) for the secondary index and compare with
the primary index.
e. Suppose that the file is not ordered by the nonkey field Department_code and we want to
construct a secondary index on Department_code, using option 3 of Section 17.1.3, with an extra
level of indirection that stores record pointers. Assume there are 1,000 distinct values of
Department_code and that the EMPLOYEE records are evenly distributed among these values.
Calculate (i) the index blocking factor bfri (which is also the index fan-out fo); (ii) the number of
blocks needed by the level of indirection that stores record pointers; (iii) the number of first-level
index entries and the number of first-level index blocks; (iv) the number of levels needed if we
make it into a multilevel index; (v) the total number of blocks required by the multilevel index and
the blocks used in the extra level of indirection; and (vi) the approximate number of block
accesses needed to search for and retrieve all records in the file that have a specific
Department_code value, using the index.
f. Suppose that the file is ordered by the nonkey field Department_code and we want to construct
a clustering index on Department_code that uses block anchors (every new value of
Department_code starts at the beginning of a new block). Assume there are 1,000 distinct values
of Department_code and that the EMPLOYEE records are evenly distributed among these
values. Calculate (i) the index blocking factor bfri (which is also the index fan-out fo); (ii) the
number of first-level index entries and the number of first-level index blocks; (iii) the number of
levels needed if we make it into a multilevel index; (iv) the total number of blocks required by the
multilevel index; and (v) the number of block accesses needed to search for and retrieve all
records in the file that have a specific Department_code value, using the clustering index
(assume that multiple blocks in a cluster are contiguous).
g. Suppose that the file is not ordered by the key field Ssn and we want to construct a B+-tree
access structure (index) on Ssn. Calculate (i) the orders p and pleaf of the B+-tree; (ii) the
number of leaf-level blocks needed if blocks are approximately 69% full (rounded up for
convenience); (iii) the number of levels needed if internal nodes are also 69% full (rounded up for
convenience); (iv) the total number of blocks required by the B+-tree; and (v) the number of block
accesses needed to search for and retrieve a record from the file (given its Ssn value) using the
B+-tree.
h. Repeat part g, but for a B-tree rather than for a B+-tree. Compare your results for the B-tree
and for the B+-tree.
Step-by-step solution
Step 1 of 31
Disk operations on a file using primary, secondary, and clustering indexes, B+-trees, and B-trees.
(a) Calculation of the record size R
Record size = Name + Ssn + Department_code + Address + Phone + Birth_date + Sex + Job_code + Salary + 1 byte (deletion marker)
R = 30 + 9 + 9 + 40 + 10 + 8 + 1 + 4 + 4 + 1 = 116 bytes
Step 2 of 31
(b) Calculation of the blocking factor and the number of file blocks
Blocking factor, bfr = ⌊B/R⌋ = ⌊512/116⌋ = 4 records per block
Number of file blocks, b = ⌈r/bfr⌉ = ⌈30,000/4⌉ = 7,500 blocks
Step 3 of 31
(c) Operations on the file ordered by the key field Ssn (primary index)
(i) Calculation of the index blocking factor
Index record length, Ri = V(Ssn) + P = 9 + 6 = 15 bytes
Index blocking factor, bfri = fo = ⌊B/Ri⌋ = ⌊512/15⌋ = 34 index records per block
Step 4 of 31
(ii) Number of first-level index entries and first-level index blocks
Number of first-level index entries, r1 = number of file blocks, b = 7,500 entries
Number of first-level index blocks, b1 = ⌈r1/bfri⌉ = ⌈7,500/34⌉ = 221 blocks
Step 5 of 31
(iii) Number of levels needed for a multilevel index
Number of second-level index entries, r2 = number of first-level index blocks, b1 = 221 entries
Number of second-level index blocks, b2 = ⌈221/34⌉ = 7 blocks
Number of third-level index entries, r3 = number of second-level index blocks, b2 = 7 entries
Number of third-level index blocks, b3 = ⌈7/34⌉ = 1 block
The third level has only one block, so it is the top index level.
Hence, the index has x = 3 levels.
(iv) Total number of blocks required by the multilevel index
Total blocks = b1 + b2 + b3 = 221 + 7 + 1 = 229 blocks
Step 6 of 31
(v) Number of block accesses to search for and retrieve a record using the primary index
For this type of index, the number of block accesses is one block access at each index level plus one block access for the data file.
Number of block accesses = x + 1 = 3 + 1 = 4
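The calculations in part (c) can be reproduced with a short script (the function name and parameter defaults are illustrative, taken from the exercise's numbers):

```python
# Multilevel primary index statistics: B = 512 bytes, block pointer P = 6
# bytes, Ssn field V = 9 bytes, and 7,500 first-level entries (one per block).
import math

def multilevel_index(entries, block_size=512, key_size=9, ptr_size=6):
    fo = block_size // (key_size + ptr_size)   # fan-out = index blocking factor
    blocks_per_level = []
    n = entries
    while True:
        n = math.ceil(n / fo)                  # blocks needed at this level
        blocks_per_level.append(n)
        if n == 1:
            break
    return fo, blocks_per_level

fo, per_level = multilevel_index(7500)
print(fo, per_level, sum(per_level), len(per_level) + 1)
# prints: 34 [221, 7, 1] 229 4  (fan-out, blocks per level, total, accesses)
```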
Step 7 of 31
(d) Repetition of part (c) for a secondary index on Ssn
(i) Index record length and index blocking factor
Part (c) assumed that index entries contain block pointers; for a dense secondary index it is natural to assume instead that they contain record pointers.
Index record length, Ri = V(Ssn) + PR = 9 + 7 = 16 bytes
Leaf-level index blocking factor, bfri = ⌊B/Ri⌋ = ⌊512/16⌋ = 32 index records per block
For the internal levels, block pointers are always used, so the fan-out for the internal levels is fo = ⌊512/15⌋ = 34.
Step 8 of 31
(ii) Number of first-level index entries and first-level index blocks
Since the index is dense, the number of first-level index entries = number of file records, r = 30,000 entries
Number of first-level index blocks, b1 = ⌈30,000/32⌉ = 938 blocks
Step 9 of 31
(iii) Number of levels
Number of second-level index entries, r2 = number of first-level index blocks, b1 = 938 entries
Number of second-level index blocks, b2 = ⌈938/34⌉ = 28 blocks
Number of third-level index blocks, b3 = ⌈28/34⌉ = 1 block
The third level has one block and is the top level, so the index has x = 3 levels.
(iv) Total number of blocks for the index = 938 + 28 + 1 = 967 blocks
(v) Number of block accesses to search for a record = x + 1 = 3 + 1 = 4
Compared with the primary index of part (c), the secondary index is dense, so it needs far more first-level blocks (938 versus 221) and total blocks (967 versus 229), although a search costs the same number of block accesses.
Step 10 of 31
(e) Operations on the file using a secondary index on Department_code with an extra level of indirection (option 3)
(i) Calculation of the index blocking factor
Index record size, Ri = V(Department_code) + P = 9 + 6 = 15 bytes
Index blocking factor, bfri = fo = ⌊B/Ri⌋ = ⌊512/15⌋ = 34 index records per block
Step 11 of 31
(ii) Number of blocks needed by the level of indirection
There are 1,000 distinct values of Department_code.
Number of records for each value = 30,000/1,000 = 30 records
Record pointer size, PR = 7 bytes
Number of bytes needed at the level of indirection for each value of Department_code = 30 × 7 = 210 bytes, which fits in one block.
So, 1,000 blocks are needed for the level of indirection.
Step 12 of 31
(iii) Number of first-level index entries and first-level index blocks
Number of first-level index entries = number of distinct values of Department_code = 1,000 entries
Number of first-level index blocks, b1 = ⌈1,000/34⌉ = 30 blocks
Step 13 of 31
(iv) Number of levels needed for a multilevel index
Number of second-level index entries, r2 = number of first-level index blocks, b1 = 30 entries
Number of second-level index blocks, b2 = ⌈30/34⌉ = 1 block
The second level has one block and is the top level, so the index has x = 2 levels.
Step 14 of 31
(v) Total number of blocks required by the multilevel index and the level of indirection
Total blocks = b1 + b2 + 1,000 (indirection blocks) = 30 + 1 + 1,000 = 1,031 blocks
Step 15 of 31
(vi) Number of block accesses to search for and retrieve all records with a given Department_code value
Number of block accesses to reach the block of record pointers at the level of indirection = x + 1 = 3
If the 30 records are distributed over 30 distinct blocks, we need an additional 30 block accesses to retrieve them all.
Total block accesses needed on average to retrieve all the records with a given value of Department_code = x + 1 + 30 = 33
Step 16 of 31
(f) Operations on the file which is constructed using clustering index on Department_code
(i) Calculation of index blocking factor
Index blocking factor
Where
Comment
Step 17 of 31
(ii) Calculation of number of first-level index entries and number of first level blocks
Number of first level index entries
Number of distinct DEPARTMENT CODE values
entries.
Number of first-level index blocks
Comment
Step 18 of 31
(iii) Calculation of number of levels for multi-level index
Calculate the number of levels as number of second-level index entries
Number of first-level index blocks
Number of second-level index blocks
Ceiling
Second level has one block and it is in the top index level
The index has
Comments (1)
Step 19 of 31
(iv) Calculation of the number of blocks for the multilevel index
Total number of blocks for the index = 30 + 1 = 31 blocks.
Step 20 of 31
(v) Calculation of the number of block accesses to search for and retrieve all records in the file for a Department_code value
Number of block accesses to search the index and find the first block in the cluster of blocks = x = 2.
The 30 records with a given value are clustered in ceiling(30/bfr) = ceiling(30/4) = 8 contiguous blocks.
So, the total number of block accesses needed on average to retrieve all the records with a given Department_code value = 2 + 8 = 10.
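The clustering-index numbers in part (f) follow the same pattern; in this sketch, bfr = 4 (the data blocking factor from part (b)) is assumed for the per-cluster block count.

```python
from math import ceil, floor

B, P, V = 512, 6, 9
bfr = 4                              # data blocking factor from part (b)
distinct, per_value = 1_000, 30      # distinct Department_code values, records per value

bfri = floor(B / (V + P))            # 34 index entries per block
levels, blocks = 1, ceil(distinct / bfri)
total = blocks
while blocks > 1:                    # build the multilevel index bottom-up
    blocks = ceil(blocks / bfri)
    total += blocks
    levels += 1

cluster_blocks = ceil(per_value / bfr)   # 8 contiguous data blocks per cluster
accesses = levels + cluster_blocks       # 2 index accesses + 8 data-block accesses
print(bfri, total, levels, cluster_blocks, accesses)
# -> 34 31 2 8 10
```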
Step 21 of 31
(g) Operations on a B+-tree on Ssn
(i) Calculation of the orders p and pleaf of the B+-tree
Each internal node has p tree pointers (P = 6 bytes each) and p - 1 search-key values (V = 9 bytes each), so p must satisfy (p × 6) + ((p - 1) × 9) ≤ 512, i.e., 15p ≤ 521.
So, p = 34.
For leaf nodes, the record pointers (PR = 7 bytes) are included with the key values, together with one next-leaf block pointer, so pleaf must satisfy (pleaf × (9 + 7)) + 6 ≤ 512, i.e., 16 × pleaf ≤ 506.
So, pleaf = 31.
Step 22 of 31
(ii) Calculation of the number of leaf nodes if the blocks are 69 percent full
Nodes are 69% full on average, so the average number of key values in a leaf node is 0.69 × 31 = 21.39.
If we round this up for convenience, we get 22 key values and 22 record pointers per leaf node.
Since the file has r = 30,000 records, and hence 30,000 values of Ssn, the number of leaf-level nodes needed is ceiling(30000/22) = 1,364 nodes.
Step 23 of 31
(iii) Calculation of the number of levels if internal nodes are also 69 percent full
Calculate the number of levels using the average fan-out for the internal nodes, fo = ceiling(0.69 × 34) = ceiling(23.46) = 24.
Number of second-level tree blocks = ceiling(1364/24) = 57 blocks.
Number of third-level tree blocks = ceiling(57/24) = 3 blocks.
Number of fourth-level tree blocks = ceiling(3/24) = 1 block.
So, the fourth level has one block (the root), and the tree has x = 4 levels.
Step 24 of 31
(iv) Calculation of the total number of blocks
Total number of blocks for the tree = 1364 + 57 + 3 + 1 = 1,425 blocks.
Step 25 of 31
(v) Calculation of the number of block accesses to search for and retrieve a record, given its Ssn value, using the B+-tree
Number of block accesses to search for a record = x + 1 = 4 + 1 = 5.
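The B+-tree figures in part (g) can be reproduced with a short calculation; this sketch derives p and pleaf from the same inequalities and applies the same 69%-full rounding (21.39 up to 22 keys per leaf, fan-out 23.46 up to 24) used above.

```python
from math import ceil

B, P, PR, V, r = 512, 6, 7, 9, 30_000

# largest p with p*P + (p-1)*V <= B       (internal node of a B+-tree)
p = max(n for n in range(2, B) if n * P + (n - 1) * V <= B)          # 34
# largest p_leaf with p_leaf*(V + PR) + P <= B   (leaf node, plus next-leaf pointer)
p_leaf = max(n for n in range(2, B) if n * (V + PR) + P <= B)        # 31

keys_per_leaf = ceil(0.69 * p_leaf)      # 22 keys per 69%-full leaf
fan_out = ceil(0.69 * p)                 # 24 average fan-out for internal nodes

leaves = ceil(r / keys_per_leaf)         # 1364 leaf blocks
levels, blocks, total = 1, leaves, leaves
while blocks > 1:                        # add internal levels until the root
    blocks = ceil(blocks / fan_out)
    total += blocks
    levels += 1

print(p, p_leaf, leaves, levels, total, levels + 1)
# -> 34 31 1364 4 1425 5
```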
Step 26 of 31
(h) Repetition of part (g) for a B-tree
(i) Calculation of the order p of the B-tree
Each internal node has p tree pointers and p - 1 search-key values, each accompanied by a record pointer, so p must satisfy (p × 6) + ((p - 1) × (9 + 7)) ≤ 512, i.e., 22p ≤ 528, giving p ≤ 24; this solution works with p = 23 (23 × 6 + 22 × 16 = 490 ≤ 512), the value carried into the comparison below.
In a B-tree there is no separate leaf-node format: every node, internal or leaf, holds key values together with their record pointers, so the same order applies throughout.
Step 27 of 31
(ii) Each node of the B-tree is 69% full, so the average number of key values in a node is 0.69 × (p - 1) = 0.69 × 22 = 15.18.
If we round this up for convenience, we get 16 key values and 16 record pointers per node.
Since the great majority of a B-tree's nodes are at the lowest level, the number of lowest-level nodes needed for the 30,000 records is approximately ceiling(30000/16) = 1,875 nodes.
Step 28 of 31
(iii) Calculate the number of levels using the average fan-out for the internal nodes, fo = ceiling(0.69 × 23) = ceiling(15.87) = 16.
Number of second-level tree blocks = ceiling(1875/16) = 118 blocks.
Number of third-level tree blocks = ceiling(118/16) = 8 blocks.
Number of fourth-level tree blocks = ceiling(8/16) = 1 block.
So, the fourth level has one block (the root), and the tree has x = 4 levels.
Step 29 of 31
(iv) Total number of blocks for the tree ≈ 1875 + 118 + 8 + 1 = 2,002 blocks.
Step 30 of 31
(v) Number of block accesses to search for a record = x + 1 = 4 + 1 = 5 in the worst case; on average slightly fewer, since in a B-tree the search key may be found in an internal node before the lowest level is reached.
Step 31 of 31
Comparison of the B+-tree and the B-tree
Calculation of the approximate number of entries in the B+-tree
With every node full, each internal node has p = 34 pointers and 33 (= p - 1) search-key values:
Root: 1 node, 33 entries, 34 pointers
Level 1: 34 nodes, 1,122 entries, 1,156 pointers
Level 2: 1,156 nodes, 38,148 entries, 39,304 pointers
Leaf level: 39,304 nodes, 1,297,032 entries
In a B+-tree the internal-node values are copies of values that also appear at the leaf level, so a four-level B+-tree of this kind indexes 1,297,032 distinct entries (1,336,335 key occurrences if the internal copies are counted as well).
Calculation of the approximate number of entries in the B-tree
With every node full, each node has p = 23 pointers and 22 (= p - 1) search-key values, each with its record pointer:
Root: 1 node, 22 entries, 23 pointers
Level 1: 23 nodes, 506 entries, 529 pointers
Level 2: 529 nodes, 11,638 entries, 12,167 pointers
Lowest level: 12,167 nodes, 267,674 entries
Every entry in a B-tree is a distinct record entry, so the four-level B-tree holds 22 + 506 + 11,638 + 267,674 = 279,840 entries.
Therefore, for the given block size, pointer size, and search-key field size, a four-level B+-tree holds roughly 1,297,032 entries while a four-level B-tree holds roughly 279,840; the B+-tree stores far more entries than a B-tree of the same height.
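The capacity comparison can be tallied the same way. This sketch assumes completely full nodes, with p = 34 for the B+-tree and p = 23 for the B-tree as in the counts above.

```python
# entries held by a root plus three lower levels when every node is full
def bplus_leaf_entries(p, height=4):
    # in a B+-tree, internal nodes repeat keys that also live in the leaves,
    # so only the leaf level counts distinct entries
    leaves = p ** (height - 1)                 # 34**3 = 39304 leaf nodes
    return leaves * (p - 1)

def btree_entries(p, height=4):
    # in a B-tree, every node holds p-1 distinct entries
    nodes = sum(p ** level for level in range(height))   # 1 + 23 + 529 + 12167
    return nodes * (p - 1)

print(bplus_leaf_entries(34), btree_entries(23))
# -> 1297032 279840
```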
Chapter 17, Problem 19E
Problem
A PARTS file with Part# as the key field includes records with the following Part# values: 23, 65,
37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20, 24, 28, 39,43,47, 50,69, 75, 8,49,
33, 38. Suppose that the search field values are inserted in the given order in a B+-tree of order
p = 4 and pleaf = 3; show how the tree will expand and what the final tree will look like.
Step-by-step solution
Step 1 of 34
B+-Tree Insertion:
Here, a given set of keys is to be inserted into a B+-tree of order p = 4 and pleaf = 3.
• Order p = 4 implies that each internal node has at most 4 pointers (and at most 3 keys).
• pleaf = 3 means that each leaf node holds at most 3 keys and at least 2 keys (except possibly the root).
• Insertion starts at a leaf; whenever a node exceeds its capacity, it must split.
• When a leaf node overflows, the first ceiling((pleaf + 1)/2) = 2 keys stay in that node and the remaining keys form a new right sibling.
• The rightmost key of the left partition is propagated up to the parent node.
• If the propagation comes from a leaf node, a copy of the key is kept in the leaf; if it comes from an internal node, the key simply moves up to the parent.
• Every key in the input list ends up in some leaf node.
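The leaf-split rule in the bullets above can be sketched as a small helper. `split_leaf` is a hypothetical name, and the keep-2/copy-up behaviour assumes pleaf = 3 as in this exercise.

```python
from math import ceil

P_LEAF = 3  # leaf capacity for this exercise

def split_leaf(keys):
    """Split an overflowing sorted leaf: the first ceil((p_leaf + 1)/2) keys
    stay, the rest move to a new right sibling, and a copy of the last key
    of the left node is pushed up as the separator."""
    keep = ceil((P_LEAF + 1) / 2)        # 2 keys stay on the left
    left, right = keys[:keep], keys[keep:]
    return left, right, left[-1]         # separator is copied up, not moved

# the first overflow in the walkthrough: inserting 60 into the leaf (23, 37, 65)
left, right, sep = split_leaf([23, 37, 60, 65])
print(left, right, sep)   # -> [23, 37] [60, 65] 37
```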
Step 2 of 34
The problem gives a set of keys to insert into the B+-tree in order.
The given list is:
23, 65, 37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20, 24, 28, 39, 43, 47, 50, 69, 75, 8, 49, 33, 38.
First, insert the first three keys (23, 65, 37) into the root; this does not cause an overflow, since the capacity of a leaf node is 3 keys.
The resulting B+-tree is the single node 23, 37, 65, which is both the root and a leaf, so no tree pointers are needed yet.
Step 3 of 34
Insert 60:
Inserting 60 makes the node (23, 37, 60, 65) overflow, so it is split into two leaves and a new level is created:
Level 1: 37
Level 2: 23, 37: 60, 65
Step 4 of 34
Insert 46:
Inserting 46 does not violate the capacity constraint of the second node in level 2.
The resulting tree is:
Level 1: 37
Level 2: 23, 37: 46, 60, 65
Step 5 of 34
Insert 92:
Inserting the next key, 92, makes the second node in level 2 overflow: it would become 46, 60, 65, 92.
• Therefore, we split that node at 60, create one new node in level 2, and duplicate 60 in the parent node:
Level 1: 37, 60
Level 2: 23, 37: 46, 60: 65, 92
Step 7 of 34
Insert 48:
Inserting 48 does not cause any overflow; it goes into the second node in level 2:
Level 1: 37, 60
Level 2: 23, 37: 46, 48, 60: 65, 92
Step 8 of 34
Insert 71:
Inserting 71 into the B+-tree also causes no overflow.
• It goes into the third node of level 2 without violating the order constraints.
• The updated tree is:
Level 1: 37, 60
Level 2: 23, 37: 46, 48, 60: 65, 71, 92
Step 9 of 34
Insert 56:
The next insertion is 56.
• Clearly 56 belongs in the second node of level 2, but inserting it causes an overflow.
• So that node (46, 48, 56, 60) must be split.
• The first two keys (46, 48) form the left node of the split and (56, 60) form the right; the last key of the left node (48) is propagated up.
• Since this is a leaf-node split, 48 is only duplicated in the parent.
The resulting B+-tree is:
Level 1: 37, 48, 60
Level 2: 23, 37: 46, 48: 56, 60: 65, 71, 92
Step 10 of 34
The remaining insertions are performed in the same way.
• Levels are counted from the root down: the root has level 1, and the level number increases by 1 at each step downward.
Insert 59:
Level 1: 37, 48, 60
Level 2: 23, 37: 46, 48: 56, 59, 60: 65, 71, 92
Step 11 of 34
Insert 18:
Level 1: 37, 48, 60
Level 2: 18, 23, 37: 46, 48: 56, 59, 60: 65, 71, 92
Step 12 of 34
Insert 21:
Level 1: 37, 48, 60
Level 2: 18, 21, 23, 37: 46, 48: 56, 59, 60: 65, 71, 92
Overflow. Split (18, 21, 23, 37) into (18, 21) and (23, 37) and propagate 21 to the level above.
Level 1: 21, 37, 48, 60
Level 2: 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71, 92
Again, there is an overflow, now in level 1. Split and propagate 37; since this is not a leaf node, no copy of 37 is kept. This creates a new level in the tree.
Level 1: 37
Level 2: 21: 48, 60
Level 3: 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71, 92
Step 13 of 34
Insert 10:
Level 1: 37
Level 2: 21: 48, 60
Level 3: 10, 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71, 92
Step 14 of 34
Insert 74:
Level 1: 37
Level 2: 21: 48, 60
Level 3: 10, 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71, 74, 92
Overflow in level 3. Split the overloaded node at 71 and propagate 71 up:
Level 1: 37
Level 2: 21: 48, 60, 71
Level 3: 10, 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71: 74, 92
Step 15 of 34
Insert 78:
Level 1: 37
Level 2: 21: 48, 60, 71
Level 3: 10, 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Step 16 of 34
Insert 15:
Level 1: 37
Level 2: 21: 48, 60, 71
Level 3: 10, 15, 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Overflow in the first node of level 3, split it at 15 and propagate 15 up.
Level 1: 37
Level 2: 15, 21: 48, 60, 71
Level 3: 10, 15: 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Step 18 of 34
Insert 16:
Level 1: 37
Level 2: 15, 21: 48, 60, 71
Level 3: 10, 15: 16, 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Step 19 of 34
Insert 20:
Level 1: 37
Level 2: 15, 21: 48, 60, 71
Level 3: 10, 15: 16, 18, 20, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Overflow at the node that received 20; split it at 18 and propagate 18 up.
Level 1: 37
Level 2: 15, 18, 21: 48, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Step 20 of 34
Insert 24:
Level 1: 37
Level 2: 15, 18, 21: 48, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Step 21 of 34
Insert 28:
Level 1: 37
Level 2: 15, 18, 21: 48, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24, 28, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Overflow in the fourth node of level 3; split it at 24 and propagate 24 up:
Level 1: 37
Level 2: 15, 18, 21, 24: 48, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Again there is an overflow, now at level 2, which needs one more split, at 18:
Level 1: 18, 37
Level 2: 15: 21, 24: 48, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Step 22 of 34
Insert 39:
Level 1: 18, 37
Level 2: 15: 21, 24: 48, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Step 23 of 34
Insert 43:
Level 1: 18, 37
Level 2: 15: 21, 24: 48, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43, 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Overflow at the node that received 43, so split that node at its second key, 43:
Level 1: 18, 37
Level 2: 15: 21, 24: 43, 48, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Again there is an overflow at level 2; split that node and propagate 48 up:
Level 1: 18, 37, 48
Level 2: 15: 21, 24: 43: 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92
Step 25 of 34
Insert 47:
Level 1: 18, 37, 48
Level 2: 15: 21, 24: 43: 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 56, 59, 60: 65, 71: 74, 78, 92
Step 26 of 34
Insert 50:
Level 1: 18, 37, 48
Level 2: 15: 21, 24: 43: 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56, 59, 60: 65, 71: 74, 78, 92
Overflow at the node that received 50. Split the node at its second key, 56, and propagate it up:
Level 1: 18, 37, 48
Level 2: 15: 21, 24: 43: 56, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56: 59, 60: 65, 71: 74, 78, 92
Step 27 of 34
Insert 69:
Level 1: 18, 37, 48
Level 2: 15: 21, 24: 43: 56, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56: 59, 60: 65, 69, 71: 74, 78, 92
Step 28 of 34
Insert 75:
Level 1: 18, 37, 48
Level 2: 15: 21, 24: 43: 56, 60, 71
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56: 59, 60: 65, 69, 71: 74, 75, 78, 92
Overflow at the node that received 75; split it and propagate its second key, 75, up:
Level 1: 18, 37, 48
Level 2: 15: 21, 24: 43: 56, 60, 71, 75
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56: 59, 60: 65, 69, 71: 74, 75: 78, 92
Again there is an overflow, now in the internal node; split it at 60 and propagate 60 up:
Level 1: 18, 37, 48, 60
Level 2: 15: 21, 24: 43: 56: 71, 75
Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56: 59, 60: 65, 69, 71: 74, 75: 78, 92
This in turn causes an overflow at the root. Split the root at 37 and propagate 37 into a new level:
Level 1: 37
Level 2: 18: 48, 60
Level 3: 15: 21, 24: 43: 56: 71, 75
Level 4: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56: 59, 60: 65, 69, 71: 74, 75: 78, 92
Step 29 of 34
Insert 8:
Level 1: 37
Level 2: 18: 48, 60
Level 3: 15: 21, 24: 43: 56: 71, 75
Level 4: 8, 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56: 59, 60: 65, 69, 71: 74, 75: 78, 92
Step 30 of 34
Insert 49:
Level 1: 37
Level 2: 18: 48, 60
Level 3: 15: 21, 24: 43: 56: 71, 75
Level 4: 8, 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 49, 50, 56: 59, 60: 65, 69, 71: 74, 75: 78, 92
Step 31 of 34
Insert 33:
Level 1: 37
Level 2: 18: 48, 60
Level 3: 15: 21, 24: 43: 56: 71, 75
Level 4: 8, 10, 15: 16, 18: 20, 21: 23, 24: 28, 33, 37: 39, 43: 46, 47, 48: 49, 50, 56: 59, 60: 65, 69, 71: 74, 75: 78, 92
Step 32 of 34
Insert 38:
Level 1: 37
Level 2: 18: 48, 60
Level 3: 15: 21, 24: 43: 56: 71, 75
Level 4: 8, 10, 15: 16, 18: 20, 21: 23, 24: 28, 33, 37: 38, 39, 43: 46, 47, 48: 49, 50, 56: 59, 60: 65, 69, 71: 74, 75: 78, 92
Step 33 of 34
The tree after the insertion of the last key, 38, is the final B+-tree.
• In each internal node, the pointer to the left of a key leads to the subtree whose keys are less than or equal to that key, and the pointer to its right leads to the subtree whose keys are greater.
• Each colon-separated group in the level listings above forms one node, and the group's elements are the keys stored in that node.
Step 34 of 34
Graphically, the final tree after the insertion of all the keys is the one described by the level listing above.
Chapter 17, Problem 20E
Problem
Repeat Exercise, but use a B-tree of order p = 4 instead of a B+-tree.
Exercise
A PARTS file with Part# as the key field includes records with the following Part# values: 23, 65,
37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20, 24, 28, 39,43,47, 50,69, 75, 8,49,
33, 38. Suppose that the search field values are inserted in the given order in a B+-tree of order
p = 4 and pleaf = 3; show how the tree will expand and what the final tree will look like.
Step-by-step solution
Step 1 of 1
Insertion takes place in the steps represented in the diagram. Unlike the B+-tree, in a B-tree of order p = 4 each key value is stored only once, together with its record pointer, and when a node overflows its middle key moves up to the parent rather than being copied.
Chapter 17, Problem 21E
Problem
Suppose that the following search field values are deleted, in the given order, from the B+-tree of
Exercise; show how the tree will shrink and show the final tree. The deleted values are 65, 75,
43, 18, 20, 92, 59, 37.
Exercise
A PARTS file with Part# as the key field includes records with the following Part# values: 23, 65,
37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20, 24, 28, 39,43,47, 50,69, 75, 8,49,
33, 38. Suppose that the search field values are inserted in the given order in a B+-tree of order
p = 4 and pleaf = 3; show how the tree will expand and what the final tree will look like.
Step-by-step solution
Step 1 of 10
In the B+-tree deletion algorithm, deleting a key value from a leaf node raises two concerns:
(1) The leaf node may become less than half full.
In this case, we may combine it with the next leaf node (or redistribute entries with a sibling).
Step 2 of 10
(2) The deleted key value may be the rightmost value in its leaf node, in which case a copy of it appears in an internal node.
In this case, the key value to the left of the deleted key in the leaf node replaces the deleted key value in the internal node.
From the data, deleting 65 affects only its leaf node.
Deleting 75 causes a leaf node to become less than half full, so it is combined with the next node, and 75 is also removed from the internal node.
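The "less than half full" test that drives these deletions can be made concrete; `underflows` is a hypothetical helper, and pleaf = 3 is taken from the insertion exercise.

```python
from math import ceil

P_LEAF = 3  # leaf capacity from the insertion exercise

def underflows(keys):
    """A non-root leaf underflows when it holds fewer than
    ceil(p_leaf / 2) keys, i.e. fewer than 2 keys here."""
    return len(keys) < ceil(P_LEAF / 2)

print(underflows([74]))        # deleting 75 from (74, 75) leaves one key -> True
print(underflows([59, 60]))    # still half full -> False
```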
Step 4 of 10
Deleting 43 causes a leaf node to become less than half full, so it is combined with the next node.
The combined next node then has 3 entries; its rightmost entry, 48, can replace 43 in both the leaf and the internal nodes.
Step 6 of 10
In the next step we delete 18. It is the rightmost entry in its leaf node, so it also appears in an internal node of the B+-tree. After the deletion, the leaf node is less than half full and is combined with the next node.
The value 18 must also be removed from the internal node, causing an underflow in that internal node.
One approach for dealing with an underflowing internal node is to redistribute values between it and its child nodes; here, 21 is moved up into the underflowing node, leading to the following tree.
Step 8 of 10
Deleting 20 and 92 does not cause underflow.
Deleting 59 causes an underflow, and the remaining value, 60, is combined with the next leaf node.
Hence 60 is no longer the rightmost entry in a leaf node. Normally this is handled by moving 56 up to replace 60 in the internal node, but since that would lead to an underflow in the node that used to contain 56, the nodes can be reorganized as follows.
Step 10 of 10
Finally, removing 37 causes a serious underflow, leading to a reorganization of the whole tree. One approach to deleting a value that appears in the root is to replace it with the rightmost value of the next leaf node and move that leaf node into the left subtree. In this case, the resulting tree looks as follows.
Chapter 17, Problem 22E
Problem
Repeat Exercise 1, but for the B-tree of Exercise 3.
Exercise 1
Suppose that the following search field values are deleted, in the given order, from the B+-tree of
Exercise 2; show how the tree will shrink and show the final tree. The deleted values are 65, 75,
43, 18, 20, 92, 59, 37.
Exercise 2
A PARTS file with Part# as the key field includes records with the following Part# values: 23, 65,
37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20, 24, 28, 39,43,47, 50,69, 75, 8,49,
33, 38. Suppose that the search field values are inserted in the given order in a B+-tree of order
p = 4 and pleaf = 3; show how the tree will expand and what the final tree will look like.
Exercise 3
Repeat Exercise 2, but use a B-tree of order p = 4 instead of a B+-tree.
Step-by-step solution
Step 1 of 1
Deletion takes place in the given order: 65, 75, 43, 18, 20, 92, 59, 37, as shown in the diagrams. In a B-tree, a deleted key that lies in an internal node is first replaced by a key from a leaf node (its in-order predecessor or successor), and any resulting underflow is handled by redistributing entries with a sibling or by merging nodes.
Chapter 17, Problem 23E
Problem
Algorithm 17.1 outlines the procedure for searching a nondense multilevel primary index to
retrieve a file record. Adapt the algorithm for each of the following cases:
a. A multilevel secondary index on a nonkey nonordering field of a file. Assume that option 3 of
Section 17.1.3 is used, where an extra level of indirection stores pointers to the individual records
with the corresponding index field value.
b. A multilevel secondary index on a nonordering key field of a file.
c. A multilevel clustering index on a nonkey ordering field of a file.
Step-by-step solution
Step 1 of 3
a. For a multilevel secondary index with an extra level of indirection (option 3), search the index levels as in Algorithm 17.1 down to the first level and locate the entry whose field value equals the search value; then follow its pointer to the block of record pointers at the level of indirection, and follow each record pointer in that block to retrieve the individual records.
Step 2 of 3
b. For a multilevel secondary index on a nonordering key field, the first level is a dense index: search the levels as in Algorithm 17.1, and at the first level the matching entry's pointer leads to the single record with that key value; if no matching entry exists, no such record exists.
Step 3 of 3
c. For a multilevel clustering index on a nonkey ordering field, search the levels as in Algorithm 17.1 to find the block anchor for the search value; then read that block and the following contiguous blocks of the cluster until records with a different field value are reached.
Chapter 17, Problem 24E
Problem
Suppose that several secondary indexes exist on nonkey fields of a file, implemented using
option 3 of Section 17.1.3; for example, we could have secondary indexes on the fields
Department_code, Job_code, and Salary of the EMPLOYEE file of Exercise. Describe an
efficient way to search for and retrieve records satisfying a complex selection condition on these
fields, such as (Department_code = 5 AND Job_code =12 AND Salary = 50,000), using the
record pointers in the indirection level.
Exercise
Consider a disk with block size B = 512 bytes. A block pointer is P = 6 bytes long, and a record
pointer is PR = 7 bytes long. A file has r = 30,000 EMPLOYEE records of fixed length. Each
record has the following fields: Name (30 bytes), Ssn (9 bytes), Department_code (9 bytes),
Address (40 bytes), Phone (10 bytes), Birth_date (8 bytes), Sex (1 byte), Job_code (4 bytes),
and Salary (4 bytes, real number). An additional byte is used as a deletion marker.
a. Calculate the record size R in bytes.
b. Calculate the blocking factor bfr and the number of file blocks b, assuming an unspanned
organization.
c. Suppose that the file is ordered by the key field Ssn and we want to construct a primary index
on Ssn. Calculate (i) the index blocking factor bfri (which is also the index fan-out fo); (ii) the
number of first-level index entries and the number of first-level index blocks; (iii) the number of
levels needed if we make it into a multilevel index; (iv) the total number of blocks required by the
multilevel index; and (v) the number of block accesses needed to search for and retrieve a record
from the file—given its Ssn value—using the primary index.
d. Suppose that the file is not ordered by the key field Ssn and we want to construct a secondary
index on Ssn. Repeat the previous exercise (part c) for the secondary index and compare with
the primary index.
e. Suppose that the file is not ordered by the nonkey field Department_code and we want to
construct a secondary index on Department_code, using option 3 of Section 17.1.3, with an extra
level of indirection that stores record pointers. Assume there are 1,000 distinct values of
Department_code and that the EMPLOYEE records are evenly distributed among these values.
Calculate (i) the index blocking factor bfri (which is also the index fan-out fo); (ii) the number of
blocks needed by the level of indirection that stores record pointers; (iii) the number of first-level
index entries and the number of first-level index blocks; (iv) the number of levels needed if we
make it into a multilevel index; (v) the total number of blocks required by the multilevel index and
the blocks used in the extra level of indirection; and (vi) the approximate number of block
accesses needed to search for and retrieve all records in the file that have a specific
Department_code value, using the index.
f. Suppose that the file is ordered by the nonkey field Department_code and we want to construct
a clustering index on Department_code that uses block anchors (every new value of
Department_code starts at the beginning of a new block). Assume there are 1,000 distinct values
of Department_code and that the EMPLOYEE records are evenly distributed among these
values. Calculate (i) the index blocking factor bfri (which is also the index fan-out fo); (ii) the
number of first-level index entries and the number of first-level index blocks; (iii) the number of
levels needed if we make it into a multilevel index; (iv) the total number of blocks required by the
multilevel index; and (v) the number of block accesses needed to search for and retrieve all
records in the file that have a specific Department_code value, using the clustering index
(assume that multiple blocks in a cluster are contiguous).
g. Suppose that the file is not ordered by the key field Ssn and we want to construct a B+-tree
access structure (index) on Ssn. Calculate (i) the orders p and pleaf of the B+-tree; (ii) the
number of leaf-level blocks needed if blocks are approximately 69% full (rounded up for
convenience); (iii) the number of levels needed if internal nodes are also 69% full (rounded up for
convenience); (iv) the total number of blocks required by the B+-tree; and (v) the number of block
accesses needed to search for and retrieve a record from the file?given its Ssn value?using the
B+-tree.
h. Repeat part g, but for a B-tree rather than for a B+-tree. Compare your results for the B-tree
and for the B+-tree.
Step-by-step solution
Step 1 of 2
The EMPLOYEE file contains the fields Name, Ssn, Department_code, Address, Phone,
Birth_date, Sex, Job_code, Salary.
The primary index is maintained on the key field Ssn.
Consider that the secondary indexes are maintained on the fields Department_code, Job_code
and Salary. The fields Department_code, Job_code and Salary are non-key fields.
Step 2 of 2
The steps to retrieve the records satisfying the complex condition (Department_code = 5 AND
Job_code = 12 AND Salary = 50,000) using record pointers at the indirection level are as follows:
1. First, retrieve the record pointers of the records that satisfy the condition Department_code = 5,
using the secondary index on Department_code.
2. Then, among the record pointers retrieved in step 1, keep the record pointers of the records
that also satisfy the condition Job_code = 12, using the secondary index on Job_code.
3. Finally, among the record pointers retrieved in step 2, keep the record pointers of the records
that also satisfy the condition Salary = 50,000, using the secondary index on Salary.
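The three intersection steps can be sketched in a few lines of Python. In this minimal illustration, each secondary index is assumed to map a field value to a set of record pointers; all index contents below are invented for the example:

```python
# Hypothetical secondary indexes: field value -> set of record pointers.
dept_index = {5: {10, 11, 12, 13}}
job_index = {12: {11, 13, 14}}
salary_index = {50000: {13, 15}}

ptrs = dept_index[5]                 # step 1: Department_code = 5
ptrs = ptrs & job_index[12]          # step 2: intersect with Job_code = 12
ptrs = ptrs & salary_index[50000]    # step 3: intersect with Salary = 50,000
print(sorted(ptrs))                  # [13] -- only record 13 satisfies all three
```

Only the records whose pointers survive all three intersections are then fetched from the file.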
Chapter 17, Problem 25E
Problem
Adapt Algorithms 17.2 and 17.3, which outline search and insertion procedures for a B+-tree, to a
B-tree.
Step-by-step solution
Step 1 of 2
Searching for a record in a B-tree with search key value K:
n <- block containing the root node of the B-tree;
read block n;
while (n is not a leaf node of the tree) do
begin
q <- number of tree pointers in node n;
search node n for an entry i such that n.Ki = K (* n.Ki refers to the ith search field value in node n *);
if found
then use the data pointer n.Pri to access the file record; exit
(* in a B-tree, unlike a B+-tree, internal nodes carry data pointers, so the search can succeed before reaching a leaf *)
else if K < n.K1
then n <- n.P1 (* n.Pi refers to the ith tree pointer in node n *)
else if K > n.K(q-1)
then n <- n.Pq
else begin
search node n for the entry i such that n.K(i-1) < K < n.Ki;
n <- n.Pi
end;
read block n
end;
search leaf node n for an entry (Ki, Pri) with K = Ki;
if found
then use the data pointer Pri to access the file record
else the value K does not exist in the file (* if we reach a leaf without a match, K is absent *)
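The search just outlined can be exercised with a tiny runnable sketch. It assumes nodes are represented as (keys, children) tuples with an empty children list at a leaf, and omits data pointers for brevity; the sample tree is invented:

```python
# Minimal B-tree search sketch; nodes are (keys, children) tuples.
def btree_search(node, k):
    keys, children = node
    i = 0
    while i < len(keys) and k > keys[i]:
        i += 1
    if i < len(keys) and keys[i] == k:
        return True            # in a B-tree the match may occur at an internal node
    if not children:
        return False           # reached a leaf without a match: k is not in the file
    return btree_search(children[i], k)

# Illustrative tree: root holds 10; leaves hold {3, 7} and {15, 20}.
root = ([10], [([3, 7], []), ([15, 20], [])])
print(btree_search(root, 7), btree_search(root, 10), btree_search(root, 8))  # True True False
```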
Step 2 of 2
Inserting a record with search key value K into a B-tree of order p:
n <- block containing the root node of the B-tree;
read block n; set stack S to empty;
while (n is not a leaf node of the tree) do
begin
push the address of n on stack S;
q <- number of tree pointers in node n;
(* if K = n.Ki at any internal node, the record already exists; cannot insert *)
if K <= n.K1
then n <- n.P1
else if K > n.K(q-1)
then n <- n.Pq
else begin
search node n for an entry i such that n.K(i-1) < K <= n.Ki;
n <- n.Pi
end;
read block n
end;
search leaf node n for an entry (Ki, Pri) with K = Ki;
if found
then the record is already in the file; cannot insert
else (* insert K and its record pointer into the tree *)
begin
create entry (K, Pr), where Pr points to the new record;
if leaf node n is not full
then insert (K, Pr) in its correct position in leaf node n
else begin (* leaf node n is full: split it *)
copy n to temp;
insert entry (K, Pr) in temp in its correct position;
new <- a new empty leaf node for the tree;
j <- ceiling((p + 1)/2);
n <- the first j - 1 entries in temp;
(K, Pr) <- the jth entry in temp
(* in a B-tree the middle entry moves up to the parent; it is not retained below, unlike a B+-tree *);
new <- the remaining entries in temp;
finished <- false;
repeat
if stack S is empty
then begin (* no parent: a new root must be created *)
root <- a new empty internal node for the tree;
root <- <n, (K, Pr), new>; finished <- true
end
else begin
n <- pop stack S;
if internal node n is not full
then begin
insert (K, Pr) and tree pointer new in their correct position in internal node n;
finished <- true
end
else begin (* internal node n is full: split it *)
copy n to temp;
insert (K, Pr) and tree pointer new in temp in their correct position;
new <- a new empty internal node for the tree;
j <- ceiling((p + 1)/2);
n <- the entries of temp up to tree pointer Pj;
(K, Pr) <- the jth entry in temp (* moves up to the parent *);
new <- the entries of temp from tree pointer P(j+1) onward
end
end
until finished
end
end;
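For a runnable cross-check of split-based insertion, here is a compact B-tree in Python. It uses the CLRS-style minimum-degree formulation (a node holds at most 2t - 1 keys) rather than the order-p notation above; that formulation, and all class and method names, are assumptions made for brevity:

```python
# Compact B-tree sketch (minimum degree t): proactive split on the way down.
class Node:
    def __init__(self, leaf=True):
        self.keys, self.children, self.leaf = [], [], leaf

class BTree:
    def __init__(self, t=2):
        self.t, self.root = t, Node()

    def insert(self, k):
        r = self.root
        if len(r.keys) == 2 * self.t - 1:      # root is full: the tree grows a level
            s = Node(leaf=False)
            s.children.append(r)
            self.root = s
            self._split(s, 0)
            self._insert_nonfull(s, k)
        else:
            self._insert_nonfull(r, k)

    def _split(self, parent, i):
        t, child = self.t, parent.children[i]
        right = Node(leaf=child.leaf)
        right.keys = child.keys[t:]            # upper half moves to the new node
        mid = child.keys[t - 1]                # middle key moves up (B-tree, not B+-tree)
        child.keys = child.keys[:t - 1]
        if not child.leaf:
            right.children = child.children[t:]
            child.children = child.children[:t]
        parent.keys.insert(i, mid)
        parent.children.insert(i + 1, right)

    def _insert_nonfull(self, n, k):
        i = len(n.keys) - 1
        if n.leaf:
            n.keys.append(None)                # make room, then shift larger keys right
            while i >= 0 and k < n.keys[i]:
                n.keys[i + 1] = n.keys[i]
                i -= 1
            n.keys[i + 1] = k
        else:
            while i >= 0 and k < n.keys[i]:
                i -= 1
            i += 1
            if len(n.children[i].keys) == 2 * self.t - 1:
                self._split(n, i)              # split a full child before descending
                if k > n.keys[i]:
                    i += 1
            self._insert_nonfull(n.children[i], k)

    def search(self, n, k):
        i = 0
        while i < len(n.keys) and k > n.keys[i]:
            i += 1
        if i < len(n.keys) and n.keys[i] == k:
            return True                        # a match may occur at any level
        return False if n.leaf else self.search(n.children[i], k)
```

Inserting a handful of keys and searching for them exercises both the leaf and root splits.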
Chapter 17, Problem 26E
Problem
It is possible to modify the B+-tree insertion algorithm to delay the case where a new level is
produced by checking for a possible redistribution of values among the leaf nodes. Figure 17.17
illustrates how this could be done for our example in Figure 17.12; rather than splitting the
leftmost leaf node when 12 is inserted, we do a left redistribution by moving 7 to the leaf node to
its left (if there is space in this node). Figure 17.17 shows how the tree would look when
redistribution is considered. It is also possible to consider right redistribution. Try to modify the
B+-tree insertion algorithm to take redistribution into account.
Step-by-step solution
Step 1 of 1
Refer to Figure 17.17 for the redistribution of values among the leaf nodes that delays creating a
new level. The figure shows the insertion of the values 12, 9, and 6; when 12 is inserted, 7 is
moved to the leaf node to its left through left redistribution rather than splitting the leaf.
The insertion algorithm can take right redistribution into account as follows:
• When a new value is inserted in a leaf node, the tree is divided into leaf nodes and internal
nodes. Every value that appears in an internal node also appears as the rightmost value in some
leaf node, such that the tree pointer to the left of that value leads to it.
• If a new value needs to be inserted in a leaf node and the leaf node is full, the node is
ordinarily split. The first j = ceiling((p_leaf + 1)/2) values, where p_leaf denotes the order of the
leaf nodes, are retained in the original node, and the rest of the values are moved to a new leaf
node. A duplicate of the jth search value is kept at the parent node, together with a pointer to
the new node.
• The entry for the new node is inserted in the parent node. If the parent node is full, it is split
as well: the jth search value is moved up to its parent, the entries up to tree pointer Pj (the jth
tree pointer) are kept in the original internal node, and the entries from tree pointer P(j+1)
through the last entry are moved to a new internal node.
• The splitting of parent and leaf nodes can continue in this way and results in a new level for
the tree.
The modified B+-tree insertion algorithm based on right redistribution avoids this where possible:
before splitting a full leaf node, it checks whether the leaf's right sibling has free space. If so,
the leaf's last entry is moved to the right sibling and the separating value in the parent is
updated, so the split (and any resulting new level) is avoided; only when both the node and its
sibling are full is the split performed.
Chapter 17, Problem 27E
Problem
Outline an algorithm for deletion from a B+-tree.
Step-by-step solution
Step 1 of 1
Delete the entry with search key value K:
n <- block containing the root node of the B+-tree;
read block n;
while (n is not a leaf node of the B+-tree) do
begin
q <- number of tree pointers in node n;
if K <= n.K1 (* n.Ki refers to the ith search field value in node n *)
then n <- n.P1 (* n.Pi refers to the ith tree pointer in node n *)
else if K > n.K(q-1)
then n <- n.Pq
else begin
search node n for an entry i such that n.Ki = K;
if found (n.Ki = K)
then begin (* K also appears as a separating value in this internal node *)
access the leftmost value in the subtree pointed to by n.P(i+1);
store this value in temp;
delete that value from the leaf level of the tree;
replace K in node n with temp;
exit
end
else search node n for an entry i such that n.K(i-1) < K <= n.Ki;
n <- n.Pi
end;
read block n
end;
search leaf node n for an entry (Ki, Pri) with K = Ki;
if not found
then the value does not exist in the tree
else if it is the only entry in leaf node n and n.Pnext is not null
then begin (* borrow the first entry of the next leaf *)
(temp.K1, temp.Pr1) <- (Pnext.K1, Pnext.Pr1);
delete (Pnext.K1, Pnext.Pr1) from the next leaf;
exchange the separating value in the parent with temp.K1;
replace the entry in n by temp;
exit
end
else if it is not the only entry in leaf node n
then delete the entry (Ki, Pri) from n
else begin (* only entry in n, and n.Pnext = null *)
access the rightmost value in the subtree pointed to by the parent;
store that value in temp;
exchange the parent's separating value with temp;
access the entry to be deleted and replace it by temp;
exit
end
Chapter 17, Problem 28E
Problem
Repeat Exercise for a B-tree.
Exercise
Outline an algorithm for deletion from a B+-tree.
Step-by-step solution
Step 1 of 2
Algorithm for deletion from a B-tree:
B-Tree-Delete(x, k)
// x points to the root of the subtree and k is the key to be deleted.
// If k is deleted successfully, B-Tree-Delete returns true; otherwise it returns false.
// Note: this procedure is designed so that, whenever it is called recursively on a node,
// that node has at least t keys.
if x is a leaf then
if k is in x then
delete k from x and return true
else return false // k is not in the subtree
else // x is an internal node
if k is in x then
y <- the child of x that precedes k
if y has at least t keys then
k' <- the predecessor of k in y (use B-Tree-Find-Largest)
copy k' over k // replace k with k'
B-Tree-Delete(y, k') // recursive call
else
z <- the child of x that follows k
if z has at least t keys then
k' <- the successor of k in z
copy k' over k // replace k with k'
B-Tree-Delete(z, k') // recursive call
else // both y and z have t - 1 keys
merge k and all of z into y // y now contains 2t - 1 keys
// k and the pointer to z are deleted from x
B-Tree-Delete(y, k) // recursive call
else // k is not in internal node x
c <- the child of x that points to the root of the subtree that could contain k
if c has only t - 1 keys then
if c has an immediate left/right sibling z with t or more keys then
k' <- the key in x that precedes/follows c
move k' into c as the first/last key in c
k'' <- the last/first key in the immediate left/right sibling z
replace k' in x with k'' // i.e., move k'' up into x from z
move the last/first child subtree of z to c
else // c and both of its immediate siblings have t - 1 keys
// we cannot descend to a child node with only t - 1 keys, so
Step 2 of 2
merge c with one of its immediate siblings and move the appropriate key of x down into the
merged node, so that c has at least t keys; then call
B-Tree-Delete(c, k) // recursive call
Chapter 20, Problem 1RQ
Problem
What is meant by the concurrent execution of database transactions in a multiuser system?
Discuss why concurrency control is needed, and give informal examples.
Step-by-step solution
Step 1 of 1
Multiuser system
A multiuser system is one that many users can use, accessing the data at the same time.
Concurrent execution of database transactions means that, in such a system, the operations of
several transactions are interleaved with one another.
Concurrency control is needed to avoid the following problems:
The lost update problem:
Two or more transactions read the same record at roughly the same time; when the updates are
saved, only the last write survives, so all the other changes are lost.
The temporary update (or dirty read) problem:
One transaction updates an item and then fails before committing, so the update must be
undone; meanwhile, another transaction has already read the temporary value. The data seen by
that second transaction is incorrect, or "dirty".
The incorrect summary problem:
Consider the airline seat reservation example from the book: a person wants to buy a ticket, so
the system computes a summary of how many open seats remain on the plane. Between the
time the summary action starts and finishes, other seats are reserved by other agents, so the
summary returned to the customer is inaccurate: it does not reflect the true number of seats
available.
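The lost update problem can be made concrete with a deterministic simulation of the interleaved schedule r1(X); r2(X); w1(X); w2(X). The variable names and amounts below are illustrative:

```python
# Deterministic simulation of the lost update problem on one data item X.
balance = 100

# Interleaved schedule: r1(X) r2(X) w1(X) w2(X)
t1_local = balance          # T1 reads X = 100
t2_local = balance          # T2 reads X = 100 (before T1 writes)
balance = t1_local + 50     # T1 writes X = 150 (deposit 50)
balance = t2_local - 30     # T2 writes X = 70 (withdraw 30): T1's update is lost

print(balance)              # 70, although 100 + 50 - 30 = 120 was intended
```

Any serial execution of the two transactions would have produced 120; the interleaving silently discards T1's deposit.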
Chapter 20, Problem 2RQ
Problem
Discuss the different types of failures. What is meant by catastrophic failure?
Step-by-step solution
Step 1 of 1
Types of Failures
Failures in database management system are categorized as transaction, system, and media
failures.
There are many possible reasons for transaction to fail during execution:
Computer failure:
A hardware, software, media, or network error may occur during transaction execution. Such
crashes cause database management system failures.
Transaction or system error:
Operations such as division by zero or integer overflow may cause the transaction to fail, as may
a logical programming error or erroneous parameter values. The user may also interrupt the
transaction during its execution.
Local errors or exception conditions detected by the transaction:
Errors or exception conditions detected during the transaction (for example, data needed for the
transaction not being found) may cause it to be canceled; the transaction halts and its effects
are undone, because something prevents it from proceeding.
Concurrency control enforcement:
Several transactions become deadlocked and are aborted.
Disk failure:
The data stored in the disk blocks may be lost because of a read-write error or a read/write head
crash. This could occur during a read or a write operation of the transaction.
Physical problems and catastrophes:
Physical problems include power failure, fire, theft, and destruction of equipment, among many
others.
Catastrophic failure:
A catastrophic failure occurs only rarely, but it can take many forms of physical misfortune to the
database, and the list of such problems is endless. Examples include:
• The hard disk holding the entire database may be completely damaged.
• A fire accident may cause the loss of physical devices and data.
• Power or air-conditioning failures.
• Destruction of physical devices.
• Theft of storage media and physical devices.
• Overwriting disks or tapes by mistake.
Chapter 20, Problem 3RQ
Problem
Discuss the actions taken by the read_item and write_item operations on a database.
Step-by-step solution
Step 1 of 1
In a database, the basic access operations are read_item and write_item, which act as follows.
Actions taken by the read item operation on a database (assume the read operation is performed
on data item X):
Find the address of the disk block that contains item X.
Copy the disk block into a buffer in main memory, if that disk block is not already in some main
memory buffer.
Copy item X from the buffer to the program variable named X.
Actions taken by the write item operation on a database (assume the write operation is
performed on data item X):
Find the address of the disk block that contains item X.
Copy the disk block into a buffer in main memory, if that disk block is not already in some main
memory buffer.
Copy item X from the program variable named X into its correct location in the buffer.
Store the updated block from the buffer back to disk (either immediately or at some later point in
time).
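The steps above can be sketched as a toy buffer manager in Python. The dict-based "disk", the single-block catalog, and the immediate flush on every write are all simplifying assumptions for illustration:

```python
# Toy buffer manager demonstrating the read_item/write_item steps.
disk = {0: {"X": 5, "Y": 9}}        # block number -> block contents
buffer_pool = {}                    # main-memory copies of disk blocks

def block_of(item):
    return 0                        # assume a catalog maps every item to block 0

def read_item(item):
    b = block_of(item)                      # 1. find the address of the disk block
    if b not in buffer_pool:                # 2. copy block into a buffer if absent
        buffer_pool[b] = dict(disk[b])
    return buffer_pool[b][item]             # 3. copy item into the program variable

def write_item(item, value):
    b = block_of(item)                      # 1. find the address of the disk block
    if b not in buffer_pool:                # 2. copy block into a buffer if absent
        buffer_pool[b] = dict(disk[b])
    buffer_pool[b][item] = value            # 3. copy the variable into the buffer
    disk[b] = dict(buffer_pool[b])          # 4. store the block back (here: immediately)

x = read_item("X")
write_item("X", x + 1)
print(read_item("X"))               # 6
```

In a real DBMS, step 4 of write_item may be deferred, which is precisely why a system log is needed for recovery.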
Chapter 20, Problem 4RQ
Problem
Draw a state diagram and discuss the typical states that a transaction goes through during
execution.
Step-by-step solution
Step 1 of 2
State diagram of a transaction:
Step 2 of 2
Typical states that a transaction goes through:
begin_transaction - start.
read or write - read, change, or delete a record.
end_transaction - finish.
commit_transaction - the changes or deletions completed successfully and are made permanent.
rollback - the changes or deletions were unsuccessful; all changes are reset.
Importance of transaction commit points:
A transaction reaches its commit point at the place in the log where it has completed
successfully, together with all of the reads and writes that go along with it.
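The states and transitions can be sketched as a transition table. The exact state and event names below are illustrative, not the textbook's figure:

```python
# Transaction state machine as a (state, event) -> state transition table.
TRANSITIONS = {
    ("start", "begin_transaction"): "active",
    ("active", "read_or_write"): "active",
    ("active", "end_transaction"): "partially_committed",
    ("active", "abort"): "failed",
    ("partially_committed", "commit_transaction"): "committed",
    ("partially_committed", "abort"): "failed",
    ("failed", "rollback"): "terminated",
    ("committed", "terminate"): "terminated",
}

def run(events):
    state = "start"
    for e in events:
        state = TRANSITIONS[(state, e)]     # raises KeyError on an illegal transition
    return state

print(run(["begin_transaction", "read_or_write", "end_transaction",
           "commit_transaction"]))          # committed
```

A transaction that aborts while active must roll back and ends in the terminated state instead.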
Chapter 20, Problem 5RQ
Problem
What is the system log used for? What are the typical kinds of records in a system log? What are
transaction commit points, and why are they important?
Step-by-step solution
Step 1 of 2
System log:
The system log is used to recover from failures that affect transactions. The system maintains
the log to keep track of all transaction operations that affect the values of database items;
essentially, it records everything needed to redo or undo the effects of transactions.
Step 2 of 2
Typical kinds of records in a system log:
start_transaction - a transaction has started.
read - a transaction read an item.
write - a transaction wrote an item (with its old and new values).
commit - a transaction finished successfully.
abort - a transaction was aborted; its changes must not persist.
Importance of transaction commit points:
A transaction reaches its commit point at the place in the log where it has completed
successfully, together with all of the reads and writes that go along with it.
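A system log can be sketched as an append-only list of such records; the tuple shapes below are illustrative. Scanning the log tells recovery which transactions started but never committed:

```python
# System log as an append-only list of records.
log = []

log.append(("start_transaction", "T1"))
log.append(("write", "T1", "X", 5, 10))   # (op, txn, item, old_value, new_value)
log.append(("commit", "T1"))
log.append(("start_transaction", "T2"))
log.append(("write", "T2", "Y", 1, 2))    # T2 never reaches its commit point

started = {r[1] for r in log if r[0] == "start_transaction"}
committed = {r[1] for r in log if r[0] == "commit"}
print(sorted(started - committed))        # transactions to undo on recovery: ['T2']
```

T1's effects are redone if necessary (it passed its commit point); T2's write must be undone using the old value recorded in its log record.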
Chapter 20, Problem 6RQ
Problem
Discuss the atomicity, durability, isolation, and consistency preservation properties of a database
transaction.
Step-by-step solution
Step 1 of 4
Atomicity:
• This property states that a transaction must be treated as an atomic unit, that is, either all its
operations are executed or none.
• There must be no state in a database where a transaction is left partially completed.
• States should be defined either before the execution of the transaction or after the
execution/abortion/failure of the transaction.
• This property requires that a transaction execute to completion or not at all.
• If the transaction fails midway, whether because the user explicitly cancels the operation or
because an internal error occurs, the database ensures that no partial state from the leftover
operations remains.
• The database can UNDO or ROLLBACK all the changes, so that the database is returned to its
original state.
• If a transaction fails to complete for some reason, such as a system crash during transaction
execution, the recovery technique must undo any effects of the transaction on the database.
Step 2 of 4
Durability or permanency:
• The changes applied to the database by a committed transaction must persist in the database,
and must not be lost if failure occurs.
• It is the responsibility of the recovery subsystem of the DBMS.
• If a transaction updates a chunk of data in a database and commits, then the database holds
the modified data.
• Even if a transaction commits but the system fails before the data could be written on to the
disk, then the data will be updated once the system springs back into action.
Step 3 of 4
Isolation:
• A transaction should appear as though it is being executed in isolation from other transactions
simultaneously or in parallel.
• That is, the execution of a transaction should not be interfered with by any other transactions
executing concurrently.
• It is enforced by the concurrency control subsystem of the DBMS: if every transaction withholds
its updates from other transactions until it is committed, one form of isolation is enforced.
• This solves the temporary update problem and eliminates cascading rollbacks.
• In simple terms, one transaction cannot read data from another transaction until it is not
completed.
• If two transactions are executing concurrently and one wants to see the changes made by the
other, it must wait until the other has finished.
Step 4 of 4
Consistency preservation:
• The consistency property ensures that the database remains in a consistent state before the
start of the transaction and after the transaction is over (whether it is successful or not).
• It states that when a transaction finishes, the data remains in a consistent state.
• A transaction either creates a new and valid state of data, or, if any failure occurs, returns all
data to its state before the transaction was started.
• Execution of transaction should take the database from one consistent state to another.
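Atomicity in particular can be sketched with an undo list of before-images; the data and function names below are illustrative:

```python
# Atomicity via an undo list: all of a transaction's updates survive, or none do.
db = {"A": 100, "B": 50}

def run_transaction(updates, fail=False):
    undo = []
    try:
        for item, new_value in updates:
            undo.append((item, db[item]))     # save the before-image
            db[item] = new_value
        if fail:
            raise RuntimeError("system crash before commit")
    except RuntimeError:
        for item, old_value in reversed(undo):
            db[item] = old_value              # UNDO: restore the before-images

run_transaction([("A", 70), ("B", 80)], fail=True)
print(db)   # {'A': 100, 'B': 50} -- no partial state survives the failure
run_transaction([("A", 70), ("B", 80)])
print(db)   # {'A': 70, 'B': 80} -- the committed changes persist
```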
Chapter 20, Problem 7RQ
Problem
What is a schedule (history)? Define the concepts of recoverable, cascade-less, and strict
schedules, and compare them in terms of their recoverability.
Step-by-step solution
Step 1 of 4
Schedule (or history)
A schedule (or history) S of n transactions T1, T2 , ...,Tn is an ordering of the operations of the
transactions subject to the constraint that, for each transaction Ti that participates in S, the
operations of Ti in S must appear in the same order in which they occur in Ti.
If we can ensure that a transaction T, when committed, never has to roll back, then we have a
demarcation between recoverable and non-recoverable schedules.
Schedules determined as non-recoverable should not be permitted.
Among the recoverable schedules, transaction failures generate a spectrum of recoverability,
from easy to complex.
Step 2 of 4
Recoverable:
A schedule S is recoverable if no transaction T in S commits until all transactions T' that have
written an item that T reads have committed.
A transaction T reads from transaction T’ in a schedule S if some item X is first written by T’ and
read later by T.
In addition, T' should not be aborted before T reads item X, and there should be no transactions
that write X after T' writes it and before T reads it (unless those transactions, if any, have
aborted before T reads X).
Step 3 of 4
Cascadeless schedule:
A schedule is said to avoid cascading rollback if every transaction in the schedule reads only
items that were written by committed transactions. This guarantees that read items will not be
discarded.
Otherwise, an uncommitted transaction may have to be rolled back because it read an item from
a transaction that subsequently failed (a cascading rollback).
This form of rollback is undesirable, since it can lead to undoing a significant amount of work. It is
desirable to restrict the schedules to those where cascading rollbacks cannot occur.
Step 4 of 4
Strict schedule:
Transactions can neither read nor write an item X until the last transaction that wrote X has
committed or aborted.
Strict schedules simplify the recovery process.
The process of undoing a write (X) operation of an aborted transaction is simply to restore the
before image, the old-value for X.
Though this always works correctly for strict schedules, it may not work for recoverable or
cascadeless schedules.
Every cascadeless schedule is recoverable, and every strict schedule is cascadeless; the reverse
implications do not always hold.
Chapter 20, Problem 8RQ
Problem
Discuss the different measures of transaction equivalence. What is the difference between
conflict equivalence and view equivalence?
Step-by-step solution
Step 1 of 3
Different measures of transaction equivalence are:
1.) Conflict equivalence: Two schedules are said to be conflict equivalent if the order of any two
conflicting operations is the same in both schedules. Two operations in a schedule are said to
conflict if they belong to different transactions, access the same database item, and at least one
of the two operations is a write_item operation. If two conflicting operations are applied in
different orders in two schedules, the effect can be different on the database or on other
transactions in the schedules, and hence the schedules are not conflict equivalent.
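Conflict equivalence leads to the standard conflict-serializability test: build a precedence graph with an edge Ti -> Tj whenever an operation of Ti conflicts with and precedes an operation of Tj, and accept exactly when the graph is acyclic. A small Python sketch with an illustrative schedule representation:

```python
# Precedence-graph test for conflict serializability.
from itertools import combinations

def conflict_serializable(schedule):
    # schedule: list of (op, txn, item) with op in {"r", "w"}, in execution order
    edges = set()
    for (op1, t1, x1), (op2, t2, x2) in combinations(schedule, 2):
        if t1 != t2 and x1 == x2 and "w" in (op1, op2):
            edges.add((t1, t2))                    # t1's operation precedes t2's
    nodes = {t for e in edges for t in e}
    while nodes:                                   # peel off nodes with no incoming edge
        roots = {n for n in nodes if all(v != n for _, v in edges)}
        if not roots:
            return False                           # only cycles remain
        nodes -= roots
        edges = {(u, v) for u, v in edges if u in nodes and v in nodes}
    return True                                    # graph is acyclic

# r1(X) w2(X) w1(X): edges T1->T2 and T2->T1 form a cycle.
print(conflict_serializable([("r", "T1", "X"), ("w", "T2", "X"), ("w", "T1", "X")]))  # False
```

An acyclic precedence graph also yields an equivalent serial order via topological sort.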
Step 2 of 3
2.) View equivalence: Another, less restrictive definition of schedule equivalence is called view
equivalence. Two schedules S and S' are said to be view equivalent if the following 3 conditions
hold:
1.) The same set of transactions participates in S and S', and S and S' include the same
operations of those transactions.
2.) For any operation ri(X) of Ti in S, if the value of X read by the operation was written by an
operation wj(X) of Tj, the same condition must hold for the value of X read by operation ri(X) of
Ti in S'.
3.) If the operation wk(Y) of Tk is the last operation to write item Y in S, then wk(Y) of Tk must
also be the last operation to write item Y in S'.
The idea behind view equivalence is that, as long as each read operation of a transaction reads
the result of the same write operation in both schedules, the write operations of each
transaction must produce the same results. The read operations are thus said to have the same
view in both schedules. Condition 3 ensures that the final write operation on each data item is
the same in both schedules, so the database state should be the same at the end of both
schedules.
Step 3 of 3
The difference between view equivalence and conflict equivalence arises under the
unconstrained write assumption, where the value written by an operation wi(X) in Ti can be
independent of its old value in the database. This is called a blind write, and it is illustrated by
the following schedule Sg of three transactions T1: r1(X); w1(X); T2: w2(X); and T3: w3(X):
Sg: r1(X); w2(X); w1(X); w3(X); c1; c2; c3;
In Sg, the operations w2(X) and w3(X) are blind writes, since T2 and T3 do not read the value of
X. The schedule Sg is view serializable but not conflict serializable. Conflict-serializable
schedules are view serializable, but not vice versa. Testing for view serializability has been
shown to be NP-hard, meaning that finding an efficient polynomial-time algorithm for this
problem is highly unlikely.
Chapter 20, Problem 9RQ
Problem
What is a serial schedule? What is a serializable schedule? Why is a serial schedule considered
correct? Why is a serializable schedule considered correct?
Step-by-step solution
Step 1 of 4
Serial schedule:
A schedule S is called serial if, for each transaction T participating in the schedule, all the
operations of T are executed consecutively in the schedule.
• From this perspective, only one transaction at a time is active, and the commit of that
transaction initiates the execution of the next transaction.
Step 2 of 4
Serializable schedule:
A schedule S of a set of n transactions (T1, T2, ..., Tn) is called serializable if it is equivalent to
some serial schedule of the same n transactions.
There are n! possible serial schedules of the n transactions, and many more possible nonserial
schedules. The nonserial schedules fall into two disjoint groups: those that are equivalent to one
or more of the serial schedules, and hence are serializable, and those that are not.
Step 3 of 4
Reason for the correctness of serial schedules:
A serial schedule is considered correct on the assumption that the transactions are independent
of one another. By the consistency preservation property, a transaction that runs in isolation
from beginning to end, without interference from other transactions, produces a correct result on
the database.
Therefore, a set of transactions executed serially, one at a time, is correct.
Step 4 of 4
Reason for the correctness of serializable schedules:
The simple way to establish the correctness of a serializable schedule is through the definition of
schedule equivalence: the schedule is compared with a serial schedule, and if both produce the
same effects on the database, the two schedules are equivalent and the schedule is serializable.
Therefore, a serializable schedule is correct because it has the same effect as some serial
schedule, which is itself correct.
Chapter 20, Problem 10RQ
Problem
What is the difference between the constrained write and the unconstrained write assumptions?
Which is more realistic?
Step-by-step solution
Step 1 of 1
The constrained write assumption states that any write operation wi(X) in Ti is preceded by a
ri(X) in Ti, and that the value written by wi(X) in Ti depends only on the value of X read by ri(X).
This assumes that computation of the new value of X is a function f(X) of the old value of X read
from the database.
The unconstrained write assumption states that the value written by an operation wi(X) in Ti can
be independent of its old value in the database. This is called a blind write, and it is illustrated
by the following schedule Sg of three transactions T1: r1(X); w1(X); T2: w2(X); and T3: w3(X):
Sg: r1(X); w2(X); w1(X); w3(X); c1; c2; c3;
In Sg, the operations w2(X) and w3(X) are blind writes, since T2 and T3 do not read the value of
X.
The constrained write assumption is more realistic, as applications and queries usually need to
take the old value of a data item into account before writing a new value.
Chapter 20, Problem 11RQ
Problem
Discuss how serializability is used to enforce concurrency control in a database system. Why is
serializability sometimes considered too restrictive as a measure of correctness for schedules?
Step-by-step solution
Step 1 of 4
The concept of serializability of schedules is used to identify which schedules are correct when
transaction executions have interleaving of their operations in the schedules. A schedule S of n
transactions is serializable if it is equivalent to some serial schedule of the same n transactions.
Saying that a nonserial schedule S is serializable is equivalent to saying that it is correct,
because it is equivalent to a serial schedule, which is considered correct.
There are several ways of defining when two schedules are equivalent:
Result equivalence: two schedules are result equivalent if they produce the same final state of
the database. However, two schedules may accidentally produce the same final state, so result
equivalence alone cannot be used to define equivalence of schedules.
Step 2 of 4
Conflict equivalence: Two schedules are said to be conflict equivalent if the order of any two
conflicting operations is the same in both schedules. Two operations in a schedule are said to
conflict if they belong to different transactions, access the same database item, and at least one
of the two operations is a write item operation. If two conflicting operations are applied in
different orders in two schedules, the effect can be different on the database or on other
transactions in the schedules, and hence the schedules are not conflict equivalent.
Step 3 of 4
View equivalence: Another, less restrictive definition of schedule equivalence is called view
equivalence. Two schedules S and S' are said to be view equivalent if the following 3 conditions
hold:
1.) The same set of transactions participates in S and S', and S and S' include the same
operations of those transactions.
2.) For any operation ri(X) of Ti in S, if the value of X read by the operation was written by an
operation wj(X) of Tj, the same condition must hold for the value of X read by operation ri(X) of
Ti in S'.
3.) If the operation wk(Y) of Tk is the last operation to write item Y in S, then wk(Y) of Tk must
also be the last operation to write item Y in S'.
Step 4 of 4
Serializability of schedules is sometimes considered to be too restrictive as a condition for
ensuring the correctness of concurrent executions. Some applications can produce schedules
that are correct by satisfying conditions less stringent than either conflict serializability or view
serializability.
An example is the type of transaction known as a debit-credit transaction; for example, one that
applies deposits and withdrawals to a data item whose value is the current balance of a bank
account. The semantics of debit-credit operations is that they update the value of a data item X
by either adding to or subtracting from its current value; both of these operations are
commutative, and it is possible to produce correct schedules that are not serializable.
With the additional knowledge, or semantics, that the operations between each ri(X) and wi(X)
are commutative, we know that the order of executing the sequences consisting of (read,
update, write) is not important, as long as each (read, update, write) sequence by a particular
transaction Ti on a particular item X is not interrupted by conflicting operations. Hence a
nonserializable schedule can also be considered correct. Researchers have been working on
extending concurrency control theory to deal with cases where serializability is considered too
restrictive as a condition for correctness of schedules.
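The commutativity point can be checked directly: every ordering of a set of deposits and withdrawals yields the same final balance. The amounts below are illustrative:

```python
# Deposits and withdrawals commute: every ordering gives the same final balance.
from itertools import permutations

deltas = (+50, -30, +20)   # e.g. T1 deposits 50, T2 withdraws 30, T3 deposits 20
finals = {100 + sum(order) for order in permutations(deltas)}
print(finals)              # {140}: the order of the operations does not matter
```

This is why such schedules can be correct without being serializable: no ordering of the commutative updates changes the outcome.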
Chapter 20, Problem 12RQ
Problem
Describe the four levels of isolation in SQL. Also discuss the concept of snapshot isolation and
its effect on the phantom record problem.
Step-by-step solution
Step 1 of 2
The ISOLATION LEVEL statement is used to specify the isolation value, where the value can be
SERIALIZABLE, REPEATABLE READ, READ COMMITTED, or READ UNCOMMITTED.
SERIALIZABLE is the default isolation level in the SQL standard, but some systems use READ
COMMITTED as the default level.
The four isolation levels are as follows:
1. Level 0: A transaction has level 0 isolation if it does not overwrite the dirty reads of
higher-level transactions.
Such an isolation level corresponds to the value READ UNCOMMITTED. It lets a query see data
modified by another, still-running statement, whether or not that transaction has committed; this
is called a dirty read.
Example:
Statement 1:
BEGIN TRAN
UPDATE stu SET marks = 200 WHERE rollno = 34
WAITFOR DELAY '00:00:20'
COMMIT;
Statement 2:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT * FROM stu;
Statement 2 executes during statement 1's delay, after the update of the stu table but before the
transaction commits, and displays the uncommitted (dirty) records.
2. Level 1: The transaction having this isolation level has no lost updates. Such isolation level
has the value READ COMMITTED.
In this isolation level, a SQL query reads only committed values. If the data is locked by an
incomplete transaction, the SELECT statement waits until that transaction completes.
3. Level 2: The transaction having this isolation level has no dirty reads as well as no lost
updates. Such isolation level has the value REPEATABLE READ.
Repeatable read is an extension of read committed. It ensures that if the same query is
executed again within the transaction, it will not see changes that another transaction has made
to the data values it read. No other user can modify those data values until the transaction is
committed or rolled back.
4. Level 3: In addition to the properties of level 2, isolation level 3 has repeatable reads. Such an
isolation level has the value SERIALIZABLE. The serializable isolation level works like repeatable
read except that it also prevents phantoms when the same query is executed twice. This option
works with range locks; it locks the whole table if no condition is specified on an index.
Comment
Step 2 of 2
Snapshot isolation:
Snapshot isolation is used in concurrency control protocols and in some commercial DBMSs.
Under snapshot isolation, the data items read by a transaction are based on the committed
values of the items in the database snapshot taken when the transaction starts.
Snapshot isolation ensures that the phantom record problem does not occur, because the
transaction sees only the records that were committed in the database at the beginning of the
transaction.
Comment
Chapter 20, Problem 13RQ
Problem
Define the violations caused by each of the following: dirty read, nonrepeatable read, and
phantoms.
Step-by-step solution
Step 1 of 1
Violations caused by:
Dirty read –
A transaction reads data written by another transaction that has not yet committed. If the writing
transaction later aborts, the value read by the first transaction becomes invalid, so any work
based on it is incorrect.
Nonrepeatable read –
The transaction reads a value from a record. Another transaction changes the value of the
record that was read. When the initial transaction reads the record again, the value is different.
Phantoms –
A transaction reads a set of rows from a table based on some condition specified in the SQL
WHERE clause. Another transaction inserts a new row that also satisfies the condition. If the
initial transaction repeats its query, the new "phantom" row appears.
Comment
Chapter 20, Problem 14E
Problem
Change transaction T2 in Figure 20.2(b) to read
read_item(X);
X := X + M;
if X > 90 then exit
else write_item(X);
Discuss the final result of the different schedules in Figures 20.3(a) and (b), where M = 2 and N =
2, with respect to the following questions: Does adding the above condition change the final
outcome? Does the outcome obey the implied consistency rule (that the capacity of X is 90)?
Step-by-step solution
Step 1 of 1
Let the condition be
read_item(X);
X := X + M;
if X > 90 then exit
else write_item(X);
With M = 2, this condition does not change the final outcome unless the initial value of X > 88,
since only then would the new value X + 2 exceed 90 and the write be skipped.
The outcome does obey the implied consistency rule that X does not exceed 90, since the
value of X is not updated if it would become greater than 90.
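The modified T2 logic can be simulated directly; a minimal sketch (the function name t2_modified is hypothetical, chosen here for illustration):

```python
def t2_modified(x, m):
    # Modified T2 from Exercise 20.14: compute X + M, but skip the write
    # (leaving X unchanged) when the new value would exceed 90.
    new_value = x + m
    return x if new_value > 90 else new_value

# With M = 2, the write is skipped only when the initial X exceeds 88:
print(t2_modified(88, 2))  # 90: still within capacity, write happens
print(t2_modified(89, 2))  # 89: 91 > 90, so the write is skipped
```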
Comment
Chapter 20, Problem 15E
Problem
Repeat Exercise 20.14, adding a check in T1 so that does not exceed 90.
Reference Exercise 20.14
Change transaction T2 in Figure 20.2(b) to read
read_item(X);
X := X + M;
if X > 90 then exit
else write_item(X);
Discuss the final result of the different schedules in Figures 20.3(a) and (b), where M = 2 and N =
2, with respect to the following questions: Does adding the above condition change the final
outcome? Does the outcome obey the implied consistency rule (that the capacity of X is 90)?
Step-by-step solution
Step 1 of 1
Let the modified T2 be
read_item(X);
X := X + M;
if X > 90 then exit
else write_item(X);
With the analogous check added to T1, the interleaved schedule can be written as:
T1: read_item(X);
T1: X := X - N;
T2: read_item(X);
T2: X := X + M;
T1: write_item(X);
T1: read_item(Y);
T2: if X > 90 then exit
T2: else write_item(X);
T1: Y := Y + N;
T1: if Y > 90 then exit
T1: else write_item(Y);
This condition does not change the final outcome unless the initial value of X > 88 or
Y > 88.
The outcome obeys the implied consistency rule that X and Y do not exceed 90.
Chapter 20, Problem 16E
Problem
Add the operation commit at the end of each of the transactions T1 and T2 in Figure 20.2, and
then list all possible schedules for the modified transactions. Determine which of the schedules
are recoverable, which are cascade-less, and which are strict.
Step-by-step solution
Step 1 of 6
The two transactions from the textbook, with commit operations added, are:
T1: read_item(X); X := X - N; write_item(X); read_item(Y); Y := Y + N; write_item(Y); commit T1;
T2: read_item(X); X := X + M; write_item(X); commit T2;
Using shorthand notation, these transactions can be written as:
T1: r1(X); w1(X); r1(Y); w1(Y); C1;
T2: r2(X); w2(X); C2;
Comment
Step 2 of 6
Given m transactions with number of operations n1, n2, ..., nm,
the number of possible schedules is
(n1 + n2 + ... + nm)! / (n1! * n2! * ... * nm!),
Here ! is the factorial function.
In our case,
m = 2,
n1 = 5,
n2 = 3,
so the number of possible schedules is
(5 + 3)! / (5! * 3!) = 40320 / (120 * 6) = 56.
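The count can be checked in code; a short sketch of the multinomial formula above (the function name num_schedules is illustrative):

```python
from math import factorial

def num_schedules(op_counts):
    # (n1 + n2 + ... + nm)! / (n1! * n2! * ... * nm!)
    total = factorial(sum(op_counts))
    for n in op_counts:
        total //= factorial(n)
    return total

print(num_schedules([5, 3]))  # 56
```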
Comment
Step 3 of 6
The 56 possible schedules, and the type of each schedule, are:
S 1 : r 1 (X); w 1 (X); r 1 (Y); w 1 (Y); C 1 ; r 2 (X); w 2 (X); C 2 ; strict (and hence
cascadeless)
S 2 : r 1 (X); w 1 (X); r 1 (Y); w 1 (Y); r 2 (X); C 1 ; w 2 (X); C 2 ; recoverable
S 3 : r 1 (X); w 1 (X); r 1 (Y); w 1 (Y); r 2 (X); w 2 (X); C 1 ; C 2 ; recoverable
S 4 : r 1 (X); w 1 (X); r 1 (Y); w 1 (Y); r 2 (X); w 2 (X); C 2 ; C 1 ; non-recoverable
S 5 : r 1 (X); w 1 (X); r 1 (Y); r 2 (X); w 1 (Y); C 1 ; w 2 (X); C 2 ; recoverable
S 6 : r 1 (X); w 1 (X); r 1 (Y); r 2 (X); w 1 (Y); w 2 (X); C 1 ; C 2 ; recoverable
S 7 : r 1 (X); w 1 (X); r 1 (Y); r 2 (X); w 1 (Y); w 2 (X); C 2 ; C 1 ; non-recoverable
S 8 : r 1 (X); w 1 (X); r 1 (Y); r 2 (X); w 2 (X); w 1 (Y); C 1 ; C 2 ; recoverable
S 9 : r 1 (X); w 1 (X); r 1 (Y); r 2 (X); w 2 (X); w 1 (Y); C 2 ; C 1 ; non-recoverable
S 10 : r 1 (X); w 1 (X); r 1 (Y); r 2 (X); w 2 (X); C 2 ; w 1 (Y); C 1 ; non-recoverable
S 11 : r 1 (X); w 1 (X); r 2 (X); r 1 (Y); w 1 (Y); C 1 ; w 2 (X); C 2 ; recoverable
S 12 : r 1 (X); w 1 (X); r 2 (X); r 1 (Y); w 1 (Y); w 2 (X); C 1 ; C 2 ; recoverable
S 13 : r 1 (X); w 1 (X); r 2 (X); r 1 (Y); w 1 (Y); w 2 (X); C 2 ; C 1 ; non-recoverable
S 14 : r 1 (X); w 1 (X); r 2 (X); r 1 (Y); w 2 (X); w 1 (Y); C 1 ; C 2 ; recoverable
S 15 : r 1 (X); w 1 (X); r 2 (X); r 1 (Y); w 2 (X); w 1 (Y); C 2 ; C 1 ; non-recoverable
S 16 : r 1 (X); w 1 (X); r 2 (X); r 1 (Y); w 2 (X); C 2 ; w 1 (Y); C 1 ; non-recoverable
S 17 : r 1 (X); w 1 (X); r 2 (X); w 2 (X); r 1 (Y); w 1 (Y); C 1 ; C 2 ; recoverable
S 18 : r 1 (X); w 1 (X); r 2 (X); w 2 (X); r 1 (Y); w 1 (Y); C 2 ; C 1 ; non-recoverable
S 19 : r 1 (X); w 1 (X); r 2 (X); w 2 (X); r 1 (Y); C 2 ; w 1 (Y); C 1 ; non-recoverable
S 20 : r 1 (X); w 1 (X); r 2 (X); w 2 (X); C 2 ; r 1 (Y); w 1 (Y); C 1 ; non-recoverable
Comment
Step 4 of 6
S 21 : r 1 (X); r 2 (X); w 1 (X); r 1 (Y); w 1 (Y); C 1 ; w 2 (X); C 2 ; strict (and hence
cascadeless)
S 22 : r 1 (X); r 2 (X); w 1 (X); r 1 (Y); w 1 (Y); w 2 (X); C 1 ; C 2 ; cascadeless
S 23 : r 1 (X); r 2 (X); w 1 (X); r 1 (Y); w 1 (Y); w 2 (X); C 2 ; C 1 ; cascadeless
S 24 : r 1 (X); r 2 (X); w 1 (X); r 1 (Y); w 2 (X); w 1 (Y); C 1 ; C 2 ; cascadeless
S 25 : r 1 (X); r 2 (X); w 1 (X); r 1 (Y); w 2 (X); w 1 (Y); C 2 ; C 1 ; cascadeless
S 26 : r 1 (X); r 2 (X); w 1 (X); r 1 (Y); w 2 (X); C 2 ; w 1 (Y); C 1 ; cascadeless
S 27 : r 1 (X); r 2 (X); w 1 (X); w 2 (X); r 1 (Y); w 1 (Y); C 1 ; C 2 ; cascadeless
S 28 : r 1 (X); r 2 (X); w 1 (X); w 2 (X); r 1 (Y); w 1 (Y); C 2 ; C 1 ; cascadeless
S 29 : r 1 (X); r 2 (X); w 1 (X); w 2 (X); r 1 (Y); C 2 ; w 1 (Y); C 1 ; cascadeless
S 30 : r 1 (X); r 2 (X); w 1 (X); w 2 (X); C 2 ; r 1 (Y); w 1 (Y); C 1 ; cascadeless
S 31 : r 1 (X); r 2 (X); w 2 (X); w 1 (X); r 1 (Y); w 1 (Y); C 1 ; C 2 ; cascadeless
S 32 : r 1 (X); r 2 (X); w 2 (X); w 1 (X); r 1 (Y); w 1 (Y); C 2 ; C 1 ; cascadeless
S 33 : r 1 (X); r 2 (X); w 2 (X); w 1 (X); r 1 (Y); C 2 ; w 1 (Y); C 1 ; cascadeless
S 34 : r 1 (X); r 2 (X); w 2 (X); w 1 (X); C 2 ; r 1 (Y); w 1 (Y); C 1 ; cascadeless
S 35 : r 1 (X); r 2 (X); w 2 (X); C 2 ; w 1 (X); r 1 (Y); w 1 (Y); C 1 ; strict (and hence
cascadeless)
S 36 : r 2 (X); r 1 (X); w 1 (X); r 1 (Y); w 1 (Y); C 1 ; w 2 (X); C 2 ; strict (and hence
cascadeless)
S 37 : r 2 (X); r 1 (X); w 1 (X); r 1 (Y); w 1 (Y); w 2 (X); C 1 ; C 2 ; cascadeless
S 38 : r 2 (X); r 1 (X); w 1 (X); r 1 (Y); w 1 (Y); w 2 (X); C 2 ; C 1 ; cascadeless
S 39 : r 2 (X); r 1 (X); w 1 (X); r 1 (Y); w 2 (X); w 1 (Y); C 1 ; C 2 ; cascadeless
S 40 : r 2 (X); r 1 (X); w 1 (X); r 1 (Y); w 2 (X); w 1 (Y); C 2 ; C 1 ; cascadeless
Comment
Step 5 of 6
S 41 : r 2 (X); r 1 (X); w 1 (X); r 1 (Y); w 2 (X); C 2 ; w 1 (Y); C 1 ; cascadeless
S 42 : r 2 (X); r 1 (X); w 1 (X); w 2 (X); r 1 (Y); w 1 (Y); C 1 ; C 2 ; cascadeless
S 43 : r 2 (X); r 1 (X); w 1 (X); w 2 (X); r 1 (Y); w 1 (Y); C 2 ; C 1 ; cascadeless
S 44 : r 2 (X); r 1 (X); w 1 (X); w 2 (X); r 1 (Y); C 2 ; w 1 (Y); C 1 ; cascadeless
S 45 : r 2 (X); r 1 (X); w 1 (X); w 2 (X); C 2 ; r 1 (Y); w 1 (Y); C 1 ; cascadeless
S 46 : r 2 (X); r 1 (X); w 2 (X); w 1 (X); r 1 (Y); w 1 (Y); C 1 ; C 2 ; cascadeless
S 47 : r 2 (X); r 1 (X); w 2 (X); w 1 (X); r 1 (Y); w 1 (Y); C 2 ; C 1 ; cascadeless
S 48 : r 2 (X); r 1 (X); w 2 (X); w 1 (X); r 1 (Y); C 2 ; w 1 (Y); C 1 ; cascadeless
S 49 : r 2 (X); r 1 (X); w 2 (X); w 1 (X); C 2 ; r 1 (Y); w 1 (Y); C 1 ; cascadeless
S 50 : r 2 (X); r 1 (X); w 2 (X); C 2 ; w 1 (X); r 1 (Y); w 1 (Y); C 1 ; cascadeless
Comment
Step 6 of 6
S 51 : r 2 (X); w 2 (X); r 1 (X); w 1 (X); r 1 (Y); w 1 (Y); C 1 ; C 2 ; non-recoverable
S 52 : r 2 (X); w 2 (X); r 1 (X); w 1 (X); r 1 (Y); w 1 (Y); C 2 ; C 1 ; recoverable
S 53 : r 2 (X); w 2 (X); r 1 (X); w 1 (X); r 1 (Y); C 2 ; w 1 (Y); C 1 ; recoverable
S 54 : r 2 (X); w 2 (X); r 1 (X); w 1 (X); C 2 ; r 1 (Y); w 1 (Y); C 1 ; recoverable
S 55 : r 2 (X); w 2 (X); r 1 (X); C 2 ; w 1 (X); r 1 (Y); w 1 (Y); C 1 ; recoverable
S 56 : r 2 (X); w 2 (X); C 2 ; r 1 (X); w 1 (X); r 1 (Y); w 1 (Y); C 1 ; strict (and hence
cascadeless)
Comment
Chapter 20, Problem 17E
Problem
List all possible schedules for transactions T1 and T2 in Figure 20.2, and determine which are
conflict serializable (correct) and which are not.
Step-by-step solution
Step 1 of 3
The two transactions T1 and T2 (Figure 20.2) are as follows:
T1: read_item(X); X := X - N; write_item(X); read_item(Y); Y := Y + N; write_item(Y);
T2: read_item(X); X := X + M; write_item(X);
Comment
Step 2 of 3
The shorthand notation for the two transactions is:
T1: r1(X); w1(X); r1(Y); w1(Y);
T2: r2(X); w2(X);
Comment
Step 3 of 3
There are (4 + 2)! / (4! * 2!) = 15 possible schedules, and each one can be classified as conflict
serializable or not by drawing its precedence graph.
Comment
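Since the figure listing the 15 schedules is not reproduced here, a sketch that enumerates them and counts the conflict-serializable ones can stand in for it. With only two transactions, the precedence graph has a cycle exactly when it contains edges in both directions:

```python
# T1 and T2 from Figure 20.2, without commit operations.
T1 = [('r', 1, 'X'), ('w', 1, 'X'), ('r', 1, 'Y'), ('w', 1, 'Y')]
T2 = [('r', 2, 'X'), ('w', 2, 'X')]

def interleavings(a, b):
    # All merges of a and b preserving each transaction's internal order.
    if not a:
        yield list(b); return
    if not b:
        yield list(a); return
    for tail in interleavings(a[1:], b):
        yield [a[0]] + tail
    for tail in interleavings(a, b[1:]):
        yield [b[0]] + tail

def edges(schedule):
    # Precedence-graph edges: conflicting pairs (same item, different
    # transactions, at least one write), directed from the earlier operation.
    e = set()
    for i, (op1, t1, x1) in enumerate(schedule):
        for op2, t2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and 'w' in (op1, op2):
                e.add((t1, t2))
    return e

schedules = list(interleavings(T1, T2))
serializable = [s for s in schedules if edges(s) != {(1, 2), (2, 1)}]
print(len(schedules), 'schedules,', len(serializable), 'conflict serializable')
```

Running this reports 15 schedules, of which 7 are conflict serializable: the serial schedule T2 followed by T1, plus the 6 schedules in which both of T2's operations come after w1(X).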
Chapter 20, Problem 18E
Problem
How many serial schedules exist for the three transactions in Figure 20.8(a)? What are they?
What is the total number of possible schedules?
Step-by-step solution
Step 1 of 2
The three transactions from the textbook are:
T1: read_item(X); write_item(X); read_item(Y); write_item(Y);
T2: read_item(Z); read_item(Y); write_item(Y); read_item(X); write_item(X);
T3: read_item(Y); read_item(Z); write_item(Y); write_item(Z);
Comment
Step 2 of 2
From the definition of serial schedules, the serial schedules of the above three transactions are:
T1 T2 T3
T1 T3 T2
T2 T1 T3
T2 T3 T1
T3 T1 T2
T3 T2 T1
Total number of serial schedules for the three transactions = 3! = 6.
In general, the total number of serial schedules for n transactions is n factorial, i.e., n!.
Using the formula from Exercise 20.16, the total number of possible schedules is
(4 + 5 + 4)! / (4! * 5! * 4!) = 13! / (4! * 5! * 4!) = 90,090.
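Both counts can be verified in a couple of lines; a sketch:

```python
from itertools import permutations
from math import factorial

# The 6 serial schedules are simply the permutations of the 3 transactions.
serial = list(permutations(['T1', 'T2', 'T3']))
print(len(serial))  # 6

# Total possible schedules of the 4 + 5 + 4 = 13 operations:
total = factorial(13) // (factorial(4) * factorial(5) * factorial(4))
print(total)  # 90090
```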
Comment
Chapter 20, Problem 19E
Problem
Write a program to create all possible schedules for the three transactions in Figure 20.8(a), and
to determine which of those schedules are conflict serializable and which are not. For each
conflict-serializable schedule, your program should print the schedule and list all equivalent serial
schedules.
Step-by-step solution
Step 1 of 1
A program for finding the serializable schedules, sketched here in Python (the original pseudocode sampled random schedules; exhaustive enumeration is shown instead, as the problem asks for all possible schedules):
from itertools import permutations

# Operations of the three transactions in Figure 20.8(a):
T1 = [('r', 1, 'X'), ('w', 1, 'X'), ('r', 1, 'Y'), ('w', 1, 'Y')]
T2 = [('r', 2, 'Z'), ('r', 2, 'Y'), ('w', 2, 'Y'), ('r', 2, 'X'), ('w', 2, 'X')]
T3 = [('r', 3, 'Y'), ('r', 3, 'Z'), ('w', 3, 'Y'), ('w', 3, 'Z')]

def interleavings(lists):
    # Every merge of the transactions that preserves each one's internal order.
    if not any(lists):
        yield []
        return
    for i, t in enumerate(lists):
        if t:
            for tail in interleavings(lists[:i] + [t[1:]] + lists[i + 1:]):
                yield [t[0]] + tail

def precedence_edges(schedule):
    # Edge Ti -> Tj for every pair of conflicting operations (same item,
    # different transactions, at least one write) with Ti's operation first.
    # This covers the read-after-write, write-after-read, and
    # write-after-write cases of the precedence-graph construction.
    edges = set()
    for i, (op1, t1, x1) in enumerate(schedule):
        for op2, t2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and 'w' in (op1, op2):
                edges.add((t1, t2))
    return edges

def equivalent_serial_orders(edges):
    # Topological orders of the precedence graph; an empty list means the
    # graph has a cycle, i.e., the schedule is not conflict serializable.
    orders = []
    for perm in permutations([1, 2, 3]):
        pos = {t: i for i, t in enumerate(perm)}
        if all(pos[a] < pos[b] for a, b in edges):
            orders.append(perm)
    return orders

for schedule in interleavings([T1, T2, T3]):
    orders = equivalent_serial_orders(precedence_edges(schedule))
    if orders:
        print(schedule, '-> equivalent serial schedules:', orders)
Comment
Chapter 20, Problem 20E
Problem
Why is an explicit transaction end statement needed in SQL but not an explicit begin statement?
Step-by-step solution
Step 1 of 1
A transaction is an atomic unit of work. In SQL there is no explicit begin statement: a transaction
starts implicitly with the first SQL statement executed (or with the first statement after the previous
transaction ends). However, a transaction can end in two different ways:
Successfully installing its updates in the database (COMMIT),
or
Removing its partial updates (which may be incorrect) from the database (ROLLBACK, i.e., abort).
Since the system cannot know which of these two endings is intended, the transaction must state
it explicitly. It is for this reason that an explicit "end" statement (COMMIT or ROLLBACK) is
needed in SQL, while an explicit begin statement is not.
Comment
Chapter 20, Problem 21E
Problem
Describe situations where each of the different isolation levels would be useful for transaction
processing.
Step-by-step solution
Step 1 of 2
Transaction isolation measures the influence of other concurrent transactions on a given
transaction. The effect of concurrency ranges between two extremes:
the highest in Read Uncommitted
and
the lowest in Serializable.
Isolation level Serializable:
This level preserves consistency in all situations; thus it is the safest execution mode. It is
recommended for execution environments where every update is crucial for a correct result, for
example, airline reservations, debit/credit, salary increases, and so on.
Isolation level Repeatable Read:
This level is similar to Serializable except that the phantom problem may occur. Thus, with
record-level locking (finer granularity), this isolation level must be avoided where phantoms
matter. It can be used in all types of environments, except in environments where accurate
summary information (e.g., computing the total sum over all the different accounts of a bank
customer) is desired.
Comment
Step 2 of 2
Isolation level Read Committed:
In this level a transaction may see two different values of the same data item during its
execution. A transaction at this level applies a write lock and keeps it until it commits. It also
applies a read (shared) lock, but that lock is released as soon as the data item has been read by
the transaction. This isolation level may be used for checking a balance, weather information,
departure or arrival times, and so on.
Isolation level Read Uncommitted:
In this level a transaction applies neither a shared lock nor a write lock. Such a transaction is
not allowed to write any data item, and it may experience dirty reads, unrepeatable reads, and
phantoms. It may be used in environments where only a statistical average over a large number
of data items is required.
Comment
Chapter 20, Problem 22E
Problem
Which of the following schedules is (conflict) serializable? For each serializable schedule,
determine the equivalent serial schedules.
a. r1(X); r3(X); w1(X); r2(X); w3(X);
b. r1(X); r3(X); w3(X); w1(X); r2(X);
c. r3(X); r2(X); w3(X); r1(X); w1(X);
d. r3(X); r2(X); r1(X); w3(X); w1(X);
Step-by-step solution
Step 1 of 5
Serializable schedule:
A conflict (precedence) graph corresponding to a schedule decides whether the schedule is
conflict serializable or not. If the conflict graph contains a cycle, then the schedule is not
serializable. The graph is drawn as follows:
1) Create a node labeled Ti in the graph for each transaction Ti that participates in schedule S.
2) Create an edge from Ti to Tj in the graph where Ti executes a write_item(X) and Tj later
executes a read_item(X).
3) Create an edge from Ti to Tj in the graph where Ti executes a read_item(X) and Tj later
executes a write_item(X).
4) Create an edge from Ti to Tj in the graph where Ti executes a write_item(X) and Tj later
executes a write_item(X).
5) If no cycles are present in the conflict graph, then it is a serializable schedule.
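The five rules can be applied mechanically; a sketch that builds the precedence graph for each of the four schedules in parts (a) through (d) and lists the topological orders of each graph (an empty list means the graph has a cycle):

```python
from itertools import permutations

schedules = {
    'a': [('r', 1, 'X'), ('r', 3, 'X'), ('w', 1, 'X'), ('r', 2, 'X'), ('w', 3, 'X')],
    'b': [('r', 1, 'X'), ('r', 3, 'X'), ('w', 3, 'X'), ('w', 1, 'X'), ('r', 2, 'X')],
    'c': [('r', 3, 'X'), ('r', 2, 'X'), ('w', 3, 'X'), ('r', 1, 'X'), ('w', 1, 'X')],
    'd': [('r', 3, 'X'), ('r', 2, 'X'), ('r', 1, 'X'), ('w', 3, 'X'), ('w', 1, 'X')],
}

def edges(schedule):
    # Rules 2-4: an edge Ti -> Tj for each conflicting pair with Ti first.
    e = set()
    for i, (op1, t1, x1) in enumerate(schedule):
        for op2, t2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and 'w' in (op1, op2):
                e.add((t1, t2))
    return e

def serial_orders(e):
    # Topological orders of the graph; an empty list means a cycle (rule 5).
    orders = []
    for p in permutations([1, 2, 3]):
        pos = {t: i for i, t in enumerate(p)}
        if all(pos[a] < pos[b] for a, b in e):
            orders.append(p)
    return orders

for name, schedule in schedules.items():
    orders = serial_orders(edges(schedule))
    print(name, 'serializable:' if orders else 'not serializable', orders)
```

Only schedule (c) comes out serializable, with the single equivalent serial order T2, T3, T1.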
Comment
Step 2 of 5
(a)
Given schedule: r1(X); r3(X); w1(X); r2(X); w3(X);
Conflict graph: r1(X) precedes w3(X), giving the edge T1 -> T3; r3(X) precedes w1(X), giving
the edge T3 -> T1.
The conflict graph has a cycle between T1 and T3. Hence, the given schedule is not conflict
serializable.
Step 3 of 5
(b)
Given schedule: r1(X); r3(X); w3(X); w1(X); r2(X);
Conflict graph: r1(X) precedes w3(X), giving the edge T1 -> T3; r3(X) and w3(X) precede
w1(X), giving the edge T3 -> T1.
The conflict graph has a cycle between T1 and T3. Hence, the schedule is not conflict
serializable.
Comment
Step 4 of 5
(c)
Given schedule: r3(X); r2(X); w3(X); r1(X); w1(X);
Conflict graph: the edges are T2 -> T3 (r2(X) precedes w3(X)), T3 -> T1 (r3(X) and w3(X)
precede w1(X)), and T2 -> T1 (r2(X) precedes w1(X)).
The graph contains no cycles. Hence, the schedule is conflict serializable.
• The equivalent serial schedule is T2 -> T3 -> T1, that is, T2 followed by T3 followed by T1.
Comment
Step 5 of 5
(d)
Given schedule: r3(X); r2(X); r1(X); w3(X); w1(X);
Conflict graph: r1(X) precedes w3(X), giving the edge T1 -> T3; r3(X) precedes w1(X), giving
the edge T3 -> T1.
The conflict graph has a cycle between T1 and T3. Hence, the schedule is not conflict
serializable.
Chapter 20, Problem 23E
Problem
Consider the three transactions T1 T2, and T3, and the schedules S1 and S2 given below. Draw
the serializability (precedence) graphs for S1 and S2 and state whether each schedule is
serializable or not. If a schedule is serializable, write down the equivalent serial schedule(s).
T1: r1 (X); r1 (Z); w1 (X);
T2: r2 (Z); r2 (Y); w2 (Z); w2(Y);
T3: r3 (X); r3 (Y); w3 (Y);
S1: r1 (X); r2 (Z); r1 (Z); r3 (X); r3 (Y); w1 (X); w3 (Y); r2 (Y); w2 (Z); w2 (Y);
S2: r1 (X); r2 (Z); r3 (X); r1 (Z); r2 (Y); r3 (Y); w1 (X); w2 (Z); w3 (Y); w2 (Y);
Step-by-step solution
Step 1 of 2
The schedule S1 is as follows:
S1: r1(X); r2(Z); r1(Z); r3(X); r3(Y); w1(X); w3(Y); r2(Y); w2(Z); w2(Y)
The precedence graph for S1 has the edges T3 -> T1, T1 -> T2, and T3 -> T2:
• T3 reads X before X is modified by T1 (edge T3 -> T1).
• T1 reads Z before Z is modified by T2 (edge T1 -> T2).
• T2 reads Y and writes it only after T3 has written to it (edge T3 -> T2).
The schedule S1 is a serializable schedule as there is no cycle in the precedence graph.
The equivalent serial schedule is T3 -> T1 -> T2.
Comment
Step 2 of 2
The schedule S2 is as follows:
S2: r1(X); r2(Z); r3(X); r1(Z); r2(Y); r3(Y); w1(X); w2(Z); w3(Y); w2(Y)
The precedence graph for S2 contains, among others, the edges T2 -> T3 and T3 -> T2:
• T2 reads Y before T3 modifies it (edge T2 -> T3).
• T3 reads and writes Y before T2 writes it (edge T3 -> T2).
The schedule S2 is not a serializable schedule as there is a cycle (T2 -> T3 -> T2) in the
precedence graph.
Comment
Chapter 20, Problem 24E
Problem
Consider schedules S3, S4, and S5 below. Determine whether each schedule is strict,
cascadeless, recoverable, or nonrecoverable. (Determine the strictest recoverability condition
that each schedule satisfies.)
S3: r1 (X); r2 (Z); r1 (Z); r3 (X); r3 (Y); w1 (X); c1; w3 (Y); c3; r2(Y); w2(Z); w2(Y); c2;
S4: r1 (X); r2 (Z); r1 (Z); r3 (X); r3 (Y); w1 (X); w3 (Y); r2(Y); w2(Z); w2(Y); c1; c2; c3;
S5: r1 (X); r2 (Z); r3 (X); r1 (Z); r2 (Y); r3 (Y); w1 (X); c1; w2(Z); w3(Y); w2(Y); c3; c2;
Step-by-step solution
Step 1 of 5
Strict schedule: A schedule is a strict schedule if no transaction reads or writes an item X until
the last transaction that wrote X has committed (or aborted).
• The schedule S3 is strict: r2(Y) and w2(Y) occur after T3, the last writer of Y, has committed
(c3), and no other operation reads or writes an item written by a still-uncommitted transaction.
(Note that r3(X) occurring before w1(X) is not a violation; strictness only restricts operations that
follow a write.)
• The schedule S4 is not strict: r2(Y) and w2(Y) occur after w3(Y) but before T3 commits (c3).
• The schedule S5 is not strict: w2(Y) occurs after w3(Y) but before T3 commits (c3).
Comment
Step 2 of 5
Cascadeless schedule: A schedule is cascadeless if every transaction reads only items that
were written by already-committed transactions.
• The schedule S3 is cascadeless, since every strict schedule is also cascadeless.
• The schedule S4 is not cascadeless: r2(Y) reads the value written by T3 before T3 commits
(a dirty read).
• The schedule S5 is cascadeless: every read in S5 occurs before any conflicting write, so no
transaction reads a value written by an uncommitted transaction.
Comment
Step 3 of 5
Recoverable and nonrecoverable schedules:
A schedule is recoverable if no transaction T commits until every transaction T' that wrote a
value read by T has committed.
Schedule S3: T2 reads Y from T3, and c3 precedes c2, so S3 is recoverable. Since S3 is also
strict, the strictest condition S3 satisfies is: strict.
Comment
Step 4 of 5
Schedule S4: T2 reads Y from the uncommitted transaction T3 (r2(Y) occurs after w3(Y)), and
T2 commits (c2) before T3 commits (c3). If T3 were to abort after c2, T2 could no longer be
rolled back. Hence S4 is nonrecoverable. The strictest condition S4 satisfies is: none; it is
nonrecoverable.
Comment
Step 5 of 5
Schedule S5: No transaction in S5 reads a value written by another transaction, so S5 is
trivially recoverable; as noted above it is also cascadeless but not strict. The strictest condition
S5 satisfies is: cascadeless.
Comment
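The standard textbook definitions can also be checked mechanically; a sketch (the helper names ops and classify are chosen here for illustration) that classifies a schedule with commit events as strict, cascadeless, recoverable, or nonrecoverable:

```python
def ops(s):
    # Parse "r1(X); w3(Y); c1" style strings into (op, txn, item) tuples.
    result = []
    for tok in s.replace(' ', '').split(';'):
        if not tok:
            continue
        if tok[0] == 'c':
            result.append(('c', int(tok[1]), None))
        else:
            result.append((tok[0], int(tok[1]), tok[3]))
    return result

def classify(schedule):
    commit_pos = {t: i for i, (op, t, x) in enumerate(schedule) if op == 'c'}
    strict = cascadeless = recoverable = True
    reads_from = set()  # (reader, writer) pairs
    for i, (op, t, x) in enumerate(schedule):
        if op == 'c':
            continue
        # Find the most recent earlier write of the same item.
        for j in range(i - 1, -1, -1):
            op2, t2, x2 = schedule[j]
            if op2 == 'w' and x2 == x:
                if t2 != t:
                    committed = commit_pos.get(t2, len(schedule)) < i
                    if not committed:
                        strict = False           # touched an uncommitted write
                        if op == 'r':
                            cascadeless = False  # dirty read
                    if op == 'r':
                        reads_from.add((t, t2))
                break
    for reader, writer in reads_from:
        if commit_pos.get(reader, len(schedule)) < commit_pos.get(writer, len(schedule)):
            recoverable = False  # reader committed before the writer it read from
    if strict:
        return 'strict'
    if cascadeless:
        return 'cascadeless'
    return 'recoverable' if recoverable else 'nonrecoverable'

S3 = ops("r1(X); r2(Z); r1(Z); r3(X); r3(Y); w1(X); c1; w3(Y); c3; r2(Y); w2(Z); w2(Y); c2")
S4 = ops("r1(X); r2(Z); r1(Z); r3(X); r3(Y); w1(X); w3(Y); r2(Y); w2(Z); w2(Y); c1; c2; c3")
S5 = ops("r1(X); r2(Z); r3(X); r1(Z); r2(Y); r3(Y); w1(X); c1; w2(Z); w3(Y); w2(Y); c3; c2")
print(classify(S3), classify(S4), classify(S5))
```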
Chapter 21, Problem 1RQ
Problem
What is the two-phase locking protocol? How does it guarantee serializability?
Step-by-step solution
Step 1 of 2
Two-phase locking:
Two-phase locking (2PL) is a locking scheme in which a transaction cannot request a new lock
after it has released any of its locks. A transaction following 2PL passes through two phases:
• Locking phase
• Unlocking phase
Locking phase:
This is the expanding or growing phase, in which new locks are acquired but none are
released.
Unlocking phase:
This is the second phase, referred to as the shrinking phase, in which existing locks are
released and no new locks are acquired.
Comment
Step 2 of 2
Guarantee of serializability:
The attraction of the two-phase algorithm derives from a theorem which proves that the two-phase
locking algorithm always leads to serializable schedules: if every transaction in a schedule
follows the two-phase locking protocol, then the schedule is guaranteed to be serializable.
With the two-phase locking protocol, serializability is guaranteed because the protocol prevents
interference among different transactions; it avoids the problems of lost update, uncommitted
dependency, and inconsistent analysis when two-phase locking is enforced.
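The two phases can be enforced mechanically; a minimal sketch (the class name TwoPhaseTxn is hypothetical) that rejects any lock request made after the first unlock:

```python
class TwoPhaseTxn:
    """Tracks a transaction's locks and enforces the two-phase rule."""
    def __init__(self):
        self.locks = set()
        self.shrinking = False  # becomes True after the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after an unlock")
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True   # the shrinking phase has begun
        self.locks.discard(item)

t = TwoPhaseTxn()
t.lock('X')
t.lock('Y')      # growing phase: acquiring more locks is fine
t.unlock('X')    # shrinking phase begins here
# t.lock('Z') would now raise RuntimeError: not allowed under 2PL
```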
Comment
Chapter 21, Problem 2RQ
Problem
What are some variations of the two-phase locking protocol? Why is strict or rigorous two-phase
locking often preferred?
Step-by-step solution
Step 1 of 2
Variations of the two-phase locking protocol:
In the two-phase locking protocol, locks are handled by transactions, and there are a number of
variations of two-phase locking:
(1) Conservative (or static) 2PL
It requires a transaction to lock all the items it accesses before the transaction begins execution,
by predeclaring its read-set and write-set. It is a deadlock-free protocol.
(2) Basic 2PL
In this technique of 2PL, a transaction locks data items incrementally. This may cause
deadlocks, which must then be dealt with.
Comment
Step 2 of 2
Strict or rigorous two-phase locking is preferred because:
In the strict variation, a transaction T does not release any of its exclusive (write) locks until after
it commits or aborts. So no other transaction can read or write an item that is written by T unless
T has committed. Strict 2PL is not deadlock-free, however.
A more restrictive variation of strict 2PL is rigorous 2PL, which also guarantees strict
schedules. In rigorous 2PL, a transaction T does not release any of its locks (shared or
exclusive) until after it commits or aborts, and so it is easier to implement than strict 2PL.
Comment
Chapter 21, Problem 3RQ
Problem
Discuss the problems of deadlock and starvation, and the different approaches to dealing with
these problems.
Step-by-step solution
Step 1 of 4
Deadlock:
• A deadlock refers to a situation in which a transaction Ti waits for an item that is locked by
transaction Tj. The transaction Tj in turn waits for an item that is locked by transaction Tk.
• When each transaction in a set of transactions is waiting for an item that is locked by other
transaction, then it is called deadlock.
Example:
Suppose there are two transaction T1 and T2 and there are two items X and Y.
• Initially transaction T1 hold the item X and transaction T2 hold the item Y.
• In order for the transaction T1 to complete its execution, it needs item Y which is locked by
transaction T2.
• In order for the transaction T2 to complete its execution, it needs item X which is locked by
transaction T1.
Such a situation is known as a deadlock situation because neither transaction T1 nor T2 can
complete its execution.
Comment
Step 2 of 4
The different approaches to dealing with deadlock are as follows:
• Deadlock prevention: The transaction acquires the lock on all the items it needs before starting
the execution. If it cannot acquire a lock on an item, then it should not lock any other items and
should wait and try to acquire locks again.
• Deadlock detection: A wait for graph is used to check for deadlocks.
• Timeouts: A transaction is aborted if it waits for a period longer than the system defined time.
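The wait-for graph approach can be sketched as a small cycle detector; assumptions: edges are (waiter, holder) pairs, and any cycle in the graph means deadlock:

```python
def has_deadlock(wait_for_edges):
    # Build the wait-for graph: an edge Ti -> Tj means Ti waits for an
    # item currently held by Tj. A cycle in this graph is a deadlock.
    graph = {}
    for waiter, holder in wait_for_edges:
        graph.setdefault(waiter, set()).add(holder)

    visited, on_path = set(), set()

    def dfs(node):
        visited.add(node)
        on_path.add(node)
        for nxt in graph.get(node, ()):
            if nxt in on_path or (nxt not in visited and dfs(nxt)):
                return True
        on_path.discard(node)
        return False

    return any(dfs(n) for n in list(graph) if n not in visited)

print(has_deadlock([('T1', 'T2'), ('T2', 'T1')]))  # True: mutual wait
print(has_deadlock([('T1', 'T2'), ('T2', 'T3')]))  # False: a chain, no cycle
```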
Comment
Step 3 of 4
Starvation:
• Starvation refers to a situation in which a low priority transaction waits indefinitely while other
high priority transactions execute normally.
• Starvation problem occurs when locking is used.
Comment
Step 4 of 4
The different approaches to dealing with starvation are as follows:
• Use the first come first serve queue to maintain the transactions that are waiting. The
transactions can acquire lock on an item in the same order they have been placed in the queue.
• Increase the priority of the transactions that are waiting longer so that at some point of time it
becomes the transaction with highest priority and proceeds to execute.
Comment
Chapter 21, Problem 4RQ
Problem
Compare binary locks to exclusive/shared locks. Why is the latter type of locks preferable?
Step-by-step solution
Step 1 of 2
Binary locks:
A binary lock has only two states (locked and unlocked); it is simple but too restrictive, and it is
not used in practice.
Exclusive/shared locks:
Exclusive/shared locks provide more general locking capabilities and are used in practical
database locking schemes. In this scheme,
a read lock is a shared lock, and
a write lock is an exclusive lock.
Of the two, the exclusive/shared lock is preferable because a read-locked (shared) item still
allows other transactions to read the item, whereas a write-locked item is held exclusively by a
single transaction. There are three locking operations:
read_lock(X),
write_lock(X), and
unlock(X).
Comment
Step 2 of 2
If the shared/exclusive locking scheme is used, the system must enforce the following rules:
(1) A transaction T must issue the operation read_lock(X) or write_lock(X) before any
read_item(X) operation is performed in T.
(2) A transaction T must issue the operation write_lock(X) before any write_item(X) operation is
performed in T.
(3) A transaction T must issue the operation unlock(X) after all read_item(X) and write_item(X)
operations are completed in T.
(4) A transaction T will not issue a read_lock(X) operation if it already holds a read (shared) lock
or a write (exclusive) lock on item X. This rule may be relaxed.
Comment
Chapter 21, Problem 5RQ
Problem
Describe the wait-die and wound-wait protocols for deadlock prevention.
Step-by-step solution
Step 1 of 2
Wait-die and wound-wait protocols:
These deadlock-prevention schemes are based on transaction timestamps: transactions are
ordered by their starting times, so if transaction Ti starts before transaction Tj, then
TS(Ti) < TS(Tj). The older transaction has the smaller timestamp value.
Two schemes that prevent deadlock in this way are called wait-die and wound-wait.
Suppose transaction Ti tries to lock an item X but is not able to because X is locked by some
other transaction Tj with a conflicting lock. The two schemes then apply the following rules.
Comment
Step 2 of 2
Wait-die: If TS(Ti) < TS(Tj) (Ti is older than Tj), then Ti is allowed to wait; otherwise (Ti is
younger than Tj) abort Ti and restart it later with the same timestamp.
In wait-die, an older transaction is allowed to wait for a younger transaction, whereas a younger
transaction requesting an item held by an older transaction is aborted and restarted.
Wound-wait is the opposite of wait-die:
If TS(Ti) < TS(Tj), abort Tj (Ti wounds Tj) and restart it later with the same timestamp;
otherwise Ti is allowed to wait.
That is, a younger transaction is allowed to wait for an older one, whereas an older transaction
requesting an item held by a younger transaction preempts the younger transaction by aborting it.
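The two rules reduce to a pair of one-line decisions; a sketch (smaller timestamp = older transaction; the function names are illustrative):

```python
def wait_die(ts_requester, ts_holder):
    # Older requester waits; younger requester "dies" (abort and restart
    # later with the same timestamp).
    return 'wait' if ts_requester < ts_holder else 'abort requester'

def wound_wait(ts_requester, ts_holder):
    # Older requester "wounds" (aborts) the younger holder; younger
    # requester waits.
    return 'abort holder' if ts_requester < ts_holder else 'wait'

print(wait_die(1, 2), '/', wound_wait(1, 2))  # older Ti requesting from younger Tj
print(wait_die(2, 1), '/', wound_wait(2, 1))  # younger Ti requesting from older Tj
```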
Comment
Chapter 21, Problem 6RQ
Problem
Describe the cautious waiting, no waiting, and timeout protocols for deadlock prevention.
Step-by-step solution
Step 1 of 3
Deadlock can also be prevented by using the following protocols.
Cautious waiting:
Suppose a transaction Ti tries to lock an item X but is not able to do so because X is locked by
some other transaction Tj with a conflicting lock. The rule is:
If Tj is not blocked (not waiting for some other locked item), then Ti is blocked and allowed to
wait; otherwise abort Ti.
That is, if Ti is waiting for Tj, Ti is allowed to wait only as long as Tj is not itself waiting for some
other transaction to release an item.
Comment
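The cautious-waiting decision reduces to one membership test. A minimal sketch, where `blocked` stands in for the system's record of which transactions are currently waiting (the names are illustrative):

```python
def cautious_waiting(holder, blocked):
    """Ti requests an item held by transaction `holder`.
    `blocked` is the set of transactions currently blocked (waiting).
    Ti may wait only on a holder that is not itself blocked;
    otherwise Ti is aborted."""
    return "wait" if holder not in blocked else "abort_requester"
```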
Step 2 of 3
No waiting:
If a transaction is unable to obtain a lock, it is aborted immediately and resubmitted after a time delay, without checking whether a deadlock would actually occur.
Step 3 of 3
Timeout:
If a transaction waits for a period longer than a system-defined timeout period, the system assumes that the transaction may be deadlocked and aborts it, regardless of whether a deadlock actually exists.
If the timeout protocol is used for deadlock prevention, some transactions that were not deadlocked may be aborted and may have to be resubmitted.
Chapter 21, Problem 7RQ
Problem
What is a timestamp? How does the system generate timestamps?
Step-by-step solution
Step 1 of 1
Timestamp:
A timestamp is a unique identifier created by the DBMS to identify a transaction; timestamp values are assigned in the order in which the transactions are submitted to the system. A timestamp can be thought of as a monotonically increasing variable (integer) indicating the age of a transaction.
The system can generate timestamps in several ways:
(1) Use a counter that is incremented each time its value is assigned to a transaction. In this scheme, transaction timestamps are numbered 1, 2, 3, and so on. Because a computer counter has a finite maximum value, the system must periodically reset the counter to zero, at a time when no transactions are executing for some short period.
(2) Use the current date/time value of the system clock, ensuring that no two timestamp values are generated during the same tick of the clock.
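Both generation schemes can be sketched in a few lines of Python. The class names, the small counter maximum, and the tick-bumping trick are illustrative assumptions, not from the text.

```python
import time

class CounterTimestamper:
    """Scheme (1): an incrementing counter, reset when it exceeds its
    maximum (a real system would reset only while no transactions run)."""
    def __init__(self, max_value=1000):
        self.max_value = max_value
        self.value = 0

    def next(self):
        self.value += 1
        if self.value > self.max_value:
            self.value = 1   # periodic reset
        return self.value

class ClockTimestamper:
    """Scheme (2): use the system clock, forcing uniqueness when two
    requests fall within the same clock tick."""
    def __init__(self):
        self.last = 0.0

    def next(self):
        now = time.time()
        if now <= self.last:
            now = self.last + 1e-6   # bump past the previous timestamp
        self.last = now
        return now
```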
Chapter 21, Problem 8RQ
Problem
Discuss the timestamp ordering protocol for concurrency control. How does strict timestamp
ordering differ from basic timestamp ordering?
Step-by-step solution
Step 1 of 3
Timestamp ordering protocol for concurrency control:
The protocol manages concurrent execution such that the timestamps determine the serializability order.
The protocol maintains two timestamp values for each data item Q:
(1) write_TS(Q): the largest timestamp of any transaction that executed write(Q) successfully.
(2) read_TS(Q): the largest timestamp of any transaction that executed read(Q) successfully.
The timestamp ordering protocol ensures that any conflicting read and write operations are executed in timestamp order.
Step 2 of 3
How strict timestamp ordering differs from basic timestamp ordering:
Strict timestamp ordering (TO):
When a transaction T issues a read_item(X) or write_item(X) operation such that TS(T) > write_TS(X), the read or write operation is delayed until the transaction T′ that wrote the value of X (hence TS(T′) = write_TS(X)) has committed or aborted. This variation ensures that the schedules are strict as well as conflict serializable.
Step 3 of 3
Basic timestamp ordering:
When a transaction T issues a write_item(X) operation:
(a) If read_TS(X) > TS(T) or write_TS(X) > TS(T), then a younger transaction has already read or written the data item, so abort and roll back T and reject the operation.
(b) Otherwise, execute the write_item(X) operation of T and set write_TS(X) to TS(T).
When a transaction T issues a read_item(X) operation:
(a) If write_TS(X) > TS(T), then a younger transaction has already written the data item, so abort and roll back T and reject the operation.
(b) Otherwise, execute the read_item(X) operation of T and set read_TS(X) to the larger of TS(T) and the current read_TS(X).
Basic TO applies these checks immediately and never delays an operation; strict TO adds the delay described above to guarantee strictness.
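The basic TO checks can be sketched directly; here a data item is a plain dict with its two timestamps, and returning False stands for "abort and roll back T" (representation and names are illustrative).

```python
def to_write(item, ts):
    """Basic TO check for write_item(X) by a transaction with timestamp ts.
    `item` is a dict with 'read_ts' and 'write_ts'."""
    if item["read_ts"] > ts or item["write_ts"] > ts:
        return False              # a younger transaction already read/wrote X
    item["write_ts"] = ts         # execute the write
    return True

def to_read(item, ts):
    """Basic TO check for read_item(X)."""
    if item["write_ts"] > ts:
        return False              # a younger transaction already wrote X
    item["read_ts"] = max(item["read_ts"], ts)
    return True
```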
Chapter 21, Problem 9RQ
Problem
Discuss two multiversion techniques for concurrency control. What is a certify lock? What are the
advantages and disadvantages of using certify locks?
Step-by-step solution
Step 1 of 4
Multiversion concurrency control techniques retain the old values of data items while creating newer versions of those values. The purpose of holding the older values as well is to maintain serializability and to allow some transactions to read older versions that are consistent with their position in the serialization order.
Two multiversion techniques for concurrency control are as follows:
1. Multiversion technique based on timestamp ordering.
2. Multiversion two-phase locking using certify locks.
Step 2 of 4
Consider the description of the two multiversion techniques for concurrency control listed above:
1.
Multiversion technique based on timestamp ordering:
In this technique, several versions of each data item X are maintained. For each version, two values are kept:
• read_TS: the read timestamp of the version; it holds the largest of the timestamps of all transactions that have successfully read the version.
• write_TS: the write timestamp of the version; it holds the timestamp of the transaction that wrote the value of the version.
Whenever a transaction performs a write operation on an item X, a new version of X with its own read_TS and write_TS is created, while the previous versions are also retained.
Step 3 of 4
2.
Multiversion two-phase locking using certify locks:
In this technique, there are three locking modes for each item:
• Read
• Write
• Certify
Hence, the state of a locked item can be any one of these three locks.
• In the standard locking scheme, if a transaction holds a write lock on an item, no other transaction may access that item. Here, however, other transactions T′ are allowed to read an item X while a single transaction T holds a write lock on X.
• For this purpose, two versions of X are maintained. When a transaction is ready to commit, it must obtain a certify lock on every item it has written.
Step 4 of 4
Certify Lock:
A certify lock is the kind of lock that is acquired only when all the updated values need to be finalized so that the item reaches a stable, committed state. It is obtained at commit time, when all the updates performed by the transaction are to be made permanent.
Advantages of certify locks:
• When the transaction is completed and ready to commit, a certify lock is acquired on each updated item, giving the committing transaction exclusive access to it.
• The updating of the data item can be completed safely, and the committed data is protected from interference by other transactions.
Disadvantage of certify locks:
• While a committing transaction holds a certify lock on an item, no other transaction or process can access that item, not even to read it, so readers may be delayed during commit.
Chapter 21, Problem 10RQ
Problem
How do optimistic concurrency control techniques differ from other concurrency control
techniques? Why are they also called validation or certification techniques? Discuss the typical
phases of an optimistic concurrency control method.
Step-by-step solution
Step 1 of 2
In all concurrency control techniques, a certain degree of checking is done before a database operation can be executed. For example, in locking, a check is done to determine whether the item being accessed is locked; in timestamp ordering, the transaction timestamp is checked against the read and write timestamps of the item. Such checking represents overhead during transaction execution. In optimistic concurrency control techniques, also known as validation or certification techniques, no checking is done while the transaction is executing. In one of the validation schemes, updates in the transaction are not applied directly to the database items until the transaction reaches its end. During transaction execution, all updates are made to local copies of the data items that are kept for the transaction. At the end of transaction execution, a validation phase checks whether any of the transaction's updates violate serializability. Certain information needed by the validation phase must be kept in the system. If serializability is not violated, the transaction is committed and the database is updated from the local copies; otherwise, the transaction is aborted and restarted later.
Step 2 of 2
Phases of an optimistic concurrency control protocol:
1.) Read phase: A transaction can read values of committed data items from the database. However, updates are applied only to local copies of the data items kept in the transaction workspace.
2.) Validation phase: Checking is performed to ensure that serializability will not be violated if the transaction updates are applied to the database.
3.) Write phase: If the validation phase is successful, the transaction updates are applied to the database; otherwise, the updates are discarded and the transaction is restarted.
The idea behind optimistic concurrency control is to do all the checks at once, so transaction execution proceeds with minimal overhead until the validation phase is reached. Since the validation phase decides whether the transaction can be committed or must be aborted, the method is also called a validation or certification technique.
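The three phases can be sketched for a single transaction as follows. This is a deliberately simplified sketch: the transaction is a dict of read item names and buffered writes, and `overlapping_write_sets` stands in for the bookkeeping a real validator keeps about transactions that committed during this one's read phase (all names are illustrative).

```python
def run_optimistic(db, transaction, overlapping_write_sets):
    """One transaction through the read, validation, and write phases.
    Returns True on commit, False on abort (to be restarted later)."""
    # Read phase: read committed values; buffer updates in a local copy.
    local = {name: db[name] for name in transaction["reads"]}
    local.update(transaction["writes"])

    # Validation phase: abort if a transaction that committed meanwhile
    # wrote an item this transaction read (serializability could break).
    read_set = set(transaction["reads"])
    for write_set in overlapping_write_sets:
        if read_set & write_set:
            return False

    # Write phase: apply the buffered updates to the database.
    db.update({name: local[name] for name in transaction["writes"]})
    return True
```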
Chapter 21, Problem 11RQ
Problem
What is snapshot isolation? What are the advantages and disadvantages of concurrency control
methods that are based on snapshot isolation?
Step-by-step solution
Step 1 of 1
Snapshot isolation:
Snapshot isolation is used in the concurrency control protocols of some commercial DBMSs. Under snapshot isolation, the data items read by a transaction are based on the committed values of the items in the database snapshot taken when the transaction starts.
Snapshot isolation ensures that the phantom record problem does not occur, because a transaction sees only the records that were committed in the database at the moment it began.
Advantages of concurrency control methods based on snapshot isolation are as follows:
• Because a database statement or transaction sees only the records that were committed in the database when the transaction started, snapshot isolation ensures that the phantom record problem does not arise.
• Snapshot isolation also ensures that the problems of nonrepeatable read and dirty read do not occur during transaction execution.
• Concurrency control methods based on snapshot isolation have lower overhead than two-phase locking, since there is no need to apply read locks to the items in read operations.
Disadvantages of concurrency control methods based on snapshot isolation are as follows:
• Nonserializable schedules can occur under snapshot-isolation-based concurrency control. A few anomalies, such as the write-skew anomaly and the read-only transaction anomaly, violate serializability.
Such anomalies can result in a corrupted or inconsistent database.
Chapter 21, Problem 12RQ
Problem
How does the granularity of data items affect the performance of concurrency control? What
factors affect selection of granularity size for data items?
Step-by-step solution
Step 1 of 3
The size of a data item is often referred to as the data item granularity. A smaller data item size is called fine granularity; a larger size is called coarse granularity.
Step 2 of 3
How does it affect performance of concurrency control?
1.) First, notice that the larger the data item size is, the lower the degree of concurrency permitted. For example, if the data item size is a disk block, a transaction T that needs to lock a record B must lock the whole disk block X that contains B, because a lock is associated with the whole data item (block). Now, if another transaction S wants to lock a different record C that happens to reside in the same block X in a conflicting lock mode, it is forced to wait. If the data item size were a single record, transaction S would be able to proceed, because it would be locking a different data item (record).
2.) The smaller the data item size is, the more items there are in the database. Because every item is associated with a lock, the system will have a larger number of active locks to be handled by the lock manager, and more lock and unlock operations will be performed, causing higher overhead. In addition, more storage space will be required for the lock table. For timestamps, storage is required for the read_TS and write_TS of each item, and there is similar overhead for handling a large number of items.
Step 3 of 3
Factors affecting selection of granularity size for data items:
The best item size depends on the transactions involved. If a typical transaction accesses a small number of records, it is advantageous to have the data item granularity be one record. On the other hand, if a transaction typically accesses many records in the same file, it may be better to have block or file granularity so that the transaction treats all those records as one data item.
Chapter 21, Problem 13RQ
Problem
What type of lock is needed for insert and delete operations?
Step-by-step solution
Step 1 of 2
Types of locks needed for insert and delete operations:
When a new item is inserted into (or deleted from) the database, it cannot be accessed until the item is created and the insert operation is completed. The locks used for this purpose are:
(1) two-phase locking, and (2) index locking.
Using two-phase locking, a delete operation may be performed only if the transaction deleting the tuple holds an exclusive lock on the tuple to be deleted.
Step 2 of 2
A transaction that inserts a new tuple into the database is automatically given an exclusive lock on the inserted tuple.
Insertion and deletion can lead to the phantom phenomenon: consider a transaction that scans a relation and a transaction that inserts a tuple into that relation. If only tuple locks are used, non-serializable schedules can result, because the transaction scanning the relation is reading information that indicates which tuples the relation contains, while the transaction inserting a tuple updates that same information.
Under this protocol, transactions inserting or deleting a tuple acquire an exclusive lock on the data item, so it provides low concurrency for insertions and deletions.
Index locking protocols provide higher concurrency while preventing the phantom problem, by requiring locks on certain index buckets.
Chapter 21, Problem 14RQ
Problem
What is multiple granularity locking? Under what circumstances is it used?
Step-by-step solution
Step 1 of 1
Multiple granularity locking is a locking scheme in which locks may be set on objects that contain other objects, exploiting the hierarchical nature of the "contains" relationship among database, file, page, and record.
Multiple granularity locking must make locking decisions for all transactions over such nested data containers.
It is used where the granularity level of access can differ across the mix of transactions.
• Multiple granularity locking improves concurrency control performance while ensuring correctness and efficiency.
• To make multiple granularity locking practical, some extra types of locks, termed intention locks, are required.
Chapter 21, Problem 15RQ
Problem
What are intention locks?
Step-by-step solution
Step 1 of 2
Intention locks:
To make locking at multiple granularity levels practical, additional types of locks, called intention locks, are needed.
The main idea behind intention locks is for a transaction to indicate, along the path from the root to a desired node, which type of lock it will require later on one of the node's descendants (not locking the object itself, but declaring the intention to lock part of the object). There are three types of intention locks.
Step 2 of 2
(1) Intention-shared (IS): indicates that a shared lock (S) will be requested on some descendant node(s).
(2) Intention-exclusive (IX): indicates that an exclusive lock (X) will be requested on some descendant node(s).
(3) Shared-intention-exclusive (SIX): indicates that the current node is locked in shared mode, but an exclusive lock (X) will be requested on some descendant node(s).
The intention locking protocol requires the following:
(1) Before a transaction can acquire an S lock on a given row, it must first acquire an IS or stronger lock on the table containing the row.
(2) Before a transaction can acquire an X lock on a given row, it must first acquire an IX lock on the table containing that row.
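The compatibility among these five modes can be captured in a small symmetric table. The values below follow the standard multiple-granularity compatibility matrix (see also Problem 28E, which asks why IS and IX are compatible); the table encoding itself is an illustrative sketch.

```python
# True means the two lock modes may be held on the same node concurrently.
COMPAT = {
    ("IS", "IS"): True,  ("IS", "IX"): True,   ("IS", "S"): True,
    ("IS", "SIX"): True, ("IS", "X"): False,
    ("IX", "IX"): True,  ("IX", "S"): False,   ("IX", "SIX"): False,
    ("IX", "X"): False,
    ("S", "S"): True,    ("S", "SIX"): False,  ("S", "X"): False,
    ("SIX", "SIX"): False, ("SIX", "X"): False,
    ("X", "X"): False,
}

def compatible(a, b):
    """Look up the symmetric compatibility of two lock modes."""
    return COMPAT.get((a, b), COMPAT.get((b, a)))
```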
Chapter 21, Problem 16RQ
Problem
When are latches used?
Step-by-step solution
Step 1 of 1
Latches are used to guarantee the physical integrity of a page when that page is being written from the buffer to disk.
A latch is acquired for the page, the page is written to disk, and then the latch is released. Locks that are held for such a short duration are called latches.
Chapter 21, Problem 17RQ
Problem
What is a phantom record? Discuss the problem that a phantom record can cause for
concurrency control.
Step-by-step solution
Step 1 of 2
Phantom record:
A phantom record arises when a new record inserted by some transaction T satisfies a condition that a set of records accessed by another transaction T′ also satisfies. If the equivalent serial order is T followed by T′, then T′ must include the new record; if T′ is followed by T, it must not. In the latter case there is really no record in common between the two transactions, since T′ may have locked all the qualifying records before T inserted the new record.
The record that causes the conflict is called a phantom record.
Step 2 of 2
The problem a phantom record can cause for concurrency control:
Consider an example. Suppose transaction T inserts a new EMPLOYEE record whose Dno = 5, while transaction T′ is accessing all EMPLOYEE records whose Dno = 5. If the equivalent serial order is T followed by T′, then T′ must read the new EMPLOYEE record and include its salary in its sum calculation. If the order is T′ followed by T, the new salary should not be included. In the latter case there is really no record in common between the two transactions, since T′ may have locked all the records with Dno = 5 before T inserted the new record. This is because the record that causes the conflict is a phantom record: it suddenly appears in the database upon being inserted.
If the other operations in the two transactions do not conflict, the conflict due to the phantom record may not be recognized by the concurrency control protocol.
Chapter 21, Problem 18RQ
Problem
How does index locking resolve the phantom problem?
Step-by-step solution
Step 1 of 2
Index locking:
An index includes entries that have an attribute value, plus a set of pointers to all records in the file with that value. If the index entry is locked before the record itself can be accessed, then the conflict on the phantom record can be detected: transaction T′ would request a read lock on the index entry, and transaction T would request a write lock on the same entry, before either could place locks on the actual records.
Since the index locks conflict, the phantom conflict would be detected.
Step 2 of 2
Example:
An index on Dno of EMPLOYEE would include an entry for each distinct Dno value, plus a set of pointers to all EMPLOYEE records with that value.
If the index entry is locked before the record itself can be accessed, then the conflict on the phantom record can be detected, because transaction T′ would request a read lock on the index entry for Dno = 5 and transaction T would request a write lock on the same entry before they could place locks on the actual records.
Since the index locks conflict, the phantom conflict would be detected.
Chapter 21, Problem 19RQ
Problem
What is a predicate lock?
Step-by-step solution
Step 1 of 1
Predicate lock:
A predicate lock locks all records that satisfy some logical predicate, which may be an arbitrary condition. Index locking is a special case of predicate locking, for which an index supports an efficient implementation of the predicate lock.
In general, predicate locking has a great deal of locking overhead and is too expensive, so more sophisticated index locking techniques are used in practice.
Chapter 21, Problem 20E
Problem
Prove that the basic two-phase locking protocol guarantees conflict serializability of schedules.
(Hint: Show that if a serializability graph for a schedule has a cycle, then at least one of the
transactions participating in the schedule does not obey the two-phase locking protocol.)
Step-by-step solution
Step 1 of 1
We prove this by contradiction, assuming binary locks for simplicity.
Consider n transactions T1, T2, ..., Tn such that they all obey the basic two-phase locking rule: no transaction has an unlock operation followed by a lock operation. Suppose that a non-(conflict-)serializable schedule S for T1, T2, ..., Tn does occur; then the precedence (serialization) graph for S must have a cycle. Hence, there must be some sequence within the schedule of the form:
S: ...; [o1(X); ...; o2(X);] ...; [ o2(Y); ...; o3(Y);] ... ; [on(Z); ...; o1(Z);]...
where each pair of operations between square brackets [o, o] is conflicting (either [w, w], [w, r], or [r, w]), in order to create an arc in the serialization graph. This implies that in transaction T1 a sequence of the following form occurs:
T1: ...; o1(X); ... ; o1(Z); ...
Furthermore, T1 has to unlock item X (so T2 can lock it before applying o2(X) to follow
the rules of locking) and has to lock item Z (before applying o1(Z), but this must occur
after Tn has unlocked it). Hence, a sequence in T1 of the following form occurs:
T1: ...; o1(X); ...; unlock(X); ... ; lock(Z); ...; o1(Z); ...
This implies that T1 does not obey the two-phase locking protocol (since lock(Z) follows
unlock(X)), contradicting our assumption that all transactions in S follow the two-phase
locking protocol.
Chapter 21, Problem 21E
Problem
Modify the data structures for multiple-mode locks and the algorithms for read_lock(X),
write_lock(X), and unlock(X) so that upgrading and downgrading of locks are possible. (Hint: The
lock needs to check the transaction id(s) that hold the lock, if any.)
Step-by-step solution
Step 1 of 1
List of transaction ids that have read-locked an item is maintained, as well as the (single)
transaction id that has write-locked an item. Only read_lock and write_lock are shown below.
read_lock (X, Tn):
B: if lock (X) = "unlocked"
then begin lock (X) <- "read_locked, List(Tn)";
no_of_reads (X) <- 1
end
else if lock(X) = "read_locked, List"
then begin
(* add Tn to the list of transactions that have read_lock on X *)
lock (X) <- "read_locked, Append(List,Tn)";
no_of_reads (X) <- no_of_reads (X) + 1
end
else if lock (X) = "write_locked, Tn"
(* downgrade the lock if write_lock on X is held by Tn itself *)
then begin lock (X) <- "read_locked, List(Tn)";
no_of_reads (X) <- 1
end
else begin
wait (until lock (X) = "unlocked" and the lock manager wakes up the transaction);
goto B;
end;
write_lock (X, Tn):
B: if lock (X) = "unlocked"
then lock (X) <- "write_locked, Tn"
else
if ( (lock (X) = "read_locked, List") and (no_of_reads (X) = 1)
and (transaction in List = Tn) )
(* upgrade the lock if read_lock on X is held only by Tn itself *)
then lock (X) = "write_locked, Tn"
else begin
wait (until ( [ lock (X) = "unlocked" ] or
[ (lock (X) = "read_locked, List") and (no_of_reads (X) = 1)
and (transaction in List = Tn) ] ) and
the lock manager wakes up the transaction);
goto B;
end;
Chapter 21, Problem 22E
Problem
Prove that strict two-phase locking guarantees strict schedules.
Step-by-step solution
Step 1 of 1
Strict two-phase locking guarantees strict schedules, since no other transaction can read or write an item written by a transaction T until T has committed; hence the condition for a strict schedule is satisfied.
Chapter 21, Problem 23E
Problem
Prove that the wait-die and wound-wait protocols avoid deadlock and starvation.
Step-by-step solution
Step 1 of 2
Two schemes that prevent deadlocks are called wait-die and wound-wait. Suppose that transaction Ti tries to lock an item X but is not able to because X is locked by some other transaction Tj with a conflicting lock. The rules followed by these schemes are as follows:
• Wait-die: If TS(Ti) < TS(Tj), then (Ti older than Tj) Ti is allowed to wait; otherwise (Ti younger than Tj) abort Ti (Ti dies) and restart it later with the same timestamp.
• Wound-wait: If TS(Ti) < TS(Tj), then (Ti older than Tj) abort Tj (Ti wounds Tj) and restart it later with the same timestamp; otherwise (Ti younger than Tj) Ti is allowed to wait.
Step 2 of 2
In wait-die, an older transaction is allowed to wait on a younger transaction, whereas a younger transaction requesting an item held by an older transaction is aborted and restarted. The wound-wait approach does the opposite: a younger transaction is allowed to wait on an older one, whereas an older transaction requesting an item held by a younger transaction preempts the younger transaction by aborting it. Both schemes end up aborting the younger of the two transactions that may be involved in a deadlock. These two techniques are deadlock-free: in wait-die, transactions only wait on younger transactions, and in wound-wait, transactions only wait on older transactions, so no cycle can form in the wait-for graph. They also avoid starvation: since an aborted transaction is restarted with its original timestamp, it eventually becomes the oldest active transaction and can no longer be aborted. However, both techniques may cause some transactions to be aborted and restarted needlessly, even though those transactions may never actually cause a deadlock.
Chapter 21, Problem 24E
Problem
Prove that cautious waiting avoids deadlock.
Step-by-step solution
Step 1 of 1
Cautious waiting avoids deadlock:
In cautious waiting, a transaction Ti can wait on a transaction Tj (and hence Ti becomes blocked)
only if Tj is not blocked at that time, say time b(Ti), when Ti waits.
Later, at some time b(Tj) > b(Ti), Tj can be blocked and wait on another transaction Tk
only if Tk is not blocked at that time. However, Tj cannot be blocked by waiting on an
already blocked transaction since this is not allowed by the protocol. Hence, the wait-for
graph among the blocked transactions in this system will follow the blocking times and
will never have a cycle, and so deadlock cannot occur.
Chapter 21, Problem 27E
Problem
Why is two-phase locking not used as a concurrency control method for indexes such as B+-trees?
Step-by-step solution
Step 1 of 1
Two phase locking can also be applied to indexes such as B+ trees, where the nodes of an index
correspond to disk pages. However, holding locks on index pages until the shrinking phase of
2PL could cause an undue amount of transaction blocking because searching an index always
starts at the root. Therefore, if a transaction wants to insert a record (write operation), the root
would be locked in exclusive mode, so all other conflicting lock requests for the index must wait
until the transaction enters the shrinking phase. This blocks all other transactions from accessing
the index, so in practice other approaches to locking an index must be used.
Chapter 21, Problem 28E
Problem
The compatibility matrix in Figure 21.8 shows that IS and IX locks are compatible. Explain why
this is valid.
Step-by-step solution
Step 1 of 1
IS and IX are compatible. When transaction T holds an IS lock and an IX lock is requested by T′, T only intends to take shared locks on descendants, and T′ may intend to take an exclusive lock on a descendant node that is different from any node on which T is working.
Similarly, T′ might hold IX while T requests an IS lock; since T′ may intend to access only descendant nodes different from those accessed by T, the two modes are compatible. Any actual conflict is detected at the finer granularity where the real S and X locks are requested.
Chapter 21, Problem 29E
Problem
The MGL protocol states that a transaction T can unlock a node N, only if none of the children of
node N are still locked by transaction T. Show that without this condition, the MGL protocol would
be incorrect.
Step-by-step solution
Step 1 of 2
The rule is that a parent node can be unlocked only when none of its children are still locked by transaction T. This rule enforces the 2PL discipline needed to produce serializable schedules. If the rule were not followed, schedules would not be serializable; a non-serializable schedule may not produce correct results, and thus the protocol would fail.
Step 2 of 2
This rule ensures serializability of transactions by governing the order in which a transaction T locks and manipulates data items. Suppose a transaction T wants to insert data into a leaf node, and suppose the root node is unlocked before the insertion into the leaf completes. Now consider the situation where the leaf node is full: the insertion calls for a split that must propagate toward the root, but since the root has been unlocked and might already be locked by another transaction T′, the operation cannot proceed. Hence, without this condition, the protocol fails.
Chapter 22, Problem 1RQ
Problem
Discuss the different types of transaction failures. What is meant by catastrophic failure?
Step-by-step solution
Step 1 of 1
Types of failures:
Computer failure:
A hardware, software, or main memory failure occurs during transaction execution. Anything in main memory that was not written to disk is lost, and the system must be restarted and recovery performed.
Transaction or system error:
An operation such as a divide by zero or an integer overflow causes the transaction to fail. A transaction failure may also occur because of erroneous parameter values or a logical programming error.
Logical errors:
Errors or exception conditions detected by the transaction itself, which cause it to halt and cancel its work because something prevents it from proceeding.
Concurrency control enforcement:
The concurrency control method may abort a transaction, for example when several transactions become deadlocked.
Disk failure:
Some disk blocks may lose their data because of a read or write malfunction or because of a read/write head crash.
Catastrophic failure:
A catastrophic failure is a physical misfortune affecting the database server or its storage, such as a disk crash, fire, or theft, that destroys the database; recovery then requires restoring a backup copy from archival storage.
Chapter 22, Problem 2RQ
Problem
Discuss the actions taken by the read_item and write_item operations on a database.
Step-by-step solution
Step 1 of 1
Actions taken by the read_item operation on a database (assume the read operation is performed on data item X):
1.) Find the address of the disk block that contains item X.
2.) Copy that disk block into a buffer in main memory, if it is not already in some main memory buffer.
3.) Copy item X from the buffer to the program variable named X.
Actions taken by the write_item operation on a database (assume the write operation is performed on data item X):
1.) Find the address of the disk block that contains item X.
2.) Copy that disk block into a buffer in main memory, if it is not already in some main memory buffer.
3.) Copy item X from the program variable named X into its correct location in the buffer.
4.) Store the updated block from the buffer back to disk (either immediately or at some later point in time).
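The steps above can be sketched with plain dicts standing in for the disk, the buffer pool, and the program's variables (all names and the one-block layout are illustrative assumptions):

```python
disk = {"block1": {"X": 10}}   # block address -> block contents
buffer_pool = {}               # main-memory copies of disk blocks
program_vars = {}              # the program's local variables

def find_block(item):
    # Step 1: find the address of the disk block that contains the item.
    return next(addr for addr, blk in disk.items() if item in blk)

def read_item(item):
    addr = find_block(item)
    if addr not in buffer_pool:                    # Step 2: disk -> buffer
        buffer_pool[addr] = dict(disk[addr])
    program_vars[item] = buffer_pool[addr][item]   # Step 3: buffer -> variable

def write_item(item, flush=True):
    addr = find_block(item)
    if addr not in buffer_pool:                    # Step 2: disk -> buffer
        buffer_pool[addr] = dict(disk[addr])
    buffer_pool[addr][item] = program_vars[item]   # Step 3: variable -> buffer
    if flush:                                      # Step 4: buffer -> disk
        disk[addr] = dict(buffer_pool[addr])
```

The `flush` flag mirrors the "immediately or at some later point in time" choice in step 4, which is exactly what distinguishes the buffer-flushing policies discussed with checkpoints below.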
Chapter 22, Problem 3RQ
Problem
What is the system log used for? What are the typical kinds of entries in a system log? What are
checkpoints, and why are they important? What are transaction commit points, and why are they
important?
Step-by-step solution
Step 1 of 4
System log: Recovery from transaction failures usually means that the database is restored to
the most recent consistent state just before the time of failure. To do this, the system must keep
information about changes that were applied to data items by various transactions. This
information is typically kept in the system log. Thus system logs help in data recovery in case of
failures.
Step 2 of 4
A typical strategy for recovery may be summarized as follows:
1.) If there is extensive damage to a wide portion of the database due to catastrophic failure,
such as a disk crash, the recovery method restores a past copy of the database that was backed
up to archival storage and reconstructs a more current state by reapplying or redoing the
operations of committed transactions from the backed up log, up to the time of failure.
2.) When the database is not physically damaged but has become inconsistent due to noncatastrophic failures, the strategy is to reverse the changes that caused the inconsistency by undoing some operations. It may also be necessary to redo some operations in order to restore a consistent state of the database. In this case, a complete archival copy of the database is not needed; rather, the entries kept in the online system log are consulted during recovery.
Typical kinds of entries that the system log includes:
1.) [T, write command, data item, old value, new value]
2.) [T, read command, data item, value] // used for checking accesses to the database
3.) [checkpoint]
4.) [commit, T]
5.) read_TS // (TS = timestamp)
6.) write_TS
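As a sketch, these entry kinds can be represented as tuples in an in-memory log; the field layout below is illustrative, not any particular DBMS's record format:

```python
# Hypothetical in-memory log; one tuple per log entry.
log = [
    ("start", "T1"),
    ("write", "T1", "X", 10, 15),  # [T, write, item, old value, new value]
    ("read", "T1", "Y", 20),       # [T, read, item, value]
    ("checkpoint",),
    ("commit", "T1"),
]

# The old value in a write entry supports UNDO, the new value REDO:
_, _, item, old_value, new_value = log[1]
```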
Step 3 of 4
Checkpoint: This is a type of entry in the system log. A [checkpoint] record is written into the log periodically, at the point when the system writes out to the database on disk all DBMS buffers that have been modified. As a consequence, transactions that have their [commit, T] entries in the log before a [checkpoint] entry do not need to have their WRITE operations redone in case of a system crash, since all their updates were recorded in the database on disk during checkpointing. A checkpoint record may also include additional information, such as a list of active transaction IDs and the locations of the first and the most recent log records for each active transaction. This can facilitate undoing transaction operations in the event that a transaction must be rolled back.
Step 4 of 4
Commit point: A commit point is the point at which the execution of a transaction is complete; its effects are guaranteed to reach the database and can no longer be rolled back.
A commit point is important in case of recovery techniques based on deferred updates.
A typical deferred update protocol is stated as follows:
1.) A transaction cannot change the database on disk until it reaches commit point.
2.) A transaction does not reach its commit point until all its update operations are recorded in
the log and the log is force-written to disk.
Chapter 22, Problem 4RQ
Problem
How are buffering and caching techniques used by the recovery subsystem?
Step-by-step solution
Step 1 of 1
Buffering and caching techniques in the recovery subsystem:
The recovery process is often closely intertwined with operating system functions. In general, one or more disk pages that include the data items to be updated are cached into main-memory buffers and then updated in memory before being written back to disk.
As the performance gap between CPU and disk grows, disk I/O has become a major performance bottleneck for data-intensive applications. Disk I/O latency, in particular, is much more difficult to improve than disk bandwidth.
Buffering and caching in main memory have therefore been used extensively to bridge the performance gap between CPU and disk.
Chapter 22, Problem 5RQ
Problem
What are the before image (BFIM) and after image (AFIM) of a data item? What is the difference
between in-place updating and shadowing, with respect to their handling of BFIM and AFIM?
Step-by-step solution
Step 1 of 3
BFIM and AFIM :Before image (BFIM) :The old value of the data item before updating is called the before image (BFIM).
After image (AFIM):The new value of the data item after updating is called the after image (AFIM)
Step 2 of 3
Flushing a modified buffer back to disk follows one of two strategies:
In-place updating.
Shadowing.
Step 3 of 3
Difference between in-place updating and shadowing:
In-place updating writes the buffer back to the same original disk location, overwriting the old value of any changed data items on disk. Here a single copy of each database disk block is maintained, so the old value (the before image, BFIM) must be logged before it is overwritten.
Shadowing writes an updated buffer to a different disk location, so multiple versions of a data item can be maintained: the old version (BFIM) and the new version (the after image, AFIM).
With shadowing, both the BFIM and the AFIM are kept on disk, so it is not strictly necessary to maintain a log for recovery.
Chapter 22, Problem 6RQ
Problem
What are UNDO-type and REDO-type log entries?
Step-by-step solution
Step 1 of 2
UNDO-type and REDO-type log entries:
In database recovery techniques, recovery is achieved by performing only UNDOs, only REDOs, or a combination of the two. These operations are recorded in the log as they happen, and the log entry information kept for a write command is what makes UNDO and REDO possible.
UNDO-type log entries:
These entries include the old value (BFIM) of the data item before a write operation was executed.
UNDO-type log entries are necessary for rollback operations.
They are useful for restoring all BFIMs onto the disk, that is, for removing all AFIMs.
Step 2 of 2
REDO-type log entries:
These entries include the new value (AFIM) of the data item after a write operation was executed.
They are necessary for repeating the operations of already committed transactions, for example after a disk failure.
They are useful for restoring all AFIMs onto the disk.
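The two entry types can be sketched as follows; the entry layout (transaction, item, BFIM, AFIM) is an illustrative assumption:

```python
# A database modelled as a plain dict, and one combined log entry.
db = {"A": 5}
entry = ("T1", "A", 5, 9)    # (transaction, item, BFIM, AFIM)

def undo(db, entry):
    t, item, bfim, afim = entry
    db[item] = bfim          # restore the before image (remove the AFIM)

def redo(db, entry):
    t, item, bfim, afim = entry
    db[item] = afim          # reinstall the after image

redo(db, entry)              # db["A"] becomes the AFIM, 9
undo(db, entry)              # db["A"] is restored to the BFIM, 5
```

Note that both operations are idempotent: applying either of them twice leaves the same result, which recovery requires.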
Chapter 22, Problem 7RQ
Problem
Describe the write-ahead logging protocol.
Step-by-step solution
Step 1 of 1
Write-ahead logging protocol:
When in-place updating (immediate or deferred) is used, a log is necessary for recovery, and it must be available to the recovery manager. In particular, the BFIM of a data item must be recorded in the appropriate log entry, and that log entry must be flushed to disk before the BFIM is overwritten with the AFIM in the database on disk. This is what the write-ahead logging (WAL) protocol achieves.
The write-ahead logging protocol states that:
(1) For UNDO: before a data item's AFIM is flushed to the database on disk, its BFIM must be written to the log, and the log must be saved on stable store (the log disk).
(2) For REDO: before a transaction executes its commit operation, all its AFIMs must be written to the log, and the log must be saved on stable store.
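The ordering that WAL enforces can be sketched in a few lines; the log and database structures here are toy assumptions:

```python
stable_log = []    # stands in for the log forced to stable storage
db = {"X": 1}      # stands in for the database on disk

def wal_write(t, item, new_value):
    bfim, afim = db[item], new_value
    # WAL rule: the log entry holding the BFIM reaches stable storage
    # *before* the AFIM overwrites the item in place.
    stable_log.append((t, item, bfim, afim))
    db[item] = afim

wal_write("T1", "X", 2)
# At every point, the BFIM needed to undo this write is already logged.
```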
Chapter 22, Problem 8RQ
Problem
Identify three typical lists of transactions that are maintained by the recovery subsystem.
Step-by-step solution
Step 1 of 1
Lists of transactions maintained by the recovery subsystem:
For good performance of the recovery process, the DBMS recovery subsystem may need to maintain a number of lists of transactions. The three main and typical lists are:
(1) Active transactions.
(2) Committed transactions (since the last checkpoint).
(3) Aborted transactions (since the last checkpoint).
These three lists make the recovery process more efficient.
Chapter 22, Problem 9RQ
Problem
What is meant by transaction rollback? What is meant by cascading rollback? Why do practical
recovery methods use protocols that do not permit cascading rollback? Which recovery
techniques do not require any rollback?
Step-by-step solution
Step 1 of 4
Transaction rollback:
Transaction rollback means that if a transaction has failed after a disk write, the write needs to be undone. That is, to maintain atomicity, a transaction's operations are undone or redone:
Undo: restore all BFIMs onto disk (remove all AFIMs).
Redo: restore all AFIMs onto disk.
Database recovery is achieved by performing only undos, only redos, or a combination of the two. These operations are recorded in the log as they happen.
Step 2 of 4
Cascading rollback:
Cascading rollback occurs when the failure and rollback of one transaction requires the rollback of other, uncommitted transactions, because they read updates of the failed transaction. In turn, any values derived from the values that were rolled back must also be undone.
Step 3 of 4
Practical recovery methods use protocols that do not permit cascading rollback because it is complex and time-consuming. Instead, practical recovery methods guarantee cascadeless or strict schedules.
Step 4 of 4
The deferred update (NO-UNDO/REDO) recovery technique does not require any rollback.
Chapter 22, Problem 10RQ
Problem
Discuss the UNDO and REDO operations and the recovery techniques that use each.
Step-by-step solution
Step 1 of 1
UNDO/REDO operations:
To describe a protocol for write-ahead logging, we must distinguish between two types of log entry information included for a write command: UNDO and REDO.
An UNDO-type log entry includes the old value (BFIM) of the item, since this is needed to undo the effect of the operation from the log.
A REDO-type log entry includes the new value (AFIM) of the item written by the operation, since this is needed to redo the effect of the operation from the log.
In the UNDO/REDO algorithm, both types of log entries are combined. In addition, cascading rollback is possible when the read_item entries in the log are considered to be UNDO-type entries.
Chapter 22, Problem 11RQ
Problem
Discuss the deferred update technique of recovery. What are the advantages and disadvantages
of this technique? Why is it called the NO-UNDO/REDO method?
Step-by-step solution
Step 1 of 5
Deferred update technique of recovery:
The main idea of this technique is to defer or postpone any actual updates to the database until the transaction completes its execution successfully and reaches its commit point. With this technique, the updates are recorded only in the log and in the cache buffers. After the transaction reaches its commit point and the log is force-written to disk, the updates are recorded in the database.
The deferred update technique is also called NO-UNDO/REDO recovery.
The deferred update protocol maintains two main rules:
A transaction cannot change any items in the database until it commits.
A transaction may not commit until all of its write operations are successfully recorded in the log. (This means we must check that the log is actually written to disk.)
Example:
Step 2 of 5
Log file:
[start, T1]
[write, T1]
[commit, T1]
[checkpoint]
[start, T2]
[write, T2]
[write, T2]
[commit, T2]
[start, T3]
[write, T3]
[start, T4]
[write, T4]
...... system crash
Step 3 of 5
From this example:
Since T1 and T2 committed, their changes were written to disk.
However, T3 and T4 did not commit; hence, their changes were not written to disk.
To recover, we simply ignore those transactions that did not commit.
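The NO-UNDO/REDO recovery rule can be sketched over a log like this one; the tuple layout and values are illustrative assumptions:

```python
# Log at the moment of the crash (compare the log file example above).
log = [
    ("start", "T1"), ("write", "T1", "A", 5), ("commit", "T1"),
    ("checkpoint",),
    ("start", "T2"), ("write", "T2", "B", 7), ("write", "T2", "C", 8),
    ("commit", "T2"),
    ("start", "T3"), ("write", "T3", "D", 9),
    ("start", "T4"), ("write", "T4", "E", 3),
]   # ...... system crash

# Recovery: REDO the writes of committed transactions, ignore the rest.
committed = {rec[1] for rec in log if rec[0] == "commit"}
db = {}
for rec in log:
    if rec[0] == "write" and rec[1] in committed:
        _, t, item, new_value = rec
        db[item] = new_value

# T3 and T4 never committed, so their writes are simply ignored;
# nothing needs to be undone because nothing of theirs reached the database.
```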
Step 4 of 5
Advantages and disadvantages of the deferred update technique:
Advantages:
Recovery is made easier: any transaction that reached its commit point (according to the log) has its writes applied to the database (REDO); all other transactions are ignored.
Cascading rollback does not occur, because no transaction sees the work of another until it is committed (no dirty reads).
Disadvantages:
Concurrency is limited: the technique must employ strict 2PL, which restricts concurrency.
Step 5 of 5
The deferred update technique is called the NO-UNDO/REDO recovery method for the following reason. The second rule of the protocol (a transaction does not reach its commit point until all its update operations are recorded in the log and the log is force-written to disk) is a restatement of the write-ahead logging (WAL) protocol. Because the database is never updated on disk until after the transaction commits, there is never a need to UNDO any operations. Hence this is known as the NO-UNDO/REDO method.
Chapter 22, Problem 12RQ
Problem
How can recovery handle transaction operations that do not affect the database, such as the
printing of reports by a transaction?
Step-by-step solution
Step 1 of 1
If a transaction has actions that do not affect the database, such as generating and printing messages or reports from information retrieved from the database, and it fails before completion, we may not want the user to get these reports, since the transaction has failed to complete. If such erroneous reports are produced, part of the recovery process would have to inform the user that the reports are wrong, since the user may take an action based on them that affects the database. Hence such reports must be generated only after the transaction reaches its commit point. A common method of dealing with such actions is to issue the commands that generate the reports but keep them as batch jobs, which are executed only after the transaction reaches its commit point. If the transaction fails, the batch jobs are canceled.
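This batching idea can be sketched as follows; the class and method names are illustrative, not a real DBMS API:

```python
class Transaction:
    def __init__(self):
        self.batch = []            # deferred side-effecting actions

    def print_report(self, text):
        self.batch.append(text)    # queue the report; do not print yet

    def commit(self):
        for text in self.batch:    # run the batch jobs only at commit
            print(text)
        self.batch.clear()

    def abort(self):
        self.batch.clear()         # transaction failed: cancel the jobs

t = Transaction()
t.print_report("monthly totals")
t.abort()                          # nothing was ever printed
```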
Chapter 22, Problem 13RQ
Problem
Discuss the immediate update recovery technique in both single-user and multiuser
environments. What are the advantages and disadvantages of immediate update?
Step-by-step solution
Step 1 of 2
Immediate update technique:
Immediate update applies write operations to the database as the transaction is executing, whenever the transaction issues an update command. The database can be updated without waiting for the transaction to reach its commit point, but the update operation must still be recorded in the log before it is applied to the database, following the write-ahead logging protocol. The technique maintains two logs:
(1) REDO log: a record of each updated data item's new value.
(2) UNDO log: a record of each updated data item's old value.
It follows two rules:
(1) Transaction T may not update the database until all of its undo entries have been written to the UNDO log.
(2) Transaction T is not allowed to commit until all of its REDO and UNDO log entries are written.
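The two rules can be sketched like this; the log structures and the commit bookkeeping are illustrative assumptions:

```python
undo_log, redo_log, db = [], [], {"X": 1}
committed = set()

def write(t, item, new_value):
    # Rule 1: the old value goes to the UNDO log (and the new value to
    # the REDO log) before the database is updated in place.
    undo_log.append((t, item, db[item]))
    redo_log.append((t, item, new_value))
    db[item] = new_value           # immediate, in-place update

def commit(t):
    # Rule 2: T commits only once all of its UNDO and REDO entries are
    # (assumed here to be) safely written to the log.
    committed.add(t)

write("T1", "X", 5)
commit("T1")
```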
Step 2 of 2
Advantages and disadvantages of immediate update:
Advantages: Immediate update allows higher concurrency, because transactions write continuously to the database rather than waiting until the commit point.
Disadvantages: It can lead to cascading rollbacks, which are time-consuming and can be problematic.
Chapter 22, Problem 14RQ
Problem
What is the difference between the UNDO/REDO and the UNDO/NO-REDO algorithms for
recovery with immediate update? Develop the outline for an UNDO/NO-REDO algorithm.
Step-by-step solution
Step 1 of 3
Difference between the UNDO/REDO and UNDO/NO-REDO algorithms:
UNDO/REDO algorithm:
This is a recovery technique based on immediate update; recovery schemes of this category apply both undo and redo to recover the database from failure. In a single-user environment no concurrency control is required, but a log is still maintained under WAL.
The recovery manager performs:
Undo of a transaction if it is in the active table.
Redo of a transaction if it is in the commit table.
Step 2 of 3
UNDO/NO-REDO algorithm:
In this algorithm, a transaction's AFIMs are flushed to the database disk under WAL before it commits. For this reason, the recovery manager undoes all active transactions during recovery; no transaction is redone. It is possible that a transaction has completed execution and is ready to commit, but such a transaction is also undone.
Step 3 of 3
Outline for an UNDO/NO-REDO algorithm:
1. During normal operation, before a transaction commits, force-write all of its updated pages (AFIMs) to the database disk, writing the UNDO-type log entries (BFIMs) first, under WAL.
2. During recovery after a crash, scan the log backwards and undo the writes of every transaction in the active list, restoring each BFIM.
3. No REDO phase is needed, because every committed transaction's updates are already on disk.
Chapter 22, Problem 15RQ
Problem
Describe the shadow paging recovery technique. Under what circumstances does it not require a
log?
Step-by-step solution
Step 1 of 3
Shadow paging recovery technique:
Shadow paging considers the database to be made up of a number of fixed-size disk pages (or disk blocks), say n, for recovery purposes. To manage access to the data items (pages) by transactions, two directories, the current directory and the shadow directory, are used. The directory arrangement is illustrated below.
Step 2 of 3
[Figure: the current directory, after updating pages 2 and 5, points to the new page versions; the shadow directory, which is not updated, still points to the old versions.]
Step 3 of 3
Here the data items are pages. In a single-user environment, shadow paging does not require a log: recovery simply reinstates the shadow directory, since the old page versions were never overwritten.
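The two-directory mechanism can be sketched as follows; the page numbers, disk block addresses, and contents are illustrative assumptions:

```python
# Disk blocks holding page versions; directories map page no -> block address.
disk = {100: "v0 of page0", 101: "v0 of page1"}
shadow_dir = {0: 100, 1: 101}     # saved on disk, never changed mid-transaction
current_dir = dict(shadow_dir)    # used by the executing transaction

def update_page(page_no, new_contents, new_addr):
    disk[new_addr] = new_contents      # write the new version to a NEW block
    current_dir[page_no] = new_addr    # only the current directory moves

update_page(1, "v1 of page1", 102)

# Crash recovery: just reinstate the shadow directory. The old version
# at block 101 was never overwritten, so no log is needed.
```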
Chapter 22, Problem 16RQ
Problem
Describe the three phases of the ARIES recovery method.
Step-by-step solution
Step 1 of 2
Three phases of the ARIES recovery method:
The ARIES recovery algorithm consists of three phases:
(1) Analysis phase
(2) Redo phase
(3) Undo phase.
Step 2 of 2
In the analysis phase, ARIES identifies the dirty pages in the buffer and the set of transactions active at the time of the crash; the appropriate point in the log where the redo is to start is also determined.
In the redo phase, the necessary redo operations are applied. In the undo phase, the log is scanned backwards and the operations of transactions that were active at the time of the crash are undone in reverse order.
Chapter 22, Problem 17RQ
Problem
What are log sequence numbers (LSNs) in ARIES? How are they used? What information do the
Dirty Page Table and Transaction Table contain? Describe how fuzzy checkpointing is used in
ARIES.
Step-by-step solution
Step 1 of 4
Log sequence numbers in ARIES:
In ARIES, every log record has an associated log sequence number (LSN) that is monotonically increasing and indicates the address of the log record on disk. A log record is written for each of the following:
(1) a data update
(2) a transaction commit
(3) a transaction abort
(4) an undo
(5) a transaction end.
Step 2 of 4
In the case of an undo, a compensating log record is written.
Dirty Page Table and Transaction Table:
For efficient recovery, two tables are needed; their contents are stored in the log during checkpointing.
(1) Transaction Table: contains an entry for each active transaction, with information such as the transaction ID, the transaction status, and the LSN of the most recent log record for the transaction.
(2) Dirty Page Table: contains an entry for each dirty page in the buffer, which includes the page ID and the LSN corresponding to the earliest update to that page.
Step 3 of 4
Fuzzy checkpointing:
Fuzzy checkpointing is used to reduce the cost of checkpointing and to allow the system to continue executing transactions while a checkpoint is in progress. ARIES performs fuzzy checkpointing as follows:
Write a begin_checkpoint record to the log.
Write an end_checkpoint record to the log; with this record, the contents of the Transaction Table and the Dirty Page Table are appended to the end of the log.
Write the LSN of the begin_checkpoint record to a special file; this file is accessed during recovery to locate the last checkpoint information.
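The three steps can be sketched like this; the record layouts and table contents are illustrative, not ARIES's actual on-disk format:

```python
log = []
transaction_table = {"T1": ("active", 40)}   # id -> (status, last LSN)
dirty_page_table = {"P3": 41}                # page id -> earliest update LSN

# Step 1: write the begin_checkpoint record.
log.append(("begin_checkpoint",))
begin_lsn = len(log) - 1

# Step 2: the end_checkpoint record carries snapshots of both tables.
log.append(("end_checkpoint", dict(transaction_table), dict(dirty_page_table)))

# Step 3: the begin_checkpoint LSN goes to a special master file so that
# recovery can locate the most recent checkpoint.
master_record = begin_lsn
```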
Step 4 of 4
In practice, the fuzzy checkpointing technique lets the system resume transaction processing as soon as the begin_checkpoint record is written to the log, without waiting for step 2 of the checkpointing process (force-writing all modified memory buffers to disk) to finish. Until that step is completed, the previous checkpoint record remains valid. To accomplish this, the system maintains a pointer to the valid checkpoint, which continues to point to the previous checkpoint record in the log; once step 2 is concluded, the pointer is changed to point to the new checkpoint record in the log.
Chapter 22, Problem 18RQ
Problem
What do the terms steal/no-steal and force/no-force mean with regard to buffer management for
transaction processing?
Step-by-step solution
Step 1 of 1
In transaction processing, buffer management is characterized by the following policies.
(1) Steal / no-steal: a system is said to steal buffers if it allows buffers containing dirty (updated but uncommitted) data to be swapped out to physical storage before the transaction commits. If stealing is allowed, UNDO may be necessary during recovery.
(2) Force / no-force: a system is said to force buffers if every committed transaction's updates are guaranteed to be forced onto disk at commit time. If forcing is not used, REDO may be necessary during recovery.
Chapter 22, Problem 19RQ
Problem
Describe the two-phase commit protocol for multidatabase transactions.
Step-by-step solution
Step 1 of 1
Prepare phase:
The global coordinator (initiating node) asks all participants to prepare, that is, to promise to commit or roll back the transaction even if a failure occurs.
Commit phase:
If all participants respond to the coordinator that they are prepared, the coordinator asks all nodes to commit the transaction; if any participant cannot prepare, the coordinator asks all nodes to roll back the transaction.
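The two phases can be sketched as follows; the participant interface is an illustrative assumption, not a specific product's API:

```python
class Participant:
    def __init__(self, can_prepare):
        self.can_prepare = can_prepare
        self.state = None

    def prepare(self):               # phase 1: promise commit-or-rollback
        return self.can_prepare

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]   # ask every node
    if all(votes):
        for p in participants:
            p.commit()               # phase 2: everyone commits
        return "committed"
    for p in participants:
        p.rollback()                 # any "no" vote rolls everyone back
    return "rolled back"
```

A single dissenting participant is enough to roll back the whole multidatabase transaction, which is exactly the all-or-nothing guarantee the protocol provides.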
Chapter 22, Problem 20RQ
Problem
Discuss how disaster recovery from catastrophic failures is handled.
Step-by-step solution
Step 1 of 1
Catastrophic failures are handled by disaster recovery: the entire database, along with the log file, is periodically copied to an inexpensive, high-capacity storage device. When a catastrophe strikes, the most recent backup copy is restored in place of the lost database.
Chapter 22, Problem 21E
Problem
Suppose that the system crashes before the [read_item, T3, A] entry is written to the log in
Figure 22.1(b). Will that make any difference in the recovery process?
Step-by-step solution
Step 1 of 1
Consider the log in textbook Figure 22.1(b). If the system crashes before the [read_item, T3, A] entry is written to the log, there will be no difference in the recovery process, because read_item operations are needed only for determining whether cascading rollback of additional transactions is necessary.
Chapter 22, Problem 22E
Problem
Suppose that the system crashes before the [write_item, T2, D, 25, 26] entry is written to the log
in Figure 22.1(b). Will that make any difference in the recovery process?
Step-by-step solution
Step 1 of 2
When the system crashes before the transaction T2 performs its write operation on item D, there will be a difference in the recovery process.
Step 2 of 2
During the recovery process, the following transactions must be rolled back:
• The transaction T3 has not reached its commit point, so T3 has to be rolled back.
• The transaction T2 has also not reached its commit point, so T2 has to be rolled back.
Hence, the transactions T2 and T3 have to be rolled back in the recovery process.
Chapter 22, Problem 23E
Problem
Figure shows the log corresponding to a particular schedule at the point of a system crash for
four transactions T1, T2, T3, and T4. Suppose that we use the immediate update protocol with
checkpointing. Describe the recovery process from the system crash. Specify which transactions
are rolled back, which operations in the log are redone and which (if any) are undone, and
whether any cascading rollback takes place.
Figure A sample schedule and its corresponding log.
Step-by-step solution
Step 1 of 5
The recovery process from the system crash will be as follows:
• Undo all the write operations of transactions that are not committed.
• Redo all the write operations of transactions that committed after the checkpoint.
• Do not redo or undo transactions that committed before the checkpoint.
Step 2 of 5
The transactions that need to be rolled back are as follows:
• The transaction T3 has not reached its commit point, so T3 has to be rolled back.
• The transaction T2 has also not reached its commit point, so T2 has to be rolled back.
Step 3 of 5
The operations that are to be redone are as follows:
• [write_item, T4, D, 25, 15]: the transaction T4's write operation on item D must be redone.
• [write_item, T4, A, 30, 20]: the transaction T4's write operation on item A must be redone.
Step 4 of 5
The operations that are to be undone are as follows:
• [write_item, T2, D, 15, 25]
• [write_item, T3, C, 30, 40]
• [write_item, T2, B, 12, 18]
Step 5 of 5
As no transaction has read an item which is written by an uncommitted transaction, no cascading
rollbacks occur in the schedule.
Chapter 22, Problem 24E
Problem
Suppose that we use the deferred update protocol for the example in Figure 22.6. Show how the
log would be different in the case of deferred update by removing the unnecessary log entries;
then describe the recovery process, using your modified log. Assume that only REDO operations
are applied, and specify which operations in the log are redone and which are ignored.
Step-by-step solution
Step 1 of 2
In the case of deferred update, the unnecessary log entries are removed because the write operations of uncommitted transactions are not recorded in the database until the transactions commit. So the write operations of T2 and T3 would not have been applied to the database, T4 would have read the previous values of items A and B, and the schedule is recoverable.
By using the procedure RDU_M (deferred update with concurrent execution in a multiuser environment), the following result is obtained:
Step 2 of 2
The list of committed transactions T since the last checkpoint contains only transaction T4. The list of active transactions T' contains transactions T2 and T3.
Only the WRITE operations of the committed transactions are to be redone. Hence, REDO is applied to:
[write_item, T4, B, 15]
[write_item, T4, A, 20]
The transactions that are active and did not commit, i.e., transactions T2 and T3, are canceled and must be resubmitted. Their operations do not have to be undone, since they were never applied to the database.
Chapter 22, Problem 25E
Problem
How does checkpointing in ARIES differ from checkpointing as described in Section 22.1.4?
Step-by-step solution
Step 1 of 1
As described in Section 22.1.4 of the textbook:
The main difference is that with ARIES, main memory buffers that have been modified are not
flushed to disk. ARIES, however writes additional information to the LOG in the form of a
Transaction Table and a Dirty Page Table when a checkpoint occurs.
Chapter 22, Problem 26E
Problem
How are log sequence numbers used by ARIES to reduce the amount of REDO work needed for
recovery? Illustrate with an example using the information shown in Figure 22.5. You can make
your own assumptions as to when a page is written to disk.
Step-by-step solution
Step 1 of 1
ARIES can be used to reduce the amount of REDO work through log sequence numbers as
follows:
• ARIES reduces the amount of REDO work by starting redoing after the point, where all prior
changes have been applied to the database. ARIES performs REDO at the position in the log
that corresponds to smallest LSN, M.
• In the Figure 22.5, REDO must start at the log position 1 as the smallest LSN in Dirty Page
Table is 1.
• When a page's LSN is smaller than the LSN of a log record, the page does not yet reflect that update, so the change is applied and propagated to the database.
• In Figure 22.5, a transaction performs the update of page C, and page C has an LSN of 7.
• When REDO starts at log position 1, page C is propagated to the database. But the page C is
not changed as its LSN (7) is greater than the LSN of current log position (1).
• Now consider the LSN 2. Page B is associated with this LSN and it would be propagated to the
database. The page B would be updated if its LSN is less than 2. Similarly, the page
corresponding to LSN 6 would be updated.
• However the page corresponding to the LSN 7 need not be updated as the LSN of page C, that
is 7, is not less than the current log position.
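The LSN comparison that drives this can be sketched as follows; the table values are illustrative and only loosely follow the Figure 22.5 walk-through:

```python
# LSN already recorded on each database page on disk.
page_lsn = {"C": 7, "B": 1}

def needs_redo(record_lsn, page):
    # Redo the update only if the page has not yet seen it: a page
    # whose own LSN is >= the record's LSN already reflects the change.
    return page_lsn.get(page, 0) < record_lsn

assert needs_redo(1, "C") is False   # page C already holds LSN 7
assert needs_redo(2, "B") is True    # page B (LSN 1) predates LSN 2
```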
Chapter 22, Problem 27E
Problem
What implications would a no-steal/force buffer management policy have on checkpointing and
recovery?
Step-by-step solution
Step 1 of 1
No-Steal/Force Buffer Management Policy Implications
• No-steal/force buffer management policy means that the cache or buffer page that has been
updated by the transaction cannot be written to disk before the transaction commits
• Force means that pages updated by a transaction are written to disk before transaction commit.
• During checkpointing under no-steal, the step that flushes modified main-memory buffers to disk cannot write pages updated by uncommitted transactions.
• With force, a transaction's updates are written to disk by the time it commits. If a failure occurs during that force-writing, REDO of the committed transaction may still be needed; UNDO is never needed, since uncommitted updates are never written to disk.
Chapter 22, Problem 28E
Problem
Choose the correct answer for each of the following multiple-choice questions:
Incremental logging with deferred updates implies that the recovery system must
a. store the old value of the updated item in the log
b. store the new value of the updated item in the log
c. store both the old and new value of the updated item in the log
d. store only the Begin Transaction and Commit Transaction records in the log
Step-by-step solution
Step 1 of 1
Incremental logging with deferred updates implies that the recovery system must:
Option (b)
Store the new value of the updated item in the log.
Chapter 22, Problem 29E
Problem
Choose the correct answer for each of the following multiple-choice questions:
The write-ahead logging (WAL) protocol simply means that
a. writing of a data item should be done ahead of any logging operation
b. the log record for an operation should be written before the actual data is written
c. all log records should be written before a new transaction begins execution
d. the log never needs to be written to disk
Step-by-step solution
Step 1 of 1
The write ahead logging (WAL) protocol simply means that the log record for an operation should
be written before the actual data is written.
Option (b)
The log record for an operation should be written before the actual data is written.
Problem
Chapter 22, Problem 30E
Choose the correct answer for each of the following multiple-choice questions:
In case of transaction failure under a deferred update incremental logging scheme, which of the
following will be needed?
a. an undo operation
b. a redo operation
c. an undo and redo operation
d. none of the above
Step-by-step solution
Step 1 of 1
In case of a transaction failure under a deferred update incremental logging scheme, the database was never updated by the failed transaction (so no undo is needed), and the transaction never committed (so no redo is needed).
Option (d)
None of the above.
Chapter 22, Problem 31E
Problem
Choose the correct answer for each of the following multiple-choice questions:
For incremental logging with immediate updates, a log record for a transaction would contain
a. a transaction name, a data item name, and the old and new value of the item
b. a transaction name, a data item name, and the old value of the item
c. a transaction name, a data item name, and the new value of the item
d. a transaction name and a data item name
Step-by-step solution
Step 1 of 1
For incremental logging with immediate updates a log record for a transaction would contain.
Option (a)
A Transaction name, data item name, old value of item, new value of item
Chapter 22, Problem 32E
Problem
Choose the correct answer for each of the following multiple-choice questions:
For correct behavior during recovery, undo and redo operations must be
a. commutative
b. associative
c. idempotent
d. distributive
Step-by-step solution
Step 1 of 1
For correct behavior during recovery, undo and redo operations must be
Option (c)
Idempotent
Chapter 22, Problem 33E
Problem
Choose the correct answer for each of the following multiple-choice questions:
When a failure occurs, the log is consulted and each operation is either undone or redone. This
is a problem because
a. searching the entire log is time consuming
b. many redos are unnecessary
c. both (a) and (b)
d. none of the above
Step-by-step solution
Step 1 of 1
When a failure occurs, the log is consulted and each operation is either undone or redone. This is a problem because searching the entire log is time-consuming, and many of the redos are unnecessary (operations already reflected on disk need not be redone); checkpointing addresses both problems.
Option (c)
Both (a) and (b).
Chapter 22, Problem 34E
Problem
Choose the correct answer for each of the following multiple-choice questions:
Using a log-based recovery scheme might improve performance as well as provide a recovery
mechanism by
a. writing the log records to disk when each transaction commits
b. writing the appropriate log records to disk during the transaction’s execution
c. waiting to write the log records until multiple transactions commit and writing them as a batch
d. never writing the log records to disk
Step-by-step solution
Step 1 of 1
Using a log-based recovery scheme might improve performance as well as provide a recovery mechanism by:
Option (c)
Waiting to write the log records until multiple transactions commit and writing them as a batch.
Comment
Chapter 22, Problem 35E
Problem
Choose the correct answer for each of the following multiple-choice questions:
There is a possibility of a cascading rollback when
a. a transaction writes items that have been written only by a committed transaction
b. a transaction writes an item that is previously written by an uncommitted transaction
c. a transaction reads an item that is previously written by an uncommitted transaction
d. both (b) and (c)
Step-by-step solution
Step 1 of 1
There is a possibility of a cascading rollback when
Option (d)
A transaction writes or reads an item that is previously written by an uncommitted transaction.
Comment
Chapter 22, Problem 36E
Problem
Choose the correct answer for each of the following multiple-choice questions:
To cope with media (disk) failures, it is necessary
a. for the DBMS to only execute transactions in a single user environment
b. to keep a redundant copy of the database
c. to never abort a transaction
d. all of the above
Step-by-step solution
Step 1 of 1
To cope with media (disk) failures, it is necessary:
Option (b)
To keep a redundant copy of the database.
Comment
Chapter 22, Problem 37E
Problem
Choose the correct answer for each of the following multiple-choice questions:
If the shadowing approach is used for flushing a data item back to disk, then
a. the item is written to disk only after the transaction commits
b. the item is written to a different location on disk
c. the item is written to disk before the transaction commits
d. the item is written to the same disk location from which it was read
Step-by-step solution
Step 1 of 1
If the shadowing approach is used for flushing a data item back to disk, then:
Option (b)
The item is written to a different location on disk.
Comment
Chapter 30, Problem 1RQ
Problem
Discuss what is meant by each of the following terms: database authorization, access control,
data encryption, privileged (system) account, database audit, audit trail.
Step-by-step solution
Step 1 of 1
Database authorization
Database authorization ensures the security of the portions of the database against unauthorized
access.
Access control
A common security problem is preventing unauthorized persons from accessing the system to
obtain information or to inject malicious content that modifies the database. A DBMS must
include security mechanisms that restrict access to the database system. The DBMS performs
this function by creating user accounts and passwords for the login process, securing the system
from unauthorized users.
Data encryption
Sensitive data, such as ATM or credit card numbers, that is transmitted through a
communications network must be protected; encryption provides additional protection for such
data. The data is encoded so that unauthorized users who access it will have difficulty
decoding it.
Privileged account
The DBA account provides important capabilities. The privileged commands include those for
granting and revoking privileges to individual accounts, users, or user groups, performing the
following actions:
• Account creation
• Privilege granting
• Privilege revocation
• Security level assignment
Database audit
A database audit is performed when there is suspicion that the database has been modified or
tampered with. It consists of reviewing the log to examine all accesses and operations applied
to the database during a certain period of time.
Audit trail
A database log that is used mainly for security purposes, recording all accesses and operations
applied to the database, is referred to as an audit trail.
Comment
Chapter 30, Problem 2RQ
Problem
Which account is designated as the owner of a relation? What privileges does the owner of a
relation have?
Step-by-step solution
Step 1 of 1
Owner account is designated as the owner of a relation which is typically the account that was
used when the relation was created in the first place. The owner of a relation is given all
privileges on that relation. The owner account holder can pass privileges on any of the owner
relation to other users by granting privileges to their accounts.
Comment
Chapter 30, Problem 3RQ
Problem
How is the view mechanism used as an authorization mechanism?
Step-by-step solution
Step 1 of 1
The view mechanism is an important discretionary authorization mechanism in its own right.
For example, if the owner A of a relation R wants another account B to be able to retrieve only some fields of
R, then A can create a view V of R that includes only those attributes and then grant SELECT on
V to B. The same applies to limiting B to retrieving only certain tuples of R; a view V can be
created by defining the view by means of a query that selects only those tuples from R that A
wants to allow B to access.
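The idea above can be sketched with SQLite, which supports views (though not GRANT itself, so only the view restriction is shown); the relation, attribute, and view names are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE R (name TEXT, dept TEXT, salary INT)")
con.executemany("INSERT INTO R VALUES (?, ?, ?)",
                [("Alice", "Sales", 50000), ("Bob", "HR", 60000)])

# A creates a view V exposing only name and dept, and only Sales tuples;
# granting SELECT on V (rather than on R) to B limits what B can retrieve.
con.execute("CREATE VIEW V AS SELECT name, dept FROM R WHERE dept = 'Sales'")

rows = con.execute("SELECT * FROM V").fetchall()
print(rows)  # the salary column and non-Sales tuples are not visible
```

In a full DBMS, A would then issue GRANT SELECT ON V TO B, leaving R itself inaccessible to B.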
Comment
Chapter 30, Problem 4RQ
Problem
Discuss the types of privileges at the account level and those at the relation level.
Step-by-step solution
Step 1 of 1
There are two levels of privileges to be assigned for using the database system: the account
level and the relation (or table) level.
• At the account level, the DBA specifies the particular privileges that each account holds,
independently of the relations in the database.
• At the relation level, the DBA controls the privileges to access each individual relation or
view in the database.
Account level
It includes,
1. CREATE SCHEMA or CREATE TABLE privilege, to create a schema.
2. CREATE VIEW privilege.
3. ALTER privilege, to perform changes such as adding or removing attributes.
4. DROP privilege, to delete relations or views.
5. MODIFY privilege, to insert, delete, or update tuples.
6. SELECT privilege, to retrieve information from the database.
Relation level
• It refers to either base relation or view (virtual) relation.
• Each type of command can be applied for each user by specifying the individual relation.
The access matrix model, an authorization model, is used for granting and revoking privileges.
Comment
Chapter 30, Problem 5RQ
Problem
What is meant by granting a privilege? What is meant by revoking a privilege?
Step-by-step solution
Step 1 of 1
Granting and revoking of privileges must be controlled on each relation R in a database to
ensure secure and authorized access.
This is carried out by assigning an owner account, which is typically the account that was used
when the relation was created. The owner of a relation is given all privileges on that relation.
Granting of privileges
The owner account holder can transfer the privileges on any of the relations owned to other
users by issuing GRANT command (granting privileges) to their accounts. Types of privileges
granted on each individual relation R by using GRANT command are as follows,
• SELECT privilege on some relation, gives the privilege to retrieve the information (tuples) from
that relation.
• Modification privilege is provided to do insert, delete, and update operations that modify the
database.
• The REFERENCES privilege is granted to allow referencing a relation when specifying integrity constraints.
Revoking of privileges
In some cases, a privilege is granted only temporarily, and it is necessary to cancel that
privilege after the task has been completed. The REVOKE command is used in SQL for canceling
privileges that were granted.
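As a minimal illustration of the bookkeeping behind GRANT and REVOKE, here is a sketch of an in-memory access matrix; the account, relation, and privilege names are hypothetical:

```python
# Minimal access-matrix sketch: privileges[account][relation] is a set of
# privilege names. GRANT adds an entry; REVOKE removes it.
privileges = {}

def grant(account, relation, priv):
    privileges.setdefault(account, {}).setdefault(relation, set()).add(priv)

def revoke(account, relation, priv):
    privileges.get(account, {}).get(relation, set()).discard(priv)

def has_privilege(account, relation, priv):
    return priv in privileges.get(account, {}).get(relation, set())

grant("B", "EMPLOYEE", "SELECT")
print(has_privilege("B", "EMPLOYEE", "SELECT"))  # True
revoke("B", "EMPLOYEE", "SELECT")
print(has_privilege("B", "EMPLOYEE", "SELECT"))  # False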
Comment
Chapter 30, Problem 6RQ
Problem
Discuss the system of propagation of privileges and the restraints imposed by horizontal and
vertical propagation limits.
Step-by-step solution
Step 1 of 2
Propagation of privileges: whenever the owner A of a relation R grants a privilege on R to
another account B, the privilege can be given to B with or without the GRANT OPTION. If the
GRANT OPTION is given, this means that B can also grant that privilege on R to other accounts.
Suppose that B is given GRANT OPTION by A and that B then grants the privilege on R to a
third account C, also with GRANT OPTION. In this way, privileges on R can propagate to other
accounts without the knowledge of the owner of R. If the owner account A now revokes the
privileges granted to B, all the privileges that B propagated based on those privileges should
automatically be revoked by the system.
It is possible for a user to receive a certain privilege from two or more sources. For example, A'
may receive a certain privilege from both B' and C'. If B' now revokes the privilege from A', A'
will still have it by virtue of C'. If C' also revokes the privilege, A' loses it permanently.
A DBMS that allows propagation of privileges must keep track of how all the privileges were
granted so that revoking of privileges can be done correctly and completely.
Comment
Step 2 of 2
Since propagation of privileges can lead to many accounts holding a privilege on a relation
without the knowledge of the owner, there must be ways to restrict the number of accounts that
can hold privileges on a relation. This can be done by limiting horizontal propagation and by
limiting vertical propagation.
Limiting horizontal propagation to an integer number i means that an account B given the
GRANT OPTION can grant the privilege to at most i other accounts.
Vertical propagation limits the depth of the granting of privileges. Granting of privileges with
vertical propagation zero is equivalent to granting the privileges with no GRANT OPTION. If
account A grants privileges to account B with vertical propagation set to j>0, this means that the
account B has GRANT OPTION on the privilege, but B can grant the privilege to other accounts
only with a vertical propagation less than j. In effect, vertical propagation limits the sequence of
GRANT OPTIONs that can be given from one account to the next based on a single original grant
of the privilege.
For example, suppose that A grants SELECT to B on the EMPLOYEE relation with horizontal
propagation = 1 and vertical propagation = 2. B can grant SELECT to at most one account because
horizontal propagation = 1. Additionally, B can grant the privilege to another account only with
vertical propagation set to 0 or 1. Thus, propagation can be limited by using these two methods.
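The vertical propagation rule can be sketched as follows (horizontal limits are omitted for brevity; the account names are hypothetical):

```python
# Sketch: each grant records a vertical propagation limit. An account may
# re-grant only if its own limit is greater than 0, and only with a
# strictly smaller limit than its own.
grants = {}  # grantee account -> vertical propagation limit it received

def grant(grantor, grantee, vertical):
    if grantor != "owner":
        limit = grants.get(grantor)
        if limit is None or limit <= 0 or vertical >= limit:
            raise PermissionError("vertical propagation limit exceeded")
    grants[grantee] = vertical

grant("owner", "B", 2)   # A grants to B with vertical propagation 2
grant("B", "C", 1)       # allowed: 1 < 2
try:
    grant("C", "D", 1)   # rejected: C's limit is 1, so C may grant only with 0
except PermissionError as e:
    print(e)
```

Each successful grant shrinks the limit, so a chain of GRANT OPTIONs of length j is the most a single original grant with vertical propagation j can produce.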
Comment
Chapter 30, Problem 7RQ
Problem
List the types of privileges available in SQL.
Step-by-step solution
Step 1 of 1
Following type of privileges can be granted on each individual relation R:
1.) Select (retrieval or read) privilege on R: Gives the account retrieval privilege. In SQL this
gives the account the privilege to use SELECT statement to retrieve the tuples from R
2.) Modify privilege on R: This gives the account the capability to modify tuples of R. In SQL
this privilege is further divided into UPDATE, DELETE, and INSERT privileges to apply
corresponding SQL commands to R. Additionally, both the INSERT and UPDATE privileges can
specify that only certain attributes of R can be updated by the account.
3.) Reference privileges on R: This gives the account the capability to reference relation R
when specifying integrity constraints. This privilege can also be restricted to specific attributes of
R.
To create a view an account must have SELECT privilege on all relations involved in view
definition.
Comment
Chapter 30, Problem 8RQ
Problem
What is the difference between discretionary and mandatory access control?
Step-by-step solution
Step 1 of 2
a. Discretionary Access Control (DAC) policies are characterized by a high degree of flexibility,
which makes them suitable for a large variety of application domains.
By contrast, Mandatory Access Control policies have the drawback of being too rigid in
that they require a strict classification of subjects and objects into security levels, and therefore
they are applicable to very few environments.
Comment
Step 2 of 2
b. The main drawback of DAC models is their vulnerability to malicious attacks, such as Trojan
horses embedded in application programs. The reason is that discretionary authorization models
do not impose any control on how information is propagated and used once it has been
accessed by a user authorized to do so.
By contrast Mandatory Access Control policies ensure a high degree of protection- in a way,
they prevent any illegal flow of information.
Comment
Chapter 30, Problem 9RQ
Problem
What are the typical security classifications? Discuss the simple security property and the *-property, and explain the justification behind these rules for enforcing multilevel security.
Step-by-step solution
Step 1 of 1
Typical security classes are top secret (TS), secret (S), confidential (C), and unclassified (U),
where TS is the highest level and U the lowest: TS ≥ S ≥ C ≥ U.
Simple security property: A subject S is not allowed read access to an object O unless
class(S) ≥ class(O).
*-Property: A subject S is not allowed to write an object O unless class(S) ≤ class(O). This is
known as the star property.
The first rule enforces that no subject can read an object whose security classification is higher
than the subject's security clearance.
The second restriction is less intuitive; it prohibits a subject from writing an object at a lower
security classification than the subject's security clearance. Violation of this rule would allow
information to flow from higher to lower classifications, which violates a basic tenet of multilevel
security.
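The two rules can be sketched as simple comparisons over the ordered security classes:

```python
# Sketch of the two multilevel security rules over the ordering U < C < S < TS.
LEVELS = {"U": 0, "C": 1, "S": 2, "TS": 3}

def can_read(subject, obj):
    # simple security property: class(S) >= class(O) -- no read up
    return LEVELS[subject] >= LEVELS[obj]

def can_write(subject, obj):
    # star property: class(S) <= class(O) -- no write down
    return LEVELS[subject] <= LEVELS[obj]

print(can_read("S", "C"))   # True: a secret subject may read confidential data
print(can_read("C", "S"))   # False: reading above one's clearance is denied
print(can_write("S", "U"))  # False: writing down would leak information
```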
Comment
Chapter 30, Problem 10RQ
Problem
Describe the multilevel relational data model. Define the following terms: apparent key,
polyinstantiation, filtering.
Step-by-step solution
Step 1 of 3
Define:
1.) Apparent key: The apparent key of a multilevel relation is the set of attributes that would
have formed the primary key in a regular (single- level) relation.
Comment
Step 2 of 3
2.) Filtering: A multilevel relation will appear to contain different data to subjects with different
clearance levels. In some cases, it is possible to store a single tuple in the relation at a higher
classification level and produce the corresponding tuples at a lower classification level through a
process known as filtering.
Comment
Step 3 of 3
3.) Polyinstantiation: In some cases, it is necessary to store two or more tuples at different
classification levels with the same value for the apparent key. This leads to the concept of
polyinstantiation, where several tuples can have the same apparent key value but different
attribute values for users at different classification levels.
Comment
Chapter 30, Problem 11RQ
Problem
What are the relative merits of using DAC or MAC?
Step-by-step solution
Step 1 of 1
Discretionary access control (DAC) policies are characterized by a high degree of
flexibility, which makes them suitable for a large variety of application domains.
The main drawback of DAC models is their vulnerability to malicious attacks, such as
Trojan horses embedded in application programs.
Whereas mandatory policies ensure a high degree of protection; in a way, they
prevent any illegal flow of information.
MAC has the drawback of being too rigid, and it is applicable only in limited
environments.
In many practical situations, discretionary policies are preferred because they offer a
better trade-off between security and applicability.
Comment
Chapter 30, Problem 12RQ
Problem
What is role-based access control? In what ways is it superior to DAC and MAC?
Step-by-step solution
Step 1 of 1
Role-based access control (RBAC) is a technology for managing and enforcing security in
large-scale, enterprise-wide systems. The basic notion is that permissions are associated with
roles, and users are assigned to appropriate roles.
Roles can be created using the CREATE ROLE and DESTROY ROLE commands; GRANT and
REVOKE are used to assign privileges to and revoke privileges from roles.
RBAC appears to be a viable alternative to traditional DAC and MAC; it ensures that only
authorized users are given access to certain data or resources.
Many DBMSs support the concept of roles, where privileges can be assigned to roles.
The role hierarchy in RBAC is a natural way of organizing roles to reflect the organization's
lines of authority and responsibility.
Using an RBAC model is a highly desirable goal for addressing the key security requirements of
web-based applications.
DAC and MAC models lack the capabilities needed to support the security requirements of
emerging enterprises and web-based applications.
Comment
Chapter 30, Problem 13RQ
Problem
What are the two types of mutual exclusion in role-based access control?
Step-by-step solution
Step 1 of 1
Separation of duties is an important requirement in various database management systems. It is
necessary to prevent a single user from doing work that requires two or more people, so that
collusion can be prevented. To implement this successfully, mutual exclusion of roles is used.
Two roles are said to be mutually exclusive if a user cannot use both roles at the same time.
Mutual exclusion of roles can be classified into two types:
1. Authorization-time exclusion.
2. Runtime exclusion.
Authorization-time exclusion
This is a static constraint: two roles that are mutually exclusive cannot both be included in a
user's authorization at the same time.
Runtime exclusion
This is a dynamic constraint: two mutually exclusive roles can both be authorized to one user,
but the user can activate only one of them at a time; both roles cannot be active in the same
session.
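A sketch of how the two kinds of exclusion differ; the role names are hypothetical:

```python
# Sketch: two kinds of mutual exclusion for a pair of roles.
EXCLUSIVE = {frozenset({"payable_clerk", "payable_manager"})}

def mutually_exclusive(r1, r2):
    return frozenset({r1, r2}) in EXCLUSIVE

def authorize(user_roles, new_role):
    # authorization-time (static) exclusion: never assign both roles
    if any(mutually_exclusive(r, new_role) for r in user_roles):
        raise PermissionError("mutually exclusive at authorization time")
    user_roles.add(new_role)

def activate(authorized, active, role):
    # runtime (dynamic) exclusion: both roles may be authorized,
    # but only one of the pair can be active in a session
    if role not in authorized or any(mutually_exclusive(r, role) for r in active):
        raise PermissionError("role cannot be activated in this session")
    active.add(role)

roles = {"payable_clerk"}
try:
    authorize(roles, "payable_manager")   # rejected under static exclusion
except PermissionError as e:
    print("static:", e)
```

Under runtime exclusion, the same pair could be placed in `authorized` but `activate` would still refuse to make both active at once.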
Comment
Chapter 30, Problem 14RQ
Problem
What is meant by row-level access control?
Step-by-step solution
Step 1 of 1
In row-level access control, as the name indicates, access control rules are implemented on the
data row by row.
Each row is given a label, where data sensitivity information is stored.
• It ensures data security by allowing the permissions to be set not only for column or table but
also for each row.
• Database administrator provides the user with the default session label initially.
• Row-level contains levels of hierarchy of sensitivity of data to maintain privacy or security.
• Unauthorized users are prevented from viewing or altering certain data by using labels
assigned.
• A user who has low-level authorization is represented by a low number; such a user is denied
access to data having a higher-level number.
• If the label is not given to a row, it is automatically assigned depending upon the user’s session
label.
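A minimal sketch of label-based row filtering; the rows and numeric labels are hypothetical:

```python
# Sketch: each row carries a numeric sensitivity label; a user sees only
# rows whose label does not exceed the user's session label.
rows = [
    {"name": "public report",  "label": 10},
    {"name": "payroll detail", "label": 40},
]

def visible(rows, session_label):
    return [r for r in rows if r["label"] <= session_label]

print(visible(rows, 25))  # only the label-10 row is returned
print(visible(rows, 50))  # a higher session label sees both rows
```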
Comment
Chapter 30, Problem 15RQ
Problem
What is label security? How does an administrator enforce it?
Step-by-step solution
Step 1 of 1
A label security policy is defined by the administrator. The policy is invoked
automatically whenever data protected by the policy is accessed through an application. When
the policy is implemented, a new column is added to each protected table.
The new column contains the label for each row that is considered to be the sensitivity of the row
as per the policy. Each user has an identity in label-based security; it is compared to the label
assigned to each row to determine whether the user has the right to view the contents of
that row.
The database administrator has the privilege to set an initial label for the row.
Label security administrator defines the security labels for data and authorizations that govern
access to specified projects for users.
Example
If a user has SELECT privilege on the table, Label Security will automatically evaluate each row
returned by the query to determine whether the user is provided with the rights to view the data.
If the user is assigned with sensitivity level 25, the user can view all rows that have a security
level of 25 or lower.
Label security can be used to perform security checks on statements that include insert, delete,
and update.
Comment
Chapter 30, Problem 16RQ
Problem
What are the different types of SQL injection attacks?
Step-by-step solution
Step 1 of 1
SQL injection attacks are more common threats to database systems. Types of injection attacks
include,
• SQL Manipulation
• Code injection
• Function Call injection
Explanation
SQL Manipulation
A modification attack that changes an SQL command in the application, or by extending a query
by adding additional query components using set operations such as union, intersect, or minus in
SQL query.
Example
The query used to check authentication:
SELECT * FROM loginusers
WHERE username = 'john' AND password = 'johnpwd';
Check whether any rows are returned by using this query.
The hacker can try to change or manipulate the SQL statement as follows:
SELECT * FROM loginusers
WHERE username = 'john' AND password = 'johnpwd' OR 'a' = 'a';
Since 'a' = 'a' is always true, the hacker, knowing that 'john' is a valid login, is able to log
into the database system without knowing the password.
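The effect of the manipulated statement can be reproduced with SQLite when user input is concatenated directly into the query string (a deliberately vulnerable sketch, not a pattern to imitate):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE loginusers (username TEXT, password TEXT)")
con.execute("INSERT INTO loginusers VALUES ('john', 'johnpwd')")

# Vulnerable: attacker-controlled input concatenated into the SQL string.
supplied_pwd = "wrong' OR 'a'='a"
query = ("SELECT * FROM loginusers WHERE username='john' "
         "AND password='" + supplied_pwd + "'")
rows = con.execute(query).fetchall()
print(rows)  # the tautology makes the WHERE clause true: login is bypassed
```

Because AND binds tighter than OR, the condition becomes (username AND password) OR true, so every row is returned even though the password is wrong.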
Code Injection
• Code injection allows the addition of extra SQL statements or commands to the original SQL
statement by exploiting a bug caused by processing invalid data.
• The attacker injects code into a computer program to change its course of action.
• It is one of the methods used to hack a system to obtain information without
authorization.
Function call Injection
• A database or operating system (OS) function call is injected into an SQL statement to
manipulate data or to make a privileged system call.
• It is possible to inject a function that performs a network communication operation into SQL
queries that are created dynamically and executed at run time.
Example
The query given makes the user request a page from a web server.
SELECT TRANSLATE ("|| UTL_HTTP.REQUEST ('http://129.107.12.1/') ||", '97876763', '9787') FROM dual;
The attacker can identify the string that is given as an input, the URL of the web page for doing
any other illegal operations.
Comment
Chapter 30, Problem 17RQ
Problem
What risks are associated with SQL injection attacks?
Step-by-step solution
Step 1 of 1
Risk associated with SQL injection attacks are,
Database Fingerprinting
The attacker determines the type of back-end database in order to craft attacks that exploit
weaknesses specific to that DBMS.
Denial of Service
The attacker can flood the server with requests, consume excessive resources, or delete data,
thus denying service to the intended users.
Bypassing Authentication
The attacker can access the database system as an authorized user and perform all the desired
operations.
Identifying Injectable Parameters
The attacker obtains sensitive information such as the type and structure of the back-end
database of a web application. This is possible because the default error pages returned by
application servers are often overly descriptive.
Executing Remote Commands
By using this attack, the attacker can execute remote commands on the database. For example,
an attacker can execute stored procedures and functions from a remote SQL interface.
Performing Privilege Escalation
This attack makes use of logical flaws within the database to improve the level of access.
Comment
Chapter 30, Problem 18RQ
Problem
What preventive measures are possible against SQL injection attacks?
Step-by-step solution
Step 1 of 1
SQL injection attacks can be prevented by applying certain programming rules to all procedures
and functions that are accessed through the web. Some of the techniques include:
Bind Variables
• Bind variables (parameters) protect against injection attacks, and their use also improves
performance.
• For example, consider the code using java and JDBC:
PreparedStatement st=con.prepareStatement ("SELECT * FROM employee WHERE empid=?
AND pwd=?");
st.setString (1, empid);
st.setString (2, pwd);
• User input should be bound to a parameter instead of being concatenated into the statement;
in this example, the inputs empid and pwd are assigned to bind variables instead of being passed
directly as string parameters.
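The same idea in Python with sqlite3 (the table and column names follow the JDBC snippet above; the malicious input is hypothetical): a bound parameter is passed as data, never parsed as SQL, so the tautology trick fails.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (empid TEXT, pwd TEXT)")
con.execute("INSERT INTO employee VALUES ('e1', 'secret')")

# With bind variables (? placeholders), the input is compared literally
# against the column value; it cannot alter the query structure.
malicious = "x' OR 'a'='a"
rows = con.execute("SELECT * FROM employee WHERE empid=? AND pwd=?",
                   ("e1", malicious)).fetchall()
print(rows)  # [] -- the whole injected string is treated as a password value
```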
Filtering Input
• It is used to escape special characters in input strings by using the Replace function of SQL.
• For example, the delimiter (') single quote is replaced by ('') two single quotes.
Function Security
The use of standard and custom database functions should be restricted, as they can be taken
advantage of in SQL function injection attacks.
Comment
Chapter 30, Problem 19RQ
Problem
What is a statistical database? Discuss the problem of statistical database security.
Step-by-step solution
Step 1 of 1
Statistical databases are used mainly to produce statistics on various populations. The database
may contain data on individuals, which should be protected from user access. Users are
permitted to retrieve statistical information on the populations, such as averages, sums, counts,
minimums, maximums, and standard deviations.
A population is a set of tuples of a relation (table) that satisfy some selection condition.
Statistical queries involve applying statistical functions to a population of tuples.
Statistical database security techniques fail to provide security to individual data in some
situations.
For example, we may want to retrieve the number of individuals in a population or the average
income in the population.
Comment
Chapter 30, Problem 20RQ
Problem
How is privacy related to statistical database security? What measures can be taken to ensure
some degree of privacy in statistical databases?
Step-by-step solution
Step 1 of 2
Statistical databases are used mainly to produce statistics about various populations. The
database may contain confidential data about individuals, which should be protected from user
access. However, users are permitted to retrieve statistical information about the populations,
such as averages, sums, counts, maximums, minimums, and standard deviations. Since there
can be ways to retrieve private information using aggregate functions when much information is
available about a person, statistical databases pose potential threats to privacy.
Consider an example: a PERSON relation with attributes Name, Ssn, Income, Address, City, Zip,
Sex, and Last_degree.
A population is a set of tuples of a relation that satisfy some selection condition. Hence, each
selection condition on the PERSON relation will specify a particular population of PERSON
tuples, for example, Sex = 'F' or Last_degree = 'M.Tech'.
Statistical queries involve applying statistical functions to a population of tuples, for example,
AVG(Income). However, access to personal information is not allowed. Statistical database
security techniques must prohibit queries that retrieve attribute values, allowing only
queries that involve aggregate functions such as SUM, MIN, MAX, AVG, COUNT, and
STANDARD DEVIATION. Such queries are sometimes called statistical queries.
Comment
Step 2 of 2
It is the responsibility of a database management system to ensure the confidentiality of
information about individuals, while still providing useful statistical summaries of data about those
individuals to user. Provision of privacy protection is paramount. Its violation can be illustrated in
following statistical queries:
Q1: SELECT COUNT (*) FROM PERSON
WHERE Sex = 'F' AND Last_degree = 'M.S.' AND City = 'Houston';
Q2: SELECT AVG(Income) FROM PERSON
WHERE Sex = 'F' AND Last_degree = 'M.S.' AND City = 'Houston';
Suppose someone is interested in finding the salary of Jane Smith, who is a female with last
degree 'M.S.' and who lives in Houston. If the result of Q1 is 1, then the same condition in Q2
returns exactly Jane Smith's salary. Even if the result of Q1 is not 1, the MAX and MIN functions
can still be used to narrow down the range of her salary.
Measures taken to ensure privacy:
1.) No statistical queries are permitted whenever the number of tuples in the population specified
by the selection condition falls below some threshold.
2.) Prohibit sequences of queries that repeatedly refer to the same population of tuples.
3.) Introduce slight noise (inaccuracy) into the results of queries.
4.) Partition the database into groups; any query must refer to one or more complete groups, but
never to subsets of records within groups.
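The first measure can be sketched as a query gate; the threshold value and the PERSON-like records are hypothetical:

```python
import statistics

# Sketch: refuse a statistical query when the selected population is
# smaller than a threshold, so a single individual cannot be isolated.
THRESHOLD = 5

def avg_income(population, condition):
    selected = [p["income"] for p in population if condition(p)]
    if len(selected) < THRESHOLD:
        raise PermissionError("population below threshold; query refused")
    return statistics.mean(selected)

people = [{"sex": "F", "income": 40000 + 1000 * i} for i in range(6)]
print(avg_income(people, lambda p: p["sex"] == "F"))     # allowed: 6 tuples
try:
    avg_income(people, lambda p: p["income"] == 40000)   # selects 1 tuple
except PermissionError as e:
    print(e)
```

The threshold alone is not sufficient (overlapping queries can still leak information), which is why the other measures complement it.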
Comment
Chapter 30, Problem 21RQ
Problem
What is flow control as a security measure? What types of flow control exist?
Step-by-step solution
Step 1 of 3
Flow control regulates the distribution or flow of information among accessible objects. A flow
between object X and object Y occurs when a program reads values from X and writes values
into Y. Flow control checks that information contained in some object does not flow explicitly or
implicitly into less protected objects. Thus, a user cannot obtain indirectly from Y what he or she
cannot obtain directly from X. Most flow controls employ some concept of security class; the
transfer of information from a sender to a receiver is allowed only if the receiver's security class
is at least as privileged as the sender's.
Examples of a flow control program include preventing a service program from leaking a
customer's confidential data, and blocking the transmission of secret military data to an unknown
classified user.
A flow policy specifies the channels along which information is allowed to move. The simplest
flow policy specifies just two classes of information, confidential (C) and nonconfidential (N), and
allows all flows except those from class C to class N. This policy can solve the confidentiality
problem that arises when a service program handles data such as customer information, some of
which may be confidential.
Comment
Step 2 of 3
Access control mechanisms are responsible for checking users' authorizations for resource
access: Only granted operations are executed. Flow controls can be enforced by an extended
access control mechanism, which involves assigning a security class to each running program.
The program is allowed to read a particular memory segment only if its class is as high as that of
the segment. It is allowed to write into a segment only if its class is as low as that of the segment.
This automatically ensures that no information can move from a higher to a lower class. For
example, a military program with secret clearance can read only from objects that are
unclassified, confidential, or secret and can write only into objects that are secret or top
secret.
Two types of flows exist:
1.) Explicit flows: occurring as a consequence of assignment instructions, such as Y := f(X1, ..., Xn).
2.) Implicit flows: generated by conditional instructions, such as if f(Xm+1, ..., Xn) then Y := f(X1,
..., Xm).
Comment
Step 3 of 3
Flow control mechanisms must verify that only authorized flows, both explicit and implicit, are
executed. A set of rules must be satisfied to ensure secure information flows. Rules may be
expressed using flow relations among classes and assigned to information, stating the
authorized flow within the system. This relation can define, for a class, the set of classes where
information can flow, or can state the specific relations to be verified between two classes to
allow information to flow from one to another. In general, flow control mechanisms implement the
control by assigning a label to each object and by specifying the security class of the object.
Labels are then used to verify the flow relations defined in the model.
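The simplest two-class policy described above can be sketched as:

```python
# Sketch: a flow from sender to receiver is allowed only if the receiver's
# class is at least as high as the sender's (so no C -> N flow).
ORDER = {"N": 0, "C": 1}

def flow_allowed(sender_class, receiver_class):
    return ORDER[receiver_class] >= ORDER[sender_class]

print(flow_allowed("N", "C"))  # True: nonconfidential data may flow upward
print(flow_allowed("C", "N"))  # False: confidential data must not reach N
```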
Comment
Chapter 30, Problem 22RQ
Problem
What are covert channels? Give an example of a covert channel.
Step-by-step solution
Step 1 of 2
A covert channel allows a transfer of information that violates the security policy.
Specifically, a covert channel allows information to pass from a higher classification level to a
lower classification level through improper means. Covert channels can be classified into two
broad categories:
1.) Timing channels: in a timing channel, information is conveyed by the timing of events or
processes.
2.) Storage channels: storage channels do not require temporal synchronization; information is
conveyed by accessing system information that is otherwise inaccessible to the
user.
Comment
Step 2 of 2
As a simple example of a covert channel, consider a distributed database system in which two
nodes have user security levels of secret (S) and unclassified (U). In order for a transaction to
commit, both nodes must agree to commit. They may only perform operations that are
consistent with the *-property, which states that in any transaction, the S site cannot write or pass
information to the U site. However, if these two sites collude to set up a covert channel between
them, a transaction involving secret data may be committed unconditionally by the U site, while the
S site commits or aborts in some predefined, agreed-upon way, so that certain information is
passed from the S site to the U site. Measures such as locking prevent concurrent writing of the
same objects by users with different security levels, preventing storage-type covert channels.
Operating systems and distributed databases provide control over the
multiprogramming of operations, which allows sharing of resources without the possibility of
one program or process encroaching on another's memory or other resources in the
system, thus preventing timing-oriented covert channels. In general, covert channels are not a
major problem in well-implemented, robust database implementations. However, certain schemes
may be contrived by clever users that implicitly transfer information.
Some security experts believe that one way to avoid covert channels is to disallow programmers
from gaining access to sensitive data that a program will process after the program has been
put into operation.
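The colluding-commit scheme above can be sketched as a toy simulation (no real DBMS is involved; the function names are invented for illustration):

```python
# The S site leaks one bit per transaction by abusing its legitimate
# commit/abort vote as a signal; the colluding U site reads the bit
# from that decision. Purely illustrative.

def s_site_decision(secret_bit: int) -> str:
    # Agreed-upon convention: commit means 1, abort means 0.
    return "commit" if secret_bit == 1 else "abort"

def u_site_observe(decision: str) -> int:
    return 1 if decision == "commit" else 0

secret = [1, 0, 1, 1]          # bits the S site wants to pass down
leaked = [u_site_observe(s_site_decision(b)) for b in secret]
assert leaked == secret        # the channel transfers the data to U
```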
Chapter 30, Problem 23RQ
Problem
What is the goal of encryption? What process is involved in encrypting data and then recovering
it at the other end?
Step-by-step solution
Step 1 of 1
Suppose data must be communicated over an insecure channel and may fall into the wrong
hands. In this situation, encryption disguises the message so that even if the transmission is
diverted, the message will not be revealed. Encryption is thus a means of maintaining secure data in
an insecure environment.
Encryption consists of applying an encryption algorithm to data using some predefined
encryption key.
The resulting data must then be decrypted using a decryption key to recover the original data.
Chapter 30, Problem 24RQ
Problem
Give an example of an encryption algorithm and explain how it works.
Step-by-step solution
Step 1 of 3
Public key encryption: Public key encryption is based on mathematical functions rather than on
operations on bit patterns. It also involves the use of two separate keys, in contrast to
conventional encryption, which uses only one key. The use of two keys can have profound
consequences in the areas of confidentiality, key distribution, and authentication. The two keys
used for public key encryption are referred to as the public key and the private key. Invariably, the
private key is kept secret, but it is referred to as the private key rather than the secret key to avoid
confusion with conventional encryption.
Step 2 of 3
A public key encryption scheme, or infrastructure, has six ingredients:
1.) Plaintext: The data that is to be transmitted (encrypted).
2.) Encryption algorithm: The algorithm that performs transformations on the plaintext.
3. and 4.) Public key and private key: If one of these is used for encryption, the other is used for
decryption.
5.) Ciphertext: The encrypted (scrambled) data produced for a given plaintext and set of keys.
6.) Decryption algorithm: This algorithm accepts the ciphertext and the matching key and
produces the original plaintext.
Step 3 of 3
The public key is made public for others to use, whereas the private key is known only to its owner.
The scheme relies on one key for encryption and the other for decryption.
Essential steps are as follows:
1.) Each user generates a pair of keys to be used for the encryption and decryption of messages.
2.) Each user places one of the keys in a public register or other accessible file. This is the public
key. The companion key is kept private.
3.) If a sender wishes to send a private message to a receiver, the sender encrypts the message
using the receiver's public key.
4.) When the receiver receives the message, he or she decrypts it using the receiver's private
key. No other recipient can decrypt the message because only the receiver knows his or her
private key.
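The four steps can be sketched with a deliberately tiny RSA key pair (modulus 61*53 = 3233, e = 17, d = 2753), chosen so the arithmetic is visible; real keys are hundreds of digits long, and all names here are illustrative:

```python
public_register = {}                      # step 2: the public register

def generate_keys():
    # Step 1: a fixed, toy key pair; public part (n, e), private part d.
    return (3233, 17), 2753

def publish(user, public_key):
    public_register[user] = public_key    # step 2: one key is made public

def encrypt(message, user):
    n, e = public_register[user]          # step 3: the receiver's PUBLIC key
    return pow(message, e, n)

def decrypt(ciphertext, private_key, n=3233):
    return pow(ciphertext, private_key, n)  # step 4: the receiver's PRIVATE key

bob_public, bob_private = generate_keys()
publish("bob", bob_public)
c = encrypt(65, "bob")                    # the sender encrypts for Bob
assert decrypt(c, bob_private) == 65      # only Bob can recover the message
```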
Chapter 30, Problem 25RQ
Problem
Repeat the question for the popular RSA algorithm.
Question
Give an example of an encryption algorithm and explain how it works.
Step-by-step solution
Step 1 of 1
The RSA encryption algorithm incorporates results from number theory, combined with the
difficulty of determining the prime factors of a large number. The RSA algorithm also operates with
modular arithmetic, mod n.
Two keys, e and d, are used for encryption and decryption. An important property is that they can
be interchanged. n is chosen as a large integer that is a product of two large distinct prime
numbers, a and b. The encryption key e is a randomly chosen number between 1 and n that is
relatively prime to (a-1)*(b-1). The plaintext block P is encrypted as P^e mod n. Because the
exponentiation is performed mod n, factoring P^e to uncover the encrypted plaintext is difficult.
However, the decryption key d is carefully chosen so that (P^e)^d mod n = P. The key d can be
computed from the condition that d*e = 1 mod ((a-1)*(b-1)). Thus, the legitimate receiver who
knows d simply computes (P^e)^d mod n = P and recovers P without having to factor P^e.
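The computation can be followed end to end with toy primes a = 61 and b = 53 (real RSA uses primes of hundreds of digits, which is what makes factoring n infeasible); the sketch assumes Python 3.8+ for the modular inverse via `pow`:

```python
a, b = 61, 53                  # two distinct primes (toy-sized)
n = a * b                      # n = 3233: the modulus for all arithmetic
phi = (a - 1) * (b - 1)        # (a-1)*(b-1) = 3120

e = 17                         # encryption key, relatively prime to phi
d = pow(e, -1, phi)            # decryption key: d*e = 1 mod phi -> d = 2753

P = 65                         # plaintext block (must be < n)
C = pow(P, e, n)               # encryption: P^e mod n
assert pow(C, d, n) == P       # decryption: (P^e)^d mod n recovers P
```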
Chapter 30, Problem 26RQ
Problem
What is a symmetric key algorithm for key-based security?
Step-by-step solution
Step 1 of 1
A symmetric key algorithm uses the same key for both encryption and decryption. This
characteristic makes fast encryption and decryption possible, so symmetric keys can be used for
sensitive data in the database.
• The message is encrypted with a secret key and can be decrypted with the same secret key.
• An algorithm used for symmetric key encryption is called a symmetric key algorithm. Because
such algorithms are mostly used for encrypting the content of a message, they are also called
content-encryption algorithms.
• The secret key can be derived from a password string by applying the same function
to the string at both the sender and the receiver. Thus it is also referred to as a password-based
encryption algorithm.
• Content encrypted with a longer key is harder to break than content encrypted with a shorter key,
as the strength of the encryption depends entirely on the key.
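These points can be sketched as a password-based XOR stream cipher; deriving key material with SHA-256 is an illustrative choice, and a production system would use a vetted cipher such as AES rather than raw XOR:

```python
import hashlib

def derive_key(password: str, length: int) -> bytes:
    # Sender and receiver apply the same function to the password
    # string, so both ends derive the same key material.
    stream = b""
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(password.encode() + bytes([counter])).digest()
        counter += 1
    return stream[:length]

def xor_crypt(data: bytes, password: str) -> bytes:
    # XOR is its own inverse, so the same call encrypts and decrypts.
    key = derive_key(password, len(data))
    return bytes(byte ^ k for byte, k in zip(data, key))

msg = b"sensitive row"
cipher = xor_crypt(msg, "s3cret")
assert xor_crypt(cipher, "s3cret") == msg   # the same secret key decrypts
```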
Chapter 30, Problem 27RQ
Problem
What is the public key infrastructure scheme? How does it provide security?
Step-by-step solution
Step 1 of 2
Public key encryption scheme:
1. Plaintext: This is the data or readable message that is fed into the algorithm as input.
2. Encryption algorithm: This algorithm performs various transformations on the plaintext.
3. Public and private keys: These are a pair of keys that have been selected so that if one is
used for encryption, the other is used for decryption. The exact transformations performed by the
encryption algorithm depend on the public or private key that is provided as input.
4. Ciphertext: This is the scrambled message produced as output. It depends on the plaintext
and the key. For a given message, two different keys will produce two different ciphertexts.
5. Decryption algorithm: This algorithm accepts the ciphertext and the matching key and
produces the original plaintext. A general-purpose public key cryptographic algorithm works with one
key for encryption and a different but related key for decryption.
Step 2 of 2
The steps are as follows:
1. Each user generates a pair of keys to be used for the encryption and decryption of messages.
2. Each user places one of the two keys in a public register or other accessible file (the public
key); the companion key is kept private.
3. If a user wishes to send a private message to a receiver, the sender encrypts it using the
receiver's public key.
4. When the receiver receives the message, he or she decrypts it using the receiver's private key.
No other user can decrypt the message, and thus this provides security for the data.
Chapter 30, Problem 28RQ
Problem
What are digital signatures? How do they work?
Step-by-step solution
Step 1 of 1
A digital signature is a means of associating a mark unique to an individual with a body of text. The
mark should be unforgeable; that is, others should be able to check that the signature comes from
the originator.
A digital signature consists of a string of symbols.
- The signature must be different for each use. This can be achieved by making each digital signature
a function of the message that it is signing, together with a timestamp.
- Public key techniques are the means of creating digital signatures.
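These ideas can be sketched with the toy RSA parameters used in the preceding answers (n = 3233, e = 17, d = 2753; illustrative sizes only): signing applies the private key to a digest of the message plus a timestamp, and anyone can check the result with the public key.

```python
import hashlib

n, e, d = 3233, 17, 2753       # toy RSA pair; real keys are far larger

def digest(message: str, timestamp: int) -> int:
    # The signature is a function of the message AND a timestamp,
    # so it differs on every use, as noted above.
    h = hashlib.sha256(f"{message}|{timestamp}".encode()).digest()
    return int.from_bytes(h, "big") % n   # reduced into the toy modulus

def sign(message: str, timestamp: int, private_key=d) -> int:
    return pow(digest(message, timestamp), private_key, n)

def verify(message: str, timestamp: int, signature: int, public_key=e) -> bool:
    return pow(signature, public_key, n) == digest(message, timestamp)

sig = sign("pay Alice $100", 1700000000)
assert verify("pay Alice $100", 1700000000, sig)
```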
Chapter 30, Problem 29RQ
Problem
What type of information does a digital certificate include?
Step-by-step solution
Step 1 of 1
A digital certificate combines the public key with the identity of the person who holds the
corresponding private key into a digitally signed statement. Certificates are issued
and signed by a certification authority (CA).
The following information is included in the certificate:
1. The certificate owner information, given by a unique identifier known as the distinguished name
(DN) of the owner. It includes the owner's name, organization, and other related information about
the owner.
2. The public key of the owner.
3. The date of issue of the certificate.
4. The validity period, specified by 'Valid From' and 'Valid To' dates.
5. Issuer identifier information.
6. The digital signature of the certification authority (CA) that issued the certificate.
All the information is encoded through a message-digest function, which creates the signature.
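The listed fields can be sketched as a data structure; abbreviating the CA's signature to a message digest over the encoded fields is an illustrative simplification (a real CA would also apply its private key), and all values are invented:

```python
import hashlib
from dataclasses import dataclass

@dataclass
class DigitalCertificate:
    owner_dn: str          # 1. distinguished name: owner name, organization
    owner_public_key: str  # 2. the owner's public key
    issue_date: str        # 3. date of issue
    valid_from: str        # 4. validity period: 'Valid From'
    valid_to: str          #    and 'Valid To'
    issuer: str            # 5. issuer identifier information
    ca_signature: str = "" # 6. the CA's digital signature, filled in below

def ca_sign(cert: DigitalCertificate) -> str:
    # A message digest over the encoded fields stands in for the
    # CA's real signature.
    encoded = "|".join([cert.owner_dn, cert.owner_public_key, cert.issue_date,
                        cert.valid_from, cert.valid_to, cert.issuer])
    return hashlib.sha256(encoded.encode()).hexdigest()

cert = DigitalCertificate("CN=Alice, O=ExampleCorp", "rsa:3233:17",
                          "2024-01-01", "2024-01-01", "2025-01-01",
                          "Example CA")
cert.ca_signature = ca_sign(cert)
assert cert.ca_signature == ca_sign(cert)  # verification recomputes the digest
```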
Chapter 30, Problem 30E
Problem
How can privacy of data be preserved in a database?
Step-by-step solution
Step 1 of 3
Protecting data from unauthorized access is referred to as data privacy. Data warehouses, in
which a large amount of data is stored, must be kept private and secure.
There are many challenges associated with data privacy. Some of them are as follows:
• In order to preserve data privacy, data mining and analysis should be minimized.
Usually, a large amount of data is collected and stored in a centralized location, where the
violation of a single security policy will expose all the data. So, it is better to avoid storing all
data in a central warehouse.
Step 2 of 3
• The database contains personal data of individuals, so that personal data must be kept
secure and private.
• Many people inside and outside the organization access the data. The data must be
protected from illegal access and attacks.
Step 3 of 3
Some of the measures to provide data privacy are as follows:
• A good security mechanism should be imposed to protect the data from unauthorized users. This
includes physical security, that is, protecting the location where the data is stored.
• Provide controlled and limited access to the data. Ensure that only authorized users can access
the data, using biometrics, passwords, etc. Also impose mechanisms so that users can access
only the data they need.
• It is better to avoid storing data in a central warehouse and instead distribute the data across
different locations.
• Anonymize the data and remove all personal information.
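The last measure can be sketched as a small helper that strips direct identifiers before records are shared or analyzed (the field names are hypothetical):

```python
def anonymize(record: dict) -> dict:
    # Drop fields that directly identify an individual.
    identifiers = {"name", "ssn", "address", "phone"}
    return {k: v for k, v in record.items() if k not in identifiers}

row = {"name": "Smith", "ssn": "123-45-6789", "dept": 5, "salary": 40000}
clean = anonymize(row)
assert clean == {"dept": 5, "salary": 40000}   # identifying fields removed
```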
Chapter 30, Problem 31E
Problem
What are some of the current outstanding challenges for database security?
Step-by-step solution
Step 1 of 3
Challenges in database Security:
1.) Data quality: The database community needs techniques and organizational solutions to
assess and attest to the quality of data. Techniques may be as simple as quality stamps posted on
Web sites. We also need techniques that provide more efficient integrity-semantics verification,
and tools for the assessment of data quality based on techniques such as record linkage.
Step 2 of 3
2.) Intellectual property rights: With the widespread use of the Internet and intranets, legal and
informational aspects of data are becoming major concerns of organizations. To address these
concerns, watermarking techniques for relational data have recently been proposed. The main
purpose of digital watermarking is to protect content from unauthorized duplication and
distribution by enabling provable ownership of the content. It has traditionally relied upon the
availability of a large noise domain within which the object can be altered while retaining its
essential properties. However, research is needed to assess the robustness of such techniques
and to investigate different approaches aimed at preventing intellectual property rights violations.
Step 3 of 3
3.) Data survivability: Database systems need to operate and continue their functions, even
with reduced capabilities, despite disruptive events such as information warfare attacks. A
DBMS, in addition to making every effort to prevent an attack and to detect one in the event of
occurrence, should be able to do the following:
1.) Confinement: Take immediate action to eliminate the attacker's access to the system and to
isolate or contain the problem to prevent further spread.
2.) Damage assessment: Determine the extent of the problem, including failed functions and
corrupted data.
3.) Reconfiguration: Reconfigure to allow operation to continue in a degraded mode while
recovery proceeds.
4.) Repair: Recover corrupted or lost data and repair or reinstall failed system functions to
reestablish a normal level of operation.
5.) Fault treatment: To the extent possible, identify the weaknesses exploited in the attack and
take steps to prevent a recurrence.
The goal of the information warfare attacker is to damage the organization's operation and the
fulfillment of its mission through disruption of its information systems. The specific target of an
attack may be the system itself or its data. While attacks that bring the system down outright are
severe and dramatic, they must also be well timed to achieve the attacker's goal, since attacks
will receive immediate and concentrated attention in order to bring the system back to operational
condition, diagnose how the attack took place, and install preventive measures.
Chapter 30, Problem 32E
Problem
Consider the relational database schema in Figure 5.5. Suppose that all the relations were
created by (and hence are owned by) user X, who wants to grant the following privileges to user
accounts A, B, C, D, and E
a. Account A can retrieve or modify any relation except DEPENDENT and can grant any of these
privileges to other users.
b. Account B can retrieve all the attributes of EMPLOYEE and DEPARTMENT except for Salary,
Mgr_ssn, and Mgr_start_date.
c. Account C can retrieve or modify WORKS_ON but can only retrieve the Fname, Minit, Lname,
and Ssn attributes of EMPLOYEE and the Pname and Pnumber attributes of PROJECT.
d. Account D can retrieve any attribute of EMPLOYEE or DEPENDENT and can modify
DEPENDENT.
e. Account E can retrieve any attribute of EMPLOYEE but only for EMPLOYEE tuples that have
Dno = 3.
f. Write SQL statements to grant these privileges. Use views where appropriate.
Step-by-step solution
Step 1 of 6
(a) GRANT SELECT, UPDATE
ON EMPLOYEE, DEPARTMENT, DEPT_LOCATIONS, PROJECT, WORKS_ON
TO USER_A
WITH GRANT OPTION;
Step 2 of 6
(b) CREATE VIEW EMPS AS
SELECT FNAME, MINIT, LNAME, SSN, BDATE, ADDRESS, SEX, SUPERSSN, DNO
FROM EMPLOYEE;
GRANT SELECT ON EMPS
TO USER_B;
CREATE VIEW DEPTS AS
SELECT DNAME, DNUMBER FROM DEPARTMENT;
GRANT SELECT ON DEPTS
TO USER_B;
Step 3 of 6
(c) GRANT SELECT, UPDATE ON WORKS_ON TO USER_C;
CREATE VIEW EMP1 AS
SELECT FNAME, MINIT, LNAME, SSN
FROM EMPLOYEE;
GRANT SELECT ON EMP1
TO USER_C;
CREATE VIEW PROJ1 AS
SELECT PNAME, PNUMBER
FROM PROJECT;
GRANT SELECT ON PROJ1
TO USER_C;
Step 4 of 6
(d) GRANT SELECT ON EMPLOYEE, DEPENDENT TO USER_D;
GRANT UPDATE ON DEPENDENT TO USER_D;
Step 5 of 6
(e) CREATE VIEW DNO3_EMPLOYEES AS
SELECT * FROM EMPLOYEE
WHERE DNO = 3;
GRANT SELECT ON DNO3_EMPLOYEES TO USER_E;
Step 6 of 6
(f) The statements in parts (a) through (e) are the SQL statements that grant these privileges, using views where appropriate.
Chapter 30, Problem 33E
Problem
Suppose that privilege (a) of Exercise is to be given with GRANT OPTION but only so that
account A can grant it to at most five accounts, and each of these accounts can propagate the
privilege to other accounts but without the GRANT OPTION privilege. What would the horizontal
and vertical propagation limits be in this case?
Reference Problem 30.32
Consider the relational database schema in Figure 5.5. Suppose that all the relations were
created by (and hence are owned by) user X, who wants to grant the following privileges to user
accounts A, B, C, D, and E
a. Account A can retrieve or modify any relation except DEPENDENT and can grant any of these
privileges to other users.
b. Account B can retrieve all the attributes of EMPLOYEE and DEPARTMENT except for Salary,
Mgr_ssn, and Mgr_start_date.
c. Account C can retrieve or modify WORKS_ON but can only retrieve the Fname, Minit, Lname,
and Ssn attributes of EMPLOYEE and the Pname and Pnumber attributes of PROJECT.
d. Account D can retrieve any attribute of EMPLOYEE or DEPENDENT and can modify
DEPENDENT.
e. Account E can retrieve any attribute of EMPLOYEE but only for EMPLOYEE tuples that have
Dno = 3.
f. Write SQL statements to grant these privileges. Use views where appropriate.
Step-by-step solution
Step 1 of 1
The horizontal propagation limit granted to USER_A is 5.
The vertical propagation limit granted to USER_A is level 1.
User A can then grant the privilege with a level-0 vertical limit (i.e., without the GRANT OPTION) to at
most five users, who then cannot further grant the privilege.
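The limits in this answer can be modeled as a toy sketch (SQL itself has no standard syntax for propagation limits, so the class and names are invented for illustration):

```python
class Grant:
    """Toy model of horizontal/vertical propagation limits."""
    def __init__(self, horizontal: int, vertical: int):
        self.horizontal = horizontal  # max accounts this grantee may grant to
        self.vertical = vertical      # how many further levels of granting
        self.children = []

    def propagate(self) -> "Grant":
        if self.vertical < 1 or len(self.children) >= self.horizontal:
            raise PermissionError("propagation limit reached")
        # Recipients get the vertical limit reduced by one
        # (here: level 0, i.e., no GRANT OPTION).
        child = Grant(horizontal=0, vertical=self.vertical - 1)
        self.children.append(child)
        return child

# X grants privilege (a) to A with horizontal limit 5, vertical limit 1.
a = Grant(horizontal=5, vertical=1)
children = [a.propagate() for _ in range(5)]   # five onward grants succeed

try:
    a.propagate()                              # a sixth grant is refused
    ok = False
except PermissionError:
    ok = True
assert ok
```

Each of the five recipients holds the privilege with vertical limit 0, so any `propagate()` call on a recipient raises the same error.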
Chapter 30, Problem 34E
Problem
Consider the relation shown in Figure 30.2(d). How would it appear to a user with classification
U? Suppose that a classification U user tries to update the salary of ‘Smith’ to $50,000; what
would be the result of this action?
Step-by-step solution
Step 1 of 1
EMPLOYEE would appear to users with classification U as follows:

Name    | Salary  | JobPerformance | TC
Smith U | NULL U  | NULL U         | U

If a classification U user tried to update the salary of Smith to $50,000, a third polyinstantiation
of the Smith tuple would result, as follows:

Name    | Salary  | JobPerformance | TC
Smith U | 40000 C | Fair S         | S
Smith U | 40000 C | Excellent C    | C
Smith U | 50000 U | NULL U         | U
Brown C | 80000 S | Good C         | S