Uploaded by Ishan Tripathi

dbms-complete-notes-bca-2

advertisement
lOMoARcPSD|25829564
Dbms complete notes-bca
BCA (University of Mysore)
Studocu is not sponsored or endorsed by any college or university
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
Database
A database is a collection of related data.
Data - known facts that can be recorded and that have implicit meaning. Example names, telephone numbers, and addresses of the people you know.
- This collection of related data with an implicit meaning is a database.
A database has the following properties:
■ A database represents some aspect of the real world, sometimes called the miniworld
or the universe of discourse (UoD). Changes to the miniworld are reflected in the
database.
■ A database is a logically coherent collection of data with some inherent meaning. A
random assortment of data cannot correctly be referred to as a database.
■ A database is designed, built, and populated with data for a specific purpose. It has
an intended group of users and some preconceived applications in which these users are
interested.
■Changes must be reflected in the database as soon as possible.
■A database can be of any size and complexity
Database management system (DBMS)
A database management system (DBMS) is a collection of programs that enables
users to create and maintain a database. The DBMS is a general-purpose software
system that facilitates the processes of defining, constructing, manipulating, and sharing
databases among various users and applications.
 Defining – specifying the data types, structures, and constraints of the data to be
stored in the database.
Meta-data.
The database definition or descriptive information is also stored by the DBMS in the
form of a database catalog or dictionary; it is called meta-data.
 Constructing-the database is the process of storing the data on some storage
medium that is controlled by the DBMS.
 Manipulating- a database includes functions such as querying the database to
retrieve specific data, updating the database to reflect changes in the miniworld,
and generating reports from the data.
 Sharing- a database allows multiple users and programs to access the database
simultaneously.
Other important functions provided by the DBMS include,
 Protecting- the database and maintaining it over a long period of time. Protection
includes system protection against hardware or software malfunction (or crashes)
and security protection against unauthorized or malicious access.
Dept of Computer Science
Page 1
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
 Maintaining- A typical large database may have a life cycle of many years, so
the DBMS must be able to maintain the database system by allowing the system
to evolve as requirements change over time.
An application program accesses the database by sending queries or requests for data
to the DBMS.
A query typically causes some data to be retrieved;
A transaction may cause some data to be read and some data to be written into the
database.
 Traditional database applications-in which most of the information that is stored and
accessed is either textual or numeric.
 Multimedia databases- Possible to store images, audio clips, and video streams
digitally.
 Geographic information systems (GIS) -can store and analyze maps, weather data,
and satellite images.
 Data warehouses and online analytical processing (OLAP) systems -are used in
many companies to extract and analyze useful business information from very large
databases to support decision making.
 Real-time and active database technology -issued to control industrial and
manufacturing processes.
Characteristics of the Database Approach
(or The main characteristics of the database approach versus the file-processing)
The main characteristics of the database approach versus the file-processing approach
are the following:
■ Self-describing nature of a database system
■ Insulation between programs and data, and data abstraction
■ Support of multiple views of the data
■ sharing of data and multiuser transaction processing
1.Self-Describing Nature of a Database System
A fundamental characteristic of the database approach is that the database system
contains not only the database itself but also a complete definition or description of the
database structure and constraints. This definition is stored in the DBMS catalog, which
contains information such as the structure of each file, the type and storage format of
each data item, and various constraints on the data. The information stored in the catalog
is called meta-data, and it describes the structure of the primary database
Dept of Computer Science
Page 2
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
Insulation between Programs and Data, and Data Abstraction
In traditional file processing, the structure of data files is embedded in the
application programs, so any changes to the structure of a file may require changing all
programs that access that file. By contrast, DBMS access programs do not require such
changes in most cases. The structure of data files is stored in the DBMS catalog
separately from the access programs. We call this property program-data
independence.
Support of Multiple Views of the Data
A database typically has many users, each of whom may require a different
perspective or view of the database. A view may be a subset of the database or it may
contain virtual data that is derived from the database files but is not explicitly stored.
Sharing of Data and Multiuser Transaction Processing
A multiuser DBMS, as its name implies, must allow multiple users to access the
database at the same time. This is essential if data for multiple applications is to be
integrated and maintained in a single database. The DBMS must include concurrency
control software to ensure that several users trying to update the same data do so in a
controlled manner so that the result of the updates is correct.
For example, when several reservation agents try to assign a seat on an airline flight,
the DBMS should ensure that each seat can be accessed by only one agent at a time for
assignment to a passenger. These types of applications are generally called online
transaction processing (OLTP) applications. A fundamental role of multiuser DBMS
software is to ensure that concurrent transactions operate correctly and efficiently.
Advantages of Using the DBMS Approach (OR Additional characteristics of
database systems)
The advantages of using a DBMS and the capabilities that a good DBMS should possess
are the following. These capabilities are in addition to the four main characteristics
discussed already. The DBA must utilize these capabilities to accomplish a variety of
objectives related to the design, administration, and use of a large multiuser database.
1. Controlling Redundancy
Redundancy in storing the same data multiple times leads to several problems.
First, there is the need to perform a single logical update—such as entering data on a
new student—multiple times: once for each file where student data is recorded. This
leads to duplication of effort. Second, storage space is wasted when the same data is
stored repeatedly, and this problem may be serious for large databases. Third, files that
represent the same data may become inconsistent.
Dept of Computer Science
Page 3
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
In the database approach, the views of different user groups are integrated during
database design. Ideally, we should have a database design that stores each logical data
item—such as a student’s name or birth date—in only one place in the database. This
is known as data normalization, and it ensures consistency and saves storage space .It
is sometimes necessary to use controlled redundancy to improve the performance of
queries.
2. Restricting Unauthorized Access
When multiple users share a large database, it is likely that most users will not be
authorized to access all information in the database. A DBMS should provide a security
and authorization subsystem, which the DBA uses to create accounts and to specify
account restrictions.
3. Providing Persistent Storage for Program Objects
Databases can be used to provide persistent storage for program objects and
data structures. This is one of the main reasons for object-oriented database systems.
Programming languages typically have complex data structure. The persistent storage
of program objects and data structures is an important function of database systems.
4. Techniques for Efficient Query Processing
Database systems must provide capabilities for efficiently executing queries and
updates. Because the database is typically stored on disk, the DBMS must provide
specialized data structures and search techniques to speed up disk search for the desired
records. Auxiliary files called indexes are used for this purpose.
5. Providing Backup and Recovery
A DBMS must provide facilities for recovering from hardware or software
failures. The backup and recovery subsystem of the DBMS is responsible for
recovery.
6. Providing Multiple User Interfaces
Because many types of users with varying levels of technical knowledge use a
database, a DBMS should provide a variety of user interfaces. These include query
languages for casual users, programming language interfaces forapplication
programmers, forms and command codes for parametric users, and menu-driven
interfaces and natural language interfaces for standalone users. Both forms-styles
interfaces and menu-driven interfaces are commonly known as graphical user
interfaces (GUIs).
7. Representing Complex Relationships among Data
A database may include numerous varieties of data that are interrelated in many
ways. A DBMS must have the capability to represent a variety of complex relationships
among the data, to define new relationships as they arise, and to retrieve and update
related data easily and efficiently.
8. Enforcing Integrity Constraints
Dept of Computer Science
Page 4
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
Most database applications have certain integrity constraints that must hold for
the data. A DBMS should provide capabilities for defining and enforcing these
constraints. The simplest type of integrity constraint involves specifying a data type for
each data item.
9. Permitting Inferencing and Actions Using Rules
Some database systems provide capabilities for defining deduction rules for
inferencing new information from the stored database facts. Such systems are called
deductive database systems. In today’s relational database systems, it is possible to
associate triggers with tables. A trigger is a form of a rule activated by updates to the
table, which results in performing some additional operations to some other tables,
sending messages, and so on.
10. Additional Implications of Using the Database Approach
This section discusses some additional implications of using the database
approach that can benefit most organizations.
a. Potential for Enforcing Standards.
b. Reduced Application Development Time.
c. Flexibility.
d. Availability of Up-to-Date Information.
e. Economies of Scale
Advantages and disadvantages of DBMS over conventional file system
Advantages
• Flexibility: Because programs and data are independent, programs do not have to be
modified when types of unrelated data are added to or deleted from the database, or
when physical storage changes.
• Fast response to information requests: Because data are integrated into a single
database, complex requests can be handled much more rapidly than if the data were
located in separate, non-integrated files. In many businesses, faster response means
better customer service.
• Multiple accesses: Database software allows data to be accessed in a variety of ways
and often, by using several programming languages.
• Lower user training costs: Users often find it easier to learn such systems and
training costs may be reduced. Also, the total time taken to process requests may be
shorter, which would increase user productivity.
• Less storage: Theoretically, all occurrences of data items need be stored only once,
thereby eliminating the storage of redundant data. System developers and database
designers often use data normalization to minimize data redundancy.
Dept of Computer Science
Page 5
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
Disadvantages
1. Database systems are complex, difficult, and time-consuming to design
2. Substantial hardware and software start-up costs
3. DBMS subjects a business to a risk in the loss of critical data in its electronic
format it can be more readily stolen without proper security
4. The cost of a DBMS can be prohibitive for small enterprises as they struggle with
cost justification for making the investment in the infrastructure
5. Improper use of the DBMS can lead to incorrect decision making as people take for
granted the data is accurate as presented
.
6. Data can be stolen by careless of the password security.
Problems with File Processing Systems
1. Program-Data Dependence- File descriptions are stored within each application
program that accesses a given file.
2. Duplication of Data- Applications are developed independently in file processing
systems leading to unplanned duplicate files. Duplication is wasteful as it requires
additional storage space and changes in one file must be made manually in all files. This
also results in loss of data integrity. It is also possible that the same data item may have
different names in different files, or the same name may be used for different data items
in different files.
3. Limited data sharing- Each application has its own private files with little
opportunity to share data outside their own applications. A requested report may require
data from several incompatible files in separate systems.
4. Lengthy Development Times-There is little opportunity to leverage previous
development efforts. Each new application requires the developer to start from scratch
by designing new file formats and descriptions
5. Excessive Program Maintenance- The preceding factors create a heavy program
maintenance load. 6. Integrity Problem. The problem of integrity is the problem of
ensuring that the data in the database is accentuate.
7. Inconsistence data
8. Security -The Problem of security in file processing is Non Programmer can retrieve
modify, delete and insert the data in data base. But this is not possible in DBMS.
DBMS classification criteria:
We can classify the DBMS based on the following criteria,
- Data model, number of users, number of sites, access paths, cost
Dept of Computer Science
Page 6
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
 Data model
• Relational
• Object
• Hierarchical and network (legacy)
• Native XML DBMS
 Number of users
• Single-user
• Multiuser
 Number of sites
• Centralized
• Distributed
 Homogeneous
 Heterogeneous
 Cost
• Open source
• Different types of licensing
 Types of access path options
 General or special-purpose
Actors on the Scene (Users of Database)
Actors on the scene -The people whose jobs involve the day-to-day use of a large
database
1. Database Administrators
In any organization where many people use the same resources, there is a need
for a chief administrator to oversee and manage these resources. In a database
environment, the primary resource is the database itself, and the secondary resource is
the DBMS and related software. Administering these resources is the responsibility of
the database administrator (DBA). The DBA is responsible for authorizing access to
the database, coordinating and monitoring its use, and acquiring software and hardware
Dept of Computer Science
Page 7
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
resources as needed. The DBA is accountable for problems such as security breaches
and poor system response time. In large organizations, the DBA is assisted by a staff
that carries out these functions.
2. Database Designers
Database designers are responsible for identifying the data to be stored in the
database and for choosing appropriate structures to represent and store this data. These
tasks are mostly undertaken before the database is actually implemented and populated
with data. It is the responsibility of database designers to communicate with all
prospective database users in order to understand their requirements and to create a
design that meets these requirements. In many cases, the designers are on the staff of
the DBA and may be assigned other staff responsibilities after the database design is
completed.
3 .End Users
End users are the people whose jobs require access to the database for querying,
updating, and generating reports. There are several categories of end users:
■ Casual end users occasionally access the database, but they may need different
information each time.
Eg:- middle- or high-level managers, other occasional browsers.
■ Naive or parametric end users constantly querying and updating the database, using
standard types of queries and updates—called canned transactions—that have been
carefully programmed and tested.
eg:- Bank tellers check account balances and post withdrawals and deposits.
eg:- Reservation agents for airlines, hotels, and car rental companies check availability
for a given request and make reservations.
■ Sophisticated end users include engineers, scientists, business analysts, and others
who thoroughly familiarize themselves with the facilities of the DBMS in order to
implement their own applications to meet their complex requirements.
■ Standalone users maintain personal databases by using ready-made program
packages that provide easy-to-use menu-based or graphics-based interfaces. Eg:- the
user of a tax package that stores a variety of personal financial data for tax purposes.
4. System Analysts and Application Programmers (Software Engineers)
System analysts determine the requirements of end users, especially naive and
parametric end users, and develop specifications for standard canned transactions that
meet these requirements. Application programmers implement these specifications as
programs; then they test, debug, document, and maintain these canned transactions.
Such analysts and programmers—commonly referred to as software developers or
software engineers—should be familiar with the full range of capabilities provided by
the DBMS to accomplish their tasks.
The functions of a Database administrator are,
Dept of Computer Science
Page 8
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
- Schema Definition
- Storage structure and access method definition
- Schema and physical organization modification.
- Granting authorization for data access.
- Routine Maintenance
Schema Definition
The Database Administrator creates the database schema by executing DDL
statements. Schema includes the logical structure of database table (Relation) like
data types of attributes, length of attributes, integrity constraints etc.
Storage structure and access method definition
Database tables or indexes are stored in the following ways: Flat files, Heaps, B+ Tree
etc.
Schema and physical organization modification
The DBA carries out changes to the existing schema and physical organization.
Granting authorization for data access
The DBA provides different access rights to the users according to their level.
Ordinary users might have highly restricted access to data, while you go up in the
hierarchy to the administrator, you will get more access rights.
Routine Maintenance
Some of the routine maintenance activities of a DBA is given below.
 Taking backup of database periodically
 Ensuring enough disk space is available all the time.
 Monitoring jobs running on the database.
 Ensure that performance is not degraded by some expensive task submitted by
some users.
 Performance Tuning
Data Models
One fundamental characteristic of the database approach is that it provides some level
of data abstraction.
Data abstraction generally refers to the suppression of details of data organization and
storage, and the highlighting of the essential features for an improved understanding of
data.
One of the main characteristics of the database approach is to support data abstraction
so that different users can perceive data at their preferred level of detail.
A data model—a collection of concepts that can be used to describe the structure of a
database—provides the necessary means to achieve this abstraction. By structure of a
database we mean the data types, relationships, and constraints that apply to the data.
Dept of Computer Science
Page 9
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
Categories of Data Models
We can categorize according to the types of concepts they use to describe the
database structure.
1. High-level or conceptual data models provide concepts that are close to the way
many users perceive data.
Conceptual data models use concepts such as entities, attributes, and relationships.
An entity represents a real-world object or concept, such as an employee or a
project from the miniworld that is described in the database.
An attribute represents some property of interest that further describes an entity,
such as the employee’s name or salary.
A relationship among two or more entities represents an association among the
entities, for example, a works-on relationship between an employee and a project.
Entity-Relationship model—a popular high-level conceptual data model.
2. Low-level or physical data models provide concepts that describe the details of how
data is stored on the computer storage media, typically magnetic disks. Concepts
provided by low-level data models are generally meant for computer specialists, not for
end users. Physical data models describe how data is stored as files in the computer by
representing information such as record formats, record orderings, and access paths.
An access path is a structure that makes the search for particular database records
efficient. An index is an example of an access path that allows direct access to data
using an index term or a keyword.
3. Representational (or implementation) data models, which provide concepts that
may be easily understood by end users but that are not too far removed from the way
data is organized in computer storage. Representational data models hide many details
of data storage on disk but can be implemented on a computer system directly.
Representational data models represent data by using record structures and hence are
sometimes called record-based data models.
4. Object data model as an example of a new family of higher-level implementation
data models that are closer to conceptual data models. A standard for object databases
called the ODMG object model has been proposed by the Object Data Management
Group (ODMG).Object data models are also frequently utilized as high-level
conceptual models, particularly in the software engineering domain.
Categories of data models
 Conceptual (high-level, semantic) data models: Provide concepts that are close
to the way many users perceive data. (Also called entity-based or object-based
data models.)
Dept of Computer Science
Page 10
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
 Physical (low-level, internal) data models: Provide concepts that describe
details of how data is stored in the computer.
 Implementation (representational) data models: Provide concepts that fall
between the above two, balancing user views with some computer storage
details.
Types of common Data Models:
1. Hierarchical: The hierarchical database model rigidly structures data into an
inverted “tree” in which each record contains two elements. The first is a single root or
master field, often called a key, which identifies the type location or ordering of the
record. The second is a variable number of subordinate fields, which define the rest of
the data within a record.
2. Network: The network database model creates relationships among data through a
linked-list structure in which subordinate records (called members) can be linked to
more than one data elements (called an owner).
3.Relational: The relational database model is based on the simple concept of flat
tables consisting of rows and columns. In this model, a table is equivalent to a file, row
is equivalent to a record, and each column is equivalent to a field.
Model
Hierarchical
database
Network database
Relational database
Advantages
Searching is fast and efficient.
Disadvantages
Conceptual simplicity is its
characteristic. There are no
predefined relationships
among data. High flexibility
in ad hoc querying. New data
and records can be added
easily.
Many more relationships
This is the most complicated
between data elements can be
model to design, implement,
defined. There is greater speed and maintain. It has greater
and efficiency than relational
query flexibility than
database models.
hierarchical model, but less
than relational model.
Conceptual simplicity is its
Processing efficiency and
characteristic. There is no
speed is lower. Data
predefined relationships among redundancy cannot be
data. High flexibility in ad hoc eliminated completely.
Dept of Computer Science
Page 11
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
querying. New data and
records can be added easily.
Emerging Data Models:
1. Relational Multidimensional Database: Used for building Data warehouse.
2. Object-Oriented: Consists of inter-related objects, similar to entities consisting of
attributes, methods associated with the object, and its behavior).
3. Hypermedia: Stores chunks information in the form of nodes, for which links are
established by the user, as per specific needs.
4. Geographical Information Database: Used for managing locational data for
overlaying on maps and images.
5. Knowledge Database: Consists of decision rules used to evaluate situations and
help users to take decisions like an expert.
Schemas, Instances, and Database State
Database schema:- In any data model, it is important to distinguish between the
description of the database and the database itself. The description of a database is
called the database Schema, which is specified during database design and is not
expected to change frequently.
Schema diagram :- Most data models have certain conventions for displaying schemas
as diagrams. A displayed schema is called a schema diagram. Figure shows a schema
diagram.
Eg:-fig-Schema diagram for the database
STUDENT
Name
Student _number
COURSE
Course_name
SECTION
Section_identifier
Course_number
Course_number
Semester
Class
Major
Credit_hours
Department
Year
Instructor
Schema construct :-each object in the schema—such as STUDENT or COURSE—a
schema construct. A schema diagram displays only some aspects of a schema, such as
Dept of Computer Science
Page 12
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
the names of record types and data items, and some types of constraints. Other aspects
are not specified in the schema diagram.
Database state:- The actual data in a database may change quite frequently. For
example, the database changes every time we add a new student or enter a new grade in
student database. The data in the database at a particular moment in time is called a
database state or snapshot. It is also called the current set of occurrences or instances
in the database.
The distinction between database schema and database state is very important.
The schema is sometimes called the intension, and a database state is called an
extension of the schema.
Although, as mentioned earlier, the schema is not supposed to change frequently, it is
not uncommon that changes occasionally need to be applied to the schema as the
application requirements change. For example, we may decide that another data item
needs to be stored for each record in a file, such as adding the Date_of_birth to the
STUDENT schema . This is known as schema evolution.
Three-Schema Architecture
Three of the four important characteristics of the database approach are (1) use
of a catalog to store the database description (schema) so as to make it self-describing,
(2) insulation of programs and data (program-data and program-operation
independence), and (3) support of multiple user views. In this section we specify
architecture for database systems, called the three-schema architecture, that was
proposed to help achieve and visualize these characteristics.
The Three-Schema Architecture
The goal of the three-schema architecture is to separate the user applications from
the physical database. In this architecture, schemas can be defined at the following three
levels:
1. The internal level has an internal schema, which describes the physical storage
structure of the database. The internal schema uses a physical data model and describes
the complete details of data storage and access paths for the database.
2. The conceptual level has a conceptual schema, which describes the structure of the
whole database for a community of users. The conceptual schema hides the details of
physical storage structures and concentrates on describing entities, data types,
relationships, user operations, and constraints. Usually, a representational data model is
used to describe the conceptual schema when a database system is implemented. This
implementation conceptual schema is often based on a conceptual schema design in a
high-level data model.
Dept of Computer Science
Page 13
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
3. The external or view level includes a number of external schemas or user views.
Each external schema describes the part of the database that a particular user group is
interested in and hides the rest of the database from that user group. As in the previous
level, each external schema is typically implemented using a representational data
model, possibly based on an external schema design in a high-level data model.
Fig: The three-schema architecture
The processes of transforming requests and results between levels are called
mappings.
Data Independence
The three-schema architecture can be used to further explain the concept of data
independence, which can be defined as the capacity to change the schema at one level
of a database system without having to change the schema at the next higher level. We
can define two types of data independence:
1. Logical data independence is the capacity to change the conceptual schema without
having to change external schemas or application programs. We may change the
conceptual schema to expand the database (by adding a record type or data item), to
change constraints, or to reduce the database (by removing a record type or data item).
In the last case, external schemas that refer only to the remaining data should not be
affected. Only the view definition and the mappings need to be changed in a DBMS that
supports logical data independence. After the conceptual schema undergoes a logical
reorganization, application programs that reference the external schema constructs must
work as before. Changes to constraints can be applied to the conceptual schema without
affecting the external schemas or application programs.
2. Physical data independence is the capacity to change the internal schema without
having to change the conceptual schema. Hence, the external schemas need not be
Dept of Computer Science
Page 14
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
changed as well. Changes to the internal schema may be needed because some physical
files were reorganized—for example, by creating additional access structures—to
improve the performance of retrieval or update. If the same data as before remains in
the database, we should not have to change the conceptual schema. For example,
providing an access path to improve retrieval speed of section records by semester and
year should not require a query such as list all sections offered in fall 2008 to be
changed, although the query would be executed more efficiently by the DBMS by
utilizing the new access path.
Data independence occurs because when the schema is changed at some level,
the schema at the next higher level remains unchanged; only the mapping between the
two levels is changed. Hence, application programs referring to the higher-level schema
need not be changed.
DBMS Languages
1. Data definition language (DDL): After designing a database we have to choose a
DBMS to implement the database. Then we have is to specify conceptual and internal
schemas for the database and any mappings between the two. The data definition
language (DDL), is used by the DBA and by database designers to define both schemas.
The DBMS will have a DDL compiler whose function is to process DDL statements in
order to identify descriptions of the schema constructs and to store the schema
description in the DBMS catalog. In DBMSs where a clear separation is maintained
between the conceptual and internal levels, the DDL is used to specify the conceptual
schema only.
2. Storage definition language (SDL):- is used to specify the internal schema. The
mappings between the two schemas may be specified in either one of these languages.
In most relational DBMSs today, there is no specific language that performs the role of
SDL. Instead, the internal schema is specified by a combination of functions,
parameters, and specifications related to storage. These permit the DBA staff to control
indexing choices and mapping of data to storage.
3. View definition language (VDL):-to specify user views and their mappings to the
conceptual schema, but in most DBMSs the DDL is used to define both conceptual and
external schemas. In relational DBMSs, SQL is used in the role of VDL to define user
or application views as results of predefined queries.
4. Data manipulation language(DML):-Once the database schemas are compiled and
the database is populated with data, users must have some means to manipulate the
database. Typical manipulations include retrieval, insertion, deletion, and modification
of the data. The DBMS provides a set of operations or a language called the data
manipulation language (DML) for these purposes.
Dept of Computer Science
Page 15
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
There are two main types of DMLs.
A high-level or nonprocedural DML can be used on its own to specify complex
database operations concisely. High level DMLs, such as SQL, can specify and retrieve
many records in a single DML statement; therefore, they are called set-at-a-time or setoriented DMLs. A query in a high-level DML often specifies which data to retrieve
rather than how to retrieve it; therefore, such languages are also called declarative.
Low-level or procedural DML must be embedded in a general-purpose
programming language. This type of DML typically retrieves individual records or
objects from the database and processes each separately. Therefore, it needs to use
programming language constructs, such as looping, to retrieve and process each record
from a set of records. Low-level DMLs are also called record-at-a-time DMLs because
of this property.
Whenever DML commands, whether high level or low level, are embedded in a
general-purpose programming language, that language is called the host language and
the DML is called the data sublanguage. On the other hand, a high-level DML used in
a standalone interactive manner is called a query language.
Example(various commands) for DBMS Languages
1.DML
DML is abbreviation of Data Manipulation Language. It is used to retrieve, store,
modify, delete, insert and update data in database.
SELECT – Retrieves data from a table.
INSERT – Inserts data into a table
UPDATE
– Updates existing data into a table
DELETE
– Deletes all records from a table
2. DDL
DDL is abbreviation of Data Definition Language. It is used to create and modify the
structure of database objects in database.
CREATE – Creates objects in the database
ALTER – Alters objects of the database
DROP –Deletes objects of the database
TRUNCATE – Deletes all records from a table and resets table identity to initial
value.
3. DCL
DCL is abbreviation of Data Control Language. It is used to create roles, permissions,
and referential integrity as well it is used to control access to database by securing it.
Dept of Computer Science
Page 16
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
GRANT – Gives user’s access privileges to database
REVOKE – Withdraws user’s access privileges to database given with the GRANT
command
4. TCL
TCL is abbreviation of Transactional Control Language. It is used to manage
different transactions occurring within a database.
COMMIT – Saves work done in transactions
ROLLBACK – Restores database to original state since the last COMMIT command
in transactions
SAVE TRANSACTION – Sets a savepoint within a transaction
DBMS Interfaces
User-friendly interfaces provided by a DBMS may include the following:
1. Menu-Based Interfaces for Web Clients or Browsing. These interfaces present the
user with lists of options (called menus) that lead the user through the formulation of a
request. Menus do away with the need to memorize the specific commands and syntax
of a query language; rather, the query is composed step-by step by picking options from
a menu that is displayed by the system. Pull-down menus are a very popular technique
in Web-based user interfaces.
2.Forms-Based Interfaces: A forms-based interface displays a form to each user.
Users can fill out all of the form entries to insert new data, or they can fill out only
certain entries, in which case the DBMS will retrieve matching data for the remaining
entries. Forms are usually designed and programmed for naive users as interfaces to
canned transactions.
3. Graphical User Interfaces :A GUI typically displays a schema to the user in
diagrammatic form. The user then can specify a query by manipulating the diagram. In
many cases, GUIs utilize both menus and forms .Most GUIs use a pointing device,
such as a mouse, to select certain parts of the displayed schema diagram.
4. Natural Language Interfaces: These interfaces accept requests written in English
or some other language and attempt to understand them. A natural language interface
usually has its own schema, which is similar to the database conceptual schema, as well
as a dictionary of important words. The natural language interface refers to the words
in its schema, as well as to the set of standard words in its dictionary, to interpret the
request.
5. Speech Input and Output: Limited use of speech as an input query and speech as
an answer to a question or result of a request is becoming commonplace.
Dept of Computer Science
Page 17
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
Applications with limited vocabularies such as inquiries for telephone directory, flight
arrival/departure, and credit card account information are allowing speech for input and
output to enable customers to access this information.
6. Interfaces for Parametric Users: Parametric users, such as bank tellers, often have
a small set of operations that they must perform repeatedly. For example, a teller is able
to use single function keys to invoke routine and repetitive transactions such as account
deposits or withdrawals, or balance inquiries.
7. Interfaces for the DBA: Most database systems contain privileged commands that
can be used only by the DBA staff. These include commands for creating accounts,
setting system parameters, granting account authorization, changing a schema, and
reorganizing the storage structures of a database.
The Database System Environment
There are 3 different modules
1. DBMS Component Modules
Dept of Computer Science
Page 18
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
Figure illustrates, in a simplified form, the typical DBMS components. The figure
is divided into two parts. The top part of the figure refers to the various users of the
database environment and their interfaces. The lower part shows the internals of the
DBMS responsible for storage of data and processing of transactions.
A higher-level stored data manager module of the DBMS controls access to
DBMS information that is stored on disk, whether it is part of the database or the
catalog.
In top part of the fig,
 The DDL compiler processes schema definitions, specified in the DDL, and stores
descriptions of the schemas (meta-data) in the DBMS catalog.
 Casual users and persons with occasional need for information from the database
interact using some form of interface, which we call the interactive query interface
in Figure.
Dept of Computer Science
Page 19
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
 Queries are parsed and validated for correctness of the query syntax, the names of
files and data elements, and by a query compiler that compiles them into an internal
form.
 The query optimizer is concerned with the rearrangement and possible reordering
of operations, elimination of redundancies, and use of correct algorithms and
indexes during execution.
 The precompiler extracts DML commands from an application program written in
a host programming language. These commands are sent to the DML compiler for
compilation into object code for database access.
In the lower part of Figure,
 The runtime database processor executes (1) the privileged commands, (2) the
executable query plans, and (3) the canned transactions with runtime parameters.
The runtime database processor handles other aspects of data transfer, such as
management of buffers in the main memory.
 It works with the system catalog and may update it with statistics.
 It also works with the stored data manager, which in turn uses basic operating
system services for carrying out low-level input/output (read/write) operations
between the disk and main memory.
 DBMSs have their own buffer management module while others depend on the OS
for buffer management. Concurrency control and backup and recovery systems
are integrated into the working of the runtime database processor for purposes of
transaction management.
2. Database System Utilities
In addition to possessing the software modules just described, most DBMSs have
database utilities that help the DBA manage the database system. Common utilities
have the following types of functions:
■ Loading- A loading utility is used to load existing data files—such as text files or
sequential files—into the database.
■ Backup- A backup utility creates a backup copy of the database, usually by dumping
the entire database onto tape or other mass storage medium. The backup copy can be
used to restore the database in case of catastrophic disk failure.
■ Database storage reorganization- This utility can be used to reorganize a set of
database files into different file organizations, and create new access paths to improve
performance.
■ Performance monitoring-Such a utility monitors database usage and provides
statistics to the DBA. The DBA uses the statistics in making decisions such as whether
or not to reorganize files or whether to add or drop indexes to improve performance.
Dept of Computer Science
Page 20
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
Other utilities may be available for sorting files, handling data compression, monitoring
access by users, interfacing with the network, and performing other functions.
3. Tools, Application Environments, and Communications Facilities
 CASE (Computer Aided Software Engineering) tools are used in the design phase of
database systems.

Data dictionary (or data repository) system.The data dictionary stores other
information, such as design decisions, usage standards, application program
descriptions, and user information. Such a system is also called an information
repository. This information can be accessed directly by users or the DBA when
needed. A data dictionary utility is similar to the DBMS catalog, but it includes a wider
variety of information and is accessed mainly by users rather than by the DBMS
software.

Application development environments, such as PowerBuilder (Sybase) or
JBuilder (Borland), have been quite popular. These systems provide an environment for
developing database applications and include facilities that help in many facets of
database systems, including database design, GUI development, querying and updating,
and application program development.

The DBMS also needs to interface with communications software, whose
function is to allow users at locations remote from the database system site to access
the database through computer terminals, workstations, or personal computers. These
are connected to the database site through data communications hardware such as
Internet routers, phone lines, long-haul networks, local networks, or satellite
communication devices.
Data Modeling Using the Entity-Relationship (ER) Model
Entity Types, Entity Sets, Attributes, and Keys
Dept of Computer Science
Page 21
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
The ER model describes data as entities, relationships, and attributes.
The basic object that the ER model represents is an entity,
which is a thing in the real world with an independent existence. An entity may be an
object with a physical existence (for example, a particular person, car, house, or
employee) or it may be an object with a conceptual existence (for instance, a company,
a job, or a university course).
Each entity has attributes—the particular properties that describe it. For
example, an EMPLOYEE entity may be described by the employee’s name, age,
address, salary, and job. A particular entity will have a value for each of its attributes.
The attribute values that describe each entity become a major part of the data stored in
the database.
Entities and Their Attributes:
Figure : Two entities, EMPLOYEE e1, and COMPANY c1, and their attributes.
Several types of attributes occur in the ER model: simple versus composite, single
valued versus multi valued, and stored versus derived.
1. Composite versus Simple (Atomic) Attributes
Composite attributes can be divided into smaller subparts, which represent
more basic attributes with independent meanings. For example, the Address attribute of
the EMPLOYEE entity can be subdivided into Street address, City, State, and Zip,3
with the values ‘2311 Kirby’, ‘Houston’, ‘Texas’, and ‘77001.
Composite attributes can form a hierarchy; for example, Street address can be further
subdivided into three simple component attributes: Number, Street, and Apartment
number.
Figure :A hierarchy of composite attributes
Dept of Computer Science
Page 22
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
Attributes that are not divisible are called simple or atomic attributes. The value
of a composite attribute is the concatenation of the values of its component simple
attributes. If the composite attribute is referenced only as a whole, there is no need to
subdivide it into component attributes. For example, if there is no need to refer to the
individual components of an address (Zip Code, street, and so on), then the whole
address can be designated as a simple attribute.
2. Single-Valued versus Multivalued Attributes
Most attributes have a single value for a particular entity; such attributes are
called single-valued. For example, Age is a single-valued attribute of a person.
Some cases an attribute can have a set of values for the same entity—for instance,
a Colors attribute for a car, or a College degrees attribute for a person. Cars with one
color have a single value, whereas two-tone cars have two color values. Similarly, one
person may not have a college degree, another person may have one, and a third person
may have two or more degrees; therefore, different people can have different numbers
of values for the College degrees attribute. Such attributes are called multivalued. A
multivalued attribute may have lower and upper bounds to constrain the number of
values allowed for each individual entity. For example, the Colors attribute of a car may
be restricted to have between one and three values, if we assume that a car can have
three colors at most.
3. Stored versus Derived Attributes
In some cases, two (or more) attribute values are related—for example, the Age
and Birth_date attributes of a person. For a particular person entity, the value of Age
can be determined from the current (today’s) date and the value of that person’s
Birth_date. The Age attribute is hence called a derived attribute and is said to be
derivable from the Birth_date attribute,which is called a stored attribute. Some
attribute values can be derived from related entities; for example, an attribute
Number_of_employees of a DEPARTMENT entity can be derived by counting the
number of employees related to (working for) that department.
4. NULL Value
Dept of Computer Science
Page 23
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
In some cases, a particular entity may not have an applicable value for an
attribute. For example, the Apartment_number attribute of an address applies only to
addresses that are in apartment buildings and not to other types of residences, such as
single-family homes. Similarly, a College_degrees attribute applies only to people with
college degrees. For such situations, a special value called NULL is created. An address
of a single-family home would have NULL for its Apartment_number attribute, and a
person with no college degree would have NULL for College_degrees. NULL can also
be used if we do not know the value of an attribute
for a particular entity.
5. Complex Attributes
Notice that, in general, composite and multivalued attributes can be nested
arbitrarily.We can represent arbitrary nesting by grouping components of a composite
attribute between parentheses () and separating the components
with commas, and by displaying multivalued attributes between braces { }.
Such attributes are called complex attributes. For example, if a person can have
more than one residence and each residence can have a single address and multiple
phones, an attribute Address_phone for a person can be specified as shown in Figure
Both Phone and Address are themselves composite attributes.
Figure A complex attribute:Address_phone.
{Address_phone( {Phone(Area_code,Phone_number)},Address(Street_address
(Number,Street,Apartment_number),City,State,Zip) )}
Entity Types, Entity Sets, Keys, and Value Sets
Entity Types and Entity Set:
A database usually contains groups of entities that are similar. For example, a
company employing hundreds of employees may want to store similar information
concerning each of the employees. These employee entities share the same attributes,
but each entity has its own value(s) for each attribute.
An entity type defines a collection (or set) of entities that have the same
attributes. Each entity type in the database is described by its name and attributes.
Figure shows two entity types: EMPLOYEE and COMPANY, and a list of some of the attributes for
each. A few individual entities of each type are also illustrated, along with the values of their attributes.
Dept of Computer Science
Page 24
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
The collection of all entities of a particular entity type in the database at any point
in time is called an entity set; the entity set is usually referred to using the same name
as the entity type. For example, EMPLOYEE refers to both a type of entity as well as
the current set of all employee entities in the database. An entity type is represented in
ER diagrams as a rectangular box enclosing the entity type name. Attribute names are
enclosed in ovals and are attached to their entity type by straight lines. Composite
attributes are attached to their component attributes by straight lines. Multi valued
attributes are displayed in double ovals. Figure shows a CAR entity type in this notation.
An entity type describes the schema or intension for a set of entities that share the same
structure. The collection of entities of a particular entity type is grouped into an entity
set, which is also called the extension of the entity type.
Key Attributes of an Entity Type
An important constraint on the entities of an entity type is the key or uniqueness
constraint on attributes. An entity type usually has one or more attributes whose values
Dept of Computer Science
Page 25
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
are distinct for each individual entity in the entity set. Such an attribute is called a key
attribute, and its values can be used to identify each entity uniquely. For example, the
Name attribute is a key of the COMPANY entity type because no two companies are
allowed to have the same name. For the PERSON entity type, a typical key attribute is
Ssn (Social Security number). Sometimes several attributes together form a key,
meaning that the combination of the attribute values must be distinct for each entity.
In ER diagrammatic notation, each key attribute has its name underlined inside
the oval .Specifying that an attribute is a key of an entity type means that the preceding
uniqueness property must hold for every entity set of the entity type. Hence, it is a
constraint that prohibits any two entities from having the same value for the key
attribute at the same time.
Some entity types have more than one key attribute. For example, each of the
Vehicle_id and Registration attributes of the entity type CAR is a key in its own right.
The Registration attribute is an example of a composite key formed from two simple
component attributes, State and Number, neither of which is a key on its own. An entity
type may also have no key, in which case it is called a weak entity type.
In our diagrammatic notation, if two attributes are underlined separately, then
each is a key on its own. Unlike the relational model, there is no concept of primary
key in the ER model that we present here; the primary key will be chosen during
mapping to a relational schema
Value Sets (Domains) of Attributes
Each simple attribute of an entity type is associated with a value set (or domain
of values), which specifies the set of values that may be assigned to that attribute for
each individual entity. If the range of ages allowed for employees is between 16 and 70,
we can specify the value set of the Age attribute of EMPLOYEE to be the set of integer
numbers between 16 and 70. Similarly, we can specify the value set for the Name
attribute to be the set of strings of alphabetic characters separated by blank characters,
and so on. Value sets are not displayed in ER diagrams, and are typically specified using
the basic data types available in most programming languages, such as integer, string,
Boolean, float, and enumerated type, subrange, and so on. Additional data types to
represent common database types such as date, time, and other concepts are also
employed.
Mathematically, an attribute A of entity set E whose value set is V can be defined
as a function from E to the power set6 P(V) of V:
A : E → P(V)
Dept of Computer Science
Page 26
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
We refer to the value of attribute A for entity e as A(e). The previous definition
covers both single-valued and multivalued attributes, as well as NULLs. A NULL value
is represented by the empty set.
Initial Conceptual Design of the COMPANY Database
We can now define the entity types for the COMPANY database, based on the
requirements First define several entity types and their attributes then we introduce the
concept of a relationship.
we can identify
four entity types—one corresponding to each of the four items in the specification
1. An entity type DEPARTMENT with attributes Name, Number, Locations, Manager,
and Manager_start_date.Locations is the only multivalued attribute.
We can specify that both Name and Number are (separate) key attributes because each
was specified to be unique.
2. An entity type PROJECT with attributes Name, Number, Location, and
Controlling_department. Both Name and Number are (separate) key attributes.
3. An entity type EMPLOYEE with attributes Name, Ssn, Sex, Address, Salary,
Birth_date, Department, and Supervisor. Both Name and Address may be composite
attributes; however, this was not specified in the requirements. We
must go back to the users to see if any of them will refer to the individual
components of Name—First_name, Middle_initial, Last_name—or of Address.
4. An entity type DEPENDENT with attributes Employee, Dependent_name, Sex,
Birth_date, and Relationship (to the employee).
Dept of Computer Science
Page 27
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
Relationship Types, Relationship Sets, Roles, and Structural Constraints
There are several implicit relationships among the various entity types.
In fact, whenever an attribute of one entity type refers to another entity type, some
relationship exists. For example, the attribute Manager of DEPARTMENT refers to an
employee who manages the department;In the ER model, these references should not
be represented as attributes but as relationships.
Relationship Types, Sets, and Instances
A relationship type R among n entity types E1, E2, ..., En defines a set of
associations—or a relationship set—among entities from these entity types. As for the
case of entity types and entity sets, a relationship type and its corresponding relationship
set are customarily referred to by the same name, R.
For example, consider a relationship type WORKS_FOR between the two entity types
EMPLOYEE and DEPARTMENT, which associates each employee with the
department for which the employee works in the corresponding entity set. Each
relationship instance in the relationship set WORKS_FOR associates one EMPLOYEE
entity and one DEPARTMENT entity. Figure illustrates this example, where each
relationship Relationship Types, Relationship Sets, Roles, and Structural Constraints
Dept of Computer Science
Page 28
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
In ER diagrams, relationship types are displayed as diamond-shaped boxes, which are
connected by straight lines to the rectangular boxes representing the participating entity
types. The relationship name is displayed in the diamond-shaped box.
Relationship Degree, Role Names, and Recursive Relationships
Degree of a Relationship Type:
The degree of a relationship type is the number of participating entity types.
Hence, the WORKS_FOR relationship is of degree two. A relationship type of degree
two is called binary, and one of degree three is called ternary.
Role Names and Recursive Relationships
Each entity type that participates in a relationship type plays a particular role in
the relationship. The role name signifies the role that a participating entity from the
entity type plays in each relationship instance, and helps to explain what the relationship
means. For example, in the WORKS_FOR relationship type, EMPLOYEE plays the
role of employee or worker and DEPARTMENT plays the role of department or
employer. Role names are not technically necessary in relationship types where all the
participating entity types are distinct, since each participating entity type name can be
used as the role name.
However, in some cases the same entity type participates more than once in a
relationship type in different roles. In such cases the role name becomes essential for
distinguishing the meaning of the role that each participating entity plays. Such
relationship types are called recursive relationships.
Figure shows an example. The SUPERVISION relationship type relates an employee
to a supervisor, where both employee and supervisor entities are members of the same
Dept of Computer Science
Page 29
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
EMPLOYEE entity set. Hence, the EMPLOYEE entity type participates twice in
SUPERVISION: once in the role of supervisor (or boss), and once in the role of
supervisee (or subordinate).
Constraints on Binary Relationship Types
Relationship types usually have certain constraints that limit the possible combinations
of entities that may participate in the corresponding relationship set. These constraints
are determined from the miniworld situation that the relationships represent.
We can distinguish two main types of binary relationship constraints: cardinality ratio
and participation.
1. Cardinality Ratios for Binary Relationships. The cardinality ratio for a binary
relationship specifies the maximum number of relationship instances that an entity can
participate in. For example, in the WORKS_FOR binary relationship type,
DEPARTMENT: EMPLOYEE is of cardinality ratio 1: N, meaning that each
department can be related to (that is, employs) any number of employees, but an
employee can be related to (work for) only one department. This means that for this
particular relationship WORKS_FOR, a particular department entity can be related to
any number of employees (N indicates there is no maximum number). On the other
hand, an employee can be related to a maximum of one department. The possible
cardinality ratios for binary relationship types are 1:1, 1: N, N: 1, and M: N.
1:1 binary relationship: An example of a 1:1 binary relationship is
MANAGES which relates a department entity to the employee who manages that
department. This represents the miniworld constraints that—at any point in time—an
employee can manage one department only and a department can have one manager
only.
M: N Cardinality Ratio: The relationship type WORKS_ON is of cardinality
ratio M:N, because the miniworld rule is that an employee can work on several projects
and a project can have several employees.
Dept of Computer Science
Page 30
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
Figure: An M:N relationship,WORKS_ON.
N: 1 Cardinality Ratio: Employees can works for only one Department but one
department can have many employees.
Employee
N
Wor
ks
1
Department
1: N Cardinality Ratio: One project managed by many Departments but one
department can have many Projects.
Cardinality ratios for binary relationships are represented on ER diagrams by displaying
1, M, and N on the diamonds as shown in Figure.
2. Participation Constraints and Existence Dependencies. The participation
constraint specifies whether the existence of an entity depends on its being related to
another entity via the relationship type. This constraint specifies the minimum number
of relationship instances that each entity can participate in, and is sometimes called the
minimum cardinality constraint. There are two types of participation constraints—
total and partial—that we illustrate by example. If a company policy states that every
employee must work for a department, then an employee entity can exist only if it
participates in at least one WORKS_FOR relationship instance (Figure below). Thus,
Dept of Computer Science
Page 31
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
the participation of EMPLOYEE in WORKS_FOR is called total participation,
meaning that every entity in the total set of employee entities must be related to a
department entity via WORKS_FOR. Total participation is also called existence
dependency.
Fig: Total Participation
And in the Figure below, we do not expect every employee to manage a department, so
the participation of EMPLOYEE in the MANAGES relationship type is partial,
meaning that some or part of the set of employee entities are related to some department
entity via MANAGES, but not necessarily all.
Fig: Partial Participation
We will refer to the cardinality ratio and participation constraints, taken together, as
the structural constraints of a relationship type.
Dept of Computer Science
Page 32
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
In ER diagrams, total participation (or existence dependency) is displayed as a
double line connecting the participating entity type to the relationship, whereas partial
participation is represented by a single line (see Figure). Notice that in this notation,
we can either specify no minimum (partial participation) or a minimum of one (total
participation).
Attributes of Relationship Types
Relationship types can also have attributes, similar to those of entity types. For
example, to record the number of hours per week that an employee works on a particular
project, we can include an attribute Hours for the WORKS_ON relationship type.
Another example is to include the date on which a manager started managing a
department via an attribute Start_date for the MANAGES relationship type.
Weak Entity Types
Entity types that do not have key attributes of their own are called weak entity types.
In contrast, regular entity types that do have a key attribute—which include all the
examples discussed so far—are called strong entity types.
Dept of Computer Science
Page 33
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
Common Notations Used in ER Diagram
Dept of Computer Science
Page 34
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
Refining the ER Design for the COMPANY Database
In our example, we specify the following relationship types:
■ MANAGES a 1:1 relationship type between EMPLOYEE and DEPARTMENT.
EMPLOYEE participation is partial. DEPARTMENT participation is not clear from the
requirements. We question the users, who say that a department must have a manager
at all times, which implies total participation.13 The attribute Start_date is assigned to
this relationship type.
■ WORKS_FOR, a 1: N relationship type between DEPARTMENT and EMPLOYEE.
Both participations are total.
■ CONTROLS, a 1: N relationship type between DEPARTMENT and PROJECT.
Dept of Computer Science
Page 35
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
The participation of PROJECT is total, whereas that of DEPARTMENT is determined
to be partial, after consultation with the users indicates that some departments may
control no projects.
■ SUPERVISION, a 1: N relationship type between EMPLOYEE (in the supervisor
role) and EMPLOYEE (in the supervisee role). Both participations are determined to
be partial, after the users indicate that not every employee is a supervisor and not every
employee has a supervisor.
■ WORKS_ON, determined to be an M: N relationship type with attribute Hours, after
the users indicate that a project can have several employees working on it. Both
participations are determined to be total.
■ DEPENDENTS_OF, a 1: N relationship type between EMPLOYEE and
DEPENDENT, which is also the identifying relationship for the weak entity type
DEPENDENT. The participation of EMPLOYEE is partial, whereas that of
DEPENDENT is total.
ER Diagrams- Naming Conventions
When designing a database schema, the choice of names for entity types,
attributes, relationship types, and (particularly) roles is not always straightforward. One
should choose names that convey, as much as possible, the meanings attached to the
different constructs in the schema. We choose to use singular names for entity types,
rather than plural ones, because the entity type name applies to each individual entity
belonging to that entity type. In our ER diagrams, we will use the convention that entity
type and relationship type names are uppercase letters, attribute names have their initial
letter capitalized, and role names are lowercase letters As a general practice, given a
narrative description of the database requirements, the nouns appearing in the narrative
tend to give rise to entity type names, and the verbs tend to indicate names of
relationship types. Attribute names generally arise from additional nouns that describe
the nouns corresponding to entity types. Another naming consideration involves
choosing binary relationship names to make the ER diagram of the schema readable
from left to right and from top to bottom.
Dept of Computer Science
Page 36
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
Examples –ER DIAGRAMS
ER DIAGRAM FOR BANK DATABASE
Dept of Computer Science
Page 37
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
UNIT 2
Transaction
A transaction is an event which occurs on the database. Generally a transaction
reads a value from the database or writes a value to the database. If you have any concept
of Operating Systems, then we can say that a transaction is analogous to processes.
Although a transaction can both read and write on the database, there are some
fundamental differences between these two classes of operations. A read operation does
not change the image of the database in any way. But a write operation, whether
performed with the intention of inserting, updating or deleting data from the database,
changes the image of the database. That is, we may say that these transactions bring the
database from an image which existed before the transaction occurred (called
Dept of Computer Science
Page 38
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
the Before Image or BFIM) to an image which exists after the transaction occurred
(called the After Image or AFIM).
The Four Properties of Transactions
Every transaction, for whatever purpose it is being used, has the following four
properties. Taking the initial letters of these four properties we collectively call them
the ACID Properties.
1. Atomicity: This means that either all of the instructions within the transaction will
be reflected in the database, or none of them will be reflected.
Say for example, we have two accounts A and B, each containing Rs 1000/-. We now
start a transaction to deposit Rs 100/- from account A to Account B.
Read A;
A = A – 100;
Write A;
Read B;
B = B + 100;
Write B;
The transaction has 6 instructions to extract the amount from A and submit it to
B. The AFIM will show Rs 900/- in A and Rs 1100/- in B.
2. Consistency: If we execute a particular transaction in isolation or together with other
transaction, (i.e. presumably in a multi-programming environment), the transaction will
yield the same expected result.
To give better performance, every database management system supports the execution
of multiple transactions at the same time, using CPU Time Sharing. Concurrently
executing transactions may have to deal with the problem of sharable resources, i.e.
resources that multiple transactions are trying to read/write at the same time. For
example, we may have a table or a record on which two transaction are trying to read
or write at the same time. Careful mechanisms are created in order to prevent
mismanagement of these sharable resources, so that there should not be any change in
the way a transaction performs. A transaction which deposits Rs 100/- to account A
must deposit the same amount whether it is acting alone or in conjunction with another
transaction that may be trying to deposit or withdraw some amount at the same time.
3. Isolation: In case multiple transactions are executing concurrently and trying to
access a sharable resource at the same time, the system should create an ordering in
their execution so that they should not create any anomaly in the value stored at the
sharable
resource.
There are several ways to achieve this and the most popular one is using some kind of
locking mechanism. Again, if you have the concept of Operating Systems, then you
should remember the semaphores, how it is used by a process to make a resource busy
Dept of Computer Science
Page 39
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
before starting to use it, and how it is used to release the resource after the usage is over.
Other processes intending to access that same resource must wait during this time.
Locking is almost similar. It states that a transaction must first lock the data item that it
wishes to access, and release the lock when the accessing is no longer required. Once a
transaction locks the data item, other transactions wishing to access the same data item
must wait until the lock is released.
4. Durability: It states that once a transaction has been complete the changes it has
made should be permanent.
As we have seen in the explanation of the Atomicity property, the transaction, if
completes successfully, is committed. Once the COMMIT is done, the changes which
the transaction has made to the database are immediately written into permanent
storage. So, after the transaction has been committed successfully, there is no question
of any loss of information even if the power fails. Committing a transaction guarantees
that the AFIM has been reached.
There are several ways Atomicity and Durability can be implemented. One of
them is called Shadow Copy. In this scheme a database pointer is used to point to the
BFIM of the database. During the transaction, all the temporary changes are recorded
into a Shadow Copy, which is an exact copy of the original database plus the changes
made by the transaction, which is the AFIM. Now, if the transaction is required to
COMMIT, then the database pointer is updated to point to the AFIM copy, and the
BFIM copy is discarded. On the other hand, if the transaction is not committed, then the
database pointer is not updated. It keeps pointing to the BFIM, and the AFIM is
discarded. This is a simple scheme, but takes a lot of memory space and time to
implement.
Atomicity and Durability is essentially the same thing, just as Consistency and Isolation
is essentially the same thing.
Transaction States
Dept of Computer Science
Page 40
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
There are the following six states in which a transaction may exist
1.Active: The initial state when the transaction has just started execution
.
2. Partially Committed: At any given point of time if the transaction is executing
properly, then it is going towards it COMMIT POINT. The values generated during the
execution are all stored in volatile storage. At this point, some recovery protocols need
to ensure that a system failure will not result in an inability to record the changes of the
transaction permanently (usually by recording changes in the system log, discussed in
the next section). Once this check is successful, the transaction is said to have reached
its commit point and enters the committed state.
3. Failed: If the transaction fails for some reason. The temporary values are no longer
required, and the transaction is set to ROLLBACK. It means that any change made to
the database by this transaction up to the point of the failure must be undone. If the
failed transaction has withdrawn Rs. 100/- from account A, then the ROLLBACK
operation should add Rs 100/- to account A.
4. Aborted: When the ROLLBACK operation is over, the database reaches the BFIM.
The transaction is now said to have been aborted.
5. Committed: If no failure occurs then the transaction reaches the COMMIT POINT.
All the temporary values are written to the stable storage and the transaction is said to
have been committed.
6. Terminated: Either committed or aborted, the transaction finally reaches this state.
The whole process can be described using the following diagram:


Concurrent Execution
A schedule is a collection of many transactions which is implemented as a unit.
Depending upon how these transactions are arranged in within a schedule, a schedule
can be of two types:
Serial: The transactions are executed one after another, in a non-preemptive manner.
Concurrent: The transactions are executed in a preemptive, time shared method.
In Serial schedule, there is no question of sharing a single data item among many
transactions, because not more than a single transaction is executing at any point of
time. However, a serial schedule is inefficient in the sense that the transactions suffer
for having a longer waiting time and response time, as well as low amount of resource
utilization.
In concurrent schedule, CPU time is shared among two or more transactions in
order to run them concurrently. However, this creates the possibility that more than one
transaction may need to access a single data item for read/write purpose and the
database could contain inconsistent value if such accesses are not handled properly. Let
us explain with the help of an example.
Dept of Computer Science
Page 41
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
Let us consider there are two transactions T1 and T2, whose instruction sets are given
as following. T1 is the same as we have seen earlier, while T2 is a new transaction.
T1
Read A;
A = A – 100;
Write A;
Read B;
B = B + 100;
Write B;
T2
Read A;
Temp = A * 0.1;
Read C;
C = C + Temp;
Write C;
T2 is a new transaction which deposits to account C 10% of the amount in account A.
If we prepare a serial schedule, then either T1 will completely finish before T2 can
begin, or T2 will completely finish before T1 can begin. However, if we want to create
a concurrent schedule, then some Context Switching need to be made, so that some
portion of T1 will be executed, then some portion of T2 will be executed and so on. For
example say we have prepared the following concurrent schedule.
T1
T2
Read A;
A = A - 100;
Write A;
Read A;
Temp = A * 0.1;
Read C;
C = C + Temp;
Write C;
Read B;
B = B + 100;
Write B;
No problem here. We have made some Context Switching in this Schedule, the first one
after executing the third instruction of T1, and after executing the last statement of T2.
T1 first deducts Rs 100/- from A and writes the new value of Rs 900/- into A. T2 reads
Dept of Computer Science
Page 42
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
the value of A, calculates the value of Temp to be Rs 90/- and adds the value to C. The
remaining part of T1 is executed and Rs 100/- is added to B.
It is clear that a proper Context Switching is very important in order to maintain the
Consistency and Isolation properties of the transactions. But let us take another example
where a wrong Context Switching can bring about disaster. Consider the following
example involving the same T1 and T2
T1
T2
Read A;
A = A - 100;
Read A;
Temp = A * 0.1;
Read C;
C = C + Temp;
Write C;
Write A;
Read B;
B = B + 100;
Write B;
This schedule is wrong, because we have made the switching at the second instruction
of T1. The result is very confusing. If we consider accounts A and B both containing
Rs 1000/- each, then the result of this schedule should have left Rs 900/- in A, Rs 1100/in B and add Rs 90 in C (as C should be increased by 10% of the amount in A). But in
this wrong schedule, the Context Switching is being performed before the new value of
Rs 900/- has been updated in A. T2 reads the old value of A, which is still Rs 1000/-,
and deposits Rs 100/- in C. C makes an unjust gain of Rs 10/- out of nowhere.
In the above example, we detected the error simple by examining the schedule and
applying common sense. But there must be some well formed rules regarding how to
arrange instructions of the transactions to create error free concurrent schedules. This
brings us to our next topic, the concept of Serializability.
Serializability
When several concurrent transactions are trying to access the same data item, the
instructions within these concurrent transactions must be ordered in some way so as
there are no problem in accessing and releasing the shared data item. There are two
aspects of serializability which are described here:
Dept of Computer Science
Page 43
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
1. Conflict Serializability
Two instructions of two different transactions may want to access the same data item in
order to perform a read/write operation. Conflict Serializability deals with detecting
whether the instructions are conflicting in any way, and specifying the order in which
these two instructions will be executed in case there is any conflict. A conflict arises if
at least one (or both) of the instructions is a write operation. The following rules are
important in Conflict Serializability:
1. If two instructions of the two concurrent transactions are both for read operation, then
they are not in conflict, and can be allowed to take place in any order.
2. If one of the instructions wants to perform a read operation and the other instruction
wants to perform a write operation, then they are in conflict, hence their ordering is
important. If the read instruction is performed first, then it reads the old value of the
data item and after the reading is over, the new value of the data item is written. It the
write instruction is performed first, then updates the data item with the new value and
the read instruction reads the newly updated value.
3. If both the transactions are for write operation, then they are in conflict but can be
allowed to take place in any order, because the transaction do not read the value
updated by each other. However, the value that persists in the data item after the
schedule is over is the one written by the instruction that performed the last write.
It may happen that we may want to execute the same set of transaction in a different
schedule on another day. Keeping in mind these rules, we may sometimes alter parts of
one schedule (S1) to create another schedule (S2) by swapping only the non-conflicting
parts of the first schedule. The conflicting parts cannot be swapped in this way because
the ordering of the conflicting instructions is important and cannot be changed in any
other schedule that is derived from the first. If these two schedules are made of the same
set of transactions, then both S1 and S2 would yield the same result if the conflict
resolution rules are maintained while creating the new schedule. In that case the
schedule S1 and S2 would be called Conflict Equivalent.
2. View Serializability
This is another type of serializability that can be derived by creating another schedule
out of an existing schedule, involving the same set of transactions. These two schedules
would be called View Serializable if the following rules are followed while creating the
second schedule out of the first. Let us consider that the transactions T1 and T2 are
being serialized to create two different schedules S1 and S2 which we want to be View
Equivalent and both T1 and T2 wants to access the same data item.
1. If in S1, T1 reads the initial value of the data item, then in S2 also, T1 should read the
initial value of that same data item.
Dept of Computer Science
Page 44
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
2. If in S1, T1 writes a value in the data item which is read by T2, then in S2 also, T1
should write the value in the data item before T2 reads it.
3. If in S1, T1 performs the final write operation on that data item, then in S2 also, T1
should perform the final write operation on that data item.
Except in these three cases, any alteration can be possible while creating S2 by
modifying S1.
Relational data model
Relational data model is the primary data model, which is used widely around the
world for data storage and processing. This model is simple and have all the properties
and capabilities required to process data with storage efficiency.
Relational Model Concepts
Tables: In relation data model, relations are saved in the format of Tables. This format
stores the relation among entities. A table has rows and columns, where rows represent
records and columns represents the attributes.
Tuple: A single row of a table, which contains a single record for that relation is called
a tuple.
Relation instance: A finite set of tuples in the relational database system represents
relation instance. Relation instances do not have duplicate tuples.
Relation schema: This describes the relation name (table name), attributes and their
names.
Relation key: Each row has one or more attributes which can identify the row in the
relation (table) uniquely, is called the relation key.
Attribute domain: Every attribute has some pre-defined value scope, known as
attribute domain.
Relational Model Constraints
Every relation has some conditions that must hold for it to be a valid relation. These
conditions are called Relational Integrity Constraints. There are three main integrity
constraints.

Key Constraints

Domain constraints

Referential integrity constraints
Dept of Computer Science
Page 45
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
KEY CONSTRAINTS:
There must be at least one minimal subset of attributes in the relation, which can identify
a tuple uniquely. This minimal subset of attributes is called key for that relation. If there
is more than one such minimal subset, these are called candidate keys


Key constraints forces that:
in a relation with a key attribute, no two tuples can have identical value for key
attributes.
key attribute cannot have NULL values.
Key constrains are also referred to as Entity Constraints.
Domain Constraints
Attributes have specific values in real-world scenario. For example, age can only be
positive integer. The same constraints has been tried to employ on the attributes of a
relation. Every attribute is bound to have a specific range of values. For example, age
cannot be less than zero and telephone number cannot be a outside 0-9.
Referential Integrity Constraints
This integrity constraints works on the concept of Foreign Key. A key attribute of a
relation can be referred in other relation, where it is called foreign key.
Referential integrity constraint states that if a relation refers to a key attribute of a
different or same relation, that key element must exist.
Relational database schema
o
A relational database schema helps you to organize and understand the structure of a
database. This is particularly useful when designing a new database, modifying an
existing database to support more functionality, or building integration between
databases.
Creating the Schema
o
There are two steps to creating a relational database schema: creating the logical schema
and creating the physical schema. The logical schema depicts the structure of the
database, showing the tables, columns and relationships with other tables in the database
and can be created with modeling tools or spreadsheet and drawing software. The
physical schema is created by actually generating the tables, columns and relationships
in the relational database management software (RDBMS). Most modeling tools can
automate the creation of the physical schema from the logical schema, but it can also
be done by manually.
Dept of Computer Science
Page 46
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
Modifying the Schema
o
The structure of a relational database will inevitably change over time as data needs
change. It's just as important to document changes to your relational database schema
as it was to document the original design, otherwise it becomes increasingly difficult to
update or integrate the database. Be sure to save copies of previous configurations, so
that changes can be removed if problems occur.
The Relational Algebra and Relational Calculus
In this chapter we discuss the two formal languages for the relational model: the
relational algebra and the relational calculus. Practical language for the relational
model is the SQL standard.
The basic set of operations for the relational model is the relational algebra.
These operations enable a user to specify basic retrieval requests as relational algebra
expressions. The result of retrieval is a new relation, which may have been formed from
one or more relations. The algebra operations thus produce new relations, which can be
further manipulated using operations of the same algebra. A sequence of relational
algebra operations forms a relational algebra expression, whose result will also be a
relation that represents the result of a database query (or retrieval request).
Whereas the algebra defines a set of operations for the relational model, the
relational calculus provides a higher-level declarative language for specifying
relational queries. A relational calculus expression creates a new relation. In a relational
calculus expression, there is no order of operations to specify how to retrieve the query
result—only what information the result should contain. This is the main distinguishing
feature between relational algebra and relational calculus. The relational calculus is
important because it has a firm basis in mathematical logic and because the standard
query language (SQL) for RDBMSs has some of its foundations in a variation of
relational calculus known as the tuple relational calculus.
Relational algebra operations
The relational algebra is often considered to be an integral part of the relational
data model. Its operations can be divided into two groups. One group includes set
operations from mathematical set theory; these are applicable because each relation is
defined to be a set of tuples in the formal relational model. Set operations include
UNION, INTERSECTION, SET DIFFERENCE, and CARTESIAN PRODUCT (also
known as CROSS PRODUCT). The other group consists of operations developed
specifically for relational databases—these include SELECT, PROJECT, and JOIN,
among others. First, we describe the SELECT and PROJECT operations because they
are unary operations that operate on single relations. Then we discuss set operations
Dept of Computer Science
Page 47
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
we discuss JOIN and other complex binary operations, which operate on two tables
by combining related tuples (records) based on join conditions.
Unary Relational Operations: SELECT ,PROJECT and RENAME
1. SELECT
The SELECT operation (denoted by σ (sigma)) is used to select a subset of the tuples
from a relation based on a selection condition. The selection condition acts as a filter
- Keeps only those tuples that satisfy the qualifying condition
- Tuples satisfying the condition are selected whereas the other tuples are discarded
(filtered out)
In general, the SELECT operation is denoted by
σ <selection condition>(R)
where the symbol σ (sigma) is used to denote the SELECT operator and the selection
condition is a Boolean expression (condition) specified on the attributes of relation R.
The relation resulting from the SELECT operation has the same attributes as R.
The Boolean expression specified in <selection condition> is made up of a number of
clauses of the form <attribute name> <comparison op> <constant value>
Or <attribute name> <comparison op> <attribute name>
where <attribute name> is the name of an attribute of R,
<comparison op> is normally one of the operators {=, <, ≤, >, ≥, ≠}, and <constant
value> is a constant value from the attribute domain. Clauses can be connected by the
standard Boolean operators and, or, and not to form a general selection condition.
Properties of SELECT Operation
 SELECT s is commutative:
s <condition1>(s < condition2> (R)) = s <condition2> (s < condition1> (R))
 A cascade of SELECT operations may be replaced by a single selection with a
conjunction of all the conditions:
s<cond1>(s< cond2> (s<cond3>(R)) = s <cond1> AND < cond2> AND < cond3>(R)
Select-Example:
Name
Age
Weight
Name
Age
Weight
Dept of Computer Science
Name
Age
Weight
Page 48
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
Harry
34
80
Harry
34
80
Sally
28
64
Helena
54
54
George
29
70
Peter
34
80
Helena
54
54
Peter
34
80
Helena
54
54
2. The PROJECT Operation
 PROJECT Operation is denoted by ∏ (pi)
 This operation keeps certain columns (attributes) from a relation and discards the
other columns.
 PROJECT creates a vertical partitioning
 The list of specified columns (attributes) is kept in each tuple
 The other attributes in each tuple are discarded
For ex: ∏subject, author (Books) -Selects and projects columns named as subject and
author from relation Books.
-We may want to apply several relational algebra operations one after the other
-Either we can write the operations as a single relational algebra expression by nesting
the operations, or
-We can apply one operation at a time and create intermediate result relations.
Ex: To retrieve the first name, last name, and salary of all employees who work in
department number 5, we must apply a select and a project operation
We can write a single relational algebra expression as follows:
 pFNAME, LNAME, SALARY(s DNO=5(EMPLOYEE))
3. RENAME Operation
 The RENAME operator is denoted by  (rho)
 In some cases, we may want to rename the attributes of a relation or the relation
name or both
 Useful when a query requires multiple operations
 Necessary in some cases (see JOIN operation later)
 RENAME operation – which can rename either the relation name or the attribute
names, or both
Dept of Computer Science
Page 49
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
 The general RENAME operation  can be expressed by any of the following
forms:
 S(R) changes:
the relation name only to S
 (B1, B2, …, Bn )(R) changes:
the column (attribute) names only to B1, B1, …..Bn
 S (B1, B2, …, Bn )(R) changes both:
the relation name to S, and the column (attribute) names to B1, B1, …..Bn
A rename is a unary operation written as
where the result is identical
to R except that the b attribute in all tuples is renamed to an a attribute. This is simply
used to rename the attribute of a relation or the relation itself.
To rename the 'isFriend' attribute to 'isBusinessContact' in a
relation,
might be used.
Relational Algebra Operations from Set Theory
The UNION, INTERSECTION, MINUS (Set Difference) and the CARTESIAN PRODUCT
(CROSS PRODUCT) Operations
The next group of relational algebra operations is the standard mathematical
operations on sets.
Several set theoretic operations are used to merge the elements of two sets in
various ways, including UNION, INTERSECTION, and SET DIFFERENCE (also
called MINUS or EXCEPT). These are binary operations; that is, each is applied to
two sets (of tuples).
When these operations are adapted to relational databases, the two relations on
which any of these three operations are applied must have the same type of tuples;
this condition has been called union compatibility or type compatibility. Two
relations R(A1, A2, ..., An) and S(B1, B2, ..., Bn) are said to be union compatible (or
type compatible) if they have the same degree n and if dom(Ai) = dom(Bi) for
1<=i<=n. This means that the two relations have the same number of attributes and
each corresponding pair of attributes has the same domain.
We can define the three operations UNION, INTERSECTION, and SET
DIFFERENCE on two union-compatible relations R and S as follows:
■ UNION: The result of this operation, denoted by R US, is a relation that includes all
tuples that are either in R or in S or in both R and S. Duplicate tuples are eliminated.
■ INTERSECTION: The result of this operation, denoted by R ∩S, is a relation that
includes all tuples that are in both R and S.
Dept of Computer Science
Page 50
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
■ SET DIFFERENCE (or MINUS): The result of this operation, denoted by
R – S is a relation that includes all tuples that are in R but not in S.
We will adopt the convention that the resulting relation has the same attribute
names as the first relation R. It is always possible to rename the attributes in the result
using the rename operator.
Figure illustrates the three operations.
The relations STUDENT and INSTRUCTOR in Figure (a) are union compatible and
their tuples represent the names of students and the names of instructors, respectively.
The result of the UNION operation in Figure (b) shows the names of all students and
instructors. Note that duplicate tuples appear only once in the result. The result of the
INTERSECTION operation (Figure (c)) includes only those who are both students and
instructors.
Both UNION and INTERSECTION are commutative operations; that is,
R US = SUR and R∩S = S ∩R
Dept of Computer Science
Page 51
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
Both UNION and INTERSECTION can be treated as n-ary operations applicable to
any number of relations because both are also associative operations; that is,
R U(S U T) = (R US) UT and (R ∩S ) ∩T = R ∩(S ∩T )
The MINUS operation is not commutative; that is, in general,
Note that INTERSECTION can be expressed in terms of union and set difference as
follows:
The CARTESIAN PRODUCT (CROSS PRODUCT) Operation
It also known as CROSS PRODUCT or CROSS JOIN—which is denoted by X(Cross
symbol) . This is also a binary set operation, but the relations on which it is applied do
not have to be union compatible. In its binary form, this set operation produces a new
element by combining every member (tuple) from one relation (set) with every member
(tuple) from the other relation (set). In general, the result of R(A1, A2, ..., An) X S(B1,
B2, ..., Bm) is a relation Q with degree n + m attributes Q(A1, A2, ..., An, B1, B2, ...,
Bm), in that order.
The resulting relation Q has one tuple for each combination of tuples—one from R and
one from S. Hence, if R has nR tuples (denoted as |R| = nR), and S has nS tuples, then R
X S will have nR * nS tuples.
The n-ary CARTESIAN PRODUCT operation is an extension of the above concept,
which produces new tuples by concatenating all possible combinations of tuples from
n underlying relations.
In general, the CARTESIAN PRODUCT operation applied by itself is generally
meaningless.
CARTESIAN PRODUCT example
Binary Relational Operations: JOIN and DIVISION
Dept of Computer Science
Page 52
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
JOIN Operator
JOIN is used to combine related tuples from two relations:



In its simplest form the JOIN operator is just the cross product of the two
relations.
As the join becomes more complex, tuples are removed within the cross product
to make the result of the join more meaningful.
JOIN allows you to evaluate a join condition between the attributes of the
relations on which the join is undertaken.
The notation used is ⋈
R ⋈join condition S
JOIN Example
Equijoin : The most common use of JOIN involves join conditions with equality
comparisons only. Such a JOIN, where the only comparison operator used is =, is
called an EQUIJOIN. Both previous examples were EQUIJOINs. Notice that in the
result of an EQUIJOIN we always have one or more pairs of attributes that have
identical values in every tuple.
Dept of Computer Science
Page 53
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
Natural Join:It is denoted by *.Invariably the JOIN involves an equality test, and
thus is often described as an equi-join. Such joins result in two attributes in the
resulting relation having exactly the same value. A `natural join' will remove the
duplicate attribute(s).


In most systems a natural join will require that the attributes have the same name
to identify the attribute(s) to be used in the join. This may require a renaming
mechanism.
If you do use natural joins make sure that the relations do not have two
attributes with the same name by accident.
Theta Join: A join operator with general join condition (ex:condition with the
operators,=,<=,>,>= ,# )is called a THETA join.
OUTER JOINs
Notice that much of the data is lost when applying a join to two relations. In some cases
this lost data might hold useful information. An outer join retains the information that
would have been lost from the tables, replacing missing data with nulls.
Outer union can be used to calculate the union of two relations that are partially union
compatible.
Example: The table R
AB
1 2
3 4
Example: The table S
BC
4 5
6 7
The result of an outer union between R and S:
A B C
Dept of Computer Science
Page 54
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
1
3
2 null
4 5
null 6 7
There are three forms of the outer join, depending on which data is to be kept.



LEFT OUTER JOIN( ) - keep data from the left-hand table
RIGHT OUTER JOIN( ) - keep data from the right-hand table
FULL OUTER JOIN( ) - keep data from both tables
OUTER JOIN example 1
Figure : OUTER JOIN (left/right)
OUTER JOIN example 2
Figure : OUTER JOIN (full)
The DIVISION Operation
Dept of Computer Science
Page 55
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
The DIVISION operation, denoted by ÷, is useful for a special kind of query that
sometimes occurs in database applications.
Letr(R) and s(S) be relations
r ÷ s: - the result consists of the restrictions of tuples in r to the attribute names unique
to R, i.e. in the Header of r but not in the Header of s, for which it holds that all their
combinations with tuples in s are present in r.
Fig:
Dept of Computer Science
Page 56
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
Dept of Computer Science
Page 57
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
Functional Dependencies
FD's are constraints on well-formed relations and represent a formalism on the
infrastructure of relation.
Definition: A functional dependency (FD) on a relation schema R is a constraint X →
Y, where X and Y are subsets of attributes of R.
Definition: an FD is a relationship between an attribute "Y" and a determinant (1 or
more other attributes) "X" such that for a given value of a determinant the value of the
attribute is uniquely defined.





X is a determinant
X determines Y
Y is functionally dependent on X
X→Y
X →Y is trivial if Y Í X
Definition: An FD X → Y is satisfied in an instance r of R if for every pair of
tuples, t and s: if t and s agree on all attributes in X then they must agree on all
attributes in Y
A key constraint is a special kind of functional dependency: all attributes of relation
occur on the right-hand side of the FD:

SSN → SSN, Name, Address
Example Functional Dependencies
Let R be NewStudent(stuId, lastName, major, credits, status, socSecNo)
FDs in R include




{stuId}→{lastName}, but not the reverse
{stuId} →{lastName, major, credits, status, socSecNo, stuId}
{socSecNo} →{stuId, lastName, major, credits, status, socSecNo}
{credits}→{status}, but not {status}→{credits}
ZipCode→AddressCity
:16652 is Huntingdon’s ZIP
Dept of Computer Science
Page 58
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
ArtistName→BirthYear :Picasso was born in 1881
Autobrand→Manufacturer, Engine type

Pontiac is built by General Motors with gasoline engine
Author, Title→PublDate

Shakespeare’s Hamlet was published in 1600
Trivial Functional Dependency
The FD X→Y is trivial if set {Y} is a subset of set {X}
Examples: If A and B are attributes of R,
{A}→{A}
 {A,B} →{A}
 {A,B} →{B}
 {A,B} →{A,B}
are all trivial FDs and will not contribute to the evaluation
of normalization.
Composite Key: -Composite key is a primary key composed of multiple columns.
Functional Dependency – When value of one column is dependent on another column.

So that if value of one column changes the value of other column changes as well.
e.g. Supplier Address is functionally dependent on supplier name. If supplier’s name is
changed in a record we need to change the supplier address as well.
S.Supplier–àS.SupplierAddress
“In our s table supplier address column is functionally dependent on the supplier
column”
Partial Functional Dependency – A non-key column is dependent on some, but not
all the columns in a composite primary key.
In our above example Supplier Address was partially dependent on our composite key
columns (Gadgets+Supplier).
Transitive Dependency- A transitive dependency is a type of functional dependency
in which the value in a non-key column is determined by the value in another non-key
column.
Dept of Computer Science
Page 59
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
Term
Definition
Determinant
For any relation R, the set of attributes X in the functional
dependency X → Y is the determinant group for the dependency.
Full functional
dependence
An attribute set X is fully functionally dependent on another
attribute set Y if X is functionally dependent on the whole set of
Y, but not on any subset of Y.
Functional
dependence
For any relation R, attribute X is functionally dependent on
attribute Y if for every valid instance, the value of Y determines
the value of X.
The symbolic notation for this is Y → X.
Multivalued
dependence
Transitive
dependence
For any relation R with attribute set (X,Y,Z), the set X
multivalue determines the set Y if, for every pair of tuples
containing duplicates in X, the instance also contains the pair of
tuples obtained by interchanging the Y tuples in the original pair.
The symbolic notation for this is X ⇒ Y.
Transitive dependencies are dependencies that exist among three
or more attributes in a relation in such a way that one attribute
determines a third by way of an intermediate attribute. For
example, consider the relation R(X,Y,Z), where X is the primary
key.
Attribute Z is transitively dependent on attribute X if attribute Y
satisfies the following dependencies, where the
symbol → indicates does not determine:
•X → Y
•Y → Z
•Y → X
Dept of Computer Science
Page 60
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
Normalisation
1.
2.
3.
4.
5.
While designing a database out of an entity–relationship model, the main problem
existing in that “raw” database is redundancy. Redundancy is storing the same data item
in more one place. A redundancy creates several problems like the following:
Extra storage space: storing the same data in many places takes large amount of disk
space.
Entering same data more than once during data insertion.
Deleting data from more than one place during deletion.
Modifying data in more than one place.
Anomalies may occur in the database if insertion, deletion, modification etc are no
done properly. It creates inconsistency and unreliability in the database.
To solve this problem, the “raw” database needs to be normalized. This is a step by step
process of removing different kinds of redundancy and anomaly at each step. At each
step a specific rule is followed to remove specific kind of impurity in order to give the
database a slim and clean look.
Un-Normalized Form (UNF)
If a table contains non-atomic values at each row, it is said to be in UNF. An atomic
value is something that cannot be further decomposed. A non-atomic value, as the
name suggests, can be further decomposed and simplified. Consider the following table:
Emp-Id Emp-Name Month Sales Bank-Id Bank-Name
E01
AA
Jan
1000 B01
SBI
Feb
1200
Mar
850
E02
BB
Jan
2200 B02
UTI
Feb
2500
E03
CC
Jan
1700 B01
SBI
Feb
1800
Mar
1850
Apr
1725
In the sample table above, there are multiple occurrences of rows under each key EmpId. Although considered to be the primary key, Emp-Id cannot give us the unique
identification facility for any single row. Further, each primary key points to a variable
length record (3 for E01, 2 for E02 and 4 for E03).
Dept of Computer Science
Page 61
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
WHY WE NEED NORMALIZATION?
Normalization is the aim of well design Relational Database Management System
(RDBMS). It is step by step set of rules by which data is put in its simplest forms. We
normalize the relational database management system because of the following reasons:
Minimize data redundancy i.e. no unnecessarily duplication of data.








To make database structure flexible i.e. it should be possible to add new data values
and rows without reorganizing the database structure.
Data should be consistent throughout the database i.e. it should not suffer from
following anomalies.
Insert Anomaly - Due to lack of data i.e., all the data available for insertion such that
null values in keys should be avoided. This kind of anomaly can seriously damage a
database
Update Anomaly - It is due to data redundancy i.e. multiple occurrences of same
values in a column. This can lead to inefficiency.
Deletion Anomaly - It leads to loss of data for rows that are not stored elsewhere. It
could result in loss of vital data.
Complex queries required by the user should be easy to handle.
On decomposition of a relation into smaller relations with fewer attributes on
normalization the resulting relations whenever joined must result in the same relation
without any extra rows. The join operations can be performed in any order. This is
known as Lossless Join decomposition.
The resulting relations (tables) obtained on normalization should possess the
properties such as each row must be identified by a unique key, no repeating groups,
homogenous columns, each column is assigned a unique name etc.
ADVANTAGES OF NORMALIZATION





The following are the advantages of the normalization.
More efficient data structure.
Avoid redundant fields or columns.
More flexible data structure i.e. we should be able to add new rows and data values
easily
Better understanding of data.
Ensures that distinct tables exist when necessary.
Dept of Computer Science
Page 62
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
- Easier to maintain data structure i.e. it is easy to perform operations and complex
queries can be easily handled.
- Minimizes data duplication.
- Close modeling of real world entities, processes and their relationships.
DISADVANTAGES OF NORMALIZATION
The following are disadvantages of normalization.
- You cannot start building the database before you know what the user needs.
- On normalizing the relations to higher normal forms i.e. 4NF, 5NF the performance
degrades.
- It is very time consuming and difficult process in normalizing relations of higher
degree.
- Careless decomposition may leads to bad design of database which may leads to
serious problems.
First Normal Form (1NF)
A relation is said to be in 1NF if it contains no non-atomic values and each row can
provide a unique combination of values. The above table in UNF can be processed to
create the following table in 1NF.
Emp-Name Month Sales Bank-Id Bank-Name
Emp-Id
E01
AA
Jan
1000 B01
SBI
E01
AA
Feb
1200 B01
SBI
E01
AA
Mar
850 B01
SBI
E02
BB
Jan
2200 B02
UTI
E02
BB
Feb
2500 B02
UTI
E03
CC
Jan
1700 B01
SBI
E03
CC
Feb
1800 B01
SBI
E03
CC
Mar
1850 B01
SBI
E03
CC
Apr
1725 B01
SBI
As you can see now, each row contains unique combination of values. Unlike in UNF,
this relation contains only atomic values, i.e. the rows can not be further decomposed,
so the relation is now in 1NF.
Second Normal Form (2NF)
A relation is said to be in 2NF f if it is already in 1NF and each and every attribute fully
depends on the primary key of the relation. Speaking inversely, if a table has some
attributes which is not dependant on the primary key of that table, then it is not in 2NF.
Dept of Computer Science
Page 63
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
Let us explain. Emp-Id is the primary key of the above relation. Emp-Name, Month,
Sales and Bank-Name all depend upon Emp-Id. But the attribute Bank-Name depends
on Bank-Id, which is not the primary key of the table. So the table is in 1NF, but not in
2NF. If this position can be removed into another related relation, it would come to
2NF.
Emp-Id Emp-Name Month Sales Bank-Id
E01
AA
JAN 1000 B01
E01
AA
FEB 1200 B01
E01
AA
MAR 850 B01
E02
BB
JAN 2200 B02
E02
BB
FEB 2500 B02
E03
CC
JAN 1700 B01
E03
CC
FEB 1800 B01
E03
CC
MAR 1850 B01
E03
CC
APR 1726 B01
Bank-Id Bank-Name
B01
SBI
B02
UTI
After removing the portion into another relation we store lesser amount of data in two
relations without any loss information. There is also a significant reduction in
redundancy.
Third Normal Form (3NF)
A relation is said to be in 3NF, if it is already in 2NF and there exists no transitive
dependency in that relation. Speaking inversely, if a table contains transitive
dependency, then it is not in 3NF, and the table must be split to bring it into 3NF.
What is a transitive dependency? Within a relation if we see
A → B [B depends on A]
And
B → C [C depends on B]
Then we may derive
A → C[C depends on A]
Such derived dependencies hold well in most of the situations. For example if we
have
Roll → Marks
And
Marks → Grade
Dept of Computer Science
Page 64
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
Then we may safely derive
Roll → Grade.
This third dependency was not originally specified but we have derived it.
The derived dependency is called a transitive dependency when such dependency
becomes improbable. For example we have been given
Roll → City
And
City → STDCode
If we try to derive Roll → STDCode it becomes a transitive dependency, because
obviously the STDCode of a city cannot depend on the roll number issued by a school
or college. In such a case the relation should be broken into two, each containing one
of these two dependencies:
Roll → City
And
City → STD code
Boyce-Code Normal Form (BCNF)
A relationship is said to be in BCNF if it is already in 3NF and the left hand side of
every dependency is a candidate key. A relation which is in 3NF is almost always in
BCNF. These could be same situation when a 3NF relation may not be in BCNF the
following conditions are found true.
1. The candidate keys are composite.
2. There are more than one candidate keys in the relation.
3. There are some common attributes in the relation.
Professor Code Department Head of Dept. Percent Time
P1
Physics
Ghosh
50
P1
Mathematics Krishnan
50
P2
Chemistry Rao
25
P2
Physics
Ghosh
75
P3
Mathematics Krishnan
100
Consider, as an example, the above relation. It is assumed that:
1. A professor can work in more than one department
2. The percentage of the time he spends in each department is given.
3. Each department has only one Head of Department.
Dept of Computer Science
Page 65
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
The relation diagram for the above relation is given as the following:
The given relation is in 3NF. Observe, however, that the names of Dept. and Head of
Dept. are duplicated. Further, if Professor P2 resigns, rows 3 and 4 are deleted. We lose
the information that Rao is the Head of Department of Chemistry.
The normalization of the relation is done by creating a new relation for Dept. and Head
of Dept. and deleting Head of Dept. form the given relation. The normalized relations
are shown in the following.
Professor Code Department Percent Time
P1
Physics
50
P1
Mathematics 50
P2
Chemistry 25
P2
Physics
75
P3
Mathematics 100
Head of Dept.
Department
Physics
Ghosh
Mathematics Krishnan
Chemistry Rao
Dept of Computer Science
Page 66
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
See the dependency diagrams for these new relations.
1.
2.
3.
4.
Fourth Normal Form (4NF)
When attributes in a relation have multi-valued dependency, further Normalization to
4NF and 5NF are required. Let us first find out what multi-valued dependency is.
A multi-valued dependency is a typical kind of dependency in which each and every
attribute within a relation depends upon the other, yet none of them is a unique primary
key.
We will illustrate this with an example. Consider a vendor supplying many items to
many projects in an organization. The following are the assumptions:
A vendor is capable of supplying many items.
A project uses many items.
A vendor supplies to many projects.
An item may be supplied by many vendors.
A multi valued dependency exists here because all the attributes depend upon the other
and yet none of them is a primary key having unique value.
Vendor Code Item Code Project No.
V1
I1
P1
V1
I2
P1
V1
I1
P3
V1
I2
P3
V2
I2
P1
V2
I3
P1
V3
I1
P2
V3
I1
P3
The given relation has a number of problems. For example:
1. If vendor V1 has to supply to project P2, but the item is not yet decided, then a row
with a blank for item code has to be introduced.
2. The information about item I1 is stored twice for vendor V3.
Dept of Computer Science
Page 67
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
Observe that the relation given is in 3NF and also in BCNF. It still has the problem
mentioned above. The problem is reduced by expressing this relation as two relations
in the Fourth Normal Form (4NF). A relation is in 4NF if it has no more than one
independent multi valued dependency or one independent multi valued dependency
with a functional dependency.
The table can be expressed as the two 4NF relations given as following. The fact that
vendors are capable of supplying certain items and that they are assigned to supply for
some projects in independently specified in the 4NF relation.
Vendor-Supply
Item Code
Vendor Code
V1
I1
V1
I2
V2
I2
V2
I3
V3
I1
Vendor-Project
Project No.
Vendor Code
V1
P1
V1
P3
V2
P1
V3
P2
Fifth Normal Form (5NF)
These relations still have a problem. While defining the 4NF we mentioned that all the
attributes depend upon each other. While creating the two tables in the 4NF, although
we have preserved the dependencies between Vendor Code and Item code in the first
table and Vendor Code and Item code in the second table, we have lost the relationship
between Item Code and Project No. If there were a primary key then this loss of
dependency would not have occurred. In order to revive this relationship we must add
a new table like the following. Please note that during the entire process of
normalization, this is the only step where a new table is created by joining two attributes,
rather than splitting them into separate tables.
Project No. Item Code
Dept of Computer Science
Page 68
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
P1
P1
P2
P3
P3
11
12
11
11
13
Let us finally summarize the normalization steps we have discussed so far.
Input
Transformation
Output
Relation
Relation
All
Eliminate variable length record. Remove multi-attribute 1NF
Relations lines in table.
1NF
Remove dependency of non-key attributes on part of a multi- 2NF
Relation
attribute key.
2NF
Remove dependency of non-key attributes on other non-key 3NF
attributes.
3NF
Remove dependency of an attribute of a multi attribute key BCNF
on an attribute of another (overlapping) multi-attribute key.
BCNF
Remove more than one independent multi-valued 4NF
dependency from relation by splitting relation.
4NF
Add one relation relating attributes with multi-valued 5NF
dependency.
Dept of Computer Science
Page 69
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
UNIT 3
SQL-The Relational Database Standards
The name SQL is expanded as Structured Query Language. Originally, SQL was called
SEQUEL (Structured English Query Language) and was designed and implemented at
IBM Research as the interface for an experimental relational database system called
SYSTEM R. SQL is now the standard language for commercial relational DBMSs. A
joint effort by the American National Standards Institute (ANSI) and the International
Standards Organization (ISO) has led to a standard version of SQL (ANSI 1986), called
SQL-86 or SQL1. A revised and much expanded standard called SQL-92 (also referred
to as SQL2) was subsequently developed. The next standard that is well-recognized is
SQL: 1999, which started out as SQL3. Two later updates to the standard are SQL: 2003
and SQL: 2006, which added XML features among other updates to the language.
Another update in 2008 incorporated more object database features in.
SQL is a comprehensive database language: It has statements for data definitions,
queries, and updates. Hence, it is both a DDL and a DML. In addition, it has facilities
for defining views on the database, for specifying security and authorization, for
defining integrity constraints, and for specifying transaction controls. It also has rules
for embedding SQL statements into a general-purpose programming language such as
Java, COBOL, or C/C++.1
The later SQL standards (starting with SQL: 1999) are divided into a core specification
plus specialized extensions.
SQL Data Definition and Data Types
SQL uses the terms table, row, and column for the formal relational model terms
relation, tuple, and attribute, respectively. We will use the corresponding terms
interchangeably. The main SQL command for data definition is the CREATE statement,
which can be used to create schemas, tables (relations), and domains (as well as other
constructs such as views, assertions, and triggers).
Dept of Computer Science
Page 70
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
1. Schema and Catalog Concepts in SQL
An SQL schema is identified by a schema name, and includes an authorization
identifier to indicate the user or account who owns the schema, as well as descriptors
for each element in the schema. Schema elements include tables, constraints, views,
domains, and other constructs (such as authorization grants) that describe the schema.
A schema is created via the CREATE SCHEMA statement, which can include all the
schema elements’ definitions. Alternatively, the schema can be assigned a name and
authorization identifier, and the elements can be defined later. For example, the
following statement creates a schema called COMPANY, owned by the user with
authorization identifier ‘Jsmith’. Note that each statement in SQL ends with a
semicolon.
CREATE SCHEMA COMPANY AUTHORIZATION ‘Jsmith’;
In general, not all users are authorized to create schemas and schema elements. The
privilege to create schemas, tables, and other constructs must be explicitly granted to
the relevant user accounts by the system administrator or DBA.
2. The CREATE TABLE Command in SQL
The CREATE TABLE command is used to specify a new relation by giving it
a name and specifying its attributes and initial constraints. The attributes are specified
first and each attribute is given a name, a data type to specify its domain of values, and
any attribute constraints, such as NOT NULL.
CREATE TABLE COMPANY.EMPLOYEE(...);
rather than we can explicitly (rather than implicitly) make the EMPLOYEE table part
of the COMPANY schema.
fig :
The relations declared through CREATE TABLE statements are called base tables (or
base relations).This means that the relation and its tuples are actually created and stored
as a file by the DBMS.
Dept of Computer Science
Page 71
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
3. Attribute Data Types and Domains in SQL
The basic data types available for attributes include numeric, character string,
bit string, Boolean, date, and time.
1) Numeric data types include integer numbers of various sizes (INTEGER or INT, and
SMALLINT) and floating-point (real) numbers of various precision (FLOAT or REAL,
and DOUBLE PRECISION). Formatted numbers can be declared by using
DECIMAL(i,j)—or DEC(i,j) or NUMERIC(i,j)—where i, the precision, is the total
number of decimal digits and j, the scale, is the number of digits after the decimal point.
The default for scale is zero, and the default for precision is implementation-defined.
2)Character-string data types are either fixed length—CHAR(n) or CHARACTER(n),
where n is the number of characters—or varying length— VARCHAR(n) or CHAR
VARYING(n) or CHARACTER VARYING(n), where n is the maximum number of
characters.
3) Bit-string data types are either of fixed length n—BIT(n)—or varying length—BIT
VARYING(n), where n is the maximum number of bits. The default for n, the length of
a character string or bit string, is 1.
4) A Boolean data type has the traditional values of TRUE or FALSE. In SQL, because
of the presence of NULL values, a three-valued logic is used, so a third possible value
for a Boolean data type is UNKNOWN.
5) The DATE data type has ten positions, and its components are YEAR, MONTH, and
DAY in the form YYYY-MM-DD. The TIME data type has at least eight positions,
with the components HOUR, MINUTE, and SECOND in the form HH:MM:SS
6) A timestamp data type (TIMESTAMP) includes the DATE and TIME fields, plus a
minimum of six positions for decimal fractions of seconds and an optional WITH TIME
ZONE qualifier. Literal values are represented by single quoted strings preceded by the
keyword TIMESTAMP, with a blank space between data and time; for example,
TIMESTAMP ‘2008-09-27 09:12:47.648302’.
7) another data type related to DATE, TIME, and TIMESTAMP is the INTERVAL data
type. This specifies an interval—a relative value that can be used to increment or
decrement an absolute value of a date, time, or timestamp. Intervals are qualified to be
either YEAR/MONTH intervals or DAY/TIME intervals.
DOMAIN: It is possible to specify the data type of each attribute directly,
alternatively, a domain can be declared, and the domain name used with the attribute
specification. This makes it easier to change the data type for a domain that is used by
numerous attributes in a schema, and improves schema readability. For example, we
can create a domain SSN_TYPE by the following statement:
CREATE DOMAIN SSN_TYPE AS CHAR(9);
We can use SSN_TYPE in place of CHAR(9).
Dept of Computer Science
Page 72
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
Specifying Constraints in SQL
This section describes the basic constraints that can be specified in SQL as part
of table creation. These include key and referential integrity constraints, restrictions on
attribute domains and NULLs, and constraints on individual tuples within a relation.
1. Specifying Attribute Constraints and Attribute Defaults
1) NOT NULL: Because SQL allows NULLs as attribute values, a constraint NOT
NULL may be specified if NULL is not permitted for a particular attribute. This is
always implicitly specified for the attributes that are part of the primary key of each
relation, but it can be specified for any other attributes whose values are required not to
be NULL.
2) DEFAULT:
It is also possible to define a default value for an attribute by
appending the clause DEFAULT <value> to an attribute definition. The default value
is included in any new tuple if an explicit value is not provided for that attribute. Figure
illustrates an example of specifying a default manager for a new department and a
default department for a new employee. If no default clause is specified, the default
default value is NULL for attributes that do not have the NOT NULL constraint.
Ex:CREATE TABLE EMPLOYEE ( . .. ,Dno INT NOT NULL DEFAULT 1,..);
3) CHECK: Another type of constraint can restrict attribute or domain values using the
CHECK clause following an attribute or domain definition.For example, suppose that
department numbers are restricted to integer numbers between 1 and 20; as the
following:
Dnumber INT NOT NULL CHECK (Dnumber > 0 AND Dnumber < 21);
The CHECK clause can also be used in conjunction with the CREATE DOMAIN
statement.For example, we can write the following statement:
CREATE DOMAIN D_NUM AS CHECK (D_NUM > 0 AND D_NUM < 21);
We can then use the created domain D_NUM as the attribute type for all attributes
that refer to department numbers , such as Dnumber of DEPARTMENT,
Dnum of PROJECT, Dno of EMPLOYEE, and so on.
2. Specifying Key and Referential Integrity Constraints
Because keys and referential integrity constraints are very important, there are
special clauses within the CREATE TABLE statement to specify them.
A primary key has a single attribute. Referential integrity is specified via the
FOREIGN KEY clause. The UNIQUE clause specifies alternate (secondary) key.
Dname VARCHAR(15) UNIQUE; For example,
Dept of Computer Science
Page 73
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
Ex :CREATE TABLE EMPLOYEE ( Ssn NUMBER(5), . . ,
Dno INT NOT NULL DEFAULT 1,
PRIMARY KEY (Ssn));
Ex:CREATE TABLE DEPARTMENT(Dnumber int,Dname varchar(40) UNIQUE,
Super_ssn number(5), Mgr_ssn CHAR(9) NOT NULL
DEFAULT ‘888665555’, PRIMARY KEY(Dnumber),
FOREIGN KEY (Super_ssn) REFERENCES
EMPLOYEE(Ssn) );
3. CASCADE ON DELETE or ON UPDATE: A referential integrity constraint
can be violated when tuples are inserted or deleted, or when a foreign key or primary
key attribute value is modified. The default action that SQL takes for an integrity
violation is to reject the update operation that will cause a violation, which is known as
the RESTRICT option. However, the schema designer can specify an alternative action
to be taken by attaching a referential triggered action clause to any foreign key
constraint. The options include SET NULL, CASCADE, and SET DEFAULT. An
option must be qualified with either ON DELETE or ON UPDATE. We illustrate this
with the examples shown in Figure 3.1
4. Giving Names to Constraints
Figure 3.1 also illustrates how a constraint may be given a constraint name, following
the keyword CONSTRAINT. The names of all constraints within a particular schema
must be unique. A constraint name is used to identify a particular constraint in case the
constraint must be dropped later and replaced with another constraint. Giving names to
constraints is optional. Figure 3.1
Here, the database designer chooses ON DELETE SET NULL and ON UPDATE
CASCADE for the foreign key Super_ssn of EMPLOYEE. This means that if the tuple
Dept of Computer Science
Page 74
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
for a supervising employee is deleted, the value of Super_ssn is automatically set to
NULL for all employee tuples that were referencing the deleted employee tuple. On the
other hand, if the Ssn value for a supervising employee is updated (say, because it was
entered incorrectly), the new value is cascaded to Super_ssn for all employee tuples
referencing the updated employee
tuple.
Basic Retrieval Queries in SQL
SQL has one basic statement for retrieving information from a database: the
SELECT statement. The SELECT statement is not the same as the SELECT operation
of relational algebra. There are many options and
flavors to the SELECT statement in SQL.
1. The SELECT-FROM-WHERE Structure of Basic SQL Queries
The basic form of the SELECT statement, sometimes called a mapping or a selectfrom-where block, is formed of the three clauses SELECT, FROM, and WHERE and
has the following form:
SELECT <attribute list>
FROM <table list>
WHERE <condition>;
where
■ <attribute list> is a list of attribute names whose values are to be retrieved by the
query.
■ <table list> is a list of the relation names required to process the query.
■ <condition> is a conditional (Boolean) expression that identifies the tuples to be
retrieved by the query.
In SQL, the basic logical comparison operators for comparing attribute values with one
another and with literal constants are =, <, <=, >, >=, and <>. These correspond to the
relational algebra operators =, <, ≤, >, ≥, and ≠, respectively, and to the C/C++
programming language operators =, <, <=, >, >=, and !=. The main syntactic difference
is the not equal operator. SQL has additional comparison operators that we will present
gradually.
Query 0. Retrieve the birth date and address of the employee(s) whose name is ‘John
B. Smith’.
Q0: SELECT Bdate, Address
FROM EMPLOYEE
Dept of Computer Science
Page 75
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
WHERE Fname=‘John’ AND Minit=‘B’ AND Lname=‘Smith’;
This query involves only the EMPLOYEE relation listed in the FROM clause. The
query selects the individual EMPLOYEE tuples that satisfy the condition of the
WHERE clause, then projects the result on the Bdate and Address attributes listed in
the SELECT clause.
The SELECT clause of SQL specifies the attributes, whose values are to be retrieved,
which are called the projection attributes, and the WHERE clause specifies the
Boolean condition that must be true for any retrieved tuple, which is known as the
selection condition.
We can think of an implicit tuple variable or iterator in the SQL query ranging or
looping over each individual tuple in the EMPLOYEE table and evaluating the
condition in the WHERE clause. Only those tuples that satisfy the condition—that is,
those tuples for which the condition evaluates to TRUE after substituting their
corresponding attribute values—are selected.
Query 1. Retrieve the name and address of all employees who work for the ‘Research’
department.
Q1:
SELECT Fname, Lname, Address
FROM EMPLOYEE, DEPARTMENT
WHERE Dname=‘Research’ AND Dnumber=Dno;
In the WHERE clause of Q1, the condition Dname = ‘Research’ is a selection condition
that chooses the particular tuple of interest in the DEPARTMENT table, because Dname
is an attribute of DEPARTMENT. The condition Dnumber = Dno is called a join
condition, because it combines two tuples: one from DEPARTMENT and one from
EMPLOYEE, whenever the value of Dnumber in DEPARTMENT is equal to the value
of Dno in EMPLOYEE.
In general, any number of selection and join conditions may be specified in a
singleSQL query. A query that involves only selection and join conditions plus
projection attributes isknown as a select-project-join query. The next example is a
select-project-join query with two join conditions.
Query 2. For every project located in ‘Stafford’, list the project number, the controlling
department number, and the department manager’s last name, address, and birth date.
Q2:
SELECT Pnumber, Dnum, Lname, Address, Bdate
FROM PROJECT, DEPARTMENT, EMPLOYEE
WHERE Dnum=Dnumber AND Mgr_ssn=Ssn AND
Plocation=‘Stafford’;
The join condition Dnum = Dnumber relates a project tuple to its controlling department
tuple, whereas the join condition Mgr_ssn = Ssn relates the controlling department tuple
to the employee tuple who manages that department. Each tuple in the result will be a
combination of one project, one department, and one employee that satisfies the join
Dept of Computer Science
Page 76
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
conditions. The projection attributes are used to choose the attributes to be displayed
from each combined tuple.
2. Ambiguous Attribute Names, Aliasing, Renaming, and Tuple Variables
In SQL, the same name can be used for two (or more) attributes as long as the attributes
are in different relations. If this is the case, and a multitable query refers to two or more
attributes with the same name, we must qualify the attribute name with the relation
name to prevent ambiguity. This is done by prefixing the relation name to the attribute
name and separating the two by a period. To illustrate this, the Dno and Lname attributes
of the EMPLOYEE relation were called Dnumber and Name, and the Dname attribute
of DEPARTMENT was also called Name; then, to prevent ambiguity, query Q1 would
be rephrased as shown in Q1A.We must prefix the attributes Name and Dnumber in
Q1A to specify which ones we are referring to, because the same attribute names are
used in both relations:
Q1A:
SELECT Fname, EMPLOYEE.Name, Address
FROM EMPLOYEE, DEPARTMENT
WHERE DEPARTMENT.Name=‘Research’ AND
DEPARTMENT.Dnumber=EMPLOYEE.Dnumber;
Fully qualified attribute names can be used for clarity even if there is no ambiguity in
attribute names. Q1 is shown in this manner as is Q1_ below.We can also create an alias
for each table name to avoid repeated typing of long table names (see Q8 below).
Q1_:
SELECT EMPLOYEE.Fname, EMPLOYEE.LName,
EMPLOYEE.Address
FROM EMPLOYEE, DEPARTMENT
WHERE DEPARTMENT.DName=‘Research’ AND
DEPARTMENT.Dnumber=EMPLOYEE.Dno;
The ambiguity of attribute names also arises in the case of queries that refer to the same
relation twice, as in the following example.
Query 8. For each employee, retrieve the employee’s first and last name and the first
and last name of his or her immediate supervisor.
Q8:
SELECT E.Fname, E.Lname, S.Fname, S.Lname
FROM EMPLOYEE AS E, EMPLOYEE AS S
WHERE E.Super_ssn=S.Ssn;
In this case, we are required to declare alternative relation names E and S, called aliases
or tuple variables, for the EMPLOYEE relation. An alias can follow the keyword AS,
as shown in Q8, or it can directly follow the relation name—for example, by writing
EMPLOYEE E, EMPLOYEE S in the FROM clause of Q8. It is also possible to rename
the relation attributes within the query in SQL by giving them aliases.
For example, if we write
Dept of Computer Science
Page 77
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
EMPLOYEE AS E (Fn, Mi, Ln, Ssn, Bd, Addr, Sex, Sal, Sssn, Dno) in the FROM
clause, Fn becomes an alias for Fname, Mi for Minit, Ln for Lname, and so on.
In Q8, we can think of E and S as two different copies of the EMPLOYEE relation; the
first, E, represents employees in the role of supervisees or subordinates; the second, S,
represents employees in the role of supervisors. We can now join the two copies.
We can use this alias-naming mechanism in any SQL query to specify tuple variables
for every table in the WHERE clause, whether or not the same relation needs to be
referenced more than once. In fact, this practice is recommended since it results in
queries that are easier to comprehend. For example, we could specify query Q1 as in
Q1B:
Q1B:
SELECT E.Fname, E.LName, E.Address
FROM EMPLOYEE E, DEPARTMENT D
WHERE D.DName=‘Research’ AND D.Dnumber=E.Dno;
3. Unspecified WHERE Clause and Use of the Asterisk
We discuss two more features of SQL here. A missing WHERE clause indicates
no condition on tuple selection; hence, all tuples of the relation specified in the FROM
clause qualify and are selected for the query result. If more than one relation is specified
in the FROM clause and there is no WHERE clause, then the CROSS
PRODUCT—all possible tuple combinations—of these relations is selected. For
example, Query 9 selects all EMPLOYEE and Query 10 selects all combinations of an
EMPLOYEE Ssn and a DEPARTMENT Dname, regardless of whether the employee
works for the department or not .
Queries 9 and 10. Select all EMPLOYEE Ssns (Q9) and all combinations of
EMPLOYEE Ssn and DEPARTMENT Dname (Q10) in the database.
Q9:
SELECT Ssn
FROM EMPLOYEE;
Q10:
SELECT Ssn, Dname
FROM EMPLOYEE, DEPARTMENT;
It is extremely important to specify every selection and join condition in the
WHERE clause; if any such condition is overlooked, incorrect and very large relations
may result. Notice that Q10 is similar to a CROSS PRODUCT operation followed by a
PROJECT operation in relational algebra . If we specify all the attributes of
EMPLOYEE and DEPARTMENT in Q10, we get the actual CROSS PRODUCT
(except for duplicate elimination, if any).
To retrieve all the attribute values of the selected tuples, we do not have to list the
attribute names explicitly in SQL; we just specify an asterisk (*), which stands for all
the attributes. For example, query Q1C retrieves all the attribute values of any
EMPLOYEE who works in DEPARTMENT number 5 , query Q1D retrieves all the
attributes of an EMPLOYEE and the attributes of the DEPARTMENT in which he or
Dept of Computer Science
Page 78
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
she works for every employee of the ‘Research’ department, and Q10A specifies the
CROSS PRODUCT of the EMPLOYEE and DEPARTMENT relations.
Q1C:
SELECT *
FROM EMPLOYEE
WHERE Dno=5;
Q1D:
SELECT *
FROM EMPLOYEE, DEPARTMENT
WHERE Dname=‘Research’ AND Dno=Dnumber;
Q10A:
SELECT *
FROM EMPLOYEE, DEPARTMENT;
4. Tables as Sets in SQL
As we mentioned earlier, SQL usually treats a table not as a set but rather as a
multiset; duplicate tuples can appear more than once in a table, and in the result of a
query. SQL does not automatically eliminate duplicate tuples in the results of queries,
for the following reasons:
■ Duplicate elimination is an expensive operation. One way to implement it is to sort
the tuples first and then eliminate duplicates.
■ the user may want to see duplicate tuples in the result of a query.
■ When an aggregate function is applied to tuples, in most cases we do not want to
eliminate duplicates.
An SQL table with a key is restricted to being a set, since the key value must be distinct
in each tuple. If we do want to eliminate duplicate tuples from the result of an SQL
query, we use the keyword DISTINCT in the SELECT clause, meaning that only
distinct tuples should remain in the result. In general, a query with SELECT
DISTINCT eliminates duplicates, whereas a query with SELECT ALL does not. Do not
specify SELECT with neither ALL nor DISTINCT— as in our previous examples— is
equivalent to SELECT ALL. For example, Q11 retrieves the salary of every employee;
if several employees have the same salary, that salary value will appear as many times
in the result of the query, as shown in Figure 4.4(a).
Dept of Computer Science
Page 79
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
If we are interested only in distinct salary values, we want each value to appear only
once, regardless of how many employees earn that salary. By using the keyword
DISTINCT as in Q11A, we accomplish this, as shown in Figure 4.4(b).
Query 11. Retrieve the salary of every employee (Q11) and all distinct salary values
(Q11A).
Q11:
SELECT ALL Salary
FROM EMPLOYEE;
Q11A:
SELECT DISTINCT Salary
FROM EMPLOYEE;
Set operations
SQL has directly incorporated some of the set operations from mathematical set theory,
which are also part of relational algebra. There are set union
(UNION), set difference (EXCEPT), and set intersection (INTERSECT) operations.
The relations resulting from these set operations are sets of tuples; that is, duplicate
tuples are eliminated from the result. These set operations apply only to unioncompatible relations, so we must make sure that the two relations on which we apply
the operation have the same attributes and that the attributes appear in the same order
in both relations. The next example illustrates the use of UNION.
Query 4. Make a list of all project numbers for projects that involve an employee whose
last name is ‘Smith’, either as a worker or as a manager of the department that controls
the project.
Q4A:
(SELECT DISTINCT Pnumber
FROM PROJECT, DEPARTMENT, EMPLOYEE
WHERE Dnum=Dnumber AND Mgr_ssn=Ssn
AND Lname=‘Smith’ )
UNION
( SELECT DISTINCT Pnumber
FROM PROJECT, WORKS_ON, EMPLOYEE
WHERE Pnumber=Pno AND Essn=Ssn AND Lname=‘Smith’ );
The first SELECT query retrieves the projects that involve a ‘Smith’ as manager of the
department that controls the project and the second retrieves the projects that involve a
‘Smith’ as a worker on the project. Notice that if several employees have the last name
‘Smith’, the project names involving any of them will be retrieved.
Applying the UNION operation to the two SELECT queries gives the desired result.
SQL also has corresponding multiset operations, which are followed by the keyword
ALL (UNION ALL, EXCEPT ALL, INTERSECT ALL). Their results are multisets
(duplicates are not eliminated). The behavior of these operations is illustrated by the
Dept of Computer Science
Page 80
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
examples in Figure 4.5. Basically, each tuple—whether it is a duplicate or not—is
considered as a different tuple when applying these operations.
5. Substring Pattern Matching and Arithmetic Operators
In this section we discuss several more features of SQL. The first feature allows
comparison conditions on only parts of a character string, using the LIKE comparison
operator. This can be used for string pattern matching. Partial strings are specified
using two reserved characters: % replaces an arbitrary number of zero or more
characters, and the underscore (_) replaces a single character. For example, consider the
following query.
Query 12. Retrieve all employees whose address is in Houston, Texas.
Q12:
SELECT Fname, Lname
FROM EMPLOYEE
WHERE Address LIKE ‘%Houston,TX%’;
To retrieve all employees who were born during the 1950s, we can use Query Q12A.
Here, ‘5’must be the third character of the string (according to our format for date),
so we use the value ‘_ _ 5 _ _ _ _ _ _ _’, with each underscore serving as a placeholder
for an arbitrary character.
Query 12A. Find all employees who were born during the 1950s.
Q12:
SELECT Fname, Lname
FROM EMPLOYEE
WHERE Bdate LIKE ‘_ _ 5 _ _ _ _ _ _ _’;
Another feature allows the use of arithmetic in queries. The standard arithmetic
operators for addition (+), subtraction (–), multiplication (*), and division (/) can be
applied to numeric values or attributes with numeric domains. For example, suppose
that we want to see the effect of giving all employees who work on the
‘ProductX’ project a 10 percent raise; we can issue Query 13 to see what their salaries
would become. This example also shows how we can rename an attribute in the query
result using AS in the SELECT clause.
Query 13. Show the resulting salaries if every employee working on the
Dept of Computer Science
Page 81
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
‘ProductX’ project is given a 10 percent raise.
Q13:
SELECT E.Fname, E.Lname, 1.1 * E.Salary AS Increased_sal
FROM EMPLOYEE AS E, WORKS_ON AS W, PROJECT AS P
WHERE E.Ssn=W.Essn AND W.Pno=P.Pnumber AND
P.Pname=‘ProductX’;
For string data types, the concatenate operator || can be used in a query to append two
string values. For date, time, timestamp, and interval data types, operators include
incrementing (+) or decrementing (–) a date, time, or timestamp by an interval. In
addition, an interval value is the result of the difference between two date, time, or
timestamp values. Another comparison operator, which can be used for convenience, is
BETWEEN, which is illustrated in Query 14.
Query 14. Retrieve all employees in department 5 whose salary is between
$30,000 and $40,000.
Q14:
SELECT *
FROM EMPLOYEE
WHERE (Salary BETWEEN 30000 AND 40000) AND Dno = 5;
The condition (Salary BETWEEN 30000 AND 40000) in Q14 is equivalent to the
condition((Salary >= 30000) AND (Salary <= 40000)).
6. Ordering of Query Results
SQL allows the user to order the tuples in the result of a query by the values of one or
more of the attributes that appear in the query result, by using the ORDER BY clause.
This is illustrated by Query 15.
Query 15. Retrieve a list of employees and the projects they are working on, ordered
by department and, within each department, ordered alphabetically by last name, then
first name.
Q15:
SELECT D.Dname, E.Lname, E.Fname, P.Pname
FROM DEPARTMENT D, EMPLOYEE E, WORKS_ON W,
PROJECT P
WHERE D.Dnumber= E.Dno AND E.Ssn= W.Essn AND
W.Pno= P.Pnumber ORDER BY D.Dname, E.Lname, E.Fname;
The default order is in ascending order of values. We can specify the keyword DESC
if we want to see the result in a descending order of values. The keyword ASC can be
used to specify ascending order explicitly. For example, if we want descending
alphabetical order on Dname and ascending order on Lname, Fname, the ORDER BY
clause of Q15 can be written as
ORDER BY D.Dname DESC, E.Lname ASC, E.Fname ASC
7. Discussion and Summary of Basic SQL Retrieval Queries
Dept of Computer Science
Page 82
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
A simple retrieval query in SQL can consist of up to four clauses, but only the first
two—SELECT and FROM—are mandatory. The clauses are specified in the following
order, with the clauses between square brackets [ ... ] being optional:
SELECT <attribute list>
FROM <table list>
[ WHERE <condition> ]
[ ORDER BY <attribute list> ];
The SELECT clause lists the attributes to be retrieved, and the FROM clause specifies
all relations (tables) needed in the simple query. The WHERE clause identifies the
conditions for selecting the tuples from these relations, including join conditions if
needed. ORDER BY specifies an order for displaying the results of a query.
INSERT, DELETE, and UPDATE Statements in SQL
In SQL, three commands can be used to modify the database: INSERT, DELETE, and
UPDATE. We discuss each of these in turn.
1 .The INSERT Command
In its simplest form, INSERT is used to add a single tuple to a relation.We must specify
the relation name and a list of values for the tuple. The values should be listed in the
same order in which the corresponding attributes were specified in the CREATE
TABLE command. For example,
U1: INSERT INTO EMPLOYEE
VALUES ( ‘Richard’, ‘K’, ‘Marini’, ‘653298653’, ‘1962-12-30’, ‘98
Oak Forest, Katy, TX’, ‘M’, 37000, ‘653298653’, 4 );
A second form of the INSERT statement allows the user to specify explicit attribute
names that correspond to the values provided in the INSERT command. This is useful
if a relation has many attributes but only a few of those attributes are assigned values in
the new tuple. However, the values must include all attributes with NOT NULL
specification and no default value. Attributes with NULL allowed or DEFAULT values
are the ones that can be left out. For example, to enter a tuple for a new
EMPLOYEE for whom we know only the Fname, Lname, Dno, and Ssn attributes, we
can use U1A:
U1A:
INSERT INTO EMPLOYEE (Fname, Lname, Dno, Ssn)
VALUES (‘Richard’, ‘Marini’, 4, ‘653298653’);
Attributes not specified in U1A are set to their DEFAULT or to NULL, and the values
are listed in the same order as the attributes are listed in the INSERT command itself.
It is also possible to insert into a relation multiple tuples separated by commas in a single
INSERT command. The attribute values forming each tuple are enclosed in
parentheses.
Dept of Computer Science
Page 83
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
2. The DELETE Command
The DELETE command removes tuples from a relation. It includes a WHERE clause,
similar to that used in an SQL query, to select the tuples to be deleted. Tuples are
explicitly deleted from only one table at a time. However, the deletion may propagate
to tuples in other relations if referential triggered actions are specified in the referential
integrity constraints of the DDL. Depending on the number of tuples selected by the
condition in the WHERE clause, zero, one, or several tuples can be deleted by a single
DELETE command. A missing WHERE clause specifies that all tuples in the relation
are to be deleted; however, the table remains in the database as an empty table. We must
use the DROP TABLE command to remove the table definition. The DELETE
commands in U4A to U4D, if applied independently to the database , will delete zero,
one, four, and all tuples, respectively, from the EMPLOYEE relation:
U4A:
DELETE FROM EMPLOYEE
WHERE Lname=‘Brown’;
U4B:
DELETE FROM EMPLOYEE
WHERE Ssn=‘123456789’;
U4C:
DELETE FROM EMPLOYEE
WHERE Dno=5;
U4D:
DELETE FROM EMPLOYEE;
3. The UPDATE Command
The UPDATE command is used to modify attribute values of one or more selected
tuples. As in the DELETE command, a WHERE clause in the UPDATE command
selects the tuples to be modified from a single relation. However, updating a primary
key value may propagate to the foreign key values of tuples in other relations if such a
referential triggered action is specified in the referential integrity constraints of the
DDL . An additional SET clause in the UPDATE command specifies the attributes to
be modified and their new values. For example, to change the location and controlling
department number of project number 10 to ‘Bellaire’ and 5, respectively, we use U5:
U5:
UPDATE PROJECT
SET Plocation = ‘Bellaire’, Dnum = 5
WHERE Pnumber=10;
Several tuples can be modified with a single UPDATE command. An example is to give
all employees in the ‘Research’ department a 10 percent raise in salary, as shown in U6.
In this request, the modified Salary value depends on the original Salary value in each
tuple, so two references to the Salary attribute are needed. In the SET clause, the
reference to the Salary attribute on the right refers to the old Salary value before
modification, and the one on the left refers to the new Salary value after modification:
U6:
UPDATE EMPLOYEE
Dept of Computer Science
Page 84
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
SET Salary = Salary * 1.1
WHERE Dno = 5;
It is also possible to specify NULL or DEFAULT as the new attribute value.Notice that
each UPDATE command explicitly refers to a single relation only. To modify multiple
relations, we must issue several UPDATE commands.
More Complex SQL Retrieval Queries
1. Comparisons Involving NULL and Three-Valued Logic
SQL has various rules for dealing with NULL values. Recall from Section 3.1.2 that
NULL is used to represent a missing value, but that it usually has one of three different
interpretations—value unknown (exists but is not known), value not available (exists
but is purposely withheld), or value not applicable (the attribute is undefined for this
tuple). Consider the following examples to illustrate each of the meanings of NULL.
1. Unknown value. A person’s date of birth is not known, so it is represented by NULL
in the database.
2. Unavailable or withheld value. A person has a home phone but does not want it to
be listed, so it is withheld and represented as NULL in the database.
3. Not applicable attribute. An attribute LastCollegeDegree would be NULL for a
person who has no college degrees because it does not apply to that person.
Query 18. Retrieve the names of all employees who do not have supervisors.
Q18:
SELECT Fname, Lname
FROM EMPLOYEE
WHERE Super_ssn IS NULL;
2. Nested Queries, Tuples,and Set/Multiset Comparisons
Some queries require that existing values in the database be fetched and then used in a
comparison condition. Such queries can be conveniently formulated by using nested
queries, which are complete select-from-where blocks within the WHERE clause of
another query. That other query is called the outer query. Query 4 is formulated in Q4
without a nested query, but it can be rephrased to use nested queries as shown in Q4A.
Q4A introduces the comparison operator IN, which compares a value v with a set (or
multiset) of values V and evaluates to TRUE if v is one of the elements in V.
The first nested query selects the project numbers of projects that have an employee
with last name ‘Smith’ involved as manager, while the second nested query selects the
project numbers of projects that have an employee with last name ‘Smith’ involved as
Dept of Computer Science
Page 85
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
worker. In the outer query, we use the OR logical connective to retrieve a PROJECT
tuple if the PNUMBER value of that tuple is in the result of either nested query.
Q4A:
SELECT DISTINCT Pnumber
FROM PROJECT
WHERE Pnumber IN
( SELECT Pnumber
FROM PROJECT, DEPARTMENT, EMPLOYEE
WHERE Dnum=Dnumber AND
Mgr_ssn=Ssn AND Lname=‘Smith’ )
OR
Pnumber IN
( SELECT Pno
FROM WORKS_ON, EMPLOYEE
WHERE Essn=Ssn AND Lname=‘Smith’ );
If a nested query returns a single attribute and a single tuple, the query result will be a
single (scalar) value. In such cases, it is permissible to use = instead of IN for the
comparison operator. In general, the nested query will return a table (relation), which
is a set or multiset of tuples.
SQL allows the use of tuples of values in comparisons by placing them within
parentheses. To illustrate this, consider the following query:
SELECT DISTINCT Essn
FROM WORKS_ON
WHERE (Pno, Hours) IN ( SELECT Pno, Hours
FROM WORKS_ON
WHERE Essn=‘123456789’ );
This query will select the Essns of all employees who work the same (project, hours)
combination on some project that employee ‘John Smith’ (whose Ssn =
‘123456789’) works on. In this example, the IN operator compares the subtuple of
values in parentheses (Pno, Hours) within each tuple in WORKS_ON with the set of
type-compatible tuples produced by the nested query.
In addition to the IN operator, a number of other comparison operators can be used
to compare a single value v (typically an attribute name) to a set or multiset v (typicallya
nested query). The = ANY (or = SOME) operator returns TRUE if the value v is equal
to some value in the set V and is hence equivalent to IN. The two keywords ANY and
SOME have the same effect. Other operators that can be combined with ANY (or
SOME) include >, >=, <, <=, and <>. The keyword ALL can also be combined with
each of these operators. For example, the comparison condition (v > ALL V) returns
TRUE if the value v is greater than all the values in the set (or multiset) V.
An example is the following query, which returns the names of employees whose
salary is greater than the salary of all the employees in department 5:
Dept of Computer Science
Page 86
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
SELECT Lname, Fname
FROM EMPLOYEE
WHERE Salary > ALL ( SELECT Salary
FROM EMPLOYEE WHERE Dno=5 );
Query 16. Retrieve the name of each employee who has a dependent with the
same first name and is the same sex as the employee.
Q16:
SELECT E.Fname, E.Lname
FROM EMPLOYEE AS E
WHERE E.Ssn IN ( SELECT Essn
FROM DEPENDENT AS D
WHERE E.Fname=D.Dependent_name
AND E.Sex=D.Sex );
3. Correlated Nested Queries
Whenever a condition in the WHERE clause of a nested query references some
attribute of a relation declared in the outer query, the two queries are said to be
correlated.
We can understand a correlated query better by considering that the nested query is
evaluated once for each tuple (or combination of tuples) in the outer query. For
example, we can think of Q16 as follows: For each EMPLOYEE tuple, evaluate the
nested query, which retrieves the Essn values for all DEPENDENT tuples with the
same sex and name as that EMPLOYEE tuple; if the Ssn value of the EMPLOYEE
tuple is in the result of the nested query, then select that EMPLOYEE tuple.
In general, a query written with nested select-from-where blocks and using the = or
IN comparison operators can always be expressed as a single block query. For
example,
Q16 may be written as in Q16A:
Q16A:
SELECT E.Fname, E.Lname
FROM EMPLOYEE AS E, DEPENDENT AS D
WHERE E.Ssn=D.Essn AND E.Sex=D.Sex
AND E.Fname=D.Dependent_name;
4.The EXISTS and UNIQUE Functions in SQL
The EXISTS function in SQL is used to check whether the result of a correlated nested
query is empty (contains no tuples) or not. The result of EXISTS is a Boolean value
TRUE if the nested query result contains at least one tuple, or FALSE if the nested
query result contains no tuples. We illustrate the use of EXISTS—and NOT EXISTS—
with some examples. First, we formulate Query 16 in an alternative form that uses
EXIST as in Q16B:
Dept of Computer Science
Page 87
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
SELECT E.Fname, E.Lname
FROM EMPLOYEE AS E
WHERE EXISTS ( SELECT *
FROM DEPENDENT AS D
WHERE E.Ssn=D.Essn AND E.Sex=D.Sex
AND E.Fname=D.Dependent_name);
EXISTS and NOT EXISTS are typically used in conjunction with a correlated nested
query. In Q16B, the nested query references the Ssn, Fname, and Sex attributes of the
EMPLOYEE relation from the outer query.We can think of Q16B as follows: For each
EMPLOYEE tuple, evaluate the nested query, which retrieves all DEPENDENT tuples
with the same Essn, Sex, and Dependent_name as the EMPLOYEE tuple; if at least one
tuple EXISTS in the result of the nested query, then select that EMPLOYEE tuple. In
general, EXISTS(Q) returns TRUE if there is at least one tuple in the result of thenested
query Q, and it returns FALSE otherwise. On the other hand, NOT EXISTS(Q) returns
TRUE if there are no tuples in the result of nested query Q, and it returns FALSE
otherwise. Next, we illustrate the use of NOT EXISTS.
Query 6. Retrieve the names of employees who have no dependents.
Q6:
SELECT Fname, Lname
FROM EMPLOYEE
WHERE NOT EXISTS ( SELECT *
FROM DEPENDENT
WHERE Ssn=Essn );
In Q6, the correlated nested query retrieves all DEPENDENT tuples related to a
particular EMPLOYEE tuple. If none exist, the EMPLOYEE tuple is selected because
the
WHERE-clause condition will evaluate to TRUE in this case. We can explain Q6 as
follows: For each EMPLOYEE tuple, the correlated nested query selects all
DEPENDENT tuples whose Essn value matches the EMPLOYEE Ssn; if the result is
empty, no dependents are related to the employee, so we select that EMPLOYEE
tuple and retrieve its Fname and Lname.
Query 7. List the names of managers who have at least one dependent.
Q7: SELECT Fname, Lname
FROM EMPLOYEE
WHERE EXISTS ( SELECT *
FROM DEPENDENT
WHERE Ssn=Essn )
AND
EXISTS ( SELECT *
FROM DEPARTMENT
WHERE Ssn=Mgr_ssn );
Q16B:
Dept of Computer Science
Page 88
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
One way to write this query is shown in Q7, where we specify two nested correlated
queries; the first selects all DEPENDENT tuples related to an EMPLOYEE, and the
second selects all DEPARTMENT tuples managed by the EMPLOYEE. If at least one
of the first and at least one of the second exists, we select the EMPLOYEE tuple. Can
you rewrite this query using only a single nested query or no nested queries?
The query Q3: Retrieve the name of each employee who works on all the projects
controlled by department number 5 can be written using EXISTS and NOT EXISTS in
SQL systems. We show two ways of specifying this query Q3 in SQL as Q3A and Q3B.
5. Explicit Sets and Renaming of Attributes in SQL
We have seen several queries with a nested query in the WHERE clause. It is
also possibleto use an explicit set of values in the WHERE clause, rather than a
nested query. Such a set is enclosed in parentheses in SQL.
Query 17. Retrieve the Social Security numbers of all employees who work on
project numbers 1, 2, or 3.
Q17:
SELECT DISTINCT Essn
FROM WORKS_ON
WHERE Pno IN (1, 2, 3);
In SQL, it is possible to rename any attribute that appears in the result of a query by
adding the qualifier AS followed by the desired new name. Hence, the AS construct
can be used to alias both attribute and relation names, and it can be used in both the
SELECT and FROM clauses. For example, Q8A shows how query Q8 can be slightly
changed to retrieve the last name of each employee and his or her supervisor, while
renaming the resulting attribute names as Employee_name and Supervisor_name. The
new names will appear as column headers in the query result.
Q8A:
SELECT E.Lname AS Employee_name, S.Lname AS
Supervisor_name
FROM EMPLOYEE AS E, EMPLOYEE AS S
WHERE E.Super_ssn=S.Ssn;
6. Joined Tables in SQL and Outer Joins
The concept of a joined table (or joined relation) was incorporated into SQL
to permit users to specify a table resulting from a join operation in the FROM clause
of a query. This construct may be easier to comprehend than mixing together all the
select and join conditions in the WHERE clause. For example, consider query Q1,
which retrieves the name and address of every employee who works for the
‘Research’ department. It may be easier to specify the join of the EMPLOYEE and
DEPARTMENT relations first, and then to select the desired tuples and attributes.
This can be written in SQL as in Q1A:
Q1A:
SELECT Fname, Lname, Address
Dept of Computer Science
Page 89
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
FROM (EMPLOYEE JOIN DEPARTMENT ON Dno=Dnumber)
WHERE Dname=‘Research’;
The FROM clause in Q1A contains a single joined table. The attributes of such a table
are all the attributes of the first table, EMPLOYEE, followed by all the attributes of
the second table, DEPARTMENT. The concept of a joined table also allows the user
to specify different types of join, such as NATURAL JOIN and various types of
OUTERJOIN. In a NATURAL JOIN on two relations R and S, no join condition is
specified; an implicit EQUIJOIN condition for each pair of attributes with the same
name from R and S is created. Each such pair of attributes is included only once in the
resulting relation.
If the names of the join attributes are not the same in the base relations, it is possible
to rename the attributes so that they match, and then to apply NATURAL JOIN. In
this case, the AS construct can be used to rename a relation and all its attributes in the
FROM clause. This is illustrated in Q1B, where the DEPARTMENT relation is
renamed as DEPT and its attributes are renamed as Dname, Dno (to match the name
of the desired join attribute Dno in the EMPLOYEE table), Mssn, and Msdate. The
implied join condition for this NATURAL JOIN is EMPLOYEE.Dno=DEPT.Dno,
because this is the only pair of attributes with the same name after renaming:
Q1B:
SELECT Fname, Lname, Address FROM (EMPLOYEE NATURAL
JOIN (DEPARTMENT AS DEPT (Dname, Dno, Mssn, Msdate)))
WHERE Dname=‘Research’;
The default type of join in a joined table is called an inner join, where a tuple
isincluded in the result only if a matching tuple exists in the other relation. For
example, in query Q8A, only employees who have a supervisor are included in the
result; an EMPLOYEE tuple whose value for Super_ssn is NULL is excluded. If the
user requires that all employees be included, an OUTER JOIN must be used explicitly
In SQL, this is handled by explicitly specifying the keyword OUTER JOIN in a joined
table, as illustrated in Q8B:
Q8B: SELECT E.Lname AS Employee_name,
S.Lname AS Supervisor_name
FROM (EMPLOYEE AS E LEFT OUTER JOIN EMPLOYEE AS S
ON E.Super_ssn=S.Ssn);
There are a variety of outer join operations,
In SQL, the options available for specifying joined tables include INNER JOIN (only
pairs of tuples that match the join condition are retrieved, same as JOIN), LEFT
OUTER JOIN (every tuple in the left table must appear in the result; if it does not
have a matching tuple, it is padded with NULL values for the attributes of the right
table), RIGHT OUTER JOIN (every tuple in the right table must appear in the result;
if it does not have a matching tuple, it is padded with NULL values for the attributes
of the left table), and FULL OUTER JOIN. In the latter three options, the keyword
Dept of Computer Science
Page 90
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
OUTER may be omitted. If the join attributes have the same name, one can also
specify the natural join variation of outer joins by using the keyword NATURAL
before the operation (for example, NATURAL LEFT OUTER JOIN). The keyword
CROSS JOIN is used to specify the CARTESIAN PRODUCT operation , although
this should be used only with the utmost care because it generates all possible tuple
combinations.
It is also possible to nest join specifications; that is, one of the tables in a join may
itself be a joined table. This allows the specification of the join of three or more tables
as a single joined table, which is called a multiway join. For example, Q2A is a
different way of specifying query Q2 from previous Section using the concept of a
joined table:
Q2A:
SELECT Pnumber, Dnum, Lname, Address, Bdate
FROM ((PROJECT JOIN DEPARTMENT ON Dnum=Dnumber)
JOIN EMPLOYEE ON Mgr_ssn=Ssn)
WHERE Plocation=‘Stafford’;
Not all SQL implementations have implemented the new syntax of joined tables. In
some systems, a different syntax was used to specify outer joins by using the
comparison operators +=, =+, and +=+ for left, right, and full outer join, respectively,
when specifying the join condition. For example, this syntax is available in Oracle.
To specify the left outer join in Q8B using this syntax, we could write the query Q8C
as follows:
Q8C:
SELECT E.Lname, S.Lname
FROM EMPLOYEE E, EMPLOYEE S
WHERE E.Super_ssn += S.Ssn;
7. Aggregate Functions in SQL
Aggregate functions are used to summarize information from multiple tuples
into a single-tuple summary. Grouping is used to create subgroups of tuples before
summarization. Grouping and aggregation are required in many database applications,
and we will introduce their use in SQL through examples.
A number of built-in aggregate functions exist: COUNT, SUM, MAX, MIN, and
AVG.2 The COUNT function returns the number of tuples or values as specified in a
query. The functions SUM, MAX, MIN, and AVG can be applied to a set or multiset
of numeric values and return, respectively, the sum, maximum value, minimum value,
and average (mean) of those values. These functions can be used in the SELECT clause
or in a HAVING clause (which we introduce later). The functions MAX and
MIN can also be used with attributes that have nonnumeric domains if the domain
values have a total ordering among one another.3We illustrate the use of these functions
with sample queries.
Dept of Computer Science
Page 91
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
Query 19. Find the sum of the salaries of all employees, the maximum salary, the
minimum salary, and the average salary.
Q19:
SELECT SUM (Salary), MAX (Salary), MIN (Salary), AVG (Salary)
FROM EMPLOYEE;
If we want to get the preceding function values for employees of a specific
department— say, the ‘Research’ department—we can write Query 20, where the
EMPLOYEE tuples are restricted by the WHERE clause to those employees who work
for the ‘Research’ department.
Query 20. Find the sum of the salaries of all employees of the ‘Research’ department,
as well as the maximum salary, the minimum salary, and the average salary in this
department.
Q20:
SELECT SUM (Salary), MAX (Salary), MIN (Salary), AVG (Salary)
FROM (EMPLOYEE JOIN DEPARTMENT ON Dno=Dnumber)
WHERE Dname=‘Research’;
Queries 21 and 22. Retrieve the total number of employees in the company (Q21) and
the number of employees in the ‘Research’ department (Q22).
Q21:
SELECT COUNT (*)
FROM EMPLOYEE;
Q22:
SELECT COUNT (*)
FROM EMPLOYEE, DEPARTMENT
WHERE DNO=DNUMBER AND DNAME=‘Research’;
Here the asterisk (*) refers to the rows (tuples), so COUNT (*) returns the number of
rows in the result of the query. We may also use the COUNT function to count values
in a column rather than tuples, as in the next example.
Query 23. Count the number of distinct salary values in the database.
Q23:
SELECT COUNT (DISTINCT Salary)
FROM EMPLOYEE;
If we write COUNT (SALARY) instead of COUNT(DISTINCT SALARY) in Q23,
then duplicate values will not be eliminated. However, any tuples with NULL for
SALARY will not be counted. In general, NULL values are discarded when aggregate
functions are applied to a particular column (attribute).
The preceding examples summarize a whole relation (Q19, Q21, Q23) or a selected
subset of tuples (Q20, Q22), and hence all produce single tuples or single values.
They illustrate how functions are applied to retrieve a summary value or summary
tuple from the database. These functions can also be used in selection conditions
involving nested queries.We can specify a correlated nested query with an aggregate
function, and then use the nested query in the WHERE clause of an outer query. For
example, to retrieve the names of all employees who have two or more dependents
(Query 5), we can write the following:
Q5:
SELECT Lname, Fname
Dept of Computer Science
Page 92
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
FROM EMPLOYEE
WHERE (SELECT COUNT (*)
FROM DEPENDENT
WHERE Ssn=Essn ) >= 2;
The correlated nested query counts the number of dependents that each employee has;
if this is greater than or equal to two, the employee tuple is selected.
8. Grouping: The GROUP BY and HAVING Clauses
In many cases we want to apply the aggregate functions to subgroups of tuples in a
relation, where the subgroups are based on some attribute values. For example, we may
want to find the average salary of employees in each department or the number of
employees who work on each project. In these cases we need to partition the relation
into non overlapping subsets (or groups) of tuples. Each group (partition) will consist
of the tuples that have the same value of some attribute(s), called the grouping
attribute(s). We can then apply the function to each such group independently to
produce summary information about each group. SQL has a GROUP BY clause for
this purpose. The GROUP BY clause specifies the grouping attributes, which should
also appear in the SELECT clause, so that the value resulting from applying each
aggregate function to a group of tuples appears along with the value of the grouping
attribute(s).
Query 24. For each department, retrieve the department number, the number
of employees in the department, and their average salary.
Q24:
SELECT Dno, COUNT (*), AVG (Salary)
FROM EMPLOYEE
GROUP BY Dno;
Query 25. For each project, retrieve the project number, the project name, and
the number of employees who work on that project.
Q25:
SELECT Pnumber, Pname, COUNT (*)
FROM PROJECT, WORKS_ON
WHERE Pnumber=Pno
GROUP BY Pnumber, Pname;
Q25 shows how we can use a join condition in conjunction with GROUP BY. In this
case, the grouping and functions are applied after the joining of the two relations.
Sometimes we want to retrieve the values of these functions only for groups that satisfy
certain conditions. For example, suppose that we want to modify Query 25 so that only
projects with more than two employees appear in the result. SQL provides
a HAVING clause, which can appear in conjunction with a GROUP BY clause, for this
purpose. HAVING provides a condition on the summary information regarding the
group of tuples associated with each value of the grouping attributes. Only the groups
Dept of Computer Science
Page 93
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
that satisfy the condition are retrieved in the result of the query. This is illustrated by
Query 26.
Query 26. For each project on which more than two employees work, retrieve
the project number, the project name, and the number of employees who work
on the project.
Q26:
SELECT Pnumber, Pname, COUNT (*)
FROM PROJECT, WORKS_ON
WHERE Pnumber=Pno
GROUP BY Pnumber, Pname
HAVING COUNT (*) > 2;
Notice that while selection conditions in the WHERE clause limit the tuples to which
functions are applied, the HAVING clause serves to choose whole groups.
Query 27. For each project, retrieve the project number, the project name, and
the number of employees from department 5 who work on the project.
Q27:
SELECT Pnumber, Pname, COUNT (*)
FROM PROJECT, WORKS_ON, EMPLOYEE
WHERE Pnumber=Pno AND Ssn=Essn AND Dno=5
GROUP BY Pnumber, Pname;
Query 28. For each department that has more than five employees, retrieve
the department number and the number of its employees who are making
more than $40,000.
Q28:
SELECT Dnumber, COUNT (*)
FROM DEPARTMENT, EMPLOYEE
WHERE Dnumber=Dno AND Salary>40000 AND
( SELECT Dno
FROM EMPLOYEE
GROUP BY Dno
HAVING COUNT (*) > 5)
Views (Virtual Tables) in SQL
1. Concept of a View in SQL
A view in SQL terminology is a single table that is derived from other tables.6 These
other tables can be base tables or previously defined views. A view does not necessarily
exist in physical form; it is considered to be a virtual table, in contrast to base tables,
whose tuples are always physically stored in the database. This limits the possible
update operations that can be applied to views, but it does not provide any limitations
on querying a view.
We can think of a view as a way of specifying a table that we need to reference
frequently, even though it may not exist physically. For example, referring to the
Dept of Computer Science
Page 94
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
COMPANY database in Figure 3.5 we may frequently issue queries that retrieve the
employee name and the project names that the employee works on. Rather than having
to specify the join of the three tables EMPLOYEE, WORKS_ON, and PROJECT every
time we issue this query, we can define a view that is specified as the result of these
joins. Then we can issue queries on the view, which are specified as single table
retrievals rather than as retrievals involving two joins on three tables. We call the
EMPLOYEE,WORKS_ON, and PROJECT tables the defining tables of the view.
2. Specification of Views in SQL
In SQL, the command to specify a view is CREATE VIEW. The view is given a
(virtual) table name (or view name), a list of attribute names, and a query to specify the
contents of the view. If none of the view attributes results from applying functions or
arithmetic operations, we do not have to specify new attribute names for the view, since
they would be the same as the names of the attributes of the defining tables in the default
case.
V1:
CREATE VIEW WORKS_ON1
AS SELECT Fname, Lname, Pname, Hours
FROM EMPLOYEE, PROJECT, WORKS_ON
WHERE Ssn=Essn AND Pno=Pnumber;
CREATE VIEW DEPT_INFO(Dept_name, No_of_emps, Total_sal)
AS SELECT Dname, COUNT (*), SUM (Salary)
FROM DEPARTMENT, EMPLOYEE
WHERE Dnumber=Dno
GROUP BY Dname;
In V1, we did not specify any new attribute names for the view WORKS_ON1
(although we could have); in this case,WORKS_ON1 inherits the names of the view
attributes from the defining tables EMPLOYEE, PROJECT, and WORKS_ON.View
V2 explicitly specifies new attribute names for the view DEPT_INFO, using a one-toone correspondence between the attributes specified in the CREATE VIEW clause and
those specified in the SELECT clause of the query that defines the view.
We can now specify SQL queries on a view—or virtual table—in the same way we
specify queries involving base tables. For example, to retrieve the last name and first
name of all employees who work on the ‘ProductX’ project, we can utilize the
WORKS_ON1 view and specify the query as in QV1:
QV1:
SELECT Fname, Lname
FROM WORKS_ON1
WHERE Pname=‘ProductX’;
V2:
Dept of Computer Science
Page 95
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
If we do not need a view any more, we can use the DROP VIEW command to dispose
of it. For example, to get rid of the view V1, we can use the SQL statement in V1A:
V1A:
DROP VIEW WORKS_ON1;
Schema Change Statements in SQL
1 .The DROP Command
The DROP command can be used to drop named schema elements, such as tables,
domains, or constraints. One can also drop a schema. For example, if a whole schema
is no longer needed, the DROP SCHEMA command can be used. There are two drop
behavior options: CASCADE and RESTRICT. For example, to remove the
COMPANY database schema and all its tables, domains, and other elements, the
CASCADE option is used as follows:
DROP SCHEMA COMPANY CASCADE;
If the RESTRICT option is chosen in place of CASCADE, the schema is dropped only
if it has no elements in it; otherwise, the DROP command will not be executed. To use
the RESTRICT option, the user must first individually drop each element in the schema,
then drop the schema itself.
If a base relation within a schema is no longer needed, the relation and its definition can
be deleted by using the DROP TABLE command.
DROP TABLE DEPENDENT CASCADE;
If the RESTRICT option is chosen instead of CASCADE, a table is dropped only if it
is not referenced in any constraints (for example, by foreign key definitions in another
relation) or views (see Section 5.3) or by any other elements. With the
CASCADE option, all such constraints, views, and other elements that reference the
table being dropped are also dropped automatically from the schema, along with the
table itself.
Notice that the DROP TABLE command not only deletes all the records in the table if
successful, but also removes the table definition from the catalog. If it is desired to
delete only the records but to leave the table definition for future use, then the
DELETE command should be used instead of DROP TABLE.
The DROP command can also be used to drop other types of named schema elements,
such as constraints or domains.
2. The ALTER Command
The definition of a base table or of other named schema elements can be changed by
using the ALTER command. For base tables, the possible alter table actions include
adding or dropping a column (attribute), changing a column definition, and adding or
Dept of Computer Science
Page 96
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
dropping table constraints. For example, to add an attribute for keeping track of jobs of
employees to the EMPLOYEE base relation in the COMPANY schema, we can use the
command
ALTER TABLE COMPANY.EMPLOYEE
ADD COLUMN JobVARCHAR(12);
We must still enter a value for the new attribute Job for each individual EMPLOYEE
tuple. This can be done either by specifying a default clause or by using the UPDATE
command individually on each tuple (see Section 4.4.3). If no default clause is specified,
the new attribute will have NULLs in all the tuples of the relation immediately after the
command is executed; hence, the NOT NULL constraint is not allowed in this case.
To drop a column, we must choose either CASCADE or RESTRICT for drop behavior.
It is also possible to alter a column definition by dropping an existing default
clause or by defining a new default clause. The following examples illustrate this clause:
ALTER TABLE COMPANY.DEPARTMENT ALTER COLUMN Mgr_ssn
DROP DEFAULT;
ALTER TABLE COMPANY.DEPARTMENT ALTER COLUMN Mgr_ssn
SET DEFAULT ‘333445555’;
Additional Features of SQL
■ SQL features: various techniques for specifying complex retrieval queries, including
nested queries, aggregate functions, grouping, joined tables, outer joins, and recursive
queries; SQL views, triggers, and assertions; and commands for schema modification.
■ SQL has various techniques for writing programs in various programming languages
that include SQL statements to access one or more databases.
These include embedded (and dynamic) SQL, SQL/CLI (Call Level
Interface) and its predecessor ODBC (Open Data Base Connectivity), and SQL/PSM
(Persistent Stored Modules).
■ Each commercial RDBMS will have, in addition to the SQL commands, a set of
commands for specifying physical database design parameters, file structures for
relations, and access paths such as indexes. We called these commands a storage
definition language (SDL) Earlier versions of
SQL had commands for creating indexes, but these were removed from the language
because they were not at the conceptual schema level. Many systems still have the
CREATE INDEX commands.
■ SQL has transaction control commands. These are used to specify units of database
processing for concurrency control and recovery purposes.
■ SQL has language constructs for specifying the granting and revoking of privileges
to users. Privileges typically correspond to the right to use certain SQL commands to
Dept of Computer Science
Page 97
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
access certain relations. Each relation is assigned an owner, and either the owner or the
DBA staff can grant to selected users the privilege to use an SQL statement—such as
SELECT INSERT, DELETE, or UPDATE— to access the relation. In addition, the
DBA staff can grant the privileges to create schemas, tables, or views to certain users.
These SQL commands—called GRANT and REVOKE—
■ SQL has language constructs for creating triggers. These are generally referred to as
active database techniques, since they specify actions that are automatically triggered
by events such as database updates.
■ SQL has incorporated many features from object-oriented models to have more
powerful capabilities, leading to enhanced relational systems known as objectrelational. Capabilities such as creating complex-structured attributes (also called
nested relations), specifying abstract data types (called UDTs or user-defined types)
for attributes and tables, creating object identifiers for referencing tuples, and
specifying operations on types
■ SQL and relational databases can interact with new technologies such as
XML and OLAP.
Rollback, Commit and Savepoint
All these statements fall in the category of Transaction Control Statements.
Rollback:
This is used for undoing the work done in the current transaction. This command also
releases the locks if any hold by the current transaction. The command used in SQL for
this is simply:
ROLLBACK;
Savepoint:
This is used for identifying a point in the transaction to which a programmer can later
roll back. That is it is possible for the programmer to divide a big transaction into
subsections each having a savepoint defined in it. The command used in SQL for this
is simply:
SAVEPOINT savepointname;
For example:
UPDATE…..
DELETE….
SAVEPOINT e1;
INSERT….
UPDATE….
SAVEPOINT e2;
……
Dept of Computer Science
Page 98
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
It is also possible to define savepoint and rollback together so that programmer can
achieve rollback of part o a transaction. Say for instance in the above
ROLLBACK TO SAVEPOINT e2;
This results in the rollback of all statements after savepoint e2
Commit:
This is used to end the transaction and make the changes permanent. When commit is
performed all save points are erased and transaction locks are released. In other words
commit ends a transaction and marks the beginning of a new transaction. The command
used in SQL for this is simply:
COMMIT;
An Example to Illustrate Granting and Revoking of Privileges
Suppose that the DBA creates four accounts—A1, A2, A3, and A4—and wants only
A1 to be able to create base relations. To do this, the DBA must issue the following
GRANT command in SQL:
GRANT CREATETAB TO A1;
The CREATETAB (create table) privilege gives account A1 the capability to create
new database tables (base relations) and is hence an account privilege. This privilege
was part of earlier versions of SQL but is now left to each individual system
implementation to define.
In SQL2 the same effect can be accomplished by having the DBA issue a CREATE
SCHEMA command, as follows:
CREATE SCHEMA EXAMPLE AUTHORIZATION A1;
User account A1 can now create tables under the schema called EXAMPLE. To
continue our example, suppose that A1 creates the two base relations EMPLOYEE and
DEPARTMENT shown in Figure 24.1; A1 is then the owner of these two relations and
hence has all the relation privileges on each of them.
Next, suppose that account A1 wants to grant to account A2 the privilege to insert
and delete tuples in both of these relations.However, A1 does not want A2 to be able
to propagate these privileges to additional accounts. A1 can issue the following
command:
GRANT INSERT, DELETE ON EMPLOYEE, DEPARTMENT TO A2;
Notice that the owner account A1 of a relation automatically has the GRANT OPTION,
allowing it to grant privileges on the relation to other accounts. However, account A2
cannot grant INSERT and DELETE privileges on the EMPLOYEE and
DEPARTMENT tables because A2 was not given the GRANT OPTION in the
preceding command.
Now suppose that A1 decides to revoke the SELECT privilege on the EMPLOYEE
relation from A3; A1 then can issue this command:
REVOKE SELECT ON EMPLOYEE FROM A3;
Dept of Computer Science
Page 99
Downloaded by jazz dude (it23092000@gmail.com)
lOMoARcPSD|25829564
NIE FIRST GRADE COLLEGE
The DBMS must now revoke the SELECT privilege on EMPLOYEE from A3, and it
must also automatically revoke the SELECT privilege on EMPLOYEE from A4. This
is because A3 granted that privilege to A4, but A3 does not have the privilege any
more.
Dept of Computer Science
Page 100
Downloaded by jazz dude (it23092000@gmail.com)
Download