DBMS STUDY MATERIAL - E

DATABASE MANAGEMENT SYSTEM
Unit- I
Database System Architecture - Basic Concepts : Data System, Operational Data,
Data Independence, Architecture for a Database System, Distributed Databases,
Storage Structures : Representation of Data. Data Structures and Corresponding
Operators: Introduction, Relation Approach, Hierarchical Approach, Network
Approach.
Unit – II
Relational Approach : Relational Data Structure : Relation, Domain, Attributes,
Keys. Relational Algebra - Introduction, Traditional Set Operations, Attribute
names for derived relations - Special Relational Operations.
Unit – III
Embedded SQL: Introduction – Operations not involving cursors, involving
cursors – Dynamic statements, Query by Example – Retrieval operations, Built-in
Functions, update operations - QBE Dictionary. Normalization : Functional
dependency, First, Second, Third normal forms, Relations with more than one
candidate key, Good and bad decomposition.
Unit – IV
Hierarchical Approach : IMS data structure - Physical Database, Database
Description- Hierarchical sequence - External level of IMS : Logical Databases,
the program communication block IMS Data manipulation : Defining the Program
communication Block : DL / 1 Examples.
Unit – V
Network Approach : Architecture of DBTG System. DBTG Data Structure : The
set construct, Singular sets, Sample Schema, the external level of DBTG – DBTG
Data Manipulation
UNIT -I
1.Database system
A database system is essentially a computer-based record-keeping system, that is,
a system whose overall purpose is to record and maintain information. The
information concerned can be anything that is deemed to be of significance to the
organization the system serves, in particular anything that supports the decision-making
processes involved in the management of that organization.
A database system involves four major components: data, hardware,
software and users.
Fig: Simplified picture of a database system (end users and application programs
access the database through the Database Management System)
Data
The data stored in the system is partitioned into one or more databases. A database
is a repository for stored data that is both integrated and shared.
Integrated: By integrated we mean that the database can be thought of as a
unification of several distinct files, with the redundancy among those files
eliminated.
Example: Combining the EMPLOYEE and ENROLLMENT data files into one database.
Shared: By shared we mean that individual pieces of data in the database can be
shared among different users, that is, many users can have access to the same piece of
data.
Example: The department information in the EMPLOYEE file would be shared by
users in the personnel department, the education department, etc.
Hardware
The hardware consists of the secondary storage devices (disks, drums, etc.) on which
the database resides, together with the associated devices, control units, channels and
so forth.
Software
Between the physical database and the users of the system is a layer of software,
usually called the DBMS. All requests from users for access to the database are
handled by the DBMS. One general function provided by the DBMS is thus the
shielding of database users from hardware-level detail. The DBMS provides a view of
the database that is elevated somewhat above the hardware level and supports user
operations that are expressed in terms of that higher-level view.
Users
We consider three broad categories of database users:
*application programmers
*end-users
*DBA
1.Application programmers
An application programmer is responsible for writing application programs that use
the database. These application programs operate on the data in all the usual ways:
retrieving information, creating new information, and deleting or changing
existing information.
2.End-users
End-users access the database from a terminal. An end-user may employ a
query language provided as an integral part of the system, or may invoke a
user-written application program that accepts commands from the terminal and in turn
issues requests to the DBMS on the end-user's behalf.
3.Database Administrator
A DBMS provides central control of both the data and the programs that
access those data. The person who has such control over the system is called the
DBA. The main functions of the DBA are
*Schema definition
*Storage structure and access-method definition
*Schema and physical-organization modification
*Granting of authorization for data access
*Integrity-constraint specification
These are the various components of a database system.
2.Operational data
A database is a collection of stored operational data used by the application systems of
some particular enterprise. Here enterprise is a generic term for any
reasonably self-contained commercial, scientific, technical or other organization.
Examples: manufacturing company, bank, hospital, university, government department.
An enterprise must maintain a lot of data about its operation. The operational data
for the enterprises listed above are, respectively,
product data, account data, patient data, student data and planning data.
Example for the illustration of operational data
Consider a manufacturing company. The enterprise will wish to
retain information about the projects it has on hand, the parts used in those projects, the
suppliers who supply the parts, the warehouses in which the parts are stored, the
employees who work on the projects, and so on. These are the basic entities about which data is
recorded in the database (an entity is any distinguishable object). In general there will be
associations or relationships linking the basic entities together.
For example, there is an association between suppliers and parts: each supplier
supplies certain parts, and conversely each part is supplied by certain suppliers.
Fig: An example of operational data (entities shown: projects, suppliers, parts,
warehouses, locations, employees, departments)
The figure illustrates the following points.
1.Most associations connect two types of entity, but some connect more than two.
Ex., the arrow connecting suppliers, parts and projects:
supplier s2 supplies part p4 to project j3.
2.One arrow involves only one type of entity (parts).
Ex., some parts are components of other parts (a screw is a component of a larger
assembly, such as a chair).
3.Some entities may be associated in more than one relationship.
Ex., projects and employees are linked in two relationships:
a. the employee works on the project
b. the employee is the manager of the project
This example illustrates what operational data is and how its entities are related.
3.Data Independence
Data independence is the ability to modify a schema definition at one level without
affecting a schema definition at the next higher level.
Most present-day applications are data-dependent. This means that the way in which the
data is organized in secondary storage and the way in which it is accessed are both
dictated by the requirements of the application, and moreover that knowledge of the data
organization and access technique is built into the application logic.
For example, if a file is stored in indexed sequential form, the application must know
that the index exists and how the file is sequenced. The application is then data-dependent,
and changing the storage structure requires the application program to be rewritten.
In a database system, applications are insulated from such changes: a modification made at
the physical level or the conceptual level need not affect the applications that use the database.
There are two types of data independence.
1.Physical data independence
Physical data independence is the ability to modify the physical schema
without causing application programs to be rewritten. Modifications at the physical
level are occasionally necessary to improve performance.
Example,
Changing how a file is stored or accessed, such as adding an index.
2.Logical data independence
Logical data independence is the ability to modify the logical schema without
causing the application programs to be rewritten.
Example,
Modifications such as adding new columns or fields to the database.
Most such modifications are made by the DBA. The types of change that the
DBA may wish to make can be explained with the help of the following definitions:
Stored field: A stored field is the smallest unit of data stored in the database.
Ex., a database containing information about parts would probably include a stored field
type called part number.
Stored record: A stored record is a named collection of associated stored fields.
Stored file: A stored file is the collection of all occurrences of one type of stored record.
Changes to the stored representation, such as changing the data type of a stored field,
are also made by the DBA. The aspects of the stored representation that may change
include the following.
1.Representation of numeric data
A numeric field may be stored in internal arithmetic form or as a character string.
2.Representation of character data
A character field may be stored in any of several character codes (e.g. EBCDIC, ASCII).
3.Units for numeric data
The units in a numeric field may change. Ex., from inches to centimeters.
4.Data coding
In some situations it may be desirable to represent data in storage by coded values.
Ex., the part color RED may be stored as the code 1 (1 = 'RED').
5.Structure of stored records
Two existing types of stored record may be combined into one. For ex., the record
types (part number, color) and (part number, weight) may be integrated to give (part
number, color, weight).
Also, a single type of stored record may be split into two. For ex., (part
number, color, weight) may be broken down into (part number, color) and (part number,
weight).
6.Structure of stored files
A given stored file may be physically implemented in storage in a wide variety of
ways.
For ex., the file may be stored in a single storage volume or spread across several volumes.
Because applications are insulated from all of these changes, the database is able to
grow and be reorganized without affecting existing applications.
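The insulation described above can be sketched in Python. This is only an illustration under stated assumptions, not how any particular DBMS is built: the names `StoredPart`, `COLOR_CODES` and `conceptual_part` are hypothetical, and the stored record is assumed to hold the color as a numeric code and the length in inches, while applications see a color name and centimeters.

```python
from dataclasses import dataclass

# Hypothetical stored (internal) representation: coded color, length in inches.
COLOR_CODES = {1: "RED", 2: "BLUE", 3: "GREEN"}
CM_PER_INCH = 2.54


@dataclass
class StoredPart:
    part_number: str
    color_code: int       # data coding: 1 = 'RED'
    length_inches: float  # units chosen at the storage level


def conceptual_part(stored: StoredPart) -> dict:
    """Map a stored record to the form that applications see."""
    return {
        "part_number": stored.part_number,
        "color": COLOR_CODES[stored.color_code],
        "length_cm": round(stored.length_inches * CM_PER_INCH, 2),
    }


part = conceptual_part(StoredPart("P1", 1, 10.0))
print(part["color"], part["length_cm"])
```

If the DBA later decides to store lengths in centimeters, or to use a different color coding, only `conceptual_part` has to change; application code that reads `part["color"]` or `part["length_cm"]` is not affected.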
4.Architecture for a Database system
The architecture is divided into three general levels: internal, conceptual and external.
External level (individual user views)
Conceptual level (community user view)
Internal level (storage view)
Fig: Three levels of architecture
*Internal level (physical level)
This level is the one closest to physical storage. It is a low-level representation
of the entire database; it consists of many occurrences of each of many types of internal
record. The storage view is described by means of the internal schema, which not only
defines the various stored record types but also specifies what indexes exist, how stored
files are represented, what physical sequence the stored records are in, and so on.
*Conceptual level (community logical level)
This level is a representation of the entire information content of the database. It
consists of many occurrences of each of many types of conceptual record. It is also a
level of indirection between the other two levels.
*External level (user logical level)
This level is closest to the users and is concerned with the way the data is seen by
individual users. The users may be application programmers, end-users or the DBA. Each
user has a language at his or her disposal to interact with the database.
For the application programmer the language will be a conventional
programming language such as C++ or Java.
For end users the language will be either a query language or some special-purpose
language. In either case the language includes a data sublanguage (DSL), that is, a subset of the
total language that is concerned with database objects and operations. The DSL is
embedded within the corresponding host language. A given system might support any
number of host languages and any number of data sublanguages; however, one particular
data sublanguage that is supported by almost all current systems is SQL.
Any given data sublanguage is a combination of at least two subordinate languages: a
data definition language (DDL) and a data manipulation language (DML). The DDL
portion consists of declarative constructs and the DML portion consists of executable
statements.
The individual user will generally be interested only in some portion of the total
database; moreover, that user's view of that portion will generally be somewhat abstract
when compared with the way the data is physically stored. The term for an individual
user's view is an external view. An external view is thus the content of the database as
seen by some particular user.
For example,
a user from the Personnel Department might view the details of employees and
departments and nothing else.
Detailed System architecture
Fig: Database system architecture. Users A1 and A2 share external view A and users B1
and B2 share external view B; each user works in a host language plus DSL through the
user interface. External schemas A and B describe the two views, and each view is related
to the conceptual view (conceptual schema) by its own external/conceptual mapping.
The conceptual view is related to the stored database (internal level), which is described
by the storage structure definition (internal schema), through the conceptual/internal
mapping. The database management system (DBMS) spans all of these levels.
Mappings
The mappings involved in the architecture are the conceptual/internal mapping and
the external/conceptual mappings.
The conceptual/internal mapping defines the correspondence between the conceptual
view and the stored database; it specifies how conceptual records and fields are represented
at the internal level. If the structure of the stored database is changed, then the
conceptual/internal mapping must be changed accordingly, so that the conceptual schema
can remain invariant. The effects of such changes must be isolated below the conceptual
level in order to preserve physical data independence.
An external/conceptual mapping defines the correspondence between a particular
external view and the conceptual view.
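An external/conceptual mapping can be pictured as a function that derives a user's view from conceptual records. The following Python sketch is illustrative only: the sample employee records and the helper name `external_view` are assumptions, and a real DBMS would hold such a mapping as part of its schema definitions rather than in application code.

```python
# Conceptual records: the community view of employees (hypothetical sample data).
EMPLOYEES = [
    {"emp_no": "E1", "name": "Lopez", "dept": "D1", "salary": 40000},
    {"emp_no": "E2", "name": "Cheng", "dept": "D2", "salary": 42000},
]


def external_view(records, visible_fields, renames=None):
    """Apply an external/conceptual mapping: keep some fields, optionally rename them."""
    renames = renames or {}
    return [
        {renames.get(field, field): record[field] for field in visible_fields}
        for record in records
    ]


# A user in the Personnel Department sees employee and department details
# but not salaries, and sees the field "dept" under the name "department".
personnel_view = external_view(
    EMPLOYEES, ["emp_no", "name", "dept"], renames={"dept": "department"}
)
print(personnel_view[0])
```

If a field is renamed at the conceptual level, only the mapping arguments need to change; the personnel user continues to see a field called `department`.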
Database administrator(DBA)
The Data Administrator (DA) is the person who makes the strategic and policy
decisions regarding the data of the enterprise, and the DBA is the person who provides the
necessary technical support for implementing those decisions. Thus the DBA is
responsible for the overall control of the system at the technical level. The major tasks of
the DBA are
*defining the conceptual schema or schema definition
*storage structures and access-method definition
*schema and physical organization modification
*granting of authorization for data access
*integrity constraint specification
DBMS
The DBMS is the software that handles all access to the database. Conceptually, a request
is processed as follows:
*A user issues an access request using some particular data sublanguage.
*The DBMS intercepts that request and analyses it.
*The DBMS, in turn, inspects the external schema for that user, the corresponding
external/conceptual mapping, the conceptual schema, the conceptual/internal
mapping, and the storage structure definition.
*The DBMS executes the necessary operations on the stored database.
The major functions of the DBMS and its components can be represented diagrammatically.
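Fig: Major functions and components of the DBMS (source schemas and mappings are
handled by the DDL processors; planned DML requests are handled by the DML processor
and unplanned DML requests by the query language processor, giving compiled requests;
the DBMS enforces security and integrity constraints)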
5.Distributed databases
The key objective of a distributed system is that it should look like a centralized system to
the users. Distributed processing means that distinct machines can be connected together
into a communication network, such as the Internet, so that a single data-processing task
can span several machines in the network.
A distributed database is a database that is not stored in its entirety at a single
physical location, but rather is spread across a network of computers that are
geographically dispersed and connected through communication links.
For example, consider a banking system in which the customer accounts database is
distributed across the bank's branch offices, such that each individual customer account
record is stored at the customer's local branch. In other words, the data is stored at the
location at which it is most frequently used, but is still available through the communication
network to users at other locations, for example, users at the bank's central office.
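The banking example can be sketched in Python as follows. This is a toy model under stated assumptions: the site names, the `SITES` dictionary and the function `find_balance` are hypothetical, and an in-memory dictionary lookup stands in for what would really be a request over the communication network.

```python
# Each site holds the account records of its own branch (hypothetical data).
SITES = {
    "downtown": {"A-101": 500},
    "brighton": {"A-201": 900, "A-217": 750},
}


def find_balance(account_no, local_site):
    """Look at the local site first, then at the remote sites.

    Returns (balance, site_that_answered). The caller does not need to
    know in advance where the record is stored.
    """
    if account_no in SITES[local_site]:
        return SITES[local_site][account_no], local_site
    for site, accounts in SITES.items():
        if site != local_site and account_no in accounts:
            # In a real system this would be a request sent over the network.
            return accounts[account_no], site
    raise KeyError(account_no)


print(find_balance("A-201", "brighton"))  # answered locally
print(find_balance("A-101", "brighton"))  # answered by a remote site
```

The user at the Brighton site issues the same call in both cases, which is the sense in which a distributed system should look like a centralized one.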
Fig: A distributed database (client/server sites, each with a database, connected by a
communication network)
Advantages
*Efficiency of local processing
*Data sharing
Disadvantages
*Overhead may be quite high
*Technical difficulties
6.Storage structures and their purposes
Data is stored so that it can be referred to in the future, and various techniques exist
for storing and accessing it, such as sequential access and direct access. Once the data is
stored at the internal level (physical storage), users access it through DML operations
expressed in terms of external records. The DBMS converts these into operations on stored
records, which must be converted in
turn to operations at the actual hardware level, that is, to operations on physical records or
blocks. The component responsible for this internal/physical conversion is called an
access method. The access method consists of a set of routines whose function is to
conceal all device-dependent details from the DBMS and to present the DBMS with a
stored record interface.
Fig: The stored record interface (the user sees external record occurrences through the
user interface; the DBMS sees stored record occurrences through the stored record
interface; the access method sees physical record occurrences through the physical
record interface)
The stored record interface thus corresponds to the internal level, just as the user interface
corresponds to the external level. The stored record interface also allows the DBMS to view
the storage structure as a collection of stored files, each consisting of all occurrences of
one type of stored record. The DBMS knows
*what stored files exist
*the structure of the corresponding stored records
*the stored fields on which each file is sequenced
*the stored fields that can be used for direct access
This information is specified as part of the storage structure definition.
The DBMS does not know
a)anything about physical records
b)how sequencing is performed
c)how direct access is performed
This information is specified to the access method, not to the DBMS.
Also, when a new stored record occurrence is first created and entered into the database,
the access method is responsible for assigning it a unique stored record
address (SRA). This value distinguishes the occurrence from all other stored record
occurrences. The SRA
for a particular occurrence is returned to the DBMS by the access method when the
occurrence is first created, and may be used by the DBMS for subsequent direct access to
the occurrence concerned. The SRA for a given occurrence does not change until the
occurrence is physically moved as part of a database reorganization.
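The division of labour between the DBMS and the access method can be sketched in Python. This is a toy model, not a real storage engine: the class `AccessMethod` and its methods are hypothetical, SRAs are assumed to be simple integers, and a dictionary stands in for physical records on disk.

```python
class AccessMethod:
    """Toy access method: hides physical placement behind SRAs."""

    def __init__(self):
        self._blocks = {}   # SRA -> stored record (stands in for disk blocks)
        self._next_sra = 0

    def create(self, record):
        """Store a new record occurrence and return its unique SRA."""
        sra = self._next_sra
        self._next_sra += 1
        self._blocks[sra] = dict(record)
        return sra

    def fetch(self, sra):
        """Direct access to one stored record occurrence by its SRA."""
        return self._blocks[sra]


am = AccessMethod()
sra_s1 = am.create({"S#": "S1", "Sname": "Smith", "Status": 20, "City": "London"})
sra_s2 = am.create({"S#": "S2", "Sname": "Jones", "Status": 10, "City": "Paris"})

# The DBMS keeps only the SRA and uses it later for direct access.
print(am.fetch(sra_s2)["Sname"])
```

The caller, playing the role of the DBMS, never touches `_blocks`; it sees stored records and SRAs only, which mirrors the stored record interface described above.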
7.How is data stored in physical storage?
There are various possible representations of data in storage, and some of
them are explained here. Consider the following example.
S#    Sname    Status    City
S1    Smith    20        London
S2    Jones    10        Paris
S3    Blake    30        Paris
S4    Clark    20        London
S5    Adams    30        Athens
The table contains information about five suppliers. For each supplier a supplier number,
a supplier name, a status value and a location are recorded. The supplier number for
each supplier is unique, and the records are sequenced on this primary key.
The above example is the simplest form of data representation, containing only five
record occurrences, each with a unique supplier number. If there were 10000 suppliers rather than
five, located in only 10 different cities, storage would be wasted by repeating the
10 city names across 10000 supplier records. Instead, the city attribute can be separated
into a file of its own, with a pointer from each record in the supplier file to
the corresponding record in the city file.
The following is this second form of data representation.
Supplier file
S#    Sname    Status    City-ptr
S1    Smith    20        -> London
S2    Jones    10        -> Paris
S3    Blake    30        -> Paris
S4    Clark    20        -> London
S5    Adams    30        -> Athens

City file
City
Athens
London
Paris
In the above figure the pointers run from the supplier file to the city file, and they are
SRAs (stored record addresses). The advantage of this form of representation over the previous
one is that it saves storage space.
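The pointer representation can be sketched in Python using the supplier data from the example. In this sketch, which is an illustration only, a position in the list `city_file` plays the role of an SRA, and the names `city_ptr` and `city_of` are hypothetical.

```python
# City file: each city is stored once; its list position plays the role of an SRA.
city_file = ["Athens", "London", "Paris"]

# Supplier file: the City field is replaced by a pointer into the city file.
supplier_file = [
    {"S#": "S1", "Sname": "Smith", "Status": 20, "city_ptr": 1},
    {"S#": "S2", "Sname": "Jones", "Status": 10, "city_ptr": 2},
    {"S#": "S3", "Sname": "Blake", "Status": 30, "city_ptr": 2},
    {"S#": "S4", "Sname": "Clark", "Status": 20, "city_ptr": 1},
    {"S#": "S5", "Sname": "Adams", "Status": 30, "city_ptr": 0},
]


def city_of(supplier):
    """Follow the pointer from a supplier record to its city record."""
    return city_file[supplier["city_ptr"]]


for supplier in supplier_file:
    print(supplier["S#"], city_of(supplier))
```

Each city name is stored once however many suppliers are located there, at the cost of one extra pointer dereference when the city of a supplier is needed.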
The third form of data representation is indexing. If a file is indexed on one of its
attributes (typically one that is frequently used in queries), then access to the file
by that attribute becomes much easier. The representation can be
City index
City      Supplier ptr
Athens    -> S5
London    -> S1, S4
Paris     -> S2, S3

Supplier file (indexed on city)
S#    Sname    Status
S1    Smith    20
S2    Jones    10
S3    Blake    30
S4    Clark    20
S5    Adams    30
For example, when the query "Find all suppliers in a given city" is issued, the result
can be retrieved easily if the data is represented as above, that is, in indexed form.
The purpose of indexing is to provide an access path to the file. An index is a file in
which each entry (record) consists of a data value together with one or more pointers. The
data value is a value for some field of the indexed file, and the pointers identify records in
the indexed file having that value for that field. An index can be used in two ways: first,
for sequential access to the indexed file in the order of the indexed field, and second, for direct access to
individual records in the indexed file on the basis of a given value for that same field.
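Both uses of an index can be sketched in Python with the supplier data from the example. This is a minimal illustration: the helper names `build_index`, `suppliers_in` and `in_city_order` are hypothetical, and list positions stand in for the pointers (SRAs) held in the index entries.

```python
from collections import defaultdict

suppliers = [
    {"S#": "S1", "Sname": "Smith", "Status": 20, "City": "London"},
    {"S#": "S2", "Sname": "Jones", "Status": 10, "City": "Paris"},
    {"S#": "S3", "Sname": "Blake", "Status": 30, "City": "Paris"},
    {"S#": "S4", "Sname": "Clark", "Status": 20, "City": "London"},
    {"S#": "S5", "Sname": "Adams", "Status": 30, "City": "Athens"},
]


def build_index(records, field):
    """Each index entry is a data value plus pointers to the matching records."""
    index = defaultdict(list)
    for position, record in enumerate(records):
        index[record[field]].append(position)
    return index


city_index = build_index(suppliers, "City")


def suppliers_in(city):
    """Direct access: find all suppliers in a given city through the index."""
    return [suppliers[p]["S#"] for p in city_index.get(city, [])]


def in_city_order():
    """Sequential access: visit the file in the order of the indexed field."""
    return [suppliers[p]["S#"] for city in sorted(city_index) for p in city_index[city]]


print(suppliers_in("London"))
print(in_city_order())
```

The query is answered by reading one index entry and following its pointers, instead of scanning every supplier record.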
Another form of data representation is the multilist organisation.
8.DATA STRUCTURES AND CORRESPONDING OPERATORS
The range of data structures supported at the user level is a factor that critically affects
many components of the system. It dictates the design of the corresponding data
manipulation language, since each DML operation must be defined in terms of its effect on
those data structures. We may categorize database systems according to the approach they
adopt, and the best known approaches are
*Relational approach
*Hierarchical approach
*Network approach
The relational approach
The relational approach uses a collection of tables to represent both data and the
relationships among those data. Each table has multiple columns and each column has a
unique name.
Sample relational database
Bank customer
customer-name    social-security-no.    customer-street    customer-city    account-no.
Johnson          192-83-7465            Alma               Palo Alto        A-101
Smith            019-28-3746            North              Rye              A-215
Hayes            677-28-9011            Main               Harrison         A-102
Turner           182-73-6091            Putnam             Stamford         A-305
Johnson          192-83-7465            Alma               Palo Alto        A-201
Jones            321-12-3123            Main               Harrison         A-217
Lindsay          336-66-9999            Park               Pittsfield       A-222
Smith            019-28-3746            North              Rye              A-201

Accounts
account-no    balance
A-101         500
A-215         700
A-102         400
A-305         350
A-201         900
A-217         750
A-222         700
For example, customer Johnson, whose social-security-no. is 192-83-7465, lives on Alma
in Palo Alto and has two accounts: A-101 with balance 500 and A-201 with balance 900. Also,
Smith and Johnson share account A-201.
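The way the two tables are related through the account-no. column can be sketched in Python. This is an illustration only: the tables are held as plain Python structures, and the function names `accounts_of` and `holders_of` are hypothetical.

```python
# Bank customer table: (name, social-security-no., street, city, account-no.)
customers = [
    ("Johnson", "192-83-7465", "Alma", "Palo Alto", "A-101"),
    ("Smith", "019-28-3746", "North", "Rye", "A-215"),
    ("Hayes", "677-28-9011", "Main", "Harrison", "A-102"),
    ("Turner", "182-73-6091", "Putnam", "Stamford", "A-305"),
    ("Johnson", "192-83-7465", "Alma", "Palo Alto", "A-201"),
    ("Jones", "321-12-3123", "Main", "Harrison", "A-217"),
    ("Lindsay", "336-66-9999", "Park", "Pittsfield", "A-222"),
    ("Smith", "019-28-3746", "North", "Rye", "A-201"),
]

# Accounts table: account-no. -> balance
accounts = {
    "A-101": 500, "A-215": 700, "A-102": 400, "A-305": 350,
    "A-201": 900, "A-217": 750, "A-222": 700,
}


def accounts_of(name):
    """Match customer rows to account rows on the shared account-no. value."""
    return sorted(
        (acc, accounts[acc])
        for (n, _ssn, _street, _city, acc) in customers
        if n == name
    )


def holders_of(account_no):
    """All customers whose rows carry the given account number."""
    return sorted(n for (n, *_rest, acc) in customers if acc == account_no)


print(accounts_of("Johnson"))
print(holders_of("A-201"))
```

No pointers are involved: the relationship between a customer and an account is represented purely by the matching account-no. values in the two tables.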
Network model
Data in the network model are represented by collections of records, and
relationships among data are represented by links,
which can be viewed as pointers.
Sample network database (the links between records are not reproduced here)
Customer records
Johnson    192-83-7465    Alma     Palo Alto
Smith      019-28-3746    North    Rye
Account records
A-101    500
A-215    700
Hierarchical Model
This form of data representation is similar to the network model in that data and
relationships among data are represented by records and links, respectively. It differs from
the network model in that the records are organized as collections of trees rather than graphs.
9.Advantages of using DBMS
Many enterprises choose to store their operational data in an integrated database because
it provides the enterprise with centralized control of its operational data, which is a most
valuable asset.
The DBA has the central responsibility for the operational data.
The advantages of storing data under centralized control are as follows.
1.Redundancy can be reduced
In non-database systems each application has its own private files, which may cause
redundancy in stored data. Integration avoids this.
2.Inconsistency can be avoided (to some extent)
Suppose the fact that employee E3 works in department D8 is represented by two distinct
entries in the database, and the system is not aware of this duplication. If only one of the
two entries is updated, the entries will no longer agree and the database is in an inconsistent state.
If the redundancy is controlled, the system can guarantee that the database is
never inconsistent as seen by the user, by ensuring that any change made to either of the two
entries is automatically made to the other. This process is known as propagating
updates.
3.The data can be shared
Existing applications can share the stored data, and new applications can be developed
to operate on the same stored data.
4.Security restrictions can be applied
Users can access the database only if they have been granted permission. The permissions
are granted by the DBA, which keeps the data secure.
5.Integrity can be maintained
Centralized control allows data to be validated, so that the data in the database is accurate.
10.Database Administrator
One of the main reasons for using a DBMS is to have central control of both
the data and the programs that access those data. The person who has such central control
over the system is called the database administrator (DBA). The functions of the DBA
include the following.
Schema definition: The DBA creates the original database schema by writing a set of
definitions that is translated by the DDL compiler to a set of tables that is stored permanently
in the data dictionary.
Storage structure and access-method definition: The DBA creates appropriate storage
structures and access methods by writing a set of definitions, which is translated by the
data-storage and data-definition-language compiler.
Schema and physical-organization modification: Programmers accomplish the
relatively rare modifications either to the database schema or to the description of the
physical storage organization by writing a set of definitions that is used by either the
DDL compiler or the data-storage and data-definition-language compiler.
Granting of authorization for data access: Granting different types of authorization
allows the DBA to regulate which parts of the database various users can access.
Integrity-constraint specification: The DBA specifies constraints (conditions) that data must
satisfy when it is entered into the database. For ex., the balance in an account must be at least 500.
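An integrity constraint of this kind can be sketched in Python. This is a minimal illustration, assuming the constraint is checked at the moment a record is inserted: the names `MIN_BALANCE`, `ConstraintViolation` and `insert_account` are hypothetical, and a real DBMS would enforce the constraint from its stored definition rather than in application code.

```python
MIN_BALANCE = 500  # minimum balance from the example in the text


class ConstraintViolation(Exception):
    """Raised when a new record would break an integrity constraint."""


def insert_account(accounts, account_no, balance):
    """Add an account only if it satisfies the declared constraints."""
    if account_no in accounts:
        raise ConstraintViolation(f"duplicate account number {account_no}")
    if balance < MIN_BALANCE:
        raise ConstraintViolation(
            f"balance {balance} is below the minimum of {MIN_BALANCE}"
        )
    accounts[account_no] = balance


accounts = {}
insert_account(accounts, "A-101", 500)      # accepted
try:
    insert_account(accounts, "A-305", 350)  # rejected: below the minimum
except ConstraintViolation as err:
    print("rejected:", err)
```

Because the check sits in one place, every application that inserts accounts is subject to the same rule, which is the point of specifying constraints centrally.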
DATABASE MANAGEMENT SYSTEM
UNIT I
Objective questions
1.Database is
a) Computer-based billing system
b) Computer-based record keeping system
c) Computer-based animation system
2.The software used for access to the database is
a) BASIC b) PASCAL c) DBMS
3.The end-users access the database from the terminal using
a) Query language b) English language c) C language
4.DBA stands for
a) Data Base Administrator b) Data Base Access c) Data Batch Administration
5.Which of the following is not operational data
a) Product data b) Account data c) two numbers
6.The database system provides the enterprise with ___________ control of its
operational data
a) Centralized b) Single c) Shared
7.The ability to modify the schema definition in one level without affecting the schema in
the other level is called
a) Data dependence b) Data independence c) Data abstraction
8.Which of the following is not a level of database architecture
a) External b) Logical c) Super d) Conceptual
9.Data sub language is a combination of
a) DDL and DML b) DDL and TCL c) C and C++
10.A database that is not stored in a single physical location in its entirety and is spread
across the network is
a) Centralized database b) Distributed database c) Shared database
11.DBMS is
a) A software that handles all access to the database
b) A hardware
c) An interface between end-user and computer
12.The component responsible for internal/physical conversion is called
a) Access method b) Internal conversion c) A hardware
13.SRA is
a) Stored Record Array
b) Stored Record Access
c) Stored Record Address
14.Primary key is the key which
a) Avoids duplication of data b) Supports duplication of data c) Allows null values
15.The data is represented in terms of
1) Relational approach 2) Hierarchical approach 3) Network approach
a) 1,2 b) 1,2,3 c) none of the above
16.The representation of data in relational approach is through
1) Tables 2) Tuples 3) Relations
a) 1 b) 1,2 c) 1,2,3 d) none
17.The data represented in network approach is through
a) Records and links b) Tables c) Trees
18.The ___________ permits the DBMS to view the storage structure as a collection of
stored files.
a) Stored record interface b) Stored record address c) Access method
19.Entity is
a) Any distinguishable real world object
b) Not an object
c) Incident
20.DBMS stands for
a) Data Base Management System b) Database Multimedia System
c) Data Base Management Standards
Short questions
1.What are the basic components of database system?
2.Explain the components of a database system with the simplified diagram.
3.What is operational data?
4.Explain operational data with an example.
5.Explain data independence.
6.Why is a database system adopted rather than a file system? Write down the advantages
of a database system.
7.Distinguish between input, output, and operational data.
8.Explain three levels of database system in brief.
9.What is the role of DBA?
10.What are the functions of DBMS?
11.Explain in brief distributed databases.
12.Relate distributed databases with client server architecture.
13.Explain access method, SRA, SRI.
14.Differentiate relational, network, hierarchical approaches.
15.Explain any one form of data representation.
Elaborate questions
1.Explain the role of the DBA, with a detailed explanation of any one function.
2.DBMS and its functions, advantages, disadvantages
3.Database systems are widely used nowadays. Justify.
4.Explain the architecture of database system.
5.Explain database system with simplified structure.
6.Explain storage structures with at least any one representation.
7.Explain various data structures used to represent data in database system.
Unit II
Syllabus
Relational approach: Relational data structure: relation, domain, attributes, keys
Relational algebra: Introduction, traditional set operation, attribute names for
derived relations, special relational operations.
Books for Reference:
Database System Concepts - Abraham Silberschatz, Henry F. Korth, S. Sudarshan
An Introduction to Database Systems - C. J. Date
Principles of Database Systems - Jeffrey D. Ullman
An Introduction to Database Systems - Bipin C. Desai
Relational Approach
Introduction:
The relational model has established itself as the primary data model for
commercial data-processing applications. The first database systems were based on
either the network model or the hierarchical model. The relational model is now
being used in numerous applications outside the domain of traditional data
processing.
Structure of relational databases.
A relational database consists of a collection of tables, each of which is
assigned a unique name. A row in a table represents a relationship among a set of
values. The rows are termed as tuples and columns are termed as attributes. Since a
table is a collection of such relationships, there is a close correspondence between
the concept of table and the mathematical concept relation, from which the
relational data model takes its name.
The following account table or relation has three column headers: branch-name,
account-number and balance. These column headers are the attributes of the relation.
For each attribute there is a set of permitted values, called the domain
of that attribute. For the attribute branch-name, for example, the domain is the set of all
branch names.
The account relation
Branch-name    Account-number    Balance
Downtown       A-101             500
Mianus         A-215             700
Perryridge     A-102             400
Round Hill     A-305             350
Brighton       A-201             900
Redwood        A-222             700
Brighton       A-217             750
Let D1 denote the set of all branch-names, D2 denote the set of all accountnumbers, and D3 the set of all balances. In the account relation it consists of a 3tuple (v1, v2, v3), were v1 is a branch name, v2 is an account number and v3 is a
balance. The account will contain only a subset of the set of all possible rows. It can
be represented as
D1 * 2 * D3
In general a table of n attributes must be a subset of
D1 * D2 *……Dn-1 * D n
A relation is thus a subset of a Cartesian product of a list of domains.
Because tables are essentially relations, the mathematical terms relation and tuple
are used in place of the terms table and row respectively. In the account relation
above there are seven tuples. Let the tuple variable t refer to the first tuple of the
relation. We use the notation t[branch-name] to denote the value of t on the
branch-name attribute. Thus, t[branch-name] = "Downtown" and t[balance] = 500.
Since a relation is a set of tuples, we use the mathematical notation t ∈ r to denote
that tuple t is in relation r.
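The idea that a relation is a subset of a Cartesian product of domains can be sketched in a few lines of Python. The tiny domains below are made up for illustration; they are not the full domains of the bank example.

```python
from itertools import product

# Small, made-up domains (assumptions for illustration only).
D1 = {"Downtown", "Mianus"}   # branch names
D2 = {"A-101", "A-215"}       # account numbers
D3 = {500, 700}               # balances

# D1 x D2 x D3: every possible 3-tuple (v1, v2, v3).
all_rows = set(product(D1, D2, D3))

# A relation is any subset of that product.
account = {("Downtown", "A-101", 500), ("Mianus", "A-215", 700)}

is_relation = account <= all_rows   # subset test
degree = len(next(iter(account)))   # number of attributes
```

Here all_rows holds 2 * 2 * 2 = 8 possible rows, of which the relation account uses only two.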
Domain: A domain is a pool of values from which the values of an attribute are
drawn.
A domain is atomic if its elements are considered to be indivisible units. For
example, the set of integers is an atomic domain, but the set of all sets of
integers is a nonatomic domain. The distinction is that we do not normally consider
integers to have subparts, but we consider sets of integers to have subparts, namely
the integers comprising the set.
The customer relation

Customer-name   Customer-street   Customer-city
Jones           Main              Harrison
Smith           North             Rye
Hayes           Main              Harrison
Curry           North             Rye
Lindsay         Park              Pittsfield
Turner          Putnam            Stamford
Williams        Nassau            Princeton
Adams           Spring            Pittsfield
Johnson         Alma              Palo Alto
Glenn           Sand Hill         Woodside
Brooks          Senator           Brooklyn
Green           Walnut            Stamford
It is possible for several attributes to have the same domain. For example,
suppose that we have a relation customer that has the three attributes
customer-name, customer-street and customer-city, and a relation employee that
includes the attribute employee-name. It is possible that the attributes
customer-name and employee-name will have the same domain: the set of all person
names. The domains of balance and branch-name are certainly distinct. It is perhaps
less clear whether customer-name and branch-name should have the same domain. At the
physical level, both customer names and branch names are character strings. However,
at the logical level, we may want customer-name and branch-name to have distinct
domains.
Relation:
Definition of a relation (mathematically):
Given a collection of sets D1, D2, …, Dn (not necessarily distinct), R is a relation
on those n sets if it is a set of ordered n-tuples <d1, d2, …, dn> such that d1
belongs to D1, d2 belongs to D2, …, dn belongs to Dn. The sets D1, D2, …, Dn are the
domains of R. The value of n is the degree of R.
The concept of a relation corresponds to the programming-language notion of a
variable. The concept of a relation schema corresponds to the programming-language
notion of a type definition. It is convenient to give a name to a relation
schema, just as we give names to type definitions in programming languages. We
adopt the convention of using lowercase names for relations, and names beginning
with an uppercase letter for relation schemas. For example,
Account-schema = (branch-name, account-number, balance)
The notion of a relation can also be expressed diagrammatically with the help of
E-R diagrams. Before discussing E-R diagrams, we review the common terms used in
them.
Entity: This is a thing or object in the real world that is distinguishable from all
other objects. For example, each person in an enterprise is an entity. An entity has a
set of properties, and the values for some set of properties may uniquely identify
an entity. For example, the social-security number 677-89-9011 (employee number
1111) uniquely identifies one particular person in the enterprise.
Entity Set: An entity set is a set of entities of the same type that share the same
properties or attributes. The set of all persons who are customers at a given bank,
for example, can be defined as the entity set customer.
Attributes: An entity is represented by a set of attributes. Attributes are descriptive
properties possessed by each member of an entity set. Possible attributes of the
customer entity set are customer-number, customer-street, and customer-city. An
attribute, as used in the E-R model, can be characterized by the following
attribute types.
* Simple and composite attributes: Attributes that cannot be divided into
subparts are simple attributes; attributes that can be divided into subparts
are composite attributes. For example, name is a composite attribute made up
of first-name, middle-name, and last-name.
* Single-valued and multivalued attributes: The attributes that we have
specified in our examples all have a single value for a particular entity.
For instance, the loan-number attribute for a specific loan entity refers to
only one loan number. Such attributes are said to be single valued. There
may be instances where an attribute has a set of values for a specific
entity; such attributes are said to be multivalued.
* Null attributes: A null value is used when an entity does not have a value
for an attribute.
* Derived attributes: The value for this type of attribute can be derived from
the values of other related attributes or entities. For instance, let us say
that the customer entity set has an attribute loans-held, which represents
how many loans a customer has from the bank. We can derive the value for
this attribute by counting the number of loan entities associated with that
customer.
Relationship sets
Consider the relation loan.

Branch-name   Loan-number   Amount
Downtown      L-17          1000
Redwood       L-23          2000
Perryridge    L-15          1500
Downtown      L-14          1500
Mianus        L-93          500
Round Hill    L-11          900
Perryridge    L-16          1300
A relationship is an association among several entities. For example, we can define
a relationship that associates customer Hayes with loan number L-15. This
relationship specifies that Hayes is a customer with loan number L-15.
A relationship set is a set of relationships of the same type. Formally, it is a
mathematical relation on n >= 2 (possibly nondistinct) entity sets. If E1, E2, …, En
are entity sets, then a relationship set R is a subset of
{(e1, e2, …, en) | e1 ∈ E1, e2 ∈ E2, …, en ∈ En}
where (e1, e2, …, en) is a relationship.
Consider the two entity sets customer and loan. We can define the relationship set
borrower to denote the association between customers and the bank loans that the
customers have. As another example, consider the two entity sets loan and branch.
We can define the relationship set loan-branch to denote the association between a
bank loan and the branch in which that loan is maintained.
Each row of the table represents one n-tuple of the relation. The number of tuples
in the relation is called the cardinality of the relation. E.g., the cardinality of the
relation loan is 7.
Relations may be unary, binary, ternary, or in general n-ary.
Unary: Relations of degree one are unary.
For example, the query "Find the branch name that issued the loan with number
L-17" on the loan relation gives the output

Branch-name
Downtown

Binary: Relations of degree two are binary.
For example, the query "Find branch-name and amount for loan-number L-17" on the
loan relation gives the output

Branch-name   Amount
Downtown      1000

Ternary: Relations of degree three are ternary.
N-ary: Relations of degree n are n-ary.
Mapping cardinalities: Mapping cardinalities, or cardinality ratios, express the
number of entities to which another entity can be associated via a relationship set.
Mapping cardinalities are most useful in describing binary relationship sets,
although occasionally they contribute to the description of relationship sets that
involve more than two entity sets.
For a binary relationship set R between entity sets A and B, the mapping cardinality
must be one of the following:
One to one: An entity in A is associated with at most one entity in B, and an entity
in B is associated with at most one entity in A.
One to many: An entity in A is associated with any number of entities in B. An
entity in B, however, can be associated with at most one entity in A.
Many to one: An entity in A is associated with at most one entity in B. An entity in
B, however, can be associated with any number of entities in A.
Many to many: An entity in A is associated with any number of entities in B, and
an entity in B is associated with any number of entities in A.
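The four cases can be told apart mechanically for a given set of (a, b) pairs. The sketch below is only an illustration: the function name mapping_cardinality is hypothetical, and it classifies one particular instance of a relationship set, whereas a mapping cardinality is really a constraint that every instance must obey.

```python
def mapping_cardinality(pairs):
    """Classify one instance of a binary relationship set given as (a, b) pairs."""
    b_per_a = {}   # entities of B associated with each entity of A
    a_per_b = {}   # entities of A associated with each entity of B
    for a, b in pairs:
        b_per_a.setdefault(a, set()).add(b)
        a_per_b.setdefault(b, set()).add(a)
    a_has_many = any(len(bs) > 1 for bs in b_per_a.values())
    b_has_many = any(len(as_) > 1 for as_ in a_per_b.values())
    if a_has_many and b_has_many:
        return "many to many"
    if a_has_many:
        return "one to many"
    if b_has_many:
        return "many to one"
    return "one to one"

# (customer-name, loan-number) pairs of the borrower relation
borrower = {("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"),
            ("Jackson", "L-14"), ("Curry", "L-93"), ("Smith", "L-11"),
            ("Williams", "L-17"), ("Adams", "L-16")}
# (customer-name, account-number) pairs of the depositor relation
depositor = {("Johnson", "A-101"), ("Smith", "A-215"), ("Hayes", "A-102"),
             ("Turner", "A-305"), ("Johnson", "A-201"), ("Jones", "A-217"),
             ("Lindsay", "A-222")}

borrower_kind = mapping_cardinality(borrower)
depositor_kind = mapping_cardinality(depositor)
```

In this data Smith holds two loans and loan L-17 is shared by Jones and Williams, so borrower behaves as many to many; Johnson holds two accounts but no account is shared, so depositor behaves as one to many from customer to account.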
Keys:
In a relation there is often one attribute (or a combination of attributes) whose
values are unique within the relation and can thus be used to identify the tuples of
that relation.
For example, in the loan relation above, loan-number can be considered a key: it is
unique, and can be used to distinguish each tuple from all other tuples in that
relation.
Before discussing the various keys, let us have a glance at integrity constraints.
Integrity constraints:
An integrity constraint is a mechanism used by Oracle to prevent invalid
data entry into a table. It is nothing but a rule enforced on a column in
a table. The following are the various types of integrity constraints:
*Domain integrity constraints
Maintain values according to a specification such as the 'not null'
condition, under which the user has to enter a value for the column on which
it is specified.
'Not null' and 'Check' constraints fall under this category.
*Entity integrity constraints
Maintain uniqueness in a record.
*Referential integrity constraints
Enforce relationships between tables.
To establish a 'parent-child' or 'master-detail' relationship
between two tables having a common column, we make use of referential
integrity constraints. To implement this, we define the column in the
parent table as a primary key and the same column in the child table as a
foreign key referring to the corresponding parent entry.
We can define a constraint at either the table level or the column level. If
it is defined at the table level, it can apply to any number of columns in the
table. On the other hand, if it is defined at the column level, it holds only for
the column for which it is defined.
Various keys related to the relational approach are
Primary Key: A primary key is a set of one or more attributes that, taken
collectively, allows us to identify uniquely an entity in the entity set.
Ex. 1) Loan-number in the loan relation
2) The combination of branch-name and loan-number also identifies each loan
uniquely, but it is not minimal, because loan-number alone suffices; such a set
is a superkey rather than a natural choice of primary key.
Candidate Key: A candidate key is a minimal set of attributes that uniquely
identifies a tuple. Several distinct sets of attributes could serve as candidate
keys; one of them is chosen as the primary key.
Referenced key: A unique or primary key that is defined on a
column belonging to the parent table.
Foreign Key: A column or combination of columns included in the
definition of referential integrity, which refers to a referenced key.
Child table: The table whose foreign key refers to the referenced key of the
parent table; it depends upon the values present in that referenced key.
Parent table: The table referred to by the child table's foreign key; it
determines whether insertion or update of data can be done in the child
table.
On delete cascade clause
If a row in the parent table is deleted, then all rows in the child table
whose foreign key depends on the deleted referenced-key value are also
deleted automatically.
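The parent-child behaviour described above can be tried out with a small sketch. The notes speak of Oracle; the sketch below swaps in SQLite through Python's standard sqlite3 module simply because it runs without a server. The table layout (branch as parent, loan as child, underscores in column names) is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Parent table: branch_name is the referenced (primary) key.
conn.execute("CREATE TABLE branch (branch_name TEXT PRIMARY KEY, branch_city TEXT)")
# Child table: branch_name is a foreign key with ON DELETE CASCADE.
conn.execute("""
    CREATE TABLE loan (
        loan_number TEXT PRIMARY KEY,
        branch_name TEXT NOT NULL
            REFERENCES branch(branch_name) ON DELETE CASCADE,
        amount INTEGER CHECK (amount > 0)
    )""")

conn.execute("INSERT INTO branch VALUES ('Downtown', 'Brooklyn')")
conn.execute("INSERT INTO loan VALUES ('L-17', 'Downtown', 1000)")
conn.execute("INSERT INTO loan VALUES ('L-14', 'Downtown', 1500)")

# A child row whose foreign key has no matching parent row is rejected.
try:
    conn.execute("INSERT INTO loan VALUES ('L-99', 'Nowhere', 100)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Deleting the parent row removes its dependent child rows automatically.
conn.execute("DELETE FROM branch WHERE branch_name = 'Downtown'")
remaining_loans = conn.execute("SELECT COUNT(*) FROM loan").fetchone()[0]
```

The same table definition also shows a domain integrity constraint (NOT NULL, CHECK) and an entity integrity constraint (PRIMARY KEY) at the column level.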
Entity-Relationship Diagrams:
An E-R diagram can express the overall logical structure of a database
graphically. Such a diagram consists of the following major components:
* Rectangles, which represent entity sets
* Ellipses, which represent attributes
* Diamonds, which represent relationship sets
* Lines, which link attributes to entity sets and entity sets to relationship sets
* Double ellipses, which represent multivalued attributes
* Dashed ellipses, which represent derived attributes
* Double lines, which indicate total participation of an entity set in a
relationship set
E-R diagram for a Banking-Enterprise
[Figure: E-R diagram with the entity sets account (account-number, balance),
branch (branch-name, branch-city, assets), customer (customer-name,
customer-street, customer-city) and loan (loan-number, amount), and the
relationship sets account-branch, depositor, loan-branch and borrower.]
Various relations used for the discussion of this chapter are
1. Account relation

Branch-name   Account-number   Balance
Downtown      A-101            500
Mianus        A-215            700
Perryridge    A-102            400
Round Hill    A-305            350
Brighton      A-201            900
Redwood       A-222            700
Brighton      A-217            750
2. Loan relation

Branch-name   Loan-number   Amount
Downtown      L-17          1000
Redwood       L-23          2000
Perryridge    L-15          1500
Downtown      L-14          1500
Mianus        L-93          500
Round Hill    L-11          900
Perryridge    L-16          1300
3. Branch relation

Branch-name   Branch-city   Assets
Downtown      Brooklyn      9000000
Redwood       Palo Alto     2100000
Perryridge    Horse neck    1200000
Mianus        Horse neck    400000
Round Hill    Horse neck    8000000
Pownal        Bennington    300000
North town    Rye           3700000
Brighton      Brooklyn      7100000
4. Customer relation

Customer-name   Customer-street   Customer-city
Jones           Main              Harrison
Smith           North             Rye
Hayes           Main              Harrison
Curry           North             Rye
Lindsay         Park              Pittsfield
Turner          Putnam            Stamford
Williams        Nassau            Princeton
Adams           Spring            Pittsfield
Johnson         Alma              Palo Alto
Glenn           Sand Hill         Woodside
Brooks          Senator           Brooklyn
Green           Walnut            Stamford
5. Depositor relation

Customer-name   Account-number
Johnson         A-101
Smith           A-215
Hayes           A-102
Turner          A-305
Johnson         A-201
Jones           A-217
Lindsay         A-222
6. Borrower relation

Customer-name   Loan-number
Jones           L-17
Smith           L-23
Hayes           L-15
Jackson         L-14
Curry           L-93
Smith           L-11
Williams        L-17
Adams           L-16
Relational Algebra
Note: Query languages
A query language is a language in which a user requests information from the
database. These languages are typically of a level higher than that of a standard
programming language. Query languages can be categorized as being either
procedural or non-procedural. In a procedural language, the user instructs the
system to perform a sequence of operations on the database to compute the desired
result. In a non-procedural language, the user describes the information desired
without giving a specific procedure for obtaining that information.
Introduction
Relational algebra is a collection of operations on relations. It is a procedural
query language: it consists of a set of operations that take one or two relations as
input and produce a new relation as their result.
The fundamental operations of relational algebra are select, project, union, set
difference, Cartesian product, and rename. In addition to the fundamental
operations, there are several other operations, namely set intersection, natural
join, division, and assignment. These operations can be defined in terms of the
fundamental operations. Union, intersection, set difference and Cartesian product
are the traditional set operations, while selection, projection, join and division
are the special relational operators.
Fundamental operations
The select, project and rename operations are called unary operations, because
they operate on one relation. The other three operations, union, set difference
and Cartesian product, operate on pairs of relations and are therefore called
binary operations.
The select operation
The select operation selects tuples that satisfy a given predicate. The
lowercase Greek letter sigma (σ) is used to denote selection. The predicate appears
as a subscript to σ. The argument relation is given in parentheses following the σ.
Example:
1. Select those tuples of the loan relation where the branch is "Perryridge".
σ branch-name = "Perryridge" (loan)
The result of the query is

Branch-name   Loan-number   Amount
Perryridge    L-15          1500
Perryridge    L-16          1300

2. Find all tuples in which the amount lent is more than $1200.
σ amount > 1200 (loan)
In general, we allow comparisons using =, ≠, <, ≤, >, ≥ in the selection predicate.
We can also combine several predicates into a larger predicate using the
connectives and (∧) and or (∨).
3. Find those tuples pertaining to loans of more than $1200 made by the Perryridge
branch.
σ branch-name = "Perryridge" ∧ amount > 1200 (loan)
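The three selections above can be mimicked with a short Python sketch in which a relation is a set of tuples and the predicate is an ordinary function. The helper name select and the positional layout of the tuples are assumptions made for illustration.

```python
# loan relation as a set of (branch-name, loan-number, amount) tuples
loan = {
    ("Downtown", "L-17", 1000), ("Redwood", "L-23", 2000),
    ("Perryridge", "L-15", 1500), ("Downtown", "L-14", 1500),
    ("Mianus", "L-93", 500), ("Round Hill", "L-11", 900),
    ("Perryridge", "L-16", 1300),
}

def select(relation, predicate):
    """Keep only the tuples of the relation that satisfy the predicate."""
    return {t for t in relation if predicate(t)}

# 1. branch-name = "Perryridge"
perryridge = select(loan, lambda t: t[0] == "Perryridge")
# 2. amount > 1200
large = select(loan, lambda t: t[2] > 1200)
# 3. both conditions joined by "and"
large_perryridge = select(loan, lambda t: t[0] == "Perryridge" and t[2] > 1200)
```

Note that the result of a selection always has the same attributes as its argument; only the number of tuples can shrink.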
The project operation
Suppose we want to list all loan numbers and the amount of the loans, but do not
care about the branch name. The project operation allows us to produce this relation.
The project operation is a unary operation that returns its argument relation, with
certain attributes left out. Since a relation is a set, any duplicate rows are eliminated.
Projection is denoted by the Greek letter pi (π). We list those attributes that we
wish to appear in the result as a subscript to π. The argument relation follows in
parentheses.
Example:
1. List all loan numbers and the amount of the loan. The corresponding query is
π loan-number, amount (loan)
The relation that results from this query is

Loan-number   Amount
L-17          1000
L-23          2000
L-15          1500
L-14          1500
L-93          500
L-11          900
L-16          1300
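A projection can be sketched in Python as follows. The helper name project and the use of tuple positions in place of attribute names are assumptions made for illustration. Because the result is a Python set, duplicate rows disappear automatically, as the definition requires.

```python
# loan relation as a set of (branch-name, loan-number, amount) tuples
loan = {
    ("Downtown", "L-17", 1000), ("Redwood", "L-23", 2000),
    ("Perryridge", "L-15", 1500), ("Downtown", "L-14", 1500),
    ("Mianus", "L-93", 500), ("Round Hill", "L-11", 900),
    ("Perryridge", "L-16", 1300),
}

def project(relation, positions):
    """Keep only the listed attribute positions; the set drops duplicate rows."""
    return {tuple(t[i] for i in positions) for t in relation}

loan_amounts = project(loan, [1, 2])   # loan-number, amount
amounts_only = project(loan, [2])      # amount alone
```

Projecting on amount alone gives six rows rather than seven, because the two loans of 1500 collapse into a single tuple.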
The set difference operation
The set-difference operation, denoted by -, allows us to find tuples that are in one relation
but are not in another. The expression r – s results in a relation containing those tuples in r
but not in s.
Example:
1. Find all customers of the bank who have an account but not a loan.
π customer-name (depositor) - π customer-name (borrower)
The result will be
Customer-name
Johnson
Turner
Lindsay
For a set difference operation r-s to be valid, we require that the relations r and s be of the
same arity, and that the domains of the ith attribute of r and the ith attribute of s be the
same.
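With relations held as Python sets, set difference is the built-in - operator. The sketch below repeats the query above using the depositor and borrower relations of this chapter; the variable names are chosen only for illustration.

```python
depositor = {("Johnson", "A-101"), ("Smith", "A-215"), ("Hayes", "A-102"),
             ("Turner", "A-305"), ("Johnson", "A-201"), ("Jones", "A-217"),
             ("Lindsay", "A-222")}
borrower = {("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"),
            ("Jackson", "L-14"), ("Curry", "L-93"), ("Smith", "L-11"),
            ("Williams", "L-17"), ("Adams", "L-16")}

# Project both relations on customer-name so that they have the same arity.
depositor_names = {name for name, _ in depositor}
borrower_names = {name for name, _ in borrower}

# Customers with an account but no loan.
account_only = depositor_names - borrower_names
```

The projections come first because the difference is only meaningful when both operands have the same attributes.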
The Cartesian-product operation
The Cartesian-product operation, denoted by a cross (×), allows us to combine
information from any two relations. We write the Cartesian product of relations r1
and r2 as r1 × r2. Since the same attribute name may appear in both r1 and r2, we
need to devise a naming scheme to distinguish between these attributes. We do so
here by attaching to an attribute the name of the relation from which the attribute
originally came. For example, the relation schema for r = borrower × loan is
(borrower.customer-name, borrower.loan-number, loan.branch-name,
loan.loan-number, loan.amount)
So now we can distinguish borrower.loan-number from loan.loan-number. For those
attributes that appear in only one of the two schemas, we shall usually drop the
relation-name prefix. We can then write the relation schema for r as
(customer-name, borrower.loan-number, branch-name, loan.loan-number, amount)
This naming convention requires that the relations that are arguments of the
Cartesian-product operation have distinct names.
Assume that we have n1 tuples in borrower and n2 tuples in loan. Then there are
n1 * n2 ways of choosing a pair of tuples, one tuple from each relation; so there
are n1 * n2 tuples in r. In particular, note that for some tuples t in r, it may be
that t[borrower.loan-number] is not equal to t[loan.loan-number].
In general, if we have relations r1(R1) and r2(R2), then r1 × r2 is a relation whose
schema is the concatenation of R1 and R2. Relation r contains all tuples t for which
there is a tuple t1 in r1 and a tuple t2 in r2 such that t[R1] = t1[R1] and
t[R2] = t2[R2].
For example,
1. Suppose we want to find the names of all customers who have a loan at the
Perryridge branch. We need the information in both the loan relation and the
borrower relation to do so. If we write
σ branch-name = "Perryridge" (borrower × loan)
then the selection is applied to the Cartesian product borrower × loan, part of
which is shown below.

Customer-name   Borrower.loan-number   Branch-name   Loan.loan-number   Amount
Jones           L-17                   Downtown      L-17               1000
Jones           L-17                   Redwood       L-23               2000
…               …                      …             …                  …
Adams           L-16                   Round Hill    L-11               900
Adams           L-16                   Perryridge    L-16               1300
Table: Result of borrower × loan
Now the output of the query stated above will be

Customer-name   Borrower.loan-number   Branch-name   Loan.loan-number   Amount
Jones           L-17                   Perryridge    L-15               1500
Jones           L-17                   Perryridge    L-16               1300
Smith           L-23                   Perryridge    L-15               1500
Smith           L-23                   Perryridge    L-16               1300
Hayes           L-15                   Perryridge    L-15               1500
Hayes           L-15                   Perryridge    L-16               1300
Jackson         L-14                   Perryridge    L-15               1500
Jackson         L-14                   Perryridge    L-16               1300
Curry           L-93                   Perryridge    L-15               1500
Curry           L-93                   Perryridge    L-16               1300
Smith           L-11                   Perryridge    L-15               1500
Smith           L-11                   Perryridge    L-16               1300
Williams        L-17                   Perryridge    L-15               1500
Williams        L-17                   Perryridge    L-16               1300
Adams           L-16                   Perryridge    L-15               1500
Adams           L-16                   Perryridge    L-16               1300
Table: Result of the query σ branch-name = "Perryridge" (borrower × loan)
The result contains only tuples pertaining to the Perryridge branch. However, the
customer-name column may contain customers who do not have a loan at the
Perryridge branch, because the Cartesian product pairs every borrower tuple with
every loan tuple. A tuple of borrower × loan describes a customer's own loan only
when borrower.loan-number = loan.loan-number, so the query can be rewritten as
σ borrower.loan-number = loan.loan-number
(σ branch-name = "Perryridge" (borrower × loan))
In order to retrieve only the customer-name, we apply the projection operation:
π customer-name (σ borrower.loan-number = loan.loan-number
(σ branch-name = "Perryridge" (borrower × loan)))
The result is as shown below

Customer-name
Hayes
Adams
Table: Result of
π customer-name (σ borrower.loan-number = loan.loan-number
(σ branch-name = "Perryridge" (borrower × loan)))
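The whole chain (Cartesian product, two selections, then a projection) can be followed step by step in a Python sketch. Tuples are concatenated to form the product, and attribute positions stand in for the prefixed attribute names; these representation choices are assumptions made for illustration.

```python
from itertools import product

borrower = [("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"),
            ("Jackson", "L-14"), ("Curry", "L-93"), ("Smith", "L-11"),
            ("Williams", "L-17"), ("Adams", "L-16")]
loan = [("Downtown", "L-17", 1000), ("Redwood", "L-23", 2000),
        ("Perryridge", "L-15", 1500), ("Downtown", "L-14", 1500),
        ("Mianus", "L-93", 500), ("Round Hill", "L-11", 900),
        ("Perryridge", "L-16", 1300)]

# borrower x loan: every borrower tuple concatenated with every loan tuple.
# Positions: 0 customer-name, 1 borrower.loan-number,
#            2 branch-name,   3 loan.loan-number, 4 amount
cartesian = {b + l for b, l in product(borrower, loan)}

# Selection: branch-name = "Perryridge"
at_perryridge = {t for t in cartesian if t[2] == "Perryridge"}

# Selection: borrower.loan-number = loan.loan-number
own_loans = {t for t in at_perryridge if t[1] == t[3]}

# Projection on customer-name
names = {t[0] for t in own_loans}
```

The product has 8 * 7 = 56 tuples, the first selection leaves 16, and the second leaves only the two tuples in which the customer is paired with his or her own loan.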
The rename operation
Unlike relations in the database, the results of relational-algebra expressions do
not have a name that we can use to refer to them. It is useful to be able to give
them names; the rename operator, denoted by the lowercase Greek letter rho (ρ), lets
us perform this task.
Given a relational-algebra expression E, the expression
ρ x (E)
returns the result of expression E under the name x.
A relation r by itself is considered to be a trivial relational-algebra expression.
Thus, we can also apply the rename operation to a relation r to get the same
relation under a new name.
A second form of the rename operation is as follows. Assume that a
relational-algebra expression E has arity n. Then the expression
ρ x(A1, A2, …, An) (E)
returns the result of expression E under the name x, and with the attributes renamed
to A1, A2, …, An.
For example,
1. Find the largest balance in the bank.
The steps involved are
* First compute a temporary relation consisting of those balances that are not the
largest.
* Then take the set difference between the relation π balance (account) and the
temporary relation just computed.
The temporary relation is computed by the query
π account.balance (σ account.balance < d.balance (account × ρ d (account)))
This expression gives those balances in the account relation for which a
larger balance appears somewhere in the account relation (renamed as
d). The result contains all balances except the largest one.
The relation is

Balance
500
700
400
350
750

The query to find the largest account balance in the bank can be written as follows:
π balance (account) -
π account.balance (σ account.balance < d.balance (account × ρ d (account)))
The result of this query is

Balance
900
Fig: Largest account balance in the bank
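The two steps can be traced in Python. In the sketch below the rename is modelled simply by binding the same data to a second variable named d, which is an assumption made for illustration; a real system would of course just compute the maximum directly.

```python
# account relation as (branch-name, account-number, balance) tuples
account = [("Downtown", "A-101", 500), ("Mianus", "A-215", 700),
           ("Perryridge", "A-102", 400), ("Round Hill", "A-305", 350),
           ("Brighton", "A-201", 900), ("Redwood", "A-222", 700),
           ("Brighton", "A-217", 750)]

# Rename: a second reference to account under the name d.
d = account

# Step 1: balances for which some larger balance exists
# (selection on account x d, followed by projection on account.balance).
not_largest = {a[2] for a in account for other in d if a[2] < other[2]}

# Step 2: all balances minus the balances that are not the largest.
all_balances = {a[2] for a in account}
largest = all_balances - not_largest
```

The balance 700 occurs twice in account but only once in not_largest, since a relation cannot hold duplicate tuples.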
2. Find the names of all customers who live on the same street and in the same city
as Smith.
The street and city of Smith can be obtained by writing
π customer-street, customer-city (σ customer-name = "Smith" (customer))
In order to find other customers with this street and city, we must reference the
customer relation a second time. In the following query, we use the rename operation
on the preceding expression to give its result the name smith-addr, and to rename its
attributes to street and city, instead of customer-street and customer-city:
π customer.customer-name
(σ customer.customer-street = smith-addr.street ∧ customer.customer-city = smith-addr.city
(customer × ρ smith-addr(street, city)
(π customer-street, customer-city (σ customer-name = "Smith" (customer)))))
The result of this query is as shown below

Customer-name
Smith
Curry
Additional operations or special relational operations
1. The set-intersection operation
The symbol used to denote set intersection is ∩.
Example:
1. Find all customers who have both a loan and an account.
The query is
π customer-name (borrower) ∩ π customer-name (depositor)
The result will be

Customer-name
Hayes
Jones
Smith
Table: Customers with both an account and a loan at the bank
The intersection operation is not fundamental: it can be rewritten using a pair of
set-difference operations as
r ∩ s = r - (r - s)
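The identity r ∩ s = r - (r - s) can be checked on the bank data with a few lines of Python, using sets of customer names as the two projected relations. The variable names are chosen only for illustration.

```python
# customer-name projections of borrower and depositor
r = {"Jones", "Smith", "Hayes", "Jackson", "Curry", "Williams", "Adams"}
s = {"Johnson", "Smith", "Hayes", "Turner", "Jones", "Lindsay"}

direct = r & s                 # built-in set intersection
via_difference = r - (r - s)   # intersection built from two differences
```

Both expressions give the same three customers, which is why intersection does not add any power to the algebra and is listed among the additional operations.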
The union operation
With the help of this operation, denoted by ∪, we can collect the tuples that are
present in either of two relations.
For example:
1. Find the names of all bank customers who have either an account or a loan or both.
The customer relation does not contain this information, since a customer does not
need to have either an account or a loan at the bank. To answer this query we need
the information in the depositor relation and in the borrower relation.
* To find the names of all customers with a loan at the bank we use
π customer-name (borrower)
* To find the names of all customers with an account in the bank we use
π customer-name (depositor)
To find the customers holding either an account or a loan we take the union of
these two:
π customer-name (borrower) ∪ π customer-name (depositor)
The result of this query is

Customer-name
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Adams

For a union operation r ∪ s to be valid, we require two conditions:
1. The relations r and s must be of the same arity. That is, they must have the same
number of attributes.
2. The domain of the ith attribute of r and the ith attribute of s must be the same,
for all i.
Here r and s can be, in general, temporary relations that are the result of
relational-algebra expressions.
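A union with the arity check of condition 1 can be sketched in Python as below. The helper name union is hypothetical, and the sketch checks only the number of attributes; it does not attempt to compare domains (condition 2).

```python
def union(r, s):
    """Union of two relations (sets of tuples) after an arity check."""
    if r and s and len(next(iter(r))) != len(next(iter(s))):
        raise ValueError("relations must have the same arity")
    return r | s

# customer-name projections, kept as 1-tuples so that they are still relations
borrower_names = {("Jones",), ("Smith",), ("Hayes",), ("Jackson",),
                  ("Curry",), ("Williams",), ("Adams",)}
depositor_names = {("Johnson",), ("Smith",), ("Hayes",), ("Turner",),
                   ("Jones",), ("Lindsay",)}

all_customers = union(borrower_names, depositor_names)

# Mixing arities is refused.
try:
    union(borrower_names, {("Jones", "L-17")})
    arity_error = False
except ValueError:
    arity_error = True
```

Smith, Hayes and Jones appear in both operands but only once in the result, since the result is again a set.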
The natural-join operation
It is often desirable to simplify certain queries that require a Cartesian product.
Typically, a query that involves a Cartesian product includes a selection operation
on the result of the Cartesian product.
Consider the query:
Find the names of all customers who have a loan at the bank, and find the
amount of the loan.
Steps:
1. Form the Cartesian product of the borrower and loan relations.
2. Select those tuples that pertain to the same loan-number.
3. Project the customer-name, loan-number and amount.
π customer-name, loan.loan-number, amount
(σ borrower.loan-number = loan.loan-number (borrower × loan))
The natural join is a binary operation that allows us to combine certain selections
and a Cartesian product into one operation. It is denoted by the "join" symbol ⋈.
The natural-join operation forms a Cartesian product of its two arguments, performs
a selection forcing equality on those attributes that appear in both relation
schemas, and finally removes duplicate attributes.
For example:
1. Find the names of all customers who have a loan at the bank, and find the amount
of the loan.
π customer-name, loan-number, amount (borrower ⋈ loan)
The result of the query is

Customer-name   Loan-number   Amount
Jones           L-17          1000
Smith           L-23          2000
Hayes           L-15          1500
Jackson         L-14          1500
Curry           L-93          500
Smith           L-11          900
Williams        L-17          1000
Adams           L-16          1300
2. Find the names of all branches with customers who have an account in the bank
and who live in Harrison.
π branch-name (σ customer-city = "Harrison" (customer ⋈ account ⋈ depositor))
The result of the query is

Branch-name
Brighton
Perryridge
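A natural join needs attribute names, so the sketch below represents each tuple as a Python dict keyed by attribute name. The helper natural_join is hypothetical and uses the simple nested-loop method; it is meant only to mirror the definition (product, equality on shared attributes, shared attributes kept once).

```python
def natural_join(r, s):
    """Join two lists of dict rows on every attribute name they share."""
    joined = []
    for t1 in r:
        for t2 in s:
            common = t1.keys() & t2.keys()
            if all(t1[a] == t2[a] for a in common):
                row = {**t1, **t2}      # shared attributes appear only once
                if row not in joined:   # a relation has no duplicate tuples
                    joined.append(row)
    return joined

borrower = [{"customer-name": n, "loan-number": l} for n, l in [
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"), ("Jackson", "L-14"),
    ("Curry", "L-93"), ("Smith", "L-11"), ("Williams", "L-17"), ("Adams", "L-16")]]
loan = [{"branch-name": b, "loan-number": l, "amount": a} for b, l, a in [
    ("Downtown", "L-17", 1000), ("Redwood", "L-23", 2000),
    ("Perryridge", "L-15", 1500), ("Downtown", "L-14", 1500),
    ("Mianus", "L-93", 500), ("Round Hill", "L-11", 900),
    ("Perryridge", "L-16", 1300)]]

result = natural_join(borrower, loan)
# Projection on customer-name, loan-number, amount
answer = {(t["customer-name"], t["loan-number"], t["amount"]) for t in result}
```

Each joined row carries a single loan-number attribute, which is the difference from the Cartesian product, where borrower.loan-number and loan.loan-number both appear.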
The division operation
The division operation, denoted by ÷, is suited to queries that include the phrase
"for all".
Example:
1. Find all customers who have an account at all the branches located in Brooklyn.
Steps:
1. All branches in Brooklyn can be obtained as
r1 = π branch-name (σ branch-city = "Brooklyn" (branch))
The result is

Branch-name
Brighton
Downtown

2. We can find all (customer-name, branch-name) pairs for which the customer has an
account at a branch by writing
r2 = π customer-name, branch-name (depositor ⋈ account)

Customer-name   Branch-name
Johnson         Downtown
Smith           Mianus
Hayes           Perryridge
Turner          Round Hill
Williams        Perryridge
Lindsay         Redwood
Johnson         Brighton
Jones           Brighton
Table: Result of π customer-name, branch-name (depositor ⋈ account)

3. We need to find those customers who appear in r2 with every branch name
in r1. We formulate the query by writing
π customer-name, branch-name (depositor ⋈ account)
÷ π branch-name (σ branch-city = "Brooklyn" (branch))
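Division can be sketched in Python as "keep the customers that are paired with every required branch". The sketch below builds r1 and r2 from the depositor, account and branch relations listed earlier in this chapter (branch without its assets column); the helper name divide is hypothetical.

```python
depositor = {("Johnson", "A-101"), ("Smith", "A-215"), ("Hayes", "A-102"),
             ("Turner", "A-305"), ("Johnson", "A-201"), ("Jones", "A-217"),
             ("Lindsay", "A-222")}
account = {("Downtown", "A-101", 500), ("Mianus", "A-215", 700),
           ("Perryridge", "A-102", 400), ("Round Hill", "A-305", 350),
           ("Brighton", "A-201", 900), ("Redwood", "A-222", 700),
           ("Brighton", "A-217", 750)}
branch = {("Downtown", "Brooklyn"), ("Redwood", "Palo Alto"),
          ("Perryridge", "Horse neck"), ("Mianus", "Horse neck"),
          ("Round Hill", "Horse neck"), ("Pownal", "Bennington"),
          ("North town", "Rye"), ("Brighton", "Brooklyn")}

# r1: names of the branches located in Brooklyn
r1 = {name for name, city in branch if city == "Brooklyn"}

# r2: (customer-name, branch-name) pairs from depositor joined with account
r2 = {(cust, br) for cust, acc in depositor
      for br, acc2, _ in account if acc == acc2}

def divide(pairs, required):
    """Customers paired in `pairs` with every value in `required`."""
    candidates = {c for c, _ in pairs}
    return {c for c in candidates if all((c, b) in pairs for b in required)}

answer = divide(r2, r1)
```

With this data only Johnson has an account at both Brooklyn branches (Downtown and Brighton), so the division yields a single customer.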
Extended relational-algebra operations
The basic relational-algebra operations have been extended in several ways.
A simple extension is to allow arithmetic operations as part of projection. An
important extension is to allow aggregate operations, such as computing the sum
of the elements of a set, or their average. Another important extension is the
outer-join operation, which allows relational-algebra expressions to deal with
null values, which model missing information.
Generalized Projection
The generalized projection operation extends the projection operation
by allowing arithmetic functions to be used in the projection list. The
generalized projection has the form
π F1, F2, …, Fn (E)
where E is any relational-algebra expression, and each of F1, F2, …, Fn is an
arithmetic expression involving constants and attributes in the schema of
E. As a special case, the arithmetic expression may be simply an attribute or
a constant. The following example demonstrates the use of the
generalized projection operation. Suppose we have a relation credit-info, as
shown, which lists the credit limit and the expenses so far. If we want to find how
much more each person can spend, we can write the following expression:
π customer-name, limit - credit-balance (credit-info)

Customer-name   Limit   Credit-balance
Jones           6000    700
Smith           2000    400
Hayes           1500    1500
Curry           2000    1750
Table: The credit-info relation

Customer-name   Limit - credit-balance
Jones           5300
Smith           1600
Hayes           0
Curry           250
Table: The result of π customer-name, limit - credit-balance (credit-info)
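In Python the generalized projection above amounts to computing a new value for each tuple. The sketch below uses the credit-info data of the example; the variable names are chosen only for illustration.

```python
# credit-info relation as (customer-name, limit, credit-balance) tuples
credit_info = {("Jones", 6000, 700), ("Smith", 2000, 400),
               ("Hayes", 1500, 1500), ("Curry", 2000, 1750)}

# customer-name together with the arithmetic expression limit - credit-balance
available = {(name, limit - balance) for name, limit, balance in credit_info}
```

The second column of the result has no attribute name of its own, which is one reason the rename operation is often applied to the result of a generalized projection.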
Outer join
The outer-join operation is an extension of the join operation to deal with missing
information.
Aggregate functions
Aggregate functions are functions that take a collection of values and return a
single value as a result. For example, the aggregate function sum takes a
collection of values and returns the sum of the values.
The function sum applied to the collection
<1,1,3,4,4,11>
returns the value 24.
The function avg returns the average of the values. The average of the above
collection is 4.
The function count returns the number of elements in the collection and would
return 6 on the preceding collection.
The functions min and max return the minimum and maximum values in a
collection; here they return 1 and 11.
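The five aggregate functions correspond closely to Python built-ins, so the figures above can be checked directly. Note that the collection is a list and not a set: the duplicate values 1 and 4 must be kept, otherwise sum, avg and count would all change.

```python
values = [1, 1, 3, 4, 4, 11]   # a collection in which duplicates are retained

total = sum(values)
average = sum(values) / len(values)
count = len(values)
smallest = min(values)
largest = max(values)

# Dropping duplicates first gives a different sum.
distinct_total = sum(set(values))
```

On this collection the duplicate-free sum is 1 + 3 + 4 + 11 = 19, not 24.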
Examples:
1. Find the total sum of salaries of all part-time employees in the bank.
The query is
sum salary (pt-works)
where salary is written as a subscript to sum and pt-works is a relation with a
salary attribute for each part-time employee.
The result of this query is a relation with a single attribute, containing a single
row with a numerical value corresponding to the sum of all the salaries of all
employees working part-time in the bank.
For further details of aggregate functions and the relational approach, refer to
the texts
1. Database System Concepts - Abraham Silberschatz, Henry F. Korth
2. An Introduction to Database Systems, chapter 4 - Bipin P. Desai
Short questions:
1.What is relational approach.
2.What is relational algebra.
3.Write the definition for relational algebra.
4.What are the fundamental operations of relational algebra.
5.What is entity, relation, entity set, relationship, relationship set, attribute.
6.Briefly explain mapping cardinalities.
7.Draw the entity relationship diagram for banking enterprise.
8.Explain selection and projection operation with example.
9.Explain aggregate functions in brief.
10.Explain set operations.
11.Explain binary, unary, ternary and n-ary relations.
12.What are the various symbols used in entity relationship diagram.
13.What is constraint?
14.Write note on integrity rules.
15.What is a key?
Elaborate questions:
1.Write the definition for key and explain various keys with example.
2.Explain the structure of relational databases with example.
3.Explain referential integrity constraint or rule, with example.
4.Explain all fundamental operations of relational algebra or traditional set
operations with example.
5.Write all aggregate functions and explain in detail with example.
6.What are extended relational operations? Explain all the available operations.
Unit III
Syllabus
Embedded SQL: Introduction - operations not involving cursors, involving cursors -
Dynamic statements. Query by Example - retrieval operations, built-in functions,
update operations, QBE Dictionary. Normalization: Functional dependency, First,
Second, Third normal forms, relations with more than one candidate key, good and
bad decomposition.
Books for Reference:
An Introduction to Database Systems - C.J. Date
Database System Concepts - Abraham Silberschatz, Henry F. Korth, S. Sudarshan
Principles of Database Systems - Jeffrey D. Ullman
Embedded SQL
SQL provides a powerful declarative query language; writing queries in
SQL is typically much easier than coding the same queries in a general-purpose
programming language. However, access to a database from a general-purpose
programming language is still needed, for the following two reasons.
1. Not all queries can be expressed in SQL, since SQL does not provide the full
expressive power of a general-purpose language. That is, there exist queries that
can be expressed in a language such as Pascal, C, COBOL or FORTRAN that cannot be
expressed in SQL. To write such queries, we can embed SQL within a more powerful
language.
2. Nondeclarative actions, such as printing a report, interacting with a user, or
sending the results of a query to a graphical user interface, cannot be done from
within SQL.
A language in which SQL queries are embedded is referred to as a host language,
and the SQL structures permitted in the host language constitute embedded SQL.
SQL statements operate on whole sets of records at a time. Host languages such as
PL/I, however, are not well equipped to handle more than one record at a time. It
is therefore necessary to provide some form of bridge between the two functional
levels, and embedded SQL provides such a bridge by means of a new type of
object called a cursor.
Operations not involving cursors
The DML statements that do not need cursors are as follows:
* Singleton SELECT
* UPDATE
* INSERT
* DELETE
Singleton SELECT
We use the term "singleton SELECT" to mean a SELECT statement for which the
retrieved table contains at most one row. Because there is at most one row,
its values can be assigned directly to host variables.
Example: a SELECT that retrieves a row by its primary key value.
UPDATE
This statement changes the values of existing rows in the database.
Example: the UPDATE statement of SQL.
INSERT
This statement adds a new row to a table.
Example: the INSERT statement of SQL.
DELETE
This statement removes rows from a table.
Example: the DELETE statement of SQL.
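The four statements above can be sketched in a runnable form. The following is only an illustration, written in Python with the standard sqlite3 module rather than in embedded SQL for PL/I; the account table, its columns and its rows are assumptions made for the example. Python's sqlite3 is a call-level interface, so a cursor object exists behind the scenes, but the program never iterates over a set of rows.

```python
import sqlite3

# Hypothetical in-memory bank table; all names and values are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_no TEXT PRIMARY KEY, branch_name TEXT, balance REAL)"
)

# INSERT: add new rows.
conn.execute("INSERT INTO account VALUES ('A-101', 'Downtown', 500)")
conn.execute("INSERT INTO account VALUES ('A-102', 'Perryridge', 400)")

# Singleton SELECT: the primary key guarantees at most one row,
# so the result can be assigned directly to host variables.
branch_name, balance = conn.execute(
    "SELECT branch_name, balance FROM account WHERE account_no = ?", ("A-102",)
).fetchone()

# UPDATE: change the values of existing rows.
conn.execute(
    "UPDATE account SET balance = balance + 100 WHERE account_no = ?", ("A-102",)
)
new_balance = conn.execute(
    "SELECT balance FROM account WHERE account_no = 'A-102'"
).fetchone()[0]

# DELETE: remove rows.
conn.execute("DELETE FROM account WHERE account_no = 'A-101'")
remaining = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
```

In each case the host program deals with at most one row of results, which is why no row-by-row mechanism is needed.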
Operations involving cursors
Consider the case of a SELECT that selects a whole set of records, not just
one. What is needed is a mechanism for accessing the records in the set one
by one, and cursors provide such a mechanism. Explicitly defined cursors
are constructs that enable the user to name an area of memory to hold a
specific statement for access at a later time.
The programmer defines explicit cursors to process a multiple-row active set
one record at a time. The following are the steps for using explicitly defined
cursors within PL/SQL.
1.Declare the cursor
* Name the cursor
* Associate a query with the cursor
Syntax
Declare cursor cursor-name is select-statement;
Example
Declare
  cursor c_names is
    select branch_name
    from branch
    where branch_city = 'Brooklyn';
2.Open the cursor
Opening the cursor activates the query and identifies the active set.
Open also initializes the cursor pointer to just before the first row of the
active set.
Syntax
Open cursor-name;
3.Fetch from the cursor
Data is retrieved from the cursor with the fetch command. The fetch
command retrieves the rows of the active set one row at a time.
Syntax
Fetch cursor-name into record-list;
4.Close the cursor
The close statement closes, or deactivates, the previously opened cursor
and makes the active set undefined. Oracle will implicitly close a cursor when
the user's program or session is terminated. After a cursor is closed, we
cannot perform any operation on it until it is opened again.
Syntax
Close cursor-name;
Attributes involved in cursors
* %ISOPEN returns TRUE if the cursor is already open.
* %FOUND returns TRUE if the last FETCH returned a row, and returns FALSE if the
last FETCH failed to return a row.
* %NOTFOUND is the logical opposite of %FOUND.
* %ROWCOUNT yields the number of rows fetched so far.
Example to illustrate cursor
1) Checking whether a cursor is open before opening it.
Declare
  cursor c4 is select salary, job from emp where job = 'CLERK';
Begin
  if c4%isopen then
    dbms_output.put_line('This message will not be displayed');
  else
    open c4;
    dbms_output.put_line('Cursor was not open; it has now been opened');
  end if;
  close c4;
end;
2) The procedure to update students' information by finding the total and
average.
Declare
  st stu%rowtype;
  cursor c1 is select * from stu;
Begin
  open c1;
  loop
    fetch c1 into st;
    exit when c1%notfound;
    st.total := st.m1 + st.m2 + st.m3;
    st.average := st.total / 3;
    if st.m1 >= 50 and st.m2 >= 50 and st.m3 >= 50 then
      st.result := 'PASS';
    else
      st.result := 'FAIL';
    end if;
    update stu
    set total = st.total, average = st.average, result = st.result
    where regno = st.regno;
  end loop;
  close c1;
  commit;
end;
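The same declare, open, fetch and close cycle can be tried out without an Oracle installation. The sketch below is an analogue in Python with the standard sqlite3 module, not PL/SQL; the stu table layout and the two sample students are assumptions. A fetchone() result of None plays the role of %NOTFOUND. As a design choice, the updates are collected and applied after the cursor is closed, so that the table is not modified while it is being scanned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout of the stu table, with two hypothetical students.
conn.execute(
    "CREATE TABLE stu (regno INTEGER PRIMARY KEY, m1 INTEGER, m2 INTEGER, "
    "m3 INTEGER, total INTEGER, average REAL, result TEXT)"
)
conn.executemany(
    "INSERT INTO stu (regno, m1, m2, m3) VALUES (?, ?, ?, ?)",
    [(1, 60, 70, 80), (2, 45, 90, 75)],
)

cur = conn.cursor()
# Executing the query corresponds to declaring and opening the cursor:
# it identifies the active set.
cur.execute("SELECT regno, m1, m2, m3 FROM stu ORDER BY regno")
updates = []
while True:
    row = cur.fetchone()      # fetch the next row of the active set
    if row is None:           # no more rows: the analogue of %NOTFOUND
        break
    regno, m1, m2, m3 = row
    total = m1 + m2 + m3
    average = total / 3
    result = "PASS" if min(m1, m2, m3) >= 50 else "FAIL"
    updates.append((total, average, result, regno))
cur.close()                   # close the cursor

conn.executemany(
    "UPDATE stu SET total = ?, average = ?, result = ? WHERE regno = ?", updates
)
conn.commit()
report = conn.execute(
    "SELECT regno, total, average, result FROM stu ORDER BY regno"
).fetchall()
```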
Dynamic Statements
Embedded SQL provides certain features to facilitate the writing of
on-line application programs, that is, programs that support on-line access to the
database from an end-user at a terminal. The steps involved are
1.accept a command from the terminal
2.analyze the command
3.issue appropriate SQL statements
4.return a message and/or results to the terminal
The precompiler is a compiler for the SQL language. Suppose the application
programmer has written a program P that includes some embedded SQL statements.
Precompilation proceeds as follows.
* The precompiler scans the source program P and locates the embedded
SQL statements.
* For each statement it finds, the precompiler decides on a strategy for
implementing that statement in terms of RSI operations. This process is
referred to as optimization.
* The precompiler replaces each of the original embedded SQL statements
by an ordinary PL/I statement.
The dynamic SQL component of SQL-92 allows programs to construct and
submit SQL queries at run-time. In the case of (static) embedded SQL, by contrast,
each statement must be completely present at compile time, and is compiled by the
embedded SQL preprocessor.
Using dynamic SQL, programs can create SQL queries as strings at run-time
(based on input from the user) and can either have them executed immediately,
or have them prepared for subsequent use.
The two principal dynamic statements are PREPARE and EXECUTE.
DCL SQLSOURCE CHAR (256);
SQLSOURCE = 'DELETE FROM BRANCH WHERE BRANCH_CITY = ''PERRYRIDGE''';
$PREPARE SQLOBJ FROM SQLSOURCE;
$EXECUTE SQLOBJ;
The PREPARE statement passes the SQLSOURCE string to the RDS
precompiler, which goes through its normal process of parsing, optimization and
code generation and builds a machine language version of the statement
called SQLOBJ. The EXECUTE statement causes this machine language routine to
be executed and thus causes the actual deletions to occur.
Once PREPAREd, a given dynamically generated SQL statement can be
EXECUTEd many times. The generated statement can be replaced by
another by issuing PREPARE again with the same target and a different
source.
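The idea of building a statement as a string at run time and then executing it repeatedly can be sketched as follows. This is an analogue in Python with the standard sqlite3 module, not the PL/I $PREPARE/$EXECUTE facility itself; the helper build_delete, the whitelist of table and column names, and the sample rows are all assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE branch (branch_name TEXT, branch_city TEXT)")
conn.executemany(
    "INSERT INTO branch VALUES (?, ?)",
    [("Perryridge", "Horseneck"), ("Downtown", "Brooklyn"), ("Brighton", "Brooklyn")],
)

# Hypothetical whitelist: identifiers cannot be passed as parameters, so they
# are checked before being placed in the statement text.
ALLOWED = {"branch": {"branch_name", "branch_city"}}

def build_delete(table, column):
    """Assemble the text of a DELETE statement at run time."""
    if column not in ALLOWED.get(table, set()):
        raise ValueError("unknown table or column")
    return f"DELETE FROM {table} WHERE {column} = ?"

# The statement text plays the role of SQLSOURCE.
sql_source = build_delete("branch", "branch_city")

# The same statement text is executed several times with different values,
# much as a PREPAREd statement can be EXECUTEd many times.
deleted = []
for city in ("Brooklyn", "Rye"):
    cur = conn.execute(sql_source, (city,))
    deleted.append(cur.rowcount)

remaining = [row[0] for row in conn.execute("SELECT branch_name FROM branch")]
```

Passing the data value as a parameter (the ? placeholder) rather than pasting it into the string avoids quoting problems such as the doubled quotes needed in the PL/I string above.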
QUERY-BY-EXAMPLE
Query-by-example (QBE) is the name of both a data-manipulation
language and the database system that included this language. The QBE
database system was developed at IBM's T. J. Watson Research Center in the
early 1970s. Today, some database systems for personal computers support
variants of the QBE language. It has two distinctive features:
1.Unlike most query languages and programming languages, QBE has a
two-dimensional syntax: queries look like tables. A query in a one-dimensional
language can be written in one (possibly long) line. A two-dimensional language
requires two dimensions for its expression.
2.QBE queries are expressed "by example". Instead of giving a procedure for
obtaining the desired answer, the user gives an example of what is desired. The
system generalizes this example to compute the answer to the query.
We express queries in QBE using skeleton tables. These tables
show the relation schema, as shown below.
Example: the skeleton table for the branch relation
branch | branch-name | branch-city | assets
       |             |             |
Retrieval operations
Queries on One relation
Examples:
1.Find all loan numbers at the Perryridge branch.
loan | branch-name | loan-number | amount
     | Perryridge  | P._x        |
The preceding query causes the system to look for tuples in loan
that have "Perryridge" as the value for the branch-name attribute. For each
such tuple, the value of the loan-number attribute is assigned to the variable
x. (Variables in QBE are written with a leading underscore, as in _x.) The value
of the variable x is "printed", because the command P. appears
in the loan-number column next to the variable x. QBE assumes that a blank
position in a row contains a unique variable. As a result, if a variable does not
appear more than once in a query, it may be omitted.
Thus the previous query can be rewritten as
loan | branch-name | loan-number | amount
     | Perryridge  | P.          |
QBE performs duplicate elimination automatically. To suppress duplicate
elimination, we insert the command ALL. after the P. command:
loan | branch-name | loan-number | amount
     | Perryridge  | P.ALL.      |
To display the entire loan relation, we can create a single row consisting of P. in
every field.
loan | branch-name | loan-number | amount
     | P.          | P.          | P.
QBE allows queries that involve arithmetic comparisons.
Example
1.Find the loan numbers of all loans with a loan amount of more than $700.
loan | branch-name | loan-number | amount
     |             | P.          | >700
The comparison operators that QBE supports are =, <, ≤, >, ≥ and ¬.
2.Find the names of all branches that are not located in Brooklyn.
branch | branch-name | branch-city | assets
       | P.          | ¬Brooklyn   |
3.Find the loan numbers of all loans made jointly to Smith and Jones.
borrower | customer-name | loan-number
         | 'Smith'       | P._x
         | 'Jones'       | _x
4.Find the loan numbers of all loans made to Smith, to Jones, or to both
jointly.
borrower | customer-name | loan-number
         | 'Smith'       | P._x
         | 'Jones'       | P._y
5.Find all customers who live in the same city as Jones.
customer | customer-name | customer-street | customer-city
         | P._x          |                 | _y
         | Jones         |                 | _y
Queries on several relations
QBE allows queries that span several different relations. The
connections among the various relations are achieved through variables that
force certain tuples to have the same value on certain attributes.
Example
1.Find the names of all customers who have a loan from the Perryridge
branch.
loan | branch-name | loan-number | amount
     | Perryridge  | _x          |
borrower | customer-name | loan-number
         | P._y          | _x
2.Find the names of all customers who have both an account and a loan at
the bank.
depositor | customer-name | account-number
          | P._x          |
borrower | customer-name | loan-number
         | _x            |
3.Find the names of all customers who have an account at the bank, but who
do not have a loan from the bank. The ¬ under the relation name negates the row:
there must be no borrower tuple for customer _x.
depositor | customer-name | account-number
          | P._x          |
borrower | customer-name | loan-number
¬        | _x            |
4.Find all customers who have at least two accounts. The ¬ before _y requires
the second account number to differ from the first.
depositor | customer-name | account-number
          | P._x          | _y
          | _x            | ¬_y
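To see what a shared variable does, the first query of this subsection (customers who have a loan from the Perryridge branch) can be evaluated by hand. The sketch below is plain Python, not QBE; the sample tuples are assumptions. The variable loan_no plays the role of _x: a pair of tuples contributes to the answer only when both carry the same loan number.

```python
# Hypothetical sample data: (branch-name, loan-number, amount).
loan = [
    ("Perryridge", "L-15", 1500),
    ("Downtown", "L-17", 1000),
    ("Perryridge", "L-16", 1300),
]
# Hypothetical sample data: (customer-name, loan-number).
borrower = [
    ("Hayes", "L-15"),
    ("Jones", "L-17"),
    ("Adams", "L-16"),
    ("Hayes", "L-16"),
]

# The shared variable _x forces the loan tuple and the borrower tuple to agree
# on loan-number. Building a set mirrors QBE's automatic duplicate elimination.
names = sorted(
    {
        customer
        for (branch, loan_no, _amount) in loan
        if branch == "Perryridge"
        for (customer, borrowed_loan_no) in borrower
        if borrowed_loan_no == loan_no
    }
)
```

Hayes appears only once in the result even though two Perryridge loans are in that name.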
The condition box
It is not always convenient to express all the constraints on the domain
variables within the skeleton tables. To overcome this, QBE includes a
condition box feature that allows the expression of general constraints over
any of the domain variables.
Example:
1.Find all customers who are not named 'Jones' and who have at least two
accounts.
depositor | customer-name | account-number
          | P._x          | _y
          | _x            | ¬_y
conditions
_x ¬= Jones
2.Find all account numbers with a balance between $1300 and $1500.
account | branch-name | account-number | balance
        |             | P.             | _x
conditions
_x ≥ 1300
_x ≤ 1500
3.Find all branches that have assets greater than those of at least one branch
located in Brooklyn.
branch | branch-name | branch-city | assets
       | P._x        |             | _y
       |             | Brooklyn    | _z
conditions
_y > _z
Options available with the condition box
1.QBE allows complex arithmetic expressions to appear in a condition
box.
Example:
Find all branches that have assets that are at least twice as large as the assets
of one of the branches located in Brooklyn.
branch | branch-name | branch-city | assets
       | P._x        |             | _y
       |             | Brooklyn    | _z
conditions
_y ≥ 2 * _z
2.QBE allows logical expressions to appear in a condition box. The operators used
are and ( & ) and or ( | ).
Example
Find all account numbers with a balance between $1300 and $2000, but not
exactly $1500.
account | branch-name | account-number | balance
        |             | P.             | _x
conditions
_x = (≥ 1300 and ≤ 2000 and ¬ 1500)
The result relation
If the result of a query includes attributes from several relation schemas,
we need a mechanism to display the desired result in a single table.
Example
1.Find the customer-name, account-number and balance for all accounts at the
Perryridge branch.
In relational algebra, we would
1.Join the depositor and account relations.
2.Project customer-name, account-number and balance.
In QBE, we proceed as follows.
1.Create a skeleton table called result with attributes customer-name,
account-number and balance.
2.Write the query.
account | branch-name | account-number | balance
        | Perryridge  | _y             | _z
depositor | customer-name | account-number
          | _x            | _y
result | customer-name | account-number | balance
P.     | _x            | _y             | _z
Ordering of the display of tuples
By using the commands AO. (ascending order) and DO. (descending order), we can
control the order in which tuples are displayed.
Example
1.List all customers in descending alphabetical order.
depositor | customer-name | account-number
          | P.DO.         |
Aggregate functions [Built-in functions]
QBE includes the aggregate operators AVG, MAX, MIN, SUM and
CNT. We must postfix these operators with ALL. to create a multiset on which
the aggregate operation is evaluated.
Example
1.Find the total balance of all the accounts maintained at the Perryridge
branch.
account | branch-name | account-number | balance
        | Perryridge  |                | P.SUM.ALL.
2.Find the total number of customers who have an account at the bank. The UNQ.
operator eliminates duplicates, so that a customer with several accounts is
counted once.
depositor | customer-name  | account-number
          | P.CNT.UNQ.ALL. |
3.Find the name, street and city of all customers who have more than one
account at the bank. The G. operator groups the depositor tuples by customer name.
customer | customer-name | customer-street | customer-city
P.       | _x            |                 |
depositor | customer-name | account-number
          | G._x          | CNT.ALL._y
conditions
CNT.ALL._y > 1
Update operations / Modification of the database
This section describes how to add, remove or change information using QBE.
Deletion
Deletion of tuples from a relation is expressed in much the same way as a query. The major difference is the use of
D. in place of P. In QBE we can delete whole tuples, as well as values in selected columns. When we delete information in
only some of the columns, null values, specified by -, are inserted.
D. operates on only one relation. To delete tuples from several
relations, we must use one D. operator for each relation.
*Delete customer Smith.
customer | customer-name | customer-street | customer-city
D.       | Smith         |                 |
*Delete the branch-city value of the branch whose name is "Perryridge".
branch | branch-name | branch-city | assets
       | Perryridge  | D.          |
*Delete all loans with a loan amount between $1300 and $1500.
loan | branch-name | loan-number | amount
D.   |             | _y          | _x
borrower | customer-name | loan-number
D.       |               | _y
conditions
_x = (≥ 1300 and ≤ 1500)
*Delete all accounts at all branches located in Brooklyn.
account | branch-name | account-number | balance
D.      | _x          | _y             |
depositor | customer-name | account-number
D.        |               | _y
branch | branch-name | branch-city | assets
       | _x          | Brooklyn    |
Insertion
We do the insertion by placing the I. operator in the query
expression. The attribute values for inserted tuples must be members of the
attribute's domain.
Example
*To insert into the branch relation information about a new branch with
name "Capital" and city "Queens", but with a null asset value, we write
branch | branch-name | branch-city | assets
I.     | Capital     | Queens      |
*To insert the fact that account A-9732 at the Perryridge branch has a balance of
$700, we write
account | branch-name | account-number | balance
I.      | Perryridge  | A-9732         | 700
Updates
If we want to change one value in a tuple without changing all values
in the tuple, we use the update facility; the operator used is U. QBE
does not allow users to update the primary key fields.
*Update the asset value of the Perryridge branch to $10,000,000.
branch | branch-name | branch-city | assets
       | Perryridge  |             | U.10000000
The query updates the assets of the Perryridge branch to
$10,000,000, regardless of the old value. If we want to update a value
using the previous value, we must express the request using two
rows: one specifying the old tuples that need to be updated, and the
other indicating the new updated tuples to be inserted in the database.
*Interest payments are being made, and all balances are to be
increased by 5%.
account | branch-name | account-number | balance
U.      |             |                | _x * 1.05
        |             |                | _x
QBE Dictionary
QBE has a built-in dictionary that is represented to the user as a collection of
tables. The dictionary includes, for example, a TABLE table and a DOMAIN table, giving
details of all tables and all domains currently known to the system. The dictionary
tables can be interrogated using the ordinary retrieval operations of the DML.
Retrieval of table names
Get the names of all tables known to the system.
P. |     |     |
Instead of having to build a skeleton for the TABLE table and entering "P."
in the NAME column of that skeleton, the user can formulate this query by simply
entering "P." in the table-name position of a blank table.
Retrieval of column names for a given table
Get the names of all columns in table S.
S P. |     |     |
The user enters the table name (S) followed by "P." against the row of (blank) column names.
Creation of a new table
1.Create table branch.
I. branch I. | branch-name | branch-city | branch-street
The first I. creates a dictionary entry for table branch; the second I. creates
dictionary entries for the three columns of table branch. In addition, information for
each column must be specified. This information includes the name of the underlying
domain and, if that domain is not already known to QBE, the data type of the domain.
Dropping a table
Drop table branch.
A table can be dropped only if it is currently empty.
1)Delete all branch details.
branch | branch-name | branch-city | branch-street
D.     |             |             |
2)Drop the table.
D. branch | branch-name | branch-city | branch-street
Expanding a table
Add an assets column to the table branch.
QBE does not directly support the dynamic addition of a new column to an
existing table unless that table is currently empty.
So the following steps should be followed.
1) Define a new table the same shape as the existing table plus the new column.
2) Load the new table from the old using a multiple-record insert.
3) Delete all data from the old table.
4) Drop the old table.
5) Change the name of the new table to that of the old table.
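The five steps above can be walked through on a small example. The sketch below uses SQL through Python's standard sqlite3 module rather than QBE, purely to make the steps runnable; the temporary name branch_new and the sample row are assumptions. (SQL itself offers ALTER TABLE ... ADD COLUMN, so in SQL this detour is not needed; it is followed here only to mirror the QBE procedure.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE branch (branch_name TEXT, branch_city TEXT, branch_street TEXT)"
)
conn.execute("INSERT INTO branch VALUES ('Perryridge', 'Horseneck', 'Main')")

# 1) Define a new table with the same shape plus the new column.
conn.execute(
    "CREATE TABLE branch_new "
    "(branch_name TEXT, branch_city TEXT, branch_street TEXT, assets REAL)"
)
# 2) Load the new table from the old using a multiple-record insert.
conn.execute(
    "INSERT INTO branch_new (branch_name, branch_city, branch_street) "
    "SELECT branch_name, branch_city, branch_street FROM branch"
)
# 3) Delete all data from the old table.
conn.execute("DELETE FROM branch")
# 4) Drop the old table.
conn.execute("DROP TABLE branch")
# 5) Change the name of the new table to that of the old table.
conn.execute("ALTER TABLE branch_new RENAME TO branch")
conn.commit()

columns = [row[1] for row in conn.execute("PRAGMA table_info(branch)")]
rows = conn.execute("SELECT * FROM branch").fetchall()
```

The existing row survives the change, with a null value in the new assets column.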
Normalization
Introduction
Normalization theory is built around the concept of normal forms. A relation is said to
be in a particular normal form if it satisfies a certain specified set of constraints. For
example, a relation is said to be in first normal form if and only if it satisfies the
constraint that it contains atomic values only. The normal forms include First Normal
Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), Boyce-Codd Normal
Form (BCNF) and Domain-Key Normal Form (DKNF). The need for normalization arises
when we want to design a relational database that is free of unnecessary redundancy
and allows information to be retrieved and updated easily.
For the description of normalization, we shall consider the suppliers-and-parts
database. The relations of this database are as follows:
PART---P
P# | Pname | Color | Weight | City
P1 | Nut   | Red   | 12     | London
P2 | Bolt  | Green | 17     | Paris
P3 | Screw | Blue  | 17     | Rome
P4 | Screw | Red   | 14     | London
P5 | Cam   | Blue  | 12     | Paris
P6 | Cog   | Red   | 19     | London
SUPPLIER---S
S# | Sname | Status | City
S1 | Smith | 20     | London
S2 | Jones | 10     | Paris
S3 | Blake | 30     | Paris
S4 | Clark | 20     | London
S5 | Adams | 30     | Athens
SP------
S# | P# | QTY
S1 | P1 | 300
S1 | P2 | 200
S1 | P3 | 400
S1 | P4 | 200
S1 | P5 | 100
S1 | P6 | 100
S2 | P1 | 300
S2 | P2 | 400
S3 | P2 | 200
S4 | P2 | 200
S4 | P4 | 300
S4 | P5 | 400
FIG:1
Functional Dependency
Definition:
Given a relation R, attribute Y of R is functionally dependent on attribute
X of R if and only if each X-value in R has associated with it precisely one Y-value in R.
In the suppliers-and-parts database, the attributes SNAME, STATUS and
CITY of relation S are each functionally dependent on attribute S#: for a
particular value of S# there exists precisely one corresponding value for each
of SNAME, STATUS and CITY.
S.S# → S.SNAME
S.S# → S.STATUS
S.S# → S.CITY
Or, more compactly,
S.S# → S.(SNAME, STATUS, CITY)
The statement S.S# → S.CITY is read as "attribute S.CITY is functionally
dependent on attribute S.S#", or "attribute S.S# functionally determines
attribute S.CITY".
Alternate definition for functional dependence
Given a relation R, attribute Y of R is functionally dependent on
attribute X of R if and only if, whenever two tuples of R agree on their X-value, they also agree on their Y-value.
S# | P# | QTY | STATUS
S1 | P1 | 300 | 20
S1 | P2 | 200 | 20
S1 | P3 | 400 | 20
S1 | P4 | 200 | 20
Fig: Partial tabulation of relation SP'.
For example, in this relation SP', which extends SP with the supplier's status,
SP'.S# → SP'.STATUS
even though S# is only part of the primary key (S#, P#).
A functional dependence is a special form of integrity constraint. For
example, when we say that relation S satisfies the FD S.S# → S.CITY, we mean that every
legal extension (tabulation) of that relation satisfies that constraint.
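The alternate definition translates directly into a check on a tabulation. The sketch below is plain Python; the function name holds and the representation of a relation as a list of dictionaries are assumptions made for the illustration, and the data is the first three rows of SP'. Note that such a check can only refute an FD: since an FD is a constraint on every legal extension, a single tabulation that happens to satisfy it does not prove that it holds.

```python
def holds(relation, x, y):
    """Return True if no two tuples agree on attributes x but differ on y."""
    seen = {}
    for t in relation:
        x_value = tuple(t[a] for a in x)
        y_value = tuple(t[a] for a in y)
        # Remember the first y-value seen for each x-value and compare later ones.
        if seen.setdefault(x_value, y_value) != y_value:
            return False
    return True

# First three rows of the partial tabulation of SP'.
sp_prime = [
    {"S#": "S1", "P#": "P1", "QTY": 300, "STATUS": 20},
    {"S#": "S1", "P#": "P2", "QTY": 200, "STATUS": 20},
    {"S#": "S1", "P#": "P3", "QTY": 400, "STATUS": 20},
]

status_on_s = holds(sp_prime, ["S#"], ["STATUS"])    # S# -> STATUS is satisfied
qty_on_s = holds(sp_prime, ["S#"], ["QTY"])          # S# -> QTY is violated
qty_on_key = holds(sp_prime, ["S#", "P#"], ["QTY"])  # (S#, P#) -> QTY is satisfied
```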
It is convenient to represent the FDs in a given set of relations by means of a
functional dependency diagram.
Example:
S: S# → SNAME, S# → STATUS, S# → CITY
P: P# → PNAME, P# → COLOR, P# → WEIGHT, P# → CITY
SP: (S#, P#) → QTY
Fig: Functional dependencies in relations S, P, SP.
Various Normal Forms
Brief description of Normal forms
First Normal Form
* Eliminates repeating groups of data, that is, converts each data value to
its atomic form
* No two rows should be identical
* Each table entry should be single valued
* Every table has a primary key, which is a unique label or
identifier for each row
Second Normal Form
* Requires taking out data that is dependent on only a part of
the key
* Each non-key attribute is functionally dependent on the entire
key
Third Normal form
* Involves getting rid of anything in the tables that does not
depend solely on the primary key
* 3NF is sometimes characterized as "the key, the whole key, and
nothing but the key"
First Normal Form
Definition:
A relation R is in first normal form (1NF) if and only if all underlying
domains contain atomic values only.
A relation that is only in first normal form has a structure that is undesirable for a number of reasons.
For example:
Let us assume that information concerning suppliers and shipments, rather than being split into two separate
relations (S and SP), is combined into a single relation called FIRST with attributes (S#, STATUS, CITY, P#,
QTY).
Here S# is the supplier number, STATUS is the supplier's status, CITY is the city in which the supplier is
located, P# is the part number and QTY is the quantity supplied.
We also introduce an additional constraint: STATUS is functionally dependent on CITY. The meaning of this constraint is that a
supplier's status is determined by the supplier's location; e.g., all London suppliers must have a status of 20. We
ignore the attribute SNAME for simplicity. The primary key of FIRST is the combination (S#, P#). The
functional dependencies in this relation are as follows.
(S#, P#) → QTY
S# → STATUS
S# → CITY
CITY → STATUS
Fig: Functional dependencies in the relation FIRST
In the diagram
i) STATUS and CITY are not fully functionally dependent on the primary key (they depend on S# alone).
ii) STATUS and CITY are not mutually independent (STATUS depends on CITY).
The FIRST relation causes certain difficulties with each of the update operations. They are explained below.
Insert: We cannot enter the fact that a particular supplier is located in a particular city until that supplier supplies at
least one part. The following is the tabulation of FIRST.
S# | STATUS | CITY   | P# | QTY
S1 | 20     | London | P1 | 300
S1 | 20     | London | P2 | 200
S1 | 20     | London | P3 | 400
S1 | 20     | London | P4 | 200
S1 | 20     | London | P5 | 100
S1 | 20     | London | P6 | 100
S2 | 10     | Paris  | P1 | 300
S2 | 10     | Paris  | P2 | 400
S3 | 10     | Paris  | P2 | 200
S4 | 20     | London | P2 | 200
S4 | 20     | London | P4 | 300
S4 | 20     | London | P5 | 400
Table: FIRST
The FIRST relation does not show that supplier S5 is located in Athens, because until S5 supplies some part we have
no appropriate primary key value.
Deletion: If we delete the only FIRST tuple for a particular supplier, we destroy not only the shipment connecting that
supplier to some part but also the information that the supplier is located in a particular city.
For example, if we delete the FIRST tuple with S# value S3 and P# value P2, we lose the information that S3 is located
in Paris.
Updation: The city value for a given supplier appears in FIRST many times, and this redundancy causes update problems.
For example, if supplier S1 moves from London to Amsterdam, we are faced with either the problem of
searching the FIRST relation to find every tuple connecting S1 and London, or the possibility of producing an
inconsistent result (the city for S1 may be given as Amsterdam in one place and London in another).
The solution to these problems is to replace the relation FIRST by the two relations SECOND (S#, STATUS, CITY) and SP
(S#, P#, QTY). The functional dependencies in these two relations are as shown here.
SECOND: S# → STATUS, S# → CITY, CITY → STATUS
SP: (S#, P#) → QTY
Fig: Functional dependencies in the relations SECOND and SP.
The following tables show sample tabulations corresponding to the data values of FIG:1, except that the status of
supplier S3 is 10 (to satisfy the constraint that STATUS depends on CITY) and that information for
supplier S5 is included in SECOND but not in SP.
SECOND
S# | Status | City
S1 | 20     | London
S2 | 10     | Paris
S3 | 10     | Paris
S4 | 20     | London
S5 | 30     | Athens
SP
S# | P# | QTY
S1 | P1 | 300
S1 | P2 | 200
S1 | P3 | 400
S1 | P4 | 200
S1 | P5 | 100
S1 | P6 | 100
S2 | P1 | 300
S2 | P2 | 400
S3 | P2 | 200
S4 | P2 | 200
S4 | P4 | 300
S4 | P5 | 400
Fig: Sample tabulations of SECOND and SP.
With the tables built as shown, we overcome the difficulties of the FIRST relation described above: each
supplier's city and status are recorded once, independently of any shipments.
SECOND NORMAL FORM:
DEFINITION: A relation R is in second normal form (2NF) if and only if it
is in 1NF and every nonkey attribute is fully dependent on the primary key.
Relations SECOND and SP are both in 2NF (the primary keys are S# and the
combination (S#, P#), respectively). Relation FIRST is not in 2NF. A relation
that is in first normal form and not in second can always be reduced to an
equivalent collection of 2NF relations. The reduction consists of replacing the
relation by suitable projections; the collection of these projections is
equivalent to the original relation, in the sense that the original relation can
always be recovered by taking the natural join of these projections, so no
information is lost in the process. In other words, the process is reversible.
In our example, SECOND and SP are projections of FIRST,
and FIRST is the natural join of SECOND and SP over S#.
The reduction of FIRST to the pair (SECOND, SP) is an example
of nonloss decomposition. In general, given a relation R with possibly
composite attributes A, B, C satisfying the FD R.A → R.B, R can always be
"nonloss-decomposed" into its projections R1 (A, B) and R2 (A, C). Since no
information is lost in the reduction process, any information that can be
derived from the original structure can also be derived from the new
structure. The converse is not true, however: the new structure may contain
information (such as the fact that S5 is located in Athens) that could not be
represented in the original. In this sense the new structure is a slightly more
faithful reflection of the real world.
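The claim that FIRST can be recovered from its projections is easy to verify on the sample data. The sketch below is plain Python; the helper project and the use of sets of tuples to represent relations are assumptions made for the illustration. The rows are those of the FIRST tabulation, in the column order (S#, STATUS, CITY, P#, QTY).

```python
first = {
    ("S1", 20, "London", "P1", 300), ("S1", 20, "London", "P2", 200),
    ("S1", 20, "London", "P3", 400), ("S1", 20, "London", "P4", 200),
    ("S1", 20, "London", "P5", 100), ("S1", 20, "London", "P6", 100),
    ("S2", 10, "Paris", "P1", 300), ("S2", 10, "Paris", "P2", 400),
    ("S3", 10, "Paris", "P2", 200), ("S4", 20, "London", "P2", 200),
    ("S4", 20, "London", "P4", 300), ("S4", 20, "London", "P5", 400),
}

def project(rows, positions):
    """Projection: keep the given column positions and drop duplicate tuples."""
    return {tuple(row[i] for i in positions) for row in rows}

second = project(first, (0, 1, 2))  # SECOND (S#, STATUS, CITY)
sp = project(first, (0, 3, 4))      # SP (S#, P#, QTY)

# Natural join of SECOND and SP over S#.
rejoined = {
    (s, status, city, p, qty)
    for (s, status, city) in second
    for (s2, p, qty) in sp
    if s == s2
}

# The new structure can also hold a fact that FIRST could not represent.
second_with_s5 = second | {("S5", 30, "Athens")}
```

Here rejoined is equal to first, which is what "nonloss" means; S5 can be added to SECOND without any shipment existing for it.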
The SECOND/SP structure still causes problems, however.
Relation SP is satisfactory; as a matter of fact, relation SP is now in
third normal form, and we shall ignore it for the remainder of this section. Relation
SECOND, on the other hand, still suffers from a lack of mutual
independence among its nonkey attributes. The dependency diagram for
SECOND is still more complex than a 3NF diagram. To be specific, the
dependency of STATUS on S#, though it is functional, is transitive (via
CITY): each S# value determines a CITY value, and this in turn
determines the STATUS value. This transitivity leads, once again, to
difficulties over update operations. (We now concentrate on the association
between cities and status values, i.e., on the functional dependency of STATUS
on CITY.)
INSERTING:
We cannot enter the fact that a particular city has a
particular status value (for example, we cannot state that any supplier in
Rome must have a status of 50) until we have some supplier located in that
city. The reason is, again, that until such a supplier exists we have no
appropriate primary key value.
DELETING:
If we delete the only SECOND tuple for a particular city, we destroy not only the information for the
supplier concerned but also the information that the city has that particular status value. For example, if we delete
the SECOND tuple for S5, we lose the information that the status for Athens is 30.
UPDATING:
The status value for a given city appears in SECOND many times. Thus, if we need to change the status value
for London from 20 to 30, we are faced with either the problem of searching the SECOND relation to find every tuple for
London or the possibility of producing an inconsistent result.
The solution to these problems is to replace the original relation (SECOND) by the two projections SC (S#, CITY) and
CS (CITY, STATUS). The corresponding functional dependencies are shown here.
SC: S# → CITY
CS: CITY → STATUS
The tabulations corresponding to these are
SC
S# | City
S1 | London
S2 | Paris
S3 | Paris
S4 | London
S5 | Athens
CS
City   | Status
Athens | 30
London | 20
Paris  | 10
Fig:2 Sample tabulations of SC and CS.
It should be clear that this new structure overcomes all the problems over update operations concerning the CITY-STATUS association.
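A small sketch makes the improvement concrete. It is plain Python; representing SC and CS as dictionaries keyed on their primary keys is an assumption made for the illustration. In SECOND, changing the status of London means touching one tuple per London supplier; in CS it is a single change, and a city's status can be recorded before any supplier is located there.

```python
# SECOND (S#, STATUS, CITY), as tabulated earlier.
second = [
    ("S1", 20, "London"), ("S2", 10, "Paris"), ("S3", 10, "Paris"),
    ("S4", 20, "London"), ("S5", 30, "Athens"),
]

# In SECOND, an update of London's status touches one tuple per London supplier.
tuples_to_update_in_second = sum(1 for (_s, _status, city) in second if city == "London")

# The two projections, each keyed on its primary key.
sc = {s: city for (s, _status, city) in second}       # SC (S#, CITY)
cs = {city: status for (_s, status, city) in second}  # CS (CITY, STATUS)

cs["London"] = 30  # updating: a single change in CS
cs["Rome"] = 50    # inserting: a city's status with no supplier located there yet

# The status of each supplier is obtained by joining SC and CS over CITY.
status_of = {s: cs[city] for s, city in sc.items()}
```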
Third Normal Form
Definition: A relation R is in third normal form (3NF) if and only if it is in 2NF and every non-key attribute is nontransitively dependent on the primary key.
Relations SC and CS (shown in Fig:2) are both in 3NF; relation SECOND (tabulated earlier, together with SP) is not in 3NF. A relation
that is in second normal form and not in third can always be reduced to an equivalent collection of 3NF relations.
Relations with more than one candidate key or BCNF (Boyce-Codd normal form)
Definition:
A relation R is in BCNF if and only if every determinant is a
candidate key. (A determinant is any attribute, possibly composite, on which
some other attribute is fully functionally dependent.)
The objective of BCNF is to handle relations having two or more composite
and overlapping candidate keys. Although BCNF is stronger than 3NF, it is
still true that any relation can be decomposed in a nonloss way into an
equivalent collection of BCNF relations.
Relation FIRST contains three determinants: S#, CITY and the
combination (S#, P#). Among these, (S#, P#) alone is a candidate key; hence
FIRST is not in BCNF.
Relation SECOND is also not in BCNF, because the determinant
CITY is not a candidate key.
Relations SP, SC and CS are in BCNF, because in each case the
primary key is the only determinant in the relation.
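The test "every determinant is a candidate key" can be mechanised using the closure of a set of attributes under the FDs. The sketch below is plain Python; the function names closure and bcnf_violations are assumptions, and it uses the common superkey form of the test (a determinant must functionally determine every attribute of the relation), which agrees with the definition above as long as the listed FDs have minimal left-hand sides.

```python
def closure(attrs, fds):
    """Return all attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return the determinants that do not determine every attribute."""
    violations = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # a trivial dependency never violates BCNF
        if closure(lhs, fds) != set(attributes):
            violations.append(tuple(lhs))
    return violations

# FDs of FIRST, SECOND and SP as given in the text.
first_fds = [(("S#",), ("CITY",)), (("CITY",), ("STATUS",)), (("S#", "P#"), ("QTY",))]
second_fds = [(("S#",), ("CITY",)), (("CITY",), ("STATUS",))]
sp_fds = [(("S#", "P#"), ("QTY",))]

first_bad = bcnf_violations(["S#", "STATUS", "CITY", "P#", "QTY"], first_fds)
second_bad = bcnf_violations(["S#", "STATUS", "CITY"], second_fds)
sp_bad = bcnf_violations(["S#", "P#", "QTY"], sp_fds)
```

The results match the discussion: S# and CITY are the offending determinants in FIRST, CITY is the offending determinant in SECOND, and SP has none.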
Example involving two disjoint (non-overlapping) candidate keys.
Let us consider relation S (S#, SNAME, STATUS, CITY), and suppose that supplier
names are unique, so that S# and SNAME are both candidate keys. The relation S is
in BCNF. However, it is desirable to specify both keys in the definition of the
relation:
a) To inform the DBMS, so that it may enforce the constraints implied
by the two-way dependency between the two keys, namely, that
corresponding to each supplier number there exists a unique supplier name,
and conversely.
b) To inform the users, since the uniqueness of the two
attributes is an aspect of the semantics of the relation and is therefore of
interest to people using it.
Example where the candidate keys overlap.
Two candidate keys overlap if they involve two or more attributes
each and have an attribute in common.
1) We suppose that supplier names are unique, and we consider the
relation SSP (S#, SNAME, P#, QTY). The candidate keys are (S#, P#) and
(SNAME, P#). This relation is not in BCNF because it has two
determinants, S# and SNAME, which are not keys for the relation (S#
determines SNAME, and conversely). But the relation is in 3NF under the
definition given earlier: a relation R is in 3NF if and only if it is in 2NF
and every non-key attribute is non-transitively dependent on the primary
key. That definition does not require an attribute to be fully
dependent on the primary key if it is itself a component of some other
key in the relation, and so the fact that SNAME is not fully dependent on
(S#, P#) is ignored. But this fact leads to redundancy and hence to update problems
in the relation SSP. For example, updating the name of supplier S1 from
Smith to Robinson leads either to search problems or to possibly
inconsistent results. The solution to the problems, as usual, is to decompose
the relation SSP into two projections, in this case SS (S#, SNAME) and
either SP (S#, P#, QTY) or SP (SNAME, P#, QTY). These projections are all in
BCNF.
2) Second example:
Consider the relation SJT with attributes S (student), J (subject) and
T (teacher). The meaning of an SJT tuple is that the specified student is
taught the specified subject by the specified teacher. The semantic rules
follow:
1.For each subject, each student of that subject is taught by only one teacher.
2.Each teacher teaches only one subject.
3.Each subject is taught by several teachers.
The sample tabulation of this relation is as follows
SJT
S     | J       | T
Smith | Math    | Prof. White
Smith | Physics | Prof. Green
Jones | Math    | Prof. White
Jones | Physics | Prof. Brown
The functional dependencies of SJT are:
From the first semantic rule we have a functional dependency of T on the
composite attribute (S, J).
From the second semantic rule we have a functional dependency of J on
T.
From the third semantic rule it follows that there is no functional
dependency of T on J.
So the functional dependencies are as follows
(S, J) → T
T → J
Fig: Functional dependencies in the relation SJT.
72
Here again we have two overlapping candidate keys: the combination (S, J)
and the combination (S, T). Once again the relation is in 3NF but not in BCNF, and once
again the relation suffers from certain anomalies in connection with update
operations. For example, if we wish to delete the information that Jones is studying
physics, we cannot do so without at the same time losing the information that Professor
Brown teaches physics.
The difficulties are caused by the fact that T is a determinant but not a
candidate key. Again we can get over the problem by replacing the original relation
by two BCNF projections, in this case ST (S, T) and TJ (T, J).
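The reasoning about SJT can be checked mechanically. The following is a minimal Python sketch, not part of the original notes: it computes attribute closures from the two functional dependencies stated above, derives the candidate keys, and lists the dependencies whose determinant is not a superkey. The function names are illustrative only.

```python
from itertools import combinations

# Functional dependencies of SJT, as stated in the text: (S, J) -> T and T -> J.
ATTRS = frozenset("SJT")
FDS = [(frozenset("SJ"), frozenset("T")), (frozenset("T"), frozenset("J"))]

def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(attrs, fds):
    """Return the minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = frozenset(combo)
            is_superkey = closure(candidate, fds) == attrs
            if is_superkey and not any(k < candidate for k in keys):
                keys.append(candidate)
    return keys

def bcnf_violations(attrs, fds):
    """Return the FDs whose determinant is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds if closure(lhs, fds) != attrs]

keys = candidate_keys(ATTRS, FDS)
violations = bcnf_violations(ATTRS, FDS)
```

Under these assumptions, keys holds the two overlapping candidate keys (S, J) and (S, T), and violations holds only T -> J, which is the determinant that is not a candidate key.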
Finally, the concept of BCNF eliminates certain problem cases
that could occur under the old definition of 3NF. Moreover, BCNF is conceptually
simpler than 3NF, in that it involves no reference to the concepts of primary key,
transitive dependence and full dependence. The reference to candidate keys can also
be replaced by a reference to the more fundamental notion of functional
dependence.
Good and Bad decompositions
During the reduction process it is frequently the case that a given relation
can be decomposed in a variety of different ways. Consider the relation SECOND
(S#, STATUS, CITY) with functional dependencies (FDs).
SECOND.S# → SECOND.CITY
SECOND.CITY → SECOND.STATUS
And therefore by transitivity
SECOND.S# → SECOND.STATUS
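The representation of the SECOND relation is shown in the following figure.
[Figure: functional dependency diagram. Attributes shown: S#, SNAME, STATUS, CITY; P#, PNAME, COLOR, WEIGHT, CITY; S#, P#, QTY]
Fig: Functional dependencies in relations S, P, SP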
The above diagram shows that the update problems encountered with
SECOND can be overcome by replacing it by its decomposition into the two 3NF
projections
SC (S#, CITY) and CS (CITY, STATUS)          (A)
Let this decomposition be A.
An alternative decomposition is
SC (S#, CITY) and SS (S#, STATUS)            (B)
Decomposition B is also nonloss, and the two projections are again
BCNF. But decomposition B is less satisfactory than decomposition A.
For example, it is still not possible (in B) to insert the fact that a particular
city has a particular status value unless some supplier is located in that city. The
explanation is as follows:
In decomposition A the two projections are independent of each other, in the
sense that updates can be made to either one without regard for the other: provided
each update is legal within its own projection, joining them will not violate the FD constraints on SECOND.
In decomposition B, updates to either of the two projections must be
monitored to ensure that the FD SECOND.CITY → SECOND.STATUS is not
violated. Thus projections SC and SS are not independent of each other.
A relation that cannot be decomposed into independent components is said to
be atomic.
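The difference between the two decompositions can be illustrated with a small Python sketch. It is not part of the original notes, and the supplier numbers, city and status values are hypothetical sample data. It joins the projections back together and tests whether CITY still determines STATUS.

```python
def natural_join(left, right, on):
    """Join two lists of dicts on a single shared attribute."""
    return [{**l, **r} for l in left for r in right if l[on] == r[on]]

def satisfies_fd(rows, lhs, rhs):
    """True if equal lhs values always carry equal rhs values."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True

# Decomposition B: SC (S#, CITY) and SS (S#, STATUS). Sample values are hypothetical.
sc = [{"S#": "S1", "CITY": "London"}, {"S#": "S4", "CITY": "London"}]
ss = [{"S#": "S1", "STATUS": 20}, {"S#": "S4", "STATUS": 20}]
ok_before = satisfies_fd(natural_join(sc, ss, "S#"), "CITY", "STATUS")

# An update to SS alone is legal within SS (S# is still unique) ...
ss[1]["STATUS"] = 30
# ... but it breaks CITY -> STATUS once the projections are joined.
ok_after = satisfies_fd(natural_join(sc, ss, "S#"), "CITY", "STATUS")

# Decomposition A: SC (S#, CITY) and CS (CITY, STATUS). CITY is the key of CS,
# so an update that keeps CITY unique in CS cannot break the FD after the join.
cs = [{"CITY": "London", "STATUS": 20}]
cs[0]["STATUS"] = 30
ok_a = satisfies_fd(natural_join(sc, cs, "CITY"), "CITY", "STATUS")
```

In this sketch ok_before is true and ok_after is false, which is why updates in B must be monitored, while ok_a stays true because the dependency is enforced by the key of CS alone.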
Questions:
1. What is embedded SQL?
2. Define QBE.
3. Explain operations involving cursors and not involving cursors.
4. What do you mean by dynamic statements?
5. Explain retrieval operations of QBE.
6. Explain update operations of QBE.
7. Explain built-in functions of QBE.
8. Define normalization.
9. What are the various forms of normalization?
10. What do you mean by the QBE dictionary?
11. Explain first, second and third normal forms.
12. Explain relations with more than one candidate key [BCNF].
13. What do you mean by good and bad decomposition?
14. What are QBE aggregate functions?
15. What is functional dependency?
Unit IV
Syllabus
Hierarchical Approach: IMS data structure. Physical database, database description,
Hierarchical sequence. External level of IMS: Logical Databases, the program
communication block. IMS data manipulation: Defining the program communication
block: DL/I Examples.
Books for Reference:
An Introduction to Database Systems - C. J. Date
Database System Concepts - Abraham Silberschatz, Henry F. Korth, S. Sudarshan
Principles of Database Systems - Jeffrey D. Ullman
IMS data structure (Information Management System)
A physical database (PDB) is an ordered set, the elements of which consist of all
occurrences of one type of physical database record (PDBR). A PDBR occurrence in turn
consists of a hierarchical arrangement of fixed-length segment occurrences, and a
segment occurrence consists of a set of associated fixed-length field occurrences.
As an example we consider a PDB that contains information about the internal
education system of a large industrial company. The hierarchical structure of this PDB, that is, the PDBR type, is shown here:
COURSE (Course#, Title, Description)
    PREREQ (Course#, Title)
    OFFERING (Date, Location, Format)
        TEACHER (Emp#, Name)
        STUDENT (Emp#, Name, Grade)
Fig: PDBR type for the education database.
In this example we are assuming that the company maintains an education
department whose function is to run a number of training courses. Each course is offered
at a number of different locations within the company. The PDB contains details both of
offerings already given and of offerings scheduled to be given in the future. The details are as
follows:
* For each course: course number (unique), course title, course description,
details of prerequisite courses, if any, and details of all offerings.
* For each prerequisite course of a given course: course number and
title.
* For each offering of a given course: date, location, format, details of all
teachers and details of all students.
* For each teacher of a given offering: employee number and name.
* For each student of a given offering: employee number, name and grade.
In the PDBR structure shown, we have five types of segments:
COURSE, PREREQ, OFFERING, TEACHER and STUDENT, each one
consisting of the field types indicated.
COURSE is the root segment type and the others are dependent segment
types. Each dependent has a parent; for example, the parent of TEACHER is
OFFERING. Similarly each parent has at least one child; for example, COURSE
has two children. For one occurrence of any given segment type there may be any
number of occurrences of each of its child segment types.
COURSE    M23  Dynamics  …
    PREREQ    M16  Trigonometry
    PREREQ    M19  Calculus
    OFFERING  730813  Madrid  F3
        TEACHER   421633  Sharp,R.
        STUDENT   102141  Byrd,W.     B
        STUDENT   183009  Gibbons,O.  A
        STUDENT   761620  Tallis,T.   B
    OFFERING  750106  Oslo    F2
    OFFERING  751104  Dublin  F3
Fig: Sample PDBR Occurrence for the education database.
The database Description
Each physical database is defined, together with its mapping to storage, by
a database description (DBD). The source form of the DBD is written using
special System/370 Assembler Language macro statements. Once written, the DBD
is assembled and the object form is stored in a system library, from which it
may be extracted when required by the IMS control program. The following is
the DBD for the education database.
1  DBD   NAME=EDUCPDBD
2  SEGM  NAME=COURSE,BYTES=256
3  FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
4  FIELD NAME=TITLE,BYTES=33,START=4
5  FIELD NAME=DESCRIPN,BYTES=220,START=37
6  SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
7  FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
8  FIELD NAME=TITLE,BYTES=33,START=4
9  SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
FIG: DBD for the education PDB.
Explanation
* Statement 1: Assigns the name EDUCPDBD (“education physical database
description”) to the DBD. All names in IMS are limited to a maximum length
of eight characters.
* Statement 2: Defines the root segment type, with the name COURSE and a
length of 256 bytes.
* Statements 3-5: Define the field types that make up COURSE. Each is given
a name, a length in bytes, and a start position within the segment. The first field,
COURSE#, is defined to be the sequence field for the segment (SEQ), so the PDBR
occurrences will be sequenced in ascending course number order.
* Statement 6: Defines PREREQ as a 36-byte segment that is dependent on
COURSE.
* Statements 7-8: Define the fields of PREREQ.
* Statement 9: Defines OFFERING as a child of COURSE.
* Statements 10-12: Define the fields of OFFERING. DATE is defined as the
sequence field for OFFERING. The specification M (multiple) means that twin
OFFERING occurrences may contain the same date value.
* Statements 13-15: Define the TEACHER segment and its fields.
* Statements 16-19: Define the STUDENT segment and its fields.
The sequence of statements in the DBD is significant. Specifically, SEGM
statements must appear in the sequence that reflects the hierarchical structure, and
each SEGM statement must be immediately followed by the appropriate FIELD
statements.
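To make the meaning of BYTES and START concrete, here is a minimal Python sketch (not part of the original notes) that uses the field layout of the COURSE segment from statements 2-5. It checks that the three fields exactly fill the 256-byte segment and slices a fixed-length occurrence into named values. The helper names and the description text are hypothetical.

```python
# Field layout of the COURSE segment, copied from statements 2-5 of the DBD:
# (field name, START, BYTES), with START counted from 1 as in the DBD.
COURSE_BYTES = 256
COURSE_FIELDS = [("COURSE#", 1, 3), ("TITLE", 4, 33), ("DESCRIPN", 37, 220)]

def check_layout(fields, segment_bytes):
    """True if the fields are contiguous, non-overlapping and fill the segment."""
    expected_start = 1
    for _name, start, length in sorted(fields, key=lambda f: f[1]):
        if start != expected_start:
            return False
        expected_start = start + length
    return expected_start - 1 == segment_bytes

def decode_segment(raw, fields):
    """Slice a fixed-length segment occurrence into its named field values."""
    return {name: raw[start - 1:start - 1 + length].rstrip()
            for name, start, length in fields}

# Build a sample occurrence (hypothetical description text), padded to 256 bytes.
raw = "M23" + "Dynamics".ljust(33) + "Intro course".ljust(220)
layout_ok = check_layout(COURSE_FIELDS, COURSE_BYTES)
course = decode_segment(raw, COURSE_FIELDS)
```

The contiguity check is specific to this layout, where 3 + 33 + 220 = 256; the notes do not say that fields must always cover the whole segment.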
Hierarchical Sequence
The concept of hierarchical sequence within a database is a very important
one in IMS. It is defined as follows:
For each segment occurrence, we define the “hierarchical sequence key
value” to consist of the sequence field value for that segment, prefixed with the type
code for that segment, prefixed with the hierarchical sequence key value of its
parent, if any. For example, the hierarchical sequence key value for the STUDENT
occurrence for “Byrd,W.” is
1M2337308135102141
Here 1 is the type code for COURSE, M23 the course#, 3 is the type code of
OFFERING, 730813 is the DATE of OFFERING, 5 is the type code of
STUDENT, 102141 is the EMP# of STUDENT.
The hierarchical sequence for an IMS database is then the sequence of segment
occurrences defined by ascending values of the hierarchical sequence key. This
notion is important because IMS databases are stored
in hierarchical sequence.
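The construction of the key can be sketched in a few lines of Python. This is an illustration only, not part of the original notes. The type codes 1, 3 and 5 come from the text; the codes 2 for PREREQ and 4 for TEACHER are an assumption, based on the order of the SEGM statements in the DBD.

```python
# Segment type codes. 1, 3 and 5 are given in the text; 2 and 4 are assumed
# to follow the order of the SEGM statements in the DBD.
TYPE_CODE = {"COURSE": 1, "PREREQ": 2, "OFFERING": 3, "TEACHER": 4, "STUDENT": 5}

def hierarchical_sequence_key(path):
    """path is a list of (segment type, sequence field value) from the root down."""
    return "".join(f"{TYPE_CODE[seg]}{value}" for seg, value in path)

byrd = [("COURSE", "M23"), ("OFFERING", "730813"), ("STUDENT", "102141")]
key = hierarchical_sequence_key(byrd)

# Sorting by this key places each parent before its dependents.
occurrences = [
    byrd,
    [("COURSE", "M23")],
    [("COURSE", "M23"), ("OFFERING", "730813")],
    [("COURSE", "M23"), ("PREREQ", "M16")],
]
in_sequence = sorted(occurrences, key=hierarchical_sequence_key)
```

With these assumptions key reproduces the value 1M2337308135102141, and in_sequence lists the COURSE first, then its PREREQ, then the OFFERING, then the STUDENT under that offering.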
External Level OF IMS
Logical databases:
In the IMS architecture, the user’s external view is defined as a subset of
the corresponding physical database. An LDB (logical database) is an ordered set,
the elements of which consist of all occurrences of one type of LDBR (logical
database record). An LDBR type is a hierarchical arrangement of segment types,
and is derived from the corresponding PDBR hierarchy in accordance with the
following rules:
* Any segment type of the PDBR hierarchy, together with all its dependents, can
be omitted from the LDBR hierarchy.
* The fields of an LDBR segment type can be a subset of those of the
corresponding PDBR segment type, and can be rearranged within that LDBR
segment type.
Example:
COURSE (Course#, Title, Description)
    OFFERING (Date, Location, Format)
        STUDENT (Emp#, Name, Grade)
Fig: Sample LDBR type for the education database.
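The first derivation rule (a segment type may be omitted only together with all its dependents) can be sketched as follows. This Python fragment is an illustration, not IMS code; the parent table is taken from the PARENT= entries of the DBD, and the function name is hypothetical.

```python
# PDBR hierarchy as child -> parent (None marks the root), taken from the DBD.
PDBR_PARENT = {"COURSE": None, "PREREQ": "COURSE", "OFFERING": "COURSE",
               "TEACHER": "OFFERING", "STUDENT": "OFFERING"}

def derive_ldbr(pdbr_parent, omitted):
    """Drop each omitted segment type together with all of its dependents."""
    def is_dropped(segment):
        # Walk up towards the root; an omitted ancestor removes the segment too.
        while segment is not None:
            if segment in omitted:
                return True
            segment = pdbr_parent[segment]
        return False
    return {seg: parent for seg, parent in pdbr_parent.items()
            if not is_dropped(seg)}

ldbr = derive_ldbr(PDBR_PARENT, omitted={"PREREQ", "TEACHER"})
without_offering = derive_ldbr(PDBR_PARENT, omitted={"OFFERING"})
```

Omitting PREREQ and TEACHER gives the sample LDBR type above (COURSE, OFFERING, STUDENT), while omitting OFFERING necessarily removes TEACHER and STUDENT as well.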
Sensitive Segments:
Segments of the PDB that are included in the LDB are said
to be sensitive segments. In the above example COURSE, OFFERING and
STUDENT are sensitive segments. The user of this LDB will not be aware of the
existence of any other segments.
For example, the DL/I “get next” operation, which in general is used for
sequential retrieval, will simply skip over any segments that are not sensitive for
the user. However, if the user deletes a sensitive segment, all children of that segment are
deleted as well, whether or not they are sensitive. So a user should not be given the authority
to delete a segment if doing so would also delete hidden segments.
The sensitive-segment concept also protects the user from certain kinds of growth in the PDB:
a new segment type can be added without affecting the user, provided that the addition does not
affect any existing parent-child relationship.
Finally, the sensitive-segment concept provides a degree of control over data
security, inasmuch as users can be prevented from accessing particular segment
types by the omission of those segments from the LDB.
Sensitive fields
Sensitive fields are those fields of the PDB that are included in the
LDB. Every sensitive field must be contained within a sensitive segment. In general, a given
LDB may include or exclude any combination of fields from the PDB,
except that if the program intends to insert new occurrences of a given segment
type, then it must be “sensitive to” the sequence field for that segment type.
Field sensitivity, like segment sensitivity, protects the user from certain
types of growth in the database and provides a simple level of data security.
The program communication block (PCB)
Each LDB is defined by a program communication block (PCB). The PCB includes the specification of the
mapping between the LDB and the corresponding PDB. Like a DBD (database
description), a PCB is written using special System/370 Assembler Language macro
statements. These statements constitute the “external DDL” for IMS. The set of all
PCBs for a given user forms that user’s program specification block (PSB); the
object form of the PSB is stored in a system library, from which it may be
extracted when required by the IMS control program.
Example:
1 PCB    TYPE=DB,DBDNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
Fig: PCB for the LDB
Explanation
* Statement 1: Specifies that this is a database PCB, that the underlying DBD is named
EDUCPDBD, and that the key feedback area is 15 bytes long.
Key feedback: When the user accesses an LDB, the
corresponding PCB is held in storage and acts as a
communication area between the user’s program and
IMS. One of the fields in the PCB is the key feedback area.
When the user retrieves a segment from the LDB, IMS not
only fetches the requested segment but also places a “fully
concatenated key” into the key feedback area.
The fully concatenated key consists of the
concatenation of the sequence field values of all segments
in the hierarchical path from the root down to the retrieved
segment.
For example, if the user retrieves the STUDENT occurrence for
Byrd,W., IMS will place the value M23730813102141
in the key feedback area. The fully concatenated key of a
segment is not quite the same as the “hierarchical sequence
key”, because it does not include segment type code
information.
* Statement 2: Specifies the first sensitive segment in the LDB. The
name of the sensitive segment must be the same as the name assigned
to the segment in the DBD.
The PROCOPT (“processing options”) entry specifies the
types of operation that the user will be permitted to perform
on this segment. In this example the entry is G (“get”),
indicating retrieval only.
Other options are I (“insert”), R (“replace”) and D
(“delete”).
* Statement 3: Defines the next sensitive segment in the LDB.
* Statement 4: Defines the last sensitive segment. In our example
statements 3 and 4 are very similar. The PROCOPT entry is the
same for each of the three sensitive segments. In such a situation
we may specify PROCOPT in the PCB statement instead of in
each SENSEG statement.
If PROCOPT=K is specified in the SENSEG statement for
OFFERING, the user may largely ignore the presence of
OFFERINGs in the hierarchy. The resulting LDB structure
is shown as follows.
COURSE (Course#, Title, Description)
    STUDENT (Emp#, Name, Grade)
Fig: Effect of specifying PROCOPT=K for OFFERING
The main difference from a true two-level hierarchy is that when a STUDENT occurrence is retrieved, the
fully concatenated key in the key feedback area will include the date value
from the parent OFFERING.
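The fully concatenated key differs from the hierarchical sequence key only in leaving out the type codes. A minimal Python sketch, not part of the original notes and with a hypothetical function name, shows how the value for Byrd,W. is formed and why KEYLEN=15 is sufficient for this LDB.

```python
def fully_concatenated_key(path):
    """Concatenate the sequence field values from the root down (no type codes)."""
    return "".join(value for _segment, value in path)

# Hierarchical path to the STUDENT occurrence for Byrd,W.
byrd = [("COURSE", "M23"), ("OFFERING", "730813"), ("STUDENT", "102141")]
key = fully_concatenated_key(byrd)
key_length = len(key)   # 3 (COURSE#) + 6 (DATE) + 6 (EMP#)
```

The OFFERING date stays in the path even when OFFERING is declared with PROCOPT=K, which is why the key feedback area still contains it.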
The LDB defined by the PCB above is sensitive to all fields in
segments COURSE, OFFERING and STUDENT of the underlying
PDB. Suppose we wish to exclude the LOCATION field of the
OFFERING segment from the LDB while remaining sensitive to all
other fields. We can do so by following the SENSEG statement for OFFERING with SENFLD statements, as shown here:
SENFLD NAME=FORMAT,START=1
SENFLD NAME=DATE,START=3
These statements specify the fields to be included in the LDB segment and
their start positions within that segment; FORMAT is 2 bytes long, so DATE starts at byte 3. If no SENFLD statement is given for
a particular SENSEG statement, then by default that segment is taken to be
identical to the underlying PDB segment.
IMS Data Manipulation
Defining the Program Communication Block (PCB)
The IMS data manipulation language (DL/I) is invoked from the host
language (here PL/I) by means of ordinary subroutine calls. When an application
program is operating on a particular logical database (LDB), the PCB for that
LDB is kept in storage to serve as a communication area between the program
and IMS; in fact, when the program calls DL/I, it has to quote the storage address
of the appropriate PCB to identify to DL/I which LDB it is to operate on.
The PCB address is supplied to the program by IMS when the program is first
entered. What actually happens is this: when a database application is to be run,
IMS is given control first. IMS determines which PSB and DBD(s) are required,
fetches them from their respective libraries and loads them into storage. IMS then
fetches the application program and gives it control, passing it the PCB addresses as
parameters.
In order for the application program to be able to access the information in
the PCB for a particular LDB, it must contain a definition of that PCB.
DLITPLI: PROCEDURE (COSPCB_ADDR) OPTIONS (MAIN);
         DECLARE 1 COSPCB BASED (COSPCB_ADDR),
                   2 DBDNAME    CHARACTER(8),
                   2 SEGLEVEL   CHARACTER(2),
                   2 STATUS     CHARACTER(2),
                   2 PROCOPT    CHARACTER(4),
                   2 RESERVED   FIXED BINARY(31),
                   2 SEGNAME    CHARACTER(8),
                   2 KEYFBLEN   FIXED BINARY(31),
                   2 #SENSEGS   FIXED BINARY(31),
                   2 KEYFBAREA  CHARACTER(15);
         .
         .
         .
Fig A: Example of program entry and PCB definition (PL/I).
Explanation:
The PROCEDURE statement (labeled DLITPLI) is the program entry point. The
expression in parentheses following the keyword PROCEDURE represents the
parameters to be passed to the program by IMS; here it consists of the pointer giving the
address of the PCB. The rest of Fig A consists of a DECLARE statement that
defines a structure to represent the single PCB used in the application.
The field DBDNAME contains the name of the underlying DBD
throughout the execution of the program.
The SEGLEVEL field is set after each DL/I operation to contain the
segment level number of the segment just accessed.
The STATUS field is the most important field in the PCB. After each DL/I
call, a two-character value is placed in this field to indicate the success or
otherwise of the requested operation. A blank value indicates that the operation
was completed satisfactorily; any other value represents an exceptional or error
condition.
The PROCOPT field contains the PROCOPT value as specified in the
PCB statement when the PCB was originally defined.
The SEGNAME field contains the name of the segment last accessed.
The KEYFBLEN field contains the length of the fully concatenated key.
The #SENSEGS field contains a count of the number of sensitive
segments.
The field KEYFBAREA is the key feedback area, which contains the fully
concatenated key.
DL/I Examples
Get unique (GU)                    Direct retrieval
Get next (GN)                      Sequential retrieval
Get next within parent (GNP)       Sequential retrieval under current parent
Get hold (GHU, GHN, GHNP)          Allows subsequent DLET/REPL
Insert (ISRT)                      Add new segment occurrence
Delete (DLET)                      Delete existing segment occurrence
Replace (REPL)                     Replace existing segment occurrence
Tab: DL/I Operations
Direct retrieval:
Get the first OFFERING occurrence where the location is Stockholm.
GU    COURSE
      OFFERING (LOCATION='STOCKHOLM')
Sequential retrieval with an SSA:
Get all STUDENT occurrences in the LDB, starting with the first student for the
first offering in Stockholm.
      GU    COURSE
            OFFERING (LOCATION='STOCKHOLM')
            STUDENT
NS    GN    STUDENT
      GOTO  NS
Sequential retrieval with an SSA within a parent:
Get all students for the offering on 13 August 1973 of course M23.
      GU    COURSE (COURSE#='M23')
            OFFERING (DATE='730813')
NP    GNP   STUDENT
      GOTO  NP
Segment occurrence insertion:
Add a new STUDENT occurrence for the offering on 13 August 1973 of course M23.
ISRT  COURSE (COURSE#='M23')
      OFFERING (DATE='730813')
      STUDENT
Segment deletion:
Delete the offering of course M23 on 13 August 1973.
GHU   COURSE (COURSE#='M23')
      OFFERING (DATE='730813')
DLET
Segment replacement:
Change the location of the 13 August 1973 offering of course M23 to Helsinki.
GHU   COURSE (COURSE#='M23')
      OFFERING (DATE='730813')
REPL
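The retrieval operations can be mimicked with a toy model in Python. This is a much simplified sketch and not an IMS interface: the Cursor class, its method names and the use of None to signal "not found" (instead of a status code in the PCB) are all assumptions made for illustration, and the Oslo student record is hypothetical.

```python
# Occurrences in hierarchical sequence: (segment type, fields, index of parent).
DB = [
    ("COURSE",   {"COURSE#": "M23"}, None),
    ("OFFERING", {"DATE": "730813", "LOCATION": "MADRID"}, 0),
    ("STUDENT",  {"EMP#": "102141"}, 1),
    ("STUDENT",  {"EMP#": "183009"}, 1),
    ("STUDENT",  {"EMP#": "761620"}, 1),
    ("OFFERING", {"DATE": "750106", "LOCATION": "OSLO"}, 0),
    ("STUDENT",  {"EMP#": "999999"}, 5),   # hypothetical student
]

class Cursor:
    def __init__(self, db):
        self.db, self.pos, self.parent = db, -1, None

    def _ancestors(self, i):
        """Indexes of occurrence i and all of its ancestors."""
        chain = []
        while i is not None:
            chain.append(i)
            i = self.db[i][2]
        return chain

    def gu(self, ssas):
        """Get unique: first occurrence whose path satisfies every SSA.
        ssas is a list of (segment type, condition dict), root first."""
        for i, (seg, _fields, _parent) in enumerate(self.db):
            if seg != ssas[-1][0]:
                continue
            path = {self.db[a][0]: self.db[a][1] for a in self._ancestors(i)}
            if all(s in path and cond.items() <= path[s].items() for s, cond in ssas):
                self.pos, self.parent = i, i
                return self.db[i][1]
        return None

    def gn(self, segment):
        """Get next: next occurrence of the given type in hierarchical sequence."""
        for i in range(self.pos + 1, len(self.db)):
            if self.db[i][0] == segment:
                self.pos, self.parent = i, i
                return self.db[i][1]
        return None

    def gnp(self, segment):
        """Get next within parent: like gn, but only under the current parent."""
        for i in range(self.pos + 1, len(self.db)):
            if self.db[i][0] == segment and self.parent in self._ancestors(i):
                self.pos = i
                return self.db[i][1]
        return None

cur = Cursor(DB)
cur.gu([("COURSE", {"COURSE#": "M23"}), ("OFFERING", {"DATE": "730813"})])
students = []
while (s := cur.gnp("STUDENT")) is not None:
    students.append(s["EMP#"])
```

In this model the GNP loop stops at the end of the 730813 offering, whereas a GN loop started at the same position would run on into the students of later offerings.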
Questions.
1. Explain physical and logical database of hierarchical approach with example.
2. Explain DataBase Description (DBD) with example.
3. Explain Hierarchical sequence key value.
4. Explain Program communication block (PCB).
5. Discuss DL/I operations with some examples.
UNIT-V
Syllabus
Network approach: Architecture of DBTG system. DBTG data structure: The set
construct, singular sets, sample schema, and the external level of DBTG-DBTG Data
manipulation
Books for reference:
1:Database system concepts
Abraham Silberschatz and Henry F.Korth
2:An introduction to database systems
C.J.Date
Basic concepts:
A network database consists of a collection of records, which are connected to
one another through links. A record is in many respects similar to an entity in the entity-relationship (E-R) model. Each record is a collection of fields (attributes), each of which
contains only one value. A link can be viewed as a restricted (binary) form of
relationship in the sense of the E-R model.
To illustrate, consider a database representing a customer-account relationship in
a banking system. There are two record types, customer and account. As we saw earlier,
the customer record type can be defined, using Pascal-like notation, as follows:
type customer = record
name: string;
street: string;
city: string;
end
The account record type can be defined as follows:
type account = record
number: integer;
balance: integer;
end
The sample database in Fig:1 shows that Lowman has account 305, Camp
has accounts 226 and 177, and Kahn has account 155.
customer (name, street, city)      account (number, balance)
Lowman   Square     Dallas         305   500
Camp     Downridge  Garland        226   336
                                   177   205
Kahn     Bayside    Plano          155   62
Fig:1
Sample database
Data-structure diagrams: [Architecture of network model]
A data-structure diagram is a schema representing the design of a network
database. Such a diagram consists of two basic components:
*Boxes, which correspond to record types.
*Lines, which correspond to links.
A data-structure diagram serves the same purpose as an entity-relationship diagram;
namely, it specifies the overall logical structure of the database. We shall consider the
representation of binary, ternary etc. relationships of entity-relationship diagrams.
Binary relationship
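The entity-relationship diagram for the banking example is shown as follows:
[Figure: (a) E-R diagram with entity sets customer (name, street, city) and account (number, balance) related by CustAcct; (b) the corresponding data-structure diagram with record types customer (name, street, city) and account (number, balance)]
FIG:2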
Diagram (a) above is the entity-relationship diagram. It consists of
two entity sets, customer and account, related through a binary many-to-many relationship CustAcct with no descriptive attributes.
The diagram shows that a customer may have several accounts and that an
account may belong to several different customers. The corresponding data-structure diagram is shown in figure (b). Here the record type customer
corresponds to the entity set customer. It includes three fields: name, street and
city.
Similarly, account is the record type corresponding to the account entity set and
includes the fields number and balance. Since the
CustAcct relationship in the E-R diagram is many-to-many, we draw no arrows on the link CustAcct
in the data-structure diagram. If the relationship CustAcct were one-to-many from customer to account, then the
link CustAcct would have an arrow pointing to the customer record type. The representation is
shown as follows:
[Figure: two data-structure diagrams, (a) and (b), each linking the record type customer (name, street, city) to the record type account (number, balance)]
FIG:3
A sample database corresponding to the many-to-many data-structure diagram is shown
here. Since the relationship is many-to-many, we show that Katz has accounts 256 and
347 and that account 347 is owned by both Katz and Doner.
customer records (name, street, city):
Beck    Maple     San Francisco
Katz    North     San Jose
Doner   Sidehill  Palo Alto
account records (number, balance):
200   55
256   100 000
347   667
301   10 533
Fig:4
Sample database corresponding to the diagram of FIG:3a
When the relationship is one-to-many from customer to account, a customer may have more than one account, as
is the case with Camp, who owns both 226 and 177. An account, however, cannot belong
to more than one customer, as is indeed observed in the sample database.
Finally, a sample database corresponding to the data-structure diagram of FIG:3b is shown
in FIG:1.
How is the E-R diagram of FIG:2a represented if a descriptive attribute has to be
included?
The transformation is more complicated, because a link cannot contain any data
value. So a new record type has to be created and links need to be established, as follows.
Suppose, for example, that we take the E-R diagram shown in FIG:2a and
add the descriptive attribute date to the CustAcct relationship, to denote the last time the
customer accessed the account. The newly derived E-R diagram is shown here.
To transform this diagram to a data-structure diagram we need to:
1:Replace entities customer and account with record types customer and account.
2:Create a new record type date with a single field to represent the date.
3:Create the following many-to-one links:
*custdate from the date record type to the customer record type
*acctdate from the date record type to the account record type
The DBTG CODASYL Model
The Database Task Group (DBTG) wrote the first database standard specification, called
the CODASYL DBTG 1971 report, in the late 1960s. A number of changes have since
been suggested to that report, the last official one in 1978. The rules laid down
by the DBTG cover:
Link restriction
DBTG Sets
Repeating Groups
Link Restriction
In the DBTG model, only many-to-one links can be used. Many-to-many links are
disallowed in order to simplify the implementation. One-to-one links are represented
using a many-to-one link. Let us illustrate this with the help of an example.
Consider a binary relationship that is either one-to-many or one-to-one. For our
customer-account database, the data-structure diagrams for a one-to-many CustAcct relationship,
without and with a descriptive attribute, are shown in the following figure:
[Figure: two data-structure diagrams linking customer (name, street, city) and account (number, balance); the second includes the record type date]
Fig: Two data-structure diagrams
If the CustAcct relationship is many-to-many, then our transformation algorithm
must be refined as follows. If the relationship has no descriptive attributes, then the
following algorithm must be employed:
1:Replace the entity sets customer and account with record types customer and account.
2:Create a new dummy record type rlink that may either have no fields or have a single
field containing an externally defined unique identifier.
3:Create the following two many-to-one links:
*custrlink from the rlink record type to the customer record type
*acctlink from the rlink record type to the account record type
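The effect of this algorithm on actual data can be sketched in Python. The fragment below is illustrative only and not part of the original notes: it takes the many-to-many ownership pairs mentioned earlier (Katz owns 256 and 347, Doner also owns 347) and produces one dummy rlink record per pair. The dictionary keys reuse the link names above, and the numeric id stands in for the externally defined unique identifier.

```python
from itertools import count

def to_rlink_records(pairs):
    """Turn many-to-many (customer, account) pairs into dummy rlink records,
    each linked to exactly one customer and exactly one account."""
    ids = count(1)
    return [{"id": next(ids), "custrlink": customer, "acctlink": account}
            for customer, account in pairs]

# Ownership pairs stated in the text: Katz has 256 and 347; 347 is also Doner's.
pairs = [("Katz", 256), ("Katz", 347), ("Doner", 347)]
rlinks = to_rlink_records(pairs)

# Both many-to-one links can now be followed in either direction.
owners_of_347 = sorted(r["custrlink"] for r in rlinks if r["acctlink"] == 347)
accounts_of_katz = sorted(r["acctlink"] for r in rlinks if r["custrlink"] == "Katz")
```

Each rlink record has a single owner on each link, so both links are many-to-one even though the relationship they represent is many-to-many.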
[Figure: customer (name, street, city) and account (number, balance) related by CustAcct]
DBTG sets
Given that only many-to-one links can be used in the DBTG model, a data-structure diagram consisting of two record types that are linked together has the
general form of the following figure:
[Figure: record type A linked to record type B]
Fig:A
The structure shown above is referred to in the DBTG model as a DBTG-set. The name of
the set is usually chosen to be the same as the name of the link connecting the two record
types.
In each such DBTG-set, the record type A is designated the owner (or parent) of the
set, and the record type B is designated the member (or child) of the set. Each DBTG-set can
have any number of set occurrences, that is, actual instances of linked records.
For example, in the figure we have three set occurrences corresponding to the
DBTG-set of figure A.
Since many-to-many links are disallowed, each set occurrence has precisely one
owner and zero or more member records. In addition, no member record of a set can
participate in more than one occurrence of that set at any point. A member record can, however,
participate simultaneously in several set occurrences of different DBTG-sets.
To illustrate, consider the data-structure diagram shown here. There are two
DBTG-sets:
* custacct, having customer as the owner of the DBTG-set, and account as the
member of the DBTG-set.
* brncacct, having branch as the owner of the DBTG-set, and account as the
member of the DBTG-set.
The set custacct may be defined as follows:
Set name is custacct
Owner is customer
Member is account
The set brncacct may be defined similarly as
Set name is brncacct
Owner is branch
Member is account
An instance of the database is shown here:
Five set occurrences are shown: three of set custacct, and two of set brncacct.
1:Owner is customer record Lowman with a single member account record 305.
2:Owner is customer record Camp with two member account records 177 and 226.
3:Owner is customer record Kahn with three member account records 155, 402 and
408.
4:Owner is branch record Hillside with three member account records 305, 226 and
155.
5:Owner is branch record Valleyview with three member account records 177, 402
and 408.
Note that an account record cannot appear in more than one set occurrence of
one individual set type. This is because an account can belong to exactly one
customer, and can be associated with only one bank branch. An account can, however, appear in
two set occurrences of different set types. For example, account 305 is a member of
set occurrence 1 of type custacct and is also a member of set occurrence 4 of type
brncacct.
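The membership rule can be checked against the five set occurrences listed above. The Python fragment below is an illustrative sketch, not part of the original notes; the function names are hypothetical.

```python
from collections import Counter

# Set occurrences from the instance described above: (set type, owner, members).
OCCURRENCES = [
    ("custacct", "Lowman", [305]),
    ("custacct", "Camp", [177, 226]),
    ("custacct", "Kahn", [155, 402, 408]),
    ("brncacct", "Hillside", [305, 226, 155]),
    ("brncacct", "Valleyview", [177, 402, 408]),
]

def members_in_multiple_occurrences(occurrences):
    """Return (set type, member) pairs that appear in more than one occurrence
    of the same set type, which the DBTG model forbids."""
    counts = Counter((set_type, member)
                     for set_type, _owner, members in occurrences
                     for member in members)
    return sorted(pair for pair, n in counts.items() if n > 1)

def occurrences_of(member, occurrences):
    """List the (set type, owner) pairs of the occurrences containing member."""
    return [(t, o) for t, o, members in occurrences if member in members]

rule_violations = members_in_multiple_occurrences(OCCURRENCES)
where_is_305 = occurrences_of(305, OCCURRENCES)
```

For this instance rule_violations is empty, and account 305 is found once in custacct (owner Lowman) and once in brncacct (owner Hillside), as described above.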
The member records of a set occurrence may be ordered in a variety of ways.
Repeating Groups:
The DBTG model provides a mechanism for a field to have a set of values, rather
than one single value.
For example, suppose that a customer has several addresses. In this case, the
customer record type will have the (street, city) pair of fields defined as a repeating
group. So the customer record for Kahn is shown here:
The repeating-groups construct is another way of representing the notion of weak
entities in the E-R model. To illustrate, we split the entity set customer into two
sets:
*Customer, with descriptive attribute name
*Address, with descriptive attributes street and city.
The address entity set is a weak entity set, since it depends on the strong entity set
customer.
DBTG data retrieval facility
The data manipulation language of the DBTG proposal consists of a number of
commands that are embedded in a host language. The commands are explained as
follows:
The Find and Get commands
The two most frequently used DBTG commands are
*find, which locates a record in the database and sets the appropriate
currency pointers
*get, which copies the record to which the current of run-unit
points from the database to the appropriate program work area
template.
Access of individual records:
The find command has a number of forms. There are two different find commands for
locating individual records in the database. The simplest command has the form:
Find any <record type> using <record-field>
Purpose: Locates a record of type <record type> whose <record-field> value is the
same as the value of <record-field> in the <record type> template in the program
work area. The following currency pointers are set to point to that record:
*The current of run-unit pointer
*The record-type currency pointer for <record type>
*For each set to which that record belongs, the appropriate set currency pointer
For example: Construct the DBTG query that prints the street address of Lowman.
Customer.name:="Lowman";
Find any customer using name;
Get customer;
Print (customer.street);
To locate further records that match the same value, the command is
Find duplicate <record type> using <record-field>
which locates the next record that matches the <record-field>.
Example: Construct the DBTG query that prints the names of all the customers who
live in Dallas:
Customer.city:="Dallas";
Find any customer using city;
While DB-status = 0 do
Begin
    Get customer;
    Print (customer.name);
    Find duplicate customer using city;
End;
Access of records within a set
Purpose: Locate records in a particular DBTG-set.
There are three different types of commands.
The basic find command is
Find first <record type> within <set-type>
Which locates the first database record of type <record type> belonging to the current
<set-type>.
94
To locate the other members of a set the command is
Find next <record-type> within <set-type>
This command finds the next elements in the set <set-type>
Example: Construct the DBTG query that prints the total balance of all accounts
belonging to Lowman.
Sum := 0;
Customer.name := "Lowman";
Find any customer using name;
Find first account within custacct;
While DB-status = 0 do
Begin
Get account;
Sum := sum + account.balance;
Find next account within custacct;
End;
Print (sum);
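The traversal of a set occurrence with find first and find next can be sketched in Python as follows. The SetCursor class, the account numbers and the balances are hypothetical and serve only to show how the set currency pointer moves through the members.

```python
# Hypothetical custacct set occurrences: owner name -> ordered member records.
custacct = {
    "Lowman": [{"number": 101, "balance": 500}, {"number": 215, "balance": 250}],
    "Kahn": [{"number": 402, "balance": 90}],
}

class SetCursor:
    """Tracks the set currency pointer inside one set occurrence."""
    def __init__(self, members):
        self.members = members
        self.pos = None
        self.db_status = 0

    def _move(self, pos):
        if pos < len(self.members):
            self.pos, self.db_status = pos, 0
        else:
            self.db_status = 1   # assumed nonzero code: end of set reached

    def find_first(self):
        self._move(0)

    def find_next(self):
        self._move(self.pos + 1)

    def get(self):
        return dict(self.members[self.pos])

def total_balance(owner):
    cur = SetCursor(custacct[owner])
    total = 0
    cur.find_first()
    while cur.db_status == 0:
        total += cur.get()["balance"]
        cur.find_next()
    return total
```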
To find the owner of a particular DBTG-set, the command used is
Find owner within <set-type>
Example: Construct the DBTG query that prints all the customers of the Hillside
branch:
Branch.name := "Hillside";
Find any branch using name;
Find first account within brncacct;
While DB-status = 0 do
Begin
Find owner within custacct;
Get customer;
Print (customer.name);
Find next account within brncacct;
End;
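As a rough illustration of find owner, the sketch below stores with every account the owner it has in each of the two sets. The field names custacct_owner and brncacct_owner and the sample data are assumptions made for this example only.

```python
# Hypothetical accounts, each a member of one custacct and one brncacct occurrence.
accounts = [
    {"number": 101, "custacct_owner": "Lowman", "brncacct_owner": "Hillside"},
    {"number": 215, "custacct_owner": "Lowman", "brncacct_owner": "Valley View"},
    {"number": 402, "custacct_owner": "Kahn", "brncacct_owner": "Hillside"},
]

def members(set_type, owner):
    """All member accounts of the set occurrence owned by `owner`."""
    return [a for a in accounts if a[set_type + "_owner"] == owner]

def find_owner(account, set_type):
    """Owner of the occurrence of `set_type` that contains `account`."""
    return account[set_type + "_owner"]

def customers_of_branch(branch):
    # For every account of the branch, look up its owner in custacct.
    return [find_owner(a, "custacct") for a in members("brncacct", branch)]
```

As in the DBTG program, a customer holding two accounts at the same branch would be listed twice.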
DBTG update facility
Creating new records
To create a new record of type <record type>, we insert the appropriate values in
the corresponding <record type> template and then execute the command
Store <record type>
Example: Construct the DBTG query to add a new customer Jackson to the
database.
Customer.name := "Jackson";
Customer.street := "Old road";
Customer.city := "Richardson";
Store customer;
Modifying an existing record
In order to modify an existing record of type <record type> we must find the
record in the database, get that record into the memory, and then change the desired
fields in the template of <record type>. Once this is accomplished, we reflect the
changes to the record to which the currency pointer of <record type> points by
executing the command:
Modify <record type>
The DBTG model requires that the find command executed prior to modifying a
record have the additional clause "for update", so that the system is aware of the
fact that the record is to be modified.
Example:
Construct the DBTG program to change the street address of Kahn to North Loop.
Customer.name := "Kahn";
Find for update any customer using name;
Get customer;
Customer.street := "North Loop";
Modify customer;
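The find, get, change, modify sequence can be pictured with the following Python sketch. The helper functions and the initial street and city values are hypothetical. The point it shows is that the change is made in the template copy and reaches the database only when modify is executed.

```python
# Hypothetical database with one customer record; initial values are invented.
database = [{"name": "Kahn", "street": "Old road", "city": "Austin"}]

def find_for_update(name):
    """Return the position of the matching record, or None if absent."""
    for i, rec in enumerate(database):
        if rec["name"] == name:
            return i
    return None

def get(i):
    # Copy the record into the template; edits stay local until modify.
    return dict(database[i])

def modify(i, template):
    # Write the template back over the current record.
    database[i] = dict(template)

cur = find_for_update("Kahn")
template = get(cur)
template["street"] = "North Loop"
before_modify = database[cur]["street"]   # still the old value at this point
modify(cur, template)
```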
Deleting a record
To delete an existing record of type <record type> we use the command:
Erase <record type>
Example:
Construct the DBTG program to delete account 402 belonging to Kahn:
Finish := false;
Customer.name := "Kahn";
Find any customer using name;
Find for update first account within custacct;
While DB-status = 0 and not finish do
Begin
Get account;
If account.number = 402 then
Begin
Erase account;
Finish := true;
End
Else
Find for update next account within custacct;
End;
It is possible to delete an entire set occurrence by finding the owner of the set (say, a
record of type <record type>) and executing
Erase all <record type>
This will delete the owner of the set as well as all of its members. If a member of
the set is an owner of another set, the members of that set are also deleted. Thus, the erase
all operation is recursive.
Example:
Consider the DBTG program to delete customer "Camp" and all of her accounts.
Customer.name := "Camp";
Find for update any customer using name;
Erase all customer;
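The recursive nature of erase all can be sketched in Python. The record keys below are hypothetical, and the second level (records T-1 and T-2 owned by an account) is invented purely to show the recursion reaching members of members.

```python
# Hypothetical records: each key maps to the list of member records it owns.
records = {
    "Camp": {"owns": ["A-305", "A-410"]},
    "A-305": {"owns": ["T-1", "T-2"]},   # invented second-level members
    "A-410": {"owns": []},
    "T-1": {"owns": []},
    "T-2": {"owns": []},
    "Kahn": {"owns": ["A-402"]},
    "A-402": {"owns": []},
}

def erase_all(key):
    """Delete the record and, recursively, every member of the sets it owns."""
    for member in records[key]["owns"]:
        erase_all(member)
    del records[key]
```

Executing erase_all("Camp") removes Camp, both of her accounts and the records those accounts own, while Kahn and account A-402 are untouched.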
DBTG set-processing facility
This facility provides the mechanisms for inserting records into and removing
records from a particular set occurrence.
The connect statement
To insert a new record of type <record type> into a particular occurrence of <set-type>,
we must first insert the record into the database, then set the currency pointers of
<record type> and <set-type> to point to the appropriate record and set occurrence.
The command used is
Connect <record type> to <set-type>
A new record can be inserted as follows:
1. Create a new record of type <record type>.
2. Find the appropriate owner of the set <set-type>.
3. Insert the new record into the set by executing the connect statement.
Example:
Construct the DBTG query that creates new account 267, which belongs to Jackson:
Account.number := 267;
Account.balance := 0;
Store account;
Customer.name := "Jackson";
Find any customer using name;
Connect account to custacct;
The Disconnect statement
In order to remove a record of type <record type> from a set occurrence of <set-type>,
we need to set the currency pointers of <record type> and <set-type> to point to the
appropriate record and set occurrence. Once this is accomplished, the record can be
removed from the set by executing
Disconnect <record-type> from <set-type>
Example: To remove account 177 from the set occurrence of type custacct:
Account.number := 177;
Find for update any account using number;
Get account;
Find owner within custacct;
Disconnect account from custacct;
The reconnect statement
In order to move a record of type <record-type> from one set occurrence to
another set occurrence of type <set-type>, we need to find the appropriate record and the
owner of the set occurrence to which the record is to be moved. Once this is done, we can
move the record by executing:
Reconnect <record-type> to <set-type>
Example: Consider the DBTG program to move all accounts of Lowman that are currently
at the Hillside branch to the Valley View branch.
Customer.name := "Lowman";
Find any customer using name;
Find first account within custacct;
While DB-status = 0 do
Begin
Find owner within brncacct;
Get branch;
If branch.name = "Hillside" then
Begin
Branch.name := "Valley View";
Find any branch using name;
Reconnect account to brncacct;
End;
Find next account within custacct;
End;
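The effect of reconnect can be sketched in Python by treating each set type as a mapping from owner to member list. The account numbers and the helper names are assumptions for illustration. Note that only the brncacct membership changes; the custacct membership is left alone.

```python
# Hypothetical set occurrences: owner -> list of member account numbers.
brncacct = {"Hillside": [101, 402], "Valley View": [215]}
custacct = {"Lowman": [101, 215], "Kahn": [402]}

def find_owner(set_occurrences, member):
    """Return the owner whose set occurrence contains `member`, if any."""
    for owner, members in set_occurrences.items():
        if member in members:
            return owner
    return None

def reconnect(set_occurrences, member, new_owner):
    """Move `member` from its present occurrence to the one owned by `new_owner`."""
    old_owner = find_owner(set_occurrences, member)
    set_occurrences[old_owner].remove(member)
    set_occurrences[new_owner].append(member)

def move_accounts(customer, src, dst):
    # Visit each account of the customer and move those held at branch `src`.
    for acct in list(custacct[customer]):
        if find_owner(brncacct, acct) == src:
            reconnect(brncacct, acct, dst)
```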
Set Insertion and Retention
When a new set is defined, we must specify how member records are to be
inserted. In addition, we must specify the conditions under which a record must be
retained in the set occurrence in which it was initially inserted.
Set Insertion
A newly created member record of type <record type> of a set type <set-type> can be
added to a set occurrence either explicitly (manually) or implicitly (automatically).
This distinction is specified at set definition time via
Insertion is <insert mode>
where <insert mode> can take one of two forms:
*Manual: The new record can be inserted into the set manually (explicitly) by
executing
Connect <record type> to <set-type>
*Automatic: The new record is inserted into the set automatically (implicitly)
when it is created, that is, when we execute
Store <record type>
In either case, just prior to insertion, the <set-type> currency pointer must point to
the set occurrence into which the insertion is to be made.
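The difference between the two insertion modes can be sketched in Python. The SetType class and the store function are hypothetical stand-ins: with manual insertion, store only adds the record to the database, whereas with automatic insertion it also places the record in the set occurrence that is current at that moment.

```python
class SetType:
    """A set type with an insertion mode and a set currency pointer."""
    def __init__(self, insertion):
        if insertion not in ("manual", "automatic"):
            raise ValueError(insertion)
        self.insertion = insertion
        self.current_owner = None   # set currency: the current occurrence
        self.occurrences = {}       # owner -> list of member records

    def make_current(self, owner):
        self.occurrences.setdefault(owner, [])
        self.current_owner = owner

    def connect(self, record):
        # Explicit insertion into the current set occurrence.
        self.occurrences[self.current_owner].append(record)

def store(record, database, set_type):
    database.append(record)
    if set_type.insertion == "automatic":
        set_type.connect(record)    # implicit insertion at store time
```

In both modes the set currency must already identify the target occurrence, which is why make_current is called before store or connect.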
Set Retention
There are various restrictions on how and when a member record can be removed
from a set occurrence into which it has been inserted previously. These restrictions are
specified at set definition time via
Retention is < retention-mode >
where <retention-mode> can take one of three forms:
*Fixed: Once a member record has been inserted into a particular set occurrence,
it cannot be removed from that set. If retention is fixed, then to reconnect a
record to another set, we must first erase that record, re-create it, and then insert
it into the new set occurrence.
*Mandatory: Once a member record has been inserted into a particular set
occurrence, it can be reconnected only to another set occurrence of type <set-type>.
It can neither be disconnected nor be reconnected to a set of another type.
*Optional: No restrictions are placed on how and when a member record can be
removed from a set occurrence. It can be reconnected, disconnected, and connected at will.
The choice among these options depends on the application.
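The three retention modes can be summarised as checks on the disconnect and reconnect operations. The Python sketch below uses a hypothetical Membership class and RetentionError exception; it only encodes the rules stated above and is not a model of any real DBTG system.

```python
class RetentionError(Exception):
    """Raised when an operation violates the retention mode."""

class Membership:
    """One member record's place in a set type with a given retention mode."""
    def __init__(self, retention, owner):
        self.retention = retention   # "fixed", "mandatory" or "optional"
        self.owner = owner           # None when the record is not connected

    def disconnect(self):
        # Only optional retention lets a member leave without joining another set.
        if self.retention != "optional":
            raise RetentionError(self.retention + ": disconnect not allowed")
        self.owner = None

    def reconnect(self, new_owner):
        # Fixed retention forbids moving; mandatory and optional allow it.
        if self.retention == "fixed":
            raise RetentionError("fixed: reconnect not allowed")
        self.owner = new_owner
```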
Deletion
When a record is deleted (erased) and that record is the owner of a set occurrence of
type <set-type>, the way this deletion is handled depends on the retention
specification of <set-type>:
*If the retention status is optional, then the record will be deleted and every
member of the set it owns will be disconnected. These records, however, are
kept in the database.
*If the retention status is fixed, then the record and all of its owned
members will be deleted. This follows from the fact that the fixed status
indicates that a member record cannot be removed from the set occurrence
without being deleted.
*If the retention status is mandatory, then the record cannot be erased. This is
because the mandatory status indicates that a member record must belong to
a set occurrence; it cannot be disconnected from that set.
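The three deletion rules can be written as one small Python function. The function name, the use of a Python set for the database and the sample keys are hypothetical. One further assumption, marked in the code, is that an owner with no remaining members may be erased even under mandatory retention.

```python
def erase_owner(owner, occurrences, retention, database):
    """Apply the deletion rule for `owner`; return True if it was erased.

    occurrences: owner -> list of member keys; database: set of record keys.
    """
    members = occurrences.get(owner, [])
    if retention == "mandatory" and members:
        # Assumption: refusal applies only while members still exist.
        return False
    if retention == "fixed":
        for m in members:
            database.discard(m)      # members are deleted with their owner
    # Optional: members are merely disconnected and stay in the database.
    occurrences.pop(owner, None)
    database.discard(owner)
    return True
```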
Set Ordering
The members of a set occurrence of <set-type> may be ordered in a variety of
ways. A programmer specifies the order when the set is defined, via
Order is <order-mode>
where <order-mode> can be
*First: When a new record is added to a set, it is inserted in the first position.
Thus, the set is in reverse chronological order.
*Last: When a new record is added to a set, it is inserted in the last position.
Thus, the set is in chronological order.
*Next: Suppose that the currency pointer of <set-type> points to record X. If X
is a member type, then when a new record is added to the set, it is
inserted in the position following X. If X is an owner type, then when a new
record is added, it is inserted in the first position.
*Prior: Suppose that the currency pointer of <set-type> points to record X. If X
is a member type, then when a new record is added to the set, it is
inserted in the position just prior to X. If X is an owner type, then
when a new record is added, it is inserted in the last position.
*System default: When a new record is added to a set, it is inserted in an
arbitrary position determined by the system.
*Sorted: When a new record is added to a set, it is inserted in a position that
ensures that the set will remain sorted. The sorting order is specified by a
particular key value when a programmer defines the set. The programmer must
specify whether members are ordered in ascending or descending order relative to
that key.
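The order modes can be compared with a short Python sketch that inserts into a list of members. The function insert_member is hypothetical. It covers first, last, next, prior and ascending sorted order, handles next and prior only when the set currency points at a member, and leaves out the system default mode because its position is arbitrary.

```python
import bisect

def insert_member(members, new, mode, current=None, key=None):
    """Insert `new` into the list `members` according to the order mode.

    current: index of the member the set currency points at (next/prior only).
    key: function giving the sort key of a member (sorted mode only).
    """
    if mode == "first":
        members.insert(0, new)
    elif mode == "last":
        members.append(new)
    elif mode == "next":
        members.insert(current + 1, new)   # just after the current member
    elif mode == "prior":
        members.insert(current, new)       # just before the current member
    elif mode == "sorted":
        keys = [key(m) for m in members]
        members.insert(bisect.bisect_right(keys, key(new)), new)
    else:
        raise ValueError(mode)
```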
REFER THE TEXT BOOK FOR FURTHER REFERENCE
Questions:
1. Explain the architecture of network model.
2. Write short notes on
a) Link restriction
b) DBTG Sets
c) Repeating Groups
3. Explain DBTG data retrieval facility.
4. Explain DBTG set-processing facility.
5. Explain DBTG update facility.
6. What are set insertion and retention?