Unit1 - E

Study Material
Relational Database Management System
Class : II BCA ‘A’ & ‘B’
Subject : Relational Database Management System
Unit : I
Semester : IV
Staff : Malar.N
Unit-1:
Introduction:
Purpose of the database system – view of the data – data models – Database languages –
transaction management- storage management- Database administrator- Database users – system
structure.
Entity relationship model:
Basic concepts – keys – entity relationship diagram – weak entity sets – extended E-R features –
specialization – generalization.
Relational model:
Structure of the relational databases- relational algebra –views.
INTRODUCTION:
A database management system (DBMS) consists of a collection of interrelated data and a set of
programs to access those data. The primary aim of a DBMS is to provide an environment that
is both convenient and efficient to use in retrieving and storing database information.
Purpose Of The Database Systems:
Previously, organizations used conventional file systems, which are supported by all
operating systems. Permanent records are stored in various files, and different application
programs are written to extract records from, and add records to, the appropriate files.
Disadvantages of Conventional file system:
1. Data redundancy and inconsistency
2. Difficulty in accessing the data
3. Data isolation
4. Integrity problems
5. Atomicity problems
6. Concurrent access anomalies
7. Security problems
Data redundancy and inconsistency:
Since the files and application programs are written by different programmers, they may use
different formats and programming languages, and the same data may be stored in more than one
file. This leads to redundancy of data.
For example, the address and telephone number of a particular customer may be stored in
two different files, which wastes storage space. It also leads to data inconsistency, that is, a
change made to the data in one file may not appear in the other files. For example, a changed
customer address in one file may not be reflected in the other file.
Difficulty in accessing the data:
If we want to view some records for a new kind of request, the current system may not
have any application program to meet it. So we have two options:
1. Develop a new program to satisfy the user's need
2. Extract the data manually
Suppose that in a bank an officer needs the list of customers who reside in Coimbatore.
He has two choices:
1. He can get the list of all customers and pick out the needed records by hand
2. He can write a new piece of code
Both choices are inefficient. Even if new code is written, the manager may later need the
list of customers who have a balance of Rs 20,000, and then yet another piece of code has to be
written.
Conventional file processing does not allow needed data to be retrieved in a
convenient and efficient manner.
Data isolation:
Since the data are scattered in various files, and the files may be in different formats, it is
difficult to write new application programs to retrieve the appropriate data.
Integrity problems:
The data values stored in a database must satisfy certain types of consistency
constraints. When new constraints are added, it is difficult to change the programs to enforce
them. If the constraints involve more than one file, they are even more difficult to implement.
Example: A bank balance should never fall below Rs 500.
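As a rough illustration of how a database system can enforce such a constraint in one place, the sketch below uses Python's built-in sqlite3 module and a CHECK constraint. The table name account, its columns and the sample amounts are assumptions made only for this example.

```python
import sqlite3

# In-memory database, used only for this illustration.
conn = sqlite3.connect(":memory:")

# The constraint "balance must not fall below 500" is declared once, in the schema.
conn.execute(
    "CREATE TABLE account ("
    " acc_no INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 500))"
)
conn.execute("INSERT INTO account VALUES (143, 5000)")


def try_update(new_balance):
    """Return True if the update is accepted, False if the constraint rejects it."""
    try:
        conn.execute(
            "UPDATE account SET balance = ? WHERE acc_no = 143", (new_balance,)
        )
        return True
    except sqlite3.IntegrityError:
        # SQLite reports a failed CHECK constraint as an IntegrityError.
        return False


accepted = try_update(600)   # satisfies the constraint
rejected = try_update(100)   # violates the constraint, so the row is left unchanged
final_balance = conn.execute(
    "SELECT balance FROM account WHERE acc_no = 143"
).fetchone()[0]
```

Because the rule lives in the schema, every program that updates the table is subject to it, which is the point the file-processing approach misses.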
Atomicity problems:
A computer system, like any other system, is subject to failure. If a failure occurs, the
data should be restored to the consistent state that existed prior to the failure. A data transfer
must be atomic, i.e. it must happen in its entirety or not at all. It is difficult to ensure this in a
file processing system.
Consider an example of transferring Rs 20,000 from account A to account B. If some
failure occurs in the middle, the amount might be debited from account A but not credited to
account B.
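The transfer example can be sketched with Python's built-in sqlite3 module, which lets the debit and the credit be committed or rolled back together. This is only a minimal sketch: the table layout, the opening balances and the fail_midway flag that simulates a crash are all assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 50000), ("B", 10000)])
conn.commit()


def transfer(amount, fail_midway=False):
    """Move amount from A to B; undo the debit if anything fails before the commit."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,)
        )
        if fail_midway:
            # Hypothetical failure between the debit and the credit.
            raise RuntimeError("simulated failure after the debit")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,)
        )
        conn.commit()
    except RuntimeError:
        conn.rollback()  # restores the state that existed before the transfer began


def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))


transfer(20000, fail_midway=True)
after_failure = balances()   # neither account has changed

transfer(20000)
after_success = balances()   # both the debit and the credit are visible
```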
Concurrent access anomalies:
Many systems allow multiple users to update the data simultaneously. Interaction of
concurrent updates may result in inconsistent data. If two programs are allowed to run
concurrently, they may both read the same old value and each write back a result based on it,
which leaves the data in an inconsistent state. To guard against this possibility, the system must
maintain some form of supervision.
Consider a bank account A that contains Rs 500, from which two customers withdraw
Rs 50 and Rs 100 simultaneously. If the two programs execute at the same time, both may read
the old balance of Rs 500, and the final balance might be Rs 450 or Rs 400, depending on which
program writes its value last. The actual balance should be Rs 350. So unsupervised concurrent
updating of bank balances is not safe.
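The anomaly in this example can be reproduced without real threads by splitting each withdrawal into a read step and a write step and choosing the order by hand. The sketch below is a simplified model, not how a DBMS is implemented, and the helper name withdraw_steps is made up for the illustration.

```python
def withdraw_steps(amount):
    """Model a withdrawal as two separate steps: read the balance, then write it back."""

    def read(account):
        return account["balance"]

    def write(account, old_balance):
        account["balance"] = old_balance - amount

    return read, write


read1, write1 = withdraw_steps(50)
read2, write2 = withdraw_steps(100)

# Interleaved execution: both programs read before either one writes.
account = {"balance": 500}
seen1 = read1(account)      # program 1 sees 500
seen2 = read2(account)      # program 2 also sees 500
write1(account, seen1)      # balance becomes 450
write2(account, seen2)      # balance becomes 400, overwriting program 1's update
interleaved_balance = account["balance"]

# Serial execution: the second program starts only after the first has finished.
account = {"balance": 500}
write1(account, read1(account))
write2(account, read2(account))
serial_balance = account["balance"]
```

The interleaved run loses the first withdrawal, while the serial run gives the expected Rs 350.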
Security problems:
Not every user of the database system should be able to access all the data. So the
database should be able to manage the authorization of users.
View of the Data:
A DBMS provides an abstract view of the data. It hides certain details of how the data are
stored and maintained.
Data abstraction:
A database uses complex data structures to represent data so that the data can be retrieved
efficiently. To simplify users' interaction with the system, developers hide the complexity from
users through several levels of abstraction.
There are three different levels of abstraction. They are
1. Physical level
2. Logical level
3. View level
[Figure: the three levels of data abstraction. View 1, View 2, ... View n form the view level,
which sits above the logical level, which in turn sits above the physical level.]
Physical level:
This is the lowest level of abstraction and it describes “How” the data are actually
stored. Complex low-level data structures are described in detail. At the physical level, a table
or record is described as a block of consecutive storage locations.
Logical level:
The next higher level of abstraction describes “What” data are stored in the database
and what relationships exist among those data. Implementing these simple logical structures may
involve complex physical level structures, but the user of the logical level does not need to be
aware of this complexity. The logical level is handled by the DBA (Database Administrator).
Here each record is described by a type definition, together with the interrelationships among
these record types.
View level:
This is the highest level of abstraction and it describes only part of the entire database,
since many users of the database are not concerned with all the information. The system may
provide many views for the same database. A view hides the details of data types and also
provides a security mechanism.
Instances and Schemas:
Databases change over time as information is inserted and deleted. The collection of
information stored in the database at a particular moment is called an instance of the database.
The overall design of the database is called the schema.
There are three different types of schema. They are
1. Logical schema
2. Physical schema
3. Sub schema
In general, a database system supports one physical schema, one logical schema, and several
sub schemas.
Analogous example:
Schema -> programming language type definition.
Instance -> variable (with its current value), which corresponds to the type definition.
Eg: class Abc {}; Abc a;   (Abc is the type definition and a is a variable of that type)
Data independence:
The ability to modify a schema definition at one level without affecting a schema
definition at the next higher level is called data independence. There are two levels of
data independence.
1. Physical data independence:
This is the ability to modify the physical schema without causing application programs
to be rewritten.
2. Logical data independence:
This is the ability to modify the logical schema without causing application programs to
be rewritten. Eg: adding one more field.
Logical data independence is more difficult to achieve than physical data independence,
because application programs depend heavily on the logical structure of the data they access.
Data Models:
The underlying structure of a database is the data model, which is a collection
of conceptual tools for describing data, relationships, semantics and consistency constraints.
The different data models are
1. Object based logical models
2. Record based logical models
3. Physical models
Object based logical model:
There are many different models and more are likely to come. Several of the more widely
known models are
1. Entity relationship model
2. Object oriented model
3. Semantic data model
4. Functional data model
1.Entity Relationship Model:
It is based on a collection of basic objects called entities and of relationships among
these objects. An entity is a thing or object in the real world that is distinguishable from other
objects. For example, a bank account is an entity. Entities are described by a number of
attributes. A relationship is an association among several entities. ER diagrams can express the
overall logical structure of a database. The various symbols used in ER diagrams are
1. Rectangles, which represent entity sets
2. Ellipses, which represent attributes
3. Diamonds, which represent relationships among entity sets
4. Lines, which link attributes to entity sets and entity sets to relationships
[Figure: sample E-R diagram. Entity set Customer (attributes Name, City) is linked through
relationship set Deposits to entity set Account (attributes Acc. no, Balance).]
2.Object oriented model:
This model also consists of a collection of objects. An object contains values stored in
instance variables, and bodies of code that operate on the object, called methods.
Objects that contain the same types of values and the same methods are grouped together
into classes. The only way in which one object can access the data of another object is by
invoking a method of that other object. Thus the instance variables and the method code are
hidden from outside the object, which achieves two levels of data abstraction.
The advantage:
If we want to make a change, there is no need to change the entire program. Changing
the method alone is enough.
For example, consider a bank account object that contains the instance variables account
number and balance. If the bank decides to decrease or increase the interest rate, the change is
made only within the pay interest method and not in the external interface.
Record Based logical models:
These models are used to describe the data at the logical and view levels. They are used to
specify the overall logical structure of the database and to provide a higher-level description of
the implementation. They are called record based because the database is structured in fixed
format records of several types. Each record type defines a fixed number of fields, and each field
is usually of fixed length. This is in contrast to object based models, whose structure often leads
to variable-length records at the physical level.
There are three types of record based models. They are
1. Relational model
2. Network model
3. Hierarchical model
Relational model:
The relational model uses a collection of tables to represent both data and relationships
among those data. Each table has multiple columns, and each column has a unique name.

Customer name   Social status   Street               City         Account number
Henry           Lecturer        12, Saibaba colony   Coimbatore   143
Mythili         Lecturer        Valluvar nagar       Dharmapuri   154

Account number   Balance
143              5000
154              4000
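As a small illustration of how the two tables work together, the following sketch loads the sample rows into an in-memory SQLite database through Python's sqlite3 module and joins them on the account number. The table and column names (customer, account, customer_name and so on) are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (
        customer_name  TEXT,
        social_status  TEXT,
        street         TEXT,
        city           TEXT,
        account_number INTEGER
    );
    CREATE TABLE account (
        account_number INTEGER PRIMARY KEY,
        balance        INTEGER
    );
    INSERT INTO customer VALUES ('Henry', 'Lecturer', '12, Saibaba colony', 'Coimbatore', 143);
    INSERT INTO customer VALUES ('Mythili', 'Lecturer', 'Valluvar nagar', 'Dharmapuri', 154);
    INSERT INTO account VALUES (143, 5000);
    INSERT INTO account VALUES (154, 4000);
    """
)

# The relationship between a customer and an account is represented purely by
# matching values in the account_number columns of the two tables.
rows = conn.execute(
    "SELECT c.customer_name, a.balance "
    "FROM customer AS c JOIN account AS a ON a.account_number = c.account_number "
    "ORDER BY c.customer_name"
).fetchall()
```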
Network model (arbitrary graphs):
Data in the network model are represented by collections of records, and relationships
among data are represented by links, which are pointers.
[Figure: network model. The customer records (Henry, Lecturer, 12 Saibaba colony, Coimbatore, 143)
and (Mythili, Lecturer, 12 Valluvar nagar, Dharmapuri, 153) are linked by pointers to the account
records (143, 5000) and (153, 4000).]
Hierarchical model:
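The hierarchical model also stores the data in the form of records and links. The only
difference is that the records are organized as trees rather than as arbitrary graphs.
[Figure: hierarchical model. The customer records (Henry, Lecturer, 12 Saibaba colony, Coimbatore,
143) and (Mythili, Lecturer, 12 Valluvar nagar, Dharmapuri, 153) are parents of the account records
(143, 5000) and (153, 4000) in a tree.]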
Physical data models:
These data models are used to describe data at the lowest level. Two models that are
available are
1. Unifying model
2. Frame memory model
Database languages:
A database system provides two types of languages. They are
1. A language to specify the database schema
2. A language to express database queries and updates
Data definition language(DDL):
A database schema is specified by a set of definitions expressed in a special language
called a data definition language. The result of compiling DDL statements is a set of tables that
is stored in a special file called the data dictionary or data directory. A data dictionary is a file
that contains metadata, that is, data about data. This file is consulted before actual data are read
or modified in the database system. Eg: CREATE, ALTER and DROP statements.
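The three DDL statements can be tried out with Python's built-in sqlite3 module, as in the sketch below. It assumes SQLite as the database system, where the sqlite_master catalog table and the table_info pragma play a role comparable to the data dictionary. The account table and its columns are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_names():
    """List the tables recorded in SQLite's catalog (its form of a data dictionary)."""
    return [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )]


def column_names(table):
    """List the column names that the catalog records for one table."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


# CREATE defines a new table in the schema.
conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance INTEGER)")

# ALTER changes the definition of an existing table.
conn.execute("ALTER TABLE account ADD COLUMN branch_name TEXT")
columns_after_alter = column_names("account")
tables_before_drop = table_names()

# DROP removes the table and its definition.
conn.execute("DROP TABLE account")
tables_after_drop = table_names()
```

Note that none of these statements touch account data as such; they only change the metadata that describes it.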
Data storage and definition language:
The storage structure and access methods used by the database system are specified by a set
of definitions in a special type of DDL called the data storage and definition language.
Data manipulation language(DML):
A data manipulation language is a language that enables users to access or manipulate data
as organized by the appropriate data model. There are basically two types of DML. They are
1. Procedural DML
2. Non-procedural DML
Procedural DML: requires a user to specify “What” data are needed and “How” to get those data.
Non-procedural DML: requires a user to specify only “What” data are needed.
Eg: INSERT, DELETE, UPDATE and SELECT statements.
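The difference between the two styles can be sketched in Python. The first half spells out how to find the records, step by step, which is the procedural style. The second half hands a SELECT statement to SQLite through the built-in sqlite3 module and states only what is wanted. The customer list and table name are assumptions for the illustration.

```python
import sqlite3

customers = [("Henry", "Coimbatore"), ("Mythili", "Dharmapuri")]

# Procedural style: the program says HOW to get the data,
# by visiting every record and testing it.
procedural_result = []
for name, city in customers:
    if city == "Coimbatore":
        procedural_result.append(name)

# Non-procedural style: the query says only WHAT data are needed;
# the database system decides how to retrieve them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, city TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", customers)
declarative_result = [
    row[0]
    for row in conn.execute("SELECT name FROM customer WHERE city = 'Coimbatore'")
]
```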
Transaction management:
A transaction is a collection of operations that performs a single logical function in a
database application. Each transaction is a unit of both atomicity and consistency. Thus we
require that transactions do not violate any database consistency constraints.
Atomicity: Either all the operations of a transaction take place, or none of them do, so that the
correctness of the data is maintained.
Consistency: The correctness requirement is called consistency.
It is essential that the execution of a fund transfer preserves the consistency of the
database. It is the responsibility of the programmer to define the various transactions properly,
so that each preserves the consistency of the database.
Ensuring the atomicity and durability properties is the responsibility of the database
system itself, where durability means that the effects of a completed transaction persist even
after a system failure. This is done by the transaction management component. If a transaction
fails, the database must be restored to the state in which it was before the transaction started
executing. The database system must detect system failures and restore the database to a state
that existed prior to the failure.
When the database is updated by more than one transaction at a time, the consistency of
the database may no longer be preserved. It is the responsibility of the concurrency control
manager to control the interaction among concurrent transactions to ensure the consistency of
the database. Database systems for small firms may execute only one transaction at a time, which
keeps their cost low.
Storage Management:
Databases require a large amount of storage space, measured in gigabytes. Since main
memory cannot hold all the data permanently, the data are stored on disks and moved between
disk and main memory as needed. Since this movement takes time, the movement of records is
kept to a minimum. The performance of a database system is judged by how quickly it
responds. The goal is to facilitate access to data. High-level views help to achieve this goal.
A storage manager is a program that provides the interface between the low-level data stored
in the database and the application programs and queries submitted to the system. Thus the
storage manager is responsible for storing, retrieving and updating the data in the database.
The raw data are stored on the disk using the file system. The storage manager translates
the various DML statements into low-level file system commands.
Database Administrator:
A person who has central control over the data and the programs that access those data is
called the database administrator (DBA).
The various functions of a DBA include
• Schema definition
• Storage structure and access method definition
• Schema and physical organization modification
• Granting of authorization for data access
• Integrity constraint specification
Schema definition:
The DBA creates the original database schema by writing a set of definitions that is
translated by the DDL compiler into a set of tables. These are permanently stored in the data
dictionary.
Storage structure and access method definition:
The DBA creates appropriate storage structures and access methods by writing a set of
definitions, which is translated by the data storage and definition language compiler.
Schema and physical organization modification:
Modifications to the database schema or to the physical organization of the data are
relatively rare. When they are needed, they are made by writing a set of definitions that is
processed by the DDL compiler or the data storage and definition language compiler.
Granting of authorization for data access:
The granting of different types of authorization allows the DBA to regulate which parts
of the database a user can access. The authorization information is kept in a special system
structure that is consulted by the database system whenever access to the data is attempted.
Integrity constraint specification:
The data values stored in the database must satisfy certain consistency constraints, which
the DBA specifies.
Database Users:
The four different types of database users are
1. Application programmers
2. Sophisticated users
3. Specialized users
4. Naïve users
Application programmers:
They are computer professionals who interact with the system through DML calls, which are
embedded in a program written in a high level language like Pascal, C++, Java etc. Since the
DML statements are not part of the host language, they are handled separately by a DML
precompiler, which converts them to host language procedure calls. The host language compiler
then generates the object code.
Sophisticated users:
They interact with the system without writing programs. They submit their requests to a
query processor, which in turn breaks down the DML statements into instructions that the
storage manager understands.
Specialized users:
Specialized users are sophisticated users who write database applications that do not fit
into the traditional data processing framework.
Example: Computer aided design, knowledge based expert systems etc.
Naïve users:
They are unsophisticated users who interact with the system by invoking one of the
permanent application programs that have been written previously.
Example: A bank teller uses a program called transfer to transfer an amount of Rs 5000 from
account A to account B.
Overall System Structure:
The functional components of a database are broadly classified into
1. Query processor components
2. Storage manager components
Query processor:
The components of a query processor are
1. DML compiler
2. Embedded DML compiler
3. DDL interpreter
4. Query evaluation engine.
DML compiler:
This translates DML statements in a query language into low-level
instructions that the query evaluation engine can understand.
[Figure: overall system structure. Users: naïve users (tellers, agents, etc.) work through
application interfaces, application programmers write application programs, sophisticated users
submit queries, and the database administrator defines the database schema. Query processor:
embedded DML precompiler, application program object code, DML compiler, DDL compiler and
query evaluation engine. Storage manager: transaction manager, buffer manager and file manager.
Disk storage: data files, indices, statistical data and data dictionary.]
Embedded DML compiler:
This converts DML statements embedded in an application program to normal procedure
calls in the host language.
DDL interpreter:
This interprets DDL statements and records them in a set of tables containing metadata.
Query evaluation engine:
This executes the low-level instructions generated by the DML compiler.
Storage Manager:
The storage manager components provide an interface between the low-level data stored in
the database and the application programs and queries submitted to the system.
The various components of the storage manager are
1. Authorization and integrity manager
2. Transaction manager
3. File manager
4. Buffer manager
Authorization and integrity manager:
This tests for the satisfaction of the integrity constraints and checks the authority of the
users to access data.
Transaction manager:
This ensures that the database remains in a consistent state despite the system failures and
that concurrent transactions proceed without conflicting.
File manager:
This manages the allocation of space on disk storage and the data structures used to
represent information stored on the disk.
Buffer manager:
This is responsible for fetching data from disk storage into main memory and deciding
what data to cache in memory.
In addition to the above components, the following data structures are required as part of
the physical system implementation. They are
1. Data files, which store the database itself.
2. Data dictionary, which stores metadata about the structure of the database.
3. Indices, which provide fast access to the data items that hold particular values.
4. Statistical data, which store statistical information about the data in the database.
Entity Relationship model
The entity relationship model is based on a perception of the real world as consisting of a
set of basic objects called entities and of relationships among these objects. It was developed to
facilitate database design by allowing the specification of an enterprise schema, which represents
the overall logical structure of the database. The E-R model is useful in mapping the meanings
and interactions of real world enterprises onto a conceptual schema.
Basic concepts:
There are three basic concepts in the ER model. They are
1. Entity sets 2. Relationship sets 3. Attributes
Entity sets:
An entity is a thing or object in the real world that is distinguishable from all other
objects. An entity set is a set of entities of the same type that share the same properties or
attributes. The individual entities that constitute a set are said to be the extension of the entity
set.
An entity has a set of properties, and the values for some set of properties may uniquely
identify an entity. For example, the account number of a customer in a bank identifies that
person in the enterprise.
- Entity sets do not need to be disjoint. For example, the same person may belong to both the
employee entity set and the customer entity set of a bank.
Attributes:
An entity is represented by a set of attributes. Attributes are descriptive properties possessed
by each member of an entity set.
Example: The set of all customers in a bank can be called “Customer”, and each customer
entity is described by the attributes of that entity set.
For each attribute there is a set of permitted values, called the domain or value set of
that attribute. For example, the domain of an attribute may be the set of all text strings of a
certain length. An attribute of an entity set is a function that maps from the entity set into a
domain. An entity can therefore be described by a set of (attribute, data value) pairs.
Attributes are of different types. They are
1. Simple and composite attributes
2. Single valued and multi valued attributes
3. Null attributes
4. Derived attributes
1. Simple and composite attributes:
Simple attributes cannot be divided into subparts. Composite attributes can be divided
into subparts. For example, customer name can be divided into first name, middle name and last
name. Customer address is also a composite attribute, which may be divided into door number,
house name, street name, city, pin code etc.
2. Single valued and multi-valued attributes:
The attributes that we have specified in our examples all have a single value for a
particular entity. Such attributes are called single valued attributes. For example, an employee
can have only one employee number, so employee number is a single valued attribute. An
attribute may also have a set of values for a specific entity. Such attributes are called
multi-valued attributes. Where appropriate, upper and lower bounds may be placed on the
number of values in a multi-valued attribute. For example, a customer in a bank may have more
than one address, with a lower bound of one and an upper bound of two.
3. Null attributes:
A null value is used when an entity does not have a value for an attribute. As an
illustration, if a particular customer does not have two addresses, the second address value may
be null.
4. Derived attributes:
The value of this type of attribute can be derived from the values of other related
attributes or entities.
For example, in a bank the current age of a loan can be calculated from the loan start date and
today's date.
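A derived attribute is computed when it is needed instead of being stored. The sketch below shows the loan example in Python, using the standard datetime module. The function name loan_duration_days and the sample dates are assumptions made for the illustration.

```python
from datetime import date


def loan_duration_days(start_date, today):
    """Derive the age of a loan in days from its stored start date."""
    return (today - start_date).days


# Only the start date is stored; the duration is derived from it on request.
example_days = loan_duration_days(date(2024, 1, 1), date(2024, 3, 1))
```

In a real application the second argument would usually be date.today(), which is why the value cannot sensibly be stored: it changes every day.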
Relationship sets:
A relationship is an association among several entities. For example, we can define a
relationship that associates customer Henry with loan L-15.
A relationship set is a set of relationships of the same type. It is a mathematical relation on
n >= 2 entity sets (possibly non-distinct). If E1, E2, …, En are entity sets, then a relationship set
R is a subset of {(e1, e2, …, en) | e1 ∈ E1, e2 ∈ E2, …, en ∈ En}, where (e1, e2, …, en) is a
relationship.
Consider the two entity sets customer and loan. We define the relationship set borrower to
denote the association between customers and the bank loans that they have.
[Figure: relationship set borrower, associating the customer entity (Henry, 312-317, main,
gowthami) with the loan entity (L-17, 1000).]
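The definition of a relationship set as a subset of a product of entity sets can be checked directly with Python sets. This is a minimal sketch in which each entity is reduced to a single identifying value, and the members of the sets are sample values chosen for the illustration.

```python
from itertools import product

# Entity sets, with each entity reduced to one identifying value.
customer = {"Henry", "Mythili"}
loan = {"L-15", "L-17"}

# Every possible (customer, loan) pair: the set {(e1, e2) | e1 in customer, e2 in loan}.
all_pairs = set(product(customer, loan))

# A relationship set is any subset of those pairs.
borrower = {("Henry", "L-15")}
is_valid_relationship_set = borrower <= all_pairs

# A pair that mentions an entity outside the entity sets is not a valid relationship.
not_valid = {("Henry", "L-99")} <= all_pairs
```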
The association between entity sets is referred to as participation, i.e. the entity sets E1, E2, …,
En participate in relationship set R. A relationship instance in an E-R schema represents that an
association exists between the named entities.
The function that an entity plays in a relationship is called that entity's role. In most cases
the entity sets participating in a relationship set are distinct, so the roles are implicit and are
not usually specified.
The same entity set may participate in a relationship set more than once, in different roles.
This type of relationship set is called a recursive relationship set. Here explicit role names are
necessary to specify how an entity participates in a relationship instance.
Descriptive attribute: A relationship set may also have descriptive attributes. Consider the
relationship set depositor with entity sets customer and account. We can associate the attribute
access_date with that relationship set.
Binary relationship: a relationship that involves two entity sets.
Ternary relationship: a relationship that involves three entity sets.
The number of entity sets that participate in a relationship set is the degree of the relationship
set. The degree of a binary relationship is 2. The degree of a ternary relationship is 3.
Keys:
We need a way to distinguish the entities within an entity set and the relationships within a
relationship set. The concept of a key is used for this purpose. A key is a single attribute or a
combination of two or more attributes of an entity set that is used to identify one or more
instances of the set.
Entity set:
Primary key:
The unique entity identifier chosen by the database designer is referred to as the primary key.
Super key:
If we add additional attributes to a primary key, the resulting combination still
uniquely identifies an instance of the entity set. Such a combination is called a super key. A
primary key is therefore a minimal super key.
Candidate key:
There may be two or more attributes, or combinations of attributes, that uniquely identify an
instance of an entity set. These attributes or combinations of attributes are called candidate keys.
We must decide which of the candidate keys is the primary key. The other candidate keys are
called alternate keys.
Secondary key:
A secondary key is an attribute or combination of attributes that may not be a
candidate key but that classifies the entity set on a particular characteristic. Eg: a department
attribute.
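The relationship between super keys and candidate keys can be explored with a small Python check over sample rows. This is only a sketch: whether something is really a key is a design decision about all possible data, whereas the code can only test the rows it is given. The employees data and the function names are made up for the illustration.

```python
from itertools import combinations


def is_super_key(rows, attrs):
    """True if no two of the given rows agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, all_attrs):
    """Minimal super keys for the given rows, found smallest first."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            # Skip any combination that merely extends a key already found.
            extends_known_key = any(set(k) <= set(attrs) for k in keys)
            if not extends_known_key and is_super_key(rows, attrs):
                keys.append(attrs)
    return keys


employees = [
    {"emp_no": 1, "name": "Henry", "department": "Loans"},
    {"emp_no": 2, "name": "Mythili", "department": "Loans"},
    {"emp_no": 3, "name": "Henry", "department": "Savings"},
]

keys_found = candidate_keys(employees, ["emp_no", "name", "department"])
```

On this sample, emp_no alone is a key, and so is the pair (name, department), but the pair qualifies only because the sample is small. A designer would pick emp_no as the primary key. The department attribute on its own does not identify a row, which matches its use as a secondary key that merely classifies the rows.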
Relationship set:
The primary key of an entity set allows us to distinguish among the various entities of the
set. We need a similar mechanism to distinguish among the various relationships of a
relationship set.
Let R be a relationship set involving entity sets E1, E2, …, En. Let primary key (Ei) denote
the set of attributes that forms the primary key for entity set Ei. Assume that the attribute names
of all primary keys are unique (if they are not, use an appropriate renaming scheme). The
composition of the primary key for a relationship set depends on the set of attributes associated
with the relationship set R.
If the relationship set R has no attributes associated with it, then the set of attributes
Primary key (E1) U primary key (E2) U … U primary key (En)
describes an individual relationship in R.
If the relationship set R has attributes a1, a2, …, am associated with it, then the set of attributes
Primary key (E1) U primary key (E2) U … U primary key (En) U {a1, a2, …, am}
describes an individual relationship in R.
In both cases, primary key (E1) U primary key (E2) U … U primary key (En) forms a super
key for the relationship set.
Entity relationship diagram:
An E-R diagram can express the overall logical structure of a database graphically.
The major components of an ER diagram are:
• Rectangles, which represent entity sets
• Ellipses, which represent attributes
• Diamonds, which represent relationship sets
• Lines, which link attributes to entity sets and entity sets to relationship sets
• Double ellipses, which represent multi valued attributes
• Dashed ellipses, which denote derived attributes
• Double lines, which indicate total participation of an entity in a relationship set
Consider an entity relationship diagram that consists of two entity sets, customer and loan,
related through a binary relationship set borrower. The relationship set borrower may be
many-to-many, one-to-many, many-to-one or one-to-one.
For a binary relationship set R between entity sets A and B, the mapping cardinality must
be one of the following:
• One to one: An entity in A is associated with at most one entity in B, and an entity in B is
associated with at most one entity in A.
• One to many: An entity in A is associated with any number of entities in B. An entity in
B, however, can be associated with at most one entity in A.
• Many to one: An entity in A is associated with at most one entity in B. An entity in B,
however, can be associated with any number of entities in A.
• Many to many: An entity in A is associated with any number of entities in B, and an
entity in B is associated with any number of entities in A.
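The four definitions above can be turned into a small Python function that looks at a relationship set, given as (a, b) pairs, and reports the tightest cardinality that the pairs satisfy. This is only a sketch: in a real design the mapping cardinality is a constraint chosen by the designer, not something inferred from sample data. The function name and sample pairs are assumptions.

```python
from collections import Counter


def mapping_cardinality(pairs):
    """Classify a binary relationship set from A to B, given as (a, b) pairs."""
    pairs = set(pairs)
    per_a = Counter(a for a, _ in pairs)   # number of B entities linked to each a
    per_b = Counter(b for _, b in pairs)   # number of A entities linked to each b
    a_has_many = any(n > 1 for n in per_a.values())
    b_has_many = any(n > 1 for n in per_b.values())
    if a_has_many and b_has_many:
        return "many to many"
    if a_has_many:
        return "one to many"    # an a may have several b, but each b has one a
    if b_has_many:
        return "many to one"    # a b may have several a, but each a has one b
    return "one to one"


one_customer_two_loans = mapping_cardinality([("Henry", "L-15"), ("Henry", "L-17")])
two_customers_one_loan = mapping_cardinality([("Henry", "L-15"), ("Mythili", "L-15")])
```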
[Figure: the four mapping cardinalities between entity sets A and B: one to one, one to many,
many to one and many to many.]
The relationship set borrower may be any of these four types. To distinguish among them, we
draw either a directed line (with an arrowhead) or an undirected line (without one) between the
relationship set and the entity set.
• A directed line from the relationship set borrower to the entity set loan specifies that
borrower is either a one to one or a many to one relationship set from customer to loan.
borrower cannot be a many to many or a one to many relationship set from customer to
loan.
• An undirected line from the relationship set borrower to the entity set loan specifies that
borrower is either a many to many or a one to many relationship set from customer to
loan.
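If a relationship set also has some attributes associated with it, then we link these attributes to
the relationship set.
[Figure: E-R diagram with an attribute attached to a relationship set. Entity set Customer
(attributes Cus no, Cus add) and entity set Account (attributes accno, balance) are linked by the
relationship set depositor, which has the attribute Access date.]
We can indicate roles in an E-R diagram by labeling the lines that connect diamonds to rectangles.
[Figure: E-R diagram with role indicators. Entity set employee (attribute Employee name) is
connected to the relationship set Worksfor by two lines labeled manager and worker.]
Non-binary relationship sets can also be specified easily in an E-R diagram.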
Weak entity set:
An entity set may not have sufficient attributes to form a primary key. Such an entity set
is termed a weak entity set. An entity set that has a primary key is termed a strong entity set.
For a weak entity set to be meaningful, it must be part of a one to many relationship set with a
strong entity set.
Although a weak entity set does not have a primary key, we need to distinguish among
all those entities in the entity set that depend on one particular strong entity. The discriminator
of a weak entity set is a set of attributes that allows this distinction to be made. The
discriminator of a weak entity set is also called the partial key.
The primary key of a weak entity set is formed by the primary key of the strong entity set
on which the weak entity set is existence dependent, plus the weak entity set's discriminator.
A weak entity set is indicated in E-R diagrams by a doubly outlined box, and the
corresponding identifying relationship by a doubly outlined diamond.
[Figure: E-R diagram with a weak entity set. Strong entity set loan (attributes Loan_no, Amount)
is linked through the identifying relationship Loanpayment to the weak entity set payment
(attributes Pay_no, Pay_date, Pay_amt).]
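One common way to carry a weak entity set into tables is to give it a composite primary key made of the strong entity set's primary key plus the discriminator. The sketch below tries this for loan and payment with Python's built-in sqlite3 module. The column types, dates and amounts are sample values assumed for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(
    """
    CREATE TABLE loan (
        loan_no TEXT PRIMARY KEY,
        amount  INTEGER
    );
    -- payment is weak: pay_no alone is only a discriminator (partial key),
    -- so the primary key is loan_no plus pay_no.
    CREATE TABLE payment (
        loan_no  TEXT NOT NULL REFERENCES loan(loan_no),
        pay_no   INTEGER NOT NULL,
        pay_date TEXT,
        pay_amt  INTEGER,
        PRIMARY KEY (loan_no, pay_no)
    );
    INSERT INTO loan VALUES ('L-15', 1000), ('L-17', 1000);
    INSERT INTO payment VALUES ('L-15', 1, '2024-01-05', 100);
    INSERT INTO payment VALUES ('L-17', 1, '2024-01-09', 200);
    """
)


def try_insert(row):
    """Return True if the payment row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO payment VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


# Payment number 1 already exists for both loans, which is fine: the loan tells them apart.
duplicate_rejected = not try_insert(("L-15", 1, "2024-02-05", 100))  # same loan, same pay_no
orphan_rejected = not try_insert(("L-99", 1, "2024-02-05", 100))     # no such loan exists
payment_count = conn.execute("SELECT COUNT(*) FROM payment").fetchone()[0]
```

The rejected orphan row reflects existence dependence: a payment cannot exist without the loan it belongs to.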
Extended E-R Features:
Specialization:
An entity set may include sub groupings of entities that are distinct in some way from
the other entities in the set. For example, consider the entity set account with the attributes
account number and balance. An account can be classified further as one of
Savings bank account
Checking account
Each of these account types may have its own attributes along with the standard account
attributes. The process of designating sub groupings within an entity set is called specialization.
An entity set may be specialized by one or more distinguishing features. In the case of an
account, the type of the account is the distinguishing feature.
In an ER diagram, specialization is depicted by a triangle labeled ISA. ISA
stands for “is a”.
Generalization:
Generalization is the simple inversion of specialization, and the design proceeds in a bottom-up
fashion. Generalization proceeds from the recognition that a number of entity sets share some
common features. Based on these commonalities, generalization synthesizes the entity sets into
a single, higher-level entity set. It is mainly used to hide the differences among the lower-level
entity sets, and it also provides economy of representation, since shared attributes are not repeated.
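The ISA relationship is close in spirit to class inheritance in a programming language, so specialization and generalization can be sketched as follows. The attributes interest_rate and overdraft_amount and all values are assumptions chosen for the illustration; the notes only say that each account type may have attributes of its own.

```python
from dataclasses import dataclass


@dataclass
class Account:
    # Shared attributes are declared once, in the higher-level entity set.
    account_number: str
    balance: float


@dataclass
class SavingsAccount(Account):
    interest_rate: float      # hypothetical attribute of this subgrouping


@dataclass
class CheckingAccount(Account):
    overdraft_amount: float   # hypothetical attribute of this subgrouping


s = SavingsAccount("A-101", 500.0, 0.04)
c = CheckingAccount("A-102", 900.0, 200.0)

# Generalization hides the differences: both can be treated as accounts.
accounts = [s, c]
total = sum(a.balance for a in accounts)
print(isinstance(s, Account), total)  # True 1400.0
```

Reading the classes top-down corresponds to specialization, and reading them bottom-up corresponds to generalization.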
Relational Model
The relational model has established itself as the primary data model for commercial data
processing applications.
Structure of relational databases:
A relational database is a collection of tables, each of which is assigned a unique name. A row
in a table represents a relationship among a set of values.
Consider the account relation, whose column headers branch name, account number and balance
are called attributes. For each attribute there is a set of permitted values, called the domain of
that attribute.
For branch name, let D1 be the domain, that is, the set of all branch names, and let D2 and D3 be
the sets of all account numbers and all balances respectively. Any row of account is then a
3-tuple (v1, v2, v3) with v1 in D1, v2 in D2 and v3 in D3, so account is a subset of
D1 x D2 x D3. We require that, for all relations, the domains of all attributes be atomic. A domain
is atomic if its elements are considered to be indivisible units. It is possible for several attributes
to have the same domain. For example, the attributes customer name and employee name, in the
relations customer and employee, may both draw their values from one common domain of person
names, whereas balance and branch name should not share a domain.
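The statement that a relation is a subset of D1 x D2 x D3 can be checked on a very small example. The branch names below come from the branch relation in these notes; the account numbers and balances are assumptions made up for the sketch.

```python
from itertools import product

D1 = {"Rspuram", "Saibabacly"}  # domain of branch name
D2 = {"A-101", "A-102"}         # hypothetical domain of account number
D3 = {500, 700}                 # hypothetical domain of balance

# D1 x D2 x D3: every possible combination of one value from each domain.
all_tuples = set(product(D1, D2, D3))

# A relation picks out only some of those combinations.
account = {
    ("Rspuram", "A-101", 500),
    ("Saibabacly", "A-102", 700),
}

print(len(all_tuples))        # 8
print(account <= all_tuples)  # True: account is a subset of D1 x D2 x D3
```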
One domain value that is a member of every possible domain is the null value, which signifies
that a value is unknown or does not exist. For example, the telephone number of a customer may
not be known, or the customer may not have one.
Database schema :
The database schema is the logical design of the database, and a database instance is a
snapshot of the data in the database at a given instant of time.
Relation and relation schema :
The concept of the relation corresponds to the programming language notion of the variable.
The concept of relational schema corresponds to the type definition of the programming
language.
Branch relation:
branchname    branchcity    assets
Rspuram       cbe           1000
Saibabacly    cbe           2600
A relation schema name begins with a capital letter, and relation names are written in
small letters.
Account schema = (branchname, account number, balance)
We denote the fact that account is a relation on the Account schema by
account(Account schema)
A relation instance corresponds to the value of a programming language variable: just as the
value of a variable may change over time, the contents of a relation instance change when the
database is updated.
We can relate the tuples (records) in two relations through an attribute that they share. For example:
Branch schema = (branchname, branchcity, assets)
Account schema = (branchname, account number, balance)
The attribute branchname is present in both schemas, so we can use it to relate tuples of the two
relations. Repeating this one attribute is not counted as data redundancy. Suppose we wish to find
the information about all the accounts in branches located in Coimbatore. We first look at the
branch relation to find the names of all branches located in Coimbatore. Then, for each such
branch, we look in the account relation to find the information about the accounts maintained
at that branch.
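The two-step lookup described here can be sketched in Python. The branch rows are taken from the branch relation shown in these notes (with cbe standing for Coimbatore); the account rows are assumptions made up for the sketch.

```python
branch = [
    {"branchname": "Rspuram", "branchcity": "cbe", "assets": 1000},
    {"branchname": "Saibabacly", "branchcity": "cbe", "assets": 2600},
]

# Hypothetical account rows; only the schema comes from the notes.
account = [
    {"branchname": "Rspuram", "account_number": "A-101", "balance": 500},
    {"branchname": "Saibabacly", "account_number": "A-102", "balance": 700},
    {"branchname": "Trichy", "account_number": "A-103", "balance": 900},
]

# Step 1: names of all branches located in Coimbatore (cbe).
cbe_branches = {b["branchname"] for b in branch if b["branchcity"] == "cbe"}

# Step 2: accounts maintained at any of those branches.
cbe_accounts = [a for a in account if a["branchname"] in cbe_branches]

for a in cbe_accounts:
    print(a["account_number"], a["balance"])
```

The shared attribute branchname is the only link between the two relations, so it has to appear in both schemas.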
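It is not always advisable to use a single schema rather than multiple schemas. The
disadvantage of the single-schema approach is redundancy of data, since the city and assets of a
branch would be repeated for every account at that branch.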
Relational Algebra
The relational algebra is a procedural query language. It consists of a set of operations that take
one or two relations as input and produce a new relation as output. The fundamental
operations of relational algebra are
1. Select
2. Project
3. Union
4. Set difference
5. Cartesian product
6. Rename
Fundamental operations of relational algebra:
Select operation:
The select operation selects the tuples that satisfy a given predicate. It is denoted by the
lowercase Greek letter sigma (σ), and the predicate is written as a subscript to σ. For example,
to select the tuples of the loan relation whose branch is cbe:
σ branchname=”cbe”(loan)
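Similarly, we can find all the tuples in which the loan amount is greater than 1200:
σ amount > 1200(loan)
In general, the comparison operators allowed in the selection predicate are
1. =
2. ≠
3. <
4. <=
5. >
6. >=
Several predicates can be combined into a larger predicate, where ∧ is used as the logical AND
and ∨ is used as the logical OR.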
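A minimal sketch of the select operation in Python follows. The function select and the rows of loan are assumptions made up for the illustration; the predicate plays the role of the subscript of σ.

```python
# Hypothetical rows of the loan relation.
loan = [
    {"loan_number": "L-17", "branchname": "cbe", "amount": 40000},
    {"loan_number": "L-14", "branchname": "Trichy", "amount": 7000},
    {"loan_number": "L-11", "branchname": "cbe", "amount": 900},
]


def select(predicate, relation):
    """Keep only the tuples of the relation that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


# sigma branchname = "cbe" (loan)
cbe_loans = select(lambda t: t["branchname"] == "cbe", loan)

# sigma amount > 1200 (loan)
large_loans = select(lambda t: t["amount"] > 1200, loan)

# Both predicates combined with logical AND.
large_cbe_loans = select(
    lambda t: t["branchname"] == "cbe" and t["amount"] > 1200, loan
)

print([t["loan_number"] for t in large_cbe_loans])  # ['L-17']
```

Select never changes the attributes of a relation; it only reduces the number of tuples.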
Project operation :
The project operation is a unary operation that returns its argument relation with certain attributes
left out. Since a relation is a set, duplicate rows are eliminated. Projection is denoted by the Greek
letter pi (Π), with the attributes to be kept written as a subscript.
Π loan number, amount(loan)
Composition of relational operations :
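The result of a relational operation is itself a relation. This fact is useful when, for example, we
want to find the names of all the customers who live in “Saibaba colony”:
Π customer name(σ customer city = “Saibaba colony”(customer))
Here the argument of the projection is not the name of a relation but an expression that evaluates
to a relation. In general, relational algebra operations can be composed into larger relational
algebra expressions.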
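Projection and the composition of operations can be sketched in Python as below. The helper functions and the customer rows are assumptions made up for the illustration. Because the result of select is again a relation, it can be passed straight to project.

```python
def select(predicate, relation):
    """Keep only the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def project(attributes, relation):
    """Keep only the listed attributes and drop duplicate rows."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:  # a relation is a set, so duplicates are dropped
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result


# Hypothetical rows of the customer relation.
customer = [
    {"customer_name": "Jones", "customer_city": "Saibaba colony"},
    {"customer_name": "Smith", "customer_city": "Rspuram"},
    {"customer_name": "Hayes", "customer_city": "Saibaba colony"},
]

# Pi customer name (sigma customer city = "Saibaba colony" (customer))
names = project(
    ["customer_name"],
    select(lambda t: t["customer_city"] == "Saibaba colony", customer),
)
print(names)  # [{'customer_name': 'Jones'}, {'customer_name': 'Hayes'}]

# Projecting on the city alone shows the elimination of duplicates.
cities = project(["customer_city"], customer)
print(len(cities))  # 2
```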
Union operation :
The union operation can be explained as follows.
Suppose we want the list of the names of all bank customers who have either an account or a loan
or both. The customer relation does not contain this information, since it records neither loans
nor accounts. The borrower relation lists the customers with a loan and the depositor relation
lists the customers with an account, so we need the names in both of the following:
Π customer name (borrower)
Π customer name (depositor)
To combine them we use the union operation, denoted by the symbol U:
Π customer name (borrower) U Π customer name (depositor)
This fetches all the customers who have an account, a loan, or both. Since relations
are sets, duplicate values are eliminated.
For the union r U s to be valid, the following conditions must hold.
1. The relations r and s must have the same number of attributes.
2. The domains of the i th attribute of r and the i th attribute of s must be the same for all i.
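Since relations are sets, union maps directly onto Python sets. The sketch below checks only the first condition (same number of attributes); checking that the domains agree is left out. The borrower names are taken from the borrower table in these notes, while the depositor names are assumptions made up for the sketch.

```python
def union(r, s):
    """Union of two relations given as sets of tuples of equal arity."""
    arities = {len(t) for t in r} | {len(t) for t in s}
    if len(arities) > 1:
        raise ValueError("union needs relations with the same number of attributes")
    return r | s


# Pi customer name (borrower)
borrower_names = {("Jones",), ("Smith",), ("Hayes",)}
# Pi customer name (depositor), hypothetical values
depositor_names = {("Smith",), ("Curry",)}

all_customers = union(borrower_names, depositor_names)
print(sorted(all_customers))
# [('Curry',), ('Hayes',), ('Jones',), ('Smith',)]
```

Smith appears in both inputs but only once in the result, which is the duplicate elimination mentioned above.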
The set difference operation :
The set difference operation, denoted by –, allows us to find tuples that are in one relation
but are not in another. The expression r – s results in a relation containing those tuples that are
in r but not in s. For example, the customers who have an account but no loan are given by
Π customer name (depositor) – Π customer name (borrower)
For a set difference r – s to be valid:
1. r and s must be of the same arity (the same number of attributes).
2. The domains of the i th attribute of r and the i th attribute of s must be the same for all i.
Cartesian product operation :
A Cartesian product is denoted by a cross (x). It allows us to combine information from any
two relations.
If the same attribute name is used in both relations, we attach the name of the relation to the
attribute to differentiate the two, for example borrower.loan number and loan.loan number.
R = borrower x loan
Borrower:
Customer name    Loan number    Branch name
Jones            L-17           Coimbatore
Smith            L-23           Trichy
Hayes            L-15
Jackson          L-93           Salem
Curry            L-11           Chennai
Smith            L-17
Loan:
Loan no    Amount
L-17       40000
L-14       7000
L-15       56778
L-12       34456
Example:
Suppose we want to find the names of all customers who have a loan at the cbe branch. We start with
σ branchname=”cbe”(borrower X loan)
The output of the above expression is given overleaf. It still pairs each borrower tuple of the cbe
branch with every loan tuple. To keep only the tuples with matching loan numbers, we select again:
σ borrower.loannumber = loan.loannumber (σ branchname=”cbe”(borrower X loan))
Finally, to display only the customer names:
Π customer_name (σ borrower.loannumber = loan.loannumber (σ branchname=”cbe”(borrower X loan)))
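The three steps of this query can be followed in a short Python sketch. The rows are adapted from the borrower and loan tables of these notes, keeping only the borrower rows that show a branch name and writing the branch as Coimbatore; that adaptation and the helper cartesian_product are assumptions of the sketch.

```python
from itertools import product

borrower = [
    {"customer_name": "Jones", "loan_number": "L-17", "branch_name": "Coimbatore"},
    {"customer_name": "Smith", "loan_number": "L-23", "branch_name": "Trichy"},
    {"customer_name": "Jackson", "loan_number": "L-93", "branch_name": "Salem"},
    {"customer_name": "Curry", "loan_number": "L-11", "branch_name": "Chennai"},
]
loan = [
    {"loan_number": "L-17", "amount": 40000},
    {"loan_number": "L-14", "amount": 7000},
    {"loan_number": "L-15", "amount": 56778},
    {"loan_number": "L-12", "amount": 34456},
]


def cartesian_product(r, r_name, s, s_name):
    """Pair every tuple of r with every tuple of s, prefixing attribute names."""
    rows = []
    for t, u in product(r, s):
        row = {f"{r_name}.{k}": v for k, v in t.items()}
        row.update({f"{s_name}.{k}": v for k, v in u.items()})
        rows.append(row)
    return rows


pairs = cartesian_product(borrower, "borrower", loan, "loan")

# Step 1: keep the pairs whose borrower tuple belongs to the chosen branch.
at_branch = [p for p in pairs if p["borrower.branch_name"] == "Coimbatore"]

# Step 2: keep the pairs whose two loan numbers match.
matched = [p for p in at_branch
           if p["borrower.loan_number"] == p["loan.loan_number"]]

# Step 3: project on the customer name.
names = {p["borrower.customer_name"] for p in matched}

print(len(pairs), len(at_branch), names)  # 16 4 {'Jones'}
```

The counts show why the second selection is needed: the product alone pairs Jones with all four loans, and only one of those pairs refers to the loan Jones actually holds.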
Rename Operation:
Unlike relations in the database, the results of relational algebra expressions do not have a name
that we can use to refer to them. It is useful to be able to give them names.
This can be done by using the rename operation.
This is denoted by the symbol ρ (rho). The expression
ρ x (E)
returns the result of expression E under the name x.
To illustrate its use, consider the query “Find the largest account balance in the bank.”
Step 1: Compute a temporary relation consisting of those balances that are not the largest.
Step 2: Take the set difference between the relation Π balance (account) and the temporary relation.
The comparison in step 1 can be done by forming the Cartesian product (account X account) and
comparing the values of the two balances appearing in each tuple. Since both copies of account
have an attribute named balance, one copy is renamed to d so that the two can be told apart.
The temporary relation that consists of the balances that are not the largest is
Π account.balance (σ account.balance < d.balance (account X ρ d (account)))
and the largest balance is then given by
Π balance (account) – Π account.balance (σ account.balance < d.balance (account X ρ d (account)))
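The two steps of the largest-balance query can be traced in Python. The balances are assumptions made up for the sketch; the loop variable d stands for the renamed copy of account.

```python
from itertools import product

# Pi balance (account), with hypothetical balances.
balances = {500, 700, 900}

# Step 1: pair every balance with every balance of the renamed copy d and
# keep those that are smaller than some other balance.
not_largest = {a for a, d in product(balances, balances) if a < d}

# Step 2: set difference between all balances and the temporary relation.
largest = balances - not_largest

print(not_largest == {500, 700})  # True
print(largest)                    # {900}
```

Only the fundamental operations are used here: Cartesian product, select, project, rename and set difference.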
Additional operation:
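Views:
• It is not desirable for all users to see the entire logical model.
• Security constraints may be set so that part of the database is hidden from certain users.
• A view can match a user’s way of looking at the data better than the logical model does.
• A view is a relation that is not part of the logical model but is made available to users as a
virtual relation.
For example, a view listing each branch together with its customers can be defined by the expression
Π branch_name, customername (σ depositor.account number = account.account number (depositor X account))
U Π branch_name, customername (σ borrower.loan number = loan.loan number (borrower X loan))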
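One way to picture a view is as a function that recomputes its result from the stored relations every time it is used. The sketch below follows that idea. The name all_customer, the attribute order of the tuples and all rows are assumptions made up for the illustration, and each product is restricted to pairs with matching account or loan numbers.

```python
# Stored relations as sets of tuples (hypothetical rows).
depositor = {("Jones", "A-101")}               # (customer_name, account_number)
account = {("Rspuram", "A-101", 500)}          # (branch_name, account_number, balance)
borrower = {("Smith", "L-17")}                 # (customer_name, loan_number)
loan = {("Saibabacly", "L-17", 40000)}         # (branch_name, loan_number, amount)


def all_customer():
    """View: (branch_name, customer_name) for every account or loan customer."""
    from_accounts = {
        (acc[0], dep[0])
        for dep in depositor for acc in account
        if dep[1] == acc[1]                    # matching account numbers
    }
    from_loans = {
        (ln[0], bor[0])
        for bor in borrower for ln in loan
        if bor[1] == ln[1]                     # matching loan numbers
    }
    return from_accounts | from_loans          # union of the two projections


before = all_customer()
depositor.add(("Hayes", "A-101"))              # update a stored relation
after = all_customer()

print(sorted(before))  # [('Rspuram', 'Jones'), ('Saibabacly', 'Smith')]
print(("Rspuram", "Hayes") in after)  # True
```

Because the view is not stored, it reflects the update to depositor without any extra work, and a user who is given only all_customer never sees balances or loan amounts.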
Questions:
2 marks questions:
• What is DBMS?
• What is a schema?
• What is data independence?
• What is procedural DML?
• What is an entity set?
• Explain briefly about attributes.
• What is a relationship set?
• What is existence dependency?
• What is the select operation in relational algebra?
• Explain the algebra for views.
Descriptive questions:
• Explain data abstraction.
• Explain the data models.
• Describe the use of transaction management.
• Explain the use of storage management.
• Explain the overall system structure.
• Explain the E-R model. What are the extended features of the E-R model?
• Explain the symbols used in the E-R model.
• Explain the mapping cardinalities.
• Explain weak entity sets.
• Explain the different relational symbols used.
• Explain the rename operation.
• Explain the tuple operation.
***************************************************************************