COM 342

COM 242
DATABASE MANAGEMENT
SYSTEM (DBMS)
DATABASE MANAGEMENT SYSTEM (DBMS):
A DBMS consists of a collection of interrelated data and a set of
programs to access them. The collection of data, usually referred to as the
database, contains information about a particular enterprise.
The primary goal of a DBMS is to provide an environment that is both
convenient and efficient to use in storing and retrieving database
information.
Database systems are designed to manage large bodies of information.
The management of data involves both the definition of structure for the
storage of information and the provision of mechanisms for the manipulation
of information.
In addition, the database system must provide for the safety of the
information stored despite system crashes or attempts at unauthorized
access. If data are to be shared among several users, the system must provide
for concurrent access.
Data Abstraction and Viewing
A major purpose of a database system is to provide the user with an
abstract view of the data. The system hides the complexity and the detail of
how the data are stored and retrieved in the DBMS. There are three levels of
data abstraction.
1- Physical Level
The lowest level of abstraction describes how the data are actually
stored on physical devices like disks and tapes. Complex sets of low-level data
structures are defined in accordance with the operating system in use.
2- Logical Level
Logical level abstraction describes what data are stored in the
database, and what relationship exists among them. The logical structure for
the database is defined by the database administrator, who must decide what
information is to be kept in the DB.
3- View Level
The highest level of abstraction describes only part of the database.
The user runs a particular application program to interact with the
selected portion of the database. The user does not need to know about
the complexities of the structure and how the data are stored and retrieved in
the database.
Fig: The three levels of data abstraction (view level with View 1, View 2, ..., View n at the top; logical level in the middle; physical level at the bottom)
Example:
The following example may be used to clarify the distinction among
levels of abstraction. A high-level programming language may declare a
customer record as follows.
type customer = record
Customer-Name : string ;
Customer-Street : string ;
Customer-City : string ;
end ;
The code defines a new record called customer with three fields. Each
field has a name and a type associated with it.
At the physical level a customer record can be described as a block of
consecutive storage locations (for example, words or bytes). The compiler
hides this level of detail from the programmer.
At the logical level the programmer will use the record name, field
name and the type to design procedures for the user.
At the view level users will employ the programs to store and
retrieve the data to and from the database. The details of the logical structure used
by the programmers and the methods of the store and retrieve operations are again
hidden.
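The three levels can be sketched in Python as follows. This is only an illustration: the field names (name, street, city), the 20-byte fixed-width layout and the helper names are assumptions made for the sketch, not part of the notes.

```python
import struct
from dataclasses import dataclass


@dataclass
class Customer:
    # Logical level: only field names and types are visible.
    name: str
    street: str
    city: str


# Physical level (hypothetical layout): three fixed-width 20-byte fields
# stored in consecutive bytes.
RECORD_FORMAT = "20s20s20s"


def to_bytes(c: Customer) -> bytes:
    """Pack a record into its consecutive-byte representation."""
    return struct.pack(RECORD_FORMAT, c.name.encode(), c.street.encode(), c.city.encode())


def from_bytes(raw: bytes) -> Customer:
    """Unpack the bytes and strip the zero padding added by struct."""
    fields = struct.unpack(RECORD_FORMAT, raw)
    return Customer(*(f.rstrip(b"\x00").decode() for f in fields))


def city_of(raw: bytes) -> str:
    # View level: the user asks for one piece of information and sees
    # neither the record structure nor the byte layout.
    return from_bytes(raw).city


raw = to_bytes(Customer("Johnson", "Alma", "Rye"))
print(len(raw), city_of(raw))
```

A caller of city_of never touches RECORD_FORMAT, which is the sense in which the lower levels are hidden.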
Data Models
A collection of conceptual tools for describing data, data relationships,
data semantics and consistency constraints underlying the structure of a
database is called a data model.
The various data models proposed fall into three different groups.
1. Object-Based logical models
2. Record-Based logical models
3. Physical models
1- Object-Based Logical Models
Object-Based logical models are used to describe data at the logical and
view levels. They are characterized by the fact that they provide fairly
flexible structuring capabilities and allow data constraints to be specified
explicitly.
There are many different models. The most widely known ones are:
• The Entity-Relationship model
• The Object-Oriented model
• The semantic data model
• The functional data model
Let’s have a closer look at the first two.
The Entity-Relationship Model
The E-R model is based on a perception of a real world that consists
of a collection of basic objects called ENTITIES, and of relationships among
those objects. An entity is a “Thing” or “Object” in the real world that is
distinguishable from other objects. For example each person is an entity;
bank accounts can be considered to be entities. Entities are described in a
database by a set of attributes. For example, the attributes Account-Number
and Account-Balance describe one particular account in a bank.
A relationship is an association among several entities. For example, a Depositor
relationship associates a customer with each account that he or she has. The set
of all entities of the same type and the set of all relationships of the same
type are termed an entity set and a relationship set, respectively.
The overall logical structure of a database can be expressed
graphically by an E-R diagram, which is built up from the following
components.
• Rectangles : which represent entity sets
• Ellipses : which represent attributes
• Diamonds : which represent relationships among entity sets
• Lines : which link attributes to entity sets and entity sets to relationships
Fig: A sample E-R Diagram (entity set Customer with attributes Social-Security, Customer-Name, Customer-Street and Customer-City; relationship set Depositor; entity set Account with attributes Account-Number and Balance)
Each component is labeled with the entity or relationship that it represents.
The Object-Oriented Model
Like the E-R model, the object-oriented model is based on a collection
of objects. An object contains values stored in instance variables within the
object. An object also contains bodies of code that operate on the object.
These bodies of codes are called methods.
Objects that contain the same types of values and the same methods are
grouped together into classes. A class may be viewed as a type definition for
objects. This combination of data and methods comprising a type definition
is similar to a programming-language abstract data type.
Example:
Consider an object representing a bank account. Such an object
contains the instance variables account-number and balance. It contains a
method pay-interest, which adds interest to the balance. Assume that the
bank pays 6% interest on all accounts, but is now changing its policy to pay
5% on balances of less than $1000. Under most data models, making this
adjustment would involve changing code in one or more application
programs. Under the object-oriented model, the only change is made within
the pay-interest method. The external interface to the objects remains the same.
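A minimal Python sketch of this example is shown below. The class and method names mirror the text; the second account (A-999) is a hypothetical value added only to exercise the 6% branch.

```python
class Account:
    """A bank account object with two instance variables and one method."""

    def __init__(self, account_number: str, balance: float) -> None:
        self.account_number = account_number
        self.balance = balance

    def pay_interest(self) -> None:
        # New policy from the text: 5% on balances under $1000, 6% otherwise.
        # Changing the policy only touches this method body.
        rate = 0.05 if self.balance < 1000 else 0.06
        self.balance += self.balance * rate


small = Account("A-201", 500.0)
large = Account("A-999", 2000.0)  # hypothetical account for illustration
small.pay_interest()
large.pay_interest()
print(small.balance, large.balance)
```

Code that calls pay_interest does not change when the rates change, which is the point the example makes about the external interface.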
2- Record-Based Logical Models
Record-Based logical models are used in describing data at the logical
and view levels. In contrast to object-based data models, they are used both to
specify the overall logical structure of the database and to provide a
higher-level description of the implementation.
Record-Based models are so named because the database is structured
in fixed-format records of several types. Each record type defines a fixed
number of fields, or attributes, and each field is usually of a fixed length.
The use of fixed-length records simplifies the physical-level implementation of the database.
The three most widely accepted Record-Based data models are:
• The relational model
• The network model
• The hierarchical model
The Relational Model
The relational model uses a collection of tables to represent both data
and relationship among that data. Each table has multiple columns; each
column has a unique name.
Example:
The figure below represents a sample relational database comprising
two tables: one shows bank customers, and the other shows the accounts
that belong to those customers. It shows that the customer Johns, with
Social-Security number 321-12-3123, lives on Main in Harrison and has account
A-201 with a balance of $500.
Here the tables have a common column, AccountNumber, that links customers to their
respective balances.
CustomerName  SocialSecurity  CustomerStreet  CustomerCity  AccountNumber
Johnson       192-83-7465     Alma            Rye           A-101
Johns         321-12-3123     Main            Harrison      A-201
Smith         345-24-8153     Park            Stamford      A-305

AccountNumber  Balance
A-101          $600
A-201          $500
A-305          $900
Fig: A sample relational database
The Network Model
In the network model the tables have no common columns. Instead the
relationships among data are represented by links, which can be viewed as
pointers. The records in the database are organized as collection of arbitrary
graphs.
Johnson  192-83-7456  Alma  Rye       ---->  A-101  $600
Johns    321-12-3123  Main  Harrison  ---->  A-201  $500
Smith    345-24-8153  Park  Stamford  ---->  A-305  $900
Fig: A sample network database
The Hierarchical Model
The hierarchical model is similar to the network model in the sense that data and
relationships among them are represented by records and links, respectively. It differs
from the network model in that the records are organized as collections of
trees rather than arbitrary graphs.
Fig: A sample hierarchical model (a tree rooted at DATABASE; the customer record Johnson, 192-83-7456, Alma, Rye is the parent of the account record A-101, $600)
3-Physical Models
Physical data models are used to describe data at the lowest level. In
contrast to logical data models, there are few physical models in use. Two of
the widely known ones are the unifying model and the frame-memory model.
Database Language
A database system provides two different types of languages, one to
specify the database schema, and the other to express database queries and
updates.
Data Definition Language
A database schema is specified by a set of definitions expressed by a
special language called data definition language (DDL). The result of
compilation of DDL statements is a set of tables that is stored in a special
file called data dictionary or data directory.
A data dictionary is a file that contains METADATA, that is, data
about data. This file is consulted before actual data are read or modified in
the database system. The storage structure and access methods used are also
defined there.
Data Manipulation Language
The language used for data access and data manipulation is called
data manipulation language (DML).
Data manipulation includes
• The retrieval of information stored in the database
• The insertion of new information into the database
• The deletion of information from the database
• The modification of information stored in the database
Appropriate algorithms are defined to access the data efficiently and to
allow a high level of data abstraction and human interaction with the database
system.
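The two kinds of language can be sketched with Python's built-in sqlite3 module. This is an illustrative assumption: the notes do not name a particular DBMS, and the table and column names below are chosen for the sketch. The CREATE TABLE statement plays the role of DDL; the remaining statements are DML.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL: define the schema.
conn.execute(
    "CREATE TABLE account ("
    "branch_name TEXT, account_number TEXT PRIMARY KEY, balance INTEGER)"
)

# DML: insertion of new information.
conn.execute("INSERT INTO account VALUES ('Downtown', 'A-101', 500)")
conn.execute("INSERT INTO account VALUES ('Mianus', 'A-215', 700)")

# DML: modification of stored information.
conn.execute("UPDATE account SET balance = balance + 50 WHERE account_number = 'A-101'")

# DML: deletion of information.
conn.execute("DELETE FROM account WHERE account_number = 'A-215'")

# DML: retrieval of information.
rows = conn.execute("SELECT account_number, balance FROM account").fetchall()
print(rows)
```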
Query Language
A query is a statement requesting the retrieval and manipulation of
information that is stored in the database. The portion of the DML that
involves such requests is called a query language. Structured Query Language
(SQL) is a widely used example.
Application Programs
These are programs that contain user instructions to interact with the
database systems through calls to DML.
Programming languages like C, Pascal, Delphi and Visual Basic
are used to organize this user interaction in a user-friendly environment.
Transaction Management
A transaction is a collection of operations that perform a single logical
function in a database application.
Example:
In a banking system, if a fund transfer is to be made from account-A
to account-B, then the amount to be transferred is added to account-B
and subtracted from account-A.
Suppose that the account-A balance was $300 and the account-B balance was
$100 prior to a transfer of $50.
           Balance  Transfer  New Balances
Account-A  $300     -50       $250
Account-B  $100     +50       $150
The application program is responsible for carrying out both operations
on an all-or-none basis. That is, it either performs both calculations
correctly and in full or performs neither. This is called Atomicity.
It is also essential that the execution of these operations preserves the
database consistency. That is, the sum of the balances of accounts A and B
must be the same before and after the fund transfer.
                  Account-A  Account-B  Account-A + Account-B
Before Execution  $300       +$100      $400
After Execution   $250       +$150      $400
The accounts will be consistent because the transfer operations
updated both balances correctly and consistently. Therefore the database
remains consistent.
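Atomicity and consistency of the transfer can be sketched with Python's sqlite3 module. The table layout, the CHECK constraint that forbids negative balances and the helper names transfer and balances are assumptions made for this sketch; they do not come from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # assumed rule: no overdrafts
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 300), ("B", 100)])
conn.commit()


def transfer(conn, source, target, amount):
    """Credit target and debit source as one all-or-none unit."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the credit made above has been rolled back


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


ok = transfer(conn, "A", "B", 50)          # succeeds
after_ok = balances(conn)
failed = transfer(conn, "A", "B", 1000)    # debit would overdraw account A
after_failed = balances(conn)
print(ok, after_ok, failed, after_failed)
```

In the failing call the credit to B has already been executed when the debit is rejected, so the rollback is what keeps the total at $400.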
The Relational Model Structure
The relational model has established itself as the primary data model
for commercial data processing applications.
A relational database consists of tables, each of which is assigned a
unique name. A row in a table represents a relationship among a set of values.
Since a table is a collection of such relationships, there is a close
correspondence between the concept of a table and the mathematical concept
of a relation, from which the relational data model takes its name.
Consider the following banking enterprise, which represents a portion of the
total banking operation. The account table in Fig: 1.1 has 3 column
headers: Branch-Name, Account-Number and Balance. These headers are
called attributes. For each attribute there is a set of permitted values, called
the domain of that attribute. For the attribute Branch-Name, the domain is the set of all
branch names. Let D1 denote this set, D2 the set of all account
numbers and D3 the set of all balances. Each row entry is called a tuple. Any
row in the table of Fig: 1.1 is a 3-tuple (V1, V2, V3), where V1 is a Branch-Name
in the domain D1, V2 is an Account-Number in the domain D2 and V3 is
a Balance in the domain D3.
In general, account will contain only a subset of the set of all possible
rows. Therefore, account is a subset of
D1 × D2 × D3
BranchName   AccountNumber  Balance
Downtown     A-101          500
Mianus       A-215          700
Perry ridge  A-102          400
Round Hill   A-305          350
Brighton     A-201          900
Redwood      A-222          700
Brighton     A-217          750
Fig: 1.1 Account Relations
The table shown in Fig: 1.1 is the relation and each row in the table is
called a tuple.
Database Schema
The database schema consists of the relation schemas of the tables in the database.
A relation schema is the list of attributes and their corresponding domains. For example,
Account-Schema = (Branch-Name, Account-Number, Balance)
Continuing with the banking example, we need to know where each
branch is located and what its assets are. Fig: 1.2 shows another relation, with the schema
Branch-Schema = (Branch-Name, Branch-City, Assets)
Since we need customers, we also need a customer relation
Customer-Schema = (Customer-Name, Customer-Street, Customer-City)
as shown in Fig: 1.3. We also need a relation to describe the association
between customers and accounts. The relation schema that describes this
association is shown in Fig: 1.4 as
Depositor-Schema = (Customer-Name, Account-Number)
BranchName   BranchCity  Assets
Downtown     Brooklyn    9000000
Redwood      Palo Alto   2100000
Perry ridge  Horse Neck  1700000
Mianus       Horse Neck  400000
Round Hill   Horse Neck  8000000
Pownal       Bennington  300000
North Town   Rye         3700000
Brighton     Brooklyn    7100000
Fig: 1.2 Branch Relations

CustomerName  CustomerStreet  CustomerCity
Jones         Main            Harrison
Smith         North           Rye
Hayes         Main            Harrison
Curry         North           Rye
Lindsay       Park            Pittsfield
Turner        Putnam          Stamford
Williams      Nasser          Princeton
Adams         Spring          Pittsfield
Johnson       Alma            Palo Alto
Glenn         Sand Hill       Wood Side
Brooks        Senator         Brooklyn
Green         Walnut          Stamford
Fig: 1.3 Customer Relations

CustomerName  AccountNumber
Johnson       A-101
Smith         A-215
Hayes         A-102
Turner        A-305
Johnson       A-201
Jones         A-217
Lindsay       A-222
Fig: 1.4 Depositor Relations
We include two additional relations to describe data about Loans
maintained in the various branches of the bank
Loan-Schema = (Branch-Name, Loan-Number, Amount)
Borrower-Schema = (Customer-Name, Loan-Number)
BranchName   LoanNumber  Amount
Downtown     L-17        1000
Redwood      L-23        2000
Perry ridge  L-15        1500
Downtown     L-14        1500
Mianus       L-93        500
Round Hill   L-11        900
Perry ridge  L-16        1300
Fig: 1.5 Loan-Branch Relations

CustomerName  LoanNumber
Jones         L-17
Smith         L-23
Hayes         L-15
Jackson       L-14
Curry         L-93
Smith         L-11
Williams      L-17
Adams         L-16
Fig: 1.6 Borrower Relations
The banking enterprise we have described is derived from the E-R
diagram shown in Fig: 1.7
Fig: 1.7 E-R diagram for the banking enterprise (entity sets: Branch with Branch-Name, Branch-City, Assets; Account with Account-No, Balance; Loan with Loan-No, Amount; Customer with Customer-Name, Customer-Street, Customer-City. Relationship sets: AccountBranch, LoanBranch, Depositor, Borrower)
Keys
It is important to be able to specify how an entity within an entity set
or a relationship within a relationship set is distinguished. Keys allow us to
make such distinction.
Candidate Key
One or more attributes taken collectively can uniquely identify an
entity in an entity set. For example, the social-security attribute of the entity
set customer is sufficient to distinguish one customer entity from another.
Thus social-security is a candidate key. Customer-Name, Customer-Street and
Customer-City taken collectively are another candidate key, since it is highly
unlikely that a second customer will have the same name, street and city.
Primary Key
The primary key is the candidate key that is chosen by the DBMS
manager to uniquely identify the entities within the entity set. The
remaining candidate keys (if there are any) become the Alternate keys. In
some cases the alternate keys are allowed to have duplicate values.
Foreign Key
When two or more tables (attribute sets) are linked together, the
primary key is used to set up the relationship. The primary key of one set of
attributes that is used in relating the other set of attributes is said to be a
foreign key in the other set. For example, in Branch-schema,
{Branch-Name} and {Branch-Name, Branch-City} are both candidate keys, and
either can be chosen as the primary key. {Branch-City} cannot be a candidate key
because two different branches with different names can be located in the same city.
{Branch-Name} in the Loan relation is a foreign key, since Branch-Name is the primary key in
the branch relation, setting up the relationship between the two relations.
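The uniqueness requirement on a key can be checked mechanically against a relation instance, as in the Python sketch below. The helper name is_unique is hypothetical. Note that such a check can only refute a key: a key is a property of the schema, so passing the check on one instance does not prove that the attributes form a key.

```python
BRANCH_SCHEMA = ("Branch-Name", "Branch-City", "Assets")
branch = [
    ("Downtown", "Brooklyn", 9000000),
    ("Redwood", "Palo Alto", 2100000),
    ("Perry ridge", "Horse Neck", 1700000),
    ("Mianus", "Horse Neck", 400000),
    ("Round Hill", "Horse Neck", 8000000),
    ("Pownal", "Bennington", 300000),
    ("North Town", "Rye", 3700000),
    ("Brighton", "Brooklyn", 7100000),
]


def is_unique(relation, schema, attrs):
    """Return True if no two tuples agree on all of the given attributes."""
    positions = [schema.index(a) for a in attrs]
    seen = set()
    for row in relation:
        key = tuple(row[i] for i in positions)
        if key in seen:
            return False
        seen.add(key)
    return True


by_name = is_unique(branch, BRANCH_SCHEMA, ["Branch-Name"])
by_city = is_unique(branch, BRANCH_SCHEMA, ["Branch-City"])
print(by_name, by_city)
```

Branch-City fails because Brooklyn and Horse Neck each appear for more than one branch.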
The Relational Algebra
The relational algebra is a procedural query language. It consists of a set
of operations that take one or more relations as input and produce a new
relation as their result. The fundamental operations in the relational algebra
are SELECT, PROJECT, UNION, SET DIFFERENCE and CARTESIAN
PRODUCT.
a) The Select Operation
The select operation selects tuples that satisfy a given predicate.
Sigma (σ) is used to denote the selection.
Example:
To select those tuples of the Loan relation where the branch is
"Perry ridge"
σ_{Branch-Name = "Perry ridge"}(Loan)
The result of the query is
Branch-Name  Loan-Number  Amount
Perry ridge  L-15         1500
Perry ridge  L-16         1300
Example:
To find all tuples in which the amount lent is more than $1200.
σ_{Amount > 1200}(Loan)
Example:
To find all tuples that have the Branch-Name "Perry ridge" and
an amount > 1200.
σ_{Branch-Name = "Perry ridge" ∧ Amount > 1200}(Loan)
In general, comparisons are carried out by using (=, ≠, <, >, ≤, ≥)
in the selection predicate. Furthermore, we can combine several predicates
into longer predicates by using the connectives AND (∧) and OR (∨).
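The select operation can be sketched in Python by treating a relation as a list of tuples stored as dictionaries and a predicate as a function. The helper name select is an assumption of the sketch; the data are the Loan relation of Fig: 1.5.

```python
LOAN_SCHEMA = ("Branch-Name", "Loan-Number", "Amount")
loan = [
    dict(zip(LOAN_SCHEMA, row))
    for row in [
        ("Downtown", "L-17", 1000),
        ("Redwood", "L-23", 2000),
        ("Perry ridge", "L-15", 1500),
        ("Downtown", "L-14", 1500),
        ("Mianus", "L-93", 500),
        ("Round Hill", "L-11", 900),
        ("Perry ridge", "L-16", 1300),
    ]
]


def select(relation, predicate):
    """Keep only the tuples for which the predicate holds."""
    return [t for t in relation if predicate(t)]


perry = select(loan, lambda t: t["Branch-Name"] == "Perry ridge")
large = select(loan, lambda t: t["Amount"] > 1200)
# Predicates combined with the AND connective.
both = select(loan, lambda t: t["Branch-Name"] == "Perry ridge" and t["Amount"] > 1200)

print([t["Loan-Number"] for t in perry])
print([t["Loan-Number"] for t in large])
print([t["Loan-Number"] for t in both])
```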


b) The Project Operation
The project operation will project (list) the named attributes of the tuples
and suppress the others. The projection is represented by pi (Π).
Example:
Π_{Loan-Number, Amount}(Loan) will result in
Loan-Number  Amount
L-17         1000
L-23         2000
L-15         1500
L-14         1500
L-93         500
L-11         900
L-16         1300
Composition of Relation Operations
The result of a relational operation is also a relation. An expression
(like arithmetical) can be used to evaluate a relation.
Example:
To project the names of the customers who live in "Harrison"
Π_{Customer-Name}(σ_{Customer-City = "Harrison"}(Customer))
will result in
Customer-Name
Jones
Hayes
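Projection and the composition of operations can be sketched in Python as below. The helper names select and project are assumptions of the sketch; the data are the Customer relation of Fig: 1.3. Because a relation is a set, the sketch removes duplicate rows after projecting.

```python
CUSTOMER_SCHEMA = ("Customer-Name", "Customer-Street", "Customer-City")
customer = [
    ("Jones", "Main", "Harrison"), ("Smith", "North", "Rye"),
    ("Hayes", "Main", "Harrison"), ("Curry", "North", "Rye"),
    ("Lindsay", "Park", "Pittsfield"), ("Turner", "Putnam", "Stamford"),
    ("Williams", "Nasser", "Princeton"), ("Adams", "Spring", "Pittsfield"),
    ("Johnson", "Alma", "Palo Alto"), ("Glenn", "Sand Hill", "Wood Side"),
    ("Brooks", "Senator", "Brooklyn"), ("Green", "Walnut", "Stamford"),
]


def select(relation, schema, attr, value):
    """Selection with an equality predicate on one attribute."""
    i = schema.index(attr)
    return [t for t in relation if t[i] == value]


def project(relation, schema, attrs):
    """Keep the named attributes and drop duplicate rows, preserving order."""
    positions = [schema.index(a) for a in attrs]
    result = []
    for t in relation:
        row = tuple(t[i] for i in positions)
        if row not in result:
            result.append(row)
    return result


# Composition: the output of select is the input of project.
harrison = select(customer, CUSTOMER_SCHEMA, "Customer-City", "Harrison")
names = project(harrison, CUSTOMER_SCHEMA, ["Customer-Name"])
cities = project(customer, CUSTOMER_SCHEMA, ["Customer-City"])
print(names, len(cities))
```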
c) The Union Operation
The union operation combines the tuples of two relations into a single
relation. The (∪) character is used for uniting two queries into one.
Example:
To find all the customers with either an account or a Loan in the bank.
The customers with an account are
Π_{Customer-Name}(Depositor)
and the customers with a Loan are
Π_{Customer-Name}(Borrower)
To answer the query, we need the union of these two sets.
Π_{Customer-Name}(Borrower) ∪ Π_{Customer-Name}(Depositor)
Customer-Name
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Adams
Notice that there are only 10 tuples in the result. The duplicate names
are eliminated since all relations are also sets. Here Smith, Jones and Hayes
are Borrowers as well as Depositors.
d) The Set Difference Operation
The set difference operation allows us to find tuples that are in one
relation and not in another. The ( − ) minus sign is used.
Example:
To find the customers of the bank who have an account but not a
Loan.
Π_{Customer-Name}(Depositor) − Π_{Customer-Name}(Borrower)
will result in
Customer-Name
Johnson
Turner
Lindsay
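Since relations are sets, union and set difference map directly onto Python's built-in set type. The sketch below uses the Depositor and Borrower relations of Fig: 1.4 and Fig: 1.6; the variable names are chosen for illustration.

```python
depositor = [
    ("Johnson", "A-101"), ("Smith", "A-215"), ("Hayes", "A-102"),
    ("Turner", "A-305"), ("Johnson", "A-201"), ("Jones", "A-217"),
    ("Lindsay", "A-222"),
]
borrower = [
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"),
    ("Jackson", "L-14"), ("Curry", "L-93"), ("Smith", "L-11"),
    ("Williams", "L-17"), ("Adams", "L-16"),
]

# Projection on Customer-Name; building a set removes duplicates.
depositor_names = {name for name, _ in depositor}
borrower_names = {name for name, _ in borrower}

either = borrower_names | depositor_names        # union
account_only = depositor_names - borrower_names  # set difference

print(sorted(either))
print(sorted(account_only))
```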
e) The Cartesian Product Operation
The Cartesian product operation, denoted by a cross (×), allows us to
combine information from any two relations.
Example:
The relation schema for r = Borrower × Loan is
(Borrower.Customer-Name, Borrower.Loan-Number, Loan.Branch-Name, Loan.Loan-Number, Loan.Amount)
Since the Cartesian product joins every tuple of one relation to
every tuple of the other, the resultant relation will be as below.
Customer-Name  Borrower.Loan-Number  Branch-Name  Loan.Loan-Number  Amount
Jones          L-17                  Downtown     L-17              1000
Jones          L-17                  Redwood      L-23              2000
...            ...                   ...          ...               ...
Jones          L-17                  Perry ridge  L-16              1300
Smith          L-23                  Downtown     L-17              1000
Smith          L-23                  Redwood      L-23              2000
...            ...                   ...          ...               ...
Smith          L-23                  Perry ridge  L-16              1300
...            ...                   ...          ...               ...
Suppose that we want the names of all customers who have a Loan at the
Perry ridge branch. We need information in both the Loan relation and the
Borrower relation.
σ_{Branch-Name = "Perry ridge"}(Borrower × Loan)
Result:
Customer-Name  Borrower.Loan-Number  Branch-Name  Loan.Loan-Number  Amount
Jones          L-17                  Perry ridge  L-15              1500
Jones          L-17                  Perry ridge  L-16              1300
Smith          L-23                  Perry ridge  L-15              1500
Smith          L-23                  Perry ridge  L-16              1300
...            ...                   ...          ...               ...
Adams          L-16                  Perry ridge  L-15              1500
Adams          L-16                  Perry ridge  L-16              1300
Since the resulting relation pairs every borrower with every Perry ridge
Loan, we keep only the tuples in which the two Loan numbers match:
σ_{Borrower.Loan-Number = Loan.Loan-Number}(σ_{Branch-Name = "Perry ridge"}(Borrower × Loan))
Customer-Name  Borrower.Loan-Number  Branch-Name  Loan.Loan-Number  Amount
Hayes          L-15                  Perry ridge  L-15              1500
Adams          L-16                  Perry ridge  L-16              1300
And if we just want to project the names.
Π_{Customer-Name}(σ_{Borrower.Loan-Number = Loan.Loan-Number}(σ_{Branch-Name = "Perry ridge"}(Borrower × Loan)))
Customer-Name
Hayes
Adams
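The three steps of this query (product, selections, projection) can be traced in Python with itertools.product. This is a sketch over the Borrower and Loan relations of Fig: 1.6 and Fig: 1.5; the variable names are chosen for illustration.

```python
from itertools import product

borrower = [
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"),
    ("Jackson", "L-14"), ("Curry", "L-93"), ("Smith", "L-11"),
    ("Williams", "L-17"), ("Adams", "L-16"),
]
loan = [
    ("Downtown", "L-17", 1000), ("Redwood", "L-23", 2000),
    ("Perry ridge", "L-15", 1500), ("Downtown", "L-14", 1500),
    ("Mianus", "L-93", 500), ("Round Hill", "L-11", 900),
    ("Perry ridge", "L-16", 1300),
]

# Cartesian product: every Borrower tuple paired with every Loan tuple.
# Layout of each row:
# (Customer-Name, Borrower.Loan-Number, Branch-Name, Loan.Loan-Number, Amount)
pairs = [b + l for b, l in product(borrower, loan)]

# Selection on the branch name.
perry = [t for t in pairs if t[2] == "Perry ridge"]

# Selection that keeps only rows whose two Loan numbers match.
matched = [t for t in perry if t[1] == t[3]]

# Projection on Customer-Name.
names = [t[0] for t in matched]
print(len(pairs), len(perry), names)
```

The intermediate sizes (56 rows, then 16, then 2) show how much of the product is discarded by the two selections.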
Additional Operations
The fundamental operations of the relational algebra are sufficient to
express any relational algebra query. However, certain common queries are
lengthy to express using only the fundamental operations. Therefore some
additional operations are defined.
a) The Set-Intersection Operation
The set-intersection operation, denoted by (∩), selects the tuples that
appear in both of two relations.
Example:
Suppose that we wish to find all customers who have both a Loan and
an account. Using the set difference operation, with r = Π_{Customer-Name}(Depositor)
and s = Π_{Customer-Name}(Borrower), we can write
Π_{Customer-Name}(Depositor) − (Π_{Customer-Name}(Depositor) − Π_{Customer-Name}(Borrower))
r − (r − s)
Using set intersection we can write
Π_{Customer-Name}(Depositor) ∩ Π_{Customer-Name}(Borrower)
r ∩ s = r − (r − s)
The result of the query will be
Customer-Name
Hayes
Jones
Smith
b) The Natural-Join Operation
The natural-join operation is a binary operation that allows us to
combine certain selections and a Cartesian product into one operation.
Example:
Consider the query, "Find the names of all customers who have a Loan
at the bank, and find the amount of the Loan."
Π_{Customer-Name, Loan.Loan-Number, Amount}(σ_{Borrower.Loan-Number = Loan.Loan-Number}(Borrower × Loan))
The natural-join operation, denoted by the join symbol (⋈), forces equality on
those attributes that appear in both relations and removes the duplicate
attributes.
Π_{Customer-Name, Loan-Number, Amount}(Borrower ⋈ Loan)
The resulting table is
CustomerName  LoanNumber  Amount
Jones         L-17        1000
Smith         L-23        2000
Hayes         L-15        1500
Jackson       L-14        1500
Curry         L-93        500
Smith         L-11        900
Williams      L-17        1000
Adams         L-16        1300
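A natural join can be sketched in Python as a nested loop that keeps the pairs of tuples agreeing on every shared attribute. The function name natural_join and the dictionary representation of tuples are assumptions of the sketch, which is written for clarity rather than efficiency.

```python
borrower = [
    {"Customer-Name": n, "Loan-Number": l}
    for n, l in [
        ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"),
        ("Jackson", "L-14"), ("Curry", "L-93"), ("Smith", "L-11"),
        ("Williams", "L-17"), ("Adams", "L-16"),
    ]
]
loan = [
    {"Branch-Name": b, "Loan-Number": l, "Amount": a}
    for b, l, a in [
        ("Downtown", "L-17", 1000), ("Redwood", "L-23", 2000),
        ("Perry ridge", "L-15", 1500), ("Downtown", "L-14", 1500),
        ("Mianus", "L-93", 500), ("Round Hill", "L-11", 900),
        ("Perry ridge", "L-16", 1300),
    ]
]


def natural_join(r, s):
    """Join tuples that agree on all attributes common to both relations."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    # Merging the two dicts keeps a single copy of each common attribute.
    return [{**t, **u} for t in r for u in s if all(t[a] == u[a] for a in common)]


joined = natural_join(borrower, loan)
result = [(t["Customer-Name"], t["Loan-Number"], t["Amount"]) for t in joined]
print(result)
```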
Example:
Find the names of all branches with customers who have an account in
the bank and who live in Harrison.
Π_{Branch-Name}(σ_{Customer-City = "Harrison"}(Customer ⋈ Account ⋈ Depositor))
The result is
Branch-Name
Brighton
Perry ridge
Example:
Find the customers who have both a Loan and an account in the bank.
Two possible expressions can be written for this example
i) Π_{Customer-Name}(Borrower ⋈ Depositor)
OR
ii) Π_{Customer-Name}(Borrower) ∩ Π_{Customer-Name}(Depositor)
Result in both cases:
Customer-Name
Hayes
Jones
Smith
c) The Division Operation
The division operation, denoted by (÷), is suited to queries that
include the phrase “For All”.
Example:
Suppose that we want to find all customers who have an account at all
the branches located in Brooklyn.
r1 = Π_{Branch-Name}(σ_{Branch-City = "Brooklyn"}(Branch))
Branch-Name
Brighton
Downtown
This will give us all the branches in Brooklyn.
Then
r2 = Π_{Customer-Name, Branch-Name}(Depositor ⋈ Account)
This will give us the customers together with the branches at which they have accounts.
CustomerName  BranchName
Johnson       Downtown
Smith         Mianus
Hayes         Perry ridge
Turner        Round Hill
Johnson       Brighton
Jones         Brighton
Lindsay       Redwood
Hence r2 ÷ r1 will result in
Customer-Name
Johnson
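Division can be sketched in Python for the two-attribute case used here: a customer is in the result if the customer is paired in r2 with every branch in r1. The function name divide is an assumption of the sketch.

```python
# r2: (Customer-Name, Branch-Name) pairs from the Depositor and Account relations
r2 = {
    ("Johnson", "Downtown"), ("Smith", "Mianus"), ("Hayes", "Perry ridge"),
    ("Turner", "Round Hill"), ("Johnson", "Brighton"), ("Jones", "Brighton"),
    ("Lindsay", "Redwood"),
}
# r1: the branches located in Brooklyn
r1 = {"Brighton", "Downtown"}


def divide(pairs, divisor):
    """Return the first components that are paired with every divisor value."""
    candidates = {first for first, _ in pairs}
    return {c for c in candidates if all((c, d) in pairs for d in divisor)}


quotient = divide(r2, r1)
print(quotient)
```

Jones has an account at Brighton but not at Downtown, so only Johnson qualifies.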
d) The Assignment Operation
It is convenient at times to write a relational algebra expression in parts, using
assignment to a temporary relation variable. The assignment operation is
denoted by (←), like the assignment operation used in programming languages.
Example:
r ← Π_{Customer-Name}(Depositor)
s ← Π_{Customer-Name}(Borrower)
selection = r − (r − s)
The Extended Relational-Algebra Operations
The basic relational algebra operations have been extended in several
ways. A simple extension is to allow arithmetic operations as part of
projection. An important extension is to allow aggregate operation, such as
computing the sum of the elements of a set or their average. Another
important extension is the outer-join operation, which allows relational
algebra expressions to deal with null values, which model missing
information.
a) Generalized Projection
The generalized projection operation extends the projection operation
by allowing arithmetic functions to be used in the projection list. The
generalized projection operation has the form
Π_{F1, F2, F3, …, Fn}(E)
where E is a relational-algebra expression, and each of F1, F2, F3, …, Fn is an
arithmetic expression involving constants and attributes in the schema of E.
Suppose that we have a relation credit-info, as described in the
following figure, which lists the credit limit and the expenses so far
(credit-balance) on the account.
CustomerName  Limit  CreditBalance
Jones         6000   700
Smith         2000   400
Hayes         1500   1500
Curry         2000   1750
Fig: The credit info relation
If we want to find out how much more each person can spend, we can
write the following expression.
Π_{Customer-Name, Limit − Credit-Balance}(Credit-info)
The result will be
CustomerName  Limit − CreditBalance
Jones         5300
Smith         1600
Hayes         0
Curry         250
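Generalized projection amounts to evaluating an arithmetic expression once per tuple. A minimal Python sketch over the credit-info relation (the variable names are chosen for illustration):

```python
# (Customer-Name, Limit, Credit-Balance)
credit_info = [
    ("Jones", 6000, 700),
    ("Smith", 2000, 400),
    ("Hayes", 1500, 1500),
    ("Curry", 2000, 1750),
]

# Project Customer-Name together with the computed value Limit - Credit-Balance.
remaining = [(name, limit - balance) for name, limit, balance in credit_info]
print(remaining)
```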
b) Outer-Join
The outer-join operation is an extension of the join operation to deal
with missing information. Suppose that we have the following schemas,
which contain data on full-time employees.
Employee (employee-name, street, city)
ft-works (employee-name, Branch-Name, salary)
Let us consider the employee and ft-works relations shown in the
following figures.
EmployeeName  Street    City
Coyote        Toon      Hollywood
Rabbit        Tunnel    Carrot Ville
Smith         Revolver  Death Valley
Williams      Sea view  Seattle

EmployeeName  BranchName  Salary
Coyote        Mesa        1500
Rabbit        Mesa        1300
Gates         Redwood     5300
Williams      Redwood     1500
Fig: The employee and ft-works relations
Suppose that we want to generate a single relation with all the
information (Street, City, Branch-Name and Salary) about full-time
employees. A possible approach would be to use a natural join operation as
follows
Π_{Employee-Name, Street, City, Branch-Name, Salary}(Employee ⋈ ft-works)
The result will be
EmployeeName  Street    City          BranchName  Salary
Coyote        Toon      Hollywood     Mesa        1500
Rabbit        Tunnel    Carrot Ville  Mesa        1300
Williams      Sea view  Seattle       Redwood     1500
Fig: Result of (employee ⋈ ft-works)
Notice that we have lost the information on Smith, since the tuple
describing Smith is missing from the ft-works relation; similarly we have lost
the information on Gates, since the tuple describing Gates is missing from
the employee relation.
We can use the outer-join operation to avoid this loss. There are three
forms of the operation
Left outer-join denoted (⟕)
Right outer-join denoted (⟖)
Full outer-join denoted (⟗)
Applying the left outer-join (employee ⟕ ft-works), the result is
shown in the following figure
EmployeeName  Street    City          BranchName  Salary
Coyote        Toon      Hollywood     Mesa        1500
Rabbit        Tunnel    Carrot Ville  Mesa        1300
Williams      Sea view  Seattle       Redwood     1500
Smith         Revolver  Death Valley  Null        Null
Fig: The result of (employee ⟕ ft-works)
The left outer-join takes all tuples in the left relation that did not
match with any tuple in the right relation, pads the tuples with NULL values
for all other attributes from the other relation, and adds them to the result of the
natural join. Here the added tuple is
(Smith, Revolver, Death Valley, Null, Null)
Similarly, the right outer-join (employee ⟖ ft-works) will result in the
following figure
EmployeeName  Street    City          BranchName  Salary
Coyote        Toon      Hollywood     Mesa        1500
Rabbit        Tunnel    Carrot Ville  Mesa        1300
Williams      Sea view  Seattle       Redwood     1500
Gates         Null      Null          Redwood     5300
Fig: Result of (employee ⟖ ft-works)
The full outer-join (employee ⟗ ft-works) will be
EmployeeName  Street    City          BranchName  Salary
Coyote        Toon      Hollywood     Mesa        1500
Rabbit        Tunnel    Carrot Ville  Mesa        1300
Williams      Sea view  Seattle       Redwood     1500
Smith         Revolver  Death Valley  Null        Null
Gates         Null      Null          Redwood     5300
Fig: Result of (employee ⟗ ft-works)
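The three outer joins can be sketched in Python with one helper. The sketch assumes that Employee-Name identifies at most one tuple in each relation, so each relation is stored as a dictionary keyed by name, and it uses None in place of Null. The function name outer_join and its flags are hypothetical.

```python
employee = {  # Employee-Name -> (Street, City)
    "Coyote": ("Toon", "Hollywood"),
    "Rabbit": ("Tunnel", "Carrot Ville"),
    "Smith": ("Revolver", "Death Valley"),
    "Williams": ("Sea view", "Seattle"),
}
ft_works = {  # Employee-Name -> (Branch-Name, Salary)
    "Coyote": ("Mesa", 1500),
    "Rabbit": ("Mesa", 1300),
    "Gates": ("Redwood", 5300),
    "Williams": ("Redwood", 1500),
}


def outer_join(left, right, keep_left, keep_right):
    """Natural join on the key, optionally keeping unmatched tuples padded with None."""
    rows = []
    names = list(left) + [n for n in right if n not in left]
    for name in names:
        in_left, in_right = name in left, name in right
        matched = in_left and in_right
        if matched or (in_left and keep_left) or (in_right and keep_right):
            rows.append(
                (name,) + left.get(name, (None, None)) + right.get(name, (None, None))
            )
    return rows


natural = outer_join(employee, ft_works, False, False)
left_outer = outer_join(employee, ft_works, True, False)
right_outer = outer_join(employee, ft_works, False, True)
full_outer = outer_join(employee, ft_works, True, True)
for row in full_outer:
    print(row)
```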
c) Aggregate Functions
Aggregate functions are functions that take a collection of values and
return a single value as a result.
i) Sum: Will return the sum of all values of a numerical attribute given in the
relation.
Sum_{Salary}(ft-works)
The result is 9600.
ii) Distinct: There are cases where we must eliminate duplicate values before
computing an aggregate function. We use the distinct statement with the function. Consider a
part-time employee relation called pt-work as shown below.
EmployeeName  BranchName   Salary
Johnson       Downtown     1500
Lorena        Downtown     1300
Peterson      Downtown     2500
Sato          Austin       1600
Rao           Austin       1500
Gopal         Perry ridge  5300
Adams         Perry ridge  1500
Brown         Perry ridge  1300
Fig: The pt-work relation
If we want to count the number of branches in the relation using
the aggregate function count, we want to avoid counting the duplicate
Branch-Name values. The expression giving the number of branches would
be written as
Count-Distinct_{Branch-Name}(pt-work)
The result will be 3. (Downtown, Austin, Perry ridge)
iii) Grouping: There are circumstances where we would like to apply the
aggregate function not only to a single set of tuples, but also to several
groups, where each group is a set of tuples; we do so by using an
operation called grouping.
Example:
We may want to find the total salary sum of all part-time
employees at each branch of the bank individually, rather than in the
entire bank. To do so we need to partition the relation pt-work into
groups based on the branch and to apply the aggregate function on
each group. The “G” operator achieves the desired result.
Branch-Name G Sum_{Salary}(pt-work)
The result is
BranchName   Salary
Downtown     5300
Austin       3100
Perry ridge  8100
iv) Min & Max: These functions will return the minimum or maximum
value in the selected column of the relation.
Example:
Π_{Employee-Name, Min Salary}(pt-work) will return
EmployeeName  Salary
Lorena        1300
Brown         1300
Π_{Employee-Name, Max Salary}(pt-work) will return
EmployeeName  Salary
Gopal         5300
If we want to find the max salary in each group of Branch-Name
Π_{Branch-Name, Salary}(Branch-Name G Max_{Salary}(pt-work))
The result is
BranchName   Salary
Downtown     2500
Austin       1600
Perry ridge  5300
v) Avg: This function returns the average value of the selected column
in the relation.
Example:
Avg_{Salary}(ft-works)
will return 2400, since the sum of the salaries in ft-works is 9600 and the
average is 9600/4, giving 2400.
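The aggregate functions above can be reproduced with a few lines of Python over the ft-works salaries and the pt-work relation. The variable names are chosen for illustration; grouping is done with a dictionary keyed by branch name.

```python
from collections import defaultdict

ft_salaries = [1500, 1300, 5300, 1500]  # Salary column of ft-works
pt_work = [  # (Employee-Name, Branch-Name, Salary)
    ("Johnson", "Downtown", 1500), ("Lorena", "Downtown", 1300),
    ("Peterson", "Downtown", 2500), ("Sato", "Austin", 1600),
    ("Rao", "Austin", 1500), ("Gopal", "Perry ridge", 5300),
    ("Adams", "Perry ridge", 1500), ("Brown", "Perry ridge", 1300),
]

total = sum(ft_salaries)                   # Sum
average = total / len(ft_salaries)         # Avg
branch_count = len({branch for _, branch, _ in pt_work})  # Count-Distinct

# Grouping: partition by branch, then aggregate inside each group.
sum_by_branch = defaultdict(int)
max_by_branch = {}
for _, branch, salary in pt_work:
    sum_by_branch[branch] += salary
    max_by_branch[branch] = max(salary, max_by_branch.get(branch, salary))

# Min: the employees who earn the lowest salary.
lowest = min(salary for _, _, salary in pt_work)
lowest_paid = [name for name, _, salary in pt_work if salary == lowest]

print(total, average, branch_count)
print(dict(sum_by_branch), max_by_branch, lowest_paid)
```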
The Modification of the Database
The modification of the database involves adding, deleting and
changing information in the database. We express database modifications
using the assignment operation.
a) Deletion:
A delete request is expressed in much the same way as a query.
However, instead of displaying the selected tuples, we remove them
from the relation. The deletion expression is written as
     r ← r − E
where r is a relation and E is a relational-algebra query.
Example:
To delete all of Smith’s accounts:
     Depositor ← Depositor − σ Customer-Name = “Smith” (Depositor)
To delete all loans with amounts in the range 0 to 50:
     Loan ← Loan − σ Amount ≥ 0 and Amount ≤ 50 (Loan)
To delete all accounts at branches located in Brooklyn:
     r1 ← σ Branch-City = “Brooklyn” (Account ⋈ Branch)
     r2 ← Π Branch-Name, Account-Number, Balance (r1)
     Account ← Account − r2
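The deletion r ← r − E maps directly onto set difference. The sketch below assumes that relations are Python sets of tuples; the sample tuples and the attribute order of Loan (loan number, branch name, amount) are invented for illustration and do not come from these notes.

```python
def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return {row for row in relation if predicate(row)}


# Depositor as (customer_name, account_number); sample data is hypothetical.
depositor = {("Smith", "A-215"), ("Jones", "A-217"), ("Smith", "A-101")}
# Depositor <- Depositor - sigma[Customer-Name = "Smith"](Depositor)
depositor = depositor - select(depositor, lambda row: row[0] == "Smith")

# Loan as (loan_number, branch_name, amount); sample data is hypothetical.
loan = {("L-11", "Round Hill", 900), ("L-14", "Downtown", 40), ("L-15", "Mianus", 0)}
# Loan <- Loan - sigma[Amount >= 0 and Amount <= 50](Loan)
loan = loan - select(loan, lambda row: 0 <= row[2] <= 50)

print(depositor)  # {('Jones', 'A-217')}
print(loan)       # {('L-11', 'Round Hill', 900)}
```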
b) Insertion:
To insert data into a relation we either specify a tuple or write a query
whose result is a set of tuples to be inserted. An insertion is expressed by
     r ← r ∪ E
where r is a relation and E is a relational-algebra expression.
Suppose that we want to insert the fact that Smith has $1200 in account
A-973 at the Perry ridge branch.
     Account ← Account ∪ {(“Perry ridge”, A-973, 1200)}
     Depositor ← Depositor ∪ {(“Smith”, A-973)}
As an example of inserting tuples based on the result of a query, suppose
that as a gift we want to open a new account with a balance of 200 for
every customer who has a loan at the Perry ridge branch. Here the loan
numbers will be used as account numbers.
     r1 ← σ Branch-Name = “Perry ridge” (Borrower ⋈ Loan)
     r2 ← Π Branch-Name, Loan-Number (r1)
     Account ← Account ∪ (r2 × {(200)})
     Depositor ← Depositor ∪ Π Customer-Name, Loan-Number (r1)
The result will be
Customer-Name   Account-Number
Hayes           A-15
Adams           A-16
added to the Depositor relation as new customers, and
Branch-Name     Account-Number   Balance
Perry ridge     A-15             200
Perry ridge     A-16             200
added to the Account relation as new accounts.
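Insertion r ← r ∪ E can be sketched with set union. The snippet assumes relations are Python sets of tuples. The starting contents of account and depositor are invented, and r1 is written out by hand rather than computed from Borrower and Loan, so treat it as an illustration of the union step only.

```python
# Account as (branch_name, account_number, balance);
# Depositor as (customer_name, account_number). Starting rows are hypothetical.
account = {("Downtown", "A-101", 500)}
depositor = {("Johnson", "A-101")}

# Inserting a single tuple: r <- r U {t}
account = account | {("Perry ridge", "A-973", 1200)}
depositor = depositor | {("Smith", "A-973")}

# Inserting the result of a query. r1 stands for the selected Perry ridge
# loans as (customer_name, loan_number, branch_name).
r1 = {("Hayes", "A-15", "Perry ridge"), ("Adams", "A-16", "Perry ridge")}
# r2 <- projection of r1 on Branch-Name, Loan-Number
r2 = {(branch, loan_no) for (_, loan_no, branch) in r1}
# Account <- Account U (r2 x {(200)})
account = account | {(branch, loan_no, 200) for (branch, loan_no) in r2}
# Depositor <- Depositor U projection of r1 on Customer-Name, Loan-Number
depositor = depositor | {(cust, loan_no) for (cust, loan_no, _) in r1}

print(sorted(account))
print(sorted(depositor))
```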
c) Updating:
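In certain situations we may wish to change some of the values in an
existing tuple without changing the others. We use the generalized
projection operator to do this task:
     r ← Π F1, F2, ..., Fn (r)
where each Fi is either the i-th attribute of r, if that attribute is not
updated, or an expression that gives the new value of the attribute.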
Example:
Suppose that interest payments of 5% are to be paid on all accounts.
Every balance is increased by 5%.
     Account ← Π Branch-Name, Account-Number,
                 Balance ← Balance × 1.05 (Account)
Example:
Suppose that balances over 2000 receive 6%, whereas all the
others receive 5%.
     Account ← Π Branch-Name, Account-Number,
                 Balance ← Balance × 1.06 (σ Balance > 2000 (Account))
             ∪ Π Branch-Name, Account-Number,
                 Balance ← Balance × 1.05 (σ Balance ≤ 2000 (Account))
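The two-rate update can be sketched in Python as a union of two generalized projections. The helper names and the three sample accounts are assumptions made for this illustration; balances are rounded to two decimals only to keep the floating-point output tidy.

```python
def select(relation, predicate):
    """Relational selection over a list of tuples."""
    return [row for row in relation if predicate(row)]


def generalized_projection(relation, *expressions):
    """Build one new tuple per input tuple, one value per expression."""
    return [tuple(expr(row) for expr in expressions) for row in relation]


def branch_name(row):
    return row[0]


def account_number(row):
    return row[1]


def balance_times(rate):
    """Return an expression that multiplies the balance by `rate`."""
    return lambda row: round(row[2] * rate, 2)


# Account as (branch_name, account_number, balance); sample data is hypothetical.
account = [
    ("Downtown", "A-101", 500.0),
    ("Perry ridge", "A-102", 3000.0),
    ("Brighton", "A-201", 2000.0),
]

high = generalized_projection(
    select(account, lambda r: r[2] > 2000),
    branch_name, account_number, balance_times(1.06))
low = generalized_projection(
    select(account, lambda r: r[2] <= 2000),
    branch_name, account_number, balance_times(1.05))

# The union of the two disjoint parts replaces the old relation.
account = high + low
print(account)
```

Both selections are evaluated against the old Account relation before the assignment, so no account can receive interest twice.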
Example:
Consider the following DB tables that represent a stock control
management system.
Stock Item Relation
Each stock item is represented by its stock code, stock name,
minimum stock level, sale price and amount in hand.
Sales Relation
There are four branches of the company that sell the commonly
defined products. The relation for sales includes the date, branch
number, stock code and quantity sold.
Branches Relation
The four branches of the firm involved in sales are listed with branch
code, branch name and town.
The management requires a set of reports, listed after the tables, to be
generated for evaluation. The reports will project the necessary
information by evaluating relational algebra statements.
Stock Items:
Stock Code   Stock Name    Min Stock Level   Sale Price   Stock in Hand
MON 15       15” Monitor   4                 $150         3
KEY 04       Keyboard      12                $20          28
DIS 20       20 GB Disk    6                 $90          14
MB 100       Main Board    2                 $300         1
CD 52        52×CD Rom     5                 $60          7
MS 10        Mouse         24                $9           42
CDWR         700 MB CD     100               $0.8         245
SC 64        64 MB VGA     3                 $50          2
Sales Relation:
Date        Branch-Number   Stock Code   Quantity Sold
1/12/03     B2              MB 100       1
1/12/03     B3              CD 52        1
1/12/03     B2              CDWR         5
1/12/03     B1              CDWR         10
1/12/03     B4              CDWR         2
1/12/03     B1              KEY 04       1
1/12/03     B2              MS 10        1
2/12/03     B2              CD 52        1
2/12/03     B4              DIS 20       1
2/12/03     B1              SC 64        1
2/12/03     B3              MON 15       1
2/12/03     B4              CD 52        1
2/12/03     B1              CDWR         12
2/12/03     B2              MB 100       1
3/12/03     B1              CDWR         5
3/12/03     B2              CDWR         2
3/12/03     B4              CDWR         4
3/12/03     B3              CDWR         2
3/12/03     B3              KEY 04       1
3/12/03     B2              KEY 04       2
3/12/03     B2              MS 10        3
3/12/03     B1              CD 52        1
(no date)   B2              KEY 04       1
Branch Relation:
Branch-Code   Branch-Name              Town
B1            Shopping Center Branch   Nicosia
B2            Kyrenia Str. Branch      Nicosia
B3            Famagusta Branch         Famagusta
B4            Harbor Branch            Kyrenia
Reports to be projected:
a) The sum of all sales in all branches
b) The sum of sales by branch
c) The number sold of each stock item (overall)
d) The number of “Keyboards” sold on 1/12/2003
e) The number of “700 MB CD” sold between the dates 1/12/2003 and
   3/12/2003
f) The number of “700 MB CD” sold in Nicosia
g) The number of “CD 52” sold in “Harbor Branch”
h) The items that are below Min Stock Level
i) The overall sales in Nicosia
j) The number of sales in Shopping Center Branch whose price is below
   $40
k) The items sold in Famagusta & Harbor Branch
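As a hedged illustration of how such reports can be evaluated, the sketch below implements selection, projection and natural join as small Python functions over lists of dictionaries and applies them to reports (h) and (g). The function names, the dictionary keys and the choice to load only the “CD 52” sales rows are assumptions of this sketch, not part of the exercise.

```python
def select(relation, predicate):
    """Selection: keep the tuples for which the predicate holds."""
    return [row for row in relation if predicate(row)]


def project(relation, *attrs):
    """Projection: keep the named attributes and drop duplicate tuples."""
    result = []
    for row in relation:
        item = tuple(row[a] for a in attrs)
        if item not in result:
            result.append(item)
    return result


def natural_join(r, s):
    """Natural join: combine tuples that agree on all shared attributes."""
    joined = []
    for a in r:
        for b in s:
            if all(a[k] == b[k] for k in set(a) & set(b)):
                joined.append({**a, **b})
    return joined


stock = [
    {"code": "MON 15", "min_level": 4, "in_hand": 3},
    {"code": "KEY 04", "min_level": 12, "in_hand": 28},
    {"code": "DIS 20", "min_level": 6, "in_hand": 14},
    {"code": "MB 100", "min_level": 2, "in_hand": 1},
    {"code": "CD 52", "min_level": 5, "in_hand": 7},
    {"code": "MS 10", "min_level": 24, "in_hand": 42},
    {"code": "CDWR", "min_level": 100, "in_hand": 245},
    {"code": "SC 64", "min_level": 3, "in_hand": 2},
]
branches = [
    {"branch": "B1", "branch_name": "Shopping Center Branch"},
    {"branch": "B2", "branch_name": "Kyrenia Str. Branch"},
    {"branch": "B3", "branch_name": "Famagusta Branch"},
    {"branch": "B4", "branch_name": "Harbor Branch"},
]
# Only the four "CD 52" rows of the Sales relation are loaded here.
sales = [
    {"branch": "B3", "code": "CD 52", "qty": 1},
    {"branch": "B2", "code": "CD 52", "qty": 1},
    {"branch": "B4", "code": "CD 52", "qty": 1},
    {"branch": "B1", "code": "CD 52", "qty": 1},
]

# Report (h): items whose stock in hand is below the minimum stock level.
below_min = project(select(stock, lambda t: t["in_hand"] < t["min_level"]), "code")

# Report (g): number of "CD 52" sold in "Harbor Branch".
harbor_cd52 = select(
    natural_join(sales, branches),
    lambda t: t["code"] == "CD 52" and t["branch_name"] == "Harbor Branch")
cd52_in_harbor = sum(t["qty"] for t in harbor_cd52)

print(below_min)       # [('MON 15',), ('MB 100',), ('SC 64',)]
print(cd52_in_harbor)  # 1
```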
STRUCTURED QUERY LANGUAGE (SQL):
SQL is a transform-oriented, non-procedural language designed to
use relations to transform inputs into required outputs. SQL has two major
components:
  • A Data Definition Language (DDL) for defining the database
    structure
  • A Data Manipulation Language (DML) for retrieving and updating
    data
SQL contains only these definitional and manipulative commands. It
does not contain flow control commands. There is no IF ... THEN ... ELSE,
GOTO, WHILE ... DO or other command to provide flow control. Due to
this lack of computational completeness, SQL can be used in two ways:
  • Use SQL interactively by entering the statements at the terminal.
  • Embed SQL statements in a procedural language.
SQL DML Statements
The data manipulation statements in SQL are:
  • SELECT   To query data in the database
  • INSERT   To insert data into a table
  • UPDATE   To update data in a table
  • DELETE   To delete data from a table
The SELECT Statement
The purpose of the SELECT statement is to retrieve and display data
from one or more database tables. It is an extremely powerful command,
capable of performing the equivalent of the relational algebra’s selection,
projection and join in a single statement. The general format of the SELECT
statement is:
     SELECT     [DISTINCT/ALL] {*/[Column-Expression]}
     FROM       Table-name(s)
     [WHERE     Condition]
     [GROUP BY  Column-list] [HAVING Condition]
     [ORDER BY  Column-list]
The clauses have the following meaning:
     SELECT     Specifies which columns are to appear in the output
     FROM       Specifies the table(s) to be used
     WHERE      Filters the rows subject to some conditions
     GROUP BY   Forms groups of rows with the same column value
     HAVING     Filters the groups subject to some conditions
     ORDER BY   Specifies the order of the output
The order of the clauses in the SELECT statement cannot be changed.
The only mandatory clauses are SELECT and FROM; the remainder are
optional.
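To see all six clauses working together, the following sketch runs one query through Python's built-in sqlite3 module. It is an illustration only: the notes do not name a particular DBMS, and the column names use underscores because hyphens are not valid in unquoted SQL identifiers. The rows are the seven sales dated 1/12/03.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sales_date TEXT, branch_number TEXT, "
    "stock_code TEXT, sales_quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("1/12/03", "B2", "MB 100", 1),
        ("1/12/03", "B3", "CD 52", 1),
        ("1/12/03", "B2", "CDWR", 5),
        ("1/12/03", "B1", "CDWR", 10),
        ("1/12/03", "B4", "CDWR", 2),
        ("1/12/03", "B1", "KEY 04", 1),
        ("1/12/03", "B2", "MS 10", 1),
    ],
)

query = """
    SELECT   branch_number, SUM(sales_quantity) AS total  -- output columns
    FROM     sales                                         -- source table
    WHERE    stock_code <> 'MB 100'                        -- filter rows
    GROUP BY branch_number                                 -- form groups
    HAVING   SUM(sales_quantity) > 1                       -- filter groups
    ORDER BY total DESC                                    -- sort the output
"""
totals = conn.execute(query).fetchall()
print(totals)  # [('B1', 11), ('B2', 6), ('B4', 2)]
```

Branch B3 is missing from the output because its only remaining sale has a total of 1, which the HAVING clause filters out after the groups are formed.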
Considering the stock control DB, the SELECT statement can be used
to retrieve data as shown below.
i) Retrieve all columns, all rows in Stock
     SELECT Stock-Code, Stock-Name, Stock-Min-Level, Stock-Price,
            Stock-Quantity
     FROM   Stock ;
OR
     SELECT *
     FROM   Stock ;
The result is
Stock Code   Stock Name    Stock Min Level   Sale Price   Stock in Hand
MON 15       15” Monitor   4                 $150         3
KEY 04       Keyboard      12                $20          28
DIS 20       20 GB Disk    6                 $90          14
MB 100       Main Board    2                 $300         1
CD 52        52×CD Rom     5                 $60          7
MS 10        Mouse         24                $9           42
CDWR         700 MB CD     100               $0.8         245
SC 64        64 MB VGA     3                 $50          2
ii) Retrieve specific columns, all rows in Stock
Example:
List all rows of Stock by Stock-Name, Stock-Price
     SELECT Stock-Name, Stock-Price
     FROM   Stock ;
The result is:
Stock-Name    Stock-Price
15” Monitor   $150
Keyboard      $20
20 GB Disk    $90
Main Board    $300
52×CD Rom     $60
Mouse         $9
700 MB CD     $0.8
64 MB VGA     $50
iii) Retrieve specific columns, unique rows in Sales
Example:
List all the different Branch-Number values in Sales
     SELECT DISTINCT Branch-Number
     FROM   Sales ;
The result is:
Branch-Number
B2
B3
B1
B4
iv) Retrieve all rows with calculated fields
Example:
List all the stocks with their current value in hand; list Stock-Code,
Stock-Value
     SELECT Stock-Code, Stock-Price * Stock-Quantity
     FROM   Stock ;
The result is:
Stock-Code   Col 2
MON 15       $450
KEY 04       $560
:            :
:            :
SC 64        $100
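The DISTINCT and calculated-field queries can be tried with Python's sqlite3 module, as in the sketch below. The table and column names with underscores are assumptions of the sketch, since hyphens are not valid in unquoted SQL identifiers, and only a few rows of each table are loaded. The AS keyword gives the calculated column a readable name instead of a generated heading.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock (stock_code TEXT, stock_price REAL, stock_quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO stock VALUES (?, ?, ?)",
    [("MON 15", 150, 3), ("KEY 04", 20, 28), ("SC 64", 50, 2)],
)
conn.execute("CREATE TABLE sales (branch_number TEXT, stock_code TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("B2", "MB 100"), ("B3", "CD 52"), ("B2", "CDWR"),
     ("B1", "CDWR"), ("B4", "CDWR")],
)

# Calculated field: value in hand = price * quantity.
stock_values = conn.execute(
    "SELECT stock_code, stock_price * stock_quantity AS stock_value FROM stock"
).fetchall()

# DISTINCT removes the repeated B2 entry.
branch_numbers = conn.execute(
    "SELECT DISTINCT branch_number FROM sales"
).fetchall()

print(stock_values)
print(branch_numbers)
```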
The row selection:
Very often a search condition is imposed on the rows to restrict
the selection. The WHERE clause is used to set up a search
condition. There are 5 basic search conditions:
  • Comparison       Compares the value of one expression to the value
                     of another expression
  • Range            Tests whether the value of an expression falls
                     within a specified range of values
  • Set Membership   Tests whether the value of an expression equals one
                     of a set of values
  • Pattern Match    Tests whether a string matches a specified
                     pattern
  • Null             Tests whether a column has a Null (unknown)
                     value
v) Comparison search condition
Example:
List all stocks whose price is greater than $90 by Stock-Code,
Stock-Name, Stock-Price
     SELECT Stock-Code, Stock-Name, Stock-Price
     FROM   Stock
     WHERE  Stock-Price > 90 ;
The result is:
Stock-Code   Stock-Name    Stock-Price
MON 15       15” Monitor   $150
MB 100       Main Board    $300
In conditional statements the following comparison operators are used:
     =     Equals
     <     Less than
     >     Greater than
     <=    Less than or equal to
     >=    Greater than or equal to
     <>    Not equal to
More complex predicates can be built using AND, OR and NOT.
The rules for evaluating such conditional expressions are as follows:
  • An expression is evaluated left to right
  • Expressions in brackets are evaluated first
  • NOTs are evaluated before ANDs and ORs
  • ANDs are evaluated before ORs
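The evaluation rules can be checked with a small experiment. The sketch below uses Python's sqlite3 module and the Stock data; the helper name codes and the underscore column names are assumptions of the sketch. The WHERE conditions are fixed strings written into the program, which is why they are concatenated directly into the statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock (stock_code TEXT, stock_price REAL, stock_quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO stock VALUES (?, ?, ?)",
    [
        ("MON 15", 150, 3), ("KEY 04", 20, 28), ("DIS 20", 90, 14),
        ("MB 100", 300, 1), ("CD 52", 60, 7), ("MS 10", 9, 42),
        ("CDWR", 0.8, 245), ("SC 64", 50, 2),
    ],
)


def codes(condition):
    """Return the set of stock codes selected by a fixed WHERE condition."""
    sql = "SELECT stock_code FROM stock WHERE " + condition
    return {row[0] for row in conn.execute(sql)}


# AND binds tighter than OR: price > 100 OR (price < 10 AND quantity > 100)
and_first = codes("stock_price > 100 OR stock_price < 10 AND stock_quantity > 100")
# Brackets are evaluated first and change the meaning.
bracketed = codes("(stock_price > 100 OR stock_price < 10) AND stock_quantity > 100")
# NOT binds tighter than AND: (NOT price > 100) AND quantity < 5
not_first = codes("NOT stock_price > 100 AND stock_quantity < 5")

print(sorted(and_first))  # ['CDWR', 'MB 100', 'MON 15']
print(sorted(bracketed))  # ['CDWR']
print(sorted(not_first))  # ['SC 64']
```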
vi) Compound comparison search condition
Example:
List all sales of 64 MB VGA or Keyboard by Sales-Date, Branch-Number,
Stock-Code
     SELECT Sales-Date, Branch-Number, Stock-Code
     FROM   Sales
     WHERE  Stock-Code = 'SC 64' OR
            Stock-Code = 'KEY 04' ;
The result is:
Sales-Date   Branch-Number   Stock-Code
1/12/03      B1              KEY 04
2/12/03      B1              SC 64
3/12/03      B3              KEY 04
3/12/03      B2              KEY 04
(no date)    B2              KEY 04
vii) Range search condition (BETWEEN / NOT BETWEEN)
Example:
List all stocks whose Stock-Quantity is between 5 and 50 by Stock-Code,
Stock-Quantity.
     SELECT Stock-Code, Stock-Quantity
     FROM   Stock
     WHERE  Stock-Quantity BETWEEN 5 AND 50 ;
The result is:
Stock-Code   Stock-Quantity
KEY 04       28
DIS 20       14
CD 52        7
MS 10        42
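A short experiment with Python's sqlite3 module shows that BETWEEN includes both end points and is equivalent to a pair of comparisons joined with AND. The helper name codes and the underscore column names are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (stock_code TEXT, stock_quantity INTEGER)")
conn.executemany(
    "INSERT INTO stock VALUES (?, ?)",
    [
        ("MON 15", 3), ("KEY 04", 28), ("DIS 20", 14), ("MB 100", 1),
        ("CD 52", 7), ("MS 10", 42), ("CDWR", 245), ("SC 64", 2),
    ],
)


def codes(condition):
    """Return the set of stock codes selected by a fixed WHERE condition."""
    sql = "SELECT stock_code FROM stock WHERE " + condition
    return {row[0] for row in conn.execute(sql)}


between = codes("stock_quantity BETWEEN 5 AND 50")
explicit = codes("stock_quantity >= 5 AND stock_quantity <= 50")
outside = codes("stock_quantity NOT BETWEEN 5 AND 50")
# 7 and 42 are themselves in the table, so this checks the end points.
end_points = codes("stock_quantity BETWEEN 7 AND 42")

print(sorted(between))  # ['CD 52', 'DIS 20', 'KEY 04', 'MS 10']
print(sorted(outside))  # ['CDWR', 'MB 100', 'MON 15', 'SC 64']
```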
viii) Set membership search condition (IN / NOT IN)
Example:
List the Branch-Name of the branches in Nicosia and Kyrenia
     SELECT Branch-Name, Branch-Town
     FROM   Branch
     WHERE  Branch-Town IN ('Nicosia', 'Kyrenia') ;
The result is:
Branch-Name              Branch-Town
Shopping Center Branch   Nicosia
Kyrenia Str. Branch      Nicosia
Harbor Branch            Kyrenia
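The IN test is a shorthand for a chain of equality comparisons joined with OR, which the following sqlite3 sketch in Python confirms. The underscore column names are an assumption of the sketch, and the town names are passed as query parameters rather than written into the statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE branch (branch_code TEXT, branch_name TEXT, branch_town TEXT)"
)
conn.executemany(
    "INSERT INTO branch VALUES (?, ?, ?)",
    [
        ("B1", "Shopping Center Branch", "Nicosia"),
        ("B2", "Kyrenia Str. Branch", "Nicosia"),
        ("B3", "Famagusta Branch", "Famagusta"),
        ("B4", "Harbor Branch", "Kyrenia"),
    ],
)
towns = ("Nicosia", "Kyrenia")

in_rows = conn.execute(
    "SELECT branch_name FROM branch WHERE branch_town IN (?, ?)", towns
).fetchall()
or_rows = conn.execute(
    "SELECT branch_name FROM branch WHERE branch_town = ? OR branch_town = ?",
    towns,
).fetchall()
not_in_rows = conn.execute(
    "SELECT branch_name FROM branch WHERE branch_town NOT IN (?, ?)", towns
).fetchall()

print(in_rows)
print(not_in_rows)  # [('Famagusta Branch',)]
```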
ix) Pattern match search condition (LIKE / NOT LIKE). In a pattern
match search condition, the pattern may match any portion of the
string, starting at any character position. The % sign is a wildcard
that stands for any sequence of zero or more characters, and the
underscore ( _ ) stands for any single character.
For example:
     Address LIKE 'H%'          means the first character must be “H” but
                                the rest of the string can be anything
     Address LIKE 'H____'       (“H” followed by four underscores) means
                                that there must be exactly 5 characters in
                                the string, the first of which must be “H”
     Address LIKE '%e'          means any sequence of characters of length
                                at least 1 with the last character an “e”
     Address LIKE '%Nicosia%'   means a sequence of characters of any
                                length containing “Nicosia”
     Address NOT LIKE 'H%'      means the first character cannot be “H”
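The wildcard rules can be tried out on the stock names with Python's sqlite3 module. This is a sketch under two stated assumptions: the helper name names and the underscore column names are illustrative, and SQLite's LIKE ignores case for ASCII letters by default, which other database systems may not do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (stock_code TEXT, stock_name TEXT)")
conn.executemany(
    "INSERT INTO stock VALUES (?, ?)",
    [
        ("MON 15", '15" Monitor'), ("KEY 04", "Keyboard"),
        ("DIS 20", "20 GB Disk"), ("MB 100", "Main Board"),
        ("CD 52", "52×CD Rom"), ("MS 10", "Mouse"),
        ("CDWR", "700 MB CD"), ("SC 64", "64 MB VGA"),
    ],
)


def names(pattern, negate=False):
    """Return the stock names that match (or do not match) a LIKE pattern."""
    operator = "NOT LIKE" if negate else "LIKE"
    sql = f"SELECT stock_name FROM stock WHERE stock_name {operator} ?"
    return {row[0] for row in conn.execute(sql, (pattern,))}


starts_with_m = names("M%")        # "M" followed by anything
five_letters = names("M____")      # "M" plus exactly four more characters
contains_mb = names("%MB%")        # "MB" anywhere in the name
not_m = names("M%", negate=True)   # every name that does not start with "M"

print(sorted(starts_with_m))  # ['Main Board', 'Mouse']
print(sorted(five_letters))   # ['Mouse']
print(sorted(contains_mb))    # ['64 MB VGA', '700 MB CD']
```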
Example:
List all the stocks whose Stock-Name starts with “M”, by Stock-Code,
Stock-Price
     SELECT Stock-Code, Stock-Price
     FROM   Stock
     WHERE  Stock-Name LIKE 'M%' ;
The result is:
Stock-Code   Stock-Price
MB 100       $300
MS 10        $9
x) Null search condition (IS NULL / IS NOT NULL)
Suppose that we have an entry in Sales that does not have a date.
The empty date field holds Null, not zero (0) or a blank string (" "),
so we cannot test it against these values.
Example:
List the entries in Sales by Stock-Code, Branch-Number where
there is no date entry.
     SELECT Stock-Code, Branch-Number
     FROM   Sales
     WHERE  Sales-Date IS NULL ;
The result is:
Stock-Code   Branch-Number
KEY 04       B2
xi) Sorting the results (ORDER BY)
The sorting may be in ASC (ascending) or DESC (descending) order.
The sorting may be applied to a single column or to multiple columns.
Example:
List the branches in ASC order of Branch-Town, by Branch-Code,
Branch-Name, Branch-Town.
     SELECT   Branch-Code, Branch-Name, Branch-Town
     FROM     Branch
     ORDER BY Branch-Town ASC ;
The result is:
Branch-Code   Branch-Name              Town
B3            Famagusta Branch         Famagusta
B4            Harbor Branch            Kyrenia
B1            Shopping Center Branch   Nicosia
B2            Kyrenia Str. Branch      Nicosia
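The Null test and multi-column sorting can both be tried with Python's sqlite3 module. In this sketch, which uses underscore column names and only a few rows as stated assumptions, Python's None is stored as SQL NULL. Comparing with = NULL selects nothing, which is why IS NULL is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sales_date TEXT, branch_number TEXT, stock_code TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("3/12/03", "B2", "MS 10"),
        ("3/12/03", "B1", "CD 52"),
        (None, "B2", "KEY 04"),  # None becomes NULL in the table
    ],
)
equals_null = conn.execute(
    "SELECT stock_code FROM sales WHERE sales_date = NULL"
).fetchall()
is_null = conn.execute(
    "SELECT stock_code, branch_number FROM sales WHERE sales_date IS NULL"
).fetchall()

conn.execute("CREATE TABLE branch (branch_code TEXT, branch_town TEXT)")
conn.executemany(
    "INSERT INTO branch VALUES (?, ?)",
    [("B1", "Nicosia"), ("B2", "Nicosia"), ("B3", "Famagusta"), ("B4", "Kyrenia")],
)
# Sort by town ascending, then by branch code descending within a town.
ordered = conn.execute(
    "SELECT branch_code FROM branch ORDER BY branch_town ASC, branch_code DESC"
).fetchall()

print(equals_null)  # []
print(is_null)      # [('KEY 04', 'B2')]
print(ordered)      # [('B3',), ('B4',), ('B2',), ('B1',)]
```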
SQL Aggregate Functions:
     COUNT:   Returns the number of values in a specified column
     SUM:     Returns the sum of the values in a specified column
     AVG:     Returns the average of the values in a specified column
     MIN:     Returns the smallest value in a specified column
     MAX:     Returns the largest value in a specified column
i) COUNT:
Example:
Count the number of entries in the Stock relation.
     SELECT COUNT (Stock-Code) AS count
     FROM   Stock ;
The result is count = 8
ii) SUM:
Example:
Sum the Stock-Quantity of those items whose price is less than $50.
     SELECT SUM (Stock-Quantity) AS sum
     FROM   Stock
     WHERE  Stock-Price < 50 ;
The result is sum = 28 + 42 + 245 = 315
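iii) MAX:
Example:
Find the stock item with the maximum sale price in Stock. Standard
SQL does not allow an ordinary column such as Stock-Name next to an
aggregate function unless the column appears in a GROUP BY clause, so
the maximum price is computed in a subquery.
     SELECT Stock-Name, Stock-Price
     FROM   Stock
     WHERE  Stock-Price = (SELECT MAX (Stock-Price) FROM Stock) ;
The result is
Stock-Name   Stock-Price
Main Board   $300
iv) MIN:
Example:
Find the stock item with the minimum sale price in Stock.
     SELECT Stock-Name, Stock-Price
     FROM   Stock
     WHERE  Stock-Price = (SELECT MIN (Stock-Price) FROM Stock) ;
The result is:
Stock-Name   Stock-Price
700 MB CD    $0.8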
v) AVG:
Example:
Find the average quantity per sale on 1/12/2003.
     SELECT AVG (Sales-Quantity) AS avg
     FROM   Sales
     WHERE  Sales-Date = '1/12/2003' ;
The result is avg = 3 (21/7)
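The five aggregate functions can be exercised on the example data with Python's sqlite3 module. This is a sketch: the underscore column names are an assumption, the sales table holds only quantities and dates, and the dearest and cheapest items are located with a subquery because standard SQL rejects a plain column next to an aggregate when there is no GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock (stock_code TEXT, stock_name TEXT, "
    "stock_price REAL, stock_quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO stock VALUES (?, ?, ?, ?)",
    [
        ("MON 15", '15" Monitor', 150, 3), ("KEY 04", "Keyboard", 20, 28),
        ("DIS 20", "20 GB Disk", 90, 14), ("MB 100", "Main Board", 300, 1),
        ("CD 52", "52×CD Rom", 60, 7), ("MS 10", "Mouse", 9, 42),
        ("CDWR", "700 MB CD", 0.8, 245), ("SC 64", "64 MB VGA", 50, 2),
    ],
)
conn.execute("CREATE TABLE sales (sales_date TEXT, sales_quantity INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("1/12/03", q) for q in (1, 1, 5, 10, 2, 1, 1)] + [("2/12/03", 12)],
)

count = conn.execute("SELECT COUNT(stock_code) FROM stock").fetchone()[0]
total = conn.execute(
    "SELECT SUM(stock_quantity) FROM stock WHERE stock_price < 50"
).fetchone()[0]
dearest = conn.execute(
    "SELECT stock_name, stock_price FROM stock "
    "WHERE stock_price = (SELECT MAX(stock_price) FROM stock)"
).fetchall()
cheapest = conn.execute(
    "SELECT stock_name, stock_price FROM stock "
    "WHERE stock_price = (SELECT MIN(stock_price) FROM stock)"
).fetchall()
average = conn.execute(
    "SELECT AVG(sales_quantity) FROM sales WHERE sales_date = ?", ("1/12/03",)
).fetchone()[0]

print(count, total, dearest, cheapest, average)
```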
The INSERT Statement:
The INSERT statement is used to add new entries to an existing DB table.
The format of the INSERT statement is as follows:
     INSERT INTO  <Table-Name> [(column-list)]
     VALUES       (data-value-list) ;
Example:
Insert a new entry into the Stock table as below:
Stock Code   Stock Name   Stock Min Level   Sale Price   Stock in Hand
DIS 35       3.5” FDD     3                 $30          5
     INSERT INTO  Stock (Stock-Code, Stock-Name, Stock-Min-Level,
                  Stock-Price, Stock-Quantity)
     VALUES       ('DIS 35', '3.5" FDD', 3, 30, 5) ;
If values are supplied for all columns, in the order in which the columns
appear in the table, the column list may be omitted:
     INSERT INTO  Stock
     VALUES       ('DIS 35', '3.5" FDD', 3, 30, 5) ;
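The three ways of supplying values can be compared with Python's sqlite3 module. In this sketch the underscore column names are an assumption, and the 'USB 01' and 'PAD 01' items are hypothetical rows invented for the illustration. When the column list names only some columns, the remaining columns of the new row are left as Null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock (stock_code TEXT, stock_name TEXT, "
    "stock_min_level INTEGER, stock_price REAL, stock_quantity INTEGER)"
)

# 1) With a full column list.
conn.execute(
    "INSERT INTO stock (stock_code, stock_name, stock_min_level, "
    "stock_price, stock_quantity) VALUES (?, ?, ?, ?, ?)",
    ("DIS 35", '3.5" FDD', 3, 30, 5),
)
# 2) Without a column list: values must follow the table's column order.
conn.execute(
    "INSERT INTO stock VALUES (?, ?, ?, ?, ?)",
    ("USB 01", "USB Cable", 10, 4, 50),  # hypothetical item
)
# 3) With a partial column list: unlisted columns become NULL.
conn.execute(
    "INSERT INTO stock (stock_code, stock_name) VALUES (?, ?)",
    ("PAD 01", "Mouse Pad"),  # hypothetical item
)

rows = conn.execute("SELECT * FROM stock ORDER BY stock_code").fetchall()
for row in rows:
    print(row)
```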
The UPDATE Statement:
An existing DB table entry can be modified using the UPDATE
statement. UPDATE is normally used to change non-key columns; the
primary key that identifies the row is left unchanged. The format is:
     UPDATE   <Table-name>
     SET      column-name1 = data-value1, column-name2 = data-value2,
              column-name3 = data-value3, ...
     [WHERE   search-condition] ;
Example:
If the firm decides to give all the items a price increase of 10%, then
the Stock table should be updated as below.
     UPDATE   Stock
     SET      Stock-Price = Stock-Price * 1.1 ;
As a result the Stock table will look as below
Stock Code   Stock Name    Stock Min Level   Sale Price   Stock in Hand
MON 15       15” Monitor   4                 $165         3
:            :             :                 :            :
:            :             :                 :            :
SC 64        64 MB VGA     3                 $55          2
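The effect of the WHERE clause on UPDATE can be seen in the following sqlite3 sketch in Python. The underscore column names are an assumption, only three stock rows are loaded, the new price of 25 for 'KEY 04' is an invented value, and ROUND is used merely to keep the floating-point prices tidy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (stock_code TEXT, stock_price REAL)")
conn.executemany(
    "INSERT INTO stock VALUES (?, ?)",
    [("MON 15", 150), ("KEY 04", 20), ("SC 64", 50)],
)

# No WHERE clause: every row receives the 10% increase.
conn.execute("UPDATE stock SET stock_price = ROUND(stock_price * 1.1, 2)")
after_increase = dict(conn.execute("SELECT stock_code, stock_price FROM stock"))

# With a WHERE clause: only the matching row is changed.
conn.execute(
    "UPDATE stock SET stock_price = ? WHERE stock_code = ?", (25, "KEY 04")
)
after_single = dict(conn.execute("SELECT stock_code, stock_price FROM stock"))

print(after_increase)
print(after_single)
```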
The DELETE Statement:
The DELETE statement allows rows to be deleted from the specified
table. The format is:
     DELETE FROM  <Table-name>
     [WHERE       search-condition] ;
Example:
To delete all rows in Stock:
     DELETE FROM  Stock ;
To delete selected rows from the table:
     DELETE FROM  Stock
     WHERE        Stock-Price < 20 ;
'MS 10' and 'CDWR' will be deleted from the Stock table.
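The difference between the two forms of DELETE can be checked with Python's sqlite3 module. In this sketch the underscore column names are an assumption; the cursor's rowcount attribute reports how many rows a statement removed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (stock_code TEXT, stock_price REAL)")
conn.executemany(
    "INSERT INTO stock VALUES (?, ?)",
    [
        ("MON 15", 150), ("KEY 04", 20), ("DIS 20", 90), ("MB 100", 300),
        ("CD 52", 60), ("MS 10", 9), ("CDWR", 0.8), ("SC 64", 50),
    ],
)

# With a WHERE clause only the cheap items are removed.
cursor = conn.execute("DELETE FROM stock WHERE stock_price < 20")
deleted = cursor.rowcount
remaining = {row[0] for row in conn.execute("SELECT stock_code FROM stock")}

# Without a WHERE clause every row is removed; the empty table still exists.
conn.execute("DELETE FROM stock")
left = conn.execute("SELECT COUNT(*) FROM stock").fetchone()[0]

print(deleted)            # 2
print(sorted(remaining))
print(left)               # 0
```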