Unit II

CSMI14: Database Management Systems
Dr. R. Bala Krishnan
Asst. Prof.
Dept. of CSE
NIT, Trichy – 620 015
Ph: 999 470 4853
E-mail: balakrishnan@nitt.edu
Books
• Text Books (TB)
- Abraham Silberschatz, Henry F. Korth, S. Sudarshan, “Database System Concepts”, Fifth Edition, Tata McGraw Hill, 2006.
- C. J. Date, A. Kannan, S. Swamynathan, “An Introduction to Database Systems”, Eighth Edition, Pearson Education, 2006.
• Reference Books (RB)
- Ramez Elmasri, Shamkant B. Navathe, “Fundamentals of Database Systems”, Fourth Edition, Pearson/Addison Wesley, 2007.
- Raghu Ramakrishnan, “Database Management Systems”, Third Edition, McGraw Hill, 2003.
- S. K. Singh, “Database Systems Concepts, Design and Applications”, First Edition, Pearson Education, 2006.
Books & Chapters
Unit  Book  Chapter
1     TB1   1
1_    TB1   6
2     RB2   3, 4
2_    TB1   3
3     RB2   19
4     RB2   16, 17, 18
5     TB1   11, 12
• https://www.databasestar.com/sql-practice/
• http://sqlfiddle.com/#!9/7379d5/1
Unit II
RDBMS
• Relational model is very simple and elegant
• A database is a collection of one or more relations -> Each relation is a table with rows and columns
  - Simple tabular representation enables even novice users to understand the contents of a database
• Permits the use of simple, high-level languages to query the data
• Advantages
  - Simple data representation
  - Even complex queries can be expressed easily
• SQL (Structured Query Language) -> Standard language for creating, manipulating, and querying data in a relational DBMS
RDBMS
• Main construct for representing data in the relational model is a relation

Student
ID   Name      Department  Year
123  Bala      CSE         1
456  Krishnan  EEE         2
789  Karthik   CSE         1

• Relation consists of a relation schema and a relation instance
• Relation schema describes the column heads for the table, and the relation instance is the table itself
• Schema specifies the relation's name, the name of each field (or column, or attribute), and the domain of each field
• Domain is referred to in a relation schema by the domain name and has a set of associated values
• Eg: Set of values associated with the domain string is the set of all character strings
RDBMS
• An instance of a relation is a set of tuples/records, in which each tuple has the same number of fields as the relation schema
• A relation instance can be thought of as a table in which each tuple is a row, and all rows have the same number of fields
[Figure: an instance of the Students relation]
• Order in which the rows are listed is not important
• If the fields are named, as in our schema definitions and figures depicting relation instances, the order of fields does not matter either
• An alternative convention is to list fields in a specific order and refer to a field by its position. Eg: sid -> 1; login -> 3; here the order is important
• In SQL, the named fields convention is used in statements that retrieve tuples, and the ordered fields convention is commonly used when inserting tuples
RDBMS
• Domain Constraint
  - A relation schema specifies the domain of each field or column in the relation instance
  - Values that appear in a column must be drawn from the domain associated with that column
• Domain constraints are so fundamental that "relation instance" is taken to mean a relation instance that satisfies the domain constraints in the relation schema
• Degree, also called arity, of a relation is the number of fields
• Cardinality of a relation instance is the number of tuples in it
• A relational database is a collection of relations with distinct relation names
  - University database -> Students, Faculty, Courses, Rooms, Enrolled, Teaches, and Meets_In
• Relational database schema is the collection of schemas for the relations in the database
• An instance of a relational database is a collection of relation instances, one per relation schema in the database schema
Creating and Modifying Relations
• SQL language standard uses the word table to denote relation
• Subset of SQL that supports the creation, deletion, and modification of tables is called the Data Definition Language (DDL)

Students (no tuples)
sid    name   login     age  gpa

Students
sid    name   login     age  gpa
53688  Smith  smith@ee  18   3.2

Students (no tuples)
sid    name   login     age  gpa

Students
sid    name   login     age  gpa
53688  Smith  smith@ee  19   2.2
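The Students instances on these slides are consistent with a create / insert / update / delete sequence. The sketch below is one plausible reconstruction, not the statements from the original slides: the column types, the use of Python's sqlite3 module, and the exact UPDATE (age + 1, gpa - 1, which maps 18 / 3.2 to 19 / 2.2) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: create the (initially empty) Students table. Column types are assumed.
conn.execute("""
    CREATE TABLE Students (
        sid   CHAR(20),
        name  CHAR(30),
        login CHAR(20),
        age   INTEGER,
        gpa   REAL
    )
""")

# Insert one tuple, naming the fields explicitly.
conn.execute(
    "INSERT INTO Students (sid, name, login, age, gpa) "
    "VALUES ('53688', 'Smith', 'smith@ee', 18, 3.2)"
)
after_insert = conn.execute("SELECT * FROM Students").fetchall()

# Modify the tuple: one year older, gpa lowered by one point (assumed update).
conn.execute("UPDATE Students SET age = age + 1, gpa = gpa - 1 WHERE sid = '53688'")
after_update = conn.execute("SELECT sid, age, round(gpa, 1) FROM Students").fetchall()

# Delete the tuple again; the table definition remains, with no rows.
conn.execute("DELETE FROM Students WHERE name = 'Smith'")
after_delete = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]

print(after_insert, after_update, after_delete)
```

CREATE TABLE belongs to the DDL, while INSERT, UPDATE, and DELETE change the instance and leave the schema untouched.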
Integrity Constraints over Relations
• A database is only as good as the information stored in it
• DBMS must help prevent the entry of incorrect information
• An integrity constraint (IC) is a condition specified on a database schema that restricts the data that can be stored in an instance of the database
• If a database instance satisfies all the integrity constraints specified on the database schema, it is a legal instance
• DBMS enforces integrity constraints -> Permits only legal instances to be stored in the database
• Integrity constraints are specified and enforced at different times
  - When the DBA or end user defines a database schema, he or she specifies the constraints that must hold on any instance of this database
  - When a database application is run, the DBMS checks for violations and disallows changes to the data that violate the specified ICs
• Next: the integrity constraints, other than domain constraints, that a DBA or user can specify in the relational model
Integrity Constraints over Relations
• Domain Constraint
• Primary Key Constraint
• Foreign Key Constraint
Key Constraints
• Consider the Students relation and the constraint that no two students have the same student id -> This IC is an example of a key constraint
• A set of fields that uniquely identifies a tuple according to a key constraint is called a Candidate Key / Key for the relation

Student
ID   Name      Department  Year
123  Bala      CSE         II
456  Bala      ECE         II
789  Krishnan  CSE         I

• Sets of fields to examine: {ID}, {Name, Department}, {ID, Name}
  - {ID, Name} identifies tuples uniquely but is not a candidate key, because its subset {ID} is already a unique identifier (it is only a superkey)
• Candidate Key Defn
  - Two distinct tuples in a legal instance (an instance that satisfies all ICs, including the key constraint) cannot have identical values in all the fields of a key
  - No proper subset of the set of fields in a key is a unique identifier for a tuple
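The two parts of the candidate key definition (uniqueness and minimality) can be checked mechanically against the Student instance above. This is a minimal sketch; the helper names `is_unique` and `is_minimal_unique` are hypothetical. A single instance can only rule a key out: whether a set of fields really is a key is a statement about all legal instances, which the designer has to decide.

```python
from itertools import combinations

FIELDS = ("ID", "Name", "Department", "Year")
STUDENT = [
    (123, "Bala", "CSE", "II"),
    (456, "Bala", "ECE", "II"),
    (789, "Krishnan", "CSE", "I"),
]

def is_unique(fields, rows):
    """True if no two rows agree on all of the given fields."""
    idx = [FIELDS.index(f) for f in fields]
    seen = {tuple(row[i] for i in idx) for row in rows}
    return len(seen) == len(rows)

def is_minimal_unique(fields, rows):
    """True if the fields are unique and no proper subset is unique."""
    if not is_unique(fields, rows):
        return False
    return not any(
        is_unique(subset, rows)
        for size in range(1, len(fields))
        for subset in combinations(fields, size)
    )

for candidate in [("ID",), ("Name", "Department"), ("ID", "Name")]:
    print(candidate, is_unique(candidate, STUDENT), is_minimal_unique(candidate, STUDENT))
```

On this instance {ID} and {Name, Department} pass both checks, while {ID, Name} is unique but not minimal.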
Key Constraints
• When specifying a key constraint, the DBA or user must be sure that this constraint will not prevent them from storing a 'correct' set of tuples
• Every relation is guaranteed to have a key
• Since a relation is a set of tuples, the set of all fields is always a superkey
• If other constraints hold, some subset of the fields may form a key, but if not, the set of all fields is a key
• Out of all the available candidate keys, a database designer can identify a primary key
• DBMS may create an index with the primary key fields as the search key, to make the retrieval of a tuple given its primary key value efficient
• If the constraint is violated, the constraint name is returned and can be used to identify the error

Students
sid  name  login  age  gpa
123  Bala  b@nit  23   8.6
456  Bala  b@nit  25   8.9

• Sets of fields to examine: {sid}, {name, age}, {name, gpa}, {login, age}, {login, gpa}, {age, gpa}
Miscellaneous
Index (created by the DBMS software for quick access)
Primary Key  Location
123          101
456          120

customer
customer_id  customer_name  customer_street  customer_city
123          Bala           NITT             Trichy
456          Krishnan       NITT             Trichy

[Figure: storage locations 101 to 108 holding the stored customer data]
Foreign Key Constraints
• Information stored in a relation is linked to the information stored in another relation
• If one of the relations is modified, the other must be checked, and perhaps modified, to keep the data consistent -> foreign key constraint

Students
sid  name  login  age  gpa
123  Bala  b@nit  23   8.6
456  Bala  b@nit  25   8.9

Enrolled
studid  cid   grade
123     CSE1  B
456     ECE1  B

• The studid field of Enrolled is called a foreign key and refers to Students
• Foreign key in the referencing relation must match the primary key of the referenced relation -> Same number of columns and compatible data types (the column names may differ)

1. Students Relation
• Insert -> No Problem
• Delete
  - If not present in Enrolled relation, then allow deletion
  - Else, either don’t allow or delete from both the relations
2. Enrolled Relation
• Insert -> Allow if present in Students relation
• Delete -> Allow
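The insert and delete rules listed above can be observed directly. The sketch below uses Python's sqlite3 module as an assumed test bed; SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`. The column types and the helper `attempt` are hypothetical, and with no ON DELETE clause the default behaviour is to reject the delete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.executescript("""
    CREATE TABLE Students (
        sid INTEGER PRIMARY KEY, name TEXT, login TEXT, age INTEGER, gpa REAL
    );
    CREATE TABLE Enrolled (
        studid INTEGER, cid TEXT, grade TEXT,
        PRIMARY KEY (studid, cid),
        FOREIGN KEY (studid) REFERENCES Students (sid)
    );
    INSERT INTO Students VALUES (123, 'Bala', 'b@nit', 23, 8.6),
                                (456, 'Bala', 'b@nit', 25, 8.9);
    INSERT INTO Enrolled VALUES (123, 'CSE1', 'B');
""")

def attempt(sql):
    """Run one statement and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

results = {
    # Enrolled insert referring to a student who does not exist
    "enroll_unknown": attempt("INSERT INTO Enrolled VALUES (999, 'ECE1', 'B')"),
    # Students delete while an Enrolled tuple still refers to it
    "delete_referenced": attempt("DELETE FROM Students WHERE sid = 123"),
    # Students delete with no referring Enrolled tuple
    "delete_unreferenced": attempt("DELETE FROM Students WHERE sid = 456"),
    # Enrolled delete is always allowed
    "delete_enrolled": attempt("DELETE FROM Enrolled WHERE studid = 123"),
}
print(results)
```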
Foreign Key Constraints
• A foreign key could refer to the same relation
• Declare the Partner column to be a foreign key referring to Students
• Every student could then have a partner, and the partner field contains the partner's sid

Students (sid -> Primary Key; Partner -> Foreign Key)
sid  name      login       age  gpa  Partner
123  Bala      b@nit       23   8.6  456
456  Bala      b@nit       25   8.9  123
789  Krishnan  k@nitt.edu  24   8.7  NULL
458  Selva     s@nitt.edu  25   7.4  NULL

• Appearance of null in a foreign key field does not violate the foreign key constraint; NULL -> Unknown or Not Applicable
• Null values are not allowed to appear in a primary key field (because the primary key fields are used to identify a tuple uniquely)
Foreign Key Constraints
• Foreign key constraint states that every studid value in Enrolled must also appear in Students
  - studid in Enrolled is a foreign key referencing Students
• Every studid value in Enrolled must appear as the value in the primary key field, sid, of Students
• Incidentally, the primary key constraint for Enrolled states that a student has exactly one grade for each course he or she is enrolled in
• If we want to record more than one grade per student per course, we should change the primary key constraint
General Constraints
• Domain, primary key, and foreign key constraints are considered to be a fundamental part of the relational data model
• Sometimes more general constraints are needed. Eg: Require that student ages be within a certain range of values
• Given such an IC specification, the DBMS rejects inserts and updates that violate the constraint
  - Very useful in preventing data entry errors
• Eg: Require that every student whose age is greater than 18 must have a gpa greater than 3
[Figure: a legal instance and an illegal instance under this constraint]
• Table constraint -> Associated with a single table and checked whenever that table is modified
• Assertion Constraint -> Involves several tables and is checked whenever any of these tables is modified
Miscellaneous
CREATE TABLE sailors (sid int, sname varchar(20), rating int, primary key(sid),
CHECK(rating >= 1 AND rating <= 10),   -- Table Constraint
CHECK((select count(s.sid) from sailors s) + (select count(b.bid) from boats b) < 100) );   -- Assertion Constraint (involves two tables)

sailors
sid  sname     rating
123  bala      8
456  krishnan  10

Boats
bid  bname
789  Black Pearl
879  Diamond

• Syntax -> CREATE ASSERTION [ assertion_name ] CHECK ( [ condition ] );
https://www.geeksforgeeks.org/difference-between-assertions-and-triggers-in-dbms/
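A table constraint such as the rating range can be tried out as follows. This is a minimal sketch using Python's sqlite3 module as an assumed test bed. Only the single-table CHECK is shown: SQLite does not allow subqueries inside CHECK and has no CREATE ASSERTION, so the two-table condition is left out here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sailors (
        sid    INTEGER,
        sname  VARCHAR(20),
        rating INTEGER,
        PRIMARY KEY (sid),
        CHECK (rating >= 1 AND rating <= 10)   -- table constraint
    )
""")

# A rating inside the range is accepted.
conn.execute("INSERT INTO sailors VALUES (123, 'bala', 8)")

# A rating outside the range violates the CHECK and is rejected.
try:
    conn.execute("INSERT INTO sailors VALUES (456, 'krishnan', 11)")
    out_of_range_accepted = True
except sqlite3.IntegrityError:
    out_of_range_accepted = False

stored = conn.execute("SELECT sid FROM sailors").fetchall()
print(out_of_range_accepted, stored)
```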
Enforcing Integrity Constraints
• ICs are specified when a relation is created and enforced when a relation is modified
• Impact of DOMAIN, PRIMARY KEY, and UNIQUE constraints is straightforward
  - If an insert, delete, or update command causes a violation, it is rejected
• Every potential IC violation is generally checked at the end of each SQL statement execution, although it can be deferred until the end of the transaction executing the statement
• Deletion does not cause a violation of domain, primary key or unique constraints
• Insertion and Update can cause violations
Enforcing Integrity Constraints
• Impact of foreign key constraints is more complex
  - SQL sometimes tries to rectify a foreign key constraint violation instead of simply rejecting the change
1. Students Relation
• Insert -> No Problem
• Delete
  - If not present in Enrolled relation, then allow deletion
  - Else, either don’t allow or delete from both the relations
• Update (sid) -> Allow if the value to be updated is not present in Enrolled relation
2. Enrolled Relation
• Insert -> Allow if present in Students relation
• Delete -> Allow
• Update (sid) -> Allow if the updated value is present in Students relation
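Enforcing Integrity Constraints
• SQL provides several alternative ways to handle foreign key violations
• We must consider three basic questions:
  - What should be done if an Enrolled row is inserted with a studid value that does not appear in any row of Students?
  - What should be done if a Students row that is referred to from Enrolled is deleted?
  - What should be done if the primary key value (sid) of a Students row that is referred to from Enrolled is updated?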
Enforcing Integrity Constraints
• SQL allows us to choose any of the four options on DELETE and UPDATE: NO ACTION, CASCADE, SET DEFAULT, SET NULL
• Cascade -> Whatever you do on one table, repeat the same thing on the other
• No Action -> Don’t do anything to the table. Just reject the query
• SET DEFAULT -> Switch the referencing field to its default value (eg: a field declared with DEFAULT ‘53666’)
• SET NULL -> Set the referencing field to null
• Specification of a default value or null is appropriate only in certain situations
• Correct solution in this example is to also delete all enrollment tuples for the deleted student (that is, CASCADE) or to reject the update
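The CASCADE and NO ACTION options can be compared side by side. In the sketch below the schema (text keys, two sample students) is an assumption for illustration and sqlite3 is the assumed test bed: deleting a student cascades to Enrolled, while changing a referenced sid is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE Enrolled (
        studid TEXT, cid TEXT, grade TEXT,
        PRIMARY KEY (studid, cid),
        FOREIGN KEY (studid) REFERENCES Students (sid)
            ON DELETE CASCADE      -- deleting a student removes the enrollments too
            ON UPDATE NO ACTION    -- changing a referenced sid is rejected
    );
    INSERT INTO Students VALUES ('123', 'Bala'), ('456', 'Krishnan');
    INSERT INTO Enrolled VALUES ('123', 'CSE1', 'B'), ('456', 'ECE1', 'B');
""")

# CASCADE: the Enrolled tuple of student 123 disappears with the student.
conn.execute("DELETE FROM Students WHERE sid = '123'")
remaining = conn.execute("SELECT studid, cid FROM Enrolled").fetchall()

# NO ACTION: student 456 is still referenced, so the key update fails.
try:
    conn.execute("UPDATE Students SET sid = '999' WHERE sid = '456'")
    update_result = "ok"
except sqlite3.IntegrityError:
    update_result = "rejected"

print(remaining, update_result)
```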
Transactions and Constraints
• A program that runs against a database is called a transaction
• Can contain several statements (queries, inserts, updates, etc.) that access the database

Transaction 1 (Account No. 101)
1. Start
2. Display Balance
3. Deposit Rs. 100
4. Display Balance
5. Commit

Transaction 2 (Account No. 102)
1. Start
2. Withdraw Rs. 1000
3. Display Balance
4. Commit

• If (the execution of) a statement in a transaction violates an integrity constraint, should the DBMS detect this right away or should all constraints be checked together just before the transaction completes?
• By default, a constraint is checked at the end of every SQL statement that could lead to a violation, and if there is a violation, the statement is rejected
  - This approach is sometimes too inflexible
Transactions and Constraints
• Every student is required to have an honors course, and every course is required to have a grader, who is some student

Students (sid, name, login, age, honors, gpa)
Courses (cid, cname, credits, grader)

• Whenever a Students tuple is inserted, a check is made to see if the honors course is in the Courses relation, and whenever a Courses tuple is inserted, a check is made to see that the grader is in the Students relation
• How are we to insert the very first course or student tuple?
• One cannot be inserted without the other
• Only way to accomplish this insertion is to defer the constraint checking that would normally be carried out at the end of an INSERT statement
Transactions and Constraints
• SQL allows a constraint to be in DEFERRED or IMMEDIATE mode
  - SET CONSTRAINTS ConstraintFoo DEFERRED
• A constraint in deferred mode is checked at commit time

Students (sid, name, login, age, honors, gpa)
Courses (cid, cname, credits, grader)

Insert into Students values (123, 'Bala', 'b@nitt.edu', 24, 'CSMI24', 8.0)
Insert into Courses values ('CSMI24', 'DBMS', 3, 123)

• In our example, the foreign key constraints on Students and Courses can both be declared to be in deferred mode
• We can then insert a Students tuple with a nonexistent honors course (temporarily making the database inconsistent), insert the corresponding Courses tuple (restoring consistency), then commit and check that both constraints are satisfied
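The circular Students / Courses dependency can be exercised with deferred foreign keys. The sketch below is written under assumptions: it uses Python's sqlite3 module, where a constraint is put in deferred mode by declaring it DEFERRABLE INITIALLY DEFERRED in the schema rather than with a SET CONSTRAINTS statement, and the second student (456, with honors course 'NOPE01') is hypothetical data added to show a failing commit.

```python
import sqlite3

# isolation_level=None -> autocommit mode, so BEGIN / COMMIT are issued explicitly
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Students (
        sid INTEGER PRIMARY KEY, name TEXT, login TEXT, age INTEGER,
        honors TEXT NOT NULL REFERENCES Courses (cid) DEFERRABLE INITIALLY DEFERRED,
        gpa REAL
    );
    CREATE TABLE Courses (
        cid TEXT PRIMARY KEY, cname TEXT, credits INTEGER,
        grader INTEGER NOT NULL REFERENCES Students (sid) DEFERRABLE INITIALLY DEFERRED
    );
""")

# Both inserts in one transaction: the database is inconsistent after the
# first INSERT, consistent again after the second, and checked at COMMIT.
conn.execute("BEGIN")
conn.execute("INSERT INTO Students VALUES (123, 'Bala', 'b@nitt.edu', 24, 'CSMI24', 8.0)")
conn.execute("INSERT INTO Courses VALUES ('CSMI24', 'DBMS', 3, 123)")
conn.execute("COMMIT")

# A transaction that never restores consistency fails at commit time.
conn.execute("BEGIN")
conn.execute("INSERT INTO Students VALUES (456, 'Selva', 's@nitt.edu', 25, 'NOPE01', 7.4)")
try:
    conn.execute("COMMIT")
    dangling_commit = "ok"
except sqlite3.IntegrityError:
    conn.execute("ROLLBACK")
    dangling_commit = "rejected"

student_count = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
course_count = conn.execute("SELECT COUNT(*) FROM Courses").fetchone()[0]
print(dangling_commit, student_count, course_count)
```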
Querying Relational Data
• A relational database query is a question about the data, and the answer consists of a new relation containing the result
• Eg: We might want to find all students younger than 18 or all students enrolled in Reggae203
• A query language is a specialized language for writing queries

SELECT * FROM Students S WHERE S.age < 18

• * -> Retain all fields of selected tuples in the result
• S -> Variable that takes on the value of each tuple in Students, one tuple after the other
• S.age < 18 -> Specifies that we want to select only tuples in which the age field has a value less than 18
• Domain of a field restricts the operations that are permitted on field values, in addition to restricting the values that can appear in the field

Querying Relational Data
• A query can extract a subset of the fields of each selected tuple
• Order in which we perform these operations does matter
  - If we remove unwanted fields first, we cannot check the condition S.age < 18, which involves one of those fields
• A query can also combine information from two relations. Eg: If there is a Students tuple S and an Enrolled tuple E such that S.sid = E.studid (so that S describes the student who is enrolled in E) and E.grade = 'A', then print the student's name and the course id
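The three kinds of query described above (select tuples, keep a subset of fields, combine two relations) can be run against a small instance. The sample rows and the Python sqlite3 wrapper are assumptions made for illustration; the field names follow the Students and Enrolled relations used in these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (sid INTEGER PRIMARY KEY, name TEXT, login TEXT, age INTEGER, gpa REAL);
    CREATE TABLE Enrolled (studid INTEGER REFERENCES Students (sid), cid TEXT, grade TEXT);
    -- hypothetical sample data
    INSERT INTO Students VALUES (1, 'Asha', 'asha@cs', 17, 3.4), (2, 'Ravi', 'ravi@ee', 19, 3.8);
    INSERT INTO Enrolled VALUES (1, 'Reggae203', 'B'), (2, 'Reggae203', 'A');
""")

# Keep all fields of the tuples that satisfy the condition.
young = conn.execute("SELECT * FROM Students S WHERE S.age < 18").fetchall()

# Keep only some fields; the condition is checked before fields are dropped.
young_names = conn.execute(
    "SELECT S.name, S.login FROM Students S WHERE S.age < 18"
).fetchall()

# Combine Students and Enrolled: name and course id for every grade 'A'.
a_grades = conn.execute(
    "SELECT S.name, E.cid FROM Students S, Enrolled E "
    "WHERE S.sid = E.studid AND E.grade = 'A'"
).fetchall()

print(young, young_names, a_grades)
```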
ER to Relational
• ER model is convenient for representing an initial, high-level database design
• Given an ER diagram describing a database, a standard approach is taken to generate a relational database schema that closely approximates the ER design
• How to translate an ER diagram into a collection of tables with associated constraints -> Relational database schema

ER to Relational
• Entity Sets to Tables
  - An entity set is mapped to a relation in a straightforward way
  - Each attribute of the entity set becomes an attribute of the table
  - We need to know both the domain of each attribute and the (primary) key of an entity set
• Relationship Sets (without Constraints) to Tables
  - To represent a relationship, we must be able to identify each participating entity and give values to the descriptive attributes of the relationship
  - Attributes of the relation include:
    - Primary key attributes of each participating entity set, as foreign key fields
    - Descriptive attributes of the relationship set
ER to Relational
• Set of nondescriptive attributes is a superkey for the relation
• If there are no key constraints, this set of attributes is a candidate key
• Each department has offices in several locations and we want to record the locations at which each employee works

ER to Relational
Works_In2
ssn  did  address  since
123  456  CSE      2009
123  456  CSE      2011
789  456  CSE      2010

• address, did, and ssn fields together form the primary key, so none of them can take on null values
  - Constraint ensures that these fields uniquely identify a department, an employee, and a location in each tuple of Works_In2
  - Can also specify that a particular action is desired when a referenced Employees, Departments, or Locations tuple is deleted
ER to Relational
[Figure: Reports_To relationship set on Employees, with roles Supervisor and Subordinate]
• Role indicators supervisor and subordinate are used to create meaningful field names in the CREATE statement for the Reports_To table
• Need to explicitly name the referenced field of Employees because the field name (ssn) differs from the name(s) of the referring field(s) (supervisor…ssn, subordinate…ssn)
Translating Relationship Sets with Key Constraints
• A relationship set involves “n” entity sets and some “m” of them are linked via arrows in the ER diagram
  - Key for any one of these m entity sets constitutes a key for the relation to which the relationship set is mapped
  - Have “m” candidate keys, and one of these should be designated as the primary key

Manages
ssn  did  since
345  123  2009
345  911  2006
321  123  2009
234  456  2010
567  789  2011

• Table corresponding to Manages has the attributes ssn, did, since
• Each department has at most one manager -> No two tuples can have the same did value but differ on the ssn value
  - did is itself a key for Manages; indeed, the set {did, ssn} is not a key (it is not minimal)
Translating Relationship Sets with Key Constraints

Employees
ssn  name  lot
321  Bala  Full-Time

Departments
did  dname  budget
456  CSE    10,000

Manages
ssn  did        since
123  456 (CSE)  1999
789  345 (ECE)  2006

Select E.name, D.dname
from Employees E, Departments D, Manages M
where M.ssn = E.ssn and M.did = D.did and M.did = 456 and M.ssn = 123

• Second approach to translating a relationship set with key constraints is often superior because it avoids creating a distinct table for the relationship set
  - Idea is to include the information about the relationship set in the table corresponding to the entity set with the key, taking advantage of the key constraint
• In the Manages example, because a department has at most one manager, we can add the key fields of the Employees tuple denoting the Manager and the since attribute to the Departments tuple
Translating Relationship Sets with Key Constraints

Departments (extended with the manager's ssn and since: Dept_Mgr)
did  dname  budget  ssn   since
123  CSE    10,000  456   1999
789  ECE    20,000  NULL  2006

Select dname, ssn from Dept_Mgr where did = 123

• Eliminates the need for a separate Manages relation, and queries asking for a department's manager can be answered without combining information from two relations
• Drawback: Space could be wasted if several departments have no managers -> Added fields would have to be filled with null values
• First translation (using a separate table for Manages) avoids this inefficiency, but some important queries require us to combine information from two relations, which can be a slow operation
• Conclusion: If a relationship set involves “n” entity sets and some “m” of them are linked via arrows in the ER diagram, the relation corresponding to any one of the “m” sets can be augmented to capture the relationship
Translating Relationship Sets with Participation Constraints
• Every department is required to have a manager, due to the participation constraint, and at most one manager, due to the key constraint
Miscellaneous

Employees
ssn  name   lot
123  Bala   Full-Time
456  Selva  Part-Time

Departments
did  dname  budget
789  CSE    10,000
321  ECE    20,000

Dept_Mgr
did  dname  budget  ssn  since
789  CSE    10,000  123  2009
321  ECE    20,000  123  2010

Dept_Mgr
did  dname  budget  ssn  since
789  CSE    10,000  123  2009
321  ECE    20,000  456  2010
Translating Relationship Sets with Participation Constraints
• Captures the participation constraint that every department must have a manager
• ssn cannot take on null values
  - Each tuple of Dept_Mgr identifies a tuple in Employees (who is the manager)
• NO ACTION specification, which is the default and need not be explicitly specified
  - Ensures that an Employees tuple cannot be deleted while it is pointed to by a Dept_Mgr tuple
  - If we wish to delete such an Employees tuple, we must first change the Dept_Mgr tuple to have a new employee as manager
Translating Weak Entity Sets
• A weak entity set always participates in a one-to-many binary relationship and has a key constraint and total participation
• Weak entity has only a partial key
• When an owner entity is deleted, we want all owned weak entities to be deleted
• A Dependents entity can be identified uniquely only if we take the key of the owning Employees entity and the pname of the Dependents entity
• Dependents entity must be deleted if the owning Employees entity is deleted

Translating Weak Entity Sets
• Key of the owning Employees entity is declared NOT NULL in the weak entity's table
• CASCADE option ensures that information about an employee's policy and dependents is deleted if the corresponding Employees tuple is deleted
Translating Class Hierarchies
• Two basic approaches to handle ISA hierarchies

Translating Class Hierarchies
• Approach 1
  - We can map each of the entity sets Employees, Hourly_Emps, and Contract_Emps to a distinct relation
  - Employees relation is created as usual
  - Relation for Hourly_Emps includes the hourly_wages and hours_worked attributes of Hourly_Emps
  - It also contains the key attributes of the superclass (ssn, in this example), which serve as the primary key for Hourly_Emps, as well as a foreign key referencing the superclass (Employees)
  - For each Hourly_Emps entity, the values of the name and lot attributes are stored in the corresponding row of the superclass (Employees)
  - Note that if the superclass tuple is deleted, the delete must be cascaded to Hourly_Emps
Translating Class Hierarchies
• Approach 1

Employees
ssn  name     lot
123  Bala     Hourly_Emps
456  Selva    Contract_Emps
789  Karthik  NULL

Hourly_Emps
ssn  hourly_wages  hours_worked
123  500           8

Contract_Emps
ssn  contractid
456  321

Query:
1. Find the list of all employees (Employees Table)
2. Find the details of all Hourly_Emps (Hourly_Emps + Employees Tables)
3. Find the details of all Contract_Emps (Contract_Emps + Employees Tables)
Translating Class Hierarchies
• Approach 2
  - Alternatively, we can create just two relations, corresponding to Hourly_Emps and Contract_Emps
  - Relation for Hourly_Emps includes all the attributes of Hourly_Emps as well as all the attributes of Employees (i.e., ssn, name, lot, hourly_wages, hours_worked)
  - Relation for Contract_Emps includes all the attributes of Contract_Emps as well as all the attributes of Employees (i.e., ssn, name, lot, contractid)

Hourly_Emps
ssn  name  lot          hourly_wages  hours_worked
123  Bala  Hourly_Emps  500           8

Contract_Emps
ssn  name   lot            contractid
456  Selva  Contract_Emps  321

Missing Tuple (cannot be stored in either relation)
789  Karthik  NULL

Query:
1. Find the list of all employees (Hourly_Emps + Contract_Emps Tables)
2. Find the details of all Hourly_Emps (Hourly_Emps)
3. Find the details of all Contract_Emps (Contract_Emps)
Translating Class Hierarchies
• First approach is general and always applicable
  - Queries in which we want to examine all employees and do not care about the attributes specific to the subclasses are handled easily using the Employees relation
  - Queries in which we want to examine, say, hourly employees, may require us to combine Hourly_Emps (or Contract_Emps, as the case may be) with Employees to retrieve name and lot
• Second approach is not always applicable
  - It cannot be used if we have employees who are neither hourly employees nor contract employees, since there is no way to store such employees
  - If an employee is both an Hourly_Emps and a Contract_Emps entity, then the name and lot values are stored twice
  - A query that needs to examine all employees must now examine two relations
  - On the other hand, a query that needs to examine only hourly employees can now do so by examining just one relation
• Choice between these approaches clearly depends on the semantics of the data and the frequency of common operations
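The first approach (one relation per entity set in the hierarchy) can be sketched as follows. The column types and the Python sqlite3 wrapper are assumptions for illustration; the sample rows follow the Approach 1 slide. The sketch shows that listing all employees needs only Employees, that hourly details need a join, and that deleting the superclass tuple cascades to the subclass relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Employees (ssn INTEGER PRIMARY KEY, name TEXT, lot TEXT);
    CREATE TABLE Hourly_Emps (
        ssn INTEGER PRIMARY KEY, hourly_wages INTEGER, hours_worked INTEGER,
        FOREIGN KEY (ssn) REFERENCES Employees (ssn) ON DELETE CASCADE
    );
    CREATE TABLE Contract_Emps (
        ssn INTEGER PRIMARY KEY, contractid INTEGER,
        FOREIGN KEY (ssn) REFERENCES Employees (ssn) ON DELETE CASCADE
    );
    -- rows as on the Approach 1 slide
    INSERT INTO Employees VALUES (123, 'Bala', 'Hourly_Emps'),
                                 (456, 'Selva', 'Contract_Emps'),
                                 (789, 'Karthik', NULL);
    INSERT INTO Hourly_Emps VALUES (123, 500, 8);
    INSERT INTO Contract_Emps VALUES (456, 321);
""")

# Query 1: all employees, answered from Employees alone.
all_names = [row[0] for row in conn.execute("SELECT name FROM Employees ORDER BY ssn")]

# Query 2: details of hourly employees need Hourly_Emps combined with Employees.
hourly = conn.execute(
    "SELECT E.ssn, E.name, H.hourly_wages, H.hours_worked "
    "FROM Employees E JOIN Hourly_Emps H ON E.ssn = H.ssn"
).fetchall()

# Deleting the superclass tuple cascades to the subclass relation.
conn.execute("DELETE FROM Employees WHERE ssn = 123")
hourly_left = conn.execute("SELECT COUNT(*) FROM Hourly_Emps").fetchone()[0]

print(all_names, hourly, hourly_left)
```

Employee 789, who is neither hourly nor contract, is stored without difficulty here, which is exactly the case the second approach cannot represent.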
Translating ER Diagrams with Aggregation

Employees
ssn   name      lot
C123  Bala      Full-Time
D123  Krishnan  Part-Time

Departments
did  dname  budget
456  CSE    10,000
789  ECE    20,000

Projects
pid  Started_on  pbudget
1    1.1.2009    5,000
2    1.2.2010    3,000

Sponsors
did  pid  since
456  1    2009
789  2    2010
Translating ER Diagrams with Aggregation
CREATE TABLE Monitors(ssn CHAR(10),
did CHAR(10),
pid CHAR(10),
until CHAR(20),
PRIMARY KEY(ssn, did, pid),
FOREIGN KEY (ssn) REFERENCES Employees,
FOREIGN KEY (did) REFERENCES Departments,
FOREIGN KEY (pid) REFERENCES Projects);

Monitors
ssn   did  pid  until
C123  456  1    2010
D123  456  1    2010

• Monitors Relationship Set -> Create a relation with the following attributes: the key attributes of Employees (ssn), the key attributes of Sponsors (did, pid), and the descriptive attributes of Monitors (until)
• What about the Sponsors relationship set? Should we have it or not?
Translating ER Diagrams with Aggregation
Partial Participation

Monitors
ssn   did  pid  until
C123  456  1    2010
D123  456  1    2010

Monitors
ssn   did  pid  until
C123  456  1    2010
D123  456  1    2010
NULL  789  2    2011

Sponsors
did  pid  since
456  1    2009
789  2    2010
Translating ER Diagrams with Aggregation
Total Participation

Monitors
ssn   did  pid  until
C123  456  1    2010
D123  456  1    2010

Monitors
ssn   did  pid  until
C123  456  1    2010
D123  456  1    2010
D123  789  2    2011

Sponsors
did  pid  since
456  1    2009
789  2    2010
Translating ER Diagrams with Aggregation
• Sponsors Relationship Set
  - Has attributes pid, did, and since
  - Need it (in addition to Monitors) for two reasons
    - Have to record the descriptive attributes (in our example, since) of the Sponsors relationship
    - Not every sponsorship has a monitor, and thus some (pid, did) pairs in the Sponsors relation may not appear in the Monitors relation
  - If Sponsors has no descriptive attributes and has total participation in Monitors, every possible instance of the Sponsors relation can be obtained from the (pid, did) columns of Monitors -> Sponsors can be dropped
Views
• View is a table whose rows are not explicitly stored in the database but are computed as needed from a view definition

B-Students (Data Viewed by Students)
name  sid  course

• View B-Students has three fields called name, sid, and course with the same domains as the fields name and sid in Students and cid in Enrolled
• If the optional arguments name, sid, and course are omitted from the CREATE VIEW statement, then the column names name, sid, and cid are inherited
• Whenever B-Students is used in a query, the view definition is first evaluated to obtain the corresponding instance of B-Students, then the rest of the query is evaluated treating B-Students like any other relation referred to in the query
  - Eg: select * from B-Students
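A view of this kind can be defined and queried as sketched below. The view body (students with grade 'B', joined on sid = studid) is an assumption inferred from the view's name and fields; the view is spelled `BStudents` because an unquoted hyphen is not valid in an SQL identifier; the sample rows and the sqlite3 wrapper are also assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (sid INTEGER PRIMARY KEY, name TEXT, login TEXT, age INTEGER, gpa REAL);
    CREATE TABLE Enrolled (studid INTEGER, cid TEXT, grade TEXT);
    INSERT INTO Students VALUES (123, 'Bala', 'b@nit', 23, 8.6), (456, 'Krishnan', 'k@nit', 25, 8.9);
    INSERT INTO Enrolled VALUES (123, 'CSE1', 'B'), (456, 'ECE1', 'A');

    -- Assumed view definition: one row per (student, course) with grade 'B'
    CREATE VIEW BStudents (name, sid, course) AS
        SELECT S.name, S.sid, E.cid
        FROM Students S, Enrolled E
        WHERE S.sid = E.studid AND E.grade = 'B';
""")

before = conn.execute("SELECT * FROM BStudents ORDER BY sid").fetchall()

# The view's rows are not stored: a change to a base table shows up
# the next time the view is evaluated.
conn.execute("INSERT INTO Enrolled VALUES (456, 'CSE1', 'B')")
after = conn.execute("SELECT * FROM BStudents ORDER BY sid").fetchall()

print(before, after)
```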
Views, Data Independence and Security
• Physical schema for a relational database describes how the relations in the conceptual schema are stored, in terms of the file organizations and indexes used
• Conceptual schema is the collection of schemas of the relations stored in the database
• While some relations in the conceptual schema can also be exposed to applications, that is, be part of the external schema of the database, additional relations in the external schema can be defined using the view mechanism
• View mechanism thus provides the support for logical data independence in the relational model
  - Can be used to define relations in the external schema that mask changes in the conceptual schema of the database from applications
• Eg: If the schema of a stored relation is changed, we can define a view with the old schema, and applications that expect to see the old schema can now use this view
• Views are also valuable in the context of security
  - Can define views that give a group of users access to just the information they are allowed to see
Miscellaneous
[Figure: three levels of abstraction]
View Level (Application): (Emp_Id, Position, DoJ) and (Emp_Id, DoJ, Salary)
Logical Level: (Emp_ID, Name, Position, DoJ, Salary)
Physical Level: Hard Disk

Miscellaneous
[Figure: three levels of abstraction after adding a field at the logical level]
View Level (Application): (Emp_Id, Position, DoJ), (Emp_Id, DoJ, Salary) and (Emp_ID, Name, Position, DoJ, Salary)
Logical Level: (Emp_ID, Name, Position, DoJ, Salary, Age)
Physical Level: Hard Disk
Updates on Views
• A view can be used just like any other relation in defining a query

Students
sid  name  login  age  gpa
456  Bala  b@ni   24   4.5

CREATE VIEW GoodStudents (sid, gpa)
AS SELECT S.sid, S.gpa FROM Students S
WHERE S.gpa > 3.0 WITH CHECK
OPTION CONSTRAINT GPA

select * from GoodStudents

GoodStudents
sid  gpa
456  4.5

Insert into GoodStudents values(123, 4)

Students (after the insert through the view)
sid  name  login  age   gpa
456  Bala  b@ni   24    4.5
123  NULL  NULL   NULL  4

• An INSERT or UPDATE may change the underlying base table so that the resulting (i.e., inserted or modified) row is not in the view
  - Eg: Insert into GoodStudents values(123, 2.8) -> Allowed by default; rejected when the view is defined WITH CHECK OPTION
Need to Restrict View Updates
[Figure: instances of Students and Clubs (fields include cname, mname)]
• Find the names and logins of students with a gpa greater than 3 who belong to at least one club, along with the club name and the date they joined the club
[Figure: instance of the view ActiveStudents]
• Delete the row (Smith, smith@ee, Hiking, 1997) from ActiveStudents. How are we to do this? -> ActiveStudents rows are not stored explicitly but computed as needed from the Students and Clubs tables using the view definition -> Disallow such updates on views
Need to Restrict View Updates

B-Students (Data Viewed by Students)
name  sid  course

• To insert a tuple, say (Dave, 50000, Reggae203), into B-Students, we can simply insert a tuple (Reggae203, B, 50000) into Enrolled since there is already a tuple for sid 50000 in Students
• To insert (John, 55000, Reggae203), we have to insert (Reggae203, B, 55000) into Enrolled and also insert (55000, John, null, null, null) into Students
• View schema contains the primary key fields of both underlying base tables -> otherwise, we would not be able to support insertions into this view
• To delete a tuple from the view B-Students, we can simply delete the corresponding tuple from Enrolled
Destroying / Altering Tables and Views
• Destroying Table
  - If we decide that we no longer need a base table and want to destroy it (i.e., delete all the rows and remove the table definition information), we can use the DROP TABLE command
  - Eg: DROP TABLE Students RESTRICT -> Destroys the Students table unless some view or integrity constraint refers to Students; if so, the command fails
  - If RESTRICT is replaced by CASCADE -> Students is dropped and any referencing views or integrity constraints are (recursively) dropped as well
  - One of these two keywords must always be specified
• Destroying View
  - View can be dropped using the DROP VIEW command, which is just like DROP TABLE
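Miscellaneous

Students
Name  ID     Age  Year
123   BT456  18   I
456   MT789  19   II

DROP TABLE Students RESTRICT -> Removes all the rows and the table definition, whereas the DELETE statements below remove only rows
• DELETE FROM Students S where S.ID = 'BT456'
• DELETE FROM Students S where S.ID = 'MT789'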
Destroying / Altering Tables and Views
• Alter Table
- ALTER TABLE modifies the structure of an existing table
Students
Name   ID      Age   Year   Maidenname
123    BT456   18    I      NULL
456    MT789   19    II     NULL
- Students is modified to add the Maidenname column, and all existing rows
are padded with null values in this column
- ALTER TABLE can also be used to delete columns and add or drop
integrity constraints on a table
- Dropping columns is treated very similarly to dropping tables or
views
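A minimal sketch of the column addition described above, run through Python's sqlite3 module. The column types (TEXT, INTEGER) are assumptions; the slide does not give them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (Name TEXT, ID TEXT, Age INTEGER, Year TEXT)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?)",
    [("123", "BT456", 18, "I"), ("456", "MT789", 19, "II")],
)

# Add the new column; existing rows are padded with NULL in that column.
conn.execute("ALTER TABLE Students ADD COLUMN Maidenname TEXT")

rows = conn.execute("SELECT * FROM Students ORDER BY ID").fetchall()
```

Each existing row comes back with `None` (SQL NULL) in the new last position.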
68
Preliminaries
• Inputs and outputs of a query are relations
• A query is evaluated using instances of each input relation and it produces
an instance of the output relation
Students
sid   name   login   age   gpa
456   Bala   b@ni    24    4.5
• Field names can be used to refer to fields
Select name, age from Students where sid = 456
name   age
Bala   24
• Alternatively, always list the fields of a given relation in the same order
and refer to fields by position rather than by field name
Select 2, 4 from Students where sid = 456 (positional notation: fields 2 and 4)
name   age
Bala   24
69
Relational Algebra
• Selection
• Projection
• Set Operations
- Union
- Intersection
- Difference or Set Difference
- Cross-product
71
Selection
• Selection operator σ -> Specifies the tuples to retain through a selection
condition
• Selection condition is a Boolean combination (i.e., an expression using the
logical connectives ∧ and ∨) of terms that have the form attribute op
constant or attribute1 op attribute2
• op is one of the comparison operators <, <=, =, ≠, >=, or >
• Reference to an attribute can be by position (of the form .i or i) or by
name (of the form .name or name)
72
Projection
• Projection operator π -> Subscript sname, rating specifies the fields to be
retained
• Other fields are 'projected out'
• Projecting S on age alone: the ages {35.0, 35.0, 35.0} and 55.5 yield
age
35.0
55.5
• Although three sailors are aged 35, a single tuple with age=35.0 appears in
the result of the projection
• In practice, real systems often omit the expensive step of eliminating
duplicate tuples, leading to relations that are multisets
• Our discussion of relational algebra assumes that duplicate elimination is
always done so that relations are always sets of tuples
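Selection and projection with set semantics can be sketched in a few lines of Python. The sailor instance S below is hypothetical (it only reproduces the "three sailors aged 35.0, one aged 55.5" situation), and the threshold rating > 8 for "highly rated" is an assumption.

```python
from typing import NamedTuple

class Sailor(NamedTuple):
    sid: int
    sname: str
    rating: int
    age: float

# Hypothetical instance S: three sailors aged 35.0, one aged 55.5.
S = {
    Sailor(28, "yuppy", 9, 35.0),
    Sailor(31, "lubber", 8, 55.5),
    Sailor(44, "guppy", 5, 35.0),
    Sailor(58, "rusty", 10, 35.0),
}

def select(relation, predicate):
    """Selection: keep the tuples that satisfy the condition."""
    return {t for t in relation if predicate(t)}

def project(relation, *fields):
    """Projection: keep only the named fields; the set drops duplicates."""
    return {tuple(getattr(t, f) for f in fields) for t in relation}

ages = project(S, "age")  # one tuple per distinct age

# Expressions compose: a selection feeds a projection.
highly_rated = project(select(S, lambda t: t.rating > 8), "sname", "rating")
```

Because the result is a Python set, duplicate elimination happens automatically, which matches the set-of-tuples assumption stated above.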
73
Projection
• Since the result of a relational algebra expression is always a relation, we
can substitute an expression wherever a relation is expected
• Eg: We can compute the names and ratings of highly rated sailors by
combining two of the preceding queries (a selection on S followed by a
projection)
74
Set Operations
• Union (A U B)
• Intersection (A ∩ B)
• Difference (A – B)
75
Set Operations
• Union (A U B)
Student
ID   Name      Course Taken
12   Bala      CSMI23
23   Karthik   CSHO23
Employee
ID   Name    Course Handling
45   Selva   CSMI23
12   Sai     CSOE17
Union (Student U Employee)
ID   Name      Course Taken
12   Bala      CSMI23
12   Sai       CSOE17
23   Karthik   CSHO23
45   Selva     CSMI23
• A U B returns a relation instance containing all tuples that occur in either
relation instance A or relation instance B (or both)
• A and B must be union-compatible, and the schema of the result is defined to
be identical to the schema of A
• Two relation instances are said to be union-compatible, if the following
conditions hold:
- Both have the same number of fields
- Corresponding fields, taken in order from left to right, have the same
domains
• Note that field names are not used in defining union-compatibility
- For convenience, we will assume that the fields of A U B inherit names
from A, if the fields of A have names
76
Set Operations
• Intersection (A ∩ B)
Student (2011 Batch)
ID   Name      Course Taken
12   Bala      CSMI23
23   Karthik   CSHO23
Student (2011 & 2012 Batch)
ID   Name      Course Taken
12   Bala      CSMI23
23   Karthik   CSHO23
45   Selva     CSHO23
Intersection
ID   Name      Course Taken
12   Bala      CSMI23
23   Karthik   CSHO23
• A ∩ B returns a relation instance containing all tuples that occur in both A
and B
• Relations A and B must be union-compatible
• Schema of the result is defined to be identical to the schema of A
77
Set Operations
• Difference or Set Difference (A - B)
Student (2011 Batch)
ID   Name      Course Taken
12   Bala      CSMI23
23   Karthik   CSHO23
Student (2011 & 2012 Batch)
ID   Name    Course Taken
12   Bala    CSMI23
45   Selva   CSHO23
Difference or Set Difference
ID   Name      Course Taken
23   Karthik   CSHO23
• A - B returns a relation instance containing all tuples that occur in A but
not in B
• Relations A and B must be union-compatible
• Schema of the result is defined to be identical to the schema of A
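The three set operations map directly onto Python set operators. The sketch below reuses the Student and Employee rows from the union example and the second Student table from the difference example; representing a schema as a list of (name, domain) pairs is an assumption made only for the compatibility check.

```python
def union_compatible(schema_a, schema_b):
    """Same number of fields and the same domain at each position.
    Field names are deliberately ignored."""
    return len(schema_a) == len(schema_b) and all(
        dom_a is dom_b for (_, dom_a), (_, dom_b) in zip(schema_a, schema_b)
    )

student_schema = [("ID", int), ("Name", str), ("Course Taken", str)]
employee_schema = [("ID", int), ("Name", str), ("Course Handling", str)]
cgpa_schema = [("ID", int), ("Name", str), ("CGPA", float)]

student = {(12, "Bala", "CSMI23"), (23, "Karthik", "CSHO23")}
employee = {(45, "Selva", "CSMI23"), (12, "Sai", "CSOE17")}
other_batch = {(12, "Bala", "CSMI23"), (45, "Selva", "CSHO23")}

compatible = union_compatible(student_schema, employee_schema)
not_compatible = union_compatible(student_schema, cgpa_schema)

union = student | employee            # tuples in either relation
intersection = student & other_batch  # tuples in both relations
difference = student - other_batch    # tuples in A but not in B
```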
78
Set Operations
• Cross Product (A x B)
Student (2011 Batch)
ID   Name      Course Taken
12   Bala      CSMI23
23   Karthik   CSHO23
Student (2011 & 2012 Batch)
ID   Name      CGPA
23   Karthik   7
13   Kumar     8.9
Cross Product (2 x 2 = 4 tuples)
1    2         Course Taken   4    5         CGPA
12   Bala      CSMI23         23   Karthik   7
12   Bala      CSMI23         13   Kumar     8.9
23   Karthik   CSHO23         23   Karthik   7
23   Karthik   CSHO23         13   Kumar     8.9
79
Set Operations
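• Cross Product (A x B)
[Figure: cross-product example with 3 x 2 = 6 result tuples; the conflicting
fields are referred to by positions 1 and 5]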
80
Set Operations
• Cross Product (A x B)
- A x B returns a relation instance whose schema contains all the fields of
A followed by all the fields of B
- Result of A x B contains one tuple <r, s> (the concatenation of tuples r
and s) for each pair of tuples r ∈ A, s ∈ B
- Cross-product operation is sometimes called Cartesian product
- Fields of A x B inherit names from the corresponding fields of A and B
- It is possible for both A and B to contain one or more fields having the
same name
-> Creates a naming conflict
-> Corresponding fields in A x B are unnamed and are referred to
solely by position
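The cross-product rules above, including the "unnamed, referred to by position" convention, can be sketched as follows. The function name `cross_product` and the list-based schema representation are illustrative assumptions; the data is the two Student tables from the cross-product example.

```python
from itertools import product

def cross_product(schema_a, rows_a, schema_b, rows_b):
    """Return (schema, rows) of A x B.
    Fields whose name occurs in both inputs are left unnamed and are
    represented by their 1-based position instead."""
    clashes = set(schema_a) & set(schema_b)
    combined = list(schema_a) + list(schema_b)
    schema = [pos if name in clashes else name
              for pos, name in enumerate(combined, start=1)]
    rows = [r + s for r, s in product(rows_a, rows_b)]
    return schema, rows

a_schema = ["ID", "Name", "Course Taken"]
a_rows = [(12, "Bala", "CSMI23"), (23, "Karthik", "CSHO23")]
b_schema = ["ID", "Name", "CGPA"]
b_rows = [(23, "Karthik", 7), (13, "Kumar", 8.9)]

schema, rows = cross_product(a_schema, a_rows, b_schema, b_rows)
```

The resulting schema is `[1, 2, "Course Taken", 4, 5, "CGPA"]`, the same column heading the slide shows for the 2 x 2 = 4 result tuples.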
81
Renaming
• Name conflicts can arise in some cases -> A x B
• Convenient to be able to give names explicitly to the fields of a relation
instance that is defined by a relational algebra expression
• Renaming operator ρ
• Expression ρ(C(1 → StudID, 2 → Name, 4 → ID, 5 → Name1), A x B)
returns a relation that contains the tuples of A x B with the following schema:
- C(StudID: Integer, Name: String, Course Taken: String, ID: Integer,
Name1: String, CGPA: Real)
Cross Product
1    2         Course Taken   4    5         CGPA
12   Bala      CSMI23         23   Karthik   7
12   Bala      CSMI23         13   Kumar     8.9
23   Karthik   CSHO23         23   Karthik   7
23   Karthik   CSHO23         13   Kumar     8.9
Cross Product (with Renaming)
StudID   Name      Course Taken   ID   Name1     CGPA
12       Bala      CSMI23         23   Karthik   7
12       Bala      CSMI23         13   Kumar     8.9
23       Karthik   CSHO23         23   Karthik   7
23       Karthik   CSHO23         13   Kumar     8.9
82
Joins
•
Used to combine information from two or more relations
•
Although a join can be defined as a cross-product followed by selections
and projections, joins arise much more frequently in practice than plain
cross-products
•
Result of a cross-product is typically much larger than the result of a join
•
Very important to recognize joins and implement them without
materializing the underlying cross-product
83
Joins
•
Condition Join
•
Equi Join
•
Left Join
•
Right Join
•
Natural Join
84
Condition Join
• A ⋈c B = σc(A x B)
• Eg: A ⋈(A.sid < B.sid) B
85
Condition Join
• Join condition is identical to a selection condition in form
• A ⋈c B is defined to be a cross-product followed by a selection:
A ⋈c B = σc(A x B)
• Condition c can refer to attributes of both A and B
• Reference to an attribute of a relation, say, A, can be by position (of the
form A.i) or by name (of the form A.name)
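A condition join is literally "cross-product, then filter", which the sketch below spells out. The instances A and B are hypothetical; field 0 of each tuple plays the role of sid.

```python
from itertools import product

def condition_join(rows_a, rows_b, condition):
    """Condition join: the cross-product of A and B filtered by condition c.
    This is the definition, not an efficient implementation."""
    return [r + s for r, s in product(rows_a, rows_b) if condition(r, s)]

# Hypothetical instances; position 0 holds sid in both.
A = [(22, "dustin"), (31, "lubber"), (58, "rusty")]
B = [(22, 101), (58, 103)]

# Join condition A.sid < B.sid
result = condition_join(A, B, lambda r, s: r[0] < s[0])
```

A real system would avoid materializing the cross-product; the sketch only mirrors the definition.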
86
Equi Join
• Join condition of A ⋈c B consists solely of equalities -> A.name1 = B.name2
• Some redundancy in retaining both attributes in the result
• For join conditions that contain only such equalities, the join operation is
refined by doing an additional projection in which B.name2 is dropped
• Join operation with this refinement -> Equijoin
• Schema of the result of an equijoin contains the fields of A followed by the
fields of B that do not appear in the join conditions
• If this set of fields in the result relation includes two fields that inherit the
same name from A and B, they are unnamed in the result relation
• Eg: A ⋈(A.sid = B.sid) B
87
Equi Join
• A ⋈(A.ID = B.ID) B
Student (2011 Batch)
ID   Name      Course Taken
12   Bala      CSMI23
23   Karthik   CSHO23
Student (2011 & 2012 Batch)
ID   Name      CGPA
23   Karthik   7
13   Kumar     8.9
Cross Join
1    2         Course Taken   4    5         CGPA
12   Bala      CSMI23         23   Karthik   7
12   Bala      CSMI23         13   Kumar     8.9
23   Karthik   CSHO23         23   Karthik   7
23   Karthik   CSHO23         13   Kumar     8.9
Equi Join
ID   2         Course Taken   4         CGPA
23   Karthik   CSHO23         Karthik   7
88
Natural Join
• Special case of join operation A ⋈ B
• Equalities are specified on all fields having the same name in A and B ->
Omit the join condition
- Default is that the join condition is a collection of equalities on all
common fields
• Has the nice property that the result is guaranteed not to have two fields
with the same name
• If the two relations have no attributes in common, A ⋈ B is simply the
cross-product
Student (2011 Batch)
ID   Name      Course Taken
12   Bala      CSMI23
23   Karthik   CSHO23
Student (2011 & 2012 Batch)
ID   Name      CGPA
23   Karthik   7
13   Kumar     8.9
Natural Join A ⋈ B (No Condition)
ID   Name      Course Taken   CGPA
23   Karthik   CSHO23         7
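A natural join can be sketched as an equality test on every same-named field, keeping each common field once. The function below is an illustrative assumption (not an optimized join algorithm); it is run once on the two Student tables above and once with the second table's fields renamed to StudID and Sname, so that no field names are shared.

```python
def natural_join(schema_a, rows_a, schema_b, rows_b):
    """Join on equality of all same-named fields; common fields appear once."""
    common = [f for f in schema_a if f in schema_b]
    rest_b = [f for f in schema_b if f not in common]
    schema = list(schema_a) + rest_b
    rows = []
    for r in rows_a:
        a = dict(zip(schema_a, r))
        for s in rows_b:
            b = dict(zip(schema_b, s))
            # With no common fields, all([]) is True: plain cross-product.
            if all(a[f] == b[f] for f in common):
                rows.append(r + tuple(b[f] for f in rest_b))
    return schema, rows

a_schema = ["ID", "Name", "Course Taken"]
a_rows = [(12, "Bala", "CSMI23"), (23, "Karthik", "CSHO23")]
b_rows = [(23, "Karthik", 7), (13, "Kumar", 8.9)]

joined_schema, joined_rows = natural_join(
    a_schema, a_rows, ["ID", "Name", "CGPA"], b_rows)

# Same data, but no field names in common: degenerates to the cross-product.
cross_schema, cross_rows = natural_join(
    a_schema, a_rows, ["StudID", "Sname", "CGPA"], b_rows)
```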
89
Natural Join
• A ⋈ B
Student (2011 Batch)
ID   Name      Course Taken
12   Bala      CSMI23
23   Karthik   CSHO23
Student (2011 & 2012 Batch)
ID   Name      CGPA
23   Karthik   7
13   Kumar     8.9
Cross Join
1    2         Course Taken   4    5         CGPA
12   Bala      CSMI23         23   Karthik   7
12   Bala      CSMI23         13   Kumar     8.9
23   Karthik   CSHO23         23   Karthik   7
23   Karthik   CSHO23         13   Kumar     8.9
Natural Join (No Condition)
ID   Name      Course Taken   CGPA
23   Karthik   CSHO23         7
90
Natural Join
• A ⋈ B (no attributes in common)
Student (2011 Batch)
ID   Name      Course Taken
12   Bala      CSMI23
23   Karthik   CSHO23
Student (2011 & 2012 Batch)
StudID   Sname     CGPA
23       Karthik   7
13       Kumar     8.9
A ⋈ B -> Cross Join
ID   Name      Course Taken   StudID   Sname     CGPA
12   Bala      CSMI23         23       Karthik   7
12   Bala      CSMI23         13       Kumar     8.9
23   Karthik   CSHO23         23       Karthik   7
23   Karthik   CSHO23         13       Kumar     8.9
91
Left Outer Join
A
Emp_ID   Name       Age
112      Bala       25
114      Krishnan   45
116      Kumaran    23
118      Sai        21
B
Emp_ID   Position
112      Asst. Prof.
116      Prof.
114      Asso. Prof.
120      Prof.
Left Outer Join
Emp_ID   Name       Age   Position
112      Bala       25    Asst. Prof.
114      Krishnan   45    Asso. Prof.
116      Kumaran    23    Prof.
118      Sai        21    NULL
92
Right Outer Join
A
Emp_ID   Name       Age
112      Bala       25
114      Krishnan   45
116      Kumaran    23
118      Sai        21
B
Emp_ID   Position
112      Asst. Prof.
116      Prof.
114      Asso. Prof.
120      Prof.
Right Outer Join
Emp_ID   Position      Name       Age
112      Asst. Prof.   Bala       25
116      Prof.         Kumaran    23
114      Asso. Prof.   Krishnan   45
120      Prof.         NULL       NULL
93
Full Join
A
Emp_ID   Name       Age
112      Bala       25
114      Krishnan   45
116      Kumaran    23
118      Sai        21
B
Emp_ID   Position
112      Asst. Prof.
116      Prof.
114      Asso. Prof.
120      Prof.
Full Join (unmatched tuples of both A and B are kept, padded with NULL)
Emp_ID   Name       Age    Position
112      Bala       25     Asst. Prof.
114      Krishnan   45     Asso. Prof.
116      Kumaran    23     Prof.
118      Sai        21     NULL
120      NULL       NULL   Prof.
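The three outer joins on A and B can be checked with sqlite3. This is a sketch under one stated assumption: to stay portable across SQLite versions, only LEFT OUTER JOIN is used, so the right outer join is written with the operands swapped and the full join is emulated as the union of the two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (Emp_ID INTEGER, Name TEXT, Age INTEGER);
    CREATE TABLE B (Emp_ID INTEGER, Position TEXT);
    INSERT INTO A VALUES (112, 'Bala', 25), (114, 'Krishnan', 45),
                         (116, 'Kumaran', 23), (118, 'Sai', 21);
    INSERT INTO B VALUES (112, 'Asst. Prof.'), (116, 'Prof.'),
                         (114, 'Asso. Prof.'), (120, 'Prof.');
""")

# Left outer join: every A tuple survives; missing B fields become NULL.
left = conn.execute("""
    SELECT A.Emp_ID, A.Name, A.Age, B.Position
    FROM A LEFT OUTER JOIN B ON A.Emp_ID = B.Emp_ID
    ORDER BY A.Emp_ID
""").fetchall()

# Right outer join of A and B, written as B LEFT OUTER JOIN A.
right = conn.execute("""
    SELECT B.Emp_ID, B.Position, A.Name, A.Age
    FROM B LEFT OUTER JOIN A ON A.Emp_ID = B.Emp_ID
    ORDER BY B.Emp_ID
""").fetchall()

# Full outer join emulated as the (duplicate-free) union of both.
full = conn.execute("""
    SELECT A.Emp_ID, A.Name, A.Age, B.Position
    FROM A LEFT OUTER JOIN B ON A.Emp_ID = B.Emp_ID
    UNION
    SELECT B.Emp_ID, A.Name, A.Age, B.Position
    FROM B LEFT OUTER JOIN A ON A.Emp_ID = B.Emp_ID
    ORDER BY 1
""").fetchall()
```

Employee 118 appears only on the left side and 120 only on the right side, so the full join has five rows.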
94
Preliminaries
• Data-Definition Language (DDL) -> SQL DDL provides commands for defining
relation schemas, deleting relations, and modifying relation schemas
• Interactive Data-Manipulation Language (DML) -> SQL DML includes a query
language based on both the relational algebra and the tuple relational
calculus. It also includes commands to insert tuples into, delete tuples from,
and modify tuples in the database
• Integrity -> SQL DDL includes commands for specifying integrity constraints
that the data stored in the database must satisfy. Updates that violate integrity
constraints are disallowed
• View Definition -> SQL DDL includes commands for defining views
• Transaction Control -> SQL includes commands for specifying the beginning
and ending of transactions
• Embedded SQL and Dynamic SQL -> Embedded and dynamic SQL define how
SQL statements can be embedded within general-purpose programming
languages such as C, C++, Java, PL/I, Cobol, Pascal and Fortran
• Authorization -> SQL DDL includes commands for specifying access rights to
relations and views
95
Data Definition Language
•
Set of relations in a database must be specified to the system by means of
a data definition language (DDL)
•
SQL DDL allows specification of not only a set of relations, but also
information about each relation
-
Schema for each relation
-
Domain of values associated with each attribute
-
Integrity constraints
-
Set of indices to be maintained for each relation
-
Security and authorization information for each relation
-
Physical storage structure of each relation on disk
97
Basic Domain Types
• char(n) or character(n) -> A fixed-length character string with
user-specified length n
- Eg: name char(5); name = bala; stored as bala□ (padded with a space to
length 5)
• varchar(n) or character varying(n) -> A variable-length character string
with user-specified maximum length n
- Eg: name varchar(5); name = bala; stored as bala
• int or integer -> An integer (a finite subset of the integers that is machine
dependent)
• smallint -> A small integer (a machine-dependent subset of the integer
domain type)
• numeric(p, d) -> A fixed-point number with user-specified precision. The
number consists of p digits (plus a sign), and d of the p digits are to the
right of the decimal point. Thus, numeric(3,1) allows 44.5 to be stored
exactly, but neither 444.5 nor 0.32 can be stored exactly in a field of this
type
• real, double precision -> Floating-point and double-precision floating-point
numbers with machine-dependent precision
• float(n) -> A floating-point number, with precision of at least n digits
• SQL also provides special data types, such as various forms of the date
type -> DD-MM-YEAR; MM-DD-YEAR, etc.
98
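The numeric(p, d) rule can be checked mechanically. The helper `fits_numeric` below is a hypothetical illustration written with Python's decimal module; it is not how any particular DBMS validates values.

```python
from decimal import Decimal

def fits_numeric(text, p, d):
    """True if the decimal literal `text` can be stored exactly as numeric(p, d):
    at most d digits after the decimal point and at most p digits in total."""
    scaled = Decimal(text) * (Decimal(10) ** d)
    has_at_most_d_decimals = scaled == scaled.to_integral_value()
    has_at_most_p_digits = abs(scaled) < Decimal(10) ** p
    return has_at_most_d_decimals and has_at_most_p_digits

ok = fits_numeric("44.5", 3, 1)             # 3 digits, 1 after the point
too_many_digits = fits_numeric("444.5", 3, 1)
too_many_decimals = fits_numeric("0.32", 3, 1)
```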
Basic Schema Definition in SQL
• delete from account where id = 123
• drop table account CASCADE
Many database systems do not support
dropping of attributes, although they will
allow an entire table to be dropped
99
Basic Schema Definition in SQL
• delete from account where id = 123
• drop table account CASCADE
Drop command deletes not only all
tuples of the relation, but also the schema
for relation. After relation is dropped, no
tuples can be inserted into the relation
unless it is re-created with the create table
command
100
Basic Structure of SQL Queries
• A relational database consists of a collection of relations, each of which is
assigned a unique name
• NULL -> Indicates that the value either is unknown or does not exist
• NOT NULL -> Used to specify which attributes cannot be assigned null
values
• Basic structure of an SQL expression consists of three clauses
- select clause corresponds to the projection operation of the
relational algebra. It is used to list the attributes desired in the
result of a query
- from clause corresponds to the Cartesian-product operation of the
relational algebra. It lists the relations to be scanned in the
evaluation of the expression
- where clause corresponds to the selection predicate of the
relational algebra. It consists of a predicate involving attributes of
the relations that appear in the from clause
101
Basic Structure of SQL Queries
•
If the where clause is omitted, its predicate P is taken to be true
•
Unlike the result of a relational-algebra expression, the result of the SQL query
may contain multiple copies of some tuples
•
Three Steps
-
SQL forms the Cartesian product of the relations named in the from
clause
-
Performs a relational-algebra selection using the where clause
predicate
-
Projects the result onto the attributes of the select clause
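The three conceptual steps can be replayed by hand and compared with what an SQL engine returns. In this sketch the loan rows come from the Select Clause slide, while the borrower relation and the query itself are hypothetical.

```python
import sqlite3
from itertools import product

loan = [(12, "Trichy"), (23, "Trichy"), (45, "Trichy"), (67, "Chennai")]
borrower = [("Bala", 12), ("Sai", 67), ("Selva", 99)]  # hypothetical relation

# Step 1: Cartesian product of the relations named in the from clause.
pairs = list(product(borrower, loan))
# Step 2: selection using the where predicate (borrower.ID = loan.ID).
matching = [(b, l) for b, l in pairs if b[1] == l[0]]
# Step 3: projection onto the attributes of the select clause.
by_hand = [(b[0], l[1]) for b, l in matching]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (ID INTEGER, Branch_name TEXT)")
conn.execute("CREATE TABLE borrower (customer_name TEXT, ID INTEGER)")
conn.executemany("INSERT INTO loan VALUES (?, ?)", loan)
conn.executemany("INSERT INTO borrower VALUES (?, ?)", borrower)
by_sql = conn.execute(
    "SELECT customer_name, Branch_name FROM borrower, loan "
    "WHERE borrower.ID = loan.ID"
).fetchall()
```

The three steps describe what the result must be; an actual engine is free to evaluate the query in a cheaper way.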
102
Select Clause
loan
ID   Branch_name
12   Trichy
23   Trichy
45   Trichy
67   Chennai
• Query that selects Branch_name from loan will list each branch_name once
for every tuple in which it appears in the loan relation -> Duplicates are not
removed (No. of occurrences of duplicates may differ)
Branch_name
Trichy
Trichy
Trichy
Chennai
• To force the elimination of duplicates, we insert the keyword distinct after
select
Branch_name
Trichy
Chennai
• Use keyword all to specify explicitly that duplicates are not removed
• Asterisk symbol " * " can be used to denote "all attributes"
• select clause may also contain arithmetic expressions involving the
operators +, -, *, and / operating on constants or attributes of tuples
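The effect of all and distinct on the loan relation above can be reproduced with sqlite3; the ORDER BY clauses are added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (ID INTEGER, Branch_name TEXT)")
conn.executemany(
    "INSERT INTO loan VALUES (?, ?)",
    [(12, "Trichy"), (23, "Trichy"), (45, "Trichy"), (67, "Chennai")],
)

def column(sql):
    return [row[0] for row in conn.execute(sql)]

# Default behaviour (same as "select all"): duplicates are kept.
with_duplicates = column("SELECT ALL Branch_name FROM loan ORDER BY ID")
# "distinct" forces duplicate elimination.
without_duplicates = column(
    "SELECT DISTINCT Branch_name FROM loan ORDER BY Branch_name")
```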
103
Where Clause
• <> means not equal to
• SQL uses the logical connectives and, or, and not in the where clause ->
Rather than the mathematical symbols ∧, ∨ and ¬
• Operands of the logical connectives can be expressions involving the
comparison operators <, <=, >, >=, = and <>
• SQL allows us to use the comparison operators to compare strings and
arithmetic expressions, as well as special types, such as date types
• SQL provides a between comparison operator; similarly, a not between
comparison operator also exists
104
From Clause
• from clause by itself defines a Cartesian product of the relations in the
clause
• Since the natural join is defined in terms of a Cartesian product, a
selection, and a projection, it is a relatively simple matter to write an SQL
expression for the natural join
• SQL uses the notation relation-name.attribute-name, as does the relational
algebra, to avoid ambiguity in cases where an attribute appears in the schema
of more than one relation
• SQL includes extensions to perform natural joins and outer joins in the
from clause
105
Rename Operation
• Rename both relations and attributes -> old-name as new-name
- As clause can appear in both the select and from clauses
• Names of the attributes in the result are derived from the names of the
attributes in the relations in the from clause
• Cannot always derive names in this way, for several reasons
- First, two relations in the from clause may have attributes with the
same name, in which case an attribute name is duplicated in the result
- Second, if we used an arithmetic expression in the select clause, the
resultant attribute does not have a name
- Third, even if an attribute name can be derived from the base relations
as in the preceding example, we may want to change the attribute
name in the result
106
Tuple Variables
• Tuple variables are defined in the from clause by way of the as clause
• Tuple variables are most useful for comparing two tuples in the same
relation
• Observe that we could not use the notation branch.asset, since it would
not be clear which reference to branch is intended
• SQL permits us to use the notation (v1, v2, v3, …, vn) to denote a tuple of
arity (or degree) n containing values v1, v2, v3, …, vn
• Comparison operators can be used on tuples, and the ordering is defined
lexicographically
- (a1, a2) <= (b1, b2) is true, if a1 < b1, or (a1 = b1) ∧ (a2 <= b2)
• Two tuples are equal if all their attributes are equal
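The lexicographic rule for comparing pairs can be written out directly. Python's built-in tuple comparison happens to use the same ordering, which gives a convenient cross-check; `tuple_le` is just an illustrative name.

```python
def tuple_le(a, b):
    """(a1, a2) <= (b1, b2) is true if a1 < b1, or a1 = b1 and a2 <= b2."""
    (a1, a2), (b1, b2) = a, b
    return a1 < b1 or (a1 == b1 and a2 <= b2)

pairs = [
    ((1, 5), (2, 0)),   # decided by the first component
    ((2, 3), (2, 3)),   # equal tuples
    ((2, 4), (2, 3)),   # first components tie, second decides
    ((3, 0), (2, 9)),   # first component already larger
]

# The hand-written rule agrees with Python's lexicographic tuple ordering.
agrees = all(tuple_le(a, b) == (a <= b) for a, b in pairs)
```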
107
String Operations
• SQL specifies strings by enclosing them in single quotes -> 'Perryridge'
• A single quote character that is part of a string can be specified by using
two single quote characters -> 'It''s right'
• Most used operation on strings is pattern matching using the operator like
• Describe patterns by using two special characters
- Percent (%) -> % character matches any substring
- Underscore (_) -> _ character matches any character
• Patterns are case sensitive
• SQL expresses patterns by using the like comparison operator
108
String Operations
•
For patterns to include the special pattern characters (that is, % and _), SQL
allows the specification of an escape character
•
Escape character is used immediately before a special pattern character to
indicate that the special pattern character is to be treated like a normal
character
•
SQL allows us to search for mismatches instead of matches by using the not
like comparison operator
•
SQL also permits a variety of functions on character strings
•
Concatenating (||), extracting substrings, finding the length of strings,
converting strings to uppercase (upper()) and lowercase (lower()), etc.
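The string operations above can be exercised with sqlite3. The table branch and its contents are hypothetical. One caveat that is specific to SQLite: its like operator is case-insensitive for ASCII letters by default, unlike the case-sensitive behaviour stated above, so the patterns here are chosen so that case does not matter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE branch (name TEXT)")
conn.executemany(
    "INSERT INTO branch VALUES (?)",
    [("Perryridge",), ("Perry%ridge",), ("Downtown",), ("It's right",)],
)

def names(where):
    sql = "SELECT name FROM branch WHERE " + where + " ORDER BY name"
    return [row[0] for row in conn.execute(sql)]

starts_with_perry = names("name LIKE 'Perry%'")        # % = any substring
eight_chars = names("name LIKE '________'")            # _ = any one character
# Backslash declared as escape character: \% is a literal percent sign.
has_percent = names(r"name LIKE '%\%%' ESCAPE '\'")
not_perry = names("name NOT LIKE 'Perry%'")
quoted = names("name = 'It''s right'")                 # '' = one single quote

shout = conn.execute(
    "SELECT upper(name) || '!' FROM branch WHERE name = 'Downtown'"
).fetchone()[0]
```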
109
Ordering the Display of Tuples
• Offers the user some control over the order in which tuples in a relation are
displayed
• order by clause causes the tuples in the result of a query to appear in sorted
order
• By default, the order by clause lists items in ascending order
• Ordering can be performed on multiple attributes
• Suppose that we wish to list the entire loan relation in descending order of
amount
• If several loans have the same amount, we order them in ascending order by
loan number
• To fulfill an order by request, SQL must perform a sort
• Sorting a large number of tuples may be costly -> Do it only when necessary
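The two-level ordering described above (descending amount, then ascending loan number) looks as follows in sqlite3. The loan rows and the column names loan_number and amount are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO loan VALUES (?, ?)",
    [("L-15", 1500), ("L-11", 900), ("L-16", 1300), ("L-14", 1500)],
)

# Descending by amount; ties broken by ascending loan number.
ordered = conn.execute(
    "SELECT loan_number, amount FROM loan "
    "ORDER BY amount DESC, loan_number ASC"
).fetchall()
```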
110
Duplicates
• SQL formally defines not only what tuples are in the result of a query, but
also how many copies of each of those tuples appear in the result
• Given multiset relations r1 and r2
• For example, suppose that relations r1 with schema (A, B) and r2 with
schema (C) are the following multisets:
• Then ∏B(r1) would be {(a), (a)}, whereas ∏B(r1) x r2 would be
• We can now define how many copies of each tuple occur in the result of
an SQL query
111
Set Operations
• SQL operations union, intersect, and except operate on relations and
correspond to the relational-algebra operations U, ∩, and –
• Relations must be union-compatible
• Union Operation
depositor
borrower
customer_name
customer_name
Bala
Bala
Bala
Bala
Bala
Selva
Sai
- Union operation automatically eliminates duplicates
- If we want to retain all duplicates, we must write union all
- Number of duplicate tuples in the result is equal to the total
number of duplicates that appear in both depositor and borrower
112
Set Operations
•
Intersection Operation
depositor
borrower
customer_name
customer_name
Bala
Bala
Bala
Bala
Bala
Selva
Sai
- Intersect operation automatically eliminates duplicates
- If we want to retain all duplicates, we must write intersect all
- Number of duplicate tuples that appear in the result is equal to the
minimum number of duplicates in both depositor and borrower
• Except Operation
depositor
borrower
customer_name
customer_name
Bala
Bala
Bala
Bala
Sai
Selva
Sai
- Except operation automatically eliminates duplicates
- If we want to retain all duplicates, we must write except all
- Number of duplicate copies of a tuple in the result is equal to the number of
duplicate copies of the tuple in depositor minus the number of duplicate
copies of the tuple in borrower, provided that the difference is positive
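The duplicate-counting rules for union all, intersect all and except all correspond to multiset arithmetic, which Python's collections.Counter provides. The multiplicities below are hypothetical; they are not read off the depositor and borrower tables on the slides.

```python
from collections import Counter

# Hypothetical multisets of customer_name values.
depositor = Counter({"Bala": 3, "Sai": 1})
borrower = Counter({"Bala": 2, "Selva": 1})

union_all = depositor + borrower      # copies add up
intersect_all = depositor & borrower  # minimum number of copies
except_all = depositor - borrower     # difference, kept only if positive

# Without "all", duplicates are eliminated: plain set operations.
union = set(depositor) | set(borrower)
intersect = set(depositor) & set(borrower)
except_ = set(depositor) - set(borrower)
```

Note the difference between except all (one copy of Bala remains, since 3 - 2 = 1) and except (Bala is removed entirely because it occurs in borrower).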
113
Aggregate Functions
• Functions that take a collection (a set or multiset) of values as input and
return a single value
• SQL offers five built-in aggregate functions
- Average: avg
- Minimum: min
- Maximum: max
- Total: sum
- Count: count
• Input to sum and avg must be a collection of numbers, but the other
operators can operate on collections of nonnumeric data types, such as
strings
114
Aggregate Functions
• SQL does not allow the use of distinct with count (*)
• It is legal to use distinct with max and min even though the result does not
change
• We can use the keyword all in place of distinct to specify duplicate
retention, but, since all is the default, there is no need to do so
• If a where clause and a having clause appear in the same query, SQL
applies the predicate in the where clause first
• Tuples satisfying the where predicate are then placed into groups by the
group by clause
• SQL then applies the having clause, if it is present, to each group
• Removes the groups that do not satisfy the having clause predicate
• Select clause uses the remaining groups to generate tuples of the result of
the query
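The where, group by, having, select order of evaluation can be observed on a small example. The account table, its rows and the thresholds 500 and 1200 are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (branch_name TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("Trichy", 500), ("Trichy", 1500), ("Trichy", 2500),
     ("Chennai", 3000), ("Chennai", 100), ("Madurai", 900)],
)

# 1. where drops the 100 balance; 2. rows are grouped by branch;
# 3. having drops Madurai (average 900); 4. select builds the result.
result = conn.execute("""
    SELECT branch_name, avg(balance), count(*)
    FROM account
    WHERE balance >= 500
    GROUP BY branch_name
    HAVING avg(balance) > 1200
    ORDER BY branch_name
""").fetchall()

branches = conn.execute(
    "SELECT count(DISTINCT branch_name) FROM account").fetchone()[0]
```

Chennai's average is 3000 rather than 1550 because the where predicate removed its 100 balance before the group was formed.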
115
NULL Values
• SQL allows the use of null values to indicate absence of information about
the value of an attribute
• Can use the special keyword null in a predicate to test for a null value
• Predicate is not null tests for the absence of a null value
• Use of a null value in arithmetic and comparison operations causes several
complications
- Result of an arithmetic expression (involving, for example +, -, * or /)
is null, if any of the input values is null
- and -> Result of true and unknown is unknown, false and unknown
is false, while unknown and unknown is unknown
- or -> Result of true or unknown is true, false or unknown is
unknown, while unknown or unknown is unknown
- not -> Result of not unknown is unknown
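The three-valued truth tables above can be captured in a few lines. In this sketch Python's `None` stands for the truth value unknown; the function names `and3`, `or3` and `not3` are illustrative.

```python
def and3(a, b):
    """Three-valued and: false dominates, then unknown."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def or3(a, b):
    """Three-valued or: true dominates, then unknown."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def not3(a):
    """Three-valued not: unknown stays unknown."""
    return None if a is None else not a

UNKNOWN = None
```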
116
NULL Values
amount
NULL
NULL
NULL
count(*) = count({NULL, NULL, NULL}) = 3
sum(amount) = sum({ }) = NULL
count(amount) = count({ }) = 0
• All aggregate functions except count (*) ignore null values in their input
collection
• As a result of null values being ignored, the collection of values may be
empty
- Count of an empty collection is defined to be 0
- All other aggregate operations return a value of null when applied
on an empty collection
• Effect of null values on some of the more complicated SQL constructs can
be subtle
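The all-NULL amount column above can be fed to the aggregate functions directly; the table name loan is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (amount INTEGER)")
conn.executemany("INSERT INTO loan VALUES (?)", [(None,), (None,), (None,)])

# count(*) counts rows; every other aggregate first discards the NULLs.
row = conn.execute(
    "SELECT count(*), count(amount), sum(amount), avg(amount), min(amount) "
    "FROM loan"
).fetchone()
```

Only count(*) sees the three rows; count(amount) works on an empty collection and returns 0, and the remaining aggregates return NULL (`None` in Python).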
117
Views
• View -> Any relation that is not part of the logical model, but is made
visible to a user as a virtual relation
• Possible to support a large number of views on top of any given set of
actual relations
• create view v as <query expression>
118
Views
• If a view relation is computed and stored, it may become out of date if the
relations used to define it are modified
• To avoid this, views are usually implemented as follows
- When we define a view, the database system stores the definition of
the view itself
- Wherever a view relation appears in a query, it is replaced by the
stored query expression
- Whenever we evaluate the query, the view relation gets
recomputed
• Certain database systems allow view relations to be stored, but they make
sure that, if the actual relations used in the view definition change, the
view is kept up to date -> Materialized Views
• Process of keeping the views up to date -> View Maintenance
• Applications that use a view frequently benefit from the use of
materialized views, as do applications that demand fast response to
certain view-based queries
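That an ordinary (non-materialized) view is recomputed from its stored definition can be seen by changing the base relation between two queries. The account rows are those used elsewhere in these slides; the view name high_balance and the 12000 threshold are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (no INTEGER, name TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(123, "Bala", 10000), (234, "Krishnan", 15000), (456, "Sai", 18000)],
)

# Only the definition is stored, not the rows.
conn.execute(
    "CREATE VIEW high_balance AS "
    "SELECT no, name FROM account WHERE balance > 12000"
)

before = conn.execute("SELECT no FROM high_balance ORDER BY no").fetchall()

# Changing the base relation changes what the view shows next time.
conn.execute("UPDATE account SET balance = 20000 WHERE no = 123")
after = conn.execute("SELECT no FROM high_balance ORDER BY no").fetchall()
```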
119
Modification of the Database
• A delete command operates on only one relation
• If we want to delete tuples from several relations, we must use one delete
command for each relation
• Delete statement first tests each tuple in the relation account to check
whether the account has a balance less than the average at the bank
• Then, all tuples that satisfy this condition (that is, accounts with a
lower-than-average balance) are deleted
• Performing all the tests before performing any deletion is important
- If some tuples are deleted before other tuples have been tested, the
average balance may change, and the final result of the delete
would depend on the order in which the tuples were processed
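Why the tests must all be performed before any deletion can be seen by comparing the SQL statement with a deliberately naive test-and-delete loop. The delete statement below is an assumed form of the one discussed (delete accounts whose balance is below the bank average); the account rows are those used elsewhere in these slides.

```python
import sqlite3

accounts = [(123, "Bala", 10000), (234, "Krishnan", 15000), (456, "Sai", 18000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (no INTEGER, name TEXT, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?, ?)", accounts)

# SQL semantics: the average is computed once, then the deletions happen.
conn.execute(
    "DELETE FROM account WHERE balance < (SELECT avg(balance) FROM account)"
)
sql_remaining = [r[0] for r in conn.execute(
    "SELECT balance FROM account ORDER BY balance")]

def naive_delete(balances):
    """Wrong on purpose: re-computes the average after every deletion."""
    remaining = list(balances)
    for b in list(balances):
        if b < sum(remaining) / len(remaining):
            remaining.remove(b)
    return remaining

naive_remaining = naive_delete([10000, 15000, 18000])
```

With the average fixed at about 14333, only the 10000 account is deleted. The naive loop also removes 15000, because the average has risen to 16500 by the time that tuple is tested.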
120
Modification of the Database
• Evaluate the select statement fully before we carry out any insertions
• If we carry out some insertions even as the select statement is being
evaluated, a request such as the above might insert an infinite number of
tuples
121
Modification of the Database
• Evaluate the select statement fully before we carry out any insertions
account
no    name       balance
123   Bala       10,000
234   Krishnan   15,000
456   Sai        18,000
• If we carry out some insertions even as the select statement is being
evaluated, a request such as the above might insert an infinite number of
tuples
122
Modification of the Database
• Change a value in a tuple without changing all values in the tuple
• SQL provides a case construct, which we can use to perform both the
updates with a single update statement, avoiding the problem with order
of updates
• Operation returns result_i, where i is the first of pred1, pred2, . . . , predn
that is satisfied
• If none of the predicates is satisfied, the operation returns result0
• Case statements can be used in any place where a value is expected
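The order-of-updates problem and its case-based fix can be sketched as follows. The interest rule (6 percent on balances over 10000, 5 percent otherwise) is an assumed example, and integer arithmetic is used to keep the results exact.

```python
import sqlite3

def fresh_account_table():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (no INTEGER, name TEXT, balance INTEGER)")
    conn.executemany(
        "INSERT INTO account VALUES (?, ?, ?)",
        [(123, "Bala", 10000), (234, "Krishnan", 15000), (456, "Sai", 18000)],
    )
    return conn

def balances(conn):
    return conn.execute("SELECT no, balance FROM account ORDER BY no").fetchall()

# Two separate updates in the wrong order: account 123 gets 5% and then,
# having crossed 10000, is picked up by the 6% update as well.
two_step = fresh_account_table()
two_step.execute(
    "UPDATE account SET balance = balance * 105 / 100 WHERE balance <= 10000")
two_step.execute(
    "UPDATE account SET balance = balance * 106 / 100 WHERE balance > 10000")

# One update with case: each tuple is matched against the predicates once.
one_step = fresh_account_table()
one_step.execute("""
    UPDATE account
    SET balance = CASE
        WHEN balance > 10000 THEN balance * 106 / 100
        ELSE balance * 105 / 100
    END
""")

wrong = balances(two_step)
right = balances(one_step)
```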
123
Modification of the Database
•
Although views are a useful tool for queries, they present serious problems
if we express updates, insertions, or deletions with them
•
Difficulty is that a modification to the database expressed in terms of a
view must be translated to a modification to the actual relations in the
logical model of the database
•
Insertion must be represented by an insertion into the relation loan, since
loan is the actual relation from which the database system constructs the
view loan-branch
•
To insert a tuple into loan, we must have some value for amount
-
Reject the insertion, and return an error message to the user
-
Insert a tuple (L-37, "Perryridge", null) into the loan relation
124
Modification of the Database
• Only possible method of inserting tuples into the borrower and loan
relations is to insert ("Johnson", null) into borrower and (null, null, 1900)
into loan
• Update does not have the desired effect, since the view relation loan_info
still does not include the tuple (Johnson, 1900)
• No way to update the relations borrower and loan by using nulls to get the
desired update on loan_info
125
Modification of the Database
• Because of problems such as these, modifications are generally not
permitted on view relations, except in limited cases
• Different database systems specify different conditions under which they
permit updates on view relations -> Consult the system manual
• An SQL view is said to be updatable (that is, inserts, updates or deletes
can be applied on the view) if the following conditions are all satisfied:
- From clause has only one database relation
- Select clause contains only attribute names of the relation, and does
not have any expressions, aggregates, or distinct specification
- Any attribute not listed in the select clause can be set to null
- Query does not have a group by or having clause
• insert into downtown_account values
(123, Trichy, 10,000) -> Do not allow
• insert into downtown_account values
(123, Downtown, 10,000) -> Allow
• Problem still remains
126
Modification of the Database
•
By default, SQL would allow the above update to proceed
•
Views can be defined with a with check option clause at the end of the
view definition
-
If a tuple inserted into the view does not satisfy the view's where
clause condition, the insertion is rejected by the database system
-
Updates are similarly rejected if the new value does not satisfy the
where clause conditions
127
Transactions
• Transaction consists of a sequence of query and/or update statements
• SQL standard specifies that a transaction begins implicitly when an SQL
statement is executed
• One of the following SQL statements must end the transaction
- Commit work commits the current transaction
- Rollback work causes the current transaction to be rolled back
• Commit is similar, in a sense, to saving changes to a document that is being
edited, while rollback is similar to quitting the edit session without saving
changes
• Case of power outage or other system crash -> Rollback occurs when the
system restarts
• Allow multiple SQL statements to be enclosed between the keywords begin
atomic ... end
• All the statements between the keywords then form a single transaction
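Commit and rollback can be tried out through Python's sqlite3 module, whose default mode opens a transaction implicitly before a data-modifying statement, in the spirit of the behaviour described above. The account row and the transfer amount are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (no INTEGER PRIMARY KEY, name TEXT, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (123, 'Bala', 10000)")
conn.commit()  # make the insert permanent

def balance():
    return conn.execute(
        "SELECT balance FROM account WHERE no = 123").fetchone()[0]

# The update starts a new transaction implicitly; rollback undoes it.
conn.execute("UPDATE account SET balance = balance - 4000 WHERE no = 123")
conn.rollback()
after_rollback = balance()

# The same update, this time committed.
conn.execute("UPDATE account SET balance = balance - 4000 WHERE no = 123")
conn.commit()
after_commit = balance()
```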
128
Joins
129
Join Types and Conditions
• SQL join operations take two relations and return another relation as the
result
• Although outer-join expressions are typically used in the from clause, they
can be used anywhere that a relation can be used
• Each of the variants of the join operations in SQL consists of a join type
and a join condition
- Join condition defines which tuples in the two relations match and
what attributes are present in the result of the join
- Join type defines how tuples in each relation that do not match any
tuple in the other relation (based on the join condition) are treated
• Use of a join condition is mandatory for outer joins, but is optional for
inner joins (if it is omitted, a Cartesian product results)
130
Join Types and Conditions
• Keyword natural appears before the join type whereas the on and using
conditions appear at the end of the join expression
• Keywords inner and outer are optional, since the rest of the join type
enables us to deduce whether the join is an inner join or an outer join
• Meaning of the join condition natural, in terms of which tuples from the
two relations match, is straightforward
• Ordering of the attributes in the result of a natural join
- Join attributes (that is, the attributes common to both relations)
appear first, in the order in which they appear in the left-hand-side
relation
- Next all nonjoin attributes of the left-hand-side relation, and finally
all nonjoin attributes of the right-hand-side relation
• Right outer join is symmetric to the left outer join
- Tuples from the right-hand-side relation that do not match any tuple
in the left-hand-side relation are padded with nulls and are added
to the result of the right outer join
131
Join Types and Conditions
• Join condition using (A1, A2, ..., An) is similar to the natural-join condition
- Except that the join attributes are the attributes A1, A2, ..., An,
rather than all attributes that are common to both relations
- Attributes A1, A2, ..., An must consist of only attributes that are
common to both relations, and they appear only once in the result
of the join
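A minimal sketch of the difference between natural and using, run through Python's sqlite3 module. The tables a and b and their rows are illustrative assumptions: both share the attributes id and name, but the names disagree for id 23.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table a (id integer, name text, course_taken text);
    create table b (id integer, name text, cgpa real);
    -- Illustrative rows (assumption): id 23 has different names in a and b
    insert into a values (12, 'Bala', 'CSMI23'), (23, 'Karthik', 'CSHO23');
    insert into b values (23, 'Selva', 7), (13, 'Kumar', 8.9);
""")

# natural: matches on ALL common attributes (id and name), so nothing joins.
natural_rows = conn.execute("select * from a natural join b").fetchall()

# using (id): matches on id only; id appears once, name appears per relation.
using_rows = conn.execute("""
    select id, a.name, b.name, course_taken, cgpa
    from a join b using (id)
""").fetchall()

print(natural_rows)
print(using_rows)
```

Restricting the join attributes with using (id) is what lets the tuple for id 23 through even though the two name values differ.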
Join Types and Conditions
• Full outer join is a combination of the left and right outer-join types
• Query -> Find all customers who have an account but no loan at the bank
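One way to express this query is a left outer join followed by a test for the null padding. The sketch below is an assumption about how the query might be written, not the slide's original text, and the depositor and borrower rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table depositor (customer_name text, account_number text);
    create table borrower (customer_name text, loan_number text);
    -- Hypothetical sample rows (assumption, not from the slides)
    insert into depositor values ('Hayes', 'A-102'),
                                 ('Johnson', 'A-101'),
                                 ('Jones', 'A-217');
    insert into borrower values ('Hayes', 'L-15'), ('Smith', 'L-23');
""")

# Depositors with no matching borrower row are padded with nulls on the
# borrower side; keeping only those rows gives "account but no loan".
account_no_loan = conn.execute("""
    select d.customer_name
    from depositor as d left outer join borrower as b
         on d.customer_name = b.customer_name
    where b.customer_name is null
    order by d.customer_name
""").fetchall()

print(account_no_loan)
```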
Join Types and Conditions
• Query -> Find all customers who have either an account or a loan (but not
both) at the bank
• SQL-92 also provides two other join types -> Cross Join and Union Join
• Cross Join is equivalent to an inner join without a join condition
• Union Join is equivalent to a full outer join on the "false" condition -> That
is, where the inner join is empty
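The "either an account or a loan (but not both)" query is naturally a full outer join filtered on the null padding. Older SQLite builds lack full outer join, so this sketch emulates it as the union of two left outer joins; that emulation, and the hypothetical rows, are assumptions made so the example runs with Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table depositor (customer_name text, account_number text);
    create table borrower (customer_name text, loan_number text);
    -- Hypothetical sample rows (assumption, not from the slides)
    insert into depositor values ('Hayes', 'A-102'),
                                 ('Johnson', 'A-101'),
                                 ('Jones', 'A-217');
    insert into borrower values ('Hayes', 'L-15'), ('Smith', 'L-23');
""")

# Textbook form, for a DBMS with full outer join:
#   select customer_name
#   from depositor natural full outer join borrower
#   where account_number is null or loan_number is null
# Emulation: unmatched depositors, plus unmatched borrowers.
either_not_both = conn.execute("""
    select d.customer_name
    from depositor as d left outer join borrower as b
         on d.customer_name = b.customer_name
    where b.loan_number is null
    union
    select b.customer_name
    from borrower as b left outer join depositor as d
         on d.customer_name = b.customer_name
    where d.account_number is null
    order by 1
""").fetchall()

# Cross join: every depositor row paired with every borrower row.
(cross_size,) = conn.execute(
    "select count(*) from depositor cross join borrower"
).fetchone()

print(either_not_both)
print(cross_size)
```

Hayes has both an account and a loan and is therefore excluded; the cross join returns 3 x 2 pairings.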
Join Types and Conditions
Employee
EmpID  Name
1      Ferguson
2      Frost
3      Toyon

Projects
ProjectName     EmpID
X-63 Structure  1
X-64 Structure  1
X-63 Guidance   2
X-64 Guidance   2
X-63 Telemetry  3
X-64 Telemetry  3

Skills
Skill                EmpID
Mechanical Design    1
Aerodynamic Loading  1
Analog Design        2
Gyroscope Design     2
Digital Design       3
R/F Design           3

Union join of Employee (E), Projects (P) and Skills (S)
E.EmpID  Name      P.EmpID  ProjectName     S.EmpID  Skill
1        Ferguson  NULL     NULL            NULL     NULL
NULL     NULL      1        X-63 Structure  NULL     NULL
NULL     NULL      1        X-64 Structure  NULL     NULL
NULL     NULL      NULL     NULL            1        Mechanical Design
NULL     NULL      NULL     NULL            1        Aerodynamic Loading
2        Frost     NULL     NULL            NULL     NULL
NULL     NULL      2        X-63 Guidance   NULL     NULL
NULL     NULL      2        X-64 Guidance   NULL     NULL
NULL     NULL      NULL     NULL            2        Analog Design
NULL     NULL      NULL     NULL            2        Gyroscope Design
3        Toyon     NULL     NULL            NULL     NULL
NULL     NULL      3        X-63 Telemetry  NULL     NULL
NULL     NULL      3        X-64 Telemetry  NULL     NULL
NULL     NULL      NULL     NULL            3        Digital Design
NULL     NULL      NULL     NULL            3        R/F Design
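Union join is rarely implemented, so the following sketch emulates it for the Employee, Projects and Skills tables above with union all, padding each relation's rows with nulls in the other relations' columns. The emulation is an assumption chosen so the example runs in SQLite; it is not the syntax a union-join-capable DBMS would use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table employee (empid integer, name text);
    create table projects (projectname text, empid integer);
    create table skills (skill text, empid integer);
""")
conn.executemany("insert into employee values (?, ?)",
                 [(1, "Ferguson"), (2, "Frost"), (3, "Toyon")])
conn.executemany("insert into projects values (?, ?)",
                 [("X-63 Structure", 1), ("X-64 Structure", 1),
                  ("X-63 Guidance", 2), ("X-64 Guidance", 2),
                  ("X-63 Telemetry", 3), ("X-64 Telemetry", 3)])
conn.executemany("insert into skills values (?, ?)",
                 [("Mechanical Design", 1), ("Aerodynamic Loading", 1),
                  ("Analog Design", 2), ("Gyroscope Design", 2),
                  ("Digital Design", 3), ("R/F Design", 3)])

# No tuple ever matches a tuple of another relation (the "false" condition),
# so every row appears once, padded with nulls for the other relations.
union_join_rows = conn.execute("""
    select empid, name, null, null, null, null from employee
    union all
    select null, null, empid, projectname, null, null from projects
    union all
    select null, null, null, null, empid, skill from skills
""").fetchall()

print(len(union_join_rows))
for row in union_join_rows:
    print(row)
```

The result has one row per input tuple, 3 + 6 + 6 in total, and no row combines values from two relations.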
Natural Join
A: Student (2011 Batch)
ID  Name     Course Taken
12  Bala     CSMI23
23  Karthik  CSHO23

B: Student (2011 & 2012 Batch)
ID  Name     CGPA
23  Karthik  7
13  Kumar    8.9

A Cross Join B
A.ID  A.Name   Course Taken  B.ID  B.Name   CGPA
12    Bala     CSMI23        23    Karthik  7
12    Bala     CSMI23        13    Kumar    8.9
23    Karthik  CSHO23        23    Karthik  7
23    Karthik  CSHO23        13    Kumar    8.9

A Natural Join B (No Condition)
ID  Name     Course Taken  CGPA
23  Karthik  CSHO23        7
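The two results on this slide can be reproduced with a short sketch using Python's sqlite3 module. The table names a and b and the column name course_taken are assumptions standing in for the slide's A, B and "Course Taken".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table a (id integer, name text, course_taken text);
    create table b (id integer, name text, cgpa real);
    insert into a values (12, 'Bala', 'CSMI23'), (23, 'Karthik', 'CSHO23');
    insert into b values (23, 'Karthik', 7), (13, 'Kumar', 8.9);
""")

# Cross join: every tuple of a paired with every tuple of b.
cross_rows = conn.execute("""
    select a.id, a.name, a.course_taken, b.id, b.name, b.cgpa
    from a cross join b
""").fetchall()

# Natural join: tuples must agree on all common attributes (id and name).
natural_rows = conn.execute("""
    select id, name, course_taken, cgpa
    from a natural join b
""").fetchall()

print(len(cross_rows))
print(natural_rows)
```

Only the pairing in which both id and name agree survives the natural join, and the common attributes appear once in its result.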
Natural Join
A: Student (2011 Batch)
ID  Name     Course Taken
12  Bala     CSMI23
23  Karthik  CSHO23

B: Student (2011 & 2012 Batch)
ID  Name   CGPA
23  Selva  7
13  Kumar  8.9

A Cross Join B
A.ID  A.Name   Course Taken  B.ID  B.Name  CGPA
12    Bala     CSMI23        23    Selva   7
12    Bala     CSMI23        13    Kumar   8.9
23    Karthik  CSHO23        23    Selva   7
23    Karthik  CSHO23        13    Kumar   8.9

A Natural Join B (No Condition)
ID  Name  Course Taken  CGPA
(no rows, since no pair of tuples agrees on both ID and Name)
Nested Subqueries
• Subquery is a select-from-where expression that is nested within another
query
• Common use of subqueries is to perform tests for set membership, make
set comparisons, and determine cardinality
• Set Membership
- SQL allows testing tuples for membership in a relation
  - The in connective tests for set membership, where the set is
  a collection of values produced by a select clause
  - The not in connective tests for the absence of set
  membership
• in and not in operators can also be used on enumerated sets
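A minimal sketch of set membership with a subquery, run through Python's sqlite3 module. The depositor and borrower tables and their rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table depositor (customer_name text, account_number text);
    create table borrower (customer_name text, loan_number text);
    -- Hypothetical sample rows (assumption, not from the slides)
    insert into depositor values ('Hayes', 'A-102'),
                                 ('Johnson', 'A-101'),
                                 ('Jones', 'A-217');
    insert into borrower values ('Hayes', 'L-15'), ('Smith', 'L-23');
""")

# in: borrowers whose name is a member of the set of depositor names.
loan_and_account = conn.execute("""
    select distinct customer_name
    from borrower
    where customer_name in (select customer_name from depositor)
""").fetchall()

# not in: borrowers whose name is absent from that set.
loan_no_account = conn.execute("""
    select distinct customer_name
    from borrower
    where customer_name not in (select customer_name from depositor)
""").fetchall()

print(loan_and_account)
print(loan_no_account)
```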
Nested Subqueries
depositor
customer_name  ID
Bala           123
Sai            456
Jones          14

customer_name
Bala
Sai
Jones

customer_name
Jones
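The queries that produced the two result lists are not shown here, so the following is only one plausible pair, written as a sketch of in and not in over enumerated sets on the depositor table above. The chosen value lists are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table depositor (customer_name text, id integer);
    insert into depositor values ('Bala', 123), ('Sai', 456), ('Jones', 14);
""")

# in with an enumerated set: every listed name that occurs in depositor.
in_rows = conn.execute("""
    select customer_name
    from depositor
    where customer_name in ('Bala', 'Sai', 'Jones')
    order by customer_name
""").fetchall()

# not in with an enumerated set: names outside the listed values.
not_in_rows = conn.execute("""
    select customer_name
    from depositor
    where customer_name not in ('Bala', 'Sai')
    order by customer_name
""").fetchall()

print(in_rows)
print(not_in_rows)
```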
Nested Subqueries
• Test for Empty Relations
- SQL includes a feature for testing whether a subquery has any
tuples in its result
- exists construct returns the value true if the argument subquery is
nonempty
- Find all customers who have both an account and a loan at the bank
• Find all customers who have an account at all the branches located in
Brooklyn
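Both queries can be sketched with exists and not exists. The schema and rows below are hypothetical. For the "all branches in Brooklyn" query, this sketch uses the double not exists formulation ("there is no Brooklyn branch at which the customer has no account"), which is one common way to write it and may differ from the form shown in the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table branch (branch_name text, branch_city text);
    create table account (account_number text, branch_name text);
    create table depositor (customer_name text, account_number text);
    create table borrower (customer_name text, loan_number text);
    -- Hypothetical sample rows (assumption, not from the slides)
    insert into branch values ('Brighton', 'Brooklyn'),
                              ('Downtown', 'Brooklyn'),
                              ('Perryridge', 'Horseneck');
    insert into account values ('A-101', 'Downtown'), ('A-201', 'Brighton'),
                               ('A-102', 'Perryridge'), ('A-215', 'Brighton');
    insert into depositor values ('Johnson', 'A-101'), ('Johnson', 'A-201'),
                                 ('Hayes', 'A-102'), ('Smith', 'A-215');
    insert into borrower values ('Hayes', 'L-15'), ('Jones', 'L-17');
""")

# exists: true when the correlated subquery returns at least one tuple.
both = conn.execute("""
    select customer_name
    from borrower
    where exists (select *
                  from depositor
                  where depositor.customer_name = borrower.customer_name)
""").fetchall()

# Customers for whom no Brooklyn branch is missing from their accounts.
all_brooklyn = conn.execute("""
    select distinct s.customer_name
    from depositor as s
    where not exists (
        select *
        from branch as b
        where b.branch_city = 'Brooklyn'
          and not exists (
              select *
              from depositor as t join account as r
                   on t.account_number = r.account_number
              where t.customer_name = s.customer_name
                and r.branch_name = b.branch_name))
""").fetchall()

print(both)
print(all_brooklyn)
```

Johnson is the only customer holding accounts at both Brighton and Downtown, the two Brooklyn branches in this sample.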
Nested Subqueries
• Test for the Absence of Duplicate Tuples
- SQL includes a feature for testing whether a subquery has any
duplicate tuples in its result
- unique construct returns the value true if the argument subquery
contains no duplicate tuples
- Find all customers who have at most one account at the Perryridge
branch
• Find all customers who have at least two accounts at the Perryridge
branch
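SQLite, like many systems, does not implement the unique construct, so this sketch replaces the unique test with a count comparison. The two agree as long as the selected attribute contains no nulls; that substitution and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table account (account_number text, branch_name text);
    create table depositor (customer_name text, account_number text);
    -- Hypothetical sample rows (assumption, not from the slides)
    insert into account values ('A-102', 'Perryridge'), ('A-201', 'Perryridge'),
                               ('A-217', 'Perryridge'), ('A-101', 'Downtown');
    insert into depositor values ('Hayes', 'A-102'), ('Hayes', 'A-201'),
                                 ('Jones', 'A-217'), ('Johnson', 'A-101');
""")

# With unique, the first query would read:
#   ... where unique (select r.customer_name from account, depositor as r
#                     where ... and account.branch_name = 'Perryridge')
# and the second would use "not unique".

def customers_where(count_test):
    """count_test is a trusted literal such as '<= 1'; never pass user input."""
    sql = f"""
        select distinct t.customer_name
        from depositor as t
        where (select count(*)
               from account join depositor as r
                    on r.account_number = account.account_number
               where r.customer_name = t.customer_name
                 and account.branch_name = 'Perryridge') {count_test}
        order by t.customer_name
    """
    return [name for (name,) in conn.execute(sql)]

at_most_one = customers_where("<= 1")
at_least_two = customers_where(">= 2")

print(at_most_one)
print(at_least_two)
```

Johnson has no Perryridge account at all and still qualifies for "at most one", just as unique is true on an empty subquery result.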
Nested Subqueries
• Test for the Absence of Duplicate Tuples
- unique test on a relation is defined to fail if and only if the relation
contains two tuples t1 and t2 such that t1 = t2

Student
Name     ID   Dept
bala     123  CSE
bala     123  CSE
karthik  456  ECE
unique will be False

- Since the test t1 = t2 fails if any of the fields of t1 or t2 are null, it is
possible for unique to be true even if there are multiple copies of a
tuple, as long as at least one of the attributes of the tuple is null

Student
Name     ID   Dept
bala     123  NULL
bala     123  CSE
karthik  456  ECE
unique will be True
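The effect of nulls on the t1 = t2 test can be checked with a small sketch. The helper unique_holds is hypothetical: it counts pairs of distinct rows whose fields all compare equal under SQL rules, which mirrors the definition above but is not a built-in SQLite feature. The third call uses an extra illustrative case with two identical rows that both contain a null.

```python
import sqlite3

def unique_holds(rows):
    """Return True if no two rows satisfy t1 = t2 under SQL comparison rules."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table student (name text, id integer, dept text)")
    conn.executemany("insert into student values (?, ?, ?)", rows)
    # A comparison involving null is never true, so such pairs are not counted.
    (pairs,) = conn.execute("""
        select count(*)
        from student as t1 join student as t2
          on t1.rowid < t2.rowid
         and t1.name = t2.name and t1.id = t2.id and t1.dept = t2.dept
    """).fetchone()
    conn.close()
    return pairs == 0

first = unique_holds([("bala", 123, "CSE"), ("bala", 123, "CSE"),
                      ("karthik", 456, "ECE")])
second = unique_holds([("bala", 123, None), ("bala", 123, "CSE"),
                       ("karthik", 456, "ECE")])
# Illustrative extra case: two copies of a tuple that contains a null.
third = unique_holds([("bala", 123, None), ("bala", 123, None)])

print(first, second, third)
```

Python's None is stored as SQL null, so the third case shows two visually identical tuples that still pass the unique test.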
Nested Subqueries
• Set Comparison
- Ability of a nested subquery to compare sets
- Find the names of all branches that have assets greater than those
of at least one branch located in Brooklyn
- > some comparison in the where clause of the outer select is true if
the assets value of the tuple is greater than at least one member of
the set of all asset values for branches in Brooklyn
- SQL allows < some, <= some, >= some, = some and <> some
comparisons (<> means not equal to)
- = some is identical to in, whereas <> some is not the same as not in
- Keyword any is synonymous with some in SQL
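SQLite does not accept the some keyword, so this sketch rewrites each some comparison as an exists test over the same set; the rewrite and the branch rows are assumptions made for illustration. It also contrasts <> some with not in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table branch (branch_name text, branch_city text, assets integer);
    -- Hypothetical sample rows (assumption, not from the slides)
    insert into branch values ('Brighton', 'Brooklyn', 7100000),
                              ('Downtown', 'Brooklyn', 9000000),
                              ('Mianus', 'Horseneck', 400000),
                              ('Round Hill', 'Horseneck', 8000000);
""")

def names(sql):
    return [name for (name,) in conn.execute(sql)]

# assets > some (select assets from branch where branch_city = 'Brooklyn')
greater_than_some = names("""
    select t.branch_name from branch as t
    where exists (select * from branch as s
                  where s.branch_city = 'Brooklyn' and t.assets > s.assets)
    order by t.branch_name
""")

# assets <> some (...): differs from at least one Brooklyn asset value.
not_equal_some = names("""
    select t.branch_name from branch as t
    where exists (select * from branch as s
                  where s.branch_city = 'Brooklyn' and t.assets <> s.assets)
    order by t.branch_name
""")

# assets not in (...): differs from every Brooklyn asset value.
not_in = names("""
    select branch_name from branch
    where assets not in (select assets from branch
                         where branch_city = 'Brooklyn')
    order by branch_name
""")

print(greater_than_some)
print(not_equal_some)
print(not_in)
```

Because the Brooklyn set holds two different asset values, every branch differs from at least one of them, so <> some accepts all four branches while not in accepts only two.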
Nested Subqueries
• Set Comparison
- Find the names of all branches that have an asset value greater than
that of each branch in Brooklyn
- SQL also allows < all, <= all, >= all, = all, <> all comparisons
(<> means not equal to)
- <> all is identical to not in
• Find the branch that has the highest average balance
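SQLite does not accept the all keyword either, so the sketch below rewrites > all as a not exists test and >= all over averages as a comparison with the maximum average. Both rewrites assume the compared values contain no nulls, and the branch and account rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table branch (branch_name text, branch_city text, assets integer);
    create table account (account_number text, branch_name text,
                          balance integer);
    -- Hypothetical sample rows (assumption, not from the slides)
    insert into branch values ('Brighton', 'Brooklyn', 7100000),
                              ('Downtown', 'Brooklyn', 9000000),
                              ('Round Hill', 'Horseneck', 8000000),
                              ('North Town', 'Rye', 9500000);
    insert into account values ('A-101', 'Downtown', 500),
                               ('A-102', 'Perryridge', 400),
                               ('A-201', 'Perryridge', 900),
                               ('A-215', 'Mianus', 700);
""")

# assets > all (select assets from branch where branch_city = 'Brooklyn'):
# no Brooklyn branch has assets at or above this branch's assets.
greater_than_all = conn.execute("""
    select t.branch_name from branch as t
    where not exists (select * from branch as s
                      where s.branch_city = 'Brooklyn'
                        and t.assets <= s.assets)
""").fetchall()

# having avg(balance) >= all (select avg(balance) from account
#                             group by branch_name)
highest_average = conn.execute("""
    select branch_name
    from account
    group by branch_name
    having avg(balance) >= (select max(avg_balance)
                            from (select avg(balance) as avg_balance
                                  from account
                                  group by branch_name))
""").fetchall()

print(greater_than_all)
print(highest_average)
```

Aggregates cannot be nested directly (max(avg(balance)) is not allowed), which is why the highest average is found by comparing each branch's average against the set of all branch averages.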
THANK YOU