Study Guide

Week 5
Contents
Objectives
Introduction to Week 5
Textbook coverage
The Parts database
Introducing the Parts database
Understanding SQL queries – the basics
Query evaluation
Grouped queries
Explicitly grouped queries
Implicitly grouped queries
A motivating information request
Joining tables
Cartesian or implicit joins
Understanding explicit joins
SQL null – first encounter!
Answering complex information requests using views
Divide and conquer
SQL null – second encounter!
Answering complex information requests without views
Temporary tables
Common table expressions
Using a scalar-valued subquery in SELECT
Using a table-valued sub-query in FROM
Missing Information - the SQL null
Where did nulls come from?
SQL nulls – the good news!
SQL nulls – why all the fuss?
The effect of Unknown during query processing
Nulls produced by Aggregate Functions
Is one type of null enough?
Handling Nulls in SQL-92
Recommendation
Recursive queries
Introduction
Identifying a recursive query
Processing a recursive query
A user-defined function
Other useful things to know
Query optimisation
Relational algebra and SQL
Relational calculus and SQL
Working on Assignment 1
Making an early start on Assignment 2
Review Objectives
Solutions to exercises in module
Objectives
On completion of this module you should be able to:
• comment on the accuracy of describing SQL as a nonprocedural, declarative, or set-at-a-time language
• identify and describe the steps involved in evaluating an SQL query
• avoid errors when using grouped queries
• explain the difference between an explicitly grouped query and an implicitly grouped query, and provide examples of each
• state the rule that applies to a column that appears in the SELECT clause of an explicitly grouped query
• identify advantages and disadvantages of using SQL views to answer complex information requests
• explain how temporary tables or common table expressions provide an alternative to views when answering complex information requests
• explain the differences between the Cartesian join, the inner join, and the outer join of two tables
• use SQL inner joins and outer joins to answer information requests
• demonstrate how a scalar-valued subquery can be used in the SELECT clause of an SQL query
• demonstrate how a table-valued subquery can be used in the FROM clause of an SQL query
• explain why SQL includes a null, and describe problems associated with using a null to represent missing information
• explain why Ted Codd proposed two types of null
• explain why Chris Date rejected the proposal for two types of null
• identify conveniences and risks posed by the SQL null
• identify which of the aggregate functions MIN, MAX, COUNT, SUM, AVG can evaluate to null, and describe the conditions under which this occurs
• identify situations in which a null can arise during query processing
• describe SQL’s 3-valued logic system and explain how the value Unknown can arise during query processing
• identify and fix queries that may fail to fully answer an information request as a consequence of nulls that arise during query processing
• demonstrate how the COALESCE function can be used to manage nulls that arise when processing an SQL query
• describe the concept of a common table expression (CTE), and explain how a CTE can be used to express a recursive query
• explain how the recursive signature of self-reference appears in the SQL implementation of recursive queries
• explain the processing steps involved in producing a result for a recursive SQL query
• briefly explain the process of SQL query optimisation
• explain the relevance of relational algebra to the SQL user
• describe the extent to which SQL is based on relational calculus
Introduction to Week 5
The focus of interest this week is SQL queries.
This module starts with a revision of SQL query processing. A solid understanding of query processing is essential to avoid writing queries that produce incorrect results, and managers tend to be very trusting of numbers that emerge from computers.
One common cause of incorrect query results is a poor understanding of the SQL null. The SQL null takes us from the comfortable world of 2-valued logic (True and False) into the brave new world of 3-valued logic (True, False and Unknown).
The SQL query language supports a number of powerful features that you may not have met previously. We will explore some of them in this module.
As well as powerful queries, we are also interested in features of the SQL query language that help to simplify complex queries. Some features covered in this module will help.
With the topics covered this week you will be able to:
• finish the views needed for Assignment 1
• develop the queries needed for Assignment 2
• develop the CREATE ASSERTION statement for Assignment 2
Textbook coverage
The textbook Chapter 7 material of interest to this module is in the section New Forms of Join (p 234).
------------------------------------------------------------------------------------
Textbook
Chapter 7, pages 234 to 239
------------------------------------------------------------------------------------
The textbook Chapter 8 material of interest to this module is in the section Additional SQL Statements (p 266).
------------------------------------------------------------------------------------
Textbook
Chapter 8, pages 266 to 271
------------------------------------------------------------------------------------
The Parts database
One of the nice things about relational database technology is that its origins are so well defined. From your previous studies you will know that relational database technology is underpinned by a set of ideas called the relational model. The relational model was first proposed by E.F. Codd (http://en.wikipedia.org/wiki/Edgar_F._Codd) in 1969. The extract below comes from the March 2008 version of the Wikipedia article on the relational model (http://en.wikipedia.org/wiki/Relational_model):
“The relational model was invented by E.F. (Ted) Codd as a general model of data, and subsequently maintained and developed by Chris Date and Hugh Darwen among others. In The Third Manifesto (first published in 1995) Date and Darwen show how the relational model can accommodate certain desired object-oriented features without compromising its fundamental principles.”
As mentioned in a Wikipedia article on the topic (http://en.wikipedia.org/wiki/Sql), SQL has its critics. Chris Date, mentioned in the quote above, is a highly regarded author on relational database technology (http://en.wikipedia.org/wiki/Christopher_J._Date) and one of the most articulate of the SQL critics. As well as criticising the way SQL handles missing information, Chris Date’s “The Third Manifesto” also criticises the way that object-oriented features have been added to SQL.
Introducing the Parts database
Throughout his writings, Chris Date uses a collection of simple tables to
illustrate his ideas. We will make use of these tables throughout this
module, and collectively refer to them as the Parts database.
Chris Date has been using his example tables for a long time. Tables in
the Parts database hold data used by a manufacturer of punched card
sorters. Some students may need to refer to the Wikipedia article on
“computer programming in the punch card era” for an explanation. Our
Parts database consists of five tables:
• P – part types used by the manufacturer
• S – suppliers of parts used by the manufacturer
• SP – shipments of parts from suppliers
• J – projects to manufacture different products
• SPJ – parts used to manufacture products
Note: The literature is filled with suppliers and parts databases (http://en.wikipedia.org/wiki/Suppliers_and_Parts_database).
Columns, keys, and sample data for the Parts database follow...
Columns…
P table – describing part types used by the manufacturer:
PNO – unique part type number – px, say
PNAME – name of part type px
COLOR – colour of part type px
WEIGHT – weight in grams of a single part of type px
CITY – city where parts of type px are held
Note: Chris Date’s US spelling of COLOR will be used in this course
S table – describing part suppliers:
SNO – unique supplier number – sx, say
SNAME – name of supplier sx
STATUS – numeric status indicator of supplier sx
CITY – city where supplier sx is located
SP table – describing shipments of parts from suppliers:
SNO – supplier number – sx, say
PNO – part type number – px, say
QTY – number of px parts being shipped by supplier sx
J table – describing projects to manufacture different products:
JNO – unique project number – jx, say
JNAME – name of project jx
CITY – city in which project jx is conducted
SPJ table – describing parts used to manufacture products:
SNO – supplier number – sx, say
PNO – part type number – px, say
JNO – project number – jx, say
QTY – number of px parts from supplier sx used on project jx
Primary and foreign keys…
P (PNO, PNAME, COLOR, WEIGHT, CITY) – primary key PNO
S (SNO, SNAME, STATUS, CITY) – primary key SNO
SP (SNO, PNO, QTY) – primary key (SNO, PNO)
SNO references S, PNO references P
J (JNO, JNAME, CITY) – primary key JNO
SPJ (SNO, PNO, JNO, QTY) – primary key (SNO, PNO, JNO)
SNO references S, PNO references P, JNO references J
Sample data…
P table – describing part types used by the manufacturer:
PNO   PNAME   COLOR   WEIGHT   CITY
P1    Nut     Red     12       London
P2    Bolt    Green   17       Paris
P3    Screw   Blue    17       Rome
P4    Screw   Red     14       London
P5    Cam     Blue    12       Paris
P6    Cog     Red     19       London
S table – describing part suppliers:
SNO   SNAME   STATUS   CITY
S1    Smith   20       London
S2    Jones   10       Paris
S3    Blake   30       Paris
S4    Clark   20       London
S5    Adams   30       Athens
SP table – describing shipments of parts from suppliers:
SNO   PNO   QTY
S1    P1    300
S1    P2    200
S1    P3    400
S1    P4    200
S1    P5    100
S1    P6    100
S2    P1    300
S2    P2    400
S3    P2    200
S4    P2    200
S4    P4    300
S4    P5    400
Notes:
• each row describes the number of parts of a given type currently being shipped by a given supplier
• a given part type can be supplied by more than one supplier
J table – describing projects to manufacture different products:
JNO   JNAME    CITY
J1    Sorter   Paris
J2    Sorter   Rome
J3    Sorter   Athens
J4    Sorter   Athens
J5    Sorter   London
J6    Sorter   Oslo
J7    Sorter   London
SPJ table – describing parts used to manufacture products:
SNO   PNO   JNO   QTY
S1    P1    J1    200
S1    P1    J4    700
S2    P3    J1    400
S2    P3    J2    200
S2    P3    J3    200
S2    P3    J4    500
S2    P3    J5    600
S2    P3    J6    400
S2    P3    J7    800
S2    P5    J2    100
S3    P3    J1    200
S3    P4    J2    500
S4    P6    J3    300
S4    P6    J7    300
S5    P1    J4    100
S5    P2    J2    200
S5    P2    J4    100
S5    P3    J4    200
S5    P4    J4    800
S5    P5    J4    400
S5    P5    J5    500
S5    P5    J7    100
S5    P6    J2    200
S5    P6    J4    500
Note: Each row describes the number of parts of a given type from a
given supplier used on a given project.
Important: The course web site will provide a Microsoft Access and
SQL Server implementation of the Parts database.
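If neither Access nor SQL Server is at hand, the Parts database can be approximated in SQLite through Python's standard sqlite3 module. The sketch below is one way to do that; SQLite is only a stand-in for the course DBMS, the column types (TEXT, INTEGER) are assumptions, and so are the primary keys (PNO, SNO, JNO, (SNO, PNO) and (SNO, PNO, JNO)). SQLite checks REFERENCES clauses only when PRAGMA foreign_keys is switched on.

```python
import sqlite3

# Sample data copied from the tables above.
P = [("P1", "Nut", "Red", 12, "London"), ("P2", "Bolt", "Green", 17, "Paris"),
     ("P3", "Screw", "Blue", 17, "Rome"), ("P4", "Screw", "Red", 14, "London"),
     ("P5", "Cam", "Blue", 12, "Paris"), ("P6", "Cog", "Red", 19, "London")]
S = [("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris"),
     ("S3", "Blake", 30, "Paris"), ("S4", "Clark", 20, "London"),
     ("S5", "Adams", 30, "Athens")]
J = [("J1", "Sorter", "Paris"), ("J2", "Sorter", "Rome"),
     ("J3", "Sorter", "Athens"), ("J4", "Sorter", "Athens"),
     ("J5", "Sorter", "London"), ("J6", "Sorter", "Oslo"),
     ("J7", "Sorter", "London")]
SP = [("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400), ("S1", "P4", 200),
      ("S1", "P5", 100), ("S1", "P6", 100), ("S2", "P1", 300), ("S2", "P2", 400),
      ("S3", "P2", 200), ("S4", "P2", 200), ("S4", "P4", 300), ("S4", "P5", 400)]
SPJ = [("S1", "P1", "J1", 200), ("S1", "P1", "J4", 700), ("S2", "P3", "J1", 400),
       ("S2", "P3", "J2", 200), ("S2", "P3", "J3", 200), ("S2", "P3", "J4", 500),
       ("S2", "P3", "J5", 600), ("S2", "P3", "J6", 400), ("S2", "P3", "J7", 800),
       ("S2", "P5", "J2", 100), ("S3", "P3", "J1", 200), ("S3", "P4", "J2", 500),
       ("S4", "P6", "J3", 300), ("S4", "P6", "J7", 300), ("S5", "P1", "J4", 100),
       ("S5", "P2", "J2", 200), ("S5", "P2", "J4", 100), ("S5", "P3", "J4", 200),
       ("S5", "P4", "J4", 800), ("S5", "P5", "J4", 400), ("S5", "P5", "J5", 500),
       ("S5", "P5", "J7", 100), ("S5", "P6", "J2", 200), ("S5", "P6", "J4", 500)]

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite ignores REFERENCES unless enabled
con.executescript("""
CREATE TABLE P   (PNO TEXT PRIMARY KEY, PNAME TEXT, COLOR TEXT,
                  WEIGHT INTEGER, CITY TEXT);
CREATE TABLE S   (SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT);
CREATE TABLE J   (JNO TEXT PRIMARY KEY, JNAME TEXT, CITY TEXT);
CREATE TABLE SP  (SNO TEXT REFERENCES S, PNO TEXT REFERENCES P, QTY INTEGER,
                  PRIMARY KEY (SNO, PNO));
CREATE TABLE SPJ (SNO TEXT REFERENCES S, PNO TEXT REFERENCES P,
                  JNO TEXT REFERENCES J, QTY INTEGER,
                  PRIMARY KEY (SNO, PNO, JNO));
""")
# Load parent tables first, so every foreign key value already exists.
for name, rows in [("P", P), ("S", S), ("J", J), ("SP", SP), ("SPJ", SPJ)]:
    marks = ", ".join("?" * len(rows[0]))
    con.executemany(f"INSERT INTO {name} VALUES ({marks})", rows)

counts = {t: con.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
          for t in ("P", "S", "SP", "J", "SPJ")}
print(counts)  # {'P': 6, 'S': 5, 'SP': 12, 'J': 7, 'SPJ': 24}

# The composite primary key of SP rejects a second row for S1 and P1.
try:
    con.execute("INSERT INTO SP VALUES ('S1', 'P1', 50)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Connecting to a file name instead of ":memory:" would keep the database between runs.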
Understanding SQL queries – the basics
Query evaluation
Understanding the anatomy of an SQL query will help you avoid producing erroneous results. It will also help you debug erroneous results produced by others. The result of a query is determined by five clauses:
<SELECT clause>
<FROM clause>
[<WHERE clause>]      (optional)
[<GROUP BY clause>]   (optional)
[<HAVING clause>]     (optional; without GROUP BY, the whole table is treated as a single group)
Note: Queries can also include an ORDER BY clause and/or a DISTINCT modifier. ORDER BY affects only the presentation of the result (the order of its rows), not its content. DISTINCT is applied to the rows produced by the SELECT clause and removes duplicate rows.
To avoid errors, you must understand how a query result is produced – at least conceptually. A DBMS is not obliged to evaluate a query as described below. However, the result must be the same as that produced by the following steps:
Step 1: evaluate the table specified in the FROM clause
Step 2 (optional): filter rows as specified in the WHERE clause
Step 3 (optional): form groups as specified in the GROUP BY clause
Step 4 (optional): filter groups as specified in the HAVING clause
Step 5 (grouped query): evaluate the SELECT clause to produce one output row for each group surviving the HAVING clause (if specified)
Step 5 (ungrouped query): evaluate the SELECT clause to produce one output row for each row surviving the WHERE clause (if specified)
Given the description above, you might think that the SELECT clause should be placed at the end of the query. Two reasons the designers placed SELECT at the start are:
• to produce a more “structured English query language”
• relational calculus is a “results first” language (more later)
Important points:
• the FROM clause of every query evaluates to a single table
• queries are either grouped or ungrouped
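The five conceptual steps can be mirrored in a few lines of plain Python. The sketch below is only an illustration: it evaluates one grouped query over the S sample data step by step (the WHERE and HAVING conditions were invented for the example) and compares the answer with SQLite, used here as a stand-in DBMS via Python's sqlite3 module. The DBMS is free to evaluate the query differently internally; only the result must agree.

```python
import sqlite3
from collections import defaultdict

S = [("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris"),
     ("S3", "Blake", 30, "Paris"), ("S4", "Clark", 20, "London"),
     ("S5", "Adams", 30, "Athens")]

# Conceptual evaluation of:
#   SELECT CITY, COUNT(*) FROM S WHERE STATUS > 10
#   GROUP BY CITY HAVING COUNT(*) > 1;
rows = list(S)                                   # Step 1: FROM
rows = [r for r in rows if r[2] > 10]            # Step 2: WHERE STATUS > 10
groups = defaultdict(list)                       # Step 3: GROUP BY CITY
for r in rows:
    groups[r[3]].append(r)
groups = {city: g for city, g in groups.items() if len(g) > 1}  # Step 4: HAVING
manual = sorted((city, len(g)) for city, g in groups.items())   # Step 5: SELECT

# The same query run by the DBMS must produce the same rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE S (SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT)")
con.executemany("INSERT INTO S VALUES (?, ?, ?, ?)", S)
dbms = con.execute("""SELECT CITY, COUNT(*) FROM S WHERE STATUS > 10
                      GROUP BY CITY HAVING COUNT(*) > 1
                      ORDER BY CITY""").fetchall()
print(manual, dbms)  # [('London', 2)] [('London', 2)]
```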
Grouped queries
As mentioned above, queries are either grouped or ungrouped. A simple ungrouped query is:
SELECT *
FROM   S;
A simple grouped query is:
SELECT CITY, COUNT(SNO)
FROM   S
GROUP BY CITY;
For a grouped query, a single output row is produced for each group that
survives the HAVING clause (if specified). For the query above, a group
is formed for each CITY value in the table. The output row includes the
CITY value for the group, plus a count of rows in the group.
Exercise 1
------------------------------------------------------------------------------------
The query below produces an error. Why? Try to answer this question before continuing.
SELECT CITY, SNAME, COUNT(*)
FROM   S
GROUP BY CITY;
------------------------------------------------------------------------------------
The query above produces the following error message from SQL Server 2005. Does this make sense?
Column 'S.SNAME' is invalid in the select list because it
is not contained in either an aggregate function or the
GROUP BY clause.
If we modify the query as shown below, the query does not produce an error. Here, SNAME is “contained in an aggregate function”.
SELECT CITY, COUNT(SNAME), COUNT(*)
FROM   S
GROUP BY CITY;
The query processor is happy now since, for each group, it is in a position
to produce a single output value for each expression in the SELECT
clause. Previously, it had a dilemma – a group might have more than one
SNAME value. How would it decide which SNAME to display?
Explicitly grouped queries
A query that includes a GROUP BY clause is an explicitly grouped query.
The rule for a column that appears in the SELECT clause of an explicitly
grouped query is that it must either:
(1) be “contained” in an aggregate function in the SELECT clause, or
(2) be “contained” in the GROUP BY clause of the query
A more formal description of (1) above would be to say that the column must appear as an argument to an aggregate function. The bottom line here is that the SELECT clause must produce a single result row for each group.
For a column that appears in the GROUP BY clause, all rows in a group
will have the same value for that column. Consequently, that value may
appear in the result row for the group – it does not need to appear as the
argument to an aggregate function in the SELECT clause.
For a column that does not appear in the GROUP BY clause, the rows in
a group may hold different values for that column. Consequently, if that
column appears in the SELECT clause, it must appear as an argument to
an aggregate function (COUNT, say) – to produce a single output value
for the set of input column values.
Implicitly grouped queries
As well as explicitly grouped queries, SQL also supports implicitly
grouped queries. Consider the example below. This query does not
include a GROUP BY clause. However, it produces a single output row.
SELECT COUNT(*) FROM S;
The query above is a grouped query. Rows in the S table are treated as a
single group to produce one result row for the query. Indeed, the above
query is equivalent to the one below:
SELECT COUNT(*) FROM S GROUP BY ();
The query below is also a grouped query.
SELECT SNAME, COUNT(*) FROM S;
SQL Server produces the following error message when presented with
the above query:
Column 'S.SNAME' is invalid in the select list because it
is not contained in either an aggregate function or the
GROUP BY clause.
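One way to see the difference between implicit and explicit grouping is to give a query an empty input. The sketch below uses SQLite through Python's sqlite3 module (an assumption; it is not the course DBMS) with the S sample data. With no GROUP BY, the rows surviving WHERE still form one group even when there are none, so COUNT(*) returns a single row containing 0; with GROUP BY CITY, no rows means no groups and therefore no output rows. Two caveats: SQLite does not support the GROUP BY () form, and, unlike SQL Server, it accepts SELECT SNAME, COUNT(*) FROM S and returns an SNAME taken from one of the rows, so it is not a good place to test the rule for SELECT columns.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE S (SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT)")
con.executemany("INSERT INTO S VALUES (?, ?, ?, ?)", [
    ("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"), ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens")])

# Implicitly grouped: the whole table is one group, so one output row.
whole = con.execute("SELECT COUNT(*) FROM S").fetchall()
# Still one output row (a count of 0) when WHERE removes every row.
implicit_empty = con.execute(
    "SELECT COUNT(*) FROM S WHERE CITY = 'Oslo'").fetchall()
# Explicitly grouped: no surviving rows means no groups and no output rows.
explicit_empty = con.execute(
    "SELECT CITY, COUNT(*) FROM S WHERE CITY = 'Oslo' GROUP BY CITY").fetchall()
print(whole, implicit_empty, explicit_empty)  # [(5,)] [(0,)] []
```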
A motivating information request
In the following sections we will use a single information request to motivate our exploration of SQL queries:
For each and every supplier described in the Parts database, find the sum of the number of parts supplied in the past (already used on projects) and the number of parts they have been requested to supply in the future (currently being shipped).
To answer this request, data must be obtained from the SP and SPJ tables.
The SPJ table describes parts that have been used in the past. The SP
table describes parts that are currently being shipped.
A summary of suppliers’ parts “used in the past” is obtained from the
query below, producing the following result:
SELECT SNO, SUM(QTY)
FROM   SPJ
GROUP BY SNO;
SNO   SUM(QTY)
S1    900
S2    3200
S3    700
S4    600
S5    3100
A summary of suppliers’ parts “currently being shipped” is obtained from the query below, producing the following result:
SELECT SNO, SUM(QTY)
FROM   SP
GROUP BY SNO;
SNO   SUM(QTY)
S1    1300
S2    700
S3    200
S4    900
We would like to combine this data to produce the following result.
SNO   TOTAL
S1    2200
S2    3900
S3    900
S4    1500
S5    3100
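Before writing SQL that combines the two summaries, it can help to pin down exactly what the answer should be. The sketch below uses SQLite via Python's sqlite3 module as a stand-in DBMS, with only the columns the queries need (S holds just SNO, an assumption for brevity). It runs the two summary queries and then combines them in Python, treating a supplier that is missing from a summary as contributing 0. Combining outside SQL is for checking only; the following sections do it in SQL.

```python
import sqlite3

SP = [("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400), ("S1", "P4", 200),
      ("S1", "P5", 100), ("S1", "P6", 100), ("S2", "P1", 300), ("S2", "P2", 400),
      ("S3", "P2", 200), ("S4", "P2", 200), ("S4", "P4", 300), ("S4", "P5", 400)]
SPJ = [("S1", "P1", "J1", 200), ("S1", "P1", "J4", 700), ("S2", "P3", "J1", 400),
       ("S2", "P3", "J2", 200), ("S2", "P3", "J3", 200), ("S2", "P3", "J4", 500),
       ("S2", "P3", "J5", 600), ("S2", "P3", "J6", 400), ("S2", "P3", "J7", 800),
       ("S2", "P5", "J2", 100), ("S3", "P3", "J1", 200), ("S3", "P4", "J2", 500),
       ("S4", "P6", "J3", 300), ("S4", "P6", "J7", 300), ("S5", "P1", "J4", 100),
       ("S5", "P2", "J2", 200), ("S5", "P2", "J4", 100), ("S5", "P3", "J4", 200),
       ("S5", "P4", "J4", 800), ("S5", "P5", "J4", 400), ("S5", "P5", "J5", 500),
       ("S5", "P5", "J7", 100), ("S5", "P6", "J2", 200), ("S5", "P6", "J4", 500)]

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE S   (SNO TEXT PRIMARY KEY);  -- only the column needed here
CREATE TABLE SP  (SNO TEXT, PNO TEXT, QTY INTEGER, PRIMARY KEY (SNO, PNO));
CREATE TABLE SPJ (SNO TEXT, PNO TEXT, JNO TEXT, QTY INTEGER,
                  PRIMARY KEY (SNO, PNO, JNO));
""")
con.executemany("INSERT INTO S VALUES (?)", [(f"S{i}",) for i in range(1, 6)])
con.executemany("INSERT INTO SP VALUES (?, ?, ?)", SP)
con.executemany("INSERT INTO SPJ VALUES (?, ?, ?, ?)", SPJ)

used = dict(con.execute("SELECT SNO, SUM(QTY) FROM SPJ GROUP BY SNO"))
ordered = dict(con.execute("SELECT SNO, SUM(QTY) FROM SP GROUP BY SNO"))
snos = [r[0] for r in con.execute("SELECT SNO FROM S ORDER BY SNO")]

# A supplier missing from a summary (S5 has no SP rows) contributes 0.
target = {sno: used.get(sno, 0) + ordered.get(sno, 0) for sno in snos}
print(target)  # {'S1': 2200, 'S2': 3900, 'S3': 900, 'S4': 1500, 'S5': 3100}
```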
Joining tables
Cartesian or implicit joins
To answer our information request, data must be drawn from the SP and
SPJ tables. A novice SQL user may think that the query below might
answer the request. It does not. Try it!
SELECT SP.SNO, SUM(SP.QTY) + SUM(SPJ.QTY) AS TOTAL
FROM   SP, SPJ
WHERE  SP.SNO = SPJ.SNO
GROUP BY SP.SNO;
Exercise 2
------------------------------------------------------------------------------------
Why does the above query produce the wrong result? Spend a few minutes trying to answer this question before continuing.
------------------------------------------------------------------------------------
We can explore the table resulting from the FROM clause using the query below. This query produces 288 rows. Does that make sense?
SELECT *
FROM   SP, SPJ;
The FROM clause specifies a Cartesian join of tables SP and SPJ: 12 rows in SP and 24 rows in SPJ give 12 × 24 = 288 rows in the result, so that does make sense! We can explore the filter in the WHERE clause using the query below. This query produces 36 rows. Does that make sense?
SELECT *
FROM   SP, SPJ
WHERE  SP.SNO = SPJ.SNO;
Every SP row with one or more related SPJ rows (same SNO value) will
appear joined to those rows in the result. That explains why we get 36
rows in the result. But, why does that lead to such large TOTAL values?
Let’s put a couple of classic problem-solving techniques to work here:
• break complex problems into smaller, simpler problems
• investigate specific cases (similar to debugging a program)
A good case to consider here is S3. The correct TOTAL value for S3 is 900, but we get 1100 from our novice query. Let’s investigate this case using the query below, which produces the following result. Can you see where the 1100 comes from?
SELECT *
FROM   SP, SPJ
WHERE  SP.SNO = SPJ.SNO AND SP.SNO = 'S3';
SP.SNO   SP.PNO   SP.QTY   SPJ.SNO   SPJ.PNO   SPJ.JNO   SPJ.QTY
S3       P2       200      S3        P3        J1        200
S3       P2       200      S3        P4        J2        500
The 1100 comes from adding the SP.QTY and SPJ.QTY values in both of these rows: (200 + 200) + (200 + 500) = 1100. The single SP row for S3 (QTY 200) has been joined to each of the two related SPJ rows, so its quantity is counted twice.
A good understanding of table joins will help you to avoid such errors.
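The row counts discussed above, and the inflated totals, can be reproduced in SQLite (a stand-in for SQL Server, accessed through Python's sqlite3 module) using the SP and SPJ sample data. An ORDER BY has been added so the output order is predictable.

```python
import sqlite3

SP = [("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400), ("S1", "P4", 200),
      ("S1", "P5", 100), ("S1", "P6", 100), ("S2", "P1", 300), ("S2", "P2", 400),
      ("S3", "P2", 200), ("S4", "P2", 200), ("S4", "P4", 300), ("S4", "P5", 400)]
SPJ = [("S1", "P1", "J1", 200), ("S1", "P1", "J4", 700), ("S2", "P3", "J1", 400),
       ("S2", "P3", "J2", 200), ("S2", "P3", "J3", 200), ("S2", "P3", "J4", 500),
       ("S2", "P3", "J5", 600), ("S2", "P3", "J6", 400), ("S2", "P3", "J7", 800),
       ("S2", "P5", "J2", 100), ("S3", "P3", "J1", 200), ("S3", "P4", "J2", 500),
       ("S4", "P6", "J3", 300), ("S4", "P6", "J7", 300), ("S5", "P1", "J4", 100),
       ("S5", "P2", "J2", 200), ("S5", "P2", "J4", 100), ("S5", "P3", "J4", 200),
       ("S5", "P4", "J4", 800), ("S5", "P5", "J4", 400), ("S5", "P5", "J5", 500),
       ("S5", "P5", "J7", 100), ("S5", "P6", "J2", 200), ("S5", "P6", "J4", 500)]

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE SP  (SNO TEXT, PNO TEXT, QTY INTEGER, PRIMARY KEY (SNO, PNO));
CREATE TABLE SPJ (SNO TEXT, PNO TEXT, JNO TEXT, QTY INTEGER,
                  PRIMARY KEY (SNO, PNO, JNO));
""")
con.executemany("INSERT INTO SP VALUES (?, ?, ?)", SP)
con.executemany("INSERT INTO SPJ VALUES (?, ?, ?, ?)", SPJ)

cartesian = con.execute("SELECT COUNT(*) FROM SP, SPJ").fetchone()[0]
filtered = con.execute(
    "SELECT COUNT(*) FROM SP, SPJ WHERE SP.SNO = SPJ.SNO").fetchone()[0]
# The novice query: each group holds every SP row paired with every SPJ row.
novice = con.execute("""SELECT SP.SNO, SUM(SP.QTY) + SUM(SPJ.QTY) AS TOTAL
                        FROM SP, SPJ
                        WHERE SP.SNO = SPJ.SNO
                        GROUP BY SP.SNO
                        ORDER BY SP.SNO""").fetchall()
print(cartesian, filtered)  # 288 36
print(novice)  # [('S1', 8000), ('S2', 12000), ('S3', 1100), ('S4', 3600)]
```

Within each supplier's group, every SP.QTY value is repeated once for each related SPJ row, and every SPJ.QTY value once for each related SP row. For S1 that gives 2 × 1300 + 6 × 900 = 8000 instead of 2200, and S5, which has no SP rows, disappears altogether.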
Understanding explicit joins
A Cartesian join is sometimes referred to as an implicit join – the FROM
clause does not include the word JOIN. Explicit joins do include the
word JOIN. The following three queries produce the same result.
SELECT *
FROM   S, SP
WHERE  S.SNO = SP.SNO;
SELECT *
FROM   S JOIN SP ON S.SNO = SP.SNO;
SELECT *
FROM   S INNER JOIN SP ON S.SNO = SP.SNO;
The result produced by these queries is known as the inner join of S and SP on SNO. Using the sample data, the result has 12 rows – one for each row in the SP table. Each SP row is joined to the one related row in S.
The second query above illustrates that the default explicit join is the inner join. As well as “inner joins”, SQL has “outer joins”. An outer join preserves rows that have no related row in the other table.
In the sample data, there is no SP row with an SNO value of S5. As such,
S5 does not appear in the inner join of S and SP. If we wish to obtain the
number of part types currently being shipped by each and every supplier,
we cannot use the query below – S5 does not appear in the result.
SELECT S.SNO, SNAME, COUNT(PNO) AS NbrPartTypes
FROM   S INNER JOIN SP ON S.SNO = SP.SNO
GROUP BY S.SNO, SNAME;
However, if we change the inner join to an outer join, we do get the required result.
SELECT S.SNO, SNAME, COUNT(PNO) AS NbrPartTypes
FROM   S LEFT OUTER JOIN SP ON S.SNO = SP.SNO
GROUP BY S.SNO, SNAME;
SNO   SNAME   NbrPartTypes
S1    Smith   6
S2    Jones   2
S3    Blake   1
S4    Clark   3
S5    Adams   0
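These claims can be checked in SQLite, used here as a stand-in DBMS through Python's sqlite3 module: the three inner-join forms each give 12 rows, and only the LEFT OUTER JOIN version of the count query includes S5. COUNT(PNO) gives 0 for S5 because COUNT of a column ignores nulls, and PNO is null in the row the outer join adds for S5. The ORDER BY clauses are additions for a predictable order.

```python
import sqlite3

S = [("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris"),
     ("S3", "Blake", 30, "Paris"), ("S4", "Clark", 20, "London"),
     ("S5", "Adams", 30, "Athens")]
SP = [("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400), ("S1", "P4", 200),
      ("S1", "P5", 100), ("S1", "P6", 100), ("S2", "P1", 300), ("S2", "P2", 400),
      ("S3", "P2", 200), ("S4", "P2", 200), ("S4", "P4", 300), ("S4", "P5", 400)]

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE S  (SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT);
CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER, PRIMARY KEY (SNO, PNO));
""")
con.executemany("INSERT INTO S VALUES (?, ?, ?, ?)", S)
con.executemany("INSERT INTO SP VALUES (?, ?, ?)", SP)

# The three equivalent inner-join forms from the text.
forms = ["SELECT * FROM S, SP WHERE S.SNO = SP.SNO",
         "SELECT * FROM S JOIN SP ON S.SNO = SP.SNO",
         "SELECT * FROM S INNER JOIN SP ON S.SNO = SP.SNO"]
sizes = [len(con.execute(q).fetchall()) for q in forms]

count_query = """SELECT S.SNO, SNAME, COUNT(PNO) AS NbrPartTypes
                 FROM S {join} SP ON S.SNO = SP.SNO
                 GROUP BY S.SNO, SNAME
                 ORDER BY S.SNO"""
inner = con.execute(count_query.format(join="INNER JOIN")).fetchall()
outer = con.execute(count_query.format(join="LEFT OUTER JOIN")).fetchall()
print(sizes)  # [12, 12, 12]
print(inner)  # S1 to S4 only
print(outer)  # as inner, plus ('S5', 'Adams', 0)
```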
Let’s explore the outer join…
The query below displays the left outer join of S and SP on SNO. The result has 13 rows – 12 rows from the inner join, plus the S5 row from S joined to a row of nulls.
SELECT *
FROM   S LEFT OUTER JOIN SP ON S.SNO = SP.SNO;
S.SNO   SNAME   STATUS   CITY     SP.SNO   PNO    QTY
S1      Smith   20       London   S1       P1     300
S1      Smith   20       London   S1       P2     200
S1      Smith   20       London   S1       P3     400
S1      Smith   20       London   S1       P4     200
S1      Smith   20       London   S1       P5     100
S1      Smith   20       London   S1       P6     100
S2      Jones   10       Paris    S2       P1     300
S2      Jones   10       Paris    S2       P2     400
S3      Blake   30       Paris    S3       P2     200
S4      Clark   20       London   S4       P2     200
S4      Clark   20       London   S4       P4     300
S4      Clark   20       London   S4       P5     400
S5      Adams   30       Athens   NULL     NULL   NULL
The outer join comes in three flavours – LEFT, RIGHT and FULL. The
LEFT join preserves rows from the table on the left, the RIGHT join
preserves rows from the table on the right, and the FULL join preserves
rows from both tables.
The keyword OUTER is optional. The following queries are equivalent.
SELECT *
FROM   S LEFT OUTER JOIN SP ON S.SNO = SP.SNO;
SELECT *
FROM   S LEFT JOIN SP ON S.SNO = SP.SNO;
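In SQLite (again a stand-in for SQL Server, accessed through Python's sqlite3 module) the left outer join returns the same 13 rows, with the nulls in the S5 row appearing as Python None. An ORDER BY has been added so that the row order is predictable, and the comparison at the end confirms that LEFT JOIN and LEFT OUTER JOIN give the same rows.

```python
import sqlite3

S = [("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris"),
     ("S3", "Blake", 30, "Paris"), ("S4", "Clark", 20, "London"),
     ("S5", "Adams", 30, "Athens")]
SP = [("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400), ("S1", "P4", 200),
      ("S1", "P5", 100), ("S1", "P6", 100), ("S2", "P1", 300), ("S2", "P2", 400),
      ("S3", "P2", 200), ("S4", "P2", 200), ("S4", "P4", 300), ("S4", "P5", 400)]

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE S  (SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT);
CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER, PRIMARY KEY (SNO, PNO));
""")
con.executemany("INSERT INTO S VALUES (?, ?, ?, ?)", S)
con.executemany("INSERT INTO SP VALUES (?, ?, ?)", SP)

left_outer = con.execute("""SELECT * FROM S LEFT OUTER JOIN SP ON S.SNO = SP.SNO
                            ORDER BY S.SNO, SP.PNO""").fetchall()
left = con.execute("""SELECT * FROM S LEFT JOIN SP ON S.SNO = SP.SNO
                      ORDER BY S.SNO, SP.PNO""").fetchall()
print(len(left_outer))     # 13
print(left_outer[-1])      # ('S5', 'Adams', 30, 'Athens', None, None, None)
print(left == left_outer)  # True
```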
SQL null – first encounter!
As mentioned previously, SQL has been criticised for the way it handles
missing information. In SQL, missing information is represented by a
null. We will explore why SQL nulls have attracted so much attention
later. By way of an introduction to the topic however, see if you can
predict the NbrParts value for S5 produced by the query below.
Exercise 3
------------------------------------------------------------------------------------
What NbrParts value would you anticipate for S5 from the query below?
SELECT S.SNO, SNAME, SUM(QTY) AS NbrParts
FROM   S LEFT JOIN SP ON S.SNO = SP.SNO
GROUP BY S.SNO, SNAME;
Now, check your answer.
------------------------------------------------------------------------------------
Answering complex information requests using views
Views can help to solve complex information requests by breaking the request down into a collection of smaller, simpler requests. The method of layering built-in functions described in the textbook (p 244) illustrates the idea.
Divide and conquer
SQL views can be used to break our information request (repeated below)
into three simpler information requests.
For each and every supplier described in the Parts database, find the sum of the number of parts supplied in the past (already used on projects) and the number of parts they have been requested to supply in the future (currently being shipped).
If we create two views summarising data from SP and SPJ (shown below), perhaps we can join these views to produce the required result.
CREATE VIEW V1(SNO, ORDERED) AS
SELECT SNO, SUM(QTY)
FROM   SP
GROUP BY SNO;
CREATE VIEW V2(SNO, USED) AS
SELECT SNO, SUM(QTY)
FROM   SPJ
GROUP BY SNO;
Queries below will return the contents of V1 and V2.
SELECT *
FROM   V1;
SNO   ORDERED
S1    1300
S2    700
S3    200
S4    900
SELECT *
FROM   V2;
SNO   USED
S1    900
S2    3200
S3    700
S4    600
S5    3100
Exercise 4
------------------------------------------------------------------------------------
Create V1 and V2 as suggested above.
Now, formulate a query to answer our information request using these two views.
Note: In Microsoft Access you create queries, not views.
------------------------------------------------------------------------------------
Joining V1 and V2 using the query below produces the following result.
SELECT V1.SNO, ORDERED + USED AS TOTAL
FROM   V1, V2
WHERE  V1.SNO = V2.SNO;
SNO   TOTAL
S1    2200
S2    3900
S3    900
S4    1500
Exercise 5
------------------------------------------------------------------------------------
Explain why S5 is missing from the result.
Is the result different using INNER JOIN rather than a Cartesian join?
Is the result different using an OUTER JOIN?
Can an OUTER JOIN be used in V1 to include S5 in the result?
If an outer join is used in V1, what result is produced by the query above?
------------------------------------------------------------------------------------
SQL null – second encounter!
We can use outer joins in the definition of V1 and V2. One problem with
this approach is that nulls may appear in the result. We will return to this
issue later.
Another way to include all SNO values in V1 and V2 is shown below.
Exercise 6
------------------------------------------------------------------------------------
Redefine V1 and V2 as shown below. Note: A DBMS limited to SQL-86 will not support UNION in the definition of a view.
CREATE VIEW V1(SNO, ORDERED) AS
SELECT SNO, SUM(QTY)
FROM   SP
GROUP BY SNO
UNION
SELECT SNO, 0
FROM   S
WHERE  SNO NOT IN (SELECT SNO FROM SP);
CREATE VIEW V2(SNO, USED) AS
SELECT SNO, SUM(QTY)
FROM   SPJ
GROUP BY SNO
UNION
SELECT SNO, 0
FROM   S
WHERE  SNO NOT IN (SELECT SNO FROM SPJ);
With V1 and V2 defined as above, check that we can join V1 and V2 to produce the required result.
------------------------------------------------------------------------------------
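The union-based views can also be tried in SQLite, used here as a stand-in DBMS through Python's sqlite3 module (SQLite accepts UNION in a view definition). Only the SNO column of S is created, an assumption made because the views need nothing else from S; an ORDER BY is added to the final query for a predictable order.

```python
import sqlite3

SP = [("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400), ("S1", "P4", 200),
      ("S1", "P5", 100), ("S1", "P6", 100), ("S2", "P1", 300), ("S2", "P2", 400),
      ("S3", "P2", 200), ("S4", "P2", 200), ("S4", "P4", 300), ("S4", "P5", 400)]
SPJ = [("S1", "P1", "J1", 200), ("S1", "P1", "J4", 700), ("S2", "P3", "J1", 400),
       ("S2", "P3", "J2", 200), ("S2", "P3", "J3", 200), ("S2", "P3", "J4", 500),
       ("S2", "P3", "J5", 600), ("S2", "P3", "J6", 400), ("S2", "P3", "J7", 800),
       ("S2", "P5", "J2", 100), ("S3", "P3", "J1", 200), ("S3", "P4", "J2", 500),
       ("S4", "P6", "J3", 300), ("S4", "P6", "J7", 300), ("S5", "P1", "J4", 100),
       ("S5", "P2", "J2", 200), ("S5", "P2", "J4", 100), ("S5", "P3", "J4", 200),
       ("S5", "P4", "J4", 800), ("S5", "P5", "J4", 400), ("S5", "P5", "J5", 500),
       ("S5", "P5", "J7", 100), ("S5", "P6", "J2", 200), ("S5", "P6", "J4", 500)]

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE S   (SNO TEXT PRIMARY KEY);  -- only the column the views need
CREATE TABLE SP  (SNO TEXT, PNO TEXT, QTY INTEGER, PRIMARY KEY (SNO, PNO));
CREATE TABLE SPJ (SNO TEXT, PNO TEXT, JNO TEXT, QTY INTEGER,
                  PRIMARY KEY (SNO, PNO, JNO));
""")
con.executemany("INSERT INTO S VALUES (?)", [(f"S{i}",) for i in range(1, 6)])
con.executemany("INSERT INTO SP VALUES (?, ?, ?)", SP)
con.executemany("INSERT INTO SPJ VALUES (?, ?, ?, ?)", SPJ)

# The Exercise 6 views: suppliers with no rows get an explicit 0.
con.executescript("""
CREATE VIEW V1(SNO, ORDERED) AS
  SELECT SNO, SUM(QTY) FROM SP GROUP BY SNO
  UNION
  SELECT SNO, 0 FROM S WHERE SNO NOT IN (SELECT SNO FROM SP);
CREATE VIEW V2(SNO, USED) AS
  SELECT SNO, SUM(QTY) FROM SPJ GROUP BY SNO
  UNION
  SELECT SNO, 0 FROM S WHERE SNO NOT IN (SELECT SNO FROM SPJ);
""")
rows = con.execute("""SELECT V1.SNO, ORDERED + USED AS TOTAL
                      FROM V1, V2
                      WHERE V1.SNO = V2.SNO
                      ORDER BY V1.SNO""").fetchall()
print(rows)  # [('S1', 2200), ('S2', 3900), ('S3', 900), ('S4', 1500), ('S5', 3100)]
```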
Answering complex information requests without views
We have found a solution to our information request using views. Unfortunately, creating views to answer every non-trivial information request will result in a large number of views in a database. After a while, keeping track of which views are still used by an application becomes difficult.
Temporary tables
Most modern DBMS support temporary tables. A temporary table is a
table that is automatically discarded at the end of a session.
Note: The term session describes the dialog that occurs over a database
connection. A connection must be established between a client program
(like SQL Server Management Studio) and a database server before SQL
statements can be submitted from the client to the server.
For our information request, we could create temporary tables T1 and T2,
execute the queries we formulated for V1 and V2 to populate T1 and T2,
and then execute our final query against T1 and T2 (instead of V1 and
V2).
In SQL Server, tables whose names begin with # are (local) temporary tables. The solution proposed above will be demonstrated in the lecture using SQL Server.
Temporary tables provide a feasible solution then, if not the most
efficient. But we can do better!
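SQLite, used below through Python's sqlite3 module as a stand-in for SQL Server, has temporary tables too, but marks them with CREATE TEMP TABLE rather than a # prefix. The sketch assumes the union-based queries from Exercise 6 are used to populate T1 and T2; the temporary tables are discarded when the connection, and with it the session, is closed.

```python
import sqlite3

SP = [("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400), ("S1", "P4", 200),
      ("S1", "P5", 100), ("S1", "P6", 100), ("S2", "P1", 300), ("S2", "P2", 400),
      ("S3", "P2", 200), ("S4", "P2", 200), ("S4", "P4", 300), ("S4", "P5", 400)]
SPJ = [("S1", "P1", "J1", 200), ("S1", "P1", "J4", 700), ("S2", "P3", "J1", 400),
       ("S2", "P3", "J2", 200), ("S2", "P3", "J3", 200), ("S2", "P3", "J4", 500),
       ("S2", "P3", "J5", 600), ("S2", "P3", "J6", 400), ("S2", "P3", "J7", 800),
       ("S2", "P5", "J2", 100), ("S3", "P3", "J1", 200), ("S3", "P4", "J2", 500),
       ("S4", "P6", "J3", 300), ("S4", "P6", "J7", 300), ("S5", "P1", "J4", 100),
       ("S5", "P2", "J2", 200), ("S5", "P2", "J4", 100), ("S5", "P3", "J4", 200),
       ("S5", "P4", "J4", 800), ("S5", "P5", "J4", 400), ("S5", "P5", "J5", 500),
       ("S5", "P5", "J7", 100), ("S5", "P6", "J2", 200), ("S5", "P6", "J4", 500)]

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE S   (SNO TEXT PRIMARY KEY);
CREATE TABLE SP  (SNO TEXT, PNO TEXT, QTY INTEGER, PRIMARY KEY (SNO, PNO));
CREATE TABLE SPJ (SNO TEXT, PNO TEXT, JNO TEXT, QTY INTEGER,
                  PRIMARY KEY (SNO, PNO, JNO));
""")
con.executemany("INSERT INTO S VALUES (?)", [(f"S{i}",) for i in range(1, 6)])
con.executemany("INSERT INTO SP VALUES (?, ?, ?)", SP)
con.executemany("INSERT INTO SPJ VALUES (?, ?, ?, ?)", SPJ)

# Populate temporary tables T1 and T2 with the queries used for V1 and V2.
con.executescript("""
CREATE TEMP TABLE T1 AS
  SELECT SNO, SUM(QTY) AS ORDERED FROM SP GROUP BY SNO
  UNION
  SELECT SNO, 0 FROM S WHERE SNO NOT IN (SELECT SNO FROM SP);
CREATE TEMP TABLE T2 AS
  SELECT SNO, SUM(QTY) AS USED FROM SPJ GROUP BY SNO
  UNION
  SELECT SNO, 0 FROM S WHERE SNO NOT IN (SELECT SNO FROM SPJ);
""")
rows = con.execute("""SELECT T1.SNO, ORDERED + USED AS TOTAL
                      FROM T1, T2
                      WHERE T1.SNO = T2.SNO
                      ORDER BY T1.SNO""").fetchall()
temp_names = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_temp_master WHERE type = 'table' ORDER BY name")]
print(rows)
print(temp_names)  # ['T1', 'T2']
con.close()  # ends the session; SQLite discards T1 and T2
```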
As well as temporary tables, mature DBMS support a feature that might
be described as temporary views. The formal name for this feature is
common table expressions.
Common table expressions
Introduced in SQL:1999, common table expressions (CTEs) might be described as temporary views. The following query uses CTEs to answer our information request.
WITH
V1(SNO, ORDERED) AS
  ( SELECT SNO, SUM(QTY)
    FROM   SP
    GROUP BY SNO
    UNION
    SELECT SNO, 0
    FROM   S
    WHERE  SNO NOT IN (SELECT SNO FROM SP) ),
V2(SNO, USED) AS
  ( SELECT SNO, SUM(QTY)
    FROM   SPJ
    GROUP BY SNO
    UNION
    SELECT SNO, 0
    FROM   S
    WHERE  SNO NOT IN (SELECT SNO FROM SPJ) )
SELECT V1.SNO, ORDERED + USED AS TOTAL
FROM   V1, V2
WHERE  V1.SNO = V2.SNO;
CTEs were introduced to support recursive queries (a topic for later), but they can also be used to simplify complex queries, although the query above is not that simple. Can we do better? Yes, we can!
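The same WITH query runs in SQLite with only an ORDER BY added. SQLite, used below through Python's sqlite3 module as a stand-in DBMS, has supported CTEs since version 3.8.3.

```python
import sqlite3

SP = [("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400), ("S1", "P4", 200),
      ("S1", "P5", 100), ("S1", "P6", 100), ("S2", "P1", 300), ("S2", "P2", 400),
      ("S3", "P2", 200), ("S4", "P2", 200), ("S4", "P4", 300), ("S4", "P5", 400)]
SPJ = [("S1", "P1", "J1", 200), ("S1", "P1", "J4", 700), ("S2", "P3", "J1", 400),
       ("S2", "P3", "J2", 200), ("S2", "P3", "J3", 200), ("S2", "P3", "J4", 500),
       ("S2", "P3", "J5", 600), ("S2", "P3", "J6", 400), ("S2", "P3", "J7", 800),
       ("S2", "P5", "J2", 100), ("S3", "P3", "J1", 200), ("S3", "P4", "J2", 500),
       ("S4", "P6", "J3", 300), ("S4", "P6", "J7", 300), ("S5", "P1", "J4", 100),
       ("S5", "P2", "J2", 200), ("S5", "P2", "J4", 100), ("S5", "P3", "J4", 200),
       ("S5", "P4", "J4", 800), ("S5", "P5", "J4", 400), ("S5", "P5", "J5", 500),
       ("S5", "P5", "J7", 100), ("S5", "P6", "J2", 200), ("S5", "P6", "J4", 500)]

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE S   (SNO TEXT PRIMARY KEY);
CREATE TABLE SP  (SNO TEXT, PNO TEXT, QTY INTEGER, PRIMARY KEY (SNO, PNO));
CREATE TABLE SPJ (SNO TEXT, PNO TEXT, JNO TEXT, QTY INTEGER,
                  PRIMARY KEY (SNO, PNO, JNO));
""")
con.executemany("INSERT INTO S VALUES (?)", [(f"S{i}",) for i in range(1, 6)])
con.executemany("INSERT INTO SP VALUES (?, ?, ?)", SP)
con.executemany("INSERT INTO SPJ VALUES (?, ?, ?, ?)", SPJ)

# V1 and V2 exist only for the duration of this one statement.
rows = con.execute("""
WITH
V1(SNO, ORDERED) AS
  ( SELECT SNO, SUM(QTY) FROM SP GROUP BY SNO
    UNION
    SELECT SNO, 0 FROM S WHERE SNO NOT IN (SELECT SNO FROM SP) ),
V2(SNO, USED) AS
  ( SELECT SNO, SUM(QTY) FROM SPJ GROUP BY SNO
    UNION
    SELECT SNO, 0 FROM S WHERE SNO NOT IN (SELECT SNO FROM SPJ) )
SELECT V1.SNO, ORDERED + USED AS TOTAL
FROM V1, V2
WHERE V1.SNO = V2.SNO
ORDER BY V1.SNO""").fetchall()
print(rows)  # [('S1', 2200), ('S2', 3900), ('S3', 900), ('S4', 1500), ('S5', 3100)]
```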
Using a scalar-valued subquery in SELECT
One powerful feature of SQL is the use of a scalar-valued subquery in the
SELECT clause of a query. The query below produces the following
result.
SELECT SNO,
       (SELECT SUM(QTY) FROM SP WHERE SNO = S.SNO) +
       (SELECT SUM(QTY) FROM SPJ WHERE SNO = S.SNO) AS TOTAL
FROM   S;
SNO   TOTAL
S1    2200
S2    3900
S3    900
S4    1500
S5    NULL
Note: A subquery is a query enclosed in round brackets, usually without an ORDER BY clause. A scalar-valued subquery, like each of the two above, must produce at most one row containing a single column.
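Running the scalar-subquery version in SQLite, a stand-in DBMS accessed through Python's sqlite3 module, reproduces the result above; the NULL for S5 shows up as Python None. Inside each subquery the unqualified SNO refers to the subquery's own table, while S.SNO refers to the current row of the outer query.

```python
import sqlite3

SP = [("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400), ("S1", "P4", 200),
      ("S1", "P5", 100), ("S1", "P6", 100), ("S2", "P1", 300), ("S2", "P2", 400),
      ("S3", "P2", 200), ("S4", "P2", 200), ("S4", "P4", 300), ("S4", "P5", 400)]
SPJ = [("S1", "P1", "J1", 200), ("S1", "P1", "J4", 700), ("S2", "P3", "J1", 400),
       ("S2", "P3", "J2", 200), ("S2", "P3", "J3", 200), ("S2", "P3", "J4", 500),
       ("S2", "P3", "J5", 600), ("S2", "P3", "J6", 400), ("S2", "P3", "J7", 800),
       ("S2", "P5", "J2", 100), ("S3", "P3", "J1", 200), ("S3", "P4", "J2", 500),
       ("S4", "P6", "J3", 300), ("S4", "P6", "J7", 300), ("S5", "P1", "J4", 100),
       ("S5", "P2", "J2", 200), ("S5", "P2", "J4", 100), ("S5", "P3", "J4", 200),
       ("S5", "P4", "J4", 800), ("S5", "P5", "J4", 400), ("S5", "P5", "J5", 500),
       ("S5", "P5", "J7", 100), ("S5", "P6", "J2", 200), ("S5", "P6", "J4", 500)]

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE S   (SNO TEXT PRIMARY KEY);
CREATE TABLE SP  (SNO TEXT, PNO TEXT, QTY INTEGER, PRIMARY KEY (SNO, PNO));
CREATE TABLE SPJ (SNO TEXT, PNO TEXT, JNO TEXT, QTY INTEGER,
                  PRIMARY KEY (SNO, PNO, JNO));
""")
con.executemany("INSERT INTO S VALUES (?)", [(f"S{i}",) for i in range(1, 6)])
con.executemany("INSERT INTO SP VALUES (?, ?, ?)", SP)
con.executemany("INSERT INTO SPJ VALUES (?, ?, ?, ?)", SPJ)

# Two correlated scalar subqueries, evaluated once per row of S.
rows = con.execute("""
SELECT SNO,
       (SELECT SUM(QTY) FROM SP WHERE SNO = S.SNO) +
       (SELECT SUM(QTY) FROM SPJ WHERE SNO = S.SNO) AS TOTAL
FROM S
ORDER BY SNO""").fetchall()
print(rows)  # [('S1', 2200), ('S2', 3900), ('S3', 900), ('S4', 1500), ('S5', None)]
```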
Exercise 7
------------------------------------------------------------------------------------
This query does not fully answer our information request. Why?
Hint: How does SQL represent missing information?
------------------------------------------------------------------------------------
The query below illustrates the power of combining scalar-valued subqueries with CTEs. Each row in the query result describes a part type supplied by a supplier for use on projects: the number of parts used, and the percentage of all parts of that type used, from all suppliers. Because USED and SUM(USED) are integers, the division is integer division and each percentage is truncated to a whole number. The first six rows of the result are shown following the query.
WITH
  V(SNO,PNO,USED) AS
    ( SELECT SNO,PNO,SUM(QTY)
      FROM   SPJ
      GROUP BY SNO, PNO )
SELECT SNO,PNO,USED,
       (SELECT 100*V1.USED/SUM(USED)
        FROM   V
        WHERE  PNO = V1.PNO) AS PERCENT
FROM   V AS V1;
SNO   PNO   USED   PERCENT
S1    P1    900    90
S2    P3    3100   88
S2    P5    100    9
S3    P3    200    5
S3    P4    500    38
S4    P6    600    46
:     :     :      :
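The sketch below runs the percentage query in SQLite through Python's sqlite3 module, using hypothetical SPJ rows (1000 P1s and 300 P3s in total) rather than the module's sample data. SQLite, like many DBMSs, performs integer division when both operands are integers, so the percentages are truncated; the whole-number PERCENT values in the result above suggest the same thing happened there, though that is only an inference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SPJ (SNO TEXT, PNO TEXT, QTY INTEGER);
    -- Hypothetical rows: S1 has two P1 rows, which GROUP BY SNO, PNO sums to 900.
    INSERT INTO SPJ VALUES ('S1', 'P1', 600), ('S1', 'P1', 300),
                           ('S2', 'P1', 100), ('S2', 'P3', 200),
                           ('S3', 'P3', 100);
""")

# The query from the text; ORDER BY is added only to fix the row order.
rows = conn.execute("""
    WITH V(SNO, PNO, USED) AS
      ( SELECT SNO, PNO, SUM(QTY) FROM SPJ GROUP BY SNO, PNO )
    SELECT SNO, PNO, USED,
           ( SELECT 100 * V1.USED / SUM(USED)   -- V1.USED comes from the outer row
             FROM V
             WHERE PNO = V1.PNO ) AS PERCENT    -- inner PNO and USED come from V
    FROM V AS V1
    ORDER BY SNO, PNO
""").fetchall()

for row in rows:
    print(row)
# ('S1', 'P1', 900, 90)
# ('S2', 'P1', 100, 10)
# ('S2', 'P3', 200, 66)
# ('S3', 'P3', 100, 33)
```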
Using a table-valued subquery in FROM
Another powerful feature of SQL is the use of a table-valued subquery in
the FROM clause of a query. The query below correctly answers our
information request using the sample data (see following exercise).
SELECT SNO, SUM(QTY) AS TOTAL
FROM   ( SELECT SNO,QTY
         FROM   SP
         UNION ALL
         SELECT SNO,QTY
         FROM   SPJ ) AS V
GROUP BY SNO;
Exercise 8
------------------------------------------------------------------------------------
Why does the above query include the word ALL after UNION?
The above query may not always fully answer our information request.
The start of the request is repeated below. Can you see how the query
above might possibly produce an incomplete result?
For each and every supplier described in the Parts database…
Extend the table-valued subquery in the FROM clause above to produce a
query that will always fully answer our information request.
-------------------------------------------------------------------------------------
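Once you have attempted the first question, you can check your answer with the sketch below. It uses Python's sqlite3 module and two hypothetical rows, one in SP and one in SPJ, that happen to share the same SNO and QTY values; the function name totals is illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SP  (SNO TEXT, PNO TEXT, QTY INTEGER);
    CREATE TABLE SPJ (SNO TEXT, PNO TEXT, QTY INTEGER);
    -- Hypothetical rows: S1 has an SP row and an SPJ row with the same QTY.
    INSERT INTO SP  VALUES ('S1', 'P1', 100);
    INSERT INTO SPJ VALUES ('S1', 'P2', 100);
""")

def totals(set_operator):
    """Run the table-valued subquery query with 'UNION' or 'UNION ALL'."""
    sql = f"""
        SELECT SNO, SUM(QTY) AS TOTAL
        FROM ( SELECT SNO, QTY FROM SP
               {set_operator}
               SELECT SNO, QTY FROM SPJ ) AS V
        GROUP BY SNO
    """
    return conn.execute(sql).fetchall()

print(totals("UNION ALL"))  # [('S1', 200)]
print(totals("UNION"))      # [('S1', 100)]: the two ('S1', 100) rows collapse into one
```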
Missing Information - the SQL null
In SQL, missing information is represented by an object called a null.
The SQL null has a controversial history. The user of SQL who does not
understand nulls is at risk of producing incorrect query results.
It is important to note that nulls were not invented by the computer
scientists who designed SQL: they are an inherent part of the relational
model. Ted Codd proposed twelve rules to test the credentials of early
RDBMSs (http://en.wikipedia.org/wiki/Codd%27s_12_rules). The extract below
comes from the Wikipedia article on the topic as it read in March 2008.
Rule 3: Systematic treatment of null values:
The DBMS must allow each field to remain null (or empty).
Specifically, it must support a representation of "missing information
and inapplicable information" that is systematic, distinct from all
regular values (for example, "distinct from zero or any other number,"
in the case of numeric values), and independent of data type. It is
also implied that such representations must be manipulated by the
DBMS in a systematic way.
Where did nulls come from?
The keys of a relational database are very important. C.J. Date has said
that foreign keys and candidate keys are “the glue that holds a relational
database together”. In a relational database, every value of a foreign key
must match a value of the referenced candidate key.
However, some foreign keys do not always have a value. For example,
we may wish to record, in an E table, details of an employee who has not
yet been assigned to a department (described in a D table).
D (DNO, DName, …);
E (ENO, EName, DNO, …) DNO references D;
The concept of a null was introduced into the relational model to
represent the thing you have when you don’t have a value for a foreign
key. It was subsequently used to represent any missing information.
Missing information, and how to handle it, has been a hot research topic
for many years. It remains a controversial topic. Indeed, Ted Codd (the
creator of the relational model) and Chris Date (one of its most respected
advocates) have held widely differing views about how missing information
should be handled. Codd supported the null. Date believes
that nulls should be avoided until we have a better solution.
The facts are:
• SQL does have a null
• it is not going away
• it is impossible to avoid
• it introduces complexity
• failure to understand it will lead to errors
SQL nulls – the good news!
Out of respect for Ted Codd, we should start by considering the main
positive aspect of nulls: they provide a “representation of
missing information and inapplicable information that is systematic”. To list
suppliers with missing information, the method is the same regardless of
data type:
SELECT * FROM S WHERE STATUS IS NULL;
SELECT * FROM S WHERE SNAME IS NULL;
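The sketch below (Python's sqlite3 module, hypothetical rows) shows that IS NULL works the same way for a text column and a numeric column, and previews a trap discussed later in this module: writing STATUS = NULL instead of STATUS IS NULL silently returns no rows. The helper name snos is illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S (SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER);
    -- Hypothetical rows: S2 has no SNAME, S3 has no STATUS.
    INSERT INTO S VALUES ('S1', 'Smith', 20), ('S2', NULL, 10), ('S3', 'Blake', NULL);
""")

def snos(where_clause):
    """Return the SNO of every S row satisfying the given WHERE condition."""
    return [sno for (sno,) in conn.execute(f"SELECT SNO FROM S WHERE {where_clause}")]

print(snos("STATUS IS NULL"))  # ['S3']
print(snos("SNAME IS NULL"))   # ['S2']
print(snos("STATUS = NULL"))   # []: a comparison with null is never True
```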
Note: We will briefly explore the distinction between missing data and
inapplicable data later. Perhaps we can be thankful that SQL has only
one type of null; Ted Codd proposed two
(http://en.wikipedia.org/wiki/Relational_Model/Tasmania).
SQL nulls – why all the fuss?
Consider the following query, formulated to obtain a list of suppliers
currently shipping more parts than supplier S3.
SELECT *
FROM   S
WHERE  SNO IN ( SELECT SNO
                FROM   SP
                GROUP BY SNO
                HAVING SUM(QTY) > ( SELECT SUM(QTY)
                                    FROM   SP
                                    WHERE  SNO = 'S3' ));
Exercise 9
------------------------------------------------------------------------------------
Check that this query produces the correct result.
Will it always produce the correct result?
Since S5 is currently shipping no parts, what result might you expect if
we replace S3 with S5?
What result do you get when making this replacement?
-------------------------------------------------------------------------------------
Chris Date on nulls:
“the SQL approach of using a null to represent missing information is not
a satisfactory solution to that problem. Indeed, it is my opinion that the
SQL null introduces far more problems than it solves…
…it is all too easy (in the presence of nulls) to formulate a query that
looks correct, but in fact is not—even if the user is quite familiar with the
way nulls behave”
The query above looks correct. Indeed, it produces the correct result for
S3. However, when we replace S3 with S5, the query returns no rows. Since
supplier S5 is shipping no parts, we would expect every supplier that is
shipping parts to appear in the result.
The problem here is that the SQL null takes us from the familiar world of
2-valued logic into the world of 3-valued logic. In this world, the familiar
values of True and False are joined by a third value – Unknown!
Unknown results from any comparison involving null. So, the
comparison 3 > null evaluates to Unknown. The truth tables are:
AND | T  F  U
----+---------
 T  | T  F  U
 F  | F  F  F
 U  | U  F  U

OR  | T  F  U
----+---------
 T  | T  T  T
 F  | T  F  U
 U  | T  U  U

NOT |
----+---
 T  | F
 F  | T
 U  | U
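You can reproduce these truth tables directly from a DBMS. In SQLite (used here through Python's sqlite3 module), True is returned as 1, False as 0 and Unknown as NULL, which Python shows as None; the helper name evaluate is hypothetical. The sketch also confirms that 3 > null evaluates to Unknown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite returns 1 for True, 0 for False and NULL (Python None) for Unknown.
TO_SQL = {"T": 1, "F": 0, "U": None}
FROM_SQL = {1: "T", 0: "F", None: "U"}

def evaluate(expr, *args):
    """Evaluate a boolean SQL expression with ? placeholders; return 'T', 'F' or 'U'."""
    (value,) = conn.execute(f"SELECT {expr}", [TO_SQL[a] for a in args]).fetchone()
    return FROM_SQL[value]

# Row = left operand, columns = right operand in the order T, F, U.
AND = {a: [evaluate("? AND ?", a, b) for b in "TFU"] for a in "TFU"}
OR = {a: [evaluate("? OR ?", a, b) for b in "TFU"] for a in "TFU"}
NOT = {a: evaluate("NOT ?", a) for a in "TFU"}

for name, table in (("AND", AND), ("OR", OR)):
    print(name, table)
print("NOT", NOT)
print(evaluate("3 > ?", "U"))  # U
```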
Aside: You sometimes see the phrase “null value”. However, a null is not a
value: it is a marker meaning “no value here”. Try to avoid using this phrase.
The effect of Unknown during query processing
Why do we get no result from the query below?
SELECT *
FROM   S
WHERE  SNO IN ( SELECT SNO
                FROM   SP
                GROUP BY SNO
                HAVING SUM(QTY) > ( SELECT SUM(QTY)
                                    FROM   SP
                                    WHERE  SNO = 'S5' ));
Since there are no rows in SP for supplier S5, the inner subquery
(repeated below) applies SUM to an empty set of QTY values. Computer
scientists designing SQL had to decide how to handle this situation.
(SELECT SUM(QTY)
 FROM   SP
 WHERE  SNO = 'S5')
Since a null had been introduced for foreign keys, the decision was made
to use a null here too: SUM applied to an empty set returns null (see
Nulls produced by Aggregate Functions below), and a scalar subquery that
returns no rows at all is likewise treated as null. Either way, the result
of the inner subquery is null.
What then is the result of the outer subquery (repeated below)?
(SELECT SNO
 FROM   SP
 GROUP BY SNO
 HAVING SUM(QTY) > ( SELECT SUM(QTY)
                     FROM   SP
                     WHERE  SNO = 'S5' ))
The HAVING condition of the outer subquery is evaluated once for each group
of SP rows. For each group, the sum of QTY values is compared to the result
of the inner subquery: a null. As mentioned above, a comparison involving
null evaluates to Unknown. Consequently, the HAVING clause does not
evaluate to True for any group. So, the outer subquery produces an empty
set of SNO values.
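The sketch below replays this reasoning in SQLite via Python's sqlite3 module, with hypothetical data in which S5 ships nothing; the ? placeholder stands in for the literal 'S3' or 'S5', and the function name shipping_more_than is illustrative only. Running the inner subquery on its own shows the single null it returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  (SNO TEXT PRIMARY KEY);
    CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER);
    -- Hypothetical rows: S5 is shipping nothing.
    INSERT INTO S  VALUES ('S1'), ('S2'), ('S3'), ('S5');
    INSERT INTO SP VALUES ('S1', 'P1', 300), ('S2', 'P1', 200), ('S3', 'P1', 100);
""")

# The inner subquery on its own: one row holding null (None in Python).
print(conn.execute("SELECT SUM(QTY) FROM SP WHERE SNO = 'S5'").fetchall())  # [(None,)]

QUERY = """
SELECT SNO
FROM S
WHERE SNO IN ( SELECT SNO
               FROM SP
               GROUP BY SNO
               HAVING SUM(QTY) > ( SELECT SUM(QTY)
                                   FROM SP
                                   WHERE SNO = ? ) )
ORDER BY SNO
"""

def shipping_more_than(sno):
    """Suppliers whose total shipped QTY exceeds that of supplier sno."""
    return [s for (s,) in conn.execute(QUERY, (sno,))]

print(shipping_more_than("S3"))  # ['S1', 'S2']
print(shipping_more_than("S5"))  # []: every HAVING comparison is Unknown
```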
Exercise 10
------------------------------------------------------------------------------------
Using no new SQL features introduced in this module, how can the query
at the start of this section be modified to produce the correct result?
------------------------------------------------------------------------------------
Aside: Some authors suggest that, instead of Unknown, the result of a
comparison involving null is null. Either way, the SQL query processor is
looking for WHERE and HAVING expressions that evaluate to True.
Nulls produced by Aggregate Functions
Computer scientists who designed SQL also had to decide how aggregate
functions (SUM, COUNT, AVG, MIN, MAX) should handle nulls. The
following decisions were made:
• remove nulls from the set to which an aggregate function is applied
• SUM, AVG, MIN and MAX return null when applied to an empty set
• COUNT returns zero when applied to an empty set
Note: Chris Date argues that the SUM of an empty set should return 0
since 0 is the identity value under addition; that is, 0 + x evaluates to x.
To illustrate, consider the following “E” table holding employee data.
ENO   EName       DNO    Bonus
E1    J. Smith    NULL   NULL
E2    D. Brown    NULL   NULL
E3    A. Sharma   NULL   NULL
E4    B. Lee      D1     500
E5    S. Green    D1     NULL
E6    Q. Han      D2     500
E7    M. Patel    D2     400
E8    D. Jones    D2     300
E9    G. Bush     D2     300
Notes:
• S. Green will earn a bonus, but the amount is currently unknown.
• Executives are not assigned to a department and do not earn a bonus.
The query below produces the following result.
SELECT DNO, COUNT(ENO), COUNT(Bonus), SUM(Bonus)
FROM   E
GROUP BY DNO;

DNO    COUNT(ENO)   COUNT(Bonus)   SUM(Bonus)
NULL   3            0              NULL
D1     2            1              500
D2     4            4              1500
Note: Nulls are considered equal when grouping rows (the rows with a
null DNO are placed in the same group), but not when compared for
equality (null = null evaluates to Unknown).
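The sketch below loads the E table above into SQLite (via Python's sqlite3 module) and runs the same query, so you can confirm the result and the grouping of null DNO values for yourself. It also shows COUNT and SUM applied to an empty set (department D9 is hypothetical and has no rows). Row order from GROUP BY is not guaranteed, so the results are collected in a dictionary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE E (ENO TEXT PRIMARY KEY, EName TEXT, DNO TEXT, Bonus INTEGER)")
# Rows copied from the E table above; None represents NULL.
conn.executemany("INSERT INTO E VALUES (?, ?, ?, ?)", [
    ("E1", "J. Smith", None, None), ("E2", "D. Brown", None, None),
    ("E3", "A. Sharma", None, None), ("E4", "B. Lee", "D1", 500),
    ("E5", "S. Green", "D1", None), ("E6", "Q. Han", "D2", 500),
    ("E7", "M. Patel", "D2", 400), ("E8", "D. Jones", "D2", 300),
    ("E9", "G. Bush", "D2", 300),
])

rows = conn.execute("""
    SELECT DNO, COUNT(ENO), COUNT(Bonus), SUM(Bonus)
    FROM E
    GROUP BY DNO
""").fetchall()
by_dept = {row[0]: row[1:] for row in rows}  # key None is the group of null DNOs
print(by_dept)  # matches the result table above (dictionary order may vary)

# Aggregates over an empty set: COUNT gives 0, SUM gives null.
empty = conn.execute("SELECT COUNT(Bonus), SUM(Bonus) FROM E WHERE DNO = 'D9'").fetchone()
print(empty)  # (0, None)
```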
Exercise 11
------------------------------------------------------------------------------------
Based on this data, how many bonuses will be paid by department D1?
Explain why the value 1 is shown as the count of bonuses for D1.
----------------------------------------------------------------------------------------
Is one type of null enough?
The process of removing nulls from the set to which an aggregate
function is applied is consistent with the use of a null to represent an
inapplicable value; as distinct from an applicable value that is missing:
• a bonus is not applicable to executives
• a bonus is applicable to S. Green, but the value of that bonus is
  currently unknown
• a COUNT(Bonus) value of 0 for executive bonuses accurately
  reflects reality
• a COUNT(Bonus) value of 1 for department D1 does not accurately
  reflect reality: a second bonus will be paid; the value is missing
Noting the distinction between applicable and inapplicable values, Ted
Codd proposed two null types – inapplicable and applicable but missing.
Chris Date (not a fan of nulls) would ask how we might handle the case
of an employee where it has not been decided if a bonus will be paid.
Clearly, it would be wrong to record an “inapplicable null”: a bonus may
be applicable. Likewise, it would be wrong to record an “applicable
but missing” null: a bonus may not be applicable. In this case, the
applicability of a bonus value is the missing information.
Chris Date uses such examples to ask if two types of null are sufficient –
perhaps we need three: inapplicable, applicable but missing, unknown if
applicable.
Chris Date usually follows the question above with another: when will the
madness end?
From the above coverage, you will see that missing information is a
non-trivial topic. Interested students can learn more about the topic here:
http://en.wikipedia.org/wiki/Null_%28SQL%29.
As mentioned previously, Chris Date has concluded that “the SQL null
introduces far more problems than it solves”.
Handling Nulls in SQL-92
SQL-92 introduced a COALESCE function that can be used to convert a
null to a value (zero, perhaps). The COALESCE function takes two or
more arguments and returns the value of the first argument that is not
null (or null, if every argument is null).
You discovered that the query below produces the following result.
SELECT SNO,
       (SELECT SUM(QTY) FROM SP WHERE SNO = S.SNO) +
       (SELECT SUM(QTY) FROM SPJ WHERE SNO = S.SNO) AS TOTAL
FROM   S;

SNO   TOTAL
S1    2200
S2    3900
S3    900
S4    1500
S5    NULL
The reason a value does not appear for S5 is, once again, the dreaded
SQL null. Problems with nulls extend to the arithmetic operators +, -, *
and /. Any arithmetic operation involving null evaluates to null. So null
* 10 evaluates to null, 0 + null evaluates to null, etc.
We can use COALESCE to handle the null problem above. If we replace
the second SELECT clause item with the following expression, the query
will always display a value for each and every supplier.
(SELECT COALESCE(SUM(QTY),0) FROM SP WHERE SNO = S.SNO)+
(SELECT COALESCE(SUM(QTY),0) FROM SPJ WHERE SNO = S.SNO)
Note: With each use of COALESCE above, two arguments are provided –
SUM(QTY), and the literal value 0. If the aggregate function returns null,
the value of the second argument (0) is returned. This has the effect of
converting nulls to zeros.
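The sketch below compares the two versions of the query in SQLite, via Python's sqlite3 module, using hypothetical data: S2 has SP rows but no SPJ rows, and S3 has no rows in either table. Note that a null appears for S2 as well as S3, because one missing sum is enough to make the addition null. ORDER BY is added only to fix the row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S   (SNO TEXT PRIMARY KEY);
    CREATE TABLE SP  (SNO TEXT, PNO TEXT, QTY INTEGER);
    CREATE TABLE SPJ (SNO TEXT, PNO TEXT, QTY INTEGER);
    -- Hypothetical rows: S2 has no SPJ rows; S3 has no rows at all.
    INSERT INTO S   VALUES ('S1'), ('S2'), ('S3');
    INSERT INTO SP  VALUES ('S1', 'P1', 100), ('S1', 'P2', 200), ('S2', 'P1', 300);
    INSERT INTO SPJ VALUES ('S1', 'P1', 50);
""")

# Arithmetic with null yields null; COALESCE returns its first non-null argument.
arithmetic = conn.execute("SELECT NULL * 10, 0 + NULL, COALESCE(NULL, 0)").fetchone()
print(arithmetic)  # (None, None, 0)

PLAIN = """
SELECT SNO,
       (SELECT SUM(QTY) FROM SP  WHERE SNO = S.SNO) +
       (SELECT SUM(QTY) FROM SPJ WHERE SNO = S.SNO) AS TOTAL
FROM S ORDER BY SNO
"""
FIXED = """
SELECT SNO,
       (SELECT COALESCE(SUM(QTY), 0) FROM SP  WHERE SNO = S.SNO) +
       (SELECT COALESCE(SUM(QTY), 0) FROM SPJ WHERE SNO = S.SNO) AS TOTAL
FROM S ORDER BY SNO
"""
print(conn.execute(PLAIN).fetchall())  # [('S1', 350), ('S2', None), ('S3', None)]
print(conn.execute(FIXED).fetchall())  # [('S1', 350), ('S2', 300), ('S3', 0)]
```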
Exercise 12
------------------------------------------------------------------------------------
Using COALESCE as described above, check that the resulting query
returns a TOTAL value for each and every supplier.
Note: Microsoft Access does not support COALESCE. However, it
provides functions to achieve the same effect. Replace the first subquery
below with the second:
(SELECT COALESCE(SUM(QTY),0) FROM SP WHERE SNO=S.SNO)
(SELECT IIF(IsNull(SUM(QTY)),0,SUM(QTY)) FROM SP WHERE SNO=S.SNO)
Use on-line help to explore the IIF and IsNull functions.
Advanced: How does the fact that 0 + null evaluates to null affect Chris
Date’s suggestion that the SUM of an empty set should evaluate to 0?
----------------------------------------------------------------------------------------
Recommendation
Since the SQL null creates such problems, Chris Date suggests that we try
to minimise the number of nulls we must handle. He suggests that,
wherever possible, nulls should be avoided in base tables. It was partly
due to Chris Date that SQL-92 introduced support for default values.
When a default value is specified for a column, the default value is
inserted into the column if the INSERT operation does not provide a
value for that column. See the WORK table on page 277 of the textbook
for an example.
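The sketch below illustrates default values in SQLite via Python's sqlite3 module. The TASK table and its columns are hypothetical (not the textbook's WORK table). Note that a default is used only when the INSERT omits the column; an explicit NULL is stored as given (or rejected, if the column is also declared NOT NULL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with two defaulted columns.
conn.execute("""
    CREATE TABLE TASK (
        TNO    TEXT PRIMARY KEY,
        STATUS TEXT    NOT NULL DEFAULT 'Pending',
        HOURS  INTEGER DEFAULT 0
    )
""")
conn.execute("INSERT INTO TASK (TNO) VALUES ('T1')")                  # both defaults used
conn.execute("INSERT INTO TASK (TNO, STATUS) VALUES ('T2', 'Done')")  # HOURS defaulted
conn.execute("INSERT INTO TASK (TNO, HOURS) VALUES ('T3', NULL)")     # explicit null is kept
rows = conn.execute("SELECT TNO, STATUS, HOURS FROM TASK ORDER BY TNO").fetchall()
print(rows)  # [('T1', 'Pending', 0), ('T2', 'Done', 0), ('T3', 'Pending', None)]
```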
Exercise 13
------------------------------------------------------------------------------------
Advanced: How useful are default values for foreign keys?
------------------------------------------------------------------------------------
Even if no nulls are admitted to base tables, the SQL user will still
encounter nulls during query processing. Nulls arise when:
• rows are preserved in an outer join
• SUM, AVG, MIN and MAX are applied to an empty set, and
• a subquery does not produce a result
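Each of these three sources of nulls can be seen in the sketch below, which uses Python's sqlite3 module and hypothetical base tables that contain no nulls at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  (SNO TEXT PRIMARY KEY);
    CREATE TABLE SP (SNO TEXT NOT NULL, PNO TEXT NOT NULL, QTY INTEGER NOT NULL);
    -- Hypothetical rows: no nulls anywhere in the base tables.
    INSERT INTO S  VALUES ('S1'), ('S2');
    INSERT INTO SP VALUES ('S1', 'P1', 100);
""")

# 1. Outer join: S2 is preserved with a null QTY.
outer = conn.execute(
    "SELECT S.SNO, SP.QTY FROM S LEFT JOIN SP ON S.SNO = SP.SNO ORDER BY S.SNO").fetchall()
# 2. An aggregate (MAX) applied to an empty set.
empty_max = conn.execute("SELECT MAX(QTY) FROM SP WHERE SNO = 'S2'").fetchone()[0]
# 3. A scalar subquery that returns no rows.
no_row = conn.execute("SELECT (SELECT QTY FROM SP WHERE SNO = 'S2')").fetchone()[0]

print(outer)      # [('S1', 100), ('S2', None)]
print(empty_max)  # None
print(no_row)     # None
```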
There is a good Wikipedia article on the topic:
http://en.wikipedia.org/wiki/Null_%28SQL%29
Recommendation: Beware the SQL null.
Recursive queries
Earlier in this module you were introduced to Common Table
Expressions (CTEs). CTEs bring support for recursive queries to SQL.
Recursion is a powerful technique. The concept finds expression in many
areas of computer science.
Introduction
To demonstrate the use of recursive queries we will use a table called E
describing employees of some enterprise. Each row in E holds an
employee number (ENO), employee name (EName) and the employee
number of the employee’s boss (BossENO).
E (ENO, EName, BossENO) BossENO references E;
ENO   EName       BossENO
E1    J. Smith    NULL
E2    D. Brown    E1
E3    A. Sharma   E1
E4    B. Lee      E2
E5    S. Green    E2
E6    Q. Han      E3
E7    M. Patel    E3
E8    D. Jones    E7
E9    G. Bush     E7
The E table captures a hierarchical “boss of” relationship:
J. Smith is boss of
    D. Brown is boss of
        B. Lee
        S. Green
    A. Sharma is boss of
        Q. Han
        M. Patel is boss of
            D. Jones
            G. Bush
The query below lists all subordinates of J. Smith (ENO = E1):
WITH Sub(ENO) AS
( SELECT ENO FROM E WHERE BossENO = 'E1'
UNION ALL
SELECT E.ENO FROM E JOIN Sub S ON E.BossENO = S.ENO )
SELECT ENO FROM Sub;
Identifying a recursive query
Previously we described CTEs as temporary views. Here we build a
temporary view called Sub. Sub has a single column – ENO. Sub will
hold the ENO value of each employee subordinate to J. Smith (ENO =
E1). Notice that the last line below is a simple query returning ENO
values from the temporary view Sub.
WITH Sub(ENO) AS
( SELECT ENO FROM E WHERE BossENO = 'E1'
UNION ALL
SELECT E.ENO FROM E JOIN Sub S ON E.BossENO = S.ENO )
SELECT ENO FROM Sub;
The recursive nature of this query is seen in the definition of CTE Sub –
extracted below:
WITH Sub(ENO) AS
( SELECT ENO FROM E WHERE BossENO = 'E1'
UNION ALL
SELECT E.ENO FROM E JOIN Sub S ON E.BossENO = S.ENO )
Notice that the second query in the UNION expression mentions Sub. So,
Sub is defined in terms of itself. This is the self-referencing signature of
recursion.
Note: Recursive queries include a recursive CTE.
Processing a recursive query
A recursive CTE has two queries joined by UNION ALL. The first query
specifies anchor rows in the CTE result. The second query defines chain
rows, added recursively to the result. The second query is evaluated
repeatedly, each time against the rows added by the previous evaluation,
until either no more chain rows are added to the result, or a recursion
limit is reached.
In our example, the anchor query (below) produces the following result
(shown horizontally):
SELECT ENO FROM E WHERE BossENO = 'E1'
ENO|E2,E3
The first evaluation of the chain query (below), applied to the anchor
rows (E2 and E3), adds E4, E5, E6 and E7, giving the following value of
Sub (again, shown horizontally):
SELECT E.ENO FROM E JOIN Sub S ON E.BossENO = S.ENO
Sub: ENO|E2,E3,E4,E5,E6,E7
The second evaluation of the chain query, applied to the rows added by
the first evaluation (E4 to E7), adds E8 and E9:
Sub: ENO|E2,E3,E4,E5,E6,E7,E8,E9
The third evaluation of the chain query, applied to the rows added by
the second evaluation (E8 and E9), adds no new rows, so Sub is unchanged.
Recursive evaluation of the chain query terminates when no new rows are
added to the result. Having evaluated Sub, the query referencing Sub is
run to produce the query output.
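The evaluation steps above can be mimicked in plain Python. This is a sketch of the logic only, not of how any DBMS implements it: each pass of the loop plays the part of one evaluation of the chain query, joining E with only the rows added by the previous pass. The names BOSS and evaluate_sub are hypothetical, and there is no recursion limit, so the sketch assumes the boss-of hierarchy contains no cycles.

```python
# BossENO for each ENO, copied from the E table above (None = no boss).
BOSS = {"E1": None, "E2": "E1", "E3": "E1", "E4": "E2", "E5": "E2",
        "E6": "E3", "E7": "E3", "E8": "E7", "E9": "E7"}

def evaluate_sub(root):
    """Return (sub, steps): every subordinate of root, and the rows each step added."""
    # Anchor query: SELECT ENO FROM E WHERE BossENO = root
    added = [eno for eno, boss in BOSS.items() if boss == root]
    sub, steps = list(added), [added]
    while added:
        # Chain query, joined only with the rows added by the previous step
        added = [eno for eno, boss in BOSS.items() if boss in added]
        sub.extend(added)
        steps.append(added)
    return sub, steps

sub, steps = evaluate_sub("E1")
for number, rows in enumerate(steps):
    print(number, rows)
# 0 ['E2', 'E3']
# 1 ['E4', 'E5', 'E6', 'E7']
# 2 ['E8', 'E9']
# 3 []
```

Step 0 is the anchor query; steps 1 to 3 correspond to the three evaluations of the chain query described above.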
A user-defined function
Just for fun, the following T-SQL statement defines a function that
returns a table of ENO values for employees subordinate to a given
employee.
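CREATE FUNCTION SubsTo(@ENO char(6))
RETURNS @Sub TABLE(ENO char(6)) AS
BEGIN
  -- anchor rows: direct subordinates of @ENO; chain rows: their subordinates
  WITH Sub(ENO) AS
    ( SELECT ENO FROM E WHERE BossENO = @ENO
      UNION ALL
      SELECT E.ENO FROM E JOIN Sub S ON E.BossENO = S.ENO )
  -- copy the CTE result into the table variable returned by the function
  INSERT INTO @Sub SELECT ENO FROM Sub;
  RETURN;
END;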
Having created this function, we can use it in the query below, producing
the following result.
SELECT * FROM DBO.SubsTo('E2');
ENO|E4,E5
Note: We said that the FROM clause of an SQL query always evaluates to
a single table. Here the table is returned by the SubsTo function.
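SQLite has no CREATE FUNCTION statement, so as a rough analogue the sketch below wraps the same recursive CTE in a Python function, with a ? parameter in place of @ENO; the function name subs_to is hypothetical. Note the keyword RECURSIVE: standard SQL (and PostgreSQL, for example) expects WITH RECURSIVE for a self-referencing CTE, whereas SQL Server does not accept that keyword, which is why the T-SQL above omits it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE E (ENO TEXT PRIMARY KEY, EName TEXT, BossENO TEXT REFERENCES E)")
# Rows copied from the E table shown earlier in this section.
conn.executemany("INSERT INTO E VALUES (?, ?, ?)", [
    ("E1", "J. Smith", None), ("E2", "D. Brown", "E1"), ("E3", "A. Sharma", "E1"),
    ("E4", "B. Lee", "E2"), ("E5", "S. Green", "E2"), ("E6", "Q. Han", "E3"),
    ("E7", "M. Patel", "E3"), ("E8", "D. Jones", "E7"), ("E9", "G. Bush", "E7"),
])

def subs_to(eno):
    """Hypothetical analogue of SubsTo: sorted ENOs of all subordinates of eno."""
    rows = conn.execute("""
        WITH RECURSIVE Sub(ENO) AS
          ( SELECT ENO FROM E WHERE BossENO = ?
            UNION ALL
            SELECT E.ENO FROM E JOIN Sub S ON E.BossENO = S.ENO )
        SELECT ENO FROM Sub
    """, (eno,)).fetchall()
    return sorted(r for (r,) in rows)

print(subs_to("E2"))  # ['E4', 'E5']
print(subs_to("E1"))  # ['E2', 'E3', 'E4', 'E5', 'E6', 'E7', 'E8', 'E9']
```

Sorting is applied because SQL does not guarantee the order in which a recursive CTE produces its rows.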
Other useful things to know
Query optimisation
Unfortunately, the topic of query optimisation is not covered in the
textbook. It is a topic of interest to the database application developer
and database administrator. A thorough investigation of this topic falls
outside the scope of the course. However, by way of introduction, we
briefly consider the execution of a query that includes a Cartesian join.
As mentioned previously, the Cartesian join of two tables is a table
consisting of all possible combinations of rows from the two tables. The
following query produces the Cartesian join of S and SP:
SELECT *
FROM   S,SP;
Normally, we join related tables, and we are only interested in joining
related rows. The query below lists related rows from S and SP. For this
query, it is conceptually correct to picture the DBMS applying the
WHERE clause filter to the Cartesian join of S and SP. In practice, the
disk I/O, CPU time, and memory required to process the query may be
reduced through the use of an index.
SELECT *
FROM   S,SP
WHERE  S.SNO = SP.SNO;
Let’s assume that tables S and SP are both large. If the percentage of S
rows with a related SP row is small, an efficient processing plan might
read SP rows by scanning the SP table and use an index on S.SNO to read
S rows of interest.
If the query were extended as shown below, an SP.PNO index might be
used to read the SP rows of interest, and then an index on S.SNO used to
read S rows of interest. This could significantly reduce processing time.
SELECT *
FROM   S,SP
WHERE  S.SNO = SP.SNO AND PNO = 'P1';
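SQLite's EXPLAIN QUERY PLAN statement gives a rough feel for this. The sketch below creates empty S and SP tables and a hypothetical index SP_PNO on SP(PNO), then asks SQLite how it would run the extended query. SQLite is only one optimiser, and the wording of its plan varies between versions, but on recent versions you can expect a step along the lines of "SEARCH SP USING INDEX SP_PNO (PNO=?)" together with a primary-key lookup on S.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  (SNO TEXT PRIMARY KEY, SNAME TEXT);  -- primary key gives an index on S.SNO
    CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER);
    CREATE INDEX SP_PNO ON SP(PNO);                       -- hypothetical index on SP.PNO
""")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT * FROM S, SP WHERE S.SNO = SP.SNO AND PNO = 'P1'
""").fetchall()

# The last column of each plan row is a human-readable description of one step.
steps = [row[-1] for row in plan]
for step in steps:
    print(step)
```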
The query optimiser selects a processing plan by costing candidate plans
expressed as relational algebra expressions. Older optimisers are rule-based
estimators, costing a plan directly from the operations used.
Modern optimisers are cost-based estimators, costing a plan using
statistics on tables and indexes held in the system catalog. Cost-based
estimation is more expensive, but more accurate.
Interested students are referred to the Wikipedia article on the subject:
http://en.wikipedia.org/wiki/Query_optimization.
Relational algebra and SQL
Students of this course have met relational algebra previously, and
possibly also relational calculus. A basic understanding of these
languages provides a solid platform for understanding SQL. Here, we
explore the relationships between relational algebra and SQL.
One of the nice things about Chris Date’s writing on relational algebra is
that he uses English words for operators instead of mathematical symbols.
In COIT12167 Database Use & Design, a reading on relational algebra is
provided, written by Chris Date.
The query below is followed by an equivalent relational algebra
expression using Chris Date’s syntax.
SELECT SNAME
FROM   S,SP
WHERE  S.SNO = SP.SNO AND PNO = 'P1';
((S TIMES SP) WHERE S.SNO = SP.SNO AND PNO = ‘P1’)[SNAME]
To illustrate the role of relational algebra in query optimisation, some
alternate execution plans are shown below.
((S TIMES (SP WHERE PNO = ‘P1’))
WHERE S.SNO = SP.SNO) [SNAME]
(((S[SNO,SNAME]) TIMES (SP WHERE PNO = ‘P1’))
WHERE S.SNO = SP.SNO) [SNAME]
(((S[SNO,SNAME]) TIMES ((SP WHERE PNO = ‘P1’)[SNO]))
WHERE S.SNO = SP.SNO) [SNAME]
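To see that such plans are interchangeable, the sketch below implements TIMES, restriction (WHERE) and projection as small Python functions over lists of dictionaries, and evaluates the original expression and the last alternative above on hypothetical data. It is a teaching sketch, not how an optimiser is built; attribute names are qualified (S.SNO, SP.SNO) so that TIMES never produces a name clash, and the CITY attribute is invented so that projecting S has some effect.

```python
from itertools import product

# Hypothetical relations.
S = [{"S.SNO": "S1", "S.SNAME": "Smith", "S.CITY": "London"},
     {"S.SNO": "S2", "S.SNAME": "Jones", "S.CITY": "Paris"},
     {"S.SNO": "S3", "S.SNAME": "Blake", "S.CITY": "Paris"}]
SP = [{"SP.SNO": "S1", "SP.PNO": "P1", "SP.QTY": 300},
      {"SP.SNO": "S1", "SP.PNO": "P2", "SP.QTY": 200},
      {"SP.SNO": "S2", "SP.PNO": "P1", "SP.QTY": 400},
      {"SP.SNO": "S3", "SP.PNO": "P2", "SP.QTY": 200}]

def times(r1, r2):
    """TIMES: every combination of a tuple from r1 with a tuple from r2."""
    return [{**t1, **t2} for t1, t2 in product(r1, r2)]

def where(r, condition):
    """Restriction: keep the tuples for which condition(t) is true."""
    return [t for t in r if condition(t)]

def project(r, attrs):
    """Projection r[attrs]; duplicates are removed, since a relation is a set."""
    rows = {tuple(t[a] for a in attrs) for t in r}
    return [dict(zip(attrs, row)) for row in sorted(rows)]

# ((S TIMES SP) WHERE S.SNO = SP.SNO AND PNO = 'P1')[SNAME]
plan_a = project(where(times(S, SP),
                       lambda t: t["S.SNO"] == t["SP.SNO"] and t["SP.PNO"] == "P1"),
                 ["S.SNAME"])

# (((S[SNO,SNAME]) TIMES ((SP WHERE PNO = 'P1')[SNO])) WHERE S.SNO = SP.SNO)[SNAME]
s_small = project(S, ["S.SNO", "S.SNAME"])
sp_small = project(where(SP, lambda t: t["SP.PNO"] == "P1"), ["SP.SNO"])
plan_b = project(where(times(s_small, sp_small), lambda t: t["S.SNO"] == t["SP.SNO"]),
                 ["S.SNAME"])

print(plan_a == plan_b, plan_a)  # True [{'S.SNAME': 'Jones'}, {'S.SNAME': 'Smith'}]
print(len(times(S, SP)), len(times(s_small, sp_small)))  # 12 6
```

Both plans give the same answer, but the second feeds TIMES fewer, narrower tuples: the kind of saving an optimiser looks for.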
What use is relational algebra to the database application developer or
database administrator?
• SQL provides direct implementation of three relational operators:
  UNION, INTERSECT, and EXCEPT
  (http://en.wikipedia.org/wiki/Union_(SQL))
• the student of relational algebra will know that UNION is a set
  operator; and, hence, will be in a position to appreciate the difference
  between UNION and UNION ALL
• further reading on the subject of query optimisation will assume
  familiarity with relational algebra
Note: Most textbooks that cover relational algebra use mathematical
symbols to represent operators. The Wikipedia article does likewise:
http://en.wikipedia.org/wiki/Relational_algebra.
Relational calculus and SQL
You will often see claims that SQL is based on relational calculus. It is
said that SQL is a non-procedural or declarative language, in contrast
with procedural or imperative languages like Java. Wikipedia provides
articles on declarative and imperative programming:
• declarative: http://en.wikipedia.org/wiki/Declarative_programming
• imperative: http://en.wikipedia.org/wiki/Imperative_programming
With the introduction of procedural extensions in SQL:1999 (SQL/PSM),
the claim that SQL is non-procedural is no longer valid. However,
developing stand-alone queries might still be described as declarative
programming: an SQL query does not specify a procedure to extract data
of interest – it simply declares the data of interest.
Aside: Given the 5-step processing model presented earlier, you might
think that an SQL query does specify a process; and, so, is not
declarative. The purpose of the model provided was to explain the
semantics of an SQL query. The model represents one possible SQL
query execution plan. As you know, query optimisation will consider
many possible execution plans for any given query.
Some points of interest:
• it is not important how a DBMS implements an SQL query
• it is important that the result produced by a DBMS is consistent with
  the 5-step process described earlier; a DBMS running on a quantum
  computer may take a very different approach
  (http://en.wikipedia.org/wiki/Quantum_computer)
• it is the high level of abstraction in the relational model that enables
  query optimisation; this abstraction may also enable very different
  styles of query processing in the future (more instantaneous than
  procedural, perhaps)
• bottom line: an SQL query is not intrinsically procedural
One of the nice things about Chris Date’s writings on relational calculus
is that he uses English words for the existential and universal quantifiers –
EXISTS and FORALL. The relational calculus comes in two flavours –
domain and tuple. SQL is based on the tuple calculus. The tuple calculus
uses variables that range over tuples in a relation. (The domain calculus
uses variables that range over values in a domain.)
The query below is followed by an equivalent relational tuple calculus
expression using Chris Date’s operators.
SELECT SNAME
FROM   S,SP
WHERE  S.SNO = SP.SNO AND PNO = 'P1';
RANGE OF SX IS S
RANGE OF SPX IS SP
SX.SNAME WHERE EXISTS SPX (SPX.SNO = SX.SNO AND
                           SPX.PNO = 'P1')
An English translation is:
The SNAME value of any S tuple where there exists an SP tuple with the
same SNO value and a PNO value of P1.
SQL provides an implementation of the existential quantifier – EXISTS.
An SQL query that is equivalent to the expression above is:
SELECT SNAME
FROM   S SX
WHERE  EXISTS ( SELECT *
                FROM   SP SPX
                WHERE  SX.SNO = SPX.SNO
                AND    SPX.PNO = 'P1');
Similarities between the above SQL query and relational calculus
expression illustrate the claim that SQL is based on relational calculus.
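The sketch below runs both SQL formulations in SQLite (via Python's sqlite3 module) on hypothetical data and checks that they return the same supplier names. With SP keyed on (SNO, PNO), the join cannot return a supplier twice for part P1; without such a key, the join version could repeat names where the EXISTS version would not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  (SNO TEXT PRIMARY KEY, SNAME TEXT);
    CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER, PRIMARY KEY (SNO, PNO));
    -- Hypothetical rows.
    INSERT INTO S  VALUES ('S1', 'Smith'), ('S2', 'Jones'), ('S3', 'Blake');
    INSERT INTO SP VALUES ('S1', 'P1', 300), ('S1', 'P2', 200),
                          ('S2', 'P1', 400), ('S3', 'P2', 200);
""")

JOIN_QUERY = """
SELECT SNAME
FROM S, SP
WHERE S.SNO = SP.SNO AND PNO = 'P1'
"""
EXISTS_QUERY = """
SELECT SNAME
FROM S SX
WHERE EXISTS ( SELECT *
               FROM SP SPX
               WHERE SX.SNO = SPX.SNO
               AND SPX.PNO = 'P1' )
"""
# Sorted because neither query specifies an ORDER BY.
join_names = sorted(conn.execute(JOIN_QUERY).fetchall())
exists_names = sorted(conn.execute(EXISTS_QUERY).fetchall())
print(join_names)    # [('Jones',), ('Smith',)]
print(exists_names)  # [('Jones',), ('Smith',)]
```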
Sadly, SQL does not provide an implementation of the universal
quantifier FORALL. The expression below evaluates to names of
suppliers shipping each and every part type.
RANGE OF SX IS S
RANGE OF SPX IS SP
RANGE OF PX IS P
SX.SNAME WHERE FORALL PX EXISTS SPX (SPX.SNO = SX.SNO AND
                                     SPX.PNO = PX.PNO)
An English translation is:
The SNAME value of any S tuple - sx, say - where for every P tuple there
is an SP tuple describing a shipment of that part type from supplier sx.
SQL can avoid implementing FORALL because it is not primitive: FORALL can be expressed in terms of NOT and EXISTS:
RANGE OF SX IS S
RANGE OF SPX IS SP
RANGE OF PX IS P
SX.SNAME WHERE NOT EXISTS PX NOT EXISTS SPX
               (SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO)
In English:
The SNAME value of any S tuple - sx, say - where there does not exist a P
tuple that has no related SP tuple describing a shipment of that part type
from supplier sx.
or…
The name of any supplier where there is no part type that they are not
shipping.
The lack of support for FORALL explains why some SQL queries use
double NOT EXISTS (see following example).
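Important: Since double NOT EXISTS queries are not the easiest to
read, always look for simpler equivalent queries. All of the queries below
are equivalent, given that no two SP rows describe the same supplier and
part type, every SP.PNO value matches a P row, and supplier names are
unique (the last query groups by SNAME).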
SELECT SNAME
FROM   S SX
WHERE  NOT EXISTS
       ( SELECT *
         FROM   P PX
         WHERE  NOT EXISTS
                ( SELECT *
                  FROM   SP SPX
                  WHERE  SPX.SNO = SX.SNO
                  AND    SPX.PNO = PX.PNO ));
is equivalent to:
SELECT SNAME
FROM   S SX
WHERE  NOT EXISTS
       ( SELECT *
         FROM   P PX
         WHERE  PNO NOT IN
                ( SELECT PNO
                  FROM   SP SPX
                  WHERE  SPX.SNO = SX.SNO ));
is equivalent to:
SELECT SNAME
FROM   S
WHERE  SNO IN
       ( SELECT SNO
         FROM   SP
         GROUP BY SNO
         HAVING COUNT(*) = ( SELECT COUNT (*)
                             FROM P ));
is equivalent to:
SELECT SNAME
FROM   S INNER JOIN SP ON S.SNO = SP.SNO
GROUP BY SNAME
HAVING COUNT(*) = ( SELECT COUNT (*) FROM P );
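The sketch below runs all four queries in SQLite via Python's sqlite3 module. The hypothetical schema declares the assumptions the equivalence depends on: SP is keyed on (SNO, PNO), SP.PNO references P, and SNAME is unique. (SQLite records foreign keys but enforces them only when PRAGMA foreign_keys is switched on.) In the hypothetical data, only S1 (Smith) ships every part type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S  (SNO TEXT PRIMARY KEY, SNAME TEXT UNIQUE);
    CREATE TABLE P  (PNO TEXT PRIMARY KEY);
    CREATE TABLE SP (SNO TEXT REFERENCES S, PNO TEXT REFERENCES P, QTY INTEGER,
                     PRIMARY KEY (SNO, PNO));
    -- Hypothetical rows: only S1 (Smith) ships every part type.
    INSERT INTO S  VALUES ('S1', 'Smith'), ('S2', 'Jones'), ('S3', 'Blake');
    INSERT INTO P  VALUES ('P1'), ('P2');
    INSERT INTO SP VALUES ('S1', 'P1', 300), ('S1', 'P2', 200),
                          ('S2', 'P1', 400), ('S3', 'P2', 200);
""")

QUERIES = {
    "double NOT EXISTS": """
        SELECT SNAME FROM S SX
        WHERE NOT EXISTS ( SELECT * FROM P PX
                           WHERE NOT EXISTS ( SELECT * FROM SP SPX
                                              WHERE SPX.SNO = SX.SNO
                                              AND SPX.PNO = PX.PNO ))""",
    "NOT IN": """
        SELECT SNAME FROM S SX
        WHERE NOT EXISTS ( SELECT * FROM P PX
                           WHERE PNO NOT IN ( SELECT PNO FROM SP SPX
                                              WHERE SPX.SNO = SX.SNO ))""",
    "COUNT with IN": """
        SELECT SNAME FROM S
        WHERE SNO IN ( SELECT SNO FROM SP GROUP BY SNO
                       HAVING COUNT(*) = ( SELECT COUNT(*) FROM P ))""",
    "COUNT with JOIN": """
        SELECT SNAME FROM S INNER JOIN SP ON S.SNO = SP.SNO
        GROUP BY SNAME
        HAVING COUNT(*) = ( SELECT COUNT(*) FROM P )""",
}

results = {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(f"{name:18} {rows}")  # every line ends with [('Smith',)]
```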
What use is relational calculus to you?
• familiarity with the calculus provides a solid platform for using SQL
• SQL provides direct support for the existential quantifier
• relational calculus is based on first order logic, a formal deductive
  system that attempts to capture the essence of human reasoning
  (http://en.wikipedia.org/wiki/First-order_predicate_calculus)
• first order logic finds application in many fascinating areas of
  computer science, including: artificial intelligence, deductive
  databases, logic programming, program proof, natural language
  processing (Wikipedia has articles on all of these topics)
Happy querying!
Working on Assignment 1
It is VERY important that you spend time working on Assignment 1 each
and every week up to the due date (Friday, Week 5). No tutorial work has
been set for this course. Instead, use the time to work on the assignments.
This week you should aim to finish your CREATE VIEW statements for
Assignment 1. Then, having completed your script, perform a final check
that the script runs correctly. It is vital that the marker can run your
script. If not, the marker is unlikely to give full credit for your work.
Having checked that your script runs without errors, the final steps to
making your submission are:
• check the documentation requirements for the assignment
• prepare your documentation
• download the marking sheet and check the assessment criteria
• prepare your zip file for submission
• make your submission
• record your submission number
• download your submission file and check it is not corrupted
Making an early start on Assignment 2
With the content of this module fresh in your mind, you might want to
take a look at the information requests in Assignment 2. It may take a
while to develop queries to answer these requests. The best way to
develop these queries may be to return to the task a few times. Make a
start now by just reading the requests.
Review Objectives
In preparation for the exam, review the learning objectives identified at the
start of this module. The exam for this course is open book. You can
take your own notes, an annotated Study Guide, and printed materials into
the exam. Prepare for the exam now by making any notes that will help
demonstrate you can satisfy the learning objectives identified at the start of
this module.
Exercises
Exercises 1 to 13 are integrated into the body of the module.
14. In your own words, explain why the information request below is not
correctly answered by the following query.
For each and every supplier described in the Parts database,
find the sum of the number of parts supplied in the past (already
used on projects) and the number of parts they have been
requested to supply in the future (currently being shipped).
SELECT SP.SNO,SUM(SP.QTY)+SUM(SPJ.QTY) AS TOTAL
FROM   SP JOIN SPJ ON SP.SNO = SPJ.SNO
GROUP BY SP.SNO;
15. With V1 and V2 defined as shown below, will the following query
fully answer our information request?
CREATE VIEW V1(SNO,ORDERED) AS
SELECT SNO, SUM(QTY)
FROM   SP
GROUP BY SNO;
CREATE VIEW V2(SNO,USED) AS
SELECT SNO, SUM(QTY)
FROM   SPJ
GROUP BY SNO;
SELECT V1.SNO,ORDERED+USED AS TOTAL
FROM   V1,V2
WHERE  V1.SNO = V2.SNO
UNION
SELECT SNO,ORDERED
FROM   V1
WHERE  SNO NOT IN ( SELECT SNO FROM V2 )
UNION
SELECT SNO,USED
FROM   V2
WHERE  SNO NOT IN ( SELECT SNO FROM V1 )
UNION
SELECT SNO,0
FROM   S
WHERE  SNO NOT IN ( SELECT SNO FROM V1 )
AND    SNO NOT IN ( SELECT SNO FROM V2 );
16. Consider the following information request and query.
List names of suppliers who are currently shipping more part types
than supplier S3.
SELECT SNAME
FROM   S,SP
WHERE  S.SNO = SP.SNO
GROUP BY SNAME
HAVING COUNT(PNO) >
       ( SELECT COUNT(PNO)
         FROM   SP
         WHERE  SNO = 'S3' );
Will the above query answer the information request correctly?
17. Consider the following information request and query.
Obtain names of suppliers that are currently shipping no parts.
SELECT SNAME
FROM   S LEFT JOIN SP ON S.SNO = SP.SNO
GROUP BY SNAME
HAVING SUM(QTY) = 0;
(a) Why will this query not answer the information request correctly?
(b) Modify the query to produce the required result.
18. Consider the following information request and query.
Obtain names of suppliers that are currently shipping no P2s.
SELECT SNAME
FROM   S LEFT JOIN SP ON S.SNO = SP.SNO
WHERE  PNO = 'P2' OR PNO IS NULL
GROUP BY SNAME
HAVING SUM(QTY) IS NULL;
Why will this query not answer the information request correctly?
19. Consider the following database, information request and query:
Account (AID, LastStatementDate, LastStatementBalance);
Payment (PID, AID, Date, Paid) AID references Account;
List Account IDs for accounts where the balance outstanding on the
last account statement has not been paid in full.
SELECT A.AID
FROM   Account A JOIN Payment P ON A.AID = P.AID
WHERE  P.Date > LastStatementDate
GROUP BY A.AID, LastStatementBalance
HAVING SUM(Paid) < LastStatementBalance;
This query will not always answer the information request correctly.
(a) Explain why this query will miss accounts that have made no
payment since the last statement.
(b) To include the missing accounts, one might be tempted to use an
outer join and extend the WHERE clause to retain the preserved rows
(OR P.Date IS NULL). Will this fix the problem?
(c) Formulate a query to always answer the request correctly.
20. The following database describes courses (in the C table) and their
prerequisite courses (in the R table). Consider the following database,
information request, query, and sample data:
C (CID, CName);
R (CID, PreReqCID) CID references C, PreReqCID references C;
List the prerequisite courses for COIT13143.
WITH Req(CID) AS
( SELECT PreReqCID FROM R WHERE CID = 'COIT13143'
UNION ALL
SELECT PreReqCID FROM R JOIN Req
ON R.CID = Req.CID )
SELECT CID FROM Req;
C
CID        CName
COIT11134  Java Programming
COIT11222  Visual Programming
COIT11226  Systems Analysis & Design
COIT12167  Database Use & Design
COIT13143  Database Application Development

R
CID        PreReqCID
COIT11134  COIT11222
COIT12167  COIT11134
COIT12167  COIT11226
COIT13143  COIT12167
(a) How do we know that the query above is a recursive query?
(b) Explain the workings of the above query. In particular, explain
how contents of the CTE called Req are derived recursively.
Solutions to exercises in module
1. As explained in the following text, the query produces an error because SNAME
“does not appear in an aggregate function or the GROUP BY clause”.
2. See solution to Exercise 14 below.
3. One might expect to see a NbrParts value of zero (0). Actually, SQL produces a
null NbrParts for S5.
4. No solution required.
5. S5 is missing from the result because S5 does not appear in V1 and an inner join
of V1 and V2 has been formed.
No, the Cartesian join of V1 and V2 followed by the row filter V1.SNO =
V2.SNO is equivalent to the inner join.
Yes, S5 appears in the result, but with a null TOTAL.
An outer join that preserves the rows of V2 will preserve S5, but with a null ORDERED value.
S5 appears with a null TOTAL.
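The behaviour described in Exercise 5 can be checked with a small sketch using Python's built-in sqlite3 module. The data is hypothetical, and the definitions of V1 (total ordered, from SP) and V2 (total used, from SPJ) are assumed for illustration. The outer join is written as V2 LEFT JOIN V1 because SQLite versions before 3.39 do not support RIGHT JOIN.

```python
import sqlite3

# Hypothetical data: S5 uses parts (SPJ) but has no orders (SP).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SP  (SNO TEXT, PNO TEXT, QTY INTEGER);
CREATE TABLE SPJ (SNO TEXT, PNO TEXT, JNO TEXT, QTY INTEGER);
INSERT INTO SP  VALUES ('S1','P1',300), ('S1','P2',200);
INSERT INTO SPJ VALUES ('S1','P1','J1',200), ('S5','P2','J2',100);
-- Assumed view definitions: V1 = total ordered, V2 = total used.
CREATE VIEW V1(SNO, ORDERED) AS SELECT SNO, SUM(QTY) FROM SP  GROUP BY SNO;
CREATE VIEW V2(SNO, USED)    AS SELECT SNO, SUM(QTY) FROM SPJ GROUP BY SNO;
""")

# Inner join: S5 has no row in V1, so it disappears.
inner = conn.execute("""
    SELECT V1.SNO, ORDERED + USED AS TOTAL
    FROM V1 JOIN V2 ON V1.SNO = V2.SNO
    ORDER BY V1.SNO""").fetchall()

# Cartesian join followed by a row filter: same rows as the inner join.
cartesian = conn.execute("""
    SELECT V1.SNO, ORDERED + USED AS TOTAL
    FROM V1, V2
    WHERE V1.SNO = V2.SNO
    ORDER BY V1.SNO""").fetchall()

# Outer join preserving V2: S5 survives, but ORDERED is null, so TOTAL is null.
outer = conn.execute("""
    SELECT V2.SNO, ORDERED + USED AS TOTAL
    FROM V2 LEFT JOIN V1 ON V1.SNO = V2.SNO
    ORDER BY V2.SNO""").fetchall()

print(inner)      # [('S1', 700)]
print(cartesian)  # [('S1', 700)]
print(outer)      # [('S1', 700), ('S5', None)]
```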
6. It does.
7. For S5 (SELECT SUM(QTY) FROM SP WHERE SNO=S.SNO) evaluates
to null. And, null + 3100 evaluates to null.
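A minimal sqlite3 sketch of Exercise 7, on hypothetical data in which S5 appears only in SPJ (with 3100 units, chosen to match the figure above). It also shows one common remedy, wrapping each correlated SUM in COALESCE; this is an illustration, not part of the exercise.

```python
import sqlite3

# Hypothetical data: S5 has no rows in SP.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE S   (SNO TEXT PRIMARY KEY);
CREATE TABLE SP  (SNO TEXT, PNO TEXT, QTY INTEGER);
CREATE TABLE SPJ (SNO TEXT, PNO TEXT, JNO TEXT, QTY INTEGER);
INSERT INTO S   VALUES ('S1'), ('S5');
INSERT INTO SP  VALUES ('S1','P1',300);
INSERT INTO SPJ VALUES ('S1','P1','J1',200), ('S5','P4','J4',3100);
""")

# For S5 the first subquery sums an empty set, giving null; null + 3100 is null.
rows = conn.execute("""
    SELECT SNO,
           (SELECT SUM(QTY) FROM SP  WHERE SNO = S.SNO) +
           (SELECT SUM(QTY) FROM SPJ WHERE SNO = S.SNO) AS TOTAL
    FROM S ORDER BY SNO""").fetchall()

# COALESCE turns each null sum into 0 before the addition.
fixed = conn.execute("""
    SELECT SNO,
           COALESCE((SELECT SUM(QTY) FROM SP  WHERE SNO = S.SNO), 0) +
           COALESCE((SELECT SUM(QTY) FROM SPJ WHERE SNO = S.SNO), 0) AS TOTAL
    FROM S ORDER BY SNO""").fetchall()

print(rows)   # [('S1', 500), ('S5', None)]
print(fixed)  # [('S1', 500), ('S5', 3100)]
```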
8. The query includes ALL after UNION to preserve duplicate (SNO, QTY) rows
that may be obtained from the two operand queries.
The result will not include a row with an SNO value that appears in S but does
not appear in either SP or SPJ.
SELECT SNO, SUM(QTY) AS TOTAL
FROM ( SELECT SNO,QTY
       FROM SP
       UNION ALL
       SELECT SNO,QTY
       FROM SPJ
       UNION ALL
       SELECT SNO,0
       FROM S ) AS V
GROUP BY SNO;
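Set operators of equal precedence are applied from left to right, so a plain UNION in the last position would remove duplicates from the whole combined set, including duplicates that the earlier UNION ALL kept. The sketch below, using Python's sqlite3 module and hypothetical data in which S1 ships 200 units in both SP and SPJ, compares the two forms.

```python
import sqlite3

# Hypothetical data: S1 has the same (SNO, QTY) pair in SP and SPJ; S3 ships nothing.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE S   (SNO TEXT PRIMARY KEY);
CREATE TABLE SP  (SNO TEXT, PNO TEXT, QTY INTEGER);
CREATE TABLE SPJ (SNO TEXT, PNO TEXT, JNO TEXT, QTY INTEGER);
INSERT INTO S   VALUES ('S1'), ('S2'), ('S3');
INSERT INTO SP  VALUES ('S1','P1',200);
INSERT INTO SPJ VALUES ('S1','P2','J1',200), ('S2','P3','J1',100);
""")

def totals(last_op):
    """last_op is the set operator placed before the SELECT SNO,0 operand."""
    sql = f"""
        SELECT SNO, SUM(QTY) AS TOTAL
        FROM ( SELECT SNO, QTY FROM SP
               UNION ALL
               SELECT SNO, QTY FROM SPJ
               {last_op}
               SELECT SNO, 0 FROM S ) AS V
        GROUP BY SNO ORDER BY SNO"""
    return conn.execute(sql).fetchall()

print(totals("UNION ALL"))  # [('S1', 400), ('S2', 100), ('S3', 0)]
print(totals("UNION"))      # [('S1', 200), ('S2', 100), ('S3', 0)]
```

With UNION in the last position, one of S1's two (S1, 200) rows is eliminated and its total drops from 400 to 200.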
9. It does produce the correct result.
It will not produce the correct result if S3 is shipping no parts.
One might expect it to list details of all suppliers shipping parts.
The query produces an empty result.
10. The following query will always list all suppliers shipping more parts than S5:
SELECT *
FROM S
WHERE SNO IN
      ( SELECT SNO
        FROM SP
        GROUP BY SNO
        HAVING SUM(QTY) > ( SELECT SUM(QTY)
                            FROM SP
                            WHERE SNO = 'S5' )
            OR NOT EXISTS ( SELECT *
                            FROM SP
                            WHERE SNO = 'S5' ));
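The sketch below uses sqlite3 and hypothetical data to run the query above alongside a copy without the NOT EXISTS branch. When S5 ships nothing, SUM over its empty set of rows is null, every HAVING comparison with null is unknown, and the unguarded copy returns an empty result, matching the situation described in Exercise 9.

```python
import sqlite3

# Hypothetical data: S5 is a supplier with no rows in SP.
SETUP = """
CREATE TABLE S  (SNO TEXT PRIMARY KEY, SNAME TEXT);
CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER);
INSERT INTO S  VALUES ('S1','Smith'), ('S2','Jones'), ('S5','Adams');
INSERT INTO SP VALUES ('S1','P1',300), ('S2','P1',100);
"""

NO_GUARD = """
    SELECT SNO FROM S
    WHERE SNO IN ( SELECT SNO FROM SP
                   GROUP BY SNO
                   HAVING SUM(QTY) > ( SELECT SUM(QTY) FROM SP WHERE SNO = 'S5' ) )
    ORDER BY SNO"""

WITH_GUARD = """
    SELECT SNO FROM S
    WHERE SNO IN ( SELECT SNO FROM SP
                   GROUP BY SNO
                   HAVING SUM(QTY) > ( SELECT SUM(QTY) FROM SP WHERE SNO = 'S5' )
                       OR NOT EXISTS ( SELECT * FROM SP WHERE SNO = 'S5' ) )
    ORDER BY SNO"""

def run(sql, s5_qty=None):
    """Build a fresh database, optionally give S5 one shipment, then run sql."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    if s5_qty is not None:
        conn.execute("INSERT INTO SP VALUES ('S5','P2',?)", (s5_qty,))
    return [row[0] for row in conn.execute(sql)]

print(run(NO_GUARD))         # []
print(run(WITH_GUARD))       # ['S1', 'S2']
print(run(WITH_GUARD, 200))  # ['S1']
```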
11. Two (2) bonuses will be paid by department D1.
The value 1 is shown as the count of bonuses for D1 since the null for S. Green
is removed from the set to which the COUNT function is applied.
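A sketch of the counting behaviour in Exercise 11, using a hypothetical Emp table (the table, its columns and the name J. Black are invented for illustration).

```python
import sqlite3

# Hypothetical table: S. Green's bonus is recorded as null.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Emp (Name TEXT, Dept TEXT, Bonus INTEGER);
INSERT INTO Emp VALUES ('J. Black', 'D1', 500), ('S. Green', 'D1', NULL);
""")

# COUNT(Bonus) skips the null value; COUNT(*) counts every row in the group.
row = conn.execute("""
    SELECT COUNT(Bonus), COUNT(*) FROM Emp WHERE Dept = 'D1'""").fetchone()
print(row)  # (1, 2)
```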
12. Advanced: Chris Date argues that SUM of an empty set should evaluate to 0,
since 0 is the identity under addition: 0 + x evaluates to x (which is still true if x
is null). The fact that x + null evaluates to null does not weaken his argument.
13. Advanced: Default values are not particularly useful for foreign keys. If a
default value is used, a row in the target table must hold this value for the target
candidate key. There are few cases where this is appropriate. In most cases, this
would require the introduction of a “dummy row” in the target table. Then:
queries using the referenced table must explicitly avoid the dummy row; or, a
view would be needed to filter out the dummy row. Also: queries using the
referencing table must take “default foreign key values” into account. Note:
searching for default foreign key values is still type-independent as we can
search for “IS DEFAULT”.
14. The query:
SELECT SP.SNO,SUM(SP.QTY)+SUM(SPJ.QTY) AS TOTAL
FROM SP JOIN SPJ ON SP.SNO = SPJ.SNO
GROUP BY SP.SNO;
produces the following result:
SNO  TOTAL
S1   8000
S2   12000
S3   1100
S4   3600
Consider the case of supplier S3. The correct value for S3 is 900.
SELECT * FROM SP WHERE SNO = 'S3';
produces:
SNO  PNO  QTY
S3   P2   200
SELECT * FROM SPJ WHERE SNO = 'S3';
produces:
SNO  PNO  JNO  QTY
S3   P3   J1   200
S3   P4   J2   500
SELECT *
FROM SP JOIN SPJ ON SP.SNO = SPJ.SNO
WHERE SP.SNO = 'S3';
produces:
SP.SNO  SP.PNO  SP.QTY  SPJ.SNO  SPJ.PNO  SPJ.QTY
S3      P2      200     S3       P3       200
S3      P2      200     S3       P4       500
giving:
SUM(SP.QTY) = 400
SUM(SPJ.QTY) = 700
SUM(SP.QTY)+SUM(SPJ.QTY) = 1100
In summary: Rows from SP and SPJ are joined over SNO. Each row is joined
to one or more related rows in the other table. Each QTY value will appear in the
join as many times as there are related rows in the other table. Summing
replicated QTY values will result in an inflated TOTAL value for the supplier.
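The replication can be reproduced with the S3 rows shown above (SP: S3 P2 200; SPJ: S3 P3 J1 200 and S3 P4 J2 500). The second query sketches one common remedy, aggregating each table by SNO before joining. It is offered as an illustration, not as the module's solution, and like any inner join it still drops suppliers that are missing from either table (see Exercise 5).

```python
import sqlite3

# The S3 rows from the example above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SP  (SNO TEXT, PNO TEXT, QTY INTEGER);
CREATE TABLE SPJ (SNO TEXT, PNO TEXT, JNO TEXT, QTY INTEGER);
INSERT INTO SP  VALUES ('S3','P2',200);
INSERT INTO SPJ VALUES ('S3','P3','J1',200), ('S3','P4','J2',500);
""")

# Joining first repeats each SP row once per matching SPJ row (and vice versa).
inflated = conn.execute("""
    SELECT SP.SNO, SUM(SP.QTY) + SUM(SPJ.QTY)
    FROM SP JOIN SPJ ON SP.SNO = SPJ.SNO
    GROUP BY SP.SNO""").fetchall()

# Aggregating each table before joining leaves one row per supplier on each side.
correct = conn.execute("""
    SELECT A.SNO, A.QTY + B.QTY
    FROM ( SELECT SNO, SUM(QTY) AS QTY FROM SP  GROUP BY SNO ) AS A
    JOIN ( SELECT SNO, SUM(QTY) AS QTY FROM SPJ GROUP BY SNO ) AS B
      ON A.SNO = B.SNO""").fetchall()

print(inflated)  # [('S3', 1100)]
print(correct)   # [('S3', 900)]
```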
15. Yes.
16. Yes, it will. COUNT returns zero when it is applied to an empty set of values, so the subquery yields 0 (not null) even if S3 is shipping no parts.
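A quick sqlite3 check of the COUNT behaviour behind Exercise 16, on a hypothetical subset of suppliers in which S3 ships nothing.

```python
import sqlite3

# Hypothetical data: S3 has no rows in SP.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE S  (SNO TEXT PRIMARY KEY, SNAME TEXT);
CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER);
INSERT INTO S  VALUES ('S1','Smith'), ('S2','Jones'), ('S3','Blake');
INSERT INTO SP VALUES ('S1','P1',300), ('S1','P2',200), ('S2','P1',300);
""")

# COUNT over S3's empty set of rows returns 0, not null.
s3_count = conn.execute(
    "SELECT COUNT(PNO) FROM SP WHERE SNO = 'S3'").fetchone()[0]

# The Exercise 16 query therefore lists every supplier shipping at least one part type.
names = [r[0] for r in conn.execute("""
    SELECT SNAME
    FROM S, SP
    WHERE S.SNO = SP.SNO
    GROUP BY SNAME
    HAVING COUNT(PNO) > ( SELECT COUNT(PNO) FROM SP WHERE SNO = 'S3' )
    ORDER BY SNAME""")]

print(s3_count)  # 0
print(names)     # ['Jones', 'Smith']
```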
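17. (a) For a supplier shipping no parts, the outer join produces a single row with a
null QTY. Nulls are removed from the set of items to which an aggregate function is
applied, so SUM is applied to an empty set and returns null, not 0. The comparison
null = 0 evaluates to unknown, so HAVING rejects the group for that supplier.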
(b) Two possibilities are:
SELECT SNAME
FROM S LEFT JOIN SP ON S.SNO = SP.SNO
GROUP BY SNAME
HAVING SUM(QTY) IS NULL;

SELECT SNAME
FROM S LEFT JOIN SP ON S.SNO = SP.SNO
GROUP BY SNAME
HAVING COALESCE(SUM(QTY),0) = 0;
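The three HAVING conditions from Exercise 17 can be compared side by side in a sqlite3 sketch with hypothetical data (S5 ships no parts).

```python
import sqlite3

# Hypothetical data: S5 (Adams) has no rows in SP.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE S  (SNO TEXT PRIMARY KEY, SNAME TEXT);
CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER);
INSERT INTO S  VALUES ('S1','Smith'), ('S5','Adams');
INSERT INTO SP VALUES ('S1','P1',300);
""")

def names(having):
    """Run the Exercise 17 query with the given HAVING condition."""
    sql = f"""
        SELECT SNAME
        FROM S LEFT JOIN SP ON S.SNO = SP.SNO
        GROUP BY SNAME
        HAVING {having}
        ORDER BY SNAME"""
    return [r[0] for r in conn.execute(sql)]

print(names("SUM(QTY) = 0"))              # []  (null = 0 is unknown)
print(names("SUM(QTY) IS NULL"))          # ['Adams']
print(names("COALESCE(SUM(QTY),0) = 0"))  # ['Adams']
```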
18. The WHERE clause is applied to the outer join of S and SP. This outer join will
preserve rows from S with no related row in SP.
An S row describing a supplier not shipping any parts will be preserved by the
outer join. The name of this supplier will appear in the result.
An S row describing a supplier that ships some parts, but no P2s, appears in the
outer join only joined to the SP rows for the parts that supplier ships. For this
supplier, no row in the outer join has a PNO that is “P2 or null”.
Consequently, no rows for this supplier survive the WHERE clause, and the
supplier is wrongly omitted from the result.
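A sqlite3 sketch of the problem in Exercise 18, using hypothetical data where S1 ships only P1, S2 ships P2 and S5 ships nothing, so the expected answer is Smith (S1) and Adams (S5). The second query shows one possible repair, assumed here rather than taken from the module: the P2 test moves into the ON clause, so it limits which SP rows are joined instead of discarding preserved rows, and COUNT(SP.PNO) = 0 then picks out suppliers with no P2 rows.

```python
import sqlite3

# Hypothetical data: S1 ships P1 only, S2 ships P2, S5 ships nothing.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE S  (SNO TEXT PRIMARY KEY, SNAME TEXT);
CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER);
INSERT INTO S  VALUES ('S1','Smith'), ('S2','Jones'), ('S5','Adams');
INSERT INTO SP VALUES ('S1','P1',300), ('S2','P2',200);
""")

# The Exercise 18 query: WHERE runs after the outer join and drops Smith's P1 row.
faulty = [r[0] for r in conn.execute("""
    SELECT SNAME
    FROM S LEFT JOIN SP ON S.SNO = SP.SNO
    WHERE PNO = 'P2' OR PNO IS NULL
    GROUP BY SNAME
    HAVING SUM(QTY) IS NULL
    ORDER BY SNAME""")]

# Assumed repair: only P2 rows take part in the join, so every supplier is preserved.
repaired = [r[0] for r in conn.execute("""
    SELECT SNAME
    FROM S LEFT JOIN SP ON S.SNO = SP.SNO AND SP.PNO = 'P2'
    GROUP BY SNAME
    HAVING COUNT(SP.PNO) = 0
    ORDER BY SNAME""")]

print(faulty)    # ['Adams']
print(repaired)  # ['Adams', 'Smith']
```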
19. (a) It will not include details of accounts that have made no payments since the
date of the last statement.
(b) No. See answer to Exercise 18.
(c) Three possibilities are:
SELECT A.AID
FROM Account A JOIN Payment P ON A.AID = P.AID
WHERE P.Date > LastStatementDate
GROUP BY A.AID, LastStatementBalance
HAVING SUM(Paid) < LastStatementBalance
UNION
SELECT A.AID
FROM Account A
WHERE LastStatementBalance > 0
AND NOT EXISTS
    ( SELECT *
      FROM Payment P
      WHERE P.AID = A.AID
      AND P.Date > LastStatementDate );

SELECT A.AID
FROM Account A
WHERE LastStatementBalance >
      ( SELECT SUM(Paid)
        FROM Payment P
        WHERE P.AID = A.AID
        AND P.Date > LastStatementDate )
OR (LastStatementBalance > 0 AND
    NOT EXISTS
    ( SELECT *
      FROM Payment P
      WHERE P.AID = A.AID
      AND P.Date > LastStatementDate ));

SELECT A.AID
FROM Account A
WHERE LastStatementBalance >
      ( SELECT COALESCE(SUM(Paid),0)
        FROM Payment P
        WHERE P.AID = A.AID
        AND P.Date > LastStatementDate );
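The difference between the original query and the third solution can be checked with hypothetical data: A1 has paid part of its balance, A2 has paid in full, and A3 has made no payment since its last statement. The sketch assumes Payment's foreign key column is named AID, as in the queries, and stores dates as ISO-8601 text so that string comparison orders them correctly.

```python
import sqlite3

# Hypothetical accounts and payments; statement date is 2024-01-31 for all accounts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Account (AID TEXT PRIMARY KEY, LastStatementDate TEXT,
                      LastStatementBalance INTEGER);
CREATE TABLE Payment (PID INTEGER PRIMARY KEY, AID TEXT REFERENCES Account,
                      Date TEXT, Paid INTEGER);
INSERT INTO Account VALUES ('A1','2024-01-31',500), ('A2','2024-01-31',300),
                           ('A3','2024-01-31',200);
INSERT INTO Payment VALUES (1,'A1','2024-02-10',200),  -- part payment
                           (2,'A2','2024-02-05',300),  -- paid in full
                           (3,'A3','2024-01-15',200);  -- before the statement
""")

# Original query: the inner join drops A3, which has no payment after the statement.
original = [r[0] for r in conn.execute("""
    SELECT A.AID
    FROM Account A JOIN Payment P ON A.AID = P.AID
    WHERE P.Date > LastStatementDate
    GROUP BY A.AID, LastStatementBalance
    HAVING SUM(Paid) < LastStatementBalance
    ORDER BY A.AID""")]

# Third solution: COALESCE treats "no payments since the statement" as 0 paid.
fixed = [r[0] for r in conn.execute("""
    SELECT A.AID
    FROM Account A
    WHERE LastStatementBalance >
          ( SELECT COALESCE(SUM(Paid),0)
            FROM Payment P
            WHERE P.AID = A.AID
            AND P.Date > LastStatementDate )
    ORDER BY A.AID""")]

print(original)  # ['A1']
print(fixed)     # ['A1', 'A3']
```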
20. (a) We know that the query is a recursive query because it includes a recursive
common table expression (CTE). Notice how the definition of CTE Req includes
a reference to itself in the query expression following UNION ALL.
WITH Req(CID) AS
( SELECT PreReqCID FROM R WHERE CID = 'COIT13143'
UNION ALL
SELECT PreReqCID FROM R JOIN Req
ON R.CID = Req.CID )
(b) The query works by evaluating the CTE Req and then executing the
following query against Req:
SELECT CID FROM Req;
The evaluation of Req starts by executing the anchor query below, which produces
the following result:
SELECT PreReqCID FROM R WHERE CID = 'COIT13143'
Req: CID| 'COIT12167'
The first evaluation of the chain query (below), applied to the row produced by the
anchor query, adds two rows, giving:
SELECT PreReqCID FROM R JOIN Req ON R.CID = Req.CID
Req: CID| 'COIT12167', 'COIT11134', 'COIT11226'
The second evaluation of the chain query, applied to the rows added in the previous
step ('COIT11134' and 'COIT11226'), adds one more row, giving:
Req: CID| 'COIT12167', 'COIT11134', 'COIT11226',
'COIT11222'
The third evaluation of the chain query, applied to the row added in the previous
step ('COIT11222'), produces no rows. That is, no new rows are added to Req. At this
point the recursive CTE is fully evaluated. Having evaluated Req, the query
referencing Req is run to produce the query output: the four CID values above.
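The evaluation traced above can be reproduced on the sample data with sqlite3. The sketch writes WITH RECURSIVE, the standard SQL form, which SQLite accepts; some systems, such as SQL Server, omit the RECURSIVE keyword, as the module's query does. Row order from a recursive CTE is not guaranteed, so the result is sorted before printing.

```python
import sqlite3

# Sample data from Exercise 20.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE C (CID TEXT PRIMARY KEY, CName TEXT);
CREATE TABLE R (CID TEXT REFERENCES C, PreReqCID TEXT REFERENCES C);
INSERT INTO C VALUES
  ('COIT11134','Java Programming'),
  ('COIT11222','Visual Programming'),
  ('COIT11226','Systems Analysis & Design'),
  ('COIT12167','Database Use & Design'),
  ('COIT13143','Database Application Development');
INSERT INTO R VALUES
  ('COIT11134','COIT11222'),
  ('COIT12167','COIT11134'),
  ('COIT12167','COIT11226'),
  ('COIT13143','COIT12167');
""")

# Anchor query seeds Req; the chain query follows prerequisite links until no rows are added.
prereqs = [r[0] for r in conn.execute("""
    WITH RECURSIVE Req(CID) AS
    ( SELECT PreReqCID FROM R WHERE CID = 'COIT13143'
      UNION ALL
      SELECT PreReqCID FROM R JOIN Req ON R.CID = Req.CID )
    SELECT CID FROM Req""")]

print(sorted(prereqs))
# ['COIT11134', 'COIT11222', 'COIT11226', 'COIT12167']
```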