Lecture 8 - California State University, Los Angeles

Using Relational Databases and SQL
Lecture 8:
Subqueries
Department of Computer Science
California State University, Los Angeles
Subqueries
Subqueries are queries within queries
Also called inner queries
A query that contains a subquery is called an
outer query
A subquery must be surrounded by parentheses
Subquery Example
Example: -- List the name of each salesperson who does
not represent any members.
No subquery:
SELECT S.FirstName, S.LastName, S.salesID
FROM salespeople S LEFT JOIN members M
USING (salesID) WHERE M.memberID IS
NULL;
With subquery:
SELECT FirstName, LastName, salesID
FROM salespeople
WHERE salesID NOT IN
(SELECT distinct salesID FROM members);
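One caveat worth checking on your own data: if the subquery behind NOT IN returns a NULL, the NOT IN test is never true and the outer query returns no rows, while the LEFT JOIN version is unaffected. The sketch below is an illustration only. It uses Python's built-in sqlite3 module instead of MySQL, the sample rows are made up, and only the table and column names come from the lecture.

```python
import sqlite3

# In-memory SQLite database; all sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE salespeople (salesID INTEGER, FirstName TEXT, LastName TEXT);
    CREATE TABLE members (memberID INTEGER, salesID INTEGER);
    INSERT INTO salespeople VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ruiz'), (3, 'Cy', 'Okoye');
    INSERT INTO members VALUES (10, 1), (11, 1), (12, 2);
""")

join_sql = """
    SELECT S.salesID FROM salespeople S
    LEFT JOIN members M ON S.salesID = M.salesID
    WHERE M.memberID IS NULL
"""
not_in_sql = """
    SELECT salesID FROM salespeople
    WHERE salesID NOT IN (SELECT DISTINCT salesID FROM members)
"""
# Same idea, but NULLs are filtered out of the list first.
safe_not_in_sql = """
    SELECT salesID FROM salespeople
    WHERE salesID NOT IN
        (SELECT salesID FROM members WHERE salesID IS NOT NULL)
"""

def ids(sql):
    """Run a query and return the first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# With no NULLs in members.salesID, both versions agree.
before_join = ids(join_sql)
before_not_in = ids(not_in_sql)

# Add a member that has no salesperson (salesID is NULL).
conn.execute("INSERT INTO members VALUES (13, NULL)")
after_join = ids(join_sql)
after_not_in = ids(not_in_sql)
after_safe = ids(safe_not_in_sql)

print(before_join, before_not_in)            # [3] [3]
print(after_join, after_not_in, after_safe)  # [3] [] [3]
```

Filtering NULLs inside the subquery, as in safe_not_in_sql, keeps the NOT IN version in line with the join version.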
When to Use Subqueries
Use a subquery when:
It is impossible or extremely difficult to solve
the problem using a single query
A subquery solution to the problem runs faster
than an equivalent non-subquery solution
(rare with the current version of MySQL)
It is easier to understand than any
alternative solution
You want to use an aggregate function in a
WHERE clause; the subquery executes separately
and supplies the aggregate value
Types of Subqueries
Single Value Subqueries
Subquery returns a single value (one column, one row)
List Subqueries
Subquery returns a list (one column, multiple rows)
Table Subqueries
Subquery returns a table (multiple columns and rows)
WHERE Clause Subqueries
Use a subquery in the WHERE clause when you
want to filter records from the outer query using a
single value or list of values returned from one or
more subqueries
Single value subqueries are OK
List subqueries are OK
WHERE Clause Subquery Example
Example:
-- List all tracks with runtime greater than the
average runtime of all tracks.
This will not work: WHERE filters individual rows
before any aggregation happens, so an aggregate
function cannot be used directly in a WHERE clause:
SELECT TrackTitle, lengthSeconds FROM Tracks T
WHERE lengthSeconds > AVG(T.lengthSeconds);
WHERE Clause Subquery Example
Solution: put the aggregate into a subquery
SELECT TrackTitle, lengthSeconds
FROM Tracks
WHERE lengthSeconds >
(SELECT AVG(lengthSeconds) FROM Tracks);
The aggregate does not filter out any tracks from the
outer query results; the operator in the outer query
does that.
The subquery runs first and returns a single value (the
average); the outer query then runs and compares each
track against that value
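The contrast between the two attempts can be reproduced in a few lines. This is a sketch under stated assumptions: it runs on Python's built-in sqlite3 module rather than MySQL, and the four tracks are invented sample data.

```python
import sqlite3

# In-memory SQLite database with hypothetical tracks.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tracks (TrackTitle TEXT, lengthSeconds INTEGER);
    INSERT INTO Tracks VALUES ('A', 100), ('B', 200), ('C', 300), ('D', 400);
""")

# Attempt 1: aggregate used directly in WHERE. The database rejects it.
try:
    conn.execute(
        "SELECT TrackTitle FROM Tracks WHERE lengthSeconds > AVG(lengthSeconds)"
    )
    direct_error = None
except sqlite3.OperationalError as exc:
    direct_error = str(exc)

# Attempt 2: the aggregate lives in a single value subquery.
avg_length = conn.execute("SELECT AVG(lengthSeconds) FROM Tracks").fetchone()[0]
long_tracks = [
    row[0]
    for row in conn.execute("""
        SELECT TrackTitle FROM Tracks
        WHERE lengthSeconds > (SELECT AVG(lengthSeconds) FROM Tracks)
        ORDER BY TrackTitle
    """)
]

print(direct_error is not None)  # True
print(avg_length, long_tracks)   # 250.0 ['C', 'D']
```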
WHERE Clause Subquery Example
Example:
-- List all titles recorded at MakeTrax or Lone Star
Recording. Do not use a join and do not hard-code
company IDs.
WHERE Clause Subquery Example
Outer and Inner Queries:
The outer query...
SELECT Title FROM Titles
WHERE StudioID = (X) OR StudioID = (Y);
Inner query X...
SELECT studioID FROM studios WHERE
studioName = 'MakeTrax';
Inner query Y...
SELECT studioID FROM studios
WHERE studioName = 'Lone Star Recording';
WHERE Clause Subquery Example
Solution:
SELECT Title FROM titles
WHERE studioID =
(SELECT studioID FROM studios WHERE studioName = 'MakeTrax')
OR studioID =
(SELECT studioID FROM studios WHERE studioName = 'Lone Star Recording');
IN and NOT IN
Use the IN keyword to test if an expression
matches any items in a list (typically returned by
a subquery)
Syntax:
expression IN (list subquery)
expression NOT IN (list subquery)
IN Example
Example:
-- List the memberID of each member of the Bullets
without using a join
IN Example
Solution:
The outer query...
SELECT R.memberID FROM xrefArtistsMembers
R WHERE R.artistID = (X);
IN Example
Solution:
The inner query...
SELECT artistID FROM Artists WHERE artistName =
'the Bullets';
Substitute to get the solution...
SELECT MemberID FROM XrefArtistsMembers
WHERE ArtistID = (SELECT ArtistID FROM
Artists WHERE ArtistName = 'the Bullets');
ALL and ANY
ALL
The condition must hold true for all elements in the
list.
Syntax: expression operator ALL (list subquery)
ANY
The condition must hold true for at least one element
in the list.
Syntax: expression operator ANY (list subquery)
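The two quantifiers behave much like Python's built-in all() and any() applied to the list that the subquery returns. The sketch below is only an analogy in plain Python: the helper names and the sample years are made up, and it ignores NULLs, which SQL treats specially.

```python
# Hypothetical list returned by a list subquery (years stand in for birthdays).
subquery_result = [1960, 1972, 1985]

def greater_than_all(value, values):
    """Mimics: value > ALL (list subquery), ignoring NULLs."""
    return all(value > v for v in values)

def greater_than_any(value, values):
    """Mimics: value > ANY (list subquery), ignoring NULLs."""
    return any(value > v for v in values)

all_1990 = greater_than_all(1990, subquery_result)  # later than every element
all_1975 = greater_than_all(1975, subquery_result)  # 1985 is later, so False
any_1975 = greater_than_any(1975, subquery_result)  # later than 1960 and 1972
any_1950 = greater_than_any(1950, subquery_result)  # later than none

# Edge case: an empty list makes ALL true and ANY false.
empty_all = greater_than_all(1950, [])
empty_any = greater_than_any(1950, [])

print(all_1990, all_1975, any_1975, any_1950)  # True False True False
print(empty_all, empty_any)                    # True False
```

For a non-empty list without NULLs, > ALL amounts to "greater than the maximum" and > ANY to "greater than the minimum".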
ALL and ANY Examples
Example:
mysql> select lastname, birthday, region from
members m where (region = "GA") or (birthday >
all(select birthday from members where region =
"GA")) order by birthday;
Vs
mysql> select lastname, birthday, region from
members m where (region = "GA") or (birthday >
any(select birthday from members where region =
"GA")) order by birthday;
ALL and ANY Examples
Example:
-- List the names of all members whose birthdays are
later than those of all members from CA or OH
ALL and ANY Examples
Outer query...
SELECT LastName, FirstName
FROM Members
WHERE Birthday > ALL (X)
AND Birthday > ALL(Y)
ALL and ANY Examples
Inner queries:
-- Inner query X...
SELECT birthday FROM Members WHERE
Region = 'CA';
-- Inner query Y...
SELECT birthday
FROM Members WHERE Region = 'OH';
ALL and ANY Examples
Substitute to get solution:
SELECT lastName, FirstName FROM Members
WHERE birthday > ALL (SELECT birthday FROM
Members WHERE Region = 'CA') AND birthday >
ALL (SELECT birthday FROM Members WHERE
Region = 'OH');
HAVING Clause Subqueries
As with a WHERE clause, you can have
subqueries in a HAVING clause as well
Think substitution as well
List the number of members in each region which has
more members than California
HAVING Clause Subqueries
Outer and Inner Queries:
Outer Query:
SELECT Region, Count(*) FROM Members
GROUP BY Region
HAVING COUNT(*) > (X)
Inner Query:
SELECT COUNT(*) FROM Members WHERE
Region = 'CA';
HAVING Clause Subqueries
Substitute to get solution:
SELECT Region, COUNT(*) FROM Members
GROUP BY Region HAVING COUNT(*) >
(SELECT COUNT(*) FROM Members WHERE
Region = 'CA');
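To see the substitution at work, the sketch below runs the inner query on its own and then the full query. It assumes Python's built-in sqlite3 module in place of MySQL, and the six members are invented sample rows.

```python
import sqlite3

# In-memory SQLite database with hypothetical members.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Members (MemberID INTEGER, Region TEXT);
    INSERT INTO Members VALUES
        (1, 'CA'), (2, 'CA'),
        (3, 'GA'), (4, 'GA'), (5, 'GA'),
        (6, 'OH');
""")

# Inner query (X): a single value, the number of members in California.
ca_count = conn.execute(
    "SELECT COUNT(*) FROM Members WHERE Region = 'CA'"
).fetchone()[0]

# Full query: HAVING compares each group's count with that single value.
bigger_regions = conn.execute("""
    SELECT Region, COUNT(*) FROM Members
    GROUP BY Region
    HAVING COUNT(*) > (SELECT COUNT(*) FROM Members WHERE Region = 'CA')
""").fetchall()

print(ca_count)        # 2
print(bigger_regions)  # [('GA', 3)]
```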
SELECT Clause Subqueries
A SELECT clause subquery must return a single
value (not a list or table)
Examples:
SELECT (SELECT 1) + (SELECT 2); -- 3
SELECT (SELECT COUNT(*) FROM tracks); -- 50
SELECT (SELECT * FROM tracks); -- ERROR!!!
SELECT Clause Subqueries
SELECT clause subqueries are good for single-value calculations, such as percentages
Example:
-- What percentage of members are male?
SELECT Clause Subqueries
Example
-- OUTER QUERY
SELECT 100*(X)/(Y);
-- INNER QUERY X = number of male accounts
SELECT COUNT(*) FROM Members
WHERE Gender = 'M';
-- INNER QUERY Y = number of total accounts
SELECT COUNT(*) FROM Members;
-- SOLUTION
SELECT 100*(SELECT COUNT(*) FROM
Members WHERE Gender = 'M')/(SELECT
COUNT(*) FROM Members) as "Percent Male";
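A detail to watch when trying this outside MySQL: MySQL's / operator returns a decimal result, but some systems divide integers with truncation. The sketch below assumes Python's built-in sqlite3 module, where integer division truncates, and three invented members; multiplying by 100.0 instead of 100 avoids the truncation.

```python
import sqlite3

# In-memory SQLite database with hypothetical members (1 of 3 is male).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Members (MemberID INTEGER, Gender TEXT);
    INSERT INTO Members VALUES (1, 'M'), (2, 'F'), (3, 'F');
""")

# Integer arithmetic: SQLite truncates 100 * 1 / 3 to 33.
truncated_pct = conn.execute("""
    SELECT 100 * (SELECT COUNT(*) FROM Members WHERE Gender = 'M')
               / (SELECT COUNT(*) FROM Members)
""").fetchone()[0]

# Floating-point arithmetic: 100.0 forces a fractional result.
exact_pct = conn.execute("""
    SELECT 100.0 * (SELECT COUNT(*) FROM Members WHERE Gender = 'M')
                 / (SELECT COUNT(*) FROM Members)
""").fetchone()[0]

print(truncated_pct)        # 33
print(round(exact_pct, 2))  # 33.33
```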
Nested Subqueries
Nested subqueries are subqueries within
subqueries
Use same techniques as before, just go a little
further
Example:
-- List the birthdays of all members who belong to
artists which have recorded titles that include the
word “the.” Do not use any joins.
Nested Queries
-- Outer Query
SELECT birthday FROM members WHERE
memberID IN (X);
-- Inner Query (X)
SELECT memberID FROM xrefArtistsMembers
WHERE artistID IN (Y);
-- Inner Query (Y)
SELECT artistID FROM titles WHERE Title
LIKE '% the %' OR Title LIKE 'the %' OR Title
LIKE '% the';
Nested Queries
Solution:
SELECT lastname, birthday FROM members WHERE
memberID IN (SELECT memberID FROM
xrefArtistsMembers WHERE artistID IN (SELECT
artistID FROM titles WHERE Title LIKE '% the %' OR
Title LIKE 'the %' OR Title LIKE '% the'));
Nested Subqueries
What does this one do?
SELECT A.artistName FROM artists A WHERE
A.artistID IN (SELECT artistID FROM titles WHERE
titles.studioID IN (SELECT studioID FROM studios P WHERE
P.salesID IN (SELECT salesID FROM salespeople WHERE
base > 100)));
Nested Subqueries
Answer: Find all artists who have recorded titles
at studios which are represented by salespeople
whose base salaries are greater than $100
Correlated Subqueries
Previous subqueries have been non-correlated.
non-correlated means ‘no dependencies’
you can run the inner query separately
Correlated Subqueries
Correlated subqueries are inner queries that are
‘dependent’ on data from outer queries.
correlated means ‘with dependencies’
you can’t run the inner query separately
the result of the inner query ‘depends on’ data given
to it from the outer query
Conceptually, a correlated subquery is executed once
for each row returned by the outer query
Because it refers to columns of the outer query, the
sub-query cannot be run or debugged as an
independent query
Correlated Subqueries
List the first track of each title with its length in
seconds and the total length in seconds of all tracks
for that title:
Select TrackTitle, LengthSeconds As Sec,
(Select Sum(lengthseconds) From Tracks SC
Where SC.TitleID=T.TitleID) As TotSec
From Tracks T Where TrackNum=1;
The sub-query returns a single value for each row of
the outer query
The WHERE clause in the sub-query ties each run of the
sub-query to the current row of the outer query
Correlated Subqueries
Give the outer table an alias so that the subquery can
refer to the current outer row; this also makes the
full query easier to understand
Find the titles of all tracks that are shorter than the
mean length of the tracks on the title on which they
occur:
select tr.tracktitle, tr.lengthseconds from tracks tr
where tr.lengthseconds < (select
avg(lengthseconds) from tracks where titleID =
tr.titleID);
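The dependency on the outer row can be observed directly: the full query runs, while the inner query on its own fails because tr is not defined. This is a sketch that assumes Python's built-in sqlite3 module instead of MySQL and five invented tracks on two titles.

```python
import sqlite3

# In-memory SQLite database; title 1 averages 200 s, title 2 averages 100 s.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tracks (titleID INTEGER, tracktitle TEXT, lengthseconds INTEGER);
    INSERT INTO tracks VALUES
        (1, 'a', 100), (1, 'b', 200), (1, 'c', 300),
        (2, 'd', 50),  (2, 'e', 150);
""")

# Full correlated query: each track is compared with the average of its own title.
short_tracks = conn.execute("""
    SELECT tr.tracktitle, tr.lengthseconds FROM tracks tr
    WHERE tr.lengthseconds <
        (SELECT AVG(lengthseconds) FROM tracks WHERE titleID = tr.titleID)
    ORDER BY tr.tracktitle
""").fetchall()

# The inner query alone cannot run: tr only exists inside the outer query.
try:
    conn.execute("SELECT AVG(lengthseconds) FROM tracks WHERE titleID = tr.titleID")
    inner_error = None
except sqlite3.OperationalError as exc:
    inner_error = str(exc)

print(short_tracks)             # [('a', 100), ('d', 50)]
print(inner_error is not None)  # True
```

To debug the inner query separately, one option is to replace tr.titleID with a literal titleID temporarily.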
EXISTS with Sub-Queries
EXISTS checks for the existence of data in the
sub-query
Data is either there (True) or it isn't (False)
EXISTS with Sub-Queries
List the names of all artists who have recorded at
least one title:
SELECT artistname from artists A where Exists
(SELECT ArtistID FROM titles T WHERE
T.artistID = A.ArtistID)
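Because EXISTS only answers yes or no for each outer row, an artist with several titles still appears once, whereas a plain join without DISTINCT repeats the artist for every title. NOT EXISTS gives the opposite list. The sketch assumes Python's built-in sqlite3 module instead of MySQL, and the artists and titles are invented.

```python
import sqlite3

# In-memory SQLite database: Alpha has two titles, Beta none, Gamma one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artists (artistID INTEGER, artistname TEXT);
    CREATE TABLE titles (titleID INTEGER, artistID INTEGER);
    INSERT INTO artists VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    INSERT INTO titles VALUES (100, 1), (101, 1), (102, 3);
""")

def names(sql):
    """Run a query and return the first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

with_titles = names("""
    SELECT artistname FROM artists A WHERE EXISTS
        (SELECT artistID FROM titles T WHERE T.artistID = A.artistID)
""")
joined = names("""
    SELECT artistname FROM artists A
    INNER JOIN titles T ON T.artistID = A.artistID
""")
without_titles = names("""
    SELECT artistname FROM artists A WHERE NOT EXISTS
        (SELECT artistID FROM titles T WHERE T.artistID = A.artistID)
""")

print(with_titles)     # ['Alpha', 'Gamma']
print(joined)          # ['Alpha', 'Alpha', 'Gamma']
print(without_titles)  # ['Beta']
```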
More Subqueries
Find all artists which have members from GA:
• no subquery:
SELECT DISTINCT Artistname FROM Artists A
INNER JOIN XRefArtistsMembers X ON
A.ArtistID = X.ArtistID INNER JOIN Members M
ON X.MemberID = M.MemberID WHERE
M.Region = "GA";
More Subqueries
• One subquery, two joins:
SELECT DISTINCT Artistname FROM Artists A INNER JOIN
XRefArtistsMembers X ON A.ArtistID = X.ArtistID INNER
JOIN (SELECT MemberID FROM Members WHERE Region
= "GA") M ON X.MemberID = M.MemberID;
• Can you do this with no joins at all?
More Subqueries
• Two Subqueries:
Select distinct Artistname From Artists A where A.artistID in
(select artistID from xrefartistsmembers x where x.memberID in
(Select MemberID from Members where Region='GA'))
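A quick way to gain confidence that the three formulations are equivalent is to run them against the same small data set and compare the results. The sketch assumes Python's built-in sqlite3 module instead of MySQL; the rows are invented, and only the table and column names come from the lecture.

```python
import sqlite3

# In-memory SQLite database with hypothetical artists and members.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Artists (ArtistID INTEGER, Artistname TEXT);
    CREATE TABLE Members (MemberID INTEGER, Region TEXT);
    CREATE TABLE XRefArtistsMembers (ArtistID INTEGER, MemberID INTEGER);
    INSERT INTO Artists VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    INSERT INTO Members VALUES (10, 'GA'), (11, 'GA'), (12, 'CA');
    INSERT INTO XRefArtistsMembers VALUES (1, 10), (1, 11), (2, 12), (3, 11), (3, 12);
""")

queries = {
    "no subquery": """
        SELECT DISTINCT Artistname FROM Artists A
        INNER JOIN XRefArtistsMembers X ON A.ArtistID = X.ArtistID
        INNER JOIN Members M ON X.MemberID = M.MemberID
        WHERE M.Region = 'GA'
    """,
    "one subquery, two joins": """
        SELECT DISTINCT Artistname FROM Artists A
        INNER JOIN XRefArtistsMembers X ON A.ArtistID = X.ArtistID
        INNER JOIN (SELECT MemberID FROM Members WHERE Region = 'GA') M
            ON X.MemberID = M.MemberID
    """,
    "two subqueries": """
        SELECT DISTINCT Artistname FROM Artists A
        WHERE A.ArtistID IN
            (SELECT ArtistID FROM XRefArtistsMembers X WHERE X.MemberID IN
                (SELECT MemberID FROM Members WHERE Region = 'GA'))
    """,
}

# Run every formulation and collect the artist names in sorted order.
results = {
    name: sorted(row[0] for row in conn.execute(sql))
    for name, sql in queries.items()
}

for name, artists in results.items():
    print(name, artists)  # each line ends with ['Alpha', 'Gamma']
```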
Subqueries vs Joins
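• Conceptually, joins construct Cartesian products, then filter
• Subqueries select matching records
• Either version can be faster, depending on the DBMS version, its optimizer, and the available indexes, so measure both on your data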
EXISTS vs Joins
Why can EXISTS with a sub-query be faster than a
Join?
With an EXISTS sub-query, the DBMS does not have
to produce every matching combination of rows, as a
join conceptually does by building the Cartesian
product and then tossing out unmatched rows. It only
needs to evaluate the sub-query for each row of the
outer query, and it may not even have to run the
entire sub-query: as soon as it finds one matching
record, it knows that at least some data exists.
How to Solve Subquery Problems
To solve subquery problems:
Always think substitution
Analyze the question, looking for subqueries within
the question
Replace subqueries in the original question with
substitution variables such as X, Y, and Z
Write queries for your substitution variables
Write a query that solves the original question
using your substitution variables
Replace substitution variables with your subqueries
How to Solve Subquery Problems
In other words:
Try to solve the problem using a single query, and
when you get stuck, write a subquery for the part you
get stuck on!
Class Exercise
List the artistIDs of all artists which have
members born before January 2, 1970. List each
artist that meets the criteria only once.
a) use one or more joins, no subqueries
b) use a subquery, no joins