Using Relational Databases and SQL
Lecture 8: Subqueries
Department of Computer Science
California State University, Los Angeles

Subqueries
• Subqueries are queries within queries
• Also called inner queries
• A query that contains a subquery is called an outer query
• A subquery must be surrounded by parentheses

Subquery Example
Example: -- List the name of each salesperson who does not represent any members.
No subquery:
    SELECT S.FirstName, S.LastName, S.salesID
    FROM salespeople S LEFT JOIN members M USING (salesID)
    WHERE M.memberID IS NULL;
With subquery:
    SELECT FirstName, LastName, salesID
    FROM salespeople
    WHERE salesID NOT IN (SELECT DISTINCT salesID FROM members);
Note: if members.salesID can be NULL, NOT IN matches no rows at all; in that case add WHERE salesID IS NOT NULL to the subquery.

When to Use Subqueries
Use a subquery when:
• it is impossible or extremely difficult to solve the problem using a single query
• a subquery solution runs faster than an equivalent non-subquery solution (rare with the current version of MySQL)
• a subquery is easier to understand than any alternative solution
• you want to use an aggregate function in a WHERE clause; the subquery executes separately

Types of Subqueries
• Single value subqueries: the subquery returns a single value (one column, one row)
• List subqueries: the subquery returns a list (one column, multiple rows)
• Table subqueries: the subquery returns a table (multiple columns and rows)

WHERE Clause Subqueries
Use a subquery in the WHERE clause when you want to filter records from the outer query using a single value or a list of values returned from one or more subqueries
• Single value subqueries are OK
• List subqueries are OK

WHERE Clause Subquery Example
Example #2: -- List all tracks with runtime greater than the average runtime of all tracks.
This way won't work because WHERE filters individual rows before any aggregation takes place, so an aggregate function cannot be used in it:
    SELECT TrackTitle, lengthSeconds
    FROM Tracks T
    WHERE lengthSeconds > AVG(T.lengthSeconds);
Solution: put the aggregate into a subquery
    SELECT TrackTitle, lengthSeconds
    FROM Tracks
    WHERE lengthSeconds > (SELECT AVG(lengthSeconds) FROM tracks);
• The aggregate does not filter out any tracks from the outer query results; the comparison operator in the outer query does that
• The subquery runs to completion and supplies a single return value, then the outer query runs

Example #1: -- List all titles recorded at MakeTrax or Lone Star Recording. Do not use a join and do not hard-code company IDs.
Outer and inner queries:
The outer query...
    SELECT Title FROM Titles WHERE StudioID = (X) OR StudioID = (Y);
Inner query X...
    SELECT studioID FROM studios WHERE studioName = 'MakeTrax';
Inner query Y...
    SELECT studioID FROM studios WHERE studioName = 'Lone Star Recording';
Solution:
    SELECT Title
    FROM titles
    WHERE studioID = (SELECT studioID FROM studios WHERE studioName = 'MakeTrax')
       OR studioID = (SELECT studioID FROM studios WHERE studioName = 'Lone Star Recording');

IN and NOT IN
Use the IN keyword to test whether an expression matches any item in a list (typically returned by a subquery)
Syntax:
    expression IN (list subquery)
    expression NOT IN (list subquery)

IN Example
Example: -- List the memberID of each member of the Bullets without using a join.
The outer query...
    SELECT R.memberID FROM xrefArtistsMembers R WHERE R.artistID = (X);
The inner query...
    SELECT artistID FROM Artists WHERE artistName = 'the Bullets';
Substitute to get the solution...
    SELECT MemberID
    FROM XrefArtistsMembers
    WHERE ArtistID = (SELECT ArtistID FROM Artists WHERE ArtistName = 'the Bullets');
Here = works because the inner query returns a single artistID; if it could return more than one row, use IN in place of =.

ALL and ANY
ALL: The condition must hold true for all elements in the list.
    Syntax: expression operator ALL (list subquery)
ANY: The condition must hold true for at least one element in the list.
    Syntax: expression operator ANY (list subquery)

ALL and ANY Examples
Example:
    SELECT lastname, birthday, region
    FROM members m
    WHERE (region = 'GA')
       OR (birthday > ALL (SELECT birthday FROM members WHERE region = 'GA'))
    ORDER BY birthday;
vs.
    SELECT lastname, birthday, region
    FROM members m
    WHERE (region = 'GA')
       OR (birthday > ANY (SELECT birthday FROM members WHERE region = 'GA'))
    ORDER BY birthday;
The first query returns the GA members plus every member born after all of the GA members; the second returns the GA members plus every member born after at least one GA member.

Example: -- List the names of all members whose birthdays are later than those of all members from CA or OH
Outer query...
    SELECT LastName, FirstName FROM Members WHERE Birthday > ALL (X) AND Birthday > ALL (Y);
Inner query X...
    SELECT birthday FROM Members WHERE Region = 'CA';
Inner query Y...
    SELECT birthday FROM Members WHERE Region = 'OH';
Substitute to get the solution:
    SELECT lastName, FirstName
    FROM Members
    WHERE birthday > ALL (SELECT birthday FROM Members WHERE Region = 'CA')
      AND birthday > ALL (SELECT birthday FROM Members WHERE Region = 'OH');

HAVING Clause Subqueries
• As with a WHERE clause, you can have subqueries in a HAVING clause
• Think substitution here as well
Example: -- List the number of members in each region that has more members than California
Outer query:
    SELECT Region, COUNT(*) FROM Members GROUP BY Region HAVING COUNT(*) > (X);
Inner query:
    SELECT COUNT(*) FROM Members WHERE Region = 'CA';
Substitute to get the solution:
    SELECT Region, COUNT(*)
    FROM Members
    GROUP BY Region
    HAVING COUNT(*) > (SELECT COUNT(*) FROM Members WHERE Region = 'CA');

SELECT Clause Subqueries
A SELECT clause subquery must return a single value (not a list or table)
Examples:
    SELECT (SELECT 1) + (SELECT 2);        -- 3
    SELECT (SELECT COUNT(*) FROM tracks);  -- 50
    SELECT (SELECT * FROM tracks);         -- ERROR! (returns a table, not a single value)
SELECT clause subqueries are good for single-value calculations, such as percentages
Example: -- What percentage of members are male?
    -- OUTER QUERY
    SELECT 100*(X)/(Y);
    -- INNER QUERY X = number of male accounts
    SELECT COUNT(*) FROM Members WHERE Gender = 'M';
    -- INNER QUERY Y = number of total accounts
    SELECT COUNT(*) FROM Members;
    -- SOLUTION
    SELECT 100*(SELECT COUNT(*) FROM Members WHERE Gender = 'M')
              /(SELECT COUNT(*) FROM Members) AS "Percent Male";

Nested Subqueries
• Nested subqueries are subqueries within subqueries
• Use the same techniques as before, just go a little further
Example: -- List the birthdays of all members who belong to artists which have recorded titles that include the word "the". Do not use any joins.
    -- Outer Query
    SELECT lastname, birthday FROM members WHERE memberID IN (X);
    -- Inner Query (X)
    SELECT memberID FROM xrefArtistsMembers WHERE artistID IN (Y);
    -- Inner Query (Y)
    SELECT artistID FROM titles
    WHERE Title LIKE '% the %' OR Title LIKE 'the %' OR Title LIKE '% the';
Solution:
    SELECT lastname, birthday
    FROM members
    WHERE memberID IN
        (SELECT memberID FROM xrefArtistsMembers
         WHERE artistID IN
            (SELECT artistID FROM titles
             WHERE Title LIKE '% the %' OR Title LIKE 'the %' OR Title LIKE '% the'));

What does this one do?
    SELECT A.artistName
    FROM artists A
    WHERE A.artistID IN
        (SELECT artistID FROM titles
         WHERE titles.studioID IN
            (SELECT studioID FROM studios P
             WHERE P.salesID IN
                (SELECT salesID FROM salespeople WHERE base > 100)));
Answer: it finds all artists who have recorded titles at studios which are represented by salespeople whose base salaries are greater than $100

Correlated Subqueries
The previous subqueries have all been non-correlated.
• non-correlated means 'no dependencies'
• you can run the inner query separately
Correlated subqueries are inner queries that depend on data from the outer query.
• correlated means 'with dependencies'
• you can't run the inner query separately
• the result of the inner query depends on data given to it by the outer query
• a correlated subquery is evaluated once for each row processed by the outer query
• because it references columns of the outer query, the sub-query cannot be run or debugged as an independent query

Example: -- List the first track of each title with its length in seconds and the total length in seconds of all tracks for that title:
    SELECT TrackTitle, LengthSeconds AS Sec,
           (SELECT SUM(lengthseconds) FROM Tracks SC WHERE SC.TitleID = T.TitleID) AS TotSec
    FROM Tracks T
    WHERE TrackNum = 1;
• The sub-query is evaluated once per outer row, so it supplies one value for each row of the result
• The WHERE clause in the sub-query ties it to the current row of the outer query

You can use an alias for the table in the outer query, making the full query easier to understand
Example: -- Find the titles of all tracks that are shorter than the mean length of the tracks on the title on which they occur:
    SELECT tr.tracktitle, tr.lengthseconds
    FROM tracks tr
    WHERE tr.lengthseconds < (SELECT AVG(lengthseconds) FROM tracks WHERE titleID = tr.titleID);

EXISTS with Sub-Queries
• EXISTS checks for the existence of data in the sub-query
• Data is either there (True) or it isn't (False)
Example: -- List the names of all artists who have recorded at least one title:
    SELECT artistname
    FROM artists A
    WHERE EXISTS (SELECT ArtistID FROM titles T WHERE T.artistID = A.ArtistID);

More Subqueries
Find all artists which have members from GA:
• No subquery:
    SELECT DISTINCT Artistname
    FROM Artists A
    INNER JOIN XRefArtistsMembers X ON A.ArtistID = X.ArtistID
    INNER JOIN Members M ON X.MemberID = M.MemberID
    WHERE M.Region = 'GA';
• One subquery, two joins:
    SELECT DISTINCT Artistname
    FROM Artists A
    INNER JOIN XRefArtistsMembers X ON A.ArtistID = X.ArtistID
    INNER JOIN (SELECT MemberID FROM Members WHERE Region = 'GA') M ON X.MemberID = M.MemberID;
• Can you do this with no joins at all?
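The two join-based versions above should return the same artists. A minimal sketch for checking that, assuming Python's sqlite3 module in place of MySQL and a handful of hypothetical rows (the artist names and IDs below are made up, not taken from the course database):

```python
import sqlite3

# Hypothetical toy rows; names and IDs are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artists (ArtistID INTEGER, Artistname TEXT);
CREATE TABLE Members (MemberID INTEGER, Region TEXT);
CREATE TABLE XRefArtistsMembers (ArtistID INTEGER, MemberID INTEGER);
INSERT INTO Artists VALUES (1, 'Alpha Band'), (2, 'Beta Band'), (3, 'Gamma Band');
INSERT INTO Members VALUES (10, 'GA'), (11, 'GA'), (12, 'CA'), (13, 'OH');
INSERT INTO XRefArtistsMembers VALUES (1, 10), (1, 11), (2, 12), (3, 13), (3, 11);
""")

# Version with no subquery: filter on Region after joining all three tables.
join_only = """
SELECT DISTINCT Artistname
FROM Artists A
INNER JOIN XRefArtistsMembers X ON A.ArtistID = X.ArtistID
INNER JOIN Members M ON X.MemberID = M.MemberID
WHERE M.Region = 'GA'
"""

# Version with one table subquery: filter Members first, then join.
with_subquery = """
SELECT DISTINCT Artistname
FROM Artists A
INNER JOIN XRefArtistsMembers X ON A.ArtistID = X.ArtistID
INNER JOIN (SELECT MemberID FROM Members WHERE Region = 'GA') M
        ON X.MemberID = M.MemberID
"""

# Neither query has an ORDER BY, so sort in Python before comparing.
join_result = sorted(row[0] for row in conn.execute(join_only))
subquery_result = sorted(row[0] for row in conn.execute(with_subquery))
print(join_result, subquery_result)
```

Alpha Band has two GA members, which is why DISTINCT is needed in both versions.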
More Subqueries
• Two subqueries:
    SELECT DISTINCT Artistname
    FROM Artists A
    WHERE A.artistID IN
        (SELECT artistID FROM xrefartistsmembers x
         WHERE x.memberID IN
            (SELECT MemberID FROM Members WHERE Region = 'GA'));

Subqueries vs Joins
• Conceptually, joins construct Cartesian products, then filter
• Subqueries select matching records
• The subquery version can be faster, depending on how the optimizer handles each form

Why can EXISTS with a sub-query be faster than a join?
With an EXISTS sub-query, SQL does not have to perform a full row-by-row join, building the Cartesian product and then tossing out unmatched rows. It simply runs the sub-query for each row of the outer query. It may not even have to run the entire sub-query, since as soon as it finds one matching record it knows that at least some data exists.

How to Solve Subquery Problems
To solve subquery problems:
• Always think substitution
• Analyze the question, looking for subqueries within the question
• Replace subqueries in the original question with substitution variables such as X, Y, and Z
• Write queries for your substitution variables
• Write a query that solves the original question using your substitution variables
• Replace the substitution variables with your subqueries
In other words: try to solve the problem using a single query, and when you get stuck, write a subquery for the part you get stuck on!

Class Exercise
List the artistIDs of all artists which have members born before January 2, 1970. List each artist that meets the criteria only once.
a) use one or more joins, no subqueries
b) use a subquery, no joins
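The substitution steps listed under "How to Solve Subquery Problems" can be tried out on a toy database before tackling the exercise. A minimal sketch, assuming Python's sqlite3 module in place of MySQL and a hypothetical tracks table with made-up rows; it reuses the earlier average-runtime problem, not the class exercise:

```python
import sqlite3

# Hypothetical toy data; the course database is not reproduced here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tracks (TitleID INTEGER, TrackNum INTEGER, "
    "TrackTitle TEXT, LengthSeconds INTEGER)"
)
conn.executemany(
    "INSERT INTO tracks VALUES (?, ?, ?, ?)",
    [(1, 1, "Alpha", 100), (1, 2, "Beta", 200),
     (2, 1, "Gamma", 300), (2, 2, "Delta", 400)],
)

# Step 1: write the inner query X on its own and check the value it returns.
inner_x = "SELECT AVG(LengthSeconds) FROM tracks"
average_length = conn.execute(inner_x).fetchone()[0]

# Step 2: write the outer query with a placeholder where X belongs.
outer_template = (
    "SELECT TrackTitle FROM tracks "
    "WHERE LengthSeconds > ({X}) ORDER BY TrackTitle"
)

# Step 3: substitute the subquery text for the placeholder and run the result.
full_query = outer_template.format(X=inner_x)
longer_than_average = [row[0] for row in conn.execute(full_query)]
print(average_length, longer_than_average)
```

Running the inner query by itself first is only possible because it is non-correlated; a correlated subquery would need the outer row's values to be supplied by hand.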