Correlated Subqueries in PROC SQL Barry Campbell Reproductive Care Program of Nova Scotia Overview • Quick review of PROC SQL • Subqueries and examples • Correlated subqueries and examples Quick review of PROC SQL • • • • SQL: Structured Query Language Talk to relational databases PROC SQL is SAS’s implementation SQL Integrates elements of SAS language: e.g., functions, formats, labels • Alternative to Data Step when you want to think relationally Terminology PROC SQL Syntax What is a subquery? • SELECT statement embedded in a SQL query instead of a column name, table name or expression • Use in SELECT, FROM, WHERE, or HAVING clauses • Scalar or vector depending on context • Can refer to same or different table • AKA “Nested” or “Inner” queries Subquery in SELECT clause SELECT student_id, (SELECT COUNT(*) FROM courses WHERE student_id = S.student_id) num_courses, (SELECT MAX(grade) FROM grades WHERE student_id = S.student_id) best_grade FROM students S Pull number of courses and best grade from other tables. Could also be done with a JOIN. Subquery in FROM clause Create inline table C to join with A and B using IDs A (admits) B (patients) C (DISTINCT doctors) Subquery in WHERE clause Systolic blood pressure What is a correlated subquery • Results of inner query constrained by outer • Uses a common variable or key to correlate inner and outer • Typically in the WHERE clause What’s it good for? • Selection depends on aggregate results from the same table • Selection depends on data about same entity in a different table • Combine multiple steps into one • Pass-through queries (execute on DBMS) What’s it good for? Examples • Improving grades: – Show students whose average grade improved at least 10% over last year’s • Flooding events: – List cities and dates on which rainfall was at least 10x the city’s daily average Correlated Subquery Example SELECT student_id, name FROM students S WHERE (SELECT MEAN(grade) FROM grades WHERE student_id = S.student_id AND year = 2011) > (SELECT MEAN(grade) FROM grades WHERE student_id = S.student_id AND year = 2010) * 1.1 Who improved their average by at least 10%? Correlated Subquery Example SELECT city_id, rainfall_mm, event_date FROM rainfall_data R WHERE rainfall_mm > (SELECT MEAN(rainfall_mm) * 10 FROM rainfall_data WHERE city_id = R.city_id) Where and when was the heavy rain? Correlation: inner to outer Correlated Subquery Example From patient visit registry, list all patients and the maximum amount charged for each one Correlated Subquery Example Which patients had vital signs taken more than once in the same visit? Correlated Subquery Example Show me profits on CPUs from top performing suppliers Correlated Subquery Example Build dataset from X but exclude observations found in Y Summary • Subqueries and correlated subqueries: compact way to write complex data manipulation. • Combine selection with aggregation • Think relationally rather than procedurally. • Good way to improve understanding of relationships among database tables Resources • PROC SQL for DATA Step Die-Hards, Christianna S. Williams http://www2.sas.com/proceedings/forum2008/185-2008.pdf Various data manipulation tasks using Data Step and SQL in an increasingly complex series of examples, including sub-queries and correlated sub-queries. • Advanced SQL Processing, Destiny Corporation http://www.nesug.org/proceedings/nesug02/hw/hw007.pdf Advanced topics in PROC SQL including HAVING, FULL JOINs and creation of Views, Indexes, and Data sets. Later examples of sub-queries and correlated sub-queries. • Working With Subquery in the SQL Procedure Lei Zhang, Domain Solutions Corp http://www.nesug.org/proceedings/nesug98/dbas/p005.pdf - Advanced examples of subqueries and correlated subqueries in all clauses of the SQL statement. • An Animated Guide: Knowing SQL Internal Processes makes SQL Easy - Russ Lavery http://www.phuse.eu/download.aspx?type=cms&docID=597 A graphical representation of the SQL process and some rules for describing/predicting the SQL process. Detailed examples with good explanations of pros and cons of subqueries. • http://beginner-sql-tutorial.com/sql-subquery.htm