Introduction to SQL
Session 2
Retrieving Data From Multiple Tables
DATA FROM MORE THAN ONE TABLE
Objective
• Select data from more than one table by joining tables
together.
• Use subqueries to select data from one table based on
data values from another table
• Combine the results of more than one query by using set
operators.
You can combine data from multiple sources by using two
different methods, Joins and Subqueries.
• Combining data from multiple tables using joins
o Inner Joins
o Outer Joins
• Combining data from multiple tables using subqueries
JOINS
Definition
To select data from more than one table, join the tables in
a query. Joining does not alter the original tables. However,
there is a good and a not so good way to go about this.
•Not so good way:
proc sql;
select *
from one, two; /* "one" and "two" represent two distinct tables */
quit;
•The Problem:
Joining tables in this way returns the Cartesian product of
the tables. In other words, each row in the first table is
combined with every row in the other. The result has as many
rows as the two row counts multiplied together, so it can be
very large and is not recommended.
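To see the size of a Cartesian product on a small scale, here is a minimal sketch. The tables work.one and work.two and their values are made up for illustration; they are not part of the course data.

```sas
/* Hypothetical toy tables in WORK, for illustration only */
data work.one;
   input x a $;
   datalines;
1 a
2 b
3 c
;
run;

data work.two;
   input x b $;
   datalines;
2 p
3 q
4 r
5 s
;
run;

proc sql;
   /* No WHERE clause: every row of one is paired with every row of two */
   select count(*) as n_rows
      from work.one, work.two;   /* 3 rows x 4 rows = 12 rows */
quit;
```

With real tables of a few thousand rows each, the same query would produce millions of rows.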
JOINS
• Better way: Use an Inner Join. Imagine we have tables
one and two.
proc sql;
select * from one, two
where one.x = two.x; /* "x" is a variable in both tables */
quit;
• Improvement:
The inner join returns only the rows that match in both
tables. For that, we use the WHERE clause to limit the
selection. Within the WHERE clause, name the columns
that you want to be compared for matching values.
Notes: 1. The column name (variable name) is preceded by the table
name, as in one.x. Also, you can select data from more than two tables by
separating table names with commas within the FROM clause.
2. Let’s use WidgeOne and Ucdavis2 as example tables now.
JOINS
You can also use other operators in the WHERE clause, either
to join columns or to subset the result.
Examples
1. where Plant like 'D%'; Returns only those Plant values that begin with D,
by using the wildcard "%".
2. where Plant ne "Dallas"; Returns rows whose Plant value is not equal to Dallas.
3. where Plant like 'D%' and Gender = 'Female'; Returns only rows that
satisfy every condition in the WHERE clause.
Note: PROC SQL treats nulls as missing values and as matches for joins, so a
missing value in one table matches a missing value of the same type in the other.
However, unless you have a specific need for this, you will want only
nonmissing values. Therefore, use the IS NOT MISSING operator:
where one.b = two.b and one.b is not missing;
This clause specifies to use only matching b values that are not missing.
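The effect of IS NOT MISSING can be sketched with two small tables. The tables and values below are hypothetical and exist only to show the behavior described in the note.

```sas
/* Hypothetical toy tables in WORK; each has one missing value of b */
data work.one;
   input a b;
   datalines;
1 10
2 .
;
run;

data work.two;
   input b c;
   datalines;
10 100
. 200
;
run;

proc sql;
   title 'Missing values match each other';
   select one.a, one.b, two.c
      from work.one, work.two
      where one.b = two.b;   /* returns both rows, including the b = . pair */

   title 'Nonmissing matches only';
   select one.a, one.b, two.c
      from work.one, work.two
      where one.b = two.b and one.b is not missing;   /* returns only b = 10 */
quit;
title;
```

The first query returns two rows because the missing b in one matches the missing b in two; the second returns one row.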
JOINS
Example of Inner Join Using the ON Clause With a WHERE Clause
ods graphics on;
ods rtf;
proc sql;
select a.plant, a.gender, a.position, a.jobgrade, a.yronjob, b.sex, b.gpa
from sql.WidgeOne a inner join sql.UcDavis b
on a.gender = b.sex
where a.jobgrade in (9,7) and a.yronjob = 11.1;
quit;
ods rtf close;
ods graphics off;
NOTE: The ODS statements send the output to an RTF file so that it is easy to copy.
Also, notice that WidgeOne is assigned the alias a, while UcDavis is assigned the alias b.
INNER JOINS
Notes
• Use as many variables as necessary by listing them in the
SELECT clause separated by commas.
• Remember, the table names from which you retrieve
data should be listed in the FROM clause.
OUTER JOINS
Definition
An outer join is an inner join supplemented with rows from
one table that do not match any row from the other table in
the join, so the result includes more than the matching data.
To write an outer join, put the join condition in the ON
clause instead of the WHERE clause. You can still add a
WHERE clause to subset the query result.
There are left, right, and full outer joins.
LEFT OUTER JOINS
Definition
• Left outer joins list matching rows and rows from, you
guessed it, the left-hand table (the first table listed in the
FROM clause) that do not match any row in the right-hand table.
• A left join uses the keywords LEFT JOIN and ON to
generate data.
LEFT OUTER JOINS
Example
proc sql;
select a.plant, b.sex, b.gpa
from sql.WidgeOne a left join sql.UcDavis b
on a.gender = b.sex;
quit;
The above code lists the plant from every row of WidgeOne
together with the sex and GPA of the matching people from
UcDavis. Because it is a left join, every WidgeOne row
appears in the output even if UcDavis has no matching row;
in that case sex and gpa are missing.
LEFT OUTER JOINS
Another Example
proc sql;
title 'Widgets and Students';
select a.plant, a.jobgrade, b.sex 'Gender', b.tv, b.alcohol
from sql.WidgeOne a left join sql.UcDavis b
on a.Gender = b.sex
where b.alcohol gt 10
order by a.plant; /* Note: We assign the label Gender to the variable sex from the second table. */
quit;
The left join itself keeps every row of WidgeOne and adds
the matching (based on the ON clause) data from UcDavis.
However, the WHERE clause is applied after the join, so only
rows with b.alcohol greater than 10 remain; WidgeOne rows
without a match have a missing b.alcohol and are dropped.
The result is ordered by Plant from WidgeOne.
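The difference between filtering in the WHERE clause and filtering in the ON clause can be sketched as follows. The tables work.emp and work.student and their values are hypothetical; moving the condition into ON is offered as one option, not as the method used in this course.

```sas
/* Hypothetical toy tables in WORK, for illustration only */
data work.emp;
   input plant $ gender $;
   datalines;
Dallas F
Austin M
;
run;

data work.student;
   input sex $ alcohol;
   datalines;
F 12
F 3
;
run;

proc sql;
   /* Condition in WHERE: applied after the join, so Austin
      (no match, alcohol missing) is dropped from the output */
   select a.plant, b.sex, b.alcohol
      from work.emp a left join work.student b
      on a.gender = b.sex
      where b.alcohol gt 10;

   /* Condition in ON: only restricts which rows count as a match,
      so Austin is kept with missing sex and alcohol */
   select a.plant, b.sex, b.alcohol
      from work.emp a left join work.student b
      on a.gender = b.sex and b.alcohol gt 10;
quit;
```

The first query returns only the Dallas row with alcohol 12. The second returns that row plus an Austin row with missing values.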
RIGHT OUTER JOINS
Definition
• A right join is specified with the keywords RIGHT JOIN and
ON.
• It is the opposite of a left join.
• Nonmatching rows from the right-hand table (the second
table listed in the FROM clause) are included with all
matching rows in the output.
• The example for this one uses a right join to select data,
reversing the direction of the join.
RIGHT OUTER JOINS
Right Outer Join
A RIGHT JOIN is the opposite of a LEFT JOIN. Nonmatching
rows from the right-hand table (second listed) are included
with all matching rows in the output:
proc sql;
title 'Widgets and Students';
select a.plant, a.jobgrade, b.sex 'Gender', b.tv, b.alcohol
from sql.WidgeOne2 a right join sql.UcDavis2 b
on a.Gender = b.sex;
quit;
FULL OUTER JOINS
A full outer join selects all matching and nonmatching
rows from both tables. Where a row has no match in the
other table, the columns from that table are missing.
TO DO
• Specify it with the keywords FULL JOIN and ON.
• Use a full outer join when you need every row from both
tables, or add a WHERE clause to keep a selected group.
FULL OUTER JOINS
EXAMPLE
proc sql outobs = 10;
title "Plant Locations and/or Employee GPA's";
select Plant, GPA
from sql.WidgeOne full join sql.UcDavis
on Gender = Sex;
quit;
Because OUTOBS=10 is specified, the output is limited to
the first 10 rows of the result: the Plant and GPA values
from the matching and nonmatching rows of the WidgeOne
and UcDavis tables.
POSITION COUNTS
What if the position of the joined data matters to you? A
DATA step to merge the data first might be in order.
Problem
You want to merge two tables and the position of the
values is important.
Solution
Use a DATA step merge to merge the data based on the BY
variable so that the values appear in the PROC SQL table in
a way that makes sense.
*The following two slides explain the DATA step merge.
POSITION COUNTS
DATA Step to Merge Data
• Merging combines observations from two or more SAS
data sets into a single observation in a new SAS data set.
• In match merging, use a BY statement to combine
observations from the input data sets based on common
values of the BY variable.
• There may be more than one variable in the BY
statement.
o SAS matches observations that share the same
values of all BY variables; the data are ordered by
the first variable listed, then by each subsequent
variable.
o Before using a BY statement, sort each input data
set by the BY variables (for example, with PROC SORT).
POSITION COUNTS
DATA Step to Merge Data SAS Code
proc sort data = sql.WidgeOne;
by Gender;
run;
proc sort data = sql.UcDavis (rename = (Sex = Gender));
by Gender;
run;
data sql.NewSet;
merge sql.WidgeOne sql.UcDavis;
by Gender;
run;
USING SUBQUERIES TO SELECT DATA
• A table join combines multiple tables into a new table.
• A subquery selects rows from one table based on values
in another table.
• Another name for it is Inner Query.
• It is a query-expression that is nested as part of another
query-expression.
• It is enclosed in parentheses.
• A subquery can return a single row and column or
multiple rows and columns.
• It can be used in a WHERE or HAVING clause with a
comparison operator.
• Depending on the clause that contains it, a subquery
can return a single value or multiple values.
USING SUBQUERIES TO SELECT DATA
Definition
• A single-value subquery returns a single row and column.
• It can be used in a WHERE or HAVING clause with a
comparison operator.
• The subquery must return one value, or else the query fails
o If this happens, an error message will be written to the
log.
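A minimal sketch of a single-value subquery, assuming the sql libref and the WidgeOne table used elsewhere in this session and assuming jobgrade is numeric. The choice of AVG is only for illustration: a summary function without GROUP BY returns exactly one row and one column, so the comparison operator GT is valid here.

```sas
proc sql;
   select plant, jobgrade
      from sql.WidgeOne
      where jobgrade gt
         /* the subquery returns a single value: the mean job grade */
         (select avg(jobgrade) from sql.WidgeOne);
quit;
```

If the inner query returned more than one row, this comparison would fail and the error would appear in the log.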
USING SUBQUERIES TO SELECT DATA
Example
proc sql;
select plant, jobgrade
from sql.WidgeOne
where jobgrade in
(select gpa from sql.UcDavis where gpa = 4);
quit;
The subquery returns the GPA values equal to 4 from
UcDavis. The outer query then selects the rows of WidgeOne
whose jobgrade equals one of those values, along with their
plant locations. Because the subquery can return more than
one row, it is compared with IN rather than with =.
USING SUBQUERIES TO SELECT DATA
Multiple-Value Subqueries
Definition
• A Multiple-Value Subquery can return more than one
value from one column.
• It is also used in a WHERE or HAVING expression that
contains IN or a comparison operator that is modified by
ANY or ALL.
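A comparison operator such as GT can be modified by ANY or ALL when the subquery returns several values. The sketch below assumes the sql.WidgeOne and sql.UcDavis tables from this session and assumes that jobgrade and computer are both numeric; the comparison itself is chosen only for illustration.

```sas
proc sql;
   /* jobgrade greater than at least one value returned by the subquery */
   select plant, jobgrade
      from sql.WidgeOne
      where jobgrade gt any
         (select computer from sql.UcDavis);

   /* jobgrade greater than every value returned by the subquery */
   select plant, jobgrade
      from sql.WidgeOne
      where jobgrade gt all
         (select computer from sql.UcDavis);
quit;
```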
USING SUBQUERIES TO SELECT DATA
Example
proc sql;
select gender, jobgrade
from sql.WidgeOne
where jobgrade in
(select computer from sql.UcDavis);
quit;
Notice that you are selecting a variable in the first table
(jobgrade from WidgeOne) based on criteria from the
second table (computer from UcDavis).
JOINS VERSUS SUBQUERIES
• A table join combines multiple tables into a new table.
• A subquery selects rows from one table based on values
in another.
• A subquery, or inner query, is nested as part of another
query expression.
COMBINING QUERIES WITH SET OPERATORS
PROC SQL can combine the results of two or more queries in
various ways by using the following set operators:
• UNION – produces all unique rows from both queries.
• EXCEPT – produces rows that are part of the first query only.
• INTERSECT – produces rows that are common to both query results.
• OUTER UNION – concatenates the query results.
Place a semicolon after the last SELECT statement only. Set
operators combine columns from two queries based on
their position in the referenced tables without regard to the
individual column names. Columns in the same relative
position must have the same data type, and the column
names from the first query become the column names of
the output table.
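The positional behavior can be sketched with two small tables whose columns have different names. The tables work.q1 and work.q2 and the output table work.both are hypothetical names used only for this illustration.

```sas
/* Hypothetical toy tables in WORK, for illustration only */
data work.q1;
   input id;
   datalines;
1
2
;
run;

data work.q2;
   input code;
   datalines;
2
3
;
run;

proc sql;
   create table work.both as
      select id from work.q1      /* no semicolon here */
      union
      select code from work.q2;   /* semicolon after the last SELECT only */
   describe table work.both;      /* the log shows one column, named id */
quit;
```

The columns id and code are combined because they occupy the same position, and the output column takes its name from the first query.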
COMBINING QUERIES WITH SET OPERATORS
TO DO
•Try each set of code in turn.
•It is helpful to think of the operators as one does in Set
Theory. Ask yourself, “What part of the two sets is
captured?”
•Draw two intersecting circles representing the tables,
shading the relevant areas as in Set Theory.
COMBINING QUERIES WITH SET OPERATORS
UNION
Combines two query results
proc sql;
title 'WidgeOne UNION UcDavis';
select jobgrade from sql.WidgeOne
union
select computer from sql.UcDavis;
quit;
You can also use the ALL keyword to request that duplicate
rows remain in the output.
select jobgrade from sql.WidgeOne
union all
select computer from sql.UcDavis;
COMBINING QUERIES WITH SET OPERATORS
EXCEPT
Returns rows that result from the first query but not the
second.
proc sql;
title 'WidgeOne EXCEPT UcDavis';
select jobgrade from sql.WidgeOne
except
select gpa from sql.UcDavis;
quit;
The above code returns rows that result from WidgeOne but
not from UcDavis. EXCEPT does not return duplicate rows
that do not occur in the second query. Adding ALL keeps
any duplicate rows that do not occur in the second query.
select jobgrade from sql.WidgeOne except all select gpa from sql.UcDavis;
COMBINING QUERIES WITH SET OPERATORS
INTERSECT
Works like the mathematical operator: Returns rows from
the first query that also occur in the second.
proc sql;
title 'WidgeOne INTERSECT UcDavis';
select jobgrade from sql.WidgeOne
intersect
select computer from sql.UcDavis;
quit;
The above code returns rows that result from WidgeOne
and from UcDavis but none that occur independently.
Again, adding ALL means that the output would contain
the rows produced by the first query that are matched
one-to-one with a row produced by the second query. In this
example, the output would match that of the above code.
COMBINING QUERIES WITH SET OPERATORS
OUTER UNION
Concatenates the results of the queries.
proc sql;
title 'WidgeOne OUTER UNION UcDavis';
select gender from sql.WidgeOne
outer union
select gpa from sql.UcDavis;
quit;
The above code returns rows that result from WidgeOne
followed by the results from UcDavis. Notice that the OUTER
UNION does not overlay columns from the two tables. To
overlay columns in the same position, use the
CORRESPONDING keyword.
select * from sql.WidgeOne outer union corr select * from
sql.UcDavis;
CLOSING NOTES
• It’s been my experience that the two variables must be
the same type when using set operators.
• Of course, a join or a subquery is used when you
reference information from multiple tables.
• Use a subquery when the result that you want requires
more than one query and each subquery provides a
subset of the table involved in the query.
• If a membership question is asked, then a subquery is
usually used. If the query requires a NOT EXISTS condition,
then you must use a subquery because NOT EXISTS
operates only in a subquery; the same principle holds true
for the EXISTS condition.
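A membership question of this kind can be sketched with NOT EXISTS. The example assumes the sql.WidgeOne and sql.UcDavis tables from this session and reuses the gender/sex comparison from the join examples; the question asked is only an illustration.

```sas
proc sql;
   /* List WidgeOne rows whose gender value has no matching
      sex value anywhere in UcDavis */
   select a.plant, a.gender
      from sql.WidgeOne a
      where not exists
         (select *
             from sql.UcDavis b
             where b.sex = a.gender);   /* correlated with the outer row */
quit;
```

Replacing NOT EXISTS with EXISTS returns the opposite set: the WidgeOne rows that do have at least one match in UcDavis.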
CULMINATING CODE
EXAMPLE:
ods graphics on;
ods rtf;
proc sql;
create table sql.table as
select a.plant, a.position, a.jobgrade, a.Post_Training_Productivity
as Total format=comma14., a.yronjob, b.gpa
from sql.WidgeOne a inner join sql.UcDavis1 b
on a.gender = b.gender
where a.jobgrade in (9,7) and a.yronjob = 11.1;
quit;
ods rtf close;
ods graphics off;