Introduction to SAS

Haas MFE SAS Workshop
Lecture 3:
Peng Liu
http://faculty.haas.berkeley.edu/peliu/computing
Haas School of Business, Berkeley, MFE 2006
SAS SQL
PROC SQL - What

What can SQL do?
Selecting
Ordering/sorting
Subsetting
Restructuring
Creating tables/views
Joining/merging
Transforming variables
Editing

Haas School of Business, Berkeley, MFE 2006
Peng Liu AND Alexander Vedrashko 3
PROC SQL - Why

Advantages of using SQL
Combined functionality
Faster for smaller tables
SQL code is more portable to non-SAS applications
Does not require presorting
Does not require common variable names to join on (the join variables must have the same type, and ideally the same length)

Selecting Data
PROC SQL;
SELECT DISTINCT rating FROM MFE.MOVIES;
QUIT;

The simplest SQL query needs three statements: PROC SQL, SELECT … FROM, and QUIT
By default, PROC SQL prints the query result; use the NOPRINT option to suppress printing
Begin with PROC SQL; and end with QUIT; not RUN;
At least one SELECT … FROM statement is needed
DISTINCT is a keyword that removes duplicate rows from the result
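A minimal sketch of NOPRINT in use, assuming the goal is to store a result instead of printing it. The INTO clause and the macro variable name n_ratings are not part of the slides and appear here only for illustration.

```sas
/* NOPRINT suppresses the printed result; the value is stored instead. */
PROC SQL NOPRINT;
    SELECT COUNT(DISTINCT rating)
        INTO :n_ratings              /* hypothetical macro variable name */
        FROM MFE.MOVIES;
QUIT;

/* Write the stored value to the SAS log. */
%PUT Number of distinct ratings: &n_ratings;
```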
Ordering/Sorting Data
PROC SQL ;
SELECT *
FROM MFE.MOVIES
ORDER BY category;
QUIT;


Line breaks inside a SAS statement have no effect, so the SELECT statement can be spread over three lines
SELECT * means we select all variables from dataset MFE.MOVIES
Put ORDER BY after FROM
The data are sorted by the variable “category”
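ORDER BY also accepts several sort keys and the DESC keyword. A small sketch, assuming the year variable that appears in later slides is present in MFE.MOVIES:

```sas
PROC SQL;
    SELECT title, category, year
        FROM MFE.MOVIES
        /* Sort by category (ascending), then newest movies first within each category. */
        ORDER BY category, year DESC;
QUIT;
```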
Subsetting Data
- Character searching in WHERE
PROC SQL;
SELECT title, category
FROM MFE.MOVIES
WHERE category CONTAINS 'Action';
QUIT;




Use a comma (,) to separate the selected variables
The CONTAINS operator in the WHERE clause works only with character variables
Also try WHERE UPCASE(category) LIKE '%ACTION%';
The percent sign (%) is a wildcard for the LIKE operator; it matches any sequence of characters
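The LIKE variant written out as complete queries, as a sketch. The second query, which looks for categories that begin with 'Action', is an added illustration and assumes such values exist in the data.

```sas
PROC SQL;
    /* Case-insensitive search: UPCASE normalizes the text, % matches any run of characters. */
    SELECT title, category
        FROM MFE.MOVIES
        WHERE UPCASE(category) LIKE '%ACTION%';

    /* Prefix search: only categories that start with 'Action'. */
    SELECT title, category
        FROM MFE.MOVIES
        WHERE category LIKE 'Action%';
QUIT;
```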
Subsetting Data
- Phonetic Matching in WHERE
PROC SQL;
SELECT title, category, rating
FROM MFE.MOVIES
WHERE category =* 'Drana';
QUIT;

Always put WHERE after FROM
The sounds-like operator is =*
The query searches the movie category for phonetic variations of “drama”, which also helps catch spelling variations such as 'Drana'
Case Logic
- Reassigning/recategorizing
PROC SQL;
SELECT title, rating,
CASE rating
WHEN 'G' THEN 'General'
ELSE 'Other'
END AS level
FROM MFE.MOVIES;
QUIT;

The order of the clauses is important
CASE … END AS must be placed between SELECT and FROM
Note the comma (,) after the variables you want to select
Use WHEN … THEN … ELSE … to recode values
The recoded values are stored in a new variable, “level”; “rating” itself is unchanged
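CASE can also test conditions instead of comparing a single variable to fixed values. A sketch under assumptions: the cutoffs 90 and 120 and the name length_class are invented for illustration; only the length variable comes from the slides.

```sas
PROC SQL;
    SELECT title, length,
           CASE
               WHEN length < 90   THEN 'Short'     /* assumed cutoff */
               WHEN length <= 120 THEN 'Medium'    /* assumed cutoff */
               ELSE 'Long'
           END AS length_class                     /* hypothetical variable name */
        FROM MFE.MOVIES;
QUIT;
```

The WHEN conditions are checked in order, so the first one that holds decides the value.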
Creating New Data
- Create Table
PROC SQL;
CREATE TABLE ACTION AS
SELECT title, category
FROM MFE.MOVIES
WHERE category CONTAINS 'Action';
QUIT;



CREATE TABLE … AS can be placed in front of any SELECT … FROM statement to build a SAS dataset.
With a plain SELECT, the results of a query are converted to an output object (printing). Query results can also be stored as data. The CREATE TABLE statement creates a table with the results of a query. The CREATE VIEW statement stores the query itself as a view. Either way, the data identified in the query can be used in later SQL statements or in other SAS steps.
This produces a new dataset (table) ACTION in the WORK library; nothing is printed.
Creating New Data
- Create View
PROC SQL;
CREATE VIEW G_MOVIES AS
SELECT title, category, rating
FROM MFE.MOVIES
WHERE rating = 'G'
ORDER BY title;
SELECT * FROM G_MOVIES;
QUIT;



The first statement creates a view and produces no output; the second statement displays the desired results
Use a semicolon (;) to separate the two statements inside PROC SQL
When a table is created, the query is executed and the resulting data is stored in a file. When a view is created, the query itself is stored in the file. The data is not accessed at all in the process of creating a view.
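To confirm that a view holds a query and not data, the stored definition can be inspected. A short sketch, assuming the view G_MOVIES from this slide has already been created in the current session:

```sas
PROC SQL;
    /* Writes the SELECT statement stored in the view to the SAS log. */
    DESCRIBE VIEW G_MOVIES;
QUIT;
```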
Join Tables (Merge datasets)
- Cartesian Join
PROC SQL;
SELECT *
FROM MFE.CUSTOMERS, MFE.MOVIES;
QUIT;
Terminology: Join (Merge) datasets (tables)
No prior sorting is required, one advantage over MERGE in a DATA step
Use a comma (,) to separate the two datasets in FROM
Without WHERE, all possible combinations of rows from the two tables are produced, and all columns are included
Turn on the HTML result option for better display: Tools/Options/Preferences…/Results/ check Create HTML/OK

Join Tables (Merge datasets)
- Inner Join using WHERE
PROC SQL;
SELECT *
FROM MFE.MOVIES, MFE.ACTORS
WHERE MOVIES.title = ACTORS.title;
QUIT;

Use WHERE to specify the connecting columns (title)
table1.matchvar = table2.matchvar
Produces the rows that have the same movie title in both tables
The matching variables can have different names in the different datasets
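A sketch of a join on differently named variables. The column movie_title is hypothetical: it stands in for a case where ACTORS stores the title under another name, which is not how the workshop data is laid out.

```sas
PROC SQL;
    SELECT *
        FROM MFE.MOVIES, MFE.ACTORS
        /* movie_title is a hypothetical column name used only for illustration */
        WHERE MOVIES.title = ACTORS.movie_title;
QUIT;
```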
Join Tables (Merge datasets)
- Inner Join using WHERE (Cont.)
PROC SQL;
SELECT M.title, M.rating, A.actor_leading
FROM MFE.MOVIES M, MFE.ACTORS A
WHERE M.title = A.title;
QUIT;

Aliases are shortcuts for table names
They can be used in the SELECT and WHERE clauses
They must be declared in the FROM clause
Join Tables (Merge datasets)
- Join three tables
PROC SQL;
SELECT C.cust_no,
M.title, M.rating, M.category,
A.actor_leading
FROM MFE.CUSTOMERS C,
MFE.MOVIES2 M,
MFE.ACTORS A
WHERE C.cust_no = M.cust_no AND M.title = A.title;
QUIT;




Use AND in the WHERE clause to specify two matching conditions
Produces the rows that satisfy all the conditions
Note: We use MOVIES2 in this example
Up to 32 tables can be joined in one query (the limit at the time of this workshop; later SAS releases allow more)
Join Tables (Merge datasets)
- Inner Joins using ON
PROC SQL;
SELECT M.title, rating,actor_leading
FROM MFE.MOVIES M
INNER JOIN MFE.ACTORS A
ON M.TITLE = A.TITLE;
QUIT;

Same result as the join using WHERE
WHERE is used to select rows from inner joins
ON is used to select rows from outer or inner joins
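ON and WHERE can appear in the same query, with ON carrying the join condition and WHERE filtering the joined rows. A sketch; restricting to rating 'G' is an arbitrary choice for illustration.

```sas
PROC SQL;
    SELECT M.title, M.rating, A.actor_leading
        FROM MFE.MOVIES M
        INNER JOIN MFE.ACTORS A
            ON M.title = A.title       /* join condition */
        WHERE M.rating = 'G';          /* filter on the joined rows */
QUIT;
```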
Join Tables (Merge datasets)
- Left Outer Joins
PROC SQL;
SELECT MOVIES.title, actor_leading, rating
FROM MFE.MOVIES
LEFT JOIN
MFE.ACTORS
ON MOVIES.title = ACTORS.title;
QUIT;

The resulting output contains all rows for which the SQL expression referenced in the ON clause matches in both tables, plus all rows from the LEFT table (MOVIES) that did not match any row in the right (ACTORS) table.
Essentially, the rows from the LEFT table are preserved exactly as they are stored in the table itself, regardless of whether a match exists.
The matching variable in SELECT must be qualified with a table name
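One use of the preserved rows is to list the movies that have no match in ACTORS. A sketch: for unmatched rows the ACTORS columns are missing, so testing ACTORS.title isolates them.

```sas
PROC SQL;
    SELECT MOVIES.title, rating
        FROM MFE.MOVIES
        LEFT JOIN MFE.ACTORS
            ON MOVIES.title = ACTORS.title
        /* Keep only the MOVIES rows for which no ACTORS row was found. */
        WHERE ACTORS.title IS MISSING;
QUIT;
```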
Join Tables (Merge datasets)
- Right Outer Joins
PROC SQL;
SELECT ACTORS.title, actor_leading, rating
FROM MFE.MOVIES
RIGHT JOIN
MFE.ACTORS
ON MOVIES.title = ACTORS.title;
QUIT;


The resulting output contains all rows for which the SQL expression referenced in the ON clause matches in both tables, plus all rows from the RIGHT table (ACTORS) that did not match any row in the left (MOVIES) table.
Essentially, the rows from the RIGHT table are preserved exactly as they are stored in the table itself, regardless of whether a match exists.
Join Tables (Concatenating)
- Outer Union
PROC SQL;
SELECT * FROM MFE.CUSTOMERS
OUTER UNION
SELECT * FROM MFE.MOVIES;
QUIT;

SQL performs an OUTER UNION, similar to a DATA step with a SET statement that concatenates datasets.
The result contains all the rows produced by the first table-expression, followed by all the rows produced by the second table-expression.
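A plain OUTER UNION keeps the columns of both tables side by side, even when two columns share a name. Adding the CORR keyword overlays same-named columns, which is closer to what SET does. A sketch using the same two tables; whether they actually share any column names is not stated in the slides.

```sas
PROC SQL;
    SELECT * FROM MFE.CUSTOMERS
    OUTER UNION CORR                   /* CORR aligns columns that have the same name */
    SELECT * FROM MFE.MOVIES;
QUIT;
```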
Transforming Data
- Creating new Variables
/*Creating new variables*/
PROC SQL;
SELECT title, length, category, year, rating,
2006-year AS age
FROM MFE.MOVIES;
QUIT;

You can create new variables within the SELECT statement; the name of the new variable follows AS.
Note that the order is reversed relative to a DATA step assignment: the expression comes first and the variable name last
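A new variable defined in SELECT can be reused in the same query through the CALCULATED keyword. A sketch; the cutoff of 10 years is arbitrary and chosen only for illustration.

```sas
PROC SQL;
    SELECT title, year,
           2006 - year AS age
        FROM MFE.MOVIES
        /* CALCULATED refers to the age column computed in the SELECT list above. */
        WHERE CALCULATED age > 10;
QUIT;
```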
Transforming Data
- Summarizing Data using SQL functions
PROC SQL;
SELECT *,
COUNT(title) AS notitle,
MAX(year) AS most_recent,
MIN(year) AS earliest,
SUM(length) AS total_length,
NMISS(rating) AS nomissing
FROM MFE.MOVIES
GROUP BY rating;
QUIT;

Simple summarization functions are available
All functions can operate within the groups defined by GROUP BY
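With SELECT *, SAS merges the group summaries back onto every detail row. Selecting only the grouping variable and the summary columns returns one row per rating instead. A sketch; the names n_titles and avg_length are invented for illustration.

```sas
PROC SQL;
    SELECT rating,
           COUNT(title) AS n_titles,      /* hypothetical column name */
           AVG(length)  AS avg_length     /* hypothetical column name */
        FROM MFE.MOVIES
        GROUP BY rating;
QUIT;
```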
Editing Data
- Insert observations
PROC SQL NOPRINT;
INSERT INTO MFE.CUSTOMERS
VALUES(1 'Peng');
INSERT INTO MFE.CUSTOMERS
SET Cust_no=2,Name='Sasha';
QUIT;



There are two ways of inserting observations into a table. The data types must match those of the table's columns.
VALUES( ): the new values are separated by spaces (commas also work).
SET: column name = newly assigned value, with the assignments delimited by commas.
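Several rows can be added in one INSERT statement by repeating the VALUES clause. A sketch, assuming MFE.CUSTOMERS has the columns Cust_no and Name used above; the rows themselves are made up.

```sas
PROC SQL NOPRINT;
    /* Naming the target columns makes the order of the values explicit. */
    INSERT INTO MFE.CUSTOMERS (Cust_no, Name)
        VALUES (3, 'Ann')              /* made-up example row */
        VALUES (4, 'Bob');             /* made-up example row */
QUIT;
```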
Editing Data
- Deleting rows and dropping columns
/*Deleting rows*/
PROC SQL;
DELETE
FROM MFE.MOVIES
WHERE length LE 100;
QUIT;

/*Dropping variables*/
PROC SQL;
CREATE TABLE NEW
(DROP=rating) AS
SELECT *
FROM MFE.MOVIES;
QUIT;
Columns can be dropped either by listing only the wanted columns in SELECT or with the DROP= option on the created table
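The SELECT route, sketched for comparison: list every column except rating. The table name NEW2 is hypothetical, and the column list assumes the MFE.MOVIES variables that appear elsewhere in these slides.

```sas
PROC SQL;
    CREATE TABLE NEW2 AS                     /* NEW2 is a hypothetical table name */
        SELECT title, length, category, year /* rating is simply left out */
            FROM MFE.MOVIES;
QUIT;
```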
Editing Data
- Update observations
/*Updating Observation*/
PROC SQL NOPRINT;
UPDATE MFE.CUSTOMERS
SET Name='Liu'
WHERE Cust_no=1;
QUIT;



UPDATE … SET … WHERE
Finds the observation and sets the new value
If more than one observation satisfies the condition, all of them are updated with the new data in the SET clause