Haas MFE SAS Workshop Lecture 3: Peng Liu http://faculty.haas.berkeley.edu/peliu/computing Haas School of Business, Berkeley, MFE 2006 SAS SQL Peng Liu http://faculty.haas.berkeley.edu/peliu/computing Haas School of Business, Berkeley, MFE 2006 PROC SQL - What What does SQL can do? Selecting Ordering/sorting Subsetting Restructuring Creating table/view Joining/Merging Transforming variables Editing Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 3 PROC SQL - Why The Advantage of using SQL Combined functionality Faster for smaller tables SQL code is more portable for non-SAS applications Not require presorting Not require common variable names to join on. (need same type , length) Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 4 Selecting Data PROC SQL; SELECT DISTINCT rating FROM MFE.MOVIES; QUIT; The simplest SQL code, need 3 statements By default, it will print the resultant query, use NOPRINT option to suppress this feature. Begin with PROC SQL, end with QUIT; not RUN; Need at least one SELECT… FROM statement DISTINCT is an option that removes duplicate rows Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 5 Ordering/Sorting Data PROC SQL ; SELECT * FROM MFE.MOVIES ORDER BY category; QUIT; Remember the placement of the SAS statements has no effect; so we can put the middle statement into 3 lines SELECT * means we select all variables from dataset MFE.MOVIES Put ORDER BY after FROM. We sort the data by variable “category” Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 6 Subsetting Data - Character searching in WHERE PROC SQL; SELECT title, category FROM MFE.MOVIES WHERE category CONTAINS 'Action'; QUIT; Use comma (,) to separate selected variables CONTAINS in WHERE statement only for character variables Also try WHERE UPCASE(category) LIKE '%ACTION%'; Use wildcard char. Percent sign (%) with LIKE operator. Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 7 Subsetting Data - Phonetic Matching in WHERE PROC SQL; SELECT title, category, rating FROM MFE.MOVIES WHERE category =* 'Drana'; QUIT; Always Put WHERE after FROM Sounds like operator =* Search movie title for the phonetic variation of “drama”, also help possible spelling variations Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 8 Case Logic - reassigning/recategorize PROC SQL; SELECT title, rating, CASE rating WHEN 'G' THEN ‘General' END AS level FROM MFE.MOVIES; QUIT; ELSE 'Other' The order of each statement is important CASE …END AS should in between SELECT and FROM Note there is , after the variables you want to select Use WHEN … THEN ELSE… to redefine variables Rename variable from “rating” to “level” Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 9 Creating New Data - Create Table PROC SQL; CREATE TABLE ACTION AS SELECT title, category FROM MFE.MOVIES WHERE category CONTAINS 'Action'; QUIT; CREATE TABLE … AS can always be in front of SELECT … FROM statement to build a sas file. In SELECT, the results of a query are converted to an output object (printing). Query results can also be stored as data. The CREATE TABLE statement creates a table with the results of a query. The CREATE VIEW statement stores the query itself as a view. Either way, the data identified in the query can beused in later SQL statements or in other SAS steps. Produce a new dataset (table) ACTION in work directory, no printing Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 10 Creating New Data - Create View PROC SQL; CREATE VIEW G_MOVIES AS SELECT title, category, rating FROM MFE.MOVIES WHERE rating = 'G' ORDER BY title; SELECT * FROM G_MOVIES; QUIT; First step-creating a view,no output is produced; then display the desired output results Use ; to separate two block of code inside of proc sql When a table is created, the query is executed and the resulting data is stored in a file. When a view is created, the query itself is stored in the file. The data is not accessed at all in the process of creating a view. Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 11 Join Tables (Merge datasets) - Cartesian Join PROC SQL; SELECT * FROM MFE.CUSTOMERS, MFE.MOVIES; QUIT; Terminology: Join (Merge) datasets (tables) No prior sorting required – one advantage over DATA MERGE Use comma (,) to separate two datasets in FROM Without WHERE, all possible combinations of rows from each tables is produced, all columns are included Turn on the HTML result option for better display: Tool/Options/Preferences…/Results/ check Create HTML/OK Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 12 Join Tables (Merge datasets) - Inner Join using WHERE PROC SQL; SELECT * FROM MFE.MOVIES, MFE.ACTORS WHERE MOVIES.title = ACTORS.title; QUIT; Use WHERE to specify connecting columns (title) table1.matchvar = table2.matchvar Produce rows that have same movie title The matching variable can be of different name different datasets Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 13 Join Tables (Merge datasets) - Inner Join using WHERE (Cont.) PROC SQL; SELECT M.title, M.rating, A.actor_leading FROM MFE.MOVIES M, MFE.ACTORS A WHERE MOVIES.title = ACTORS.title; QUIT; Short-cut for table names Can be used in SELECT and WHERE statements Need to be declared in FROM statement Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 14 Join Tables (Merge datasets) - Join three tables PROC SQL; SELECT C.cust_no, M.title,M.rating, M.category, A.actor_leading FROM MFE.CUSTOMERS C, MFE.MOVIES2 M, MFE.ACTORS A WHERE C.cust_no = M.cust_no AND M.title = A.title; QUIT; Use AND in WHERE statement to specify two matching conditions Produce rows that satisfies all the conditions Note: We use MOVIES2 in this example Can join up to 32 tables in one SQL code Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 15 Join Tables (Merge datasets) - Inner Joins using ON PROC SQL; SELECT M.title, rating,actor_leading FROM MFE.MOVIES M INNER JOIN MFE.ACTORS A ON M.TITLE = A.TITLE; QUIT; Same result as using where WHERE is used to select rows from inner joins ON is used to select rows from outer or inner Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 16 Join Tables (Merge datasets) - Left Outer Joins PROC SQL; SELECT MOVIES.title, actor_leading, rating FROM MFE.MOVIES LEFT JOIN MFE.ACTORS ON MOVIES.title = ACTORS.title; QUIT; Resulting output contains all rows for which the SQL expression, referenced in the ON clause, matches both tables and all rows from LEFT table (MOVIES) that did not match any row in the right (ACTORS) table. Essentially the rows from LEFT table are preserved and captured exactly as they stored in the table itself, regardless if a match exists. Need to specify a table name for the matching variable in SELECT Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 17 Join Tables (Merge datasets) - Right Outer Joins PROC SQL; SELECT ACTORS.title, actor_leading, rating FROM MFE.MOVIES RIGHT JOIN MFE.ACTORS ON MOVIES.title = ACTORS.title; QUIT; Resulting output contains all rows for which the SQL expression, referenced in the ON clause, matches both tables and all rows from RIGHT table (ACTORS) that did not match any row in the right (MOVIES) table. Essentially the rows from RIGHT table are preserved and captured exactly as they stored in the table itself, regardless if a match exists. Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 18 Join Tables (Concatenating) - Outer Union PROC SQL; SELECT * FROM MFE.CUSTOMERS OUTER UNION SELECT * FROM MFE.MOVIES; QUIT; SQL performs OUTER UNION, similar to DATA steps with a SET statement to Concatenate datasets. The result contains all the rows produced by the first table-expression followed by all the row produced by the second table-expression. Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 19 Transforming Data - Creating new Variables /*Creating new variables*/ PROC SQL; SELECT title, length, category, year, rating, 2006-year AS age FROM MFE.MOVIES; QUIT; You can create new variables within SELECT statement, the name of new variable follows after AS. Note the order of the express is reversed Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 20 Transforming Data - Summarizing Data using SQL functions PROC SQL; SELECT *, COUNT(title) AS notitle, MAX(year) AS most_recent, MIN(year) AS earliest, SUM(length) AS total_length, NMISS(rating) AS nomissing FROM MFE.MOVIES GROUP BY rating; QUIT; Simple summarization functions available All function can be operated in GROUPs Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 21 Editing Data – Insert observations. PROC SQL NOPRINT; INSERT INTO MFE.CUSTOMERS VALUES(1 'Peng'); INSERT INTO MFE.CUSTOMERS SET Cust_no=2,Name='Sasha'; QUIT; There are two ways of inserting observations into a table. Data type should be the same. VALUES( ) new values are separated by space. SET column name = newly assigned values, delimited by commas. Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 22 Editing Data – Deleting rows and Dropping columns /*Deleting rows*/ PROC SQL; DELETE FROM MFE.MOVIES WHERE length LE 100; QUIT; /*Droping variables*/ PROC SQL; CREATE TABLE NEW (DROP=rating) AS SELECT * FROM MFE.MOVIES; QUIT; Deleting columns can be done in SELECT or in DROP on created table Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 23 Editing Data – Update observations /*Updating Observation*/ PROC SQL NOPRINT; UPDATE MFE.CUSTOMERS SET Name='Liu' WHERE Cust_no=1; QUIT; UPDATE … SET… WHERE Find the observation and set new value If more than one observations satisfies the condition, all are updated with the new data in SET statement Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 24