SAS Tips & Tricks
Anthony Fina
Statistics Outreach Center
The University of Iowa
Spring 2013
Files to be used with this document:
MOCK_PUBLIC.txt – 92 observations from public schools
MOCK_PRIVATE.txt – 8 observations from private schools
SAS Environment
1. Docking the Explorer window if you accidentally close it:
a. Select View from the menu and click Explorer.
b. Then select Window from the menu and click Docked.
c. You may have to select Large Icons on the View tab.
2. Saving your settings:
a. Tools → Options → Preferences…
b. On the General tab check ‘Save settings on exit.’ This isn’t going to work on CITRIX or
on UI computers that do not allow you to save settings.
Please address correspondence to Anthony Fina at anthony-fina@uiowa.edu. Special thanks to Sheila
Barron and Matt Whittaker for your many thoughtful suggestions on this document.
3. Displaying results in Output window (default prior to SAS 9.3):
a. Tools → Options → Preferences…
b. On the Results tab check ‘Create listing.’ Selecting this will display results the
traditional way in the Output window.
4. Automatically clear the Results window:
a. Place the following at the beginning of your program. This display manager command tells SAS to
remove stored output in the Results Navigator window. HTML output is unaffected and
should still be cleared using the commands in items 6 and 7.
DM 'ODSRESULTS; CLEAR;';
5. Automatically creating HTML output (now default in SAS 9.3):
a. Tools → Options → Preferences…
b. On the Results tab check ‘Create HTML’ and ‘Use WORK folder.’ HTML results are
presented in the Results Viewer.
6. Automatically clear the HTML output (once):
a. Only the most recent PROC output will be shown in HTML output when ‘View results
as they are generated’ is selected in preferences. Window does not clear until another
procedure with output is run.
ODS HTML CLOSE;
ODS HTML;
7. Automatically clear the HTML output (always):
a. Only the most recent PROC output will be shown in the HTML results window when
‘View results as they are generated’ is selected in preferences.
ODS HTML NEWFILE=PROC;
b. The default is to append the results from the current procedure to previous results. The
following code will undo the code in 7.a:
ODS HTML NEWFILE=NONE;
8. Automatically save the log and output windows:
a. Placed at the end of your program, these display manager commands save the contents
of the Log and Output windows to text files. File references are used in this example to
define the locations and names of the files.
FILENAME MYLOG "H:\Example.log";
FILENAME MYOUT "H:\Example.out";
DM 'OUT;FILE MYOUT REP;'; *Saves your Output window to a text file.;
DM 'LOG;FILE MYLOG REP;'; *Saves your Log window as a text file;
9. Commenting out large portions of your program:
a. Highlight the code you wish to comment out.
b. Hold down CTRL and press the / button at the same time.
c. To undo the commented area, highlight the code and press CTRL, SHIFT, and / .
10. Deleting all the files in the WORK folder:
a. When a temporary SAS dataset is created, it is saved only for the duration of the SAS
session. This sometimes leads to problems – the common example occurs when there is
an error in your program so a dataset is not created, but an old dataset with the same
name exists from a previous run of the program. SAS will use the old dataset and, unless
you are carefully reviewing your log, you may not realize this is happening.
b. To delete all old temporary SAS datasets:
PROC DATASETS LIBRARY=WORK KILL; RUN;
Data Steps and Functions
11. Importing a file found on the internet: This is an easy way to read in data found on a website.
FILENAME mydata url 'http://www.education.uiowa.edu/centers/docs/soc-shortcourses/0322-MOCK_PRIVATE.txt' LRECL=259 ;
DATA private;
INFILE mydata PAD;
INPUT
  @1   system  $CHAR20.
  @31  bname   $CHAR20.
  @123 grd     $CHAR2.
  @125 testmth $CHAR2.
  @131 testyr  $CHAR2.
  @133 name    $char31.
  @133 lname   $CHAR11.
  @153 fname   $CHAR20.
  @173 sex     $CHAR1.
  @174 bmo     2.
  @176 byr     4.
  @191 level   $CHAR2.
  @193 form    $CHAR1.
  @195 dist    $CHAR4.
  @199 (item1-item30) (+1 $1.);
RUN;
12. RETAIN (for reordering variables) :
a. This is an easy way to reorder variables if you like examining the data without printing it
(i.e. through the Explorer window). Also helpful if you plan to export a dataset.
DATA master;
RETAIN id lname fname test1 test2;
SET master;
RUN;
13. Creating a unique ID for every observation: This can simplify merges later on. It is also
necessary for some advanced analyses.
DATA master;
SET master;
id=_N_;
RUN;
14. Converting variables from numeric to character or character to numeric:
a. Numeric to character:
newvar = PUT(oldvar, 8.);
b. Character to numeric:
The second line where the old variable is multiplied by one leaves a note in the log
indicating that character values have been converted to numeric. The first line does not.
newvar = INPUT(oldvar, 8.);
newvar = oldvar*1;
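A minimal sketch of both conversions in one DATA step may make the difference easier to see. The dataset name 'convert_demo' and the variables 'numvar', 'charvar', 'as_char', and 'as_num' are made up for illustration and do not come from the course files.

```sas
* Hypothetical example data: one numeric and one character variable;
DATA convert_demo;
  numvar  = 42;
  charvar = '17';
  as_char = PUT(numvar, 8.);     * numeric to character (right-aligned in 8 columns);
  as_num  = INPUT(charvar, 8.);  * character to numeric with no conversion note in the log;
  PUT as_char= as_num=;          * writes both new values to the log;
RUN;
```

Because PUT with the 8. format right-aligns the result, the LEFT function (item 19) is often applied to the new character variable afterwards.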
15. RENAME, DROP, LENGTH:
a. Situation: There are times when you need to change the length of an existing variable.
This is sometimes necessary if you are combining multiple datasets and a variable has
two different lengths.
b. This example renames the ‘id’ variable as ‘tempvar’ as it is read in from the ‘master’
dataset. The length of ‘id’ is specified. Next, the new ‘id’ variable is assigned the value
of ‘tempvar’. Then ‘tempvar’ is dropped as the dataset is written to the work folder.
DATA WORK.master(DROP=tempvar);
LENGTH id 8;
SET master(RENAME=(id=tempvar));
id=INPUT(tempvar,8.);
RUN;
16. Number of observations in a dataset: This is the simplest way to find the number of
observations in a dataset. The number of observations is written to the log. A more
complicated method (item 33) is included below for use with very large datasets.
DATA _NULL_;
PUT NOBS=;
STOP;
SET master NOBS=NOBS;
RUN;
17. Creating blank datasets: Included for future reference. It is often helpful to have blank
datasets created so that even if a procedure does not produce a table, it can still be appended
to an existing dataset. This is especially true when macros are used. The following code
would create two blank datasets.
DATA blank1 blank2; RUN;
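Note that a DATA step with no statements typically writes one observation with no variables. If you need a dataset with zero observations but a defined structure, something like the following sketch should work. The dataset name 'blank3' and the two variables are assumptions chosen only for illustration.

```sas
* Hypothetical empty dataset with a defined structure;
DATA blank3;
  LENGTH lname $ 11 exam1 8;  * define the variables and their lengths;
  STOP;                       * end the step before any observation is written;
RUN;
```

The log will note that the variables are uninitialized, which is expected here.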
18. INDEX and SUBSTR: These two functions are incredibly powerful together. This example
shows how to break apart two concatenated variables based on the location of the first space.
This works pretty well and can save you time on data clean up.
DATA names; SET master;
last=SUBSTR(name, 1, INDEX(name,' '));
first1=SUBSTR(name, INDEX(name,' '));
KEEP name last lname2 first first1;
RUN;
19. TRIM and LEFT: If you examine the variable ‘first1’ that was created above you will see
that there are extra spaces. The following code will remove the spaces.
first=TRIM(LEFT(SUBSTR(name, INDEXC(name,' '))));
20. COMPRESS: Removes character values you specify. The example below removes spaces
and hyphens. Unlike the TRIM and LEFT combo above, COMPRESS lets us select what we
want to remove.
lname2=COMPRESS(lname, ' -');
21. Concatenating variables: To combine a student’s last name and first name with a space
in between, all you need is the || operator between the variables.
name = last||" "||first;
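Character variables are padded with trailing blanks, so || can leave a wide gap between the two names. As a hedged alternative (assuming SAS 9 or later), the CATX function strips leading and trailing blanks and inserts a separator. The dataset 'cat_demo' and the variables 'name1' and 'name2' are hypothetical.

```sas
* Hypothetical comparison of || and CATX;
DATA cat_demo;
  LENGTH last first $ 20 name1 name2 $ 41;
  last  = 'DYE';
  first = 'MADISON';
  name1 = last || " " || first;    * keeps the padding that follows DYE;
  name2 = CATX(' ', last, first);  * strips blanks and joins with one space;
  PUT name1= / name2=;             * compare the two results in the log;
RUN;
```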
22. Conditional inputs:
a. Situation: You are reading in data using an INPUT statement and you do not want to
read in all of the observations. You just want to read in observations that satisfy a
particular condition. For example, you only want to read in the data for the males in a
dataset.
b. Basic idea: Input the variable you want to condition on. The @ at the end of the first
INPUT statement holds the current record so that more data can be read from it. Condition on
the variable of interest using an IF statement, then input the remaining variables. In the example
below, the variable ‘sex’ is read in for all observations, but only when sex="M" are the
rest of the variables read in and written to the dataset.
DATA males;
INFILE "H:\FILES\MOCK_PRIVATE.TXT" LRECL=258 ;
INPUT @173 sex $CHAR1. @;
IF sex = "M";
INPUT
  @1   system  $CHAR20.
  @31  bname   $CHAR20.
  @123 grd     $CHAR2.
  @125 testmth $CHAR2.
  @131 testyr  $CHAR2.
  @133 lname   $CHAR11.
  @153 fname   $CHAR20.
  @174 bmo     2.
  @176 byr     4.
  @191 level   $CHAR2.
  @193 form    $CHAR1.
  @195 dist    $CHAR4.
  @199 (item1-item30) (+1 $1.);
RUN;
23. Reading in multiple files at once:
a. Situation: You need to read in data from a number of files in a specific folder. It would
be tedious and time consuming to type in the names of all the files. Instead, this can be
accomplished by using the wildcard ‘*’ in the FILENAME statement.
FILENAME NAMES 'H:\FILES\*.TXT';
DATA master;
INFILE names LRECL=259;
INPUT
  @1   system  $CHAR20.
  @31  bname   $CHAR20.
  @123 grd     $CHAR2.
  @125 testmth $CHAR2.
  @131 testyr  $CHAR2.
  @133 name    $char31.
  @133 lname   $CHAR11.
  @153 fname   $CHAR20.
  @173 sex     $CHAR1.
  @174 bmo     2.
  @176 byr     4.
  @191 level   $CHAR2.
  @193 form    $CHAR1.
  @195 dist    $CHAR4.
  @199 (item1-item30) (+1 $1.);
RUN;
24. Searching a text string using WHILE and DO:
a. DO and WHILE make a powerful combination. In this example, we use a DO WHILE loop
to search each record and determine whether a student has been labeled as a home
school student. An indicator variable is also created. The search starts in column one and
scans for HOME in ‘bname’ (with blanks removed by COMPRESS). The search proceeds one
column at a time until a match is found or the end is reached.
DATA master;
SET master;
HOME = 'N';
I = 1;
DO WHILE (HOME = 'N' & LENGTH(COMPRESS(bname)) GE I+3);
IF (UPCASE(SUBSTR(COMPRESS(bname), I, 4))= 'HOME') THEN HOME ='Y';
I = I + 1;
END;
DROP I;
RUN;
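For a simple substring search like this one, the same indicator can also be built without a loop, because the INDEX function (item 18) already scans the whole string. This is only a sketch of an alternative, not the method used above; the variable name 'home2' is hypothetical.

```sas
* Loop-free version of the HOME search using INDEX;
DATA master;
  SET master;
  * INDEX returns the position of the first match or 0 if there is none;
  IF INDEX(UPCASE(COMPRESS(bname)), 'HOME') > 0 THEN home2 = 'Y';
  ELSE home2 = 'N';
RUN;
```

The DO WHILE version remains useful when each position needs its own test, which INDEX cannot express.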
25. Scoring a test:
a. Arrays are a useful tool for completing the same set of tasks on a series of variables.
The code below creates 3 arrays. The first contains the variables corresponding to 10
items on a test. The values for each item correspond to the response option chosen (A,
B, C, D, or E). The second array contains the correct responses for each item. The third
array contains 10 variables that will be scored 0 if the individual chose the wrong
response and 1 if the individual chose the correct response.
DATA master;
SET master;
ARRAY ITEM (10) $ ITEM1-ITEM10;
ARRAY KEY (10) $ KEY1-KEY10 ('C', 'D', 'A', 'B', 'C', 'D', 'C',
'B', 'B', 'A');
ARRAY CORRECT (10) CORR1-CORR10;
DO I = 1 TO 10;
  IF ITEM(I) = KEY(I) THEN CORRECT(I) = 1;
  ELSE IF ITEM(I) = '' THEN CORRECT(I) = .;
  ELSE CORRECT(I) = 0;
END;
DROP I KEY1-KEY10;
READ_CONC = SUM(OF CORR1-CORR10);
RUN;
26. Cumulative frequency or summing across rows:
a. Sometimes you want a column to reflect the cumulative frequency of another variable.
Other times you may want to know the sum of a column without using a PROC step.
This example uses the RETAIN statement to create the new variable ‘tot’. ‘freq’ is
added to ‘tot’ in the next line. This serves as a summing function because the RETAIN
statement carries over the value of ‘tot’ from the row above, which is then added to the
value of ‘freq’ to get the new value of ‘tot’.
DATA new;
SET new;
RETAIN tot;
tot + freq;
RUN;
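The running total can also be restarted within groups by combining the sum statement with a BY statement (see item 27). The sketch below uses a small made-up dataset; the names 'freqs', 'cumul', and 'grp' are hypothetical.

```sas
* Hypothetical data: frequencies within two groups;
DATA freqs;
  INPUT grp $ freq;
  DATALINES;
A 3
A 5
B 2
B 4
;
RUN;

DATA cumul;
  SET freqs;
  BY grp;                     * data must already be sorted by grp;
  IF FIRST.grp THEN tot = 0;  * restart the running total for each group;
  tot + freq;                 * sum statement: tot is retained automatically;
RUN;

PROC PRINT DATA=cumul;
RUN;
```

With these four rows, 'tot' should be 3 and 8 for group A, then 2 and 6 for group B.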
27. Using a BY statement for selection:
a. When you use a BY statement in your data step, SAS creates variables called ‘FIRST.’
and ‘LAST.’. These variables identify the first and last observation with each value of
the variables on the BY statement. This is also great to use if you have multiple
observations for people and you only want the first one or you only want the last one.
Just sort by ID and DATE to get observations in chronological order and add BY ID in a
data step. The temporary variables FIRST.ID and LAST.ID can then be used to select
the desired rows.
FILENAME mydata url 'http://www.education.uiowa.edu/centers/docs/socshort-courses/BY_EXAMPLE.txt' ;
DATA byexample ;
INFILE mydata;
INPUT ID $ 1-3 NAME $ 5-9 @13 (ITEM1-ITEM6) (+1 1.) @26 DATE
MMDDYY9.;
RUN;
PROC SORT;
BY ID DATE;
RUN;
DATA WORK.byexample1;
SET WORK.byexample;
BY ID;
IF FIRST.ID;
RUN;
PROC PRINT;
RUN;
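To keep the most recent record instead of the earliest, LAST.ID can be used in the same way. This sketch assumes 'byexample' is still sorted by ID and DATE as in the step above; the output name 'byexample_last' is hypothetical.

```sas
* Keep only the last (most recent) observation for each ID;
DATA WORK.byexample_last;
  SET WORK.byexample;
  BY ID;
  IF LAST.ID;
RUN;
```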
Proc Steps
28. Removing labels and formats: This works in SAS 9.3 and previous versions of SAS even though
ATTRIB appears in red in the editor. This procedure is really helpful when cleaning up data.
PROC DATASETS LIB=work NOLIST;
MODIFY master;
ATTRIB _ALL_ LABEL=''; *Remove labels;
FORMAT _ALL_; *Remove formats;
RUN; QUIT;
29. PROC SORT: This procedure is helpful for organizing data. It also has two very useful
options, NODUP and NODUPKEY. While both of these options identify duplicate
observations, they have an important difference.
a. NODUP sorts according to the variables listed in the BY statement. However, when
checking for duplicates, it compares each observation to the observation before it in the
data for all of the variables in the file. Thus, two observations will be considered
duplicates only if they have identical values on all the variables.
PROC SORT NODUP;
BY lname fname bmo byr;
RUN;
b. NODUPKEY sorts according to the variables listed in the BY statement. However when
checking for duplicates, it compares each observation to the observation before for only
the variables listed in the BY statement. Another way to think about it is the BY
statement serves as a KEY for identifying duplicates.
PROC SORT NODUPKEY DUPOUT= duplicates;
BY lname fname bmo byr;
RUN;
30. Proc Transpose: This procedure transposes variables so that rows become columns and
columns become rows. Sometimes this is necessary for exporting and analyses. The VAR
statement names the variables whose values you want to transpose. You could add the BY
statement if you have any grouping variables you want to keep as grouping variables (they
are not transposed). You could also add the ID statement to name the transposed columns –
each value of ID must occur only once.
PROC TRANSPOSE DATA=master OUT=master_t
PREFIX = person;
VAR item1 item2;
RUN;
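The BY and ID statements mentioned above are easiest to see on a small example. The sketch below reshapes a made-up long dataset into one row per student; the names 'long_scores', 'wide_scores', 'id', 'test', and 'score' are hypothetical.

```sas
* Hypothetical long-format data: one row per student per test;
DATA long_scores;
  INPUT id $ test $ score;
  DATALINES;
S1 exam1 86
S1 exam2 89
S2 exam1 76
S2 exam2 75
;
RUN;

PROC TRANSPOSE DATA=long_scores OUT=wide_scores(DROP=_NAME_);
  BY id;      * one output row per student (data sorted by id);
  ID test;    * values of test become the new column names;
  VAR score;  * values to spread across the new columns;
RUN;
```

The result should have one row per 'id' with columns 'exam1' and 'exam2'. Within each BY group, every value of 'test' occurs only once, as the ID statement requires.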
31. Replacing missing values: The STDIZE procedure standardizes one or more numeric
variables in a SAS dataset by subtracting a location measure and dividing by a scale measure.
However, this example uses the procedure to replace all missing values in numeric variables
with 0’s.
PROC STDIZE DATA=indat REPONLY
MISSING=0 OUT=outdat;
VAR _numeric_;
RUN;
Macros
32. %LET:
a. %LET is very useful. It assigns a value to a global macro variable that can be used
anywhere in your SAS program.
b. Example: Assigning dates to logs and output files. Note, you need to run the %LET
statement before the filename statement in order for the filename to be properly
assigned. This is a good way to back up your programs. The fourth line of code outputs
the log that was defined in line 2.
%LET DATE=%sysfunc(Date(), worddate18.);
FILENAME LOG "H:\TIPS_&DATE..LOG";
(SAS Program)
DM 'LOG;FILE LOG REP;';
33. Number of observations:
a. This code uses a macro to get the number of observations in a dataset. This value is
assigned to the global macro variable ‘nobs’, which is written to the log. Having a
macro variable that records the number of observations can also be helpful in other
situations, such as setting loop limits. The dataset is closed again once the count has been read.
%MACRO obs(name);
%GLOBAL nobs;
%LOCAL ds rc;
%LET ds = %SYSFUNC(OPEN(&name,i));
%*Opens the dataset and returns 0 if it cannot be opened;
%IF &ds = 0 %THEN %PUT ERROR: could not open &name;
%ELSE %DO;
%LET nobs= %SYSFUNC(ATTRN(&ds,NOBS));
%*Assigns the number of observations to nobs;
%LET rc = %SYSFUNC(CLOSE(&ds));
%PUT Number of observations = &nobs;
%END;
%MEND obs;
%obs(master);
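Another common way to put a row count into a macro variable is PROC SQL with INTO. This is offered only as a sketch of an alternative; unlike the macro above, it counts rows by reading the table. The macro variable name 'nobs_sql' is hypothetical.

```sas
* Count the rows in master and store the result in a macro variable;
PROC SQL NOPRINT;
  SELECT COUNT(*) INTO :nobs_sql
  FROM master;
QUIT;

%LET nobs_sql = &nobs_sql;  * reassigning strips the leading blanks;
%PUT Number of observations = &nobs_sql;
```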
34. %INCLUDE: This is great if you have a portion of code that is not changed frequently. For
example, the DATA step with the INPUT statement in example 22.b above could be saved
externally as ‘INPUT.sas’. Then you need only one statement to call it; the entire DATA step
is replaced with a single line.
%INCLUDE 'H:\INPUT.sas' / LRECL=256;
35. Debugging macros:
a. Debugging macros is notoriously hard.
i. When possible, write your program as regular SAS code first and once you know
that works convert it to a macro.
ii. You can convert the code to a macro in steps by first using %LET to create macro
variables, then once you know that is working, embed the code in a macro and pass
the macro variables as parameters.
iii. When you have errors there are several options that can help you find them
1. SYMBOLGEN: This prints how your macro variable is being resolved. Makes
debugging easier.
a. To turn it on use OPTIONS SYMBOLGEN;
b. To turn it off use OPTIONS NOSYMBOLGEN;
2. MPRINT: When this option is on, SAS will print to the log, the standard SAS
statements that were generated by macros.
a. To turn it on use OPTIONS MPRINT;
b. To turn it off use OPTIONS NOMPRINT;
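As a small illustration of these options, the sketch below turns both on around a toy macro. The macro 'show_means' is hypothetical, and the call assumes a dataset named 'request' with a numeric variable 'exam1', such as the one created in the Matching section.

```sas
OPTIONS MPRINT SYMBOLGEN;  * show generated code and macro variable resolution;

%MACRO show_means(dsn, var);
  PROC MEANS DATA=&dsn;
    VAR &var;
  RUN;
%MEND show_means;

%show_means(request, exam1);

OPTIONS NOMPRINT NOSYMBOLGEN;  * turn the extra log output back off;
```

With the options on, the log should contain MPRINT lines showing the generated PROC MEANS step and SYMBOLGEN lines showing how &dsn and &var resolved.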
Inputting multiple files from a single folder
Situation: This is another method to read in data from multiple files in a folder without typing in
the names of all the files. It is more complicated (and more flexible) than the method described
above that used a wildcard. First, you want to have SAS look in the folder, find all the file names,
and write them to a dataset. This example looks up the files in a specified folder and assigns each
file name to an observation in a dataset. In an optional second step, a data step illustrates how
you can delete files from the list based on their extensions. Last, the remaining files are read into
SAS and a variable is created that contains the name of the source file. This variable serves as a
tracking variable indicating the file from which each record was read.
*This gets the file names in a specified folder;
%macro get_filenames(location);
FILENAME _dir_ "%bquote(&location.)";
DATA filenames(KEEP=memname);
handle=DOPEN( '_dir_' );
IF handle > 0 THEN DO;
count=DNUM(handle);
DO i=1 TO count;
memname=DREAD(handle,i);
OUTPUT filenames;
END;
END;
rc=DCLOSE(handle);
RUN;
FILENAME _dir_ CLEAR;
%MEND;
%get_filenames(H:\FILES\);
* Remove csv files;
DATA names; SET filenames;
csv=0; i=1;
DO WHILE (csv=0 AND i+2 le LENGTH(COMPRESS(memname)));
IF (UPCASE(SUBSTR(COMPRESS(memname), i, 3)) = 'CSV') THEN csv=1;
i = i + 1;
END;
RUN;
DATA names; SET names;
id=_N_;
IF csv=1 THEN DELETE;
RUN;
DATA names; SET names;
FILE 'H:\names.txt' LRECL=150;
PUT @1 memname $char150.;
RUN;
*Read in files from created list & dump file name into a variable;
DATA master;
INFILE 'H:\names.txt' LRECL=150; LENGTH memname $150;
INPUT memname $;
fil2read='H:\FILES\'||TRIM(LEFT(memname));
INFILE dummy FILEVAR=fil2read END=done DSD TRUNCOVER LRECL=258;
DO WHILE (not done);
INPUT
  @1   system  $CHAR20.
  @31  bname   $CHAR20.
  @123 grd     $CHAR2.
  @125 testmth $CHAR2.
  @131 testyr  $CHAR2.
  @133 name    $char31.
  @133 lname   $CHAR11.
  @153 fname   $CHAR20.
  @173 sex     $CHAR1.
  @174 bmo     2.
  @176 byr     4.
  @191 level   $CHAR2.
  @193 form    $CHAR1.
  @195 dist    $CHAR4.
  @199 (item1-item30) (+1 $1.);
OUTPUT;
END;
RUN;
Matching (MERGE and PROC SQL)
Scenario: The following six people were requested by a researcher. We need to find these
people in our dataset created earlier (if possible). We will deliver an Excel file to the researcher
of the matched results. Two different methods are examined: MERGE and SQL.
DATA request;
INPUT lname $CHAR11. fname $ bmo byr exam1 exam2;
DATALINES;
DYE         MADISON  5 2000 86 89
BUSCH       TAMMY   10 1993 76 75
WEDEKING    HAILEY   8 1994 49 65
BEHREND     JACK     8 1999 82 88
BEHREND     JACK     8 1999 82 88
CHRISTENSEN JIMMY   10 1992 78 82
GOEBEL      THOMAS  10 1985 95 92
SMITH       ANDREA   1 1999 68 78
;
1. MERGE: Before the data can be merged, they need to be sorted by the common variables.
The simplest MERGE statement is a one-to-one match.
PROC SORT DATA=master;
BY lname fname bmo byr;
RUN;
PROC SORT DATA=request;
BY lname fname bmo byr;
RUN;
DATA found;
MERGE master (IN=ina) request (IN=inb);
BY lname fname bmo byr;
RUN;
This produced a dataset with 104 observations, indicating that there were three matches in
the two files and four unique observations in the dataset called request. However, this is not
what we need. We want only the students that were correctly matched.
(IN=new_name): This is useful for tracking whether a dataset contributed to the current
observation. In the scenario above we would not return unmatched records to a researcher.
Using the ‘IN=’ dataset option, we can select only the observations that came from both files
used in the merge.
DATA found;
MERGE master (IN=ina) request (IN=inb);
BY lname fname bmo byr;
IF ina AND inb THEN OUTPUT;
RUN;
This created a dataset with 3 observations. Because we have set a restrictive MERGE (4
variables have to match perfectly), some valid observations may have been incorrectly
excluded. If we drop ‘fname’ from the merge we find an additional 2 students. However,
closer inspection reveals that we have a false positive. It is unlikely that the requested
Tammy Busch is the same person as Tyler Busch in our ‘master’ dataset.
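The same IN= flags can also separate the requested people who were not found, which may be useful to report back to the researcher. This is a sketch under the same assumptions as the merge above (both datasets sorted by the BY variables); the output names 'matched' and 'unmatched' are hypothetical.

```sas
* Split the merge result into matched and unmatched requests;
DATA matched unmatched;
  MERGE master (IN=ina) request (IN=inb);
  BY lname fname bmo byr;
  IF ina AND inb THEN OUTPUT matched;             * present in both files;
  ELSE IF inb AND NOT ina THEN OUTPUT unmatched;  * requested but not in master;
RUN;
```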
2. SQL: Structured Query Language. A widely used programming language that retrieves and
updates relational tables and databases. The datasets used in DATA statements are the
equivalent of the relational (or source) tables used in an SQL statement. SQL uses a different
vocabulary (e.g., table for dataset, join for merge) but does many of the same things. The
simplest form an SQL statement can take is:
PROC SQL;
SELECT *
FROM request;
QUIT;
The asterisk indicates that every column should be selected from the source table called
‘request.’ If you run this you will see that a table was produced in the Output window. This
statement basically acted like a PROC PRINT step. Much of what can be done in a
DATA step can also be performed in SQL. Just for illustration, suppose we want to create a
table (dataset) with only males. The code below creates a dataset called ‘males’ that contains
all the columns in the ‘master’ dataset.
PROC SQL;
CREATE TABLE males AS
SELECT *
FROM master
WHERE sex = "M";
QUIT;
For the purposes of merging, we need two source tables (i.e., master and request) and one or
more conditions on the joining of these tables. To recreate the merge above (where first name
was not used to merge), the following code is used:
PROC SQL;
CREATE TABLE found AS
SELECT *
FROM master AS A, request AS B
WHERE A.lname = B.lname and
A.bmo = B.bmo and
A.byr = B.byr;
QUIT;
If you examine the log you see that a warning is present. This is because the two tables
(datasets) we are selecting from have columns with the same name. If there are differences in
these columns, the value of the variable in the first table is used. In the current example,
the requested TAMMY would not overwrite TYLER from the ‘master’ dataset. This is why it
is often better to rename the variables before using SQL.
DATA request;
INPUT r_lname $CHAR11. r_fname $ r_bmo r_byr exam1 exam2;
CARDS;
DYE         MADISON  5 2000 86 89
BUSCH       TAMMY   10 1993 76 75
WEDEKING    HAILEY   8 1994 49 65
BEHREND     JACK     8 1999 82 88
BEHREND     JACK     8 1999 82 88
CHRISTENSEN JIMMY   10 1992 78 82
GOEBEL      THOMAS  10 1985 95 92
SMITH       ANDREA   1 1999 68 78
;
PROC SQL;
CREATE TABLE found AS
SELECT *
FROM master AS A, request AS B
WHERE A.lname = B.r_lname and
A.bmo = B.r_bmo and
A.byr = B.r_byr;
QUIT;
PROC SQL;
SELECT lname, r_lname, fname, r_fname
FROM found;
QUIT;
The second SQL statement reveals that we did incorrectly identify people. This is a problem
frequently encountered and should not be overlooked. The easiest way to be more confident in your
matches is to include more variables to match on. This is where SQL has some tricks that
DATA steps cannot easily accomplish.
Fuzzy Matching: The idea behind this is that we want to include people in the match even if
some variables are a little bit different. It is common for a Katherine to go by the name of
Katie, William by Bill, Elizabeth by Liz, etc. Fuzzy matching offers us a way to match
discrepancies like these. You can read the =* in the code below as “sounds like.”
PROC SQL;
CREATE TABLE found AS
SELECT *
FROM master AS A, request AS B
WHERE A.lname = B.r_lname and
A.fname =* B.r_fname and
A.bmo = B.r_bmo and
A.byr = B.r_byr;
QUIT;
PROC SQL;
SELECT lname, r_lname, fname, r_fname
FROM found;
QUIT;
The code above finds five matches, one of which is incorrect (ANDREW is not equal to
ANDREA). This shows that even though the quality of the match was improved, you still
need to be careful about your matches. One way to do this is to write several SQL statements
that get less restrictive. This offers an easy way to examine the quality of the match at every
level.
PROC SQL;
CREATE TABLE found1 AS
SELECT *
FROM master AS A, request AS B
WHERE A.lname = B.r_lname and
A.fname =* B.r_fname and
A.bmo = B.r_bmo and
A.byr = B.r_byr;
QUIT;
PROC SQL;
CREATE TABLE found2 AS
SELECT *
FROM master AS A, request AS B
WHERE A.lname = B.r_lname and
A.fname NE B.r_fname and
A.fname =* B.r_fname and
A.bmo = B.r_bmo and
A.byr = B.r_byr;
QUIT;
PROC SQL;
CREATE TABLE found3 AS
SELECT *
FROM master AS A, request AS B
WHERE A.lname NE B.r_lname and
A.lname =* B.r_lname and
A.fname = B.r_fname and
A.bmo = B.r_bmo and
A.byr = B.r_byr;
QUIT;
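To see why two names are treated as sounding alike, it can help to look at their codes directly. The sketch below assumes that the =* operator compares names using the same algorithm as the SOUNDEX function, which is how SAS documents the sounds-like operator. The variable names 'code1' and 'code2' are hypothetical.

```sas
* Write the SOUNDEX codes of two names to the log;
DATA _NULL_;
  code1 = SOUNDEX('ANDREW');
  code2 = SOUNDEX('ANDREA');
  PUT code1= code2=;  * identical codes mean the names count as a sounds-like match;
RUN;
```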
3. Exporting the matched files: The following will export an Excel file with the results from all
three SQL statements to our H drive.
DATA found; SET found1 found2 found3;
RUN;
PROC EXPORT DATA= found
OUTFILE= "H:\Found.xls"
DBMS=EXCEL LABEL REPLACE;
SHEET="Found";
NEWFILE=YES;
RUN;
4. MERGE vs. SQL: SQL is a more powerful procedure and offers users many advantages.
• SQL can significantly reduce computer processing time.
• SQL offers fuzzy merging.
• SQL can access other databases (beyond the scope of this presentation).
• In SQL, variables you want to match on do not have to have the same name.
• With SQL, tables do not need to be sorted prior to joining.
However, using MERGE in a data step does have some advantages.
• It is far easier to debug your program when you use MERGE.
• Using MERGE as part of a data step is typically easier to understand for beginners.
A few more things to note:
• Using MERGE, if two datasets have the same variable, their values do not match, and the
variable was not included in the BY statement, the value from the dataset that was read in
second will overwrite the value from the first.
• In SQL, if two datasets have the same variable with different values and it is not included
in the JOIN, then the value from the table read in first will appear in the created table.
ODS. Output Delivery System.
By using ODS statements in your program you can change destinations and otherwise tailor
your output.
1. ODS TRACE: Every piece of output (called an object) created in SAS has a name and a path.
However, we typically don’t see or use the names. If you turn on ODS TRACE, the names
and other information will be printed to the log.
a. Using ODS TRACE, we see that there is an object called “Moments” that reports
the descriptive statistics produced by PROC UNIVARIATE.
ODS TRACE ON;
PROC UNIVARIATE;
VAR testgrd;
RUN;
ODS TRACE OFF;
2. ODS SELECT & ODS EXCLUDE: Once you know the names of objects you can select
objects you want, or you can exclude objects you don’t want. These can be printed or saved
to a dataset.
a. To select based on the label of the table, quotes need to be used. This is not true
for the table name. Both are shown below.
ODS LISTING SELECT "Basic Measures of Location and Variability";
ODS LISTING SELECT BasicMeasures;
PROC UNIVARIATE;
VAR exam1; RUN;
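ODS EXCLUDE works the same way in reverse. As a sketch, the code below suppresses one table and leaves the rest of the PROC UNIVARIATE output in place. It assumes the 'request' dataset from the Matching section; 'ExtremeObs' is the table name that ODS TRACE reports for the extreme observations table.

```sas
* Hide the extreme observations table and keep everything else;
ODS EXCLUDE ExtremeObs;
PROC UNIVARIATE DATA=request;
  VAR exam1;
RUN;
```

Like ODS SELECT, the exclusion applies to the next procedure step only, so later procedures print all of their tables again.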
3. ODS graphics can be used to produce a wide range of production quality graphs. The easiest
way to get started with ODS GRAPHICS is to turn it on and see what graphs are created by
default. The code below runs PROC CORR. Since ODS graphics is turned on, it will also
produce a matrix of the scatter plots.
ODS GRAPHICS ON;
PROC CORR;
VAR exam1 exam2;
RUN;
ODS GRAPHICS OFF;
Note, ODS graphs will not appear in the output window. If you have selected in your
preferences to create html output, you can view the graphs in the results viewer. Otherwise if
you are routing your output to a destination other than LISTING, you can view the graphs in
the output file.
4. Using ODS OUTPUT, we can tell SAS to write the data in the “Moments” object to a dataset
called ‘mom’.
ODS OUTPUT moments=mom;
PROC UNIVARIATE;
VAR exam1;
RUN;
ODS OUTPUT CLOSE;
Use an ODS statement giving the type of destination and the name of a destination to send
data to a file of a given type. For example, using the code below, the output from PROC
MEANS will be saved as a pdf file called “Stats.pdf” and saved on the H drive.
ODS PDF FILE="H:\Stats.pdf";
PROC MEANS;
RUN;
ODS PDF CLOSE;
Resources:
1. www.google.com
2. The Little SAS Book (Delwiche & Slaughter, 2012)
3. In SAS: Help → SAS Help and Documentation → Products
4. SAS Global Forum: http://support.sas.com/events/sasglobalforum/previous/online.html
5. Midwest SAS Users Group: http://mwsug.org/