Introduction to SAS
Statistics Outreach Center Short Course

Beginning instructions:
   Open Word document
   Save text file in H:\
   Save Excel file in H:\

SAS Short Course

COURSE TOPICS

1. How to open SAS
2. Overview: the 5 “main windows” of SAS
3. Components of a SAS program
   a. Data step
   b. Procedures “step”
   c. Other features of all SAS programs
4. SAS datasets
   a. Nature of the dataset
   b. Data embedded in Editor window
      i. List input
      ii. Column input
      iii. Informats
   c. Data from an external file
      i. Text file
         1. Using filename statement
         2. Not using filename statement
      ii. Excel file
   d. Types of datasets
      i. Temporary datasets
      ii. Permanent datasets
5. Data analysis & common procedures
   a. Contents
   b. Print
   c. Frequency
   d. Means
   e. Univariate
   f. Sort
6. Miscellaneous

Getting Started

Students at the University of Iowa can use SAS on their “Virtual Desktop.” This site can be found at:
https://virtualdesktop.uiowa.edu/Citrix/VirtualDesktop/auth/login.aspx

To open SAS:
1. Go to the Virtual Desktop website shown above.
2. Log in using your HawkID username and password.
3. You will see a main menu with several folders.
4. Go to Statistical analysis and you will see SAS 9_3 32 bit and SAS 9_3 64 bit. Click on “SAS 9_3 64 bit.”
5. You will see a pop-up window titled “Getting Started with SAS.” Click “Close” for the time being.
6. You are ready to begin using SAS.

Note: These instructions can be used on computers that do not have SAS. If you are using a computer that has SAS installed, you can use SAS directly from the installed program. Click the “Start” menu in the lower left-hand corner of the screen, click on “All Programs,” and then click on “SAS (English).”

SAS Basics

There are 5 main “windows” you can view when using SAS: Explorer, Results, Editor, Log, and Output. Explorer and Results are at the bottom of the left-hand panel, and Editor, Log, and Output are at the bottom of the main panel.
A brief description of what each window does appears below:

Explorer: This contains the folders “Libraries,” “File Shortcuts,” “Favorite Folders,” and “My Computer.” The two folders you are most likely to use are “My Computer” and “Libraries.” “My Computer” gives you access to all files on your computer. “Libraries” gives you access to SAS datasets that you create.

Results: Results from SAS procedures that you have run during your work session are stored here.

Editor: This window is where you type in (and edit) your SAS code. Your SAS program runs from this window.

Log: After you “run” a program, the Log contains notes concerning your code. This window keeps track of how procedures were performed and reports any errors in your SAS code.

Output: Output from the requested procedures is displayed in the Output window.

Results Output Options: Tools > Options > Preferences… > Results

Sample SAS Program: List Input

DM 'LOG;CLEAR;OUT;CLEAR;'; /* CLEARING LOG & OUTPUT WINDOWS */

/*****************************************************************/
/*PROJECT: SAS Short Course */
/* FOR: COE Students */
/* BY: Sheila Barron */
/* DATE: February 05, 2015 */
/* NOTES: Entering data and checking it */
/*****************************************************************/

DATA CLASSDAT;
INPUT ID $ NAME $ SEX $ EXAM1 GRADE $;
DATALINES;
S01 Max M 84 A
S02 John M 89 A
S03 Sarah F 86 B
S04 Lee M 85 B
S05 Rosa F 94 A
S06 Ming F 84 C
;

PROC CONTENTS DATA=CLASSDAT VARNUM;
RUN;

PROC PRINT DATA=CLASSDAT;
TITLE 'SAS SHORT COURSE';
RUN;

/*****************************************************************/
/*DM 'OUT;FILE OUT REP;'; DM 'LOG;FILE LOG REP;'; */
/*****************************************************************/

Components of a SAS program

In the SAS editor you can type in the commands you want SAS to execute.
A simple SAS program can be thought of as having two important parts (although not every program needs both).

SAS data step: The word DATA tells SAS that you want to work with your dataset, either inputting the data or manipulating the data.

SAS procedures step: The word PROC tells SAS you want to do something with the data (e.g., print it out, calculate statistics).
   o If no dataset is specified, SAS uses the most recently created dataset.

A few things to know about SAS:

Each SAS statement must end with a semicolon “;”.

At the end of your program you must have a run statement, “RUN;”. Otherwise the last SAS data step or SAS procedure will not get executed.

SAS comments: Anything written between “/*” and “*/” is treated as documentation that the person writing the program did not intend SAS to execute. In other words, SAS will pass over anything that is written between “/*” and “*/”. A statement that begins with “*” and ends with “;” is also treated as a comment. It is a good idea to use comments to document what you are doing in your program. If you come back to the program later, the comments will help you understand the purpose of the program.

Running your program: When you want SAS to execute the statements you have written, click the “running man” icon on the toolbar, or click on the Run pull-down menu and select “Submit.” To run the entire program, make sure nothing is highlighted and click Run. If you only want to run part of the program, highlight the part you want to run and then click Run. SAS will only process the part of the program that you have highlighted.

SAS Datasets

Before SAS can do anything with your data, it must know which dataset it is going to use. SAS datasets contain columns corresponding to specific variables (e.g., height, weight) and rows corresponding to specific observations (e.g., persons, clinic sites).

SAS can read data in two different ways:
1. SAS datasets can be directly embedded in the Editor window
2. SAS datasets can be imported from a file (e.g., text file, Excel file)

SAS Variables

SAS variables can be one of two types:
1. Character: typically letters or strings of letters and numbers; mathematical operations cannot be performed on them. (ID, Name, Gender, Grade)
2. Numeric: typically numbers; mathematical operations can be performed on them. (Exam1)

Some rules about variable names:
1. Start with a letter or _ (underscore)
2. Can CoNtAin UpPer and LoWer Case
3. Contain only letters, numerals or underscores (_)
4. No Spaces
5. Are not case sensitive
6. 32 characters or fewer

Data Embedded in Editor Window

Suppose we want to use the following dataset in SAS. Note that each row corresponds to a specific observation (person), and each column corresponds to a specific variable (ID, Name, Gender, Exam1, and Grade).

S01 Max   M 84 A
S02 John  M 89 A
S03 Sarah F 86 B
S04 Lee   M 85 B
S05 Rosa  F 94 A
S06 Ming  F 84 C

Option 1: List Input

Notice that the dataset does not have any missing data and there is always at least 1 blank space between variables. When your data are set up like this it is OK to list the variables in the INPUT statement without telling SAS where to find each variable. This is called “list input”: SAS will read the INPUT statement and expect the variables to be in the order they are listed and separated by at least one space. List input won’t work if missing data are represented by blanks, if values include blanks, or if variables have no spaces between them. (With list input, missing data must be entered as a period, “.”.)

INPUT ID $ NAME $ SEX $ EXAM1 GRADE $;

Option 2: Column Input

Another option is “column input.” To use column input, values for each variable must line up; that is, they must always be in the same columns.
Then in the INPUT statement you add column numbers to tell SAS which column or columns hold each variable.

INPUT ID $ 1-3 NAME $ 5-9 SEX $ 11 EXAM1 13-14 GRADE $ 16;

Option 3: Informats

A third way of reading in data is to use SAS informats. An informat tells SAS the format of the data that is to be read in. The most commonly used informats are date informats. Dates are a little tricky to deal with in computer programs if you want to use them in calculations.

A numeric informat consists of the following pieces:
1. Name
2. Width
3. A period
4. Number of places after the decimal

For example, an informat for a date that is written month, day, year, separated by slashes (e.g., 11/10/2007) is “MMDDYY10.” The name of this informat is MMDDYY, the width is 10, and next is the period. This is not a number with a decimal, so the number of places after the decimal is omitted.

Another note: Character informats start with a dollar sign “$”.

We will not be discussing informats in great detail. However, to look up other SAS informats, go to the HELP menu and select SAS Help and Documentation. Then go to:
SAS products > Base SAS > SAS 9.3 Formats and Informats: Reference > SAS Informats > Dictionary of Informats > Informats by Category

Data from an external file

We will discuss how to import data from two common sources: an EXCEL file and a TEXT file. For the most part, the INPUT statement follows the same rules as if the data were in the program, but you need to tell SAS where to find the data. When specifying the pathname, you must pay attention to the capitalization used in the filename.
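As an aside before the external-file examples, the MMDDYY10. informat described under Option 3 can be tried with embedded data. The following is only a minimal sketch and is not part of the original course materials: the dataset name VISITS, the variables VISIT_DATE and DAYS_LEFT, and the data values are made up for illustration. The colon (:) before the informat asks SAS to read the next space-delimited value using that informat, and the FORMAT statement only controls how the stored date is displayed.

```sas
/* Hypothetical example: VISITS, VISIT_DATE, DAYS_LEFT and the values are illustrative */
DATA VISITS;
    INPUT ID $ VISIT_DATE : MMDDYY10.;      /* read the date using an informat     */
    FORMAT VISIT_DATE MMDDYY10.;            /* display the date as mm/dd/yyyy      */
    DAYS_LEFT = '31DEC2007'D - VISIT_DATE;  /* dates are stored as numbers, so     */
                                            /* they can be used in calculations    */
    DATALINES;
S01 11/10/2007
S02 12/01/2007
;
RUN;

PROC PRINT DATA=VISITS;
RUN;
```

Without the FORMAT statement, PROC PRINT would show the date as a plain number (SAS stores dates as a count of days), which is why an informat for reading and a format for display are usually paired.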
Data from a text file using FILENAME statement:

DM 'LOG;CLEAR;OUT;CLEAR;'; /* CLEARING LOG & OUTPUT WINDOWS */

/*****************************************************************/
/*PROJECT: SAS Short Course */
/* FOR: COE Students */
/* BY: Sheila Barron */
/* DATE: February 05, 2015 */
/* NOTES: Entering data and checking it */
/*****************************************************************/

FILENAME IN1 'H:\SOC_SAS_Short_Course_INTRO_TXT.TXT';

DATA CLASSDAT;
INFILE IN1;
INPUT ID $ 1-3 NAME $ 5-9 SEX $ 11 EXAM1 13-14 GRADE $ 16;

PROC CONTENTS DATA=CLASSDAT VARNUM;
RUN;

PROC PRINT DATA=CLASSDAT;
TITLE 'SAS SHORT COURSE';
RUN;

/*****************************************************************/
/*DM 'OUT;FILE OUT REP;'; DM 'LOG;FILE LOG REP;'; */
/*****************************************************************/

Data from a text file without using FILENAME statement (simpler):

DM 'LOG;CLEAR;OUT;CLEAR;'; /* CLEARING LOG & OUTPUT WINDOWS */

/*****************************************************************/
/*PROJECT: SAS Short Course */
/* FOR: COE Students */
/* BY: Sheila Barron */
/* DATE: February 05, 2015 */
/* NOTES: Entering data and checking it */
/*****************************************************************/

DATA CLASSDAT;
INFILE 'H:\SOC_SAS_Short_Course_INTRO_TXT.TXT';
INPUT ID $ 1-3 NAME $ 5-9 SEX $ 11 EXAM1 13-14 GRADE $ 16;

PROC CONTENTS DATA=CLASSDAT VARNUM;
RUN;

PROC PRINT DATA=CLASSDAT;
TITLE 'SAS SHORT COURSE';
RUN;

/*****************************************************************/
/*DM 'OUT;FILE OUT REP;'; DM 'LOG;FILE LOG REP;'; */
/*****************************************************************/

Importing Data from an Excel file:

Pulldown: File > Import Data
   Next
   Browse for workbook (select appropriate EXCEL file) [OK]
   Sheet name [Next]
   Data name (Under Member: CLASSDAT) [Next]
   SAS File Name (H:\Intro_Data) [Finish]
   [Open new SAS program]

Notice that when you read the data in from EXCEL, SAS tries to assign informats that seem the most logical. This can be a big help; for example, SAS will often correctly read in dates. But it can also be a pain when the informat SAS picks is not the correct one. After importing, look carefully at the data to make sure it was read in correctly.

Also, the wizard creates SAS code which can serve as a “template.” The wizard does not have to be used; you can write your own SAS code to import an Excel file. The wizard would create the following code. Usually it is easiest to save it to your desktop at the end of the wizard, then copy and paste it into the editor window you are using.

PROC IMPORT OUT=CLASSDAT
DATAFILE= "H:\RA_SAS_Short_Course_INTRO_XLS.xls"
DBMS=EXCEL REPLACE;
RANGE="Sheet1$";
GETNAMES=YES;
MIXED=NO;
SCANTEXT=YES;
USEDATE=YES;
SCANTIME=YES;
RUN;

Data in SAS

There are two types of datasets that can be used in SAS: temporary datasets and permanent datasets.

Temporary datasets: “Work” datasets are temporary datasets. SAS remembers them during the particular session that you are working in, but will forget them for subsequent sessions. Up until this point, we have only been working with work datasets, hence the “WORK.______” format for all specified datasets.

Permanent datasets: Permanent datasets can be created (and stored) using the “<library name>.______” format for specified datasets. The library name can be almost anything (up to 8 characters, beginning with a letter or underscore). To do this you need to start your program with a library reference (LIBREF). Then use that reference as the first part of the dataset name you assign. For example, I like to call my library SAVE, so I use the following libref.

LIBNAME SAVE 'H:\';

LIBNAME lets SAS know that a permanent directory is going to be specified. SAVE (can be anything) is the name used to refer to the external data library specified by 'H:\', which is the full pathname.

When specifying the data, use the “<library name>._______” format.
For example, using the LIBNAME statement above, SAVE.CLASSDAT would create a permanent file, in contrast to WORK.CLASSDAT, which we have been using. You can then view the dataset by opening the appropriate library in the “Libraries” folder in the Explorer window.

Sample SAS Program: Creating a Permanent Dataset With List Input

DM 'LOG;CLEAR;OUT;CLEAR;'; /* CLEARING LOG & OUTPUT WINDOWS */

/*****************************************************************/
/*PROJECT: SAS Short Course */
/* FOR: COE Students */
/* BY: Sheila Barron */
/* DATE: November 13, 2007 */
/* NOTES: Entering data and checking it */
/*****************************************************************/

LIBNAME SAVE 'H:\DATA';

DATA SAVE.CLASSDAT;
INPUT ID $ NAME $ SEX $ EXAM1 GRADE $;
DATALINES;
S01 Max M 84 A
S02 John M 89 A
S03 Sarah F 86 B
S04 Lee M 85 B
S05 Rosa F 94 A
S06 Ming F 84 C
;

PROC CONTENTS DATA=SAVE.CLASSDAT VARNUM;
RUN;

PROC PRINT DATA=SAVE.CLASSDAT;
TITLE 'SAS SHORT COURSE';
RUN;

/*****************************************************************/
/*DM 'OUT;FILE OUT REP;'; DM 'LOG;FILE LOG REP;'; */
/*****************************************************************/

Ready to Begin Data Analysis

Now that your data are in SAS, you are ready to conduct statistical procedures. SAS has hundreds of procedures that cover just about any quantitative analysis you want. To get an overview of the procedures go to the HELP menu, select SAS Help and Documentation, and then Contents. Then go to:
SAS products > SAS/STAT 9.3 User’s Guide

In the user guide you will find overviews for different types of analyses as well as details on specific procedures.
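Before moving on to the procedures, it may help to see how a permanent dataset is picked up again in a later SAS session. The sketch below assumes that SAVE.CLASSDAT was created earlier in the folder H:\DATA, as in the permanent-dataset sample program; the name WORK.CLASSCOPY is made up for illustration.

```sas
/* Re-establish the libref; SAS forgets it when a session ends */
LIBNAME SAVE 'H:\DATA';

/* Confirm the permanent dataset is there and inspect its variables */
PROC CONTENTS DATA=SAVE.CLASSDAT VARNUM;
RUN;

/* Optional: copy it into a temporary WORK dataset        */
/* (CLASSCOPY is a hypothetical name, not from the course) */
DATA WORK.CLASSCOPY;
    SET SAVE.CLASSDAT;
RUN;
```

Working on a WORK copy is one way to experiment without changing the stored file; the permanent dataset is only altered if a DATA step or procedure writes back to SAVE.CLASSDAT.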
Sample SAS Program: Code for Common Procedures

DM 'LOG;CLEAR;OUT;CLEAR;'; /* CLEARING LOG & OUTPUT WINDOWS */

/*****************************************************************/
/*PROJECT: SAS Short Course */
/* FOR: COE Students */
/* BY: Sheila Barron */
/* DATE: Feb 05, 2015 */
/* NOTES: Entering data and checking it */
/*****************************************************************/

DATA WORK.CLASSDAT;
INPUT ID $ NAME $ SEX $ EXAM1 GRADE $;
DATALINES;
S01 Max M 84 A
S02 John M 89 A
S03 Sarah F 86 B
S04 Lee M 85 B
S05 Rosa F 94 A
S06 Ming F 84 C
;

PROC CONTENTS DATA=WORK.CLASSDAT VARNUM;
RUN;

PROC PRINT DATA=WORK.CLASSDAT;
RUN;

PROC PRINT DATA=WORK.CLASSDAT (OBS=3);
VAR NAME GRADE;
RUN;

TITLE 'SAS SHORT COURSE';

PROC FREQ;
TABLES EXAM1;
RUN;

PROC FREQ;
TABLES EXAM1*GRADE;
RUN;

PROC FREQ;
TABLES EXAM1*GRADE /LIST;
RUN;

PROC MEANS;
VAR EXAM1;
RUN;

PROC UNIVARIATE;
VAR EXAM1;
RUN;

PROC SORT;
BY SEX;
RUN;

/*****************************************************************/
/*DM 'OUT;FILE OUT REP;'; DM 'LOG;FILE LOG REP;'; */
/*****************************************************************/

PROC CONTENTS DATA=WORK.CLASSDAT VARNUM;
To get a listing of the variables in a dataset along with other information about the dataset. “VARNUM” lists the variables in the order they appear in the dataset rather than alphabetically.

PROC PRINT;
To print out a dataset (it is often good to check the data using PROC PRINT before running any analyses).

PROC PRINT DATA=WORK.CLASSDAT (OBS=3); VAR NAME GRADE;
If the dataset is small you can print out the whole thing. If it is large you may want to select particular variables to print using a VAR statement or limit the number of observations printed using the OBS= option.

PROC FREQ; TABLES EXAM1;
To produce a frequency distribution for a variable (specify the variable using the “TABLES” statement).

PROC FREQ; TABLES EXAM1*GRADE;
PROC FREQ will also produce two-way (or higher) cross-tabulations of the data.
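A cross-tabulation is usually easier to read when both variables have only a few distinct values. As a sketch (not part of the original sample program), the request below crosses SEX with GRADE from WORK.CLASSDAT; NOROW, NOCOL, and NOPERCENT are PROC FREQ table options that suppress the percentages so that only the cell counts are printed.

```sas
/* Two-way table of two categorical variables, counts only */
PROC FREQ DATA=WORK.CLASSDAT;
    TABLES SEX*GRADE / NOROW NOCOL NOPERCENT;
RUN;
```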
PROC FREQ; TABLES EXAM1*GRADE /LIST;
If there are many unique values for the variables, you may want to try the LIST option to produce more concise output.

PROC MEANS; VAR EXAM1;
PROC UNIVARIATE; VAR EXAM1;
To produce means and other descriptive statistics use PROC MEANS or PROC UNIVARIATE. PROC UNIVARIATE will produce more extensive output. (Note that the variable is specified by the VAR statement. If no VAR statement is included, by default SAS will produce output for all numeric variables.)

Sometimes you may want to save the output to a dataset. This can be accomplished with:

PROC MEANS DATA=SAVE.CLASSDAT;
VAR EXAM1;
OUTPUT OUT=STATS;
RUN;

PROC PRINT DATA=WORK.STATS;
RUN;

This is useful if you want to keep a record of the results or use them in other analyses. (STATS is written to the WORK library here; to keep it beyond the session, use a library reference such as SAVE.STATS.) Note that an OUTPUT OUT= statement is available in several other procedures as well.

Often the output dataset will have variables you don’t want. To get rid of these use the DROP statement. For example:

DATA STATS;
SET STATS;
DROP _TYPE_ _FREQ_;
RUN;

PROC PRINT DATA=WORK.STATS;
RUN;

The DATA step drops the automatically created variables _TYPE_ and _FREQ_ from the dataset. The print procedure confirms this. For more advanced procedures, or to select specific tables created by a procedure, look up the ODS TRACE statement.

PROC SORT; BY SEX; RUN;
Sometimes you will want to get descriptive statistics for subgroups based on a categorical variable. Using a BY statement for this requires the data to be sorted by that variable before running the analysis (see below). Make sure the dataset you sort is the one named in the analysis; with no DATA= option, PROC SORT sorts the most recently created dataset. Sorting your data is also helpful if you want to print the data to examine it.

PROC MEANS DATA=SAVE.CLASSDAT;
VAR EXAM1;
BY SEX;
RUN;

NOTE: In some PROC statements the “DATA=” option is specified; in others it is omitted. It is necessary to tell SAS which dataset to use if you are just starting your SAS session or if you are switching the dataset you want SAS to use.
If you are continuing with the dataset that was created most recently, it is not necessary to name it: when DATA= is omitted, SAS automatically uses the most recently created dataset.

Miscellaneous

Note that there are some lines in this program that we have not talked about. The top line (DM 'LOG;CLEAR;OUT;CLEAR;';) tells SAS to clear out the log and output windows. Without this line, each time you run the program, SAS will add the log and output to the end of the old log and output. This can sometimes be useful, but it can be confusing after several runs of a program.

The two lines that start with “FILENAME” tell SAS where the log and output are to be saved (not included in this program).

FILENAME LOG "H:\LOG_SAS.TXT";
FILENAME OUT "H:\OUT_SAS.txt";

The last two lines tell SAS to save the log and output and, if those files already exist, to replace the old versions with these versions (not used in this program: these lines were turned into a comment).

When you have written a program it is a good idea to save it. Go to the FILE menu and click SAVE AS. It will prompt you for a name. After that, you can save your revisions by selecting SAVE or clicking the save icon. When you come back later, you can open the program and continue working.

It is always a good habit to give a title to what you are doing. For example, TITLE "Short Course Example";. After you have completed your procedures, end with TITLE;, otherwise your title will be carried through the remainder of your session.
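The title advice above can be sketched as follows, assuming the WORK.CLASSDAT dataset from the earlier sample programs exists in the current session.

```sas
TITLE "Short Course Example";      /* applies to all output that follows */

PROC PRINT DATA=WORK.CLASSDAT;
RUN;

TITLE;                             /* clears the title for later output  */

PROC MEANS DATA=WORK.CLASSDAT;
    VAR EXAM1;
RUN;
```

In this sketch the PROC PRINT output carries the title, while the PROC MEANS output that follows the bare TITLE; statement does not.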