Introduction to SAS Essentials Mastering SAS for Data Analytics Alan Elliott and Wayne Woodward 1 SAS Essentials - Elliott & Woodward Intro to SAS Chapter 3, Part 1 2 SAS Essentials - Elliott & Woodward Chapter 3 LEARNING OBJECTIVES To be able to work with SAS libraries and permanent data sets To be able to read and write permanent SAS data sets To be able to interactively import data from another program To be able to define SAS libraries using program code To be able to import data using code To be able to discover the contents of an SAS data set To be able to understand how the Data Step Reads and Stores Data. 3 SAS Essentials - Elliott & Woodward 3.1 WORKING WITH SAS LIBRARIES 4 Permanent Data Sets All of the data sets we’ve created have been “Work” data sets. Work datasets vanish when you end a SAS session However, we can create permanent data sets First, we need a location (library or folder) Then we store data sets in that library (or folder) SAS Essentials - Elliott & Woodward Data Set vs Code The SAS data set is a separate file – like a Word .doc file or an Excel .xls file The data set is not the SAS code that we’ve used up to this point – the code creates the data set. SAS data sets have the extension .sas7bdat 5 SAS Essentials - Elliott & Woodward SAS Data Sets are (usually) created by a DATA statement. are an internal representation of the data created by the DATA statement contain more than the data values – they can contain variable names, labels, the results of codings, calculations and variable formats. are referred to by a name that indicates whether the data set is temporary or permanent 6 SAS Essentials - Elliott & Woodward Temporary vs Permanent A temporary SAS data set is named with a single level name such as MEASLES, or MAR2000. (technically the names are WORK.MEASLES and WORK.MAR2000) A permanent SAS data set is a file saved on your hard disk. There are two ways to refer to a permanent data set We can refer to a permanent SAS data set using a Windows filename such as “C:\SASDATA\SOMEDATA” or “C:\RESEARCH\MEASLES2009”. Or by using a SAS library prefix: MYSASDATA.PEOPLE or RESEARCH.MEASLES2009 where the MYSASDATA library name = “C:\SASDATA” RESEARCH library name = “C:\RESEARCH” 7 SAS Essentials - Elliott & Woodward Creating a SAS Data Set 8 SAS Essentials - Elliott & Woodward Two ways to refer to a permanent location LIBRARY NAME FOLDER LOCATION MYSASDATA “C:\SASDATA” RESEARCH “C:\RESEARCH” Library Name & data set name Complete name MYSASDATA and REPORT MYSASDATA.REPORT “C:\SASDATA” and REPORT “C:\SASDATA\REPORT” Thus… MYSASDATA and C:\SASDATA refer to the same location C:\SASDATA\REPORT and MYSASDATA.REPORT Both refer to the SAS permanent data set named REPORT.SAS7BDAT located in the “C:\SASDATA\” folder 9 SAS Essentials - Elliott & Woodward The physical name of the file on the hard disk is REPORT.SAS7BDAT Quick Quiz… You have two SAS libraries defined: 1. 2. 10 LIB1 refers to C:\RESEARCH\TAB1\ LIB2 refers to N:\NETWORK\ Where is the SAS dataset LIB1.MYDATA stored on the hard disk? How would you refer to the data set “N:\NETWORK\JUNE2013.SAS7BDAT” using a SAS library name? SAS Essentials - Elliott & Woodward Quick Quiz… You have two SAS libraries defined: 1. 2. 11 LIB1 refers to C:\RESEARCH\TAB1\ LIB2 refers to N:\NETWORK\ Where is the SAS dataset LIB1.MYDATA stored on the hard disk? C:\RESEARCH\TAB1\MYDATA.SASB7DAT How would you refer to the data set “N:\NETWORK\JUNE2013.SAS7BDAT”using a SAS library name? SAS Essentials - Elliott & Woodward Quick Quiz… You have two SAS libraries defined: 1. 2. 12 LIB1 refers to C:\RESEARCH\TAB1\ LIB2 refers to N:\NETWORK\ Where is the SAS dataset LIB1.MYDATA stored on the hard disk? C:\RESEARCH\TAB1\MYDATA.SASB7DAT How would you refer to the data set “N:\NETWORK\JUNE2013.SAS7BDAT”using a SAS library name? LIB2.JUNE2013 SAS Essentials - Elliott & Woodward 3.2 CREATING PERMANENT SAS DATA SETS USING THE WINDOWS FILE NAME TECHNIQUE Open the file WRITE.SAS DATA "C:\SASDATA\PEOPLE"; INPUT ID $ 1 SBP 2-4 DBP 5-7 GENDER $ 8 AGE 910 WT 11-13; DATALINES; Remember in the DCOLUMN.SAS 1120 80M15115 program this line was 2130 70F25180 DATA MYDATA; 3140100M89170 4120 80F30150 5125 80F20110 ; RUN; PROC MEANS; RUN; 13 SAS Essentials - Elliott & Woodward WHERE IS IT STORED? Look at the log file NOTE: There were 5 observations read from the data set WC000001.PEOPLE. NOTE: PROCEDURE MEANS used (Total process time): real time 0.03 seconds cpu time 0.01 seconds SAS created a LIBRARY named WC000001 and put the data set there. Look at your SAS Explorer (may have to go back a folder) Yours may have a slightly different name Double click on the file and verify that it contains the data, then exit the viewer. 14 SAS Essentials - Elliott & Woodward However…. While in this session, you can refer to the data file as either "C:\SASDATA\PEOPLE” or WC000001.PEOPLE However… WC000001 is not a permanent library… It goes away when you end the SAS Session. However… the C:\SASDATA\PEOPLE.SASB7DAT file REMAINS ON YOUR HARD DISK – because you have created a permanent SAS data file. When you begin SAS again, you can still refer to the data file as "C:\SASDATA\PEOPLE” 15 SAS Essentials - Elliott & Woodward Let’s use that data Using data in a permanent data file (named SOMEDATA.SAS7BDAT, located at C:\SASDATA. Open the SAS program READFILE.SAS PROC MEANS DATA='c:\sasdata\somedata'; RUN; Note that there is no INPUT or INFILE statement. Once you have a SAS data file, you do not need to create it again using a DATA step – just use it. Run this program. 16 SAS Essentials - Elliott & Woodward Results Variable Label ID ID Number AGE 50 Mean Std Dev Minimum Maximum 374.22 167.4983 101 604 50 10.46 2.426133 4 15 TIME1 TIME2 Age on Jan 1, 2000 Baseline 6 Months 50 50 21.268 27.44 1.716955 2.659062 17 21.3 24.2 32.3 TIME3 12 Months 50 30.492 3.025594 22.7 35.9 TIME4 24 Months 50 30.838 3.530733 21.2 36.1 STATUS Socioecono mic Status 50 3.94 1.331104 1 5 50 0.4 0.494872 0 1 SEX 17 N SAS Essentials - Elliott & Woodward 3.3 CREATING PERMANENT SAS DATA SETS USING A SAS LIBRARY Another technique for reading and writing SAS data sets is using the SAS Library technique. Before using an SAS Library, you have to "create" one. The SAS Library technique involves creating a nickname that refers to a drive location called (in SAS terminology) a library name. For example, you could defina a library name such as MYWORK that refers to C:\LOTSOFFILES\ Or, you could create the MYWORK library name that refers to a complicated networkfolder such as N:\MYNET\ACCOUNTING\STATMENTS\MYFOLDER\RESEARCH\ 18 SAS Essentials - Elliott & Woodward Using a Permanent Library Nickname Once your library "nickname" is set up, you can use the short name to refer to your disk location rather than having to use some complex Windows path name every time. With a library name, you could use the file designation: MYWORK.JAN2016 to refer to an SAS data file named JAN2016.SAS7BDAT rather than the complicated name: "N: \MYNET\ACCOUNTING\STATMENTS\MYFOLDER\RESEARCH\JAN2016" 19 SAS Essentials - Elliott & Woodward Work vs Permanent Data Files – How they are named Within the SAS program, every SAS data set has two parts to its name. For example, the SAS data set referred to as MYSASLIB. MAY2000 consists of the following: the library named MYSASLIB; the data set named MAY2000. Even temporary files can be referred to with a temporary) library name, WORK. Therefore, a file named WORK.LOTSADATA consists of: 20 the library named WORK (refers to the temporary library); the data set name is LOTSADATA. SAS Essentials - Elliott & Woodward 3.4 CREATING A SAS LIBRARY USING A DIALOG BOX To use the SAS Library file reference, you first have to “create” a library. An easy way to create an SAS library in the Windows version of SAS (with a custom name of your own choosing) is to use the New Library dialog box .. 21 SAS Essentials - Elliott & Woodward Display the SAS Library Dialog Box To display the SAS Library dialog box, make the SAS Explorer window active (click the Explorer tab at the bottom of the left window in the main SAS screen). You should see a window something like this… 22 SAS Essentials - Elliott & Woodward From the Windows Menu, choose File/New. This dialog dox should appear. Click Library, and Ok. The Library Dialog appears (next slide) 23 SAS Essentials - Elliott & Woodward Figure 3.2 New Library Dialog Box (p.55) Enter the information shown on this slide. 24 SAS Essentials - Elliott & Woodward Check to see what files are in the library… On the left side of your SAS screen, click the Explorer tab. You should see a Libraries icon that resembles a filing cabinet. Double-click the Libraries icon. A new window called Active Libraries appears. You should see a library named MYSASLIB. Double-click the MYSASLIB icon to display the contents of the MYSASLIB window. 25 SAS Essentials - Elliott & Woodward Contents of the MYSASLIB window Yours may differ slightly… These are the names of the SAS data sets that should be in your C:\SASDATA folder (now also referred to as the library MYSASLIB) when you installed the data sets for the book. 26 SAS Essentials - Elliott & Woodward SOMEDATA Data Set One data sets is named SOMEDATA. Double-click the SOMEDATA icon to display its contents. (Partially shown here.) 27 SAS Essentials - Elliott & Woodward Use a Permanent Data Set in Code Close the MYSASLIB.SOMEDATA data set by selecting File Close or by clicking the X at the top right of the window. Return to the SAS Editor window and enter this code: PROC MEANS DATA=MYSASLIB.SOMEDATA; RUN; Run this program. Observe how you used the Library name MYSASLIB to refer to the SOMESDATA data set. 28 SAS Essentials - Elliott & Woodward Another Example. Using the WRITE.SAS code, change the DATA statement: DATA MYSASLIB.PEOPLE2; INPUT ID $ 1 SBP 2-4 DBP 5-7 GENDER $ 8 AGE 910 WT 11-13; DATALINES; Change the DATA set 1120 80M15115 name to PEOPLE2 using the library name 2130 70F25180 MYSASLIB. 3140100M89170 Run the program. This 4120 80F30150 creates a data set 5125 80F20110 named PEOPLE2 in your ; MYSASLIB Library. Check it out… RUN; PROC MEANS; RUN; 29 SAS Essentials - Elliott & Woodward MYSASLIB.People2 Created Notice that the SAS data set named PEOPLE2 has appeared in the MYSASLIB library. Double click on it to verify its contents, then close the viewer. 30 SAS Essentials - Elliott & Woodward Quick Quiz You run this code: LIBNAME NEWLIB "C:\SASDATA"; PROC MEANS DATA=NEWLIB.SOMEDATA;RUN; 1. 2. 31 Where on your hard drive is the data set SOMEDATA? What is its file name? SAS Essentials - Elliott & Woodward Quick Quiz You run this code: LIBNAME NEWLIB "C:\SASDATA"; PROC MEANS DATA=NEWLIB.SOMEDATA;RUN; 1. 2. 32 Where on your hard drive is the data set SOMEDATA? C:\SASDATA\ What is its file name? SAS Essentials - Elliott & Woodward Quick Quiz You run this code: LIBNAME NEWLIB "C:\SASDATA"; PROC MEANS DATA=NEWLIB.SOMEDATA;RUN; Where on your hard drive is the data set SOMEDATA? C:\SASDATA\ 2. What is its file name? SOMEDATA.SAS7BDAT Or C:\SASDATA\SOMEDATA.SAS7BDAT 1. 33 SAS Essentials - Elliott & Woodward 3.5 CREATING A SAS LIBRARY USING CODE You can also create a TEMPORARY SAS library reference in code using the following technique The LIBNAME statement creates a SAS Library. LIBNAME NEWLIB "C:\SASDATA"; PROC MEANS DATA=NEWLIB.SOMEDATA;RUN; This creates the library named NEWLIB (with the same location as MYSASLIB). However, this library reference is lost when you end the SAS session. Run this program to verify that it works. Notice the NEWLIB icon in the Explorer. (If you look at the files in NEWLIB, they are the same as the ones in MYSASLIB.) 34 SAS Essentials - Elliott & Woodward DO HANDS-ON EXERCISE P 58 Now do these follow-up exercises Create the following SAS libraries using the wizard: MYLIB1, MYLIB2, MYLIB3, MYLIB4, MYLIB5 Create the following SAS libraries using the LIBNAME command in code: MYLIBA, MYLIBB, MYLIBC, MYLIBD, MYLIBE 35 SAS Essentials - Elliott & Woodward 3.6 USING DATA IN PERMANENT SAS DATA SETS After you have created an SAS library (either a permanent or a temporary one), you can access data in that library within SAS procedures or as input into other DATA steps. For example, the following are three different statements that all access the same data set: PROC MEANS DATA='C:\SASDATA\SOMEDATA';RUN; Or PROC MEANS DATA=MYSASLIB.SOMEDATA;RUN; Or PROC MEANS DATA=MYLIB2.SOMEDATA;RUN; 36 SAS Essentials - Elliott & Woodward Do Hands On Exercise p 59… PROC MEANS DATA='c:\sasdata\somedata'; RUN; Changed to Notice quotes on 1st but not the 2nd (Library) version. PROC MEANS DATA=MYSASLIB.SOMEDATA RUN; Note that there are quotes around the filename when given using a Windows path, but not when you use the library name. Run this program. (Same results as previous.) 37 SAS Essentials - Elliott & Woodward 3.7 IMPORTING DATA FROM ANOTHER PROGRAM Before you import data from another file format (such as Excel) you should make sure the file is ready to import. Most files to be imported should follow these criteria: 1. The first row should be SAS friendly variable names 2. Each column should contain data that are consistent to the variable type (character, number, date) desired. 38 SAS Essentials - Elliott & Woodward Importing a CSV File The first row contains the names of the SAS variables, each separated by a comma, and each adhering to SAS naming conventions. (These were discussed in Chapter 2.) Beginning on the second line of the file, each line contains the data values for one subject, where each value is separated by a comma. Two commas in a row indicate a missing value. You could also designate numeric data values that are unknown or missing with a dot (.) or a missing value code (such as 99). Missing character values can be represented by a double quote such as “”. 39 SAS Essentials - Elliott & Woodward CSV file ready for import Note that line 1 contains the 13 names of the variables separated by commas and each following line contains 13 values each separated by a comma, consistent with the data type of the variable. (Note also that in this case, it is okay that some values contain blanks, such as Civic Hybrid because it is the comma that marks a new value, not a blank.) 40 SAS Essentials - Elliott & Woodward Importing using the Import Wizard Select File/Import data (Hands On Example p 61) 41 SAS Essentials - Elliott & Woodward Examine imported file PROC PRINT DATA=MPG_FOR_CARS;RUN; 42 SAS Essentials - Elliott & Woodward Importing Data Using the SAS Code When you used the Wizard, the last question was if you wanted to save the code. This is the code SAS creates and uses to do the import. If you save the code, you can use it again to import the same file, or change it for other imports. The following code was saved from the Wizard for an import using the Excel .xls import option 43 SAS Essentials - Elliott & Woodward SAS code to Import CSV Name of resulting file Open the file IMPORTEXAMPLE.SAS PROC IMPORT OUT= WORK.MPG_FOR_CARS DATAFILE= "C:\SASDATA\CARSMPG.CSV" DBMS=CSV REPLACE; GETNAMES=YES; DATAROW=2; Name of file to import RUN; Variables names are on the first line 44 SAS Essentials - Elliott & Woodward Type of Import… in this case CSV Import Excel File (page 66) Open IMPORTEXAMPLE2.SAS PROC IMPORT OUT= WORK.FROMXL DATAFILE= "C:\SASDATA\EXAMPLE.XLS" DBMS=XLS REPLACE; SHEET="Database"; GETNAMES=YES; RUN; Note XLS file type specified For an Excel Import, it is important to specify the sheet name. 45 SAS Essentials - Elliott & Woodward Identifiers for Importing Files 46 SAS Essentials - Elliott & Woodward Exporting Data Using the SAS Code PROC EXPORT DATA=dataset OUTFILE='fileneme' or OUTTABLE='tablename' DBMS=dbmsidentifier <LABEL><REPLACE>; Do Hands on Exercise p 66 (EXPORT.SAS) 47 SAS Essentials - Elliott & Woodward 3.8 DISCOVERING CONTENTS OF A DATA FILE Many “canned” data sets come in the SAS format (SASB7DAT). One way to examine the contents is (See Hands On Example p 68) PROC DATASETS; CONTENTS DATA = MYSASLIB.SOMEDATA; or PROC DATASETS; CONTENTS DATA = “C:/SASDATA/SOMEDATA”; 48 SAS Essentials - Elliott & Woodward Continue to Chapter 3, Part 2 3.9 GOING DEEPER: UNDERSTANDING HOW THE DATA STEP READS AND STORES DATA 49 SAS Essentials - Elliott & Woodward These slides are based on the book: Introduction to SAS Essentials Mastering SAS for Data Analytics, 2nd Edition By Alan C, Elliott and Wayne A. Woodward Paperback: 512 pages Publisher: Wiley; 2 edition (August 3, 2015) Language: English ISBN-10: 111904216X ISBN-13: 978-1119042167 These slides are provided for you to use to teach SAS using this book. Feel free to modify them for your own needs. Please send comments about errors in the slides (or suggestions for improvements) to acelliott@smu.edu. Thanks. 50 SAS ESSENTIALS -- Elliott & Woodward