Chapter 3 SAS ESSENTIALS

Introduction to SAS Essentials
Mastering SAS for Data Analytics
Alan Elliott and Wayne Woodward
SAS Essentials - Elliott & Woodward
Intro to SAS
Chapter 3, Part 1
Chapter 3 LEARNING OBJECTIVES







To be able to work with SAS libraries and permanent data sets
To be able to read and write permanent SAS data sets
To be able to interactively import data from another program
To be able to define SAS libraries using program code
To be able to import data using code
To be able to discover the contents of a SAS data set
To be able to understand how the DATA step reads and stores data
3.1 WORKING WITH SAS LIBRARIES

Permanent Data Sets
All of the data sets we’ve created so far have been “Work” data sets.
Work data sets vanish when you end a SAS session.
However, we can create permanent data sets:
First, we need a location (a library or folder).
Then we store data sets in that library (or folder).
Data Set vs Code
The SAS data set is a separate file – like a Word .doc file or
an Excel .xls file
The data set is not the SAS code that we’ve used up to
this point – the code creates the data set.
SAS data sets have the extension .sas7bdat



SAS Data Sets
are (usually) created by a DATA statement.
are an internal representation of the data created by the
DATA statement
contain more than the data values – they can contain
variable names, labels, the results of codings, calculations
and variable formats.
are referred to by a name that indicates whether the data
set is temporary or permanent




Temporary vs Permanent
A temporary SAS data set is named with a single level name
such as MEASLES, or MAR2000. (technically the names are
WORK.MEASLES and WORK.MAR2000)
A permanent SAS data set is a file saved on your hard disk.
There are two ways to refer to a permanent data set
We can refer to a permanent SAS data set using a Windows
filename such as “C:\SASDATA\SOMEDATA” or
“C:\RESEARCH\MEASLES2009”.
Or by using a SAS library prefix: MYSASDATA.PEOPLE or
RESEARCH.MEASLES2009 where the
MYSASDATA library name = “C:\SASDATA”
RESEARCH library name = “C:\RESEARCH”
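As a rough sketch of the naming rules above, the following program creates a temporary data set with a one-level name, reads it back by its two-level WORK name, and then saves a permanent copy through a library name. The variable names and data values are made up for illustration, and the code assumes that the folder C:\RESEARCH already exists on your machine.

```sas
/* Temporary: a one-level name is stored in the WORK library */
DATA MEASLES;
  INPUT YEAR CASES;      /* hypothetical variables */
DATALINES;
2008 12
2009 7
;
RUN;

/* The same data set, addressed by its full two-level name */
PROC PRINT DATA=WORK.MEASLES;
RUN;

/* Permanent: a two-level name whose first part is a user-defined library */
LIBNAME RESEARCH "C:\RESEARCH";   /* assumes this folder exists */
DATA RESEARCH.MEASLES2009;
  SET WORK.MEASLES;
RUN;
```

After the last step runs, a file named MEASLES2009.SAS7BDAT should be present in C:\RESEARCH, while WORK.MEASLES disappears at the end of the session.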







Creating a SAS Data Set
Two ways to refer to a permanent location
LIBRARY NAME     FOLDER LOCATION
MYSASDATA        “C:\SASDATA”
RESEARCH         “C:\RESEARCH”

Library name & data set name     Complete name
MYSASDATA and REPORT             MYSASDATA.REPORT
“C:\SASDATA” and REPORT          “C:\SASDATA\REPORT”
Thus… MYSASDATA and C:\SASDATA refer to the same location
C:\SASDATA\REPORT and MYSASDATA.REPORT
Both refer to the SAS permanent data set named
REPORT.SAS7BDAT located in the “C:\SASDATA\” folder
The physical name
of the file on the
hard disk is
REPORT.SAS7BDAT
Quick Quiz…
You have two SAS libraries defined:
LIB1 refers to C:\RESEARCH\TAB1\
LIB2 refers to N:\NETWORK\
1. Where is the SAS data set LIB1.MYDATA stored on the hard disk?
2. How would you refer to the data set “N:\NETWORK\JUNE2013.SAS7BDAT” using a SAS library name?
Answers:
1. C:\RESEARCH\TAB1\MYDATA.SAS7BDAT
2. LIB2.JUNE2013
3.2 CREATING PERMANENT SAS DATA SETS
USING THE WINDOWS FILE NAME TECHNIQUE

Open the file WRITE.SAS
DATA "C:\SASDATA\PEOPLE";
INPUT ID $ 1 SBP 2-4 DBP 5-7 GENDER $ 8 AGE 9-10 WT 11-13;
DATALINES;
1120 80M15115
2130 70F25180
3140100M89170
4120 80F30150
5125 80F20110
;
RUN;
PROC MEANS;
RUN;
Remember that in the DCOLUMN.SAS program the first line was
DATA MYDATA;
WHERE IS IT STORED?

Look at the log file
NOTE: There were 5 observations read from the data set WC000001.PEOPLE.
NOTE: PROCEDURE MEANS used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds
SAS created a library named WC000001 and put the data set there.
Look at your SAS Explorer (you may have to go up a folder level).
Your library may have a slightly different name.
Double-click the file and verify that it contains the data, then exit the viewer.

However….




While in this session, you can refer to the data file as either "C:\SASDATA\PEOPLE" or WC000001.PEOPLE
However… WC000001 is not a permanent library. It goes away when you end the SAS session.
However… the C:\SASDATA\PEOPLE.SAS7BDAT file REMAINS ON YOUR HARD DISK, because you have created a permanent SAS data file.
When you begin SAS again, you can still refer to the data file as "C:\SASDATA\PEOPLE"
Let’s use that data


Using data in a permanent data file (named SOMEDATA.SAS7BDAT, located in C:\SASDATA).
Open the SAS program READFILE.SAS
PROC MEANS DATA='c:\sasdata\somedata';
RUN;

Note that there is no INPUT or INFILE statement.

Once you have a SAS data file, you do not need to create it
again using a DATA step – just use it.
Run this program.

Results
Variable  Label                 N   Mean    Std Dev   Minimum  Maximum
ID        ID Number             50  374.22  167.4983  101      604
AGE       Age on Jan 1, 2000    50  10.46   2.426133  4        15
TIME1     Baseline              50  21.268  1.716955  17       24.2
TIME2     6 Months              50  27.44   2.659062  21.3     32.3
TIME3     12 Months             50  30.492  3.025594  22.7     35.9
TIME4     24 Months             50  30.838  3.530733  21.2     36.1
STATUS    Socioeconomic Status  50  3.94    1.331104  1        5
SEX                             50  0.4     0.494872  0        1
3.3 CREATING PERMANENT SAS DATA SETS
USING A SAS LIBRARY





Another technique for reading and writing SAS data sets is the SAS library technique.
Before using a SAS library, you have to "create" one.
The SAS library technique involves creating a nickname, called (in SAS terminology) a library name, that refers to a drive location.
For example, you could define a library name such as MYWORK that refers to C:\LOTSOFFILES\
Or, you could create the MYWORK library name that refers to a complicated network folder such as
N:\MYNET\ACCOUNTING\STATMENTS\MYFOLDER\RESEARCH\
Using a Permanent Library Nickname

Once your library "nickname" is set up, you can use the short name to refer to your disk location rather than having to use a complex Windows path name every time. With a library name, you could use the file designation:
MYWORK.JAN2016
to refer to a SAS data file named JAN2016.SAS7BDAT rather than the complicated name:

"N:\MYNET\ACCOUNTING\STATMENTS\MYFOLDER\RESEARCH\JAN2016"
Work vs Permanent Data Files – How they are named

Within a SAS program, every SAS data set has two parts to its name. For example, the SAS data set referred to as MYSASLIB.MAY2000 consists of the following:
the library named MYSASLIB;
the data set named MAY2000.
Even temporary files can be referred to with a (temporary) library name, WORK. Therefore, a file named WORK.LOTSADATA consists of:
the library named WORK (refers to the temporary library);
the data set named LOTSADATA.
3.4 CREATING A SAS LIBRARY USING A DIALOG BOX


To use the SAS library file reference, you first have to “create” a library.
An easy way to create a SAS library in the Windows version of SAS (with a custom name of your own choosing) is to use the New Library dialog box.
Display the SAS Library Dialog Box

To display the SAS Library dialog box, make the SAS
Explorer window active (click the Explorer tab at the
bottom of the left window in the main SAS screen). You
should see a window something like this…

From the main menu, choose File/New. This dialog box should appear. Click Library, then OK. The New Library dialog box appears (next slide).
Figure 3.2 New Library Dialog Box (p.55)

Enter the information shown on this slide.
Check to see what files are in the library…

On the left side of your SAS screen, click the Explorer tab.
You should see a Libraries icon that resembles a filing
cabinet.

Double-click the Libraries icon. A new window called
Active Libraries appears. You should see a library named
MYSASLIB.

Double-click the MYSASLIB icon to display the contents of
the MYSASLIB window.
Contents of the MYSASLIB window

Yours may differ slightly…
These are the names of the SAS data sets that were placed in your C:\SASDATA folder (now also referred to as the library MYSASLIB) when you installed the data sets for the book.
SOMEDATA Data Set

One of the data sets is named SOMEDATA. Double-click the SOMEDATA icon to display its contents. (Partially shown here.)
Use a Permanent Data Set in Code


Close the MYSASLIB.SOMEDATA data set by selecting File/Close or by clicking the X at the top right of the window.
Return to the SAS Editor window and enter this code:
PROC MEANS DATA=MYSASLIB.SOMEDATA; RUN;

Run this program. Observe how you used the library name MYSASLIB to refer to the SOMEDATA data set.
Another Example.

Using the WRITE.SAS code, change the DATA statement:
DATA MYSASLIB.PEOPLE2;
INPUT ID $ 1 SBP 2-4 DBP 5-7 GENDER $ 8 AGE 9-10 WT 11-13;
DATALINES;
1120 80M15115
2130 70F25180
3140100M89170
4120 80F30150
5125 80F20110
;
RUN;
PROC MEANS;
RUN;
Change the data set name to PEOPLE2 using the library name MYSASLIB.
Run the program. This creates a data set named PEOPLE2 in your MYSASLIB library. Check it out…
MYSASLIB.People2 Created


Notice that the SAS data set named
PEOPLE2 has appeared in the
MYSASLIB library.
Double click on it to verify its
contents, then close the viewer.
Quick Quiz
You run this code:
LIBNAME NEWLIB "C:\SASDATA";
PROC MEANS DATA=NEWLIB.SOMEDATA;RUN;
1. Where on your hard drive is the data set SOMEDATA?
2. What is its file name?
Answers:
1. C:\SASDATA\
2. SOMEDATA.SAS7BDAT, or C:\SASDATA\SOMEDATA.SAS7BDAT
3.5 CREATING A SAS LIBRARY USING CODE

You can also create a TEMPORARY SAS library reference in code using the LIBNAME statement, which creates a SAS library:
LIBNAME NEWLIB "C:\SASDATA";
PROC MEANS DATA=NEWLIB.SOMEDATA;RUN;


This creates the library named NEWLIB (with the same
location as MYSASLIB). However, this library reference is
lost when you end the SAS session.
Run this program to verify that it works. Notice the
NEWLIB icon in the Explorer. (If you look at the files in
NEWLIB, they are the same as the ones in MYSASLIB.)
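A minimal sketch of the life cycle of such a code-defined library is shown here. It assumes the C:\SASDATA folder and the SOMEDATA data set from the book's installation. The LIST and CLEAR forms of the LIBNAME statement are standard SAS, but they are an addition to what the slide covers.

```sas
LIBNAME NEWLIB "C:\SASDATA";   /* assign the library reference */
LIBNAME NEWLIB LIST;           /* write the library's attributes to the log */

PROC MEANS DATA=NEWLIB.SOMEDATA;
RUN;

LIBNAME NEWLIB CLEAR;          /* drop the reference; the files stay on disk */
```

Clearing the reference only removes the nickname. The .SAS7BDAT files in C:\SASDATA are untouched and can be reached again by reissuing the first LIBNAME statement.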
DO HANDS-ON EXERCISE P 58



Now do these follow-up exercises
Create the following SAS libraries using the wizard:
MYLIB1, MYLIB2, MYLIB3, MYLIB4, MYLIB5
Create the following SAS libraries using the LIBNAME
command in code: MYLIBA, MYLIBB, MYLIBC, MYLIBD,
MYLIBE
3.6 USING DATA IN PERMANENT SAS DATA SETS

After you have created a SAS library (either a permanent or a temporary one), you can access data in that library within SAS procedures or as input into other DATA steps.
For example, the following are three different statements
that all access the same data set:
PROC MEANS DATA='C:\SASDATA\SOMEDATA';RUN;

Or
PROC MEANS DATA=MYSASLIB.SOMEDATA;RUN;
Or
PROC MEANS DATA=MYLIB2.SOMEDATA;RUN;
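The statements above use the permanent data set in a procedure. As a sketch of the other use mentioned, input into another DATA step, the following reads MYSASLIB.SOMEDATA with a SET statement and keeps a subset in a temporary data set. The data set name OLDER and the age cutoff of 10 are arbitrary choices for illustration. The LIBNAME line assumes MYSASLIB points to C:\SASDATA and can be dropped if the library is already defined.

```sas
LIBNAME MYSASLIB "C:\SASDATA";   /* may be unnecessary if already defined */

DATA WORK.OLDER;                 /* hypothetical name for the subset */
  SET MYSASLIB.SOMEDATA;         /* read the permanent data set; no INPUT needed */
  IF AGE >= 10;                  /* subsetting IF: keep only these observations */
RUN;

PROC MEANS DATA=WORK.OLDER;
RUN;
```

The permanent data set is not modified by this step. Only the new WORK data set reflects the subset.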

Do Hands On Exercise p 59…
PROC MEANS DATA='c:\sasdata\somedata';
RUN;
Changed to
PROC MEANS DATA=MYSASLIB.SOMEDATA;
RUN;
Note that there are quotes around the file name when it is given as a Windows path, but not when you use the library name.
Run this program. (Same results as previous.)


3.7 IMPORTING DATA FROM ANOTHER PROGRAM
Before you import data from another file format (such as Excel), you should make sure the file is ready to import.
Most files to be imported should meet these criteria:
1. The first row should contain SAS-friendly variable names.
2. Each column should contain data that are consistent with the desired variable type (character, number, date).

Importing a CSV File



The first row contains the names of the SAS variables,
each separated by a comma, and each adhering to SAS
naming conventions. (These were discussed in Chapter 2.)
Beginning on the second line of the file, each line
contains the data values for one subject, where each
value is separated by a comma.
Two commas in a row indicate a missing value. You could also designate numeric data values that are unknown or missing with a dot (.) or a missing value code (such as 99). Missing character values can be represented by a pair of double quotes such as “”.
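To see how comma-separated values and missing values behave, here is a small sketch that reads inline comma-delimited lines with a DATA step rather than PROC IMPORT. The DSD option on the INFILE statement makes SAS treat two commas in a row as a missing value. The variable names and the numbers are invented for illustration and are not taken from the book's CSV file.

```sas
DATA CSVDEMO;                          /* hypothetical data set name */
  INFILE DATALINES DSD MISSOVER;       /* DSD: comma-delimited, ",," means missing */
  INPUT BRAND :$12. MODEL :$20. CITYMPG HWYMPG;
DATALINES;
Honda,Civic Hybrid,40,45
Toyota,Prius,,51
Ford,Focus,24,.
;
RUN;

PROC PRINT DATA=CSVDEMO;
RUN;
```

In the printed output, CITYMPG should be missing for the second line (two commas in a row) and HWYMPG missing for the third (a dot). The blank inside Civic Hybrid is kept because the comma, not the blank, separates values.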
CSV file ready for import


Note that line 1 contains the 13 names of the variables separated by
commas and each following line contains 13 values each separated
by a comma, consistent with the data type of the variable.
(Note also that in this case, it is okay that some values contain
blanks, such as Civic Hybrid because it is the comma that marks a
new value, not a blank.)
Importing using the Import Wizard

Select File/Import data (Hands On Example p 61)
Examine imported file

PROC PRINT DATA=MPG_FOR_CARS;RUN;
Importing Data Using the SAS Code



When you used the Wizard, the last question was whether you wanted to save the code. This is the code SAS creates and uses to do the import.
If you save the code, you can use it again to import the same file, or change it for other imports.
The following code was saved from the Wizard for an import using the CSV import option
SAS code to Import CSV
Open the file IMPORTEXAMPLE.SAS
PROC IMPORT OUT= WORK.MPG_FOR_CARS
DATAFILE= "C:\SASDATA\CARSMPG.CSV"
DBMS=CSV REPLACE;
GETNAMES=YES;
DATAROW=2;
RUN;
OUT= gives the name of the resulting file.
DATAFILE= gives the name of the file to import.
DBMS= gives the type of import, in this case CSV.
GETNAMES=YES indicates that the variable names are on the first line.
Import Excel File (page 66)

Open IMPORTEXAMPLE2.SAS
PROC IMPORT OUT= WORK.FROMXL
DATAFILE=
"C:\SASDATA\EXAMPLE.XLS"
DBMS=XLS REPLACE;
SHEET="Database";
GETNAMES=YES;
RUN;
Note XLS file type
specified
For an Excel Import, it is important to
specify the sheet name.
Identifiers for Importing Files
Exporting Data Using the SAS Code
PROC EXPORT DATA=dataset
OUTFILE='filename' or
OUTTABLE='tablename'
DBMS=dbmsidentifier
<LABEL><REPLACE>;

Do Hands on Exercise p 66 (EXPORT.SAS)
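As one possible instance of the general PROC EXPORT syntax, the sketch below writes the SOMEDATA data set to a CSV file. It assumes the MYSASLIB library is already defined. The output file name SOMEDATA_COPY.CSV is hypothetical and is not the file used in the book's EXPORT.SAS exercise.

```sas
/* Export a permanent SAS data set to a comma-separated file */
PROC EXPORT DATA=MYSASLIB.SOMEDATA
     OUTFILE="C:\SASDATA\SOMEDATA_COPY.CSV"   /* hypothetical output file */
     DBMS=CSV
     REPLACE;                                  /* overwrite the file if it exists */
RUN;
```

Without REPLACE, SAS does not overwrite an existing output file, so rerunning the program would leave the earlier export in place.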
3.8 DISCOVERING CONTENTS OF A DATA FILE


Many “canned” data sets come in the SAS format (SAS7BDAT).
One way to examine the contents is (see Hands On Example p 68)
PROC DATASETS;
CONTENTS DATA = MYSASLIB.SOMEDATA;
RUN;
or
PROC DATASETS;
CONTENTS DATA = "C:/SASDATA/SOMEDATA";
RUN;
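A closely related alternative, not shown on the slide, is the stand-alone PROC CONTENTS procedure, which reports the same kind of information (variable names, types, lengths, labels, and formats). The sketch below assumes MYSASLIB points to C:\SASDATA. The VARNUM option lists the variables in their order in the data set rather than alphabetically.

```sas
LIBNAME MYSASLIB "C:\SASDATA";   /* may be unnecessary if already defined */

PROC CONTENTS DATA=MYSASLIB.SOMEDATA VARNUM;
RUN;
```

This can be convenient when you only want a listing of one data set and do not need the other library-management features of PROC DATASETS.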
Continue to Chapter 3, Part 2

3.9 GOING DEEPER: UNDERSTANDING HOW THE DATA
STEP READS AND STORES DATA
These slides are based on the book:
Introduction to SAS Essentials
Mastering SAS for Data Analytics, 2nd Edition
By Alan C. Elliott and Wayne A. Woodward
Paperback: 512 pages
Publisher: Wiley; 2 edition (August 3, 2015)
Language: English
ISBN-10: 111904216X
ISBN-13: 978-1119042167
These slides are provided for you to use to teach SAS using this book. Feel free to
modify them for your own needs. Please send comments about errors in the slides
(or suggestions for improvements) to acelliott@smu.edu. Thanks.