Chapter 2: Referencing Files and Setting Options


SAS Libraries

[Figure: a library name points to the path of a physical location on the hard drive (HD)]

Every SAS file is stored in a SAS library.

A SAS data set is one type of SAS file.

In some operating environments, a library is a physical collection of files.

In others, such as Windows and UNIX environments, a library is a logical name for a group of files that are stored in one physical location (a folder) in a storage space.

A library can be temporary or permanent.

A SAS library must be assigned before a SAS program can read a SAS data set from the directory or write one to it.

The SAS program then only needs to recognize the library reference name (libref).

Reference a SAS file in a SAS Library

A reference to a SAS file has two levels:

LIBREF.filename

LIBREF is the SAS library name that is connected to a physical directory in a storage location on your computer. filename is a file stored in the directory referred to by the libref.

Two types of SAS libraries

(A) Temporary SAS library for hosting temporary SAS data sets:

The libref is always WORK, which is already available in the Libraries folder of the Explorer panel in the SAS working environment.

Example: WORK.admit is a temporary SAS data set.

WORK is the libref and the data set name is admit.

NOTE: All of the SAS data sets stored in the WORK library are deleted when you end the SAS session.

NOTE: You can omit WORK and refer to the data set simply as admit if it is stored in the temporary WORK library.

For example, in the DATA step:

DATA admit2; is the same as DATA work.admit2;
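The WORK default can be expressed as a tiny lookup rule. The sketch below is a Python model, not SAS code: the helper name `resolve` is hypothetical, and it only mimics the behaviour described above, where a one-level name is treated as WORK.name and a two-level name is split at the period. Names are upper-cased because SAS names are not case sensitive.

```python
# Hypothetical helper (a Python model of the rule, not part of SAS):
# turn a data set reference into a (libref, member) pair.
def resolve(ref: str) -> tuple:
    parts = [p.strip() for p in ref.split(".")]
    if len(parts) == 1:
        # One-level name: the data set is assumed to live in WORK.
        return ("WORK", parts[0].upper())
    if len(parts) == 2:
        # Two-level name: LIBREF.filename
        return (parts[0].upper(), parts[1].upper())
    raise ValueError(f"expected a one- or two-level name, got {ref!r}")


print(resolve("admit2"))       # ('WORK', 'ADMIT2')
print(resolve("work.admit2"))  # ('WORK', 'ADMIT2')
print(resolve("Mylib.admit"))  # ('MYLIB', 'ADMIT')
```

Both DATA statements in the example therefore name the same data set.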

(B) Permanent SAS library:

Data sets stored in a permanent SAS library remain after the SAS session ends, because the files are stored physically on the hard drive at the location you define. The libref is defined by the user.

For example, Mylib.admit refers to a SAS data set admit that is stored in the library named Mylib.

Mylib is a user-defined SAS library. admit is a file stored in the corresponding physical location on the hard drive.

How to assign a SAS library

If you want to use the WORK library to store your file, there is no need to define it. SAS creates the WORK library when you start your session.

If you want to create your own library, there are two ways:

(1) By the pull-down menu, as described in the SAS Window Environment document.

(2) By using a LIBNAME statement to define a SAS library:

LIBNAME libref 'the path to the physical folder on the HD';

NOTE: libref is a logical name for the entire folder on the HD. The folder can contain many data sets. Each data set in the folder is referred to as libref.datasetname.

Example: you store the data sets admit, budget, and tuition in the folder UNIVERSITY on the C: drive. You define a SAS library COST linked to these files by:

LIBNAME cost 'C:\university';

The data sets are then named in your SAS program as:

cost.admit, cost.budget, cost.tuition

NOTE: SAS names are not case sensitive, so they can be written in upper or lower case.
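The libref-to-folder link can be pictured as a lookup table. This is a Python model, not SAS code: the `librefs` dictionary stands in for the LIBNAME cost 'C:\university'; statement, `physical_file` is a hypothetical helper, and it assumes each data set is stored as a lower-case file with the .sas7bdat extension (the extension used by Pilots.sas7bdat later in this chapter).

```python
from pathlib import PureWindowsPath

# Hypothetical libref table, playing the role of: LIBNAME cost 'C:\university';
librefs = {"COST": PureWindowsPath(r"C:\university")}


def physical_file(two_level_name: str) -> PureWindowsPath:
    """Map libref.datasetname to a file in the folder the libref points to."""
    libref, member = (part.strip() for part in two_level_name.split("."))
    folder = librefs[libref.upper()]  # librefs are not case sensitive
    # Assumption: the data set file is <member>.sas7bdat in lower case.
    return folder / f"{member.lower()}.sas7bdat"


for name in ["Cost.admit", "cost.budget", "COST.TUITION"]:
    print(physical_file(name))
```

Upper-, lower-, and mixed-case spellings all land on the same file, which is the point of the note above.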

Rules for a valid SAS library name (libref)

A libref:

• is limited to 8 characters

• must start with a letter or underscore

• can contain only letters, numbers, or underscores.

Example: s575, _s575, and s575_ are valid librefs.

S-575 (contains a hyphen) and sta575_online (longer than 8 characters) are not valid.
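The three rules combine into a single pattern. The sketch below is a Python model for checking candidate names, not a SAS feature; `LIBREF_RULE` and `is_valid_libref` are hypothetical names.

```python
import re

# One letter or underscore, then up to 7 more letters, digits, or underscores:
# this encodes the three rules above (at most 8 characters in total).
LIBREF_RULE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,7}")


def is_valid_libref(name: str) -> bool:
    return LIBREF_RULE.fullmatch(name) is not None


for name in ["s575", "_s575", "s575_", "S-575", "sta575_online"]:
    print(f"{name:15} {is_valid_libref(name)}")
```

Using `fullmatch` rather than `match` matters here: it rejects names that merely start with a valid prefix, such as sta575_online.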

How long a libref remains in effect

The LIBNAME statement is a global statement. A global statement remains in effect until you modify it, cancel it, or end your SAS session.

When we say a library is permanent, we mean that the data sets in the library (in physical storage) are permanent, not the libref. You still need to assign a libref to each permanent library in every SAS session in order to access its data sets.

NOTE: If you use the pull-down menu to create your permanent library and check 'Enable at startup', the libref is available when you start SAS, without a LIBNAME statement.

Referencing files in other formats

You can use the LIBNAME statement to reference not only SAS files but also files created by other software products, such as database management systems.

SAS uses an engine designed to connect to each specific software product.

[Figure: files from non-SAS software are accessed through an engine as a SAS data library]

LIBNAME libref engine 'path to the physical location';

Some available engines are BMDP, SPSS, and OSIRIS. These engines allow read-only access to BMDP, SPSS, and OSIRIS files.

See the SAS Help documentation for more details, if needed.

Where to find the library you created, its contents, and the contents of each data set

Once the library is created, it appears in the folder called 'Libraries' in the left panel (Explorer panel) of the SAS working interface.

To see the content of a SAS data set, click on the data set to open it in the VIEWTABLE window. Close the VIEWTABLE window afterwards.

You can also use SAS statements to view the contents of a SAS library and the detailed descriptor information of any SAS data set.

Exercise

Write a SAS program to read the following SAS data set, located on the class website:

Pilots.sas7bdat

This data set contains the pilots employed at an airline. The variables are:

Variable    Type  Length  Description
ID          char  4       ID number
LastName    char  10      last name
FirstName   char  9       first name
City        char  12      city
State       char  2       state
Gender      char  1       gender
JobCode     char  3       job code
Salary      num   8       current salary
Birth       num   8       birth date
Hired       num   8       date hired
HomePhone   char  12      home phone number

In this program, you will do the following tasks:

(1) Create a SAS library, mylib, that connects to the folder in which the Pilots data set is stored.

(2) Read the SAS data set Pilots.

(3) Create a new SAS data set called Pilotsnew, and store it in another SAS library called mylib1 that connects to the folder DataEx inside the Math707 folder.

(4) Print the data.

Save the SAS program as C2_readSASData in a new folder, SASEx, inside the Math707 folder on your C: drive.

Answer to Exercise

Libname mylib 'c:\math707\sasdata';

Libname mylib1 'c:\math707\dataex';

Data mylib1.pilotsnew;

Set mylib.pilots;

Run;

Proc print data = mylib1.pilotsnew;

Run;

View the contents of an entire library and/or the descriptor of a data set

In practice, a SAS library often contains many data sets shared by different users, so it is good practice to check what the library contains.

SAS has two procedures that display the contents of a library as well as of each SAS data set:

PROC CONTENTS <options>;

RUN;

PROC DATASETS <options> ;

CONTENTS <options>;

QUIT;

View the contents of the entire library without data descriptors

/* To display all SAS data sets in the Mylib library */

proc contents data=mylib._all_ nods; run;

/* Or use the following procedure */

proc datasets;

contents data=mylib._all_ nods;

quit;

NOTE: _ALL_ is a special SAS keyword that refers to all members of the mylib library.

NODS: suppresses the detailed descriptor information for each individual data set, so only the list of members in the library is displayed.

NOTE: Text between /* and */ is a comment.

View detail data descriptor information of a data set

/*view the data descriptor information for the SAS data set admit */

PROC CONTENTS data=mylib.admit; run;

/* One can also use the following procedure */

PROC DATASETS;

CONTENTS data=mylib.admit;

QUIT;

NOTE: By default, the variables are listed in alphabetical order.

View detailed descriptor information of a data set with the variables in table column order

You can list the variables in the order in which they were created in the SAS data set by using the option:

VARNUM

PROC CONTENTS data=mylib.admit varnum;

RUN;

Or

PROC DATASETS;

CONTENTS DATA=mylib.admit VARNUM;

QUIT;

Exercise

Open the SAS program C2_readSASData and use PROC CONTENTS as well as PROC DATASETS to:

(1) View only the SAS data sets in mylib library.

(2) View the detailed data descriptor for the SAS data set pilots in mylib.

(3) View the detailed data descriptor for the SAS data set pilots with the table column variable order.

(4) Save the SAS program as C2_Contents in your SASEx folder.

Answer to Exercise

/* Use PROC CONTENTS to display all SAS data sets in mylib */

Libname mylib 'c:\math707\sasdata';

Proc contents data = mylib._all_ nods; Run;

/* Use PROC DATASETS to display all SAS data sets in mylib */

proc datasets; contents data=mylib._all_ nods;

Quit;

/* Display details of SAS data set pilots with variables in alphabetical order (PROC CONTENTS, then PROC DATASETS) */

Proc contents data = mylib.pilots; Run;

proc datasets; contents data=mylib.pilots;

Quit;

/* Display details of SAS data set pilots with variables in table column order (PROC CONTENTS, then PROC DATASETS) */

Proc contents data = mylib.pilots varnum; run;

proc datasets; contents data=mylib.pilots varnum;

Quit;

Setting SAS System Options

SAS system options can be set from the pull-down menu (Tools, Options, System) or with a SAS statement.

NOTE: For SAS Listing output, you can set system options for:

• line size, page size, page numbering, whether the date and time are displayed, and many others. These options do not affect HTML output.

Setting System Options

The general syntax: OPTIONS options;

Some useful options are:

DATE|NODATE: print the date and time or not (default is DATE).

NUMBER|NONUMBER: print page numbers or not. The default is NUMBER, and page numbers accumulate until they are reset.

PAGENO=n: by default, page numbers accumulate. Use PAGENO=n to reset the starting page number. For example, PAGENO=3 resets the page number to 3, and numbering continues from that point on.

PAGESIZE=n|MAX: the number of lines per page.

LINESIZE=n|MAX: the number of characters per line. Note: if an observation needs more than one line, it continues on the next line.

NOTE: The OPTIONS statement is a global statement. It can appear anywhere in your program and changes the settings from that point on.

NOTE: It is good practice to place OPTIONS statements outside DATA and PROC steps.
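The NUMBER/PAGENO= behaviour can be pictured as a running counter. This is a toy Python model, not SAS code: `number_pages` and the page counts in the example are hypothetical, and the model only tracks numbering, not how many lines SAS fits on each page.

```python
def number_pages(steps):
    """steps: list of (pages_produced, pageno_reset_or_None) in program order.
    Returns the page numbers printed for each step."""
    next_page = 1                      # a new session starts at page 1
    numbered = []
    for pages, reset in steps:
        if reset is not None:          # an OPTIONS PAGENO=n; ran before this step
            next_page = reset
        numbered.append(list(range(next_page, next_page + pages)))
        next_page += pages             # numbering accumulates from here on
    return numbered


# Hypothetical output: two 2-page reports, then PAGENO=3, then a 1-page report.
print(number_pages([(2, None), (2, None), (1, 3)]))  # [[1, 2], [3, 4], [3]]
```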

Exercise

Open the C2_Contents program and practice the following SAS system options using the OPTIONS statement.

Delete all PROC DATASETS procedures.

Add an OPTIONS statement at the end of this program with the following options:

Change the option to NODATE.

Set PAGENO to start at 1 for the output.

Set PAGESIZE to 50.

Set LINESIZE to 80.

Use PROC CONTENTS to see the descriptor of the admit data set in mylib.

Use PROC PRINT to print the admit data.

Check the results to see the effects of these options.

Add another OPTIONS statement and change the options back to DATE, PAGESIZE=MAX, and LINESIZE=MAX; then

use PROC PRINT to print the PILOTS data in mylib.

Check the results to see the effect of the options.

Save the program as C2_SYSOptions in your SASEx folder.

Answer to Exercise

Libname mylib 'c:\math707\sasdata';

Proc contents data = mylib._all_ nods; Run;

Options nodate pageno=1 pagesize=50 linesize=80;

Proc contents data = mylib.admit; run;

Proc print data = mylib.admit; run;

Options date pagesize=max linesize=max;

Proc print data = mylib.pilots; run;

Handling two-digit years using the system OPTIONS statement

Many data sets use two-digit years, such as 94 for 1994 or 10 for 1910.

There is no confusion about 94 meaning 1994 today, but the year 10 could be 1910 or 2010. This is the Year 2000 (Y2K) compliance problem.

SAS uses OPTIONS YEARCUTOFF= year; to handle this issue. It specifies the first year of the 100-year span used to interpret two-digit years.

The default is YEARCUTOFF=1920, which interprets two-digit years within the 100-year span from 1920 to 2019.

OPTIONS YEARCUTOFF=1940; interprets two-digit years within the 100-year span from 1940 to 2039.

How does YEARCUTOFF work?

OPTIONS YEARCUTOFF=1940; (100-year span from 1940 to 2039)

Date in the data set    Interpreted as
8/26/15                 8/26/2015
12/25/65                12/25/1965
5/7/90                  5/7/1990
8/30/48                 8/30/1948

OPTIONS YEARCUTOFF=1960; (100-year span from 1960 to 2059)

Date in the data set    Interpreted as
8/26/15                 8/26/2015
12/25/65                12/25/1965
5/7/90                  5/7/1990
8/30/48                 8/30/2048
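The YEARCUTOFF= rule is simple arithmetic: place the two-digit year in the century of the cutoff, and if that lands before the cutoff, move it forward 100 years. The sketch below is a Python model of that rule, not SAS code; `expand_year` is a hypothetical name.

```python
def expand_year(yy: int, yearcutoff: int = 1920) -> int:
    """Place two-digit year yy in the span [yearcutoff, yearcutoff + 99]."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    century = yearcutoff - yearcutoff % 100   # e.g. 1940 -> 1900
    year = century + yy
    if year < yearcutoff:                     # before the span: next century
        year += 100
    return year


for cutoff in (1940, 1960):
    print(cutoff, [expand_year(yy, cutoff) for yy in (15, 65, 90, 48)])
# 1940 [2015, 1965, 1990, 1948]
# 1960 [2015, 1965, 1990, 2048]
```

Only the year 48 changes between the two settings, because it falls inside the 1940 span but before the start of the 1960 span.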

Specifying which observations of a SAS data set to process using the OPTIONS statement

In many applications, the number of observations (cases) is very large. It is important to make sure that a SAS program is correct before processing the entire data set, so for testing you can process only a small part of the data.

This can be done with the OPTIONS statement:

OPTIONS FIRSTOBS=n1 OBS=n2;

FIRSTOBS=n1 starts reading the data at observation n1.

OBS=n2 stops reading the data after observation n2.

Example: OPTIONS FIRSTOBS=5 OBS=15;

reads observations 5 through 15.

The defaults are FIRSTOBS=1 and OBS=MAX.

To go back to reading the entire data set, use:

OPTIONS FIRSTOBS=1 OBS=MAX;
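A common point of confusion is that OBS= names the last observation number, not a count of observations. The Python model below (not SAS code; `select_obs` is a hypothetical name, and OBS=MAX is represented as `None`) makes the 1-based, inclusive window explicit.

```python
def select_obs(rows, firstobs=1, obs=None):
    """Model of FIRSTOBS=firstobs OBS=obs: 1-based and inclusive.
    obs=None stands for OBS=MAX."""
    last = len(rows) if obs is None else min(obs, len(rows))
    return rows[firstobs - 1:last]


rows = list(range(1, 21))                  # hypothetical observations 1..20
print(select_obs(rows, firstobs=5, obs=15))  # observations 5 through 15
print(len(select_obs(rows)))                 # 20: FIRSTOBS=1 OBS=MAX reads all
```

FIRSTOBS=5 OBS=15 therefore yields 11 observations, not 15.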

Global statements vs. local statements

SAS defines some statements, such as the LIBNAME and OPTIONS statements, as global statements. A global statement takes effect as soon as it is executed and stays in effect until it is overridden by a later statement in the same SAS session.

Most SAS statements are local, meaning they take effect only in the step where they appear. If the same setting is specified both in a global statement and in a local statement, the local setting overrides the global one for that step only; afterwards the global setting applies again.

Exercise

Write a program to

(1) Read and print the SAS data set Admit using the following options:

PAGENO=1, FIRSTOBS=5, and OBS=15

(2) Add another OPTIONS statement to the program with the options:

FIRSTOBS=3 and OBS=8

and print the data set Admit again.

Observe the output and make sure you understand why you get it.

(3) Reset the options to PAGENO=1, FIRSTOBS=1, and OBS=MAX, then print the Admit data.

(4) Save the program as C2_sysoptions2 in your SASEx folder.

Answer

Libname mylib 'c:\math707\sasdata';

Options pageno=1 firstobs=5 obs=15;

Data admitn; set mylib.admit; run;

Proc print data=admitn; run;

Proc print data = mylib.admit; run;

Options firstobs=3 obs=8;

Proc print data = admitn; run;

Proc print data = mylib.admit; run;

Options firstobs=1 obs=max pageno=1;

Proc print data=admitn; run;

Proc print data = mylib.admit; run;
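To see why the two data sets print differently, it helps to trace the windows by hand. The Python model below is not SAS code: it assumes, hypothetically, that Admit has 20 observations labelled 1 to 20, and `select_obs` is the same 1-based, inclusive FIRSTOBS/OBS model used for illustration earlier. The key point is that the DATA step already applied FIRSTOBS=5 OBS=15, so admitn holds only 11 observations, and later windows are applied to those 11.

```python
def select_obs(rows, firstobs=1, obs=None):
    """FIRSTOBS/OBS window: 1-based, inclusive; obs=None stands for OBS=MAX."""
    last = len(rows) if obs is None else min(obs, len(rows))
    return rows[firstobs - 1:last]


admit = list(range(1, 21))   # hypothetical: Admit has 20 observations, labelled 1..20

# OPTIONS FIRSTOBS=5 OBS=15; the SET statement reads only observations 5-15.
admitn = select_obs(admit, 5, 15)
print(len(admitn))                  # 11
print(select_obs(admitn, 5, 15))    # [9, 10, 11, 12, 13, 14, 15]

# OPTIONS FIRSTOBS=3 OBS=8;
print(select_obs(admitn, 3, 8))     # [7, 8, 9, 10, 11, 12]
print(select_obs(admit, 3, 8))      # [3, 4, 5, 6, 7, 8]

# OPTIONS FIRSTOBS=1 OBS=MAX; admitn still has only the 11 rows it was built with.
print(len(select_obs(admitn)), len(select_obs(admit)))  # 11 20
```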

FIRSTOBS= and OBS= as local options in the PROC PRINT procedure

PROC PRINT is the most common procedure for printing data.

The general syntax is:

PROC PRINT <options>; RUN;

The following example uses local (data set) options in PROC PRINT to specify observations:

PROC PRINT data=mylib.admit (FIRSTOBS=5 OBS=15); RUN;

This prints observations 5 through 15.

More on local options vs. global options in PROC PRINT

OPTIONS FIRSTOBS=10 OBS=18;

/* Uses the global options, since there is no local option */

proc print data = mylib.admit; title 'prints cases 10 to 18'; run;

/* Uses the local option FIRSTOBS=15 and the global option OBS=18 */

PROC PRINT data=mylib.admit (firstobs=15); title 'prints cases 15 to 18'; run;

/* Uses the local options FIRSTOBS=12 and OBS=16,
since local options override global options for that procedure */

PROC PRINT data=mylib.admit (firstobs=12 OBS=16); title 'prints cases 12 to 16'; run;

/* Uses the local options FIRSTOBS=5 and OBS=20,
since local options override global options for that procedure */

PROC PRINT data=mylib.admit (firstobs=5 obs=20); title 'prints cases 5 to 20'; run;
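The precedence rule works option by option: whatever the local (data set) options specify wins for that step, and anything they leave out falls back to the global OPTIONS value. The Python model below (not SAS code; `effective_window` and `GLOBAL` are hypothetical names) reproduces the four PROC PRINT steps above.

```python
def effective_window(global_opts, local_opts):
    """Merge option by option: a local value overrides the global value
    for this one step; options not given locally keep the global value."""
    merged = dict(global_opts)        # copy, so the global settings are unchanged
    merged.update(local_opts)
    return merged["firstobs"], merged["obs"]


GLOBAL = {"firstobs": 10, "obs": 18}  # OPTIONS FIRSTOBS=10 OBS=18;

print(effective_window(GLOBAL, {}))                            # (10, 18)
print(effective_window(GLOBAL, {"firstobs": 15}))              # (15, 18)
print(effective_window(GLOBAL, {"firstobs": 12, "obs": 16}))   # (12, 16)
print(effective_window(GLOBAL, {"firstobs": 5, "obs": 20}))    # (5, 20)
```

Copying the global settings before merging mirrors the fact that a local option does not change the global setting for later steps.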

More System Options

See the SAS Help documentation, and the textbook for a few additional options.

Exercise

Write a SAS program to do the following:

(1) Create the library Mylib to connect to the SASData folder as usual.

(2) Use options: pageno=1 firstobs=5 obs=15

(3) Print data set admit in mylib

(4) Print data set admit using local options (firstobs = 3 obs =12) in proc print statement.

(5) Add system options statement with firstobs =1 and obs =15.

(6) Print data set admit using local options (firstobs = 10 obs =20) in proc print statement.

(7) Add system options statement with firstobs =1 and obs =max.

(8) Print data set admit using local options (firstobs = 3 obs =12) in proc print statement.

Save the program as c2_glob_loc_options in your SASEx folder.

Answer

Libname mylib 'c:\math707\sasdata';

Options pageno=1 firstobs=5 obs=15;

Proc print data = mylib.admit; run;

Proc print data = mylib.admit (firstobs=3 obs=12); run;

Options firstobs=1 obs=15;

Proc print data = mylib.admit (firstobs=10 obs=20); run;

Options firstobs=1 obs=max;

Proc print data = mylib.admit (firstobs=3 obs=12); run;
