UNIX Comes to the Rescue: A Comparison between UNIX SAS® and
PC SAS®
Chii-Dean Lin, San Diego State University, San Diego, CA
Ming Ji, San Diego State University, San Diego, CA
ABSTRACT
Running SAS on a PC and running SAS under UNIX are very similar in general. However, some differences do
exist between PC SAS and UNIX SAS: a SAS program may run smoothly under PC SAS but not under UNIX SAS. In
this paper, we summarize the features that differ between PC SAS and UNIX SAS. In addition, we show a
step-by-step procedure for running a PC-created SAS program on a UNIX server. This paper is intended for
beginners with basic SAS knowledge.
INTRODUCTION
“I really need the SAS results by Friday but my PC and my colleagues’ PCs can only get half of them. What should
I do?” If you have a UNIX account, one suggestion would be to upload the program(s) to UNIX and run them
there. Even if you follow all the necessary rules and procedures for putting SAS programs and data sets on UNIX,
sometimes the program will not run as smoothly as you wish. Generally speaking, PC SAS and UNIX SAS are so
similar that people usually are not aware of any problems until they try to execute a PC SAS program on UNIX (or
vice versa) and run into trouble. In this paper, we compare PC SAS and UNIX SAS and
discuss potential problems when running a PC-created SAS program on a UNIX platform.
Kotler (2003) discussed the Linux/UNIX and X-Windows systems for SAS system administrators who would like to
implement an open environment within their SAS installation. Zhang (2003) showed how one can use ODS, FTP,
DDE, etc. to integrate UNIX and the PC. The focus of this paper is not, however, to show how to integrate
PC and UNIX using SAS/CONNECT, or how to access the interfaces for client/server database systems using
SAS/ACCESS. Instead, we emphasize running a SAS program on either PC or UNIX in a more traditional way: write
a program, submit it, and get results.
PCs are more powerful and faster than before. Even so, we still receive requests for help from students asking how
to run programs such as SAS or S-plus on UNIX because of time limits or memory problems on a PC. It is clear that
the UNIX platform still holds an edge over the PC platform for running large programs. In this paper, we first
compare the major differences between PC SAS and UNIX SAS. This gives readers a quick snapshot of how SAS
functions under these two systems. After giving a general idea of how SAS behaves under PC and UNIX, we
illustrate some major issues in detail and provide simple programs for comparison purposes. A step-by-step
procedure is provided on how one can run a PC-created SAS program using UNIX batch mode and how one can
edit the program on UNIX. The procedure gives readers a guideline for moving a SAS program from PC SAS to UNIX SAS.
Finally, we conclude with a brief discussion.
A COMPARISON BETWEEN PC SAS AND UNIX SAS
We start with a simple comparison of general issues between PC SAS and UNIX SAS, summarized in the table
below. Sample SAS programs that show the differences are given in the next section. Note that some of the
comparisons are based on the windowing environment in PC SAS and on batch mode in UNIX SAS. These two
modes are our major concern because the windowing environment is the mode used most often on a PC, while
background batch mode under UNIX is an alternative for running a long-processing SAS program. Some of the
differences listed below may therefore be due to the running mode (windowing environment versus batch mode)
rather than to the PC SAS or UNIX SAS environment itself.
Server: PC-SAS versus UNIX-SAS

Invoking a SAS session
  PC-SAS:
    -Windowing environment mode.
    -Batch mode.
  UNIX-SAS:
    -Display manager mode (if connecting from a PC, one needs an X Window manager such as XWIN32 to run the
     Display manager mode).
    -Non-interactive mode (batch mode).
    -Line command mode.

Terminating a SAS process
  PC-SAS:
    -Windowing environment mode: use the BREAK icon (circled exclamation point (!)) or CTRL+BREAK.
    -Batch mode: click on the cancel icon.
  UNIX-SAS:
    -All modes: use kill PID at the UNIX prompt (see next section for a detailed description).

Case sensitive?
  PC-SAS: No.
  UNIX-SAS: The UNIX environment is case sensitive but the SAS session is not. File directories and external
    file names called within SAS are case sensitive.

Output & Log file
  PC-SAS:
    -Windowing environment mode: accumulate in the Output Window and the Log Window.
    -Batch mode: the existing .lst and .log files are overwritten when the SAS code is rerun.
  UNIX-SAS:
    -Display manager mode: same as PC-SAS.
    -Batch mode: the existing .lst and .log files are overwritten when the SAS code is rerun, unless
     redirected to different files.

Executing system commands within SAS?
  PC-SAS: Yes.
  UNIX-SAS: Yes.

Running multiple jobs?
  PC-SAS: Yes. Not efficient.
  UNIX-SAS: Yes. Batch mode submission allows us to submit many jobs simultaneously.

Line width restriction when creating a SAS code?
  PC-SAS: No.
  UNIX-SAS: Yes. Lines longer than 135 columns are automatically moved to the next line. (See the
    example below.)

The submitted code will remain in the Program Editor under the Display manager mode?
  PC-SAS: Windowing environment mode: the SAS code remains in the Program Editor window after a submission.
  UNIX-SAS: Display manager mode: the SAS code disappears from the Program Editor window after a
    submission. Use RUN --> RECALL LAST SUBMIT to recall the submitted SAS code.

Methods for submitting a batch mode job
  PC-SAS: Right click on the SAS program file and select the batch submit icon, or drag the SAS program file
    to the SAS shortcut icon.
  UNIX-SAS: At the UNIX prompt, type sas filename.sas &, where filename.sas is the SAS code you would
    like to run.

Page break effect when exporting output files to other text editing software such as Microsoft Word
  PC-SAS:
    -Windowing environment mode: no effect (save from Output Window).
    -Batch mode: the page break effect exists (export from .lst file).
  UNIX-SAS:
    -Windowing environment mode: no effect (save from Output Window).
    -Batch mode: the page break effect exists (export from .lst file).

The above table provides a quick comparison between PC SAS and UNIX SAS. In this paper, we focus our
discussion on running programs in the windowing environment on a PC and in background batch mode on UNIX. The
following simple SAS program assigns a new variable based on two existing variables, first in a DATA step and
then in PROC IML. Note that the assignment of z was written as one long line in the SAS Program Editor window;
that is, there are no line breaks as shown here. The same SAS program was uploaded and run under a UNIX
environment. Partial log files of the submitted program under PC SAS and under UNIX SAS
are shown below. There is no error when the program runs under PC SAS, but a syntax error
message is issued when the same program runs under UNIX SAS. The same phenomenon occurs in both the
DATA step and PROC IML.
data test;
x = 5;
y = 10;
z = x + x + y + x * y + x - y + x + x + x + x + y + y/x + x*y + 3 + x + x + y + x
+y +x + x + y + x * y + x - y + x + x + x + x + y + y/x + x*y + 3 + x + x + y + x
+y +x + x + y + x * y + x - y + x + x + x + x + y + y/x + x*y + 3 + x + x + y + x
+y +x + x + y + x * y + x - y + x + x + x + x +y;
run;
proc print;
run;
proc iml;
x = 5;
y = 10;
z = x + x + y + x * y + x - y + x + x + x + x + y + y/x + x*y + 3 + x + x + y + x
+y +x + x + y + x * y + x - y + x + x + x + x + y + y/x + x*y + 3 + x + x + y + x
+y +x + x + y + x * y + x - y + x + x + x + x + y + y/x + x*y + 3 + x + x + y + x
+y +x + x + y + x * y + x - y + x + x + x + x +y;
print x y z;
quit;
run;
A partial log file using PC SAS shows no error.
NOTE: SAS initialization used:
      real time           1.90 seconds
      cpu time            1.85 seconds
1    data test;
2    x = 5;
3    y = 10;
4
5    z = x + x + y + x * y + x - y + x + x + x + x + y + y/x + x*y + 3 + x + x + y + x +y +x + x + y
5  ! + x * y + x - y + x + x + x + x + y + y/x + x*y + 3 + x + x + y + x +y +x + x + y + x * y + x
5  ! - y + x + x + x + x + y + y/x + x*y + 3 + x + x + y + x +y +x + x
6    + y + x * y + x - y + x + x + x + x +y;
7    run;
NOTE: The data set WORK.TEST has 1 observation and 3 variables.
NOTE: DATA statement used:
      real time           0.84 seconds
      cpu time            0.09 seconds
If we upload the same SAS program and run it on a UNIX platform, a warning message and an error are shown. A
partial log file is listed:
1    data test;
2    x = 5;
3    y = 10;
4
WARNING: Truncated record.
5    z = x + x + y + x * y + x - y + x + x + x + x + y + y/x + x*y + 3 + x + x + y + x +y +x + x + y + x * y + x - y + x + x +
5  ! x + x + y + y/x + x*y + 3 + x + x + y + x +y +x + x + y + x * y + x - y + x + x + x + x + y + y/x + x*y + 3 + x + x + y
5  ! + x +y +x + x
6    run;;
     ___
     22
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, *, **, +,
              -, /, <, <=, <>, =, >, ><, >=, AND, EQ, GE, GT,
              LE, LT, MAX, MIN, NE, NG, NL, OR, ^=, |, ||, ~=.
7    run;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: SAS set option OBS=0 and will continue to check statements. This may cause
NOTE: No observations in data set.
WARNING: The data set WORK.TEST may be incomplete. When this step was stopped
         there were 0 observations and 4 variables.
NOTE: DATA statement used:
      real time           0.46 seconds
      cpu time            0.03 seconds
7  !
The difference is that the PC SAS Program Editor has no column limit, while the UNIX SAS Program Editor
automatically moves text to the next line once the entered text reaches 135 columns. A program typed in the
UNIX SAS Program Editor therefore never contains an over-long line. When a SAS program created on a PC is
uploaded to a UNIX platform, however, any text beyond the UNIX SAS column limit is truncated, and the
truncated statement generates an error message. For a UNIX batch mode submission, the truncation occurs
at column 256. One way to avoid this problem is to break each long line into several lines when creating the
SAS program, so that every line is shorter than 135 columns.
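Before uploading, it can help to check a program for lines that exceed these limits. The following is a minimal sketch, assuming a POSIX shell with awk available; long_lines is a hypothetical helper name introduced here for illustration, not a SAS or UNIX command. It reports the line number and length of every line longer than a given limit (135 by default, the Program Editor limit quoted above).

```shell
# long_lines FILE [LIMIT]
# Print "line-number: length" for every line of FILE that is longer than
# LIMIT characters. LIMIT defaults to 135 (assumed here from the text above).
# long_lines is a hypothetical helper, not part of SAS or UNIX.
long_lines() {
    awk -v limit="${2:-135}" 'length($0) > limit { print NR ": " length($0) }' "$1"
}
```

For example, long_lines sas1.sas lists the lines that should be split, and long_lines sas1.sas 256 lists only those that would be truncated in a batch submission. No output means that no line exceeds the limit.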
RUNNING A PC CREATED SAS CODE ON UNIX
Assume we have created a SAS program under a PC environment but think that the UNIX batch mode is more
appropriate. To run this PC-created SAS program under UNIX, we need to upload it along with any
associated raw data sets to UNIX. One easy way is to use FTP.
FTP FILES TO A UNIX PLATFORM
To upload the SAS program and raw data sets to UNIX, one can use FTP. Under the Windows operating system,
select Start, then Run, and type ftp sciences.sdsu.edu, replacing sciences.sdsu.edu with the UNIX host to
which you wish to upload. After typing in the user ID and password of your UNIX account, use put file.sas to
upload file.sas to the UNIX platform. Before transferring the file, change your local directory to where
file.sas is located: the FTP command lcd changes the local directory and, similarly, cd changes the
remote host directory. Because the FTP prompt does not show your current location, FTP is not easy to use
for the first time. Typing help at the FTP prompt shows the set of available commands. A better alternative for
transferring files to a UNIX platform is the freeware WS_FTP LE, which is available online. After
installation, clicking the WS_FTP LE icon brings up a dialog box. The dialog box is shown
below. You can create new profiles that store the host name, startup directories, etc. In this dialog box, you can
change the local or remote directory after a connection, select files to upload or download, and change the
transfer format between ascii and binary. One worthwhile note is that if you edit files after opening the
WS_FTP LE application, you have to press the refresh button to update the listed files.
CONVERT PC TEXT FILES FOR UNIX
The DOS (including Microsoft Windows) and UNIX operating systems store text files in different formats. DOS
places a carriage return and a line feed character at the end of each line, while UNIX places only a line feed at
the end of each line. Some UNIX applications do not recognize the carriage return character and show it as ^M.
This causes a problem when SAS tries to read in raw data. For example, the simple SAS program below was
uploaded to a UNIX platform using binary format (the default format under WS_FTP LE). Note that in the output
the values of variable y are all missing while the original values are not. On the other hand, the character
variable z reads the values correctly.
There are several easy ways to avoid this uploading problem between PC and UNIX. When uploading the SAS
program and raw data using FTP, select ascii format. An alternative is to run the dos2unix command at a UNIX
prompt by entering dos2unix filename1.sas filename2.sas, where filename1.sas is the original PC-created SAS
file and filename2.sas is the newly converted SAS file. Note that you can use the same filename to overwrite
the existing one.
options ls = 70 ps = 70;
data a;
input x y;
cards;
1 5
2 6
3 7
;
proc print; run;
data b;
input z $;
cards;
5
6
7
;
proc print; run;
If we upload and run this code under UNIX SAS, the output file is shown below:
The SAS System                                        1

Obs    x    y
  1    1    .
  2    2    .
  3    3    .

The SAS System                                        2

Obs    z
  1    5
  2    6
  3    7
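Whether an uploaded file still carries DOS line endings can be checked directly at the UNIX prompt. The sketch below assumes a POSIX shell with grep and tr; has_cr and strip_cr are hypothetical helper names used only for illustration. A plausible reading of the output above is that the trailing carriage return becomes part of the last field on each line, which is not a valid number for y but is still accepted as character data for z. Note also that dos2unix syntax differs between systems: on many Linux systems it converts each named file in place and needs the -n option to write a separate output file, so the plain tr command is used here instead.

```shell
# has_cr FILE
# Succeed (exit status 0) if FILE contains at least one carriage return.
# Hypothetical helper, not a standard command.
has_cr() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# strip_cr INFILE OUTFILE
# Copy INFILE to OUTFILE with every carriage return removed.
# OUTFILE must differ from INFILE, otherwise the input is emptied
# before it is read.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}
```

A typical use would be: if has_cr sas1.sas; then strip_cr sas1.sas sas1_unix.sas; fi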
RUNNING SAS IN BATCH MODE
After converting the SAS program and raw data sets to UNIX format, we can run the program in background batch
mode. An advantage of doing so is that it frees up the X-terminal for other jobs. You can even log off the
terminal while the SAS program keeps running in the background. To view the status of the running SAS job, you
can use the top command to see how much CPU the job consumes and how long it has been running. To do so, type
top at a UNIX prompt. When you are done viewing, simply press q to get back to the UNIX
prompt. To see the process ID (PID) assigned to the running program, type ps at the UNIX prompt. The command
ps will show the following job description. The PID for the running SAS job is 6224 and the cumulative running
time is 4:06. Use kill -9 6224 if you want to terminate the running process.
The top command provides more information than the ps command. The screen looks like the following. The PID
6224 consumed about 24.98% of the CPU during the time we browsed it. The cumulative running time is 2:31 and
the NICE setting is 0. Depending on the UNIX server regulations, sometimes you need to change the NICE setting
to a lower priority. You can use /usr/bin/nice -20 sas file.sas & to change the nice setting to 20. Note
that some UNIX systems automatically kill a job that runs longer than a specific time under the regular
priority (NICE = 0). Consult your system administrator for more information. If few other processes are
running when you submit your job, the NICE setting will not affect performance, since the system will
allocate all available resources to the job you submitted.
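The submission and monitoring steps above can be wrapped in two small shell functions. This is a minimal sketch under stated assumptions: submit_low_priority and is_running are hypothetical helper names, nice -n 19 is the POSIX spelling of a large priority decrease (the paper's nice -20 is the older form), nohup is added on the assumption that the job should survive a logout, and standard output is discarded on the assumption that a batch SAS run writes its own .log and .lst files.

```shell
# submit_low_priority COMMAND [ARGS...]
# Start COMMAND in the background at low scheduling priority, protected
# from the hangup signal sent at logout, and print its process ID (PID).
# Hypothetical helper; terminal output of COMMAND is discarded.
submit_low_priority() {
    nohup nice -n 19 "$@" > /dev/null 2>&1 &
    echo $!
}

# is_running PID
# Succeed (exit status 0) while a process with this PID still exists.
is_running() {
    kill -0 "$1" 2> /dev/null
}
```

For example, pid=$(submit_low_priority sas file.sas) starts the job and records its PID, is_running "$pid" reports whether it is still active, and kill -9 "$pid" terminates it as described above.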
VIEWING THE OUTPUT AND LOG FILE AND EDITING THE SAS CODE
When the SAS batch mode process is done, we can use several UNIX commands to view the output file or log file.
One easy way is the UNIX more command: more test.log shows the contents of test.log. Press the space bar to
scroll to the next page and q to end the display. Text editors such as pico or emacs are alternative ways of
viewing the output file or log file. If there is a need to modify the SAS program, you can use the text editors
mentioned above to edit it and then resubmit it. Recall that the output file and log file will be replaced when
you resubmit the program. Use cp test.lst test_old.lst to keep the old output file if you wish.
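For a long log file, paging through the whole file with more can be slow. As a minimal sketch, assuming a POSIX shell with grep, the hypothetical helper log_problems below lists only the lines that begin with ERROR or WARNING, which is how such messages appear in the log excerpts shown earlier in this paper.

```shell
# log_problems LOGFILE
# Print, prefixed with line numbers, every line of LOGFILE that begins
# with ERROR or WARNING. The exit status follows grep: 0 if at least one
# such line was found, 1 if none was found.
# Hypothetical helper, not a SAS or UNIX command.
log_problems() {
    grep -n -E '^(ERROR|WARNING)' "$1"
}
```

For example, log_problems test.log prints nothing when the log contains no such lines; the reported line numbers can then be used to jump to the relevant part of the log in pico or emacs.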
TIPS FOR RUNNING A LONG PROGRAM
Some UNIX systems may have a running time restriction under the normal running priority. You may need to lower
the running priority to be allowed a longer processing time. As we mentioned above, you can use nice -20 sas
file.sas & to change the priority. Consult your system administrator about any restrictions. Most UNIX
accounts have a quota limit. You can check your quota using quota -v. If the output generated by your SAS
program exceeds the quota, the process may be terminated without any notice. If you anticipate a large output
file that you need to save only temporarily, you can store it in the /tmp directory.
File permission on UNIX is another feature that the PC does not have. Since a UNIX server allows multiple users
to work on it, the permissions set on each file prevent other users from modifying a file that belongs to you.
To check the permission status of a file, you can use ls -l to list the files in the current directory.
As we mentioned, the ability to run a long-processing program is a key advantage of a UNIX server over a PC
client. However, you still need to estimate how long a process will run on the UNIX server. An estimated running
time for a SAS program allows you to anticipate approximately when you will get your output. If a program would
run forever or unreasonably long, you can modify your code so that it finishes within a reasonable time period.
Another reason for moving a SAS job from PC SAS to UNIX SAS is the out of memory problem. If the out of
memory error still shows under UNIX, you can use the -memsize 0M option to raise the memory limit to all
available memory when UNIX SAS processes your job. Recall that you may want to change your process to a lower
priority using nice, since the -memsize 0M option may slow down other jobs dramatically.
A STEP-BY-STEP PROCEDURE
In this section, we provide a step-by-step procedure that summarizes the above features for readers to
follow. This procedure is used to upload a PC-created SAS program to a UNIX platform and run it in UNIX SAS
batch mode.
1. FTP files (including programs and data sets) to the UNIX system where you want to run your SAS program.
Check the conversion status from PC to UNIX. We recommend using WS_FTP LE for transferring files.
2. From your PC, log on to the UNIX machine using telnet. (Select Start, Run, then type telnet
sciences.sdsu.edu, where sciences.sdsu.edu is the UNIX server you wish to log in to.)
3. Remove the Windows carriage returns (^M) using dos2unix sas1.sas sas1.sas at the UNIX prompt,
where sas1.sas is the SAS program you uploaded to UNIX. Repeat the same procedure for all programs and
data sets.
4. Submit the SAS program in background batch mode. Use sas sas1.sas & for a background batch
mode submission. If the UNIX system you are using requires lower priority for a long-running
process, use nice to change the priority (consult your system administrator about any restrictions). For
a lower priority submission, use /usr/bin/nice -20 sas sas1.sas &.
5. Check the status of your running program using either top or ps under UNIX.
6. Check the log file (sas1.log) using either pico or emacs or simply the more UNIX command to see if
there is any error message.
7. Use any text editor, such as pico or emacs, to edit the SAS program, and resubmit it.
8. If you are uncomfortable editing your SAS program under UNIX, you can download the program to
your PC, do the editing there, and FTP it back to UNIX.
9. If you have a very large output file or log file that you want to save temporarily, you can use the temporary
directory /tmp.
10. If there is an out of memory message, you can change the memory size by
/usr/bin/nice -20 sas -memsize 0 sas1.sas &
to increase the memory size to all available memory.
CONCLUSION
Personal computers are more powerful nowadays. However, some limitations still exist. In this
paper, we compare PC SAS and UNIX SAS and summarize the actions needed to avoid errors when running a
PC-created SAS program on a UNIX platform. We also describe problems that arise when uploading a PC
SAS program to a UNIX environment. A step-by-step procedure is provided for users who want to see at a quick
glance how to run a PC-created SAS program on UNIX. Note that many online documents are available on basic
UNIX commands, how to use pico and emacs, how to run ftp, etc. Also, UNIX servers differ in their settings
and regulations. You should consult your UNIX system administrator for more information.
REFERENCES
Gady Kotler, SAS, Linux/UNIX and X-WINDOWS systems. Paper 283-28. SUGI 28 Proceedings. SAS Institute
Inc., 2003, Cary, NC.
SAS Institute Inc., SAS Companion for the Microsoft Windows Environment, Version 8. SAS Institute Inc., 2000,
Cary, NC.
SAS Institute Inc., SAS Companion for UNIX Environments, Version 8. SAS Institute Inc., 2000, Cary, NC.
Yadong Zhang, UNIX Meet PC: Version 8 to The Rescue. Paper 39-28. SUGI 28 Proceedings. SAS Institute Inc.,
2003, Cary, NC.
ACKNOWLEDGMENTS
The first author’s work was supported in part by the Biological and Environmental Research Program (BER), U.S.
Department of Energy, through the Great Plains Regional Center of the National Institute for Global Environmental
Change (NIGEC) under Cooperative Agreement No. DE-FC02-03ER63616.
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Chii-Dean Lin
San Diego State University
5500 Campanile Dr.
San Diego, CA 91913
(619)594-6186
Email: cdlin@sciences.sdsu.edu
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration.