Kaz SAS
Kaz’s SAS manual
To liberate Research Assistants of the World
Version 11/6/2004
by Kazuaki Uekawa, Ph.D.
kuekawa@alumni.uchicago.edu
Copyright © 2002 By Kazuaki Uekawa All rights reserved.
Profile:
Kazuaki (Kaz) Uekawa, Ph.D.
I am from Japan, but I have been in the US for about ten years. In 2000 I got my doctorate in Sociology at
the University of Chicago. While working for a research project led by Charles Bidwell and Anthony Bryk,
I learned SAS. Currently I am a research analyst at AIR, American Institutes for Research, located in
Washington DC.
I am also a professional writer with a fairly large audience. I design and write essays that are used in the ESL
industry in Japan, i.e., by companies in the business of testing students' English competency. My favorite essay,
used as material that lets people practice reading English, is about how Japanese boys collect beetles and
exchange them among themselves just like baseball cards in America. I also wrote about how Japanese,
Americans, and Mexicans have different techniques to cure common colds. I practice what literary theorists
call “deconstruction,” which is to doubt what is taken for granted in a culture and show how strange and
arbitrary the cultural practice may appear to people outside the culture.
On weekends I am writing a book on English pronunciation. I discovered techniques that let Japanese
people pronounce English phonemes correctly at their first attempt—without any practice. I have decided
that the linguistic theory claiming that adult learners of a foreign language cannot pronounce things correctly is
just an excuse. I plan to rock the non-English-speaking world with my book as soon as I can in 2005.
I believe this will be bigger than the Beatles' revolution for the impact it will have on the
non-English-speaking communities of this globe. Immediately after Japanese people read my book, they
will be able to tell the differences between “wonder” and “wander” or “lice” and “rice.”
Table of Contents
I. Basic Operations
1. Ask questions to SAS by emailing support@sas.com
2. How do I start and what mini-windows do I look at?
3. How do I look at data sets?
4. Assigning library names and creating folders
5. How do we create SAS data?
A) Create a SAS data set using SAS syntax
B) Create a SAS data set via MS-Excel sheets
C) Create a SAS data set via an external text file
6. Examples of data steps
7. Manipulating variables in data steps
8. Lots of manipulation techniques to be used in a data step
9. Using character functions to create new variables
10. Application: How do we restrict analytical samples using the NMISS function?
II. Procedures
11. PROC CONTENTS: Description of contents
12. PROC PRINT: See data
13. PROC SORT: Sorting observations based on the value of a variable
14. PROC MEANS: Get descriptive statistics (Mean, STD, Min, Max)
15. PROC FREQ: Get frequencies
16. PROC UNIVARIATE: Get elaborate statistics and a univariate plot
17. PROC PLOT: Plotting two variables
18. PROC TIMEPLOT: Time plot
19. PROC CORR: Correlation
20. PROC REG: OLS regression
21. PROC LOGISTIC: Logistic regression
22. Make an ASCII file
III. More Procedures
23. PROC STANDARD: Standardize values
24. PROC RANK: Rank observations
25. PROC SQL: Creating group-level mean variables
26. PROC IMPORT
IV. Merging Data Sets
V. MACROs
27. Typical macro: I use this most often
28. LET macro: looks useful, and it is useful, but in a limited way
VI. ODS and PROC EXPORT
VII. Application: Do PROC MEANS and save results as an Excel sheet using ODS
VIII. Application: Read from many tables embedded within Excel sheets
I. Basic Operations
1. Ask questions to SAS by emailing support@sas.com
When you have a question about SAS, you can email SAS Institute's technical support team. The address is
support@sas.com. At the beginning of your email, paste the information that appears at the head of
your log file. The log file is what you get when you run SAS. It looks like this:
NOTE: Copyright (c) 1999-2001 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software Release 8.2 (TS2M0)
Licensed to UNIVERSITY OF XXXXX, Site XXXXX.
NOTE: This session is executing on the WIN_ME platform.
I developed my SAS skills mostly by communicating with the SAS tech support team.
I often use Google to get answers to my questions.
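If you no longer have the log header at hand, a minimal sketch for writing similar information to the log is below. It uses PROC SETINIT, which lists your site's license information, and the automatic macro variables SYSVER (SAS release), SYSSCP (operating system), and SYSSITE (site number).

```sas
/* Write licensing information (site name, site number, licensed products) to the log */
proc setinit;
run;

/* Automatic macro variables that SAS fills in for every session */
%put SAS release: &sysver;
%put Platform:    &sysscp;
%put Site number: &syssite;
```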
2. How do I start and what mini-windows do I look at?
In Windows, you can start SAS by going to Start > All Programs > The SAS System. Confirm that
you get three windows:
1. Editor window. This is where you write your syntax.
2. Log window. This window shows your errors.
3. Output window. You get results in this window.
In the toolbar, click the running-man icon to run your program, and click the "!" button to cancel
a program while it is running. Click Explorer to look at the data sets (see the next section).
3. How do I look at data sets?
This syntax (type it into the Editor window) gives you an example data set to look at.
data abcd;
set sashelp.Prdsale;
run;
You can look at the data set in a viewer window if you follow these four steps:
1. Click Explorer.
2. Click Libraries.
3. Click Work (or another folder).
4. Click the data set.
Notes: I look at the data sets to check whether there is anything wrong with them. Look closely for any
irregularity in the data. You must close the data set's viewer window before you run anything else if
the syntax you wrote affects that data set.
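If you would rather inspect the data from syntax than click through the Explorer, a minimal alternative is to print just the first few observations of ABCD to the Output window. The OBS= data set option limits how many rows are read.

```sas
data abcd;
  set sashelp.Prdsale;
run;

/* OBS=10 reads only the first 10 observations, so the Output window stays readable */
proc print data=abcd (obs=10);
run;
```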
4. Assigning library names and creating folders
You need a LIBNAME statement at the head of your SAS programs. With it, you assign nicknames
(library names) to the folders that host your SAS data sets. For example:
libname here "C:\TEMP";
libname there "C:\";
Running the above creates two folders, “here” and “there,” under Libraries in the Explorer view, as you see in the
picture below (see the previous section for how to get to this view).
Imagine you want a data set called MYDATA in C:\TEMP.
You can create it this way:
libname here "C:\TEMP";
data here.MYDATA;
X=1;
run;
This silly data set has one observation and one variable, X, whose value is 1.
Because you decided to call that folder by the nickname HERE, you
will refer to the data set as “here.MYDATA.”
To print the contents of that data set, do this:
proc print data=here.MYDATA;
run;
To see what variables are in the data, do this:
proc contents data=here.MYDATA;
run;
What are the other folders? Sashelp hosts lots of data sets that SAS Institute
ships with the SAS software for demonstration purposes. I have never opened
Sasuser or Maps.
“Work” hosts temporary data sets that you create as you
program in SAS. Temporary data sets disappear when you close your SAS
session. Permanent data sets, on the other hand, are the data sets you
create to keep even after you quit SAS. The next example elaborates on these things.
Here is some silly example syntax to show you what the folders do and
what temporary and permanent data sets are. After running it, click on the WORK, HERE, and THERE
folders in the Explorer to find three different “Wally” data sets.
/*libname statements just need to occur at the beginning
of the syntax file*/
libname here "C:\TEMP";
libname there "C:\";
/*this creates a data set called Wally in the WORK folder*/
data Wally;
x=1;
y=2;
z=3;
run;
/*this creates a data set called Wally in the HERE folder*/
data here.Wally;
x=4;
y=5;
z=6;
run;
/*this creates a data set called Wally in the THERE folder*/
data there.Wally;
x=7;
y=8;
z=9;
run;
/*Use proc print to see the content of each data set*/
proc print data=work.Wally;
run;
proc print data=here.Wally;
run;
proc print data=there.Wally;
run;
The following does the same as proc print data=work.Wally:
proc print data=Wally;
run;
(“work.” can be omitted in this way. I always omit it.)
When DATA= is not specified, SAS uses the data set created most recently, which in this program is THERE.Wally:
proc print;
run;
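To list every data set stored in a library without opening the Explorer, a minimal sketch is PROC CONTENTS with the special member name _ALL_. The NODS option suppresses the per-data-set details and keeps only the directory listing. It assumes the HERE library is assigned as in the example above.

```sas
libname here "C:\TEMP";

/* _ALL_ lists every member of the library; NODS skips the variable-level details */
proc contents data=here._all_ nods;
run;
```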
5. How do we create SAS data?
A) Create a SAS data set using SAS syntax
Of course you can create data in your syntax.
libname here "C:\";
data kaz;
input ID 1 SEX $ 4-9 height 13-15 ;
cards;
1  Male     170
2  Female   165
3  Male
4  Male     168
5  Female   170
;
run;
proc print;
run;
The third student's height is missing. When a value is missing, it is safe to enter a dot instead of leaving it
empty as above. But leaving it empty is also okay here, because the INPUT statement explicitly tells SAS where
to find the values for each variable (e.g., height 13-15).
After creating a data set, look at it to check whether there is anything wrong. Because this is a small data set,
you can use PROC PRINT to print it in your Output window. The other useful way is to click on the actual
SAS data set to see its content, as explained earlier.
B) Create a SAS data set via MS-Excel sheets
Prepare an Excel sheet whose first row holds the variable names. Then
use this syntax to import the Excel sheet
(C:\mary.xls) as a SAS data set (JOHN):
PROC IMPORT OUT= JOHN
DATAFILE= "C:\mary.xls"
DBMS=EXCEL2000 REPLACE;
RUN;
/*This one does not read variable names from the first row
(SAS gives the columns default names such as F1, F2).
It also specifies the sheet from
which to take data*/
PROC IMPORT OUT= JOHN
DATAFILE= "C:\mary.xls"
DBMS=EXCEL2000 REPLACE;
GETNAMES=NO;
SHEET="Sheet1";
RUN;
Be sure to close the Excel file before you run the syntax to import it. Otherwise, you get this
error message:
ERROR: File _IMEX_.'Sheet1$'n.DATA does not exist.
ERROR: Import unsuccessful. See SAS Log for details.
NOTE: The SAS System stopped processing this step because of errors.
C) Create a SAS data set via an external text file
Imagine you have a text file (say, kaz.txt) in your C:\TEMP folder, laid out in fixed columns like the data in A):
ID in column 1, SEX in columns 4-9, and height in columns 13-15.
It's okay for a value to be missing, though a dot “.” is often used to indicate a missing value.
It is safer that way.
If you know exactly where the data points are in each line, you can indicate the locations this way:
data kaz;
infile "C:\TEMP\kaz.txt" truncover; /*TRUNCOVER: a short line gives missing values instead of a jump to the next line*/
input ID 1 SEX $ 4-9 height 13-15 ;
run;
proc print;
run;
$ indicates that SEX is a character variable. SAS always needs to know whether a variable is character or numeric.
If a character variable is just one word (e.g., Male), then we don't really need to tell SAS the exact locations.
SAS will treat each blank-separated block of words or numbers as one value. But you need to add MISSOVER, so that
when SAS does not find a value at the expected place (as in the third observation in this data set), it treats it
as a missing value instead of reading on into the next line. MISSOVER only helps when the missing values are at
the end of a line; a value missing in the middle of a line must be marked with a dot. If a character variable
contains more than one word, use the method above instead of the one below.
libname here "C:\TEMP";
data kaz;
infile "C:\TEMP\kaz.txt" missover; /*missover: when data are missing, SAS treats them as missing values*/
input ID SEX $ height ;
run;
proc print;
run;
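If you do not have kaz.txt yet, one way to produce a test file is to write the KAZ data set from section A) out with FILE and PUT statements. This sketch assumes that data set still exists in WORK and that C:\TEMP exists. With list output, PUT separates values with blanks and writes a missing number as a dot, so the file follows the safer convention described above; it is not in fixed columns, so it suits the second (list input) example rather than the first.

```sas
/* Assumes the KAZ data set created in section A) is still in WORK */
data _null_;
  set kaz;
  file "C:\TEMP\kaz.txt";
  put ID SEX height;   /* list output: blank-separated values, missing height written as "." */
run;

/* Read the file back with list input; the dot is read as a missing value */
data kaz_check;
  infile "C:\TEMP\kaz.txt" missover;
  input ID SEX $ height;
run;

proc print data=kaz_check;
run;
```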
Data Steps and Creating New Variables
6. Examples of data steps
Any SAS program consists of two kinds of elements: DATA steps and PROCs (such as PROC PRINT or PROC MEANS).
I discuss data steps in this chapter and show you some variations of them, so you understand them by example.
libname here "C:\TEMP";
libname there "C:\";
I am creating a new temporary data set XYZ (to be found in the WORK folder) based on an already existing
temporary data set called ABC (found in the WORK folder).
data xyz;
set abc;
/*here manipulation of data */
run;
I am creating a new temporary data set ABC (to be found in the WORK folder) based on an already existing
temporary data set called ABC (found in the WORK folder). The old ABC will be overwritten by the new ABC.
This is perfectly okay.
data abc;
set abc;
/*here manipulation of data */
run;
I am creating a new temporary data set XYZ based on an already existing permanent data set called ABC
(found in the HERE folder, which is C:\TEMP).
data xyz;
set here.abc;
/*here manipulation of data */
run;
I am creating a new permanent data set ABC in the HERE folder (which is C:\TEMP) based on an already
existing temporary data set called XYZ.
data here.abc;
set xyz;
/*here manipulation of data */
run;
I am creating a new permanent data set ABC in the THERE folder (which is C:\) based on an already existing
permanent data set called ABC in the HERE folder (which is C:\TEMP).
data there.abc;
set here.abc;
/*here manipulation of data */
run;
Reminder:
Temporary data sets: Found in the WORK folder. They disappear when a session ends.
Work folder: Click on Explorer > Click on LIBRARIES > Click on WORK
The HERE folder and THERE folder: HERE and THERE are arbitrary names that I
assigned with LIBNAME statements. They refer to the paths that I specified.
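The skeletons above assume that ABC already exists. A minimal runnable version of the same chain is below; it assumes the folder C:\TEMP exists and uses the sample data set SASHELP.CLASS as a starting point. The variable HEIGHT_CM is just an illustration of a manipulation.

```sas
libname here "C:\TEMP";

/* Temporary ABC in WORK, copied from a sample data set shipped with SAS */
data abc;
  set sashelp.class;
run;

/* Temporary -> temporary, with one manipulation */
data xyz;
  set abc;
  height_cm = height * 2.54;   /* inches to centimeters */
run;

/* Temporary -> permanent: C:\TEMP\abc.sas7bdat remains after SAS closes */
data here.abc;
  set xyz;
run;

/* Permanent -> temporary again */
data xyz2;
  set here.abc;
run;
```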
7. Manipulating variables in data steps
We use a SAS sample data set sashelp.Class (a data set called Class stored in SASHELP
folder) to practice creating new variables. Do this to find out what this
data set has:
proc contents data=sashelp.Class;
run;
You get the information below, telling you that the data set has Age, Height, Name,
Sex, and Weight.
#    Variable    Type    Len    Pos
-----------------------------------
3    Age         Num       8      0
4    Height      Num       8      8
1    Name        Char      8     24
2    Sex         Char      1     32
5    Weight      Num       8     16
Here is a sample of how you can work on this data set to create a Body Mass Index (BMI), as well as other useful
variables. You always need a data step to create new variables.
data ABC;
set sashelp.Class;
*Creating a character variable indicating a person's BMI (Body Mass Index) status;
weight_metric=weight*0.45359237; /*pounds to kilograms*/
height_metric=(height*2.54)/100; /*inches to meters*/
BMI=weight_metric/(height_metric**2);
/*Definition of obesity: Underweight = below 18.5, Normal weight = 18.5-24.9,
Overweight = 25-29.9, Obesity = BMI of 30 or greater */
length status $ 15;
/*". <" excludes a missing BMI, because SAS treats a missing value as smaller than any number*/
if . < BMI < 18.5 then status="Underweight";
else if 18.5 <= BMI < 25 then status="Normal";
else if 25 <= BMI < 30 then status="Overweight";
else if BMI >= 30 then status="Obese";
run;
Without the LENGTH statement, SAS would set the length of STATUS to the length of the first value it encounters
in the code, which would be “Underweight” (11 characters) in this case, and any longer value would be cut off.
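To check that the new STATUS variable came out as intended, a quick sketch is to count the categories and then look at BMI next to STATUS for a few students.

```sas
proc freq data=ABC;
  tables status / missing;   /* MISSING also counts observations left without a status */
run;

proc print data=ABC (obs=5);
  var name height weight BMI status;
run;
```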
I have cleaned up this document up to here. I am still working on the rest.
The rest of this manual is based on this data set:
http://www.estat.us/sas/kazclass.txt
Download the digital version of this document and cut and paste the following data. The data come from
TIMSS (Third International Mathematics and Science Study). MAT7 is the 7th graders' and MAT8 is the 8th
graders' national mean mathematics score. NATEXAM is 1 when a nation has a national examination system,
NATTEXT is 1 if a nation decides on textbooks at the national level, and NATSYLB is 1 when a nation
decides on the syllabus at the national level. BLOCK is a geographical area. PROP is the proportion of kids in
middle school.
data kaz;
input acro $ NATION $ 6-14 NAME $ 15-33
      MAT7 MAT8 GNP14 PROP NATEXAM NATSYLB NATTEXT block $;
cards;
aus  Australi Australia           498 529.63 -0.15526 84 0 1 0 ocea
aut  Austria  Austria             509 539.43 -0.29163 100 0 0 1 weuro
bfl  Belgi_FL Belgium (Fl)        558 565.18 -0.25157 100 1 1 0 weuro
bfr  Belgi_FR Belgium (Fr)        507 526.26 -0.25157 100 0 1 0 weuro
can  Canada   Canada              494 527.24 0.07184 88 0 0 0 namer
col  Colombia Colombia            369 384.76 -0.23699 62 0 1 0 samer
cyp  Cyprus   Cyprus              446 473.59 -0.41906 95 0 1 1 seuro
csk  Czech    Czech Republic      523 563.75 -0.34840 86 0 1 0 eeuro
dnk  Denmark  Denmark             465 502.29 -0.34057 100 1 0 0 weuro
fra  France   France              492 537.83 0.55791 100 0 1 0 weuro
deu  Germany  Germany             484 509.16 0.91992 100 0 0 0 weuro
grc  Greece   Greece              440 483.90 -0.32620 99 0 1 1 seuro
hkg  HongKong Hong Kong           564 588.02 -0.31638 98 1 1 1 seasia
hun  Hungary  Hungary             502 537.26 -0.37602 81 0 0 0 eeuro
isl  Iceland  Iceland             459 486.78 -0.42606 100 0 0 0 neuro
irn  Iran     Iran, Islamic Rep.  401 428.33 -0.17095 66 0 1 1 meast
irl  Ireland  Ireland             500 527.40 -0.38919 100 1 1 0 weuro
isr  Israel   Israel              . 521.59 -0.35464 87 0 1 0 meast
jpn  Japan    Japan               571 604.77 1.85543 96 0 1 0 seasia
kor  Korea    Korea               577 607.38 -0.01168 93 0 1 1 seasia
kwt  Kuwait   Kuwait              . 392.18 -0.40359 60 0 1 1 meast
lva  Latvia   Latvia (LSS)        462 493.36 -0.42319 87 0 0 0 eeuro
ltu  Lithuani Lithuania           428 477.23 -0.41785 78 1 1 1 eeuro
nld  Netherla Netherlands         516 540.99 -0.18184 93 1 0 0 weuro
nzl  NewZeala New Zealand         472 507.80 -0.38319 100 1 1 0 ocea
nor  Norway   Norway              461 503.29 -0.35450 100 0 1 1 neuro
prt  Portugal Portugal            423 454.45 -0.32588 81 0 1 0 weuro
rom  Romania  Romania             454 481.55 -0.35396 82 1 1 1 eeuro
rus  RussianF Russian Federation  501 535.47 0.12827 88 1 0 0 eeuro
sco  Scotland Scotland            463 498.46 0.48017 100 0 0 0 weuro
sgp  Singapor Singapore           601 643.30 -0.37279 84 1 1 1 seasia
slv  SlovakRe Slovak Republic     508 547.11 -0.40217 89 0 1 0 eeuro
svn  Slovenia Slovenia            498 540.80 -0.41310 85 0 1 1 eeuro
esp  Spain    Spain               448 487.35 0.03461 100 0 1 1 weuro
swe  Sweden   Sweden              477 518.64 -0.30049 99 0 1 0 neuro
che  Switzerl Switzerland         506 545.44 -0.27916 91 0 0 0 weuro
tha  Thailand Thailand            495 522.37 -0.14533 37 0 1 1 seasia
usa  USA      United States       476 499.76 5.37506 97 0 0 0 namer
;
run;
proc print;
run;
8. Lots of manipulation techniques to be used in a data step
data abc;
set sashelp.Class;
var1=height+weight;
var2=sum(of height weight);
var3=weight-height;
var4=height*weight;
var5=height/weight;
var6=1/(height+weight);
var7=mean(of height weight);
var7B=mean(height, weight);/*this way is okay too*/
var8=max(of height weight);
var9=min(of height weight);
var10=log(height);
var11=abs(var3); /*Absolute values: this takes out negative signs*/
var12=nmiss(of height weight);/*N of missing cases*/
var13=n(of height weight); /*N of nonmissing values*/
run;
proc print;
run;
How is Z=mean(of X1 X2 X3) different from Z=(X1+X2+X3)/3;?
How is Z=sum(of X1 X2 X3) different from Z=X1+X2+X3;?
Functions such as mean(of …) or sum(of …) compute statistics from the nonmissing values only, so they return
a value even when some of the variables in the parentheses are missing. For example, if X1 is missing:
X=mean(of X1 X2 X3); will return the average of X2 and X3, and X=sum(of X1 X2 X3); will return X2+X3.
In contrast,
X=(X1+X2+X3)/3; and X=X1+X2+X3; will both return a missing value, namely, “.”
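A small sketch makes the difference visible. The data set name DEMO and the values are made up for illustration; the log will also show a note that missing values were generated by the arithmetic operators.

```sas
data demo;
  x1 = .;  x2 = 4;  x3 = 6;
  z_mean_fn = mean(of x1 x2 x3);   /* 5: average of the nonmissing x2 and x3 */
  z_mean_op = (x1 + x2 + x3) / 3;  /* .: the + operator propagates missing   */
  z_sum_fn  = sum(of x1 x2 x3);    /* 10: sum of the nonmissing values       */
  z_sum_op  = x1 + x2 + x3;        /* .                                      */
run;

proc print data=demo;
run;
```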
9. Using Character Functions to create new variables
data abc;
set sashelp.Class;
var1=name||sex;
var2=compress(name||sex);/*COMPRESS gets rid of space in between*/
var3=substr(name,1,3);/*take the first 3 letters starting from the first
letter*/
var4=upcase(name);/*upper case*/
run;
proc print;
run;
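Because NAME is stored with trailing blanks (its length is 8), NAME||SEX keeps those blanks in the middle, which is why COMPRESS is useful. A sketch with one hypothetical student shows this; TRIM is added for comparison because it removes only trailing blanks.

```sas
data demo_char;
  length name $ 8 sex $ 1;
  name = "Alfred";  sex = "M";
  var1 = name || sex;               /* "Alfred  M": NAME keeps its 2 trailing blanks */
  var2 = compress(name || sex);     /* "AlfredM": COMPRESS removes every blank       */
  var3 = substr(name, 1, 3);        /* "Alf"                                         */
  var4 = upcase(name);              /* "ALFRED"                                      */
  var5 = trim(name) || "-" || sex;  /* "Alfred-M": TRIM removes trailing blanks only */
run;

proc print data=demo_char;
run;
```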
10. Application: How do we restrict analytical samples using the NMISS function?
When we compare several regression models (e.g., coefficients, R², goodness-of-fit), we want to keep the
number of observations the same across the models. Because predictors may have different patterns of
missing values, this does not happen automatically; you have to make it happen. For example, MAT7, the 7th
graders' mathematics score, includes some missing cases, because some nations only let their 8th graders
participate in this international test.
Use the NMISS function to create a new variable JOHN.
data kaz2;set kaz;
john=nmiss(of GNP14 mat8 mat7);/*this returns the number of missing cases*/
run;
/*check how the data looks like now*/
proc print data=kaz2;
var name gnp14 mat8 mat7 john;
run;
/*Apply OLS regression with cases with perfect data (no missing cases). In this way, model 1 and model 2
will have the same number of cases, or to be more precise, the same data.*/
proc reg data=kaz2;
where john=0; /*Run only when john=0, namely, number of missing cases is 0*/
model mat8=mat7;
model mat8=mat7 gnp14;
run;
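Before restricting the sample, it can help to see how many nations the restriction drops. A minimal sketch, assuming KAZ2 was created as above:

```sas
/* How many nations have 0, 1, 2, ... missing values among GNP14, MAT8, and MAT7? */
proc freq data=kaz2;
  tables john;
run;
```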
II. Procedures
11. PROC CONTENTS: Description of Contents
Data ABC;set sashelp.Prdsale;
run;
/*1111111111111111111111111*/
/*simple way*/
proc contents data=ABC;
run;
/*I like "position option" because it gives me a table that is sorted by the
position of variables in the data, in addition to alphabetically sorted table*/
proc contents data=ABC position;
run;
/*2222222222222222222222222*/
/*Easiest way to produce RTF or EXCEL documents off PROC CONTENTS*/
/*but I don't like this way because it comes with too many details*/
ods rtf file ="C:\TEMP\datadictionary1.rtf";
proc contents data=ABC position;
run;
ods rtf close;
ods html file ="C:\TEMP\datadictionary1.xls";
proc contents data=ABC position;
run;
ods html close;
/*Using ODS we get only the data we want.*/
proc contents data=ABC position;
ods output position=whatever_name_you_want ;
run;
ods rtf file ="C:\TEMP\datadictionary2.rtf";
proc print data=whatever_name_you_want noobs;
title "data dictionary in RTF";
var variable label ;
run;
ods rtf close;
ods html file ="C:\TEMP\datadictionary2.xls";
proc print data=whatever_name_you_want noobs;
title "data dictionary in Excel";
var variable label ;
run;
ods html close;
12. PROC PRINT: See Data
PROC PRINT data=kaz;
VAR nation mat7 mat8 natexam; /*without this, all variables will be printed*/
run;
Advanced topic: You can selectively print observations.
/*print only when natexam=1*/
proc print data=kaz;where natexam=1;var nation mat7 mat8;run;
/*print by group units*/
proc sort data=kaz out=kaz2;by block;run;
proc print data=kaz2;by block;var nation mat7 mat8;run; /*print the sorted data set*/
/*print only up to a certain number of observations*/
proc print data=kaz2 (obs=5); /*shows only five observations*/
run;
If you want a nicer print-out, try proc report.
13. PROC SORT: Sorting Observations based on a value of variable
You will be using this procedure a lot, but be careful with large data sets. This procedure consumes a lot of
computation time.
PROC SORT data=kaz out=kaz2;
/*If you don’t want to create a new data set, just write “out=kaz”*/
by mat8;
run;
Advanced topics:
proc sort data=kaz out=kaz2 nodupkey;
by block;
run;
proc print data=kaz2;run;
This takes only the first observation of each block. Imagine that you have data with individual-level
variables (e.g., 100 students) and group-level variables (e.g., 10 schools), and you want to get school-level
information from this data. The procedure above would take just the first observation of each school and
give you ten lines of data for 10 schools. Ignore the individual-level variables in the result, however,
because they describe only whichever student happened to come first.
You can use more than one variable in the BY statement.
proc sort data=kaz out=kaz2;
by natexam block;
run;
/*What does the new data look like?*/
proc print data=kaz2;run;
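An alternative to NODUPKEY that makes the "first observation per group" logic explicit is a data step with a BY statement and the automatic FIRST. variable. A sketch, re-sorting KAZ by BLOCK first because KAZ2 was last sorted by NATEXAM and BLOCK (the output name BLOCKS is arbitrary):

```sas
proc sort data=kaz out=kaz2;
  by block;
run;

data blocks;
  set kaz2;
  by block;
  if first.block;   /* keep only the first nation listed in each block */
run;

proc print data=blocks;
run;
```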
14. PROC MEANS: Get Descriptive Statistics (Mean, STD, Min, Max)
PROC MEANS data=kaz;
VAR mat7 mat8;
run;
Advanced topic: Group means.
/*Report group means*/
proc sort data=kaz out=kaz2;by block;run;
proc means data=kaz2;
by block;
var mat7 mat8;
run;
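You can also use a CLASS statement instead of a BY statement. The CLASS statement is easier because you don't
need to sort the data by the grouping variable first. One downside is memory: PROC MEANS holds the results
for all CLASS groups at once, which can be a problem with very many groups, while BY processes one group at a time.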
proc means data=kaz2; /*now, kaz2 does not have to be sorted by block*/
class block;
var mat7 mat8;
run;
/*Save group means*/
ods listing close; /*printing of results suppressed*/
proc means data=kaz2; /*make sure kaz2 is already sorted by group ID*/
by block;
var mat7 mat8;
ods output summary=john; /*Output Delivery System Used. See SAS manual 2*/
run;
ods listing; /*printing of results resumed*/
proc print data=john;
run;
/*Get standard errors by adding STDERR*/
/*Once you list any statistic, PROC MEANS reports only the statistics you list, so you must also add the
other statistics you would like: mean, N, STD, MAX, and MIN*/
PROC MEANS data=kaz mean n std max min stderr;
VAR mat7 mat8;
run;
I recommend reading the chapter on PROC MEANS in the SAS online documentation. It is a very versatile procedure.
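Besides ODS, PROC MEANS has its own OUTPUT statement for saving statistics. A sketch that saves block means and standard deviations; AUTONAME builds names such as mat7_Mean and mat8_StdDev, and JOHN2 is just an example name.

```sas
proc sort data=kaz out=kaz2;
  by block;
run;

proc means data=kaz2 noprint;   /* NOPRINT: save the results without printing them */
  by block;
  var mat7 mat8;
  output out=john2 mean= std= / autoname;
run;

proc print data=john2;
run;
```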
15. PROC FREQ: Get Frequencies
PROC FREQ data=kaz;
Tables natexam ;
Run;
Advanced topics:
Get cross tabulation:
PROC FREQ data=kaz;
tables natexam*block;
run;
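When a cross-tabulation gets busy, TABLES statement options can trim it. A sketch that keeps only the cell counts and column percentages, and counts missing values as a category of their own:

```sas
proc freq data=kaz;
  tables natexam*block / norow nopercent missing;   /* drop row and overall percentages */
run;
```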
16. PROC UNIVARIATE: Get elaborate statistics and a univariate plot
PROC UNIVARIATE PLOT DATA=KAZ;
var mat7 mat8 gnp14;
run;
Advanced topic: Get box-and-whisker plots by subgroup, so you can compare group values. But the output is
text-based and pretty ugly.
proc sort data=kaz out=kaz2;
by block;
run;
PROC UNIVARIATE data=kaz2 plot;
by block;
var mat8;
run;
17. PROC PLOT: Plotting Two Variables
This is a text-based graph. Use PROC GPLOT for a nicer graphic.
PROC PLOT data=KAZ;
Plot mat7*mat8;
run;
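If SAS/GRAPH is licensed at your site, a minimal PROC GPLOT version of the same plot could look like this; the SYMBOL statement setting is just one choice.

```sas
symbol1 value=dot;   /* draw each nation as a dot */

proc gplot data=kaz;
  plot mat7*mat8;    /* vertical axis * horizontal axis, as in PROC PLOT */
run;
quit;
```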
18. PROC TIMEPLOT: Time Plot
proc timeplot data=KAZ;
plot mat8= '*';
id NAME;
run;
Advanced topics:
/*Sort first by the variable of your interest and see it*/
/*you will be seeing a ranking of nations*/
proc sort data=kaz out=kaz2;
by mat8;
run;
proc timeplot data=KAZ2;
plot mat8= '*';
id NAME;
run;
Add bells and whistles. Below, I am asking, “Does GNP have anything to do with test scores?”
/*First sort by GNP*/
proc sort data=kaz out=kaz2;
by gnp14;
run;
proc timeplot data=KAZ2;
title "TIMSS countries sorted by GNP";
plot mat7 mat8/overlay hiloc npp ;
id NAME block gnp14 prop;
run;
19. PROC CORR: Correlation
PROC CORR DATA=KAZ;
VAR mat7 mat8 gnp14;
Run;
20. PROC REG: OLS Regression
PROC REG DATA=KAZ;
MODEL mat8=natexam gnp14;
Run;
Advanced Topic:
http://www.estat.us/sas/OLS%20tables%20for%20learning.txt
21. PROC LOGISTIC: Logistic Regression
/*I don’t know if natexam can be considered a dependent variable, but for the sake of demonstration*/
PROC logistic data=kaz descending;
Model natexam=gnp14;
run;
/*The DESCENDING option makes sure that PROC LOGISTIC models the probability that the outcome=1.
Without this option, it would model the probability that the outcome=0*/
22. MAKE AN ASCII FILE
To use a stand-alone software program, you may have to create a simple ASCII file. But I rarely do this
lately because many software packages read SAS data directly.
data timss;set kaz;
file "ascii_example.txt";
put (nation) ($10.) (mat7 mat8) (8.0); /*$10. because NATION is a character variable*/
run;
III. More Procedures
23. PROC STANDARD: Standardize Values
Make z-scores with a mean of 0 and a standard deviation of 1
proc standard data=kaz out=kaz2 mean=0 std=1;
var mat7 mat8;
run;
/*then see what you did*/
proc print data=kaz2;
run;
Advanced technique: Standardize within groups.
/*First sort by group ID*/
proc sort data=kaz out=kaz2;
by block;
run;
/*Use by statement*/
proc standard data=kaz2 out=kaz3 mean=0 std=1;
by block;
var mat7 mat8;
run;
24. PROC RANK: Rank observations
proc rank data=kaz out=kaz2 groups=3;
/*Creates 3 groups. The new values will be 0, 1, and 2. */
var mat7 mat8;
RANKS Rmat7 Rmat8;
/*give names to the new variables*/
Run;
/*see what happened*/
proc print data=kaz2;
var mat7 Rmat7 mat8 Rmat8;
RUN;
Research Tip:
Why do we use rank?
a. We can split the sample based on the rank. e.g., high SES student sample versus low SES student sample.
b. We can create dummy variables quickly by specifying groups=2, e.g., high-SES students receive 1;
else 0. This grouping occurs at the median of a variable, which may not always be the best
strategy. An alternative is to assign 1 and 0 based on some meaningful threshold. For example, if I have
temperature data, I may use the median to split the data if it makes sense, but maybe I would use 0 degrees
Celsius (the freezing point) as a more meaningful point to split the data instead.
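Both ideas in the tip can be sketched on the TIMSS data. The names HIGH_GNP and ABOVE_CUT, and the cutoff of 0 for GNP14, are illustrative assumptions, not a recommended threshold.

```sas
/* Median split: GROUPS=2 gives 0 for the lower half and 1 for the upper half */
proc rank data=kaz out=kaz2 groups=2;
  var gnp14;
  ranks high_gnp;
run;

/* Threshold split: 1 if GNP14 is above a chosen cutoff (0 here, for illustration only) */
data kaz3;
  set kaz2;
  if gnp14 = . then above_cut = .;   /* keep missing as missing */
  else above_cut = (gnp14 > 0);      /* a comparison evaluates to 1 or 0 */
run;

proc freq data=kaz3;
  tables high_gnp*above_cut / missing;
run;
```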
25. PROC SQL: Creating group-level mean variables
One could use PROC MEANS to derive group-level means. I don't recommend this, since it involves the extra step
of merging the mean data back into the main data set. Extra steps always create room for error. PROC
SQL does it at once.
proc sql;
create table kaz2 as
select *,
mean(mat7) as mean_mat7,
mean(mat8) as mean_mat8,
mean(gnp14) as mean_gnp
from kaz
group by block;
quit; /*PROC SQL ends with QUIT; a RUN statement would simply be ignored*/
proc print data=kaz2;
run;
26. PROC IMPORT
If you have learned PROC EXPORT, why not learn PROC IMPORT too? You can read Excel data into SAS with it. As an
experiment, create an Excel sheet in the C drive and import it into SAS using the following code.
PROC IMPORT OUT= mine
DATAFILE= "C:\example.xls"
DBMS=EXCEL2000 REPLACE;
GETNAMES=YES;
RUN;
proc print data=mine;
run;
IV. Merging Data Sets
libname here "C:\";
/*Create two data sets A and B.*/
data A;
set kaz; /*I am assuming that you already have this data set "kaz" */
keep nation mat7;
run;
data B;
set kaz;
keep nation mat8;
run;
/*MERGE DATA SETS*/
/*First sort them by a common ID*/
/*Here they are already sorted, so the following two lines are not really necessary*/
proc sort data=A;by nation;run;
proc sort data=B;by nation;run;
data NEW;
merge A B;
by nation;
run;
/*Confirm*/
proc print data=NEW;
run;
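When the two data sets do not contain exactly the same nations, the IN= data set option tells you where each observation came from. A sketch that keeps only nations present in both A and B; the names inA, inB, and BOTH are arbitrary.

```sas
data both;
  merge A (in=inA) B (in=inB);
  by nation;
  if inA and inB;   /* keep only nations found in both data sets */
run;

proc print data=both;
run;
```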
V. MACROs
Macros can save time by reducing repetitive parts of a program.
27. Typical Macro – I use this most often.
%macro john (group=,var1=,var2=);
proc means data=kaz;
class &group;
var &var1;
run;
%mend john;
%john(group=natexam,var1=mat7 mat8);
%john(group=block,var1=gnp14 prop);
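A variation on the same macro, as a sketch: it adds a DATA= parameter and a STATS= parameter with default values (both are hypothetical additions, not part of the original john) and turns on the MPRINT option, so the log shows the code each call generates. It uses SASHELP.CLASS, the sample data set shipped with SAS, so it runs without kaz.

```sas
options mprint;  /* write the code each macro call generates to the log */

/* Keyword parameters may have defaults; a call can override any of them */
%macro john2 (data=sashelp.class, group=, var1=, stats=mean std n);
  proc means data=&data &stats;
    class &group;
    var &var1;
  run;
%mend john2;

%john2(group=sex, var1=height weight);           /* default statistics */
%john2(group=age, var1=weight, stats=mean max);  /* overrides the default */
```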
28. %LET macro variables: useful, but in a limited way
%let john=weight1; /*change this to weight2 if needed*/
data kaz2;set kaz;
/*hypothetical weight. Unrealistic but for practice*/
weight1=1;
weight2=2;
run;
proc reg data=kaz2;
weight &john;
title "Modeling with &john ";
model mat8=mat7 ;
run;
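The quoting rule matters for the title line above: SAS resolves &john inside double quotes but not inside single quotes. A quick sketch to check both in the log (the %PUT line is an addition for checking, not part of the original example).

```sas
%let john=weight1;
%put NOTE: john is &john;  /* writes "NOTE: john is weight1" to the log */

data _null_;
  put 'Single quotes: &john';  /* not resolved: writes &john literally */
  put "Double quotes: &john";  /* resolved: writes weight1 */
run;
```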
VI. ODS and PROC EXPORT
ODS can customize the output of statistical procedures and save statistical results as data sets. This is
useful when making a table to go with a paper: without printing results on paper, one can manipulate the
result data so they come out of SAS at almost paper-ready quality, and one can create graphs from those
results right after the statistical procedures. ODS works with all procedures.
Example: PROC MEANS.
1. Find out the names of the available tables by running the following. (You can do this for any other PROC, too.)
ods trace on;
proc means data=kaz;
var mat8;
run;
ods trace off;
2. Look at the log to find the names of the available tables. The log will say:
Output Added:
-------------
Name:       Summary
Label:      Summary statistics
Template:   base.summary
Path:       Means.Summary
-------------
This means that PROC MEANS has a table called SUMMARY, which holds the results of PROC MEANS.
3. Add an ODS OUTPUT statement to PROC MEANS as follows. This gives you a data set named john that
contains the results of the PROC MEANS procedure.
proc means data=kaz;
class block; /*one row of statistics per block, as in the listing of john shown below*/
var mat8;
ods output summary=john;
run;
4. See what is inside john.
proc print data=john;
run;
5. If you like, you can manipulate the john data in any way by running a data step here.
6. Save it in an Excel file.
PROC EXPORT DATA= john
OUTFILE= "C:\john2.xls"
DBMS=EXCEL2000 REPLACE;
RUN;
PROC TRANSPOSE
Before going too far with ODS, learn how to transpose data. This is useful when you want to change the
shape of the result data sets you obtained with ODS. For example, you may want to transpose the john data
created on the previous page to get the table layout you like.
Transposing means going from a row
[1 2 3]
to a column
1
2
3
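A tiny sketch of exactly this, using a made-up one-row data set (row, col, and x1-x3 are hypothetical names). Without an ID statement, PROC TRANSPOSE names the new column COL1 and stores the old variable names in _NAME_.

```sas
/* One row with three variables: [1 2 3] */
data row;
  input x1 x2 x3;
  datalines;
1 2 3
;
run;

/* After transposing: three rows, values in COL1, old names in _NAME_ */
proc transpose data=row out=col;
  var x1 x2 x3;
run;

proc print data=col;
run;
```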
We are still using the john data created on the previous page. The original john looks like the listing
below, which may not be the layout you want for the final table.
Obs  block  NObs  MAT8_N     MAT8_Mean   MAT8_StdDev  MAT8_Min  MAT8_Max
  1  eeuro     8       8     522.06625  32.954715655    477.23    563.75
  2  meast     3       3  447.36666667  66.772247479    392.18    521.59
  3  namer     2       2         513.5  19.431294347    499.76    527.24
  4  neuro     3       3  502.90333333  15.933519176    486.78    518.64
  5  ocea      2       2       518.715  15.436141033     507.8    529.63
  6  samer     1       1        384.76             .    384.76    384.76
  7  seasia    5       5       593.168  44.409074185    522.37     643.3
  8  seuro     2       2       478.745   7.290270914    473.59     483.9
  9  weuro    12      12        519.52  30.411673895    454.45    565.18
proc transpose data=john out=john3;
id block;
run;
proc print data=john3;run;
Now the transposed john, that is, john3, looks like this. If you like it, you can export it as an Excel file.
Obs  _NAME_       _LABEL_      eeuro    meast    namer    neuro     ocea    samer   seasia    seuro    weuro
  1  NObs         N Obs        8.000    3.000    2.000    3.000    2.000     1.00    5.000    2.000   12.000
  2  MAT8_N       N            8.000    3.000    2.000    3.000    2.000     1.00    5.000    2.000   12.000
  3  MAT8_Mean    Mean       522.066  447.367  513.500  502.903  518.715   384.76  593.168  478.745  519.520
  4  MAT8_StdDev  Std Dev     32.955   66.772   19.431   15.934   15.436        .   44.409    7.290   30.412
  5  MAT8_Min     Minimum    477.230  392.180  499.760  486.780  507.800   384.76  522.370  473.590  454.450
  6  MAT8_Max     Maximum    563.750  521.590  527.240  518.640  529.630   384.76  643.300  483.900  565.180
There are many more bells and whistles to PROC TRANSPOSE. One is the BY statement, which transposes the
data within BY groups; the data must be sorted by the BY variable before PROC TRANSPOSE is run.
Save john3 in an Excel file.
PROC EXPORT DATA= john3
OUTFILE= "C:\john3.xls"
DBMS=EXCEL2000 REPLACE;RUN;
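The BY statement described above can be sketched like this, on a made-up long data set (scores, school, subject, and score are hypothetical names). Each school becomes one row, and the values of subject become variable names through the ID statement.

```sas
/* Made-up long data: one row per school and subject */
data scores;
  input school $ subject $ score;
  datalines;
Blue math 60
Blue read 55
West math 48
West read 50
;
run;

/* BY-group processing needs the data sorted by the BY variable */
proc sort data=scores;
  by school;
run;

/* One output row per school; _NAME_ (always "score" here) is dropped */
proc transpose data=scores out=wide(drop=_name_);
  by school;
  id subject;
  var score;
run;

proc print data=wide;
run;
```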
VII. Application: Do PROC MEANS and save results as excel sheet using ODS
Try making a more sophisticated table from PROC MEANS. An example program for PROC REG is at
www.src.uchicago.edu/users/ueka
data kaz2;set kaz;
/*create a constant variable that takes the same value for the whole sample*/
/*it will be used as a classification variable, so the whole sample forms a single group*/
wholesample="whole";
run;
%macro klas (var=);
ods listing close; /*printing suppressed*/
/*Get statistics and save it in a result data*/
proc means data=kaz2 /*mean std stderr max min n*/;
class &var;
var mat7 mat8 GNP14 PROP;
ods output summary = &var;/*result data's name will be the same as classification variable, i.e.,
wholesample and block*/
run;
/*Transpose the result data, so it looks better*/
proc transpose data=&var out=&var.T;
id &var;
run;
%mend klas;
%klas (var=wholesample);
%klas (var=block);
data all;
merge wholesampleT blockT;
/*a BY statement (by _name_) is not necessary because the two data sets have identical structure*/
run;
ods listing; /*printing resumed*/
proc print data=all;
run;
/*create an excel file*/
PROC EXPORT DATA= all
OUTFILE= "C:\all.xls"
DBMS=EXCEL2000 REPLACE;
RUN;
VIII. Application: Read from many tables embedded within Excel sheets
Scenario: We have 50 Excel sheets, each holding students' achievement data from one of fifty schools.
The bad news is that the data are not stored in a conventional form (rows as observations, columns as
variables). Instead, each Excel sheet has tables and charts in it. At least the formats of those tables
are the same across the sheets. How can we extract data from these 50 Excel sheets and make them usable
for an analysis of students' achievement?
Old way: Hire research assistants and have them manually pick the relevant information from all fifty
Excel workbooks.
New way: Read each Excel sheet with PROC IMPORT and save it as a SAS data set. Then manipulate the SAS
data into an analyzable form (rows as observations, columns as variables).
Example. Imagine that we have 50 of the excel sheets like this:
Step 1: Read one sheet using PROC IMPORT:
PROC IMPORT OUT= JOHN
DATAFILE= "C:\temp\Blue Sky High School.xls"
DBMS=EXCEL2000 REPLACE;
GETNAMES=NO;
RUN;
Step 2: Examine the SAS data set you created, JOHN, by running:
proc print;
run;
We get:
Obs  F1                                            F2
  1  Blue Sky High School Math achievement score    .
  2  Boys                                          60
  3  Girls                                         55
  4  Hispanic                                      45
  5  Black                                         46
  6  White                                         48
  7  Asian                                         49
  8  Native American                               43
Step 3: Think of a way to get this data into a shape where rows are observations (one row per school).
I want it to look like this:
NAME   Boys  Girls  Hispanics
SCORE    60     55         45
Step 4:
So I must TRANSPOSE the data (PROC TRANSPOSE). But before that, I want to get rid of the first
observation because it looks useless: it is just the title of the table (though I could be creative and
use that information as an ID variable).
data john;
set john;
if _n_ ne 1;
run;
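The same first-row deletion can also be done with the FIRSTOBS= data set option. A sketch on a tiny made-up data set, demo, shaped like JOHN (demo and demo2 are hypothetical names). Use one method or the other, not both, or you will drop two rows.

```sas
/* Made-up data shaped like JOHN: a title row followed by real rows */
data demo;
  input F1 $ F2;
  datalines;
Title .
Boys 60
Girls 55
;
run;

/* Start reading at the second observation, skipping the title row */
data demo2;
  set demo(firstobs=2);
run;
```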
Step 5: Now I transpose the John data.
proc transpose data=john out=John2;
id F1;
var F2;
run;
proc print data=john2;
run;
Obs  _NAME_  _LABEL_  Boys  Girls  Hispanic  Black  White  Asian  Native_American
  1  F2      F2         60     55        45     46     48     49               43
The first two variables, _NAME_ and _LABEL_, are useless, so I could drop them, but I leave them for
now. Note that so far I went FROM the original Excel sheet TO a SAS data set, JOHN2, stored in the WORK
library.
Step 6 (FINAL): Now I look at what I have done so far and think of a way to automate it with a MACRO,
so I can apply it to all 50 Excel sheets. I used PROC IMPORT to read an Excel sheet, one data step to
drop the first observation, and PROC TRANSPOSE to get the format I wanted. Now I use a macro to apply
this process not only to one Excel sheet but to the other sheets as well.
The MACRO begins with %macro and ends with %mend. The comments explain each part.
libname here "C:\TEMP";
%macro Edward (var1=, var2=);
/* &var1 is a token to be replaced by the words specified later. When the
   first Edward call below is read by SAS, every occurrence of &var1 is
   replaced by Blue Sky High School.
   In the file name you see two dots, which is okay: the first dot marks
   the end of &var1, and the second dot is part of the file name (.xls). */
PROC IMPORT OUT= JOHN
DATAFILE= "C:\temp\&var1..xls"
DBMS=EXCEL2000 REPLACE;
GETNAMES=NO;
RUN;
data john;
set john;
/* _N_ is the sequence number of the observation, so this line reads
   "if the sequence number is not 1, keep the observation."
   Thus the first observation is dropped. */
if _n_ ne 1;
run;
proc transpose data=john out=John2;
id F1;
var F2;
run;
/* In data&var2 you do not need a dot to mark the end of the macro token,
   because it is obvious that it ends there. A dot is needed only when
   that is unclear: if I named this data set &var2.data, I would need the
   dot; otherwise &var2data looks like an entirely different macro token
   called &var2data rather than &var2 followed by data. */
data data&var2;
set john2;
length ID $ 50;
ID="&var1";
drop _NAME_ _LABEL_;
run;
%mend Edward;
/* Executing the first iteration: every occurrence of &var1 is replaced by
   Blue Sky High School and &var2 by 1, so the result is data set data1. */
%Edward (var1=Blue Sky High School, var2=1);
%Edward (var1=Central High School, var2=2);
%Edward (var1=West High School, var2=3);
data here.ALLData;
set data1 data2 data3;
run;
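With 50 schools, typing 50 %Edward lines invites typos. One option, sketched here under assumptions, is to keep the school names in a data set and let CALL EXECUTE generate the calls. The data set schools is hypothetical, the Excel file names are assumed to match the school names exactly, and the macro Edward above must already be compiled.

```sas
/* Hypothetical list of workbook names (file names without .xls) */
data schools;
  length school $ 50;
  input school & $50.;   /* & lets a name contain single blanks */
  datalines;
Blue Sky High School
Central High School
West High School
;
run;

/* Queue one Edward call per school: var1 = school name, var2 = row number.
   %NRSTR delays each call until this data step has finished. */
data _null_;
  set schools;
  call execute(cats('%nrstr(%Edward)(var1=', school, ', var2=', _n_, ');'));
run;

/* Stack the results; numbered lists like data1-data3 need SAS 9.2 or later */
data here.ALLData;
  set data1-data3;
run;
```

For all 50 schools, list all 50 names in the datalines and change data1-data3 to data1-data50.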
APPLICATION
/*You can use data steps to manipulate the result data set and customize it.*/
/*Here I do something tedious but worthwhile: merge the PROC CONTENTS data with descriptive statistics.*/
/*It feels tedious, but once you write this, you can reuse it later,
or even use this program as-is for your own purpose.*/
/*proc contents here*/
proc contents data=ABC position;
ods output position=whatever_name_you_want ;
run;
/*get means here*/
proc means data=ABC;
ods output summary=result_from_proc_mean;
run;
proc transpose data=result_from_proc_mean out=transposed_data;
run;
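data transposed_data;
set transposed_data;
/*get rid of the statistic suffix (_N, _Mean, _StdDev, _Min, _Max) in the names
by cutting at the last underscore; TRANWRD would also delete "_N" or "_Min"
appearing inside a variable name (e.g., AGE_NEW would become AGEEW)*/
length suffix $ 32;
suffix=scan(_name_,-1,'_');
_name_=substr(_name_,1,length(_name_)-length(suffix)-1);
drop suffix;
run;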
proc transpose data=transposed_data out=transposed_data2;
by _name_ notsorted ;
var col1;
id _label_;
run;
data transposed_data2;
length variable $ 32; /*I needed to do this because in the content data the
length is 32*/
set transposed_data2;
variable=_name_;
run;
proc sort data=whatever_name_you_want;by variable;run;
proc sort data=transposed_data2;by variable;run;
data newdata;
merge whatever_name_you_want transposed_data2;
by variable;
run;
/*Restore the original order of the variables, which was lost by the PROC SORT
that I had to use before merging*/
proc sort data=newdata;
by Num;
run;
ods rtf file ="C:\TEMP\datadictionary3.rtf";
proc print data=newdata noobs;
title "data dictionary in RTF";
var variable label N Mean STD_dev Minimum Maximum ;
run;
ods rtf close;
ods html file ="C:\TEMP\datadictionary3.xls";
proc print data=newdata noobs;
title "data dictionary in Excel";
var variable label N Mean STD_dev Minimum Maximum ;
run;
ods html close;