data_step_presentation - South Central SAS® Users Group

DATA STEP by DATA STEP
you’ll go far
Aaron J. Rabushka
Statistical Programmer
INC Research, Inc.
Austin, TX
Note that the code examples in
this presentation were
developed and run under SAS
V9.2 (TS2M2), and that the
coloring in the displays of code
and results comes from the
author without necessarily
representing actual SAS
displays.
A FEW BASICS
The SAS system offers a halfway house
between canned routines and procedural
programming.
Its pre-programmed procedures save a lot
of work and time since programmers do
not have to re-code standardized and
routinized procedures and utilities every
time they use them.
A FEW BASICS
Most SAS code goes into STEPs, either
DATA STEPs or PROC (PROCedure)
STEPs.
OPEN CODE refers to instructions not
associated with either of these (e.g.,
OPTIONS statements).
Some DATA STEP features will seem very
familiar to procedural programmers, and
some will seem annoyingly foreign.
A FEW BASICS
Every SAS DATA STEP begins with the
word DATA.
Note that here DATA is not followed by an
equal sign: DATA= refers to an already
existing dataset within a PROC
statement.
A FEW BASICS
SAS has two data types, NUMERIC and
CHARACTER. SAS users derive all of
their variables from these two types.
SAS does not have special types for
LOGICAL or DATE fields.
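As a minimal sketch of what this means in practice (the dataset and variable names are hypothetical, not from the presentation), dates and true/false results are simply held in NUMERIC variables, and a format controls how a date is displayed:

```sas
data date_demo;                        /* hypothetical dataset name */
  visit = '10DEC2009'd;                /* date literal: stored as a number of days since 01JAN1960 */
  visit_shown = visit;                 /* same numeric value ... */
  format visit_shown date9.;           /* ... displayed as a date through a format */
  after_nov = (visit > '30NOV2009'd);  /* a comparison yields numeric 1 (true) or 0 (false) */
  put visit= visit_shown= after_nov=;  /* writes the three values to the log */
run;
```

The unformatted variable prints as a plain number in the log, while the formatted copy of the same value prints as a date.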
A FEW BASICS
If the programmer does not name a
dataset in the DATA statement the system
will name it as DATA with a sequence
number appended.
data;
x = 1;
output;
run;
data;
y = 10;
output;
run;
data;
x = 1;
output;
run;
NOTE: The data set WORK.DATA1 has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.04 seconds
      cpu time            0.01 seconds
data;
y = 10;
output;
run;
NOTE: The data set WORK.DATA2 has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
A FEW BASICS
Letting the system name datasets this way
is not recommended, as it can result in
world-class confusion.
A FEW BASICS
SAS dataset names officially have two
parts, a library name and a data set name.
A period separates the two.
If a programmer does not specify a library
name for a dataset the SAS system will
attach WORK. to the dataset name that the
programmer assigns. The programmer does not
need to articulate WORK. in the code.
data demonstration;
x = 1;
output;
run;
data demonstration;
x = 1;
output;
run;
NOTE: The data set WORK.DEMONSTRATION has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds
A FEW BASICS
WORK. files disappear when the SAS session
ends.
A FEW BASICS
SAS datasets that need to be saved or that
have been saved into libraries from
previous SAS sessions need to have both
their dataset names and their library
names articulated every time the program
references them.
The programmer must declare library
names with LIBNAME before using them in
this way.
* NOTE THAT LIBRARY DEFINITIONS ARE OPERATING-SYSTEM SPECIFIC;
libname ajrdata "h:\";
data ajrdata.demonstration;
x = 1;
output;
run;
libname ajrdata "h:\";
NOTE: Libref AJRDATA was successfully assigned as follows:
      Engine:        V9
      Physical Name: h:\
data ajrdata.demonstration;
x = 1;
output;
run;
NOTE: The data set AJRDATA.DEMONSTRATION has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.21 seconds
      cpu time            0.00 seconds
A FEW BASICS
Dataset names can
have at most 32
characters and must
start with a letter or
underscore.
data this_is_an_example_of_a_dataset_name_that_is_too_long;
x = 1;
output;
run;
data this_is_an_example_of_a_dataset_name_that_is_too_long;
     -----------------------------------------------------
     307
ERROR 307-185: The data set name cannot have more than 32 characters.
x = 1;
output;
run;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
data 123_this_will_not_work;
x = 1;
output;
run;
data 123_this_will_not_work;
     --
     22
     200
ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, /, ;, _DATA_, _LAST_, _NULL_.
ERROR 200-322: The symbol is not recognized and will be ignored.
x = 1;
output;
run;
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK._THIS_WILL_NOT_WORK may be incomplete. When this step was stopped there were 0 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds
A FEW BASICS
If a programmer uses the name of a
dataset that already exists then SAS will
simply write the new dataset over the old
one of that name, without warning.
data one_num;
x = 1;
output;
run;
data one_num;
y = 10;
output;
run;
proc print data=one_num;
title1 "one_num";
title2 "note that this contains the data";
title3 "from the second DATA ONE_NUM step";
run;
data one_num;
x = 1;
output;
run;
NOTE: The data set WORK.ONE_NUM has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds
data one_num;
y = 10;
output;
run;
NOTE: The data set WORK.ONE_NUM has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.00 seconds
proc print data=one_num;
title1 "one_num";
title2 "note that this contains the data";
title3 "from the second DATA ONE_NUM step";
run;
NOTE: There were 1 observations read from the data set WORK.ONE_NUM.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds
one_num
note that this contains the data
from the second DATA ONE_NUM step
Obs     y
  1    10
A FEW BASICS
A DATA statement can create a single
dataset or multiple datasets:
DATA SUBJECTS;
DATA MEN WOMEN;
GETTING DATA INTO SAS
DATASETS
SAS users usually refer to records in
datasets as observations.
SAS DATA STEPs operate as implied loops
which iterate as necessary to handle the
data involved.
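The implied loop can be made visible with the automatic variable _N_, which counts DATA step iterations. The following is only an illustrative sketch; the dataset name and the wording of the log message are hypothetical:

```sas
data loop_demo;                 /* hypothetical dataset name */
  input age;
  /* no explicit loop is coded: the step runs once per input record */
  put "iteration " _n_ "read age " age;
  datalines;
54
35
40
;
run;
```

The PUT statement should write one line to the log for each of the three records, and the step stops by itself when the input runs out.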
GETTING DATA INTO SAS
DATASETS
A programmer can assign
data values directly through
assignment statements.
data assignments;
length country $ 12;
subject = 25;
country = "PARAGUAY";
run;
data assignments;
length country $ 12;
subject = 25;
country = "PARAGUAY";
run;
NOTE: The data set WORK.ASSIGNMENTS has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.00 seconds
assignments
Obs    country     subject
  1    PARAGUAY         25
GETTING DATA INTO SAS
DATASETS
A programmer can assign data values by
including a DATALINES or CARDS section
in a DATA STEP. Note that SAS accepts
these two interchangeably even when no
actual cards are involved.
data free_form;
input age sex $;
datalines;
54 MALE
35 MALE
40 FEMALE
29      FEMALE
;;;;
proc print data=free_form;
title1 "data free_form";
run;
data free_form;
input age sex $;
datalines;
NOTE: The data set WORK.FREE_FORM has 4 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.14 seconds
      cpu time            0.03 seconds
;;;;
proc print data=free_form;
title1 "data free_form";
run;
NOTE: There were 4 observations read from the data set WORK.FREE_FORM.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.17 seconds
      cpu time            0.04 seconds
data free_form
Obs    age    sex
  1     54    MALE
  2     35    MALE
  3     40    FEMALE
  4     29    FEMALE
GETTING DATA INTO SAS DATASETS
FROM EXTERNAL FLAT OR DELIMITED
FILES
INFILE statements reference and describe
external source files.
INPUT statements direct SAS to read and
incorporate the data from these external
source files.
GETTING DATA INTO SAS DATASETS
FROM EXTERNAL FLAT OR DELIMITED
FILES
INFILE locates and describes an external
data source.
The syntax of INFILE statements varies by
operating system.
EXAMPLES:
WINDOWS:
data test;
infile 'c:\work\space\sasajr\test.dat';
GETTING DATA INTO SAS DATASETS
FROM EXTERNAL FLAT OR DELIMITED
FILES
UNIX:
data test;
infile 'users/sasajr/test.dat';
MAINFRAME:
//FILEIN DD DSN=YAHUPITZ.AJRDATA,DISP=SHR
.
.
.
DATA TEST;
INFILE FILEIN;
GETTING DATA INTO SAS DATASETS
FROM EXTERNAL FLAT OR DELIMITED
FILES
A FILENAME statement in open SAS code can also refer to
an external file.
Its syntax is also operating-system-specific.
Example from Windows:
FILENAME TESTDATA 'c:\work\space\sasajr\test.dat';
.
.
.
DATA TEST;
INFILE TESTDATA;
GETTING DATA INTO SAS DATASETS
FROM EXTERNAL FLAT OR DELIMITED
FILES
INFILE statements can also describe a file as
delimited with the DLM option, which
identifies the delimiter used in the file in
question.
EXAMPLE FOR A COMMA-DELIMITED FILE:
infile 'users/sasajr/test.dat' dlm = ',';
This is useful in turning .CSV files from Excel
spreadsheets into SAS datasets.
GETTING DATA INTO SAS DATASETS
FROM EXTERNAL FLAT OR DELIMITED
FILES
A couple of options that are useful with
delimited-file INFILE statements are DSD,
which will recognize missing values between
two delimiters in a row, and MISSOVER,
which keeps SAS from reading data from the
following line if the current observation is not
completely filled in.
EXAMPLE FOR A COMMA-DELIMITED FILE:
infile 'users/sasajr/test.dat' dlm = ',' dsd missover;
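To see the two options at work without an external file, here is a small sketch that reads comma-delimited records placed in the step itself. The dataset name, variable names, and data values are made up for illustration:

```sas
data csv_demo;                            /* hypothetical dataset name */
  infile datalines dlm=',' dsd missover;  /* same options, applied to in-stream data */
  input subject age sex $;
  datalines;
1,54,MALE
2,,FEMALE
3,40
;
run;

proc print data=csv_demo;
run;
```

With DSD, the two commas in a row on the second record leave AGE missing for subject 2. With MISSOVER, the short third record leaves SEX missing for subject 3 instead of sending SAS to the next line to look for a value.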
GETTING DATA INTO SAS DATASETS
FROM EXTERNAL FLAT OR DELIMITED
FILES
Once the data source is identified with either
INFILE or DATALINES, INPUT creates the
variables in the resultant SAS dataset.
The simplest form of an INPUT statement is
often called free-form input. It does not
require the data to be laid out consistently in
columns. By default, character values read this
way are limited to 8 characters and cannot
include spaces. Note the use of the dollar sign
to indicate that a variable is character rather
than numeric.
data free_form;
input age sex $;
datalines;
54      MALE
35 MALE
40 FEMALE
29      FEMALE
;;;;
proc print data=free_form;
title1 "data free_form";
run;
data free_form;
input age sex $;
datalines;
NOTE: The data set WORK.FREE_FORM has 4 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.00 seconds
;;;;
proc print data=free_form;
title1 "data free_form";
run;
NOTE: There were 4 observations read from the data set WORK.FREE_FORM.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
data free_form
Obs    age    sex
  1     54    MALE
  2     35    MALE
  3     40    FEMALE
  4     29    FEMALE
GETTING DATA INTO SAS DATASETS
FROM EXTERNAL FLAT OR DELIMITED
FILES
If the data are column-aligned in their
source you can use column pointers,
each of which consists of an @ sign
followed by a number, to indicate their
placement within the source record.
data column_aligned;
input @1 age @4 sex $;
datalines;
54 MALE
35 MALE
40 FEMALE
29 FEMALE
;;;;
data column_aligned;
input @1 age @4 sex $;
datalines;
NOTE: The data set WORK.COLUMN_ALIGNED has 4 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds
;;;;
proc print data = column_aligned;
title1 "data column_aligned";
run;
NOTE: There were 4 observations read from the data set WORK.COLUMN_ALIGNED.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
data column_aligned
Obs    age    sex
  1     54    MALE
  2     35    MALE
  3     40    FEMALE
  4     29    FEMALE
GETTING DATA INTO SAS DATASETS
FROM EXTERNAL FLAT OR DELIMITED
FILES
If the source data have
more than one row per
observation SAS offers
row pointers that consist
of a pound sign (#)
followed by a number.
data column_and_row_pointers;
input #1 @1 age @4 sex $ #2 country $;
datalines;
54 MALE
URUGUAY
35 MALE
KAZAKHSTAN
40 FEMALE
UNITED KINGDOM
29 FEMALE
AUSTRALIA
;;;;
proc print data=column_and_row_pointers;
title1 'data column_and_row_pointers';
run;
data column_and_row_pointers;
input #1 @1 age @4 sex $ #2 country $;
datalines;
NOTE: The data set WORK.COLUMN_AND_ROW_POINTERS has 4 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.00 seconds
;;;;
proc print data=column_and_row_pointers;
title1 'data column_and_row_pointers';
run;
NOTE: There were 4 observations read from the data set WORK.COLUMN_AND_ROW_POINTERS.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds
data column_and_row_pointers
Obs    age    sex       country
  1     54    MALE      URUGUAY
  2     35    MALE      KAZAKHST
  3     40    FEMALE    UNITED
  4     29    FEMALE    AUSTRALI
GETTING DATA INTO SAS DATASETS
FROM EXTERNAL FLAT OR DELIMITED
FILES
SAS also allows moving forward
to a subsequent row within an
observation by using a slash (/).
data row_and_column;
input @1 age @4 sex $ / @1 country $ ;
datalines;
54 MALE
URUGUAY
35 MALE
KAZAKHSTAN
40 FEMALE
UNITED KINGDOM
29 FEMALE
AUSTRALIA
;;;;
proc print data=row_and_column;
title1 "data row_and_column";
run;
data row_and_column;
input @1 age @4 sex $ / @1 country $ ;
datalines;
NOTE: The data set WORK.ROW_AND_COLUMN has 4 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
;;;;
proc print data=row_and_column;
title1 "data row_and_column";
run;
NOTE: There were 4 observations read from the data set WORK.ROW_AND_COLUMN.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds
data row_and_column
Obs    age    sex       country
  1     54    MALE      URUGUAY
  2     35    MALE      KAZAKHST
  3     40    FEMALE    UNITED
  4     29    FEMALE    AUSTRALI
GETTING DATA INTO SAS DATASETS
FROM EXTERNAL FLAT OR DELIMITED
FILES
Formatted input can work with source
data that do not fit the requirements of
the less specific types of INPUT
statements, for example character
variables that are longer than 8 characters
and/or include embedded spaces.
Note that SAS often refers to formats used
in an INPUT statement as INFORMATs.
data row_and_column_with_a_format;
input @1 age @4 sex $ / country $14.;
datalines;
54 MALE
URUGUAY
35 MALE
KAZAKHSTAN
40 FEMALE
UNITED KINGDOM
29 FEMALE
AUSTRALIA
;;;;
proc print data=row_and_column_with_a_format;
title1 "data row_and_column_with_a_format";
run;
data row_and_column_with_a_format;
input @1 age @4 sex $ / country $14.;
datalines;
NOTE: The data set WORK.ROW_AND_COLUMN_WITH_A_FORMAT has 4 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
;;;;
proc print data=row_and_column_with_a_format;
title1 "data row_and_column_with_a_format";
run;
NOTE: There were 4 observations read from the data set WORK.ROW_AND_COLUMN_WITH_A_FORMAT.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds
data row_and_column_with_a_format
Obs    age    sex       country
  1     54    MALE      URUGUAY
  2     35    MALE      KAZAKHSTAN
  3     40    FEMALE    UNITED KINGDOM
  4     29    FEMALE    AUSTRALIA
GETTING DATA INTO SAS DATASETS
FROM OTHER SAS DATASETS
SAS offers three commands to make new
datasets out of previously existing SAS
datasets: SET, MERGE, and UPDATE, all of
which can be used in situations that range
from extremely simple to extremely
complex.
GETTING DATA INTO SAS DATASETS
FROM OTHER SAS DATASETS
SET allows a programmer
to simply copy the contents
of one dataset into another
without any further
modifications.
data column_and_row_pointers;
input #1 @1 age @4 sex $ #2 country $;
datalines;
54 MALE
URUGUAY
35 MALE
KAZAKHSTAN
40 FEMALE
UNITED KINGDOM
29 FEMALE
AUSTRALIA
;;;;
data subjects;
set column_and_row_pointers;
run;
data column_and_row_pointers;
input #1 @1 age @4 sex $ #2 country $;
datalines;
NOTE: The data set WORK.COLUMN_AND_ROW_POINTERS has 4 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds
;;;;
data subjects;
set column_and_row_pointers;
run;
NOTE: There were 4 observations read from the data set WORK.COLUMN_AND_ROW_POINTERS.
NOTE: The data set WORK.SUBJECTS has 4 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
GETTING DATA INTO SAS DATASETS
FROM OTHER SAS DATASETS
In tandem with IF and
OUTPUT statements SET
can create multiple
datasets from a single
source.
data men women unknown;
set column_and_row_pointers;
if sex = "MALE" then output men;
if sex = "FEMALE" then output women;
if not (sex in ("MALE","FEMALE")) then output unknown;
run;
data men women unknown;
set column_and_row_pointers;
if sex = "MALE" then output men;
if sex = "FEMALE" then output women;
if not (sex in ("MALE","FEMALE")) then output unknown;
run;
NOTE: There were 4 observations read from the data set WORK.COLUMN_AND_ROW_POINTERS.
NOTE: The data set WORK.MEN has 2 observations and 3 variables.
NOTE: The data set WORK.WOMEN has 2 observations and 3 variables.
NOTE: The data set WORK.UNKNOWN has 0 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.07 seconds
      cpu time            0.01 seconds
GETTING DATA INTO SAS DATASETS
FROM OTHER SAS DATASETS
Without getting into the
intricacies of this here, a
single SET statement with 2
or more datasets gives
different results than
multiple SET statements for
individual datasets.
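A small sketch of the difference, using two hypothetical two-observation datasets (A and B are invented here for illustration and are not part of the presentation's examples):

```sas
data a;                 /* hypothetical dataset */
  input id x;
  datalines;
1 10
2 20
;
run;

data b;                 /* hypothetical dataset */
  input id y;
  datalines;
3 30
4 40
;
run;

/* one SET statement naming both datasets: B is stacked under A */
data stacked;
  set a b;
run;

/* two SET statements: each iteration reads one observation from each dataset */
data side_by_side;
  set a;
  set b;
run;
```

STACKED ends up with 4 observations, with Y missing on the rows that came from A and X missing on the rows that came from B. SIDE_BY_SIDE ends up with 2 observations, and because ID exists in both datasets its value from B replaces the value read from A.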
GETTING DATA INTO SAS DATASETS
FROM OTHER SAS DATASETS
MERGE can create a dataset from two or
more pre-existing SAS datasets.
Although SAS does allow its use with or
without a BY statement, using MERGE
without BY is dangerous since it puts
observations together with no regard for
their content.
GETTING DATA INTO SAS DATASETS
FROM OTHER SAS DATASETS
A system option called MERGENOBY is often
used to avoid the chaotic results of a MERGE
statement without an attendant BY.
OPTIONS MERGENOBY=ERROR will
terminate any DATA step that has a MERGE
without a BY (a reminder that having nothing
is sometimes preferable to having garbage),
and OPTIONS MERGENOBY=WARN will
allow the DATA step to complete while issuing
a WARNING to the log.
GETTING DATA INTO SAS DATASETS
FROM OTHER SAS DATASETS
Using a BY statement along with MERGE
forces SAS to match observations on
common values. This can be used for
one-to-one, one-to-many, and many-to-one
matches, whose common values must
match exactly. Datasets need to be
SORTed by the matching variables in order
for a MERGE...BY command to work.
data sbp; * systolic blood pressure;
input subject sbp;
datalines;
1 120
3 122
4 108
5 133
6 120
7 129
8 139
9 123
10 139
run;
data dbp;
input subject dbp; * diastolic blood pressure;
datalines;
4 80
5 79
1 95
3 88
2 80
10 88
8 77
6 84
7 82
9 90
run;
GETTING DATA INTO SAS DATASETS
FROM OTHER SAS DATASETS
Running a MERGE statement without SORTing:
data bp;
merge sbp dbp;
by subject;
run;
data bp;
merge sbp dbp;
by subject;
run;
ERROR: BY variables are not properly sorted on data set WORK.DBP.
subject=5 sbp=133 dbp=79 FIRST.subject=1 LAST.subject=1 _ERROR_=1 _N_=4
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 5 observations read from the data set WORK.SBP.
NOTE: There were 3 observations read from the data set WORK.DBP.
WARNING: The data set WORK.BP may be incomplete. When this step was stopped there were 3 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds
GETTING DATA INTO SAS DATASETS
FROM OTHER SAS DATASETS
Running a MERGE with the previous data steps properly SORTed:
proc sort data=sbp;
by subject;
run;
proc sort data=dbp;
by subject;
run;
data bp;
merge sbp dbp;
by subject;
run;
data bp;
merge sbp dbp;
by subject;
run;
NOTE: There were 9 observations read from the data set WORK.SBP.
NOTE: There were 10 observations read from the data set WORK.DBP.
NOTE: The data set WORK.BP has 10 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.06 seconds
      cpu time            0.01 seconds
RESULT OF MERGE
WHICH CONTAINS ALL VARIABLES
FROM THE SOURCE DATASETS
Obs    subject    sbp    dbp
  1          1    120     95
  2          2      .     80
  3          3    122     88
  4          4    108     80
  5          5    133     79
  6          6    120     84
  7          7    129     82
  8          8    139     77
  9          9    123     90
 10         10    139     88
GETTING DATA INTO SAS DATASETS
FROM OTHER SAS DATASETS
Datasets named in a MERGE statement can
have special flag variables associated with
them, which can be used to select the
observations that show up in the resulting
dataset according to which source datasets
contributed to them. This involves the
IN= option.
* MERGE, keeping only observations for subjects in both source datasets:;
data bp3;
merge sbp(in=insbp) dbp(in=indbp);
by subject;
if insbp and indbp;
run;
data bp3;
merge sbp(in=insbp) dbp(in=indbp);
by subject;
if insbp and indbp;
run;
NOTE: There were 9 observations read from the data set WORK.SBP.
NOTE: There were 10 observations read from the data set WORK.DBP.
NOTE: The data set WORK.BP3 has 9 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
GETTING DATA INTO SAS DATASETS
FROM OTHER SAS DATASETS
A problem to watch out for: if two
datasets have identically named variables
apart from those used in the BY statement,
the value from the dataset listed second in
the MERGE statement overwrites the value
from the first, even if the value in the
second dataset is missing.
data sbp2;
input subject sbp sex $;
datalines;
1 120 M
3 122 M
4 108 M
5 133 F
6 120 F
7 129 F
8 139 M
9 123 F
10 139 M
run;
data dbp2;
input subject dbp sex $;
datalines;
4 80 ?
5 79 ?
1 95 ?
3 88 ?
2 80 ?
10 88 ?
8 77 ?
6 84 ?
7 82 ?
9 90 ?
run;
proc sort data=sbp2;
by subject;
run;
proc sort data=dbp2;
by subject;
run;
data bp2;
merge sbp2 dbp2;
by subject;
run;
proc print data=bp2;
var subject sbp dbp sex;
title1 'MERGED DATA';
title2 'SEX FROM THE SECOND DATASET';
title3 'HAS WRITTEN OVER SEX FROM THE FIRST DATASET';
run;
data bp2;
merge sbp2 dbp2;
by subject;
run;
NOTE: There were 9 observations read from the data set WORK.SBP2.
NOTE: There were 10 observations read from the data set WORK.DBP2.
NOTE: The data set WORK.BP2 has 10 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           0.04 seconds
      cpu time            0.01 seconds
MERGED DATA
SEX FROM THE SECOND DATASET
HAS WRITTEN OVER SEX FROM THE FIRST DATASET
Obs    subject    sbp    dbp    sex
  1          1    120     95     ?
  2          2      .     80     ?
  3          3    122     88     ?
  4          4    108     80     ?
  5          5    133     79     ?
  6          6    120     84     ?
  7          7    129     82     ?
  8          8    139     77     ?
  9          9    123     90     ?
 10         10    139     88     ?
GETTING DATA INTO SAS DATASETS
FROM OTHER SAS DATASETS
One solution to this:
rename at least one
of the variables.
data sbp3;
input subject sbp sbpsex $;
datalines;
1 120 M
3 122 M
4 108 M
5 133 F
6 120 F
7 129 F
8 139 M
9 123 F
10 139 M
run;
data dbp3;
input subject dbp dbpsex $;
datalines;
4 80 ?
5 79 ?
1 95 ?
3 88 ?
2 80 ?
10 88 ?
8 77 ?
6 84 ?
7 82 ?
9 90 ?
run;
proc sort data=sbp3;
by subject;
run;
proc sort data=dbp3;
by subject;
run;
data bp3;
merge sbp3 dbp3;
by subject;
run;
proc print data=bp3;
var subject sbp dbp sbpsex dbpsex;
title1 "MERGED DATA";
title2 "WITH NO VARIABLES WRITTEN OVER";
run;
MERGED DATA
WITH NO VARIABLES WRITTEN OVER
Obs    subject    sbp    dbp    sbpsex    dbpsex
  1          1    120     95    M         ?
  2          2      .     80              ?
  3          3    122     88    M         ?
  4          4    108     80    M         ?
  5          5    133     79    F         ?
  6          6    120     84    F         ?
  7          7    129     82    F         ?
  8          8    139     77    M         ?
  9          9    123     90    F         ?
 10         10    139     88    M         ?
GETTING DATA INTO SAS DATASETS
FROM OTHER SAS DATASETS
The UPDATE command applies
the values of one dataset
(TRANSACTION DATA SET) to
another (MASTER DATASET).
It can write a new version of the
MASTER DATASET, or it can
create a third dataset.
GETTING DATA INTO SAS DATASETS
FROM OTHER SAS DATASETS
It copies non-missing values from the
TRANSACTION dataset over MASTER
dataset values where appropriate.
It makes no change to the MASTER
value if the corresponding
TRANSACTION value is missing.
Like MERGE, UPDATE requires the input
datasets to be sorted.
* MASTER DATASET:;
data weeks_in_study;
input subject lastweek;
datalines;
1 1
2 1
3 1
4 1
5 1
6 1
7 1
8 1
9 1
10 1
;;;;
run;
* TRANSACTION DATASET:;
data recent_weeks;
input subject lastweek;
datalines;
11 2
9 3
7 4
5 2
3 4
1 1
2 .
4 .
6 .
;;;;
run;
proc sort data=weeks_in_study;
by subject;
run;
proc sort data=recent_weeks;
by subject;
run;
data current_weeks;
update weeks_in_study recent_weeks;
by subject;
run;
proc print data=current_weeks;
title1 "MASTER DATASET";
title2 "AFTER APPLYING THE TRANSACTION DATASET";
run;
MASTER DATASET
AFTER APPLYING THE TRANSACTION DATASET
Obs    subject    lastweek
  1          1           1
  2          2           1
  3          3           4
  4          4           1
  5          5           2
  6          6           1
  7          7           4
  8          8           1
  9          9           3
 10         10           1
 11         11           2
WORKING WITH DATA
ONCE THEY ARE IN A SAS DATASET
Since we can rarely command the
form in which we get our data the
tools for working with them in SAS
are greatly helpful.
DATA step statements can be
positional (that is, their order
matters), or non-positional (that is,
their order does not matter).
WORKING WITH DATA
ONCE THEY ARE IN A SAS DATASET
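LENGTH statements give
programmers a pro-active way to set
the length of a variable without
leaving it to the chance of the
variable’s content. Otherwise SAS fixes
a character variable’s length from the
first statement that assigns it a value.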
WORKING WITH DATA
ONCE THEY ARE IN A SAS DATASET
The LENGTH statement is most often
associated with CHARACTER
variables.
It can make the difference between
usable data and garbage, especially
when trying to match with other
datasets, as with MERGE statements or
PROC APPEND.
data length_1;
input @1 yesno_code;
if yesno_code = 1 then yesno = "NO";
else if yesno_code = 2 then yesno = "YES";
datalines;
2
1
;;;;
run;
LENGTH_1, WITH LENGTH OF YESNO DETERMINED
FROM THE DATA VALUES AND THEIR ORDER
NOTE THAT THE YES VALUE IS TRUNCATED TO YE
       yesno_
Obs     code     yesno
  1        2      YE
  2        1      NO
data length_correct;
length yesno $ 3;
input @1 yesno_code;
if yesno_code = 1 then yesno = "NO";
else if yesno_code = 2 then yesno = "YES";
datalines;
2
1
;;;;
run;
DATA LENGTH CORRECT
NOTE THAT THE LENGTH OF YESNO
IS SET IN THE LENGTH STATEMENT
AND THE DATA ARE PRESENTED IN FULL
       yesno_
Obs     code     yesno
  1        2      YES
  2        1      NO
WORKING WITH DATA
ONCE THEY ARE IN A SAS DATASET
Position is VERY important for
LENGTH statements since SAS is
extremely finicky about their coming
at or near the top of a DATA step.
data length_wrong;
input @1 yesno_code;
if yesno_code = 1 then yesno = "NO";
else if yesno_code = 2 then yesno = "YES";
length yesno $ 3;
datalines;
2
1
;;;;
run;
data length_wrong;
input @1 yesno_code;
if yesno_code = 1 then yesno = "NO";
else if yesno_code = 2 then yesno = "YES";
length yesno $ 3;
WARNING: Length of character variable yesno has already been set.
         Use the LENGTH statement as the very first statement in the DATA STEP to declare the length of a character variable.
datalines;
NOTE: The data set WORK.LENGTH_WRONG has 2 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.04 seconds
      cpu time            0.01 seconds
DATA LENGTH_WRONG
NOTE THAT THE LENGTH OF YESNO
IS NOT SET
TO WHAT IS IN THE LENGTH STATEMENT
AND DATA ARE TRUNCATED
       yesno_
Obs     code     yesno
  1        2      YE
  2        1      NO
data countries;
length country $ 20;
input @1 country $25.;
datalines;
URUGUAY
PAKISTAN
JAPAN
ISRAEL
UNITED ARAB EMIRATES
UNITED KINGDOM
;;;;
data more_countries;
length country $ 25;
input @1 country $25.;
datalines;
CANADA
ARGENTINA
UNITED STATES OF AMERICA
ZAIRE
NEW ZEALAND
ZAMBIA
;;;;
proc append base=countries data=more_countries;
run;
proc append base=countries data=more_countries;
run;
NOTE: Appending WORK.MORE_COUNTRIES to WORK.COUNTRIES.
WARNING: Variable country has different lengths on BASE and DATA files (BASE 20 DATA 25).
ERROR: No appending done because of anomalies listed above. Use FORCE option to append these files.
NOTE: 0 observations added.
NOTE: The data set WORK.COUNTRIES has 6 observations and 1 variables.
NOTE: Statements not processed because of errors noted above.
NOTE: PROCEDURE APPEND used (Total process time):
      real time           0.03 seconds
      cpu time            0.00 seconds
NOTE: The SAS System stopped processing this step because of errors.
WORKING WITH DATA
ONCE THEY ARE IN A SAS DATASET
HOLDING ON TO WHAT YOU WANT AND
GETTING RID OF THE REST
Often programming requires working with
variables that are unnecessary and
unwanted in the final output.
WORKING WITH DATA
ONCE THEY ARE IN A SAS DATASET
HOLDING ON TO WHAT YOU WANT AND
GETTING RID OF THE REST
KEEP and DROP statements allow for
holding on only to those variables that you
want after the work is done.
They can be applied either as stand-alone
statements placed anywhere in the DATA
step, or as parenthetical modifications to
the DATA statement.
dataset SUBJECTS
Obs    age    sex       country
  1     54    MALE      URUGUAY
  2     35    MALE      KAZAKHSTAN
  3     40    FEMALE    UNITED KINGDOM
  4     29    FEMALE    AUSTRALIA
data countries_1;
set subjects;
keep country;
run;
dataset COUNTRIES_1
Obs    country
  1    URUGUAY
  2    KAZAKHSTAN
  3    UNITED KINGDOM
  4    AUSTRALIA
data countries_2;
keep country;
set subjects;
run;
dataset COUNTRIES_2
Obs    country
  1    URUGUAY
  2    KAZAKHSTAN
  3    UNITED KINGDOM
  4    AUSTRALIA
data countries_3 (keep=country);
set subjects;
run;
dataset COUNTRIES_3
Obs    country
  1    URUGUAY
  2    KAZAKHSTAN
  3    UNITED KINGDOM
  4    AUSTRALIA
WORKING WITH DATA
ONCE THEY ARE IN A SAS DATASET
DROP mirrors KEEP in that
it dispenses with variables
that you don’t want to hold
on to rather than holding on
to those that you do want.
data countries_4;
drop age sex;
set subjects;
run;
dataset COUNTRIES_4
Obs    country
  1    URUGUAY
  2    KAZAKHSTAN
  3    UNITED KINGDOM
  4    AUSTRALIA
WORKING WITH DATA
ONCE THEY ARE IN A SAS DATASET
The RETAIN statement sounds similar to
KEEP, but functions differently.
Use RETAIN to hold a variable’s value
across observations.
Great for building counters within
datasets.
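A minimal sketch of such a counter follows. The dataset VISITS, the variable N_MALE, and the data values are hypothetical and only serve the illustration:

```sas
data visits;                    /* hypothetical data */
  input subject sex $;
  datalines;
1 MALE
2 MALE
3 FEMALE
4 FEMALE
;
run;

data counted;
  set visits;
  retain n_male 0;              /* set to 0 once, then carried from one observation to the next */
  if sex = "MALE" then n_male = n_male + 1;
run;

proc print data=counted;
run;
```

N_MALE should read 1, 2, 2, 2 down the four observations. Without the RETAIN statement SAS would reset N_MALE to missing at the start of every iteration, so the running count would never build up.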
WORKING WITH DATA
ONCE THEY ARE IN A SAS DATASET
In addition to facilities for
KEEPing and DROPping
variables within observations,
SAS has facilities to hold on to
and discard observations in a
dataset.
WORKING WITH DATA
ONCE THEY ARE IN A SAS DATASET
To exclude unwanted
observations use the DELETE
command. This is most often
done conditionally, in conjunction
with an IF statement.
data men4;
set subjects;
if sex ne "MALE" then delete;
run;
data men4;
set subjects;
if sex ne "MALE" then delete;
run;
NOTE: There were 4 observations read from the data set WORK.SUBJECTS.
NOTE: The data set WORK.MEN4 has 2 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.04 seconds
      cpu time            0.00 seconds
WORKING WITH DATA
ONCE THEY ARE IN A SAS DATASET
To include only those
observations that are wanted
use the OUTPUT command.
This can be done conditionally,
in conjunction with an IF
statement.
data men5;
set subjects;
if sex = "MALE" then output;
run;
data men5;
set subjects;
if sex = "MALE" then output;
run;
NOTE: There were 4 observations read from the data set WORK.SUBJECTS.
NOTE: The data set WORK.MEN5 has 2 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds
WORKING WITH DATA
ONCE THEY ARE IN A SAS DATASET
Note that OUTPUT does not
always need to be articulated. If
there is no explicit OUTPUT or
DELETE statement anywhere in a
DATA step SAS will include all
observations processed.
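The flip side is that once an OUTPUT statement does appear in a DATA step, SAS writes observations only where the code says so. The sketch below (dataset and variable names are hypothetical) uses that to write two observations for every input record:

```sas
data two_visits;                /* hypothetical dataset name */
  input subject;
  visit = 1;
  output;                       /* writes the observation as it stands now */
  visit = 2;
  output;                       /* writes it again with the new VISIT value */
  datalines;
1
2
;
run;
```

Two input records should yield four observations, and no extra observation is written at the end of each iteration because the explicit OUTPUT statements replace the automatic one.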
WORKING WITH DATA
ONCE THEY ARE IN A SAS DATASET
One command that may take some getting used to
is the SUBSETTING IF statement, because it sounds
like an incomplete sentence, an IF with no THEN.
A good way to think of it is IF XXX THEN include
this observation. If the observation passes the IF
condition it is included in the dataset and SAS
applies the rest of the commands in the DATA step.
If the observation fails the IF condition the
observation is excluded and SAS does not process
the observation further.
data men6;
set subjects;
if sex = "MALE";
run;
NOTE: There were 4 observations read from the data set WORK.SUBJECTS.
NOTE: The data set WORK.MEN6 has 2 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.00 seconds
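A small sketch of the "does not process the observation further" behavior. SUBJECTS_DEMO is a hypothetical stand-in for the SUBJECTS dataset, and its variables and values are assumptions made up for illustration.

```sas
* Hypothetical stand-in for the SUBJECTS dataset;
data subjects_demo;
    input subject sex $ age;
    datalines;
1 MALE 34
2 FEMALE 41
3 MALE 29
4 FEMALE 52
;
run;

data _null_;
    set subjects_demo;
    if sex = "MALE";
    * This PUT runs only for observations that passed the IF above;
    put "Still processing subject " subject;
run;
```

Only subjects 1 and 3 reach the PUT statement; for the other two observations SAS returns to the top of the DATA step as soon as the IF condition fails.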
WORKING WITH DATA
ONCE THEY ARE IN A SAS DATASET
Often a WHERE statement can be used to
the same effect as a subsetting IF. If the
datasets in question are large, WHERE can
save a lot of computer time, because it
screens out observations before they are
brought into the DATA step.
Drawback: with WHERE the log reports only
the observations that met the condition,
not how many are in the source dataset.
With a subsetting IF the log reports every
observation read.
data men6;
set subjects;
if sex = "MALE";
run;
data men7;
set subjects;
where sex = "MALE";
run;
data men6;
set subjects;
if sex = "MALE";
run;
NOTE: There were 4 observations read from the data set WORK.SUBJECTS.
NOTE: The data set WORK.MEN6 has 2 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds
data men7;
set subjects;
where sex = "MALE";
run;
NOTE: There were 2 observations read from the data set WORK.SUBJECTS.
      WHERE sex='MALE';
NOTE: The data set WORK.MEN7 has 2 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
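One practical difference worth a sketch: a WHERE statement can test only variables that already exist in the input dataset, while a subsetting IF can also test variables created in the current step. SUBJECTS_DEMO, OLDER_MEN and DECADE are hypothetical names used only for this illustration.

```sas
* Hypothetical stand-in for the SUBJECTS dataset;
data subjects_demo;
    input subject sex $ age;
    datalines;
1 MALE 34
2 FEMALE 41
3 MALE 29
4 FEMALE 52
;
run;

data older_men;
    set subjects_demo;
    where sex = "MALE";          * tests a variable already in the input;
    decade = floor(age / 10);    * variable created in this step;
    if decade >= 3;              * subsetting IF can test the new variable;
run;
```

Here WHERE passes the two male subjects into the step, and the subsetting IF then keeps only subject 1. Writing WHERE DECADE >= 3 instead would be an error, because DECADE is not in SUBJECTS_DEMO.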
DATASETS
WHERE WE KEEP NOTHING
Sometimes we use DATA steps to
accomplish tasks that do not involve
keeping any data.
For these we often use the special reserved
name _NULL_.
Uses include writing reports, setting or
checking macro variables, and writing
messages to the log.
data _null_;
x = ('16MAY2010'D - '10DEC2009'D)/7;
y = ('10JAN2010'D - '10DEC2009'D)/7;
put x y;
run;
22.428571429 4.4285714286
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.00 seconds
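Setting a macro variable is one of the uses listed above. A minimal sketch, assuming a hypothetical dataset SUBJECTS_DEMO and a made-up macro variable name N_MEN:

```sas
* Hypothetical stand-in for the SUBJECTS dataset;
data subjects_demo;
    input subject sex $ age;
    datalines;
1 MALE 34
2 FEMALE 41
3 MALE 29
4 FEMALE 52
;
run;

* No dataset is created. The step only counts and sets a macro variable;
data _null_;
    set subjects_demo end=last_obs;
    * Sum statement: N_MEN starts at 0 and is retained across observations;
    if sex = "MALE" then n_men + 1;
    if last_obs then call symputx("n_men", n_men);
run;

%put NOTE: Number of men found: &n_men;
```

With the four observations shown, the %PUT line writes a count of 2 to the log.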
PROCEDURAL
PROGRAMMING TOOLS
Position is extremely
important for procedural
commands since the same
commands in different
orders can give some very
different results.
data _null_;
x = 4;
x = x + 1;
x = x * 4;
put "x from first set of statements: " x;
x = 4;
x = x * 4;
x = x + 1;
put "x from second set of statements: " x;
run;
x from first set of statements: 20
x from second set of statements: 17
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
PROCEDURAL
PROGRAMMING TOOLS
Note that assignment
statements such as
those just shown are
quite economical in
terms of computer time.
PROCEDURAL
PROGRAMMING TOOLS
CONDITIONAL BRANCHING—THE VERY
HEART OF PROCEDURAL
PROGRAMMING!
The basic IF...THEN statement instructs
the system to execute a command or group of
commands if the specified condition is true,
and possibly another command or command
group if it is false.
Here is a basic IF statement that shows what
to do if the stated condition is true, and
nothing else.
data showing_if1;
length yesno $ 3;
input @1 yesno_code;
if yesno_code = 1 then yesno = "NO";
datalines;
2
1
4
3
;;;;
run;
proc print data=showing_if1;
title1 "DEMONSTRATING RESULTS OF A BASIC IF STATEMENT";
run;
DEMONSTRATING RESULTS OF A BASIC IF STATEMENT
Obs    yesno    yesno_code
  1                  2
  2     NO           1
  3                  4
  4                  3
PROCEDURAL
PROGRAMMING TOOLS
In order to instruct the
system what to do when
the IF condition fails,
follow the IF statement
with an ELSE.
data showing_if2;
length yesno $ 3;
input @1 yesno_code;
if yesno_code = 1 then yesno = "NO";
else yesno = "YES";
datalines;
2
1
4
3
;;;;
run;
proc print data=showing_if2;
title1 "DEMONSTRATING RESULTS" ;
title2 "OF AN IF STATEMENT";
title3 "WITH A FOLLOWING ELSE STATEMENT";
run;
DEMONSTRATING RESULTS
OF AN IF STATEMENT
WITH A FOLLOWING ELSE STATEMENT
Obs    yesno    yesno_code
  1     YES          2
  2     NO           1
  3     YES          4
  4     YES          3
PROCEDURAL
PROGRAMMING TOOLS
SAS allows for complex
constructions involving IF
and ELSE statements. These
are helpful in expressing the
outcomes of complex logic.
data showing_if3;
length yesno $ 3;
input @1 yesno_code;
if yesno_code = 1 then yesno = "NO";
else if yesno_code = 2 then yesno = "YES";
else yesno = "INV";
datalines;
2
1
4
3
;;;;
run;
proc print data=showing_if3;
title1 "DEMONSTRATING RESULTS";
title2 "OF AN IF STATEMENT";
title3 "WITH A FOLLOWING ELSE IF AND ELSE";
run;
DEMONSTRATING RESULTS
OF AN IF STATEMENT
WITH A FOLLOWING ELSE IF AND ELSE
Obs    yesno    yesno_code
  1     YES          2
  2     NO           1
  3     INV          4
  4     INV          3
PROCEDURAL
PROGRAMMING TOOLS
In order to have SAS execute multiple
commands pursuant to a condition use a
DO block in connection with the
appropriate IF or ELSE statement.
Note that a DO block must close with an
END statement, and that END statements
close out DO blocks and not IF blocks.
data showing_if4;
length yesno $ 3;
input @1 yesno_code;
if yesno_code = 1 then yesno = "NO";
else if yesno_code = 2 then yesno = "YES";
else do;
yesno = "INV";
put "Observation " _n_ " has an invalid code.";
end;
datalines;
2
1
4
3
;;;;
run;
Observation 3  has an invalid code.
Observation 4  has an invalid code.
NOTE: The data set WORK.SHOWING_IF4 has 4 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
DEMONSTRATING RESULTS
OF AN IF STATEMENT
WITH A FOLLOWING ELSE IF AND ELSE
Obs    yesno    yesno_code
  1     YES          2
  2     NO           1
  3     INV          4
  4     INV          3
PROCEDURAL
PROGRAMMING TOOLS
LOOPS
In addition to designating blocks
of commands to be executed once,
you can also use DO to loop
through groups of commands
subject to certain conditions.
PROCEDURAL
PROGRAMMING TOOLS
LOOPS
DO WHILE loops iterate as long as the
test condition is true.
* demonstrating DO--WHILE loop:;
data _null_;
index_var = 0;
do while (index_var < 10);
put "index_var: " index_var;
index_var = index_var + 1;
end;
run;
index_var: 0
index_var: 1
index_var: 2
index_var: 3
index_var: 4
index_var: 5
index_var: 6
index_var: 7
index_var: 8
index_var: 9
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
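DO UNTIL loops iterate as long as the test
condition is false. The condition is tested
at the bottom of the loop, so the commands
inside always run at least once.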
* demonstrating DO--UNTIL loop:;
data _null_;
index_var = 0;
do until (index_var > 10);
put "index_var: " index_var;
index_var = index_var + 1;
end;
run;
index_var: 0
index_var: 1
index_var: 2
index_var: 3
index_var: 4
index_var: 5
index_var: 6
index_var: 7
index_var: 8
index_var: 9
index_var: 10
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
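The two loop types differ most visibly when the condition is already settled before the loop starts. A minimal sketch, reusing the INDEX_VAR name from the examples above with a starting value chosen for illustration:

```sas
data _null_;
    * DO WHILE tests at the top: the condition is false at once,
      so the loop body never runs;
    index_var = 10;
    do while (index_var < 10);
        put "WHILE loop body ran with index_var: " index_var;
        index_var = index_var + 1;
    end;

    * DO UNTIL tests at the bottom: the body runs once even though
      the stopping condition is already true;
    index_var = 10;
    do until (index_var >= 10);
        put "UNTIL loop body ran with index_var: " index_var;
        index_var = index_var + 1;
    end;
run;
```

Only the UNTIL message is written to the log, once, with INDEX_VAR equal to 10.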
DO loops can also be set up to iterate a set
number of times.
* demonstrating a DO loop with a set number of iterations:;
data _null_;
do index_var = 1 to 5;
put "index_var: " index_var;
end;
run;
index_var: 1
index_var: 2
index_var: 3
index_var: 4
index_var: 5
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
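Two details of the iterative DO loop are easy to miss: the BY clause sets the step, and the index variable keeps changing until the stop test fails. A short sketch, with values chosen only for illustration:

```sas
data _null_;
    * BY sets the increment. A negative value counts down;
    do index_var = 10 to 0 by -5;
        put "counting down: " index_var;
    end;
    * The index is stepped once more before the stop test fails,
      so it is past the end value when the loop finishes;
    put "after the loop: " index_var;
run;
```

The loop writes 10, 5 and 0, and the final PUT shows INDEX_VAR at -5.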
PROCEDURAL
PROGRAMMING TOOLS
ARRAYS
SAS arrays have some fetching and often
frustrating differences from arrays in other
programming languages.
Unlike other languages’ arrays that have a
group of values that have identical attributes
and no life outside of the array, arrays in SAS
consist of groups of variables that need to be
of the same type, but that do not necessarily
have anything else in common.
PROCEDURAL
PROGRAMMING TOOLS
ARRAYS
For example, SAS would allow
programmers to put RACE,
SEX, COUNTRY, and
ETHNICITY into a single
array, even though they are all
of different lengths.
PROCEDURAL
PROGRAMMING TOOLS
ARRAYS
SAS offers two methods for handling array
subscripts, called IMPLICIT and
EXPLICIT subscripting.
With IMPLICIT subscripting the
programmer does not need to be
concerned with articulating array
subscripts.
data implicit_arrays;
LENGTH RACE $ 15 SEX $ 6 ETHNICITY $ 12 COUNTRY $ 27;
subject = "99";
* NOTE THE USE OF THE DOLLAR SIGN FOR THE ARRAY OF CHARACTER VARIABLES:;
ARRAY DEMOS $ RACE SEX ETHNICITY COUNTRY;
ARRAY VITALS SYSTOLIC_BP DIASTOLIC_BP PULSE RESPIRATION WEIGHT;
run;
PROCEDURAL
PROGRAMMING TOOLS
ARRAYS
To loop through elements of an implicitly
subscripted array use the DO OVER
looping command.
* DEMONSTRATION OF NULLING OUT A SET OF VARIABLES
THROUGH USING ARRAYS:;
data implicit_arrays;
LENGTH RACE $ 15 SEX $ 6 ETHNICITY $ 12 COUNTRY $
27;
subject = "99";
ARRAY DEMOS $ RACE SEX ETHNICITY COUNTRY;
ARRAY VITALS SYSTOLIC_BP DIASTOLIC_BP PULSE
RESPIRATION WEIGHT;
DO OVER DEMOS;
DEMOS = " ";
END;
DO OVER VITALS;
VITALS = .;
END;
run;
For reasons obscure implicitly subscripted
arrays are often frowned upon.
PROCEDURAL
PROGRAMMING TOOLS
ARRAYS
For EXPLICITLY subscripted
arrays the programmer needs to
declare the number of elements or
use an * for a number of elements
that is not pre-determined.
data explicit_arrays1;
LENGTH RACE $ 15 SEX $ 6 ETHNICITY $ 12 COUNTRY $ 27;
subject = "99";
ARRAY DEMOS {4} $ RACE SEX ETHNICITY COUNTRY;
ARRAY VITALS {5} SYSTOLIC_BP DIASTOLIC_BP PULSE RESPIRATION WEIGHT;
run;
PROCEDURAL
PROGRAMMING TOOLS
ARRAYS
Looping through elements
of an explicitly subscripted
array requires DO loops that
use index variables.
data explicit_arrays1;
LENGTH RACE $ 15 SEX $ 6 ETHNICITY $ 12 COUNTRY $ 27;
subject = "99";
ARRAY DEMOS {4} $ RACE SEX ETHNICITY COUNTRY;
ARRAY VITALS {5} SYSTOLIC_BP DIASTOLIC_BP PULSE
RESPIRATION WEIGHT;
DO INDEX = 1 TO 4;
DEMOS{INDEX} = " ";
END;
DO INDEX = 1 TO 5;
VITALS{INDEX} = .;
END;
run;
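The * form of the declaration has not been shown yet, so here is a minimal sketch of it. It assumes the same variables as above; the dataset name EXPLICIT_ARRAYS2 is made up for illustration. The DIM function returns the number of elements in an array, so the loop bounds do not have to be typed in by hand.

```sas
data explicit_arrays2;
    length race $ 15 sex $ 6 ethnicity $ 12 country $ 27;
    subject = "99";
    * The asterisk lets SAS count the elements from the variable list;
    array demos  {*} $ race sex ethnicity country;
    array vitals {*} systolic_bp diastolic_bp pulse respiration weight;
    * DIM supplies the upper bound of each loop;
    do index = 1 to dim(demos);
        demos{index} = " ";
    end;
    do index = 1 to dim(vitals);
        vitals{index} = .;
    end;
    * Keep the loop counter out of the output dataset;
    drop index;
run;
```

If a variable is later added to either list, the loops pick it up without any further change.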
PROCEDURAL
PROGRAMMING TOOLS
ARRAYS
Note that in no case does the name of the
array stay in any datasets that are created.
Using an array in a subsequent DATA step
requires declaring it again.
PROCEDURAL
PROGRAMMING TOOLS
RECODING DATA
Since data do not always arrive in the
form that users of SAS output require,
programmers often have to recode them.
They often do this in DATA steps.
For example, users often need to report
ages in groups rather than as a specific
year value.
data men4;
set four;
if sex = "MALE";
* RECODING DATA USING ASSIGNMENT STATEMENTS:;
if age = . then age_group = "not reported";
if 0 <= age < 15 then age_group = "< 15";
if 15 <= age <= 24 then age_group = "15 - 24";
if 25 <= age <= 34 then age_group = "25 - 34";
if 35 <= age <= 44 then age_group = "35 - 44";
if 45 <= age <= 54 then age_group = "45 - 54";
if 55 <= age <= 64 then age_group = "55 - 64";
if age >= 65 then age_group = "65+";
run;
proc format;
value agegr
. = "not reported"
0 - 14 = "< 15"
15 - 24 = "15 - 24"
25 - 34 = "25 - 34"
35 - 44 = "35 - 44"
45 - 54 = "45 - 54"
55 - 64 = "55 - 64"
/* USING AN IMPOSSIBLY HIGH VALUE AS AN UPPER BOUND*/
65 - 999 = "65+";
run;
data men4;
set four;
if sex = "MALE";
* RECODING DATA USING A FORMAT:;
age_group = put(age,agegr.);
run;
PROCEDURAL
PROGRAMMING TOOLS
RECODING DATA
Note that the results of the
PUT function always go into
CHARACTER variables, and
that PUT and OUTPUT are not
inverses of one another.
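A quick way to check a recoding format is to run a few boundary values through the PUT function and write the results to the log. This is only a sketch: the format name AGEDEMO and its ranges are hypothetical, chosen to keep the example short.

```sas
* Hypothetical format with fewer groups than AGEGR;
proc format;
    value agedemo
        .        = "not reported"
        0 - 14   = "< 15"
        15 - 64  = "15 - 64"
        65 - 999 = "65+";
run;

data _null_;
    length age_group $ 12;
    * Test a missing value and the values on each side of a boundary;
    do age = ., 14, 15, 64, 65;
        * PUT function: numeric value in, character string out;
        age_group = put(age, agedemo.);
        put age= age_group=;
    end;
run;
```

Each log line shows the numeric AGE next to the character AGE_GROUP that the format produced for it.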
VARIABLE ATTRIBUTES
LABELS
As the data names that programmers use
frequently don’t mean anything to anyone
else, information users frequently request
labels to go with the variables.
A single LABEL statement can assign
labels to multiple variables.
data blood_pressure;
input subject sbp dbp;
label sbp = "Systolic Blood Pressure"
dbp = "Diastolic Blood Pressure"
;
datalines;
1 120 80
3 122 79
4 108 95
5 133 88
6 120 77
7 129 84
8 139 84
9 123 89
10 139 80
;
run;
Note that the LABEL statement is non-positional.
VARIABLE ATTRIBUTES
FORMATS
SAS can format variables in PROC steps, in
which case the format is only applied
through the duration of that PROC step, or
in a DATA step, in which case the format
is stored with the variable in the dataset
that the step creates and applies wherever
that dataset is used.
data men4;
set four;
* SINCE IT IS ASSIGNED IN A DATA
STEP THE FORMAT AGEGR. WILL
STAY WITH THE VARIABLE AGE
THROUGHOUT THIS PROGRAM.;
format age agegr.;
run;
Note that the FORMAT statement is non-positional.
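The difference between the two places to assign a format can be seen in a short sketch. Every name here (AGEDEMO, AGES_DEMO, AGES_FORMATTED) is hypothetical and the data are made up for illustration.

```sas
proc format;
    value agedemo
        low - 14  = "< 15"
        15 - 64   = "15 - 64"
        65 - high = "65+";
run;

data ages_demo;
    input age;
    datalines;
12
40
70
;
run;

* Format assigned in a PROC step: applies to this PROC PRINT only;
proc print data=ages_demo;
    format age agedemo.;
run;

* No format here, so the raw ages print;
proc print data=ages_demo;
run;

* Format assigned in a DATA step: stored with AGE in the new dataset;
data ages_formatted;
    set ages_demo;
    format age agedemo.;
run;

* Prints the grouped labels without any FORMAT statement;
proc print data=ages_formatted;
run;
```

The first and last listings show the age groups, while the second shows 12, 40 and 70.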
CONTACT
AARON RABUSHKA
AT
ARABUSHKA@INCRESEARCH.COM
QUESTIONS?