I Do, I DoW - Virginia SAS Users Group

I Do, I Do, I DoW!
A look at SAS DO and DoW loops
John Matro
Virginia Commonwealth University
Topics
• What is a DO loop?
• Some simple examples
• Using DO loops with SAS arrays
• Using DO loops for reading data
• The DoW loop
Acknowledgements
• Do Which? Loop, Until or While? A Review Of Data Step
And Macro Algorithms, Ronald J. Fehd, SAS Global
Forum 2007 proceedings,
http://www2.sas.com/proceedings/forum2007/0672007.pdf
• The Magnificent DO, Paul M. Dorfman, SESUG 2002
proceedings,
http://analytics.ncsu.edu/sesug/2002/TU05.pdf
• In Lockstep with the DoW-Loop, Paul M. Dorfman,
SESUG 2011 proceedings,
http://analytics.ncsu.edu/sesug/2011/SS01.Dorfman.pdf
Preliminaries
We will use sashelp.class in some of our examples:
Obs  Name     Sex  Age  Height  Weight
1    Alfred   M    14   69.0    112.5
2    Alice    F    13   56.5    84.0
3    Barbara  F    13   65.3    98.0
4    Carol    F    14   62.8    102.5
5    Henry    M    14   63.5    102.5
6    James    M    12   57.3    83.0
7    Jane     F    12   59.8    84.5
8    Janet    F    15   62.5    112.5
...
17   Ronald   M    15   67.0    133.0
18   Thomas   M    11   57.5    85.0
19   William  M    15   66.5    112.0
Preliminaries
DATA _NULL_;
SET sashelp.class (OBS=3);
PUT 'Hello ' age age= name= _N_=;
RUN;
Hello 14 Age=14 Name=Alfred _N_=1
Hello 13 Age=13 Name=Alice _N_=2
Hello 13 Age=13 Name=Barbara _N_=3
_NULL_: Means we are not creating a SAS table.
PUT: Writes info to the SAS log or to a file.
_N_: Automatic SAS variable that indicates the
current iteration of the Data step.
Preliminaries
You can modify _N_ :
DATA _NULL_;
  PUT 'Top:    ' _N_=;
  SET sashelp.class (OBS=3);
  _N_ = _N_ + 10;
  PUT 'Bottom: ' _N_= /;
RUN;

Top:    _N_=1
Bottom: _N_=11

Top:    _N_=2
Bottom: _N_=12

Top:    _N_=3
Bottom: _N_=13

Top:    _N_=4
Preliminaries
• A data step can have multiple SET statements.
• Each reads its own “virtual copy” of the file
(buffer); these copies are completely independent
of each other.
• Each begins reading at the first record.
• Each has its own file pointer to remember where
it stopped reading (in its virtual copy).
DATA _NULL_;
SET sashelp.class(OBS=2);
PUT name= age=;
SET sashelp.class(OBS=2);
PUT name= age=;
RUN;
Name=Alfred Age=14
Name=Alfred Age=14
Name=Alice Age=13
Name=Alice Age=13
Preliminaries
A data step stops executing when any SET
statement tries to read past the end of its file:
DATA _NULL_;
PUT _N_=;
SET sashelp.class (OBS=2);
PUT 'Middle';
SET sashelp.class (OBS=2);
RUN;
_N_=1
Middle
_N_=2
Middle
_N_=3
Preliminaries
On a SET statement, END= creates a temporary
variable, initialized to 0, that is set to 1 after the
last observation is read. In this example, we
name the variable 'eof'.
DATA _NULL_;
  SET sashelp.class END=eof;
  PUT _N_= eof=;
RUN;

_N_=1 eof=0
_N_=2 eof=0
_N_=3 eof=0
_N_=4 eof=0
...
_N_=17 eof=0
_N_=18 eof=0
_N_=19 eof=1
Preliminaries
Variables created by assignment in a data step
(as opposed to variables read with SET) are reset
to missing at the start of each data step iteration:
DATA _NULL_;
  SET sashelp.class (OBS=3);
  if (_N_=1) then a=0;
  a = a + age;
  PUT _N_= age= a=;
RUN;

_N_=1 Age=14 a=14
_N_=2 Age=13 a=.
_N_=3 Age=13 a=.
But you can override that reset by ‘retaining’
a variable…
Preliminaries
Retaining with a RETAIN statement:
DATA _NULL_;
  SET sashelp.class (OBS=3);
  RETAIN a 0;
  a = a + age;
  PUT _N_= age= a=;
RUN;

_N_=1 Age=14 a=14
_N_=2 Age=13 a=27
_N_=3 Age=13 a=40
RETAIN a 0: SAS will not reset 'a' on each data step
iteration. Also initializes 'a' to 0.
Preliminaries
Retaining with a 'sum statement':
DATA _NULL_;
  SET sashelp.class (OBS=3);
  a + age;
  b + 1;
  PUT a= b=;
RUN;

a=14 b=1
a=27 b=2
a=40 b=3
x + ___: Similar to x = x + ___ but it also
initializes 'x' to zero and retains 'x'.
(In SAS docs, see Sum statement.)
DO Loop: General Forms
Indexed DO loop:
DO variable = spec-1 <, …spec-n>;
where spec-n is of the form:
start <TO stop> <BY increment>
<WHILE(expression) | UNTIL(expression)>
Conditional DO loops:
DO WHILE (condition);
DO UNTIL (condition);
Special:
DO OVER arrayname;
Indexed DO Loop
DO variable = spec-1 <, …spec-n>;
<SAS statements>
END;
Where each spec-n is of the form:
start <TO stop> <BY increment>
<WHILE(expression) | UNTIL(expression)>
Indexed DO Loop
DATA _NULL_;
DO i = 1,2,-7,46;
PUT i=;
END;
RUN;
i=1
i=2
i=-7
i=46
Indexed DO Loop
DATA _NULL_;
DO i = 1 TO 10;
PUT i=;
END;
RUN;
i=1
i=2
i=3
i=4
i=5
i=6
i=7
i=8
i=9
i=10
Indexed DO Loop
You can modify the index variable, if
desired:
DATA _NULL_;
DO i = 1 TO 10;
PUT i=;
i = i + 2;
END;
RUN;
i=1
i=4
i=7
i=10
Indexed DO Loop
DATA test;
DO i=1 TO 10;
x = i * 4;
OUTPUT;
END;
RUN;
PROC PRINT DATA=test;
RUN;
Obs  i   x
1    1   4
2    2   8
3    3   12
4    4   16
5    5   20
6    6   24
7    7   28
8    8   32
9    9   36
10   10  40
Indexed DO Loop
DATA _NULL_;
DO i=1 TO 20 BY 2;
PUT i=;
END;
RUN;
i=1
i=3
i=5
i=7
i=9
i=11
i=13
i=15
i=17
i=19
Indexed DO Loop
DATA _NULL_;
DO i=20 TO 1 BY -2;
PUT i=;
END;
RUN;
i=20
i=18
i=16
i=14
i=12
i=10
i=8
i=6
i=4
i=2
Indexed DO Loop
DATA _NULL_;
DO i=1 BY 2 TO 20;
PUT i=;
END;
RUN;
i=1
i=3
i=5
i=7
i=9
i=11
i=13
i=15
i=17
i=19
Indexed DO Loop
DATA _NULL_;
a=0; b=10; c=2;
DO i = a TO b BY c;
PUT i=;
a=22; b=33; c=44;
END;
RUN;
i=0
i=2
i=4
i=6
i=8
i=10
SAS evaluates the start, stop, and increment values
once, before the first iteration, so changing a, b,
or c inside the loop does not affect it.
Conditional DO Loops
Condition is checked at top of loop:
DO WHILE (condition);
<SAS statements>
END;
Condition is checked at bottom of loop:
DO UNTIL (condition);
<SAS statements>
END;
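To make the difference concrete, here is a minimal sketch (not from the original slides) in which the condition is already decided before either loop starts. The starting value i = 5 is an arbitrary choice for illustration.

```sas
DATA _NULL_;
  i = 5;
  /* WHILE tests at the top: 5 LT 3 is false, so the body never runs */
  DO WHILE (i LT 3);
    PUT 'WHILE body: ' i=;
    i = i + 1;
  END;
  /* UNTIL tests at the bottom: the body runs once before the test */
  DO UNTIL (i GE 3);
    PUT 'UNTIL body: ' i=;
    i = i + 1;
  END;
RUN;
```

Only the UNTIL loop writes to the log (UNTIL body: i=5), because a DO UNTIL loop always executes its body at least once.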
WHILE Conditional DO Loop
DATA _NULL_;
i = 0;
DO WHILE (i LT 3);
i = i + 1;
PUT i=;
END;
RUN;
i=1
i=2
i=3
UNTIL Conditional DO Loop
DATA _NULL_;
i = 0;
DO UNTIL (i GE 3);
i = i + 1;
PUT i=;
END;
RUN;
i=1
i=2
i=3
Indexed DO Loop: WHILE/UNTIL
DATA _NULL_;
x = 0;
DO i=1 BY 1 WHILE (x LT 50);
x = i * 10;
PUT i= x=;
END;
RUN;
i=1 x=10
i=2 x=20
i=3 x=30
i=4 x=40
i=5 x=50
Indexed DO Loop: Multiple Specs
DATA _NULL_;
DO i= -4, 21 to 40 by 3
while(i<29), 7, 100,
61 to 66 by 2;
PUT i=;
END;
RUN;
i=-4
i=21
i=24
i=27
i=7
i=100
i=61
i=63
i=65
DO Loop: Auxiliary Statements
• CONTINUE: Skips the rest of the current
iteration and begins the next one.
• LEAVE: Exits the loop.
CONTINUE Statement
DATA _NULL_;
DO i = 1 TO 10;
IF (i EQ 5) THEN CONTINUE;
PUT i=;
END;
RUN;
i=1
i=2
i=3
i=4
i=6
i=7
i=8
i=9
i=10
CONTINUE Statement
DATA _NULL_;
i = 0;
DO UNTIL (i GE 10);
i = i + 1;
IF (i=5) THEN CONTINUE;
PUT i=;
END;
RUN;
i=1
i=2
i=3
i=4
i=6
i=7
i=8
i=9
i=10
LEAVE Statement
DATA _NULL_;
DO i = 1 TO 10;
IF (i EQ 5) THEN LEAVE;
PUT i=;
END;
RUN;
i=1
i=2
i=3
i=4
LEAVE Statement
DATA _NULL_;
DO i = 1 BY 1;
IF (i EQ 5) THEN LEAVE;
PUT i=;
END;
RUN;
i=1
i=2
i=3
i=4
LEAVE Statement
DATA _NULL_;
x = 0;
DO WHILE (1 = 1);
x = x + 100;
IF (x GE 500) THEN LEAVE;
PUT x=;
END;
RUN;
x=100
x=200
x=300
x=400
LEAVE Statement
Caution: The LEAVE statement stops the processing of only the current (innermost) DO loop or SELECT group. Here, LEAVE exits the inner loop, and the outer loop continues:
DATA _NULL_;
DO UNTIL(whatever);
DO UNTIL(something);
<SAS statements>
IF (z>10) THEN LEAVE;
END;
END;
RUN;
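The skeleton above is not runnable as written, so here is a small runnable sketch of the same idea. The loop bounds and the test on j are arbitrary choices for illustration.

```sas
DATA _NULL_;
  DO i = 1 TO 2;
    DO j = 1 TO 5;
      /* LEAVE exits only this inner loop */
      IF (j EQ 3) THEN LEAVE;
      PUT i= j=;
    END;
    /* Control resumes here, still inside the outer loop */
    PUT 'After inner loop: ' i=;
  END;
RUN;
```

The log should show j=1 and j=2 for i=1, then the 'After inner loop' line, and then the same pattern again for i=2, which confirms that the outer loop kept running.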
DO Loops With Arrays
In the next few slides, our data file is
named one with variables 'a', 'b', and 'c':
DATA one;
  INPUT a b c;
  DATALINES;
1 2 3
11 22 33
;
PROC PRINT;
RUN;

Obs  a   b   c
1    1   2   3
2    11  22  33
DO Loops With Arrays
Here we create three new variables:
DATA test;
  SET one;
  x = a + 100;
  y = b + 100;
  z = c + 100;
  PUT a= b= c= @17 x= y= z=;
RUN;

a=1 b=2 c=3     x=101 y=102 z=103
a=11 b=22 c=33  x=111 y=122 z=133
DO Loops With Arrays
• We created 3 new variables, based on
the original 3 variables.
• The coding was identical, except for the
variable names.
• We can ‘automate’ that process by using
arrays and a DO loop.
DO Loops With Arrays
DATA test;
SET one;
ARRAY jack (3) a b c;
ARRAY jill (3) x y z;
DO i = 1 TO 3;
jill{i} = jack{i} + 100;
END;
PUT a= b= c= @17 x= y= z=;
RUN;

a=1 b=2 c=3     x=101 y=102 z=103
a=11 b=22 c=33  x=111 y=122 z=133
DO Loops With Arrays
• Each ARRAY statement creates/defines an array
that represents 3 variables.
• If SAS sees jack{2} (for example), it substitutes
the 2nd variable in the 'jack' array --- that is,
variable 'b'.
• Likewise, for jill{2} it substitutes variable 'y'.
• Thus:
jill{2} = jack{2} + 100;
is equivalent to:
y = b + 100;
Conditional DO Loops With Arrays
DATA test;
SET one;
ARRAY jack (3) a b c;
ARRAY jill (3) x y z;
i = 0;
DO WHILE (i LT 3);
i = i + 1;
jill{i} = jack{i} + 100;
END;
PUT a= b= c= @17 x= y= z=;
RUN;

a=1 b=2 c=3     x=101 y=102 z=103
a=11 b=22 c=33  x=111 y=122 z=133
DO OVER Loop With Arrays
DATA test;
SET one;
ARRAY jack a b c;
ARRAY jill x y z;
DO OVER jack;
jill = jack + 100;
END;
PUT a= b= c= @17 x= y= z=;
RUN;

a=1 b=2 c=3     x=101 y=102 z=103
a=11 b=22 c=33  x=111 y=122 z=133
DO OVER Loop With Arrays
• DO OVER is no longer documented by SAS (it was
apparently last documented in Version 6), so use it
with caution.
• You must omit the count (3) portion of the
ARRAY statement.
• In the DO loop, you do not use any indexing.
SAS does all the indexing for you, processing
one variable at a time from each array, until it
has exhausted all the variables in the array
specified in the DO statement.
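If you would rather stay with documented syntax, a sketch along the following lines should give the same result with explicitly subscripted arrays. It assumes the same data set 'one' as above; {*} and DIM() let SAS count the array elements, so the loop bound is not hard-coded.

```sas
DATA test;
  SET one;
  ARRAY jack {*} a b c;     /* {*}: SAS counts the elements */
  ARRAY jill {3} x y z;
  DO i = 1 TO DIM(jack);    /* DIM returns the number of elements (3) */
    jill{i} = jack{i} + 100;
  END;
  DROP i;                   /* keep the index out of the output table */
  PUT a= b= c= @17 x= y= z=;
RUN;
```

The trade-off is one extra index variable in exchange for syntax that appears in the current SAS documentation.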
Using DO Loops For Input
For data, we will use sashelp.class, stored in the
SAS Work library as 'class':
Obs  Name     Sex  Age  Height  Weight
1    Alfred   M    14   69.0    112.5
2    Alice    F    13   56.5    84.0
3    Barbara  F    13   65.3    98.0
4    Carol    F    14   62.8    102.5
5    Henry    M    14   63.5    102.5
6    James    M    12   57.3    83.0
7    Jane     F    12   59.8    84.5
8    Janet    F    15   62.5    112.5
...
17   Ronald   M    15   67.0    133.0
18   Thomas   M    11   57.5    85.0
19   William  M    15   66.5    112.0
Using DO Loops For Input
The 'traditional' way to read data uses the implied
data step loop ('automatic loop' or 'observation
loop'):
DATA test;
SET class;
RUN;
NOTE: There were 19 observations
read from the data set WORK.CLASS.
NOTE: The data set WORK.TEST has 19
observations and 5 variables.
Using DO Loops For Input
Dorfman: “The automatic loop is engraved in the SAS
usage mentality to such an extent that it has attained an
almost religious status. ... As a result, almost every time
when a file ... has to be processed, an attempt is
subconsciously made to use the implied loop, whatever it
takes.
Such a rigid approach is practically tantamount to forcing a
program into the fixed cage of an existing programming
construct. But programming is not meant to be this way.
It makes sense to choose the tool best fitting the task,
rather than tweaking the task to fit the tool.”
(excerpt from The Magnificent Do)
Using DO Loops For Input
Instead, we can use a DO loop in the data step:
DATA test;
  DO UNTIL(eof);
    SET class END=eof;
    OUTPUT;
  END;
RUN;

NOTE: There were 19 observations
read from the data set WORK.CLASS.
NOTE: The data set WORK.TEST has 19
observations and 5 variables.
Each execution of SET reads another record.
Using DO Loops For Input
The obvious question: WHY use a DO loop?
To help answer that, notice how many times SAS
processed the data step:
DATA test;
PUT _N_=;
DO UNTIL(eof);
SET class END=eof;
OUTPUT;
END;
RUN;
_N_=1
_N_=2
All 19 records are read during the first iteration.
The second iteration begins (printing _N_=2) and
ends as soon as SET tries to read past the end of
the file.
Using DO Loops For Input
Because the data step makes only one iteration
to read the data, this method is ideal for tasks
that require:
• Retaining the values of variables.
• Performing actions before and after
reading the data.
• Reading multiple files separately in one
data step.
• Performing break-event processing
(the 'DoW loop').
Using DO Loops For Input
Task #1:
• Write a header containing today's date.
• Read the data, select only the 'age=14'
records, and print 'name'.
• Write a trailer containing today's date and a
count of the selected records.
Using DO Loops For Input
Here is an example of the traditional approach:
DATA _NULL_;
SET class END=eof;
RETAIN count 0 date;
IF (_N_ EQ 1) THEN DO;
date = TODAY();
PUT date MMDDYY10.;
END;
IF (eof) THEN PUT date MMDDYY10. @15 count=;
IF (age NE 14) THEN DELETE;
PUT name=;
count = count + 1;
RUN;
Using DO Loops For Input
What's 'wrong' with the traditional approach:
– Must retain 'count' and 'date', otherwise
they get reset to missing on each iteration of
the data step.
– For each record, must evaluate the header
condition and the trailer condition.
– The logic flow is not simple.
These problems arise because all actions, even if
needed only once, must be coded inside the
implied loop.
Using DO Loops For Input
Using a DO loop avoids these problems:
DATA _NULL_;
date = TODAY();
PUT date MMDDYY10.;
count = 0;
DO UNTIL(eof);
SET class END=eof;
IF (age NE 14) THEN CONTINUE;
PUT name=;
count = count + 1;
END;
PUT date MMDDYY10. @15 count=;
STOP;
RUN;
Using DO Loops For Input
Our program is straightforward:
• compute 'date' and 'count' and print the
header
• read the data and do the required processing
• print the trailer
We did not have to:
• evaluate any begin or end conditions
• retain 'date' or 'count', since there was only
one pass through the data step
Multiple Input Files
We next consider a task involving two input files.
Our input files are named file1 and file2:
DATA file1;
INPUT score1;
datalines;
12
26
37
49
;
DATA file2;
INPUT score2;
datalines;
10
20
30
;
Multiple Input Files
Task #2:
• Print a beginning banner.
• In file1, compute the average of 'score1'.
(that average is 31)
• Print that average and the file1 record count.
• In file2, add the above average to 'score2' and
print each new value as 'score3'.
• Print an ending banner.
Multiple Input Files: DO Loop
DATA _NULL_;
PUT 'STARTING';
count = 0;
total = 0;
DO UNTIL (eof1);
SET file1 END=eof1;
IF MISSING(score1) THEN
CONTINUE;
count = count + 1;
total = total + score1;
END;
avg = total / count;
PUT count= avg=;
DO UNTIL (eof2);
SET file2 END=eof2;
score3 = score2 + avg;
PUT score3=;
END;
PUT 'DONE';
STOP;
RUN;
STARTING
count=4 avg=31
score3=41
score3=51
score3=61
DONE
Multiple Input Files: Traditional
DATA _NULL_;
SET file1 (IN=in1)
file2 (IN=in2) END=eof;
RETAIN total count 0 avg;
IF (_N_=1) THEN DO;
PUT 'STARTING';
END;
IF (in1) THEN DO;
IF MISSING(score1) THEN
DELETE;
total = total + score1;
count = count + 1;
END;
obs2 + in2;
IF (in2 AND obs2=1)
THEN DO;
avg = total / count;
PUT count= avg=;
END;
IF (in2) THEN DO;
score3 = score2 + avg;
PUT score3=;
END;
IF (eof) THEN PUT 'DONE';
RUN;
Using DO Loops For Input
To mimic the error handling of the Data step
implied loop:
DATA _NULL_;
DO _N_ = 1 BY 1 UNTIL(eof);
_ERROR_ = 0;
SET class END=eof;
<SAS statements>
IF _ERROR_ THEN PUT _ALL_;
END;
RUN;
The DoW Loop
• Named after Ian Whitlock (the 'renowned
Master of the SAS Universe') and perhaps
Don Henderson.
• Uses DO loop(s) to read data for tasks which
require break-event processing, such as:
– BY-group processing (FIRST. and LAST.)
– checking for a specific value or a missing
value
The DoW Loop
Basic structure:
Data ...;
<Stuff done before break-event>;
Do <Index Specs> Until (Break-Event);
Set A;
<Stuff done for each record>;
End;
<Stuff done after break-event... >;
Run;
The DoW Loop
Dorfman: “The intent of organizing such a
structure is to achieve logical isolation of
instructions executed between two successive
break-events from actions performed before and
after a break-event, and to do it in the most
programmatically natural manner.”
The DoW Loop
Our data file is named base, has variables 'id'
and 'score', and is sorted by 'id':
DATA base;
  INPUT id $ score;
  DATALINES;
a 1
a 2
b 3
b 4
b 5
;
PROC SORT DATA=base;
  BY id;
PROC PRINT; RUN;

Obs  id  score
1    a   1
2    a   2
3    b   3
4    b   4
5    b   5
The DoW Loop
Task #3: Compute the mean of the 'score'
variable for each 'id' group.
Obs  id  score
1    a   1
2    a   2
3    b   3
4    b   4
5    b   5
Traditional approach:

DATA new;
  SET base;
  BY id;
  RETAIN count total;
  IF FIRST.id THEN DO;
    count = 0;
    total = 0;
  END;
  count = count + 1;
  total = total + score;
  IF LAST.id THEN DO;
    mean = total / count;
    OUTPUT;
  END;
PROC PRINT DATA=new; RUN;

id  score
a   1
a   2
b   3
b   4
b   5

Obs  id  score  count  total  mean
1    a   2      2      3      1.5
2    b   5      3      12     4.0
DoW loop approach:

DATA new;
  count = 0;
  total = 0;
  DO UNTIL (LAST.id);
    SET base;
    BY id;
    count = count + 1;
    total = total + score;
  END;
  mean = total / count;
PROC PRINT DATA=new; RUN;

id  score
a   1
a   2
b   3
b   4
b   5

Obs  count  total  id  score  mean
1    2      3      a   2      1.5
2    3      12     b   5      4.0
The DoW Loop
Dorfman: “What makes the DOW-loop special?
It is all in the logic. The construct
programmatically separates the before-, during-,
and after-group actions in the same manner and
sequence as does the stream-of-the-consciousness logic:
1. If an action is to be done before the group is
processed, simply code it before the DOW-loop.
Note that is unnecessary to predicate this
action by the IF FIRST.ID condition.
2. If it is to be done with each record, code it
inside the loop.
3. If it has to be done after the group, like
computing an average and outputting
summary values, code it after the DOW-loop.
Note that is unnecessary to predicate this
action by the IF LAST.ID condition.”
The DoW Loop
We can "improve" the previous DoW Loop
approach by changing how we write the DO
statement . . .
DoW loop approach (as before):

DATA new;
  count = 0;
  total = 0;
  DO UNTIL (LAST.id);
    SET base;
    BY id;
    count = count + 1;
    total = total + score;
  END;
  mean = total / count;
PROC PRINT DATA=new; RUN;

Obs  count  total  id  score  mean
1    2      3      a   2      1.5
2    3      12     b   5      4.0
DoW loop approach (improved):

DATA new;
  DO n = 1 BY 1 UNTIL (LAST.id);
    SET base;
    BY id;
    total = SUM(total,score);
  END;
  mean = total / n;
PROC PRINT DATA=new; RUN;

Obs  n  id  score  total  mean
1    2  a   2      3      1.5
2    3  b   5      12     4.0
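DoW loop approach (further improved):

DATA new;
  DO _N_ = 1 BY 1 UNTIL (LAST.id);
    SET base;
    BY id;
    total = SUM(total,score);
  END;
  mean = total / _N_;
PROC PRINT DATA=new; RUN;

Obs  id  score  total  mean
1    a   2      3      1.5
2    b   5      12     4.0

Using the automatic variable _N_ as the loop index
keeps the counter out of the output table, because
_N_ is never written to a data set.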
The DoW Loop
Task #4: Create a table with the mean 'score' for
each 'id' group merged in:
Obs  id  score
1    a   1
2    a   2
3    b   3
4    b   4
5    b   5

Obs  id  score  mean
1    a   1      1.5
2    a   2      1.5
3    b   3      4.0
4    b   4      4.0
5    b   5      4.0
The DoW Loop
The 'traditional' approach would be to use
one data step, as we did earlier, to create
an intermediate file containing the
averages:
id  mean
a   1.5
b   4.0
… and then use a second data step to merge
that file with the original file.
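The slides do not show the two-step version, so the following is only a sketch of what it might look like. The intermediate table name 'means' is hypothetical, and the first step borrows the single DoW loop from the earlier slides rather than the FIRST./LAST. coding.

```sas
/* Step 1: one record per id, holding the group mean */
DATA means (KEEP=id mean);
  DO _N_ = 1 BY 1 UNTIL (LAST.id);
    SET base;
    BY id;
    total = SUM(total, score);
  END;
  mean = total / _N_;
RUN;

/* Step 2: merge the means back onto every original record */
DATA new;
  MERGE base means;
  BY id;
RUN;
```

This works, but it reads 'base' in two separate steps and leaves an extra table behind.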
The DoW Loop
OR …
We could do it all in one data step using
the DoW loop!
DoW loop approach:

DATA new;
  DO _N_ = 1 BY 1 UNTIL (LAST.id);
    SET base;
    BY id;
    total = SUM(total, score);
  END;
  mean = total / _N_;
  DO UNTIL (LAST.id);
    SET base;
    BY id;
    OUTPUT;
  END;
PROC PRINT DATA=new; RUN;

id  score
a   1
a   2
b   3
b   4
b   5

Obs  id  score  total  mean
1    a   1      3      1.5
2    a   2      3      1.5
3    b   3      12     4.0
4    b   4      12     4.0
5    b   5      12     4.0
How It Works:
This data step has two SET statements, and each one
reads the same file.
Each SET statement reads its own “virtual copy” of the
file. The two virtual copies are completely independent
of each other.
Likewise, each SET statement uses its own file pointer to
mark where it stops reading in its virtual copy, and these
file pointers are independent of each other.
The first DO loop is similar to our earlier program. It reads all
the ID='a' records and computes a running total for 'score',
storing it in the 'total' variable. SAS sets a file pointer to
remember where it stopped reading in this file.
Next, the 'mean' variable is computed, using the values for
'total' and '_N_' obtained in the first DO loop. This value for
'mean' will be used in the second DO loop.
The second DO loop then reads its copy of the base file
(beginning with case 1). For each case it reads, it does an
OUTPUT to the new file. Each case will contain the 'mean'
variable, that was computed above. The DO loop continues
until it has read and output all the records for ID='a'.
After all the ID='a' cases have been processed, the second DO
loop stops, and SAS sets a file pointer (a different pointer,
independent of the one used in the first DO loop) to
remember where it stopped reading in this second copy of the
base file.
SAS reaches the end of the data step.
Now SAS goes through the data step again. As such, it
resets 'total' to missing (so there is no need to manually
reset it).
In the first DO loop, SAS begins reading according to
where the pointer was set previously in that first DO
loop. Thus, it starts with the first ID='b' record. Etc.
The 'mean' variable is computed (for the ID='b' records).
The second DO loop begins reading according to where
the pointer was set previously in the second DO loop
(the ID='b' records). Each case is output, and it includes
the 'mean' computed above (which is the average for
the ID='b' records).
SAS reaches the end of the data step.
SAS goes through the data step a third time.
This time, the SET statement in the first DO loop
encounters the end of its file. The data step stops
processing.
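One way to watch this sequence happen is to add PUT statements to both loops. The sketch below is not from the original slides: it uses DATA _NULL_ and drops the OUTPUT statement, since the only goal is to trace the order of the reads in the log. The 'Loop 1' and 'Loop 2' labels are just illustrative text.

```sas
DATA _NULL_;
  /* First loop: read one id group and accumulate the total */
  DO _N_ = 1 BY 1 UNTIL (LAST.id);
    SET base;
    BY id;
    total = SUM(total, score);
    PUT 'Loop 1 read: ' id= score= total=;
  END;
  mean = total / _N_;
  /* Second loop: re-read the same group through its own SET */
  DO UNTIL (LAST.id);
    SET base;
    BY id;
    PUT 'Loop 2 read: ' id= score= mean=;
  END;
RUN;
```

The log should show the two 'Loop 1' lines for id=a, then the two 'Loop 2' lines for id=a with mean=1.5, and then the same pattern for the three id=b records with mean=4, matching the walk-through above.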
The DoW Loop
We can "improve" the previous DoW Loop
approach by changing how we write the
second DO statement . . .
DoW loop approach (as before):

DATA new;
  DO _N_ = 1 BY 1 UNTIL (LAST.id);
    SET base;
    BY id;
    total = SUM(total, score);
  END;
  mean = total / _N_;
  DO UNTIL (LAST.id);
    SET base;
    BY id;
    OUTPUT;
  END;
PROC PRINT DATA=new; RUN;

Obs  id  score  total  mean
1    a   1      3      1.5
2    a   2      3      1.5
3    b   3      12     4.0
4    b   4      12     4.0
5    b   5      12     4.0
DoW loop approach (improved):

DATA new;
  DO _N_ = 1 BY 1 UNTIL (LAST.id);
    SET base;
    BY id;
    total = SUM(total, score);
  END;
  mean = total / _N_;
  DO _N_ = 1 TO _N_;
    SET base;
    OUTPUT;
  END;
PROC PRINT DATA=new; RUN;

Obs  id  score  total  mean
1    a   1      3      1.5
2    a   2      3      1.5
3    b   3      12     4.0
4    b   4      12     4.0
5    b   5      12     4.0
Summary
• The DO loop is a useful tool for performing
various repetitive tasks in SAS.
• In certain situations when reading data, it
provides an alternative and perhaps better
method than using the implied data step
loop.
• The DoW loop can be a valuable tool for
performing break-event processing.
John Matro
Virginia Commonwealth University
jmatro@vcu.edu
SAS and all other SAS Institute Inc. product or service
names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. In the
USA and other countries ® indicates USA registration.