Understanding the SAS® DATA Step and the Program Data Vector

advertisement
Understanding the SAS® DATA Step
and the Program Data Vector
Steven J. First, President
2997 Yarmouth Greenway Drive, Madison, WI 53711
Phone: (608) 278-9964, Web: www.sys-seminar.com
Understanding the SAS® DATA Step and the Program Data Vector
1
Understanding the SAS® DATA Step and the PDV
This presentation was written by Systems Seminar Consultants
Consultants, Inc
Inc.
SSC specializes SAS software and offers SAS:
• Training Services
• Consulting Services
• Help Desk Plans
• Newsletter subscriptions to The Missing Semicolon™
Semicolon .
SAS is a registered trademark of SAS Institute Inc. in the USA and other
countries. The Missing Semicolon is a trademark of Systems Seminar
C
Consultants,
lt t Inc.
I
Understanding the SAS® DATA Step and the Program Data Vector
2
Global Warming Part 1
Understanding the SAS® DATA Step and the Program Data Vector
3
Abstract
Th SAS system
The
t
is
i made
d up off SAS PROCs
PROC and
d the
th DATA Step
St
The DATA Step has:
• an excellent, full fledged programming language
• statements to read and write almost any type of data value
• Conversion and calculation of new data
• Looping
• Interfaces
y
• Arrays
• Much, much more.
In many ways,
ways the design of the DATA step along with its powerful
statements, is what makes the SAS language so popular.
Understanding the SAS® DATA Step and the Program Data Vector
4
Abstract (continued)
This paper will
Thi
ill address:
dd
• How the DATA step fits with the rest of the SAS System
• DATA step assumptions and defaults
• internal structures such as buffers, and the Program Data Vector
• compiler statements
• executable statements
Understanding the SAS® DATA Step and the Program Data Vector
5
Introduction
The SAS system’s
Th
t ’ origins
i i are in
i the
th 1960’s
1960’ and
d 1970’
1970’s with:
ith
• James Goodnight
• John Sall
• A. J. Barr
• others
Concepts in the design include:
• “self defining files”
y
of default assumptions
p
• a system
• procedures for commonly used routines
• data handling step that would evolve into the SAS DATA step
Understanding the SAS® DATA Step and the Program Data Vector
6
Introduction
The DATA step
Th
t
in
i my opinion:
i i
• extremely simple
• elegant design
• continues today
• 30 years of enhancements.
Understanding the SAS® DATA Step and the Program Data Vector
7
Structure of SAS
SAS consists
i t of:
f
1. a data handling language (DATA step)
2. a library of pre-written procedures ( PROC step)
S A S
STATEMENTS
RAW
DATA
SAS
DATA
STEP
S A S
SUPERVISOR
SAS
DATA
SET
Understanding the SAS® DATA Step and the Program Data Vector
SAS
PROC
STEP
REPORT
8
Purpose of the DATA Step
“Gett th
“G
the d
data
t iin shape”
h
” ffor llater
t PROC
PROCs and
d DATA steps.
t
• SAS PROCs can only read SAS datasets
• We might have some other type of file to process
• SAS dataset has built in descriptor that keeps track of names and
attributes
• later steps don’t have to remember as many details
If we don’t have well defined data
• DATA step gives us the power to read and write virtually any kind of file
• We can do calculations and computations on a single row of data
• It has a very powerful data handling language
Understanding the SAS® DATA Step and the Program Data Vector
9
SAS DATA Step Overview
DATA steps
t
can read
d and
d write
it mostt types
t
off data
d t stored
t d on your computer.
t
other
SAS
d
dataset
raw
disk,
tape
instream
data
DBMS,
program
products
d
SAS/ACCESS
get the data in
shape for analysis
S S/ CC SS
SAS/ACCESS
other
SAS
dataset
instream
data
raw
disk,
tape
reports
DBMS,
program
products
Notes:
• DATA step output is usually a SAS dataset but can be other files
files.
• Access to non-SAS database management systems requires a
SAS/ACCESS product.
Understanding the SAS® DATA Step and the Program Data Vector
10
Default Assumptions
M
Many
assumptions
ti
are made
d tto save titime and
d effort.
ff t
As computer scientist I study and use many programming languages
• Early on I was intrigued by the cleverness and common sense of SAS
• Many tedious programming tasks were eliminated through defaults
• System still provided a means to override those defaults when necessary
• Our job as a DATA step programmer then is different from other
languages
In many ways our tasks are:
• understanding the defaults
g how to work with them
• knowing
• override them as necessary.
Understanding the SAS® DATA Step and the Program Data Vector
11
Default Assumptions (continued)
E
Examples
l Of SAS D
Defaults:
f lt
•
•
•
•
•
•
•
•
Handling compile and execution naming and storage details
A dataset descriptor that makes SAS datasets “self defining”
Generating data set names if omitted
When reading a data set, assume most recently created dataset if not
specified
Processing all the rows and columns in a file
Automatically opening and closing of files
Automatically controlling data initialization
DATA step looping, data set output, and end of file checking
Understanding the SAS® DATA Step and the Program Data Vector
12
Default Assumptions (continued)
E
Examples
l Of SAS D
Defaults:
f lt
•
•
•
•
•
•
•
Automatically defining storage areas for each variable referenced without
need
d tto predefine
d fi th
them
A default length of 8 was assumed for all variables
A assumption that a variable is numeric if not specified
LIST input assumed that data values would be separated by blanks rather
than specifying exact columns
SUBSETTING IF statements which imply to continue processing if a
condition
diti is
i true,
t
else
l delete
d l t the
th observation.
b
ti
(E
(Ex. If rate
t > 10;
10 )
When no comparison is made in an IF statement, assume to be checking
for 1 (true) (Ex. If eof then put ‘At end’;)
Ab i t d sum statements.
Abreviated
t t
t (Ex.
(E Salestot+sales)
S l t t
l )
Understanding the SAS® DATA Step and the Program Data Vector
13
Compiling a DATA Step
A mostt languages,
As
l
the
th DATA step
t is
i fifirstt compiled
il d th
then executed.
t d
•
•
•
•
•
•
•
•
Languages can be compiled, interpreted, SAS is a hybrid language
Some features from other languages
Some unique features
Sometimes difficult to separate compile versus execution events with SAS
DATA step compiler examines SAS statements for syntax data structures
generates an executable program
q to SAS compiler
p
checking
g for the existence of resources
Unique
Assumptions that it “inserts” into the source code.
Understanding the SAS® DATA Step and the Program Data Vector
14
Data Structures
“G tti the
“Getting
th data
d t in
i shape”
h
” needs
d data
d t structures
t t
to
t hold
h ld data
d t as processed.
d
•
•
•
All computer languages need to address this
Each language may name the structures differently
There is a lot of similarity in the way most languages store data.
Understanding the SAS® DATA Step and the Program Data Vector
15
A Typical SAS Job
1234567890123456789012345678901234
input buffer
BETH
CHRIS
JOHN
H
H
H
12
2
7
4822.12
233.11
678.43
982.10
94.12
150.11
data softsale;
;
infile rawin;
input name $1-10 division $12 years 15-16 sales 19-25 expense 27-34;
run;
PROGRAM
DATA
VECTOR
Name:
Type:
Length:
Format:
Informat:
Label:
Flags:
NAME
CHAR
10
DIVISION
CHAR
1
YEARS
NUM
8
SALES
NUM
8
EXPENSE _ERROR_ _N_
NUM
NUM
NUM
8
8
8
D
D
Value
DATASET
DESCRIPTOR
PORTION
(DISK)
(
)
DATASET
DATA
PORTION
Name:
Type:
Length:
g
Format:
Informat:
Label:
NAME
CHAR
10
DIVISION
CHAR
1
BETH
CHRIS
JOHN
H
H
H
YEARS
NUM
8
12
2
7
SALES
NUM
8
EXPENSE
NUM
8
4822.12 982.10
233.11 94.12
678.43 150.11
Understanding the SAS® DATA Step and the Program Data Vector
16
Raw File Buffers
DATA steps
t
reading/writing
di / iti
“raw”
“
” or non-SAS
SAS data
d t need
d memory buffers.
b ff
• Needed to temporarily hold at least one input record at a time.
• Also times when multiple lines of input can be held in buffers
• Allows the program to logically read later rows before earlier ones.
• Buffer contains the complete input and output record, regardless of
whether the INPUT statement reads all of the columns.
• SAS datasets and RDMS (which usually appear as SAS datasets) do not
use raw buffers as the files are already in “shape”.
Understanding the SAS® DATA Step and the Program Data Vector
17
Raw File Buffers
1234567890123456789012345678901234
input buffer
BETH
CHRIS
JOHN
H
H
H
12
2
7
4822.12
233.11
678.43
982.10
94.12
150.11
data softsale;
infile rawin;
input name $1-10 division $12 years 15-16 sales 19-25 expense 27-34;
run;
PROGRAM
DATA
VECTOR
Name:
Type:
Length:
Format:
Informat:
Label:
Flags:
NAME
CHAR
10
DIVISION
CHAR
1
YEARS
NUM
8
SALES
NUM
8
EXPENSE _ERROR_ _N_
NUM
NUM
NUM
8
8
8
D
D
Value
DATASET
DESCRIPTOR
PORTION
(DISK)
(
)
DATASET
DATA
PORTION
Name:
Type:
Length:
g
Format:
Informat:
Label:
NAME
CHAR
10
DIVISION
CHAR
1
BETH
CHRIS
JOHN
H
H
H
YEARS
NUM
8
12
2
7
SALES
NUM
8
EXPENSE
NUM
8
4822.12 982.10
233.11 94.12
678.43 150.11
Understanding the SAS® DATA Step and the Program Data Vector
18
The LOGICAL PROGRAM DATA VECTOR (PDV)
A second
d memory area ffor data
d t manipulation,
i l ti
conversion,
i
refinement.
fi
t
Areas are needed for:
• Inputting and input formatting (informatting) desired variables
• Revising existing values
• Computing new variables
• System indicators and flags
• Called Logical Program Data Vector (PDV).
g g have a similar working
g area. ((COBOL Working
g Storage).
g )
• Manyy languages
• “Retained” and “Non-retained” variables are stored in separate physical
program data vectors, together make Logical Program Data Vector
• All variables referenced be automaticallyy defined in the PDV by
y compiler
p
using characteristics from the first reference of a variable.
p
agemo=age/12;
g
g
;
Example:
Understanding the SAS® DATA Step and the Program Data Vector
19
The LOGICAL PROGRAM DATA VECTOR (PDV)
1234567890123456789012345678901234
input buffer
BETH
CHRIS
JOHN
H
H
H
12
2
7
4822.12
233.11
678.43
982.10
94.12
150.11
data softsale;
infile rawin;
input name $1-10 division $12 years 15-16 sales 19-25 expense 27-34;
run;
PROGRAM
DATA
VECTOR
Name:
Type:
Length:
Format:
Informat:
Label:
Flags:
NAME
CHAR
10
DIVISION
CHAR
1
YEARS
NUM
8
SALES
NUM
8
EXPENSE _ERROR_ _N_
NUM
NUM
NUM
8
8
8
D
D
Value
DATASET
DESCRIPTOR
PORTION
(DISK)
(
)
DATASET
DATA
PORTION
Name:
Type:
Length:
g
Format:
Informat:
Label:
NAME
CHAR
10
DIVISION
CHAR
1
BETH
CHRIS
JOHN
H
H
H
YEARS
NUM
8
12
2
7
SALES
NUM
8
EXPENSE
NUM
8
4822.12 982.10
233.11 94.12
678.43 150.11
Understanding the SAS® DATA Step and the Program Data Vector
20
The Final SAS Dataset
A ““self-defining”
lf d fi i ” d
dataset.
t
t
•
•
•
The dataset descriptor contains attributes for all kept variables plus data
sett labeling
l b li iinformation.
f
ti
The Dataset Data portion contains data values for all output rows and
kept columns.
P
Pseudo-variables
d
i bl are nott kkept.
t
Understanding the SAS® DATA Step and the Program Data Vector
21
The Final SAS Dataset
1234567890123456789012345678901234
input buffer
BETH
CHRIS
JOHN
H
H
H
12
2
7
4822.12
233.11
678.43
982.10
94.12
150.11
data softsale;
infile rawin;
input name $1-10 division $12 years 15-16 sales 19-25 expense 27-34;
run;
PROGRAM
DATA
VECTOR
Name:
Type:
Length:
Format:
Informat:
Label:
Flags:
NAME
CHAR
10
DIVISION
CHAR
1
YEARS
NUM
8
SALES
NUM
8
EXPENSE _ERROR_ _N_
NUM
NUM
NUM
8
8
8
D
D
Value
DATASET
DESCRIPTOR
PORTION
(DISK)
(
)
DATASET
DATA
PORTION
Name:
Type:
Length:
g
Format:
Informat:
Label:
NAME
CHAR
10
DIVISION
CHAR
1
BETH
CHRIS
JOHN
H
H
H
YEARS
NUM
8
12
2
7
SALES
NUM
8
EXPENSE
NUM
8
4822.12 982.10
233.11 94.12
678.43 150.11
Understanding the SAS® DATA Step and the Program Data Vector
22
Variable Attributes
Th compiler
The
il assigns
i
each
hd
defined
fi d variable
i bl severall characteristics.
h
t i ti
•
•
•
•
•
•
•
•
•
Relative variable number
Position in the dataset
Name
Data type
Length in bytes
Informat
Format
Variable label
Flags to indicate dropping, retaining, handling of missing values for each
variable
Understanding the SAS® DATA Step and the Program Data Vector
23
Data Types and Conversion
All SAS values
l
are one off ttwo d
data
t ttypes: numeric
i or character.
h
t
•
•
•
•
•
•
•
Hundreds of different data types can be read or written via the PDV, only
t
two
stored
t d
In PDV, SAS Dataset every character value is stored as a native EBCDIC
or ASCII value with length between 1 and at least 32767
N
Numerics
i stored
t d as d
double
bl precision
i i flfloating
ti point
i t values
l
with
ith llength
th
between 3 and 8 bytes.
Storing only two data types greatly simplifies things for SAS datasets
moves the
th complication
li ti off converting
ti diff
differentt d
data
t ttypes ((packed,
k d bi
binary,
etc.) to the INPUT and PUT statement along with appropriate
INFORMATS and FORMATS.
Floating point for numbers with a length of 8 allows for storage of very
large numbers (or small) without overflow
Floating point does have minor mathematical issues of its own.
Understanding the SAS® DATA Step and the Program Data Vector
24
Data Types and Conversion
1234567890123456789012345678901234
input buffer
BETH
CHRIS
JOHN
H
H
H
12
2
7
4822.12
233.11
678.43
982.10
94.12
150.11
data softsale;
;
infile rawin;
input name $1-10 division $12 years 15-16 sales 19-25 expense 27-34;
run;
PROGRAM
DATA
VECTOR
Name:
Type:
Length:
Format:
Informat:
Label:
Flags:
NAME
CHAR
10
DIVISION
CHAR
1
YEARS
NUM
8
SALES
NUM
8
EXPENSE _ERROR_ _N_
NUM
NUM
NUM
8
8
8
D
D
Value
DATASET
DESCRIPTOR
PORTION
(DISK)
(
)
DATASET
DATA
PORTION
Name:
Type:
Length:
g
Format:
Informat:
Label:
NAME
CHAR
10
DIVISION
CHAR
1
BETH
CHRIS
JOHN
H
H
H
YEARS
NUM
8
12
2
7
SALES
NUM
8
EXPENSE
NUM
8
4822.12 982.10
233.11 94.12
678.43 150.11
Understanding the SAS® DATA Step and the Program Data Vector
25
Pseudo Variables
S
Special
i l variables
i bl th
thatt th
the compiler
il creates
t th
thatt are nott added
dd d tto output
t t fil
file.
Partial list:
_N_ contains the number of times the DATA step has looped.
_ERROR_ is set to 0 if there were no input errors, otherwise 1.
FIRST.variable showing beginning of control break
LAST.variable showing end of control break
_INFILE_ contains entire input buffer
Others can be requested by the programmer to:
• Detect end of file
• Access control blocks
• Access and alter system information
Understanding the SAS® DATA Step and the Program Data Vector
26
Pseudo Variables
1234567890123456789012345678901234
input buffer
BETH
CHRIS
JOHN
H
H
H
12
2
7
4822.12
233.11
678.43
982.10
94.12
150.11
data softsale;
;
infile rawin;
input name $1-10 division $12 years 15-16 sales 19-25 expense 27-34;
run;
PROGRAM
DATA
VECTOR
Name:
Type:
Length:
Format:
Informat:
Label:
Flags:
NAME
CHAR
10
DIVISION
CHAR
1
YEARS
NUM
8
SALES
NUM
8
EXPENSE _ERROR_ _N_
NUM
NUM
NUM
8
8
8
D
D
Value
DATASET
DESCRIPTOR
PORTION
(DISK)
(
)
DATASET
DATA
PORTION
Name:
Type:
Length:
g
Format:
Informat:
Label:
NAME
CHAR
10
DIVISION
CHAR
1
BETH
CHRIS
JOHN
H
H
H
YEARS
NUM
8
12
2
7
SALES
NUM
8
EXPENSE
NUM
8
4822.12 982.10
233.11 94.12
678.43 150.11
Understanding the SAS® DATA Step and the Program Data Vector
27
Output Datasets
U
Usually
ll a SAS d
datafile,
t fil b
butt raw fil
files and
d reports
t can also
l b
be created.
t d
•
•
•
•
•
•
SAS data sets are “self-defining” via dataset descriptor
Much simpler operation for the programmer
SAS automatically builds the structures needed
Outputs the record at DATA step return.
In addition, all variables except dropped and pseudo variables are kept
Descriptor info can be printed with PROC CONTENTS, dictionary tables
Notes:
• If a raw file or report is produced, buffers opposite those of INFILE are
used.
• Raw files can pass info to other programs.
Understanding the SAS® DATA Step and the Program Data Vector
28
Questions About The Data Step
If traditional
t diti
l programming
i experience
i
is
i applied
li d thi
this DATA step,
t
questions
ti
might be:
Where are the opens and closes?
Where do we write out records?
What is looping?
Wh does
When
d
th
the program stop?
t ?
Understanding the SAS® DATA Step and the Program Data Vector
29
Assumptions Made In The Data Step
Th d
The
default
f lt actions
ti
off th
the DATA step
t easily
il accommodates
d t mostt programs.
•
•
•
•
•
•
All rows are read from files starting with the first record until last record.
Input values from a previous row are cleared before reading next row.
All variables referenced will be included on the output file.
All records will be included on the resulting output file.
All files should be opened at the beginning and closed at the end of step.
Programs should not continue to loop if no data is read in previous pass.
Understanding the SAS® DATA Step and the Program Data Vector
30
Assumptions Made In The Data Step
The SAS compiler makes the following assumption and inserts code to do:
•
•
•
•
•
Immediately upon entry check for infinite looping
All values from non-SAS files are cleared before executing any
statements.
If any reading statement would read a record after end of file, step stops.
If the program reaches the last statement in the step
step, or if a RETURN
statement is executed, the current PDV contents (all columns for each
row) is output to the SAS data set being built.
A branch is executed to go to the top and enter the DATA step for another
pass.
Understanding the SAS® DATA Step and the Program Data Vector
31
Assumptions Made In The Data Step
A th view
Another
i
it is
i that
th t the
th compiler
il iinserts
t th
the bl
blue ititalic
li code.
d
data softsale;
Check for looping
looping, initialize PDV
infile rawin;
if at EOF then stop
Input
Name
Years
Expense
$1-10 Division
15-16 Sales
28-34 State
output to SAS Dataset
goto top of DATA step
$12
19-25
$36-37;
run;
Notes:
p virtually
y infinite-loop
pp
proof.
• These save effort and make DATA steps
Understanding the SAS® DATA Step and the Program Data Vector
32
Executing the DATA Step
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
Input buffer
Logical
Program
Data Vector
(Descriptor)
Name
Name
Division
Years
Sales
Dataset work.softsale
Division
Years
Sales
Expense
Expense
State
State
(Data)
Understanding the SAS® DATA Step and the Program Data Vector
33
Executing the DATA Step
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
Input buffer
Logical
Program
Data Vector
(Descriptor)
Name
Name
Division
Years
Sales
Dataset work.softsale
Division
Years
Sales
Expense
Expense
State
State
(Data)
Understanding the SAS® DATA Step and the Program Data Vector
34
Executing the DATA Step
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
Input buffer
Logical
Program
Data Vector
(Descriptor)
Name
Name
Division
Years
Sales
Expense
.
.
.
Dataset work.softsale
Division
Years
Sales
Expense
State
State
(Data)
Understanding the SAS® DATA Step and the Program Data Vector
35
Executing the DATA Step
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
Input buffer
Logical
Program
Data Vector
(Descriptor)
Name
Name
Division
Years
Sales
Expense
.
.
.
Dataset work.softsale
Division
Years
Sales
Expense
State
State
(Data)
Understanding the SAS® DATA Step and the Program Data Vector
36
Executing the DATA Step
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
Input buffer
Logical
Program
Data Vector
(Descriptor)
Name
Name
Division
Years
Sales
Expense
.
.
.
Dataset work.softsale
Division
Years
Sales
Expense
State
State
(Data)
Understanding the SAS® DATA Step and the Program Data Vector
37
Executing the DATA Step
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
CHRIS
H
2
233.11
94.12 WI
Input buffer
Logical
Program
Data Vector
(Descriptor)
Name
Name
Division
Years
Sales
Expense
.
.
.
Dataset work.softsale
Division
Years
Sales
Expense
State
State
(Data)
Understanding the SAS® DATA Step and the Program Data Vector
38
Executing the DATA Step
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
CHRIS
H
2
233.11
94.12 WI
Input buffer
Logical
Program
Data Vector
(Descriptor)
Name
CHRIS
Name
Division
Years
Sales
Expense
.
.
.
Dataset work.softsale
Division
Years
Sales
Expense
State
State
(Data)
Understanding the SAS® DATA Step and the Program Data Vector
39
Executing the DATA Step
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
CHRIS
H
2
233.11
94.12 WI
Input buffer
Logical
Program
Data Vector
(Descriptor)
Name
CHRIS
Name
Division
H
Years
Sales
Expense
.
.
.
Dataset work.softsale
Division
Years
Sales
Expense
State
State
(Data)
Understanding the SAS® DATA Step and the Program Data Vector
40
Executing the DATA Step
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
CHRIS
H
2
233.11
94.12 WI
Input buffer
Logical
Program
Data Vector
(Descriptor)
Name
CHRIS
Name
Division
H
Years
Sales
Expense
2
.
.
Dataset work.softsale
Division
Years
Sales
Expense
State
State
(Data)
Understanding the SAS® DATA Step and the Program Data Vector
41
Executing the DATA Step
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
CHRIS
H
2
233.11
94.12 WI
Input buffer
Logical
Program
Data Vector
(Descriptor)
Name
CHRIS
Name
Division
H
Years
Sales
Expense
2
233.11
.
Dataset work.softsale
Division
Years
Sales
Expense
State
State
(Data)
Understanding the SAS® DATA Step and the Program Data Vector
42
Executing the DATA Step
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
CHRIS
H
2
233.11
94.12 WI
Input buffer
Logical
Program
Data Vector
(Descriptor)
Name
CHRIS
Name
Division
H
Years
Sales
Expense
2
233.11
94.12
Dataset work.softsale
Division
Years
Sales
Expense
State
State
(Data)
Understanding the SAS® DATA Step and the Program Data Vector
43
Executing the DATA Step
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
CHRIS
H
2
233.11
94.12 WI
Input buffer
Logical
Program
Data Vector
(Descriptor)
Name
CHRIS
Name
Division
H
Years
Sales
Expense
2
233.11
94.12
Dataset work.softsale
Division
Years
Sales
Expense
State
WI
State
(Data)
Understanding the SAS® DATA Step and the Program Data Vector
44
Executing the DATA Step
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
CHRIS
H
2
233.11
94.12 WI
Input buffer
Logical
Program
Data Vector
Name
CHRIS
(Descriptor)
Name
(Data)
CHRIS
Division
H
Years
Sales
Expense
2
233.11
94.12
Dataset work.softsale
Division
Years
Sales
H
2
Understanding the SAS® DATA Step and the Program Data Vector
233 11
233.11
Expense
94 12
94.12
State
WI
State
WI
45
Executing the DATA Step
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
CHRIS
H
2
233.11
94.12 WI
Input buffer
Logical
Program
Data Vector
Name
CHRIS
(Descriptor)
Name
(Data)
CHRIS
Division
H
Years
Sales
Expense
2
233.11
94.12
Dataset work.softsale
Division
Years
Sales
H
2
Understanding the SAS® DATA Step and the Program Data Vector
233 11
233.11
Expense
94 12
94.12
State
WI
State
WI
46
Executing the DATA Step
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
CHRIS
H
2
233.11
94.12 WI
Input buffer
Logical
Program
Data Vector
Name
CHRIS
(Descriptor)
Name
(Data)
CHRIS
Division
H
Years
Sales
Expense
2
233.11
94.12
Dataset work.softsale
Division
Years
Sales
H
2
Understanding the SAS® DATA Step and the Program Data Vector
233 11
233.11
Expense
94 12
94.12
State
WI
State
WI
47
Executing the DATA Step
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
Input buffer
Logical
Program
Data Vector
Name
(Descriptor)
Name
(Data)
CHRIS
Division
Years
Sales
Expense
.
.
.
Dataset work.softsale
Division
Years
Sales
H
2
Understanding the SAS® DATA Step and the Program Data Vector
233 11
233.11
Expense
94 12
94.12
State
State
WI
48
Executing the DATA Step
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
Input buffer
Logical
Program
Data Vector
Name
(Descriptor)
Name
(Data)
CHRIS
Division
Years
Sales
Expense
.
.
.
Dataset work.softsale
Division
Years
Sales
H
2
Understanding the SAS® DATA Step and the Program Data Vector
233 11
233.11
Expense
94 12
94.12
State
State
WI
49
Executing the DATA Step
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
Input buffer
Logical
Program
Data Vector
Name
(Descriptor)
Name
(Data)
CHRIS
Division
Years
Sales
Expense
.
.
.
Dataset work.softsale
Division
Years
Sales
H
2
Understanding the SAS® DATA Step and the Program Data Vector
233 11
233.11
Expense
94 12
94.12
State
State
WI
50
Executing the DATA Step
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
MARK
H
5
298.12
52.65 WI
Input buffer
Logical
Program
Data Vector
Name
(Descriptor)
Name
(Data)
CHRIS
Division
Years
Sales
Expense
.
.
.
Dataset work.softsale
Division
Years
Sales
H
2
Understanding the SAS® DATA Step and the Program Data Vector
233 11
233.11
Expense
94 12
94.12
State
State
WI
51
Executing the DATA Step
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
Each row is handled
the same way.
Let's skip through
MARK’s input step.
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
MARK
H
5
298.12
52.65 WI
Input buffer
Logical
Program
Data Vector
Name
(Descriptor)
Name
(Data)
CHRIS
Division
Years
Sales
Expense
.
.
.
Dataset work.softsale
Division
Years
Sales
H
2
Understanding the SAS® DATA Step and the Program Data Vector
233 11
233.11
Expense
94 12
94.12
State
State
WI
52
Executing the DATA Step
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
MARK
H
5
298.12
52.65 WI
Input buffer
Logical
Program
Data Vector
(Descriptor)
Name
MARK
Name
CHRIS
Division
H
Years
Sales
Expense
5
298.12
52.65
Dataset work.softsale
Division
Years
Sales
H
2
233.11
Expense
94.12
State
WI
State
WI
(Data)
Understanding the SAS® DATA Step and the Program Data Vector
53
Executing the DATA Step
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
MARK
H
5
298.12
52.65 WI
Input buffer
Logical
Program
Data Vector
(Descriptor)
(Data)
Name
MARK
Name
CHRIS
MARK
Division
H
Years
Sales
Expense
5
298.12
52.65
Dataset work.softsale
Division
Years
Sales
H
2
233.11
H
5
298.12
Understanding the SAS® DATA Step and the Program Data Vector
Expense
94.12
52.65
State
WI
State
WI
WI
54
Executing the DATA Step
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
MARK
H
5
298.12
52.65 WI
Input buffer
Logical
Program
Data Vector
(Descriptor)
(Data)
Name
MARK
Name
CHRIS
MARK
Division
H
Years
Sales
Expense
5
298.12
52.65
Dataset work.softsale
Division
Years
Sales
H
2
233.11
H
5
298.12
Understanding the SAS® DATA Step and the Program Data Vector
Expense
94.12
52.65
State
WI
State
WI
WI
55
Executing the DATA Step
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
MARK
H
5
298.12
52.65 WI
Input buffer
Logical
Program
Data Vector
(Descriptor)
(Data)
Name
MARK
Name
CHRIS
MARK
Division
H
Years
Sales
Expense
5
298.12
52.65
Dataset work.softsale
Division
Years
Sales
H
2
233.11
H
5
298.12
Understanding the SAS® DATA Step and the Program Data Vector
Expense
94.12
52.65
State
WI
State
WI
WI
56
Executing the DATA Step
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
Input buffer
Logical
Program
Data Vector
(Descriptor)
(Data)
Name
Name
CHRIS
MARK
Division
Years
Sales
Expense
.
.
.
Dataset work.softsale
Division
Years
Sales
H
2
233.11
H
5
298.12
Understanding the SAS® DATA Step and the Program Data Vector
Expense
94.12
52.65
State
State
WI
WI
57
Executing the DATA Step
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
Input buffer
Logical
Program
Data Vector
(Descriptor)
(Data)
Name
Name
CHRIS
MARK
Division
Years
Sales
Expense
.
.
.
Dataset work.softsale
Division
Years
Sales
H
2
233.11
H
5
298.12
Understanding the SAS® DATA Step and the Program Data Vector
Expense
94.12
52.65
State
State
WI
WI
58
Executing the DATA Step
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
Input buffer
Logical
Program
Data Vector
(Descriptor)
(Data)
Name
Name
CHRIS
MARK
Division
Years
Sales
Expense
.
.
.
Dataset work.softsale
Division
Years
Sales
H
2
233.11
H
5
298.12
Understanding the SAS® DATA Step and the Program Data Vector
Expense
94.12
52.65
State
State
WI
WI
59
Executing the DATA Step
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
SARAH
S
6
301.21
65.17 MN
Input buffer
Logical
Program
Data Vector
(Descriptor)
(Data)
Name
Name
CHRIS
MARK
Division
Years
Sales
Expense
.
.
.
Dataset work.softsale
Division
Years
Sales
H
2
233.11
H
5
298.12
Understanding the SAS® DATA Step and the Program Data Vector
Expense
94.12
52.65
State
State
WI
WI
60
Executing the DATA Step
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
Each row is handled
the same way.
Let's skip through
SARAH’S input step.
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
SARAH
S
6
301.21
65.17 MN
Input buffer
Logical
Program
Data Vector
(Descriptor)
(Data)
Name
Name
CHRIS
MARK
Division
Years
Sales
Expense
.
.
.
Dataset work.softsale
Division
Years
Sales
H
2
233.11
H
5
298.12
Understanding the SAS® DATA Step and the Program Data Vector
Expense
94.12
52.65
State
State
WI
WI
61
Executing the DATA Step
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
SARAH
S
6
301.21
65.17 MN
Input buffer
Logical
Program
Data Vector
(Descriptor)
(Data)
Name
SARAH
Name
CHRIS
MARK
Division
S
Years
Sales
Expense
6
301.21
65.17
Dataset work.softsale
Division
Years
Sales
H
2
233.11
H
5
298.12
Understanding the SAS® DATA Step and the Program Data Vector
Expense
94.12
52.65
State
MN
State
WI
WI
62
Executing the DATA Step
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
SARAH
S
6
301.21
65.17 MN
Input buffer
Logical
Program
Data Vector
(Descriptor)
(Data)
Name
SARAH
Name
CHRIS
MARK
SARAH
Division
S
Years
Sales
Expense
6
301.21
65.17
Dataset work.softsale
Division
Years
Sales
H
2
233.11
H
5
298.12
S
6
301.21
Understanding the SAS® DATA Step and the Program Data Vector
Expense
94.12
52.65
65.17
State
MN
State
WI
WI
MN
63
Executing the DATA Step
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
SARAH
S
6
301.21
65.17 MN
Input buffer
Logical
Program
Data Vector
(Descriptor)
(Data)
Name
SARAH
Name
CHRIS
MARK
SARAH
Division
S
Years
Sales
Expense
6
301.21
65.17
Dataset work.softsale
Division
Years
Sales
H
2
233.11
H
5
298.12
S
6
301.21
Understanding the SAS® DATA Step and the Program Data Vector
Expense
94.12
52.65
65.17
State
MN
State
WI
WI
MN
64
Executing the DATA Step
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
Input buffer
Logical
Program
Data Vector
(Descriptor)
(Data)
Name
Name
CHRIS
MARK
SARAH
Division
Years
Sales
Expense
.
.
.
Dataset work.softsale
Division
Years
Sales
H
2
233.11
H
5
298.12
S
6
301.21
Understanding the SAS® DATA Step and the Program Data Vector
Expense
94.12
52.65
65.17
State
State
WI
WI
MN
65
Executing the DATA Step
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
Input buffer
Logical
Program
Data Vector
(Descriptor)
(Data)
Name
Name
CHRIS
MARK
SARAH
Division
Years
Sales
Expense
.
.
.
Dataset work.softsale
Division
Years
Sales
H
2
233.11
H
5
298.12
S
6
301.21
Understanding the SAS® DATA Step and the Program Data Vector
Expense
94.12
52.65
65.17
State
State
WI
WI
MN
66
Executing the DATA Step
data softsale;
init buffer, PDV
infile rawin;
if at EOF then stop
Fileref rawin
1
2
3
1234567890123456789012345678901234567
CHRIS
H
2
233.11
94.12 WI
End
MARK
H
5
298.12
52.65 WI
SARAH
S
6
301.21
65.17 MN
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
Input buffer
Logical
Program
Data Vector
(Descriptor)
(Data)
Name
Name
CHRIS
MARK
SARAH
Division
Years
Sales
Expense
.
.
.
Dataset work.softsale
Division
Years
Sales
H
2
233.11
H
5
298.12
S
6
301.21
Understanding the SAS® DATA Step and the Program Data Vector
Expense
94.12
52.65
65.17
State
State
WI
WI
MN
67
Executing the DATA Step
Fileref rawin
1
2
3
We're done!
1234567890123456789012345678901234567
data softsale;
Go to the next step.
CHRIS
H
2
233.11
94.12 WI
init buffer, PDV
End
MARK
H
5
298.12
52.65 WI
infile rawin;
SARAH
S
6
301.21
65.17 MN
if at EOF then stop
input Name $1-10 Division $12 Years 15-16
Sales 19-25 Expense 28-34 State $36-37;
output to SAS Dataset
goto top of Data Step
run;
1
2
3
4
8
1234567890123456789012345678901234567890...0
Input buffer
Logical
Program
Data Vector
(Descriptor)
(Data)
Name
Name
CHRIS
MARK
SARAH
Division
Years
Sales
Expense
.
.
.
Dataset work.softsale
Division
Years
Sales
H
2
233.11
H
5
298.12
S
6
301.21
Understanding the SAS® DATA Step and the Program Data Vector
Expense
94.12
52.65
65.17
State
State
WI
WI
MN
68
Overriding Data Step Assumptions
N t allll programs fit scenario
Not
i above;
b
can we alter
lt b
behavior
h i off th
the program?
?
•
•
•
•
•
•
•
The FIRSTOBS= and OBS= system options begin and end logically after
the first record and before the last record respectively
respectively.
The RETAIN compiler statement instructs the step to not initialize
variables.
The STOP statement (usually used with IF statement) stops step early
early.
The DELETE, subsetting IF statements, exit the DATA step early; never
reach OUTPUT.
RETURN exits the DATA step early but does output to the SAS file
file.
DROP, KEEP statements, dataset options exclude or include columns.
GOTO and various DO groups alter the looping path for the program.
Understanding the SAS® DATA Step and the Program Data Vector
69
Retaining Data Values
S
Sometimes
ti
variables
i bl should
h ld nott b
be cleared
l
d upon exit/reentry.
it/
t
The RETAIN compiler statement:
• Specifies listed variables should not be initialized
• Can also set an initial value
• Iif first reference sets compiler attributes
• Is essentially a flag set by compiler to tell execution don’t clear
• Can be coded anywhere in DATA Step
• Variables read with SET,, MERGE,, or UPDATE,, MODIFY are retained.
Understanding the SAS® DATA Step and the Program Data Vector
70
Retaining Data Values
Example: Read a file with CITY in the first row
row, RETAIN the value
value.
1234567890123456789012345678901234
MADISON
BETH
CHRIS
JOHN
H
H
H
12
2
7
4822.12
233.11
678.43
982.10
94.12
150.11
data softsale;
infile rawin missover;
if _N_ = 1 then do;
input
p
city
y $2-8;
;
delete; end;
input name $1-10 division $12 years 15-16 sales 19-25 expense 27-34;
retain city;
run;
PROGRAM
DATA
VECTOR
N
Name:
Type:
Length:
Format:
Informat:
Label:
Flags:
NAME
CHAR
10
DIVISION
CHAR
1
YEARS
NUM
8
SALES
NUM
8
EXPENSE CITY
NUM
CHAR
8
7
R
Value
DATASET
DESCRIPTOR
PORTION
(DISK)
Format:
DATASET
DATA
PORTION
Name:
Type:
Length:
Informat:
Label:
_ERROR_
ERROR
NUM
8
_N_
N
NUM
8
D
D
MADISON
NAME
CHAR
10
DIVISION
CHAR
1
YEARS
NUM
8
BETH
CHRIS
JOHN
H
H
H
12
2
7
SALES
NUM
8
4822.12
4822
12
233.11
678.43
EXPENSE CITY
NUM
CHAR
8
7
982
982.10
10
94.12
150.11
MADISON
MADISON
MADISON
Understanding the SAS® DATA Step and the Program Data Vector
71
Stopping Early
N
Normally
ll jjobs
b run until
til EOF,
EOF OBS
OBS= or STOP can stop
t step
t early.
l
Example: Stop after 50 records
data softsale;
infile rawin;
if _N_
N = > 50 then stop;
input name $1-10 division $12 years 15-16 sales 19-25
expense 27-34;
run;
Understanding the SAS® DATA Step and the Program Data Vector
72
Stopping Late
Sometimes
S
ti
a program should
h ld continue
ti
even if the
th llastt record
d was read.
d
Example: Calculate a percentage of total
DATA PCTDS;
IF _N_ = 1 THEN
DO UNTIL(EOF);
UNTIL(EOF)
SET CONCAT(KEEP=NAME
SALES)
END=EOF;
TOTSALES+SALES;
END;
SET CONCAT(KEEP=NAME SALES);
SALESPCT=(SALES*100)/TOTSALES;
RUN;
PDV
Name
Sales
TOTSALES
CONCAT DATASET
OBS
1
2
3
4
5
6
7
NAME
BETH
CHRIS
JOHN
MARK
ANDREW
BENJAMIN
JANET
YEARS
12
2
7
5
24
3
1
SALES
4822.12
233.11
678.43
298 12
298.12
1762.11
201.11
98.11
SALESPCT
Understanding the SAS® DATA Step and the Program Data Vector
73
Outputting And Deleting Observations
D f lt is
Default
i tto output
t t the
th currentt row if DATA step
t reaches
h implied
i li d OUTPUT.
OUTPUT
Statements to discard rows:
• DELETE (usually after an IF) leaves the DATA step without OUTPUT.
• False cases of Subsetting IF statements exit without OUTPUTting.
• RETURN exits the step but does OUTPUT.
• Explicit OUTPUT statement OUTPUTs when executed, but no implied
OUTPUT at bottom.
• You may prefer those positive statements such as Subsetting IF,
OUTPUT etc.
• You might use a negative statement such as DELETE to filter unwanted
rows.
Note:
• WHERE also filters rows in engine, not in DATA step.
Understanding the SAS® DATA Step and the Program Data Vector
74
Outputting And Deleting Observations
Example:
E
l O
Only
l output
t t records
d with
ith more th
than 4000 in
i Sales.
S l
.
data softsale;
if no input last time thru then stop
initialize PDV
infile rawin;
if at EOF then stop
input Name
$1-10
$1
10
Division $12
Years
15-16
Sales
19-25
Expense
28-34
St t
State
$36
$36-37;
37
If sales > 4000;
If sales not gt 4000 then goto top of DATA step
output to SAS Dataset
goto top
g
p of DATA step
p
run;
Note:
• WHERE also filters rows in engine, not in DATA step.
Understanding the SAS® DATA Step and the Program Data Vector
75
Losing the Crowd
Understanding the SAS® DATA Step and the Program Data Vector
76
Accumulating Totals in a DATA Step
W should
We
h ld b
be able
bl tto write
it fformulas
l tto countt or sum across observations.
b
ti
In a DATA step, read the following dataset and do the following:
• Count all employees and print running counts.
• Sum hours and print running sums.
fileref
rawin
JOE
PETE
STEVE
TOM
40
20
.
35
Understanding the SAS® DATA Step and the Program Data Vector
77
Accumulating Totals in a DATA Step (continued)
data timecard;
infile rawin;
input Name $ Hours;
Ktr=ktr+1;
Ktr
ktr 1;
Hourstot=hourstot+hours;
run;
Fileref
i
rawin
JOE
PETE
STEVE
TOM
40
20
.
35
proc print data=timecard;
title 'SOFTCO PAYROLL';
run;
SOFTCO PAYROLL
OBS
1
2
3
4
Name
JOE
PETE
STEVE
TOM
Hours
40
20
.
35
Ktr
.
.
.
.
Hourstot
.
.
.
.
Question: Why didn't the above program work correctly?
Understanding the SAS® DATA Step and the Program Data Vector
78
Accumulating Totals in a DATA Step (continued)
Adding values
Addi
l
across records
d has
h three
th
problems:
bl
1. All variables are set to missing on every record.
2. Totals normally need to start at 0, not missing.
3. When a missing value is added to any value, the result is a missing value.
The SUM statement is a special assignment statement to add values
across observations.
SYNTAX:
variable+expression;
Notes:
• SUM variables start at 0, not missing.
• SUM variables are retained from pass to pass.
• Any missing values encountered are ignored.
• RETAIN and SUM function could also be used.
Understanding the SAS® DATA Step and the Program Data Vector
79
An Accumulation Solution
data timecard;
infile rawin;
input Name $ Hours;
Ktr+1;
Hourstot+hours;
run;
fileref
rawin
JOE
PETE
STEVE
TOM
40
20
.
35
proc print data=timecard;
title 'SOFTCO
SOFTCO PAYROLL
PAYROLL';
;
run;
flags
PDV
Name
Hours
Ktr
Understanding the SAS® DATA Step and the Program Data Vector
Hourstot
80
An Accumulation Solution
data timecard;
infile rawin;
input Name $ Hours;
Ktr+1;
Hourstot+hours;
run;
fileref
rawin
JOE
PETE
STEVE
TOM
proc print data=timecard;
title 'SOFTCO
SOFTCO PAYROLL
PAYROLL';
;
run;
flags
PDV
Name
Hours
Ktr
RM
.
0
Understanding the SAS® DATA Step and the Program Data Vector
Hourstot
RM
40
20
.
35
Set at
compile
time.
0
81
An Accumulation Solution
data timecard;
infile rawin;
input Name $ Hours;
Ktr+1;
Hourstot+hours;
run;
fileref
rawin
JOE
PETE
STEVE
TOM
40
20
.
35
proc print data=timecard;
title 'SOFTCO
SOFTCO PAYROLL
PAYROLL';
;
run;
flags
PDV
Name
Hours
Ktr
RM
.
0
Understanding the SAS® DATA Step and the Program Data Vector
Hourstot
RM
0
82
An Accumulation Solution
data timecard;
infile rawin;
input Name $ Hours;
Ktr+1;
Hourstot+hours;
run;
fileref
rawin
JOE
PETE
STEVE
TOM
40
20
.
35
proc print data=timecard;
title 'SOFTCO
SOFTCO PAYROLL
PAYROLL';
;
run;
flags
PDV
Name
JOE
Hours
40
Ktr
RM
0
Understanding the SAS® DATA Step and the Program Data Vector
Hourstot
RM
0
83
An Accumulation Solution
data timecard;
infile rawin;
input Name $ Hours;
Ktr+1;
Hourstot+hours;
run;
fileref
rawin
proc print data=timecard;
title 'SOFTCO
SOFTCO PAYROLL
PAYROLL';
;
run;
flags
PDV
Name
JOE
JOE
PETE
STEVE
TOM
40
20
.
35
0
+ 1
Hours
40
Ktr
RM
1
Understanding the SAS® DATA Step and the Program Data Vector
Hourstot
RM
0
84
An Accumulation Solution
data timecard;
infile rawin;
input Name $ Hours;
Ktr+1;
Hourstot+hours;
run;
fileref
rawin
JOE
PETE
STEVE
TOM
proc print data=timecard;
title 'SOFTCO
SOFTCO PAYROLL
PAYROLL';
;
run;
flags
PDV
Name
JOE
40
20
.
35
0
+ 40
Hours
40
Ktr
RM
1
Understanding the SAS® DATA Step and the Program Data Vector
Hourstot
RM
40
85
An Accumulation Solution
data timecard;
infile rawin;
input Name $ Hours;
Ktr+1;
Hourstot+hours;
run;
fileref
rawin
JOE
PETE
STEVE
TOM
40
20
.
35
proc print data=timecard;
title 'SOFTCO
SOFTCO PAYROLL
PAYROLL';
;
run;
flags
PDV
Name
Hours
JOE
Obs
1
Ktr
RM
40
Name
JOE
Hourstot
RM
1
40
Hours
Ktr
Hourstot
40
1
40
Understanding the SAS® DATA Step and the Program Data Vector
86
An Accumulation Solution
data timecard;
infile rawin;
input Name $ Hours;
Ktr+1;
Hourstot+hours;
run;
fileref
rawin
JOE
PETE
STEVE
TOM
40
20
.
35
proc print data=timecard;
title 'SOFTCO
SOFTCO PAYROLL
PAYROLL';
;
run;
flags
PDV
Name
Obs
1
Name
JOE
Hours
Ktr
RM
.
1
Hourstot
RM
40
Hours
Ktr
Hourstot
40
1
40
Understanding the SAS® DATA Step and the Program Data Vector
87
An Accumulation Solution
data timecard;
infile rawin;
input Name $ Hours;
Ktr+1;
Hourstot+hours;
run;
fileref
rawin
JOE
PETE
STEVE
TOM
40
20
.
35
proc print data=timecard;
title 'SOFTCO
SOFTCO PAYROLL
PAYROLL';
;
run;
flags
PDV
Name
Obs
1
Name
JOE
Hours
Ktr
RM
.
1
Hourstot
RM
40
Hours
Ktr
Hourstot
40
1
40
Understanding the SAS® DATA Step and the Program Data Vector
88
An Accumulation Solution
data timecard;
infile rawin;
input Name $ Hours;
Ktr+1;
Hourstot+hours;
run;
fileref
rawin
JOE
PETE
STEVE
TOM
40
20
.
35
proc print data=timecard;
title 'SOFTCO
SOFTCO PAYROLL
PAYROLL';
;
run;
flags
PDV
Name
Hours
PETE
Obs
1
Ktr
RM
20
Name
JOE
Hourstot
RM
1
40
Hours
Ktr
Hourstot
40
1
40
Understanding the SAS® DATA Step and the Program Data Vector
89
An Accumulation Solution
data timecard;
infile rawin;
input Name $ Hours;
Ktr+1;
Hourstot+hours;
run;
fileref
rawin
proc print data=timecard;
title 'SOFTCO
SOFTCO PAYROLL
PAYROLL';
;
run;
flags
PDV
Name
Obs
1
Ktr
RM
20
Name
JOE
40
20
.
35
1
+ 1
Hours
PETE
JOE
PETE
STEVE
TOM
Hourstot
RM
2
40
Hours
Ktr
Hourstot
40
1
40
Understanding the SAS® DATA Step and the Program Data Vector
90
An Accumulation Solution
data timecard;
infile rawin;
input Name $ Hours;
Ktr+1;
Hourstot+hours;
run;
fileref
rawin
JOE
PETE
STEVE
TOM
proc print data=timecard;
title 'SOFTCO
SOFTCO PAYROLL
PAYROLL';
;
run;
flags
PDV
Name
Obs
1
40
+ 20
Hours
PETE
Ktr
RM
20
Name
JOE
40
20
.
35
Hourstot
RM
2
60
Hours
Ktr
Hourstot
40
1
40
Understanding the SAS® DATA Step and the Program Data Vector
91
An Accumulation Solution
data timecard;
infile rawin;
input Name $ Hours;
Ktr+1;
Hourstot+hours;
run;
fileref
rawin
JOE
PETE
STEVE
TOM
40
20
.
35
proc print data=timecard;
title 'SOFTCO
SOFTCO PAYROLL
PAYROLL';
;
run;
flags
PDV
Name
Hours
PETE
Ktr
RM
20
Hourstot
RM
2
60
Obs
Name
Hours
Ktr
Hourstot
1
2
JOE
PETE
40
20
1
2
40
60
Understanding the SAS® DATA Step and the Program Data Vector
92
An Accumulation Solution
data timecard;
infile rawin;
input Name $ Hours;
Ktr+1;
Hourstot+hours;
run;
fileref
rawin
JOE
PETE
STEVE
TOM
40
20
.
35
proc print data=timecard;
title 'SOFTCO
SOFTCO PAYROLL
PAYROLL';
;
run;
flags
PDV
Name
Hours
Ktr
RM
.
2
Hourstot
RM
60
Obs
Name
Hours
Ktr
Hourstot
1
2
JOE
PETE
40
20
1
2
40
60
Understanding the SAS® DATA Step and the Program Data Vector
93
An Accumulation Solution
data timecard;
infile rawin;
input Name $ Hours;
Ktr+1;
Hourstot+hours;
run;
fileref
rawin
JOE
PETE
STEVE
TOM
40
20
.
35
proc print data=timecard;
title 'SOFTCO
SOFTCO PAYROLL
PAYROLL';
;
run;
flags
PDV
Name
Hours
Ktr
RM
.
2
Hourstot
RM
60
Obs
Name
Hours
Ktr
Hourstot
1
2
JOE
PETE
40
20
1
2
40
60
Understanding the SAS® DATA Step and the Program Data Vector
94
An Accumulation Solution
data timecard;
infile rawin;
input Name $ Hours;
Ktr+1;
Hourstot+hours;
run;
fileref
rawin
JOE
PETE
STEVE
TOM
40
20
.
35
proc print data=timecard;
title 'SOFTCO
SOFTCO PAYROLL
PAYROLL';
;
run;
flags
PDV
Name
Hours
Ktr
RM
STEVE
.
2
Hourstot
RM
60
Obs
Name
Hours
Ktr
Hourstot
1
2
JOE
PETE
40
20
1
2
40
60
Understanding the SAS® DATA Step and the Program Data Vector
95
An Accumulation Solution
data timecard;
infile rawin;
input Name $ Hours;
Ktr+1;
Hourstot+hours;
run;
fileref
rawin
proc print data=timecard;
title 'SOFTCO
SOFTCO PAYROLL
PAYROLL';
;
run;
flags
PDV
JOE
PETE
STEVE
TOM
40
20
.
35
2
+ 1
Name
Hours
Ktr
RM
STEVE
.
3
Hourstot
RM
60
Obs
Name
Hours
Ktr
Hourstot
1
2
JOE
PETE
40
20
1
2
40
60
Understanding the SAS® DATA Step and the Program Data Vector
96
An Accumulation Solution
data timecard;
infile rawin;
input Name $ Hours;
Ktr+1;
Hourstot+hours;
run;
Fileref
RAWIN
JOE
PETE
STEVE
TOM
proc print data=timecard;
title 'SOFTCO
SOFTCO PAYROLL
PAYROLL';
;
run;
flags
PDV
+
Name
Hours
Ktr
RM
STEVE
.
3
40
20
.
35
60
.
Hourstot
RM
60
Obs
Name
Hours
Ktr
Hourstot
1
2
JOE
PETE
40
20
1
2
40
60
Understanding the SAS® DATA Step and the Program Data Vector
97
An Accumulation Solution
data timecard;
infile rawin;
input Name $ Hours;
Ktr+1;
Hourstot+hours;
run;
fileref
rawin
JOE
PETE
STEVE
TOM
40
20
.
35
proc print data=timecard;
title 'SOFTCO
SOFTCO PAYROLL
PAYROLL';
;
run;
flags
PDV
Name
Hours
Ktr
RM
STEVE
.
3
Obs
1
2
3
Name
JOE
PETE
STEVE
Hourstot
RM
60
Hours
Ktr
Hourstot
40
20
.
1
2
3
40
60
60
Understanding the SAS® DATA Step and the Program Data Vector
98
An Accumulation Solution
data timecard;
infile rawin;
input Name $ Hours;
Ktr+1;
Hourstot+hours;
run;
fileref
rawin
JOE
PETE
STEVE
TOM
40
20
.
35
proc print data=timecard;
title 'SOFTCO
SOFTCO PAYROLL
PAYROLL';
;
run;
flags
PDV
Name
Obs
1
2
3
Name
JOE
PETE
STEVE
Hours
Ktr
RM
.
3
Hourstot
RM
60
Hours
Ktr
Hourstot
40
20
.
1
2
3
40
60
60
Understanding the SAS® DATA Step and the Program Data Vector
99
An Accumulation Solution
data timecard;
infile rawin;
input Name $ Hours;
Ktr+1;
Hourstot+hours;
run;
fileref
rawin
JOE
PETE
STEVE
TOM
40
20
.
35
proc print data=timecard;
title 'SOFTCO
SOFTCO PAYROLL
PAYROLL';
;
run;
flags
PDV
Name
Obs
1
2
3
Name
JOE
PETE
STEVE
Hours
Ktr
RM
.
3
Hourstot
RM
60
Hours
Ktr
Hourstot
40
20
.
1
2
3
40
60
60
Understanding the SAS® DATA Step and the Program Data Vector
100
An Accumulation Solution
data timecard;
infile rawin;
input Name $ Hours;
Ktr+1;
Hourstot+hours;
run;
fileref
rawin
JOE
PETE
STEVE
TOM
40
20
.
35
proc print data=timecard;
title 'SOFTCO
SOFTCO PAYROLL
PAYROLL';
;
run;
flags
PDV
Name
Hours
TOM
Obs
1
2
3
Ktr
RM
35
Name
JOE
PETE
STEVE
Hourstot
RM
3
60
Hours
Ktr
Hourstot
40
20
.
1
2
3
40
60
60
Understanding the SAS® DATA Step and the Program Data Vector
101
An Accumulation Solution
data timecard;
infile rawin;
input Name $ Hours;
Ktr+1;
Hourstot+hours;
run;
fileref
rawin
proc print data=timecard;
title 'SOFTCO
SOFTCO PAYROLL
PAYROLL';
;
run;
flags
PDV
Name
Obs
1
2
3
Ktr
RM
35
Name
JOE
PETE
STEVE
40
20
.
35
3
+ 1
Hours
TOM
JOE
PETE
STEVE
TOM
Hourstot
RM
4
60
Hours
Ktr
Hourstot
40
20
.
1
2
3
40
60
60
Understanding the SAS® DATA Step and the Program Data Vector
102
An Accumulation Solution
data timecard;
infile rawin;
input Name $ Hours;
Ktr+1;
Hourstot+hours;
run;
fileref
rawin
JOE
PETE
STEVE
TOM
proc print data=timecard;
title 'SOFTCO
SOFTCO PAYROLL
PAYROLL';
;
run;
flags
PDV
Name
Obs
1
2
3
60
+ 35
Hours
TOM
Ktr
RM
35
Name
JOE
PETE
STEVE
40
20
.
35
Hourstot
RM
4
95
Hours
Ktr
Hourstot
40
20
.
1
2
3
40
60
60
Understanding the SAS® DATA Step and the Program Data Vector
103
An Accumulation Solution
data timecard;
infile rawin;
input Name $ Hours;
Ktr+1;
Hourstot+hours;
run;
fileref
rawin
JOE
PETE
STEVE
TOM
40
20
.
35
proc print data=timecard;
title 'SOFTCO
SOFTCO PAYROLL
PAYROLL';
;
run;
flags
PDV
Name
Hours
TOM
Obs
1
2
3
4
Ktr
RM
35
Name
JOE
PETE
STEVE
TOM
Hourstot
RM
4
95
Hours
Ktr
Hourstot
40
20
.
35
1
2
3
4
40
60
60
95
Understanding the SAS® DATA Step and the Program Data Vector
104
An Accumulation Solution
data timecard;
infile rawin;
input Name $ Hours;
Ktr+1;
Hourstot+hours;
run;
fileref
rawin
JOE
PETE
STEVE
TOM
40
20
.
35
proc print data=timecard;
title 'SOFTCO
SOFTCO PAYROLL
PAYROLL';
;
run;
Understanding the SAS® DATA Step and the Program Data Vector
105
An Accumulation Solution
data timecard;
infile rawin;
input Name $ Hours;
Ktr+1;
Hourstot+hours;
run;
fileref
rawin
JOE
PETE
STEVE
TOM
40
20
.
35
proc print data=timecard;
title 'SOFTCO
SOFTCO PAYROLL
PAYROLL';
;
run;
SOFTCO PAYROLL
Obs
1
2
3
4
Name
JOE
PETE
STEVE
TOM
Hours
Ktr
Hourstot
40
20
.
35
1
2
3
4
40
60
60
95
Understanding the SAS® DATA Step and the Program Data Vector
106
Another Accumulation Solution
R t i and
Retain
d th
the SUM ffunction
ti can also
l accumulate
l t correctly.
tl
data timecard;
infile rawin;
input Name $ Hours;
ktr=ktr+1;
hourstot=sum(hourstot,hours);
(
,
);
retain Ktr Hourstot 0;
run;
proc print data=timecard;
title 'SOFTCO PAYROLL';
run;
Understanding the SAS® DATA Step and the Program Data Vector
107
Compiler Instructions
A series of statements that instruct the compiler to alter attributes
attributes.
• In general, declarative statements
• Can be coded in any order
• Allow the program to be very explicit in the definition of SAS structures
structures.
Examples of these statements are:
•
•
•
•
•
•
•
•
LENGTH statement to set a variables internal length
INFORMAT to set input format
FORMAT to set output
p display
p y format
LABEL to define a variable label
ATTRIB to define any or all of the above in one statement
DROP to indicate which variables are to be left behind on SAS file
KEEP to indicate which variables to include on the SAS file
RETAIN to set initial values and instruct SAS to never clear
Understanding the SAS® DATA Step and the Program Data Vector
108
Compiler Statements to Alter Variable Attributes
data softsale;
infile rawin;
1234567890123456789012345678901234
BETH
H 12 4822.12
982.10
length name $20;
H
2
233.11
94.12
input buffer CHRIS
JOHN
H
7
678.43
150.11
attr
division length
length=$2;
$2;
format sales comma10.2;
input name $1-10 division $12 years 15-16 sales 19-25
expense 27-34;
drop sales years;
run;
PROGRAM
DATA
VECTOR
C O
DATASET
DESCRIPTOR
PORTION
(DISK)
DATASET
DATA
PORTION
Name:
Type:
Length:
Format:
o at
Informat:
Label:
Flags:
Value
NAME
CHAR
20
DIVISION
CHAR
2
Name:
Type:
Length:
Format:
Informat:
Label:
NAME
CHAR
20
DIVISION
CHAR
2
BETH
CHRIS
JOHN
H
H
H
YEARS
NUM
8
SALES
NUM
8
comma.
co
a
D
D
EXPENSE _ERROR_
NUM
NUM
8
8
_N_
NUM
8
EXPENSE
NUM
8
982.10
94.12
150.11
Understanding the SAS® DATA Step and the Program Data Vector
109
Debugging Features
There are a number of debugging features in the DATA step
step.
•
•
•
•
•
•
Excellent Interactive DATA step debugger available.
Simple DATA step statements that display data
data.
The LIST statement can display the input buffer from the most recent
INPUT.
The FILE LOG with PUT can display any text and variable from the PDV
in the SAS log.
The PUTLOG statement can also display text to the SAS log.
PROC CONTENTS and PROC PRINT/REPORT can be used to display
the dataset descriptor and data values of the final dataset.
Notes:
N
t
• By putting the LIST and PUT/PUTLOG statements at strategic points in
the DATA step may be the simplest and best way to debug.
Understanding the SAS® DATA Step and the Program Data Vector
110
A Debugging Example
Example: The program below selects 0 records.
records Which statement is causing
the problem?
1234567890123456789012345678901234
BETH
CHRIS
JOHN
H
H
H
12
2
7
4822.12
233.11
678.43
982.10
94.12
150.11
data softsale;
infile rawin missover;
input name $1-10 division $12 years 15-16 sales 19-25
expense 27-34;
27 34;
if sales > 4000;
if division='h';
run;
NOTE:
O
The
h d
data set WORK.SOFTSALE
O
SO S
h
has 0 observations
b
i
and
d 5 variables.
i bl
Understanding the SAS® DATA Step and the Program Data Vector
111
A Debugging Example
Now use PUTLOG tto di
N
display
l messages and
d values
l
around
d th
the IF
statements.
data softsale;
;
infile rawin missover;
input name $1-10 division $12 years 15-16 sales 19-25 expense 27-34;
putlog '$$$ before sales if ' _n_= sales= division=;
if sales > 4000;
putlog '$$$ before division if ' _n_= sales= division=;
if division='h';
run;
$$$
$$$
$$$
$$$
before
before
before
before
sales if
division
sales if
sales if
_N_=1 sales=4822.12 division=H
if _N_=1 sales=4822.12 division=H
_N_=2 sales=233.11 division=H
_N_
N =3
3 sales
sales=678.43
678.43 division
division=H
H
NOTE: The data set WORK.SOFTSALE
WORK SOFTSALE has 0 observations and 5 variables.
variables
Understanding the SAS® DATA Step and the Program Data Vector
112
Other Data Step Statements
The SAS DATA step has many
many, many more statements that can read
read, write
write,
and process data in almost any form.
•
•
•
•
•
•
over 500 DATA step functions
interfaces to the SAS macro system
much more
much written about those features
beyond the scope of this paper to try to cover them all.
the DATA step is an extremely versatile and full featured programming
language.
Understanding the SAS® DATA Step and the Program Data Vector
113
Other Topics
As powerful and well designed as the DATA step is
is, it is different from other
languages.
•
•
•
•
•
Perhaps a more standardized language such as SQL might be more
auditable and more desirable.
Terrible DATA step code can be written.
Good design and adequate documentation make a well written program.
PROC SQL is also a great tool that adds that language’s features to SAS.
SAS programs can be well written programs that can be used for
everything from one time programs to full fledged production programs.
Understanding the SAS® DATA Step and the Program Data Vector
114
Conclusions
The SAS DATA step is an excellent programming language with unique
features and extremely versatile features.
Understanding the SAS® DATA Step and the Program Data Vector
115
Not Good On the Flight Home
Understanding the SAS® DATA Step and the Program Data Vector
116
Contact Us
Questions, comments or to subscribe to the free “Missing Semicolon”
newslettter.
SAS® Training, Consulting, & Help Desk Services
Steven J. First
President
(608) 278-9964
FAX (608) 278-0065
sfirst@sys-seminar
sfirst@sys
seminar.com
com
www.sys-seminar.com
2997 Yarmouth Greenway Drive • Madison, WI 53711
Understanding the SAS® DATA Step and the Program Data Vector
117
Download