Understanding the SAS® DATA Step and the Program Data Vector Steven J. First, President 2997 Yarmouth Greenway Drive, Madison, WI 53711 Phone: (608) 278-9964, Web: www.sys-seminar.com Understanding the SAS® DATA Step and the Program Data Vector 1 Understanding the SAS® DATA Step and the PDV This presentation was written by Systems Seminar Consultants Consultants, Inc Inc. SSC specializes SAS software and offers SAS: • Training Services • Consulting Services • Help Desk Plans • Newsletter subscriptions to The Missing Semicolon™ Semicolon . SAS is a registered trademark of SAS Institute Inc. in the USA and other countries. The Missing Semicolon is a trademark of Systems Seminar C Consultants, lt t Inc. I Understanding the SAS® DATA Step and the Program Data Vector 2 Global Warming Part 1 Understanding the SAS® DATA Step and the Program Data Vector 3 Abstract Th SAS system The t is i made d up off SAS PROCs PROC and d the th DATA Step St The DATA Step has: • an excellent, full fledged programming language • statements to read and write almost any type of data value • Conversion and calculation of new data • Looping • Interfaces y • Arrays • Much, much more. In many ways, ways the design of the DATA step along with its powerful statements, is what makes the SAS language so popular. Understanding the SAS® DATA Step and the Program Data Vector 4 Abstract (continued) This paper will Thi ill address: dd • How the DATA step fits with the rest of the SAS System • DATA step assumptions and defaults • internal structures such as buffers, and the Program Data Vector • compiler statements • executable statements Understanding the SAS® DATA Step and the Program Data Vector 5 Introduction The SAS system’s Th t ’ origins i i are in i the th 1960’s 1960’ and d 1970’ 1970’s with: ith • James Goodnight • John Sall • A. J. Barr • others Concepts in the design include: • “self defining files” y of default assumptions p • a system • procedures for commonly used routines • data handling step that would evolve into the SAS DATA step Understanding the SAS® DATA Step and the Program Data Vector 6 Introduction The DATA step Th t in i my opinion: i i • extremely simple • elegant design • continues today • 30 years of enhancements. Understanding the SAS® DATA Step and the Program Data Vector 7 Structure of SAS SAS consists i t of: f 1. a data handling language (DATA step) 2. a library of pre-written procedures ( PROC step) S A S STATEMENTS RAW DATA SAS DATA STEP S A S SUPERVISOR SAS DATA SET Understanding the SAS® DATA Step and the Program Data Vector SAS PROC STEP REPORT 8 Purpose of the DATA Step “Gett th “G the d data t iin shape” h ” ffor llater t PROC PROCs and d DATA steps. t • SAS PROCs can only read SAS datasets • We might have some other type of file to process • SAS dataset has built in descriptor that keeps track of names and attributes • later steps don’t have to remember as many details If we don’t have well defined data • DATA step gives us the power to read and write virtually any kind of file • We can do calculations and computations on a single row of data • It has a very powerful data handling language Understanding the SAS® DATA Step and the Program Data Vector 9 SAS DATA Step Overview DATA steps t can read d and d write it mostt types t off data d t stored t d on your computer. t other SAS d dataset raw disk, tape instream data DBMS, program products d SAS/ACCESS get the data in shape for analysis S S/ CC SS SAS/ACCESS other SAS dataset instream data raw disk, tape reports DBMS, program products Notes: • DATA step output is usually a SAS dataset but can be other files files. • Access to non-SAS database management systems requires a SAS/ACCESS product. Understanding the SAS® DATA Step and the Program Data Vector 10 Default Assumptions M Many assumptions ti are made d tto save titime and d effort. ff t As computer scientist I study and use many programming languages • Early on I was intrigued by the cleverness and common sense of SAS • Many tedious programming tasks were eliminated through defaults • System still provided a means to override those defaults when necessary • Our job as a DATA step programmer then is different from other languages In many ways our tasks are: • understanding the defaults g how to work with them • knowing • override them as necessary. Understanding the SAS® DATA Step and the Program Data Vector 11 Default Assumptions (continued) E Examples l Of SAS D Defaults: f lt • • • • • • • • Handling compile and execution naming and storage details A dataset descriptor that makes SAS datasets “self defining” Generating data set names if omitted When reading a data set, assume most recently created dataset if not specified Processing all the rows and columns in a file Automatically opening and closing of files Automatically controlling data initialization DATA step looping, data set output, and end of file checking Understanding the SAS® DATA Step and the Program Data Vector 12 Default Assumptions (continued) E Examples l Of SAS D Defaults: f lt • • • • • • • Automatically defining storage areas for each variable referenced without need d tto predefine d fi th them A default length of 8 was assumed for all variables A assumption that a variable is numeric if not specified LIST input assumed that data values would be separated by blanks rather than specifying exact columns SUBSETTING IF statements which imply to continue processing if a condition diti is i true, t else l delete d l t the th observation. b ti (E (Ex. If rate t > 10; 10 ) When no comparison is made in an IF statement, assume to be checking for 1 (true) (Ex. If eof then put ‘At end’;) Ab i t d sum statements. Abreviated t t t (Ex. (E Salestot+sales) S l t t l ) Understanding the SAS® DATA Step and the Program Data Vector 13 Compiling a DATA Step A mostt languages, As l the th DATA step t is i fifirstt compiled il d th then executed. t d • • • • • • • • Languages can be compiled, interpreted, SAS is a hybrid language Some features from other languages Some unique features Sometimes difficult to separate compile versus execution events with SAS DATA step compiler examines SAS statements for syntax data structures generates an executable program q to SAS compiler p checking g for the existence of resources Unique Assumptions that it “inserts” into the source code. Understanding the SAS® DATA Step and the Program Data Vector 14 Data Structures “G tti the “Getting th data d t in i shape” h ” needs d data d t structures t t to t hold h ld data d t as processed. d • • • All computer languages need to address this Each language may name the structures differently There is a lot of similarity in the way most languages store data. Understanding the SAS® DATA Step and the Program Data Vector 15 A Typical SAS Job 1234567890123456789012345678901234 input buffer BETH CHRIS JOHN H H H 12 2 7 4822.12 233.11 678.43 982.10 94.12 150.11 data softsale; ; infile rawin; input name $1-10 division $12 years 15-16 sales 19-25 expense 27-34; run; PROGRAM DATA VECTOR Name: Type: Length: Format: Informat: Label: Flags: NAME CHAR 10 DIVISION CHAR 1 YEARS NUM 8 SALES NUM 8 EXPENSE _ERROR_ _N_ NUM NUM NUM 8 8 8 D D Value DATASET DESCRIPTOR PORTION (DISK) ( ) DATASET DATA PORTION Name: Type: Length: g Format: Informat: Label: NAME CHAR 10 DIVISION CHAR 1 BETH CHRIS JOHN H H H YEARS NUM 8 12 2 7 SALES NUM 8 EXPENSE NUM 8 4822.12 982.10 233.11 94.12 678.43 150.11 Understanding the SAS® DATA Step and the Program Data Vector 16 Raw File Buffers DATA steps t reading/writing di / iti “raw” “ ” or non-SAS SAS data d t need d memory buffers. b ff • Needed to temporarily hold at least one input record at a time. • Also times when multiple lines of input can be held in buffers • Allows the program to logically read later rows before earlier ones. • Buffer contains the complete input and output record, regardless of whether the INPUT statement reads all of the columns. • SAS datasets and RDMS (which usually appear as SAS datasets) do not use raw buffers as the files are already in “shape”. Understanding the SAS® DATA Step and the Program Data Vector 17 Raw File Buffers 1234567890123456789012345678901234 input buffer BETH CHRIS JOHN H H H 12 2 7 4822.12 233.11 678.43 982.10 94.12 150.11 data softsale; infile rawin; input name $1-10 division $12 years 15-16 sales 19-25 expense 27-34; run; PROGRAM DATA VECTOR Name: Type: Length: Format: Informat: Label: Flags: NAME CHAR 10 DIVISION CHAR 1 YEARS NUM 8 SALES NUM 8 EXPENSE _ERROR_ _N_ NUM NUM NUM 8 8 8 D D Value DATASET DESCRIPTOR PORTION (DISK) ( ) DATASET DATA PORTION Name: Type: Length: g Format: Informat: Label: NAME CHAR 10 DIVISION CHAR 1 BETH CHRIS JOHN H H H YEARS NUM 8 12 2 7 SALES NUM 8 EXPENSE NUM 8 4822.12 982.10 233.11 94.12 678.43 150.11 Understanding the SAS® DATA Step and the Program Data Vector 18 The LOGICAL PROGRAM DATA VECTOR (PDV) A second d memory area ffor data d t manipulation, i l ti conversion, i refinement. fi t Areas are needed for: • Inputting and input formatting (informatting) desired variables • Revising existing values • Computing new variables • System indicators and flags • Called Logical Program Data Vector (PDV). g g have a similar working g area. ((COBOL Working g Storage). g ) • Manyy languages • “Retained” and “Non-retained” variables are stored in separate physical program data vectors, together make Logical Program Data Vector • All variables referenced be automaticallyy defined in the PDV by y compiler p using characteristics from the first reference of a variable. p agemo=age/12; g g ; Example: Understanding the SAS® DATA Step and the Program Data Vector 19 The LOGICAL PROGRAM DATA VECTOR (PDV) 1234567890123456789012345678901234 input buffer BETH CHRIS JOHN H H H 12 2 7 4822.12 233.11 678.43 982.10 94.12 150.11 data softsale; infile rawin; input name $1-10 division $12 years 15-16 sales 19-25 expense 27-34; run; PROGRAM DATA VECTOR Name: Type: Length: Format: Informat: Label: Flags: NAME CHAR 10 DIVISION CHAR 1 YEARS NUM 8 SALES NUM 8 EXPENSE _ERROR_ _N_ NUM NUM NUM 8 8 8 D D Value DATASET DESCRIPTOR PORTION (DISK) ( ) DATASET DATA PORTION Name: Type: Length: g Format: Informat: Label: NAME CHAR 10 DIVISION CHAR 1 BETH CHRIS JOHN H H H YEARS NUM 8 12 2 7 SALES NUM 8 EXPENSE NUM 8 4822.12 982.10 233.11 94.12 678.43 150.11 Understanding the SAS® DATA Step and the Program Data Vector 20 The Final SAS Dataset A ““self-defining” lf d fi i ” d dataset. t t • • • The dataset descriptor contains attributes for all kept variables plus data sett labeling l b li iinformation. f ti The Dataset Data portion contains data values for all output rows and kept columns. P Pseudo-variables d i bl are nott kkept. t Understanding the SAS® DATA Step and the Program Data Vector 21 The Final SAS Dataset 1234567890123456789012345678901234 input buffer BETH CHRIS JOHN H H H 12 2 7 4822.12 233.11 678.43 982.10 94.12 150.11 data softsale; infile rawin; input name $1-10 division $12 years 15-16 sales 19-25 expense 27-34; run; PROGRAM DATA VECTOR Name: Type: Length: Format: Informat: Label: Flags: NAME CHAR 10 DIVISION CHAR 1 YEARS NUM 8 SALES NUM 8 EXPENSE _ERROR_ _N_ NUM NUM NUM 8 8 8 D D Value DATASET DESCRIPTOR PORTION (DISK) ( ) DATASET DATA PORTION Name: Type: Length: g Format: Informat: Label: NAME CHAR 10 DIVISION CHAR 1 BETH CHRIS JOHN H H H YEARS NUM 8 12 2 7 SALES NUM 8 EXPENSE NUM 8 4822.12 982.10 233.11 94.12 678.43 150.11 Understanding the SAS® DATA Step and the Program Data Vector 22 Variable Attributes Th compiler The il assigns i each hd defined fi d variable i bl severall characteristics. h t i ti • • • • • • • • • Relative variable number Position in the dataset Name Data type Length in bytes Informat Format Variable label Flags to indicate dropping, retaining, handling of missing values for each variable Understanding the SAS® DATA Step and the Program Data Vector 23 Data Types and Conversion All SAS values l are one off ttwo d data t ttypes: numeric i or character. h t • • • • • • • Hundreds of different data types can be read or written via the PDV, only t two stored t d In PDV, SAS Dataset every character value is stored as a native EBCDIC or ASCII value with length between 1 and at least 32767 N Numerics i stored t d as d double bl precision i i flfloating ti point i t values l with ith llength th between 3 and 8 bytes. Storing only two data types greatly simplifies things for SAS datasets moves the th complication li ti off converting ti diff differentt d data t ttypes ((packed, k d bi binary, etc.) to the INPUT and PUT statement along with appropriate INFORMATS and FORMATS. Floating point for numbers with a length of 8 allows for storage of very large numbers (or small) without overflow Floating point does have minor mathematical issues of its own. Understanding the SAS® DATA Step and the Program Data Vector 24 Data Types and Conversion 1234567890123456789012345678901234 input buffer BETH CHRIS JOHN H H H 12 2 7 4822.12 233.11 678.43 982.10 94.12 150.11 data softsale; ; infile rawin; input name $1-10 division $12 years 15-16 sales 19-25 expense 27-34; run; PROGRAM DATA VECTOR Name: Type: Length: Format: Informat: Label: Flags: NAME CHAR 10 DIVISION CHAR 1 YEARS NUM 8 SALES NUM 8 EXPENSE _ERROR_ _N_ NUM NUM NUM 8 8 8 D D Value DATASET DESCRIPTOR PORTION (DISK) ( ) DATASET DATA PORTION Name: Type: Length: g Format: Informat: Label: NAME CHAR 10 DIVISION CHAR 1 BETH CHRIS JOHN H H H YEARS NUM 8 12 2 7 SALES NUM 8 EXPENSE NUM 8 4822.12 982.10 233.11 94.12 678.43 150.11 Understanding the SAS® DATA Step and the Program Data Vector 25 Pseudo Variables S Special i l variables i bl th thatt th the compiler il creates t th thatt are nott added dd d tto output t t fil file. Partial list: _N_ contains the number of times the DATA step has looped. _ERROR_ is set to 0 if there were no input errors, otherwise 1. FIRST.variable showing beginning of control break LAST.variable showing end of control break _INFILE_ contains entire input buffer Others can be requested by the programmer to: • Detect end of file • Access control blocks • Access and alter system information Understanding the SAS® DATA Step and the Program Data Vector 26 Pseudo Variables 1234567890123456789012345678901234 input buffer BETH CHRIS JOHN H H H 12 2 7 4822.12 233.11 678.43 982.10 94.12 150.11 data softsale; ; infile rawin; input name $1-10 division $12 years 15-16 sales 19-25 expense 27-34; run; PROGRAM DATA VECTOR Name: Type: Length: Format: Informat: Label: Flags: NAME CHAR 10 DIVISION CHAR 1 YEARS NUM 8 SALES NUM 8 EXPENSE _ERROR_ _N_ NUM NUM NUM 8 8 8 D D Value DATASET DESCRIPTOR PORTION (DISK) ( ) DATASET DATA PORTION Name: Type: Length: g Format: Informat: Label: NAME CHAR 10 DIVISION CHAR 1 BETH CHRIS JOHN H H H YEARS NUM 8 12 2 7 SALES NUM 8 EXPENSE NUM 8 4822.12 982.10 233.11 94.12 678.43 150.11 Understanding the SAS® DATA Step and the Program Data Vector 27 Output Datasets U Usually ll a SAS d datafile, t fil b butt raw fil files and d reports t can also l b be created. t d • • • • • • SAS data sets are “self-defining” via dataset descriptor Much simpler operation for the programmer SAS automatically builds the structures needed Outputs the record at DATA step return. In addition, all variables except dropped and pseudo variables are kept Descriptor info can be printed with PROC CONTENTS, dictionary tables Notes: • If a raw file or report is produced, buffers opposite those of INFILE are used. • Raw files can pass info to other programs. Understanding the SAS® DATA Step and the Program Data Vector 28 Questions About The Data Step If traditional t diti l programming i experience i is i applied li d thi this DATA step, t questions ti might be: Where are the opens and closes? Where do we write out records? What is looping? Wh does When d th the program stop? t ? Understanding the SAS® DATA Step and the Program Data Vector 29 Assumptions Made In The Data Step Th d The default f lt actions ti off th the DATA step t easily il accommodates d t mostt programs. • • • • • • All rows are read from files starting with the first record until last record. Input values from a previous row are cleared before reading next row. All variables referenced will be included on the output file. All records will be included on the resulting output file. All files should be opened at the beginning and closed at the end of step. Programs should not continue to loop if no data is read in previous pass. Understanding the SAS® DATA Step and the Program Data Vector 30 Assumptions Made In The Data Step The SAS compiler makes the following assumption and inserts code to do: • • • • • Immediately upon entry check for infinite looping All values from non-SAS files are cleared before executing any statements. If any reading statement would read a record after end of file, step stops. If the program reaches the last statement in the step step, or if a RETURN statement is executed, the current PDV contents (all columns for each row) is output to the SAS data set being built. A branch is executed to go to the top and enter the DATA step for another pass. Understanding the SAS® DATA Step and the Program Data Vector 31 Assumptions Made In The Data Step A th view Another i it is i that th t the th compiler il iinserts t th the bl blue ititalic li code. d data softsale; Check for looping looping, initialize PDV infile rawin; if at EOF then stop Input Name Years Expense $1-10 Division 15-16 Sales 28-34 State output to SAS Dataset goto top of DATA step $12 19-25 $36-37; run; Notes: p virtually y infinite-loop pp proof. • These save effort and make DATA steps Understanding the SAS® DATA Step and the Program Data Vector 32 Executing the DATA Step Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN data softsale; init buffer, PDV infile rawin; if at EOF then stop input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 Input buffer Logical Program Data Vector (Descriptor) Name Name Division Years Sales Dataset work.softsale Division Years Sales Expense Expense State State (Data) Understanding the SAS® DATA Step and the Program Data Vector 33 Executing the DATA Step Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN data softsale; init buffer, PDV infile rawin; if at EOF then stop input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 Input buffer Logical Program Data Vector (Descriptor) Name Name Division Years Sales Dataset work.softsale Division Years Sales Expense Expense State State (Data) Understanding the SAS® DATA Step and the Program Data Vector 34 Executing the DATA Step data softsale; init buffer, PDV infile rawin; if at EOF then stop Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 Input buffer Logical Program Data Vector (Descriptor) Name Name Division Years Sales Expense . . . Dataset work.softsale Division Years Sales Expense State State (Data) Understanding the SAS® DATA Step and the Program Data Vector 35 Executing the DATA Step data softsale; init buffer, PDV infile rawin; if at EOF then stop Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 Input buffer Logical Program Data Vector (Descriptor) Name Name Division Years Sales Expense . . . Dataset work.softsale Division Years Sales Expense State State (Data) Understanding the SAS® DATA Step and the Program Data Vector 36 Executing the DATA Step data softsale; init buffer, PDV infile rawin; if at EOF then stop Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 Input buffer Logical Program Data Vector (Descriptor) Name Name Division Years Sales Expense . . . Dataset work.softsale Division Years Sales Expense State State (Data) Understanding the SAS® DATA Step and the Program Data Vector 37 Executing the DATA Step data softsale; init buffer, PDV infile rawin; if at EOF then stop Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 CHRIS H 2 233.11 94.12 WI Input buffer Logical Program Data Vector (Descriptor) Name Name Division Years Sales Expense . . . Dataset work.softsale Division Years Sales Expense State State (Data) Understanding the SAS® DATA Step and the Program Data Vector 38 Executing the DATA Step data softsale; init buffer, PDV infile rawin; if at EOF then stop Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 CHRIS H 2 233.11 94.12 WI Input buffer Logical Program Data Vector (Descriptor) Name CHRIS Name Division Years Sales Expense . . . Dataset work.softsale Division Years Sales Expense State State (Data) Understanding the SAS® DATA Step and the Program Data Vector 39 Executing the DATA Step Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN data softsale; init buffer, PDV infile rawin; if at EOF then stop input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 CHRIS H 2 233.11 94.12 WI Input buffer Logical Program Data Vector (Descriptor) Name CHRIS Name Division H Years Sales Expense . . . Dataset work.softsale Division Years Sales Expense State State (Data) Understanding the SAS® DATA Step and the Program Data Vector 40 Executing the DATA Step Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN data softsale; init buffer, PDV infile rawin; if at EOF then stop input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 CHRIS H 2 233.11 94.12 WI Input buffer Logical Program Data Vector (Descriptor) Name CHRIS Name Division H Years Sales Expense 2 . . Dataset work.softsale Division Years Sales Expense State State (Data) Understanding the SAS® DATA Step and the Program Data Vector 41 Executing the DATA Step Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN data softsale; init buffer, PDV infile rawin; if at EOF then stop input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 CHRIS H 2 233.11 94.12 WI Input buffer Logical Program Data Vector (Descriptor) Name CHRIS Name Division H Years Sales Expense 2 233.11 . Dataset work.softsale Division Years Sales Expense State State (Data) Understanding the SAS® DATA Step and the Program Data Vector 42 Executing the DATA Step Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN data softsale; init buffer, PDV infile rawin; if at EOF then stop input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 CHRIS H 2 233.11 94.12 WI Input buffer Logical Program Data Vector (Descriptor) Name CHRIS Name Division H Years Sales Expense 2 233.11 94.12 Dataset work.softsale Division Years Sales Expense State State (Data) Understanding the SAS® DATA Step and the Program Data Vector 43 Executing the DATA Step Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN data softsale; init buffer, PDV infile rawin; if at EOF then stop input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 CHRIS H 2 233.11 94.12 WI Input buffer Logical Program Data Vector (Descriptor) Name CHRIS Name Division H Years Sales Expense 2 233.11 94.12 Dataset work.softsale Division Years Sales Expense State WI State (Data) Understanding the SAS® DATA Step and the Program Data Vector 44 Executing the DATA Step Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN data softsale; init buffer, PDV infile rawin; if at EOF then stop input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 CHRIS H 2 233.11 94.12 WI Input buffer Logical Program Data Vector Name CHRIS (Descriptor) Name (Data) CHRIS Division H Years Sales Expense 2 233.11 94.12 Dataset work.softsale Division Years Sales H 2 Understanding the SAS® DATA Step and the Program Data Vector 233 11 233.11 Expense 94 12 94.12 State WI State WI 45 Executing the DATA Step Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN data softsale; init buffer, PDV infile rawin; if at EOF then stop input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 CHRIS H 2 233.11 94.12 WI Input buffer Logical Program Data Vector Name CHRIS (Descriptor) Name (Data) CHRIS Division H Years Sales Expense 2 233.11 94.12 Dataset work.softsale Division Years Sales H 2 Understanding the SAS® DATA Step and the Program Data Vector 233 11 233.11 Expense 94 12 94.12 State WI State WI 46 Executing the DATA Step Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN data softsale; init buffer, PDV infile rawin; if at EOF then stop input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 CHRIS H 2 233.11 94.12 WI Input buffer Logical Program Data Vector Name CHRIS (Descriptor) Name (Data) CHRIS Division H Years Sales Expense 2 233.11 94.12 Dataset work.softsale Division Years Sales H 2 Understanding the SAS® DATA Step and the Program Data Vector 233 11 233.11 Expense 94 12 94.12 State WI State WI 47 Executing the DATA Step Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN data softsale; init buffer, PDV infile rawin; if at EOF then stop input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 Input buffer Logical Program Data Vector Name (Descriptor) Name (Data) CHRIS Division Years Sales Expense . . . Dataset work.softsale Division Years Sales H 2 Understanding the SAS® DATA Step and the Program Data Vector 233 11 233.11 Expense 94 12 94.12 State State WI 48 Executing the DATA Step Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN data softsale; init buffer, PDV infile rawin; if at EOF then stop input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 Input buffer Logical Program Data Vector Name (Descriptor) Name (Data) CHRIS Division Years Sales Expense . . . Dataset work.softsale Division Years Sales H 2 Understanding the SAS® DATA Step and the Program Data Vector 233 11 233.11 Expense 94 12 94.12 State State WI 49 Executing the DATA Step Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN data softsale; init buffer, PDV infile rawin; if at EOF then stop input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 Input buffer Logical Program Data Vector Name (Descriptor) Name (Data) CHRIS Division Years Sales Expense . . . Dataset work.softsale Division Years Sales H 2 Understanding the SAS® DATA Step and the Program Data Vector 233 11 233.11 Expense 94 12 94.12 State State WI 50 Executing the DATA Step Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN data softsale; init buffer, PDV infile rawin; if at EOF then stop input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 MARK H 5 298.12 52.65 WI Input buffer Logical Program Data Vector Name (Descriptor) Name (Data) CHRIS Division Years Sales Expense . . . Dataset work.softsale Division Years Sales H 2 Understanding the SAS® DATA Step and the Program Data Vector 233 11 233.11 Expense 94 12 94.12 State State WI 51 Executing the DATA Step Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN data softsale; init buffer, PDV infile rawin; if at EOF then stop input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; Each row is handled the same way. Let's skip through MARK’s input step. output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 MARK H 5 298.12 52.65 WI Input buffer Logical Program Data Vector Name (Descriptor) Name (Data) CHRIS Division Years Sales Expense . . . Dataset work.softsale Division Years Sales H 2 Understanding the SAS® DATA Step and the Program Data Vector 233 11 233.11 Expense 94 12 94.12 State State WI 52 Executing the DATA Step Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN data softsale; init buffer, PDV infile rawin; if at EOF then stop input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 MARK H 5 298.12 52.65 WI Input buffer Logical Program Data Vector (Descriptor) Name MARK Name CHRIS Division H Years Sales Expense 5 298.12 52.65 Dataset work.softsale Division Years Sales H 2 233.11 Expense 94.12 State WI State WI (Data) Understanding the SAS® DATA Step and the Program Data Vector 53 Executing the DATA Step Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN data softsale; init buffer, PDV infile rawin; if at EOF then stop input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 MARK H 5 298.12 52.65 WI Input buffer Logical Program Data Vector (Descriptor) (Data) Name MARK Name CHRIS MARK Division H Years Sales Expense 5 298.12 52.65 Dataset work.softsale Division Years Sales H 2 233.11 H 5 298.12 Understanding the SAS® DATA Step and the Program Data Vector Expense 94.12 52.65 State WI State WI WI 54 Executing the DATA Step Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN data softsale; init buffer, PDV infile rawin; if at EOF then stop input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 MARK H 5 298.12 52.65 WI Input buffer Logical Program Data Vector (Descriptor) (Data) Name MARK Name CHRIS MARK Division H Years Sales Expense 5 298.12 52.65 Dataset work.softsale Division Years Sales H 2 233.11 H 5 298.12 Understanding the SAS® DATA Step and the Program Data Vector Expense 94.12 52.65 State WI State WI WI 55 Executing the DATA Step Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN data softsale; init buffer, PDV infile rawin; if at EOF then stop input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 MARK H 5 298.12 52.65 WI Input buffer Logical Program Data Vector (Descriptor) (Data) Name MARK Name CHRIS MARK Division H Years Sales Expense 5 298.12 52.65 Dataset work.softsale Division Years Sales H 2 233.11 H 5 298.12 Understanding the SAS® DATA Step and the Program Data Vector Expense 94.12 52.65 State WI State WI WI 56 Executing the DATA Step data softsale; init buffer, PDV infile rawin; if at EOF then stop Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 Input buffer Logical Program Data Vector (Descriptor) (Data) Name Name CHRIS MARK Division Years Sales Expense . . . Dataset work.softsale Division Years Sales H 2 233.11 H 5 298.12 Understanding the SAS® DATA Step and the Program Data Vector Expense 94.12 52.65 State State WI WI 57 Executing the DATA Step data softsale; init buffer, PDV infile rawin; if at EOF then stop Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 Input buffer Logical Program Data Vector (Descriptor) (Data) Name Name CHRIS MARK Division Years Sales Expense . . . Dataset work.softsale Division Years Sales H 2 233.11 H 5 298.12 Understanding the SAS® DATA Step and the Program Data Vector Expense 94.12 52.65 State State WI WI 58 Executing the DATA Step data softsale; init buffer, PDV infile rawin; if at EOF then stop Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 Input buffer Logical Program Data Vector (Descriptor) (Data) Name Name CHRIS MARK Division Years Sales Expense . . . Dataset work.softsale Division Years Sales H 2 233.11 H 5 298.12 Understanding the SAS® DATA Step and the Program Data Vector Expense 94.12 52.65 State State WI WI 59 Executing the DATA Step data softsale; init buffer, PDV infile rawin; if at EOF then stop Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 SARAH S 6 301.21 65.17 MN Input buffer Logical Program Data Vector (Descriptor) (Data) Name Name CHRIS MARK Division Years Sales Expense . . . Dataset work.softsale Division Years Sales H 2 233.11 H 5 298.12 Understanding the SAS® DATA Step and the Program Data Vector Expense 94.12 52.65 State State WI WI 60 Executing the DATA Step data softsale; init buffer, PDV infile rawin; if at EOF then stop Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; Each row is handled the same way. Let's skip through SARAH’S input step. output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 SARAH S 6 301.21 65.17 MN Input buffer Logical Program Data Vector (Descriptor) (Data) Name Name CHRIS MARK Division Years Sales Expense . . . Dataset work.softsale Division Years Sales H 2 233.11 H 5 298.12 Understanding the SAS® DATA Step and the Program Data Vector Expense 94.12 52.65 State State WI WI 61 Executing the DATA Step Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN data softsale; init buffer, PDV infile rawin; if at EOF then stop input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 SARAH S 6 301.21 65.17 MN Input buffer Logical Program Data Vector (Descriptor) (Data) Name SARAH Name CHRIS MARK Division S Years Sales Expense 6 301.21 65.17 Dataset work.softsale Division Years Sales H 2 233.11 H 5 298.12 Understanding the SAS® DATA Step and the Program Data Vector Expense 94.12 52.65 State MN State WI WI 62 Executing the DATA Step Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN data softsale; init buffer, PDV infile rawin; if at EOF then stop input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 SARAH S 6 301.21 65.17 MN Input buffer Logical Program Data Vector (Descriptor) (Data) Name SARAH Name CHRIS MARK SARAH Division S Years Sales Expense 6 301.21 65.17 Dataset work.softsale Division Years Sales H 2 233.11 H 5 298.12 S 6 301.21 Understanding the SAS® DATA Step and the Program Data Vector Expense 94.12 52.65 65.17 State MN State WI WI MN 63 Executing the DATA Step Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN data softsale; init buffer, PDV infile rawin; if at EOF then stop input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 SARAH S 6 301.21 65.17 MN Input buffer Logical Program Data Vector (Descriptor) (Data) Name SARAH Name CHRIS MARK SARAH Division S Years Sales Expense 6 301.21 65.17 Dataset work.softsale Division Years Sales H 2 233.11 H 5 298.12 S 6 301.21 Understanding the SAS® DATA Step and the Program Data Vector Expense 94.12 52.65 65.17 State MN State WI WI MN 64 Executing the DATA Step data softsale; init buffer, PDV infile rawin; if at EOF then stop Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 Input buffer Logical Program Data Vector (Descriptor) (Data) Name Name CHRIS MARK SARAH Division Years Sales Expense . . . Dataset work.softsale Division Years Sales H 2 233.11 H 5 298.12 S 6 301.21 Understanding the SAS® DATA Step and the Program Data Vector Expense 94.12 52.65 65.17 State State WI WI MN 65 Executing the DATA Step data softsale; init buffer, PDV infile rawin; if at EOF then stop Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 Input buffer Logical Program Data Vector (Descriptor) (Data) Name Name CHRIS MARK SARAH Division Years Sales Expense . . . Dataset work.softsale Division Years Sales H 2 233.11 H 5 298.12 S 6 301.21 Understanding the SAS® DATA Step and the Program Data Vector Expense 94.12 52.65 65.17 State State WI WI MN 66 Executing the DATA Step data softsale; init buffer, PDV infile rawin; if at EOF then stop Fileref rawin 1 2 3 1234567890123456789012345678901234567 CHRIS H 2 233.11 94.12 WI End MARK H 5 298.12 52.65 WI SARAH S 6 301.21 65.17 MN input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 Input buffer Logical Program Data Vector (Descriptor) (Data) Name Name CHRIS MARK SARAH Division Years Sales Expense . . . Dataset work.softsale Division Years Sales H 2 233.11 H 5 298.12 S 6 301.21 Understanding the SAS® DATA Step and the Program Data Vector Expense 94.12 52.65 65.17 State State WI WI MN 67 Executing the DATA Step Fileref rawin 1 2 3 We're done! 1234567890123456789012345678901234567 data softsale; Go to the next step. CHRIS H 2 233.11 94.12 WI init buffer, PDV End MARK H 5 298.12 52.65 WI infile rawin; SARAH S 6 301.21 65.17 MN if at EOF then stop input Name $1-10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 State $36-37; output to SAS Dataset goto top of Data Step run; 1 2 3 4 8 1234567890123456789012345678901234567890...0 Input buffer Logical Program Data Vector (Descriptor) (Data) Name Name CHRIS MARK SARAH Division Years Sales Expense . . . Dataset work.softsale Division Years Sales H 2 233.11 H 5 298.12 S 6 301.21 Understanding the SAS® DATA Step and the Program Data Vector Expense 94.12 52.65 65.17 State State WI WI MN 68 Overriding Data Step Assumptions N t allll programs fit scenario Not i above; b can we alter lt b behavior h i off th the program? ? • • • • • • • The FIRSTOBS= and OBS= system options begin and end logically after the first record and before the last record respectively respectively. The RETAIN compiler statement instructs the step to not initialize variables. The STOP statement (usually used with IF statement) stops step early early. The DELETE, subsetting IF statements, exit the DATA step early; never reach OUTPUT. RETURN exits the DATA step early but does output to the SAS file file. DROP, KEEP statements, dataset options exclude or include columns. GOTO and various DO groups alter the looping path for the program. Understanding the SAS® DATA Step and the Program Data Vector 69 Retaining Data Values S Sometimes ti variables i bl should h ld nott b be cleared l d upon exit/reentry. it/ t The RETAIN compiler statement: • Specifies listed variables should not be initialized • Can also set an initial value • Iif first reference sets compiler attributes • Is essentially a flag set by compiler to tell execution don’t clear • Can be coded anywhere in DATA Step • Variables read with SET,, MERGE,, or UPDATE,, MODIFY are retained. Understanding the SAS® DATA Step and the Program Data Vector 70 Retaining Data Values Example: Read a file with CITY in the first row row, RETAIN the value value. 1234567890123456789012345678901234 MADISON BETH CHRIS JOHN H H H 12 2 7 4822.12 233.11 678.43 982.10 94.12 150.11 data softsale; infile rawin missover; if _N_ = 1 then do; input p city y $2-8; ; delete; end; input name $1-10 division $12 years 15-16 sales 19-25 expense 27-34; retain city; run; PROGRAM DATA VECTOR N Name: Type: Length: Format: Informat: Label: Flags: NAME CHAR 10 DIVISION CHAR 1 YEARS NUM 8 SALES NUM 8 EXPENSE CITY NUM CHAR 8 7 R Value DATASET DESCRIPTOR PORTION (DISK) Format: DATASET DATA PORTION Name: Type: Length: Informat: Label: _ERROR_ ERROR NUM 8 _N_ N NUM 8 D D MADISON NAME CHAR 10 DIVISION CHAR 1 YEARS NUM 8 BETH CHRIS JOHN H H H 12 2 7 SALES NUM 8 4822.12 4822 12 233.11 678.43 EXPENSE CITY NUM CHAR 8 7 982 982.10 10 94.12 150.11 MADISON MADISON MADISON Understanding the SAS® DATA Step and the Program Data Vector 71 Stopping Early N Normally ll jjobs b run until til EOF, EOF OBS OBS= or STOP can stop t step t early. l Example: Stop after 50 records data softsale; infile rawin; if _N_ N = > 50 then stop; input name $1-10 division $12 years 15-16 sales 19-25 expense 27-34; run; Understanding the SAS® DATA Step and the Program Data Vector 72 Stopping Late Sometimes S ti a program should h ld continue ti even if the th llastt record d was read. d Example: Calculate a percentage of total DATA PCTDS; IF _N_ = 1 THEN DO UNTIL(EOF); UNTIL(EOF) SET CONCAT(KEEP=NAME SALES) END=EOF; TOTSALES+SALES; END; SET CONCAT(KEEP=NAME SALES); SALESPCT=(SALES*100)/TOTSALES; RUN; PDV Name Sales TOTSALES CONCAT DATASET OBS 1 2 3 4 5 6 7 NAME BETH CHRIS JOHN MARK ANDREW BENJAMIN JANET YEARS 12 2 7 5 24 3 1 SALES 4822.12 233.11 678.43 298 12 298.12 1762.11 201.11 98.11 SALESPCT Understanding the SAS® DATA Step and the Program Data Vector 73 Outputting And Deleting Observations D f lt is Default i tto output t t the th currentt row if DATA step t reaches h implied i li d OUTPUT. OUTPUT Statements to discard rows: • DELETE (usually after an IF) leaves the DATA step without OUTPUT. • False cases of Subsetting IF statements exit without OUTPUTting. • RETURN exits the step but does OUTPUT. • Explicit OUTPUT statement OUTPUTs when executed, but no implied OUTPUT at bottom. • You may prefer those positive statements such as Subsetting IF, OUTPUT etc. • You might use a negative statement such as DELETE to filter unwanted rows. Note: • WHERE also filters rows in engine, not in DATA step. Understanding the SAS® DATA Step and the Program Data Vector 74 Outputting And Deleting Observations Example: E l O Only l output t t records d with ith more th than 4000 in i Sales. S l . data softsale; if no input last time thru then stop initialize PDV infile rawin; if at EOF then stop input Name $1-10 $1 10 Division $12 Years 15-16 Sales 19-25 Expense 28-34 St t State $36 $36-37; 37 If sales > 4000; If sales not gt 4000 then goto top of DATA step output to SAS Dataset goto top g p of DATA step p run; Note: • WHERE also filters rows in engine, not in DATA step. Understanding the SAS® DATA Step and the Program Data Vector 75 Losing the Crowd Understanding the SAS® DATA Step and the Program Data Vector 76 Accumulating Totals in a DATA Step W should We h ld b be able bl tto write it fformulas l tto countt or sum across observations. b ti In a DATA step, read the following dataset and do the following: • Count all employees and print running counts. • Sum hours and print running sums. fileref rawin JOE PETE STEVE TOM 40 20 . 35 Understanding the SAS® DATA Step and the Program Data Vector 77 Accumulating Totals in a DATA Step (continued) data timecard; infile rawin; input Name $ Hours; Ktr=ktr+1; Ktr ktr 1; Hourstot=hourstot+hours; run; Fileref i rawin JOE PETE STEVE TOM 40 20 . 35 proc print data=timecard; title 'SOFTCO PAYROLL'; run; SOFTCO PAYROLL OBS 1 2 3 4 Name JOE PETE STEVE TOM Hours 40 20 . 35 Ktr . . . . Hourstot . . . . Question: Why didn't the above program work correctly? Understanding the SAS® DATA Step and the Program Data Vector 78 Accumulating Totals in a DATA Step (continued) Adding values Addi l across records d has h three th problems: bl 1. All variables are set to missing on every record. 2. Totals normally need to start at 0, not missing. 3. When a missing value is added to any value, the result is a missing value. The SUM statement is a special assignment statement to add values across observations. SYNTAX: variable+expression; Notes: • SUM variables start at 0, not missing. • SUM variables are retained from pass to pass. • Any missing values encountered are ignored. • RETAIN and SUM function could also be used. Understanding the SAS® DATA Step and the Program Data Vector 79 An Accumulation Solution data timecard; infile rawin; input Name $ Hours; Ktr+1; Hourstot+hours; run; fileref rawin JOE PETE STEVE TOM 40 20 . 35 proc print data=timecard; title 'SOFTCO SOFTCO PAYROLL PAYROLL'; ; run; flags PDV Name Hours Ktr Understanding the SAS® DATA Step and the Program Data Vector Hourstot 80 An Accumulation Solution data timecard; infile rawin; input Name $ Hours; Ktr+1; Hourstot+hours; run; fileref rawin JOE PETE STEVE TOM proc print data=timecard; title 'SOFTCO SOFTCO PAYROLL PAYROLL'; ; run; flags PDV Name Hours Ktr RM . 0 Understanding the SAS® DATA Step and the Program Data Vector Hourstot RM 40 20 . 35 Set at compile time. 0 81 An Accumulation Solution data timecard; infile rawin; input Name $ Hours; Ktr+1; Hourstot+hours; run; fileref rawin JOE PETE STEVE TOM 40 20 . 35 proc print data=timecard; title 'SOFTCO SOFTCO PAYROLL PAYROLL'; ; run; flags PDV Name Hours Ktr RM . 0 Understanding the SAS® DATA Step and the Program Data Vector Hourstot RM 0 82 An Accumulation Solution data timecard; infile rawin; input Name $ Hours; Ktr+1; Hourstot+hours; run; fileref rawin JOE PETE STEVE TOM 40 20 . 35 proc print data=timecard; title 'SOFTCO SOFTCO PAYROLL PAYROLL'; ; run; flags PDV Name JOE Hours 40 Ktr RM 0 Understanding the SAS® DATA Step and the Program Data Vector Hourstot RM 0 83 An Accumulation Solution data timecard; infile rawin; input Name $ Hours; Ktr+1; Hourstot+hours; run; fileref rawin proc print data=timecard; title 'SOFTCO SOFTCO PAYROLL PAYROLL'; ; run; flags PDV Name JOE JOE PETE STEVE TOM 40 20 . 35 0 + 1 Hours 40 Ktr RM 1 Understanding the SAS® DATA Step and the Program Data Vector Hourstot RM 0 84 An Accumulation Solution data timecard; infile rawin; input Name $ Hours; Ktr+1; Hourstot+hours; run; fileref rawin JOE PETE STEVE TOM proc print data=timecard; title 'SOFTCO SOFTCO PAYROLL PAYROLL'; ; run; flags PDV Name JOE 40 20 . 35 0 + 40 Hours 40 Ktr RM 1 Understanding the SAS® DATA Step and the Program Data Vector Hourstot RM 40 85 An Accumulation Solution data timecard; infile rawin; input Name $ Hours; Ktr+1; Hourstot+hours; run; fileref rawin JOE PETE STEVE TOM 40 20 . 35 proc print data=timecard; title 'SOFTCO SOFTCO PAYROLL PAYROLL'; ; run; flags PDV Name Hours JOE Obs 1 Ktr RM 40 Name JOE Hourstot RM 1 40 Hours Ktr Hourstot 40 1 40 Understanding the SAS® DATA Step and the Program Data Vector 86 An Accumulation Solution data timecard; infile rawin; input Name $ Hours; Ktr+1; Hourstot+hours; run; fileref rawin JOE PETE STEVE TOM 40 20 . 35 proc print data=timecard; title 'SOFTCO SOFTCO PAYROLL PAYROLL'; ; run; flags PDV Name Obs 1 Name JOE Hours Ktr RM . 1 Hourstot RM 40 Hours Ktr Hourstot 40 1 40 Understanding the SAS® DATA Step and the Program Data Vector 87 An Accumulation Solution data timecard; infile rawin; input Name $ Hours; Ktr+1; Hourstot+hours; run; fileref rawin JOE PETE STEVE TOM 40 20 . 35 proc print data=timecard; title 'SOFTCO SOFTCO PAYROLL PAYROLL'; ; run; flags PDV Name Obs 1 Name JOE Hours Ktr RM . 1 Hourstot RM 40 Hours Ktr Hourstot 40 1 40 Understanding the SAS® DATA Step and the Program Data Vector 88 An Accumulation Solution data timecard; infile rawin; input Name $ Hours; Ktr+1; Hourstot+hours; run; fileref rawin JOE PETE STEVE TOM 40 20 . 35 proc print data=timecard; title 'SOFTCO SOFTCO PAYROLL PAYROLL'; ; run; flags PDV Name Hours PETE Obs 1 Ktr RM 20 Name JOE Hourstot RM 1 40 Hours Ktr Hourstot 40 1 40 Understanding the SAS® DATA Step and the Program Data Vector 89 An Accumulation Solution data timecard; infile rawin; input Name $ Hours; Ktr+1; Hourstot+hours; run; fileref rawin proc print data=timecard; title 'SOFTCO SOFTCO PAYROLL PAYROLL'; ; run; flags PDV Name Obs 1 Ktr RM 20 Name JOE 40 20 . 35 1 + 1 Hours PETE JOE PETE STEVE TOM Hourstot RM 2 40 Hours Ktr Hourstot 40 1 40 Understanding the SAS® DATA Step and the Program Data Vector 90 An Accumulation Solution data timecard; infile rawin; input Name $ Hours; Ktr+1; Hourstot+hours; run; fileref rawin JOE PETE STEVE TOM proc print data=timecard; title 'SOFTCO SOFTCO PAYROLL PAYROLL'; ; run; flags PDV Name Obs 1 40 + 20 Hours PETE Ktr RM 20 Name JOE 40 20 . 35 Hourstot RM 2 60 Hours Ktr Hourstot 40 1 40 Understanding the SAS® DATA Step and the Program Data Vector 91 An Accumulation Solution data timecard; infile rawin; input Name $ Hours; Ktr+1; Hourstot+hours; run; fileref rawin JOE PETE STEVE TOM 40 20 . 35 proc print data=timecard; title 'SOFTCO SOFTCO PAYROLL PAYROLL'; ; run; flags PDV Name Hours PETE Ktr RM 20 Hourstot RM 2 60 Obs Name Hours Ktr Hourstot 1 2 JOE PETE 40 20 1 2 40 60 Understanding the SAS® DATA Step and the Program Data Vector 92 An Accumulation Solution data timecard; infile rawin; input Name $ Hours; Ktr+1; Hourstot+hours; run; fileref rawin JOE PETE STEVE TOM 40 20 . 35 proc print data=timecard; title 'SOFTCO SOFTCO PAYROLL PAYROLL'; ; run; flags PDV Name Hours Ktr RM . 2 Hourstot RM 60 Obs Name Hours Ktr Hourstot 1 2 JOE PETE 40 20 1 2 40 60 Understanding the SAS® DATA Step and the Program Data Vector 93 An Accumulation Solution data timecard; infile rawin; input Name $ Hours; Ktr+1; Hourstot+hours; run; fileref rawin JOE PETE STEVE TOM 40 20 . 35 proc print data=timecard; title 'SOFTCO SOFTCO PAYROLL PAYROLL'; ; run; flags PDV Name Hours Ktr RM . 2 Hourstot RM 60 Obs Name Hours Ktr Hourstot 1 2 JOE PETE 40 20 1 2 40 60 Understanding the SAS® DATA Step and the Program Data Vector 94 An Accumulation Solution data timecard; infile rawin; input Name $ Hours; Ktr+1; Hourstot+hours; run; fileref rawin JOE PETE STEVE TOM 40 20 . 35 proc print data=timecard; title 'SOFTCO SOFTCO PAYROLL PAYROLL'; ; run; flags PDV Name Hours Ktr RM STEVE . 2 Hourstot RM 60 Obs Name Hours Ktr Hourstot 1 2 JOE PETE 40 20 1 2 40 60 Understanding the SAS® DATA Step and the Program Data Vector 95 An Accumulation Solution data timecard; infile rawin; input Name $ Hours; Ktr+1; Hourstot+hours; run; fileref rawin proc print data=timecard; title 'SOFTCO SOFTCO PAYROLL PAYROLL'; ; run; flags PDV JOE PETE STEVE TOM 40 20 . 35 2 + 1 Name Hours Ktr RM STEVE . 3 Hourstot RM 60 Obs Name Hours Ktr Hourstot 1 2 JOE PETE 40 20 1 2 40 60 Understanding the SAS® DATA Step and the Program Data Vector 96 An Accumulation Solution data timecard; infile rawin; input Name $ Hours; Ktr+1; Hourstot+hours; run; Fileref RAWIN JOE PETE STEVE TOM proc print data=timecard; title 'SOFTCO SOFTCO PAYROLL PAYROLL'; ; run; flags PDV + Name Hours Ktr RM STEVE . 3 40 20 . 35 60 . Hourstot RM 60 Obs Name Hours Ktr Hourstot 1 2 JOE PETE 40 20 1 2 40 60 Understanding the SAS® DATA Step and the Program Data Vector 97 An Accumulation Solution data timecard; infile rawin; input Name $ Hours; Ktr+1; Hourstot+hours; run; fileref rawin JOE PETE STEVE TOM 40 20 . 35 proc print data=timecard; title 'SOFTCO SOFTCO PAYROLL PAYROLL'; ; run; flags PDV Name Hours Ktr RM STEVE . 3 Obs 1 2 3 Name JOE PETE STEVE Hourstot RM 60 Hours Ktr Hourstot 40 20 . 1 2 3 40 60 60 Understanding the SAS® DATA Step and the Program Data Vector 98 An Accumulation Solution data timecard; infile rawin; input Name $ Hours; Ktr+1; Hourstot+hours; run; fileref rawin JOE PETE STEVE TOM 40 20 . 35 proc print data=timecard; title 'SOFTCO SOFTCO PAYROLL PAYROLL'; ; run; flags PDV Name Obs 1 2 3 Name JOE PETE STEVE Hours Ktr RM . 3 Hourstot RM 60 Hours Ktr Hourstot 40 20 . 1 2 3 40 60 60 Understanding the SAS® DATA Step and the Program Data Vector 99 An Accumulation Solution data timecard; infile rawin; input Name $ Hours; Ktr+1; Hourstot+hours; run; fileref rawin JOE PETE STEVE TOM 40 20 . 35 proc print data=timecard; title 'SOFTCO SOFTCO PAYROLL PAYROLL'; ; run; flags PDV Name Obs 1 2 3 Name JOE PETE STEVE Hours Ktr RM . 3 Hourstot RM 60 Hours Ktr Hourstot 40 20 . 1 2 3 40 60 60 Understanding the SAS® DATA Step and the Program Data Vector 100 An Accumulation Solution data timecard; infile rawin; input Name $ Hours; Ktr+1; Hourstot+hours; run; fileref rawin JOE PETE STEVE TOM 40 20 . 35 proc print data=timecard; title 'SOFTCO SOFTCO PAYROLL PAYROLL'; ; run; flags PDV Name Hours TOM Obs 1 2 3 Ktr RM 35 Name JOE PETE STEVE Hourstot RM 3 60 Hours Ktr Hourstot 40 20 . 1 2 3 40 60 60 Understanding the SAS® DATA Step and the Program Data Vector 101 An Accumulation Solution data timecard; infile rawin; input Name $ Hours; Ktr+1; Hourstot+hours; run; fileref rawin proc print data=timecard; title 'SOFTCO SOFTCO PAYROLL PAYROLL'; ; run; flags PDV Name Obs 1 2 3 Ktr RM 35 Name JOE PETE STEVE 40 20 . 35 3 + 1 Hours TOM JOE PETE STEVE TOM Hourstot RM 4 60 Hours Ktr Hourstot 40 20 . 1 2 3 40 60 60 Understanding the SAS® DATA Step and the Program Data Vector 102 An Accumulation Solution data timecard; infile rawin; input Name $ Hours; Ktr+1; Hourstot+hours; run; fileref rawin JOE PETE STEVE TOM proc print data=timecard; title 'SOFTCO SOFTCO PAYROLL PAYROLL'; ; run; flags PDV Name Obs 1 2 3 60 + 35 Hours TOM Ktr RM 35 Name JOE PETE STEVE 40 20 . 35 Hourstot RM 4 95 Hours Ktr Hourstot 40 20 . 1 2 3 40 60 60 Understanding the SAS® DATA Step and the Program Data Vector 103 An Accumulation Solution data timecard; infile rawin; input Name $ Hours; Ktr+1; Hourstot+hours; run; fileref rawin JOE PETE STEVE TOM 40 20 . 35 proc print data=timecard; title 'SOFTCO SOFTCO PAYROLL PAYROLL'; ; run; flags PDV Name Hours TOM Obs 1 2 3 4 Ktr RM 35 Name JOE PETE STEVE TOM Hourstot RM 4 95 Hours Ktr Hourstot 40 20 . 35 1 2 3 4 40 60 60 95 Understanding the SAS® DATA Step and the Program Data Vector 104 An Accumulation Solution data timecard; infile rawin; input Name $ Hours; Ktr+1; Hourstot+hours; run; fileref rawin JOE PETE STEVE TOM 40 20 . 35 proc print data=timecard; title 'SOFTCO SOFTCO PAYROLL PAYROLL'; ; run; Understanding the SAS® DATA Step and the Program Data Vector 105 An Accumulation Solution data timecard; infile rawin; input Name $ Hours; Ktr+1; Hourstot+hours; run; fileref rawin JOE PETE STEVE TOM 40 20 . 35 proc print data=timecard; title 'SOFTCO SOFTCO PAYROLL PAYROLL'; ; run; SOFTCO PAYROLL Obs 1 2 3 4 Name JOE PETE STEVE TOM Hours Ktr Hourstot 40 20 . 35 1 2 3 4 40 60 60 95 Understanding the SAS® DATA Step and the Program Data Vector 106 Another Accumulation Solution R t i and Retain d th the SUM ffunction ti can also l accumulate l t correctly. tl data timecard; infile rawin; input Name $ Hours; ktr=ktr+1; hourstot=sum(hourstot,hours); ( , ); retain Ktr Hourstot 0; run; proc print data=timecard; title 'SOFTCO PAYROLL'; run; Understanding the SAS® DATA Step and the Program Data Vector 107 Compiler Instructions A series of statements that instruct the compiler to alter attributes attributes. • In general, declarative statements • Can be coded in any order • Allow the program to be very explicit in the definition of SAS structures structures. Examples of these statements are: • • • • • • • • LENGTH statement to set a variables internal length INFORMAT to set input format FORMAT to set output p display p y format LABEL to define a variable label ATTRIB to define any or all of the above in one statement DROP to indicate which variables are to be left behind on SAS file KEEP to indicate which variables to include on the SAS file RETAIN to set initial values and instruct SAS to never clear Understanding the SAS® DATA Step and the Program Data Vector 108 Compiler Statements to Alter Variable Attributes data softsale; infile rawin; 1234567890123456789012345678901234 BETH H 12 4822.12 982.10 length name $20; H 2 233.11 94.12 input buffer CHRIS JOHN H 7 678.43 150.11 attr division length length=$2; $2; format sales comma10.2; input name $1-10 division $12 years 15-16 sales 19-25 expense 27-34; drop sales years; run; PROGRAM DATA VECTOR C O DATASET DESCRIPTOR PORTION (DISK) DATASET DATA PORTION Name: Type: Length: Format: o at Informat: Label: Flags: Value NAME CHAR 20 DIVISION CHAR 2 Name: Type: Length: Format: Informat: Label: NAME CHAR 20 DIVISION CHAR 2 BETH CHRIS JOHN H H H YEARS NUM 8 SALES NUM 8 comma. co a D D EXPENSE _ERROR_ NUM NUM 8 8 _N_ NUM 8 EXPENSE NUM 8 982.10 94.12 150.11 Understanding the SAS® DATA Step and the Program Data Vector 109 Debugging Features There are a number of debugging features in the DATA step step. • • • • • • Excellent Interactive DATA step debugger available. Simple DATA step statements that display data data. The LIST statement can display the input buffer from the most recent INPUT. The FILE LOG with PUT can display any text and variable from the PDV in the SAS log. The PUTLOG statement can also display text to the SAS log. PROC CONTENTS and PROC PRINT/REPORT can be used to display the dataset descriptor and data values of the final dataset. Notes: N t • By putting the LIST and PUT/PUTLOG statements at strategic points in the DATA step may be the simplest and best way to debug. Understanding the SAS® DATA Step and the Program Data Vector 110 A Debugging Example Example: The program below selects 0 records. records Which statement is causing the problem? 1234567890123456789012345678901234 BETH CHRIS JOHN H H H 12 2 7 4822.12 233.11 678.43 982.10 94.12 150.11 data softsale; infile rawin missover; input name $1-10 division $12 years 15-16 sales 19-25 expense 27-34; 27 34; if sales > 4000; if division='h'; run; NOTE: O The h d data set WORK.SOFTSALE O SO S h has 0 observations b i and d 5 variables. i bl Understanding the SAS® DATA Step and the Program Data Vector 111 A Debugging Example Now use PUTLOG tto di N display l messages and d values l around d th the IF statements. data softsale; ; infile rawin missover; input name $1-10 division $12 years 15-16 sales 19-25 expense 27-34; putlog '$$$ before sales if ' _n_= sales= division=; if sales > 4000; putlog '$$$ before division if ' _n_= sales= division=; if division='h'; run; $$$ $$$ $$$ $$$ before before before before sales if division sales if sales if _N_=1 sales=4822.12 division=H if _N_=1 sales=4822.12 division=H _N_=2 sales=233.11 division=H _N_ N =3 3 sales sales=678.43 678.43 division division=H H NOTE: The data set WORK.SOFTSALE WORK SOFTSALE has 0 observations and 5 variables. variables Understanding the SAS® DATA Step and the Program Data Vector 112 Other Data Step Statements The SAS DATA step has many many, many more statements that can read read, write write, and process data in almost any form. • • • • • • over 500 DATA step functions interfaces to the SAS macro system much more much written about those features beyond the scope of this paper to try to cover them all. the DATA step is an extremely versatile and full featured programming language. Understanding the SAS® DATA Step and the Program Data Vector 113 Other Topics As powerful and well designed as the DATA step is is, it is different from other languages. • • • • • Perhaps a more standardized language such as SQL might be more auditable and more desirable. Terrible DATA step code can be written. Good design and adequate documentation make a well written program. PROC SQL is also a great tool that adds that language’s features to SAS. SAS programs can be well written programs that can be used for everything from one time programs to full fledged production programs. Understanding the SAS® DATA Step and the Program Data Vector 114 Conclusions The SAS DATA step is an excellent programming language with unique features and extremely versatile features. Understanding the SAS® DATA Step and the Program Data Vector 115 Not Good On the Flight Home Understanding the SAS® DATA Step and the Program Data Vector 116 Contact Us Questions, comments or to subscribe to the free “Missing Semicolon” newslettter. SAS® Training, Consulting, & Help Desk Services Steven J. First President (608) 278-9964 FAX (608) 278-0065 sfirst@sys-seminar sfirst@sys seminar.com com www.sys-seminar.com 2997 Yarmouth Greenway Drive • Madison, WI 53711 Understanding the SAS® DATA Step and the Program Data Vector 117