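PDV (Program Data Vector)

Default operation

(1) The DATA step is a loop: it reruns until it runs out of data. On each pass, data are first read into the Program Data Vector (PDV). In the example below, the Z in the third record is invalid numeric data, so Y is set to missing and _ERROR_ is set to 1 for that pass.

    Data A;
       Put _all_;
       Input X1 X2 Y;
       put _ALL_ //;
       put "Next Pass";
       Datalines;
    5 6 10
    2 8 20
    35 4 Z
    30 1 2 3
    ;
    Proc Print noobs data=A;
    run;

(2) At the end of each pass, the contents of the PDV are written to the data set (automatic variables such as _N_ and _ERROR_ are not written), the variables in the PDV are reset to missing, _N_ is incremented, and the next record is read. Most of these defaults can be overridden.

DROP and RETAIN

(1) Surprisingly, DROP and RETAIN have little to do with each other.
    DROP: do not write this variable to the data set (it is still in the PDV).
    RETAIN: do not reset this variable to missing (.) on each pass.
Note: RETAIN is a compile-time statement, so it takes effect no matter where it appears in the DATA step.

    Data A;
       put _all_;
       input X;
       drop X;
       XSQ = X*X;
       put _all_ //;
       Datalines;
    1
    2
    3
    ;
    Proc Print;
    Run;

(2) RETAIN can also give a variable an initial value. This is done at compile time rather than at execution time, which is why the RETAIN statement below works even though it is the last statement in the step.

    Data A;
       put _all_;
       input Y;
       drop X;
       XSQ = X*X;
       X = X+1;
       put _all_ //;
       retain X 3;
       Datalines;
    1
    2
    3
    ;
    Proc Print;
    Run;

The OUTPUT statement

(1) Once a DATA step contains an OUTPUT statement, the automatic write at the end of each pass is turned off: the PDV contents are written at that point and nowhere else (unless there is another OUTPUT statement). As written below, OUTPUT is commented out, so a single observation is written at the end of the pass (with i=11 and Y=15); removing the * writes ten observations, one per loop iteration.

    Data A;
       Input X;
       do i=1 to 10;
          Y = X+I;
          *output;   /* remove the leading * to write one observation per iteration */
       end;
       Datalines;
    5
    ;
    Proc Print;
    Run;

(2) You can output conditionally. Here only the first, second and fourth records (Price below 2.00) are written:

    Data A;
       Input lbs price_per_lb;
       Price = lbs*price_per_lb;
       If Price < 2.00 then output;
       Datalines;
    19 0.10
    5 0.25
    8 0.75
    20 0.07
    ;
    Proc Print;
    Run;

Using _N_

(1) _N_ exists only in the PDV; it is never written to the data set. Use it there, for example to do something on the first pass only:

    Data A;
       Input lbs price_per_lb;
       If _N_=1 then bill=0;
       retain bill;
       Price = lbs*price_per_lb;
       Bill=Bill+Price*(1.07);   /* price plus 7% tax */
       Datalines;
    19 0.10
    5 0.25
    8 0.75
    20 0.07
    ;
    Proc Print;
       Format bill dollar6.2;
    Run;

(2) A sum statement such as T+1; automatically implies RETAIN T 0; (it also treats a missing value on the right-hand side as 0).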

    Data A;
       Input lbs price_per_lb;
       Price = lbs*price_per_lb;
       Bill+Price*(1.07);   /* sum statement: Bill is retained and starts at 0 */
       Datalines;
    19 0.10
    5 0.25
    8 0.75
    20 0.07
    ;
    Proc Print;
       Format bill dollar6.2;
    Run;

KEEP

(1) The KEEP statement really means "drop everything but ...", so nothing else in the PDV gets into the data set.

    Data A;
       Input X1 X2 X3 X4 X5;
       Keep X3 X4 X5;
       Drop X2 X3;
       Datalines;
    5 10 15 20 25
    ;
    Proc Print;
    Run;
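A variable left out of KEEP is still in the PDV for the whole pass, so it can be used in calculations even though it never reaches the data set. The sketch below illustrates this with KEEP alone; the data set name B and the variable Total are hypothetical names chosen for the illustration. PROC PRINT should list one observation containing only X4, X5 and Total, while the PUT _ALL_ line in the log should still show X1 through X5.

```sas
/* Sketch: X1-X3 are not kept, but they are still in the PDV during the pass.
   Data set B and variable Total are hypothetical names for this illustration. */
Data B;
   Input X1 X2 X3 X4 X5;
   Total = X1 + X2 + X3;   /* uses variables that will not be written */
   put _all_;              /* log lists every PDV variable, kept or not */
   Keep X4 X5 Total;       /* only these three are written to B */
   Datalines;
5 10 15 20 25
;
Proc Print data=B;
Run;
```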