Notes on the UPDATE statement

• Use the UPDATE statement to:
– update a master dataset with new transactions
(e.g. a bank account updated regularly with
deposits and withdrawals…). Not used a lot,
but when you need it, it’s exactly what you
need…
– the general form is:
DATA master_data_set;
UPDATE master_data_set transaction_data_set;
BY variable_list;
RUN;
Notes on the UPDATE statement:
• only two datasets can be specified
(master & transactions)
• both datasets must be sorted by
the BY variables
• the values of the BY variables
must be unique in the master set
(e.g., only one account per
account number in the master bank
dataset; there can be many transactions
per account, though)
• missing values in the transaction
dataset don’t overwrite existing
values in the master dataset.
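A minimal sketch of two of the rules above, using hypothetical dataset and variable names (master, trans, City, Phone) and made-up values entered in-stream: account 101 has two transactions, which are applied in order, and each missing value (.) in a transaction leaves the master value alone. Note that UPDATE replaces values rather than adding to them, so a running balance from deposits and withdrawals would have to be computed separately.

```sas
/* Hypothetical master data: exactly one observation per Account */
DATA master;
   INPUT Account City $ Phone;
   DATALINES;
101 Austin 5550101
102 Dallas 5550102
103 Waco 5550103
;
/* Hypothetical transactions: several per Account are allowed;
   a period means "missing" for both character and numeric values */
DATA trans;
   INPUT Account City $ Phone;
   DATALINES;
101 Houston .
101 . 5559999
103 . 5550333
;
/* Both are already in Account order; sorting is shown for safety */
PROC SORT DATA = master; BY Account; RUN;
PROC SORT DATA = trans;  BY Account; RUN;

DATA master;
   UPDATE master trans;
   BY Account;
RUN;
/* Expected: 101 Houston 5559999, 102 Dallas 5550102 (no transactions),
   103 Waco 5550333 (City kept because the transaction value was missing) */
PROC PRINT DATA = master; RUN;
```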
*Go over the example in section 6.8 on pages 194-195;
LIBNAME perm 'c:\MySASLib';
DATA perm.patientmaster; *INFILE fill in here;
INPUT Account LastName $ 8-16 Address $ 17-34
BirthDate MMDDYY10. Sex $ InsCode $ 48-50 @52 LastUpdate MMDDYY10.;
RUN;
/* Second Program */
LIBNAME perm 'c:\MySASLib';
DATA transactions; *INFILE fill in here;
INPUT Account LastName $ 8-16 Address $ 17-34 BirthDate MMDDYY10.
Sex $ InsCode $ 48-50 @52 LastUpdate MMDDYY10.;
PROC SORT DATA = transactions;
BY Account;
* Update patient data with transactions (the master must already be in Account order);
DATA perm.patientmaster;
UPDATE perm.patientmaster transactions;
BY Account;
PROC PRINT DATA = perm.patientmaster;
FORMAT BirthDate LastUpdate MMDDYY10.;
TITLE 'Admissions Data';
RUN;
There are many SAS dataset OPTIONS. The list in
section 6.9 is not comprehensive, but gives a
flavor of what’s possible…
• RENAME = (oldvariable_name = newvariable_name)
– this changes a variable’s name
• FIRSTOBS = n
– this tells SAS the observation number on which to begin reading
• OBS = n
– this tells SAS the observation number on which to stop reading
• IN = new_variable_name
– this tells SAS to create a temporary variable (it is not written to
the output dataset) that tracks whether the current observation
includes data from that dataset (value=1) or not (value=0).
Let’s try the example in section 6.10…
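A small sketch of three of these options on one SET statement. It uses SASHELP.CLASS, a sample dataset that ships with SAS; the output name subset and the new name HeightIn are hypothetical. FIRSTOBS = 3 and OBS = 6 together select observations 3 through 6, so four observations are read.

```sas
/* Read observations 3-6 of SASHELP.CLASS and rename Height as HeightIn */
DATA subset;
   SET sashelp.class (FIRSTOBS = 3 OBS = 6
                      RENAME = (Height = HeightIn));
RUN;
PROC PRINT DATA = subset; RUN;
```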
Here’s the customer data:
101 Murphy's Sports   115 Main St.
102 Sun N Ski         2106 Newberry Ave.
103 Sports Outfitters 19 Cary Way
104 Cramer & Johnson  4106 Arlington Blvd.
105 Sports Savers     2708 Broadway
Here’s the orders data:
102 562.01
104 254.98
104 1642.00
101 3497.56
102 385.30
Here’s the SAS code to find the customers who didn’t
place any orders:
DATA customer;
*INFILE fill-in TRUNCOVER;
INPUT CustomerNumber Name $ 5-21 Address $ 23-42;
DATA orders;
*INFILE why no TRUNCOVER?;
INPUT CustomerNumber Total;
PROC SORT DATA = orders;
BY CustomerNumber;
* Combine the data sets using the IN= option;
DATA noorders;
MERGE customer orders (IN = Recent);
BY CustomerNumber;
IF Recent = 0;
PROC PRINT DATA = noorders;
TITLE 'Customers with No Orders in the Third Quarter';
RUN;
Now modify the code so you can see the effect of the IN=
option…
• take out the subsetting IF statement
• create a new variable whose values are those of the
variable RECENT (why do I have to do this?)
• PRINT the entire dataset, including this new variable made
from RECENT, to see its effect.
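One possible answer to the exercise, as a sketch: the customer and order data are entered in-stream, and InOrders is a hypothetical name for the copy. The copy is needed because IN= variables are temporary and are never written to the output dataset. Customers 103 and 105 should show InOrders = 0 with a missing Total.

```sas
DATA customer;
   INFILE DATALINES TRUNCOVER;
   INPUT CustomerNumber Name $ 5-21 Address $ 23-42;
   DATALINES;
101 Murphy's Sports   115 Main St.
102 Sun N Ski         2106 Newberry Ave.
103 Sports Outfitters 19 Cary Way
104 Cramer & Johnson  4106 Arlington Blvd.
105 Sports Savers     2708 Broadway
;
DATA orders;
   INPUT CustomerNumber Total;
   DATALINES;
102 562.01
104 254.98
104 1642.00
101 3497.56
102 385.30
;
PROC SORT DATA = orders; BY CustomerNumber; RUN;

/* No subsetting IF: keep every customer, matched or not */
DATA allcustomers;
   MERGE customer orders (IN = Recent);
   BY CustomerNumber;
   InOrders = Recent;   /* permanent copy of the temporary IN= flag */
RUN;
PROC PRINT DATA = allcustomers;
   TITLE 'All Customers with the IN= Flag';
RUN;
```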
• Listing several dataset names in the DATA statement creates
more than one dataset; e.g., DATA X Y Z; INPUT … ;
with no OUTPUT statement creates 3 identical datasets (named
WORK.X, WORK.Y, and WORK.Z). The OUTPUT statement controls which
dataset(s) each observation goes to; the next example uses IF … THEN
statements with OUTPUT to send observations to different datasets.
/* Here’s the zoo data with feeding time as the last
column. Create two datasets using the OUTPUT statement,
one for each of the feeding times: morning (am) and afternoon (pm)
- be sure to put the animals in both datasets if they
are fed at both times… */
bears     Mammalia E2 both
elephants Mammalia W3 am
flamingos Aves     W1 pm
frogs     Amphibia S2 pm
kangaroos Mammalia N4 am
lions     Mammalia W6 pm
snakes    Reptilia S1 pm
tigers    Mammalia W9 both
zebras    Mammalia W2 am
DATA morning afternoon;
*INFILE fill-in here;
INPUT Animal $ 1-9 Class $ 11-18 Enclosure $ FeedTime $;
IF FeedTime = 'am' THEN OUTPUT morning;
ELSE IF FeedTime = 'pm' THEN OUTPUT afternoon;
ELSE IF FeedTime = 'both' THEN OUTPUT; /* no name: write to every dataset in the DATA statement */
PROC PRINT DATA = morning;
TITLE 'Animals with Morning Feedings';
PROC PRINT DATA = afternoon;
TITLE 'Animals with Afternoon Feedings';
RUN;
We may also use OUTPUT statements to generate our own
data and to create datasets from raw data formatted in
unusual ways (see section 6.12 and below…)
dm log 'clear'; dm output 'clear'; options ls=80;
DATA generate;
DO x=1 to 10;
   y=x**2; z=sqrt(x);
   OUTPUT;   * write one observation on each pass through the loop;
END;
PROC PRINT DATA=generate; run; quit;
/* Put this into a raw datafile */
Jan Varsity 56723 Downtown 69831 Super-6 70025
Feb Varsity 62137 Downtown 43901 Super-6 81534
Mar Varsity 49982 Downtown 55783 Super-6 69800
*now read it in properly…;
DATA theaters; *INFILE fill-in;
INPUT Month $ Location $ Tickets @;  * trailing @ holds the line for the next INPUT;
OUTPUT;
INPUT Location $ Tickets @;
OUTPUT;
INPUT Location $ Tickets;
OUTPUT;
PROC PRINT DATA = theaters;
TITLE 'Ticket Sales';
RUN;
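The three INPUT/OUTPUT pairs above can also be written as a loop. A sketch, assuming every line holds exactly three location/ticket pairs; the data are entered in-stream, and the dataset name theaters2 and the loop counter i are hypothetical. The single trailing @ keeps the line across the INPUT statements and is released when the next DATA step iteration begins.

```sas
DATA theaters2;
   INFILE DATALINES;
   INPUT Month $ @;               /* read the month, hold the line */
   DO i = 1 TO 3;                 /* three location/ticket pairs per line */
      INPUT Location $ Tickets @;
      OUTPUT;                     /* one observation per pair */
   END;
   DROP i;
   DATALINES;
Jan Varsity 56723 Downtown 69831 Super-6 70025
Feb Varsity 62137 Downtown 43901 Super-6 81534
Mar Varsity 49982 Downtown 55783 Super-6 69800
;
PROC PRINT DATA = theaters2;
   TITLE 'Ticket Sales (DO-loop version)';
RUN;
```

The loop version produces the same nine observations; it is easier to change if the number of pairs per line changes.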
/* We may also convert observations to
variables and vice versa… */
PROC TRANSPOSE DATA=old OUT=new;
BY var_list; ID variable; VAR var_list;
/* go over the example on p.194 - here’s the
data… team name, player #, type of data, value
of the salary or b.a. */
Garlics 10 salary 43000
Peaches 8 salary 38000
Garlics 21 salary 51000
Peaches 10 salary 47500
Garlics 10 batavg .281
Peaches 8 batavg .252
Garlics 21 batavg .265
Peaches 10 batavg .301
/* Here’s the SAS code… */
DATA baseball; *INFILE fill-in here;
INPUT Team $ Player Type $ Entry;
PROC SORT DATA = baseball; BY Team Player;
PROC PRINT DATA = baseball;
TITLE 'Baseball Data After Sorting and Before Transposing';
* Transpose data so salary & batavg are vars;
PROC TRANSPOSE DATA = baseball OUT = flipped;
BY Team Player; ID Type; VAR Entry;
PROC PRINT DATA = flipped;
TITLE 'Baseball Data After Transposing';
RUN;
BY variables are carried into the new dataset,
not transposed. There will be one obs. for
each BY group per variable transposed.
ID variable’s values become the names of the
variables in the newly transposed dataset.
The ID variable’s values must be unique within
each BY group.
VAR statement names the variables whose values
are going to be transposed. SAS creates a new
variable (_NAME_) whose value is the name
of the VAR variable each observation came from.
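Going the other way ("vice versa") makes _NAME_ easy to see. A sketch, assuming the wide layout that the transposed baseball data has (one observation per player); the dataset names wide and long are hypothetical. With no ID statement, SAS names the value column COL1, and _NAME_ records whether each row came from salary or batavg, giving 8 observations.

```sas
/* One observation per player, already sorted by Team Player */
DATA wide;
   INPUT Team $ Player salary batavg;
   DATALINES;
Garlics 10 43000 .281
Garlics 21 51000 .265
Peaches 8 38000 .252
Peaches 10 47500 .301
;
/* Wide to long: one observation per player per VAR variable */
PROC TRANSPOSE DATA = wide OUT = long;
   BY Team Player;
   VAR salary batavg;
RUN;
PROC PRINT DATA = long; RUN;
```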
SEE THE PREVIOUS EXAMPLE AND THE GRAPHIC ON
THE TOP OF P.194
There are several variables that SAS creates automatically in every
DATA step, but because they are temporary (they are not written to the
new dataset), you never see them. A short list is given on page 196:
_N_ = the number of times SAS has looped through the DATA step
_ERROR_ = 0 or 1 depending upon whether there is a data error for
that particular observation.
FIRST.variable and LAST.variable are created when you use a BY
statement in the DATA step. FIRST.variable has the value 1 when
SAS is processing the first occurrence of a new value of the BY
variable and 0 otherwise. The LAST.variable is similar - it has the
value 1 when SAS is processing the last occurrence of a value of the
BY variable and 0 otherwise.
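A short sketch using FIRST. and LAST. together to count walkers in each age group (data from the example below, entered in-stream; groupcounts and Count are hypothetical names). FIRST.AgeGroup resets the counter, the sum statement Count + 1 keeps its value across observations, and LAST.AgeGroup writes one observation per group. The PUT statement writes _N_ and the FIRST./LAST. flags to the log so you can watch them change.

```sas
DATA walkers;
   INPUT Entry AgeGroup $ Time @@;
   DATALINES;
54 youth 35.5 21 adult 21.6 6 adult 25.8 13 senior 29.0
38 senior 40.3 19 youth 39.6 3 adult 19.0 25 youth 47.3
11 adult 21.9 8 senior 54.3 41 adult 43.0 32 youth 38.6
;
PROC SORT DATA = walkers; BY AgeGroup; RUN;

DATA groupcounts;
   SET walkers;
   BY AgeGroup;
   PUT _N_= AgeGroup= FIRST.AgeGroup= LAST.AgeGroup=;
   IF FIRST.AgeGroup THEN Count = 0;  /* start of a new group */
   Count + 1;                         /* sum statement: value is retained */
   IF LAST.AgeGroup THEN OUTPUT;      /* one observation per group */
   KEEP AgeGroup Count;
RUN;
/* Expected counts: adult 5, senior 3, youth 4 */
PROC PRINT DATA = groupcounts; RUN;
```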
See the example program on pages 196-197…
Here’s the data (entry #, age group, finishing time). We want to create
a new variable whose value is the overall place that the person
finished. Note that the value of place can be determined from the
_N_ variable if the new dataset is being created from a dataset sorted
by finishing time.
The second part of the program uses the FIRST.agegroup automatic
variable to pick the top finisher in each age category.
54 youth 35.5 21 adult 21.6 6 adult 25.8 13 senior 29.0
38 senior 40.3 19 youth 39.6 3 adult 19.0 25 youth 47.3
11 adult 21.9 8 senior 54.3 41 adult 43.0 32 youth 38.6
DATA walkers; *INFILE fill in here;
INPUT Entry AgeGroup $ Time @@;
/*note >1 obs per line*/
PROC SORT DATA = walkers; BY Time;
* Create a new variable, Place;
DATA ordered; SET walkers;
Place = _N_;
PROC PRINT DATA = ordered;
TITLE 'Results of Walk';
PROC SORT DATA = ordered; BY AgeGroup Time;
* Keep first observation in each age group;
DATA winners; SET ordered; BY AgeGroup;
IF FIRST.AgeGroup = 1;
PROC PRINT DATA = winners;
TITLE 'Winners in Each Age Group'; RUN;