Creating Subsets of Observations (SAS_9_10.doc)

Data Set ARTS.ARTTOUR

OBS  CITY      NIGHTS  LANDCOST  EVENTS  DESCRIBE           GUIDE    BACKUP
1    Rome      3       750       7       4 M, 3 G           D'Amico  Torres
2    Paris     8       1680      6       5 M, 1 other       Lucas    Lucas
3    New York  6       .         8       5 M, 1 G, 2 other  Lucas    D'Amico
......

/* Delete observations in which landcost is missing. */
IF landcost = . THEN DELETE;

/* Select records in which the value of nights is 6. */
DATA subset;
   SET arts.arttour;
   IF nights = 6;
RUN;

/* Select records in which nights is 6 or higher or landcost is 1500 or higher. */
DATA subset;
   SET arts.arttour;
   IF landcost >= 1500 OR nights >= 6;
RUN;

Selecting Observations to Multiple SAS Data Sets

OUTPUT <SAS data set>;

/* othrtour acts as the default data set: it receives every observation
   that is not written to ltour. */
DATA ltour othrtour;
   SET perm.arts;
   IF guide = 'Lucas' THEN OUTPUT ltour;
   ELSE OUTPUT othrtour;
RUN;

(in ltour)
CITY      NIGHTS  LANDCOST  EVENTS  DESCRIBE           GUIDE    BACKUP
Paris     8       1680      6       5 M, 1 other       Lucas    Lucas
New York  6       .         8       5 M, 1 G, 2 other  Lucas    D'Amico
...

(in othrtour)
CITY      NIGHTS  LANDCOST  EVENTS  DESCRIBE           GUIDE    BACKUP
Rome      3       750       7       4 M, 3 G           D'Amico  Torres
...

More on the OUTPUT Statement

DATA ltour othrtour;
   SET perm.arts;
   days = nights + 1;
   IF guide = 'Lucas' THEN OUTPUT ltour;
   ELSE OUTPUT othrtour;
RUN;

/* An OUTPUT statement tells SAS to write the observation at the moment the
   OUTPUT statement is processed, not at the end of the DATA step. Therefore,
   any assignment statement must come before the OUTPUT statements if its
   result is to be stored in the new data sets. */

/* One can write more IF/ELSE statements to output the same observations to
   other data sets. */

Working with Grouped or Sorted Observations

BY list-of-variables;

To use a BY statement, the data must meet these conditions:
1. The observations must be in a SAS data set, not an external file.
2. The variables that define the groups must appear in the BY statement.
3. All observations in a group must appear together in the data set.
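As a minimal sketch of the three conditions above, the following program reads four tour records into a SAS data set, sorts them so that each group is contiguous, and then names the grouping variable in a BY statement. The data set names tours and tours_sorted and the use of DATALINES are assumptions made for illustration; the records are the four used in the save.type example in these notes.

```sas
/* Condition 1: the observations are read into a SAS data set first.
   (DATALINES is used here only to keep the sketch self-contained.) */
DATA tours;
   INPUT country :$11. tourtype :$12. nights landcost vendor $;
   DATALINES;
Spain architecture 10 510 World
Japan architecture 8 720 Express
Switzerland scenery 9 734 World
France architecture 8 575 World
;
RUN;

/* Condition 3: sorting brings all observations of a group together. */
PROC SORT DATA=tours OUT=tours_sorted;
   BY tourtype;
RUN;

/* Condition 2: the grouping variable is named in the BY statement. */
PROC PRINT DATA=tours_sorted;
   BY tourtype;
RUN;
```

Running PROC PRINT with BY tourtype on the unsorted data set tours instead would be expected to stop with an error, because the architecture observations are not contiguous there.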
(SORT)

Before SORT
----------------------------------------------------------------
OBS  COUNTRY      TOURTYPE      NIGHTS  LANDCOST  VENDOR
1    Spain        architecture  10      510       World
2    Japan        architecture  8       720       Express
3    Switzerland  scenery       9       734       World
4    France       architecture  8       575       World

SORT Procedure

LIBNAME save 'a';
DATA save.type;
   INFILE 'touragnt data a';
   INPUT country $ 1-11 tourtype $ 13-24 nights landcost vendor $;
PROC SORT DATA=save.type OUT=type2;
   BY tourtype;
/* The sorted observations go into data set type2. */

After SORT
----------------------------------------------------------------
OBS  COUNTRY      TOURTYPE      NIGHTS  LANDCOST  VENDOR
1    Spain        architecture  10      510       World
2    Japan        architecture  8       720       Express
3    France       architecture  8       575       World
4    Switzerland  scenery       9       734       World

Grouping by More than One Variable

PROC SORT DATA=save.type OUT=type3;
   BY tourtype vendor landcost;

After SORT
----------------------------------------------------------------
OBS  COUNTRY      TOURTYPE      NIGHTS  LANDCOST  VENDOR
1    Japan        architecture  8       720       Express
2    Spain        architecture  10      510       World
3    France       architecture  8       575       World
4    Switzerland  scenery       9       734       World

Arranging Groups in Descending Order

PROC SORT DATA=save.type OUT=type4;
   BY DESCENDING tourtype vendor landcost;
/* DESCENDING applies only to the variable that immediately follows it
   (tourtype); vendor and landcost are still sorted in ascending order. */

After SORT
----------------------------------------------------------------
OBS  COUNTRY      TOURTYPE      NIGHTS  LANDCOST  VENDOR
1    Switzerland  scenery       9       734       World
2    Japan        architecture  8       720       Express
3    Spain        architecture  10      510       World
4    France       architecture  8       575       World

Finding the First or Last Observation in a Group

DATA temp;
   SET type3;
   BY tourtype;
   frsttour = FIRST.tourtype;
   lasttour = LAST.tourtype;

/* The BY statement creates two temporary variables called FIRST.tourtype
   and LAST.tourtype. SAS does not write FIRST. and LAST. variables to the
   output data set. Therefore, new variables are needed to store their values.
*/

PROC PRINT DATA=temp;
   VAR country tourtype frsttour lasttour;
RUN;

PROC PRINT output
----------------------------------------------------------------
OBS  COUNTRY      TOURTYPE      FRSTTOUR  LASTTOUR
1    Japan        architecture  1         0
2    Spain        architecture  0         0
3    France       architecture  0         1
4    Switzerland  scenery       1         1

PROC SORT DATA=save.type OUT=type5;
   BY tourtype landcost;
RUN;

DATA lowcost;
   SET type5;
   BY tourtype;
   IF FIRST.tourtype;
RUN;

(Before) Data set save.type
----------------------------------------------------------------
OBS  COUNTRY      TOURTYPE      NIGHTS  LANDCOST  VENDOR
1    Spain        architecture  10      510       World
2    Japan        architecture  8       720       Express
3    Switzerland  scenery       9       734       World
4    France       architecture  8       575       World

(After) Data set work.lowcost
----------------------------------------------------------------
OBS  COUNTRY      TOURTYPE      NIGHTS  LANDCOST  VENDOR
1    Spain        architecture  10      510       World
2    Switzerland  scenery       9       734       World

Deleting Duplicated Observations

DATA save.type;
   INFILE 'touragnt data a';
   INPUT country $ 1-11 tourtype $ 13-24 nights landcost vendor $;
PROC SORT DATA=save.type OUT=type3 NODUPLICATES;
   BY tourtype;
/* The sorted observations go into data set type3. */

Differences between IF and WHERE Statements

DATA subset;
   SET arts.arttour;
   IF guide = 'Lucas';
RUN;

is the same as

DATA subset;
   SET arts.arttour;
   WHERE guide = 'Lucas';
RUN;

The two statements differ in the following ways:

The WHERE statement may be more efficient than the IF statement because it checks the condition before the observation is brought into the program data vector (a temporary holding area).

The WHERE statement can only be used with variables in an existing SAS data set, whereas the IF statement can be used with raw data as well.

When WHERE is used together with a BY statement, observations are selected before the BY groups are formed, so the FIRST. and LAST. variables are set from the selected observations only. With a subsetting IF, the FIRST. and LAST. variables are set before the observation is tested.

WHERE can be included in SAS procedures.

The following three statements are equivalent:

WHERE age GE 20 AND age LE 40;
WHERE 20 LE age LE 40;
WHERE age BETWEEN 20 AND 40;

CONTAINS or ?

WHERE name CONTAINS 'Mc';
WHERE name ? 'Mc';

IS MISSING or IS NULL

WHERE name IS MISSING;
WHERE name IS NULL;

LIKE

WHERE name LIKE 'BOY%';   /* Select BOY followed by anything. */
WHERE name LIKE 'A___';   /* A followed by three underscores: select all
                             names of length 4 beginning with A. */
WHERE name LIKE 'A_%';    /* Select all names that begin with A and are
                             at least two characters in length. */

Exercise:

A researcher treated three groups of rats (Groups A, B and C) and recorded the weight of each rat after one week. The data were arranged with each GROUP and WEIGHT in pairs.

A 34  B 58  A 28
C 55  C 56  A 27
B 52  C 58  A 21
B 62  A 28  C .
B 60  B .   C 59

Write a SAS program to read this data set (RATSORT DATA), create a SAS data set that contains only the lightest weight (excluding missing values) in each group, and print this data set.
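One possible approach to the exercise is sketched below; it combines the DELETE, PROC SORT and FIRST. techniques from these notes. Several points are assumptions rather than part of the exercise: the values are typed inline with DATALINES instead of being read from RATSORT DATA, the pairs are taken as GROUP followed by WEIGHT, the data set names rats, ratsort and lightest are made up, and the double trailing @ (@@), which these notes do not cover, is used to read several pairs from one line.

```sas
/* Read GROUP/WEIGHT pairs; @@ holds the input line so that several
   pairs can be read from the same record. */
DATA rats;
   INPUT group $ weight @@;
   IF weight = . THEN DELETE;   /* exclude missing weights */
   DATALINES;
A 34 B 58 A 28
C 55 C 56 A 27
B 52 C 58 A 21
B 62 A 28 C .
B 60 B . C 59
;
RUN;

/* Within each group, put the lightest rat first. */
PROC SORT DATA=rats OUT=ratsort;
   BY group weight;
RUN;

/* Keep only the first (lightest) observation of each group. */
DATA lightest;
   SET ratsort;
   BY group;
   IF FIRST.group;
RUN;

PROC PRINT DATA=lightest;
RUN;
```

With the values as paired above, lightest should hold one observation per group: A 21, B 52 and C 55. To read the external file instead, the DATALINES block would be replaced by an INFILE statement naming RATSORT DATA.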