CHAPTER 7 COMBINING AND MANAGING SAS DATA SETS a. Multiple input data sets b. SAS special variables c. Reshaping SAS data sets d. Data library management Revised Fall 2000 7-1 Multiple Input Data Sets Data Sets can be combined in various ways using the following SAS statements to specify the input data sets: v SET Reads observations from one or more SAS data sets. v MERGE Joins observations from multiple SAS data sets. v UPDATE Modifies an existing "Master File" SAS data set with a transaction file" SAS data set creating a new "Master File" NOTE: These statements can be controlled by using the BY statement in the DATA step. When the BY statement is used, all the input data sets must contain the BY variables and must be sorted by the BY variables. Observations that have the same values of all BY variables are said to belong to the same BY group. 7-2 The SET Statement The SET statement names the SAS input data sets from which observations will be read. SYNTAX SET SASDS1 SASDS2....; where SASDS is a Name of the Form LIBRARY.DATASET. NOTES: v By default SAS will concatenate all data sets in the order listed in the SET statement. All observations will be read from the first data set listed, then all observations will be read from the second data set listed, and so on. v When used with a BY statement, SAS will interleave observations from the data sets listed in the SET statement. The observations are selected based on the sort order determined by the BY variables. All input data sets must be sorted by the BY variables and the resulting output data set will be sorted by the BY variables as well. v Up to 50 data sets can be listed in a single SET statement. 7-3 Concatenation Example Create data sets for short and tall students; DATA WORK.SHORT; SET CLASSLIB.CLASS; If HT LT 60; RUN; DATA WORK.TALL; SET CLASSLIB.CLASS; IF HT GE 60; RUN; *CONCATENATE THE TWO DATA SETS; DATA WORK.BOTH; SET WORK.SHORT WORK.TALL; RUN; DATA SET WORK.BOTH NAME Piggy M Frog K Gonzo Christiansen Hosking J Helms R SEX F M AGE . 3 14 37 31 41 M M M DATA SET WORK.SHORT NAME Piggy M Frog K Gonzo SEX F M . AGE . 3 14 HT 8 12 25 HT 48 12 25 71 70 74 WT . 1 45 195 160 195 DATA SET WORK.TALL WT . 1 45 NAME Christiansen Hosking J Helms R . 7-4 SEX M M M AGE 37 31 41 HT 71 70 74 WT 195 160 195 Interleave Example * Sort data sets by NAME; Proc sort data=work.short; By Name; RUN; Proc sort data=tall; By Name; RUN; *Interleave the two data sets; Data work.bothsort; Set short tall; By Name, RUN; DATA SET WORK.SHORT NAME Frog K Gonzo Piggy M SEX M F AGE 3 14 . HT 12 25 48 DATA SET WORK.TALL WT 1 45 . NAME Christiansen Helms R Hosking J SEX M M M AGE 37 41 31 HT 71 74 70 WT 195 195 160 DATA SET WORK.BOTHSORT NAME Christiansen Frog K Gonzo Helms R Hosking J Piggy M SEX M M AGE 37 3 14 41 31 . M M F HT 71 12 25 74 70 48 WT 195 1 45 195 160 . The MERGE Statement The MERGE statement joins observations from the input data sets into a single observation in the output data set. SYNTAX 7-5 MERGE SASDS1 SASDS2....; Where SASDS...are data set names. NOTES: v One-to-one merging is performed when no BY statement is used. The first observation in each data set is joined to all the other first observations, the second to the second, etc. One-to-one merging should be performed with caution, especially when the data sets contain different numbers of observations. v Match merging (with the BY statement) joins the observations only when the values of the BY variables match. v One-to-many match merging allows variable values to be spread or "splayed" across all observations with the same BY values. v Many-to-many match merging joins observations one to one within the BY group until one data set reaches the last observation within the current BY group. This observation is then splayed across the remaining observations in the BY group. v Up to 50 SAS data sets can be listed in a single MERGE statement. 7-6 One-to-One Merge Combine the LIST and DEMO data sets to produce the NEWCLASS data set. WORK.LIST NAME Christiansen Frog K Gonzo Helms R Hosking J Piggy M WORK.DEMO SEX M M M M F NAME Christiansen Frog K Gonzo Helms R Hosking J Piggy M AGE 37 3 14 41 31 . HT 71 12 25 74 70 48 WT 195 1 45 195 160 . DATA NEWCLASS; MERGE LIST DEMO; RUN; WORK.NEWCLASS NAME Christiansen Frog K Gonzo Helms R Hosking J Piggy M SEX M M AGE 37 3 14 41 31 . M M F 7-7 HT 71 12 25 74 70 48 WT 195 1 45 195 160 . ONE-TO-ONE MERGE Join two data sets with unequal numbers of observations. WORK.LIST2 NAME Christiansen Frog K Gonzo Hosking J Piggy M WORK.DEMO2 SEX M M M F NAME Christiansen Frog K Helms R Piggy M AGE 37 3 41 . HT 71 12 74 48 WT 195 1 195 . DATA NEWCLASS; MERGE LIST2 DEMO2; RUN; WORK.NEWCLASS NAME Christiansen Frog K Helms R Piggy M Piggy M SEX M M AGE 37 3 41 . . M F 7-8 HT 71 12 74 48 . WT 195 1 195 . . MATCH MERGING Join only those observations with the same values of the BY variables. Note that both input data sets are sorted by the BY variable. WORK.LIST2 NAME Christiansen Frog K Gonzo Hosking J Piggy M WORK.DEMO2 SEX M M M F NAME Christiansen Frog K Helms R Piggy M AGE 37 3 41 . HT 71 12 74 48 WT 195 1 195 . DATA NEWCLASS; MERGE LIST2 DEMO2; BY NAME; RUN; WORK.NEWCLASS NAME Christiansen Frog K Gonzo Helms R Hosking J Piggy M SEX M M AGE 37 3 . 41 . . M F 7-9 HT 71 12 . 74 . 48 WT 195 1 . 195 . . ONE TO ONE MATCH MERGE EXAMPLE Put blood pressure and lipid data into one file. WORK.SBP WORK.LIPIDS ID 101 102 103 104 ID 101 102 103 105 SBP 83 100 120 118 HDL 45 50 60 65 DATA ALL; MERGE SBP LIPIDS ; BY ID RUN; WORK.ALL ID 101 102 103 104 105 SBP 83 100 120 118 . HDL 45 50 60 . 65 LDL 180 200 210 . 220 7-10 LDL 180 200 210 220 ONE TO ONE MATCH MERGE EXAMPLE MERGE PRERAND FILE WITH POST RAND FILE Problem: Both files contain common variables other than the by variables. Notice the resulting Data set. WORK.PRERAND ID 101 102 103 104 HDL 40 45 55 60 WORK.POSTRAND LDL 200 210 215 220 ID 101 102 103 105 HDL 45 50 60 65 DATA ALL; MERGE PRERAND POSTRAND; BY ID ; RUN; WORK.ALL OBS 1 2 3 4 5 ID HDL LDL 101 102 103 104 105 45 50 60 60 65 180 200 210 220 230 7-11 LDL 180 200 210 230 ONE TO ONE MATCH MERGE EXAMPLE MERGE PRERAND FILE WITH POST RAND FILE Problem: Both files contain common variables other than the by variables. Solution: Use the rename data option WORK.PRERAND ID 101 102 103 104 HDL 40 45 55 60 WORK.POSTRAND LDL 200 210 215 220 ID 101 102 103 105 HDL 45 50 60 65 LDL 180 200 210 230 DATA ALL; MERGE PRERAND(RENAME=(HDL=HDL0 LDL=LDL0)) POSTRAND(RENAME=(HDL=HDL1 LDL=LDL1)) ; BY ID ; HDLDIFF = HDL1 - HDL0 ; LDLDIFF = LDL1 - LDL0 ; RUN; WORK.ALL ID 101 102 103 104 105 HDL0 40 45 55 60 . LDL0 200 210 215 220 . HDL1 45 50 60 . 65 LDL1 180 200 210 . 230 7-12 HDLDIFF 5 5 5 . . LDLDIFF -20 -10 -5 . . EXAMPLE: ONE TO ONE MERGE WITH TWO BY VARIABLES LDL OBS ID 1 2 3 4 5 6 7 8 9 10 11 12 1 1 1 2 2 2 3 3 4 4 5 5 62 63 64 65 66 SBP VISIT LDL OBS 1 2 3 1 2 3 1 3 1 2 1 2 130 135 140 145 150 155 160 165 170 175 180 185 1 2 3 4 5 6 7 8 9 ID 1 1 1 2 2 3 3 3 4 VISIT SBP 1 2 3 1 3 1 2 3 3 100 105 110 115 120 122 125 130 135 data all ; merge sbp lipids ; by id visit ; run; NOTE: The data set WORK.ALL has 14 observations and 4 variables. NOTE: The DATA statement used 1.12 seconds. ALL OBS ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 1 1 1 2 2 2 3 3 3 4 4 4 5 5 VISIT SBP LDL 1 2 3 1 2 3 1 2 3 1 2 3 1 2 100 105 110 115 . 120 122 125 130 . . 135 . . 130 135 140 145 150 155 160 . 165 170 175 . 180 185 7-13 ONE-TO-MANY MATCH MERGING v One data set in a MERGE statement may contain multiple observations with the same values of the BY variables within each BY group. MERGE joins the observation from the "one" data set with each observation in the "many" data set. v In match merging, the program data vector is initialized to missing only at the beginning of a new BY group. This allows information from the "one" data set to be splayed across the corresponding BY group observations in the "many" data set. 7-14 ONE TO MANY MATCH MERGE EXAMPLE WORK.LIST NAME CHRISTIANSEN FROG K GONZO HELMS R HOSKING J PIGGY M WORK.HW SEX M M NAME CHRISTIANSEN CHRISTIANSEN CHRISTIANSEN CHRISTIANSEN FROG K FROG K FROG K GONZO HOSKING J PIGGY M PIGGY M M M F HWNUM 1 2 4 5 1 2 3 1 1 2 3 DATA GRADES; MERGE LIST HW; BY NAME; RUN; NAME PDV | SEX | HWNUM | SCORE | | WORK.GRADES NAME SEX CHRISTIANSEN CHRISTIANSEN CHRISTIANSEN CHRISTIANSEN FROG K FROG K FROG K GONZO HELMS R HOSKING J PIGGY M PIGGY M M M M M M M M NUMBER 1 2 4 5 1 2 3 1 . 1 2 3 M M F F 7-15 SCORE 10 3 9 10 10 10 8 . . 10 10 10 SCORE 10 3 9 10 10 10 8 . 10 10 10 ONE TO MANY MATCH MERGE EXAMPLE When the same variables (other than BY variables) are in more than one data set, the value of the variable from the last data set mentioned in the MERGE statement is used. WORK.GRADES NAME SEX CHRISTIANSEN CHRISTIANSEN CHRISTIANSEN CHRISTIANSEN FROG K FROG K FROG K GONZO HELMS R HOSKING J PIGGY M PIGGY M M M M M M M M M M F F WORK.FIXHW HWNUM SCORE 1 2 4 5 1 2 3 1 . 1 2 3 10 3 9 10 10 10 8 . . 10 10 10 NAME CHRISTIANSEN CHRISTIANSEN CHRISTIANSEN CHRISTIANSEN FROG K HOSKING J DATA GRADES2; MERGE GRADES FIXHW; BY NAME NWNUM; RUN; WORK.GRADES2 NAME SEX CHRISTIANSEN CHRISTIANSEN CHRISTIANSEN CHRISTIANSEN CHRISTIANSEN FROG K FROG K FROG K GONZO HELMS R HOSKING J PIGGY M PIGGY M M M M M M M M M M M F F HWNUM 1 2 2 4 5 1 2 3 1 . 1 2 3 SCORE 9 10 2 . 10 10 . 8 . . . 10 10 NOTE: Missing values in FIXHW overwrite values in GRADES. 7-16 HWNUM SCORE 1 2 2 4 2 1 9 10 2 . . . ONE TO MANY MATCH MERGE EXAMPLE SBP OBS 1 2 3 4 5 6 7 8 9 ID 1 1 1 2 2 3 3 3 4 DEM VISIT SBP OBS 1 2 3 1 3 1 2 3 3 100 105 110 115 120 122 125 130 135 1 2 3 4 5 ID 1 2 3 4 5 DATA ALL ; MERGE SBP DEM; BY ID ; RUN; ALL OBS ID 1 2 3 4 5 6 7 8 9 10 1 1 1 2 2 3 3 3 4 5 VISIT SBP SEX 1 2 3 1 3 1 2 3 3 . 100 105 110 115 120 122 125 130 135 . F F F M M F F F F M AGE 30 30 30 35 35 40 40 40 45 50 7-17 SEX F M F F M AGE 30 35 40 45 50 ONE TO MANY MATCH MERGE EXAMPLE SBP OBS 1 2 3 4 5 6 7 8 9 ID 1 1 1 2 2 3 3 3 4 DEM VISIT SBP OBS 1 2 3 1 3 1 2 3 3 100 105 110 115 120 122 125 130 135 1 2 3 4 5 ID 1 2 3 4 5 DATA ALL ; MERGE SBP DEM; BY ID ; AGE=AGE*10 ; RUN; ALL OBS ID 1 2 3 4 5 6 7 8 9 10 1 1 1 2 2 3 3 3 4 5 VISIT SBP SEX AGE 1 2 3 1 3 1 2 3 3 . 100 105 110 115 120 122 125 130 135 . F F F M M F F F F M 300 3000 30000 350 3500 400 4000 40000 450 500 What’s wrong with age? 7-18 SEX F M F F M AGE 30 35 40 45 50 ONE TO MANY MATCH MERGE EXAMPLE Merge the department specific means onto each record. 24 PROC MEANS DATA=SC.SALES NWAY; 25 CLASS DEPT ; 26 VAR COST ; 27 OUTPUT OUT=OUT1 MEAN=MCOST ; 28 RUN; NOTE: The data set WORK.OUT1 has 2 observations and 4 variables. NOTE: The PROCEDURE MEANS used 0.56 seconds. 29 30 DATA SALES2 ; 31 MERGE SALES 32 OUT1 ; 33 BY DEPT ; 34 RUN; NOTE: The data set WORK.SALES2 has 50 observations and 9 variables. NOTE: The DATA statement used 0.78 seconds. 35 36 PROC PRINT ; 37 RUN; NOTE: The PROCEDURE PRINT used 0.44 seconds. Analysis Variable : COST DEPT N Obs N Mean Std Dev Minimum Maximum -----------------------------------------------------------------------FURS 6 6 203.6700000 22.1033337 180.0100000 240.0000000 SHOES 44 44 35.0363636 4.9674752 23.7600000 43.1800000 ------------------------------------------------------------------------ 7-19 OBS 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 DEPT CLERK FURS FURS FURS FURS FURS FURS SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES SHOES BURLEY BURLEY AGILE BURLEY BURLEY AGILE CLEVER AGILE CLEVER CLEVER AGILE AGILE BURLEY CLEVER CLEVER CLEVER CLEVER CLEVER AGILE CLEVER CLEVER CLEVER BURLEY CLEVER CLEVER AGILE CLEVER CLEVER AGILE CLEVER AGILE CLEVER CLEVER CLEVER CLEVER AGILE CLEVER AGILE AGILE CLEVER CLEVER CLEVER BURLEY CLEVER CLEVER AGILE BURLEY BURLEY CLEVER CLEVER PRICE 599.95 800.00 590.00 499.95 700.00 700.00 99.95 95.00 65.00 65.00 49.95 69.95 69.95 84.95 54.95 95.00 139.95 59.95 54.95 94.95 65.00 89.95 75.00 54.95 65.00 74.95 39.95 85.00 95.00 44.95 75.00 55.00 104.95 49.95 94.95 89.95 74.95 84.95 59.95 84.95 94.95 55.00 59.95 84.95 54.95 64.95 54.95 55.00 64.95 129.95 COST WEEKDAY 180.01 240.00 182.00 200.01 210.00 210.00 41.21 40.49 33.44 33.44 28.07 34.93 34.93 38.65 30.00 40.49 42.96 31.78 30.00 40.48 33.44 39.63 36.31 30.00 33.44 36.30 23.76 38.66 40.49 25.99 36.31 30.02 41.83 28.07 40.48 39.63 36.30 38.65 31.78 38.65 40.48 30.02 31.78 38.65 30.00 33.43 30.00 30.02 33.43 43.18 THR MON SAT SAT THR WED TUE WED WED WED THR THR THR SAT SAT SAT MON MON TUE TUE TUE TUE WED WED WED THR THR FRI SAT SAT TUE TUE WED FRI FRI SAT SAT MON MON MON TUE TUE WED WED THR FRI FRI FRI FRI FRI 7-20 DAY 5 9 14 14 19 25 3 4 4 4 5 5 5 7 7 7 9 9 10 10 10 10 11 11 11 12 12 13 14 14 17 17 18 20 20 21 21 23 23 23 24 24 25 25 26 27 27 27 27 27 _TYPE_ 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 _FREQ_ 6 6 6 6 6 6 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 MCOST 203.670 203.670 203.670 203.670 203.670 203.670 35.036 35.036 35.036 35.036 35.036 35.036 35.036 35.036 35.036 35.036 35.036 35.036 35.036 35.0364 35.0364 35.0364 35.0364 35.0364 35.0364 35.0364 35.0364 35.0364 35.0364 35.0364 35.0364 35.0364 35.0364 35.0364 35.0364 35.0364 35.0364 35.0364 35.0364 35.0364 35.0364 35.0364 35.0364 35.0364 35.0364 35.0364 35.0364 35.0364 35.0364 35.0364 ONE TO MANY MATCH MERGE EXAMPLE Merge the overall mean onto each record. 20 21 22 23 PROC MEANS DATA=SC.CLASS NOPRINT ; VAR WT ; OUTPUT OUT=OUT1 MEAN=MWT ; RUN; NOTE: The data set WORK.OUT1 has 1 observations and 3 variables. NOTE: The PROCEDURE MEANS used 0.38 seconds. 24 25 26 27 28 29 30 31 32 33 /* MERGE MEAN ONTO EVERY OBSERVATION */ DATA NEW ; SET SC.CLASS ; IF _N_=1 THEN SET OUT1 ; DROP _TYPE_ _FREQ_ ; RUN; NOTE: The data set WORK.NEW has 6 observations and 6 variables. NOTE: The DATA statement used 0.56 seconds. 34 35 36 PROC PRINT ; RUN; NOTE: The PROCEDURE PRINT used 0.11 seconds. work.new OBS 1 2 3 4 5 6 NAME CHRISTIANSEN HOSKING J HELMS R PIGGY M FROG K GONZO SEX M M M F M AGE HT WT 37 31 41 . 3 14 71 70 74 48 12 25 195 160 195 . 1 45 7-21 MWT 119.2 119.2 119.2 119.2 119.2 119.2 MANY-TO-MANY MERGE EXAMPLE WORK.SBP ID 101 101 102 102 102 WORK.LIPIDS SBP 80 85 90 95 100 ID 101 101 102 102 HDL 45 50 60 65 DATA ALL; MERGE SBP LIPIDS ; BY ID ; RUN; WORK.ALL ID 101 101 102 102 102 SBP 80 85 90 95 100 HDL 45 50 60 65 65 LDL 180 200 210 220 220 7-22 LDL 180 200 210 220 The UPDATE Statement The UPDATE statement joins observations from two SAS data sets, an "old master file" data set and a "transaction file" data set. The resulting observations form a "new master file" data set. These observations contain the values of the variables in the old master file, except for those modifications specified in the transaction file. SYNTAX UPDATE SASDS1 SASDS2; BY Variables; where SASDS1 is the old master file; it must be sorted by the BY variables and must contain only one observation for each BY group and SASDS2 is the transaction file; it must be sorted by the BY variables, but may have multiple observations per BY group. NOTES: v Variables with non-missing values in the transaction file will replace those values in the PDV. Missing values will not replace values in the PDV. v The PDV is initialized to missing only at the beginning of a new BY group. v The PDV is output to the new master file only at the end of a BY group. 7-23 UPDATE EXAMPLE WORK.GRADES NAME SEX CHRISTIANSEN CHRISTIANSEN CHRISTIANSEN CHRISTIANSEN FROG K FROG K FROG K GONZO HELMS R HOSKING J PIGGY M PIGGY M M M M M M M M M M F F WORK.FIXHW HWNUM SCORE 1 2 4 5 1 2 3 1 . 1 2 3 10 3 9 10 10 10 8 . . 10 10 10 NAME CHRISTIANSEN CHRISTIANSEN CHRISTIANSEN CHRISTIANSEN FROG K HOSKING J DATA GRADES2; UPDATE GRADES FIXHW; BY NAME HWNUM; RUN; WORK.GRADES2 NAME SEX CHRISTIANSEN CHRISTIANSEN CHRISTIANSEN CHRISTIANSEN FROG K FROG K FROG K GONZO HELMS R HOSKING J PIGGY M PIGGY M M M M M M M M M M F F HWNUM 1 2 4 5 1 2 3 1 . 1 2 3 SCORE 9 2 9 10 10 10 8 . . 10 10 10 7-24 NUMBER 1 2 2 4 2 1 SCORE 9 10 2 . . . UPDATING VALUES TO MISSING In order to set the value of a variable to missing using an UPDATE statement, the variable must be set to the special missing value (_) in the transaction file. WORK.GRADES NAME SEX CHRISTIANSEN CHRISTIANSEN CHRISTIANSEN CHRISTIANSEN FROG K FROG K FROG K GONZO HELMS R HOSKING J PIGGY M PIGGY M M M M M M M M WORK.FIXHW HWNUM SCORE 1 2 4 5 1 2 3 1 . 1 2 3 M M F F 10 3 9 10 10 10 8 . . 10 10 10 NAME HWNUM CHRISTIANSEN CHRISTIANSEN CHRISTIANSEN CHRISTIANSEN FROG K HOSKING J DATA GRADES2; UPDATE GRADES FIXHW; BY NAME HWNUM; RUN; WORK.GRADES2 NAME CHRISTIANSEN CHRISTIANSEN CHRISTIANSEN CHRISTIANSEN FROG K FROG K FROG K GONZO HELMS R HOSKING J PIGGY M PIGGY M SEX M M M M M M M M M F F HWNUM 1 2 4 5 1 2 3 1 . 1 2 3 7-25 SCORE 9 2 . 10 10 10 8 . . . 10 10 1 2 2 4 2 1 SCORE 9 10 2 __ . __ UPDATE EXAMPLE: SPECIAL MISSING VALUES master trans ID X ID 1 2 3 10 20 30 1 1 2 3 3 X A . C 10 . data all ; update master trans; by id; run; all ID X 1 2 3 A C 10 7-26 SAS SPECIAL VARIABLES The following special variables can be created in DATA steps using SET, MERGE, or UPDATE statements. v The IN variable indicates if the data set contributed to the current observation. v The END variable indicates that the current observation is the last to be processed. v FIRST.byvariable and LAST.byvariable indicate if the current observation is the first or last observation in the BY group. SYNTAX: SET MERGE UPDATE Datasetname(IN=variable)...END=endvarible; For the BY Statement BY var1 var2 ...; FIRST.var1, LAST.var1, FIRST.var2, LASAT.var2 . . . are automatically created. NOTES: v v v v The special variables exist in the PDV, but not in the output data sets. The special variables are indicator variables with values of 0 (False) and 1 (True). Invariables can be created for each data set in a SET, MERGE, or UPDATE statement. Endvariables can be created for each SET, MERGE, or UPDATE statement. 7-27 Example: Special Variables JAN MAKE CHEVY CHEVY CHEVY FORD FORD YEAR 76 72 69 69 76 FEB PRICE 3550 1350 1165 700 3245 MAKE CHEVY FORD FORD OLDS YEAR 76 75 63 72 PRICE 2525 2985 695 2383 DATA CARSALES; SET JAN(IN=INJ) FEB(IN=INF) END=EOF; BY MAKE; RUN; PDV CARSALES MAKE CHEVY CHEVY CHEVY CHEVY FORD FORD FORD FORD OLDS YEAR 76 72 69 76 69 76 75 63 72 SPECIAL VARIABLES PRICE INJ 3550 1350 1165 2525 700 3245 2985 695 2383 1 1 1 0 1 1 0 0 0 7-28 INF EOF 0 0 0 1 0 0 1 1 1 0 0 0 0 0 0 0 0 1 FIRST.MAKE 1 0 0 0 1 0 0 0 1 LAST.MAKE 0 0 0 1 0 0 0 0 1 Example: Special Variables Create a data set containing only observations where a match occurred on the BY variable. WORK.LIST2 NAME CHRISTIANSEN FROG K GONZO HOSKING J PIGGY M WORK.DEMO2 SEX M M M F DATA MATCHES; MERGE LIST2(IN=LIST) BY NAME; IF LIST AND DEMO; RUN; NAME CHRISTIANSEN FROG K HELMS R PIGGY M AGE 37 3 41 . HT 71 12 74 48 WT 195 1 195 . DEMO2(IN=DEMO); PDV LIST NAME SEX DEMO AGE HT WT FIRST. NAME LAST. NAME WORK.MATCHES NAME CHRISTIANSEN FROG K PIGGY M SEX M M F 7-29 AGE 37 3 . HT 71 12 48 WT 195 1 . ONE TO ONE MATCH MERGE EXAMPLE WITH AN IN STATEMENT Merge pre-rand and post-rand files. Only keep subjects who have a record in both files. Then create variables for difference between pre-rand and post-rand values. WORK.PRERAND WORK.POSTRAND ID 101 102 103 104 ID 101 102 103 105 HDL 40 45 55 60 LDL 200 210 215 220 HDL 45 50 60 65 LDL 180 200 210 230 DATA ALL; MERGE PRERAND(IN=INPRE RENAME=(HDL=HDL0 LDL=LDL0)) POSTRAND(IN=INPOST RENAME=(HDL=HDL1 LDL=LDL1)) ; BY ID ; IF (INPOST=1) & (INPRE=1) ; HDLDIFF = HDL1 - HDL0 ; LDLDIFF = LDL1 - LDL0 ; RUN; WORK.ALL ID 101 102 103 HDL0 40 45 55 LDL0 200 210 215 HDL1 45 50 60 LDL1 180 200 210 7-30 HDLDIFF 5 5 5 LDLDIFF -20 -10 -5 MATCH MERGE EXAMPLE WITH THE IN STATEMENT Merge SBP and LIPID files. Only keep subjects who have a record in both files. WORK.SBP ID WORK.LIPIDS VISIT SBP 101 1 80 101 2 102 ID VISIT HDL LDL 101 1 45 180 85 101 2 50 200 1 90 102 1 60 210 102 2 95 102 2 65 220 102 4 100 102 3 70 230 103 1 110 DATA ALL; MERGE SBP(IN=INS) LIPIDS(IN=INL) ; BY ID VISIT ; IF (INS & INL) ; /* INCLUDE IF IN SBP & LIPID FILES */ RUN; WORK.ALL ID 101 101 102 102 VISIT 1 2 1 2 SBP 80 85 90 95 HDL 45 50 60 65 LDL 180 200 210 220 PDV _______________________________________________________________ ID 101 101 102 102 102 102 103 VISIT 1 2 1 2 3 4 1 SBP 80 85 90 95 . 100 110 HDL LDL INS 45 50 60 65 70 . . 180 200 210 220 230 . . 1 1 1 1 0 1 1 7-31 INL 1 1 1 1 1 0 0 MATCH MERGE EXAMPLE WITH THE IN AND FIRST.BY/LAST.BY STATEMENT Merge SBP and LIPID files. Only keep subjects who have a record in both files and only keep the first record for each subject. WORK.SBP ID 101 101 102 102 102 103 103 VISIT 1 2 1 2 4 1 1 WORK.LIPIDS SBP 80 85 90 95 100 110 120 ID 101 101 102 102 102 VISIT 1 2 1 2 3 HDL 45 50 60 65 70 LDL 180 200 210 220 230 DATA ALL; MERGE SBP(IN=INS) LIPIDS(IN=INL) ; BY ID VISIT ; IF (INS & INL) & (FIRST.ID) ; RUN; WORK.ALL ID 101 102 VISIT 1 1 SBP 80 90 HDL 45 60 LDL 180 210 PDV __________________________________________________________________ ID 101 101 102 102 102 102 103 103 VISIT 1 2 1 2 3 4 1 1 INS 1 1 1 1 0 1 1 1 INL 1 1 1 1 1 0 0 0 FIRST.ID 1 0 1 0 0 0 1 0 LAST.ID 0 1 0 0 0 1 0 1 7-32 FIRST.VISIT 1 1 1 1 1 1 1 0 LAST.VISIT 1 1 1 1 1 1 0 1 EXAMPLE: EOF Statement DATA ONE ; SET SC.CLASS END=EOF; KEEP N F ; N + 1; /* or RETAIN N 0 ; N=N+1 ; */ F + (SEX=’F’); /* or RETAIN F 0 ; F=F+(SEX=’F’) ; */ IF (EOF=1); /* OUTPUT IF LAST RECORD */ RUN; PROC PRINT ; RUN; _______________________WORK.ONE________________________________ _ OBS N F 1 6 1 7-33 Example: Transpose data set LIPIDS from multiple records per subject to 1 record per subject DATA TWO; 13 SET LIPIDS ; 14 15 BY ID ; 16 17 RETAIN LDL1 LDL2 HDL1 HDL2 ; 18 19 20 ARRAY VARS LDL1 LDL2 HDL1 HDL2 ; 21 IF FIRST.ID THEN DO OVER VARS ; 22 VARS=. ; 23 END ; 24 25 IF VISIT=1 THEN DO; 26 LDL1 = LDL ; 27 HDL1 = HDL ; 28 END ; 29 30 ELSE IF VISIT=2 THEN DO; 31 LDL2 = LDL ; 32 HDL2 = HDL ; 33 END ; 34 35 KEEP ID LDL1 LDL2 HDL1 HDL2 ; 36 37 IF LAST.ID THEN OUTPUT ; 38 RUN; NOTE: The data set WORK.TWO has 3 observations and 5 variables. NOTE: The DATA statement used 1.4 seconds. LIPIDS OBS 1 2 3 4 5 ID 101 101 102 102 103 VISIT LDL HDL 1 2 1 2 1 180 185 160 170 190 45 50 40 . 60 HDL1 HDL2 45 40 60 50 . . TWO OBS 1 2 3 ID LDL1 LDL2 101 102 103 180 160 190 185 170 . 7-34 MULTIPLE OUTPUT DATASETS A single DATA step can create more than one SAS data set using the more general forms of the DATA and OUTPUT statements: DATA SASDS1 SASDS2 . . . SASDSN, OUTPUT SASDSX . . .; where SASDS1 . . . SASDSN are the names of up to 50 SAS data sets to be created in this step, and SASDSX is a subset of the N data set names listed in the DATA statement. NOTES: v A DATA step may contain multiple OUTPUT statements. v Each OUTPUT statement may specify any combination of the SAS data sets listed in the DATA statement. v The current contents of the PDV are written to these data sets each time the OUTPUT statement is executed. v The statement OUTPUT; (without a list of data set names) writes the PDV to all data sets in the DATA statement. v A DATA step without an OUTPUT statement defaults to writing to all output data sets each time the bottom of the program is reached. 7-35 EXAMPLE: Create SHORT and TALL datasets from CLASS in one DATA step. WORK.CLASS NAME CHRISTIANSEN FROG K GONZO HELMS R HOSKING J PIGGY M SEX M M AGE 37 3 14 41 31 . M M F HT 71 12 25 74 70 48 DATA SHORT TALL; SET CLASS; IF HT LT 60 THEN OUTPUT SHORT; IF HT GE 60 THEN OUTPUT TALL; RETURN; WORK.SHORT NAME FROG K GONZO PIGGY M SEX M F WORK.TALL AGE HT WT NAME 3 14 . 12 25 48 1 45 . CHRISTIANSEN HELMS R HOSKING J 7-36 SEX AGE HT WT M 37 M 41 M 31 71 74 70 195 195 160 WT 195 1 45 195 160 . EXAMPLE: COMBINING MULTIPLE INPUT AND OUTPUT DATASETS WORK.LIST NAME SEX CHRISTIANSEN FROG K GONZO HELMS HOSKING J PIGGY M M M M M F WORK.DEMO NAME AGE HT WT CHRISTIANSEN FROG K GONZO HELMS R HOSKING J PIGGY M 37 3 14 41 31 . 71 12 25 74 70 48 195 1 45 195 160 . DATA TALL BOYS ALL; MERGE LIST DEMO BY NAME; IF HT GE 60 THEN OUTPUT TALL; IF SEX EQ ’M’ THEN OUTPUT BOYS; OUTPUT ALL; RUN; 7-37 WORK.TALL NAME CHRISTIANSEN HELMS R HOSKING J SEX AGE HT WT M M M 37 41 31 71 74 70 195 195 160 SEX M M M M AGE 37 3 41 31 HT 71 12 74 70 WT 195 1 195 160 SEX M M AGE 37 3 14 41 31 . HT 71 12 25 74 70 48 WT 195 1 45 195 160 . WORK.BOYS NAME CHRISTIANSEN FROG K HELMS R HOSKING J WORK.ALL NAME CHRISTIANSEN FROG K GONZO HELMS R HOSKING J PIGGY M M M F 7-38 EXAMPLE: Output Multiple Data Sets /* output 2 data sets: 1) one contains main data set ; 2) counts contains count variables */ data one(drop=n f) counts(keep=n f) ; set sc.class end=eof; n+1; f + (sex=’F’) ; if (eof) then output counts; output one ; run; 7-39 RESHAPING SAS DATASETS v Multiple output observations can be created from a single input observation by using several OUTPUT statements. v The DROP and KEEP data set options can be used to control which variables are included or excluded from the PDV and the output data sets. v The RENAME data set option can change the names of variables in the PDV and output data sets. 7-40 EXAMPLE: MULTIPLE OUTPUT OBSERVATIONS FROM SINGLE INPUT OBSERVATIONS Reshape the data set from one observation per pupil to one observation per test. WORK.STUDENTS NAME JOHN ALBERT SALLY MARY TEST1 90 100 95 50 TEST2 70 33 80 100 TEST3 50 89 90 92 DATA TEST; SET WORK=STUDENT; EXAM=1; GRADE=TEST1; OUTPUT; EXAM=2; GRADE=TEST2; OUTPUT; EXAM=3; GRADE=TEST3; OUTPUT; RUN; WORK.TEST NAME JOHN JOHN JOHN ALBERT ALBERT ALBERT SALLY SALLY SALLY MARY MARY MARY TEST1 90 90 90 100 100 100 95 95 95 50 50 50 TEST2 70 70 70 33 33 33 80 80 80 100 100 100 TEST3 EXAM 50 1 50 2 50 3 89 1 89 2 89 3 90 1 90 2 90 3 92 1 92 2 92 3 7-41 GRADE 90 70 50 100 33 89 95 80 90 50 100 92 EXAMPLE: DROPPING UNWANTED VARIABLES WITH THE DROP STATEMENT DATA TEST; SET WORK.STUDENT; DROP TEST1-TEST3 I; ARRAY TEST(I) TEST1-TEST3; DO I=1 TO 3; EXAM=I; GRADE=TEST; OUTPUT; END; RUN; WORK.TEST NAME JOHN JOHN JOHN ALBERT ALBERT ALBERT SALLY SALLY SALLY MARY MARY MARY EXAM 1 2 3 1 2 3 1 2 3 1 2 3 GRADE 90 70 50 100 33 89 95 80 90 50 100 92 NOTE:Keep name EXAM GRADE; would produce the same results as the DROP statement. 7-42 THE DROP AND KEEP DATASET OPTIONS The DROP and KEEP data set options can be used on each output data set named in the DATA statement to control which of the variables in the PDV get written to which output data sets. DATA LIST (KEEP=NAME SEX) DEMO(DROP=SEX AGE); SET CLASSLIB.CLASS; RUN; CLASSLIB.CLASS NAME SEX AGE HT WT PDV NAME LIST NAME SEX SEX AGE DEMO 7-43 HT NAME HT WT WT The DROP and KEEP options can be used on the input data sets listed in SET, MERGE, and UPDATE statements to control which variables get read into the PDV. EXAMPLE DATA BOTH; MERGE CLASS1(DROP=AGE) CLASS2(DROP=SEX); BY NAME; RUN; CLASS1 FROG K GONZO HELMS R PIGGY M NAME M CLASS2 M F 3 14 41 . FROG K HELMS R HOSKING J PIGGY M SEX AGE NAME PDV BOTH NAME SEX A B C D SEX AGE . 41 31 39 AGE HT NAME SEX AGE HT FROG K GONZO HELMS R HOSKING J PIGGY M M . . 41 31 39 12 . 74 70 48 7-44 M F 12 74 70 48 HT THE RENAME DATASET OPTION SYNTAX: SASDS(RENAME=(OLDNAME=NEWNAME . . .)) The RENAME option can be used on SET, MERGE and UPDATE statements to change the names of SAS variables between the input data sets and the PVD, and on the DATA statement to change names between the PDV and the output data sets. DATA COMBO; SET FALL SPRING(RENAME=(TEST=EXAM)); RUN; FALL NAME SPRING EXAM PDV NAME EXAM COMBO NAME EXAM 7-45 NAME TEST SAS DATA LIBRARY MANAGEMENT v A SAS data library may be permanent or temporary. Permanent libraries have a library name specified by the user, which must correspond to a libname statement. Temporary libraries have a library name of WORK. v There are several SAS procedures that are useful in the management of SAS data libraries. PROC COPY can be used to copy all or some of the SAS data sets from one SAS data library to another. PROC DATASETS can be used to delete or modify data sets on a SAS data library. v A single data set can be copied from one data library to another with a DATA step of the form DATA newlib.SASds; SET oldlib.SASds; v Either of the above methods of copying data sets can be used to "back-up" data sets. "Backingup" of a data set means making one or more copies of the data set and storing the copies in a safe place. One then has: v Backup copies stored safely away, and v A working copy used for calculations or other processing. 7-46 THE COPY PROCEDURE The COPY procedure is used to copy some or all of the SAS data sets from one library to another. Syntax: PROC COPY IN=Libraryname OUT=Libraryname: Statements used with COPY: SELECT list of dataset names; EXCLUDE list of dataset names; The SELECT statement causes only the data sets listed to be copied. The EXCLUDE statement causes all data sets except those listed to be copied. NOTES: v If neither a SELECT or an EXCLUDE statement is used, all data sets in the input library are copied. v If the output library contains a data set with the same name as a selected data set, it will be replaced. v The DOS COPY Command can also be used to copy one or more SAS DATASETS. 7-47 EXAMPLE: 1 2 3 4 Libname classlib ’a:\’;run; proc copy in=classlib out=work; title ’proc copy example’; run; NOTE:Copying CLASSLIB.CLASS to WORK.CLASS (MEMTYPE=DATA). NOTE:The data set WORK.CLASS has 6 observations and 5 variables. NOTE:Copying CLASLIB.SALES to WORK.SALES (MEMTYPE=DATA). NOTE:The data set WORK.SALES has 50 observations and 6 variables. NOTE:The PROCEDURE COPY used 6.00 second. 5 proc copy in=classlib out=work; 6 select sales; 7 title1 ’proc copy example’; 8 title2 ’using the select statement’; 9 run; NOTE:Copying CLASSLIB.SALES to WORK.SALES (MEMTYPE=DATA). NOTE:The data set WORK.SALES has 50 observations and 6 variables. NOTE: The PROCEDURE COPY used 5.00 seconds. 7-48 THE DATASETS PROCEDURE v The DATASETS procedure is used to manage a SAS library v With PROC DATASETS you can list, copy, rename, and delete SAS data sets in a SAS data library. v For SAS data sets, PROC DATASETS can alter the descriptor portion of the SAS data set. PROC DATASETS can be used to rename variables, and to change formats, informats, or labels. SYNTAX: PROC DATASETS LIBRARY=libref ; DELETE mem_list ; /* delete files specified in mem_list */ CHANGE OLD=NEW ; /* rename SAS dataset from old name to new */ COPY OUT=libref2 ; /* copy to another SAS library */ SELECT MEM_LIST ; CONTENTS DATA=sas_data_set ; /* get contents of a sas_data_set */ CONTENTS DATA=libref._ALL_ nods; /* get contents of a sas library */ MODIFY sas_data_set_ ; /* modify descriptor part of sas_data_set */ RENAME old_name=new_name ; /* rename a variable */ LABEL var=’label’ ; /* add or change variable label */ FORMAT var format; /* add, change, or delete a format */ RUN ; 7-49 DATASETS EXAMPLE: LIBNAME SC2 V612 ’D:\BIOS111\SASDATA2’ ; PROC DATASETS LIBRARY=SC2 ; DELETE HW3 ; CHANGE SCORES=TEST ; CONTENTS DATA=CLASS ; COPY IN=SC2 OUT=WORK; SELECT CLASS ; MODIFY CLASS ; LABEL WT=’WT IN LBS.’ FORMAT HT 5.0 ; RENAME SEX = GENDER ; CONTENTS DATA = CLASS ; QUIT ; RUN; 7-50 1 LIBNAME SC2 V612 ’D:\BIOS111\SASDATA2’ ; NOTE: Libref SC2 was successfully assigned as follows: Engine: V612 Physical Name: D:\BIOS111\SASDATA2 3 PROC DATASETS LIBRARY=SC2 ; -----Directory----Libref: SC2 Engine: V612 Physical Name: D:\BIOS111\SASDATA2 # Name Memtype Indexes --------------------------1 CLASS DATA 2 HW3 DATA 3 HW4 DATA 4 SALES DATA 5 SCORES DATA 4 5 NOTE: NOTE: 6 7 8 NOTE: NOTE: 9 10 DELETE HW3 ; CHANGE SCORES=TEST ; Deleting SC2.HW3 (memtype=DATA). Changing the name SC2.SCORES to SC2.TEST (memtype=DATA). CONTENTS DATA=CLASS ; COPY IN=SC2 OUT=WORK; SELECT CLASS ; Copying SC2.CLASS to WORK.CLASS (MEMTYPE=DATA). The data set WORK.CLASS has 6 observations and 5 variables. MODIFY CLASS ; LABEL WT=’WT IN LBS.’ 11 FORMAT HT 5.0 ; 12 RENAME SEX = GENDER ; NOTE: Renaming variable SEX to GENDER. 13 CONTENTS DATA = CLASS ; 14 QUIT ; NOTE: The PROCEDURE DATASETS used 7.75 seconds. 15 RUN; 7-51 DATASETS PROCEDURE Data Set Name: SC2.CLASS Observations: 6 Member Type: DATA Variables: 5 Engine: V612 Indexes: 0 Created: 10:02 Wednesday, July 21, 1993 Observation Length: 41 Last Modified: 10:02 Wednesday, July 21, 1993 Deleted Observations: 0 Protection: Compressed: NO Data Set Type: Sorted: NO Label: -----Alphabetic List of Variables and Attributes----# Variable Type Len Pos ----------------------------------3 AGE Num 8 17 4 HT Num 8 25 1 NAME Char 12 4 2 SEX Char 1 16 5 WT Num 8 33 7-52 DATASETS PROCEDURE Data Set Name: SC2.CLASS Observations: 6 Member Type: DATA Variables: 5 Engine: V612 Indexes: 0 Created: 10:02 Wednesday, July 21, 1993 Observation Length: 41 Last Modified: 10:40 Friday, October 22, 1993 Deleted Observations: 0 Protection: Compressed: NO Data Set Type: Sorted: NO -----Alphabetic List of Variables and Attributes----# Variable Type Len Pos Label -------------------------------------------------------------------3 AGE Num 8 17 2 GENDER Char 1 16 4 HT Num 8 25 1 NAME Char 12 4 5 WT Num 8 33 7-53 WT IN LBS. FORMAT HT 5.0 TRANSPORTING SAS FILES BETWEEN HOSTS v The process of moving SAS files from one host environment to another is called TRANSPORTING. v To move a SAS file from one host to another, you must convert the file to a format that can be recognized by the SAS system on other host systems. This format is called TRANSPORT format. v The transport process includes: v exporting the SAS file(putting the file into transport format) v moving the transport file to another host environment via communications software v importing the transport(restoring the file to the format appropriate to the receiving host). v Transport format is necessary when moving SAS files between machines running two different operating systems. v If both the sending and the receiving host are running the same operating system, transport format is not necessary. v SAS data sets in using the V7 engine are in a format that is recognized by all operating systems. Therefore V7 SAS data sets can be used on multiple operating systems without going through the transport process. v A transport file is a sequential file that contains one or more SAS data sets in SAS transport format. v With the XPORT engine, a SAS data set in transport format can be used as input in a DATA step or processed by SAS procedures, including the COPY procedure. 7-54 TRANSPORTING WITH RELEASE 6.06 OR LATER v In release 6.06 or later the XPORT engine used in conjunction with PROC COPY can be used for creating or reading SAS data sets in transport format. v The CPORT and CIMPORT procedures can be used to transport SAS catalogs. Use the CPORT and CIMPORT procedures to transport catalogs and catalog entries from one machine to another machine running under a different operating environment. EXAMPLE: EXPORTING A SAS LIBRARY/DATASET libname tran XPORT ’c:\bios111\class.dat’ ; libname sc ’c:\bios111\sasdata’ ; /* convert an entire library to transport format */ proc copy in=sc out=tran ; run; /* convert a single SAS data set to transport format */ proc copy in=sc out=tran ; select class ; run; 7-55 EXAMPLE: IMPORTING A SAS LIBRARY/DATASET libname tran XPORT ’c:\bios111\class.dat’ ; libname sc ’c:\bios111\sasdata’ ; /* import all files in a SAS transport file */ proc copy in=tran out=sc ; run; /* import a single file in a SAS transport file */ proc copy in=tran out=sc ; select class ; run; 7-56