Public Health 144A, Sections 1 and 2
Spring 2008
Assignment #5, due 3/5 (section 1) and 3/7 (section 2)

A. Finish any part of the class exercises that you were not able to complete (not to turn in).

B. Read DiIorio, pages 367-372. This section covers the FIRST. and LAST. variables that we discussed in class.

C. Residing on each PC in the C:\PH144 directory and on the class web site are the full versions of the CHDS Basic and Placental Exam Data Sets: BASIC.DAT and PLACENTA.DAT. The Basic Data Set contains 20,754 records and the Placental Exam Data Set contains 15,111 records. The full path names on the PCs are C:\PH144\BASIC.DAT and C:\PH144\PLACENTA.DAT. Note that the record layout documentation that you already have also applies to these data sets.

Edit (and rename as appropriate) one of your SAS Programs to produce the following:

Using the complete CHDS Basic and Placental Exam Files, run three SAS Procedures of your choice (may include graphs) that compare Placental Weight in grams and some measurement (again, of your choice) in the Basic Data Set. Each SAS Procedure should be run against a different subset of a combined (read: “merged”) Basic-Placental Exam Data Set, as follows:

1. Based upon all records that contain contributions from both the Basic and Placental Exam Data Sets.
2. Based upon the first record for each woman among all records that contain contributions from both the Basic and Placental Exam Data Sets.
3. Based upon a 50 percent sample of the first record for each woman among all records that contain contributions from both the Basic and Placental Exam Data Sets (i.e., a 50% sample of (2) above).

You will need to:

- Input variables and recode variables as appropriate from each of the two data sets.
- Set missing values as appropriate.
- Comment your SAS Program as appropriate.
- Label variables as appropriate.
- Merge the two data sets, matching records by common values of the PREG variable (we will assume a one-to-one relationship between placentas and pregnancies).
- Perform the required subsetting for each of the three procedures that you run.
- Choose interesting SAS Procedures to run on each subset. Consider evaluating the relationship between Placental Weight and some feature of the Mother, Infant or the Birth (e.g. age, race, sex, gestation).
- Create and use formats as appropriate.
- Title each of the SAS Procedures so that you can identify which subset you have used.

Hint: the three subsets can be performed in sequence. In other words, you can structure your SAS program to follow this general sequence:

(1) Create SAS Data Sets for the Basic and Placental Exam Files;
(2) recode and create new variables, as necessary;
(3) merge the data sets, performing the first subset;
(4) produce the first SAS Procedure;
(5) open a new SAS Data Step, read in (using the SET statement) the merged, subsetted data set, and perform the next subset;
(6) produce the second SAS Procedure;
(7) open another SAS Data Step (using the SET statement) to perform the third subset; and
(8) produce the third SAS Procedure.

Note that these data sets are much larger than the practice data sets that we’ve been using so far, so your programs will take slightly longer to run. This means that debugging your program will be a slightly more time-consuming process, since you'll have to wait longer between iterations of your SAS Program. Also, note that small errors can have much larger consequences with big data files. For example, a PROC PRINT statement that has not been limited with an appropriate OBS= option may produce thousands of unwanted lines in your OUTPUT window.
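
The general sequence in the hint might be sketched roughly as follows. This is a skeleton under stated assumptions, not a worked solution: the column positions and the variable names MOMID, PLACWT, and GESTWKS are hypothetical placeholders (take the real ones from the record layout documentation), and it assumes some variable identifies each woman. Only PREG and the two file paths come from the assignment itself. The RANUNI function is one of several ways to draw a sample and gives an approximate, not exact, 50 percent.

```sas
/* Sketch only. Column positions and the names MOMID, PLACWT and    */
/* GESTWKS are hypothetical; use the record layout documentation.   */

/* (1)-(2) Read each raw file and label the variables. */
data basic;
    infile 'C:\PH144\BASIC.DAT';
    input preg 1-8 momid 9-14 gestwks 15-16;    /* hypothetical columns */
    label gestwks = 'Gestation (weeks)';
run;

data placenta;
    infile 'C:\PH144\PLACENTA.DAT';
    input preg 1-8 placwt 9-12;                 /* hypothetical columns */
    label placwt = 'Placental weight (g)';
run;

/* Both data sets must be sorted by the merge key before a BY merge. */
proc sort data=basic;    by preg; run;
proc sort data=placenta; by preg; run;

/* (3) Merge by PREG; keep records with contributions from both files. */
data both;
    merge basic (in=inbasic) placenta (in=inplac);
    by preg;
    if inbasic and inplac;
run;

/* Quick check of the merge, limited with OBS= to avoid a huge listing. */
proc print data=both (obs=10);
run;

/* (4) First procedure, run on subset 1. */
title 'Subset 1: all records present in both files';
proc corr data=both;
    var placwt gestwks;
run;

/* (5) Subset 2: first record for each woman, using FIRST.MOMID. */
proc sort data=both out=bymom;
    by momid preg;
run;

data firstrec;
    set bymom;
    by momid;
    if first.momid;
run;

/* (6) Second procedure, run on subset 2. */
title 'Subset 2: first record for each woman';
proc means data=firstrec n mean std min max;
    var placwt gestwks;
run;

/* (7) Subset 3: approximate 50 percent random sample of subset 2. */
data half;
    set firstrec;
    if ranuni(144) < 0.5;    /* the seed 144 is arbitrary */
run;

/* (8) Third procedure, run on subset 3. */
title 'Subset 3: 50 percent sample of subset 2';
proc plot data=half;
    plot placwt*gestwks;
run;
quit;
```

The IN= data set options create temporary flags that record which file contributed to each merged record, so the subsetting IF in step (3) performs the first subset during the merge itself. Recoding, missing-value handling, and formats are left out of the sketch and would belong in the first two Data Steps.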