Public Health 292, Sections 10 and 16

Public Health 144A, Sections 1 and 2
Spring 2008
Assignment #5, due 3/5 (section 1) and 3/7 (section 2)
A.
Finish any part of the class exercises that you were not able to complete (not to turn in).
B.
Read pages 367-372 in DiIorio. This section covers the FIRST. and LAST. variables that we discussed in class.
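In SAS, when a DATA step reads data with a BY statement (the data must already be sorted by the BY variable), it creates temporary FIRST.var and LAST.var variables. These equal 1 on the first and last record of each BY group, and 0 otherwise. The Python sketch below is not SAS. It only mimics that flag logic so you can see which records get flagged. The woman IDs are made-up illustration values, not the CHDS record layout.

```python
def by_group_flags(keys: list) -> list:
    """Return (first, last) flags for each position in a list of BY-variable
    values that is already sorted, mimicking SAS FIRST./LAST. variables."""
    flags = []
    n = len(keys)
    for i, key in enumerate(keys):
        # First of a group: very first record, or the key changed from the previous one.
        first = i == 0 or keys[i - 1] != key
        # Last of a group: very last record, or the key changes on the next one.
        last = i == n - 1 or keys[i + 1] != key
        flags.append((first, last))
    return flags


# Hypothetical woman IDs, already sorted (SAS also requires sorted BY data).
woman_ids = [101, 101, 102, 103, 103, 103]
for wid, (first, last) in zip(woman_ids, by_group_flags(woman_ids)):
    print(wid, int(first), int(last))
```

A woman with a single record (ID 102 here) has both flags set. Keeping records where the first flag is true is the idea behind selecting "the first record for each woman."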
C.
The full versions of the CHDS Basic and Placental Exam Data Sets, BASIC.DAT and PLACENTA.DAT, reside in the C:\PH144
directory on each PC and on the class web site. The Basic Data Set contains 20,754 records and the Placental Exam Data
Set contains 15,111 records. The full path names on the PCs are C:\PH144\BASIC.DAT and C:\PH144\PLACENTA.DAT. Note that
the record layout documentation that you already have also applies to these data sets.
Edit (and rename as appropriate) one of your SAS Programs to produce the following:
Using the complete CHDS Basic and Placental Exam Files, run three SAS Procedures of your choice (graphs are allowed)
that compare Placental Weight in grams with some measurement (again, of your choice) from the Basic Data Set. Run each
SAS Procedure against a different subset of a combined (read: “merged”) Basic-Placental Exam Data Set, as follows:
1. Based upon all records that contain contributions from both the Basic and Placental Exam Data Sets.
2. Based upon the first record for each woman among all records that contain contributions from both the Basic
and Placental Exam Data Sets.
3. Based upon a 50 percent sample of the first record for each woman among all records that contain
contributions from both the Basic and Placental Exam Data Sets (i.e., a 50% sample of subset 2 above).
You will need to:
Input variables and recode variables as appropriate from each of the two data sets.
Set missing values as appropriate.
Comment your SAS Program as appropriate.
Label variables as appropriate.
Merge the two data sets, matching records by common values of the PREG variable (we will assume a one-to-one
relationship between placentas and pregnancies).
Perform the required subsetting for each of the three procedures that you run.
Choose interesting SAS Procedures to run on each subset; consider evaluating the relationship between
Placental Weight and some feature of the mother, infant, or birth (e.g., age, race, sex, gestation).
Create and use formats as appropriate.
Title each of the SAS Procedures so that you can identify which subset you have used.
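In SAS, the merge step is usually written by sorting both data sets by PREG, then using a MERGE statement with BY PREG and IN= data set options, keeping only records where both IN= flags are true. The Python sketch below is not SAS. It only illustrates the same one-to-one matching logic on small in-memory lists. PREG is from the assignment. The other field names (`woman_id`, `momage`, `plwt`) and all values are hypothetical illustration data, not CHDS records.

```python
def merge_matched(basic: list, placenta: list, key: str = "preg") -> list:
    """Combine records that share a value of `key` in both inputs.

    Assumes a one-to-one relationship (at most one placenta record per
    pregnancy), as the assignment does. Records found in only one input are
    dropped, like keeping a SAS MERGE observation only when both IN= flags are 1.
    """
    placenta_by_key = {}
    for rec in placenta:
        if rec[key] in placenta_by_key:
            raise ValueError(f"duplicate {key} in placenta data: {rec[key]}")
        placenta_by_key[rec[key]] = rec

    merged = []
    for rec in basic:
        match = placenta_by_key.get(rec[key])
        if match is not None:
            # Placenta fields overwrite same-named Basic fields, much as the
            # data set listed second on a SAS MERGE statement does.
            merged.append({**rec, **match})
    # Return the result in key order, as a BY-group merge would.
    return sorted(merged, key=lambda r: r[key])


# Hypothetical toy records, not CHDS data.
basic = [
    {"preg": 3, "woman_id": 12, "momage": 27},
    {"preg": 1, "woman_id": 10, "momage": 31},
    {"preg": 2, "woman_id": 11, "momage": 24},
]
placenta = [
    {"preg": 1, "plwt": 480},
    {"preg": 3, "plwt": 515},
    {"preg": 4, "plwt": 455},  # no matching Basic record: dropped
]
for rec in merge_matched(basic, placenta):
    print(rec)
```

Raising an error on a duplicate PREG is one way to check the one-to-one assumption rather than silently trusting it.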
Hint: the three subsets can be performed in sequence. In other words, you can structure your SAS program to follow this
general sequence: (1) create SAS Data Sets for the Basic and Placental Exam Files; (2) recode and create new variables, as
necessary; (3) merge the data sets, performing the first subset; (4) run the first SAS Procedure; (5) start a new DATA step
that reads the merged, subsetted data set with a SET statement and performs the second subset; (6) run the second SAS
Procedure; (7) start another DATA step (again using a SET statement) to perform the third subset; and (8) run the third
SAS Procedure.
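Steps (5) through (8) chain the subsets, with each step reading the previous step's output the way a new DATA step reads the prior data set with SET. The Python sketch below only illustrates that chaining and is not SAS. It rests on several assumptions that the assignment does not specify:

* "First record" means the lowest PREG for each woman, after sorting by a hypothetical `woman_id` field and then by PREG, which mirrors FIRST. processing.
* The 50% sample is an exact simple random sample without replacement, rounded down.
* The fixed seed (2008) is an arbitrary, hypothetical choice, used only so reruns give the same sample.

```python
import random


def first_per_woman(records: list, id_key: str = "woman_id",
                    order_key: str = "preg") -> list:
    """Subset 2: keep the first record for each woman.

    Sorting by (id, order) and keeping a record when its id differs from the
    previous record's id mirrors `IF FIRST.id` in a SAS DATA step with BY.
    """
    ordered = sorted(records, key=lambda r: (r[id_key], r[order_key]))
    firsts = []
    previous_id = object()  # sentinel that never equals a real ID
    for rec in ordered:
        if rec[id_key] != previous_id:
            firsts.append(rec)
        previous_id = rec[id_key]
    return firsts


def half_sample(records: list, seed: int = 2008) -> list:
    """Subset 3: simple random sample of half the records (rounded down),
    drawn without replacement with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    return rng.sample(records, len(records) // 2)


# Hypothetical stand-in for subset 1 (the matched, merged records).
subset1 = [
    {"woman_id": 10, "preg": 1, "plwt": 480},
    {"woman_id": 10, "preg": 5, "plwt": 500},
    {"woman_id": 11, "preg": 2, "plwt": 430},
    {"woman_id": 12, "preg": 3, "plwt": 515},
    {"woman_id": 13, "preg": 4, "plwt": 470},
]
subset2 = first_per_woman(subset1)   # built from subset 1
subset3 = half_sample(subset2)       # built from subset 2
print(len(subset1), len(subset2), len(subset3))  # 5 4 2
```

Because each subset is built only from the one before it, subset 3 is guaranteed to be a sample of subset 2, which is what item 3 of the assignment asks for.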
Note that these data sets are much larger than the practice data sets that we've been using so far, so your programs will
take slightly longer to run. Debugging your program will therefore be a slightly more time-consuming process, since you'll
have to wait longer between runs of your SAS program. Also note that small errors can have much larger consequences with
big data files; for example, a PROC PRINT step that has not been limited with an appropriate OBS= option may produce
thousands of unwanted lines in your OUTPUT environment.