Assignment 4 (Due Nov 9)

PH6420 Fall 2015: Assignment 4
Due: November 9, 2015
Overview
You will need to write 3 separate SAS programs for this assignment (Part A, Part B, Part C). Turn in each
program. Be sure to answer each question.
PART A:
Write a SAS program to do the following:
1. Create a SAS dataset called diet by reading in data from tomhs.dat. Read in the variables ptid,
clinic, randdate, sex, brthdate, marital, kcalbl, fatbl, dcholbl, sodbl, and potbl. The last 5
variables are intakes of dietary nutrients as described in the data dictionary. (Note: bl stands for
baseline.)
2. Create a variable that is the age of the patient at the beginning of the study (use brthdate and
randdate). Create another variable that is the ratio of dietary sodium to dietary potassium.
3. Include labels for each variable (see data dictionary). Include a date format for randdate and
brthdate (put the FORMAT statement in the data step).
4. Run a PROC CONTENTS on the dataset. Verify that the dataset has the variables expected,
that there is a label for each variable, and that the two date variables have formats.
5. Make the SAS dataset a permanent dataset. Use PROC COPY to do this. You will need to use a
LIBNAME statement to tell SAS where to store the dataset.
6. List the files you have in the folder you chose. Verify that you have a file called
diet.sas7bdat or an icon of a SAS dataset with the name diet. (You do not need to turn in
anything for this part.)
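The steps in Part A can be sketched roughly as follows. This is only an illustration under stated assumptions: the two records are made up and typed inline so the sketch runs without tomhs.dat, only a few of the required variables are shown, the variable names age and nakratio, the label text, and the folder path are all hypothetical, and the age formula is one common approximation. The real program would read tomhs.dat with INFILE and use the column positions and labels from the data dictionary.

```sas
/* Illustrative sketch only: made-up records, not TOMHS data */
data diet;
   /* List input with date informats; the real program would use INFILE
      and the column positions given in the data dictionary */
   input ptid $ randdate :mmddyy10. brthdate :mmddyy10. sodbl potbl;

   /* Age in years at randomization (approximate: days / 365.25) */
   age = (randdate - brthdate) / 365.25;

   /* Ratio of dietary sodium to dietary potassium */
   nakratio = sodbl / potbl;

   /* Placeholder label text; take the real wording from the data dictionary */
   label ptid     = 'Patient ID'
         randdate = 'Randomization date'
         brthdate = 'Birth date'
         sodbl    = 'Dietary sodium at baseline'
         potbl    = 'Dietary potassium at baseline'
         age      = 'Age at randomization (years)'
         nakratio = 'Sodium to potassium ratio at baseline';

   /* Display the two SAS date values as dates */
   format randdate brthdate mmddyy10.;
   datalines;
A00001 02/15/1987 06/30/1940 3200 2900
A00002 11/03/1986 01/12/1935 2750 3100
;
run;

/* Check variables, labels, and formats */
proc contents data=diet;
run;

/* Hypothetical folder; it must already exist on your machine */
libname mylib 'C:\SASdata';

/* Copy the work dataset diet into the permanent library */
proc copy in=work out=mylib;
   select diet;
run;
```

After PROC COPY runs, the folder named in the LIBNAME statement should contain diet.sas7bdat.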
PART B:
After saving your program from Part A, exit your SAS session. Then enter SAS again and write a new
program that accesses the permanent dataset diet created in Part A. You will need to use a LIBNAME
statement to tell SAS in which folder the dataset is stored.
1. Run a PROC CONTENTS on the permanent dataset diet.
2. In a data step, create a new (work) dataset based on diet that contains only observations for women.
You will need to use a SET statement followed by a WHERE statement.
3. Using this new dataset, display the mean intake of each nutrient for women.
4. Display the correlation matrix of the 5 nutrient variables for women (exclude the new variable
created in Part A2). Which two nutrients have the largest correlation, and what is that correlation?
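A minimal sketch of the Part B steps is shown below. It assumes the permanent dataset diet from Part A already exists in the folder named on the LIBNAME statement. The folder path, the libref mylib, the dataset name women, and the coding sex = 2 for women are assumptions made for illustration; check the data dictionary for the actual coding of sex.

```sas
/* Hypothetical folder: use the same one chosen in Part A */
libname mylib 'C:\SASdata';

/* Describe the permanent dataset */
proc contents data=mylib.diet;
run;

/* Work dataset containing only women */
data women;
   set mylib.diet;
   where sex = 2;   /* assumed coding; verify in the data dictionary */
run;

/* Mean baseline intake of each nutrient */
proc means data=women mean;
   var kcalbl fatbl dcholbl sodbl potbl;
run;

/* Correlation matrix of the 5 nutrient variables */
proc corr data=women;
   var kcalbl fatbl dcholbl sodbl potbl;
run;
```

Because WHERE is applied as observations are read by SET, only the matching observations enter the new dataset.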
PART C:
Write a SAS program to do the following:
1. Create a work SAS dataset called maindata by reading in the variables ptid, cholbl, and glucosbl
from tomhs.dat.
2. Create a work SAS dataset called cvddata by reading in the variables ptid and cvd from the file
cvd.dat (on class web page). The input statement for reading in this data should be:
INPUT @1 ptid $10. @13 cvd 1. ;
(Note: This file contains only persons experiencing a cardiovascular event. The variable
cvd is set to 1 for each of these persons.)
3. Create a SAS dataset called alldata by merging maindata and cvddata, matching on the variable
ptid. In this same data step, assign a value of 2 for variable cvd to those that did not have a
cardiovascular event. Use the following code:
if cvd = . then cvd = 2;
5. Run a procedure to display the average serum cholesterol (variable cholbl) and average serum
glucose (variable glucosbl) for those with and without a cardiovascular event.
6. Based on part 5, do serum cholesterol and serum glucose appear to be risk factors for
cardiovascular disease in this study? Explain your answer. Note: You can use PROC TTEST to
statistically compare the groups.
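The merge logic in Part C can be sketched on a toy example. The records below are made up and typed inline so the sketch runs without tomhs.dat or cvd.dat; the real program would read both files with INFILE, using the INPUT statement given above for cvd.dat. Sorting both datasets by ptid before the merge is included here because a MERGE with a BY statement requires sorted (or indexed) input; the sort may turn out to be unnecessary if the files are already in ptid order.

```sas
/* Made-up records standing in for tomhs.dat */
data maindata;
   input ptid :$10. cholbl glucosbl;
   datalines;
P001 240 98
P002 215 105
P003 260 110
P004 198 92
;
run;

/* Made-up records standing in for cvd.dat: event cases only, cvd = 1 */
data cvddata;
   input ptid :$10. cvd;
   datalines;
P002 1
P003 1
;
run;

/* MERGE with BY needs both datasets ordered by the matching variable */
proc sort data=maindata; by ptid; run;
proc sort data=cvddata;  by ptid; run;

data alldata;
   merge maindata cvddata;
   by ptid;
   /* Persons absent from cvddata have missing cvd: code them as 2 */
   if cvd = . then cvd = 2;
run;

/* Means of cholesterol and glucose by event status */
proc means data=alldata n mean;
   class cvd;
   var cholbl glucosbl;
run;

/* Two-sample t-tests comparing the groups */
proc ttest data=alldata;
   class cvd;
   var cholbl glucosbl;
run;
```

In this toy example P001 and P004 do not appear in cvddata, so they receive cvd = 2 after the merge.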