Biostat 510: Statistical Computing Packages
SAS Homework 2
Due Thursday, Sept 19, 2002
Topics:
Using a permanent SAS data set
Setting up a default data set
Merging Data Sets by ID to add variables
Transformations
Distributions of continuous variables
Histograms
Independent Samples t-test
Paired Samples t-test
1. Get a Proc Contents and Proc Means on the data set that you created for homework 1.
a. First submit a libname statement for the b510 library, as you did for homework 1.
Note: you will need to submit a libname statement each time you run SAS.
b. Submit an options statement making your data set from homework 1 the default,
so you can use it without needing to specify the data set name for each proc. The
b510.hrtrate2 data set will be the default until you create a new data set in this
session.
options _last_ = b510.hrtrate2;
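A minimal sketch of how part 1 might fit together once both statements have been submitted. The folder path is a placeholder (an assumption, not the course's actual location); substitute the path you used for homework 1.

```sas
/* Placeholder path: replace with the folder that holds your b510 data sets */
libname b510 "c:\temp\b510";

/* Make the homework 1 data set the default for procs with no data= option */
options _last_ = b510.hrtrate2;

/* No data= option is given, so both procs read b510.hrtrate2 */
proc contents;
run;

proc means;
run;
```

Because neither proc creates a new data set, the default is still b510.hrtrate2 after both have run.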
2. Download and unzip the tecumseh.exe archive. It contains three SAS data sets from
different rounds of the Tecumseh Community Health Study. The data sets will be
unzipped to the folder c:\temp\tecumseh.
a. Submit a libname, assigning the library tecumseh to the folder where the data are
stored.
b. Get the contents on all of the data sets in a library by using syntax like that shown
below.
i. For each data set, what is the SAS data set name? What is the file name?
ii. What engine was used to create the data set? What release of SAS created
the data set?
iii. How many observations and how many variables are in each data set?
What type are the variables? Are the data sets sorted?
libname tecumseh v8 "c:\temp\tecumseh";
proc contents data=tecumseh._all_;
run;
3. Get descriptive statistics on all variables in each of the data sets.
a. Use titles to label your output, so it is clear which data set each part of the output
comes from. Include the output from your descriptive statistics for each data set in
the material that you hand in for this homework.
b. How many people have a value for age? What is the mean age, minimum age, and
maximum age at each of these examinations?
c. How many people have a value for cigarette smoking?
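One possible way to label the output for a single data set is sketched here. The data set name cv1 is hypothetical; use the names that Proc Contents reports for the tecumseh library, and repeat the block with a new title for each data set.

```sas
libname tecumseh v8 "c:\temp\tecumseh";

/* cv1 is a hypothetical data set name for the round 1 data */
title "Descriptive statistics: tecumseh.cv1 (round 1)";

/* With no var statement, Proc Means summarizes every numeric variable */
proc means data=tecumseh.cv1 n nmiss mean min max;
run;

title;  /* clear the title so it does not carry over to later output */
```

The n column gives the number of people with a non-missing value for each variable, which is what parts b and c ask about.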
4. Get a histogram of Age for each of the data sets using Proc Univariate with the Plot option.
Remember, the variable names are set up to be V1, V2, etc.
a. What is the median age at each round? How does it compare to the mean age?
b. Describe the distribution of age for each of these data sets. How does this
distribution change from one round of data to the next?
c. Try getting a histogram of Age for CV1 using Proc Chart, with the vbar and with
the hbar statements, and using SAS/INSIGHT. (You do not need to hand in the
graphs for this question, but include the commands for everything but the Insight
graphs). Experiment with different midpoints for the graph in Proc Chart (see
your coursepack, p.
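The commands for question 4 could look roughly like the sketch below. The data set name cv1 and the variable name v2 for age are assumptions made for illustration; check the Proc Contents output for the real names. The midpoints are arbitrary starting values to experiment with.

```sas
libname tecumseh v8 "c:\temp\tecumseh";

/* Hypothetical names: tecumseh.cv1 for round 1, v2 for age */
title "Distribution of age, round 1";

/* The plot option adds a stem-and-leaf (or histogram), box plot, and normal plot */
proc univariate data=tecumseh.cv1 plot;
  var v2;
run;

/* vbar draws vertical bars, hbar horizontal bars; midpoints= sets the bar centers */
proc chart data=tecumseh.cv1;
  vbar v2 / midpoints=5 to 85 by 10;
  hbar v2 / midpoints=5 to 85 by 10;
run;

title;
```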
5. Create a new permanent data set called tecumseh.adultcv1 from the round 1 data that has
only people who were 20 or more years old at the time of the first exam. Hint: Use a
subsetting if statement in the data step.
a. How many people are in your new data set?
b. Get descriptive statistics for all variables in this data set.
c. What is the average age of those in your new data set? What is the minimum?
What is the maximum?
d. How many people have a value for cigarette smoking in this data set?
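A subsetting if keeps only the observations for which the condition is true. The sketch below assumes the round 1 data set is called cv1 and that age is stored in v2; both names are hypothetical.

```sas
libname tecumseh v8 "c:\temp\tecumseh";

data tecumseh.adultcv1;
  set tecumseh.cv1;   /* hypothetical round 1 data set name */
  if v2 >= 20;        /* subsetting if: v2 is assumed to hold age */
run;

/* The SAS log reports how many observations were written to the new data set */
proc means data=tecumseh.adultcv1;
run;
```

SAS treats a missing numeric value as smaller than any number, so people with a missing age fail the condition and are dropped as well.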
6. Merge the adults data set with the data from rounds 2 and 3. Include in your final data set
only those who were in the adults data set at round 1. Call your new data set
tecumseh.cv123.
a. Be sure to sort each data set before merging.
b. Use the (in= ) data set option to make sure you get only cases in your final data
set that were included in the round 1 adults data set.
c. How many observations and how many variables are in this final data set?
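The sort-then-merge pattern described in question 6 can be sketched as follows. The ID variable name id and the round 2 and round 3 data set names cv2 and cv3 are assumptions; replace them with the actual names.

```sas
libname tecumseh v8 "c:\temp\tecumseh";

/* Each data set must be sorted by the ID variable before a merge with a by statement */
proc sort data=tecumseh.adultcv1; by id; run;
proc sort data=tecumseh.cv2;      by id; run;
proc sort data=tecumseh.cv3;      by id; run;

data tecumseh.cv123;
  merge tecumseh.adultcv1 (in=in1)   /* in1 = 1 when this data set contributes */
        tecumseh.cv2
        tecumseh.cv3;
  by id;
  if in1;                            /* keep only people in the round 1 adults data */
run;
```

This assumes that, apart from the ID, no variable name appears in more than one of the three data sets. If a name repeats, the value from the data set listed later in the merge statement replaces the earlier one.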
7. Calculate BMI for each round, using the appropriate variables. Get descriptive statistics
on your new bmi variables (call them bmi1, bmi2, and bmi3). You can put the creation
of these new variables into the data step that you used to merge the data, so you will not
make an extra data set.
a. Get Histograms of BMI for each round of data.
b. What is the distribution of BMI for each round?
c. Calculate logbmi for each round as the natural log of bmi.
d. What is the distribution of logbmi for each round?
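BMI is weight in kilograms divided by the square of height in metres. The toy data step below illustrates the calculation for one round; the variable names wtkg1 and htcm1, their units, and the three data lines are made up for the illustration. In the homework, the two assignment statements would go into the merge data step, using the real weight and height variables and converting units as needed.

```sas
/* Toy data: names, units (kg, cm), and values are invented for illustration */
data bmidemo;
  input id wtkg1 htcm1;
  bmi1    = wtkg1 / (htcm1 / 100)**2;   /* kg divided by metres squared */
  logbmi1 = log(bmi1);                  /* log() is the natural log in SAS */
  datalines;
1 70 175
2 82 168
3 . 160
;
run;

proc means data=bmidemo n mean std min max;
  var bmi1 logbmi1;
run;
```

The third person has a missing weight, so bmi1 and logbmi1 are missing for that person and the log notes that missing values were generated.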
8. Get a t-test to compare the mean of BMI at round 1 to that at round 2
and at round 3. Do the same for logbmi at round 1 vs. round 2 and round 3.
a. What type of t-test is appropriate here?
b. Which variable is it better to use, bmi or logbmi? Why?
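If you decide that the round-to-round comparison calls for a test on within-person differences, Proc Ttest has a paired statement for it. This is only a syntax sketch, assuming tecumseh.cv123 already contains bmi1-bmi3 and logbmi1-logbmi3 from question 7.

```sas
libname tecumseh v8 "c:\temp\tecumseh";

/* Each pair a*b tests whether the mean of (a - b) differs from zero */
proc ttest data=tecumseh.cv123;
  paired bmi1*bmi2 bmi1*bmi3;
  paired logbmi1*logbmi2 logbmi1*logbmi3;
run;
```

Only people with non-missing values for both variables in a pair contribute to that comparison.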
9. Use a t-test to compare bmi for males vs. females at each round. Do the same for logbmi.
a. What type of t-test is appropriate here?
b. Which variable is it more appropriate to use, bmi or logbmi? Why?
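For a comparison between two separate groups of people, Proc Ttest uses a class statement to name the grouping variable. In this syntax sketch, sex is a hypothetical variable name, and tecumseh.cv123 is assumed to contain the bmi and logbmi variables from question 7.

```sas
libname tecumseh v8 "c:\temp\tecumseh";

proc ttest data=tecumseh.cv123;
  class sex;   /* hypothetical name; the class variable must have exactly two levels */
  var bmi1 bmi2 bmi3 logbmi1 logbmi2 logbmi3;
run;
```

The output includes a test of equality of variances, which helps in choosing between the pooled and the Satterthwaite results.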