homework2

Biostat 510: Statistical Computing Packages
SAS Homework 2
Due Thursday, Sept 19, 2002

Topics:
   Using a permanent SAS data set
   Setting up a default data set
   Merging Data Sets by ID to add variables
   Transformations
   Distributions of continuous variables
   Histograms
   Independent Samples t-test
   Paired Samples t-test

1. Get a Proc Contents and Proc Means on the data set that you created for homework 1.
   a. First submit a libname statement for the b510 library, as you did for homework 1. Note: you will need to submit a libname statement each time you run SAS.
   b. Submit an options statement making your data set from homework 1 the default, so you can use it without needing to specify the data set name for each proc. The b510.hrtrate2 data set will be the default until you create a new data set in this session.

         options _last_ = b510.hrtrate2;

2. Download and unzip the tecumseh.exe archive. It contains three SAS data sets from different rounds of the Tecumseh Community Health Study. The data sets will be unzipped to the folder c:\temp\tecumseh.
   a. Submit a libname, assigning the library tecumseh to the folder where the data are stored.
   b. Get the contents on all of the data sets in a library by using syntax like that shown below.
      i. For each data set, what is the SAS data set name? What is the file name?
      ii. What engine was used to create the data set? What release of SAS created the data set?
      iii. How many observations and how many variables are in each data set? What type are the variables? Are the data sets sorted?

         libname tecumseh v8 "c:\temp\tecumseh";
         proc contents data=tecumseh._all_;
         run;

3. Get descriptive statistics on all variables in each of the data sets.
   a. Use titles to label your output, so it is clear which data set each part of the output comes from. Include the output from your descriptive statistics for each data set in the material that you hand in for this homework.
   b. How many people have a value for age? What is the mean age, minimum age, and maximum age at each of these examinations?
   c. How many people have a value for cigarette smoking?

4. Get a histogram of Age for each of the data sets using Proc Univariate with the Plot option. Remember, the variable names are set up to be V1, V2, etc.
   a. What is the median age at each round? How does it compare to the mean age?
   b. Describe the distribution of age for each of these data sets. How does this distribution change from one round of data to the next?
   c. Try getting a histogram of Age for CVI using Proc Chart, with the vbar and with the hbar options, and using SAS/INSIGHT. (You do not need to hand in the graphs for this question, but include the commands for everything except the SAS/INSIGHT graphs.) Experiment with different midpoints for the graph in Proc Chart (see your coursepack, p. ).

5. Create a new permanent data set called tecumseh.adultcv1 from the round 1 data that has only people who were 20 or more years old at the time of the first exam. Hint: Use a subsetting if statement in the data step.
   a. How many people are in your new data set?
   b. Get descriptive statistics for all variables in this data set.
   c. What is the average age of those in your new data set? What is the minimum? What is the maximum?
   d. How many people have a value for cigarette smoking in this data set?

6. Merge the adults data set with the data from rounds 2 and 3. Include in your final data set only those who were in the adults data set at round 1. Call your new data set tecumseh.cv123.
   a. Be sure to sort each data set before merging.
   b. Use the (in= ) data set option to make sure you get only cases in your final data set that were included in the round 1 adults data set.
   c. How many observations and how many variables are in this final data set?

7. Calculate BMI for each round, using the appropriate variables. Get descriptive statistics on your new bmi variables (call them bmi1, bmi2, and bmi3). You can put the creation of these new variables into the data step that you used to merge the data, so you will not make an extra data set.
   a. Get histograms of BMI for each round of data.
   b. What is the distribution of BMI for each round?
   c. Calculate logbmi for each round as the natural log of bmi.
   d. What is the distribution of logbmi for each round?

8. Get a t-test to compare the mean of BMI at round 1 to that at round 2 and at round 3. Do the same for logbmi at round 1 vs. round 2 and round 3.
   a. What type of t-test is appropriate here?
   b. Which variable is it better to use, bmi or logbmi? Why?

9. Use a t-test to compare bmi for males vs. females at each round. Do the same for logbmi.
   a. What type of t-test is appropriate here?
   b. Which variable is it more appropriate to use, bmi or logbmi? Why?
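
As a rough sketch of how steps 6 and 7 can share one data step, the code below sorts each data set, merges them with the (in= ) option, and computes bmi and logbmi in the same step. The data set names cv2 and cv3, the key variable id, and the weight and height variables wt1-wt3 and ht1-ht3 are hypothetical placeholders, not names from the Tecumseh files (the real variables are named V1, V2, etc., so check your Proc Contents output). The formula also assumes weight in kilograms and height in meters; adjust it if the data use other units.

```sas
/* Sketch only. Hypothetical names: cv2, cv3, id, wt1-wt3, ht1-ht3. */
libname tecumseh v8 "c:\temp\tecumseh";

/* Sort each data set by the merge key; out= leaves the originals untouched. */
proc sort data=tecumseh.adultcv1 out=adults; by id; run;
proc sort data=tecumseh.cv2      out=round2; by id; run;
proc sort data=tecumseh.cv3      out=round3; by id; run;

data tecumseh.cv123;
   /* in_adult is 1 when the current id is present in the adults data set */
   merge adults (in=in_adult) round2 round3;
   by id;
   if in_adult;   /* subsetting if: keep only round 1 adults */

   /* BMI = weight / height squared (kg and m assumed) */
   bmi1 = wt1 / ht1**2;
   bmi2 = wt2 / ht2**2;
   bmi3 = wt3 / ht3**2;

   /* log() is the natural log in SAS */
   logbmi1 = log(bmi1);
   logbmi2 = log(bmi2);
   logbmi3 = log(bmi3);
run;

title "Merged rounds 1-3, round 1 adults only";
proc means data=tecumseh.cv123;
   var bmi1-bmi3 logbmi1-logbmi3;
run;
```

Because the temporary variable in_adult is not written to the output data set, the variable count reported by Proc Contents reflects only the merged variables plus the six new ones.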
