Biostat 510 Homework 2 Key
Due Tuesday, January 26, 2010

    options formchar="|----|+|---+=|-/\<>*";

    /*Question 2: Modify dataset*/
    libname b510 "C:\Users\kwelch\Desktop\b510";
    data allgroups;
      set b510.allgroups;
      if group = . then delete;
      if agemo >= 12 then agemo = mod(agemo,12);
      agemonths = 12*ageyr + agemo;
    run;

    /*Question 3: Descriptives on new dataset*/
    title "Allgroups dataset";
    proc means data=allgroups;
    run;

    /*Question 4: Import Excel data*/
    PROC IMPORT OUT= WORK.groupsJan19
                DATAFILE= "C:\Users\kwelch\Desktop\510\2010\homework\hw2\510data.xls"
                DBMS=EXCEL REPLACE;
         RANGE="Sheet1$";
         GETNAMES=YES;
         MIXED=NO;
         SCANTEXT=YES;
         USEDATE=YES;
         SCANTIME=YES;
    RUN;

    /*Question 5: Descriptives on new dataset*/
    title "GroupsJan19 dataset";
    proc means data=groupsJan19;
    run;

    /*Question 6: Histograms of Height and Weight*/
    proc univariate data=groupsJan19;
      var height weight;
      histogram;
    run;

    /*Question 7: Merge the two datasets*/
    proc sort data=allgroups;
      by group id;
    run;
    proc sort data=groupsJan19;
      by group id;
    run;
    data allgroups_combine;
      merge allgroups groupsJan19;
      by group id;
    run;

    /*Question 8: Descriptives for final dataset*/
    title "Combined dataset";
    proc means data=allgroups_combine;
    run;

    /*Question 9: Proc contents on final dataset*/
    proc contents data=allgroups_combine varnum;
    run;

    /*Question 10: Save the final dataset*/
    data b510.allgroups_combine;
      set allgroups_combine;
    run;

This homework uses the permanent SAS dataset from homework 1, allgroups.sas7bdat. You can download it from my web page (http://www.umich.edu/~kwelch) or use the dataset that you created. Save your SAS commands for this homework as homework2.sas, and be sure to include them as the first part of your homework write-up.

1. Submit a libname statement so you can use your SAS dataset.
To use a permanent SAS dataset, you must first submit a libname statement pointing to the folder where you saved your data. Be sure you point to the correct folder on your computer:

    libname b510 "e:\510\2010";

2. Create a new dataset and make changes to it using a data step.

a. Create a new temporary SAS dataset by using a SET statement like the one shown below. All SAS statements you use to modify your dataset go after the SET statement and before the RUN statement, as illustrated below.

    data allgroups;
      set b510.allgroups;
      /*SAS statements to modify your dataset will go here*/
      *SAS statement1;
      *SAS statement2;
      *etc;
    run;

b. Modify the allgroups dataset by getting rid of cases that don't have a value for GROUP (this will get rid of unwanted extra cases). Do this by including a statement like the one below in your data step.

    if group = . then delete;

c. Fix the variable AGEMO so it is always the additional months over and above 12 months. To do this, we use the mod function. The mod (modulo) function gives you the remainder when a value is divided by something. In this case we divide by 12 to get the number of months left over; for example, mod(14,12) is 2. We also use an IF statement, so this will only apply to cases where agemo is greater than or equal to 12.

    if agemo >= 12 then agemo = mod(agemo,12);

d. Now we will create a new variable called AGEMONTHS, which will be the total age in months:

    agemonths = 12*ageyr + agemo;

The commands for your SAS data step should show all of the commands that you used to modify your dataset.

3. Get descriptives on your new dataset using Proc Means. What are the min, max, and mean of AGEMO in this new dataset? What are the min, max, and mean of AGEMONTHS? Include this output in your homework write-up.
    Allgroups dataset

    The MEANS Procedure

    Variable   Label      N          Mean       Std Dev       Minimum       Maximum
    -------------------------------------------------------------------------------
    group      group     89     3.7752809     1.5793613     1.0000000     6.0000000
    ID         ID        89     8.3033708     5.1265895             0    20.0000000
    Ran        Ran       89     0.5056180     0.5028011             0     1.0000000
    AgeYR      AgeYR     89    25.0112360     3.8183004    20.0000000    43.0000000
    AgeMO      AgeMO     89     4.3764045     3.6222802             0    11.0000000
    HR1        HR1       89    73.5280899    11.3318637    49.0000000   121.0000000
    HR2        HR2       89    82.8988764    19.3185291    27.0000000   143.0000000
    agemonths            89   304.5112360    46.0340769   240.0000000   521.0000000
    -------------------------------------------------------------------------------

The min of agemo is 0, the max is 11, and the mean is 4.38. The min of agemonths is 240, the max is 521, and the mean is 304.51.

4. Import the new Excel file from class on Jan 19th. Be sure to check the Excel file before you import it to SAS.

a) Call your new dataset GroupsJan19. Be sure to save the SAS commands to import this dataset, and include them as part of your SAS commands for the homework.

SAS commands should have the Proc Import commands.

5. Get descriptive statistics for this new dataset.

a) What is the n, min, max, and mean of HEIGHT, of WEIGHT? Include the output from these descriptives in your homework write-up.

    GroupsJan19 dataset

    The MEANS Procedure

    Variable   Label      N          Mean       Std Dev       Minimum       Maximum
    -------------------------------------------------------------------------------
    group      group     84     3.8452381     1.8333855     1.0000000     7.0000000
    ID         ID        83     7.6265060     4.6792506     1.0000000    20.0000000
    height     height    84    66.5607143     4.1259396    58.0000000    79.0000000
    weight     weight    84   146.0870238    32.5513640    75.0000000   279.5000000
    -------------------------------------------------------------------------------

The n for both height and weight is 84. The min of height is 58, the max is 79, and the mean is 66.56. The min of weight is 75??, the max is 279.5, and the mean is 146.09.

6. Get a histogram of HEIGHT and WEIGHT using Proc Univariate.
a) Describe the distribution of these two variables in words. Include the histograms (only) from Proc Univariate in your homework write-up.

[Figure: two histograms from Proc Univariate, each titled "GroupsJan19 dataset", showing Percent by height and Percent by weight]

The distribution of height looks fairly symmetric, but with a slight positive (right) skew. The distribution of weight looks definitely skewed to the right.

7. Merge the two SAS datasets by GROUP and ID to create a new SAS dataset called ALLGROUPS_COMBINE.

a) To do this you will first need to sort both datasets by GROUP and ID.

    proc sort data=allgroups;
      by group id;
    run;
    proc sort data=groupsJan19;
      by group id;
    run;

b) Use a data step to merge the two datasets:

    data allgroups_combine;
      merge allgroups groupsJan19;
      by group id;
    run;

Include the portion of your SAS log showing that this dataset was correctly created.

    data allgroups_combine;
      merge allgroups groupsJan19;
      by group id;
    run;

    NOTE: There were 89 observations read from the data set WORK.ALLGROUPS.
    NOTE: There were 84 observations read from the data set WORK.GROUPSJAN19.
    NOTE: The data set WORK.ALLGROUPS_COMBINE has 106 observations and 11 variables.

How many observations are there in your final combined dataset? How many variables are there in your final dataset?

There are 106 observations in the final dataset and 11 variables.

8. Get descriptive statistics for your combined dataset using Proc Means.

a) What is the sample size (n) for HR1 and HR2?

b) What is the sample size for HEIGHT?

Include the output from this question in your homework write-up.
    Combined dataset

    The MEANS Procedure

    Variable   Label      N          Mean       Std Dev       Minimum       Maximum
    -------------------------------------------------------------------------------
    group      group    106     3.9245283     1.7711880     1.0000000     7.0000000
    ID         ID       105     8.1142857     5.0807220             0    20.0000000
    Ran        Ran       93     0.5161290     0.5024484             0     1.0000000
    AgeYR      AgeYR     93    25.0645161     3.8948306    20.0000000    43.0000000
    AgeMO      AgeMO     93     4.3494624     3.6488421             0    11.0000000
    HR1        HR1       93    73.7526882    11.5378921    49.0000000   121.0000000
    HR2        HR2       93    82.9462366    18.9750211    27.0000000   143.0000000
    agemonths            93   305.1236559    47.1227384   240.0000000   521.0000000
    height     height    84    66.5607143     4.1259396    58.0000000    79.0000000
    weight     weight    84   146.0870238    32.5513640    75.0000000   279.5000000
    -------------------------------------------------------------------------------

The n for HR1 and HR2 is 93. The n for Height is 84.

9. Run Proc Contents on your combined dataset. Include the output from Proc Contents in your homework write-up.

(Partial output from Proc Contents is shown below.)

    Combined dataset

    The CONTENTS Procedure

    Data Set Name        WORK.ALLGROUPS_COMBINE                    Observations          106
    Member Type          DATA                                      Variables             11
    Engine               V9                                        Indexes               0
    Created              Thursday, January 21, 2010 06:26:19 PM    Observation Length    88
    Last Modified        Thursday, January 21, 2010 06:26:19 PM    Deleted Observations  0
    Protection                                                     Compressed            NO
    Data Set Type                                                  Sorted                NO
    Label
    Data Representation  WINDOWS_32
    Encoding             wlatin1  Western (Windows)

    Variables in Creation Order

     #    Variable     Type    Len    Format    Informat    Label
     1    group        Num       8                          group
     2    ID           Num       8                          ID
     3    Ran          Num       8                          Ran
     4    AgeYR        Num       8                          AgeYR
     5    AgeMO        Num       8                          AgeMO
     6    Sex          Char      1    $1.       $1.         Sex
     7    HR1          Num       8                          HR1
     8    HR2          Num       8                          HR2
     9    agemonths    Num       8
    10    height       Num       8                          height
    11    weight       Num       8                          weight

10. Save your final dataset as a permanent SAS dataset called b510.allgroups_combine. Include the portion of your SAS log that shows the successful creation of your dataset.
    /*Question 10: Save the final dataset*/
    data b510.allgroups_combine;
      set allgroups_combine;
    run;

    NOTE: There were 106 observations read from the data set WORK.ALLGROUPS_COMBINE.
    NOTE: The data set B510.ALLGROUPS_COMBINE has 106 observations and 11 variables.

You will be graded on the SAS commands, the output, and your answers to questions. Be sure to include all of your SAS commands as the first part of the homework. Include the requested SAS output as the second part of your homework. Make sure your output looks neat, and include careful page breaks. Try to keep your SAS output results to a minimum length by judiciously editing it! Include the answers to each question along with the output to which it pertains. Number the output so we can clearly see which problem you're answering. Make sure your answers are in complete sentences.

Be sure you submit the SAS command file, fonts.sas, at the start of your homework assignment so you will have nice-looking tables and other output. This command file can be found on my web page. You only have to submit it once for each SAS session. You will get points for using this command.

    OPTIONS FORMCHAR="|----|+|---+=|-/\<>*";
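
As a check on the data-step logic in questions 2 and 7, the sketch below runs the same statements on a few made-up records. The dataset names (toy_ages, toy_sizes, toy_fixed, toy_combine), the flag variables (from_ages, from_sizes), and all data values are hypothetical and are not part of the assignment. The IN= flags are an optional addition for seeing which dataset contributed each case; the homework does not require them.

```sas
/* Hypothetical records, not the class data */
data toy_ages;
  input group id ageyr agemo;
  datalines;
1 1 20 3
1 2 21 14
2 1 22 24
. . 25 6
;
run;

/* Same statements as question 2 */
data toy_fixed;
  set toy_ages;
  if group = . then delete;                    /* drops the 4th record */
  if agemo >= 12 then agemo = mod(agemo, 12);  /* 14 -> 2, 24 -> 0 */
  agemonths = 12*ageyr + agemo;                /* 243, 254, 264 */
run;

/* Hypothetical second dataset, standing in for the Excel import */
data toy_sizes;
  input group id height weight;
  datalines;
1 1 64 120
2 1 70 165
3 1 68 150
;
run;

/* Both datasets must be sorted by the BY variables before merging */
proc sort data=toy_fixed;
  by group id;
run;
proc sort data=toy_sizes;
  by group id;
run;

/* Same merge as question 7, plus IN= flags.
   IN= variables are temporary, so they are copied
   into ordinary variables to keep them in the output. */
data toy_combine;
  merge toy_fixed(in=in_ages) toy_sizes(in=in_sizes);
  by group id;
  from_ages  = in_ages;
  from_sizes = in_sizes;
run;

/* Count cases by source: both, ages only, sizes only */
proc freq data=toy_combine;
  tables from_ages*from_sizes / norow nocol nopercent;
run;
```

In this toy example, toy_combine has four observations: two matched on both GROUP and ID, one found only in toy_fixed, and one found only in toy_sizes. The same cross-tabulation on the real datasets is one way to see how many cases each source contributed to the combined dataset.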