Biostat 510
Homework 2 Key
Due Tuesday, January 26, 2010
options formchar="|----|+|---+=|-/\<>*";
/*Question 2: Modify dataset*/
libname b510 "C:\Users\kwelch\Desktop\b510";
data allgroups;
set b510.allgroups;
if group = . then delete;
if agemo >= 12 then agemo = mod(agemo,12);
agemonths = 12*ageyr + agemo;
run;
/*Question 3: Descriptives on new dataset*/
title "Allgroups dataset";
proc means data=allgroups;
run;
/*Question 4: Import Excel data*/
PROC IMPORT OUT= WORK.groupsJan19
    DATAFILE= "C:\Users\kwelch\Desktop\510\2010\homework\hw2\510data.xls"
    DBMS=EXCEL REPLACE;
    RANGE="Sheet1$";
    GETNAMES=YES;
    MIXED=NO;
    SCANTEXT=YES;
    USEDATE=YES;
    SCANTIME=YES;
RUN;
/*Question 5: Descriptives on new dataset*/
title "GroupsJan19 dataset";
proc means data=groupsJan19;
run;
/*Question 6: Histograms of Height and Weight*/
proc univariate data=groupsJan19;
var height weight;
histogram;
run;
/*Question 7: Merge the two datasets*/
proc sort data=allgroups;
by group id;
run;
proc sort data=groupsJan19;
by group id;
run;
data allgroups_combine;
merge allgroups groupsJan19;
by group id;
run;
/*Question 8: Descriptives for final dataset*/
title "Combined dataset";
proc means data=allgroups_combine;
run;
/*Question 9: Proc contents on final dataset*/
proc contents data=allgroups_combine varnum;
run;
/*Question 10: Save the final dataset*/
data b510.allgroups_combine;
set allgroups_combine;
run;
This homework will utilize the permanent SAS dataset you created for homework 1,
allgroups.sas7bdat, which can be downloaded from my web page
(http://www.umich.edu/~kwelch), or you can use the dataset that you created. Save your SAS
commands for this homework as homework2.sas, and be sure to include them as the first part of
your homework write-up.
1. Submit a libname statement so you can use your SAS dataset. To use a permanent
SAS dataset, you must first submit a libname statement pointing to the folder where you
saved your data. Be sure you point to the correct folder on your computer:
libname b510 "e:\510\2010";
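One way to confirm that the libname statement points to the right folder is to list the datasets stored in the library. The sketch below assumes the library b510 has already been assigned as shown above. The NODS option limits the output to a directory listing, and allgroups should appear in it if the folder is correct.

```sas
/* List the members of the b510 library without per-dataset detail. */
/* Assumes the libname statement for b510 has already been submitted. */
proc contents data=b510._all_ nods;
run;
```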
2. Create a new dataset and make changes to it using a data step.
a. Create a new temporary SAS dataset by using a set statement like the one shown
below. All SAS statements you use to modify your dataset will go after the SET
statement and before the RUN statement, as illustrated below.
data allgroups;
set b510.allgroups;
/*SAS statements to modify your dataset will go here*/
*SAS statement1;
*SAS statement2;
*etc;
run;
b. Modify the allgroups dataset by getting rid of cases that don't have a value for
GROUP (this will get rid of unwanted extra cases). Do this by including a statement
like that below in your data step.
if group = . then delete;
c. Fix the variable AGEMO so it is always the number of additional months left over
after dividing by 12 (a value from 0 to 11). To do this, we use the mod function. The
mod (modulo) function gives you the remainder when a value is divided by something.
In this case we will divide by 12 to get the number of months left over. We also use
an IF statement, so this will only apply to cases where agemo is greater than or
equal to 12.
if agemo >= 12 then agemo = mod(agemo,12);
d. Now we will create a new variable called AGEMONTHS, which will be the total age
in months:
agemonths = 12*ageyr + agemo;
 The commands for your SAS data step should show all of the commands that you used to
modify your dataset.
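To see what the statements in parts c and d do, it can help to run them on a few made-up rows first. The sketch below is an illustration only: the dataset name mod_demo, the variable agemo_raw, and the four data rows are invented for this example and are not part of the homework data.

```sas
/* Illustration only: apply the homework logic to a few made-up rows. */
data mod_demo;
    input ageyr agemo;
    agemo_raw = agemo;                           /* keep the value as entered */
    if agemo >= 12 then agemo = mod(agemo, 12);  /* remainder after dividing by 12 */
    agemonths = 12*ageyr + agemo;                /* total age in months */
    datalines;
20 0
21 5
22 12
23 14
;
run;

proc print data=mod_demo;
run;
```

In the printed result, the row entered as 22 years and 12 months ends up with agemo = 0 and agemonths = 264, and the row entered as 23 years and 14 months ends up with agemo = 2 and agemonths = 278. Rows with agemo below 12 are left unchanged by the IF statement.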
3. Get descriptives on your new dataset using Proc Means. What are the min, max, and
mean of AGEMO in this new dataset? What are the min, max and mean of
AGEMONTHS?
 Include this output in your homework write-up.
Allgroups dataset

The MEANS Procedure

Variable   Label     N          Mean       Std Dev       Minimum       Maximum
------------------------------------------------------------------------------
group      group    89     3.7752809     1.5793613     1.0000000     6.0000000
ID         ID       89     8.3033708     5.1265895             0    20.0000000
Ran        Ran      89     0.5056180     0.5028011             0     1.0000000
AgeYR      AgeYR    89    25.0112360     3.8183004    20.0000000    43.0000000
AgeMO      AgeMO    89     4.3764045     3.6222802             0    11.0000000
HR1        HR1      89    73.5280899    11.3318637    49.0000000   121.0000000
HR2        HR2      89    82.8988764    19.3185291    27.0000000   143.0000000
agemonths           89   304.5112360    46.0340769   240.0000000   521.0000000
------------------------------------------------------------------------------
The min of agemo is 0, the max is 11, and the mean is 4.38.
The min of agemonths is 240, the max is 521, and the mean is 304.51.
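If only these two variables are of interest, Proc Means can be limited to them and to the requested statistics. This is a minimal sketch rather than a required part of the key; the MAXDEC=2 option is added here only to round the printed values to two decimals.

```sas
/* Request only n, min, max, and mean for the two age variables. */
proc means data=allgroups n min max mean maxdec=2;
    var agemo agemonths;
run;
```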
4. Import the new Excel file from class on Jan 19th. Be sure to check the Excel file before
you import it to SAS.
a) Call your new dataset GroupsJan19.
 Be sure to save the SAS commands to import this dataset, and include them as part of
your SAS commands for the homework.
SAS commands should have the Proc Import Commands.
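After importing, a quick look at the new dataset can confirm that the variable names and types came through as expected. The sketch below assumes the import above ran without errors; the choice of 10 rows is arbitrary.

```sas
/* Check variable names and types in the imported dataset. */
proc contents data=groupsJan19 varnum;
run;

/* Print the first 10 rows to compare against the Excel sheet. */
proc print data=groupsJan19 (obs=10);
run;
```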
5. Get descriptive statistics for this new dataset.
a) What is the n, min, max, and mean of HEIGHT, of WEIGHT?
 Include the output from these descriptives in your homework write-up.
GroupsJan19 dataset

The MEANS Procedure

Variable   Label     N          Mean       Std Dev       Minimum       Maximum
------------------------------------------------------------------------------
group      group    84     3.8452381     1.8333855     1.0000000     7.0000000
ID         ID       83     7.6265060     4.6792506     1.0000000    20.0000000
height     height   84    66.5607143     4.1259396    58.0000000    79.0000000
weight     weight   84   146.0870238    32.5513640    75.0000000   279.5000000
------------------------------------------------------------------------------
For height, n is 84, the min is 58, the max is 79, and the mean is 66.56.
For weight, n is 84, the min is 75 (a questionable value that should be checked), the max
is 279.5, and the mean is 146.09.
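A suspicious minimum like this one can be followed up by listing the cases with unusually low values. The sketch below is one way to do that; the cutoff of 100 is an arbitrary value chosen for illustration, not something specified in the homework.

```sas
/* List cases with low weight values so they can be checked against the Excel file. */
proc print data=groupsJan19;
    where weight lt 100;   /* 100 is an arbitrary cutoff chosen for illustration */
    var group id height weight;
run;
```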
6. Get a histogram of HEIGHT and WEIGHT using Proc Univariate.
a) Describe the distribution of these two variables in words.
 Include the histograms (only) from Proc Univariate in your homework write-up.
[Figure: two histograms from Proc Univariate, each titled "GroupsJan19 dataset", with
Percent on the vertical axis. The height histogram has horizontal axis ticks from 58.5 to
79.5 in steps of 3; the weight histogram has horizontal axis ticks from 90 to 270 in steps
of 30.]
The distribution of height looks fairly symmetric, but with a slight positive (right) skew.
The distribution of weight looks definitely skewed to the right.
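A visual description of skewness can be backed up with a few summary statistics. The sketch below is an optional check, not part of the assigned output: a positive skewness value, or a mean noticeably larger than the median, would be consistent with a right skew.

```sas
/* Compare mean and median and get the skewness for each variable. */
proc means data=groupsJan19 n mean median skewness maxdec=2;
    var height weight;
run;
```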
7. Merge the two SAS datasets by GROUP and ID to create a new SAS dataset called
ALLGROUPS_COMBINE.
a) To do this you will first need to sort both datasets by GROUP and ID.
proc sort data=allgroups;
by group id;
run;
proc sort data=groupsJan19;
by group id;
run;
b) Use a data step to merge the two datasets:
data allgroups_combine;
merge allgroups groupsJan19;
by group id;
run;
 Include the portion of your SAS log showing that this dataset was correctly created.
data allgroups_combine;
merge allgroups groupsJan19;
by group id;
run;
NOTE: There were 89 observations read from the data set WORK.ALLGROUPS.
NOTE: There were 84 observations read from the data set WORK.GROUPSJAN19.
NOTE: The data set WORK.ALLGROUPS_COMBINE has 106 observations and 11 variables.
How many observations are there in your final combined dataset? How many variables
are there in your final dataset?
There are 106 observations in the final dataset and 11 variables.
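To see where the observations in a merged dataset come from, the IN= dataset option can flag which input dataset contributed to each row. The sketch below is an optional check under the assumption that both datasets are already sorted by GROUP and ID; the names merge_check, in_all, in_jan, and source are made up for this example.

```sas
/* Repeat the merge into a scratch dataset, recording the source of each row. */
data merge_check;
    merge allgroups (in=in_all) groupsJan19 (in=in_jan);
    by group id;
    length source $ 8;
    if in_all and in_jan then source = "both";      /* matched in both datasets */
    else if in_all then source = "all only";        /* only in allgroups */
    else source = "jan only";                       /* only in groupsJan19 */
run;

/* Count how many rows fall into each category. */
proc freq data=merge_check;
    tables source;
run;
```

The three counts add up to the number of observations in the merged dataset, which makes it easier to see how many cases matched and how many appear in only one of the two files.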
8. Get descriptive statistics for your combined dataset using Proc Means.
a) What is the sample size (n) for HR1 and HR2?
b) What is the sample size for HEIGHT?
 Include the output from this question in your homework write-up.
Combined dataset

The MEANS Procedure

Variable   Label     N          Mean       Std Dev       Minimum       Maximum
------------------------------------------------------------------------------
group      group   106     3.9245283     1.7711880     1.0000000     7.0000000
ID         ID      105     8.1142857     5.0807220             0    20.0000000
Ran        Ran      93     0.5161290     0.5024484             0     1.0000000
AgeYR      AgeYR    93    25.0645161     3.8948306    20.0000000    43.0000000
AgeMO      AgeMO    93     4.3494624     3.6488421             0    11.0000000
HR1        HR1      93    73.7526882    11.5378921    49.0000000   121.0000000
HR2        HR2      93    82.9462366    18.9750211    27.0000000   143.0000000
agemonths           93   305.1236559    47.1227384   240.0000000   521.0000000
height     height   84    66.5607143     4.1259396    58.0000000    79.0000000
weight     weight   84   146.0870238    32.5513640    75.0000000   279.5000000
------------------------------------------------------------------------------
The n for HR1 and HR2 is 93.
The n for Height is 84.
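The sample sizes differ because some cases are missing values for some variables after the merge. A short sketch that shows both the non-missing and missing counts side by side, using the N and NMISS statistics of Proc Means:

```sas
/* Show non-missing (N) and missing (NMISS) counts for the variables in question. */
proc means data=allgroups_combine n nmiss;
    var hr1 hr2 height;
run;
```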
9. Get Proc contents for your combined dataset, using Proc Contents.
 Include the output from Proc Contents in your homework write-up.
(partial output from Proc Contents is shown below)
Combined dataset

The CONTENTS Procedure

Data Set Name        WORK.ALLGROUPS_COMBINE                    Observations          106
Member Type          DATA                                      Variables             11
Engine               V9                                        Indexes               0
Created              Thursday, January 21, 2010 06:26:19 PM    Observation Length    88
Last Modified        Thursday, January 21, 2010 06:26:19 PM    Deleted Observations  0
Protection                                                     Compressed            NO
Data Set Type                                                  Sorted                NO
Label
Data Representation  WINDOWS_32
Encoding             wlatin1 Western (Windows)

Variables in Creation Order

 #  Variable   Type  Len  Format  Informat  Label
 1  group      Num     8                    group
 2  ID         Num     8                    ID
 3  Ran        Num     8                    Ran
 4  AgeYR      Num     8                    AgeYR
 5  AgeMO      Num     8                    AgeMO
 6  Sex        Char    1  $1.     $1.       Sex
 7  HR1        Num     8                    HR1
 8  HR2        Num     8                    HR2
 9  agemonths  Num     8
10  height     Num     8                    height
11  weight     Num     8                    weight
10. Save your final dataset as a permanent SAS dataset called b510.allgroups_combine.
 Include the portion of your SAS log that shows the successful creation of your dataset.
/*Question 10: Save the final dataset*/
data b510.allgroups_combine;
set allgroups_combine;
run;
NOTE: There were 106 observations read from the data set WORK.ALLGROUPS_COMBINE.
NOTE: The data set B510.ALLGROUPS_COMBINE has 106 observations and 11 variables.
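Beyond reading the log, the permanent copy can be checked against the temporary dataset with Proc Compare. This is an optional sketch, assuming both datasets still exist in the current SAS session; it is not required by the assignment.

```sas
/* Compare the temporary dataset with the permanent copy in the b510 library. */
proc compare base=allgroups_combine compare=b510.allgroups_combine;
run;
```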
You will be graded on the SAS commands, the output, and your answers to questions.
 Be sure to include all of your SAS commands as the first part of the homework.
 Include the requested SAS output as the second part of your homework.
 Make sure your output looks neat, and include careful page breaks.
 Try to keep your SAS output results to a minimum length by judiciously editing it!
 Include the answers to each question along with the output to which it pertains. Number
the output so we can clearly see which problem you're answering. Make sure your
answers are in complete sentences.
 Be sure you submit the SAS command file, fonts.sas, at the start of your homework
assignment so you will have nice-looking tables and other output. This command file can
be found on my web page. You only have to submit it once for each SAS session. You
will get points for using this command.
OPTIONS FORMCHAR="|----|+|---+=|-/\<>*";