homework3_Jan27_2006

Biostat 510
Homework 3
Due Thursday, February 2, 2006
1. Create a permanent SAS data set from the raw data file, afifi.dat, on my web page.
Model your SAS commands based on the description of the Afifi data set included with
this homework.
a) You should read in all variables, even though the example only shows reading in a
subset of the variables.
b) Assign labels to all variables, using a label statement, as shown in the handout.
c) Create new variables in the permanent data set, as shown in the handout, and
additional new variables.
i. SHOCK: Dummy Variable. 0=no shock, 1=shock (shown in handout)
ii. DIED: Dummy Variable. 0=lived, 1=died (shown in handout)
iii. SHOCK_DUM2 – SHOCK_DUM7: Dummy variables for each shock
type (shock types go from 2 through 7).
iv. SBPDIFF: The difference between SBP at time 2 and SBP at time 1.
Calculate by subtracting SBP1 from SBP2.
2. Create a Scatterplot with SBP2 as the Y-axis, and SBP1 as the X-axis. Include a linear
regression line in your scatterplot. You may create the scatterplot using Proc Gplot, or in
SAS/INSIGHT. Include your scatterplot in your homework.
3. Carry out a simple linear regression, with SBP2 as the dependent variable, and SBP1 as
the only predictor.
a) Get a plot of residuals vs. predicted values to check homogeneity of variance.
b) Get a histogram and normal Q-Q plot to check the normality of the residuals. Use the
studentized residuals for both of these plots.
c) Include your regression output and the diagnostic plots in your homework.
4. Get a box plot of SBP2 as the Y-axis and the levels of SHOCKTYPE as the X-axis.
Don’t forget to sort prior to running the box plot. Include the box plot in your homework.
5. Carry out a regression with the dummy variables for SHOCKTYPE as the predictors and
SBP2 as the dependent variable.
a) Use Non-shock as the reference category.
b) Get a plot of residuals vs. predicted values to check homogeneity of variance.
c) Get a histogram and normal Q-Q plot to check the normality of the residuals. Use the
studentized residuals for both of these plots.
d) Include your regression output and the diagnostic plots in your homework.
6. Create a box plot with SBP2 as the Y-axis and the SHOCK dummy variable as the
X-axis. Include this box plot in your homework.
7. Carry out a regression with SBP2 as the dependent variable and SHOCK (the dummy
variable) as the predictor. Include the output for this regression in your homework.
a) Create diagnostic plots for this regression, but you do not need to include them in
your homework.
8. DON’T DO THIS ONE. Create a Pearson correlation matrix with the variables SBP2,
SBP1, BSA1, CARDIAC1, HGB1, and MAP1.
a) Use listwise deletion for the variables.
b) Create a scatterplot matrix using SAS/INSIGHT.
c) Include the correlation matrix and the scatterplot matrix in your homework output.
9. DON’T DO THIS ONE. Carry out a multiple regression with SBP2 as the dependent
variable and the predictor variables, SBP1, BSA1, CARDIAC1, HGB1, and MAP1 as
predictors.
a) Check the collinearity diagnostics for this model.
b) Include the model output, along with the collinearity diagnostics in your homework.
c) Rerun the model, but remove MAP1 as a predictor.
d) Check collinearity for this new model.
10. Answer the following questions about your analysis.
a) (Scatterplot of SBP2 vs. SBP1) Does there appear to be a linear relationship between
these two variables? Describe the direction of the relationship, and how much
variability there is around the linear regression line.
b) (Simple linear regression) Please interpret the estimated intercept and the estimated
coefficient for SBP1 in this model.
i. Is there a significant linear relationship between SBP2 and SBP1? Write
out your response in words, and include the t-statistic, df, and p-value for
the test.
ii. What is the model R²? What is the sample size for this model?
iii. Describe the scatterplot of studentized residuals vs. predicted values. Do
the residuals appear to have constant variance for all predicted values?
iv. Describe the distribution of the residuals. Does the assumption of
normally distributed errors appear to be true for this model?
c) (Box plot of SBP2 for levels of SHOCKTYPE)
i. Describe the pattern of SBP2 for the levels of SHOCKTYPE.
d) (Regression with dummy variables for SHOCKTYPE)
i. Interpret the intercept and the coefficients for each level of SHOCKTYPE.
Which of the dummy variables for the levels of SHOCKTYPE are
significant?
ii. What is the model R²? What is the sample size for this model?
iii. Does the assumption of equal variances appear to hold true for the
different levels of SHOCKTYPE, based on your scatterplot of studentized
residuals vs. predicted values for this model?
iv. Does the assumption of normality of residuals appear to hold true for this
model?
e) (Box plot of SBP2 vs. SHOCK dummy).
i. What is the relationship between SHOCK and SBP2 as shown in this
boxplot?
f) (Regression with one dummy variable for SHOCK as the predictor)
i. Interpret the intercept and the coefficient for SHOCK in this regression
model.
ii. What is the model R²? What is the sample size for this model?
iii. Compare the model R² for this model to the one in which the dummy
variables for SHOCKTYPE were included as predictors. Which model
would you prefer to use (think of parsimony)?
g) DON’T ANSWER THIS QUESTION. (Pearson Correlation Matrix)
i. What variables are highly correlated with each other in this correlation
matrix?
ii. How many observations can be included in this correlation?
h) DON’T ANSWER THIS QUESTION. (Multiple Regression with Collinearity
Diagnostics)
i. What variables appear to be collinear in the initial regression model?
ii. What is the parameter estimate, standard error, and significance for each
predictor in the initial model?
iii. How do the collinearity diagnostics appear after deleting MAP1 as a
predictor?
iv. What is the parameter estimate, standard error, and significance for each
predictor in the final model?
v. Comment on the comparison between these two models.
The SAS commands will be worth 50 points, and answers to questions a) through f) will be
worth 50 points. Save your homework commands as Homework3.sas. Run all the commands at
once and be sure they all work without error.
Note: you should work on only questions 1 through 7, and answer questions a through f for this
homework.
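As a rough orientation for items 2 through 5, the procedure calls might look something like the sketch below. It is not the required solution. It assumes the permanent data set b510.afifi has already been created as in item 1, that the shock type variable is named SHOKTYPE as in the handout's data step, and that the dummy variables carry the names given in item 1(c)iii. The output data set names (regout1, regout2, afifi_sorted) and the variable names predict, resid, and stud are made up for illustration.

```sas
libname b510 "h:\b510\2006";   /* library path as in the handout */

/* Item 2: SBP2 (Y) vs. SBP1 (X); interpol=rl overlays a linear regression line */
symbol1 value=dot interpol=rl;
proc gplot data=b510.afifi;
  plot sbp2*sbp1;
run;
quit;

/* Item 3: simple linear regression; save predicted values and residuals */
proc reg data=b510.afifi;
  model sbp2 = sbp1;
  output out=regout1 p=predict r=resid student=stud;
run;
quit;

/* Item 3a: residuals vs. predicted values (turn the regression line off) */
symbol1 value=dot interpol=none;
proc gplot data=regout1;
  plot resid*predict / vref=0;
run;
quit;

/* Item 3b: histogram and normal Q-Q plot of the studentized residuals */
proc univariate data=regout1;
  var stud;
  histogram stud / normal;
  qqplot stud / normal(mu=est sigma=est);
run;

/* Item 4: sort by shock type, then draw side-by-side box plots */
proc sort data=b510.afifi out=afifi_sorted;
  by shoktype;
run;
proc boxplot data=afifi_sorted;
  plot sbp2*shoktype;
run;

/* Item 5: leaving out SHOCK_DUM2 makes non-shock (type 2) the reference */
proc reg data=b510.afifi;
  model sbp2 = shock_dum3 shock_dum4 shock_dum5 shock_dum6 shock_dum7;
  output out=regout2 p=predict r=resid student=stud;
run;
quit;
```

In the PROC REG OUTPUT statement, student= requests internally studentized residuals and rstudent= requests the deleted (externally studentized) version. Which one is intended here is an assumption, so follow the handout if it specifies one.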
Data Description For Afifi Data
Afifi and Azen (1972) describe data collected at the Los Angeles Shock Unit. For each patient,
data were taken on admission and either shortly before death or before discharge. The variables
and their formats are described in the table below. Variables 1-21 refer to data at the initial
examination and variables 22-42 refer to the same variables at the final examination.
Variables  Columns  Format  Name         Description
1,22       1-4      4.0     IDNUM        Id number
2,23       5-8      4.0     AGE          Age (years)
3,24       9-12     4.0     HEIGHT       Height (cm)
4,25       13-15    3.0     SEX          Sex (1=male, 2=female)
5,26       16       1.0     SURVIVE      Survival (1=lived, 3=died)
6,27       17-20    4.0     SHOCKTYPE    Shock type (2=non-shock, 3,4,5,6,7=shock)
7,28       21-24    4.0     SBP1/SBP2    Systolic Blood Pressure (mm Hg)
8,29       25-28    4.0     MAP1/MAP2    Mean Arterial Pressure (mm Hg)
9,30       29-32    4.0     HRT1/HRT2    Heartrate (beats per minute)
10,31      33-36    4.0     DBP1/DBP2    Diastolic blood pressure (mm Hg)
11,32      37-40    4.1     CVP1/CVP2    Mean central venous BP (mm Hg)
12,33      41-44    4.2     BSA1/BSA2    Body surface area (m sq)
13,34      45-48    4.2     CI1/CI2      Cardiac index (1/min/min squared)
14,35      49-52    4.1     APP1/APP2    Appearance time (sec)
15,36      53-56    4.1     CIRC1/CIRC2  Mean circulation time (sec)
16,37      57-60    4.0     UR1/UR2      Urinary Output (ml/hr)
17,38      61-64    4.1     PLAS1/PLAS2  Plasma volume index (ml/kg)
18,39      65-68    4.1     RC1/RC2      Red cell index (ml/kg)
19,40      69-72    4.1     HGB1/HGB2    Hemoglobin (gm)
20,41      73-76    4.1     HCT1/HCT2    Hematocrit (%)
21,42      80       1.0     TIME1/TIME2  Time (1=initial, 2=final)
A listing of the first 6 lines of the raw data file, Afifi.dat, is shown below:
 340  70 160  23   4  62  38  53  29 100 187  90 190 390   0 394 241 131 400   1
 340  70 160  23   4 129  74  72  53 190 187 120 130 300  15 394 241 112 365   2
 412  56 173  11   4  83  66 110  60  10 182 126 221 407 110 362 240 166 500   1
 412  56 173  11   4 102  75 108  63  90 182 281 100 206  50 564 266 154 330   2
 426  47 176  11   4  80  64  84  55  10 180 110 120 280  80 373 272 146 490   1
 426  47 176  11   4  87  68  77  52  40 180 410 100 170  75 508 217  99 320   2
libname b510 "h:\b510\2006";

DATA b510.AFIFI;
  INFILE 'C:\TEMP\LABDATA\AFIFI.DAT';
  /* Each patient has two records: #1 = initial exam, #2 = final exam. */
  /* The .1 after a column range reads the value with one implied decimal. */
  INPUT
    #1 IDNUM 1-4 AGE 5-8 SEX 13-15 SURVIVE 16 SHOKTYPE 17-20 SBP1 21-24
       HGB1 69-72 .1
    #2 SBP2 21-24 HGB2 69-72 .1;
  LABEL SHOCK='Shock type'
        SBP1='Systolic BP at time 1'
        SBP2='Systolic BP at time 2'
        HGB1='Hemoglobin at time 1'
        HGB2='Hemoglobin at time 2';
  /* Dummy variables: SHOCK (0=no shock, 1=shock), DIED (0=lived, 1=died) */
  IF SHOKTYPE=2 THEN SHOCK=0;
  IF SHOKTYPE IN (3,4,5,6,7) THEN SHOCK=1;
  IF SURVIVE=1 THEN DIED=0;
  IF SURVIVE=3 THEN DIED=1;
RUN;
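
The handout example stops at SHOCK and DIED. The remaining variables requested in item 1(c) could be added along the following lines. This is only a sketch: it assumes b510.afifi already exists with SHOKTYPE, SBP1, and SBP2 as read above, the array name dum and the index k are arbitrary, and the same statements could instead be placed directly in the original data step before RUN.

```sas
libname b510 "h:\b510\2006";   /* library path as in the handout */

data b510.afifi;
  set b510.afifi;

  /* One dummy per shock type; array subscripts 2..7 match the type codes */
  array dum{2:7} shock_dum2-shock_dum7;
  do k = 2 to 7;
    /* (shoktype = k) evaluates to 1 if true, 0 if false;   */
    /* the dummies stay missing when SHOKTYPE is missing    */
    if shoktype ne . then dum{k} = (shoktype = k);
  end;
  drop k;

  /* Change in systolic BP: time 2 minus time 1 */
  sbpdiff = sbp2 - sbp1;
  label sbpdiff = 'SBP at time 2 minus SBP at time 1';
run;
```

Guarding on a non-missing SHOKTYPE matters because a bare comparison such as (shoktype = 3) would return 0 rather than missing for a patient whose shock type is unknown.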