# regression exercise solution/template ```SOLUTIONS
PROBLEM
Individual chicks were depleted of their vitamin K reserves and then fed dried liver for
3 days at dosage levels ranging from 1.6 to 14.8 mg per gram of chick per day. At the
end of this period, the concentration of clotting agent that would clot samples of its
blood in 3 minutes was measured for each chick.
The data for this experiment can be found in the file clotting.dat. The first column in the
file contains the dose and the second column contains the concentration of clotting agent.
(a) Find an adequate model relating concentration of clotting agent (= dependent variable
= y) to dosage of dried liver (= independent variable = x). Use everything at your disposal
for checking the adequacy of the model (i.e., residual plots, outlier and influence
diagnostics, tests of lack-of-fit, etc.).
(b) Given the model you chose in (a), with 95% confidence, predict the average
concentration of clotting agent for a dosage level of 10.0 mg.
SOLUTIONS
(a) From a plot of concentration of clotting agent versus dosage of dried liver it appeared
that the relationship is non-linear. Furthermore, given that the raw data plot resembled
the plot in Figure 14 (d), we attempted square-root and log10 transformations on the
concentration and dosage. After several attempts, it appeared that a linear relationship
was achieved with the transformation
y’ = log10(y)
and
x’ = log10(x)
where
y = concentration of clotting agent
and
x = dosage of dried liver
Consequently, a straight-line model was fit to the transformed data.
The ANOVA F-test was used to determine whether or not there was a significant
linear relationship between log concentration and log dosage. The test statistic was
F = 421.099
with observed significance level
P < 0.001
Consequently, assuming that all assumptions are met, at α = 0.05, we concluded that there
was a significant linear relationship between log concentration and log dosage.
From a plot of the studentized deleted residuals versus dosage of dried liver there
appeared to be no systematic patterns, indicating that a linear model was adequate.
Furthermore, none of the residuals was greater than 2 in absolute value. Hence, there
appeared to be no outliers.
Critical values for Cook’s D, leverage and Dfits are:
                  Cook’s D   Leverage   Dfits
critical values   0.2667     0.2667     0.73030
Chick #2 has two of the statistics (Cook’s Distance and Dfits) above the critical values, so
it may be an influential point. We ran the regression again with chick #2 deleted; the
results were not significantly different, so we left chick #2 in the dataset.
In summary, we had no outliers or (actually) influential observations and an adequately
fitting straight-line model relating log10(concentration) to log10(dosage). Hence, the final
fitted model was
yˆ3.010 1.892x
(b) Since we needed to predict the average concentration of clotting agent for a dosage
level of 10 mg, a confidence interval for the mean was required. Furthermore, since 10
mg was not a dosage value contained in the original data set, we appended an extra
observation with a dosage value of 10 and no value for concentration.
From the output, in terms of log10 concentration, the point estimate was
yˆ1.11837
Hence, in terms of concentration, yˆ 10**1.11837 13.1
Furthermore, in terms of log10 concentration, the 95% confidence interval for the mean
was (1.0422, 1.1945).
Hence, in terms of concentration, the 95% confidence interval for the mean was
(10^1.0422, 10^1.1945)=(11.02, 15.65)
Consequently, with 95% confidence, we predict the average concentration of clotting
agent for a dosage of 10 mg to be between 11.02 and 15.65.
/* Read dose (column 1) and clotting-agent concentration (column 2). */
data clot;
infile 'C:\Documents and Settings\rcarta\Desktop\clotting.dat';
input dose conc;
run;
proc print data=clot; run;
data clot2;
set clot;
y=log10(conc);
x=log10(dose);
drop dose conc;
run;
proc print data=clot2; run;
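/* Sketch only, not part of the original program: plot the transformed data
   with a least-squares line to check visually that log10(conc) versus
   log10(dose) looks linear. Assumes SAS 9.2 or later, where PROC SGPLOT
   is available. */
proc sgplot data=clot2;
reg x=x y=y;
run;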
proc reg data=clot2;
model y = x / p r influence ;
run; quit;
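/* Sketch only, not part of the original program: list chicks that exceed
   the diagnostic cutoffs. Assumption: the critical values in the write-up
   follow the common rules of thumb 2p/n (leverage), 4/n (Cook's D) and
   2*sqrt(p/n) (Dfits) with p = 2 parameters and n = 15 chicks; n = 15 is
   inferred from 0.2667 = 4/n, not stated in the write-up. The data set
   names diag and flagged and the new variable names are illustrative. */
proc reg data=clot2 noprint;
model y = x;
output out=diag rstudent=rstud h=lev cookd=cd dffits=dff;
run; quit;
data flagged;
set diag;
n = 15; p = 2;
/* keep a row if any diagnostic exceeds its cutoff */
if abs(rstud) > 2 or lev > 2*p/n or cd > 4/n or abs(dff) > 2*sqrt(p/n);
run;
proc print data=flagged; run;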
/*************** check influential point */
/* Keep only rows whose x lies outside [0.3424, 0.35], which removes chick #2. */
data clot22;
set clot2;
if x<0.3424 or x>0.35;
run;
proc print data=clot22; run;
proc reg data=clot22;
model y = x / p r influence ;
run; quit;
/************** end check influential point */
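/* One extra row with x = log10(10) and a missing response, so that PROC REG
   predicts at a dosage of 10 without using the row in the fit. */
data oneob;
y=.;
x=log10(10);
run;
proc print data=oneob; run;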
data both;
set clot2 oneob;
run;
proc print data=both; run;
proc reg data=both;
model y = x / clm cli;
run; quit;
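/* Sketch only, not part of the original program: back-transform the
   log10-scale point estimate and 95% confidence limits for the mean to the
   concentration scale, as done by hand in part (b). The data set names
   pred and pred10 and the output variable names are illustrative. */
proc reg data=both noprint;
model y = x;
output out=pred p=yhat_log lclm=lower_log uclm=upper_log;
run; quit;
data pred10;
set pred;
if y = .;                 /* keep only the appended dosage = 10 row */
yhat  = 10**yhat_log;     /* point estimate on the concentration scale */
lower = 10**lower_log;    /* lower 95% limit for the mean */
upper = 10**upper_log;    /* upper 95% limit for the mean */
run;
proc print data=pred10; run;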
```