Stat 401B Exam 2 Fall 2015



I have neither given nor received unauthorized assistance on this exam.

________________________________________________________

Name Signed Date

_________________________________________________________

Name Printed

ATTENTION!

Incorrect numerical answers unaccompanied by supporting reasoning will receive NO partial credit.

Correct numerical answers to difficult questions unaccompanied by supporting reasoning may not receive full credit.

SHOW YOUR WORK/EXPLAIN YOURSELF!

Completely absurd answers (that fail basic sanity checks but that you don't identify as clearly incorrect) may receive negative credit.

Point values for Problem 1: a) 6 pts, b) 5 pts, c) 5 pts

1. Below are some data (and corresponding summary statistics) taken from a paper by Leigh and Taylor that appeared in the Ceramic Bulletin in 1990. They concern measured densities (in g/cc) of crushed T-61 tabular alumina powder under r = 4 different measurement protocols.

Protocol      Measured densities (g/cc)       n_i   ȳ_i     s_i
Protocol 1    2.13, 2.15, 2.15, 2.19, 2.20    5     2.164   .030
Protocol 2    1.96, 2.01, 1.91, 1.95, 2.00    5     1.966   .040
Protocol 3    2.23, 2.19, 2.18, 2.21, 2.22    5     2.206   .021
Protocol 4    1.88, 1.90, 1.87, 1.89, 1.89    5     1.886   .011

Initially, consider only data from Protocol 1.

a) Give two-sided limits that you are 95% sure would contain the next measured density produced under Protocol 1. (Plug in completely, but you need not simplify.)

b) A lab manager wishes to announce with 95% confidence that the "measurement capability" (defined as 2σ) for Protocol 1 is no worse than some number (say, #). Provide an appropriate number, #, for this person based on the data above.

Now consider data from Protocols 1 and 4 only.

c) Do the two samples provide definitive indication that the two measurement protocols have different precisions (different associated variabilities)? Compute some appropriate statistic and use an appropriate reference distribution. (Say exactly what reference distribution you are considering and support a "Yes" or a "No" answer.)
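One possible way to carry out the arithmetic behind parts a) through c) is sketched below in R. It assumes normal-theory methods (a t-based prediction interval, a chi-square bound for a standard deviation, and an F ratio of sample variances) and reads "measurement capability" as 2σ; the variable names are illustrative, and this is not presented as the exam's official solution.

```r
p1 <- c(2.13, 2.15, 2.15, 2.19, 2.20)  # Protocol 1 densities (g/cc)
p4 <- c(1.88, 1.90, 1.87, 1.89, 1.89)  # Protocol 4 densities (g/cc)
n1 <- length(p1)

# a) 95% two-sided prediction limits for one additional Protocol 1 measurement
pred_limits <- mean(p1) + c(-1, 1) * qt(0.975, n1 - 1) * sd(p1) * sqrt(1 + 1 / n1)

# b) 95% upper confidence bound for 2 * sigma
#    (assumption: "measurement capability" means 2 * sigma)
cap_upper <- 2 * sd(p1) * sqrt((n1 - 1) / qchisq(0.05, n1 - 1))

# c) Ratio of sample variances, referred to an F distribution with 4 and 4 df
f_ratio <- var(p1) / var(p4)
f_crit  <- qf(0.975, n1 - 1, length(p4) - 1)  # upper cut-off for a two-sided 5% test
f_test  <- var.test(p1, p4)                   # same ratio, with a two-sided p-value

print(pred_limits)
print(cap_upper)
print(c(f_ratio = f_ratio, f_crit = f_crit, p_value = f_test$p.value))
```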

Point values: d) 6 pts, e) 7 pts, f) 4 pts

d) Give 95% two-sided confidence limits for the difference in mean densities produced by Protocols 1 and 4. (Plug in completely, but you need not simplify.)

Now consider data from all 4 protocols.

e) Find a single-number estimate of the standard deviation of measured density for any fixed protocol under the one-way normal model.

f) Below is a normal plot of the 20 values y_ij - ȳ_i. Say what it indicates about the reliability of inferences based on the one-way normal model in this context. (The line on the plot has intercept 0 and slope 1/s_P.)
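Parts e) and f) rest on the pooled standard deviation s_P and the residuals y_ij - ȳ_i. A minimal R sketch of both follows, assuming the densities are keyed in by hand; the object names are illustrative, and the plot is drawn only in an interactive session.

```r
# Densities (g/cc) keyed in from the table in Problem 1, five per protocol
density  <- c(2.13, 2.15, 2.15, 2.19, 2.20,
              1.96, 2.01, 1.91, 1.95, 2.00,
              2.23, 2.19, 2.18, 2.21, 2.22,
              1.88, 1.90, 1.87, 1.89, 1.89)
protocol <- factor(rep(1:4, each = 5))

# Pooled variance: average of the sample variances weighted by n_i - 1
s2_i <- tapply(density, protocol, var)
n_i  <- tapply(density, protocol, length)
s_p  <- sqrt(sum((n_i - 1) * s2_i) / sum(n_i - 1))

# Residuals y_ij - ybar_i (each value minus its own protocol mean)
res <- density - ave(density, protocol)

# Normal plot with residuals on the horizontal axis, so that the
# reference line has intercept 0 and slope 1 / s_p
if (interactive()) {
  qqnorm(res, datax = TRUE)
  abline(0, 1 / s_p)
}
print(s_p)
```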

Point values: g) 8 pts, h) 6 pts, i) 6 pts

g) As it turns out, the grand sample variance of the 20 measured densities recorded above is 0.01937342. Use this fact and your answer to part e) above to complete the ANOVA table below. (If you were unable to do part e), you may use the incorrect value of .020 here.)

SOURCE        SS        df        MS        F

h) Protocols 1 and 3 were in fact carried out using 6-mesh material while Protocols 2 and 4 were carried out using 60-mesh material. Compare the average of 6-mesh mean densities to the average of 60-mesh mean densities using two-sided 95% confidence limits and your value of s_P from the ANOVA table in part g). (Plug in completely, but you need not simplify.)

i) Suppose that at some later time, 30 of 50 measurements made using Protocol 2 produce values less than 2.00 g/cc. Give a lower 95% confidence bound for the fraction of all Protocol 2 measurements less than 2.00 g/cc. (Plug in completely, but you need not simplify.)
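The ANOVA decomposition in part g) and the contrast in part h) can be sketched in R as follows. This is an illustrative sketch under the one-way normal model, with the data keyed in by hand and with hypothetical object names; the contrast weights of +1/2 for Protocols 1 and 3 and -1/2 for Protocols 2 and 4 encode "average of 6-mesh means minus average of 60-mesh means".

```r
density  <- c(2.13, 2.15, 2.15, 2.19, 2.20,
              1.96, 2.01, 1.91, 1.95, 2.00,
              2.23, 2.19, 2.18, 2.21, 2.22,
              1.88, 1.90, 1.87, 1.89, 1.89)
protocol <- factor(rep(1:4, each = 5))

# g) One-way ANOVA table; the total SS equals (N - 1) times the grand sample variance
fit       <- lm(density ~ protocol)
aov_table <- anova(fit)
ss_total  <- (length(density) - 1) * var(density)

# h) Contrast: average of 6-mesh means (Protocols 1, 3) minus 60-mesh means (Protocols 2, 4)
ybar   <- tapply(density, protocol, mean)
coefs  <- c(0.5, -0.5, 0.5, -0.5)
s_p    <- sqrt(aov_table["Residuals", "Mean Sq"])
est    <- sum(coefs * ybar)
se     <- s_p * sqrt(sum(coefs^2 / 5))          # n_i = 5 for every protocol
limits <- est + c(-1, 1) * qt(0.975, df.residual(fit)) * se

print(aov_table)
print(limits)
```

The residual degrees of freedom (16) supply the t quantile because s_P pools information from all four protocols.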

Point values: a) 5 pts, b) 5 pts, c) 5 pts, d) 5 pts

2. A data set in Probability and Statistics With R for Engineers & Scientists by M. Akritas concerns heat produced during hardening of cement as related to the composition of the cement. Available were

y = measured heat produced (calories/gm)
x_1 = % tricalcium aluminate
x_2 = % tricalcium silicate
x_3 = % tetracalcium alumino ferrite
x_4 = % dicalcium silicate

values for n = 13 cement batches. There is some R code and output based on these data at the end of this exam. Use it as appropriate in the rest of the exam.

a) On what basis would you suggest that x_4 is the best single predictor of y (from among the predictors available here)? (What about the printout suggests this?)

Consider first a simple linear regression of y on x_2 until further notice.

b) Give 95% two-sided confidence limits for the standard deviation of measured heat produced at a fixed tricalcium silicate percentage. (Plug in completely, but there is no need to simplify.)

c) Give 95% two-sided confidence limits for the rate of change of mean heat produced with respect to % tricalcium silicate (in the units of the data). (Plug in completely, but there is no need to simplify.)

d) For what percentage of tricalcium silicate do these data provide the best information about mean heat produced? Explain.

Point values: e) 5 pts, f) 7 pts, g) 5 pts, h) 5 pts, i) 5 pts

Now consider the other predictor variables (not just x_2).

e) In a model that includes only predictors x_1 and x_2, give 95% two-sided limits for the rate of change of mean heat measurement (in cal/g) with respect to % tricalcium silicate.

f) Under the model that includes only predictors x_1 and x_2, give limits that you are 95% sure will contain a next heat measurement under the conditions that x_1 = 7 and x_2 = 26.

g) What fraction of the raw variability in heat produced is accounted for by fitting an equation involving all of x_1, x_2, and x_3?

h) Give and interpret the p-value for testing the hypothesis that together the three predictor variables x_1, x_2, and x_3 fail to be useful in modeling heat produced.

i) In the presence of x_1 and x_2, does x_3 add (statistically) significantly to one's ability to model heat produced? Give a p-value and say what hypothesis is being tested in what model.


R Code and Output

> CementVS

       y x1 x2 x3 x4
1   78.5  7 26  6 60
2   74.3  1 29 15 52
3  104.3 11 56  8 20
4   87.6 11 31  8 47
5   95.9  7 52  6 33
6  109.2 11 55  9 22
7  102.7  3 71 17  6
8   72.5  1 31 22 44
9   93.1  2 54 18 22
10 115.9 21 47  4 26
11  83.8  1 40 23 34
12 113.3 11 66  9 12
13 109.4 10 68  8 12
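The transcript never shows how `CementVS` was created. To reproduce the output, one option is to key the 13 rows in by hand, as in the sketch below (an assumption about data entry, not the instructor's code). The last two lines pick out the predictor with the largest absolute correlation with y, which is one way to read the correlation matrix for part a); `r_with_y` and `best` are illustrative names.

```r
# Cement data keyed in from the listing above (13 batches)
CementVS <- data.frame(
  y  = c(78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7,
         72.5, 93.1, 115.9, 83.8, 113.3, 109.4),
  x1 = c(7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10),
  x2 = c(26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68),
  x3 = c(6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8),
  x4 = c(60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12)
)

# Correlation of each predictor with y, and the predictor with the largest |r|
r_with_y <- cor(CementVS)["y", -1]
best     <- names(which.max(abs(r_with_y)))
print(r_with_y)
print(best)
```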

> cor(CementVS)

            y         x1         x2         x3         x4
y   1.0000000  0.7307175  0.8162526 -0.5346707 -0.8213050
x1  0.7307175  1.0000000  0.2285795 -0.8241338 -0.2454451
x2  0.8162526  0.2285795  1.0000000 -0.1392424 -0.9729550
x3 -0.5346707 -0.8241338 -0.1392424  1.0000000  0.0295370
x4 -0.8213050 -0.2454451 -0.9729550  0.0295370  1.0000000

> plot(CementVS)


> cement.out1<-lm(y~x2,data = CementVS)

> summary(cement.out1)

Call: lm(formula = y ~ x2, data = CementVS)

Residuals:

Min 1Q Median 3Q Max

-10.752 -6.008 -1.684 3.794 21.387

Coefficients:

            Estimate Std. Error t value Pr(>|t|)
(Intercept)  57.4237     8.4906   6.763  3.1e-05 ***
x2            0.7891     0.1684   4.686 0.000665 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.077 on 11 degrees of freedom

Multiple R-squared: 0.6663, Adjusted R-squared: 0.6359

F-statistic: 21.96 on 1 and 11 DF, p-value: 0.0006648

> anova(cement.out1)

Analysis of Variance Table

Response: y

          Df  Sum Sq Mean Sq F value    Pr(>F)
x2         1 1809.43 1809.43  21.961 0.0006648 ***
Residuals 11  906.34   82.39

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> predict(cement.out1,se.fit=TRUE,interval="confidence",level=.95)

$fit

fit lwr upr

1 77.94093 68.03527 87.84659

2 80.30830 71.30279 89.31381

3 101.61467 95.35687 107.87247

4 81.88655 73.45303 90.32007

5 98.45817 92.73668 104.17967

6 100.82555 94.73114 106.91996

7 113.45154 103.33218 123.57091

8 81.88655 73.45303 90.32007

9 100.03642 94.08677 105.98607

10 94.51255 88.95500 100.07010

11 88.98867 82.67707 95.30028

12 109.50592 100.87732 118.13452

13 111.08417 101.87504 120.29330

$se.fit

[1] 4.500556 4.091580 2.843181 3.831701 2.599516 2.768946 4.597653 3.831701

[9] 2.703176 2.525028 2.867626 3.920335 4.184094

$df

[1] 11

$residual.scale

[1] 9.077126

>
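The simple linear regression output above contains what is needed for parts b) through d) of Problem 2. The sketch below shows one way those quantities could be computed in R, assuming the data are keyed in by hand and using illustrative names (`cement`, `fit1`, `sigma_limits`, `slope_limits`, `x2_best`); it is not the instructor's solution.

```r
cement <- data.frame(
  y  = c(78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7,
         72.5, 93.1, 115.9, 83.8, 113.3, 109.4),
  x2 = c(26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68)
)
fit1 <- lm(y ~ x2, data = cement)

# b) 95% chi-square limits for sigma, built from the residual standard error
s  <- summary(fit1)$sigma
df <- df.residual(fit1)
sigma_limits <- s * sqrt(df / qchisq(c(0.975, 0.025), df))

# c) 95% t limits for the slope on x2
slope_limits <- confint(fit1, "x2", level = 0.95)

# d) The standard error of the fitted mean is smallest near mean(x2);
#    among the observed batches, find the x2 value where it is smallest
se_fit  <- predict(fit1, se.fit = TRUE)$se.fit
x2_best <- cement$x2[which.min(se_fit)]

print(sigma_limits)
print(slope_limits)
print(c(mean_x2 = mean(cement$x2), x2_best = x2_best))
```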


> cement.out2<-lm(y~x1+x2,data = CementVS)

> summary(cement.out2)

Call: lm(formula = y ~ x1 + x2, data = CementVS)

Residuals:

Min 1Q Median 3Q Max

-2.893 -1.574 -1.302 1.363 4.048

Coefficients:

            Estimate Std. Error t value Pr(>|t|)
(Intercept) 52.57735    2.28617   23.00 5.46e-10 ***
x1           1.46831    0.12130   12.11 2.69e-07 ***
x2           0.66225    0.04585   14.44 5.03e-08 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.406 on 10 degrees of freedom

Multiple R-squared: 0.9787, Adjusted R-squared: 0.9744

F-statistic: 229.5 on 2 and 10 DF, p-value: 4.407e-09

> anova(cement.out2)

Analysis of Variance Table

Response: y

          Df Sum Sq Mean Sq F value    Pr(>F)
x1         1 1450.1 1450.08  250.43 2.088e-08 ***
x2         1 1207.8 1207.78  208.58 5.029e-08 ***
Residuals 10   57.9    5.79

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> predict(cement.out2,se.fit=TRUE,interval="confidence",level=.95)

$fit

fit lwr upr

1 80.07400 77.38679 82.76122

2 73.25092 70.50710 75.99473

3 105.81474 103.96593 107.66355

4 89.25848 86.61956 91.89740

5 97.29251 95.74212 98.84291

6 105.15249 103.33331 106.97167

7 104.00205 100.77704 107.22706

8 74.57542 71.94224 77.20860

9 91.27549 89.00610 93.54487

10 114.53754 110.56117 118.51391

11 80.53567 78.23565 82.83570

12 112.43724 110.05956 114.81493

13 112.29344 109.81199 114.77489

$se.fit

[1] 1.2060356 1.2314382 0.8297558 1.1843598 0.6958245 0.8164554 1.4473996

[8] 1.1817850 1.0185111 1.7846157 1.0322647 1.0671170 1.1136881

$df

[1] 10

$residual.scale

[1] 2.406335

>
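The `predict()` call above reports confidence limits for mean responses, whereas part f) of Problem 2 asks for limits on a single new measurement. A minimal R sketch of the difference follows, with the data keyed in by hand and illustrative names (`cement`, `fit2`, `new_batch`, `by_hand`). The hand computation combines the residual standard error and the standard error of the fit in the usual way for a prediction interval.

```r
cement <- data.frame(
  y  = c(78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7,
         72.5, 93.1, 115.9, 83.8, 113.3, 109.4),
  x1 = c(7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10),
  x2 = c(26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68)
)
fit2 <- lm(y ~ x1 + x2, data = cement)

# e) 95% limits for the x2 coefficient in the two-predictor model
slope_limits <- confint(fit2, "x2", level = 0.95)

# f) 95% prediction limits for one new batch with x1 = 7 and x2 = 26
new_batch <- data.frame(x1 = 7, x2 = 26)
pred <- predict(fit2, newdata = new_batch, interval = "prediction", level = 0.95)

# The same limits by hand: fit +/- t * sqrt(s^2 + se.fit^2)
ci      <- predict(fit2, newdata = new_batch, se.fit = TRUE)
by_hand <- ci$fit + c(-1, 1) * qt(0.975, ci$df) *
  sqrt(ci$residual.scale^2 + ci$se.fit^2)

print(slope_limits)
print(pred)
```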


> cement.out3<-lm(y~x1+x2+x3,data = CementVS)

> summary(cement.out3)

Call: lm(formula = y ~ x1 + x2 + x3, data = CementVS)

Residuals:

Min 1Q Median 3Q Max

-3.2543 -1.4726 0.1755 1.5409 3.9711

Coefficients:

            Estimate Std. Error t value Pr(>|t|)
(Intercept) 48.19363    3.91330  12.315 6.17e-07 ***
x1           1.69589    0.20458   8.290 1.66e-05 ***
x2           0.65691    0.04423  14.851 1.23e-07 ***
x3           0.25002    0.18471   1.354    0.209

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.312 on 9 degrees of freedom

Multiple R-squared: 0.9823, Adjusted R-squared: 0.9764

F-statistic: 166.3 on 3 and 9 DF, p-value: 3.367e-08

> anova(cement.out3)

Analysis of Variance Table

Response: y

          Df  Sum Sq Mean Sq  F value    Pr(>F)
x1         1 1450.08 1450.08 271.2642 4.996e-08 ***
x2         1 1207.78 1207.78 225.9385 1.108e-07 ***
x3         1    9.79    9.79   1.8321    0.2089
Residuals  9   48.11    5.35

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
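Parts g) through i) of Problem 2 can be read off the three-predictor output, and the same numbers can be extracted programmatically. The sketch below is one way to do so in R, with the data keyed in by hand and illustrative object names. The comparison of nested models with `anova(fit2, fit3)` is a partial F test for x3 given x1 and x2, and its p-value agrees with the t test for x3 in the summary.

```r
cement <- data.frame(
  y  = c(78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7,
         72.5, 93.1, 115.9, 83.8, 113.3, 109.4),
  x1 = c(7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10),
  x2 = c(26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68),
  x3 = c(6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8)
)
fit2 <- lm(y ~ x1 + x2, data = cement)
fit3 <- lm(y ~ x1 + x2 + x3, data = cement)

# g) Fraction of raw variability in y accounted for by x1, x2, x3
r2 <- summary(fit3)$r.squared

# h) Overall F test of H0: all three slope coefficients are 0
fs        <- summary(fit3)$fstatistic
p_overall <- pf(fs["value"], fs["numdf"], fs["dendf"], lower.tail = FALSE)

# i) Partial F test of H0: coefficient on x3 is 0, in the model with x1, x2, x3
partial <- anova(fit2, fit3)
p_x3    <- partial[2, "Pr(>F)"]

print(c(r2 = r2, p_overall = unname(p_overall), p_x3 = p_x3))
```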
