Final Exam Review

advertisement
A Most Wonderful
Practice Final
Happiness is taking a final exam in
statistics.
Our Last Homework
Assignment
Statistic              Chap12    HW Avg
mean                   90.6      85.3
std dev                15.3      10.8
median                 91        88.6
skewness               -3.81     -2.55
minimum                10        45.23
maximum                100       99
1st quartile           87.63     83.02
3rd quartile           100       90.25
interquartile range    12.38     7.23
range                  90        53.77
kurtosis               19.01     7.64
Problem 1 – point estimation
The following random sample was obtained by measuring the time
in (working) hours to complete a particular construction job.
Treating the data as continuous, answer the following questions:
(a) Find an unbiased estimate for the population mean
(b) Find an unbiased estimate for the population variance.
83.8
64.2
89.8
82.3
90.4
63.3
40.4
104
108
96.6
65.4
98.1
46.8
86.9
71.8
56.2
72.1
73.7
77.2
113
135
56.4
99.8
64.6
95.7
85.3
75.8
88.5
71.7
72
99.7
49.1
98.9
85.2
110
68.4
123
58.6
74.1
67.9
66.5
44.5
55.6
136
91.7
30.9
76.7
36.8
48
71.5
Problem 1a,b
Mean                  78.42
Standard Error        3.37549
Median                74.95
Mode                  #N/A
Standard Deviation    23.8683
Sample Variance       569.696
Kurtosis              -0.0294
Skewness              0.3234
Range                 104.8
Minimum               30.9
Maximum               135.7
Sum                   3921
Count                 50
Excel – descriptive statistics
(c) The gamma distribution:
$f(t) = \dfrac{t^{\alpha-1} e^{-t/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}$ for $\alpha, \beta > 0$ and $t \ge 0$
More problem 1
(c) The mean and variance of the gamma distribution
are $\alpha\beta$ and $\alpha\beta^2$ respectively, where $\alpha$ and $\beta$ are the
parameters of the distribution. Assuming the gamma
distribution is also a reasonable fit to the above
sample data, find the method of moments estimators
for $\alpha$ and $\beta$. Recall $\sigma^2 = E(X^2) - \mu^2$.
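$m_1 = \sum_{i=1}^{n} x_i / n = 78.42$
$m_2 = \sum_{i=1}^{n} x_i^2 / n = 6708$
$m_1 = \alpha\beta$ ; $m_2 = \alpha\beta^2 + \alpha^2\beta^2$
$\hat{\alpha} = \dfrac{m_1^2}{m_2 - m_1^2} = \dfrac{78.42^2}{6708 - 78.42^2} = 11.015$
$\hat{\beta} = \dfrac{m_2 - m_1^2}{m_1} = \dfrac{6708 - 78.42^2}{78.42} = 7.119$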
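The moment-matching step can be checked numerically. The sketch below assumes only the two sample moments quoted on the slide (m1 = 78.42, m2 = 6708); the function name gamma_mom is illustrative, not part of the course material.

```python
def gamma_mom(m1, m2):
    """Method-of-moments estimates for a gamma distribution
    with mean alpha*beta and variance alpha*beta**2."""
    var = m2 - m1 ** 2      # sigma^2 = E(X^2) - mu^2
    alpha = m1 ** 2 / var   # shape estimate
    beta = var / m1         # scale estimate
    return alpha, beta


# Sample moments taken from the slide.
alpha_hat, beta_hat = gamma_mom(78.42, 6708.0)
print(round(alpha_hat, 3), round(beta_hat, 3))
```

The product alpha_hat * beta_hat returns the sample mean, which is a quick sanity check on the algebra.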
Problem 2 – The Ebeling Distribution

The Ebeling distribution, a statistical marvel, is a one-parameter
probability distribution having the following PDF and CDF where
E(X) = θ:
$f(x) = \dfrac{2\theta^2}{(x+\theta)^3}$ ; $x \ge 0,\ \theta > 0$
$F(x) = 1 - \dfrac{\theta^2}{(x+\theta)^2}$
54.3
28.8
20.5 365.7
1.6
33.9
40.1
18.7
21.5
10.1
41.1
11.5
160.8
0.3
0.2
34.7
239.0
10.7
17.8 190.7
4.5
14.4
4.3
3.0
15.2
46.4
46.5
59.9
0.4
78.0
4.6
123.9
2.0
18.0
55.0
7.1
19.9
99.4 171.6
0.3
25.0
37.2
14.8
14.0
1.0
80.6
2.2
6.6
7.6
30.4
Data represents the time between arrivals of patients at an urgent care center in minutes.
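Finding the MoM
$E(X) = \int_0^{\infty} \dfrac{2\theta^2 x}{(x+\theta)^3}\,dx = 2\theta^2 \left[ \dfrac{-1}{x+\theta} + \dfrac{\theta}{2(x+\theta)^2} \right]_0^{\infty}$
$= 2\theta^2 \left[ \dfrac{-2(x+\theta) + \theta}{2(x+\theta)^2} \right]_0^{\infty} = 2\theta^2 \left[ 0 - \dfrac{-2\theta + \theta}{2\theta^2} \right] = \theta$
$\hat{\theta} = \bar{X} = 45.9$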
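Finding the MLE
$L(\theta) = \prod_{i=1}^{n} f(x_i) = \prod_{i=1}^{n} \dfrac{2\theta^2}{(x_i+\theta)^3} = 2^n \theta^{2n} \prod_{i=1}^{n} (x_i+\theta)^{-3}$
$\ln L(\theta) = n \ln 2 + 2n \ln\theta - 3 \sum_{i=1}^{n} \ln(x_i + \theta)$
$\dfrac{d \ln L(\theta)}{d\theta} = \dfrac{2n}{\theta} - 3 \sum_{i=1}^{n} \dfrac{1}{x_i + \theta} = 0$
$\dfrac{2n}{\theta} = 3 \sum_{i=1}^{n} \dfrac{1}{x_i + \theta}$
$\hat{\theta} = 48.037$
$P(X < 50) = F(50) = 1 - \dfrac{48.037^2}{(50 + 48.037)^2} = .7599$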
Problem 3
2. Based upon the sample data in Problem 1:
(a) Find a 98 percent confidence interval for the population mean.
(b) Find a 98 percent confidence interval for the population
standard deviation.
(c) Management has hypothesized that the (population) mean time
to complete this task should be 68 hours. Based upon the
confidence interval in (a), can management’s assertion be
supported or not? Yes or No.
(d) Management also believes that the standard deviation in task
times should be no greater than 28 hours. Can that assertion
be supported by the confidence interval in part (b)? Yes or No.
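Problem 3
98% confidence interval for the mean:
$\bar{x} \pm t_{\alpha/2,n-1}\, s/\sqrt{n} = 78.42 \pm 2.4049 \left( \dfrac{23.868}{\sqrt{50}} \right) = (70.3, 86.54)$
H0: $\mu$ = 68 hrs. The value 68 lies below the interval, so management's assertion is not supported (No).
98% confidence interval for the standard deviation:
$\dfrac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}} \le \sigma^2 \le \dfrac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}}$
$\dfrac{49(569.7)}{74.919} \le \sigma^2 \le \dfrac{49(569.7)}{28.94} = (372.6, 964.56)$ ; $19.3 \le \sigma \le 31.06$
H0: $\sigma$ = 28 hrs. The value 28 lies inside the interval, so the assertion can be supported (Yes).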
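Both intervals can be reproduced with a few lines of arithmetic. This sketch takes the critical values straight from the slide (t = 2.4049, chi-square = 74.919 and 28.94) rather than computing them, so it depends only on the standard library; the variable names are illustrative.

```python
import math

n, xbar, s2 = 50, 78.42, 569.696   # summary statistics from Problem 1
t_crit = 2.4049                    # t_{.01,49}, quoted on the slide
chi2_hi, chi2_lo = 74.919, 28.94   # chi-square_{.01,49} and chi-square_{.99,49}

# Interval for the mean: xbar +/- t * s / sqrt(n)
half_width = t_crit * math.sqrt(s2 / n)
mean_ci = (xbar - half_width, xbar + half_width)

# Interval for the variance, then square roots for the standard deviation
var_ci = ((n - 1) * s2 / chi2_hi, (n - 1) * s2 / chi2_lo)
sd_ci = tuple(math.sqrt(v) for v in var_ci)

print(mean_ci, sd_ci)
print(mean_ci[0] <= 68 <= mean_ci[1])   # is 68 hours inside the mean interval?
print(sd_ci[0] <= 28 <= sd_ci[1])       # is 28 hours inside the sd interval?
```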
Problem 4

A manufacturer of high definition television (HDTV)
advertises a 50-inch plasma TV as having an operating life
of over 10,000 hours. The following data shown in 1,000
hours was obtained on 20 of the manufacturer’s TV’s by the
Consumer Product Testing Service (CPTS):
10.280
13.035
12.029
13.337
13.699
11.521
11.142
9.110
10.631
13.076
8.652
10.461
12.356
11.008
9.161
5.068
10.551
9.785
11.750
5.806
The Data in statistical summarization
sample size            20
mean                   10.623
variance               5.230
std dev                2.287
median                 10.8195
1st quartile           9.629
3rd quartile           11.57825
interquartile range    1.94925
Minimum                5.068
Maximum                13.699
Range                  8.631
Skewness               -1.0260
Kurtosis               1.0703
Problem 4a


For parts (a) – (d), assume the population is normally
distributed with a standard deviation that is known where $\sigma$ =
2.5 (1,000) hours.
(a) Test the hypothesis at the 5 percent level that the
population mean life of the TVs is 10,000 hours against the
alternative that it is greater than 10,000 hours. What is the
critical z-value for the test and what is the critical X-bar value?
H0: $\mu$ = 10,000
H1: $\mu$ > 10,000
$z_{.05} = 1.645$
$\bar{X}_c = 10 + 1.645\,\dfrac{2.5}{\sqrt{20}} = 10.920$
Problem 4b


Based upon the sample data, do you reject or not reject the
null hypothesis?
What is the prob-value?
$\bar{X} = 10.623$ ; $\sigma_{\bar{X}} = 2.5/\sqrt{20} = .559$
$z_0 = \dfrac{10.623 - 10}{.559} = 1.114$
since $z_0 = 1.114 < 1.645$, cannot reject
or since $\bar{X} = 10.623 < 10.92$, cannot reject
P-value $= \Pr\{\bar{X} \ge 10.623\} = \Pr\left\{ z \ge \dfrac{10.623 - 10}{.559} \right\} = \Pr\{z \ge 1.114\} = .1326$
Problem 4c

If the true mean life of the TVs is 11,000 hours, what is the
probability of accepting the null hypothesis? Assume a sample
size of 20.
$\beta = \Pr\{\bar{X} < 10.920 \mid \mu = 11\} = \Pr\left\{ z < \dfrac{10.920 - 11}{.559} \right\} = \Pr\{z < -0.1431\} = .4431$
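Parts (a) through (c) can be checked together with the standard library's NormalDist. This is only a sketch: it carries full precision instead of the slide's rounded .559 and 10.920, so the last digit of each result may differ slightly from the values shown above.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()                     # standard normal
mu0, sigma, n, alpha = 10.0, 2.5, 20, 0.05
xbar = 10.623                        # sample mean, in 1,000 hours
se = sigma / sqrt(n)                 # standard error of the mean

z_crit = Z.inv_cdf(1 - alpha)        # upper-tail critical z (part a)
xbar_crit = mu0 + z_crit * se        # critical X-bar (part a)

z0 = (xbar - mu0) / se               # test statistic (part b)
p_value = 1 - Z.cdf(z0)              # upper-tail P-value (part b)

mu_true = 11.0                       # assumed true mean (part c)
beta = Z.cdf((xbar_crit - mu_true) / se)   # Pr{accept H0 | mu = 11}

print(round(xbar_crit, 3), round(z0, 3), round(p_value, 4), round(beta, 4))
```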
Problem 4d

What sample size would be required to test this hypothesis
where the probability of a type I error is one percent and the
probability of a type II error is two percent if the true mean is
11,000 hours?
$z_{.01} = 2.326$, $z_{.02} = 2.0537$
$n = \dfrac{(z_{\alpha} + z_{\beta})^2 \sigma^2}{\delta^2} = \dfrac{(2.326 + 2.0537)^2\, 2.5^2}{1^2} = 119.88 \approx 120$
Problem 4e

From the above sample data and assuming a normal population,
test the hypothesis at the 5 percent level of significance that the
standard deviation is 2.5 (1,000) against the alternate hypothesis
that it is less than 2.5 (1,000). What is the test statistic, critical
value and the P-value?
H0: $\sigma^2 = 2.5^2$
H1: $\sigma^2 < 2.5^2$
$\chi_0^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{19(2.287)^2}{2.5^2} = 15.90$
$\chi^2_{1-\alpha,n-1} = \chi^2_{.95,19} = 10.12$; since 15.90 > 10.12, cannot reject
P-value $= \Pr\{\chi^2_{19} < 15.9\} = .3361$
Bonus! Bonus!
Problem 4e as a two-tailed

From the above sample data and assuming a normal population,
test the hypothesis at the 5 percent level of significance that the
standard deviation is 2.5 (1,000) against the alternate hypothesis
that it is not 2.5 (1,000). What is the test statistic, critical value
and the P-value?
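H0: $\sigma^2 = 2.5^2$
H1: $\sigma^2 \ne 2.5^2$
$\chi_0^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{19(2.287)^2}{2.5^2} = 15.90$
$\chi^2_{1-\alpha/2,n-1} = \chi^2_{.975,19} = 8.907$
$\chi^2_{\alpha/2,n-1} = \chi^2_{.025,19} = 32.852$; since 8.907 < 15.90 < 32.852, cannot reject
P-value $= 2 \Pr\{\chi^2_{19} < 15.9\} = 2(.3361) = .6722$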
Problem 5


The Democratic National Committee is interested in knowing
whether the country would support a female candidate for
president. They decide to take a survey to see which way the
wind is blowing on the issue. They decide that if they get a
favorable (would support) response from more than one-third of
those surveyed they should not discourage a female candidate.
a. State an appropriate null and alternate hypothesis for this test.
H0: $p \le .33$
H1: $p > .33$
Problem 5b

Suppose they survey 100 people and 40 of them indicate they
would support a female candidate. Should they reject the null
hypothesis? Compute the test statistic and use alpha = .05.
H0: $p \le .33$
H1: $p > .33$
$z_0 = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \dfrac{.4 - .33}{\sqrt{.33(1-.33)/100}} = 1.489$
$z_0 = 1.489 < z_{.05} = 1.6449$; cannot reject $H_0$
Problem 5c

What is the p-value for this test?
p-Value $= \Pr\{z \ge 1.489\} = .0682$
Problem 5d

With this sample size (n = 100) and alpha (.05) what is the
power of the test to detect a percentage (who would support a
female candidate) of 40% or greater?
$\beta = \Phi\left[ \dfrac{p_0 - p + z_{\alpha}\sqrt{p_0(1-p_0)/n}}{\sqrt{p(1-p)/n}} \right]$ (Equation 9-36)
$= \Phi\left[ \dfrac{.33 - .40 + 1.6449\sqrt{.33(1-.33)/100}}{\sqrt{.40(1-.40)/100}} \right] = \Phi(.1499) = .5596$
Power = .4404
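Parts (b) through (d) can be reproduced in a few lines. The sketch below uses the slide's p0 = .33 and the standard library's NormalDist; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()
p0, p_hat, n, alpha = 0.33, 0.40, 100, 0.05

se0 = sqrt(p0 * (1 - p0) / n)        # standard error under H0
z0 = (p_hat - p0) / se0              # test statistic (part b)
p_value = 1 - Z.cdf(z0)              # upper-tail P-value (part c)

z_alpha = Z.inv_cdf(1 - alpha)
p_true = 0.40                        # alternative to be detected (part d)
se1 = sqrt(p_true * (1 - p_true) / n)
beta = Z.cdf((p0 - p_true + z_alpha * se0) / se1)   # type II error probability
power = 1 - beta

print(round(z0, 3), round(p_value, 4), round(power, 4))
```

With n = 100 the test detects a true proportion of 40% less than half the time, which is why part (e) asks for a larger sample.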
Problem 5e
e. What sample size would be required to increase the power
in part d to 90 percent?
$z_{\beta} = z_{.10} = 1.28$
$n = \left[ \dfrac{z_{\alpha}\sqrt{p_0(1-p_0)} + z_{\beta}\sqrt{p(1-p)}}{p - p_0} \right]^2$ (Equation 9-38)
$= \left[ \dfrac{1.6449\sqrt{.33(1-.33)} + 1.28\sqrt{.4(1-.4)}}{.40 - .33} \right]^2 = 400.7 \approx 401$
Problem 6

Faculty from the EMS Department are
headed to Las Vegas to present a
paper comparing distance learning and
on-campus classes. One of the
dimensions of comparison is test scores
for the two media (internet and
campus). They restrict their attention to
courses taught by the same instructor in
both settings in the same semester.
The results for tests in this comparison
are shown below. Each of the entries
under the Internet and Campus heading
represents an average test score, e.g. a
midterm or final average score.
Conduct all tests at the 4 percent level of
significance.
Course     Internet   Campus   Delta
1          70.1       66.2     3.9
2          63.1       56.0     6.1
3          84.5       78.9     5.6
4          63.0       65.6     -2.6
5          62.9       67.0     -4.1
6          85.9       83.3     2.6
7          78.9       77.9     1.0
8          93.3       89.5     3.8
9          88.3       89.3     -1.0
10         78.5       73.4     5.1
11         82.6       76.7     5.9
12         80.1       69.3     10.8
13         85.7       75.2     10.5
14         85.0       72.7     12.3
15         77.8       82.3     -4.5
16         80.5       80.7     -0.2
17         83.4       79.6     3.8
18         91.3       88.5     2.8
19         86.9       87.5     -0.6
20         82.5       84.0     -1.5
Mean       80.2       77.2     3.0
Std Dev    8.98       9.10     4.8
Problem 6a

Is there evidence that the variances of the scores in the two
media are not equal? Set up the correct hypothesis test and
report the results at the 4 percent level of significance.
$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \ne \sigma_2^2$
$F = S_1^2 / S_2^2$
F-Test Two-Sample for Variances
                       Internet   Campus
Mean                   80.215     77.18
Variance               80.5719    82.77431579
Observations           20         20
df                     19         19
F                      0.97339
P(F<=f) one-tail       0.47687
F Critical one-tail    0.37803
$F_{.98,19,19} = .378$ ; $F_{.02,19,19} = 2.6453$ (2 percent in each tail)
$.378 < F_0 = S_1^2 / S_2^2 = .9734 < 2.6453$, cannot reject
Problem 6b

Considering the samples to be independent, is there evidence
that the means of the test scores from campus and internet
classes are not equal? Assume equal variances.
t-Test: Two-Sample Assuming Equal Variances
                                Internet      Campus
Mean                            80.215        77.18
Variance                        80.5718684    82.7743
Observations                    20            20
Pooled Variance                 81.6730921
Hypothesized Mean Difference    0
df                              38
t Stat                          1.06198699
P(T<=t) one-tail                0.14747226
t Critical one-tail             1.79878002
P(T<=t) two-tail                0.29494452
t Critical two-tail             2.12667401
$t_{.02,38} = 2.1267$; since 1.062 < 2.1267, cannot reject
Problem 6c

Now treat the data as a paired sample t-test. Is there
evidence that the means of the test scores are not equal?
$d_i = x_{1i} - x_{2i}$ ; $i = 1, \ldots, n$
t-Test: Paired Two Sample for Means
                                Internet      Campus
Mean                            80.215        77.18
Variance                        80.5718684    82.7743
Observations                    20            20
Pearson Correlation             0.8571121
Hypothesized Mean Difference    0
df                              19
t Stat                          2.80868534
P(T<=t) one-tail                0.00560474
t Critical one-tail             1.84953003
P(T<=t) two-tail                0.01120948
t Critical two-tail             2.20470134
$t_0 = \dfrac{\bar{d}}{s_d / \sqrt{n}}$ ; $t_{.02,19} = 2.2047$; since 2.809 > 2.2047, Reject
Differences (Delta column): Mean = 2.985, Std. dev. = 4.793
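A minimal sketch of the paired t statistic, computed directly from the Internet and Campus columns of the Problem 6 table (an assumption; the slide works from the Delta column). From the raw columns the mean difference is 80.215 - 77.18 = 3.035, which reproduces the Excel t Stat of about 2.809. The slide's Delta list, whose second entry is 6.1 where 63.1 - 56.0 = 7.1, averages 2.985 instead.

```python
from math import sqrt
from statistics import mean, stdev

internet = [70.1, 63.1, 84.5, 63.0, 62.9, 85.9, 78.9, 93.3, 88.3, 78.5,
            82.6, 80.1, 85.7, 85.0, 77.8, 80.5, 83.4, 91.3, 86.9, 82.5]
campus = [66.2, 56.0, 78.9, 65.6, 67.0, 83.3, 77.9, 89.5, 89.3, 73.4,
          76.7, 69.3, 75.2, 72.7, 82.3, 80.7, 79.6, 88.5, 87.5, 84.0]

# Paired differences d_i = x_1i - x_2i
d = [a - b for a, b in zip(internet, campus)]
n = len(d)
d_bar = mean(d)
s_d = stdev(d)                       # sample standard deviation (n - 1 divisor)
t0 = d_bar / (s_d / sqrt(n))         # paired t statistic with n - 1 = 19 df

t_crit = 2.2047                      # two-tailed critical value from the slide
print(round(d_bar, 3), round(s_d, 3), round(t0, 3), abs(t0) > t_crit)
```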
Problem 6d

Give a 99 percent confidence interval on the mean
difference in test scores using the most appropriate method.
$t_{.005,19} = 2.861$
$\bar{d} \pm t_{\alpha/2,n-1}\,\dfrac{S_d}{\sqrt{n}} = 2.985 \pm 2.861\,\dfrac{4.793}{\sqrt{20}} = (-.08127, 6.0512)$
Problem 7

The following data (problems 12-88, 12-89, 12-93) represents
the thrust of a jet-turbine engine (y) and six candidate
regressors: x1 = primary speed of rotation, x2 = secondary speed
of rotation, x3 = fuel flow rate, x4 = pressure, x5 = exhaust
temperature, and x6 = ambient temperature at time of the test.
sample  y     x1    x2     x3     x4   x5    x6
1       4540  2140  20640  30250  205  1732  99
2       4315  2016  20280  30010  195  1697  100
3       4095  1905  19860  29780  184  1662  97
4       3650  1675  18980  29330  164  1598  97
5       3200  1474  18100  28960  144  1541  97
6       4833  2239  20740  30083  215  1709  87
7       4617  2120  20305  29831  206  1669  87
8       4340  1990  19961  29604  195  1640  87
9       3820  1702  18916  29088  171  1572  85
10      3368  1487  18012  28675  149  1522  85
11      4445  2107  20520  30120  195  1740  101
12      4188  1973  20130  29920  190  1711  100
13      3981  1864  19780  29720  180  1682  100
14      3622  1674  19020  29370  161  1630  100
15      3125  1440  18030  28940  139  1572  101
16      4560  2165  20680  30160  208  1704  98
17      4340  2048  20340  29960  199  1679  96
18      4115  1916  19860  29710  187  1642  94
19      3630  1658  18950  29250  164  1576  94
20      3210  1489  18700  28890  145  1528  94
21      4330  2062  20500  30190  193  1748  101
22      4119  1929  20050  29960  183  1713  100
23      3891  1815  19680  29770  173  1684  100
24      3467  1595  18890  29360  153  1624  99
25      3045  1400  17870  28960  134  1569  100
26      4411  2047  20540  30160  193  1746  99
27      4203  1935  20160  29940  184  1714  99
28      3968  1807  19750  29760  173  1679  99
29      3531  1591  18890  29350  153  1621  99
30      3074  1388  17870  28910  133  1561  99
31      4350  2071  20460  30180  198  1729  102
32      4128  1944  20010  29940  186  1692  101
33      3940  1830  19640  29750  178  1667  101
34      3480  1612  18710  29360  156  1609  101
35      3064  1410  17780  28900  136  1552  101
36      4402  2066  20520  30170  197  1758  100
37      4180  1954  20150  29950  188  1729  99
38      3973  1835  19750  29740  178  1690  99
39      3530  1616  18850  29320  156  1616  99
40      3080  1407  17910  28910  137  1569  100
Problem 7a
Fit the following simple regression model to the data where x1 is
the primary speed of rotation. Perform all appropriate
statistical tests. $Y = \beta_0 + \beta_1 x_1 + \varepsilon$
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.99501091
R Square             0.99004671
Adjusted R Square    0.98978478
Standard Error       51.0047533
Observations         40
ANOVA
              df    SS            MS            F             Significance F
Regression    1     9833179.58    9833179.58    3779.83348    1.182E-39
Residual      38    98856.4247    2601.48486
Total         39    9932036
              Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%
Intercept     296.904179     59.2223721       5.0133787     1.2733E-05    177.014756    416.793603
x1            1.99298073     0.03241655       61.4803504    1.182E-39     1.92735686    2.0586046
$F_{.05,1,38} = 4.0982$
$t_{.05/2,38} = 2.0244$
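The Excel coefficients can be reproduced with the closed-form least-squares formulas. This is a sketch using only the y and x1 columns of the Problem 7 data; the variable names are illustrative and nothing here is claimed to be how the slide output was generated.

```python
# Thrust (y) and primary speed of rotation (x1) for the 40 samples.
y = [4540, 4315, 4095, 3650, 3200, 4833, 4617, 4340, 3820, 3368,
     4445, 4188, 3981, 3622, 3125, 4560, 4340, 4115, 3630, 3210,
     4330, 4119, 3891, 3467, 3045, 4411, 4203, 3968, 3531, 3074,
     4350, 4128, 3940, 3480, 3064, 4402, 4180, 3973, 3530, 3080]
x1 = [2140, 2016, 1905, 1675, 1474, 2239, 2120, 1990, 1702, 1487,
      2107, 1973, 1864, 1674, 1440, 2165, 2048, 1916, 1658, 1489,
      2062, 1929, 1815, 1595, 1400, 2047, 1935, 1807, 1591, 1388,
      2071, 1944, 1830, 1612, 1410, 2066, 1954, 1835, 1616, 1407]

n = len(y)
x_bar = sum(x1) / n
y_bar = sum(y) / n

# Corrected sums of squares and cross-products
sxx = sum((x - x_bar) ** 2 for x in x1)
sxy = sum((x - x_bar) * (v - y_bar) for x, v in zip(x1, y))

b1 = sxy / sxx                       # slope estimate
b0 = y_bar - b1 * x_bar              # intercept estimate

sse = sum((v - (b0 + b1 * x)) ** 2 for x, v in zip(x1, y))
sst = sum((v - y_bar) ** 2 for v in y)
r2 = 1 - sse / sst                   # coefficient of determination
f_stat = (sst - sse) / (sse / (n - 2))   # F with 1 and n - 2 df

print(round(b0, 3), round(b1, 5), round(r2, 4), round(f_stat, 1))
```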
Problem 7b
Is there a better fit using a nonlinear relationship of x1 such as a log,
power, or exponential function?
[Three scatter plots of y versus x1 with fitted trend lines]
Logarithmic:  $y = 3496.6 \ln(x) - 22290$,  $R^2 = 0.9886$
Power:        $y = 3.8129\,x^{0.9241}$,  $R^2 = 0.9906$
Exponential:  $y = 1497.1\,e^{0.0005x}$,  $R^2 = 0.9851$
Problem 7c
Now fit the following multiple regression model to the data where x3
is the fuel flow rate and x4 is the pressure. Predict the engine
thrust if the fuel flow rate is 30,000 and the pressure is 190.
$Y = \beta_0 + \beta_1 x_3 + \beta_2 x_4 + \varepsilon$
             Coefficients   Standard Error   t Stat      P-value     Lower 95%   Upper 95%
Intercept    -2031.66       1152.656         -1.76259    0.086229    -4367.17    303.8398
x3           0.082854       0.043608         1.899983    0.06525     -0.0055     0.171212
x4           19.96392       0.869995         22.94718    1.77E-23    18.20114    21.7267
Regression Statistics
Multiple R           0.995481
R Square             0.990983
Adjusted R Square    0.990496
Standard Error       49.19826
Observations         40
Y = -2031.66 + .082854 (30,000) + 19.96392 (190) = 4247.10
Problem 7d
Test for the significance of the regression model at the one
percent level. What is the critical F-value?
ANOVA
              df    SS          MS          F           Significance F
Regression    2     9842479     4921239     2033.176    1.47E-38
Residual      37    89557.34    2420.469
Total         39    9932036
$F_{.01,2,37} = 5.229$; since 2033.176 > 5.229, the regression is significant.
Problem 7e
Test each of the regression coefficients at the 5 percent level of
significance. What are the critical t values?
             Coefficients   Standard Error   t Stat      P-value     Lower 95%   Upper 95%
Intercept    -2031.66       1152.656         -1.76259    0.086229    -4367.17    303.8398
x3           0.082854       0.043608         1.899983    0.06525     -0.0055     0.171212
x4           19.96392       0.869995         22.94718    1.77E-23    18.20114    21.7267
$t_{.05/2,37} = 2.0262$; only x4 has |t Stat| greater than 2.0262, so the intercept and x3 are not significant at the 5 percent level.
Problem 7f
Construct a 97 percent confidence interval on the mean response
and prediction interval at a fuel flow rate of 30,000 and a
pressure of 190.
$t_{.05/2,37} = 2.0262$
$\hat{\mu}_{Y|x_0} - t_{\alpha/2,n-p}\sqrt{\hat{\sigma}^2 x_0'(X'X)^{-1}x_0} \le \mu_{Y|x_0} \le \hat{\mu}_{Y|x_0} + t_{\alpha/2,n-p}\sqrt{\hat{\sigma}^2 x_0'(X'X)^{-1}x_0}$
Predicted Values
Fit          4247.10
StDev Fit    10.50
95.0% CI     ( 4225.83, 4268.38)
95.0% PI     ( 4145.17, 4349.03)
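Both intervals follow from the numbers already on the slides. The sketch below takes the fitted coefficients, the StDev Fit of 10.50, the regression standard error of 49.19826, and t = 2.0262 as given, and assumes the usual relation that the prediction standard error is the square root of s^2 plus the squared StDev Fit.

```python
from math import sqrt

b0, b3, b4 = -2031.66, 0.082854, 19.96392   # coefficients from Problem 7c
x3, x4 = 30000, 190                          # fuel flow rate and pressure

fit = b0 + b3 * x3 + b4 * x4                 # predicted thrust

t_crit = 2.0262      # t_{.05/2,37} from the slide
se_fit = 10.50       # StDev Fit: sqrt(sigma^2 * x0'(X'X)^-1 x0)
s = 49.19826         # regression standard error (sqrt of MSE)

# Confidence interval on the mean response
conf_int = (fit - t_crit * se_fit, fit + t_crit * se_fit)

# Prediction interval for a single new observation
se_pred = sqrt(s ** 2 + se_fit ** 2)
pred_int = (fit - t_crit * se_pred, fit + t_crit * se_pred)

print(round(fit, 2), conf_int, pred_int)
```

The prediction interval is much wider because it includes the variability of a single observation, not just the uncertainty in the fitted mean.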
Problem 7g
Generate a multiple regression model using all the predictor
variables. Eliminate any variables that do not pass the t-test at
the 5 percent level of significance. Form a new regression
model with the remaining variables. Is this a better model than
the one found in (f)?
             Coefficients   Standard Error   t Stat      P-value
Intercept    -4726.381      2445.448         -1.93273    0.06189
x1           1.11868286     0.280075         3.994233    0.000342
x2           -0.0312141     0.038277         -0.81548    0.420643
x3           0.23070284     0.118034         1.954546    0.059153
x4           3.88373596     2.638334         1.472041    0.150483
x5           0.82658321     0.351328         2.352739    0.024749
x6           -17.027503     2.598349         -6.5532     1.91E-07
crit t = 2.348338
$t_{.05/2,33} = 2.0345$; x2, x3, and x4 have |t Stat| below 2.0345 and are eliminated.
Regression Statistics
Multiple R           0.99883169
R Square             0.99766474
Adjusted R Square    0.99724014
Standard Error       26.5112435
More Problem 7g
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.99858127
R Square             0.99716455
Adjusted R Square    0.99692827
Standard Error       27.9691109
Observations         40
The regression equation is
y = 367 + 1.71 x1 + 1.10 x5 - 14.0 x6
ANOVA
              df    SS          MS          F
Regression    3     9903874     3301291     4220.137
Residual      36    28161.76    782.2712
Total         39    9932036
             Coefficients   Standard Error   t Stat      P-value
Intercept    366.52906      198.004          1.851119    0.07237
x1           1.70592044     0.066433         25.679      9.66E-25
x5           1.09629856     0.254142         4.313724    0.00012
x6           -13.970234     1.667503         -8.37794    5.61E-10
crit t = 2.339061
The End of a Most
Wonderful Practice
Final Exam
There is no more.
What About ANOVA?
I can do this.
A ten point DOE question
• either a CRD or RBD
• input data and design given
• partial ANOVA table given
• complete the ANOVA table
  • no sums of squares calculations needed
• find critical F value
• Test hypotheses or significance
ANOVA RBD
Source of Variation    SS      df    MS    F
Rows                   xxxx
Columns                xxxx
Error
Total                  xxxx    xx
Five Most Wonderful Problems
1. An estimation problem (Chapter 7)
2. A confidence interval problem (Chapter 8)
3. A hypothesis testing problem (Chapters 9 and 10)
4. A regression problem (Chapters 11 and 12)
5. An ANOVA problem (Chapter 13)
Final Exam Instructions

This is a 120-minute open book exam. You may use a
computer or calculator. Each question is weighted as shown.
Complete the answer sheet at the end of the test and submit all
of your work.
Additional Instructions for
internet students

Within 10 minutes following the 2-hour exam, complete the
table below and submit this page either by email
(Ebeling@udayton.edu) or fax (937 229-2698). You may then
send, deliver, or fax any additional work which should be
received by the instructor within 2 hours after completing the
exam. If any response in the table below is blank, you will
receive a zero (0) for that response. If you partially complete a
problem, then submit what you have completed. Any additional
work received within 2 hours will be evaluated only to the extent
that it supports your original submission. Include your name on
this answer sheet. Your last name must be part of the file name
on any email attachments. Failure to follow these instructions
will result in lost points.
Final Exam Registration


• Need to register for the exam by today (Wednesday, December 9th)
• Register by completing the on-line form at the following course Webpage:
  http://academic.udayton.edu/CharlesEbeling/ENM500/exams/register_final.htm
• Indicate time and place (internet email address or regular campus class)
• Must provide justification for taking exam via the internet!