A Most Wonderful Practice Final

Happiness is taking a final exam in statistics.

Our Last Homework Assignment

Statistic             Chap12   HW Avg
mean                  90.6     85.3
std dev               15.3     10.8
median                91       88.6
skewness              -3.81    -2.55
minimum               10       45.23
maximum               100      99
1st quartile          87.63    83.02
3rd quartile          100      90.25
interquartile range   12.38    7.23
range                 90       53.77
kurtosis              19.01    7.64

Problem 1 – point estimation

The following random sample was obtained by measuring the time, in working hours, to complete a particular construction job. Treating the data as continuous, answer the following questions:
(a) Find an unbiased estimate for the population mean.
(b) Find an unbiased estimate for the population variance.

83.8   64.2   89.8   82.3   90.4   63.3   40.4   104    108    96.6
65.4   98.1   46.8   86.9   71.8   56.2   72.1   73.7   77.2   113
135    56.4   99.8   64.6   95.7   85.3   75.8   88.5   71.7   72
99.7   49.1   98.9   85.2   110    68.4   123    58.6   74.1   67.9
66.5   44.5   55.6   136    91.7   30.9   76.7   36.8   48     71.5

Problem 1a,b – Excel descriptive statistics

Mean                 78.42
Standard Error       3.37549
Median               74.95
Mode                 #N/A
Standard Deviation   23.8683
Sample Variance      569.696
Kurtosis             -0.0294
Skewness             0.3234
Range                104.8
Minimum              30.9
Maximum              135.7
Sum                  3921
Count                50

(a) The sample mean, x̄ = 78.42 hours, is an unbiased estimate of μ.
(b) The sample variance, s² = 569.696 hours², is an unbiased estimate of σ².

(c) Fitting the gamma distribution

More Problem 1
(c) The gamma distribution has PDF
$$f(t) = \frac{t^{\alpha-1}e^{-t/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}, \qquad \alpha, \beta > 0,\ t \ge 0.$$
The mean and variance of the gamma distribution are αβ and αβ², respectively, where α and β are the parameters of the distribution. Assuming the gamma distribution is also a reasonable fit to the above sample data, find the method-of-moments estimators for α and β.
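Python's standard `statistics` module is enough to reproduce the Problem 1 estimates. The sketch below assumes the 50 values exactly as listed. Several of them are shown rounded (Excel reports a maximum of 135.7 where the list shows 136), so the results come out close to the Excel output but do not match it exactly. The gamma step uses a shape α and scale β, with mean αβ and variance αβ², and matches those to the first two sample moments.

```python
from statistics import mean, variance

# Problem 1 data: hours to complete the construction job (n = 50), as listed
hours = [
    83.8, 64.2, 89.8, 82.3, 90.4, 63.3, 40.4, 104, 108, 96.6,
    65.4, 98.1, 46.8, 86.9, 71.8, 56.2, 72.1, 73.7, 77.2, 113,
    135, 56.4, 99.8, 64.6, 95.7, 85.3, 75.8, 88.5, 71.7, 72,
    99.7, 49.1, 98.9, 85.2, 110, 68.4, 123, 58.6, 74.1, 67.9,
    66.5, 44.5, 55.6, 136, 91.7, 30.9, 76.7, 36.8, 48, 71.5,
]

n = len(hours)
xbar = mean(hours)        # (a) unbiased estimate of the population mean
s2 = variance(hours)      # (b) unbiased estimate (divisor n - 1) of the variance

# (c) method of moments for a gamma distribution with mean a*b and variance a*b**2
m1 = xbar                              # first sample moment
m2 = sum(x * x for x in hours) / n     # second raw sample moment
alpha_hat = m1 ** 2 / (m2 - m1 ** 2)   # shape estimate
beta_hat = (m2 - m1 ** 2) / m1         # scale estimate

print(f"mean = {xbar:.2f}, variance = {s2:.1f}")
print(f"alpha = {alpha_hat:.3f}, beta = {beta_hat:.3f}")
```

Because the moments come from the rounded listing, α̂ and β̂ land near, but not exactly on, the values obtained from the Excel moments.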
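Recall that σ² = E(X²) - μ². The sample moments are
$$m_1 = \frac{1}{n}\sum_{i=1}^{n} x_i = 78.42, \qquad m_2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 = 6708.$$
Setting the gamma mean and variance equal to their sample counterparts, αβ = m₁ and αβ² = m₂ - m₁², gives
$$\hat\alpha = \frac{m_1^2}{m_2 - m_1^2} = \frac{78.42^2}{6708 - 78.42^2} = 11.015, \qquad \hat\beta = \frac{m_2 - m_1^2}{m_1} = \frac{6708 - 78.42^2}{78.42} = 7.119.$$

Problem 2 – The Ebeling Distribution

The Ebeling distribution, a statistical marvel, is a one-parameter probability distribution with the following PDF and CDF, where E(X) = θ:
$$f(x) = \frac{2\theta^2}{(x+\theta)^3}, \quad x \ge 0,\ \theta > 0; \qquad F(x) = 1 - \frac{\theta^2}{(x+\theta)^2}.$$

54.3   28.8   20.5   365.7  1.6    33.9   40.1   18.7   21.5   10.1
41.1   11.5   160.8  0.3    0.2    34.7   239.0  10.7   17.8   190.7
4.5    14.4   4.3    3.0    15.2   46.4   46.5   59.9   0.4    78.0
4.6    123.9  2.0    18.0   55.0   7.1    19.9   99.4   171.6  0.3
25.0   37.2   14.8   14.0   1.0    80.6   2.2    6.6    7.6    30.4

The data represent the times between arrivals of patients at an urgent care center, in minutes.

Finding the MoM
Integrating by parts,
$$E(X) = \int_0^\infty \frac{2\theta^2 x}{(x+\theta)^3}\,dx = \left[-\frac{\theta^2 x}{(x+\theta)^2}\right]_0^\infty + \int_0^\infty \frac{\theta^2}{(x+\theta)^2}\,dx = 0 + \left[-\frac{\theta^2}{x+\theta}\right]_0^\infty = \theta,$$
so the method-of-moments estimate is $\hat\theta = \bar{X} = 45.9$.

Finding the MLE
$$L(\theta) = \prod_{i=1}^{n} f(x_i) = \prod_{i=1}^{n} \frac{2\theta^2}{(x_i+\theta)^3} = \frac{2^n\theta^{2n}}{\prod_{i=1}^{n}(x_i+\theta)^3}$$
$$\ln L(\theta) = n\ln 2 + 2n\ln\theta - 3\sum_{i=1}^{n}\ln(x_i+\theta)$$
$$\frac{d\ln L(\theta)}{d\theta} = \frac{2n}{\theta} - 3\sum_{i=1}^{n}\frac{1}{x_i+\theta} = 0 \quad\Longrightarrow\quad \sum_{i=1}^{n}\frac{1}{x_i+\hat\theta} = \frac{2n}{3\hat\theta}$$
This equation has no closed-form solution; solving it numerically gives $\hat\theta = 48.037$. Then
$$P(X \le 50) = F(50) = 1 - \left(\frac{48.037}{50 + 48.037}\right)^2 = .7599$$

Problem 3
Based upon the sample data in Problem 1:
(a) Find a 98 percent confidence interval for the population mean.
(b) Find a 98 percent confidence interval for the population standard deviation.
(c) Management has hypothesized that the (population) mean time to complete this task should be 68 hours. Based upon the confidence interval in (a), can management's assertion be supported or not? Yes or No.
(d) Management also believes that the standard deviation in task times should be no greater than 28 hours. Can that assertion be supported by the confidence interval in part (b)? Yes or No.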
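Returning to Problem 2: its likelihood equation has no closed-form root, so θ̂ = 48.037 has to be found numerically. The sketch below uses plain bisection, chosen because it is easy to write without libraries; any one-dimensional root finder would work. Multiplying the score equation by θ gives h(θ) = 2n/3 - Σ θ/(xᵢ + θ). Every term of the sum increases with θ, so h is strictly decreasing and has exactly one root, which makes it easy to bracket. The function names are illustrative.

```python
from statistics import mean

# Problem 2 data: minutes between patient arrivals (n = 50)
gaps = [
    54.3, 28.8, 20.5, 365.7, 1.6, 33.9, 40.1, 18.7, 21.5, 10.1,
    41.1, 11.5, 160.8, 0.3, 0.2, 34.7, 239.0, 10.7, 17.8, 190.7,
    4.5, 14.4, 4.3, 3.0, 15.2, 46.4, 46.5, 59.9, 0.4, 78.0,
    4.6, 123.9, 2.0, 18.0, 55.0, 7.1, 19.9, 99.4, 171.6, 0.3,
    25.0, 37.2, 14.8, 14.0, 1.0, 80.6, 2.2, 6.6, 7.6, 30.4,
]

def ebeling_cdf(x, theta):
    """F(x) = 1 - theta^2 / (x + theta)^2 for x >= 0."""
    return 1.0 - (theta / (x + theta)) ** 2

def h(theta, data):
    """theta times the score: 2n/3 - sum(theta / (x_i + theta)); decreasing in theta."""
    return 2 * len(data) / 3 - sum(theta / (x + theta) for x in data)

def ebeling_mle(data, lo=1e-6, hi=1e6, tol=1e-9):
    """Bisection on h(theta) = 0; h > 0 near zero and h < 0 for large theta."""
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if h(mid, data) > 0:
            lo = mid      # root lies above mid
        else:
            hi = mid      # root lies at or below mid
    return 0.5 * (lo + hi)

theta_mom = mean(gaps)    # E(X) = theta, so the MoM estimate is the sample mean
theta_mle = ebeling_mle(gaps)
p50 = ebeling_cdf(50, theta_mle)

print(f"MoM = {theta_mom:.1f}, MLE = {theta_mle:.3f}, P(X <= 50) = {p50:.4f}")
```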
Problem 3

98% confidence interval for the mean:
$$\bar{x} \pm t_{\alpha/2,n-1}\frac{s}{\sqrt{n}} = 78.42 \pm 2.4049\,\frac{23.868}{\sqrt{50}} = (70.30,\ 86.54)$$
H0: μ = 68 hrs. Because 68 lies below the lower limit of 70.30, H0 is rejected, so the interval does not support management's assertion.

98% confidence interval for the standard deviation:
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}}: \qquad \frac{49(569.7)}{74.919} \le \sigma^2 \le \frac{49(569.7)}{28.94}$$
This gives (372.6, 964.56) for σ², or 19.30 ≤ σ ≤ 31.06.
H0: σ = 28 hrs. Because 28 lies inside (19.30, 31.06), H0 cannot be rejected.

Problem 4
A manufacturer of high-definition television (HDTV) advertises a 50-inch plasma TV as having an operating life of over 10,000 hours. The following data, in 1,000 hours, were obtained on 20 of the manufacturer's TVs by the Consumer Product Testing Service (CPTS):

10.280  13.035  12.029  13.337  13.699  11.521  11.142  9.110   10.631  13.076
8.652   10.461  12.356  11.008  9.161   5.068   10.551  9.785   11.750  5.806

The data in statistical summary:

sample size           20
mean                  10.623
variance              5.230
std dev               2.287
median                10.8195
1st quartile          9.629
3rd quartile          12.11075
interquartile range   2.48175
minimum               5.068
maximum               13.699
range                 8.631
skewness              -1.0260
kurtosis              1.0703

Problem 4a
For parts (a) – (d), assume the population is normally distributed with a known standard deviation σ = 2.5 (1,000 hours).
(a) Test the hypothesis at the 5 percent level that the population mean life of the TVs is 10,000 hours against the alternative that it is greater than 10,000 hours. What is the critical z-value for the test, and what is the critical X̄ value?

H0: μ = 10 (10,000 hours); H1: μ > 10
z.05 = 1.645
$$\bar{X}_c = 10 + 1.645\,\frac{2.5}{\sqrt{20}} = 10.920$$

Problem 4b
Based upon the sample data, do you reject or not reject the null hypothesis? What is the P-value?
$$\bar{X} = 10.623; \qquad \sigma_{\bar X} = 2.5/\sqrt{20} = .559; \qquad z_0 = \frac{10.623 - 10}{.559} = 1.114$$
Since z0 = 1.114 < 1.645 (equivalently, X̄ = 10.623 < 10.920), H0 cannot be rejected.
$$P\text{-value} = \Pr(\bar{X} > 10.623) = \Pr\left(Z > \frac{10.623 - 10}{.559}\right) = \Pr(Z > 1.114) = .1326$$

Problem 4c
If the true mean life of the TVs is 11,000 hours, what is the probability of accepting the null hypothesis? Assume a sample size of 20.
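Problem 4, parts (a) through (d), uses only the standard normal distribution, which Python provides as `statistics.NormalDist`. This sketch takes σ = 2.5 as known, as the problem states. It carries full precision rather than the rounded .559 and 10.920, so its answers can differ from the hand calculation in the third or fourth decimal place.

```python
from math import ceil, sqrt
from statistics import NormalDist, mean

Z = NormalDist()  # standard normal

# Problem 4 data: TV operating life in 1,000 hours (n = 20)
life = [10.280, 13.035, 12.029, 13.337, 13.699, 11.521, 11.142, 9.110, 10.631, 13.076,
        8.652, 10.461, 12.356, 11.008, 9.161, 5.068, 10.551, 9.785, 11.750, 5.806]

mu0, sigma, n, alpha = 10.0, 2.5, len(life), 0.05
se = sigma / sqrt(n)                  # standard error of X-bar

# (a) upper-tailed z test: critical z and critical X-bar
z_crit = Z.inv_cdf(1 - alpha)
xbar_crit = mu0 + z_crit * se

# (b) test statistic, decision, and P-value
xbar = mean(life)
z0 = (xbar - mu0) / se
reject = z0 > z_crit
p_value = 1 - Z.cdf(z0)

# (c) probability of accepting H0 (type II error) when mu = 11
mu1 = 11.0
beta = Z.cdf((xbar_crit - mu1) / se)

# (d) sample size for alpha = .01 and beta = .02 when mu = 11
z_a, z_b = Z.inv_cdf(0.99), Z.inv_cdf(0.98)
n_req = ceil(((z_a + z_b) * sigma / (mu1 - mu0)) ** 2)

print(f"z_crit={z_crit:.3f} xbar_crit={xbar_crit:.3f} z0={z0:.3f} p={p_value:.4f}")
print(f"reject H0: {reject}  beta={beta:.4f}  n={n_req}")
```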
$$\beta = \Pr(\bar{X} < 10.920 \mid \mu = 11) = \Pr\left(Z < \frac{10.920 - 11}{.559}\right) = \Pr(Z < -0.1431) = .4431$$

Problem 4d
What sample size would be required to test this hypothesis if the probability of a type I error is one percent and the probability of a type II error is two percent when the true mean is 11,000 hours?
z.01 = 2.326, z.02 = 2.0537, δ = 11 - 10 = 1
$$n = \frac{(z_\alpha + z_\beta)^2\sigma^2}{\delta^2} = \frac{(2.326 + 2.0537)^2\,(2.5)^2}{1^2} = 119.88 \approx 120$$

Problem 4e
From the above sample data and assuming a normal population, test the hypothesis at the 5 percent level of significance that the standard deviation is 2.5 (1,000) against the alternative hypothesis that it is less than 2.5 (1,000). What are the test statistic, critical value, and P-value?
H0: σ² = 2.5²; H1: σ² < 2.5²
$$\chi^2_0 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{19(2.287)^2}{2.5^2} = 15.90$$
Critical value: $\chi^2_{1-\alpha,n-1} = \chi^2_{.95,19} = 10.12$. Since 15.90 > 10.12, H0 cannot be rejected.
$$P\text{-value} = \Pr(\chi^2_{19} < 15.9) = .3361$$

Bonus!
Problem 4e as a two-tailed test
From the above sample data and assuming a normal population, test the hypothesis at the 5 percent level of significance that the standard deviation is 2.5 (1,000) against the alternative hypothesis that it is not 2.5 (1,000). What are the test statistic, critical values, and P-value?
H0: σ² = 2.5²; H1: σ² ≠ 2.5²
$$\chi^2_0 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{19(2.287)^2}{2.5^2} = 15.90$$
Critical values: $\chi^2_{1-\alpha/2,n-1} = \chi^2_{.975,19} = 8.907$ and $\chi^2_{\alpha/2,n-1} = \chi^2_{.025,19} = 32.852$. Since 8.907 < 15.90 < 32.852, H0 cannot be rejected.
$$P\text{-value} = 2\Pr(\chi^2_{19} < 15.9) = 2(.3361) = .6722$$

Problem 5
The Democratic National Committee is interested in knowing whether the country would support a female candidate for president. They decide to take a survey to see which way the wind is blowing on the issue. They decide that if they get a favorable (would support) response from more than one-third of those surveyed, they should not discourage a female candidate.
a. State an appropriate null and alternative hypothesis for this test.
H0: p = .33; H1: p > .33

Problem 5b
Suppose they survey 100 people and 40 of them indicate they would support a female candidate. Should they reject the null hypothesis? Compute the test statistic and use α = .05.
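The calculations for Problem 5, parts (b) through (e), can be sketched the same way with `statistics.NormalDist`. They cover the test statistic, the P-value, power from Equation 9-36, and sample size from Equation 9-38. As in the solution, p0 is taken as .33 rather than exactly 1/3. The z-values come from `inv_cdf` at full precision instead of table lookups.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()

p0, n, x, alpha = 0.33, 100, 40, 0.05
p_hat = x / n
z_alpha = Z.inv_cdf(1 - alpha)

# (b) test statistic for H0: p = p0 versus H1: p > p0
z0 = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
reject = z0 > z_alpha

# (c) P-value of the upper-tailed test
p_value = 1 - Z.cdf(z0)

# (d) beta at p = .40 (Equation 9-36); power = 1 - beta
p1 = 0.40
beta = Z.cdf((p0 - p1 + z_alpha * sqrt(p0 * (1 - p0) / n)) / sqrt(p1 * (1 - p1) / n))
power = 1 - beta

# (e) sample size giving 90 percent power (Equation 9-38)
z_beta = Z.inv_cdf(0.90)
n_req = ceil(((z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))) / (p1 - p0)) ** 2)

print(f"z0={z0:.3f} reject={reject} p={p_value:.4f} power={power:.4f} n={n_req}")
```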
$$z_0 = \frac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{.4 - .33}{\sqrt{.33(1-.33)/100}} = 1.489$$
Since z0 = 1.489 < z.05 = 1.6449, H0 cannot be rejected.

Problem 5c
What is the P-value for this test?
$$P\text{-value} = \Pr(Z > 1.489) = .0682$$

Problem 5d
With this sample size (n = 100) and α = .05, what is the power of the test to detect a percentage (who would support a female candidate) of 40% or greater?
Equation 9-36:
$$\beta = \Phi\left(\frac{p_0 - p + z_\alpha\sqrt{p_0(1-p_0)/n}}{\sqrt{p(1-p)/n}}\right) = \Phi\left(\frac{.33 - .40 + 1.6449\sqrt{.33(1-.33)/100}}{\sqrt{.40(1-.40)/100}}\right) = \Phi(.1499) = .5596$$
Power = 1 - β = .4404

Problem 5e
e. What sample size would be required to increase the power in part d to 90 percent?
z_β = z.10 = 1.2816
Equation 9-38:
$$n = \left[\frac{z_\alpha\sqrt{p_0(1-p_0)} + z_\beta\sqrt{p(1-p)}}{p - p_0}\right]^2 = \left[\frac{1.6449\sqrt{.33(1-.33)} + 1.2816\sqrt{.4(1-.4)}}{.40 - .33}\right]^2 = 400.7 \approx 401$$

Problem 6
Faculty from the EMS Department are headed to Las Vegas to present a paper comparing distance learning and on-campus classes. One of the dimensions of comparison is test scores for the two media (internet and campus). They restrict their attention to courses taught by the same instructor in both settings in the same semester. The results for tests in this comparison are shown below. Each entry under the Internet and Campus headings represents an average test score, e.g., a midterm or final average score. Conduct all tests at the 4 percent level of significance.

Test      Internet   Campus   Delta
1         70.1       66.2      3.9
2         63.1       56.0      7.1
3         84.5       78.9      5.6
4         63.0       65.6     -2.6
5         62.9       67.0     -4.1
6         85.9       83.3      2.6
7         78.9       77.9      1.0
8         93.3       89.5      3.8
9         88.3       89.3     -1.0
10        78.5       73.4      5.1
11        82.6       76.7      5.9
12        80.1       69.3     10.8
13        85.7       75.2     10.5
14        85.0       72.7     12.3
15        77.8       82.3     -4.5
16        80.5       80.7     -0.2
17        83.4       79.6      3.8
18        91.3       88.5      2.8
19        86.9       87.5     -0.6
20        82.5       84.0     -1.5
Mean      80.2       77.2      3.0
Std Dev   8.98       9.10      4.8

Problem 6a
Is there evidence that the variances of the scores in the two media are not equal? Set up the correct hypothesis test and report the results at the 4 percent level of significance.
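All four parts of Problem 6 can be checked directly from the two score columns with the standard library. The sketch computes the F ratio and the pooled and paired t statistics itself. The critical values F.02,19,19 = 2.6453 and t.005,19 = 2.861 are copied from the tables used in the solution, because `statistics` has no F or t quantile function. Computing the differences from the columns, rather than typing them in, gives d̄ = 80.215 - 77.18 = 3.035, which matches Excel's paired t-test.

```python
from math import sqrt
from statistics import mean, variance

# Problem 6 average test scores for the same 20 tests
internet = [70.1, 63.1, 84.5, 63.0, 62.9, 85.9, 78.9, 93.3, 88.3, 78.5,
            82.6, 80.1, 85.7, 85.0, 77.8, 80.5, 83.4, 91.3, 86.9, 82.5]
campus = [66.2, 56.0, 78.9, 65.6, 67.0, 83.3, 77.9, 89.5, 89.3, 73.4,
          76.7, 69.3, 75.2, 72.7, 82.3, 80.7, 79.6, 88.5, 87.5, 84.0]
n = len(internet)

# (a) two-sided F test for equal variances at alpha = .04
f0 = variance(internet) / variance(campus)
f_hi = 2.6453                         # F.02,19,19 from tables
equal_var_rejected = not (1 / f_hi < f0 < f_hi)

# (b) pooled two-sample t statistic (independent samples, equal variances)
sp2 = ((n - 1) * variance(internet) + (n - 1) * variance(campus)) / (2 * n - 2)
t_pooled = (mean(internet) - mean(campus)) / sqrt(sp2 * (2 / n))

# (c) paired t statistic on d_i = internet_i - campus_i
d = [a - b for a, b in zip(internet, campus)]
d_bar, s_d = mean(d), sqrt(variance(d))
t_paired = d_bar / (s_d / sqrt(n))

# (d) 99 percent confidence interval on the mean difference
t_005 = 2.861                         # t.005,19 from tables
half = t_005 * s_d / sqrt(n)
ci = (d_bar - half, d_bar + half)

print(f"F0={f0:.4f} t_pooled={t_pooled:.3f} t_paired={t_paired:.3f}")
print(f"d_bar={d_bar:.3f} s_d={s_d:.3f} CI=({ci[0]:.3f}, {ci[1]:.3f})")
```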
F-Test Two-Sample for Variances
H0: σ1² = σ2²; H1: σ1² ≠ σ2²; test statistic F0 = S1²/S2²

                          Internet   Campus
Mean                      80.215     77.18
Variance                  80.5719    82.77431579
Observations              20         20
df                        19         19
F                         0.97339
P(F<=f) one-tail          0.47687
F Critical one-tail       0.37803

With 2 percent in each tail: F.98,19,19 = 1/F.02,19,19 = .378 and F.02,19,19 = 2.6453.
F0 = S1²/S2² = .9734. Since .378 < .9734 < 2.6453, H0 cannot be rejected.

Problem 6b
Considering the samples to be independent, is there evidence that the means of the test scores from campus and internet classes are not equal? Assume equal variances.

t-Test: Two-Sample Assuming Equal Variances
                               Internet     Campus
Mean                           80.215       77.18
Variance                       80.5718684   82.7743
Observations                   20           20
Pooled Variance                81.6730921
Hypothesized Mean Difference   0
df                             38
t Stat                         1.06198699
P(T<=t) one-tail               0.14747226
t Critical one-tail            1.79878002
P(T<=t) two-tail               0.29494452
t Critical two-tail            2.12667401

t.02,38 = 2.1267. Since t0 = 1.062 < 2.1267, H0 cannot be rejected.

Problem 6c
Now treat the data as a paired sample, with d_i = x_1i - x_2i, i = 1, ..., n. Is there evidence that the means of the test scores are not equal?

t-Test: Paired Two Sample for Means
                               Internet     Campus
Mean                           80.215       77.18
Variance                       80.5718684   82.7743
Observations                   20           20
Pearson Correlation            0.8571121
Hypothesized Mean Difference   0
df                             19
t Stat                         2.80868534
P(T<=t) one-tail               0.00560474
t Critical one-tail            1.84953003
P(T<=t) two-tail               0.01120948
t Critical two-tail            2.20470134

Test statistic $t_0 = \bar{d}/(s_d/\sqrt{n})$ with t.02,19 = 2.2047. Since t0 = 2.809 > 2.2047, reject H0.
Differences: 3.9 7.1 5.6 -2.6 -4.1 2.6 1.0 3.8 -1.0 5.1 5.9 10.8 10.5 12.3 -4.5 -0.2 3.8 2.8 -0.6 -1.5
Mean = 3.035, Std. dev. = 4.833

Problem 6d
Give a 99 percent confidence interval on the mean difference in test scores using the most appropriate method.
t.005,19 = 2.861
$$\bar{d} \pm t_{\alpha/2,n-1}\frac{s_d}{\sqrt{n}} = 3.035 \pm 2.861\,\frac{4.833}{\sqrt{20}} = (-0.057,\ 6.127)$$
The interval includes zero.

Problem 7
The following data (problems 12-88, 12-89, 12-93) represent the thrust of a jet-turbine engine (y) and six candidate regressors: x1 = primary speed of rotation, x2 = secondary speed of rotation, x3 = fuel flow rate, x4 = pressure, x5 = exhaust temperature, and x6 = ambient temperature at the time of the test.
sample  y      x1     x2      x3      x4    x5     x6
1       4540   2140   20640   30250   205   1732   99
2       4315   2016   20280   30010   195   1697   100
3       4095   1905   19860   29780   184   1662   97
4       3650   1675   18980   29330   164   1598   97
5       3200   1474   18100   28960   144   1541   97
6       4833   2239   20740   30083   215   1709   87
7       4617   2120   20305   29831   206   1669   87
8       4340   1990   19961   29604   195   1640   87
9       3820   1702   18916   29088   171   1572   85
10      3368   1487   18012   28675   149   1522   85
11      4445   2107   20520   30120   195   1740   101
12      4188   1973   20130   29920   190   1711   100
13      3981   1864   19780   29720   180   1682   100
14      3622   1674   19020   29370   161   1630   100
15      3125   1440   18030   28940   139   1572   101
16      4560   2165   20680   30160   208   1704   98
17      4340   2048   20340   29960   199   1679   96
18      4115   1916   19860   29710   187   1642   94
19      3630   1658   18950   29250   164   1576   94
20      3210   1489   18700   28890   145   1528   94
21      4330   2062   20500   30190   193   1748   101
22      4119   1929   20050   29960   183   1713   100
23      3891   1815   19680   29770   173   1684   100
24      3467   1595   18890   29360   153   1624   99
25      3045   1400   17870   28960   134   1569   100
26      4411   2047   20540   30160   193   1746   99
27      4203   1935   20160   29940   184   1714   99
28      3968   1807   19750   29760   173   1679   99
29      3531   1591   18890   29350   153   1621   99
30      3074   1388   17870   28910   133   1561   99
31      4350   2071   20460   30180   198   1729   102
32      4128   1944   20010   29940   186   1692   101
33      3940   1830   19640   29750   178   1667   101
34      3480   1612   18710   29360   156   1609   101
35      3064   1410   17780   28900   136   1552   101
36      4402   2066   20520   30170   197   1758   100
37      4180   1954   20150   29950   188   1729   99
38      3973   1835   19750   29740   178   1690   99
39      3530   1616   18850   29320   156   1616   99
40      3080   1407   17910   28910   137   1569   100

Problem 7a
Fit the following simple regression model to the data, where x1 is the primary speed of rotation. Perform all appropriate statistical tests.
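The least-squares estimates in Problem 7a come from the usual formulas β̂1 = Sxy/Sxx and β̂0 = ȳ - β̂1x̄. Here is a minimal pure-Python sketch, assuming the y and x1 columns of the table above. It should reproduce the Excel coefficients and R² that follow. The helper name `simple_ols` is illustrative.

```python
from statistics import mean

# Problem 7: thrust y and primary speed of rotation x1 for samples 1 to 40
y = [4540, 4315, 4095, 3650, 3200, 4833, 4617, 4340, 3820, 3368,
     4445, 4188, 3981, 3622, 3125, 4560, 4340, 4115, 3630, 3210,
     4330, 4119, 3891, 3467, 3045, 4411, 4203, 3968, 3531, 3074,
     4350, 4128, 3940, 3480, 3064, 4402, 4180, 3973, 3530, 3080]
x1 = [2140, 2016, 1905, 1675, 1474, 2239, 2120, 1990, 1702, 1487,
      2107, 1973, 1864, 1674, 1440, 2165, 2048, 1916, 1658, 1489,
      2062, 1929, 1815, 1595, 1400, 2047, 1935, 1807, 1591, 1388,
      2071, 1944, 1830, 1612, 1410, 2066, 1954, 1835, 1616, 1407]

def simple_ols(x, y):
    """Return (b0, b1, r_squared) for the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    r_squared = sxy ** 2 / (sxx * syy)   # equals SSR / SST for simple regression
    return b0, b1, r_squared

b0, b1, r2 = simple_ols(x1, y)
print(f"y-hat = {b0:.3f} + {b1:.5f} x1,  R^2 = {r2:.5f}")
```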
$$Y = \beta_0 + \beta_1 x_1 + \epsilon$$

SUMMARY OUTPUT
Regression Statistics
Multiple R           0.99501091
R Square             0.99004671
Adjusted R Square    0.98978478
Standard Error       51.0047533
Observations         40

ANOVA
             df   SS           MS           F            Significance F
Regression   1    9833179.58   9833179.58   3779.83348   1.182E-39
Residual     38   98856.4247   2601.48486
Total        39   9932036

             Coefficients   Standard Error   t Stat       P-value      Lower 95%    Upper 95%
Intercept    296.904179     59.2223721       5.0133787    1.2733E-05   177.014756   416.793603
x1           1.99298073     0.03241655       61.4803504   1.182E-39    1.92735686   2.0586046

Critical values: F.05,1,38 = 4.0982; t.025,38 = 2.0244.
Since F0 = 3779.8 > 4.0982, the regression is significant. Both t statistics (5.01 for the intercept and 61.48 for x1) exceed 2.0244, so both coefficients are significant.

Problem 7b
Is there a better fit using a nonlinear function of x1, such as a log, power, or exponential function?
Excel trendlines fitted to the scatter plot of y versus x1:
Logarithmic:  y = 3496.6 ln(x) - 22290,   R² = 0.9886
Power:        y = 3.8129 x^0.9241,         R² = 0.9906
Exponential:  y = 1497.1 e^(0.0005x),      R² = 0.9851
[Figure: scatter plots of thrust y (0 to 6000) versus x1 (1200 to 2400) with each fitted trendline]
Against the linear model's R² = 0.9900, only the power fit is marginally higher.

Problem 7c
Now fit the following multiple regression model to the data, where x3 is the fuel flow rate and x4 is the pressure. Predict the engine thrust if the fuel flow rate is 30,000 and the pressure is 190.
$$Y = \beta_0 + \beta_1 x_3 + \beta_2 x_4 + \epsilon$$

             Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept    -2031.66       1152.656         -1.76259   0.086229   -4367.17    303.8398
x3           0.082854       0.043608         1.899983   0.06525    -0.0055     0.171212
x4           19.96392       0.869995         22.94718   1.77E-23   18.20114    21.7267

Regression Statistics
Multiple R           0.995481
R Square             0.990983
Adjusted R Square    0.990496
Standard Error       49.19826
Observations         40

ŷ = -2031.66 + .082854(30,000) + 19.96392(190) = 4247.10

Problem 7d
Test for the significance of the regression model at the one percent level. What is the critical F-value?
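Parts (c) and (d) of Problem 7 need only numbers already in the output. The prediction plugs x3 = 30,000 and x4 = 190 into the fitted equation. The overall F statistic can be recovered from R² through F0 = (R²/k)/((1 - R²)/(n - k - 1)), with k = 2 regressors and n = 40. The critical value F.01,2,37 = 5.229 is taken from tables, not computed.

```python
# Fitted coefficients from the Excel output for y = b0 + b1*x3 + b2*x4
b0, b_x3, b_x4 = -2031.66, 0.082854, 19.96392

def predict_thrust(fuel_flow, pressure):
    """Point prediction of engine thrust from the two-regressor model."""
    return b0 + b_x3 * fuel_flow + b_x4 * pressure

y_hat = predict_thrust(30_000, 190)

# Overall significance test recovered from R-squared
r2, n, k = 0.990983, 40, 2
f0 = (r2 / k) / ((1 - r2) / (n - k - 1))
f_crit = 5.229                        # F.01,2,37 from tables
significant = f0 > f_crit

print(f"y-hat = {y_hat:.2f}, F0 = {f0:.1f}, significant: {significant}")
```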
ANOVA
             df   SS         MS         F          Significance F
Regression   2    9842479    4921239    2033.176   1.47E-38
Residual     37   89557.34   2420.469
Total        39   9932036

F.01,2,37 = 5.229. Since F0 = 2033.18 > 5.229, the regression is significant.

Problem 7e
Test each of the regression coefficients at the 5 percent level of significance. What are the critical t values?
(Coefficient table as in Problem 7c.)
Critical values: ±t.025,37 = ±2.0262. Only x4 (t = 22.95) is significant; the intercept (t = -1.763) and x3 (t = 1.900) are not.

Problem 7f
Construct a 97 percent confidence interval on the mean response and a prediction interval at a fuel flow rate of 30,000 and a pressure of 190.
Confidence interval on the mean response:
$$\hat\mu_{Y|x_0} - t_{\alpha/2,n-p}\sqrt{\hat\sigma^2\,x_0'(X'X)^{-1}x_0} \le \mu_{Y|x_0} \le \hat\mu_{Y|x_0} + t_{\alpha/2,n-p}\sqrt{\hat\sigma^2\,x_0'(X'X)^{-1}x_0}$$
Prediction interval on a new observation:
$$\hat{y}_0 - t_{\alpha/2,n-p}\sqrt{\hat\sigma^2\left(1 + x_0'(X'X)^{-1}x_0\right)} \le Y_0 \le \hat{y}_0 + t_{\alpha/2,n-p}\sqrt{\hat\sigma^2\left(1 + x_0'(X'X)^{-1}x_0\right)}$$

Predicted Values
Fit       StDev Fit   95.0% CI               95.0% PI
4247.10   10.50       (4225.83, 4268.38)     (4145.17, 4349.03)

The output uses t.025,37 = 2.0262, so it reports 95 percent intervals. A 97 percent interval uses t.015,37 in place of 2.0262, with the same standard errors.

Problem 7g
Generate a multiple regression model using all the predictor variables. Eliminate any variables that do not pass the t-test at the 5 percent level of significance. Form a new regression model with the remaining variables. Is this a better model than the one found in (f)?
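Parts (e) and (f) can also be checked from the printed output. In this sketch, each t statistic is the coefficient divided by its standard error, compared with t.025,37 = 2.0262. The intervals use the Minitab fit, its standard error (StDev Fit = 10.50), and σ̂ = 49.19826 from the regression statistics. Because the code uses the same 2.0262 as the output, it reproduces the 95 percent intervals; for 97 percent, t.015,37 would be substituted.

```python
from math import sqrt

t_crit = 2.0262                       # t.025,37 from tables

# (e) coefficient t tests: (estimate, standard error) from the output
coefs = {
    "Intercept": (-2031.66, 1152.656),
    "x3": (0.082854, 0.043608),
    "x4": (19.96392, 0.869995),
}
t_stats = {name: est / se for name, (est, se) in coefs.items()}
significant = [name for name, t in t_stats.items() if abs(t) > t_crit]

# (f) interval estimates at x3 = 30,000 and x4 = 190
fit, se_fit, sigma_hat = 4247.10, 10.50, 49.19826
ci_half = t_crit * se_fit                               # mean response
pi_half = t_crit * sqrt(sigma_hat ** 2 + se_fit ** 2)   # new observation
ci = (fit - ci_half, fit + ci_half)
pi = (fit - pi_half, fit + pi_half)

print({name: round(t, 3) for name, t in t_stats.items()}, significant)
print(f"CI = ({ci[0]:.2f}, {ci[1]:.2f}),  PI = ({pi[0]:.2f}, {pi[1]:.2f})")
```

The prediction half-width adds σ̂² to the squared standard error of the fit, because σ̂²·x0'(X'X)⁻¹x0 is exactly (StDev Fit)².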
Full model (all six regressors)
             Coefficients   Standard Error   t Stat     P-value
Intercept    -4726.381      2445.448         -1.93273   0.06189
x1           1.11868286     0.280075         3.994233   0.000342
x2           -0.0312141     0.038277         -0.81548   0.420643
x3           0.23070284     0.118034         1.954546   0.059153
x4           3.88373596     2.638334         1.472041   0.150483
x5           0.82658321     0.351328         2.352739   0.024749
x6           -17.027503     2.598349         -6.5532    1.91E-07

Regression Statistics
Multiple R           0.99883169
R Square             0.99766474
Adjusted R Square    0.99724014
Standard Error       26.5112435

t.025,33 = 2.0345. The worksheet's "crit t = 2.348338" is t.0125,33, a two-sided 2.5 percent value. At the 5 percent level only x1, x5, and x6 pass the t-test.

More Problem 7g – reduced model with x1, x5, x6
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.99858127
R Square             0.99716455
Adjusted R Square    0.99692827
Standard Error       27.9691109
Observations         40

The regression equation is y = 367 + 1.71 x1 + 1.10 x5 - 14.0 x6

ANOVA
             df   SS         MS         F
Regression   3    9903874    3301291    4220.137
Residual     36   28161.76   782.2712
Total        39   9932036

             Coefficients   Standard Error   t Stat     P-value
Intercept    366.52906      198.004          1.851119   0.07237
x1           1.70592044     0.066433         25.679     9.66E-25
x5           1.09629856     0.254142         4.313724   0.00012
x6           -13.970234     1.667503         -8.37794   5.61E-10

crit t = 2.339061 (t.0125,36). All three regressors are significant. Compared with the x3, x4 model used in (f), the reduced model has a smaller standard error (27.97 versus 49.20) and a larger adjusted R² (0.99693 versus 0.99050).

The End of a Most Wonderful Practice Final Exam
There is no more.

What About ANOVA?
I can do this.
A ten-point DOE question:
• either a CRD or an RBD
• input data and design given
• partial ANOVA table given
• complete the ANOVA table (no sums of squares calculations needed)
• find the critical F value
• test hypotheses or significance

ANOVA – RBD
Source of Variation   SS     df   MS   F
Rows                  xxxx
Columns               xxxx
Error
Total                 xxxx   xx

Five Most Wonderful Problems
1. An estimation problem – Chapter 7
2. A confidence interval problem – Chapter 8
3. A hypothesis testing problem – Chapters 9 and 10
4. A regression problem – Chapters 11 and 12
5. An ANOVA problem – Chapter 13

Final Exam Instructions
This is a 120-minute open-book exam. You may use a computer or calculator. Each question is weighted as shown. Complete the answer sheet at the end of the test and submit all of your work.
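For the ANOVA question, completing an RBD table is bookkeeping. The degrees of freedom come from the layout, the error sum of squares follows by subtraction, and each mean square and F ratio follows from those. This sketch assumes a design with r rows and c columns and one observation per cell. It also assumes the Rows, Columns, and Total sums of squares are the ones given. The function names and argument order are illustrative. The critical F value still comes from the F table, using the row (or column) df and the error df.

```python
def complete_rbd_anova(ss_rows, ss_cols, ss_total, r, c):
    """Fill in an RBD ANOVA table (one observation per cell).

    Returns {source: (SS, df, MS, F)}; MS and F are None where they do not apply.
    """
    df = {
        "Rows": r - 1,
        "Columns": c - 1,
        "Error": (r - 1) * (c - 1),
        "Total": r * c - 1,
    }
    ss_error = ss_total - ss_rows - ss_cols   # SS partition: Total = Rows + Columns + Error
    ms_error = ss_error / df["Error"]
    table = {}
    for source, ss in (("Rows", ss_rows), ("Columns", ss_cols)):
        ms = ss / df[source]
        table[source] = (ss, df[source], ms, ms / ms_error)
    table["Error"] = (ss_error, df["Error"], ms_error, None)
    table["Total"] = (ss_total, df["Total"], None, None)
    return table


def print_table(table):
    """Print the completed table in the Source / SS / df / MS / F layout."""
    print(f"{'Source':<10}{'SS':>10}{'df':>5}{'MS':>10}{'F':>8}")
    for source, (ss, dof, ms, f) in table.items():
        ms_txt = f"{ms:.3f}" if ms is not None else ""
        f_txt = f"{f:.3f}" if f is not None else ""
        print(f"{source:<10}{ss:>10.3f}{dof:>5}{ms_txt:>10}{f_txt:>8}")
```

Each F value in the completed table is compared with F(α; row or column df, error df) from the table.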
Additional Instructions for Internet Students
Within 10 minutes following the 2-hour exam, complete the table below and submit this page either by email (Ebeling@udayton.edu) or fax (937 229-2698). You may then send, deliver, or fax any additional work, which should be received by the instructor within 2 hours after completing the exam. If any response in the table below is blank, you will receive a zero (0) for that response. If you partially complete a problem, submit what you have completed. Any additional work received within 2 hours will be evaluated only to the extent that it supports your original submission. Include your name on this answer sheet. Your last name must be part of the file name of any email attachment. Failure to follow these instructions will result in lost points.

Final Exam Registration
You need to register for the exam by today (Wednesday, December 9th).
Register by completing the online form at the following course webpage: http://academic.udayton.edu/CharlesEbeling/ENM500/exams/register_final.htm
Indicate the time and place (internet email address or regular campus class).
You must provide justification for taking the exam via the internet!