Statistics 242.3 – Review Questions for Final Exam - Solutions

I. An embryologist wished to estimate the mean difference in the time at which eye pigmentation is first evidenced in an embryo for 2 varieties of birds. A sample of \(n_1 = 33\) embryos from variety A revealed that it took an average of 74.32 hours for eye pigmentation to appear, with a standard deviation of 2.51 hours. A second sample of \(n_2 = 33\) embryos from variety B revealed that it took an average of 100.21 hours for eye pigmentation to appear, with a standard deviation of 19.44 hours.

a) Estimate the difference in the mean time at which eye pigmentation appears for the two varieties of birds, with a 95% confidence interval.

Solution:
\[ \bar{x}_2 - \bar{x}_1 \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \quad\text{or}\quad 100.21 - 74.32 \pm 1.960\sqrt{\frac{2.51^2}{33} + \frac{19.44^2}{33}}, \]
that is, \(25.89 \pm 6.69\), or 19.20 to 32.58 hours.

b) Using a 5% significance level, test the null hypothesis
\(H_0\): Variety A's pigmentation requires the same period of time to appear as variety B's
against the alternative hypothesis
\(H_A\): Variety A's pigmentation requires a shorter period of time to appear than variety B's.

Solution: The test statistic is
\[ z = \frac{\bar{x}_2 - \bar{x}_1}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{100.21 - 74.32}{\sqrt{\dfrac{2.51^2}{33} + \dfrac{19.44^2}{33}}} = 7.59. \]
Critical region: reject \(H_0\) if \(z > z_{0.05} = 1.645\). Since 7.59 > 1.645, \(H_0\) is rejected.

c) Assume that the cost of obtaining an embryo of variety A is $0.25 while the cost of obtaining an embryo of variety B is $1.50. Determine the optimal number of embryos of each variety that would yield an estimate of the difference \(\mu_A - \mu_B\) having an error bound of 2 hours with a 95% level of confidence.

Solution (sample size determination):
\[ n_1 = \frac{z_{\alpha/2}^2}{B^2}\left(\sigma_x^2 + \sqrt{\frac{c_2}{c_1}}\,\sigma_x\sigma_y\right) \quad\text{and}\quad n_2 = \frac{z_{\alpha/2}^2}{B^2}\left(\sigma_y^2 + \sqrt{\frac{c_1}{c_2}}\,\sigma_x\sigma_y\right), \]
where \(B\) = error bound = 2, \(z_{\alpha/2} = z_{0.025} = 1.960\), \(c_1 = 0.25\), \(c_2 = 1.50\), and \(\sigma_x\), \(\sigma_y\) are estimated by \(s_1 = 2.51\) and \(s_2 = 19.44\). Thus
\[ n_1 = \frac{1.960^2}{2^2}\left(2.51^2 + \sqrt{\frac{1.50}{0.25}}\,(2.51)(19.44)\right) \approx 121 \]
and
\[ n_2 = \frac{1.960^2}{2^2}\left(19.44^2 + \sqrt{\frac{0.25}{1.50}}\,(2.51)(19.44)\right) \approx 382. \]

II. An experiment with two outcomes (Success and Failure) is repeated \(n\) times independently. Let \(p\) denote the probability of success (so that \(1 - p\) is the probability of failure).
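Let \(\delta = p - (1 - p) = 2p - 1\) denote the difference between the probability of success and the probability of failure.

a) Find \(\hat{\delta}\), the maximum likelihood estimate of \(\delta\).

Solution: Let \(x_1, x_2, \ldots, x_n\) be such that
\[ x_i = \begin{cases} 1 & \text{if the } i\text{th repetition is a "Success"} \\ 0 & \text{if the } i\text{th repetition is a "Failure".} \end{cases} \]
Then, since \(p = (1 + \delta)/2\),
\[ f(x) = P[x_i = x] = \left(\frac{1+\delta}{2}\right)^{x}\left(\frac{1-\delta}{2}\right)^{1-x}, \quad x = 0, 1. \]
The joint distribution of \(x_1, x_2, \ldots, x_n\) is
\[ L(\delta) = f(x_1, \ldots, x_n) = \prod_{i=1}^{n}\left(\frac{1+\delta}{2}\right)^{x_i}\left(\frac{1-\delta}{2}\right)^{1-x_i} = \left(\frac{1+\delta}{2}\right)^{S}\left(\frac{1-\delta}{2}\right)^{n-S}, \quad\text{where } S = \sum_{i=1}^{n} x_i, \]
so that
\[ l = \ln L = S\ln\left(\frac{1+\delta}{2}\right) + (n - S)\ln\left(\frac{1-\delta}{2}\right). \]
Setting the derivative to zero,
\[ \frac{dl}{d\delta} = \frac{S}{1+\delta} - \frac{n-S}{1-\delta} = 0 \quad\text{if}\quad S(1-\delta) = (n-S)(1+\delta), \text{ i.e. } 2S - n = n\delta. \]
Thus
\[ \hat{\delta} = \frac{2S - n}{n} = \frac{2S}{n} - 1 = 2\hat{p} - 1, \quad\text{where } \hat{p} = \frac{S}{n}. \]

b) Determine the approximate sampling distribution of \(\hat{\delta}\).

Solution: The sampling distribution of \(\hat{p}\) is approximately normal with mean \(p\) and standard deviation \(\sqrt{p(1-p)/n}\). Thus the sampling distribution of \(\hat{\delta} = 2\hat{p} - 1\) is also approximately normal, with mean \(2p - 1 = \delta\) and standard deviation
\[ 2\sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{(1+\delta)(1-\delta)}{n}}. \]

c) Let \(x_1, x_2, x_3, \ldots, x_n\) denote a random sample of size \(n\) from the normal distribution with mean \(\theta\) and standard deviation \(\theta\). Find the maximum likelihood estimate of \(\theta\).

Solution: The joint density of \(x_1, x_2, x_3, \ldots, x_n\) is
\[ L(\theta) = f(x_1, \ldots, x_n) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\,\theta}\,e^{-\frac{(x_i-\theta)^2}{2\theta^2}} = \frac{1}{(2\pi)^{n/2}\theta^{n}}\,e^{-\frac{1}{2\theta^2}\sum_{i=1}^{n}(x_i-\theta)^2}, \]
\[ l = \ln L = -\frac{n}{2}\ln(2\pi) - n\ln\theta - \frac{1}{2\theta^2}\sum_{i=1}^{n}(x_i-\theta)^2. \]
Thus
\[ \frac{dl}{d\theta} = -\frac{n}{\theta} + \frac{1}{\theta^3}\sum_{i=1}^{n}(x_i-\theta)^2 + \frac{1}{\theta^2}\sum_{i=1}^{n}(x_i-\theta) = 0, \]
or, multiplying by \(-\theta^3\) and expanding,
\[ n\theta^2 + \theta\sum_{i=1}^{n}x_i - \sum_{i=1}^{n}x_i^2 = 0 \quad\text{and}\quad \theta^2 + \bar{x}\theta - \left[\left(1-\tfrac{1}{n}\right)s^2 + \bar{x}^2\right] = 0, \]
using \(\sum x_i^2 = (n-1)s^2 + n\bar{x}^2\). This has two solutions, by the quadratic formula:
\[ \hat{\theta} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = \frac{-\bar{x} \pm \sqrt{\bar{x}^2 + 4\left[\left(1-\tfrac{1}{n}\right)s^2 + \bar{x}^2\right]}}{2} = \frac{-\bar{x} \pm \sqrt{5\bar{x}^2 + 4\left(1-\tfrac{1}{n}\right)s^2}}{2}. \]
Since \(\theta\) is a standard deviation it must be positive, so the root with the plus sign is taken. The second derivative is
\[ \frac{d^2l}{d\theta^2} = \frac{n}{\theta^2} + \frac{2n\bar{x}}{\theta^3} - \frac{3\left[(n-1)s^2 + n\bar{x}^2\right]}{\theta^4} = \frac{n}{\theta^4}\left\{\theta^2 + 2\bar{x}\theta - 3\left[\left(1-\tfrac{1}{n}\right)s^2 + \bar{x}^2\right]\right\}, \]
and, substituting \(\hat{\theta}^2 + \bar{x}\hat{\theta} = \left(1-\tfrac{1}{n}\right)s^2 + \bar{x}^2\),
\[ \frac{d^2l}{d\theta^2} = \frac{n}{\hat{\theta}^4}\left\{\bar{x}\hat{\theta} - 2\left[\left(1-\tfrac{1}{n}\right)s^2 + \bar{x}^2\right]\right\} = -\frac{n}{\hat{\theta}^4}\left\{\hat{\theta}^2 + \left(1-\tfrac{1}{n}\right)s^2 + \bar{x}^2\right\} < 0 \quad\text{at } \theta = \hat{\theta}, \]
so the likelihood has a maximum there.

III. In a certain type of test specimen, the normal stress on a specimen is known to be functionally related to the shear resistance. The following table gives experimental data on the two variables, normal stress (X) and shear resistance (Y).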
Normal stress (X)       26.8   25.4   28.9   32.6   27.7   23.9
Shear resistance (Y)    26.5   27.3   24.2   27.1   23.6   25.9

\(\sum x = 165.3\), \(\sum x^2 = 4599.9\), \(\sum y = 154.6\), \(\sum y^2 = 3995.4\), \(\sum xy = 4259.2\)

a) Plot a scatter plot of this data and comment.

[Scatter plot: shear resistance (Y), vertical axis from 22 to 28, against normal stress (X), horizontal axis from 20 to 34.]

Comment: There seems to be no relationship between X and Y.

b) Estimate the least squares line for predicting shear resistance (Y) from normal stress (X). Plot its graph and comment on the values of its parameters.

\(S_{xx} = 45.855\), \(S_{yy} = 11.8333\), \(S_{xy} = -0.0400\)
\[ \hat{\beta} = \frac{S_{xy}}{S_{xx}} = \frac{-0.0400}{45.855} = -0.00087, \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = \frac{154.6}{6} - (-0.00087)\,\frac{165.3}{6} = 25.79. \]

[Plot of the fitted least squares line on the same axes as the scatter plot.]

c) Using a 5% significance level, test to determine if shear resistance (Y) is not functionally related to normal stress (X).

The test statistic is
\[ t = \frac{\hat{\beta}}{s/\sqrt{S_{xx}}} \quad\text{where}\quad s = \sqrt{\frac{S_{xx}S_{yy} - S_{xy}^2}{(n-2)\,S_{xx}}} = 1.71998. \]
Thus
\[ t = \frac{-0.000872}{1.71998/\sqrt{45.855}} = -0.00343. \]
Comparing this with \(t_{0.025} = 2.776\) for 4 d.f., we cannot reject \(H_0: \beta = 0\).

d) What would you expect for the shear resistance (Y) of a specimen that was known to have a normal stress measurement (X) of 25 units? Repeat the calculation for a normal stress measurement (X) of 30 units. Compute 95% prediction limits for the shear resistance measurements (Y) in each of these cases.

The predicted value of Y when \(X = x_0\) is
\[ \hat{y} = \hat{\alpha} + \hat{\beta}x_0 = 25.7907 - 0.000872\,x_0 = \begin{cases} 25.769 & \text{when } x_0 = 25 \\ 25.765 & \text{when } x_0 = 30. \end{cases} \]
Prediction limits for Y when \(X = x_0\) are
\[ \hat{y} \pm t_{\alpha/2}\,s\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} = \begin{cases} 20.306 \text{ to } 31.231 & \text{when } x_0 = 25 \\ 20.325 \text{ to } 31.204 & \text{when } x_0 = 30. \end{cases} \]

IV. An urn contains 10 balls, of which \(\theta\) balls are blue (the rest being red and white). We are interested in testing the null hypothesis \(H_0: \theta = 3\) versus \(H_A: \theta = 4\). Suppose that we take a sample of 3 balls, and reject \(H_0\) if all 3 draws yield blue balls. Compute \(\alpha\), the probability of a type I error. Compute \(\beta\), the probability of a type II error.

a) Assuming sampling without replacement.
\[ \alpha = P[\text{type I error}] = P[\text{rejecting } H_0 \text{ when true}] = \frac{\binom{3}{3}}{\binom{10}{3}} = \frac{3}{10}\cdot\frac{2}{9}\cdot\frac{1}{8} = \frac{1}{120} \]
\[ \beta = P[\text{type II error}] = P[\text{accepting } H_0 \text{ when false}] = 1 - \frac{\binom{4}{3}}{\binom{10}{3}} = 1 - \frac{4}{10}\cdot\frac{3}{9}\cdot\frac{2}{8} = \frac{29}{30} \]

b) Assuming sampling with replacement.
\[ \alpha = P[\text{type I error}] = P[\text{rejecting } H_0 \text{ when true}] = \left(\frac{3}{10}\right)^3 = \frac{27}{1000} \]
\[ \beta = P[\text{type II error}] = P[\text{accepting } H_0 \text{ when false}] = 1 - \left(\frac{4}{10}\right)^3 = \frac{936}{1000} \]

V. The owner of a sporting goods store was interested in determining if there was any difference in the tension strength of a newly strung tennis racket due to the technician who performed the task of stringing the racket. The store had five technicians who strung tennis rackets. Each was asked to string n = 10 tennis rackets. The data are summarized in the table below:

Technician                 A      B      C      D      E
Mean tension strength    45.3   50.7   40.2   61.8   49.2
Standard deviation        8.6   12.1   10.8    8.6   20.2

Analyze this data.

Solution: Using the Anova F-test.

Technician     A     B     C     D     E     Grand Total (G)
Total T_i     451   507   402   618   492    2470

(The total used for technician A, 451, corresponds to a mean of 45.1 rather than the tabulated 45.3.)

\[ SS_{\text{Between}} = \sum_{i=1}^{k}\frac{T_i^2}{n_i} - \frac{G^2}{N} = 124604.2 - \frac{2470^2}{50} = 2586.2 \]
\[ MS_{\text{Within}} = \frac{\sum_{i=1}^{k}(n_i - 1)s_i^2}{N - k} = 163.802, \qquad SS_{\text{Within}} = (N - k)\,MS_{\text{Within}} = 45 \times 163.802 = 7371.09 \]

Thus the Anova table is:

Source      SS        df    MS        F         p-value
Between     2586.2     4    646.55    3.94714   0.00788
Within      7371.09   45    163.802
Total       9957.29   49

There is a significant difference among the technicians in mean tension strength.

VI. Notice that from the table we see that in the sample of \(n_1 = 2108\) Catholics, 571 of their fathers reached the High School graduate level of education, while for the sample of \(n_2 = 1558\) Protestants, 446 of their fathers reached the High School graduate level of education.

a) Determine 95% confidence limits for the proportion of Catholics whose fathers reach the High School graduate level of education.

With \(\hat{p} = 571/2108 = 0.2709\), \(n = 2108\) and \(z_{0.025} = 1.960\):
\[ \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \quad\text{or 0.252 to 0.290.} \]

b) Determine 99% confidence limits for the proportion of Protestants whose fathers reach the High School graduate level of education.
With \(\hat{p} = 446/1558 = 0.2863\), \(n = 1558\) and \(z_{0.005} = 2.576\):
\[ \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \quad\text{or 0.257 to 0.316.} \]

c) Determine 95% confidence limits for the difference in the proportion of Protestants and Catholics whose fathers reach the High School graduate level of education.

Taking the difference as Protestants minus Catholics, \(\hat{p}_2 - \hat{p}_1 = 0.2863 - 0.2709 = 0.0154\):
\[ \hat{p}_2 - \hat{p}_1 \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \quad\text{or } -0.014 \text{ to } 0.045. \]

d) Determine 99% confidence limits for the difference in the proportion of Protestants and Catholics whose fathers reach the High School graduate level of education.
\[ \hat{p}_2 - \hat{p}_1 \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \quad\text{or } -0.023 \text{ to } 0.054. \]

e) Is there a significant difference (\(\alpha = 0.01\)) between the proportion of Protestants and Catholics whose fathers reach the High School graduate level of education?

The test statistic is
\[ z = \frac{\hat{p}_2 - \hat{p}_1}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = 1.029, \quad\text{where}\quad \hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = 0.2774. \]
Comparing \(z\) with \(z_{0.005} = 2.576\), we accept \(H_0\): no difference.

VII. Suppose that X and Y are independent unbiased measurements of the angles \(\theta\) and \(3\theta\) respectively. Namely \(E(X) = \theta\) and \(E(Y) = 3\theta\). In addition assume that X and Y have the same variance \(\sigma^2\).

a) Determine the conditions on a and b that would result in T = aX + bY being an unbiased estimator of \(\theta\).
\[ E[T] = E[aX + bY] = aE[X] + bE[Y] = a\theta + b(3\theta) = (a + 3b)\theta = \theta \quad\text{if } a + 3b = 1. \]

b) Determine the values of a and b that would make T = aX + bY the unbiased estimator of \(\theta\) with the smallest variance.
\[ V = \mathrm{Var}(T) = \mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) = \sigma^2\left(a^2 + b^2\right) = \sigma^2\left[a^2 + \left(\frac{1-a}{3}\right)^2\right]. \]
V is minimized when
\[ \frac{dV}{da} = \sigma^2\left[2a - \frac{2}{3}\left(\frac{1-a}{3}\right)\right] = 0. \]
That is, \(a = \dfrac{1-a}{9}\), or \(9a = 1 - a\), so \(10a = 1\) and \(a = \dfrac{1}{10}\). Also
\[ b = \frac{1-a}{3} = \frac{1 - \tfrac{1}{10}}{3} = \frac{3}{10}. \]
Hence \(T = \tfrac{1}{10}X + \tfrac{3}{10}Y\).

c) Assume that X and Y are normally distributed with standard deviation \(\sigma = 3.0\) degrees. Determine the formula for a 95% confidence interval for \(\theta\) based on the statistic T. Use this to compute an estimate of \(\theta\) with the observations X = 14 and Y = 46.

\(T = \tfrac{1}{10}X + \tfrac{3}{10}Y\) has a Normal distribution with mean \(\theta\) and variance
\[ V = \mathrm{Var}(T) = \left(a^2 + b^2\right)\sigma^2 = \left[\left(\tfrac{1}{10}\right)^2 + \left(\tfrac{3}{10}\right)^2\right]3^2 = 0.9. \]
Thus
\[ z = \frac{T - \theta}{\sqrt{0.9}} = \frac{T - \theta}{0.948683} \]
has a standard Normal distribution, and
\[ 0.95 = P[-z_{0.025} \le z \le z_{0.025}] = P\left[-1.96 \le \frac{T - \theta}{0.948683} \le 1.96\right] = P[-1.859 \le T - \theta \le 1.859] = P[T - 1.859 \le \theta \le T + 1.859]. \]
Hence \(T \pm 1.859\) is a 95% confidence interval for \(\theta\). Since X = 14 and Y = 46,
\[ T = \tfrac{1}{10}X + \tfrac{3}{10}Y = \tfrac{1}{10}(14) + \tfrac{3}{10}(46) = 15.2, \]
and the 95% confidence limits for \(\theta\) are \(15.2 \pm 1.859\), or 13.341 to 17.059.

VIII. In a study to investigate the effect of regular physical exercise on the reduction of high blood pressure in males aged 60-65, the researchers selected at random a sample of \(n_1 = 12\) males suffering from high blood pressure in the given age group and for a two-year period placed the subjects on a daily physical exercise program. A second sample of \(n_2 = 12\) males suffering from high blood pressure in the given age group was selected, and again for the two-year period this group of subjects was placed on a physical exercise program which they were required to perform only once a week. Finally a third sample of \(n_3 = 12\) males suffering from high blood pressure in the given age group was selected. No exercise program was required for individuals in sample three. At the end of the two-year period the reduction in blood pressure was measured for each of the subjects in the study and is presented in the table below.

Table V.1: Reduction in blood pressure for three groups of males aged 60-65 initially suffering from high blood pressure.

              Daily Exercise   Weekly Exercise   No Exercise
                  21.2              15.7              5.6
                  15.6              18.5             -5.7
                   4.5              29.0            -14.8
                  32.0              17.4              4.9
                  20.4               6.7              8.6
                  12.2              11.4             18.6
                  11.2               4.1             -7.9
                  30.9              23.5             15.2
                  10.4              15.1             -2.3
                  23.7               5.8             -8.9
                  22.8              25.7             -1.2
                  11.2              19.7              8.6
Mean            18.00833          16.05000          1.72500
Std. Dev.        8.55904           7.95104         10.20010
Sum x            216.1             192.6             20.7
Sum x^2         4697.43           3786.64           1180.17

Carry out the Analysis of Variance.

Solution: With k = 3 groups and N = 36 observations, the degrees of freedom are k - 1 = 2 between groups and N - k = 33 within groups. The ANOVA table is:

Source      SS        df    MS         F        p-value
Between     1896.75    2    948.375    11.829   0.00013
Within      2645.70   33    80.1727
Total       4542.45   35

Thus we reject the hypothesis of equality between group means.

IX.
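Suppose that \(x_1, x_2, x_3, \ldots, x_n\) is a sample from the following distribution:
\[ f(x; \theta, \lambda) = \begin{cases} \lambda e^{-\lambda(x-\theta)} & \text{for } x \ge \theta \\ 0 & \text{otherwise.} \end{cases} \]

a) Determine method of moments estimators of \(\theta\) and \(\lambda\).

\[ \mu_1 = \int_{-\infty}^{\infty} x f(x; \theta, \lambda)\,dx = \int_{\theta}^{\infty} x\,\lambda e^{-\lambda(x-\theta)}\,dx \quad\text{and}\quad \mu_2 = \int_{-\infty}^{\infty} x^2 f(x; \theta, \lambda)\,dx = \int_{\theta}^{\infty} x^2\,\lambda e^{-\lambda(x-\theta)}\,dx. \]
Put \(u = x - \theta\). Then \(du = dx\), and when \(x = \theta, \infty\) then \(u = 0, \infty\). Hence
\[ \mu_1 = \int_0^{\infty}(u + \theta)\,\lambda e^{-\lambda u}\,du = \int_0^{\infty} u\,\lambda e^{-\lambda u}\,du + \theta\int_0^{\infty}\lambda e^{-\lambda u}\,du = \frac{1}{\lambda} + \theta, \]
\[ \mu_2 = \int_0^{\infty}(u + \theta)^2\,\lambda e^{-\lambda u}\,du = \int_0^{\infty} u^2\,\lambda e^{-\lambda u}\,du + 2\theta\int_0^{\infty} u\,\lambda e^{-\lambda u}\,du + \theta^2\int_0^{\infty}\lambda e^{-\lambda u}\,du = \frac{2}{\lambda^2} + \frac{2\theta}{\lambda} + \theta^2 = \left(\theta + \frac{1}{\lambda}\right)^2 + \frac{1}{\lambda^2}. \]
The method of moments estimators \(\tilde{\theta}, \tilde{\lambda}\) satisfy
\[ m_1 = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x} = \tilde{\theta} + \frac{1}{\tilde{\lambda}} \quad\text{and}\quad m_2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 = \left(\tilde{\theta} + \frac{1}{\tilde{\lambda}}\right)^2 + \frac{1}{\tilde{\lambda}^2}. \]
Thus \(\tilde{\theta} = m_1 - \dfrac{1}{\tilde{\lambda}}\) and \(m_2 = m_1^2 + \dfrac{1}{\tilde{\lambda}^2}\). Hence \(m_2 - m_1^2 = \dfrac{1}{\tilde{\lambda}^2}\), or
\[ \frac{1}{\tilde{\lambda}} = \sqrt{m_2 - m_1^2} \quad\text{(taking the positive root, since } \lambda \text{ is positive)}, \]
so that
\[ \tilde{\lambda} = \frac{1}{\sqrt{m_2 - m_1^2}} \quad\text{and}\quad \tilde{\theta} = m_1 - \sqrt{m_2 - m_1^2}. \]

b) Determine maximum likelihood estimators of \(\theta\) and \(\lambda\).

Solution: The joint density of \(x_1, x_2, x_3, \ldots, x_n\) is:
\[ L(\theta, \lambda) = \prod_{i=1}^{n} f(x_i; \theta, \lambda) = \begin{cases} \lambda^n e^{-\lambda\sum_{i=1}^{n}(x_i-\theta)} & \text{for } \min x_i \ge \theta \\ 0 & \text{otherwise,} \end{cases} \]
\[ l = \ln L = n\ln\lambda - \lambda\sum_{i=1}^{n}(x_i - \theta) \quad\text{for } \min x_i \ge \theta. \]
Then
\[ \frac{\partial l}{\partial\theta} = n\lambda > 0 \quad\text{and}\quad \frac{\partial l}{\partial\lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n}(x_i - \theta) = 0 \quad\text{if}\quad \frac{1}{\lambda} = \bar{x} - \theta. \]
Since \(l\) is increasing in \(\theta\) on \(\theta \le \min x_i\), the implication is that \(\hat{\theta} = \min x_i\) and \(\dfrac{1}{\hat{\lambda}} = \bar{x} - \hat{\theta}\), or
\[ \hat{\lambda} = \frac{1}{\bar{x} - \hat{\theta}} = \frac{1}{\bar{x} - \min x_i}. \]

X. Let \(x_1, x_2, x_3, \ldots, x_n\) be a sample of size \(n\) from the density function given by:
\[ f(x) = \begin{cases} \theta k x^{k-1} e^{-\theta x^k} & x \ge 0 \\ 0 & \text{elsewhere,} \end{cases} \]
where \(k\) is a known positive constant and \(\theta\) is an unknown parameter.

a) Find the maximum likelihood estimator \(\hat{\theta}\) of \(\theta\).

b) Find the method of moments estimator of \(\theta\).

XI. In the following study the investigator was interested in determining if the presence of heart disease was related to systolic blood pressure. The study consisted of four groups of subjects with differing levels of systolic blood pressure (<127, 127-146, 147-166, 167+). The data are tabulated below:

                          Systolic blood pressure (mm Hg)
Coronary heart disease    <127    127-146    147-166    167+
Present                     20       28         20        24
Absent                     388      527        204       118
Total                      408      555        224       142

Carry out the Chi-square test to determine if there are any significant (\(\alpha = 0.01\)) differences in the presence of heart disease between the four blood pressure groups.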
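Table: frequencies, (expected frequencies), standardized residuals

                          Systolic blood pressure (mm Hg)
Coronary heart disease    <127                  127-146               147-166               167+                   Total
Present                   20 (28.24) -1.551     28 (38.42) -1.681     20 (15.51) 1.141      24 (9.83) 4.520          92
Absent                    388 (379.76) 0.423    527 (516.58) 0.458    204 (208.49) -0.311   118 (132.17) -1.233    1237
Total                     408                   555                   224                   142                    1329

\[ \chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(x_{ij} - E_{ij})^2}{E_{ij}} = 28.966 \]
Compare with \(\chi^2_{0.01} = 11.345\) for \((r-1)(c-1) = 3\) d.f. (the corresponding 5% value is \(\chi^2_{0.05} = 7.815\)). Since 28.966 exceeds the critical value, reject independence. The large standardized residual (4.520) shows that the excess of heart disease in the 167+ group contributes most to the statistic.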
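The expected frequencies, standardized residuals and Chi-square statistic in the table above can be checked with a short calculation. The following is a minimal sketch in plain Python, not part of the original solution: the function name `chi_square_table` is a hypothetical helper introduced here for illustration, and the critical value is the tabulated \(\chi^2_{0.01}\) for 3 d.f. entered by hand rather than computed.

```python
# Observed counts from the table in problem XI.
# Rows: coronary heart disease present / absent.
# Columns: systolic blood pressure groups <127, 127-146, 147-166, 167+.
observed = [
    [20, 28, 20, 24],
    [388, 527, 204, 118],
]

# Tabulated upper 1% point of the chi-square distribution with 3 d.f.
# (taken from tables, not computed here).
CRITICAL_01 = 11.345


def chi_square_table(obs):
    """Hypothetical helper: expected counts, standardized residuals,
    chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    n = sum(row_totals)

    # E_ij = (row total i) * (column total j) / (grand total)
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Standardized residual: (x_ij - E_ij) / sqrt(E_ij)
    residuals = [
        [(o - e) / e ** 0.5 for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(obs, expected)
    ]

    # The chi-square statistic is the sum of squared standardized residuals.
    chi2 = sum(z * z for row in residuals for z in row)
    df = (len(obs) - 1) * (len(obs[0]) - 1)
    return expected, residuals, chi2, df


expected, residuals, chi2, df = chi_square_table(observed)

for exp_row, res_row in zip(expected, residuals):
    print([round(e, 2) for e in exp_row], [round(z, 3) for z in res_row])
print("chi-square =", round(chi2, 3), "on", df, "d.f.")
print("reject independence at the 1% level:", chi2 > CRITICAL_01)
```

Because the statistic is the sum of the squared standardized residuals, the printed residuals also show directly which cells drive the result.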