Ch 12 Practice Session
Jia-Ying Chen

Introduction
We shall develop techniques to estimate and test three population parameters:
• Population mean μ
• Population variance σ²
• Population proportion p

Inference About a Population Mean When the Population Standard Deviation Is Unknown
Recall that when σ is known, we use the following statistic to estimate and test a population mean:
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
When σ is unknown, we use its point estimator s, and the z-statistic is replaced by the t-statistic.

The t-Statistic
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
The t distribution is mound-shaped and symmetrical around zero. The degrees of freedom (a function of the sample size) determine how spread out the distribution is compared to the normal distribution.
[Figure: t density curves centered at 0 for d.f. = v1 and d.f. = v2, with v1 < v2.]

Degrees of Freedom
In statistics, the degrees of freedom (d.f.) of a statistic is the number of observations in the sample that are independent or free to vary when a sample statistic is used to estimate a population parameter.
Ex: [example not included in the extracted slide text]

How to calculate the sample variance
From the data we have Σx_i and Σx_i², thus
$$s^2 = \frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}}{n - 1}$$

Testing μ when σ is unknown
Example 1
In order to determine the number of workers required to meet demand, the productivity of newly hired trainees is studied. It is believed that trainees can process and distribute more than 450 packages per hour within one week of hiring. Can we conclude that this belief is correct, based on productivity observations of 50 trainees?

Testing μ when σ is unknown
Example 1 – Solution
The problem objective is to describe the population of the number of packages processed in one hour. The data are interval.
H0: μ = 450
H1: μ > 450
The t statistic is
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}, \qquad \text{d.f.} = n - 1 = 49$$

Testing μ when σ is unknown
Solution continued (solving by hand)
The rejection region is
$$t > t_{\alpha, n-1} = t_{.05, 49} \approx t_{.05, 50} = 1.676$$
From the data we have Σx_i = 23,019 and Σx_i² = 10,671,357, thus
$$\bar{x} = \frac{23{,}019}{50} = 460.38, \qquad s^2 = \frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}}{n - 1} = 1507.55, \qquad s = \sqrt{1507.55} = 38.83$$
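As a cross-check of the hand calculation above, the following Python sketch recomputes the sample mean, variance, standard deviation, and t statistic from the two sums quoted on the slide. The function name `summary_from_sums` is illustrative and not part of the slides; the critical value 1.676 is the table value quoted in the solution, not something the code derives.

```python
import math

def summary_from_sums(n, sum_x, sum_x2):
    """Return (mean, variance, std dev) using the shortcut formula
    s^2 = [sum(x^2) - (sum x)^2 / n] / (n - 1)."""
    mean = sum_x / n
    variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    return mean, variance, math.sqrt(variance)

# Figures quoted for Example 1 (50 trainees).
n = 50
sum_x = 23_019
sum_x2 = 10_671_357
mu_0 = 450          # value of mu under H0
t_critical = 1.676  # t_{.05,50} from the t table, as quoted on the slide

mean, variance, sd = summary_from_sums(n, sum_x, sum_x2)
t_stat = (mean - mu_0) / (sd / math.sqrt(n))

print(f"mean = {mean:.2f}, s^2 = {variance:.2f}, s = {sd:.2f}")
print(f"t = {t_stat:.2f}, reject H0: {t_stat > t_critical}")
```

Working from the sums rather than the raw observations mirrors the by-hand approach; with the raw data available, `statistics.mean` and `statistics.variance` would give the same quantities.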
Testing μ when σ is unknown
The test statistic is
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{460.38 - 450}{38.83 / \sqrt{50}} = 1.89$$
[Figure: t distribution with the rejection region to the right of 1.676; the test statistic 1.89 falls inside it.]
Since 1.89 > 1.676, we reject the null hypothesis in favor of the alternative. There is sufficient evidence to infer that the mean productivity of trainees one week after being hired is greater than 450 packages per hour at the .05 significance level.

Estimating μ when σ is unknown
Confidence interval estimator of μ when σ is unknown:
$$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}, \qquad \text{d.f.} = n - 1$$

Estimating μ when σ is unknown
Example 2
An investor is trying to estimate the return on investment in companies that won quality awards last year. A random sample of 83 such companies is selected, and the return that would have been earned by investing in them is calculated. Construct a 95% confidence interval for the mean return.

Estimating μ when σ is unknown
Solution (solving by hand)
The problem objective is to describe the population of annual returns from buying shares of quality award-winners. The data are interval.
From the data we determine
$$\bar{x} = 15.02, \qquad s^2 = 68.98, \qquad s = \sqrt{68.98} = 8.31$$
With t_{.025,82} ≈ t_{.025,80} = 1.990,
$$\bar{x} \pm t_{\alpha/2, n-1} \frac{s}{\sqrt{n}} \approx 15.02 \pm 1.990 \cdot \frac{8.31}{\sqrt{83}} = (13.19,\ 16.85)$$

Checking the required conditions
We need to check that the population is normally distributed, or at least not extremely nonnormal. There are statistical methods to test for normality. From the sample histograms we see…
[Figure: histogram for Example 1; x-axis: Packages.]
[Figure: histogram for Example 2; x-axis: Returns.]

Summary of Test Statistics to be Used in a Hypothesis Test about a Population Mean
• n > 30?
  – Yes: σ known?
      Yes: $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
      No: use s to estimate σ, $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
  – No: population approximately normal?
      Yes: σ known?
          Yes: $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
          No: use s to estimate σ, $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
      No: increase n to > 30

Example 1
How much money do winners go home with from the television quiz show Jeopardy?
To determine an answer, a random sample of winners was drawn, and the amount of money each won was recorded and is listed here. Estimate with 95% confidence the mean winnings for all of the show's players. (Assume the random variable is normally distributed.)
26650 6060 52820 8490 13660 25840 49840 23790 51480 18960 990 11450 41810 21060 7860

Solution
[Worked solution not included in the extracted slide text.]

Example 2
A federal agency responsible for enforcing laws governing weights and measures routinely inspects packages to determine whether the weight of the contents is at least as great as that advertised on the package. A random sample of 18 containers whose packaging states that the contents weigh 8 ounces was drawn. The contents were weighed and the results follow. Can we conclude at the 1% significance level that on average the containers are mislabeled? (Assume the random variable is normally distributed.)
7.80 7.91 7.93 7.99 7.94 7.75 7.97 7.95 7.79 8.06 7.82 7.89 7.92 7.87 7.92 7.98 8.05 7.91

Solution
H0: μ = 8
H1: μ < 8
There is enough evidence to conclude that the average container is mislabeled.

Inference About a Population Variance
Sometimes we are interested in making inferences about the variability of processes. Examples:
• The consistency of a production process for quality control purposes.
• Investors use variance as a measure of risk.
To draw inferences about variability, the parameter of interest is σ².

Inference About a Population Variance
The sample variance s² is an unbiased, consistent and efficient point estimator for σ².
If the population is normally distributed, the statistic
$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2}$$
has a distribution called chi-squared.
[Figure: chi-squared density curves for d.f. = 5 and d.f. = 10.]

Testing the Population Variance
Example 3 (operations management application)
A container-filling machine is believed to fill 1-liter containers so consistently that the variance of the filling will be less than 1 cc (.001 liter).
To test this belief, a random sample of 25 1-liter fills was taken, and the results recorded. Do these data support the belief that the variance is less than 1 cc at the 5% significance level?

Testing the Population Variance
Solution
The problem objective is to describe the population of 1-liter fills from a filling machine. The data are interval, and we are interested in the variability of the fills. The complete test is:
H0: σ² = 1
H1: σ² < 1

Testing the Population Variance
• Solving by hand
  – Note that (n − 1)s² = Σ(x_i − x̄)² = Σx_i² − (Σx_i)²/n
  – From the sample, we can calculate Σx_i = 24,996.4 and Σx_i² = 24,992,821.3
  – Then (n − 1)s² = 24,992,821.3 − (24,996.4)²/25 = 20.78
  – Under H0 (σ² = 1), the test statistic is χ² = (n − 1)s²/σ² = 20.78
There is insufficient evidence to reject the null hypothesis in favor of the alternative that the variance is less than 1.

Testing the Population Variance
[Figure: chi-squared distribution with α = .05 in the left tail (rejection region) and 1 − α = .95 to the right of the critical value.]
Rejection region:
$$\chi^2 < \chi^2_{.95,\,25-1} = 13.8484$$
Since 20.8 > 13.8484, do not reject the null hypothesis.

Testing and Estimating a Population Variance
From the following probability statement
$$P\left(\chi^2_{1-\alpha/2} < \chi^2 < \chi^2_{\alpha/2}\right) = 1 - \alpha$$
we have (by substituting χ² = [(n − 1)s²]/σ²)
$$\frac{(n - 1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n - 1)s^2}{\chi^2_{1-\alpha/2}}$$

Example 3
With gasoline prices increasing, drivers are becoming more concerned with their cars' gasoline consumption. For the past 5 years, a driver has tracked the gas mileage of his car and found that the variance from fill-up to fill-up was σ² = 23 mpg². Now that his car is 5 years old, he would like to know whether the variability of gas mileage has changed. He recorded the gas mileage from his last eight fill-ups; these are listed here. Conduct a test at a 10% significance level to infer whether the variability has changed.
28 25 29 25 32 36 27 24

Solution
H0: σ² = 23
H1: σ² ≠ 23

Example 4
During annual checkups, physicians routinely send their patients to medical laboratories to have various tests performed. One such test determines the cholesterol level in patients' blood.
However, not all tests are conducted in the same way. To acquire more information, a man was sent to 10 laboratories and had his cholesterol level measured in each. The results are listed here. Estimate with 95% confidence the variance of these measurements.
4.70 4.83 4.65 4.60 4.75 4.88 4.68 4.75 4.80 4.90

Solution
[Worked solution not included in the extracted slide text.]

Inference About a Population Proportion
When the population consists of nominal data, the only inference we can make is about the proportion of occurrence of a certain value. The parameter p was used before to calculate these probabilities under the binomial distribution.

Inference About a Population Proportion
Statistic and sampling distribution
– The statistic used when making inferences about p is
$$\hat{p} = \frac{x}{n}$$
where x = the number of successes and n = the sample size.
– Under certain conditions [np > 5 and n(1 − p) > 5], p̂ is approximately normally distributed, with μ = p and σ² = p(1 − p)/n.

Testing and Estimating the Proportion
Test statistic for p:
$$Z = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}}$$
where np > 5 and n(1 − p) > 5.
Interval estimator for p (1 − α confidence level):
$$\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1 - \hat{p})/n}$$
provided np̂ > 5 and n(1 − p̂) > 5.

Example 5
A dean of a business school wanted to know whether the graduates of her school used a statistical inference technique during their first year of employment after graduation. She surveyed 314 graduates and asked about their use of statistical techniques. After tallying up the responses, she found that 204 used statistical inference within one year of graduation. Estimate with 90% confidence the proportion of all business school graduates who use their statistical education within a year of graduation.

Solution
[Worked solution not included in the extracted slide text.]

Example 6
In some states the law requires drivers to turn on their headlights when driving in the rain. A highway patrol officer believes that less than one-quarter of all drivers follow this rule.
As a test, he randomly samples 200 cars driving in the rain and counts the number whose headlights are turned on. He finds this number to be 41. Does the officer have enough evidence at the 10% significance level to support his belief?

Solution
There is enough evidence to support the officer's belief.

Selecting the Sample Size to Estimate the Proportion
Recall: the confidence interval for the proportion is
$$\hat{p} \pm z_{\alpha/2} \sqrt{\hat{p}(1 - \hat{p})/n}$$
Thus, to estimate the proportion to within W, we can write
$$W = z_{\alpha/2} \sqrt{\hat{p}(1 - \hat{p})/n}$$

Selecting the Sample Size to Estimate the Proportion
The required sample size is
$$n = \left( \frac{z_{\alpha/2} \sqrt{\hat{p}(1 - \hat{p})}}{W} \right)^2$$

Selecting the Sample Size
Two methods – in each case we choose a value for p̂, then solve the equation for n.
Method 1: no knowledge of even a rough value of p̂. This is a "worst case scenario", so we substitute p̂ = .50.
Method 2: we have some idea about the value of p̂. This is a better scenario, and we substitute in our estimated value.
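The two methods can be sketched in Python as follows. This is a minimal illustration, not part of the slides: the function name `sample_size_for_proportion`, the bound W = 0.03, the 95% confidence level, and the rough estimate p̂ = 0.20 are all assumed values chosen only to show the calculation. The result is rounded up, since n must be a whole number at least as large as the formula's value.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(width, confidence, p_hat=0.5):
    """Sample size n = (z_{alpha/2} * sqrt(p_hat * (1 - p_hat)) / W)^2,
    rounded up. p_hat defaults to .50, the worst case of Method 1."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2} from the standard normal
    n = (z * math.sqrt(p_hat * (1 - p_hat)) / width) ** 2
    return math.ceil(n)

# Method 1: no idea about p_hat, so use the worst case p_hat = .50.
n_worst_case = sample_size_for_proportion(width=0.03, confidence=0.95)

# Method 2: a rough prior estimate is available (assumed here to be .20).
n_with_estimate = sample_size_for_proportion(width=0.03, confidence=0.95, p_hat=0.20)

print(n_worst_case, n_with_estimate)
```

Because p̂(1 − p̂) is largest at p̂ = .50, Method 1 always yields a sample size at least as large as Method 2 for the same W and confidence level.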