COMPLETE BUSINESS STATISTICS by AMIR D. ACZEL & JAYAVEL SOUNDERPANDIAN, 6th edition.

Sessions 13 and 14: Confidence Intervals

6 Confidence Intervals
• Using Statistics
• Confidence Interval for the Population Mean When the Population Standard Deviation is Known
• Confidence Intervals for $\mu$ When $\sigma$ is Unknown - The t Distribution
• Large-Sample Confidence Intervals for the Population Proportion p
• Confidence Intervals for the Population Variance
• Sample Size Determination
• The Templates

6 LEARNING OBJECTIVES
After studying this chapter you should be able to:
• Explain confidence intervals
• Compute confidence intervals for population means
• Compute confidence intervals for population proportions
• Compute confidence intervals for population variances
• Compute minimum sample sizes needed for an estimation
• Compute confidence intervals for special types of sampling methods
• Use templates for all confidence interval and sample size computations

6-1 Using Statistics
Consider the following statements:
• $\bar{x} = 550$: a single-valued estimate that conveys little information about the actual value of the population mean.
• "We are 99% confident that $\mu$ is in the interval [449, 551]": an interval estimate which locates the population mean within a narrow interval, with a high level of confidence.
• "We are 90% confident that $\mu$ is in the interval [400, 700]": an interval estimate which locates the population mean within a broader interval, with a lower level of confidence.

Types of Estimators
• Point Estimate: a single-valued estimate. A single element chosen from a sampling distribution. It conveys little information about the actual value of the population parameter or about the accuracy of the estimate.
• Confidence Interval or Interval Estimate: an interval or range of values believed to include the unknown population parameter. Associated with the interval is a measure of the confidence we have that the interval does indeed contain the parameter of interest.
Confidence Interval or Interval Estimate
A confidence interval or interval estimate is a range or interval of numbers believed to include an unknown population parameter. Associated with the interval is a measure of the confidence we have that the interval does indeed contain the parameter of interest.
A confidence interval or interval estimate has two components:
• A range or interval of values
• An associated level of confidence

6-2 Confidence Interval for $\mu$ When $\sigma$ Is Known
If the population distribution is normal, the sampling distribution of the mean is normal. If the sample is sufficiently large, regardless of the shape of the population distribution, the sampling distribution is approximately normal (Central Limit Theorem). In either case:

$$P\left(\mu - 1.96\frac{\sigma}{\sqrt{n}} < \bar{x} < \mu + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$$

or

$$P\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$$

[Figure: Standard Normal Distribution: 95% Interval. Density f(z) plotted against z.]

6-2 Confidence Interval for $\mu$ When $\sigma$ Is Known (Continued)
Before sampling, there is a 0.95 probability that the interval $\mu \pm 1.96\frac{\sigma}{\sqrt{n}}$ will include the sample mean (and a 0.05 probability that it will not).
Conversely, after sampling, approximately 95% of such intervals $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$ will include the population mean (and 5% of them will not).
That is, $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$ is a 95% confidence interval for $\mu$.

A 95% Interval around the Population Mean
Approximately 95% of sample means can be expected to fall within the interval $\left[\mu - 1.96\frac{\sigma}{\sqrt{n}},\ \mu + 1.96\frac{\sigma}{\sqrt{n}}\right]$.
Conversely, about 2.5% can be expected to be above $\mu + 1.96\frac{\sigma}{\sqrt{n}}$ and 2.5% can be expected to be below $\mu - 1.96\frac{\sigma}{\sqrt{n}}$.
So 5% can be expected to fall outside the interval $\left[\mu - 1.96\frac{\sigma}{\sqrt{n}},\ \mu + 1.96\frac{\sigma}{\sqrt{n}}\right]$.

[Figure: Sampling Distribution of the Mean. The central 95% of the area lies between $\mu - 1.96\frac{\sigma}{\sqrt{n}}$ and $\mu + 1.96\frac{\sigma}{\sqrt{n}}$, with 2.5% in each tail. Sample means are marked as falling below the interval (2.5%), within it (95%), or above it (2.5%).]

95% Intervals around the Sample Mean
• Approximately 95% of the intervals $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$ around the sample mean can be expected to include the actual value of the population mean, $\mu$. (This happens when the sample mean falls within the 95% interval around the population mean.)
• 5% of such intervals around the sample mean can be expected not to include the actual value of the population mean. (This happens when the sample mean falls outside the 95% interval around the population mean.)

[Figure: Sampling Distribution of the Mean, with intervals drawn around individual sample means.]

The 95% Confidence Interval for $\mu$
A 95% confidence interval for $\mu$ when $\sigma$ is known and sampling is done from a normal population, or a large sample is used:

$$\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$$

The quantity $1.96\frac{\sigma}{\sqrt{n}}$ is often called the margin of error or the sampling error.

For example, if $n = 25$, $\sigma = 20$, and $\bar{x} = 122$, a 95% confidence interval is:

$$\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}} = 122 \pm 1.96\frac{20}{\sqrt{25}} = 122 \pm (1.96)(4) = 122 \pm 7.84 = [114.16,\ 129.84]$$

A $(1-\alpha)100\%$ Confidence Interval for $\mu$
We define $z_{\alpha/2}$ as the z value that cuts off a right-tail area of $\alpha/2$ under the standard normal curve. $(1-\alpha)$ is called the confidence coefficient, $\alpha$ is called the error probability, and $(1-\alpha)100\%$ is called the confidence level.

$$P\left(z > z_{\alpha/2}\right) = \alpha/2$$
$$P\left(z < -z_{\alpha/2}\right) = \alpha/2$$
$$P\left(-z_{\alpha/2} < z < z_{\alpha/2}\right) = 1 - \alpha$$

$(1-\alpha)100\%$ Confidence Interval:

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

[Figure: Standard Normal Distribution. Central area $(1-\alpha)$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$, with area $\alpha/2$ in each tail.]

Critical Values of z and Levels of Confidence

(1 - α)    α/2      z_{α/2}
0.99       0.005    2.576
0.98       0.010    2.326
0.95       0.025    1.960
0.90       0.050    1.645
0.80       0.100    1.282

[Figure: Standard Normal Distribution. Central area $(1-\alpha)$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$, with area $\alpha/2$ in each tail.]

The Level of Confidence and the Width of the Confidence Interval
When sampling from the same population, using a fixed sample size, the higher the confidence level, the wider the confidence interval.

80% Confidence Interval: $\bar{x} \pm 1.28\frac{\sigma}{\sqrt{n}}$
95% Confidence Interval: $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$

[Figure: two Standard Normal Distribution panels, one for the 80% interval and one for the 95% interval.]

The Sample Size and the Width of the Confidence Interval
When sampling from the same population, using a fixed confidence level, the larger the sample size, n, the narrower the confidence interval.

[Figure: two Sampling Distribution of the Mean panels, showing the 95% confidence interval for n = 20 and for n = 40.]

Example 6-1
The population consists of the Fortune 500 companies (Fortune Web Site), as ranked by revenues. You are trying to find out the average revenues for the companies on the list. The population standard deviation is $15,056.37. A random sample of 30 companies obtains a sample mean of $10,672.87. Give a 95% and a 90% confidence interval for the average revenues.

Example 6-1 (continued) - Using the Template
Note: The remaining part of the template display is shown on the next slide.

Example 6-1 (continued) - Using the Template (Sigma)

Example 6-1 (continued) - Using the Template when the Sample Data is Known

6-3 Confidence Interval or Interval Estimate for $\mu$ When $\sigma$ Is Unknown - The t Distribution
If the population standard deviation, $\sigma$, is not known, replace $\sigma$ with the sample standard deviation, s. If the population is normal, the resulting statistic:

$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

has a t distribution with (n - 1) degrees of freedom.
• The t is a family of bell-shaped and symmetric distributions, one for each number of degrees of freedom.
• The expected value of t is 0.
• For df > 2, the variance of t is df/(df-2). This is greater than 1, but approaches 1 as the number of degrees of freedom increases.
• The t is flatter and has fatter tails than does the standard normal.
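
The listed properties of the t distribution can be checked numerically. The following is a minimal simulation sketch, not part of the original slides. It assumes the standard construction of a t variate as Z / sqrt(V / df), with Z standard normal and V chi-square with df degrees of freedom; that construction is not stated in the slides. The function names `draw_t` and `summarize` are illustrative only.

```python
import math
import random
import statistics


def draw_t(df, rng):
    """Draw one t variate as Z / sqrt(V / df) (assumed standard construction)."""
    z = rng.gauss(0.0, 1.0)
    # A chi-square variate with df degrees of freedom is Gamma(shape=df/2, scale=2).
    v = rng.gammavariate(df / 2.0, 2.0)
    return z / math.sqrt(v / df)


def summarize(df, draws=50_000, seed=1):
    """Return (sample mean, sample variance, share of draws with |t| > 1.96)."""
    rng = random.Random(seed)
    sample = [draw_t(df, rng) for _ in range(draws)]
    tail_share = sum(abs(t) > 1.96 for t in sample) / draws
    return statistics.fmean(sample), statistics.variance(sample), tail_share


for df in (5, 10, 30):
    mean, var, tail_share = summarize(df)
    print(f"df={df:>2}  mean={mean:+.3f}  variance={var:.3f} "
          f"(theory {df / (df - 2):.3f})  P(|t| > 1.96)={tail_share:.3f}")
```

Under the standard normal, the share of values beyond ±1.96 is 0.05. The simulated t shares should come out above that, which is the "fatter tails" property, and should move toward 0.05 as df grows.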
The t distribution approaches a standard normal as the number of degrees of freedom increases.

[Figure: density curves for the standard normal, t with df = 20, and t with df = 10.]

The t Distribution Template

6-3 Confidence Intervals for $\mu$ When $\sigma$ Is Unknown - The t Distribution
A $(1-\alpha)100\%$ confidence interval for $\mu$ when $\sigma$ is not known (assuming a normally distributed population):

$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$$

where $t_{\alpha/2}$ is the value of the t distribution with n-1 degrees of freedom that cuts off a tail area of $\alpha/2$ to its right.

The t Distribution

df     t0.100   t0.050   t0.025   t0.010   t0.005
1      3.078    6.314    12.706   31.821   63.657
2      1.886    2.920    4.303    6.965    9.925
3      1.638    2.353    3.182    4.541    5.841
4      1.533    2.132    2.776    3.747    4.604
5      1.476    2.015    2.571    3.365    4.032
6      1.440    1.943    2.447    3.143    3.707
7      1.415    1.895    2.365    2.998    3.499
8      1.397    1.860    2.306    2.896    3.355
9      1.383    1.833    2.262    2.821    3.250
10     1.372    1.812    2.228    2.764    3.169
11     1.363    1.796    2.201    2.718    3.106
12     1.356    1.782    2.179    2.681    3.055
13     1.350    1.771    2.160    2.650    3.012
14     1.345    1.761    2.145    2.624    2.977
15     1.341    1.753    2.131    2.602    2.947
16     1.337    1.746    2.120    2.583    2.921
17     1.333    1.740    2.110    2.567    2.898
18     1.330    1.734    2.101    2.552    2.878
19     1.328    1.729    2.093    2.539    2.861
20     1.325    1.725    2.086    2.528    2.845
21     1.323    1.721    2.080    2.518    2.831
22     1.321    1.717    2.074    2.508    2.819
23     1.319    1.714    2.069    2.500    2.807
24     1.318    1.711    2.064    2.492    2.797
25     1.316    1.708    2.060    2.485    2.787
26     1.315    1.706    2.056    2.479    2.779
27     1.314    1.703    2.052    2.473    2.771
28     1.313    1.701    2.048    2.467    2.763
29     1.311    1.699    2.045    2.462    2.756
30     1.310    1.697    2.042    2.457    2.750
40     1.303    1.684    2.021    2.423    2.704
60     1.296    1.671    2.000    2.390    2.660
120    1.289    1.658    1.980    2.358    2.617
∞      1.282    1.645    1.960    2.326    2.576

[Figure: t Distribution: df = 10. Density f(t) against t. Area = 0.10 lies to the right of t = 1.372 and to the left of t = -1.372; area = 0.025 lies to the right of t = 2.228 and to the left of t = -2.228.]

Whenever $\sigma$ is not known (and the population is assumed normal), the correct distribution to use is the t distribution with n-1 degrees of freedom.
Note, however, that for large degrees of freedom, the t distribution is approximated well by the Z distribution.

Example 6-2
A stock market analyst wants to estimate the average return on a certain stock. A random sample of 15 days yields an average (annualized) return of $\bar{x} = 10.37\%$ and a standard deviation of $s = 3.5\%$. Assuming a normal population of returns, give a 95% confidence interval for the average return on this stock.

df     t0.100   t0.050   t0.025   t0.010   t0.005
1      3.078    6.314    12.706   31.821   63.657
...
13     1.350    1.771    2.160    2.650    3.012
14     1.345    1.761    2.145    2.624    2.977
15     1.341    1.753    2.131    2.602    2.947
...

The critical value of t for df = (n - 1) = (15 - 1) = 14 and a right-tail area of 0.025 is $t_{0.025} = 2.145$.

The corresponding confidence interval or interval estimate is:

$$\bar{x} \pm t_{0.025}\frac{s}{\sqrt{n}} = 10.37 \pm 2.145\frac{3.5}{\sqrt{15}} = 10.37 \pm 1.94 = [8.43,\ 12.31]$$

Large Sample Confidence Intervals for the Population Mean

df     t0.100   t0.050   t0.025   t0.010   t0.005
1      3.078    6.314    12.706   31.821   63.657
...
120    1.289    1.658    1.980    2.358    2.617
∞      1.282    1.645    1.960    2.326    2.576

Whenever $\sigma$ is not known (and the population is assumed normal), the correct distribution to use is the t distribution with n-1 degrees of freedom. Note, however, that for large degrees of freedom, the t distribution is approximated well by the Z distribution.

A large-sample $(1-\alpha)100\%$ confidence interval for $\mu$:

$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$$

Example 6-3: An economist wants to estimate the average amount in checking accounts at banks in a given region. A random sample of 100 accounts gives $\bar{x} = \$357.60$ and $s = \$140.00$. Give a 95% confidence interval for $\mu$, the average amount in any checking account at a bank in the given region.

$$\bar{x} \pm z_{0.025}\frac{s}{\sqrt{n}} = 357.60 \pm 1.96\frac{140.00}{\sqrt{100}} = 357.60 \pm 27.44 = [330.16,\ 385.04]$$
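
The intervals in Examples 6-1, 6-2, and 6-3 all have the form "estimate ± critical value × standard deviation / √n", so they can be recomputed with one small helper. The sketch below is an illustration rather than the chapter's template. The names `z_critical` and `interval` are hypothetical. The z values come from Python's `statistics.NormalDist`, while the t value 2.145 is simply read from the t table above, because the Python standard library has no t quantile function.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """z value cutting off a right-tail area of alpha/2, where alpha = 1 - confidence."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def interval(center, critical, sd, n):
    """Return (lower, upper) for center +/- critical * sd / sqrt(n)."""
    margin = critical * sd / math.sqrt(n)
    return center - margin, center + margin


# Example 6-1: sigma is known, so z critical values are used.
for level in (0.95, 0.90):
    lo, hi = interval(10672.87, z_critical(level), 15056.37, 30)
    print(f"Example 6-1, {level:.0%}: [{lo:.2f}, {hi:.2f}]")

# Example 6-2: sigma is unknown and n = 15; t = 2.145 is the table value for df = 14.
lo, hi = interval(10.37, 2.145, 3.5, 15)
print(f"Example 6-2, 95%: [{lo:.2f}, {hi:.2f}]")

# Example 6-3: sigma is unknown but n = 100 is large, so z is used with s.
lo, hi = interval(357.60, z_critical(0.95), 140.00, 100)
print(f"Example 6-3, 95%: [{lo:.2f}, {hi:.2f}]")
```

Because `z_critical(0.95)` returns a more precise value than the rounded 1.96 used on the slides, the endpoints can differ from hand calculations in the last decimal place.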