Math 144: Confidence Interval

In addition to the value of a point estimator, some statisticians suggest that we should also consider the variance of the estimator. The point estimate and the variance of the estimator are used together to form an interval that has a high probability of covering the unknown parameter. This method, which takes the variance of the point estimator into account, is called interval estimation, and the resulting interval is called a "confidence interval".

Interval estimation

Assume that $\hat{\theta}_L$ and $\hat{\theta}_U$ are two functions of a random sample, determined by a point estimator $\hat{\theta}$ of an unknown parameter θ, such that
\[ P(\hat{\theta}_L \le \theta \le \hat{\theta}_U) = 1 - \alpha, \]
where α is a known value between 0 and 1. After sampling, if the actual values of $\hat{\theta}_L$ and $\hat{\theta}_U$ are a and b, respectively, then the interval [a, b] is called a 100(1-α)% confidence interval (hereafter, C.I.) for θ. The quantity 1-α is called the confidence level associated with the confidence interval.

Caution: By definition, before sampling we have a random interval $[\hat{\theta}_L, \hat{\theta}_U]$ for the unknown parameter θ. After sampling, the confidence interval [a, b] is a fixed (not random) interval; it depends on the particular sample observations $x_1, \dots, x_n$.

Caution: Most importantly, the unknown parameter θ is either inside or outside the confidence interval [a, b]. That is, P(a ≤ θ ≤ b) = 0 or 1. If θ lies outside [a, b], then P(a ≤ θ ≤ b) = 0; if θ lies inside [a, b], then P(a ≤ θ ≤ b) = 1. Recall that before sampling, we have $P(\hat{\theta}_L \le \theta \le \hat{\theta}_U) = 1 - \alpha$.

Interpretation of C.I.

The interpretation of a 100(1-α)% C.I.
is that if we obtain N (sufficiently large) independent random samples and, for each sample, construct one interval using the same point estimator, then about N(1-α) of these N intervals will contain the true unknown parameter θ. However, we do not know which intervals contain θ and which do not, because θ is unknown.

For instance, if we draw different samples repeatedly, say 100 times, and construct a 95% = 100(1-0.05)% C.I. for μ from each one, then we expect μ to be contained in about 95 of the 100 fixed intervals. Again, we do not know which intervals these are, because µ is unknown.

Steps to construct a confidence interval

Step 1: Find a point estimator of θ.
Step 2: Find its EXACT (or approximate) distribution.
Step 3: Use the exact (or approximate) distribution found in Step 2 to construct the C.I.

Throughout this course, we are only interested in how to construct confidence intervals for the parameters µ and σ² using the sample mean $\bar{X}$ and the sample variance $S^2$. In the following, we discuss the distributions of $\bar{X}$ and $S^2$, and then see how to obtain the confidence intervals for µ and σ² case by case.

One sample: Confidence Interval for µ with NORMAL population (known variance)

Confidence interval for µ

Case I: Normal distribution with unknown mean and KNOWN variance: Consider a random sample of size n, {X1, X2, …, Xn}, from a normal distribution with unknown mean µ and KNOWN variance σ². That is, $X_1, \dots, X_n \sim N(\mu, \sigma^2)$. Then the sampling distribution of the sample mean is
\[ \bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right), \]
or equivalently,
\[ Z = \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \sim N(0, 1). \]

How to construct the interval?

Define a quantity $z_\alpha$ such that $P(Z > z_\alpha) = \alpha$. By the symmetry of the standard normal distribution, we have
\[ P\!\left(-z_{\alpha/2} \le \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \le z_{\alpha/2}\right) = 1 - \alpha. \]
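[Figure: standard normal density of $Z = \sqrt{n}(\bar{X} - \mu)/\sigma$, with central area 1-α between $z_{1-\alpha/2} = -z_{\alpha/2}$ and $z_{\alpha/2}$, and area α/2 in each tail.]

Rearranging the inequality for µ gives
\[ P\!\left(\bar{X} - \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}} \le \mu \le \bar{X} + \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}}\right) = 1 - \alpha, \]
so that $\hat{\theta}_L = \bar{X} - z_{\alpha/2}\sigma/\sqrt{n}$, θ = µ and $\hat{\theta}_U = \bar{X} + z_{\alpha/2}\sigma/\sqrt{n}$.

After sampling, we can find an actual value of the sample mean, say $\bar{x}$. Thus, the 100(1-α)% C.I. for μ is
\[ \left[\bar{x} - \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}},\; \bar{x} + \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}}\right], \]
or simply written as $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$. The quantity $z_{\alpha/2}\,\sigma/\sqrt{n}$ is called the margin of error.

For example, if α = 0.05, then
\[ P\!\left(\bar{X} - \frac{z_{0.025}\,\sigma}{\sqrt{n}} \le \mu \le \bar{X} + \frac{z_{0.025}\,\sigma}{\sqrt{n}}\right) = 0.95. \]
If all X1, …, Xn are observed, i.e. we have x1, …, xn, then the 95% C.I. for μ is
\[ \left[\bar{x} - \frac{z_{0.025}\,\sigma}{\sqrt{n}},\; \bar{x} + \frac{z_{0.025}\,\sigma}{\sqrt{n}}\right]. \]

Remark again that this does not mean that μ is inside this interval with probability 0.95. Note that μ is an unknown BUT fixed number, and $\bar{x}$ and σ are known. So, μ is either inside or outside the fixed interval:
\[ P\!\left(\bar{x} - \frac{z_{0.025}\,\sigma}{\sqrt{n}} \le \mu \le \bar{x} + \frac{z_{0.025}\,\sigma}{\sqrt{n}}\right) = 0 \text{ or } 1, \]
\[ P\!\left(\bar{X} - \frac{z_{0.025}\,\sigma}{\sqrt{n}} \le \mu \le \bar{X} + \frac{z_{0.025}\,\sigma}{\sqrt{n}} \,\Big|\, \bar{X} = \bar{x}\right) = 0 \text{ or } 1, \]
whereas before sampling
\[ P\!\left(\bar{X} - \frac{z_{0.025}\,\sigma}{\sqrt{n}} \le \mu \le \bar{X} + \frac{z_{0.025}\,\sigma}{\sqrt{n}}\right) = 0.95. \]

Questions (Page 12)

Q1: Given a random sample of 100 observations from a normal distribution for which µ is unknown and σ = 8, suppose that the sample mean is found to be 42.7 after sampling. What is the 95% C.I. for µ?

Q2: A wine importer needs to report the average percentage of alcohol in bottles of French wine. From previous experience with different kinds of wine, the importer believes the alcohol concentration is normally distributed with standard deviation 1.2%. The importer randomly samples 60 bottles of the new wine and obtains a sample mean of 9.3%. Find a 90% C.I. for the population average percentage.

One sample: Confidence Interval for µ with NORMAL population (unknown variance)

Confidence interval for µ

Case II: Normal distribution with unknown mean and UNKNOWN variance: Consider a random sample of size n, {X1, X2, …, Xn}, from a normal distribution with unknown mean µ and UNKNOWN variance σ². That is, $X_1, \dots, X_n \sim N(\mu, \sigma^2)$.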
Then, as before, the sampling distribution of the sample mean is
\[ \bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right), \quad\text{or equivalently}\quad Z = \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \sim N(0, 1). \]
After sampling, we can find an actual value of the sample mean, say $\bar{x}$. Thus, the 100(1-α)% C.I. for μ is
\[ \left[\bar{x} - \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}},\; \bar{x} + \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}}\right]. \]
However, σ is UNKNOWN, so this interval cannot be computed. We therefore replace σ² by the sample variance S². The next problem is: what is the sampling distribution of $\sqrt{n}(\bar{X} - \mu)/S$? Is it still normal? NO!

Theorem

Consider a random sample of size n, {X1, X2, …, Xn}, from a normal distribution with unknown mean µ and UNKNOWN variance σ². Then the sampling distribution of $\sqrt{n}(\bar{X} - \mu)/S$ is a Student t distribution (or simply t distribution) with n-1 degrees of freedom. We write
\[ T_{n-1} = \frac{\sqrt{n}(\bar{X} - \mu)}{S} \sim t_{n-1}, \]
where
\[ \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \quad\text{and}\quad S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2. \]

t_k distribution

• Similar to the standard normal distribution, the t distribution is symmetric about 0, so P(T ≤ -a) = 1 - P(T ≤ a) = P(T ≥ a) if T follows a t distribution.
• Use a table of the t distribution to find probabilities for a t-distributed random variable.

How to construct the interval?

Define a quantity $t_{n-1,\alpha}$ such that $P(T_{n-1} > t_{n-1,\alpha}) = \alpha$. By the symmetry of the t distribution, we have
\[ P\!\left(-t_{n-1,\alpha/2} \le \frac{\sqrt{n}(\bar{X} - \mu)}{S} \le t_{n-1,\alpha/2}\right) = 1 - \alpha, \]
\[ P\!\left(\bar{X} - \frac{t_{n-1,\alpha/2}\,S}{\sqrt{n}} \le \mu \le \bar{X} + \frac{t_{n-1,\alpha/2}\,S}{\sqrt{n}}\right) = 1 - \alpha. \]
After sampling, we can find the actual values of the sample mean and sample variance, say $\bar{x}$ and $s^2$. Thus, the 100(1-α)% C.I. for μ is
\[ \left[\bar{x} - \frac{t_{n-1,\alpha/2}\,s}{\sqrt{n}},\; \bar{x} + \frac{t_{n-1,\alpha/2}\,s}{\sqrt{n}}\right], \]
or simply written as $\bar{x} \pm t_{n-1,\alpha/2}\,s/\sqrt{n}$.

How to use the table of the t distribution

Each row of the table corresponds to a value of the degrees of freedom and each column to a value of α. For example, which entry of the table equals 2.353?
2.353 = $t_{3,\,0.05}$: read the degrees of freedom first (the row), then α (the column).

Questions (Page 14)

Q3 (i) Find $P(-t_{14,0.025} \le T_{14} \le t_{14,0.005})$.

\[ P(-t_{14,0.025} \le T_{14} \le t_{14,0.005}) = P(T_{14} \le t_{14,0.005}) - P(T_{14} \le -t_{14,0.025}) \]
\[ = [1 - P(T_{14} > t_{14,0.005})] - P(T_{14} > t_{14,0.025}) \quad\text{(by the symmetry of the t distribution)} \]
\[ = [1 - 0.005] - 0.025 = 0.97. \]

Q3 (ii) Find k such that $P(k \le T_{14} \le -1.761) = 0.045$.

\[ 0.045 = P(k \le T_{14} \le -1.761) = P(T_{14} \le -1.761) - P(T_{14} \le k) \]
\[ = P(T_{14} \ge 1.761) - P(T_{14} \ge -k) \quad\text{(by the symmetry of the t distribution)} \]
\[ = P(T_{14} \ge t_{14,0.05}) - P(T_{14} \ge -k) = 0.05 - P(T_{14} \ge -k). \]
Hence $P(T_{14} \ge -k) = 0.05 - 0.045 = 0.005 = P(T_{14} \ge 2.977)$, so k = -2.977.

Questions (Page 14)

Frequencies, in hertz (Hz), of 12 elephant calls: 14, 16, 17, 17, 24, 20, 32, 18, 29, 31, 15, 35. Assume that the population of possible elephant call frequencies is normally distributed. A scientist is interested in the average of the frequencies, say µ. Find a 95% confidence interval for µ.

The population variance is UNKNOWN, so we use the t distribution to construct the C.I. for µ. Here $\bar{x} = 22.33$, $s^2 = 56.424$, n = 12 and α = 0.05, so $t_{11,0.025} = 2.201$ and the interval is $22.33 \pm 2.201\sqrt{56.424/12}$. Finally, the 95% C.I. for µ is [17.557, 27.103].

Remark: When n > 30, the difference between a t distribution with n-1 degrees of freedom and the standard normal distribution is small, so $t_{n-1,\alpha/2} \approx z_{\alpha/2}$. Therefore, we can use
\[ \left[\bar{x} - \frac{z_{\alpha/2}\,s}{\sqrt{n}},\; \bar{x} + \frac{z_{\alpha/2}\,s}{\sqrt{n}}\right] \]
to approximate the 100(1-α)% C.I. for μ with unknown variance when n > 30.
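As a rough numerical check of the elephant-call example, the following Python sketch recomputes the sample mean, the sample variance and the t-based interval from the twelve observations. The critical value 2.201 is an assumption here: it is typed in as the t-table entry for 11 degrees of freedom and α/2 = 0.025 rather than computed.

```python
import math
import statistics

# Elephant call frequencies (Hz) from the example.
calls = [14, 16, 17, 17, 24, 20, 32, 18, 29, 31, 15, 35]

n = len(calls)
x_bar = statistics.mean(calls)   # sample mean
s = statistics.stdev(calls)      # sample standard deviation (divisor n - 1)

# Assumption: critical value read from a t table, t_{11, 0.025} = 2.201.
t_crit = 2.201

margin = t_crit * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"x_bar = {x_bar:.3f}, s^2 = {s**2:.3f}")
print(f"95% C.I. for mu: [{lower:.3f}, {upper:.3f}]")
```

Because the sketch keeps the unrounded mean 268/12 instead of 22.33, its endpoints differ from [17.557, 27.103] in the third decimal place.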
Two samples: Confidence Interval for µX - µY with NORMAL populations (known variances)

Confidence interval for µX - µY

Case I: Normal distributions with unknown means and KNOWN variances: Consider two independent random samples,
\[ X_1, \dots, X_n \sim N(\mu_X, \sigma_X^2) \quad\text{and}\quad Y_1, \dots, Y_m \sim N(\mu_Y, \sigma_Y^2). \]
We want to construct a C.I. for the mean difference µX - µY.

How to construct the interval?

First, choose a point estimator of the mean difference: use $\bar{X} - \bar{Y}$ to estimate µX - µY. Second, find the sampling distribution of $\bar{X} - \bar{Y}$. Indeed, we have
\[ \bar{X} - \bar{Y} \sim N\!\left(\mu_X - \mu_Y,\; \frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}\right), \]
or equivalently,
\[ \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{\sqrt{\dfrac{\sigma_X^2}{n} + \dfrac{\sigma_Y^2}{m}}} \sim N(0, 1). \]
Then proceed as in Case I of the one-sample problem. After sampling, the 100(1-α)% C.I. for μX - μY is given by
\[ \left[(\bar{x} - \bar{y}) - z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}},\; (\bar{x} - \bar{y}) + z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}}\right], \]
or $(\bar{x} - \bar{y}) \pm z_{\alpha/2}\sqrt{\sigma_X^2/n + \sigma_Y^2/m}$.

In particular, if the two variances are EQUAL, say σX² = σY² = σ², then the 100(1-α)% C.I. for μX - μY becomes
\[ \left[(\bar{x} - \bar{y}) - z_{\alpha/2}\,\sigma\sqrt{\frac{1}{n} + \frac{1}{m}},\; (\bar{x} - \bar{y}) + z_{\alpha/2}\,\sigma\sqrt{\frac{1}{n} + \frac{1}{m}}\right]. \]

Example

Two kinds of thread are being compared for strength. Fifty pieces of each type of thread are tested under similar conditions. Brand A had an average tensile strength of 78.3 kilograms with a population standard deviation of 5.6 kilograms, while brand B had an average tensile strength of 87.2 kilograms with a population standard deviation of 6.3 kilograms. Construct a 95% confidence interval for the difference of the population means µA - µB. Here n = m = 50.
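As an illustration, the known-variance interval for µX - µY can be wrapped in a small Python function and applied to the thread example. This is only a sketch: the function name `two_sample_z_ci` and its parameter names are hypothetical, and $z_{\alpha/2}$ is computed with `statistics.NormalDist` instead of being read from a table.

```python
import math
from statistics import NormalDist


def two_sample_z_ci(x_bar, y_bar, sigma_x, sigma_y, n, m, alpha=0.05):
    """C.I. for mu_X - mu_Y when both population standard deviations are known.

    Hypothetical helper written for this illustration only.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    margin = z * math.sqrt(sigma_x**2 / n + sigma_y**2 / m)
    diff = x_bar - y_bar
    return diff - margin, diff + margin


# Thread example: brand A vs. brand B, 50 pieces each.
lower, upper = two_sample_z_ci(78.3, 87.2, 5.6, 6.3, 50, 50, alpha=0.05)
print(f"95% C.I. for mu_A - mu_B: [{lower:.2f}, {upper:.2f}]")
```

Both endpoints are negative, which is consistent with brand B having the larger mean tensile strength.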
This is a two-sample problem with known variances: α = 0.05, $\bar{x} = 78.3$, $\bar{y} = 87.2$, σX = 5.6 and σY = 6.3. Since $z_{0.025} = 1.96$, the 95% C.I. for µA - µB is
\[ (78.3 - 87.2) \pm (1.96)\sqrt{\frac{5.6^2}{50} + \frac{6.3^2}{50}} = [-11.24,\; -6.56]. \]

Two samples: Confidence Interval for µX - µY with NORMAL populations (unknown variances)

Confidence interval for µX - µY

Case II: Normal distributions with unknown means and UNKNOWN variances: Consider two independent random samples,
\[ X_1, \dots, X_n \sim N(\mu_X, \sigma_X^2) \quad\text{and}\quad Y_1, \dots, Y_m \sim N(\mu_Y, \sigma_Y^2). \]
We distinguish two cases:
(i) BOTH UNKNOWN variances are EQUAL;
(ii) BOTH UNKNOWN variances are DIFFERENT.

Recall that, in the one-sample case with UNKNOWN variance, we replace the population variance σ² by the sample variance S². Then $\sqrt{n}(\bar{X} - \mu)/S$ has a t distribution with n-1 degrees of freedom. So, in the two-sample cases, we also replace the unknown variances by their estimators. Which estimators should we use to estimate the variances?

(i) In the case that BOTH UNKNOWN variances are EQUAL: use the statistic
\[ S_p^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2 + \sum_{j=1}^{m} (Y_j - \bar{Y})^2}{n + m - 2} = \frac{(n-1)S_X^2 + (m-1)S_Y^2}{n + m - 2}, \]
which is called a pooled estimator of σ², or the pooled sample variance.
Based on $S_p^2$, we have
\[ \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{S_p\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}} \sim t_{n+m-2}. \]
So, after sampling, the 100(1-α)% C.I. for μX - μY is given by
\[ (\bar{x} - \bar{y}) \pm t_{n+m-2,\alpha/2}\; s_p\sqrt{\frac{1}{n} + \frac{1}{m}}. \]
If n+m-2 > 30, then the confidence interval can be approximated by
\[ (\bar{x} - \bar{y}) \pm z_{\alpha/2}\; s_p\sqrt{\frac{1}{n} + \frac{1}{m}}. \]

Example (Page 17)

Two tomato fertilizers are compared to see if one is better than the other. The weight measurements of two independent random samples of tomatoes grown using each of the two fertilizers (in ounces) are as follows:

Fertilizer A (X): 12, 11, 7, 13, 8, 9, 10, 13
Fertilizer B (Y): 13, 11, 10, 6, 7, 4, 10

Assume that the two populations are normal and their population variances are equal. Consider a confidence level 1-α = 0.95.

Since n = 8, m = 7, $\bar{x} = 10.375$, $\bar{y} = 8.714$, $s_X^2 = 5.125$ and $s_Y^2 = 9.905$, we have
\[ s_p^2 = \frac{(n-1)s_X^2 + (m-1)s_Y^2}{n + m - 2} = 7.331. \]
Thus, with $t_{13,0.025} = 2.160$, the 95% C.I. for µX - µY is given by
\[ (10.375 - 8.714) \pm t_{13,0.025}\sqrt{7.331\left(\frac{1}{8} + \frac{1}{7}\right)} = [-1.366,\; 4.688]. \]

Question

Students may choose between a 3-semester-hour course in physics without labs and a 4-semester-hour course with labs. The final written examination is the same for each section. Suppose that 24 students in the section with labs made an average examination grade of 84 with a standard deviation of 4, and 36 students in the section without labs made an average grade of 77 with a standard deviation of 6. Find a 99% confidence interval for the difference between the average grades for the two courses. Assume that the population variances are equal.
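The tomato-fertilizer example worked earlier in this section can be checked numerically. The Python sketch below recomputes the two sample variances, the pooled sample variance and the pooled t interval from the raw data. The critical value 2.160 is an assumption in the sense that it is entered by hand as the t-table entry for 13 degrees of freedom and α/2 = 0.025.

```python
import math
import statistics

x = [12, 11, 7, 13, 8, 9, 10, 13]   # Fertilizer A
y = [13, 11, 10, 6, 7, 4, 10]       # Fertilizer B
n, m = len(x), len(y)

s2_x = statistics.variance(x)       # sample variance of X (divisor n - 1)
s2_y = statistics.variance(y)       # sample variance of Y (divisor m - 1)

# Pooled sample variance: weighted average of the two sample variances.
s2_p = ((n - 1) * s2_x + (m - 1) * s2_y) / (n + m - 2)

# Assumption: critical value read from a t table, t_{13, 0.025} = 2.160.
t_crit = 2.160

diff = statistics.mean(x) - statistics.mean(y)
margin = t_crit * math.sqrt(s2_p * (1 / n + 1 / m))
lower, upper = diff - margin, diff + margin

print(f"s_X^2 = {s2_x:.3f}, s_Y^2 = {s2_y:.3f}, s_p^2 = {s2_p:.3f}")
print(f"95% C.I. for mu_X - mu_Y: [{lower:.3f}, {upper:.3f}]")
```

The interval contains 0, so at this confidence level the data do not separate the two fertilizers.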
Confidence interval for µX - µY

Case II: Normal distributions with unknown means and UNKNOWN variances: (ii) In the case that BOTH UNKNOWN variances are DIFFERENT: We do not have a statistic whose exact distribution can be found to construct a C.I. for µX - µY in this case. However, it is still possible to construct an APPROXIMATE confidence interval. Since the variances are different, we cannot use the pooled sample variance. Instead, we use the sample variance SX² for σX² and SY² for σY². That is, we consider
\[ \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{\sqrt{\dfrac{S_X^2}{n} + \dfrac{S_Y^2}{m}}}. \]
It can be shown that the sampling distribution of the above statistic is approximately a t distribution with v degrees of freedom, where
\[ v = \frac{\left(\dfrac{S_X^2}{n} + \dfrac{S_Y^2}{m}\right)^2}{\dfrac{1}{n-1}\left(\dfrac{S_X^2}{n}\right)^2 + \dfrac{1}{m-1}\left(\dfrac{S_Y^2}{m}\right)^2}. \]
Before sampling, v is random and unknown. After sampling, the actual value of v is fixed and can be found. Note that after sampling, the actual value of the degrees of freedom v is not always an integer. So, in practice, we round down to the nearest integer to achieve the desired confidence level. That is, if v = 1.4, then take 1; if v = 2.9, then take 2.

Thus, the approximate 100(1-α)% C.I. for μX - μY is
\[ \left[(\bar{x} - \bar{y}) - t_{v,\alpha/2}\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}},\; (\bar{x} - \bar{y}) + t_{v,\alpha/2}\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}}\right]. \]
If v > 30, then the confidence interval becomes
\[ \left[(\bar{x} - \bar{y}) - z_{\alpha/2}\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}},\; (\bar{x} - \bar{y}) + z_{\alpha/2}\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}}\right]. \]

Question

A study was conducted by the Department of Zoology at the Virginia Polytechnic Institute and State University to estimate the difference in the amount of the chemical orthophosphorus measured at two different stations on the James River. Orthophosphorus is measured in milligrams per liter. Fifteen samples were collected from station 1 and 12 samples were obtained from station 2.
The 15 samples from station 1 had an average orthophosphorus content of 3.84 milligrams per liter and a standard deviation of 3.07 milligrams per liter, while the 12 samples from station 2 had an average content of 1.49 milligrams per liter and a standard deviation of 0.80 milligram per liter. Find a 95% confidence interval for the difference in the true average orthophosphorus contents at these two stations, assuming that the observations came from normal populations with different variances.

This is a two-sample problem with α = 0.05, normal populations and different variances. We have
\[ \bar{x} = 3.84,\; s_X = 3.07,\; n = 15 \quad\text{and}\quad \bar{y} = 1.49,\; s_Y = 0.80,\; m = 12. \]
Consider µ1 - µ2, where µi is the true average orthophosphorus content at station i, i = 1 and 2. Since the population variances are assumed to be unequal, we can only find an approximate 95% C.I.
based on the t distribution with v degrees of freedom, where
\[ v = \frac{\left(3.07^2/15 + 0.80^2/12\right)^2}{\left[(3.07^2/15)^2/14\right] + \left[(0.80^2/12)^2/11\right]} = 16.3 \approx 16. \]
So, for α = 0.05, we have $t_{v,\alpha/2} = t_{16,0.025} = 2.120$. Thus, the 95% C.I. for µ1 - µ2 is
\[ (\bar{x} - \bar{y}) \pm t_{16,0.025}\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}} = (3.84 - 1.49) \pm (2.120)\sqrt{\frac{3.07^2}{15} + \frac{0.80^2}{12}} = [0.60,\; 4.10]. \]
Hence, we can say that we are 95% confident that the interval from 0.60 to 4.10 milligrams per liter contains the difference of the true average orthophosphorus contents for stations 1 and 2.

One- (or Two-) sample(s): Confidence Interval for µX (or µX - µY) with NON-NORMAL population(s)

Approximate C.I. in the one-sample case

Note that, so far, all results are based on normal population(s). A natural question is then: how do we construct a C.I. for a NON-normal distribution? Unfortunately, in general, it is not easy to find a statistic whose exact distribution is easily found in this case. However, if the sample size is large enough, then we can use a normal approximation to the distribution of the statistic used to construct the C.I.

Central Limit Theorem (CLT)

If $\bar{X}$ is the sample mean of a random sample X1, …, Xn of size n from any distribution with a finite mean µ and a finite positive variance σ², then the distribution of
\[ \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sqrt{n}\,\sigma} \]
is the standard normal distribution N(0,1) in the limit as n goes to infinity.

Approximate C.I. for µ

Case I: Any distribution with unknown mean and KNOWN variance: Consider a random sample of size n, {X1, X2, …, Xn}, from a distribution with unknown mean µ and KNOWN variance σ². After sampling, we can find an actual value of the sample mean, say $\bar{x}$. Thus, the APPROXIMATE 100(1-α)% C.I. for μ is
\[ \left[\bar{x} - \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}},\; \bar{x} + \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}}\right]. \]
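The normal approximation behind the known-variance interval for a non-normal population can be illustrated by simulation. The Python sketch below is only an illustration under assumptions that are not part of the lecture: the population is taken to be exponential with rate 1 (so µ = 1 and σ = 1), the sample size is n = 50, and the seed and number of trials are arbitrary. It counts how often $\bar{x} \pm z_{0.025}\,\sigma/\sqrt{n}$ covers µ.

```python
import math
import random

random.seed(144)            # arbitrary fixed seed so the run is repeatable

# Assumed (illustrative) population: exponential with rate 1, so mean 1 and sd 1.
MU, SIGMA = 1.0, 1.0
N, TRIALS = 50, 20000
Z = 1.96                    # z_{0.025}

margin = Z * SIGMA / math.sqrt(N)
covered = 0
for _ in range(TRIALS):
    sample = [random.expovariate(1.0) for _ in range(N)]
    x_bar = sum(sample) / N
    # Does this sample's interval contain the true mean?
    if x_bar - margin <= MU <= x_bar + margin:
        covered += 1

coverage = covered / TRIALS
print(f"Empirical coverage of the approximate 95% C.I.: {coverage:.3f}")
```

The printed proportion should be close to, but not exactly, 0.95, which is what "approximate" means here; it also mirrors the repeated-sampling interpretation of a C.I.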
Case II: Any distribution with unknown mean and UNKNOWN variance: After sampling, we can find the actual values of the sample mean and sample variance, say $\bar{x}$ and $s^2$. Thus, the APPROXIMATE 100(1-α)% C.I. for μ is
\[ \left[\bar{x} - \frac{t_{n-1,\alpha/2}\,s}{\sqrt{n}},\; \bar{x} + \frac{t_{n-1,\alpha/2}\,s}{\sqrt{n}}\right]. \]
If n is large enough, then the approximate 100(1-α)% C.I. for μ becomes
\[ \left[\bar{x} - \frac{z_{\alpha/2}\,s}{\sqrt{n}},\; \bar{x} + \frac{z_{\alpha/2}\,s}{\sqrt{n}}\right]. \]

Approximate C.I. in the two-sample case

Consider two independent random samples from distributions with means µX and µY and variances σX² and σY², respectively.

(i) In the case of the SAME variance (say, σX² = σY² = σ²), the APPROXIMATE 100(1-α)% C.I. for µX - µY is
\[ (\bar{x} - \bar{y}) \pm z_{\alpha/2}\,\sigma\sqrt{\frac{1}{n} + \frac{1}{m}} \quad\text{(if the variance σ² is known)}, \]
\[ (\bar{x} - \bar{y}) \pm t_{n+m-2,\alpha/2}\; s_p\sqrt{\frac{1}{n} + \frac{1}{m}} \quad\text{(if the variance σ² is unknown)}, \]
or
\[ (\bar{x} - \bar{y}) \pm z_{\alpha/2}\; s_p\sqrt{\frac{1}{n} + \frac{1}{m}} \]
if n+m-2 is large enough.

(ii) In the case of DIFFERENT variances, the APPROXIMATE 100(1-α)% C.I. for µX - µY is
\[ (\bar{x} - \bar{y}) \pm z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}} \quad\text{(if the variances are known)}, \]
\[ (\bar{x} - \bar{y}) \pm t_{v,\alpha/2}\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}} \quad\text{(if the variances are unknown)}, \]
or
\[ (\bar{x} - \bar{y}) \pm z_{\alpha/2}\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}} \]
if v is large enough OR n and m are large enough.

Confidence Interval for σ² with NORMAL population

Confidence interval for σ²

Case: Normal distribution with UNKNOWN variance: Consider a random sample of size n, {X1, X2, …, Xn}, from a normal distribution with UNKNOWN mean and UNKNOWN variance σ². Then the statistic
\[ \frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma^2} \]
has a chi-squared distribution (denoted by $\chi^2_{n-1}$) with n-1 degrees of freedom. We write
\[ \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}. \]
Note that the chi-squared distribution with k degrees of freedom is NOT symmetric!!

How to construct the interval?

Define a quantity $\chi^2_\alpha$ such that $P(X^2_k > \chi^2_\alpha) = \alpha$, where $X^2_k$ is a chi-squared random variable with k degrees of freedom; its value is found from the table of the chi-squared distribution with k degrees of freedom. So, we have
\[ P\!\left(\chi^2_{1-\alpha/2} \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{\alpha/2}\right) = 1 - \alpha. \]

[Figure: density function of the chi-squared random variable $X^2_{n-1}$ with n-1 degrees of freedom; the area between $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$ is 1-α, with area α/2 in each tail.]
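The claim that $(n-1)S^2/\sigma^2$ falls between the two chi-squared table values with probability 1-α can be checked by simulation. In the Python sketch below, the normal population (mean 46.0, standard deviation 0.5), the seed and the number of trials are assumptions chosen only for illustration. The two cut-offs, 2.700 and 19.023, are the table values for 9 degrees of freedom quoted in the grass-seed example of this lecture.

```python
import random

random.seed(144)             # arbitrary fixed seed so the run is repeatable

# Assumed (illustrative) normal population; these values are not from the lecture.
MU, SIGMA = 46.0, 0.5
N, TRIALS = 10, 20000

# Chi-squared table values for n - 1 = 9 degrees of freedom.
CHI2_LOWER = 2.700           # chi^2_{0.975}
CHI2_UPPER = 19.023          # chi^2_{0.025}

inside = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    mean = sum(sample) / N
    s2 = sum((v - mean) ** 2 for v in sample) / (N - 1)   # sample variance
    pivot = (N - 1) * s2 / SIGMA**2                       # (n-1)S^2 / sigma^2
    if CHI2_LOWER <= pivot <= CHI2_UPPER:
        inside += 1

fraction = inside / TRIALS
print(f"Fraction of samples with pivot inside the cut-offs: {fraction:.3f}")
```

The fraction should be close to 0.95, and the same event is equivalent to σ² lying between $(n-1)s^2/19.023$ and $(n-1)s^2/2.700$.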
Rearranging the inequality $\chi^2_{1-\alpha/2} \le (n-1)S^2/\sigma^2 \le \chi^2_{\alpha/2}$ for σ² gives
\[ P\!\left(\frac{(n-1)S^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2}}\right) = 1 - \alpha. \]
After sampling, we can find an actual value of the sample variance, say s². Thus, the 100(1-α)% C.I. for σ² is
\[ \left[\frac{(n-1)s^2}{\chi^2_{\alpha/2}},\; \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\right]. \]

How to use the table of the chi-squared distribution

The table gives $\chi^2_\alpha$ such that $P(X^2_k > \chi^2_\alpha) = \alpha$. Each row corresponds to a value of the degrees of freedom and each column to a value of α. For example, 20.483 = $\chi^2_{0.025}$ with 10 degrees of freedom.

Questions (Page 21)

For a chi-squared distribution with v degrees of freedom:
a) If v = 5, then $\chi^2_{0.005} = 16.750$.
b) If v = 19, then $\chi^2_{0.05} = 30.144$.

Questions (Page 21)

For a chi-squared distribution with v degrees of freedom, find $\chi^2_\alpha$ such that
a) $P(X^2_v > \chi^2_\alpha) = 0.025$ when v = 19: $\chi^2_{0.025} = 32.852$.
b) $P(37.652 < X^2_v < \chi^2_\alpha) = 0.045$ when v = 25: Since 37.652 = $\chi^2_{0.05}$ with 25 degrees of freedom,
\[ 0.045 = P(X^2_{25} > 37.652) - P(X^2_{25} > \chi^2_\alpha) = 0.05 - P(X^2_{25} > \chi^2_\alpha), \]
so $P(X^2_{25} > \chi^2_\alpha) = 0.05 - 0.045 = 0.005$ and $\chi^2_\alpha = \chi^2_{0.005} = 46.928$.

Questions (Page 21)

For a chi-squared distribution with v degrees of freedom, find $\chi^2_\alpha$ such that
a) $P(X^2_v < \chi^2_\alpha) = 0.95$ when v = 6: α = 0.05 and $\chi^2_{0.05} = 12.592$.
b) $P(\chi^2_\alpha < X^2_v < 23.209) = 0.015$ when v = 10: Since 23.209 = $\chi^2_{0.01}$ with 10 degrees of freedom, α = 0.015 + 0.01 = 0.025 and $\chi^2_{0.025} = 20.483$.

How about the confidence interval for σ, not σ²?
Recall that
\[ P\!\left(\frac{(n-1)S^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2}}\right) = 1 - \alpha. \]
A 100(1-α)% confidence interval for σ can be obtained by taking the square root of each endpoint of the interval for σ². That is,
\[ \left[\sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2}}},\; \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}}\right]. \]

Example

The following are the weights, in decagrams, of 10 packages of grass seed distributed by a certain company: 46.4, 46.1, 45.8, 47.0, 46.1, 45.9, 45.8, 46.9, 45.2 and 46.0. Find a 95% C.I. for the variance of all such packages of grass seed distributed by this company, assuming a normal population.

Here n = 10 and α = 0.05, so with n-1 = 9 degrees of freedom,
\[ \chi^2_{\alpha/2} = \chi^2_{0.025} = 19.023 \quad\text{and}\quad \chi^2_{1-\alpha/2} = \chi^2_{0.975} = 2.700. \]
The sample variance is
\[ s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = 0.286. \]
Thus, the 95% C.I. for the variance is
\[ \left[\frac{9(0.286)}{19.023},\; \frac{9(0.286)}{2.700}\right] = [0.135,\; 0.953]. \]

Sample size determination

Before we end the topic of estimation, let's consider the problem of how to determine the sample size. Often, we wish to know how large a sample is necessary to ensure that the error in estimating an unknown parameter, say µ, will be less than a specified amount e. Consider a 100(1-α)% C.I. for µ with known variance. The margin of error is
\[ \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}}. \]
Thus, solving the equation
\[ \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}} = e \]
for the sample size n implies that the required sample size is
\[ n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^2. \]

Question (Page 23)

A marketing research firm wants to conduct a survey to estimate the average amount spent on entertainment by each person visiting a popular resort.
The people who plan the survey would like to have an estimate close to the true value such that we will have 95% confidence that the difference between them is within $120. If the population standard deviation is $400, then how large should the sample be?

Here α = 0.05, so $z_{\alpha/2} = z_{0.025} = 1.96$, and σ = 400. The requirement $|\bar{x} - \mu| \le 120$ means that e = 120. Then, the required sample size is
\[ n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^2 = \left(\frac{1.96 \times 400}{120}\right)^2 = 42.68, \]
so, rounding up, at least 43 people should be sampled.
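The sample-size formula can be sketched as a short Python function and applied to the resort survey question. The function name `required_sample_size` and its parameter names are hypothetical, and $z_{\alpha/2}$ is computed with `statistics.NormalDist` rather than read from a table, so the unrounded value may differ slightly from a hand calculation with 1.96.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, e, alpha=0.05):
    """Return (exact, rounded-up) n so that z_{alpha/2} * sigma / sqrt(n) <= e.

    Hypothetical helper written for this illustration only.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    n_exact = (z * sigma / e) ** 2
    # Round up: a smaller n would give a margin of error larger than e.
    return n_exact, math.ceil(n_exact)


n_exact, n_required = required_sample_size(sigma=400, e=120, alpha=0.05)
print(f"n = {n_exact:.2f}, so sample at least {n_required} people")
```

Rounding up rather than to the nearest integer is the design choice here, because the margin of error only shrinks as n grows.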