Review 3

Chapter 8

1. A statistic is any quantity computed from values in a sample (for example, $\bar{x}$, $s$, the sample median, the sample interquartile range, and so on). The distribution of a statistic is called its sampling distribution.

2. Properties of the sampling distribution of $\bar{x}$
Let $\bar{x}$ denote the mean of the observations in a random sample of size $n$ from a population having mean $\mu$ and standard deviation $\sigma$. Denote the mean value of the $\bar{x}$ distribution by $\mu_{\bar{x}}$ and the standard deviation of the $\bar{x}$ distribution by $\sigma_{\bar{x}}$. Then the following rules hold.
Rule 1: $\mu_{\bar{x}} = \mu$
Rule 2: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$
Rule 3: When the population distribution is normal, the sampling distribution of $\bar{x}$ is also normal for any sample size $n$. Thus, the standardized variable
\[ z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \]
has the standard normal (z) distribution.
Rule 4: (Central Limit Theorem) When $n$ is sufficiently large ($n \ge 30$), the sampling distribution of $\bar{x}$ is well approximated by a normal curve, even when the population distribution is not itself normal. So the standardized variable
\[ z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \]
has approximately the standard normal (z) distribution.

3. General properties of the sampling distribution of $p$
Let $p$ be the proportion of S’s in a random sample of size $n$ from a population whose proportion of S’s is $\pi$. Denote the mean value of $p$ by $\mu_p$ and the standard deviation of $p$ by $\sigma_p$. Then the following rules hold.
Rule 1: $\mu_p = \pi$
Rule 2: $\sigma_p = \sqrt{\pi(1-\pi)/n}$
Rule 3: (Central Limit Theorem) When $n$ is large and $\pi$ is not too near 0 or 1 ($n\pi \ge 10$ and $n(1-\pi) \ge 10$), the sampling distribution of $p$ is approximately normal. Thus, the standardized variable
\[ z = \frac{p - \mu_p}{\sigma_p} = \frac{p - \pi}{\sqrt{\pi(1-\pi)/n}} \]
has approximately the standard normal (z) distribution.

Chapter 9

4. A point estimate of a population characteristic is a single number computed from sample data that represents a plausible value of the characteristic. A point estimate is obtained by (i) selecting an appropriate statistic and (ii) computing the value of the statistic for the given sample.
A statistic whose mean is equal to the value of the population characteristic being estimated is said to be an unbiased statistic. A statistic that is not unbiased is said to be biased.

5. Criteria for choosing among competing statistics
a) First, we choose an unbiased statistic if there is one.
b) If several unbiased statistics could be used for estimating a population characteristic, we choose the one with the smallest standard deviation.

6. Statistics used to estimate some important population characteristics

| Population characteristic to be estimated | Statistic to use | Unbiasedness |
| Population proportion, $\pi$ | $p$ | Unbiased |
| Population mean, $\mu$ | $\bar{x}$ | Unbiased |
| Population variance, $\sigma^2$ | $s^2$ | Unbiased |
| Population standard deviation, $\sigma$ | $s$ | Biased |
| Population median | Sample median | Biased |

7. A confidence interval for a population characteristic is an interval of plausible values for the characteristic. It is constructed so that, with a chosen degree of confidence, the value of the characteristic will be captured inside the interval.
The confidence level associated with a confidence interval estimate is the success rate of the method used to construct the interval.
The standard error of a statistic is the estimated standard deviation of the statistic.
The bound on error of estimation based on a statistic, $B$, associated with a 95% confidence interval is (1.96)(standard deviation of the statistic).

8. The large-sample confidence interval for $\pi$
When
(1) $p$ is the sample proportion from a random sample, and
(2) the sample size $n$ is large ($np \ge 10$ and $n(1-p) \ge 10$),
the general formula for a confidence interval for a population proportion $\pi$ is
\[ p \pm (z \text{ critical value}) \sqrt{\frac{p(1-p)}{n}} \]
The desired confidence level determines the z critical value. The three most commonly used confidence levels, 90%, 95%, and 99%, use z critical values 1.645, 1.96, and 2.58, respectively.

9.
The sample size required to estimate a population proportion $\pi$ to within an amount $B$ with 95% confidence is
\[ n = \pi(1-\pi)\left(\frac{1.96}{B}\right)^2 \]
The value of $\pi$ may be estimated using prior information. In the absence of any such information, using $\pi = 0.5$ in this formula gives a conservatively large value for the required sample size.
Question: What is the formula for the sample size required to estimate $\pi$ to within an amount $B$ with any confidence level?
( $n = \pi(1-\pi)\left(\dfrac{z \text{ critical value}}{B}\right)^2$ )

10. The one-sample z confidence interval for $\mu$
When
(1) $\bar{x}$ is the sample mean of a random sample,
(2) the population distribution is normal or the sample size $n$ is large (generally $n \ge 30$), and
(3) the population standard deviation $\sigma$ is known,
the formula for a confidence interval for a population mean $\mu$ is
\[ \bar{x} \pm (z \text{ critical value}) \left(\frac{\sigma}{\sqrt{n}}\right) \]

11. Let $x_1, x_2, \ldots, x_n$ be a random sample from a normal population distribution. Then the standardized variable
\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]
has the t distribution with $n-1$ df.

12. The one-sample t confidence interval for $\mu$
When
(1) $\bar{x}$ is the sample mean of a random sample,
(2) the population distribution is normal or the sample size $n$ is large (generally $n \ge 30$), and
(3) the population standard deviation $\sigma$ is unknown,
the formula for a confidence interval for a population mean $\mu$ is
\[ \bar{x} \pm (t \text{ critical value}) \left(\frac{s}{\sqrt{n}}\right) \]
where the t critical value is based on $n-1$ df and can be found in Appendix Table 3 on page 732.

13. The sample size required to estimate a population mean $\mu$ to within an amount $B$ with 95% confidence is
\[ n = \left(\frac{1.96\,\sigma}{B}\right)^2 \]
If $\sigma$ is unknown, it may be estimated based on previous information or, for a population that is not too skewed, by using (range)/4.

14. Examples
(1) (Ex. 9.10)
(a) $\bar{x}_J = \frac{1}{10}(103 + 156 + 118 + 89 + 125 + 147 + 122 + 109 + 138 + 99) = 120.6$
(b) Since the population total is $\tau = 10{,}000\,\mu_J$, an estimate of $\tau$ would be $10{,}000\,\bar{x}_J = 10{,}000(120.6) = 1{,}206{,}000$.
The statistic used is (the size of the population) $\times\ \bar{x}_J$.
(c) An estimate of $\pi$ is $p = 8/10 = 0.8$.
(d) We estimate the population median usage by the sample median: $(118 + 122)/2 = 120$ therms.

(2) (Ex. 9.14) The large-sample confidence interval for $\pi$ is
\[ p \pm (z \text{ critical value}) \sqrt{\frac{p(1-p)}{n}} = \left( p - (z \text{ critical value}) \sqrt{\frac{p(1-p)}{n}},\ \ p + (z \text{ critical value}) \sqrt{\frac{p(1-p)}{n}} \right) \]
The width of the interval is
\[ 2\,(z \text{ critical value}) \sqrt{\frac{p(1-p)}{n}} \]
a) As the confidence level increases, the z critical value increases, so the width of the confidence interval for $\pi$ increases.
b) As the sample size increases, the width of the confidence interval for $\pi$ decreases.
c) As the value of $p$ moves farther from 0.5, closer to either 0 or 1, the width of the confidence interval for $\pi$ decreases.

(3) (Ex. 9.25) A 90% confidence interval is
\[ 0.65 \pm (1.645) \sqrt{\frac{0.65(1 - 0.65)}{150}} = 0.65 \pm 0.064 = (0.586,\ 0.714) \]
Thus, we can be 90% confident that between 58.6% and 71.4% of Utah residents favor fluoridation. This is consistent with the statement that a clear majority of Utah residents favor fluoridation.

(4) (Ex. 9.27) $n = 0.5(1 - 0.5)(1.96/0.05)^2 = 384.16$
Thus 385 packages of ground beef should be tested.

(5) (Ex. 9.32) A 95% confidence interval for $\mu$ is (7.8, 9.4).
a) The 90% confidence interval would have been narrower, since its z critical value would have been smaller.
b) The statement is incorrect. The 95% refers to the percentage of all possible intervals that include $\mu$, not to the chance that a specific interval contains $\mu$.
c) This statement is incorrect. While we would expect approximately 95 of the 100 intervals constructed to contain $\mu$, we cannot be certain that exactly 95 out of 100 of them will. The 95% refers to the percentage of all possible intervals that include $\mu$.

(6) (Ex. 9.39) Since $n = 25$, df = 24, and the t critical value is 1.71. Then the confidence interval is
\[ \bar{x} \pm (t \text{ critical value}) \frac{s}{\sqrt{n}} = 2.2 \pm 1.71 \cdot \frac{1.2}{\sqrt{25}} = 2.2 \pm 0.4104 = (1.7896,\ 2.6104) \]
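
The arithmetic in Examples (3), (4), and (6) can be rechecked with a short Python sketch. It is not part of the original review: the function names are hypothetical, and the critical values (1.645, 1.96, 1.71) are copied from the text above rather than computed from a table or library.

```python
import math


def proportion_ci(p, n, z):
    """Large-sample interval for a proportion: p +/- z * sqrt(p(1 - p) / n)."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def sample_size_for_proportion(pi_guess, bound, z=1.96):
    """n = pi(1 - pi)(z / B)^2, rounded up to the next whole number."""
    return math.ceil(pi_guess * (1 - pi_guess) * (z / bound) ** 2)


def mean_ci(xbar, sd, n, critical_value):
    """Interval for a mean: xbar +/- (critical value) * sd / sqrt(n).

    Pass sigma with a z critical value (item 10),
    or s with a t critical value on n - 1 df (item 12).
    """
    half_width = critical_value * sd / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Example (3): 90% interval with p = 0.65, n = 150
low, high = proportion_ci(0.65, 150, 1.645)
print(round(low, 3), round(high, 3))   # 0.586 0.714

# Example (4): conservative guess pi = 0.5, bound B = 0.05, 95% confidence
print(sample_size_for_proportion(0.5, 0.05))   # 385

# Example (6): n = 25, xbar = 2.2, s = 1.2, t critical value 1.71 (24 df)
low, high = mean_ci(2.2, 1.2, 25, 1.71)
print(round(low, 4), round(high, 4))   # 1.7896 2.6104
```

Rounding the sample size up, never down, keeps the bound on error at or below B, which is why 384.16 becomes 385.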