Chapter 9: Estimation and Confidence Intervals

Homework #9 (Hilary Term Week 3): Chapter 9 Exercises 2, 12, 16, 24 & 26.

A Brief Review

Chapter 7: Continuous Probability Distributions

$X_i \sim N(\mu, \sigma^2)$
$P(\mu - 1.96\sigma < X_i < \mu + 1.96\sigma) = 0.95$

Chapter 8: Sampling Methods and the Central Limit Theorem

The population parameters $\mu$ and $\sigma^2$ were assumed known, and the objective was to draw conclusions about possible values of the sample mean, $\bar{x}$.

$X_i \sim N(\mu, \sigma^2)$ {or $X_i \sim\ ?(\mu, \sigma^2)$, i.e. any distribution, with $n > 30$}
$\bar{x} \sim N(\mu, \sigma^2/n)$
$P(\mu - 1.96\sigma/\sqrt{n} < \bar{x} < \mu + 1.96\sigma/\sqrt{n}) = 0.95$

Estimation

A more interesting question to ask is: given the value of $\bar{x}$, what can be said about the population parameter $\mu$?

Point Estimation: A single value is used to provide the best estimate of the parameter of interest.

Interval Estimation: “Interval estimates are better for the consumer of the statistics, since they not only show the estimate of the parameter but also give an idea of the confidence which the researcher has in that estimate.”

Estimation: Large Sample Size (n > 30)

$\bar{x} \sim N(\mu, \sigma^2/n)$ (for all distributions of $x_i$!)
$P(\mu - 1.96\sigma/\sqrt{n} < \bar{x} < \mu + 1.96\sigma/\sqrt{n}) = 0.95$

Rearranging this inequality,

1. $(\mu - 1.96\sigma/\sqrt{n} < \bar{x}) \Leftrightarrow (\mu < \bar{x} + 1.96\sigma/\sqrt{n})$
2. $(\bar{x} < \mu + 1.96\sigma/\sqrt{n}) \Leftrightarrow (\bar{x} - 1.96\sigma/\sqrt{n} < \mu)$

The interval $[\bar{x} - 1.96\sigma/\sqrt{n} < \mu < \bar{x} + 1.96\sigma/\sqrt{n}]$ is referred to as the 95% confidence interval for $\mu$.
The interval $[\bar{x} - 1.64\sigma/\sqrt{n} < \mu < \bar{x} + 1.64\sigma/\sqrt{n}]$ is referred to as the 90% confidence interval for $\mu$.

The greater the degree of confidence required, the wider the confidence interval has to be.

$\sigma^2$ Unknown?

Replace $\sigma^2$ with $s^2$ (as long as $n > 30$).

The interval $[\bar{x} - 1.96s/\sqrt{n} < \mu < \bar{x} + 1.96s/\sqrt{n}]$ is referred to as the 95% confidence interval for $\mu$.
The interval $[\bar{x} - 1.64s/\sqrt{n} < \mu < \bar{x} + 1.64s/\sqrt{n}]$ is referred to as the 90% confidence interval for $\mu$.

Estimation: Small Sample Size

Importance of Large Sample Size

1. Central Limit Theorem: The sampling distribution of the sample mean can be assumed to be Normally distributed.
   $Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
2. Unknown $\sigma^2$: Replace $\sigma^2$ with $s^2$.
   $Z = \dfrac{\bar{x} - \mu}{s/\sqrt{n}} \sim N(0, 1)$

Small Sample Size (n < 30)?

1. Given that the Central Limit Theorem can no longer be used, we must know (or simply assume/hope) that the underlying distribution is Normally distributed.
   $X_i \sim N(\mu, \sigma^2)$
   $\bar{x} \sim N(\mu, \sigma^2/n)$
   $Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
2. Claim: If the population is Normally distributed, the statistic
   $T = \dfrac{\bar{x} - \mu}{s/\sqrt{n}} \sim t_{n-1}$
   has a distribution called the t distribution with $n - 1$ degrees of freedom.

Student’s t distribution

“Student” was Gosset’s pseudonym (Guinness brewery, Dublin).

The shape of the t distribution depends on the number of degrees of freedom ($= n - 1$).
The t distribution is similar in appearance to the standard normal (Z) distribution in that it is symmetric about zero.
For small sample sizes, it has wider (i.e. fatter) tails than the standard normal distribution.
For $n > 25$ or 30, there is little or no difference between the t distribution and the Z distribution.

[Figure: curves labelled Z, T20 and T10]

Use of t Table(s)

The interval $[\bar{x} - t_{0.025,\,n-1}\, s/\sqrt{n} < \mu < \bar{x} + t_{0.025,\,n-1}\, s/\sqrt{n}]$ is referred to as the 95% confidence interval for $\mu$.

Examples:

$n = 10$, $(n - 1) = 9$, $t_{0.025,\,n-1} = 2.262$
$n = 20$, $(n - 1) = 19$, $t_{0.025,\,n-1} = 2.093$
$n = \infty$, $(n - 1) = \infty$, $t_{0.025,\,n-1} = 1.96$

Example: Given the sample data $\bar{x} = 40$, $s = 10$ and $n = 36$, calculate the 99% confidence interval estimate of the population mean $\mu$. If the sample size were 20, how would the method of calculation and the width of the interval be altered?

As $n > 30$, the 99% confidence interval for $\mu$ is
$[\bar{x} - 2.57s/\sqrt{n} < \mu < \bar{x} + 2.57s/\sqrt{n}]$
$= [40 - 2.57 \times 10/6 < \mu < 40 + 2.57 \times 10/6]$
$= [35.72, 44.28]$

$n = 20$: The t distribution must now be used, and the interval becomes wider. The 99% confidence interval for $\mu$ is
$[\bar{x} - t_{0.005,\,n-1}\, s/\sqrt{n} < \mu < \bar{x} + t_{0.005,\,n-1}\, s/\sqrt{n}]$
$= [40 - 2.861 \times 10/\sqrt{20} < \mu < 40 + 2.861 \times 10/\sqrt{20}]$
$= [33.60, 46.40]$

Estimating a Proportion

$\pi$: proportion of the population that has a particular characteristic, e.g. unemployed, FF voter, …
$p$: proportion of a sample that has a particular characteristic, e.g. unemployed, FF voter, …
$n$: sample size

Review: The Binomial Distribution

$n$: number of trials
$x$: number of “successes” within $n$ trials
$\pi$: probability of “success” in any individual trial
$(1 - \pi)$: probability of “failure” in any individual trial

$P(x) = {}^{n}C_{x}\, \pi^{x} (1 - \pi)^{n - x}$

Claims:

1. $E(x) = n\pi$ (intuitive)
2. $\mathrm{Var}(x) = n\pi(1 - \pi)$ (not so intuitive)

See previous notes for proofs.

$x \sim B(n\pi,\ n\pi(1 - \pi))$, written here in terms of its mean and variance.

Claim: If $x \sim B(n\pi,\ n\pi(1 - \pi))$, then approximately $x \sim N(n\pi,\ n\pi(1 - \pi))$ [if $n\pi > 5$ and $n(1 - \pi) > 5$].

Estimating a Proportion

Sample proportion = (number of “successes”) / (number of trials), i.e. $p = x/n$.

$p \sim\ ?(?, ?)$

$x \sim N(n\pi,\ n\pi(1 - \pi))$ and $p$ is a linear transformation of $x$, so $p \sim N(?, ?)$.

E(p)?
$E(p) = E(x/n) = E(x)/n = n\pi/n = \pi$ (as expected)

Var(p)?
$\mathrm{Var}(p) = \mathrm{Var}(x/n) = \mathrm{Var}(x)/n^2 = n\pi(1 - \pi)/n^2 = \pi(1 - \pi)/n$

Therefore, $p \sim N(\pi,\ \pi(1 - \pi)/n)$.

Example: Given the sample data $p = 0.4$, $n = 50$, calculate the 99% confidence interval estimate of the true proportion.

$p \sim N(\pi,\ \pi(1 - \pi)/n)$

Therefore, the 99% confidence interval for $\pi$ can be written down as:
$[p - 2.57\{p(1 - p)/n\}^{0.5},\ p + 2.57\{p(1 - p)/n\}^{0.5}]$
$= [0.22, 0.58]$

Note: The known $p(1 - p)/n$ is being used as a replacement for the unknown $\pi(1 - \pi)/n$.
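
Numerical check of the worked examples

The three worked intervals in these notes (the mean with n = 36, the mean with n = 20, and the proportion with n = 50) all follow the same pattern: estimate, plus or minus a critical value times a standard error. The Python sketch below is one way to check the arithmetic. It assumes the critical values are read from tables, as in the notes, so 2.57 and 2.861 are copied from the examples rather than computed. The function and variable names are illustrative and do not come from the notes.

```python
import math


def confidence_interval(estimate, critical_value, standard_error):
    """Return (lower, upper) = estimate -/+ critical_value * standard_error."""
    margin = critical_value * standard_error
    return estimate - margin, estimate + margin


def mean_standard_error(s, n):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)


def proportion_standard_error(p, n):
    """Estimated standard error of a sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)


# Critical values copied from the notes (table look-ups, not computed here).
Z_99 = 2.57        # z value used in the notes for 99% confidence
T_99_DF19 = 2.861  # t value for 0.005 in each tail, 19 degrees of freedom

# Mean, large sample: x-bar = 40, s = 10, n = 36.
large = confidence_interval(40, Z_99, mean_standard_error(10, 36))
# Mean, small sample: same data but n = 20, so the t value replaces z.
small = confidence_interval(40, T_99_DF19, mean_standard_error(10, 20))
# Proportion: p = 0.4, n = 50.
prop = confidence_interval(0.4, Z_99, proportion_standard_error(0.4, 50))

results = [
    ("mean, n = 36", large),
    ("mean, n = 20", small),
    ("proportion, n = 50", prop),
]
for label, (lower, upper) in results:
    print(f"{label}: [{lower:.2f}, {upper:.2f}]")
```

Run as written, this should print [35.72, 44.28], [33.60, 46.40] and [0.22, 0.58], matching the intervals quoted above. Keeping the critical value as an explicit argument mirrors the table look-up step in the notes; a statistics library could supply it instead.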