Chapter 23

The sampling distribution for $\bar{y}$

To determine the population mean $\mu$ of a quantity of interest $y$, we sample values of $y$ from the population and compute a sample mean $\bar{y}$. How good is the estimate $\bar{y}$? We study the sampling distribution of $\bar{y}$ to answer this question.

A model for the sampling distribution is based on the Central Limit Theorem:

Regardless of the shape of the population from which we sample, the normal model describes the distribution of a statistic $X$ (such as a sample mean or proportion) more closely the larger the sample size $n$ gets. The mean $E(X)$ does not depend on $n$, but the standard deviation $SD(X)$ decreases as $n$ gets larger; therefore, $X$ is a better and better approximation of its true mean value $E(X)$ for larger and larger $n$.

In our present situation, we choose for the statistic $X$ the sample mean $\bar{y}$. This leads to the following observations about its sampling distribution.

• We select an SRS of size $n$ of independent measurements of the quantity $y$ from a population with mean $E(y) = \mu$ and standard deviation $SD(y) = \sigma$.
• The mean of the sampling distribution of $\bar{y}$ is $E(\bar{y}) = E(y) = \mu$ (this holds for every $n$, by linearity of expectation).
• The variance of the sampling distribution of $\bar{y}$ is

$$\mathrm{Var}(\bar{y}) = \mathrm{Var}\left(\frac{\sum y}{n}\right) = \frac{1}{n^2} \cdot \mathrm{Var}\left(\sum y\right) = \frac{1}{n^2} \cdot n\,\mathrm{Var}(y) = \frac{\mathrm{Var}(y)}{n} = \frac{\sigma^2}{n}$$

• So the standard deviation of the sampling distribution of $\bar{y}$ is $SD(\bar{y}) = \sigma/\sqrt{n}$.
• Since the population parameter $\sigma$ is typically unknown, we must estimate this standard deviation by the standard error $SE(\bar{y}) = s/\sqrt{n}$.

Student's t distribution model

We have noted that

• the sampling distribution of $\bar{y}$ is well described by a normal model for large sample sizes; and
• its standard deviation is $SD(\bar{y}) = \sigma/\sqrt{n}$, but since $\sigma$ is typically unknown, we estimate this standard deviation in practice by the standard error $SE(\bar{y}) = s/\sqrt{n}$.
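The two observations above can be checked informally by simulation. The Python sketch below is only an illustration under stated assumptions: the population is taken to be Exponential(1) (right-skewed, with mean 1 and standard deviation 1), and the function names `sample_means` and `skewness`, the number of repetitions, and the seeds are all hypothetical choices that do not come from these notes. For each sample size it compares the simulated SD of the sample means with $\sigma/\sqrt{n}$ and reports the skewness, which should shrink toward 0 as the normal model becomes a better description.

```python
import math
import random
import statistics


def sample_means(n, reps=5000, seed=0):
    """Return `reps` sample means, each from a sample of size n drawn from an
    assumed Exponential(1) population (mu = 1, sigma = 1)."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]


def skewness(values):
    """Sample skewness; it is 0 for a symmetric model such as the normal."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([(v - m) ** 3 for v in values]) / sd ** 3


for n in (4, 16, 64):
    means = sample_means(n)
    print(f"n={n:3d}  mean={statistics.fmean(means):.3f}  "
          f"SD={statistics.stdev(means):.3f}  "
          f"sigma/sqrt(n)={1 / math.sqrt(n):.3f}  "
          f"skewness={skewness(means):.2f}")
```

In a run of this sketch the mean column should stay near 1 for every n, while the SD column should track the sigma/sqrt(n) column and the skewness should decrease as n grows.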
These two observations suggest some difficulties with our set-up so far:

• When the sample size is not large, the normal model may do a poorer job of describing the sampling distribution of $\bar{y}$. In practice, however, we either choose $n$ large enough that this issue is not a serious one, or we find that we are sampling from a population that is nearly normal in shape, so that the sampling distribution of $\bar{y}$ behaves normally even for small sample sizes.
• More of a problem is that replacing $SD(\bar{y}) = \sigma/\sqrt{n}$ by $SE(\bar{y}) = s/\sqrt{n}$ introduces a significant amount of additional variability, because $s$ is a sample statistic with its own variability, which increases as the sample size decreases.

This problem was resolved by W. S. Gosset in 1908. He realized that using the standard error estimate $SE(\bar{y}) = s/\sqrt{n}$ required a new model for the sampling distribution of $\bar{y}$. The standardized sample mean statistic

$$t = \frac{\bar{y} - \mu}{s/\sqrt{n}}$$

follows a distribution called Student's t model. This distribution is not normal, although it is bell-shaped and symmetric about 0 like the standardized normal z distribution N(0, 1). However, the Student's t distribution

• has thicker tails (larger spread) than the standard normal distribution (because replacing $\sigma$ with $s$ introduces more variation in the values of $t$ than appears in the normal model for $z$), and
• behaves more like a normal distribution as the number of degrees of freedom ($n - 1$) increases (increasing $n$ leads to a closer estimate of $\sigma$ by $s$).

Since the t-distributions have thick tails, values far from 0 are somewhat more common than under the normal model. In addition, because $\bar{y}$ and $s$ are not resistant to extreme values, t procedures are strongly influenced by the presence of outliers in the data.

One-sample t procedures

Assumptions: independently selected measurements (hence not more than 10% of the size of the population when selected without replacement) form an SRS drawn from a nearly normal population.
(View a histogram or normal probability plot of the data to check this condition.)

A level C confidence interval for $\mu$ is

$$\bar{y} \pm t^*_{n-1} \frac{s}{\sqrt{n}}$$

where $t^*_{n-1}$ is the appropriate critical value, depending on the level of confidence C, for the t-distribution with $n - 1$ degrees of freedom.

Choosing the sample size

Since $ME = t^*_{n-1} \, s/\sqrt{n}$ in the confidence interval for the mean, we can assure a certain margin of error in our estimate by setting a desired value for ME and solving this equation for $n$, which gives $n = (t^*_{n-1} \, s / ME)^2$. Unfortunately, we need to know $s$ to do this, and we can't know $s$ until after we have set the sample size $n$! Instead, we can sometimes perform a pilot study to get an initial value for $s$ and then do the calculation above to find the appropriate value of $n$ for the main study. (Remember to round up to determine $n$.)

One-sample t hypothesis test

• State hypotheses: Null hypothesis $H_0: \mu = \mu_0$. Alternative hypothesis $H_A: \mu > \mu_0$, or $\mu < \mu_0$, or $\mu \ne \mu_0$.
• Choose the model: An independently selected SRS drawn from a nearly normal population and satisfying the 10% Condition, so Student's t model applies to the standardized sampling distribution of $\bar{y}$.
• Mechanics: Compute the t-statistic based on $H_0$: $t = \dfrac{\bar{y} - \mu_0}{s/\sqrt{n}}$. The P-value associated with the appropriate $H_A$ is $P = P(T \ge t)$, or $P = P(T \le t)$, or $P = 2P(T \ge |t|)$.
• Conclusion: Assess the evidence against $H_0$ in favor of $H_A$ depending on how small P is.

[TI-83: STAT TESTS T-Test… ]

The Sign test

One way to get around having to meet the stringent conditions underlying the t test is to transform the problem from a quantitative one into a categorical one: recharacterize the hypothesized mean value $\mu$ as a hypothesized median value, then ask what proportion of the data exceed or fall short of this median.
This replaces the original underlying statistic $\bar{y}$ with a proportion

$\hat{p}$ = proportion of the data greater than (or less than) the hypothesized median.

We then carry out a one-proportion z test with null hypothesis $H_0: p = 0.50$ and a suitable alternative hypothesis. Of course, we haven't escaped the need to meet some conditions: our data still need to come from an SRS satisfying the 10% Condition and the Success/Failure Condition.
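As a rough illustration of the one-sample t interval, the one-sample t test, and the sign test described in this chapter, here is a Python sketch that uses only the standard library. Everything in it is an assumption made for illustration rather than part of these notes: the helper names (`t_pdf`, `t_cdf`, `t_critical`, `one_sample_t`, `sign_test`), the numerical approach (Simpson's rule for the t probabilities, bisection for $t^*_{n-1}$), the choice to drop observations equal to the hypothesized median, and the simulated data. Only two-sided P-values are computed; a one-sided P-value would come from `t_cdf` directly.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return (math.exp(log_c) / math.sqrt(df * math.pi)
            * (1 + x * x / df) ** (-(df + 1) / 2))


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(conf, df):
    """t* with P(-t* <= T <= t*) = conf, located by bisection."""
    target = 0.5 + conf / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains t*
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def one_sample_t(data, mu0, conf=0.95):
    """Return (confidence interval, t statistic, two-sided P-value)."""
    n = len(data)
    ybar, se = mean(data), stdev(data) / math.sqrt(n)
    me = t_critical(conf, n - 1) * se  # margin of error
    t = (ybar - mu0) / se
    p = 2 * (1 - t_cdf(abs(t), n - 1))
    return (ybar - me, ybar + me), t, p


def sign_test(data, median0):
    """Two-sided one-proportion z test of H0: p = 0.50, where p is the
    proportion of the population above median0."""
    kept = [y for y in data if y != median0]  # assumption: ties are dropped
    n = len(kept)
    p_hat = sum(y > median0 for y in kept) / n
    z = (p_hat - 0.5) / math.sqrt(0.5 * 0.5 / n)
    return p_hat, z, 2 * (1 - NormalDist().cdf(abs(z)))


# Simulated (not real) measurements: 25 draws from a normal model.
rng = random.Random(0)
data = [rng.gauss(10.5, 2.0) for _ in range(25)]
ci, t, p = one_sample_t(data, mu0=10.0)
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), t = {t:.2f}, P = {p:.3f}")
print("sign test (p_hat, z, P):", sign_test(data, median0=10.0))
```

With 25 observations the Success/Failure Condition for the sign test (at least 10 expected on each side of the median under $H_0$) is met; for smaller samples the z approximation used here would not be appropriate.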