G. W. Teklewolde Math MS
Statistics Basics Study Note Part 6

Statistics Basics

The t-Distribution

In many real-life situations, the population standard deviation is unknown. Moreover, because of constraints such as time and cost, it is often not practical to collect samples of size 30 or more. So how can you construct a confidence interval for a population mean under such circumstances? If the random variable is normally distributed (or approximately normally distributed), you can use a t-distribution.

DEFINITION
If the distribution of a random variable x is approximately normal, then

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

follows a t-distribution. Critical values of t are denoted by $t_c$. Several properties of the t-distribution are as follows.

1. The t-distribution is bell shaped and symmetric about the mean.
2. The t-distribution is a family of curves, each determined by a parameter called the degrees of freedom. The degrees of freedom are the number of free choices left after a sample statistic such as $\bar{x}$ is calculated. When you use a t-distribution to estimate a population mean, the degrees of freedom are equal to one less than the sample size: d.f. = n - 1.
3. The total area under a t-curve is 1, or 100%.
4. The mean, median, and mode of the t-distribution are equal to zero.
5. As the degrees of freedom increase, the t-distribution approaches the normal distribution. After 30 d.f., the t-distribution is very close to the standard normal z-distribution. The tails of the t-distribution are “thicker” than those of the standard normal distribution.

Confidence Intervals and t-Distributions

Constructing a confidence interval using the t-distribution is similar to constructing a confidence interval using the normal distribution: both use a point estimate $\bar{x}$ and a margin of error E.

GUIDELINES
Constructing a Confidence Interval for the Mean: t-Distribution

1. Identify the sample statistics n, $\bar{x}$, and s. In symbols: $\bar{x} = \frac{\sum x}{n}$, $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
2.
Identify the degrees of freedom, the level of confidence c, and the critical value $t_c$. In symbols: d.f. = n - 1
3. Find the margin of error E. In symbols: $E = t_c \frac{s}{\sqrt{n}}$
4. Find the left and right endpoints and form the confidence interval. Left endpoint: $\bar{x} - E$; right endpoint: $\bar{x} + E$; interval: $\bar{x} - E < \mu < \bar{x} + E$

Point Estimate for the Population Proportion p

Recall from Section 4.2 that the probability of success in a single trial of a binomial experiment is p. This probability is a population proportion. To estimate a population proportion p using a confidence interval, as with confidence intervals for $\mu$, you start with a point estimate.

DEFINITION
The point estimate for p, the population proportion of successes, is given by the proportion of successes in a sample and is denoted by

$$\hat{p} = \frac{x}{n}$$

where x is the number of successes in the sample and n is the sample size. The point estimate for the proportion of failures is $\hat{q} = 1 - \hat{p}$. The symbols $\hat{p}$ and $\hat{q}$ are read as “p hat” and “q hat.”

Confidence Intervals for a Population Proportion p

Constructing a confidence interval for a population proportion p is similar to constructing a confidence interval for a population mean. You start with a point estimate and calculate a margin of error.

DEFINITION
A c-confidence interval for the population proportion p is

$$\hat{p} - E < p < \hat{p} + E \quad \text{where} \quad E = z_c \sqrt{\frac{\hat{p}\hat{q}}{n}}$$

The probability that the confidence interval contains p is c.

A binomial distribution can be approximated by the normal distribution if $np \ge 5$ and $nq \ge 5$. When $n\hat{p} \ge 5$ and $n\hat{q} \ge 5$, the sampling distribution of $\hat{p}$ is approximately normal with a mean of $\mu_{\hat{p}} = p$ and a standard error of

$$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$$

GUIDELINES
Constructing a Confidence Interval for a Population Proportion

1. Identify the sample statistics n and x.
2. Find the point estimate $\hat{p}$. In symbols: $\hat{p} = \frac{x}{n}$
3. Verify that the sampling distribution of $\hat{p}$ can be approximated by the normal distribution. In symbols: $n\hat{p} \ge 5$, $n\hat{q} \ge 5$
4.
Find the critical value $z_c$ that corresponds to the given level of confidence c, using the Standard Normal Table.
5. Find the margin of error E. In symbols: $E = z_c \sqrt{\frac{\hat{p}\hat{q}}{n}}$
6. Find the left and right endpoints and form the confidence interval. Left endpoint: $\hat{p} - E$; right endpoint: $\hat{p} + E$; interval: $\hat{p} - E < p < \hat{p} + E$

Increasing Sample Size to Increase Precision

One way to increase the precision of the confidence interval without decreasing the level of confidence is to increase the sample size.

Finding a Minimum Sample Size to Estimate p

Given a c-confidence level and a margin of error E, the minimum sample size n needed to estimate p is

$$n = \hat{p}\hat{q}\left(\frac{z_c}{E}\right)^2$$

This formula assumes that you have a preliminary estimate for $\hat{p}$ and $\hat{q}$. If not, use $\hat{p} = 0.5$ and $\hat{q} = 0.5$. If the result is not a whole number, round up to the next whole number.

The Chi-Square Distribution

In manufacturing, it is necessary to control the amount that a process varies. For instance, an automobile part manufacturer must produce thousands of parts to be used in the manufacturing process. It is important that the parts vary little or not at all. How can you measure, and consequently control, the amount of variation in the parts? You can start with a point estimate.

DEFINITION
The point estimate for $\sigma^2$ is $s^2$, and the point estimate for $\sigma$ is s. $s^2$ is an unbiased estimate of $\sigma^2$.

You can use a chi-square distribution to construct a confidence interval for the variance and standard deviation.

DEFINITION
If the random variable x has a normal distribution, then the distribution of

$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2}$$

forms a chi-square distribution for samples of any size n > 1. Four properties of the chi-square distribution are as follows.

1. All chi-square values $\chi^2$ are greater than or equal to zero.
2. The chi-square distribution is a family of curves, each determined by the degrees of freedom. To form a confidence interval for $\sigma^2$, use the $\chi^2$-distribution with degrees of freedom equal to one less than the sample size: d.f. = n - 1.
3.
The area under each curve of the chi-square distribution equals one.
4. Chi-square distributions are positively skewed.

[Figure: Chi-square distributions]

Confidence Intervals for $\sigma^2$ and $\sigma$

You can use the critical values $\chi^2_R$ and $\chi^2_L$ to construct confidence intervals for a population variance and standard deviation. As you would expect, the best point estimate for the variance is $s^2$ and the best point estimate for the standard deviation is s.

DEFINITION
A c-confidence interval for a population variance and standard deviation is as follows.

Confidence interval for $\sigma^2$:

$$\frac{(n - 1)s^2}{\chi^2_R} < \sigma^2 < \frac{(n - 1)s^2}{\chi^2_L}$$

Confidence interval for $\sigma$:

$$\sqrt{\frac{(n - 1)s^2}{\chi^2_R}} < \sigma < \sqrt{\frac{(n - 1)s^2}{\chi^2_L}}$$

The probability that the confidence intervals contain $\sigma^2$ or $\sigma$ is c.

GUIDELINES
Constructing a Confidence Interval for a Variance and Standard Deviation

1. Verify that the population has a normal distribution.
2. Identify the sample statistic n and the degrees of freedom. In symbols: d.f. = n - 1
3. Find the point estimate $s^2$. In symbols: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
4. Find the critical values $\chi^2_R$ and $\chi^2_L$ that correspond to the given level of confidence c. Use the chi-square distribution table.
5. Find the left and right endpoints and form the confidence interval for the population variance. Left endpoint: $\frac{(n - 1)s^2}{\chi^2_R}$; right endpoint: $\frac{(n - 1)s^2}{\chi^2_L}$
6. Find the confidence interval for the population standard deviation by taking the square root of each endpoint. Left endpoint: $\sqrt{\frac{(n - 1)s^2}{\chi^2_R}}$; right endpoint: $\sqrt{\frac{(n - 1)s^2}{\chi^2_L}}$

Stating a Hypothesis

A verbal statement, or claim, about a population parameter is called a statistical hypothesis. To test a population parameter, you should carefully state a pair of hypotheses: one that represents the claim and the other, its complement. When one of these hypotheses is false, the other must be true. Either hypothesis (the null hypothesis or the alternative hypothesis) may represent the original claim.

DEFINITION
1.
A null hypothesis $H_0$ is a statistical hypothesis that contains a statement of equality, such as $\le$, $=$, or $\ge$.
2. The alternative hypothesis $H_a$ is the complement of the null hypothesis $H_0$. It is a statement that must be true if $H_0$ is false, and it contains a statement of inequality, such as $>$, $\ne$, or $<$.

$H_0$ is read as “H sub-zero” or “H naught,” and $H_a$ is read as “H sub-a.”

To write the null and alternative hypotheses, translate the claim made about the population parameter from a verbal statement to a mathematical statement. Then write its complement. For instance, if the claim value is k and the population parameter is $\mu$, then some possible pairs of null and alternative hypotheses are

$$H_0: \mu \le k, \quad H_a: \mu > k$$
$$H_0: \mu \ge k, \quad H_a: \mu < k$$
$$H_0: \mu = k, \quad H_a: \mu \ne k$$

Regardless of which of the three pairs of hypotheses you use, you always assume $\mu = k$ and examine the sampling distribution on the basis of this assumption.
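
As a recap of the confidence-interval guidelines earlier in this note (mean with the t-distribution, population proportion, minimum sample size, and variance and standard deviation), the steps can be sketched in Python. This is a minimal sketch under stated assumptions, not part of the original note: the function names are hypothetical, the critical values $t_c$, $\chi^2_R$, and $\chi^2_L$ are assumed to be read from a table and passed in as arguments, and $z_c$ is computed with the standard library's `statistics.NormalDist` in place of the Standard Normal Table.

```python
import math
from statistics import NormalDist, mean, stdev


def t_interval(sample, t_c):
    """Interval for a mean. t_c is assumed to come from a t-table (d.f. = n - 1)."""
    n = len(sample)
    x_bar = mean(sample)            # point estimate of the population mean
    s = stdev(sample)               # sample standard deviation (divides by n - 1)
    E = t_c * s / math.sqrt(n)      # margin of error
    return x_bar - E, x_bar + E


def z_critical(c):
    """Critical value z_c for confidence level c (replaces the table lookup)."""
    return NormalDist().inv_cdf((1 + c) / 2)


def proportion_interval(x, n, c):
    """Interval for a population proportion from x successes in n trials."""
    p_hat = x / n
    q_hat = 1 - p_hat
    # The normal approximation requires n*p_hat >= 5 and n*q_hat >= 5.
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("normal approximation not justified")
    E = z_critical(c) * math.sqrt(p_hat * q_hat / n)
    return p_hat - E, p_hat + E


def min_sample_size(c, E, p_hat=0.5):
    """Minimum n to estimate p within E; p_hat = 0.5 when no estimate exists."""
    n = p_hat * (1 - p_hat) * (z_critical(c) / E) ** 2
    return math.ceil(n)             # round up to the next whole number


def variance_interval(n, s2, chi2_R, chi2_L):
    """Interval for the variance. chi2_R and chi2_L are assumed table values."""
    return (n - 1) * s2 / chi2_R, (n - 1) * s2 / chi2_L


def std_dev_interval(n, s2, chi2_R, chi2_L):
    """Interval for the standard deviation: square root of each endpoint."""
    left, right = variance_interval(n, s2, chi2_R, chi2_L)
    return math.sqrt(left), math.sqrt(right)


if __name__ == "__main__":
    # Illustrative inputs only; the data and table values are assumptions.
    print(t_interval([2, 4, 4, 4, 5, 5, 7, 9], t_c=2.365))   # d.f. = 7, c = 0.95
    print(proportion_interval(x=540, n=1000, c=0.95))
    print(min_sample_size(c=0.95, E=0.03))
    print(std_dev_interval(n=30, s2=1.44, chi2_R=52.336, chi2_L=13.121))
```

With no preliminary estimate, `min_sample_size(0.95, 0.03)` gives 1068, because the formula yields about 1067.07 and the result is rounded up. Passing the t and chi-square critical values as arguments mirrors the table lookups in the guidelines and keeps the sketch free of third-party libraries.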