Hypothesis Testing

Chapters 8, 9 – Introduction to Statistical Inferences, One Population
There are two branches of statistical inference: 1) estimation of parameters and 2) testing hypotheses
about the values of parameters. We will consider estimation first.
Defn: A point estimate of a parameter (a numerical characteristic of a population) is a specific
numerical value based on the data obtained from a sample. To obtain a point estimate of a parameter,
we summarize the information contained in the sample data by using a statistic. A particular statistic
used to provide a point estimate of a parameter is called an estimator.
Characteristics of an Estimator
To provide good estimation, an estimator should have the following characteristics:
1) The estimator should be unbiased. In other words, the expectation, or mean, of the sampling
distribution of the estimator for samples of a given size should be equal to the parameter which the
statistic is estimating.
2) The estimator should be consistent. In other words, if we increase the size of the sample which we
select from the population, the estimator should yield a value which gets closer to the true value of the
parameter being estimated.
3) The estimator should be relatively efficient. In other words, of all possible statistics that could be
used to estimate a particular parameter, we want to choose the statistic whose sampling distribution has
the smallest variance.
It turns out that, for a population mean, μ, the sample mean, X̄, is an estimator which is unbiased,
consistent, and relatively efficient.
In any given situation, we have no way of knowing precisely how close the estimated value of a
parameter is to the true value of the parameter. However, if the estimator satisfies the three properties
listed above, we can be highly confident that the estimated parameter value is unlikely to differ from
the true parameter value by much.
On the other hand, it is nearly certain that the estimated value will not be exactly equal to the true
parameter value. We want to have some idea of the precision of the estimate. Hence, rather than
simply calculating a point estimate of the parameter value, we find a confidence interval estimate.
Defn: A confidence interval estimate of a parameter consists of an interval of numbers obtained from
a point estimate of the parameter, together with a percentage that specifies how confident we are that
the true parameter value lies in the interval. This percentage is called the confidence level, or
confidence coefficient.
Thus, there are three quantities that must be specified: 1) the point estimate of the parameter, 2) the
width of the interval (which is usually centered at the value of the point estimate), and 3) the
confidence level.
Confidence Interval for a Population Mean (Assuming that the population standard deviation is
known):
If we are sampling from a population which has a normal distribution, or if our sample size is large
enough (n ≥ 30), then we may find the confidence interval as follows (we will talk about the more
realistic situation in which the population standard deviation is unknown later).
We know that, in either of the situations (normality or large sample size), the random variable
Z = (X̄ − μ)/(σ/√n)
has either a standard normal probability distribution (in the first case) or an approximate standard normal probability distribution (in the second case). Hence, for a given percentage α, we may make the following probability statement:
P( −z_{α/2} < (X̄ − μ)/(σ/√n) < z_{α/2} ) = 1 − α.
We can rearrange the quantities in parentheses to obtain an equivalent probability statement by multiplying throughout the inequality by the quantity σ/√n. We obtain
P( −z_{α/2}·(σ/√n) < X̄ − μ < z_{α/2}·(σ/√n) ) = 1 − α.
We want to isolate the parameter, μ, in the middle of the inequality, so we subtract X̄ throughout, obtaining
P( −X̄ − z_{α/2}·(σ/√n) < −μ < −X̄ + z_{α/2}·(σ/√n) ) = 1 − α.
We are almost done. We do not want −μ in the center, but rather μ, so we multiply throughout by −1 (which reverses the direction of the inequalities), obtaining
P( X̄ − z_{α/2}·(σ/√n) < μ < X̄ + z_{α/2}·(σ/√n) ) = 1 − α.
Then the endpoints of our confidence interval are X̄ − z_{α/2}·(σ/√n) and X̄ + z_{α/2}·(σ/√n). The confidence level is 1 − α.
Intervals constructed in the above manner are called confidence intervals for the corresponding parameter (in this case, a population mean, μ). We are usually interested in one of several specific values of the confidence level, either 90%, or 95%, or 99%.
1) For 1 − α = 0.90, we have z_{0.05} = 1.645, and the 90% confidence interval for μ has the form
( X̄ − 1.645·(σ/√n), X̄ + 1.645·(σ/√n) ).
2) For 1 − α = 0.95, we have z_{0.025} = 1.96, and the 95% confidence interval for μ has the form
( X̄ − 1.96·(σ/√n), X̄ + 1.96·(σ/√n) ).
3) For 1 − α = 0.99, we have z_{0.005} = 2.575, and the 99% confidence interval for μ has the form
( X̄ − 2.575·(σ/√n), X̄ + 2.575·(σ/√n) ).
We interpret a confidence interval as follows. A confidence interval with confidence level (1 − α)100% is an interval obtained from sample data by a method such that (1 − α)100% of all intervals obtained by this method would, in fact, contain the true value of the parameter, and α·100% of all intervals so obtained would not contain the true value of the parameter. (See page 291 of the textbook.)
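Note: as a quick numerical cross-check of the formula above, here is a minimal Python sketch (my own illustration, assuming the scipy library is available; the values X̄ = 23.5, σ = 2, and n = 36 are made-up numbers, not from the textbook):
```python
from scipy.stats import norm

def z_interval(xbar, sigma, n, conf_level=0.95):
    """CI for mu with sigma known: xbar +/- z_{alpha/2} * sigma/sqrt(n)."""
    alpha = 1 - conf_level
    z_crit = norm.ppf(1 - alpha / 2)      # z_{alpha/2}; about 1.96 when conf_level = 0.95
    margin = z_crit * sigma / n ** 0.5    # half-width of the interval
    return xbar - margin, xbar + margin

# Made-up sample: xbar = 23.5, sigma = 2, n = 36
print(z_interval(23.5, 2, 36, 0.95))      # roughly (22.85, 24.15)
```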
Example: p. 320, Exercise 8.26.
Estimation of Population Means, When σ Is Unknown
If we know the population standard deviation, σ, then we can use the previous formula for finding a
confidence interval for the population mean, μ. However, in nearly every practical situation, we know
neither μ nor σ. We then must use the sample standard deviation, S, as an estimate of the population
standard deviation, and the previous formula for the confidence interval for μ is no longer valid.
Instead of using the standard normal distribution for constructing the confidence interval, we use a
related distribution, called the t distribution.
Characteristics of the t Distribution:
1) It is bell-shaped.
2) The distribution is unimodal (one peak) and symmetric, and the mean, median and mode are all
equal to 0.
3) The curve is continuous; i.e., there are no gaps.
4) The total area under the curve is 1.
5) The curve extends indefinitely in both directions, approaching, but never touching, the horizontal
axis.
6) The variance is greater than 1.
7) There is actually a family of t-distributions, each one characterized by a parameter called the
degrees of freedom.
8) The t-curve is wider (has heavier tails) than the standard normal curve, but as the degrees of
freedom increase, the t-curve gets closer to the standard normal curve.
(See the graph on page 298 of your textbook for examples of t-curves.)
Note: The symbol d.f. will be used for degrees of freedom. The concept denotes the number of values
in the data set which are free to vary. When we first select a random sample of size n, we have n
degrees of freedom. After we compute the sample mean, we have used up one degree of freedom, and
now d.f. = n – 1. If we compute a second statistic from the data, then we use up another degree of
freedom, and now d.f. = n – 2.
Formula for a Confidence Interval for μ When σ Is Unknown, and n < 30:
We modify our previous formula somewhat. When the population standard deviation is unknown, the (1 − α)100% confidence interval for the mean is given by:
( X̄ − t_{α/2, n−1}·(S/√n), X̄ + t_{α/2, n−1}·(S/√n) ).
Here S is the sample standard deviation, and t_{α/2, n−1} is called a critical value for the t distribution with d.f. = n − 1. We can find t critical values using Table F in Appendix C if we know the degrees of freedom of the t-distribution and the confidence level, 1 − α. For this purpose we would use the two-tail values for α. For example, if our confidence level is (1 − α)100% = 95%, then α = 0.05. If our sample size is 25, then d.f. = 24, and the t critical value is 2.064.
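Note: the Table F lookup can be cross-checked with a one-line Python sketch (my own illustration, assuming scipy is available):
```python
from scipy.stats import t

# Two-tail critical value for a 95% confidence level (alpha = 0.05) with d.f. = 24
print(t.ppf(1 - 0.05 / 2, df=24))   # about 2.064, matching the Table F value quoted above
```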
Finding a Confidence Interval for  Using the TI-83 Calculator:
Given a data set of n sample values X1, X2, … , Xn, where these represent a random sample from a
population with unknown mean μ, we can find a (1 − α)100% confidence interval for μ as follows:
1) Enter the data in the calculator, using STAT and 1:Edit.
2) Choose STAT, then TESTS, then 8:T Interval.
3) Choose Data by hitting ENTER; or if the sample mean and sample standard deviation are given to
you, choose Stats.
4) If we don’t know the sample mean and sample standard deviation, then we would choose Data,
enter the variable name for List:, and enter the appropriate confidence level.
If we know the sample mean and the sample standard deviation, then we would choose Stats, rather
than Data, and enter the value of the sample mean, the value of the sample standard deviation, the
sample size, and the appropriate confidence level.
5) Hit ENTER.
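Note: if a TI-83 is not available, the same Data-mode computation can be sketched in Python (my own illustration; the data list below is made up, and scipy is assumed to be installed):
```python
from statistics import mean, stdev
from scipy.stats import t

data = [11.8, 12.1, 12.0, 11.9, 12.2, 12.0, 11.7, 12.1]   # stands in for the list in STAT 1:Edit

n = len(data)
xbar = mean(data)                        # sample mean
s = stdev(data)                          # sample standard deviation S
alpha = 0.05                             # for a 95% confidence level
t_crit = t.ppf(1 - alpha / 2, df=n - 1)  # t_{alpha/2, n-1}
margin = t_crit * s / n ** 0.5
print((xbar - margin, xbar + margin))    # the (1 - alpha)100% confidence interval for mu
```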
Example: p. 383, Exercise 9.29.
Estimation of Population Proportions
Sometimes we want to use sample data to estimate the proportion, p, of a population that has a certain
characteristic. The point estimate of p is the sample proportion, p̂, of sample members who have the
characteristic of interest.
In this case, we are talking about using a binomial experiment to do the estimation. We would select a
random sample of n members of the population; the randomness insures that the n trials are
independent of each other. We are seeking the same information about each sample member, whether
the member has the characteristic of interest. Thus the trials are identical. Each trial results in one of
two possible outcomes; either the member has the characteristic, or the member does not have the
characteristic. Assuming that the sample is random, the probability that a member has the
characteristic is p, the proportion of the population who have the characteristic. Hence the
conditions of a binomial experiment are satisfied. If we define X = number of members of the sample
who have the characteristic, then X has a binomial distribution with parameters n and p.
The proportion of the sample who have the characteristic of interest is then p̂ = X/n. If the sample size is large enough, then this random variable has an approximate normal distribution, and the random variable
(p̂ − p)/√( p̂(1 − p̂)/n )
has an approximate standard normal distribution. We can construct a (1 − α)100% confidence interval for p using this fact. The form of the confidence interval is
( p̂ − z_{α/2}·√( p̂(1 − p̂)/n ), p̂ + z_{α/2}·√( p̂(1 − p̂)/n ) ).
To find z_{α/2}, we look in the table of the standard normal distribution, Table E in Appendix C. If our confidence level is (1 − α)100% = 95%, then α = 0.05, and we divide this by two and subtract from 0.5, to get 0.4750. Looking in the table, we find that the critical value corresponding to the probability 0.4750 is 1.96.
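Note: as an illustrative cross-check (my own sketch, assuming scipy; the counts X = 146 and n = 1283 are borrowed from the survey example later in these notes):
```python
from scipy.stats import norm

def prop_interval(x, n, conf_level=0.95):
    """CI for p: phat +/- z_{alpha/2} * sqrt(phat*(1 - phat)/n)."""
    phat = x / n                              # sample proportion
    alpha = 1 - conf_level
    z_crit = norm.ppf(1 - alpha / 2)          # 1.96 for a 95% confidence level
    margin = z_crit * (phat * (1 - phat) / n) ** 0.5
    return phat - margin, phat + margin

print(prop_interval(146, 1283))               # roughly (0.096, 0.131)
```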
Finding a Confidence Interval for p Using the TI-83 Calculator:
Given the number n of trials in our binomial experiment (size of sample) and the number of successes
(number of members of the sample with the characteristic of interest), where the sample was taken
from a population in which the unknown proportion of members with the characteristic is p, we can
find a (1 − α)100% confidence interval for p as follows:
1) Enter the data in the calculator, using STAT and 1:Edit.
2) Choose STAT, then TESTS, then A:1-PropZInt.
3) Enter the value of X, the sample size, and the appropriate confidence level.
4) Hit ENTER.
Example: p. 397, Exercise 9.62.
Hypothesis Testing
The other branch of statistical inference is concerned with testing hypotheses about the value of
parameters.
Defn: A hypothesis is a statement about the value of a population parameter.
Defn: In a hypothesis test, the hypothesis which the researcher is attempting to prove is called the
alternative (or research) hypothesis, denoted Ha. The negation of the alternative hypothesis is called
the null hypothesis, H0. The null hypothesis usually represents current belief about the value of the
parameter. The researcher suspects that the null hypothesis is false, and wants to disprove it.
Note: Hypotheses are tested in pairs (here the symbol μ0 represents some specified number):
If the null hypothesis is H0: μ = μ0, then the corresponding alternative hypothesis is Ha: μ ≠ μ0.
If the null hypothesis is H0: μ ≥ μ0, then the corresponding alternative hypothesis is Ha: μ < μ0.
If the null hypothesis is H0: μ ≤ μ0, then the corresponding alternative hypothesis is Ha: μ > μ0.
In each case, the alternative hypothesis is a strict inequality; the null hypothesis is not. The above
hypothesis statements are written in terms of a population mean, μ. If we are testing hypotheses about
a population proportion, p, or some other parameter, that parameter would appear instead of μ.
Example: If the researcher wants to prove that the proportion, p, of the U.S. population who catch a
cold between October 1 and March 31 is more than 40%, the two hypotheses would be stated as
follows:
H0: p ≤ 0.40
Ha: p > 0.40
Example: If the researcher is in charge of quality control for the Pepsi-Cola Company, and wants to
determine whether an assembly line at a plant which produces 12-oz. cans of Pepsi is working
properly, the hypotheses would be stated as follows in terms of the mean amount of Pepsi in the cans:
H0: μ = 12 oz.
Ha: μ ≠ 12 oz.
Note: We cannot prove the null hypothesis; we may disprove it, or we may fail to disprove it. After
taking a sample and conducting a test, we will either reject H0 and believe Ha, or we will fail to reject
H0 because of insufficient evidence against it.
There are four possibilities:
1) We reject the null hypothesis when the null hypothesis is, in fact, true. In this case, we make a
Type I Error.
2) We reject the null hypothesis when the null hypothesis is, in fact, false. In this case, we make a
correct decision.
3) We fail to reject the null hypothesis when the null hypothesis is, in fact, true. In this case, we make
a correct decision.
4) We fail to reject the null hypothesis when the null hypothesis is, in fact, false. In this case, we
make a Type II Error.
We want to maximize the probability of making a correct decision, and minimize the probability of
making either a Type I Error or a Type II Error.
We use the following notation:
P(Type I Error) = P(Reject H0 | H0 True) = α.
P(Type II Error) = P(Fail to Reject H0 | Ha True) = β.
We want both of these probabilities to be as small as possible. However, if we choose the value of α
to be smaller, the value of β will automatically become larger.
To see this, consider the analogy of the American criminal justice system. Suppose that you are the
defendant in a murder trial.
The hypotheses are
H0: Defendant innocent
vs.
Ha: Defendant guilty
The prosecutor is attempting to disprove the null hypothesis, thus proving the alternative hypothesis.
To do this, the prosecutor presents evidence to the jury. The jury makes a decision on the basis of the
evidence. The jury can decide to reject the null hypothesis, or to fail to reject the null hypothesis. In
either case, the jury cannot know with absolute certainty that the decision is correct.
If the rules of evidence are changed to make it more difficult for the jury to convict (reject H0), then it
becomes easier for the jury to acquit (fail to reject H0). Then it is less likely that the jury will make a
Type I Error, but more likely that the jury will make a Type II Error. Thus, α has been made smaller,
but β becomes larger.
If the rules of evidence are changed to make it easier for the jury to convict (reject H0), then it becomes
more difficult for the jury to acquit (fail to reject H0). Then it is more likely that the jury will make a
Type I Error, but less likely that the jury will make a Type II Error. Thus, α has been made larger, but
β becomes smaller.
The only way that we can reduce both α and β at the same time (make erroneous decisions less likely)
is to gather more evidence.
In statistical hypothesis testing, “gather more evidence” means choose a larger sample from the
population (choose a larger value of n).
We also begin our hypothesis test by choosing a value for α, based on our assessment of the relative
consequences of making a Type I Error, vs. the consequences of making a Type II Error. If we believe
that the consequences of making a Type I Error are more serious than the consequences of making a
Type II Error, we want to choose α to be rather small, perhaps 0.01 or 0.001. If we believe that the
consequences of making a Type II Error are more serious than the consequences of making a Type I
Error, we want to choose α to be rather large, perhaps 0.10. The most common choice for α is 0.05.
The evidence from the sample is used in the form of the value of a test statistic.
Defn: The test statistic is a random variable whose value depends on the sample data, such that the
value of the statistic tells us whether to reject H0 or to fail to reject H0. We must choose the test
statistic so that we know its sampling distribution under H0.
Defn: The rejection region is the set of possible values of the test statistic which would lead us to
reject H0 and assert that we have proved Ha.
Defn: The significance level of the test is the probability, α, of making a Type I Error.
Defn: The p-value of the test is the probability (under H0) of obtaining a value of the test statistic at
least as extreme as the one which is actually obtained from the data.
Note: To make our decision, we compare the computed p-value, obtained from the calculator, with our
chosen value of α. If the p-value is smaller than α, we reject H0 and say that we have proved Ha. If
the p-value is larger than our chosen value of α, then we fail to reject H0.
Types of Alternative Hypotheses: One-Tailed Tests or Two-Tailed Tests
There are three forms for an alternative hypothesis:
1) Ha: μ ≠ μ0
2) Ha: μ < μ0
3) Ha: μ > μ0
The first form results in a two-tailed test, so called because the rejection region consists of two
intervals, one associated with the left-hand tail of the probability distribution of the test statistic, and
the other associated with the right-hand tail of the distribution.
The second form results in a one-tailed test, because the rejection region is an interval associated with
the left-hand tail of the probability distribution of the test statistic.
The third form results in a one-tailed test, because the rejection region is an interval associated with the
right-hand tail of the probability distribution of the test statistic.
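Note: to make the tail distinction concrete, here is a small sketch (my own illustration with an arbitrary value z = 1.8, assuming scipy) showing how the p-value would be computed under each alternative when the test statistic is a Z-statistic:
```python
from scipy.stats import norm

z = 1.8                                    # hypothetical observed test statistic
p_two_tailed = 2 * (1 - norm.cdf(abs(z)))  # Ha: parameter != hypothesized value
p_left_tail = norm.cdf(z)                  # Ha: parameter <  hypothesized value
p_right_tail = 1 - norm.cdf(z)             # Ha: parameter >  hypothesized value
print(p_two_tailed, p_left_tail, p_right_tail)   # about 0.072, 0.964, 0.036
```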
Steps in Hypothesis Testing
Step 1: State the alternative hypothesis, Ha. (This is what we are trying to prove; it must be a strict inequality. Making this statement will involve translating the research conjecture into a statement involving the parameter.)
State the null hypothesis, H0. (This is what we are trying to disprove; it cannot be a strict inequality. This statement will be the negation of the alternative hypothesis.)
Step 2: Choose the size of the random sample(s) to be selected from the population(s). Choose the significance level, α, for the test. In this course, you will usually be given this number, and the most common value is 0.05. In a real research situation, the researcher would choose the value based on an assessment of the relative seriousness of the consequences of making a Type I Error vs. a Type II Error.
Step 3: Choose the test statistic which will be used. In this course, if the hypotheses concern one or more population means, the test statistic will be a t-statistic. If the hypotheses concern a population proportion, the test statistic will be a Z-statistic.
Step 4: Choose the random sample(s) and collect the data. Compute the value of the test statistic and the associated p-value. We will do this using the TI-83 calculator.
Step 5: Make a decision based on a comparison of the p-value associated with the test statistic with the chosen level of significance. If p-value < α, we reject H0 and affirm Ha. Otherwise, we fail to reject H0.
Step 6: State the conclusion of the hypothesis test.
If we reject the null hypothesis, the conclusion should have the following form: “We reject H0 at the (α) level of significance. We have sufficient evidence to conclude (statement of research conjecture).”
If we fail to reject the null hypothesis, the conclusion should have the following form: “We fail to reject H0 at the (α) level of significance. We do not have sufficient evidence to conclude (statement of research conjecture).”
Example: Suppose a consumer protection agency has received complaints that 24-oz. packages of Post
Grape-Nuts contain less than 24 oz. of product. The agency wants to test this claim. They select a
random sample of 100 boxes of cereal and measure the amount of product in each one. The sample
average is X̄ = 23.94 oz., and the sample standard deviation is S = 0.13 oz. Test the claim at the 0.05
level of significance.
Step 1: H0: μ ≥ 24 oz.
Ha: μ < 24 oz.
Step 2: n = 100, α = 0.05
Step 3: The test statistic will be t = (X̄ − 24 oz.)/(S/√n). Under the null hypothesis, this statistic has a t
distribution with d.f. = 99.
Step 4: We choose our random sample of 100 boxes and weigh the contents of each one. We obtain X̄ =
23.94 oz. and S = 0.13 oz. Using the TI-83, choose STAT, then TESTS, then 2:T-Test. Choose Stats, rather than
Data, since in this case we know the sample mean and sample standard deviation. Input the value of
μ0 appearing in the null hypothesis, in this case, 24. Input the sample mean and the sample standard
deviation, and the sample size. Also, we must tell the calculator which of the three forms of alternative
hypothesis we are using. In this case, we are using the second form. Choose Calculate.
Step 5: The value of the test statistic is –4.6154. The p-value associated with this test is 0.00000589.
This is less than α, so we reject the null hypothesis.
Step 6: We reject H0 at the 0.05 level of significance. We have sufficient evidence to conclude that
the average amount of Grape Nuts in a 24-oz. box is less than 24 oz.
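Note: the calculator output in Steps 4 and 5 can be reproduced with a short Python sketch (my own illustration, assuming scipy; it is not part of the assignment):
```python
from scipy.stats import t

xbar, mu0, s, n = 23.94, 24, 0.13, 100
t_stat = (xbar - mu0) / (s / n ** 0.5)    # about -4.6154
p_value = t.cdf(t_stat, df=n - 1)         # left-tailed test, Ha: mu < 24 oz.
print(t_stat, p_value)                    # p-value about 5.9e-06

alpha = 0.05
print("reject H0" if p_value < alpha else "fail to reject H0")
```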
Example: p. 385, Exercise 9.42.
Example: p. 385, Exercise 9.43.
Tests of Hypotheses Concerning Population Proportions
The types of alternative hypotheses are as follows:
Ha: p ≠ p0
Ha: p < p0
Ha: p > p0
In order to conduct a test of hypotheses about the value of a population proportion, we must do a
binomial experiment. We will choose a random sample of n members of the population; this will
guarantee that the n trials in our experiment are independent of each other. For each member of the
population we want to know the same information – does this member have the characteristic of
interest; hence the n trials are also identical. For each trial, the information sought is in the form of a
yes-or-no question – does the member have the characteristic of interest; hence there are only two
possible outcomes for each trial. Finally, since we are randomly sampling from the population, the
probability that any one member of the sample will have the characteristic of interest is simply the
proportion, p, of the entire population who have the characteristic of interest. Hence the probability of
success is the same for each trial, namely p.
From the Central Limit Theorem, if our sample size is large enough, then the following random variable has an approximate standard normal distribution:
Z = (p̂ − p)/√( p(1 − p)/n ).
The normal approximation will be reasonable if n > 20, np > 5, and n(1 − p) > 5. In the formula above, p̂ is the proportion of the sample who have the characteristic of interest.
Under the null hypothesis, this random variable is a statistic,
Z = (p̂ − p0)/√( p0(1 − p0)/n ).
This will be the test statistic for our hypothesis test.
The hypothesis test will have the same steps as those used for testing hypotheses about population
means:
Step 1: We state the null and alternative hypotheses.
Step 2: We state our chosen sample size and significance level.
Step 3: We state the form of the test statistic and its probability distribution under the null hypothesis.
Step 4: We choose our random sample, do our measurements, and compute the value of the test
statistic and the associated p-value.
Step 5: We compare the p-value with α to make a decision.
Step 6: We state our conclusion, in terms of the statement we were trying to prove, and including the
level of significance of the test.
Example: The U.S. Substance Abuse and Mental Health Services Administration conducts surveys on
drug use by type of drug and age group. Results are published in National Household Survey on Drug
Abuse. According to that publication, 12.0% of 18-25 year-olds were current users of marijuana or
hashish in 1995. A recent poll of 1283 randomly selected 18-25 year-olds revealed that 146 of them
currently use marijuana or hashish. At the 10% significance level, do the data provide sufficient
evidence to conclude that the percentage of 18-25 year-olds who currently use marijuana or hashish
has changed from the 1995 percentage of 12.0%?
Step 1: H0: p = 0.12
Ha: p ≠ 0.12
Step 2: n = 1283 and α = 0.10
Step 3: The test statistic is Z = (p̂ − 0.12)/√( (0.12)(0.88)/1283 ), which under the null hypothesis has an approximate standard normal distribution.
Step 4: X = 146. We choose STAT, TESTS, 1-PropZTest. We enter the null-hypothesized value,
0.12, the value of X, and the sample size. We tell the calculator which of the three alternatives we are
using. We find z = -0.683861, and
p-value = 0.4940628.
Step 5: p-value > α. We don’t reject the null hypothesis.
Step 6: We fail to reject H0 at the 0.10 level of significance. We do not have sufficient evidence to
conclude that the proportion of 18-25 year-olds currently using marijuana or hashish differs from the
proportion found in 1995.
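Note: again as a cross-check (my own sketch, assuming scipy; not the calculator’s internal code), the same z statistic and two-tailed p-value follow directly from the formula:
```python
from scipy.stats import norm

x, n, p0 = 146, 1283, 0.12
phat = x / n
z = (phat - p0) / (p0 * (1 - p0) / n) ** 0.5   # about -0.684
p_value = 2 * norm.cdf(-abs(z))                # two-tailed, Ha: p != 0.12
print(z, p_value)                              # p-value about 0.494

alpha = 0.10
print("reject H0" if p_value < alpha else "fail to reject H0")
```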
Example: p. 401, Exercise 9.83.
Example: p. 401, Exercise 9.85.