Introduction to Hypothesis Testing

A Hypothesis Test for μ

Heuristic

Hypothesis testing works a lot like our legal system. In the legal system, the accused is innocent until proven guilty. After examining the evidence, the accused is found either "guilty" or "not guilty" by a jury of peers. How much evidence does it take to convict? The answer is different for every jury. The process is also not perfect, meaning mistakes are made: an innocent person can be sent to prison, or a guilty person can go free. Let's put these ideas into the framework of hypothesis testing.

Statistical

Suppose a researcher has reason to believe the population mean is different from what has been accepted. The belief that has been around for some time (the status quo) is called the null hypothesis, denoted by H0. The belief that the true mean may actually differ from this null-hypothesized value is called the alternative hypothesis, denoted by HA.

Stating the hypotheses

We state the null hypothesis in the following way:

H0: μ = μ0   (the "=" sign always goes with H0)

Then the alternative hypothesis can be one of the following three statements:

HA: μ > μ0
HA: μ < μ0
HA: μ ≠ μ0

Finding the "evidence"

We use X̄ and knowledge of its distribution to gather our evidence. Intuitively, the further away X̄ is from μ0, the more evidence we have that the null hypothesis is not true. Let

t_s = (X̄ − μ0) / (s / √n)

be our test statistic.
If the null hypothesis were true (remember: innocent until proven guilty!), then t_s follows a t distribution with df = n − 1, and we can compute probabilities associated with it.

When X̄ is close to μ0, then t_s is close to 0.
When X̄ is larger than μ0, then t_s is large and positive.
When X̄ is smaller than μ0, then t_s is large and negative.

[Diagram: finding the "evidence" for the three possible tests: H0: μ = μ0 vs. HA: μ > μ0; H0: μ = μ0 vs. HA: μ < μ0; H0: μ = μ0 vs. HA: μ ≠ μ0]
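As a concrete illustration, here is a minimal Python sketch (not part of the original handout) that computes the test statistic for a small made-up sample; the null value μ0 = 50 and the data are invented purely for illustration.

```python
# Minimal sketch (not from the handout): computing t_s for a hypothetical sample.
import numpy as np

mu_0 = 50.0                                   # null-hypothesized mean (assumed for illustration)
x = np.array([52.1, 49.3, 51.8, 50.9, 53.2, 48.7, 51.1, 52.5])  # made-up data

x_bar = x.mean()
s = x.std(ddof=1)                             # sample standard deviation
n = len(x)

t_s = (x_bar - mu_0) / (s / np.sqrt(n))       # t_s = (X-bar - mu_0) / (s / sqrt(n))
print(f"x-bar = {x_bar:.2f}, s = {s:.2f}, t_s = {t_s:.2f}")
# The further x-bar is from mu_0, the larger |t_s|, and the stronger the evidence against H0.
```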
Definition

The P-value of a test statistic is the probability, given that the null hypothesis is true, of observing a test statistic that extreme or more extreme in the direction of the alternative hypothesis.

The Decision

So, the P-value quantifies how extreme our observed test statistic is, given that the null hypothesis is true. The smaller the P-value, the stronger the evidence against the null hypothesis.

Question: How much evidence is needed to conclude the null hypothesis is incorrect?

Answer: This varies from researcher to researcher, so we make a pre-specified cut-off, α, before we conduct the test of hypothesis. We call α the significance level of the test.

We reject H0 when P ≤ α.
We fail to reject H0 when P > α.

Steps for Carrying Out a Hypothesis Test

(1) Set α (the significance level)
(2) State the hypotheses
(3) Compute the test statistic
(4) Compute the P-value
(5) Make the decision
(6) State the conclusion in the context of the setting
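Steps (3) through (5) can be collected into one short helper. The sketch below is my own illustration, assuming the summary statistics (x̄, s, n) are already in hand; the function name, argument names, and defaults are not from the handout.

```python
# Sketch of steps 3-5 as one helper; names and defaults are mine, not the handout's.
from scipy import stats

def one_sample_t_test(x_bar, s, n, mu_0, alternative="two-sided", alpha=0.05):
    t_s = (x_bar - mu_0) / (s / n ** 0.5)      # step 3: test statistic
    df = n - 1
    if alternative == "less":                  # HA: mu < mu_0
        p = stats.t.cdf(t_s, df)
    elif alternative == "greater":             # HA: mu > mu_0
        p = stats.t.sf(t_s, df)
    else:                                      # HA: mu != mu_0
        p = 2 * stats.t.sf(abs(t_s), df)       # step 4: P-value
    decision = "reject H0" if p <= alpha else "fail to reject H0"   # step 5
    return t_s, p, decision
```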
Example: The National Center for Health Statistics reports that the mean systolic blood pressure for males aged 35-44 is 128 mmHg. A medical researcher believes the mean systolic blood pressure for male executives in this age group is lower than 128 mmHg. A random sample of 72 male executives in this age group yields a sample mean of 126.1 mmHg and a sample standard deviation of 15.2 mmHg. Is there evidence to support the researcher's claim? Test this hypothesis at the 0.05 level of significance.
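Here is a sketch (mine, not the handout's worked solution) that carries out steps 3 through 5 for this example with H0: μ = 128, HA: μ < 128, and α = 0.05; the rounded values in the comments follow directly from that arithmetic.

```python
# Sketch of steps 3-5 for the blood-pressure example: H0: mu = 128, HA: mu < 128, alpha = 0.05.
from scipy import stats

mu_0, n, x_bar, s, alpha = 128.0, 72, 126.1, 15.2, 0.05

t_s = (x_bar - mu_0) / (s / n ** 0.5)     # roughly -1.06
p = stats.t.cdf(t_s, df=n - 1)            # lower-tailed P-value, roughly 0.15
print(f"t_s = {t_s:.2f}, P-value = {p:.3f}")
print("reject H0" if p <= alpha else "fail to reject H0")   # P > 0.05, so fail to reject H0
```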
What would the P-value be for:

HA: μ ≠ 128
HA: μ > 128

(See the sketch after the error definitions below.)

Errors

When we make a decision (reject or fail to reject H0), are we always correct? We can make two types of errors in hypothesis testing.

Definition: The false positive rate (a.k.a. the Type I error rate) of a test is the probability of rejecting H0 when it is true.
NOTATION: α = P{reject H0 | H0 true}

Definition: The false negative rate (a.k.a. the Type II error rate) of a test is the probability of failing to reject H0 when it is false.
NOTATION: β = P{fail to reject H0 | H0 false}
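For the P-value question above, the sketch below (my own, not the handout's) reuses the same summary statistics and shows how the P-value changes when the alternative is two-sided or upper-tailed.

```python
# Sketch: P-values for the other two alternatives, using the same summary statistics.
from scipy import stats

mu_0, n, x_bar, s = 128.0, 72, 126.1, 15.2
t_s = (x_bar - mu_0) / (s / n ** 0.5)
df = n - 1

p_two_sided = 2 * stats.t.sf(abs(t_s), df)   # HA: mu != 128 -> both tails, roughly 0.29
p_greater = stats.t.sf(t_s, df)              # HA: mu > 128 -> upper tail, roughly 0.85
print(f"two-sided: {p_two_sided:.3f}, greater: {p_greater:.3f}")
```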
Choosing α

If we think of α (the significance level of a test) as the probability of rejecting the null hypothesis given that the null hypothesis is actually true, then we would certainly want to choose a very small α to guard against this type of error. Right? It turns out we cannot simultaneously minimize both α and β. Traditionally, we attend to α:

If a false positive error is worse than a false negative, drive α very low (0.01, 0.005, ...).
If a false negative error is worse than a false positive, let α rise (0.10, or even 0.15).
If you're not sure or can't distinguish, then a traditional middle ground is α = 0.05.

Example

Suppose an immunotherapy is proposed as an effective therapy against cancer. Suppose the immunotherapy is tested on cancer patients who are already taking chemotherapy, and some measure of change in response (change in tumor size, say) is recorded, with

H0: no effect of immunotherapy
HA: beneficial effect of immunotherapy

A Type I error would waste a lot of patients' money on a useless immunotherapy.
A Type II error would dismiss an effective cure as useless.

Deciding which type of error is worse isn't always easy!

Power

Definition: The power of a test is the probability of rejecting H0 when it is false.
NOTATION: power = P{reject H0 | H0 false}

Notice: P{reject H0 | H0 false} = 1 − P{fail to reject H0 | H0 false} = 1 − β. So power is the complement of the false negative rate. We can estimate the power of a hypothesis testing procedure in advance (the details are beyond the scope of this course), and we often try to design experiments so that power = 1 − β ≥ 0.80.
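As an illustration of such a calculation, here is a sketch of the power of a lower-tailed one-sample t test, computed from a noncentral t distribution. The "true" mean of 125 mmHg and the population standard deviation of 15.2 mmHg are assumptions chosen only for illustration; they are not from the handout.

```python
# Sketch of a power calculation for a lower-tailed one-sample t test.
# The "true" mean (125) and sigma (15.2) are assumptions chosen only for illustration.
from scipy import stats

mu_0, mu_true, sigma, n, alpha = 128.0, 125.0, 15.2, 72, 0.05
df = n - 1

t_crit = stats.t.ppf(alpha, df)                  # reject H0 when t_s <= t_crit
delta = (mu_true - mu_0) / (sigma / n ** 0.5)    # noncentrality parameter under this alternative
power = stats.nct.cdf(t_crit, df, delta)         # P{reject H0 | mu = mu_true} = 1 - beta
print(f"power = {power:.2f}")                    # roughly 0.5 here, below the usual 0.80 target
```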