What is inference? Hypothesis tests (the t-test)

This week's lectures are actually a combination of chapter 6 and chapter 7 material.

Knowing something about a sample, on its own, isn't very useful. Information about the population as a whole is much more useful for making decisions.

Example: The city of Vancouver is trying to decide on a new water usage policy, so they gather a sample of water usage from 100 houses. In the grand scheme of the city, the water habits of 100 houses don't matter much; the water habits of the city as a whole matter a great deal. We find out what would change the habits of the people in the sample, and infer that the same thing would change the habits of people in the population.

This leap, from using information about the sample to assuming the whole population is like that, is called inference. Our two main tools of inference are hypothesis tests and confidence intervals. A hypothesis test is where we make a guess (hypo-thesis, meaning less than a full idea) about a parameter and use statistics to reject it or fail to reject it. A confidence interval is an interval, based on statistics, of where a parameter is likely to be. Today we'll cover hypothesis testing.

First, we need a hypothesis to test. We're always testing against the null hypothesis, or H₀. "Null" means "nothing", or "blank". The null hypothesis is the general assertion of no change, no effect, or that everything is in its default state (the status quo).

In court, one is presumed innocent until proven guilty. In other words, the idea that someone is innocent is the null hypothesis. It's what we assume with no evidence, our starting point. Being guilty is the other state someone can be in; it's the alternative to innocence. The hypothesis that someone is guilty is the alternative hypothesis, or H₁.

The burden of evidence is to prove guilt. Or, more generally, the burden is to disprove innocence. With strong enough evidence we can reject the idea of innocence. We can reject the null hypothesis!

We always hypothesize about parameters. In the case of the courts, the parameter, guilt/innocence, is whether, among all of the person's actions, the offending action happened. The sample would be any portion of those actions that we observe (directly or indirectly), in the form of evidence.

In statistics, one of the most common things we hypothesize about is the mean. We can hypothesize about any parameter, but the mean is among the simplest and most useful. A hypothesis test about a mean typically looks like this:

    H₀: μ = μ₀
    H₁: μ > μ₀   (or μ < μ₀, or μ ≠ μ₀)

H₀ and H₁ are the names of the two hypotheses. μ is the mean, about which we are making a hypothesis. μ₀ is the hypothesized value of the mean; this is always given to you in the question (and in real life).

The null hypothesis, H₀, is always "= something". The data from our test tells us how far from that single point we are. The alternative hypothesis, H₁, always uses >, <, or ≠. The last one, ≠, means "not equal". Which symbol to use will be given implicitly in the question, and depends on what we're trying to prove.

Example: Any level up to 20 parts per billion of cadmium in a lake is considered safe. We're testing a lake for cadmium by taking water samples. The burden of proof rests on showing that the water is above this safe level. What does the set of hypotheses look like? (There's a short software sketch of this setup right after the list below.)

- We're hypothesizing about the average level of cadmium in the water: that's the mean, μ.
- We're trying to detect whether the cadmium level is more than 20: that's the alternative hypothesis, H₁: μ > 20. When doing a study, the alternative is sometimes called the research hypothesis.
- That also means the null hypothesis is that the level is 20: H₀: μ = 20.
- The null and alternative are always talking about the same parameter (μ) and the same value (20).
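Here's a minimal sketch of how this hypothesis setup maps onto a software call, assuming Python with SciPy is available (the course itself will use a t-table and, later, SPSS). The cadmium readings in the list are made-up numbers, purely so there is something to feed the function; they are not real measurements.

    # A minimal sketch of the one-sample, one-sided t-test set up above.
    # Assumes Python with SciPy (the course itself uses tables and SPSS).
    from scipy import stats

    mu0 = 20  # the value in H0: the safe cadmium limit (parts per billion)

    # Made-up cadmium readings from 10 water samples, purely for illustration
    readings = [22.1, 24.3, 19.8, 23.5, 25.0, 21.7, 22.9, 24.8, 20.6, 23.3]

    # H0: mu = 20  versus  H1: mu > 20, so the alternative is "greater"
    # (the alternative= keyword needs SciPy 1.6 or newer)
    result = stats.ttest_1samp(readings, popmean=mu0, alternative="greater")

    print(f"t = {result.statistic:.2f}, one-sided p-value = {result.pvalue:.4f}")

Reading the output works the same way as reading the table: if the printed p-value is below the chosen level of significance, reject H₀.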
Hypothesis tests? Got it? Yes, yes, wonderful. But how do we use them? …with a t-test.

Say we gathered samples from Lake Ontario. From a sample of 10, the (sample) mean was 23 and the (sample) standard deviation was 2.5. If the lake has too much cadmium, we'll have to take action, so we want to be sure enough that it's over the limit that there's only a 0.05 chance of claiming it's over when it's not (i.e. when the null is true).

"The only lake you can develop your film in" – Ron James

First, get the t-score. How many standard errors above 20 are we?

    t = (x̄ - μ₀) / (s / √n) = (23 - 20) / (2.5 / √10) ≈ 3.79

Then get the degrees of freedom: df = n - 1 = 10 - 1 = 9.

"H₁: μ > 20" is one-sided (as opposed to "μ ≠ 20", which would be two-sided), so we use the t-table for one-tailed tests at df = 9 (page 520):

    Level of Significance for One-Tailed Test (α)
    df      .10      .05      .025     .01      .005
    ...     ...      ...      ...      ...      ...
    9       1.383    1.833    2.262    2.821    3.250

We want to compare our t-score of 3.79 to one of these values, but which one? We have the magic phrase: "only a 0.05 chance of claiming it's over when it's not". That means a level of significance of α = .05, so we look at the .05 column: for df = 9, the critical value is 1.833.

For 9 degrees of freedom, there is a .05 chance of getting 1.833 or more standard errors above the mean. So there is a less than .05 chance of getting at least 3.79 standard errors above the mean. (We don't care how much less.)

Since getting a t-score of 3.79 is so unlikely, either…

Situation 1: …the (true) mean concentration of the lake is 20 and we happened to get a really high-concentration sample by dumb luck, or by chance.

Situation 2: …the (true) mean concentration is, in fact, higher than 20.

The chance of situation 1 happening (a sample this high by dumb luck, when the true mean really is 20) is less than .05. Since we set the level of significance at .05, we're willing to take that chance and… reject the null hypothesis.

The evidence against the null hypothesis is too strong not to reject it: we're getting samples with concentrations over 20 by a wide enough margin that it's not reasonable to say the concentration is within safe limits. It's very unlikely that this sample came out this high by a glitch of chance. That chance of a glitch is called the p-value. We've concluded that the p-value is less than α = .05; in fact, our t-score of 3.79 is beyond every value in the df = 9 row, so the p-value is less than .005, given that t > 3.250. That's very strong evidence against the null hypothesis indeed. In a court setting, we would say that there's enough evidence for a conviction: there is strong enough evidence to reject the null hypothesis of innocence.

If the sample mean were fewer standard errors above 20, the p-value could be bigger than α = 0.05. Then there wouldn't be enough evidence to reject H₀, and in that case we fail to reject H₀. We can't say that the true mean is exactly 20, because it's probably off one way or another. So we don't accept the null, we merely fail to reject it.

Even when rejecting H₀, we don't accept H₁, because H₁ wasn't the hypothesis being tested. We statisticians are very skeptical people: never accept, always reject or fail to reject.

Back to the court analogy: even if we get slam-dunk evidence proving that someone couldn't have committed a particular crime, it doesn't mean they're innocent. They might have done some other crime (stolen the crown jewels?) and gotten away with it.
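Before moving on, here is the whole Lake Ontario calculation redone as a short Python/SciPy sketch, using only the summary numbers from the lecture (n = 10, x̄ = 23, s = 2.5, μ₀ = 20, α = .05). SciPy is an assumption of this sketch, not the course's tool; it simply plays the role of the t-table.

    import math
    from scipy import stats

    # Summary statistics from the Lake Ontario example
    n = 10         # number of water samples
    xbar = 23.0    # sample mean cadmium concentration (ppb)
    s = 2.5        # sample standard deviation
    mu0 = 20.0     # hypothesized mean under H0 (the safe limit)
    alpha = 0.05   # level of significance

    # t-score: how many standard errors the sample mean sits above mu0
    se = s / math.sqrt(n)          # standard error, about 0.79
    t_score = (xbar - mu0) / se    # about 3.79

    df = n - 1                     # degrees of freedom = 9

    # One-sided critical value; plays the role of the table's 1.833
    t_crit = stats.t.ppf(1 - alpha, df)

    # One-sided p-value: chance of a t-score at least this big when H0 is true
    p_value = stats.t.sf(t_score, df)

    print(f"t = {t_score:.2f}, critical value = {t_crit:.3f}, p-value = {p_value:.4f}")
    if t_score > t_crit:
        print("Reject H0: evidence the mean concentration is above 20 ppb.")
    else:
        print("Fail to reject H0.")

The printed p-value comes out below .005, the same conclusion the table gives, since our t-score sits beyond the 3.250 cutoff in the df = 9 row.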
Next time, and possibly Monday:
- T-test in SPSS
- Hypothesis tests of proportions
- Confidence intervals