Exam
• Exam starts two weeks from today

Amusing Statistics
• Use what you know about normal distributions to evaluate this finding: The study, published in Pediatrics, the journal of the American Academy of Pediatrics, found that among the 4,508 students in Grades 5-8 who participated, 36 per cent reported excellent school performance, 38 per cent reported good performance, 20 per cent said they were average performers, and 7 per cent said they performed below average.

Review
• The Z-test is used to compare the mean of a sample to the mean of a population:
  $Z_{\bar{x}} = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
• The Z-score is normally distributed
• Thus the probability of obtaining any given Z-score by random sampling is given by the Z table
• We can likewise determine critical values for Z such that we would reject the null hypothesis if our computed Z-score exceeds these values
  – For alpha = .05:
    • Zcrit (one-tailed) = 1.64
    • Zcrit (two-tailed) = 1.96

Confidence Intervals
• A related question you might ask:
  – Suppose you’ve measured a mean and computed a standard error of that mean
  – What is the range of values such that there is a 95% chance of the population mean falling within that range?
• There is a 2.5% chance that the population mean is actually more than 1.96 standard errors above the observed mean
  [Figure: Gaussian (Normal) Distribution, probability vs. score, annotated "True mean?"; 95% of the area lies below +1.96 and 2.5% above it]
• There is a 2.5% chance that the population mean is actually more than 1.96 standard errors below the observed mean
  [Figure: Gaussian (Normal) Distribution, probability vs. score, annotated "True mean?"; 2.5% of the area lies below -1.96 and 95% above it]
• Thus there is a 95% chance that the true population mean falls within + or - 1.96 standard errors from a sample mean
• Likewise, there is a 95% chance that the true population mean falls within + or - 1.96 standard deviations from a single measurement
• This is called the 95% confidence interval…and it is very useful
• It works like significance bounds…if the 95% C.I. doesn’t include the mean of a population you’re comparing your sample to, then your sample is significantly different from that population
• Consider an example:
  – You measure the concentration of mercury in your backyard to be .009 mg/kg
  – The concentration of mercury in the Earth’s crust is .007 mg/kg. Let’s pretend that, when measured at many sites around the globe, the standard deviation is known to be .002 mg/kg
  $\bar{x}_{backyard} = .009$ mg/kg, $\sigma = .002$ mg/kg
• The 95% confidence interval for this mercury measurement is
  95% C.I. = $\bar{x} \pm Z_{crit}$ (two-tailed) $\times\ \sigma$ = .009 +/- (1.96 x .002) mg/kg = .0051 to .0129 mg/kg
• This interval includes .007 mg/kg which, it turns out, is the mean concentration found in the earth’s crust in general
  .0051 < .007 < .0129
• Thus you cannot conclude that your backyard is artificially contaminated by mercury
• Imagine you take 25 samples from around Alberta and you found:
  $\bar{x} = .009$ mg/kg, $\sigma = .002$ mg/kg, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{.002}{\sqrt{25}} = .0004$ mg/kg
• .009 +/- (1.96 x .0004) = .008216 to .009784
• This interval doesn’t include the .007 mg/kg value for the earth’s crust so you would conclude that Alberta has an artificially elevated amount of mercury in the soil

Power
• we perform a Z-test and determine that the difference between the mean of our sample and the mean of the population is unlikely to be due to chance (p < .05)
• we say that we have a significant result…
• but what if p is > .05?
• What are the two reasons why p comes out greater than .05?
  – Your experiment lacked Statistical Power and you made a Type II Error
  – The null hypothesis really is true
• Two approaches:
  – The Hopelessly Jaded Grad Student Solution
  – The Wise and Well Adjusted Professor Procedure
1. Hopelessly Jaded Grad Student Solution
  - conclude that your hypothesis was wrong and go directly to the grad student pub
  - This is not the recommended course of action
2. The Wise Professor Procedure
  - consider the several reasons why you might not have detected a significant effect
  - recommended by wise professors the world over
• Why might p be greater than .05? Recall that:
  $Z_{\bar{x}} = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
  1. Small effect size: $\bar{x}$ is quite close to the mean of the population, $\mu$
    – The effect doesn’t stand out from the variability in the data
    – You might be able to increase your effect size (e.g. with a larger dose or treatment)
  2. Noisy data: $\sigma$, and therefore $\sigma_{\bar{x}}$, is quite large
    – A large denominator will swamp the small effect
    – Take greater care to reduce measurement errors
  3. Sample size is too small: $\sigma_{\bar{x}}$ is quite large because n is small
    – A large denominator will swamp the small effect
    – Run more subjects
• The solution in each case is more power:
  – Power is like sensitivity - the ability to detect small effects in noisy data
  – It is the complement of the Type II Error rate: power = 1 - β, where β is the probability of a Type II Error
  – So that you know: there are equations for computing statistical power
• An important point about power and the null hypothesis:
  – Failing to reject the null hypothesis DOES NOT PROVE it to be true!!!
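The slides mention that there are equations for computing statistical power without showing one. As a rough illustration only, the sketch below recomputes the Alberta confidence interval and then estimates the power of a two-tailed one-sample Z-test for a few sample sizes. The helper names `confidence_interval` and `z_test_power` are hypothetical, and the power figures rest on an assumption that is not in the lecture: that the true Alberta mean really is .009 mg/kg.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def confidence_interval(mean, sigma, n, level=0.95):
    """Two-tailed confidence interval around a sample mean (sigma known)."""
    z_crit = STD_NORMAL.inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    half_width = z_crit * sigma / sqrt(n)
    return mean - half_width, mean + half_width


def z_test_power(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of a two-tailed one-sample Z-test if the true mean is mu_true."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # True effect expressed in standard-error units
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    # The computed Z is distributed N(shift, 1); power is the chance that it
    # falls beyond either critical value.
    return STD_NORMAL.cdf(-z_crit - shift) + 1 - STD_NORMAL.cdf(z_crit - shift)


# Alberta example from the slides: sample mean .009, sigma .002, 25 samples
print(confidence_interval(0.009, 0.002, 25))

# ASSUMPTION (not from the lecture): the true Alberta mean is .009 mg/kg,
# tested against the crust mean of .007 mg/kg
for n in (2, 5, 25):
    print(n, round(z_test_power(0.007, 0.009, 0.002, n), 3))
```

Under these assumptions, shrinking n or inflating sigma reduces `shift` and therefore power, which mirrors reasons 2 and 3 above; moving `mu_true` toward `mu0` mirrors reason 1.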
Power
• Consider an example:
  – How to prove that smoking does not cause cancer:
    • enroll 2 people who smoke infrequently and use an antique X-Ray camera to look for cancer
    • Compare the mean cancer rate in your group (which will probably be zero) to the cancer rate in the population (which won’t be) with a Z-test
  – If p came out greater than .05, you still wouldn’t believe that smoking doesn’t cause cancer
  – You will, however, often encounter statements such as “The study failed to find…” misinterpreted as “The study proved no effect of…”

Experimental Design
• We’ve been using examples in which a single sample is compared to a population
• Often we employ more sophisticated designs
• What are some different ways you could run an experiment?
• Compare one mean to some value
  – Often that value is zero
• Compare two means to each other
• There are two general categories of comparing two (or more) means with each other
1. Repeated Measures - also called “within-subjects” comparison
  • The same subjects are given pre- and post-measurements
  • e.g. before and after taking a drug to lower blood pressure
  • Powerful because variability between subjects is factored out
  • Note that pre- and post- scores are linked: we say that they are dependent
  • Note also that you could have multiple tests
  Problems with Repeated-Measures design:
  • Temporal effect - subjects get better/worse over time
  • The act of measuring might preclude further measurement - e.g. measuring brain size via surgery
  • Practice effect - subjects improve with repeated exposure to a procedure
2. Between-Subjects Design
  • Subjects are randomly assigned to treatment groups - e.g. drug and placebo
  • Measurements are assumed to be statistically independent
  Problems with Between-Subjects design:
  • Can be less powerful because variability between two groups of different subjects can look like a treatment effect
  • Often needs more subjects
• We’ll need some statistical tests that can compare:
  – One sample mean to a fixed value
  – Two dependent sample means to each other (within-subject)
  – Two independent sample means to each other (between-subject)
• The t-test can perform each of these functions
• It also gets around a big problem with the z-test…

Problems with Z and what to do instead

The Z statistic
• The Z statistic (to be compared with Zcrit):
  $Z_{\bar{x}} = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$ where $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
• What is the problem you will encounter in trying to use this statistic?
• Although you might have a guess about the population mean, you will almost certainly not know the population variance!
• What to do?
• Could we estimate $\sigma^2$?
• What would we use and what would have to be the case for it to be useful?
• We could use our sample variance, $S^2$, to estimate the population variance $\sigma^2$

Estimating Population Variance
• Just like there are many sample means (the sampling distribution of the mean) there are many $S^2$s
• $\bar{x}$ tends to be near the value of $\mu$, but does $S^2$ tend to be near the value of $\sigma^2$?
• No. It is a biased estimator. It tends to be lower than $\sigma^2$
• Why is $S^2$ biased?
  – The sum of the deviation scores in your sample must equal zero regardless of where they came from in the population
  – This means that the deviations in your sample are somewhat more constrained than in the population
  – $S^2$ has relatively fewer degrees of freedom than the entire population
• Specifically $S^2$ has n - 1 degrees of freedom
• So if we compute $S^2$ but use n - 1 instead of n in the denominator we’ll get an unbiased estimator of $\sigma^2$
• Of course if you’ve already computed $S^2$ using n in the denominator you can multiply by n to recover the sum of squared deviations and then divide by n - 1

The t Statistic(s)
• Using an estimated $\sigma^2$, which we’ll call $\hat{\sigma}^2$, we can create an estimate of $\sigma_{\bar{x}}$, which we’ll call $\hat{\sigma}_{\bar{x}}$:
  $\hat{\sigma}_{\bar{x}} = \frac{\hat{\sigma}}{\sqrt{n}}$ where $\hat{\sigma}^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{n S^2}{n - 1}$
• Using $\hat{\sigma}_{\bar{x}}$ instead of $\sigma_{\bar{x}}$ we get a statistic that isn’t from a normal (Z) distribution - it is from a family of distributions called t:
  $t_{n-1} = \frac{\bar{x} - \mu}{\hat{\sigma}_{\bar{x}}}$
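To make the bias argument and the t formula above concrete, here is a small simulation sketch. It draws many small samples from a population whose variance is known, then averages the two variance estimates. The population (normal, mean 0, standard deviation 2), the sample size of 5, the seed, and the function names are all assumptions chosen for illustration, not part of the lecture.

```python
import random
from math import sqrt
from statistics import mean


def biased_variance(xs):
    """S^2: sum of squared deviations from the sample mean, divided by n."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def unbiased_variance(xs):
    """sigma-hat^2 = n * S^2 / (n - 1), i.e. dividing by n - 1 instead of n."""
    n = len(xs)
    return n * biased_variance(xs) / (n - 1)


def t_statistic(xs, mu):
    """One-sample t with n - 1 degrees of freedom against a fixed value mu."""
    n = len(xs)
    se_hat = sqrt(unbiased_variance(xs) / n)  # estimated standard error of the mean
    return (mean(xs) - mu) / se_hat


# ASSUMED population for the demo: normal, mean 0, sigma 2, so sigma^2 = 4
rng = random.Random(0)
n, trials = 5, 20000
samples = [[rng.gauss(0, 2) for _ in range(n)] for _ in range(trials)]

avg_biased = mean(biased_variance(s) for s in samples)
avg_unbiased = mean(unbiased_variance(s) for s in samples)
print(f"average S^2 (divide by n):         {avg_biased:.3f}")
print(f"average sigma-hat^2 (divide n-1):  {avg_unbiased:.3f}")
print(f"t for the first sample against 0:  {t_statistic(samples[0], 0):.3f}")
```

With these settings the first average should land near (n - 1)/n x 4 = 3.2 and the second near 4, which is the sense in which $S^2$ tends to be lower than $\sigma^2$ while the n - 1 version does not.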