MEI Conference 2013
Using the t distribution
Stella Dudzic
Stella.dudzic@mei.org.uk

Samples of size 5 from $N(175, 15^2)$

If $X \sim N(175, 15^2)$ then $\bar{X} \sim N\left(175, \frac{15^2}{5}\right)$.

[Histogram: values of $\frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$; vertical axis frequency f (ticks 100 to 400), horizontal axis from −5 to 5]

[Histogram: values of $\frac{\bar{x} - \mu}{s / \sqrt{n}}$; vertical axis frequency f (ticks 100 to 400), horizontal axis from −5 to 5]

What do you notice?

Question:
The exam marks for an A-level module were normally distributed last year with a mean of 61 and a standard deviation of 17. I think the students in my area are different, but just as variable; that is, I think their marks are normally distributed with mean μ and standard deviation 17. I get the marks for a random sample of ten of them and find that the sample mean is 62.9. Find a symmetrical 95% confidence interval for μ. Is there evidence to suggest that my students are different?

Solution (each step is followed by its annotation)

$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, so $\bar{X} \sim N\left(\mu, \frac{17^2}{10}\right)$
Annotation: Write down the distribution of the sample mean. Remember that it is a random variable: you know what it turned out to be for this sample, but different samples would give different values.

$z = \frac{\bar{x} - \mu}{17 / \sqrt{10}} \sim N(0, 1)$
Annotation: Standardise. You can put in the value of $\bar{x}$ and use your calculator right at the end; this saves on writing and reduces opportunities for errors.

$P\left(-1.96 < \frac{\bar{x} - \mu}{17 / \sqrt{10}} < 1.96\right) = 0.95$
Annotation: Use normal tables and a diagram of the normal curve to find the cut-off points from the N(0, 1) distribution: 1.96 and −1.96 for a 95% confidence interval. Write the information you have as a probability statement.

$-1.96 \times \frac{17}{\sqrt{10}} < \bar{x} - \mu < 1.96 \times \frac{17}{\sqrt{10}}$
Annotation: Rearrange the inequality.

$\bar{x} - 1.96 \times \frac{17}{\sqrt{10}} < \mu < \bar{x} + 1.96 \times \frac{17}{\sqrt{10}}$
Annotation: Continue rearranging to get an interval for μ. Some people prefer to remember that the format for a confidence interval for μ is $\bar{x} \pm k \frac{\sigma}{\sqrt{n}}$, where k is the number from normal tables.

The 95% CI for μ is (52.4, 73.4)
Annotation: Put numbers in and round sensibly.

My confidence interval contains 61 so it does not provide evidence that my students are different; the mean for all the students in my area could be between 52.4 and 73.4.
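The arithmetic in the worked solution can be checked numerically. The following is a minimal Python sketch, not part of the original handout; the function name `normal_confidence_interval` is made up for illustration, and the multiplier k is taken from the standard library's `NormalDist` rather than from printed tables.

```python
from math import sqrt
from statistics import NormalDist


def normal_confidence_interval(sample_mean, sigma, n, level=0.95):
    """Symmetrical confidence interval for mu when sigma is known.

    Hypothetical helper for illustration: implements x_bar +/- k * sigma / sqrt(n).
    """
    # k is the cut-off from N(0, 1); about 1.96 for a 95% interval
    k = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = k * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Values from the question: sample mean 62.9, sigma 17, n 10
lower, upper = normal_confidence_interval(62.9, 17, 10)
print(f"95% CI for mu: ({lower:.1f}, {upper:.1f})")
print("Interval contains 61:", lower < 61 < upper)
```

Rounded to one decimal place, this should agree with the interval quoted in the solution.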
Annotation (final step): Interpret in the original context.

This reference flowchart is one of a series of three, designed by Stella Dudzic. The series includes: Hypothesis tests for one sample, Hypothesis tests for two samples, and Experimental Design and Hypothesis tests for several samples: ANOVA (Analysis of Variance). The series is also available as a set of three full colour posters in A2 size for wall display. To view the colour posters and to place an order please visit the MEI website at www.mei.org.uk

[Flowchart; the arrows between boxes did not survive extraction. Recoverable box labels:]

Are the data single variable or bivariate?

Single variable
  Test on mean/median
    Do you have a large sample? (Yes / No)
    Do you know the variance? (Yes / No)
    Are the data from a Normal distribution? (Yes / No)
    Normal test
    Estimate variance as s² and use Normal test
    Estimate variance as s² and use t test
    Symmetrical distribution: Wilcoxon single sample test
    Other: Sign test
  Test on variance
    For Normal population: χ² test for variance
  Test of proportion
    Binomial test or Normal approximation
  What distribution are the data from?
    Poisson: Poisson test
    Goodness of fit test: χ² test or Kolmogorov-Smirnov (see fig 3)

Bivariate data
  Are the variables categories or numbers?
    Categories: χ² test in a contingency table (see fig 1)
    Number pairs: Are the data from a bivariate Normal distribution? (see fig 2)
      Yes: Pearson's product moment correlation test
      No: Spearman's rank correlation test or Kendall's rank correlation test

Side note on the flowchart: 38 has come up 213 times to end March 2010 but 20 has only come up 148 times. See www.lottery.co.uk/statistics/ for data. You could use a goodness of fit test to check if there is evidence that the lottery is not fair.

fig 1: Contingency table

                Male    Female
  right handed   32       28
  left handed     7        5

fig 2: With a large set of data, the scatter diagram for a bivariate Normal distribution is approximately elliptical.

fig 3: Test statistic for Kolmogorov-Smirnov test. [Plot of cumulative probability (ticks 0.25, 0.5, 0.75, 1) against x, showing the observed and expected curves, with D marked]

Extract from November 2008 MEI Newsletter

t tests in S3

In the specification for S3, the notes about t tests outline when it is appropriate to use such a test for a mean: “In situations where the sample is small and the population variance is unknown, but the population may be assumed to have a Normal distribution.” Answers to some of the questions that students may ask about this are given below.

How small does the sample need to be to use a t test?
There is no exact answer; it depends on the situation. A reasonable rule of thumb is that samples of fewer than about 30 are small whereas samples of over 30 are large.

What if the sample is large?
If the population is Normal, the population variance is unknown and the sample size is large then a Normal test should be used. The population variance is estimated from the sample. Students learnt this in S2.

What if the population variance is known?
When the population is Normal and the variance is known, a Normal test should be used, whether the sample is large or small. Students learnt this in S2.

What if the population does not have a Normal distribution?
If the sample size is large then the Central Limit Theorem implies that the sample mean will be approximately Normally distributed. The Normal test can be used, with either known or estimated variance.

How do we know whether the population has a Normal distribution?
There are techniques for testing whether the underlying population is Normal, for example using Normal probability paper, a Kolmogorov-Smirnov test or a χ² goodness of fit test. Only the χ² goodness of fit test is in the S3 syllabus. However, candidates may not have the data needed to use this test in an examination question. So, when candidates need to decide what test to use, the examiners will indicate whether the variable can be modelled by a Normal distribution.

Is it OK to use a t test if the sample is large, the variance is unknown and the population can be modelled by a Normal distribution?
If the population is not exactly Normal (and, in real life, it rarely is), then it is better to use a Normal test for a large sample than a t test. For that reason, candidates should always use Normal tests for large samples, even though the tables available in examinations give percentage points of the t distribution for n = 50 and n = 100. As can be seen from the specification (S3I7 page 163), the same rules apply to confidence intervals.

t distribution questions

1. Tick the correct statements.
   The tails of a t distribution are wider than for the standard Normal distribution.
   There is a higher probability of getting a value within one standard deviation of the mean for a t distribution than for a standard Normal.
   You should always use the t distribution when you don’t know the population variance.

2. A t distribution with which of the following degrees of freedom is closest to a standard Normal distribution?
   0    1    5    10

3. A researcher wants to know whether the Silver Streak javelin generally travels further than the Golden Arrow javelin.
She asks a random sample of athletes to throw both javelins and measures the distance travelled. She subtracts the Golden Arrow distance from the Silver Streak distance and will do a hypothesis test with null hypothesis: difference = 0. Tick the important conditions that would indicate she should use a t test.
   The distances for the Silver Streak are Normally distributed.
   The distances for the Golden Arrow are Normally distributed.
   The differences in distance are Normally distributed.
   She has a large sample of distances.
   There is no correlation between the distance that someone can throw the Silver Streak and the distance he can throw the Golden Arrow.
Are there any other important conditions not listed above?
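The comparison posed in the opening slides ("What do you notice?") can be explored by simulation. The sketch below is an illustration only, not taken from the handout. It draws repeated samples of size 5 from N(175, 15²) and standardises each sample mean twice, once with the known σ and once with the sample standard deviation s. The number of trials, the seed and the 1.96 cut-off are assumptions chosen for the example, and `simulate` and `proportion_beyond` are made-up helper names.

```python
import random
from math import sqrt

MU, SIGMA, N = 175, 15, 5   # population mean, standard deviation, sample size (from the slides)
TRIALS = 20_000             # assumed number of simulated samples
SEED = 1                    # assumed seed, only so that runs are repeatable


def simulate(trials=TRIALS, seed=SEED):
    """Return two lists: sample means standardised with sigma, and with s."""
    rng = random.Random(seed)
    with_sigma, with_s = [], []
    for _ in range(trials):
        sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
        x_bar = sum(sample) / N
        # sample standard deviation with divisor n - 1
        s = sqrt(sum((x - x_bar) ** 2 for x in sample) / (N - 1))
        with_sigma.append((x_bar - MU) / (SIGMA / sqrt(N)))
        with_s.append((x_bar - MU) / (s / sqrt(N)))
    return with_sigma, with_s


def proportion_beyond(values, cutoff=1.96):
    """Proportion of values further than `cutoff` from zero."""
    return sum(abs(v) > cutoff for v in values) / len(values)


z_values, t_values = simulate()
print("Standardised with sigma, beyond ±1.96:", proportion_beyond(z_values))
print("Standardised with s,     beyond ±1.96:", proportion_beyond(t_values))
```

With σ known, roughly 5% of the values should fall beyond ±1.96. With s in place of σ the proportion should be noticeably larger, which is the heavier-tailed behaviour that the t distribution (here with 4 degrees of freedom) describes.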