Statistics 101 - Take Home Lab #11
Due Tuesday, April 27

In Lab #11, you studied the importance of randomization in determining the statistical significance of the difference between mean yields of two varieties of corn. In lab, you were given “THE TRUTH” and asked to randomly assign two varieties of corn to 36 plots of land and then conduct a two-sample t-test to determine if the mean yields of the two varieties of corn were equal.

This take home laboratory assignment extends what you did in lab by using the computer to replicate many (100 to be exact) trials of this randomized experiment. Using the computer in this manner allows us to look more closely at hypothesis testing.

To begin, go to the course webpage at http://www.public.iastate.edu/~wrstephe/stat101L.html

Under Lab Material, right-click on the link JMP Script Difference=12. Select Save Link Target As (or Save Target As if you are using Internet Explorer). Make sure the name of the file is yieldsmean12.jsl. Save this file to either the computer’s hard drive or a disk.

Start JMP. Open the yieldsmean12.jsl file by clicking on the Open Script button in the JMP Starter window. Find the location of the file, select the file and then click Open. You should now have a window open in JMP called yieldsmean12.

This file (yieldsmean12) is a JMP script. The code in the file instructs JMP to do the same randomized experiment you completed in lab on the yields of corn varieties A and B. For all 36 plots, the difference between yields A and B in this script is 12 bushels, the same as the difference in “THE TRUTH” you were given in class.

To make JMP run the randomized experiment 100 times, select Edit → Run Script. The script will take a few moments to run. Once the script is finished, you will have a window open with data from the 100 trials of this experiment. Each row in the data table is one of the 100 trials.
The columns in the data table are

• Mean Difference = the difference between the sample means of yields A and B.
• Standard Error = the standard error of the sample mean difference.
• Lower CI = the lower endpoint of a 95% confidence interval for the difference between the mean yields of varieties A and B.
• Upper CI = the upper endpoint of a 95% confidence interval for the difference between the mean yields of varieties A and B.
• t value = the two sample t-test statistic for testing whether the mean yields of A and B are equal.
• d.f. = The degrees of freedom for the two-sample t-test. This value is calculated using the formula found on page 536 of your textbook.
• p-value = The p-value for testing H0: µA = µB vs. Ha: µA ≠ µB.

Use JMP to get histograms of the 100 Mean Difference, t value and p-values from this data table. For the Mean Difference values, add the Normal Quantile Plot to the output. For the p-values, add the Stem and Leaf Plot to the output. Use this information to answer the following questions.

1. Describe the shape, center and spread of the Mean Difference histogram.
2. According to “THE TRUTH”, what value should your histogram be centered around? Is the center of your histogram close to this value?
3. Describe the normal quantile plot for the Mean Difference. Does the distribution of the difference between the two sample means appear to be normal? Explain your answer.
4. If the null hypothesis of equal means were really true, what value should the t value histogram be centered around? Do any of the t values come close to this value?
5. What is the maximum p-value of the 100 trials? Of the 100 trials, how many have p-values of less than 0.05?

In a hypothesis test, if the null hypothesis is really true, we want to reject the null hypothesis a small percentage of the time. However, if the null hypothesis is false, we want to reject the null hypothesis a large percentage of the time.
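As a rough illustration of how one row of the data table could be computed outside of JMP, the Python sketch below calculates the Mean Difference, Standard Error, t value and d.f. for a single trial. It assumes the textbook formula for the degrees of freedom is the usual Welch-Satterthwaite approximation, which this handout does not confirm. The function name welch_summary and the sample yields are hypothetical and made up for illustration; they are not taken from “THE TRUTH” or from the JMP script.

```python
import math
import statistics


def welch_summary(yields_a, yields_b):
    """Return per-trial statistics for two independent samples of yields."""
    n_a, n_b = len(yields_a), len(yields_b)
    # Sample variances (n - 1 divisor) divided by sample size give each
    # sample's contribution to the squared standard error.
    se2_a = statistics.variance(yields_a) / n_a
    se2_b = statistics.variance(yields_b) / n_b
    mean_diff = statistics.mean(yields_a) - statistics.mean(yields_b)
    std_err = math.sqrt(se2_a + se2_b)
    t_value = mean_diff / std_err
    # Welch-Satterthwaite degrees of freedom (ASSUMED to match the textbook).
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return {
        "mean_difference": mean_diff,
        "standard_error": std_err,
        "t_value": t_value,
        "df": df,
    }


# Hypothetical yields in bushels, invented for this illustration only.
sample_a = [142.0, 151.0, 138.0, 160.0, 147.0, 155.0]
sample_b = [131.0, 140.0, 128.0, 139.0, 135.0, 143.0]

summary = welch_summary(sample_a, sample_b)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The confidence interval endpoints and the p-value also require the t distribution, so they are left out of this sketch.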
In this experiment, the null hypothesis is false; the true difference between the mean yields of A and B is 12 bushels. We therefore want the two-sample t-test to detect this difference and reject the null hypothesis of equal mean yields a large percentage of the time. The probability that a test rejects a false null hypothesis is called the power of the test. Your answer to problem #5 above (the number of trials with p-values less than 0.05, out of 100) is an estimate of the power of the test at the 0.05 significance level when the difference between yields is 12 bushels.

The power of this particular test depends on the difference between the two yields. If you change the difference between yields A and B, the power of the two-sample t-test to detect this difference will also change.
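The dependence of power on the size of the difference can be explored with a small simulation. The Python sketch below is not the JMP script; it is a minimal stand-in built on several assumptions: the plot yields are randomly generated placeholders for “THE TRUTH”, 18 plots are assigned to each variety, variety A exceeds variety B by the stated difference on every plot, and the degrees of freedom use the Welch-Satterthwaite approximation. All function names are hypothetical. The p-value is obtained by numerically integrating the t density with Simpson's rule so that only the standard library is needed.

```python
import math
import random
import statistics


def two_sided_p_value(t_value, df, steps=2000):
    """Two-sided p-value for a t statistic, via Simpson's rule on the t density."""
    upper = abs(t_value)
    if upper == 0:
        return 1.0
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))

    def density(x):
        return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

    h = upper / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    central_area = 2 * total * h / 3  # P(-|t| < T < |t|)
    return max(0.0, 1.0 - central_area)


def welch_t(a, b):
    """Return the two-sample t statistic and Welch-Satterthwaite d.f. (assumed)."""
    se2_a = statistics.variance(a) / len(a)
    se2_b = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


def estimate_power(base_yields, difference, trials=100, alpha=0.05, rng=None):
    """Fraction of randomized trials in which the t-test rejects equal means."""
    rng = rng or random.Random()
    n = len(base_yields)
    rejections = 0
    for _ in range(trials):
        plots = list(range(n))
        rng.shuffle(plots)  # random assignment of varieties to plots
        a_plots, b_plots = plots[: n // 2], plots[n // 2:]
        yields_a = [base_yields[i] + difference for i in a_plots]
        yields_b = [base_yields[i] for i in b_plots]
        t, df = welch_t(yields_a, yields_b)
        if two_sided_p_value(t, df) < alpha:
            rejections += 1
    return rejections / trials


rng = random.Random(101)
# HYPOTHETICAL placeholder for "THE TRUTH": variety B yields on 36 plots.
base = [rng.gauss(130, 12) for _ in range(36)]
for diff in (0, 6, 12, 18):
    print(f"difference = {diff:2d} bushels, estimated power = "
          f"{estimate_power(base, diff, rng=rng):.2f}")
```

Because the yields here are invented, the printed proportions will not match the JMP output; the point is only that the rejection rate should rise as the difference grows, and should stay near 0.05 when the difference is 0.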