SPSS Tutorial 3—Binomial Tests (Sign Test) in SPSS November 16, 2000 While studying for his statistics midterm, a UW student notices that his understanding of the concepts increases while he listens to Limp Biskit. He wonders if other students would also benefit by listening to Limp Biskit while they study stats. In order to decide if this intervention might work, he takes 25 students from his class who agree to be part of his study, and assigns each student to a “no music” condition and to a “music” condition (a within-subjects design); in the latter condition, Limp Biskit is playing through a set of headphones that each student wears. In both conditions, students work through 25 difficult statistics problems. The data that are collected are the number of problems that are correct (out of 25) in each condition. These data appear to the left. Now that the study has been conducted, we would of course want to know if Limp Biskit truly does make a difference in one’s statistics performance. This can be accomplished (among other ways that you will discover later) with the BINOMIAL TEST (also known as the sign test) in SPSS. Below I’ve constructed a boxplot of the data in SPSS so that you can see how scores for the two conditions are distributed. (If you wanted to do this, you would start by selecting GRAPHS / BOXPLOT from the menu in SPSS. You would then do a simple boxplot of the separate variables “music” and “nomusic.”) 30 It appears that the median values are about the same for both conditions, though the Q1 and Q3 scores are higher in the ‘music’ condition. It’s difficult to determine if there’s a difference without using a statistical test… 20 10 0 N= 25 25 MUSIC NOMUSIC Well, you can’t run a binomial test on these data until you create a variable that meets the binomial criteria. That is, you need to have a variable that identifies “successes” and “failures.” Here, let’s use 0 to represent a failure (the student’s performance in the ‘no music’ condition is better than performance in the ‘music’ condition) and a 1 to represent a success (superior performance in the ‘music’ condition). You may ask at this point, what do you do if the performance under the two conditions is equal? Well, the convention is to throw out cases in which this happens. Ideally, your measurement would be so sensitive that you would always have a difference between the scores, but if you don’t, you usually reduce your sample size by the number of “ties” that you find. At this point, you will need to compute a new variable that is the difference between the two conditions. We’ll do this through the menu system, though you can do it through syntax. Start by selecting TRANSFORM / COMPUTE. You now want to compute a new variable (here I used ‘diff’) that is the score in the ‘no music’ condition subtracted from the score in the ‘music’ condition. In this dialog box, you can click the variables over from the list to the left, and perform functions on them using the calculator. Notice that there are many functions from which to choose; here, we’re just doing simple subtraction. When you have your dialog box set up as above, click “OK” to run the command and to create a new variable. You can also click “Paste” to place the command into syntax (the best option if you are going to run the command more than once; e.g., if you had a data entry error earlier). Now, we need to make another variable that denotes successes (1) and failures (0). The best way to do this in SPSS is using the RECODE command. This feature allows you to take the values in one variable and recode them (either to the same or to a different variable). We want all values in the “diff” variable that are positive to be recoded as 1, and all values that are negative to be recoded as 0. We will exclude cases with a difference of 0. Select TRANSFORM / RECODE / INTO DIFFERENT VARIABLES from the menu in SPSS. You will get the dialog box above. Click over the “diff” variable, because you want to recode it. The “output variable,” the new variable, we’ll call “outcome.” You must click the “CHANGE” button for the new variable to be recoded. Now, select the values that you want to recode by clicking “OLD AND NEW VARIABLES.” You will get the dialog box that follows: This box has the information you need already filled in appropriately. You want the old values of range –25 to –1 recoded as 0 (these are failures, representing the entire range of possible difference scores that would be failures). You also want 1 through 25 recoded as 1 for successes. Recode “Else” (which will only be the 0’s) as system-missing, so that they will be excluded from the analysis. When your dialog box has the OldNew column filled in exactly as above, click “Continue.” Next, click “OK” to recode the variable. If “OK” is not highlighted, you probably need to click the “Change” button to indicate that you want the variable changed. Your new variable, outcome, should have 0’s, 1’s, and .’s (representing systemmissing) as values only. Now, you’re ready to run your binomial test!!! Select Analyze / Nonparametric Tests / Binomial. The dialog box to the left should then appear. Notice that you can specify the test proportion (this is the same as p). Here, we are using .5, because if our null hypothesis is that there is no difference between conditions, we would expect half of the population to be successes, and half to be failures, by chance alone. Your test variable (the one on which you want to run the binomial test) is “outcome.” Click it over to the left, then click OK. Binomial Test succes s or failure Group 1 Group 2 Total Category 1.00 .00 N 18 5 23 Observed Prop. .78 .22 1.00 Test Prop. .50 Exact Sig. (2-tailed) .011 Your output is relatively simple. First, notice that your N is 23; this is because you had two cases in which there was no difference between conditions, which were removed from the analysis. There are 18 successes (78%) and 5 failures (22%). The significance (exact sig.) is the most interesting part here: it is .011. What does this mean? If there really were no difference between the conditions, you would obtain results as extreme as those you obtained (18 out of 23 successes, or failures) or more extreme 1.1% of the time. So, it’s unlikely that there is no difference between these two conditions. Let’s check out the binomial probability calculator on the web to see if we get the same result: http://faculty.vassar.edu/~lowry/binom_stats.html Your N here is 23; your k is 18, and p is .5. When you calculate the results, you find that the probability of 18 or more successes is 0.00531. If you multiply this by 2, you will get .011, the same result as in SPSS. Why is the value that the web calculator finds half of the SPSS value? The answer lies in the fact that SPSS gives a 2-tailed probability value. That is, SPSS gives you the probability of finding 18 or more successes or 18 or more failures (5 or fewer successes). Because the binomial distribution is symmetrical when p is .5, these two scenarios are equally likely. Normal Approximation to the Binomial As you know, you can use the normal approximation to the binomial distribution to find the probability that you would find a certain number of successes (or a more extreme number). To do this, you need to obtain a z score for the number of successes obtained (18 out of 23 trials). In order to do this, you need to know both the mean (expected value) and the standard deviation of the population. Mean = np = 23 * .5 = 11.5 Standard Deviation = sqrt (npq) = sqrt (23 * .5 * .5) = 2.40 Your z score is: 18 – 11.5 = 2.71 2.4 Looking at the probabilities under the normal curve in the back of your textbook, you will find that the probability of having this number of successes, or more, is .0034. This is relatively close to the .0053 value we found through the calculator. What if you were to use the correction for continuity? If you used the correction for continuity (which is typically used for samples of size N=20 or less), you would use the score of 17.5 (the lower real limit of the interval containing 18) in the calculation of your z-value. Here is what you would get: z score is: 17.5 – 11.5 = 2.50 2.4 The area under the normal curve beyond a z value beyond 2.5 is .0062. This is even closer to the value that we would expect. You can see that the normal approximation to the binomial distribution is a fairly accurate way of determining the probability of obtaining a certain number of successes. Now, it’s your turn to do a binomial test. The assignment begins on the following page. Assignment You have conducted a cognitive psychology experiment in which you are interested in how subjects react to being “primed” with a word that is relevant to the target word to which they respond. You measure their reaction time to the target word and obtain the following data: Subject 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Primed 430 440 520 500 400 480 420 520 600 500 540 480 520 420 Not Primed 450 440 580 420 500 500 440 680 650 480 590 580 560 480 Now, you should determine if there is an effect of priming. Analyze and interpret the data. For extra credit, use the normal approximation to the binomial distribution in addition to your test. You should indicate: 1) Whether or not there is an effect of priming (if so, the reaction time should be less for priming, indicating that they responded quicker in the primed condition). 2) How you made your decision (what test you used, what was the likelihood of obtaining these results or those more extreme, etc.) 3) Include your SPSS printout or hand calculations for the data.