Dr. Scott Marley
EdPsy 511, Spring 2007
SPSS Tutorial

Please note that SPSS reports p-values for two-tailed statistical tests only. All examples presented here are two-tailed tests. To verify the hand calculations for one-tailed tests on your homework, Dr. Marley recommends the following:

1) After computing the t value by hand, compare it to the t value shown in the SPSS output for the two-tailed test.
2) Divide the p-value in the SPSS output in half and compare the result to the alpha level assigned in the homework problem. (Halving is appropriate only when the observed difference falls in the direction stated by the alternative hypothesis.)

One Sample T-Test

The One-Sample T Test procedure tests whether the mean of a single variable differs from a specified constant.

Examples. A researcher might want to test whether the average IQ score for a group of students differs from 100. Or, a cereal manufacturer can take a sample of boxes from the production line and check whether the mean weight of the samples differs from 1.3 pounds at the 95% confidence level.

The following example is designed for the purposes of humor and instruction. We have volunteered to collect data for the people of Mozambique on the circumferences of mythical giraffe necks. We were able to measure 10 adult mythical giraffes, whose neck sizes in feet were: 37, 68, 45, 30, 40, 50, 70, 35, 26, 47. We have approximately 500 feet of gauze material, which we hope is sufficient to create a neck bow for each giraffe. Thus, we are testing the null hypothesis that the mean neck size of the mythical giraffes from which our sample was drawn is 50 ft. We are using an alpha level of .05.

H0: µ = 50
HA: µ ≠ 50

Enter the giraffes' neck sizes: 37, 68, 45, 30, 40, 50, 70, 35, 26, 47. Then choose One-Sample T Test (under Analyze > Compare Means). Set the Test Value to 50. Click OK.

Conclusion Based Upon SPSS Output for a One-Sample T-Test

The mean size of our giraffe necks was 44.8 ft, with a standard deviation of 14.75 ft. The standard error of the mean was 4.66 ft.
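The figures in this output can also be checked outside SPSS. The following is a minimal Python sketch, not part of the original handout, that recomputes the mean, standard deviation, standard error, t statistic, and two-tailed p-value for the giraffe data. The helper names (t_pdf, two_tailed_p) are hypothetical, and the p-value is obtained by numerically integrating the t density with Simpson's rule rather than by whatever routine SPSS uses internally.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 minus the area between -|t| and +|t|.

    The area from 0 to |t| is approximated with Simpson's rule
    (steps must be even) and doubled by symmetry.
    """
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total
    return 1 - central_area


necks = [37, 68, 45, 30, 40, 50, 70, 35, 26, 47]  # neck sizes in feet
test_value = 50                                   # value stated in H0

n = len(necks)
mean = statistics.mean(necks)
sd = statistics.stdev(necks)       # sample SD (n - 1 in the denominator)
se = sd / math.sqrt(n)             # standard error of the mean
t_stat = (mean - test_value) / se  # one-sample t statistic, df = n - 1
p_value = two_tailed_p(t_stat, n - 1)

print(f"mean={mean:.1f} sd={sd:.2f} se={se:.2f} t={t_stat:.3f} p={p_value:.3f}")
```

For a one-tailed homework problem, the same t statistic applies and the two-tailed p-value is halved, as described in the note at the top of this tutorial.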
We find a p-value of .294, which is greater than our alpha level of .05. Thus, the mean neck size in our sample of giraffes is not statistically significantly different from 50 ft, and we fail to reject the null hypothesis. The people of Mozambique are sure to cast admiring gazes at the beautifully mythical and gauzed giraffes.

Independent-Samples T Test

The Independent-Samples T Test procedure compares means for two groups of cases. Ideally, for this test, the subjects should be randomly assigned to two groups, so that any difference in response is due to the treatment (or lack of treatment) and not to other factors. This is not the case if you compare average income for males and females. A person is not randomly assigned to be a male or female. In such situations, you should ensure that differences in other factors are not masking or enhancing a significant difference in means. Differences in average income may be influenced by factors such as education (and not by sex alone).

Example. Patients with high blood pressure are randomly assigned to a placebo group and a treatment group. The placebo subjects receive an inactive pill, and the treatment subjects receive a new drug that is expected to lower blood pressure. After the subjects are treated for two months, the two-sample t test is used to compare the average blood pressures for the placebo group and the treatment group. Each patient is measured once and belongs to one group.

1. Select Independent-Samples T Test (under Analyze > Compare Means). Please find the data set entitled 'New Drug' on Dr. Marley's website to follow along with this exercise as practice for your homework.

2. Enter the drug condition as the grouping variable and the blood pressure results as the test variable. Using the Define Groups button, enter the values (in this case 1 and 2) that identify your control and treatment drug groups. Then click the Continue button, followed by the OK button on the original screen.

3. 
Analyze the SPSS output for the Independent-Samples T Test. Find the Mean, Std. Deviation, and Std. Error Mean for each of the drug conditions.

Group Statistics

                  Drug        N    Mean       Std. Deviation    Std. Error Mean
Blood Pressure    New Drug    6    149.000    18.6976           7.6333
                  Placebo     6    129.833    16.4732           6.7252

I apologize for the blurriness of the following output, which is due to size restrictions.

The t-test for Equality of Means yields a p-value of .089, which tells us that the mean difference of 19.1667 in blood pressure between the two groups was not statistically significant at the two-tailed alpha = .05 level. We note that a larger sample size or a one-tailed test might give different results.

Paired Samples T-Test

The Paired-Samples T Test procedure compares the means of two variables for a single group. The procedure computes the difference between the values of the two variables for each case and tests whether the average difference differs from 0.

Example. In a study on coronary artery disease, the time a patient can spend on a treadmill is measured while the patient is still smoking and measured again six months after the patient has quit smoking. Thus, each subject has two measures, often called before and after measures.

1. Select Paired-Samples T Test (under Analyze > Compare Means). Please find the data set entitled 'Coronary Artery Data' on Dr. Marley's website to follow along with this exercise as practice for your homework.

2. Select 'Treadmill Time Before' and 'Treadmill Time After' in the left window and notice how Variable 1 and Variable 2 are filled in as you do so. Then press the arrow button.

3. Notice how the two variables are paired so that the 'time before' variable precedes the 'time after' variable. Verify that the correct sequence is established. Then click the OK button.

4. Analyze the SPSS output. Notice the mean, sample size, standard deviation, and standard error of the mean for each of the paired variables.
Paired Samples Statistics

                                                       Mean        N     Std. Deviation    Std. Error Mean
Pair 1    Treadmill time in seconds before Treatment   837.44      18    197.653           46.587
          Treadmill time in seconds after Treatment    617.9444    18    204.98335         48.31504

The correlation between the paired variables is .810, which is statistically significant at the alpha = .05 level with a p-value < .001.

Paired Samples Correlations

                                                       N     Correlation    Sig.
Pair 1    Treadmill time in seconds before Treatment
          & Treadmill time in seconds after Treatment  18    .810           .000

Finally, in examining the following output (again adjusted due to size restrictions), we see that the mean difference in treadmill times before and after cessation of smoking was statistically significant at the alpha = .05 level. SPSS displays the significance as .000, which means p < .001.

Part II

I. Go to the Piface website at: http://www.stat.uiowa.edu/~rlenth/Power/

Choose the two-sample t test option from the menu listed on the left of your screen. If this menu does not appear, you must either install Java on your computer by following the instructions on the web page or use a different computer.

This tutorial shows how to calculate the power or sample size for a two-sample t-test given an effect size, which is calculated as the difference in means divided by the pooled standard deviation. For the sake of clarity, the example presented here uses samples with equal sample sizes and equal standard deviations. However, the same principles apply in other situations.

II. Piface Options for the Two Sample T-Test with Pooled Standard Deviation. (This information is entered into the dialog box shown below.)

Standard Deviation (sigma)
This dialog provides for power analysis of a two-sample t-test. If the "equal SDs" box is checked, then the pooled t test is used. This is the option you will usually use.

Sample Size
You have three choices for sample-size allocation. "Equal" forces n1 = n2.

Power Slider
You may choose to solve for sample size when you click on the "Power" slider.
For our current examples we will keep the selection on sample size.

III. Example: Two Sample T-Test with Pooled Standard Deviation

Research Question: Does the desire for food give purpose to maze walking in mice? We hypothesize that the lure of the cheese will motivate the treatment group to race through the maze more quickly.

Study: We randomly assigned mice to one of two conditions. The first condition is a control group of 30 mice that are simply placed within a maze and timed to see how long each one takes to exit. The second group of 30 mice is the treatment group; these mice are also timed to see how long each one takes to get through the maze, but with the incentive of a wafting piece of Swiss cheese at the end of the maze.

Results: The control group had a mean exit time of 1.004 seconds, while the treatment group had a mean exit time of .589 seconds. Amazingly, the standard deviation of each sample was the same, at .63 seconds.

Power Analysis:

I. Using Cohen's d formula (shown below) for determining effect size, we see that our current study, with sample means of 1.004 and .589 for the control and treatment groups respectively and an equal standard deviation of .63, generates an effect size of .66. To find the power of this study we use the Piface statistical software.

\[
d = \frac{M_1 - M_2}{\sqrt{(\sigma_1^2 + \sigma_2^2)/2}}
  = \frac{1.004 - 0.589}{\sqrt{(0.63^2 + 0.63^2)/2}}
  = \frac{0.415}{\sqrt{(0.3969 + 0.3969)/2}}
  = \frac{0.415}{\sqrt{0.7938/2}}
  = \frac{0.415}{\sqrt{0.3969}}
  = \frac{0.415}{0.63}
  \approx .66
\]

II. To find the power we have, at an alpha level of .05, to detect a statistically significant difference in mean exit time between our two samples of mice with an effect size of .66, we enter the difference of means (.415), the standard deviations of .63 (remember, .415/.63 = .66), and the sample sizes of 30 in the dialog box. The sample size can be entered by clicking the little box in the right-hand corner or by sliding the n1 bar to 30.

III. Results of the Independent Two-Sample T-Test Power Analysis.
The dialog box above shows that we have power of approximately .7085 to detect a statistically significant mean difference in exit time between the treatment and control conditions with an effect size of .66 at the alpha = .05 level. If you want to increase the power to .80, move the slider bar to .80. The sample size will increase as you slide the bar to the right.

IV. Another way to input Cohen's d is to standardize sigma to 1 and then enter .66 as the difference of means (i.e., d = .66/1 = .66). In the screen below you will see that the results are exactly the same as above, when we entered d = .415/.63 = .66. Notice that the power (1 – β) is the same.

V. As we discussed in class, one way to increase your power is to perform a one-tailed test. Do this by removing the checkmark from the two-tailed box. Notice that the power has now increased to .81. This is why many prefer to use a one-tailed test.

VI. The other purpose of power analysis is to determine what sample size you will need to find an effect of interest. For example, in previous studies researchers have found d = .40 when comparing one instructional strategy to another. If you were to perform a similar study using a two-tailed test and α = .05, how many subjects would you need per condition to have power of .80? First, set sigma to 1 in both groups. Second, to specify a Cohen's d of .40, enter .40 in the difference of means box. Next, click on the box in the right corner of the Power field and enter .80. After you do this, the sample size will be calculated automatically. Based on this analysis, to have power of .80 to detect a Cohen's d of .40, the study will require 99 participants in each condition (N = 198).

VII. Using the previous example, let's see what happens to our sample size estimate when we use a one-tailed test. Now our study will require 156 participants in total, which is 42 fewer than the 198 required for the two-tailed test.

VIII. 
In some cases a Type II error is more problematic, so researchers increase their power to .90 (or even .95). In the analysis below the researcher is looking to detect a difference of 4/10 of a standard deviation (i.e., d = .40) using a two-tailed test and α = .05. With 132 participants per condition, the total sample size increases to 264.
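The power and sample-size figures in Part II can be roughly cross-checked without Piface. The sketch below is not part of the original handout and is not the method Piface uses: it swaps the exact noncentral t calculation for a large-sample normal approximation, so its power values run slightly higher than Piface's (about .72 versus .7085 for the mouse example). The function names cohens_d, approx_power, and approx_n_per_group are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def cohens_d(mean1, mean2, sd1, sd2):
    """Cohen's d using the root mean square of the two standard deviations."""
    return (mean1 - mean2) / math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)


def approx_power(d, n_per_group, alpha=0.05, two_tailed=True):
    """Approximate power of a two-sample t-test with equal group sizes.

    Normal approximation (an assumption of this sketch): the test
    statistic is treated as normal with mean d * sqrt(n / 2).
    """
    tail_alpha = alpha / 2 if two_tailed else alpha
    z_crit = Z.inv_cdf(1 - tail_alpha)
    delta = abs(d) * math.sqrt(n_per_group / 2)
    return Z.cdf(delta - z_crit)


def approx_n_per_group(d, power=0.80, alpha=0.05, two_tailed=True):
    """Approximate per-group sample size needed to reach the target power."""
    tail_alpha = alpha / 2 if two_tailed else alpha
    z_crit = Z.inv_cdf(1 - tail_alpha)
    z_power = Z.inv_cdf(power)
    return math.ceil(2 * ((z_crit + z_power) / d) ** 2)


# Mouse maze example: means 1.004 and .589, both SDs .63, n = 30 per group.
d_mice = cohens_d(1.004, 0.589, 0.63, 0.63)
print(f"d = {d_mice:.2f}")
print(f"approx. two-tailed power = {approx_power(d_mice, 30):.2f}")
print(f"approx. one-tailed power = {approx_power(d_mice, 30, two_tailed=False):.2f}")

# Instructional strategy example: d = .40.
print(approx_n_per_group(0.40))                    # two-tailed, power .80
print(approx_n_per_group(0.40, two_tailed=False))  # one-tailed, power .80
print(approx_n_per_group(0.40, power=0.90))        # two-tailed, power .90
```

For the d = .40 examples, rounding the approximation up gives 99, 78, and 132 participants per condition, in line with the totals of 198, 156, and 264 reported in sections VI through VIII. With small samples the approximation is less trustworthy, and Piface's exact calculation should be preferred.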