Using SPSS
Prepared by Pam Schraedley
January 2002

This document was prepared using SPSS versions 8 and 10 for the PC. Mac versions or other PC versions may look slightly different, but most instructions here should still work.

Table of contents

1. Getting Started
   1.1 Entering data from scratch
       Defining Variables (SPSS version 8)
       Defining Variables (SPSS version 10)
   1.2 Importing data from Excel
2. Getting your data in shape
   2.1 Calculating variables
   2.2 The If… button
   2.3 Recoding Variables
       Recoding into Same Variables
       Recoding into Different Variables
       Special case: Median (or tertile or quartile) splits
   2.4 Select cases
   2.5 Merging files
       Adding cases
       Adding variables
3. Analyzing your data
   3.1 Independent Samples t-test
   3.2 Paired t-test
   3.3 Oneway simple ANOVA
   3.4 Chi square contingency test
   3.5 Correlations (simple and partial)
   3.6 Regression
   3.7 ANOVA models and GLM
       Repeated Measures
   3.8 Reliability
4. Taking a look at your data
   4.1 Checking the numbers
       Frequencies
       Tables
   4.2 Graphing and plotting
       Scatterplots
       Histograms
       Bar charts
5. Output
   5.1 Organizing your output
   5.2 Results Coach
6. Using Syntax
   6.1 The Paste function
   6.2 Creating a Session Journal
7. For more information

1. Getting started

1.1 Entering data from scratch:

You will first want to create a template into which to enter data by defining variables. This is done differently in SPSS 8 and SPSS 10, and is the most commonly used feature that differs between the 2 versions.

Important note about entering data in SPSS: SPSS likes it best when all of the data for one subject are on one line. For doing paired t-tests, repeated measures ANOVAs and complicated ANOVA designs, your life will be easier if you enter your data this way. If you generally have a huge matrix of data for each subject in which this would be prohibitive, maybe SPSS is not the stats package for you. One way to make these huge datasets more palatable is to enter the data into several smaller datasets and then merge the files later. Instructions on how to do this are in section 2.5.

Defining variables (SPSS version 8)

Under the Data tab, click Define Variable. This will bring up the following window:

Type your variable name into the Variable Name box (circled in red above). Variable names must have 8 characters or fewer. Specify the variable Type by clicking on Type (circled in green above). Numeric is the default, but date or string are other common types. If numeric, you can specify the number of decimal places here. Specify whether your variable is scale, ordinal, or nominal (circled in purple above). It has to be scale if you want to do things like add and average it or to do typical statistics like t-tests. Specify labels for your variables by clicking on Labels (circled in blue above). I strongly recommend that you do this.
Many a grad student has come back to their data a year later with no idea what boms47 meant. Here is the Labels dialog box:

Specify a Variable label (i.e. tell yourself that boms47 is the “Brat-o-meter Scale, question #47, hairpulling”). Enter this information in the Variable label box (circled in red). Then specify value labels if appropriate. For example, entering 1 may mean the person responded with “I never pull people’s hair”; 2 means “I pull people’s hair occasionally”; 3 means “I pull people’s hair often”; and so on. In that case, you would enter 1 in the Value box above (circled in green), enter “I never pull people’s hair” into the Value label box (circled in blue), then click Add (red arrow). Your new value label will appear in the box circled in purple. Do that for value=2, 3, and so on until you have all of your values entered. Then click Continue to return to the Define Variable dialog box. Click OK in the Define Variable dialog box, and that variable will be created.

If you want to do a whole slew of similar ones of these (e.g. boms1 - boms50), there may be easier ways. You can do one, and then copy and paste the syntax to create all of your variables. I’ll explain how to do this in the Using Syntax section below.

Defining variables (SPSS version 10)

The good news is that defining variables got much easier in SPSS version 10. At the opening screen, you will see two tabs at the bottom of the grid (circled in red below):

You start out in the Data View tab. You can click on the Variable View tab to define variables. Once in Variable View, you can enter a variable name in the first column, labeled Name. Here, I have entered our old boms47 into the first column, and all of the defaults have filled themselves in:

At the red arrow, you can see all of the characteristics of your variable that can be specified, including our old Label (meaning variable label) and Values (meaning value labels).
Again, I strongly recommend that you use variable and value labels. If you click in the Values box for your variable, you will see 3 little dots in a box (circled in red below). Clicking on those dots will pop up the Value labels dialog box (circled in green below). You can add value labels using this dialog box in the same way you did in version 8.

1.2 Importing data from Excel

Importing data from Excel is easy. You can type your variable names (again, 8 characters or fewer) in the first row (see red arrow below) and enter data below that. Once you are done, save the file as a Microsoft Excel 4.0 Worksheet. Most versions of SPSS cannot read anything newer than Excel version 4.0. Excel will prompt you to OK some things (like that you are only saving the active worksheet, not the whole book, and that some features may be lost). If you are dealing with ordinary data, this should be fine. When in doubt, save as the current Excel version first as a backup.

Once you have your data in Excel 4.0 format, open SPSS. Click on File → Open → Data (red arrow below), which will open the Open File dialog box. Under “files of type” choose “Excel (*.xls)” (green arrow below) to show your Excel 4.0 file. Choose your file (note: it must not be currently open in Excel or you will not be able to open it in SPSS).

Once you choose the file, the Opening File Options box (right) will pop up. If you put variable names in your Excel file (which I do recommend), make sure the “Read variable names” box is checked (red arrow right). Then click OK. This will read in your data (green arrow below) and pop up an output window (outlined in red below) that will show a Log that tells you how many variables and how many cases were read in. Check to make sure this is correct. Variable names longer than 8 characters will be truncated. Errors may result from funny characters in your variable names, or from duplicate names. In that case, SPSS will give some dummy variable name and will report an error in the Log.
You should still go in and give Variable and Value labels as before.

2. Getting your data in shape

2.1 Calculating variables

If you have a questionnaire or a scale, and need to combine variables or do any calculations on them, under the Transform tab, click on Compute (circled in red to right). This will pop up the Compute Variable dialog box (below). Type in the name of your new variable in the Target variable box (circled in red below). You can click on the Type & Label button below that to include a Type and label (as you did for your entered variables). Then you create the calculated expression in the Numeric Expression box (where the green arrow is). Here, I have taken the sum of three of our variables and divided by three, that is, taken the mean. I “sent” those variable names over to the Numeric Expression box by highlighting them in the variable list (where the red arrow is) and clicking the arrow button (circled below in green). Then you can use the keypad buttons in the dialog box (or on your keyboard) to include arithmetic operators, parentheses, etc.

Instead of adding and then dividing, I could have used the MEAN() function in the function box (blue arrow below). In this case, the same variable could have been created by typing MEAN(boms45, boms46, boms47). This function box also has useful functions like LN() (taking the natural log), MIN() (taking the minimum of the variables in its parentheses) and others.

In some cases, you may want to compute a variable one way for one set of subjects and another way for another set. In that case you would use the If… button (purple arrow below) to specify the conditions under which you want to compute this variable in the way that you are specifying. I’ll describe that below in section 2.2. When you’re done and click OK, your new variable will appear at the end of your dataset (i.e. as the last variable).
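For readers who like to see the arithmetic spelled out, here is a minimal Python sketch of the two ways of averaging three items. It is not SPSS syntax, and the three subjects, their responses, and the new variable names bomsavg and bomsmean are all made up for illustration. One assumption worth checking against your own version of SPSS: adding and dividing gives a missing result whenever any item is missing, while MEAN() averages whatever valid values are present, so the two can differ for subjects with missing items.

```python
from statistics import mean

# Hypothetical responses for three subjects; None marks a missing answer.
rows = [
    {"id": 1, "boms45": 2, "boms46": 3, "boms47": 4},
    {"id": 2, "boms45": 1, "boms46": 1, "boms47": 4},
    {"id": 3, "boms45": 3, "boms46": None, "boms47": 1},
]
items = ["boms45", "boms46", "boms47"]


def sum_over_three(row):
    """(boms45 + boms46 + boms47) / 3: missing if any item is missing."""
    values = [row[name] for name in items]
    if any(v is None for v in values):
        return None
    return sum(values) / 3


def mean_of_valid(row):
    """Mean of the non-missing items only (None if every item is missing)."""
    values = [row[name] for name in items if row[name] is not None]
    return mean(values) if values else None


# Add both versions of the computed variable to every case.
for row in rows:
    row["bomsavg"] = sum_over_three(row)   # hypothetical variable name
    row["bomsmean"] = mean_of_valid(row)   # hypothetical variable name
    print(row["id"], row["bomsavg"], row["bomsmean"])
```

For the third subject the two versions disagree, which is worth keeping in mind when deciding how to score a scale with skipped questions.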
2.2 The If… button

The If… button will show up in quite a few dialog boxes, as it does in the Compute Variable box above. Clicking it yields the If Cases dialog box below. The first thing to do is make sure the Include if case satisfies condition radio button is selected (red arrow below). Then place your condition in the box below that (where the expression is circled in green). Here, I have asked that only cases where id <= 8 be included (circled in green). So only subjects with ID#s less than or equal to 8 will be included. The buttons circled in blue below are the operators you can use to place your conditions. The table below gives definitions for the operators. Right clicking on them will also give you their definitions. For example, right clicking on ** will tell you that it is the exponential operation. So 5**2 is 5 squared, or 25.

<    Less than                    =    Is equal to
>    Greater than                 ~=   Is not equal to
<=   Less than or equal to        &    AND
>=   Greater than or equal to     |    OR
~    NOT

So for example, if you wanted to limit cases to females (where sex=1 means female) who were also at least 18 years old, you would enter sex=1 & age >= 18 in the box. Once you are finished with your If condition, clicking Continue will return you to the Compute Variable dialog box, or whatever box you were in prior to clicking the If… button.

2.3 Recoding Variables

Once you have your variables in, you may decide that you need to recode. For example, you may need to reverse code certain variables. Clicking on Transform → Recode (as below) gives you two options: Into Same Variables or Into Different Variables (circled in red below). Recoding into different variables leaves your initial variable intact. Recoding into the same variable does not. For example, let’s say that boms46 is a reverse coded item (e.g. 1 is I am a big brat; 2 is I am a medium sized brat; etc.) so 1 becomes 4, 2 becomes 3, and so on.

Recoding into Same Variables

To the left is the Recode into same variable dialog box.
I have clicked boms46 over into the Numeric Variables box to be recoded. You can send over more than one variable at a time if they will need the same recoding operation (e.g. do all of your reverse coded items at once). I will then click the Old and New Values button (circled in red). You will also see our old friend the If… button which will let you specify the conditions under which you want to recode.

Clicking on Old and New Values (above) brings up the Old and New Values dialog box below. Entering a 2 into the Old Value box (red arrow) and a 3 into the New Value box (green arrow) and then clicking the Add button (circled in red) will make all 2’s in the boms46 column change to 3’s. Once you add them, they will appear in the Old → New box, where 1 → 4 already appears. You can also change a range of numbers using the three Range options (outlined in purple) or change all remaining values to some value. For example, you could recode all other values to system missing by clicking All other values on the left (inside the purple square) and system missing on the right (blue arrow) then clicking Add. Or change missing values to zeros by clicking system missing on the left (above the purple box) and entering zero in the new value box on the right (green arrow) then clicking Add. Don’t forget to click Add. (It’s easy to forget.) When you’re done, click Continue to go back to the Recode dialog box. Then click OK.

Recoding into Different Variables

If instead you want to keep your original variable and just make a new one that recodes the original, use Recode into Different Variables. That dialog box is shown below. In this example, I have clicked boms46 over to the Numeric Variable → Output Variable box (circled in red below) and typed boms46r (my new variable name) into the Output Variable Name box (circled in green below). Notice I have also entered a variable label into the Label box (green arrow below).
Once you do this, click the Change button (red arrow below) and the question mark in the red circle below will change to boms46r. If you want to do more than one variable at this stage, click over the next variable into the Numeric Variable → Output Variable box and repeat this process of typing in the new variable name and label. You have to click the Change button for each variable that you want to change. Once you are done, click the Old and New Values button (circled in blue to left). This will send you to an identical dialog box to that for Recode into Same Variables (above). Follow those instructions to recode your variable(s) then click Continue and OK. Your new variable will be at the end of your dataset.

Special case: Median (or tertile or quartile) splits

One common form of recoding is to divide your variable values into two groups, split at the median (or into four quartile groups, etc.). To do this in SPSS, there is a secret function in Rank Cases. Click on Transform → Rank Cases (circled in red below). This will bring up the Rank Cases dialog box (below). Click over the variable(s) you want to recode (circled in green below) then click the Rank Types button (circled in blue below). You should leave the default of Assign Rank 1 to Smallest Value (red arrow above) unless you want your highest values to be assigned a value of 1 in your new recoded variable. Clicking on Rank Types (circled in blue above) will get you to the Rank Cases: Types dialog box below.

To do a median split, check the Ntiles box (red arrow to left) then enter 2 in the box (green arrow to left). Entering 3 would give you tertiles, 4 would give you quartiles, etc. You can also do a simple ranking by checking the Rank box (blue arrow to left), but that is not what is needed for the median split, so it is not checked. Click Continue to go back to the Rank Cases dialog box. Clicking on the Ties button (previous page, circled in purple) will give you options for how to deal with ties.
The default is to give them the mean of the two values, but you can also put ties into the lower or higher category if that is what you need. Once you are done, click OK in the Rank Cases dialog box. This will create a new variable that has the same name as your old variable with an n at the beginning. So in this example, we created nboms45, which takes on the value 1 if boms45 was below the median and 2 if boms45 was above the median. This variable will be placed at the end of your dataset.

2.4 Select Cases

If you want, for example, to limit an analysis to only female subjects 18 or older, or to only Time 2 data, you can do that using the Select Cases function in SPSS. Click Data → Select Cases (circled in red below). That brings up the Select Cases dialog box. To select a subset of cases based on some condition, click the If condition is satisfied radio button (red arrow below) then click the If… button (blue arrow below). The default is that cases that do not meet your condition are merely filtered (I’ll show you what that looks like in a minute). But you can also change that so that unselected cases are deleted (see green arrow below). Clicking the If… button takes you to our old friend the If… dialog box which you already know how to use. Set your condition (e.g. sex=1 & age >= 18, time=2, etc.) then click Continue, then OK. Cases will be filtered (or deleted). At this Select Cases dialog box, you can also take a random sample of cases. When you are done with your specific analysis and want all of your data again (assuming you filtered and did not delete), just go to Data → Select Cases again and click the All cases radio button (purple arrow below) then OK.

When you filter cases, a diagonal line will go through the case number as shown to the right (column indicated by the red arrow) for cases that are being filtered out. That is, for cases that are NOT selected. In this case, I selected cases if boms45 <= 2, so all 3’s and 4’s are filtered.
Any analyses I do at this point will not include any subjects who scored a 3 or 4 on boms45. Don’t forget to Select all cases again when you are done. Incidentally, this also creates a variable called FILTER_$ in your dataset that takes a value of 1 if the case is selected and 0 if it is filtered out. You can ignore that variable if you like, but sometimes it can be useful.

2.5 Merging Files

Sometimes it is easier to enter data into multiple separate data files (Time 1 data and Time 2 data for example, or each questionnaire in a separate data file) to keep file size more manageable. But at some point, you may need to look at data all together. That is, you need to merge your data files. There are two ways to merge data files: adding cases (or adding subjects) and adding variables.

Adding cases

To add cases to an existing data file, go to Data → Merge Files → Add Cases (circled in red below). That will pop up the Add cases: Read file window shown below. Click on the file that contains the cases you need to add. In this case, that is boms3.sav.

Once you choose the data file (boms3 in this case), the Add cases dialog box will come up (to left). The variables with the same variable name are assumed to be paired up and will appear in the new file (in green box). Any variables that do not have the same name will be dropped from the new data file. In this case, however, we can see that ID# was called id in one data file and idnum in the other (circled in red to left). If we want to include those in the new data file, we would click on both id and idnum (which will highlight both of them) and then click the Pair button (red arrow), which will then include id and idnum in the new data file as one variable. When you’re done pairing variables, click OK. You will get a new dataset that includes all of the cases from both of your datasets.

Adding variables

Adding variables is a little more tricky. You will need a variable with the same name in both files (for example, id).
Before you start, you have to sort BOTH data files in ascending order by that variable, which SPSS calls a key variable. For example, you will see in the case to the right that the variable id (our key variable) is NOT sorted (red arrow to right). Merging to add variables will not work in this case. To sort by id, click on Data → Sort Cases (circled in green below). This will pop up the Sort Cases dialog box. As you can see, I have clicked over id into the Sort by box (red arrow below), and it is sorted in ascending order (the default).

Once you do this to both data files, you are ready to merge and add variables. Go into one of your data files, and click Data → Merge Files → Add Variables (circled in red below). This will pop up the Add Variables: Read File window. Choose the (sorted) file that has the additional variables you want to add to your current (sorted) data file. In this case that is boms2 (green arrow below). Click OK.

This will pop up the Add Variables dialog box to the right. The red arrow shows that one of the id variables is being excluded, while the rest of our variables are in the New working data file (blue box). The key variables box is currently empty (outlined in green). This is NOT what you want; it is just how the dialog box pops up by default. You will want to check the Match cases on key variables in sorted files box (green arrow to right) then click on id (red arrow to right) and send the id variable over to the Key Variables box (in green above) using the arrow button (circled in red above). Once you have done this, it will look like the dialog box to left. Then click OK. SPSS will warn you once again that your key variables must be sorted. Click OK on that and your new dataset will be formed. You can also exclude extraneous variables at this stage.
For example, if you calculated a total score from a questionnaire and don’t need all of the individual items, you can click on them in the New Working Data File box and send them (using the little arrow button) over to the Excluded Variables box. This is a good way to clean up your dataset so you are only looking at the variables you need. But make sure you keep your original data somewhere so you don’t have to re-enter it.

3. Analyzing your data

Yay! Your data are all neat and tidy and ready to be analyzed. I have created a dataset called bomsclean.sav to use as an example. It contains: ID#, gender, family (only child vs. firstborn vs. laterborn), age, bomstot (the total score on our Brat-O-Meter Scale), nbomstot (a median split on the bomstot variable), and bomstot2 (another Brat-O-Meter scale taken a week later). You see why labels become important! By the way, these are completely made-up data, so you should not take any results reported here as representing anything other than unconscious biases in random data creation on the part of… well, me.

3.1 Independent Samples t-test

OK, let’s start simple: a t-test. Are men more bratty than women? A t-test on bomstot by gender. Go to Analyze → Compare Means → Independent Samples T-Test (circled in red to right). Note: In SPSS 8 this menu is called Statistics, not Analyze, but everything else is the same.

This will pop up the Independent Samples T-Test dialog box (to right). Click your dependent measure(s) (here, bomstot, red arrow to right) into the Test Variable(s) box. You can do a bunch at once. Send your binary variable (here gender, green arrow to right) into the Grouping Variable box. You will see 2 little question marks in parentheses next to your grouping variable. This means you need to define your groups. Click the Define Groups button (blue arrow to right). This will bring up the Define Groups dialog box (below). Simply enter the values of your grouping variable (here 1=female; 2=male, circled in red below).
You could also specify a cutpoint and do your t-test that way (e.g. compare people who scored above 10 with those who scored below 10 on some scale) by using the cut point radio button (green arrow to left). I tend not to use this option. Click Continue and then click OK in the T-test dialog box.

The Options button in the T-test dialog box (above) doesn’t do much of interest. It does allow you to change the confidence level of the confidence intervals that the t-test spits out. The default is a 95% confidence interval, which is what most people want.

Here is the t-test output:

The window to the left above shows an outline of all of your output. I like to rename the tests so I can see what I’ve done. For example, I would call this T-test of bomstot by gender (rather than just T-test). I’ll show you how to do that later in the output section. You can see that SPSS has spit out the two categories (female and male, red arrow above), the N for each group (green arrow above) and the mean for each group (blue arrow above) as well as the standard deviation and the standard error. Woohoo, boys are brattier than girls according to the means, but is it significant? Levene’s test for equality of variances (outlined in red above) is not significant, so the variances can be assumed to be equal. In that case, you use the first line of results (in purple above). If Levene’s test had been significant, we would use the lower line of results (in orange above). You can see the t value, degrees of freedom, and p value in the green box above, and the 95% confidence interval for the difference in the blue box. In this case, men and women are not significantly different on the Brat-O-Meter Scale, t(28)=-1.529, p=.137.

3.2 Paired t-test

To do a paired t-test in SPSS, we will use the Time 1 vs. Time 2 bomstot variables. This will test whether people differed in brattiness between the first time point (let’s say, right before a visit to see parents) and the second (right after the same visit).
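If you are curious what a paired t-test actually computes, the following minimal Python sketch (not SPSS syntax) works it out by hand: take each subject's Time 1 minus Time 2 difference, then divide the mean difference by its standard error, with n - 1 degrees of freedom. The five pairs of scores are invented for illustration and are not from bomsclean.sav, and the function name paired_t is hypothetical. The sketch stops at t and df; the p value needs a t distribution, which Python's standard library does not provide.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(time1, time2):
    """Return (t, df) for a paired t-test on two equal-length lists."""
    if len(time1) != len(time2):
        raise ValueError("each subject needs a score at both time points")
    # One difference score per subject.
    diffs = [a - b for a, b in zip(time1, time2)]
    n = len(diffs)
    # Standard error of the mean difference (stdev uses n - 1).
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Made-up scores for five subjects at Time 1 and Time 2.
bomstot = [12, 15, 11, 14, 13]
bomstot2 = [10, 14, 12, 11, 13]

t_value, df = paired_t(bomstot, bomstot2)
print(f"t({df}) = {t_value:.3f}")
```

Because the test works on within-subject differences, both scores for a subject have to sit on the same row of the dataset, which is one reason SPSS wants one line per subject.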
Go to Analyze → Compare Means → Paired Samples T-Test (circled in red below). This will pop up the Paired samples t-test dialog box below. Click on the 2 variables that you want to compare (here bomstot and bomstot2, green arrows below) then click the arrow button (circled in blue below). This will pair those two variables and give you your paired variables in the Paired Variables box (see example to right). Again, the Options button only allows you to change the percentage on your confidence interval, and the default is 95%. You can pair up as many variables as you want and do the t-tests at the same time. This is not a multivariate test; it simply saves you trouble and does individual t-tests in a batch. Click OK when you’re done.

You can see from the output (below) that there is a very low correlation between Time 1 and Time 2 (outlined in red). This would indicate that the boms scale has low test-retest reliability. You can also see that there is not a significant effect of time (or not a significant difference between Time 1 and Time 2 boms score), t(29)=.499, p=.622 (outlined in green). The blue box, again, shows the confidence interval of the difference.

3.3 Oneway simple ANOVA

The oneway ANOVA works pretty much like an independent samples t-test. We’ll do an ANOVA to determine whether birth order has an effect on brattiness. Go to Analyze → Compare Means → One-way ANOVA (circled in red below). This will pop up the One-way ANOVA dialog box. You can see I have clicked over birth order into the Factor box and bomstot into the Dependent list box (again, you can analyze more than one dependent measure at once). Here, you do get more options. Clicking the Options button (blue arrow below) takes you to the Options dialog box (to right). I generally check the Descriptive box (red arrow to right) to get descriptive statistics of the dependent measures for my groups.
You can also check the Homogeneity of variance box (green arrow to right) to check that assumption of the ANOVA.

You can also click on the Post Hoc button (green arrow to left), which will bring up the Post Hoc multiple comparisons window below. There are many post-hoc techniques to choose from. Simply check the box or boxes you want. You can change the familywise error rate by changing the value in the significance level box (circled in red below). As for choosing a post-hoc? I generally use Tukey. It seems like a good mix of controlling error and not being too conservative (Bonferroni, for example, is more conservative).

Clicking on the Contrasts button (red arrow above) will take you to the Contrasts box below. Enter the coefficients for your linear contrast one at a time into the Coefficients box (red arrow below) then click the Add button after each one (green arrow below). In this case, we will compare only children with people who have siblings. So only children get a coefficient of –2 and first and laterborns each get a coefficient of +1. I have already entered the –2 and the first +1. Now the second +1 is in the box. When I click Add it will appear below the other two (blue arrow below). The order of the coefficients is important. For the family variable, 1=only, 2=firstborn, and 3=laterborn, so the first coefficient in the box will go to only children and so forth. You can also check for linear or quadratic (or other polynomial) trends by checking the Polynomial box (purple arrow below) and then choosing linear, quadratic, etc. from the pulldown menu beside it. That doesn’t make sense for this example, but might make sense if your factor was something like increasing dosages of a medicine. When you are done, click Continue, and then click OK in the ANOVA box.

On the next page you will see the results of this analysis. It starts with the descriptive statistics that we selected in the Options window. It gives you the N, mean, standard deviation, etc.
(red arrow below), as well as the minimum and maximum scores for each group (green arrow below). The Ns and mins and maxes are good numbers to double check to make sure there are no errors in your dataset (or 99’s that someone entered as a missing value). Next, Levene’s test for homogeneity of variances is not significant (circled in red above) so equal variances can be assumed. Next is a typical ANOVA table (outlined in green above) including SS, df, Mean Squares, F, and p. This analysis is not significant (probably because the data are completely random). Because you do not have a significant main effect, you should stop here, but we will look at the output from the contrasts and post-hocs anyway as a learning exercise. In real life, you do not look at these tests if your main ANOVA is not significant.

The blue box above shows the contrast coefficients, just as a double-check. Next you have the contrast tests. Because Levene’s test above was not significant, you can use the first row of numbers (assume equal variances). This table includes the contrast value (blue arrow above), the t value (purple arrow above), df (orange arrow above), and significance (pink arrow above). In this case, the contrast value was –3.30 and was not significant, t(27)=-.774, p=.446.

Finally we come to the multiple comparisons. In the blue box above, you can see the mean difference for each pairwise comparison and the significance value. When a difference is significant, the mean difference is starred. The purple box above shows the confidence intervals for the difference. These all include zero, confirming that our differences are not significant.

3.4 Chi square contingency test

This is the question about SPSS that I have fielded more than any other question. This oft-used test is just not where you would think. As an example, we can examine whether gender is associated with scoring above or below the median on the bomstot variable (using our median split nbomstot).
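Before the menu steps, it may help to see what a chi-square contingency test computes. The minimal Python sketch below (not SPSS syntax) builds the expected count for each cell from the row and column totals and sums (observed - expected)² / expected over the cells. The 2 x 2 counts are made up for illustration and are not the bomsclean.sav data; the function name chi_square is hypothetical. It returns the statistic and degrees of freedom only, since the p value needs a chi-square distribution that Python's standard library does not supply.

```python
def chi_square(table):
    """Pearson chi-square for a contingency table given as a list of rows.

    Returns (chi2, df, expected), where expected has the same shape as table.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    expected = []
    for i, row in enumerate(table):
        expected_row = []
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were unrelated.
            e = row_totals[i] * col_totals[j] / n
            expected_row.append(e)
            chi2 += (observed - e) ** 2 / e
        expected.append(expected_row)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Made-up counts: rows are gender (female, male),
# columns are the median split (below, above).
counts = [[10, 5], [5, 10]]

chi2, df, expected = chi_square(counts)
print(f"chi-square({df}) = {chi2:.3f}")
print("expected counts:", expected)
```

The expected counts are worth a glance in their own right, since the test is not trustworthy when they get very small.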
Go to Analyze → Descriptive Statistics → Crosstabs (circled in red below). Click your two categorical variables into the Row and Column boxes (it doesn’t matter which goes into which). Then click the Statistics button (green arrow to right). This will pop up the Crosstabs: Statistics box below. Check the Chi-square box (red arrow below) to perform the Chi-square test on your contingency table. Click Continue, then OK.

Also notice that the Crosstabs: Statistics box is where you would go to perform a Kappa reliability test (blue arrow above). Kappa is the reliability statistic used when two raters make categorical judgments rather than continuous ratings.

Below is the Chi-square test output. First, you’ll see a Case Processing Summary (circled in green to left). This will pop up in many of the statistics you do. It’s good to check that you have the expected number of cases included and are not missing large portions of data. Next is the crosstab, or contingency table (red arrow to left). Finally the Chi-square test is reported (in blue box to left). The Pearson Chi-square on the first line is the typical test used for data of this sort. Notice that SPSS will warn you if you have expected cell counts lower than 5 (purple arrow to left). This test should not be used in that case.

Note that the Chi-square model fit test is under Analyze → Nonparametric Tests → Chi-square. This is a different test, one in which you assign expected values to cells and test the goodness of fit of that model. The fact that these are called the same thing has tricked many an SPSS user.

3.5 Correlations (simple and partial)

Simple correlations are a piece of cake in SPSS. You can do a whole slew of ‘em if you want. Go to Analyze → Correlate → Bivariate (circled in red below). Click over all of the variables that you want to correlate. In this case, we have age, bomstot and bomstot2 (Time 1 and Time 2 brattiness). SPSS will compute all pairwise correlations. That’s it: just click OK.
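To make "all pairwise correlations" concrete, here is a minimal Python sketch (not SPSS syntax) that computes a Pearson correlation for every pair of variables. The five cases are invented for illustration and are not from bomsclean.sav, and the helper name pearson_r is hypothetical. It gives the coefficients only; significance values are left out because they need a t distribution.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up data for five subjects.
data = {
    "age": [18, 20, 22, 24, 26],
    "bomstot": [10, 12, 14, 16, 18],
    "bomstot2": [5, 3, 4, 1, 2],
}

# One coefficient per unordered pair of variables.
correlations = {
    (a, b): pearson_r(data[a], data[b]) for a, b in combinations(data, 2)
}
for (a, b), r in correlations.items():
    print(f"{a} with {b}: r = {r:.3f}")
```

With three variables there are three distinct pairs, which is why each coefficient shows up twice in a square correlation table.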
SPSS will spit out a nice table (see below). Each cell has a correlation coefficient, 2-tailed significance, and N. Each correlation appears twice in the symmetrical table, and there are 1’s (as expected) on the diagonal. Easy as pie. Nothing significant here, as usual.

SPSS will also do partial correlations, in which you can examine the relationship between two variables controlling for a third. For example, we can look at the effect of age on Time 2 brattiness controlling for Time 1 brattiness. Go to Analyze > Correlate > Partial (circled in red below). Send the variables of interest into the Variables box and the control variable(s) into the Controlling for box, then click OK.

Below you will see the results of this analysis. The (symmetrical) table reports the correlation, degrees of freedom, and the 2-tailed p-value (outlined in green below). You can see that the partial correlation of age and bomstot2, controlling for bomstot, is a whopping -.0335.

3.6 Regression

The linear regression function in SPSS covers a lot of ground. Go to Analyze > Regression > Linear (circled in red below). That will pop up the Linear Regression dialog box shown below. Enter your dependent measure (here we used bomstot) into the Dependent box (red arrow below). Enter your independent variable(s) (here age) into the Independent(s) box (green arrow below). You can enter more than one independent variable here. If you are using more than one independent variable, choose a regression method using the pulldown menu (blue arrow below).

Enter is the default and is standard linear regression, but you can also use stepwise regression (either forward or backward), enter (and remove) variables in blocks using the Previous and Next buttons, etc. This is a very versatile dialog box. Of more common use are the Statistics, Save, and Options buttons. The Statistics button (outlined in green to left) brings up the Statistics window below.
Checking the Estimates box (red arrow below) gives you estimates for your regression coefficients (or betas). Checking the Model fit box (green arrow below) gives you an R² for the regression model. Checking R squared change will tell you the change in R² as each variable (when you have more than one independent variable) is added or removed. Finally, checking Casewise diagnostics (purple arrow below) will give you information on outliers outside a range that you specify (here 2 standard deviations).

Clicking the Save button (outlined in purple above) allows you to save residuals of various kinds from your regression in a column in your dataset (outlined in red to left below). This is useful in examining residuals to look for patterns and in computing corrected means. Finally, clicking the Options button (outlined in orange above) allows you to remove the constant from your regression (forcing the line to go through zero) by unchecking the Include constant box (orange arrow to right below). It also gives some options for Stepwise regression.

There are clearly far too many regression options for this guide to explicate all of them, but again, right clicking on most options in SPSS will give you more information.

To the right is output from a simple but typical SPSS regression analysis. The R and R² are reported in the Model summary (red and green arrows to right, respectively). An ANOVA table for the regression is also reported (outlined in blue to right). This tells you whether your regression model as a whole is predicting a significant amount of variance. Finally, the Beta coefficients and t-tests for them are reported in the orange box to right. Here, the only thing that is significant is the Constant (or the intercept). Don’t get excited, boys and girls; that doesn’t help you get published.
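To make the pieces of that output less mysterious, here is a minimal Python sketch of a one-predictor least-squares regression, showing where the constant, the slope, and R² come from. It is a hand calculation under simple assumptions (one independent variable, no missing data), not a description of how SPSS is implemented; the function name simple_regression and the scores are invented.

```python
def simple_regression(x, y):
    """Ordinary least squares with one predictor.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

    slope = sxy / sxx                    # unstandardized coefficient
    intercept = mean_y - slope * mean_x  # the "Constant"

    # R squared: proportion of variance in y accounted for by the line.
    ss_total = sum((b - mean_y) ** 2 for b in y)
    ss_resid = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r_squared = 1 - ss_resid / ss_total
    return intercept, slope, r_squared


# Hypothetical scores: age predicting bomstot.
age = [8, 9, 10, 11, 12]
bomstot = [20, 24, 22, 30, 28]

intercept, slope, r_squared = simple_regression(age, bomstot)
print(f"constant = {intercept:.3f}, slope = {slope:.3f}, R^2 = {r_squared:.3f}")
```

The residuals that the Save button writes to your dataset are the same quantities being squared inside ss_resid: each observed score minus its predicted score.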
For logistic regression (in which the dependent variable is categorical instead of continuous), use Analyze > Regression > Binary Logistic (for a two-category DV) or Analyze > Regression > Multinomial Logistic (for a multi-category DV). Inputs look much the same, except that you can use categorical independent variables as well as continuous ones. Enter all independent variables into the Covariates box, then click the Categorical button, which allows you to assign some of your “covariates” as categorical. Output will also include a Chi-square goodness of fit test (to test the goodness of your prediction) and a table of predicted values. A full treatment of logistic regression is beyond the scope of this guide, but it is fairly straightforward to use the SPSS functionality if you read and understand a chapter or so on the statistical test that you are performing.

3.7 ANOVA models and GLM

SPSS offers pretty much any kind of ANOVA model you can think of. Let’s start with a univariate ANOVA. Actually, the univariate GLM encompasses ANCOVA as well. Go to Analyze > General Linear Model > Univariate (circled in red to right). Click your dependent measure (continuous) into the Dependent Variable box (outlined in green to right). Click over any fixed factors (ordinary ANOVA factors, which are categorical variables) into the Fixed Factor(s) box (outlined in blue to right). Enter any random effects factors (such as region, classroom, etc.; check a statistics textbook if you are not sure) into the Random Factor(s) box (outlined in purple to right). Finally, enter any continuous predictors, or covariates, into the Covariate(s) box (outlined in orange to right).

There is generally some confusion about the meaning of the word covariate. Many people use covariate to mean “a variable I don’t care about,” as in “I’ll just covary out SES.” But in statistical and SPSS terms, a covariate is simply a continuous predictor.
You CAN use this method to “covary out” age in the above example, but you would use the exact same technique if you were interested in the effect of age as well as your factor effects.

Whew… now we have all of our factors and covariates in place, but there’s more. Click on the Model button (red arrow above) to specify anything less than a fully crossed model. For example, let’s say that we are interested in main effects of gender, birth order, and age, as well as the interaction of gender and age, but no other interactions. Clicking on Model pops up the Univariate: Model dialog box below to left. Click on the Custom radio button (red arrow below to left) to specify a custom model. You will see that I have already sent over main effects for gender and family and am about to send over the main effect of age (green arrow below to left). Simply click on the effect you want to send over, then click the arrow button (outlined in purple below to left). On the panel below and to the right, you can see I have sent over the main effect of age, and also the interaction effect of age by gender (orange arrow below to right). To do this, just click on both age and gender, then, while both are highlighted, click the arrow button (outlined in purple below to left). Once you have the custom model you want, click Continue.

Going all the way back up to the Univariate dialog box on the previous page, clicking on the Contrasts button (green arrow on previous page) allows you to specify contrasts on your factors. Below you will see I have assigned Simple contrasts to the gender variable (red arrow to right). This is actually less than fascinating, because the gender variable only has 2 levels to begin with.
But for the family variable, which has three levels, you can use simple (in which each level is compared to either the first or last level), deviation (in which each level except for one is compared to the overall effect of the variable), repeated (in which each level is compared to the one previous to it), Helmert, reverse Helmert (a.k.a. difference), or polynomial contrasts that examine linear and quadratic effects. Highlight the variable for which you want to assign a contrast in the Factors box, choose a type of contrast from the pulldown menu (green arrow to right), then click the Change button (blue arrow to right).

Finally, the Options button in the Univariate window (blue arrow on previous page) allows you to examine multiple comparisons in your factors, request homogeneity tests (green arrow below), etc. Here, we have requested descriptive statistics (red arrow below) and LSD multiple comparisons for the family variable (purple arrow below). By using the pulldown menu (blue arrow below) you can change the comparison technique to Bonferroni or Sidak. This window also allows you to do such things as report observed power, effect size estimates, etc.

Below and on the following page, you will see the output from this large analysis. First, below and to the left, the output simply reports your between-subjects factors (red arrow below to left). You can double-check your Ns here. Next, you have the descriptive statistics that you requested in the Options box to left (green arrow below to left). This presents a nice table of means, suitable for later graphing. Next, below to the right, you have the Levene’s test for equality of variances (outlined in blue below to right) that you also requested in Options. Because this test is not significant, you can assume your equal variance assumption was met. Next you have an ANOVA table (outlined below to right in orange) that reports F, df, p-value, etc. for all of the main and interaction effects in your custom model.
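If the columns of an ANOVA table feel like a black box, it may help to see how an F is built in the simplest possible case: one factor, no covariates, no interactions. The Python sketch below covers only that simple case, not the custom model described above; the function name oneway_anova and the scores are invented for illustration, and it stops at F rather than computing a p-value.

```python
def oneway_anova(groups):
    """Sums of squares, df, and F for a one-factor between-subjects design.

    groups: a list of lists, one list of scores per level of the factor.
    Returns (ss_between, df_between, ss_within, df_within, f_ratio).
    """
    all_scores = [score for group in groups for score in group]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(group) / len(group) for group in groups]

    # Between-groups SS: how far each group mean sits from the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Within-groups SS: spread of scores around their own group mean.
    ss_within = sum(
        (score - mean) ** 2
        for group, mean in zip(groups, group_means)
        for score in group
    )

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)

    # F is the ratio of the two mean squares.
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, df_between, ss_within, df_within, f_ratio


# Hypothetical scores for three birth-order groups.
only_child = [1, 2, 3]
firstborn = [2, 3, 4]
later_born = [3, 4, 5]

result = oneway_anova([only_child, firstborn, later_born])
print("SS between, df, SS within, df, F:", result)
```

Each effect row in a larger ANOVA table follows the same logic: a mean square for the effect divided by a mean square for error.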
To the left, you’ll see the results of the contrast we requested on the birth order variable. Level 1 (only child) is not different from Level 2 (firstborn) (see red arrow to left) and Level 1 is not different from Level 3 (later born) (see green arrow to left). The overall test results for this contrast indicate it is not useful (see nonsignificant p circled in orange to left). Next, we have the estimated marginal means for birth order controlling for our covariate, age (outlined in pink to left). SPSS did also spit out pairwise comparisons for the birth order variable, but that output looks identical to the pairwise comparisons we produced in the simple oneway ANOVA example, so we will not go through them in detail here.

As you can see, quite a bit of output is generated in response to all of these extra tests. There is, luckily, some help for you with output that we will explore in the Output section of this guide. You can also see that the output and examples get more complicated as the statistics get more complicated. I strongly urge you not to use any statistics in SPSS that you are not quite familiar with. It is very easy to point and click your way to mistaken conclusions, and this guide is not meant to substitute for strong knowledge of the statistics you wish to use.

Repeated measures

To give a full example of the functionality of the Repeated Measures GLM, I have added 4 new variables to our dataset. They are: bomsfam1, bomsfam2, bomsfrd1, and bomsfrd2. These assess the family and friend subscales of the BOMS scale at Time 1 and Time 2. These will help me to show an example of a fully crossed within-subjects design.

To run a repeated measures ANOVA, go to Analyze > General Linear Model > Repeated Measures (circled in red to right). This will pop up the Repeated Measures: Define Factor(s) dialog box below. Here, you enter each within-subjects factor in your design (saving your between-subjects factors for later).
I have already entered the subscale (family vs. friends) factor (pink arrow to right). To enter the time factor (Time 1 vs. Time 2), enter time in the Within-subject factor name box (purple arrow to right), then enter the number of levels for this factor (blue arrow to right), then click Add (green arrow to right). Once you have added all of your within-subjects factors, click the Define button (orange arrow to right).

This will pop up the Repeated Measures dialog box below. Here you can enter your between-subjects factors (here, birth order, blue arrow below) and covariates (here, age, orange arrow below). You also need to define your within-subjects variables at this point. I have already defined 3 of the 4 cells needed. You need to look carefully at the order of your crossed variables (see red box to left). Here, subscale is the first number in parentheses and time is the second. So (1,2) would be Family, Time 2. We still need to enter the last cell (2,2) (see green arrow to left) by clicking over BOMS Friend scale Time 2. The Model, Contrasts, and Options buttons work the same way as those in the Univariate GLM example above. Once you have specified all of those to your liking, click OK.

To the left is the first page of output from the repeated measures GLM. This output can be very confusing. First you have a table of your within-subjects factors (red arrow to left). Next you see your between-subjects factor(s) (green arrow to left). Next is a large and scary-looking table of Multivariate tests (orange arrow to left). In most cases, you can actually ignore this table. The multivariate tests are not necessarily the tests you need to look at, although they are often equivalent to the within- and between-subjects tests later. Next, something called Mauchly’s Test of Sphericity will print out. In this example, there were not sufficient degrees of freedom to do this test.
If Mauchly’s test is significant, you should NOT use the “Sphericity Assumed” row in your ANOVA table (red arrow to right). Otherwise, in most cases, you can assume sphericity. In fact, in most cases, all rows within a cell of this table will look the same. This table also gives information on the error terms for each group of tests, most importantly the MSE for these tests (green arrows to right).

Next, SPSS prints out tests of within-subjects contrasts (red arrow on next page). It does this even if you don’t request it, and uses linear trend contrasts as a default. These tend not to be useful to most people. You can ignore this table too. Finally, you get to your between-subjects effects ANOVA table (purple arrow on next page).

You can see that Repeated Measures GLM outputs quite a bit of material. You will probably want to tidy this output up a little, which will be demonstrated in the Output section of this guide. You can also see that we have a significant 3-way interaction in these data (subscale*time*family above), thus showing that Type I error will give you a significant result every so often even when nothing is going on.

3.8 Reliability

Another common analysis is to determine alpha reliability, either for scale or questionnaire items or among raters or coders. In either case, the items (or people) to be compared must be entered in columns and the subjects or observations must be entered in the rows. If you have your data entered backwards, there is a transpose function in Excel’s Paste Special window. In this case, we will use our old BOMS items and determine reliability. Here we will look at boms1-boms10.

Go to Analyze > Scale > Reliability Analysis (circled in red below). This will pop up the Reliability Analysis dialog box below. Click over all of your items or coders (here, boms1-boms10) into the Items box. Make sure your Model is set to Alpha (orange arrow below). You can also set this Model to split-half or some other forms of reliability.
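For a sense of what the Alpha model is measuring, here is a minimal Python sketch of Cronbach’s alpha computed from its textbook formula: the number of items, the variance of each item, and the variance of the summed scale. The function name cronbach_alpha and the three short item columns are made up for illustration, and the sketch assumes every subject answered every item.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha from item columns.

    items: a list of columns, one list of scores per item (or per coder),
    with subjects in the same order in every column.
    """
    k = len(items)

    # Each subject's total score across all items.
    totals = [sum(scores) for scores in zip(*items)]

    sum_item_variances = sum(variance(column) for column in items)
    total_variance = variance(totals)

    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical responses from 5 subjects on 3 items.
item1 = [1, 2, 3, 4, 5]
item2 = [2, 1, 4, 3, 5]
item3 = [1, 3, 2, 5, 4]

alpha = cronbach_alpha([item1, item2, item3])
print(f"alpha = {alpha:.3f}")

# Alpha with each item deleted in turn, to spot an item dragging alpha down.
columns = [item1, item2, item3]
for i in range(len(columns)):
    remaining = columns[:i] + columns[i + 1:]
    print(f"alpha without item {i + 1}: {cronbach_alpha(remaining):.3f}")
```

Note that the data layout matches what the dialog box expects: items in columns, subjects in rows.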
If you like, you can press the Statistics button (outlined in green below). That will take you to a dialog box in which you can do item analysis (e.g. get the alpha with each item of the scale deleted to see if any items are pulling your alpha down, etc.). Otherwise, just press OK to see your alpha.

Easy as pie: you can see the Alpha in the simple output below (red arrow below). Generally an alpha of .7 or higher is considered acceptable. All things being equal, alpha does tend to get higher as more items (or more coders) are introduced.

4. Taking a look at your data

4.1 Checking the numbers

One way to get a simple look at your data is to look at frequencies or tables. Tables can give you an idea of means or medians, etc. for your groups. Frequencies can alert you to outliers or data entry errors. Perhaps this section should have come before data analysis, but I can never resist getting a peek at significance levels before I tease myself with means and pretty graphs. I’m weird that way.

Frequencies

Go to Analyze > Descriptive Statistics > Frequencies (circled in red to right). This will pop up the Frequencies dialog box. Click over the variable(s) you are interested in. Click on the Statistics button (green arrow to right) to get the box below. There you can ask for quartiles, mean, median, mode, and other descriptive measures. You can click on the Charts button (blue arrow to right) to request, for example, a histogram (red arrow below). Finally, the Format button (purple arrow above) allows you to do such things as switch your frequency table order from ascending to descending order by variable values, or to ascending or descending order by frequency count. The next page has a sample frequency output, with a histogram requested using the Charts button as indicated by the red arrow to left.

The output is fairly straightforward.
It gives the observed values of your variable (red arrow to right), the observed frequency (green arrow to right), and the percentage of observations with that value (blue arrow to right). The Valid Percent column (purple arrow to right) gives the percentages based on only non-missing observations (in this case that is the same). Finally you get the cumulative percent (orange arrow to right). Then you can see the histogram that we requested, clearly showing one outlier.

Note: Dealing with outliers and transformations

In order to eliminate outliers from analyses, you would use the Select Cases function described earlier. If your histogram showed you that you needed to transform your data, you would use the Compute function described earlier to take the square root, inverse, cosine, or whatever transformation is necessary.

Tables

Tables are also a good way to get a quick look at what’s going on in your data in preparation for graphing. Go to Analyze > Reports > Case Summaries (circled in red below). Click over the variable you want statistics for in your table (see green arrow below), and click over any grouping variables (see blue arrow below). Here, we will look at means and standard errors for bomstot by birth order. I prefer to uncheck the Display cases box (orange arrow below) because I don’t want a frequency table, just the summaries, but you could leave that checked if you wanted a frequency table at the same time.

Click the Statistics button (outlined in green to left) to choose which statistics will go into the table. Below you can see we have selected mean and standard error of the mean. Click the Options button (outlined in blue to left) if you want to change the title of your table or exclude the “Total” category in your tables (see red arrow below). Below is sample output from the table we have created.
The output shows the means and standard errors for the three groups sorted by birth order (see green arrows to right) as well as for the whole sample (red arrow to right).

4.2 Graphing and plotting

OK, it’s pretty picture time. You can use scatterplots to get an idea about the relationship between two variables, histograms to get an idea about the distribution of your variables, and bar charts to help interpret interactions or to show your results to your friends and family (I include grant reviewers in this category).

Scatterplots

To create a scatterplot, go to Graphs > Scatter (circled in red below). This will pop up the Scatterplot dialog box in which you select a style of scatterplot. A simple scatterplot (red arrow below) will serve most people’s purposes. Choose your style, then click Define. This will pop up the Simple Scatterplot dialog box below. Choose your X and Y axes from your variable list (green arrows below). Click on Titles (outlined in blue below) to add titles to your scatterplot. You will probably not need to click on Options (outlined in orange below). Once you are finished, click OK to get your scatterplot. You can see the very straightforward output to right.

Histograms

We saw one way to create histograms using the Frequencies function in the last section. You can also create them another way. Go to Graphs > Histogram (circled in red below). This pops up the Histogram dialog box (to left). Click over the variable you want to graph. You can click on the Titles button to add titles. You can check the Display normal curve box (green arrow to left) if you want a normal curve superimposed on your histogram. To right you can see the output from a sample histogram on bomstot2.

Bar Charts

Bar charts are also fairly easy to create in SPSS. Personally, I tend to create my bar charts in Excel because they are easier to format, and you can add error bars to Excel bar charts.
As far as I know, there is no way to add error bars to SPSS bar charts. This is another of those frequently asked questions.

To create a bar chart, go to Graphs > Bar (circled in red below). This will pop up the Bar Charts dialog box below. I tend to use clustered bar charts most often (green arrow below) because they help to understand what is going on in an interaction. Choose your bar chart style and then click Define. This will pop up the Define Clustered Bar Charts dialog box below. Select the two grouping variables by which you want to cluster your data (blue arrows below). These would be the two variables that interact on the dependent measure. Then click over the continuous variable that you want to graph into the Variable box (orange arrow below). Note that the Other summary function radio button must be clicked in order to create this kind of bar chart. You could, instead, do a bar chart on number of cases, or percentage, using one of the other radio buttons. You can change the summary function from mean (the default) to median or some other function by clicking the Change summary button (purple arrow below). Again, you can add titles by using the Titles button. In this case, I do generally click the Options button and deselect (uncheck) the Display groups defined by missing values checkbox. If you don’t do this, you will get an extra group for anyone who is missing values in your dataset and it gets in the way, in my opinion. Once you are done, click OK to see your bar chart.

And here is your completed bar chart. Line charts and other types of graphs are equally simple to create, so I will leave it to you to play around with the rest of those.

5. Output

5.1 Organizing your output

As we mentioned before, some of these analyses spit out large amounts of output that you don’t really need. In addition, a happy day of data analysis can leave you with more tests than you can handle, so keeping things organized is the goal of this section.
We have been kind of ignoring the left-hand side of the output window, which is the organizational part. You can see in the output to right that it is hard to tell from the output window exactly what analyses were done. The first big help is to rename the tests. Instead of T-test, report WHAT the t-test was on. You can also click on the little minuses to temporarily hide analyses. Finally, you can see in the output window above that there are Notes whose icons look like a closed book rather than an open book. These are hidden sections of output. They will remind you exactly what analysis you are looking at, whether a filter was in place, etc. You can unhide these notes (or hide any visible output component) by double clicking on it. To rename a component, do not double click on it. Rather, click on it twice, slowly, to highlight the name so that you can change it. Below you will see a much tidier example of the same output, in which we have hidden the Graph and renamed all of the components.

Still, if you print out your output, none of these pretty organizational things will show up. So you need to incorporate some organization into the right side of the results window. Double clicking on any element in the results window allows you to edit it. For example, double clicking on the t-test title (red arrow above) will allow you to edit that title to read T-test of bomstot by gender, or whatever is helpful to you. Double clicking on charts and graphs will give you options to change them, add titles, change the axes, etc. Double clicking on output tables will allow you to go in and change numbers, or copy and paste the cells out into Excel or some other program. To add text to your output, click on Insert > New Text (circled in red to right). This will allow you to incorporate text notes into your output to help remind you what you did.

5.2 Results Coach

Another way to help you plow through mountains of output that may not make sense is to use the Results Coach.
Double click on a table or component in your results that you want explained (in this case, we’ll use the Tests of Between-Subjects Effects in our ANOVA, red arrow below). Then go to Help > Results Coach (circled in red below). This will pop up a window that helps you understand the results you are seeing.

To the left is the Results Coach. Simply hit the Next button (green arrow to left) to cycle through the information given by the coach. This is a very helpful feature.

6. Using syntax

There are two simple ways to start using syntax. Either you can save a specific analysis by using the Paste function, or you can log your entire session in a Session Journal.

6.1 The Paste function

All of the functions that you can use in SPSS to compute variables, do statistics, and create graphs have a little button near the Cancel and OK buttons called Paste. Here is an example from the Univariate ANOVA case (red arrow below). Hitting the Paste button instead of the OK button will paste the syntax associated with the action you are about to perform into a syntax window (which will pop up automatically). Below you will see the syntax associated with this analysis. To run this syntax, highlight the part you wish to run (all of it in this case) and then hit the Arrow button (orange arrow below).

If you do this each time you are about to run an analysis, you will have a record of the statistics you have done. If you make a mistake, you can go in and edit the syntax just as you would text. You can also copy and paste and then make small adjustments in the pasted syntax if, for example, you need to do something very similar many times. Once you have a syntax file you are happy with, just save it using the File menu as you would any other file.

6.2 Creating a Session Journal

Actually, SPSS has been creating a session journal, a kind of log file, every time you use SPSS. But it has been putting it in a temporary directory and probably overwriting it. Go to Edit > Options (circled in red below).
In the Options window, go to the General tab (it will probably come up by default). Outlined below in green, you will see the Session Journal options. If the box for Record syntax in journal is not checked, check it. You will see that right now, my syntax has been recorded in C:\WINNT\TEMP\spss.jnl. You can click the Browse button in the green box to choose a file or directory that you’d like to save your syntax into. Decide whether you want to append to the file or overwrite it each time you begin a new session. Then click OK to have this option take effect. This will save all of the syntax for your entire session into a file that you choose. You can then go in and highlight parts to run them again at a later time, or else simply keep the syntax as a record of your analyses.

7. For more information

For more information, the SPSS manuals that came with your software are good references. They are not so hot at getting you started using SPSS, which is why I created this guide, but once you know what you’re doing, they can help you with specific questions. Even easier, though, are the help files included with SPSS. Right clicking on most things will give you an option to choose “What’s this?” or may simply pop up an explanation. If those don’t work, the Help Topics will bring up a window that has Contents, as well as an Index and a Find tab that can help you to find more information on specific kinds of analyses. Finally, you can contact technical support, assuming you are using a licensed copy of SPSS. Go to whoever handled the licensing and ask for the tech support number, or else seek out the technical support person in your organization.

If you have any questions about this guide, please e-mail me at pam@psych.stanford.edu.