Using IBM SPSS More Effectively
Updated September 21, 2010

This file contains most of the collective wisdom of the Cohen Center regarding the effective use of SPSS (renamed PASW and later IBM SPSS). It assumes a good working knowledge of basic SPSS procedures and provides a guide to nonobvious shortcuts and other tricks of the trade.

Index
Case Sensitivity
Command Terminator and EXECUTE
Defining MISSING VALUES
Ordering Variables
The DO REPEAT Command
Programming with Conditional Statements
Vectors and Loops
The AGGREGATE command

Case Sensitivity
SPSS is not case sensitive. You can write a command as FREQUENCIES (the official style), frequencies, Frequencies, or even FrEqUeNcIeS (though you’d be mad to). In code fragments throughout this guide, we use either all upper case (typically copied from SPSS commands we pasted into the syntax editor) or all lower case (typically written by ourselves). Variable names are also case insensitive.

Command Terminator and EXECUTE
By default, Stata treats the carriage return at the end of a line as the end of a command and executes it immediately (they are called .do files, after all). SPSS follows a different model.
Every command must be terminated by a period, or full stop (the command terminator). A command may span several lines; what ends it is the period, not the line break. Failure to include the terminator will lead SPSS to treat separate commands as one, which will produce error messages. (Because SPSS does not stop running when it encounters an error, this can involve a lot of backtracking to find and fix. You should therefore run code in small increments, check the output carefully for error messages, and continue to the next section of code only if all is well.)

The benefit of the command terminator is that no special treatment is required to break long lines of syntax, except for strings of text enclosed in quotation marks, like file names. (SPSS defaults to displaying 80 characters, so we recommend keeping your code to no more than 80 characters per line.) File names may be broken across lines by splitting them into two quoted strings joined with a plus sign:

save outfile='C:\Cohen Center\BRI\BRI17\Data\'+
 'some file.sav'/compress.

Note that you can use either single or double quotation marks.

So far, we have established that SPSS defines lines of code differently from Stata. It also executes code differently. Stata executes a command immediately upon reaching a carriage return (unless you break the line in some other fashion). SPSS will read, but not run, transformation commands until it reaches either a command that requires the data to be read, primarily analytic commands like FREQUENCIES or commands that open or save data files such as SAVE, or the EXECUTE command (followed by the command terminator, naturally).

Defining MISSING VALUES
SPSS allows users to define MISSING VALUES for particular variables, e.g., -999 for survey nonrespondents:

missing values vara (-999).

See the syntax reference manual for details on declaring more than one value missing for a variable and declaring the same values missing for a set of variables. StatTransfer now writes SPSS user-defined missing values as Stata’s extended missing values .a, .b, etc., where the initial letter of the value label supplies the encoding in Stata. System missing values are still encoded as Stata’s ordinary missing value (.).
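As a sketch of the multiple-value forms mentioned above, MISSING VALUES accepts up to three discrete values, or one range plus one discrete value, and a single command can cover a list of variables. The variable names income and q1 to q5 are hypothetical, and q1 to q5 are assumed to be adjacent in the file so that TO works:

```spss
* Hypothetical variables: income, and adjacent items q1 to q5.
* Up to three discrete user-missing codes for one variable.
MISSING VALUES income (-999, -998, -997).
* One range plus one discrete value, declared for a set of variables.
MISSING VALUES q1 TO q5 (LO THRU -1, 99).
* Empty parentheses clear the user-missing declarations again.
MISSING VALUES income ().
```

Running FREQUENCIES on these variables afterwards is a quick check: the declared codes are listed in the Missing rows rather than among the valid values.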
Missing values in FREQUENCIES and CROSSTABS
It is possible to include user-defined missing values in frequencies or crosstabs using the /MISSING=INCLUDE subcommand:

FREQUENCIES VARIABLES=briisrl
 /MISSING=INCLUDE
 /ORDER=ANALYSIS.

However, it is not possible to include system missing values as valid values in FREQUENCIES or CROSSTABS.

Ordering Variables
The approach we use to reorder the variables in a dataset is the KEEP subcommand of the SAVE command. The variables listed first are written first, and the keyword ALL at the end keeps every remaining variable in its existing order:

SAVE OUTFILE='Z:\BRI\BRI16to19\Data\16to19 parent cleaned.sav'
 /KEEP token replicatelime pntid pntdatestamp pntstartdate pntsubmitdate ALL
 /COMPRESSED.

Note that you then have to open the saved file in order to see the variables in the specified order. They don’t reorder themselves in the file you have open. (To reorder the open dataset itself, the MATCH FILES command with /FILE=* and the same /KEEP list, followed by EXECUTE, does the same job.)

The DO REPEAT Command
The DO REPEAT command in SPSS is very similar to Stata’s foreach command for a varlist, as both allow execution of the same command across multiple variables. Just as one refers to the current element of foreach via a local macro (e.g., `x') in Stata, in SPSS one assigns what SPSS calls a stand-in variable to the elements of each set of variables specified in DO REPEAT. Here, we RECODE a set of variables:

do repeat sex=sex1 to sex10 /female=female1 to female10.
recode sex (1=1)(2=0)(else=SYSMIS) into female.
end repeat.
execute.

The first line specifies what is to be repeated. Here sex1 to sex10 are preexisting variables and female1 to female10 are new variables to be created (SPSS knows this because there aren’t any variables called female1 to female10 in the dataset). The stand-in variable sex stands for each of sex1 to sex10 in turn, in order, while female stands in for female1 to female10. The second line (which could, of course, be many lines) simply does a RECODE of sex (1 = female, 2 = male) into female for each of the 10 pairs of variables. The end of a DO REPEAT command must be specified by END REPEAT so that SPSS can drop the stand-in variables.
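Stand-ins need not be variables. A minimal sketch, assuming a hypothetical variable region coded 1 to 5 and hypothetical new dummies reg1 to reg5: the value list 1 to 5 supplies one value per pass, and the PRINT keyword on END REPEAT lists the commands SPSS generates, which is handy for checking what DO REPEAT actually did.

```spss
* Hypothetical: region is coded 1 to 5; reg1 to reg5 are new dummies.
do repeat d=reg1 to reg5 /val=1 to 5.
compute d=(region=val).
end repeat print.
execute.
```

Each dummy is 1 when region equals its value and 0 otherwise; cases with a missing region get system-missing, because the comparison itself is missing.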
Finally, any self-respecting SPSS command that transforms or creates variables needs to be followed by EXECUTE or an analytic command before its results appear in the data.

Programming with Conditional Statements
SPSS handles programming with conditional statements differently from (and generally more effectively than) Stata. Simple conditional statements are, of course, rendered as:

if (y=0) x=0.
if (y=1) x=z.

Where SPSS excels, though, is the DO IF command. This allows you to execute multiple commands only when a given logical condition holds. For instance, in the not uncommon situation where an earlier item routes some cases past later questions and you need to impute zeroes for them, you can set up the logical structure with DO IF commands. Even better, one DO IF can be nested in another DO IF command. In the example that follows, we recode Jewish education items. We further reduce the code by using the DO REPEAT command.

do if jeduyn=0.
do repeat edu=sunjryrs hebjryrs dayjryrs sunsryrs hebsryrs daysryrs.
compute edu=0.
end repeat.
else if jeduyn=1.
+ do if jedujryn=0.
+ do repeat jredu=sunjryrs hebjryrs dayjryrs.
+ compute jredu=0.
+ end repeat.
+ else if jedujryn=1.
+ do repeat jreduyn=sunjryn hebjryn dayjryn /jreduyrs=sunjryrs hebjryrs dayjryrs.
+ if (jreduyn=0) jreduyrs=0.
+ end repeat.
+ end if.
+ do if jedusryn=0.
+ do repeat sredu=sunsryrs hebsryrs daysryrs.
+ compute sredu=0.
+ end repeat.
+ else if jedusryn=1.
+ do repeat sreduyn=sunsryn hebsryn daysryn /sreduyrs=sunsryrs hebsryrs daysryrs.
+ if (sreduyn=0) sreduyrs=0.
+ end repeat.
+ end if.
end if.
execute.

Vectors and Loops
If you want to do the same sort of transformation to a set of variables in SPSS, it is most efficient to use vectors and loops. A VECTOR is a user-defined set of adjacent variables (i.e., they all have to be right next to each other in the dataset). Once you declare a group of variables as a VECTOR, you can use the LOOP command to step through the elements of that VECTOR, doing whatever you need to each one.
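In its simplest form, the pattern looks like this. A minimal sketch, assuming hypothetical adjacent items q1 to q5 scored 1 to 5, which reverse-codes each item in place. It uses COMPUTE rather than RECODE, because RECODE does not accept VECTOR elements.

```spss
* Hypothetical: q1 to q5 are adjacent items scored 1 to 5.
* Reverse-code each item in place, so 1 becomes 5 and 5 becomes 1.
vector qvec=q1 to q5.
loop #i=1 to 5.
compute qvec(#i)=6-qvec(#i).
end loop.
execute.
```

An item that is missing for a case ends up system-missing for that case, since arithmetic on a missing value returns system-missing.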
Let’s say we have a survey where we asked questions about the members of households. Answers were recorded for each member of the household, and a household could have up to 10 people, so there are 10 variables per question. First, we declare a VECTOR for each question, each one being 10 variables long. Then, at the end, we get SPSS to automatically create 10 new variables and put them in a new VECTOR.

vector /* This declares a set of vectors */
 cjewvec=cjew1 to cjew10/ /* This is a vector of preexisting variables */
 rjewvec=rjew1 to rjew10/ /* So is this */
 jewc (10 F4.0). /* New F4.0 variables jewc1 to jewc10, in vector jewc */

Now we want to LOOP through the household members, doing various things to each variable. Here, we just want to COMPUTE a new set of variables, jewc1 through jewc10, one for each household member (1 through 10), set to 1 if both cjew=1 and rjew=1 for that household member (where these are cjew2 and rjew2 for the second household member, and so on). The key to this is the scratch variable #hhmem. #hhmem isn’t a real variable, but a temporary one that changes each time the LOOP runs, similar to the stand-in variable in DO REPEAT. When we declare the LOOP, we also declare the scratch variable and how many times we want the LOOP to run. If everyone had 10 members in their household, we would write:

loop #hhmem=1 to 10.
if (cjewvec(#hhmem)=1 & rjewvec(#hhmem)=1) jewc(#hhmem)=1.
end loop.

This will cause everything within the LOOP to run 10 times, with the value of #hhmem starting at 1 and increasing by 1 each time until it gets to 10. It would be just as if we wrote code like this:

if (cjew1=1 & rjew1=1) jewc1=1.
if (cjew2=1 & rjew2=1) jewc2=1.
etc.

But because we’ve declared cjewvec, rjewvec, and jewc as vectors, we can just tell SPSS to run the syntax 10 times, stepping through the elements of the vectors each time.
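The same machinery can also summarize across a vector instead of creating one variable per member. A sketch, assuming a hypothetical new variable njew that counts the household members for whom both cjew and rjew equal 1. It redeclares the vectors, so run it as a separate fragment rather than in the same block as the VECTOR command above.

```spss
* Hypothetical: njew counts members with both cjew=1 and rjew=1.
vector cjewvec=cjew1 to cjew10/
 rjewvec=rjew1 to rjew10.
compute njew=0.
loop #hhmem=1 to 10.
if (cjewvec(#hhmem)=1 & rjewvec(#hhmem)=1) njew=njew+1.
end loop.
execute.
```

Members whose cjew or rjew values are missing (including unused slots beyond the household size) simply do not add to the count, because the IF condition is not true for them.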
Now, since not everyone has 10 people in their household, we don’t necessarily want to LOOP 10 times. We only want to LOOP as many times as there are household members for a given case. The number of household members is stored in the variable hhmems. (Note that this is the real variable hhmems, not the scratch variable #hhmem, which changes on each pass through the LOOP.) We can do it this way:

loop #hhmem=1 to hhmems.
if (cjewvec(#hhmem)=1 & rjewvec(#hhmem)=1) jewc(#hhmem)=1.
end loop.
execute.

As with DO REPEAT, we need to declare the end of the LOOP, here with an END LOOP command. Note the EXECUTE command at the end of this fragment of code. Any time you run a command that causes SPSS to actually process the data (either an EXECUTE command or some sort of analysis, like FREQUENCIES), SPSS clears all of the vectors you created. The variables cjew1 to cjew10 are still there, but they no longer correspond to the VECTOR elements cjewvec(1), cjewvec(2), and so forth. Likewise, the new variables you created with the VECTOR command (jewc1, jewc2, and so on) remain permanent parts of your dataset, but SPSS forgets that they correspond to the elements jewc(1), jewc(2) in the VECTOR jewc. If you ran the previous VECTOR command again, it wouldn’t work, because the variables jewc1 through jewc10 have already been created. So, if you executed the above code, then needed to run some FREQUENCIES, and then wanted to do more with the jewc variables, you’d have to do something like this:

vector cjewvec=cjew1 to cjew10/
 rjewvec=rjew1 to rjew10/
 jewc=jewc1 to jewc10.

This time you declare a VECTOR of the existing jewc variables, instead of creating new variables, because they already exist.

You can embed DO IF and DO REPEAT structures inside loops and vice versa. You can also nest loops inside one another, so that one LOOP steps through household members and another steps through religions, and you can write a line that refers to both scratch variables.
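As an illustration of embedding DO IF in a LOOP, here is a sketch of a variant of the earlier code that sets jewc to 0, rather than leaving it system-missing, for existing household members who do not meet the condition. It assumes jewc1 to jewc10 already exist, as described above.

```spss
* Variant: jewc is 1 if both cjew=1 and rjew=1, otherwise 0.
vector cjewvec=cjew1 to cjew10/
 rjewvec=rjew1 to rjew10/
 jewc=jewc1 to jewc10.
loop #hhmem=1 to hhmems.
do if (cjewvec(#hhmem)=1 & rjewvec(#hhmem)=1).
compute jewc(#hhmem)=1.
else.
compute jewc(#hhmem)=0.
end if.
end loop.
execute.
```

One caveat: if the DO IF condition evaluates to missing (for example, cjew is 1 but rjew is missing), SPSS skips the whole DO IF structure, so that member’s jewc is left unchanged rather than set to 0.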
However, you can’t use RECODE statements with VECTOR elements. You have to use IF or DO IF statements instead.

The AGGREGATE command
SPSS has an equivalent to Stata’s egen functions such as total and mean, which make calculations down columns (across cases) rather than, as usual, along rows within a case. Here’s a rough example of code using AGGREGATE to calculate sample sizes within strata (summing complete counts completed cases on the assumption that it is coded 1 for completes and 0 otherwise):

* Create sample size within stratum.
aggregate /outfile=* mode=addvariables overwrite=yes
 /break=stratum
 /n_h=sum(complete).
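Other summary functions work the same way. A sketch, assuming a hypothetical variable age (the new variable names mean_age and n_cases are also made up), that adds the stratum mean of age and the number of cases in each stratum to every case, much as egen with by() would in Stata:

```spss
* Hypothetical: add the stratum mean of age and the stratum case count.
aggregate /outfile=* mode=addvariables overwrite=yes
 /break=stratum
 /mean_age=mean(age)
 /n_cases=n.
```

Because mode=addvariables keeps the active dataset intact, every case in a stratum receives the same mean_age and n_cases values; overwrite=yes lets you rerun the command after the variables exist.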