Notes on SPSS

Using IBM SPSS More Effectively
Updated September 21, 2010
This file contains most of the collective wisdom of the Cohen Center regarding the
effective use of IBM SPSS (earlier released as SPSS and then PASW). It assumes a good
working knowledge of basic SPSS procedures and provides a guide to nonobvious shortcuts
and other tricks of the trade.
Index
Case Sensitivity
Command Terminator and EXECUTE
Defining MISSING VALUES
Ordering Variables
The DO REPEAT Command
Programming with Conditional Statements
Vectors and Loops
The AGGREGATE command
Case Sensitivity
SPSS is not case sensitive. You can write a command as FREQUENCIES (the official
style), frequencies, Frequencies, or even FrEqUeNcIeS (though you’d be mad to).
In code fragments throughout this guide, we use both all upper case (typically copied
from SPSS commands we pasted into the syntax editor) and all lower case (typically
written by ourselves). Variable names are also case insensitive.
Command Terminator and EXECUTE
By default, Stata treats the carriage return at the end of the line as the end of a command
and runs it immediately (they are called .do files, after all). SPSS follows a different
model. Every command must be terminated by a period (the command terminator). Failure
to do so will lead SPSS to treat separate commands as a single command, which will
produce error messages. (Because SPSS does not stop running when it encounters an
error, this can potentially involve a lot of backtracking to find and fix. You should
therefore run code in small increments, check the output carefully for error messages,
and continue on to the next section of code only if all is well.) The benefit of the
command terminator is that no special treatment is required to break long lines of
syntax, except for strings of text enclosed in quotation marks, like file names. (SPSS
defaults to displaying 80 characters, so it is recommended that you keep your code to no
more than 80 characters per line.) File names may be broken across lines by closing the
quoted string and joining it to the next one with a plus sign:
save outfile='C:\Cohen Center\BRI\BRI17\Data\'+
'some file.sav'/compress.
Note that you can use either single or double quotation marks.
So far, we have established that SPSS defines lines of code differently from Stata. It also
executes code differently. Stata executes a command immediately upon reaching a
carriage return (unless you break the line in some other fashion). SPSS will read, but not
run, data transformations until it reaches either a command that requires the data to be
read, primarily analytic commands like FREQUENCIES or commands that open or save
files, or the EXECUTE command (followed by the command terminator, naturally).
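A minimal illustration of this deferred execution, assuming a numeric variable called income (income and income_k are hypothetical names, not from the original):

```spss
* income and income_k are hypothetical variable names.
compute income_k = income / 1000.
* The COMPUTE is now pending: the Data Editor status bar reads
* "Transformations pending" and income_k has no values yet.
execute.
* After EXECUTE (or any procedure that reads the data, such as
* FREQUENCIES), income_k is filled in for every case.
```

In practice you can often skip the explicit EXECUTE when the next command is a procedure, since the procedure runs the pending transformations first.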
Defining MISSING VALUES
SPSS allows users to define MISSING VALUES for particular variables, e.g. -999 for
survey nonrespondents:
missing values vara (-999).
See the syntax reference manual for details on declaring more than one value missing for
a variable and declaring the same values missing for a set of variables.
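For quick reference, here is a sketch of the common variants. The variable names (vara, varb, q1 to q10, city) are hypothetical, and the syntax reference remains the authority on the exact limits in your version:

```spss
* Up to three discrete missing values for one variable.
missing values vara (-999, -998, -997).
* A range plus one discrete value (numeric variables only).
missing values varb (-999 thru -900, 99).
* The same value for a block of adjacent variables, plus a
* string variable in the same command.
missing values q1 to q10 (-999) /city ('NA').
* Empty parentheses clear the user-missing values for a variable.
missing values vara ().
```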
StatTransfer now writes SPSS user-missing values to Stata as extended missing values
(.a, .b, etc.), where the initial letter of the value label supplies the encoding in Stata.
System-missing values are still encoded as Stata's ordinary missing value (.).
Missing values in FREQUENCIES and CROSSTABS
It is possible to include user-defined missing values in frequencies or crosstabs
using the /MISSING=INCLUDE subcommand:
FREQUENCIES VARIABLES=briisrl
/MISSING=INCLUDE
/ORDER=ANALYSIS.
However, system-missing values cannot be included as a valid category in FREQUENCIES
or CROSSTABS.
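If you do need system-missing cases to appear as a category, one common workaround (not from the original notes) is to copy the variable with system-missing recoded to an ordinary value. A sketch, assuming briisrl is numeric and that -99 is not already a valid code; briisrl_m is a hypothetical new variable:

```spss
* Copy briisrl into a new variable, turning system-missing into -99.
recode briisrl (sysmis=-99) (else=copy) into briisrl_m.
value labels briisrl_m -99 'System missing'.
execute.
* -99 now shows up as an ordinary category.
frequencies variables=briisrl_m
 /order=analysis.
```

Note that RECODE INTO does not copy value labels or missing-value declarations, so any user-missing codes in briisrl also appear as valid categories in briisrl_m.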
Ordering Variables
One way to reorder the variables in a dataset is to use the KEEP subcommand with the
SAVE command. The listed variables are saved in the order given, and ALL adds the
remaining variables after them.
SAVE OUTFILE='Z:\BRI\BRI16to19\Data\16to19 parent cleaned.sav'
 /KEEP token replicatelime pntid pntdatestamp
 pntstartdate pntsubmitdate ALL /COMPRESSED.
Note that you have to then open the saved file in order to see the variables in the specified
order. They don’t reorder themselves in the file you have open.
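Another approach, not part of the original notes, reorders the variables in the open dataset without saving: MATCH FILES with the active dataset (FILE=*) as its only input and the same KEEP list. A sketch, reusing the variable names from the SAVE example:

```spss
* Reorder variables in the active dataset; ALL keeps the remaining
* variables, in their existing order, after the named ones.
match files file=*
 /keep=token replicatelime pntid pntdatestamp
  pntstartdate pntsubmitdate all.
execute.
```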
The DO REPEAT Command
The DO REPEAT command in SPSS is very similar to Stata’s foreach command for a
varlist, as both allow execution of the same command across multiple variables. Just
as one refers to the current element of a foreach loop via a local macro (e.g., `x') in
Stata, in SPSS one assigns what it calls a stand-in variable to represent, in turn, each
element of the set of variables specified in DO REPEAT. Here, we RECODE a set of variables:
do repeat sex=sex1 to sex10 /female=female1 to female10.
recode sex (1=1)(2=0)(else=SYSMIS) into female.
end repeat.
execute.
The first line specifies what is to be repeated. Here sex1 to sex10 are preexisting variables
and female1 to female10 are new variables to be created (SPSS knows this because there
aren’t any variables called female1 to female10 in the dataset). The stand-in variable sex
represents the ordered list sex1 to sex10, while female stands in for female1 to female10;
on each pass, both stand-ins move on to the next variable in their lists. The second line
(which could, of course, be many lines) simply recodes sex (1 = female, 2 = male) into
female (1 = female, 0 = male, anything else system-missing) for each of the 10 variables.
The end of a DO REPEAT command must be specified by END REPEAT so that SPSS can
drop the stand-in variables. Finally, any self-respecting SPSS command that transforms or
creates variables needs to be followed by EXECUTE or an analytic command before its
results appear in the data.
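A stand-in does not have to represent variables; it can also step through a list of values, as long as every stand-in list has the same number of elements. The sketch below builds dummy variables and assumes a hypothetical variable region coded 1 to 5 (region1 to region5 are new). END REPEAT PRINT writes the expanded commands to the output, which is a handy way to check what SPSS actually ran:

```spss
* r stands for region1 to region5 (new); k steps through 1 to 5.
do repeat r=region1 to region5 /k=1 to 5.
compute r=0.
if (region=k) r=1.
end repeat print.
execute.
```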
Programming with Conditional Statements
SPSS handles programming with conditional statements differently—and generally more
effectively—than does Stata. Simple conditional statements are, of course, rendered as:
if (y=0) x=0.
if (y=1) x=z.
Where SPSS excels, though, is the DO IF command. This allows you to conditionally
execute multiple commands under certain logical circumstances. For instance, in the not
uncommon situation where an earlier survey item routes some respondents past later
questions and you need to impute zeroes for the skipped items, you can set up the logical
structure with DO IF commands. Even better, one DO IF can be nested in another DO IF
command. In the example that follows, we recode Jewish education items, further reducing
the amount of code with the DO REPEAT command.
do if jeduyn=0.
  do repeat edu=sunjryrs hebjryrs dayjryrs sunsryrs
   hebsryrs daysryrs.
    compute edu=0.
  end repeat.
else if jeduyn=1.
  do if jedujryn=0.
    do repeat jredu=sunjryrs hebjryrs dayjryrs.
      compute jredu=0.
    end repeat.
  else if jedujryn=1.
    do repeat jreduyn=sunjryn hebjryn dayjryn
     /jreduyrs=sunjryrs hebjryrs dayjryrs.
      if (jreduyn=0) jreduyrs=0.
    end repeat.
  end if.
  do if jedusryn=0.
    do repeat sredu=sunsryrs hebsryrs daysryrs.
      compute sredu=0.
    end repeat.
  else if jedusryn=1.
    do repeat sreduyn=sunsryn hebsryn daysryn
     /sreduyrs=sunsryrs hebsryrs daysryrs.
      if (sreduyn=0) sreduyrs=0.
    end repeat.
  end if.
end if.
execute.
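Stripped of the survey-specific detail, the basic shape of a DO IF block looks like the sketch below, which uses hypothetical variables (age, agegrp). The conditions are tested in order, the first true one wins, and ELSE catches cases that match none of them:

```spss
* Test for missing first: if a DO IF or ELSE IF condition evaluates
* to missing, SPSS skips the rest of the structure for that case.
do if missing(age).
  compute agegrp=$sysmis.
else if (age < 30).
  compute agegrp=1.
else if (age < 60).
  compute agegrp=2.
else.
  compute agegrp=3.
end if.
execute.
```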
Vectors and Loops
If you want to do the same sort of transformation to a set of variables in SPSS, it is most
efficient to use vectors and loops. A VECTOR is a user-defined set of adjacent variables
(i.e., they all have to be right next to each other in the dataset). Once you declare a group
of variables as a VECTOR, you can use the LOOP command to step through its elements and
do whatever you need to each one. Let’s say we have a survey where we asked
questions about the members of households. Answers were recorded for each member of
the household, and you could have up to 10 people in your household, so there are 10
variables per question. First, we declare a VECTOR for each question, each one being 10
variables long. Then, at the end we get SPSS to automatically create 10 new variables
and put them in a new VECTOR.
* cjewvec and rjewvec are vectors of preexisting variables.
* jewc (10 F4.0) creates ten new variables, jewc1 through jewc10, in
* F4.0 format and puts them in a vector called jewc.
vector cjewvec=cjew1 to cjew10
 /rjewvec=rjew1 to rjew10
 /jewc (10 F4.0).
Now we want to LOOP through the household members, doing various things to each
variable. Here, we just want to COMPUTE a new set of variables, jewc1 through jewc10,
for each household member (1 through 10) to be 1 if both cjew=1 and rjew=1 for that
household member (where these are cjew2 and rjew2 for the second household member,
and so on). The key to this is the scratch variable #hhmem. #hhmem isn’t a real variable,
but a temporary one that changes each time the LOOP runs, similar to the stand-in variable
in DO REPEAT. When we declare the LOOP we also declare the scratch variable and how
many times we want the LOOP to run. If everyone had 10 members in their household we
would write:
loop #hhmem=1 to 10.
if (cjewvec(#hhmem)=1 & rjewvec(#hhmem)=1) jewc(#hhmem)=1.
end loop.
This will cause everything within the LOOP to run 10 times, with the value of #hhmem
starting at 1 and increasing by 1 each time until it gets to 10. It is just as if we had
written:
if (cjew1=1 & rjew1=1) jewc1=1.
if (cjew2=1 & rjew2=1) jewc2=1.
* ...and so on, through cjew10, rjew10, and jewc10.
But because we’ve declared cjewvec, rjewvec, and jewc as vectors, we can just tell SPSS
to run the syntax 10 times, stepping through the elements of the vectors each time.
Now, since not everyone has 10 people in their household, we don’t necessarily want to
LOOP 10 times. We only want to LOOP as many times as there are household members for
a given case. The number of household members is stored in the variable hhmems. (Note
that this is the real variable hhmems, not the scratch variable #hhmem, which changes on
each pass through the LOOP.) We can do it this way:
loop #hhmem=1 to hhmems.
if (cjewvec(#hhmem)=1 & rjewvec(#hhmem)=1) jewc(#hhmem)=1.
end loop.
execute.
As with DO REPEAT, we need to declare the end of the LOOP, here with an END LOOP
command. Note the EXECUTE command at the end of this fragment of code. Anytime you
run a command that causes SPSS to actually read the data and compute results (either an
EXECUTE command or some sort of analysis, like FREQUENCIES), SPSS clears all of the
vectors you created. The variables cjew1 to cjew10 are still there, but they no longer
correspond to the VECTOR elements cjewvec(1), cjewvec(2), and so forth. Similarly, the
new variables you created with the VECTOR command (jewc1, jewc2, and so on) remain
permanent parts of your dataset, although SPSS forgets that they correspond to the
elements jewc(1), jewc(2) of the VECTOR jewc. If you ran the previous VECTOR command
again, it wouldn’t work because the variables jewc1 through jewc10 have already been
created. So, if you executed the above code, then needed to run some FREQUENCIES, and
then wanted to do more with the jewc variables, you’d have to do something like this:
vector
cjewvec=cjew1 to cjew10/
rjewvec=rjew1 to rjew10/
jewc=jewc1 to jewc10.
This time you declare the VECTOR jewc over the existing variables jewc1 to jewc10,
instead of creating new variables, because they already exist.
You can embed DO IF and DO REPEAT statements inside loops and vice versa. You can
also nest loops inside one another, so you could have one LOOP that steps through
household members and another that steps through religions, and write a line that
refers to both scratch variables. However, you can’t use RECODE with VECTOR elements;
you have to use IF or DO IF statements instead.
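A sketch of nested loops, under hypothetical assumptions: variables hrel1 to hrel10 hold each household member’s religion code (1, 2, or 3), hhmems is never more than 10, and we want counts nrel1 to nrel3 of the members with each code. The inner IF is also the usual substitute for a RECODE on vector elements:

```spss
* hrel1 to hrel10 are hypothetical; nrel1 to nrel3 are created here.
vector hrelvec=hrel1 to hrel10 /nrel (3 F2.0).
loop #rel=1 to 3.
  compute nrel(#rel)=0.
  loop #hhmem=1 to hhmems.
    if (hrelvec(#hhmem)=#rel) nrel(#rel)=nrel(#rel)+1.
  end loop.
end loop.
execute.
```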
The AGGREGATE command
SPSS has an equivalent to Stata’s egen functions, such as total() and mean(), which
calculate down columns (across cases) rather than, as usual, along rows (within a case).
Here’s a rough example of code using AGGREGATE to calculate sample sizes within strata.
* Create sample size within stratum.
aggregate
 /outfile=* mode=addvariables overwrite=yes
 /break=stratum
 /n_h=sum(complete).
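The same pattern covers most uses of egen with by(). A sketch with hypothetical variables household and income. It adds the household mean of income and the number of cases in each household to every case, roughly what egen mean_inc = mean(income), by(household) does in Stata:

```spss
* household and income are hypothetical variable names.
aggregate
 /outfile=* mode=addvariables overwrite=yes
 /break=household
 /mean_inc=mean(income)
 /n_cases=n.
```

MODE=ADDVARIABLES keeps every original case and attaches the group-level results to it, rather than replacing the dataset with one row per break group.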