Using SPSS

1. What (who?) is SPSS?
SPSS (Statistical Package for the Social Sciences) is a powerful set of computer programs for
calculating statistics (and thus for saving you innumerable hours of calculating with pencil and
paper). The program is available on many personal computers (PCs) around campus. (These
PCs can be located through the web site http://www.it.iastate.edu/labsdb/.) In these few
pages you will learn only a few commands for running SPSS. You will learn more
commands as you are given SPSS programs to run during the semester.
2. Entering data into SPSS
Like any program on a PC, SPSS is started by double-clicking the SPSS icon on the computer’s
desktop. This opens the “Data Editor” in SPSS. Here you can enter data manually (not advised),
import data, or type in (and run) a batch program that contains your data. Below are instructions
on how to enter data into SPSS using the latter two methods.
a. Importing data from an SPSS portable file (*.por extension) or an SPSS system file
(*.sav extension)
During the semester you will be provided with SPSS portable files that contain data for
you to analyze as part of your lab problems. You can import these files into SPSS by
(1) selecting “Open” then “Data” from the “File” menu, (2) changing “Files of type” to
“SPSS Portable (*.por)”, (3) changing “Look in” to the folder into which you have copied
the file, and (4) double-clicking on the file name. (The directions are the same for a system
file, except that “Files of type” should be changed to “SPSS (*.sav)”.)
b. Using SPSS in “Batch Mode”
Around the time of the first exam, you will start getting SPSS “batch programs” with
your lab assignments. To complete the assignments you will need to run these programs
and print the output (which appears in an SPSS Viewer window). WARNING: SPSS will
print only the parts of your output that you have highlighted in the left tree-pane of the SPSS
Viewer window. Pressing the button with the printer on it will do nothing if nothing
has been highlighted.
Batch programs must be typed using the SPSS Syntax Editor. Open the editor by
selecting “File” then “New” then “Syntax.” Next, type in your batch program exactly as
it appears in the lab. Then select “Edit,” “Select All,” and push the “Run Current” button
(i.e., the button with the right-pointing black arrowhead on it). This will execute your
batch program and send the output to the SPSS Viewer window (where you may also find
error messages if you mistyped parts of your program). WARNING: It is strongly
recommended that you use a fixed-width (i.e., NOT proportional) font when typing your batch
programs. Any Courier font should do the trick. Fixed-width fonts align your numbers in
straight columns, so that it is easier to tell if you have left too many spaces between them.
This matters because a “data list” statement reads each variable from specific column
positions (see section 4).
3. Why write batch programs?
Maybe you have used SPSS on a PC before, or maybe you have had a chance to play around
with the program a bit. In either case, you have probably discovered that once you have
imported data into SPSS, you can analyze these data entirely with mouse movements and clicks.
What could be easier? Batch programs just slow you down. Who needs them, right?
The answer lies in the fact that you can save (and rerun, if necessary) your batch programs,
whereas it is not as easy to recreate the mouse activities that generated your output. Just imagine
that a few years from now you proudly take your results to your major professor, who exclaims,
“These results are hard to believe! How did you get these numbers?” Now imagine going back
to your computer and discovering that no matter how much you and your mouse try, you cannot
recreate the numbers on your output. On the other hand, if the numbers appeared in Table 23
and you search your PC for a file called “table23.sps,” just a quick run of this batch program will
(assuming that no one has manipulated your data or program file) generate an exact replica of the
output that your major professor seeks. In a sentence, batch programs are simply part of a good
record-keeping strategy. In reviewing the literature, competent researchers always keep careful
records of their sources, right? Well, when performing a statistical analysis, only the
incompetent fail to keep equally meticulous records of their programs. Enough said.
4. Writing a program
a. When data are included in an SPSS batch program, the first line of the program will be a
“data list” statement. You will find one at the beginning of the following illustration:
data list records=1 / attend 1-2 prejud 4-5.
compute newx = (attend - 28)**2.
compute sstotal = (prejud - 37)**2.
compute ysq = prejud**2.
compute xy = attend * prejud.
compute xsq = attend**2.
compute yhat1 = 39.94926 + (-.10533 * attend).
compute resid1 = prejud - yhat1.
compute sspred1 = (yhat1 - 37)**2.
compute sserror1 = (prejud - yhat1)**2.
compute newxy = newx * prejud.
compute ssnewx = (newx - 259.5)**2.
compute yhat2 = 59.19815 + (-.085542 * newx).
compute resid2 = prejud - yhat2.
compute sspred2 = (yhat2 - 37)**2.
compute sserror2 = (prejud - yhat2)**2.
begin data.
11 36
46 33
 3  6
16 42
41 49
21 51
23 61
10 23
34 57
48 18
28 65
55  3
end data.
plot plot=prejud with attend,newx.
frequencies vars=prejud ysq attend xsq xy / statistics=mean.
frequencies vars=yhat1 to sserror1 / statistics=mean.
frequencies vars=prejud ysq newx ssnewx newxy / statistics=mean.
frequencies vars=yhat2 to sserror2 / statistics=mean.
In writing batch programs, be sure that each command and statement ends with a period (the
data lines themselves do not). A command or statement may continue onto a second line, as
the “recode” example below does; only its last line ends with a period. Also notice how the
“data list” statement indicates that there is one line of data (records=1) for each unit of
analysis, that data on the variable “attend” appear right-justified in the first two columns,
and that data on the variable “prejud” appear right-justified in columns 4 and 5. When listed
in a batch program, data are placed between a “begin data” and an “end data” statement, and
may be preceded by data transformation statements (e.g., compute, recode, if) and followed by
commands (e.g., plot, frequencies). Note: “Commands” generate output; “statements” merely
create or modify variables.
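If you want to see exactly what the “data list” statement does with column positions, the Python sketch below mimics how the 12 data lines above are read. Python is used here only for illustration (it is not part of SPSS or needed for the course), and the name read_fixed is hypothetical. The sketch also shows that 28 and 37, the constants subtracted in the compute statements, are the means of “attend” and “prejud”, and that 259.5 is the mean of “newx”.

```python
# Mimics: data list records=1 / attend 1-2 prejud 4-5.
# SPSS columns are numbered from 1 and the ranges include both ends;
# Python slices start at 0 and exclude the end, so 1-2 -> [0:2], 4-5 -> [3:5].
DATA_LINES = [
    "11 36", "46 33", " 3  6", "16 42", "41 49", "21 51",
    "23 61", "10 23", "34 57", "48 18", "28 65", "55  3",
]


def read_fixed(line):
    """Return (attend, prejud) read from fixed column positions."""
    attend = int(line[0:2])  # columns 1-2; int() ignores the blank in " 3"
    prejud = int(line[3:5])  # columns 4-5
    return attend, prejud


records = [read_fixed(line) for line in DATA_LINES]
attend = [a for a, _ in records]
prejud = [p for _, p in records]

mean_attend = sum(attend) / len(attend)
mean_prejud = sum(prejud) / len(prejud)
newx = [(a - mean_attend) ** 2 for a in attend]
mean_newx = sum(newx) / len(newx)
print(mean_attend, mean_prejud, mean_newx)  # 28.0 37.0 259.5
```

A misaligned line such as “3 6” leaves columns 4 and 5 blank. In this sketch that raises a ValueError; SPSS would typically read the blank field as a missing value instead, which quietly changes your results.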
b. You will be given programs such as this when they are required in your lab assignments.
The following common SPSS commands and statements are included to help you
understand the various parts of these programs:
1) Occasionally you may wish to get a scatter plot of your data. To plot “prejud” on the
vertical axis and “attend” on the horizontal axis, you would use the following
command:
plot plot=prejud with attend.
2) The “compute” statement is used to create a new variable as a mathematical function
of other variables. For example, imagine that you found a constant (or intercept) of
39.94926 and a slope of -.10533 in the regression of “prejud” on “attend”. You could
compute a new variable, “prejuhat”, that gives the estimated values of “prejud”
(according to this regression) for each of the 12 observations listed in the above
program. This would be done as follows:
compute prejuhat = 39.94926 + (-.10533 * attend).
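The intercept and slope quoted above come from the 12 observations in the illustration program. As a check, the Python sketch below (illustration only; fit_line is a hypothetical helper) computes them from the same kinds of means that the “frequencies ... / statistics=mean” commands print, using slope = (mean of xy - mean of x * mean of y) / (mean of xsq - (mean of x)**2). It repeats the calculation for “newx”, forms predicted values as “compute prejuhat” does, and checks that the total sum of squares splits into predicted and error parts, which is what the program’s sstotal, sspred1, and sserror1 variables let you verify.

```python
import math

# The 12 observations from the illustration program.
ATTEND = [11, 46, 3, 16, 41, 21, 23, 10, 34, 48, 28, 55]
PREJUD = [36, 33, 6, 42, 49, 51, 61, 23, 57, 18, 65, 3]


def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """Return (intercept, slope) for the least-squares regression of y on x,
    computed only from means of x, y, x*y and x**2."""
    mx, my = mean(x), mean(y)
    mxy = mean([xi * yi for xi, yi in zip(x, y)])  # mean of "xy"
    mxsq = mean([xi ** 2 for xi in x])              # mean of "xsq"
    slope = (mxy - mx * my) / (mxsq - mx ** 2)
    intercept = my - slope * mx
    return intercept, slope


# Regression of prejud on attend (compare: 39.94926 and -.10533).
a, b = fit_line(ATTEND, PREJUD)
print(round(a, 5), round(b, 5))  # 39.94926 -0.10533

# Regression of prejud on newx = (attend - 28)**2 (compare: 59.19815 and -.085542).
newx = [(x - 28) ** 2 for x in ATTEND]
a2, b2 = fit_line(newx, PREJUD)
print(round(a2, 5), round(b2, 6))  # 59.19815 -0.085542

# Like "compute prejuhat = ...", but with the unrounded coefficients.
prejuhat = [a + b * x for x in ATTEND]

# Sums of the per-case values the program stores in sstotal, sspred1, sserror1.
ybar = mean(PREJUD)
sstotal = sum((y - ybar) ** 2 for y in PREJUD)
sspred = sum((yh - ybar) ** 2 for yh in prejuhat)
sserror = sum((y - yh) ** 2 for y, yh in zip(PREJUD, prejuhat))
print(math.isclose(sstotal, sspred + sserror))  # True
```

Because the program computes per-case squared deviations, the means that “frequencies” reports for them are these sums divided by 12.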
3) The “recode” statement is used to change values on a variable. For example, consider
an income measure (let’s call it “rincome”) that takes a value of 1 for incomes less
than $1000, 2 for incomes between $1000 and $2999, 3 for $3000 to $3999, 4 for
$4000 to $4999, 5 for $5000 to $5999, 6 for $6000 to $6999, 7 for $7000 to $7999, 8
for $8000 to $9999, 9 for $10000 to $14999, 10 for $15000 to $19999, 11 for $20000
to $24999, 12 for $25000 or more, and 13 for refused to respond. These values could
be recoded (approximately) into dollar units with the following recode statement:
recode rincome (1=500)(2=2000)(3=3500)(4=4500)(5=5500)(6=6500)
(7=7500)(8=9000)(9=12500)(10=17500)(11=22500)(12=35000)(13=99).
Note that values are changed to the midpoints (in dollars) of their corresponding
intervals. The choice of $35,000 as the midpoint of the highest income category is,
admittedly, somewhat arbitrary. Also, since 99 is the missing data code for
“rincome”, the last pair of parentheses in the recode sets the incomes of refusers to missing
(rather than leaving their code of 13 unchanged and, in so doing, assuming that their average
income equals $13.00).
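A recode is essentially a lookup table from old codes to new values. The Python sketch below (illustration only; the names are hypothetical) mirrors the recode above. It uses None in place of the SPSS missing-data code 99 and, as SPSS does, leaves any code that is not listed unchanged.

```python
# Old rincome code -> approximate dollar midpoint (same pairs as the recode).
# None stands in for the SPSS missing-data code 99 (an assumption of this sketch).
RINCOME_MIDPOINTS = {
    1: 500, 2: 2000, 3: 3500, 4: 4500, 5: 5500, 6: 6500,
    7: 7500, 8: 9000, 9: 12500, 10: 17500, 11: 22500, 12: 35000,
    13: None,  # refused to respond -> missing
}


def recode_rincome(code):
    """Return the recoded value; codes not listed stay unchanged, as in SPSS."""
    return RINCOME_MIDPOINTS.get(code, code)


dollars = [recode_rincome(c) for c in [1, 9, 12, 13]]
print(dollars)  # [500, 12500, 35000, None]

# Missing values are left out of the mean, as SPSS leaves out declared missing values.
valid = [d for d in dollars if d is not None]
print(sum(valid) / len(valid))  # 16000.0
```

Had the refuser kept a value of 13, it would have entered the mean as an income of $13.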
4) The “if” statement can be used to combine information from different variables. This
is particularly useful when one has contingency items. For example, consider the two
contingency items “hit” (“Have you ever been punched or beaten by another
person?”) and “hitage” (“Did you experience this beating (or these beatings) as a
child, as an adult, or in both childhood and adulthood?”). Imagine that a score of 1 on
“hit” means “yes” and a score of 2 means “no” and that a score of 1 on “hitage”
means “as a child,” of 2 means “as an adult,” and of 3 means “both.” If you wanted a
variable that measured whether respondents were beaten as children, you might wish
to change “hit” such that a score of 1 would mean “hit as a child” and 2 would mean
“not hit as a child.” This can be done by changing scores on “hit” from 1 to 2 among
respondents who were beaten as an adult but not as a child (i.e., those scoring 2 on
“hitage”), using the following “if” statement:
if ( hitage eq 2 ) hit = 2.
Note: Logical operators other than “eq” (equals) are “ne” (does not equal), “lt” (less
than), “gt” (greater than), “le” (less than or equal to), and “ge” (greater than or equal
to). These relations can also be combined with “and” and “or”. Consider the
following illustration:
if ( ( ( var1 ne 0 ) and ( var2 eq 1) ) or ( var3 ge 20 ) ) var4 = 0.
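Python’s if reads much the same way. The sketch below (illustration only; the respondent records and function names are made up) applies the hit/hitage rule and the compound condition. It assumes that an unanswered item is stored as None and that the compound example has no missing values; SPSS applies its own rules to missing values in logical expressions, which this sketch does not reproduce.

```python
# if ( hitage eq 2 ) hit = 2.
def recode_hit(resp):
    """Return a copy of one respondent's record with the rule applied."""
    out = dict(resp)
    if out.get("hitage") == 2:  # beaten as an adult only
        out["hit"] = 2          # now means "not hit as a child"
    return out                  # otherwise "hit" keeps its old value


# Made-up respondents; None marks an item that was not asked.
respondents = [
    {"hit": 1, "hitage": 1},     # beaten as a child
    {"hit": 1, "hitage": 2},     # beaten as an adult only
    {"hit": 1, "hitage": 3},     # beaten in both
    {"hit": 2, "hitage": None},  # never beaten; hitage skipped
]
print([recode_hit(r)["hit"] for r in respondents])  # [1, 2, 1, 2]


# if ( ( ( var1 ne 0 ) and ( var2 eq 1) ) or ( var3 ge 20 ) ) var4 = 0.
# Assumes no missing values.
def compound_rule(var1, var2, var3, var4):
    if (var1 != 0 and var2 == 1) or var3 >= 20:
        return 0
    return var4  # condition false: var4 keeps its existing value
```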
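5) The “select if” statement allows one to restrict an analysis to part of one’s data set.
Thus, the following statement would restrict an analysis to women only:
select if ( sex eq 2 ).
Unless it is preceded by a “temporary” statement (see below), “select if” drops all other
cases from the active data for every command that follows.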
6) All transformation statements (e.g., “compute”, “recode”, “if”, “select if”) that follow a
“temporary” statement apply only to the next command. For example, if
you wished to find the mean and variance of “age” separately for males and females,
this would be done as follows:
temporary.
select if ( sex eq 1 ).
frequencies vars = age / statistics = mean, variance.
temporary.
select if ( sex eq 2 ).
frequencies vars = age / statistics = mean, variance.
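The difference between a temporary and a permanent “select if” can be sketched in Python (illustration only; the five cases and the name age_stats are made up). Each filtered list plays the role of “temporary. select if ...”, because the original list is left intact; reassigning the list plays the role of a “select if” without “temporary”. The variance shown is the sample variance (n - 1 in the denominator), which is the variance SPSS reports.

```python
from statistics import mean, variance

# Hypothetical cases; sex is coded 1 = male, 2 = female as in the text.
cases = [
    {"sex": 1, "age": 20}, {"sex": 1, "age": 30}, {"sex": 1, "age": 40},
    {"sex": 2, "age": 25}, {"sex": 2, "age": 35},
]


def age_stats(active_cases):
    """Mean and sample variance (n - 1 denominator) of age."""
    ages = [c["age"] for c in active_cases]
    return mean(ages), variance(ages)


# temporary. select if ( sex eq 1 ). frequencies ...
# The filter builds a new list, so `cases` is untouched afterwards.
male_stats = age_stats([c for c in cases if c["sex"] == 1])

# temporary. select if ( sex eq 2 ). frequencies ...
female_stats = age_stats([c for c in cases if c["sex"] == 2])

# select if ( sex eq 1 ).  (no "temporary")
# The unselected cases are gone for every later command.
cases = [c for c in cases if c["sex"] == 1]
females_left = [c for c in cases if c["sex"] == 2]  # empty list
```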
5. And in closing . . .
You are strongly encouraged to use SPSS to do your homework instead of doing your
homework problems by hand. If you do this, however, you must hand in your complete
computer output in lieu of your hand calculations.