Using SPSS

1. What (who?) is SPSS?
SPSS (Statistical Package for the Social Sciences) is a powerful set of computer programs for
calculating statistics (and thus for saving you innumerable hours of calculating using pencil and
paper). The program is available on many personal computers (PCs) around campus, including
those in 321 Snedecor Hall. (One should soon be able to find other locations at the following
web site: http://www.it.iastate.edu/labsdb/) In these few pages you will learn only a few
commands for running SPSS. You will learn some more commands as you are given SPSS
programs to run during the course of the semester.
2. Entering data into SPSS
Like any program on a PC, SPSS is started by double-clicking on the SPSS icon on the computer's
desktop. This opens the "Data Editor" in SPSS. Here you can enter data manually (not advised),
import data, or type in (and run) a batch program that contains your data. Below are instructions
on how to enter data into SPSS using the latter two methods.
a. Importing data from an SPSS portable (with *.por extension) or SPSS systems (with
*.sav extension) file.
During the semester you will be provided with SPSS portable files that contain data for
you to analyze as part of your lab problems. You can import these files into SPSS by (1)
selecting "Read text data" from the "File" menu, (2) changing "Files of type" to "SPSS
Portable (*.por)," (3) changing "Look in" to the folder into which you have copied the
file, and (4) double-clicking on the file name. (Directions are the same for a systems file,
except that "Files of type" should be changed to "SPSS (*.sav).")
b. Using SPSS in "Batch Mode"
Starting with the first lab, you will find yourself using SPSS "batch programs" in doing
your lab assignments. To complete the assignments you will need to run these programs
and print the output (that appears in an SPSS Viewer window). WARNING: SPSS will
only print parts of your output that you have highlighted in the left tree-pane of the SPSS
Viewer window. Pressing the button with the printer on it will not do anything if nothing
has been highlighted.
Batch programs must be typed using the SPSS Syntax Editor. Open the editor by
selecting "File" then "New" then "Syntax." Next, type in your batch program exactly as it
appears in the lab. Then select "Edit," "Select All," and push the "Run Current" button
(i.e., the button with the right-pointing black arrowhead on it). This will execute your
batch program and send the output to the SPSS Viewer window (where you may also find
error messages if you mistyped parts of your program). WARNING: It is strongly
recommended that you use a fixed (i.e., NOT proportional) font when typing your batch
programs. Any Courier font should do the trick. Fixed fonts align your numbers in
straight columns, so that it is easier to tell if you have left too many spaces between them.
3. Why do batch programs?
Maybe you have used SPSS on a PC before, or maybe you have had a chance to play around with
the program a bit. In either case, you have probably discovered that once you have imported data
into SPSS, you can analyze these data entirely with mouse movements and clicks. What could be
easier? Batch programs just slow you down. Who needs them, right?
The answer lies in the fact that you can save (and rerun, if necessary) your batch programs,
whereas it is not as easy to recreate the mouse activities that generated your output. Just imagine
in a few years when you proudly take your results to your major professor and he exclaims,
"These results are hard to believe! How did you get these numbers?" Now imagine going back
to your computer and discovering that no matter how much you and your mouse try, you cannot
recreate the numbers on your output. On the other hand, if the numbers appeared in Table 23 and
you search your PC for a file called "table23.sps," just a quick run of this batch program will
(assuming that no one has manipulated your data or program file) generate an exact replica of the
output that your major professor seeks. In a sentence, batch programs are simply part of a good
record keeping strategy. In reviewing the literature, competent researchers always keep careful
records of their sources, right? Well, when performing a statistical analysis, only the
incompetent fail to keep just as meticulous records of their programs. Enough said.
4. Writing a program
a. Computer programs must be written before they can be run. The first line in an SPSS
program always indicates what data are to be analyzed. In the course, you will either
access an existing data set (a.k.a., a systems file) or you can enter data by hand.
1) Our class will analyze data from a survey of "Wilson Scholars" and a national survey
of U.S. adults. On the Assignments page of our class website, a setup file can be
accessed via the page’s ‘Recall’ link. Running this program from an SPSS syntax
window will create a systems file (named, recall.sav) in the ‘temp’ directory of the
‘C:’ drive on your PC. To access the data in this file, your program must begin with
the following line:
get file='c:recall.sav'.
Later in the course we shall be using another systems file (named, gss96.sav) that you
will access using the following as the first line in your programs:
get file='c:gss96.sav'.
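As a hedged illustration of how a program's first line works, the sketch below opens the systems file and then lists its variables, so that you can confirm that the intended data were loaded. The "display dictionary" command is an addition for illustration and is not used elsewhere in this handout. The portable-file name ‘lab1.por’ is a hypothetical placeholder; "import file" is the counterpart of "get file" for portable (*.por) files.

```spss
* Open the systems file created by the setup program.
get file='c:recall.sav'.
* List the names and labels of the variables in the active file.
display dictionary.

* For a portable file, use import instead (lab1.por is a hypothetical name).
import file='c:lab1.por'.
display dictionary.
```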
2) If you enter data into your SPSS program by hand, you can do so by placing a "data
list" statement rather than a "get file" statement in its first line. What follows is an
illustration of an SPSS program in which data have been included:
data list records=1 / income 1-2 degree 3-4 wt 5-6.
weight by wt.
begin data.
1 1 8
2 1 1
3 1 1
1 2 2
2 2 7
3 2 3
2 3 2
3 3 6
end data.
regression vars=income,degree/dep=income/enter.
compute ysq = income**2.
compute xsq = degree**2.
compute xy = degree * income.
frequencies vars=income ysq degree xsq xy / statistics=mean.
compute ssx = (degree - 1.933)**2.
compute yhat = .593284 + (.727612 * degree).
compute ssregres = (yhat - 2.0)**2.
compute sserror = (income - yhat)**2.
compute sstotal = (income - 2.0)**2.
frequencies vars=ssx yhat ssregres sserror sstotal / statistics=mean.
In writing batch programs, be sure that each command or statement (but not the lines of
data) ends with a period. A long command may continue onto a second line, in which case
only its last line ends with a period. Also notice how the "data list" statement indicates
that there is one line of data (records=1) for each unit of analysis, that data on the
variable, ‘income’, appear right-justified in the first two columns, data on the variable,
‘degree’, appear right-justified in columns 3 and 4, and data on the variable, ‘wt’, appear
right-justified in columns 5 and 6. When entered by hand, the data in an SPSS program are placed
between a "begin data" and an "end data" statement, and may be preceded or followed
by data transformation statements (e.g., compute, recode, if, etc.) and followed by
commands (e.g., plot, frequencies, etc.). Note: "Commands" generate output,
"statements" merely create or modify variables.
The "weight" statement indicates how many respondents had a specific pair of scores
on the income and degree variables. This is a compact way of listing data (usually
from tables) in which many respondents have identical combinations of scores. Most
commonly, each line of data in one's program corresponds to a single respondent. In
such cases each line has a weight of one (1), and the weight statement is not required.
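One way to see what the weight statement does is to cross-tabulate the two variables. The sketch below reuses the data from the program above and adds a "crosstabs" command, which is not part of the original program. With weighting on, each cell count in the resulting table should equal the corresponding value of ‘wt’, and the table total should be 30 (the sum of the eight weights).

```spss
data list records=1 / income 1-2 degree 3-4 wt 5-6.
weight by wt.
begin data.
 1 1 8
 2 1 1
 3 1 1
 1 2 2
 2 2 7
 3 2 3
 2 3 2
 3 3 6
end data.
* Each cell count equals its weight (for example, 8 cases with income 1 and degree 1).
crosstabs tables=income by degree.
```

Without the weight statement, the same table would show a count of 1 in each of the eight nonempty cells.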
b. You will be given programs such as this when they are required in your lab assignments.
The following common SPSS commands and statements are included to help you
understand the various parts of these programs:
1) Occasionally you may wish to get a box plot of your data. To obtain two box plots,
one with ‘recall’ on the vertical axis and ‘birthyr’ on the horizontal axis and another
with ‘recall’ on the vertical axis and ‘eventyr’ on the horizontal axis, you would use
the following batch program:
get file='c:recall.sav'.
examine vars=recall by birthyr,eventyr/plot=boxplot/
statistics=none/nototal.
2) The "compute" statement is used to create a new variable as a mathematical function
of other variables. For example, imagine that you found a constant (or intercept) of
.593284 and a slope of .727612 in the regression of ‘income’ on ‘degree’. You could
compute a new variable, ‘incomhat’, that gives the estimated values of ‘income’
(according to this regression) for each possible combination of degree and income as
defined in the previous program (i.e., the one with the data list statement). This is
done as follows:
compute incomhat = .593284 + (.727612 * degree).
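To see what this compute statement produces, a minimal sketch follows. It enters the three possible values of ‘degree’ by hand and lists the resulting estimates. The "formats" statement and the "list" command are additions for illustration; without "formats", computed variables are displayed with two decimal places.

```spss
data list records=1 / degree 1-2.
begin data.
 1
 2
 3
end data.
compute incomhat = .593284 + (.727612 * degree).
* Show six decimal places so that the estimates are not rounded.
formats incomhat (f8.6).
* The estimates are 1.320896, 2.048508, and 2.776120.
list variables=degree incomhat.
```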
3) The "recode" statement is used to change values on a variable. For example, you
might find that you have too few cases per year-of-birth to do a box plot with a
separate box for each birth year. Consequently, you might collapse data on a
corresponding variable (let’s call it ‘birthyr’) such that each of the variable’s
collapsed values has at least 40 cases:
recode birthyr(28,32=30)(34,35=34.5)(36,37=36.5)(38,39=38.5)
(40,41=40.5)(42,43=42.5)(44,45=44.5)(46,47=46.5)(48,49=48.5).
Recode statements can also be used to assign units to a variable. For example,
consider a variable, ‘rincome,’ that takes a value of 1 for incomes less than $1000, 2
for incomes between $1000 and $2999, 3 for $3000 to $3999, 4 for $4000 to $4999, 5
for $5000 to $5999, 6 for $6000 to $6999, 7 for $7000 to $7999, 8 for $8000 to
$9999, 9 for $10000 to $14999, 10 for $15000 to $19999, 11 for $20000 to $24999,
12 for $25000 or more, and 13 for refused to respond. These values could be recoded
(approximately) into dollar units with the following recode command:
recode rincome (1=500)(2=2000)(3=3500)(4=4500)(5=5500)(6=6500)
(7=7500)(8=9000)(9=12500)(10=17500)(11=22500)(12=35000)(13=99).
Note that values are changed to the midpoints (in dollars) of their corresponding
intervals. The choice of $35,000 as the midpoint of the highest income category is,
admittedly, somewhat arbitrary. Also, since 99 is the missing data code for
‘rincome’, the last parentheses in the recode set the incomes of refusers to missing
(rather than leaving the code unchanged and, in so doing, treating refusers as having
an income of $13.00).
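A variation worth considering, sketched here under stated assumptions, is to recode into a new variable so that the original category codes remain available. The variable name ‘rincdol’ and the four data values are hypothetical. The "into" keyword and the "missing values" statement are standard SPSS syntax but are not used elsewhere in this handout.

```spss
data list records=1 / rincome 1-2.
begin data.
 1
 9
12
13
end data.
recode rincome (1=500)(2=2000)(3=3500)(4=4500)(5=5500)(6=6500)
  (7=7500)(8=9000)(9=12500)(10=17500)(11=22500)(12=35000)(13=99)
  into rincdol.
* Declare 99 as the missing data code on the new variable.
missing values rincdol (99).
* The mean is based on the three valid cases only (500, 12500, and 35000).
frequencies variables=rincdol / statistics=mean.
```

Because the refuser is treated as missing, the mean here is $16,000; ‘rincome’ itself still holds the original codes 1, 9, 12, and 13.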
4) The "if" statement can be used to combine information from different variables. This
is particularly useful when one has contingency items. For example, consider the two
contingency items ‘hit’ ("Have you ever been punched or beaten by another person?")
and ‘hitage’ ("Did you experience this beating (or these beatings) as a child, as an
adult, or in both childhood and adulthood?"). Imagine that a score of 1 on ‘hit’ means
"yes" and a score of 2 means "no" and that a score of 1 on ‘hitage’ means "as a child,"
of 2 means "as an adult," and of 3 means "both." If you wanted a variable that
measured whether respondents were beaten as children, you might wish to change
‘hit’ such that a score of 1 would mean "hit as a child" and 2 would mean "not hit as a
child." This can be done by changing scores on ‘hit’ from 1 to 2 among respondents
who were beaten as an adult but not as a child. This could be done with the following
"if" statement:
if ( hitage eq 2 ) hit = 2.
Note: Logical operators other than "eq" (equals) are "ne" (does not equal), "lt" (less
than), "gt" (greater than), "le" (less than or equal to), and "ge" (greater than or equal
to). These relations can also be combined with "and" and "or". Consider the
following illustration:
if ( ( ( var1 ne 0 ) and ( var2 eq 1) ) or ( var3 ge 20 ) ) var4 = 0.
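Overwriting ‘hit’ discards the original answers, so a cautious alternative is to build the measure in a new variable. The sketch below is one way to do this, combining relations with "and" and "or" as described above. The variable name ‘hitchild’ and the four data lines are hypothetical; the last respondent answered "no" on ‘hit’ and so has no score on ‘hitage’.

```spss
data list records=1 / hit 1-2 hitage 3-4.
begin data.
 1 1
 1 2
 1 3
 2
end data.
* Start everyone at 2 (not hit as a child).
compute hitchild = 2.
* Change to 1 those who were hit as a child or in both periods.
if ( ( hit eq 1 ) and ( ( hitage eq 1 ) or ( hitage eq 3 ) ) ) hitchild = 1.
* For these four cases hitchild is 1, 2, 1, and 2.
list variables=hit hitage hitchild.
```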
5) The "select if" statement allows one to restrict an analysis to part of one's data set.
Thus, the following statement would restrict an analysis to women only:
select if ( sex eq 2 ).
6) All statements (e.g., with "compute", "recode", "if", "select if", etc.) that follow a
"temporary" statement apply only to the next command. For example, if
you wished to find the mean and variance on "age" separately for males and females,
this would be done as follows:
temporary.
select if ( sex eq 1 ).
frequencies general = age / statistics = mean,variance.
temporary.
select if ( sex eq 2 ).
frequencies general = age / statistics = mean,variance.
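The same pair of analyses can also be requested with "split file", which is a different technique from the temporary/select if approach shown above. The sketch is illustrative only: the six data lines are hypothetical, and ‘sex’ is assumed to be coded 1 for males and 2 for females.

```spss
data list records=1 / sex 1 age 3-4.
begin data.
1 20
1 30
1 40
2 25
2 35
2 45
end data.
* Cases must be sorted by the grouping variable before the file is split.
sort cases by sex.
split file by sex.
* One set of statistics is produced for each value of sex.
frequencies variables=age / statistics=mean variance.
* Turn splitting off so that later commands use all cases together.
split file off.
```

With these made-up data the mean age is 30 for males and 35 for females, and the variance is 100 in both groups.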
5. And in closing . . .
You are strongly encouraged to use SPSS (or R or SAS) rather than hand calculations to do
your homework problems. If you do so, however, you must hand in your entire computer
output in lieu of your hand calculations.